
WEEK 6&7

Curve Fitting
Linear regression

Polynomial regression
Multiple regression
General linear least squares
Nonlinear regression

LESSON OUTCOMES
At the end of this topic, the students will be able to:
Fit data using linear and polynomial regression
Fit data using multiple linear and nonlinear regression
Assess and choose the preferred method for any particular problem

Curve Fitting
Describes techniques to fit curves (curve fitting) to
discrete data to obtain intermediate estimates.
There are two general approaches for curve fitting:
Data exhibit a significant degree of error. The strategy is to
derive a single curve that represents the general trend of the data.
Data is very precise. The strategy is to pass a curve or a series of
curves through each of the points.

In engineering, two types of applications are normally encountered when fitting experimental data:
Trend analysis. Predicting values of the dependent variable; this may include extrapolation beyond the data points or interpolation between data points.
Hypothesis testing. Comparing an existing mathematical model with measured data.

Three attempts to fit a best curve through uncertain data points:

a) Least-squares regression
   Linear regression
   Polynomial regression
   Multiple regression
   General linear least squares
   Nonlinear regression
b) Linear interpolation
c) Curvilinear interpolation

Mathematical background in Simple Statistics


Arithmetic mean. The sum of the individual data points (yi) divided by the number of points (n):

    ȳ = Σ yi / n,   i = 1, ..., n

Standard deviation. The most common measure of spread for a sample:

    sy = √( St / (n − 1) ),   where   St = Σ (yi − ȳ)²

or, in an equivalent computational form,

    sy = √( [ Σ yi² − (Σ yi)² / n ] / (n − 1) )

(St is the total sum of the squares of the residuals between the data points and the mean.)

Variance. A representation of spread by the square of the standard deviation:

    sy² = Σ (yi − ȳ)² / (n − 1)

The denominator n − 1 is the number of degrees of freedom.

Coefficient of variation. Quantifies the spread of the data as a normalized measure, the ratio of the standard deviation to the mean:

    c.v. = (sy / ȳ) × 100%
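
As a quick numerical check of these definitions, here is a minimal Python sketch (assuming NumPy is available; the sample values are the y data used later in Example 1) that computes the mean, standard deviation, variance, and coefficient of variation:

    import numpy as np

    y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])   # sample data (y values of Example 1)
    n = len(y)

    y_mean   = y.sum() / n                    # arithmetic mean
    St       = ((y - y_mean) ** 2).sum()      # total sum of squares about the mean
    s_y      = np.sqrt(St / (n - 1))          # standard deviation (n - 1 degrees of freedom)
    variance = s_y ** 2                       # variance
    cv       = s_y / y_mean * 100             # coefficient of variation, in percent

    print(y_mean, s_y, variance, cv)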

Least-Squares Regression
Regression analysis is a study of the relationships among
variables.
Get the best straight line to fit through a set of uncertain
data points.
Calculate the slope and the intercept of the line.
Also fit the best polynomial to data.
Consider multiple linear regression for the case when one variable depends linearly on two or more other variables.

Linear Regression
Fitting a straight line to a set of paired
observations: (x1, y1), (x2, y2), ..., (xn, yn).
The mathematical expression for the straight line:

y = a0 + a1x + e

where a0 = intercept
a1 = slope
e = error, or residual, between the model and the
observations (e = y − a0 − a1x; the discrepancy between the
true value of y and the approximate value, a0 + a1x)

Criteria for a Best Fit

One criterion is to minimize the sum of the residual errors for all the available data:

    Σ ei = Σ (yi − a0 − a1 xi),   i = 1, ..., n   (n = total number of points)

However, this is an inadequate criterion because positive and negative errors cancel.
Another logical criterion might be to minimize the sum of the absolute values of the discrepancies:

    Σ |ei| = Σ |yi − a0 − a1 xi|

However, this criterion is also inadequate: it does not yield a unique best fit.

The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

    Sr = Σ ei² = Σ (yi,measured − yi,model)² = Σ (yi − a0 − a1 xi)²        Eq. (17.3)

This criterion yields a unique line for a given set of data. (Choosing the line that minimizes the maximum distance that an individual point falls from the line is a different criterion, and it too is ill-suited for regression.)

Least-Squares Fit of a Straight Line

To determine values for a0 and a1, Eq. (17.3) is differentiated with respect to each coefficient:

    Sr = Σ (yi − a0 − a1 xi)²

    ∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi)
    ∂Sr/∂a1 = −2 Σ (yi − a0 − a1 xi) xi

Setting the derivatives equal to zero results in a minimum Sr:

    0 = Σ yi − Σ a0 − a1 Σ xi
    0 = Σ xi yi − a0 Σ xi − a1 Σ xi²

where Σ a0 = n a0.

This gives the normal equations, which can be solved simultaneously:

    n a0 + (Σ xi) a1 = Σ yi
    (Σ xi) a0 + (Σ xi²) a1 = Σ xi yi

By using Cramer's rule,

    a1 = [ n Σ xi yi − Σ xi Σ yi ] / [ n Σ xi² − (Σ xi)² ]

    a0 = ȳ − a1 x̄

where ȳ and x̄ are the mean values of y and x.

Example 1
Fit a straight line to the x and y values in the first two columns of Table 1.

    x:   1.0   2.0   3.0   4.0   5.0   6.0   7.0
    y:   0.5   2.5   2.0   4.0   3.5   6.0   5.5

The required sums are n = 7, Σ xi = 28, Σ yi = 24, Σ xi² = 140, and Σ xi yi = 119.5, so that

    x̄ = 28 / 7 = 4
    ȳ = 24 / 7 = 3.4286

    a1 = [ n Σ xi yi − Σ xi Σ yi ] / [ n Σ xi² − (Σ xi)² ]
       = [ 7(119.5) − (28)(24) ] / [ 7(140) − (28)² ] = 0.8393

    a0 = ȳ − a1 x̄ = 3.4286 − (0.8393)(4) = 0.0714

The least-squares fit is

    y = 0.8393x + 0.0714

[Plot: the data of Example 1 with the least-squares line y = 0.8393x + 0.0714, R² = 0.8683]
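
The same fit can be reproduced numerically; here is a minimal Python sketch (assuming NumPy) applying the normal-equation formulas to the data of Example 1:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
    n = len(x)

    # Slope and intercept from the least-squares normal equations
    a1 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
    a0 = y.mean() - a1 * x.mean()

    print(a0, a1)   # approximately 0.0714 and 0.8393

The one-line call np.polyfit(x, y, 1) returns the same coefficients, highest power first.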

Exercise 1
Fit the best straight line to the following set of x and y values using the method of least squares.

    x:   15   17   24   25

[Plot: data with the fitted line y = 4.1071x + 1.5357, R² = 0.9822]

Quantification of Error of Linear Regression

Recall that the sum of the squares of the residuals, which measures the deviation of the regression line from the actual data values, is

    Sr = Σ ei² = Σ (yi − a0 − a1 xi)²,   i = 1, ..., n

The square of the residual represents the square of the vertical distance between the data and another measure of central tendency: the straight line.

This analogy can be extended to cases where:
1) The spread of the points around the line is of similar magnitude along the entire range of the data.
2) The distribution of these points about the line is normal.
If these criteria are met, the following statistical formulas may be used to quantify the error associated with linear regression:

    St = Σ (yi − ȳ)²              sy = √( St / (n − 1) )

    Sr = Σ (yi − a0 − a1 xi)²     sy/x = √( Sr / (n − 2) )

The original standard deviation, sy, quantified the spread of the data around the mean.
The standard error of the estimate, sy/x, quantifies the spread of the data around the regression line.

Goodness of fit

Determine:
1) The total sum of the squares around the mean for the dependent variable y, St.
2) The sum of the squares of the residuals around the regression line, Sr.
3) The difference between the two quantities, St − Sr, which quantifies the improvement or error reduction due to describing the data in terms of a straight line rather than as an average value.
4) The coefficient of determination, r², and the correlation coefficient, r.

    r² = (St − Sr) / St          coefficient of determination

    r = √( (St − Sr) / St )      correlation coefficient

For a perfect fit, Sr = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.
For r = r² = 0, Sr = St and the fit represents no improvement.
A more convenient form for computing r is

    r = [ n Σ xi yi − (Σ xi)(Σ yi) ] / [ √( n Σ xi² − (Σ xi)² ) · √( n Σ yi² − (Σ yi)² ) ]

Example 3
Determine the total standard deviation, the standard error of the estimate, and the coefficient of correlation for the linear regression line obtained in Example 1.

    xi      yi       (yi − ȳ)²   (yi − a0 − a1xi)²
    1.0     0.5       8.5765      0.1687
    2.0     2.5       0.8622      0.5625
    3.0     2.0       2.0408      0.3473
    4.0     4.0       0.3265      0.3265
    5.0     3.5       0.0051      0.5897
    6.0     6.0       6.6122      0.7970
    7.0     5.5       4.2908      0.1994
    Σ      24.0000   22.7143      2.9911

Solution

    St = Σ (yi − ȳ)² = 22.7143

    sy = √( St / (n − 1) ) = √( 22.7143 / (7 − 1) ) = 1.9457

    Sr = Σ (yi − a0 − a1 xi)² = 2.9911

    sy/x = √( Sr / (n − 2) ) = √( 2.9911 / (7 − 2) ) = 0.7734

    r² = (St − Sr) / St = (22.7143 − 2.9911) / 22.7143 = 0.868

    r = √0.868 = 0.932
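
For reference, a short Python sketch (assuming NumPy; data and coefficients from Example 1) that reproduces these error statistics:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
    n = len(x)
    a0, a1 = 0.0714, 0.8393                   # coefficients from Example 1

    St = ((y - y.mean()) ** 2).sum()          # total sum of squares about the mean
    Sr = ((y - a0 - a1 * x) ** 2).sum()       # sum of squared residuals about the line

    s_y  = np.sqrt(St / (n - 1))              # total standard deviation, approx. 1.9457
    s_yx = np.sqrt(Sr / (n - 2))              # standard error of the estimate, approx. 0.7734
    r2   = (St - Sr) / St                     # coefficient of determination, approx. 0.868
    r    = np.sqrt(r2)                        # correlation coefficient, approx. 0.932

    print(s_y, s_yx, r2, r)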

Exercise 4
Determine the total standard deviation, the standard error of the estimate, and the coefficient of correlation for the linear regression line obtained in Example 2.

    x:   15   17   24   25

Linearization of Nonlinear Relationships


Most engineering functions have nonlinear relationships.
Such functions cannot be described adequately by a straight line.
Transformations can be used to express the data in
a form that is compatible with linear regression.
Examples of functions that can be linearized are
Exponential equation
Power equation
Saturation-growth rate equation

    Exponential equation:              y = α1 e^(β1 x)        →   ln y = β1 x + ln α1

    Power equation:                    y = α2 x^β2            →   log y = β2 log x + log α2

    Saturation-growth-rate equation:   y = α3 x / (β3 + x)    →   1/y = (β3/α3)(1/x) + 1/α3

Example 5
Fit a simple power equation to the data in the table below using a logarithmic transformation of the data.

    x:       1        2       3       4       5
    y:       0.5      1.7     3.4     5.7     8.4
    log x:   0        0.301   0.477   0.602   0.699
    log y:  −0.301    0.226   0.534   0.753   0.922

Power model:           y = α2 x^β2
Transformed model:     log y = β2 log x + log α2

A linear least-squares fit of log y against log x gives

    log y = 1.75 log x − 0.300

so that β2 = 1.75 and α2 = 10^(−0.300) = 0.5, i.e. y = 0.5 x^1.75.
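
A minimal Python sketch of this transformation approach (assuming NumPy; data from Example 5):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([0.5, 1.7, 3.4, 5.7, 8.4])

    # Linear regression on the transformed data: log10(y) = b2*log10(x) + log10(a2)
    b2, log_a2 = np.polyfit(np.log10(x), np.log10(y), 1)
    a2 = 10 ** log_a2

    print(b2, a2)   # approximately 1.75 and 0.5, i.e. y = 0.5 * x**1.75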

Polynomial Regression
Some engineering data is poorly represented by a straight line, so the error of a linear fit is high.
Example (a) and (b): a curve would be better suited to fit the data.
Instead of trying to linearize some of the nonlinear functions and use linear regression, we may alternatively fit polynomials to the data using polynomial regression.
The least-squares procedure can be readily extended to fit the data to a higher-order polynomial.

For a second-order polynomial (quadratic):

    y = a0 + a1 x + a2 x² + e

For this case, the sum of the squares of the residuals is

    Sr = Σ (yi − a0 − a1 xi − a2 xi²)²,   i = 1, ..., n

Taking the derivative with respect to each of the unknown coefficients of the polynomial:

    ∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi − a2 xi²)
    ∂Sr/∂a1 = −2 Σ xi (yi − a0 − a1 xi − a2 xi²)
    ∂Sr/∂a2 = −2 Σ xi² (yi − a0 − a1 xi − a2 xi²)

These equations are set to zero to develop the set of normal equations:

    (n) a0     + (Σ xi) a1   + (Σ xi²) a2  =  Σ yi
    (Σ xi) a0  + (Σ xi²) a1  + (Σ xi³) a2  =  Σ xi yi
    (Σ xi²) a0 + (Σ xi³) a1  + (Σ xi⁴) a2  =  Σ xi² yi

The coefficients of the unknowns can be calculated directly from the observed data by solving this system of three simultaneous linear equations.
The standard error for the polynomial fit is formulated as

    sy/x = √( Sr / (n − (m + 1)) )

where m is the order of the polynomial.
The coefficient of determination for polynomial regression can also be computed as

    r² = (St − Sr) / St
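
As an illustration, a minimal Python sketch (assuming NumPy; the data arrays are placeholders, not taken from any example here) that builds and solves these normal equations for a quadratic:

    import numpy as np

    # Placeholder data; replace with the measured values
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.0, 4.5, 10.2, 18.1, 28.0, 40.3])
    n = len(x)

    # Normal-equation matrix and right-hand side for y = a0 + a1*x + a2*x^2
    A = np.array([
        [n,              x.sum(),        (x ** 2).sum()],
        [x.sum(),        (x ** 2).sum(), (x ** 3).sum()],
        [(x ** 2).sum(), (x ** 3).sum(), (x ** 4).sum()],
    ])
    b = np.array([y.sum(), (x * y).sum(), (x ** 2 * y).sum()])

    a0, a1, a2 = np.linalg.solve(A, b)

    Sr   = ((y - a0 - a1 * x - a2 * x ** 2) ** 2).sum()
    St   = ((y - y.mean()) ** 2).sum()
    s_yx = np.sqrt(Sr / (n - 3))              # n - (m + 1), with m = 2
    r2   = (St - Sr) / St

    print(a0, a1, a2, s_yx, r2)

The same coefficients come from np.polyfit(x, y, 2), returned with the highest power first.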

Example 6
Fit a second-order polynomial to the data in the table below.

    yi:   2.1   7.7   13.6   27.2   40.9   61.1

Solution: refer to page 472 of the text book.

Multiple Linear Regression

Case where y is a linear function of two or more independent variables:

    y = a0 + a1 x1 + a2 x2 + e

The sum of the squares of the residuals is

    Sr = Σ (yi − a0 − a1 x1i − a2 x2i)²,   i = 1, ..., n

and the normal equations can be written in matrix form as

    [ n          Σ x1i        Σ x2i     ]   [ a0 ]     [ Σ yi     ]
    [ Σ x1i      Σ x1i²       Σ x1i x2i ]   [ a1 ]  =  [ Σ x1i yi ]
    [ Σ x2i      Σ x1i x2i    Σ x2i²    ]   [ a2 ]     [ Σ x2i yi ]

The standard error of the estimate is

    sy/x = √( Sr / (n − (m + 1)) )

Example: refer to page 475 of the text book.
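
A minimal Python sketch of the two-variable case (assuming NumPy; the data arrays are illustrative placeholders):

    import numpy as np

    # Placeholder data; replace with the measured values
    x1 = np.array([0.0, 2.0, 2.5, 1.0, 4.0, 7.0])
    x2 = np.array([0.0, 1.0, 2.0, 3.0, 6.0, 2.0])
    y  = np.array([5.0, 10.0, 9.0, 0.0, 3.0, 27.0])
    n = len(y)

    # Normal equations for y = a0 + a1*x1 + a2*x2
    A = np.array([
        [n,         x1.sum(),         x2.sum()],
        [x1.sum(),  (x1 ** 2).sum(),  (x1 * x2).sum()],
        [x2.sum(),  (x1 * x2).sum(),  (x2 ** 2).sum()],
    ])
    b = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])

    a0, a1, a2 = np.linalg.solve(A, b)
    print(a0, a1, a2)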

Multiple linear regression can also be extended to the power equation of the general form

    y = a0 x1^a1 x2^a2 ··· xm^am

This equation is extremely useful when fitting experimental data.
To use multiple regression analysis, we transform the equation by taking the base-10 logarithm, which yields

    log y = log a0 + a1 log x1 + a2 log x2 + ··· + am log xm

General Linear Least Squares

The general linear least-squares model is

    y = a0 z0 + a1 z1 + a2 z2 + ··· + am zm + e

where z0, z1, ..., zm are m + 1 basis functions. In matrix form,

    {Y} = [Z]{A} + {E}

where
    [Z] = matrix of the calculated values of the basis functions at the measured values of the independent variable
    {Y} = observed values of the dependent variable
    {A} = unknown coefficients
    {E} = residuals

The sum of the squares of the residuals,

    Sr = Σ_{i=1..n} ( yi − Σ_{j=0..m} aj zji )²

is minimized by taking its partial derivative with respect to each of the coefficients and setting the resulting equations equal to zero.

The normal equations can be expressed concisely in matrix form as

    [ [Z]ᵀ[Z] ] {A} = { [Z]ᵀ{Y} }

To solve for the coefficients, the matrix inverse can be employed:

    {A} = ( [Z]ᵀ[Z] )⁻¹ [Z]ᵀ {Y}
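
A sketch of this framework in Python (assuming NumPy), using the monomial basis z0 = 1, z1 = x, z2 = x² as one possible choice of basis functions and placeholder data:

    import numpy as np

    # Placeholder data; replace with the measured values
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.0, 4.5, 10.2, 18.1, 28.0, 40.3])

    # [Z]: each column is one basis function evaluated at the data points
    Z = np.column_stack([np.ones_like(x), x, x ** 2])   # z0 = 1, z1 = x, z2 = x^2

    # Normal equations [Z^T Z]{A} = {Z^T Y}
    A = np.linalg.solve(Z.T @ Z, Z.T @ y)
    print(A)    # coefficients a0, a1, a2

In practice, np.linalg.lstsq(Z, y) is usually preferred over forming [Z]ᵀ[Z] explicitly, since the normal equations can be ill-conditioned.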

Nonlinear Regression
Nonlinear models are those that have a nonlinear dependence on their parameters. There are times when a nonlinear model must be fit to the data.
As with linear least squares, nonlinear regression is based on determining the values of the parameters that minimize the sum of the squares of the residuals.
The Gauss-Newton method is one procedure for achieving the least-squares criterion. It uses a Taylor series expansion to approximate the nonlinear equation in a linear form.
Least-squares theory is then applied to obtain new parameter values that move in the direction of minimizing the residuals.

For a nonlinear model y = f(x; a0, a1, ..., am), the linearized form at iteration j is

    {D} = [Zj]{ΔA} + {E}

where

    [Zj] = matrix of partial derivatives of the function, evaluated at the current parameter estimates:

           [ ∂f1/∂a0   ∂f1/∂a1   ...   ∂f1/∂am ]
           [ ∂f2/∂a0   ∂f2/∂a1   ...   ∂f2/∂am ]
           [   ...        ...             ...  ]
           [ ∂fn/∂a0   ∂fn/∂a1   ...   ∂fn/∂am ]

    {D} = vector of differences between the measurements and the function values:
          {D} = { y1 − f(x1),  y2 − f(x2),  ...,  yn − f(xn) }ᵀ

    {ΔA} = vector of changes in the parameter values: { Δa0, Δa1, ..., Δam }ᵀ

Applying linear least-squares theory to the linearized model gives

    [ [Zj]ᵀ[Zj] ] {ΔA} = { [Zj]ᵀ{D} }

which is solved for {ΔA}, and the parameters are updated:

    a0,j+1 = a0,j + Δa0
    a1,j+1 = a1,j + Δa1
    ...

This procedure is repeated until the solution converges, that is, until

    |εa,k| = | (ak,j+1 − ak,j) / ak,j+1 | × 100%

falls below an acceptable stopping criterion.

Example: refer to page 483 of the text book.
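
As a rough illustration of the Gauss-Newton iteration, here is a minimal Python sketch (assuming NumPy); the saturation-growth model f(x) = a0·x/(a1 + x), the data values, and the starting guesses are all assumptions chosen for the example, not taken from the text book:

    import numpy as np

    # Assumed model: f(x) = a0 * x / (a1 + x)  (saturation-growth rate), placeholder data
    x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
    y = np.array([0.8, 1.3, 1.9, 2.4, 2.7])

    a0, a1 = 1.0, 1.0                                   # initial guesses
    for _ in range(20):                                 # Gauss-Newton iterations
        f  = a0 * x / (a1 + x)
        D  = y - f                                      # difference vector {D}
        Zj = np.column_stack([x / (a1 + x),             # df/da0
                              -a0 * x / (a1 + x) ** 2]) # df/da1
        dA = np.linalg.solve(Zj.T @ Zj, Zj.T @ D)       # solve for {delta A}
        a0, a1 = a0 + dA[0], a1 + dA[1]                 # update the parameters
        if np.all(np.abs(dA / np.array([a0, a1])) * 100 < 1e-8):
            break                                       # relative changes are negligible

    print(a0, a1)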
