
WEEK 6&7

Curve Fitting
Linear regression

Polynomial regression
Multiple regression
General linear least squares
Nonlinear regression

LESSON OUTCOMES
At the end of this topic, the students will be able to:
Fit data using linear and polynomial regression
Fit data using multiple linear and nonlinear regression
Assess and choose the preferred method for any particular problem

Curve Fitting
Describes techniques to fit curves (curve fitting) to
discrete data to obtain intermediate estimates.
There are two general approaches for curve fitting:
Data exhibit a significant degree of error. The strategy is to
derive a single curve that represents the general trend of the data.
Data is very precise. The strategy is to pass a curve or a series of
curves through each of the points.

In engineering, two types of applications are normally encountered when fitting experimental data:
Trend analysis. Predicting values of the dependent variable; this may include extrapolation beyond the data points or interpolation between data points.
Hypothesis testing. Comparing an existing mathematical model with measured data.

Three attempts to fit a best curve through uncertain data points:

a) Least-squares regression
   Linear regression
   Polynomial regression
   Multiple regression
   General linear least squares
   Nonlinear regression
b) Linear interpolation
c) Curvilinear interpolation

Mathematical background in Simple Statistics


Arithmetic mean. The sum of the individual data points (yi) divided by the number of points (n):

    ȳ = Σ yi / n,   i = 1, ..., n

Standard deviation. The most common measure of spread for a sample:

    sy = √( St / (n − 1) ),   where   St = Σ (yi − ȳ)²

or, in an equivalent computational form,

    sy = √( [ Σ yi² − (Σ yi)² / n ] / (n − 1) )

(St is the total sum of the squares of the residuals between the data points and the mean.)

Variance. A representation of spread by the square of the standard deviation:

    sy² = Σ (yi − ȳ)² / (n − 1)

The denominator n − 1 is the number of degrees of freedom.

Coefficient of variation. Quantifies the spread of the data as a normalized measure, the ratio of the standard deviation to the mean:

    c.v. = (sy / ȳ) × 100%
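
As a quick numerical check of these definitions, here is a minimal Python sketch (assuming NumPy is available; the sample values are the y data used later in Example 1) that computes the mean, standard deviation, variance, and coefficient of variation:

    import numpy as np

    y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])   # sample data (y values of Example 1)
    n = len(y)

    y_mean   = y.sum() / n                    # arithmetic mean
    St       = ((y - y_mean) ** 2).sum()      # total sum of squares about the mean
    s_y      = np.sqrt(St / (n - 1))          # standard deviation (n - 1 degrees of freedom)
    variance = s_y ** 2                       # variance
    cv       = s_y / y_mean * 100             # coefficient of variation, in percent

    print(y_mean, s_y, variance, cv)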

Least-Squares Regression
Regression analysis is a study of the relationships among
variables.
Get the best straight line to fit through a set of uncertain
data points.
Calculate the slope and the intercept of the line.
Also fit the best polynomial to data.
Consider multiple linear regression for the case when one variable depends linearly on two or more other variables.

Linear Regression
Fitting a straight line to a set of paired
observations: (x1, y1), (x2, y2), ..., (xn, yn).
The mathematical expression for the straight line:

y = a0 + a1x + e

where a0 = intercept
a1 = slope
e = error, or residual, between the model and the
observations (e = y − a0 − a1x; the discrepancy between the
true value of y and the approximate value, a0 + a1x)

Criteria for a Best Fit

One criterion is to minimize the sum of the residual errors for all the available data:

    Σ ei = Σ (yi − a0 − a1 xi),   i = 1, ..., n   (n = total number of points)

However, this is an inadequate criterion because positive and negative errors cancel.
Another logical criterion might be to minimize the sum of the absolute values of the discrepancies:

    Σ |ei| = Σ |yi − a0 − a1 xi|

However, this criterion is also inadequate: it does not yield a unique best fit.

The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

    Sr = Σ ei² = Σ (yi,measured − yi,model)² = Σ (yi − a0 − a1 xi)²        Eq. (17.3)

This criterion yields a unique line for a given set of data. (Choosing the line that minimizes the maximum distance that an individual point falls from the line is a different criterion, and it too is ill-suited for regression.)

Least-Squares Fit of a Straight Line

To determine values for a0 and a1, Eq. (17.3) is differentiated with respect to each coefficient:

    Sr = Σ (yi − a0 − a1 xi)²

    ∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi)
    ∂Sr/∂a1 = −2 Σ (yi − a0 − a1 xi) xi

Setting the derivatives equal to zero results in a minimum Sr:

    0 = Σ yi − Σ a0 − a1 Σ xi
    0 = Σ xi yi − a0 Σ xi − a1 Σ xi²

where Σ a0 = n a0.

This gives the normal equations, which can be solved simultaneously:

    n a0 + (Σ xi) a1 = Σ yi
    (Σ xi) a0 + (Σ xi²) a1 = Σ xi yi

By using Cramer's rule,

    a1 = [ n Σ xi yi − Σ xi Σ yi ] / [ n Σ xi² − (Σ xi)² ]

    a0 = ȳ − a1 x̄

where ȳ and x̄ are the mean values of y and x.

Example 1
Fit a straight line to the x and y values in the first two columns of Table 1.

    x:   1.0   2.0   3.0   4.0   5.0   6.0   7.0
    y:   0.5   2.5   2.0   4.0   3.5   6.0   5.5

The required sums are n = 7, Σ xi = 28, Σ yi = 24, Σ xi² = 140, and Σ xi yi = 119.5, so that

    x̄ = 28 / 7 = 4
    ȳ = 24 / 7 = 3.4286

    a1 = [ n Σ xi yi − Σ xi Σ yi ] / [ n Σ xi² − (Σ xi)² ]
       = [ 7(119.5) − (28)(24) ] / [ 7(140) − (28)² ] = 0.8393

    a0 = ȳ − a1 x̄ = 3.4286 − (0.8393)(4) = 0.0714

The least-squares fit is

    y = 0.8393x + 0.0714

[Plot: the data of Example 1 with the least-squares line y = 0.8393x + 0.0714, R² = 0.8683]
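
The same fit can be reproduced numerically; here is a minimal Python sketch (assuming NumPy) applying the normal-equation formulas to the data of Example 1:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
    n = len(x)

    # Slope and intercept from the least-squares normal equations
    a1 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
    a0 = y.mean() - a1 * x.mean()

    print(a0, a1)   # approximately 0.0714 and 0.8393

The one-line call np.polyfit(x, y, 1) returns the same coefficients, highest power first.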

Exercise 1
Fit the best straight line to the following set of x and y values using the method of least squares.

    x:   15   17   24   25

[Plot: data with the fitted line y = 4.1071x + 1.5357, R² = 0.9822]

Quantification of Error of Linear Regression

Recall that the sum of the squares of the residuals, which measures the deviation of the regression line from the actual data values, is

    Sr = Σ ei² = Σ (yi − a0 − a1 xi)²,   i = 1, ..., n

The square of the residual represents the square of the vertical distance between the data and another measure of central tendency: the straight line.

This analogy can be extended to cases where:
1) The spread of the points around the line is of similar magnitude along the entire range of the data.
2) The distribution of these points about the line is normal.
If these criteria are met, the following statistical formulas may be used to quantify the error associated with linear regression:

    St = Σ (yi − ȳ)²              sy = √( St / (n − 1) )

    Sr = Σ (yi − a0 − a1 xi)²     sy/x = √( Sr / (n − 2) )

The original standard deviation, sy, quantified the spread of the data around the mean.
The standard error of the estimate, sy/x, quantifies the spread of the data around the regression line.

Goodness of fit

Determine:
1) The total sum of the squares around the mean for the dependent variable y, St.
2) The sum of the squares of the residuals around the regression line, Sr.
3) The difference between the two quantities, St − Sr, which quantifies the improvement or error reduction due to describing the data in terms of a straight line rather than as an average value.
4) The coefficient of determination, r², and the correlation coefficient, r.

    r² = (St − Sr) / St          coefficient of determination

    r = √( (St − Sr) / St )      correlation coefficient

For a perfect fit, Sr = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.
For r = r² = 0, Sr = St and the fit represents no improvement.
A more convenient form for computing r is

    r = [ n Σ xi yi − (Σ xi)(Σ yi) ] / [ √( n Σ xi² − (Σ xi)² ) · √( n Σ yi² − (Σ yi)² ) ]

Example 3
Determine the total standard deviation, the standard error of the estimate, and the coefficient of correlation for the linear regression line obtained in Example 1.

    xi      yi       (yi − ȳ)²   (yi − a0 − a1xi)²
    1.0     0.5       8.5765      0.1687
    2.0     2.5       0.8622      0.5625
    3.0     2.0       2.0408      0.3473
    4.0     4.0       0.3265      0.3265
    5.0     3.5       0.0051      0.5897
    6.0     6.0       6.6122      0.7970
    7.0     5.5       4.2908      0.1994
    Σ      24.0000   22.7143      2.9911

Solution

    St = Σ (yi − ȳ)² = 22.7143

    sy = √( St / (n − 1) ) = √( 22.7143 / (7 − 1) ) = 1.9457

    Sr = Σ (yi − a0 − a1 xi)² = 2.9911

    sy/x = √( Sr / (n − 2) ) = √( 2.9911 / (7 − 2) ) = 0.7734

    r² = (St − Sr) / St = (22.7143 − 2.9911) / 22.7143 = 0.868

    r = √0.868 = 0.932
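
For reference, a short Python sketch (assuming NumPy; data and coefficients from Example 1) that reproduces these error statistics:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
    n = len(x)
    a0, a1 = 0.0714, 0.8393                   # coefficients from Example 1

    St = ((y - y.mean()) ** 2).sum()          # total sum of squares about the mean
    Sr = ((y - a0 - a1 * x) ** 2).sum()       # sum of squared residuals about the line

    s_y  = np.sqrt(St / (n - 1))              # total standard deviation, approx. 1.9457
    s_yx = np.sqrt(Sr / (n - 2))              # standard error of the estimate, approx. 0.7734
    r2   = (St - Sr) / St                     # coefficient of determination, approx. 0.868
    r    = np.sqrt(r2)                        # correlation coefficient, approx. 0.932

    print(s_y, s_yx, r2, r)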

Exercise 4
Determine the total standard deviation, the standard error of the estimate, and the coefficient of correlation for the linear regression line obtained in Example 2.

    x:   15   17   24   25

Linearization of Nonlinear Relationships


Most engineering functions have nonlinear relationships.
Such functions cannot be described adequately by a straight line.
Transformations can be used to express the data in
a form that is compatible with linear regression.
Examples of functions that can be linearized are
Exponential equation
Power equation
Saturation-growth rate equation

    Exponential equation:              y = α1 e^(β1 x)        →   ln y = β1 x + ln α1

    Power equation:                    y = α2 x^β2            →   log y = β2 log x + log α2

    Saturation-growth-rate equation:   y = α3 x / (β3 + x)    →   1/y = (β3/α3)(1/x) + 1/α3

Example 5
Fit a simple power equation to the data in the table below using a logarithmic transformation of the data.

    x:       1        2       3       4       5
    y:       0.5      1.7     3.4     5.7     8.4
    log x:   0        0.301   0.477   0.602   0.699
    log y:  −0.301    0.226   0.534   0.753   0.922

Power model:           y = α2 x^β2
Transformed model:     log y = β2 log x + log α2

A linear least-squares fit of log y against log x gives

    log y = 1.75 log x − 0.300

so that β2 = 1.75 and α2 = 10^(−0.300) = 0.5, i.e. y = 0.5 x^1.75.
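
A minimal Python sketch of this transformation approach (assuming NumPy; data from Example 5):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([0.5, 1.7, 3.4, 5.7, 8.4])

    # Linear regression on the transformed data: log10(y) = b2*log10(x) + log10(a2)
    b2, log_a2 = np.polyfit(np.log10(x), np.log10(y), 1)
    a2 = 10 ** log_a2

    print(b2, a2)   # approximately 1.75 and 0.5, i.e. y = 0.5 * x**1.75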

Polynomial Regression
Some engineering data is poorly represented by a straight line, so the error of a linear fit is high.
Example (a) and (b): a curve would be better suited to fit the data.
Instead of trying to linearize some of the nonlinear functions and use linear regression, we may alternatively fit polynomials to the data using polynomial regression.
The least-squares procedure can be readily extended to fit the data to a higher-order polynomial.

For a second-order polynomial (quadratic):

    y = a0 + a1 x + a2 x² + e

For this case, the sum of the squares of the residuals is

    Sr = Σ (yi − a0 − a1 xi − a2 xi²)²,   i = 1, ..., n

Taking the derivative with respect to each of the unknown coefficients of the polynomial:

    ∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi − a2 xi²)
    ∂Sr/∂a1 = −2 Σ xi (yi − a0 − a1 xi − a2 xi²)
    ∂Sr/∂a2 = −2 Σ xi² (yi − a0 − a1 xi − a2 xi²)

These equations are set to zero to develop the set of normal equations:

    (n) a0     + (Σ xi) a1   + (Σ xi²) a2  =  Σ yi
    (Σ xi) a0  + (Σ xi²) a1  + (Σ xi³) a2  =  Σ xi yi
    (Σ xi²) a0 + (Σ xi³) a1  + (Σ xi⁴) a2  =  Σ xi² yi

The coefficients of the unknowns can be calculated directly from the observed data by solving this system of three simultaneous linear equations.
The standard error for the polynomial fit is formulated as

    sy/x = √( Sr / (n − (m + 1)) )

where m is the order of the polynomial.
The coefficient of determination for polynomial regression can also be computed as

    r² = (St − Sr) / St
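
As an illustration, a minimal Python sketch (assuming NumPy; the data arrays are placeholders, not taken from any example here) that builds and solves these normal equations for a quadratic:

    import numpy as np

    # Placeholder data; replace with the measured values
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.0, 4.5, 10.2, 18.1, 28.0, 40.3])
    n = len(x)

    # Normal-equation matrix and right-hand side for y = a0 + a1*x + a2*x^2
    A = np.array([
        [n,              x.sum(),        (x ** 2).sum()],
        [x.sum(),        (x ** 2).sum(), (x ** 3).sum()],
        [(x ** 2).sum(), (x ** 3).sum(), (x ** 4).sum()],
    ])
    b = np.array([y.sum(), (x * y).sum(), (x ** 2 * y).sum()])

    a0, a1, a2 = np.linalg.solve(A, b)

    Sr   = ((y - a0 - a1 * x - a2 * x ** 2) ** 2).sum()
    St   = ((y - y.mean()) ** 2).sum()
    s_yx = np.sqrt(Sr / (n - 3))              # n - (m + 1), with m = 2
    r2   = (St - Sr) / St

    print(a0, a1, a2, s_yx, r2)

The same coefficients come from np.polyfit(x, y, 2), returned with the highest power first.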

Example 6
Fit a second-order polynomial to the data in the table below.

    yi:   2.1   7.7   13.6   27.2   40.9   61.1

Solution: refer to page 472 of the text book.

Multiple Linear Regression

Case where y is a linear function of two or more independent variables:

    y = a0 + a1 x1 + a2 x2 + e

The sum of the squares of the residuals is

    Sr = Σ (yi − a0 − a1 x1i − a2 x2i)²,   i = 1, ..., n

and the normal equations can be written in matrix form as

    [ n          Σ x1i        Σ x2i     ]   [ a0 ]     [ Σ yi     ]
    [ Σ x1i      Σ x1i²       Σ x1i x2i ]   [ a1 ]  =  [ Σ x1i yi ]
    [ Σ x2i      Σ x1i x2i    Σ x2i²    ]   [ a2 ]     [ Σ x2i yi ]

The standard error of the estimate is

    sy/x = √( Sr / (n − (m + 1)) )

Example: refer to page 475 of the text book.
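
A minimal Python sketch of the two-variable case (assuming NumPy; the data arrays are illustrative placeholders):

    import numpy as np

    # Placeholder data; replace with the measured values
    x1 = np.array([0.0, 2.0, 2.5, 1.0, 4.0, 7.0])
    x2 = np.array([0.0, 1.0, 2.0, 3.0, 6.0, 2.0])
    y  = np.array([5.0, 10.0, 9.0, 0.0, 3.0, 27.0])
    n = len(y)

    # Normal equations for y = a0 + a1*x1 + a2*x2
    A = np.array([
        [n,         x1.sum(),         x2.sum()],
        [x1.sum(),  (x1 ** 2).sum(),  (x1 * x2).sum()],
        [x2.sum(),  (x1 * x2).sum(),  (x2 ** 2).sum()],
    ])
    b = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])

    a0, a1, a2 = np.linalg.solve(A, b)
    print(a0, a1, a2)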

Multiple linear regression can also be extended to the power equation of the general form

    y = a0 x1^a1 x2^a2 ··· xm^am

This equation is extremely useful when fitting experimental data.
To use multiple regression analysis, we transform the equation by taking the base-10 logarithm, which yields

    log y = log a0 + a1 log x1 + a2 log x2 + ··· + am log xm

General Linear Least Squares

The general linear least-squares model is

    y = a0 z0 + a1 z1 + a2 z2 + ··· + am zm + e

where z0, z1, ..., zm are m + 1 basis functions. In matrix form,

    {Y} = [Z]{A} + {E}

where
    [Z] = matrix of the calculated values of the basis functions at the measured values of the independent variable
    {Y} = observed values of the dependent variable
    {A} = unknown coefficients
    {E} = residuals

The sum of the squares of the residuals,

    Sr = Σ_{i=1..n} ( yi − Σ_{j=0..m} aj zji )²

is minimized by taking its partial derivative with respect to each of the coefficients and setting the resulting equations equal to zero.

The normal equations can be expressed concisely in matrix form as

    [ [Z]ᵀ[Z] ] {A} = { [Z]ᵀ{Y} }

To solve for the coefficients, the matrix inverse can be employed:

    {A} = ( [Z]ᵀ[Z] )⁻¹ [Z]ᵀ {Y}
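
A sketch of this framework in Python (assuming NumPy), using the monomial basis z0 = 1, z1 = x, z2 = x² as one possible choice of basis functions and placeholder data:

    import numpy as np

    # Placeholder data; replace with the measured values
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.0, 4.5, 10.2, 18.1, 28.0, 40.3])

    # [Z]: each column is one basis function evaluated at the data points
    Z = np.column_stack([np.ones_like(x), x, x ** 2])   # z0 = 1, z1 = x, z2 = x^2

    # Normal equations [Z^T Z]{A} = {Z^T Y}
    A = np.linalg.solve(Z.T @ Z, Z.T @ y)
    print(A)    # coefficients a0, a1, a2

In practice, np.linalg.lstsq(Z, y) is usually preferred over forming [Z]ᵀ[Z] explicitly, since the normal equations can be ill-conditioned.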

Nonlinear Regression
Nonlinear models are those that have a nonlinear dependence on their parameters. There are times when a nonlinear model must be fit to the data.
As with linear least squares, nonlinear regression is based on determining the values of the parameters that minimize the sum of the squares of the residuals.
The Gauss-Newton method is one procedure for achieving the least-squares criterion. It uses a Taylor series expansion to approximate the nonlinear equation in a linear form.
Least-squares theory is then applied to obtain new parameter values that move in the direction of minimizing the residuals.

For a nonlinear model y = f(x; a0, a1, ..., am), the linearized form at iteration j is

    {D} = [Zj]{ΔA} + {E}

where

    [Zj] = matrix of partial derivatives of the function, evaluated at the current parameter estimates:

           [ ∂f1/∂a0   ∂f1/∂a1   ...   ∂f1/∂am ]
           [ ∂f2/∂a0   ∂f2/∂a1   ...   ∂f2/∂am ]
           [   ...        ...             ...  ]
           [ ∂fn/∂a0   ∂fn/∂a1   ...   ∂fn/∂am ]

    {D} = vector of differences between the measurements and the function values:
          {D} = { y1 − f(x1),  y2 − f(x2),  ...,  yn − f(xn) }ᵀ

    {ΔA} = vector of changes in the parameter values: { Δa0, Δa1, ..., Δam }ᵀ

Applying linear least-squares theory to the linearized model gives

    [ [Zj]ᵀ[Zj] ] {ΔA} = { [Zj]ᵀ{D} }

which is solved for {ΔA}, and the parameters are updated:

    a0,j+1 = a0,j + Δa0
    a1,j+1 = a1,j + Δa1
    ...

This procedure is repeated until the solution converges, that is, until

    |εa,k| = | (ak,j+1 − ak,j) / ak,j+1 | × 100%

falls below an acceptable stopping criterion.

Example: refer to page 483 of the text book.
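
As a rough illustration of the Gauss-Newton iteration, here is a minimal Python sketch (assuming NumPy); the saturation-growth model f(x) = a0·x/(a1 + x), the data values, and the starting guesses are all assumptions chosen for the example, not taken from the text book:

    import numpy as np

    # Assumed model: f(x) = a0 * x / (a1 + x)  (saturation-growth rate), placeholder data
    x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
    y = np.array([0.8, 1.3, 1.9, 2.4, 2.7])

    a0, a1 = 1.0, 1.0                                   # initial guesses
    for _ in range(20):                                 # Gauss-Newton iterations
        f  = a0 * x / (a1 + x)
        D  = y - f                                      # difference vector {D}
        Zj = np.column_stack([x / (a1 + x),             # df/da0
                              -a0 * x / (a1 + x) ** 2]) # df/da1
        dA = np.linalg.solve(Zj.T @ Zj, Zj.T @ D)       # solve for {delta A}
        a0, a1 = a0 + dA[0], a1 + dA[1]                 # update the parameters
        if np.all(np.abs(dA / np.array([a0, a1])) * 100 < 1e-8):
            break                                       # relative changes are negligible

    print(a0, a1)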
