
# Numerical Methods for Civil Engineers

Lecture 7

Curve Fitting

- Linear Regression
- Polynomial Regression
- Multiple Linear Regression

SURANAREE
UNIVERSITY OF TECHNOLOGY

INSTITUTE OF ENGINEERING
SCHOOL OF CIVIL ENGINEERING

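LINEAR REGRESSION

## Candidate lines for curve fit

$$y = \alpha x + \beta$$

[Figure: data points with several candidate straight lines; axes x and y.]

No exact solution exists, but there are many approximate solutions.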

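## Error Between Model and Observation

- Observation: $(x_i, y_i)$
- Model: $y = \alpha x + \beta$
- Error: $e_i = y_i - \alpha x_i - \beta$

[Figure: error $e_i$ between an observed point and the model line; axes x and y.]

Criterion for a best fit: find the line that minimizes the total error over all data points.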

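## Least-Squares Fit of a Straight Line

Minimize the sum of the squares of the errors:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2$$

Differentiate with respect to each coefficient:

$$\frac{\partial S_r}{\partial \beta} = -2 \sum (y_i - \alpha x_i - \beta)$$

$$\frac{\partial S_r}{\partial \alpha} = -2 \sum \left[ (y_i - \alpha x_i - \beta)\, x_i \right]$$

Setting the derivatives to zero:

$$0 = \sum y_i - \sum \beta - \sum \alpha x_i$$

$$0 = \sum y_i x_i - \sum \beta x_i - \sum \alpha x_i^2$$

Since $\sum \beta = n\beta$, these become two equations in the two unknowns $(\alpha, \beta)$:

$$n\beta + \alpha \sum x_i = \sum y_i$$

$$\beta \sum x_i + \alpha \sum x_i^2 = \sum y_i x_i$$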

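## Solve equations simultaneously:

$$\alpha = \frac{\sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2} = \frac{S_{xy}}{S_{xx}}$$

$$\beta = \bar{y} - \alpha \bar{x}$$

where $\bar{y}$ and $\bar{x}$ are the means of $y$ and $x$.

Define:

$$S_{xy} = \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i$$

$$S_{xx} = \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2$$

$$S_{yy} = \sum y_i^2 - \frac{1}{n} \left( \sum y_i \right)^2$$

The fitted line is $y = \alpha x + \beta$.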

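## Example: Fit a straight line to x and y values

| $x_i$ | $y_i$ | $x_i^2$ | $x_i y_i$ | $y_i^2$ |
|---|---|---|---|---|
| 1 | 0.5 | 1 | 0.5 | 0.25 |
| 2 | 2.5 | 4 | 5.0 | 6.25 |
| 3 | 2.0 | 9 | 6.0 | 4 |
| 4 | 4.0 | 16 | 16.0 | 16 |
| 5 | 3.5 | 25 | 17.5 | 12.25 |
| 6 | 6.0 | 36 | 36.0 | 36 |
| 7 | 5.5 | 49 | 38.5 | 30.25 |
| $\sum$: 28 | 24 | 140 | 119.5 | 105 |

With $n = 7$:

$$\bar{x} = \frac{28}{7} = 4, \qquad \bar{y} = \frac{24}{7} = 3.4286$$

$$\alpha = \frac{119.5 - (28)(24)/7}{140 - (28)^2/7} = 0.8393$$

$$\beta = 3.4286 - 0.8393(4) = 0.0714$$

Least-squares fit:

$$y = 0.8393x + 0.0714$$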
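The hand calculation above can be reproduced with a few lines of MATLAB. This is a minimal sketch that applies the $S_{xy}/S_{xx}$ formulas directly to the example data; the variable names `slope` and `intercept` are illustrative choices, not names from the lecture.

```matlab
% Least-squares straight line for the example data, using S_xy / S_xx.
x = [1 2 3 4 5 6 7];
y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
n = numel(x);

% Corrected sums of squares and cross-products
Sxx = sum(x.^2) - sum(x)^2 / n;
Sxy = sum(x .* y) - sum(x) * sum(y) / n;

slope     = Sxy / Sxx;                    % coefficient multiplying x
intercept = mean(y) - slope * mean(x);    % constant term

fprintf('slope = %.4f, intercept = %.4f\n', slope, intercept);
```

The same two coefficients should also come out of `polyfit(x, y, 1)`, which returns them in the order slope, intercept.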

## How good is our fit?

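Sum of the squares of the errors:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2 = \frac{S_{xx} S_{yy} - S_{xy}^2}{S_{xx}}$$

## Sum of the squares around the mean:

$$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2 = S_{yy}$$

## Standard errors of the estimation:

Standard error of the estimate:

$$s_{y/x} = \sqrt{\frac{S_r}{n-2}}$$

Standard deviation:

$$s_y = \sqrt{\frac{S_t}{n-2}}$$

Here $s_y$ uses the same divisor $n-2$ as $s_{y/x}$; the conventional sample standard deviation divides by $n-1$ instead.

[Figure: spread of the data around the mean ($s_y$) compared with the spread around the regression line ($s_{y/x}$).]

Linear regression has merit when $s_y > s_{y/x}$.

Coefficient of determination:

$$r^2 = \frac{S_t - S_r}{S_t} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$$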

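## Example: error analysis of the linear fit

| $x_i$ | $y_i$ | $(y_i - \bar{y})^2$ | $(y_i - \beta - \alpha x_i)^2$ |
|---|---|---|---|
| 1 | 0.5 | 8.5765 | 0.1687 |
| 2 | 2.5 | 0.8622 | 0.5626 |
| 3 | 2.0 | 2.0408 | 0.3473 |
| 4 | 4.0 | 0.3265 | 0.3265 |
| 5 | 3.5 | 0.0051 | 0.5896 |
| 6 | 6.0 | 6.6122 | 0.7972 |
| 7 | 5.5 | 4.2908 | 0.1993 |
| $\sum$: 28 | 24 | 22.7143 | 2.9911 |

with $\bar{y} = 3.4286$, $\alpha = 0.8393$, $\beta = 0.0714$, so that $S_t = 22.7143$ and $S_r = 2.9911$.

$$s_y = \sqrt{\frac{22.7143}{7-2}} = 2.131$$

$$s_{y/x} = \sqrt{\frac{2.9911}{7-2}} = 0.773$$

Since $s_{y/x} < s_y$, linear regression has merit.

$$r = \sqrt{\frac{S_t - S_r}{S_t}} = \sqrt{\frac{22.7143 - 2.9911}{22.7143}} = \sqrt{0.868} = 0.932$$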
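The error analysis can be checked numerically. The sketch below computes $S_t$, $S_r$, the two standard errors, and $r$ from the residuals of a `polyfit` line; it keeps the divisor $n-2$ for $s_y$ only to match the numbers in this example, and the variable names are illustrative.

```matlab
% Error analysis of the straight-line fit for the example data.
x = [1 2 3 4 5 6 7];
y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
n = numel(x);

c = polyfit(x, y, 1);                 % c(1) = slope, c(2) = intercept

St = sum((y - mean(y)).^2);           % spread around the mean
Sr = sum((y - polyval(c, x)).^2);     % spread around the fitted line

sy  = sqrt(St / (n - 2));             % divisor n-2, as in this example
syx = sqrt(Sr / (n - 2));             % standard error of the estimate
r2  = (St - Sr) / St;                 % coefficient of determination

fprintf('St = %.4f, Sr = %.4f\n', St, Sr);
fprintf('sy = %.3f, sy/x = %.3f, r = %.3f\n', sy, syx, sqrt(r2));
```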

## Alternative example: error analysis of the linear fit from the column sums

Using the column sums of the straight-line example ($n = 7$, $\sum x_i = 28$, $\sum y_i = 24$, $\sum x_i^2 = 140$, $\sum x_i y_i = 119.5$, $\sum y_i^2 = 105$):

$$S_{xx} = 140 - 28^2/7 = 28$$

$$S_{yy} = 105 - 24^2/7 = 22.7$$

$$S_{xy} = 119.5 - (28)(24)/7 = 23.5$$

$$r^2 = \frac{23.5^2}{(28)(22.7)} = 0.869$$

$$S_r = \frac{S_{xx} S_{yy} - S_{xy}^2}{S_{xx}} = 2.977$$

$$s_y = \sqrt{\frac{22.7}{7-2}} = 2.131$$

$$s_{y/x} = \sqrt{\frac{2.977}{7-2}} = 0.772$$
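The shortcut for $S_r$ used in this example follows in a few lines. The sketch below writes the slope and intercept of the fitted line as $\alpha$ and $\beta$, uses $\beta = \bar{y} - \alpha\bar{x}$, and relies on the centered forms $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{yy} = \sum (y_i - \bar{y})^2$, $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, which are algebraically the same as the sum-based definitions.

$$
\begin{aligned}
S_r &= \sum \left[ (y_i - \bar{y}) - \alpha (x_i - \bar{x}) \right]^2 \\
    &= S_{yy} - 2\alpha S_{xy} + \alpha^2 S_{xx} \\
    &= S_{yy} - \frac{S_{xy}^2}{S_{xx}} = \frac{S_{xx} S_{yy} - S_{xy}^2}{S_{xx}}
    \qquad \text{with } \alpha = \frac{S_{xy}}{S_{xx}}
\end{aligned}
$$

Because $S_t = S_{yy}$, it follows that $r^2 = (S_t - S_r)/S_t = S_{xy}^2 / (S_{xx} S_{yy})$.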

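## Confidence Interval (CI)

A confidence interval is an interval within which a measurement or trial falls with a given probability.

[Figure: regression line with two curved confidence bands; the estimate $\hat{y}_i$ at $x_i$ and the mean $\bar{x}$ are marked.]

For a 95% CI, you can be 95% confident that the two curved confidence bands enclose the true best-fit linear regression line, leaving a 5% chance that the true line lies outside those boundaries.

A $100(1-\alpha)\%$ confidence interval for $y_i$ is given by

$$\hat{y}_i \pm t_{\alpha/2}\, s_{y/x} \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}}$$

where $\hat{y}_i$ is the value predicted by the fitted line at $x_i$. For a 95% confidence interval, $\alpha = 0.05$. Here $\alpha$ denotes the significance level, not the slope of the fitted line.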

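Example, using the straight-line data:

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| $y_i$ | 0.5 | 2.5 | 2.0 | 4.0 | 3.5 | 6.0 | 5.5 |

95% confidence: $\alpha = 0.05$, so $t_{\alpha/2} = t_{0.025}(\text{df} = n - 2 = 5) = 2.571$.

At $x = 3.4$ the fitted line gives $\hat{y} = 0.8393(3.4) + 0.0714 = 2.9250$, so the interval is

$$2.9250 \pm (2.571)(0.772) \sqrt{\frac{1}{7} + \frac{(3.4 - 4)^2}{28}} = 2.9250 \pm 0.7832$$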
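The interval can be evaluated in MATLAB as a check. This is a sketch under two assumptions: the critical value 2.571 is typed in from the t table rather than computed, and the names `x0`, `tcrit`, and `halfwidth` are illustrative. Because it carries unrounded intermediate values, the half-width comes out near 0.78 rather than matching the hand calculation to the last digit.

```matlab
% 95% confidence interval for the fitted line at x0 = 3.4.
x = [1 2 3 4 5 6 7];
y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
n = numel(x);

c   = polyfit(x, y, 1);                              % fitted straight line
Sxx = sum((x - mean(x)).^2);
syx = sqrt(sum((y - polyval(c, x)).^2) / (n - 2));   % standard error of the estimate

tcrit = 2.571;          % t_{0.025} for df = n - 2 = 5, taken from the t table
x0    = 3.4;
y0    = polyval(c, x0); % value predicted by the fitted line

halfwidth = tcrit * syx * sqrt(1/n + (x0 - mean(x))^2 / Sxx);
fprintf('%.4f +/- %.4f\n', y0, halfwidth);
```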

T-Distribution

[Figure: t distribution density with the critical values $\pm t_{0.025}$ (central 95%) and $\pm t_{0.005}$ (central 99%) marked on the t axis.]

## Probability density function of the t distribution:

$$f(x) = \frac{(1 + x^2/\nu)^{-(\nu+1)/2}}{B(0.5,\, 0.5\nu)\, \sqrt{\nu}}$$

where $B$ is the beta function and $\nu$ is a positive integer shape parameter ($\nu$ = df, the degrees of freedom).

## The formula for the beta function is

$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\, dt$$

[Figure: plot of the t probability density function for 4 different values of the shape parameter $\nu$.]

The t distribution with $\nu$ equal to 1 is a Cauchy distribution. The t distribution approaches a normal distribution as $\nu$ becomes large; the approximation is quite good for $\nu > 30$.
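The two limiting cases can be checked numerically with MATLAB's built-in `beta` function. The sketch below assumes the standard density with the $\sqrt{\nu}$ factor in the denominator; the function handle name `t_density` is made up for this illustration.

```matlab
% Student t density written with the beta function B(1/2, nu/2).
t_density = @(x, nu) (1 + x.^2 ./ nu).^(-(nu + 1) / 2) ...
                     ./ (beta(0.5, 0.5 * nu) .* sqrt(nu));

f_cauchy = t_density(0, 1);       % nu = 1: Cauchy density at 0, which is 1/pi
f_large  = t_density(0, 1000);    % large nu: close to the standard normal
f_normal = 1 / sqrt(2 * pi);      % standard normal density at 0

fprintf('nu = 1:    %.6f (1/pi = %.6f)\n', f_cauchy, 1/pi);
fprintf('nu = 1000: %.6f (normal = %.6f)\n', f_large, f_normal);
```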

Critical Values of t

Column headings give the confidence level, with the corresponding tail probability $\alpha/2$ in parentheses.

| df | 80% (0.10) | 90% (0.05) | 95% (0.025) | 98% (0.01) | 99% (0.005) | 99.8% (0.001) |
|---|---|---|---|---|---|---|
| 1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 | 318.313 |
| 2 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 | 22.327 |
| 3 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 | 10.215 |
| 4 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 | 7.173 |
| 5 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 | 5.893 |
| 6 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 | 5.208 |
| 7 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 | 4.782 |
| 8 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 | 4.499 |
| 9 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 | 4.296 |
| 10 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 | 4.143 |
| 11 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106 | 4.024 |
| 12 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055 | 3.929 |
| 13 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012 | 3.852 |
| 14 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977 | 3.787 |
| 15 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947 | 3.733 |
| 16 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921 | 3.686 |
| 17 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898 | 3.646 |
| 18 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878 | 3.610 |
| 19 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861 | 3.579 |
| 20 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 | 3.552 |
| 21 | 1.323 | 1.721 | 2.080 | 2.518 | 2.831 | 3.527 |
| 22 | 1.321 | 1.717 | 2.074 | 2.508 | 2.819 | 3.505 |
| 23 | 1.319 | 1.714 | 2.069 | 2.500 | 2.807 | 3.485 |
| 24 | 1.318 | 1.711 | 2.064 | 2.492 | 2.797 | 3.467 |
| 25 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787 | 3.450 |
| 26 | 1.315 | 1.706 | 2.056 | 2.479 | 2.779 | 3.435 |
| 27 | 1.314 | 1.703 | 2.052 | 2.473 | 2.771 | 3.421 |
| 28 | 1.313 | 1.701 | 2.048 | 2.467 | 2.763 | 3.408 |
| 29 | 1.311 | 1.699 | 2.045 | 2.462 | 2.756 | 3.396 |
| 30 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750 | 3.385 |
| 31 | 1.309 | 1.696 | 2.040 | 2.453 | 2.744 | 3.375 |
| 32 | 1.309 | 1.694 | 2.037 | 2.449 | 2.738 | 3.365 |
| 33 | 1.308 | 1.692 | 2.035 | 2.445 | 2.733 | 3.356 |
| 34 | 1.307 | 1.691 | 2.032 | 2.441 | 2.728 | 3.348 |
| 35 | 1.306 | 1.690 | 2.030 | 2.438 | 2.724 | 3.340 |
| 36 | 1.306 | 1.688 | 2.028 | 2.434 | 2.719 | 3.333 |
| 37 | 1.305 | 1.687 | 2.026 | 2.431 | 2.715 | 3.326 |
| 38 | 1.304 | 1.686 | 2.024 | 2.429 | 2.712 | 3.319 |
| 39 | 1.304 | 1.685 | 2.023 | 2.426 | 2.708 | 3.313 |
| 40 | 1.303 | 1.684 | 2.021 | 2.423 | 2.704 | 3.307 |
| 41 | 1.303 | 1.683 | 2.020 | 2.421 | 2.701 | 3.301 |
| 42 | 1.302 | 1.682 | 2.018 | 2.418 | 2.698 | 3.296 |
| 43 | 1.302 | 1.681 | 2.017 | 2.416 | 2.695 | 3.291 |
| 44 | 1.301 | 1.680 | 2.015 | 2.414 | 2.692 | 3.286 |
| 45 | 1.301 | 1.679 | 2.014 | 2.412 | 2.690 | 3.281 |
| 46 | 1.300 | 1.679 | 2.013 | 2.410 | 2.687 | 3.277 |
| 47 | 1.300 | 1.678 | 2.012 | 2.408 | 2.685 | 3.273 |
| 48 | 1.299 | 1.677 | 2.011 | 2.407 | 2.682 | 3.269 |
| 49 | 1.299 | 1.677 | 2.010 | 2.405 | 2.680 | 3.265 |
| 50 | 1.299 | 1.676 | 2.009 | 2.403 | 2.678 | 3.261 |
| 51 | 1.298 | 1.675 | 2.008 | 2.402 | 2.676 | 3.258 |
| 52 | 1.298 | 1.675 | 2.007 | 2.400 | 2.674 | 3.255 |
| 53 | 1.298 | 1.674 | 2.006 | 2.399 | 2.672 | 3.251 |
| 54 | 1.297 | 1.674 | 2.005 | 2.397 | 2.670 | 3.248 |
| 55 | 1.297 | 1.673 | 2.004 | 2.396 | 2.668 | 3.245 |
| 56 | 1.297 | 1.673 | 2.003 | 2.395 | 2.667 | 3.242 |
| 57 | 1.297 | 1.672 | 2.002 | 2.394 | 2.665 | 3.239 |
| 58 | 1.296 | 1.672 | 2.002 | 2.392 | 2.663 | 3.237 |
| 59 | 1.296 | 1.671 | 2.001 | 2.391 | 2.662 | 3.234 |
| 60 | 1.296 | 1.671 | 2.000 | 2.390 | 2.660 | 3.232 |
| 61 | 1.296 | 1.670 | 2.000 | 2.389 | 2.659 | 3.229 |
| 62 | 1.295 | 1.670 | 1.999 | 2.388 | 2.657 | 3.227 |
| 63 | 1.295 | 1.669 | 1.998 | 2.387 | 2.656 | 3.225 |
| 64 | 1.295 | 1.669 | 1.998 | 2.386 | 2.655 | 3.223 |
| 65 | 1.295 | 1.669 | 1.997 | 2.385 | 2.654 | 3.220 |
| 66 | 1.295 | 1.668 | 1.997 | 2.384 | 2.652 | 3.218 |
| 67 | 1.294 | 1.668 | 1.996 | 2.383 | 2.651 | 3.216 |
| 68 | 1.294 | 1.668 | 1.995 | 2.382 | 2.650 | 3.214 |
| 69 | 1.294 | 1.667 | 1.995 | 2.382 | 2.649 | 3.213 |
| 70 | 1.294 | 1.667 | 1.994 | 2.381 | 2.648 | 3.211 |
| 71 | 1.294 | 1.667 | 1.994 | 2.380 | 2.647 | 3.209 |
| 72 | 1.293 | 1.666 | 1.993 | 2.379 | 2.646 | 3.207 |
| 73 | 1.293 | 1.666 | 1.993 | 2.379 | 2.645 | 3.206 |
| 74 | 1.293 | 1.666 | 1.993 | 2.378 | 2.644 | 3.204 |
| 75 | 1.293 | 1.665 | 1.992 | 2.377 | 2.643 | 3.202 |
| 76 | 1.293 | 1.665 | 1.992 | 2.376 | 2.642 | 3.201 |
| 77 | 1.293 | 1.665 | 1.991 | 2.376 | 2.641 | 3.199 |
| 78 | 1.292 | 1.665 | 1.991 | 2.375 | 2.640 | 3.198 |
| 79 | 1.292 | 1.664 | 1.990 | 2.374 | 2.640 | 3.197 |
| 80 | 1.292 | 1.664 | 1.990 | 2.374 | 2.639 | 3.195 |
| 81 | 1.292 | 1.664 | 1.990 | 2.373 | 2.638 | 3.194 |
| 82 | 1.292 | 1.664 | 1.989 | 2.373 | 2.637 | 3.193 |
| 83 | 1.292 | 1.663 | 1.989 | 2.372 | 2.636 | 3.191 |
| 84 | 1.292 | 1.663 | 1.989 | 2.372 | 2.636 | 3.190 |
| 85 | 1.292 | 1.663 | 1.988 | 2.371 | 2.635 | 3.189 |
| 86 | 1.291 | 1.663 | 1.988 | 2.370 | 2.634 | 3.188 |
| 87 | 1.291 | 1.663 | 1.988 | 2.370 | 2.634 | 3.187 |
| 88 | 1.291 | 1.662 | 1.987 | 2.369 | 2.633 | 3.185 |
| 89 | 1.291 | 1.662 | 1.987 | 2.369 | 2.632 | 3.184 |
| 90 | 1.291 | 1.662 | 1.987 | 2.368 | 2.632 | 3.183 |
| 91 | 1.291 | 1.662 | 1.986 | 2.368 | 2.631 | 3.182 |
| 92 | 1.291 | 1.662 | 1.986 | 2.368 | 2.630 | 3.181 |
| 93 | 1.291 | 1.661 | 1.986 | 2.367 | 2.630 | 3.180 |
| 94 | 1.291 | 1.661 | 1.986 | 2.367 | 2.629 | 3.179 |
| 95 | 1.291 | 1.661 | 1.985 | 2.366 | 2.629 | 3.178 |
| 96 | 1.290 | 1.661 | 1.985 | 2.366 | 2.628 | 3.177 |
| 97 | 1.290 | 1.661 | 1.985 | 2.365 | 2.627 | 3.176 |
| 98 | 1.290 | 1.661 | 1.984 | 2.365 | 2.627 | 3.175 |
| 99 | 1.290 | 1.660 | 1.984 | 2.365 | 2.626 | 3.175 |
| 100 | 1.290 | 1.660 | 1.984 | 2.364 | 2.626 | 3.174 |
| $\infty$ | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 | 3.090 |

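Polynomial Regression

Second-order polynomial:

$$y = a_0 + a_1 x + a_2 x^2$$

Sum of the squares of the residuals:

$$S_r = \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$$

## Take the derivative with respect to each coefficient:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2)$$

$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2)$$

Normal equations:

$$n\, a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i$$

$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i$$

$$\left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i$$

## For an mth-order polynomial: $y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m$

We have to solve $m+1$ simultaneous linear equations.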
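The pattern in the normal equations extends to any order $m$: entry $(j,k)$ of the coefficient matrix is $\sum x_i^{\,j+k-2}$ and entry $j$ of the right-hand side is $\sum x_i^{\,j-1} y_i$. The following sketch assembles and solves that system for the data of the second-order example in this lecture; the names `N`, `b`, and `a` are illustrative.

```matlab
% Assemble and solve the (m+1) normal equations of an mth-order polynomial fit.
x = [0 1 2 3 4 5]';
y = [2.1 7.7 13.6 27.2 40.9 61.1]';
m = 2;                                   % polynomial order

N = zeros(m + 1);                        % coefficient matrix of the normal equations
b = zeros(m + 1, 1);                     % right-hand side
for j = 1:m+1
    for k = 1:m+1
        N(j, k) = sum(x.^(j + k - 2));   % sums of powers of x
    end
    b(j) = sum(x.^(j - 1) .* y);         % sums of x^(j-1) * y
end

a = N \ b;                               % a = [a0; a1; ...; am], ascending powers
disp(a');
```

Note that `a` is in ascending powers, the reverse of the order returned by `polyfit`.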

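## MATLAB polyfit Function

For a second-order polynomial fitted to $m$ data points, we can define

$$A = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_m^2 & x_m & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \quad C = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$$

and show that

- Fit norm (normal equations): $C = (A'A)^{-1} A' Y$
- Fit QR: $C = A \backslash Y$, the least-squares solution of $AC = Y$ (a true inverse $A^{-1}$ exists only when $A$ is square)

>> C = polyfit(x, y, n)
>> [C, S] = polyfit(x, y, n)

where

- x = independent variable
- y = dependent variable
- n = degree of the polynomial
- C = coefficients of the polynomial in descending powers
- S = data structure for the polyval function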
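The two routes named above can be compared side by side. This sketch builds the matrix $A$ for a second-order fit, solves it by the normal equations and by backslash, and compares both with `polyfit`; the data are those of the second-order example in this lecture, and the variable names are illustrative.

```matlab
% Second-order fit through the matrix A = [x.^2  x  1].
x = [0 1 2 3 4 5]';
y = [2.1 7.7 13.6 27.2 40.9 61.1]';

A = [x.^2, x, ones(size(x))];      % one row per data point

C_norm    = (A' * A) \ (A' * y);   % normal equations
C_qr      = A \ y;                 % backslash least-squares solve
C_polyfit = polyfit(x, y, 2)';     % polyfit returns a row vector

disp([C_norm C_qr C_polyfit]);     % three matching columns, descending powers
```

For well-conditioned data such as this the three columns agree to rounding error. The backslash route is generally preferred for ill-conditioned problems because forming `A'*A` squares the condition number.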

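## Example: Fit a second-order polynomial to the data.

| $x_i$ | $y_i$ | $(y_i - \bar{y})^2$ | $(y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$ |
|---|---|---|---|
| 0 | 2.1 | 544.44 | 0.14332 |
| 1 | 7.7 | 314.47 | 1.00286 |
| 2 | 13.6 | 140.03 | 1.08158 |
| 3 | 27.2 | 3.12 | 0.80491 |
| 4 | 40.9 | 239.22 | 0.61951 |
| 5 | 61.1 | 1272.11 | 0.09439 |
| $\sum$: 15 | 152.6 | 2513.39 | 3.74657 |

From the given data:

- $m = 2$, $n = 6$, $\bar{x} = 2.5$, $\bar{y} = 25.433$
- $\sum x_i = 15$, $\sum x_i^2 = 55$, $\sum x_i^3 = 225$, $\sum x_i^4 = 979$
- $\sum y_i = 152.6$, $\sum x_i y_i = 585.6$, $\sum x_i^2 y_i = 2488.8$

$$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{bmatrix}$$

Solving these equations gives $a_0 = 2.47857$, $a_1 = 2.35929$, and $a_2 = 1.86071$:

$$y = 2.47857 + 2.35929x + 1.86071x^2$$

Coefficient of determination $r^2 = 0.99851$, so

$$r = \sqrt{\frac{2513.39 - 3.74657}{2513.39}} = \sqrt{0.99851} = 0.99925$$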

## Solving by MATLAB polyfit Function

>> x = [0 1 2 3 4 5];
>> y = [2.1 7.7 13.6 27.2 40.9 61.1];
>> c = polyfit(x, y, 2)
>> [c, s] = polyfit(x, y, 2)
>> st = sum((y - mean(y)).^2)
>> sr = sum((y - polyval(c, x)).^2)
>> r = sqrt((st - sr) / st)

## MATLAB polyval Function

Evaluate a polynomial at the points defined by the input vector:

>> y = polyval(c, x)

where

- x = input vector
- y = value of the polynomial evaluated at x
- c = vector of coefficients in descending order

y = c(1)*x^n + c(2)*x^(n-1) + ... + c(n)*x + c(n+1)

Example: y = 1.86071x^2 + 2.35929x + 2.47857

>> c = [1.86071 2.35929 2.47857]
>> y2 = polyval(c,x)
>> plot(x, y, 'o', x, y2)

[Figure: "Polynomial Interpolation", data points and the fitted second-order polynomial; y axis from 0 to 70.]
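One way to see what the descending-order coefficient vector means is to evaluate the polynomial by hand with Horner's scheme and compare with `polyval`. This is an illustrative sketch, not a statement about how `polyval` is implemented; the name `y_horner` is made up.

```matlab
% Evaluate c(1)*x^2 + c(2)*x + c(3) by Horner's scheme and compare with polyval.
c = [1.86071 2.35929 2.47857];     % descending powers
x = [0 1 2 3 4 5];

y_horner = zeros(size(x));
for k = 1:numel(c)
    y_horner = y_horner .* x + c(k);   % fold in one coefficient per pass
end

y_polyval = polyval(c, x);
disp(max(abs(y_horner - y_polyval)));  % difference at rounding level
```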

Error Bounds

Error bounds are obtained by passing the optional second output of polyfit as an input to polyval:

>> [c,s] = polyfit(x,y,2)
>> [y2,delta] = polyval(c,x,s)
>> plot(x,y,'o',x,y2,'g-',x,y2+2*delta,'r:',x,y2-2*delta,'r:')

An interval of $\pm 2\Delta$ (y2 ± 2*delta) corresponds to an approximately 95% confidence interval.

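The same approach applied to the straight-line data:

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| $y_i$ | 0.5 | 2.5 | 2.0 | 4.0 | 3.5 | 6.0 | 5.5 |

>> [c,s] = polyfit(x,y,1)
>> [y2,delta] = polyval(c,x,s)
>> plot(x,y,'o',x,y2,'g-',x,y2+2*delta,'r:',x,y2-2*delta,'r:')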

## Multiple Linear Regression

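$$y = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_p x_p$$

Example case with two independent variables: $y = c_0 + c_1 x_1 + c_2 x_2$

Sum of squares of the residuals:

$$S_r = \sum (y_i - c_0 - c_1 x_{1i} - c_2 x_{2i})^2$$

Differentiate with respect to the unknowns:

$$\frac{\partial S_r}{\partial c_0} = -2 \sum (y_i - c_0 - c_1 x_{1i} - c_2 x_{2i})$$

$$\frac{\partial S_r}{\partial c_1} = -2 \sum x_{1i} (y_i - c_0 - c_1 x_{1i} - c_2 x_{2i})$$

$$\frac{\partial S_r}{\partial c_2} = -2 \sum x_{2i} (y_i - c_0 - c_1 x_{1i} - c_2 x_{2i})$$

Setting the derivatives to zero gives the normal equations:

$$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{bmatrix}$$

Example:

| $x_1$ | $x_2$ | $y$ |
|---|---|---|
| 0 | 0 | 5 |
| 2 | 1 | 10 |
| 2.5 | 2 | 9 |
| 1 | 3 | 0 |
| 4 | 6 | 3 |
| 7 | 2 | 27 |

$$\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 54 \\ 243.5 \\ 100 \end{bmatrix}$$

$$c_0 = 5, \quad c_1 = 4, \quad c_2 = -3$$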
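The normal equations of this example can be assembled directly from the column sums. The sketch below does so for the six data points and solves the 3-by-3 system; `N` and `b` are illustrative names for the coefficient matrix and right-hand side.

```matlab
% Normal equations for y = c0 + c1*x1 + c2*x2, built from the data sums.
x1 = [0 2 2.5 1 4 7]';
x2 = [0 1 2 3 6 2]';
y  = [5 10 9 0 3 27]';
n  = numel(y);

N = [n         sum(x1)       sum(x2);
     sum(x1)   sum(x1.^2)    sum(x1 .* x2);
     sum(x2)   sum(x1 .* x2) sum(x2.^2)];
b = [sum(y); sum(x1 .* y); sum(x2 .* y)];

c = N \ b;            % c = [c0; c1; c2]
disp(c');
```

The data lie exactly on the plane $y = 5 + 4x_1 - 3x_2$, so the residuals of this fit are zero up to rounding.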

## Multivariate Fit in MATLAB

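$$\begin{aligned} c_0 + c_1 x_{11} + c_2 x_{12} + \dots + c_p x_{1p} &= y_1 \\ c_0 + c_1 x_{21} + c_2 x_{22} + \dots + c_p x_{2p} &= y_2 \\ &\;\;\vdots \\ c_0 + c_1 x_{m1} + c_2 x_{m2} + \dots + c_p x_{mp} &= y_m \end{aligned}$$

Overdetermined system of equations: $A c = y$

$$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} & 1 \\ x_{21} & x_{22} & \cdots & x_{2p} & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mp} & 1 \end{bmatrix}, \quad c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \\ c_0 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$

With the column of ones placed last in $A$, the constant term $c_0$ is the last entry of $c$.

Fit norm

>> c = (A'*A)\(A'*y)

Fit QR

>> c = A\y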

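Example: the data set of the multiple linear regression example ($x_1$, $x_2$, $y$):

>> x1=[0 2 2.5 1 4 7]';
>> x2=[0 1 2 3 6 2]';
>> y=[5 10 9 0 3 27]';
>> A=[x1 x2 ones(size(x1))];
>> c=A\y

This returns c ≈ [4; -3; 5], that is $c_1 = 4$, $c_2 = -3$, and $c_0 = 5$, in the column order of A.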
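As a quick check of the two solution routes named in this section, the sketch below solves the example both by the normal equations and by backslash, then evaluates the residuals of the fitted plane. The names `c_norm`, `c_qr`, and `resid` are illustrative.

```matlab
% Compare the normal-equation and backslash solutions for the example data.
x1 = [0 2 2.5 1 4 7]';
x2 = [0 1 2 3 6 2]';
y  = [5 10 9 0 3 27]';

A = [x1 x2 ones(size(x1))];      % columns: x1, x2, constant term

c_norm = (A' * A) \ (A' * y);    % "fit norm": normal equations
c_qr   = A \ y;                  % "fit QR": backslash least-squares solve

resid = y - A * c_qr;            % residuals of the fitted plane
Sr    = sum(resid.^2);           % sum of the squares of the residuals

disp([c_norm c_qr]);             % two matching columns: [c1; c2; c0]
fprintf('Sr = %.3g\n', Sr);
```

Both columns should agree, and `Sr` should be zero to rounding error because this data set is fitted exactly.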