Numerical Methods for Civil Engineers

Lecture 7

Curve Fitting

• Linear Regression
• Polynomial Regression
• Multiple Linear Regression
Mongkol JIRAVACHARADET

SURANAREE
UNIVERSITY OF TECHNOLOGY

INSTITUTE OF ENGINEERING
SCHOOL OF CIVIL ENGINEERING

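LINEAR REGRESSION

[Figure: scattered data points in the x-y plane with several candidate lines for the curve fit]

Model: $y = \alpha x + \beta$

No line passes exactly through all the data points, but many lines approximate them.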

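Error Between Model and Observation

[Figure: a data point and the fitted line; the error is the vertical distance between them]

Observation: $(x_i, y_i)$
Model: $y = \alpha x + \beta$

Error: $e_i = y_i - \alpha x_i - \beta$
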
Criteria for a Best Fit
Find the best line: the one that minimizes the total error over all data points.

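Least-Squares Fit of a Straight Line

Minimize the sum of the squares of the errors:

$$ S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2 $$

Differentiate with respect to each coefficient:

$$ \frac{\partial S_r}{\partial \beta} = -2 \sum (y_i - \alpha x_i - \beta) $$

$$ \frac{\partial S_r}{\partial \alpha} = -2 \sum \left[ (y_i - \alpha x_i - \beta)\, x_i \right] $$

Setting the derivatives to 0:

$$ 0 = \sum y_i - \sum \beta - \alpha \sum x_i $$

$$ 0 = \sum y_i x_i - \beta \sum x_i - \alpha \sum x_i^2 $$

Since $\sum \beta = n\beta$, these become a set of two equations in the two unknowns $(\alpha, \beta)$:

$$ n\beta + \alpha \sum x_i = \sum y_i $$

$$ \beta \sum x_i + \alpha \sum x_i^2 = \sum y_i x_i $$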

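Solve the equations simultaneously:

$$ \alpha = \frac{\sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2} = \frac{S_{xy}}{S_{xx}} $$

$$ \beta = \bar{y} - \alpha \bar{x} $$

where $\bar{y}$ and $\bar{x}$ are the means of $y$ and $x$.

Define:

$$ S_{xy} = \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i $$

$$ S_{xx} = \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2 $$

$$ S_{yy} = \sum y_i^2 - \frac{1}{n} \left( \sum y_i \right)^2 $$

The approximated $y$ for any $x$ is

$$ y = \alpha x + \beta $$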
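The step from the two normal equations to the closed-form slope is not spelled out on the slide. A short elimination, writing the model as $y = \alpha x + \beta$ (slope $\alpha$, intercept $\beta$; this symbol assignment is assumed here), could run as follows:

```latex
% Normal equations:
%   n\beta + \alpha \sum x_i = \sum y_i
%   \beta \sum x_i + \alpha \sum x_i^2 = \sum x_i y_i
\begin{align*}
\beta &= \frac{1}{n}\Big(\sum y_i - \alpha \sum x_i\Big) = \bar{y} - \alpha \bar{x}
  && \text{(from the first equation)} \\
\frac{1}{n}\sum x_i \Big(\sum y_i - \alpha \sum x_i\Big) + \alpha \sum x_i^2 &= \sum x_i y_i
  && \text{(substitute into the second)} \\
\alpha \Big[\sum x_i^2 - \frac{1}{n}\Big(\sum x_i\Big)^2\Big] &= \sum x_i y_i - \frac{1}{n}\sum x_i \sum y_i \\
\alpha &= \frac{S_{xy}}{S_{xx}}
\end{align*}
```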

Example: Fit a straight line to the x and y values

   x_i     y_i     x_i^2    x_i y_i    y_i^2
    1      0.5       1        0.5       0.25
    2      2.5       4        5.0       6.25
    3      2.0       9        6.0       4.00
    4      4.0      16       16.0      16.00
    5      3.5      25       17.5      12.25
    6      6.0      36       36.0      36.00
    7      5.5      49       38.5      30.25
  -------------------------------------------
  Σ 28     24      140      119.5     105

$$ n = 7, \qquad \bar{x} = \frac{28}{7} = 4, \qquad \bar{y} = \frac{24}{7} = 3.4286 $$

$$ \alpha = \frac{119.5 - (28)(24)/7}{140 - (28)^2/7} = 0.8393 $$

$$ \beta = 3.4286 - 0.8393(4) = 0.0714 $$

Least-squares fit:

$$ y = 0.8393x + 0.0714 $$
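The hand calculation above can be cross-checked numerically. The following is a minimal MATLAB sketch, not part of the original slides; the variable names `slope` and `intercept` are chosen here for illustration, and `polyfit` is used only as an independent check.

```matlab
% Data from the straight-line example in this lecture
x = [1 2 3 4 5 6 7];
y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
n = numel(x);

% Sums of squares and cross-products as defined in the lecture
Sxx = sum(x.^2) - sum(x)^2 / n;
Sxy = sum(x.*y) - sum(x) * sum(y) / n;

slope     = Sxy / Sxx;                    % coefficient multiplying x
intercept = mean(y) - slope * mean(x);    % constant term

fprintf('slope = %.4f, intercept = %.4f\n', slope, intercept);
% prints: slope = 0.8393, intercept = 0.0714

% Independent check with the built-in first-order polynomial fit
c = polyfit(x, y, 1);                     % c(1) = slope, c(2) = intercept
```

Both routes should agree to rounding error, since `polyfit` with degree 1 solves the same least-squares problem.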

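How good is our fit?

Sum of the squares of the errors:

$$ S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2 = \frac{S_{xx} S_{yy} - S_{xy}^2}{S_{xx}} $$

Sum of the squares around the mean:

$$ S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2 = S_{yy} $$

Standard error of the estimate:

$$ s_{y/x} = \sqrt{\frac{S_r}{n-2}} $$

Standard deviation:

$$ s_y = \sqrt{\frac{S_t}{n-1}} $$

[Figure: spread of the data about the mean ($s_y$) compared with the spread about the regression line ($s_{y/x}$)]

Linear regression has merit when $s_y > s_{y/x}$.

Coefficient of determination:

$$ r^2 = \frac{S_t - S_r}{S_t} = \frac{S_{xy}^2}{S_{xx} S_{yy}} $$

$r^2$ is the fraction of the original uncertainty in $y$ that is explained by the model.

For a perfect fit, $S_r = 0$ and $r = r^2 = 1$.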

Example: error analysis of the linear fit

   x_i     y_i     (y_i - ȳ)^2     (y_i - β - α x_i)^2
    1      0.5       8.5765            0.1687
    2      2.5       0.8622            0.5626
    3      2.0       2.0408            0.3473
    4      4.0       0.3265            0.3265
    5      3.5       0.0051            0.5896
    6      6.0       6.6122            0.7972
    7      5.5       4.2908            0.1993
  ------------------------------------------------
  Σ 28     24       22.7143            2.9911

$$ \bar{y} = 3.4286, \qquad \alpha = 0.8393, \qquad \beta = 0.0714 $$

$$ S_t = 22.7143, \qquad S_r = 2.9911 $$

$$ s_y = \sqrt{\frac{22.7143}{7-1}} = 1.946 $$

$$ s_{y/x} = \sqrt{\frac{2.9911}{7-2}} = 0.773 $$

Since $s_{y/x} < s_y$, linear regression has merit.

$$ r = \sqrt{\frac{22.7143 - 2.9911}{22.7143}} = \sqrt{0.868} = 0.932 $$

The linear model explains 86.8% of the original uncertainty.
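The error measures can also be computed directly from the residuals. The sketch below is illustrative only; it assumes the usual sample standard deviation with $n-1$ in the denominator for $s_y$ and $n-2$ for $s_{y/x}$, and the variable names are chosen here.

```matlab
% Data from the straight-line example in this lecture
x = [1 2 3 4 5 6 7];
y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
n = numel(x);

c    = polyfit(x, y, 1);          % least-squares line: [slope intercept]
yhat = polyval(c, x);             % model predictions at the data points

St = sum((y - mean(y)).^2);       % sum of squares around the mean
Sr = sum((y - yhat).^2);          % sum of squares around the regression line

sy  = sqrt(St / (n - 1));         % standard deviation about the mean (assumes n-1)
syx = sqrt(Sr / (n - 2));         % standard error of the estimate
r2  = (St - Sr) / St;             % coefficient of determination

fprintf('St = %.4f, Sr = %.4f, r2 = %.4f\n', St, Sr, r2);
% prints: St = 22.7143, Sr = 2.9911, r2 = 0.8683
```

A value of `syx` smaller than `sy` is the numerical form of the statement that the regression has merit.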

Alternative: error analysis of the linear fit using $S_{xx}$, $S_{yy}$, and $S_{xy}$

Column sums for the same data ($n = 7$):

  Σ x_i = 28,  Σ y_i = 24,  Σ x_i^2 = 140,  Σ x_i y_i = 119.5,  Σ y_i^2 = 105

$$ S_{xx} = 140 - 28^2/7 = 28 $$
$$ S_{yy} = 105 - 24^2/7 = 22.7 $$
$$ S_{xy} = 119.5 - 28 \times 24/7 = 23.5 $$

$$ r^2 = \frac{23.5^2}{28 \times 22.7} = 0.869 $$

$$ S_r = \frac{28 \times 22.7 - 23.5^2}{28} = 2.977 $$

$$ s_y = \sqrt{\frac{22.7}{7-1}} = 1.945 $$

$$ s_{y/x} = \sqrt{\frac{2.977}{7-2}} = 0.772 $$

Since $s_{y/x} < s_y$, linear regression has merit.

The linear model explains 86.9% of the original uncertainty.

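Confidence Interval (CI)

A confidence interval is an interval within which a measurement or trial falls with a given probability.

[Figure: regression line with curved upper and lower confidence bands around the fitted values]

For a 95% CI, you can be 95% confident that the two curved confidence bands enclose the true best-fit linear regression line, leaving a 5% chance that the true line is outside those boundaries.

A $100(1-\alpha)\%$ confidence interval for $y_i$ is given by

$$ \hat{y}_i \pm t_{\alpha/2}\, s_{y/x} \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}} $$

Here $\alpha$ denotes the significance level, not a regression coefficient. For a 95% confidence interval, $\alpha = 0.05$.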
Example: estimate y when x is 3.4 using a 95% confidence interval:

  x_i:   1     2     3     4     5     6     7
  y_i:   0.5   2.5   2.0   4.0   3.5   6.0   5.5

$$ \hat{y} = 0.8393(3.4) + 0.0714 = 2.9250 $$

95% confidence: significance level 0.05, so the critical value is $t_{0.025}$ with df $= n - 2 = 5$, which is 2.571.

$$ \text{Interval: } 2.9250 \pm (2.571)(0.772) \sqrt{\frac{1}{7} + \frac{(3.4 - 4)^2}{28}} = 2.9250 \pm 0.7832 $$
T-Distribution

[Figure: t-distribution density; the central 95% region lies between the two $t_{0.025}$ limits and the central 99% region between the two $t_{0.005}$ limits]

Probability density function of the t distribution:

$$ f(x) = \frac{(1 + x^2/\nu)^{-(\nu+1)/2}}{B(0.5,\, 0.5\nu)\, \sqrt{\nu}} $$

where $B$ is the beta function and $\nu$ is a positive integer shape parameter.

The formula for the beta function is

$$ B(p, q) = \int_0^1 t^{p-1} (1-t)^{q-1}\, dt $$

[Figure: plot of the t probability density function for 4 different values of the shape parameter]

$\nu$ = df, the degrees of freedom.

In fact, the t distribution with $\nu$ equal to 1 is a Cauchy distribution. The t distribution approaches a normal distribution as $\nu$ becomes large. The approximation is quite good for values of $\nu > 30$.
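The two limiting statements can be checked numerically from the density formula. The sketch below uses only the core MATLAB function `beta`; the handle names `tDensity`, `cauchyDensity`, and `normalDensity` are hypothetical helpers introduced here.

```matlab
% Student t density written with the beta function, following the formula above
tDensity = @(x, nu) (1 + x.^2 ./ nu).^(-(nu + 1)/2) ./ (beta(0.5, 0.5*nu) .* sqrt(nu));

% Reference densities for comparison
cauchyDensity = @(x) 1 ./ (pi * (1 + x.^2));
normalDensity = @(x) exp(-x.^2 / 2) / sqrt(2*pi);

x = linspace(-4, 4, 9);

% nu = 1 should reproduce the Cauchy density to rounding error
errCauchy = max(abs(tDensity(x, 1) - cauchyDensity(x)));

% nu = 30 should already be close to the standard normal density
errNormal30 = max(abs(tDensity(x, 30) - normalDensity(x)));

fprintf('max |t_1 - Cauchy| = %.2e, max |t_30 - normal| = %.2e\n', errCauchy, errNormal30);
```

The first difference should be at rounding level, while the second should be small but visibly nonzero, which matches the slide's wording that the normal approximation is "quite good" rather than exact.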

Critical Values of t

Columns are labelled by the two-sided confidence level and, beneath it, the corresponding upper-tail area.

  df      80%      90%      95%      98%      99%     99.8%
         0.10     0.05    0.025     0.01    0.005    0.001
  ------------------------------------------------------------
   1    3.078    6.314   12.706   31.821   63.657  318.313
   2    1.886    2.920    4.303    6.965    9.925   22.327
   3    1.638    2.353    3.182    4.541    5.841   10.215
   4    1.533    2.132    2.776    3.747    4.604    7.173
   5    1.476    2.015    2.571    3.365    4.032    5.893
   6    1.440    1.943    2.447    3.143    3.707    5.208
   7    1.415    1.895    2.365    2.998    3.499    4.782
   8    1.397    1.860    2.306    2.896    3.355    4.499
   9    1.383    1.833    2.262    2.821    3.250    4.296
  10    1.372    1.812    2.228    2.764    3.169    4.143
  11    1.363    1.796    2.201    2.718    3.106    4.024
  12    1.356    1.782    2.179    2.681    3.055    3.929
  13    1.350    1.771    2.160    2.650    3.012    3.852
  14    1.345    1.761    2.145    2.624    2.977    3.787
  15    1.341    1.753    2.131    2.602    2.947    3.733
  16    1.337    1.746    2.120    2.583    2.921    3.686
  17    1.333    1.740    2.110    2.567    2.898    3.646
  18    1.330    1.734    2.101    2.552    2.878    3.610
  19    1.328    1.729    2.093    2.539    2.861    3.579
  20    1.325    1.725    2.086    2.528    2.845    3.552
  21    1.323    1.721    2.080    2.518    2.831    3.527
  22    1.321    1.717    2.074    2.508    2.819    3.505
  23    1.319    1.714    2.069    2.500    2.807    3.485
  24    1.318    1.711    2.064    2.492    2.797    3.467
  25    1.316    1.708    2.060    2.485    2.787    3.450
  26    1.315    1.706    2.056    2.479    2.779    3.435
  27    1.314    1.703    2.052    2.473    2.771    3.421
  28    1.313    1.701    2.048    2.467    2.763    3.408
  29    1.311    1.699    2.045    2.462    2.756    3.396
  30    1.310    1.697    2.042    2.457    2.750    3.385
  31    1.309    1.696    2.040    2.453    2.744    3.375
  32    1.309    1.694    2.037    2.449    2.738    3.365
  33    1.308    1.692    2.035    2.445    2.733    3.356
  34    1.307    1.691    2.032    2.441    2.728    3.348
  35    1.306    1.690    2.030    2.438    2.724    3.340
  36    1.306    1.688    2.028    2.434    2.719    3.333
  37    1.305    1.687    2.026    2.431    2.715    3.326
  38    1.304    1.686    2.024    2.429    2.712    3.319
  39    1.304    1.685    2.023    2.426    2.708    3.313
  40    1.303    1.684    2.021    2.423    2.704    3.307
  41    1.303    1.683    2.020    2.421    2.701    3.301
  42    1.302    1.682    2.018    2.418    2.698    3.296
  43    1.302    1.681    2.017    2.416    2.695    3.291
  44    1.301    1.680    2.015    2.414    2.692    3.286
  45    1.301    1.679    2.014    2.412    2.690    3.281
  46    1.300    1.679    2.013    2.410    2.687    3.277
  47    1.300    1.678    2.012    2.408    2.685    3.273
  48    1.299    1.677    2.011    2.407    2.682    3.269
  49    1.299    1.677    2.010    2.405    2.680    3.265
  50    1.299    1.676    2.009    2.403    2.678    3.261
  51    1.298    1.675    2.008    2.402    2.676    3.258
  52    1.298    1.675    2.007    2.400    2.674    3.255
  53    1.298    1.674    2.006    2.399    2.672    3.251
  54    1.297    1.674    2.005    2.397    2.670    3.248
  55    1.297    1.673    2.004    2.396    2.668    3.245
  56    1.297    1.673    2.003    2.395    2.667    3.242
  57    1.297    1.672    2.002    2.394    2.665    3.239
  58    1.296    1.672    2.002    2.392    2.663    3.237
  59    1.296    1.671    2.001    2.391    2.662    3.234
  60    1.296    1.671    2.000    2.390    2.660    3.232
  61    1.296    1.670    2.000    2.389    2.659    3.229
  62    1.295    1.670    1.999    2.388    2.657    3.227
  63    1.295    1.669    1.998    2.387    2.656    3.225
  64    1.295    1.669    1.998    2.386    2.655    3.223
  65    1.295    1.669    1.997    2.385    2.654    3.220
  66    1.295    1.668    1.997    2.384    2.652    3.218
  67    1.294    1.668    1.996    2.383    2.651    3.216
  68    1.294    1.668    1.995    2.382    2.650    3.214
  69    1.294    1.667    1.995    2.382    2.649    3.213
  70    1.294    1.667    1.994    2.381    2.648    3.211
  71    1.294    1.667    1.994    2.380    2.647    3.209
  72    1.293    1.666    1.993    2.379    2.646    3.207
  73    1.293    1.666    1.993    2.379    2.645    3.206
  74    1.293    1.666    1.993    2.378    2.644    3.204
  75    1.293    1.665    1.992    2.377    2.643    3.202
  76    1.293    1.665    1.992    2.376    2.642    3.201
  77    1.293    1.665    1.991    2.376    2.641    3.199
  78    1.292    1.665    1.991    2.375    2.640    3.198
  79    1.292    1.664    1.990    2.374    2.640    3.197
  80    1.292    1.664    1.990    2.374    2.639    3.195
  81    1.292    1.664    1.990    2.373    2.638    3.194
  82    1.292    1.664    1.989    2.373    2.637    3.193
  83    1.292    1.663    1.989    2.372    2.636    3.191
  84    1.292    1.663    1.989    2.372    2.636    3.190
  85    1.292    1.663    1.988    2.371    2.635    3.189
  86    1.291    1.663    1.988    2.370    2.634    3.188
  87    1.291    1.663    1.988    2.370    2.634    3.187
  88    1.291    1.662    1.987    2.369    2.633    3.185
  89    1.291    1.662    1.987    2.369    2.632    3.184
  90    1.291    1.662    1.987    2.368    2.632    3.183
  91    1.291    1.662    1.986    2.368    2.631    3.182
  92    1.291    1.662    1.986    2.368    2.630    3.181
  93    1.291    1.661    1.986    2.367    2.630    3.180
  94    1.291    1.661    1.986    2.367    2.629    3.179
  95    1.291    1.661    1.985    2.366    2.629    3.178
  96    1.290    1.661    1.985    2.366    2.628    3.177
  97    1.290    1.661    1.985    2.365    2.627    3.176
  98    1.290    1.661    1.984    2.365    2.627    3.175
  99    1.290    1.660    1.984    2.365    2.626    3.175
 100    1.290    1.660    1.984    2.364    2.626    3.174
   ∞    1.282    1.645    1.960    2.326    2.576    3.090
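Table entries can be spot-checked without any toolbox. The following sketch relies on the standard identity that, for $t > 0$, the upper-tail probability of the t distribution equals half the regularized incomplete beta function evaluated at $\nu/(\nu + t^2)$ with parameters $\nu/2$ and $1/2$. The handle names `tailProb` and `tCritical` are hypothetical helpers introduced here, and the root bracket is an assumption that holds for the tail areas listed in the table.

```matlab
% Upper-tail probability P(T > t) for t > 0 and nu degrees of freedom
tailProb = @(t, nu) 0.5 * betainc(nu ./ (nu + t.^2), nu/2, 0.5);

% Critical value: solve tailProb(t, nu) = p for t on a bracketing interval
% (the bracket [1e-6, 1e3] is assumed wide enough for p >= 0.001)
tCritical = @(p, nu) fzero(@(t) tailProb(t, nu) - p, [1e-6, 1e3]);

t5  = tCritical(0.025, 5);     % table entry: 2.571
t10 = tCritical(0.005, 10);    % table entry: 3.169
t30 = tCritical(0.05, 30);     % table entry: 1.697

fprintf('%.3f  %.3f  %.3f\n', t5, t10, t30);
```

If the Statistics and Machine Learning Toolbox is available, its `tinv` function provides the same values directly; the sketch above avoids that dependency on purpose.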

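Polynomial Regression

Second-order polynomial:

$$ y = a_0 + a_1 x + a_2 x^2 $$

Sum of the squares of the residuals:

$$ S_r = \sum \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)^2 $$

Take the derivative with respect to each coefficient:

$$ \frac{\partial S_r}{\partial a_0} = -2 \sum \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right) $$

$$ \frac{\partial S_r}{\partial a_1} = -2 \sum x_i \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right) $$

$$ \frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right) $$

Normal equations:

$$ n\, a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i $$

$$ \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i $$

$$ \left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i $$

For an mth-order polynomial, $y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m$, we have to solve $m+1$ simultaneous linear equations.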
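The pattern in the normal equations generalizes: entry $(j, k)$ of the coefficient matrix is $\sum x_i^{\,j+k-2}$ and entry $j$ of the right-hand side is $\sum x_i^{\,j-1} y_i$. The sketch below builds and solves that system for any order `m`; the data are those of the second-order example in this lecture, and the names `N` and `rhs` are chosen here.

```matlab
% Data from the second-order example in this lecture
x = [0 1 2 3 4 5]';
y = [2.1 7.7 13.6 27.2 40.9 61.1]';
m = 2;                                  % polynomial order

% Assemble the (m+1)-by-(m+1) normal equations from power sums
N   = zeros(m + 1);
rhs = zeros(m + 1, 1);
for j = 1:m+1
    for k = 1:m+1
        N(j, k) = sum(x.^(j + k - 2));  % N(1,1) = n because x.^0 is all ones
    end
    rhs(j) = sum(x.^(j - 1) .* y);
end

a = N \ rhs;                            % a = [a0; a1; ...; am], ascending powers
fprintf('a0 = %.5f, a1 = %.5f, a2 = %.5f\n', a(1), a(2), a(3));
% prints: a0 = 2.47857, a1 = 2.35929, a2 = 1.86071
```

Note that `a` is in ascending powers, the reverse of the ordering returned by `polyfit`. For high orders the power-sum matrix becomes ill-conditioned, which is one reason QR-based solvers are usually preferred in practice.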

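MATLAB polyfit Function

For a second-order polynomial fitted to $m$ data points, we can define

$$ A = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_m^2 & x_m & 1 \end{bmatrix}, \qquad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \qquad C = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} $$

and show that the least-squares solution is

Fit norm (normal equations): $C = (A^{T} A)^{-1} A^{T} Y$

Fit QR (MATLAB backslash, since $A$ is not square and has no ordinary inverse): C = A\Y

>> C = polyfit(x, y, n)
>> [C, S] = polyfit(x, y, n)

x = independent variable
y = dependent variable
n = degree of polynomial
C = coefficients of the polynomial in descending powers
S = data structure for the polyval function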
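The matrix form and `polyfit` should give the same coefficients. The following sketch compares the three routes on the data of the second-order example in this lecture; the names `C_norm`, `C_qr`, and `C_polyfit` are introduced here for illustration.

```matlab
% Data from the second-order example in this lecture (column vectors)
x = [0 1 2 3 4 5]';
y = [2.1 7.7 13.6 27.2 40.9 61.1]';

% Design matrix with columns x^2, x, 1 (descending powers, like polyfit)
A = [x.^2, x, ones(size(x))];

C_norm    = (A' * A) \ (A' * y);   % normal equations ("Fit norm")
C_qr      = A \ y;                 % backslash least squares ("Fit QR")
C_polyfit = polyfit(x, y, 2)';     % built-in fit, transposed to a column

disp([C_norm, C_qr, C_polyfit]);   % three columns that should agree
```

On well-conditioned data like this the three columns agree to many digits; the routes differ mainly in how they behave when `A` is nearly rank deficient.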

Example: Fit a second-order polynomial to the data.

   x_i     y_i     (y_i - ȳ)^2     (y_i - a0 - a1 x_i - a2 x_i^2)^2
    0      2.1       544.44            0.14332
    1      7.7       314.47            1.00286
    2     13.6       140.03            1.08158
    3     27.2         3.12            0.80491
    4     40.9       239.22            0.61951
    5     61.1      1272.11            0.09439
  ------------------------------------------------------------
  Σ 15   152.6      2513.39            3.74657

From the given data:

  m = 2        Σ x_i = 15         Σ x_i^4 = 979
  n = 6        Σ y_i = 152.6      Σ x_i y_i = 585.6
  x̄ = 2.5      Σ x_i^2 = 55       Σ x_i^2 y_i = 2488.8
  ȳ = 25.433   Σ x_i^3 = 225

Simultaneous linear equations:

$$ \begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{bmatrix} $$

Solving these equations gives $a_0 = 2.47857$, $a_1 = 2.35929$, and $a_2 = 1.86071$.

Least-squares quadratic equation:

$$ y = 2.47857 + 2.35929x + 1.86071x^2 $$

Coefficient of determination:

$$ r = \sqrt{\frac{2513.39 - 3.74657}{2513.39}} = \sqrt{0.99851} = 0.99925 $$

Solving by MATLAB polyfit Function


>> x = [0 1 2 3 4 5];
>> y = [2.1 7.7 13.6 27.2 40.9 61.1];
>> c = polyfit(x, y, 2)
>> [c, s] = polyfit(x, y, 2)
>> st = sum((y - mean(y)).^2)
>> sr = sum((y - polyval(c, x)).^2)
>> r = sqrt((st - sr) / st)

MATLAB polyval Function

Evaluate a polynomial at the points defined by the input vector:
>> y = polyval(c, x)
where x = input vector
      y = value of the polynomial evaluated at x
      c = vector of coefficients in descending powers
y = c(1)*x^n + c(2)*x^(n-1) + ... + c(n)*x + c(n+1)
Example: y = 1.86071x^2 + 2.35929x + 2.47857
>> c = [1.86071 2.35929 2.47857]
Polynomial Interpolation

[Figure: data points (circles) and the fitted second-order polynomial; the y axis runs from 0 to 70]

>> y2 = polyval(c,x)
>> plot(x, y, 'o', x, y2)

Error Bounds
Error bounds are obtained by passing the optional second output of polyfit as an input to polyval:

>> [c,s] = polyfit(x,y,2)
>> [y2,delta] = polyval(c,x,s)
>> plot(x,y,'o',x,y2,'g-',x,y2+2*delta,'r:',x,y2-2*delta,'r:')

Here delta estimates the standard error of prediction at each x, so an interval of ±2·delta corresponds to roughly a 95% confidence interval.

Linear Regression Example:

  x_i:   1     2     3     4     5     6     7
  y_i:   0.5   2.5   2.0   4.0   3.5   6.0   5.5

>> x = [1 2 3 4 5 6 7];
>> y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
>> [c,s] = polyfit(x,y,1)
>> [y2,delta] = polyval(c,x,s)
>> plot(x,y,'o',x,y2,'g-',x,y2+2*delta,'r:',x,y2-2*delta,'r:')

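Multiple Linear Regression

$$ y = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_p x_p $$

Example case with two independent variables: $y = c_0 + c_1 x_1 + c_2 x_2$

Sum of the squares of the residuals:

$$ S_r = \sum \left( y_i - c_0 - c_1 x_{1i} - c_2 x_{2i} \right)^2 $$

Differentiate with respect to the unknowns:

$$ \frac{\partial S_r}{\partial c_0} = -2 \sum \left( y_i - c_0 - c_1 x_{1i} - c_2 x_{2i} \right) $$

$$ \frac{\partial S_r}{\partial c_1} = -2 \sum x_{1i} \left( y_i - c_0 - c_1 x_{1i} - c_2 x_{2i} \right) $$

$$ \frac{\partial S_r}{\partial c_2} = -2 \sum x_{2i} \left( y_i - c_0 - c_1 x_{1i} - c_2 x_{2i} \right) $$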

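Setting the partial derivatives to 0 and expressing the result in matrix form:

$$ \begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{bmatrix} $$

Example:

   x1     x2     y
   0      0      5
   2      1     10
   2.5    2      9
   1      3      0
   4      6      3
   7      2     27

$$ \begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 54 \\ 243.5 \\ 100 \end{bmatrix} $$

Solving gives $c_0 = 5$, $c_1 = 4$, $c_2 = -3$.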
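The coefficient matrix of the normal equations is exactly $A^{T}A$ for a design matrix $A$ whose columns are a column of ones, $x_1$, and $x_2$. The sketch below confirms this for the example data; the column ordering (ones first) is a choice made here so that the solution comes out as $[c_0; c_1; c_2]$.

```matlab
% Example data from this lecture (column vectors)
x1 = [0 2 2.5 1 4 7]';
x2 = [0 1 2 3 6 2]';
y  = [5 10 9 0 3 27]';

% Design matrix: the columns multiply c0, c1, c2 respectively
A = [ones(size(x1)), x1, x2];

N   = A' * A;     % should equal [6 16.5 14; 16.5 76.25 48; 14 48 54]
rhs = A' * y;     % should equal [54; 243.5; 100]
c   = N \ rhs;    % c = [c0; c1; c2]

disp(N); disp(rhs'); disp(c');
```

The negative sign of the last coefficient is easy to confirm by substitution: for the fourth data point, 5 + 4(1) - 3(3) = 0, which matches the tabulated y.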

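Multivariate Fit in MATLAB

$$ c_0 + c_1 x_{11} + c_2 x_{12} + \dots + c_p x_{1p} = y_1 $$
$$ c_0 + c_1 x_{21} + c_2 x_{22} + \dots + c_p x_{2p} = y_2 $$
$$ \vdots $$
$$ c_0 + c_1 x_{m1} + c_2 x_{m2} + \dots + c_p x_{mp} = y_m $$

Overdetermined system of equations: $A c = y$

$$ A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} & 1 \\ x_{21} & x_{22} & \cdots & x_{2p} & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mp} & 1 \end{bmatrix}, \qquad c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \\ c_0 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} $$

Because the column of ones is placed last in $A$, the constant $c_0$ is the last entry of $c$.

Fit norm

>> c = (A'*A)\(A'*y)

Fit QR

>> c = A\y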

Example:

   x1     x2     y
   0      0      5
   2      1     10
   2.5    2      9
   1      3      0
   4      6      3
   7      2     27

>> x1=[0 2 2.5 1 4 7]';
>> x2=[0 1 2 3 6 2]';
>> y=[5 10 9 0 3 27]';
>> A=[x1 x2 ones(size(x1))];   % columns: x1, x2, constant term
>> c=A\y                       % c = [c1; c2; c0] = [4; -3; 5]
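The quality measures used for the straight-line fit carry over to the multivariate case. As a sketch (variable names chosen here), the residual sum of squares and the coefficient of determination for this example can be computed as follows; since the data lie exactly on a plane, the fit should be perfect up to rounding.

```matlab
% Example data from this lecture (column vectors)
x1 = [0 2 2.5 1 4 7]';
x2 = [0 1 2 3 6 2]';
y  = [5 10 9 0 3 27]';

A = [x1 x2 ones(size(x1))];      % columns: x1, x2, constant term
c = A \ y;                       % c = [c1; c2; c0]

residual = y - A * c;            % model error at each data point
Sr = sum(residual.^2);           % sum of squares of the residuals
St = sum((y - mean(y)).^2);      % sum of squares around the mean
r2 = (St - Sr) / St;             % coefficient of determination

fprintf('Sr = %.2e, St = %.1f, r2 = %.6f\n', Sr, St, r2);
```

Here `Sr` is zero to rounding error and `r2` equals 1, the perfect-fit case described earlier in the lecture; with measured data one would expect `Sr > 0`.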