
EE290H F05 Spanos

Regression Analysis

Simple Regression
Multivariate Regression
Stepwise Regression
Replication and Prediction Error

Lecture 8: Regression Analysis 1



Regression Analysis

• In general, we "fit" a model by minimizing a metric that represents the error:
$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

• The sum of squares gives closed-form solutions and, for linear models, minimum-variance linear unbiased estimates.
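A minimal Python sketch of the "closed form" point, using the line-through-origin model from the next slide and made-up data: the closed-form least-squares slope agrees with what a brute-force search over the sum of squares finds. The function names and the data are hypothetical, chosen only for illustration.

```python
def sse(y, y_hat):
    """Sum of squared errors between observations y and model values y_hat."""
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))


def closed_form_slope(x, y):
    """Least-squares slope for the model y = b*x (line through the origin)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def grid_search_slope(x, y, lo, hi, steps):
    """Brute-force minimization of the SSE over a grid of slopes (for comparison only)."""
    best_b, best_sse = lo, float("inf")
    for k in range(steps + 1):
        b = lo + (hi - lo) * k / steps
        s = sse(y, [b * xi for xi in x])
        if s < best_sse:
            best_b, best_sse = b, s
    return best_b


# Hypothetical data, roughly y = 2x plus noise
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
print(closed_form_slope(x, y))                  # one formula, no search
print(grid_search_slope(x, y, 0.0, 4.0, 4000))  # agrees to the grid resolution
```

The grid search is only a cross-check; the point of least squares is that the minimizer is available directly from a formula.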

The Simplest Regression Model

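Line through the origin: $\hat{y} = bx$
$$y_u = \beta x_u + \varepsilon_u, \qquad u = 1, 2, \ldots, n, \qquad \varepsilon_u \sim N(0, \sigma_R^2)$$
$$\min S = \min \sum_{u=1}^{n} (y_u - \beta x_u)^2$$
The minimized sum, $S_R$, is used to estimate $\sigma_R^2$.
$\eta_u = \beta x_u$: the true value of the model
b: estimate of β
$\hat{y}$: estimate of $\eta_u$, the true value of the model.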


Using the Normal Equation to fit the “line through the origin” model

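Our model has only one degree of freedom. This is why the fitted values are confined to the model line $\hat{y} = bx$, on which we pick the point that minimizes the error:
$$\min \sum (y - \hat{y})^2$$
(Figure: data vector y in the (y1, y2) plane and the model line ŷ = bx, which has 1 d.f.)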


Using the Normal Equation (cont): fitting the “line through the origin” model

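Choose b so that the residual vector is perpendicular to the model vector:
$$\sum (y - \hat{y})\,x = 0 \;\Rightarrow\; \sum (y - bx)\,x = 0 \;\Rightarrow\; b = \frac{\sum xy}{\sum x^2} \quad (\text{est. of } \beta)$$
$$s^2 = \frac{S_R}{n - 1} \quad (\text{est. of } \sigma_R^2)$$
$$V(b) = \frac{s^2}{\sum x^2}, \qquad \text{67\% conf.: } b \pm \sqrt{\frac{s^2}{\sum x^2}}$$
Significance test:
$$t = \frac{b - \beta^*}{\sqrt{s^2 / \sum x^2}} \sim t_{n-1}$$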
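The formulas above translate directly into code. A minimal Python sketch, assuming plain lists of floats; the function name and sample data are hypothetical, and the t statistic is returned without a p-value (that would be looked up in a t table with n − 1 degrees of freedom).

```python
import math


def fit_through_origin(x, y, beta_star=0.0):
    """Least-squares fit of y = b*x with the statistics from the slide.

    beta_star is the hypothesized value of beta used in the t test
    (0.0 tests whether any slope is needed at all).
    """
    n = len(x)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    b = sum_xy / sum_xx                                     # estimate of beta
    s_r = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))   # residual sum of squares S_R
    s2 = s_r / (n - 1)                                      # estimate of sigma_R^2 (1 parameter fitted)
    var_b = s2 / sum_xx                                     # V(b)
    se_b = math.sqrt(var_b)
    return {
        "b": b,
        "s2": s2,
        "var_b": var_b,
        "conf67": (b - se_b, b + se_b),                     # b +/- one standard error
        "t": (b - beta_star) / se_b,                        # compare against t with n - 1 d.f.
    }


stats = fit_through_origin([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print(stats["b"], stats["t"])
```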

Etch time vs. removed material: y = bx


(Plot: Removed (nm) vs. Etch Time (sec × 10^3), with the fitted line y = bx)

Variable Name     Coefficient Estimate   Std. Err.   t Statistic   Prob > t
Etch Time (sec)   0.501                  0.0162      30.9          0.000



Model Validation through ANOVA

The idea is to decompose the sum of squares into orthogonal components.
Assuming that there is no need for a model at all* (always a good null hypothesis!):
$$H_0: \beta^* = 0$$
$$\underbrace{\sum y_u^2}_{\text{total},\ n} = \underbrace{\sum \hat{y}_u^2}_{\text{model},\ p} + \underbrace{\sum (y_u - \hat{y}_u)^2}_{\text{residual},\ n-p}$$
* This is equivalent to saying that y ~ N(μ, σ²), where μ and σ are constants, independent of x.
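A short Python sketch of this decomposition for the line through the origin (hypothetical function name and data). Because the residual vector is orthogonal to the fitted vector, the three sums of squares add up exactly.

```python
def anova_through_origin(x, y):
    """Uncorrected ANOVA decomposition for the model y = b*x (p = 1).

    Returns the total, model and residual sums of squares and the F ratio
    for H0: beta* = 0 (no model at all).
    """
    n, p = len(x), 1
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    y_hat = [b * xi for xi in x]
    ss_total = sum(yi ** 2 for yi in y)                           # n d.f.
    ss_model = sum(yh ** 2 for yh in y_hat)                       # p d.f.
    ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # n - p d.f.
    f_ratio = (ss_model / p) / (ss_resid / (n - p))
    return ss_total, ss_model, ss_resid, f_ratio


total, model, resid, f = anova_through_origin([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print(total, model + resid, f)
```

With a single parameter, the F ratio computed this way equals the square of the t statistic for β* = 0.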


Model Validation through ANOVA (cont)

Assuming a specific model:
$$H_0: \beta = \beta^* \quad (\text{a specific hypothesized value})$$
$$\underbrace{\sum (y_u - \beta^* x_u)^2}_{\text{total},\ n} = \underbrace{\sum (\hat{y}_u - \beta^* x_u)^2}_{\text{model},\ p} + \underbrace{\sum (y_u - \hat{y}_u)^2}_{\text{residual},\ n-p}$$
The ANOVA table will answer the question: Is there a relationship between x and y?

ANOVA table and Residual Plot


Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    1.83e+5          1                 1.83e+5        1.98e+2   2.17e-6
Error    6.47e+3          7                 9.24e+2
Total    1.89e+5          8

(Plot: residuals vs. Etch Time (sec × 10^3))

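A More Complex Regression Equation: a straight line with two parameters

Actual: $\eta = \alpha + \beta(x - \bar{x})$, with $y_i \sim N(\eta_i, \sigma^2)$
Estimated: $\hat{y} = a + b(x - \bar{x})$

Minimize $R = \sum (y_i - \hat{y}_i)^2$ to estimate α and β:
$$a = \bar{y}, \qquad b = \frac{\sum (x_i - \bar{x})\,y_i}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
Are a and b good estimators of α and β?
$$E[a] = \alpha, \qquad E[b] = \frac{\sum (x_i - \bar{x})\,E[y_i]}{\sum (x_i - \bar{x})^2} = \beta$$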
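The two forms of b agree because Σ(x_i − x̄)ȳ = 0. A minimal Python sketch (hypothetical names and data) that computes a and b for the centered model and lets the two forms be compared:

```python
def fit_centered_line(x, y):
    """Least-squares fit of y = a + b*(x - x_bar); returns (a, b, x_bar)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a = y_bar                                                   # intercept at x = x_bar
    b = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx   # first form of b
    return a, b, x_bar


def slope_second_form(x, y):
    """The equivalent form sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return num / sum((xi - x_bar) ** 2 for xi in x)


# Hypothetical data lying exactly on y = 3 + 2(x - 3)
print(fit_centered_line([1.0, 2.0, 3.0, 4.0, 5.0], [-1.0, 1.0, 3.0, 5.0, 7.0]))
```

Centering x makes a and b uncorrelated, which is what allows the variances of the two estimates to be treated separately.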


Variance Estimation:

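Note that all variability comes from $y_i$! (Here k is the number of observations.)
$$V[a] = V\!\left[\frac{\sum y_i}{k}\right] = \frac{1}{k^2}\sum V[y_i] = \frac{\sigma^2}{k}$$
$$V[b] = V\!\left[\frac{\sum (x_i - \bar{x})\,y_i}{\sum (x_i - \bar{x})^2}\right] = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$
Minimum variance, thanks to least squares!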
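These results can be checked by simulation. A minimal Python sketch, assuming independent normal errors and made-up true values α, β and σ (all hypothetical); with many trials the sample means of a and b should approach α and β, and their sample variances should approach σ²/k and σ²/Σ(x_i − x̄)².

```python
import random


def _variance(values):
    """Population variance of a list of floats."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def simulate_line_estimates(x, alpha, beta, sigma, trials=20000, seed=1):
    """Monte Carlo check of E[a], E[b], V[a], V[b] for y = alpha + beta*(x - x_bar) + noise.

    Assumes independent N(0, sigma^2) errors, matching y_i ~ N(eta_i, sigma^2).
    """
    rng = random.Random(seed)
    k = len(x)
    x_bar = sum(x) / k
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a_vals, b_vals = [], []
    for _ in range(trials):
        y = [alpha + beta * (xi - x_bar) + rng.gauss(0.0, sigma) for xi in x]
        a_vals.append(sum(y) / k)
        b_vals.append(sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx)
    return {
        "mean_a": sum(a_vals) / trials,
        "mean_b": sum(b_vals) / trials,
        "var_a": _variance(a_vals), "var_a_theory": sigma ** 2 / k,
        "var_b": _variance(b_vals), "var_b_theory": sigma ** 2 / s_xx,
    }


r = simulate_line_estimates([1.0, 2.0, 3.0, 4.0, 5.0], alpha=3.0, beta=2.0, sigma=0.5)
print(r)
```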


LTO thickness vs deposition time: y = a + bx
(Plot: LTO thickness (Å × 10^3) vs. Dep time (× 10^3))

Variable Name   Coefficient Estimate   Std. Err.   t Statistic   Prob > t
Constant        6.04e+1                5.61e+1     1.08e+0       0.030
Dep time        9.75e-1                2.52e-2     3.87e+1       0.000

ANOVA table and Residual Plot

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    4.77e+6          1                 4.77e+6        1.50e+3   0.000
Error    5.09e+4          16                3.18e+3
Total    4.82e+6          17

(Plot: residuals vs. Dep time (× 10^3))

ANOVA Representation
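(Figure: for a data point $(x_i, y_i)$, the deviation $y_i - \eta_i$ from the true line $\eta_i = \alpha + \beta(x_i - \bar{x})$ splits into the intercept error $(a - \alpha)$, the difference between the slope terms $b(x_i - \bar{x})$ and $\beta(x_i - \bar{x})$, and the residual $y_i - \hat{y}_i$ from the fitted line $\hat{y}_i = a + b(x_i - \bar{x})$.)

Note the differences between the "true" and the "estimated" model.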



ANOVA Representation (cont)

$$(y_i - \eta_i) = (a - \alpha) + (b - \beta)(x_i - \bar{x}) + (y_i - \hat{y}_i)$$
$$\sum (y_i - \eta_i)^2 = k(a - \alpha)^2 + (b - \beta)^2 \sum (x_i - \bar{x})^2 + \sum (y_i - \hat{y}_i)^2$$
Degrees of freedom: (k) = (1) + (1) + (k − 2)
Distributions, term by term: $\sigma^2\chi^2(k)$, $\sigma^2\chi^2(1)$, $\sigma^2\chi^2(1)$, $\sigma^2\chi^2(k-2)$

In this way, the significance of the model can be analyzed in detail.
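The identity can be verified numerically. A minimal Python sketch with hypothetical data and hypothetical "true" values α and β (in practice these are unknown); the cross terms vanish because the residuals sum to zero and are orthogonal to (x_i − x̄).

```python
def deviation_decomposition(x, y, alpha, beta):
    """Split sum (y_i - eta_i)^2 into intercept, slope and residual parts.

    alpha and beta are the (normally unknown) true parameters of
    eta_i = alpha + beta*(x_i - x_bar); they are supplied here so the identity can be checked.
    """
    k = len(x)
    x_bar = sum(x) / k
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a = sum(y) / k
    b = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx
    total = sum((yi - (alpha + beta * (xi - x_bar))) ** 2 for xi, yi in zip(x, y))  # k d.f.
    intercept_part = k * (a - alpha) ** 2                                          # 1 d.f.
    slope_part = (b - beta) ** 2 * s_xx                                            # 1 d.f.
    residual_part = sum((yi - (a + b * (xi - x_bar))) ** 2 for xi, yi in zip(x, y))  # k - 2 d.f.
    return total, intercept_part, slope_part, residual_part


parts = deviation_decomposition([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.9, 4.2, 4.8, 6.1],
                                alpha=4.0, beta=1.0)
print(parts)
```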


Confidence Limits of an Estimate

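$$\hat{y}_0 = \bar{y} + b(x_0 - \bar{x})$$
$$V(\hat{y}_0) = V(\bar{y}) + (x_0 - \bar{x})^2\,V(b)$$
$$V(\hat{y}_0) = \left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x - \bar{x})^2}\right] s^2$$
prediction interval: $\hat{y}_0 \pm t_{\alpha/2}\sqrt{V(\hat{y}_0)}$, with n − 2 degrees of freedom for t.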
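A minimal Python sketch of these formulas (hypothetical names and data). The t quantile is passed in rather than computed, to keep the example within the standard library. V(ŷ0) here is the variance of the estimated model value; for a single new observation at x0 one would add s² to it.

```python
import math


def predict_with_interval(x, y, x0, t_crit):
    """Predicted value at x0 for y = a + b(x - x_bar), with V(y0) and the interval.

    t_crit is the two-sided t value with n - 2 d.f., supplied by the caller
    (for example about 2.120 for 95% with 16 d.f., from a t table).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    s2 = sum((yi - (y_bar + b * (xi - x_bar))) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y0 = y_bar + b * (x0 - x_bar)
    var_y0 = (1.0 / n + (x0 - x_bar) ** 2 / s_xx) * s2   # grows with distance from x_bar
    half_width = t_crit * math.sqrt(var_y0)
    return y0, var_y0, (y0 - half_width, y0 + half_width)


xs, ys = [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.9, 4.2, 4.8, 6.1]
print(predict_with_interval(xs, ys, 3.0, 3.182))   # at x_bar; 3.182 = t(0.025, 3 d.f.)
print(predict_with_interval(xs, ys, 5.0, 3.182))   # at the edge: wider interval
```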


Confidence Interval of Prediction (all points)


(Leverage plot: LTO Thickness vs. Dep time, fitted line with its confidence band)

Confidence Interval of Prediction (half the points)


(Leverage plot: LTO Thickness vs. Dep time, fitted line with its confidence band)

Confidence Interval of Prediction (1/4 of points)


(Leverage plot: LTO Thickness vs. Dep time, fitted line with its confidence band)

Prediction Error vs Experimental Error

(Figure: true model and estimated model vs. x, marking the experimental error and the prediction error)

• Experimental error does not depend on location or sample size.
• Prediction error depends on location and gets smaller as the sample size increases.
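The contrast can be made concrete. The experimental error σ² is a property of the process, while the variance of the model's prediction is σ² times a factor that depends on x0 and on the design. A minimal Python sketch of that factor for a straight-line fit, mimicking the all / half / quarter-of-the-points comparison on the preceding plots with hypothetical, evenly spaced deposition times:

```python
def prediction_variance_factor(x, x0):
    """V(y0_hat) / sigma^2 = 1/n + (x0 - x_bar)^2 / sum((x - x_bar)^2) for a straight-line fit."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return 1.0 / n + (x0 - x_bar) ** 2 / s_xx


# Hypothetical design: dep times 1000..3000 s, then every 2nd and every 4th point
all_pts = [1000.0 + 100.0 * i for i in range(21)]
for label, pts in [("all", all_pts), ("half", all_pts[::2]), ("quarter", all_pts[::4])]:
    # factor at the center of the data and at its edge
    print(label, prediction_variance_factor(pts, 2000.0), prediction_variance_factor(pts, 3000.0))
```

The factor is smallest at x̄ and shrinks as points are added, while σ² itself is unchanged.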

Multivariate Regression

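$$\eta = \beta_1 x_1 + \beta_2 x_2$$
(Figure: y, its fit in the plane spanned by x1 and x2 with components β1 and β2, and the residual R.) The residual is perpendicular to ŷ, x1, and x2.

Coefficient estimation:
$$\sum (y - \hat{y})\,x_1 = 0, \qquad \sum (y - \hat{y})\,x_2 = 0$$
$$\sum y x_1 - b_1 \sum x_1^2 - b_2 \sum x_1 x_2 = 0$$
$$\sum y x_2 - b_2 \sum x_2^2 - b_1 \sum x_1 x_2 = 0$$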
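For two regressors the normal equations are a 2 × 2 linear system. A minimal Python sketch (hypothetical function name and data) that solves them by Cramer's rule; like the model above, it has no intercept term.

```python
def fit_two_regressors(x1, x2, y):
    """Solve the two normal equations for y = b1*x1 + b2*x2 (no intercept) by Cramer's rule."""
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * c for a, c in zip(x1, x2))
    s1y = sum(a * v for a, v in zip(x1, y))
    s2y = sum(a * v for a, v in zip(x2, y))
    det = s11 * s22 - s12 * s12          # zero if x1 and x2 are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Hypothetical data generated exactly from y = 2*x1 + 3*x2
print(fit_two_regressors([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 2.0, 1.0], [5.0, 4.0, 12.0, 11.0]))
```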

Variance Estimation:

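$$s^2 = \frac{S_R}{n - p}$$
$$V(b_1) = \frac{1}{1 - \rho^2}\,\frac{s^2}{\sum x_1^2}, \qquad V(b_2) = \frac{1}{1 - \rho^2}\,\frac{s^2}{\sum x_2^2}$$
$$\rho = \frac{-\sum x_1 x_2}{\sqrt{\sum x_1^2 \sum x_2^2}}$$
where ρ is the correlation between the estimates b1 and b2.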
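A minimal Python sketch of these expressions (hypothetical names; s² is assumed to be already computed from the residuals). It also returns the diagonal of (XᵀX)⁻¹s² for the 2 × 2 case, the general matrix form of V(b) given later in the lecture, so the two can be compared.

```python
import math


def two_regressor_variances(x1, x2, s2):
    """V(b1), V(b2) and rho for y = b1*x1 + b2*x2, given s2 = S_R / (n - p)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * c for a, c in zip(x1, x2))
    rho = -s12 / math.sqrt(s11 * s22)               # correlation between b1 and b2
    det = s11 * s22 - s12 ** 2                      # determinant of X^T X
    return {
        "rho": rho,
        "var_b1": s2 / ((1.0 - rho ** 2) * s11),
        "var_b2": s2 / ((1.0 - rho ** 2) * s22),
        "var_b1_matrix": s22 / det * s2,            # diagonal of (X^T X)^{-1} s2
        "var_b2_matrix": s11 / det * s2,
    }


print(two_regressor_variances([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 2.0, 1.0], s2=0.5))
```

When Σx1x2 = 0 (an orthogonal design), ρ = 0 and the variances reduce to s²/Σx1² and s²/Σx2².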


Thickness vs time, temp: y = a + b1 x1 + b2 x2

Variable Name   Coefficient Estimate   Std. Err.   t Statistic   Prob > t
Constant        -7.04e+2               7.18e+1     -9.80e+0      0.000
temp            7.14e-1                7.00e-2     1.02e+1       0.000
time min        8.69e-1                3.89e-2     2.23e+1       0.000

ANOVA table and Correlation of Estimates

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    2.58e+4          2                 1.29e+4        3.01e+2   0.000
Error    7.71e+2          18                4.28e+1
Total    2.66e+4          20

Data File: regression

           Tox     Temp    Time
tox nm     1.000   0.410   0.896
temp       0.410   1.000   0.000
time min   0.896   0.000   1.000

Multiple Regression in General

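$$\mathbf{y} = X\mathbf{b} + \mathbf{e}, \qquad X = [\,x_1\ x_2\ \cdots\ x_p\,]$$
$$\min \lVert X\mathbf{b} - \mathbf{y}\rVert^2 = \lVert \mathbf{e}\rVert^2 = (\mathbf{y} - X\mathbf{b})^T(\mathbf{y} - X\mathbf{b})$$
or, equivalently, $\min\,(-\mathbf{e}^T X\mathbf{b} + \mathbf{e}^T\mathbf{y})$. At the minimum the residual is orthogonal to every column of X, $(\mathbf{y} - X\mathbf{b})^T X = 0$ (and therefore $(\mathbf{y} - X\mathbf{b})^T X\mathbf{b} = 0$), which gives the normal equations:
$$X^T X\mathbf{b} = X^T\mathbf{y}$$
$$\mathbf{b} = (X^T X)^{-1} X^T\mathbf{y}, \qquad V(\mathbf{b}) = (X^T X)^{-1}\sigma^2$$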
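The matrix form applies to any number of regressors, including polynomial terms such as the x² column used later for the deposition-rate model. A minimal standard-library Python sketch that forms XᵀX and Xᵀy and solves the normal equations by Gauss-Jordan elimination (helper names and data are hypothetical). For real work, a numerical library's least-squares routine, which avoids forming XᵀX explicitly, is usually preferable.

```python
def solve_linear(a, b):
    """Solve a @ x = b for a small dense system by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix (copies the input)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(X, y):
    """b = (X^T X)^{-1} X^T y, computed by solving the normal equations X^T X b = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)


# Hypothetical quadratic example: columns 1, x, x^2 (as in y = a + bx + cx^2)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, x, x * x] for x in xs]
y = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
print(least_squares(X, y))
```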


Joint Confidence Region for β1, β2

$$S = S_R\left[1 + \frac{p}{n - p}\,F_\alpha(p,\, n - p)\right]$$
$$(\beta_1 - b_1)^2 \sum x_1^2 + 2(\beta_1 - b_1)(\beta_2 - b_2)\sum x_1 x_2 + (\beta_2 - b_2)^2 \sum x_2^2 = S - S_R$$
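A minimal Python sketch that tests whether a candidate (β1, β2) lies inside this ellipse. The F quantile is supplied by the caller from a table (e.g. F_0.05(2, 2) = 19.0); the function name and data are hypothetical.

```python
def in_joint_region(beta1, beta2, b1, b2, x1, x2, s_r, n, p, f_crit):
    """True if (beta1, beta2) lies inside the joint confidence region around (b1, b2).

    f_crit is F_alpha(p, n - p), looked up by the caller; s_r is the residual sum of squares S_R.
    """
    s = s_r * (1.0 + p / (n - p) * f_crit)          # contour level S
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * c for a, c in zip(x1, x2))
    d1, d2 = beta1 - b1, beta2 - b2
    quad = d1 * d1 * s11 + 2.0 * d1 * d2 * s12 + d2 * d2 * s22
    return quad <= s - s_r


x1, x2 = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 2.0, 1.0]
print(in_joint_region(2.5, 3.0, 2.0, 3.0, x1, x2, s_r=0.4, n=4, p=2, f_crit=19.0))
```

Because of the cross term, the region is a tilted ellipse whenever Σx1x2 ≠ 0, so separate intervals for β1 and β2 can misrepresent the joint uncertainty.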



What if a “linear” model is not enough?


(Plot: dep rate vs. inlet temp)

Variable Name   Coefficient Estimate   Std. Err.   t Statistic   Prob > t
Constant        -1.85e+3               4.64e+1     -3.99e+1      0.000
inlet temp      3.24e+0                7.46e-2     4.35e+1       0.000

ANOVA table and Residual Plot


Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    3.65e+4          1                 3.65e+4        1.89e+3   0.000
Error    4.06e+2          21                1.93e+1
Total    3.69e+4          22

(Plot: residuals vs. inlet temp)

Multiple Regression with Replication

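For duplicated runs:
$$S_E = \frac{1}{2}\sum (y_{i1} - y_{i2})^2, \qquad S_{LF} = S_R - S_E$$
$$\sum_{i=1}^{k}\sum_{v=1}^{n_i} (y_{iv} - \eta_i)^2 = (a - \alpha)^2 \sum_{i=1}^{k} n_i + (b - \beta)^2 \sum_{i=1}^{k} n_i (x_i - \bar{x})^2 + \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \hat{y}_i)^2 + \sum_{i=1}^{k}\sum_{v=1}^{n_i} (y_{iv} - \bar{y}_{i\cdot})^2$$
Degrees of freedom: $\sum_{i=1}^{k} n_i = 1 + 1 + (k - 2) + \left(\sum_{i=1}^{k} n_i - k\right)$
$$\sum_{i=1}^{k}\sum_{v=1}^{n_i} (y_{iv} - \bar{y})^2 = \sum_{i=1}^{k}\sum_{v=1}^{n_i} (y_{iv} - \bar{y}_{i\cdot})^2 + \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \hat{y}_i)^2 + \sum_{i=1}^{k} n_i (\hat{y}_i - \bar{y})^2$$
i.e., total = pure error + lack of fit + regression.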
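A minimal Python sketch of the pure-error / lack-of-fit split (hypothetical names). For duplicates, the within-group sum of squares reduces to the ½Σ(y_i1 − y_i2)² form above.

```python
def pure_error(groups):
    """Pure-error sum of squares S_E and its d.f. from replicate groups (one list of responses per x)."""
    s_e, df_e = 0.0, 0
    for reps in groups:
        mean = sum(reps) / len(reps)
        s_e += sum((r - mean) ** 2 for r in reps)   # spread around the group mean
        df_e += len(reps) - 1
    return s_e, df_e


def lack_of_fit_test(s_r, df_r, s_e, df_e):
    """Split the residual S_R into lack of fit and pure error; return S_LF, its d.f. and the F ratio."""
    s_lf = s_r - s_e
    df_lf = df_r - df_e
    f_ratio = (s_lf / df_lf) / (s_e / df_e)
    return s_lf, df_lf, f_ratio


print(lack_of_fit_test(154.73, 20, 4.49, 4))
```

With the quadratic-fit numbers from the second lack-of-fit example in this lecture (S_R = 154.73 on 20 d.f., S_E = 4.49 on 4 d.f.), this gives S_LF = 150.24 on 16 d.f. and F ≈ 8.37, matching that table.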


Pure Error vs. Lack of Fit Example

Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit   17   401.01           23.59         21.04     0.005
Pure Error    4    4.49             1.12
Total Error   21   405.50

Parameter Estimates
Term         Estimate   Std Error   t Ratio   Prob>|t|
Intercept    -1850.16   46.42       -39.85    0.000
inlet temp   3.24       0.07        43.47     0.000

Model Test
Source       DF   Sum of Squares   F Ratio   Prob > F
inlet temp   1    36489.55         999.99    0.000


Dep. rate vs temperature: y = a + bx + cx²

(Plot: dep rate vs. inlet temp)

Variable Name   Coefficient Estimate   Std. Err.   t Statistic   Prob > t
Constant        8.34e+3                1.80e+3     4.66e+0       0.000
inlet temp      -2.94e+1               5.74e+0     -5.13e+0      0.000
inlet temp^2    2.62e-2                4.60e-3     5.69e+0       0.000

Pure Error vs. Lack of Fit Example (cont)


Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit   16   150.24           9.39          8.37      0.026
Pure Error    4    4.49             1.12
Total Error   20   154.73

Parameter Estimates
Term           Estimate   Std Error   t Ratio   Prob>|t|
Intercept      8339.05    1789.92     4.66      0.0002
inlet temp^1   -29.45     5.74        -5.13     0.0001
inlet temp^2   0.03       0.005       5.69      0.0000

Model Test
Source               DF   Sum of Squares   F Ratio   Prob > F
Poly(inlet temp,2)   2    36740.32         999.99    0.0000


ANOVA table and Residual Plot


Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    3.67e+4          2                 1.84e+4        2.37e+3   0.000
Error    1.55e+2          20                7.74e+0
Total    3.69e+4          22

(Plot: residuals vs. inlet temp)


Use regression line to predict LTO thickness...

(Left plot: LTO Thick (Å) vs. Dep Time (sec), fitted line y = 60.352 + 0.97456x, R² = 0.989, with 90% lower and upper limits.)
(Right plot: fitted line y = −38.440 + 1.0153x, R² = 0.989, plotted against LTO Thick (Å).)

Response Surface Methodology

• Objectives:
  • get a feel for I/O relationships
  • find setting(s) that satisfy multiple constraints
  • find settings that lead to optimum performance
• Observations:
  • Function is nearly linear away from the peak
  • Function is nearly quadratic at the peak


Building the planar model

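A factorial experiment with center points is enough to build and confirm a planar model.

b1, b2,
$$b_{12} = -0.65 \pm 0.75$$
$$b_{11} + b_{22} = \tfrac{1}{4}\sum p - \tfrac{1}{3}\sum c = -0.50 \pm 1.15$$
(p: the four factorial points; c: the three center points.) Both intervals include zero, so neither the interaction nor the curvature is significant, which supports the planar model.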
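A minimal Python sketch of these estimates for a 2² factorial with center points, assuming factors coded −1/+1, four corner runs and three center runs, and reading the curvature estimate as (mean of corners) − (mean of centers). The coefficients are regression coefficients on the coded factors (half the classical "effects"); the function name and data are hypothetical, and the ± uncertainties, which need an error estimate, are not computed.

```python
def planar_check(corners, centers):
    """Coefficients from a 2^2 factorial with center points, factors coded -1/+1.

    corners: dict mapping (x1, x2) in {(-1,-1), (1,-1), (-1,1), (1,1)} to the response.
    centers: list of responses at (0, 0).
    Returns b1, b2, b12 and the curvature estimate b11 + b22 = mean(corners) - mean(centers).
    """
    b1 = sum(x1 * y for (x1, _), y in corners.items()) / 4
    b2 = sum(x2 * y for (_, x2), y in corners.items()) / 4
    b12 = sum(x1 * x2 * y for (x1, x2), y in corners.items()) / 4   # interaction
    curvature = sum(corners.values()) / 4 - sum(centers) / len(centers)
    return b1, b2, b12, curvature


corners = {(-1, -1): 10.0, (1, -1): 14.0, (-1, 1): 12.0, (1, 1): 18.0}
print(planar_check(corners, [13.0, 13.2, 13.1]))
```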


Quadratic Model and Confirmation Run


Close to the peak, a quadratic model can be built and
confirmed by an expanded two-phase experiment.


Response Surface Methodology

• RSM consists of creating models that lead to visual images of a response. The models are usually linear or quadratic in nature.
• Either expanded factorial experiments or regression analysis can be used.
• All empirical models have a random prediction error. In RSM, the average variance of the model is:
$$\bar{V}(\hat{y}) = \frac{1}{n}\sum_{i=1}^{n} V(\hat{y}_i) = \frac{p\,\sigma^2}{n}$$
• where p is the number of model parameters and n is the number of experiments.
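The pσ²/n result follows from V(ŷ_i) = h_ii σ², where the leverages h_ii (the diagonal of X(XᵀX)⁻¹Xᵀ) always sum to p. A minimal Python sketch for the straight-line case (p = 2), with hypothetical names and data:

```python
def leverages_straight_line(x):
    """Diagonal of the hat matrix for y = a + b*x: h_ii = 1/n + (x_i - x_bar)^2 / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


def average_model_variance(x, sigma2):
    """(1/n) * sum V(y_hat_i) = (1/n) * sum h_ii * sigma^2, which equals p*sigma^2/n with p = 2."""
    h = leverages_straight_line(x)
    return sum(hi * sigma2 for hi in h) / len(x)


x = [1.0, 2.0, 4.0, 7.0, 11.0]
print(sum(leverages_straight_line(x)), average_model_variance(x, 3.0))
```

Individual leverages depend on where the runs are placed, but their sum, and hence the average model variance, depends only on p and n.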


Response Surface Exploration


"Popular" RSM

• Use single-stage Box-Behnken (Box-B) or Box-Wilson (Box-W) designs
• Use computer (simulated) experiments
• Rely on "goodness of fit" measures
• Automate model structure generation
• Problems?
