
AMS 572 Presentation

CH 10 Simple Linear Regression


Introduction
Example:

David Beckham: 1.83m   Brad Pitt: 1.83m   George Bush: 1.81m

Victoria Beckham: 1.68m   Angelina Jolie: 1.70m   Laura Bush: ?

● Goal: to predict the height of the wife in a couple based on the husband's height.


Response (outcome or dependent) variable (Y): height of the wife
Predictor (explanatory or independent) variable (X): height of the husband

Regression analysis:
● Regression analysis is a statistical methodology to estimate the relationship
  of a response variable to a set of predictor variables.
● When there is just one predictor variable, we use simple linear regression;
  when there are two or more predictor variables, we use multiple linear
  regression.
● When it is not clear which variable represents a response and which a
  predictor, correlation analysis is used to study the strength of the
  relationship.

History:
● The earliest form of linear regression was the method of least squares, which
  was published by Legendre in 1805 and by Gauss in 1809.
● The method was extended by Francis Galton in the 19th century to describe a
  biological phenomenon.
● This work was extended by Karl Pearson and Udny Yule to a more general
  statistical context around the turn of the 20th century.


A probabilistic model

Specific settings of the predictor variable: x_1, x_2, ..., x_n
Corresponding values of the response variable: y_1, y_2, ..., y_n

ASSUME: y_i is the observed value of the random variable Y_i, which depends on
x_i through the model

    Y_i = β_0 + β_1 x_i + ε_i   (i = 1, 2, ..., n)   (10.1)

where ε_i is a random error with E(ε_i) = 0 and Var(ε_i) = σ².

    μ_i = E(Y_i) = β_0 + β_1 x_i   (10.2)

is the unknown mean of Y_i; μ = β_0 + β_1 x is the true regression line, with
unknown intercept β_0 and unknown slope β_1.

4 BASIC ASSUMPTIONS:
● Linearity: the mean of Y_i is a linear function of x_i.
● Constant variance: the ε_i have a common variance σ², the same for all
  values of x.
● Normality: the ε_i are normally distributed.
● Independence: the ε_i are independent.
Comments:
1. "Linear" does not refer to x: the model is linear in the parameters β_0 and β_1.

   Example: E(Y) = β_0 + β_1 log x is still a linear model; set x' = log x.

2. The predictor variable need not be set at predetermined fixed values; it may
   be random along with Y.

   Example: height and weight of children.
   Height (X) – given; Weight (Y) – to predict.

       E(Y | X = x) = β_0 + β_1 x

   is the conditional expectation of Y given X = x.


10.2 Fitting the Simple Linear Regression Model

10.2.1 Least Squares (LS) Fit

Example 10.1 (Tire Tread Wear vs. Mileage: Scatter Plot)

Fit the line y = β_0 + β_1 x. The i-th deviation is y_i - (β_0 + β_1 x_i),
i = 1, 2, ..., n, and we measure the overall fit by

    Q = Σ_{i=1}^{n} [y_i - (β_0 + β_1 x_i)]²

The "best" fitting straight line in the sense of minimizing Q is the LS estimate.

One way to find the LS estimates β̂_0 and β̂_1 is via partial derivatives:

    ∂Q/∂β_0 = -2 Σ_{i=1}^{n} [y_i - (β_0 + β_1 x_i)]
    ∂Q/∂β_1 = -2 Σ_{i=1}^{n} x_i [y_i - (β_0 + β_1 x_i)]

Setting these partial derivatives equal to zero and simplifying, we get the
normal equations:

    β_0 n + β_1 Σ x_i = Σ y_i
    β_0 Σ x_i + β_1 Σ x_i² = Σ x_i y_i
Solving the equations, we get

    β̂_0 = [(Σ x_i²)(Σ y_i) - (Σ x_i)(Σ x_i y_i)] / [n Σ x_i² - (Σ x_i)²]

    β̂_1 = [n Σ x_i y_i - (Σ x_i)(Σ y_i)] / [n Σ x_i² - (Σ x_i)²]

To simplify, we introduce

    S_xy = Σ (x_i - x̄)(y_i - ȳ) = Σ x_i y_i - (1/n)(Σ x_i)(Σ y_i)
    S_xx = Σ (x_i - x̄)² = Σ x_i² - (1/n)(Σ x_i)²
    S_yy = Σ (y_i - ȳ)² = Σ y_i² - (1/n)(Σ y_i)²

so that

    β̂_1 = S_xy / S_xx,   β̂_0 = ȳ - β̂_1 x̄

The equation ŷ = β̂_0 + β̂_1 x is known as the least squares line, which is an
estimate of the true regression line.
Example 10.2 (Tire Tread Wear vs. Mileage: LS Line Fit)

To find the equation of the LS line for the tire tread wear data from
Table 10.1, we have

    Σ x_i = 144, Σ y_i = 2197.32, Σ x_i² = 3264, Σ y_i² = 589,887.08,
    Σ x_i y_i = 28,167.72

and n = 9. From these we calculate x̄ = 16, ȳ = 244.15, and

    S_xy = Σ x_i y_i - (1/n)(Σ x_i)(Σ y_i) = 28,167.72 - (1/9)(144 × 2197.32) = -6989.40
    S_xx = Σ x_i² - (1/n)(Σ x_i)² = 3264 - (1/9)(144)² = 960

The slope and intercept estimates are

    β̂_1 = -6989.40 / 960 = -7.281  and  β̂_0 = 244.15 + 7.281 × 16 = 360.64

Therefore, the equation of the LS line is

    ŷ = 360.64 - 7.281x

Conclusion: there is a loss of 7.281 mils in the tire groove depth for every
1000 miles of driving.

Given a particular x = 25, we can find

    ŷ = 360.64 - 7.281 × 25 = 178.62 mils

which means the mean groove depth for all tires driven for 25,000 miles is
estimated to be 178.62 mils.
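
The same fit can be reproduced in SAS. A minimal sketch, assuming the tire data
are stored in a data set named "tire" with variables x (mileage, in 1000s of
miles) and y (groove depth, in mils):

proc reg data=tire;
  model y = x;    /* fits y = b0 + b1*x by least squares */
run;

The parameter estimates table should show an intercept near 360.64 and a slope
near -7.281.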
10.2.2 Goodness of Fit of the LS Line

Coefficient of Determination and Correlation

The fitted values are ŷ_i = β̂_0 + β̂_1 x_i (i = 1, 2, ..., n), and the residuals

    e_i = y_i - (β̂_0 + β̂_1 x_i)   (i = 1, 2, ..., n)

are used to evaluate the goodness of fit of the LS line. Decompose the total
variation:

    SST = Σ (y_i - ȳ)² = Σ (ŷ_i - ȳ)² + Σ (y_i - ŷ_i)² + 2 Σ (y_i - ŷ_i)(ŷ_i - ȳ)

The first term on the right is SSR, the second is SSE, and the cross-product
term equals 0, so we have

    SST = SSR + SSE

The ratio

    r² = SSR/SST = 1 - SSE/SST

Note: SST = total sum of squares, SSR = regression sum of squares,
SSE = error sum of squares.

r² is called the coefficient of determination, with 0 ≤ r² ≤ 1.


Example 10.3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and
Correlation)

For the tire tread wear data, calculate r² and r using the results from
Example 10.2. We have

    SST = S_yy = Σ y_i² - (1/n)(Σ y_i)² = 589,887.08 - (1/9)(2197.32)² = 53,418.73

Next calculate

    SSR = SST - SSE = 53,418.73 - 2531.53 = 50,887.20

Therefore

    r² = 50,887.20 / 53,418.73 = 0.953  and  r = -√0.953 = -0.976

where the sign of r follows from the sign of β̂_1 = -7.281. Since 95.3% of the
variation in tread wear is accounted for by linear regression on mileage, the
relationship between the two is strongly linear with a negative slope.
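
As a check (a worked computation, not in the original slides), r can also be
obtained directly from r = S_xy / √(S_xx · S_yy):

    r = -6989.40 / √(960 × 53,418.73) = -6989.40 / 7161.1 ≈ -0.976

which agrees with the value obtained from r² together with the sign of the slope.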
10.2.3 Estimation of σ²

An unbiased estimate of σ² is given by

    s² = Σ e_i² / (n - 2) = SSE / (n - 2)

Example 10.4 (Tire Tread Wear vs. Mileage: Estimate of σ²)

Find the estimate of σ² for the tread wear data using the results from
Example 10.3. We have SSE = 2531.53 and n - 2 = 7; therefore

    s² = 2531.53 / 7 = 361.65

which has 7 d.f. The estimate of σ is s = √361.65 = 19.02 mils.
Statistical Inference on β_0 and β_1

Point estimators: β̂_0, β̂_1

Sampling distributions of β̂_0 and β̂_1:

    β̂_0 ~ N( β_0, σ² Σ x_i² / (n S_xx) ),   SE(β̂_0) = s √( Σ x_i² / (n S_xx) )

    β̂_1 ~ N( β_1, σ² / S_xx ),   SE(β̂_1) = s / √S_xx

For the mathematical derivations, please refer to the textbook, p. 331.
Statistical Inference on β_0 and β_1, Con't

Pivotal quantities (P.Q.'s):

    (β̂_0 - β_0) / SE(β̂_0) ~ t_{n-2}    and    (β̂_1 - β_1) / SE(β̂_1) ~ t_{n-2}

100(1 - α)% CI's:

    β̂_0 ± t_{n-2, α/2} SE(β̂_0),    β̂_1 ± t_{n-2, α/2} SE(β̂_1)
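
As a worked illustration with the tire data (numbers derived from Examples 10.2
and 10.4, not from the original slides): SE(β̂_1) = s / √S_xx = 19.02 / √960
≈ 0.614, and t_{7, .025} = 2.365, so a 95% CI for β_1 is

    -7.281 ± 2.365 × 0.614 = (-8.73, -5.83)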
Statistical Inference on β_0 and β_1, Con't

Hypothesis tests:

    H_0: β_1 = β_1⁰ vs. H_a: β_1 ≠ β_1⁰      H_0: β_1 = 0 vs. H_a: β_1 ≠ 0

Test statistics:

    t_0 = (β̂_1 - β_1⁰) / SE(β̂_1)            t_0 = β̂_1 / SE(β̂_1)

At the significance level α, we reject H_0 in favor of H_a iff |t_0| ≥ t_{n-2, α/2}.
The test of H_0: β_1 = 0 can be used to show whether there is a linear
relationship between x and y.
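
For the tire data (a worked check using the numbers derived above, not from the
original slides): t_0 = -7.281 / 0.614 ≈ -11.86, and |-11.86| > t_{7, .005}
= 3.499, so the slope is highly significant. Note that t_0² ≈ 140.7, which
matches the F statistic in the ANOVA table below.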
Analysis of Variance (ANOVA), Con't

Mean Square: a sum of squares divided by its d.f.

    MSR = SSR / 1,    MSE = SSE / (n - 2)

    F = MSR / MSE = β̂_1² S_xx / s² = [ β̂_1 / (s/√S_xx) ]² = [ β̂_1 / SE(β̂_1) ]² = t²

which under H_0: β_1 = 0 has an F distribution with 1 and n - 2 d.f.: F ~ F_{1, n-2}.
Analysis of Variance (ANOVA)

ANOVA Table:

Source of Variation | Sum of Squares (SS) | d.f. | Mean Square (MS) | F
Regression          | SSR                 | 1    | MSR = SSR/1      | F = MSR/MSE
Error               | SSE                 | n-2  | MSE = SSE/(n-2)  |
Total               | SST                 | n-1  |                  |

Example (tire tread wear data):

Source     | SS        | d.f. | MS        | F
Regression | 50,887.20 | 1    | 50,887.20 | 140.71
Error      |  2,531.53 | 7    |    361.65 |
Total      | 53,418.73 | 8    |           |
10.4 Regression Diagnostics

10.4.1 Checking for Model Assumptions

● Checking for Linearity
● Checking for Constant Variance
● Checking for Normality
● Checking for Independence
Checking for Linearity

X_i = mileage, Y_i = groove depth. The fitted value is Ŷ_i = β̂_0 + β̂_1 x_i and
the residual is e_i = Y_i - Ŷ_i.

 i | X_i | Y_i    | Ŷ_i    | e_i
 1 |  0  | 394.33 | 360.64 |  33.69
 2 |  4  | 329.50 | 331.51 |  -2.01
 3 |  8  | 291.00 | 302.39 | -11.39
 4 | 12  | 255.17 | 273.27 | -18.10
 5 | 16  | 229.33 | 244.15 | -14.82
 6 | 20  | 204.83 | 215.02 | -10.19
 7 | 24  | 179.00 | 185.90 |  -6.90
 8 | 28  | 163.83 | 156.78 |   7.05
 9 | 32  | 150.33 | 127.65 |  22.68
     (row 9 lost in extraction; recovered from Σ x_i = 144 and Σ y_i = 2197.32)

[Figure: scatterplot of e_i vs. X_i.]
Checking for Normality

[Figure: normal probability plot of the residuals. Mean ≈ 0, StDev = 17.79,
N = 9, AD = 0.514, p-value = 0.138 — no significant departure from normality.]
Checking for Constant Variance

[Figure: two sample residual plots — one where Var(Y) is not constant and one
where Var(Y) is constant.]
Checking for Independence

● Does not generally apply to the simple linear regression model with
  cross-sectional data.
● Applies mainly to time series data.
10.4.2 Checking for Outliers & Influential Observations

● What is an OUTLIER
● Why checking for outliers is important
● Mathematical definition
● How to deal with them

10.4.2-A. Intro

Recall the box-and-whiskers plot (Chapter 4):
● A (mild) OUTLIER is any observation that lies outside of
  [Q1 - 1.5·IQR, Q3 + 1.5·IQR] (interquartile range, IQR = Q3 - Q1).
● An (extreme) OUTLIER is one that lies outside of [Q1 - 3·IQR, Q3 + 3·IQR].
● Informally: an observation "far away" from the rest of the data.
10.4.2-B. Why are outliers a problem?

● May indicate a sample peculiarity, a data entry error, or some other problem.
● Regression coefficients estimated by minimizing the Sum of Squares for Error
  (SSE) are very sensitive to outliers >> bias or distortion of estimates.
● Any statistical test based on sample means and variances can be distorted in
  the presence of outliers >> distortion of p-values.
● Faulty conclusions.

(Estimators not sensitive to outliers are said to be robust.)

Example:

                 Sorted data   Median | Mean | Variance | 95% CI for mean
Real data        1 3 5 9 12    5      |  6.0 |   20.6   | [0.45, 11.55]
Data with error  1 3 5 9 120   5      | 27.6 | 2676.8   | [-36.63, 91.83]
10.4.2-C. Mathematical Definition

● Outlier

The standardized residual is given by

    e_i* = e_i / ( s √(1 - h_ii) )

where h_ii is the i-th leverage (defined below). If |e_i*| > 2, then the
corresponding observation may be regarded as an outlier.

Example (Tire Tread Wear vs. Mileage):

  i   |  1   |   2   |   3   |   4   |   5   |   6   |   7   |  8   |  9
 e_i* | 2.25 | -0.12 | -0.66 | -1.02 | -0.83 | -0.57 | -0.40 | 0.43 | 1.51

● STUDENTIZED RESIDUAL: a type of standardized residual calculated with the
  current observation deleted from the analysis.
● The LS fit can be excessively influenced by an observation that is not
  necessarily an outlier as defined above.
10.4.2-C. Mathematical Definition

● Influential Observation: an observation with an extreme x-value, y-value,
  or both.

● On average the leverage h_ii is (k+1)/n (k = number of predictors); regard
  any h_ii > 2(k+1)/n as high leverage.
● If x_i deviates greatly from the mean x̄, then h_ii is large.
● The standardized residual will be large for a high-leverage observation.
● Influence can be thought of as the product of leverage and outlierness.

Example: [Figure: scatter plot and residual plot of fits with and without an
observation that is influential/high leverage, but not an outlier.]
10.4.2-C. SAS code for the examples

SAS code:

proc reg data=tire;
  model y=x;
  /* save studentized residuals, leverages, Cook's D, and DFFITS */
  output out=resid rstudent=r h=lev cookd=cd dffits=dffit;
proc print data=resid;
  /* flag observations exceeding the usual cutoffs (n = 9, k = 1) */
  where abs(r)>=2 or lev>(4/9) or cd>(4/9) or abs(dffit)>(2*sqrt(1/9));
run;

[SAS output shown on the original slide.]
10.4.2-D. How to deal with Outliers & Influential Observations

● Investigate (Data errors? Rare events? Can they be corrected?)
● Ways to accommodate outliers:
  ● Nonparametric methods (robust to outliers)
  ● Data transformations
  ● Deletion (or report model results both with and without the outliers or
    influential observations, to see how much they change)
10.4.3 Data Transformations

Reasons:
● To achieve linearity
● To achieve homogeneity of variance
● To achieve normality or symmetry about the regression equation

Types of transformation:
● Linearizing transformation: a transformation of the response variable, the
  predictor variable, or both, which produces an approximately linear
  relationship between the variables.
● Variance-stabilizing transformation: a transformation applied when the
  constant variance assumption is violated.

Method of Linearizing Transformation

● Use a mathematical operation, e.g. square root, power, log, exponential, etc.
● Only one variable needs to be transformed in simple linear regression.
  Which one, predictor or response? Why?
e.g. We take an exponential model for the tread wear data:

    Y = a e^{-βx}  ⟺  log Y = log a - βx

so regressing log Y on x gives a linear fit. With fitted values
Ŷ_i = exp(log Ŷ_i) and residuals e_i = Y_i - Ŷ_i:

 X_i | Y_i    | log Ŷ_i | Ŷ_i = exp(log Ŷ_i) | e_i
  0  | 394.33 | 5.926   | 374.64             | 19.69
  4  | 329.50 | 5.807   | 332.58             | -3.08
  8  | 291.00 | 5.688   | 295.24             | -4.24
 12  | 255.17 | 5.569   | 262.09             | -6.92
 16  | 229.33 | 5.450   | 232.67             | -3.34
 20  | 204.83 | 5.331   | 206.54             | -1.71
 24  | 179.00 | 5.211   | 183.36             | -4.36
 28  | 163.83 | 5.092   | 162.77             |  1.06

[Figure: plot of residuals vs. x_i for the original fit and for the exponential
fit.]

[Figure: normal probability plots of e_i without and with the transformation.
Original e_i: Mean ≈ 0, StDev 17.79, N 9, AD 0.514, p = 0.138.
Transformed e_i: Mean 0.3256, StDev 8.142, N 9, AD 0.912, p = 0.011.]
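
A minimal SAS sketch of this transformed fit (assuming the same "tire" data
set; variable names are illustrative):

data tirelog;
  set tire;
  logy = log(y);    /* natural log of groove depth */
run;
proc reg data=tirelog;
  model logy = x;   /* fits log y = log(a) - beta*x */
run;

The fitted intercept estimates log a and the (negative) slope estimates -β.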
Method of Variance Stabilizing Transformation

Delta method: a two-term Taylor-series approximation gives

    Var(h(Y)) ≈ [h′(μ)]² g²(μ),  where Var(Y) = g²(μ) and E(Y) = μ

1. Set [h′(μ)]² g²(μ) ≡ 1 (constant).
2. Then h′(μ) = 1 / g(μ).
3. So h(μ) = ∫ dμ / g(μ), i.e. h(y) = ∫ dy / g(y).

e.g. Var(Y) = c²μ², where c > 0, so g(μ) = cμ ↔ g(y) = cy:

    h(y) = ∫ dy / (cy) = (1/c) log y

Therefore the appropriate choice is the logarithmic transformation.
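
As a second worked case (not in the original slides): if Var(Y) = cμ, as for
Poisson-type counts, then g(y) = √(cy) and

    h(y) = ∫ dy / √(cy) = (2/√c) √y

so the square-root transformation √Y stabilizes the variance.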
Correlation Analysis

● Correlation: a measurement of how closely two variables share a linear
  relationship.

    ρ = corr(X, Y) = Cov(X, Y) / √( Var(X) Var(Y) )

● Useful when it is not possible to determine which variable is the predictor
  and which is the response.
  ● Health vs. wealth: which is the predictor? Which is the response?
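
A minimal SAS sketch for estimating the correlation (assuming a data set
"health" with variables x and y; names are illustrative):

proc corr data=health pearson;
  var x y;   /* prints the sample correlation r and its p-value for H0: rho=0 */
run;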
Statistical Inference on the Correlation Coefficient ρ

● We can derive a test on the correlation coefficient in the same way that we
  have been doing in class.
● Assumptions: X, Y are from the bivariate normal distribution.
● Start with the point estimator R, the sample estimate of the population
  correlation coefficient ρ:

    R = Σ (X_i - X̄)(Y_i - Ȳ) / √( Σ (X_i - X̄)² Σ (Y_i - Ȳ)² )

● The distribution of R is quite complicated, so we transform the point
  estimator into a pivotal quantity:

    T = R √(n - 2) / √(1 - R²)
Bivariate Normal Distribution

● pdf (the slide's formula image was lost; this is the standard form):

    f(x, y) = 1 / (2π σ_1 σ_2 √(1-ρ²)) ×
              exp{ -1/(2(1-ρ²)) [ ((x-μ_1)/σ_1)² - 2ρ((x-μ_1)/σ_1)((y-μ_2)/σ_2)
                                  + ((y-μ_2)/σ_2)² ] }

● Properties:
  ● μ_1, μ_2: means of X, Y
  ● σ_1², σ_2²: variances of X, Y
  ● ρ: the correlation coefficient between X and Y
Derivation of T

Are these equivalent?

    t = r √(n - 2) / √(1 - r²)  =?  β̂_1 / SE(β̂_1)

Substitute:

    r = β̂_1 (s_x / s_y) = β̂_1 √(S_xx / S_yy) = β̂_1 √(S_xx / SST)

    1 - r² = SSE / SST = (n - 2) s² / SST

Then:

    t = β̂_1 √(S_xx / SST) √( (n - 2) SST / ((n - 2) s²) )
      = β̂_1 / (s / √S_xx) = β̂_1 / SE(β̂_1)

Yes, they are equivalent. Therefore we can use t as a statistic for testing
against the null hypothesis H_0: β_1 = 0; equivalently, we can test against
H_0: ρ = 0.
Exact Statistical Inference on ρ

Test:
    H_0: ρ = 0  vs.  H_a: ρ ≠ 0

Test statistic:
    t_0 = r √(n - 2) / √(1 - r²)

Reject H_0 if |t_0| > t_{n-2, α/2}.

Example (from textbook): A researcher wants to determine if two test
instruments give similar results. The two test instruments are administered to
a sample of 15 students. The correlation coefficient between the two sets of
scores is found to be 0.7. Is this correlation statistically significant at the
.01 level?

    H_0: ρ = 0  vs.  H_a: ρ ≠ 0

    t_0 = 0.7 √(15 - 2) / √(1 - 0.7²) = 3.534

For α = .01: 3.534 = t_0 > t_{13, .005} = 3.012  ▲ Reject H_0
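
A minimal SAS data-step sketch of this exact test (values from the example;
PROBT is SAS's t CDF):

data exact_test;
  r = 0.7; n = 15;
  t0 = r*sqrt(n-2)/sqrt(1-r**2);         /* t0 = 3.534 */
  pvalue = 2*(1 - probt(abs(t0), n-2));  /* two-sided p-value, df = 13 */
run;
proc print data=exact_test; run;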
Approximate Statistical Inference on ρ

● There is no exact method of testing ρ against an arbitrary ρ_0:
  ● the distribution of R is very complicated;
  ● T ~ t_{n-2} only when ρ = 0.
● To test ρ against an arbitrary ρ_0, use Fisher's normal approximation:

    tanh⁻¹ R = (1/2) ln( (1 + R)/(1 - R) )
             ≈ N( (1/2) ln( (1 + ρ)/(1 - ρ) ), 1/(n - 3) )

● Transform the sample estimate:

    ψ̂ = (1/2) ln( (1 + r)/(1 - r) ),  which under H_0 is approximately
    N( (1/2) ln( (1 + ρ_0)/(1 - ρ_0) ), 1/(n - 3) )
Approximate Statistical Inference on ρ

● Test:  H_0: ρ = ρ_0 vs. H_1: ρ ≠ ρ_0, i.e.

    H_0: ψ = ψ_0 = (1/2) ln( (1 + ρ_0)/(1 - ρ_0) )  vs.  H_1: ψ ≠ ψ_0

● Sample estimate:  ψ̂ = (1/2) ln( (1 + r)/(1 - r) )

● Z statistic:  z_0 = √(n - 3) (ψ̂ - ψ_0);  reject H_0 if |z_0| > z_{α/2}

● CI:  ψ̂ - z_{α/2}/√(n - 3) ≤ ψ ≤ ψ̂ + z_{α/2}/√(n - 3)

  Transforming back, with l and u the lower and upper limits for ψ:

    (e^{2l} - 1)/(e^{2l} + 1) ≤ ρ ≤ (e^{2u} - 1)/(e^{2u} + 1)
Approximate Statistical Inference on ρ using SAS

[Code and output shown on the original slide.]
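
A minimal sketch of how this can be done in SAS (the FISHER option of PROC CORR
requests the Fisher z transformation; the data set, variable names, and the
null value in rho0= are illustrative):

proc corr data=scores fisher(rho0=0.5 biasadj=no);
  var test1 test2;   /* prints the Fisher-z CI and the test of H0: rho = 0.5 */
run;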
Pitfalls of Regression and Correlation Analysis

● Correlation does not imply causation
  ● "Ticks cause good health"
● Coincidental data
  ● Sunspots and Republicans
● Lurking variables
  ● Church attendance, suicide, population
● Restricted range
  ● Local vs. global linearity
Summary

● Probabilistic model for linear regression:  Y_i = β_0 + β_1 x_i + ε_i
● Model assumptions: linearity, constant variance, normality, independence.
● Least squares (LS) fit: minimize  Q = Σ [y_i - (β_0 + β_1 x_i)]²  to get the
  LS estimates β̂_0 and β̂_1.
● Goodness of fit — coefficient of determination:  r² = SSR/SST = 1 - SSE/SST.
● Sampling distributions:
    β̂_0 ~ N( β_0, σ² Σ x_i² / (n S_xx) ),   β̂_1 ~ N( β_1, σ² / S_xx )
● Statistical inference on β_0 and β_1 — confidence intervals:
    β̂_0 or β̂_1 ± t_{n-2, α/2} SE(β̂_0 or β̂_1)
● Prediction interval for Y* at x = x*:
    Ŷ* ± t_{n-2, α/2} s √( 1 + 1/n + (x* - x̄)² / S_xx )
● Regression diagnostics: outliers? influential observations? data
  transformations?
● Correlation analysis — sample correlation coefficient r:
    t = r √(n - 2) / √(1 - r²),   ψ = (1/2) ln( (1 + ρ)/(1 - ρ) )
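
As a quick worked illustration of the prediction interval (using the tire data
numbers derived earlier, not from the original slides): at x* = 25,
Ŷ* = 178.62 and

    178.62 ± 2.365 × 19.02 √( 1 + 1/9 + (25 - 16)²/960 ) ≈ 178.62 ± 49.2

i.e. approximately (129.4, 227.8) mils for a single new tire.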
Thank you! Any questions?
