$$e_i = Y_i - \hat{Y}_i = Y_i - (a + bX_i)$$
Intuitively, line-A doesn't fit the data, i.e. it doesn't go through the scatterplot!
More formally, the residuals ($e_i$) are all negative, so the average residual is negative:
$$\bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i < 0.$$
This suggests:
Property 1: one desirable property of a regression line fit is that the average residual is 0, i.e.
$$\bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i = 0.$$
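A small numeric sketch of the line-A argument and Property 1. The data points and the candidate line (a = 5.0, b = 2.0) are made up for illustration, and `residuals` and `mean` are hypothetical helper names, not from the slides.

```python
def residuals(xs, ys, a, b):
    """Residuals e_i = Y_i - (a + b*X_i) for the candidate line Y = a + b*X."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def mean(values):
    """Simple average (1/N) * sum(values)."""
    return sum(values) / len(values)


# Made-up scatterplot data (not from the slides)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

# A line-A-style candidate that lies above every point
e_a = residuals(xs, ys, a=5.0, b=2.0)

print(all(r < 0 for r in e_a))  # True: every residual is negative
print(mean(e_a))                # about -5.0, so Property 1 fails
```

Any line sitting entirely above (or below) the cloud of points gives residuals of a single sign, so its average residual cannot be 0.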
Next, consider line-B
Line-B satisfies the zero average residual condition but still doesn't look like a good fit, because there are mostly negative residuals for low Xs and positive residuals for high Xs.
More formally, the problem is that the residuals are correlated with the $X_i$'s, whereas their covariance should be zero:
$$\operatorname{cov}(e_i, X_i) = \frac{1}{N}\sum_{i=1}^{N} (e_i - \bar{e})(X_i - \bar{X}) = 0.$$
This suggests:
Property 2: a second desirable property of a regression
line fit is that the residuals are uncorrelated with the Xs.
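Note: if $\bar{e} = 0$, then $\sum_{i=1}^{N} e_i\bar{X} = \bar{X}\sum_{i=1}^{N} e_i = 0$, so this implies:
$$\operatorname{cov}(e_i, X_i) = \frac{1}{N}\sum_{i=1}^{N} e_i X_i = 0.$$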
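The line-B situation and the Note can be checked numerically. This sketch uses made-up data; the candidate a = 4.5, b = 0.5 is an assumption chosen so that the average residual is 0 while the slope is too flat. The hypothetical helper `cov` uses the 1/N divisor written above.

```python
def mean(values):
    """Simple average (1/N) * sum(values)."""
    return sum(values) / len(values)


def cov(u, v):
    """Covariance with a 1/N divisor: (1/N) * sum((u_i - u_bar) * (v_i - v_bar))."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / len(u)


# Made-up scatterplot data (not from the slides)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

# Line-B-style candidate: slope too flat, intercept chosen so the average residual is 0
a, b = 4.5, 0.5
e_b = [y - (a + b * x) for x, y in zip(xs, ys)]

avg_e = mean(e_b)                                    # about 0: Property 1 holds
cov_eX = cov(e_b, xs)                                # about 2.94: Property 2 fails
avg_eX = mean([ei * xi for ei, xi in zip(e_b, xs)])  # matches cov_eX because avg_e is 0

print(avg_e, cov_eX, avg_eX)
```

The residuals run from negative at low X to positive at high X, which is exactly the pattern the covariance picks up.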
Finally, consider line-C
Line-C looks like a good fitting line, and satisfies both
properties 1 and 2.
This very intuitive sense of a good-fitting regression line is based on a method of moments approach.
It turns out that this approach leads to (essentially) the most commonly used estimators in regression analysis.
These are generally referred to as the Ordinary Least Squares (OLS) estimators; the OLS name comes from an alternative approach to the one intuited here, namely choosing the line that minimises the sum of squared residuals.
But let's see what this method of moments approach implies about the regression coefficient estimates.
3. Summary / Implications
This very intuitive discussion of fitting a good line to a
scatterplot relied on three aspects:
1. The assumed functional form of the relationship between Y and X, i.e. that it is linear;
2. The resulting residuals should have zero average; and
3. The residuals should be uncorrelated with the $X_i$'s.
To estimate the coefficients (a & b) of the good-fitting line,
we use these three points.
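The three aspects can be bundled into a single check for any candidate line. In this sketch, `moment_conditions` is a hypothetical helper returning the average residual and cov(e, X) (1/N divisor); the data and the three (a, b) pairs, loosely mimicking lines A, B and C, are made up. The line-C pair (a = 0.09, b = 1.97) is what the intercept and slope formulas derived in this section give for this data.

```python
def moment_conditions(xs, ys, a, b):
    """Return (average residual, cov(e, X)) for the linear fit Y = a + b*X.

    Both values should be (close to) 0 for a good-fitting line.
    """
    n = len(xs)
    e = [y - (a + b * x) for x, y in zip(xs, ys)]  # aspect 1: linear functional form
    e_bar = sum(e) / n                             # aspect 2: zero average residual
    x_bar = sum(xs) / n
    # aspect 3: residuals uncorrelated with X (1/N divisor)
    cov_eX = sum((ei - e_bar) * (xi - x_bar) for ei, xi in zip(e, xs)) / n
    return e_bar, cov_eX


# Made-up scatterplot data (not from the slides)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

candidates = {"A": (5.0, 2.0), "B": (4.5, 0.5), "C": (0.09, 1.97)}
checks = {name: moment_conditions(xs, ys, a, b) for name, (a, b) in candidates.items()}

for name, (e_bar, cov_eX) in checks.items():
    print(name, round(e_bar, 4), round(cov_eX, 4))
```

Only the line-C-style candidate drives both quantities to 0, mirroring the discussion of lines A, B and C.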
Property 1: Zero average residual
$$\bar{e} = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \hat{Y}_i) = 0 \quad\Rightarrow\quad \bar{Y} = \overline{\hat{Y}},$$
i.e. avg actual-Y = avg predicted-Y.
And, using the linear functional form assumption,
$$\frac{1}{N}\sum_{i=1}^{N} \big(Y_i - (a + bX_i)\big) = 0,$$
which implies
$$\bar{Y} = a + b\bar{X},$$
and solving for $a$ gives:
$$a = \bar{Y} - b\bar{X},$$
i.e. the intercept = avg-Y - b*avg-X.
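Property 1 on its own pins down the intercept for any chosen slope, but not the slope itself. The sketch below (hypothetical helper `intercept_given_slope`, made-up data) computes a = Ybar - b*Xbar for several slopes and confirms that the average residual is 0 every time; Property 2 is what selects b.

```python
def intercept_given_slope(xs, ys, b):
    """Intercept from Property 1: a = Ybar - b * Xbar."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return y_bar - b * x_bar


# Made-up scatterplot data (not from the slides)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

results = []
for b in (0.5, 1.97, 3.0):
    a = intercept_given_slope(xs, ys, b)
    e = [y - (a + b * x) for x, y in zip(xs, ys)]
    results.append((b, a, sum(e) / len(e)))  # average residual for this (a, b)

for b, a, e_bar in results:
    print(f"b={b}: a={a:.2f}, average residual={e_bar:.1e}")
```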
Property 2: zero correlation between residuals and Xs
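$$\operatorname{cov}(e_i, X_i) = \frac{1}{N-1}\sum_{i=1}^{N} e_i X_i = 0.$$
Substituting $e_i = Y_i - (a + bX_i)$ with $a = \bar{Y} - b\bar{X}$:
$$\sum_{i=1}^{N} \big(Y_i - (\bar{Y} - b\bar{X}) - bX_i\big)X_i = 0$$
$$\Rightarrow\quad \sum_{i=1}^{N} (Y_i - \bar{Y})X_i - b\sum_{i=1}^{N} (X_i - \bar{X})X_i = 0.$$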
Solving for $b$ gives
$$b = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})X_i}{\sum_{i=1}^{N} (X_i - \bar{X})X_i}.$$
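Since deviations from a mean sum to zero, $\sum_{i=1}^{N} (Y_i - \bar{Y})\bar{X} = 0 = \sum_{i=1}^{N} (X_i - \bar{X})\bar{X}$, so subtracting these terms from the numerator and denominator leaves $b$ unchanged, and we can rewrite the expression for $b$ as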
$$b = \frac{\frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})(X_i - \bar{X})}{\frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})},$$
which is simply
$$b = \frac{\operatorname{Cov}(X_i, Y_i)}{\operatorname{Var}(X_i)},$$
i.e. the slope parameter is the covariance between $X_i$ and $Y_i$ divided (i.e. normalised) by the variance of $X_i$.