
Simple Linear Regression

Least Squares Estimates of β0 and β1


Simple linear regression involves the model

$$\hat{Y} = \mu_{Y|X} = \beta_0 + \beta_1 X.$$

This document derives the least squares estimates of β0 and β1. It is simply for your own
information. You will not be held responsible for this derivation.

The least squares estimates of β0 and β1 are:


$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
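To make the formulas concrete, here is a minimal numerical sketch of the two estimates; the data values and variable names are made up purely for illustration:

```python
import numpy as np

# Made-up illustrative data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

Xbar, Ybar = X.mean(), Y.mean()

# Slope: sum of (Xi - Xbar)(Yi - Ybar) over sum of (Xi - Xbar)^2.
beta1_hat = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)

# Intercept: Ybar - beta1_hat * Xbar.
beta0_hat = Ybar - beta1_hat * Xbar

print(beta0_hat, beta1_hat)  # about 0.05 and 1.99 for this data
```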

The classic derivation of the least squares estimates uses calculus to find the β0 and β1
parameter estimates that minimize the error sum of squares: $SSE = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2$. This
derivation uses no calculus, only some lengthy algebra. It uses a very clever method that
may be found in:

Im, Eric Iksoon, A Note On Derivation of the Least Squares Estimator, Working Paper Series
No. 96-11, University of Hawai’i at Manoa Department of Economics, 1996.
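For contrast with the algebraic approach below, here is a minimal sketch of the calculus route just mentioned, using sympy (an assumption of this sketch, not something the note itself uses) to set both partial derivatives of the SSE to zero and solve; the data are made up:

```python
import sympy as sp

b0, b1 = sp.symbols("beta0 beta1")

# Made-up data lying exactly on the line Y = 2X, so the answer is easy to eyeball.
Xs = [1, 2, 3, 4, 5]
Ys = [2, 4, 6, 8, 10]

# SSE as a symbolic function of the two parameters.
SSE = sum((y - b0 - b1 * x) ** 2 for x, y in zip(Xs, Ys))

# Set both partial derivatives to zero and solve the normal equations.
sol = sp.solve([sp.diff(SSE, b0), sp.diff(SSE, b1)], [b0, b1])
print(sol)  # {beta0: 0, beta1: 2}
```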

The Derivation

The least squares estimates are the values $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the error sum of squares

$$SSE = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2.$$
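As a small sketch, the SSE can be written directly as a function of candidate parameter values (the data and the trial values below are made up, as before):

```python
import numpy as np

def sse(beta0, beta1, X, Y):
    """Error sum of squares for the fitted line Yhat = beta0 + beta1 * X."""
    Yhat = beta0 + beta1 * X
    return np.sum((Y - Yhat) ** 2)

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
print(sse(0.0, 2.0, X, Y))  # SSE for one arbitrary candidate line
```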

We can algebraically manipulate things to get

$$
\begin{aligned}
SSE &= \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 \\
&= \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2 \\
&= \sum_{i=1}^n \left[ (Y_i + \bar{Y} - \bar{Y}) - \beta_0 - \beta_1 (X_i + \bar{X} - \bar{X}) \right]^2 \\
&= \sum_{i=1}^n \left[ (\bar{Y} - \beta_0 - \beta_1 \bar{X}) + (Y_i - \bar{Y} - \beta_1 X_i + \beta_1 \bar{X}) \right]^2 \\
&= \sum_{i=1}^n \left[ (\bar{Y} - \beta_0 - \beta_1 \bar{X}) - (\beta_1 X_i - \beta_1 \bar{X} - Y_i + \bar{Y}) \right]^2 \\
&= \sum_{i=1}^n (\bar{Y} - \beta_0 - \beta_1 \bar{X})^2 + \sum_{i=1}^n (\beta_1 X_i - \beta_1 \bar{X} - Y_i + \bar{Y})^2 \\
&= n(\bar{Y} - \beta_0 - \beta_1 \bar{X})^2 + \sum_{i=1}^n \left[ \beta_1 (X_i - \bar{X}) - (Y_i - \bar{Y}) \right]^2 \\
&= n(\bar{Y} - \beta_0 - \beta_1 \bar{X})^2 + \sum_{i=1}^n \left[ \beta_1^2 (X_i - \bar{X})^2 - 2\beta_1 (X_i - \bar{X})(Y_i - \bar{Y}) + (Y_i - \bar{Y})^2 \right] \\
&= n(\bar{Y} - \beta_0 - \beta_1 \bar{X})^2 + \beta_1^2 \sum_{i=1}^n (X_i - \bar{X})^2 - 2\beta_1 \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) + \sum_{i=1}^n (Y_i - \bar{Y})^2 \\
&= n(\bar{Y} - \beta_0 - \beta_1 \bar{X})^2 + \left( \beta_1^2 - \frac{2\beta_1 \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} + \frac{\sum_{i=1}^n (Y_i - \bar{Y})^2}{\sum_{i=1}^n (X_i - \bar{X})^2} \right) \sum_{i=1}^n (X_i - \bar{X})^2 \\
&= n(\bar{Y} - \beta_0 - \beta_1 \bar{X})^2 \\
&\quad + \left( \beta_1^2 - \frac{2\beta_1 \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} + \left[ \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} \right]^2 \right. \\
&\qquad \left. + \frac{\sum_{i=1}^n (Y_i - \bar{Y})^2}{\sum_{i=1}^n (X_i - \bar{X})^2} - \left[ \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} \right]^2 \right) \sum_{i=1}^n (X_i - \bar{X})^2 \\
&= n(\bar{Y} - \beta_0 - \beta_1 \bar{X})^2 + \left( \beta_1 - \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} \right)^2 \sum_{i=1}^n (X_i - \bar{X})^2 \\
&\quad + \sum_{i=1}^n (Y_i - \bar{Y})^2 \left[ 1 - \left( \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2}} \right)^2 \right]
\end{aligned}
$$

(In the step where the square splits into two sums, the cross term $2(\bar{Y} - \beta_0 - \beta_1 \bar{X}) \sum_{i=1}^n \left[ \beta_1 (X_i - \bar{X}) - (Y_i - \bar{Y}) \right]$ vanishes, because $\sum_{i=1}^n (X_i - \bar{X}) = 0$ and $\sum_{i=1}^n (Y_i - \bar{Y}) = 0$.)
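The three-term identity above is easy to check numerically: for any trial values of β0 and β1, the three terms should add up to the directly computed SSE. A sketch with made-up data:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)
Xbar, Ybar = X.mean(), Y.mean()

Sxx = np.sum((X - Xbar) ** 2)          # sum of (Xi - Xbar)^2
Syy = np.sum((Y - Ybar) ** 2)          # sum of (Yi - Ybar)^2
Sxy = np.sum((X - Xbar) * (Y - Ybar))  # sum of (Xi - Xbar)(Yi - Ybar)

beta0, beta1 = 0.7, 1.5  # arbitrary trial values, not the minimizers

direct = np.sum((Y - beta0 - beta1 * X) ** 2)
term1 = n * (Ybar - beta0 - beta1 * Xbar) ** 2
term2 = (beta1 - Sxy / Sxx) ** 2 * Sxx
term3 = Syy * (1 - Sxy ** 2 / (Sxx * Syy))

print(np.isclose(direct, term1 + term2 + term3))  # True
```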

We’re still trying to minimize the SSE, and we’ve split the SSE into the sum of three terms.
Note that the first two terms involve the parameters β0 and β1. Neither of those two terms
can ever be less than zero: the first is n times a square, and the second is a square times
the nonnegative sum $\sum_{i=1}^n (X_i - \bar{X})^2$. The third term is only a function of the data, not of the
parameters. So, we know that
$$
\begin{aligned}
SSE &= n(\bar{Y} - \beta_0 - \beta_1 \bar{X})^2 + \left( \beta_1 - \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} \right)^2 \sum_{i=1}^n (X_i - \bar{X})^2 \\
&\quad + \sum_{i=1}^n (Y_i - \bar{Y})^2 \left[ 1 - \left( \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2}} \right)^2 \right] \\
&\ge \sum_{i=1}^n (Y_i - \bar{Y})^2 \left[ 1 - \left( \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2}} \right)^2 \right]
\end{aligned}
$$

This is the minimum possible value for the SSE. We actually achieve this minimum value
when the first two terms of the equation above are zero. Setting each of these two terms
equal to zero gives us two equations in two unknowns, so we can solve for β0 and β1. (Since
$\sum_{i=1}^n (X_i - \bar{X})^2 > 0$ whenever the $X_i$ are not all equal, the positive multiplier on the second
term can be dropped.)

$$0 = n(\bar{Y} - \beta_0 - \beta_1 \bar{X})^2$$

$$0 = \left( \beta_1 - \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} \right)^2$$

From the first equation we get


$$
\begin{aligned}
0 &= n(\bar{Y} - \beta_0 - \beta_1 \bar{X})^2 \\
\Rightarrow\quad 0 &= \bar{Y} - \beta_0 - \beta_1 \bar{X} \\
\Rightarrow\quad \beta_0 &= \bar{Y} - \beta_1 \bar{X}
\end{aligned}
$$

From the second equation we get


$$
\begin{aligned}
0 &= \left( \beta_1 - \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} \right)^2 \\
\Rightarrow\quad 0 &= \beta_1 - \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} \\
\Rightarrow\quad \beta_1 &= \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}
\end{aligned}
$$

As these are estimates, we put hats on them. We are done! We’ve now shown that
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.$$
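As a final sanity check, the closed-form estimates can be compared against a library fit, and the minimized SSE against the lower bound derived above. A sketch using numpy.polyfit (data made up as before):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
Xbar, Ybar = X.mean(), Y.mean()

beta1_hat = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)
beta0_hat = Ybar - beta1_hat * Xbar

# polyfit with degree 1 also minimizes the SSE; it returns the
# coefficients highest degree first, i.e. (slope, intercept).
slope, intercept = np.polyfit(X, Y, 1)
print(np.allclose([beta1_hat, beta0_hat], [slope, intercept]))  # True

# The minimized SSE equals sum((Yi - Ybar)^2) * (1 - r^2),
# where r is the sample correlation between X and Y.
r = np.corrcoef(X, Y)[0, 1]
sse_min = np.sum((Y - beta0_hat - beta1_hat * X) ** 2)
print(np.isclose(sse_min, np.sum((Y - Ybar) ** 2) * (1 - r ** 2)))  # True
```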
