
Applied Econometrics

Department of Economics
Stern School of Business

Applied Econometrics
3. Linear Least Squares
Vocabulary
Some terms to be used in the discussion.
Population characteristics and entities vs. sample
quantities and analogs
Residuals and disturbances
Population regression line and sample regression line
Objective: Learn about the conditional mean
function. Estimate β and σ².

First step: Mechanics of fitting a line
(hyperplane) to a set of data
Fitting Criteria
The set of points in the sample
Fitting criteria - what are they:
LAD
Least squares
and so on
Why least squares? (We do not call it ordinary at
this point.)
A fundamental result:
Sample moments are good estimators of
their population counterparts
We will spend the next few weeks using this principle
and applying it to least squares computation.
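As a quick illustration of this principle, here is a minimal simulated sketch (not part of the original notes; the population values are assumptions chosen for the example):

```python
import numpy as np

# Simulated illustration: sample moments estimate their population counterparts.
# Assumed population for this example: x ~ N(2, 1.5^2), so E[x] = 2, Var[x] = 2.25.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)

print(x.mean())       # sample mean, close to 2.0
print(x.var(ddof=0))  # sample variance, close to 2.25
```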
An Analogy Principle
In the population, E[y|X] = Xβ, so
E[y - Xβ | X] = 0.
Continuing, E[x_i ε_i] = 0.
Summing: Σ_i E[x_i ε_i] = Σ_i 0 = 0.
Exchange Σ_i and E: E[Σ_i x_i ε_i] = E[X'ε] = 0,
so E[X'(y - Xβ)] = 0.

Choose b, the estimator of β, to mimic this population result: i.e.,
mimic the population mean with the sample mean.

Find b such that
(1/n) X'e = 0, i.e., (1/n) X'(y - Xb) = 0.

As we will see, the solution is the least squares coefficient vector.
Population and Sample Moments
We showed that E[ε_i | x_i] = 0 and Cov[x_i, ε_i] = 0.
If so, and if E[y|X] = Xβ, then

β = (Var[x_i])^-1 Cov[x_i, y_i].

This will provide a population analog to the
statistics we compute with the data.
An updated version of the data, 1950-2004, is used in the problem sets.
Least Squares
The example will be a regression of y_i = G_i on
x_i = [a constant, PG_i, and Y_i] = [1, PG_i, Y_i].
Fitting criterion: the fitted equation will be
ŷ_i = b_1 x_i1 + b_2 x_i2 + ... + b_K x_iK.
The criterion is based on the residuals:
e_i = y_i - (b_1 x_i1 + b_2 x_i2 + ... + b_K x_iK).
Make the e_i as small as possible.
Form a criterion and minimize it.

Fitting Criteria
Sum of residuals: Σ_{i=1}^n e_i
Sum of squares: Σ_{i=1}^n e_i^2
Sum of absolute values of residuals: Σ_{i=1}^n |e_i|
Absolute value of sum of residuals: |Σ_{i=1}^n e_i|

We focus on the sum of squares, Σ_{i=1}^n e_i^2, now and on the
sum of absolute values, Σ_{i=1}^n |e_i|, later.
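The criteria are straightforward to compute for any candidate coefficient vector. A small sketch with simulated data (function and variable names are illustrative, not from the notes):

```python
import numpy as np

# Sketch: the four fitting criteria above, for an arbitrary candidate b.
def fitting_criteria(y, X, b):
    e = y - X @ b
    return {
        "sum of residuals": e.sum(),
        "sum of squares": (e ** 2).sum(),
        "sum of absolute values": np.abs(e).sum(),
        "absolute value of sum": abs(e.sum()),
    }

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)

print(fitting_criteria(y, X, np.array([0.0, 0.0])))               # a poor guess
print(fitting_criteria(y, X, np.linalg.solve(X.T @ X, X.T @ y)))  # least squares b
```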

Least Squares Algebra

Σ_{i=1}^n e_i^2 = e'e = (y - Xb)'(y - Xb)

A digression on multivariate calculus.
Matrix and vector derivatives.
Derivative of a scalar with respect to a vector
Derivative of a column vector wrt a row vector
Other derivatives
Least Squares Normal Equations
∂(y - Xb)'(y - Xb)/∂b = -2X'(y - Xb) = 0

Note: the derivative of a 1x1 scalar wrt a Kx1 vector is a Kx1 vector:
(1x1)/(Kx1): (-2)(nxK)'(nx1) = (-2)(Kxn)(nx1) = Kx1.

Solution: X'y = X'Xb
Least Squares Solution
Assuming it exists: b = (X'X)^-1 (X'y)
Note the analogy: β = (Var[x])^-1 Cov[x, y]
                  b = [(1/n) X'X]^-1 [(1/n) X'y]
Suggests something desirable about least squares.
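A minimal sketch of the solution and of the variance/covariance analogy, using simulated data (the data-generating values are assumptions for illustration):

```python
import numpy as np

# Simulated sketch of b = (X'X)^-1 X'y and the Var/Cov analogy.
rng = np.random.default_rng(3)
n = 1_000
x = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), x])
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(size=n)  # assumed coefficients

b = np.linalg.solve(X.T @ X, X.T @ y)      # numerically preferable to forming the inverse
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_lstsq))             # True

# Analogy beta = Var(x)^-1 Cov(x, y): the slope coefficients are reproduced
# exactly by the sample variances and covariances of the nonconstant regressors.
Vxx = np.cov(x, rowvar=False)              # 2x2 sample variance matrix of x
Cxy = np.cov(x, y, rowvar=False)[:2, 2]    # covariances of x1, x2 with y
print(np.linalg.solve(Vxx, Cxy), b[1:])    # the two vectors agree
```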
Second Order Conditions
∂(y - Xb)'(y - Xb)/∂b = -2X'(y - Xb)   (a column vector)

∂²(y - Xb)'(y - Xb)/∂b∂b'
   = ∂[-2X'(y - Xb)]/∂b'   (derivative of a column vector wrt a row vector)
   = 2X'X
Does b Minimize e'e?

∂²e'e/∂b∂b' = 2X'X

        | Σ_i x_i1^2     Σ_i x_i1 x_i2   ...   Σ_i x_i1 x_iK |
  = 2   | Σ_i x_i2 x_i1  Σ_i x_i2^2      ...   Σ_i x_i2 x_iK |
        | ...            ...             ...   ...           |
        | Σ_i x_iK x_i1  Σ_i x_iK x_i2   ...   Σ_i x_iK^2    |

(all sums over i = 1, ..., n)

If there were a single b, we would require this to be
positive, which it would be: 2 Σ_{i=1}^n x_i^2 = 2 x'x > 0.
The matrix counterpart of a positive number is a
positive definite matrix.
Sample Moments - Algebra
        | Σ_i x_i1^2     Σ_i x_i1 x_i2   ...   Σ_i x_i1 x_iK |
X'X  =  | Σ_i x_i2 x_i1  Σ_i x_i2^2      ...   Σ_i x_i2 x_iK |
        | ...            ...             ...   ...           |
        | Σ_i x_iK x_i1  Σ_i x_iK x_i2   ...   Σ_i x_iK^2    |

                 | x_i1 |
   = Σ_{i=1}^n   | x_i2 |  [ x_i1  x_i2  ...  x_iK ]
                 | ...  |
                 | x_iK |

   = Σ_{i=1}^n x_i x_i'
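A short check, with simulated data, that X'X equals the sum of the observation-level outer products x_i x_i':

```python
import numpy as np

# Sketch: X'X equals the sum over observations of the outer products x_i x_i'.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))

XtX = X.T @ X
sum_outer = sum(np.outer(xi, xi) for xi in X)  # sum over i of x_i x_i'
print(np.allclose(XtX, sum_outer))             # True
```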
Positive Definite Matrix
A matrix C is positive definite if a'Ca > 0
for any a ≠ 0.
Generally hard to check. Requires a look at
characteristic roots (later in the course).
For some matrices, it is easy to verify. X'X is
one of these:
a'X'Xa = (a'X')(Xa) = (Xa)'(Xa) = v'v = Σ_k v_k^2 ≥ 0.
Could v = 0?
Conclusion: b = (X'X)^-1 X'y does indeed minimize e'e.
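A small numerical sketch of the argument with simulated data: a'(X'X)a equals (Xa)'(Xa), which is nonnegative, and the characteristic roots of X'X are positive when X has full column rank.

```python
import numpy as np

# Sketch: a'(X'X)a = (Xa)'(Xa) = v'v >= 0, and strictly positive whenever
# Xa != 0, i.e., when X has full column rank.
rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4))
a = rng.normal(size=4)

v = X @ a
print(a @ (X.T @ X) @ a, v @ v)               # the two numbers agree

# Characteristic roots: X'X is positive definite iff all eigenvalues are > 0.
print(np.linalg.eigvalsh(X.T @ X).min() > 0)  # True for full-rank X
```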
Algebraic Results - 1
In the population: E[X'ε] = 0

In the sample: (1/n) Σ_{i=1}^n x_i e_i = 0
Residuals vs. Disturbances
Disturbances (population): y_i = x_i'β + ε_i
Partitioning y:  y = E[y|X] + ε
                   = conditional mean + disturbance

Residuals (sample): y_i = x_i'b + e_i
Partitioning y:  y = Xb + e
                   = projection + residual
(Note: the projection is into the column space of X.)
Algebraic Results - 2
The residual maker: M = I - X(X'X)^-1 X'
e = y - Xb = y - X(X'X)^-1 X'y = My
MX = 0 (This result is fundamental!)
How do we interpret this result in terms of residuals?
(Therefore) My = MXb + Me = Me = e
(You should be able to prove this.)
y = Py + My, where P = X(X'X)^-1 X' = I - M.
PM = MP = 0. (P is the projection matrix.)
Py is the projection of y into the column space of X.
(New term?)
The M Matrix
M = I - X(X'X)^-1 X' is an nxn matrix.
M is symmetric: M = M'.
M is idempotent: MM = M.
(Just multiply it out.)
M is singular: M^-1 does not exist.
(We will prove this later as a side result in
another derivation.)
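A brief numerical check of these properties with simulated data (the sample size and regressors are illustrative assumptions):

```python
import numpy as np

# Simulated check of the properties of M (and of P = I - M).
rng = np.random.default_rng(6)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)  # P = X (X'X)^-1 X'
M = np.eye(n) - P

b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(M @ X, 0))           # MX = 0
print(np.allclose(M, M.T))             # symmetric
print(np.allclose(M @ M, M))           # idempotent
print(np.allclose(P @ M, 0))           # PM = 0
print(np.allclose(M @ y, y - X @ b))   # My = e, the least squares residuals
print(np.linalg.matrix_rank(M))        # n - K = 47, so M is singular
```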
Results when X Contains a
Constant Term
X = [1, x_2, ..., x_K]
The first column of X is a column of ones.
Since X'e = 0, x_1'e = Σ_{i=1}^n e_i = 0: the residuals sum to zero.

Define i = [1, 1, ..., 1]', a column of n ones.
Then i'y = Σ_{i=1}^n y_i = n ȳ.
y = Xb + e implies i'y = i'Xb + i'e = i'Xb.
Dividing by n: ȳ = x̄'b
(the regression line passes through the means).
These results do not apply if the model has no
constant term.
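A small sketch with simulated data confirming both results (the coefficient values are illustrative assumptions):

```python
import numpy as np

# Simulated check: with a constant term, the residuals sum to zero and the
# regression passes through the means.
rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([3.0, 1.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

print(e.sum())                       # numerically zero
print(y.mean(), X.mean(axis=0) @ b)  # ybar equals xbar'b
```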
Least Squares Algebra
Least Squares Residuals
Least Squares Algebra-3
M is nxn, potentially huge.
Least Squares Algebra-4
