
Ordinary Least Squares

Rómulo A. Chumacero

Motivation

Economics is (essentially) an observational science
Theory provides guidance on the relationships between variables
Example: Monetary policy and macroeconomic conditions

What?: Properties of OLS
Why?: Most commonly used estimation technique
How?: From simple to more complex

Outline
1. Simple (bivariate) linear regression
2. General framework for regression analysis
3. OLS estimator and its properties
4. CLS (OLS estimation subject to linear constraints)
5. Inference (tests for linear constraints)
6. Prediction

An Example

Figure 1: Growth and Government size

Correlation Coefficient

Intended to measure the direction and closeness of linear association
Observations: $\{y_t, x_t\}_{t=1}^{T}$
Data expressed in deviations from the (sample) mean:
$$\tilde{z}_t = z_t - \bar{z}, \qquad \bar{z} = \frac{1}{T}\sum_{t=1}^{T} z_t, \qquad z = y, x$$
$$\mathrm{Cov}(y,x) = \mathrm{E}(yx) - \mathrm{E}(y)\,\mathrm{E}(x), \qquad s_{yx} = \frac{1}{T}\sum_{t=1}^{T}\tilde{x}_t \tilde{y}_t$$
which depends on the units in which $x$ and $y$ are measured
The correlation coefficient is a measure of linear association independent of units:
$$\rho = \frac{1}{T}\sum_{t=1}^{T}\frac{\tilde{x}_t}{s_x}\frac{\tilde{y}_t}{s_y} = \frac{s_{yx}}{s_x s_y}, \qquad s_z = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\tilde{z}_t^2}, \qquad z = y, x$$
Limits: $-1 \le \rho \le 1$ (applying the Cauchy-Schwarz inequality)
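
A minimal numerical sketch of these formulas (hypothetical data; NumPy only, using the $1/T$ convention above):

```python
import numpy as np

# hypothetical sample {y_t, x_t}, t = 1..T
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
T = len(y)

# deviations from the sample means
x_dev, y_dev = x - x.mean(), y - y.mean()

# covariance and standard deviations with the 1/T convention
s_xy = (x_dev * y_dev).sum() / T
s_x = np.sqrt((x_dev ** 2).sum() / T)
s_y = np.sqrt((y_dev ** 2).sum() / T)

rho = s_xy / (s_x * s_y)   # correlation coefficient, always in [-1, 1]
print(rho)
```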

Caution

Fallacy: Post hoc, ergo propter hoc (after this, therefore because of this)
Correlation is not causation
Numerical and statistical significance may mean nothing
Nonsense (spurious) correlation
Yule (1926): death rate vs. proportion of marriages in the Church of England (1866-1911): $\rho = 0.95$
Ironic: to achieve immortality, close the church!
A few more recent examples

Ice Cream causes Crime

Figure 2: Nonsense 1

Yet another reason to hate Bill Gates

Figure 3: Nonsense 2

Facebook and the Greeks

Figure 4: Nonsense 3

Let's save the pirates

Figure 5: Nonsense 4

Divine justice

Figure 6: Nonsense 5?

Simple linear regression model

Economics as a remedy for nonsense (correlation does not indicate the direction of dependence)
Take a stance:
$$y_t = \beta_1 + \beta_2 x_t + u_t$$
Linear
Dependent / independent
Systematic / unpredictable
$T$ observations, 2 unknowns
Infinite possible solutions:
Fit a line by eye
Choose two pairs of observations and join them
Minimize the distance between $y$ and the predictable component:
$$\min \sum_t |u_t| \;\Rightarrow\; \text{LAD} \qquad\qquad \min \sum_t u_t^2 \;\Rightarrow\; \text{OLS}$$

Our Example

Figure 7: Growth and Government size

Our Example

Figure 8: Linear regression

Simple linear regression model

Define the sum of squares of the residuals (SSR) function as:
$$S(\beta) = \sum_{t=1}^{T}\left(y_t - \beta_1 - \beta_2 x_t\right)^2$$
Estimator: formula for estimating the unknown parameters
Estimate: numerical value obtained when sample data are substituted into the formula
The OLS estimator $\hat{\beta}$ minimizes $S(\beta)$. FONC:
$$\left.\frac{\partial S(\beta)}{\partial \beta_1}\right|_{\hat{\beta}} = -2\sum_t \left(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_t\right) = 0$$
$$\left.\frac{\partial S(\beta)}{\partial \beta_2}\right|_{\hat{\beta}} = -2\sum_t x_t\left(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_t\right) = 0$$
Two equations, two unknowns:
$$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}, \qquad \hat{\beta}_2 = \frac{s_{yx}}{s_x^2} = \rho\,\frac{s_y}{s_x} = \frac{\sum_{t=1}^{T}\tilde{x}_t \tilde{y}_t}{\sum_{t=1}^{T}\tilde{x}_t^2}$$
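
A short sketch of these closed-form expressions (hypothetical data, NumPy only):

```python
import numpy as np

# hypothetical sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_dev, y_dev = x - x.mean(), y - y.mean()

# slope and intercept from the formulas above
beta2_hat = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
beta1_hat = y.mean() - beta2_hat * x.mean()

residuals = y - beta1_hat - beta2_hat * x
# sample checks: residuals sum to zero and are uncorrelated with x
print(beta1_hat, beta2_hat, residuals.sum(), (residuals * x).sum())
```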

Simple linear regression model

Properties:
$\hat{\beta}_1, \hat{\beta}_2$ minimize the SSR
The OLS line passes through the mean point $(\bar{x}, \bar{y})$
$\hat{u}_t = y_t - \hat{\beta}_1 - \hat{\beta}_2 x_t$ are uncorrelated (in the sample) with $x_t$

Figure 9: SSR

General Framework

Observational data $\{w_1, w_2, \ldots, w_T\}$
Partition $w_t = (y_t, x_t)$ where $y_t \in \mathbb{R}$, $x_t \in \mathbb{R}^k$
Joint density: $f(y_t, x_t; \theta)$, with $\theta$ a vector of unknown parameters
Conditional distribution: $f(y_t, x_t; \theta) = f(y_t \mid x_t; \theta_1)\, f(x_t; \theta_2)$, where $f(x_t; \theta_2) = \int_y f(y_t, x_t; \theta)\, dy$
Regression analysis: statistical inferences on $\theta_1$
Ignore $f(x_t; \theta_2)$ provided $\theta_1$ and $\theta_2$ are variation free
$y$: dependent or endogenous variable; $x$: vector of independent or exogenous variables
Conditional mean: $m(x_t; \theta_3)$. Conditional variance: $g(x_t; \theta_4)$
$$m(x_t; \theta_3) = \mathrm{E}\left(y_t \mid x_t; \theta_3\right) = \int_y y\, f(y \mid x_t; \theta_1)\, dy$$
$$g(x_t; \theta_4) = \int_y y^2 f(y \mid x_t; \theta_1)\, dy - \left[m(x_t; \theta_3)\right]^2$$
$u_t$: difference between $y_t$ and the conditional mean:
$$y_t = m(x_t; \theta_3) + u_t \qquad (1)$$

General Framework

Proposition 1 (Properties of $u_t$)
1. $\mathrm{E}(u_t \mid x_t) = 0$
2. $\mathrm{E}(u_t) = 0$
3. $\mathrm{E}[h(x_t)\, u_t] = 0$ for any function $h(\cdot)$
4. $\mathrm{E}(x_t u_t) = 0$

Proof. 1. By definition of $u_t$ and linearity of conditional expectations,
$$\mathrm{E}(u_t \mid x_t) = \mathrm{E}\left[y_t - m(x_t) \mid x_t\right] = \mathrm{E}\left[y_t \mid x_t\right] - \mathrm{E}\left[m(x_t) \mid x_t\right] = m(x_t) - m(x_t) = 0$$
2. By the law of iterated expectations and the first result,
$$\mathrm{E}(u_t) = \mathrm{E}\left[\mathrm{E}(u_t \mid x_t)\right] = \mathrm{E}(0) = 0$$
3. By essentially the same argument,
$$\mathrm{E}\left[h(x_t)\, u_t\right] = \mathrm{E}\left\{\mathrm{E}\left[h(x_t)\, u_t \mid x_t\right]\right\} = \mathrm{E}\left\{h(x_t)\, \mathrm{E}\left[u_t \mid x_t\right]\right\} = \mathrm{E}\left[h(x_t)\cdot 0\right] = 0$$
4. Follows from the third result setting $h(x_t) = x_t$.

General Framework

(1) + first result of Proposition 1 = regression framework:
$$y_t = m(x_t; \theta_3) + u_t, \qquad \mathrm{E}(u_t \mid x_t) = 0$$
Important: this is a framework, not a model: it holds true by definition.
$m(\cdot)$ and $g(\cdot)$ can take any shape
If $m(\cdot)$ is linear: Linear Regression Model (LRM).
$$m(x_t; \theta_3) = x_t'\beta$$
$$Y_{T\times 1} = \begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix}, \qquad X_{T\times k} = \begin{bmatrix} x_1' \\ \vdots \\ x_T' \end{bmatrix} = \begin{bmatrix} x_{1,1} & \cdots & x_{1,k} \\ \vdots & \ddots & \vdots \\ x_{T,1} & \cdots & x_{T,k} \end{bmatrix}, \qquad u_{T\times 1} = \begin{bmatrix} u_1 \\ \vdots \\ u_T \end{bmatrix}$$

Regression models

Definition 1. The Linear Regression Model (LRM) is:
1. $y_t = x_t'\beta + u_t$ or $Y = X\beta + u$
2. $\mathrm{E}(u_t \mid x_t) = 0$
3. $\mathrm{rank}(X) = k$ or $\det(X'X) \neq 0$
4. $\mathrm{E}(u_t u_s) = 0 \;\;\forall\, t \neq s$

Definition 2. The Homoskedastic Linear Regression Model (HLRM) is the LRM plus
5. $\mathrm{E}\left(u_t^2 \mid x_t\right) = \sigma^2$ or $\mathrm{E}(uu' \mid X) = \sigma^2 I_T$

Definition 3. The Normal Linear Regression Model (NLRM) is the LRM plus
6. $u_t \sim N\left(0, \sigma^2\right)$

Definition of the OLS Estimator

Define the sum of squares of the residuals (SSR) function as:
$$S(\beta) = (Y - X\beta)'(Y - X\beta) = Y'Y - 2Y'X\beta + \beta'X'X\beta$$
The OLS estimator $\hat{\beta}$ minimizes $S(\beta)$. FONC:
$$\left.\frac{\partial S(\beta)}{\partial \beta}\right|_{\hat{\beta}} = -2X'Y + 2X'X\hat{\beta} = 0$$
which yields the normal equations $X'Y = X'X\hat{\beta}$.

Proposition 2. $\hat{\beta} = (X'X)^{-1}(X'Y)$ is the $\arg\min_{\beta} S(\beta)$

Proof. Using the normal equations: $\hat{\beta} = (X'X)^{-1}(X'Y)$. SOSC:
$$\left.\frac{\partial^2 S(\beta)}{\partial \beta \partial \beta'}\right|_{\hat{\beta}} = 2X'X$$
then $\hat{\beta}$ is a minimum as $X'X$ is a positive definite matrix.

Important implications:
$\hat{\beta}$ is a linear function of $Y$
$\hat{\beta}$ is a random variable (a function of $X$ and $Y$)
$X'X$ must be of full rank
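
A compact sketch of the matrix formulas (simulated data; solving the normal equations rather than forming $(X'X)^{-1}$ explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # include a constant
beta_true = np.array([1.0, 0.5, -2.0])
Y = X @ beta_true + rng.normal(scale=0.7, size=T)

# normal equations X'Y = X'X beta_hat, solved without explicit inversion
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat
print(beta_hat, (X.T @ u_hat).round(10))  # residuals are orthogonal to the columns of X
```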

Interpretation

Define the least squares residuals
$$\hat{u} = Y - X\hat{\beta} \qquad (2)$$
$$\hat{\sigma}^2 = \frac{1}{T}\hat{u}'\hat{u}$$
$$Y = X\hat{\beta} + \hat{u} = PY + MY, \quad \text{where } P = X(X'X)^{-1}X' \text{ and } M = I - P$$

Proposition 3. Let $A$ be an $n \times r$ matrix of rank $r$. A matrix of the form $P = A(A'A)^{-1}A'$ is called a projection matrix and has the following properties:
i) $P = P' = P^2$ (hence $P$ is symmetric and idempotent)
ii) $\mathrm{rank}(P) = r$
iii) the characteristic roots (eigenvalues) of $P$ consist of $r$ ones and $n-r$ zeros
iv) if $Z = Ac$ for some vector $c$, then $PZ = Z$ (hence the word projection)
v) $M = I - P$ is also idempotent with rank $n-r$, its eigenvalues consist of $n-r$ ones and $r$ zeros, and if $Z = Ac$, then $MZ = 0$
vi) $P$ can be written as $G'G$, where $GG' = I$, or as $h_1 h_1' + h_2 h_2' + \ldots + h_r h_r'$, where each $h_i$ is a vector and $r = \mathrm{rank}(P)$
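
A quick numerical check of the projection-matrix properties (simulated $X$; NumPy only):

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 50, 3
X = rng.normal(size=(T, k))

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the column space of X
M = np.eye(T) - P

print(np.allclose(P, P.T), np.allclose(P @ P, P))   # symmetric and idempotent
print(np.isclose(np.trace(P), k))                   # trace = rank = k
print(np.allclose(M @ X, 0))                        # M annihilates the columns of X
```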

Interpretation

$$Y = X\hat{\beta} + \hat{u} = PY + MY$$

Figure 10: Orthogonal decomposition of $Y$ ($PY$ lies in $\mathrm{Col}(X)$; $MY$ is orthogonal to it)

The Mean of $\hat{\beta}$

Proposition 4. In the LRM, $\mathrm{E}\left[\left(\hat{\beta} - \beta\right) \mid X\right] = 0$ and $\mathrm{E}\,\hat{\beta} = \beta$

Proof. By the previous results,
$$\hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$
Then
$$\mathrm{E}\left[\left(\hat{\beta} - \beta\right) \mid X\right] = \mathrm{E}\left[(X'X)^{-1}X'u \mid X\right] = (X'X)^{-1}X'\,\mathrm{E}(u \mid X) = 0$$
Applying the law of iterated expectations, $\mathrm{E}\,\hat{\beta} = \mathrm{E}\left[\mathrm{E}\left(\hat{\beta} \mid X\right)\right] = \beta$

The Variance of $\hat{\beta}$

Proposition 5. In the HLRM, $\mathrm{V}\left(\hat{\beta} \mid X\right) = \sigma^2 (X'X)^{-1}$ and $\mathrm{V}\left(\hat{\beta}\right) = \sigma^2\,\mathrm{E}\left[(X'X)^{-1}\right]$

Proof. Since $\hat{\beta} - \beta = (X'X)^{-1}X'u$,
$$\mathrm{V}\left(\hat{\beta} \mid X\right) = \mathrm{E}\left[\left(\hat{\beta} - \beta\right)\left(\hat{\beta} - \beta\right)' \mid X\right] = \mathrm{E}\left[(X'X)^{-1}X'uu'X(X'X)^{-1} \mid X\right]$$
$$= (X'X)^{-1}X'\,\mathrm{E}\left[uu' \mid X\right]X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$$
Thus,
$$\mathrm{V}\left(\hat{\beta}\right) = \mathrm{E}\left[\mathrm{V}\left(\hat{\beta} \mid X\right)\right] + \mathrm{V}\left[\mathrm{E}\left(\hat{\beta} \mid X\right)\right] = \sigma^2\,\mathrm{E}\left[(X'X)^{-1}\right]$$
Important features of $\mathrm{V}\left(\hat{\beta} \mid X\right) = \sigma^2 (X'X)^{-1}$:
Grows proportionally with $\sigma^2$
Decreases with the sample size
Decreases with the volatility of $X$

The Mean and Variance of $\hat{\sigma}^2$

Proposition 6. In the LRM, $\hat{\sigma}^2$ is biased.

Proof. We know that $\hat{u} = MY$. It is trivial to verify that $\hat{u} = Mu$. Then, $\hat{\sigma}^2 = \frac{1}{T}\hat{u}'\hat{u} = \frac{1}{T}u'Mu$. This implies that
$$\mathrm{E}\left(\hat{\sigma}^2 \mid X\right) = \tfrac{1}{T}\,\mathrm{E}\left[u'Mu \mid X\right] = \tfrac{1}{T}\,\mathrm{tr}\,\mathrm{E}\left[u'Mu \mid X\right] = \tfrac{1}{T}\,\mathrm{E}\left[\mathrm{tr}\left(u'Mu\right) \mid X\right] = \tfrac{1}{T}\,\mathrm{E}\left[\mathrm{tr}\left(Muu'\right) \mid X\right] = \tfrac{1}{T}\,\sigma^2\,\mathrm{tr}(M) = \sigma^2\,\frac{T-k}{T}$$
Applying the law of iterated expectations we obtain $\mathrm{E}\,\hat{\sigma}^2 = \sigma^2 (T-k)/T$

Unbiased estimator: $\tilde{\sigma}^2 = (T-k)^{-1}\hat{u}'\hat{u}$.

Proposition 7. In the NLRM, $\mathrm{V}\,\hat{\sigma}^2 = \dfrac{2\,(T-k)\,\sigma^4}{T^2}$

Important:
With the exception of Proposition 7, normality is not required
$\hat{\sigma}^2$ is biased, but it is the MLE under normality and is consistent
The variances of $\hat{\beta}$ and $\hat{\sigma}^2$ depend on $\sigma^2$. In practice, $\hat{\mathrm{V}}\left(\hat{\beta}\right) = \tilde{\sigma}^2 (X'X)^{-1}$
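
A sketch of the unbiased variance estimator and the estimated covariance matrix of $\hat{\beta}$ (simulated data, as in the earlier sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
Y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=0.7, size=T)

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ Y)
u_hat = Y - X @ beta_hat

sigma2_tilde = (u_hat @ u_hat) / (T - k)      # unbiased estimator of sigma^2
V_hat = sigma2_tilde * np.linalg.inv(XtX)     # estimated V(beta_hat | X)
se = np.sqrt(np.diag(V_hat))                  # conventional (homoskedastic) standard errors
print(beta_hat, se)
```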

$\hat{\beta}$ is BLUE

Theorem 1 (Gauss-Markov). $\hat{\beta}$ is BLUE

Proof. Let $A = (X'X)^{-1}X'$, so $\hat{\beta} = AY$. Consider any other linear estimator $b = (A + C)Y$. Then,
$$\mathrm{E}(b \mid X) = (X'X)^{-1}X'X\beta + CX\beta = (I + CX)\,\beta$$
For $b$ to be unbiased we require $CX = 0$, then:
$$\mathrm{V}(b \mid X) = \mathrm{E}\left[(A + C)\,uu'\,(A + C)' \mid X\right]$$
As $(A + C)(A + C)' = (X'X)^{-1} + CC'$, we obtain
$$\mathrm{V}(b \mid X) = \mathrm{V}\left(\hat{\beta} \mid X\right) + \sigma^2 CC'$$
As $CC'$ is p.s.d. we have $\mathrm{V}(b \mid X) \geq \mathrm{V}\left(\hat{\beta} \mid X\right)$

Despite its popularity, Gauss-Markov is not very powerful:
It restricts the quest to linear and unbiased estimators
There may be nonlinear or biased estimators that do better (lower MSE)
OLS is not BLUE when homoskedasticity is relaxed

Asymptotics I

Unbiasedness is not that useful in practice (frequentist perspective)
It is also not common in general contexts
Asymptotic theory: properties of estimators when the sample size is infinitely large
Cornerstones: LLN (consistency) and CLT (inference)

Definition 4 (Convergence in probability). A sequence of real or vector valued random variables $\{x_t\}$ is said to converge to $x$ in probability if
$$\lim_{T\to\infty} \Pr\left(\|x_T - x\| > \varepsilon\right) = 0 \;\text{ for any } \varepsilon > 0$$
We write $x_T \xrightarrow{p} x$ or $\mathrm{plim}\, x_T = x$.

Definition 5 (Convergence in mean square). $\{x_t\}$ converges to $x$ in mean square if
$$\lim_{T\to\infty} \mathrm{E}\left(x_T - x\right)^2 = 0$$
We write $x_T \xrightarrow{M} x$.

Definition 6 (Almost sure convergence). $\{x_t\}$ converges to $x$ almost surely if
$$\Pr\left[\lim_{T\to\infty} x_T = x\right] = 1$$
We write $x_T \xrightarrow{a.s.} x$.

Definition 7. The estimator $\hat{\theta}_T$ of $\theta_0$ is said to be a weakly consistent estimator if $\hat{\theta}_T \xrightarrow{p} \theta_0$.

Definition 8. The estimator $\hat{\theta}_T$ of $\theta_0$ is said to be a strongly consistent estimator if $\hat{\theta}_T \xrightarrow{a.s.} \theta_0$.

Laws of Large Numbers and Consistency of $\hat{\beta}$

Theorem 2 (WLLN1, Chebyshev). Let $\mathrm{E}(x_t) = \mu_t$, $\mathrm{V}(x_t) = \sigma_t^2$, $\mathrm{Cov}(x_i, x_j) = 0 \;\forall\, i \neq j$. If $\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T}\sigma_t^2 \leq M < \infty$, then
$$\bar{x}_T - \bar{\mu}_T \xrightarrow{p} 0$$

Theorem 3 (SLLN1, Kolmogorov). Let $\{x_t\}$ be independent with finite variance $\mathrm{V}(x_t) = \sigma_t^2 < \infty$. If $\sum_{t=1}^{\infty}\sigma_t^2/t^2 < \infty$, then
$$\bar{x}_T - \bar{\mu}_T \xrightarrow{a.s.} 0$$

Assume that $\frac{1}{T}X'X \to Q$ (invertible and nonstochastic). Then
$$\hat{\beta} - \beta = (X'X)^{-1}X'u = \left(\tfrac{1}{T}X'X\right)^{-1}\left(\tfrac{1}{T}X'u\right) \xrightarrow{p} 0$$
$\hat{\beta}$ is consistent: $\hat{\beta} \xrightarrow{p} \beta$
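
A small simulation sketch of consistency (the estimation error of $\hat{\beta}$ shrinks as $T$ grows; simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, 0.5])

for T in (50, 500, 5000, 50000):
    X = np.column_stack([np.ones(T), rng.normal(size=T)])
    Y = X @ beta + rng.normal(size=T)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    print(T, np.abs(beta_hat - beta).max())   # estimation error shrinks with T
```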

Analysis of Variance (ANOVA)

$$Y = \hat{Y} + \hat{u}$$
$$Y - \bar{Y} = \left(\hat{Y} - \bar{Y}\right) + \hat{u}, \qquad \text{where } \bar{Y} = \iota\bar{y} \text{ and } \iota \text{ is a vector of ones}$$
$$\left(Y - \bar{Y}\right)'\left(Y - \bar{Y}\right) = \left(\hat{Y} - \bar{Y}\right)'\left(\hat{Y} - \bar{Y}\right) + 2\left(\hat{Y} - \bar{Y}\right)'\hat{u} + \hat{u}'\hat{u}$$
but $\hat{Y}'\hat{u} = Y'PMY = 0$ and $\bar{Y}'\hat{u} = \bar{y}\,\iota'\hat{u} = 0$. Thus
$$\left(Y - \bar{Y}\right)'\left(Y - \bar{Y}\right) = \left(\hat{Y} - \bar{Y}\right)'\left(\hat{Y} - \bar{Y}\right) + \hat{u}'\hat{u}$$
This is called the ANOVA formula, often written as
$$TSS = ESS + SSR$$
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} = 1 - \frac{Y'MY}{Y'LY}, \qquad L = I_T - \frac{1}{T}\iota\iota'$$
If the regressors include a constant, $0 \leq R^2 \leq 1$.

Analysis of Variance (ANOVA)

$R^2$ measures the percentage of the variance of $Y$ accounted for by the variation of $\hat{Y}$
It is not a measure of goodness of fit
It doesn't explain anything
It is not even clear whether $R^2$ has an interpretation in terms of forecast performance

Model 1: $y_t = x_t\beta + u_t$
Model 2: $y_t - x_t = x_t\gamma + u_t$ with $\gamma = \beta - 1$
The two models are mathematically identical and yield the same implications and forecasts
Yet the reported $R^2$ will differ greatly
Suppose $\beta \simeq 1$. In the second model $R^2 \simeq 0$, while in the first model it can be arbitrarily close to one

$R^2$ increases as regressors are added. Theil proposed:
$$\bar{R}^2 = 1 - \frac{SSR}{TSS}\cdot\frac{T-1}{T-k} = 1 - \frac{\tilde{\sigma}^2}{\hat{\sigma}_y^2}$$
Not used that much today, as better model evaluation criteria have been developed
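
$R^2$ and Theil's adjusted $\bar{R}^2$ might be computed as follows (simulated data, NumPy only):

```python
import numpy as np

rng = np.random.default_rng(4)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
Y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat

ssr = u_hat @ u_hat
tss = ((Y - Y.mean()) ** 2).sum()
r2 = 1.0 - ssr / tss                                  # requires a constant in X
r2_adj = 1.0 - (ssr / tss) * (T - 1) / (T - k)        # Theil's adjustment
print(r2, r2_adj)
```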

OLS Estimator of a Subset of $\beta$

Partition $X = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$, $\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}$
Then $X'X\hat{\beta} = X'Y$ can be written as:
$$X_1'X_1\hat{\beta}_1 + X_1'X_2\hat{\beta}_2 = X_1'Y \qquad (3a)$$
$$X_2'X_1\hat{\beta}_1 + X_2'X_2\hat{\beta}_2 = X_2'Y \qquad (3b)$$
Solving for $\hat{\beta}_2$ and reinserting in (3a) we obtain
$$\hat{\beta}_1 = \left(X_1'M_2X_1\right)^{-1}X_1'M_2Y, \qquad \hat{\beta}_2 = \left(X_2'M_1X_2\right)^{-1}X_2'M_1Y$$
where $M_i = I - P_i = I - X_i(X_i'X_i)^{-1}X_i'$ (for $i = 1, 2$).

Theorem 4 (Frisch-Waugh-Lovell). $\hat{\beta}_2$ and $\hat{u}$ can be computed using the following algorithm:
1. Regress $Y$ on $X_1$, obtain residuals $\tilde{Y}$
2. Regress $X_2$ on $X_1$, obtain residuals $\tilde{X}_2$
3. Regress $\tilde{Y}$ on $\tilde{X}_2$, obtain $\hat{\beta}_2$ and residuals $\hat{u}$
FWL was originally used to speed up computation
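
A numerical check of the FWL algorithm on simulated data (the partialled-out regression reproduces $\hat{\beta}_2$ from the full regression):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 300
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])   # constant plus one control regressor
X2 = rng.normal(size=(T, 2))                              # regressors of interest
X = np.hstack([X1, X2])
Y = X @ np.array([1.0, -0.3, 0.8, 2.0]) + rng.normal(size=T)

# full regression
beta_full = np.linalg.solve(X.T @ X, X.T @ Y)

# FWL: residualize Y and X2 on X1, then regress the residuals
M1 = np.eye(T) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
Y_tilde, X2_tilde = M1 @ Y, M1 @ X2
beta2_fwl = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ Y_tilde)

print(np.allclose(beta_full[2:], beta2_fwl))   # True
```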

Application of FWL: Demeaning

Partition $X = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$ where $X_1 = \iota$ and $X_2$ is the matrix of observed regressors
$$\tilde{X}_2 = M_1 X_2 = X_2 - \iota(\iota'\iota)^{-1}\iota'X_2 = X_2 - \bar{X}_2$$
$$\tilde{Y} = M_1 Y = Y - \iota(\iota'\iota)^{-1}\iota'Y = Y - \bar{Y}$$
FWL states that $\hat{\beta}_2$ is the OLS estimate from the regression of $\tilde{Y}$ on $\tilde{X}_2$:
$$\hat{\beta}_2 = \left(\sum_{t=1}^{T}\tilde{x}_{2t}\tilde{x}_{2t}'\right)^{-1}\left(\sum_{t=1}^{T}\tilde{x}_{2t}\tilde{y}_t\right)$$
Thus the OLS estimator for the slope coefficients is a regression with demeaned data.

Constrained Least Squares (CLS)

Assume the following constraint must hold:
$$Q'\beta = c \qquad (4)$$
$Q$ is a $k \times q$ matrix of known constants and $c$ is a $q$-vector of known constants, with $q < k$ and $\mathrm{rank}(Q) = q$.
The CLS estimator of $\beta$ ($\bar{\beta}$) is the value of $\beta$ that minimizes the SSR subject to (4).
$$\mathcal{L}(\beta, \lambda) = (Y - X\beta)'(Y - X\beta) + 2\lambda'(Q'\beta - c)$$
$\lambda$ is a $q$-vector of Lagrange multipliers. FONC:
$$\left.\frac{\partial \mathcal{L}}{\partial \beta}\right|_{\bar{\beta}, \bar{\lambda}} = -2X'Y + 2X'X\bar{\beta} + 2Q\bar{\lambda} = 0$$
$$\left.\frac{\partial \mathcal{L}}{\partial \lambda}\right|_{\bar{\beta}, \bar{\lambda}} = Q'\bar{\beta} - c = 0$$
$$\bar{\beta} = \hat{\beta} - (X'X)^{-1}Q\left[Q'(X'X)^{-1}Q\right]^{-1}\left(Q'\hat{\beta} - c\right) \qquad (5)$$
$$\bar{\sigma}^2 = \frac{1}{T}\left(Y - X\bar{\beta}\right)'\left(Y - X\bar{\beta}\right)$$
$\bar{\beta}$ is BLUE (among estimators satisfying the constraint)
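
A sketch of formula (5) on simulated data, imposing the illustrative (hypothetical) restriction $\beta_2 + \beta_3 = 1$:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
Y = X @ np.array([0.5, 0.7, 0.3]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y

# restriction Q'beta = c with Q' = [0 1 1], c = 1  (i.e. beta_2 + beta_3 = 1)
Q = np.array([[0.0], [1.0], [1.0]])
c = np.array([1.0])

adjust = XtX_inv @ Q @ np.linalg.inv(Q.T @ XtX_inv @ Q) @ (Q.T @ beta_hat - c)
beta_bar = beta_hat - adjust                # CLS estimator, formula (5)
print(beta_hat, beta_bar, Q.T @ beta_bar)   # the last output equals c
```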

Inference

Up to now, the properties of the estimators did not depend on the distribution of $u$
Consider the NLRM with $u_t \sim N(0, \sigma^2)$. Then:
$$y_t \mid x_t \sim N\left(x_t'\beta,\; \sigma^2\right)$$
On the other hand, as $\hat{\beta} = (X'X)^{-1}X'Y$, then:
$$\hat{\beta} \mid X \sim N\left(\beta,\; \sigma^2(X'X)^{-1}\right)$$
However, as $\hat{\beta} \xrightarrow{p} \beta$, it also converges in distribution to a degenerate distribution
Thus, we require something more to conduct inference
Next, we discuss finite (exact) and large sample distributions of estimators to test hypotheses
Components:
Null hypothesis $H_0$
Alternative hypothesis $H_1$
Test statistic (one tail, two tails)
Rejection region
Conclusion

Inference with Linear Constraints (normality)

$$H_0: Q'\beta = c \qquad\qquad H_1: Q'\beta \neq c$$

The t Test

$q = 1$. Assume $u$ is normal; under the null hypothesis:
$$Q'\hat{\beta} \sim N\left[c,\; \sigma^2 Q'(X'X)^{-1}Q\right]$$
$$\frac{Q'\hat{\beta} - c}{\left[\sigma^2 Q'(X'X)^{-1}Q\right]^{1/2}} \sim N(0, 1) \qquad (6)$$
This test statistic is used when $\sigma$ is known. If not, recall
$$\frac{\hat{u}'\hat{u}}{\sigma^2} \sim \chi^2_{T-k} \qquad (7)$$
Since (6) and (7) are independent:
$$t_T = \frac{Q'\hat{\beta} - c}{\left[\tilde{\sigma}^2 Q'(X'X)^{-1}Q\right]^{1/2}} \sim t_{T-k}$$
(6) holds (asymptotically) even when $u$ is not normal.
If $H_0: \beta_1 = 0$, define $Q = \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}'$, $c = 0$:
$$t_T = \frac{\hat{\beta}_1}{\sqrt{\hat{\mathrm{V}}_{1,1}}}$$

Inference with Linear Constraints (normality)

Confidence interval:
$$\Pr\left(\hat{\beta}_i - z_{\alpha/2}\sqrt{\hat{\mathrm{V}}_{i,i}} < \beta_i < \hat{\beta}_i + z_{\alpha/2}\sqrt{\hat{\mathrm{V}}_{i,i}}\right) = 1 - \alpha$$
Tail probability, or probability value ($p$-value) function:
$$p_T = p(t_T) = \Pr\left(|Z| > |t_T|\right) = 2\left(1 - \Phi(|t_T|)\right)$$
Reject the null when the $p$-value is less than or equal to $\alpha$
Confidence interval for $\sigma$:
$$\Pr\left[\frac{(T-k)\,\tilde{\sigma}^2}{\chi^2_{T-k,\,1-\alpha/2}} < \sigma^2 < \frac{(T-k)\,\tilde{\sigma}^2}{\chi^2_{T-k,\,\alpha/2}}\right] = 1 - \alpha \qquad (8)$$
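
A sketch of a coefficient $t$ test, $p$-value, and confidence interval (simulated data; uses SciPy for the $t$ distribution, with the exact $t$ critical value rather than its normal limit):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
Y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
u_hat = Y - X @ beta_hat
sigma2_tilde = (u_hat @ u_hat) / (T - k)
se = np.sqrt(np.diag(sigma2_tilde * XtX_inv))

i = 1                                        # test H0: beta_i = 0
t_stat = beta_hat[i] / se[i]
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=T - k))
crit = stats.t.ppf(0.975, df=T - k)          # 95% two-sided critical value
ci = (beta_hat[i] - crit * se[i], beta_hat[i] + crit * se[i])
print(t_stat, p_value, ci)
```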

The F Test (normality)

$q > 1$. Under the null:
$$\frac{S\left(\bar{\beta}\right) - S\left(\hat{\beta}\right)}{\sigma^2} \sim \chi^2_q$$
When $\sigma^2$ is not known, replace $\sigma^2$ with $\tilde{\sigma}^2$ and obtain
$$\frac{S\left(\bar{\beta}\right) - S\left(\hat{\beta}\right)}{q\,\tilde{\sigma}^2} = \frac{T-k}{q}\cdot\frac{\left(Q'\hat{\beta} - c\right)'\left[Q'(X'X)^{-1}Q\right]^{-1}\left(Q'\hat{\beta} - c\right)}{\hat{u}'\hat{u}} \sim F_{q,\,T-k} \qquad (9)$$
As with $t$ tests, reject the null when the value computed exceeds the critical value
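
A sketch of the $F$ statistic in (9) via restricted and unrestricted sums of squares (simulated data, testing the hypothetical joint restriction $\beta_2 = \beta_3 = 0$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
T, k, q = 200, 3, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
Y = X @ np.array([1.0, 0.4, -0.2]) + rng.normal(size=T)

def ssr(Y, X):
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ b
    return e @ e

ssr_u = ssr(Y, X)            # unrestricted: all regressors
ssr_r = ssr(Y, X[:, :1])     # restricted: constant only (beta_2 = beta_3 = 0)

F = ((ssr_r - ssr_u) / q) / (ssr_u / (T - k))
p_value = 1 - stats.f.cdf(F, q, T - k)
print(F, p_value)
```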

Asymptotics II

How to conduct inference when $u$ is not necessarily normal?

Figure 11: Convergence in distribution

CLT

Definition 9 (Convergence in distribution). $\{x_t\}$ is said to converge to $x$ in distribution if the distribution function $F_T$ of $x_T$ converges to the distribution $F$ of $x$ at every continuity point of $F$. We write $x_T \xrightarrow{d} x$ and we call $F$ the limiting distribution of $\{x_t\}$. If $\{x_t\}$ and $\{y_t\}$ have the same limiting distribution, we write $x_T \overset{LD}{=} y_T$.

Theorem 5 (CLT1, Lindeberg-Lévy). Let $\{x_t\}$ be i.i.d. with $\mathrm{E}\,x_t = \mu$ and $\mathrm{V}\,x_t = \sigma^2$. Then
$$Z_T = \frac{\bar{x}_T - \mu}{\left[\mathrm{V}\,\bar{x}_T\right]^{1/2}} = \frac{\sqrt{T}\left(\bar{x}_T - \mu\right)}{\sigma} \xrightarrow{d} N(0, 1)$$

Assume that $\frac{1}{T}X'X \to Q$ (invertible and nonstochastic) and that $T^{-1/2}X'u \xrightarrow{d} N\left(0, \sigma^2 Q\right)$. Then
$$\sqrt{T}\left(\hat{\beta} - \beta\right) = \left(\tfrac{1}{T}X'X\right)^{-1} T^{-1/2}X'u \xrightarrow{d} N\left(0, \sigma^2 Q^{-1}\right)$$
Thus, under the HLRM, the asymptotic distribution does not depend on the distribution of $u$
Normal vs. $t$ test / $\chi^2$ vs. $F$ test

Tests for Structural Breaks

Suppose we have a two-regime regression
$$Y_1 = X_1\beta_1 + u_1, \qquad Y_2 = X_2\beta_2 + u_2$$
$$\mathrm{E}\left[\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}\begin{pmatrix} u_1' & u_2' \end{pmatrix}\right] = \begin{pmatrix} \sigma_1^2 I_{T_1} & 0 \\ 0 & \sigma_2^2 I_{T_2} \end{pmatrix}$$
$$H_0: \beta_1 = \beta_2$$
Assume $\sigma_1 = \sigma_2$. Define
$$Y = X\beta + u, \qquad Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},\quad X = \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix},\quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},\quad u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}$$
Applying (9) we obtain:
$$\frac{T_1 + T_2 - 2k}{k}\cdot\frac{\left(\hat{\beta}_1 - \hat{\beta}_2\right)'\left[(X_1'X_1)^{-1} + (X_2'X_2)^{-1}\right]^{-1}\left(\hat{\beta}_1 - \hat{\beta}_2\right)}{Y'\left[I - X(X'X)^{-1}X'\right]Y} \sim F_{k,\,T_1+T_2-2k} \qquad (10)$$
where $\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'Y_1$ and $\hat{\beta}_2 = (X_2'X_2)^{-1}X_2'Y_2$.

The same result can be derived as follows. Define the SSR under the alternative (structural change)
$$S\left(\hat{\beta}\right) = Y'\left[I - X(X'X)^{-1}X'\right]Y$$
and the SSR under the null hypothesis (a single $\beta$ on the stacked regressor matrix $X_* = \begin{pmatrix} X_1' & X_2' \end{pmatrix}'$)
$$S\left(\bar{\beta}\right) = Y'\left[I - X_*(X_*'X_*)^{-1}X_*'\right]Y$$
$$\frac{T_1 + T_2 - 2k}{k}\cdot\frac{S\left(\bar{\beta}\right) - S\left(\hat{\beta}\right)}{S\left(\hat{\beta}\right)} \sim F_{k,\,T_1+T_2-2k} \qquad (11)$$
An unbiased estimate of $\sigma^2$ is
$$\tilde{\sigma}^2 = \frac{S\left(\hat{\beta}\right)}{T_1 + T_2 - 2k}$$
$$H_0: \sigma_1 = \sigma_2$$
$$\frac{\hat{u}_i'\hat{u}_i}{\sigma^2} \sim \chi^2_{T_i - k} \quad \text{for } i = 1, 2$$
Because these chi-square variables are independent, we have
$$\frac{T_2 - k}{T_1 - k}\cdot\frac{\hat{u}_1'\hat{u}_1}{\hat{u}_2'\hat{u}_2} \sim F_{T_1-k,\,T_2-k}$$
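
A sketch of the Chow statistic in (11) (simulated data with a hypothetical break at a known date $T_1$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
T1, T2, k = 120, 80, 2
X1 = np.column_stack([np.ones(T1), rng.normal(size=T1)])
X2 = np.column_stack([np.ones(T2), rng.normal(size=T2)])
Y1 = X1 @ np.array([1.0, 0.5]) + rng.normal(size=T1)
Y2 = X2 @ np.array([1.5, 0.2]) + rng.normal(size=T2)   # different coefficients after the break

def ssr(Y, X):
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ b
    return e @ e

ssr_alt = ssr(Y1, X1) + ssr(Y2, X2)                             # separate regressions per regime
ssr_null = ssr(np.concatenate([Y1, Y2]), np.vstack([X1, X2]))   # pooled regression, single beta

F = ((ssr_null - ssr_alt) / k) / (ssr_alt / (T1 + T2 - 2 * k))
print(F, 1 - stats.f.cdf(F, k, T1 + T2 - 2 * k))
```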

If $\sigma_1 \neq \sigma_2$, tests for the equality of regression parameters are more involved. If $k = 1$,
$$t_T = \frac{\hat{\beta}_1 - \hat{\beta}_2}{\sqrt{\dfrac{\tilde{\sigma}_1^2}{X_1'X_1} + \dfrac{\tilde{\sigma}_2^2}{X_2'X_2}}}, \qquad \nu = \frac{\left[\dfrac{\tilde{\sigma}_1^2}{X_1'X_1} + \dfrac{\tilde{\sigma}_2^2}{X_2'X_2}\right]^2}{\dfrac{\tilde{\sigma}_1^4}{(T_1 - 1)(X_1'X_1)^2} + \dfrac{\tilde{\sigma}_2^4}{(T_2 - 1)(X_2'X_2)^2}}$$
where $\nu$ is the (approximate) degrees of freedom of the reference $t$ distribution.
A cleaner way: Likelihood Ratio Tests.
Even though Chow tests are popular, modern practice is skeptical because of the ad-hoc manner in which the point at which to split the sample is chosen. Recent theoretical and empirical applications treat the possible period of break as an endogenous latent variable.

Prediction

Out-of-sample prediction of $y_p$ (for $p > T$) is not easy. In that period:
$$y_p = x_p'\beta + u_p$$
Types of uncertainty:
Unpredictable component
Parameter uncertainty
Uncertainty about $x_p$
Specification uncertainty
Types of forecasts:
Point forecast
Interval forecast
Density forecast
Active area of research

Prediction

If the HLRM holds, the predictor that minimizes the MSE is $\hat{y}_p = x_p'\hat{\beta}$
Given $x_p$, the mean squared prediction error is
$$\mathrm{E}\left[\left(\hat{y}_p - y_p\right)^2 \mid x_p\right] = \sigma^2\left[1 + x_p'(X'X)^{-1}x_p\right]$$
To construct an estimator of the variance of the forecast error, substitute $\tilde{\sigma}^2$ for $\sigma^2$
You may think that a confidence interval forecast could be formulated as:
$$\Pr\left(\hat{y}_p - z_{\alpha/2}\sqrt{\hat{\mathrm{V}}_{\hat{y}}} < y_p < \hat{y}_p + z_{\alpha/2}\sqrt{\hat{\mathrm{V}}_{\hat{y}}}\right) = 1 - \alpha$$
WRONG. Notice that
$$\frac{y_p - \hat{y}_p}{\sqrt{\sigma^2\left[1 + x_p'(X'X)^{-1}x_p\right]}} = \frac{u_p + x_p'\left(\beta - \hat{\beta}\right)}{\sqrt{\sigma^2\left[1 + x_p'(X'X)^{-1}x_p\right]}}$$
This ratio does not have a discernible limiting distribution (unless $u$ is normal). We did not need to impose normality for any of the previous results (at least asymptotically).
We assumed that the econometrician knew $x_p$. If $x$ is stochastic and not known at $T$, the MSE could be seriously underestimated.
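
A sketch of the point forecast and its estimated forecast-error variance (simulated data; the interval shown relies on normal $u$, as noted above, and on a known $x_p$):

```python
import numpy as np

rng = np.random.default_rng(10)
T, k = 200, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.8, size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
u_hat = Y - X @ beta_hat
sigma2_tilde = (u_hat @ u_hat) / (T - k)

x_p = np.array([1.0, 1.7])                             # assumed known out-of-sample regressor
y_hat_p = x_p @ beta_hat                               # point forecast
var_fe = sigma2_tilde * (1.0 + x_p @ XtX_inv @ x_p)    # estimated forecast-error variance
interval = (y_hat_p - 1.96 * np.sqrt(var_fe), y_hat_p + 1.96 * np.sqrt(var_fe))
print(y_hat_p, interval)
```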

Prediction

Figure 12: Forecasting

Measures of predictive accuracy of forecasting models

$$\mathrm{RMSE} = \sqrt{\frac{1}{P}\sum_{p=1}^{P}\left(y_p - \hat{y}_p\right)^2}, \qquad \mathrm{MAE} = \frac{1}{P}\sum_{p=1}^{P}\left|y_p - \hat{y}_p\right|$$
Theil $U$ statistic:
$$U = \sqrt{\frac{\sum_{p=1}^{P}\left(y_p - \hat{y}_p\right)^2}{\sum_{p=1}^{P} y_p^2}}, \qquad U_{\Delta} = \sqrt{\frac{\sum_{p=1}^{P}\left(\Delta y_p - \Delta\hat{y}_p\right)^2}{\sum_{p=1}^{P}\left(\Delta y_p\right)^2}}$$
with
$$\Delta y_p = y_p - y_{p-1} \;\text{ and }\; \Delta\hat{y}_p = \hat{y}_p - y_{p-1}, \quad \text{or, in percentage changes,} \quad \Delta y_p = \frac{y_p - y_{p-1}}{y_{p-1}} \;\text{ and }\; \Delta\hat{y}_p = \frac{\hat{y}_p - y_{p-1}}{y_{p-1}}$$
These measures reflect the model's ability to track turning points in the data
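
A sketch of these accuracy measures for a hypothetical vector of outcomes and one-step forecasts:

```python
import numpy as np

# hypothetical realized values and forecasts over P out-of-sample periods
y     = np.array([2.0, 2.3, 2.1, 2.6, 2.9, 2.7])
y_hat = np.array([2.1, 2.2, 2.3, 2.5, 2.8, 2.9])

err = y - y_hat
rmse = np.sqrt(np.mean(err ** 2))
mae = np.mean(np.abs(err))
U = np.sqrt(np.sum(err ** 2) / np.sum(y ** 2))

# change-based version (changes measured relative to the previous realized value)
dy, dy_hat = np.diff(y), y_hat[1:] - y[:-1]
U_delta = np.sqrt(np.sum((dy - dy_hat) ** 2) / np.sum(dy ** 2))
print(rmse, mae, U, U_delta)
```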

Evaluation

When comparing 2 models, is one model really better than the other?
Diebold-Mariano: framework for comparing models
$$d_p = g\left(\hat{u}_{i,p}\right) - g\left(\hat{u}_{j,p}\right); \qquad DM = \frac{\bar{d}}{\sqrt{\hat{\mathrm{V}}_{\bar{d}}}} \xrightarrow{d} N(0, 1)$$
Harvey, Leybourne, and Newbold (HLN): correct size distortions and use Student's $t$
$$HLN = DM\left[\frac{P + 1 - 2h + h(h-1)/P}{P}\right]^{1/2}$$
where $P$ is the number of forecasts and $h$ the forecast horizon.
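
A sketch of the DM statistic under squared-error loss at horizon $h = 1$ (hypothetical forecast errors; the HLN correction then rescales DM as above):

```python
import numpy as np
from scipy import stats

# hypothetical one-step forecast errors from two competing models
e1 = np.array([0.3, -0.2, 0.5, -0.4, 0.1, 0.6, -0.3, 0.2])
e2 = np.array([0.5, -0.6, 0.4, -0.7, 0.3, 0.8, -0.5, 0.4])
P = len(e1)

d = e1 ** 2 - e2 ** 2                 # loss differential under squared-error loss
d_bar = d.mean()
var_dbar = d.var(ddof=1) / P          # at h = 1 no autocovariance terms are needed

DM = d_bar / np.sqrt(var_dbar)
h = 1
HLN = DM * np.sqrt((P + 1 - 2 * h + h * (h - 1) / P) / P)
p_value = 2 * (1 - stats.t.cdf(abs(HLN), df=P - 1))
print(DM, HLN, p_value)
```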

Finite Samples

Statistical properties of most methods: known only asymptotically
Exact finite sample theory can rarely be used to interpret estimates or test statistics
Are the theoretical properties reasonably good approximations for the problem at hand?
How to proceed in these cases?
Monte Carlo experiments and the bootstrap

Monte Carlo Experiments

Often used to analyze finite sample properties of estimators or test statistics
Quantities are approximated by generating many pseudo-random realizations of the stochastic process and averaging them
Start from a model and the estimators or tests associated with it. Objective: assess small sample properties.
DGP: a special case of the model. Specify true values of parameters, laws of motion of variables, and distributions of the random variables
Experiment: replications or samples ($J$), generating artificial samples of data according to the DGP and calculating the estimates or test statistics of interest
After $J$ replications, we have an equal number of estimates, which are subjected to statistical analysis
Experiments may be performed by changing the sample size, values of parameters, etc. Response surfaces.
Monte Carlo experiments are random. It is essential to perform enough replications so that the results are sufficiently accurate. Critical values, etc.
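
A minimal Monte Carlo sketch: simulate many samples from a simple DGP and examine the sampling distribution of the OLS slope (hypothetical design):

```python
import numpy as np

rng = np.random.default_rng(11)
J, T = 2000, 50                       # replications and sample size
beta = np.array([1.0, 0.5])           # "true" parameters of the DGP

slopes = np.empty(J)
for j in range(J):
    X = np.column_stack([np.ones(T), rng.normal(size=T)])
    Y = X @ beta + rng.normal(size=T)                 # draw one artificial sample
    slopes[j] = np.linalg.solve(X.T @ X, X.T @ Y)[1]  # store the slope estimate

# bias and sampling standard deviation of the estimator across replications
print(slopes.mean() - beta[1], slopes.std(ddof=1))
```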

Bootstrap Resampling

The bootstrap views the observed sample as a population
The distribution function for this population is the EDF of the sample, and parameter estimates based on the observed sample are treated as the actual model parameters
Conceptually: examine properties of estimators or test statistics in repeated samples drawn from a tangible data-sampling process that mimics the actual DGP
The bootstrap does not represent the exact finite sample properties of estimators and test statistics under the actual DGP, but provides an approximation that improves as the size of the observed sample increases
Reasons for acceptance in recent years:
Avoids most of the strong distributional assumptions required in Monte Carlo
Like Monte Carlo, it may be used to solve intractable estimation and inference problems by computation rather than reliance on asymptotic approximations, which may be very complicated in nonstandard problems
Bootstrap approximations are often equivalent to first-order asymptotic results, and may dominate them in some cases.
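
A minimal sketch of a pairs (case-resampling) bootstrap for the standard error of the OLS slope (one of several possible bootstrap schemes; hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(12)
T, B = 100, 1999                      # sample size and bootstrap replications
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)

def ols_slope(Y, X):
    return np.linalg.solve(X.T @ X, X.T @ Y)[1]

slope_hat = ols_slope(Y, X)
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, T, size=T)  # resample (y_t, x_t) pairs with replacement
    boot[b] = ols_slope(Y[idx], X[idx])

print(slope_hat, boot.std(ddof=1))    # bootstrap standard error of the slope
```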
