
OUTLINE · INTRODUCTION · ESTIMATION · ASYMPTOTIC PROPERTIES · IDENTIFICATION · DIAGNOSTIC CHECKING · FORECASTING · MODEL IDENTIFICATION

MODEL IDENTIFICATION, ESTIMATION AND DIAGNOSTIC CHECKING

Vlayoudom Marimoutou
AMSE Master 2
Autumn 2013

December 11, 2013

OUTLINE

1 Introduction
2 Estimation of ARMA(p, q) processes
   Estimation of AR(p) processes
      Estimation of an AR(p) for a fixed (and known) p
      Estimation of AR(∞) processes
   Estimation of ARMA(p, q) processes: maximum likelihood estimation
      Likelihood function for AR processes
      Exact versus conditional MLE
      Likelihood function for MA processes
      Numerical optimization of the log likelihood
3 Asymptotic properties of ML estimators and inference
4 Identification of an ARMA process: selecting p and q
   Information criteria
   Other model selection approaches
5 Diagnostic checking
6 Forecasting
   The MSE optimal forecast is the conditional mean
   Forecast evaluation
   Relative evaluation: Diebold-Mariano

Introduction
Estimation of ARMA(p, q) processes
Asymptotic properties of ML estimators and inference
Identification of an ARMA process: selecting p and q
Diagnostic checking
Forecasting
Model identification in the general case
Code
References

Suppose you want to fit an ARMA(p, q) model to a univariate stationary process $\{X_t\}$.

The first step is to set values for p and q. We will call this step identification.

Next, we estimate the corresponding ARMA(p, q) model. The most popular method for estimating ARMA processes is Maximum Likelihood (ML).

AR processes can also be estimated by OLS. It can be shown that this approach is asymptotically equivalent to ML. This is very convenient because OLS estimators have closed-form analytical expressions and are straightforward to compute.

The final step is to check whether the residuals of the estimated model satisfy the assumptions that have been imposed on the innovations.


ESTIMATION OF AR(p) PROCESSES

Assume for the time being that (p, q) are known. We will first consider estimation of AR processes (it is simpler) and then move to estimation of general ARMA(p, q) processes. Here we distinguish two cases:
- $X_t$ follows an AR(p) for some finite p
- $X_t$ follows an AR(∞) process.

Let $\{X_t\}$ be a stationary AR(p) process with $p < \infty$, i.e.

   $X_t = \phi_0 + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t$

where $\varepsilon_t$ is a zero-mean innovation (either white noise, m.d.s. or i.i.d.) with variance $\sigma^2$.

It is possible to estimate $\phi = (\phi_0, \phi_1, \ldots, \phi_p)'$ by applying the standard OLS formula, and the resulting estimator is $\sqrt{T}$-consistent and asymptotically normal.
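As a sanity check, the OLS formula is easy to apply directly. The sketch below (plain Python rather than the course's R, with all variable names invented here) simulates a zero-mean AR(1) and computes $\hat{\phi}_{ols} = \sum X_t X_{t-1} / \sum X_{t-1}^2$:

```python
import random

random.seed(0)
phi_true = 0.6
T = 5000

# Simulate a zero-mean AR(1): X_t = phi * X_{t-1} + eps_t, eps_t ~ N(0, 1)
x = [0.0]
for _ in range(T):
    x.append(phi_true * x[-1] + random.gauss(0.0, 1.0))

# OLS estimator: phi_hat = sum(X_t * X_{t-1}) / sum(X_{t-1}^2)
num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
phi_hat = num / den

print(phi_hat)
```

With T = 5000 the estimate lands close to the true value of 0.6, up to sampling noise of order $\sqrt{(1-\phi^2)/T}$.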

We now provide a sketch of the proof for the simplest case. Let $\{X_t\}$ be an AR(1) process given by

   $X_t = \phi X_{t-1} + \varepsilon_t$

where $\varepsilon_t$ is an m.d.s. and $|\phi| < 1$ (which implies that $\{X_t\}$ is stationary and ergodic). Then,

   $\hat{\phi}_{ols} = \frac{\sum X_t X_{t-1}}{\sum X_{t-1}^2} = \phi + \frac{\sum X_{t-1}\varepsilon_t}{\sum X_{t-1}^2}$

Consistency

   $\hat{\phi}_{ols} - \phi = \frac{\sum X_{t-1}\varepsilon_t}{\sum X_{t-1}^2} \xrightarrow{p} 0$

Under the previous assumptions, $X_t^2$ is also stationary and ergodic, thus by the LLN,

   $\frac{\sum X_{t-1}^2}{T} \xrightarrow{p} \mathrm{var}(X_{t-1}) = \frac{\sigma^2}{1-\phi^2}$

As for the numerator, notice that if $\varepsilon_t$ is an m.d.s., so is $X_{t-1}\varepsilon_t$. Therefore, a LLN also applies to the numerator and $\frac{\sum X_{t-1}\varepsilon_t}{T} \xrightarrow{p} E(X_{t-1}\varepsilon_t) = 0$. Then

   $\hat{\phi}_{ols} = \phi + \frac{o_p(T)}{O_p(T)} = \phi + \frac{o_p(1)}{O_p(1)} = \phi + o_p(1)$

Asymptotic normality

   $T^{1/2}\left(\hat{\phi}_{ols} - \phi\right) = \frac{T^{-1/2}\sum X_{t-1}\varepsilon_t}{T^{-1}\sum X_{t-1}^2} = \frac{T^{-1/2}\sum X_{t-1}\varepsilon_t}{\mathrm{var}(X_t)} + o_p(1)$

By the CLT for m.d.s.,

   $T^{-1/2}\sum X_{t-1}\varepsilon_t \xrightarrow{d} N\left(0, \mathrm{var}(X_t)\,\sigma^2\right)$

Thus

   $T^{1/2}\left(\hat{\phi}_{ols} - \phi\right) \xrightarrow{d} N\left(0, \frac{\sigma^2}{\mathrm{var}(X_t)}\right) = N\left(0, 1-\phi^2\right)$

Remark: What happens with the distribution of $\hat{\phi}_{ols}$ if $\phi \to 1$?

A similar proof can be established for a general stationary AR(p) process.
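The limiting distribution $N(0, 1-\phi^2)$ can be checked by simulation. A minimal Monte Carlo sketch (Python, illustrative only, assuming Gaussian innovations; all names are my own):

```python
import math
import random

random.seed(1)
phi, T, reps = 0.5, 400, 500

# Monte Carlo check that sqrt(T) * (phi_hat - phi) has variance
# close to the asymptotic value 1 - phi^2.
zs = []
for _ in range(reps):
    x = [0.0]
    for _ in range(T):
        x.append(phi * x[-1] + random.gauss(0.0, 1.0))
    num = sum(x[t] * x[t - 1] for t in range(1, T + 1))
    den = sum(x[t - 1] ** 2 for t in range(1, T + 1))
    zs.append(math.sqrt(T) * (num / den - phi))

mc_var = sum(z * z for z in zs) / reps
print(mc_var, 1 - phi**2)  # the two numbers should be close
```

With $\phi = 0.5$ the Monte Carlo variance should hover around 0.75; pushing $\phi$ toward 1 shrinks it toward 0, hinting at the non-standard limit in the unit-root case.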

AR(p) processes (for finite p) are easier to estimate than MA processes, since OLS can be applied. We know that an invertible ARMA process can be written as an AR(∞) process.

Berk showed that if an AR(k) process is fitted to $X_t$, where k tends to infinity with the sample size, it is possible to obtain consistent and asymptotically normally distributed estimators of the relevant coefficients. More explicitly, k has to verify two conditions:

- an upper bound condition, $k^3/T \to 0$ (which says that k should not increase too quickly),
- and a lower bound one, $T^{1/2}\sum_{j=k+1}^{\infty}|\phi_j| \to 0$ (which says that k must not increase too slowly).

How to choose k in practice?

The above-mentioned conditions do not help much in choosing the appropriate k in applications. Ng and Perron (2005) have shown that the value of k chosen by the AIC and the BIC does not verify the lower bound condition. This implies that the estimates are consistent but not asymptotically normal.

Kuersteiner (2005) has proved that general-to-specific model selection verifies both the upper and the lower bound conditions. Then, the resulting estimates are consistent and asymptotically normal.

ESTIMATION OF ARMA(p, q) PROCESSES: MAXIMUM LIKELIHOOD ESTIMATION

Let $\{X_t\}$ be an ARMA(p, q) process

   $\phi_p(L) X_t = c + \theta_q(L)\varepsilon_t$

and let $\theta = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, c, \sigma^2)'$ be the vector containing all the unknown parameters. Suppose that we have observed a sample of size T, $(x_1, x_2, \ldots, x_T)$.

The ML approach amounts to calculating the joint probability density of $(X_T, \ldots, X_1)$,

   $f_{X_T, X_{T-1}, \ldots, X_1}(x_T, x_{T-1}, \ldots, x_1; \theta)$   (1)

which might be loosely interpreted as the probability of having observed this particular sample. The maximum likelihood estimator (MLE) of $\theta$ is the value that maximizes (1), i.e., the probability of having observed this sample.

The first step is to specify a particular distribution for the white noise innovations of the process, $\varepsilon_t$. Typically, it will be assumed that $\varepsilon_t$ is a Gaussian white noise, $\varepsilon_t \sim$ i.i.d. $N(0, \sigma^2)$.

This assumption is strong, but even when Gaussianity does not hold, the estimator computed by maximizing the Gaussian likelihood has, under certain conditions, the same asymptotic properties as if the process were indeed normal. In this case, the estimator of the parameters of a non-Gaussian process computed from a Gaussian likelihood is called a Quasi (or Pseudo) MLE.

The second step is to calculate the likelihood function (1). The exact closed form of the likelihood function of a general ARMA process is complicated. In the following we illustrate how the likelihood is computed for simple AR and MA processes.

The final step is to obtain the values of $\theta$ that maximize the log likelihood. Unless the process is a pure AR process (in which case OLS can be applied), numerical optimization procedures must be applied in order to obtain the estimates.

LIKELIHOOD FUNCTION FOR AR PROCESSES

Let $\{X_t\}$ be an AR(1) process

   $X_t = c + \phi X_{t-1} + \varepsilon_t$

where $|\phi| < 1$, $\{\varepsilon_t\}$ is i.i.d. $N(0, \sigma^2)$ and $\theta = (c, \phi, \sigma^2)'$. We now compute the density of the first observation $X_1$. Clearly, since $\varepsilon_t$ is Gaussian, $X_1$ is Gaussian with $E(X_1) = c/(1-\phi)$ and $\mathrm{var}(X_1) = \sigma^2/(1-\phi^2)$, and then

   $f_{X_1}(x_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2/(1-\phi^2)}} \exp\left(-\frac{(x_1 - c/(1-\phi))^2}{2\sigma^2/(1-\phi^2)}\right)$

Next, we consider the density of $X_2$ conditional on $X_1 = x_1$. It is clear that $X_2 \mid (X_1 = x_1) \sim N(c + \phi x_1, \sigma^2)$, and then

   $f_{X_2|X_1}(x_2|x_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_2 - c - \phi x_1)^2}{2\sigma^2}\right)$

As for $X_3$, the distribution of $X_3$ conditional on $X_2 = x_2$ and $X_1 = x_1$ is

   $f_{X_3|X_2,X_1}(x_3|x_2, x_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_3 - c - \phi x_2)^2}{2\sigma^2}\right)$

The joint distribution of $X_1$, $X_2$ and $X_3$ can be written as

   $f_{X_3,X_2,X_1}(x_3, x_2, x_1; \theta) = f_{X_3|X_2,X_1}(x_3|x_2, x_1; \theta)\, f_{X_2,X_1}(x_2, x_1; \theta)$
   $= f_{X_3|X_2,X_1}(x_3|x_2, x_1; \theta)\, f_{X_2|X_1}(x_2|x_1; \theta)\, f_{X_1}(x_1; \theta)$

Furthermore, notice that the values of $X_1, X_2, \ldots, X_{T-1}$ matter for $X_t$ only through the value of $X_{t-1}$, and then

   $f_{X_t|X_{t-1},\ldots,X_1}(x_t|x_{t-1}, \ldots, x_1; \theta) = f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta)$   (2)

   $f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_t - c - \phi x_{t-1})^2}{2\sigma^2}\right)$   (3)

Thus, the joint density of the first T observations is

   $f_{X_T,X_{T-1},\ldots,X_1}(x_T, x_{T-1}, \ldots, x_1; \theta)$   (4)
   $= f_{X_T|X_{T-1}}(x_T|x_{T-1}; \theta)\, f_{X_{T-1}|X_{T-2}}(x_{T-1}|x_{T-2}; \theta) \cdots f_{X_1}(x_1; \theta)$   (5)
   $= f_{X_1}(x_1; \theta) \prod_{t=2}^{T} f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta)$   (6)

Taking logs,

   $L(\theta) = \log f_{X_1}(x_1; \theta) + \sum_{t=2}^{T} \log f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta)$   (7)

and $L$ is called the log-likelihood function.

Clearly, the value of $\theta$ that maximizes (6) and (7) is the same, but the maximization problem is simpler in the latter case, and so the log likelihood function is always preferred.

EXACT VERSUS CONDITIONAL MLE

The next step would be to compute the value of $\theta$ for which the exact log likelihood in (7) is maximized. This amounts to differentiating the log likelihood and equating the first derivatives to zero. The result is a system of nonlinear equations in $\theta$ and the sample, for which there is no closed-form solution in terms of $(x_1, \ldots, x_T)$. Then, iterative numerical procedures are needed to obtain $\hat{\theta}$.

An alternative procedure is to regard the value of $x_1$ as deterministic, and then

   $\log f_{X_T,\ldots,X_2|X_1}(x_T, \ldots, x_2|x_1; \theta)$
   $= \sum_{t=2}^{T} \log f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta)$
   $= -\frac{T-1}{2}\log(2\pi\sigma^2) - \sum_{t=2}^{T} \frac{(x_t - c - \phi x_{t-1})^2}{2\sigma^2}$

Notice that the conditional MLEs of $c$ and $\phi$ are obtained by minimizing

   $\sum_{t=2}^{T} (x_t - c - \phi x_{t-1})^2$

It follows that maximization of the conditional log likelihood is equivalent to minimization of the sum of squared residuals. More generally, the conditional MLE for an AR(p) process can be obtained from an OLS regression of $x_t$ on a constant and p of its own lagged values. In contrast to the exact MLE, the conditional maximum likelihood estimates are trivial to compute. Moreover, if the sample size T is sufficiently large, the first observation makes a negligible contribution to the total likelihood, provided $|\phi| < 1$.

The conditional MLE of the innovation variance is found by differentiating the conditional log likelihood with respect to $\sigma^2$ and setting the result equal to zero. It can be checked that

   $\hat{\sigma}^2 = \frac{\sum_{t=2}^{T} (x_t - \hat{c} - \hat{\phi} x_{t-1})^2}{T-1}$

and, in general, the MLE of $\sigma^2$ for an AR(p) process is given by the sum of squared residuals over $(T - p)$.

LIKELIHOOD FUNCTION FOR MA PROCESSES

Let $\{X_t\}$ be a Gaussian MA(1) process

   $X_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}$

with $\varepsilon_t$ i.i.d. $N(0, \sigma^2)$. Let $(\mu, \theta, \sigma^2)'$ be the population parameters to be estimated. Then $X_t \mid \varepsilon_{t-1} \sim N(\mu + \theta\varepsilon_{t-1}, \sigma^2)$ and

   $f_{X_t|\varepsilon_{t-1}}(x_t|\varepsilon_{t-1}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_t - \mu - \theta\varepsilon_{t-1})^2}{2\sigma^2}\right)$

It is not possible to compute this density directly, since $\varepsilon_{t-1}$ is not observed in the data. Suppose, however, that it is known that $\varepsilon_0 = 0$. Then the whole sequence of innovations can be computed recursively as

   $\varepsilon_1 = x_1 - \mu$
   $\varepsilon_2 = x_2 - \mu - \theta\varepsilon_1$
   $\ldots$
   $\varepsilon_t = x_t - \mu - \theta\varepsilon_{t-1}$

Then the conditional density is given by

   $f_{X_t|X_{t-1},\ldots,X_1}(x_t|x_{t-1}, \ldots, x_1, \varepsilon_0 = 0; \theta) = f_{X_t|\varepsilon_{t-1}}(x_t|\varepsilon_{t-1}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_t^2}{2\sigma^2}\right)$

while the conditional log likelihood is

   $L(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) - \sum_{t=1}^{T} \frac{\varepsilon_t^2}{2\sigma^2}$

The conditional log likelihood is a complicated nonlinear function of the parameters and, in contrast to the AR case, the estimates must be found by numerical optimization.

If $\theta$ is far from 1 in absolute value, the effect of imposing $\varepsilon_0 = 0$ quickly dies out. However, if $\theta$ is greater than 1 in absolute value, the error from imposing this restriction accumulates over time. Then, if the estimate of $\theta$ turns out to be greater than 1 in absolute value, the results should be discarded, and the numerical optimization should be attempted again with the reciprocal of $\hat{\theta}$ used as starting value.
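Under the assumption $\varepsilon_0 = 0$, the recursion and the conditional log likelihood above are straightforward to code. A hedged sketch in Python (not the course's R; function and variable names are my own):

```python
import math
import random

def ma1_loglik(params, x):
    """Conditional log likelihood of an MA(1), imposing eps_0 = 0."""
    mu, theta, sigma2 = params
    eps_prev = 0.0
    ll = 0.0
    for xt in x:
        # innovations computed recursively: eps_t = x_t - mu - theta * eps_{t-1}
        eps = xt - mu - theta * eps_prev
        ll += -0.5 * math.log(2 * math.pi * sigma2) - eps * eps / (2 * sigma2)
        eps_prev = eps
    return ll

# Simulate an MA(1) and check that the likelihood is higher at the true
# parameter values than at a clearly wrong value of theta.
random.seed(2)
mu, theta, sigma = 0.0, 0.5, 1.0
e = [random.gauss(0.0, sigma) for _ in range(1001)]
x = [mu + e[t] + theta * e[t - 1] for t in range(1, 1001)]

print(ma1_loglik((0.0, 0.5, 1.0), x) > ma1_loglik((0.0, -0.5, 1.0), x))
```

The function is what a numerical optimizer would be handed in the next step.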

NUMERICAL OPTIMIZATION OF THE LOG LIKELIHOOD

Once the log likelihood has been computed, the next step is to find the value $\hat{\theta}$ that maximizes $L(\theta)$. With the exception of pure AR processes, for which closed analytical expressions of the estimators are available, numerical optimization procedures must be employed to obtain the estimates. These procedures make different guesses for $\theta$, evaluate the likelihood at these values and try to infer from them the value for which it is largest. See Hamilton, Section 5.7, for a description of these methods.

The search procedure may be greatly accelerated if the optimization algorithm begins with parameter values that are close to the optimum. For this reason, simple preliminary estimates of $\theta$ are often employed to begin the search. See Brockwell and Davis, Sections 8.2-8.4, for a description of these preliminary estimators.
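A crude way to see the numerical optimization at work: for an MA(1) with $\mu = 0$ and $\sigma^2$ concentrated out of the likelihood, the conditional MLE of $\theta$ minimizes the log of the mean squared recursive innovation, which a plain grid search over the invertibility region can approximate. Real software uses Newton-type methods instead (see Hamilton, Section 5.7); this is an illustrative sketch with invented names:

```python
import math
import random

random.seed(3)
theta_true = 0.4
e = [random.gauss(0.0, 1.0) for _ in range(2001)]
x = [e[t] + theta_true * e[t - 1] for t in range(1, 2001)]

def neg_cond_loglik(theta, x):
    # Conditional (on eps_0 = 0) negative log likelihood with sigma^2
    # concentrated out: up to constants, proportional to log of the mean SSR.
    eps_prev, ssr = 0.0, 0.0
    for xt in x:
        eps = xt - theta * eps_prev
        ssr += eps * eps
        eps_prev = eps
    return math.log(ssr / len(x))

# Crude grid search over the invertibility region (-1, 1).
grid = [i / 100 for i in range(-99, 100)]
theta_hat = min(grid, key=lambda th: neg_cond_loglik(th, x))
print(theta_hat)
```

A good preliminary estimate would let a Newton-type routine start near the optimum instead of scanning the whole interval.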


ASYMPTOTIC PROPERTIES OF ML ESTIMATORS AND INFERENCE

ML estimators have very good asymptotic properties. Under certain conditions it can be shown that they are consistent, asymptotically normal and efficient, since their variance-covariance matrix equals the inverse of the Fisher information matrix.

Let $\{X_t\}$ be an ARMA(p, q) process, $\theta_0$ be the vector containing the true parameter values and $\hat{\theta}$ be the MLE of $\theta$. Assuming that neither $\theta_0$ nor $\hat{\theta}$ falls on the boundary of the parameter space, then

   $\sqrt{T}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, I^{-1}\right)$

where $I$ is the Fisher information matrix

   $I = -E\left[\frac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta'}\Big|_{\theta=\theta_0}\right]$
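In practice the Hessian is often computed numerically and inverted to get standard errors. A small sketch (Python, an AR(1) with $\sigma^2 = 1$ assumed known for simplicity, finite differences for the second derivative; all names invented):

```python
import math
import random

random.seed(4)
phi0, T = 0.5, 2000
x = [0.0]
for _ in range(T):
    x.append(phi0 * x[-1] + random.gauss(0.0, 1.0))

def loglik(phi):
    # Conditional AR(1) Gaussian log likelihood with sigma^2 = 1 (assumed known)
    return sum(-0.5 * math.log(2 * math.pi) - (x[t] - phi * x[t - 1]) ** 2 / 2
               for t in range(1, T + 1))

phi_hat = sum(x[t] * x[t - 1] for t in range(1, T + 1)) / \
          sum(x[t - 1] ** 2 for t in range(1, T + 1))

# Finite-difference estimate of the second derivative of L at phi_hat,
# i.e. minus the observed information; the standard error is the square
# root of the inverse of the negative Hessian.
h = 1e-4
d2 = (loglik(phi_hat + h) - 2 * loglik(phi_hat) + loglik(phi_hat - h)) / h**2
se = math.sqrt(-1.0 / d2)
print(se)
```

For this quadratic likelihood the finite difference recovers the analytic Hessian $-\sum x_{t-1}^2$ almost exactly.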

An estimator of the variance-covariance matrix is based on

   $\hat{I} = -T^{-1}\frac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta'}\Big|_{\theta=\hat{\theta}}$

which is often calculated numerically. Another popular estimator of the Fisher information matrix is the so-called outer product estimator

   $\hat{I} = T^{-1}\sum_{t=1}^{T} h_t(\hat{\theta})\, h_t(\hat{\theta})'$

where $h_t(\hat{\theta}) = \partial \log f(x_t|x_{t-1}, x_{t-2}, \ldots; \theta)/\partial\theta\,|_{\theta=\hat{\theta}}$ denotes the vector of derivatives of the log of the conditional density of the t-th observation with respect to the elements of the parameter vector $\theta$, evaluated at $\hat{\theta}$.

If the process is correctly specified, both estimators should yield similar values. If the two estimators differ a great deal, this may mean that the model is misspecified; see White (1982) for a general test of model specification based on this idea.

These estimates can be used to construct asymptotic standard errors, which in turn can be employed to construct confidence intervals and t tests. Other popular approaches to testing hypotheses about parameters estimated by ML are the likelihood ratio and the Lagrange multiplier tests. See Hamilton, Chapter 5.8.


IDENTIFICATION OF AN ARMA PROCESS: SELECTING p AND q

How to select p and q?

The ACF and PACF are of help, but their usefulness is limited in the general ARMA(p, q) case. In this section we will see other ways of selecting these values: model selection criteria.

If you assume that your model contains an infinite number of parameters, or that the type of models you are considering for estimation does not include the true model, the goal is to select the model that best approximates the true model from a set of finite-dimensional candidate models. In large samples, a model selection criterion that chooses the model with minimum mean squared error is said to be asymptotically efficient.

Many researchers instead assume that the true model is of finite dimension and is included in the set of candidate models. The goal in this case is to correctly choose the true model from the list of candidates. A model selection criterion that identifies the correct model asymptotically with probability 1 is said to be consistent.

INFORMATION CRITERIA

Information criteria are mechanisms designed for choosing an appropriate number of parameters to be included in the model. By minimizing a particular distance between the true and the candidate models (the Kullback-Leibler discrepancy; see Gourieroux-Monfort), it is possible to arrive at expressions that usually have the general form

   $I_T(k) = \log(\hat{\sigma}_k^2) + k\,\frac{C(T)}{T}$   (8)

where the second term is a penalty term. Some popular information criteria are:

- Akaike (AIC): $C(T) = 2$
- Schwarz or Bayesian (BIC): $C(T) = \log T$
- Hannan and Quinn (HQ): $C(T) = 2\log(\log T)$
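The criteria above can be computed directly from OLS fits of increasing AR orders. A self-contained sketch (Python, all helper names invented; the same estimation sample is used for every order so that the criteria are comparable across k):

```python
import math
import random

random.seed(5)
T = 600
# Simulate an AR(2): X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + eps_t
x = [0.0, 0.0]
for _ in range(T):
    x.append(0.5 * x[-1] - 0.3 * x[-2] + random.gauss(0.0, 1.0))
x = x[2:]

def ar_sigma2(x, k, m):
    """OLS fit of an AR(k), no intercept, on observations m..end; returns SSR/n."""
    n = len(x) - m
    X = [[x[t - j] for j in range(1, k + 1)] for t in range(m, len(x))]
    y = [x[t] for t in range(m, len(x))]
    # Normal equations A beta = b, solved by Gaussian elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(k)]
    for i in range(k):                       # forward elimination
        p = A[i][i]
        for r in range(i + 1, k):
            f = A[r][i] / p
            A[r] = [a - f * c for a, c in zip(A[r], A[i])]
            b[r] -= f * b[i]
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):           # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    ssr = sum((yt - sum(be * xi for be, xi in zip(beta, r))) ** 2
              for r, yt in zip(X, y))
    return ssr / n

m = 6  # common estimation sample for all candidate orders
ics = {}
for k in range(1, m + 1):
    s2 = ar_sigma2(x, k, m)
    ics[k] = {"AIC": math.log(s2) + k * 2 / T,
              "BIC": math.log(s2) + k * math.log(T) / T}

k_bic = min(ics, key=lambda k: ics[k]["BIC"])
print(k_bic)
```

With a true AR(2) and a sample this large, both criteria should strongly prefer order 2 over order 1; the AIC's lighter penalty is what lets it occasionally pick a larger order.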

The number of parameters k to be included in the model is chosen as

   $\hat{k} = \arg\min_{k \le m} I_T(k)$

where m is some pre-specified maximum number of parameters.

Remark 1. The AIC is not consistent, in the sense that it might not choose the correct model asymptotically with probability one. It tends to overestimate the number of parameters to be included in the model. However, this does not mean that the method is useless: it minimizes the MSE for one-step-ahead forecasting.

Remark 2. The BIC and the HQ criteria are consistent, that is,

   $\lim_{T\to\infty} P(\hat{k} = k_0) = 1$   (9)

In fact, the consistency result (9) holds for any criterion of the type (8) with $\lim_{T\to\infty} C(T)/T = 0$ and $\lim_{T\to\infty} C(T) = \infty$.

OTHER MODEL SELECTION APPROACHES

General-to-specific criterion (Ng and Perron, 1995). This method amounts to:

1 Set the maximum number of parameters to be estimated, for instance an AR(k) model with k = 7.
2 Estimate the "general" model (an AR(7) in this case).
3 Test for the statistical significance of the coefficient associated with the highest-order lag, with either a t or an F test ($x_{t-7}$ in this case).
4 Exclude the nonsignificant parameters, re-estimate the smaller model and test again for significance. Repeat until all the remaining estimates are significantly different from zero.

Bootstrap model selection techniques are another alternative.
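The general-to-specific steps can be sketched as follows. Here the t statistic of the highest-order lag is approximated by $\sqrt{T}\,\hat{\phi}_{kk}$, the scaled lag-k sample partial autocorrelation from the Levinson-Durbin recursion; this is an asymptotic shortcut, not the exact regression t test from the slide, and all names are invented:

```python
import random

def acov(x, h):
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / n

def pacf(x, kmax):
    """Partial autocorrelations via the Levinson-Durbin recursion."""
    r = [acov(x, h) / acov(x, 0) for h in range(kmax + 1)]
    phi = [[0.0] * (kmax + 1) for _ in range(kmax + 1)]
    pac = [0.0] * (kmax + 1)
    phi[1][1] = pac[1] = r[1]
    for k in range(2, kmax + 1):
        num = r[k] - sum(phi[k - 1][j] * r[k - j] for j in range(1, k))
        den = 1 - sum(phi[k - 1][j] * r[j] for j in range(1, k))
        phi[k][k] = pac[k] = num / den
        for j in range(1, k):
            phi[k][j] = phi[k - 1][j] - phi[k][k] * phi[k - 1][k - j]
    return pac

random.seed(6)
T = 800
x = [0.0]
for _ in range(T):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))
x = x[1:]

# General-to-specific: start from k = 7 and drop the highest lag while its
# approximate t statistic sqrt(T) * pacf(k) is insignificant at the 5% level.
pac = pacf(x, 7)
k = 7
while k > 1 and abs(T ** 0.5 * pac[k]) < 1.96:
    k -= 1
print(k)
```

For a true AR(1) the loop should usually walk down to k = 1, though any single high-order lag can spuriously test significant about 5% of the time.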


DIAGNOSTIC CHECKING

Typically, the goodness of fit of a statistical model is judged by comparing the observed data with the values predicted by the estimated model. If the fitted model is appropriate, the residuals should behave in a similar way to the true innovations of the process.

Thus, we should plot the residuals of the estimated model to check whether they verify the main assumptions: zero mean, homoskedasticity (constant variance) and no significant correlations. If the graph shows a cyclic or trending component, or a non-constant variance, this can be interpreted as a sign of misspecification.

The autocorrelation function of the residuals should not display any significant correlations. Let $\{e_t\}$ be the sequence of residuals, given by

   $e_t = X_t - \hat{X}_t$

where $\hat{X}_t$ are the fitted values.

Then, the autocorrelation function is given by

   $\hat{\rho}_e(h) = \frac{\sum_{t=1}^{T-h} (e_t - \bar{e})(e_{t+h} - \bar{e})}{\sum_{t=1}^{T} (e_t - \bar{e})^2}, \quad h = 1, 2, \ldots$

Assume first that the true parameter values were known. Then $e_t = \varepsilon_t$. In this case, we know (see Chapter 1) that $\sqrt{T}\,\hat{\rho}_e \xrightarrow{d} N(0, I_H)$, where $\hat{\rho}_e = (\hat{\rho}_e(1), \ldots, \hat{\rho}_e(H))'$ and $I_H$ is the $H \times H$ identity matrix. Then the null hypothesis that the first H autocorrelations are not significant can be tested using the Box-Pierce Q statistic:

   $Q = T\sum_{i=1}^{H} \hat{\rho}_e^2(i) \xrightarrow{d} \chi^2_H$   (10)


However, in practice the values of the parameters are unknown and they
should be estimated. The fact that the parameters employed to compute et
are not the true ones but are estimates has an impact on the asymptotic
distribution.
More specifically, the asymptotic variance of the sample autocorrelations is
not the identity matrix anymore. Hence, in this case (10) no longer holds.
However, under certain assumptions it is still possible to use the sample
autocorrelations for diagnosis of the model. The following proposition
presents the distribution of the sample autocorrelations of the sample
residuals, when the parameters of the model are estimated.


PROPOSITION. Suppose that $Y_t = X_t'\beta + \varepsilon_t$, $t = 1, \ldots, T$, where $Y_t$ and $\varepsilon_t$ are scalar and $X_t$ and $\beta$ are $k \times 1$ vectors. If $\{Y_t, X_t\}$ are jointly stationary and ergodic, $E(X_t X_t')$ is a full rank matrix,

   $E(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, X_t, X_{t-1}, \ldots) = 0$

and

   $E(\varepsilon_t^2 \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, X_t, X_{t-1}, \ldots) = \sigma^2 > 0$

then

   $\sqrt{T}\,\hat{\rho}_e \xrightarrow{d} N(0, I_H - Q)$

where $q_{jk}$ (the (j, k) element of the $H \times H$ matrix Q) is given by

   $q_{jk} = E(X_t\varepsilon_{t-j})'\,\left[E(X_t X_t')\right]^{-1} E(X_t\varepsilon_{t-k}) / \sigma^2$

Proof: see Hayashi, p. 165.

Further, if $X_t$ is an ARMA(p, q) process and the model is correctly specified, it can be shown that the Box-Pierce Q statistic for diagnostic checking when the parameters are estimated is approximately distributed as

   $Q_e = T\sum_{i=1}^{H} \hat{\rho}_e^2(i) \sim \chi^2_{H-p-q}$

The adequacy of the model is rejected at level $\alpha$ if

   $Q_e = T\sum_{i=1}^{H} \hat{\rho}_e^2(i) > \chi^2_{1-\alpha}(H-p-q)$

Finally, Ljung and Box (1978) suggest replacing $Q_e$ by

   $\tilde{Q}_e = T(T+2)\sum_{i=1}^{H} \frac{\hat{\rho}_e^2(i)}{T-i}$

since they claim that this statistic offers a better approximation to the $\chi^2$ distribution.
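The Ljung-Box statistic is a few lines of code. A sketch (Python, names invented; remember that the degrees of freedom should be reduced by p + q when the series being tested consists of residuals from an estimated ARMA(p, q)):

```python
import random

def ljung_box(e, H):
    """Ljung-Box statistic: T(T+2) * sum_{i=1..H} rho_i^2 / (T - i)."""
    T = len(e)
    m = sum(e) / T
    d = sum((v - m) ** 2 for v in e)
    q = 0.0
    for i in range(1, H + 1):
        rho = sum((e[t] - m) * (e[t + i] - m) for t in range(T - i)) / d
        q += rho * rho / (T - i)
    return T * (T + 2) * q

# For i.i.d. data the statistic should look like an ordinary chi-square draw.
random.seed(7)
e = [random.gauss(0.0, 1.0) for _ in range(500)]
Q = ljung_box(e, 10)
print(Q)  # 18.31 is the 5% critical value of a chi-square with 10 df
```

Applied to genuine residuals, the comparison would instead use the chi-square critical value with H - p - q degrees of freedom.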


FORECASTING

An h-step-ahead forecast, $\hat{x}_{t+h|t}$, is designed to minimize expected loss, conditional on time-t information. Examples of widely used loss functions:

- MSE: $(x_{t+h} - \hat{x}_{t+h|t})^2$
- MAD: $|x_{t+h} - \hat{x}_{t+h|t}|$
- Quad-Quad: $\alpha_1 (x_{t+h} - \hat{x}_{t+h|t})^2 + \alpha_2\, I_{[x_{t+h} - \hat{x}_{t+h|t} < 0]}\, (x_{t+h} - \hat{x}_{t+h|t})^2$, asymmetric if $\alpha_1 \ne \alpha_2$

THE MSE OPTIMAL FORECAST IS THE CONDITIONAL MEAN

Let $x^*_{t+h} = E_t[x_{t+h}]$ and let $\hat{x}_{t+h}$ be any other value. Then

   $E_t\left[(x_{t+h} - \hat{x}_{t+h})^2\right] = E_t\left[\left((x_{t+h} - x^*_{t+h}) + (x^*_{t+h} - \hat{x}_{t+h})\right)^2\right]$
   $= E_t\left[(x_{t+h} - x^*_{t+h})^2 + 2(x_{t+h} - x^*_{t+h})(x^*_{t+h} - \hat{x}_{t+h}) + (x^*_{t+h} - \hat{x}_{t+h})^2\right]$
   $= V_t[x_{t+h}] + 2\,E_t\left[(x_{t+h} - x^*_{t+h})(x^*_{t+h} - \hat{x}_{t+h})\right] + E_t\left[(x^*_{t+h} - \hat{x}_{t+h})^2\right]$
   $= V_t[x_{t+h}] + 2\,(x^*_{t+h} - \hat{x}_{t+h})\,E_t\left[x_{t+h} - x^*_{t+h}\right] + (x^*_{t+h} - \hat{x}_{t+h})^2$
   $= V_t[x_{t+h}] + 2\,(x^*_{t+h} - \hat{x}_{t+h})\cdot 0 + (x^*_{t+h} - \hat{x}_{t+h})^2$
   $= V_t[x_{t+h}] + (x^*_{t+h} - \hat{x}_{t+h})^2$

which is minimized by setting $\hat{x}_{t+h} = x^*_{t+h}$: the conditional mean is the MSE-optimal forecast.

MSE optimal forecast for an AR(1):

   $x_t = \phi_1 x_{t-1} + \varepsilon_t$
   $E_t[x_{t+1}] = E_t[\phi_1 x_t + \varepsilon_{t+1}] = \phi_1 E_t[x_t] + E_t[\varepsilon_{t+1}] = \phi_1 x_t + 0$
   $E_t[x_{t+2}] = E_t[\phi_1 x_{t+1} + \varepsilon_{t+2}] = \phi_1 E_t[x_{t+1}] + E_t[\varepsilon_{t+2}] = \phi_1(\phi_1 x_t) + 0 = \phi_1^2 x_t$
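Iterating the conditional expectation gives $\hat{x}_{t+h|t} = \phi_1^h x_t$ for a zero-mean AR(1). A tiny sketch with illustrative values:

```python
phi = 0.8
x_t = 2.0   # last observed value (illustrative)

forecasts = []
f = x_t
for h in range(1, 6):
    f = phi * f          # E_t[x_{t+h}] = phi * E_t[x_{t+h-1}]
    forecasts.append(f)
print(forecasts)
```

The forecasts decay geometrically toward the unconditional mean of zero as the horizon grows.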

FORECAST EVALUATION

Two primary criteria to evaluate forecasts: objective and relative.

Objective: Mincer-Zarnowitz regressions

   $x_{t+h} = \alpha + \beta\,\hat{x}_{t+h|t} + \eta_t$
   $H_0: \alpha = 0, \beta = 1; \quad H_1: \alpha \ne 0 \text{ or } \beta \ne 1$

Use any test: Wald, LR, LM. The regression can be generalized to include any variable available when the forecast was produced:

   $x_{t+h} = \alpha + \beta\,\hat{x}_{t+h|t} + \gamma' z_t + \eta_t$
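A Mincer-Zarnowitz regression is just OLS of realizations on forecasts. A sketch with simulated data (Python, all names invented; here the "forecast" is constructed so that the null $\alpha = 0$, $\beta = 1$ actually holds):

```python
import random

random.seed(8)
# Hypothetical forecasts f_t of targets y_t; an unbiased, efficient forecast
# should give alpha close to 0 and beta close to 1 in y = alpha + beta*f + error.
f = [random.gauss(0.0, 1.0) for _ in range(1000)]
y = [ft + random.gauss(0.0, 0.5) for ft in f]   # forecast error independent of f

n = len(y)
fbar = sum(f) / n
ybar = sum(y) / n
sff = sum((v - fbar) ** 2 for v in f)
sfy = sum((a - fbar) * (b - ybar) for a, b in zip(f, y))
beta = sfy / sff
alpha = ybar - beta * fbar
print(alpha, beta)
```

A Wald test of the joint null would add standard errors on top of these point estimates.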

   $H_0: \alpha = 0, \beta = 1, \gamma = 0; \quad H_1: \alpha \ne 0 \text{ or } \beta \ne 1 \text{ or } \gamma_j \ne 0$

$z_t$ must be in the time-t information set. This is important when working with macro data.

RELATIVE EVALUATION: DIEBOLD-MARIANO

Two forecasts, $\hat{x}^A_{t+h|t}$ and $\hat{x}^B_{t+h|t}$, and two losses

   $l^A_t = (x_{t+h} - \hat{x}^A_{t+h|t})^2 \quad \text{and} \quad l^B_t = (x_{t+h} - \hat{x}^B_{t+h|t})^2$

The losses do not need to be MSE. If the two forecasts are equally good (or bad), $E[l^A_t] = E[l^B_t]$, or $E[l^A_t - l^B_t] = 0$. Define $\delta_t = l^A_t - l^B_t$.

Implemented as a t-test of $E[\delta_t] = 0$:

   $H_0: E[\delta_t] = 0; \quad H^A_1: E[\delta_t] < 0; \quad H^B_1: E[\delta_t] > 0$

The alternative is composite, and the sign indicates which model is favored.

   $DM = \frac{\bar{\delta}}{\sqrt{\hat{V}[\bar{\delta}]}}$

One complication: $\{\delta_t\}$ cannot be assumed to be uncorrelated, so a more complicated variance estimator is required. Newey-West covariance estimator:

   $\hat{\sigma}^2 = \hat{\gamma}_0 + 2\sum_{l=1}^{L}\left(1 - \frac{l}{L+1}\right)\hat{\gamma}_l$

IMPLEMENTING A DIEBOLD-MARIANO TEST

   $DM = \frac{\bar{\delta}}{\sqrt{\hat{V}[\bar{\delta}]}}$

Algorithm (Diebold-Mariano Test)
1 Using the two forecasts, $\hat{x}^A_{t+h|t}$ and $\hat{x}^B_{t+h|t}$, compute $\delta_t = l^A_t - l^B_t$.
2 Run the regression $\delta_t = \mu + \eta_t$.
3 Use a Newey-West covariance estimator (olsnw in Matlab).
4 t-test of $H_0: \mu = 0$ against $H^A_1: \mu < 0$ and $H^B_1: \mu > 0$.
5 Reject if $|t| > C_\alpha$, where $C_\alpha$ is the critical value for a two-sided test using a normal distribution with size $\alpha$. If significant, reject in favor of model A if the test statistic is negative, or in favor of model B if the test statistic is positive.
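The algorithm above can be sketched in a few lines; the long-run variance uses the Newey-West estimator from the previous slide (Python, all names invented, L = 10 lags chosen arbitrarily):

```python
import math
import random

def diebold_mariano(lA, lB, L=10):
    """DM statistic: t-test that the mean loss differential is zero, with a
    Newey-West (Bartlett kernel) long-run variance using L lags."""
    d = [a - b for a, b in zip(lA, lB)]
    n = len(d)
    dbar = sum(d) / n
    u = [v - dbar for v in d]
    gamma0 = sum(v * v for v in u) / n
    lrv = gamma0
    for l in range(1, L + 1):
        gl = sum(u[t] * u[t - l] for t in range(l, n)) / n
        lrv += 2 * (1 - l / (L + 1)) * gl
    return dbar / math.sqrt(lrv / n)

random.seed(9)
y = [random.gauss(0.0, 1.0) for _ in range(500)]
# Forecast A: the optimal forecast (0 for white noise); forecast B: a biased one.
lA = [(yt - 0.0) ** 2 for yt in y]
lB = [(yt - 0.3) ** 2 for yt in y]
dm = diebold_mariano(lA, lB)
print(dm)  # a negative value favors model A
```

Since model A has lower expected loss by construction, the statistic should come out clearly negative here.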


MODEL IDENTIFICATION IN THE GENERAL CASE

Identifying a model for $X_t$ refers to the methodology for selecting
- the appropriate transformations for obtaining stationarity (such as variance-stabilizing transformations and differencing),
- values for p, q and d (the integration order),
- whether a deterministic component should or should not be included in the model.

The following steps should be followed to identify the model:

Step 1. Plot the data and choose the proper transformations.
The plot usually shows whether the data contain a trend, a seasonal component, outliers, non-constant variance, etc., and hence may suggest some appropriate transformations. The most commonly used are variance-stabilizing transformations, typically a logarithmic transformation or, more generally, a Box-Cox transformation, and/or differencing. Always apply the variance-stabilizing transformation before taking any differences.

Step 2. Compute and examine the sample ACF and sample PACF of the (variance-stabilized) process.
If these functions decay very slowly, this is a sign of non-stationarity. Unit root tests should also be used at this stage. Typically, the number of differences will be 0, 1 or 2.

O UTLINE I NTRODUCTION E STIMATION

A SYMPTOTIC PROPERTIES I DENTIFICATION D IAGNOSTIC CHECKING F ORECASTING M ODEL IDE

Step 3. Compute the ACF and PACF of the transformed (stationary) variable to
identify p and q.
Identifying the order of a pure AR or MA polynomial from these functions is, in
theory, easy with the table above. However, it is more difficult to identify the
orders of a mixed ARMA process. In these cases, other model selection
mechanisms, such as information criteria, can be used.
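When the ACF/PACF are inconclusive, a small AIC grid search is a common fallback; a sketch on hypothetical simulated ARMA(1,1) data:

```r
# Fit all small (p, q) combinations and keep the model with the lowest AIC
set.seed(1)
y <- arima.sim(list(ar = 0.5, ma = 0.3), n = 500)   # hypothetical ARMA(1,1) sample
best.aic   <- Inf
best.order <- c(0, 0, 0)
for (p in 0:2) for (q in 0:2) {
  fit <- try(arima(y, order = c(p, 0, q)), silent = TRUE)
  if (!inherits(fit, "try-error") && fit$aic < best.aic) {
    best.aic   <- fit$aic
    best.order <- c(p, 0, q)
  }
}
best.order   # selected (p, d, q)
```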
Step 4. Test the deterministic trend term when d > 0.
If d > 0 and a trend in the data is not suspected, a deterministic trend term
should not be included. However, if there is reason to believe that a trend is
present, one can include the term in the model and then discard it if its
coefficient is not significant.
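For Step 4 in R, note that arima() drops the constant whenever d > 0; one common workaround (a sketch on hypothetical data, not from these slides) is to pass a deterministic trend through xreg and check its significance:

```r
# Hypothetical random walk with drift; with d = 1 the constant is omitted,
# so the deterministic trend enters through xreg
set.seed(1)
y     <- cumsum(rnorm(200, mean = 0.2))
drift <- seq_along(y)
fit   <- arima(y, order = c(0, 1, 0), xreg = drift)
# rough t-ratio for the trend coefficient; discard the term if insignificant
fit$coef[1] / sqrt(fit$var.coef[1, 1])
```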
For some interesting examples, see Wei, Chapter 6, and Brockwell and
Davis, Chapter 9.


ESTIMATION

Once the appropriate transformations (basically, variance-stabilizing
transformations and/or differencing) have been performed, the resulting
process is stationary.
At this stage, the estimation techniques developed for the stationary ARMA
case apply directly to the transformed process.

CODE

The time series functions used below (acf, pacf, arima.sim, arima) ship with
R's built-in stats package, so no extra download is needed.

A first step in analysing time series is to examine the autocorrelations (ACF)
and partial autocorrelations (PACF). R provides the functions acf() and
pacf() for computing and plotting the ACF and PACF:
sim.ar <- arima.sim(list(ar=c(0.4,0.4)), n=1000)
sim.ma <- arima.sim(list(ma=c(0.6,-0.4)), n=1000)
par(mfrow=c(2,2))
acf(sim.ar, main="ACF of AR(2) process")
acf(sim.ma, main="ACF of MA(2) process")
pacf(sim.ar, main="PACF of AR(2) process")
pacf(sim.ma, main="PACF of MA(2) process")
The function arima(data, order = c(p, d, q)) can be used to estimate the
parameters:
fit <- arima(sim.ar, order=c(2,0,0))
fit is a list containing the coefficients (fit$coef), the residuals (fit$residuals)
and the Akaike Information Criterion (fit$aic)
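arima() also returns the estimated covariance matrix of the coefficients in fit$var.coef, so approximate t-ratios can be computed directly (a sketch reusing the simulated AR(2) series):

```r
# Approximate t-ratios: coefficient estimates over asymptotic standard errors
set.seed(1)
sim.ar <- arima.sim(list(ar = c(0.4, 0.4)), n = 1000)
fit <- arima(sim.ar, order = c(2, 0, 0))
fit$coef / sqrt(diag(fit$var.coef))   # ar1, ar2 and the intercept
```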

Diagnostic Checking
A first step in diagnostic checking of fitted models is to analyse the residuals
from the fit for any signs of non-randomness. R has the function tsdiag(),
which produces a diagnostic plot of a fitted time series model:
tsdiag(fit)
It produces a plot of the standardized residuals, the autocorrelation function
of the residuals, and the p-values of the Ljung-Box statistic for the first 10 lags.
The function Box.test() computes the test statistic for a given lag:
Box.test(fit$residuals, lag=1)
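Box.test() computes the Box-Pierce statistic by default; the Ljung-Box variant, with a degrees-of-freedom correction for the p + q estimated coefficients, is often preferred (sketch reusing the simulated AR(2) fit):

```r
# Ljung-Box test at lag 10; fitdf = 2 accounts for the two estimated AR coefficients
set.seed(1)
sim.ar <- arima.sim(list(ar = c(0.4, 0.4)), n = 1000)
fit <- arima(sim.ar, order = c(2, 0, 0))
Box.test(fit$residuals, lag = 10, type = "Ljung-Box", fitdf = 2)
```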


Prediction of ARMA Models
predict() can be used for predicting future values of the levels under the
model:
AR.pred <- predict(fit, n.ahead=8)
AR.pred is a list containing two entries: the predicted values AR.pred$pred
and the standard errors of the prediction AR.pred$se. Using the rule of thumb
for an approximate 95% prediction interval, prediction ± 2·SE, one can plot
the AR data, the predicted values and the approximate interval:
plot(sim.ar)
lines(AR.pred$pred, col="red")
lines(AR.pred$pred+2*AR.pred$se, col="red", lty=3)
lines(AR.pred$pred-2*AR.pred$se, col="red", lty=3)
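Forecast accuracy can also be checked on a holdout sample; a sketch (the 950/50 split is arbitrary) scoring the 1-to-50-step-ahead forecasts by MSE:

```r
# Fit on the first 950 observations, evaluate the last 50 by mean squared error
set.seed(1)
y   <- arima.sim(list(ar = c(0.4, 0.4)), n = 1000)
fit <- arima(y[1:950], order = c(2, 0, 0))
fc  <- predict(fit, n.ahead = 50)$pred
mean((y[951:1000] - fc)^2)   # out-of-sample MSE
```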

REFERENCES

Brockwell, P. J. and Davis, R. A. (1991), Time Series: Theory and Methods, Chapters 8, 9.
Hamilton, J. D. (1994), Time Series Analysis, Chapter 5.
Wei, W. W. S., Time Series Analysis: Univariate and Multivariate Methods, Chapters 6, 7.
