
OUTLINE · INTRODUCTION · ESTIMATION · ASYMPTOTIC PROPERTIES · IDENTIFICATION · DIAGNOSTIC CHECKING · FORECASTING · MODEL IDENTIFICATION

MODEL IDENTIFICATION, ESTIMATION AND DIAGNOSTIC CHECKING

Vlayoudom Marimoutou
AMSE Master 2
Autumn 2013

December 11, 2013

OUTLINE

1 Introduction
2 Estimation of ARMA(p, q) processes
   Estimation of AR(p) processes
      Estimation of an AR(p) for a fixed (and known) p
      Estimation of AR(∞) processes
   Estimation of ARMA(p, q) processes: maximum likelihood estimation
      Likelihood function for AR processes
      Exact versus conditional MLE
      Likelihood function for MA processes
      Numerical optimization of the log likelihood
3 Asymptotic properties of ML estimators and inference
4 Identification of an ARMA process: selecting p and q
   Information criteria
   Other model selection approaches
5 Diagnostic checking
6 Forecasting
   The MSE optimal forecast is the conditional mean
   Forecast evaluation
   Relative evaluation: Diebold-Mariano

Introduction
Estimation of ARMA(p, q) processes
Asymptotic properties of ML estimators and inference
Identification of an ARMA process: selecting p and q
Diagnostic checking
Forecasting
Model identification in the general case
Code
References

Suppose you want to fit an ARMA(p, q) model to a univariate stationary process $\{X_t\}$.

The first step is to set values for p and q. We will call this step identification.

Next, we estimate the corresponding ARMA(p, q) model. The most popular method for estimating ARMA processes is Maximum Likelihood (ML).

AR processes can also be estimated by OLS. It can be shown that this approach is asymptotically equivalent to ML. This is very convenient because OLS estimators have closed-form analytical expressions and are straightforward to compute.

The final step is to check whether the residuals of the estimated model satisfy the assumptions that have been imposed on the innovations.


ESTIMATION OF AR(p) PROCESSES

Assume for the time being that (p, q) are known. We will first consider estimation of AR processes (it is simpler) and then move to estimation of general ARMA(p, q) processes. Here we distinguish two cases:
- $X_t$ follows an AR(p) for some finite p
- $X_t$ follows an AR(∞) process.

Let $\{X_t\}$ be a stationary AR(p) process with $p < \infty$, i.e.

   $X_t = \phi_0 + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t$

where $\varepsilon_t$ is a zero-mean innovation (either white noise, m.d.s. or i.i.d.) with variance $\sigma^2$.

It is possible to estimate $\phi = (\phi_0, \phi_1, \ldots, \phi_p)'$ by applying the standard OLS formula, and the resulting estimator is $\sqrt{T}$-consistent and asymptotically normal.
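As a sanity check, the OLS formula is easy to apply directly. The sketch below (plain Python rather than the course's R, with all variable names invented here) simulates a zero-mean AR(1) and computes $\hat{\phi}_{ols} = \sum X_t X_{t-1} / \sum X_{t-1}^2$:

```python
import random

random.seed(0)
phi_true = 0.6
T = 5000

# Simulate a zero-mean AR(1): X_t = phi * X_{t-1} + eps_t, eps_t ~ N(0, 1)
x = [0.0]
for _ in range(T):
    x.append(phi_true * x[-1] + random.gauss(0.0, 1.0))

# OLS estimator: phi_hat = sum(X_t * X_{t-1}) / sum(X_{t-1}^2)
num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
phi_hat = num / den

print(phi_hat)
```

With T = 5000 the estimate lands close to the true value of 0.6, up to sampling noise of order $\sqrt{(1-\phi^2)/T}$.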

We now provide a sketch of the proof for the simplest case. Let $\{X_t\}$ be an AR(1) process given by

   $X_t = \phi X_{t-1} + \varepsilon_t$

where $\varepsilon_t$ is an m.d.s. and $|\phi| < 1$ (which implies that $\{X_t\}$ is stationary and ergodic). Then,

   $\hat{\phi}_{ols} = \frac{\sum X_t X_{t-1}}{\sum X_{t-1}^2} = \phi + \frac{\sum X_{t-1}\varepsilon_t}{\sum X_{t-1}^2}$

Consistency

   $\hat{\phi}_{ols} - \phi = \frac{\sum X_{t-1}\varepsilon_t}{\sum X_{t-1}^2} \xrightarrow{p} 0$

Under the previous assumptions, $X_t^2$ is also stationary and ergodic, thus by the LLN,

   $\frac{\sum X_{t-1}^2}{T} \xrightarrow{p} \mathrm{var}(X_{t-1}) = \frac{\sigma^2}{1-\phi^2}$

As for the numerator, notice that if $\varepsilon_t$ is an m.d.s., so is $X_{t-1}\varepsilon_t$. Therefore, a LLN also applies to the numerator and $\frac{\sum X_{t-1}\varepsilon_t}{T} \xrightarrow{p} E(X_{t-1}\varepsilon_t) = 0$. Then

   $\hat{\phi}_{ols} = \phi + \frac{o_p(T)}{O_p(T)} = \phi + \frac{o_p(1)}{O_p(1)} = \phi + o_p(1)$

Asymptotic normality

   $T^{1/2}\left(\hat{\phi}_{ols} - \phi\right) = \frac{T^{-1/2}\sum X_{t-1}\varepsilon_t}{T^{-1}\sum X_{t-1}^2} = \frac{T^{-1/2}\sum X_{t-1}\varepsilon_t}{\mathrm{var}(X_t)} + o_p(1)$

By the CLT for m.d.s.,

   $T^{-1/2}\sum X_{t-1}\varepsilon_t \xrightarrow{d} N\left(0, \mathrm{var}(X_t)\,\sigma^2\right)$

Thus

   $T^{1/2}\left(\hat{\phi}_{ols} - \phi\right) \xrightarrow{d} N\left(0, \frac{\sigma^2}{\mathrm{var}(X_t)}\right) = N\left(0, 1-\phi^2\right)$

Remark: What happens with the distribution of $\hat{\phi}_{ols}$ if $\phi \to 1$?

A similar proof can be established for a general stationary AR(p) process.
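The limiting distribution $N(0, 1-\phi^2)$ can be checked by simulation. A minimal Monte Carlo sketch (Python, illustrative only, assuming Gaussian innovations; all names are my own):

```python
import math
import random

random.seed(1)
phi, T, reps = 0.5, 400, 500

# Monte Carlo check that sqrt(T) * (phi_hat - phi) has variance
# close to the asymptotic value 1 - phi^2.
zs = []
for _ in range(reps):
    x = [0.0]
    for _ in range(T):
        x.append(phi * x[-1] + random.gauss(0.0, 1.0))
    num = sum(x[t] * x[t - 1] for t in range(1, T + 1))
    den = sum(x[t - 1] ** 2 for t in range(1, T + 1))
    zs.append(math.sqrt(T) * (num / den - phi))

mc_var = sum(z * z for z in zs) / reps
print(mc_var, 1 - phi**2)  # the two numbers should be close
```

With $\phi = 0.5$ the Monte Carlo variance should hover around 0.75; pushing $\phi$ toward 1 shrinks it toward 0, hinting at the non-standard limit in the unit-root case.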

AR(p) processes (for finite p) are easier to estimate than MA processes, since OLS can be applied. We know that an invertible ARMA process can be written as an AR(∞) process.

Berk showed that if an AR(k) process is fitted to $X_t$, where k tends to infinity with the sample size, it is possible to obtain consistent and asymptotically normally distributed estimators of the relevant coefficients. More explicitly, k has to verify two conditions:

- an upper bound condition, $k^3/T \to 0$ (which says that k should not increase too quickly),
- and a lower bound one, $T^{1/2}\sum_{j=k+1}^{\infty}|\phi_j| \to 0$ (which says that k must not increase too slowly).

How to choose k in practice?

The above-mentioned conditions do not help much in choosing the appropriate k in applications. Ng and Perron (2005) have shown that the value of k chosen by the AIC and the BIC does not verify the lower bound condition. This implies that the estimates are consistent but not asymptotically normal.

Kuersteiner (2005) has proved that general-to-specific model selection verifies both the upper and the lower bound conditions. Then, the resulting estimates are consistent and asymptotically normal.

ESTIMATION OF ARMA(p, q) PROCESSES: MAXIMUM LIKELIHOOD ESTIMATION

Let $\{X_t\}$ be an ARMA(p, q) process

   $\phi_p(L) X_t = c + \theta_q(L)\varepsilon_t$

and let $\theta = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, c, \sigma^2)'$ be the vector containing all the unknown parameters. Suppose that we have observed a sample of size T, $(x_1, x_2, \ldots, x_T)$.

The ML approach amounts to calculating the joint probability density of $(X_T, \ldots, X_1)$,

   $f_{X_T, X_{T-1}, \ldots, X_1}(x_T, x_{T-1}, \ldots, x_1; \theta)$   (1)

which might be loosely interpreted as the probability of having observed this particular sample. The maximum likelihood estimator (MLE) of $\theta$ is the value that maximizes (1), i.e., the probability of having observed this sample.

The first step is to specify a particular distribution for the white noise innovations of the process, $\varepsilon_t$. Typically, it will be assumed that $\varepsilon_t$ is a Gaussian white noise, $\varepsilon_t \sim$ i.i.d. $N(0, \sigma^2)$.

This assumption is strong, but even when Gaussianity does not hold, the estimator computed by maximizing the Gaussian likelihood has, under certain conditions, the same asymptotic properties as if the process were indeed normal. In this case, the estimator of the parameters of a non-Gaussian process computed from a Gaussian likelihood is called a Quasi (or Pseudo) MLE.

The second step is to calculate the likelihood function (1). The exact closed form of the likelihood function of a general ARMA process is complicated. In the following we illustrate how the likelihood is computed for simple AR and MA processes.

The final step is to obtain the values of $\theta$ that maximize the log likelihood. Unless the process is a pure AR process (in which case OLS can be applied), numerical optimization procedures must be applied in order to obtain the estimates.

LIKELIHOOD FUNCTION FOR AR PROCESSES

Let $\{X_t\}$ be an AR(1) process

   $X_t = c + \phi X_{t-1} + \varepsilon_t$

where $|\phi| < 1$, $\{\varepsilon_t\}$ is i.i.d. $N(0, \sigma^2)$ and $\theta = (c, \phi, \sigma^2)'$. We now compute the density of the first observation $X_1$. Clearly, since $\varepsilon_t$ is Gaussian, $X_1$ is Gaussian with $E(X_1) = c/(1-\phi)$ and $\mathrm{var}(X_1) = \sigma^2/(1-\phi^2)$, and then

   $f_{X_1}(x_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2/(1-\phi^2)}} \exp\left(-\frac{(x_1 - c/(1-\phi))^2}{2\sigma^2/(1-\phi^2)}\right)$

Next, we consider the density of $X_2$ conditional on $X_1 = x_1$. It is clear that $X_2 \mid (X_1 = x_1) \sim N(c + \phi x_1, \sigma^2)$, and then

   $f_{X_2|X_1}(x_2|x_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_2 - c - \phi x_1)^2}{2\sigma^2}\right)$

As for $X_3$, the distribution of $X_3$ conditional on $X_2 = x_2$ and $X_1 = x_1$ is

   $f_{X_3|X_2,X_1}(x_3|x_2, x_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_3 - c - \phi x_2)^2}{2\sigma^2}\right)$

The joint distribution of $X_1$, $X_2$ and $X_3$ can be written as

   $f_{X_3,X_2,X_1}(x_3, x_2, x_1; \theta) = f_{X_3|X_2,X_1}(x_3|x_2, x_1; \theta)\, f_{X_2,X_1}(x_2, x_1; \theta)$
   $= f_{X_3|X_2,X_1}(x_3|x_2, x_1; \theta)\, f_{X_2|X_1}(x_2|x_1; \theta)\, f_{X_1}(x_1; \theta)$

Furthermore, notice that the values of $X_1, X_2, \ldots, X_{T-1}$ matter for $X_t$ only through the value of $X_{t-1}$, and then

   $f_{X_t|X_{t-1},\ldots,X_1}(x_t|x_{t-1}, \ldots, x_1; \theta) = f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta)$   (2)

   $f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_t - c - \phi x_{t-1})^2}{2\sigma^2}\right)$   (3)

Thus, the joint density of the first T observations is

   $f_{X_T,X_{T-1},\ldots,X_1}(x_T, x_{T-1}, \ldots, x_1; \theta)$   (4)
   $= f_{X_T|X_{T-1}}(x_T|x_{T-1}; \theta)\, f_{X_{T-1}|X_{T-2}}(x_{T-1}|x_{T-2}; \theta) \cdots f_{X_1}(x_1; \theta)$   (5)
   $= f_{X_1}(x_1; \theta) \prod_{t=2}^{T} f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta)$   (6)

Taking logs,

   $L(\theta) = \log f_{X_1}(x_1; \theta) + \sum_{t=2}^{T} \log f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta)$   (7)

and $L$ is called the log-likelihood function.

Clearly, the value of $\theta$ that maximizes (6) and (7) is the same, but the maximization problem is simpler in the latter case, and so the log likelihood function is always preferred.

EXACT VERSUS CONDITIONAL MLE

The next step would be to compute the value of $\theta$ for which the exact log likelihood in (7) is maximized. This amounts to differentiating the log likelihood and equating the first derivatives to zero. The result is a system of nonlinear equations in $\theta$ and the sample, for which there is no closed-form solution in terms of $(x_1, \ldots, x_T)$. Then, iterative numerical procedures are needed to obtain $\hat{\theta}$.

An alternative procedure is to regard the value of $x_1$ as deterministic, and then

   $\log f_{X_T,\ldots,X_2|X_1}(x_T, \ldots, x_2|x_1; \theta)$
   $= \sum_{t=2}^{T} \log f_{X_t|X_{t-1}}(x_t|x_{t-1}; \theta)$
   $= -\frac{T-1}{2}\log(2\pi\sigma^2) - \sum_{t=2}^{T} \frac{(x_t - c - \phi x_{t-1})^2}{2\sigma^2}$

Notice that the conditional MLEs of $c$ and $\phi$ are obtained by minimizing

   $\sum_{t=2}^{T} (x_t - c - \phi x_{t-1})^2$

It follows that maximization of the conditional log likelihood is equivalent to minimization of the sum of squared residuals. More generally, the conditional MLE for an AR(p) process can be obtained from an OLS regression of $x_t$ on a constant and p of its own lagged values. In contrast to the exact MLE, the conditional maximum likelihood estimates are trivial to compute. Moreover, if the sample size T is sufficiently large, the first observation makes a negligible contribution to the total likelihood, provided $|\phi| < 1$.

The conditional MLE of the innovation variance is found by differentiating the conditional log likelihood with respect to $\sigma^2$ and setting the result equal to zero. It can be checked that

   $\hat{\sigma}^2 = \frac{\sum_{t=2}^{T} (x_t - \hat{c} - \hat{\phi} x_{t-1})^2}{T-1}$

and, in general, the MLE of $\sigma^2$ for an AR(p) process is given by the sum of squared residuals over $(T - p)$.

LIKELIHOOD FUNCTION FOR MA PROCESSES

Let $\{X_t\}$ be a Gaussian MA(1) process

   $X_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}$

with $\varepsilon_t$ i.i.d. $N(0, \sigma^2)$. Let $(\mu, \theta, \sigma^2)'$ be the population parameters to be estimated. Then $X_t \mid \varepsilon_{t-1} \sim N(\mu + \theta\varepsilon_{t-1}, \sigma^2)$ and

   $f_{X_t|\varepsilon_{t-1}}(x_t|\varepsilon_{t-1}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_t - \mu - \theta\varepsilon_{t-1})^2}{2\sigma^2}\right)$

It is not possible to compute this density directly, since $\varepsilon_{t-1}$ is not observed in the data. Suppose, however, that it is known that $\varepsilon_0 = 0$. Then the whole sequence of innovations can be computed recursively as

   $\varepsilon_1 = x_1 - \mu$
   $\varepsilon_2 = x_2 - \mu - \theta\varepsilon_1$
   $\ldots$
   $\varepsilon_t = x_t - \mu - \theta\varepsilon_{t-1}$

Then the conditional density is given by

   $f_{X_t|X_{t-1},\ldots,X_1}(x_t|x_{t-1}, \ldots, x_1, \varepsilon_0 = 0; \theta) = f_{X_t|\varepsilon_{t-1}}(x_t|\varepsilon_{t-1}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_t^2}{2\sigma^2}\right)$

while the conditional log likelihood is

   $L(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) - \sum_{t=1}^{T} \frac{\varepsilon_t^2}{2\sigma^2}$

The conditional log likelihood is a complicated nonlinear function of the parameters and, in contrast to the AR case, the estimates must be found by numerical optimization.

If $\theta$ is far from 1 in absolute value, the effect of imposing $\varepsilon_0 = 0$ quickly dies out. However, if $\theta$ is greater than 1 in absolute value, the error from imposing this restriction accumulates over time. Then, if the estimate of $\theta$ turns out to be greater than 1 in absolute value, the results should be discarded, and the numerical optimization should be attempted again with the reciprocal of $\hat{\theta}$ used as starting value.
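Under the assumption $\varepsilon_0 = 0$, the recursion and the conditional log likelihood above are straightforward to code. A hedged sketch in Python (not the course's R; function and variable names are my own):

```python
import math
import random

def ma1_loglik(params, x):
    """Conditional log likelihood of an MA(1), imposing eps_0 = 0."""
    mu, theta, sigma2 = params
    eps_prev = 0.0
    ll = 0.0
    for xt in x:
        # innovations computed recursively: eps_t = x_t - mu - theta * eps_{t-1}
        eps = xt - mu - theta * eps_prev
        ll += -0.5 * math.log(2 * math.pi * sigma2) - eps * eps / (2 * sigma2)
        eps_prev = eps
    return ll

# Simulate an MA(1) and check that the likelihood is higher at the true
# parameter values than at a clearly wrong value of theta.
random.seed(2)
mu, theta, sigma = 0.0, 0.5, 1.0
e = [random.gauss(0.0, sigma) for _ in range(1001)]
x = [mu + e[t] + theta * e[t - 1] for t in range(1, 1001)]

print(ma1_loglik((0.0, 0.5, 1.0), x) > ma1_loglik((0.0, -0.5, 1.0), x))
```

The function is what a numerical optimizer would be handed in the next step.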

NUMERICAL OPTIMIZATION OF THE LOG LIKELIHOOD

Once the log likelihood has been computed, the next step is to find the value $\hat{\theta}$ that maximizes $L(\theta)$. With the exception of pure AR processes, for which closed analytical expressions of the estimators are available, numerical optimization procedures must be employed to obtain the estimates. These procedures make different guesses for $\theta$, evaluate the likelihood at these values and try to infer from them the value for which it is largest. See Hamilton, Section 5.7, for a description of these methods.

The search procedure may be greatly accelerated if the optimization algorithm begins with parameter values that are close to the optimum. For this reason, simple preliminary estimates of $\theta$ are often employed to begin the search. See Brockwell and Davis, Sections 8.2-8.4, for a description of these preliminary estimators.
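A crude way to see the numerical optimization at work: for an MA(1) with $\mu = 0$ and $\sigma^2$ concentrated out of the likelihood, the conditional MLE of $\theta$ minimizes the log of the mean squared recursive innovation, which a plain grid search over the invertibility region can approximate. Real software uses Newton-type methods instead (see Hamilton, Section 5.7); this is an illustrative sketch with invented names:

```python
import math
import random

random.seed(3)
theta_true = 0.4
e = [random.gauss(0.0, 1.0) for _ in range(2001)]
x = [e[t] + theta_true * e[t - 1] for t in range(1, 2001)]

def neg_cond_loglik(theta, x):
    # Conditional (on eps_0 = 0) negative log likelihood with sigma^2
    # concentrated out: up to constants, proportional to log of the mean SSR.
    eps_prev, ssr = 0.0, 0.0
    for xt in x:
        eps = xt - theta * eps_prev
        ssr += eps * eps
        eps_prev = eps
    return math.log(ssr / len(x))

# Crude grid search over the invertibility region (-1, 1).
grid = [i / 100 for i in range(-99, 100)]
theta_hat = min(grid, key=lambda th: neg_cond_loglik(th, x))
print(theta_hat)
```

A good preliminary estimate would let a Newton-type routine start near the optimum instead of scanning the whole interval.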


ASYMPTOTIC PROPERTIES OF ML ESTIMATORS AND INFERENCE

ML estimators have very good asymptotic properties. Under certain conditions it can be shown that they are consistent, asymptotically normal and efficient, since their variance-covariance matrix equals the inverse of the Fisher information matrix.

Let $\{X_t\}$ be an ARMA(p, q) process, $\theta_0$ be the vector containing the true parameter values and $\hat{\theta}$ be the MLE of $\theta$. Assuming that neither $\theta_0$ nor $\hat{\theta}$ falls on the boundary of the parameter space, then

   $\sqrt{T}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, I^{-1}\right)$

where $I$ is the Fisher information matrix

   $I = -E\left[\frac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta'}\Big|_{\theta=\theta_0}\right]$
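In practice the Hessian is often computed numerically and inverted to get standard errors. A small sketch (Python, an AR(1) with $\sigma^2 = 1$ assumed known for simplicity, finite differences for the second derivative; all names invented):

```python
import math
import random

random.seed(4)
phi0, T = 0.5, 2000
x = [0.0]
for _ in range(T):
    x.append(phi0 * x[-1] + random.gauss(0.0, 1.0))

def loglik(phi):
    # Conditional AR(1) Gaussian log likelihood with sigma^2 = 1 (assumed known)
    return sum(-0.5 * math.log(2 * math.pi) - (x[t] - phi * x[t - 1]) ** 2 / 2
               for t in range(1, T + 1))

phi_hat = sum(x[t] * x[t - 1] for t in range(1, T + 1)) / \
          sum(x[t - 1] ** 2 for t in range(1, T + 1))

# Finite-difference estimate of the second derivative of L at phi_hat,
# i.e. minus the observed information; the standard error is the square
# root of the inverse of the negative Hessian.
h = 1e-4
d2 = (loglik(phi_hat + h) - 2 * loglik(phi_hat) + loglik(phi_hat - h)) / h**2
se = math.sqrt(-1.0 / d2)
print(se)
```

For this quadratic likelihood the finite difference recovers the analytic Hessian $-\sum x_{t-1}^2$ almost exactly.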

An estimator of the variance-covariance matrix is based on

   $\hat{I} = -T^{-1}\frac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta'}\Big|_{\theta=\hat{\theta}}$

which is often calculated numerically. Another popular estimator of the Fisher information matrix is the so-called outer product estimator

   $\hat{I} = T^{-1}\sum_{t=1}^{T} h_t(\hat{\theta})\, h_t(\hat{\theta})'$

where $h_t(\hat{\theta}) = \partial \log f(x_t|x_{t-1}, x_{t-2}, \ldots; \theta)/\partial\theta\,|_{\theta=\hat{\theta}}$ denotes the vector of derivatives of the log of the conditional density of the t-th observation with respect to the elements of the parameter vector $\theta$, evaluated at $\hat{\theta}$.

If the process is correctly specified, both estimators should yield similar values. If the two estimators differ a great deal, this may mean that the model is misspecified; see White (1982) for a general test of model specification based on this idea.

These estimates can be used to construct asymptotic standard errors, which in turn can be employed to construct confidence intervals and t tests. Other popular approaches to testing hypotheses about parameters estimated by ML are the likelihood ratio and the Lagrange multiplier tests. See Hamilton, Chapter 5.8.


IDENTIFICATION OF AN ARMA PROCESS: SELECTING p AND q

How to select p and q?

The ACF and PACF are of help, but their usefulness is limited in the general ARMA(p, q) case. In this section we will see other ways of selecting these values: model selection criteria.

If you assume that your model contains an infinite number of parameters, or that the type of models you are considering for estimation does not include the true model, the goal is to select the model that best approximates the true model from a set of finite-dimensional candidate models. In large samples, a model selection criterion that chooses the model with minimum mean squared error is said to be asymptotically efficient.

Many researchers instead assume that the true model is of finite dimension and is included in the set of candidate models. The goal in this case is to correctly choose the true model from the list of candidates. A model selection criterion that identifies the correct model asymptotically with probability 1 is said to be consistent.

INFORMATION CRITERIA

Information criteria are mechanisms designed for choosing an appropriate number of parameters to be included in the model. By minimizing a particular distance between the true and the candidate models (the Kullback-Leibler discrepancy; see Gourieroux-Monfort), it is possible to arrive at expressions that usually have the general form

   $I_T(k) = \log(\hat{\sigma}_k^2) + k\,\frac{C(T)}{T}$   (8)

where the second term is a penalty term. Some popular information criteria are:

- Akaike (AIC): $C(T) = 2$
- Schwarz or Bayesian (BIC): $C(T) = \log T$
- Hannan and Quinn (HQ): $C(T) = 2\log(\log T)$
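The criteria above can be computed directly from OLS fits of increasing AR orders. A self-contained sketch (Python, all helper names invented; the same estimation sample is used for every order so that the criteria are comparable across k):

```python
import math
import random

random.seed(5)
T = 600
# Simulate an AR(2): X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + eps_t
x = [0.0, 0.0]
for _ in range(T):
    x.append(0.5 * x[-1] - 0.3 * x[-2] + random.gauss(0.0, 1.0))
x = x[2:]

def ar_sigma2(x, k, m):
    """OLS fit of an AR(k), no intercept, on observations m..end; returns SSR/n."""
    n = len(x) - m
    X = [[x[t - j] for j in range(1, k + 1)] for t in range(m, len(x))]
    y = [x[t] for t in range(m, len(x))]
    # Normal equations A beta = b, solved by Gaussian elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(k)]
    for i in range(k):                       # forward elimination
        p = A[i][i]
        for r in range(i + 1, k):
            f = A[r][i] / p
            A[r] = [a - f * c for a, c in zip(A[r], A[i])]
            b[r] -= f * b[i]
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):           # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    ssr = sum((yt - sum(be * xi for be, xi in zip(beta, r))) ** 2
              for r, yt in zip(X, y))
    return ssr / n

m = 6  # common estimation sample for all candidate orders
ics = {}
for k in range(1, m + 1):
    s2 = ar_sigma2(x, k, m)
    ics[k] = {"AIC": math.log(s2) + k * 2 / T,
              "BIC": math.log(s2) + k * math.log(T) / T}

k_bic = min(ics, key=lambda k: ics[k]["BIC"])
print(k_bic)
```

With a true AR(2) and a sample this large, both criteria should strongly prefer order 2 over order 1; the AIC's lighter penalty is what lets it occasionally pick a larger order.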

The number of parameters k to be included in the model is chosen as

   $\hat{k} = \arg\min_{k \le m} I_T(k)$

where m is some pre-specified maximum number of parameters.

Remark 1. The AIC is not consistent, in the sense that it might not choose the correct model asymptotically with probability one. It tends to overestimate the number of parameters to be included in the model. However, this does not mean that the method is useless: it minimizes the MSE for one-step-ahead forecasting.

Remark 2. The BIC and the HQ criteria are consistent, that is,

   $\lim_{T\to\infty} P(\hat{k} = k_0) = 1$   (9)

In fact, the consistency result (9) holds for any criterion of the type (8) with $\lim_{T\to\infty} C(T)/T = 0$ and $\lim_{T\to\infty} C(T) = \infty$.

OTHER MODEL SELECTION APPROACHES

General-to-specific criterion (Ng and Perron, 1995). This method amounts to:

1 Set the maximum number of parameters to be estimated, for instance an AR(k) model with k = 7.
2 Estimate the "general" model (an AR(7) in this case).
3 Test for the statistical significance of the coefficient associated with the highest-order lag, with either a t or an F test ($x_{t-7}$ in this case).
4 Exclude the nonsignificant parameters, re-estimate the smaller model and test again for significance. Repeat until all the remaining estimates are significantly different from zero.

Bootstrap model selection techniques are another alternative.
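The general-to-specific steps can be sketched as follows. Here the t statistic of the highest-order lag is approximated by $\sqrt{T}\,\hat{\phi}_{kk}$, the scaled lag-k sample partial autocorrelation from the Levinson-Durbin recursion; this is an asymptotic shortcut, not the exact regression t test from the slide, and all names are invented:

```python
import random

def acov(x, h):
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / n

def pacf(x, kmax):
    """Partial autocorrelations via the Levinson-Durbin recursion."""
    r = [acov(x, h) / acov(x, 0) for h in range(kmax + 1)]
    phi = [[0.0] * (kmax + 1) for _ in range(kmax + 1)]
    pac = [0.0] * (kmax + 1)
    phi[1][1] = pac[1] = r[1]
    for k in range(2, kmax + 1):
        num = r[k] - sum(phi[k - 1][j] * r[k - j] for j in range(1, k))
        den = 1 - sum(phi[k - 1][j] * r[j] for j in range(1, k))
        phi[k][k] = pac[k] = num / den
        for j in range(1, k):
            phi[k][j] = phi[k - 1][j] - phi[k][k] * phi[k - 1][k - j]
    return pac

random.seed(6)
T = 800
x = [0.0]
for _ in range(T):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))
x = x[1:]

# General-to-specific: start from k = 7 and drop the highest lag while its
# approximate t statistic sqrt(T) * pacf(k) is insignificant at the 5% level.
pac = pacf(x, 7)
k = 7
while k > 1 and abs(T ** 0.5 * pac[k]) < 1.96:
    k -= 1
print(k)
```

For a true AR(1) the loop should usually walk down to k = 1, though any single high-order lag can spuriously test significant about 5% of the time.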


DIAGNOSTIC CHECKING

Typically, the goodness of fit of a statistical model is judged by comparing the observed data with the values predicted by the estimated model. If the fitted model is appropriate, the residuals should behave in a similar way to the true innovations of the process.

Thus, we should plot the residuals of the estimated model to check whether they verify the main assumptions: zero mean, homoskedasticity (constant variance) and no significant correlations. If the graph shows a cyclic or trending component, or a non-constant variance, this can be interpreted as a sign of misspecification.

The autocorrelation function of the residuals should not display any significant correlations. Let $\{e_t\}$ be the sequence of residuals, given by

   $e_t = X_t - \hat{X}_t$

where $\hat{X}_t$ are the fitted values.

Then, the autocorrelation function is given by

   $\hat{\rho}_e(h) = \frac{\sum_{t=1}^{T-h} (e_t - \bar{e})(e_{t+h} - \bar{e})}{\sum_{t=1}^{T} (e_t - \bar{e})^2}, \quad h = 1, 2, \ldots$

Assume first that the true parameter values were known. Then $e_t = \varepsilon_t$. In this case, we know (see Chapter 1) that $\sqrt{T}\,\hat{\rho}_e \xrightarrow{d} N(0, I_H)$, where $\hat{\rho}_e = (\hat{\rho}_e(1), \ldots, \hat{\rho}_e(H))'$ and $I_H$ is the $H \times H$ identity matrix. Then the null hypothesis that the first H autocorrelations are not significant can be tested using the Box-Pierce Q statistic:

   $Q = T\sum_{i=1}^{H} \hat{\rho}_e^2(i) \xrightarrow{d} \chi^2_H$   (10)


However, in practice the values of the parameters are unknown and they
should be estimated. The fact that the parameters employed to compute et
are not the true ones but are estimates has an impact on the asymptotic
distribution.
More specifically, the asymptotic variance of the sample autocorrelations is
not the identity matrix anymore. Hence, in this case (10) no longer holds.
However, under certain assumptions it is still possible to use the sample
autocorrelations for diagnosis of the model. The following proposition
presents the distribution of the sample autocorrelations of the sample
residuals, when the parameters of the model are estimated.


PROPOSITION. Suppose that $Y_t = X_t'\beta + \varepsilon_t$, $t = 1, \ldots, T$, where $Y_t$ and $\varepsilon_t$ are scalar and $X_t$ and $\beta$ are $k \times 1$ vectors. If $\{Y_t, X_t\}$ are jointly stationary and ergodic, $E(X_t X_t')$ is a full rank matrix,

   $E(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, X_t, X_{t-1}, \ldots) = 0$

and

   $E(\varepsilon_t^2 \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, X_t, X_{t-1}, \ldots) = \sigma^2 > 0$

then

   $\sqrt{T}\,\hat{\rho}_e \xrightarrow{d} N(0, I_H - Q)$

where $q_{jk}$ (the (j, k) element of the $H \times H$ matrix Q) is given by

   $q_{jk} = E(X_t\varepsilon_{t-j})'\,\left[E(X_t X_t')\right]^{-1} E(X_t\varepsilon_{t-k}) / \sigma^2$

Proof: see Hayashi, p. 165.

Further, if $X_t$ is an ARMA(p, q) process and the model is correctly specified, it can be shown that the Box-Pierce Q statistic for diagnostic checking when the parameters are estimated is approximately distributed as

   $Q_e = T\sum_{i=1}^{H} \hat{\rho}_e^2(i) \sim \chi^2_{H-p-q}$

The adequacy of the model is rejected at level $\alpha$ if

   $Q_e = T\sum_{i=1}^{H} \hat{\rho}_e^2(i) > \chi^2_{1-\alpha}(H-p-q)$

Finally, Ljung and Box (1978) suggest replacing $Q_e$ by

   $\tilde{Q}_e = T(T+2)\sum_{i=1}^{H} \frac{\hat{\rho}_e^2(i)}{T-i}$

since they claim that this statistic offers a better approximation to the $\chi^2$ distribution.
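The Ljung-Box statistic is a few lines of code. A sketch (Python, names invented; remember that the degrees of freedom should be reduced by p + q when the series being tested consists of residuals from an estimated ARMA(p, q)):

```python
import random

def ljung_box(e, H):
    """Ljung-Box statistic: T(T+2) * sum_{i=1..H} rho_i^2 / (T - i)."""
    T = len(e)
    m = sum(e) / T
    d = sum((v - m) ** 2 for v in e)
    q = 0.0
    for i in range(1, H + 1):
        rho = sum((e[t] - m) * (e[t + i] - m) for t in range(T - i)) / d
        q += rho * rho / (T - i)
    return T * (T + 2) * q

# For i.i.d. data the statistic should look like an ordinary chi-square draw.
random.seed(7)
e = [random.gauss(0.0, 1.0) for _ in range(500)]
Q = ljung_box(e, 10)
print(Q)  # 18.31 is the 5% critical value of a chi-square with 10 df
```

Applied to genuine residuals, the comparison would instead use the chi-square critical value with H - p - q degrees of freedom.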


FORECASTING

An h-step-ahead forecast, $\hat{x}_{t+h|t}$, is designed to minimize expected loss, conditional on time-t information. Examples of widely used loss functions:

- MSE: $(x_{t+h} - \hat{x}_{t+h|t})^2$
- MAD: $|x_{t+h} - \hat{x}_{t+h|t}|$
- Quad-Quad: $\alpha_1 (x_{t+h} - \hat{x}_{t+h|t})^2 + \alpha_2\, I_{[x_{t+h} - \hat{x}_{t+h|t} < 0]}\, (x_{t+h} - \hat{x}_{t+h|t})^2$, asymmetric if $\alpha_1 \ne \alpha_2$

THE MSE OPTIMAL FORECAST IS THE CONDITIONAL MEAN

Let $x^*_{t+h} = E_t[x_{t+h}]$ and let $\hat{x}_{t+h}$ be any other value. Then

   $E_t\left[(x_{t+h} - \hat{x}_{t+h})^2\right] = E_t\left[\left((x_{t+h} - x^*_{t+h}) + (x^*_{t+h} - \hat{x}_{t+h})\right)^2\right]$
   $= E_t\left[(x_{t+h} - x^*_{t+h})^2 + 2(x_{t+h} - x^*_{t+h})(x^*_{t+h} - \hat{x}_{t+h}) + (x^*_{t+h} - \hat{x}_{t+h})^2\right]$
   $= V_t[x_{t+h}] + 2\,E_t\left[(x_{t+h} - x^*_{t+h})(x^*_{t+h} - \hat{x}_{t+h})\right] + E_t\left[(x^*_{t+h} - \hat{x}_{t+h})^2\right]$
   $= V_t[x_{t+h}] + 2\,(x^*_{t+h} - \hat{x}_{t+h})\,E_t\left[x_{t+h} - x^*_{t+h}\right] + (x^*_{t+h} - \hat{x}_{t+h})^2$
   $= V_t[x_{t+h}] + 2\,(x^*_{t+h} - \hat{x}_{t+h})\cdot 0 + (x^*_{t+h} - \hat{x}_{t+h})^2$
   $= V_t[x_{t+h}] + (x^*_{t+h} - \hat{x}_{t+h})^2$

which is minimized by setting $\hat{x}_{t+h} = x^*_{t+h}$: the conditional mean is the MSE-optimal forecast.

MSE optimal forecast for an AR(1):

   $x_t = \phi_1 x_{t-1} + \varepsilon_t$
   $E_t[x_{t+1}] = E_t[\phi_1 x_t + \varepsilon_{t+1}] = \phi_1 E_t[x_t] + E_t[\varepsilon_{t+1}] = \phi_1 x_t + 0$
   $E_t[x_{t+2}] = E_t[\phi_1 x_{t+1} + \varepsilon_{t+2}] = \phi_1 E_t[x_{t+1}] + E_t[\varepsilon_{t+2}] = \phi_1(\phi_1 x_t) + 0 = \phi_1^2 x_t$
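Iterating the conditional expectation gives $\hat{x}_{t+h|t} = \phi_1^h x_t$ for a zero-mean AR(1). A tiny sketch with illustrative values:

```python
phi = 0.8
x_t = 2.0   # last observed value (illustrative)

forecasts = []
f = x_t
for h in range(1, 6):
    f = phi * f          # E_t[x_{t+h}] = phi * E_t[x_{t+h-1}]
    forecasts.append(f)
print(forecasts)
```

The forecasts decay geometrically toward the unconditional mean of zero as the horizon grows.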

FORECAST EVALUATION

Two primary criteria to evaluate forecasts: objective and relative.

Objective: Mincer-Zarnowitz regressions

   $x_{t+h} = \alpha + \beta\,\hat{x}_{t+h|t} + \eta_t$
   $H_0: \alpha = 0, \beta = 1; \quad H_1: \alpha \ne 0 \text{ or } \beta \ne 1$

Use any test: Wald, LR, LM. The regression can be generalized to include any variable available when the forecast was produced:

   $x_{t+h} = \alpha + \beta\,\hat{x}_{t+h|t} + \gamma' z_t + \eta_t$
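A Mincer-Zarnowitz regression is just OLS of realizations on forecasts. A sketch with simulated data (Python, all names invented; here the "forecast" is constructed so that the null $\alpha = 0$, $\beta = 1$ actually holds):

```python
import random

random.seed(8)
# Hypothetical forecasts f_t of targets y_t; an unbiased, efficient forecast
# should give alpha close to 0 and beta close to 1 in y = alpha + beta*f + error.
f = [random.gauss(0.0, 1.0) for _ in range(1000)]
y = [ft + random.gauss(0.0, 0.5) for ft in f]   # forecast error independent of f

n = len(y)
fbar = sum(f) / n
ybar = sum(y) / n
sff = sum((v - fbar) ** 2 for v in f)
sfy = sum((a - fbar) * (b - ybar) for a, b in zip(f, y))
beta = sfy / sff
alpha = ybar - beta * fbar
print(alpha, beta)
```

A Wald test of the joint null would add standard errors on top of these point estimates.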

   $H_0: \alpha = 0, \beta = 1, \gamma = 0; \quad H_1: \alpha \ne 0 \text{ or } \beta \ne 1 \text{ or } \gamma_j \ne 0$

$z_t$ must be in the time-t information set. This is important when working with macro data.

RELATIVE EVALUATION: DIEBOLD-MARIANO

Two forecasts, $\hat{x}^A_{t+h|t}$ and $\hat{x}^B_{t+h|t}$, and two losses

   $l^A_t = (x_{t+h} - \hat{x}^A_{t+h|t})^2 \quad \text{and} \quad l^B_t = (x_{t+h} - \hat{x}^B_{t+h|t})^2$

The losses do not need to be MSE. If the two forecasts are equally good (or bad), $E[l^A_t] = E[l^B_t]$, or $E[l^A_t - l^B_t] = 0$. Define $\delta_t = l^A_t - l^B_t$.

Implemented as a t-test of $E[\delta_t] = 0$:

   $H_0: E[\delta_t] = 0; \quad H^A_1: E[\delta_t] < 0; \quad H^B_1: E[\delta_t] > 0$

The alternative is composite, and the sign indicates which model is favored.

   $DM = \frac{\bar{\delta}}{\sqrt{\hat{V}[\bar{\delta}]}}$

One complication: $\{\delta_t\}$ cannot be assumed to be uncorrelated, so a more complicated variance estimator is required. Newey-West covariance estimator:

   $\hat{\sigma}^2 = \hat{\gamma}_0 + 2\sum_{l=1}^{L}\left(1 - \frac{l}{L+1}\right)\hat{\gamma}_l$

IMPLEMENTING A DIEBOLD-MARIANO TEST

   $DM = \frac{\bar{\delta}}{\sqrt{\hat{V}[\bar{\delta}]}}$

Algorithm (Diebold-Mariano Test)
1 Using the two forecasts, $\hat{x}^A_{t+h|t}$ and $\hat{x}^B_{t+h|t}$, compute $\delta_t = l^A_t - l^B_t$.
2 Run the regression $\delta_t = \mu + \eta_t$.
3 Use a Newey-West covariance estimator (olsnw in Matlab).
4 t-test of $H_0: \mu = 0$ against $H^A_1: \mu < 0$ and $H^B_1: \mu > 0$.
5 Reject if $|t| > C_\alpha$, where $C_\alpha$ is the critical value for a two-sided test using a normal distribution with size $\alpha$. If significant, reject in favor of model A if the test statistic is negative, or in favor of model B if the test statistic is positive.
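The algorithm above can be sketched in a few lines; the long-run variance uses the Newey-West estimator from the previous slide (Python, all names invented, L = 10 lags chosen arbitrarily):

```python
import math
import random

def diebold_mariano(lA, lB, L=10):
    """DM statistic: t-test that the mean loss differential is zero, with a
    Newey-West (Bartlett kernel) long-run variance using L lags."""
    d = [a - b for a, b in zip(lA, lB)]
    n = len(d)
    dbar = sum(d) / n
    u = [v - dbar for v in d]
    gamma0 = sum(v * v for v in u) / n
    lrv = gamma0
    for l in range(1, L + 1):
        gl = sum(u[t] * u[t - l] for t in range(l, n)) / n
        lrv += 2 * (1 - l / (L + 1)) * gl
    return dbar / math.sqrt(lrv / n)

random.seed(9)
y = [random.gauss(0.0, 1.0) for _ in range(500)]
# Forecast A: the optimal forecast (0 for white noise); forecast B: a biased one.
lA = [(yt - 0.0) ** 2 for yt in y]
lB = [(yt - 0.3) ** 2 for yt in y]
dm = diebold_mariano(lA, lB)
print(dm)  # a negative value favors model A
```

Since model A has lower expected loss by construction, the statistic should come out clearly negative here.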


MODEL IDENTIFICATION IN THE GENERAL CASE

Identifying a model for $X_t$ refers to the methodology for selecting
- the appropriate transformations for obtaining stationarity (such as variance-stabilizing transformations and differencing),
- values for p, q and d (the integration order),
- whether a deterministic component should or should not be included in the model.

The following steps should be followed to identify the model:

Step 1. Plot the data and choose the proper transformations.
The plot usually shows whether the data contain a trend, a seasonal component, outliers, non-constant variance, etc., and hence may suggest some appropriate transformations. The most commonly used are variance-stabilizing transformations, typically a logarithmic transformation or, more generally, a Box-Cox transformation, and/or differencing. Always apply the variance-stabilizing transformation before taking any differences.

Step 2. Compute and examine the sample ACF and sample PACF of the (variance-stabilized) process.
If these functions decay very slowly, this is a sign of non-stationarity. Unit root tests should also be used at this stage. Typically, the number of differences will be 0, 1 or 2.

O UTLINE I NTRODUCTION E STIMATION

A SYMPTOTIC PROPERTIES I DENTIFICATION D IAGNOSTIC CHECKING F ORECASTING M ODEL IDE

Step 3. Compute the ACF and PACF of the transformed (stationary) variable to
identify p and q.
Identifying the order of a pure AR or MA polynomial from these functions is, in
theory, easy with the table above. However, it is more difficult to identify the
orders of a mixed ARMA process. In these cases, other model selection
mechanisms, such as information criteria, can be used.
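When the ACF/PACF are inconclusive, a small AIC grid search is a common fallback; a sketch on hypothetical simulated ARMA(1,1) data:

```r
# Fit all small (p, q) combinations and keep the model with the lowest AIC
set.seed(1)
y <- arima.sim(list(ar = 0.5, ma = 0.3), n = 500)   # hypothetical ARMA(1,1) sample
best.aic   <- Inf
best.order <- c(0, 0, 0)
for (p in 0:2) for (q in 0:2) {
  fit <- try(arima(y, order = c(p, 0, q)), silent = TRUE)
  if (!inherits(fit, "try-error") && fit$aic < best.aic) {
    best.aic   <- fit$aic
    best.order <- c(p, 0, q)
  }
}
best.order   # selected (p, d, q)
```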
Step 4. Test the deterministic trend term when d > 0.
If d > 0 and a trend in the data is not suspected, a deterministic trend term
should not be included. However, if there is reason to believe that a trend is
present, one can include the term in the model and then discard it if its
coefficient is not significant.
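For Step 4 in R, note that arima() drops the constant whenever d > 0; one common workaround (a sketch on hypothetical data, not from these slides) is to pass a deterministic trend through xreg and check its significance:

```r
# Hypothetical random walk with drift; with d = 1 the constant is omitted,
# so the deterministic trend enters through xreg
set.seed(1)
y     <- cumsum(rnorm(200, mean = 0.2))
drift <- seq_along(y)
fit   <- arima(y, order = c(0, 1, 0), xreg = drift)
# rough t-ratio for the trend coefficient; discard the term if insignificant
fit$coef[1] / sqrt(fit$var.coef[1, 1])
```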
For some interesting examples, see Wei, Chapter 6, and Brockwell and
Davis, Chapter 9.


ESTIMATION

Once the appropriate transformations (basically, variance-stabilizing
transformations and/or differencing) have been performed, the resulting
process is stationary.
At this stage, the estimation techniques developed for the stationary ARMA
case apply directly to the transformed process.

CODE

The time series functions used below (acf, pacf, arima.sim, arima) ship with
R's built-in stats package, so no extra download is needed.

A first step in analysing time series is to examine the autocorrelations (ACF)
and partial autocorrelations (PACF). R provides the functions acf() and
pacf() for computing and plotting the ACF and PACF:
sim.ar <- arima.sim(list(ar=c(0.4,0.4)), n=1000)
sim.ma <- arima.sim(list(ma=c(0.6,-0.4)), n=1000)
par(mfrow=c(2,2))
acf(sim.ar, main="ACF of AR(2) process")
acf(sim.ma, main="ACF of MA(2) process")
pacf(sim.ar, main="PACF of AR(2) process")
pacf(sim.ma, main="PACF of MA(2) process")
The function arima(data, order = c(p, d, q)) can be used to estimate the
parameters:
fit <- arima(sim.ar, order=c(2,0,0))
fit is a list containing the coefficients (fit$coef), the residuals (fit$residuals)
and the Akaike Information Criterion (fit$aic)
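arima() also returns the estimated covariance matrix of the coefficients in fit$var.coef, so approximate t-ratios can be computed directly (a sketch reusing the simulated AR(2) series):

```r
# Approximate t-ratios: coefficient estimates over asymptotic standard errors
set.seed(1)
sim.ar <- arima.sim(list(ar = c(0.4, 0.4)), n = 1000)
fit <- arima(sim.ar, order = c(2, 0, 0))
fit$coef / sqrt(diag(fit$var.coef))   # ar1, ar2 and the intercept
```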

Diagnostic Checking
A first step in diagnostic checking of fitted models is to analyse the residuals
from the fit for any signs of non-randomness. R has the function tsdiag(),
which produces a diagnostic plot of a fitted time series model:
tsdiag(fit)
It produces a plot of the standardized residuals, the autocorrelation function
of the residuals, and the p-values of the Ljung-Box statistic for the first 10 lags.
The function Box.test() computes the test statistic for a given lag:
Box.test(fit$residuals, lag=1)
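Box.test() computes the Box-Pierce statistic by default; the Ljung-Box variant, with a degrees-of-freedom correction for the p + q estimated coefficients, is often preferred (sketch reusing the simulated AR(2) fit):

```r
# Ljung-Box test at lag 10; fitdf = 2 accounts for the two estimated AR coefficients
set.seed(1)
sim.ar <- arima.sim(list(ar = c(0.4, 0.4)), n = 1000)
fit <- arima(sim.ar, order = c(2, 0, 0))
Box.test(fit$residuals, lag = 10, type = "Ljung-Box", fitdf = 2)
```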


Prediction of ARMA Models
predict() can be used for predicting future values of the levels under the
model:
AR.pred <- predict(fit, n.ahead=8)
AR.pred is a list containing two entries: the predicted values AR.pred$pred
and the standard errors of the prediction AR.pred$se. Using the rule of thumb
for an approximate 95% prediction interval, prediction ± 2·SE, one can plot
the AR data, the predicted values and the approximate interval:
plot(sim.ar)
lines(AR.pred$pred, col="red")
lines(AR.pred$pred+2*AR.pred$se, col="red", lty=3)
lines(AR.pred$pred-2*AR.pred$se, col="red", lty=3)
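Forecast accuracy can also be checked on a holdout sample; a sketch (the 950/50 split is arbitrary) scoring the 1-to-50-step-ahead forecasts by MSE:

```r
# Fit on the first 950 observations, evaluate the last 50 by mean squared error
set.seed(1)
y   <- arima.sim(list(ar = c(0.4, 0.4)), n = 1000)
fit <- arima(y[1:950], order = c(2, 0, 0))
fc  <- predict(fit, n.ahead = 50)$pred
mean((y[951:1000] - fc)^2)   # out-of-sample MSE
```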

REFERENCES

Brockwell, P. J. and Davis, R. A. (1991), Time Series: Theory and Methods, Chapters 8, 9.
Hamilton, J. D. (1994), Time Series Analysis, Chapter 5.
Wei, W. W. S., Time Series Analysis: Univariate and Multivariate Methods, Chapters 6, 7.
