ECONOMETRIC

Regime–Switching Models
Hans-Martin Krolzig

Department of Economics and Nuffield College,
University of Oxford.
hans-martin.krolzig@nuffield.oxford.ac.uk
Hilary Term 2002
The course offers an introduction to regime-switching models, covering their theoretical prop-
erties and the statistical tools for empirical research (including maximum likelihood estima-
tion, model evaluation, model selection and forecasting). With the Markov-switching vector
autoregressive model, it presents a systematic and operational approach to the econometric
modelling of time series subject to shifts in regime. The theory will be linked to empirical
studies of the business cycle, using MSVAR for OX.
Course structure
(1) Introduction
(2) Types of regime-switching models
(Assumptions, properties and estimation)
• Structural change and switching regression models
• Threshold models
• Smooth transition autoregressive models
• Markov-switching vector autoregressions
(3) Assessing business cycles with regime-switching models
(Markov-switching VECM of the UK labour market)
(4) Prediction and structural analysis with regime-switching models
Basic literature
• Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time
series and the business cycle, Econometrica, 57, 357–384.
◦ Hamilton, J.D. (1994). Time Series Analysis. Princeton: Princeton University Press.
Chapter 22.
• Hansen, B. (1999), Testing for Linearity, Journal of Economic Surveys, 13, 551–576.
• Krolzig, H.-M., Marcellino, M. and G. E. Mizon, A Markov–Switching Vector Equilib-
rium Correction Model of the UK Labour Market, Empirical Economics, forthcoming.
◦ Potter, S. (1999), Nonlinear time series modelling: An introduction, Journal of Eco-
nomic Surveys, 13, 505–528.
• Teräsvirta, T. (1994). Specification, estimation, and evaluation of smooth transition
autoregressive models, Journal of the American Statistical Association, 89, 208–218.
Monographs
◦ Franses, H.P. and D. van Dijk (2000). Nonlinear Time Series Models in Empirical Fin-
ance, Cambridge: Cambridge University Press.
◦ Granger, C.W.J. and T. Teräsvirta (1993). Modelling Nonlinear Economic Relation-
ships, Oxford, Oxford University Press.
◦ Kim, C.J. and C.R. Nelson (1999). State-Space Models with Regime Switching, Cam-
bridge, MA: MIT Press.
◦ Krolzig, H.-M. (1997). ‘Markov-Switching Vector Autoregressions. Modelling, Statist-
ical Inference and Application to Business Cycle Analysis’, Lecture Notes in Economics
and Mathematical Systems, Volume 454, Berlin: Springer.
1 Introduction
1.1 Linear time series models
Since Sims' (1980) critique of traditional macroeconometric modelling, vector autoregressive
(VAR) models have been widely used in macroeconometrics. Their popularity is due to the
flexibility of the VAR framework and the ease of producing macroeconomic models with useful
descriptive characteristics, within which statistical tests of economically meaningful hypotheses
can be carried out. Over the last two decades VARs have been applied to numerous macroeconomic
data sets, providing an adequate fit of the data and fruitful insight into the interrelations between
economic data.
In the vector autoregressive model, the $K$-dimensional time series vector $y_t = (y_{1t}, \ldots, y_{Kt})'$
is generated by a vector autoregressive process of order $p$
$$ y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + \varepsilon_t \tag{1} $$
where $t = 1, \ldots, T$, $\nu$ is a vector of intercepts and the $A_i$ are coefficient matrices. The error
process $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{Kt})'$ is an unobservable, usually Gaussian, zero-mean white noise
process,
$$ \varepsilon_t \sim \mathrm{WN}(0, \Sigma), $$
that is, $E[\varepsilon_t] = 0$, $E[\varepsilon_t \varepsilon_t'] = \Sigma$, and $E[\varepsilon_t \varepsilon_s'] = 0$ for $s \neq t$, where the variance-covariance
matrix $\Sigma$ is time-invariant, positive-definite and non-singular.
The errors are such that the innovations can be interpreted as the one-step prediction errors of
the system
$$ \varepsilon_t = y_t - E[y_t | Y_{t-1}], $$
while the expectation of $y_t$ conditional on the information set $Y_{t-1} = (y_{t-1}, y_{t-2}, \ldots, y_{1-p})$
is given by the vector autoregression:
$$ E[y_t | Y_{t-1}] = \nu + \sum_{j=1}^{p} A_j y_{t-j}. $$
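The conditional expectation and the one-step prediction error can be illustrated with a minimal sketch in plain Python. The function name `var_predict` and all numerical values are hypothetical and chosen only for illustration; they do not come from the course material.

```python
def var_predict(nu, A, history):
    """One-step prediction E[y_t | Y_{t-1}] = nu + sum_j A_j y_{t-j}.

    nu      : list of K intercepts
    A       : list of p coefficient matrices (K x K, nested lists)
    history : [y_{t-1}, y_{t-2}, ..., y_{t-p}], most recent first
    """
    K = len(nu)
    pred = list(nu)
    for A_j, y_lag in zip(A, history):
        for i in range(K):
            pred[i] += sum(A_j[i][k] * y_lag[k] for k in range(K))
    return pred


# Hypothetical bivariate VAR(1); the numbers are illustrative only.
nu = [1.0, 0.5]
A1 = [[0.5, 0.1],
      [0.0, 0.3]]
y_prev = [2.0, 1.0]   # y_{t-1}
y_now = [2.3, 0.6]    # realised y_t

y_hat = var_predict(nu, [A1], [y_prev])        # conditional expectation
eps = [a - b for a, b in zip(y_now, y_hat)]    # one-step prediction error
```

Here `y_hat` is the conditional mean implied by (1) and `eps` is the corresponding innovation.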
Although macroeconomic fluctuations and growth have in the past largely been investigated
using linear time series models, it is now increasingly recognized that the implications of the
linear models
• linearity (invariance of dynamic multipliers with regard to the history of the system, size
and sign of the shocks)
• time-invariance of parameters
• Gaussianity
are problematic and that a better understanding requires new econometric tools. Consequently
there has been a great deal of interest in the modelling of non-linearities in economic time
series.
1.2 Regime-switching models
While the importance of regime shifts seems to be generally accepted, there is no established
theory suggesting a unique approach for specifying econometric models that embed changes
in regime. Increasingly, regime shifts are not considered as singular deterministic events, but
the unobservable regime is assumed to be governed by an exogenous stochastic process. Thus
regime shifts of the past are expected to occur in the future in a similar fashion.
When a time series is subject to regime shifts, the parameters of the statistical model will
be time-varying. The basic idea of regime-switching models is that the process is time-
invariant conditional on a regime variable st indicating the regime prevailing at time t. Regime-
switching models characterize a non-linear data generating process as piecewise linear by re-
stricting the process to be linear in each regime, where the regime might be unobservable, and
only a discrete number of regimes are feasible. The models within this class differ in their
assumptions concerning the stochastic process generating the regime.
The primary objective of regime-switching models is to provide a systematic econometric ap-
proach for the statistical analysis of multiple time series when the mechanism which generated
the data is subject to regime shifts:
(i) extracting the information in the data about regime shifts in the past,
(ii) estimating the parameters of the model consistently and efficiently,
(iii) detecting recent regime shifts,
(iv) correcting the vector autoregressive model at times when the regime alters,
(v) incorporating the probability of future regime shifts into forecasts.
The regime-switching models studied here represent a very general class which encompasses some
alternative non-linear and time-varying models. In general, the models generate conditional
heteroscedasticity and non-normality; prediction intervals are asymmetric and reflect the
prevailing uncertainty about the regime.
We will investigate the issues of detecting multiple breaks in multiple time series, modelling,
specification, estimation, testing and forecasting. En route, we discuss the relation to
alternative non-linear models and models with time-varying parameters. In the course of this study we
will also propose new directions to generalize the MS-VAR model. Although some methodological
and technical ideas are discussed in detail, the focus is on modelling, specification and
estimation of suitable models.
1.2.1 Regime shifts
Characteristics
finite number — infinite number
deterministic — stochastic
single event — reoccurring within sample — reoccurring out of sample
observable — observable if DGP is known — unobservable even if DGP is known
(strongly) exogenous — endogenous
permanent — persistent — transitory
predictable — unpredictable
common — interrelated — independent
Granger causal — Granger noncausal
Implications
nonlinearity
time-varying parameters
non-Gaussianity
1.2.2 The Conditional Process
The statistical model of $y_t$ is defined conditional upon the regime $s_t \in \{1, \ldots, M\}$:
$$ p(y_t | Y_{t-1}, X_t, s_t) = \begin{cases} f(y_t | Y_{t-1}, X_t, \theta_1) & \text{if } s_t = 1 \\ \quad \vdots & \\ f(y_t | Y_{t-1}, X_t, \theta_M) & \text{if } s_t = M, \end{cases} $$
where $p(y_t | Y_{t-1}, X_t, s_t)$ is the probability density function of the vector of endogenous
variables $y_t = (y_{1t}, \ldots, y_{Kt})'$ conditional upon the history of the process, $Y_{t-1} = \{y_{t-i}\}_{i=1}^{\infty}$,
some (strongly) exogenous variables $X_t = \{x_{t-i}\}_{i=0}^{\infty}$ and the regime variable $s_t$. $\theta_m$ is the
parameter vector present in regime $m$.
It is usually assumed that the statistical model is linear in each regime, say $s_t = m$. In the
following we focus on autoregressive processes
$$ y_t = \nu_m + \alpha_{m1} y_{t-1} + \ldots + \alpha_{mp} y_{t-p} + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{IID}(0, \sigma_m^2), $$
and their multivariate generalization: the vector autoregressive (VAR) process
$$ y_t = \nu_m + A_{m1} y_{t-1} + \ldots + A_{mp} y_{t-p} + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{IID}(0, \Sigma_m). $$
1.2.3 The Regime Generating Process
If the stochastic process of $y_t$ is defined conditionally upon the (unobservable) regime $s_t$,
a complete description of the data generating mechanism requires the specification of the
stochastic process which generates the regime:
$$ \Pr(s_t | Y_{t-1}, S_{t-1}, X_t; \rho) $$
where the history $S_{t-1} = \{s_{t-j}\}_{j=1}^{\infty}$ of the state variable might be unobserved but will be
“reconstructed” from the observations and the vector $\rho$ collects the parameters of the regime
generating process.
2 Types of regime-switching models
2.1 Structural change and switching regression models
2.1.1 Structural break models
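Structural break at time $t = \tau$:
$$ y_t = \begin{cases} \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} + \varepsilon_t & \text{for } t < \tau \\ \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} + \varepsilon_t & \text{for } t \geq \tau \end{cases} \tag{2} $$
where $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$. By using the indicator function $I(t; \tau)$, defined consistently with (2) as
$$ I(t; \tau) = \begin{cases} 1 & \text{for } t \geq \tau \\ 0 & \text{for } t < \tau, \end{cases} $$
the DGP can be rewritten as
$$ y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - I(t; \tau)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) I(t; \tau) + \varepsilon_t. $$
Two different assumptions regarding the information structure
• τ is known: break is deterministic
• τ is unknown: break is stochastic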
2.1.2 Switching regression model
Closely related to the structural change model is the switching regression model, where the
regime shifts are driven by an observable regime variable $s_t$:
$$ y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - I(s_t = 2)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) I(s_t = 2) + \varepsilon_t. \tag{3} $$
2.1.3 Maximum likelihood estimation under normality
Structural break at time $t = \tau$:
$$ y_t = \begin{cases} \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} + \varepsilon_t & \text{for } t < \tau \\ \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} + \varepsilon_t & \text{for } t \geq \tau \end{cases} $$
where $\varepsilon_t \sim \mathrm{NID}(0, \sigma^2)$.
Two different assumptions regarding the information structure
• τ is known: break is deterministic
– Estimation: Split sample and OLS for each regime;
– Test of $\beta_1 = \beta_2$ has standard asymptotics, where $\beta_m = (\nu_m, \alpha_{m1}, \ldots, \alpha_{mp})'$;
– The same technique can be used for switching regression models.
• τ is unknown: break is stochastic
– Grid search for $\tau \in [0.15, 0.85]T$:
$$ \tau^* = \arg\min_{\tau} RSS(\tau) = \arg\min_{\tau} \left[ \tau \hat\sigma_1^2(\tau) + (1 - \tau) \hat\sigma_2^2(\tau) \right], $$
where τ in the weights is measured as a fraction of the sample size T;
– Test of $\beta_1 = \beta_2$ has non-standard asymptotics as τ becomes a nuisance parameter.
– See, inter alia, Andrews (1993), Andrews and Ploberger (1994), and Banerjee,
Lazarova and Urga (1998).
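The grid search for an unknown break date can be sketched as follows. This is a minimal illustration in plain Python under a simplifying assumption that is not in the notes: the AR order is set to p = 0, so that OLS in each regime reduces to the sample mean. The function names and the artificial data are hypothetical.

```python
def rss_about_mean(segment):
    """Residual sum of squares from regressing a segment on a constant."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def estimate_break_date(y, trim=0.15):
    """Grid search for the break date tau (1-based; regime 2 holds for t >= tau).

    Simplifying assumption: p = 0, so OLS in each regime is the sample mean.
    Candidate dates are restricted to the central part of the sample.
    """
    T = len(y)
    lo, hi = int(trim * T), int((1 - trim) * T)
    best_tau, best_rss = None, float("inf")
    for tau in range(lo + 1, hi + 1):
        # y[:tau - 1] holds t = 1, ..., tau - 1; y[tau - 1:] holds t = tau, ..., T
        rss = rss_about_mean(y[:tau - 1]) + rss_about_mean(y[tau - 1:])
        if rss < best_rss:
            best_tau, best_rss = tau, rss
    return best_tau, best_rss


# Artificial data: mean 0 for t < 13 and mean 2 for t >= 13, plus a small
# deterministic wiggle standing in for noise.
y = ([0.1 * (-1) ** t for t in range(1, 13)]
     + [2.0 + 0.1 * (-1) ** t for t in range(13, 21)])
tau_hat, rss_hat = estimate_break_date(y)
```

With AR terms (p > 0), the two calls to `rss_about_mean` would be replaced by regime-wise OLS regressions; the search over τ is otherwise unchanged.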
2.2 Threshold models
2.2.1 The TAR model
In the threshold autoregressive model, the regime shifts are triggered by an observable,
exogenous transition variable $x_t$ crossing the threshold $c$:
$$ y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - I(x_t; c)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) I(x_t; c) + \varepsilon_t \tag{4} $$
where $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$. The indicator function $I(x_t; c)$ is of the type
$$ I(x_t; c) = \begin{cases} 1 & \text{if } g(x_t) > c \\ 0 & \text{if } g(x_t) \leq c. \end{cases} $$
For $x_t = t$, a model with a structural break at time $t = c$ results.
2.2.2 The SETAR model
If the transition variable is a lagged endogenous variable $y_{t-d}$ with delay $d > 0$, the
self-exciting threshold autoregressive model results:
$$ y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - I(y_{t-d}; c)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) I(y_{t-d}; c) + \varepsilon_t \tag{5} $$
where $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$ and $c$ is again the threshold.
Note that the model can be written as:
$$ y_t = \nu(s_t) + \sum_{i=1}^{p} \alpha_i(s_t) y_{t-i} + \varepsilon_t $$
where for a given but unknown threshold $c$, the ‘probability’ of the unobservable regime, say
$s_t = 2$, is given by
$$ \Pr(s_t = 2 | S_{t-1}, Y_{t-1}) = I(y_{t-d}; c) = \begin{cases} 1 & \text{if } g(y_{t-d}) > c \\ 0 & \text{if } g(y_{t-d}) \leq c. \end{cases} $$
Thus in the self-exciting threshold autoregressive (SETAR) model, the regime-generating
process is not assumed to be exogenous but directly linked to the lagged endogenous variable
$y_{t-d}$. While the presumptions of the SETAR and the MS-AR model seem to be quite different,
the relation between both model alternatives is rather close. Indeed, SETAR and MS-VAR
models can be observationally equivalent, as illustrated in Carrasco (1994).
SETAR Models of US GNP of Tiao and Tsay (1994) and Potter (1993)
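Quarterly growth rate of U.S. GNP, $\Delta y_t$:
$$ \Delta y_t = \mu(s_t) + \sum_{i=1}^{5} \alpha_i(s_t) \Delta y_{t-i} + u_t, \quad u_t \sim \mathrm{IID}(0, \sigma^2(s_t)) $$
2-regime SETAR with $d = 2$.
Empirical models:
• Threshold $r \approx 0$:
$$ s_t = \begin{cases} 1 & \text{if } \Delta y_{t-2} > r \\ 2 & \text{if } \Delta y_{t-2} \leq r \end{cases} $$
• Moving swiftly out of recessions: $\alpha_2^{(L)} \ll 0$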
2.2.3 Maximum likelihood estimation under normality
(i) For given delay $d$ and threshold $c$:
• Sample split according to $I(y_{t-d}; c)$.
• OLS regression for each regime separately:
$$ \hat\beta_m = (X_m' X_m)^{-1} X_m' y_m $$
$$ \hat e_m = \left( I - X_m (X_m' X_m)^{-1} X_m' \right) y_m $$
$$ \hat\sigma_m^2 = T_m^{-1} \hat e_m' \hat e_m $$
where $X_m$ and $y_m$ collect the observations from regime $m$, i.e. those observations
at time $t$ with $s_t = m$. $T_m$ is the number of observations in regime $m$.
• Alternatively, indicator functions can be used in a single regression, constraining the
residual error variance to be constant across regimes (see, for example, Potter,
1993, p. 113).
(ii) Grid search over $d$ and $c$: select the pair $(c, d)$ that minimizes the overall residual sum
of squares (RSS)
$$ (c, d)^* = \arg\min_{(c,d)} RSS(c, d) = \arg\min_{(c,d)} \sum_{m=1}^{M} T_m \hat\sigma_m^2 $$
Usually the search over $c$ (given $d$) is restricted such that
$$ \min_m T_m \geq 0.15\,T. $$
(iii) When $p$ is unknown, fit is usually traded against parsimony. A search is made over all
values of $p \leq p_{max}$, and the preferred order is often taken to be that which minimizes
AIC:
$$ p^* = \arg\min_{p} \left\{ AIC(p) = \sum_{m=1}^{M} T_m \ln \hat\sigma_m^2 + 2(p + 1) \right\}. $$
Tsay (1989) describes a specification procedure for threshold models.
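Steps (i) and (ii) can be sketched in plain Python as follows. This is a minimal illustration, not the course's MSVAR/Ox implementation: it assumes a 2-regime SETAR with p = 1 so that each regime-wise OLS has a closed form, and the function names, the simulated data and its parameter values are all hypothetical.

```python
import random


def ols_rss(x, y):
    """RSS of the simple regression y = nu + alpha * x + e (closed-form OLS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    alpha = sxy / sxx if sxx > 0 else 0.0
    nu = my - alpha * mx
    return sum((b - nu - alpha * a) ** 2 for a, b in zip(x, y))


def setar_grid_search(y, d_max=2, trim=0.15):
    """Select (c, d) minimising the total RSS of a 2-regime SETAR with p = 1."""
    start = d_max                          # common estimation sample for all d
    T = len(y) - start
    x = [y[t - 1] for t in range(start, len(y))]      # AR(1) regressor
    yy = y[start:]
    best = None
    for d in range(1, d_max + 1):
        z = [y[t - d] for t in range(start, len(y))]  # transition variable
        for c in sorted(set(z)):                      # candidate thresholds
            lower = [i for i in range(T) if z[i] <= c]
            upper = [i for i in range(T) if z[i] > c]
            if min(len(lower), len(upper)) < trim * T:    # min T_m >= 0.15 T
                continue
            rss = (ols_rss([x[i] for i in lower], [yy[i] for i in lower])
                   + ols_rss([x[i] for i in upper], [yy[i] for i in upper]))
            if best is None or rss < best[0]:
                best = (rss, c, d, len(lower), len(upper))
    return best


# Simulated SETAR(2;1,1) data with threshold 0 and delay 1 (illustrative values).
random.seed(0)
y = [0.0, 0.0]
for _ in range(200):
    nu = 1.0 if y[-1] <= 0.0 else -1.0
    y.append(nu + 0.5 * y[-1] + random.gauss(0.0, 0.5))

rss, c_hat, d_hat, T_lower, T_upper = setar_grid_search(y)
```

Using the observed values of the transition variable as the threshold grid is a common choice, since the RSS only changes when the threshold crosses an observation.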
Figure 1 Linear AR model of US GNP growth. [Plot omitted. Panels: actual and fitted values from an AR(3), 1948:1 - 1990:4; actual and fitted values from an AR(2), 1959:4 - 1996:2.]
Figure 2 SETAR model of US GNP growth. [Plot omitted. Panels: actual and fitted values from a SETAR(2;2,2), 1947:4 - 1990:4; actual and fitted values from a SETAR(2;2,2), 1959:4 - 1996:2.]
2.3 Smooth transition autoregressive models
2.3.1 The STAR model
In the smooth transition autoregressive model popularized by Granger and Teräsvirta (1993),
the weight attached to the regimes depends on the realization of exogenous or lagged
endogenous variables $z_t$:
$$ \Pr(s_t = 2 | S_{t-1}, Y_{t-1}, X_t) = G(z_t; \gamma, c), $$
where the transition function $G(z_t; \gamma, c)$ is a continuous function determining the weight of
regime 2, and usually bounded between 0 and 1.
The STAR model is closely associated with the work of Teräsvirta (1994, 1998):
$$ y_t = \left( \nu_1 + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right) (1 - G(z_t; \gamma, c)) + \left( \nu_2 + \sum_{i=1}^{p} \alpha_{2i} y_{t-i} \right) G(z_t; \gamma, c) + \varepsilon_t \tag{6} $$
where $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$.
The transition variable $z_t$ can be a lagged endogenous variable ($z_t = y_{t-d}$ for $d > 0$),
an exogenous variable ($z_t = x_t$), or a function of some lagged endogenous and exogenous
variables: $z_t = g(y_{t-d}, x_t)$. For $z_t = t$ a model with smoothly changing parameters results
(see Lin and Teräsvirta, 1994). Here $c$ is the threshold and $\gamma$ is the smoothness parameter.
The STAR model (6) exhibits two regimes
• associated with the extreme values of the transition function: G (zt ; γ, c) = 1 and
G (zt ; γ, c) = 0;
• transition from one regime to the other is gradual;
• the regime occurring at time t is observable (for given zt ; γ, c) and can be determined
by G (zt ; γ, c).
For multiple-regime STAR models: see van Dijk (1999).

Choices for the transition function $G(z_t; \gamma, c)$:
• logistic cumulative distribution function (LSTAR): different behavior for positive versus
negative values of $z_t$ relative to $c$
$$ G(z_t; \gamma, c) = \frac{1}{1 + \exp\{-\gamma (z_t - c)\}}. $$
For $\gamma \to \infty$: LSTAR → SETAR:
$$ G(z_t; \gamma, c) = I(z_t > c); $$
For $\gamma \to 0$: LSTAR → linear AR
$$ G(z_t; \gamma, c) = 0.5. $$
• exponential function (ESTAR): different behavior for small versus large deviations of $z_t$
from the threshold $c$:
$$ G(z_t; \gamma, c) = 1 - \exp\{-\gamma (z_t - c)^2\}. $$
For $\gamma \to \infty$ and $\gamma \to 0$: ESTAR → linear AR, since the transition function becomes
constant: $G(z_t; \gamma, c) \to 0$ for $\gamma \to 0$, and $G(z_t; \gamma, c) \to 1$ for $\gamma \to \infty$ (for all $z_t \neq c$).
• quadratic logistic function:
$$ G(z_t; \gamma, c) = \frac{1}{1 + \exp\{-\gamma (z_t - c_1)(z_t - c_2)\}}. $$
For $\gamma \to \infty$: quadratic LSTAR → 3-regime SETAR:
$$ G(z_t; \gamma, c) = 1 - I(c_1 < z_t < c_2); $$
For $\gamma \to 0$: quadratic LSTAR → linear AR
$$ G(z_t; \gamma, c) = 0.5. $$
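The three transition functions and their limiting behaviour can be checked numerically with a short sketch. The function names are hypothetical; the formulas are those given above. Note that `math.exp` overflows for very large arguments, so the example uses a moderately large γ to mimic the limit γ → ∞.

```python
import math


def logistic(z, gamma, c):
    """LSTAR transition function."""
    return 1.0 / (1.0 + math.exp(-gamma * (z - c)))


def exponential(z, gamma, c):
    """ESTAR transition function."""
    return 1.0 - math.exp(-gamma * (z - c) ** 2)


def quadratic_logistic(z, gamma, c1, c2):
    """Quadratic logistic transition function with thresholds c1 < c2."""
    return 1.0 / (1.0 + math.exp(-gamma * (z - c1) * (z - c2)))


# Large gamma: the logistic function approaches the indicator I(z > c).
near_one = logistic(1.0, 50.0, 0.0)
near_zero = logistic(-1.0, 50.0, 0.0)

# Small gamma: the logistic function is close to 0.5 for every z.
flat = logistic(1.0, 1e-9, 0.0)

# Large gamma: the quadratic logistic is close to 0 inside (c1, c2), 1 outside.
inside = quadratic_logistic(0.5, 50.0, 0.0, 1.0)
outside = quadratic_logistic(2.0, 50.0, 0.0, 1.0)
```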
Properties of STAR models
• Little is known about the conditions under which STAR models are stationary;
• Stationarity has to be evaluated by numerical procedures;
• Even under stationarity: Rich variability of the implied dynamics
– unique equilibrium
– multiple equilibria
– limit cycles
– strange attractors (chaos)
STAR models of US Industrial Production
Teräsvirta and Anderson (1992): 2-regime LSTAR model of the annual growth rate of US
Industrial Production (quarterly data from 1961-1986):
$$ \Delta_4 y_t = \left( \nu_1 + \sum_{i=1}^{9} \alpha_{1i} \Delta_4 y_{t-i} \right) (1 - G(\cdot)) + \left( \nu_2 + \sum_{i=1}^{9} \alpha_{2i} \Delta_4 y_{t-i} \right) G(\cdot) + \varepsilon_t $$
with the transition function
$$ G(y_{t-3}; \gamma, c) = \frac{1}{1 + \exp\{-45 (\Delta y_{t-3} - 0.0061) / \sigma_y\}}. $$
Properties of business cycle
• expansion: ∆yt−3 > 0.61%
largest root of α1 (L): modulus = 0.76 and period = 61 quarters
• contraction: ∆yt−3 < 0.61%
largest root of α2 (L): modulus = 1.1 and period = 8.9 quarters
• the economy moves from deep recession into higher growth very aggressively.
Multivariate Smooth Transition Models
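$$ y_t = \left( \nu_1 + \sum_{i=1}^{p} A_{1i} y_{t-i} \right) (1 - G(z_t; \gamma, c)) + \left( \nu_2 + \sum_{i=1}^{p} A_{2i} y_{t-i} \right) G(z_t; \gamma, c) + \varepsilon_t $$
where $y_t = (y_{1t}, \ldots, y_{Kt})'$, $\varepsilon_t \sim \mathrm{IID}(0, \Sigma)$, $A_{mi}$ is a $(K \times K)$ matrix and $\nu_m$ is $(K \times 1)$.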

Tsay (1998) describes a specification procedure for multivariate threshold models.
Asymmetric VECMs
Suppose now that $y_t$ is I(1), but a linear combination $e_t = \beta' y_t$ is stationary with mean $\mu$.
Then a smooth transition equilibrium correction model is of interest:
$$ \Delta y_t = \alpha_1 (1 - G(e_{t-1}; \gamma, \mu)) (e_{t-1} - \mu) + \alpha_2 G(e_{t-1}; \gamma, \mu) (e_{t-1} - \mu) + \varepsilon_t. $$
LSTAR: positive versus negative deviations from equilibrium
$$ G(e_{t-1}; \gamma, \mu) = \frac{1}{1 + \exp\{-\gamma (e_{t-1} - \mu)\}}. $$
SETAR results for $\gamma \to \infty$
$$ G(e_{t-1}; \gamma, \mu) = I(e_{t-1} > \mu) $$
ESTAR: small versus large deviations from equilibrium
$$ G(e_{t-1}; \gamma, \mu) = 1 - \exp\{-\gamma (e_{t-1} - \mu)^2\}. $$
Interesting case: random walk behavior in regime 1 ($\beta' \alpha_1 = 0$) and mean adjustment in
regime 2 ($\beta' \alpha_2 < 0$).
See Granger and Lee (1989) for an early attempt and Granger and Swanson (1996) for a more
general discussion.
2.3.2 Maximum likelihood estimation
STAR model
$$ y_t = x_t' \beta_1 (1 - G(z_t; \gamma, c)) + x_t' \beta_2 G(z_t; \gamma, c) + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{IID}(0, \sigma^2) $$
Non-linear least squares (NLS) estimation of $\theta = (\beta_1', \beta_2'; \gamma, c)'$:
$$ \hat\theta = \arg\min_{\theta} RSS = \arg\min_{\theta} \sum_{t=1}^{T} \varepsilon_t^2(\theta) $$
where $\varepsilon_t(\theta) = y_t - [x_t' \beta_1 (1 - G(z_t; \gamma, c)) + x_t' \beta_2 G(z_t; \gamma, c)]$.
• Under the assumption of normality, $\varepsilon_t \sim \mathrm{NID}(0, \sigma^2)$: NLS = ML.
• Estimation via numerical optimization procedure (see e.g. Hendry, 1995, Appendix
A5).
– local maxima!
– convergence?
• Starting values:
– Conditional upon $\gamma$ and $c$: OLS estimation of $\beta = (\beta_1', \beta_2')'$
$$ \hat\beta(\gamma, c) = \left( \sum_{t=1}^{T} x_t(\gamma, c) x_t(\gamma, c)' \right)^{-1} \sum_{t=1}^{T} x_t(\gamma, c) y_t $$
where $x_t(\gamma, c) = (x_t' (1 - G(z_t; \gamma, c)), x_t' G(z_t; \gamma, c))'$;
– Grid search over $\gamma$ and $c$: min $RSS(\gamma, c)$.
• Concentrating the likelihood (RSS) function:
– Conditional upon $\gamma$ and $c$: OLS estimation of $\beta = (\beta_1', \beta_2')'$;
– NLS of $\gamma$ and $c$: min $RSS(\gamma, c)$.
• Problem: precise estimation of $\gamma$
– reason: for large values of $\gamma$, the shape of the logistic function changes only a little
– an accurate estimate of $\gamma$ requires many observations in the immediate neighbourhood
of the threshold $c$.
– insignificance of $\gamma$ should not be interpreted as evidence against the presence of
STAR nonlinearity (see Bates and Watts, 1988).
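The idea of concentrating the RSS function can be sketched as follows: for each pair (γ, c) on a grid, β is obtained by OLS, and the pair with the smallest RSS supplies starting values (or the estimate itself on a fine grid). The sketch assumes, purely for brevity, that each regime contains only an intercept (x_t = 1), so that the OLS step is a 2×2 linear system. All names and numbers are hypothetical, and the data are generated without noise so that the true values lie on the grid.

```python
import math


def lstar_weights(z, gamma, c):
    """Logistic transition function G(z_t; gamma, c) for every observation."""
    return [1.0 / (1.0 + math.exp(-gamma * (v - c))) for v in z]


def concentrated_rss(y, z, gamma, c):
    """OLS of y_t on (1 - G_t, G_t) for fixed (gamma, c); returns (RSS, b1, b2)."""
    g = lstar_weights(z, gamma, c)
    a11 = sum((1 - w) ** 2 for w in g)
    a12 = sum((1 - w) * w for w in g)
    a22 = sum(w ** 2 for w in g)
    r1 = sum((1 - w) * v for w, v in zip(g, y))
    r2 = sum(w * v for w, v in zip(g, y))
    det = a11 * a22 - a12 ** 2            # non-zero unless G_t is constant
    b1 = (a22 * r1 - a12 * r2) / det
    b2 = (a11 * r2 - a12 * r1) / det
    rss = sum((v - b1 * (1 - w) - b2 * w) ** 2 for w, v in zip(g, y))
    return rss, b1, b2


# Noise-free illustrative data: nu_1 = 1, nu_2 = 3, gamma = 4, c = 0.5.
z = [-2.0 + 0.1 * i for i in range(41)]
y = [1.0 * (1 - w) + 3.0 * w for w in lstar_weights(z, 4.0, 0.5)]

grid = [(gamma, c) for gamma in (1.0, 2.0, 4.0, 8.0)
        for c in (-0.5, 0.0, 0.5, 1.0)]
rss_hat, gamma_hat, c_hat = min(
    (concentrated_rss(y, z, gamma, c)[0], gamma, c) for gamma, c in grid
)
b1_hat, b2_hat = concentrated_rss(y, z, gamma_hat, c_hat)[1:]
```

In practice the grid minimum would be passed to a numerical optimizer over (γ, c) only, which is the dimension reduction that concentration buys.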
2.3.3 Model selection
An empirical specification procedure
Teräsvirta (1994) proposes a procedure based on the Granger and Teräsvirta (1993)
recommendation of a specific-to-general approach for non-linear models:
(1) Specify appropriate linear AR(p) model;

(2) Test the null hypothesis of linearity against the STAR alternative;
(3) If linearity is rejected, select zt and specify G(zt ; γ, c);
(4) Estimate the STAR model;
(5) Evaluate the STAR model using diagnostic tests;
(6) If misspecification is detected, modify the model;
(7) Use the model for descriptive or forecasting purposes.
Testing for STAR nonlinearity
Problem: Under the null of linearity, some ‘nuisance’ parameters are not identified

null hypothesis                                      nuisance parameters
(ν1 , α11 , . . . α1p ) = (ν2 , α21 , . . . α2p )    γ; c
γ = 0                                                (ν1 , α11 , . . . α1p ) − (ν2 , α21 , . . . α2p ); c

→ conventional statistical theory cannot be applied (see Davies, 1977, Davies, 1987 and
Hansen, 1996b)
→ non-standard distributions
→ critical values have to be determined by means of simulation methods.
Solution proposed by Luukkonen, Saikkonen and Teräsvirta (1988):
Replace the transition function G(zt ; γ, c) by a suitable Taylor approximation.
In the reparametrized model, the identification problem is no longer present.
Linearity can be tested by means of a Lagrange multiplier (LM) statistic,
which has a standard asymptotic $\chi^2$ distribution under the null.
→ Test against LSTAR: Luukkonen et al. (1988).
→ Test against LSTAR: Granger and Teräsvirta (1993).
→ LSTAR against ESTAR: Teräsvirta (1994) and Escribano and Jorda (1999).
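As an illustration of the Taylor-approximation idea, the auxiliary regression commonly used for the LM-type test against LSTAR can be written as below. This is a sketch of the standard third-order form as usually presented in the literature, not a formula taken from these notes; it assumes that the transition variable is one of the lags, $z_t = y_{t-d}$ with $d \leq p$, and the symbols $\tilde{x}_t$, $SSR_0$ and $SSR_1$ are notation introduced here.

```latex
% x_t = (1, \tilde{x}_t')' with \tilde{x}_t = (y_{t-1}, \ldots, y_{t-p})'
% Auxiliary regression after replacing G by a third-order Taylor expansion
% around \gamma = 0:
y_t = \beta_0' x_t + \beta_1' \tilde{x}_t z_t + \beta_2' \tilde{x}_t z_t^2
      + \beta_3' \tilde{x}_t z_t^3 + e_t

% Null hypothesis of linearity:
H_0:\ \beta_1 = \beta_2 = \beta_3 = 0

% LM statistic, with SSR_0 from the linear AR(p) model and SSR_1 from the
% auxiliary regression:
LM = T \, \frac{SSR_0 - SSR_1}{SSR_0} \ \overset{a}{\sim}\ \chi^2(3p)
```

All parameters of the auxiliary regression are identified under the null, which is why the standard asymptotic theory applies.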
Diagnostic checking in STAR models
Eitrheim and Teräsvirta (1996) discuss formal diagnostic tests for STAR models
• Jarque-Bera test for normality of the residuals

• LM type test for serial autocorrelation
• LM test for remaining nonlinearity (two-regime STAR against the alternative of an ad-
ditive STAR model)
• LM test for parameter constancy (two-regime STAR against the alternative of a time-
varying STAR model)
2.4 Markov-switching vector autoregressions
2.4.1 The MS-VAR model
In Markov-switching vector autoregressive (MS-VAR) models it is assumed that the regime $s_t$
is generated by a hidden discrete-state homogeneous and ergodic Markov chain:
$$ \Pr(s_t | S_{t-1}, Y_{t-1}, X_t) = \Pr(s_t | s_{t-1}; \rho) $$
defined by the transition probabilities
$$ p_{ij} = \Pr(s_{t+1} = j | s_t = i). $$
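A small sketch may help to fix ideas about what the transition probabilities imply. It uses two standard results for ergodic two-state Markov chains that are not stated explicitly in these notes: the ergodic (unconditional) regime probabilities and the expected duration $1/(1 - p_{ii})$ of regime $i$. Function names and the numerical values are hypothetical.

```python
def ergodic_probs_2state(p12, p21):
    """Ergodic regime probabilities of a 2-state Markov chain."""
    return p21 / (p12 + p21), p12 / (p12 + p21)


def expected_duration(p_stay):
    """Expected duration of a regime with staying probability p_ii."""
    return 1.0 / (1.0 - p_stay)


def propagate(P, probs):
    """One step ahead: Pr(s_{t+1} = j) = sum_i Pr(s_t = i) * p_ij."""
    M = len(P)
    return [sum(probs[i] * P[i][j] for i in range(M)) for j in range(M)]


# Hypothetical transition probabilities; each row of P sums to one.
p12, p21 = 0.1, 0.25
P = [[1 - p12, p12],
     [p21, 1 - p21]]

pi = ergodic_probs_2state(p12, p21)      # invariant under propagate(P, .)
duration_regime_1 = expected_duration(1 - p12)
```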
The conditional process is a VAR(p) with
• shift in the mean (MSM-VAR): once-and-for-all jump in the time series
$$ y_t - \mu(s_t) = A_1(s_t) (y_{t-1} - \mu(s_{t-1})) + \ldots + A_p(s_t) (y_{t-p} - \mu(s_{t-p})) + u_t, $$
• shift in the intercept (MSI-VAR): smooth adjustment of the time series
$$ y_t = \nu(s_t) + A_1(s_t) y_{t-1} + \ldots + A_p(s_t) y_{t-p} + u_t. $$
A major advantage of the MS-VAR is its flexibility, see Krolzig (1997).
Special MS-VAR Models

                                MSM                           MSI Specification
                                µ varying     µ invariant     ν varying     ν invariant
Aj invariant    Σ invariant     MSM–VAR       linear MVAR     MSI–VAR       linear VAR
                Σ varying       MSMH–VAR      MSH–MVAR        MSIH–VAR      MSH–VAR
Aj varying      Σ invariant     MSMA–VAR      MSA–MVAR        MSIA–VAR      MSA–VAR
                Σ varying       MSMAH–VAR     MSAH–MVAR       MSIAH–VAR     MSAH–VAR
Figure 3 Hamilton’s MSM(2)-AR(4) model. [Plot omitted. Panels: MSM(2)-AR(4), 1952 (2) - 1984 (4); probabilities of Regime 1; probabilities of Regime 2.]
Markov-switching autoregressive models of US GNP
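Hamilton (1989): 2-regime MS-AR model for the quarterly growth rate of U.S. GNP:
$$ \Delta y_t - \mu(s_t) = \sum_{k=1}^{4} \alpha_k (\Delta y_{t-k} - \mu(s_{t-k})) + u_t, \quad u_t | s_t \sim \mathrm{NID}(0, \sigma^2) $$
Two regimes representing the “state of the business cycle”:
$$ \mu(s_t) = \begin{cases} \mu_1 > 0 & \text{if } s_t = 1 \text{ (‘expansion’)} \\ \mu_2 < 0 & \text{if } s_t = 2 \text{ (‘contraction’)} \end{cases} $$
generated by an ergodic Markov chain
$$ p_{12} = \Pr(\text{contraction in } t \mid \text{expansion in } t - 1) $$
$$ p_{21} = \Pr(\text{expansion in } t \mid \text{contraction in } t - 1) $$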
Figure 4 MSM(2)-AR models of US GNP growth. [Plot omitted. Panels: MSM(2)-AR(4) model, 1947:2 - 1990:4; MSM(2)-AR(2) model, 1947:1 - 1990:4.]
Figure 5 MSM(3)-AR models of US GNP growth. [Plot omitted. Columns: ‘High’ growth regime, H; ‘Recession’ regime, L. Rows: 1948:2-1990:4; 1948:2-1984:4; 1960:2-1996:2; 1960:2-1990:4.]
2.4.2 State-Space Representation
The framework for the statistical analysis of MS-VAR models is the state-space form. The
advantage of viewing MS-VAR models in this way is that general concepts such as the likelihood
principle and a recursive filter algorithm can be introduced. The state-space model consists of
the set of measurement and transition equations.
Measurement or observation equation (conditional process): The measurement equation
describes the relation between the unobserved state vector ξt and the observed time series
vector yt . Here, the predetermined variables Yt−1 and the vector of Gaussian disturbances ut
enter the model.
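Example: MSI($M$)-VAR(1) model
$$ y_t = M \xi_t + A_1 y_{t-1} + u_t $$
where
$$ M = \begin{bmatrix} \nu_1 & \cdots & \nu_M \end{bmatrix} \quad \text{and} \quad \xi_t = \begin{bmatrix} I(s_t = 1) \\ \vdots \\ I(s_t = M) \end{bmatrix} \quad \text{with} \quad I(s_t = m) = \begin{cases} 1 & \text{if } s_t = m \\ 0 & \text{otherwise.} \end{cases} $$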
State or transition equation (regime generating process): The state vector $\xi_t$ follows a
Markov chain subject to a discrete adding-up restriction. The Markov chain governing the
state vector $\xi_t$ can be represented as a first-order vector autoregression (cf. Hamilton, 1994b):
$$ \xi_{t+1} = F \xi_t + v_{t+1}, \quad v_{t+1} \equiv \xi_{t+1} - E[\xi_{t+1} | \{\xi_{t-j}\}_{j=0}^{\infty}] $$
where $F = P'$ is the transposed transition matrix. The last equation implies that the innovation $v_t$ is a
martingale difference series. Although the vector $v_t$ can take on only a finite set of values,
the mean $E[v_t] = E[v_t | \{\xi_{t-j}\}_{j=1}^{\infty}]$ equals zero. While it is impossible to improve the
forecast of $v_t$ given the previous realizations of the Markov chain, the conditional variance of $v_t$,
$E[v_t v_t' | \{\xi_{t-j}\}_{j=1}^{\infty}] = E[v_t v_t' | \xi_{t-1}]$, depends on $\xi_{t-1}$.
MSM-VAR processes as linearly transformed VAR processes
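MSM($M$)–VAR($p$) Process, $p \geq 0$
$$ A(L) (y_t - \mu(s_t)) = u_t \iff \begin{cases} y_t = \mu(s_t) + z_t \\ \mu_t = M \xi_t, \\ A(L) z_t = u_t, \quad u_t \ \text{i.i.d. WN}(0, \Sigma_u). \end{cases} $$
State Space Representation
$$ \begin{aligned} y_t - \mu_y &= M \zeta_t + J \mathbf{z}_t \\ \zeta_t &= F \zeta_{t-1} + v_t \\ \mathbf{z}_t &= A \mathbf{z}_{t-1} + \mathbf{u}_t \end{aligned} \iff \begin{aligned} y_t - \mu_y &= \begin{bmatrix} M & J \end{bmatrix} \begin{bmatrix} \zeta_t \\ \mathbf{z}_t \end{bmatrix} \\ \begin{bmatrix} \zeta_t \\ \mathbf{z}_t \end{bmatrix} &= \begin{bmatrix} F & 0 \\ 0 & A \end{bmatrix} \begin{bmatrix} \zeta_{t-1} \\ \mathbf{z}_{t-1} \end{bmatrix} + \begin{bmatrix} v_t \\ \mathbf{u}_t \end{bmatrix} \end{aligned} $$
with
$$ \zeta_t = \begin{bmatrix} \xi_{1,t} \\ \vdots \\ \xi_{M-1,t} \end{bmatrix} - \begin{bmatrix} \bar\xi_1 \\ \vdots \\ \bar\xi_{M-1} \end{bmatrix}, $$
$$ F = \begin{bmatrix} p_{1,1} - p_{M,1} & \ldots & p_{M-1,1} - p_{M,1} \\ \vdots & & \vdots \\ p_{1,M-1} - p_{M,M-1} & \ldots & p_{M-1,M-1} - p_{M,M-1} \end{bmatrix}, $$
$$ M = \begin{bmatrix} \mu_1 - \mu_M & \ldots & \mu_{M-1} - \mu_M \end{bmatrix}, $$
$$ \mathbf{z}_t = \begin{bmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-p+1} \end{bmatrix}, \quad A = \begin{bmatrix} A_1 & \ldots & A_{p-1} & A_p \\ I_K & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & \ldots & I_K & 0 \end{bmatrix}, \quad \mathbf{u}_t = \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}, $$
$$ J = e_1' \otimes I_K. $$
A VARMA-Representation Theorem
MSM($M$)–VAR($p$)
$$ y_t = \mu_y + M \zeta_t + z_t $$
$$ z_t = A(L)^{-1} u_t, \quad A(L) = I_K - A_1 L - \ldots - A_p L^p $$
$$ \zeta_t = F(L)^{-1} v_t, \quad F(L) = I_{M-1} - F L $$
Moving-average representation:
$$ y_t = \mu_y + M F(L)^{-1} v_t + A(L)^{-1} u_t $$
Final-equations-form VARMA($M + Kp - 1$, $M + Kp - 2$):
$$ |F(L)| |A(L)| (y_t - \mu_y) = M |A(L)| F(L)^* v_t + |F(L)| A(L)^* u_t, $$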
2.4.3 Related models
Mixture of normals
The mixture of normals model is characterized by serially independently distributed regimes:
$$ \Pr(s_t | S_{t-1}, Y_{t-1}) = \Pr(s_t; \rho). $$
This is a special case of the MS-AR model, which results when the transition probabilities are
independent of the history of the regime.
The conditional probability distribution of $y_t$ is independent of $S_{t-1}$,
$$ p(y_t | Y_{t-1}, S_{t-1}) = p(y_t | Y_{t-1}), $$
and the regimes are Granger non-causal for $y_t$. Even so, this model can be considered as a
restricted MS-VAR model where the transition matrix has rank one. Moreover, if only the level of
the time series is regime-dependent, the model is observationally equivalent to time-invariant
linear processes with non-normal errors.
Time-varying transition probabilities (endogenous switching)
All the previously mentioned models are special cases of an endogenous selection model: The
transition probabilities pij are not time-invariant parameters, but functions of the observed
time series vector yt−d or some exogenous variables xt :
$$ \Pr(s_t = 1 | S_{t-1}, Y_{t-1}, X_t) = F(z_t, s_{t-1}; \gamma, c) = \begin{cases} 1 - F_{12}(z_t; \gamma, c) & \text{if } s_{t-1} = 1 \\ F_{21}(z_t; \gamma, c) & \text{if } s_{t-1} = 2. \end{cases} $$
For example, in the case of an exponential function the time-varying transition probabilities
are given by:
$$ p_{ijt} = F_{ij}(z_t; \gamma, c) = 1 - \exp\{-\gamma_{ij} (z_t - c_{ij})^2\} \quad \text{for } i \neq j $$
and $p_{iit} = 1 - \sum_{j \neq i} p_{ijt}$.
In contrast to an MS-AR model, the regime switching rule also depends on the history of
the observed variables. Since the observed variables contain additional information on the
conditional probability distribution of the states, the regime generating process is no longer
Markovian: a.e.
$$ \Pr(s_t | S_{t-1}, Y_{t-1}) \neq \Pr(s_t | s_{t-1}). $$
In contrast to the SETAR and the STAR model, MS-VAR models include the possibility that
the threshold depends on the last regime, e.g. that the threshold for staying in regime 2 is
different from the threshold for switching from regime 1 to regime 2 .
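For the two-regime case, the exponential time-varying transition probabilities can be evaluated with a few lines of code. The function name and parameter values below are hypothetical; the formula is the exponential specification given above, with the diagonal elements obtained from the adding-up restriction.

```python
import math


def tvtp_exponential(z, gamma12, c12, gamma21, c21):
    """2-regime transition matrix at time t for transition variable value z.

    Off-diagonal elements: p_ijt = 1 - exp(-gamma_ij * (z - c_ij)^2);
    diagonal elements follow from each row summing to one.
    """
    p12 = 1.0 - math.exp(-gamma12 * (z - c12) ** 2)
    p21 = 1.0 - math.exp(-gamma21 * (z - c21) ** 2)
    return [[1.0 - p12, p12],
            [p21, 1.0 - p21]]


# Illustrative values: at z = c12 the probability of leaving regime 1 is zero,
# while the probability of leaving regime 2 depends on the distance to c21.
P_t = tvtp_exponential(z=0.5, gamma12=2.0, c12=0.5, gamma21=1.0, c21=-0.5)
```

Because each transition has its own threshold $c_{ij}$, the value of $z_t$ at which a switch becomes likely differs with the regime currently prevailing.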
Figure 6 Regime inference. [Plot omitted. Panels: regime-dependent densities p(y_t | s_t = 1, Y_{t−1}) and p(y_t | s_t = 2, Y_{t−1}); density of y_t given Y_{t−1}, p(y_t | Y_{t−1}), for Pr(s_t = 1 | Y_{t−1}) = .3 and .5; regime inference after observation of y_t, Pr(s_t = 1 | Y_t), for Pr(s_t = 1 | Y_{t−1}) = .3 and .5.]
2.4.4 Regime inference
The discrete support of the state in the MS-AR model makes it possible to derive the complete
conditional distribution of the unobservable state variable
• instead of deriving only the first two moments, as in the Kalman filter (cf. Kalman, 1960,
Kalman and Bucy, 1961, and Kalman, 1963) for Gaussian linear state-space models,
• or using the grid approximation suggested by Kitagawa (1987) for non-linear, non-normal
state-space models.
Literature
• The filtering and smoothing algorithms for time series models with Markov-switching
regimes are closely related to Hamilton (1988, 1989, 1994a) building upon ideas of
Cosslett and Lee (1985).
• The basic filtering and smoothing recursions had been introduced by Baum, Petrie,
Soules and Weiss (1970) for the reconstruction of hidden Markov chains.
• Lindgren (1978) applied their algorithms to regression models with Markovian regime
switches.
• A major improvement of the smoother has been provided by the backward recursions of
Kim (1994).
Filtering
The filter introduced by Hamilton (1989) can be described as an iterative algorithm for
calculating the optimal inference of $\xi_{t+1}$ on the basis of the information set in $t$ consisting of the
observed values of $y_t$, namely $Y_t = (y_t', y_{t-1}', \ldots, y_{1-p}')'$. It might also be viewed as a discrete
version of the Kalman filter for the state-space model
$$ y_t = X_t B \xi_t + u_t, $$
$$ \xi_{t+1} = F \xi_t + v_{t+1}. $$
For given parameters, the discrete-state algorithm under consideration summarizes the
conditional probability distribution of the state vector $\xi_t$ by
$$ \hat\xi_{t|t} = E[\xi_t | Y_t] = \begin{bmatrix} \Pr(\xi_t = \iota_1 | Y_t) \\ \vdots \\ \Pr(\xi_t = \iota_N | Y_t) \end{bmatrix}. $$
Since each component of $\xi_t$ is a binary variable, $\hat\xi_{t|t}$ is not only the conditional mean,
which is the optimal inference of $\xi_t$ given $Y_t$, but also the
probability distribution of $\xi_t$ conditional on $Y_t$.
The filtering algorithm computes $\hat\xi_{t|t}$ by deriving the joint probability density of $\xi_t$ and $y_t$
conditioned on the observations $Y_{t-1}$.
By invoking Bayes' law, the posterior probabilities $\Pr(\xi_t | y_t, Y_{t-1})$ are given by
$$ \Pr(\xi_t | Y_t) \equiv \Pr(\xi_t | y_t, Y_{t-1}) = \frac{p(y_t | \xi_t, Y_{t-1}) \Pr(\xi_t | Y_{t-1})}{p(y_t | Y_{t-1})}, $$
with the prior probability
$$ \Pr(\xi_t | Y_{t-1}) = \sum_{\xi_{t-1}} \Pr(\xi_t | \xi_{t-1}) \Pr(\xi_{t-1} | Y_{t-1}) $$
and the density
$$ p(y_t | Y_{t-1}) = \sum_{\xi_t} p(y_t, \xi_t | Y_{t-1}) = \sum_{\xi_t} \Pr(\xi_t | Y_{t-1}) p(y_t | \xi_t, Y_{t-1}). $$
Note that the summation involves all possible values of $\xi_t$ and $\xi_{t-1}$.
Let $\eta_t$ be the vector of the densities of $y_t$ conditional on $\xi_t$ and $Y_{t-1}$
$$ \eta_t = \begin{bmatrix} p(y_t | \theta_1, Y_{t-1}) \\ \vdots \\ p(y_t | \theta_N, Y_{t-1}) \end{bmatrix} = \begin{bmatrix} p(y_t | \xi_t = \iota_1, Y_{t-1}) \\ \vdots \\ p(y_t | \xi_t = \iota_N, Y_{t-1}) \end{bmatrix}, $$
where $\theta$ has been dropped on the right hand side to avoid unnecessary notation, such that the
density of $y_t$ conditional on $Y_{t-1}$ is given by $p(y_t | Y_{t-1}) = \eta_t' \hat\xi_{t|t-1} = 1_N' (\eta_t \odot \hat\xi_{t|t-1})$.
Then, the contemporaneous inference $\hat{\xi}_{t|t}$ is given in matrix notation by
\[
\hat{\xi}_{t|t} = \frac{\eta_t \odot \hat{\xi}_{t|t-1}}{1_N' (\eta_t \odot \hat{\xi}_{t|t-1})}, \tag{7}
\]
where $\odot$ denotes the element-wise matrix multiplication and $1_N = (1, \ldots, 1)'$ is a
vector consisting of ones. For each regime, the filter weights the conditional density of the
observation $y_t$, given the vector $\theta_m$ of AR parameters of regime $m$, with the predicted
probability of being in regime $m$ at time $t$ given the information set $Y_{t-1}$. Thus, the
instruction (7) describes the filtered regime probabilities $\hat{\xi}_{t|t}$ as an update of the
estimate $\hat{\xi}_{t|t-1}$ of $\xi_t$ given the new information $y_t$.
The transition equation implies that the vector $\hat{\xi}_{t+1|t}$ of predicted probabilities
is a linear function of the filtered probabilities $\hat{\xi}_{t|t}$:
\[
\hat{\xi}_{t+1|t} = F \hat{\xi}_{t|t}. \tag{8}
\]
The sequence $\{\hat{\xi}_{t|t-1}\}_{t=1}^{T}$ can therefore be generated by iterating on (7)
and (8), which can be summarized as:
\[
\hat{\xi}_{t+1|t} = \frac{F (\eta_t \odot \hat{\xi}_{t|t-1})}{1_N' (\eta_t \odot \hat{\xi}_{t|t-1})}. \tag{9}
\]
In the prevailing Bayesian context, $\hat{\xi}_{t|t-1}$ is the prior distribution of $\xi_t$.
The posterior distribution $\hat{\xi}_{t|t}$ is calculated by linking the new information $y_t$
with the prior via Bayes' law. The posterior distribution $\hat{\xi}_{t|t}$ becomes the prior
distribution for the next state $\xi_{t+1}$, and so on.
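The recursion on (7) and (8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hamilton_filter`, the two-regime Gaussian example, its parameter values and the flat initial prediction are hypothetical and not taken from the course material or from MSVAR for Ox. The matrix `P` holds the transition probabilities row-wise, so that `F = P.T` matches the transition equation above.

```python
import numpy as np

def hamilton_filter(eta, P, xi_init):
    """Iterate on equations (7) and (8).

    eta     : (T, N) array, eta[t, m] = p(y_t | regime m, Y_{t-1})
    P       : (N, N) array, P[i, j] = Pr(regime j at t+1 | regime i at t)
    xi_init : (N,) array, the initial prediction xi_{1|0} (an assumption)
    """
    F = P.T                                  # xi_{t+1} = F xi_t + v_{t+1}
    T, N = eta.shape
    filtered = np.zeros((T, N))              # rows hold xi_{t|t}
    predicted = np.zeros((T + 1, N))         # row t holds the prediction for observation t
    predicted[0] = xi_init
    for t in range(T):
        joint = eta[t] * predicted[t]        # eta_t (.) xi_{t|t-1}
        filtered[t] = joint / joint.sum()    # equation (7)
        predicted[t + 1] = F @ filtered[t]   # equation (8)
    return filtered, predicted

# Hypothetical example: two regimes with means 1 and -1 and unit variance.
rng = np.random.default_rng(0)
y = rng.normal(size=50)
mu = np.array([1.0, -1.0])
eta = np.exp(-0.5 * (y[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
filtered, predicted = hamilton_filter(eta, P, np.array([0.5, 0.5]))
```

Each row of `filtered` and `predicted` is a probability vector, which gives a quick sanity check on any implementation of the filter.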
Smoothing
The filter recursions deliver estimates for $\xi_t$, $t = 1, \ldots, T$, based on information
up to time point $t$. This is a limited-information technique, as we have observations up to
$t = T$. In the following, full-sample information is used to make an inference about the
unobserved regimes by incorporating the previously neglected sample information
$Y_{t+1.T} = (y_{t+1}', \ldots, y_T')'$ into the inference about $\xi_t$. Thus, the smoothing
algorithm gives the best estimate of the unobservable state at any point within the sample.
The smoothing algorithm proposed by Kim (1994) may be interpreted as a backward filter that
starts at the end point t = T of the previously applied filter.
The full-sample smoothed inferences $\hat{\xi}_{t|T}$ can be found by iterating backward for
$t = T-1, \ldots, 1$, starting from the last output of the filter $\hat{\xi}_{T|T}$ and using
the identity
\[
\Pr(\xi_t | Y_T) = \sum_{\xi_{t+1}} \Pr(\xi_t, \xi_{t+1} | Y_T)
  = \sum_{\xi_{t+1}} \Pr(\xi_t | \xi_{t+1}, Y_T) \Pr(\xi_{t+1} | Y_T). \tag{10}
\]
For pure AR models with Markovian parameter shifts, the probability laws for $y_t$ and
$\xi_{t+1}$ depend only on the current state $\xi_t$ and not on the former history of states.
Thus, we have
\[
\begin{aligned}
\Pr(\xi_t | \xi_{t+1}, Y_T) &\equiv \Pr(\xi_t | \xi_{t+1}, Y_t, Y_{t+1.T}) \\
 &= \frac{p(Y_{t+1.T} | \xi_t, \xi_{t+1}, Y_t) \Pr(\xi_t | \xi_{t+1}, Y_t)}{p(Y_{t+1.T} | \xi_{t+1}, Y_t)} \\
 &= \Pr(\xi_t | \xi_{t+1}, Y_t).
\end{aligned}
\]
It is therefore possible to calculate the smoothed probabilities $\hat{\xi}_{t|T}$ by taking
the last term of (10) from the previous iteration of the smoothing algorithm,
$\hat{\xi}_{t+1|T}$, while it can be shown that the first term of (10) can be derived from the
filtered probabilities $\hat{\xi}_{t|t}$:
\[
\Pr(\xi_t | \xi_{t+1}, Y_t)
  = \frac{\Pr(\xi_{t+1} | \xi_t, Y_t) \Pr(\xi_t | Y_t)}{\Pr(\xi_{t+1} | Y_t)}
  = \frac{\Pr(\xi_{t+1} | \xi_t) \Pr(\xi_t | Y_t)}{\Pr(\xi_{t+1} | Y_t)}. \tag{11}
\]
If there is no deviation between the full-information estimate, $\hat{\xi}_{t+1|T}$, and the
inference based on the partial information, $\hat{\xi}_{t+1|t}$, then no update takes place,
$\hat{\xi}_{t|T} = \hat{\xi}_{t|t}$, and the filtering solution $\hat{\xi}_{t|t}$ cannot be
further improved.
In matrix notation, (10) and (11) can be condensed to
\[
\hat{\xi}_{t|T} = \left[ F' \left( \hat{\xi}_{t+1|T} \oslash \hat{\xi}_{t+1|t} \right) \right] \odot \hat{\xi}_{t|t}, \tag{12}
\]
where $\odot$ and $\oslash$ denote the element-wise matrix multiplication and division. The
recursion is initialized with the final filtered probability vector $\hat{\xi}_{T|T}$.
Recursion (12) describes how the additional information $Y_{t+1.T}$ is used in an efficient
way to improve the inference on the unobserved state $\xi_t$.
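A minimal Python sketch of the backward recursion (12) follows. It assumes the filtered and predicted probabilities are already available from a forward pass; the function name `kim_smoother`, the array layout and the two-regime example data are hypothetical choices made only for this illustration. With `P` stored row-wise, `F'` in (12) is simply `P`.

```python
import numpy as np

def kim_smoother(filtered, predicted, P):
    """Backward recursion (12).

    filtered  : (T, N) array, rows hold xi_{t|t}
    predicted : (T + 1, N) array, row t + 1 holds F xi_{t|t}
    P         : (N, N) array, P[i, j] = Pr(regime j at t+1 | regime i at t)
    """
    T = filtered.shape[0]
    smoothed = np.zeros_like(filtered)
    smoothed[-1] = filtered[-1]                      # initialise with xi_{T|T}
    for t in range(T - 2, -1, -1):
        ratio = smoothed[t + 1] / predicted[t + 1]   # element-wise division
        smoothed[t] = (P @ ratio) * filtered[t]      # F' = P, then element-wise product
    return smoothed

# Hypothetical forward pass for a two-regime model with means 1 and -1.
rng = np.random.default_rng(0)
y = rng.normal(size=40)
mu = np.array([1.0, -1.0])
eta = np.exp(-0.5 * (y[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
T = len(y)
filtered = np.zeros((T, 2))
predicted = np.zeros((T + 1, 2))
predicted[0] = [0.5, 0.5]
for t in range(T):
    joint = eta[t] * predicted[t]
    filtered[t] = joint / joint.sum()
    predicted[t + 1] = P.T @ filtered[t]

smoothed = kim_smoother(filtered, predicted, P)
```

The smoothed vectors again sum to one, and the last smoothed vector coincides with the last filtered one, as the initialization of the recursion requires.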
2.4.5 Maximum Likelihood estimation
The Likelihood Function
In econometrics the so-called Markov model of switching regressions considered by Goldfeld
and Quandt (1973),
\[
y_t = x_t' \beta_m + u_{mt}, \qquad u_{mt} \sim \mathrm{NID}(0, \sigma_m^2) \quad \text{for } m = 1, 2,
\]
has been one of the first attempts to analyze regressions with Markovian regime shifts.
Goldfeld and Quandt (1973) claimed to derive maximum likelihood estimates by maximizing their
“likelihood” function, which would be in terms of our model
\[
Q(\theta, \rho, \xi_0) = \prod_{t=1}^{T} \eta_t(\theta)' \xi_{t|0}(\rho, \xi_0),
\]
where $\eta_t$ is again an $(M \times 1)$ vector collecting the conditional densities
$p(y_t | Y_{t-1}, \theta_m)$, $m = 1, \ldots, M$, and $\xi_{t|0} = F^t \xi_0$ are the
unconditional regime probabilities.
Unfortunately, the function Q(θ, ρ, ξ0 ) is not the likelihood function as pointed out by Cosslett
and Lee (1985).
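Derivation of the likelihood function as a by-product of the filter:
\[
\begin{aligned}
L(\lambda | Y) := p(Y_T | Y_0; \lambda)
 &= \prod_{t=1}^{T} p(y_t | Y_{t-1}, \lambda) \\
 &= \prod_{t=1}^{T} \sum_{\xi_t} p(y_t | \xi_t, Y_{t-1}, \theta) \Pr(\xi_t | Y_{t-1}, \lambda) \\
 &= \prod_{t=1}^{T} \eta_t' \hat{\xi}_{t|t-1} \\
 &= \prod_{t=1}^{T} \eta_t' F \hat{\xi}_{t-1|t-1}.
\end{aligned}
\]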
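The conditional densities $p(y_t | \xi_{t-1} = \iota_i, Y_{t-1})$ are mixtures of normals. Thus,
the likelihood function is non-normal:
\[
\begin{aligned}
L(\lambda | Y) &= \prod_{t=1}^{T} \sum_{i=1}^{N} \sum_{j=1}^{N}
   p_{ij} \Pr(\xi_{t-1} = \iota_i | Y_{t-1}, \lambda)\, p(y_t | \xi_t = \iota_j, Y_{t-1}, \theta) \\
 &= \prod_{t=1}^{T} \sum_{i=1}^{N} \sum_{j=1}^{N}
   p_{ij}\, \hat{\xi}_{i.t-1|t-1}\, (2\pi)^{-K/2} |\Sigma_j|^{-1/2}
   \exp\left\{ -\tfrac{1}{2} u_{jt}' \Sigma_j^{-1} u_{jt} \right\},
\end{aligned}
\]
where $u_{jt} = y_t - E[y_t | \xi_t = \iota_j, Y_{t-1}]$ and $N = M^{p+1}$ in MSM specifications
or $N = M$ otherwise.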
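As a sketch of how the log-likelihood falls out of the filter, the following Python function accumulates $\ln(\eta_t' \hat{\xi}_{t|t-1})$ along the forward pass. It assumes a univariate model with a switching mean and a common standard deviation; the function name `ms_loglik` and all numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np

def ms_loglik(y, mu, sigma, P, xi_init):
    """Log-likelihood ln L = sum_t ln(eta_t' xi_{t|t-1}) as a by-product of the filter.

    y       : (T,) observations
    mu      : (N,) regime-dependent means (assumed model: y_t = mu(s_t) + u_t)
    sigma   : common standard deviation of u_t (an assumption of this sketch)
    P       : (N, N) array, P[i, j] = Pr(regime j at t+1 | regime i at t)
    xi_init : (N,) initial prediction xi_{1|0}
    """
    eta = np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    xi_pred = np.asarray(xi_init, dtype=float)
    loglik = 0.0
    for eta_t in eta:
        joint = eta_t * xi_pred
        lik_t = joint.sum()               # eta_t' xi_{t|t-1} = p(y_t | Y_{t-1})
        loglik += np.log(lik_t)
        xi_pred = P.T @ (joint / lik_t)   # F xi_{t|t}
    return loglik

P = np.array([[0.9, 0.1], [0.2, 0.8]])
y = np.array([0.1, -0.4, 1.2, 0.7])
value = ms_loglik(y, np.array([1.0, -1.0]), 1.0, P, [0.5, 0.5])
```

When both regimes share the same mean, the mixture collapses and the function returns the ordinary Gaussian log-likelihood, which is a convenient check.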
Normal Equations of the ML Estimator
The maximum likelihood (ML) estimates can be derived by maximizing the likelihood function
$L(\lambda | Y)$ subject to the adding-up restrictions
\[
P 1_M = 1_M, \qquad 1_M' \xi_0 = 1,
\]
and the non-negativity restrictions
\[
\rho \geq 0, \quad \sigma \geq 0, \quad \xi_0 \geq 0.
\]
If the non-negativity can be ensured, the ML estimate $\tilde{\lambda}$ is given by the
first-order conditions (FOCs) of the constrained log-likelihood function
\[
\ln L^*(\lambda) := \ln L(\lambda | Y_T) - \kappa_1' (P 1_M - 1_M) - \kappa_2 (1_M' \xi_0 - 1). \tag{13}
\]
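Then the FOCs are given by the set of simultaneous equations
\[
\begin{aligned}
\frac{\partial \ln L(\lambda | Y)}{\partial \theta'} &= 0, \\
\frac{\partial \ln L(\lambda | Y)}{\partial \rho'} - \kappa_1' (1_M' \otimes I_M) &= 0, \\
\frac{\partial \ln L(\lambda | Y)}{\partial \xi_0'} - \kappa_2 1_M' &= 0,
\end{aligned}
\]
where it is assumed that the interior solution of these conditions exists and is well-behaved,
such that the non-negativity restrictions are not binding.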
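Differentiating the log-likelihood function with respect to the parameter vector $\theta$ leads
to the score function
\[
\begin{aligned}
\frac{\partial \ln L(\lambda | Y)}{\partial \theta'}
 &= \frac{1}{L} \int \frac{\partial p(Y | \xi, \theta)}{\partial \theta'} \Pr(\xi | \xi_0, \rho)\, d\xi \\
 &= \frac{1}{L} \int \frac{\partial \ln p(Y | \xi, \theta)}{\partial \theta'}\, p(Y | \xi, \theta) \Pr(\xi | \xi_0, \rho)\, d\xi \\
 &= \int \frac{\partial \ln p(Y | \xi, \lambda)}{\partial \theta'} \Pr(\xi | Y, \lambda)\, d\xi \\
 &= \sum_{t=1}^{T} \sum_{\xi_t} \frac{\partial \ln p(y_t | \xi_t, Y_{t-1}, \lambda)}{\partial \theta'} \Pr(\xi_t | Y_T, \lambda).
\end{aligned}
\]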
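Maximization of the constrained likelihood function with respect to the parameter vector
$\rho$ of the hidden Markov chain leads to
\[
\begin{aligned}
\frac{\partial \ln L(\lambda | Y)}{\partial \rho'}
 &= \frac{1}{L} \int p(Y | \xi, \theta)\, \frac{\partial \Pr(\xi | \xi_0, \rho)}{\partial \rho'}\, d\xi \\
 &= \frac{1}{L} \int \frac{\partial \ln \Pr(\xi | \xi_0, \rho)}{\partial \rho'}\, p(Y | \xi, \theta) \Pr(\xi | \xi_0, \rho)\, d\xi \\
 &= \int \frac{\partial \ln \Pr(\xi | \xi_0, \rho)}{\partial \rho'} \Pr(\xi | Y, \lambda)\, d\xi.
\end{aligned}
\]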
Thus, the ML estimator of the vector of transition probabilities $\rho$ is equal to the
transition probabilities in the sample calculated with the smoothed regime probabilities:
\[
\tilde{p}_{ij} = \frac{\sum_{t=1}^{T} \Pr(s_t = j, s_{t-1} = i | Y_T; \lambda)}{\sum_{t=1}^{T} \Pr(s_{t-1} = i | Y_T; \lambda)}.
\]
The EM Algorithm
As shown in Hamilton (1990), the Expectation-Maximization (EM) algorithm introduced by
Dempster, Laird and Rubin (1977) can be used in conjunction with the filter to obtain the
maximum likelihood estimates of the model’s parameters.
The EM algorithm is an iterative ML estimation technique designed for a general class of
models where the observed time series depends on some unobservable stochastic variables.
For the hidden Markov-chain model an early precursor to the EM algorithm was provided
by Baum et al. (1970) building upon ideas in Baum and Eagon (1967). The consistency and
asymptotic normality of the proposed ML estimator were studied in Baum and Petrie (1966)
and Petrie (1969). Their work has been extended by Lindgren (1978) to the case of regression
models with Markov-switching regimes.
Each iteration of the EM algorithm consists of two steps:
• In the expectation step (E), the unobserved states ξt are estimated by their smoothed
probabilities ξ̂t|T . The conditional probabilities Pr(ξ|Y, λ(j−1) ) are calculated with the
filter and smoother by using the estimated parameter vector λ(j−1) of the last maximiz-
ation step instead of the unknown true parameter vector λ.
• In the maximization step (M), an estimate of $\lambda$ is derived as a solution
$\tilde{\lambda}$ of the FOCs of ML estimation, where the conditional regime probabilities
$\Pr(\xi_t | Y, \lambda)$ are replaced by the smoothed probabilities
$\hat{\xi}_{t|T}(\lambda^{(j-1)})$ of the last expectation step. Thus, the dominant source
of non-linearities in the FOCs is eliminated. If the score, i.e. the gradient of
$\ln L(\lambda | Y_T)$, were linear in $\xi$, this procedure would be equivalent to replacing
the unobserved latent variables $\xi$ in the FOCs with their expectation $\hat{\xi}_{t|T}$.
Equipped with the new parameter vector $\lambda^{(j)}$, the filtered and smoothed probabilities
are updated, and so on. Thus, each EM iteration involves a pass through the filter and smoother,
followed by an update of the first-order conditions and the parameter estimates.
General results available for the EM algorithm indicate that the likelihood function does not
decrease from one iteration $j$ to the next. Finally, a fixed point of this iteration schedule,
$\lambda^{(j)} = \lambda^{(j-1)}$, satisfies the FOCs of ML estimation and thus coincides with a
(possibly local) maximum of the likelihood function. The general statistical properties of
the EM algorithm are discussed more comprehensively in Ruud (1991).
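The two steps can be sketched for a deliberately simple case. The Python code below assumes a two-regime model $y_t = \mu(s_t) + u_t$ with a common variance and no autoregressive terms, and it keeps the initial regime distribution fixed instead of estimating $\xi_0$; the function names `e_step` and `m_step`, the simulated data and the starting values are all hypothetical. It is an illustration of the idea, not the estimation routine of any particular package.

```python
import numpy as np

def e_step(y, mu, sigma2, P, xi_init):
    """Filter and smoother pass: smoothed and pairwise regime probabilities, log-likelihood."""
    T, M = len(y), len(mu)
    eta = np.exp(-0.5 * (y[:, None] - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    filt = np.zeros((T, M))
    pred = np.zeros((T + 1, M))
    pred[0] = xi_init
    loglik = 0.0
    for t in range(T):
        joint = eta[t] * pred[t]
        loglik += np.log(joint.sum())
        filt[t] = joint / joint.sum()
        pred[t + 1] = P.T @ filt[t]
    smooth = np.zeros((T, M))
    smooth[-1] = filt[-1]
    pair = np.zeros((T - 1, M, M))    # pair[t, i, j] = Pr(s_t = i, s_{t+1} = j | Y_T)
    for t in range(T - 2, -1, -1):
        ratio = smooth[t + 1] / pred[t + 1]
        pair[t] = P * np.outer(filt[t], ratio)
        smooth[t] = pair[t].sum(axis=1)
    return smooth, pair, loglik

def m_step(y, smooth, pair):
    """Solve the FOCs with the regime probabilities replaced by their smoothed values."""
    mu = (smooth * y[:, None]).sum(axis=0) / smooth.sum(axis=0)
    sigma2 = (smooth * (y[:, None] - mu) ** 2).sum() / len(y)
    P = pair.sum(axis=0) / pair.sum(axis=(0, 2))[:, None]   # smoothed transition frequencies
    return mu, sigma2, P

# Simulated data from a hypothetical two-regime process.
rng = np.random.default_rng(1)
P_true = np.array([[0.95, 0.05], [0.10, 0.90]])
s = np.zeros(300, dtype=int)
for t in range(1, 300):
    s[t] = rng.choice(2, p=P_true[s[t - 1]])
y = np.where(s == 0, 1.0, -1.0) + 0.5 * rng.normal(size=300)

mu, sigma2 = np.array([0.5, -0.5]), 1.0
P = np.array([[0.8, 0.2], [0.2, 0.8]])
xi_init = np.array([0.5, 0.5])
logliks = []
for _ in range(50):
    smooth, pair, loglik = e_step(y, mu, sigma2, P, xi_init)
    logliks.append(loglik)
    mu, sigma2, P = m_step(y, smooth, pair)
```

The list `logliks` records the log-likelihood at the start of each iteration, so the monotonicity property discussed above can be inspected directly.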
Determination of the number of regimes in MS-VAR models
Testing for the number of regimes in an MS-VAR model is a difficult enterprise:
conventional testing approaches are not applicable due to the presence of unidentified
nuisance parameters under the null of linearity.

null hypothesis          nuisance parameters
µ1 = µ2                  p12, p21
p12 = 0 (s0 = 1)         µ2
The presence of the nuisance parameters gives the likelihood surface sufficient freedom so that
one cannot reject the possibility that the apparently significant parameters could simply be due
to sampling variation. The scores associated with parameters of interest under the alternative
may be identically zero under the null.
Davies (1977, 1987) derived an upper bound for the significance level of the likelihood ratio
test statistic under nuisance parameters.
Formal tests of the Markov-switching model against the linear alternative, employing a
standardized likelihood ratio test designed to deliver (asymptotically) valid inference, have
been proposed by Hansen (1992, 1996a) and Garcia (1998), but they are computationally demanding.
The results of Ang and Bekaert (1998) indicate that critical values of the $\chi^2(r + n)$
distribution can be used approximately, where $r$ is the number of restricted parameters and
$n$ is the number of nuisance parameters.
Alternatives
• Information criteria:
\[
\begin{aligned}
AIC &= -2 \log L / T + 2n / T, \\
SC &= -2 \log L / T + n \log(T) / T, \\
HQ &= -2 \log L / T + 2n \log(\log(T)) / T,
\end{aligned}
\]
where $L$ is the maximized likelihood, $n$ is the number of parameters and $T$ is the sample
size: see Akaike (1985), Schwarz (1978), and Hannan and Quinn (1979).
• Check model congruency: specification and misspecification testing!
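The three criteria are simple to compute once the maximized log-likelihood is known. The short Python sketch below evaluates them as defined above; the function name and the log-likelihood values, parameter counts and sample size in the example are made up purely for illustration.

```python
import numpy as np

def information_criteria(loglik, n_params, T):
    """AIC, SC and HQ in the per-observation form used above."""
    aic = -2 * loglik / T + 2 * n_params / T
    sc = -2 * loglik / T + n_params * np.log(T) / T
    hq = -2 * loglik / T + 2 * n_params * np.log(np.log(T)) / T
    return {"AIC": aic, "SC": sc, "HQ": hq}

# Hypothetical comparison of a linear model and a two-regime model.
linear = information_criteria(loglik=-250.0, n_params=6, T=150)
switching = information_criteria(loglik=-240.0, n_params=10, T=150)
preferred = {name: ("switching" if switching[name] < linear[name] else "linear")
             for name in linear}
```

The model with the smaller value of a criterion is preferred; because the penalties differ, the three criteria need not agree.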
3 Prediction and structural analysis with regime-switching models
Forecasting and structural analysis with regime-switching models is considerably more in-
volved than with linear ones. Various techniques have been proposed to overcome these prob-
lems (see, inter alia, Granger and Teräsvirta, 1993). Though the main problems are common
to all non-linear models, we will focus on the MS-VAR approach in the following.
3.1 Predictions of linear and nonlinear stochastic processes
For the mean square prediction error (MSPE) criterion,
\[
\min_{\hat{y}} E\left[ (y_{t+h} - \hat{y})^2 \,\middle|\, \Omega_t \right],
\]
the optimal predictor of $y_{t+h}$ is given by the conditional expectation for the given
information set $\Omega_t$:
\[
\hat{y}_{t+h|t} = E[y_{t+h} | \Omega_t],
\]
where $\Omega_t$ is the available information set, i.e. the past of the stochastic process up
to time $t$, $\Omega_t = Y_t$. The prediction error associated with the optimal predictor
$\hat{y}_{t+h|t}$ is given by
\[
\hat{e}_{t+h|t} = y_{t+h} - E[y_{t+h} | Y_t].
\]
3.1.1 Linear AR(1) model
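\[
y_t = \alpha y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{IID}(0, \sigma^2).
\]
One-step prediction
\[
\hat{y}_{t+1|t} = E[\alpha y_t + \varepsilon_{t+1} | \Omega_t] = \alpha y_t.
\]
Multi-step prediction
\[
\hat{y}_{t+h|t} = E[\alpha y_{t+h-1} + \varepsilon_{t+h} | \Omega_t] = \alpha \hat{y}_{t+h-1|t} = \alpha^h y_t = F^h(y_t, \alpha).
\]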
3.1.2 Nonlinear AR(1) model
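\[
y_t = F(y_{t-1}; \theta) + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{IID}(0, \sigma^2),
\]
where $F(y_{t-1}; \theta)$ is some nonlinear function.
One-step prediction
\[
\hat{y}_{t+1|t} = E[F(y_t; \theta) + \varepsilon_{t+1} | \Omega_t] = F(y_t; \theta).
\]
Multi-step prediction, say $h = 2$:
\[
\begin{aligned}
\hat{y}_{t+2|t} &= E[F(y_{t+1}; \theta) + \varepsilon_{t+2} | \Omega_t] \\
 &= E[F(y_{t+1}; \theta) | \Omega_t] \\
 &\neq F(E[y_{t+1} | \Omega_t]; \theta) = F(\hat{y}_{t+1|t}; \theta).
\end{aligned}
\]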
3.1.3 Methods of calculating multi-step forecasts in nonlinear models
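(1) ‘Naive’ approach
\[
\hat{y}^{(n)}_{t+2|t} = F(\hat{y}_{t+1|t}; \theta)
\]
→ biased.
(2) ‘Exact’ approach (closed form forecast)
\[
\begin{aligned}
\hat{y}^{(e)}_{t+2|t} &= \int_{-\infty}^{+\infty} F(F(y_t; \theta) + \varepsilon_{t+1}; \theta)\, f(\varepsilon_{t+1})\, d\varepsilon_{t+1} \\
 &= \int_{-\infty}^{+\infty} F(y_{t+1}; \theta)\, g(y_{t+1} | \Omega_t)\, dy_{t+1} \\
 &= \int_{-\infty}^{+\infty} E[y_{t+2} | y_{t+1}]\, g(y_{t+1} | \Omega_t)\, dy_{t+1},
\end{aligned}
\]
where $f(\varepsilon_{t+1})$ is the pdf of $\varepsilon_{t+1}$ and
$g(y_{t+1} | \Omega_t) = f(y_{t+1} - F(y_t; \theta))$ is the pdf of $y_{t+1}$ conditional on $\Omega_t$.
→ approximation by numerical integration; time-consuming for $h > 2$
→ normal forecast error method: assumes normality of $g(y_{t+h-1} | \Omega_t)$.
(3) ‘Monte-Carlo’ method
\[
\hat{y}^{(mc)}_{t+2|t} = \frac{1}{N} \sum_{i=1}^{N} F(F(y_t; \theta) + \varepsilon_i; \theta),
\]
where $N$ is large and $\varepsilon_i$ is drawn from the presumed distribution of $\varepsilon_t$.
→ approximation of $g(y_{t+h-1} | \Omega_t)$ by simulation
(4) ‘Bootstrap’ method
\[
\hat{y}^{(bs)}_{t+2|t} = \frac{1}{T} \sum_{i=1}^{T} F(F(y_t; \theta) + \hat{\varepsilon}_i; \theta),
\]
where the residuals $\hat{\varepsilon}_i$ from the estimated model are used.
→ distribution-free
(5) ‘Direct’ approach (multi-step estimation)
\[
y_t = G(y_{t-2}; \tau) + \varepsilon^*_t \quad \Longrightarrow \quad \hat{y}_{t+2} = G(y_t; \tau)
\]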
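The difference between the naive, Monte Carlo and bootstrap two-step forecasts can be seen in a small Python sketch. The quadratic conditional mean `F`, its parameter values and the pseudo-residuals standing in for estimated residuals are assumptions chosen for the illustration because they admit a closed-form benchmark: with $F(y) = \theta_0 y + \theta_1 y^2$ and Gaussian errors, the exact forecast exceeds the naive one by $\theta_1 \sigma^2$.

```python
import numpy as np

def F(y, theta):
    # Hypothetical nonlinear conditional mean: theta0 * y + theta1 * y^2
    return theta[0] * y + theta[1] * y ** 2

theta = (0.5, 0.2)
sigma = 1.0
y_t = 1.0
rng = np.random.default_rng(0)

one_step = F(y_t, theta)                      # optimal one-step forecast
naive = F(one_step, theta)                    # (1) plug the forecast back into F

eps = rng.normal(scale=sigma, size=200_000)   # (3) draws from the presumed error distribution
monte_carlo = F(one_step + eps, theta).mean()

# (4) average over residuals; centred pseudo-residuals stand in for estimated ones here
residuals = rng.normal(scale=sigma, size=200)
residuals -= residuals.mean()
bootstrap = F(one_step + residuals, theta).mean()

# (2) closed form for this particular F: E[F(m + eps)] = F(m) + theta1 * sigma^2
exact = naive + theta[1] * sigma ** 2
```

In this example the naive forecast is biased downward because `F` is convex, while the simulation-based forecasts approximate the exact value.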
3.2 Forecasting performance of non-linear / regime-switching models
3.2.1 Empirical Findings
• Superior in-sample fit does not imply superior forecasts when compared to linear models
– Clements and Krolzig (1998)
– Dacco and Satchell (1999)
• Dependence on the regime in which the forecast was made
– Pesaran and Potter (1997)
– Clements and Smith (1999)
3.2.2 Illustrative Example: Hamilton’s model of the US business cycle
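\[
\Delta y_t - \mu(s_t) = \sum_{k=1}^{4} \alpha_k \left( \Delta y_{t-k} - \mu(s_{t-k}) \right) + u_t,
\]
States of the business cycle:
\[
\mu(s_t) = \begin{cases} \mu_1 > 0 & \text{if } s_t = 1 \text{ (‘expansion’)} \\ \mu_2 < 0 & \text{if } s_t = 2 \text{ (‘contraction’)} \end{cases}
\]
Transition probabilities:
\[
\begin{aligned}
p_{12} &= \Pr(\text{contraction in } t \mid \text{expansion in } t-1), \\
p_{21} &= \Pr(\text{expansion in } t \mid \text{contraction in } t-1).
\end{aligned}
\]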
Forecast comparison
• Monte Carlo study
– Generate data from the empirical MSM(2)-AR(4) model
– Estimate MS-AR, AR and SETAR models
– Compare their forecasts for different metrics.
• Empirical forecast accuracy comparison (1980-84, 1985-96, 1992-96)
Figure 7 Monte Carlo comparison of the models on RMSE.
[Plot not reproduced. Vertical axis: RMSE; horizontal axis: forecast horizon (1 to 8);
legend entries: DGP, AR, MS-AR, MS2-AR4, SETAR.]
Figure 8 Monte Carlo. Forecast Errors when the DGP is the MSM(2)-AR(4).
[Plot not reproduced. Series: RMSE, MAE and the quantiles Q.50, Q.90 and Q.95;
horizontal axis: forecast horizon (1 to 12).]
Figure 9 Monte Carlo. Forecast Error Density.
[Plot not reproduced. Four rows of panels, each with 1-step, 2-step and 12-step forecast error
densities and a normal reference density N(s = ·):
Predicting the MSM(2)-AR(4) Process: N(s=0.976), N(s=1.02), N(s=1.06)
Forecasting the MSM(2)-AR(4) Process with MSM(2)-AR(p) Models: N(s=1.01), N(s=1.05), N(s=1.07)
Forecasting the MSM(2)-AR(4) Process with AR(p) Models: N(s=1.03), N(s=1.07), N(s=1.07)
Forecasting the MSM(2)-AR(4) Process with SETAR Models: N(s=1.06), N(s=1.1), N(s=1.08)]
Figure 10 Empirical Forecasting Performance of the Hamilton Model.
[Plot not reproduced. Four panels: 1980-84 (ex-post), 1980-84, 1985-96, 1992-96. Each panel
shows RMSPE, MAPE and Q.95 for the MS-AR, AR and SETAR models against the forecast horizon
(1 to 8).]
3.3 Predicting Markov-switching VARs
The following discussion is based on Krolzig (2000).
3.3.1 Econometric theory of predicting multiple time series subject to shifts in regime
• Optimal predictor of Markov-switching time series models

• Factors resulting in deviations from linear forecasting rule
– Significance of regime shifts
– Persistence of the regime generating process
– Asymmetry of the regime generating process
– Interaction with the autoregressive dynamics
• Concepts
– Predictability of the regime-generating process
– Granger-causality of the regimes for the observed variables
3.3.2 Prediction in Markov-switching regression models
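I. Switching regression model
\[
y_t = \left\{ \begin{array}{lll}
X_t \beta_1 + u_t, & u_t | X_t, s_t \sim \mathrm{NID}(0, \Sigma_1) & \text{if } s_t = 1 \\
\quad \vdots & & \\
X_t \beta_M + u_t, & u_t | X_t, s_t \sim \mathrm{NID}(0, \Sigma_M) & \text{if } s_t = M
\end{array} \right.
\]
Thus: $p(y_t | x_t, s_t = m)$ is Gaussian with expectation $X_t \beta_m$ and variance $\Sigma_m$.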
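II. VAR(1) Representation of the hidden Markov Chain
\[
\xi_t = F \xi_{t-1} + v_t, \quad v_t \sim \mathrm{MDS}, \qquad
\xi_t = \left[ I(s_t = 1) \; \cdots \; I(s_t = M) \right]'
\]
Unrestricted $(M-1)$-dimensional AR(1) representation using $\sum_{m=1}^{M} \xi_{mt} = 1$:
\[
\zeta_t = F \zeta_{t-1} + v_t, \quad v_t \sim \mathrm{MDS}, \qquad
\text{with } \zeta_{mt} = \xi_{mt} - \bar{\xi}_m \text{ for } m = 1, \ldots, M-1.
\]
Example: Two-state Markov chain
\[
\zeta_t = \rho \zeta_{t-1} + v_t, \qquad \rho = p_{11} + p_{22} - 1
\]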
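For the two-state example, a short Python sketch shows that predicting with the full transition matrix and predicting with the scalar AR(1) in $\zeta_t$ give the same answer. The transition probabilities are hypothetical; `P` stores them row-wise so that `F = P.T`, and `xi_bar` denotes the ergodic regime probabilities.

```python
import numpy as np

p11, p22 = 0.9, 0.75                        # hypothetical transition probabilities
P = np.array([[p11, 1 - p11], [1 - p22, p22]])
F = P.T                                     # xi_t = F xi_{t-1} + v_t
rho = p11 + p22 - 1

# Ergodic probabilities of the two-state chain
xi_bar = np.array([1 - p22, 1 - p11]) / (2 - p11 - p22)

xi_t = np.array([1.0, 0.0])                 # regime 1 known with certainty at time t
zeta_t = xi_t[0] - xi_bar[0]

horizons = range(1, 9)
via_matrix = [(np.linalg.matrix_power(F, h) @ xi_t)[0] - xi_bar[0] for h in horizons]
via_ar1 = [rho ** h * zeta_t for h in horizons]
eigenvalues = np.sort(np.linalg.eigvals(F).real)
```

The eigenvalues of `F` are 1 and `rho`, so `rho` measures how quickly the predicted regime probabilities revert to the ergodic distribution.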
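Prediction Density
Mixture of normals weighted with the predicted regime probabilities $\Pr(s_{t+h} = j | \Omega_t)$:
\[
\begin{aligned}
p(y_{t+h} | \Omega_t) &= \sum_{j=1}^{M} \Pr(s_{t+h} = j | \Omega_t)\, p(y_{t+h} | x_{t+h}, s_{t+h} = j) \\
 &= \sum_{j=1}^{M} \left\{ \sum_{i=1}^{M} \Pr(s_{t+h} = j | s_t = i) \Pr(s_t = i | \Omega_t) \right\} p(y_{t+h} | x_{t+h}, s_{t+h} = j),
\end{aligned}
\]
where the filtered regime probabilities $\Pr(s_t = j | \Omega_t)$ are given by Bayes' rule:
\[
\Pr(s_t = j | \Omega_t) = \frac{p(y_t | x_t, s_t = j) \Pr(s_t = j | \Omega_{t-1})}{\sum_{i=1}^{M} p(y_t | x_t, s_t = i) \Pr(s_t = i | \Omega_{t-1})}.
\]
One-step prediction density:
\[
p(y_{t+1} | \Omega_t) = \sum_{j=1}^{M} \left\{ \sum_{i=1}^{M} p_{ij} \Pr(s_t = i | \Omega_t) \right\} p(y_{t+1} | x_{t+1}, s_{t+1} = j)
\]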
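MSPE-Optimal Predictor
Weighted average of the predictors of the $M$ regimes,
\[
\hat{y}_{t+1|t} = E[y_{t+1} | \Omega_t] = \sum_{j=1}^{M} \Pr(s_{t+1} = j | \Omega_t)\, E[y_{t+1} | x_{t+1}, s_{t+1} = j],
\]
where $E[y_{t+1} | x_{t+1}, s_{t+1} = j] = X_{t+1} \beta_j$:
\[
\hat{y}_{t+1|t} = E[y_{t+1} | \Omega_t] = \sum_{j=1}^{M} X_{t+1} \beta_j \Pr(s_{t+1} = j | \Omega_t) = X_{t+1} \hat{\beta}_{t+1|t},
\]
\[
\hat{\beta}_{t+1|t} = \sum_{j=1}^{M} \beta_j \left( \sum_{i=1}^{M} p_{ij} \Pr(s_t = i | \Omega_t) \right).
\]
Multi-step predictions:
\[
\hat{y}_{t+h|t} = X_{t+h} [\beta_1, \cdots, \beta_M]\, \hat{\xi}_{t+h|t} = X_{t+h} [\beta_1, \cdots, \beta_M]\, F^h \hat{\xi}_{t|t}
\]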
When can we expect this predictor to outperform a linear forecasting rule?
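The multi-step predictor of the Markov-switching regression model can be sketched numerically. The Python function below assumes a univariate dependent variable, known future regressors and known filtered probabilities; the function name and every number in the example are hypothetical.

```python
import numpy as np

def ms_regression_forecast(x_future, betas, P, xi_filtered, h):
    """h-step predictor X_{t+h} [beta_1, ..., beta_M] F^h xi_{t|t}.

    x_future    : (R,) regressors at t+h (assumed known)
    betas       : list of M coefficient vectors, each of length R
    P           : (M, M) array, P[i, j] = Pr(regime j at t+1 | regime i at t)
    xi_filtered : (M,) filtered regime probabilities at time t
    """
    F = P.T
    xi_pred = np.linalg.matrix_power(F, h) @ xi_filtered   # predicted regime probabilities
    B = np.column_stack(betas)                              # R x M matrix [beta_1, ..., beta_M]
    return x_future @ B @ xi_pred, xi_pred

betas = [np.array([1.0, 0.5]), np.array([-1.0, 0.2])]
P = np.array([[0.9, 0.1], [0.3, 0.7]])
xi_filtered = np.array([0.8, 0.2])
x_future = np.array([1.0, 2.0])

forecast_1, xi_1 = ms_regression_forecast(x_future, betas, P, xi_filtered, h=1)
forecast_50, xi_50 = ms_regression_forecast(x_future, betas, P, xi_filtered, h=50)
```

At long horizons the predicted regime probabilities approach the ergodic distribution, so the forecast approaches the one obtained with the time-invariant average coefficient vector.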
3.3.3 Predictability and Granger-Causality
Unpredictability: The regime generating process {st } is said to be unpredictable iff the re-
gimes are serially independent:
Pr(st+1 |st ) = Pr(st+1 ).
If the regime generating process {st } is unpredictable, then the detection of recent regime
shifts has no predictive value for future regimes
Pr(st+1 |st ) = Pr(st+1 ) ⇒ Pr(st+h |Ωt−1 ) = Pr(st+h ).
Granger causality:
{st } is said to be non-causal for {yt } in a strict sense iff
p(yt+1 |Ωt ; λ) = p(yt+1 |Ωt , st ; λ).
{st } is said to be non-causal for {yt } in a weak sense iff
E[yt+1 |Ωt ; λ] = E[yt+1 |Ωt , st ; λ].
Result for MS-Regression Models
Unpredictability of regimes implies non-causality.
The regime $\{s_t\}$ is Granger non-causal for the observed time series vector $\{y_t\}$ (in a
strict sense) iff the regime is unpredictable:
\[
\hat{\xi}_{t+h|t} = \bar{\xi} \;\Rightarrow\; \hat{y}_{t+h|t} = X_{t+h} \bar{\beta}
\]
Observational equivalence to a time-invariant linear model with heteroscedastic non-Gaussian
errors:
\[
y_t = X_t \bar{\beta} + w_t, \qquad f(w_t) = \sum_{m=1}^{M} \bar{\xi}_m f_u\left( w_t - X_t (\beta_m - \bar{\beta}) \right)
\]
3.3.4 Prediction of MS time series processes
VAR(1) process with shifts in the intercept.
\[
y_t - \mu_y = M \zeta_t + A (y_{t-1} - \mu_y) + u_t
\]
Optimal h-step prediction
\[
\hat{y}_{t+h|t} - \mu_y = K_h \hat{\zeta}_{t|t} + A^h (y_t - \mu_y), \qquad
K_h = \sum_{i=1}^{h} A^{h-i} M F^i
\]
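Example: MSI(2)-AR(1) with $\zeta_t = \rho \zeta_{t-1} + v_t$
\[
\hat{y}_{t+h|t} - \mu_y = \alpha^h (y_t - \mu_y) + K_h(\nu_1, \nu_2, \alpha, \rho)\, \hat{\zeta}_{t|t},
\]
\[
\text{with } K_h(\nu_1, \nu_2, \alpha, \rho) = (\nu_1 - \nu_2) \left( \sum_{i=1}^{h} \alpha^{h-i} \rho^i \right).
\]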
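The closed form for the MSI(2)-AR(1) predictor can be checked against the recursion implied by the model, $\hat{y}_{t+h|t} - \mu_y = (\nu_1 - \nu_2) \rho^h \hat{\zeta}_{t|t} + \alpha (\hat{y}_{t+h-1|t} - \mu_y)$. The Python sketch below does this for hypothetical parameter values; the function names are illustrative only.

```python
import numpy as np

def intercept_correction(h, nu1, nu2, alpha, rho):
    """K_h = (nu1 - nu2) * sum_{i=1}^{h} alpha^(h-i) * rho^i."""
    return (nu1 - nu2) * sum(alpha ** (h - i) * rho ** i for i in range(1, h + 1))

def predict(h, y_t, mu_y, zeta_filtered, nu1, nu2, alpha, rho):
    """Closed-form h-step predictor of the MSI(2)-AR(1) example."""
    k_h = intercept_correction(h, nu1, nu2, alpha, rho)
    return mu_y + alpha ** h * (y_t - mu_y) + k_h * zeta_filtered

# Hypothetical parameter values and state of the process at time t
nu1, nu2, alpha, rho = 1.0, -0.5, 0.5, 0.7
mu_y, y_t, zeta_filtered = 0.8, 1.5, 0.4

closed_form = [predict(h, y_t, mu_y, zeta_filtered, nu1, nu2, alpha, rho) for h in range(1, 11)]

# Recursive forecasts: predicted regime effect plus autoregressive transmission
recursive, dev = [], y_t - mu_y
for h in range(1, 11):
    dev = (nu1 - nu2) * rho ** h * zeta_filtered + alpha * dev
    recursive.append(mu_y + dev)

k_far = intercept_correction(50, nu1, nu2, alpha, rho)
```

With both `alpha` and `rho` inside the unit interval, the correction term dies out at long horizons and the forecast converges to the unconditional mean.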
Optimal predictor
• Dynamic intercept correction $K_h \hat{\zeta}_{t|t}$, which depends on the persistence of the
regimes; $K_h \to 0$ as $h \to \infty$.
• The predictor $\hat{y}_{t+h|t}$ is linear in $\hat{\zeta}_{t|t}$ and the last $p$ observations of $Y_t$.
• But $\hat{y}_{t+h|t}$ is a non-linear function of the observed $Y_t$, as the regime inference
$\hat{\zeta}_{t|t}$ is non-linear in $Y_t$.
Result: Unpredictability of regimes implies strict non-causality.
The MS-AR is then observationally equivalent to a linear AR model with heteroscedastic,
non-Gaussian errors (mixture of normals):
\[
\hat{y}_{t+h|t} - \mu_y = A^h (y_t - \mu_y)
\]
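Markovian Shifts in the Mean of an AR Process
Example: AR(1) process with shifts in the mean $\mu$:
\[
y_t - \mu(s_t) = \alpha \left( y_{t-1} - \mu(s_{t-1}) \right) + u_t
\]
Sum of two independent processes:
\[
\begin{aligned}
y_t - \mu_y &= (\mu_1 - \mu_2) \zeta_t + z_t, \\
z_t &= \alpha z_{t-1} + u_t, \quad u_t \sim \mathrm{NID}(0, \sigma^2), \quad z_t = y_t - \mu_y - (\mu_1 - \mu_2) \zeta_t, \\
\zeta_t &= \rho \zeta_{t-1} + v_t, \quad v_t \sim \mathrm{MDS}, \quad \rho = p_{11} + p_{22} - 1.
\end{aligned}
\]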
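Optimal predictor
\[
\begin{aligned}
\hat{y}_{t+h|t} &= \mu_y + (\mu_1 - \mu_2) \hat{\zeta}_{t+h|t} + \hat{z}_{t+h|t} \\
 &= \mu_y + \alpha^h (y_t - \mu_y) + (\mu_1 - \mu_2) \left[ \rho^h - \alpha^h \right] \hat{\zeta}_{t|t}.
\end{aligned}
\]
Result: Unpredictability of regimes does not imply non-causality!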
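General case of an MS-AR
Consider the model
\[
y_t = A(\xi_t) y_{t-1} + u_t, \qquad \xi_t = F \xi_{t-1} + \nu_t.
\]
It follows that
\[
\begin{pmatrix} \xi_{1t} y_t \\ \vdots \\ \xi_{Mt} y_t \end{pmatrix}
=
\begin{pmatrix} p_{11} A_1 & \cdots & p_{M1} A_1 \\ \vdots & & \vdots \\ p_{1M} A_M & \cdots & p_{MM} A_M \end{pmatrix}
\begin{pmatrix} \xi_{1t-1} y_{t-1} \\ \vdots \\ \xi_{Mt-1} y_{t-1} \end{pmatrix}
+ \varepsilon_t,
\]
\[
\eta_t = \Pi \eta_{t-1} + \varepsilon_t,
\]
where $\eta_t = (\xi_t \otimes y_t)$ and $\varepsilon_t$ is an MDS, such that
\[
E[\eta_{t+h} | \eta_t] = \Pi^h \eta_t.
\]
As
\[
y_t = \sum_{i=1}^{M} \xi_{it} y_t,
\]
we have that
\[
\begin{aligned}
E[y_{t+h} | \Omega_t] &= \sum_{i=1}^{M} E[\xi_{it+h} y_{t+h} | \Omega_t] \\
 &= (1_M' \otimes I_K)\, E[\eta_{t+h} | \Omega_t] \\
 &= (1_M' \otimes I_K)\, \Pi^h E[\eta_t | \Omega_t] \\
 &= (1_M' \otimes I_K)\, \Pi^h \left( E[\xi_t | \Omega_t] \otimes y_t \right).
\end{aligned}
\]
Thus
\[
\hat{y}_{t+h|t} = (1_M' \otimes I_K)\, \Pi^h \left( \hat{\xi}_{t|t} \otimes y_t \right).
\]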
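The predictor for the general case can be sketched in Python by building $\Pi$ block by block and comparing the result with a brute-force sum over all regime paths. The function names, the two-regime bivariate example and all parameter values are hypothetical; `P[i, j]` denotes $p_{ij}$, so block $(j, i)$ of $\Pi$ is $p_{ij} A_j$.

```python
import numpy as np

def build_pi(A_list, P):
    """Stack the blocks p_ij * A_j into the (MK x MK) matrix Pi."""
    M, K = len(A_list), A_list[0].shape[0]
    Pi = np.zeros((M * K, M * K))
    for j in range(M):          # regime at time t (block row)
        for i in range(M):      # regime at time t-1 (block column)
            Pi[j * K:(j + 1) * K, i * K:(i + 1) * K] = P[i, j] * A_list[j]
    return Pi

def predict(A_list, P, xi_filtered, y_t, h):
    """h-step predictor (1_M' kron I_K) Pi^h (xi_{t|t} kron y_t)."""
    M, K = len(A_list), len(y_t)
    eta = np.kron(xi_filtered, y_t)
    selector = np.kron(np.ones((1, M)), np.eye(K))
    return selector @ np.linalg.matrix_power(build_pi(A_list, P), h) @ eta

# Hypothetical two-regime, bivariate example
A_list = [np.array([[0.5, 0.1], [0.0, 0.3]]), np.array([[-0.2, 0.0], [0.4, 0.6]])]
P = np.array([[0.9, 0.1], [0.3, 0.7]])
xi = np.array([0.7, 0.3])
y_t = np.array([1.0, -2.0])

two_step = predict(A_list, P, xi, y_t, h=2)

# Brute force: sum over all regime paths (s_t, s_{t+1}, s_{t+2})
brute = sum(xi[i] * P[i, j] * P[j, k] * (A_list[k] @ A_list[j] @ y_t)
            for i in range(2) for j in range(2) for k in range(2))
```

The brute-force sum grows exponentially with the horizon, whereas the matrix power of $\Pi$ keeps the computation linear in $h$.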
3.3.5 Conclusions
(i) Detecting recent regime shifts is essential to predict MS-AR processes.

(ii) The predictability of regimes and their Granger causality for the observed time series
are critical for the predictive value of detected regime shifts.
(iii) The optimal predictor differs from a linear prediction rule by including
a dynamic intercept correction.
(iv) MS-AR processes have short memory. The longer the forecast horizon, the better the
linear approximation of the optimal predictor.
(v) Forecastability requires the structural stability of the MS-AR.
3.4 Impulse-response analysis
3.4.1 Traditional and generalized impulse-response analysis
Measure of the response of $y_{t+h}$ to a shock or impulse $\delta$ at time $t$, given a history $\omega_{t-1}$.
Traditional impulse response function:
\[
\begin{aligned}
\mathrm{TIRF}(h, \delta, \omega_{t-1}) = {} & E[y_{t+h} | \varepsilon_t = \delta, \varepsilon_{t+1} = \cdots = \varepsilon_{t+h} = 0, \omega_{t-1}] \\
 & - E[y_{t+h} | \varepsilon_t = 0, \varepsilon_{t+1} = \cdots = \varepsilon_{t+h} = 0, \omega_{t-1}].
\end{aligned}
\]
Linear models: TIRF is symmetric, linear and history independent.
Nonlinear models: TIRF depends on the sign and size of the shock, as well as the history of
the process; assumption of no intermediate shocks is problematic.
Generalized impulse response function (introduced by Koop, Pesaran and Potter, 1996):
\[
\mathrm{GIRF}(h, \delta, \omega_{t-1}) = E[y_{t+h} | \varepsilon_t = \delta, \omega_{t-1}] - E[y_{t+h} | \omega_{t-1}].
\]
For linear models: GIRF = TIRF.
The GIRF can be interpreted as the realisation of the random variable
\[
\mathrm{GIRF}(h, \varepsilon_t, \Omega_{t-1}) = E[y_{t+h} | \varepsilon_t, \Omega_{t-1}] - E[y_{t+h} | \Omega_{t-1}].
\]
Thus various conditional versions of the type GIRF(h, A, B) can be defined by fixing the shock
or the history.
Calculation by Monte Carlo simulation.
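A Monte Carlo calculation of the GIRF can be sketched for a univariate nonlinear AR(1). The Python code below is an illustration under stated assumptions: Gaussian errors, a history summarized by $y_{t-1}$, and two hypothetical conditional mean functions (a linear one as a check and a threshold one). The same future shocks are used for the shocked and the baseline paths to reduce simulation noise.

```python
import numpy as np

def girf(F, theta, history, delta, h, sigma=1.0, n_draws=20_000, seed=0):
    """Monte Carlo GIRF(h, delta, history) for y_t = F(y_{t-1}; theta) + eps_t."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(scale=sigma, size=(n_draws, h + 1))   # eps_t, ..., eps_{t+h}

    def expected_path_end(first_shock):
        y = np.full(n_draws, float(history))                  # y_{t-1}
        for step in range(h + 1):
            eps = first_shock if step == 0 else shocks[:, step]
            y = F(y, theta) + eps
        return y.mean()                                       # estimate of E[y_{t+h} | ...]

    shocked = expected_path_end(np.full(n_draws, float(delta)))
    baseline = expected_path_end(shocks[:, 0])                # eps_t left random
    return shocked - baseline

def linear(y, a):
    return a * y

def threshold(y, th):
    # Hypothetical threshold AR: slope th[0] above zero, th[1] below zero
    return np.where(y > 0, th[0] * y, th[1] * y)

linear_response = girf(linear, 0.8, history=0.5, delta=1.0, h=3)
up = girf(threshold, (0.9, 0.2), history=0.0, delta=1.0, h=1)
down = girf(threshold, (0.9, 0.2), history=0.0, delta=-1.0, h=1)
```

For the linear model the estimate is close to $\alpha^h \delta$, while for the threshold model the responses to positive and negative shocks are not mirror images of each other.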
3.4.2 Impulse responses in MS-ARs
• The response of $y_{t+h}$ to shocks arising from the Gaussian innovations to each of the
variables (corresponds to the impulse response analysis in linear Gaussian VARs).
\[
E[y_{t+h} | u_t = \delta, \omega_{t-1}] - E[y_{t+h} | u_t = 0, \omega_{t-1}]
\]
• The study of the path of the variables when there is a change in regime such as from
recession to growth, from recession to high growth, growth to recession or any other
combination between the existing regimes.
\[
E[y_{t+h} | s_t = j, \omega_{t-1}] - E[y_{t+h} | s_t = i, \omega_{t-1}]
\]
• The dynamics when there is a move in the information structure from the ergodic
distribution to certainty regarding the state.
\[
E[y_{t+h} | s_t = j, \omega_{t-1}] - E[y_{t+h} | \omega_{t-1}]
\]
Note: Responses are linear in $\delta$ and independent of $\omega_{t-1}$.
The state-space representation
The MS(M)-AR(p) representation is given by
\[
y_t = M \xi_t + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t,
\]
where $M = [\nu_1 : \cdots : \nu_M]$, $A_1 = I_K + \alpha \beta' + \Gamma_1$ and
$A_j = \Gamma_j - \Gamma_{j-1}$ for $1 < j \leq p$ with $\Gamma_p = 0_K$.
To derive the impulse-response functions, we use the stacked MS(M)-AR(1) representation
in $y_t = (y_t', \ldots, y_{t-p+1}')'$:
\[
y_t = H \xi_t + J A y_{t-1} + u_t,
\]
where
\[
A = \begin{pmatrix} A_1 & \ldots & A_{p-1} & A_p \\ I_K & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & & I_K & 0 \end{pmatrix}, \quad
H = \begin{pmatrix} M \\ 0 \\ \vdots \\ 0 \end{pmatrix} \quad \text{and} \quad
J = [I_K : 0 : \cdots : 0].
\]
The complete state-space representation involves the VAR(1) representation of the Markov
chain
\[
\xi_{t+1} = F \xi_t + v_t,
\]
where $\xi_t$ is the unobservable state vector and $v_t$ is a martingale difference sequence.
Hence the expectation of $y_{t+h}$ conditional upon $\{u_t, \xi_t, Y_{t-1}\}$ is given by:
\[
y_{t+h|t} = H \xi_{t+h|t} + J A y_{t+h-1|t},
\]
where the conditional expectation of $\xi_{t+h}$ is
\[
\xi_{t+h|t} = F^h \xi_t.
\]
The response to shocks arising from the Gaussian innovations to the variables
\[
\frac{\partial y_{t+h}}{\partial u_{jt}} = J A^h \iota_j \tag{14}
\]
If the variance-covariance matrix $\Sigma_u$ is regime-dependent, the standardized and
orthogonalized impulse-responses also become regime-dependent:
\[
\frac{\partial y_{t+h}}{\partial \varepsilon_{jt}} = J A^h D(\xi_t) \iota_j, \tag{15}
\]
where $u_t = D(\xi_t) \varepsilon_t$ and $D(\xi_t)$ is a lower triangular matrix resulting from
the Cholesky decomposition of $\Sigma_u(\xi_t) = D(\xi_t) D(\xi_t)'$.
Response to changes in regime such as from recession to growth
The effects of regime shifts can be measured as the reaction of $y_{t+h}$ to the information
that $s_t = j$ (considered as a shift from the unconditional distribution $\bar{\xi}$ or from
the $m$th regime):
\[
dy_{t+h} = J \sum_{k=0}^{h} A^k H F^{h-k} \left( \iota_j - \bar{\xi} \right) \tag{16}
\]
\[
dy_{t+h} = J \sum_{k=0}^{h} A^k H F^{h-k} \left( \iota_j - \iota_m \right). \tag{17}
\]
Dynamics are generated by:
• Changes of the current state and hence to the conditional expectation of future regimes.
• Autoregressive transmission of intercept shifts.
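The responses (16) and (17) can be illustrated in the simplest possible case, which is an assumption of this sketch and not the general setting above: a univariate MSI(2)-AR(1), so that $K = 1$, $p = 1$, $J = 1$, $A = \alpha$ and $H = M = [\nu_1 : \nu_2]$. The function name and the parameter values in the Python code are hypothetical.

```python
import numpy as np

def regime_shift_response(h, alpha, nu, P, start, baseline):
    """Response of y_{t+h} to learning the regime, for K = 1 and p = 1.

    Evaluates sum_{k=0}^{h} alpha^k * nu' F^(h-k) (start - baseline), i.e. (16) when
    baseline is the ergodic distribution and (17) when it is another unit vector.
    """
    F = P.T
    diff = start - baseline
    return sum(alpha ** k * (nu @ np.linalg.matrix_power(F, h - k) @ diff)
               for k in range(h + 1))

alpha = 0.6
nu = np.array([1.0, -0.5])                        # regime-dependent intercepts
p11, p22 = 0.9, 0.75
P = np.array([[p11, 1 - p11], [1 - p22, p22]])
xi_bar = np.array([1 - p22, 1 - p11]) / (2 - p11 - p22)
iota_1, iota_2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# (16): shift from the ergodic distribution to certainty about regime 1
from_ergodic = [regime_shift_response(h, alpha, nu, P, iota_1, xi_bar) for h in range(61)]
# (17): shift from regime 2 to regime 1
from_regime_2 = [regime_shift_response(h, alpha, nu, P, iota_1, iota_2) for h in range(61)]
```

Both sources of dynamics listed above are visible in the sum: the powers of `F` carry the change in expected future regimes, and the powers of `alpha` carry the autoregressive transmission of the resulting intercept shifts.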
References
Akaike, H. (1985). Prediction and entropy. In Atkinson, A. C., and Fienberg, S. E. (eds.), A Celebration
of Statistics, pp. 1–24. New York: Springer-Verlag.
Andrews, D. W. K. (1993). Tests for parameter instability and structural change point. Econometrica,
61, 821–856.
Andrews, D. W. K., and Ploberger, W. (1994). Optimal tests when a nuisance parameter is present only
under the alternative. Econometrica, 62, 1386–1414.
Ang, A., and Bekaert, G. (1998). Regime switches in interest rates. Research paper 1486, Stanford
University.
Banerjee, A., Lazarova, S., and Urga, G. (1998). Bootstrapping sequential tests for multiple structural
breaks. Discussion paper eco no. 98/24, European University Institute, Florence.
Bates, D. M., and Watts, D. G. (1988). Nonlinear regression and its application. New York: John
Wiley.
Baum, L. E., and Eagon, J. A. (1967). An inequality with applications to statistical estimation for prob-
abilistic functions of Markov chains and to a model for ecology. Bull. American Mathematical
Society, 73, 360–363.
Baum, L. E., and Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov
chains. Annals of Mathematical Statistics, 37, 1554–1563.
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in
the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical
Statistics, 41, 164–171.
Carrasco, M. (1994). The asymptotic distribution of the Wald statistic in misspecified structural change,
threshold or Markov switching models. Discussion Paper, GREMAQ.
Clements, M. P., and Krolzig, H. M. (1998). A comparison of the forecast performance of Markov-
switching and threshold autoregressive models of US GNP. Econometrics Journal, 1, C47–C75.
Clements, M. P., and Smith, J. (1999). A Monte Carlo study of the forecasting performance of empirical
SETAR models. Journal of Applied Econometrics, 14, 124–141.
Cosslett, S. R., and Lee, L.-F. (1985). Serial correlation in latent discrete variable models. Journal of
Econometrics, 27, 79–97.
Dacco, R., and Satchell, S. (1999). Why do regime-switching models forecast so badly?. Journal of
Forecasting, 18, 1–16.
Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternat-
ive. Biometrika, 64, 247–254.
Davies, R. B. (1987). Hypothesis testing when a nuisance parameter is present only under the alternat-
ive. Biometrika, 74, 33–43.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood estimation from incom-
plete data via the EM algorithm. Journal of the Royal Statistical Society, 39, Series B, 1–38.
Dijk, D. v. (1999). Extensions and outlier robust inference. Tinbergen institute research series 200,
Erasmus University, Rotterdam.
Eitrheim, Ø., and Teräsvirta, T. (1996). Testing the adequacy of smooth transition autoregressive mod-
els. Journal of Econometrics, 74, 59–76.
Escribano, A., and Jorda, O. (1999). Improved testing and specification of smooth transition
autoregressive models. In Nonlinear Time Series Analysis of Economic and Financial Data,
pp. 289–319. Boston: Kluwer Academic Press.
Garcia, R. (1998). Asymptotic null distribution of the likelihood ratio test in Markov switching models.
International Economic Review, 39.
Goldfeld, S. M., and Quandt, R. E. (1973). A Markov model for switching regressions. Journal of
Econometrics, 1, 3–16.
Granger, C. W. J., and Lee, T. H. (1989). Investigation of production, sales and inventory relation-
ships using multicointegration and non-symmetric error correction models. Journal of Applied
Econometrics, 4, S145–S159.
Granger, C. W. J., and Swanson, N. (1996). Further developments in the study of cointegrated variables.
Oxford Bulletin of Economics and Statistics, 58, 537–554.
Granger, C. W. J., and Teräsvirta, T. (1993). Modelling nonlinear economic relationships. Oxford:
Oxford University Press.
Hamilton, J. D. (1988). Rational-expectations econometric analysis of changes in regime. An invest-
igation of the term structure of interest rates. Journal of Economic Dynamics and Control, 12,
385–423.
Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the
business cycle. Econometrica, 57, 357–384.
Hamilton, J. D. (1990). Analysis of time series subject to changes in regime. Journal of Econometrics,
45, 39–70.
Hamilton, J. D. (1994a). State-space models. In Engle, R., and McFadden, D. (eds.), Handbook of
Econometrics, Vol. 4. Amsterdam: North–Holland.
Hamilton, J. D. (1994b). Time Series Analysis. Princeton: Princeton University Press.
Hannan, E. J., and Quinn, B. G. (1979). The determination of the order of an autoregression. Journal
of the Royal Statistical Society, B, 41, 190–195.
Hansen, B. E. (1992). The likelihood ratio test under non-standard conditions: Testing the Markov
switching model of GNP. Journal of Applied Econometrics, 7, S61–S82.
Hansen, B. E. (1996a). Erratum: the likelihood ratio test under non-standard conditions: Testing the
Markov switching model of GNP. Journal of Applied Econometrics, 11, 195–199.
Hansen, B. E. (1996b). Inference when a nuisance parameter is not identified under the null. Econo-
metrica, 64, 414–430.
Hendry, D. F. (1995). Dynamic Econometrics. Oxford: Oxford University Press.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions ASME
Journal of Basic Engineering, Series D, 82, 35–45.
Kalman, R. E. (1963). New methods in Wiener filtering theory. In Bogdanoff, J. L., and Kozin, F.
(eds.), Proceedings of the First Symposium of Engineering Applications of Random Function
Theory and Probability, pp. 270–388: New York: Wiley.
Kalman, R. E., and Bucy, R. S. (1961). New results in linear filtering and prediction theory. Transac-
tions ASME Journal of Basic Engineering, Series D, 83, 95–108.
Kim, C.-J. (1994). Dynamic linear models with Markov-switching. Journal of Econometrics, 60, 1–22.
Kitagawa, G. (1987). Non–gaussian state–space modeling of nonstationary time series. Journal of the
American Statistical Association, 82, 1032–1041.
Koop, G., Pesaran, M. H., and Potter, S. M. (1996). Impulse response analysis in nonlinear multivariate
models. Journal of Econometrics, 74, 119–147.
Krolzig, H.-M. (1997). Markov Switching Vector Autoregressions. Modelling, Statistical Inference and
Application to Business Cycle Analysis. Berlin: Springer.
Krolzig, H.-M. (2000). Predicting Markov-switching vector autoregressive processes. Economics
Discussion Paper 2000-W31, Nuffield College, Oxford.
Lin, C.-F., and Teräsvirta, T. (1994). Testing the constancy of regression parameters against continous
structural change. Journal of Econometrics, 62, 211–228.
Lindgren, G. (1978). Markov regime models for mixed distributions and switching regressions. Scand-
inavian Journal of Statistics, 5, 81–91.
Luukkonen, R., Saikkonen, P., and Teräsvirta, T. (1988). Testing linearity against smooth transition
autoregressive models. Biometrika, 75, 491–499.
Pesaran, M. H., and Potter, S. M. (1997). A floor and ceiling model of US Output. Journal of Economic
Dynamics and Control, 21, 661–695.
Petrie, T. (1969). Probabilistic functions of finite state Markov chains. Annals of Mathematical
Statistics, 40, 97–115.
Potter, S. M. (1993). A nonlinear approach to US GNP. Journal of Applied Econometrics, 10, 109–125.
Ruud, P. A. (1991). Extensions of estimation methods using the EM algorithm. Journal of
Econometrics, 49, 305–341.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Sims, C. A. (1980). Macroeconomics and reality. Econometrica, 48, 1–48.
Teräsvirta, T. (1994). Specification, estimation, and evaluation of smooth transition autoregressive
models. Journal of the American Statistical Association, 89, 208–218.
Teräsvirta, T. (1998). Modelling economic relationships with smooth transition regressions. In Ullah,
A., and Giles, D. (eds.), Handbook of Applied Economic Statistics, pp. 507–555. New York:
Marcel Dekker.
Teräsvirta, T., and Anderson, H. (1992). Modelling nonlinearities in business cycles using smooth
transition autoregressive models. Journal of Applied Econometrics, 7, S119–S136.
Tiao, G. C., and Tsay, R. S. (1994). Some advances in non-linear and adaptive modelling in time-series.
Journal of Forecasting, 13, 109–131.
Tsay, R. S. (1989). Testing and modeling threshold autoregressive processes. Journal of the American
Statistical Association, 84, 231–240.
Tsay, R. S. (1998). Testing and modeling multivariate threshold models. Journal of the American
Statistical Association, 93, 1188–1202.
Contents
1 Introduction
1.1 Linear time series models
1.2 Regime-switching models
1.2.1 Regime shifts
1.2.2 The Conditional Process
1.2.3 The Regime Generating Process
2 Types of regime-switching models
2.1 Structural change and switching regression models
2.1.1 Structural break models
2.1.2 Switching regression model
2.1.3 Maximum likelihood estimation under normality
2.2 Threshold models
2.2.1 The TAR model
2.2.2 The SETAR model
2.2.3 Maximum likelihood estimation under normality
2.3 Smooth transition autoregressive models
2.3.1 The STAR model
2.3.2 Maximum likelihood estimation
2.3.3 Model selection
2.4 Markov-switching vector autoregressions
2.4.1 The MS-VAR model
2.4.2 State-Space Representation
2.4.3 Related models
2.4.4 Regime inference
2.4.5 Maximum Likelihood estimation
3 Prediction and structural analysis with regime-switching models
3.1 Predictions of linear and nonlinear stochastic processes
3.1.1 Linear AR(1) model
3.1.2 Nonlinear AR(1) model
3.1.3 Methods of calculating multi-step forecasts in nonlinear models
3.2 Forecasting performance of non-linear / regime-switching models
3.2.1 Empirical Findings
3.2.2 Illustrative Example: Hamilton’s model of the US business cycle
3.3 Predicting Markov-switching VARs
3.3.1 Econometric theory of predicting multiple time series subject to shifts in regime
3.3.2 Prediction in Markov-switching regression models
3.3.3 Predictability and Granger-Causality
3.3.4 Prediction of MS time series processes
3.3.5 Conclusions
3.4 Impulse-response analysis
3.4.1 Traditional and generalized impulse-response analysis
3.4.2 Impulse responses in MS-ARs
References