Documente Academic
Documente Profesional
Documente Cultură
A Concise Course
Francis X. Diebold
University of Pennsylvania
Edition 2015
Version 2015.08.12
Francis X. Diebold
To Marc Nerlove,
who taught me time series,
xvii
xviii
Guide to e-Features
xix
Acknowledgments
xx
Preface
xxiv
Chapter 1.
Introduction
Chapter 2.
Chapter 3.
Nonparametrics
20
Chapter 4.
Spectral Analysis
29
Chapter 5.
Markovian Structure, Linear Gaussian State Space, and Optimal (Kalman) Filtering
52
Chapter 6.
84
Chapter 7.
96
Chapter 8.
119
Chapter 9.
129
139
146
Appendices
186
187
190
195
xvii
xviii
Guide to e-Features
xix
Acknowledgments
xx
Preface
xxiv
Chapter 1.
1.1
1.2
1.3
1.4
Chapter 2.
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
Introduction
The Environment
White Noise
The Wold Decomposition and the General Linear Process
Approximating the Wold Representation
2.4.1 The M A(q) Process
2.4.2 The AR(p) Process
2.4.3 The ARM A(p, q) Process
Wiener-Kolmogorov-Wold Extraction and Prediction
2.5.1 Extraction
2.5.2 Prediction
Multivariate
2.6.1 The Environment
2.6.2 The Multivariate General Linear Process
2.6.3 Vector Autoregressions
A Small Empirical Toolkit
2.7.1 Nonparametric: Sample Autocovariances
2.7.2 Parametric: ARM A Model Selection, Fitting and Diagnostics
Exercises, Problems and Complements
Notes
Chapter 3.
Nonparametrics
1
1
1
1
2
3
3
3
4
5
5
6
7
9
9
9
9
9
9
9
10
10
11
12
13
13
14
15
18
20
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
Density Estimation
3.1.1 The Basic Problem
3.1.2 Kernel Density Estimation
3.1.3 Bias-Variance Tradeoffs
3.1.4 Optimal Bandwidth Choice
Multivariate
Functional Estimation
Local Nonparametric Regression
3.4.1 Kernel Regression
3.4.2 Nearest-Neighbor Regression
Global Nonparametric Regression
3.5.1 Series (Sieve, Projection, ...)
3.5.2 Neural Networks
Time Series Aspects
Exercises, Problems and Complements
Notes
Chapter 4.
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
Chapter 5.
5.1
5.2
5.3
5.4
5.5
Spectral Analysis
Markovian Structure, Linear Gaussian State Space, and Optimal (Kalman) Filtering
Markovian Structure
5.1.1 The Homogeneous Discrete-State Discrete-Time Markov Process
5.1.2 Multi-Step Transitions: Chapman-Kolmogorov
5.1.3 Lots of Definitions (and a Key Theorem)
5.1.4 A Simple Two-State Example
5.1.5 Constructing Markov Processes with Useful Steady-State Distributions
5.1.6 Variations and Extensions: Regime-Switching and More
5.1.7 Continuous-State Markov Processes
State Space Representations
5.2.1 The Basic Framework
5.2.2 ARMA Models
5.2.3 Linear Regression with Time-Varying Parameters and More
5.2.4 Dynamic Factor Models and Cointegration
5.2.5 Unobserved-Components Models
The Kalman Filter and Smoother
5.3.1 Statement(s) of the Kalman Filter
5.3.2 Derivation of the Kalman Filter
5.3.3 Calculating P0
5.3.4 Predicting yt
5.3.5 Steady State and the Innovations Representation
5.3.6 Kalman Smoothing
Exercises, Problems and Complements
Notes
xiii
20
20
20
21
22
23
25
25
25
26
27
27
27
27
28
28
29
29
29
32
33
36
40
40
42
42
51
52
52
52
52
53
54
55
56
57
58
58
60
65
67
68
69
70
71
74
74
75
77
77
83
xiv
Chapter 6.
6.1
6.2
6.3
6.4
6.5
6.6
84
84
85
85
86
87
87
88
Chapter 7.
7.1
7.2
7.3
7.4
7.5
7.6
7.7
7.8
7.9
7.10
7.11
7.12
Chapter 8.
8.1
8.2
8.3
Bayesian Basics
Comparative Aspects of Bayesian and Frequentist Paradigms
Markov Chain Monte Carlo
8.3.1 Metropolis-Hastings Independence Chain
89
89
91
91
92
94
95
96
96
98
98
99
99
101
101
101
101
102
103
104
109
110
110
110
111
112
112
114
116
116
116
117
118
118
118
119
119
119
121
121
8.4
8.5
8.6
8.7
8.8
Chapter 9.
9.1
9.2
9.3
9.4
9.5
9.6
9.7
9.8
9.9
xv
121
121
122
123
125
125
128
128
129
129
130
131
132
133
133
137
137
137
138
139
139
139
139
139
139
139
139
146
146
146
146
146
185
185
Appendices
186
187
190
B.1
B.2
B.3
B.4
B.5
B.6
B.7
Diffusions
Jumps
Quadratic Variation, Bi-Power Variation, and More
Integrated and Realized Volatility
Realized Covariance Matrix Modeling in Big Data Multivariate Environments
Exercises, Problems and Complements
Notes
190
194
194
194
194
194
194
xvi
195
Francis X. Diebold is Paul F. and Warren S. Miller Professor of Economics, and Professor
of Finance and Statistics, at the University of Pennsylvania, as well as Faculty Research Associate at the National Bureau of Economic Research in Cambridge, Mass. He has published
widely in econometrics, forecasting, finance and macroeconomics, and he has served on the
editorial boards of numerous scholarly journals. He is an elected Fellow of the Econometric
Society, the American Statistical Association, and the International Institute of Forecasters;
the recipient of Sloan, Guggenheim, and Humboldt fellowships; and past President of the
Society for Financial Econometrics. Diebold lectures actively, worldwide, and has received
several prizes for outstanding teaching. He has held visiting appointments in Economics and
Finance at Princeton University, Cambridge University, the University of Chicago, the London School of Economics, Johns Hopkins University, and New York University. His research
and teaching are firmly rooted in applications; he has served as an economist under Paul
Volcker and Alan Greenspan at the Board of Governors of the Federal Reserve System in
Washington DC, an Executive Director at Morgan Stanley Investment Management, CoDirector of the Wharton Financial Institutions Center, and Chairman of the Federal Reserve
Systems Model Validation Council. All his degrees are from the University of Pennsylvania;
he received his B.S. from the Wharton School in 1981 and his economics Ph.D. in in 1986.
He is married with three children and lives in suburban Philadelphia.
The colorful graphic is by Peter Mills and was obtained from Wikimedia Commons. As
noted there, it represents the basins of attraction of the Gaspard-Rice scattering system
projected onto a double impact parameter (whatever that means). I used it mainly because
I like it, but also because its reminiscent of a trending time series.
Guide to e-Features
Hyperlinks to internal items (table of contents, index, footnotes, etc.) appear in red.
Hyperlinks to bibliographic references appear in green.
Hyperlinks to the web appear in cyan.
Hyperlinks to external files (e.g., video) appear in blue.
Many images are clickable to reach related material.
Additional related materials are at http://www.ssc.upenn.edu/~fdiebold, including book updates, presentation slides, datasets, and code.
Facebook group: Diebold Time Series Econometrics
Related blog (No Hesitations): fxdiebold.blogspot.com
Acknowledgments
All media (images, audio, video, ...) were either produced by me or obtained from the
public domain repository at Wikimedia Commons.
List of Figures
1.1
1.2
The R Homepage
Resources for Economists Web Page
3.1
22
4.1
4.2
4.3
4.4
4.5
33
37
38
38
39
7.1
7.2
7.3
11.1
11.2
11.3
11.4
11.5
11.6
2
3
97
98
100
147
148
148
149
149
model, after which we compute 1-day 1% HS-V aR using a rolling window of 500 observations. We plot the daily series of true conditional exceedance probabilities, which
we infer from the model. For visual reference we include a horizontal line at the desired
1% probability level.
152
158
159
159
159
160
S&P500 returns, and the bottom panel shows daily S&P500 realized volatility. We
compute realized volatility as the square root of AvgRV , where AvgRV is the average
of five daily RVs each computed from 5-minute squared returns on a 1-minute grid of
S&P500 futures prices.
161
xxii
LIST OF FIGURES
162
163
167
169
170
171
174
181
181
184
List of Tables
11.1 Stock Return Volatility During Recessions. Aggregate stock-return volatility
11.2
is quarterly realized standard deviation based on daily return data. Firm-level stockreturn volatility is the cross-sectional inter-quartile range of quarterly returns.
Real Growth Volatility During Recessions. Aggregate real-growth volatility is
quarterly conditional standard deviation. Firm-level real-growth volatility is the crosssectional inter-quartile range of quarterly real sales growth.
175
175
Preface
Time Series Econometrics (TSE ) provides a modern and concise masters or Ph.D.-level
course in econometric time series. It can be covered realistically in one semester, and I
have used it successfully for many years with first-year Ph.D. students at the University of
Pennsylvania.
The elephant in the room is of course Hamiltons Time Series Analysis. TSE complements
Hamilton in three key ways.
First, TSE is concise rather than exhaustive. (Nevertheless it maintains good breadth
of coverage, treating everything from the classic early framework of Wold, Wiener, and
Kolmogorov, through to cutting-edge Bayesian MCMC analysis of non-linear non-Gaussian
state space models with the particle filter.) Hamiltons book can be used for more extensive
background reading for those topics that overlap.
Second and crucially, however, many of the topics in TSE and Hamilton do not overlap,
as TSE treats a variety of more recently-emphasized ideas. It stresses Markovian structure
throughout, from linear state space, to MCMC, to optimization, to non-linear state space
and particle filtering. Simulation and Bayes feature prominently, as do nonparametrics,
realized volatility, and more.
Finally, TSE is generally e-aware, with numerous hyperlinks to internal items, web sites,
code, research papers, books, databases, blogs, etc.)
Francis X. Diebold
Philadelphia
Wednesday 12th August, 2015
Chapter One
Introduction
CHAPTER 1
INTRODUCTION
CHAPTER 1
Consider the following point/counterpoint items. In each case, which do you think
would be more useful for analysis of economic time series? Why?
Continuous / discrete
linear / nonlinear
deterministic / stochastic
univariate / multivariate
time domain / frequency domain
conditional mean / conditional variance
trend / seasonal / cycle / noise
ordered in time / ordered in space
stock / flow
stationary / nonstationary
aggregate / disaggregate
Gaussian / non-Gaussian
2. Nobel prizes for work involving time series analysis.
Go to the economics Nobel Prize web site. Read about Economics Nobel Prize winners Frisch, Tinbergen, Kuznets, Tobin, Klein, Modigliani, Friedman, Lucas, Engle,
Granger, Prescott, Sargent, Sims, Fama, Shiller, and Hansen. Each made extensive
contributions to, or extensive use of, time series analysis. Other econometricians and
empirical economists winning the Prize include Leontief, Heckman, McFadden, Koopmans, Stone, Modigliani, and Haavelmo.
1.4 NOTES
The study of time series of, for example, astronomical observations predates recorded
history. Early writers on economic subjects occasionally made explicit reference to
astronomy as the source of their ideas. For example, Cournot stressed that, as in astronomy, it is necessary to recognize secular variation that is independent of periodic
variation. Similarly, Jevons made clear his approach to the study of short-term fluctuations used the methods of astronomy and meteorology. During the 19th century
interest in, and analysis of, social and economic time series evolved into a new field of
study independent of developments in astronomy and meteorology. Time-series analysis then flourished. Nerlove et al. (1979) provides a brief history of the fields early
development.
For references old and new, see the library of useful books in Appendix A.
Chapter Two
CHAPTER 2
g(z) =
( ) z
Autocorrelation Function
( ) =
( )
(0)
(0, 2 )
iid
Gaussian white noise: t
N (0, 2 )
E(t ) = 0
var(t ) = 2
Conditional Moment Structure of Strong White Noise
E(t |t1 ) = 0
where
t1 = t1 , t2 , ...
2 , = 0
( ) =
0, 1
(
( ) =
1, = 0
0, 1
bi ti
i=0
where:
b0 = 1
b2i <
i=0
CHAPTER 2
yt = B(L)t =
bi ti
i=0
t W N (0, 2 )
b0 = 1
b2i <
i=0
E(yt ) = E
!
bi ti
bi Eti =
X
i=0
!
bi ti
bi 0 = 0
i=0
i=0
i=0
var(yt ) = var
b2i var(ti ) = 2
i=0
b2i
i=0
= 0 + b1 t1 + b2 t2 + ... =
bi ti
i=1
"
( ) = E
!
bi ti
i=
= 2
!#
bh t h
h=
bi bi
i=
(where bi 0 if i < 0)
10
CHAPTER 2
h1
X
bi T +hi
i=0
E(eT +h,T ) = 0
var(eT +h,T ) = 2
h1
X
b2i
i=0
Immediately,
yT +1,T = yT
yT +2,T = yT +1,T = 2 yT
..
.
yT +h,T = yT +h1,T = h yT
Extension to AR(p) and AR() is immediate.
2.6 MULTIVARIATE
2.6.1 The Environment
(y1t , y2t )0 is covariance stationary if:
E(y1t ) = 1 t
E(y2t ) = 2 t
y1 y2 (t, ) = E
y1t 1
y2t 2
!
(y1,t 1 , y2,t 2 )
11
11 ( ) 12 ( )
21 ( ) 22 ( )
= 0, 1, 2, ...
12 ( ) 6= 12 ( )
12 ( ) = 21 ( )
y1 y2 ( ) = 0y1 y2 ( ), = 0, 1, 2, ...
Gy1 y2 (z) =
y1 y2 ( ) z
Cross Correlations
Ry1 y2 ( ) = Dy1
y1 y2 ( ) Dy1
, = 0, 1, , 2, ...
1 y2
1 y2
D =
y1t
y2t
1t
2t
yt = B(L)t = (I + B1 L + B2 L2 + ...)t
(
E(t 0s )
X
i=0
if t = s
otherwise
k Bi k2 <
12
CHAPTER 2
Autocovariance Structure
y1 y2 ( ) =
0
Bi Bi
i=
(where Bi 0 if i < 0)
Gy (z) = B(z) B 0 (z 1 )
2.6.3 Vector Autoregressions
N -variable VAR of order p:
(L) yt = t
(N xN )(N x1)(N x1)
t (0, )
(N x1)(N xN )
Simple estimation and analysis (OLS)
Granger-Sims causality
Getting the facts straight before theorizing; assessing the restrictions implies by
economic theory
Impulse Response Functions
How does a shock to yit (alone) dynamically affect yjt ?
(I 1 L ... p LP )yt = t
13
(I 1 L ... p LP )yt = P vt
where vt (0, I) and = P P 0
Impulse Response Function:
yt = (I + 1 L + 2 L2 + ...) P vt
= (P + 1 P L + 2 P L2 + ...) vt
( ) =
( ) =
1
T
T | |
xt xt+| | , = 0, 1, ..., (T 1)
t=1
T | |
X
1
xt xt+| | , = 0, 1 , ..., (T 1)
T | | t=1
d
T (
) N (0, )
1
,
T
asycov(
( ), ( + v)) = 0
Bartlett standard errors
14
CHAPTER 2
2
t=1 et
M SE =
T
PT
= 1 PT
2
t=1 et
y)2
t=1 (yt
M SE
= 1
1
T
PT
y)2
t=1 (yt
Still bad:
s2 =
=
PT
2
t=1 et
T k
T
T k
2 = 1 P
R
T
PT
2
t=1 et
PT
2
t=1 et
t=1 (yt
/ T k
yt )2 / T 1
s2
= 1 PT
t=1 (yt
yt )2 / T 1
Good:
k
SIC = T ( T )
PT
2
t=1 et
More generally,
SIC =
2lnL KlnT
+
T
T
15
QBP = T
m
X
2 ( ) 2 (m)
=1
QLB = T (T + 2)
m
X
=1
1
T
2 ( )
16
CHAPTER 2
are well known. Much recent research examines conditions sufficient for the LLN in
more general situations, such as dependent time series with heterogeneous innovations.
The resulting theories of mixing, martingale difference, and nearepoch dependent sequences are discussed in White (1984), Gallant and White (198*), and White (199*),
among many others.
2. The autocovariance function of the MA(1) process, revisited.
In the text we wrote
( ) = E(yt yt ) = E((t + t1 )(t + t 1 )) =
2 , = 1
0, otherwise.
X
j=0
j xtj ,
17
j ytj .
j=0
6. Prediction-error dynamics.
Consider the general linear process with strong white noise innovations. Show that
both the conditional (with respect to the information set t = {t , t1 , ...}) and
unconditional moments of the Wiener-Kolmogorov h-step-ahead prediction error are
identical.
7. Truncating the Wiener-Kolmogorov predictor.
Consider the sample path, {yt }Tt=1 , where the data generating process is yt = B(L)t
and B(L) is of infinite order. How would you modify the Weiner-Kolmogorov linear
least squares prediction formula to generate an operational 3-step-ahead forecast?
(Hint: truncate.) Is your suggested predictor linear least squares? Least squares within
the class of linear predictors using only T past observations?
8. Empirical GDP dynamics.
(a) Obtain the usual quarterly expenditure-side U.S. GDPE from FRB St. Louis,
1960.1-present.
(b) Leaving out the 12 most recent quarters of data, perform a full correlogram
analysis for GDPE logarithmic growth.
(c) Again leaving out the 12 most recent quarters of data, specify, estimate and defend appropriate AR(p) and ARM A(p, q) models for GDPE logarithmic growth.
(d) Using your preferred AR(p) and ARM A(p, q) models for GDPE logarithmic
growth, generate a 12-quarter-ahead linear least-squares path forecast for the
hold-out sample. How do your AR(p) and ARM A(p, q) forecasts compare to
the realized values? Which appears more accurate?
(e) Obtain ADNSS GDPplus logarithmic growth from FRB Philadelphia, read about
it, and repeat everything above.
18
CHAPTER 2
(f) Contrast the results for GDPE logarithmic growth and GDPplus logarithmic
growth.
9. Time-domain analysis of housing starts and completions.
(a) Obtain monthly U.S. housing starts and completions data from FRED at FRB
St. Louis, seasonally-adjusted, 1960.1-present. Your two series should be of equal
length.
(b) Using only observations {1, ..., T 4}, perform a full correlogram analysis of starts
and completions. Discuss in detail.
(c) Using only observations {1, ..., T 4}, specify and estimate appropriate univariate ARM A(p, q) models for starts and completions, as well as an appropriate
V AR(p). Discuss in detail.
(d) Characterize the Granger-causal structure of your estimated V AR(p). Discuss in
detail.
(e) Characterize the impulse-response structure of your estimated V AR(p) using all
possible Cholesky orderings. Discuss in detail.
(f) Using your preferred ARM A(p, q) models and V AR(p) model, specified and estimated using only observations {1, ..., T 4}, generate linear least-squares path
forecasts for the four quarters of hold out data, {T 3, T 2, T 1, T }. How
do your forecasts compare to the realized values? Discuss in detail.
10. Factor structure.
Consider the bivariate linearly indeterministic process,
!
!
y1t
B11 (L) B12 (L)
=
y2t
B21 (L) B22 (L)
1t
2t
!
,
under the usual assumptions. Suppose further that B11 (L) = B21 (L) = 0 and 1t = 2t = t
(with variance 2 ). Discuss the nature of this system. Why might it be useful in economics?
2.9 NOTES
Characterization of time series by means of autoregressive, moving average, or ARMA models was suggested, more or less simultaneously, by the Russian statistician and economist E.
Slutsky and the British statistician G.U. Yule. The Slutsky-Yule framework was modernized, extended, and made part of an innovative and operational modeling and forecasting
paradigm in a more recent classic, a 1970 book by Box and Jenkins. In fact, ARMA and
related models are often called Box-Jenkins models.
19
By 1930 Slutzky and Yule had shown that rich dynamics could be obtained by taking weighted averages of random shocks. Wolds celebrated 1937 decomposition established
the converse, decomposing covariance stationary series into weighted averages of random
shocks, and paved the way for subsequent path-breaking work by Wiener, Kolmogorov,
Kalman and others. The beautiful 1963 treatment by Wolds student Whittle (1963), updated and reprinted as Whittle (1983) with a masterful introduction by Tom Sargent, remains widely-read. Much of macroeconomics is built on the Slutzky-Yule-Wold-WienerKolmogorov foundation. For a fascinating overview of parts of the history in its relation
to macroeconomics, see Davies and Mahon (2009), at http://www.minneapolisfed.org/
publications_papers/pub_display.cfm?id=4348.
Chapter Three
Nonparametrics
f (x0 )
1
2h
x0 +h
f (u)du =
x0 h
1
P (x [x0 h, x0 + h])
2h
1 #xi [x0 h, x0 + h]
fh (x0 ) =
2h
N
N
x0 xi
1 X1
1
I
N h i=1 2
h
Rosenblatt estimator
Kernel density estimator with
kernel: K(u) = 21 I(|u| 1)
bandwidth: h
3.1.2 Kernel Density Estimation
Issues with uniform kernels:
21
NONPARAMETRICS
K(u)du = 1
K(u) = K(u)
Common Kernel Choices
Standard normal: K(u) =
u2
1 e 2
2
1 X
K
fh (x0 ) =
N h i=1
x0 xi
h
Rosenblatt-Parzen estimator
3.1.3 Bias-Variance Tradeoffs
3.1.3.1 Inescapable Bias-Variance Tradeoff (in Practice, Fixed N )
3.1.3.2 Escapable Bias-Variance Tradeoff (in Theory, N )
E(fh (x0 )) f (x0 ) +
h2
2
Op (1)
(So h 0 = bias 0)
var (fh (x0 ))
1
Nh
Op (1)
(So N h = var 0)
Thus,
h0
Nh
= fh (x0 ) f (x0 )
22
CHAPTER 3
d
N h(fh (x0 ) f (x0 )) D
M SE fh (x0 ) f (x) dx
d
N h fh (x0 ) f (x0 ) D
h N 1/5
23
NONPARAMETRICS
d
N 4/5 fh (x0 ) f (x0 ) D
h = 1.06N 1/5
So use:
= 1.06
h
N 1/5
Better to err on the side of too little smoothing:
=
h
N 1/5
3.2 MULTIVARIATE
Earlier univariate kernel density estimator:
N
1 X
fh (x0 ) =
K
N h i=1
x0 xi
h
where Kh () = h1 K( h )
or Kh () = h1 K(h1 )
Multivariate Version (d-Dimensional)
Precisely follows equation (3.2):
N
1 X
fH (x0 ) =
KH (x0 xi ),
N i=1
24
CHAPTER 3
= fh (x0 ) =
N
1 X
x0 xi
K
N hd i=1
h
h0
N hd
N hd
= fh (x0 ) f (x0 )
d
fh (x0 ) f (x0 ) D
h N d+4
q
d
d
N 1 d+4 fh (x0 ) f (x0 ) D
=
h
4
d+2
1
d+4
where
d
1X 2
=
d i=1 i
2
N d+4
25
NONPARAMETRICS
f (y, x)
dy
f (x)
Regression Slope
(x) =
(M (x + h2 ) M (x h2 ))
M (x)
= lim
h0
xj
h
Conditional Variance
Z
var(y|x) = V (x) =
y2
f (y, x)
dy M (x)2
f (x)
Hazard Function
(t) =
f (t)
1 F (t)
C(x) =
lim (x + h2 ) (x h2 )
2
)M
(x)
=
(x) = (
xj
xj 2
h
h0
2|r|+d
Nh
Z
yf (y|x0 )dy =
f (x0 , y)
dy
f (x0 )
26
CHAPTER 3
Using multivariate kernel density estimates and manipulating gives the Nadaraya-Watson
estimator:
h (x0 ) =
M
N
X
"
i=1
x0 xi
h
PN
x0 xi
K
i=1
h
#
yi
h 0, N h =
d
h (x0 ) M (x0 )) N (0, V )
N hd (M
k
N
k (x0 ) P M (x0 )
0 M
k (x0 ) M (x0 )) d D
k (M
1
2
I(|u| 1) (uniform)
(x , xkT ) = [P
j=1 (xkT
xj )2 ] 2
27
NONPARAMETRICS
vt (xt , x , xkT ) = C
(
C(u) =
(xt , x )
(x , x
k )
(1 u3 )3
f or u < 1
otherwise
J
N
P
M (x0 )
0 M J (x0 )
S
N
0 O(x0 ) p O(x0 )
1
N
K( x0 x
h
N hd
28
CHAPTER 3
to get:
N (x0 ) =
M
x xN
h
N 1 (x0 ) + YN K( 0
(N 1)hd fN 1 (x0 )M
x x
(N 1)hd fN 1 (x0 ) + K( 0 N )
t1 ),
j = 1, ..., S
Ot = (0 + Sj=1 j hjt )
S
Compactly: Ot = (0 + Sj=1 j (j0 + R
i=1 ij xi t + l=1 jl hl,
Back substitution:
Ot = g(xt , xt1 , ..., x1 ; )
3.8 NOTES
t1 )
Chapter Four
Spectral Analysis
yt = B(L)t =
bi ti
i=0
( ) z
= 2 B(z)B(z 1 )
( ) and g(z) are a z-transform pair
30
CHAPTER 4
Spectrum
Evaluate g(z) on the unit circle, z = ei :
g(ei ) =
( ) ei , < <
= 2 B(ei ) B(ei )
= 2 | B(ei ) |2
Spectrum
Trigonometric form:
g() =
( )ei
= (0) +
( ) ei + ei
=1
= (0) + 2
( ) cos( )
=1
f () =
f () =
1
g()
2
1 X
( )ei
2 =
( < < )
1
1X
(0) +
( ) cos( )
2
=1
2
B ei B ei
2
2
| B ei |2
2
31
SPECTRAL ANALYSIS
g() =
( )ei
1
( ) =
2
g()ei d
( ) =
1
2
g()ei d
f ()ei d
Hence
Z
(0) =
f ()d
| |
T
( )
T (
x ) 0,
T
1
X
=(T 1)
d
T (
x ) N (0, gx (0))
1
| |
T
( )
32
CHAPTER 4
yt = t
t W N (0, 2 )
f () =
2
B ei B ei
2
f () =
2
2
yt = yt1 + t
t W N (0, 2 )
f () =
2
B(ei )B(ei )
2
2
1
2 (1 ei )(1 ei )
2
1
2 1 2 cos() + 2
(1 L)yt = (1 L)t
f () =
Rational spectral density
Internal peaks? What will it take?
2 1 2 cos() + 2
2 1 2 cos() + 2
33
SPECTRAL ANALYSIS
4.4 MULTIVARIATE
Multivariate Frequency Domain
Covariance-generating function;
Gyx (z) =
yx ( )z
1
Gyx (ei )
2
1 X
yx ( ) ei , < <
2 =
(Complex-valued)
Co-Spectrum and Quadrature Spectrum
Cyx () =
1 X
yx ( ) cos( )
2 =
Qyx () =
1 X
yx ( ) sin( )
2 =
Cross Spectrum
fyx () = gayx ()exp(i phyx ()) (generic cross spectrum)
34
CHAPTER 4
2
gayx () = [Cyx
() + Q2yx ()] 2 (gain)
Qyx ()
(phase)
phyx () = arctan Cyx
()
ph()
)
cohyx () =
|fyx ()|2
(coherence)
fxx ()fyy ()
N
X
fxi ().
i=1
B(ei ) =
= B(ei ) =
fyx ()
fxx ()
yt = .5xt1 + t
35
SPECTRAL ANALYSIS
t W N (0, 1)
xt = .9xt1 + t
t W N (0, 1)
Correlation Structure
Autocorrelation and cross-correlation functions are straightforward:
y ( ) = .9| |
x ( ) .9| |
yx ( ) .9| 1|
(What is the qualitative shape of yx ( )?)
Spectral Density of x
xt =
= fxx () =
1
t
1 .9L
1
1
1
i
2 1 .9e
1 .9ei
1
1
2 1 2(.9) cos() + (.9)2
=
1
11.37 11.30 cos()
Shape?
Spectral Density of y
yt = 0.5Lxt + t
= fyy () =| 0.5ei |2 fxx () +
= 0.25fxx () +
1
2
1
2
36
CHAPTER 4
0.25
1
+
11.37 11.30 cos() 2
Shape?
Cross Spectrum
B(L) = .5L
B(ei ) = 0.5ei
fyx () = B(ei )fxx ()
= 0.5ei fxx ()
= (0.5fxx ()) ei
gyx () = 0.5fxx () =
0.5
11.3711.30 cos()
P hyx () =
(In time units, P hyx () = 1, so y leads x by -1)
Coherence
Cohyx () =
2
()
| fyx () |2
.25fxx
.25fxx ()
=
=
fxx ()fyy ()
fxx ()fyy ()
fyy ()
1
1
2 12(.9) cos()+.92
1
1
1
2 12(.9) cos()+.92 + 2
.25
=
.25
1
8.24 + 7.20 cos()
Shape?
yt = xt xt1
= B(ei ) = 1 ei
Hence the filter gain is:
B(ei ) = 1 ei = 2(1 cos())
How would the gain look for B(L) = 1 + L?
Filter Analysis: Kuznets Infamous Filters
Low-frequency fluctuations in aggregate real output growth.
Kuznets cycle 20-year period
37
SPECTRAL ANALYSIS
= B1 (e
2
1 X ij
sin(5/2)
)=
e
=
5 j=2
5sin(/2)
38
CHAPTER 4
39
SPECTRAL ANALYSIS
B1 (ei )B2 (ei ) = sin(5/2) |2sin(5)|
5sin(/2)
Kuznets Filters, Continued
Filter Design: A Bandpass Filter
Canonical problem:
Find B(L) s.t.
(
fy () =
fx () on [a, b] [b, a]
0 otherwise,
where
yt = B(L)xt =
bj tj
j=
fy () = |B(ei )| fx ().
Hence we need:
(
B(e
)=
sin(jb) sin(ja)
j
, j Z
40
CHAPTER 4
2
I() =
T
2
T
X
yt eit =
T
2 X it
yt e
T t=1
t=1
! r
T
2 X it
yt e
T t=1
Usually examine frequencies j =
2j
T ,
j = 0, 1, 2, ..., T2
1
f() =
2
f() =
T
1
X
( )ei
=(T 1)
1
2T
2
T
X
it
yt e
T
1 X it
yt e
2T t=1
t=1
T
1 X it
yt e
2T t=1
1
I()
4
2j
T ,
j = 0, 1, ..., T2 )
41
SPECTRAL ANALYSIS
1
f() =
2
T
1
X
( )ei =
=(T 1)
f () =
1
2
T 1
1
2 X
(0) +
( ) cos( )
2
2 =1
T
1
X
( )
( )ei
=(T 1)
| |
MT
, MT and 0 otherwise
MT
T
42
CHAPTER 4
4.6.2 Multivariate
Spectral density matrix:
Fyx () =
1 X
yx ( )ei ,
2 =
< <
Fyx
() =
1
2
(T 1)
yx ( ) ei ,
( )
< <
=(T 1)
43
SPECTRAL ANALYSIS
Solution: Assume normality, and then take draws from the process by using a noram
random number generator in conjunction with the Cholesky factorization of the data
covariance matrix. This procedure can be used to estimate the sampling distribution
of the autocorrelations, taken one at a time. One will surely want to downweight
the long-lag autocorrelations before doing the Cholesky factorization, and let this
downweighting adapt to sample size. Assessing sampling uncertainty for the entire
autocorrelation function (e.g., finding a 95% confidence tunnel) appears harder,
due to the correlation between sample autocorrelations, but can perhaps be done
numerically. It appears very difficult to dispense with the normality assumption.
7. Bootstrapping sample spectra.
Assuming normality, propose a parametric bootstrap method of assessing the finitesample distribution of a consistent estimator of the spectral density function at various
selected frequencies. How would you generalize this to assess the sampling uncertainty
associated with the entire spectral density function?
Solution: At each bootstrap replication of the autocovariance bootstrap discussed
above, Fourier transform to get the corresponding spectral density function.
8. Bootstrapping spectra without normality.
Drop the normality assumption, and propose a parametric bootstrap method of
assessing the finite-sample distribution of (1) a consistent estimator of the spectral
density function at various selected frequencies, and (2) the sample autocorrelations.
Solution: Make use of the asymptotic distribution of the periodogram ordinates.
9. Sample coherence.
If a sample coherence is completed directly from the sample spectral density matrix
(without smoothing), it will be 1, by definition. Thus, it is important that the sample spectrum and cross-spectrum be smoothed prior to construction of a coherence
estimator.
Solution:
2
coh() =
|fyx ()|
fx () fy ()
1.
10. De-meaning.
Consider two forms of a covariance stationary time series: raw and de-meaned.
Contrast their sample spectral density functions at ordinates 2j/T, j = 0, 1, ...,
44
CHAPTER 4
T/2. What do you conclude? Now contrast their sample spectral density functions at
ordinates that are not multiples of 2j/T. Discuss.
Solution: Within the set 2j/T, j = 0, 1, ..., T/2, only the sample spectral density at
frequency 0 is affected by de-meaning. However, de-meaning does affect the sample
spectral density function at all frequencies in [0, ] outside the set 2j/T, j = 0, 1,
..., T/2. See Priestley (1980, p. 417). This result is important for the properties of
time- versus frequency-domain estimators of fractionally-integrated models. Note in
particular that
I(j )
1 X ij t 2
|
yt e
|
T
so that
I(0)
1 X 2
|
yt | T y2 ,
T
which approaches infinity with sample size so long as the mean is nonzero. Thus it
makes little sense to use I(0) in estimation, regardless of whether the data have been
demeaned.
T
T
X
2 X
yt sin j t]2 ).
yt cos j t]2 + [
([
T t=1
t=1
45
SPECTRAL ANALYSIS
the variance of the sample mean of such a time series. If you are very ambitious, you
might want to explore in a Monte Carlo experiment the sampling properties of your
estimator of the standard error vs. the standard estimator of the standard error, for
various population models (e.g., AR(1) for various values of ) and sample sizes. If
you are not feeling so ambitious, at least conjecture upon the outcome of such an
experiment.
15. Coherence.
a. Write out the formula for the coherence between two time series x and y.
b. What is the coherence between the filtered series, (1 - b1 L) xt and (1 - b2 L) yt ?
(Assume that b1 6= b2 .)
c. What happens if b1 = b2 ? Discuss.
Solution:
(a) G2 = 1 - e-i2 is monotonically increasing on [0, ]. This is an example of a high
pass filter.
(b) G2 = 1 + e-i2 is monotonically decreasing on [0, ]. This is an example of a low
pass filter.
(c) G2 = (1 - .5 e-12i)2 has peaks at the fundamental seasonal frequency and its
harmonics, as expected. Note that it corresponds to a seasonal autoregression.
(d) G2 = (1 - .5 e-12i)2 has troughs at the fundamental seasonal frequency and its
harmonics, as expected, because it is the inverse of the seasonal filter in (c) above.
46
CHAPTER 4
Thus, the seasonal process associated with the filter in (c) above would be appropriately seasonally adjusted by the present filter, which is its inverse.
18. Filtering
(a) Consider the linear filter B(L) = 1 + L. Suppose that yt = B(L) xt,
where xt WN(0, 2). Compute fy().
(b) Given that the spectral density of white noise is 2/2, discuss how the filtering
theorem may be used to determine the spectrum of any LRCSSP by viewing it as a
linear filter of white noise.
Solution:
(a) fy() = 1 + e-i2 fx()
= 2/2 (1 + e-i)(1 + ei)
= 2/2 (1 + 2 + 2 cos ),
which is immediately recognized as the sdf of an MA(1) process.
(b) All of the LRCSSPs that we have studied are obtained by applying linear filters
to white noise. Thus, the filtering theorem gives their sdfs as
f() = 2/2 B(e-i)2
= 2/2 B(e-i) B(ei)
= 2/2 B(z) B(z-1),
evaluated on |z| = 1, which matches our earlier result.
Solution: The series must be deterministic, because one could design a filter such that
the filtered series has zero spectrum everywhere.
20. Period.
Period is 2/ and is expressed in time/cycle. 1/P, cycles/time. In engineering, time
is often measured in seconds, and 1/P is Hz.
21. Seasonal autoregression.
Consider the seasonal autoregression (1 - L12) yt = t.
(a) Would such a structure be characteristic of monthly seasonal data or quarterly
seasonal data?
47
SPECTRAL ANALYSIS
(b) Compute and plot the spectral density f(), for various values of . Does it have
any internal peaks on (0, )? Discuss.
(c) The lowest-frequency internal peak occurs at the so-called fundamental seasonal
frequency. What is it? What is the corresponding period?
(d) The higher-frequency spectral peaks occur at the harmonics of the fundamental
seasonal frequency. What are they? What are the corresponding periods?
Solution:
(a) Monthly, because of the 12-period lag.
(b)
f () =
2
(1 + 2 2 cos(12))
2
48
CHAPTER 4
(d) The higher-frequency spectral peaks occur at the harmonics of the fundamental
seasonal frequency. What are they? What are the corresponding periods?
Solution: (a) Quarterly, because of the 4-period lag.
(b)
f () =
2
(1 + 2 2 cos(4))
2
49
SPECTRAL ANALYSIS
Let
T
x
=
1X
xt .
T t=1
Then
var(
x) =
=
PT PT
1
s=1
t=1 (t s)
T2
P
T 1
| |
1
=(T 1) (1 T )( ),
T
2fx (0)
.
T
0.2
= 1
1 0.8
1 X
1
1
f () =
( )ei =
(0) =
2 =
2
2
50
CHAPTER 4
Because
x2t
follows an AR(1) process, it follows that
x2 ( ) = 0.8x2 ( 1)
for =1,2,.... We can write the s.d.f . as
f () =
X
X
1
1
x2 (0) + 2
x2 ( ) cos( ) =
(1 + 2
0.8 cos( ))
2
2
=1
=1
2
2
[(1 0.8e12i )(1 0.8e12i )]1 =
(1 1.6cos12 + 0.64)1
2
2
b.
f () =
2
2
[(1 + 0.8e12i )(1 + 0.8e12i )] =
(1 + 1.6cos12 + 0.64)
2
2
51
SPECTRAL ANALYSIS
4.8 NOTES
Harmonic analysis is one of the earliest methods of analyzing time series thought to exhibit
some form of periodicity. In this type of analysis, the time series, or some simple transformation of it, is assumed to be the result of the superposition of sine and cosine waves
of different frequencies. However, since summing a finite number of such strictly periodic
functions always results in a perfectly periodic series, which is seldom observed in practice,
one usually allows for an additive stochastic component, sometimes called noise. Thus, an
observer must confront the problem of searching for hidden periodicities in the data, that
is, the unknown frequencies and amplitudes of sinusoidal fluctuations hidden amidst noise.
An early method for this purpose is periodogram analysis, initially used to analyse sunspot
data, and later to analyse economic time series.
Spectral analysis is a modernized version of periodogram analysis modified to take account
of the stochastic nature of the entire time series, not just the noise component. If it is assumed
that economic time series are fully stochastic, it follows that the older periodogram technique
is inappropriate and that considerable difficulties in the interpretation of the periodograms
of economic series may be encountered.
These notes draw in part on Diebold, Kilian and Nerlove, New Palgrave, ***.
Chapter Five
[time (t + 1)]
[time t]
P
p11
p12
p21
p22
pij 0,
j=1
pij = 1
pij
(m)
Let P (m) pij .
= P rob(Xt+m = j | Xt = i)
53
Chapman-Kolmogorov theorem:
P (m+n) = P (m) P (n)
Corollary: P (m) = P m
5.1.3 Lots of Definitions (and a Key Theorem)
(n)
State i has period d if pii = 0 n such that n/d 6 Z, and d is the greatest integer with
that property. (That is, a return to state i can only occur in multiples of d steps.) A state
with period 1 is called an aperiodic state.
A Markov process all of whose states are aperiodic is called an aperiodic Markov process.
Still more definitions....
The first-transition probability is the probability that, starting in i, the first transition to
j occurs after n transitions:
(n)
n=1
(n)
fij ).
jj (=
54
CHAPTER 5
(1) All states are transient or all states are null recurrent
(n)
i, j
p1j = 1,
j=1
2
X
p2j = 1
j=1
!
=
55
5.1.4.4 Periodicity
State 1:
d(1) = 2
State 2:
d(2) = 2
(n)
1,
(n)
f21
= 0 n > 1 f21 = 1
5.1.4.6 Recurrence
Because f21 = f12 = 1, both states 1 and 2 are recurrent.
Moreover,
11 =
(n)
nf11 = 2 <
n=1
(.5, .5)
!
1
= (.5, .5).
56
CHAPTER 5
Initialize (j = 0) using z2
(0)
(1)
1
Gibbs iteration j = 1: Draw z(1)
from f (z1 |z2 ), draw z2
Repeat j = 2, 3, ....
5.1.5.2 Global Optimization
(e.g., simulated annealing)
If c
/ N ((m) ) then P ((m+1) = c |(m) ) = 0
If c N ((m) ) then P ((m+1) = c | (m) ) = exp (min[0, /T (m)])
5.1.6 Variations and Extensions: Regime-Switching and More
5.1.6.1 Markovian Regime Switching
P =
p11
1 p11
1 p22
p22
st P
yt = cst + st yt1 + t
t iid N (0, s2t )
Markov switching, or hidden Markov, model
Popular model for macroeconomic fundamentals
5.1.6.2 Heterogeneous Markov Processes
p11,t
p21,t
Pt =
p12,t
p22,t
(1)
57
yt = cst + st yt1 + t
t iid N (0, s2t )
Business cycle duration dependence: pij,t = gij (t)
Credit migration over the cycle: pij,t = gij (cyclet )
General covariates: pij,t = gij (xt )
5.1.6.3 Semi-Markov Processes
We call semi-Markov a process with transitions governed by P , such that the state durations
(times between transitions) are themselves random variables. The process is not Markov,
because conditioning not only the current state but also time-to-date in state may be useful
for predicting the future, but there is an embedded Markov process.
Key result: The stationary distribution depends only on P and the expected state durations. Other aspects of the duration distribution are irrelevant.
Links to Diebold-Rudebush work on duration dependence: If welfare is affected only by
limiting probabilities, then the mean of the duration distribution is the only relevant aspect.
Other, aspects, such as spread, existence of duration dependence, etc. are irrelevant.
5.1.6.4 Time-Reversible Processes
Theorem: If {Xt } is a stationary Markov process with transition probabilities pij and stationary probabilities i , then the reversed process is also Markov with transition probabilities
pij =
j
pji .
i
In general, pij 6= pij . In the special situation pij = pij (so that i pij = j pji ), we say that
the process is time-reversible.
5.1.7 Continuous-State Markov Processes
5.1.7.1 Linear Gaussian State Space Systems
t = T t1 + Rt
yt = Zt + t
t N, t N
58
CHAPTER 5
mx1
t1
mxm
mx1
mxg
gx1
t = 1, 2, ..., T
Measurement Equation
yt
1x1
1xm
mx1
wt
1xL
Lx1
t = 1, 2, ..., T
(Important) Details
t
t
WN
0, diag( Q , |{z}
h )
|{z}
gg
E(0 t 0 ) = 0mxg
E(0 t ) = 0mx1
11
t
1x1
59
mx1
mxm
yt
1x1
mx1
1xm
mx1
t1
WN
mxg
gx1
wt
1xL
Lx1
t
1x1
0, diag( Q , |{z}
h )
|{z}
gg
E(0 t ) = 0mx1
11
E(0 t 0 ) = 0mxg
mx1
1x1
1xm
t1
mxm
yt
mx1
mx1
mxg
gx1
wt
1xL
Lx1
t
1x1
mx1
yt
1x1
B 1
T
mxm
Z
1xm
B 1
mxm
mxm
B
mxm
t1
mxm
t
mx1
mx1
mxL
mxg
gx1
wt
Lx1
t
1x1
60
CHAPTER 5
(B t )
(B T B 1 )
(B t1 )
mxm
mx1
mx1
yt
1x1
(Z B 1 )
1xm
(B t )
mx1
mxL
yt = yt1 + t
t W N (0, 2 )
Already in state space form!
t = t1 + t
yt = t
(T = , R = 1, Z = 1, = 0, Q = 2 , h = 0)
MA(1)
yt = (L)t
t W N (0, 2 )
where
(L) = 1 + 1 L
MA(1) in State Space Form
yt = t + t1
t W N (0, 2 )
(B R)
mxg
gx1
wt
Lx1
t
1x1
61
1t
!
=
2t
1,t1
!
+
2,t1
!
t
yt = (1, 0) t = 1t
MA(1) in State Space Form
Why? Recursive substitution from the bottom up yields:
yt
t =
MA(q)
yt = (L)t
t W N (0, 2 )
where
(L) = 1 + 1 L + ... + q Lq
MA(q) in State Space Form
yt = t + 1 t1 + ... + q tq
t W N N (0, 2 )
1t
2t
..
.
0
..
.
Iq
00
q+1,t
1,t1
2,t1
..
.
1
+ . t
q+1,t1
yt = (1, 0, ..., 0) t = 1t
MA(q) in State Space Form
Recursive substitution from the bottom up yields:
62
CHAPTER 5
q tq + . . . + 1 t1 + t
..
q t1 + q1 t
q t
yt
..
.
q t1 + q1 t
q t
AR(p)
(L)yt = t
t W N (0, 2 )
where
(L) = (1 1 L 2 L2 ... p Lp )
AR(p) in State Space Form
1t
2t
= .
..
pt
.
..
Ip1
00
1,t1
2,t1
..
p,t1
+ . t
..
yt = (1, 0, ..., 0) t = 1t
AR(p) in State Space Form
Recursive substitution from the bottom up yields:
1t
..
.
p1,t
pt
ARMA(p,q)
1 1,t1 + . . . + p 1,tp + t
..
p1 1,t1 + p 1,t2
p 1,t1
yt
..
p1 yt1 + p yt2
p yt1
63
(L)yt = (L)t
t W N (0, 2 )
where
(L) = (1 1 L 2 L2 ... p Lp )
(L) = 1 + 1 L + ... + q Lq
ARMA(p,q) in State Space Form
yt = 1 yt1 + ... + p ytp + t + 1 t1 + ... + q tq
t W N (0, 2 )
Let m = max(p, q + 1) and write as ARMA(m, m 1):
(1 , 2 , ..., m ) = (1 , ..., p , 0, ..., 0)
(1 , 2 , ..., m1 ) = (1 , ..., q , 0, ..., 0)
ARMA(p,q) in State Space Form
.
..
1
Im1
t1 +
..
00
m1
yt = (1, 0, ..., 0) t
ARMA(p,q) in State Space Form Recursive substitution from the bottom up yields:
1t
m1,t
mt
1 1,t1 + p 1,tp + t + 1 t1 + . . . + q tq
.
.
.
m1 1,t1 + m,t1 + m2 t
m 1,t1 + m1 t
m1 yt1 + m yt2
m yt1
yt
.
.
.
+ m1 t1 + m2 t
+ m1 t
t
mx1
T
mxm
t1
mx1
mxg
gx1
64
CHAPTER 5
yt
N x1
N xm
mx1
Lx1
N x1
WN
Wt
N xL
0, diag( Q , |{z}
H )
|{z}
gg N N
E(0 t 0 ) = 0mxg
E(0 t 0 ) = 0mxN
N -Variable V AR(p)
yt
N x1
N xN
N x1
t W N (0, )
State Space Representation
1t
2t
.
.
.
pt
N px1
1
2
.
.
.
p
IN (p1)
00
N pxN p
= (IN ,
yt
1,t1
2,t1
.
p,t1
0N
N x1
N px1
, ..., 0N )
IN
0N xN
.
.
.
0N xN
N xN p
N px1
Multivariate ARMA(p,q)
yt
N x1
+ t +
yt1 + ... +
1
N xN
t1 + ... +
1
N xN
t W N (0, )
ytp
p
N xN
q
N xN
tq
t
N P xN
65
Multivariate ARMA(p,q)
t
N mx1
2
.
.
.
m
1
IN (m1)
t1 +
1
..
.
m1
0N xN (m1)
yt = (I, 0, ..., 0) t = 1t
where m = max(p, q + 1)
5.2.3 Linear Regression with Time-Varying Parameters and More
Linear Regression Model, I
Transition: Irrelevant
Measurement:
yt = 0 xt + t
(Just a measurement equation with exogenous variables)
(T = 0, R = 0, Z = 0, = 0 , Wt = xt , H = 2 )
Linear Regression Model, II
Transition:
t = t1
Measurement:
yt = x0t t + t
(T = I, R = 0, Zt = x0t , = 0, H = 2 )
Note the time-varying system matrix.
Linear Regression with ARMA(p,q) Disturbances
yt = xt + ut
ut = 1 ut1 + ... + p
t =
utp
2
.
.
.
m
+ t + 1 t1 + ... + q tq
Im1
t1 +
00
1
..
.
m1
66
CHAPTER 5
where m = max(p, q + 1)
Linear Regression with Time-Varying Coefficients
Transition:
t = t1 + t
Measurement:
yt = x0t t + t
(T = , R = I, Q = cov(t ), Zt = x0t , = 0, H = 2 )
Gradual evolution of tastes, technologies and institutions
Lucas critique
Stationary or non-stationary
5.2.3.1 Simultaneous Equations
N -Variable Dynamic SEM
+
Structure:
0 yt = 1 yt1 + ... + p ytp + P t
t W N (0, I)
Reduced form:
1
1
yt = 1
0 1 yt1 + ... + 0 p ytp + 0 P t
1t
2t
.
.
.
1
0 1
1
0 2
=
..
.
1
0 p
pt
yt
N x1
00
= (IN ,
1,t1
2,t1
..
p,t1
0N
N xN p
, ..., 0N )
1
0 P
0
..
.
0
t
N px1
1t
.
.
.
N t
67
1
y1t
.
.
. = .
.
.
N
yN t
1t
1
.
.
+ . Ft + .
.
.
N t
N
Ft = Ft1 + t
Already in state-space form!
Dynamic Factor Model Single ARMA(p,q) Factor
y1t
1
1
1t
.
.
.
.
. = . + . Ft + .
.
.
.
.
yN t
N
N
N t
(L) Ft = (L) t
Dynamic Factor Model Single ARMA(p,q) Factor State vector for F is state vector for
system:
t =
2
.
.
.
m
Im1
t1 +
00
1
..
.
m1
Dynamic Factor Model Single ARMA(p,q) factor System measurement equation is then:
y1t
1
.
. = ..
.
.
yN t
N
1
1t
+ .. (1, 0, ..., 0) t + ..
.
.
N
N t
1
1
.
.
.
.
=
. + .
N
N
1t
t + ..
.
... 0
N t
... 0
68
CHAPTER 5
y1t
=
t +
y2t
2
t = t1 + t
1t
2t
Common trend t
Note that
y1t
1
y2t
2
1t
1
2t
2
xt = xt1 + t
yt = xt + t
t
t
!
WN
0,
(t = xt , T = , R = 1, Z = 1, = 0, Q = 2 , H = 2 )
Cycle + Seasonal + Noise
yt = ct + st + t
ct = ct1 + ct
st = st4 + st
Cycle + Seasonal + Noise
Transition equations for the cycle and seasonal:
ct = c,t1 + ct
! !
69
st =
I3
s,t1 + 0 st
0
0
00
Cycle + Seasonal + Noise Stacking transition equations gives the grand transition equation:
st
ct
!
=
0 0 1 0
0 0 0 0
0 0 0
s,t1
+
0
0
0
c,t1
0
1
yt = (1, 0, 0, 0, 1)
!
+ t
ct
mx1
yt
N x1
mxm
N xm
t
t
t1
mx1
WN
mxg
gx1
Wt
N xL
mx1
Lx1
0, diag( Q , |{z}
H )
|{z}
gg N N
E(0 t0 ) = 0mxg
E(0 0t ) = 0mxN
t
N x1
st
ct
70
CHAPTER 5
a0 = E(0 )
P0 = E(0 a0 ) (0 a0 )0
Statement of the Kalman Filter
II. Prediction Recursions
at/t1 = T at1
Pt/t1 = T Pt1 T 0 + R Q R0
III. Updating Recursions
t |t1 N (T t1 , RQR0 )
yt |t N (Zt , H)
Kalman Filter in Density Form (Assuming Normality)
Initialize at a0 , P0
State prediction:
t |
yt1 N (at/t1 , Pt/t1 )
at/t1 = T at1
Pt/t1 = T Pt1 T 0 + RQR0
Data prediction:
yt |
yt1 N (Zat/t1 , Ft )
Update:
t |
yt N (at , Pt )
71
N
,
(x x
(y))2 f (x, y) dx dy
x
y
N
,
= (x , y )0
xx
xy
yx
yy
x|y = x + xy 1
yy (y y )
x|y = xx xy 1
yy yx
Constructive Derivation of the Kalman Filter
Under Normality
Let Et () E( |t ), where t {y1 , ..., yt }.
Time 0 update:
a0 = E0 (0 ) = E (0 )
P0 = var0 ( 0 ) = E [(0 a0 ) (0 a0 )0 ]
Derivation of the Kalman Filter, Continued...
Time 0 prediction
1 = T 0 + R1
72
CHAPTER 5
= a1/0 = E0 (1 ) = T E0 (0 ) + RE0 (1 )
= T a0
Derivation of the Kalman Filter, Continued...
0
P1/0 = E0 (1 a1/0 ) (1 a1/0 )
(subst. a1/0 ) = E0 (1 T a0 ) (1 T a0 )0
(subst. 1 ) = E0 (T (0 a0 ) + R1 ) (T (0 a0 ) + R1 )0
= T P0 T 0 + RQR0
(using E(0 t0 ) = 0 t)
Derivation of the Kalman Filter, Continued...
Time 1 updating
We will derive the distribution of:
1
y1
!
0
E0 (1 ) = a1/0
E0 (y1 ) = Za1/0 + W1
73
var0 (y1 ) = E0 (y1 Za1/0 W1 ) (y1 Za1/0 W1 )0
= E0 (Z(1 a1/0 ) + 1 ) (Z(1 a1/0 ) + 1 )0
= Z P1/0 Z 0 + H (using )
1
y1
!
0 N
a1/0
Za1/0 + W1
!
,
P1/0
P1/0 Z 0
ZP1/0
ZP1/0 Z 0 + H
!!
74
CHAPTER 5
yt |t1 N (yt/t1 , Ft )
or equivalently
vt | t1 N (0, Ft )
t = T t1 + Rt
yt = Zt + t
E(t t0 ) = Q
E(t 0t ) = H
(Nothing new)
5.3.5.1 Combining Covariance Matrix Prediction and Updating
(1) Prediction: Pt+1/t = T Pt T 0 + RQR0
(2) Update: Pt = Pt/t1 Kt Z Pt/t1
75
76
CHAPTER 5
Covariance matrix of vt is Ft
5.3.5.3 Innovations (Steady-State) Representation
t
at+1|t = T at|t1 + T Kv
yt = Z at|t1 + vt
where
= P Z 0 F 1
K
E(vt vt0 ) = F = Z P Z 0 + H
P solves the matrix Ricatti equation
Effectively Wold-Wiener-Kolmogorov prediction and extraction
77
Prediction yt+1/t is now the projection of yt+1 on infinite past, and one-step prediction
errors vt are now the Wold-Wiener-Kolmogorov innovations
Remarks on the Steady State
1. Steady state will be approached if:
underlying two-shock system is time invariant
all eigenvalues of T are less than one
P1|0 is positive semidefinite
2. Because the recursions for Pt|t1 and Kt dont depend on the data, but only on P0 ,
by letting the filter run
we can calculate arbitrarily close approximations to P and K
5.3.6 Kalman Smoothing
1. (Kalman) filter forward through the sample, t = 1, ..., T
2. Smooth backward, t = T, (T 1), (T 2), ..., 1
Initialize: aT,T = aT , PT,T = PT
Then:
at,T = at + Jt (at+1,T at+1,t )
Pt,T = Pt + Jt (Pt+1,T Pt+1,t )Jt0
where
1
Jt = Pt T 0 Pt+1,t
78
CHAPTER 5
(e) The expected number of returns to a recurrent state is infinite, and the expected
number of returns to a transient state is finite. That is,
State j is recurrent
State j is transient
n
Pjj
= ,
n=1
n
Pjj
< .
n=1
Pijn < , i.
n=1
.9
.1
.3
.7
79
i, j
2
X
P1j = 1,
j=1
P2j = 1
j=1
P =
.9
!
.1
.9
!
.1
.9
.1
.3
.7
.7
.7
.3
.3
!
=
.804 .196
.588
.412
(1)
f12 = .1
(2)
f12 = .9 .1 = .09
(3)
(1)
f12 = .3
(2)
f21 = .7 .3 = .21
(3)
Eventual:
f12 =
X
n=1
(n)
f12 =
.1
=1
1 .9
80
CHAPTER 5
f21 =
(n)
f21 =
n=1
.3
=1
1 .7
(g) Recurrence:
Because f12 = f21 = 1, both states 1 and 2 are recurrent. In addition,
P
(n)
n f11 <
11 =
Pn=1
(n)
22 =
<
n=1 n f22
States 1 and 2 are therefore positive recurrent and (given their aperiodicity established earlier) ergodic.
(h) Stationary distribution
We can iterate on the P matrix to see that:
lim P n =
.75
.25
.75
.25
P11
P12
P21
P22
!
=
1 P11 + 2 P21 = 1
(1)
1 P12 + 2 P22 = 2
(2)
1 =
Thus,
1
P21
1 P11
(1 P11 + P21 )
P21
1 P11
lim P =
P12
=
P21
2
1
2
1
P21 = .1
P12 = .3
81
yt = yt1 + t + t1
t W N (0, 2 )
1t
2t
1,t1
2,t1
!
t
yt = (1, 0) t = 1t
Recursive substitution from the bottom up yields:
t =
yt1 + t1 + t
t
!
=
yt
82
CHAPTER 5
xt = yt + ut
yt = yt1 + vt .
Show that the reduced form is ARM A(1, 1):
yt = yt1 + t + t1
and provide expressions for 2 and in terms of the underlying parameters , v2 and
u2 .
Solution:
Box and Jenkins (1976) and Nerlove et al. (1979) show the ARMA result and give the
formula for . That leaves 2 . We will compute var(x) first from the UCM and then
from the ARMA(1,1) reduced form, and equate them.
From the UCM:
var(x) =
v2
+ u2
1 2
var(x) = 2
(1 + 2 2)
1 2
Equating yields
2 =
v2 + u2 (1 2 )
1 + 2 2
83
extract the seasonal and substractboth methods yield the same answer;
ii) y
s , the estimated seasonal, is less variable than ys , the true seasonal, and yn , the
estimated nonseasonal, is less variable than yn , the true nonseasonal. It is paradoxical
that, by (ii), both estimates are less variable than their true counterparts, yet, by (i),
they still add up to the same observed series as their true counterparts. The paradox is
explained by the fact that, unlike their true counterparts, the estimates ys and yn are
correlated (so the variance of their sum can be more than the sum of their variances).
5.5 NOTES
Chapter Six
yt N (, ())
Example: AR(1)
(yt ) = (yt1 ) + t
ij () =
T /2
L(y; ) = (2)
|()|
1/2
2
|ij|
1 2
1
0 1
exp (y ) ()(y )
2
1
1
lnL(y; ) = const ln|()| (y )0 1 () (y )
2
2
T xT matrix () can be very hard to calculate (we need analytic formulas for the autocovariances) and invert (numerical instabilities and inaccuracies; slow even if possible)
Prediction-error decomposition and the Kalman filter:
Schweppes prediction-error likelihood decomposition is:
L(y1 , . . . , yT ; ) =
T
Y
Lt (yt |yt1 , . . . , y1 ; )
t=1
or:
ln L(y1 , . . . , yT ; ) =
T
X
t=1
ln Lt (yt |yt1 , . . . , y1 ; )
85
MAXIMUM LIKELIHOOD
Prediction-error decomposition
In the univariate Gaussian case, the Schweppe decomposition is
T
ln L =
T
1X
1 X (yt t )2
ln 2
ln t2
2
2 t=1
2 t=1
t2
T
1 X vt2
T
1X
ln Ft
ln 2
2
2 t=1
2 t=1 Ft
ln L =
NT
1X
1X
ln |t |
(yt t )0 1
ln 2
t (yt t )
2
2 t=1
2 t=1
T
NT
1X
1 X 0 1
ln 2
ln |Ft |
v F vt
2
2 t=1
2 t=1 t t
86
CHAPTER 6
k (m+1) k
p
k (m) k
= O(1)
D(m) = H 1(m)
2 lnL
|
12 (m)
2 lnL
1 k | (m)
.
.
.
2 lnL
k 1 | (m)
2 lnL
2 | (m)
k
MAXIMUM LIKELIHOOD
87
88
CHAPTER 6
Related R packages:
trust (trust region optimization)
minpack.lm (R interface to Levenberg-Marquardt in MINPACK)
89
MAXIMUM LIKELIHOOD
90
CHAPTER 6
Complete-Data Likelihood:
f (y, 0 , {t }Tt=1 ) = fa0 ,P0 (0 )
T
Y
T
Y
fT,Q (t |t1 )
t=1
fZ,H (yt |t )
t=1
ln L(y, {t }T
t=1 ; ) = const
T
T
1X
ln |H|
(yt Zt )0 H 1 (yt Zt )
2
2 t=1
6.3.2.2 E Step
Construct: lnL(m) (y; ) E lnL y, {t }Tt=0 ;
h
i
h
i
1
1
E ln L(y, {t }T
ln |P0 | E (0 a0 )0 P01 (0 a0 )
t=0 ; ) = const
2
2
T
1X
T
E (t T t1 )0 Q1 (t T t1 )
ln |Q|
2
2 t=1
T
T
1X
ln |H|
E (yt Zt )0 H 1 (yt Zt )
2
2 t=1
T0 =
T
X
! T
!1
X
0
0
E t t1
E t1 t1
t=1
t=1
t = (t Tt1 )
T
X
= 1
E [
t t0 ]
Q
T t=1
Z 0 =
T
X
t=1
!
yt E [t ]
T
X
!1
E [t t0 ]
t=1
t)
t = (yt Z
91
MAXIMUM LIKELIHOOD
T
X
= 1
E [
t 0t ]
H
T t=1
where:
E [t ] = at|T
E [t t0 ] = at|T a0t|T + Pt|T
0
E t t1
= at|T a0t1|T + P(t,t1)|T
Simply replacing t with at,T wont work because E [t t0 |T ] 6= at,T a0t,T
Instead we have E [t t0 |T ] = E [t | T ] E [t |T ]0 + V ar(t |T ) = at,T a0t,T + Pt,T
ln L() =
T
X
ln Lt ()
t=1
T
T
X
X
ln L()
ln Lt ()
=
=
st (0 )
0
0
t=1
t=1
IEX,H (0 ) =
= EH(0 ) =
T
X
t=1
E
2 ln L()
0
0
2 ln Lt ()
0
=
0
T
X
t=1
EHt (0 )
92
CHAPTER 6
T (M L 0 ) N (0, VEX (0 ))
(6.1)
where
VEX (0 ) = VEX,H (0 ) = plimT
= VEX,s (0 ) = plimT
IEX,H (0 )
T
IEX,s (0 )
T
1
1
!1
VEX,H (0 ) =
!1
VOB,H (0 ) =
IOB,H (M L )
T
IEX,s (M L )
T
!1
VEX,s (0 ) =
!1
VOB,s (0 ) =
IOB,s (M L )
T
d
m
T (M L 0 ) N (0, VEX
(0 ))
(6.2)
93
MAXIMUM LIKELIHOOD
where:
m
VEX
(0 ) = VEX,H (0 )1 VEX,s (0 )VEX,H (0 )1
!1
IEX,s (M L )
T
IEX,H (M L )
T
!1
IEX,H (M L )
T
IOB,H (M L )
T
!1
IOB,s (M L )
T
IOB,H (M L )
T
!1
m
VOB
(0 ) =
m
VEX
(0 )
Sandwich Estimator
6.4.2.2 General Misspecification
Under possible general mispecification,
d
m
T (M L ) N (0, VEX
( ))
(6.3)
where:
m
VEX
( ) = VEX,H ( )1 VEX,s ( )VEX,H ( )1
m
VEX
( )
IEX,H (M L )
T
!1
IEX,s (M L )
T
IEX,H (M L )
T
!1
94
CHAPTER 6
m
VOB
( ) =
IOB,H (M L )
T
!1
IOB,s (M L )
T
IOB,H (M L )
T
!1
j =
MGF of any one of the xj s is
Mx (t) =
1
1 2t
Let
f (j ; ) xj
yj = f(j ) =
2
My (t) = Mx
f (j ; )
t
2
=
1
1 f (j ; ) t
g(f(j ); ) =
f (j )
1
e f (j ;)
f (j ; )
ln L(f; ) =
T /2
X
ln f (j ; )
j=0
T /2
X
f(j )
f (j ; )
j=0
95
MAXIMUM LIKELIHOOD
ln L(f; ) =
T /2
X
ln |F (j ; )| trace
j=0
T /2
X
F 1 (j ; ) F (j )
j=0
6.6 NOTES
Chapter Seven
xt , a, m Z+
97
SIMULATION
Figure 7.1: Ripleys Horror Plots of pairs of (Ui+1 , Ui ) for Various Congruential Generators Modulo 2048 (from Ripley, 1987)
x0 = 1
x0 = 1, x1 = 3, x2 = 9, x3 = 11, x4 = 1, x5 = 3, ...
Perfectly periodic, with a period of 4.
Generalize:
(xt , a, c, m Z+ )
xt (axt1 + c)(mod m)
Remarks
1. xt [0, m 1], t. So take xt =
xt
m,
98
CHAPTER 7
Figure 7.2: Transforming from U(0,1) to f (from Davidson and MacKinnon, 1993)
7.2 THE BASICS: C.D.F. INVERSION, BOX-MUELLER, SIMPLE ACCEPTREJECT
7.2.1 Inverse c.d.f.
Inverse cdf Method (Inversion Methods)
Desired density: f (x)
1. Find the analytical c.d.f., F (x), corresponding to f (x)
2. Generate T U (0, 1) deviates {r1 ,...rT }
3. Calculate {F1 (r1 ),..., F1 (rT )}
Graphical Representation of Inverse cdf Method
Example: Inverse cdf Method for exp() Deviates
f (x) = ex where > 0, x 0
Rx
F (x) = 0 et dt
x
t
= e = ex + 1 = 1 ex
o
Hence ex = 1 F (x) so x =
ln(1 F (x))
99
SIMULATION
7.2.2 Box-Muller
An Efficient Gaussian Approach: Box-Muller
Let x1 and x2 be i.i.d. U (0, 1), and consider
y1 = 2 ln x1 cos(2x2 )
y2 = 2 ln x1 sin(2x2 )
Find the distribution of y1 and y2 . We know that
f (y1 , y2 ) = f (x1 , x2 )
1
x1
y1
x2
y1
2
x1
y2
x2
y2
x1
y1
x2
y1
x1
y2
x2
y2
1
2
arctan
y2
y1
1
1
y12 /2
y22 /2
e
e
=
2
2
Bivariate density is the product of two N (0, 1) densities, so we have generated two independent N (0, 1) deviates.
Generating Deviates Derived from N(0,1)
21 = [N (0, 1)]2
2d =
Pd
i=1
N (, 2 ) = + N (0, 1)
p
td = N (0, 1)/ x2d /d, where N (0, 1) and 2d are independent
Fd1 ,d2 = 2d1 /d1 /2d2 /d2 where 2d1 and 2d2 are independent
Multivariate Normal
N (0, I) (N -dimensional) Just stack N N (0, 1)s
N (, ) (N -dimensional)
Let P P 0 = (P is the Cholesky factor of )
Let X N (0, I). Then P X N (0, )
To sample from N (, ), take + P X
7.2.3 Simple Accept-Reject
Accept-Reject
(Naive but Revealing Example)
We want to sample x f (x)
Draw:
1 U (, )
100
CHAPTER 7
2 U (0, h)
If 1 , 2 lies under the density f (x), then take x = 1
Otherwise reject and repeat
Graphical Representation of Naive Accept-Reject
Accept-Reject
General (Non-Naive) Case
We want to sample x f (x) but we only know how to sample x g(x).
Let M satisfy
f (x)
g(x)
M < , x. Then:
1. Draw x0 g(x)
2. Take x = x0 w.p.
f (x0 )
g(x0 )M ;
else go to 1.
(Allows for blanket functions g() more efficient than the uniform)
Note that accept-reject requires that we be able to evaluate f (x) and g(x) for any x.
Mixtures
On any draw i,
x fi (x), w.p. pi
where
0 pi 1, i
N
X
i=1
pi = 1
SIMULATION
101
For example, all of the fi could be uniform, but with different location and scale.
7.4 MORE
Slice Sampling
Copulas and Sampling From a General Joint Density
102
CHAPTER 7
2 ] = g(, T )
E[( )
e.g., Power function of a test:
= g(, T )
Experimental Design, Continued
Selection of (, T ) Configurations to Explore
a. Do we need a full design? In general many values of and T need be explored.
But if, e.g., g(, T ) = g1 () + g2 (T ), then only explore values for a single T , and T
values for a single (i.e., there are no interactions).
b. Is there parameter invariance (g(, T ) unchanging in )? e.g., If y = X + ,
N (0, 2 ()), then the exact finite-sample distributions of
M LE
and
M
LE
2
are invariant to true , 2 . So vary only , leaving and alone (e.g., set to 0 and
1). Be careful not to implicitly assume invariance regarding unexplored aspects of the
design (e.g., structure of X variables above.)
Experimental Design, Continued
Number of Monte Carlo Repetitions (N )
e.g., MC computation of test size
nominal size 0 , true size , estimator
=
#rej
N
N
i=1 I(reji )
N
a
(1 )
N ormal approximation :
N ,
N
"
P
1.96
(1 )
N
#!
= .95
103
SIMULATION
0 (1 0 )
= .01
N
If 0 = .05, N = 7299
Strategy 2 (Use =
1
2
2 1.96
1
2
1
2
= .01 N = 38416
Strategy 3 (Use =
; the obvious strategy)
7.6.2 Simulation
(II) Simulation
Running example: Monte Carlo integration
R1
Definite integral: = 0 m(x)dx
Key insight:
R1
= 0 m(x)dx = E(m(x))
x U (0, 1)
Notation:
= E[m(x)]
2 = var(m(x))
Direct Simulation:
Arbitrary Function, Uniform Density
Generate N U (0, 1) deviates xi , i = 1, ..., N
Form the N deviates mi = m(xi ), i = 1, ..., N
N
1 X
mi
=
N i=1
d
N ( ) N (0, 2 )
m(x)f (x)dx
104
CHAPTER 7
d
N ( ) N (0, 2 )
xf (x)dx
N ( ) N (0, 2 )
f (xi )
g(xi ) xi ,
i = 1, ..., N
N
N
N
X
1 X
1 X f (xi )
=
mi =
xi =
wi xi
N i=1
N i=1 g(xi )
i=1
N ( ) N (0, 2 )
105
SIMULATION
f (y/x)f (x)dx.
f (y|x)
f (x)
g(x)dx,
I(x)
f (xi )
I(xi )
PN f (xj )
i=1
j=1 g(xj )
N
X
f (y|xi ) =
N
X
wi f (y|xi ).
i=1
So importance sampling replaces a simple average of f (y|xi ) based on initial draws from
f (x) with a weighted average of f (y|xi ) based on initial draws from g(x), where the weights
wi reflect the relative heights of f (xi ) and g(xi ).
Indirect Simulation
Variance-Reduction Techniques
(Swindles)
Importance Sampling to Achieve Variance Reduction
Again we use:
Z
=
f (x)
g(x)dx,
g(x)
N ( ) N (0, 2 )
106
CHAPTER 7
xf (x)
g(x)
N
X
I(xi > 1.96)
N
i=1
f (x)
(with variance 2 )
i=1
(with variance 2 )
2
0.06
2
Antithetic Variates
We average negatively correlated unbiased estimators of (Unbiasedness maintained,
variance reduced)
The key: If x symmetric(, v), then xi are equally likely
e.g., if x U (0, 1), so too is (1 x)
e.g., if x N (0, v), so too is x
Consider for example the case of zero-mean symmetric f (x)
Z
= m(x)f (x)dx
N
1 X
mi , ( is based on xi , i = 1, ..., N )
Direct : =
N i=1
1
1
Antithetic : = (x) + (x)
2
2
((x) is based on xi , i = 1, ..., N/2 , and
(x) is based on xi , i = 1, ..., N/2)
Antithetic Variates, Contd
107
SIMULATION
More concisely,
N/2
2 X
=
ki (xi )
N i=1
where:
1
1
m(xi ) + m(xi )
2
2
ki =
2 =
N ( ) N (0, 2 )
1
1
1
var (m(x)) + var (m(x)) +
4
4
2
Often 2 2
Z
=
Z
m(x)f (x)dx =
Z
g(x)f (x)dx +
Control function g(x) simple enough to integrate analytically and flexible enough to absorb
most of the variation in m(x).
We just find the mean of m(x)g(x), where g(x) has known mean and is highly correlated
with m(x).
Control Variates
Z
g(x)dx +
N
1 X
[m(xi ) g(xi )]
N i=1
d
N ( ) N (0, 2 )
ex dx
108
CHAPTER 7
g(x)dx =
1.7 2
x+
x
2
1
= 1.85
N
1 X xi
direct =
e
N i=1
N
1 X xi
cv = 1.85 +
[e (1 + 1.7xi )]
N i=1
var(direct )
78
var(CV )
Common Random Numbers
We have discussed estimation of a single integral:
Z 1
f1 (x)dx
0
But interest often centers on difference (or ratio) of the two integrals:
Z
f1 (x)dx
0
f2 (x)dx
0
The key: Evaluate each integral using the same random numbers.
Common Random Numbers in Estimator Comparisons
; true parameter 0
Two estimators ,
Compare MSEs: E( 0 )2 , E( 0 )2
Expected difference: E ( 0 )2 ( 0 )2
Estimate:
N
2
1 X
(i 0 )2 (i 0 )
N i=1
Variance of estimate:
1
1
2
var ( 0 )2 + var ( 0 )2 cov ( 0 )2 , ( 0 )2
N
N
N
Extensions...
Sequential importance sampling: Builds up improved proposal densities across draws
109
SIMULATION
=
rej
N
(1 )
,
N
or
= + = g(T ) +
g(T )(1 g(T ))
0,
N
= 0 + T
12
c0 +
p
X
!
ci T
i
2
i=1
0 is nominal size, which obtains as T . Second term is the vanishing size distortion.
Response surface regression:
110
CHAPTER 7
(
0 ) T 2 , T 1 , T 2 , ...
Disturbance will be approximately normal but heteroskedastic.
So use GLS or robust standard errors.
d(θ) = [ m₁(θ) − m̂₁, m₂(θ) − m̂₂, ..., m_r(θ) − m̂_r ]′

The mᵢ(θ) are model moments and the m̂ᵢ are data moments.

MM: k = r and the mᵢ(θ) calculated analytically.
GMM: k < r and the mᵢ(θ) calculated analytically.
Inefficient relative to MLE, but useful when the likelihood is not available.
7.7.2 Simulated Method of Moments (SMM)

(k ≤ r and the mᵢ(θ) calculated by simulation)

Model moments for GMM may also be unavailable (i.e., analytically intractable). SMM: if you can simulate, you can consistently estimate.

Simulation ability is a good test of model understanding: if you can't figure out how to simulate pseudo-data from a given probabilistic model, then you don't understand the model (or the model is ill-posed).

Assembling everything: if you understand a model you can simulate it, and if you can simulate it you can estimate it consistently. Eureka! There is no need to work out what might be very complex likelihoods, even if they are in principle available.
MLE efficiency lost may be a small price for SMM tractability gained.
SMM Under Misspecification
All econometric models are misspecified.
GMM/SMM has special appeal from that perspective.
Under correct specification any consistent estimator (e.g., MLE or GMM/SMM) takes
you to the right place asymptotically, and MLE has the extra benefit of efficiency.
Under misspecification, consistency becomes an issue, quite apart from the secondary issue of efficiency. The best DGP approximation for one purpose may be very different from the best approximation for another.

The bottom line: under misspecification MLE may not be consistent for what you want, whereas by construction GMM is consistent for what you want (once you decide what you want).
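A stylized SMM sketch under assumptions not in the text: the "economic model" is a Gaussian AR(1) simulated at candidate parameters, and we match its variance and first autocovariance to the data moments, holding the simulation shocks fixed across candidate parameters:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

def simulate_ar1(phi, sigma, T, eps):
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + sigma * eps[t]
    return y

def moments(y):
    return np.array([y.var(), np.cov(y[1:], y[:-1])[0, 1]])

# "Data": an AR(1) with phi = 0.8, sigma = 1 (for illustration).
data = simulate_ar1(0.8, 1.0, 2_000, rng.standard_normal(2_000))
m_hat = moments(data)

eps_sim = rng.standard_normal(20_000)  # FIXED simulation shocks, crucial for SMM

def objective(theta):
    phi, sigma = theta[0], abs(theta[1])
    d = moments(simulate_ar1(phi, sigma, 20_000, eps_sim)) - m_hat
    return d @ d  # identity weighting matrix

res = minimize(objective, x0=[0.5, 0.5], method="Nelder-Mead")
print(res.x)  # should be near (0.8, 1.0)
```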
7.7.3 Indirect Inference

β: k-dimensional economic model parameter.
δ: d(>k)-dimensional auxiliary model parameter.

β̂_IE = argmin_β d(β)′ d(β)

where

d(β) = [ δ̂₁(β) − δ̂₁, δ̂₂(β) − δ̂₂, ..., δ̂_d(β) − δ̂_d ]′

The δ̂ᵢ(β) are auxiliary-model parameters estimated on data simulated from the economic model at β, and the δ̂ᵢ are the same parameters estimated on the real data.
Simple Case: the i.i.d. Sample Mean

Root:

S = √T (x̄_T − μ)/σ(x), where x̄_T = (1/T) Σ_{t=1}^T xₜ and σ²(x) = E(x − μ)²

If u_α solves P(√T (x̄_T − μ)/σ(x) ≤ u_α) = α, an interval estimate is

I = [ x̄_T − σ(x) u_{(1+α)/2}/√T, x̄_T − σ(x) u_{(1−α)/2}/√T ]

In practice we use the sample analog σ̂²(x) = (1/(T−1)) Σ_{t=1}^T (xₜ − x̄_T)², and under normality I = [ x̄_T − σ̂(x) t_{(1+α)/2}/√T, x̄_T − σ̂(x) t_{(1−α)/2}/√T ].

Percentile Bootstrap

1. Draw bootstrap samples {xₜ^{(j)}}_{t=1}^T with replacement from {xₜ}_{t=1}^T, j = 1, ..., N.
2. Compute x̄_T^{(j)} for each.
3. Approximate H(z) = P(√T (x̄_T − μ)/σ(x) ≤ z) by

Ĥ(z) = P̂( √T (x̄_T^{(j)} − x̄_T)/σ̂(x) ≤ z )

Take û_α solving Ĥ(û_α) = α, which yields

I = [ x̄_T − σ̂(x) û_{(1+α)/2}/√T, x̄_T − σ̂(x) û_{(1−α)/2}/√T ]
Percentile-t Bootstrap

Studentized root:

S = √T (x̄_T − μ)/σ̂(x)

Ĥ(z) = P̂( √T (x̄_T^{(j)} − x̄_T)/σ̂(x^{(j)}) ≤ z )

û_α solves Ĥ(û_α) = α, which again yields

I = [ x̄_T − σ̂(x) û_{(1+α)/2}/√T, x̄_T − σ̂(x) û_{(1−α)/2}/√T ]
Percentile: x̄_T^{(j)} changes across bootstrap replications.

Percentile-t: both x̄_T^{(j)} and σ̂(x^{(j)}) change across bootstrap replications.

Effectively, the percentile method bootstraps the parameter, whereas the percentile-t bootstraps the t-statistic.
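A minimal numpy sketch contrasting the two intervals for the i.i.d. mean; the exponential sample is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(1.0, size=200)      # illustrative sample
T, B, alpha = x.size, 9_999, 0.90
xbar, s = x.mean(), x.std(ddof=1)

roots_pct, roots_t = [], []
for _ in range(B):
    xb = rng.choice(x, size=T, replace=True)
    roots_pct.append(np.sqrt(T) * (xb.mean() - xbar) / s)             # percentile
    roots_t.append(np.sqrt(T) * (xb.mean() - xbar) / xb.std(ddof=1))  # percentile-t

lo_p, hi_p = np.quantile(roots_pct, [(1 - alpha) / 2, (1 + alpha) / 2])
lo_t, hi_t = np.quantile(roots_t, [(1 - alpha) / 2, (1 + alpha) / 2])

# I = [xbar - s*u_{(1+a)/2}/sqrt(T), xbar - s*u_{(1-a)/2}/sqrt(T)]
print("percentile  :", xbar - s * hi_p / np.sqrt(T), xbar - s * lo_p / np.sqrt(T))
print("percentile-t:", xbar - s * hi_t / np.sqrt(T), xbar - s * lo_t / np.sqrt(T))
```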
Key Bootstrap Property: Consistent Inference

Real-world root: S →d D (as T → ∞)

Bootstrap-world root: Ŝ = √T (x̄_T^{(j)} − x̄_T)/σ̂(x) →d D (as T, N → ∞)

So the bootstrap distribution consistently estimates the limiting distribution of the real-world root.
Issues:

1. Inappropriate standardization of S for dynamic data. So replace σ̂(x) with √(2π f̂_x(0)), where f̂_x(0) is a consistent estimator of the spectral density of x at frequency 0.

2. Inappropriate to draw {xₜ^{(j)}}_{t=1}^T with replacement for dynamic data. What to do?

Non-Parametric Time Series Bootstrap (Overlapping Block Sampling)

Form all overlapping blocks of size b in the sample path; draw blocks ξ₁^{(j)}, ..., ξ_k^{(j)} with replacement and paste them end to end to preserve the dependence structure within blocks.

Parametric Time Series Bootstrap (AR(1) Sketch; earlier steps fit the model and draw its residuals with replacement)

4. Generate xₜ^{(j)} = ĉ + φ̂ xₜ₋₁^{(j)} + ε̂ₜ^{(j)}, where the ε̂ₜ^{(j)} are iid residual draws.

5. Regress xₜ^{(j)} on (c, xₜ₋₁^{(j)}) to get ĉ^{(j)} and φ̂^{(j)}, associated t-statistics, etc.
3. Draw {ûₜ^{(j)}}_{t=1}^T with replacement from {ûₜ}_{t=1}^T and convert to prediction-error draws {vₜ^{(j)}}_{t=1}^T = {σ̂ₜ ûₜ^{(j)}}_{t=1}^T.

4. Using the prediction-error draws {vₜ^{(j)}}_{t=1}^T, simulate the model, obtaining {yₜ^{(j)}}_{t=1}^T.
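A sketch of overlapping-block resampling for a dependent series; block length b and the AR(1) data are illustrative choices:

```python
import numpy as np

def block_bootstrap(x, b, rng):
    """One overlapping-block bootstrap replicate of the series x."""
    T = len(x)
    blocks = np.array([x[i:i + b] for i in range(T - b + 1)])  # all overlapping blocks
    k = int(np.ceil(T / b))                                    # blocks needed
    idx = rng.integers(0, len(blocks), size=k)                 # draw with replacement
    return np.concatenate(blocks[idx])[:T]                     # paste and trim to T

rng = np.random.default_rng(5)
e = rng.standard_normal(500)          # illustrative AR(1) data
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + e[t]

means = [block_bootstrap(x, b=25, rng=rng).mean() for _ in range(2_000)]
print(np.std(means))   # bootstrap std. error of the mean under dependence
```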
7.9.1 Local

Using MCMC for MLE (and Other Extremum Estimators)

Chernozhukov and Hong show how to compute extremum estimators as means of pseudo-posterior distributions, which can be simulated by MCMC and estimated at the parametric rate 1/√N, in contrast to the much slower nonparametric rates achievable (by any method) by the standard posterior-mode extremum estimator.
7.9.2 Global

Summary of Local Optimization:

1. initial guess θ^(0)
2. while stopping criteria not met do
3.   select θ^(c) ∈ N(θ^(m)) (classically: use the gradient)
4.   if lnL(θ^(c)) − lnL(θ^(m)) > 0 then θ^(m+1) = θ^(c)
5. end while

Simulated Annealing
(Illustrated Here for a Discrete Parameter Space)

Framework:

1. A set Θ, and a real-valued function lnL (satisfying regularity conditions) defined on Θ. Let Θ* be the set of global maxima of lnL.
2. ∀ θ^(m) ∈ Θ, a set N(θ^(m)) ⊂ Θ, the set of neighbors of θ^(m).
3. A nonincreasing function T(m): N → (0, ∞) (the cooling schedule), where T(m) is the temperature at iteration m.
4. An initial guess, θ^(0).

Simulated Annealing Algorithm:

1. initial guess θ^(0)
2. while stopping criteria not met do
3.   select θ^(c) ∈ N(θ^(m))
4.   accept θ^(m+1) = θ^(c) if lnL(θ^(c)) − lnL(θ^(m)) > 0, and otherwise with probability exp([lnL(θ^(c)) − lnL(θ^(m))]/T(m))
5. end while
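A minimal simulated-annealing sketch on a discrete grid, with an objective and geometric cooling schedule chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

theta_grid = np.linspace(-3, 3, 601)              # discrete parameter space
lnL = lambda th: -(th**2 - 1) ** 2 + 0.3 * th     # bimodal objective (illustrative)

m = rng.integers(len(theta_grid))                 # initial guess theta^(0)
for it in range(20_000):
    temp = 1.0 * 0.9995 ** it                     # cooling schedule T(m)
    c = m + rng.choice([-1, 1])                   # neighbor proposal
    if 0 <= c < len(theta_grid):
        dlnL = lnL(theta_grid[c]) - lnL(theta_grid[m])
        # Accept improvements always; accept downhill moves w.p. exp(dlnL/temp).
        if dlnL > 0 or rng.uniform() < np.exp(dlnL / temp):
            m = c

print(theta_grid[m])  # near the global max of lnL
```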
7.12 NOTES
Chapter Eight

Frequentist asymptotics, for comparison:

√T (θ̂ − θ) →d N(0, Σ)

√T (θ̂_ML − θ) →d N(0, I_{EX,H}(θ)^{−1})

or more crudely

θ̂_ML ≈ N(θ, I_{EX,H}(θ)^{−1}/T)

(Enough said.)

Bayesian Paradigm (T → ∞): the posterior concentrates and is asymptotically normal,

√T (θ − θ̂_ML) | y →d N(0, I_{EX,H}(θ)^{−1})
Model comparison:

p(Mᵢ|y)/p(Mⱼ|y) = [p(y|Mᵢ)/p(y|Mⱼ)] × [p(Mᵢ)/p(Mⱼ)]
(posterior odds = Bayes factor × prior odds)

Marginal likelihood via the predictive decomposition:

P(y) = Π_{t=1}^T P(yₜ|y_{1:t−1})

ln P(y) = Σ_{t=1}^T ln P(yₜ|y_{1:t−1}) = Σ_{t=1}^T ln ∫ P(yₜ|θ, y_{1:t−1}) p(θ|y_{1:t−1}) dθ
As T → ∞, the distinction between model averaging and selection vanishes, as one weight goes to 0 and the other goes to 1. If one of the models is true, then both model selection and model averaging are consistent for the true model. Otherwise they're consistent for the X-optimal approximation to the truth. Does X = KLIC?
Accept the proposal: θ^(s) = θ* w.p. α(θ^{(s−1)}, θ*); otherwise reject and keep θ^(s) = θ^{(s−1)}, where

α(θ^{(s−1)}, θ*) = min[ (p(θ = θ*) q(θ^{(s−1)}|θ*)) / (p(θ = θ^{(s−1)}) q(θ*|θ^{(s−1)})), 1 ]

which for a symmetric proposal reduces to

α = min[ p(θ = θ*)/p(θ = θ^{(s−1)}), 1 ]
8.3.3 More

Burn-in, Sampling, and Dependence

total simulation = burn-in + sampling

Questions:

How to assess convergence to the steady state? In the Markov chain case, why not do something like the following. Whenever time t is a multiple of m, use a distribution-free non-parametric (randomization) test for equality of distributions to test whether the unknown distribution f₁ of x_t, ..., x_{t−(m/2)} equals the unknown distribution f₂ of x_{t−(m/2)+1}, ..., x_{t−m}. If, for example, we pick m = 20,000, then whenever time t is a multiple of 20,000 we would test equality of the distributions of x_t, ..., x_{t−10000} and x_{t−10001}, ..., x_{t−20000}. We declare arrival at the steady state when the null is not rejected. Or something like that.

Of course the Markov chain is serially correlated, but who cares, as we're only trying to assess equality of unconditional distributions. That is, randomizations of x_t, ..., x_{t−(m/2)} and of x_{t−(m/2)+1}, ..., x_{t−m} destroy the serial correlation, but so what?

How to handle dependence in the sampled chain? Better to run one long chain or many shorter parallel chains?
A Useful Property of Accept-Reject Algorithms
(e.g., Metropolis)

Metropolis requires knowing the density of interest only up to a constant, because the acceptance probability is governed by the ratio p(θ = θ*)/p(θ = θ^{(s−1)}). This will turn out to be important for Bayesian analysis.
Metropolis-Hastings (Discrete)

For desired π, we want to find P such that πP = π. It is sufficient to find P such that πᵢPᵢⱼ = πⱼPⱼᵢ (detailed balance). Suppose we've arrived at zᵢ. Use a symmetric, irreducible transition matrix Q = [Qᵢⱼ] to generate proposals. That is, draw proposal zⱼ using the probabilities in the ith row of Q. Move to zⱼ w.p. αᵢⱼ, where

αᵢⱼ = 1, if πⱼ/πᵢ ≥ 1
αᵢⱼ = πⱼ/πᵢ, otherwise

that is, αᵢⱼ = min(πⱼ/πᵢ, 1).

Metropolis-Hastings, Continued...

This defines a Markov chain P with:

Pᵢⱼ = αᵢⱼ Qᵢⱼ, for i ≠ j
Pᵢᵢ = 1 − Σ_{j≠i} Pᵢⱼ, for i = j
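A minimal random-walk Metropolis sketch for a continuous target known only up to a constant; the unnormalized bimodal density is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(7)

def p_kernel(th):
    """Target density up to a constant, which is all Metropolis needs."""
    return np.exp(-0.5 * (th - 2) ** 2) + np.exp(-0.5 * (th + 2) ** 2)

S, draws, th = 50_000, [], 0.0
for s in range(S):
    prop = th + rng.normal(0, 1.0)                 # symmetric proposal
    if rng.uniform() < min(1.0, p_kernel(prop) / p_kernel(th)):
        th = prop                                  # accept
    draws.append(th)                               # else keep the current draw

draws = np.array(draws[5_000:])                    # discard burn-in
print(draws.mean(), draws.std())
```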
General Gibbs Sampling

Useful if/when conditionals are known and easy to sample from, but the joint and marginals are not. (This happens a lot in Bayesian analysis.)

We want to sample from f(z) = f(z₁, z₂, ..., z_k).

Initialize (j = 0) using z₂⁰, z₃⁰, ..., z_k⁰.

Gibbs iteration j = 1:

a. Draw z₁¹ from f(z₁|z₂⁰, ..., z_k⁰)
b. Draw z₂¹ from f(z₂|z₁¹, z₃⁰, ..., z_k⁰)
c. Draw z₃¹ from f(z₃|z₁¹, z₂¹, z₄⁰, ..., z_k⁰)
...
k. Draw z_k¹ from f(z_k|z₁¹, ..., z_{k−1}¹)

Repeat j = 2, 3, ....

Again, lim_{j→∞} f(z^{(j)}) = f(z)
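A minimal Gibbs sketch for a bivariate normal with correlation ρ, where both conditionals are known scalar normals:

```python
import numpy as np

rng = np.random.default_rng(8)
rho, J = 0.8, 20_000

z1, z2 = 0.0, 0.0                       # initialization
draws = np.empty((J, 2))
for j in range(J):
    # Conditionals of a standard bivariate normal:
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho**2))   # z1 | z2
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho**2))   # z2 | z1
    draws[j] = (z1, z2)

print(np.corrcoef(draws[5_000:].T))      # correlation near rho after burn-in
```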
Metropolis Within Gibbs

Gibbs breaks a big draw into lots of little (conditional) steps. If you're lucky, those little steps are simple. If/when a Gibbs step is difficult, i.e., it's not clear how to sample from the relevant conditional, that step can be done by Metropolis ("Metropolis within Gibbs"). Metropolis is more general but also more tedious, so only use it when you must.
Composition

We may want (x₁, y₁), ..., (x_N, y_N) iid from f(x, y), or we may want y₁, ..., y_N iid from f(y). They may be hard to sample from directly. But sometimes it's easy to:

Draw xᵢ* ~ f(x)
Draw yᵢ* ~ f(y|xᵢ*)

Then:

(x₁*, y₁*), ..., (x_N*, y_N*) iid ~ f(x, y)
(y₁*, ..., y_N*) iid ~ f(y)
Recall the Gaussian-regression ML results:

σ̂²_ML = e′e/T

β̂_ML ~ N(β, σ²(X′X)⁻¹)

T σ̂²_ML/σ² ~ χ²_{T−K}
p(β|σ², y) ∝ exp( −(1/2)(β − β₀)′Σ₀⁻¹(β − β₀) − (1/(2σ²))(y − Xβ)′(y − Xβ) )

This is the kernel of a normal distribution (*Problem*):

β|σ², y ~ N(β₁, Σ₁)

where

β₁ = (Σ₀⁻¹ + σ⁻²(X′X))⁻¹ (Σ₀⁻¹β₀ + σ⁻²(X′X)β̂_ML)

Σ₁ = (Σ₀⁻¹ + σ⁻²(X′X))⁻¹
Gamma and Inverse Gamma Refresher

zₜ ~iid N(0, 1/δ), x = Σ_{t=1}^v zₜ² ⟹ x ~ Γ(x; v/2, δ/2)

E(x) = v/δ, var(x) = 2v/δ²

x⁻¹ ~ Γ⁻¹(v/2, δ/2) (inverse gamma)

Bayesian Inference for σ²

Prior:

1/σ² ~ Γ(v₀/2, δ₀/2)

g(1/σ²) ∝ (1/σ²)^{v₀/2 − 1} exp(−δ₀/(2σ²))

(Independent of β, but we write 1/σ²|β for completeness.)
L(1/σ² | β, y) ∝ (σ²)^{−T/2} exp( −(1/(2σ²))(y − Xβ)′(y − Xβ) )

(*Problem*: In contrast to L(β|σ², y) earlier, we don't absorb the (σ²)^{−T/2} term into the constant of proportionality. Why?)

Hence (*Problem*):

p(1/σ² | β, y) ∝ (1/σ²)^{v₁/2 − 1} exp(−δ₁/(2σ²))

or 1/σ² | β, y ~ Γ(v₁/2, δ₁/2), where

v₁ = v₀ + T

δ₁ = δ₀ + (y − Xβ)′(y − Xβ)
Bayesian Pros Thus Far

1. It feels sensible to focus on p(θ|y). Classical relative frequency in repeated samples is replaced with subjective degree of belief conditional on the single sample actually obtained.
2. Exact finite-sample full-density inference.

Bayesian Cons Thus Far

1. From where does the prior come? How to elicit prior distributions?
2. How to do an objective analysis? (e.g., what is an uninformative prior? Uniform?) (Note, however, that priors can be desirable and helpful. See, for example, the cartoon at http://fxdiebold.blogspot.com/2014/04/more-from-xkcdcom.html)
3. We still don't have the marginal posteriors that we really want: p(β|y) and p(σ²|y). Problematic in any event!
Linear Gaussian state space model:

αₜ = T αₜ₋₁ + R ηₜ
yₜ = Z αₜ + εₜ

ηₜ ~ iid N(0, Q), εₜ ~ iid N(0, H)

Let α̃_T = (α₁′, ..., α_T′)′ and θ = (T′, R′, Z′, Q′, H′)′.

The key: treat α̃_T as a parameter, along with the system matrices θ.

Recall the State-Space Model in Density Form

αₜ|αₜ₋₁ ~ N(T αₜ₋₁, RQR′)
yₜ|αₜ ~ N(Z αₜ, H)
Recall the Kalman Filter in Density Form

Initialize at a₀, P₀.

State prediction: αₜ|ỹₜ₋₁ ~ N(a_{t/t−1}, P_{t/t−1})

a_{t/t−1} = T aₜ₋₁
P_{t/t−1} = T Pₜ₋₁ T′ + RQR′

State update: αₜ|ỹₜ ~ N(aₜ, Pₜ)

aₜ = a_{t/t−1} + Kₜ(yₜ − Z a_{t/t−1})
Pₜ = P_{t/t−1} − Kₜ Z P_{t/t−1}

Data prediction: yₜ|ỹₜ₋₁ ~ N(Z a_{t/t−1}, Fₜ)

where ỹₜ = (y₁′, ..., yₜ′)′.
Carter-Kohn Multi-move Gibbs Sampler

Let ỹ_T = (y₁′, ..., y_T′)′.

0. Initialize θ^(0).

Gibbs sampler at generic iteration j:

j1. Draw α̃_T^(j) from the posterior α̃_T | θ^(j−1), ỹ_T (hard)
j2. Draw θ^(j) from the posterior θ | α̃_T^(j), ỹ_T (easy)

Iterate to convergence, and then estimate posterior moments of interest.

Just two Gibbs draws: (1) the α̃_T block, (2) the θ block.

Multimove Gibbs Sampler, Step 2 (θ^(j) | α̃_T^(j), ỹ_T) (easy)

Conditional upon the draws α̃_T^(j), sampling θ becomes a multivariate regression problem. We have already seen how to do univariate regression, and we can easily extend to multivariate regression. The Gibbs sampler continues to work.
Multivariate Regression

y_{it} = xₜ′βᵢ + ε_{it}, (ε_{1t}, ..., ε_{Nt})′ ~ iid N(0, Σ),

or in matrix form Y = XB + E, with Y (T×N), X (T×K), and B (K×N).

Inverse Wishart prior for Σ:

p(Σ) ∝ |Σ|^{−(n+N+1)/2} exp( −(1/2) tr(Σ⁻¹ V₁) )

Normal prior for vec(B):

p(vec(B)) ∝ exp( −(1/2) vec(B − B₀)′ V₀⁻¹ vec(B − B₀) )

Likelihood:

L(B, Σ; Y) ∝ |Σ|^{−T/2} exp( −(1/2) Σ_{t=1}^T (Yₜ − B′Xₜ)′ Σ⁻¹ (Yₜ − B′Xₜ) )

∝ |Σ|^{−T/2} exp( −(1/2) tr( Σ⁻¹ (Y − XB)′(Y − XB) ) )

which, in the part involving B, is

∝ exp( −(1/2) vec(B − B̂)′ (Σ⁻¹ ⊗ X′X) vec(B − B̂) )

Posterior:

p(vec(B)|Σ, Y) ∝ exp( −(1/2) [ vec(B − B̂)′(Σ⁻¹ ⊗ X′X) vec(B − B̂) + vec(B − B₀)′ V₀⁻¹ vec(B − B₀) ] )

which is again a normal kernel.
Case 3 (β and σ² both unknown)

Joint prior: g(β, 1/σ²) = g(β | 1/σ²) g(1/σ²),

where β|1/σ² ~ N(β₀, Σ₀) and 1/σ² ~ Γ(v₀/2, δ₀/2).

HW: Show that the joint posterior,

p(β, 1/σ² | y) ∝ g(β, 1/σ²) L(β, 1/σ²; y),

can be factored as p(β | 1/σ², y) p(1/σ² | y), where β|1/σ², y ~ N(β₁, Σ₁) and 1/σ²|y ~ Γ(v₁/2, δ₁/2), and derive expressions for β₁, Σ₁, v₁, δ₁ in terms of β₀, Σ₀, v₀, δ₀, X, and y.

Moreover, the key marginal posterior

p(β|y) = ∫₀^∞ p(β, 1/σ² | y) d(1/σ²)

is multivariate t.

Implement the Bayesian methods via Gibbs sampling.
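A compact Gibbs sketch for this normal/gamma regression setup, alternating between β|1/σ², y ~ N(β₁, Σ₁) and 1/σ²|β, y ~ Γ(v₁/2, δ₁/2); data and prior values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)

# Illustrative data from y = X beta + e.
T, K = 200, 2
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(T)

b0, S0inv = np.zeros(K), np.eye(K) / 100.0   # prior: beta ~ N(b0, S0)
v0, d0 = 2.0, 2.0                            # prior: 1/sig2 ~ Gamma(v0/2, d0/2)

XtX, Xty = X.T @ X, X.T @ y
prec = 1.0                                   # initialize 1/sig2
draws = []
for j in range(5_000):
    # beta | 1/sig2, y ~ N(b1, S1)
    S1 = np.linalg.inv(S0inv + prec * XtX)
    b1 = S1 @ (S0inv @ b0 + prec * Xty)
    beta = rng.multivariate_normal(b1, S1)
    # 1/sig2 | beta, y ~ Gamma(v1/2, d1/2): shape v1/2, rate d1/2 (scale 2/d1)
    resid = y - X @ beta
    prec = rng.gamma((v0 + T) / 2.0, 2.0 / (d0 + resid @ resid))
    draws.append(np.append(beta, 1.0 / prec))

post = np.array(draws[1_000:])
print(post.mean(axis=0))   # posterior means of (beta0, beta1, sig2)
```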
Chapter Nine

9.1 RANDOM WALKS AS THE I(1) BUILDING BLOCK: THE BEVERIDGE-NELSON DECOMPOSITION

Random Walks

Random walk:

yₜ = yₜ₋₁ + εₜ, εₜ ~ WN(0, σ²)

Random walk with drift:

yₜ = δ + yₜ₋₁ + εₜ, εₜ ~ WN(0, σ²)

Properties of the Random Walk

yₜ = y₀ + Σ_{i=1}^t εᵢ

yₜ = tδ + y₀ + Σ_{i=1}^t εᵢ (with drift)

The Dickey-Fuller studentized statistic:

τ̂ = (ρ̂ − 1) / ( s / √(Σ_{t=2}^T y²ₜ₋₁) )

Dickey-Fuller regressions:

yₜ = ρyₜ₋₁ + eₜ
yₜ = c + ρyₜ₋₁ + eₜ
yₜ = c + δt + ρyₜ₋₁ + eₜ

Simulating the critical values (continued):

5. Repeat N times, yielding {τ̂ᵢ, τ̂ᵢ^μ, τ̂ᵢ^τ}_{i=1}^N
6. Sort and compute fractiles
7. Fit response surfaces
AR(p):

yₜ + Σ_{j=1}^p φⱼ yₜ₋ⱼ = εₜ

can be rewritten as

yₜ = ρ₁ yₜ₋₁ + Σ_{j=2}^p ρⱼ (yₜ₋ⱼ₊₁ − yₜ₋ⱼ) + εₜ

where p ≥ 2, ρ₁ = −Σ_{j=1}^p φⱼ, and ρᵢ = Σ_{j=i}^p φⱼ, i = 2, ..., p.

Studentized statistic: τ̂.

Allowing for Nonzero Mean Under the Alternative

(yₜ − μ) + Σ_{j=1}^p φⱼ (yₜ₋ⱼ − μ) = εₜ

yₜ = α + ρ₁ yₜ₋₁ + Σ_{j=2}^p ρⱼ (yₜ₋ⱼ₊₁ − yₜ₋ⱼ) + εₜ

where α = μ(1 + Σ_{j=1}^p φⱼ). Studentized statistic: τ̂^μ.

Allowing for Trend Under the Alternative

(yₜ − a − bt) + Σ_{j=1}^p φⱼ (yₜ₋ⱼ − a − b(t − j)) = εₜ

yₜ = k₁ + k₂ t + ρ₁ yₜ₋₁ + Σ_{j=2}^p ρⱼ (yₜ₋ⱼ₊₁ − yₜ₋ⱼ) + εₜ

k₁ = a(1 + Σ_{i=1}^p φᵢ) − b Σ_{i=1}^p iφᵢ

k₂ = b(1 + Σ_{i=1}^p φᵢ)

Under the unit-root null, 1 + Σ_{i=1}^p φᵢ = 0, so k₁ = −b Σ_{i=1}^p iφᵢ and k₂ = 0. Studentized statistic: τ̂^τ.
In practice the lag order is data-determined, with truncation k:

yₜ = ρ₁ yₜ₋₁ + Σ_{j=2}^{k−1} ρⱼ (yₜ₋ⱼ₊₁ − yₜ₋ⱼ) + εₜ

yₜ = α + ρ₁ yₜ₋₁ + Σ_{j=2}^{k−1} ρⱼ (yₜ₋ⱼ₊₁ − yₜ₋ⱼ) + εₜ

yₜ = k₁ + k₂ t + ρ₁ yₜ₋₁ + Σ_{j=2}^{k−1} ρⱼ (yₜ₋ⱼ₊₁ − yₜ₋ⱼ) + εₜ

Under the unit-root null, the studentized statistics do not converge to normal random variables: τ̂ →d a nonstandard (Dickey-Fuller) random variable, and T(ρ̂₁ − 1) →d a nonstandard random variable as well, so the tabulated Dickey-Fuller fractiles must be used.
Cointegration

x ~ CI(1,1) if

(1) xᵢ ~ I(1), i = 1, ..., N
(2) there exist 1 or more linear combinations zₜ = β′xₜ s.t. zₜ ~ I(0)

Example

xₜ = xₜ₋₁ + vₜ, vₜ ~ WN
yₜ = xₜ₋₁ + εₜ, εₜ ~ WN, εₜ ⊥ vₜ ∀ t, τ

⟹ (yₜ − xₜ) = εₜ − vₜ = I(0)
Cointegration and Attractor Sets

xₜ is N-dimensional but does not wander randomly in R^N; it is attracted to an (N − R)-dimensional subspace of R^N on which β′x = 0.

N: space dimension
R: number of cointegrating relationships
Attractor dimension = N − R (the number of underlying unit roots, the number of common trends)

Example

3-dimensional VAR(p), all variables I(1):

R = 0: no cointegration; x wanders throughout R³
R = 1: 1 cointegrating vector; x attracted to a 2-dim hyperplane in R³ given by β′x = 0
R = 2: 2 cointegrating vectors; x attracted to a 1-dim hyperplane (line) in R³ given by the intersection of two 2-dim hyperplanes, β₁′x = 0 and β₂′x = 0
R = 3: 3 cointegrating vectors; x attracted to a 0-dim hyperplane (point) in R³ given by the intersection of three 2-dim hyperplanes, β₁′x = 0, β₂′x = 0 and β₃′x = 0 (covariance stationary around E(x))
Cointegration Motivation: Dynamic Factor Structure

Factor structure with I(1) factors: (N − R) I(1) factors driving N variables. E.g., the single-factor model:

[y₁ₜ; ...; y_Nₜ] = [1; ...; 1] fₜ + [ε₁ₜ; ...; ε_Nₜ]

fₜ = fₜ₋₁ + ηₜ

R = (N − 1) cointegrating combinations: (y₂ₜ − y₁ₜ), ..., (y_Nₜ − y₁ₜ)

(N − R) = N − (N − 1) = 1 common trend
Cointegration Motivation: Optimal Forecasting

I(1) variables are always cointegrated with their optimal forecasts.

Example:

xₜ = xₜ₋₁ + εₜ
x̂ₜ₊ₕ|ₜ = xₜ
xₜ₊ₕ − x̂ₜ₊ₕ|ₜ = Σ_{i=1}^h εₜ₊ᵢ

(finite MA, always covariance stationary)

Cointegration Motivation: Long-Run Relation Augmented with Short-Run Dynamics

Δxₜ = Π xₜ₋₁ + Σ_{i=1}^{p−1} Bᵢ Δxₜ₋ᵢ + uₜ
Integration/Cointegration Status

Rank(Π) = 0: 0 cointegrating vectors, N underlying unit roots (all variables appropriately specified in differences)

Rank(Π) = N: N cointegrating vectors, 0 unit roots (all variables appropriately specified in levels)

Rank(Π) = R, 0 < R < N: xₜ ~ CI(1,1)

VECM ⟸ Cointegration

We can always write

Δxₜ = Σ_{i=1}^{p−1} Bᵢ Δxₜ₋ᵢ − Π xₜ₋₁ + uₜ

But under cointegration, rank(Π) = R < N, so

Π = α β′
(N×N) = (N×R)(R×N)

Δxₜ = Σ_{i=1}^{p−1} Bᵢ Δxₜ₋ᵢ − αβ′ xₜ₋₁ + uₜ = Σ_{i=1}^{p−1} Bᵢ Δxₜ₋ᵢ − α zₜ₋₁ + uₜ

VECM ⟹ Cointegration

Δxₜ = Σ_{i=1}^{p−1} Bᵢ Δxₜ₋ᵢ − αβ′ xₜ₋₁ + uₜ

Premultiply by β′:

β′Δxₜ = β′ Σ_{i=1}^{p−1} Bᵢ Δxₜ₋ᵢ − β′α β′xₜ₋₁ + β′uₜ

Because β′α is full rank, β′xₜ₋₁ must be stationary.
CHAPTER 9
Stationary-Nonstationary Decomposition
M0
(N N )
0
(R N )
x
=
(N 1)
CI combs
x =
com. trends
(N R) N
(Rows of to columns of )
Intuition Transforming the system by yields
xt =
p1
X
0
|{z}
Bi xti
i=1
0 xt1 + t
0 by orthogonality
0 1
1 0
! !
x1t
1 0
u1t
x2t
u2t
Dickey-Fuller form:
x1t
!
=
x2t
0 0
x1t1
1 1
x2t1
u1t
u2t
Example, Continued
x1t
x2t
!
=
!
0
1
u2t u1t
x1t
1 = 0
!
=
x2t x1t
x1t
Fractional integration: for −1/2 < d < 1/2, (1 − L)^d yₜ = εₜ is covariance stationary, where

(1 − L)^d = 1 − dL + (d(d − 1)/2!) L² − ...
9.9 NOTES
Chapter Ten

Linear / Gaussian:

αₜ = T αₜ₋₁ + R ηₜ
yₜ = Z αₜ + εₜ
ηₜ ~ N, εₜ ~ N

Linear / Non-Gaussian:

αₜ = T αₜ₋₁ + R ηₜ
yₜ = Z αₜ + εₜ
ηₜ ~ D_η, εₜ ~ D_ε

Non-Linear / Gaussian:

αₜ = Q(αₜ₋₁, ηₜ)
yₜ = G(αₜ, εₜ)
ηₜ ~ N, εₜ ~ N

Conditionally Gaussian special case (time-varying system matrices; White's theorem):

αₜ = Tₜ αₜ₋₁ + Rₜ ηₜ
yₜ = Zₜ αₜ + εₜ
ηₜ ~ N, εₜ ~ N

Non-Linear / Non-Gaussian:

αₜ = Q(αₜ₋₁, ηₜ)
yₜ = G(αₜ, εₜ)
ηₜ ~ D_η, εₜ ~ D_ε

Separable special case:

αₜ = Q(αₜ₋₁) + ηₜ
yₜ = G(αₜ) + εₜ
ηₜ ~ D_η, εₜ ~ D_ε

Time-varying system:

αₜ = Qₜ(αₜ₋₁, ηₜ)
yₜ = Gₜ(αₜ, εₜ)
ηₜ ~ D_ηₜ, εₜ ~ D_εₜ
With a partitioned state, η₁ₜ ~ N(0, Σ₁) and η₂ₜ ~ N(0, Σ₂).

Extensions to:

Richer α₁ dynamics (governing the observed y)
Richer α₂ dynamics
Dependence between α₁ₜ and α₂ₜ
Stochastic volatility model:

hₜ = ω + βhₜ₋₁ + ηₜ (transition), ηₜ ~ N(0, σ_η²)

rₜ = √(e^{hₜ}) εₜ (measurement), εₜ ~ N(0, 1)

Non-Gaussian variant:

hₜ = ω + βhₜ₋₁ + ηₜ (transition), with measurement innovation uₜ ~ D_u

Realized volatility measurement model:

IVₜ = γ IVₜ₋₁ + ηₜ
RVₜ = IVₜ + εₜ

where the measurement error εₜ represents the fact that RV is based on less than an infinite sampling frequency.
Microstructure Noise Model (Hasbrouck)
(Non-linear / non-Gaussian)

A Distributional Statement of the Kalman Filter

Multivariate Stochastic Volatility with Factor Structure

Approaches to the General Filtering Problem: Kitagawa (1987), numerical integration (linear / non-Gaussian); more recently, Monte Carlo integration.

Extended Kalman Filter (Non-Linear / Gaussian)

αₜ = Q(αₜ₋₁, ηₜ)
yₜ = G(αₜ, εₜ)
ηₜ ~ N, εₜ ~ N

Take first-order Taylor expansions of Q around aₜ₋₁ and of G around a_{t,t−1}, and use the Kalman filter on the approximated system.

Unscented Kalman Filter (Non-Linear / Gaussian)

Bayes Analysis of SSMs: Carlin-Polson-Stoffer 1992 JASA, single-move Gibbs sampler. (Many parts of the Gibbs iteration: the parameter vector, and then each observation of the state vector, period-by-period.)

The multi-move Gibbs sampler can handle non-Gaussian (via mixtures of normals), but not nonlinear. Single-move can handle nonlinear and non-Gaussian.
Expanding S(θ̂_ML) around θ yields:

S(θ̂_ML) ≈ S(θ) + S′(θ)(θ̂_ML − θ) = S(θ) + H(θ)(θ̂_ML − θ).

Noting that S(θ̂_ML) ≡ 0 and taking expectations yields:

0 ≈ S(θ) − I_{EX,H}(θ)(θ̂_ML − θ)

or

(θ̂_ML − θ) ≈ I_{EX,H}(θ)⁻¹ S(θ).
Regime Switching

We have emphasized dynamic linear models, which are tremendously important in practice. They're called linear because yₜ is a simple linear function of past y's or past ε's. In some forecasting situations, however, good statistical characterization of dynamics may require some notion of regime switching, as between "good" and "bad" states, which is a type of nonlinear model.

Models incorporating regime switching have a long tradition in business-cycle analysis, in which expansion is the good state, and contraction (recession) is the bad state. This idea is also manifest in the great interest in the popular press, for example, in identifying and forecasting turning points in economic activity. It is only within a regime-switching framework that the concept of a turning point has intrinsic meaning; turning points are naturally and immediately defined as the times separating expansions and contractions.

Threshold model:

yₜ = c^{(u)} + φ^{(u)} yₜ₋₁ + εₜ^{(u)}, if θ^{(u)} < yₜ₋d
yₜ = c^{(m)} + φ^{(m)} yₜ₋₁ + εₜ^{(m)}, if θ^{(l)} ≤ yₜ₋d ≤ θ^{(u)}
yₜ = c^{(l)} + φ^{(l)} yₜ₋₁ + εₜ^{(l)}, if θ^{(l)} > yₜ₋d
The superscripts indicate "upper," "middle," and "lower" regimes, and the regime operative at any time t depends on the observable past history of y, in particular on the value of yₜ₋d.

Markov-switching transition matrix:

M = [ p₀₀, 1−p₀₀ ; 1−p₁₁, p₁₁ ]

The ij-th element of M gives the probability of moving from state i (at time t − 1) to state j (at time t). Note that there are only two free parameters, the staying probabilities, p₀₀ and p₁₁. Let {yₜ}_{t=1}^T be the sample path of an observed time series that depends on {sₜ}_{t=1}^T such that the density of yₜ conditional upon {sₜ} is

f(yₜ|sₜ; θ) = (1/(√(2π) σ)) exp( −(yₜ − μ_{sₜ})² / (2σ²) ).

Thus, yₜ is Gaussian white noise with a potentially switching mean. The two means around which yₜ moves are of particular interest and may, for example, correspond to episodes of differing growth rates (booms and recessions, bull and bear markets, etc.).
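A short sketch simulating the two-state switching-mean model just described, with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(10)

p00, p11 = 0.95, 0.90                     # staying probabilities
mu, sigma = np.array([1.0, -1.0]), 1.0    # state-dependent means, common vol
T = 500

s = np.zeros(T, dtype=int)
y = np.zeros(T)
for t in range(1, T):
    stay = p00 if s[t - 1] == 0 else p11
    s[t] = s[t - 1] if rng.uniform() < stay else 1 - s[t - 1]
    y[t] = mu[s[t]] + sigma * rng.standard_normal()

print(y[:10], s[:10])
```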
Chapter Eleven

Volatility Dynamics
Conditional Sharpe:

E(rᵢₜ − r_fₜ)/σₜ

Asset Pricing II: CAPM

Standard CAPM:

(rᵢₜ − r_fₜ) = α + β(r_mₜ − r_fₜ)

β = cov((rᵢₜ − r_fₜ), (r_mₜ − r_fₜ)) / var(r_mₜ − r_fₜ)

Conditional CAPM:

βₜ = covₜ((rᵢₜ − r_fₜ), (r_mₜ − r_fₜ)) / varₜ(r_mₜ − r_fₜ)

Black-Scholes:

d₁ = [ln(S/K) + (r + σ²/2)τ] / (σ√τ)
d₂ = [ln(S/K) + (r − σ²/2)τ] / (σ√τ)

P_C = BS(σ, ...)

(Standard Black-Scholes options pricing.) Completely different when σ varies!
Hedging

Standard delta hedging:

ΔHₜ = δ ΔSₜ + uₜ

δ = cov(ΔHₜ, ΔSₜ) / var(ΔSₜ)

Dynamic hedging:

ΔHₜ = δₜ ΔSₜ + uₜ

δₜ = covₜ(ΔHₜ, ΔSₜ) / varₜ(ΔSₜ)

Trading

Standard case: no way to trade on fixed volatility.

Time-varying volatility I: options straddles, strangles, etc. Take a position according to whether P_C ≷ f(σ̂ₜ₊ₕ,ₜ, ...) (indirect).

Time-varying volatility II: volatility swaps, effectively futures contracts written on underlying realized volatility (direct).
Some Warm-Up

Unconditional Volatility Measures

Variance: σ² = E(rₜ − μ)² (or standard deviation: σ)
Mean Absolute Deviation: MAD = E|rₜ − μ|
Interquartile Range: IQR = 75% − 25%
[Figure: true exceedance probability (percent) by day number, days 1 through 1000.]

Value at Risk and Expected Shortfall:

p = ∫_{−∞}^{VaR^p_{T+1|T}} f_T(r_{T+1}) dr_{T+1}

ES^p_{T+1|T} = (1/p) ∫₀^p VaR^γ_{T+1|T} dγ
Exponential smoothing (RiskMetrics):

σₜ² = λ σ²ₜ₋₁ + (1 − λ) r²ₜ₋₁

σₜ² = Σ_{j=0}^∞ ωⱼ r²ₜ₋₁₋ⱼ, where ωⱼ = (1 − λ) λʲ

(Many initializations possible: r₁², the sample variance, etc.)

RM-VaR^p_{T+1|T} = σ̂_{T+1|T} Φ⁻¹_p

Random walk for variance; random walk plus noise model for squared returns; the volatility forecast at any horizon is the current smoothed value. But a flat volatility term structure is not realistic.

Rigorous Modeling I
Conditional Univariate Volatility Dynamics from Daily Data
Conditional Return Distributions

f(rₜ) vs. f(rₜ|Ωₜ₋₁)

Key 1: E(rₜ|Ωₜ₋₁). Are returns conditional-mean independent? Arguably yes: returns are (arguably) approximately serially uncorrelated, and (arguably) approximately free of additional non-linear conditional-mean dependence.

Key 2: var(rₜ|Ωₜ₋₁) = E((rₜ − μ)²|Ωₜ₋₁). Are returns conditional-variance independent? No way! Squared returns are serially correlated, often with very slow decay.
The Standard Model
(Linearly Indeterministic Process with iid Innovations)

yₜ = Σ_{i=0}^∞ bᵢ εₜ₋ᵢ, εₜ ~ iid(0, σ²), Σ_{i=0}^∞ bᵢ² < ∞, b₀ = 1

Conditional mean:

E(yₜ₊ₖ|Ωₜ) = Σ_{i=0}^∞ b_{k+i} εₜ₋ᵢ (varies with the conditioning information)

Conditional variance:

E([yₜ₊ₖ − E(yₜ₊ₖ|Ωₜ)]²|Ωₜ) = σ² Σ_{i=0}^{k−1} bᵢ² (depends only on the horizon k, not on the conditioning information)
ARCH(1) process: rₜ|Ωₜ₋₁ ~ N(0, hₜ), hₜ = ω + α r²ₜ₋₁

E(rₜ) = 0
E((rₜ − E(rₜ))²) = ω/(1 − α)
E(rₜ|Ωₜ₋₁) = 0
E([rₜ − E(rₜ|Ωₜ₋₁)]²|Ωₜ₋₁) = ω + α r²ₜ₋₁
GARCH(1,1) Process
(Generalized ARCH)

rₜ|Ωₜ₋₁ ~ N(0, hₜ)

hₜ = ω + α r²ₜ₋₁ + β hₜ₋₁

E(rₜ) = 0
E((rₜ − E(rₜ))²) = ω/(1 − α − β)
E(rₜ|Ωₜ₋₁) = 0
E([rₜ − E(rₜ|Ωₜ₋₁)]²|Ωₜ₋₁) = ω + α r²ₜ₋₁ + β hₜ₋₁

Gaussian log-likelihood:

ln L(θ; r_{p+1}, ..., r_T) ≈ −((T − p)/2) ln(2π) − (1/2) Σ_{t=p+1}^T ln hₜ(θ) − (1/2) Σ_{t=p+1}^T r²ₜ/hₜ(θ)
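A numpy sketch of the Gaussian GARCH(1,1) log-likelihood recursion; maximizing it (e.g., with scipy.optimize.minimize on the negative) gives the QMLE. Data-generating values are illustrative:

```python
import numpy as np

def garch11_loglik(params, r):
    """Gaussian log-likelihood of GARCH(1,1): h_t = w + a*r_{t-1}^2 + b*h_{t-1}."""
    w, a, b = params
    if w <= 0 or a < 0 or b < 0 or a + b >= 1:
        return -np.inf                      # enforce positivity/stationarity
    T = len(r)
    h = np.empty(T)
    h[0] = r.var()                          # initialize at the sample variance
    for t in range(1, T):
        h[t] = w + a * r[t - 1] ** 2 + b * h[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(h) + r**2 / h)

# Illustrative use on simulated data:
rng = np.random.default_rng(11)
T, w, a, b = 3_000, 0.05, 0.10, 0.85
r, h = np.zeros(T), np.zeros(T)
h[0] = w / (1 - a - b)
r[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, T):
    h[t] = w + a * r[t - 1] ** 2 + b * h[t - 1]
    r[t] = np.sqrt(h[t]) * rng.standard_normal()

print(garch11_loglik([w, a, b], r))
```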
Recall that RM weights squared returns as σₜ² = Σⱼ ωⱼ r²ₜ₋ⱼ, where ωⱼ = (1 − λ)λʲ. But in GARCH(1,1) we have:

hₜ = ω + α r²ₜ₋₁ + β hₜ₋₁

hₜ = ω/(1 − β) + α Σ_{j=1}^∞ β^{j−1} r²ₜ₋ⱼ
Variance Targeting

Sample unconditional variance:

σ̂² = (1/T) Σ_{t=1}^T rₜ²

We can constrain the model-implied unconditional variance σ² = ω/(1 − α − β) to equal σ̂² by constraining:

ω = (1 − α − β) σ̂²

Saves a degree of freedom and ensures reasonableness.
ARMA Representation in Squares

rₜ² has the ARMA(1,1) representation:

rₜ² = ω + (α + β) r²ₜ₋₁ − β νₜ₋₁ + νₜ,

where νₜ = rₜ² − hₜ.
Variations on the GARCH Theme

Regression with GARCH Disturbances:

yₜ = xₜ′β + εₜ

εₜ|Ωₜ₋₁ ~ N(0, hₜ)

Exogenous variables in the volatility equation:

hₜ = ω + α r²ₜ₋₁ + β hₜ₋₁ + γ′zₜ

where γ is a parameter vector and z is a set of positive exogenous variables.
Asymmetric Response and the Leverage Effect I: TARCH

Standard GARCH: hₜ = ω + α r²ₜ₋₁ + β hₜ₋₁

TARCH: hₜ = ω + α r²ₜ₋₁ + γ r²ₜ₋₁ Dₜ₋₁ + β hₜ₋₁

Dₜ = 1 if rₜ < 0; 0 otherwise

positive return (good news): α effect on volatility
negative return (bad news): α + γ effect on volatility
γ > 0: asymmetric volatility response, the "leverage effect"
Fat tails:

rₜ = √hₜ zₜ

zₜ ~ iid N(0, 1) already implies fat-tailed rₜ; for fatter tails still, take

zₜ ~ iid t_d / std(t_d)
Rigorous Modeling II
Conditional Univariate Volatility Dynamics from High-Frequency Data

Figure 11.10: Conditional Standard Deviation, History and Forecast, Daily NYSE Returns.
Dependent Variable: R
Method: ML - ARCH (Marquardt) - Student's t distribution
Date: 04/10/12 Time: 13:48
Sample (adjusted): 2 3461
Included observations: 3460 after adjustments
Convergence achieved after 19 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(4) + C(5)*RESID(-1)^2 + C(6)*RESID(-1)^2*(RESID(-1)<0) + C(7)*GARCH(-1)

Variable                    Coefficient   Std. Error   z-Statistic   Prob.
@SQRT(GARCH)                0.083360      0.053138     1.568753      0.1167
C                           1.28E-05      0.000372     0.034443      0.9725
R(-1)                       0.073763      0.017611     4.188535      0.0000

Variance Equation
C                           1.03E-06      2.23E-07     4.628790      0.0000
RESID(-1)^2                 0.014945      0.009765     1.530473      0.1259
RESID(-1)^2*(RESID(-1)<0)   0.094014      0.014945     6.290700      0.0000
GARCH(-1)                   0.922745      0.009129     101.0741      0.0000
T-DIST. DOF                 5.531579      0.478432     11.56188      0.0000
Figure 11.12: S&P500 Daily Returns and Volatilities (Percent), 1990-2010. The top panel shows daily S&P500 returns, and the bottom panel shows daily S&P500 realized volatility. We compute realized volatility as the square root of AvgRV, where AvgRV is the average of five daily RVs each computed from 5-minute squared returns on a 1-minute grid of S&P500 futures prices.
Realized Variance:

RVₜ(Δ) = Σ_{j=1}^{N(Δ)} ( p_{t−1+jΔ} − p_{t−1+(j−1)Δ} )²

RVₜ(Δ) → IVₜ = ∫_{t−1}^t σ²(τ) dτ (as Δ → 0)

Microstructure Noise

State space signal extraction
AvgRV
Realized kernel
Many others

RV is Persistent
RV is Reasonably Approximated as Log-Normal
RV is Long-Memory

Exact and Approximate Long Memory

Exact long memory:

(1 − L)^d RVₜ = β₀ + νₜ
Figure 11.13: S&P500: QQ Plots for Realized Volatility and Log Realized Volatility. The top panel plots the quantiles of daily realized volatility against the corresponding normal quantiles. The bottom panel plots the quantiles of the natural logarithm of daily realized volatility against the corresponding normal quantiles. We compute realized volatility as the square root of AvgRV, where AvgRV is the average of five daily RVs each computed from 5-minute squared returns on a 1-minute grid of S&P500 futures prices.
RV-VaR^p_{T+1|T} = R̂V_{T+1|T} Φ⁻¹_p

GARCH-RV:

σₜ² = ω + β σ²ₜ₋₁ + γ RVₜ₋₁
Figure 11.14: S&P500: Sample Autocorrelations of Daily Realized Variance and Daily Return. The top panel shows realized variance autocorrelations, and the bottom panel shows return autocorrelations, for displacements from 1 through 250 days. Horizontal lines denote 95% Bartlett bands. Realized variance is AvgRV, the average of five daily RVs each computed from 5-minute squared returns on a 1-minute grid of S&P500 futures prices.
Jumps: quadratic variation decomposes into integrated variance plus jump variation, where

JVₜ = Σ_{j=1}^{Jₜ} J²ₜ,ⱼ

Total variation:

TVₜ(Δ) = Σ_{j=1}^{N(Δ)} (Δpₜ,ⱼ)²

Bi-Power Variation:

BPVₜ(Δ) = (π/2) (N(Δ)/(N(Δ)−1)) Σ_{j=1}^{N(Δ)−1} |Δpₜ,ⱼ| |Δpₜ,ⱼ₊₁|

Minimum:

MinRVₜ(Δ) = (π/(π−2)) (N(Δ)/(N(Δ)−1)) Σ_{j=1}^{N(Δ)−1} [ min(|Δpₜ,ⱼ|, |Δpₜ,ⱼ₊₁|) ]²

where Δpₜ,ⱼ = p_{t−1+jΔ} − p_{t−1+(j−1)Δ}. BPV and MinRV are jump-robust estimators of integrated variance.
An N×N covariance matrix has N(N+1)/2 distinct elements.

Rₜ = Σₜ^{1/2} Zₜ, Zₜ ~ i.i.d.(0, I), where Σₜ^{1/2} is a square root (e.g., the Cholesky factor) of Σₜ.

Multivariate RM:

Σₜ = λ Σₜ₋₁ + (1 − λ) Rₜ₋₁ Rₜ₋₁′

Assumes that the dynamics of all the variances and covariances are driven by a single scalar parameter λ (identical smoothness).

Guarantees that the smoothed covariance matrices are p.d. so long as Σ₀ is p.d.

A common strategy is to set Σ₀ equal to the sample covariance matrix (1/T) Σ_{t=1}^T Rₜ Rₜ′ (which is p.d. if T > N).

But covariance matrix forecasts inherit the implausible scaling properties of the univariate RM forecasts and will in general be suboptimal.
Multivariate GARCH(1,1)

vech(Σₜ) = vech(C) + B vech(Σₜ₋₁) + A vech(Rₜ₋₁ Rₜ₋₁′)

Scalar version:

vech(Σₜ) = vech(C) + (βI) vech(Σₜ₋₁) + (αI) vech(Rₜ₋₁ Rₜ₋₁′)

Mirrors RM, but with the important difference that the Σₜ forecasts now revert to

Σ = (1 − α − β)⁻¹ C

Fewer parameters than the diagonal model, but still O(N²) (because of C).

Encouraging Parsimony: Covariance Targeting

Recall variance targeting: σ̂² = (1/T) Σ_{t=1}^T rₜ², and take ω = (1 − α − β) σ̂².

Covariance targeting is the multivariate analog:

vech(C) = (1 − α − β) vech( (1/T) Σ_{t=1}^T Rₜ Rₜ′ )

and the analogous correlation target for standardized returns is (1/T) Σ_{t=1}^T êₜ êₜ′.
Figure 11.15: Time-Varying International Equity Correlations. The figure shows the estimated equicorrelations from a DECO model for the aggregate equity index returns for 16 different developed markets from 1973 through 2009.
The updating rule is naturally given by the average conditional correlation of the standardized returns,

uₜ = ( Σ_{i=1}^N Σ_{j>i} eᵢ,ₜ eⱼ,ₜ ) / ( Σ_{i=1}^N e²ᵢ,ₜ )
Factor structure:

Rₜ = λ Fₜ + νₜ

where

Fₜ = Σ_{F,t}^{1/2} Zₜ, Zₜ ~ i.i.d.(0, I), νₜ ~ i.i.d.(0, Σ_ν)

⟹ Σₜ = λ Σ_{F,t} λ′ + Σ_ν

One-Factor Case with Everything Orthogonal

Rₜ = λ fₜ + νₜ

where

fₜ = σ_{f,t} zₜ, zₜ ~ i.i.d.(0, 1), νₜ ~ i.i.d.(0, Σ_ν)

⟹ Σₜ = σ²_{f,t} λλ′ + Σ_ν

σ²ᵢₜ = σ²_{f,t} λᵢ² + σ²_{νᵢ}

σᵢⱼₜ = σ²_{f,t} λᵢ λⱼ
Rigorous Modeling IV
Conditional Asset-Level (Multivariate) Volatility Dynamics from High-Frequency Data

Realized Covariance:

RCovₜ(Δ) = Σ_{j=1}^{N(Δ)} R_{t−1+jΔ,Δ} R′_{t−1+jΔ,Δ}

RCovₜ(Δ) → ICovₜ = ∫_{t−1}^t Σ(τ) dτ

Figure 11.16: QQ Plot of S&P500 Returns. We show quantiles of daily S&P500 returns from January 2, 1990 to December 31, 2010, against the corresponding quantiles from a standard normal distribution.

Shrinkage:

Σ̂_S = λ RCovₜ(Δ) + (1 − λ) Γₜ

where Γₜ is p.d. and 0 < λ < 1. Candidate shrinkage targets:

Γₜ = I (naive benchmark)
Γₜ = Σ̂ (unconditional covariance matrix)
Γₜ = σ̂²_f λ̂λ̂′ + Σ̂_ν (one-factor market model)

Multivariate GARCH-RV:

vech(Σₜ) = vech(C) + B vech(Σₜ₋₁) + A vech(RCovₜ₋₁)

Fine for 1-step; multi-step requires closing the system with an RV equation. Noureldin et al. (2011), multivariate HEAVY.
Rigorous Modeling V
Distributions

Modeling Entire Return Distributions: Returns are not Unconditionally Gaussian

Gaussian conditional expected shortfall, which integrates over the left tail, would be terrible. So we want more accurate assessments of things like VaR^p_{T+1|T} than those obtained under Gaussian assumptions. Doing so for all values of p ∈ [0, 1] requires estimating the entire conditional return distribution. More generally, best-practice risk measurement is about tracking the entire conditional return distribution.
Observation-Driven Density Forecasting Using r = σε and GARCH

Assume:

r_{T+1} = σ_{T+1/T} ε_{T+1}, ε_{T+1} ~ iid(0, 1)

Multiply ε_{T+1} draws by σ_{T+1/T} (fixed across draws, from a GARCH model) to build up the conditional density of r_{T+1}:

ε_{T+1} simulated from a standard normal
ε_{T+1} simulated from a standard t
ε_{T+1} simulated from a kernel density fit to the standardized returns rₜ/σ_{t/t−1}

Parameter-driven version using r = σε and SV:

r_{T+1} = σ_{T+1} ε_{T+1}, ε_{T+1} ~ iid(0, 1)

Multiply ε_{T+1} draws by σ_{T+1} draws (from a simulated SV model) to build up the conditional density of r_{T+1}. Again, ε_{T+1} may be simulated from any density deemed relevant.
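A sketch of the observation-driven construction: given a hypothetical one-step GARCH volatility forecast, build the conditional density of r_{T+1} from simulated standardized innovations (normal or standardized t here, both illustrative):

```python
import numpy as np

rng = np.random.default_rng(12)

sigma_f = 1.3        # hypothetical one-step GARCH forecast sigma_{T+1/T}
N = 100_000

r_normal = sigma_f * rng.standard_normal(N)                  # eps ~ N(0,1)
d = 6.0
r_t = sigma_f * rng.standard_t(d, N) / np.sqrt(d / (d - 2))  # standardized t

# Conditional density summaries, e.g., 1% VaR under each innovation assumption:
print(np.quantile(r_normal, 0.01), np.quantile(r_t, 0.01))
```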
Modeling Entire Return Distributions: Returns Standardized by RV are Approximately Gaussian

A Special Parameter-Driven Density Forecasting Approach Using r = σε and RV (Log-Normal / Normal Mixture)

Assume:

r_{T+1} = σ_{T+1} ε_{T+1}, ε_{T+1} ~ iid(0, 1)

Multiply ε_{T+1} draws from N(0, 1) by σ_{T+1} draws (from a simulated RV model fit to log realized standard deviation) to build up the conditional density of r_{T+1}.

In the conditionally Gaussian case there is no loss of generality in writing r_{T+1} = σ_{T+1/T} ε_{T+1}, ε_{T+1} ~ iid N(0, 1). But in the conditionally non-Gaussian case there is potential loss of generality in writing r_{T+1} = σ_{T+1/T} ε_{T+1}, ε_{T+1} ~ iid(0, 1), because there may be time variation in conditional moments other than σ_{T+1/T}, and using ε_{T+1} ~ iid(0, 1) assumes that away.
Multivariate Return Distributions

If reliable realized covariances are available, one could do a multivariate analog of the earlier lognormal/normal mixture model. But the literature thus far has focused primarily on conditional distributions for daily data.

Return version:

Zₜ = Σₜ^{−1/2} Rₜ, Zₜ ~ i.i.d.

Standardized-return version:

eₜ = Dₜ⁻¹ Rₜ, E_{t−1}(eₜ) = 0, Var_{t−1}(eₜ) = Γₜ

where Dₜ denotes the diagonal matrix of conditional standard deviations for each of the assets, and Γₜ refers to the potentially time-varying conditional correlation matrix.

Leading Examples

Multivariate normal.

Multivariate t:

f(eₜ) = C(d, Γₜ) ( 1 + eₜ′Γₜ⁻¹eₜ/(d − 2) )^{−(d+N)/2}
Multivariate asymmetric t:

f(eₜ) = C(d, Γₜ) × [ K_{(d+N)/2}( √((d + eₜ′Γₜ⁻¹eₜ) ξ′Γₜ⁻¹ξ) ) exp(eₜ′Γₜ⁻¹ξ) ] / [ (1 + eₜ′Γₜ⁻¹eₜ/d)^{(d+N)/2} ( √((d + eₜ′Γₜ⁻¹eₜ) ξ′Γₜ⁻¹ξ) )^{−(d+N)/2} ]

where K is the modified Bessel function of the second kind and ξ is an N×1 asymmetry-parameter vector.

More flexible than the symmetric t, but it requires estimation of N asymmetry parameters simultaneously with the other parameters, which is challenging in high dimensions. Copula methods sometimes provide a simpler two-step approach.
Copula Methods

Sklar's Theorem:

F(e) = G( F₁(e₁), ..., F_N(e_N) ) ≡ G(u₁, ..., u_N) ≡ G(u)

f(e) = ∂^N G(F₁(e₁), ..., F_N(e_N)) / (∂e₁ ... ∂e_N) = g(u) Π_{i=1}^N fᵢ(eᵢ)

log L = Σ_{t=1}^T log g(uₜ) + Σ_{t=1}^T Σ_{i=1}^N log fᵢ(eᵢ,ₜ)

Standard Copulas

Normal:

g(uₜ; Γₜ*) ∝ |Γₜ*|^{−1/2} exp( −(1/2) Φ⁻¹(uₜ)′ (Γₜ*⁻¹ − I) Φ⁻¹(uₜ) )

where Φ⁻¹(uₜ) refers to the N×1 vector of standard inverse univariate normals, and the correlation matrix Γₜ* pertains to the N×1 vector eₜ* with typical element

eᵢ,ₜ* = Φ⁻¹(uᵢ,ₜ) = Φ⁻¹(Fᵢ(eᵢ,ₜ)).

Often does not allow for sufficient dependence between tail events.

t copula. Asymmetric t copula.
Asymmetric Tail Correlations

Multivariate Distribution Simulation (General Case)

Simulate using:

Rₜ = Σₜ^{1/2} Zₜ, Zₜ ~ i.i.d.(0, I)

Zₜ may be drawn from parametrically-fitted (Gaussian, t, ...) or nonparametrically-fitted distributions.
Figure 11.19: Average Threshold Correlations for Sixteen Developed Equity Markets. The solid line shows the average empirical threshold correlation for GARCH residuals across sixteen developed equity markets. The dashed line shows the threshold correlations implied by a multivariate standard normal distribution with constant correlation. The line with square markers shows the threshold correlations from a DECO model estimated on the GARCH residuals from the 16 equity markets. The figure is based on weekly returns from 1973 to 2009.

Factor-structure simulation:

Fₜ = Σ_{F,t}^{1/2} Z_{F,t}

Rₜ = λ Fₜ + νₜ

Z_{F,t} and νₜ may be drawn from parametrically- or nonparametrically-fitted distributions, or with replacement from the empirical distribution.
Rigorous Modeling VI
Risk, Return and Macroeconomic Fundamentals

We Want to Understand the Financial / Real Connections

Statistical vs. scientific models. Returns ↔ fundamentals: r ↔ f. Disconnect? ("Excess volatility," "disconnect," "conundrum," ...). The links among μ_r, σ_r, μ_f, σ_f are complex. Volatilities as intermediaries?

For Example...
                      Mean Recession        Standard   Sample
                      Volatility Increase   Error      Period
Aggregate Returns     43.5%                 3.8%       63Q1-09Q3
Firm-Level Returns    28.6%                 6.7%       69Q1-09Q2

Table 11.1: Stock Return Volatility During Recessions. Aggregate stock-return volatility is quarterly realized standard deviation based on daily return data. Firm-level stock-return volatility is the cross-sectional inter-quartile range of quarterly returns.
                     Mean Recession        Standard   Sample
                     Volatility Increase   Error      Period
Aggregate Growth     37.5%                 7.3%       62Q1-09Q2
Firm-Level Growth    23.1%                 3.5%       67Q1-08Q3

Table 11.2: Real Growth Volatility During Recessions. Aggregate real-growth volatility is quarterly conditional standard deviation. Firm-level real-growth volatility is the cross-sectional inter-quartile range of quarterly real sales growth.
f → σ_r: Return Volatility is Higher in Recessions

Schwert's (1989) "failure": very hard to link market risk to expected fundamentals (leverage, corporate profitability, etc.). Actually a great success: the key observation of robustly higher return volatility in recessions! Earlier: Officer (1973). Later: Hamilton and Lin (1996), Bloom et al. (2009). Extends to business cycle effects in credit spreads via the Merton model.

f → σ_r, Continued

Bloom et al. (2009) results.

f → σ_f: Fundamental Volatility is Higher in Recessions

More Bloom, Floetotto and Jaimovich (2009) results.

σ_f → σ_r: Return Vol is Positively Related to Fundamental Vol

Follows immediately from relationships already documented. Moreover, direct explorations provide direct evidence: Engle et al. (2006), time series.
[Figure: Real Stock Return Volatility and Real PCE Growth Volatility, 1983-2002.]

σ_r ↔ μ_r?

Risk-Return Tradeoffs (or Lack Thereof)

Studied at least since Markowitz.
ARCH-M characterization:

Rₜ = β₀ + β₁ Xₜ + β₂ σₜ + εₜ

σₜ² = ω + α r²ₜ₋₁ + β σ²ₜ₋₁
[Table: risk-return regressions, specifications (1) through (7), coefficients with standard errors in parentheses on regressors including DPₜ, TERMₜ, CAYₜ, DEFₜ, and gᵉₜ; the volatility term enters negatively, around −0.20 (0.09), across specifications.]
The Key Lesson

The business cycle is of central importance for both μ_r and σ_r. This highlights the importance of high-frequency business cycle monitoring: we need to interact high-frequency real activity with high-frequency financial market activity, e.g., the Aruoba-Diebold-Scotti real-time framework at the Federal Reserve Bank of Philadelphia.

Conclusions

Reliable risk measurement requires conditional models that allow for time-varying volatility.

Risk measurement may be done using univariate volatility models. Many important recent developments.

High-frequency return data contain a wealth of volatility information.

Other tasks require multivariate models. Many important recent developments, especially for N large. Factor structure is often useful.

The business cycle emerges as a key macroeconomic fundamental driving risk.

New developments in high-frequency macro monitoring yield high-frequency real activity data to match high-frequency financial market data.
Models for Non-Negative Variables (from Minchul)

Introduction

Motivation: why do we need dynamic models for positive values? Volatility: time-varying conditional variances.
Alternative models: ACD (autoregressive conditional duration) by Engle and Russell (1998), and its extension through dynamic conditional score models (Harvey (2013); Creal, Koopman, and Lucas (2013)).

Autoregressive Gamma Processes

Autoregressive Gamma Process (ARG): Definition. Yₜ follows the autoregressive gamma process if:

Measurement: Yₜ | Zₜ ~ Gamma(δ + Zₜ, c)

Transition: Zₜ | Yₜ₋₁ ~ Poisson(β Yₜ₋₁)

Conditional moments:

E(Yₜ|Yₜ₋₁) = ρ Yₜ₋₁ + δc
V(Yₜ|Yₜ₋₁) = 2ρc Yₜ₋₁ + δc²
Corr(Yₜ, Yₜ₋ₕ) = ρʰ

where ρ = βc > 0. The process is stationary when ρ < 1.

Conditional over-dispersion: conditional over-dispersion exists if and only if

V(Yₜ|Yₜ₋₁) > E(Yₜ|Yₜ₋₁)²

When ρ < 1, the stationary ARG process features marginal over-dispersion, and the process may feature either conditional under- or over-dispersion, depending on the value of Yₜ₋₁. Remark: the ACD (autoregressive conditional duration) model assumes path-independent over-dispersion.
Continuous-time limit of ARG(1): the stationary ARG process is a discretized version of the CIR process,

dYₜ = a(b − Yₜ) dt + σ √Yₜ dWₜ

where

a = −log ρ, σ² = 2c (−log ρ)/(1 − ρ), b = δc/(1 − ρ)

Application: Yₜ = interquote durations of the Dayton Mining stock traded on the Toronto Stock Exchange in October 1998. Estimation based on QMLE.
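A sketch simulating the ARG(1) gamma-Poisson mixture above; the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(13)

delta, c, beta = 1.5, 0.2, 4.0      # rho = beta*c = 0.8
T = 1_000

y = np.empty(T)
y[0] = delta * c / (1 - beta * c)   # start at the stationary mean
for t in range(1, T):
    z = rng.poisson(beta * y[t - 1])          # transition draw
    y[t] = rng.gamma(delta + z, c)            # measurement draw (scale c)

print(y.mean(), delta * c / (1 - beta * c))   # compare to stationary mean
```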
Extension: Creal (2013) considers the following non-linear state space.

Measurement: yₜ ~ p(yₜ|hₜ, xₜ; θ), where xₜ is an exogenous regressor.

Transition:

hₜ ~ Gamma(δ + zₜ, c)
zₜ ~ Poisson(β hₜ₋₁)

Example 1: Stochastic volatility. Measurement: yₜ = √hₜ eₜ, eₜ ~ N(0, 1). Transition: as above.

Example 2: Stochastic duration and intensity models. Measurement: yₜ ~ Gamma(α, hₜ exp(xₜ′γ)). Transition: as above.

Example 3: Stochastic count models. Measurement: yₜ ~ Poisson(hₜ exp(xₜ′γ)). Transition: as above.
Recent extension: ARG-zero processes. Monfort, Pegoraro, Renne, and Roussellet (2014) extend the ARG process to accommodate zero-lower-bound spells: with δ = 0, the gamma measurement distribution places a point mass at zero whenever Zₜ = 0, so the process can stay exactly at zero for extended spells.

Conditional moments:

E[Yₜ|Yₜ₋₁] = αc + ρ Yₜ₋₁

V(Yₜ|Yₜ₋₁) = 2αc² + 2cρ Yₜ₋₁

where ρ = βc.

Figure: ARG-zero sample path.
Autoregressive conditional duration model (ACD): yₜ follows the ACD model if

yₜ = ψₜ eₜ, E[eₜ] = 1

ψₜ = w + β ψₜ₋₁ + γ yₜ₋₁

Because of its multiplicative form, it is classified as a multiplicative error model (MEM).

Conditional moments:

E[yₜ|y_{1:t−1}] = ψₜ
V(yₜ|y_{1:t−1}) = k₀ ψₜ²

Conditional over-dispersion is path-independent:

V(yₜ|y_{1:t−1}) / E[yₜ|y_{1:t−1}]² = k₀

Recall that the ARG process can have path-dependent over-dispersion.

Dynamic conditional score (DCS) model: the dynamic conditional score model (or generalized autoregressive score (GAS) model) is a general class of observation-driven models. An observation-driven model is a time-varying parameter model in which the time-varying parameter is a function of histories of observables, for example GARCH and ACD. The DCS (GAS) model encompasses GARCH, ACD, and other observation-driven models. It is a convenient and general modelling strategy; I will describe it within the MEM class of models.

DCS Example: ACD. Recall

yₜ = ψₜ eₜ, E[eₜ] = 1

ψₜ = w + β ψₜ₋₁ + γ yₜ₋₁

Instead, we apply the DCS principle: give me a conditional likelihood and time-varying parameters, and I will give you a law of motion.

yₜ = ψₜ eₜ, eₜ ~ Gamma(α, 1/α)
Appendices

Appendix A

Useful Books
Ait-Sahalia, Y. and Hansen, L.P. eds. (2010), Handbook of Financial Econometrics. Amsterdam: North-Holland.
Ait-Sahalia, Y. and Jacod, J. (2014), High-Frequency Financial Econometrics, Princeton
University Press.
Beran, J., Feng, Y., Ghosh, S. and Kulik, R. (2013), Long-Memory Processes: Probabilistic
Properties and Statistical Methods, Springer.
Box, G.E.P. and Jenkins, G.W. (1970), Time Series Analysis, Forecasting and Control,
Prentice-Hall.
Davidson, R. and MacKinnon, J. (1993), Estimation and Inference in Econometrics, Oxford
University Press.
Diebold, F.X. (1998), Elements of Forecasting, South-Western.
Douc, R., Moulines, E. and Stoffer, D.S. (2014), Nonlinear Time Series: Theory, Methods,
and Applications with R Examples, Chapman and Hall.
Durbin, J. and Koopman, S.J. (2001), Time Series Analysis by State Space Methods, Oxford
University Press.
Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, Chapman and
Hall.
Elliott, G., Granger, C.W.J. and Timmermann, A., eds. (2006), Handbook of Economic
Forecasting, Volume 1, North-Holland.
Elliott, G., Granger, C.W.J. and Timmermann, A., eds. (2013), Handbook of Economic
Forecasting, Volume 2, North-Holland.
Engle, R.F. and McFadden, D., eds. (1995), Handbook of Econometrics, Volume 4, North-Holland.
Whittle, P. (1963), Prediction and Regulation by Linear Least Squares Methods, University
of Minnesota Press.
Zellner, A. (1971), An Introduction to Bayesian Inference in Econometrics, John Wiley and
Sons.
Appendix B

B.1 DIFFUSIONS

A key general reference is Karatzas and Shreve (1991).

Continuous-time white noise with finite variance is hard to define. (Why? See, for example, Priestley, 1980, pp. 156-158.) Continuous-time analogs of random walks are easier to define, so they wind up playing a more crucial role as building blocks for continuous-time processes. A diffusion is a process with Markovian structure and a continuous (but non-differentiable) sample path. Some important special cases:

Standard Brownian motion:

dx = dW,

where, heuristically, W(t) = ∫₀ᵗ ξ(u) du for continuous-time white noise ξ, and the increments satisfy

(W(t) − W(s)) ~ iid N(0, t − s), 0 ≤ s ≤ t.

Brownian motion is fundamental, because processes with richer dynamics are built up from it, via location and scale shifts. W stands for Wiener process. Standard Brownian motion is the simplest example of the slightly more general Wiener process.

Wiener process: standard Brownian motion shifted and scaled,

dx = μ dt + σ dW.
Figure: Gaussian random walk with drift, optimal point and interval forecasts.

The Wiener process arises as the continuous limit of a discrete-time binomial tree. Discrete periods Δt: each period the process moves up by Δh w.p. p, and down by Δh w.p. 1 − p. If we take limits as Δt → 0 and adjust Δh and p appropriately (as they depend on Δt), we obtain the Wiener process. Useful for simplified derivatives pricing, as in Cox, Ross and Rubinstein (1979, JFE).

Wiener process subject to reflecting barriers:

dx = μ dt + σ dW, s.t. |x| < c

Figure: Gaussian random walk with drift subject to reflecting barriers.

A stationary distribution exists. Symmetry depends on whether drift exists, and if so, on the sign of the drift. The process arises as the continuous limit of a discrete-time binomial tree subject to reflecting barriers. Discrete periods Δt: each period the process moves up by Δh w.p. p, and down by Δh w.p. 1 − p, except that if it tries to move to c or −c, it is prohibited from doing so.
Ito process:

dx = μ(x, t) dt + σ(x, t) dW

An important generalization of the Wiener process.

Geometric Brownian motion:

dx = μx dt + σx dW.

A simple and important Ito process.

Figure: Exp of a logarithmic Gaussian random walk with drift, optimal point and interval forecasts.

Ornstein-Uhlenbeck process:

dx = (α + βx) dt + σ dW.

A simple and important Ito process. Reverts to a mean of −α/β. Priestley (1980) shows how it arises as one passes to continuous time when starting from a discrete-time AR(1) process.

Figure: Gaussian AR(1) with nonzero mean, optimal point and interval forecasts.

A level-dependent-volatility variant:

dx = (α + βx) dt + σx dW.

This is an important example of an Ito process, this time heteroskedastic, as the variance depends on the level.
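A sketch of Euler-Maruyama discretization for the Ornstein-Uhlenbeck process above; the step size and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(14)

alpha, beta, sigma = 1.0, -0.5, 0.3    # mean reverts to -alpha/beta = 2.0
dt, n = 0.01, 10_000

x = np.empty(n)
x[0] = 0.0
for i in range(1, n):
    # dx = (alpha + beta*x) dt + sigma dW, with dW ~ N(0, dt)
    dW = np.sqrt(dt) * rng.standard_normal()
    x[i] = x[i - 1] + (alpha + beta * x[i - 1]) * dt + sigma * dW

print(x.mean(), -alpha / beta)   # long-run level near 2.0
```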
Generalized CIR process (Chan et al., JOF 1992; Kroner et al.):

dr = (a + br) dt + σ r^γ dW

Discrete approximation:¹

r_{1,t} = a + b̃ r_{1,t−1} + ε_{1,t}, where b̃ ≡ (1 + b).

More precisely, let r_{1,t} denote a 1-aggregated series at time t. Then

r_{1,t} = a Σ_{i=0}^{h−1} b̃ⁱ + b̃ʰ r_{1,t−h} + Σ_{i=0}^{h−1} b̃ⁱ ε_{1,t−i}.

For the h-aggregated series,

r_{h,t} = a Σ_{i=0}^{h−1} b̃ⁱ + b̃ʰ r_{h,t−1} + ε_{h,t}

ε_{h,t}|Ωₜ₋₁ ~ N( 0, Σ_{i=0}^{h−1} b̃²ⁱ var(ε_{1,t−i}) ),

where

var(ε_{1,t−i}) = σ² r^{2γ}_{1,t−i−1}.

¹ We change the notation from x to r, in keeping with the fact that the models are commonly used for interest rates.

Note that, although the discretization interval must be set by the investigator, and is therefore subject to discretion, from that point on the parameter estimates are (asymptotically) invariant to the data recording interval.
Diffusion limit of GARCH (Nelson, 1990; Drost-Werker, 1996):

drₜ = σₜ dW_{pt}
dσₜ² = θ(ω̄ − σₜ²) dt + √(2λ) σₜ² dW_{st}

A jump component may be appended as

dP = 0 w.p. 1 − λ dt; dP = u w.p. λ dt.

Ito's Lemma: for F(x, t) with dx = μ(x, t) dt + σ(x, t) dW,

dF = [ ∂F/∂t + μ(x, t) ∂F/∂x + (1/2) σ²(x, t) ∂²F/∂x² ] dt + σ(x, t) (∂F/∂x) dW

Ito's Lemma is central because we often need to characterize the diffusion followed by a function of an underlying diffusion, as in derivatives pricing. As an example, suppose that x follows a geometric Brownian motion. Then a simple application of Ito's Lemma reveals that ln x follows the simple Wiener process:

dF = (μ − (1/2)σ²) dt + σ dW, where F = ln x.

Hence ln x is the continuous-time limit of a logarithmic Gaussian random walk.
B.2 JUMPS
B.3 QUADRATIC VARIATION, BI-POWER VARIATION, AND MORE
B.4 INTEGRATED AND REALIZED VOLATILITY
B.5 REALIZED COVARIANCE MATRIX MODELING IN BIG DATA MULTIVARIATE ENVIRONMENTS
B.6 EXERCISES, PROBLEMS AND COMPLEMENTS
1. A key problem in nonparametric estimation.
Estimate the drift function f(t) in

dyₜ = f(t) dt + (1/√N) dW, t ∈ [0, 1],

consistently (as N → ∞).
B.7 NOTES
Appendix C

cov(εᵢₜ, εⱼₜ) = σᵢⱼ, Σ = [σᵢⱼ], i = 1, ..., N; t = 1, ..., T

Matrix form: yⁱ = Xⁱ βⁱ + εⁱ, i = 1, ..., N

Stacked version:

[ y¹ ; y² ; ... ; y^N ] = [ X¹ 0 ... 0 ; 0 X² ... 0 ; ... ; 0 0 ... X^N ] [ β¹ ; β² ; ... ; β^N ] + [ ε¹ ; ε² ; ... ; ε^N ]

or

y = Xβ + ε, cov(ε) = Σ ⊗ I

β̂_SUR = ( X′(Σ̂⁻¹ ⊗ I)X )⁻¹ X′(Σ̂⁻¹ ⊗ I)y

Compare the general GLS estimator,

β̂_GLS = (X′Ω⁻¹X)⁻¹ X′Ω⁻¹y

GLS = OLS if Σ is diagonal (obvious), or if the same regressors appear in each equation (not so obvious). So for VARs, just do equation-by-equation OLS.