Hayley Fisher
2 December 2010
Key references: Wooldridge (2009) part II, Greene (2008) chapter 21.
Time series
To illustrate some basic features of Stata with time series, I am using a dataset of the general fertility rate
and personal tax exemptions for the US from 1913 to 1984. This is based on an example in chapter 10 of
Wooldridge (2009), and ultimately on an article by Whittington, Alm and Peters from 1990. The dataset is
available from my website.
We start off by summarizing the small dataset.
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gfr |        72    95.63194    19.80464       65.4      126.6
          pe |        72    100.4015    65.87563          0     243.83
        year |        72      1948.5    20.92845       1913       1984
        pill |        72    .3055556    .4638749          0          1
         ww2 |        72    .0694444    .2559923          0          1
gfr is the general fertility rate, and we are looking to explain it using personal tax exemption (pe),
whether World War II was occurring, and whether the contraceptive pill was available. Stata makes it easy
to create variables such as lags, leads and first differences. To take advantage of these we first need to declare
the data to be time series using the tsset command:
. tsset year
        time variable:  year, 1913 to 1984
                delta:  1 unit
Once this has been done, we can create variables using the lag (L.), lead (F.) and first difference (D.)
operators:
. generate Lgfr=L.gfr
(1 missing value generated)
. generate Fgfr=F.gfr
(1 missing value generated)
. generate Dgfr=D.gfr
(1 missing value generated)
A missing value is created each time at the beginning, or end, of the dataset. We can check that these are
correct by listing the first five values:
. list year gfr Lgfr Fgfr Dgfr in 1/5

     +------------------------------------------+
     | year     gfr    Lgfr    Fgfr        Dgfr |
     |------------------------------------------|
  1. | 1913   124.7       .   126.6           . |
  2. | 1914   126.6   124.7     125    1.900002 |
  3. | 1915     125   126.6   123.4   -1.599998 |
  4. | 1916   123.4     125     121   -1.599998 |
  5. | 1917     121   123.4   119.8   -2.400002 |
     +------------------------------------------+
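To see what the three operators compute, the following is a minimal sketch on a made-up ten-observation series. The variable y and the names Ly, Fy and Dy are illustrative only and are not part of the fertility dataset. It checks the operators against explicit subscripting, which gives the same answer when the time variable has no gaps.

```stata
* Hypothetical toy data: a deterministic series 102, 104, ..., 120
clear
set obs 10
generate year = 1912 + _n
generate double y = 100 + 2*_n
tsset year

* Lag, lead and first difference via the time-series operators
generate double Ly = L.y
generate double Fy = F.y
generate double Dy = D.y

* With no gaps in year, the operators match explicit subscripting
assert Ly == y[_n-1] if _n > 1
assert Fy == y[_n+1] if _n < _N
assert Dy == y - y[_n-1] if _n > 1

* The lag and difference are missing in the first row, the lead in the last
assert missing(Ly) & missing(Dy) if _n == 1
assert missing(Fy) if _n == _N

list year y Ly Fy Dy in 1/5
```

Unlike subscripting, the operators respect the time variable declared by tsset, so they remain correct if the data are not sorted or contain gaps.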
We can naively estimate the relationship between the general fertility rate and these variables using
regress.
. regress gfr pe ww2 pill, vce(robust)

Linear regression                                      Number of obs =      72
                                                       F(  3,    68) =   51.57
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4734
                                                       Root MSE      =  14.685

------------------------------------------------------------------------------
             |               Robust
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |     .08254   .0269359     3.06   0.003     .0287902    .1362898
         ww2 |   -24.2384   3.351355    -7.23   0.000    -30.92592   -17.55087
        pill |  -31.59403   3.131765   -10.09   0.000    -37.84337   -25.34469
       _cons |   98.68176   4.222996    23.37   0.000      90.2549    107.1086
------------------------------------------------------------------------------

As would be expected, the fertility appears to be lower during World War II and when the contraceptive
pill is available. For the reported standard errors, and hence inference, to be valid we require no serial
correlation in the error terms. We can test for this using the Durbin-Watson statistic, implemented by
typing estat dwatson after running the regression.
. estat dwatson

Durbin-Watson d-statistic(  4,    72) =  .1768727
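For reference, the statistic is built from the OLS residuals and is roughly two times one minus their first-order autocorrelation. The notation below (residuals \hat{u}_t, autocorrelation \hat{\rho}, sample size T) is the standard textbook one and is an assumption here rather than something taken from this handout. Plugging in the reported value gives a rough implied autocorrelation.

```latex
d = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{T} \hat{u}_t^2}
  \approx 2\,(1 - \hat{\rho}),
\qquad
\hat{\rho} \approx 1 - \frac{d}{2} = 1 - \frac{0.1768727}{2} \approx 0.91 .
```

A value of d close to 2 would correspond to residuals with first-order autocorrelation close to zero.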
The statistic is far below 2 and so indicates serial correlation. Stata can also implement two Lagrange
Multiplier tests for serial correlation, using estat durbinalt and estat bgodfrey.
. estat durbinalt

Durbin's alternative test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |        255.261               1                   0.0000
---------------------------------------------------------------------------
                        H0: no serial correlation

. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |         57.031               1                   0.0000
---------------------------------------------------------------------------
                        H0: no serial correlation
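The three tests can be tried on simulated data where the answer is known. The following is a minimal sketch, assuming a made-up regression with AR(1) errors; the variable names, the seed and the coefficient 0.8 are all hypothetical. It recomputes the Durbin-Watson statistic from the residuals and then asks both LM tests to consider serial correlation of orders 1 to 3 through the lags() option.

```stata
* Simulated data: every name and parameter value here is hypothetical
clear
set obs 72
set seed 2010
generate year = 1912 + _n
tsset year
generate double x = rnormal()

* Build AR(1) errors recursively; 0.8 is an arbitrary illustrative value
generate double u = rnormal()
replace u = 0.8*u[_n-1] + rnormal() in 2/l
generate double y = 1 + 0.5*x + u

regress y x
estat dwatson
scalar d_stata = r(dw)

* Recompute d: sum of squared residual changes over sum of squared residuals
predict double uhat, residuals
generate double num = (uhat - L.uhat)^2
generate double den = uhat^2
quietly summarize num
scalar num_sum = r(sum)
quietly summarize den
scalar d_hand = num_sum / r(sum)
display "d by hand = " d_hand "   estat dwatson = " d_stata

* LM tests for serial correlation of orders 1, 2 and 3
estat durbinalt, lags(1 2 3)
estat bgodfrey, lags(1 2 3)
```

The regression here uses the default standard errors, so it is not an exact replica of the robust regression in the text.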
Again, serial correlation is detected. There are several ways of dealing with serial correlation in the error
terms. One is to add lags of the dependent variable. Here we add two lags; because the lag operators can
be used directly in the variable list, there is no need to generate the lagged variables first.
. regress gfr L.gfr L2.gfr pe ww2 pill

      Source |       SS       df       MS              Number of obs =      70
-------------+------------------------------           F(  5,    64) =  341.66
       Model |  25053.8199     5  5010.76397           Prob > F      =  0.0000
    Residual |  938.613043    64  14.6658288           R-squared     =  0.9639
-------------+------------------------------           Adj R-squared =  0.9611
       Total |  25992.4329    69  376.701926           Root MSE      =  3.8296

------------------------------------------------------------------------------
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gfr |
         L1. |   1.076351   .1214261     8.86   0.000     .8337748    1.318928
         L2. |   -.175926   .1156576    -1.52   0.133    -.4069785    .0551266
          pe |   .0227773   .0086381     2.64   0.010     .0055207     .040034
         ww2 |  -2.545189   2.096259    -1.21   0.229    -6.732947    1.642569
        pill |  -4.824531   1.439447    -3.35   0.001    -7.700157   -1.948904
       _cons |   8.143783   3.182315     2.56   0.013     1.786377    14.50119
------------------------------------------------------------------------------

. estat durbinalt

Durbin's alternative test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          0.498               1                   0.4805
---------------------------------------------------------------------------
                        H0: no serial correlation

. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          0.549               1                   0.4589
---------------------------------------------------------------------------
                        H0: no serial correlation
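When several lags are needed, the lag operator also accepts a list of lags, which saves typing. The following is a minimal sketch on simulated data (the variables x and y and all parameter values are hypothetical) showing that L(1/2).y gives the same regression as writing L.y and L2.y separately.

```stata
* Simulated data: names and parameter values are hypothetical
clear
set obs 72
set seed 2010
generate year = 1912 + _n
tsset year
generate double x = rnormal()
generate double y = rnormal()
replace y = 0.7*y[_n-1] + 0.5*x + rnormal() in 2/l

* Lags written out one by one
regress y L.y L2.y x
scalar b_L1 = _b[L.y]
scalar b_L2 = _b[L2.y]

* The same model using a list of lags inside the operator
regress y L(1/2).y x
assert reldif(_b[L.y],  b_L1) < 1e-12
assert reldif(_b[L2.y], b_L2) < 1e-12

* Two lags cost two observations at the start of the sample: 72 - 2 = 70
display "observations used: " e(N)
```

The loss of two observations matches the drop from 72 to 70 observations in the regression output above.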
. regress gfr pe ww2 pill, vce(robust)

Linear regression                                      Number of obs =      72
                                                       F(  3,    68) =   51.57
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4734
                                                       Root MSE      =  14.685

------------------------------------------------------------------------------
             |               Robust
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |     .08254   .0269359     3.06   0.003     .0287902    .1362898
         ww2 |   -24.2384   3.351355    -7.23   0.000    -30.92592   -17.55087
        pill |  -31.59403   3.131765   -10.09   0.000    -37.84337   -25.34469
       _cons |   98.68176   4.222996    23.37   0.000      90.2549    107.1086
------------------------------------------------------------------------------

. newey gfr pe ww2 pill, lag(2)

Regression with Newey-West standard errors          Number of obs  =        72
maximum lag: 2                                      F(  3,    68)  =     25.84
                                                    Prob > F       =    0.0000

------------------------------------------------------------------------------
             |             Newey-West
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |     .08254   .0436866     1.89   0.063    -.0046352    .1697153
         ww2 |   -24.2384   3.615538    -6.70   0.000    -31.45309    -17.0237
        pill |  -31.59403   5.016629    -6.30   0.000    -41.60456   -21.58351
       _cons |   98.68176   7.003884    14.09   0.000     84.70572    112.6578
------------------------------------------------------------------------------

The Newey-West procedure adjusts the standard errors for serial correlation in the error term up to the
chosen maximum lag. The coefficients are unchanged, but allowing for serial correlation up to order 2
increases every standard error relative to the robust ones above.
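The effect of the lag() option can be explored on simulated data. The following is a minimal sketch with made-up variables and parameter values. It relies on my understanding that newey with lag(0) reproduces the heteroskedasticity-robust standard errors from regress with vce(robust); treat that equivalence as an assumption to check against the Stata documentation.

```stata
* Simulated data with AR(1) errors: names and values are hypothetical
clear
set obs 72
set seed 2010
generate year = 1912 + _n
tsset year
generate double x = rnormal()
replace x = 0.8*x[_n-1] + rnormal() in 2/l
generate double u = rnormal()
replace u = 0.8*u[_n-1] + rnormal() in 2/l
generate double y = 1 + 0.5*x + u

* Heteroskedasticity-robust standard errors
regress y x, vce(robust)
scalar se_robust = _se[x]

* Newey-West with no lags (assumed to match the robust standard errors)
newey y x, lag(0)
scalar se_nw0 = _se[x]

* Newey-West allowing serial correlation up to order 2
newey y x, lag(2)
scalar se_nw2 = _se[x]

display "robust: " se_robust "  lag(0): " se_nw0 "  lag(2): " se_nw2
```

The point estimates are identical across the three commands; only the standard errors differ.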
Alternatively, it is possible to model the serial correlation in the error terms directly using the arima
command. For example, suppose we believe that the error terms are best modelled by a first-order
autoregressive process, so that (3) below should be estimated:
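\begin{align}
gfr_t &= X_t \beta + u_t \tag{1} \\
u_t &= \rho u_{t-1} + \varepsilon_t \tag{2} \\
gfr_t &= X_t \beta + \rho u_{t-1} + \varepsilon_t \tag{3}
\end{align}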
Using arima with the option ar(1) achieves this. Extra autoregressive lags can be added, as can moving
average components using the option ma(i) where i is the order of the moving average term.
. arima gfr pe ww2 pill, ar(1)

(setting optimization to BHHH)
Iteration 0:   log likelihood = -231.52653
Iteration 1:   log likelihood = -221.50334
Iteration 2:   log likelihood = -217.64887
Iteration 3:   log likelihood = -217.50191
Iteration 4:   log likelihood = -215.36991
(switching optimization to BFGS)
Iteration 5:   log likelihood = -214.43403
Iteration 6:   log likelihood = -211.39705
Iteration 7:   log likelihood = -209.80199
Iteration 8:   log likelihood = -207.01082
Iteration 9:   log likelihood = -206.41696
Iteration 10:  log likelihood = -206.12935
Iteration 11:  log likelihood = -205.96173
Iteration 12:  log likelihood = -205.95487
Iteration 13:  log likelihood = -205.95375
Iteration 14:  log likelihood =  -205.9535
(switching optimization to BHHH)
Iteration 15:  log likelihood = -205.95347
Iteration 16:  log likelihood = -205.95347

ARIMA regression

Sample:  1913 - 1984                            Number of obs      =        72
                                                Wald chi2(4)       =    791.75
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
         gfr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
gfr          |
          pe |   -.025783   .0313082    -0.82   0.410     -.087146    .0355799
         ww2 |  -5.033364   1.873287    -2.69   0.007    -8.704939    -1.36179
        pill |  -4.253375   19.34541    -0.22   0.826    -42.16968    33.66292
       _cons |   98.88224    18.5853     5.32   0.000     62.45571    135.3088
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |    .982849    .040655    24.18   0.000     .9031666    1.062531
-------------+----------------------------------------------------------------
      /sigma |   4.129038   .3477984    11.87   0.000     3.447366     4.81071
------------------------------------------------------------------------------

Having explicitly modelled the autocorrelation we see that the personal exemption and pill variables are
no longer statistically significant.
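The ar() and ma() options described above can be combined in several ways. The following is a minimal sketch on simulated data with AR(1) errors; the variable names, the seed and the coefficient 0.8 are hypothetical, and the nolog option is used only to suppress the iteration log.

```stata
* Simulated regression with AR(1) errors: names and values are hypothetical
clear
set obs 72
set seed 2010
generate year = 1912 + _n
tsset year
generate double x = rnormal()
generate double u = rnormal()
replace u = 0.8*u[_n-1] + rnormal() in 2/l
generate double y = 1 + 0.5*x + u

* Regression with first-order autoregressive errors
arima y x, ar(1) nolog

* Add a second autoregressive lag
arima y x, ar(1/2) nolog

* First-order autoregressive errors plus a first-order moving average term
arima y x, ar(1) ma(1) nolog
```

Because the simulated errors are AR(1) by construction, the extra AR and MA terms should typically be estimated close to zero, although this will vary with the seed.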
However, it is also hypothesised that there may be a lag in the response to the personal tax exemption.
We should therefore include lagged values of the personal exemption.
. regress gfr pe L.pe L2.pe pill ww2, vce(robust)

Linear regression                                      Number of obs =      70
                                                       F(  5,    64) =   31.21
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4986
                                                       Root MSE      =   14.27

------------------------------------------------------------------------------
             |               Robust
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |
         --. |   .0726718   .0979877     0.74   0.461    -.1230812    .2684248
         L1. |  -.0057796   .1132301    -0.05   0.959    -.2319826    .2204235
         L2. |   .0338268   .0894437     0.38   0.707    -.1448575    .2125111
        pill |  -31.30499   3.123293   -10.02   0.000    -37.54448   -25.06549
         ww2 |   -22.1265   6.950919    -3.18   0.002    -36.01256    -8.24044
       _cons |    95.8705   4.284189    22.38   0.000     87.31185    104.4291
------------------------------------------------------------------------------

The coefficients on the lags of pe are imprecisely estimated, but we can test their joint significance using the
test command introduced above:
. test pe L.pe L2.pe

 ( 1)  pe = 0
 ( 2)  L.pe = 0
 ( 3)  L2.pe = 0

       F(  3,    64) =    4.67
            Prob > F =    0.0051
The three coefficients are jointly significantly different from zero at the 1% level.
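The same pattern works for any subset of the lag coefficients, and the test results are stored for later use. The following is a minimal sketch on simulated data (variable names and parameter values are hypothetical) that tests all three terms jointly and then only the two lags.

```stata
* Simulated distributed-lag model: names and values are hypothetical
clear
set obs 72
set seed 2010
generate year = 1912 + _n
tsset year
generate double x = rnormal()
replace x = 0.9*x[_n-1] + rnormal() in 2/l
generate double y = 1 + 0.3*x + 0.2*L.x + 0.1*L2.x + rnormal()

regress y x L.x L2.x

* Joint test that the current value and both lags have zero coefficients
test x L.x L2.x
display "F(" r(df) ", " r(df_r) ") = " r(F) ",  p = " r(p)

* Joint test of the two lags only
test L.x L2.x
display "F(" r(df) ", " r(df_r) ") = " r(F) ",  p = " r(p)
```

Highly persistent regressors, as simulated here, make the individual lag coefficients hard to pin down even when they matter jointly, which is the situation in the regression above.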
We may also want to include a time trend. A linear trend and its square can be generated from the
observation number once the data are sorted by date:
. sort year
. generate t=_n
. generate t2=t^2
These are easily included in a regression model. It remains important to test for the presence of serial
correlation in the error term.
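Putting these steps together, the following is a minimal sketch on simulated data; the variables x and y and the trend coefficient are hypothetical. It builds the trend terms, includes them in a regression, and then re-runs the Breusch-Godfrey test.

```stata
* Simulated trending series: names and parameter values are hypothetical
clear
set obs 72
set seed 2010
generate year = 1912 + _n
tsset year
generate double x = rnormal()
generate double y = 1 + 0.5*x + 0.1*_n + rnormal()

* Linear and quadratic trend from the observation number
sort year
generate t = _n
generate t2 = t^2

* With annual data and no gaps, t is just year minus a constant
assert t == year - 1912

regress y x t t2
estat bgodfrey
```

If the time variable had gaps, _n would no longer track calendar time, and a trend built directly from year would be the safer choice.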
However, we must also consider the possibility of a unit root in the series we are trying to explain. Time
series can be simply displayed using the tsline command once the data have been declared to be time series.
Graphing the fertility rate suggests that a unit root is very likely.
. tsline gfr

[Figure: time series plot of gfr against year, 1913 to 1984]
To investigate further we can look at the autocorrelations of both the fertility rate and its first difference.
These too strongly suggest a unit root which is removed by first differencing the series.
. ac gfr, lags(10)

[Figure: autocorrelations of gfr for lags up to 10, with Bartlett's formula for MA(q) 95% confidence bands]

. ac Dgfr, lags(10)

[Figure: autocorrelations of Dgfr for lags up to 10, with Bartlett's formula for MA(q) 95% confidence bands]
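The same information can be viewed as a table rather than a graph. The following is a minimal sketch using a simulated random walk (the variable rw and the seed are hypothetical) and the corrgram command, which is not used elsewhere in this handout. It also shows that the difference operator can be applied directly, without first generating the differenced variable.

```stata
* Simulated random walk: the name rw and the seed are hypothetical
clear
set obs 72
set seed 2010
generate year = 1912 + _n
tsset year
generate double rw = sum(rnormal())

* Tabulated autocorrelations of the level, which decay slowly
corrgram rw, lags(10)

* Tabulated and graphed autocorrelations of the first difference
corrgram D.rw, lags(10)
ac D.rw, lags(10)
```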
A formal test for a unit root can be conducted using a Dickey-Fuller test and the dfuller command.
. dfuller gfr
Dickey-Fuller test for unit root                   Number of obs   =        71
                                                   Number of obs   =        70
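The test can be tried on a series that has a unit root by construction. The following is a minimal sketch using a simulated random walk; the variable rw and the seed are hypothetical. It runs the basic test, a version with one lagged difference and a trend, and the test on the first difference.

```stata
* Simulated random walk: the name rw and the seed are hypothetical
clear
set obs 72
set seed 2010
generate year = 1912 + _n
tsset year
generate double rw = sum(rnormal())

* Basic Dickey-Fuller test on the level
dfuller rw

* Augmented version with one lagged difference and a time trend
dfuller rw, lags(1) trend

* Test on the first difference, which should be stationary here
dfuller D.rw
display "Z(t) = " r(Zt) ",  approximate p-value = " r(p)
display "observations used: " r(N)
```

With 72 observations, the test on the level uses 71 and the test on the first difference uses 70, since each differencing or lagging step costs one observation.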
References
Greene, William H., Econometric Analysis, 6th ed., Pearson/Prentice Hall, 2008.
Wooldridge, Jeffrey M., Introductory Econometrics: A Modern Approach, 4th ed., South Western /
Cengage Learning, 2009.