
Introduction to Stata Handout 5: Time Series

Hayley Fisher
2 December 2010
Key references: Wooldridge (2009) part II, Greene (2008) chapter 21.

Time series

To illustrate some basic features of Stata with time series, I am using a dataset of the general fertility rate
and personal tax exemptions for the US from 1913 to 1984. This is based on an example in chapter 10 of
Wooldridge (2009), and ultimately on an article by Whittington, Alm and Peters from 1990. The dataset is
available from my website.
We start off by summarizing the small dataset.
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gfr |        72    95.63194    19.80464       65.4      126.6
          pe |        72    100.4015    65.87563          0     243.83
        year |        72      1948.5    20.92845       1913       1984
        pill |        72    .3055556    .4638749          0          1
         ww2 |        72    .0694444    .2559923          0          1
gfr is the general fertility rate, and we are looking to explain it using the personal tax exemption (pe),
whether World War II was occurring (ww2), and whether the contraceptive pill was available (pill). Stata makes it easy
to create variables such as lags, leads and first differences. To take advantage of these we first need to declare
the data to be time series using the tsset command:
. tsset year
        time variable:  year, 1913 to 1984
                delta:  1 unit

Once this has been done, we can create variables using the lag (L.), lead (F.) and first difference (D.)
operators:
. generate Lgfr=L.gfr
(1 missing value generated)
. generate Fgfr=F.gfr
(1 missing value generated)
. generate Dgfr=D.gfr
(1 missing value generated)
A missing value is created in each case, at the beginning or end of the dataset. We can check that these are
correct by listing the first five values:

. list year gfr Lgfr Fgfr Dgfr in 1/5

     +------------------------------------------+
     | year     gfr    Lgfr    Fgfr        Dgfr |
     |------------------------------------------|
  1. | 1913   124.7       .   126.6           . |
  2. | 1914   126.6   124.7     125    1.900002 |
  3. | 1915     125   126.6   123.4   -1.599998 |
  4. | 1916   123.4     125     121   -1.599998 |
  5. | 1917     121   123.4   119.8   -2.400002 |
     +------------------------------------------+
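For readers following along outside Stata, the lag, lead and first-difference operators can be mimicked with pandas; a minimal sketch using the first few values listed above (the DataFrame layout is my own, not part of the handout's dataset file):

```python
import pandas as pd

# First five years of the fertility series, copied from the listing above
df = pd.DataFrame(
    {"gfr": [124.7, 126.6, 125.0, 123.4, 121.0]},
    index=pd.Index([1913, 1914, 1915, 1916, 1917], name="year"),
)

df["Lgfr"] = df["gfr"].shift(1)   # lag, Stata's L.gfr
df["Fgfr"] = df["gfr"].shift(-1)  # lead, Stata's F.gfr
df["Dgfr"] = df["gfr"].diff()     # first difference, Stata's D.gfr
```

As in Stata, a missing value (NaN) appears at the start of the lag and difference series, and at the end of the lead series.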

We can naively estimate the relationship between the general fertility rate and these variables using
regress.
. regress gfr pe ww2 pill, vce(robust)

Linear regression                                      Number of obs =      72
                                                       F(  3,    68) =   51.57
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4734
                                                       Root MSE      =  14.685

------------------------------------------------------------------------------
             |               Robust
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |     .08254   .0269359     3.06   0.003     .0287902    .1362898
         ww2 |   -24.2384   3.351355    -7.23   0.000    -30.92592   -17.55087
        pill |  -31.59403   3.131765   -10.09   0.000    -37.84337   -25.34469
       _cons |   98.68176   4.222996    23.37   0.000      90.2549    107.1086
------------------------------------------------------------------------------

As would be expected, fertility appears to be lower during World War II and when the contraceptive
pill is available. For these results to be consistent we require no serial correlation in the error terms. We
can test for this using the Durbin-Watson statistic, implemented by typing estat dwatson after running
the regression.
. estat dwatson

Durbin-Watson d-statistic(  4,    72) =  .1768727
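The d statistic itself is straightforward to compute from a residual series; a minimal NumPy sketch of the formula (the function name is my own, not Stata's):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2).

    d is near 2 when there is no first-order serial correlation, near 0
    under strong positive correlation, and near 4 under strong negative
    correlation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
```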

The statistic is far below 2 and so indicates serial correlation. Stata can also implement two Lagrange
Multiplier tests for serial correlation, using estat durbinalt and estat bgodfrey.
. estat durbinalt

Durbin's alternative test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |        255.261               1                   0.0000
---------------------------------------------------------------------------
                        H0: no serial correlation

. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |         57.031               1                   0.0000
---------------------------------------------------------------------------
                        H0: no serial correlation
Again, serial correlation is detected. There are several ways of dealing with serial correlation in the error
terms. One method is to add lags of the dependent variable. Here we add two lags; it is not necessary to
create them as separate variables before running the regression.
. regress gfr L.gfr L2.gfr pe ww2 pill

      Source |       SS       df       MS              Number of obs =      70
-------------+------------------------------           F(  5,    64) =  341.66
       Model |  25053.8199     5  5010.76397           Prob > F      =  0.0000
    Residual |  938.613043    64  14.6658288           R-squared     =  0.9639
-------------+------------------------------           Adj R-squared =  0.9611
       Total |  25992.4329    69  376.701926           Root MSE      =  3.8296

------------------------------------------------------------------------------
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gfr |
         L1. |   1.076351   .1214261     8.86   0.000     .8337748    1.318928
         L2. |   -.175926   .1156576    -1.52   0.133    -.4069785    .0551266
          pe |   .0227773   .0086381     2.64   0.010     .0055207     .040034
         ww2 |  -2.545189   2.096259    -1.21   0.229    -6.732947    1.642569
        pill |  -4.824531   1.439447    -3.35   0.001    -7.700157   -1.948904
       _cons |   8.143783   3.182315     2.56   0.013     1.786377    14.50119
------------------------------------------------------------------------------

. estat durbinalt

Durbin's alternative test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          0.498               1                   0.4805
---------------------------------------------------------------------------
                        H0: no serial correlation

. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          0.549               1                   0.4589
---------------------------------------------------------------------------
                        H0: no serial correlation


Adding these lags removes the serial correlation and reduces the point estimates of the coefficients. Note
that with a lagged dependent variable the standard Durbin-Watson statistic is not appropriate.
Instead of removing the serial correlation from the error terms, we could attempt to correct the standard
errors using newey. Here we must specify the order of serial correlation, that is, the number of lags over
which to calculate the serial correlation in the errors. I include the heteroskedasticity-robust results, not
adjusted for serial correlation, for comparison:
. regress gfr pe ww2 pill, vce(robust)

Linear regression                                      Number of obs =      72
                                                       F(  3,    68) =   51.57
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4734
                                                       Root MSE      =  14.685

------------------------------------------------------------------------------
             |               Robust
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |     .08254   .0269359     3.06   0.003     .0287902    .1362898
         ww2 |   -24.2384   3.351355    -7.23   0.000    -30.92592   -17.55087
        pill |  -31.59403   3.131765   -10.09   0.000    -37.84337   -25.34469
       _cons |   98.68176   4.222996    23.37   0.000      90.2549    107.1086
------------------------------------------------------------------------------

. newey gfr pe ww2 pill, lag(2)

Regression with Newey-West standard errors             Number of obs =      72
maximum lag: 2                                         F(  3,    68) =   25.84
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
             |             Newey-West
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |     .08254   .0436866     1.89   0.063    -.0046352    .1697153
         ww2 |   -24.2384   3.615538    -6.70   0.000    -31.45309    -17.0237
        pill |  -31.59403   5.016629    -6.30   0.000    -41.60456   -21.58351
       _cons |   98.68176   7.003884    14.09   0.000     84.70572    112.6578
------------------------------------------------------------------------------

Adjusting for serial correlation increases the standard errors, and the adjustment grows as the specified
order of serial correlation in the error term increases. The coefficient estimates themselves are unchanged;
only the standard errors are corrected.
Alternatively, it is possible to model the serial correlation in the error terms directly using the arima
command. For example, suppose the error terms are best modelled by a first-order autoregressive process,
as in (1) and (2); then the quasi-differenced equation (3) below should be estimated:

    gfr_t = X_t β + u_t                                      (1)

    u_t = ρ u_{t-1} + ε_t                                    (2)

    gfr_t = X_t β + ρ (gfr_{t-1} - X_{t-1} β) + ε_t          (3)

Using arima with the option ar(1) achieves this. Extra autoregressive lags can be added, as can moving
average components using the option ma(i) where i is the order of the moving average term.
. arima gfr pe ww2 pill, ar(1)

(setting optimization to BHHH)
Iteration 0:   log likelihood = -231.52653
Iteration 1:   log likelihood = -221.50334
Iteration 2:   log likelihood = -217.64887
Iteration 3:   log likelihood = -217.50191
Iteration 4:   log likelihood = -215.36991
(switching optimization to BFGS)
Iteration 5:   log likelihood = -214.43403
Iteration 6:   log likelihood = -211.39705
Iteration 7:   log likelihood = -209.80199
Iteration 8:   log likelihood = -207.01082
Iteration 9:   log likelihood = -206.41696
Iteration 10:  log likelihood = -206.12935
Iteration 11:  log likelihood = -205.96173
Iteration 12:  log likelihood = -205.95487
Iteration 13:  log likelihood = -205.95375
Iteration 14:  log likelihood = -205.9535
(switching optimization to BHHH)
Iteration 15:  log likelihood = -205.95347
Iteration 16:  log likelihood = -205.95347

ARIMA regression

Sample:  1913 - 1984                            Number of obs      =        72
                                                Wald chi2(4)       =    791.75
Log likelihood = -205.9535                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
         gfr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
gfr          |
          pe |   -.025783   .0313082    -0.82   0.410     -.087146    .0355799
         ww2 |  -5.033364   1.873287    -2.69   0.007    -8.704939     -1.36179
        pill |  -4.253375   19.34541    -0.22   0.826    -42.16968    33.66292
       _cons |   98.88224    18.5853     5.32   0.000     62.45571    135.3088
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |    .982849    .040655    24.18   0.000     .9031666    1.062531
-------------+----------------------------------------------------------------
      /sigma |   4.129038   .3477984    11.87   0.000     3.447366      4.81071
------------------------------------------------------------------------------

Having explicitly modelled the autocorrelation, we see that the personal exemption and pill variables are
no longer statistically significant.
However, it is also hypothesised that there may be a lag in the response to the personal tax exemption.
We should therefore include lagged values of the personal exemption.
. regress gfr pe L.pe L2.pe pill ww2, vce(robust)

Linear regression                                      Number of obs =      70
                                                       F(  5,    64) =   31.21
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4986
                                                       Root MSE      =   14.27

------------------------------------------------------------------------------
             |               Robust
         gfr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          pe |
         --. |   .0726718   .0979877     0.74   0.461    -.1230812    .2684248
         L1. |  -.0057796   .1132301    -0.05   0.959    -.2319826    .2204235
         L2. |   .0338268   .0894437     0.38   0.707    -.1448575    .2125111
        pill |  -31.30499   3.123293   -10.02   0.000    -37.54448   -25.06549
         ww2 |   -22.1265   6.950919    -3.18   0.002    -36.01256    -8.24044
       _cons |    95.8705   4.284189    22.38   0.000     87.31185    104.4291
------------------------------------------------------------------------------

The coefficients on the lags of pe are imprecisely estimated, but we can test their joint significance using
the test command:
. test pe L.pe L2.pe

 ( 1)  pe = 0
 ( 2)  L.pe = 0
 ( 3)  L2.pe = 0

       F(  3,    64) =    4.67
            Prob > F =    0.0051
The three coefficients are jointly significantly different from zero at the 1% level.
We may also want to include a time trend. A linear trend (plus a quadratic) can be generated once the data
are sorted by date:
. sort year
. generate t=_n
. generate t2=t^2
These are easily included in a regression model. It remains important to test for the presence of serial
correlation in the error term.
However, we must also consider the possibility of a unit root in the series we are trying to explain. Time
series can be simply displayed using the tsline command once the data have been declared to be time series.
Graphing the fertility rate suggests that a unit root is very likely.

. tsline gfr

[Figure: line plot of gfr (births per 1,000 women aged 15-44) against year, 1913 to 1984]
To investigate further we can look at the autocorrelations of both the fertility rate and its first difference.
These too strongly suggest a unit root, which is removed by first differencing the series.

. ac gfr, lags(10)

[Figure: autocorrelations of gfr for lags 1 to 10, with 95% confidence bands from Bartlett's formula for
MA(q) processes]

. ac Dgfr, lags(10)

[Figure: autocorrelations of Dgfr for lags 1 to 10, with 95% confidence bands from Bartlett's formula for
MA(q) processes]

A formal test for a unit root can be conducted using a Dickey-Fuller test and the dfuller command.
. dfuller gfr

Dickey-Fuller test for unit root                   Number of obs   =        71

                              ---------- Interpolated Dickey-Fuller ----------
                 Test          1% Critical       5% Critical      10% Critical
              Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)            -0.857            -3.551            -2.913            -2.592
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.8019
. dfuller Dgfr

Dickey-Fuller test for unit root                   Number of obs   =        70

                              ---------- Interpolated Dickey-Fuller ----------
                 Test          1% Critical       5% Critical      10% Critical
              Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)            -6.161            -3.552            -2.914            -2.592
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0000
Here the null hypothesis is that the series contains a unit root. We cannot reject a unit root in the gfr
series, but the null hypothesis is firmly rejected for the first difference of the series, as the autocorrelations
suggested.

References
Greene, William H., Econometric Analysis, 6th ed., Pearson/Prentice Hall, 2008.

Wooldridge, Jeffrey M., Introductory Econometrics: A Modern Approach, 4th ed., South Western /
Cengage Learning, 2009.
