
ECON 761: Two Stage Least Squares (2SLS) Example

L. Magee November 2007

———————————————————–

The following variables were used by Klein in a 1950 U.S. macroeconometric study:

Variable Name   Description

year            Year, beginning with 1920 and ending with 1941
cn              Consumption, measured in billions of 1934 dollars
p               Profits, billions of 1934 dollars
w1              Private Sector Wages, billions of 1934 dollars
i               Net Investment, billions of 1934 dollars
klag            End-of-year Capital Stock lagged one year, billions of 1934 dollars
e               Private Product (GNP plus Indirect Taxes minus Government Wage Bill),
                billions of 1934 dollars
w2              Government Wage Bill, billions of 1934 dollars
g               Government Expenditures (includes Government Wage Bill), billions of
                1934 dollars
tx              Indirect Taxes, billions of 1934 dollars

Klein specifies the consumption function as

$$ cn_t = \beta_0 + \beta_1 w_t + \beta_2 p_t + \beta_3 p_{t-1} + \varepsilon_t $$

where w_t is the total wage bill, defined as w_t = w1_t + w2_t. He considers the variables w1_t (and
therefore w_t) and p_t to be endogenous, while p_{t−1} is assumed to be exogenous. Of the variables in
the list above, g, w2, tx, klag, and year are considered exogenous. (He uses t = year − 1931 instead of
year.) He also creates another variable that is assumed to be exogenous, e_{t−1}, the lag of the private
product variable.
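
In matrix terms, the classification above can be sketched as follows. The symbols y, X, Z and P_Z are notation introduced here for illustration and do not appear in the handout: y is the consumption series, X holds the regressors of the consumption function, and Z holds the constant together with every variable treated as exogenous.

$$
X = [\,1,\; w_t,\; p_t,\; p_{t-1}\,], \qquad
Z = [\,1,\; p_{t-1},\; g_t,\; w2_t,\; tx_t,\; t,\; klag_t,\; e_{t-1}\,]
$$

$$
\hat\beta_{2SLS} = (X' P_Z X)^{-1} X' P_Z\, y, \qquad P_Z = Z (Z'Z)^{-1} Z'
$$

There are two endogenous regressors (w_t and p_t) and six exogenous variables excluded from the equation (g, w2, tx, t, klag and e_{t−1}), so the order condition for identification, at least as many excluded instruments as endogenous regressors, holds with four to spare.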

Stata Program
The generate command (abbreviated g in the log) creates the lagged variables p_lag and e_lag, that is,
p_{t−1} and e_{t−1}, using the "L." lag operator, as well as the t and w variables. The lag operator
requires that the data first be declared a time series with tsset. The reg command computes the usual
OLS estimates.
If w1_t and p_t are endogenous, though, 2SLS is usually preferred. It is computed in Stata using
the ivreg command. The dependent variable is followed by a list of the exogenous regressors. In
this example there is only one, p_{t−1}, besides the constant term, which is included by default.
Then, in parentheses, the endogenous regressors are listed, followed by an equals sign and a list of
the instrumental variables that do not appear on the right-hand side of the equation.
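
The syntax just described can be sketched as below. The first command is the one that appears in the log for this handout. The second is an assumption rather than something taken from the handout: in Stata 10 and later, ivregress supersedes ivreg, and its small option requests the degrees-of-freedom adjustment and the t and F statistics that ivreg reports, so it should reproduce the same table. The names in angle brackets are placeholders, not variables in the data set.

```stata
* General pattern (angle-bracket names are placeholders):
*   ivreg <depvar> <exogenous regressors> (<endogenous regressors> = <excluded instruments>)

* The consumption function of this handout, as run in the log:
ivreg cn p_lag (w p = g w2 tx t klag e_lag)

* Assumed equivalent for Stata 10 or later (not part of the original log):
ivregress 2sls cn p_lag (w p = g w2 tx t klag e_lag), small
```

In both forms the constant is added automatically, and the included exogenous regressor p_lag is used as an instrument for itself, which is why it shows up in the "Instruments:" line of the output.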

Hausman Test
Note that the 2SLS standard errors are higher than the OLS ones. This reflects the theoretical result
that the variance of the 2SLS estimator is higher than the variance of the OLS estimator. When deciding
whether to use OLS or 2SLS, there is a trade-off: OLS has a smaller variance ("efficient"), but 2SLS
is consistent under more general conditions ("consistent"). The Hausman test is a widely used general
specification testing method for this and other situations where this trade-off is present. It tests:

H0 : the efficient estimator (OLS) is consistent. (Favours the use of OLS)


Ha : the efficient estimator is not consistent. (Favours the use of 2SLS)

Hausman tests decide whether or not the difference between the two estimators is statistically
significant. If it is, that is evidence that the more restricted, or “efficient”, one is not consistent.
(One reason why I keep putting the word “efficient” in quotation marks is that it is only really
efficient if it is also consistent. Roughly speaking, an estimator is efficient if it is consistent and no
other consistent estimator has a smaller variance.)
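
The comparison described above is usually written as a quadratic form. A sketch, using b for the consistent (2SLS) estimate and B for the efficient (OLS) estimate as in the Stata output; the symbols H, \hat V_b, \hat V_B and k are notation introduced here:

$$
H = (b - B)' \left[ \hat V_b - \hat V_B \right]^{-1} (b - B) \;\overset{a}{\sim}\; \chi^2(k) \quad \text{under } H_0
$$

Under H0 the efficient estimator is uncorrelated with the difference b − B, so the variance of the difference is V_b − V_B. A large value of H means that the two sets of estimates differ by more than sampling error would explain, which is evidence against H0. Here k is the rank of \hat V_b − \hat V_B, which can be smaller than the number of coefficients compared.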
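When the Hausman test is used to compare 2SLS with OLS, it is necessary to include the sigmamore
option of the hausman command, which bases both covariance matrices on the error variance estimate
from the efficient estimator; otherwise the output will be incorrect. The degrees of freedom should
equal the number of endogenous regressors handled by the 2SLS estimator, which is two here (w and p).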
In this example, the P-value is 0.0264 < 0.05, so we reject H0 at the 5% significance level. This
favours 2SLS, since it is evidence that OLS is not consistent.
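
The reported P-value can be checked by hand, because the upper-tail probability of a chi-squared distribution with two degrees of freedom has a simple closed form. Taking the test statistic 7.27 from the output:

$$
P(\chi^2_2 > x) = e^{-x/2}, \qquad P(\chi^2_2 > 7.27) = e^{-3.635} \approx 0.0264
$$

This shortcut applies only when the degrees of freedom equal two; other cases need chi-squared tables or software.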

.
. insheet using "C:\Documents and Settings\courses\761 and 762\f07\2SLS\klein_dat.txt"
(10 vars, 22 obs)

.
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        year |        22      1930.5    6.493587       1920       1941
          cn |        22       53.35     7.34774       39.8       69.7
           p |        22        16.7    4.214262          7       23.5
          w1 |        22    36.01818    6.360191       25.5       53.3
           i |        22    1.331818     3.47979       -6.2        5.6
-------------+--------------------------------------------------------
        klag |        22    199.5682    10.61133      180.1      216.7
           e |        22    59.36818    10.85359       44.3       88.4
          w2 |        22    4.986364    2.008386        2.2        8.5
           g |        22    9.672727     3.98057        4.6       22.3
          tx |        22        6.65    2.111364        3.4       11.6

.
. tsset year
time variable: year, 1920 to 1941

.
. g p_lag=L.p
(1 missing value generated)

. g e_lag=L.e
(1 missing value generated)

. g t=year-1931

. g w=w1+w2

.
. reg cn w p p_lag

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  3,    17) =  292.71
       Model |  923.549937     3  307.849979           Prob > F      =  0.0000
    Residual |  17.8794524    17  1.05173249           R-squared     =  0.9810
-------------+------------------------------           Adj R-squared =  0.9777
       Total |  941.429389    20  47.0714695           Root MSE      =  1.0255

------------------------------------------------------------------------------
          cn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   .7962188   .0399439    19.93   0.000     .7119444    .8804931
           p |   .1929343   .0912102     2.12   0.049     .0004977     .385371
       p_lag |   .0898847   .0906479     0.99   0.335    -.1013658    .2811351
       _cons |    16.2366   1.302698    12.46   0.000     13.48815    18.98506
------------------------------------------------------------------------------

. estimates store ols_efficient

.
. ivreg cn p_lag (w p = g w2 tx t klag e_lag)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  3,    17) =  225.93
       Model |  919.504137     3  306.501379           Prob > F      =  0.0000
    Residual |   21.925252    17  1.28972071           R-squared     =  0.9767
-------------+------------------------------           Adj R-squared =  0.9726
       Total |  941.429389    20  47.0714695           Root MSE      =  1.1357

------------------------------------------------------------------------------
          cn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045655
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
       p_lag |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
------------------------------------------------------------------------------
Instrumented:  w p
Instruments:   p_lag g w2 tx t klag e_lag
------------------------------------------------------------------------------

. estimates store iv_consistent

.
. hausman iv_consistent ols_efficient, sigmamore

Note: the rank of the differenced variance matrix (2) does not equal the number of coefficients being
        tested (3); be sure this is what you expect, or there may be problems computing the test.
        Examine the output of your estimators for anything unexpected and possibly consider scaling
        your variables so that the coefficients are on a similar scale.

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             | iv_consist~t ols_effici~t     Difference          S.E.
-------------+----------------------------------------------------------------
           w |    .8101827     .7962188        .0139639        .0060356
           p |    .0173022     .1929343       -.1756322        .0756226
       p_lag |    .2162338     .0898847        .1263492        .0580855
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from ivreg
            B = inconsistent under Ha, efficient under Ho; obtained from regress

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        7.27
                Prob>chi2 =      0.0264
                (V_b-V_B is not positive definite)
