
3SLS

3SLS is the combination of 2SLS and SUR.

It is used for a system of simultaneous equations: in each
equation there are endogenous variables on both the left- and right-hand
sides. THAT IS THE 2SLS PART.

But the error terms in the different equations are also correlated, and efficient
estimation requires that we take account of this. THAT IS THE SUR
(SEEMINGLY UNRELATED REGRESSIONS) PART.

Hence in the regression for the ith equation there are endogenous (Y)
variables on the RHS AND the error term is correlated with the error terms in
the other equations.
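To make this concrete, the three structural equations of the Klein model estimated later in these notes can be written as follows (the coefficient labels are my own; the variable names are those used in the Stata commands below):

```latex
c_t  &= \beta_{10} + \beta_{11} p_t + \beta_{12} p_{t-1} + \beta_{13} w_t     + \varepsilon_{1t} \\
i_t  &= \beta_{20} + \beta_{21} p_t + \beta_{22} p_{t-1} + \beta_{23} k_{t-1} + \varepsilon_{2t} \\
wp_t &= \beta_{30} + \beta_{31} x_t + \beta_{32} x_{t-1} + \beta_{33} yr_t    + \varepsilon_{3t},
\qquad \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = \sigma_{ij} \neq 0
```

The endogenous right-hand-side variables (p, w, x) give the 2SLS part; the nonzero cross-equation covariances give the SUR part.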
3SLS
log using "g:summ1.log"

If you type the above then a log is created on drive g
(on my computer this is the flash drive; on yours you
may need to specify another drive).

The name summ1 can be anything, but the suffix
must be log.

At the end you can close the log by typing:

log close

So open a log now and you will have a record of this
session.
3SLS Load Data

clear
use http://www.ats.ucla.edu/stat/stata/examples/greene/TBL16-2

THAT link no longer works. But the following does


webuse klein
In order to get the rest to work
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
*generate variables
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
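The last two generate lines build one-period lags: p1 = p[_n-1] sets each observation to the previous value of p, with the first observation missing. A minimal sketch of the same operation outside Stata (toy numbers, not the Klein data):

```python
# Stata's  generate p1 = p[_n-1]  builds a one-period lag.
# Toy series standing in for profits p:
p = [12.7, 12.4, 16.9, 18.4]

# The first observation has no predecessor, so its lag is missing
# (None here, "." in Stata).
p1 = [None] + p[:-1]

print(p1)  # [None, 12.7, 12.4, 16.9]
```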
OLS Regression

regress c p p1 w

Regresses c on p, p1 and w (what this equation means is not so
important).

Usual output:

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  3,    17) =  292.71
       Model |  923.549937     3  307.849979           Prob > F      =  0.0000
    Residual |  17.8794524    17  1.05173249           R-squared     =  0.9810
-------------+------------------------------           Adj R-squared =  0.9777
       Total |  941.429389    20  47.0714695           Root MSE      =  1.0255

------------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .1929343   .0912102     2.12   0.049     .0004977     .385371
          p1 |   .0898847   .0906479     0.99   0.335    -.1013658    .2811351
           w |   .7962188   .0399439    19.93   0.000     .7119444    .8804931
       _cons |    16.2366   1.302698    12.46   0.000     13.48815    18.98506
------------------------------------------------------------------------------
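The summary statistics in the header are simple functions of the ANOVA table, which we can verify with a quick calculation outside Stata using the printed numbers:

```python
# Check the relationships in the OLS header using the printed ANOVA table.
model_ss, model_df = 923.549937, 3
resid_ss, resid_df = 17.8794524, 17
total_ss = model_ss + resid_ss           # Total SS = Model SS + Residual SS

f_stat   = (model_ss / model_df) / (resid_ss / resid_df)  # F(3, 17)
r2       = model_ss / total_ss                            # R-squared
root_mse = (resid_ss / resid_df) ** 0.5                   # Root MSE

print(round(f_stat, 2), round(r2, 4), round(root_mse, 4))
```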
reg3

With the command reg3, Stata estimates a system of structural
equations in which some equations contain endogenous variables
among the explanatory variables. Estimation is via three-stage
least squares (3SLS). Typically, the endogenous regressors are
dependent variables from other equations in the system.

reg3 can also estimate systems of equations by
seemingly unrelated regression (SURE), multivariate regression
(MVREG), and equation-by-equation ordinary least squares
(OLS) or two-stage least squares (2SLS).
2SLS Regression

reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)

Regresses c on p, p1 and w. The instruments (i.e. the predetermined
or exogenous variables in this equation and the rest of the system) are
t wg g yr p1 x1 k1.

This means that p and w (which are not included in the instruments)
are endogenous.

The output is as before, but it confirms
what the exogenous and endogenous
variables are.

Two-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"      F-Stat       P
----------------------------------------------------------------------
c                  21      3    1.135659    0.9767      225.93  0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
------------------------------------------------------------------------------
Endogenous variables:  c p w
Exogenous variables:   t wg g yr p1 x1 k1
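The two stages behind the 2sls option can be sketched in miniature. The following is an illustrative toy example (made-up numbers, one endogenous regressor x, one instrument z, centred data with no constant), not the Klein data:

```python
# Stage 1: regress the endogenous x on the instrument z, keep fitted values.
# Stage 2: regress y on the fitted values.
# With one instrument this collapses to  beta_iv = cov(z, y) / cov(z, x).

z = [-2.0, -1.0, 0.0, 1.0, 2.0]   # instrument (toy)
x = [-1.9, -1.2, 0.3, 0.9, 1.9]   # endogenous regressor (toy)
y = [-3.7, -2.6, 0.5, 2.1, 3.7]   # dependent variable (toy)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# Stage 1: fitted values of x from z
g = cov(z, x) / cov(z, z)
x_hat = [g * zi for zi in z]

# Stage 2: OLS of y on the fitted x
beta_2sls = cov(x_hat, y) / cov(x_hat, x_hat)

# Equivalent one-instrument closed form
beta_iv = cov(z, y) / cov(z, x)

print(beta_2sls, beta_iv)
```

The two numbers printed agree, which is the point: 2SLS is just OLS on instrument-fitted regressors.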
2SLS Regression

ivreg c p1 (p w = t wg g yr p1 x1 k1)

This is an alternative command that does the same thing. Note that the
endogenous variables on the right-hand side of the equation are
specified inside the parentheses, in (p w = ...), and the instruments
follow the = sign.

The results are identical.

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  3,    17) =  225.93
       Model |  919.504138     3  306.501379           Prob > F      =  0.0000
    Residual |  21.9252518    17  1.28972069           R-squared     =  0.9767
-------------+------------------------------           Adj R-squared =  0.9726
       Total |  941.429389    20  47.0714695           Root MSE      =  1.1357

------------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
------------------------------------------------------------------------------
Instrumented:  p w
Instruments:   p1 t wg g yr x1 k1
3SLS Regression

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

This format does two new things. First, it specifies all three
equations in the system. It has to do this because it needs to
calculate the covariances between the error terms, and for this it needs
to know what the equations, and hence the errors, are.

Secondly, it says 3sls, not 2sls.

All three equations are printed out. This tells us
what these equations look like.
Three-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"      chi2         P
----------------------------------------------------------------------
c                  21      3    .9443305    0.9801      864.59  0.0000
i                  21      3    1.446736    0.8258      162.98  0.0000
wp                 21      3    .7211282    0.9863     1594.75  0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .1248904   .1081291     1.16   0.248    -.0870387    .3368194
          p1 |   .1631439   .1004382     1.62   0.104    -.0337113    .3599992
           w |    .790081   .0379379    20.83   0.000      .715724    .8644379
       _cons |   16.44079   1.304549    12.60   0.000     13.88392    18.99766
-------------+----------------------------------------------------------------
i            |
           p |  -.0130791   .1618962    -0.08   0.936    -.3303898    .3042316
          p1 |   .7557238   .1529331     4.94   0.000     .4559805    1.055467
          k1 |  -.1948482   .0325307    -5.99   0.000    -.2586072   -.1310893
       _cons |   28.17785   6.793768     4.15   0.000     14.86231    41.49339
-------------+----------------------------------------------------------------
wp           |
           x |   .4004919   .0318134    12.59   0.000     .3381388     .462845
          x1 |    .181291   .0341588     5.31   0.000     .1143411    .2482409
          yr |    .149674   .0279352     5.36   0.000      .094922    .2044261
       _cons |   1.797216   1.115854     1.61   0.107    -.3898181    3.984251
------------------------------------------------------------------------------
Endogenous variables:  c p w i wp x
Exogenous variables:   t wg g yr p1 x1 k1
Let's compare the three sets of estimates for the consumption equation. Look at
the coefficient on p. In OLS it is significant (t = 2.12); in 2SLS it is close to
zero and insignificant (t = 0.13); in 3SLS it moves back towards the OLS value,
though it remains insignificant. That is odd.

Now I would expect that if 2SLS differs from OLS because of bias, then so should
3SLS. As it stands, the 3SLS estimate is closer to OLS than it is to 2SLS, which
does not make an awful lot of sense.

But we do not have many observations. Perhaps that is partly why.

               3SLS                2SLS                 OLS
          coefficient t stat  coefficient t stat  coefficient t stat
p            0.125     1.16     0.017      0.13     0.193      2.12
p1           0.163     1.62     0.216      1.81     0.090      0.99
w            0.790    20.83     0.810     18.11     0.796     19.93
_cons       16.441    12.60    16.555     11.28    16.237     12.46
R2           0.980              0.977               0.981
3SLS Regression

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)


matrix sig=e(Sigma)

This command stores the variances and covariances of the
error terms in a matrix I call sig.

You have used generate to generate variables and scalar to generate scalars.
Similarly, matrix produces a matrix.

e(Sigma) stores the variance-covariance matrix from the previous
regression.
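What e(Sigma) contains can be sketched outside Stata. The fragment below (toy residual vectors, not the Klein results) builds the covariance matrix of the equation residuals in the same way, dividing by n:

```python
# Sigma is just the covariance matrix of the equation residuals.
# Toy residual vectors for a 3-equation system (made up for illustration):
r1 = [0.5, -1.2, 0.8, -0.1]
r2 = [0.3, -0.9, 1.1, -0.5]
r3 = [-0.4, 0.6, -0.2, 0.0]

n = len(r1)
resids = [r1, r2, r3]

def cov(a, b):
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

sigma = [[cov(a, b) for b in resids] for a in resids]

# sigma[0][0] is the variance of the first equation's errors,
# sigma[1][2] the covariance of the errors from equations 2 and 3, etc.
print(sigma[1][2] == sigma[2][1])  # True: symmetric by construction
```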
3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)

. display sig[1,1], sig[1,2], sig[1,3]
1.0440596 .43784767 -.3852272

. display sig[2,1], sig[2,2], sig[2,3]
.43784767 1.3831832 .19260612

. display sig[3,1], sig[3,2], sig[3,3]
-.3852272 .19260612 .47642626

Writing this out as a matrix:

 1.04406   0.437848  -0.38523
 0.437848  1.383183   0.192606
-0.38523   0.192606   0.476426

1.04406 is the variance of the 1st error term; 0.192606 is the covariance
of the error terms from equations 2 and 3.


3SLS Regression

 1.04406   0.437848  -0.38523
 0.437848  1.383183   0.192606
-0.38523   0.192606   0.476426

This is the variance-covariance matrix from the lecture. Hence 0.437848
corresponds to sigma_12 and, of course, sigma_21.
3SLS Regression

display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)

This gives the correlation between the error terms from
equations 1 and 2, using the formula
Corr(x, y) = sigma_xy / (sigma_x * sigma_y). When we do this we get:

. display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)
.36435149
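As a quick arithmetic check, the same calculation can be reproduced outside Stata from the printed entries of sig:

```python
# Reproduce the correlation between the equation-1 and equation-2 errors
# from the printed e(Sigma) entries:
s11, s12, s22 = 1.0440596, 0.43784767, 1.3831832

corr_12 = s12 / (s11 ** 0.5 * s22 ** 0.5)
print(corr_12)  # approximately .36435149, as Stata printed
```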
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

matrix cy= e(b) stores the coefficients from the regression in a
row vector we call cy.

cy[1,1] is the coefficient on p in the first equation.
cy[1,4] is the fourth coefficient in the first equation (the constant term).
cy[1,5] is the coefficient on p in the second equation.
Note this is cy[1,5], NOT cy[2,1]: e(b) is a single row vector spanning
all equations.
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

Thus cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4] is the predicted value of c
from the first equation, so rc is the actual minus the predicted value,
i.e. the residual from the first equation.

Similarly i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8]) is the actual
minus the predicted value, i.e. the error term from the 2nd equation.

correlate ri rc prints out the correlation between the two error terms.
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

. correlate ri rc
(obs=21)

ri rc

ri 1.0000
rc 0.3011 1.0000

The correlation is 0.30, close to what we had before, but not the same.
Now the main purpose of this class is to illustrate commands, so the gap is
not too important. I think it could be because e(Sigma) is computed from the
intermediate residuals used to weight the 3SLS estimator, rather than from
the final 3SLS coefficients we used here; note that a divisor of n versus
n - k cannot be the whole story, since a common divisor cancels out of a
correlation.
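One point worth checking about the divisor idea: a divisor common to the variances and the covariance cancels out of a correlation, as this small sketch (toy sums of squares and cross-products, made-up numbers) shows:

```python
# Whatever divisor is used for e(Sigma), if it is common to the variances
# and the covariance it cancels out of a correlation.
# S11, S12, S22 are toy sums of squared / cross-products of residuals.
S11, S12, S22 = 21.9, 9.2, 29.0
n, k = 21, 4

def corr(divisor):
    s11, s12, s22 = S11 / divisor, S12 / divisor, S22 / divisor
    return s12 / (s11 ** 0.5 * s22 ** 0.5)

# Dividing by n or by n - k gives the same correlation
print(abs(corr(n) - corr(n - k)) < 1e-12)  # True
```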
Let's check
Click on Help (on the toolbar at the top right of the screen).
Click on Stata command.
In the dialogue box type reg3.

Move down towards the end of the file and you get the following

Saved results

reg3 saves the following in e():

Scalars
  e(N)         number of observations
  e(k)         number of parameters
  e(k_eq)      number of equations
  e(mss_#)     model sum of squares for equation #
  e(df_m#)     model degrees of freedom for equation #
  e(rss_#)     residual sum of squares for equation #
  e(df_r)      residual degrees of freedom (small)
  e(r2_#)      R-squared for equation #
  e(F_#)       F statistic for equation # (small)
  e(rmse_#)    root mean squared error for equation #
  e(dfk2_adj)  divisor used with VCE when dfk2 specified
  e(ll)        log likelihood
  e(chi2_#)    chi-squared for equation #
  e(p_#)       significance for equation #
  e(ic)        number of iterations
  e(cons_#)    1 when equation # has a constant; 0 otherwise
Some important retrievables
e(mss_#) model sum of squares for equation #
e(rss_#) residual sum of squares for equation #
e(r2_#) R-squared for equation #
e(F_#) F statistic for equation # (small)
e(rmse_#) root mean squared error for equation #
e(ll) log likelihood

Where # is a number; e.g. if # is 2 it refers to equation 2.

And

Matrices
e(b) coefficient vector
e(Sigma) Sigma hat matrix
e(V) variance-covariance matrix of the estimators
The Hausman Test Again
We looked at this with respect to panel data. But it is a general test that
allows us to compare an equation which has been estimated by two
different techniques. Here we apply the technique to comparing OLS
with 3SLS.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
est store EQNols

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls

hausman EQNols EQN3sls


The Hausman Test Again
Below we run the three regressions specifying ols and store
the results as EQNols.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
est store EQNols

Then we run the three regressions specifying 3sls and store
the results as EQN3sls.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls

Then we do the Hausman test:

hausman EQNols EQN3sls
The Results
. hausman EQNols EQN3sls

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |    EQNols       EQN3sls       Difference          S.E.
-------------+----------------------------------------------------------------
           p |    .1929343     .1248904        .068044               .
          p1 |    .0898847     .1631439       -.0732592              .
           w |    .7962188      .790081        .0061378        .0124993
------------------------------------------------------------------------------
            b = consistent under Ho and Ha; obtained from reg3
            B = inconsistent under Ha, efficient under Ho; obtained from reg3

Test:  Ho:  difference in coefficients not systematic

          chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                  = 0.06
        Prob>chi2 = 0.9963
        (V_b-V_B is not positive definite)

The table prints out the two sets of coefficients and their difference.

The Hausman test statistic is 0.06

The significance level is 0.9963

This is clearly very far from being significant at the 10% level.
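The chi-squared statistic is just quadratic-form arithmetic. A minimal sketch in the one-coefficient case, with made-up numbers (not the values from the output above, where V_b - V_B was not positive definite):

```python
# The Hausman statistic in the scalar (one-coefficient) case:
#   H = (b - B)^2 / (V_b - V_B)
# where b is the always-consistent estimate and B is the estimate that is
# efficient under Ho.  Toy numbers, purely illustrative:
b, V_b = 0.796, 0.00160   # e.g. an estimate and its variance
B, V_B = 0.790, 0.00144   # e.g. the efficient-under-Ho estimate

H = (b - B) ** 2 / (V_b - V_B)
print(round(H, 3))  # 0.225
```

A small H (compared with chi-squared critical values) means the two estimates do not differ systematically, which is the conclusion drawn from the output above.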
The Hausman Test Again
Hence it would appear that the coefficients from the two
regressions are not significantly different.

If OLS were giving biased estimates that 3SLS corrects, they
would be different.

Hence we would conclude that there is no endogeneity
requiring instrumental-variable techniques.

But because the error terms do appear to be correlated, SUR is
probably the appropriate technique, as it produces more
efficient estimates.
Tasks
1. Using the display command, e.g.

display e(mss_2)

print on the screen some of the retrievables from each regression (the
above displays the model sum of squares for the second equation).

2. Let's look at the display command.

Type:

display "The model sum of squares =" e(mss_2)

Tasks
display "The model sum of squares =" e(mss_2), "and the R2 =" e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(50) "and the R2 =" e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(5) e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(10) e(r2_2)
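Stata's _column(#) starts the text at column # and _skip(#) inserts # spaces. A rough Python analogue using string padding (toy values standing in for e(mss_2) and e(r2_2)):

```python
# _column(20) ~ start the text at column 20 (i.e. after 19 characters);
# _skip(5)    ~ insert five spaces.
mss_2, r2_2 = 923.55, 0.981   # example numbers, not real retrievables

line = " " * 19 + f"The model sum of squares = {mss_2}" + " " * 5 + f"R2 = {r2_2}"
print(line)
```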
Tasks
Close the log:

log close

and have a look at it in Word.

The commands used in this session were:


webuse klein
In order to get the rest to work
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
