
3SLS

3SLS is the combination of 2SLS and SUR.

It is used for a system of simultaneous equations: in each
equation there are endogenous variables on both the left- and right-hand
sides. THAT IS THE 2SLS PART.

But the error terms in the different equations are also correlated, and efficient
estimation requires that we take account of this. THAT IS THE SUR
(SEEMINGLY UNRELATED REGRESSIONS) PART.

Hence in the regression for the ith equation there are endogenous (Y)
variables on the RHS AND the error term is correlated with the error terms in
the other equations.
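To make this concrete, the three structural equations of the Klein model estimated later in these notes can be written as follows (the coefficient labels are my own; the variable names are those used in the Stata commands below):

```latex
c_t  &= \beta_{10} + \beta_{11} p_t + \beta_{12} p_{t-1} + \beta_{13} w_t     + \varepsilon_{1t} \\
i_t  &= \beta_{20} + \beta_{21} p_t + \beta_{22} p_{t-1} + \beta_{23} k_{t-1} + \varepsilon_{2t} \\
wp_t &= \beta_{30} + \beta_{31} x_t + \beta_{32} x_{t-1} + \beta_{33} yr_t    + \varepsilon_{3t},
\qquad \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = \sigma_{ij} \neq 0
```

The endogenous right-hand-side variables (p, w, x) give the 2SLS part; the nonzero cross-equation covariances give the SUR part.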
3SLS
log using "g:summ1.log"

If you type the above then a log is created on drive g
(on my computer this is the flash drive; on yours you
may need to specify another drive).

The name summ1 can be anything, but the suffix
must be log.

At the end you can close the log by typing:

log close

So open a log now and you will have a record of this
session.
3SLS Load Data

clear
use http://www.ats.ucla.edu/stat/stata/examples/greene/TBL16-2

THAT link no longer works. But the following does


webuse klein
In order to get the rest to work
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
*generate variables
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
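The last two generate lines build one-period lags: p1 = p[_n-1] sets each observation to the previous value of p, with the first observation missing. A minimal sketch of the same operation outside Stata (toy numbers, not the Klein data):

```python
# Stata's  generate p1 = p[_n-1]  builds a one-period lag.
# Toy series standing in for profits p:
p = [12.7, 12.4, 16.9, 18.4]

# The first observation has no predecessor, so its lag is missing
# (None here, "." in Stata).
p1 = [None] + p[:-1]

print(p1)  # [None, 12.7, 12.4, 16.9]
```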
OLS Regression

regress c p p1 w

Regresses c on p, p1 and w (what this equation means is not so
important).

Usual output:

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  3,    17) =  292.71
       Model |  923.549937     3  307.849979           Prob > F      =  0.0000
    Residual |  17.8794524    17  1.05173249           R-squared     =  0.9810
-------------+------------------------------           Adj R-squared =  0.9777
       Total |  941.429389    20  47.0714695           Root MSE      =  1.0255

------------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .1929343   .0912102     2.12   0.049     .0004977     .385371
          p1 |   .0898847   .0906479     0.99   0.335    -.1013658    .2811351
           w |   .7962188   .0399439    19.93   0.000     .7119444    .8804931
       _cons |    16.2366   1.302698    12.46   0.000     13.48815    18.98506
------------------------------------------------------------------------------
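The summary statistics in the header are simple functions of the ANOVA table, which we can verify with a quick calculation outside Stata using the printed numbers:

```python
# Check the relationships in the OLS header using the printed ANOVA table.
model_ss, model_df = 923.549937, 3
resid_ss, resid_df = 17.8794524, 17
total_ss = model_ss + resid_ss           # Total SS = Model SS + Residual SS

f_stat   = (model_ss / model_df) / (resid_ss / resid_df)  # F(3, 17)
r2       = model_ss / total_ss                            # R-squared
root_mse = (resid_ss / resid_df) ** 0.5                   # Root MSE

print(round(f_stat, 2), round(r2, 4), round(root_mse, 4))
```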
reg3

With the command reg3, Stata estimates a system of structural
equations in which some equations contain endogenous variables
among the explanatory variables. Estimation is via three-stage
least squares (3SLS). Typically, the endogenous regressors are
dependent variables from other equations in the system.

reg3 can also estimate systems of equations by
seemingly unrelated regression (SURE), multivariate regression
(MVREG), and equation-by-equation ordinary least squares
(OLS) or two-stage least squares (2SLS).
2SLS Regression

reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)

Regresses c on p, p1 and w. The instruments (i.e. the predetermined
or exogenous variables in this equation and the rest of the system) are
t wg g yr p1 x1 k1.

This means that p and w (which are not included in the instruments)
are endogenous.

The output is as before, but it confirms
what the exogenous and endogenous
variables are.

Two-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"      F-Stat       P
----------------------------------------------------------------------
c                  21      3    1.135659    0.9767      225.93  0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
------------------------------------------------------------------------------
Endogenous variables:  c p w
Exogenous variables:   t wg g yr p1 x1 k1
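The two stages behind the 2sls option can be sketched in miniature. The following is an illustrative toy example (made-up numbers, one endogenous regressor x, one instrument z, centred data with no constant), not the Klein data:

```python
# Stage 1: regress the endogenous x on the instrument z, keep fitted values.
# Stage 2: regress y on the fitted values.
# With one instrument this collapses to  beta_iv = cov(z, y) / cov(z, x).

z = [-2.0, -1.0, 0.0, 1.0, 2.0]   # instrument (toy)
x = [-1.9, -1.2, 0.3, 0.9, 1.9]   # endogenous regressor (toy)
y = [-3.7, -2.6, 0.5, 2.1, 3.7]   # dependent variable (toy)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# Stage 1: fitted values of x from z
g = cov(z, x) / cov(z, z)
x_hat = [g * zi for zi in z]

# Stage 2: OLS of y on the fitted x
beta_2sls = cov(x_hat, y) / cov(x_hat, x_hat)

# Equivalent one-instrument closed form
beta_iv = cov(z, y) / cov(z, x)

print(beta_2sls, beta_iv)
```

The two numbers printed agree, which is the point: 2SLS is just OLS on instrument-fitted regressors.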
2SLS Regression

ivreg c p1 (p w = t wg g yr p1 x1 k1)

This is an alternative command that does the same thing. Note that the
endogenous variables on the right-hand side of the equation are
specified inside the parentheses, in (p w = ...), and the instruments
follow the = sign.

The results are identical.

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  3,    17) =  225.93
       Model |  919.504138     3  306.501379           Prob > F      =  0.0000
    Residual |  21.9252518    17  1.28972069           R-squared     =  0.9767
-------------+------------------------------           Adj R-squared =  0.9726
       Total |  941.429389    20  47.0714695           Root MSE      =  1.1357

------------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
------------------------------------------------------------------------------
Instrumented:  p w
Instruments:   p1 t wg g yr x1 k1
3SLS Regression

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

This format does two new things. First, it specifies all three
equations in the system. It has to do this because it needs to
calculate the covariances between the error terms, and for this it needs
to know what the equations, and hence the errors, are.

Secondly, it says 3sls, not 2sls.

All three equations are printed out. This tells us
what these equations look like.
Three-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"      chi2         P
----------------------------------------------------------------------
c                  21      3    .9443305    0.9801      864.59  0.0000
i                  21      3    1.446736    0.8258      162.98  0.0000
wp                 21      3    .7211282    0.9863     1594.75  0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .1248904   .1081291     1.16   0.248    -.0870387    .3368194
          p1 |   .1631439   .1004382     1.62   0.104    -.0337113    .3599992
           w |    .790081   .0379379    20.83   0.000      .715724    .8644379
       _cons |   16.44079   1.304549    12.60   0.000     13.88392    18.99766
-------------+----------------------------------------------------------------
i            |
           p |  -.0130791   .1618962    -0.08   0.936    -.3303898    .3042316
          p1 |   .7557238   .1529331     4.94   0.000     .4559805    1.055467
          k1 |  -.1948482   .0325307    -5.99   0.000    -.2586072   -.1310893
       _cons |   28.17785   6.793768     4.15   0.000     14.86231    41.49339
-------------+----------------------------------------------------------------
wp           |
           x |   .4004919   .0318134    12.59   0.000     .3381388     .462845
          x1 |    .181291   .0341588     5.31   0.000     .1143411    .2482409
          yr |    .149674   .0279352     5.36   0.000      .094922    .2044261
       _cons |   1.797216   1.115854     1.61   0.107    -.3898181    3.984251
------------------------------------------------------------------------------
Endogenous variables:  c p w i wp x
Exogenous variables:   t wg g yr p1 x1 k1
Let's compare the three sets of estimates for the consumption equation. Look at
the coefficient on p. In OLS it is significant (t = 2.12); in 2SLS it is close to
zero and insignificant (t = 0.13); in 3SLS it moves back towards the OLS value,
though it remains insignificant. That is odd.

Now I would expect that if 2SLS differs from OLS because of bias, then so should
3SLS. As it stands, the 3SLS estimate is closer to OLS than it is to 2SLS, which
does not make an awful lot of sense.

But we do not have many observations. Perhaps that is partly why.

               3SLS                2SLS                 OLS
          coefficient t stat  coefficient t stat  coefficient t stat
p            0.125     1.16     0.017      0.13     0.193      2.12
p1           0.163     1.62     0.216      1.81     0.090      0.99
w            0.790    20.83     0.810     18.11     0.796     19.93
_cons       16.441    12.60    16.555     11.28    16.237     12.46
R2           0.980              0.977               0.981
3SLS Regression

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)


matrix sig=e(Sigma)

This command stores the variances and covariances of the
error terms in a matrix I call sig.

You have used generate to generate variables and scalar to generate scalars.
Similarly, matrix produces a matrix.

e(Sigma) stores the variance-covariance matrix from the previous
regression.
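What e(Sigma) contains can be sketched outside Stata. The fragment below (toy residual vectors, not the Klein results) builds the covariance matrix of the equation residuals in the same way, dividing by n:

```python
# Sigma is just the covariance matrix of the equation residuals.
# Toy residual vectors for a 3-equation system (made up for illustration):
r1 = [0.5, -1.2, 0.8, -0.1]
r2 = [0.3, -0.9, 1.1, -0.5]
r3 = [-0.4, 0.6, -0.2, 0.0]

n = len(r1)
resids = [r1, r2, r3]

def cov(a, b):
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

sigma = [[cov(a, b) for b in resids] for a in resids]

# sigma[0][0] is the variance of the first equation's errors,
# sigma[1][2] the covariance of the errors from equations 2 and 3, etc.
print(sigma[1][2] == sigma[2][1])  # True: symmetric by construction
```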
3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)

. display sig[1,1], sig[1,2], sig[1,3]
1.0440596 .43784767 -.3852272

. display sig[2,1], sig[2,2], sig[2,3]
.43784767 1.3831832 .19260612

. display sig[3,1], sig[3,2], sig[3,3]
-.3852272 .19260612 .47642626

Writing this out as a matrix:

 1.04406   0.437848  -0.38523
 0.437848  1.383183   0.192606
-0.38523   0.192606   0.476426

1.04406 is the variance of the 1st error term; 0.192606 is the covariance
of the error terms from equations 2 and 3.


3SLS Regression

 1.04406   0.437848  -0.38523
 0.437848  1.383183   0.192606
-0.38523   0.192606   0.476426

This is the variance-covariance matrix from the lecture. Hence 0.437848
corresponds to sigma_12 and, of course, sigma_21.
3SLS Regression

display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)

This gives the correlation between the error terms from
equations 1 and 2, using the formula
Corr(x, y) = sigma_xy / (sigma_x * sigma_y). When we do this we get:

. display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)
.36435149
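As a quick arithmetic check, the same calculation can be reproduced outside Stata from the printed entries of sig:

```python
# Reproduce the correlation between the equation-1 and equation-2 errors
# from the printed e(Sigma) entries:
s11, s12, s22 = 1.0440596, 0.43784767, 1.3831832

corr_12 = s12 / (s11 ** 0.5 * s22 ** 0.5)
print(corr_12)  # approximately .36435149, as Stata printed
```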
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

matrix cy= e(b) stores the coefficients from the regression in a
row vector we call cy.

cy[1,1] is the coefficient on p in the first equation.
cy[1,4] is the fourth coefficient in the first equation (the constant term).
cy[1,5] is the coefficient on p in the second equation.
Note this is cy[1,5], NOT cy[2,1]: e(b) is a single row vector spanning
all equations.
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

Thus cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4] is the predicted value of c
from the first equation, so rc is the actual minus the predicted value,
i.e. the residual from the first equation.

Similarly i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8]) is the actual
minus the predicted value, i.e. the error term from the 2nd equation.

correlate ri rc prints out the correlation between the two error terms.
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

. correlate ri rc
(obs=21)

ri rc

ri 1.0000
rc 0.3011 1.0000

The correlation is 0.30, close to what we had before, but not the same.
Now the main purpose of this class is to illustrate commands, so the gap is
not too important. I think it could be because e(Sigma) is computed from the
intermediate residuals used to weight the 3SLS estimator, rather than from
the final 3SLS coefficients we used here; note that a divisor of n versus
n - k cannot be the whole story, since a common divisor cancels out of a
correlation.
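One point worth checking about the divisor idea: a divisor common to the variances and the covariance cancels out of a correlation, as this small sketch (toy sums of squares and cross-products, made-up numbers) shows:

```python
# Whatever divisor is used for e(Sigma), if it is common to the variances
# and the covariance it cancels out of a correlation.
# S11, S12, S22 are toy sums of squared / cross-products of residuals.
S11, S12, S22 = 21.9, 9.2, 29.0
n, k = 21, 4

def corr(divisor):
    s11, s12, s22 = S11 / divisor, S12 / divisor, S22 / divisor
    return s12 / (s11 ** 0.5 * s22 ** 0.5)

# Dividing by n or by n - k gives the same correlation
print(abs(corr(n) - corr(n - k)) < 1e-12)  # True
```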
Let's check
Click on Help (on the toolbar at the top right of the screen).
Click on Stata command.
In the dialogue box type reg3.

Move down towards the end of the file and you get the following

Saved results

reg3 saves the following in e():

Scalars
  e(N)         number of observations
  e(k)         number of parameters
  e(k_eq)      number of equations
  e(mss_#)     model sum of squares for equation #
  e(df_m#)     model degrees of freedom for equation #
  e(rss_#)     residual sum of squares for equation #
  e(df_r)      residual degrees of freedom (small)
  e(r2_#)      R-squared for equation #
  e(F_#)       F statistic for equation # (small)
  e(rmse_#)    root mean squared error for equation #
  e(dfk2_adj)  divisor used with VCE when dfk2 specified
  e(ll)        log likelihood
  e(chi2_#)    chi-squared for equation #
  e(p_#)       significance for equation #
  e(ic)        number of iterations
  e(cons_#)    1 when equation # has a constant; 0 otherwise
Some important retrievables
e(mss_#) model sum of squares for equation #
e(rss_#) residual sum of squares for equation #
e(r2_#) R-squared for equation #
e(F_#) F statistic for equation # (small)
e(rmse_#) root mean squared error for equation #
e(ll) log likelihood

Where # is a number; e.g. if # is 2 it refers to equation 2.

And

Matrices
e(b) coefficient vector
e(Sigma) Sigma hat matrix
e(V) variance-covariance matrix of the estimators
The Hausman Test Again
We looked at this with respect to panel data. But it is a general test that
allows us to compare an equation which has been estimated by two
different techniques. Here we apply the technique to comparing OLS
with 3SLS.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
est store EQNols

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls

hausman EQNols EQN3sls


The Hausman Test Again
Below we run the three regressions specifying ols and store
the results as EQNols.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
est store EQNols

Then we run the three regressions specifying 3sls and store
the results as EQN3sls.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls

Then we do the Hausman test:

hausman EQNols EQN3sls
The Results
. hausman EQNols EQN3sls

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |    EQNols       EQN3sls       Difference          S.E.
-------------+----------------------------------------------------------------
           p |    .1929343     .1248904        .068044               .
          p1 |    .0898847     .1631439       -.0732592              .
           w |    .7962188      .790081        .0061378        .0124993
------------------------------------------------------------------------------
            b = consistent under Ho and Ha; obtained from reg3
            B = inconsistent under Ha, efficient under Ho; obtained from reg3

Test:  Ho:  difference in coefficients not systematic

          chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                  = 0.06
        Prob>chi2 = 0.9963
        (V_b-V_B is not positive definite)

The table prints out the two sets of coefficients and their difference.

The Hausman test statistic is 0.06

The significance level is 0.9963

This is clearly very far from being significant at the 10% level.
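The chi-squared statistic is just quadratic-form arithmetic. A minimal sketch in the one-coefficient case, with made-up numbers (not the values from the output above, where V_b - V_B was not positive definite):

```python
# The Hausman statistic in the scalar (one-coefficient) case:
#   H = (b - B)^2 / (V_b - V_B)
# where b is the always-consistent estimate and B is the estimate that is
# efficient under Ho.  Toy numbers, purely illustrative:
b, V_b = 0.796, 0.00160   # e.g. an estimate and its variance
B, V_B = 0.790, 0.00144   # e.g. the efficient-under-Ho estimate

H = (b - B) ** 2 / (V_b - V_B)
print(round(H, 3))  # 0.225
```

A small H (compared with chi-squared critical values) means the two estimates do not differ systematically, which is the conclusion drawn from the output above.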
The Hausman Test Again
Hence it would appear that the coefficients from the two
regressions are not significantly different.

If OLS were giving biased estimates that 3SLS corrects, they
would be different.

Hence we would conclude that there is no endogeneity
requiring instrumental-variable techniques.

But because the error terms do appear to be correlated, SUR is
probably the appropriate technique, as it produces more
efficient estimates.
Tasks
1. Using the display command, e.g.

display e(mss_2)

print on the screen some of the retrievables from each regression (the
above displays the model sum of squares for the second equation).

2. Let's look at the display command.

Type:

display "The model sum of squares =" e(mss_2)

Tasks
display "The model sum of squares =" e(mss_2), "and the R2 =" e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(50) "and the R2 =" e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(5) e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(10) e(r2_2)
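Stata's _column(#) starts the text at column # and _skip(#) inserts # spaces. A rough Python analogue using string padding (toy values standing in for e(mss_2) and e(r2_2)):

```python
# _column(20) ~ start the text at column 20 (i.e. after 19 characters);
# _skip(5)    ~ insert five spaces.
mss_2, r2_2 = 923.55, 0.981   # example numbers, not real retrievables

line = " " * 19 + f"The model sum of squares = {mss_2}" + " " * 5 + f"R2 = {r2_2}"
print(line)
```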
Tasks
Close the log:

log close

and have a look at it in Word.

The commands used in this session were:


webuse klein
In order to get the rest to work
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
