But the error terms in each equation are also correlated, and efficient
estimation requires that we take account of this. That is the SUR
(seemingly unrelated regressions) part.
Hence in the regression for the ith equation there are endogenous (Y)
variables on the right-hand side AND the error term is correlated with the
error terms in the other equations.
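To make this concrete, the system estimated below is Klein's model (the
Greene Table 16.2 data); the coefficient and error labels here are just
notation for this illustration:

c  = a0 + a1*p + a2*p1 + a3*w  + u1
i  = b0 + b1*p + b2*p1 + b3*k1 + u2
wp = d0 + d1*x + d2*x1 + d3*yr + u3

Each equation has its own error term (u1, u2, u3), and these errors are
correlated with each other across equations.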
3SLS
log using "g:summ1.log"
clear
use http://www.ats.ucla.edu/stat/stata/examples/greene/TBL16-2
regress c p p1 w
This means that p and w (which are not included in the instruments)
are endogenous.
The output is as before, but it confirms
what the exogenous and endogenous
variables are.
------------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
------------------------------------------------------------------------------
Endogenous variables: c p w
Exogenous variables: t wg g yr p1 x1 k1
2SLS Regression
ivreg c p1 (p w = t wg g yr p1 x1 k1)
Instrumented: p w
Instruments: p1 t wg g yr x1 k1
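As an aside, the same 2SLS estimates, together with an Endogenous/Exogenous
variables listing like the one above, should also be obtainable from reg3
with its 2sls option; a sketch (the notes themselves only show the ivreg
form):

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 2sls inst(t wg g yr p1 x1 k1)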
3SLS Regression
This format (shown below) does two new things. First, it specifies all
three equations in the system. Note that it has to do this because it needs
to calculate the covariances between the error terms, and for that it needs
to know what the equations, and hence the errors, are. Second, the inst()
option gives the full list of instruments for the system.
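The command itself, which is repeated later in these notes when we check the
error correlations, is:

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)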
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .1248904   .1081291     1.16   0.248    -.0870387    .3368194
          p1 |   .1631439   .1004382     1.62   0.104    -.0337113    .3599992
           w |    .790081   .0379379    20.83   0.000      .715724    .8644379
       _cons |   16.44079   1.304549    12.60   0.000     13.88392    18.99766
-------------+----------------------------------------------------------------
i            |
           p |  -.0130791   .1618962    -0.08   0.936    -.3303898    .3042316
          p1 |   .7557238   .1529331     4.94   0.000     .4559805    1.055467
          k1 |  -.1948482   .0325307    -5.99   0.000    -.2586072   -.1310893
       _cons |   28.17785   6.793768     4.15   0.000     14.86231    41.49339
-------------+----------------------------------------------------------------
wp           |
           x |   .4004919   .0318134    12.59   0.000     .3381388     .462845
          x1 |    .181291   .0341588     5.31   0.000     .1143411    .2482409
          yr |    .149674   .0279352     5.36   0.000      .094922    .2044261
       _cons |   1.797216   1.115854     1.61   0.107    -.3898181    3.984251
------------------------------------------------------------------------------
Endogenous variables: c p w i wp x
Exogenous variables: t wg g yr p1 x1 k1
Let's compare the three different sets of estimates. Look at the coefficient
on p. In OLS it is significant, in 2SLS it is nowhere near significant, but
in 3SLS it is back to something similar to OLS. That is odd.
Now I would expect that if 2SLS differs from OLS because of bias, then so
should 3SLS. As it stands, it suggests that OLS is closer to 3SLS than 2SLS
is to 3SLS, which does not make an awful lot of sense.
The matrix sig=e(Sigma) command below stores the variances and covariances
of the error terms in a matrix I call sig.
From sig we can then compute the correlation between the error terms from
equations 1 and 2.
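The notes do not show the command for this calculation; a minimal sketch,
using the sig matrix created below:

display sig[2,1]/(sqrt(sig[1,1])*sqrt(sig[2,2]))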
Each residual is the actual minus the predicted value; ri is the error term
from the 2nd equation.
correlate ri rc prints out the correlation between the two error terms
Let's check.
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc
. correlate ri rc
(obs=21)
             |       ri       rc
-------------+------------------
          ri |   1.0000
          rc |   0.3011   1.0000
The correlation is 0.30, close to what we had before, but not the same.
Now the main purpose of this class is to illustrate commands, so this is not
too important. I think it could be because Stata is not calculating the
e(Sigma) matrix by dividing by n-k, but just by n.
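One way to check this numerically, a sketch using the rc residual and the
sig matrix created above, with n = 21 observations and k = 4 parameters in
the first equation (my labels, not from the original notes):

matrix accum cross = rc, noconstant
display "divide by n (21):   " cross[1,1]/21
display "divide by n-k (17): " cross[1,1]/(21-4)
display "sig[1,1]:           " sig[1,1]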
Let's check.
Click on Help (on the toolbar at the top of the screen, to the right).
Click on Stata command.
In the dialog box type reg3.
Move down towards the end of the file and you get the following.
Saved results
Scalars
e(N) number of observations
e(k) number of parameters
e(k_eq) number of equations
e(mss_#) model sum of squares for equation #
e(df_m#) model degrees of freedom for equation #
e(rss_#) residual sum of squares for equation #
e(df_r) residual degrees of freedom (small)
e(r2_#) R-squared for equation #
e(F_#) F statistic for equation # (small)
e(rmse_#) root mean squared error for equation #
e(dfk2_adj) divisor used with VCE when dfk2 specified
e(ll) log likelihood
e(chi2_#) chi-squared for equation #
e(p_#) significance for equation #
e(ic) number of iterations
e(cons_#) 1 when equation # has a constant; 0 otherwise
Some important retrievables
e(mss_#) model sum of squares for equation #
e(rss_#) residual sum of squares for equation #
e(r2_#) R-squared for equation #
e(F_#) F statistic for equation # (small)
e(rmse_#) root mean squared error for equation #
e(ll) log likelihood
And
Matrices
e(b) coefficient vector
e(Sigma) Sigma hat matrix
e(V) variance-covariance matrix of the estimators
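To see what these contain in practice, after the reg3 command they can be
inspected directly (a quick illustration, not in the original notes):

matrix list e(Sigma)
display "R-squared for equation 1: " e(r2_1)
display "log likelihood: " e(ll)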
The Hausman Test Again
We looked at this with respect to panel data, but it is a general test that
allows us to compare an equation which has been estimated by two different
techniques. Here we apply it to compare OLS with 3SLS.
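The commands that produce the output below are not reproduced in the notes;
a minimal sketch, assuming the stored names EQNols and EQN3sls that appear
in the table (depending on the Stata version, hausman's equations() option
may be needed to match the single OLS equation to the first reg3 equation):

regress c p p1 w
estimates store EQNols
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
estimates store EQN3sls
hausman EQNols EQN3sls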
                 ---- Coefficients ----
             |      (b)          (B)          (b-B)     sqrt(diag(V_b-V_B))
             |     EQNols      EQN3sls     Difference          S.E.
-------------+----------------------------------------------------------------

                chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                        = 0.06
                Prob>chi2 = 0.9963
                (V_b-V_B is not positive definite)
The table prints out the two sets of coefficients and their difference. The
test statistic is clearly very far from being significant, even at the 10%
level.
Hence it would appear that the coefficients from the two
regressions are not significantly different.
display e(mss_2)
This prints on the screen one of the retrievables from the regression (the
above is the model sum of squares for the second equation).
Type:
log close