Vartanian: SW 541
We're examining the age at which people first get married. A number of people have missing values
because they have never been married. We want to see whether never marrying is a non-random process and, if
so, correct for this non-random selection in our regression results.
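For reference, the usual two-equation setup behind a Heckman selection model can be written as below. The notation (s*, z, x, gamma, beta) is an assumption of this sketch rather than the lecture's own, and the selection error u is taken to be standard normal.

```latex
\begin{align*}
s_i^* &= z_i'\gamma + u_i, \qquad y_i \text{ is observed only if } s_i^* > 0 \\
y_i &= x_i'\beta + \varepsilon_i, \qquad \operatorname{corr}(u_i, \varepsilon_i) = \rho, \quad \operatorname{sd}(\varepsilon_i) = \sigma \\
E[\, y_i \mid s_i^* > 0 \,] &= x_i'\beta + \rho\sigma \, \frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)}
\end{align*}
```

If rho = 0, the last term drops out and OLS on the observed cases is not biased by selection. The product rho*sigma is the quantity Stata reports as lambda.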
First, we'll examine whether age at first marriage is missing for many people in our sample. I do this in Stata with the
following commands.
. generate nomarr=agemarr~=.
. tab nomarr
Here, we see that of our 3,983 observations, 2,811 have been married and 1,172 have never been married, and the latter
therefore do not have a valid value for agemarr. (Despite its name, nomarr equals 1 for people who have been
married: the expression agemarr~=. is true whenever agemarr is not missing.)
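Because nomarr is coded 1 when agemarr is present, an indicator named for what it measures may be easier to read. A minimal sketch, assuming the lecture dataset is in memory; the variable name married is hypothetical and does not appear in the original commands.

```stata
* married = 1 when agemarr is observed, 0 when it is missing
generate married = !missing(agemarr)
tab married

* cross-check: count the never-married cases directly
count if missing(agemarr)
```

The count should match the number of zeros in the tabulation.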
Next, we can run a probit analysis (the Heckman model uses a probit rather than a logit for the selection equation) to
see whether any factors affect the likelihood of being married, and thus of having an age at first marriage. (I've
chosen these variables somewhat arbitrarily.)
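The probit command and its output are not reproduced here. As a rough sketch only, a probit of the marriage indicator could look like the following; the regressor list is an assumption borrowed from the OLS model and may not match the specification actually used in the lecture.

```stata
* Assumption: regressors borrowed from the OLS model; the lecture's own
* probit specification is not shown
probit nomarr income male norelig bigcity kds

* average marginal effects are often easier to interpret than raw coefficients
margins, dydx(*)
```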
Next, we'll run the OLS model, which is subject to possible selection bias, and then a Heckman selection model.
OLS model
. regress agemarr income male norelig bigcity kds

      Source |       SS       df       MS              Number of obs =    2804
-------------+------------------------------           F(  5,  2798) =   48.64
       Model |  3961.61695     5  792.323389           Prob > F      =  0.0000
    Residual |   45582.367  2798  16.2910533           R-squared     =  0.0800
-------------+------------------------------           Adj R-squared =  0.0783
       Total |   49543.984  2803  17.6753421           Root MSE      =  4.0362

------------------------------------------------------------------------------
     agemarr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0000127   2.18e-06     5.81   0.000     8.39e-06    .0000169
        male |   1.524309   .1530466     9.96   0.000     1.224213    1.824404
     bigcity |   1.409821   .1618219     8.71   0.000     1.092519    1.727124
         kds |   .1954122   .0457291     4.27   0.000      .105746    .2850784
       _cons |   21.02824   .2366657    88.85   0.000     20.56418    21.49229
------------------------------------------------------------------------------

If we look closely at the Heckman output, we see that the selection variables appear to affect the likelihood of being
censored out of the sample, but we fail to reject the hypothesis that rho=0. If
we then compare the coefficient estimates for the two models, we'll see that they are very similar. In this
case, there is no evidence that selection biases the OLS estimates.
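For contrast, a small simulation can show what happens when selection is not random. This is a sketch on entirely hypothetical data (not the lecture dataset): the variable names, seed, and parameter values are all assumptions, and the errors are built so that their correlation is 0.7.

```stata
clear
set seed 12345
set obs 5000

* hypothetical regressors: x enters both equations, z only the selection equation
generate x = rnormal()
generate z = rnormal()

* selection error u and outcome error e, correlated at 0.7 by construction
generate u = rnormal()
generate e = 0.7*u + sqrt(1 - 0.7^2)*rnormal()

* the outcome (true slope on x is 2) is observed only when the
* latent selection index is positive
generate y = 1 + 2*x + e if 0.5 + x + z + u > 0

* OLS on the observed cases, then the Heckman selection model
regress y x
heckman y x, select(x z)
```

In a setup like this, the OLS slope on x would be expected to drift away from 2, while the Heckman output should report a rho well away from zero and reject the test of independent equations.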
We're examining the wages of wives. There are over 800 wives with 0 wages. We would like to determine whether
this is a random process, where some wives work and some do not, or a non-random process. First, our OLS
model gives us the following information.
. regress wageswf income kids youngest

      Source |       SS       df       MS              Number of obs =    1726
-------------+------------------------------           F(  3,  1722) =  188.40
       Model |  34064.8457     3  11354.9486           Prob > F      =  0.0000
    Residual |  103787.983  1722  60.2717669           R-squared     =  0.2471
-------------+------------------------------           Adj R-squared =  0.2458
       Total |  137852.828  1725  79.9146831           Root MSE      =  7.7635

------------------------------------------------------------------------------
     wageswf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0001151   4.86e-06    23.70   0.000     .0001056    .0001246
        kids |   .2038852   .1834453     1.11   0.267    -.1559138    .5636843
    youngest |  -.0856795   .0406674    -2.11   0.035    -.1654423   -.0059168
       _cons |   4.429739   .3830775    11.56   0.000     3.678393    5.181085
------------------------------------------------------------------------------

We next run a Heckman model with a limited number of selection variables.
. heckman wageswf income kids youngest, select(youngest white)
Iteration 0: log likelihood = -8501.7992
Iteration 1: log likelihood = -7903.3723
Iteration 2: log likelihood = -7793.5076
Iteration 3: log likelihood = -7612.9477
Iteration 4: log likelihood = -7565.4426
Iteration 5: log likelihood = -7564.2854
Iteration 6: log likelihood = -7564.2536
Iteration 7: log likelihood = -7564.2529

Heckman selection model                         Number of obs      =      2560
(regression model with sample selection)        Censored obs       =       834
                                                Uncensored obs     =      1726

                                                Wald chi2(3)       =    565.28
Log likelihood =  -7564.253                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wageswf      |
      income |   .0001151   4.85e-06    23.72   0.000     .0001056    .0001246
        kids |    .203736   .1833006     1.11   0.266    -.1555266    .5629986
    youngest |  -.0864731     .04876    -1.77   0.076     -.182041    .0090947
       _cons |   4.452173   .8532013     5.22   0.000     2.779929    6.124417
-------------+----------------------------------------------------------------
select       |
    youngest |   .0459541   .0055746     8.24   0.000     .0350282    .0568801
       white |  -.0066306   .0583676    -0.11   0.910    -.1210289    .1077678
       _cons |   .3075549   .0536253     5.74   0.000     .2024512    .4126586
-------------+----------------------------------------------------------------
     /athrho |  -.0047056   .1600203    -0.03   0.977    -.3183397    .3089284
    /lnsigma |   2.048273   .0170247   120.31   0.000     2.014905    2.081641
-------------+----------------------------------------------------------------
         rho |  -.0047056   .1600168                     -.3080049    .2994619
       sigma |   7.754496   .1320182                      7.500014    8.017612
      lambda |  -.0364894   1.240864                     -2.468538    2.395559
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =     0.00   Prob > chi2 = 0.9795
------------------------------------------------------------------------------

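The ancillary parameters at the bottom of this output are tied together: Stata estimates /athrho and /lnsigma, then reports rho, sigma, and lambda as transformations of them. As a quick check, the arithmetic can be reproduced with display, using the values printed above.

```stata
* rho is the hyperbolic tangent of /athrho
display tanh(-.0047056)

* sigma is the exponential of /lnsigma
display exp(2.048273)

* lambda is rho times sigma
display -.0047056 * 7.754496
```

Because /athrho is so close to zero here, rho is essentially equal to it, and lambda (the selection term's coefficient) is correspondingly tiny relative to its standard error.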
From this, we find that the OLS coefficients are not statistically different from the Heckman-corrected coefficients:
the LR test fails to reject rho = 0 (p = 0.9795). Let's use a few more selection variables to see what happens.
With these new selection variables, we find that our OLS estimates were biased, so we are justified in
correcting the coefficient estimates with the Heckman selection model. We can see what happens to
the effect of youngest, for example.
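One way to track a single coefficient across the two estimators is to print it after each model. This is a sketch, assuming the lecture dataset is in memory. The extended list of selection variables is not shown here, so the select() option below simply repeats the earlier specification and would need to be replaced with the extended list.

```stata
* OLS on the observed wages only
quietly regress wageswf income kids youngest
display "OLS     youngest: " _b[youngest]

* Heckman model; select() here repeats the specification shown earlier
quietly heckman wageswf income kids youngest, select(youngest white)
display "Heckman youngest: " [wageswf]_b[youngest]
```

The [wageswf] prefix picks the coefficient from the outcome equation rather than the selection equation, which also contains youngest.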