Jeroen Smits
(http://home.planet.nl/~smits.jeroen)
September 2003
1. Introduction
Many statistical software packages, such as SAS, STATA, or LIMDEP, offer
the Heckman two-step procedure to control for selection bias (although
the options in these packages are sometimes rather limited). However,
SPSS, a statistical package widely used by social researchers, has no
procedure for applying this method. That does not mean that the method
cannot be applied with SPSS. With some additional computations, the SPSS
procedures PROBIT or LOGISTIC REGRESSION can be used to construct a
Heckman selection bias control factor. This control factor can then be
added to an OLS regression analysis in which selection bias is a
problem, to produce unbiased parameter estimates. To also obtain correct
standard errors for these parameters, a further step can be taken in
which a WLS regression analysis is performed, using weights constructed
from the outcomes of the earlier steps. This paper gives detailed
instructions on how this can be done.
1.3 Limitations
compute SUBJ=1.
PROBIT PARTW of SUBJ with AGEW EDUW CHILD
/log=none /print=none.
With this COMPUTE statement, the individual probit scores (IPS) are
computed and added to the temporary data file. These probit scores
are used to compute the Heckman control factor LAMBDA.
Jeroen Smits (http://home.planet.nl/~smits.jeroen) Heckman with SPSS 5
compute LAMBDA =
((1/sqrt(2*3.141592654))*(exp(-IPS*IPS*0.5)))/cdfnorm(IPS).
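The COMPUTE statement above is the inverse Mills ratio: the standard normal density divided by the standard normal cumulative distribution, both evaluated at the probit score. As a cross-check outside SPSS, a minimal Python sketch could look as follows. The function name `inverse_mills` is an illustrative choice and does not appear in the original syntax.

```python
import math


def inverse_mills(ips: float) -> float:
    """Return phi(ips) / Phi(ips), mirroring the SPSS COMPUTE for LAMBDA."""
    # Standard normal density, written the same way as in the SPSS statement.
    density = math.exp(-0.5 * ips * ips) / math.sqrt(2.0 * math.pi)
    # Standard normal cumulative distribution (the role of cdfnorm in SPSS).
    cumulative = 0.5 * (1.0 + math.erf(ips / math.sqrt(2.0)))
    return density / cumulative


if __name__ == "__main__":
    # Illustrative probit scores only; they are not taken from the paper.
    for score in (-1.0, 0.0, 1.0):
        print(score, inverse_mills(score))
```

The ratio grows as the probit score falls, so cases with a low predicted probability of selection receive the largest control factor.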
With the instruction "/save pred (IKL)" a new variable named IKL is
created and saved; it contains the individual probabilities predicted by
the model. Using the inverse cumulative distribution function of the
normal distribution, these individual probabilities are converted into
the scores they would have had if they had been computed on the basis of
a probit model:
The variable IPS now contains the quasi probit scores and can be
used to compute LAMBDA in the same way as when using a probit
selection model:
compute LAMBDA =
((1/sqrt(2*3.141592654))*(exp(-IPS*IPS*0.5)))/cdfnorm(IPS).
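The route from a predicted logistic probability to a quasi probit score and then to LAMBDA can be sketched in Python with the standard library. This is only an illustration under the assumption that IKL lies strictly between 0 and 1; the function names are hypothetical and are not part of the SPSS syntax.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def quasi_probit_score(ikl: float) -> float:
    """Map a predicted probability (0 < ikl < 1) to a standard normal score."""
    return STD_NORMAL.inv_cdf(ikl)


def lambda_from_probability(ikl: float) -> float:
    """Compute LAMBDA from a predicted probability via the quasi probit score."""
    ips = quasi_probit_score(ikl)
    return STD_NORMAL.pdf(ips) / STD_NORMAL.cdf(ips)


if __name__ == "__main__":
    # Illustrative probabilities only; they are not taken from the paper.
    for probability in (0.1, 0.5, 0.9):
        print(probability, lambda_from_probability(probability))
```

`inv_cdf` raises an error for probabilities of exactly 0 or 1, so such cases would need separate handling before LAMBDA is computed.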
Computation of the auxiliary control factor DELTA, and a test of
whether the value of DELTA lies between -1 and 0:
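The statement that defines DELTA is not shown at this point. As a hedged sketch, assume DELTA is the usual Heckman quantity with a negative sign, DELTA = -LAMBDA * (IPS + LAMBDA); that definition is an assumption here, chosen because it always lies between -1 and 0. The Python function names are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def heckman_lambda(ips: float) -> float:
    """Inverse Mills ratio phi(ips) / Phi(ips)."""
    return STD_NORMAL.pdf(ips) / STD_NORMAL.cdf(ips)


def heckman_delta(ips: float) -> float:
    """Assumed definition: DELTA = -LAMBDA * (IPS + LAMBDA)."""
    lam = heckman_lambda(ips)
    return -lam * (ips + lam)


def delta_in_range(ips: float) -> bool:
    """Check that DELTA lies strictly between -1 and 0."""
    return -1.0 < heckman_delta(ips) < 0.0


if __name__ == "__main__":
    # Illustrative probit scores only; they are not taken from the paper.
    for score in (-2.0, 0.0, 2.0):
        print(score, heckman_delta(score), delta_in_range(score))
```

Under this assumed definition, a value outside the interval would point to a computational error in IPS or LAMBDA rather than to a property of the data.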
REGRESSION /dep=INCW
/method=enter AGEW EDUW LAMBDA
/save resid (RES).
Besides RES, two auxiliary variables are needed. The first is the
regression coefficient of LAMBDA in the OLS analysis, called LAMB. The
second is the number of cases used in the OLS regression, called N.
Both values are copied from the OLS output:
compute LAMB=0.002648.
compute N=9024.
The variables RES2 and DELTA (the latter computed in the first part of
the analysis) have to be summed over all cases. In SPSS this can be
done automatically by first saving the aggregated totals in a separate
file and then merging them back into the working file:
compute HELP = 1.
AGGREGATE /outfile=A /break=HELP
/RESS=sum(RES2)
/DELTAS=sum(DELTA).
MATCH FILES /table=A /file=* /by HELP.
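The AGGREGATE and MATCH FILES steps above give every case the same two grand totals, RESS and DELTAS. A minimal Python sketch of that logic follows, assuming the cases are held as a list of dictionaries; the function name and the demo values are made up for illustration.

```python
def attach_totals(cases):
    """Add the grand totals RESS and DELTAS to every case.

    Mirrors AGGREGATE with a constant break variable followed by
    MATCH FILES: each case receives the same two sums.
    """
    ress = sum(case["RES2"] for case in cases)
    deltas = sum(case["DELTA"] for case in cases)
    return [{**case, "RESS": ress, "DELTAS": deltas} for case in cases]


if __name__ == "__main__":
    demo = [  # made-up values for illustration only
        {"RES2": 0.25, "DELTA": -0.5},
        {"RES2": 1.0, "DELTA": -0.25},
    ]
    for row in attach_totals(demo):
        print(row)
```

Using a constant break variable (HELP = 1 in the SPSS syntax) is what makes the aggregation produce a single row of totals that matches every case.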
Now the corrected variance (VARC) and the standard error (SEC) of the
error term of the substantive equation can be estimated:
REGRESSION /dep=INCW
/method=enter AGEW EDUW LAMBDA
/regwgt=WGT.
compute SUBJ=1.
The individual probit scores computed with this statement (IPS) are
again used to compute LAMBDA. With heterogeneity bias this computation
is more complex than in the classical selection bias situation. Because
migrants and nonmigrants need bias correction factors with opposite
signs, LAMBDA has to be computed separately for the two groups.
First we compute LAMBDA for the migrants:
If (MIGR=1) LAMBDA =
((1/sqrt(2*3.141592654))*(exp(-IPS*IPS*0.5)))/cdfnorm(IPS).
Then we compute LAMBDA for the nonmigrants:
If (MIGR=0) LAMBDA =
-((1/sqrt(2*3.141592654))*(exp(-IPS*IPS*0.5)))/(1-cdfnorm(IPS)).
From this point on, the procedure is the same as in the classical
case.
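The two conditional statements for migrants and nonmigrants can be summarised in one function. The Python sketch below is illustrative only: the name `signed_lambda` is hypothetical, and MIGR is assumed to be coded 1 for migrants and 0 for nonmigrants, as in the IF conditions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def signed_lambda(ips: float, migr: int) -> float:
    """Return LAMBDA with the sign and denominator that fit the group.

    migr == 1 (migrants):     phi(ips) / Phi(ips)
    migr == 0 (nonmigrants): -phi(ips) / (1 - Phi(ips))
    """
    density = STD_NORMAL.pdf(ips)
    if migr == 1:
        return density / STD_NORMAL.cdf(ips)
    return -density / (1.0 - STD_NORMAL.cdf(ips))


if __name__ == "__main__":
    # Illustrative probit scores only; they are not taken from the paper.
    for score in (-1.0, 0.0, 1.0):
        print(score, signed_lambda(score, 1), signed_lambda(score, 0))
```

The factor is always positive for migrants and always negative for nonmigrants, which is the opposite-sign property the text refers to.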
Using the quasi probit scores IPS, we again compute LAMBDA separately
for migrants and nonmigrants. First the migrants:
If (MIGR=1) LAMBDA =
((1/sqrt(2*3.141592654))*(exp(-IPS*IPS*0.5)))/cdfnorm(IPS).
Then the nonmigrants:
If (MIGR=0) LAMBDA =
-((1/sqrt(2*3.141592654))*(exp(-IPS*IPS*0.5)))/(1-cdfnorm(IPS)).
From this point on, the procedure is the same as in the classical
case.
REGRESSION /dep=INC
/method=enter AGE EDU MIGR LAMBDA
/save resid (RES).