GSI: Caroline, Chris, Jimmy, Kaushiki, Leah
Economists are often interested in the factors behind the decision-making of individuals or enterprises:
• Why do some women enter the labor force while others do not?
The models that have been developed for this purpose are known as qualitative response or binary choice
models, with the outcome, which we will denote Y, being assigned a value of 1 if the event occurs and 0
otherwise. Models with more than two possible outcomes have also been developed, but we will confine ourselves to the binary case.
• $P_i \equiv P(Y_i = 1 \mid X_{1i}, X_{2i}, \dots, X_{ki}) = F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})$ for some function $F$. In other words,
the probability that $Y_i = 1$ given the explanatory variables is a function of a linear combination
of the explanatory variables.
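Effect of a continuous variable $X_{ji}$: $\partial \hat{P}_i / \partial X_{ji} = \hat\beta_j f(\hat\beta_0 + \hat\beta_1 X_{1i} + \dots + \hat\beta_k X_{ki})$, where $f = F'$ is the derivative (density) of $F$.
Effect of a dummy variable $X_{ji}$: $F(\hat\beta_0 + \dots + \hat\beta_j \cdot 1 + \dots + \hat\beta_k X_{ki}) - F(\hat\beta_0 + \dots + \hat\beta_j \cdot 0 + \dots + \hat\beta_k X_{ki})$.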
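The two marginal-effect formulas above can be sketched in Python for the logit case, where $F = \Lambda$ and $f = \Lambda(1 - \Lambda)$. This is a minimal illustration, not a library routine: the function names are hypothetical, `beta[0]` is assumed to be the intercept, `j` is 1-based to match $\hat\beta_j$, and the coefficient and regressor values in the usage lines are made up. Passing a normal cdf/pdf instead of the logistic ones would give the probit versions.

```python
import math
from typing import Callable, Sequence

def logistic_cdf(z: float) -> float:
    """Standard logistic cdf, Lambda(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z: float) -> float:
    """Logistic density, f(z) = Lambda(z) * (1 - Lambda(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)

def linear_index(beta: Sequence[float], x: Sequence[float]) -> float:
    """beta_0 + beta_1*x_1 + ... + beta_k*x_k; x holds x_1..x_k (no constant)."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def continuous_effect(beta, x, j, pdf: Callable[[float], float] = logistic_pdf) -> float:
    """dP/dx_j = beta_j * f(index), evaluated at the regressor values x."""
    return beta[j] * pdf(linear_index(beta, x))

def dummy_effect(beta, x, j, cdf: Callable[[float], float] = logistic_cdf) -> float:
    """F(index with x_j = 1) - F(index with x_j = 0), other regressors held at x."""
    x_on, x_off = list(x), list(x)
    x_on[j - 1], x_off[j - 1] = 1.0, 0.0
    return cdf(linear_index(beta, x_on)) - cdf(linear_index(beta, x_off))

# Hypothetical estimates: intercept, a continuous X1, and a dummy X2
beta_hat = [-1.0, 0.5, 0.8]
x_eval = [2.0, 1.0]
print(continuous_effect(beta_hat, x_eval, j=1))  # effect of a small change in X1
print(dummy_effect(beta_hat, x_eval, j=2))       # effect of switching X2 from 0 to 1
```

Both effects depend on where they are evaluated (here at `x_eval`), which is why results are usually reported at the sample means or averaged over observations.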
∗ Thanks to previous GSIs for sharing section notes.
1.2 Linear Probability Model (LPM)
The simplest binary choice model is the linear probability model where, as the name implies, the probability
of the event occurring, $P_i$, is assumed to be a linear function of a set of explanatory variables. That is, we
assume that
$P_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$.
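• In other words, since $Y_i$ takes only the values 0 and 1, $E(Y_i \mid X_{1i}, \dots, X_{ki}) = 1 \cdot P_i + 0 \cdot (1 - P_i) = P_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$.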
• As usual, the value of the dependent variable $Y_i$ in observation $i$ has a nonstochastic component
and a random component. The nonstochastic component depends on the X's and the parameters; the
random component is the disturbance term: $Y_i = E(Y_i \mid X_{1i}, \dots, X_{ki}) + u_i$. So our model can be written as
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i$.
• So, using heteroskedasticity-robust standard errors, we can apply the same method (OLS) to estimate the model and conduct inference as in the usual linear regression model.
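As a sketch of this recipe with a single regressor, the OLS slope can be computed directly and paired with a heteroskedasticity-robust (sandwich) standard error. Assumptions: one regressor plus an intercept, the HC1 small-sample factor $n/(n-2)$ (the scaling used by Stata's `regress ..., robust`), and simulated data whose true probability $0.2 + 0.5X$ is made up for illustration. The function name `lpm_ols_robust` is hypothetical.

```python
import math
import random

def lpm_ols_robust(x, y):
    """OLS of a 0/1 outcome y on one regressor x (plus intercept).

    Returns (b0, b1, se_b1), where se_b1 is the HC1 heteroskedasticity-robust
    standard error of the slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Sandwich variance: sum (x_i - xbar)^2 u_i^2 / sxx^2, scaled by n / (n - 2)
    meat = sum((xi - xbar) ** 2 * ui ** 2 for xi, ui in zip(x, resid))
    var_b1 = n / (n - 2) * meat / sxx ** 2
    return b0, b1, math.sqrt(var_b1)

# Simulated data (hypothetical): P(Y = 1 | X) = 0.2 + 0.5 X with X ~ U(0, 1)
rng = random.Random(0)
x = [rng.uniform(0.0, 1.0) for _ in range(5000)]
y = [1 if rng.random() < 0.2 + 0.5 * xi else 0 for xi in x]

b0, b1, se_b1 = lpm_ols_robust(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, robust SE(b1) = {se_b1:.3f}")
```

The point estimates are ordinary OLS; only the standard error changes, because the usual homoskedastic formula is invalid when $Var(u_i \mid X)$ varies with $X$.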
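Remarks:
• The disturbance term can take only two values:
$Y_i = 1 \Rightarrow u_i = 1 - \beta_0 - \beta_1 X_{1i} - \dots - \beta_k X_{ki}$
$Y_i = 0 \Rightarrow u_i = -\beta_0 - \beta_1 X_{1i} - \dots - \beta_k X_{ki}$.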
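• The disturbance term is heteroskedastic, as shown in Exercise 3. It can be shown that $Var(u_i \mid X_{1i}, \dots, X_{ki}) = P_i(1 - P_i) = (\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})[1 - (\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})]$. So the variance of the disturbance term is a function of the explanatory variables.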
• It may predict probabilities greater than 1 or less than 0, depending on the values of the parameters
and the explanatory variables. The linear probability model assumes that a change in X from
1 to 2 produces the same change in probability as a change from X = 1000 to
X = 1001. This is why the predicted values can go above 1 or below 0. Although the model is very simple
and has straightforward methods for statistical inference, we need other models to deal with these problems.
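Both remarks can be seen numerically with a one-regressor LPM. In this sketch the coefficients $\beta_0 = 0.2$, $\beta_1 = 0.05$ are hypothetical: the variance $P(1-P)$ changes with X (heteroskedasticity), each extra unit of X adds the same 5 percentage points, and for large enough X the "probability" exceeds 1.

```python
def lpm_prob(b0: float, b1: float, x: float) -> float:
    """Predicted probability from a one-regressor LPM: b0 + b1 * x."""
    return b0 + b1 * x

def lpm_error_variance(p: float) -> float:
    """Var(u | X) = P (1 - P); only a valid variance when 0 <= P <= 1."""
    return p * (1.0 - p)

# Hypothetical coefficients: each extra unit of X adds 5 percentage points
B0, B1 = 0.2, 0.05
for x in [0, 5, 10, 20]:
    p = lpm_prob(B0, B1, x)
    if 0.0 <= p <= 1.0:
        print(f"X = {x:>2}: P = {p:.2f}, Var(u|X) = {lpm_error_variance(p):.4f}")
    else:
        print(f"X = {x:>2}: P = {p:.2f} lies outside [0, 1]")
```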
1.3 Logit and Probit Models
The linear probability model may make the nonsensical prediction that an event will occur with probability
greater than 1 or less than 0. Instead, you may want to consider a nonlinear model whose predicted
values cannot go above 1 or below 0. The usual way of avoiding this problem is to assume that the
probability is F(Z), where Z is a function of the explanatory variables and F is a cumulative distribution function (cdf).
Now, the probability that $Y_i = 1$ given the explanatory variables is $F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})$.
Since F is a cdf, it always lies in [0, 1], and our model can be written as
$Y_i = F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}) + u_i$.
• Probit Model: When F is the cdf of the standard normal distribution, the above model is
called probit regression. That is, $F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}) = \Phi(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})$, where $\Phi$
is the cdf of the standard normal distribution.
• Logit Model: When F is the cdf of the standard logistic distribution, the above model is
called logistic (logit) regression. That is,
$F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}) = \Lambda(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})}}$.
• As the above shows, these models are not linear in the parameters, so we cannot use OLS in this case.
Instead, we rely on the nonlinear least squares (NLS) estimator or the maximum likelihood estimator
(MLE). These methods are technically and computationally more complicated than OLS.
• In Stata, you may use the `probit` or `logit` commands to estimate the model and obtain standard
errors.
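• Effect of a change in $X_j$:
The important thing to remember is that under both assumptions F is nonlinear, which means that the effect of a change in $X_j$ on the predicted probability is not constant: it depends on $\hat\beta_j$ and on the values of all the explanatory variables at which it is evaluated.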
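To see how the two cdfs and maximum likelihood fit together, here is a minimal Python sketch: $\Phi$ via `math.erf`, $\Lambda$ in an overflow-safe form, and a Newton-Raphson maximizer of the logit log-likelihood $\sum_i [Y_i \ln \Lambda(z_i) + (1 - Y_i) \ln(1 - \Lambda(z_i))]$ with $z_i = \beta_0 + \beta_1 X_i$. It illustrates the MLE idea on simulated data with a made-up true index; it is not Stata's `logit` implementation, and all function names are hypothetical. The score is $\sum_i (Y_i - \Lambda(z_i))(1, X_i)'$ and the information matrix is $\sum_i \Lambda(z_i)(1 - \Lambda(z_i))(1, X_i)'(1, X_i)$.

```python
import math
import random

def probit_cdf(z: float) -> float:
    """Standard normal cdf, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z: float) -> float:
    """Standard logistic cdf, Lambda(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score_and_information(b0, b1, x, y):
    """Logit score vector and information matrix for the index b0 + b1 * x."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = logit_cdf(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def logit_mle(x, y, iters=25):
    """Newton-Raphson maximization of the logit log-likelihood.

    Returns (b0, b1, se_b0, se_b1); SEs come from the inverse information matrix."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I(beta)^{-1} * score (2x2 inverse written out)
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    return b0, b1, math.sqrt(i11 / det), math.sqrt(i00 / det)

# Simulated data (hypothetical): true index -0.5 + 1.0 * X, X ~ N(0, 1)
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [1 if rng.random() < logit_cdf(-0.5 + 1.0 * xi) else 0 for xi in x]

b0, b1, se0, se1 = logit_mle(x, y)
print(f"b0 = {b0:.3f} ({se0:.3f}), b1 = {b1:.3f} ({se1:.3f})")
```

Because the logit log-likelihood is globally concave, Newton-Raphson from zero converges quickly here; swapping `logit_cdf` for `probit_cdf` would require a different score and information formula.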
2 Instrumental Variables Regression
In the last classes we have discussed several problems (or threats to internal validity) that make the error term
correlated with the regressor ($Cov(X, u) \neq 0$), such as omitted variables, errors in variables, and simultaneous
causality. Instrumental variables (IV) regression is a useful way to address these issues. To understand
how IV works, think of the variation in X as having two parts: one that is correlated with u (the bad
variation), and a second part that is not correlated with u (the good variation). Roughly speaking,
IV regression uses information from another variable Z (called an instrument) to isolate the good variation
in X and disregard the bad variation (which biases the OLS estimates). As a result, the IV estimate
of the coefficient on X gives us a consistent estimate of the effect of X on Y, based only on the good variation in X.
An important thing to keep in mind is that for this approach to work we need to find an instrument Z
that is related to X and unrelated to the things we don't observe that might also change Y (so that Z affects
Y only through X). The first condition is empirically testable, but unfortunately this is not true for the
second condition.
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i, \quad i = 1, \dots, n,$
where:
1. $E(u_i \mid W_i) = 0$.
2. $(X_i, W_i, Z_i, Y_i)$ are i.i.d. draws from their joint distribution.
3. Large outliers are unlikely: the X's, W's, Z's, and Y have nonzero finite fourth moments.
4. The two conditions for a valid instrument hold: instrument relevance and instrument exogeneity.
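A minimal simulation sketch of these ideas, under made-up assumptions: an unobserved variable ("ability") enters both X and u, so OLS is inconsistent, while Z shifts X but is independent of u. With one instrument and one exogenous W, the 2SLS coefficient on X can be computed by partialling the constant and W out of Z (Frisch-Waugh-Lovell) and then taking the IV ratio $\tilde z' Y / \tilde z' X$. All names and parameter values are hypothetical.

```python
import random

def residualize(v, w):
    """Residuals from an OLS regression of v on a constant and w."""
    n = len(v)
    vbar, wbar = sum(v) / n, sum(w) / n
    swv = sum((wi - wbar) * (vi - vbar) for wi, vi in zip(w, v))
    sww = sum((wi - wbar) ** 2 for wi in w)
    slope = swv / sww
    return [vi - vbar - slope * (wi - wbar) for vi, wi in zip(v, w)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def ols_beta1(y, x, w):
    """OLS coefficient on x in Y = b0 + b1 X + b2 W + u (Frisch-Waugh-Lovell)."""
    xt = residualize(x, w)
    return dot(xt, y) / dot(xt, xt)

def iv_beta1(y, x, z, w):
    """Just-identified IV (= 2SLS) coefficient on x, instrument z, exogenous w."""
    zt = residualize(z, w)
    return dot(zt, y) / dot(zt, x)

rng = random.Random(1)
n = 20000
w, z, x, y = [], [], [], []
for _ in range(n):
    wi = rng.gauss(0.0, 1.0)       # exogenous control W
    zi = rng.gauss(0.0, 1.0)       # instrument Z, independent of u
    ability = rng.gauss(0.0, 1.0)  # unobserved: enters both X and u
    xi = 0.8 * zi + 0.5 * wi + ability + rng.gauss(0.0, 1.0)
    ui = ability + rng.gauss(0.0, 1.0)
    w.append(wi)
    z.append(zi)
    x.append(xi)
    y.append(1.0 + 2.0 * xi + 0.5 * wi + ui)  # true beta_1 = 2

print("OLS:", round(ols_beta1(y, x, w), 3))   # biased upward by construction
print("IV: ", round(iv_beta1(y, x, z, w), 3))
```

Here relevance holds because Z enters X with coefficient 0.8, and exogeneity holds because Z is drawn independently of ability and the noise in u; in real data only the first of these can be checked.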
3 Exercises
1. [Female teacher.] You want to estimate the probability of a student being assigned to a female
teacher. To do so, you regress a variable that takes the value 1 if the teacher is female (fem)
on an indicator for a male student (male), the percentage of female teachers in the school (perfem), and
past grades (pastgrade), where pastgrade is standardized to have a mean of zero and a standard
deviation of 1, and perfem has a mean of 0.7338 and a standard deviation of 0.3611. The OLS
results are the following (standard errors in parentheses):
$\widehat{fem} = \underset{(0.001)}{0.195} - \underset{(0.000)}{0.002}\, pastgrade - \underset{(0.001)}{0.020}\, male + \underset{(0.001)}{0.717}\, perfem$
(b) You decide to also use a logit model to estimate your regression. Your results are the following:
(c) The results for the same regression using a probit model are the following:
2. [Voting Republican.] You want to estimate how income affects the probability of voting for a Republican candidate. The results from your logit model are the following:
where Income is measured in thousands of dollars. The mean income in your sample is 50 thousand
dollars.
(a) What is the probability of voting Republican if income is ten thousand dollars?
3. [SW 12.9 Instrumental Variables.] A researcher is interested in the effect of military service on
human capital. He collects data from a random sample of 4000 workers aged 40 and runs the OLS
regression $Y_i = \beta_0 + \beta_1 X_i + u_i$, where $Y_i$ is worker i's annual earnings and $X_i$ is a binary variable
that equals 1 if worker i served in the military and 0 otherwise.
a) Explain why the OLS estimates are likely to be unreliable. (Hint: Which variables are omitted from the regression?)
b) During the Vietnam War there was a draft, where priority for the draft was determined by a
national lottery. (The days of the year were randomly reordered 1 through 365. Those with
birthdates ordered first were drafted before those with birthdates ordered second, and so forth.)
Explain how the lottery might be used as an instrument to estimate the effect of military service
on earnings.
4 Solutions
1. [Female teacher.]
(a) Holding past grades and the percentage of female teachers constant, a male student is 2 percentage points less likely to have a female teacher than a female student.
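As a quick arithmetic check on the fitted LPM in Exercise 1, the equation can be evaluated directly. The evaluation points (average past grades, average perfem) are illustrative choices rather than part of the exercise, and `fem_hat` is a hypothetical helper name; the one-standard-deviation calculation is simply $0.717 \times 0.3611$.

```python
# Coefficients from the fitted OLS equation in Exercise 1
B0, B_PASTGRADE, B_MALE, B_PERFEM = 0.195, -0.002, -0.020, 0.717
PERFEM_MEAN, PERFEM_SD = 0.7338, 0.3611

def fem_hat(pastgrade: float, male: int, perfem: float) -> float:
    """Predicted probability of having a female teacher from the fitted LPM."""
    return B0 + B_PASTGRADE * pastgrade + B_MALE * male + B_PERFEM * perfem

# Students with average past grades (pastgrade = 0) in a school with average perfem
female_student = fem_hat(0.0, 0, PERFEM_MEAN)
male_student = fem_hat(0.0, 1, PERFEM_MEAN)
print(f"female: {female_student:.4f}, male: {male_student:.4f}")
print(f"male - female: {male_student - female_student:+.3f}")  # equals the male coefficient
# Change in the predicted probability from a one-standard-deviation increase in perfem
print(f"one-SD perfem effect: {B_PERFEM * PERFEM_SD:.4f}")
```

Because the model is linear, the male/female gap is the same 2 percentage points at every value of the other regressors.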
2. [Voting Republican.]
(a) $\widehat{Pr}(Republican = 1) = \frac{1}{1 + e^{-(-1.00 + 0.02 \times 10)}} = 0.31$.
(b) $\frac{\partial \widehat{Pr}(Rep = 1)}{\partial Income} = \frac{\partial \Lambda(\hat\beta_0 + \hat\beta_1 Income)}{\partial Income} = \frac{\hat\beta_1 e^{-(\hat\beta_0 + \hat\beta_1 Income)}}{[1 + e^{-(\hat\beta_0 + \hat\beta_1 Income)}]^2}$.
If we evaluate the above expression at Income = 50, the marginal effect is equal to 0.005:
increasing Income by one thousand dollars increases the probability of voting Republican by 0.5
percentage points.
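The two numbers in this solution can be reproduced in a few lines. The coefficients $\hat\beta_0 = -1.00$ and $\hat\beta_1 = 0.02$ are read off the expression in part (a); the function names are hypothetical.

```python
import math

# Logit estimates used in the solution: intercept -1.00, Income coefficient 0.02
B0, B1 = -1.00, 0.02

def prob_republican(income: float) -> float:
    """Lambda(b0 + b1 * Income), with Income in thousands of dollars."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * income)))

def marginal_effect(income: float) -> float:
    """d Pr(Rep = 1) / d Income = b1 * e^(-z) / (1 + e^(-z))^2, with z = b0 + b1 * Income."""
    e = math.exp(-(B0 + B1 * income))
    return B1 * e / (1.0 + e) ** 2

print(round(prob_republican(10), 2))  # part (a) → 0.31
print(round(marginal_effect(50), 4))  # part (b), at the mean income → 0.005
```

At Income = 50 the index is exactly 0, so $\Lambda = 0.5$ and the marginal effect is $0.02 \times 0.5 \times 0.5 = 0.005$; a finite difference such as `prob_republican(51) - prob_republican(50)` gives nearly the same value.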
3. [Instrumental Variables.]
a) There are other factors that could affect both the choice to serve in the military and annual
earnings. One example could be education, although this could be included in the regression as
a control variable. Another such variable is ability, which is difficult to measure and thus difficult to control for.
b) Draft priority was determined by a national lottery, so whether a person was drafted (and hence more likely
to serve in the military) was random. Because the lottery number was randomly assigned, it is uncorrelated with individual
characteristics that may affect earnings, and hence the instrument is exogenous. Because it
affected the probability of serving in the military, the lottery number is also relevant.
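A simulation sketch of the lottery argument, with made-up numbers: a randomly assigned draft indicator shifts the probability of serving, while unobserved ability (by assumption) lowers the chance of serving and raises earnings. With a binary instrument, the IV estimate reduces to the Wald estimator: the difference in mean earnings by draft status divided by the difference in service rates. The exercise's actual instrument is the lottery number; collapsing it to a 0/1 "drafted" indicator, and every parameter value below, are assumptions made here for simplicity.

```python
import random

def mean(v):
    return sum(v) / len(v)

def wald_iv(y, x, z):
    """IV estimate with a binary instrument: reduced form / first stage (Wald estimator)."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    return (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))

def ols_binary(y, x):
    """OLS slope on a binary regressor = difference in mean outcomes."""
    treated = [yi for yi, xi in zip(y, x) if xi == 1]
    untreated = [yi for yi, xi in zip(y, x) if xi == 0]
    return mean(treated) - mean(untreated)

rng = random.Random(7)
y, x, z = [], [], []
for _ in range(50000):
    drafted = 1 if rng.random() < 0.5 else 0   # lottery: independent of everything else
    ability = rng.gauss(0.0, 1.0)               # unobserved, raises earnings
    served = 1 if 0.9 * drafted - 0.5 * ability + rng.gauss(0.0, 1.0) > 0.6 else 0
    earnings = 30.0 - 2.0 * served + 3.0 * ability + rng.gauss(0.0, 1.0)  # true effect: -2
    y.append(earnings)
    x.append(served)
    z.append(drafted)

print("OLS (difference in means):", round(ols_binary(y, x), 2))
print("IV (Wald):", round(wald_iv(y, x, z), 2))
```

In this setup OLS overstates the earnings penalty because lower-ability workers are more likely to serve, while the Wald estimator uses only the variation in service induced by the lottery.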