Section 11 PDF

Section 11 - Econ 140∗
GSI: Caroline, Chris, Jimmy, Kaushiki, Leah
1 Binary Dependent Variables (Yi = 1 or Yi = 0)
Economists are often interested in the factors behind the decision-making of individuals or enterprises:
• Why do some people go to college while others do not?
• Why do some women enter the labor force while others do not?
• Why do some people buy houses while others rent?
• Why do some people migrate while others stay put?
The models that have been developed for this purpose are known as qualitative response or binary choice
models, with the outcome, which we will denote Y, being assigned a value of 1 if the event occurs and 0
otherwise. Models with more than two possible outcomes have also been developed, but we will confine
most of our attention to binary choice models.
1.1 General Model
• Pi ≡ P(Yi = 1|X1i, X2i, . . . , Xki) = F(β0 + β1 X1i + · · · + βk Xki), for some function F. In other words,
the probability of Yi being equal to 1 given the explanatory variables is a function of a linear combination
of the independent variables.
• Predicted probabilities: P̂i = F(β̂0 + β̂1 X1i + · · · + β̂k Xki)
• Effect of a change in the regressor:
  - Effect for a continuous variable Xji: ∂P̂i/∂Xji = β̂j f(β̂0 + β̂1 X1i + · · · + β̂k Xki), where f is the
    derivative of the cumulative distribution function, i.e. the density function.
  - Effect for a dummy variable Xji: F(β̂0 + · · · + β̂j · 1 + · · · + β̂k Xki) − F(β̂0 + · · · + β̂j · 0 + · · · + β̂k Xki).
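As a rough illustration of the formulas above, the following Python sketch computes a predicted probability and both kinds of effects. It assumes F is the standard logistic cdf; the coefficient values and the helper names (index, continuous_effect, dummy_effect) are hypothetical and chosen only for this example.

```python
import math

def logistic_cdf(z):
    """Standard logistic cdf, one possible choice of F."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    """Derivative of the logistic cdf: f(z) = F(z) * (1 - F(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)

def index(beta, x):
    """Linear index beta_0 + beta_1 x_1 + ... + beta_k x_k."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def continuous_effect(beta, x, j, pdf=logistic_pdf):
    """Effect of a small change in regressor j (1-based): beta_j * f(index)."""
    return beta[j] * pdf(index(beta, x))

def dummy_effect(beta, x, j, cdf=logistic_cdf):
    """Effect of switching dummy regressor j (1-based) from 0 to 1."""
    x_on = list(x)
    x_on[j - 1] = 1.0
    x_off = list(x)
    x_off[j - 1] = 0.0
    return cdf(index(beta, x_on)) - cdf(index(beta, x_off))

# Hypothetical estimates and regressor values, for illustration only.
beta_hat = [-0.5, 0.8, 0.3]   # beta_0, beta_1 (continuous X1), beta_2 (dummy X2)
x_i = [1.0, 0.0]              # X1 = 1, X2 = 0

p_hat = logistic_cdf(index(beta_hat, x_i))
me_x1 = continuous_effect(beta_hat, x_i, j=1)
me_x2 = dummy_effect(beta_hat, x_i, j=2)
print("predicted probability:", p_hat)
print("effect of continuous X1:", me_x1)
print("effect of dummy X2:", me_x2)
```

Swapping in the standard normal cdf and density for logistic_cdf and logistic_pdf would give the corresponding probit quantities.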
∗ Thanks to previous GSIs for sharing section notes.
1.2 Linear Probability Model (LPM)
The simplest binary choice model is the linear probability model where, as the name implies, the probability
of the event occurring, Pi, is assumed to be a linear function of a set of explanatory variables. That is, we
assume that:
• P (Yi = 1|X1i , . . . , Xki ) = β0 + β1 X1i + · · · + βk Xki for all i.
• In other words,
Yi |X1i , . . . , Xki ∼ Bernoulli(β0 + β1 X1i + · · · + βk Xki ) for all i.
• As usual, the value of the dependent variable Yi in observation i has a nonstochastic component
and a random component. The nonstochastic component depends on Xi and the parameters. The
random component is the disturbance term: Yi = E(Yi |X1i , . . . , Xki ) + ui . So, our model can be
written in the form of a usual OLS model:
Yi = β0 + β1 X1i + · · · + βk Xki + ui.
• See Exercise 3 on E(ui |X1i , . . . , Xki ) and V ar(ui |X1i , . . . , Xki ).
• So, using heteroskedasticity-robust standard errors, we can apply the usual OLS methods to estimate the
parameters and conduct statistical inference on them.
Remarks:
• ui takes only two values for each i:
Yi = 1 ⇒ ui = 1 − β0 − β1 X1i − · · · − βk Xki
Yi = 0 ⇒ ui = −β0 − β1 X1i − · · · − βk Xki .
• The disturbance term is heteroskedastic, as shown in Exercise 3. It can be shown that Var(ui|X1i, . . . , Xki) = Pi(1 − Pi) =
(β0 + β1 X1i + · · · + βk Xki)[1 − (β0 + β1 X1i + · · · + βk Xki)]. So the variance of the disturbance term is a
function of the observables: we have a particular case of heteroskedasticity.
• The model may predict probabilities greater than 1 or less than 0, depending on the values of the parameters
and independent variables. Linear probability models assume that if the X variable changes from
1 to 2 then you have the same change in probability as for someone who goes from X = 1000 to
X = 1001. This is why the predicted values can go above 1 or below 0. Although the LPM is very simple
and has straightforward methods for statistical inference, we need other models to deal with
binary dependent variables.
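The two remarks above can be seen in a small simulation. The sketch below fits an LPM by OLS to simulated data and counts the fitted values that fall outside [0, 1]. The data-generating process (X uniform on [-4, 4], a logistic true probability with slope 1.5) and the sample size are assumptions made only for this illustration.

```python
import math
import random

random.seed(0)
n = 2000

# Hypothetical data: X uniform on [-4, 4], true P(Y=1|X) logistic in X.
x = [random.uniform(-4.0, 4.0) for _ in range(n)]
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-1.5 * xi)) else 0 for xi in x]

# OLS with one regressor: slope = Cov(X, Y) / Var(X), intercept from the means.
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted "probabilities" from the linear probability model.
fitted = [b0 + b1 * xi for xi in x]
n_outside = sum(1 for p in fitted if p < 0.0 or p > 1.0)
print("LPM estimates:", b0, b1)
print("fitted values outside [0, 1]:", n_outside, "of", n)

# The implied conditional variance P(1 - P) changes with X: heteroskedasticity.
for x_val in (0.0, 2.0):
    p = b0 + b1 * x_val
    print("X =", x_val, "fitted P =", p, "implied Var(u|X) =", p * (1.0 - p))
```

Because the fitted line has a constant slope, it has to leave the unit interval once X is far enough from its mean, which is the behavior described in the last remark.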
1.3 Logit and Probit Models
The linear probability model may make the nonsense predictions that an event will occur with probability
greater than 1 or less than 0. Instead you may want to consider a non-linear model where your predicted
values cannot go above 1 or below 0. The usual way of avoiding this problem is to assume that the
probability is F (Z), where Z is a function of the explanatory variables and F is a cumulative distribution
function. That is,
Yi|X1i, . . . , Xki ∼ Bernoulli(F(β0 + β1 X1i + · · · + βk Xki))
Now, the probability of Yi being equal to 1 given the explanatory variables is F(β0 + β1 X1i + · · · + βk Xki).
Since F is a cdf, it always lies in [0, 1]. And our model can be written as
Yi = F(β0 + β1 X1i + · · · + βk Xki) + ui, for all i.
• Probit Model: When F is the cdf of the standard normal distribution, the above model is called
probit regression. That is, F(β0 + β1 X1i + · · · + βk Xki) = Φ(β0 + β1 X1i + · · · + βk Xki), where Φ
is the cdf of the standard normal distribution.
• Logit Model: When F is the cdf of the standard logistic distribution, the above model is called
logistic regression. That is,
$F(\beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}) = \Lambda(\beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki})}}.$
• As the above shows, these models are not linear in the parameters, so we cannot use OLS in this case.
Instead, we rely on a non-linear least squares estimator (NLSE) or the maximum likelihood estimator
(MLE). These methods are technically and computationally more complicated than OLS,
so we will not go over the details in this note.
• In Stata, you may use the probit or logit commands to estimate the model and obtain standard
errors.
• Effect of a change in Xi:
For simplicity, we assume that k = 1.
  - In general, the choice between probit and logit will not make a substantial difference empirically.
    The important thing to remember is that under both assumptions F is non-linear, which means
    that interpreting your results is not as straightforward as in the linear case.
  - If Xi is a continuous variable: effect = $\partial \hat{P}_i / \partial X_i = \hat{\beta}_1 \, f(\hat{\beta}_0 + \hat{\beta}_1 X_i)$, where $f(\cdot)$ is the density
    function.
  - If Xi is a dummy variable: effect = $\widehat{\Pr}(Y_i = 1 \mid X_i = 1) - \widehat{\Pr}(Y_i = 1 \mid X_i = 0)$.
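To give a sense of what the maximum likelihood step involves, here is a minimal sketch of logit estimation with k = 1, using Newton-Raphson on simulated data. It is not meant to describe what Stata does internally. The true coefficients (-0.5 and 1.0), the sample size, and the function names are assumptions made for the illustration.

```python
import math
import random

def logistic(z):
    """Standard logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, iterations=25):
    """Maximum likelihood for P(Y=1|X) = logistic(b0 + b1*X) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # negative Hessian (information matrix)
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * step = gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1

# Hypothetical data: true beta_0 = -0.5, beta_1 = 1.0, X uniform on [-2, 2].
random.seed(42)
n = 5000
x = [random.uniform(-2.0, 2.0) for _ in range(n)]
y = [1 if random.random() < logistic(-0.5 + 1.0 * xi) else 0 for xi in x]

b0_hat, b1_hat = fit_logit(x, y)
# Effect of a continuous X at X = 0: beta_1_hat * f(beta_0_hat), with f = F(1 - F).
effect_at_zero = b1_hat * logistic(b0_hat) * (1.0 - logistic(b0_hat))
print("estimates:", b0_hat, b1_hat)
print("effect of X at X = 0:", effect_at_zero)
```

The effect is evaluated at a single point (X = 0) on purpose: unlike in the linear case, it changes with the value of X at which it is computed.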
2 Instrumental Variables Regression
2.1 Intuition and general model
In the last classes we have discussed several problems (or threats to internal validity) that make the error term
correlated with the regressor (Cov(X, u) ≠ 0), such as omitted variables, errors in variables, and simultaneous
causality. Instrumental variables (IV) regression is a way to address these issues. To understand
how IV works, think of the variation in X as having two parts: one that is correlated with u (the bad
variation), and a second part that is not correlated with u (the good variation). Roughly speaking,
IV regression uses information from another variable Z (called an instrument) to isolate the good variation
in X and disregard the bad variation (which biases the OLS estimates). As a result, the IV estimate
of the coefficient on X will give us a consistent estimate of the effect of the good variation part of X on Y.
An important thing to keep in mind is that for this approach to work we need to find an instrument Z
that is related to X and unrelated to things we don't observe that might also change Y (so that Z affects
Y only through X). The first condition is empirically testable, but unfortunately the second condition
is not.
The general IV regression model is
Yi = β0 + β1 Xi + β2 Wi + ui,   i = 1, . . . , n
where:
• Yi is the dependent variable
• ui is the error term (representing measurement error and/or omitted factors)
• Xi is the endogenous regressor (Cov(X, u) ≠ 0)
• Wi is the included exogenous regressor (Cov(W, u) = 0)
• Zi is the instrument, which satisfies:
  - instrument relevance: Cov(Z, X) ≠ 0
  - instrument exogeneity: Cov(Z, u) = 0
The IV regression assumptions are now
1. E (ui |Wi ) = 0.
2. (Xi , Wi , Zi , Yi ) are i.i.d. draws from their joint distribution.
3. Large outliers are unlikely: The X's, W's, Z's, and Y have nonzero finite fourth moments.
4. The two conditions for a valid instrument hold: instrument relevance and instrument exogeneity.
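The intuition about good and bad variation can be checked in a simulation. The sketch below generates data in which X is correlated with u, then compares the OLS slope with the IV slope. To keep it short it leaves out the control W and uses one instrument, in which case the IV (two stage least squares) estimator of β1 reduces to Cov(Z, Y)/Cov(Z, X). All numerical values are hypothetical.

```python
import random

random.seed(1)
n = 20000
beta0, beta1 = 1.0, 2.0   # hypothetical true coefficients

z, x, y = [], [], []
for _ in range(n):
    zi = random.gauss(0.0, 1.0)              # instrument: drawn independently of u
    ui = random.gauss(0.0, 1.0)              # error term
    vi = 0.8 * ui + random.gauss(0.0, 1.0)   # part of X that moves with u (bad variation)
    xi = 0.5 * zi + vi                       # relevance: Cov(Z, X) != 0
    z.append(zi)
    x.append(xi)
    y.append(beta0 + beta1 * xi + ui)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)

ols_slope = cov(x, y) / cov(x, x)   # biased because Cov(X, u) != 0
iv_slope = cov(z, y) / cov(z, x)    # uses only the variation in X driven by Z
print("true beta1:", beta1)
print("OLS slope:", ols_slope)
print("IV slope:", iv_slope)
```

In this design the OLS slope settles above the true value because X and u are positively correlated, while the IV slope stays close to it. In real data only relevance can be checked this way, since u is not observed.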
3 Exercises
1. [Female teacher.] You want to estimate the probability of a student being assigned to a female
teacher. To do that, you regress a variable that takes the value of 1 if the teacher is female (fem)
on an indicator for male student (male), the percentage of female teachers in the school (perfem) and
past grades (pastgrade), where pastgrade is standardized to have a mean of zero and a standard
deviation of 1, and perfem has a mean of 0.7338 and a standard deviation of 0.3611. The OLS
results from your regression are the following (standard errors in parentheses):
$\widehat{fem} = \underset{(0.001)}{0.195} - \underset{(0.000)}{0.002}\,pastgrade - \underset{(0.001)}{0.020}\,male + \underset{(0.001)}{0.717}\,perfem$
(a) What is the effect of being a male student?
(b) You decide to also use a logit model to estimate your regression. Your results are the following:
$\hat{P}(fem = 1) = \Lambda(\underset{(0.008)}{-1.528} - \underset{(0.003)}{0.013}\,pastgrade - \underset{(0.007)}{0.153}\,male + \underset{(0.009)}{3.717}\,perfem)$
What is the effect of being a male student?
(c) The results for the same regression using a probit model are the following:
$\hat{P}(fem = 1) = \Phi(\underset{(0.005)}{-0.904} - \underset{(0.002)}{0.008}\,pastgrade - \underset{(0.004)}{0.085}\,male + \underset{(0.005)}{2.200}\,perfem)$
What is the effect of being a male student?
2. [Voting Republican.] You want to estimate how income affects the probability of voting for a
Republican candidate. The results from your logit model are the following:
$\hat{P}(Republican = 1) = \Lambda(\underset{(0.00)}{-1.00} + \underset{(0.00)}{0.02}\,Income)$
where Income is measured in thousands of dollars. The mean income in your sample is 50 thousand
dollars.
(a) What is the probability of voting Republican if income is ten thousand dollars?
(b) What is the marginal effect of income?
3. [SW 12.9 Instrumental Variables.] A researcher is interested in the effect of military service on
human capital. He collects data from a random sample of 4000 workers aged 40 and runs the OLS
regression Yi = β0 + β1 Xi + ui, where Yi is worker i's annual earnings and Xi is a binary variable
that is equal to 1 if the person served in the military and 0 otherwise.
a) Explain why the OLS estimates are likely to be unreliable. (Hint: Which variables are omitted
from the regression? Are they correlated with military service?)
b) During the Vietnam War there was a draft, where priority for the draft was determined by a
national lottery. (The days of the year were randomly reordered 1 through 365. Those with
birthdates ordered first were drafted before those with birthdates ordered second, and so forth.)
Explain how the lottery might be used as an instrument to estimate the effect of military service
on earnings.
4 Solutions
1. [Female teacher.]
(a) A male student is 2 percentage points less likely to have a female teacher than a female student.
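(b) $\widehat{\Pr}(fem = 1 \mid male = 1) - \widehat{\Pr}(fem = 1 \mid male = 0)$
$= \Lambda(\hat{\beta}_0 + \hat{\beta}_1\,pastgrade + \hat{\beta}_2 + \hat{\beta}_3\,perfem) - \Lambda(\hat{\beta}_0 + \hat{\beta}_1\,pastgrade + \hat{\beta}_3\,perfem)$
$= \frac{1}{1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 pastgrade + \hat{\beta}_2 + \hat{\beta}_3 perfem)}} - \frac{1}{1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 pastgrade + \hat{\beta}_3 perfem)}} = -0.02825.$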
A male student is 2.8 percentage points less likely to have a female teacher than a female
student.
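(c) $\widehat{\Pr}(fem = 1 \mid male = 1) - \widehat{\Pr}(fem = 1 \mid male = 0)$
$= \Phi(\hat{\beta}_0 + \hat{\beta}_1\,pastgrade + \hat{\beta}_2 + \hat{\beta}_3\,perfem) - \Phi(\hat{\beta}_0 + \hat{\beta}_1\,pastgrade + \hat{\beta}_3\,perfem) = -0.02714.$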
A male student is 2.7 percentage points less likely to have a female teacher than a female
student.
2. [Voting Republican.]
(a) $\widehat{\Pr}(Republican = 1) = \frac{1}{1 + e^{-(-1.00 + 0.02 \times 10)}} = 0.31.$
(b) $\frac{\partial \widehat{\Pr}(Rep = 1)}{\partial Income} = \frac{\partial \Lambda(\hat{\beta}_0 + \hat{\beta}_1 Income)}{\partial Income} = \frac{\hat{\beta}_1 e^{-(\hat{\beta}_0 + \hat{\beta}_1 Income)}}{[1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 Income)}]^2}.$
If we evaluate the above expression at Income = 50, the marginal effect is equal to 0.005:
increasing Income by one thousand dollars increases the probability of voting Republican by 0.5
percentage points.
3. [Instrumental Variables.]
a) There are other factors that could affect both the choice to serve in the military and annual
earnings. One example could be education, although this could be included in the regression as
a control variable. Another variable is ability, which is difficult to measure and thus difficult
to control for in the regression.
b) Draft priority was determined by a national lottery, so each person's lottery number was
random. Because it was randomly assigned, the lottery number is uncorrelated with individual
characteristics that may affect earnings, and hence the instrument is exogenous. Because it
affected the probability of serving in the military, the lottery number is relevant.
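The numerical answers to Exercises 1 and 2 can be reproduced with a few lines of Python. The solutions do not say at which values of pastgrade and perfem the effects in 1(b) and 1(c) are evaluated. The sketch below assumes the sample means (pastgrade = 0, perfem = 0.7338), so small differences from the reported figures may come from that assumption or from rounding of the coefficients.

```python
import math

def logistic_cdf(z):
    """Standard logistic cdf (Lambda)."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal cdf (Phi), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 1 (b), (c). Assumed evaluation point, not stated in the solutions:
pastgrade, perfem = 0.0, 0.7338

logit_index = -1.528 - 0.013 * pastgrade + 3.717 * perfem    # index with male = 0
probit_index = -0.904 - 0.008 * pastgrade + 2.200 * perfem   # index with male = 0
logit_effect = logistic_cdf(logit_index - 0.153) - logistic_cdf(logit_index)
probit_effect = normal_cdf(probit_index - 0.085) - normal_cdf(probit_index)
print("logit effect of male:", logit_effect)
print("probit effect of male:", probit_effect)

# Exercise 2 (a): predicted probability at Income = 10 (thousand dollars).
p_income_10 = logistic_cdf(-1.00 + 0.02 * 10)
print("P(Republican = 1 | Income = 10):", p_income_10)

# Exercise 2 (b): marginal effect beta_1 * e^{-z} / (1 + e^{-z})^2 at Income = 50.
z50 = -1.00 + 0.02 * 50
marginal_effect_50 = 0.02 * math.exp(-z50) / (1.0 + math.exp(-z50)) ** 2
print("marginal effect at Income = 50:", marginal_effect_50)
```

The three estimates of the male effect in Exercise 1 (about 2 percentage points from the LPM and a little under 3 from logit and probit) are close to each other, in line with the remark that the choice between probit and logit rarely matters much empirically.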
