
Section 11 - Econ 140


GSI: Caroline, Chris, Jimmy, Kaushiki, Leah

1 Binary Dependent Variables (Yi = 1 or Yi = 0)

Economists are often interested in the factors behind the decision-making of individuals or enterprises:

• Why do some people go to college while others do not?

• Why do some women enter the labor force while others do not?

• Why do some people buy houses while others rent?

• Why do some people migrate while others stay put?

The models that have been developed for this purpose are known as qualitative response or binary choice models, with the outcome, which we will denote Y, being assigned a value of 1 if the event occurs and 0 otherwise. Models with more than two possible outcomes have also been developed, but we will confine most of our attention to binary choice models.

1.1 General Model

• Pi ≡ P(Yi = 1|X1i, X2i, . . . , Xki) = F(β0 + β1 X1i + · · · + βk Xki), for some function F. In other words, the probability of Yi being equal to 1, given the explanatory variables, is a function of a linear combination of those variables.

• Predicted probabilities: P̂i = F(β̂0 + β̂1 X1i + · · · + β̂k Xki)

• Effect of a change in the regressor:

◦ Effect of a continuous variable Xji: ∂P̂i/∂Xji = β̂j f(β̂0 + β̂1 X1i + · · · + β̂k Xki), where f is the derivative of F; when F is a cumulative distribution function, f is the corresponding density function.

◦ Effect of a dummy variable Xji: F(β̂0 + · · · + β̂j · 1 + · · · + β̂k Xki) − F(β̂0 + · · · + β̂j · 0 + · · · + β̂k Xki).
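The two effect formulas can be sketched in a few lines of Python for the two choices of F used later in these notes (the standard logistic and the standard normal cdf). This is a minimal illustration under stated assumptions, not Stata's implementation: the function names are hypothetical, the coefficients and evaluation point are made up, and Φ is written with `math.erf` so only the standard library is needed.

```python
import math

def logistic_cdf(z: float) -> float:
    """Standard logistic cdf: Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z: float) -> float:
    """Logistic density, which equals Lambda(z) * (1 - Lambda(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)

def normal_cdf(z: float) -> float:
    """Standard normal cdf Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def linear_index(beta, x):
    """beta_0 + beta_1 x_1 + ... + beta_k x_k; x holds x_1..x_k (no constant)."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

def continuous_effect(beta, x, j, pdf):
    """dP/dX_j = beta_j * f(index), evaluated at x (j runs from 1 to k)."""
    return beta[j] * pdf(linear_index(beta, x))

def dummy_effect(beta, x, j, cdf):
    """F(index with X_j = 1) - F(index with X_j = 0), other regressors held at x."""
    x_on, x_off = list(x), list(x)
    x_on[j - 1], x_off[j - 1] = 1.0, 0.0
    return cdf(linear_index(beta, x_on)) - cdf(linear_index(beta, x_off))

# Hypothetical coefficients (beta_0, beta_1, beta_2) and evaluation point (x_1, x_2)
beta = (0.5, -1.0, 0.3)
x = [0.2, 1.0]
print(continuous_effect(beta, x, 1, logistic_pdf))  # logit: effect of continuous X_1
print(continuous_effect(beta, x, 1, normal_pdf))    # probit: effect of continuous X_1
print(dummy_effect(beta, x, 2, normal_cdf))         # probit: effect of dummy X_2
```

Only the density or cdf passed in changes between the logit and probit cases; the linear index is the same.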


Thanks to previous GSIs for sharing section notes.

1.2 Linear Probability Model (LPM)

The simplest binary choice model is the linear probability model, where, as the name implies, the probability of the event occurring, Pi, is assumed to be a linear function of a set of explanatory variables. That is, we assume that:

• P (Yi = 1|X1i , . . . , Xki ) = β0 + β1 X1i + · · · + βk Xki for all i.

• In other words,

Yi |X1i , . . . , Xki ∼ Bernoulli(β0 + β1 X1i + · · · + βk Xki ) for all i.

• As usual, the value of the dependent variable Yi in observation i has a nonstochastic component and a random component. The nonstochastic component depends on the Xs and the parameters; the random component is the disturbance term: Yi = E(Yi|X1i, . . . , Xki) + ui. Because Yi is binary, E(Yi|X1i, . . . , Xki) = P(Yi = 1|X1i, . . . , Xki), so our model can be written in the form of a usual OLS model:

Yi = β0 + β1 X1i + · · · + βk Xki + ui.

• See Exercise 3 on E(ui|X1i, . . . , Xki) and Var(ui|X1i, . . . , Xki).

• So, using heteroskedasticity-robust standard errors, we can apply the usual OLS methods to estimate the parameters and do statistical inference on them.
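For reference, a worked sketch of both conditional moments. It uses only the fact that, given the regressors, ui takes two values. Writing Pi = β0 + β1 X1i + · · · + βk Xki, we have ui = 1 − Pi with probability Pi (when Yi = 1) and ui = −Pi with probability 1 − Pi (when Yi = 0), so

$$
\begin{aligned}
E(u_i \mid X_{1i},\dots,X_{ki}) &= P_i(1 - P_i) + (1 - P_i)(-P_i) = 0,\\
Var(u_i \mid X_{1i},\dots,X_{ki}) &= E(u_i^2 \mid X_{1i},\dots,X_{ki}) = P_i(1 - P_i)^2 + (1 - P_i)P_i^2 = P_i(1 - P_i).
\end{aligned}
$$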

Remarks:

• ui takes only two values for each i:

Yi = 1 ⇒ ui = 1 − β0 − β1 X1i − · · · − βk Xki
Yi = 0 ⇒ ui = −β0 − β1 X1i − · · · − βk Xki .

• ui is heteroskedastic, as shown in Exercise 3: it can be shown that Var(ui|X1i, . . . , Xki) = Pi(1 − Pi) = (β0 + β1 X1i + · · · + βk Xki)[1 − (β0 + β1 X1i + · · · + βk Xki)]. So the variance of the disturbance term is a function of the observables, which is a particular case of heteroskedasticity.

• It may predict probabilities greater than 1 or less than 0, depending on the values of the parameters and the independent variables. The linear probability model assumes that a change in X from 1 to 2 produces the same change in probability as a change from X = 1000 to X = 1001. Because this effect never levels off, the predicted values can go above 1 or below 0. Although the LPM is very simple and has straightforward methods for statistical inference, we need other models to deal with binary dependent variables.
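A minimal numpy sketch of the LPM estimated by OLS with HC1 (heteroskedasticity-robust) standard errors, on simulated data. The data-generating process is an assumption chosen for illustration: the true probability is logistic, so the straight line is only an approximation, and a few fitted values land outside [0, 1]. In Stata the analogous step would be `regress` with the `robust` option; here the matrix algebra is written out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: the true P(Y = 1 | X) is logistic, so a straight line is only an approximation
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.2 + 1.5 * x)))
y = (rng.uniform(size=n) < p_true).astype(float)

X = np.column_stack([np.ones(n), x])     # intercept + regressor
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)           # OLS estimates of the LPM
u_hat = y - X @ beta_hat                 # residuals

# HC1 robust covariance: n/(n-k) * (X'X)^-1 [sum u_i^2 x_i x_i'] (X'X)^-1
k = X.shape[1]
meat = (X * (u_hat ** 2)[:, None]).T @ X
V_hc1 = (n / (n - k)) * XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(V_hc1))

# Fitted "probabilities" are not constrained to the unit interval
fitted = X @ beta_hat
share_outside = np.mean((fitted < 0.0) | (fitted > 1.0))
print("LPM coefficients:", beta_hat)
print("robust standard errors:", se_robust)
print("share of fitted probabilities outside [0, 1]:", share_outside)
```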

1.3 Logit and Probit Models

The linear probability model may make the nonsensical prediction that an event will occur with probability greater than 1 or less than 0. Instead, you may want to consider a nonlinear model whose predicted values cannot go above 1 or below 0. The usual way of avoiding this problem is to assume that the probability is F(Z), where Z is a function of the explanatory variables and F is a cumulative distribution function. That is,

Yi|X1i, . . . , Xki ∼ Bernoulli(F(β0 + β1 X1i + · · · + βk Xki))

Now, the probability of Yi being equal to 1 given the explanatory variables is F(β0 + β1 X1i + · · · + βk Xki). Since F is a cdf, it always lies in [0, 1]. And our model can be written as

Yi = F(β0 + β1 X1i + · · · + βk Xki) + ui, for all i.

• Probit Model: when F is the cdf of the standard normal distribution, the above model is called probit regression. That is, F(β0 + β1 X1i + · · · + βk Xki) = Φ(β0 + β1 X1i + · · · + βk Xki), where Φ is the cdf of the standard normal distribution.

• Logit Model: when F is the cdf of the standard logistic distribution, the above model is called logistic (logit) regression. That is,

$$F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}) = \Lambda(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})}}.$$

• As the above shows, these models are nonlinear in the parameters, so we cannot use OLS in this case. Instead, we rely on the nonlinear least squares estimator (NLSE) or the maximum likelihood estimator (MLE). These methods are technically and computationally more complicated than OLS, so we will not go over the details in this note.

• In Stata, you may use the 'probit' or 'logit' command to estimate the model and obtain standard errors.

• Effect of a change in Xi:

◦ For simplicity, we assume that k = 1.

◦ In general, the choice between probit and logit will not make a substantial difference empirically. The important thing to remember is that under both assumptions F is nonlinear, which means that interpreting your results is not as straightforward as in the linear case.

◦ If Xi is a continuous variable: effect = $\partial \hat P_i / \partial X_i = \hat\beta_1 f(\hat\beta_0 + \hat\beta_1 X_i)$, where f(·) is the density function (φ for probit, Λ(1 − Λ) for logit).

◦ If Xi is a dummy variable: effect = $\widehat{\Pr}(Y_i = 1 \mid X_i = 1) - \widehat{\Pr}(Y_i = 1 \mid X_i = 0)$.
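A bare-bones sketch of the maximum likelihood estimator for the logit model (Newton-Raphson on the log-likelihood, standard errors from the inverse information matrix), followed by the continuous-variable effect for k = 1. It stands in for what Stata's `logit` command does but is not Stata's code; the function name `fit_logit`, the simulated data, and the choice to evaluate the effect at the sample mean of X are assumptions for illustration.

```python
import numpy as np

def fit_logit(X, y, tol=1e-10, max_iter=100):
    """Logit MLE by Newton-Raphson; X must already contain a column of ones."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))          # Lambda(X beta)
        score = X.T @ (y - p)                           # gradient of the log-likelihood
        info = (X * (p * (1.0 - p))[:, None]).T @ X     # minus the Hessian
        step = np.linalg.solve(info, score)             # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information matrix
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = (X * (p * (1.0 - p))[:, None]).T @ X
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

# Simulated data (hypothetical): true beta_0 = -0.5, beta_1 = 1.0
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))).astype(float)
X = np.column_stack([np.ones(n), x])

beta_hat, se_hat = fit_logit(X, y)

# Effect of X on P(Y = 1), beta_1 * f(beta_0 + beta_1 * X), evaluated at the sample mean of X
z_bar = beta_hat[0] + beta_hat[1] * x.mean()
lam = 1.0 / (1.0 + np.exp(-z_bar))
effect_at_mean = beta_hat[1] * lam * (1.0 - lam)
print(beta_hat, se_hat, effect_at_mean)
```

For probit, Λ would be replaced by Φ and the Newton weights would change accordingly; the logic of the effect calculation is the same.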

2 Instrumental Variables Regression

2.1 Intuition and general model

In the last classes we have discussed several problems (or threats to internal validity) that make the error term correlated with the regressor (Cov(X, u) ≠ 0), such as omitted variables, errors in variables, and simultaneous causality. Instrumental variables (IV) regression is one way to address these issues. To understand how IV works, think of the variation in X as having two parts: one that is correlated with u (the bad variation), and a second part that is not correlated with u (the good variation). Roughly speaking, IV regression uses information from another variable Z (called an instrument) to isolate the good variation in X and disregard the bad variation (which biases the OLS estimates). As a result, the IV estimate of the coefficient on X is a consistent estimate of the effect of X on Y, obtained using only the good variation in X.

An important thing to keep in mind is that for this approach to work we need to find an instrument Z that is related to X and unrelated to the things we don't observe that might also change Y (so that Z affects Y only through X). The first condition is empirically testable, but unfortunately the second is not.
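To see why both conditions matter, here is a short derivation for the simplest case without W (an illustration, not a general proof). Take Yi = β0 + β1 Xi + ui and compute the covariance of each side with Z:

$$\mathrm{Cov}(Z, Y) = \beta_1\,\mathrm{Cov}(Z, X) + \mathrm{Cov}(Z, u) = \beta_1\,\mathrm{Cov}(Z, X) \quad\Rightarrow\quad \beta_1 = \frac{\mathrm{Cov}(Z, Y)}{\mathrm{Cov}(Z, X)}.$$

The second equality uses exogeneity, Cov(Z, u) = 0, and the division requires relevance, Cov(Z, X) ≠ 0. Replacing the population covariances with sample covariances gives the IV estimator.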

The general IV regression model is

Yi = β0 + β1 Xi + β2 Wi + ui,  i = 1, ..., n

where:

• Yi is the dependent variable

• ui is the error term (representing measurement error and/or omitted factors)

• Xi is the endogenous regressor (Cov(X, u) ≠ 0)

• Wi is the included exogenous regressor (Cov(W, u) = 0)

• Zi is the instrument, which satisfies:

◦ instrument relevance: Cov(Z, X) ≠ 0

◦ instrument exogeneity: Cov(Z, u) = 0

The IV regression assumptions are now

1. E (ui |Wi ) = 0.

2. (Xi , Wi , Zi , Yi ) are i.i.d. draws from their joint distribution.

3. Large outliers are unlikely: the X's, W's, Z's, and Y's have nonzero finite fourth moments.

4. The two conditions for a valid instrument hold: instrument relevance and instrument exogeneity.
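A simulated two-stage least squares (TSLS) sketch of the model above, with one endogenous X, one included exogenous W, and one instrument Z. The data-generating process (an unobserved factor `a` that enters both X and u) is a hypothetical assumption. Stage 1 regresses X on Z and W and keeps the fitted values, which is the "good variation" in X; stage 2 regresses Y on the fitted X and W. Only point estimates are computed: standard errors from a naive second-stage OLS are not valid, and Stata's `ivregress 2sls` reports corrected ones.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data-generating process
a = rng.normal(size=n)                   # unobserved factor: part of u and of X
w = rng.normal(size=n)                   # included exogenous regressor W
z = rng.normal(size=n)                   # instrument Z: shifts X, independent of a
x = 0.8 * z + 0.5 * w + a + rng.normal(size=n)   # endogenous regressor X
u = 2.0 * a + rng.normal(size=n)         # error term, so Cov(X, u) != 0
y = 1.0 + 1.5 * x - 0.7 * w + u          # true beta_1 = 1.5

def ols(design, target):
    """Least-squares coefficients of target on the columns of design."""
    return np.linalg.lstsq(design, target, rcond=None)[0]

ones = np.ones(n)

# Plain OLS of Y on (1, X, W): inconsistent because X is correlated with u
b_ols = ols(np.column_stack([ones, x, w]), y)

# Stage 1: regress X on (1, Z, W) and keep the fitted values
stage1 = np.column_stack([ones, z, w])
x_hat = stage1 @ ols(stage1, x)

# Stage 2: regress Y on (1, X_hat, W); the coefficient on X_hat is the TSLS estimate
b_tsls = ols(np.column_stack([ones, x_hat, w]), y)

print(f"OLS beta_1:  {b_ols[1]:.3f}")
print(f"TSLS beta_1: {b_tsls[1]:.3f}  (true value 1.5)")
```

In this setup the OLS slope is pulled upward because `a` raises both X and u, while the TSLS slope settles near the true value as n grows.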

3 Exercises

1. [Female teacher.] You want to estimate the probability of a student being assigned to a female teacher. To do so, you regress a variable that takes the value 1 if the teacher is female (fem) on an indicator for male student (male), the percentage of female teachers in the school (perfem), and past grades (pastgrade), where pastgrade is standardized to have a mean of zero and a standard deviation of 1, and perfem has a mean of 0.7338 and a standard deviation of 0.3611. The OLS results from your regression are the following:

$$\widehat{fem} = \underset{(0.001)}{0.195} - \underset{(0.000)}{0.002}\,pastgrade - \underset{(0.001)}{0.020}\,male + \underset{(0.001)}{0.717}\,perfem$$

(a) What is the effect of being a male student?

(b) You decide to also use a logit model to estimate your regression. Your results are the following:

$$\hat P(fem = 1) = \Lambda(\underset{(0.008)}{-1.528} - \underset{(0.003)}{0.013}\,pastgrade - \underset{(0.007)}{0.153}\,male + \underset{(0.009)}{3.717}\,perfem)$$

What is the effect of being a male student?

(c) The results for the same regression using a probit model are the following:

$$\hat P(fem = 1) = \Phi(\underset{(0.005)}{-0.904} - \underset{(0.002)}{0.008}\,pastgrade - \underset{(0.004)}{0.085}\,male + \underset{(0.005)}{2.200}\,perfem)$$

What is the effect of being a male student?

2. [Voting Republican.] You want to estimate how income affects the probability of voting for a Republican candidate. The results from your logit model are the following:

$$\hat P(Republican = 1) = \Lambda(\underset{(0.00)}{-1.00} + \underset{(0.00)}{0.02}\,Income)$$

where Income is measured in thousands of dollars. The mean income in your sample is 50 thousand

dollars.

(a) What is the probability of voting Republican if income is ten thousand dollars?

(b) What is the marginal effect of income?

3. [SW 12.9 Instrumental Variables.] A researcher is interested in the effect of military service on human capital. He collects data from a random sample of 4000 workers aged 40 and runs the OLS regression Yi = β0 + β1 Xi + ui, where Yi is worker i's annual earnings and Xi is a binary variable that is equal to 1 if the person served in the military and 0 otherwise.

a) Explain why the OLS estimates are likely to be unreliable. (Hint: Which variables are omitted from the regression? Are they correlated with military service?)

b) During the Vietnam War there was a draft, where priority for the draft was determined by a national lottery. (The days of the year were randomly reordered 1 through 365. Those with birthdates ordered first were drafted before those with birthdates ordered second, and so forth.) Explain how the lottery might be used as an instrument to estimate the effect of military service on earnings.

4 Solutions

1. [Female teacher.]

(a) A male student is 2 percentage points less likely to have a female teacher than a female student.

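(b)
$$
\begin{aligned}
\widehat{\Pr}(fem = 1 \mid male = 1) - \widehat{\Pr}(fem = 1 \mid male = 0)
&= \Lambda(\hat\beta_0 + \hat\beta_1\,pastgrade + \hat\beta_2 + \hat\beta_3\,perfem) - \Lambda(\hat\beta_0 + \hat\beta_1\,pastgrade + \hat\beta_3\,perfem)\\
&= \frac{1}{1 + e^{-(\hat\beta_0 + \hat\beta_1\,pastgrade + \hat\beta_2 + \hat\beta_3\,perfem)}} - \frac{1}{1 + e^{-(\hat\beta_0 + \hat\beta_1\,pastgrade + \hat\beta_3\,perfem)}} = -0.02825.
\end{aligned}
$$
A male student is 2.8 percentage points less likely to have a female teacher than a female student.

(c)
$$\widehat{\Pr}(fem = 1 \mid male = 1) - \widehat{\Pr}(fem = 1 \mid male = 0) = \Phi(\hat\beta_0 + \hat\beta_1\,pastgrade + \hat\beta_2 + \hat\beta_3\,perfem) - \Phi(\hat\beta_0 + \hat\beta_1\,pastgrade + \hat\beta_3\,perfem) = -0.02714.$$
A male student is 2.7 percentage points less likely to have a female teacher than a female student.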
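The two numbers above can be checked with a short Python sketch. The solution does not say where the other regressors are held, so the evaluation point is an assumption: pastgrade = 0 (its mean, since it is standardized) and perfem = 0.7338 (its reported mean). At that point the probit difference agrees with −0.02714 up to rounding, and the logit difference comes out at about −0.0283, close to the reported −0.02825. The function names are hypothetical.

```python
import math

PASTGRADE_MEAN = 0.0     # assumed evaluation point: pastgrade is standardized
PERFEM_MEAN = 0.7338     # assumed evaluation point: reported sample mean of perfem

def logit_cdf(z):
    """Standard logistic cdf Lambda(z)."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal cdf Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def male_effect(b0, b_past, b_male, b_perfem, cdf):
    """P(fem = 1 | male = 1) - P(fem = 1 | male = 0) at the assumed point."""
    base = b0 + b_past * PASTGRADE_MEAN + b_perfem * PERFEM_MEAN
    return cdf(base + b_male) - cdf(base)

logit_effect = male_effect(-1.528, -0.013, -0.153, 3.717, logit_cdf)
probit_effect = male_effect(-0.904, -0.008, -0.085, 2.200, probit_cdf)
print(f"logit: {logit_effect:.4f}, probit: {probit_effect:.4f}")
```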

2. [Voting Republican.]

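(a) $$\widehat{\Pr}(Republican = 1) = \frac{1}{1 + e^{-(-1.00 + 0.02 \cdot 10)}} = 0.31.$$

(b) $$\frac{\partial \widehat{\Pr}(Rep = 1)}{\partial Income} = \frac{\partial \Lambda(\hat\beta_0 + \hat\beta_1 Income)}{\partial Income} = \frac{\hat\beta_1\, e^{-(\hat\beta_0 + \hat\beta_1 Income)}}{[1 + e^{-(\hat\beta_0 + \hat\beta_1 Income)}]^2}.$$

If we evaluate the above expression at Income = 50 (the sample mean), the marginal effect is equal to 0.005: increasing Income by one thousand dollars increases the probability of voting Republican by approximately 0.5 percentage points.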
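Both answers can be checked with a few lines of Python. The function names are hypothetical; the coefficients are the ones reported in the exercise, and the marginal effect uses the identity Λ(1 − Λ) = e^{−z}/(1 + e^{−z})², which is the same formula as in (b).

```python
import math

B0, B1 = -1.00, 0.02   # logit coefficients reported in the exercise

def prob_republican(income):
    """Predicted P(Republican = 1) = Lambda(B0 + B1 * income), income in $1000s."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * income)))

def marginal_effect(income):
    """dP/dIncome = B1 * Lambda * (1 - Lambda), algebraically equal to the formula in (b)."""
    p = prob_republican(income)
    return B1 * p * (1.0 - p)

print(round(prob_republican(10), 2))   # 0.31
print(round(marginal_effect(50), 4))   # 0.005
```

At Income = 50 the linear index is exactly zero, so Λ = 0.5 and the effect is 0.02 × 0.25.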

3. [Instrumental Variables.]

a) There are other factors that could affect both the choice to serve in the military and annual earnings. One example could be education, although this could be included in the regression as a control variable. Another is ability, which is difficult to measure, and thus difficult to control for in the regression.

b) Draft priority was determined by a national lottery, so it was assigned at random. Because the lottery number was randomly assigned, it is uncorrelated with individual characteristics that may affect earnings, and hence the instrument is exogenous. Because it affected the probability of serving in the military, the lottery number is relevant.
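A toy simulation of part (b), with every number hypothetical: unobserved ability raises earnings and also shifts the probability of serving, while a randomly assigned draft-eligibility indicator (standing in for a low lottery number) raises the probability of serving and has no other link to earnings. With a binary instrument, the IV estimate reduces to the Wald ratio: the difference in mean earnings between eligible and ineligible people, divided by the difference in their service rates. The real lottery assigned numbers from 1 to 365, so collapsing it to a 0/1 eligibility cutoff is a simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

ability = rng.normal(size=n)                  # unobserved; part of u in the earnings equation
eligible = rng.uniform(size=n) < 0.5          # Z: randomly assigned draft eligibility
# X: service depends on eligibility (relevance) and on ability (self-selection)
serve_prob = 0.15 + 0.45 * eligible - 0.05 * (ability > 0)
served = rng.uniform(size=n) < serve_prob
# Earnings: true effect of service is -2 (hypothetical units, thousands of dollars)
earnings = 40.0 - 2.0 * served + 5.0 * ability + rng.normal(size=n)

# OLS slope on a binary regressor = difference in group means; biased by ability
ols_estimate = earnings[served].mean() - earnings[~served].mean()

# IV (Wald) estimate: reduced-form difference over first-stage difference
reduced_form = earnings[eligible].mean() - earnings[~eligible].mean()
first_stage = served[eligible].mean() - served[~eligible].mean()
iv_estimate = reduced_form / first_stage

print(f"OLS: {ols_estimate:.2f}  IV: {iv_estimate:.2f}  (true effect: -2)")
```

Here higher-ability people serve slightly less often, so the simple comparison of means overstates the earnings penalty, while the lottery-based ratio recovers the assumed effect.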
