Organization
Materials
Software
Causality
Randomized trials
Linear regression
Instrumental variables
Regression discontinuity
Differences in differences
Panel data
Binary dependent variables
Causality
Economic theory informs us about causal relations, which are relations
ceteris paribus, i.e., all other things equal.
For example: higher prices cause lower demand.
Violation of ceteris paribus:
Football team    USC Trojans       Santa Monica High School Vikings
Ticket price     $45–$200          Free?
Demand           25,000–80,000     < 1,000?
Income tax
Progressive (1%–12%)
Randomized trials
Most powerful way to establish causality.
This is the standard for approval of new drugs.
1. Start with a pool of participants.
2. Randomly assign participants to treatment or control group.
When the number of participants is large enough, this should ensure
that all other things are equal.
3. Administer treatment (e.g., policy) to the treatment group and not to
the control group.
4. Compare outcomes of variables of interest in the treatment group with
the control group. Any differences (beyond those that can be expected
due to chance) must be due to the treatment.
5. Statistics tells us what differences can be attributed to chance and when
we can reject chance as the sole determining factor.
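The five steps above can be sketched as a small simulation. This is a minimal illustration, not an analysis of real data: the pool size, the treatment effect, and the normally distributed baseline outcome are all hypothetical assumptions, and the helper name `difference_in_means` is made up for this sketch.

```python
import math
import random
import statistics

def difference_in_means(treated, control):
    """Return (difference, standard error) for two independent samples."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return diff, se

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 2000                 # hypothetical pool of participants (step 1)
true_effect = 0.5        # hypothetical treatment effect

# Step 2: random assignment, half to treatment and half to control.
ids = list(range(n))
rng.shuffle(ids)
treatment_ids = set(ids[: n // 2])

# Steps 3-4: outcome = unobserved baseline, plus the effect if treated.
treated, control = [], []
for i in range(n):
    baseline = rng.gauss(0.0, 1.0)
    if i in treatment_ids:
        treated.append(baseline + true_effect)
    else:
        control.append(baseline)

# Step 5: compare the difference with its standard error.
diff, se = difference_in_means(treated, control)
z = diff / se
print(f"difference = {diff:.3f}, s.e. = {se:.3f}, z = {z:.1f}")
```

A ratio `z` well above about 2 in absolute value is the usual signal that chance alone is an unlikely explanation for the difference.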
Treatment   Control   Difference (s.e.)
0.79        0.58      0.21 (0.03)
0.35        0.24      0.12 (0.11)
Linear regression
Often, the causal variable of interest is a continuous variable that can take
many values, and we are interested in an outcome as a function of the
determining variable.
For example, demand function:
$\text{demand} = a + b \cdot \text{price} + \varepsilon$
$a$ = intercept;
$b$ = regression coefficient (here < 0);
$\varepsilon$ = error term (random variation)
We could vary the prices experimentally and estimate these parameters.
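As a rough sketch of that idea, the snippet below sets prices at random, generates demand from a linear demand function, and recovers the intercept and slope by least squares. The parameter values (intercept 100, slope -2, noise standard deviation 5) and the helper `ols` are hypothetical choices for illustration only.

```python
import random

def ols(x, y):
    """Least-squares fit of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(1)            # fixed seed for reproducibility
a_true, b_true = 100.0, -2.0      # hypothetical demand parameters

# Experimentally varied prices, so price is unrelated to the error term.
prices = [rng.uniform(5.0, 25.0) for _ in range(500)]
demand = [a_true + b_true * p + rng.gauss(0.0, 5.0) for p in prices]

a_hat, b_hat = ols(prices, demand)
print(f"estimated intercept = {a_hat:.2f}, estimated slope = {b_hat:.3f}")
```

Because the prices are assigned at random, the estimated slope should land close to the value used to generate the data.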
Figure: Log earnings against education in years, showing averages and linear regression lines for CA and TX.
Example (continued)
Regressor         California    Texas
Education         .144          .132
  (s.e.)          (.010)        (.011)
Constant          4.480         4.606
  (s.e.)          (.150)        (.153)
Observations      1004          684
Interpretation:
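Roughly 14.4% higher earnings per additional year of schooling in CA and 13.2% in TX
(coefficients in a log-earnings regression are approximate percentage effects).
The standard errors suggest that this difference could be due to chance.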
All states: lowest is .057 (ND), highest is .184 (NE).
Statistically, we reject that all coefficients are the same.
Source: Analysis of the 2013 CPS. Program and data are on Blackboard.
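The claim that the CA/TX difference could be due to chance can be checked with a short calculation on the reported coefficients. This sketch assumes the two state samples are independent, so the variances of the two estimates add; that assumption is not stated on the slide. It also converts the log-point coefficients to exact percentage effects.

```python
import math

# Education coefficients and standard errors as reported for CA and TX.
b_ca, se_ca = 0.144, 0.010
b_tx, se_tx = 0.132, 0.011

diff = b_ca - b_tx
# Assumption: independent samples, so the variances add.
se_diff = math.sqrt(se_ca ** 2 + se_tx ** 2)
z = diff / se_diff
print(f"difference = {diff:.3f}, s.e. = {se_diff:.4f}, z = {z:.2f}")

# Exact percentage effect implied by a log-earnings coefficient b: exp(b) - 1.
pct_ca = 100 * (math.exp(b_ca) - 1)
pct_tx = 100 * (math.exp(b_tx) - 1)
print(f"exact effect: CA {pct_ca:.1f}%, TX {pct_tx:.1f}%")
```

The resulting `z` is below 1 in absolute value, far from the conventional threshold of about 1.96, which is consistent with the interpretation given above.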
Instrumental variables
Standard linear regression (OLS) does not estimate the causal effect if
there are still omitted variables that affect the outcome and are correlated
with the included variables. This often happens (at least, is often suspected).
Most common econometric solution: instrumental variables (IVs)
IVs are correlated with the included variables but uncorrelated with the error
term.
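Simple case: one regressor ($x$), one IV ($z$), outcome is $y$.
Equation of interest:
$y = \beta_1 + \beta_2 x + \varepsilon$    (1)
Reduced form:
$y = \gamma_1 + \gamma_2 z + u$    (2)
First stage:
$x = \delta_1 + \delta_2 z + v$    (3)
Then $\beta_2 = \gamma_2 / \delta_2$.
OLS of (1) does not estimate $\beta_2$ when $x$ and $\varepsilon$ are correlated.
OLS of (2) and (3) estimates $\gamma_2$ and $\delta_2$.
IV estimate: $\hat\beta_2 = \hat\gamma_2 / \hat\delta_2$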
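A small simulation can illustrate why the ratio of the reduced-form slope to the first-stage slope recovers the causal coefficient when OLS does not. Everything here is hypothetical: the data-generating process, the true slope of -1, and the helper `slope` are assumptions made for this sketch. An omitted variable affects both the regressor and the outcome, while the instrument is independent of it.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x (regression with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(2)        # fixed seed for reproducibility
n = 20000
beta1, beta2 = 1.0, -1.0      # hypothetical structural parameters

z, x, y = [], [], []
for _ in range(n):
    zi = rng.gauss(0.0, 1.0)            # instrument
    omitted = rng.gauss(0.0, 1.0)       # omitted variable, affects x and y
    xi = 0.5 * zi + omitted + rng.gauss(0.0, 1.0)
    yi = beta1 + beta2 * xi + 2.0 * omitted + rng.gauss(0.0, 1.0)
    z.append(zi)
    x.append(xi)
    y.append(yi)

ols_slope = slope(x, y)           # biased: x is correlated with the error
reduced_form = slope(z, y)        # slope of y on z
first_stage = slope(z, x)         # slope of x on z
iv_estimate = reduced_form / first_stage

print(f"OLS slope   = {ols_slope:.3f}")
print(f"IV estimate = {iv_estimate:.3f} (true value {beta2})")
```

In this setup the OLS slope is pulled toward zero by the omitted variable, while the ratio estimate stays close to the true value.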
Outcome                      Log quantity (y)    Log price (x)
Coefficient on IV z (s.e.)   -0.36 (0.15)        0.34 (0.07)

Price elasticity of demand ($\beta_2$): -1.08 (0.48)
(-0.36/0.34 ≈ -1.08, up to rounding of the inputs)
Source: J. D. Angrist, K. Graddy, & G. W. Imbens (2000), The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish, Review of Economic Studies, 67, 499–527.
Regression discontinuity
Does Medicare save lives?
Does enrollment in an honors class improve later-life outcomes?
Does being the incumbent increase the probability of being elected?
Common setting: a binary determinant depends discontinuously on a continuous variable
(age 65; GPA cutoff score; 50% share of the vote in the previous election)
that is itself related to the outcome.
Regression discontinuity (RD) estimates the size of the jump at the cutoff
score, which is the causal effect of the binary variable of interest, at least for
individuals close to the cutoff score.
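One simple way to estimate the jump is to fit a separate straight line on each side of the cutoff, using only observations near it, and compare the two fitted values at the cutoff. The sketch below does this on simulated data. The cutoff of 0.5, the true jump of 0.1, the bandwidth of 0.2, and the helper `ols` are hypothetical choices; this is one of several possible RD estimators, not necessarily the one used for the figures in these slides.

```python
import random

def ols(x, y):
    """Least-squares fit of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

rng = random.Random(3)     # fixed seed for reproducibility
cutoff = 0.5               # hypothetical cutoff (e.g., 50% vote share)
jump_true = 0.1            # hypothetical causal effect at the cutoff
bandwidth = 0.2            # only use observations this close to the cutoff

below = ([], [])           # (centered running variable, outcome)
above = ([], [])
for _ in range(20000):
    r = rng.random()                       # running variable in [0, 1)
    d = 1 if r >= cutoff else 0            # binary variable of interest
    outcome = 0.3 + 0.4 * r + jump_true * d + rng.gauss(0.0, 0.1)
    if abs(r - cutoff) <= bandwidth:
        side = above if d else below
        side[0].append(r - cutoff)         # center so the intercept is at the cutoff
        side[1].append(outcome)

intercept_below, _ = ols(*below)
intercept_above, _ = ols(*above)
rd_estimate = intercept_above - intercept_below
print(f"estimated jump at cutoff = {rd_estimate:.3f} (true value {jump_true})")
```

Centering the running variable at the cutoff makes each intercept the fitted outcome exactly at the cutoff, so their difference is the estimated jump.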
Differences in differences
Comparing outcomes before and after a policy change may not estimate the
causal effect of the change because of violation of the ceteris paribus
condition: many things have changed at the same time.
For example, the lower number of uninsured in 2015 than before Obamacare was
enacted may be (partially) due to the decrease in unemployment and to economic
growth in general.
Diff-in-diff can be used if there is a population that is affected by the
policy change and another population that is not. It compares the change in
the treated population with the change in the control population.
Group            Before change   After change   Difference
Low earners      1.13            1.13           0.01
  (s.e.)         (0.03)          (0.03)         (0.04)
High earners     1.38            1.58           0.20
  (s.e.)         (0.04)          (0.04)         (0.05)
Diff-in-diff                                    0.19
  (s.e.)                                        (0.07)
The non-change in the low-earners group makes the increase for the
high-earners more plausible as a causal effect.
Source: B. D. Meyer, W. K. Viscusi, & D. L. Durbin (1995), Workers' Compensation and Injury Duration: Evidence from a Natural Experiment, American Economic Review, 85, 322–340.
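The diff-in-diff figure can be reproduced from the two group-level differences reported above. The sketch assumes the two groups are independent samples, so the variances of the two differences add; the slide does not state how its standard error was computed, so a small discrepancy from rounding is expected.

```python
import math

# Before/after differences and their standard errors, as reported above.
low_diff, low_se = 0.01, 0.04      # low earners (control group)
high_diff, high_se = 0.20, 0.05    # high earners (treated group)

did = high_diff - low_diff
# Assumption: independent groups, so the variances add.
did_se = math.sqrt(low_se ** 2 + high_se ** 2)
z = did / did_se
print(f"diff-in-diff = {did:.2f}, s.e. = {did_se:.3f}, z = {z:.2f}")
```

The point estimate matches the reported 0.19, and the standard error computed from the rounded inputs is close to the reported 0.07.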
Panel data
Observe the same individuals (firms, states, countries, etc.) at multiple
points in time.
$y_{it} = a + b x_{it} + \varepsilon_{it}$
$\varepsilon_{it} = \alpha_i + v_{it}$    (4)
i.e., the error term consists of a component that is constant over time (e.g.,
stable preferences, abilities, and other characteristics) and a component that
varies over time.
It is often reasonable to assume that the main problem with estimating $b$ is
the component $\alpha_i$, which may be correlated with $x_{it}$.
Eliminate $\alpha_i$ by taking first differences:
$\Delta y_{it} = y_{it} - y_{i,t-1} = b \Delta x_{it} + \Delta v_{it}$.
This equation can be estimated in the ordinary way.
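The effect of first differencing can be seen in a small simulated panel. In this sketch each individual has a time-constant component that is correlated with the regressor, which biases a pooled regression but drops out of the differenced equation. The two-period design, the true coefficient of 1, and the helper `slope` are hypothetical assumptions for illustration.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x (regression with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(4)    # fixed seed for reproducibility
b_true = 1.0              # hypothetical coefficient of interest
n_individuals = 5000      # two time periods per individual

pooled_x, pooled_y = [], []
diff_x, diff_y = [], []
for _ in range(n_individuals):
    fixed = rng.gauss(0.0, 1.0)     # time-constant individual component
    # The regressor is correlated with the individual component.
    xs = [fixed + rng.gauss(0.0, 1.0) for _ in range(2)]
    ys = [b_true * xv + fixed + rng.gauss(0.0, 1.0) for xv in xs]
    pooled_x.extend(xs)
    pooled_y.extend(ys)
    diff_x.append(xs[1] - xs[0])    # first differences remove `fixed`
    diff_y.append(ys[1] - ys[0])

pooled_b = slope(pooled_x, pooled_y)   # biased by the individual component
fd_b = slope(diff_x, diff_y)           # first-difference estimate

print(f"pooled OLS       = {pooled_b:.3f}")
print(f"first difference = {fd_b:.3f} (true value {b_true})")
```

In this design the pooled estimate overstates the coefficient, while the first-difference estimate is close to the value used to generate the data.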
Most effects are smaller after first differencing, e.g., elasticity of Prob(arrest)
is much smaller.
Figure: p(x) = Pr(y = 1 | x) plotted against x.
Iteration 0:   log likelihood = -4531.5338
Iteration 1:   log likelihood = -4394.3345
Iteration 2:   log likelihood = -4386.4225
Iteration 3:   log likelihood = -4386.4103
Iteration 4:   log likelihood = -4386.4103

Logistic regression                               Number of obs   =      13856
                                                  LR chi2(1)      =     290.25
                                                  Prob > chi2     =     0.0000
Log likelihood = -4386.4103                       Pseudo R2       =     0.0320

------------------------------------------------------------------------------
    fairpoor |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      loginc |  -.4656788   .0269452   -17.28   0.000    -.5184904   -.4128671
       _cons |   2.671893   .2778113     9.62   0.000     2.127393    3.216393
------------------------------------------------------------------------------
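To make the output above more concrete, the following sketch plugs the reported coefficients into the standard logistic function to obtain predicted probabilities of fair/poor health at a few values of log income, and recomputes the LR statistic and pseudo R2 from the log likelihoods. The chosen log-income values and the function name `prob_fairpoor` are illustrative assumptions; the coefficients and log likelihoods are copied from the output.

```python
import math

# Coefficients copied from the logistic regression output above.
B_CONS = 2.671893
B_LOGINC = -0.4656788

def prob_fairpoor(loginc):
    """Predicted Pr(fairpoor = 1 | loginc) under the logit model."""
    index = B_CONS + B_LOGINC * loginc
    return 1.0 / (1.0 + math.exp(-index))

# Predicted probabilities at some illustrative values of log income.
for loginc in (9, 10, 11, 12):
    print(f"loginc = {loginc}: Pr(fair/poor) = {prob_fairpoor(loginc):.3f}")

# Odds ratio for a one-unit increase in log income.
odds_ratio = math.exp(B_LOGINC)

# Marginal effect of log income on the probability at loginc = 10: b * p * (1 - p).
p10 = prob_fairpoor(10)
marginal_effect = B_LOGINC * p10 * (1.0 - p10)

# Likelihood-ratio statistic and pseudo R2 from the iteration log.
ll_null = -4531.5338     # iteration 0 (constant-only model)
ll_model = -4386.4103    # final log likelihood
lr_chi2 = 2.0 * (ll_model - ll_null)
pseudo_r2 = 1.0 - ll_model / ll_null

print(f"odds ratio = {odds_ratio:.3f}, marginal effect at 10 = {marginal_effect:.3f}")
print(f"LR chi2 = {lr_chi2:.2f}, pseudo R2 = {pseudo_r2:.4f}")
```

The recomputed LR statistic and pseudo R2 agree with the values printed in the output header, which is a useful consistency check on the transcription.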
Example (continued)
Figure: Fair/poor health (vertical axis) against log family income (horizontal axis), with logit and linear fits.