Lecture 01

Econ 513: Practice of Econometrics
Lecture 1: Introduction and overview

(cf. A&P, Introduction)
USC, Fall 2015
Aims of this course

Practice of Econometrics
We will:
- cover methods to analyze economic data and to understand, quantify, and interpret economic relationships;
- use both analytical and computer-based problems to gain practical experience in the application of these methods.
The goal is to give you a thorough basis for applying econometric methods to empirical economic questions, and to enable you to consult reference books for more advanced problems.

Organization
- Lectures: Tue/Thu 4–5:50pm, GFS 116
- Assignments: 5 times
- Midterm: October 8, in class
- Final exam: December 10, 4:30–6:30
- Office hours: Tue 2–3pm, VPD 314L, or by appointment (send email)
- TA: Ahram Moon; office hours: Fri 11am–1pm, KAP 363
- See the syllabus for more details

Materials
- Textbook: Angrist & Pischke (2015), Mastering 'Metrics [A&P]. We will loosely follow the topics of this book but expand on it with more advanced material and some additional topics.
- Recommended reference book: Cameron & Trivedi (2005), Microeconometrics. Discusses the advanced topics that are not in A&P, and much more.
- Slides and supplementary material will be made available through Blackboard.
- Data: provided through the Blackboard page of the course or through download links.

Software
- Stata (statistics/econometrics software). This is available in the USC computer labs.
- Stata is currently by far the most widely used econometric software (esp. in applied micro), so getting experience with it is an important part of this course.
- I will discuss Stata examples in my lectures and will post the Stata programs and data files on the Blackboard page.
- The TA (Ahram Moon) will organize an introductory session in the computer lab.
- A good online introduction is at
  http://data.princeton.edu/stata/
  and many more resources can be found at
  http://www.stata.com/links/resources-for-learning-stata/
- Cameron & Trivedi (2010), Microeconometrics Using Stata (rev. ed.) is a very useful book for learning how to use Stata to perform econometric analyses.

What we will be using

- Basic economic theory.
  Source: undergraduate economics course or similar.
- Simple calculus.
  Source: undergraduate calculus course or similar.
- Probability and statistics.
  Source: undergraduate statistics course; appendix in undergraduate econometrics book, etc. You should already be familiar with basic statistics, but we will briefly review this. We will discuss more advanced statistics extensively.
- Matrix algebra.
  Source: undergraduate math course; appendix in undergraduate econometrics book. We will review this material as well.

Topics discussed in this course

- Causality
- Randomized trials
- Linear regression
- Instrumental variables
- Regression discontinuity
- Differences in differences
- Panel data
- Binary dependent variables
We will focus on micro-econometric methods and applications. These also form the basis for methods for macro, but macro problems often use time series data. Time series analysis is more advanced and outside the scope of this course.

Causality
Economic theory informs us about causal relations, which are relations
ceteris paribus, i.e., all other things equal.
For example: higher prices cause lower demand.
Violation of ceteris paribus:
Football team                       Ticket price   Demand
USC Trojans                         $45–$200       25,000–80,000
Santa Monica High School Vikings    Free?          < 1,000?
The quality (utility) of USC games differs from that of a local HS game, so not everything else is equal.
Economic theory applies to the same USC game, against the same opponent, in the same location, at the same time: for this given game, demand would be lower if ticket prices were higher. And this still assumes that the quality of the game does not depend on the ticket price.
Source: http://www.gettrojantix.com/, https://en.wikipedia.org/wiki/2014_USC_Trojans_football_team, http://samohifootball.com/, and speculation

Causality and policy evaluation

Economists are often interested in evaluating policies (government, company, etc.) and in predicting the costs and benefits of potential future policies.
What would have happened if a different policy had been followed? The answer defines the causal effect of the policy.
State   Income tax             Real GDP growth (2014)
CA      Progressive (1%–12%)   2.8%
TX      None                   5.2%
Can California increase economic growth by eliminating income taxes?
Many other things differ between TX and CA: population growth, population composition, natural resources, other (tax) policies, labor markets, etc.
We cannot conclude that the income tax causes the difference in GDP growth rates.
Sources: http://www.bea.gov/newsreleases/regional/gdp_state/2015/xls/gsp0615.xlsx, http://comptroller.texas.gov/taxes/,
https://www.ftb.ca.gov/forms/2014_California_Tax_Rates_and_Exemptions.shtml,
http://www.bankrate.com/finance/taxes/state-with-no-income-tax-better-or-worse-1.aspx

Causality and selectivity

We are often interested in causal effects in which economic agents have
some control over the cause we are interested in.
Economic agents make their choices based on their expectations and
preferences, which may include factors that partially determine the outcome
of interest.
Highest education   Avg. earnings (2013)
Bachelor's          $45,431
Master's            $58,402
Is the $13,000 difference the causal effect of getting a master's?
People with a master's may be smarter, more disciplined, more motivated, ..., and might therefore have had higher earnings anyway.
This kind of selectivity is one of the key issues in applied economic research, as opposed to, for example, physics.
Source: http://www.census.gov/cps/data/cpstablecreator.html

Randomized trials
Most powerful way to establish causality.
This is the standard for approval of new drugs.
1. Start with a pool of participants.
2. Randomly assign participants to treatment or control group.
When the number of participants is large enough, this should ensure
that all other things are equal.
3. Administer treatment (e.g., policy) to the treatment group and not to
the control group.
4. Compare outcomes of variables of interest in the treatment group with
the control group. Any differences (beyond those that can be expected
due to chance) must be due to the treatment.
5. Statistics tells us what differences can be attributed to chance and when
we can reject chance as the sole determining factor.
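
The steps above can be mimicked in a small simulation. This is only a sketch and not part of the course materials: the sample size, the true effect of 0.5, and the unobserved "ability" trait are made-up assumptions, and only Python's standard library is used.

```python
import random
import statistics

random.seed(513)  # arbitrary seed, only for reproducibility


def simulate_trial(n, true_effect):
    """Randomly assign n participants; return (difference in means, s.e.)."""
    treated, control = [], []
    for _ in range(n):
        ability = random.gauss(0.0, 1.0)  # unobserved trait that affects the outcome
        noise = random.gauss(0.0, 1.0)
        if random.random() < 0.5:         # coin flip: assignment ignores ability
            treated.append(ability + true_effect + noise)
        else:
            control.append(ability + noise)
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return diff, se


diff, se = simulate_trial(n=10_000, true_effect=0.5)
print(f"difference = {diff:.3f}, s.e. = {se:.3f}, ratio = {diff / se:.1f}")
```

Because assignment is a coin flip, ability is balanced across the two groups on average, so the difference in means should land close to the assumed true effect, and the ratio of the difference to its standard error indicates whether chance alone is a plausible explanation.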
Example: Financial incentives increase productivity

Teacher absence is a big problem in India.
RCT: randomly assign schools to treatment and control groups.
Treatment: teachers' attendance was monitored daily using cameras, and their salaries were made a nonlinear function of attendance.
Outcome                    Treatment   Control   Difference (s.e.)
Teacher attendance         0.79        0.58      0.21 (0.03)
Avg. student test score    0.35        0.24      0.12 (0.11)
Strong evidence for a causal effect on teachers' attendance, but the difference in student test scores could be due to chance.
Source: E. Duflo, R. Hanna, & S. P. Ryan (2012), Incentives Work: Getting Teachers to Come to School, American Economic Review, 102, 1241–1278, Tables 2 & 8.

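
One way to read the table is to divide each difference by its standard error. The sketch below does this for the two reported differences and adds a two-sided p-value from a normal approximation; the normal approximation and the helper name `z_and_p` are assumptions made here for illustration, not something taken from the paper.

```python
import math


def z_and_p(difference, std_error):
    """Return the ratio difference/s.e. and a two-sided normal p-value."""
    z = difference / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|N(0,1)| > |z|)
    return z, p


z_attendance, p_attendance = z_and_p(0.21, 0.03)  # teacher attendance
z_scores, p_scores = z_and_p(0.12, 0.11)          # avg. student test score

print(f"attendance:  z = {z_attendance:.2f}, p = {p_attendance:.4f}")
print(f"test scores: z = {z_scores:.2f}, p = {p_scores:.4f}")
```

A ratio far above 2 (attendance) is hard to attribute to chance, while a ratio near 1 (test scores) is quite compatible with chance.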
Linear regression
Often, the causal variable of interest is a continuous variable that can take
many values, and we are interested in an outcome as a function of the
determining variable.
For example, demand function:
$$\text{demand} = a + b \cdot \text{price} + \varepsilon$$
$a$ = intercept;
$b$ = regression coefficient (here $b < 0$);
$\varepsilon$ = error term (random variation)
We could vary the prices experimentally and estimate these parameters.

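
As an illustration of estimating the intercept and slope of such a demand function, the sketch below generates artificial data with experimentally set prices and recovers the two parameters by least squares. All numbers (intercept 100, slope -2, price range, noise level) are hypothetical, and the closed-form formula for a single regressor is used in place of a statistics package.

```python
import random
import statistics

random.seed(1)  # arbitrary seed for reproducibility

TRUE_A, TRUE_B = 100.0, -2.0  # hypothetical intercept and slope (b < 0)

# Prices are set "experimentally", i.e., independently of the error term.
prices = [random.uniform(5.0, 25.0) for _ in range(2_000)]
demand = [TRUE_A + TRUE_B * p + random.gauss(0.0, 5.0) for p in prices]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


a_hat, b_hat = ols_simple(prices, demand)
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.3f}")
```

With prices assigned independently of the error term, the estimates should be close to the assumed values of a and b.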
Controlling for confounders

We can easily add more terms to the right-hand side of such equations, and this may help address the selectivity problem.
For example,
$$\text{earnings} = a + b \cdot \text{masters} + c \cdot \text{IQ} + \varepsilon$$
masters = 1 for those with a master's degree and 0 for those with only a bachelor's.
We can add other terms on the right-hand side if we think they affect earnings and differ between individuals with a master's and individuals without.
The challenge is to make sure that the remaining (unexplained) random variation is not related to the variables that are included in the equation, i.e., no other determinants of the outcome are related to the variables in the equation.

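
To see what goes wrong when a confounder such as IQ is left out, the sketch below simulates the earnings equation and regresses earnings on masters alone. It then compares that "short" slope with the standard omitted-variable expression b + c * Cov(masters, IQ) / Var(masters). Every number here (coefficients, IQ distribution, the rule for who gets a master's) is invented for illustration.

```python
import random
import statistics

random.seed(14)  # arbitrary seed for reproducibility

TRUE_B, TRUE_C = 8_000.0, 300.0  # hypothetical effects of masters and IQ


def cov(u, v):
    """Sample covariance of two equally long lists."""
    u_bar, v_bar = statistics.mean(u), statistics.mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (len(u) - 1)


iq, masters, earnings = [], [], []
for _ in range(20_000):
    iq_i = random.gauss(100.0, 15.0)
    # Selection: people with higher IQ are more likely to get a master's.
    m_i = 1.0 if iq_i + random.gauss(0.0, 15.0) > 110.0 else 0.0
    iq.append(iq_i)
    masters.append(m_i)
    earnings.append(20_000.0 + TRUE_B * m_i + TRUE_C * iq_i
                    + random.gauss(0.0, 10_000.0))

# Slope from regressing earnings on masters only (IQ omitted).
b_short = cov(masters, earnings) / cov(masters, masters)
# Omitted-variable expression evaluated in the same sample.
b_formula = TRUE_B + TRUE_C * cov(masters, iq) / cov(masters, masters)

print(f"true b = {TRUE_B:.0f}, short regression = {b_short:.0f}, "
      f"formula = {b_formula:.0f}")
```

In this setup the short regression overstates the effect of a master's, because the masters dummy also picks up the earnings effect of higher IQ.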
Descriptive use of regression

Linear regression is also useful for purely descriptive purposes:
Is there a relation between two variables (after controlling for others)?
Even if there are possible confounders, this is useful information:
- Has the relation become stronger or weaker over time?
- Is it stronger in one state (or country) than another?
This may be interesting by itself and lead to hypothesis generation and follow-up work that tries to find the causal effect.

Example: Earnings vs. education in CA and TX
[Figure: log earnings (vertical axis, 5.5 to 6.5) against education in years (horizontal axis, 8 to 18), showing averages and fitted linear regression lines for CA and TX.]

Example (continued)
Regressor      California   Texas
Education      .144         .132
(s.e.)         (.010)       (.011)
Constant       4.480        4.606
(s.e.)         (.150)       (.153)
Observations   1004         684
Interpretation:
Roughly 14.4% higher earnings per additional year of schooling in CA, 13.2% in TX.
The standard errors suggest that this difference could be due to chance.
All states: lowest is .057 (ND), highest is .184 (NE).
Statistically, we reject that all coefficients are the same.
Source: Analysis of the 2013 CPS. Program and data are on Blackboard.

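
The comparison between the two states can be checked with a few lines of arithmetic. The sketch below uses the reported coefficients and standard errors; treating the two state samples as independent is an assumption made here, and the exact percentage effect exp(b) - 1 is shown next to the usual "100 * b percent" reading of a log-earnings coefficient.

```python
import math

b_ca, se_ca = 0.144, 0.010  # education coefficient and s.e., California
b_tx, se_tx = 0.132, 0.011  # education coefficient and s.e., Texas

# Difference between the two slopes and its standard error,
# assuming the two state samples are independent.
diff = b_ca - b_tx
se_diff = math.sqrt(se_ca ** 2 + se_tx ** 2)
z = diff / se_diff

# Exact percentage change in earnings implied by a log-earnings coefficient.
pct_ca = math.exp(b_ca) - 1.0
pct_tx = math.exp(b_tx) - 1.0

print(f"difference = {diff:.3f}, s.e. = {se_diff:.4f}, z = {z:.2f}")
print(f"exact effect per year: CA {100 * pct_ca:.1f}%, TX {100 * pct_tx:.1f}%")
```

A ratio below 1 in absolute value is well within what chance can produce, which matches the reading of the standard errors above.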
Instrumental variables
Standard linear regression (OLS) does not estimate the causal effect if
there are still omitted variables that affect the outcome and are correlated
with the included variables. This often happens (at least, is often suspected).
Most common econometric solution: instrumental variables (IVs)
IVs are correlated with the included variables but uncorrelated with the error
term.
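Simple case: one regressor ($x$), one IV ($z$), outcome is $y$.
Equation of interest:
$$y = \beta_1 + \beta_2 x + \varepsilon \qquad (1)$$
Reduced form:
$$y = \pi_1 + \pi_2 z + u \qquad (2)$$
First stage:
$$x = \gamma_1 + \gamma_2 z + v \qquad (3)$$
Then $\beta_2 = \pi_2 / \gamma_2$.
OLS of (1) does not estimate $\beta_2$ when $x$ and $\varepsilon$ are correlated.
OLS of (2) and (3) estimates $\pi_2$ and $\gamma_2$.
IV estimate: $\hat\beta_2 = \hat\pi_2 / \hat\gamma_2$
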
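
The ratio logic can be illustrated with simulated data in which the regressor is correlated with the error term of the equation of interest. This is a sketch under invented assumptions (a binary instrument, a true coefficient of -1 on x, and the particular noise levels); it is not the estimation routine used in the course.

```python
import random
import statistics

random.seed(2015)  # arbitrary seed for reproducibility

TRUE_COEF = -1.0   # hypothetical coefficient on x in the equation of interest
N = 20_000


def cov(u, v):
    """Sample covariance of two equally long lists."""
    u_bar, v_bar = statistics.mean(u), statistics.mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (len(u) - 1)


z = [1.0 if random.random() < 0.3 else 0.0 for _ in range(N)]  # instrument
e = [random.gauss(0.0, 1.0) for _ in range(N)]                 # error in (1)
# x depends on z (relevance) and on e (the source of the OLS problem).
x = [1.0 * zi + 0.8 * ei + random.gauss(0.0, 1.0) for zi, ei in zip(z, e)]
y = [2.0 + TRUE_COEF * xi + ei for xi, ei in zip(x, e)]

ols_slope = cov(x, y) / cov(x, x)      # OLS of equation (1)
reduced_form = cov(z, y) / cov(z, z)   # OLS of y on z, equation (2)
first_stage = cov(z, x) / cov(z, z)    # OLS of x on z, equation (3)
iv_estimate = reduced_form / first_stage

print(f"OLS = {ols_slope:.3f}, IV = {iv_estimate:.3f}, truth = {TRUE_COEF}")
```

In this setup OLS is pulled toward zero by the correlation between x and the error, while the ratio of the reduced-form slope to the first-stage slope stays close to the assumed coefficient.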
Example: Demand for fish

Regressing quantities sold on prices does not estimate the demand equation (nor the supply equation) because price is an equilibrium outcome that depends on the error terms of both the supply and demand equations.
Demand for fish in NY:
Stormy weather reduces supply, which leads to a higher equilibrium price.
Assumption: recent stormy weather does not directly affect demand.
Then Stormy can be used as an IV to estimate the demand function.
                 Outcome
Regressor        Log quantity (y)   Log price (x)
Stormy (z)       -0.36              0.34
(s.e.)           (0.15)             (0.07)
Price elasticity of demand ($\beta_2$, the coefficient on $x$): -1.08 (0.48)
(-0.36/0.34 ≈ -1.08, up to rounding of the displayed coefficients)
Source: J. D. Angrist, K. Graddy, & G. W. Imbens (2000), The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish, Review of Economic Studies, 67, 499–527.

Regression discontinuity
Does Medicare save lives?
Does enrollment in an honors class improve later-life outcomes?
Does being the incumbent increase the probability of being elected?
Common structure: a binary determinant that depends discontinuously on a continuous variable (age 65; GPA cutoff score; 50% share of the vote in the previous election), where the continuous variable itself is related to the outcome.
Regression discontinuity (RD) estimates the size of the jump in the outcome at the cutoff score, which is the causal effect of the binary variable of interest, at least for individuals close to the cutoff score.

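
A simple way to estimate such a jump is to fit a separate straight line on each side of the cutoff, using only observations near it, and compare the two fitted values at the cutoff. The sketch below does this on simulated data; the jump of 0.08, the bandwidth of 0.25, and the linear shape of the relation are all hypothetical choices, and applied RD work typically involves more careful bandwidth and specification choices.

```python
import random
import statistics

random.seed(65)  # arbitrary seed for reproducibility

TRUE_JUMP = 0.08   # hypothetical size of the discontinuity
CUTOFF = 0.0
BANDWIDTH = 0.25   # hypothetical window around the cutoff


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Simulated running variable r and outcome y with a jump at the cutoff.
data = []
for _ in range(20_000):
    r = random.uniform(-0.5, 0.5)
    treated = 1.0 if r >= CUTOFF else 0.0
    y = 0.45 + 0.4 * r + TRUE_JUMP * treated + random.gauss(0.0, 0.1)
    data.append((r, y))

left = [(r - CUTOFF, y) for r, y in data if CUTOFF - BANDWIDTH <= r < CUTOFF]
right = [(r - CUTOFF, y) for r, y in data if CUTOFF <= r <= CUTOFF + BANDWIDTH]

# With r centered at the cutoff, each intercept is the fitted value at the cutoff.
a_left, _ = fit_line([r for r, _ in left], [y for _, y in left])
a_right, _ = fit_line([r for r, _ in right], [y for _, y in right])
jump_hat = a_right - a_left

print(f"estimated jump at the cutoff = {jump_hat:.3f}")
```

The estimate compares units just above and just below the cutoff, which is why it is best interpreted as an effect for individuals close to the cutoff.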
Example: Incumbency effect

[Figure: Democrat vote share in the next election (vertical axis, 0.2 to 0.8) against the difference in vote share, Democrat minus Republican (horizontal axis, -0.5 to 0.5).]
Estimate of the jump at zero: 0.082 (s.e. = 0.008).
So the incumbent party gets a bonus of 8.2 percentage points.
Source: D. S. Lee (2008), Randomized experiments from non-random selection in U.S. House elections, Journal of Econometrics, 142, 675–697, Figure 4a. Program and data are on Blackboard.

Differences in differences
Comparing outcomes before and after a policy change may not estimate the
causal effect of the change because of violation of the ceteris paribus
condition: many things have changed at the same time.
For example, the fact that there were fewer uninsured in 2015 than before Obamacare was enacted may be (partially) due to the decrease in unemployment and to economic growth in general.
Diff-in-diff can be used if there is a population that is affected by the policy change and another population that is not. It compares the change in the treated population with the change in the control population.

Example: Workers' Compensation

In 1980, Kentucky increased benefits for injured high-earning workers, but not for low earners.
Higher benefits reduce the incentive to return to work, which implies a longer duration.
Log duration
Income group    Before change   After change   Difference
Low earners     1.13            1.13           0.01
(s.e.)          (0.03)          (0.03)         (0.04)
High earners    1.38            1.58           0.20
(s.e.)          (0.04)          (0.04)         (0.05)
Diff-in-diff                                   0.19
(s.e.)                                         (0.07)
The non-change in the low-earners group makes the increase for the high earners more plausible as a causal effect.
Source: B. D. Meyer, W. K. Viscusi, & D. L. Durbin (1995), Workers' Compensation and Injury Duration: Evidence from a Natural Experiment, American Economic Review, 85, 322–340.

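
The diff-in-diff entry is just the difference between the two "Difference" entries of the table. The sketch below reproduces that arithmetic from the reported (rounded) numbers; the approximate standard error assumes the two groups are independent, which is an assumption made here and not a statement about how the authors computed theirs.

```python
import math

# Reported before/after differences in log duration and their s.e.s.
diff_low, se_low = 0.01, 0.04     # low earners (not affected by the change)
diff_high, se_high = 0.20, 0.05   # high earners (affected by the change)

did = diff_high - diff_low
# Approximate s.e. if the two groups are independent samples.
se_did_approx = math.sqrt(se_low ** 2 + se_high ** 2)

reported_se = 0.07
ratio = did / reported_se

print(f"diff-in-diff = {did:.2f}")
print(f"approximate s.e. = {se_did_approx:.3f} (reported: {reported_se})")
print(f"estimate / reported s.e. = {ratio:.2f}")
```

The approximate standard error is close to the reported 0.07; the small gap presumably reflects rounding of the table entries. The estimate is more than twice its standard error.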
Panel data
Observe the same individuals (firms, states, countries, etc.) at multiple
points in time.
- Study changes in one variable as a function of changes in another.
- Also helps with removing unobserved confounders.
Suppose the model of interest is
$$y_{it} = a + b x_{it} + \varepsilon_{it} = a + b x_{it} + \alpha_i + v_{it}, \qquad (4)$$
i.e., the error term consists of a component that is constant over time (e.g., stable preferences, abilities, and other characteristics) and a component that varies over time.
It is often reasonable to assume that the main problem with estimating $b$ is the component $\alpha_i$, which may be correlated with $x_{it}$.
Eliminate $\alpha_i$ by taking first differences:
$$\Delta y_{it} = y_{it} - y_{i,t-1} = b\,\Delta x_{it} + \Delta v_{it}.$$
This equation can be estimated in the ordinary way.

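
The effect of first differencing can be seen in a small simulated panel in which the time-constant component of the error is correlated with the regressor. The sketch below is illustrative only: the true slope of 0.5, the number of units and periods, and the way the regressor depends on the unit-specific component are all invented.

```python
import random
import statistics

random.seed(4)  # arbitrary seed for reproducibility

TRUE_B = 0.5                   # hypothetical slope b
N_UNITS, N_PERIODS = 2_000, 3


def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


x_levels, y_levels, x_diffs, y_diffs = [], [], [], []
for _ in range(N_UNITS):
    unit_effect = random.gauss(0.0, 1.0)      # time-constant error component
    prev_x = prev_y = None
    for _t in range(N_PERIODS):
        x = unit_effect + random.gauss(0.0, 1.0)  # x correlated with unit effect
        y = 1.0 + TRUE_B * x + unit_effect + random.gauss(0.0, 0.5)
        x_levels.append(x)
        y_levels.append(y)
        if prev_x is not None:                # first differences within the unit
            x_diffs.append(x - prev_x)
            y_diffs.append(y - prev_y)
        prev_x, prev_y = x, y

b_pooled = slope(x_levels, y_levels)  # ignores the panel structure
b_fd = slope(x_diffs, y_diffs)        # first-difference estimate

print(f"pooled OLS = {b_pooled:.3f}, first differences = {b_fd:.3f}, "
      f"truth = {TRUE_B}")
```

Differencing removes the time-constant component from the equation, so the first-difference slope should be close to the assumed value, while the pooled regression is biased by the correlation between the regressor and the unit effect.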
Example: Crime rates in counties in North Carolina

Regression without taking panel nature into account:
. regress lcrmrte ///
>     lprbarr lprbconv lprbpris lavgsen lpolpc ldensity lpctymle lwcon lwtuc ///
>     lwtrd lwfir lwser lwmfg lwfed lwsta lwloc west central ///
>     urban lpctmin
------------------------------------------------------------------------------
     lcrmrte |      Coef.  Std. Err.      t     P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |   -.545336    .029672   -18.38   0.000    -.6036079    -.487064
    lprbconv |  -.4392882    .021475   -20.46   0.000    -.4814622   -.3971143
    lprbpris |  -.1286902   .0482829    -2.67   0.008    -.2235115   -.0338689
     lavgsen |  -.0595281   .0383381    -1.55   0.121    -.1348192    .0157629
      lpolpc |   .3623397   .0224237    16.16   0.000     .3183026    .4063768
    ldensity |   .3121217   .0278302    11.22   0.000     .2574669    .3667765
    lpctymle |  -.1603721   .0629848    -2.55   0.011    -.2840659   -.0366783
       lwcon |   .0715306   .0540324     1.32   0.186    -.0345819     .177643
       lwtuc |   .0039808   .0289216     0.14   0.891    -.0528174    .0607791
       lwtrd |   .0162076   .0620666     0.26   0.794     -.105683    .1380982
       lwfir |  -.0095551   .0448274    -0.21   0.831    -.0975901    .0784799
       lwser |  -.0358545   .0310605    -1.15   0.249    -.0968532    .0251442
       lwmfg |  -.0827497   .0528848    -1.56   0.118    -.1866083     .021109
       lwfed |   .0414201    .113905     0.36   0.716    -.1822742    .2651143
       lwsta |  -.2331157   .0820558    -2.84   0.005    -.3942624    -.071969
       lwloc |    .028267   .1156494     0.24   0.807     -.198853     .255387
        west |  -.2212392   .0462601    -4.78   0.000    -.3120878   -.1303905
     central |  -.1706151   .0274328    -6.22   0.000    -.2244895   -.1167408
       urban |  -.1417121   .0536367    -2.64   0.008    -.2470475   -.0363767
     lpctmin |   .1837615   .0196319     9.36   0.000      .145207     .222316
       _cons |  -1.904982   .5722773    -3.33   0.001    -3.028858   -.7811051
------------------------------------------------------------------------------
Source: Analysis of the data from C. Cornwell & W. N. Trumbull (1994), Estimating the Economic Model of Crime with Panel Data, Review of Economics and Statistics, 76, 360–366. Inspired by their Table 3. Program and data are on Blackboard.

Comparison with and without first differencing

------------------------------------------------------------
             |      Without FD              After FD
     lcrmrte |      Coef.  Std. Err.        Coef.  Std. Err.
-------------+----------------------------------------------
     lprbarr |   -.545336    .029672    -.3443183   .0307744
    lprbconv |  -.4392882    .021475    -.2519694   .0186189
    lprbpris |  -.1286902   .0482829    -.1763339   .0266779
     lavgsen |  -.0595281   .0383381    -.0086446   .0222081
      lpolpc |   .3623397   .0224237     .3910226    .027922
    ldensity |   .3121217   .0278302     .1403951   .5923311
    lpctymle |  -.1603721   .0629848    -.2287106   .7466448
       lwcon |   .0715306   .0540324    -.0411665   .0315042
       lwtuc |   .0039808   .0289216     .0112048   .0134731
       lwtrd |   .0162076   .0620666    -.0411206   .0318461
       lwfir |  -.0095551   .0448274     .0033035   .0219192
       lwser |  -.0358545   .0310605     .0164855   .0148252
       lwmfg |  -.0827497   .0528848    -.2298159   .1022266
       lwfed |   .0414201    .113905    -.1766902   .1713265
       lwsta |  -.2331157   .0820558     .1232835   .0933971
       lwloc |    .028267   .1156494     .0949626   .1024504
        west |  -.2212392   .0462601
     central |  -.1706151   .0274328
       urban |  -.1417121   .0536367
     lpctmin |   .1837615   .0196319
       _cons |  -1.904982   .5722773    -.0042917   .0206917
------------------------------------------------------------
Most effects are smaller in magnitude after first differencing; e.g., the elasticity with respect to Prob(arrest) falls in absolute value from about 0.55 to about 0.34.

Binary dependent variables

Binary outcome (1/0; e.g., whether has health insurance, whether employed, whether enrolls in a master's program after the bachelor's).
A linear regression model may predict probabilities outside the 0–1 range, which is logically impossible and therefore often undesirable.
A nonlinear model fits such outcomes better, for example the logit model
$$p(x) = \Pr(y = 1 \mid x) = \frac{e^{a + bx}}{1 + e^{a + bx}}$$
[Figure: p(x) with a = 0, b = 1, plotted against x; vertical axis from 0 to 1.]

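
The difference between the two functional forms is easy to see numerically. The sketch below evaluates the logit probability with a = 0 and b = 1, as in the figure, next to a straight line; the line's intercept and slope (0.5 and 0.25) are arbitrary values chosen here only to show that a linear function eventually leaves the 0–1 range.

```python
import math


def logit_prob(x, a=0.0, b=1.0):
    """Logit probability e^(a+bx) / (1 + e^(a+bx)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def linear_prob(x, a=0.5, b=0.25):
    """A hypothetical linear probability line, for comparison only."""
    return a + b * x


for x in (-4, -2, 0, 2, 4):
    print(f"x = {x:+d}: logit = {logit_prob(x):.3f}, linear = {linear_prob(x):.3f}")
```

The logit values stay strictly between 0 and 1 for every x, whereas the linear values fall below 0 or rise above 1 once x is far enough from the center.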
Example: Bad health as a function of income

This is an example of a non-causal descriptive relation.
. logit fairpoor loginc if (famincy2 >= 500)

Iteration 0:   log likelihood = -4531.5338
Iteration 1:   log likelihood = -4394.3345
Iteration 2:   log likelihood = -4386.4225
Iteration 3:   log likelihood = -4386.4103
Iteration 4:   log likelihood = -4386.4103

Logistic regression                               Number of obs   =      13856
                                                  LR chi2(1)      =     290.25
                                                  Prob > chi2     =     0.0000
Log likelihood = -4386.4103                       Pseudo R2       =     0.0320

------------------------------------------------------------------------------
    fairpoor |      Coef.  Std. Err.      z     P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      loginc |  -.4656788   .0269452   -17.28   0.000    -.5184904   -.4128671
       _cons |   2.671893   .2778113     9.62   0.000     2.127393    3.216393
------------------------------------------------------------------------------

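
The logit coefficients are easier to interpret once they are turned into predicted probabilities. The sketch below plugs the two estimates from the output into the logit formula; it assumes that loginc is the natural logarithm of family income in dollars, which is not stated explicitly in the output, and the two income levels are arbitrary examples.

```python
import math

B_LOGINC = -0.4656788  # coefficient on loginc from the logit output
B_CONS = 2.671893      # constant from the logit output


def prob_fair_poor(family_income):
    """Predicted Pr(fair/poor health), assuming loginc = ln(family income)."""
    index = B_CONS + B_LOGINC * math.log(family_income)
    return 1.0 / (1.0 + math.exp(-index))


p_10k = prob_fair_poor(10_000)
p_100k = prob_fair_poor(100_000)
odds_ratio = math.exp(B_LOGINC)  # change in odds per one-unit increase in loginc

print(f"Pr(fair/poor | income  10,000) = {p_10k:.3f}")
print(f"Pr(fair/poor | income 100,000) = {p_100k:.3f}")
print(f"odds ratio per unit of log income = {odds_ratio:.3f}")
```

As in the slide, this describes an association between income and reported health, not a causal effect of income.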
Example (continued)
[Figure: probability of fair/poor health (vertical axis, 0 to .5) against log family income (horizontal axis, 6 to 13), comparing the fitted logit and linear models.]

When you get home

- [Before you leave] Fill out the background questionnaire and return it to Ahram or me.
- Read the Introduction of A&P.
- Read the following article:
  A. Frakt (08/17/2015), How to Know Whether to Believe a Health Study, New York Times, http://nyti.ms/1NABX1O
  This is about health studies, but most of it also applies to economic studies.
