
Copyright © 2011 Pearson Addison-Wesley. All rights reserved.

Introduction
to Econometrics

Chapters 1 and 2

The statistical analysis of
economic (and related)
data
and
Review of Probability
The title of the text is Introduction to Econometrics.
What is econometrics?
A science (and an art!)
Broadly: using economic theory and statistical methods to analyze data
What are some uses?
Test theories
Forecast values (e.g., a firm's sales, unemployment, stock prices, the path of a hurricane, & much, much more)
Fit mathematical economic models to data
Use data to make numerical policy recommendations in govt. and business

Brief Overview of the Course
Economics suggests important relationships, often
with policy implications, but virtually never
suggests quantitative magnitudes of causal
effects.
What is the quantitative effect of reducing class size on
student achievement?
How does a bachelor's degree change earnings?
What is the price elasticity of cigarettes?
What is the effect on output growth of a 1 percentage
point increase in interest rates by the Fed?
What is the effect on housing prices of environmental
improvements?
How much does knowing econometrics improve your love
life?

Economic Questions We'll Examine
1. Does reducing class size improve elementary
school education?
2. Is there racial discrimination in the market for
home loans?
3. How much do cigarette taxes reduce smoking?
4. What will be the rate of inflation next year?
(in today's economy, a bigger question might be: What will be the unemployment rate next year?)
5. How much does knowing econometrics improve
your love life?

This course is about using data to
measure causal effects.
Ideally, we would like an experiment
What would be an experiment to estimate the effect of class
size on standardized test scores?
But almost always we only have observational
(nonexperimental) data.
returns to education
cigarette prices
monetary policy
Most of the course deals with difficulties arising from using
observational data to estimate causal effects
confounding effects (omitted factors)
simultaneous causality
correlation does not imply causation

In this course you will:
Learn methods for estimating causal effects using observational data
Learn some tools that can be used for other purposes; for example, forecasting using time series data
Focus on applications: theory is used only as needed to understand the whys of the methods
Learn to evaluate the regression analysis of others: this means you will be able to read/understand empirical economics papers in other econ courses
Get some hands-on experience with regression analysis in your problem sets
Speaking of using observational data. . .
Three types of data
Cross-sectional: different entities, single time period
Time series: single entity, multiple time periods
Panel: multiple entities, two or more time periods
Review of Probability and Statistics (Chapter 2)

Empirical problem: Class size and educational output
Policy question: What is the effect on test scores (or some other outcome measure) of reducing class size by one student per class? By 8 students per class?
We must use data to find out (is there any way to answer this without data?)
The California Test Score Data Set (note 1-1)
All K through 8 California school districts (n = 420)
1999

Variables:
5th grade test scores: district-wide mean of reading and math scores for fifth graders
Student-teacher ratio (STR): no. of students in the district divided by no. of full-time teachers

Initial look at the data: (note 1-2)
(You should already know how to interpret this table)
What does this table tell us about the relationship between test
scores and the STR?
Do districts with smaller classes have
higher test scores?
Scatterplot of test score v. student-teacher ratio
What does this figure show?
We need to get some numerical evidence on whether districts with low STRs have higher test scores. But how?
1. Compare average test scores in districts with low STRs to
those with high STRs (estimation)
2. Test the null hypothesis that the mean test scores in the
two types of districts are the same, against the
alternative hypothesis that they differ (hypothesis
testing)
3. Estimate an interval for the difference in the mean test
scores, high v. low STR districts (confidence interval)

Initial data analysis: Compare districts with small (STR < 20) and large (STR ≥ 20) class sizes: (note 1-3)

1. Estimation of Δ = difference between the two group means
2. Test the hypothesis that Δ = 0
3. Construct a confidence interval for Δ

Class size | Average score (Ȳ) | Standard deviation (s_Y) | n
Small | 657.4 | 19.4 | 238
Large | 650.0 | 17.9 | 182
1. Estimation (note 1-4)

Ȳ_small − Ȳ_large = (1/n_small) Σ_{i=1}^{n_small} Y_i − (1/n_large) Σ_{i=1}^{n_large} Y_i
= 657.4 − 650.0
= 7.4

Is this a large difference in a real-world sense?
Standard deviation across districts = 19.1
Is this a big enough difference to be important for school reform discussions, for parents, or for a school committee?
What does this tell us about the population?
2. Hypothesis testing (note 1-5)

Difference-in-means test: compute the t-statistic (remember this?)

t = (Ȳ_s − Ȳ_l) / sqrt(s_s²/n_s + s_l²/n_l) = (Ȳ_s − Ȳ_l) / SE(Ȳ_s − Ȳ_l)

where SE(Ȳ_s − Ȳ_l) is the standard error of Ȳ_s − Ȳ_l, the subscripts s and l refer to small and large STR districts, and

s_s² = (1/(n_s − 1)) Σ_{i=1}^{n_s} (Y_i − Ȳ_s)²   (etc.)
2. Hypothesis testing (note 1-6)
Before testing. . . what are the H_0 and H_A for this test?
Compute the difference-of-means t-statistic: (note 1-7)

Size | Ȳ | s_Y | n
small | 657.4 | 19.4 | 238
large | 650.0 | 17.9 | 182

t = (Ȳ_s − Ȳ_l) / sqrt(s_s²/n_s + s_l²/n_l) = (657.4 − 650.0) / sqrt(19.4²/238 + 17.9²/182) = 7.4 / 1.83 = 4.05

(note p-value = .000061)

So. . . reject the null hypothesis that the two means are the same or not? Explain your decision.
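The arithmetic above can be checked with a short Python sketch (summary statistics taken from the table; the normal-approximation p-value from the standard library may differ from the slide's .000061 in the last digits because of rounding of t):

```python
import math

# Summary statistics from the slide (small vs. large STR districts)
n_s, ybar_s, s_s = 238, 657.4, 19.4
n_l, ybar_l, s_l = 182, 650.0, 17.9

se = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)  # standard error of the difference
t = (ybar_s - ybar_l) / se                   # difference-of-means t-statistic

# Two-sided p-value from the standard normal (large-sample approximation)
p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

print(round(se, 2), round(t, 2), p)  # se ≈ 1.83, t ≈ 4.05, p far below .05
```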
3. Confidence interval (note 1-8)
A 95% confidence interval for the difference between the means is

(Ȳ_s − Ȳ_l) ± 1.96 × SE(Ȳ_s − Ȳ_l)
= 7.4 ± 1.96 × 1.83 = (3.8, 11.0)

So. . . reject the null hypothesis that the two means are the same or not? Explain your decision.
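A minimal sketch of the interval computation from the same summary statistics:

```python
import math

# Summary statistics from the slides
n_s, ybar_s, s_s = 238, 657.4, 19.4
n_l, ybar_l, s_l = 182, 650.0, 17.9

diff = ybar_s - ybar_l
se = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)

# 95% confidence interval: estimate ± 1.96 standard errors
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(round(lo, 1), round(hi, 1))  # → 3.8 11.0
```

Because the interval excludes 0, it is consistent with rejecting the null of equal means at the 5% level.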
What comes next
The mechanics of estimation, hypothesis testing,
and confidence intervals should be familiar
These concepts extend directly to regression and
its variants
Before turning to regression, however, we will
review some of the underlying theory of
estimation, hypothesis testing, and confidence
intervals:
Why do these procedures work, and why use these rather
than others?
We will review the intellectual foundations of statistics
and econometrics

Review of Statistical Theory (note 1-9)
Why review probability?
Randomness is everywhere; we use the theory of probability to describe that randomness

Structure of notes:
1. The probability framework for statistical inference -
now
2. Estimation
3. Testing
4. Confidence Intervals


Review of Statistical Theory
The probability framework for statistical inference

Single random variable:
Population, random variable, and distribution
Moments of a distribution (mean, variance, standard deviation,
covariance, correlation)
Two random variables:
Conditional distributions and conditional means
Four useful distributions
Normal, chi-squared, students t, F
Random sampling & sampling distribution:
Distribution of a sample of data drawn randomly from a population: Y_1, …, Y_n

(a) Single random variable (note 1-10)
Population
The group or collection of all possible entities of interest
(school districts)
We will think of populations as infinitely large (∞ is an approximation to "very big")

Sample
What's a sample?


(a) Single random variable (note 1-11)
Fundamental concepts
Outcomes
Probability
Event
Random variable Y
Numerical summary of a random outcome (district average test score, district STR)
Types of random variables
Discrete
Continuous


(a) Single random variable (note 1-12)
Probability distributions - discrete
Definition
Probabilities of events
c.d.f.
Bernoulli






(a) Single random variable (note 1-13)
Probability distributions - continuous
p.d.f.
c.d.f.





Population distribution of Y
The probabilities of different values of Y that occur
in the population, for ex. Pr[Y = 650] (when Y is
discrete)
or: The probabilities of sets of these values, for ex. Pr[640 ≤ Y ≤ 660] (when Y is continuous).

(b) Moments of a population distribution: mean, variance, standard deviation (note 1-14)

mean = expected value (expectation) of Y
= E(Y)
= μ_Y
= long-run average value of Y over many repeated occurrences of Y
Moments (cont.) (note 1-15)

variance = E[(Y − μ_Y)²]
= σ_Y²
= measure of the squared spread of the distribution around its mean

standard deviation = σ_Y = √variance
Moments (cont.) (note 1-16)

skewness = E[(Y − μ_Y)³] / σ_Y³
= measure of asymmetry (lack of symmetry) of a distribution
skewness = 0: the distribution is symmetric
skewness > (<) 0: the distribution has a long right (left) tail
Skewness describes how much a distribution deviates from symmetry
Moments (cont.) (note 1-17)

kurtosis = E[(Y − μ_Y)⁴] / σ_Y⁴
= measure of mass in the tails
= measure of the probability of large values
kurtosis = 3: normal distribution
kurtosis > 3: heavy tails ("leptokurtic")
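The four moments just defined can be computed directly from their population formulas; a small sketch with an illustrative data set (treated as the whole population):

```python
# Population-moment formulas from the slides, applied to a small
# illustrative data set (the list is treated as the entire population).
def moments(ys):
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n      # E[(Y - mu)^2]
    skew = sum((y - mean) ** 3 for y in ys) / n / var ** 1.5
    kurt = sum((y - mean) ** 4 for y in ys) / n / var ** 2
    return mean, var, skew, kurt

# A symmetric data set: skewness 0, and kurtosis well below 3 (thin tails)
mean, var, skew, kurt = moments([1, 2, 3, 4, 5])
print(mean, var, skew, kurt)  # → 3.0 2.0 0.0 1.7
```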
Two random variables
Random variables X and Y
Together they have a joint distribution
Each one has a marginal distribution
Each one has a conditional distribution

Joint distribution of two discrete X and Y
Probability that X and Y simultaneously take on certain values, say x
and y.
Pr(X = x, Y = y) or Pr(x, y) or P(X = x, Y = y) or P(x, y)
NOTE lower case symbols x and y denote values and. . .
Upper case symbols X and Y denote random variables
Probabilities of all possible (x, y) combinations sum to what?



Joint distribution (cont.)
After recording data for many commutes
prob. of long, rainy commute = P(X=0,Y=0) = .15
prob. of long, clear commute = P(X=1,Y=0) = ??
prob. of short, rainy commute = P(X=0,Y=1) = ??
prob. of short, clear commute = P(X=1,Y=1) = ??

These four outcomes are mutually exclusive and exhaust all possibilities
So, they must sum to ??



Marginal distribution
Marginal distribution is P(X=x) or P(Y=y)

Sum of joint probabilities:

prob. of long commute = P(X=0,Y=0) + P(X=1,Y=0) = .15 +.07 =.22
prob. of short commute = ??
prob. of rainy commute = ??
prob. of clear commute = ??





In general, the marginal distribution is the joint distribution summed over the other variable:

P(Y = y) = Σ_{i=1}^{L} P(X = x_i, Y = y)   (where x_1, …, x_L are the L possible values of X)
Conditional Distribution
Conditional distribution of Y given X
Probability that Y is some value conditional on (that is, depending on, or after) X taking on a specified value
Examples: distribution of. . .
test scores, given that STR < 20
wages of all female workers (Y = wages, X = gender)
mortality rate of those given an experimental treatment (Y = live/die;
X = treated/not treated)

P(Y=y | X=x) or P(y | x)

P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)   or   P(y | x) = P(x, y) / P(x)
Conditional Distribution (cont.)
Example
prob. of long commute (Y=0) if you know it's raining (X=0):

P(Y = 0 | X = 0) = P(X = 0, Y = 0) / P(X = 0) = .15 / .30 = .50

If it's raining, only two possibilities. What are they?
So, prob. of short commute (Y=1) if you know it's raining (X=0)
= P(Y=1 | X=0) = ?? (hint: recall the answer above)
Conditional Distribution (cont.)
Question from previous slide (cont.)
prob. of short commute (Y=1) if you know it's raining (X=0)
Now, check your answer by calculation:

P(Y = 1 | X = 0) = P(X = 0, Y = 1) / P(X = 0) = ??
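A sketch of the commute example in Python. Only P(X=0,Y=0) = .15, P(X=1,Y=0) = .07, and P(X=0) = .30 appear on the slides; the two remaining joint cells below are filled in so the table is consistent and sums to 1, purely for illustration:

```python
# Joint distribution of rain (X: 0 = raining, 1 = clear) and commute
# length (Y: 0 = long, 1 = short). The .15 and .07 cells come from the
# slides; the other two cells are assumed values chosen so that
# P(X=0) = .30 and the table sums to 1.
joint = {(0, 0): 0.15, (1, 0): 0.07, (0, 1): 0.15, (1, 1): 0.63}

# Marginals: sum the joint probabilities over the other variable
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)    # P(X=0)
p_long = sum(p for (x, y), p in joint.items() if y == 0)  # P(Y=0)
print(round(p_x0, 2), round(p_long, 2))  # → 0.3 0.22

# Conditional distribution of Y given that it is raining (X = 0)
p_long_given_rain = joint[(0, 0)] / p_x0
p_short_given_rain = joint[(0, 1)] / p_x0
print(p_long_given_rain, p_short_given_rain)  # → 0.5 0.5
```

As the slide hints, the two conditional probabilities are the only possibilities given rain, so they sum to 1.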
Conditional Distribution (cont.) (note 1-18)
Questions
What is prob. of long commute (Y=0) if you know it's not raining (X=1)?
What is prob. of short commute (Y=1) if you know it's not raining (X=1)?
What do these two probabilities sum to?
Conditional Distribution Examples. (note 1-19)
Figure 2.4 Average Hourly Earnings of U.S. Full-Time Workers
in 2008. Why do I say that these are conditional distributions?

Independence
Two r.v.s X and Y are independent if:
Knowing the value of one tells you nothing about the value of the other
The conditional distribution of Y given X equals the marginal distribution of Y (and likewise for X):
P(Y=y | X=x) = P(Y=y) or. . .
P(X=x | Y=y) = P(X=x)






Independence (cont.)
Recall rvs X and Y independent if
P(Y=y | X=x) = P(Y=y)
Example
M = number of PC crashes & A = age of PC (0 = old & 1 = new)
P(M = 0) = 0.80 and P(M = 1) = 0.07
Are M and A independent? Explain your answer.
Case 1: P(M=0 | A = 0) = 0.70
Case 2: P(M=1 | A = 1) = 0.07
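A sketch of the independence check: independence requires P(M=m | A=a) = P(M=m) for every case, so a single failing comparison settles the question:

```python
# Independence check for the PC-crash example from the slide.
# M = number of crashes, A = age of PC (0 = old, 1 = new).
p_m = {0: 0.80, 1: 0.07}             # marginal P(M=m), from the slide
cond = {(0, 0): 0.70, (1, 1): 0.07}  # P(M=m | A=a) for the two cases given

case1_matches = cond[(0, 0)] == p_m[0]  # 0.70 vs 0.80
case2_matches = cond[(1, 1)] == p_m[1]  # 0.07 vs 0.07

# One mismatch is enough to rule out independence, even though case 2 agrees.
print(case1_matches, case2_matches)  # → False True
```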






Two random variables: joint distributions
and covariance (note 1-20)
Random variables X and Z have a joint distribution
The covariance between X and Z is

cov(X, Z) = E[(X − μ_X)(Z − μ_Z)] = σ_XZ

The covariance is a measure of the linear association between X and Z; its units are (units of X) × (units of Z)
cov(X,Z) > 0 means a positive relation between X and Z
If X and Z are independently distributed, then cov(X,Z) = 0
(but not vice versa!!)
The covariance between Test Score and STR is negative:
So is the correlation
Covariance vs. Correlation
Recall: the covariance's units are (units of X) × (units of Z).
If X & Z are in feet, then the covariance is in feet²
If X & Z (the same variables) are in meters, then the covariance is in meters²
Same association, but different values of the covariance
What if X is in feet and Z is in lbs.? What units for the covariance?
Problems!
The correlation coefficient is defined in terms of the covariance:

corr(X, Z) = cov(X, Z) / sqrt(var(X) var(Z)) = σ_XZ / (σ_X σ_Z) = r_XZ

−1 ≤ corr(X, Z) ≤ 1
corr(X, Z) = 1 means perfect positive linear association
corr(X, Z) = −1 means perfect negative linear association
corr(X, Z) = 0 means no linear association

The correlation coefficient is unitless, so it avoids the problems of the covariance.
corr(X, Z) when X & Z are measured in feet is the same as corr(X, Z) when X & Z are in meters or pounds or. . .
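A short sketch of why correlation is unitless while covariance is not, using made-up data:

```python
# Rescaling X (feet -> meters) changes the covariance but leaves the
# correlation untouched. The data are invented for illustration.
def cov(xs, zs):
    mx, mz = sum(xs) / len(xs), sum(zs) / len(zs)
    return sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / len(xs)

def corr(xs, zs):
    return cov(xs, zs) / (cov(xs, xs) * cov(zs, zs)) ** 0.5

feet = [5.0, 5.5, 6.0, 6.5]
weight = [120.0, 140.0, 150.0, 180.0]
meters = [f * 0.3048 for f in feet]  # same variable, different units

print(cov(feet, weight), cov(meters, weight))    # different values
print(corr(feet, weight), corr(meters, weight))  # identical
```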
The correlation coefficient measures linear
association
Four Distributions: normal, chi-squared,
Student t, F (note 1-21)
Normal Distribution
bell-shaped probability density
X ~ N(μ, σ²)
Standard normal
Z ~ N(0, 1)
Standardizing a normal r.v. (z-score)
Used for finding probabilities about X ~ N(μ, σ²)
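Standardizing in practice: a sketch that converts a probability question about X ~ N(μ, σ²) into one about the standard normal c.d.f. (the numbers are illustrative, not from the text):

```python
import math

# If X ~ N(mu, sigma^2), then Z = (X - mu)/sigma ~ N(0, 1), so
# Pr[X <= x] = Phi((x - mu)/sigma), where Phi is the standard normal c.d.f.
def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 654.0, 19.1  # illustrative mean and s.d. (e.g., test scores)
x = 673.1
z = (x - mu) / sigma     # z-score: one standard deviation above the mean
print(round(z, 2), round(phi(z), 3))  # → 1.0 0.841
```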
Normal Distribution (cont.)
A Bad Day on Wall Street (note 1-22)
The box A Bad Day on Wall Street has an
example of the normal distribution in the
U.S. stock market
A Bad Day on Wall Street (cont.)
(note 1-22)
The Chi-squared Distribution (note 1-23)

Usually written as χ²_m
Shape of distribution
Shape depends on the degrees of freedom m
When used


The Student t Distribution (note 1-24)

Always lower-case t
Shape of distribution
Symmetric, like the normal distribution
Shape depends on the degrees of freedom m
m < 20: fatter tails than the normal distribution
m > 30: shape close to the normal distribution
m → ∞: exactly the normal distribution
When used
The F Distribution (note 1-25)

Shape of distribution
Shape depends on two degrees of freedom
Numerator d.f. n
denominator d.f. m
When used




(d) Distribution of a sample of data drawn randomly from a population: Y_1, …, Y_n (note 1-26)
We will assume simple random sampling
Choose an individual (district, entity) at random from the population
Randomness and data
Prior to sample selection, the value of Y is random because the individual selected is random
Once the individual is selected and the value of Y is observed, then Y is just a number (not random)
The data set is (Y_1, Y_2, …, Y_n), where Y_i = value of Y for the i-th individual (district, entity) sampled
Distribution of Y_1, …, Y_n under simple random sampling (note 1-27)

Because individuals #1 and #2 are selected at random, the value of Y_1 has no information content for Y_2. Thus:
Y_1 and Y_2 are independently distributed
Y_1 and Y_2 come from the same population (distribution). That is, Y_1 and Y_2 are identically distributed
So, under simple random sampling, Y_1 and Y_2 are independently and identically distributed (i.i.d.).
More generally, under simple random sampling, {Y_i}, i = 1, …, n, are i.i.d.

Simple Random Sampling (note 1-28)
Recall: Under simple random sampling, {Y_i}, i = 1, …, n, are i.i.d.

This framework allows rigorous statistical inferences about moments of population distributions using a sample of data from that population


Structure of notes:
1. The probability framework for statistical inference
2. Estimation - now
3. Testing
4. Confidence Intervals



Estimation

Ȳ is the natural estimator of the population mean. But:
a) What are the properties of Ȳ?
b) Why should we use Ȳ rather than some other estimator?
Y_1 (the first observation)
maybe unequal weights, not the simple average
median(Y_1, …, Y_n)
The starting point is the sampling distribution of Ȳ
(a) The sampling distribution of Ȳ (note 1-29)
Ȳ is a random variable, and its properties are determined by the sampling distribution of Ȳ
The individuals in the sample are drawn at random.
Thus the values of (Y_1, …, Y_n) are random
Thus functions of (Y_1, …, Y_n), such as Ȳ, are random: had a different sample been drawn, they would have taken on a different value

The distribution of Ȳ over ALL possible different samples of size n is called the. . . sampling distribution of Ȳ.
(a) The sampling distribution of Ȳ (cont.)
Recall: The distribution of Ȳ over ALL possible different samples of size n is called the. . . sampling distribution of Ȳ.

The mean and variance of all of the Ȳ values are the mean and variance of its sampling distribution, E(Ȳ) and var(Ȳ).
(remember: Ȳ is a sample statistic.)

VIP: The concept of the sampling distribution underpins all of inference in econometrics.
The sampling distribution of Ȳ (cont.) (note 1-30)

Example: Suppose Y takes on 0 or 1 (a Bernoulli random variable) with the probability distribution,
Pr[Y = 0] = .22, Pr[Y = 1] = .78
Then
E(Y) = p × 1 + (1 − p) × 0 = p = .78
σ_Y² = E[Y − E(Y)]² = p(1 − p) [remember this?]
= .78 × (1 − .78) = 0.1716
The sampling distribution of Ȳ depends on n.
Consider n = 2. The sampling distribution of Ȳ is,
Pr(Ȳ = 0) = .22² = .0484
Pr(Ȳ = ½) = 2 × .22 × .78 = .3432
Pr(Ȳ = 1) = .78² = .6084
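The n = 2 sampling distribution above can be reproduced by enumerating all possible samples:

```python
from itertools import product

# Exact sampling distribution of the sample mean Ybar for n i.i.d.
# Bernoulli(p) draws, built by enumerating every possible sample.
def sampling_dist(p, n):
    dist = {}
    for draws in product([0, 1], repeat=n):
        prob = 1.0
        for d in draws:
            prob *= p if d == 1 else 1 - p
        ybar = sum(draws) / n          # the sample mean for this sample
        dist[ybar] = dist.get(ybar, 0.0) + prob
    return dist

dist = sampling_dist(p=0.78, n=2)
print({k: round(v, 4) for k, v in sorted(dist.items())})
# → {0.0: 0.0484, 0.5: 0.3432, 1.0: 0.6084}
```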
The sampling distribution of Ȳ when Y is Bernoulli (p = .78): (note 1-31)
Things we want to know about the sampling distribution:
What is the mean of Ȳ?
If E(Ȳ) = true μ_Y = .78, then Ȳ is an unbiased estimator of μ_Y
What is the variance of Ȳ?
How does var(Ȳ) depend on n (the famous 1/n formula)?
Does Ȳ become close to μ_Y when n is large?
Law of large numbers: Ȳ is a consistent estimator of μ_Y
The distribution of Ȳ appears bell-shaped for large n. Is this generally true?
Wait until the next section (2.6 in the 3rd ed.) to answer this question about the SHAPE of the sampling distribution of Ȳ.
The mean and variance of the sampling distribution of Ȳ

General case, that is, for Y_i i.i.d. from ANY distribution, not just Bernoulli:

mean: E(Ȳ) = E((1/n) Σ_{i=1}^n Y_i) = (1/n) Σ_{i=1}^n E(Y_i) = (1/n) Σ_{i=1}^n μ_Y = μ_Y

variance: var(Ȳ) = E[Ȳ − E(Ȳ)]²
= E[Ȳ − μ_Y]²
= E[((1/n) Σ_{i=1}^n Y_i) − μ_Y]²
= E[(1/n) Σ_{i=1}^n (Y_i − μ_Y)]²

so var(Ȳ) = E[(1/n) Σ_{i=1}^n (Y_i − μ_Y)]²
= E[((1/n) Σ_{i=1}^n (Y_i − μ_Y)) × ((1/n) Σ_{j=1}^n (Y_j − μ_Y))]
= (1/n²) Σ_{i=1}^n Σ_{j=1}^n E[(Y_i − μ_Y)(Y_j − μ_Y)]
= (1/n²) Σ_{i=1}^n Σ_{j=1}^n cov(Y_i, Y_j)
= (1/n²) × n × σ_Y²   [note: cov(Y_i, Y_j) = 0 for i ≠ j, and cov(Y_i, Y_i) = var(Y_i) = σ_Y²]
= σ_Y²/n
Mean and variance of the sampling distribution of Ȳ (cont.) (note 1-32)

E(Ȳ) = μ_Y
var(Ȳ) = σ_Y²/n

Implications:
1. Ȳ is an unbiased estimator of μ_Y (that is, E(Ȳ) = μ_Y)
2. var(Ȳ) is inversely proportional to n
the spread (standard deviation) of the sampling distribution is proportional to 1/√n
thus the sampling uncertainty associated with Ȳ is proportional to 1/√n (larger samples, less uncertainty, but a square-root law)
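A Monte Carlo sketch of both implications, E(Ȳ) = μ_Y and var(Ȳ) = σ_Y²/n, for the Bernoulli(p = .78) population (so σ_Y² = .1716):

```python
import random

# Simulate many samples of size n from Bernoulli(p = 0.78), and compare
# the mean and variance of the sample means with mu = p and p(1-p)/n.
random.seed(0)
p, n, reps = 0.78, 100, 20000

means = []
for _ in range(reps):
    sample = [1 if random.random() < p else 0 for _ in range(n)]
    means.append(sum(sample) / n)

m = sum(means) / reps
v = sum((x - m) ** 2 for x in means) / reps
print(round(m, 3), round(v, 5), p * (1 - p) / n)  # v should be near 0.001716
```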
The sampling distribution of Ȳ when n is large (note 1-33)

For small sample sizes, the distribution of Ȳ will usually be complicated (unless. . . what is true about the distribution of the Y_i values in the population?)

But if n is large, the sampling distribution is simple!
1. As n increases, the distribution of Ȳ becomes more tightly centered around μ_Y (the Law of Large Numbers)
2. Moreover, the distribution of Ȳ − μ_Y, suitably standardized, becomes normal (the Central Limit Theorem)
The Law of Large Numbers: (note 1-34)

An estimator is consistent if the probability that it falls within an interval of the true population value tends to one as the sample size increases.
If (Y_1, …, Y_n) are i.i.d. and σ_Y² < ∞, then Ȳ is a consistent estimator of μ_Y, that is,
Pr[|Ȳ − μ_Y| < ε] → 1 as n → ∞
which can be written, Ȳ →p μ_Y
("→p" means "converges in probability to": Ȳ →p μ_Y).
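A quick numerical illustration of the Law of Large Numbers for the Bernoulli(p = .78) population:

```python
import random

# As n grows, the sample mean of Bernoulli(0.78) draws settles toward p.
random.seed(1)
p = 0.78

for n in [10, 100, 10_000]:
    ybar = sum(1 if random.random() < p else 0 for _ in range(n)) / n
    print(n, round(ybar, 3))  # the n = 10_000 mean lands very close to 0.78
```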
The Central Limit Theorem (CLT): (note 1-35)

If (Y_1, …, Y_n) are i.i.d. and 0 < σ_Y² < ∞, then when n is large the distribution of Ȳ is well approximated by a normal distribution.
Ȳ is approximately distributed N(μ_Y, σ_Y²/n) (normal distribution with mean μ_Y and variance σ_Y²/n), AND. . .
√n(Ȳ − μ_Y)/σ_Y is approximately distributed N(0, 1) (standard normal)
That is, standardized Ȳ = (Ȳ − E(Ȳ))/√var(Ȳ) = (Ȳ − μ_Y)/(σ_Y/√n) is approximately distributed as N(0, 1)
VIP: The larger is n, the better is the approximation.
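A sketch of the CLT in action: standardized sample means should behave like N(0, 1), so roughly 95% of them should fall inside ±1.96:

```python
import random

# Standardize the sample mean of n Bernoulli(0.78) draws and check how
# often it lands inside +/-1.96, the central 95% region of N(0, 1).
random.seed(2)
p, n, reps = 0.78, 100, 5000
mu, sd = p, (p * (1 - p)) ** 0.5  # population mean and s.d.

zs = []
for _ in range(reps):
    ybar = sum(1 if random.random() < p else 0 for _ in range(n)) / n
    zs.append((ybar - mu) / (sd / n ** 0.5))  # standardized Ybar

inside = sum(1 for z in zs if abs(z) < 1.96) / reps
print(round(inside, 3))  # close to 0.95
```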
Fig. 2.8 Sampling distribution of Ȳ when Y is Bernoulli, p = 0.78 (n = 2, 5, 25, 100)
Fig. 2.8 Sampling distribution of Ȳ (cont.) (note 1-36)
In the figure on the previous slide (Fig. 2.8), when n = 100, it might not be easy to see that the distribution of Ȳ is normal.
It's easier to see this if we examine the distribution of the standardized Ȳ = √n(Ȳ − μ_Y)/σ_Y

See next slide
Same example: sampling distribution of (Ȳ − E(Ȳ))/√var(Ȳ) (n = 2, 5, 25, 100) (Fig. 2.9 in the book)
Summary: The Sampling Distribution of Ȳ

For Y_1, …, Y_n i.i.d. with 0 < σ_Y² < ∞:
The exact (finite-sample) sampling distribution of Ȳ has mean μ_Y (Ȳ is an unbiased estimator of μ_Y) and variance σ_Y²/n
Other than its mean and variance, the exact distribution of Ȳ is complicated and depends on the distribution of Y (the population distribution)
When n is large, the sampling distribution simplifies:
Ȳ →p μ_Y (Law of Large Numbers)
(Ȳ − E(Ȳ))/√var(Ȳ) is approximately N(0, 1) (CLT)