OLS Estimator
Quantitative
Methods 2
Lecture 5
Solutions for β0 and β1
OLS chooses the values of β0 and β1 that minimize the unexplained (residual) sum of squares. To find the minimum, take the partial derivatives with respect to β0 and β1 and set them to zero.

SSR = \sum_{i=1}^{n} \hat{u}_i^2

SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

SSR = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2
Solutions for β0 and β1
Setting the derivatives to zero yields the normal equations:

\sum y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum x_i

\sum x_i y_i = \hat{\beta}_0 \sum x_i + \hat{\beta}_1 \sum x_i^2
Solutions for β0 and β1
Our estimate of the slope of the line is:

\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}

and the estimate of the intercept is:

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
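The two formulas above can be computed directly. A minimal Python sketch, with made-up data values:

```python
# Closed-form OLS for a simple regression, straight from the two
# formulas above. The data values here are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Slope: sum of cross-deviations over sum of squared x-deviations
beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Intercept: the fitted line passes through the point of means
beta0_hat = y_bar - beta1_hat * x_bar

print(beta1_hat, beta0_hat)  # slope ≈ 1.96, intercept ≈ 0.14
```

Note that the intercept formula forces the fitted line through the point of means (x̄, ȳ).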
Estimators and the “True”
Coefficients
\hat{\beta}_0 and \hat{\beta}_1 would be the "true" coefficients if we only wanted to describe the data we have observed.
But we are almost ALWAYS using data to draw conclusions about cases outside our data.
Thus \hat{\beta}_0 and \hat{\beta}_1 are estimates of some "true" set of coefficients (β0 and β1) that exist beyond our observed data.
Some Terminology for Labeling
Estimators
Various conventions are used to distinguish the "true" coefficients from the estimates that we observe.
We will use the beta versus beta-hat distinction from Wooldridge: β for a true coefficient, \hat{\beta} for its estimate.
Think of this as the same distinction as that between population values and sample-based estimates.
But other authors, textbooks, or websites may use different terms (for example, b versus \hat{b}).
Gauss-Markov Theorem:
Under the 5 Gauss-Markov assumptions, the OLS estimator is the best, linear, unbiased estimator of the true parameters (β's) conditional on the sample values of the explanatory variables. In other words, the OLS estimator is BLUE.
5 Gauss-Markov Assumptions for
Simple Linear Model (Wooldridge, p.65)
1. Linear in parameters: y = \beta_0 + \beta_1 x_1 + u
2. Random sampling of n observations: \{(x_i, y_i) : i = 1, 2, \ldots, n\}
3. Sample variation in the explanatory variable: the x_i's are not all the same value
4. Zero conditional mean: the error u has an expected value of 0, given any value of the explanatory variable: E(u \mid x) = 0
5. Homoskedasticity: the error has the same variance given any value of the explanatory variable: \mathrm{Var}(u \mid x) = \sigma^2
The Linearity Assumption
Key to understanding OLS models.
The restriction is that our model of the population must be linear in the parameters:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u

A model cannot be non-linear in the parameters. For example, OLS cannot estimate models such as y = \beta_0 + \beta_1^2 x_1 or y = \beta_0 + \ln(\beta_1) x_1.
[Figure: Demonstration of the Homoskedasticity Assumption — conditional distributions F(y|x) at x_1, x_2, x_3, x_4 with equal variance around the predicted line \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x]

[Figure: Predicted Line Drawn Under Heteroskedasticity — conditional distributions F(y|x) at x_1, x_2, x_3, x_4 whose variance changes with x around the predicted line \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x]
How Good are the Estimates?
Properties of Estimators
Small Sample Properties
True regardless of how much data we have
Most desirable characteristics
Unbiased
Efficient
BLUE (Best Linear Unbiased Estimator)
“Second Best” Properties of
Estimators
Asymptotic (or large sample) Properties
True in hypothetical instance of infinite data
In practice applicable if N>50 or so
Asymptotically unbiased
Consistency
Asymptotic efficiency
Bias
An estimator is unbiased if

E(\hat{\beta}_j) = \beta_j, \quad j = 0, 1, \ldots, k

In other words, the average value of the estimator in repeated sampling equals the true parameter.
Note that whether an estimator is biased or not implies nothing about its dispersion.
Efficiency
An estimator is efficient if its variance is less than that of any other estimator of the parameter:

\mathrm{Var}(\hat{\beta}_j) \le \mathrm{Var}(\tilde{\beta}_j), where \tilde{\beta}_j is any other unbiased estimator of \beta_j

This criterion is only useful in combination with others. For example, the constant estimator \hat{\beta}_j = 2 has low variance, but it is biased.
We might want to choose a biased estimator if it has a smaller variance.
[Figure: sampling distributions f(\hat{\beta}). An unbiased and efficient estimator of β is centered on the true β with a tight distribution; a biased estimator of β is centered on β + bias; high sampling variance means an inefficient estimator of β.]
BLUE
(Best Linear Unbiased Estimator)
An estimator \hat{\beta}_j is BLUE if:
\hat{\beta}_j is a linear function of the data
\hat{\beta}_j is unbiased: E(\hat{\beta}_j) = \beta_j, \quad j = 0, 1, \ldots, k
\hat{\beta}_j is the most efficient: \mathrm{Var}(\hat{\beta}_j) \le \mathrm{Var}(\tilde{\beta}_j) for any other linear unbiased estimator \tilde{\beta}_j
Large Sample Properties
Asymptotically Unbiased: as n becomes larger, E(\hat{\beta}_j) tends toward \beta_j
Consistency: if the bias and variance both decrease as n gets larger, the estimator is consistent
Asymptotic Efficiency: among consistent estimators with an asymptotic distribution of finite mean and variance, no estimator has smaller asymptotic variance
[Figure: Demonstration of Consistency — sampling distributions f(\hat{\beta}) for n = 4, n = 16, and n = 50, tightening around the true β as n grows]
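The consistency picture can be reproduced numerically: the sampling spread of β1hat shrinks as n grows. A sketch in Python, where the model and all values are made up for illustration:

```python
# Sketch of the consistency picture above: the sampling spread of
# beta1_hat shrinks as n grows. The model y = 2 + 0.5x + u and all
# other values here are made up for illustration.
import random
import statistics

random.seed(42)

def draw_beta1_hat(n):
    """One simulated sample of size n, returning the OLS slope estimate."""
    x = [random.uniform(0, 10) for _ in range(n)]
    y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]
    x_bar = sum(x) / n
    num = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

spread = {}
for n in (4, 16, 50):
    estimates = [draw_beta1_hat(n) for _ in range(2000)]
    spread[n] = statistics.stdev(estimates)

print(spread)  # the standard deviation of the estimates falls as n rises
```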
Let’s Show that OLS is
Unbiased
Begin with our equation: y_i = \beta_0 + \beta_1 x_i + u_i
u_i \sim N(0, \sigma^2), and so y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)
A linear function of a normal random variable is also a normal random variable.
Thus \hat{\beta}_0 and \hat{\beta}_1 are normal random variables.
The Robust Assumption of
“Normality”
Even if we do not know the distribution of y, \hat{\beta}_0 and \hat{\beta}_1 will behave like normal random variables.
The Central Limit Theorem says estimates of the mean of any random variable will approach normal as n increases.
This assumes cases are independent (errors not correlated) and identically distributed (i.i.d.).
This is critical for hypothesis testing: the \hat{\beta}'s are normal regardless of the distribution of y.
Showing β1hat is Unbiased
Recall the formula for \hat{\beta}_1:

\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}

Since \sum (x_i - \bar{x}) \bar{y} = 0, this is equivalent to:

\hat{\beta}_1 = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2}
Showing β1hat is Unbiased
Now we substitute for y_i to yield:

\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum (x_i - \bar{x})^2}

\hat{\beta}_1 = \frac{\beta_0 \sum (x_i - \bar{x}) + \beta_1 \sum (x_i - \bar{x}) x_i + \sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}
Showing β1hat is Unbiased
Now we can separate terms to yield:

\hat{\beta}_1 = \beta_0 \frac{\sum (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} + \beta_1 \frac{\sum (x_i - \bar{x}) x_i}{\sum (x_i - \bar{x})^2} + \frac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}
Showing β1hat is Unbiased
By the first summation rule, the first term = 0, since \sum (x_i - \bar{x}) = 0.
By the second summation rule, \sum (x_i - \bar{x}) x_i = \sum (x_i - \bar{x})^2, so the second term = \beta_1.
This leaves:

\hat{\beta}_1 = \beta_1 + \frac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}
Showing β1hat is Unbiased
Expanding the summations yields:

\hat{\beta}_1 = \beta_1 + \frac{(x_1 - \bar{x}) u_1}{\sum (x_i - \bar{x})^2} + \frac{(x_2 - \bar{x}) u_2}{\sum (x_i - \bar{x})^2} + \ldots + \frac{(x_n - \bar{x}) u_n}{\sum (x_i - \bar{x})^2}

Taking expectations, each u_i has expected value 0, so E(\hat{\beta}_1) = \beta_1.
Showing β0hat Is Unbiased
Begin with the equation for \hat{\beta}_0:

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

Since y_i = \beta_0 + \beta_1 x_i + u_i, averaging gives:

\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}

Substituting for the mean of y (\bar{y}):

\hat{\beta}_0 = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat{\beta}_1 \bar{x}
Showing β0hat Is Unbiased
Take the expected value of both sides:

E[\hat{\beta}_0] = \beta_0 + \beta_1 \bar{x} + E[\bar{u}] - E[\hat{\beta}_1] \bar{x}

We just showed that E(\hat{\beta}_1) = \beta_1, so the \beta_1 \bar{x} terms cancel each other out. This leaves:

E[\hat{\beta}_0] = \beta_0 + E[\bar{u}]

Again, since E[\bar{u}] = 0,

E[\hat{\beta}_0] = \beta_0
Notice Assumptions
Two key assumptions to show β0hat and
β1hat are unbiased
x is fixed (meaning, it is measured without
error)
E(u)=0
Unbiasedness tells us that OLS will give
us a best guess at the slope and intercept
that is correct on average.
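Unbiasedness is a statement about repeated sampling, which a short simulation can illustrate: the average β1hat across many samples sits close to the true β1. A hedged sketch, where the data-generating process (β0 = 1, β1 = 2, σ = 1) is made up:

```python
# Sketch of unbiasedness in repeated sampling: the average beta1_hat
# over many simulated samples is close to the true beta1. The
# data-generating process (beta0 = 1, beta1 = 2, sigma = 1) is made up.
import random

random.seed(1)
beta0, beta1, n, reps = 1.0, 2.0, 30, 5000

estimates = []
for _ in range(reps):
    x = [random.uniform(0, 5) for _ in range(n)]
    y = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]
    x_bar = sum(x) / n
    b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    estimates.append(b1)

mean_b1 = sum(estimates) / reps
print(mean_b1)  # close to the true beta1 of 2.0
```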
OK, but is it BLUE?
Now we have an estimator (β1hat )
We know that β1hat is unbiased
We can calculate the variance of β1hat
across samples.
But is β1hat the Best Linear Unbiased Estimator?
The variance of the
estimator and
hypothesis testing
The variance of the estimator
and hypothesis testing
We have derived an estimator for the slope of a line through data: β1hat
We have shown that β1hat is an unbiased
estimator of the “true” relationship β1
Must assume x is measured without error
Must assume the expected value of the error
term is zero
Variance of β0hat and β1hat
Even if β0hat and β1hat are right “on average” we
still want to know how far off they might be in a
given sample
Hypotheses are actually about β1, not β1hat
Thus we need to know the variance of β0hat and
β1hat
Use probability theory to draw conclusions about
β1, given our estimate of β1hat
Variances of β0hat and β1hat
Conceptually, the variances of β0hat and β1hat are the expected distances from their individual values to their mean values:

\mathrm{Var}(\hat{\beta}_0) = E[(\hat{\beta}_0 - E(\hat{\beta}_0))^2]
\mathrm{Var}(\hat{\beta}_1) = E[(\hat{\beta}_1 - E(\hat{\beta}_1))^2]

We can solve these based on our proof of unbiasedness. Recall from above:

\hat{\beta}_1 = \beta_1 + \frac{(x_1 - \bar{x}) u_1}{\sum (x_i - \bar{x})^2} + \frac{(x_2 - \bar{x}) u_2}{\sum (x_i - \bar{x})^2} + \ldots + \frac{(x_n - \bar{x}) u_n}{\sum (x_i - \bar{x})^2}
The Variance of β1hat
If a random variable (β1hat) is a linear combination of independently distributed random variables (the u's), then its variance is the sum of the variances of those terms (each u_i's variance weighted by the square of its coefficient).
Note the assumption of independent observations.
Applying this principle to the previous equation yields:
The Variance of β1hat
\mathrm{Var}(\hat{\beta}_1) = \frac{(x_1 - \bar{x})^2 \sigma^2_{u_1}}{[\sum (x_i - \bar{x})^2]^2} + \frac{(x_2 - \bar{x})^2 \sigma^2_{u_2}}{[\sum (x_i - \bar{x})^2]^2} + \ldots + \frac{(x_n - \bar{x})^2 \sigma^2_{u_n}}{[\sum (x_i - \bar{x})^2]^2}

With homoskedasticity (\sigma^2_{u_i} = \sigma^2_u for all i), this collapses to:

\mathrm{Var}(\hat{\beta}_1) = \sigma^2_{\hat{\beta}_1} = \sigma^2_u \frac{\sum (x_i - \bar{x})^2}{[\sum (x_i - \bar{x})^2]^2}
The Variance of β1hat
Or, simplifying:

\mathrm{Var}(\hat{\beta}_1) = \sigma^2_{\hat{\beta}_1} = \frac{\sigma^2_u}{\sum (x_i - \bar{x})^2}
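One way to see this formula at work is a small simulation: hold the x's fixed across repeated samples (as the Gauss-Markov setup does), redraw the errors, and compare the empirical variance of β1hat with σ²u / Σ(xi − x̄)². A sketch where all values are made up:

```python
# Simulation sketch of the variance formula above: with the x's held
# fixed across samples, the empirical variance of beta1_hat should be
# close to sigma_u^2 / sum((x_i - x_bar)^2). All values are made up.
import random

random.seed(7)
x = [float(i) for i in range(1, 21)]          # fixed x's, n = 20
x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)      # sum((x_i - x_bar)^2)
sigma_u = 2.0

draws = []
for _ in range(4000):
    y = [1.0 + 0.5 * xi + random.gauss(0, sigma_u) for xi in x]
    b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
    draws.append(b1)

mean_b1 = sum(draws) / len(draws)
empirical_var = sum((b - mean_b1) ** 2 for b in draws) / (len(draws) - 1)
theoretical_var = sigma_u ** 2 / sxx

print(empirical_var, theoretical_var)  # the two should be close
```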
The Estimated Variance of β1hat
We do not observe \sigma^2_u, because we do not observe the true errors (we don't know β0 and β1, only β0hat and β1hat).
Since β0hat and β1hat are unbiased, we use the variance of the observed residuals as an estimator of the variance of the "true" errors.
We lose 2 degrees of freedom by substituting in the estimators β0hat and β1hat.
The Estimated Variance of β1hat
Thus:

\hat{\sigma}^2_u = \frac{\sum \hat{u}_i^2}{n - 2}

This is an unbiased estimator of \sigma^2_u. Thus the final equation for the estimated variance of β1hat is:

\widehat{\mathrm{Var}}(\hat{\beta}_1) = \hat{\sigma}^2_{\hat{\beta}_1} = \frac{\hat{\sigma}^2_u}{\sum (x_i - \bar{x})^2}
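Putting the last two formulas together, a sketch of the estimated variance computed from residuals, with made-up data values:

```python
# Sketch of the estimated variance: sigma_u^2 is estimated from the
# observed residuals with n - 2 degrees of freedom. The data values
# here are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 3.1, 4.8, 5.2, 6.9]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Residuals from the fitted line; with an intercept they sum to zero
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)         # estimate of sigma_u^2
var_b1_hat = sigma2_hat / sum((xi - x_bar) ** 2 for xi in x)  # estimated Var(beta1_hat)
se_b1 = var_b1_hat ** 0.5                                     # standard error of beta1_hat

print(b1, se_b1)
```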
The estimated variance \hat{\sigma}^2_{\hat{\beta}_1} has nice intuitive qualities:
As the size of the errors decreases, \hat{\sigma}^2_{\hat{\beta}_1} decreases: the line fits tightly through the data, and few other lines could fit as well.
As the variation in x increases, \hat{\sigma}^2_{\hat{\beta}_1} decreases: few lines will fit without large errors for extreme values of x.
The Estimated Variance of β1hat

\hat{\sigma}^2_u = \frac{\sum \hat{u}_i^2}{n - 2} \qquad \widehat{\mathrm{Var}}(\hat{\beta}_1) = \hat{\sigma}^2_{\hat{\beta}_1} = \frac{\hat{\sigma}^2_u}{\sum (x_i - \bar{x})^2}
T-Tests
In general, our theories give us
hypotheses that β0 >0 or β1 <0, etc.
We can estimate β1hat , but we need a way
to assess the validity of statements that β1
is positive or negative, etc.
We can rely on our estimate of β1hat and its
variance to use probability theory to test
such statements.
Z – Scores & Hypothesis Tests
We know that \hat{\beta}_1 \sim N(\beta_1, \sigma^2_{\hat{\beta}_1}).
Subtracting \beta_1 from both sides, (\hat{\beta}_1 - \beta_1) \sim N(0, \sigma^2_{\hat{\beta}_1}).
Then, dividing by the standard deviation:

(\hat{\beta}_1 - \beta_1) / \sigma_{\hat{\beta}_1} \sim N(0, 1)

To test the null hypothesis that \beta_1 = 0, we can therefore use \hat{\beta}_1 / \sigma_{\hat{\beta}_1} \sim N(0, 1).
Z-Scores & Hypothesis Tests
This variable is a "z-score" based on the standard normal distribution.
95% of cases are within 1.96 standard deviations of the mean.
If \hat{\beta}_1 / \sigma_{\hat{\beta}_1} > 1.96, then across repeated random samples there is a 95% chance that \beta_1 > 0.
The problem is that we don't know \sigma_{\hat{\beta}_1}.
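The decision rule on this slide amounts to a one-line computation. A minimal sketch with made-up numbers:

```python
# The decision rule from the slide above: reject beta1 = 0 at the 95%
# level when |beta1_hat / se| exceeds 1.96. The numbers are made up.
beta1_hat = 0.9
se_beta1 = 0.4

z = beta1_hat / se_beta1
significant = abs(z) > 1.96

print(z, significant)  # z = 2.25, which exceeds 1.96
```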
Z-Scores and t-scores
The obvious solution is to substitute \hat{\sigma}_{\hat{\beta}_1} in place of \sigma_{\hat{\beta}_1}.
The t-statistic
The resulting statistic is called “Student’s t,” and the t-distribution looks similar to a normal distribution.
The t-statistic
Note the addition of a “degrees of
freedom” constraint
Thus the more data points we have
relative to the number of parameters we
are trying to estimate, the more the t
distribution looks like the z distribution.
When n>100 the difference is negligible
Limited Information in Statistical
Significance Tests
Results often illustrative rather than
precise
Only tests “not zero” hypothesis – does
not measure the importance of the
variable (look at confidence interval)
Generally reflects confidence that results
are robust across multiple samples
For Example… Presidential
Approval and the CPI
reg approval cpi
Source | SS df MS Number of obs = 148
---------+------------------------------ F( 1, 146) = 9.76
Model | 1719.69082 1 1719.69082 Prob > F = 0.0022
Residual | 25731.4061 146 176.242507 R-squared = 0.0626
---------+------------------------------ Adj R-squared = 0.0562
Total | 27451.0969 147 186.742156 Root MSE = 13.276
------------------------------------------------------------------------------
approval | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
cpi | -.1348399 .0431667 -3.124 0.002 -.2201522 -.0495277
_cons | 60.95396 2.283144 26.697 0.000 56.44168 65.46624
------------------------------------------------------------------------------
. sum cpi
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
cpi | 148 46.45878 25.36577 23.5 109
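The reported t statistic can be reproduced from the table above: it is just the coefficient divided by its standard error.

```python
# Recomputing the t statistic for cpi from the Stata table above.
coef = -0.1348399   # Coef. for cpi
se = 0.0431667      # Std. Err. for cpi
t = coef / se
print(round(t, 3))  # matches the reported t of -3.124
```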
So the distribution of β1hat is:
[Figure: histogram of simulated cpi parameter estimates (fraction on the vertical axis), centered near the point estimate of -.135]
Now Lets Look at Approval and
the Unemployment Rate
. reg approval unemrate
Source | SS df MS Number of obs = 148
---------+------------------------------ F( 1, 146) = 0.85
Model | 159.716707 1 159.716707 Prob > F = 0.3568
Residual | 27291.3802 146 186.927262 R-squared = 0.0058
---------+------------------------------ Adj R-squared = -0.0010
Total | 27451.0969 147 186.742156 Root MSE = 13.672
------------------------------------------------------------------------------
approval | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
unemrate | -.5973806 .6462674 -0.924 0.357 -1.874628 .6798672
_cons | 58.05901 3.814606 15.220 0.000 50.52003 65.59799
------------------------------------------------------------------------------
. sum unemrate
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
unemrate | 148 5.640541 1.744879 2.6 10.7
Now the Distribution of β1hat is:
[Figure: histogram of simulated unemrate parameter estimates (fraction on the vertical axis), centered near the point estimate of -.597, with substantial mass on both sides of zero]