
Properties of the OLS Estimator
Quantitative Methods 2
Lecture 5
Solutions for β0 and β1
• OLS chooses the values of β0 and β1 that minimize
  the sum of squared residuals (SSR).
• To find the minimum, take partial derivatives with
  respect to β0 and β1.

$$SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$
Solutions for β0 and β1
• Setting the derivatives to zero and rearranging
  gives the normal equations.
• Solving the normal equations for β0 and β1 gives
  us our OLS estimators.

$$\sum y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum x_i$$

$$\sum x_i y_i = \hat{\beta}_0 \sum x_i + \hat{\beta}_1 \sum x_i^2$$
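As a bridging step (not spelled out on the slide): differentiate SSR with respect to each coefficient, set both derivatives to zero, and divide through by -2 to obtain the two normal equations:

$$\frac{\partial SSR}{\partial \hat{\beta}_0} = -2\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0, \qquad \frac{\partial SSR}{\partial \hat{\beta}_1} = -2\sum_{i=1}^{n} x_i\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0$$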
Solutions for β0 and β1
• Our estimate of the slope of the line is:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

• And our estimate of the intercept is:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
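A minimal NumPy sketch of these two formulas, on made-up data (the numbers are illustrative, not from the lecture):

import numpy as np

# Hypothetical paired observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope: sum of cross-deviations over sum of squared x-deviations
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Intercept: mean of y minus slope times mean of x
beta0_hat = y.mean() - beta1_hat * x.mean()

print(beta0_hat, beta1_hat)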
Estimators and the “True”
Coefficients
 ˆ0 & ˆ1 are the “true”coefficients if we
only wanted to describe the data we
have observed
 We are almost ALWAYS using data to
draw conclusions about cases outside
our data
 Thus ˆ0 & ˆ1 are estimates of some
“true” set of coefficients (β0 and β1) that
exist beyond our observed data

5
Some Terminology for Labeling
Estimators
• Various conventions are used to distinguish the
  "true" coefficients from the estimates that we
  observe.
• We will use the beta versus beta-hat distinction
  from Wooldridge: β is the population value, β̂ the
  sample-based estimate.
• Think of this as the distinction between population
  values and sample-based estimates.
• But other authors, textbooks, or websites may use
  different notation, e.g., b versus b̂, β versus b, or
  B versus b and A versus a.
Gauss-Markov Theorem:
Under the 5 Gauss-Markov assumptions, the OLS
estimator is the best linear unbiased estimator of the
true parameters (β's), conditional on the sample
values of the explanatory variables. In other words,
the OLS estimator is BLUE.
5 Gauss-Markov Assumptions for
Simple Linear Model (Wooldridge, p.65)
• Linear in parameters:
  $$y = \beta_0 + \beta_1 x_1 + u$$
• Random sampling of n observations:
  $$\{(x_i, y_i) : i = 1, 2, \ldots, n\}$$
• Sample variation in the explanatory variable: the
  xi's are not all the same value.
• Zero conditional mean: the error u has an expected
  value of 0, given any value of the explanatory
  variable:
  $$E(u \mid x) = 0$$
• Homoskedasticity: the error has the same variance
  given any value of the explanatory variable:
  $$\operatorname{Var}(u \mid x) = \sigma^2$$
The Linearity Assumption
• Key to understanding OLS models.
• The restriction is that our model of the population
  must be linear in the parameters:
  $$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u$$
• A model cannot be non-linear in the parameters:
  $$\text{not OLS: } y = \beta_0 + \beta_1^2 x_1$$
  $$\text{not OLS: } y = \beta_0 + \ln(\beta_1) x_1$$
• Non-linear in the variables (x's), however, is fine
  and quite useful (see the sketch below):
  $$\text{OLS: } y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$$
  $$\text{OLS: } y = \beta_0 + \beta_1 \ln(x_1)$$
[Figure: densities f(y|x) at x1, x2, x3, x4 around the
predicted line ŷ = β̂0 + β̂1x, drawn under
homoskedasticity: the variance across values of x is
constant.]
[Figure: densities f(y|x) at x1, x2, x3, x4 around the
predicted line ŷ = β̂0 + β̂1x, drawn under
heteroskedasticity: the variance differs across values
of x.]
How Good are the Estimates?
Properties of Estimators
• Small-sample properties
  • True regardless of how much data we have
  • The most desirable characteristics:
    • Unbiasedness
    • Efficiency
    • BLUE (Best Linear Unbiased Estimator)
“Second Best” Properties of
Estimators
• Asymptotic (or large-sample) properties
  • True in the hypothetical instance of infinite data
  • In practice, applicable if N > 50 or so
  • Asymptotic unbiasedness
  • Consistency
  • Asymptotic efficiency
Bias
• An estimator is unbiased if:

$$E(\hat{\beta}_j) = \beta_j, \quad j = 0, 1, \ldots, k$$

• In other words, the average value of the estimator
  in repeated sampling equals the true parameter.
• Note that whether an estimator is biased or not
  implies nothing about its dispersion.
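A repeated-sampling sketch of this definition (the true coefficients, sample size, and seed are assumptions of the example): averaging the OLS slope over many simulated samples should recover the true β1.

import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, n = 1.0, 2.0, 30

slopes = []
for _ in range(10_000):
    x = rng.uniform(0.0, 5.0, size=n)
    u = rng.normal(0.0, 1.0, size=n)   # E(u|x) = 0, homoskedastic
    y = beta0 + beta1 * x + u
    slopes.append(np.sum((x - x.mean()) * (y - y.mean()))
                  / np.sum((x - x.mean()) ** 2))

print(np.mean(slopes))  # ≈ 2.0: the estimator is centered on the truth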
Efficiency
• An estimator is efficient if its variance is less than
  that of any other estimator of the parameter.
• This criterion is only useful in combination with
  others: we might otherwise choose a biased
  estimator because it has a smaller variance (e.g.,
  the constant estimator β̂j = 2 has low variance, but
  is biased).
• β̂j is the "best" unbiased estimator if

$$\operatorname{Var}(\hat{\beta}_j) \le \operatorname{Var}(\tilde{\beta}_j)$$

  where β̃j is any other unbiased estimator of βj.
[Figure: sampling distributions f(β̂). An unbiased and
efficient estimator of β is centered on the true β with
a tight spread; a biased estimator is centered at
β + bias; a high sampling variance means an
inefficient estimator of β.]
BLUE
(Best Linear Unbiased Estimator)
• An estimator β̂j is BLUE if:
  • β̂j is a linear function of the data
  • β̂j is unbiased: E(β̂j) = βj, j = 0, 1, ..., k
  • β̂j is the most efficient: Var(β̂j) ≤ Var(β̃j), for β̃j
    any other linear unbiased estimator
Large Sample Properties
• Asymptotically unbiased
  • As n becomes larger, E(β̂j) trends toward βj.
• Consistency
  • If the bias and variance both decrease as n gets
    larger, the estimator is consistent.
• Asymptotic efficiency
  • The asymptotic distribution has finite mean and
    variance,
  • the estimator is consistent,
  • and no other estimator has smaller asymptotic
    variance.
[Figure: demonstration of consistency. Sampling
distributions f(β̂) for n = 4, 16, and 50 tighten around
the true β as n grows.]
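A simulation sketch of the figure's point (true values and seed are assumptions of the example): the spread of β̂1 across repeated samples shrinks as n grows.

import numpy as np

rng = np.random.default_rng(7)
beta0, beta1 = 1.0, 2.0

for n in (4, 16, 50):
    slopes = []
    for _ in range(5_000):
        x = rng.uniform(0.0, 5.0, size=n)
        y = beta0 + beta1 * x + rng.normal(0.0, 1.0, size=n)
        slopes.append(np.sum((x - x.mean()) * (y - y.mean()))
                      / np.sum((x - x.mean()) ** 2))
    print(n, np.std(slopes))  # the spread falls as n rises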
Let’s Show that OLS is
Unbiased
• Begin with our equation: yi = β0 + β1xi + ui
• u ~ N(0, σ²), so yi ~ N(β0 + β1xi, σ²)
• A linear function of a normal random variable is
  also a normal random variable.
• Thus β̂0 and β̂1 are normal random variables.
The Robust Assumption of
“Normality”
• Even if we do not know the distribution of y, β̂0 and
  β̂1 will behave like normal random variables.
• The Central Limit Theorem says estimates of the
  mean of any random variable will approach
  normality as n increases.
  • This assumes cases are independent (errors not
    correlated) and identically distributed (i.i.d.).
• This is critical for hypothesis testing: the β̂'s are
  approximately normal regardless of y.
Showing β1hat is Unbiased
• Recall the formula for β̂1:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

• From the rules of summation, this reduces to:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2}$$
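The reduction works because the deviations of x from their mean sum to zero, so the ȳ term drops out:

$$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - \bar{x}) y_i - \bar{y} \sum (x_i - \bar{x}) = \sum (x_i - \bar{x}) y_i$$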
Showing β1hat is Unbiased
• Now we substitute for yi to yield:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum (x_i - \bar{x})^2}$$

• This expands to:

$$\hat{\beta}_1 = \frac{\beta_0 \sum (x_i - \bar{x}) + \beta_1 \sum (x_i - \bar{x}) x_i + \sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}$$
Showing β1hat is Unbiased
• Now we can separate terms to yield:

$$\hat{\beta}_1 = \frac{\beta_0 \sum (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} + \frac{\beta_1 \sum (x_i - \bar{x}) x_i}{\sum (x_i - \bar{x})^2} + \frac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}$$

• Now we rely on two more rules of summation:

$$\text{6. } \sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

$$\text{7. } \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n(\bar{x})^2 = \sum_{i=1}^{n} (x_i - \bar{x}) x_i$$
Showing β1hat is Unbiased
• By the first summation rule (6), the first term = 0.
• By the second summation rule (7), the second
  term = β1.
• This leaves:

$$\hat{\beta}_1 = \beta_1 + \frac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}$$
Showing β1hat is Unbiased
• Expanding the summation yields:

$$\hat{\beta}_1 = \beta_1 + \frac{(x_1 - \bar{x}) u_1}{\sum (x_i - \bar{x})^2} + \frac{(x_2 - \bar{x}) u_2}{\sum (x_i - \bar{x})^2} + \ldots + \frac{(x_n - \bar{x}) u_n}{\sum (x_i - \bar{x})^2}$$

• To show that β̂1 is unbiased, we must show that
  the expectation of β̂1 equals β1:

$$E(\hat{\beta}_1) = \beta_1 + E\left[\frac{(x_1 - \bar{x}) u_1}{\sum (x_i - \bar{x})^2}\right] + E\left[\frac{(x_2 - \bar{x}) u_2}{\sum (x_i - \bar{x})^2}\right] + \ldots + E\left[\frac{(x_n - \bar{x}) u_n}{\sum (x_i - \bar{x})^2}\right]$$
Showing β1hat is Unbiased
Two assumptions are needed to get this result:
• 1. The x's are fixed (measured without error).
• 2. The expected value of the error is zero (Gauss-
  Markov assumption 4):

$$E(u \mid x) = 0$$

• Then all terms after β1 are equal to 0:

$$E(\hat{\beta}_1) = \beta_1 + 0 + 0 + \ldots + 0$$

• This reduces to:

$$E(\hat{\beta}_1) = \beta_1$$
Showing β0hat Is Unbiased
• Begin with the equation for β̂0:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

• Since

$$y_i = \beta_0 + \beta_1 x_i + u_i, \quad \text{then} \quad \bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}$$

• Substituting for the mean of y (ȳ):

$$\hat{\beta}_0 = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat{\beta}_1 \bar{x}$$
Showing β0hat Is Unbiased
• Take the expected value of both sides:

$$E[\hat{\beta}_0] = \beta_0 + \beta_1 \bar{x} + E[\bar{u}] - E[\hat{\beta}_1] \bar{x}$$

• We just showed that E(β̂1) = β1, so the β1x̄ terms
  cancel each other out. This leaves:

$$E[\hat{\beta}_0] = \beta_0 + E[\bar{u}]$$

• Again, since E[ū] = 0,

$$E[\hat{\beta}_0] = \beta_0$$
Notice Assumptions
• Two key assumptions were needed to show that
  β̂0 and β̂1 are unbiased:
  • x is fixed (meaning it is measured without error)
  • E(u) = 0
• Unbiasedness tells us that OLS will give us a best
  guess at the slope and intercept that is correct on
  average.
OK, but is it BLUE?
• Now we have an estimator (β̂1).
• We know that β̂1 is unbiased.
• We can calculate the variance of β̂1 across
  samples.
• But is β̂1 the Best Linear Unbiased Estimator?
The Variance of the Estimator and Hypothesis Testing
The variance of the estimator
and hypothesis testing
• We have derived an estimator for the slope of a
  line through data: β̂1.
• We have shown that β̂1 is an unbiased estimator
  of the "true" relationship β1.
  • We must assume x is measured without error.
  • We must assume the expected value of the error
    term is zero.
Variance of β0hat and β1hat
• Even if β̂0 and β̂1 are right "on average," we still
  want to know how far off they might be in a given
  sample.
• Hypotheses are actually about β1, not β̂1.
• Thus we need to know the variances of β̂0 and β̂1.
• We use probability theory to draw conclusions
  about β1, given our estimate β̂1.
Variances of β0hat and β1hat
• Conceptually, the variances of β̂0 and β̂1 are the
  expected squared distances from their individual
  values to their mean values:

$$\operatorname{Var}(\hat{\beta}_0) = E[(\hat{\beta}_0 - E(\hat{\beta}_0))^2]$$

$$\operatorname{Var}(\hat{\beta}_1) = E[(\hat{\beta}_1 - E(\hat{\beta}_1))^2]$$

• We can solve these based on our proof of
  unbiasedness. Recall from above:

$$\hat{\beta}_1 = \beta_1 + \frac{(x_1 - \bar{x}) u_1}{\sum (x_i - \bar{x})^2} + \frac{(x_2 - \bar{x}) u_2}{\sum (x_i - \bar{x})^2} + \ldots + \frac{(x_n - \bar{x}) u_n}{\sum (x_i - \bar{x})^2}$$
The Variance of β1hat
• If a random variable (β̂1) is a linear combination of
  other, independently distributed random variables
  (the u's),
• then the variance of β̂1 is the sum of the variances
  of the individual terms.
  • Note the assumption of independent observations.
• Applying this principle to the previous equation
  yields:
The Variance of β1hat
$$\operatorname{Var}(\hat{\beta}_1) = \frac{(x_1 - \bar{x})^2 \sigma_{u_1}^2}{\left[\sum (x_i - \bar{x})^2\right]^2} + \frac{(x_2 - \bar{x})^2 \sigma_{u_2}^2}{\left[\sum (x_i - \bar{x})^2\right]^2} + \ldots + \frac{(x_n - \bar{x})^2 \sigma_{u_n}^2}{\left[\sum (x_i - \bar{x})^2\right]^2}$$

• Now we need another Gauss-Markov assumption,
  homoskedasticity (assumption 5):

$$\operatorname{Var}(u \mid x) = \sigma^2, \quad \text{so} \quad \sigma_u^2 = \sigma_{u_1}^2 = \sigma_{u_2}^2 = \ldots = \sigma_{u_n}^2$$

• That is, we must assume that the variance of the
  errors is constant. This yields:

$$\operatorname{Var}(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \sigma_u^2 \, \frac{\sum (x_i - \bar{x})^2}{\left[\sum (x_i - \bar{x})^2\right]^2}$$
The Variance of β1hat !
• Or:

$$\operatorname{Var}(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \frac{\sigma_u^2}{\sum (x_i - \bar{x})^2}$$

• That is, the variance of β̂1 is a function of the
  variance of the errors (σu²) and the variation in x.
• But what is the true variance of the errors?
The Estimated Variance of β1hat
• We do not observe σu², because we do not
  observe the true errors (they depend on the
  unknown β0 and β1).
• Since β̂0 and β̂1 are unbiased, we use the variance
  of the observed residuals as an estimator of the
  variance of the "true" errors.
• We lose 2 degrees of freedom by substituting in
  the estimators β̂0 and β̂1.
The Estimated Variance of β1hat
• Thus:

$$\hat{\sigma}_{\hat{u}}^2 = \frac{\sum \hat{u}_i^2}{n - 2}$$

• This is an unbiased estimator of σu².
• Thus the final equation for the estimated variance
  of β̂1 is:

$$\widehat{\operatorname{Var}}(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1}^2 = \frac{\hat{\sigma}_{\hat{u}}^2}{\sum (x_i - \bar{x})^2}$$

• New assumptions used here: independent
  observations and constant error variance.
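A NumPy sketch of these formulas on simulated data (the seed, sample size, and true coefficients are assumptions of the example):

import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.5, size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)        # unbiased error variance
var_b1 = sigma2_hat / np.sum((x - x.mean()) ** 2)
print(b1, np.sqrt(var_b1))                       # slope and its standard error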
The Estimated Variance of β1hat
$$\widehat{\operatorname{Var}}(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1}^2 = \frac{\hat{\sigma}_{\hat{u}}^2}{\sum (x_i - \bar{x})^2}$$

• This estimated variance has nice intuitive qualities:
• As the size of the errors decreases, the variance of
  β̂1 decreases.
  • The line fits tightly through the data; few other
    lines could fit as well.
• As the variation in x increases, the variance of β̂1
  decreases.
  • Few lines will fit without large errors for extreme
    values of x.
The Estimated Variance of β1hat
$$\hat{\sigma}_{\hat{u}}^2 = \frac{\sum \hat{u}_i^2}{n - 2} \qquad \widehat{\operatorname{Var}}(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1}^2 = \frac{\hat{\sigma}_{\hat{u}}^2}{\sum (x_i - \bar{x})^2}$$

• Because the variance of the estimated errors has n
  in the denominator, the variance of β̂1 decreases
  as n increases.
  • The more data points we must fit to the line, the
    smaller the number of lines that fit with few errors.
  • We have more information about where the line
    must go.
Variance of β1hat is Important for
Hypothesis Testing
• F-test: tests the hypothesis that the null model
  does better.
• Log-likelihood ratio test: tests the joint significance
  of variables in an MLE model.
• t-test: tests whether individual coefficients are
  different from zero.
  • This is the central task for testing most policy
    theories.
T-Tests
• In general, our theories give us hypotheses such
  as β0 > 0 or β1 < 0, etc.
• We can estimate β̂1, but we need a way to assess
  the validity of statements that β1 is positive or
  negative, etc.
• We can rely on our estimate β̂1 and its variance,
  using probability theory to test such statements.
Z – Scores & Hypothesis Tests
• We know that β̂1 ~ N(β1, σ²β̂1).
• Subtracting β1 from both sides, we can see that
  (β̂1 - β1) ~ N(0, σ²β̂1).
• Then, if we divide by the standard deviation, we
  can see that:

$$\frac{\hat{\beta}_1 - \beta_1}{\sigma_{\hat{\beta}_1}} \sim N(0, 1)$$

• To test the null hypothesis that β1 = 0, we can use:

$$\frac{\hat{\beta}_1}{\sigma_{\hat{\beta}_1}} \sim N(0, 1)$$
Z-Scores & Hypothesis Tests
• This variable is a "z-score" based on the standard
  normal distribution.
• 95% of cases are within 1.96 standard deviations
  of the mean.
• If β̂1/σβ̂1 > 1.96, then in a series of random draws
  there is a 95% chance that β1 > 0.
• The problem is that we don't know σβ̂1.
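As a quick check of the 1.96 figure quoted above, it can be read off the standard normal quantile function (SciPy here is an assumption of the example, not a tool used in the lecture):

from scipy import stats
print(stats.norm.ppf(0.975))  # ≈ 1.96: the two-tailed 95% cutoff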
Z-Scores and t-scores
• The obvious solution is to substitute the estimate
  σ̂β̂1 in place of σβ̂1.
• Problem: β̂1/σ̂β̂1 is the ratio of two random
  variables, and this will not be normally distributed.
• Fortunately, an employee of the Guinness Brewery
  figured out this distribution in 1908.
The t-statistic
• The statistic is called "Student's t," and the t-
  distribution looks similar to a normal distribution.
• Thus β̂1/σ̂β̂1 ~ t(n-2) for bivariate regression.
• More generally, β̂1/σ̂β̂1 ~ t(n-k),
  • where k is the number of parameters estimated.
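A sketch of the test in practice on simulated data (the seed, true values, and use of SciPy are assumptions of the example): compute the slope and its standard error as above, then compare the ratio to t(n-2).

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 0.8 * x + rng.normal(0.0, 2.0, size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
se_b1 = np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum((x - x.mean()) ** 2))

t_stat = b1 / se_b1                              # beta1_hat over its std. err.
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-tailed test on t(n-2)
print(t_stat, p_value)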
The t-statistic
• Note the addition of a "degrees of freedom"
  constraint.
• Thus the more data points we have relative to the
  number of parameters we are trying to estimate,
  the more the t-distribution looks like the z-
  distribution.
• When n > 100, the difference is negligible.
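A one-line illustration of that convergence (SciPy again assumed): the 97.5th percentile of the t-distribution approaches the normal 1.96 as the degrees of freedom grow.

from scipy import stats
for df in (5, 30, 100, 1000):
    print(df, stats.t.ppf(0.975, df))  # approaches 1.96 as df rises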
Limited Information in Statistical
Significance Tests
• Results are often illustrative rather than precise.
• A significance test only tests the "not zero"
  hypothesis; it does not measure the importance of
  the variable (look at the confidence interval).
• It generally reflects confidence that results are
  robust across multiple samples.
For Example… Presidential
Approval and the CPI
. reg approval cpi

  Source |       SS       df       MS              Number of obs =     148
---------+------------------------------           F(  1,   146) =    9.76
   Model | 1719.69082     1 1719.69082             Prob > F      =  0.0022
Residual | 25731.4061   146 176.242507             R-squared     =  0.0626
---------+------------------------------           Adj R-squared =  0.0562
   Total | 27451.0969   147 186.742156             Root MSE      =  13.276

------------------------------------------------------------------------------
approval |      Coef.   Std. Err.      t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     cpi |  -.1348399   .0431667   -3.124   0.002     -.2201522   -.0495277
   _cons |   60.95396   2.283144   26.697   0.000      56.44168    65.46624
------------------------------------------------------------------------------

. sum cpi

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     cpi |     148    46.45878   25.36577       23.5        109
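As a quick check of the output above, the reported t-statistic is just the coefficient divided by its standard error:

# Coefficient and standard error taken from the Stata output above
print(-0.1348399 / 0.0431667)  # ≈ -3.124, matching the reported t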
So the distribution of β̂1 is:

[Figure: histogram (fraction) of the simulated cpi
parameter, centered on the estimate -.135, with the
axis running from -.3 to .1.]
Now Let's Look at Approval and
the Unemployment Rate
. reg approval unemrate

  Source |       SS       df       MS              Number of obs =     148
---------+------------------------------           F(  1,   146) =    0.85
   Model | 159.716707     1 159.716707             Prob > F      =  0.3568
Residual | 27291.3802   146 186.927262             R-squared     =  0.0058
---------+------------------------------           Adj R-squared = -0.0010
   Total | 27451.0969   147 186.742156             Root MSE      =  13.672

------------------------------------------------------------------------------
approval |      Coef.   Std. Err.      t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
unemrate |  -.5973806   .6462674   -0.924   0.357     -1.874628    .6798672
   _cons |   58.05901   3.814606   15.220   0.000      50.52003    65.59799
------------------------------------------------------------------------------

. sum unemrate

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
unemrate |     148    5.640541   1.744879        2.6       10.7
Now the distribution of β̂1 is:

[Figure: histogram (fraction) of the simulated
unemrate parameter, centered on the estimate -.597,
with the axis running from -3 to 3; mass falls on both
sides of zero.]