
MLR.5 Variance LM Statistic Testing Heteroskedasticity WLS and GLS

Econometrics
Multiple Regression Analysis: Heteroskedasticity

João Valle e Azevedo

NOVA School of Business and Economics

Spring Semester

João Valle e Azevedo (NOVA SBE) Econometrics Lisbon 1 / 19



Heteroskedasticity

Properties of OLS: Variance


Assumption MLR.5 (Homoskedasticity): The error u has the same
variance given any value of the explanatory variables:

Var(u|x1, ..., xk) = σ², leading to Var(β̂|X) = σ²(X′X)⁻¹

With MLR.1 through MLR.5 we have derived the variance of the OLS
estimators and further concluded that OLS was asymptotically
Normal: enough to conduct inference "as usual"

If MLR.5 does not hold, that is, if the conditional variance of u is
allowed to vary given the x's, then the errors are heteroskedastic
and the results above are NOT valid. We cannot make inference "as
usual" (t tests, F tests, LM tests)


Heteroskedastic Case
Suppose y is wage and x is education

[Figure: conditional densities f(y|x) around the line E(y|x) = β0 + β1x at x1 < x2 < x3; the spread of the distribution varies with x, showing how spread out the distribution of y is at each x]



Properties of OLS: Variance (Cont.)

Theorem
Under assumptions MLR.1 through MLR.5

Var(β̂j) = σ² / [SSTj(1 − Rj²)],  j = 0, 1, ..., k

where SSTj = Σⁿᵢ₌₁ (xij − x̄j)²

Rj² is the coefficient of determination from regressing xj on all the other regressors.
It tells us how much the other regressors "explain" xj

Variance with Heteroskedasticity


Now assume Var(ui|xi1, ..., xik) = σi²
For the simple regression case:

β̂1 = β1 + [Σ(xi − x̄)ui] / [Σ(xi − x̄)²]

So, conditional on the x's:

Var(β̂1) = [Σ(xi − x̄)²σi²] / [Σ(xi − x̄)²]²

A valid estimator when σi² ≠ σ² is:

Var̂(β̂1) = [Σ(xi − x̄)²ûi²] / [Σ(xi − x̄)²]², where ûi are the OLS residuals
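This estimator is easy to compute directly. A minimal NumPy sketch for the simple regression case; the data-generating process and all variable names here are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 10, n)
u = rng.normal(0, 1 + 0.3 * x)               # heteroskedastic: sd of u grows with x
y = 2.0 + 0.5 * x + u

xc = x - x.mean()
beta1_hat = (xc * y).sum() / (xc ** 2).sum()  # OLS slope
beta0_hat = y.mean() - beta1_hat * x.mean()
resid = y - beta0_hat - beta1_hat * x

# Heteroskedasticity-robust variance of beta1_hat:
# sum (xi - xbar)^2 * uhat_i^2  /  [ sum (xi - xbar)^2 ]^2
var_robust = (xc ** 2 * resid ** 2).sum() / ((xc ** 2).sum()) ** 2
se_robust = np.sqrt(var_robust)
```

The square root of `var_robust` is the robust standard error used for inference on the next slides.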


Variance with Heteroskedasticity


For the multiple regression model, a valid (consistent) estimator of
Var (β̂j ) with heteroskedasticity is:
Var̂(β̂j) = [Σ r̂ij² ûi²] / SSRj²

r̂ij is the i-th residual from regressing xj on all other independent
variables

SSRj is the sum of squared residuals from this regression

ûi are the OLS residuals
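As a numerical check, this partialling-out formula agrees exactly with the (j, j) element of the "sandwich" form (X′X)⁻¹X′diag(ûi²)X(X′X)⁻¹ of the White covariance matrix. A sketch with simulated data (names and setup are mine, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 400, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
u = rng.normal(size=n) * (1 + X[:, 1] ** 2)   # heteroskedastic errors
y = X @ np.array([1.0, 0.5, -0.2, 0.3]) + u

beta = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ beta

# Sandwich (White) covariance: (X'X)^{-1} X' diag(uhat^2) X (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
V = XtX_inv @ (X.T * uhat ** 2) @ X @ XtX_inv

# Partialling-out formula for the coefficient on column 1:
others = np.delete(X, 1, axis=1)
gamma = np.linalg.lstsq(others, X[:, 1], rcond=None)[0]
r1 = X[:, 1] - others @ gamma                 # residuals r̂_i1
SSR1 = (r1 ** 2).sum()
var_j = (r1 ** 2 * uhat ** 2).sum() / SSR1 ** 2
```

By the Frisch-Waugh-Lovell logic, `var_j` and `V[1, 1]` are the same number.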


Robust Standard Errors


The square root of this variance can be used as a standard error for
inference (Robust Standard error). With these standard errors it
turns out that:

(β̂j − βj ) a
t= ∼ Normal(0, 1)
se(β̂j )
- This is a heteroskedasticity-robust t statistic

Often, the estimated variance is corrected for degrees of freedom by
multiplying by n/(n − k − 1) (irrelevant for large n)
Why not always use robust standard errors?
- In small samples, t statistics using robust standard errors will not have a
distribution close to the Normal (or t) and inferences will not be correct

We will not deal with heteroskedasticity-robust F statistics;
instead, use heteroskedasticity-robust LM tests

A Robust LM Statistic
Suppose we have a standard model

y = β0 + β1 x1 + β2 x2 + ... + βk xk + u
and our null hypothesis is H0 : βk−q+1 = βk−q+2 = ... = βk = 0 (the
number of restrictions is q)
First, we just run OLS on the restricted model and save the residuals, ŭ

Regress each of the excluded variables on all of the included variables
(q different regressions) and save each set of residuals r̆1 , r̆2 , ..., r̆q
Regress a variable defined to be = 1 on r̆1 ŭ, r̆2 ŭ, ..., r̆q ŭ, with no
intercept
The LM statistic is n − SSR1, where SSR1 is the sum of squared
residuals from this final regression; under the null it has a
chi-square distribution with q degrees of freedom
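The four steps above can be sketched in NumPy; the data-generating process (three regressors, two truly irrelevant) and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
u = rng.normal(size=n) * (1 + np.abs(x1))     # heteroskedastic errors
y = 1.0 + 0.8 * x1 + u                        # x2, x3 truly irrelevant

def resid(A, b):
    """Residuals from an OLS regression of b on the columns of A."""
    coef = np.linalg.lstsq(A, b, rcond=None)[0]
    return b - A @ coef

# H0: coefficients on x2 and x3 are zero (q = 2 restrictions)
included = np.column_stack([np.ones(n), x1])
u_breve = resid(included, y)                  # restricted-model residuals
r2 = resid(included, x2)                      # residuals of excluded vars
r3 = resid(included, x3)                      # on the included vars

# Regress a constant 1 on r̆_j * ŭ with no intercept; LM = n - SSR1
Z = np.column_stack([r2 * u_breve, r3 * u_breve])
ssr1 = (resid(Z, np.ones(n)) ** 2).sum()
LM = n - ssr1                                 # ~ chi2(q) under H0
```

Since H0 is true in this simulation, LM should be an unremarkable draw from a χ² with 2 degrees of freedom.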

Testing for Heteroskedasticity


Want to test H0: Var(u|x1, ..., xk) = σ², which is equivalent to
H0: E(u²|x1, ..., xk) = E(u²) = σ²

If we assume the relationship between u² and the xj is linear, we can test
this as a set of linear restrictions
- Thus, for u² = δ0 + δ1x1 + ... + δkxk + ν this means testing
H0: δ1 = δ2 = ... = δk = 0
- We don't observe the error, but we can use the residuals from the OLS regression


The Breusch-Pagan Test


Estimate û² = δ0 + δ1x1 + ... + δkxk + ν by OLS (using the squared
OLS residuals in place of u²)
Want to test H0: δ1 = δ2 = ... = δk = 0
- Take the R² of this regression. With assumptions MLR.1 through
MLR.4 still in place we can use an F test or an LM-type test
- The F statistic is just the reported F statistic for overall significance of
this regression:

F = (R²/k) / [(1 − R²)/(n − k − 1)] ∼ F(k, n−k−1)

Alternatively, we can form the LM statistic LM = nR², which is
approximately distributed as χ²k under the null (R² of the regression
above, not the typical LM test!)
These tests are usually called the Breusch-Pagan tests for
heteroskedasticity
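The Breusch-Pagan recipe (OLS, square the residuals, auxiliary regression, LM = nR²) is short in NumPy; the simulated design, with variance linear in x1, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 800
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
u = rng.normal(size=n) * np.sqrt(1 + 2 * x1)  # Var(u|x) linear in x1
y = 1.0 + x1 + x2 + u

# Step 1: OLS on the original model, save the squared residuals
X = np.column_stack([np.ones(n), x1, x2])
uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
u2 = uhat ** 2

# Step 2: auxiliary regression of û² on the regressors; take its R²
fit = X @ np.linalg.lstsq(X, u2, rcond=None)[0]
R2 = 1 - ((u2 - fit) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

LM = n * R2                                   # ~ chi2(k) under homoskedasticity
```

A large LM relative to the χ²k critical value leads to rejecting homoskedasticity.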


The White Test


The Breusch-Pagan tests will detect linear forms of heteroskedasticity
The White test allows for nonlinearities by using squares and
cross-products of all the x’s
Estimate

u² = δ0 + δ1x1 + ... + δkxk + δk+1x1² + ... + δ2k xk² + δ2k+1 x1x2 + ... + δk+k(k+1)/2 xk−1xk + error

by OLS
Want to test H0: δ1 = δ2 = ... = δk+k(k+1)/2 = 0
- Take the R² of this regression and still use the F or LM statistics to
test whether all the xj, xj², and xjxh are jointly significant:

F = (R²/q) / [(1 − R²)/(n − q − 1)] ∼ F(q, n−q−1) (approx.) under the null

- and LM = nR² ∼ χ²q (approx.) under the null, where q = k + k(k + 1)/2
- If k is large and n is small these approximations are poor

Alternate form of the White Test


Now, the fitted values from OLS, ŷ, are a function of all the x's
Thus, ŷ² will be a function of the squares and cross-products, and ŷ
and ŷ² can "substitute" for all of the xj, xj², and xjxh, so:
- Regress the squared residuals on ŷ and ŷ² (as well as a constant) and
use the R² to form an F or LM statistic (as for the BP or White tests)
- Only testing 2 restrictions now
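The special form differs from the Breusch-Pagan sketch only in the auxiliary regressors: ŷ and ŷ² instead of all the x's. A sketch under an illustrative design (variance depending on x1²):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 800
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n) * (1 + 0.5 * x1 ** 2)  # nonlinear heteroskedasticity
y = 1 + x1 + x2 + u

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
u2 = (y - X @ beta) ** 2

# Special-form White test: regress û² on a constant, ŷ, and ŷ²
Z = np.column_stack([np.ones(n), yhat, yhat ** 2])
fit = Z @ np.linalg.lstsq(Z, u2, rcond=None)[0]
R2 = 1 - ((u2 - fit) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()
LM = n * R2                                   # ~ chi2(2) under the null
```

Whatever the number of original regressors, the χ² reference distribution here always has 2 degrees of freedom.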


WLS

Weighted Least Squares


We can always estimate robust standard errors for OLS
However, if we know something about the specific form of the
heteroskedasticity, we can obtain estimators that have a smaller
variance than OLS
If we in fact know that form, we can transform the model into
one that has homoskedastic errors


Case of known form up to a multiplicative constant

y = β0 + β1 x1 + β2 x2 + β3 x3 + ... + βk xk + u

Suppose we know that Var(u|x) = σ²h(x), or

Var(ui|x) = σ²h(xi) = σ²hi

Example:

wage = β0 + β1 Education + β2 Experience + β3 Tenure + u



We know that E(ui/√hi | x) = 0, because hi depends only on x, and
Var(ui/√hi | x) = σ², because Var(ui|x) = σ²hi
So, if we divide the regression equation by √hi we will get a model
where the error is homoskedastic (MLR.1 to MLR.5 verified again)

Generalized Least Squares


Estimating the transformed equation by OLS is an example of
generalized least squares (GLS)
GLS will be BLUE (Best Linear Unbiased Estimator) in this case
The GLS estimator for the particular case where we divide the
regression equation by √hi is called a weighted least squares (WLS)
estimator. Why? It minimizes

Σⁿᵢ₌₁ (yi* − β̂0/√hi − β̂1 xi1* − ... − β̂k xik*)²

where yi* = yi/√hi, xij* = xij/√hi, which is the same as minimizing the
weighted sum of squared residuals

Σⁿᵢ₌₁ (yi − β̂0 − β̂1xi1 − ... − β̂kxik)²/hi
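The two minimization problems give the same estimates, which can be verified numerically. A sketch assuming, for illustration, h(x) = x (so Var(u|x) = σ²x); the names are mine:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 600
x = rng.uniform(1, 5, n)
h = x                                          # assumed: Var(u|x) = sigma^2 * x
u = rng.normal(size=n) * np.sqrt(h)
y = 2 + 3 * x + u

# WLS as OLS on the transformed model: divide everything by sqrt(h_i)
w = 1 / np.sqrt(h)
Xs = np.column_stack([w, x * w])               # transformed constant and x
beta_wls = np.linalg.lstsq(Xs, y * w, rcond=None)[0]

# Equivalent: solve the weighted normal equations
# (X' diag(1/h) X) beta = X' diag(1/h) y directly
X = np.column_stack([np.ones(n), x])
W = 1 / h
beta_direct = np.linalg.solve((X * W[:, None]).T @ X, (X * W[:, None]).T @ y)
```

Both routes weight each observation by 1/hi, downweighting the noisier observations.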


More on WLS
We interpret WLS estimates in the original (not transformed) model
but get variances of the WLS estimators in the transformed model
WLS is optimal if we know the form of Var (ui |xi )
In most cases, won’t know the form of heteroskedasticity
Can often estimate the form of heteroskedasticity
Example:

wage = β0 + β1 Education + β2 Experience + β3 Tenure + u

Var (u|Education, Experience, Tenure) = σ 2 exp(δ0 + δ1 Education)

- where δ0 and δ1 are unknown


Must estimate the form of heteroskedasticity: Feasible GLS
First, we assume a model for heteroskedasticity
Example: Var(u|x) = E(u²|x) = σ² exp(δ0 + δ1x1 + ... + δkxk) > 0
Since we don’t know the δ’s, must estimate them
We can write the above model as:

u² = σ² exp(δ0 + δ1x1 + ... + δkxk)ν, where E(ν|x) = 1


Assume further that ν is independent of x

Then ln(u²) = α0 + δ1x1 + ... + δkxk + e

where E (e) = 0 and e is independent of x


Feasible GLS (continued)

ln(u²) = α0 + δ1x1 + ... + δkxk + e, where E(e) = 0 and e is independent of x

Can use û (from OLS) instead of u to estimate this equation by OLS
Then, obtain an estimate of hi as ĥi = exp(ĝi), where ĝi are the
fitted values from this regression
Finally, use 1/ĥi as the weights in WLS

Summary:
- Run OLS in the original model, save the residuals, û, square them and
take logs
- Regress ln(û²) on all of the independent variables (plus constant) and
get the fitted values, ĝ
- Do WLS using 1/exp(ĝ) as the weight
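The three-step summary above translates directly into NumPy; the exponential variance function and all names below are an illustrative simulation, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x = rng.uniform(size=n)
u = rng.normal(size=n) * np.exp(0.5 * (0.2 + 1.5 * x))  # Var(u|x) = exp(0.2 + 1.5x)
y = 1 + 2 * x + u

X = np.column_stack([np.ones(n), x])

def ols(A, b):
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Step 1: OLS on the original model; save residuals, square, take logs
uhat = y - X @ ols(X, y)
logu2 = np.log(uhat ** 2)

# Step 2: regress ln(û²) on the regressors; keep the fitted values ĝ
g = X @ ols(X, logu2)

# Step 3: WLS with weights 1/exp(ĝ), i.e. divide the model by sqrt(ĥ)
hhat = np.exp(g)
w = 1 / np.sqrt(hhat)
beta_fgls = ols(X * w[:, None], y * w)
```

Since ĥ is estimated, this is FGLS: consistent and asymptotically efficient rather than unbiased, as the next slide notes.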

Notes on GLS
OLS is still unbiased and consistent with heteroskedasticity (as long
as MLR.1 through MLR.4 hold)
We use GLS just for efficiency (smaller variance of the estimators)
If we know the weights to use in WLS, then GLS is unbiased.
Otherwise, assuming we estimate a correctly specified model for
heteroskedasticity, feasible GLS (FGLS) is not unbiased
but is consistent and asymptotically efficient
Remember, with FGLS we are estimating the parameters of the
original model. Standard errors in the transformed model also refer to
standard errors in the original model
Can use the t and F tests for inference
When doing F tests with WLS, form the weights from the
unrestricted model and use those weights to do WLS on the restricted
model as well as on the unrestricted model