Detecting Heteroscedasticity in Linear Regression

Muhammad Ali
Lecturer in Statistics
GPGC Mardan.
Heteroscedasticity
Definition
One of the assumptions of the classical linear regression model is that the error
term $\varepsilon_i$ has the same variance for every observation, i.e. $\sigma^2$. In most practical
situations this assumption is not fulfilled, and we have the problem of heteroscedasticity.
Heteroscedasticity does not destroy the unbiasedness and consistency of the
ordinary least squares (OLS) estimators, but these estimators no longer have the property of
minimum variance. Recall that OLS makes the assumption that $V(\varepsilon_i) = \sigma^2$ for all $i$.
That is, the variance of the error term is constant (homoscedasticity). If the error
terms do not have constant variance, they are said to be heteroscedastic. The
term means differing variance and comes from the Greek hetero ('different')
and scedasis ('dispersion').
When heteroscedasticity might occur/causes of heteroscedasticity

1. Errors may increase as the value of an independent variable increases. For
example, consider a model in which annual family income is the independent
variable and annual family expenditures on vacations is the dependent variable.
Families with low incomes will spend relatively little on vacations, and the
variations in expenditures across such families will be small. But for families
with large incomes, the amount of discretionary income will be higher. The mean
amount spent on vacations will be higher, and there will also be greater variability
among such families, resulting in heteroscedasticity. Note that, in this example, a
high family income is a necessary but not sufficient condition for large vacation
expenditures. Any time a high value for an independent variable is a necessary but
not sufficient condition for an observation to have a high value on a dependent
variable, heteroscedasticity is likely.
2. Other model misspecifications can produce heteroscedasticity. For example, it
may be that instead of using Y, you should be using the log of Y. Instead of using
X, maybe you should be using $X^2$, or both X and $X^2$. Important variables may be
omitted from the model. If the model were correctly specified, you might find that
the patterns of heteroscedasticity disappeared.
3. As data collection techniques improve, $\sigma_i^2$ is likely to decrease. Thus banks that
have sophisticated data processing equipment are likely to commit fewer errors in
the monthly or quarterly statements of their customers than banks without such
facilities.
4. Heteroscedasticity can also arise as a result of the presence of outliers. An
outlying observation is one that is very different from the other
observations in the sample.
5. Error-learning models: as people learn, their errors of behavior become smaller
over time. In this case, $\sigma_i^2$ is expected to decrease. For example, as the
amount of typing practice increases, the average number of typing errors
as well as their variance decreases.
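The first cause above (family income and vacation spending) can be illustrated with a small simulation. This is only a sketch on made-up data: the function names, the income range, the 5% spending share and the rule that the error standard deviation is 1% of income are all assumptions chosen for illustration, not values from these notes.

```python
import random
import statistics

def simulate_vacation_spending(n=2000, seed=0):
    """Hypothetical data: spending = 0.05 * income + error, with error sd growing with income."""
    rng = random.Random(seed)
    incomes, spending = [], []
    for _ in range(n):
        income = rng.uniform(20_000, 200_000)
        error = rng.gauss(0.0, 0.01 * income)   # assumed: sd proportional to income
        incomes.append(income)
        spending.append(0.05 * income + error)
    return incomes, spending

def error_spread_by_group(incomes, spending, cutoff=110_000):
    """Standard deviation of the deviations from the true mean line, per income group."""
    low = [s - 0.05 * x for x, s in zip(incomes, spending) if x < cutoff]
    high = [s - 0.05 * x for x, s in zip(incomes, spending) if x >= cutoff]
    return statistics.stdev(low), statistics.stdev(high)

incomes, spending = simulate_vacation_spending()
sd_low, sd_high = error_spread_by_group(incomes, spending)
print(f"error sd, low incomes:  {sd_low:.0f}")
print(f"error sd, high incomes: {sd_high:.0f}")
```

The spread of the errors is visibly larger in the high-income group, which is the pattern described in cause 1.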
Consequences of heteroscedasticity
Following are the consequences of the heteroscedasticity:
1. Heteroscedasticity does not result in biased parameter estimates. However, OLS
estimates are no longer BLUE. That is, among all the unbiased estimators, OLS
does not provide the estimate with the smallest variance. Depending on the nature
of the heteroscedasticity, the statistics used in significance tests can be too high or too low.
2. In addition, the standard errors are biased when heteroscedasticity is present. This
in turn leads to bias in test statistics and confidence intervals.
3. Fortunately, unless heteroscedasticity is marked, significance tests are virtually
unaffected, and thus OLS estimation can be used without concern of serious
distortion. But, severe heteroscedasticity can sometimes be a problem. Warning:
Note that heteroscedasticity can be very problematic with methods besides OLS.
For example, in logistic regression heteroscedasticity can produce biased and
misleading parameter estimates.
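Consequences 1 and 2 can be checked with a rough Monte Carlo experiment. The sketch below assumes a fixed design X = 1, ..., 30, true parameters 2 and 0.5, and an error standard deviation of 0.3 X_i; all of these are illustrative choices. It compares the empirical variance of the OLS slope across replications with the average of the conventional estimate s^2 / sum(x_i^2).

```python
import random
import statistics

def ols_slope_and_conventional_var(X, Y):
    """Return the OLS slope and its conventional variance estimate s^2 / sum(x_i^2)."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sxx = sum((x - x_bar) ** 2 for x in X)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(X, Y))
    return slope, (rss / (n - 2)) / sxx

def monte_carlo(reps=3000, seed=1):
    rng = random.Random(seed)
    X = [float(i) for i in range(1, 31)]        # fixed regressor values 1..30
    slopes, conv_vars = [], []
    for _ in range(reps):
        # assumed model: Y = 2 + 0.5 X + e, with sd(e_i) = 0.3 * X_i
        Y = [2.0 + 0.5 * x + rng.gauss(0.0, 0.3 * x) for x in X]
        b1, v = ols_slope_and_conventional_var(X, Y)
        slopes.append(b1)
        conv_vars.append(v)
    return statistics.mean(slopes), statistics.variance(slopes), statistics.mean(conv_vars)

mean_slope, empirical_var, mean_conventional_var = monte_carlo()
print(f"mean slope            : {mean_slope:.3f}  (true value 0.5)")
print(f"empirical Var(slope)  : {empirical_var:.5f}")
print(f"mean conventional Var : {mean_conventional_var:.5f}")
```

In this particular design the slope estimates center on the true value, while the conventional variance formula tends to understate the actual sampling variance. With other patterns of heteroscedasticity the bias can go in the opposite direction.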
OLS estimation in presence of heteroscedasticity

If we introduce heteroscedasticity by letting $E(\varepsilon_i^2) = \sigma_i^2$ but retain all other
assumptions of the classical model, the OLS estimates are still unbiased.
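Consider the two-variable regression model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
We know that the ordinary least squares estimate of $\beta_1$, with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$, is:
$$\hat{\beta}_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum x_i (Y_i - \bar{Y})}{\sum x_i^2} = \frac{\sum x_i Y_i}{\sum x_i^2} - \frac{\bar{Y} \sum x_i}{\sum x_i^2}$$
Since $\sum x_i = 0$,
$$\hat{\beta}_1 = \frac{\sum x_i (\beta_0 + \beta_1 X_i + \varepsilon_i)}{\sum x_i^2} = \frac{\beta_0 \sum x_i + \beta_1 \sum x_i X_i + \sum x_i \varepsilon_i}{\sum x_i^2} = \frac{\beta_1 \sum x_i X_i + \sum x_i \varepsilon_i}{\sum x_i^2}$$
$$\hat{\beta}_1 = \beta_1 \frac{\sum x_i X_i}{\sum x_i^2} + \frac{\sum x_i \varepsilon_i}{\sum x_i^2} \qquad \text{(A)}$$
Now
$$\frac{\sum x_i X_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X}) X_i}{\sum (X_i - \bar{X})(X_i - \bar{X})} = \frac{\sum (X_i - \bar{X}) X_i}{\sum X_i (X_i - \bar{X}) - \bar{X} \sum (X_i - \bar{X})} = 1$$
because $\sum (X_i - \bar{X}) = 0$.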
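Put this value in equation (A):
$$\hat{\beta}_1 = \beta_1 + \frac{\sum x_i \varepsilon_i}{\sum x_i^2}$$
Taking expectations, and using $E(\varepsilon_i) = 0$ with the $X_i$ treated as fixed,
$$E(\hat{\beta}_1) = \beta_1$$
Similarly,
$$E(\hat{\beta}_0) = \beta_0$$
This shows that in the presence of heteroscedasticity the OLS estimators are unbiased.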
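The two algebraic facts used in this derivation can be verified numerically on any sample. The sketch below uses arbitrary illustrative values (true parameters 1 and 2, 50 observations, error standard deviation 0.5 X_i); none of them come from these notes. It checks that the ratio of sum(x_i X_i) to sum(x_i^2) equals 1 and that the OLS slope equals the true slope plus sum(x_i e_i) / sum(x_i^2).

```python
import random

rng = random.Random(42)
beta0, beta1 = 1.0, 2.0                        # illustrative true parameters
X = [rng.uniform(0, 10) for _ in range(50)]
eps = [rng.gauss(0.0, 0.5 * x) for x in X]     # heteroscedastic errors, sd = 0.5 * X_i
Y = [beta0 + beta1 * x + e for x, e in zip(X, eps)]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)
x = [xi - x_bar for xi in X]                   # deviations x_i = X_i - X_bar
sxx = sum(xi ** 2 for xi in x)

ratio = sum(xi * Xi for xi, Xi in zip(x, X)) / sxx             # should equal 1
beta1_hat = sum(xi * (Yi - y_bar) for xi, Yi in zip(x, Y)) / sxx
rhs_of_A = beta1 + sum(xi * ei for xi, ei in zip(x, eps)) / sxx

print(ratio)
print(beta1_hat, rhs_of_A)                     # identical up to rounding error
```

Because the second term has expectation zero whatever the individual error variances are, the estimator stays unbiased.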
Variance of OLS estimator in the presence of heteroscedasticity

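Since
$$Var(\hat{\beta}_1) = E\left[(\hat{\beta}_1 - \beta_1)^2\right] = E\left[\left(\beta_1 + \frac{\sum x_i \varepsilon_i}{\sum x_i^2} - \beta_1\right)^2\right]$$
using the previous result. Hence
$$Var(\hat{\beta}_1) = E\left[\left(\sum w_i \varepsilon_i\right)^2\right], \qquad \text{as } w_i = \frac{x_i}{\sum x_i^2}$$
$$Var(\hat{\beta}_1) = E\left[w_1^2 \varepsilon_1^2 + w_2^2 \varepsilon_2^2 + \dots + w_n^2 \varepsilon_n^2 + 2 w_1 w_2 \varepsilon_1 \varepsilon_2 + \dots + 2 w_{n-1} w_n \varepsilon_{n-1} \varepsilon_n\right]$$
The cross-product terms equal zero because we know that $E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$.
$$= w_1^2 E(\varepsilon_1^2) + w_2^2 E(\varepsilon_2^2) + \dots + w_n^2 E(\varepsilon_n^2)$$
$$= w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + \dots + w_n^2 \sigma_n^2 = \sum w_i^2 \sigma_i^2$$
$$Var(\hat{\beta}_1) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$$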
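This differs from the variance under homoscedasticity, where $\sigma_i^2 = \sigma^2$ for all $i$ and the formula reduces to $Var(\hat{\beta}_1) = \sigma^2 / \sum x_i^2$.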
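As a small worked example, the two variance formulas can be coded directly. The function names and the five X values are hypothetical, and the variances are treated as known, which is never the case in practice. The point is only that the heteroscedastic formula collapses to the usual one when every sigma_i^2 is the same.

```python
def var_slope_hetero(X, sigma2):
    """Var(beta1_hat) = sum(x_i^2 * sigma_i^2) / (sum(x_i^2))^2, with x_i = X_i - mean(X)."""
    x_bar = sum(X) / len(X)
    x2 = [(xi - x_bar) ** 2 for xi in X]
    return sum(a * s for a, s in zip(x2, sigma2)) / sum(x2) ** 2

def var_slope_homo(X, sigma2_const):
    """Var(beta1_hat) = sigma^2 / sum(x_i^2) under homoscedasticity."""
    x_bar = sum(X) / len(X)
    return sigma2_const / sum((xi - x_bar) ** 2 for xi in X)

X = [1.0, 2.0, 3.0, 4.0, 5.0]
same = var_slope_hetero(X, [4.0] * 5)                    # constant variance 4
usual = var_slope_homo(X, 4.0)
growing = var_slope_hetero(X, [xi ** 2 for xi in X])     # sigma_i^2 = X_i^2

print(same, usual)    # both 0.4
print(growing)        # 1.24
```

Here sum(x_i^2) = 10, so the constant-variance case gives 40/100 = 4/10 = 0.4, while sigma_i^2 = X_i^2 gives 124/100 = 1.24.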
Tests for Detection of Heteroscedasticity

The following tests can be used for the detection of heteroscedasticity:
1. Park Test
The Park test suggests that $\sigma_i^2$ is some function of the explanatory variable $X_i$, i.e.
$$\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i}$$
$$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i \qquad \text{(i)}$$
Since $\sigma_i^2$ is unknown, Park suggests using $\hat{u}_i^2$ as a proxy and running the following regression:
$$\ln \hat{u}_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i = \alpha + \beta \ln X_i + v_i \qquad \text{(ii)}$$
If $\beta$ is found to be statistically significant in the above equation, then
heteroscedasticity is present in the data; otherwise we may accept the assumption of
homoscedasticity.
The Park test is thus a two-stage procedure. In the first stage we run the OLS regression
disregarding the heteroscedasticity question. We obtain $\hat{u}_i$ from this regression, and then
in the second stage we run regression (ii).
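The two-stage procedure can be sketched in a few lines. The helper names `ols` and `park_test` are hypothetical, the data are simulated with an error standard deviation of 0.4 X_i (so the true exponent is 2), and X must be positive for the logarithm to exist. The code returns the estimated exponent and its t ratio, which would then be compared with a tabulated t value.

```python
import math
import random

def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope, standard error of slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, math.sqrt(rss / (n - 2) / sxx)

def park_test(X, Y):
    """Return (beta_hat, t_statistic) from regression (ii): ln(u_hat^2) on ln(X)."""
    a, b, _ = ols(X, Y)                                    # stage 1: original regression
    log_u2 = [math.log((y - a - b * x) ** 2) for x, y in zip(X, Y)]
    log_x = [math.log(x) for x in X]                       # requires X > 0
    _, beta_hat, se = ols(log_x, log_u2)                   # stage 2: regression (ii)
    return beta_hat, beta_hat / se

rng = random.Random(0)
X = [rng.uniform(1, 20) for _ in range(200)]
Y = [3.0 + 1.5 * x + rng.gauss(0.0, 0.4 * x) for x in X]   # assumed: sd proportional to X
beta_hat, t_stat = park_test(X, Y)
print(f"beta_hat = {beta_hat:.2f}, t = {t_stat:.2f}")
```

A large t ratio for the slope of the second-stage regression points to heteroscedasticity in the simulated data.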
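2. Glejser Test
The Glejser test is very similar to the Park test. After obtaining the residuals $\hat{u}_i$ from the OLS
regression, Glejser suggests regressing the absolute values of $\hat{u}_i$ on the X variable that is
thought to be closely associated with $\sigma_i^2$.
Glejser used the following functional forms:
$$|\hat{u}_i| = \beta_1 + \beta_2 X_i + v_i$$
$$|\hat{u}_i| = \beta_1 + \beta_2 \sqrt{X_i} + v_i$$
$$|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{X_i} + v_i$$
$$|\hat{u}_i| = \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} + v_i$$
$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i$$
$$|\hat{u}_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i$$
where $v_i$ is the error term.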
Goldfeld and Quandt point out that the error term $v_i$ has some problems in the above
expressions:
Its expected value is not equal to zero.
It is serially correlated.
In addition, the last two expressions are not linear in the parameters and therefore cannot be estimated
with the usual OLS procedure.
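The four forms that are linear in the parameters can each be run as a simple regression of the absolute residuals on a transformed X. The following is a minimal sketch with hypothetical helper names and simulated data (error standard deviation 0.4 X_i, X strictly positive). It reports the slope and its t ratio for each form, and it does not attempt the two nonlinear forms.

```python
import math
import random

def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope, standard error of slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, math.sqrt(rss / (n - 2) / sxx)

def glejser_test(X, Y):
    """Regress |u_hat| on X, sqrt(X), 1/X and 1/sqrt(X); return {form: (slope, t)}."""
    a, b, _ = ols(X, Y)
    abs_u = [abs(y - a - b * x) for x, y in zip(X, Y)]
    transforms = {
        "X": lambda x: x,
        "sqrt(X)": math.sqrt,
        "1/X": lambda x: 1.0 / x,
        "1/sqrt(X)": lambda x: 1.0 / math.sqrt(x),
    }
    results = {}
    for name, f in transforms.items():
        _, b2, se = ols([f(x) for x in X], abs_u)
        results[name] = (b2, b2 / se)
    return results

rng = random.Random(5)
X = [rng.uniform(1, 20) for _ in range(200)]
Y = [3.0 + 1.5 * x + rng.gauss(0.0, 0.4 * x) for x in X]   # assumed: sd proportional to X
results = glejser_test(X, Y)
for name, (slope, t) in results.items():
    print(f"{name:10s} slope = {slope:8.3f}   t = {t:6.2f}")
```

Because of the problems with $v_i$ noted above, these t ratios are best read as a rough diagnostic rather than an exact test.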
3. Spearman's Rank Correlation Test.

The well-known Spearman's rank correlation coefficient is given by the following
formula:
$$r_s = 1 - 6 \left[ \frac{\sum d_i^2}{n (n^2 - 1)} \right]$$
where $d_i$ = difference between the two rankings of the i-th individual and n = number of individuals. The above
Spearman's rank correlation coefficient can be used to detect heteroscedasticity.
The procedure for Spearman's rank correlation test is as follows:
i. Fit the regression line of Y on X and find the residuals.
ii. Rank the residuals, ignoring their sign.
iii. Rank either the values of X or Y.
iv. Find the difference between the two rankings ($d_i$).
v. Apply the following test statistic to test the hypothesis that the population
rank correlation coefficient $\rho_s = 0$, assuming n > 8:
$$t = \frac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}}$$
with n - 2 degrees of freedom.
If the computed value of t exceeds the tabulated value, we may
accept the hypothesis of heteroscedasticity; otherwise we may reject it.
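Steps i to v can be sketched as follows. The function names are hypothetical, the ranking helper ignores ties (acceptable for continuous simulated data), X is the ranked variable in step iii, and the data are generated with an error standard deviation of 0.4 X_i purely for illustration.

```python
import math
import random

def ranks(values):
    """Ranks 1..n in ascending order (no tie handling; fine for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_het_test(X, Y):
    """Return (r_s, t) for the rank correlation between |residual| and X."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
             / sum((x - x_bar) ** 2 for x in X))
    intercept = y_bar - slope * x_bar
    abs_resid = [abs(y - intercept - slope * x) for x, y in zip(X, Y)]   # steps i-ii
    d = [ru - rx for ru, rx in zip(ranks(abs_resid), ranks(X))]           # steps iii-iv
    r_s = 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))
    t = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)                  # step v
    return r_s, t

rng = random.Random(3)
X = [rng.uniform(1, 20) for _ in range(300)]
Y = [3.0 + 1.5 * x + rng.gauss(0.0, 0.4 * x) for x in X]   # assumed: sd proportional to X
r_s, t_stat = spearman_het_test(X, Y)
print(f"r_s = {r_s:.3f}, t = {t_stat:.2f} on {len(X) - 2} degrees of freedom")
```

The computed t is then compared with the tabulated t value for n - 2 degrees of freedom, as described in step v.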
4. Goldfeld-Quandt Test
This test is suggested if the heteroscedastic variance $\sigma_i^2$ is positively related to one of
the predictor variables in the regression model.
Consider the two-variable regression model:
$$Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i$$
Suppose that $\sigma_i^2$ is positively related to X as:
$$\sigma_i^2 = \sigma^2 X_i^2$$
To test the hypothesis that there is no heteroscedasticity, we follow these
steps.
Step#1. Rank the observations beginning with the lowest value of X.
Step#2. Omit 'c' central observations, where 'c' is fixed in advance, and then divide
the remaining observations into two groups of (n - c)/2 observations each.
Step#3. Fit the OLS regression model to both groups and obtain the residual sum of
squares of each, i.e. RSS1 and RSS2, where RSS1 is the RSS of the smaller-variance
group and RSS2 is the RSS of the larger-variance group. Both RSS1 and RSS2 have
the same degrees of freedom, i.e.
$$\frac{n - c}{2} - k \quad \text{or} \quad \frac{n - c - 2k}{2}$$
where k is the number of parameters to be estimated. In the two-variable case k = 2.
Step#4. Compute the ratio
$$\lambda = \frac{RSS_2 / df}{RSS_1 / df}$$
If the error term is normally distributed, i.e. $\varepsilon_i \sim N(0, \sigma^2)$, then $\lambda$ follows the F distribution with
$\nu_1 = (n - c - 2k)/2$ and $\nu_2 = (n - c - 2k)/2$ degrees of freedom.
If the computed value of $\lambda$ is greater than the tabulated value of F, then we can reject the
hypothesis of homoscedasticity.
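The four steps translate almost line by line into code. In this sketch the function names are hypothetical, n - c is assumed to be even so that both groups have the same size, and the data are simulated with X = 1, ..., 60 and an error standard deviation of 0.4 X_i. The code returns the ratio and the common degrees of freedom for comparison with a tabulated F value.

```python
import random

def residual_sum_of_squares(X, Y):
    """RSS from a simple OLS fit of Y on X."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
             / sum((x - x_bar) ** 2 for x in X))
    intercept = y_bar - slope * x_bar
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(X, Y))

def goldfeld_quandt(X, Y, c, k=2):
    """Return (lambda, df) for the Goldfeld-Quandt ratio; assumes n - c is even."""
    pairs = sorted(zip(X, Y))                       # step 1: order by X
    n = len(pairs)
    m = (n - c) // 2                                # group size after omitting c central obs
    low, high = pairs[:m], pairs[n - m:]            # step 2: two outer groups
    rss1 = residual_sum_of_squares(*zip(*low))      # step 3: smaller-variance group
    rss2 = residual_sum_of_squares(*zip(*high))     #         larger-variance group
    df = m - k                                      # equals (n - c - 2k) / 2
    lam = (rss2 / df) / (rss1 / df)                 # step 4
    return lam, df

rng = random.Random(7)
X = [float(i) for i in range(1, 61)]
Y = [3.0 + 1.5 * x + rng.gauss(0.0, 0.4 * x) for x in X]   # assumed: sd proportional to X
lam, df = goldfeld_quandt(X, Y, c=10)
print(f"lambda = {lam:.2f} with ({df}, {df}) degrees of freedom")
```

With n = 60, c = 10 and k = 2, each group has 25 observations and 23 degrees of freedom.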