
Heteroskedasticity

In the context of linear regression models, one of the necessary assumptions was constancy of the variance of the disturbance term for all observations. For example, for the simple two-variable model Y_i = β_1 + β_2 X_i + ε_i, we assumed that the variance of the disturbance term, ε_i, is constant for all observations: Var(ε_i) = σ² for all i.

Here hetero means unequal or different, and skedastic means spread or scatter.

In Figure 4.1, the data points corresponding to different values of X are closely concentrated around the regression line with equal spread (represented by the dashed lines) above and below the regression line. This is the situation of homoskedasticity.
On the other hand, although the data points in Figure 4.2 again refer to different values of X, it is now observed that a higher value of X has a higher spread of data points around the regression line. An opposite situation is displayed in Figure 4.3, where a higher value of X has a lower spread of data points around the regression line. Both these cases represent the situation of heteroskedasticity; the former is referred to as increasing heteroskedasticity and the latter as decreasing heteroskedasticity.

This feature of the disturbance term of the regression model is known as homoskedasticity. However, it is quite common in regression analysis to have cases where the variance of the disturbance term becomes variable rather than remaining constant. In this situation, the disturbance term is said to be heteroskedastic. Thus, heteroskedasticity represents the situation where the variance of the disturbance term of the model does not remain constant and turns out to be a variable. We express this algebraically by writing Var(ε_i) = σ_i², where the subscript i on σ² indicates that the variance now differs across observations.
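For example, the variance might grow with the square of the regressor, σ_i² = σ²X_i². A minimal simulation sketch of this assumed form (the variable names and values are illustrative, not from the text):

```python
import random
import statistics

random.seed(42)

# Minimal sketch (assumed functional form): generate a disturbance whose
# standard deviation is proportional to X, i.e. Var(e_i) = (sigma * x_i)^2
# instead of a constant sigma^2 (increasing heteroskedasticity).
sigma = 1.0
x = [i / 10 for i in range(1, 501)]            # X runs from 0.1 to 50.0
e = [random.gauss(0, sigma * xi) for xi in x]  # sd grows with X

# Spread of the disturbances at the lowest vs the highest values of X.
low_spread = statistics.stdev(e[:100])
high_spread = statistics.stdev(e[-100:])
print(low_spread < high_spread)  # True: the scatter widens as X grows
```

Replacing the standard deviation with one that shrinks in X would produce the decreasing case instead.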

Sources of Heteroskedasticity
Heteroskedasticity in data may arise for several reasons.
1. When we are dealing with microeconomic or cross-section data, we are very likely to have a heteroskedasticity problem.

For instance, when data are collected from a cross-section of individuals on their levels of consumption and income to estimate a consumption function, we are likely to face the heteroskedasticity problem. This is because the variance (or spread) of consumption at low levels of income is much less compared to the variance of consumption at higher levels of income. This happens because people with low income levels do not have much flexibility in spending their income; a large proportion of it is spent on food, clothing, and transportation. On the other hand, people with high levels of income have a much wider choice and flexibility in spending; some might consume a lot while others might prefer investing in the share market. This produces high variance of consumption at high levels of income. This is typically a case of increasing heteroskedasticity. As an example of decreasing heteroskedasticity, we may refer to error-learning models. For example, in a regression of students' test scores on hours spent on preparation, we are likely to observe decreasing heteroskedasticity: compared to the variance of scores at lower levels of preparation, the variance of scores will be less at higher levels of preparation.

2. The presence of outliers in the data may cause a heteroskedasticity problem. As we know, an outlying observation is much different from the rest of the observations in the sample. The presence of such an observation, especially when the sample size is small, can create heteroskedasticity.

3. Heteroskedasticity may arise if some relevant
variables have been mistakenly omitted so
that the model is incorrectly specified. For
instance, if we do not include the prices of
competing or complementary goods in the
demand function for a commodity, the
residuals obtained from the regression may
contain non-constant variance. However,
when the omitted variables are included in
the model, the non-constancy of variance
may disappear.
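This omitted-variable mechanism can be sketched in a short simulation, where the model, its coefficients, and the link between the omitted variable w and x are all hypothetical:

```python
import random
import statistics

random.seed(0)

# Hypothetical illustration: the true model is
#   y = 1 + 2*x + 3*w + e,   with e homoskedastic,
# but w is omitted from the fitted regression. Since the spread of w
# grows with x here, the residuals of the misspecified model inherit
# a variance that grows with x.
n = 1000
x = [random.uniform(1, 10) for _ in range(n)]
w = [xi * random.uniform(-1, 1) for xi in x]   # spread of w grows with x
y = [1 + 2 * xi + 3 * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]

# OLS of y on x alone, using the usual two-variable formulas.
xbar = sum(x) / n
ybar = sum(y) / n
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
b1 = ybar - b2 * xbar
resid = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

# Residual spread at low vs high x: non-constant because w was omitted.
pairs = sorted(zip(x, resid))
low = statistics.stdev([r for _, r in pairs[:200]])
high = statistics.stdev([r for _, r in pairs[-200:]])
print(low < high)  # True: residual variance rises with x
```

Once w is included as a regressor, the residual spread no longer varies systematically with x, matching the remark above that the non-constancy of variance may disappear.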

4. Inclusion of explanatory variables in the
model whose distributions are skewed
may cause a heteroskedasticity problem.
Examples of such variables are income
and wealth.
5. Heteroskedasticity may also arise due to
incorrect data transformations (e.g.,
ratios, first-differences, etc.) and
incorrect functional forms (e.g., linear
instead of log-linear models).

CONSEQUENCES OF HETEROSKEDASTICITY
Heteroskedasticity represents a situation where the variance of the disturbance term of the model becomes a variable. We now check whether the OLS estimators continue to remain unbiased and minimum-variance (best) when the disturbance term of the model is heteroskedastic.

Unbiasedness
β̂ remains unbiased when the disturbance term of the model, ε_i, is heteroskedastic.

Bestness
β̂ is no longer a minimum-variance and hence best estimator. Further, as β̂ is only unbiased and not best, it becomes inefficient.

Consistency
β̂ remains consistent when the disturbance term is heteroskedastic.

The OLS estimators continue to remain unbiased and consistent under heteroskedasticity. This happens because the explanatory variable(s) are not correlated with the disturbance term of the model. Therefore, a correctly specified model that suffers only from the problem of heteroskedasticity may still be useful.

Heteroskedasticity increases the variances of the sampling distributions of the β̂s, thereby making the OLS estimators inefficient.
Heteroskedasticity also affects the estimated variances of the OLS estimators and their standard errors. In fact, the presence of heteroskedasticity, in general, causes the OLS method to underestimate the variances and hence the standard errors of the estimators. As a consequence, we obtain higher-than-expected values of the t and F statistics. Thus, heteroskedasticity has a wide impact on hypothesis testing; the conventional t and F statistics are no longer reliable for hypothesis testing.
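Both effects, unbiasedness of the slope and understatement of its conventional standard error, can be seen in a small Monte Carlo sketch (the model, sample size, and variance pattern are all illustrative):

```python
import math
import random
import statistics

random.seed(1)

# Monte Carlo sketch (illustrative numbers): the true model is
#   y_i = 1 + 2*x_i + e_i,  with sd(e_i) proportional to x_i^2,
# i.e. strongly increasing heteroskedasticity. Across replications the
# average OLS slope stays near the true value 2 (unbiasedness survives),
# while the conventional OLS standard error understates the slope's
# actual sampling variability, inflating t statistics.
n, reps = 50, 2000
x = [1 + 0.2 * i for i in range(n)]      # fixed regressors, 1.0 to 10.8
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

slopes, conv_ses = [], []
for _ in range(reps):
    y = [1 + 2 * xi + random.gauss(0, 0.1 * xi * xi) for xi in x]
    ybar = sum(y) / n
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    conv_ses.append(math.sqrt(rss / (n - 2) / sxx))  # textbook OLS formula
    slopes.append(b2)

print(statistics.mean(slopes))                       # close to 2: unbiased
print(statistics.mean(conv_ses) < statistics.stdev(slopes))  # True
```

The final comparison is the point of the passage above: the conventional formula s²/Σ(x_i − x̄)² falls short of the slope's true sampling variability, so t statistics built from it come out too large.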
As the OLS estimators are unbiased under heteroskedasticity, the forecasts generated on the basis of the estimated model will also be unbiased. However, as the OLS estimators are inefficient, the reliability of the forecasts (as measured by their variances) will be inferior to that of an alternative, more efficient estimator.
