Regression Analysis Tutorial

LECTURE / DISCUSSION Weighted Least Squares

Econometrics Laboratory • University of California at Berkeley • 22-26 March 1999


Introduction
In a regression problem with time series data (where the variables have subscript "t" denoting the time the variable was observed), it is common for the error terms to be correlated across time, but with a constant variance; this is the problem of "autocorrelated disturbances," which will be considered in the next lecture. For regressions with cross-section data (where the subscript "i" now denotes a particular individual or firm at a point in time), it is usually safe to assume the errors are uncorrelated, but often their variances are not constant across individuals. This is known as the problem of heteroskedasticity (for "unequal scatter"); the usual assumption of constant error variance is referred to as homoskedasticity. Although the mean of the dependent variable might be a linear function of the regressors, the variance of the error terms might also depend on those same regressors, so that the observations might "fan out" in a scatter diagram, as illustrated in the following diagrams.


[Scatter diagrams of Y against X illustrating homoskedasticity (constant spread), increasing heteroskedasticity (spread grows with X), and "U-shaped" heteroskedasticity.]


Assumptions of Heteroskedastic Linear Model


• $y_i = \alpha + \beta x_i + \epsilon_i$ [simple linear model] or $y_i = \sum_{j=1}^{K} \beta_j x_{ij} + \epsilon_i$ [multiple regression model];
• $E(\epsilon_i) = 0$ [zero mean error terms];
• $\mathrm{Cov}(\epsilon_i, \epsilon_{i'}) = 0$ if $i \neq i'$ [no serial correlation]; and
• $\mathrm{Var}(\epsilon_i) = \sigma_i^2 = \sigma^2 h_i$, some $h_i$ [heteroskedasticity].

Sometimes also assume

• $\epsilon_i$ normally distributed [optional].
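As a concrete illustration of the assumptions above, the following sketch simulates data from the simple heteroskedastic linear model. The parameter values and the choice $h_i = x_i^2$ are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
alpha, beta, sigma = 1.0, 2.0, 0.5            # illustrative "true" parameters
x = rng.uniform(1.0, 5.0, size=n)             # regressor
h = x**2                                      # hypothetical variance factor h_i
eps = rng.normal(0.0, sigma * np.sqrt(h))     # E(eps_i) = 0, Var(eps_i) = sigma^2 * h_i
y = alpha + beta * x + eps                    # mean of y is linear in x, variance is not constant
```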


Examples of Heteroskedastic Models


1. Grouped (Aggregate) Data
For individual "i" in group "s" (i.e., state, region, time period),

$y_{is} = \alpha + \beta x_{is} + \epsilon_{is}$, with $\mathrm{Var}(\epsilon_{is}) = \sigma^2$, etc.

However, we only observe some group averages:

$\bar{y}_s = \frac{1}{n_s} \sum_{i=1}^{n_s} y_{is}$,  $\bar{x}_s = \frac{1}{n_s} \sum_{i=1}^{n_s} x_{is}$.

Then

$\bar{y}_s = \alpha + \beta \bar{x}_s + \bar{\epsilon}_s$, with $\mathrm{Var}(\bar{\epsilon}_s) = \sigma^2 \cdot \frac{1}{n_s} = \sigma^2 h_s$.
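A quick numerical check of the variance formula for group averages; the group sizes below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0
for ns in [5, 20, 100]:                            # hypothetical group sizes n_s
    # simulate many group means of n_s iid errors with variance sigma^2
    eps_bar = rng.normal(0.0, sigma, size=(50_000, ns)).mean(axis=1)
    print(ns, eps_bar.var(), sigma**2 / ns)        # empirical variance vs. sigma^2 / n_s
```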


2. Random Coefficients
Both the intercept and slope vary (randomly) across i,

$y_i \equiv \alpha_i + \beta_i x_i$,

where $E(\alpha_i) \equiv \alpha$, $\mathrm{Var}(\alpha_i) \equiv \sigma_\alpha^2$, $E(\beta_i) \equiv \beta$, $\mathrm{Var}(\beta_i) \equiv \sigma_\beta^2$, $\mathrm{Cov}(\alpha_i, \beta_i) \equiv \sigma_{\alpha\beta}$, so that

$y_i \equiv \alpha + \beta x_i + \epsilon_i$,

with $\epsilon_i \equiv (\alpha_i - \alpha) + (\beta_i - \beta) x_i$, which has $E(\epsilon_i) \equiv 0$ and

$\mathrm{Var}(\epsilon_i) = \sigma_i^2 = \sigma_\alpha^2 + 2\sigma_{\alpha\beta} x_i + \sigma_\beta^2 x_i^2 = \sigma_\alpha^2 (1 + \delta_1 x_i + \delta_2 x_i^2) \equiv \sigma^2 h_i$.


3. Variance Proportional to Square of Mean


$y_i = \alpha + \beta x_i + \epsilon_i$, with

$\mathrm{Var}(\epsilon_i) = \sigma^2 (\alpha + \beta x_i)^2 \equiv \sigma^2 h_i$,

so that a larger variance is associated with a larger mean.


Properties of Classical Least Squares Under Heteroskedasticity


• Least squares estimators of $\alpha$ and $\beta$ are still unbiased and consistent;
• Least squares estimators are no longer efficient, i.e., they are no longer the best linear unbiased estimators; and
• The usual estimators for the standard errors of least squares are biased, so the usual confidence intervals and test statistics are incorrect, and may lead to incorrect conclusions.
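A small Monte Carlo along the lines below illustrates the first and third points: with an error standard deviation proportional to $x_i$ (an illustrative choice, not from the lecture), the least squares slope stays centered on the true $\beta$, but the usual standard error formula misstates its actual sampling variability.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 5000
alpha, beta = 1.0, 2.0
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

slopes, usual_ses = [], []
for _ in range(reps):
    eps = rng.normal(0.0, x)                        # error sd proportional to x: heteroskedastic
    y = alpha + beta * x + eps
    coef = XtX_inv @ X.T @ y                        # classical least squares
    e = y - X @ coef
    s2 = e @ e / (n - 2)
    slopes.append(coef[1])
    usual_ses.append(np.sqrt(s2 * XtX_inv[1, 1]))   # usual (homoskedastic) standard error

print(np.mean(slopes))                              # close to beta = 2: still unbiased
print(np.std(slopes), np.mean(usual_ses))           # true spread vs. the usual (biased) estimate
```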


Approaches to Dealing with Heteroskedasticity


• For known heteroskedasticity (e.g., grouped data with known group sizes), use weighted least squares (WLS) to obtain efficient unbiased estimates;
• Test for heteroskedasticity of a special form using a squared residual regression;
• Estimate the unknown heteroskedasticity parameters using this squared residual regression, then use the estimated variances in the WLS formula to get efficient estimates of the regression coefficients (known as feasible WLS); or
• Stick with the (inefficient) least squares estimators, but get estimates of the standard errors which are correct under arbitrary heteroskedasticity.


Correction for Heteroskedasticity of Known Form


If $\mathrm{Var}(\epsilon_i) = \sigma^2 h_i$, where $h_i$ is known (e.g., grouped data), then

$y_i = \alpha + \beta x_i + \epsilon_i$

implies

$\frac{y_i}{\sqrt{h_i}} = \alpha \cdot \frac{1}{\sqrt{h_i}} + \beta \cdot \frac{x_i}{\sqrt{h_i}} + \frac{\epsilon_i}{\sqrt{h_i}}$,  or  $y_i^* = \alpha z_i^* + \beta x_i^* + \epsilon_i^*$,

with $y_i^* \equiv y_i / \sqrt{h_i}$, $z_i^* \equiv 1 / \sqrt{h_i}$, etc. Since $\mathrm{Var}(\epsilon_i^*) = \mathrm{Var}(\epsilon_i) / h_i = \sigma^2$, we can use classical least squares on this transformed equation to get efficient estimates of $\alpha$ and $\beta$. For the multiple regression model, divide the dependent variable and all of the regressors (including the constant term) by $\sqrt{h_i}$, then do least squares.
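A minimal numpy sketch of this transformation, assuming $h_i$ is known; the data-generating step is made up so that the example runs on its own.

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha, beta, sigma = 200, 1.0, 2.0, 0.5
x = rng.uniform(1.0, 5.0, size=n)
h = x**2                                           # h_i assumed known (illustrative choice)
y = alpha + beta * x + rng.normal(0.0, sigma * np.sqrt(h))

# divide y, the constant, and x by sqrt(h_i), then run classical least squares
w = 1.0 / np.sqrt(h)
Z_star = np.column_stack([w, x * w])               # transformed constant and regressor
coef, *_ = np.linalg.lstsq(Z_star, y * w, rcond=None)
print(coef)                                        # efficient estimates of alpha, beta
```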


Weighted Least Squares


Regressing $y_i^*$ on $z_i^*$ and $x_i^*$ involves minimization of

$\sum_{i=1}^{n} \left( y_i^* - a z_i^* - b x_i^* \right)^2 = \sum_{i=1}^{n} \frac{(y_i - a - b x_i)^2}{h_i} \, ;$

thus, a more efficient estimator is obtained by downweighting the squared residuals for observations with large variances, in proportion to those variances.
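Equivalently, the minimizer solves the weighted normal equations $X'WX \, b = X'Wy$ with $W = \mathrm{diag}(1/h_i)$. The sketch below checks this equivalence on made-up data.

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha, beta, sigma = 200, 1.0, 2.0, 0.5
x = rng.uniform(1.0, 5.0, size=n)
h = x**2                                           # illustrative known variance factor
y = alpha + beta * x + rng.normal(0.0, sigma * np.sqrt(h))

X = np.column_stack([np.ones(n), x])
W = np.diag(1.0 / h)                               # downweight observations with large h_i
coef_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# same answer as regressing y/sqrt(h) on 1/sqrt(h) and x/sqrt(h)
Xs = X / np.sqrt(h)[:, None]
coef_star, *_ = np.linalg.lstsq(Xs, y / np.sqrt(h), rcond=None)
print(coef_wls, coef_star)
```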


Properties of Weighted Least Squares Estimates (with known weights)


• Estimated coefficients are efficient, i.e., best linear unbiased (BLUE).
• Regression of $y_i^*$ on $z_i^*$ and $x_i^*$ gives correct standard errors for the coefficient estimates.
• $R^2$ must be redefined, since the transformed model usually has no intercept term.


Detection of Heteroskedasticity (unknown weights)


• Residual plot: Graph the squared LS residuals $e_i^2 = (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$ against $x_i$ or $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ to check variability.
• Diagnostic testing: Do a formal statistical test for a particular hypothesized form of $h_i$.
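A residual plot of this kind takes only a few lines of matplotlib; the data here are simulated for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.uniform(1.0, 5.0, size=200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x)       # made-up heteroskedastic sample

X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ coef) ** 2                           # squared least squares residuals

plt.scatter(x, e2)                                 # variability visibly grows with x
plt.xlabel("x")
plt.ylabel("squared residual")
plt.show()
```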


Squared Residual Regression Test for Heteroskedasticity


Conditions: Moderate to large samples (n = 50-100+), possibly nonnormal errors, and a linear form for $h_i$:

$h_i = 1 + \delta_1 z_{i1} + \cdots + \delta_L z_{iL}$

$\Rightarrow\ \sigma_i^2 = \sigma^2 + \sigma^2 \delta_1 z_{i1} + \cdots + \sigma^2 \delta_L z_{iL}$

$\Rightarrow\ \epsilon_i^2 = \sigma^2 + \sigma^2 \delta_1 z_{i1} + \cdots + \sigma^2 \delta_L z_{iL} + u_i$, with $E(u_i) = 0$,

where $z_{i1}, \ldots, z_{iL}$ are known functions of the regressors (e.g., $z_{i1} = x_i$, $z_{i2} = x_i^2$ for the random coefficients model).

Idea: Replace the unknown squared errors $\epsilon_i^2$ with the squared residuals $e_i^2 = (y_i - \hat{y}_i)^2$ from least squares, then regress $e_i^2$ on 1, $z_{i1}, \ldots, z_{iL}$.

Steps:
(1) Get the LS residuals $e_i = y_i - \hat{\alpha} - \hat{\beta} x_i$.
(2) Get $R^2$ from the regression of $e_i^2$ on 1, $z_{i1}, \ldots, z_{iL}$.
(3) Construct the usual F-statistic for the joint significance of $z_{i1}, \ldots, z_{iL}$,

$F = \frac{R^2}{1 - R^2} \cdot \frac{n - L - 1}{L} \, ,$

and reject homoskedasticity if it exceeds the critical value from the F-table with L and n - L - 1 degrees of freedom.
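A sketch of the three steps for the random coefficients form ($z_{i1} = x_i$, $z_{i2} = x_i^2$, so L = 2), run on simulated data; the p-value uses the F distribution from scipy.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x)       # made-up heteroskedastic sample

# Step 1: least squares residuals
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ coef) ** 2

# Step 2: regress e_i^2 on 1, z_i1 = x_i, z_i2 = x_i^2 and compute R^2
Z = np.column_stack([np.ones(n), x, x**2])
gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
r2 = 1.0 - np.sum((e2 - Z @ gamma) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Step 3: F-statistic with L = 2 and n - L - 1 degrees of freedom
L = 2
F = (r2 / (1.0 - r2)) * (n - L - 1) / L
print(F, f.sf(F, L, n - L - 1))                    # small p-value -> reject homoskedasticity
```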


Correction for Heteroskedasticity: Feasible WLS


• Conditions: Same as for the squared residual regression test, including $\sigma_i^2 = \sigma^2 (1 + \delta_1 z_{i1} + \cdots + \delta_L z_{iL})$.
• Idea: Use the squared residual regression to estimate the weights.
• Steps:
(1) Fit $y_i = \alpha + \beta x_i + \epsilon_i$ by least squares, and get the residuals $e_i = y_i - \hat{\alpha} - \hat{\beta} x_i$.
(2) Regress $e_i^2$ on 1, $z_{i1}, \ldots, z_{iL}$, and take the fitted values $\hat{\sigma}_i^2$ from this regression (the estimates of $\sigma^2 h_i$).
(3) Replace $\sigma_i^2$ with $\hat{\sigma}_i^2$ in the WLS formula, i.e., minimize $\sum_{i=1}^{n} \frac{(y_i - a - b x_i)^2}{\hat{\sigma}_i^2}$, to estimate $\alpha$ and $\beta$.
• Properties: In large samples, (approximately) the same as WLS.
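A self-contained sketch of this two-step procedure with $z_{i1} = x_i$, $z_{i2} = x_i^2$; clipping the fitted variances away from zero is my own safeguard, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x)       # made-up heteroskedastic sample

# Step 1: least squares fit and squared residuals
X = np.column_stack([np.ones(n), x])
coef_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ coef_ls) ** 2

# Step 2: squared residual regression gives estimated variances sigma_i^2
Z = np.column_stack([np.ones(n), x, x**2])
gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
sigma2_hat = np.clip(Z @ gamma, 1e-8, None)        # guard against nonpositive fitted values

# Step 3: weighted least squares with weights 1 / sigma2_hat
W = 1.0 / sigma2_hat
coef_fwls = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (W * y))
print(coef_ls, coef_fwls)
```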


Examples of Feasible WLS


1. Grouped Data
Regress $\sqrt{n_s}\, \bar{y}_s$ on $\sqrt{n_s}$ and $\sqrt{n_s}\, \bar{x}_s$ to estimate $\alpha$ and $\beta$ (exact WLS, since $h_s = 1/n_s$ is known).
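A sketch of this regression with hypothetical group sizes and group averages:

```python
import numpy as np

rng = np.random.default_rng(7)
S = 60
n_s = rng.integers(5, 200, size=S)                 # hypothetical group sizes
x_bar = rng.uniform(0.0, 10.0, size=S)             # observed group-average regressor
y_bar = 1.0 + 2.0 * x_bar + rng.normal(0.0, 3.0 / np.sqrt(n_s))   # group-mean error variance sigma^2/n_s

# regress sqrt(n_s)*ybar_s on sqrt(n_s) and sqrt(n_s)*xbar_s
w = np.sqrt(n_s)
Z = np.column_stack([w, w * x_bar])
coef, *_ = np.linalg.lstsq(Z, w * y_bar, rcond=None)
print(coef)                                        # exact WLS estimates of alpha, beta
```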

2. Random Coefficients
First get the LS estimates $\hat{\alpha}$, $\hat{\beta}$ and the residuals $e_i$. Since the model has

$\sigma_i^2 = \sigma_\alpha^2 (1 + \delta_1 x_i + \delta_2 x_i^2)$,

regress $e_i^2$ on 1, $x_i$, $x_i^2$, and test $H_0$: $\delta_1 = \delta_2 = 0$ with an F-test. If $H_0$ (homoskedasticity) is rejected, let $\hat{\sigma}_i^2$ be the fitted values from this squared residual regression, and plug them into the WLS formula. (For multiple regression, set the $z_{i\ell}$ equal to the squares and cross-products of the regressors.)


3. Variance Proportional to Square of Mean


To test for heteroskedasticity, regress $e_i^2 = (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$ on 1 and $z_{i1} = (\hat{y}_i)^2 = (\hat{\alpha} + \hat{\beta} x_i)^2$, and do an F-test (or t-test) for the exclusion of $\hat{y}_i^2$. If homoskedasticity is rejected, use $\hat{h}_i = \hat{y}_i^2$ in place of $h_i$ in the WLS formula.


Correcting Least Squares Standard Errors for Heteroskedasticity


• Situation: Suspect heteroskedasticity but don't want to specify $h_i$; willing to stick with the (inefficient but unbiased) least squares coefficient estimators.
• Idea: Find a formula for the standard errors of LS which is valid under either homoskedasticity or heteroskedasticity. Known as Eicker-White (or just White) heteroskedasticity-consistent standard errors.
• Usual Variance Estimator for LS (page 60):

$\hat{V}(\hat{\beta}) = s^2 / \sum x_i^2$ if no intercept term;

$\hat{V}(\hat{\beta}) = s^2 / \sum (x_i - \bar{x})^2 = s^2 \sum (x_i - \bar{x})^2 / \left[ \sum (x_i - \bar{x})^2 \right]^2$ if intercept term.

This formula assumes $E\left[\epsilon_i^2 (x_i - \bar{x})^2\right] = E(\epsilon_i^2) \cdot E\left[(x_i - \bar{x})^2\right]$, which fails under heteroskedasticity.


• Corrected Variance Estimator for $\hat{\beta}$: Now use

$\hat{V}(\hat{\beta}) = \sum e_i^2 (x_i - \bar{x})^2 / \left[ \sum (x_i - \bar{x})^2 \right]^2$,

where $e_i = y_i - \hat{\alpha} - \hat{\beta} x_i$. Note that $\hat{V}(\hat{\beta}) \neq s^2 / \sum (x_i - \bar{x})^2$ unless $e_i^2 = s^2 = \frac{1}{n-k} \sum e_i^2$ for all observations, which never happens in practice.

• Formula for Multiple Regression: Similar, but more complicated. Fortunately, many computer packages (e.g., TSP) compute "Eicker-White" standard errors as an option.
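A numpy sketch of the corrected variance formula for the slope in the simple regression with an intercept, compared with the usual estimator, on simulated data. (In modern software this is usually a built-in option, e.g., cov_type='HC0' in statsmodels' OLS fit.)

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x)       # made-up heteroskedastic sample

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ coef
xd = x - x.mean()

s2 = e @ e / (n - 2)
usual_var = s2 / np.sum(xd**2)                           # usual variance estimator for the slope
white_var = np.sum(e**2 * xd**2) / np.sum(xd**2) ** 2    # Eicker-White corrected variance
print(np.sqrt(usual_var), np.sqrt(white_var))            # standard errors: usual vs. robust
```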
