
Oct 21, 2016


Panel Data


In panel data we have elements of both time series and cross-sectional data.

That is, the same cross-sectional unit (say a firm/state/country) is surveyed over time.

Example: data on GDP for 3 countries for a period of 5 years (see Table).

Year        1991   1992   1993   1994   1995
India       ***    ***    ***    ***    ***
USA         ***    ***    ***    ***    ***
Pakistan    ***    ***    ***    ***    ***

In the typical panel there are a large number of cross-sectional units and only a few periods, though the opposite case is also relevant.

Such a data set focuses on cross-sectional variation, or heterogeneity.

Panel data are also known as pooled data or longitudinal data. Strictly speaking, however, not all pooled data are panel data: a pooled data set becomes a complete panel only when we include individual and/or time effects.

(A) Balanced panel: Each cross-sectional unit has the same number of time-series observations.

Year    Firm ID    Investment (Y)
1991    1          ***               ***    ***
1992    1          ***               ***    ***
1993    1          ***               ***    ***
1994    1          ***               ***    ***
1995    1          ***               ***    ***
1991    2          ***               ***    ***
1992    2          ***               ***    ***
1993    2          ***               ***    ***
1994    2          ***               ***    ***
1995    2          ***               ***    ***
1991    3          ***               ***    ***
1992    3          ***               ***    ***
1993    3          ***               ***    ***
1994    3          ***               ***    ***
1995    3          ***               ***    ***

(B) Unbalanced panel: The number of time-series observations differs across panel members.

Year    Firm ID    Investment (Y)
1991    1          ***               ***    ***
1992    1          ***               ***    ***
1993    1          ***               ***    ***
1991    2          ***               ***    ***
1992    2          ***               ***    ***
1993    2          ***               ***    ***
1994    2          ***               ***    ***
1991    3          ***               ***    ***
1992    3          ***               ***    ***
1993    3          ***               ***    ***
1994    3          ***               ***    ***

It takes into account heterogeneity across individual cross-sectional units and through time. This is done by including individual-specific and time-specific variables.

It allows the inclusion of unobserved effects as explanatory variables in the model. Excluding such effects could cause omitted variable bias.

It helps to examine issues that cannot be studied in either cross-sectional or time-series settings alone.

It gives more informative data, more variability, less collinearity among variables, more degrees of freedom and more efficiency.

It is better suited to studying the dynamics of change, as it uses repeated cross-sections of observations.

It enables us to study more complicated behavioural models (e.g. technological change).

It allows the researcher far greater flexibility in modelling differences in behaviour across individual cross-sectional units (see the one-way fixed effects model).

Using a single cross-section disregards much useful information in other time periods. On the other hand, using time-series data alone disregards useful information on other cross-sectional units.

4) Methods of Estimation

Fixed Effects Model (FEM): one-way FEM and two-way FEM.

Random Effects Model (REM): one-way REM and two-way REM.

Consider the following simple (panel) regression model:

$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it}$   (1)

where i stands for the i-th cross-sectional unit or observation (e.g. firm or state); t stands for the t-th time period; and $u_{it}$ is the common error term.

Problem with this model: it ignores two aspects which are important in the panel data context.

First, it does not consider the space/individual/group dimension, that is, factors which are specific to each cross-sectional unit (individuals/firms/states/countries) but remain unchanged over time.

Examples: management style/philosophy of a company, fixed capital of a firm, historical factors, etc.

Second, it ignores the time dimension, that is, factors which are specific to the period in which they occur but are not carried across periods within a cross-sectional unit.

Examples: technological change, changes in government regulatory policies, economic boom or bust, global economic slowdown, weather and so on.

Together, the group and time dimensions are called unobserved effects/variables/factors.

Thus, model (1) is the usual OLS regression and, in the panel data context, is called the pooled regression model.

Because it omits the group and time dimensions (a specification error), the OLS results might not be fully reliable.

Hence, we need to find some way to take the group and time dimensions into account in model (1).

How to accomplish this?

We let the intercept of (1) vary across cross-sectional units, but do not allow it to change over time for a given cross-sectional unit.

If we do so, model (1) becomes

$Y_{it} = \beta_{1i} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it}$   (2)

In (2) we have an intercept term ($\beta_{1i}$) specific to each individual cross-sectional unit i.

Technically, $\beta_{1i}$ is called the individual effect and is an unknown parameter to be estimated.

Note that in model (1) we have a common intercept term ($\beta_1$), which is assumed to be the same across all cross-sectional units.

By incorporating the group dimension we acknowledge that there is heterogeneity, or significant differences, among cross-sectional units, and that the difference is captured through the individual-specific constant term $\beta_{1i}$.

Model (2) is known as the one-way Fixed Effects Model, since only the individual effect is present.

The term "fixed effects" is due to the fact that (a) we consider $\beta_{1i}$ as a group-specific constant term in the regression model; and (b) each cross-sectional unit's intercept does not vary over time; that is, it is time invariant.

We can easily allow for this using the dummy variable technique.

Allowing for dummy variables, model (2) can be written as

$Y_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \dots + \alpha_n D_{ni} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it}$   (3)

where

$D_{2i}$ = 1 if the observation belongs to cross-sectional unit 2, zero otherwise

$D_{3i}$ = 1 if the observation belongs to cross-sectional unit 3, zero otherwise

$D_{ni}$ = 1 if the observation belongs to the n-th cross-sectional unit, zero otherwise

Note: If we have n cross-sectional units/groups, we use n-1 dummies to avoid falling into the dummy-variable trap (i.e. a situation of perfect collinearity). Hence there is no dummy for the first cross-sectional unit, which means the constant term $\alpha_1$ represents the intercept for the first (omitted) cross-sectional unit.

The other $\alpha$'s are differential intercept coefficients: they indicate how much the intercept of each unit that is assigned a dummy differs from the intercept of the unit that is not assigned a dummy.

In short, the cross-sectional unit which is not assigned a dummy becomes the comparison unit.

Of course, we are free to choose any cross-sectional unit as the comparison unit [in any case, the computer programme will automatically omit the dummy for the first cross-sectional unit (i.e. select the first i as the comparison unit) and compute the results].

Since we use dummies to estimate the fixed/individual effects, this model is also

known as Least-Squares Dummy Variable (LSDV) model. Hence, the terms FEM

and LSDV can be used interchangeably.
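The dummy-variable setup in model (3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `lsdv` and the small noise-free panel are hypothetical and are not taken from these notes, and NumPy's least-squares routine stands in for whatever regression package is actually used.

```python
import numpy as np

def lsdv(y, X, unit_ids):
    """OLS of y on a constant, n-1 unit dummies and the regressors X.

    The first unit (in sorted order) gets no dummy and is the comparison unit.
    Returns (intercept of the comparison unit, differential intercepts, slopes).
    """
    units = np.unique(unit_ids)
    others = units[1:]                                   # units that receive a dummy
    dummies = np.column_stack([(unit_ids == u).astype(float) for u in others])
    design = np.column_stack([np.ones(len(y)), dummies, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    n_d = len(others)
    return coef[0], coef[1:1 + n_d], coef[1 + n_d:]

# Hypothetical noise-free panel: 3 units, 4 periods each,
# true intercepts 1, 3 and 6 and a common slope of 2.
unit_ids = np.repeat([1, 2, 3], 4)
x = np.array([0.5, 1.0, 2.0, 3.5, 1.0, 4.0, 2.5, 0.0, 3.0, 1.5, 2.0, 5.0])
true_intercepts = np.array([1.0, 3.0, 6.0])
y = true_intercepts[unit_ids - 1] + 2.0 * x

alpha1, diffs, slopes = lsdv(y, x.reshape(-1, 1), unit_ids)
print(alpha1, diffs, slopes)
```

Because the toy data contain no noise, the constant recovers the intercept of the comparison unit and the dummy coefficients recover the differences from it, which is exactly the "differential intercept" reading described above.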

How to incorporate individual dummy variable in the data set?

Assume that we have 5 year data for 4 companies on three variables: one dependent

(investment) and 2 independent (value of firm and capital stock) variables.

We can stack the 5 observations for each company on the top of the other; thus giving

in all 20 observations for each of the variables in the model.

After that the individual dummies are assigned in the following fashion.

Obs.   Year   Firm ID   I        F        C       D1   D2   D3   D4
1      1940   1         74.4     2132.2   186.6   1    0    0    0
2      1941   1         113      1834.1   220.9   1    0    0    0
3      1942   1         91.9     1588     287.8   1    0    0    0
4      1943   1         61.3     1749.4   319.9   1    0    0    0
5      1944   1         56.8     1687.2   321.3   1    0    0    0
6      1940   2         461.2    4643.9   207.2   0    1    0    0
7      1941   2         512      4551.2   255.2   0    1    0    0
8      1942   2         448      3244.1   303.7   0    1    0    0
9      1943   2         499.6    4053.7   264.1   0    1    0    0
10     1944   2         547.5    4379.3   201.6   0    1    0    0
11     1940   3         361.6    2202.9   254.2   0    0    1    0
12     1941   3         472.8    2380.5   261.4   0    0    1    0
13     1942   3         445.6    2168.6   298.7   0    0    1    0
14     1943   3         361.6    1985.1   301.8   0    0    1    0
15     1944   3         288.2    1813.9   279.1   0    0    1    0
16     1940   4         28.57    628.5    26.5    0    0    0    1
17     1941   4         48.51    537.1    36.2    0    0    0    1
18     1942   4         43.34    561.2    60.8    0    0    0    1
19     1943   4         37.02    617.2    84.4    0    0    0    1
20     1944   4         37.81    626.7    91.2    0    0    0    1

For these data we get the following result:

$\hat{Y}_{it} = -52.468 + 260.69\,D_{2i} + 285.51\,D_{3i} + 49.51\,D_{4i} + 0.065\,X_{2it} + 0.056\,X_{3it}$

where $Y_{it}$ represents I, $X_{2it}$ represents F, and $X_{3it}$ represents C.

Note: Other statistics are not reported.

The intercept values of the four companies are as follows: -52.47 for Company 1; 208.22 (= 260.69 - 52.47) for Company 2; 233.04 (= 285.51 - 52.47) for Company 3; and -2.96 (= 49.51 - 52.47) for Company 4.

We may say that these differences in intercepts may be due to unique features of each

company, such as differences in management style or managerial talent.

The problem with dummy variable analysis is that it produces many explanatory variables, especially when the number of cross-sectional units is large. In such cases, the dummy variable method is not very practical.

Computer regression packages take care of the mechanics, but the loss in degrees of freedom remains the major casualty of this exercise.

The packages rarely report (LIMDEP and GRETL do not) the estimated intercepts from the dummy variable analysis, for the practical reason that there are so many of them.

Some econometric packages that support fixed effects estimation report only one intercept (GRETL does this), which can cause confusion.

Typically, the intercept reported in this case is the average of the individual-specific intercepts: (-52.47 + 208.22 + 233.04 - 2.96)/4 ≈ 96.46 in the above example.
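The intercept arithmetic can be checked with a short script. The numbers are the estimates reported above; the variable names are only illustrative.

```python
# Reported LSDV estimates: constant (Company 1) and differential intercepts.
base_intercept = -52.47
differential = {2: 260.69, 3: 285.51, 4: 49.51}

# Intercept of each company = constant + its differential intercept.
intercepts = {1: base_intercept}
for firm, d in differential.items():
    intercepts[firm] = round(base_intercept + d, 2)

# Average of the four individual-specific intercepts.
average = sum(intercepts.values()) / len(intercepts)

print(intercepts)
print(average)
```

The average of the four intercepts is the single "intercept" that some packages report in place of the individual ones.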

How to decide on the statistical significance of individual/group effects (or individual dummies)? [OR] How can we choose between OLS regression results and FEM (with individual effects) results?

We can use the restricted F test for this purpose. We use the F test because our aim is to test the joint null hypothesis that $\alpha_2 = \alpha_3 = \dots = \alpha_n = 0$ (n-1 restrictions, since one group is chosen as the base group), where n is the number of cross-sectional units [that is, the null hypothesis is that all dummy parameters except one are zero]. In the vast majority of applications, the dummy variables will be jointly significant.

For the purpose of the F test, the OLS model is considered the restricted model, in the sense that it imposes a common intercept on all the individual cross-sectional units. The FEM (with individual effects) is considered the unrestricted model.

The null hypothesis set for the purpose of testing is: the intercept terms are equal for all i [or] there is no significant difference across i's.

Failing to reject this null means that the efficient estimator is the one obtained from the model which assumes a constant intercept, i.e. OLS.

The F ratio used for the test is as follows:

$F(n-1,\; nT-n-K) = \dfrac{(R^2_{UR} - R^2_{R})/(n-1)}{(1 - R^2_{UR})/(nT-n-K)}$

where

n = number of cross-sectional units (or groups)

T = number of time periods for a single i

K = number of explanatory variables (excluding the constant)

$R^2_{UR}$ = $R^2$ value of the unrestricted regression model (FEM)

$R^2_{R}$ = $R^2$ value of the restricted regression model (OLS)

If computed F value (for n-1 numerator df and nT-n-K denominator df) is larger than

the critical value from the F table (for n-1 numerator df and nT-n-K denominator df)

we reject the null hypothesis.

Rejection of null hypothesis favours existence of individual effect in the data (and

vice versa).
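The F ratio is easy to compute by hand once the two R-squared values are known. The sketch below assumes a balanced panel; the function name `restricted_f` and the R-squared values in the example are hypothetical, chosen only to show the mechanics.

```python
def restricted_f(r2_ur, r2_r, n, T, K):
    """Restricted F statistic for H0: no individual effects.

    r2_ur: R-squared of the unrestricted model (FEM)
    r2_r:  R-squared of the restricted model (pooled OLS)
    n, T:  number of cross-sectional units and time periods
    K:     number of explanatory variables, excluding the constant
    """
    df_num = n - 1
    df_den = n * T - n - K
    f_stat = ((r2_ur - r2_r) / df_num) / ((1.0 - r2_ur) / df_den)
    return f_stat, df_num, df_den

# Hypothetical R-squared values for a panel with 4 firms, 5 years, 2 regressors.
f_stat, df1, df2 = restricted_f(r2_ur=0.90, r2_r=0.50, n=4, T=5, K=2)
print(f_stat, df1, df2)
```

The computed value is then compared with the critical value from the F table for (df1, df2) degrees of freedom.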

This is another variant of the FEM, in which only the time effect/dimension is present.

If we incorporate this effect, the panel regression model will look like:

$Y_{it} = \lambda_{1t} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it}$   (4)

In (4) we have an intercept term ($\lambda_{1t}$) specific to each time period t.

Technically, $\lambda_{1t}$ is called the time effect and is an unknown parameter to be estimated.

Model (4) is also known as a one-way FEM, since only the time effect is present.

We can account for time effects in the model by introducing time dummies (T-1 dummies, to avoid perfect collinearity) on the right-hand side of equation (4).

In such a case we write (4) as:

$Y_{it} = \gamma_0 + \gamma_2\,\text{DYear2} + \gamma_3\,\text{DYear3} + \dots + \gamma_T\,\text{DYearT} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it}$   (5)

where DYear2 indicates the dummy for year 2, DYear3 the dummy for year 3, and so on. DYear2 takes a value of 1 for observations in year 2 and 0 otherwise, etc. The dummy for year 1 is omitted.

Assignment of time dummies: Illustration for our example

Year   Firm ID   I        F        C       T1   T2   T3   T4   T5
1940   1         74.4     2132.2   186.6   1    0    0    0    0
1941   1         113      1834.1   220.9   0    1    0    0    0
1942   1         91.9     1588     287.8   0    0    1    0    0
1943   1         61.3     1749.4   319.9   0    0    0    1    0
1944   1         56.8     1687.2   321.3   0    0    0    0    1
1940   2         461.2    4643.9   207.2   1    0    0    0    0
1941   2         512      4551.2   255.2   0    1    0    0    0
1942   2         448      3244.1   303.7   0    0    1    0    0
1943   2         499.6    4053.7   264.1   0    0    0    1    0
1944   2         547.5    4379.3   201.6   0    0    0    0    1
1940   3         361.6    2202.9   254.2   1    0    0    0    0
1941   3         472.8    2380.5   261.4   0    1    0    0    0
1942   3         445.6    2168.6   298.7   0    0    1    0    0
1943   3         361.6    1985.1   301.8   0    0    0    1    0
1944   3         288.2    1813.9   279.1   0    0    0    0    1
1940   4         28.57    628.5    26.5    1    0    0    0    0
1941   4         48.51    537.1    36.2    0    1    0    0    0
1942   4         43.34    561.2    60.8    0    0    1    0    0
1943   4         37.02    617.2    84.4    0    0    0    1    0
1944   4         37.81    626.7    91.2    0    0    0    0    1

In the regression there is no dummy for the first time period, which means the constant term ($\gamma_0$) represents the intercept for the first (omitted) time period.

The other $\gamma$'s are differential intercept coefficients: they indicate how much the intercept of each period that is assigned a dummy differs from the intercept of the period that is not assigned a dummy.

The problem with time dummies is that they produce many explanatory variables, especially when the number of time periods is large. Again, the loss in degrees of freedom is the major casualty of this exercise.

How to decide on the statistical significance of time effects (or time dummies)? [OR]

How can we choose between OLS regression results and FEM (with time effects)

results?

We can use the restricted F test explained above.

The third variant of the FEM incorporates both individual and time effects, and is called the two-way model.

If we incorporate these effects, the panel regression model will look like:

$Y_{it} = \beta_{1i} + \lambda_{1t} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it}$   (6)

How to incorporate both individual and the time dummies in the data set? Follow the

procedure explained above.

How to decide on the statistical significance of model with both individual and time

effects? [OR] How can we choose between FEM with individual effect and FEM

with both individual and time effects?

We can use the restricted F test explained above.

The model with both individual and time effects is rarely used in practice due to two

reasons.

First, the cost in terms of degrees of freedom is often not justified.

Second, in those instances in which a model of the time-wise evolution of the

disturbances is desired, a more general model than the dummy variable formulation is

usually used.

The major problems with one-way or two-way FEM with dummy variables are (a) they are impractical (is it so, considering the help rendered by software packages?) when the number of cross-sectional units or time periods is very large (say, n = 1000 workers or T = 50 years); and (b) there is a cost in terms of the loss in degrees of freedom.

One way to overcome this is to run a pooled OLS regression on the time-demeaned (entity-demeaned) variables. This method avoids the inclusion of dummies by transforming all the variables using group means. In other words, the philosophy underlying this method is to eliminate the influence of the unobserved/individual effects prior to estimation.

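To see what this method involves, consider the following model with a single explanatory variable: for each i,

$y_{it} = \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, 2, \dots, T$   (1)

where $a_i$ is the individual or unobserved effect.

Now, for each i, averaging equation (1) over time, we get

$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i$   (2)

where $\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}$. A similar interpretation holds for $\bar{x}_i$ and $\bar{u}_i$.

If we subtract (2) from (1) for each t, we get

$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + u_{it} - \bar{u}_i, \quad t = 1, 2, \dots, T$

(or)

$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}, \quad t = 1, 2, \dots, T$   (3)

where $\ddot{y}_{it} = y_{it} - \bar{y}_i$ is the time-demeaned data on y, and similarly for the other variables. Note that the unobserved effect $a_i$ has been eliminated! This suggests that we should estimate (3) by pooled OLS.

But note that from (2) above we can recover the individual intercepts (the dummy coefficients) using the following formula:

$\hat{a}_i = \bar{y}_i - \hat{\beta}_1 \bar{x}_i$   (4)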

Note that OLS on (3) should not have a constant term (Why?). Even if we generate a

constant it is arbitrary.
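The equivalence between the demeaned regression and the dummy variable regression can be checked numerically. The sketch below uses the 20 observations on I, F and C from the four-company example in these notes; the helper names (`group_means`, `demean`) are hypothetical, NumPy's `lstsq` stands in for a regression package, and the LSDV step uses one dummy per firm with no common constant, which is an equivalent parameterisation of the n-1 dummy setup rather than the exact one written earlier.

```python
import numpy as np

# Investment (I), firm value (F), capital stock (C): 4 firms x 5 years, stacked by firm.
I = np.array([74.4, 113, 91.9, 61.3, 56.8, 461.2, 512, 448, 499.6, 547.5,
              361.6, 472.8, 445.6, 361.6, 288.2, 28.57, 48.51, 43.34, 37.02, 37.81])
F = np.array([2132.2, 1834.1, 1588, 1749.4, 1687.2, 4643.9, 4551.2, 3244.1, 4053.7, 4379.3,
              2202.9, 2380.5, 2168.6, 1985.1, 1813.9, 628.5, 537.1, 561.2, 617.2, 626.7])
C = np.array([186.6, 220.9, 287.8, 319.9, 321.3, 207.2, 255.2, 303.7, 264.1, 201.6,
              254.2, 261.4, 298.7, 301.8, 279.1, 26.5, 36.2, 60.8, 84.4, 91.2])
firm = np.repeat(np.arange(4), 5)          # firm index 0..3 for each observation

def group_means(v, g):
    """Mean of v within each group, one value per group (groups labelled 0..n-1)."""
    return np.array([v[g == k].mean() for k in np.unique(g)])

def demean(v, g):
    """Subtract each observation's own group mean (the within transformation)."""
    return v - group_means(v, g)[g]

# Within estimator: pooled OLS on the demeaned data, without a constant.
X_dm = np.column_stack([demean(F, firm), demean(C, firm)])
beta_within, *_ = np.linalg.lstsq(X_dm, demean(I, firm), rcond=None)

# Individual intercepts recovered as in formula (4), extended to two regressors.
a_hat = (group_means(I, firm)
         - beta_within[0] * group_means(F, firm)
         - beta_within[1] * group_means(C, firm))

# LSDV for comparison: one dummy per firm, no common constant.
D = (firm[:, None] == np.arange(4)).astype(float)
coef_lsdv, *_ = np.linalg.lstsq(np.column_stack([D, F, C]), I, rcond=None)

print(beta_within, a_hat)
print(coef_lsdv)
```

The slope estimates from the two routes coincide, and the intercepts recovered from the group means match the coefficients on the firm dummies.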

Illustration: elimination of unobserved (individual) effects via fixed effects

transformation

                  Actual data                                      Time-demeaned data
Time            Firm   I      F         C        D1  D2  D3  D4   I        F        C
1               1      74.4   2132.2    186.6    1   0   0   0    -5.08    334.02   -80.7
2               1      113    1834.1    220.9    1   0   0   0    33.52    35.92    -46.4
3               1      91.9   1588      287.8    1   0   0   0    12.42    -210.2   20.5
4               1      61.3   1749.4    319.9    1   0   0   0    -18.18   -48.78   52.6
5               1      56.8   1687.2    321.3    1   0   0   0    -22.68   -111     54
Group sum              397    8990.9    1336.5   5   0   0   0
Group average          79.5   1798.18   267.3    1   0   0   0
1               2      461    4643.9    207.2    0   1   0   0    -32.46   469.46   -39.16
2               2      512    4551.2    255.2    0   1   0   0    18.34    376.76   8.84
3               2      448    3244.1    303.7    0   1   0   0    -45.66   -930.3   57.34
4               2      500    4053.7    264.1    0   1   0   0    5.94     -120.7   17.74
5               2      548    4379.3    201.6    0   1   0   0    53.84    204.86   -44.76
Group sum              2468   20872.2   1231.8   0   5   0   0
Group average          494    4174.44   246.36   0   1   0   0
1               3      362    2202.9    254.2    0   0   1   0    -24.36   92.7     -24.84
2               3      473    2380.5    261.4    0   0   1   0    86.84    270.3    -17.64
3               3      446    2168.6    298.7    0   0   1   0    59.64    58.4     19.66
4               3      362    1985.1    301.8    0   0   1   0    -24.36   -125.1   22.76
5               3      288    1813.9    279.1    0   0   1   0    -97.76   -296.3   0.06
Group sum              1930   10551     1395.2   0   0   5   0
Group average          386    2110.2    279.04   0   0   1   0
1               4      28.6   628.5     26.5     0   0   0   1    -10.48   34.36    -33.32
2               4      48.5   537.1     36.2     0   0   0   1    9.46     -57.04   -23.62
3               4      43.3   561.2     60.8     0   0   0   1    4.29     -32.94   0.98
4               4      37     617.2     84.4     0   0   0   1    -2.03    23.06    24.58
5               4      37.8   626.7     91.2     0   0   0   1    -1.24    32.56    31.38
Group sum              195    2970.7    299.1    0   0   0   5
Group average          39.1   594.14    59.82    0   0   0   1

The regression on these time-demeaned variables produces the following result:

I_it = 0.0651 F_it + 0.0558 C_it   (all variables time-demeaned)

Now, using the coefficient estimates of F and C, we can compute the group intercepts (the dummy coefficients) by applying (4) above. For example, the intercept for firm 1 is computed as

= 79.5 - [0.0651 (1798.18) + 0.0558 (267.3)]

= 79.5 - [117.062 + 14.915]

= -52.477

The fixed effects transformation is also called the within transformation. This can be

done in a single command in STATA econometrics software package.

Since the within transformation or within effect model does not use the

dummies, it has larger degrees of freedom, smaller MSE (mean

square error), and incorrect (smaller) standard errors of

parameters than those of LSDV. Also, R-squared of the within effect

model (which is small compared to LSDV model) is not correct

because an intercept is suppressed.

A pooled OLS estimator that is based on the time-demeaned variable is called the

fixed effects estimator or the within estimator. The latter name comes from the fact

that OLS on (3) uses the time variation in y and x within each cross-sectional

observation.

The method of time-demeaned data can be extended to time-period dummies as

well. In the presence of time effect, the within effect model involves 3 steps: (i)

compute time average of all variables (i.e. take average of variables across time

instead of group-wise); (ii) compute deviation of time-wise observations from the

time average and (iii) use (ii) for running OLS.

Illustration for the above data

Time                 Firm   I      F        C
1                    1      74.4   2132.2   186.6
1                    2      461    4643.9   207.2
1                    3      362    2202.9   254.2
1                    4      28.6   628.5    26.5
Average for year 1          ***    ***      ***

The major defect of this procedure is that, in the process of eliminating the individual effects, any explanatory variable that is constant over time for all i is swept away as well.

Hence, we cannot include (i.e. obtain estimates for) variables such as gender or a city's distance from a river as explanatory variables in the model. In such a case, we should use the random effects model.

There is yet another way to estimate fixed effects model, called

between group effect model. This model uses aggregate

information, group means of variables (i.e. equation 2 above). In

other words, the unit of analysis is not an individual observation, but

groups or subjects. The number of observations jumps down to n

from nT.

Illustration for the above data:

Firm               I      F         C
Group 1 average    79.5   1798.18   267.3
Group 2 average    494    4174.44   246.36
Group 3 average    386    2110.2    279.04
Group 4 average    39.1   594.14    59.82

The between effect model yields parameter estimates that differ from those of LSDV and the within effect model.

However, the between estimator ignores important information on

how the variables change over time.
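As a rough sketch of the between effect regression, the group averages from the table above can be regressed on each other with an ordinary least-squares call. The variable names are illustrative, and including a constant term is an assumption of this sketch. With only four groups and three parameters, a single degree of freedom remains, so the estimates are shown for mechanics only.

```python
import numpy as np

# Group (firm) averages of I, F and C taken from the illustration table.
I_bar = np.array([79.5, 494.0, 386.0, 39.1])
F_bar = np.array([1798.18, 4174.44, 2110.2, 594.14])
C_bar = np.array([267.3, 246.36, 279.04, 59.82])

# Between regression: one observation per group, constant included.
X = np.column_stack([np.ones(len(I_bar)), F_bar, C_bar])
coef, *_ = np.linalg.lstsq(X, I_bar, rcond=None)

# Number of observations drops from nT = 20 to n = 4.
resid_df = len(I_bar) - X.shape[1]
print(coef, resid_df)
```

The drop from 20 observations to 4 is the practical cost of working with group means instead of individual observations.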

In the presence of time effect, the between effect model involves

regressing time means of dependent variables (i.e. take average of

variables across time instead of group-wise) on those of

independent variables.

Note that between effect model is not valid in case of two-way fixed

effect model (why?)

Dummy variable regression is considered as a traditional way of applying fixed

effects panel regression.

If the researcher sees the unobserved/individual effect as a parameter to be estimated

for each i then dummy variable regression method is fine.

Anyway, the estimated intercepts from the dummy variable analysis are of only

occasional interest. Example: if we want to pick a particular firm or city to see

whether its dummy intercept coefficient is above or below average value in the

sample.

Also, an interesting feature of dummy variable regression is that it gives us exactly

the same estimates of the explanatory variables that we would obtain from the

regression on time-demeaned data.

But, one major benefit of the dummy variable regression is that the R-squared from

such regression is usually rather high. This occurs because we are including a

dummy variable for each cross-sectional unit, which explains much of the variation in

the data.

On the other hand R-squared of the regression on time-demeaned data would be low.

But this method gains more degrees of freedom compared to dummy variable

regression.

An examination of results of computer software packages namely GRETL and

LIMDEP reveal that they both use dummy variable regression to estimate fixed effect

panel model.

6.1. Why or when REM?

One of the major assumptions underlying FEM is that the unobserved effect may be correlated with one or more of the explanatory variables (i.e. the X's).

If, instead, the unobserved effect is uncorrelated with the explanatory variables, then the random effects model (REM) is more attractive or appropriate.

When the unobserved effect is thought to be uncorrelated with the explanatory variables, it can be left in the error term, and the resulting serial correlation over time can be handled by GLS estimation.

6.2. Illustration

Let us consider the following one-way (individual effect) FEM:

$Y_{it} = \beta_{1i} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it}$   (1)

Suppose in this case the sampled cross-sectional units were drawn from a large

population (say the case of longitudinal data set) [OR] the cross-sectional units

included in our sample are a drawing from a much larger universe of such units.

In such cases we need to treat $\beta_{1i}$ as a random variable, like $u_{it}$, and not as an individual-specific constant that is fixed over time.

If we treat $\beta_{1i}$ as a random variable, it takes the following form:

$\beta_{1i} = \beta_1 + \varepsilon_i, \quad i = 1, 2, 3, \dots, N$   (2)

where $\beta_1$ is the mean of the intercepts ($\beta_{1i}$) of all cross-sectional units and $\varepsilon_i$ is a random error term that characterizes the i-th unit and is constant through time. What (2) essentially implies is that the intercept of an individual cross-sectional unit is the mean of the intercepts of all cross-sectional units plus or minus an error term (the error term represents the random deviation of the individual intercept from the mean of the intercepts of all cross-sectional units). Hence, the individual differences in the intercept value of each company are reflected in the error term $\varepsilon_i$.

Now, substituting (2) into (1), we obtain:

$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \varepsilon_i + u_{it}$

$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + w_{it}$   (3)

where $w_{it} = \varepsilon_i + u_{it}$.

Note that the composite error term $w_{it}$ consists of two components: $\varepsilon_i$, which is the cross-section or individual-specific error component, and $u_{it}$, which is the combined time-series and cross-section error component. $\varepsilon_i$ is assumed to be independent of $u_{it}$.

Since the composite error term consists of two or more (if we include time effects) error terms, REM is also called the error components model.

Notice carefully the difference between FEM and REM. In FEM each cross-sectional unit has its own (fixed) intercept value. In REM, on the other hand, the intercept $\beta_1$ represents the mean value of all the cross-sectional intercepts, and the error component $\varepsilon_i$ represents the random deviation of the individual intercept from this mean value.

Now, since $\varepsilon_i$ is in the composite error in each time period, the $w_{it}$ are serially correlated across time.

That is, correlation $(w_{it}, w_{is}) = \sigma^2_\varepsilon / (\sigma^2_\varepsilon + \sigma^2_u)$, $t \neq s$. Here, $\sigma^2_\varepsilon$ = Variance($\varepsilon_i$) and $\sigma^2_u$ = Variance($u_{it}$).

This serial correlation problem can be handled by making use of the GLS method. As the usual pooled OLS standard errors ignore this correlation, they will not be correct.

Deriving the GLS transformation that eliminates serial correlation in the errors requires sophisticated matrix algebra, but the transformation itself is simple.

Define

$\theta = 1 - \sqrt{\dfrac{\sigma^2_u}{\sigma^2_u + T\sigma^2_\varepsilon}}$

where $\sigma^2_\varepsilon$ is the variance of the individual effect and $\sigma^2_u$ is the variance of $u_{it}$. Here $\theta$ lies between zero and one. Then the transformed equation turns out to be

$y_{it} - \theta\bar{y}_i = \beta_1(1-\theta) + \beta_2(x_{2it} - \theta\bar{x}_{2i}) + \beta_3(x_{3it} - \theta\bar{x}_{3i}) + (w_{it} - \theta\bar{w}_i)$   (3)

This is a very interesting equation, as it involves quasi-demeaned data on each variable.

The (random effects) transformation here involves subtracting a fraction of the group/time average from each variable, where the fraction depends on $\sigma^2_u$, $\sigma^2_\varepsilon$ and the number of time periods, T.

In contrast, the fixed effects estimator subtracts the complete/full time or group averages from the corresponding variable.

Thus, the GLS estimator is simply the pooled OLS regression on quasi-demeaned data [equation (3)].

The transformation in (3) allows for the inclusion of explanatory variables that are constant over time in the regression. This is because the quasi-time-demeaning only removes a fraction of the group or time average, and not the whole group or time average.

This is one major advantage the random effects model has over the fixed effects model.

Equation (3) allows us to relate the RE estimator to both pooled OLS and fixed effects.

Pooled OLS is obtained when $\theta = 0$, and FE is obtained when $\theta = 1$.

In practice, the estimate $\hat{\theta}$ is never zero or one. But if $\hat{\theta}$ is close to zero, the RE estimates will be close to the pooled OLS estimates. This is the case when the unobserved effect is relatively unimportant (because it has a small variance relative to $\sigma^2_u$).

It is more common for $\sigma^2_\varepsilon$ to be large relative to $\sigma^2_u$, in which case $\hat{\theta}$ will be closer to unity.

As T gets large, $\theta$ tends to one, and this makes the RE and FE estimates very similar.

Thus the value of the estimated transformation parameter indicates whether the estimates are likely to be closer to the pooled OLS or the fixed effects estimates.
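The behaviour of the transformation parameter can be illustrated with a small sketch. It assumes the usual definition, one minus the square root of the ratio of the idiosyncratic error variance to that variance plus T times the variance of the individual effect. The function names and the variance values are hypothetical.

```python
import math

def re_theta(sigma2_u, sigma2_eps, T):
    """Random effects transformation parameter for a balanced panel.

    sigma2_u:   variance of the idiosyncratic error u_it
    sigma2_eps: variance of the individual effect
    T:          number of time periods
    """
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + T * sigma2_eps))

def quasi_demean(values, theta):
    """Subtract the fraction theta of the unit's own time average from each observation."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]

# No individual effect: theta = 0, so the data are left as they are (pooled OLS).
print(re_theta(1.0, 0.0, 5), quasi_demean([1.0, 2.0, 3.0], 0.0))

# theta grows towards one as T increases (RE approaches FE).
print(re_theta(1.0, 1.0, 3), re_theta(1.0, 1.0, 8), re_theta(1.0, 1.0, 99))

# theta = 1 reproduces full demeaning (the fixed effects transformation).
print(quasi_demean([1.0, 2.0, 3.0], 1.0))
```

With the individual-effect variance set to zero the data pass through unchanged, and with the parameter equal to one the full group mean is removed, which mirrors the pooled OLS and FE limits described above.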

The REM with a time effect and the REM with both individual and time effects are straightforward extensions of the REM with an individual effect.

The REM with a time effect takes the following form:

$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \lambda_t + u_{it}$

$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + v_{it}$

where $v_{it} = \lambda_t + u_{it}$.

The REM with both individual and time effects takes the following form:

$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \varepsilon_i + \lambda_t + u_{it}$

$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + W_{it}$

where $W_{it} = \varepsilon_i + \lambda_t + u_{it}$.

Choosing between OLS and REM/FEM:

Whether the data support the random effects model can be verified with the help of the Lagrange multiplier (LM) test developed by Breusch and Pagan (the Breusch-Pagan LM test). The test procedure is as follows:

Set H0: the variance of the group effects is zero, i.e. $\sigma^2_\varepsilon = 0$ (or Correlation [$w_{it}$, $w_{is}$] = 0). [In correlation $(w_{it}, w_{is}) = \sigma^2_\varepsilon / (\sigma^2_\varepsilon + \sigma^2_u)$, $t \neq s$, if $\sigma^2_\varepsilon = 0$ then the correlation becomes zero.]

Under H0, the LM test statistic follows a chi-squared distribution with one degree of freedom (two degrees of freedom in case the model has both individual and time effects).

If the calculated LM test statistic exceeds the 95 percent critical/table value for chi-squared with one degree of freedom (which is 3.841; use two degrees of freedom in the case of the two-way random effects model), we reject H0.

Rejection of H0 implies that the classical regression model (OLS) with a single constant term is inappropriate for the data. The result of the test is then to reject H0 in favour of the random effects model.

The null hypothesis of the one-way random time effect is that the variance component for time is zero, i.e. $\sigma^2_\lambda = 0$.

The two-way random effects model has the null hypothesis that the variance components for groups and time are all zero.
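The notes do not spell out the LM statistic itself. For a balanced panel with one-way (individual) effects, the standard textbook form, computed from the pooled OLS residuals $e_{it}$, is $LM = \frac{nT}{2(T-1)}\left[\frac{\sum_i (\sum_t e_{it})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2$. The sketch below implements that form; the function name and the residuals in the example are hypothetical.

```python
def breusch_pagan_lm(residuals):
    """Breusch-Pagan LM statistic for one-way random effects (balanced panel).

    residuals: list of n lists, each holding the T pooled-OLS residuals of one unit.
    """
    n = len(residuals)
    T = len(residuals[0])
    sum_sq_of_unit_sums = sum(sum(e_i) ** 2 for e_i in residuals)
    sum_of_squares = sum(e ** 2 for e_i in residuals for e in e_i)
    return n * T / (2.0 * (T - 1)) * (sum_sq_of_unit_sums / sum_of_squares - 1.0) ** 2

CHI2_1DF_CRIT_95 = 3.841   # 95 percent critical value, chi-squared with 1 df

# Hypothetical residuals: each unit's residuals share a sign, which points to a unit effect.
resid = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
lm = breusch_pagan_lm(resid)
print(lm, lm > CHI2_1DF_CRIT_95)
```

In this toy case the residuals of each unit move together, so the statistic exceeds the critical value and H0 (no random individual effect) would be rejected.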

An inevitable question in panel data models is: which of the two models FEM &

REM should be used?

The random effect model is recommended under the following circumstances:

(a) If the key explanatory variable is constant over time (e.g. education

qualification), we cannot use FE to estimate its effects on dependent variable.

(b) If we are willing to assume the unobserved effect is uncorrelated with

explanatory variables then we can only use random effects model.

(c) If the key policy variable is set experimentally, then random effects would

be appropriate for estimating the coefficients. For example, for estimating the

effect of class size on class performance random effects would be appropriate

if each year children are randomly assigned to classes of different sizes.

(d) If we treat (or believe) our sample as a random drawings from a large

population (the case of a wide longitudinal data set), the REM has some

intuitive appeal.

(e) When number of cross-sectional units (N) is big, REM is preferred

The fixed effect model is recommended under the following circumstances:

(a) If we believe the unobserved effect is correlated with the explanatory variables, then we can only use the fixed effects model. In other words, because fixed effects allows arbitrary correlation between the unobserved effect and the explanatory variables, while random effects does not, FE is widely thought to be a more convincing tool for estimating ceteris paribus effects. In most cases the regressors are themselves outcomes of choice processes and are likely to be correlated with unobserved effects (see note 2).

(b) In some applications of panel data methods, we cannot treat our sample as a random sample from a large population, especially when the unit of observation is a large geographical unit (say, states or provinces). Then it often makes sense to think of each unobserved effect ($a_i$) as a separate intercept to estimate for each cross-sectional unit. In this case, we use fixed effects. Hence, FE is almost always much more convincing than RE for policy analysis using aggregate data.

(c) If T (the number of time-series observations) is large and N (the number of cross-sectional units) is small, there is likely to be little difference in the values of the parameters estimated by the FEM and REM. Hence the choice here is based on computational convenience. On this score, FEM may be preferable.

A readymade solution to this choice problem is the Hausman specification test. This

test compares the fixed versus random effects under the null hypothesis that the

individual or unobserved effects are uncorrelated with the other regressors in the

model.

If the Hausman test rejects the null hypothesis, the fixed effect model is preferred over the random effect model (rejection implies that the unobserved effects are correlated with at least one explanatory variable).

Null hypothesis is rejected if the calculated Hausman test statistic is higher than the

critical value from the chi-squared table (the Hausman test statistic has an asymptotic

chi-squared distribution) with k degrees of freedom (where k is the number of

explanatory variables in the model excluding the constant term).
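The notes do not give the Hausman statistic explicitly. In its standard form it is $H = (\hat{b}_{FE} - \hat{b}_{RE})' [\hat{V}_{FE} - \hat{V}_{RE}]^{-1} (\hat{b}_{FE} - \hat{b}_{RE})$, computed over the k slope coefficients. The sketch below implements that form with NumPy; the function name and all coefficient and covariance values are hypothetical.

```python
import numpy as np

def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman statistic comparing FE and RE slope estimates.

    b_fe, b_re: slope estimates (constant excluded) from FE and RE
    V_fe, V_re: their estimated covariance matrices
    Returns (statistic, degrees of freedom k).
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    stat = float(d @ np.linalg.solve(V_diff, d))
    return stat, d.size

CHI2_2DF_CRIT_95 = 5.991   # 95 percent critical value, chi-squared with 2 df

# Hypothetical estimates for a model with k = 2 explanatory variables.
stat, k = hausman(b_fe=[1.0, 2.0], b_re=[0.5, 2.0],
                  V_fe=np.diag([0.5, 0.5]), V_re=np.diag([0.25, 0.25]))
print(stat, k, stat > CHI2_2DF_CRIT_95)
```

Here the statistic is below the critical value, so the null of no correlation between the unobserved effect and the regressors would not be rejected and the random effects estimates would be retained.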

Note 2. Consider an example. Suppose we have a random sample of a large number of individuals and we want to model their wage or earnings function. Suppose earnings are a function of education, work experience, etc. Now if we let $\varepsilon_i$ stand for innate ability, family background, etc., then when we model the earnings function including $\varepsilon_i$, it is very likely to be correlated with education, for innate ability and family background are often crucial determinants of education. As Wooldridge contends, "In many applications, the whole reason for using panel data is to allow the unobserved effect [i.e., $\varepsilon_i$] to be correlated with the explanatory variables."

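Fixed Effect Model versus Random Effect Model

Role of dummies. Fixed effect model: dummies are considered a part of the intercept. Random effect model: dummies act as an error term.

Source of differences. Fixed effect model: the difference among groups (or time periods) lies in the intercepts. Random effect model: estimates variance components for groups and error, assuming the same intercept and slopes; the difference among groups (or time periods) lies in the variance of the error term.

Estimation. Fixed effect model: use the least squares dummy variable (LSDV), within effect and between effect estimation methods; in short, OLS regressions with dummies are, in fact, fixed effect models. Random effect model: use the generalized least squares (GLS) method.

Testing. Fixed effect model: fixed effects are tested by the (incremental) F test. Random effect model: random effects are tested by the Lagrange Multiplier (LM) test.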
