
Panel Data Regression Methods

1) What is Panel Data?


In panel data we have elements of both time series and cross-sectional data.
That is, the same cross-sectional unit (say a firm/state/country) is surveyed over time.
Example: data on GDP for 3 countries for a period of 5 years (see Table).

Year    India    USA    Pakistan
1991    ***      ***    ***
1992    ***      ***    ***
1993    ***      ***    ***
1994    ***      ***    ***
1995    ***      ***    ***

In the typical panel, there are a large number of cross-sectional units and only a few
periods, though the opposite case is also relevant.
Such a data set focuses on cross-sectional variation or heterogeneity.
Panel data are also known as pooled data or longitudinal data. Strictly speaking,
however, not all pooled data are panel data: a pooled data set becomes a complete
panel only when we include individual and/or time effects.

2) Types of Panel Data


(A) Balanced panel: Each cross-sectional unit has the same number of time series
observations.

Year    Firm ID    Investment (Y)    Value of Firm (X1)    Capital Stock (X2)
1991    1          ***               ***                   ***
1992    1          ***               ***                   ***
1993    1          ***               ***                   ***
1994    1          ***               ***                   ***
1995    1          ***               ***                   ***
1991    2          ***               ***                   ***
1992    2          ***               ***                   ***
1993    2          ***               ***                   ***
1994    2          ***               ***                   ***
1995    2          ***               ***                   ***
1991    3          ***               ***                   ***
1992    3          ***               ***                   ***
1993    3          ***               ***                   ***
1994    3          ***               ***                   ***
1995    3          ***               ***                   ***

(B) Unbalanced panel: Number of observations/time-periods differs among
panel members.

Year    Firm ID    Investment (Y)    Value of Firm (X2)    Capital Stock (X3)
1991    1          ***               ***                   ***
1992    1          ***               ***                   ***
1993    1          ***               ***                   ***
1991    2          ***               ***                   ***
1992    2          ***               ***                   ***
1993    2          ***               ***                   ***
1994    2          ***               ***                   ***
1991    3          ***               ***                   ***
1992    3          ***               ***                   ***
1993    3          ***               ***                   ***
1994    3          ***               ***                   ***
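The distinction between the two layouts can be checked mechanically. The sketch below is only an illustration: the function name `is_balanced` and the two ID lists are hypothetical, and the check only counts rows per unit (it does not verify that the units cover the same calendar years).

```python
from collections import Counter


def is_balanced(unit_ids):
    """Return True when every cross-sectional unit has the same number of rows."""
    counts = Counter(unit_ids)
    return len(set(counts.values())) <= 1


# Firm IDs in stacked (long) form, mirroring the two layouts above.
balanced_ids = [1] * 5 + [2] * 5 + [3] * 5      # 5 years for each of 3 firms
unbalanced_ids = [1] * 3 + [2] * 4 + [3] * 4    # firm 1 has only 3 years

print(is_balanced(balanced_ids))
print(is_balanced(unbalanced_ids))
```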

3) Advantages of Panel Data


It takes into account heterogeneity across individual cross-sectional units and through time. This is done by including individual- and time-specific variables.
It allows inclusion of unobserved effects as explanatory variables in the model. Exclusion of such effects could cause omitted variable bias.
It helps to examine issues that cannot be studied in either cross-sectional or time-series settings alone.
It gives more informative data, more variability, less collinearity among variables, more degrees of freedom and more efficiency.
It is better suited to studying the dynamics of change, as it uses repeated cross sections of observations.
It enables us to study more complicated behavioural models (e.g. technological change).
It allows the researcher far greater flexibility in modelling differences in behaviour across individual cross-sectional units (see one-way fixed effects model).
Using a single cross section disregards much useful information in other time periods. On the other hand, using time-series data alone disregards useful information on other cross-sectional units.

4) Methods of Estimation

Panel data regression methods fall into two families:
Fixed Effect Methods (FEM): one-way FEM and two-way FEM
Random Effect Methods (REM): one-way REM and two-way REM

5) Fixed Effects Model (FEM)


Consider the following simple (panel) regression model
$$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it} \qquad (1)$$
Where i stands for the i-th cross-sectional unit or observation (e.g. firm or state); t stands
for the t-th time period; and $u_{it}$ is the common error term.
Problem with this model: It ignores two important aspects which are relevant in
panel data context
First, it doesn't consider the space/individual/group dimension: that is, factors which are
specific to each cross-sectional unit (individuals/firms/states/countries) but remain
unchanged over time.
Examples: geographical location of a state, ability of an individual, managerial
style/philosophy of a company, fixed capital of a firm, historical factors etc.
Second, it ignores the time dimension: that is, factors which are specific to the period in
which they occur but are not carried across periods within a cross-sectional unit.
Examples: technological change, changes in govt. regulatory policies, economic boom
or bust, global economic slowdown, weather factor and so on.
Together, group and time dimensions are called unobserved effects/variables/factors.


Thus, model (1) is the usual OLS regression and in panel data context is called pooled
regression model.
Due to omission of group and time dimensions (specification error), the OLS results
might not be fully reliable.
Hence, we need to find some way to take into account group and time dimension in
model (1).
How to accomplish this?

5.1. The case of group dimension:


We let the intercept of (1) vary across cross-sectional units but don't allow it to
change over time for any given cross-sectional unit.
If we do so, model (1) becomes
$$Y_{it} = \beta_{1i} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it} \qquad (2)$$
In (2) we have an intercept term ($\beta_{1i}$) specific to each individual cross-sectional unit i.
Technically, $\beta_{1i}$ is called the individual effect and is an unknown parameter to be
estimated.
Note that in model (1) we have a common intercept term ($\beta_1$), which is assumed to be the
same across all cross-sectional units.
By incorporating the group dimension we acknowledge that there is heterogeneity, i.e.
significant differences, among cross-sectional units, and that these differences are captured
through the individual-specific constant term $\beta_{1i}$.
Model (2) is known as the one-way Fixed Effects Model since only the individual effect
is present.
The term "fixed effects" is due to the fact that (a) we consider $\beta_{1i}$ as a group-specific
constant term in the regression model; and (b) each cross-sectional unit's intercept
does not vary over time; that is, it is time invariant.

How to allow for Group/Individual effect in econometric model?


We can easily allow this using the dummy variable technique.
Now, allowing for dummy variables, model (2) can be written as
$$Y_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \dots + \alpha_n D_{ni} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it} \qquad (3)$$
Where $D_{2i}$ = 1 if the observation belongs to cross-sectional unit 2, zero otherwise;
$D_{3i}$ = 1 if the observation belongs to cross-sectional unit 3, zero otherwise;
$D_{ni}$ = 1 if the observation belongs to the nth cross-sectional unit, zero otherwise.
Note: If we have n cross-sectional units/groups, we use n-1 dummies to avoid falling
into the dummy-variable trap (i.e. a situation of perfect collinearity). Hence there is no
dummy for the first cross-sectional unit, which means $\alpha_1$ represents the intercept of the
first (i.e. the omitted) cross-sectional unit.
The other $\alpha$s represent differential intercept coefficients, indicating how much the
intercept of each unit that is assigned a dummy differs from the intercept of the unit that
is not assigned a dummy.
In short, the cross-sectional unit which is not assigned a dummy becomes the comparison
cross-sectional unit.
Of course we are free to choose any cross-sectional unit as the comparison cross-sectional
unit [In any case, the computer programme will typically omit the dummy for the first
cross-sectional unit (i.e. select the first i as the comparison i) and compute the results].
Since we use dummies to estimate the fixed/individual effects, this model is also
known as Least-Squares Dummy Variable (LSDV) model. Hence, the terms FEM
and LSDV can be used interchangeably.
How to incorporate individual dummy variable in the data set?
Assume that we have 5 year data for 4 companies on three variables: one dependent
(investment) and 2 independent (value of firm and capital stock) variables.
We can stack the 5 observations for each company on the top of the other; thus giving
in all 20 observations for each of the variables in the model.
After that the individual dummies are assigned in the following fashion.

Obs*   Year   Firm ID   I        F        C       D1   D2   D3   D4
1      1940   1         74.4     2132.2   186.6   1    0    0    0
2      1941   1         113      1834.1   220.9   1    0    0    0
3      1942   1         91.9     1588     287.8   1    0    0    0
4      1943   1         61.3     1749.4   319.9   1    0    0    0
5      1944   1         56.8     1687.2   321.3   1    0    0    0
6      1940   2         461.2    4643.9   207.2   0    1    0    0
7      1941   2         512      4551.2   255.2   0    1    0    0
8      1942   2         448      3244.1   303.7   0    1    0    0
9      1943   2         499.6    4053.7   264.1   0    1    0    0
10     1944   2         547.5    4379.3   201.6   0    1    0    0
11     1940   3         361.6    2202.9   254.2   0    0    1    0
12     1941   3         472.8    2380.5   261.4   0    0    1    0
13     1942   3         445.6    2168.6   298.7   0    0    1    0
14     1943   3         361.6    1985.1   301.8   0    0    1    0
15     1944   3         288.2    1813.9   279.1   0    0    1    0
16     1940   4         28.57    628.5    26.5    0    0    0    1
17     1941   4         48.51    537.1    36.2    0    0    0    1
18     1942   4         43.34    561.2    60.8    0    0    0    1
19     1943   4         37.02    617.2    84.4    0    0    0    1
20     1944   4         37.81    626.7    91.2    0    0    0    1

* The number of observations N equals the number of years multiplied by the number of cross-sectional units (here 5 x 4 = 20).
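The dummy columns in the table can be generated rather than typed by hand. The following is a minimal sketch, assuming the data are stacked firm by firm as above; the helper name `unit_dummies` is hypothetical. It drops the dummy of the base unit, so only n-1 columns are produced, which is what the regression needs to avoid the dummy-variable trap.

```python
def unit_dummies(unit_ids, base=None):
    """Build n-1 dummy columns; the base unit (default: smallest ID) gets none."""
    units = sorted(set(unit_ids))
    base = units[0] if base is None else base
    kept = [u for u in units if u != base]
    # One row per observation, one 0/1 entry per retained unit.
    rows = [[1 if uid == u else 0 for u in kept] for uid in unit_ids]
    return kept, rows


# Four firms observed for five years each, stacked firm by firm.
firm_ids = [firm for firm in (1, 2, 3, 4) for _ in range(5)]
kept, rows = unit_dummies(firm_ids)

print(kept)       # units that received a dummy
print(rows[0])    # an observation of the base firm
print(rows[5])    # first observation of firm 2
```

The same helper could be applied to the year column to obtain T-1 time dummies.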


For this data we get the following result
$$\hat{Y}_{it} = -52.468 + 260.69\,D_{2i} + 285.51\,D_{3i} + 49.51\,D_{4i} + 0.065\,X_{2it} + 0.056\,X_{3it}$$
Where $Y_{it}$ represents I; $X_{2it}$ represents F; and $X_{3it}$ represents C.
Note: Other statistics are not reported.
The intercept values of the four companies are as follows: -52.47 for Company 1;
208.22 (= 260.69 - 52.47) for Company 2; 233.04 (= 285.51 - 52.47) for Company 3;
and -2.96 (= 49.51 - 52.47) for Company 4.
We may say that these differences in intercepts may be due to unique features of each
company, such as differences in management style or managerial talent.
The problem with dummy variable analysis is that it produces many explanatory
variables, especially when the number of cross-sectional observations is plenty. Thus,
in such cases, dummy variable method is not very practical.
However, computer regression packages take care of this. But loss in degrees of
freedom is the major casualty of this exercise.
But the packages rarely produce (LIMDEP & GRETL do not) the estimated intercepts
from the dummy variable analysis for the practical reason that there are so many such
intercepts.

Some econometric packages that support fixed effects estimation report only one
intercept (GRETL does this), which can cause confusion.
Typically, the intercept reported in this case is the average of the individual-specific
intercepts: (-52.47 + 208.22 + 233.04 - 2.96)/4 = 96.46 in the above example.
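The intercept arithmetic can be reproduced in a few lines. This sketch simply takes the reported LSDV coefficients as given inputs; the variable names are illustrative.

```python
# Reported LSDV estimates: base intercept (Company 1) and differential intercepts.
base_intercept = -52.468
differentials = {2: 260.69, 3: 285.51, 4: 49.51}

# Each company's own intercept = base intercept + its differential coefficient.
intercepts = {1: base_intercept}
for company, diff in differentials.items():
    intercepts[company] = base_intercept + diff

# The single intercept some packages report is the average of these values.
average_intercept = sum(intercepts.values()) / len(intercepts)

for company in sorted(intercepts):
    print(company, round(intercepts[company], 2))
print("average", round(average_intercept, 2))
```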
How to decide on the statistical significance of individual/group effects (or individual
dummies)? [OR] How can we choose between OLS regression results and FEM (with
individual effects) results?
We can use the restricted F test for the purpose. We use the F test because our aim is to
test the joint null hypothesis that $\alpha_2 = \alpha_3 = \dots = \alpha_n = 0$ (n-1 restrictions,
since one group is chosen as the base group), where n is the number of cross-sectional
units; in other words, the null hypothesis is that all differential dummy coefficients are
zero. In the vast majority of applications, the dummy variables will be jointly significant.
For the purpose of the F test, the OLS model is considered a restricted model in the
sense that it imposes a common intercept on all the individual cross-sectional units.
The FEM (with individual effects) is considered as an unrestricted model.
The null hypothesis set for the purpose of testing is: Intercept terms are equal for
all i [or] No significant difference across i's.
If this null is not rejected, the efficient estimator is the one obtained from the
model which assumes a constant intercept, i.e. OLS.
The F ratio used for the test is as follows:
$$F(n-1,\; nT-n-K) = \frac{(R^2_{UR} - R^2_{R})/(n-1)}{(1 - R^2_{UR})/(nT-n-K)}$$
Where n = number of cross-sectional units (or groups)
T = number of time periods for a single i
K = number of explanatory variables (excluding the constant)
$R^2_{UR}$ = $R^2$ value of the unrestricted regression model (FEM)
$R^2_{R}$ = $R^2$ value of the restricted regression model (OLS)
If the computed F value is larger than the critical value from the F table (for n-1
numerator df and nT-n-K denominator df), we reject the null hypothesis.
Rejection of null hypothesis favours existence of individual effect in the data (and
vice versa).
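The F ratio above is easy to compute once the two R-squared values are known. The sketch below is a direct transcription of the formula; the function name `restricted_f` and the R-squared inputs (0.95 and 0.75) are hypothetical and are not results from the example data.

```python
def restricted_f(r2_ur, r2_r, n, T, K):
    """F statistic for H0: all n-1 differential intercepts are zero."""
    df_num = n - 1              # number of restrictions
    df_den = n * T - n - K      # residual df of the unrestricted (FEM) model
    f_value = ((r2_ur - r2_r) / df_num) / ((1.0 - r2_ur) / df_den)
    return f_value, df_num, df_den


# Hypothetical R-squared values for a panel with n = 4 firms, T = 5 years, K = 2.
f_value, df_num, df_den = restricted_f(r2_ur=0.95, r2_r=0.75, n=4, T=5, K=2)
print(round(f_value, 3), df_num, df_den)
```

The resulting value is then compared with the critical value from an F table for (df_num, df_den) degrees of freedom.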

5.2. The case of time dimension:


This is another variant of FEM, in which only the time effect/dimension is present.

If we incorporate this effect, the panel regression model will look like:
$$Y_{it} = \gamma_{1t} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it} \qquad (4)$$
In (4) we have an intercept term ($\gamma_{1t}$) specific to each time period t.
Technically, $\gamma_{1t}$ is called the time effect and is an unknown parameter to be estimated.
Model (4) is also known as a one-way FEM since only the time effect is present.
We could account for time effects in the model by introducing time dummies (T-1
dummies to avoid perfect collinearity) on the right-hand side of equation (4).
In such a case we write (4) as:
$$Y_{it} = \gamma_0 + \gamma_2\,\mathit{DYear}_2 + \gamma_3\,\mathit{DYear}_3 + \dots + \gamma_T\,\mathit{DYear}_T + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it} \qquad (5)$$
Where DYear2 is the dummy for year 2, DYear3 is the dummy for year 3, and
so on. DYear2 takes a value of 1 for observations in year 2 and 0 otherwise, etc.
Assignment of time dummies: Illustration for our example
Year   Firm ID   I        F        C       T1   T2   T3   T4   T5
1940   1         74.4     2132.2   186.6   1    0    0    0    0
1941   1         113      1834.1   220.9   0    1    0    0    0
1942   1         91.9     1588     287.8   0    0    1    0    0
1943   1         61.3     1749.4   319.9   0    0    0    1    0
1944   1         56.8     1687.2   321.3   0    0    0    0    1
1940   2         461.2    4643.9   207.2   1    0    0    0    0
1941   2         512      4551.2   255.2   0    1    0    0    0
1942   2         448      3244.1   303.7   0    0    1    0    0
1943   2         499.6    4053.7   264.1   0    0    0    1    0
1944   2         547.5    4379.3   201.6   0    0    0    0    1
1940   3         361.6    2202.9   254.2   1    0    0    0    0
1941   3         472.8    2380.5   261.4   0    1    0    0    0
1942   3         445.6    2168.6   298.7   0    0    1    0    0
1943   3         361.6    1985.1   301.8   0    0    0    1    0
1944   3         288.2    1813.9   279.1   0    0    0    0    1
1940   4         28.57    628.5    26.5    1    0    0    0    0
1941   4         48.51    537.1    36.2    0    1    0    0    0
1942   4         43.34    561.2    60.8    0    0    1    0    0
1943   4         37.02    617.2    84.4    0    0    0    1    0
1944   4         37.81    626.7    91.2    0    0    0    0    1

No dummy for the first time period enters the regression (T1 is shown in the table only
for completeness), which means $\gamma_0$ represents the intercept for the first (i.e. the
omitted) time period.
The other $\gamma$s represent differential intercept coefficients, indicating how much the
intercept of each period that is assigned a dummy differs from the intercept of the period
that is not assigned a dummy.


The problem with time dummies is that it produces many explanatory variables,
especially when the number of time periods is plenty. Also, loss in degrees of
freedom is the major casualty of this exercise.
How to decide on the statistical significance of time effects (or time dummies)? [OR]
How can we choose between OLS regression results and FEM (with time effects)
results?
We can use the restricted F test explained above.

5.3. The case of individual and time dimension:


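The third variant of FEM incorporates both individual and time effects and is called the
two-way model.
If we incorporate these effects, the panel regression model will look like:
$$Y_{it} = \beta_{1i} + \gamma_{1t} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it} \qquad (6)$$
where $\beta_{1i}$ is the individual effect and $\gamma_{1t}$ is the time effect.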
How to incorporate both individual and the time dummies in the data set? Follow the
procedure explained above.
How to decide on the statistical significance of model with both individual and time
effects? [OR] How can we choose between FEM with individual effect and FEM
with both individual and time effects?
We can use the restricted F test explained above.
The model with both individual and time effects is rarely used in practice due to two
reasons.
First, the cost in terms of degrees of freedom is often not justified.
Second, in those instances in which a model of the time-wise evolution of the
disturbances is desired, a more general model than the dummy variable formulation is
usually used.

5.4. Regression on Time-demeaned data:


The major problems with one-way or two-way FEM with dummy variables are (a)
they are impractical (Is it so considering the help rendered by software packages?)
when number of cross-sectional units and time periods is very large (say for example
if n = 1000 workers or t = 50 years); and (b) there is a cost involved in terms of the
loss in degrees of freedom.
One way to overcome this is to run a pooled OLS regression on the time-demeaned
(or entity-demeaned) variables. This method avoids the inclusion of
dummies by transforming all the variables using group means. In other words, the
philosophy underlying this method is to eliminate the influence of
unobserved/individual effects prior to estimation.

To see what this method involves, consider the following model with a single
explanatory variable. For each i,
$$y_{it} = \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, 2, \dots, T \qquad (1)$$
where $a_i$ is the individual or unobserved effect.
Now, for each i, averaging equation (1) over time, we get
$$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i \qquad (2)$$
where $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$. A similar interpretation holds for $\bar{x}_i$ and $\bar{u}_i$.

Because $a_i$ is fixed over time, it appears in both (1) and (2).

If we subtract (2) from (1) for each t, we get
$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + u_{it} - \bar{u}_i, \quad t = 1, 2, \dots, T$$
(or)
$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}, \quad t = 1, 2, \dots, T \qquad (3)$$
where $\ddot{y}_{it} = y_{it} - \bar{y}_i$ is the time-demeaned data on y, and similarly for the other
variables.

As a result of this transformation (called the fixed effects transformation), $a_i$ gets
eliminated! This suggests that we should estimate (3) by pooled OLS.
But note that from (2) above we can compute the individual intercepts (the dummy
coefficients) using the following formula
$$\hat{a}_i = \bar{y}_i - \hat{\beta}_1 \bar{x}_i \qquad (4)$$

Note that OLS on (3) should not have a constant term (Why?). Even if we generate a
constant it is arbitrary.
Illustration: elimination of unobserved (individual) effects via fixed effects
transformation
                 Actual data                                      Time-demeaned data
Time   Firm      I       F         C        D1   D2   D3   D4     I        F         C
1      1         74.4    2132.2    186.6    1    0    0    0      -5.08    334.02    -80.7
2      1         113     1834.1    220.9    1    0    0    0      33.52    35.92     -46.4
3      1         91.9    1588      287.8    1    0    0    0      12.42    -210.2    20.5
4      1         61.3    1749.4    319.9    1    0    0    0      -18.18   -48.78    52.6
5      1         56.8    1687.2    321.3    1    0    0    0      -22.68   -111      54
Sum              397     8990.9    1336.5   5    0    0    0
Group average    79.5    1798.18   267.3    1    0    0    0

1      2         461     4643.9    207.2    0    1    0    0      -32.46   469.46    -39.16
2      2         512     4551.2    255.2    0    1    0    0      18.34    376.76    8.84
3      2         448     3244.1    303.7    0    1    0    0      -45.66   -930.3    57.34
4      2         500     4053.7    264.1    0    1    0    0      5.94     -120.7    17.74
5      2         548     4379.3    201.6    0    1    0    0      53.84    204.86    -44.76
Sum              2468    20872.2   1231.8   0    5    0    0
Group average    494     4174.44   246.36   0    1    0    0

1      3         362     2202.9    254.2    0    0    1    0      -24.36   92.7      -24.84
2      3         473     2380.5    261.4    0    0    1    0      86.84    270.3     -17.64
3      3         446     2168.6    298.7    0    0    1    0      59.64    58.4      19.66
4      3         362     1985.1    301.8    0    0    1    0      -24.36   -125.1    22.76
5      3         288     1813.9    279.1    0    0    1    0      -97.76   -296.3    0.06
Sum              1930    10551     1395.2   0    0    5    0
Group average    386     2110.2    279.04   0    0    1    0

1      4         28.6    628.5     26.5     0    0    0    1      -10.48   34.36     -33.32
2      4         48.5    537.1     36.2     0    0    0    1      9.46     -57.04    -23.62
3      4         43.3    561.2     60.8     0    0    0    1      4.29     -32.94    0.98
4      4         37      617.2     84.4     0    0    0    1      -2.03    23.06     24.58
5      4         37.8    626.7     91.2     0    0    0    1      -1.24    32.56     31.38
Sum              195     2970.7    299.1    0    0    0    5
Group average    39.1    594.14    59.82    0    0    0    1

Note: the I values for firms 2 to 4, and the sums and group averages of I, are shown
rounded. The time-demeaned values are computed from the unrounded data (for
example, 499.6 - 493.66 = 5.94 for firm 2 in period 4).


The regression on these time-demeaned variables will produce the following result:
$$\hat{I}_{it} = 0.0651\,F_{it} + 0.0558\,C_{it}$$
where I, F and C are the time-demeaned variables.

Now, using the coefficient estimates of F and C, we can compute the group intercepts
(dummies) by applying (4) above. For example, the intercept for firm one is computed as
= 79.5 - [0.0651 (1798.18) + 0.0558 (267.3)]
= 79.5 - [117.062 + 14.915]
= -52.477
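The whole within procedure (demean by firm, run OLS without a constant, recover the intercepts from the group means) can be sketched on the example data as follows. This is only an illustration written with the standard library: the helper `demean_by_group` and the variable names are hypothetical, and the two-regressor normal equations are solved by hand rather than with a regression package.

```python
# Example data: 4 firms x 5 years, stacked firm by firm (I, F, C as in the table).
I = [74.4, 113.0, 91.9, 61.3, 56.8, 461.2, 512.0, 448.0, 499.6, 547.5,
     361.6, 472.8, 445.6, 361.6, 288.2, 28.57, 48.51, 43.34, 37.02, 37.81]
F = [2132.2, 1834.1, 1588.0, 1749.4, 1687.2, 4643.9, 4551.2, 3244.1, 4053.7, 4379.3,
     2202.9, 2380.5, 2168.6, 1985.1, 1813.9, 628.5, 537.1, 561.2, 617.2, 626.7]
C = [186.6, 220.9, 287.8, 319.9, 321.3, 207.2, 255.2, 303.7, 264.1, 201.6,
     254.2, 261.4, 298.7, 301.8, 279.1, 26.5, 36.2, 60.8, 84.4, 91.2]
firm = [g for g in (1, 2, 3, 4) for _ in range(5)]


def demean_by_group(values, groups):
    """Subtract each group's mean; return the demeaned series and the means."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    means = {g: totals[g] / counts[g] for g in totals}
    return [v - means[g] for v, g in zip(values, groups)], means


y, i_bar = demean_by_group(I, firm)
x2, f_bar = demean_by_group(F, firm)
x3, c_bar = demean_by_group(C, firm)

# OLS without a constant on the demeaned data: solve the 2x2 normal equations.
s22 = sum(a * a for a in x2)
s33 = sum(a * a for a in x3)
s23 = sum(a * b for a, b in zip(x2, x3))
s2y = sum(a * b for a, b in zip(x2, y))
s3y = sum(a * b for a, b in zip(x3, y))
det = s22 * s33 - s23 ** 2
b2 = (s2y * s33 - s3y * s23) / det   # slope on F
b3 = (s3y * s22 - s2y * s23) / det   # slope on C

# Recover each firm's intercept from its group means, as in formula (4).
intercepts = {g: i_bar[g] - b2 * f_bar[g] - b3 * c_bar[g] for g in i_bar}

print(round(b2, 4), round(b3, 4))
print({g: round(a, 2) for g, a in intercepts.items()})
```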
The fixed effects transformation is also called the within transformation. This can be
done in a single command in STATA econometrics software package.
Since the within transformation or within effect model does not use the
dummies, it has larger degrees of freedom, smaller MSE (mean
square error), and incorrect (smaller) standard errors of
parameters than those of LSDV. Also, R-squared of the within effect
model (which is small compared to LSDV model) is not correct
because an intercept is suppressed.
A pooled OLS estimator that is based on the time-demeaned variable is called the
fixed effects estimator or the within estimator. The latter name comes from the fact
that OLS on (3) uses the time variation in y and x within each cross-sectional
observation.
The method of time-demeaned data can be extended to time-period dummies as
well. In the presence of time effect, the within effect model involves 3 steps: (i)

compute time average of all variables (i.e. take average of variables across time
instead of group-wise); (ii) compute deviation of time-wise observations from the
time average and (iii) use (ii) for running OLS.
Illustration for the above data

Time   Firm   I       F        C
1      1      74.4    2132.2   186.6
1      2      461     4643.9   207.2
1      3      362     2202.9   254.2
1      4      28.6    628.5    26.5
Average for year 1    ***     ***      ***


The major defect of this procedure is that in the process of eliminating the
individual effects, any other explanatory variable that is constant over time for
all i gets swept away.
Hence, we cannot include (i.e. find estimates of) variables such as gender or a city's
distance from a river as explanatory variables in the model. In such a case, we should
use the random effects model.
There is yet another way to estimate fixed effects model, called
between group effect model. This model uses aggregate
information, group means of variables (i.e. equation 2 above). In
other words, the unit of analysis is not an individual observation, but
groups or subjects. The number of observations jumps down to n
from nT.
Illustration for the above data:

Firm               I       F         C
Group 1 average    79.5    1798.18   267.3
Group 2 average    494     4174.44   246.36
Group 3 average    386     2110.2    279.04
Group 4 average    39.1    594.14    59.82


This group mean regression produces goodness-of-fit measures and parameter
estimates that differ from those of the LSDV and the within effect models.
However, the between estimator ignores important information on
how the variables change over time.
In the presence of a time effect, the between effect model involves
regressing the time means of the dependent variable (i.e. averages of
the variables taken across units for each period, instead of group-wise)
on those of the independent variables.

Note that the between effect model is not valid in the case of a two-way fixed
effect model (why?)

5.5. Dummy variable Regression Vs Regression on Time-demeaned data


Dummy variable regression is considered as a traditional way of applying fixed
effects panel regression.
If the researcher sees the unobserved/individual effect as a parameter to be estimated
for each i then dummy variable regression method is fine.
Anyway, the estimated intercepts from the dummy variable analysis are of only
occasional interest. Example: if we want to pick a particular firm or city to see
whether its dummy intercept coefficient is above or below average value in the
sample.
Also, an interesting feature of dummy variable regression is that it gives us exactly
the same estimates of the explanatory variables that we would obtain from the
regression on time-demeaned data.
But, one major benefit of the dummy variable regression is that the R-squared from
such regression is usually rather high. This occurs because we are including a
dummy variable for each cross-sectional unit, which explains much of the variation in
the data.
On the other hand R-squared of the regression on time-demeaned data would be low.
But this method gains more degrees of freedom compared to dummy variable
regression.
An examination of the results of the computer software packages GRETL and
LIMDEP reveals that they both use dummy variable regression to estimate the fixed effect
panel model.
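The claim that the dummy variable regression and the regression on time-demeaned data give the same slope estimates can be checked numerically. The sketch below uses a tiny hypothetical panel (2 units, 3 periods, one regressor; the numbers are made up) and a small hand-written linear solver, so the names `solve`, `ols` and `demean` are illustrative rather than part of any package.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def demean(values, groups):
    """Subtract the group mean from every observation."""
    means = {g: sum(v for v, h in zip(values, groups) if h == g) / groups.count(g)
             for g in set(groups)}
    return [v - means[g] for v, g in zip(values, groups)]


# Hypothetical panel: 2 units observed for 3 periods each.
unit = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 2.0, 3.0, 5.0]
y = [2.0, 4.0, 5.0, 7.0, 8.0, 11.0]

# LSDV: constant, dummy for unit 2, and the regressor.
X_lsdv = [[1.0, 1.0 if u == 2 else 0.0, xi] for u, xi in zip(unit, x)]
a1, a2, slope_lsdv = ols(X_lsdv, y)

# Within: OLS without a constant on the demeaned data.
x_dm, y_dm = demean(x, unit), demean(y, unit)
slope_within = sum(p * q for p, q in zip(x_dm, y_dm)) / sum(p * p for p in x_dm)

print(slope_lsdv, slope_within)
```

The two printed slopes agree up to floating-point error, while only the LSDV run reports the intercepts directly.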

6) Random Effects Model (REM)


6.1. Why or when REM?
FEM allows the unobserved effect to be correlated with one or more of the
explanatory variables (i.e. the Xs).
If instead we assume that the unobserved effect is uncorrelated with every
explanatory variable, then the random effects model (REM) is more attractive or
appropriate.
When the unobserved effect is thought to be uncorrelated with the explanatory
variables, it can be left in the error term, and the resulting serial
correlation over time can be handled by GLS estimation.

6.2. Illustration
Let us consider the following one-way (individual effect) FEM
$$Y_{it} = \beta_{1i} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it} \qquad (1)$$

Suppose in this case the sampled cross-sectional units were drawn from a large
population (say the case of longitudinal data set) [OR] the cross-sectional units
included in our sample are a drawing from a much larger universe of such units.
In such cases we need to treat $\beta_{1i}$ as a random variable, like $u_{it}$, and not as an
individual-specific fixed constant over time.
If we treat $\beta_{1i}$ as a random variable, it takes the following form
$$\beta_{1i} = \beta_1 + \varepsilon_i, \quad i = 1, 2, 3, \dots, N \qquad (2)$$
Where $\beta_1$ is the mean of the intercepts ($\beta_{1i}$) of all cross-sectional units and $\varepsilon_i$ is a random
error term characterizing the ith observation, which is constant through time. What
(2) essentially implies is that the intercept of an individual cross-sectional unit is
nothing but the mean of the intercepts of all cross-sectional units PLUS or MINUS the
error term (the error term represents the random deviation of the individual intercept from
the mean of the intercepts of all cross-sectional units). Hence, the individual differences
in the intercept value of each company are reflected in the error term $\varepsilon_i$.
Now, substituting (2) into (1), we obtain:
$$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \varepsilon_i + u_{it}$$
$$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + w_{it} \qquad (3)$$
Where
$$w_{it} = \varepsilon_i + u_{it} \qquad (4)$$

Note that the composite error term $w_{it}$ consists of two components: $\varepsilon_i$, which is the
cross-section or individual-specific error component, and $u_{it}$, which is the combined
time series and cross-section error component. $\varepsilon_i$ is assumed independent of $u_{it}$.

Since the composite error term consists of two or more (if we include time effects)
error terms, REM is also called error components model.
Notice carefully the difference between FEM and REM. In FEM each cross-sectional
unit has its own (fixed) intercept value. In REM, on the other hand, the intercept $\beta_1$
represents the mean value of all the cross-sectional intercepts, and the error
component $\varepsilon_i$ represents the random deviation of the individual intercept from this mean
value.

Now, since $\varepsilon_i$ is in the composite error in each time period, the $w_{it}$ are serially
correlated across time.
That is, correlation $(w_{it}, w_{is}) = \sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + \sigma_u^2)$, $t \neq s$. Here, $\sigma_\varepsilon^2$ = Variance ($\varepsilon_i$) and
$\sigma_u^2$ = Variance ($u_{it}$).
This serial correlation problem can be solved or eliminated by making use of the
GLS method. As the usual pooled OLS standard errors ignore this correlation, they
will not be correct
Deriving the GLS transformation that eliminates serial correlation in the errors
requires sophisticated matrix algebra. But the transformation itself is simple (or) there
is a simple transformation method.
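Now, define
$$\lambda = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\varepsilon^2}}$$
where $\sigma_u^2$ is the variance of $u_{it}$ and $\sigma_\varepsilon^2$ is the variance of the individual-specific
error component. Here $\lambda$ lies between zero and one. Then, the transformed equation turns out to be
$$y_{it} - \lambda\bar{y}_i = \beta_1(1-\lambda) + \beta_2(x_{2it} - \lambda\bar{x}_{2i}) + \beta_3(x_{3it} - \lambda\bar{x}_{3i}) + (w_{it} - \lambda\bar{w}_i) \qquad (3)$$

Where the overbar denotes the time averages.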


This is very interesting equation, as it involves quasi-demeaned data on each variable.
The (random effects) transformation here involves subtracting a fraction of the
group/time average with respect to each variable, where the fraction depends on
the variance of $u_{it}$, the variance of the individual-specific error component, and the
number of time periods, T.
In contrast, the fixed effects estimator subtracts complete/full time or group
averages from the corresponding variable.
Thus, GLS estimator is simply the pooled OLS regression on quasi-demeaned data
[equation (3)].
The transformation in (3) allows for the inclusion of explanatory variables that are
constant over time in the regression. This is because the quasi-time-demeaning only
removes a fraction of the group or time average, and not the whole group or time
average.

This is one major advantage random effects model has over fixed effects model.
Equation (3) allows us to relate the RE estimator to both pooled OLS and fixed
effects.
Pooled OLS is obtained when the transformation parameter $\lambda = 0$, and FE is obtained when $\lambda = 1$.
In practice, the estimate $\hat{\lambda}$ is never zero or one. But, if $\hat{\lambda}$ is close to zero, the RE
estimates will be close to the pooled OLS estimates. This is the case when the
unobserved effect $\varepsilon_i$ is relatively unimportant (because it has small variance relative
to $\sigma_u^2$).
It is more common for $\sigma_\varepsilon^2$ to be large relative to $\sigma_u^2$, in which case $\hat{\lambda}$ will be closer to
unity.
As T gets large, $\hat{\lambda}$ tends to one, and this makes the RE and FE estimates very similar.
Thus the value of the estimated transformation parameter indicates whether the
estimates are likely to be closer to the pooled OLS or the fixed effects estimates.
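The behaviour of the transformation parameter can be explored numerically. The sketch below assumes the standard form of the parameter, one minus the square root of sigma_u^2 / (sigma_u^2 + T sigma_eps^2); the function names `re_lambda` and `quasi_demean` and the variance values are hypothetical. The series used at the end is firm 1's investment from the example data.

```python
import math


def re_lambda(sigma2_u, sigma2_eps, T):
    """Random effects transformation parameter for a balanced panel."""
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + T * sigma2_eps))


def quasi_demean(values, lam):
    """Subtract the fraction lam of the series' own time average."""
    mean = sum(values) / len(values)
    return [v - lam * mean for v in values]


# Limiting cases: no individual effect gives pooled OLS (lambda = 0);
# a dominant individual effect or a long panel pushes lambda towards 1 (FE).
print(re_lambda(1.0, 0.0, 5))
print(re_lambda(1.0, 3.0, 5))
print(re_lambda(1.0, 1.0, 5), re_lambda(1.0, 1.0, 500))

firm1_investment = [74.4, 113.0, 91.9, 61.3, 56.8]
print(quasi_demean(firm1_investment, 0.0))   # unchanged: pooled OLS data
print(quasi_demean(firm1_investment, 1.0))   # fully demeaned: within data
```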
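The REM with time effect and the REM with both individual and time effects are
straightforward extensions of the REM with individual effect.
The REM with time effect will take the following form:
$$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \tau_t + u_{it}$$
$$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + v_{it}$$
where $v_{it} = \tau_t + u_{it}$ and $\tau_t$ is the random time-specific error component.
The REM with both individual and time effect will take the following form:
$$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \varepsilon_i + \tau_t + u_{it}$$
$$Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + W_{it}$$
where $W_{it} = \varepsilon_i + \tau_t + u_{it}$, with $\varepsilon_i$ the individual-specific error component.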

6.3. Testing for Random effects model [or Choosing between OLS and REM/FEM]:


Whether the data support the random effects model can be verified with the help of the
Lagrange multiplier (LM) test developed by Breusch and Pagan (called the
Breusch-Pagan LM test). The test procedure is as follows:
Set H0: the variance of the group effects is zero, i.e. $\sigma_\varepsilon^2 = 0$ (or Correlation [$w_{it}$, $w_{is}$] = 0). [1]

[1] In correlation $(w_{it}, w_{is}) = \sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + \sigma_u^2)$, $t \neq s$, if $\sigma_\varepsilon^2 = 0$ then the correlation becomes zero.

Under the null hypothesis, LM is distributed as chi-squared with one degree of
freedom (two degrees of freedom in case the model has both individual and time
effects).
If the calculated LM test statistic exceeds the 95 percent critical/table value of the
chi-squared distribution (3.841 with one degree of freedom; 5.991 with two degrees of
freedom in the case of the two-way random effects model), we reject H0.
Rejection of H0 implies that the classical regression model (OLS) with a single constant term
is inappropriate for the data. The result of the test is then to reject H0 in favour of the
random effects model.
The null hypothesis of the one-way random time effect model is that the
variance component for time is zero, i.e. $\sigma_\tau^2 = 0$ (the variance of the random time effect).
The two-way random effects model has the null hypothesis that the
variance components for groups and time are all zero.
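The text does not spell out the LM statistic itself. The sketch below assumes the usual one-way, balanced-panel form of the Breusch-Pagan statistic, LM = [nT / (2(T-1))] * [sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1]^2, where e_it are pooled OLS residuals. The function name and the residual values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def breusch_pagan_lm(residuals_by_unit):
    """One-way Breusch-Pagan LM statistic from pooled OLS residuals (balanced panel)."""
    n = len(residuals_by_unit)
    T = len(residuals_by_unit[0])
    if any(len(e) != T for e in residuals_by_unit):
        raise ValueError("this sketch assumes a balanced panel")
    sum_sq_of_unit_sums = sum(sum(e) ** 2 for e in residuals_by_unit)
    sum_of_squares = sum(v * v for e in residuals_by_unit for v in e)
    ratio = sum_sq_of_unit_sums / sum_of_squares
    return n * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2


CHI2_95_DF1 = 3.841  # 95 percent critical value, one degree of freedom

# Hypothetical residuals: each unit's residuals share a sign (strong unit effect).
clustered = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
# Hypothetical residuals with no within-unit pattern.
no_pattern = [[1.0, 0.0], [0.0, 1.0]]

lm_clustered = breusch_pagan_lm(clustered)
lm_no_pattern = breusch_pagan_lm(no_pattern)
print(lm_clustered, lm_clustered > CHI2_95_DF1)
print(lm_no_pattern, lm_no_pattern > CHI2_95_DF1)
```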

6.4. Choosing between FEM and REM: The Hausman Test


An inevitable question in panel data models is: which of the two models FEM &
REM should be used?
The random effect model is recommended under the following circumstances:
(a) If the key explanatory variable is constant over time (e.g. educational
qualification), we cannot use FE to estimate its effect on the dependent variable.
(b) If we are willing to assume that the unobserved effect is uncorrelated with the
explanatory variables, then the random effects model is appropriate.
(c) If the key policy variable is set experimentally, then random effects would
be appropriate for estimating the coefficients. For example, for estimating the
effect of class size on class performance, random effects would be appropriate
if each year children are randomly assigned to classes of different sizes.
(d) If we treat (or believe) our sample to be a random drawing from a large
population (the case of a wide longitudinal data set), the REM has some
intuitive appeal.
(e) When the number of cross-sectional units (N) is large, REM is preferred.
The fixed effect model is recommended under the following circumstances:
(a) If we believe that the unobserved effect is correlated with the
explanatory variables, then we can only use the fixed effects model. In other
words, as fixed effects allow arbitrary correlation between the unobserved effects
and the explanatory variables, while random effects do not, FE is widely
thought to be a more convincing tool for estimating ceteris paribus effects. In
most cases the regressors are themselves outcomes of choice processes and are
likely to be correlated with the unobserved effects. [2]
(b) In some applications of panel data methods, we cannot treat our sample as
a random sample from a large population, especially when the unit of
observation is a large geographical unit (say, states or provinces). Then, it
often makes sense to think of each unobserved effect ($a_i$) as a separate
intercept to estimate for each cross-sectional unit. In this case, we use fixed
effects. Hence, FE is almost always much more convincing than RE for policy
analysis using aggregate data.
(c) If T (the number of time periods) is large and N (the number of
cross-sectional units) is small, there is likely to be little difference in the values of
the parameters estimated by the FEM and REM. Hence the choice here is
based on computational convenience. On this score, FEM may be preferable.
A readymade solution to this choice problem is the Hausman specification test. This
test compares the fixed versus random effects under the null hypothesis that the
individual or unobserved effects are uncorrelated with the other regressors in the
model.
If the Hausman test rejects the null hypothesis, the fixed effect model is preferred over
the random effect model (rejection implies that the unobserved effects are correlated with
at least one explanatory variable).
Null hypothesis is rejected if the calculated Hausman test statistic is higher than the
critical value from the chi-squared table (the Hausman test statistic has an asymptotic
chi-squared distribution) with k degrees of freedom (where k is the number of
explanatory variables in the model excluding the constant term).
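The text does not give the Hausman statistic explicitly. The sketch below assumes its usual quadratic form, H = d' [Var(b_FE) - Var(b_RE)]^(-1) d with d = b_FE - b_RE, written out for the two-regressor case so that the 2x2 inverse can be done by hand. The function name and all coefficient and variance inputs are hypothetical; they are not estimates from the example data.

```python
def hausman_two_regressors(b_fe, b_re, v_fe, v_re):
    """Hausman statistic d' [V_FE - V_RE]^(-1) d for k = 2 slope coefficients."""
    d0 = b_fe[0] - b_re[0]
    d1 = b_fe[1] - b_re[1]
    # Difference of the two 2x2 covariance matrices.
    a = v_fe[0][0] - v_re[0][0]
    b = v_fe[0][1] - v_re[0][1]
    c = v_fe[1][0] - v_re[1][0]
    e = v_fe[1][1] - v_re[1][1]
    det = a * e - b * c
    # Quadratic form using the explicit 2x2 inverse (1/det) * [[e, -b], [-c, a]].
    return (e * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det


CHI2_95_DF2 = 5.991  # 95 percent critical value, k = 2 degrees of freedom

# Hypothetical FE and RE estimates and covariance matrices.
b_fe, b_re = [0.065, 0.056], [0.055, 0.076]
v_fe = [[0.0005, 0.0], [0.0, 0.0003]]
v_re = [[0.0001, 0.0], [0.0, 0.0002]]

h_stat = hausman_two_regressors(b_fe, b_re, v_fe, v_re)
print(round(h_stat, 3), "reject H0" if h_stat > CHI2_95_DF2 else "do not reject H0")
```

With these made-up inputs the statistic stays below the critical value, so the null of no correlation would not be rejected and REM would be retained.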

[2] Consider an example. Suppose we have a random sample of a large number of individuals and we want
to model their wage or earnings function. Suppose earnings are a function of education, work
experience, etc. Now if we let $\varepsilon_i$ stand for innate ability, family background etc., then when we model
the earnings function including $\varepsilon_i$ it is very likely to be correlated with education, for innate ability and
family background are often crucial determinants of education. As Wooldridge contends, "In many
applications, the whole reason for using panel data is to allow the unobserved effect [i.e., $\varepsilon_i$] to be
correlated with the explanatory variables."

6.5. FEM and REM: A Comparison


(1) Fixed Effect Model: Dummies are considered as a part of the intercept.
    Random Effect Model: Dummies act as an error term.

(2) Fixed Effect Model: Intercepts vary across groups and/or times.
    Random Effect Model: There is one common intercept for all groups taken together.

(3) Fixed Effect Model: Error variances are constant.
    Random Effect Model: Error variances vary across groups and/or times.

(4) Fixed Effect Model: Examines group differences in intercepts.
    Random Effect Model: Estimates variance components for groups and error, assuming the
    same intercept and slopes. The difference among groups (or time periods) lies in the
    variance of the error term.

(5) Fixed Effect Model: Use least squares dummy variable (LSDV), within effect and between
    effect estimation methods. In short, OLS regressions with dummies, in fact, are fixed
    effect models.
    Random Effect Model: Use the generalized least squares (GLS) method.

(6) Fixed Effect Model: Fixed effects are tested by the (incremental) F test.
    Random Effect Model: Random effects are examined by the Lagrange Multiplier (LM) test.