
L14025 Applied Microeconometrics

Lecture 1:
Static Panel Data Modelling

Professor Sourafel Girma


sourafel.girma@nottingham.ac.uk
L14025 Applied Microeconometrics

Lecture 1: Static panel data modelling

Slide 1 of 48

Lecture objectives:
1. Explain the nature of panel data.
2. Discuss the modelling of time effects and the estimation of robust standard errors.
3. Discuss the estimation and testing of the random and fixed effects models.
4. Explain the Hausman test for correlated effects.
5. Demonstrate the practical estimation of panel data models.


1. Introduction
2. The example dataset
3. Time effects
4. Robust standard errors
5. The random effects model
6. The fixed effects model
7. Summary


1. Introduction


        Firm 1             Firm 2             Firm 3             Firm 4
Year    Invest   Assets    Invest   Assets    Invest   Assets    Invest   Assets
1980     33.1    1170.6    317.6    3078.5    209.9    1362.4    12.93    191.5
1981     45      2015.8    391.8    4661.7    355.3    1807.1    25.9     516
1982     77.2    2803.3    410.6    5387.1    469.9    2673.3    35.05    729
1983     44.6    2039.7    257.7    2792.2    262.3    1801.9    22.89    560.4
1984     48.1    2256.2    330.8    4313.2    230.4    1957.3    18.84    519.9
1985     74.4    2132.2    461.2    4643.9    361.6    2202.9    28.57    628.5
1986    113      1834.1    512      4551.2    472.8    2380.5    48.51    537.1
1987     91.9    1588      448      3244.1    445.6    2168.6    43.34    561.2
1988     61.3    1749.4    499.6    4053.7    361.6    1985.1    37.02    617.2
1989     56.8    1687.2    547.5    4379.3    288.2    1813.9    37.81    626.7
1990     93.6    2007.7    561.2    4840.9    258.7    1850.2    39.27    737.2

We can see the above dataset as 4 separate time series datasets, one for each firm.
Alternatively, we can see it as 11 separate cross-sectional datasets, one for each year.
Or we can see it as one big dataset by pooling (combining) the time series and cross-sectional observations. In this case the pooled dataset is called a panel dataset.

Firm    Year    Invest    Assets
1       1980     33.1     1170.6
1       1981     45       2015.8
1       ...
1       ...
2       1980    317.6     3078.5
2       1981    391.8     4661.7
2       ...
2       ...
3       1980    209.9     1362.4
3       1981    355.3     1807.1
3       ...
3       ...
4       1980     12.93     191.5
4       1981     25.9      516
4       ...
4       ...

If we pool the data, we have one big dataset.
We can estimate one regression (y = Invest; x = Assets):

$y_{it} = \alpha + \beta x_{it} + e_{it}$

where i = 1, 2, 3, 4 (firm) and t = 1980, 1981, ..., 1990 (time period).
We now have 44 observations in the model (4 x 11).
Examples of indexing:
$y_{3,1981} = 355.3$
$x_{1,1980} = 1170.6$
If we apply a simple regression technique (OLS) to the model, we call this the Pooled Model.
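
The pooling and indexing described above can be sketched in a few lines of Python. This is only an illustration, not the lecture's own Stata workflow: it uses the 1980 and 1981 rows of the table, and the names `wide`, `long_rows` and `pooled_ols` are hypothetical.

```python
# Wide-format excerpt of the firm table: firm -> year -> (invest, assets).
# Only the 1980 and 1981 rows are used to keep the sketch short.
wide = {
    1: {1980: (33.1, 1170.6), 1981: (45.0, 2015.8)},
    2: {1980: (317.6, 3078.5), 1981: (391.8, 4661.7)},
    3: {1980: (209.9, 1362.4), 1981: (355.3, 1807.1)},
    4: {1980: (12.93, 191.5), 1981: (25.9, 516.0)},
}

# Long (pooled) format: one row per firm-year observation.
long_rows = [
    (firm, year, invest, assets)
    for firm, years in sorted(wide.items())
    for year, (invest, assets) in sorted(years.items())
]

# Index the variables by (i, t), mirroring the y_it / x_it notation.
y = {(firm, year): invest for firm, year, invest, _ in long_rows}
x = {(firm, year): assets for firm, year, _, assets in long_rows}


def pooled_ols(rows):
    """Simple-regression OLS of invest on assets, ignoring firm and year."""
    n = len(rows)
    xs = [row[3] for row in rows]
    ys = [row[2] for row in rows]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


alpha_hat, beta_hat = pooled_ols(long_rows)
print(len(long_rows), y[(3, 1981)], x[(1, 1980)])
```

With all 11 years included the same code would pool 44 rows; the pooled regression treats every firm-year as an independent observation from one common line.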


Typically panel data consist of observations for the same units (e.g. firms, countries, or individuals) across time (e.g. yearly or monthly). This is also referred to as longitudinal data.
However any two or more dimensional data can be treated as
panel data (e.g. universities-courses; regions-firms; farms-plots).
Simple regression analysis using OLS may not always be
adequate for such complicated datasets. Hence we may need
special techniques or estimators.
Two such estimators are especially useful in the context of
linear static panel data models: the fixed effects and the random
effects estimators.
For all intents and purposes one can see linear static panel data
modelling as a three-horse race: For the particular model we propose
to estimate and the specific data in hand, which of the OLS, fixed effects
and random effects estimators is most appropriate?

The good news is that there are formal statistical tests that help identify the winner of this race, and in this lecture we will discuss the practical implementation of these tests via an empirical example. A typical panel data model can be written as

$y_{it} = \beta_0 + \beta_1 x_{it} + u_i + e_{it}$;  i = 1, ..., N; t = 1, ..., T
$u_i \sim (0, \sigma_u^2)$
$e_{it} \sim (0, \sigma_e^2)$

Here $e_{it}$ is the usual idiosyncratic random error term, which varies with i and t. The innovation in panel data modelling is the introduction of the term $u_i$, which is time-invariant for each individual unit. It is the permanent effect associated with the individual unit and can be thought of as capturing unobserved individual heterogeneity.

For example, if y denotes wages and x is education, $u_i$ would capture the impact of time-invariant individual characteristics such as ability or family connections that affect earnings.

A three-horse race:
1. There are no individual effects, that is $\sigma_u^2 = 0$ or $u_i = 0$ for all i, in which case OLS will be most appropriate.
2. There are individual effects and these are not correlated with the regressor x. In this case the random effects estimator will be most appropriate.
3. There are individual effects and these are correlated with the regressor x. In this case, the fixed effects estimator is the only appropriate method, and the OLS and random effects estimators should not be used.


2. The example dataset


To fix ideas, consider the research question "Does advertisement work?"
A management consultant was asked to write a report on whether advertisement expenditure leads to statistically and economically significant improvements in company profitability.
The consultant decided to use an econometric analysis based on a panel dataset of 250 companies over the period 2003-2006, and collected the following variables for this purpose:
1. Log of the company's profitability (profits), which is before-tax profits divided by sales.
2. Log of the company's market share (mkshare) in its industry (expected to have a positive effect on profitability).
3. Log of an index of industry competition (expected to have a negative effect on profits).
4. Log of advertisement expenditure divided by sales (advert) (expected effect: ?).
5. The panel unit identifier is company and the time identifier is year.


A screenshot of the dataset (advert.dta)


Start by declaring which variable identifies the panel units (company) and which one indicates time (year):

We can see that the panel data span the period 2003 to 2006, and it is a balanced panel. That is, all companies were observed over the whole period.
In an unbalanced panel, different panel units have different numbers of observations.
- e.g. some companies might be observed from 2003 to 2006, while others are observed for 2004 and 2005 only.
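
The balanced versus unbalanced distinction can be checked mechanically. The following is a minimal Python sketch (the lecture itself works in Stata); the function name `is_balanced` and the two toy panels are hypothetical and are not taken from advert.dta.

```python
from collections import defaultdict


def is_balanced(observations):
    """Return True if every panel unit is observed in every period of the sample.

    `observations` is an iterable of (unit, period) pairs.
    """
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    return all(periods == all_periods for periods in periods_by_unit.values())


# Toy example: two companies observed 2003-2006 (balanced) ...
balanced_panel = [(c, yr) for c in ("A", "B") for yr in range(2003, 2007)]
# ... versus company B observed in 2004 and 2005 only (unbalanced).
unbalanced_panel = balanced_panel[:4] + [("B", 2004), ("B", 2005)]

print(is_balanced(balanced_panel), is_balanced(unbalanced_panel))
```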


Summary statistics by year


Take note of the within-company versus between-company variability of the variables.
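
To make the within/between distinction concrete, here is a small Python sketch that splits the total sum of squares of one variable into a between-unit part and a within-unit part. It is an illustration under simplifying assumptions: it reports sums of squares rather than the standard deviations a statistics package would typically print, and the name `within_between` and the toy data are hypothetical.

```python
def within_between(panel):
    """Decompose variation of one variable.

    `panel` maps unit -> list of values over time.
    Returns (between_ss, within_ss, total_ss), where
    total_ss = between_ss + within_ss.
    """
    all_values = [v for values in panel.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    between_ss = 0.0
    within_ss = 0.0
    for values in panel.values():
        unit_mean = sum(values) / len(values)
        # Between: how far each unit's mean is from the grand mean.
        between_ss += len(values) * (unit_mean - grand_mean) ** 2
        # Within: how far each observation is from its own unit's mean.
        within_ss += sum((v - unit_mean) ** 2 for v in values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    return between_ss, within_ss, total_ss


example = {"A": [1.0, 3.0], "B": [5.0, 7.0]}
between_ss, within_ss, total_ss = within_between(example)
print(between_ss, within_ss, total_ss)
```

A variable with very little within variation carries little information once unit-specific effects are removed, which matters for the fixed effects estimator discussed later.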


3. Time effects


If we assume that all companies have the same profitability function, and that this is stable over time, we can use OLS on the panel data. This is called the pooled regression.

It seems that there is a negative, albeit statistically insignificant, relationship between advertisement and profitability.

The pooled regression model is simple to estimate as it does not require the use of any special techniques. But it does not make full use of the richness of the panel data.

Is the profit function really stable over the time period 2003-2006?
Do all of the companies really have the same profit function?
The pooled model ignores company heterogeneity and time differences, and this might lead to wrong conclusions. It is therefore advisable to check whether pooling is appropriate.
One possibility is to test for time effects. For example, business cycle effects might be important in determining the overall level of company profitability. We can explore this possibility by re-estimating the model with year dummies.


Testing for time effects involves three simple steps:
1. Add time dummies to the pooled regression model.
   -- In our case, we have yearly data (2003-2006).
   -- So we include 3 year dummies (say from 2004 to 2006) to avoid the dummy variable trap.
2. Estimate the model using OLS.
3. Test if the time dummies are jointly equal to zero.
If we reject the null hypothesis that the time dummies are equal to zero, we conclude that there are time effects in the data and the pooled regression model is not appropriate.
Next we demonstrate how to implement these steps in practice.
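
Separately from the Stata output shown in the original slides, steps 1 and 3 can be sketched in Python. The sketch assumes the classical (homoscedastic) F test, computed from the sums of squared residuals of the model without and with the year dummies; the function names `year_dummies` and `f_statistic` are hypothetical.

```python
def year_dummies(years, base_year=None):
    """Return (kept_years, rows): one 0/1 column per year except the base year.

    Dropping the base year avoids the dummy variable trap when the model
    also contains an intercept.
    """
    distinct = sorted(set(years))
    base = distinct[0] if base_year is None else base_year
    kept = [yr for yr in distinct if yr != base]
    rows = [[1 if obs_year == yr else 0 for yr in kept] for obs_year in years]
    return kept, rows


def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """Classical F statistic for q exclusion restrictions.

    ssr_restricted:   SSR of the model without the time dummies
    ssr_unrestricted: SSR of the model with the time dummies
    n: number of observations
    k: number of coefficients (including the intercept) in the unrestricted model
    """
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k))


obs_years = [2003, 2004, 2005, 2006, 2003, 2004, 2005, 2006]
kept, dummies = year_dummies(obs_years)
print(kept)        # the dummy years; the earliest year is the omitted base
print(dummies[0])  # a base-year observation has zeros in every dummy column
```

The resulting statistic would be compared with an F(q, n - k) critical value, with q equal to the number of year dummies (3 in the lecture's example).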

Testing the joint significance of the time effects:

No evidence that time (year) effects are significant. This means that
the average level of profits (conditional on the regressors) has not
fluctuated much during the sample period.


In the previous regression, the effect of advertisement on profits is assumed to be stable across the years. But it would be useful to explore whether the profits-advertisement relationship has changed over time by interacting the year dummies with advertisement.


4. Robust standard errors


Heteroscedasticity is prevalent in cross-sectional data and serial correlation is widespread in time series data.
Since panel data is a combination of cross sectional and time
series data, both problems are likely to be present.
There are many methods of dealing with these problems in
panel data, some more complicated than others. Here we
consider the simplest but most widely used method.
This method gives standard errors of regression coefficients
that are robust to heteroscedasticity and serial correlation.
These robust standard errors can be used to test hypotheses
about the model parameters and construct confidence
intervals.

Serial correlation within each panel unit is sometimes referred to as clustering, and the robust standard errors are also known as clustered standard errors.
Consider the following pooled panel data model

$y_{it} = x_{it}'\beta + e_{it}$,  i = 1, ..., N; t = 1, ..., T.

Given independence over the panel units i, the method allows for heteroscedastic error terms and unrestricted serial correlation within panel units. That is, for all t and s, $\mathrm{Cov}(e_{it}, e_{is}) \neq 0$ is permitted.

Let $\hat{\beta}_{ols}$ and $\hat{e}_{it} = y_{it} - x_{it}'\hat{\beta}_{ols}$ denote the OLS estimator and the estimated residual term.


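Assuming finite T and $N \to \infty$, the panel-robust estimator of the asymptotic variance-covariance matrix is

$\hat{V}(\hat{\beta}_{ols}) = W^{-1} \tilde{W} W^{-1}$

where

$W = \sum_{i=1}^{N} \sum_{t=1}^{T} x_{it} x_{it}'$

and

$\tilde{W} = \sum_{i=1}^{N} \sum_{t=1}^{T} \sum_{s=1}^{T} x_{it} x_{is}' \hat{e}_{it} \hat{e}_{is}$

Note that if we only wanted to correct for heteroscedasticity (that is, assuming serially uncorrelated errors), the matrix $\tilde{W}$ would simplify to

$\tilde{W} = \sum_{i=1}^{N} \sum_{t=1}^{T} x_{it} x_{it}' \hat{e}_{it}^2$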
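
For intuition, the sandwich formula can be worked through in the simplest possible case: a single regressor and no intercept, so that every matrix is a scalar. The Python sketch below is an illustration under that assumption, with no finite-sample adjustment (statistical software typically applies one), and with hypothetical names and toy data.

```python
def ols_through_origin(panel):
    """OLS of y on a single regressor x without intercept.

    `panel` maps unit -> list of (x, y) pairs over time.
    Returns (beta, sum of x^2, residuals by unit).
    """
    sxx = sum(x * x for obs in panel.values() for x, _ in obs)
    sxy = sum(x * y for obs in panel.values() for x, y in obs)
    beta = sxy / sxx
    resid = {i: [y - beta * x for x, y in obs] for i, obs in panel.items()}
    return beta, sxx, resid


def robust_variances(panel):
    """Return (beta, heteroscedasticity-robust variance, cluster-robust variance)."""
    beta, bread, resid = ols_through_origin(panel)
    # Heteroscedasticity only: sum over i and t of x_it^2 * e_it^2.
    meat_het = sum(
        (x * e) ** 2
        for i, obs in panel.items()
        for (x, _), e in zip(obs, resid[i])
    )
    # Clustered: the double sum over t and s within a unit collapses to
    # the square of sum_t x_it * e_it.
    meat_cluster = sum(
        sum(x * e for (x, _), e in zip(obs, resid[i])) ** 2
        for i, obs in panel.items()
    )
    return beta, meat_het / bread ** 2, meat_cluster / bread ** 2


panel = {1: [(1.0, 1.0), (2.0, 3.0)], 2: [(1.0, 2.0), (3.0, 2.0)]}
beta_hat, var_het, var_cluster = robust_variances(panel)
print(beta_hat, var_het, var_cluster)
```

When each unit has a single observation the two variance estimates coincide; they differ only through the cross-products of residuals within a unit.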


Pooled model with panel robust standard errors.


5. The Random Effects Model


Recall that:
1. A panel data model is called a random effects model if $u_i$ is not correlated with the regressors.
2. By contrast, a panel data model where the individual heterogeneity term is correlated with the regressors is referred to as the fixed effects or correlated effects model.
3. The most important question in applied static panel data analysis is to determine which of the three contenders -- pooled, random effects and fixed effects models -- is the most appropriate for the data in hand.


Consider the following panel data model, in which the individual-specific effects $u_i$ are assumed to be uncorrelated with the regressor x:

$y_{it} = \beta_0 + \beta_1 x_{it} + u_i + e_{it}$
$u_i \sim (0, \sigma_u^2)$ and $e_{it} \sim (0, \sigma_e^2)$
i = 1, ..., N; t = 1, ..., T.

It can be shown that the model is best estimated by Generalised Least Squares (GLS). The model is called the random effects model and the GLS estimator is usually called the random effects estimator. To demonstrate the mechanics of the random effects estimator, define the time means of y and x as

$\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$  and  $\bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}$


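The random effects GLS estimator is equivalent to estimating the following transformed model by OLS

$(y_{it} - \lambda \bar{y}_i) = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar{x}_i) + v_{it}$

where

$\lambda = 1 - \sqrt{\frac{\sigma_e^2}{T \sigma_u^2 + \sigma_e^2}}$

$v_{it} = (1 - \lambda) u_i + (e_{it} - \lambda \bar{e}_i)$

The above transformation is sometimes called the GLS transformation.
In an unbalanced panel with i = 1, ..., N; t = 1, ..., $T_i$, the above transformation factor will have to be individual-specific, i.e.

$\lambda_i = 1 - \sqrt{\frac{\sigma_e^2}{T_i \sigma_u^2 + \sigma_e^2}}$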
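
The behaviour of the GLS transformation factor, taken here as 1 minus the square root of sigma_e^2 / (T sigma_u^2 + sigma_e^2), is easiest to see numerically. The Python sketch below treats the variance components as known, whereas in practice they are estimated; the names `gls_lambda` and `quasi_demean` are hypothetical.

```python
import math


def gls_lambda(sigma_u2, sigma_e2, t):
    """GLS transformation factor for a unit observed in t periods."""
    return 1.0 - math.sqrt(sigma_e2 / (t * sigma_u2 + sigma_e2))


def quasi_demean(values, lam):
    """Subtract lam times the unit's time mean from each observation."""
    mean = sum(values) / len(values)
    return [v - lam * mean for v in values]


# No individual heterogeneity: lambda = 0, so the data are left untouched
# and the random effects estimator collapses to pooled OLS.
print(gls_lambda(0.0, 1.0, 4))
# Strong heterogeneity: lambda approaches 1, i.e. (almost) full demeaning,
# which is the within (fixed effects) transformation.
print(gls_lambda(1000.0, 1.0, 4))
print(quasi_demean([1.0, 2.0, 3.0], 1.0))
```

For an unbalanced panel the same function would be called unit by unit with that unit's own number of periods.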

Estimating the random effects profitability model

Advertisement doesn't seem to work?



Testing for individual heterogeneity in random effects models

When there is individual heterogeneity, the random effects model is more efficient than the pooled model.
But if there is no heterogeneity in the panel data, it is better to use the pooled model (apply OLS). So it is advisable to test for the presence of heterogeneity.
The Breusch-Pagan test can be used for this purpose. The null hypothesis of this test states that there is no heterogeneity.

Rejection of the null hypothesis can be taken as evidence in favour of the random effects model.

How to choose between the REM and the pooled regression?
Breusch and Pagan (1980) devised a Lagrange Multiplier (LM) test for the REM against the pooled regression based on the OLS residuals.
The hypotheses are:
H0: $\sigma_u^2 = 0$ (pooled regression is more appropriate)
Ha: $\sigma_u^2 \neq 0$ (REM is more appropriate)

The LM test statistic is based on the OLS (restricted model) residuals and, under H0, follows a Chi-Square distribution with 1 degree of freedom:

$LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} \hat{e}_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{e}_{it}^2} - 1 \right]^2$

N is the number of cross sections and T is the number of time periods.
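
The LM statistic is straightforward to compute once the pooled OLS residuals are grouped by panel unit. The following Python sketch assumes a balanced panel and uses the standard form LM = NT / (2(T - 1)) times the square of [sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1]; the function name and the toy residuals are hypothetical.

```python
def breusch_pagan_lm(residuals):
    """Breusch-Pagan LM statistic for individual effects (balanced panel).

    `residuals` maps unit -> list of pooled-OLS residuals over time.
    """
    n = len(residuals)
    t = len(next(iter(residuals.values())))
    if any(len(e) != t for e in residuals.values()):
        raise ValueError("this sketch assumes a balanced panel")
    # Numerator: squared within-unit residual sums, summed over units.
    num = sum(sum(e) ** 2 for e in residuals.values())
    # Denominator: total sum of squared residuals.
    den = sum(v * v for e in residuals.values() for v in e)
    return n * t / (2.0 * (t - 1)) * (num / den - 1.0) ** 2


# Toy residuals that share a sign within each unit, the pattern an
# omitted individual effect would produce.
toy_residuals = {1: [1.0, 1.0], 2: [-1.0, -1.0]}
lm = breusch_pagan_lm(toy_residuals)
print(lm)
```

The statistic is compared with a Chi-Square(1) critical value (3.841 at the 5% level); residuals that tend to share a sign within a unit push the ratio above 1 and the statistic upwards.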


Testing for individual heterogeneity in random effects models.
In practice, use the following command right after estimating the random effects model

P-value = 0, so reject the null hypothesis of no random effects. OLS would have been inefficient.

6. The Fixed Effects Model


We saw that the random effects model is preferable to the pooled model if there is individual heterogeneity in the panel data.
But recall that the random effects model assumes that the individual heterogeneity term is not correlated with the regressors of the model.
If this assumption is not correct, and the regressors and $u_i$ are indeed correlated, the random effects model would be inappropriate.
The fixed effects model, which allows for correlation between the regressors and the heterogeneity term, should be used instead.
The fixed effects model is sometimes referred to as the Least Squares Dummy Variables model.


Consider our panel data model

$y_{it} = \beta_0 + \beta_1 x_{it} + u_i + e_{it}$

But now assume that $u_i$ is correlated with x: $\mathrm{Cov}(x_{it}, u_i) \neq 0$.

Because of this correlation, OLS will be biased and inconsistent. For this reason, we first eliminate the effects $u_i$ through the so-called within transformation of the data and then estimate the transformed model using OLS:

$(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (e_{it} - \bar{e}_i)$

The resulting estimator is called the within estimator, and it is unbiased and consistent.
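
A small numerical illustration shows why the within transformation helps. In the Python sketch below the data are constructed (hypothetically) as y = 2x + u_i with no noise, where the unit with larger x also has the larger u_i, so the effects are correlated with the regressor. The function names are illustrative only.

```python
def within_estimator(panel):
    """Slope from OLS on unit-demeaned data.

    `panel` maps unit -> list of (x, y) pairs over time.
    """
    num = 0.0
    den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        num += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        den += sum((x - x_bar) ** 2 for x, _ in obs)
    return num / den


def pooled_slope(panel):
    """Slope from pooled OLS (with intercept), ignoring the panel structure."""
    pairs = [pair for obs in panel.values() for pair in obs]
    x_bar = sum(x for x, _ in pairs) / len(pairs)
    y_bar = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    den = sum((x - x_bar) ** 2 for x, _ in pairs)
    return num / den


# True slope is 2; unit effects are u_1 = 0 and u_2 = 10.
effects = {1: 0.0, 2: 10.0}
xs = {1: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}
panel = {i: [(x, 2.0 * x + effects[i]) for x in xs[i]] for i in xs}

beta_within = within_estimator(panel)
beta_pooled = pooled_slope(panel)
print(beta_within, beta_pooled)
```

Demeaning removes u_i exactly, so the within estimator recovers the slope used to build the data, while pooled OLS attributes part of the unit effect to x and overstates it.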


An alternative to the within transformation for dealing with regressor-individual effects correlation is the first-differenced transformation:

$(y_{it} - y_{i,t-1}) = \beta_1 (x_{it} - x_{i,t-1}) + (e_{it} - e_{i,t-1})$

When T = 2, the first-differenced and the within estimators are algebraically equivalent.
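
The T = 2 equivalence can be verified numerically. With two periods, each demeaned observation equals plus or minus half the first difference, so the numerator and denominator of the within estimator are both exactly half of their first-differenced counterparts. The Python sketch below uses hypothetical toy data and function names.

```python
def first_difference_estimator(panel):
    """Slope from OLS (no intercept) on first-differenced data."""
    num = 0.0
    den = 0.0
    for obs in panel.values():
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            num += (x1 - x0) * (y1 - y0)
            den += (x1 - x0) ** 2
    return num / den


def within_estimator(panel):
    """Slope from OLS on unit-demeaned data."""
    num = 0.0
    den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        num += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        den += sum((x - x_bar) ** 2 for x, _ in obs)
    return num / den


# Three units, each observed in two periods: (x, y) pairs.
panel_t2 = {
    1: [(1.0, 2.0), (3.0, 7.0)],
    2: [(2.0, 1.0), (5.0, 4.0)],
    3: [(4.0, 9.0), (3.0, 8.5)],
}
beta_fd = first_difference_estimator(panel_t2)
beta_fe = within_estimator(panel_t2)
print(beta_fd, beta_fe)
```

With more than two periods the two estimators generally give different numbers, although both remove the time-invariant effect.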

However, a drawback of the within and first-differenced transformations is that they also eliminate all variables that are time-invariant. For example, if the profitability regression model includes the gender of the manager as an explanatory variable, this variable will drop out of the transformed model.


Estimating the fixed effects model

For the time being, we are ignoring the possibility of non-iid errors.

Advertisement seems to work!

The fixed effects estimator is also called the least squares dummy variables estimator. Here 249 company dummies are (implicitly) used, and the F-test shows that these are jointly significant: further evidence that there are individual-specific effects.

Fixed or random effects model?

The choice between the fixed and random effects models is an important issue in applied panel data analysis.
If the individual effects (heterogeneity) and the regressors are uncorrelated, use the random effects model as it is the most efficient (although the fixed effects estimator remains consistent).
If the regressors and the individual effects are correlated, choose the fixed effects model and never use the random effects model.
The test used to choose between the two models is known as the Hausman test.
The null hypothesis of this test states that there is no correlation between regressors and individual effects. So rejection of the null favours the fixed effects model.

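The Hausman test
H0: Regressors and effects (heterogeneity) are not correlated
H1: They are correlated

Under H0 the Hausman test statistic is distributed as a Chi-Square random variable with degrees of freedom equal to the number of regressors.
The formula of the Hausman test statistic is

$H = (\hat{\beta}_{FEM} - \hat{\beta}_{REM})' \left[ \hat{V}(\hat{\beta}_{FEM}) - \hat{V}(\hat{\beta}_{REM}) \right]^{-1} (\hat{\beta}_{FEM} - \hat{\beta}_{REM})$

where the FEM and REM indices are used to denote the fixed and random effects estimators respectively, and $\hat{V}(\cdot)$ denotes the estimated variance-covariance matrix of the corresponding estimator.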
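
With a single regressor the Hausman statistic reduces to the squared difference between the two coefficient estimates divided by the difference of their estimated variances. The Python sketch below covers only that scalar case; the input numbers are invented for illustration and are not the lecture's estimates, and the function name is hypothetical.

```python
# 5% critical value of the Chi-Square distribution with 1 degree of freedom.
CHI2_1_CRIT_5PCT = 3.841


def hausman_statistic(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one regressor.

    b_fe, b_re:     fixed and random effects coefficient estimates
    var_fe, var_re: their estimated variances (non-robust)
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # Can happen in finite samples; the classical statistic is then undefined.
        raise ValueError("variance difference must be positive")
    return (b_fe - b_re) ** 2 / diff_var


# Invented illustration values.
h = hausman_statistic(b_fe=0.5, b_re=0.1, var_fe=0.05, var_re=0.01)
reject_random_effects = h > CHI2_1_CRIT_5PCT
print(h, reject_random_effects)
```

A large statistic means the two estimators disagree by more than sampling noise can explain, which points towards correlated effects and hence the fixed effects model.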


Going back to our empirical example, results from the random effects
(RE) estimator appear to suggest that there is no relationship
between advertisement and profitability. By contrast the fixed effect
(FE) estimator would appear to suggest that advertisement works.
Which model should we trust more? Enter the Hausman test!

Reject the null hypothesis that the RE model is best. Discard the RE results and base your analysis on the FE model.

Micro panel data are typically characterised by heteroscedasticity and within-unit serial correlation. So it is advisable to estimate the random and fixed effects models using robust standard errors.
One important practical implication of doing so is that the standard Hausman test does not work with non-i.i.d. errors (heteroscedastic and serially correlated), for instance when $e_{it} \sim (0, \sigma_i^2)$.
Instead a robust version of the Hausman test should be used. This involves estimating the following model by OLS with robust standard errors:

$(y_{it} - \lambda \bar{y}_i) = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar{x}_i) + \beta_2 (x_{it} - \bar{x}_i) + v_{it}$

where $\lambda = 1 - \sqrt{\sigma_e^2 / (T \sigma_u^2 + \sigma_e^2)}$ is the GLS transformation factor defined in Section 5 (the random effects model).
Testing the null hypothesis $\beta_2 = 0$ is then equivalent to testing the null hypothesis that the individual effects are not correlated with x.
One way to practically implement this approach is to use the user-written Stata program xtoverid (you should be able to install this on your machine by typing ssc install xtoverid, replace from within Stata).

Robust version of the Hausman test using the xtoverid command, which in this case is given by the Sargan-Hansen statistic

Thus reject the RE model in favour of the FE model.


7. Summary


The estimation of static linear panel data models boils down to the choice between three estimators:
1. The pooled model should be used when there is no individual heterogeneity in the model.
2. When there is individual heterogeneity and it is not correlated with the independent variables of the model, the random effects model should be preferred.
3. The Hausman test helps us decide whether this is the case or not. If the individual heterogeneity is correlated with the independent variables, the fixed effects model should be used.
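
The three-way choice can be written down as a simple decision rule. The Python sketch below is only one way of encoding the lecture's logic, assuming the p-values of the Breusch-Pagan LM test and of the (robust) Hausman test are already available; the function name and the 5% default significance level are assumptions.

```python
def choose_estimator(bp_pvalue, hausman_pvalue, alpha=0.05):
    """Pick an estimator from the two test results.

    bp_pvalue:      p-value of the Breusch-Pagan LM test (H0: no heterogeneity)
    hausman_pvalue: p-value of the Hausman test (H0: effects uncorrelated
                    with the regressors)
    """
    if bp_pvalue >= alpha:
        # No evidence of individual heterogeneity.
        return "pooled OLS"
    if hausman_pvalue < alpha:
        # Heterogeneity that is correlated with the regressors.
        return "fixed effects"
    # Heterogeneity that is uncorrelated with the regressors.
    return "random effects"


print(choose_estimator(bp_pvalue=0.0, hausman_pvalue=0.0))
```

A mechanical rule like this is a starting point; it does not replace checks on time effects or on the robustness of the standard errors.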
THANK YOU!
