
Outline: Endogeneity and Instrumental Variables Regression; Existence of Endogeneity and Evaluating IVs; Panel Data and Fixed Effects

Econometrics
Endogeneity

Toke Reichstein
Department of Innovation and Organizational Economics Copenhagen Business School


Endogeneity and Instrumental Variables Regression
- What is Endogeneity?
- What To Do About It?
- Instruments

Existence of Endogeneity and Evaluating IVs
- Testing For Endogeneity
- Testing the Instruments Strength and Validity

Panel Data and Fixed Effects
- Cross section versus Panel
- What happens in a Fixed Effects Setting?


Two Types of Data Two Sets of Models

Cross Section Data
- Instrumental Variables Regression: a two-stage regression approach to model the unobserved

Panel Data
- Exploiting the time dimension of the subjects to control for the unobserved

It is impossible to cover both in one short session, so we concentrate on the cross-section setting


What is Endogeneity? What To Do About It? Instruments

Endogeneity in Management

- Endogeneity is one of the major challenges in econometric analysis in management and in much of the social sciences
- Social science is about understanding the behaviour of people
- It is not possible to set up a laboratory as in the natural sciences and run experiments that maintain the ceteris paribus assumption
- As a consequence, much of the work done in the social sciences is biased because it suffers from endogeneity


What is Endogeneity?

Endogeneity can be caused by three circumstances:

1. Omitted Variables
2. Measurement Error
3. Simultaneity

The effect of endogeneity is bias in the estimates and hence a risk of:

- Rejecting a hypothesis that is in fact true (Type I Error)
- Failing to reject a hypothesis that is in fact false (Type II Error)


Omitted Variables Case

Example:
- Consider a sub-sample of subjects for whom we wish to understand whether a college degree has an effect on wages
- Unfortunately, to understand the effect of a college degree, we need a proxy for the subjects' intrinsic ability
- Intrinsic ability may influence the likelihood of obtaining a college degree
- Intrinsic ability may also influence the wages you obtain
- As a result, a positive estimate of the return on a college degree may be attributable to the intrinsic ability of the individual rather than to the degree


Self-selection Bias
- Omitted-variables endogeneity is sometimes called self-selection bias
- The term is often used when we wish to understand the effect of a behaviour or of being enrolled in a program (such as college)
- Here the subjects have self-selected to behave in a particular manner or have chosen to be in the program
- That choice is not a random choice, i.e. it is not a random variable
- We need to understand the choice before we can understand the effect of that choice on the main variable of interest
- We need to understand the unobserved factors that lead individuals to self-select into a scenario or a behaviour if those factors have a direct implication for the main variables

Think of a case in which you believe a standard regression would suffer from endogeneity


What to Do About Unobserved Heterogeneity / Endogeneity?

- Do nothing and accept the potential bias
- Collect panel data and correct with a model that solves the problem of endogeneity
- Find a suitable proxy for the unobserved, which then is no longer unobserved
- Apply Instrumental Variables Regression


The Example of Wages

- We wish to understand the impact of education on wages
- We are unable to measure individuals' non-education-based capabilities, which influence not only wages but also the choice and ability to complete a degree
- Here these capabilities are the unobserved heterogeneity causing bias in the estimated effect of education on wages
- In this case the bias is probably positive, since it inflates the estimated effect of education


OLS's Problem

The problem with using OLS in cases that suffer from endogeneity is that the error term and the explanatory variables become correlated:

$$\operatorname{Cov}(x_i, \varepsilon_i) \neq 0 \tag{1}$$

This is caused by the unobserved element (the omitted variable), since it is hidden in the error term


The Problem at its Core


Consider the following main equation of interest:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \gamma q_i + v_i \tag{2}$$

- Here $q_i$ is the unobserved variable (such as intrinsic ability) and $v_i$ is the traditional error term
- If we omit $q_i$ the equation transforms into:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i \tag{3}$$

where $u_i = \gamma q_i + v_i$

- If $\operatorname{cov}(q_i, x_j) \neq 0$ for some $j \in \{1, 2, \dots, k\}$, then $\operatorname{cov}(u_i, x_j) \neq 0$
- We violate one of the OLS assumptions: this is the endogeneity problem, which represents a potential bias
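To make the violated assumption concrete, here is a minimal sketch in Python (standard library only). It runs on invented data, not data from the lecture: the parameter values (an intercept of 1, a slope of 2 on x and a coefficient of 3 on the unobserved q), the balanced design and the helper name `cov` are all assumptions chosen so that the numbers come out exactly.

```python
from itertools import product

def cov(a, b):
    """Population covariance of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

# Invented balanced design: every combination of q, e, v in {-1, +1},
# so the three components are exactly uncorrelated with each other.
rows = list(product([-1.0, 1.0], repeat=3))
q = [r[0] for r in rows]   # unobserved variable (e.g. intrinsic ability)
e = [r[1] for r in rows]   # exogenous variation in x
v = [r[2] for r in rows]   # traditional error term

beta0, beta1, gamma = 1.0, 2.0, 3.0   # assumed "true" parameters
x = [qi + ei for qi, ei in zip(q, e)]  # x depends on the unobserved q
y = [beta0 + beta1 * xi + gamma * qi + vi for xi, qi, vi in zip(x, q, v)]

# Composite error term when q is omitted from the regression
u = [gamma * qi + vi for qi, vi in zip(q, v)]

cov_xu = cov(x, u)                    # non-zero: the OLS assumption fails
ols_slope = cov(x, y) / cov(x, x)     # simple-regression OLS slope

print("cov(x, u) =", cov_xu)
print("OLS slope =", ols_slope, "versus assumed true slope", beta1)
```

In this design the covariance between x and the composite error is 3 and the OLS slope is 3.5 instead of 2. The gap matches the standard single-regressor omitted-variable result, slope = true slope + coefficient on q × cov(x, q) / var(x) = 2 + 3 × 1 / 2.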

Bias and IV Regression


- We cannot consistently estimate any of the $\beta_j$s when $u_i$ is correlated with any of the regressors
- Each of the $\hat{\beta}_j$s consists of the true $\beta_j$ plus an omitted variables bias
- $\gamma > 0$ and $\operatorname{cov}(x_i, q_i) > 0$ lead to an upward (positive) bias in the estimates (i.e. the effect of $x_i$ is overestimated), where $\gamma$ is the coefficient on the unobserved $q_i$ in the main equation
- Instrumental variables (IV) regression is designed to control for unobserved heterogeneity
- IV regression is designed to correct the estimates in the main equation for the effect of the unobserved


Instruments - What are they?


- Instruments ($z_i$) are variables used to explain a variable we suspect of being endogenous and which are exogenous with respect to the main equation:

$$\operatorname{cov}(z_i, u_i) = 0 \tag{4}$$

$$\operatorname{cov}(z_i, x_i) \neq 0 \tag{5}$$

- These are tough criteria
- Challenge: to find variables correlated with the endogenous variable but uncorrelated with the part of the error term that is due to the unobserved heterogeneity
- Rule of thumb: a good instrument should correlate with the key independent variable, but not with the dependent variable of the main equation

Try to identify some instruments appropriate for use in the example of education's effect on wages


The 2SLS Model for IV Regression I

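- Assume that we want to study $y_i$ using $x_{1i}$ and a number of controls $x_{ji}$, where $j \in \{2, 3, \dots, k\}$
- Assume $x_{1i}$ to be endogenous
- We have $n$ instruments $z_{ni}$ useful for predicting $x_{1i}$
- The 2SLS model becomes

$$y_i = \beta_0 + \beta_1 x_{1i} + \sum_{j=2}^{k} \beta_j x_{ji} + \varepsilon_i \tag{6}$$

$$x_{1i} = \pi_0 + \sum_{n} \pi_n z_{ni} + \sum_{j=2}^{k} \delta_j x_{ji} + v_i \tag{7}$$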


The 2SLS Model for IV Regression II

- Since $\operatorname{cov}(z_{ni}, \varepsilon_i) = 0$, it must be that $\operatorname{cov}(\pi_0 + \pi_n z_{ni}, \varepsilon_i) = 0$
- In your instrumentation (the regression against the endogenous variable), you also include the remaining explanatory variables
- We sometimes call the instrumental variables $z_{ni}$ the excluded instruments, because they do not appear in the main equation explaining $y_i$


The 2SLS Approach


- First run the regression against the endogenous variable (first stage) and calculate the predicted $x_{1i}$ ($\hat{x}_{1i}$)
- Then use the predicted $x_{1i}$ ($\hat{x}_{1i}$) rather than the observed $x_{1i}$ in the main (regression) equation (second stage):

$$y_i = \beta_0 + \beta_1 \hat{x}_{1i} + \sum_{j=2}^{k} \beta_j x_{ji} + \varepsilon_i \tag{8}$$

- We have removed the unobserved element of the endogenous variable
- What is left is the effect of the predicted value of the endogenous variable
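The two stages can be sketched by hand as follows. This is an illustration only, written in Python with the standard library: the data are invented (a balanced design with one instrument `z`, one control `w` and an unobserved `q`), the assumed true coefficient on x is 2, and the small `ols` helper is a hypothetical stand-in for a regression routine.

```python
from itertools import product

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for p in range(k):
        pivot = A[p][p]
        A[p] = [a / pivot for a in A[p]]
        for r in range(k):
            if r != p:
                factor = A[r][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    return [A[i][k] for i in range(k)]

# Invented balanced design: all combinations of five factors in {-1, +1}.
rows = list(product([-1.0, 1.0], repeat=5))
z, w, q, e, v = (list(col) for col in zip(*rows))

# x is endogenous because it depends on the unobserved q.
x = [zi + wi + qi + ei for zi, wi, qi, ei in zip(z, w, q, e)]
# Assumed main equation: intercept 1, slope 2 on x, 1 on the control w, 3 on q.
y = [1.0 + 2.0 * xi + 1.0 * wi + 3.0 * qi + vi
     for xi, wi, qi, vi in zip(x, w, q, v)]

# First stage: regress x on the instrument AND the control, then predict x.
Z = [[1.0, zi, wi] for zi, wi in zip(z, w)]
pi_hat = ols(Z, x)
x_hat = [sum(p * c for p, c in zip(pi_hat, row)) for row in Z]

# Second stage: replace x by its prediction in the main equation.
beta_2sls = ols([[1.0, xh, wi] for xh, wi in zip(x_hat, w)], y)
# Plain OLS on the observed x, for comparison.
beta_ols = ols([[1.0, xi, wi] for xi, wi in zip(x, w)], y)

print("2SLS coefficient on x:", beta_2sls[1])
print("OLS  coefficient on x:", beta_ols[1])
```

In this design the two-stage estimate recovers the assumed slope of 2, while plain OLS gives 3. Note that the standard errors printed by a manually run second stage are not the correct 2SLS standard errors, which is one reason to use a dedicated routine in practice.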


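Difference Between OLS and IV

The OLS estimate is:

$$\hat{\beta}_1^{OLS} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{9}$$

The IV estimate is:

$$\hat{\beta}_1^{IV} = \frac{\sum_{i=1}^{n} (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})} \tag{10}$$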
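The two estimators in (9) and (10) can be computed side by side. The following is a small sketch in Python on invented data (a balanced design with an assumed true slope of 2 and an unobserved component `q`); the function names `ols_slope` and `iv_slope` are illustrative.

```python
from itertools import product

def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    """Equation (9): sum of cross-products of x and y over sum of squares of x."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

def iv_slope(z, x, y):
    """Equation (10): cross-products of z and y over cross-products of z and x."""
    mz, mx, my = mean(z), mean(x), mean(y)
    num = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    den = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    return num / den

# Invented balanced design: all combinations of z, q, e, v in {-1, +1}.
rows = list(product([-1.0, 1.0], repeat=4))
z, q, e, v = (list(col) for col in zip(*rows))
x = [zi + qi + ei for zi, qi, ei in zip(z, q, e)]          # endogenous regressor
y = [1.0 + 2.0 * xi + 3.0 * qi + vi for xi, qi, vi in zip(x, q, v)]

b_ols = ols_slope(x, y)
b_iv = iv_slope(z, x, y)
print("OLS:", b_ols, " IV:", b_iv)
```

Here OLS returns 3 and IV returns the assumed value of 2, because the instrument is by construction uncorrelated with the omitted component while the regressor is not.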


Interpretation of Results

- Generally, the interpretation of IV regression results follows the same conventions as OLS
- The test statistics of the parameters can be shown to follow (asymptotically) a standard normal distribution; they are therefore referred to as z-scores and compared with the normal distribution rather than the t-distribution used in OLS
- The $R^2$ in IV is less useful since it can in fact be negative: the Residual Sum of Squares can be higher than the Total Sum of Squares in IV. Do not interpret it.


OLS Regression


2SLS Regression


Testing For Endogeneity Testing the Instruments Strength and Validity

Bad Instruments
- If the instruments are poor in the sense that they are not exogenous to the main equation, we obtain biased results (the instruments are said not to be valid)
- The same goes if the correlation between the instruments and the endogenous variable is not significant (the instruments are said to be weak; they are required to be strong)
- This can be seen from the following expression for the estimated parameter:

$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\operatorname{Corr}(z_i, u_i)}{\operatorname{Corr}(z_i, x_i)} \cdot \frac{\sigma_u}{\sigma_x} \tag{11}$$
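The interaction between validity and strength can be illustrated numerically. The sketch below (Python, invented data) builds an instrument that is slightly contaminated by the unobserved `q` and varies how strongly it relates to the endogenous regressor through a hypothetical `relevance` parameter; the assumed true slope is 2.

```python
from itertools import product

def iv_slope(z, x, y):
    """Simple IV estimate: cross-products of z and y over cross-products of z and x."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    num = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    den = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    return num / den

# Invented balanced design: all combinations of z, q, e, v in {-1, +1}.
rows = list(product([-1.0, 1.0], repeat=4))
z, q, e, v = (list(col) for col in zip(*rows))
x = [zi + qi + ei for zi, qi, ei in zip(z, q, e)]          # endogenous regressor
y = [1.0 + 2.0 * xi + 3.0 * qi + vi for xi, qi, vi in zip(x, q, v)]

estimates = {}
for relevance in (5.0, 2.0, 0.5):
    # Invalid instrument: it contains the unobserved q. The smaller
    # `relevance` is, the weaker its link to the clean variation in x.
    z_bad = [relevance * zi + qi for zi, qi in zip(z, q)]
    estimates[relevance] = iv_slope(z_bad, x, y)
    print("relevance", relevance, "-> IV estimate", estimates[relevance])

clean = iv_slope(z, x, y)   # valid instrument for comparison
print("valid instrument -> IV estimate", clean)
```

With the same contamination, the estimate moves from 2.5 to 3.0 to 4.0 as the instrument gets weaker, while the valid instrument returns 2. This is the pattern equation (11) describes: a small correlation with the error is magnified when the correlation with the endogenous variable is small.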


Three Things to Check

Make sure that you can in fact trust your IV regression estimates. You need to check three things:
- That the suspected explanatory variable is indeed endogenous
- That you do not have weak instruments
- Overidentification, i.e. the validity of the instruments


Endogeneity or Not?

- If all the criteria are otherwise met, we can assume that a problem with endogeneity would produce different estimates in the 2SLS case compared to standard OLS
- We have endogeneity effects if the estimates of the 2SLS differ significantly from those of the OLS
- Use the Hausman test (visual comparison is too weak but may give a hint)
  - Run the two regressions (OLS and IV) and store them
  - Use the Hausman test to see whether the coefficients are different (<hausman>)
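For reference, the statistic behind this comparison can be written out. This is the standard textbook form of the Hausman statistic, not a formula taken from the slides; the hats and the $\widehat{\operatorname{Var}}$ notation are assumptions of this sketch.

```latex
\[
H = \left(\hat{\beta}_{IV} - \hat{\beta}_{OLS}\right)'
    \left[\widehat{\operatorname{Var}}(\hat{\beta}_{IV})
        - \widehat{\operatorname{Var}}(\hat{\beta}_{OLS})\right]^{-1}
    \left(\hat{\beta}_{IV} - \hat{\beta}_{OLS}\right)
\]
% For a single coefficient this reduces to
\[
H = \frac{\left(\hat{\beta}_{IV} - \hat{\beta}_{OLS}\right)^2}
         {\widehat{\operatorname{Var}}(\hat{\beta}_{IV})
        - \widehat{\operatorname{Var}}(\hat{\beta}_{OLS})}
\]
```

Under the null hypothesis that the regressor is exogenous, $H$ is asymptotically $\chi^2$ distributed, so a large value indicates that the OLS and IV estimates differ by more than sampling noise would explain.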


2SLS Regression


Durbin-Wu-Hausman Test for Endogeneity

We wish to understand whether the OLS estimates are consistent. Follow 4 steps:

1. Run the reduced-form regression against the endogenous variable
2. Extract the residuals
3. Run the main equation including these residuals as an explanatory variable
4. Test whether the coefficient on the residuals is significantly different from zero using an F test (<test var>)

If the test shows significance, there are endogeneity issues
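The four steps can be sketched as follows in Python (standard library only). The data are invented, the true slope is assumed to be 2, and the `ols` helper is a hypothetical minimal regression routine that also returns conventional standard errors. With a single residual term, the F statistic is simply the square of the t statistic.

```python
from itertools import product

def ols(X, y):
    """Return (coefficients, standard errors, residuals) of an OLS fit."""
    n, k = len(X), len(X[0])
    # Augmented system [X'X | X'y | I]; Gauss-Jordan gives beta and (X'X)^-1.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         + [1.0 if i == j else 0.0 for j in range(k)]
         for i in range(k)]
    for p in range(k):
        pivot = A[p][p]
        A[p] = [a / pivot for a in A[p]]
        for r in range(k):
            if r != p:
                factor = A[r][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    beta = [A[i][k] for i in range(k)]
    resid = [yi - sum(b * c for b, c in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(r * r for r in resid) / (n - k)
    se = [(s2 * A[i][k + 1 + i]) ** 0.5 for i in range(k)]
    return beta, se, resid

# Invented balanced design: all combinations of z, q, e, v in {-1, 0, +1}.
rows = list(product([-1.0, 0.0, 1.0], repeat=4))
z, q, e, v = (list(col) for col in zip(*rows))
x = [zi + qi + ei for zi, qi, ei in zip(z, q, e)]          # suspected endogenous
y = [1.0 + 2.0 * xi + 3.0 * qi + vi for xi, qi, vi in zip(x, q, v)]

# Steps 1 and 2: reduced-form regression of x on the instrument, keep residuals.
_, _, v_hat = ols([[1.0, zi] for zi in z], x)

# Step 3: main equation with the first-stage residuals added.
beta, se, _ = ols([[1.0, xi, vh] for xi, vh in zip(x, v_hat)], y)

# Step 4: test the coefficient on the residuals.
delta_hat = beta[2]
t_stat = delta_hat / se[2]
f_stat = t_stat ** 2
print("coefficient on residuals:", delta_hat)
print("t =", t_stat, " F =", f_stat)
```

In this design the coefficient on the residuals is 1.5 with a t statistic well above 2, so the test flags endogeneity. As a side effect, the coefficient on x in the augmented regression equals the IV estimate of 2.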


2SLS Regression


Strength of Instruments

- Many tests for weak instruments have been proposed
- There is no clear consensus on the best approach for evaluation
- Generally we consider the significance of the instruments in the first-stage equation
- Significance suggests that the instruments are not weak, since the criterion is:

$$\operatorname{cov}(z_i, x_i) \neq 0 \tag{12}$$
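A first-stage significance check can be sketched like this in Python. The data are invented, and with a single instrument the first-stage F statistic is the squared t statistic of the instrument. The threshold of 10 mentioned afterwards is a widely quoted rule of thumb and is not taken from the slides.

```python
from itertools import product

# Invented balanced design: all combinations of z, q, e in {-1, 0, +1}.
rows = list(product([-1.0, 0.0, 1.0], repeat=3))
z, q, e = (list(col) for col in zip(*rows))
x = [zi + qi + ei for zi, qi, ei in zip(z, q, e)]   # endogenous regressor

n = len(x)
z_bar, x_bar = sum(z) / n, sum(x) / n
szz = sum((zi - z_bar) ** 2 for zi in z)
szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))

# First-stage regression of x on the instrument z (with intercept).
pi1 = szx / szz
pi0 = x_bar - pi1 * z_bar
resid = [xi - pi0 - pi1 * zi for xi, zi in zip(x, z)]

# Conventional standard error of the slope, then t and F statistics.
s2 = sum(r * r for r in resid) / (n - 2)
se_pi1 = (s2 / szz) ** 0.5
t_stat = pi1 / se_pi1
f_stat = t_stat ** 2

print("first-stage slope:", pi1)
print("t =", t_stat, " F =", f_stat)
```

Here the first-stage F statistic is 12.5. Under the common rule of thumb that a first-stage F below about 10 signals a weak instrument, this invented instrument would pass, though only narrowly.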


Overidentification and Validity

- We say that we have potential overidentification if the number of instruments exceeds the number of endogenous variables
- We could potentially drop some of the instruments to make the estimation less restricted
  - We have more exogenous variables than needed to estimate the parameter in the main equation
- We also increase the likelihood of invalid instruments that do not uphold the rule:

$$\operatorname{Cov}(z_i, u_i) = 0 \tag{13}$$


Overidentification (Sargan) Test

The test is used to test for overidentification. Follow these steps:
1. Estimate the 2SLS IV regression and extract the residuals
2. Regress these residuals on all exogenous variables and extract the $R^2$
3. Calculate $nR^2$ (with $n$ the number of observations), which is $\chi^2$ distributed
4. Compare the value with the critical value in the chi-square table with degrees of freedom equal to the number of instruments less the number of endogenous variables
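The steps above can be sketched in Python as follows. Everything here is an assumption made for illustration: the data are invented, there are two instruments and one endogenous regressor (so one degree of freedom), there are no additional controls, and `ols`, `predict` and `sargan_stat` are hypothetical helper names.

```python
from itertools import product

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for p in range(k):
        pivot = A[p][p]
        A[p] = [a / pivot for a in A[p]]
        for r in range(k):
            if r != p:
                factor = A[r][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    return [A[i][k] for i in range(k)]

def predict(X, beta):
    return [sum(b * c for b, c in zip(beta, row)) for row in X]

def sargan_stat(y, x, instruments):
    """n * R^2 from regressing the 2SLS residuals on the instruments."""
    n = len(y)
    Z = [[1.0, *row] for row in zip(*instruments)]
    x_hat = predict(Z, ols(Z, x))                       # first stage
    b0, b1 = ols([[1.0, xh] for xh in x_hat], y)        # second stage
    u = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]     # residuals use observed x
    fitted = predict(Z, ols(Z, u))                      # auxiliary regression
    u_bar = sum(u) / n
    ssr = sum((ui - fi) ** 2 for ui, fi in zip(u, fitted))
    sst = sum((ui - u_bar) ** 2 for ui in u)
    return n * (1.0 - ssr / sst)

# Invented balanced design: all combinations of five factors in {-1, +1}.
rows = list(product([-1.0, 1.0], repeat=5))
z1, z2, q, e, v = (list(col) for col in zip(*rows))
x = [a + b + qi + ei for a, b, qi, ei in zip(z1, z2, q, e)]

# Case 1: both instruments are excluded from the main equation (valid).
y_valid = [1.0 + 2.0 * xi + 3.0 * qi + vi for xi, qi, vi in zip(x, q, v)]
# Case 2: z2 also affects y directly, so it is not a valid instrument.
y_invalid = [yi + 2.0 * b for yi, b in zip(y_valid, z2)]

stat_valid = sargan_stat(y_valid, x, [z1, z2])
stat_invalid = sargan_stat(y_invalid, x, [z1, z2])
print("n*R^2 with valid instruments:  ", stat_valid)
print("n*R^2 with an invalid instrument:", stat_invalid)
```

With one degree of freedom the 5% critical value of the chi-square distribution is about 3.84. The first statistic is zero in this constructed design and the second is 8, so only the second case rejects the validity of the instruments.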


Overidentification Test
- If the statistic ($nR^2$) exceeds the critical $\chi^2$ value, we can conclude that the instruments are not exogenous and hence invalid. They are correlated with the error term and hence have some explanatory power in the main equation.
- Even when the overidentification test suggests that the instruments are valid, we should be very careful: the test assumes that at least one instrument is valid. If none of the instruments fulfils the criterion $\operatorname{Cov}(z_i, u_i) = 0$, the test might suggest that the instruments are valid even when they are not!
- Use your own reasoning/theory in arguing why the instruments can be considered correlated with the endogenous variable but not with $y_i$

Overidentification Test in Stata

- Instead of <ivregress>, use <ivreg2>
- This procedure will automatically produce the key statistics for evaluating overidentification (validity)


Exercise - Endogeneity

- Assume you wish to understand the effect of political engagement on the number of minutes individuals spend reading their newspaper
- Why would there be endogeneity?
- What instruments do you think could be used to correct for this endogeneity?


Cross section versus Panel What happens in a Fixed Effects Setting?

Panel Data
- Panel data: cross sections of subjects re-sampled
- Typically, we observe characteristics of the subjects at several different points in time
- Having several observations on the same subject allows us to control for unobserved characteristics
- Panel data also allow researchers to investigate causality
- Panel data allow the researcher to investigate the lag of an effect
- Some questions simply require panel data for investigation


Divorce Rate Example


- Let's consider the divorce rate across American states
- We wish to understand the relationship between welfare payments (AFDC) and the divorce rate
- We consider the cross section of states, looking at the size of AFDC and the divorce rate in each state
- Prior to the study we would think that the relationship is positive, since states with desirable economic climates enjoy both low divorce rates and low welfare payments
- Furthermore, welfare payment systems are conducive to divorce since they help individuals cope as single rather than married


Cross Section of Divorce in American States


A regression line would suggest that there is a negative relationship (-0.37) between aid to families with dependent children and the divorce rate


Cross Section of Divorce in American States

- We would conclude that a higher AFDC is associated with a lower divorce rate
- The hasty researcher will conclude that wealthy states that can afford extensive AFDC programs also foster a different cultural and economic climate
- Furthermore, AFDC can potentially help families with dependent children to cope with their situation and avoid quarrels over their economic position


Panel of Observations Across American States


Now we see that the relationship within the states goes in the opposite direction, suggesting the positive relationship that was expected initially


Cross Section Versus Panel

- The example illustrates that a cross-sectional analysis and a panel analysis may point in completely opposite directions
- The dynamic relationship between the variables differs dramatically from the cross-sectional relationship between states
- This may point to cross sections suffering from problems in the shape of unobserved heterogeneity across the studied subjects, a feature that panel data can be used to remedy


Panel Data
- Let's assume we did not settle for a cross section and therefore put some effort into collecting further data, giving us a panel dataset
- The equation to be tested changes only marginally, but the panel provides strong options for controlling for unobserved heterogeneity/endogeneity
- The original equation looked as depicted in equation (14); let's assume it is changed into equation (15)

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \gamma q_i + v_i \tag{14}$$

$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + \gamma q_i + v_{it} \tag{15}$$


Fixed Effects Modelling

- In the fixed effects model we transform every variable by subtracting its own subject-specific mean value:

$$(y_{it} - \bar{y}_i) = (\beta_0 - \beta_0) + \beta_1 (x_{1it} - \bar{x}_{1i}) + \dots + \beta_k (x_{kit} - \bar{x}_{ki}) + \gamma (q_i - \bar{q}_i) + (v_{it} - \bar{v}_i) \tag{16}$$

- Since all variables that do not change over time (those without $t$ subscripts) are equal to their own mean, they drop out of the equation; this includes the intercept and the unobserved $q_i$
- We get a new equation expressed by:

$$\ddot{y}_{it} = \beta \ddot{x}_{it} + \ddot{v}_{it} \tag{17}$$
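The demeaning step can be sketched in a few lines of Python. The panel below is invented (three subjects observed in three periods, with an assumed within-subject slope of 2 and subject-specific levels chosen so that the cross-sectional pattern points the other way); the helper names `slope` and `demean_by_unit` are illustrative.

```python
def slope(x, y):
    """Simple-regression slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

def demean_by_unit(values, units):
    """Subtract each unit's own mean from its observations."""
    totals, counts = {}, {}
    for val, u in zip(values, units):
        totals[u] = totals.get(u, 0.0) + val
        counts[u] = counts.get(u, 0) + 1
    return [val - totals[u] / counts[u] for val, u in zip(values, units)]

# Invented panel: (mean of x, unobserved fixed level q_i) for three subjects.
# Subjects with high x have a low fixed level, which distorts the pooled slope.
units, x, y = [], [], []
for i, (x_mean, q_i) in enumerate([(0.0, 60.0), (10.0, 30.0), (20.0, 0.0)]):
    for t, noise in zip([-1.0, 0.0, 1.0], [1.0, -2.0, 1.0]):
        units.append(i)
        x.append(x_mean + t)
        y.append(q_i + 2.0 * (x_mean + t) + noise)   # within slope assumed = 2

pooled = slope(x, y)                                  # ignores the panel structure
within = slope(demean_by_unit(x, units), demean_by_unit(y, units))

print("pooled OLS slope:   ", pooled)
print("fixed effects slope:", within)
```

The pooled slope is negative (about -0.97) while the within slope recovers the assumed value of 2, because demeaning removes the subject-specific level from every observation. This mirrors the sign reversal in the divorce rate example.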


Controlled for All Fixed Unobserved Effects

- The Fixed Effects estimation evens out all effects that are fixed
- That also includes all unobserved fixed effects
- This puts a limit on which variables we can obtain parameters for
- The degrees of freedom are not calculated in the standard way
- Degrees of freedom = N*T - N - k
- We lose N to the demeaning and k to the explanatory variables
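As a quick worked example of the degrees-of-freedom rule, take a hypothetical balanced panel (the numbers are invented for illustration) with $N = 50$ subjects, $T = 10$ periods and $k = 3$ explanatory variables:

```latex
\[
\text{df} = N \cdot T - N - k = 50 \cdot 10 - 50 - 3 = 447
\]
```

A pooled regression on the same 500 observations that ignored the demeaning would instead report $500 - 3 - 1 = 496$ degrees of freedom (counting the intercept), which would understate the standard errors.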


Possibilities in Fixed Effects

- Even if we cannot estimate coefficients for time-invariant variables, we can include them interacted with time-varying variables
- Interacting them with time dummies enables us to express how the effect of a constant variable changes across time
- We are able to determine whether the effect increases or decreases as time goes by, while keeping the overall effect fixed
- Example: how does education affect wages after completing final exams? We cannot put education in, but we can add education interacted with time dummies


Be Careful

- This was only a short introduction to Fixed Effects; much more is needed for sound panel data analysis
- There are other options that should be considered and may suit your data better:
  - Random Effects
  - First Differencing
  - Least Squares Dummy Variables Model (LSDV)
