Effects
Econometrics
Endogeneity
Toke Reichstein
Department of Innovation and Organizational Economics Copenhagen Business School
Toke Reichstein
Econometrics
Outline
- Endogeneity and Instrumental Variables Regression
  - What is Endogeneity?
  - What To Do About It?
  - Instruments
- Existence of Endogeneity and Evaluating IVs
  - Testing For Endogeneity
  - Testing the Instruments: Strength and Validity
- Panel Data and Fixed Effects
  - Cross section versus Panel
  - What happens in a Fixed Effects Setting?
Panel Data
Exploiting the time dimension of the subjects to control for the unobserved
Impossible to cover both in one short session - we concentrate on the cross section setting
Endogeneity in Management
- Endogeneity is one of the major challenges in econometric analysis in management and in much of the social sciences
- The social sciences are about understanding the behaviour of people
- It is not possible to set up a laboratory, as in the natural sciences, and run experiments in which the ceteris paribus assumption holds
- As a consequence, much of the work done in the social sciences is biased because it suffers from endogeneity
What is Endogeneity?
Example:
- Consider a sub-sample of subjects for whom we wish to understand whether a college degree has an effect on wages
- Unfortunately, to isolate the effect of a college degree, we need a proxy for the subject's intrinsic ability
- Intrinsic ability may influence the likelihood of obtaining a college degree
- Intrinsic ability may also influence the wages one obtains
- As a result, a positive estimate of the return on a college degree may be attributable to the intrinsic ability of the individual rather than to the degree
Self-selection Bias
- Endogeneity caused by omitted variables is sometimes called self-selection bias
- The term is often used when we wish to understand the effect of a behaviour or of being enrolled in a program (such as college)
- Here the subjects have self-selected into behaving in a particular manner, or have chosen to be in the program
- That choice is not random: it is not a random variable
- We need to understand the choice before we can understand its effect on the main variable of interest
- If an unobserved factor leads individuals to self-select into a scenario or a behaviour, and that factor also has a direct effect on the main variable, we need to understand that factor
Think of a case in which you believe a standard regression would suffer from endogeneity
- Do nothing and accept the potential bias
- Collect panel data and correct with a model that solves the problem of endogeneity
- Find a suitable proxy for the unobserved, which then is no longer unobserved
- Apply Instrumental Variables Regression
- We wish to understand the impact of education on wages
- We are unable to measure individuals' non-education-based capabilities, which influence not only wages but also the choice and ability to complete a degree
- Here these capabilities are the unobserved heterogeneity causing bias in the estimated effect of education on wages
- In this case the bias is probably positive, inflating the estimated effect of education
OLS's Problem
- The problem with using OLS in cases that suffer from endogeneity is that the error term and the explanatory variables become correlated:
$$\operatorname{Cov}(x_i, \varepsilon_i) \neq 0 \qquad (1)$$
- This is caused by the unobserved element (omitted variable), since it is hidden in the error term
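- Suppose the full model is
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \gamma q_i + v_i \qquad (2)$$
- Here $q_i$ is the unobserved variable (such as intrinsic ability) and $v_i$ is the traditional error term
- If we omit $q_i$, the equation transforms into
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i \qquad (3)$$
  where $u_i = \gamma q_i + v_i$
- If $\operatorname{cov}(q_i, x_j) \neq 0$ for some $j \in \{1, 2, \dots, k\}$, then $\operatorname{cov}(u_i, x_j) \neq 0$
- We violate one of the OLS assumptions: this is the endogeneity problem, which represents a potential bias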
- With $\gamma$ the coefficient on the unobserved $q_i$, $\gamma > 0$ and $\operatorname{cov}(x_i, q_i) > 0$ lead to an upward (positive) bias in the estimates (i.e. the effect of $x_i$ is overestimated)
- Instrumental variables (IV) regression is designed to control for unobserved heterogeneity
- IV regression is designed to correct the estimates in the main equation for the effect of the unobserved
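The direction of the bias can be illustrated with a small simulation. This is only a sketch under assumed values: the true coefficients, the 0.8 loading of x on q, and the sample size are hypothetical and not taken from the slides. It compares the slope when q is left in the error term with the slope when q is controlled for (done here by partialling q out of both x and y).

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """OLS slope of y on x (intercept included): cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, q):
    """Residuals from regressing v on q (intercept included)."""
    b = slope(q, v)
    mv, mq = mean(v), mean(q)
    return [vi - mv - b * (qi - mq) for vi, qi in zip(v, q)]

random.seed(0)
n = 5000
beta1, gamma = 1.0, 1.0   # hypothetical true coefficients, gamma > 0

q = [random.gauss(0, 1) for _ in range(n)]        # unobserved ability
x = [0.8 * qi + random.gauss(0, 1) for qi in q]   # cov(x, q) > 0
y = [beta1 * xi + gamma * qi + random.gauss(0, 1) for xi, qi in zip(x, q)]

# q omitted: it hides in the error term and the slope is pushed upwards
b_omitted = slope(x, y)

# q controlled for: partial q out of x and y, then regress residual on residual
b_controlled = slope(residualize(x, q), residualize(y, q))

print("slope with q omitted:   ", round(b_omitted, 3))
print("slope with q controlled:", round(b_controlled, 3))
```

With these assumed values the large-sample bias is gamma * cov(x, q) / var(x) = 0.8 / 1.64, roughly 0.49, so the first slope should land well above the true value of 1 while the second stays close to it.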
- These are tough criteria
- Challenge: to find variables correlated with the endogenous variable but uncorrelated with the part of the error term that is due to the unobserved heterogeneity
- Rule of thumb: a good instrument should correlate with the key independent variable, but should relate to the dependent variable of the main equation only through that variable
Try to identify some instruments appropriate for use in the example of education's effect on wages
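- Assume that we want to study $y_i$ using $x_{1i}$ and a number of controls $x_{ji}$, where $j \in \{2, 3, \dots, k\}$
- Assume $x_{1i}$ to be endogenous
- We have $n$ instruments $z_{ni}$ useful for predicting $x_{1i}$
- The 2SLS model becomes
$$y_i = \beta_0 + \beta_1 x_{1i} + \sum_{j} \beta_j x_{ji} + \varepsilon_i \qquad (6)$$
$$x_{1i} = \pi_0 + \sum_{n} \pi_n z_{ni} + \sum_{j} \delta_j x_{ji} + v_i \qquad (7)$$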
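- Since $\operatorname{cov}(z_{ni}, \varepsilon_i) = 0$, it must be that $\operatorname{cov}(\pi_0 + \pi_n z_{ni}, \varepsilon_i) = 0$, where $\pi_0 + \pi_n z_{ni}$ is the part of the endogenous variable predicted by the instruments
- In your instrumentation (the regression explaining the endogenous variable), you also include the remaining explanatory variables
- The instrumental variables $z_{ni}$ are sometimes called the excluded instruments, because they do not appear in the main equation explaining $y_i$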
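- In the second stage, the endogenous variable is replaced by its predicted value $\hat{x}_{1i}$ from the first stage:
$$y_i = \beta_0 + \beta_1 \hat{x}_{1i} + \sum_{j} \beta_j x_{ji} + \varepsilon_i \qquad (8)$$
- We have removed the unobserved element of the endogenous variable
- What is left is the effect of the predicted value of the endogenous variable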
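[Equation (9): not recoverable from the source]

With a single instrument $z_i$ for a single endogenous regressor $x_i$, the IV estimator is
$$\hat{\beta}_1^{IV} = \frac{\sum_{i=1}^{n} (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})} \qquad (10)$$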
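A minimal sketch of the ratio in equation (10), assuming one instrument, one endogenous regressor and no further controls. The data-generating values are hypothetical. The sketch also runs the two stages by hand to show that, in this simple case, both routes give the same slope.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Equation (10): cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

def two_stage_slope(z, x, y):
    """First stage: predict x from z. Second stage: regress y on the prediction."""
    pi1 = ols_slope(z, x)
    mz, mx = mean(z), mean(x)
    x_hat = [mx + pi1 * (zi - mz) for zi in z]
    return ols_slope(x_hat, y)

random.seed(1)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument, unrelated to q
q = [random.gauss(0, 1) for _ in range(n)]   # unobserved heterogeneity
x = [0.7 * zi + 0.8 * qi + random.gauss(0, 1) for zi, qi in zip(z, q)]
y = [1.0 * xi + qi + random.gauss(0, 1) for xi, qi in zip(x, q)]  # true slope 1.0

b_ols = ols_slope(x, y)
b_iv = iv_slope(z, x, y)
b_2sls = two_stage_slope(z, x, y)

print("OLS: ", round(b_ols, 3))
print("IV:  ", round(b_iv, 3))
print("2SLS:", round(b_2sls, 3))
```

The OLS slope absorbs part of the effect of q, while the IV and two-stage slopes should agree with each other up to rounding and sit near the assumed true value.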
Interpretation of Results
- Generally, IV regression follows the same conventions as OLS
- The test statistics of the parameters can be shown to follow a standard normal distribution asymptotically; they are therefore reported as z scores and compared to the normal distribution rather than to the t distribution used in OLS
- The $R^2$ in IV is less useful, since it can in fact be negative: the Residual Sum of Squares can be higher than the Total Sum of Squares in IV. Do not interpret it.
OLS Regression
2SLS Regression
Bad Instruments
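- If the instruments are poor in the sense that they are not exogenous to the main equation, we obtain biased results (the instruments are said not to be valid)
- The same goes if the correlation between the instruments and the endogenous variable is not significant (the instruments are said to be weak; they are required to be strong)
- This can be seen from the following expression for the estimated parameter:
$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\operatorname{Corr}(z_i, u_i)}{\operatorname{Corr}(z_i, x_i)} \cdot \frac{\sigma_u}{\sigma_x} \qquad (11)$$
- A small $\operatorname{Corr}(z_i, x_i)$ in the denominator magnifies even a small $\operatorname{Corr}(z_i, u_i)$ in the numerator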
- Make sure that you can in fact trust your IV regression estimates
- You need to check three things:
  - That the suspected explanatory variable is indeed endogenous
  - That you do not have weak instruments
  - Overidentification: the validity of the instruments
Endogeneity or Not?
- If all the other criteria are met, a problem with endogeneity should produce different estimates in the 2SLS case than in standard OLS
- We have endogeneity effects if the 2SLS estimates differ significantly from the OLS estimates
- Use the Hausman test (visual comparison is too weak, but may give a hint):
  - Run the two regressions (OLS and IV) and store them
  - Use the Hausman test to see whether the coefficients are different (<hausman>)
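For reference, the comparison of the two sets of estimates is commonly summarised in the following statistic. This is the standard textbook form and is not taken from the slides; $\hat{V}(\cdot)$ denotes an estimated covariance matrix and $m$ the number of coefficients being compared, both notational assumptions made here.

```latex
H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})'
    \left[ \hat{V}(\hat{\beta}_{IV}) - \hat{V}(\hat{\beta}_{OLS}) \right]^{-1}
    (\hat{\beta}_{IV} - \hat{\beta}_{OLS})
    \;\overset{a}{\sim}\; \chi^2(m)

% Scalar case (one coefficient compared, m = 1):
H = \frac{(\hat{\beta}_{IV} - \hat{\beta}_{OLS})^2}
         {\widehat{\operatorname{Var}}(\hat{\beta}_{IV}) - \widehat{\operatorname{Var}}(\hat{\beta}_{OLS})}
    \;\overset{a}{\sim}\; \chi^2(1)
```

Under the null hypothesis that the regressor is exogenous, both estimators are consistent and OLS is the more efficient one, so a large value of $H$ speaks against exogeneity.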
2SLS Regression
2SLS Regression
Strength of Instruments
- Many tests for weak instruments have been proposed
- There is no clear consensus on the best approach for evaluation
- Generally we consider the significance of the instruments in the first stage equation
- Significance suggests that the instruments are not weak, since the criterion is:
$$\operatorname{cov}(z_i, x_i) \neq 0 \qquad (12)$$
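The significance check in the first stage can be sketched as follows. This assumes a single instrument and no additional controls, so the F statistic for the instrument equals its squared t statistic; the instrument strengths 0.7 and 0.02 are hypothetical values chosen to contrast a strong and a weak case.

```python
import random

def first_stage_f(z, x):
    """F statistic of the instrument in the first-stage regression of x on z.

    With one instrument and an intercept, F is the squared t statistic
    of the slope coefficient.
    """
    n = len(x)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    b = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / szz
    ssr = sum((xi - mx - b * (zi - mz)) ** 2 for zi, xi in zip(z, x))
    se = (ssr / (n - 2) / szz) ** 0.5   # standard error of the slope
    return (b / se) ** 2

def simulate_x(strength, n=2000, seed=3):
    """Endogenous x driven by instrument z (with given strength) and unobserved q."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    q = [rng.gauss(0, 1) for _ in range(n)]
    x = [strength * zi + 0.8 * qi + rng.gauss(0, 1) for zi, qi in zip(z, q)]
    return z, x

f_strong = first_stage_f(*simulate_x(0.7))
f_weak = first_stage_f(*simulate_x(0.02))

print("first-stage F, strong instrument:", round(f_strong, 1))
print("first-stage F, weak instrument:  ", round(f_weak, 1))
```

A frequently cited rule of thumb treats a first-stage F below roughly 10 as a warning sign of weak instruments. That threshold is a convention and not a claim made in the slides.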
Overidentification - Validity
- We say that we have potential overidentification if the number of instruments exceeds the number of endogenous variables
- We could potentially drop some of the instruments to make the estimation less restricted
  - We have more exogenous variables than needed to estimate the parameter in the main equation
- With more instruments, we also increase the likelihood of including invalid instruments that do not uphold the rule:
$$\operatorname{Cov}(z_i, u_i) = 0 \qquad (13)$$
- The test is used to test the overidentifying restrictions, i.e. the validity of the instruments
- Follow these steps:
1. Estimate the 2SLS IV regression and extract the residuals
2. Regress these residuals on all exogenous variables and extract the $R^2$
3. Calculate $nR^2$, which is $\chi^2$ distributed under the null hypothesis that the instruments are valid
4. Compare the value with the critical value in the chi-square table, with degrees of freedom equal to the number of instruments less the number of endogenous variables
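The four steps can be sketched in plain Python. Everything here is an illustrative assumption: one endogenous regressor, two instruments, no further controls, simulated data, and helper names (`solve`, `ols`, `fitted`, `overid_nr2`, `simulate`) invented for this example. The second simulated dataset gives instrument z2 a direct effect on y, which makes it invalid.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """OLS of y on the given columns plus an intercept (intercept first)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)

def fitted(columns, coef):
    n = len(columns[0])
    return [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
            for i in range(n)]

def overid_nr2(y, x, instruments):
    # Step 1: 2SLS, then residuals computed with the actual x (not x_hat)
    x_hat = fitted(instruments, ols(instruments, x))
    b0, b1 = ols([x_hat], y)
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
    # Step 2: regress the residuals on all exogenous variables, get R^2
    r_hat = fitted(instruments, ols(instruments, resid))
    m = sum(resid) / len(resid)
    ss_tot = sum((r - m) ** 2 for r in resid)
    ss_res = sum((r - f) ** 2 for r, f in zip(resid, r_hat))
    # Step 3: n * R^2
    return len(y) * (1.0 - ss_res / ss_tot)

def simulate(direct_effect_z2, n=2000, seed=1):
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    q = [rng.gauss(0, 1) for _ in range(n)]
    x = [a + b + c + rng.gauss(0, 1) for a, b, c in zip(z1, z2, q)]
    y = [1 + 2 * xi + qi + direct_effect_z2 * zi + rng.gauss(0, 1)
         for xi, qi, zi in zip(x, q, z2)]
    return y, x, [z1, z2]

stat_valid = overid_nr2(*simulate(0.0))
stat_invalid = overid_nr2(*simulate(1.0))

# Step 4: 2 instruments - 1 endogenous variable = 1 degree of freedom;
# the 5% critical value of chi-square(1) is about 3.841
print("nR^2, valid instruments:", round(stat_valid, 2))
print("nR^2, z2 invalid:       ", round(stat_invalid, 2))
```

In the invalid case the statistic should be far above the critical value. In the valid case it will usually, though not always, fall below it, since the test rejects a true null about 5% of the time at that level.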
Overidentification Test
- If the statistic ($nR^2$) exceeds the critical $\chi^2$ value, we can conclude that the instruments are not exogenous and hence invalid. They are correlated with the error term and hence have some explanatory power in the main equation.
- Even when the overidentification test suggests that the instruments are valid, we should be very careful:
  - The test assumes that at least one instrument is valid. If none of the instruments fulfils the criterion $\operatorname{Cov}(z_i, u_i) = 0$, the test might suggest that the instruments are valid even when they are not!
  - Use your own reasoning and theory to argue why the instruments can be considered correlated with the endogenous variable but unrelated to $y_i$ other than through that variable
- Instead of <ivregress>, use <ivreg2>
- This procedure automatically produces the key statistics for evaluating overidentification (validity)
Exercise - Endogeneity
- Assume you wish to understand the effect of political engagement on the number of minutes individuals spend reading their newspaper
- Why would there be endogeneity?
- What instruments do you think could be used to correct for this endogeneity?
Panel Data
- Panel data: cross sections of subjects re-sampled over time
- Typically, we observe characteristics of the subjects at several different points in time
- Having several observations on the same subject allows us to control for unobserved characteristics
- Panel data also allow researchers to investigate causality
- Panel data allow the researcher to investigate the lag of an effect
- Some questions simply require panel data for investigation
- We would conclude that a higher level of AFDC (Aid to Families with Dependent Children) is associated with a low divorce rate
- The fast and hasty researcher will conclude that wealthy states that can afford extensive AFDC programs are also conducive to a different cultural and economic climate
- Furthermore, AFDC can potentially help families with dependent children cope with their situation and avoid quarrels over their economic position
- The example illustrates that a cross sectional analysis and a panel analysis may point in completely opposite directions
- The dynamic relationship between the variables differs dramatically from the cross sectional relationship between states
- This may indicate that cross sections suffer from unobserved heterogeneity across the studied subjects, a problem panel data can be used to remedy
Panel Data
- Let us assume we did not settle for a cross section, and therefore put some effort into collecting further data, giving us a panel dataset
- The equation to be tested changes only marginally, but it provides strong options for controlling for unobserved heterogeneity/endogeneity
- The original equation looked as depicted in equation 14; let us assume it is changed into equation 15
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \gamma q_i + v_i \qquad (14)$$
$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + \gamma q_i + v_{it} \qquad (15)$$
- Since all variables that do not change over time (those without t subscripts) are equal to their own mean for each subject, they are demeaned to zero and disappear; this includes the intercept
- We get a new equation expressed by:
$$\ddot{y}_{it} = \beta \ddot{x}_{it} + \ddot{v}_{it} \qquad (17)$$
  where the double dot denotes the deviation from the subject's own mean over time, e.g. $\ddot{y}_{it} = y_{it} - \bar{y}_i$
- Fixed Effects estimation sweeps out all effects that are fixed over time
- That also includes all unobserved fixed effects
- This puts a limit on the variables for which we can obtain parameters
- The degrees of freedom are not calculated in the standard way
- Degrees of freedom = N*T - N - k
- We lose N to the demeaning and k to the explanatory variables
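A minimal sketch of the demeaning (within) transformation on simulated panel data. The panel dimensions, coefficients and helper names are hypothetical. Each subject has a fixed unobserved q_i that is correlated with x, so the pooled slope is biased, while the slope on the demeaned data is not, because q_i drops out.

```python
import random

def slope(x, y):
    """OLS slope of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_subject(values, ids):
    """Subtract each subject's own time average from its observations."""
    sums, counts = {}, {}
    for v, i in zip(values, ids):
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - sums[i] / counts[i] for v, i in zip(values, ids)]

random.seed(2)
N, T, k = 200, 5, 1          # subjects, periods, explanatory variables
ids, x, y = [], [], []
for i in range(N):
    q_i = random.gauss(0, 1)                  # unobserved, fixed over time
    for t in range(T):
        x_it = q_i + random.gauss(0, 1)       # x correlated with q_i
        y_it = 1.0 * x_it + 2.0 * q_i + random.gauss(0, 1)  # true slope 1.0
        ids.append(i)
        x.append(x_it)
        y.append(y_it)

b_pooled = slope(x, y)   # ignores the panel structure, q_i sits in the error
b_fe = slope(demean_by_subject(x, ids), demean_by_subject(y, ids))
dof = N * T - N - k      # degrees of freedom as stated on the slide

print("pooled OLS slope:   ", round(b_pooled, 3))
print("fixed effects slope:", round(b_fe, 3))
print("degrees of freedom: ", dof)
```

Under these assumptions the pooled slope converges to 2 in large samples (true slope 1 plus a bias of 2 * cov(x, q) / var(x) = 1), whereas the fixed effects slope should be close to 1.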
- Even if we cannot estimate effects for variables that are fixed over time, we can include them interacted with time-varying variables
- Interacting with time dummies enables us to express how the effect of a constant variable changes across time
- We are able to determine whether the effect increases or decreases as time goes by, while keeping the overall effect fixed
- Example: how does education affect wages after completing final exams? We cannot put education in, but we can add education interacted with time dummies
Be Careful
- This was only a short introduction to Fixed Effects; much more is needed for sound panel data analysis
- There are other options that should be considered and that may suit your data better:
  - Random Effects
  - First Differencing
  - Least Squares Dummy Variables Model (LSDV)