Multicollinearity
Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Examples of multicollinear predictors are the height and weight of a person, years of education and income, and the assessed value and square footage of a home. Consequences of high multicollinearity:
1. Increased standard errors of the estimates of the βs (decreased reliability).
2. Often confusing and misleading results.
Example
Suppose that we fit a model with x1 = income and x2 = years of education as predictors and y = intake of fruits and vegetables as the response. Years of education and income are correlated, and we expect a positive association of both with intake of fruits and vegetables. t-tests for the individual β1, β2 might suggest that neither of the two predictors is significantly associated with y, while the F-test indicates that the model is useful for predicting y. Contradiction? No! This has to do with the interpretation of regression coefficients.
F or t
β1 is the expected change in y due to a unit change in x1, given that x2 is already in the model. β2 is the expected change in y due to a unit change in x2, given that x1 is already in the model. Since both x1 and x2 contribute redundant information about y, once one of the predictors is in the model, the other one does not have much more to contribute. This is why the F-test indicates that at least one of the predictors is important, yet the individual t-tests indicate that the contribution of each predictor, given that the other one has already been included, is not really important.
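To make this concrete, here is a minimal simulation of the F-versus-t situation described above. The data and the use of Python's statsmodels are my illustrative assumptions (the notes name no software): two nearly collinear predictors whose individual t-tests typically come out insignificant even though the overall F-test finds the model useful.

```python
# Hypothetical data: x2 is nearly identical to x1, so each predictor is
# redundant given the other, inflating the individual standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)                    # e.g., income (standardized)
x2 = x1 + rng.normal(scale=0.1, size=n)    # e.g., education, nearly collinear
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(scale=2.0, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

print(fit.pvalues[1:])   # individual t-test p-values: often both large
print(fit.f_pvalue)      # overall F-test p-value: small (model is useful)
```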
Detecting Multicollinearity
Easy way: compute the correlations between all pairs of predictors. If some correlations r are close to 1 or -1, remove one of the two correlated predictors from the model. Another way: calculate the variance inflation factor for each predictor x_j:

VIF_j = 1 / (1 - R²_j)

where R²_j is the coefficient of determination of the model that regresses the jth predictor on all the other predictors. If VIF_j > 10, then there is a problem with multicollinearity.
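A sketch of the VIF calculation, again with statsmodels (an assumed tool choice) on made-up collinear data; variance_inflation_factor implements exactly the 1/(1 - R²_j) formula above.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = 2 * x1 + rng.normal(scale=0.2, size=50)   # deliberately collinear with x1
x3 = rng.normal(size=50)                       # independent predictor
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing x_j on the others
for j in range(1, X.shape[1]):                 # skip the constant column
    print(f"VIF for predictor {j}: {variance_inflation_factor(X, j):.1f}")
```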
Heteroscedasticity
When the variance of the errors e_i depends on the value of y_i, we say that the variances are heteroscedastic (unequal). Heteroscedasticity may occur for many reasons, but typically occurs when responses are not normally distributed or when the errors are not additive in the model. Examples of non-normal responses are: Poisson counts: number of sick days taken per person/month, or number of traffic crashes per intersection/year. Binomial proportions: proportion of felons who are repeat offenders, or proportion of consumers who prefer low-carb foods.
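The notes do not name a formal diagnostic here; one common choice (my addition, not the author's) is the Breusch-Pagan test. A minimal sketch on simulated data whose error spread grows with the predictor:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=200)
y = 3 + 2 * x + rng.normal(scale=x)       # error spread increases with x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(lm_pvalue)                          # small p-value -> heteroscedasticity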
Serial correlation in the errors (e_t = ρ e_{t-1} + u_t, where ρ is the lag-1 autocorrelation) can be detected with the Durbin-Watson statistic d = Σ_{t=2..n} (e_t - e_{t-1})² / Σ_{t=1..n} e_t².
If ρ = 1, then we expect e_t = e_{t-1}. When this is true, d ≈ 0.
If ρ = -1, then we expect e_t = -e_{t-1} and e_t - e_{t-1} = -2e_{t-1}. When this is true, d ≈ 4.
If ρ = 0, then we expect d ≈ 2.
Hence values of the test statistic far from 2 indicate that serial correlation is likely present.
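A short sketch computing d on simulated AR(1) errors (ρ = 0.8 is an illustrative value), using the durbin_watson helper from statsmodels:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                     # e_t = rho * e_{t-1} + u_t
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1 + 2 * x + e

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print(durbin_watson(resid))               # well below 2 -> positive serial correlation
```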
Solution
Use OLS to estimate the regression and fix the standard errors.
A. We know OLS is unbiased; it's just that the usual formula for the standard errors is wrong (and hence tests can be misleading).
B. We can get consistent estimates of the standard errors (as the sample size goes to infinity, a consistent estimator gets arbitrarily close to the true value in a probabilistic sense), called Newey-West standard errors.
C. When specifying the regression, use the coefficient covariance matrix option and the HAC Newey-West method (see the sketch after this list).
D. Most of the time, this approach is sufficient.
E. If not, we cannot use OLS and need to use Generalized Least Squares.
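A minimal sketch of step C in statsmodels terms (an assumed software choice; the original wording appears tool-specific): HAC Newey-West standard errors are requested through the fit's covariance options. The maxlags value of 4 is illustrative, not a rule.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + rng.normal()   # serially correlated errors
y = 1 + 2 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                   # usual (wrong) SEs
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(ols.bse)    # standard errors under the (false) independence assumption
print(hac.bse)    # Newey-West standard errors, robust to serial correlation
```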