Alex Yu
http://www.creative-wisdom.com/computer/sas/collinear_stepwise.html
One common approach to selecting a subset of variables from a complex model is stepwise regression. Stepwise regression is a procedure that examines the impact of each variable on the model step by step; a variable that contributes little to the variance explained is thrown out. There are several versions of stepwise regression, such as forward selection, backward elimination, and stepwise. Many researchers have employed these techniques to determine the order of predictors by the magnitude of their influence on the outcome variable (e.g., June, 1997; Leigh, 1996). However, this interpretation is valid if and only if all predictors are independent (but if you are writing a dissertation, it doesn't matter; follow what your committee advises). Collinear regressors, or regressors with some degree of correlation, return inaccurate results. Assume that there is an outcome variable Y and four regressors X1-X4. In the left panel, X1-X4 are correlated (non-orthogonal). We cannot tell which variable individually contributes the most to the variance explained. If X1 enters the model first, it seems to contribute the largest amount of variance explained. X2 seems less influential because its contribution to the variance explained overlaps with that of the first variable, and X3 and X4 fare even worse.
1 of 5
2013/01/01 07:46 .
Indeed, the more correlated the regressors are, the more their ranked "importance" depends on the selection order (Bring, 1996). However, we can interpret the result of stepwise regression as an indication of the importance of the independent variables if all predictors are orthogonal. In the right panel we have a "clean" model: the individual contribution of each variable to the variance explained is clearly seen. Thus, we can assert that X1 and X4 are more influential on the dependent variable than X2 and X3.
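The contrast between the two panels can be sketched numerically. The pure-Python example below (not from the original article; the data sets are made up for illustration) fits ordinary least squares via the normal equations and shows that with correlated regressors the incremental R-square depends on entry order, whereas with orthogonal regressors the individual R-squares add up exactly:

```python
# Sketch: incremental R-square depends on entry order for correlated
# regressors, but decomposes exactly for orthogonal ones.
# All data below are made up for illustration.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def r_squared(cols, y):
    """R-square of an OLS fit with intercept, via the normal equations."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(X[r][a] * y[r] for r in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    yhat = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

# Correlated (non-orthogonal) regressors: x2 is nearly a copy of x1.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.2, 1.9, 3.1, 4.1, 4.9, 6.2, 6.8, 8.1]
y  = [2.0, 4.1, 6.2, 8.1, 9.8, 12.3, 14.0, 16.2]
r1, r2, r12 = r_squared([x1], y), r_squared([x2], y), r_squared([x1, x2], y)
# Alone, x2 explains almost everything; entered after x1 it adds almost nothing.
print(r2, r12 - r1)

# Orthogonal, zero-mean regressors: incremental R-square is order-free.
o1 = [-1, 1, -1, 1, -1, 1, -1, 1]
o2 = [-1, -1, 1, 1, -1, -1, 1, 1]
yo = [-4.1, 2.2, -2.1, 3.9, -3.8, 1.9, -1.8, 4.1]
s1, s2, s12 = r_squared([o1], yo), r_squared([o2], yo), r_squared([o1, o2], yo)
print(abs(s12 - (s1 + s2)))  # ~0 for orthogonal predictors
```

In the correlated case, whichever of x1 and x2 enters first captures the shared variance and the other looks unimportant; in the orthogonal case the decomposition does not depend on order at all.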
Model                     R-square   RMSE     Cp
One-variable models
  X3                      0.31       2.27     9.40
  X2                      0.27       2.35    10.90
  X1                      0.00       2.75    19.41
Two-variable models
  X2 X3                   0.60       1.81     2.70
  X1 X3                   0.33       2.34    11.20
  X1 X2                   0.32       2.35    11.34
Full model
  X1 X2 X3                0.62       1.84     4.00
At first, each regressor enters the model one by one. Among all one-variable models, the best variable is X3 according to the max-R-square criterion (R2=.31). (For now we temporarily ignore RMSE and Cp.) Then, all combinations of two-variable models are computed. This time the best two predictors are X2 and X3 (R2=.60). Last, all three variables are used in the full model (R2=.62). From the one-variable model to the two-variable model, the variance explained gains a substantial improvement (.60 - .31 = .29). However, from the two-variable model to the full model, the gain is trivial (.62 - .60 = .02).
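The selection logic just described can be sketched in a few lines of Python using the R-square values reported in the table itself:

```python
# All-subsets R-square values, copied from the table in this article.
r2 = {
    ("X1",): 0.00, ("X2",): 0.27, ("X3",): 0.31,
    ("X1", "X2"): 0.32, ("X1", "X3"): 0.33, ("X2", "X3"): 0.60,
    ("X1", "X2", "X3"): 0.62,
}

def best_of_size(k):
    """Best k-regressor subset under the max-R-square criterion."""
    return max((m for m in r2 if len(m) == k), key=r2.get)

best1, best2, best3 = best_of_size(1), best_of_size(2), best_of_size(3)
print(best1, best2)                       # ('X3',) ('X2', 'X3')
print(round(r2[best2] - r2[best1], 2))    # gain from 1 to 2 variables: 0.29
print(round(r2[best3] - r2[best2], 2))    # gain from 2 to 3 variables: 0.02
```

The large first gain and trivial second gain are exactly what drives the "two variables are enough" conclusion below.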
If you cannot follow the above explanation, this figure may help. The x-axis represents the number of variables while the y-axis represents the R-square. It clearly indicates a sharp jump from one variable to two, but the curve flattens from two to three (see the red arrow).
Now, let's examine RMSE and Cp. Interestingly enough, in terms of both RMSE and Cp, the full model is worse than the two-variable model. The RMSE of the best two-variable model is 1.81, but that of the full model is 1.84 (see the red arrow in the right panel)! The Cp of the best two-variable model is 2.70, whereas that of the full model is 4.00 (see the red arrow in the following figure)!
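Why can RMSE get worse even though R-square improves? RMSE divides the residual sum of squares by the residual degrees of freedom, so adding a predictor that explains almost nothing can still raise it. The article does not report the sample size or total sum of squares, so the sketch below uses hypothetical values (n=20, SST=100) purely to show the mechanism; the resulting RMSEs are illustrative, not the article's 1.81 and 1.84:

```python
import math

def rmse_from_r2(r2, n, p, sst):
    """Root mean square error from R-square: sqrt(SSE / (n - p - 1)),
    where SSE = SST * (1 - R^2) and p is the number of regressors."""
    return math.sqrt(sst * (1.0 - r2) / (n - p - 1))

# Hypothetical sample size and total sum of squares (assumptions, not
# given in the article); the R-square values are the article's.
n, sst = 20, 100.0
rmse_two  = rmse_from_r2(0.60, n, p=2, sst=sst)  # best two-variable model
rmse_full = rmse_from_r2(0.62, n, p=3, sst=sst)  # full model
# The trivial R-square gain does not offset the lost degree of freedom,
# so the full model's RMSE comes out larger.
print(rmse_two < rmse_full)  # True
```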
Nevertheless, although the approaches of maximum R-square, root mean square error, and Mallows' Cp are different, the conclusion is the same: one variable is too few and three are too many. To perform variable selection in SAS, the syntax is "PROC REG; MODEL Y=X1-X3 /SELECTION=MAXR;". To plot max R-square, RMSE, and Cp together, use NCSS (NCSS Statistical Software, 1999).
Burnham and Anderson (2002) recommend replacing AIC with AICc, especially when the sample size is small or the number of parameters is large. Actually, AICc converges to AIC as the sample size grows larger and larger; hence, AICc should be used regardless of sample size and the number of parameters. The Bayesian information criterion (BIC) is similar to AIC, but its penalty is heavier than AIC's. However, some authors believe that AIC and AICc are superior to BIC for a number of reasons. First, AIC and AICc are based on the principle of information gain. Second, the Bayesian approach requires a prior input, which is usually debatable. Third, AIC is asymptotically optimal for model selection in terms of least mean squared error, whereas BIC is not asymptotically optimal (Burnham & Anderson, 2004; Yang, 2005). JMP provides users with the options of AICc and BIC for model refinement. To run stepwise regression with AICc or BIC, use Fit Model and then choose Stepwise from Personality. These short movie clips show the first and second steps of constructing an optimal regression model with AICc (special thanks to Michelle Miller for her help in recording the movie clips).
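The relationships among AIC, AICc, and BIC can be made concrete with their common least-squares forms (a sketch; the SSE, n, and k values below are arbitrary illustrations, not results from the article):

```python
import math

def aic(sse, n, k):
    """AIC for a least-squares model: n*ln(SSE/n) + 2k, where k counts
    the estimated parameters."""
    return n * math.log(sse / n) + 2 * k

def aicc(sse, n, k):
    """Small-sample corrected AIC; the correction term vanishes as n grows,
    which is why AICc converges to AIC."""
    return aic(sse, n, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(sse, n, k):
    """BIC: same fit term as AIC, but a heavier ln(n)*k penalty
    whenever ln(n) > 2, i.e. n > ~7."""
    return n * math.log(sse / n) + math.log(n) * k

# The AICc correction shrinks toward zero as the sample grows
# (SSE held fixed here purely for illustration).
for n in (20, 200, 2000):
    print(n, aicc(50.0, n, 4) - aic(50.0, n, 4))
# BIC's per-parameter penalty (ln n) already exceeds AIC's (2) at n = 100:
print(math.log(100) > 2)  # True
```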
In procedures such as factor analysis and principal component analysis, "redundant" variables are not excluded. Rather, they are retained and combined to form latent factors. It is believed that a construct should be an "open concept" that is triangulated by multiple indicators instead of a single measure (Salvucci, Walter, Conley, Fink, & Saba, 1997). In this sense, redundancy enhances reliability and yields a better model. However, factor analysis and principal component analysis do not distinguish between dependent and independent variables, and thus may not be applicable to research conducted for the purpose of regression analysis. One way to reduce the number of variables in the context of regression is to employ the partial least squares (PLS) procedure. PLS is a method for constructing predictive models when the variables are many and highly collinear (Tobias, 1999). Besides collinearity, PLS is also robust against other data structural problems, such as skewed distributions and omission of regressors (Cassel, Westlund, & Hackl, 1999). It is important to note that in PLS the emphasis is on prediction rather than on explaining the underlying relationships between the variables. Like principal component analysis, the basic idea of PLS is to extract several latent factors and responses from a large number of observed variables. Therefore, the acronym PLS is also taken to mean projection to latent structure. The slide show below illustrates the idea of factor extraction. Please press the next button to start the slide show (this Macromedia Flash slideshow was made by Gregory Van Eekhout):
The following is an example of the SAS code for PLS: "PROC PLS; MODEL y1-y5 = x1-x100; RUN;". Note that unlike an ordinary least squares regression, PLS can accept multiple dependent variables. The output shows the percent variation accounted for by each extracted latent variable:
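The factor-extraction step that PLS performs can be sketched in pure Python. The example below is a generic single-component, single-response (PLS1, NIPALS-style) extraction with made-up, perfectly collinear predictors; it is an illustration of the idea, not the article's data or SAS's implementation:

```python
import math

def pls1_one_component(X, y):
    """Extract one PLS latent factor for a single response y.
    Returns (weights w, scores t, y-loading q, y mean)."""
    n, p = len(X), len(X[0])
    # Center the predictor columns and the response (standard preprocessing).
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    yc = [v - ym for v in y]
    # Weight vector: the direction of maximal covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of the centered observations onto w.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # y-loading: regression of the centered y on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return w, t, q, ym

# Made-up predictors: each column is a shifted copy of the same series,
# so they are perfectly collinear -- fatal to OLS, harmless to PLS.
base = [1, 2, 3, 4, 5, 6, 7, 8]
X = [[b + 0.1 * j for j in range(4)] for b in base]
y = [2.1, 3.9, 6.2, 8.0, 10.1, 11.8, 14.2, 15.9]

w, t, q, ym = pls1_one_component(X, y)
yhat = [ym + q * ti for ti in t]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
sst = sum((yi - sum(y) / len(y)) ** 2 for yi in y)
print(1 - sse / sst)  # share of y's variation carried by the one latent factor
```

A production analysis would extract further components from the deflated matrices, which is what PROC PLS reports in its percent-variation table.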