9.6
Step 1. The stepwise regression routine first fits a simple linear regression model for each of the P - 1 potential X variables. For each simple linear regression model, the t* statistic (2.17) for testing whether or not the slope is zero is obtained:

t*_k = b_k / s{b_k}
The X variable with the largest t* value is the candidate for first addition. If this t* value exceeds a predetermined level, or if the corresponding P-value is less than a predetermined level, the X variable is added. Otherwise, the program terminates with no X variable considered sufficiently helpful to enter the regression model. Since the degrees of freedom associated with MSE vary depending on the number of X variables in the model, and since repeated tests on the same data are undertaken, fixed t* limits for adding or deleting a variable have no precise probabilistic meaning. For this reason, software programs often favor the use of predetermined alpha-limits.
Step 2. Assume X7 is the variable entered at step 1. The stepwise regression routine now fits all regression models with two X variables, where X7 is one of the pair. For each such regression model, the t* test statistic corresponding to the newly added predictor Xk is obtained. This is the statistic for testing whether or not beta_k = 0 when X7 and Xk are the variables in the model. The X variable with the largest t* value (equivalently, the smallest P-value) is the candidate for addition at the second stage. If this t* value exceeds a predetermined level (i.e., the P-value falls below a predetermined level), the second X variable is added. Otherwise, the program terminates.
Step 3. Suppose X3 is added at the second stage. Now the stepwise regression routine examines whether any of the other X variables already in the model should be dropped. For our illustration, there is at this stage only one other X variable in the model, X7, so only one t* test statistic is obtained. At later stages there would be a number of these t* statistics, one for each of the variables in the model besides the one last added. The variable for which this t* value is smallest (equivalently, the variable for which the P-value is largest) is the candidate for deletion. If this t* value falls below a predetermined limit (or the P-value exceeds a predetermined limit), the variable is dropped from the model; otherwise, it is retained.
Step 4. Suppose X7 is retained so that both X3 and X7 are now in the model. The stepwise regression
routine now examines which X variable is the next candidate for addition, then examines whether any
of the variables already in the model should now be dropped, and so on until no further X variables can
either be added or deleted, at which point the search terminates.
Note that the stepwise regression algorithm allows an X variable, brought into the model at an earlier
stage, to be dropped subsequently if it is no longer helpful in conjunction with variables added at later
stages.
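For reference, a forward stepwise search of this kind can be run directly in SAS with PROC REG. The sketch below is only an illustration: the data set name mydata, the predictor list, and the 0.10 entry/stay limits are assumptions, not values given in the problem.

* Sketch: forward stepwise search with assumed alpha-to-enter and
  alpha-to-stay limits of 0.10;
proc reg data=mydata;
  model Y = X1 X2 X3 X4 X5 X6 X7 / selection=stepwise slentry=0.10 slstay=0.10;
run;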
Forward Selection. The forward selection search procedure is a simplified version of forward stepwise regression, omitting the test of whether a variable, once entered into the model, should later be dropped.
Backward Elimination. The backward elimination search procedure is the opposite of forward
selection. It begins with the model containing all potential X variables and identifies the one with the
largest P-value. If the maximum P-value is greater than a predetermined limit, that X variable is
dropped. The model with the remaining P - 2 X variables is then fitted, and the next candidate for
dropping is identified. This process continues until no further X variables can be dropped.
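As a further illustration under the same assumptions as above, the forward selection and backward elimination searches differ from the stepwise routine only in the selection= option:

* Forward selection: variables are only added, never re-examined;
proc reg data=mydata;
  model Y = X1 X2 X3 X4 X5 X6 X7 / selection=forward slentry=0.10;
run;

* Backward elimination: start with all potential X variables and drop
  the one with the largest P-value until none exceeds the stay limit;
proc reg data=mydata;
  model Y = X1 X2 X3 X4 X5 X6 X7 / selection=backward slstay=0.10;
run;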
9.9 a>
Number in   R-Square   Adjusted    C(p)      AIC        Variables
Model                  R-Square                         in Model
    1       0.6190     0.6103       8.3536   220.5294   X1
    1       0.4155     0.4022      35.2456   240.2137   X3
    1       0.3635     0.3491      42.1123   244.1312   X2
    2       0.6761     0.6610       2.8072   215.0607   X1 X3
    2       0.6550     0.6389       5.5997   217.9676   X1 X2
    2       0.4685     0.4437      30.2471   237.8450   X2 X3
    3       0.6822     0.6595       4.0000   216.1850   X1 X2 X3

Obs   VarsInModel   Press
 1    X1            5569.56
 2    X2            9254.49
 3    X3            8451.43
 4    X1 X2         5235.19
 5    X1 X3         4902.75
 6    X2 X3         8115.91
 7    X1 X2 X3      5057.89
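The two tables above could be produced with calls along the following lines; the exact options used are not recorded in the code section at the end of this write-up, so the selection options, the outest= data set name, and the use of the press option are assumptions.

* All-subsets summary: R-square, adjusted R-square, C(p), and AIC;
* the press option writes a _PRESS_ value for each fitted subset to
  the outest= data set (assumed mechanism for the PRESS table);
proc reg data=patient outest=subsets press;
  model Y = X1 X2 X3 / selection=rsquare adjrsq cp aic;
run;

* inspect the saved estimates and the _PRESS_ value for each subset;
proc print data=subsets;
run;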
b> As seen in part a>, the answer is yes: all of the criteria point to the same subset, (X1, X3). Unfortunately, as discussed in class, this is a rather rare thing to happen.
c> Forward stepwise regression is mainly advantageous when the pool of candidate covariates is large; in our case the pool is small (only three X variables), so it offers no real advantage here.
9.10
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4       8718.02248    2179.50562    129.74   <.0001
Error             20        335.97752      16.79888
Corrected Total   24       9054.00000

Root MSE           4.09864   R-Square   0.9629
Dependent Mean    92.20000   Adj R-Sq   0.9555
Coeff Var          4.44538
Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           -124.38182          9.94106    -12.51    <.0001
X1           1              0.29573          0.04397      6.73    <.0001
X2           1              0.04829          0.05662      0.85    0.4038
X3           1              1.30601          0.16409      7.96    <.0001
X4           1              0.51982          0.13194      3.94    0.0008
Correlation matrix (p-value shown beneath each correlation):

              proficiency        X1        X2        X3        X4
proficiency       1.00000   0.51441   0.49701   0.89706   0.86939
                             0.0085    0.0115    <.0001    <.0001
X1                0.51441   1.00000   0.10227   0.18077   0.32666
                   0.0085              0.6267    0.3872    0.1110
X2                0.49701   0.10227   1.00000   0.51904   0.39671
                   0.0115    0.6267              0.0078    0.0496
X3                0.89706   0.18077   0.51904   1.00000   0.78204
                   <.0001    0.3872    0.0078              <.0001
X4                0.86939   0.32666   0.39671   0.78204   1.00000
                   <.0001    0.1110    0.0496    <.0001
The F value tells us that at least some of the predictors should be kept. According to its P-value (0.4038), X2 should be dropped. The correlation matrix shows that X2 and X3 are moderately related (r = 0.519), while X3 and X4 are highly correlated (r = 0.782), so the X3, X4 correlation needs to be considered thereafter.
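A minimal sketch of the calls that would produce the output above (only the data step for this problem appears in the code section below, so the proc statements here are assumptions):

* Full first-order model for the job proficiency data (9.10);
proc reg data=hw9_910;
  model proficiency = X1 X2 X3 X4;
run;

* Pairwise correlations among the response and the four predictors;
proc corr data=hw9_910;
  var proficiency X1 X2 X3 X4;
run;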
9.11 a>
Number in   R-Square   Adjusted    C(p)      AIC       SBC        Variables
Model                  R-Square                                   in Model
    1       0.8047     0.7962                                     X3
    1       0.7558                                                X4
    1       0.2646                                                X1
    1       0.2470                                                X2
    2       0.9330     0.9269      17.1130   85.7272   89.38384   X1 X3
    2       0.8773     0.8661                                     X3 X4
    2       0.8153     0.7985                                     X1 X4
    2       0.8061     0.7884                                     X2 X3
    2       0.7833     0.7636                                     X2 X4
    2       0.4642                                                X1 X2
    3       0.9615     0.9560       3.7274   73.8473   78.72282   X1 X3 X4
    3       0.9341     0.9247      18.5215   87.3143   92.18984   X1 X2 X3
    3       0.8790     0.8617                                     X2 X3 X4
    3       0.8454     0.8233                                     X1 X2 X4
    4       0.9629     0.9555       5.0000   74.9542   81.04859   X1 X2 X3 X4
According to the table above, the best four subsets are (X1, X3), (X1, X3, X4), (X1, X2, X3), and (X1, X2, X3, X4), because they have the largest adjusted R-square values.
b> Yes, these additional criteria are useful: as the table above shows, C(p), AIC, and SBC are all smallest for the subset (X1, X3, X4), so they help single out one model among the leading candidates.
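The subset table in part a> corresponds to an all-possible-subsets run of roughly the following form (the option list is an assumption):

* All subsets with adjusted R-square, C(p), AIC, and SBC (9.11 a);
proc reg data=hw9_910;
  model proficiency = X1 X2 X3 X4 / selection=rsquare adjrsq cp aic sbc;
run;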
9.18 a>
Summary of Stepwise Selection

Step   Variable   Variable   Number    Partial    Model      C(p)     F       Pr > F
       Entered    Removed    Vars In   R-Square   R-Square            Value
  1    X3                       1                                             <.0001
  2    X1                       2                                             <.0001
  3    X4                       3       0.0285     0.9615    3.7274   15.59   0.0007
b> We can see from 9.11 a> that the subset (X1, X3, X4) has the largest adjusted R-square, so the stepwise result agrees with that criterion.
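The stepwise summary in part a> corresponds to a call of roughly the following form; the entry and stay limits shown are assumptions, since the limits actually used are not recorded here.

* Forward stepwise selection for the job proficiency data (9.18 a);
proc reg data=hw9_910;
  model proficiency = X1 X2 X3 X4 / selection=stepwise slentry=0.05 slstay=0.10;
run;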
Code:
options ls = 80 nodate;
title 'problem 9.9';
data patient;
infile 'C:\Reg\hw9\9.9.txt';
input Y X1 X2 X3;
run;

options ls = 80 nodate;
title 'problem 9.10';
data hw9_910;
infile 'C:\Reg\hw9\9.10.txt';
input proficiency X1 X2 X3 X4;
run;