
Stepwise regression

Statement of problem
A common problem is that there is a large set of candidate predictor variables.
The goal is to choose a small subset from the larger set so that the resulting regression model is simple, yet has good predictive ability.

What method should you use: forward or backward?
If you have a very large set of potential independent variables from which you wish to extract a few, you should generally go forward.
If you have a modest-sized set of potential variables from which you wish to eliminate a few, you should generally go backward.

Stepwise regression:
Preliminary steps
1. Specify an Alpha-to-Enter significance level (αE = 0.05).
2. Specify an Alpha-to-Remove significance level (αR = 0.05).

Stepwise regression:
Stopping the procedure
The procedure is stopped when adding an additional predictor does not yield a t-test P-value below αE = 0.05.
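A minimal illustration of the stopping test in Python (statsmodels), on invented synthetic data; the variables and the single-candidate setup are assumptions made for the sketch, not part of the slides:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)              # candidate predictor (pure noise here)
y = 2.0 * x1 + rng.normal(size=n)    # y depends only on x1

ALPHA_ENTER = 0.05                   # the Alpha-to-Enter level from step 1

# The current model contains x1; test whether x2 should enter.
fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
p_candidate = fit.pvalues[2]         # t-test p-value for x2's coefficient
if p_candidate >= ALPHA_ENTER:
    print(f"Stop: p = {p_candidate:.3f} is not below the Alpha-to-Enter level")
```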

Caution about stepwise regression!
Do not jump to the conclusion
that all the important predictor variables for predicting y have been identified, or
that all the unimportant predictor variables have been eliminated.

Caution about stepwise regression!
The probability is high
that we included some unimportant predictors, and
that we excluded some important predictors.

Drawbacks of stepwise regression
The final model is not guaranteed to be optimal in any specified sense.
The procedure yields a single final model, although in practice there are often several equally good models.
It doesn't take into account a researcher's knowledge about the predictors.

The three most commonly used automated procedures are:
Forward selection -- start with the best predictor and add predictors to get a best model
Backward selection -- start with a full model and delete predictors to get a best model
Stepwise selection -- a combination of the first two

Forward selection
Step 1 -- the first predictor in the model is the best single predictor.
Select the predictor with the numerically largest simple correlation with the dependent variable:
ry,x1 vs. ry,x2 vs. ry,x3 vs. ry,x4

Step 2 -- the next predictor in the model is the one that will contribute the most -- with two equivalent definitions:
1. The 2-predictor model (including the first predictor) with the numerically largest R² -- if the R² is significant and significantly larger than the r² from the first step:

R²y.x3,x1 vs. R²y.x3,x2 vs. R²y.x3,x4

2. Add to the model the predictor with the highest semi-partial correlation with the dependent variable, controlling the predictor for the predictor already in the model:

ry(x1.x3) vs. ry(x2.x3) vs. ry(x4.x3)
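The two definitions are equivalent because the squared semi-partial correlation equals the gain in R² from adding the predictor; for example:

r²y(x1.x3) = R²y.x1,x3 - R²y.x3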

All subsequent steps -- the next predictor in the model is the one that will contribute the most -- with two equivalent definitions:
1. The 3-predictor model (including the predictors already in the model) with the numerically largest R² -- if the R² is significant and significantly larger than the R² from the previous step:

R²y.x3,x2,x1 vs. R²y.x3,x2,x4

2. Add to the model the predictor with the highest semi-partial correlation with the dependent variable, controlling the predictor for the predictors already in the model:

ry(x1.x3,x2) vs. ry(x4.x3,x2)

When to quit? When no additional predictor will significantly increase the R² (same as when no multiple semi-partial is significant).

Difficulties with the forward inclusion model
The major potential problem is over-inclusion -- a predictor that contributes to a smaller (earlier) model fails to continue to contribute as the model gets larger (with increased collinearity), but the predictor stays in the model.
The resulting model may not be the best -- there may be another model with the same # of predictors but a larger R², etc.
All of these problems are exacerbated by increased collinearity!
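A minimal forward-inclusion sketch in Python (pandas + statsmodels), using the coefficient t-test p-value as the entry criterion; the function name and the default αE = 0.05 are illustrative assumptions, not a fixed API:

```python
import pandas as pd
import statsmodels.api as sm

def forward_select(y, X, alpha_enter=0.05):
    """Greedy forward inclusion: at each step, add the candidate whose
    coefficient t-test p-value is smallest, if it is below alpha_enter."""
    included = []
    while True:
        remaining = [c for c in X.columns if c not in included]
        if not remaining:
            break
        # p-value of each candidate's coefficient when added to the current model
        pvals = {}
        for c in remaining:
            fit = sm.OLS(y, sm.add_constant(X[included + [c]])).fit()
            pvals[c] = fit.pvalues[c]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha_enter:
            break          # no remaining predictor significantly increases R²
        included.append(best)
    return included
```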

Backward selection
Step 1 -- start with the full model (all predictors) -- if the R² is significant. Consider the regression weights of this model.
Step 2 -- remove from the model the predictor that contributes the least.
Delete the predictor with the largest p-value associated with its regression (b) weight -- if that p-value is greater than .05. (The idea is that the predictor with the largest p-value is the one most likely not to be contributing to the model in the population.)

bx1 (p=.08) vs. bx2 (p=.02) vs. bx3 (p=.02) vs. bx4 (p=.27)

On all subsequent steps, the next predictor dropped from the model is the one whose regression weight has the largest (non-significant) p-value.

bx1 (p=.21) vs. bx2 (p=.14) vs. bx3 (p=.012)

When to quit? When all the predictors in the model are contributing to the model.
Difficulties with the backward deletion model
The major potential problem is under-inclusion -- a predictor that is deleted from a larger (earlier) model would contribute to a smaller model, but isn't re-included.
The resulting model may not be the best -- there may be another model with the same # of predictors but a larger R², etc.
All of these problems are exacerbated by increased collinearity!
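A matching backward-deletion sketch under the same illustrative assumptions (αR = 0.05; pandas DataFrame X, Series y):

```python
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(y, X, alpha_remove=0.05):
    """Start from the full model and repeatedly drop the predictor whose
    b weight has the largest p-value, while that p-value exceeds alpha_remove."""
    included = list(X.columns)
    while included:
        fit = sm.OLS(y, sm.add_constant(X[included])).fit()
        pvals = fit.pvalues.drop('const')   # p-values of the b weights only
        worst = pvals.idxmax()
        if pvals[worst] <= alpha_remove:
            break                           # every remaining predictor contributes
        included.remove(worst)
    return included
```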

Stepwise regression
Step 1 -- the first predictor in the model is the best single predictor (same as the forward inclusion model).
Select the predictor with the numerically largest simple correlation with the criterion -- if it is a significant correlation.
By using this procedure we are sure that the initial model works.

Step 2 -- the next predictor in the model is the one that will contribute the most -- with two equivalent definitions (same as the forward inclusion model):
1. The 2-predictor model (including the first predictor) with the numerically largest R² -- if the R² is significant and significantly larger than the r² from the first step.
2. Add to the model the predictor with the highest semi-partial correlation with the criterion, controlling the predictor for the predictor already in the model -- if the semi-partial is significant.
By using this procedure we are sure the 2-predictor model works, and works better than the 1-predictor model.

On all subsequent steps (each having two parts):
a. Remove from the model the predictor that contributes the least (same as the backward deletion model).
Delete the predictor with the largest p-value associated with its regression (b) weight -- if that p-value is greater than .05. (The idea is that the predictor with the largest p-value is the one most likely not to be contributing to the model in the population.)
-- If a predictor is deleted, look for a second (third, etc.) that should also be deleted, before moving on to part b.
By using this procedure, we are sure that all the predictors in the model are contributing before adding any additional predictors to the model.

b. The next predictor in the model is the one that will contribute the most (same as for forward inclusion) -- with two equivalent definitions:
1. The enlarged model (including the predictors already in it) with the numerically largest R² -- if the R² is significant and significantly larger than the R² from the previous step.
2. Add to the model the predictor with the highest semi-partial correlation with the criterion, controlling the predictor for the predictors already in the model -- if the semi-partial is significant.
By using this procedure we are sure the model with the added predictor works, and works better than the model without it.

When to quit? When BOTH of two conditions hold:
1. All predictors included in the model are contributing to it.
2. None of the predictors that are not in the model would contribute if they were added.
By using this procedure we avoid both over-inclusion and under-inclusion.
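Putting parts a and b together, a hedged sketch of the full stepwise loop (same illustrative conventions as the forward and backward sketches above; real implementations usually require αR ≥ αE to avoid cycling):

```python
import pandas as pd
import statsmodels.api as sm

def stepwise_select(y, X, alpha_enter=0.05, alpha_remove=0.05):
    """Alternate a forward entry step (part b) with backward removal
    steps (part a) until both stopping conditions hold."""
    included = []
    while True:
        changed = False
        # Part b: add the best remaining predictor, if it enters at alpha_enter.
        remaining = [c for c in X.columns if c not in included]
        if remaining:
            pvals = {c: sm.OLS(y, sm.add_constant(X[included + [c]])).fit().pvalues[c]
                     for c in remaining}
            best = min(pvals, key=pvals.get)
            if pvals[best] < alpha_enter:
                included.append(best)
                changed = True
        # Part a: drop predictors that no longer contribute at alpha_remove.
        while len(included) > 1:
            fit = sm.OLS(y, sm.add_constant(X[included])).fit()
            pv = fit.pvalues.drop('const')
            worst = pv.idxmax()
            if pv[worst] <= alpha_remove:
                break
            included.remove(worst)
            changed = True
        if not changed:
            break   # conditions 1 and 2 both hold
    return included
```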

Difficulties with stepwise regression
The resulting model may not be the best -- there may be another model with the same # of predictors but a larger R².
It assumes that the best model is found by starting with the best single predictor.
This problem is exacerbated by increased collinearity!

Model selection
A full model is one that includes all the variables.
A null model is one that includes only the intercept.
Selection of which variables to include can be done by you, by the computer, or both.
Types of selection: forward, backward, stepwise.
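As a quick illustration of the full-versus-null distinction, a small Python sketch on invented synthetic data (the names and coefficients are assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 50
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)

null_fit = sm.OLS(y, np.ones((n, 1))).fit()     # null model: intercept only
full_fit = sm.OLS(y, sm.add_constant(X)).fit()  # full model: all variables
print(null_fit.rsquared, full_fit.rsquared)     # 0.0 vs. something substantial
```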

Backward selection
Starts with a full model.
Removes variables, starting with the least significant variable.
Often the best approach to start with.

What do you get when you cross a statistician with a chiropractor?
You get an adjusted R squared from a BACKward regression problem!

Forward selection
Starts with a null model.
Enters the variables into the model starting with the most significant.
Can miss important associations or interactions.

Stepwise selection
Starts with a full or null model (usually a full model, i.e., backward stepwise).
Adds or removes variables based on their significance in the model.
Looks at the variable itself and its relationship with the others in the model.
Can be considered the best automatic model selection, especially with many exposure variables.

Stepwise Regression Analysis
Stepwise finds the explanatory variable with the highest R² to start with. It then checks each of the remaining variables until the two variables with the highest R² are found. It then repeats the process until the three variables with the highest R² are found, and so on.
The overall R² gets larger as more variables are added.
Stepwise may be useful in the early exploratory stage of data analysis, but it should not be relied upon for the confirmatory stage.
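The claim that the overall R² only grows can be checked directly with noise predictors; a small Python illustration on synthetic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 60
y = rng.normal(size=n)
X = rng.normal(size=(n, 5))          # five pure-noise predictors

for k in range(1, 6):
    fit = sm.OLS(y, sm.add_constant(X[:, :k])).fit()
    print(k, round(fit.rsquared, 3), round(fit.rsquared_adj, 3))
# R² rises with every added column even though all predictors are noise;
# adjusted R² does not, which is one reason raw R² cannot be trusted at
# the confirmatory stage.
```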

Week assignment
Summary of stepwise regression (3 pages)
Run stepwise regression (data in next slide)
https://www.youtube.com/watch?v=eme0ErU7GJA

Data for stepwise regression
[Flattened data table: columns childAA, childA, childIn, parentIn, teacherIn, frequency; the cell values (each extracted as 30) could not be fully reconstructed from the slide.]
