Introduction to Multiple Regression

Dale E. Berger
Claremont Graduate University http://wise.cgu.edu
Overview
Multiple regression is a flexible method of data analysis that may be appropriate whenever a quantitative variable (the dependent or criterion variable) is to be examined in relationship to any other factors (expressed as independent or predictor variables). Relationships may be nonlinear, independent variables may be quantitative or qualitative, and one can examine the effects of a single variable or of multiple variables with or without the effects of other variables taken into account (Cohen, Cohen, West, & Aiken, 2003).
Multiple Regression Models and Significance Tests
Many practical questions involve the relationship between a dependent or criterion variable of interest (call it Y) and a set of k independent variables or potential predictor variables (call them X1, X2, X3, ..., Xk), where the scores on all variables are measured for N cases. For example, you might be interested in predicting performance on a job (Y) using information on years of experience (X1), performance in a training program (X2), and performance on an aptitude test (X3). A multiple regression equation for predicting Y can be expressed as follows:
(1)   Y' = A + B_1 X_1 + B_2 X_2 + B_3 X_3
To apply the equation, each Xj score for an individual case is multiplied by the corresponding Bj value, the products are added together, and the constant A is added to the sum. The result is Y', the predicted Y value for the case.
For a given set of data, the values for A and the Bjs are determined mathematically to minimize the sum of squared deviations between the predicted Y' and the actual Y scores. Calculations are quite complex, and best performed with the help of a computer, although simple cases with only one or two predictors can be solved by hand with special formulas.
The correlation between Y' and the actual Y value is also called the multiple correlation coefficient, R_Y.12...k, or simply R. Thus, R provides a measure of how well Y can be predicted from the set of X scores. The following formula can be used to test the null hypothesis that in the population there is no linear relationship between Y and prediction based on the set of k X variables from N cases:
(2)   F = \frac{R^2_{Y.12...k} / k}{(1 - R^2_{Y.12...k}) / (N - k - 1)},    df = k, N - k - 1.
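As a quick check, Formula (2) can be computed directly. Here is a minimal Python sketch (an addition to the handout, not part of the original; it assumes SciPy is available and borrows R² = .473, k = 2, N = 28 from the exam example developed later):

    from scipy import stats

    def f_test_R2(R2, k, N):
        # F test of H0: no linear relationship between Y and the k predictors
        F = (R2 / k) / ((1 - R2) / (N - k - 1))
        p = stats.f.sf(F, k, N - k - 1)   # upper-tail p value
        return F, p

    print(f_test_R2(.473, 2, 28))   # F(2, 25) = 11.2, p < .001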
For the statistical test to be accurate, a set of assumptions must be satisfied. The key assumptions are that cases are sampled randomly and independently from the population, and that the deviations of Y values from the predicted Y values are normally distributed with equal variance for all predicted values of Y.
Alternatively, the independent variables can be expressed in terms of standardized scores, where Z1 is the z score of variable X1, etc. The regression equation then simplifies to:

(3)   Z'_Y = \beta_1 Z_1 + \beta_2 Z_2 + \beta_3 Z_3.
The value of the multiple correlation R and the test for statistical significance of R are the same for the standardized and raw score formulations.
Test of R Squared Added
An especially useful application of multiple regression analysis is to determine whether a set of variables (Set B) contributes to the prediction of Y beyond the contribution of a prior set (Set A). The statistic of interest here, R squared added, is the difference between the R squared for both sets of variables (R²_Y.AB) and the R squared for only the first set (R²_Y.A). If we let kA be the number of variables in the first set and kB be the number in the second set, a formula to test the statistical significance of R squared added by Set B is:

(4)   F = \frac{(R^2_{Y.AB} - R^2_{Y.A}) / k_B}{(1 - R^2_{Y.AB}) / (N - k_A - k_B - 1)},    df = k_B, N - k_A - k_B - 1.
Each set may have any number of variables. Notice that Formula (2) is a special case of Formula (4) where kA = 0. If kA = 0 and kB = 1, we have a test for a single predictor variable, and Formula (4) becomes equivalent to the square of the t test formula for testing a simple correlation.
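Formula (4) is equally easy to script. A minimal Python sketch (an addition; SciPy assumed, with values taken from the midterm example developed below, where Set A = X1 and Set B = X2):

    from scipy import stats

    def f_test_R2_added(R2_AB, R2_A, kA, kB, N):
        # Formula (4): does Set B add to prediction beyond Set A?
        df2 = N - kA - kB - 1
        F = ((R2_AB - R2_A) / kB) / ((1 - R2_AB) / df2)
        return F, stats.f.sf(F, kB, df2)

    print(f_test_R2_added(.473, .360, kA=1, kB=1, N=28))   # F(1, 25) = 5.36, p < .05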
Example: Prediction of Scores on a Final Examination
An instructor taught the same course several times and used the same examinations each time. The composition of the classes and performance on the examinations was very stable from term to term. Scores are available on a final examination (Y) and two midterm examinations (X1 and X2) from an earlier class of 28 students. The correlation between the final and the first midterm, rY1, is .60. Similarly, rY2 = .50 and r12 = .30. In the current class, scores are available from the two midterm examinations, but not from the final. The instructor poses several questions, which we will address after we develop the necessary tools:
a) What is the best formula for predicting performance on the final examination from performance on the two midterm examinations?
b) How well can performance on the final be predicted from performance on the two midterm examinations?
c) Does this prediction model perform significantly better than chance?
d) Does the second midterm add significantly to prediction of the final, beyond the prediction based on the first midterm alone?
Regression Coefficients: Standardized and Unstandardized
Standard statistical package programs such as SPSS REGRESSION can be used to calculate statistics to answer each of the questions in the example, and many other questions as well. Since there are only two predictors, special formulas can be used to conduct an analysis without the help of a computer.
With standardized scores, the regression coefficients are:

(5)   \beta_1 = \frac{r_{Y1} - (r_{Y2})(r_{12})}{1 - (r_{12})^2},    \beta_2 = \frac{r_{Y2} - (r_{Y1})(r_{12})}{1 - (r_{12})^2}.

Using the data from the example, we find:

\beta_1 = \frac{.6 - (.5)(.3)}{1 - (.3)(.3)} = .49,    \beta_2 = \frac{.5 - (.6)(.3)}{1 - (.3)(.3)} = .35.
We can put these estimates of the beta weights into Formula (3) to produce a prediction equation for the standardized scores on the final examination. For a person whose standardized scores on the midterms are Z1 = .80 and Z2 = .60, our prediction of the standardized score on the final examination is:

Z'_Y = (\beta_1)(Z_1) + (\beta_2)(Z_2) = (.49)(.80) + (.35)(.60) = .602.
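The same arithmetic in a short Python sketch (an addition for illustration; it uses only the handout's correlations and standardized scores):

    rY1, rY2, r12 = .60, .50, .30
    beta1 = (rY1 - rY2 * r12) / (1 - r12**2)   # .49
    beta2 = (rY2 - rY1 * r12) / (1 - r12**2)   # .35
    zY_hat = beta1 * .80 + beta2 * .60         # predicted z score on the final = .602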
Once we have the beta coefficients for standardized scores, it is easy to generate the Bj regression coefficients shown in Formula (1) for prediction using unstandardized or raw scores, because

(6)   B_1 = \beta_1 \frac{SD_Y}{SD_{X_1}},    B_2 = \beta_2 \frac{SD_Y}{SD_{X_2}},    and    A = \bar{Y} - (B_1)(\bar{X}_1) - (B_2)(\bar{X}_2).
It is important that Bj weights not be compared without proper consideration of the standard deviations of the corresponding Xj variables. If two variables, X1 and X2, are equally predictive of the criterion, but the SD for the first variable is 100 times larger than the SD for the second variable, B1 will be 100 times smaller than B2! However, the beta weights for the two variables would be equal.
To apply these formulas, we need to know the SD and mean for each test. Suppose the mean is 70 for the final, and 60 and 50 for the first and second midterms, respectively, and the SD is 20 for the final, 15 for the first midterm, and 10 for the second midterm. We can calculate B1 = (.49)(20/15) = .653 and B2 = (.35)(20/10) = .700, and A = 70 - (.653)(60) - (.700)(50) = -4.18.
Thus, the best formula for predicting the score on the final in our example is

Y' = -4.18 + .653 X1 + .700 X2.
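A Python sketch of Formula (6) with these values (an addition; the last line predicts the final-exam score for a hypothetical student with midterm scores of 65 and 55, values not in the original):

    B1 = .49 * (20 / 15)            # .653
    B2 = .35 * (20 / 10)            # .700
    A = 70 - B1 * 60 - B2 * 50      # about -4.2 (the text's -4.18 uses the rounded B1 = .653)
    Y_hat = A + B1 * 65 + B2 * 55   # predicted final score, about 76.8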
Multiple Correlation with Two Predictors
The strength of prediction from a multiple regression equation is nicely measured by the square of the multiple correlation coefficient, R². In the case of only two predictors, R² can be found by using the formula

(7)   R^2_{Y.12} = \frac{(r_{Y1})^2 + (r_{Y2})^2 - 2(r_{Y1})(r_{Y2})(r_{12})}{1 - (r_{12})^2}.

In our example, we find

R^2_{Y.12} = \frac{(.6)^2 + (.5)^2 - 2(.6)(.5)(.3)}{1 - (.3)^2} = \frac{.43}{.91} = .473.
One interpretation of R²_Y.12 is that it is the proportion of Y variance that can be explained by the two predictors. Here the two midterms can explain (predict) 47.3% of the variance in the final test scores.
Tests of Significance for R
It can be important to determine whether a multiple correlation coefficient is statistically significant, because multiple correlations calculated from observed data will always be positive. When many predictors are used with a small sample, an observed multiple correlation can be quite large even when all correlations in the population are actually zero. With a small sample, observed correlations can vary widely from their population values. The multiple regression procedure capitalizes on chance by assigning the greatest weight to those variables that happen to have the strongest relationships with the criterion variable in the sample data. If there are many variables from which to choose, the inflation can be substantial. Lack of statistical significance indicates that an observed sample multiple correlation could well be due to chance.
In our example we observed R² = .473. We can apply Formula (2) to test for statistical significance to get

F = \frac{.473 / 2}{(1 - .473) / (28 - 2 - 1)} = 11.2,    df = 2, 25.

The tabled F(2, 25, .01) = 5.57, so our findings are highly significant (p < .01). In fact, p < .001 because F(2, 25, .001) = 9.22.
Tests of Significance for R Squared Added
The ability of any single variable to predict the criterion is measured by the simple correlation, and the statistical significance of the correlation can be tested with the t-test, or with an F-test using Formula (2) with k = 1. Often it is important to determine whether a second variable contributes reliably to prediction of the criterion after any redundancy with the first variable has been removed.
In our example, we might ask whether the second midterm examination improves our ability to predict the score on the final examination beyond our prediction based on the first midterm alone. Our ability to predict the criterion with the first midterm (X1) alone is measured by (rY1)² = (.6)² = .360, and with both X1 and X2 our ability to predict the criterion is measured by R² = .473. The increase in our ability to predict the criterion is measured by the increase in R squared, which is also called "R squared added." In our example, R squared added = (.473 - .360) = .113. We can test R squared added for statistical significance with Formula (4), where Set A consists of the first midterm exam (X1) and Set B consists of the second midterm exam (X2). For our example we find

F = \frac{(.473 - .360) / 1}{(1 - .473) / (28 - 1 - 1 - 1)} = 5.36,    df = 1, 25.

The F tables show F(1, 25, .01) = 7.77 and F(1, 25, .05) = 4.24, so our finding is statistically significant with p < .05, but not p < .01. We can conclude that the second midterm does improve our ability to predict the score on the final examination beyond our predictive ability using only the first midterm score.
Measures of Partial Correlation
The increase in R² when a single variable (B) is added to an earlier set of predictors (A) is identical to the square of the semipartial correlation of Y and B with the effects of Set A removed from B. Semipartial correlation is an index of the unique contribution of a variable above and beyond the influence of some other variable or set of variables. It is the correlation between the criterion variable (Y) and that part of a predictor variable (B) which is independent of the first set of predictors (A). In comparison, the partial correlation between Y and B is calculated by statistically removing the effects of Set A from both Y and B. Partial and semipartial correlations have similar interpretations, and identical tests of statistical significance. If one is significant, so is the other.
The tests of statistical significance for both the standardized and unstandardized regression coefficients for a variable Xj are also identical to the tests of significance for the partial and semipartial correlations between Y and Xj if the same variables are used. This is because the null hypotheses for testing the statistical significance of each of these four statistics (B, beta, partial correlation, and semipartial correlation) have the same implication: the variable of interest does not make a unique contribution to the prediction of Y beyond the contribution of the other predictors in the model.

When two predictor variables are highly correlated, neither variable may add much unique predictive power beyond the other. The partial and semipartial correlations will be small in this case. The beta and B weights will not necessarily be small, but our estimates of these weights will be unstable. That is, the weight that each variable is given in the regression equation is somewhat arbitrary if the variables are virtually interchangeable. This instability of the estimates of beta and B is reflected in the tests of statistical significance, and the F tests will be identical to the F tests of the partial and semipartial correlations.
In the special case of two predictors, the standard error for beta (which is the same for both betas when there are only two predictors) can be calculated with the following formula and applied to our example:

(8)   SE_\beta = \sqrt{\frac{1 - R^2}{(N - k - 1)(1 - r_{12}^2)}} = \sqrt{\frac{1 - .473}{(25)(1 - .3^2)}} = .152.
Each beta can be tested for statistical significance using a t-test with df = N - k - 1, where t = beta / (SE for beta) = \beta / SE_\beta. For our second variable, this leads to t(25 df) = .352/.152 = 2.316. If we wished to conduct an F test, we could square the t value and use F(1, N - k - 1). For our data, this produces F(1, 25) = 5.36, which is the same value we obtained when we tested the R squared added by X2.
Tolerance and Multicollinearity
Notice the effect of a large r12 on the SE for beta in Formula (8). As r12 approaches 1.0, the SE for beta grows very rapidly. If you try to enter two predictor variables that are perfectly correlated (r12 = 1.0), the regression program may abort because calculation of the SE involves division by zero. When any one predictor variable can be predicted to a very high degree from the other predictor variables, we say there is a problem of multicollinearity, indicating a situation where estimates of regression coefficients are very unstable.
The SE for an unstandardized regression coefficient, Bj, can be obtained by multiplying the SE for the beta by the ratio of the SD for Y divided by the SD for the Xj variable:

(9)   SE_{B_j} = \frac{SD_Y}{SD_{X_j}} (SE_{\beta_j}).

The t-test of statistical significance of Bj is t = (observed Bj)/(SE for Bj) with df = N - k - 1, which is N - 3 when there are two predictors.
With more than two predictor variables (k > 2), the standard error for beta coefficients can be found with the formula:

(10)   SE_{\beta_j} = \sqrt{\frac{1 - R^2_Y}{(N - k - 1)(1 - R^2_{(j)})}},
where R²_Y indicates the squared multiple correlation using all k predictor variables, and R²_(j) indicates the squared multiple correlation predicting variable Xj from all of the remaining (k - 1) predictor variables. The term R²_(j) is an index of the redundancy of variable Xj with the other predictors, and is a measure of multicollinearity. Tolerance, as calculated by SPSS and other programs, is equal to (1 - R²_(j)). A tolerance close to 1.0 indicates that the predictor in question is not redundant with other predictors already in the regression equation, while a tolerance close to zero indicates a high degree of redundancy.
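Tolerance can be computed directly by regressing each predictor on the others. A sketch in Python/NumPy (an addition; X is an assumed N-by-k matrix of predictor scores):

    import numpy as np

    def tolerances(X):
        # tolerance for predictor j is 1 - R^2(j) from regressing Xj on the rest
        N, k = X.shape
        tol = np.empty(k)
        for j in range(k):
            others = np.column_stack([np.ones(N), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
            resid = X[:, j] - others @ coef
            tol[j] = resid.var() / X[:, j].var()   # equals 1 - R^2(j)
        return tol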
Shrunken R Squared (or Adjusted R Squared)
Multiple R squared is the proportion of Y variance that can be explained by the linear model using the X variables in the sample data, but it overestimates that proportion in the population. This is because the regression equation is calculated to produce the maximum possible R for the observed data. Any variable that happens to be correlated with Y in the sample data will be given optimal weight in the sample regression equation. This capitalization on chance is especially serious when many predictor variables are used with a relatively small sample. Consider, for example, a sample R² = .60 based on k = 7 predictor variables in a sample of N = 15 cases. An estimate of the proportion of Y variance that can be accounted for by the X variables in the population is called "shrunken R squared" or "adjusted R squared." It can be calculated with the following formula:

(11)   shrunken R^2 = \tilde{R}^2 = 1 - (1 - R^2)\frac{N - 1}{N - k - 1} = 1 - (1 - .6)\frac{14}{7} = .20.
Thus, we conclude that the rather impressive R² = .60 that was found in the sample was greatly inflated by capitalization on chance, because the best estimate of the relationship between Y and the X variables in the population is a shrunken R squared of .20. A shrunken R squared equal to zero corresponds exactly to F = 1.0 in the test for statistical significance. If the formula for shrunken R squared produces a negative value, this indicates that your observed R² is smaller than you would expect if R² = 0 in the population, and your best estimate of the population value of R is zero. It is important to have a large number of cases (N) relative to the number of predictor variables (k). A good rule of thumb is N > 50 + 8k when testing R² and N > 104 + k when testing individual Bj values (Green, 1991). In exploratory research the N:k ratio may be lower, but as the ratio drops it becomes increasingly risky to generalize regression results beyond the sample.
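A one-line Python version of Formula (11) (an addition), checked against the text's example:

    def shrunken_R2(R2, N, k):
        # a negative result is best read as a population value of zero
        return 1 - (1 - R2) * (N - 1) / (N - k - 1)

    print(shrunken_R2(.60, 15, 7))   # 0.20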
Stepwise vs. Hierarchical Selection of Variables
Another pitfall, which can be even more serious, is inflation of the sample R² due to selection of the best predictors from a larger set of potential predictors. The culprit here is the "stepwise" regression option that is included in many statistical programs. For example, in SPSS REGRESSION it is very easy for the novice to use stepwise procedures whereby the computer program is allowed to choose a small set of the best predictors from the set of all potential predictors. The problem is that the significance levels reported by the computer program do not take this into account! As an extreme example, suppose you have 100 variables that are complete nonsense (e.g., random numbers), and you use them in a stepwise regression to predict some criterion Y. By chance alone, about half of the sample correlations will be at least slightly positive and half at least slightly negative. Again by chance, one would expect about 5 to be "statistically significant" with p < .05. The stepwise regression program will find all of the variables that happen to contribute "significantly" to the prediction of Y, and the program will enter them into the regression equation with optimal weights. The test of significance reported by the program will probably show that the R² is highly significant when, in fact, all correlations in the population are zero.
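This inflation is easy to demonstrate by simulation. A Python sketch (an addition; all "predictors" and the criterion are pure noise, yet the subset selected by peeking still yields an impressive-looking R²):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    N, m = 50, 100
    X = rng.standard_normal((N, m))     # 100 nonsense predictors
    Y = rng.standard_normal(N)          # nonsense criterion

    # peek at the data: keep predictors "significant" at p < .05
    keep = [j for j in range(m) if stats.pearsonr(X[:, j], Y)[1] < .05]

    Xk = np.column_stack([np.ones(N), X[:, keep]])
    b, *_ = np.linalg.lstsq(Xk, Y, rcond=None)
    R2 = 1 - ((Y - Xk @ b) ** 2).sum() / ((Y - Y.mean()) ** 2).sum()
    print(len(keep), R2)   # expect around 5 lucky "winners" and a misleadingly large R^2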
Of course, in practice, one does not plan to use nonsense variables, and the correlations in the population are not all zero. Nevertheless, stepwise regression procedures can produce greatly inflated tests of significance if you do not take into account the total number of variables that were considered for inclusion. Until 1979 there was no simple way to deal with this problem. A procedure that was sometimes recommended for tests of statistical significance was to set k equal to the total number of variables considered for inclusion, rather than to the number of predictors actually used. This is a very conservative procedure because it assumes that the observed R would not have grown larger if all of the variables had been used instead of a subset.
A more accurate test of significance can be obtained by using special tables provided by Wilkinson (1979). These tables provide values of R squared that are statistically significant at the .05 and .01 levels, taking into account the sample size (N), the number of predictors in the equation (k), and the total number of predictors considered by the stepwise program (m). SPSS and other programs will not compute the correct test for you.
Another problem with stepwise regression is that the program may enter the variables in an order that makes it difficult to interpret the R squared added at each step. For example, it may make sense to examine the effects of a training program after the effects of previous ability have already been considered, but the reverse order is less interpretable.
In practice, it is almost always preferable for the researcher to control the order of entry of the predictor variables. This procedure is called "hierarchical analysis," and it requires the researcher to plan the analysis with care, prior to looking at the data. The double advantage of hierarchical methods over stepwise methods is that there is less capitalization on chance, and the careful researcher will be assured that results such as R squared added are interpretable. Stepwise methods should be reserved for exploration of data and hypothesis generation, and results should be interpreted with proper caution.
For any particular set of variables, the multiple R and the final regression equation do not depend on the order of entry. Thus, the regression weights in the final equation will be identical for hierarchical and stepwise analyses after all of the variables are entered. At intermediate steps, the B and beta values, as well as the R squared added and the partial and semipartial correlations, can be greatly affected by variables that have already entered the analysis.
Categorical Variables
Categorical variables, such as religion or ethnicity, can be coded numerically where each number represents a specific category (e.g., 1=Protestant, 2=Catholic, 3=Jewish, etc.). It would be meaningless to use a variable in this form as a regression predictor because the size of the numbers does not represent the amount of some characteristic. However, it is possible to capture all of the predictive information in the original variable with c categories by using (c-1) new variables, each of which will pick up part of the information.
For example, suppose a researcher is interested in the relationship between ethnicity (X1) and income (Y). If ethnicity is coded in four categories (e.g., 1=Euro-Americans; 2=African-Americans; 3=Latino-Americans; and 4=Other), the researcher could create three new variables that each pick up one aspect of the ethnicity variable. Perhaps the easiest way to do this is to use "dummy" variables, where each dummy variable (Dj) takes on only values of 1 or 0, as shown in Table 1.
In this example, D1=1 for Euro-Americans and D1=0 for everyone else; D2=1 for African-Americans and D2=0 for everyone else; and D3=1 for Latino-Americans and D3=0 for everyone else. A person who is not a member of one of these three groups is given the code of 0 on all three dummy variables. One can examine the effects of ethnicity by entering all three dummy variables into the analysis simultaneously as a set of predictors. The R squared added for these three variables as a set can be measured, and tested for significance using Formula (4). The F test for significance of the R squared added by the three ethnicity variables is identical to the F test one would find with a one-way analysis of variance on ethnicity. In both analyses the null hypothesis is that the ethnic groups do not differ in income, or that there is no relationship between income and ethnicity.
Table 1: Dummy Coding of Ethnicity

          Criterion   Ethnicity      Dummy variables
   Case      (Y)        (X1)        D1    D2    D3
   ----   ---------   ---------     --    --    --
    1        25           1          1     0     0
    2        17           2          0     1     0
    3        21           3          0     0     1
    4        28           4          0     0     0
    5        23           2          0     1     0
    6        13           4          0     0     0
   ...       ...         ...        ...   ...   ...
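A sketch of this coding in Python with pandas (an addition; the column names are illustrative, and the "Other" category (4) is dropped as the reference group):

    import pandas as pd

    df = pd.DataFrame({"ethnicity": [1, 2, 3, 4, 2, 4]})   # codes as in Table 1
    dummies = pd.get_dummies(df["ethnicity"], prefix="D").astype(int)
    df = pd.concat([df, dummies.drop(columns="D_4")], axis=1)   # keeps D_1, D_2, D_3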
If there are four groups, any three can be selected to define the dummy codes. Tests of significance for the R squared added by the entire set of (c-1) dummy variables will be identical in each case. Intermediate results and the regression weights will depend on the exact nature of the coding, however. There are other methods of recoding in addition to dummy coding that will produce identical overall tests, but different intermediate results that may be more interpretable in some applications.
A test of the simple correlation of D1 with Y is a test of the difference between Euro-Americans and everyone else on Y. However, when all three dummy variables are in the model, a test of B1 for Euro-Americans is a test of the difference between Euro-Americans and the reference group "Other," the group not represented by a dummy variable in the model. It is important to interpret this surprising result correctly. In a multiple regression model, a test of B or beta is a test of the "unique" contribution of that variable, beyond all of the other variables in the model. In our example, D2 accounts for differences between African-Americans and other groups, and D3 accounts for differences between Latino-Americans and other groups. Neither of these two variables can separate Euro-Americans from the "Other" reference group. Thus, the unique contribution of variable D1 is to distinguish Euro-Americans from the "Other" group.
Interactions
The interaction of any two predictor variables can be coded for each case as the product of the values for the two variables. The contribution of the interaction can be assessed as the R squared added by the interaction term after the two predictor variables have been entered into the analysis. Before computing the interaction term, it is advisable to "center" the variables by subtracting the variable mean from each score. This reduces the amount of overlap or collinearity between the interaction term and the main effects. Cohen et al. (2003) provide a thorough discussion of this issue.
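A minimal sketch of centering before forming the product term (an addition; the score distributions are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    x1 = rng.normal(60, 15, 100)              # illustrative predictor scores
    x2 = rng.normal(50, 10, 100)
    x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
    interaction = x1c * x2c                   # enter after the centered main effects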
It is also possible to assess the effects of an interaction of a categorical variable (X1) with a quantitative variable (X2). In this case, the categorical variable with c categories is recoded into a set of (c-1) dummy variables, and the interaction is represented as a set of (c-1) new variables defined by the product of each dummy variable with X2. An F test for the contribution of the interaction can be calculated for the R squared added by the set of interaction variables beyond the set of (c-1) dummy variables and X2. The main effects must be in the model when the contribution of the interaction is tested.
Multiple Regression and Analysis of Variance
The interaction between two categorical variables can be tested with regression analysis. Suppose the two variables have c categories and d categories, respectively, and they are recoded into sets of (c-1) and (d-1) dummy variables, respectively. The interaction can be represented by a set of (c-1)(d-1) terms consisting of all possible pairwise products constructed by multiplying one variable in the first set by one variable in the second set. Formula (4) can be used to conduct tests of significance for each set of dummy variables, and for the R squared added by the set of (c-1)(d-1) interaction variables after the two sets of dummy variables for the main effects. The denominator term that is used in Formula (4) for testing the contribution of the interaction set beyond the main effects (the two sets of dummy variables) is exactly equal to the Mean Squares Within Cells in ANOVA. The F tests of statistical significance for the sets of dummy variables and the set of (c-1)(d-1) interaction variables are identical to the corresponding F tests in analysis of variance if the denominator of Formula (4) is replaced by the Mean Squares Within Cells for all three tests.
In most applications the R squared added by each set of dummy variables will depend on the order of entry. Generally, the unique contribution of most variables will be less when they are entered after other variables than when they are entered prior to the other variables. This is described as "nonorthogonality" in analysis of variance. If the number of cases is the same in each of the (c)(d) cells defined by the (c) levels of the first variable and the (d) levels of the second variable, then the analysis of variance is orthogonal, and the order of entry of the two sets of dummy variables does not affect their contribution to prediction.
Missing Data
Missing data causes problems because multiple regression procedures require that every case have a score on every variable that is used in the analysis. The most common ways of dealing with missing data are pairwise deletion, listwise deletion, deletion of variables, and coding of missingness. However, none of these methods is entirely satisfactory.
If data are missing randomly, then it may be appropriate to estimate each bivariate correlation on the basis of all cases that have data on the two variables. This is called "pairwise deletion" of missing data. An implicit assumption is that the cases where data are available do not differ systematically from cases where data are not available. In most applied situations this assumption clearly is not valid, and generalization to the population of interest is risky.
Another serious problem with pairwise deletion is that the correlation matrix that is used for the multivariate analysis is not based on any single sample of cases, and thus the correlation matrix may not be internally consistent. Each correlation may be calculated on a different subgroup of cases. Calculations based on such a correlation matrix can produce anomalous results such as R² > 1.0. For example, if rY1 = .8, rY2 = .8, and r12 = 0, then R²_Y.12 = 1.28! A researcher is lucky to spot such anomalous results, because then the error can be corrected. Errors in the estimation and testing of multivariate statistics caused by inappropriate use of pairwise deletion usually go undetected.
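The anomaly is easy to verify with Formula (7) (a small Python check, added for illustration):

    rY1, rY2, r12 = .8, .8, .0    # "correlations" from inconsistent pairwise deletion
    R2 = (rY1**2 + rY2**2 - 2 * rY1 * rY2 * r12) / (1 - r12**2)
    print(R2)   # 1.28 -- impossible for any single internally consistent sample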
A second procedure is to delete an entire case if information is missing on any one of the variables used in the analysis. This is called "listwise deletion," the default option in SPSS and many other programs. The advantage is that the correlation matrix will be internally consistent. A disadvantage is that the number of cases left in the analysis can become very small. For example, suppose you have data on 9 variables from 100 cases. If a different group of 10 cases is missing data on each of the 9 variables, then only 10 cases are left with complete data. Results from such an analysis will be useless. The N:k ratio is only 10:9, so the sample statistics will be very unstable and the sample R will greatly overestimate the population value of R. Further, those cases that have complete data are unlikely to be representative of the population. Cases that are able (willing?) to provide complete data are unusual in the sample.
A third procedure is simply to delete a variable that has substantial missing data. This is easy to do, but it has the disadvantage of discarding all information that is carried by the variable.
A fourth procedure, popularized by Cohen and Cohen, is to construct a new "missingness" variable (Dj) for every variable (Xj) that has missing data. The Dj variable is a dummy variable where Dj=1 for each case that is missing data on Xj, and Dj=0 for each case that has valid data on Xj. All cases are retained in the analysis; cases that are missing data on Xj are "plugged" with a constant value such as 999. In the regression analysis, the missingness variable Dj is entered immediately prior to the Xj variable. The R squared added for the set of two variables indicates the amount of information that is carried by the original variable as it is coded in the sample. The R squared added by Dj can be interpreted as the proportion of variance in Y that can be accounted for by knowledge of whether or not information is available on Xj. The R squared added by Xj indicates predictive information that is carried by cases that have valid data on Xj.
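A sketch of the missingness-dummy construction in Python (an addition; the scores are illustrative, and missing values are plugged with the mean of the valid scores):

    import numpy as np

    xj = np.array([12.0, 15.0, np.nan, 9.0, np.nan, 14.0])
    dj = np.isnan(xj).astype(int)                         # Dj: 1 = missing, 0 = valid
    xj_plugged = np.where(np.isnan(xj), np.nanmean(xj), xj)
    # enter dj immediately before xj_plugged in the hierarchical analysis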
The R squared added by Xj after Dj has been entered does not depend on the value of the constant that was used to indicate missing data on Xj. An advantage of "plugging" missing data with the mean of valid scores on Xj is that Dj and Xj are then uncorrelated: for both levels of Dj (cases with and without data on Xj), the mean value of Xj is the same. In this case, the order of entry of Dj and Xj does not affect the value of R squared added for either variable. An advantage of using a value like 999 to plug missing data is that such a value probably is already in the data. It is important that only one number be used to plug missing data on any one variable. However, after Dj has entered the analysis, the R squared added by Xj plugged with 999 is identical to the R squared added by Xj plugged with the mean.
It is also important to consider how much data is missing on a variable. With only a small amount of missing data, it generally doesn't matter which method is used. With a substantial portion of the data missing, it is important to determine whether the missingness is random or not. In practice, missingness often goes together on many variables, such as when a respondent quits or leaves a page of a survey blank. In such a case, it may be best to use a single missingness variable for several Xj variables. Otherwise, there may be serious multicollinearity problems among the Dj missingness variables.
If data are missing on the dependent variable (Y), there is no alternative but to drop the case from consideration. If the loss is truly random, it might be reasonable to include the case for estimating the correlations among the predictors. The correlation of the missingness variable with other variables such as the criterion (Y) can be used to test the hypothesis that data are missing at random. Cohen, Cohen, West, and Aiken (2003) provide an extended discussion of dealing with missing data.
What to Report
Reasonable people may choose to present different information. It is useful to consider four distinct kinds of information. First, we have the simple correlations (r), which tell us how each individual predictor variable is related to the criterion variable, ignoring all other variables. The correlation of Y with an interaction term is not easily interpreted, because this correlation is greatly influenced by the scaling of the main effects; it could be omitted from the table with no loss. (See Table 2.)
The second type of information comes from the R² added at each step. Here the order of entry is critical if the predictors overlap with each other. For example, if Sex had been entered alone on Step 1, the R² added would have been .004**, statistically significant with p < .01. (The R² added for the first term is simply its r squared.) Because of partial overlap with Education, Sex adds only .001 (not significant) when Education is in the model. However, the interaction term adds significantly beyond the main effects (.002*), indicating that we do have a statistically significant interaction between Sex and Education in predicting Occupational Prestige.
Table 2: Regression of Occupational Prestige on Years of Education and Sex (N = 1815)

   Step   Variable             r          R2 added     B         SE B     Beta
   1      Education (years)    .520***    .270***      3.661     .331     .531***
   2      Sex                  -.063**    .001        -6.013     2.616    -.027
   3      Educ x Sex           (.255)     .002*         .832     .203     ----
          (Constant)                                  22.801     8.300

   *p<.05; **p<.01; ***p<.001; Cumulative R squared = .273 (Adjusted R squared = .273).
   B and SE B are from the final model at Step 3, and Beta is from the model at Step 2 (all main
   effects, but no interaction term).
The third type of information comes from the B weights in the final model. These weights allow us to construct the raw regression equation, and we can use them to compute the separate regression equations for males and females, if we wish. The B weights and their tests of significance for the main effects are not easily interpreted, because they refer to the unique contribution of each main effect beyond all other terms, including the interaction (which was computed as a product of the main effects). The test of B for the final term entered into the model is meaningful, as it is equivalent to the test of R² added for the final term. In this case, both tests tell us that the interaction is statistically significant.
The fourth type of information comes from the beta weights in the model that contains only the main effects. These provide a test of the unique contribution of each main effect beyond the other main effects. If the main effects did not overlap at all, the beta weight would be identical to the r value for each variable. Here we see that Sex does not contribute significantly beyond Education to predicting Occupational Prestige (beta = -.027), although its simple r was -.063, p < .01.
It is also good to present the cumulative R squared when all variables of interest have been entered into the analysis. A test of significance should be provided for each statistic that is presented, and the sample size should be indicated in the table. Figures can be helpful, especially to display interactions.
Final Advice
Look at your data! It is especially good practice to examine the plot of residuals as a function of Y'. An assumption of regression analysis is that residuals are random, independent, and normally distributed. A residual plot can help you spot extreme outliers or departures from linearity. Bivariate scatter plots can also provide helpful diagnostics, but a plot of residuals is the best way to find multivariate outliers. A transformation of your data (e.g., log or square root) may reduce the effects of extreme scores and make the distributions closer to normal.
It is desirable to use few predictors with many cases. With k independent predictors, Green (1991) recommended N > 50 + 8k when testing R² and N > 104 + k when testing individual Bj values. Larger samples are needed when the predictor variables are correlated. If all population correlations are "medium" (i.e., all predictor-criterion and predictor-predictor correlations are .3), N = 419 is required to attain power = .80 with five predictors; but if the predictor-criterion correlations are .3 and the predictor-predictor correlations are .5, then the required N = 1117 (Maxwell, 2000). Statistical significance may not be very meaningful with extremely large samples, but larger samples provide more precise estimates of parameters and smaller confidence intervals.
If you have data available on many variables and you peek at your data to help you select the variables that are the best predictors of your criterion, be sure that your tests of statistical significance take into account the total number of variables that were considered. The problem is even more serious with stepwise regression, where the computer does the peeking.
Watch for multicollinearity, where one predictor variable can itself be predicted by another predictor variable or set of variables. For example, with two highly correlated predictors you might find that neither beta is statistically significant, even though each variable has a significant simple r with the criterion and the multiple R is statistically significant. Further, each variable contributes significantly to the prediction when it is entered first, but not when it is entered second. In this case, it may be best to form a composite of the two variables or to eliminate one of the variables.
It is often useful to reduce the number of predictor variables by forming composites of variables that measure the same concept. A composite can be expected to have higher reliability than any single variable. It is important that the composites be formed on the basis of relationships among the predictors, and not on the basis of their relationships with the criterion. Factor analysis can be used to help formulate composites, and reliability analysis can be used to evaluate the cohesiveness of a composite.
Finally, be thoughtful rather than mechanical with your data analysis. Be sure your summaries adequately reflect your data. Get close to your data. Look at distributions, residuals, etc. Don't trust the computer to do justice to your data. One advantage you have over the computer is that you can ask "Does this make sense?" Don't lose this advantage. Remember, to err is human, but to really screw up it takes a computer!
Recommended Sources
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499-510. [Simple rules of thumb based on empirical findings.]

Havlicek, L., & Peterson, N. (1977). Effects of the violation of assumptions upon significance levels of the Pearson r. Psychological Bulletin, 84, 373-377. [You can get away with a lot - regression is remarkably robust with respect to violating the assumption of normally distributed residuals. However, extreme outliers can distort your findings very substantially.]

Maxwell, S. E. (2000). Sample size and multiple regression analysis. Psychological Methods, 5(4), 434-458. [When predictors are correlated with each other, larger samples are needed.]

Stevens, J. P. (2002). Applied multivariate statistics for the social sciences (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates. [This inexpensive paperback is accessible, filled with examples and useful advice.]

Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Needham Heights, MA: Allyn & Bacon. [This is an excellent resource for students and users of a range of multivariate methods, including regression.]

Wilkinson, L. (1979). Tests of significance in stepwise regression. Psychological Bulletin, 86, 168-174. [The serious problem of capitalization on chance in stepwise analyses is generally not understood. Wilkinson provides simple tables to deal with this problem.]
Dale E. Berger, 2003