COMPLETE BUSINESS STATISTICS
by Amir D. Aczel & Jayavel Sounderpandian
7th edition
Prepared by Lloyd Jaisingh, Morehead State University

Chapter 11
Multiple Regression

McGraw-Hill/Irwin
Copyright 2009 by The McGraw-Hill Companies, Inc. All rights reserved.

11 Multiple Regression (1)

Using Statistics
The k-Variable Multiple Regression Model
The F Test of a Multiple Regression Model
How Good Is the Regression?
Tests of the Significance of Individual Regression Parameters
Testing the Validity of the Regression Model
Using the Multiple Regression Model for Prediction

11 Multiple Regression (2)

Qualitative Independent Variables
Polynomial Regression
Nonlinear Models and Transformations
Multicollinearity
Residual Autocorrelation and the Durbin-Watson Test
Partial F Tests and Variable Selection Methods
Multiple Regression Using the Solver

11 LEARNING OBJECTIVES (1)

After studying this chapter, you should be able to:

Determine whether multiple regression would be applicable to a given instance
Formulate a multiple regression model
Carry out a multiple regression using a spreadsheet template
Test the validity of a multiple regression by analyzing residuals
Carry out hypothesis tests about the regression coefficients
Compute a prediction interval for the dependent variable

11 LEARNING OBJECTIVES (2)

After studying this chapter, you should be able to:

Use indicator variables in a multiple regression
Carry out a polynomial regression
Conduct a Durbin-Watson test for autocorrelation in residuals
Conduct a partial F test
Determine which independent variables are to be included in a multiple regression model
Solve multiple regression problems using the Solver macro

11-1 Using Statistics

Lines: Any two points (A and B), or an intercept and a slope (β0 and β1), define a line on a two-dimensional surface.

Planes: Any three points (A, B, and C), or an intercept and the coefficients of x1 and x2 (β0, β1, and β2), define a plane in three-dimensional space.

[Figure: a line y = β0 + β1x in the x-y plane, and a plane over the (x1, x2) axes.]

11-2 The k-Variable Multiple Regression Model

The population regression model of a dependent variable, Y, on a set of k independent variables, X1, X2, . . . , Xk is given by:

Y = β0 + β1X1 + β2X2 + . . . + βkXk + ε

where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, . . . , k is the slope of the regression surface - sometimes called the response surface - with respect to Xi.

[Figure: the regression plane y = β0 + β1x1 + β2x2 over the (x1, x2) plane.]

Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xi are uncorrelated with the error term.
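As a quick illustration (not from the text), the sketch below simulates data from a two-variable version of this population model and checks that least squares approximately recovers the β's; all parameter values are invented.

```python
# A minimal sketch, assuming invented parameters: simulate
# Y = beta0 + beta1*X1 + beta2*X2 + eps with eps ~ N(0, sigma^2),
# then recover the betas by least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
eps = rng.normal(0, 2.0, n)               # errors: mean 0, constant variance
y = 5.0 + 1.5 * X1 + 1.1 * X2 + eps       # hypothetical true parameters

X = np.column_stack([np.ones(n), X1, X2])   # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimates b0, b1, b2
print(b)                                    # should be close to [5.0, 1.5, 1.1]
```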

Simple and Multiple Least-Squares Regression

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line:

ŷ = b0 + b1x

In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane:

ŷ = b0 + b1x1 + b2x2

The Estimated Regression Relationship

The estimated regression relationship:

Ŷ = b0 + b1X1 + b2X2 + . . . + bkXk

where Ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms bi, for i = 0, 1, . . . , k are the least-squares estimates of the population regression parameters βi.

The actual, observed value of Y is the predicted value plus an error:

yj = b0 + b1x1j + b2x2j + . . . + bkxkj + ej,  j = 1, . . . , n.

Least-Squares Estimation: The 2-Variable Normal Equations

Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1, and b2 yields the following normal equations, which can be solved for b0, b1, and b2:

Σy   = nb0   + b1Σx1   + b2Σx2
Σx1y = b0Σx1 + b1Σx1²  + b2Σx1x2
Σx2y = b0Σx2 + b1Σx1x2 + b2Σx2²

Example 11-1

  Y    X1   X2   X1X2   X1²   X2²    X1Y    X2Y
 72    12    5     60   144    25    864    360
 76    11    8     88   121    64    836    608
 78    15    6     90   225    36   1170    468
 70    10    5     50   100    25    700    350
 68    11    3     33   121     9    748    204
 80    16    9    144   256    81   1280    720
 82    14   12    168   196   144   1148    984
 65     8    4     32    64    16    520    260
 62     8    3     24    64     9    496    186
 90    18   10    180   324   100   1620    900
---   ---  ---    ---   ---   ---    ---    ---
743   123   65    869  1615   509   9382   5040

Normal Equations:
 743 = 10b0  + 123b1  + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0  + 869b1  + 509b2

b0 = 47.164942
b1 = 1.5990404
b2 = 1.1487479

Estimated regression equation:

Ŷ = 47.164942 + 1.5990404X1 + 1.1487479X2
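The same solution can be verified numerically; a small sketch that feeds the column sums above into NumPy's linear solver:

```python
# Solving the three normal equations from Example 11-1 with NumPy,
# using the column sums from the table above.
import numpy as np

A = np.array([[ 10.0,  123.0,  65.0],    # n,       sum X1,    sum X2
              [123.0, 1615.0, 869.0],    # sum X1,  sum X1^2,  sum X1X2
              [ 65.0,  869.0, 509.0]])   # sum X2,  sum X1X2,  sum X2^2
rhs = np.array([743.0, 9382.0, 5040.0])  # sum Y,   sum X1Y,   sum X2Y

b0, b1, b2 = np.linalg.solve(A, rhs)
print(b0, b1, b2)  # approximately 47.164942, 1.5990404, 1.1487479
```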

Example 11-1: Using the Template

[Figure: template regression results for Alka-Seltzer sales, showing the coefficients.]

Example 11-1: Using Minitab

[Figure: the Minitab regression equation; coefficients are truncated to two decimal places.]

Decomposition of the Total Deviation in a Multiple Regression Model

Total deviation:      Y - Ȳ
Regression deviation: Ŷ - Ȳ
Error deviation:      Y - Ŷ

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE

11-3 The F Test of a Multiple Regression Model

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:

H0: β1 = β2 = ... = βk = 0
H1: Not all the βi (i = 1, 2, ..., k) are equal to 0

Source of    Sum of     Degrees of
Variation    Squares    Freedom       Mean Square              F Ratio
Regression   SSR        k             MSR = SSR/k              F = MSR/MSE
Error        SSE        n - (k+1)     MSE = SSE/(n - (k+1))
Total        SST        n - 1         MST = SST/(n - 1)
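A small sketch of this test (scipy is my tool choice here, not the book's): given the sums of squares, compute the F ratio and its right-tail p-value. The SSR and SSE values in the demo call are approximate, backed out of the Example 11-1 output (SST from the data, SSE from s = 1.911).

```python
# F test for the existence of a linear relationship, from SSR, SSE, n, k.
from scipy import stats

def f_test(ssr, sse, n, k):
    msr = ssr / k                      # mean square regression
    mse = sse / (n - (k + 1))          # mean square error
    f = msr / mse
    p = stats.f.sf(f, k, n - (k + 1))  # right-tail p-value
    return f, p

print(f_test(ssr=630.5, sse=25.6, n=10, k=2))  # F is approximately 86
```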

Using the Template: Analysis of Variance Table (Example 11-1)

[Figure: F distribution with 2 and 7 degrees of freedom; α = 0.01, F0.01 = 9.55, test statistic = 86.34.]

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we might conclude that the dependent variable is related to one or more of the independent variables.

Using Minitab: Analysis of Variance Table (Example 11-1)

[Figure: Minitab ANOVA output and the same F distribution with 2 and 7 degrees of freedom. The conclusion is identical: F = 86.34 exceeds the critical point for any common level of significance, so the null hypothesis is rejected.]

11-4 How Good Is the Regression?

The mean square error is an unbiased estimator of the variance of the population errors ε, denoted by σ²:

MSE = SSE / (n - (k+1)) = Σ(y - ŷ)² / (n - (k+1))

Standard error of estimate:

s = √MSE

The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

R² = SSR/SST = 1 - SSE/SST

Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination

SST = SSR + SSE

R² = SSR/SST = 1 - SSE/SST

The adjusted multiple coefficient of determination, R̄², is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

R̄² = 1 - (SSE/(n - (k+1))) / (SST/(n - 1))

Example 11-1:  s = 1.911   R-sq = 96.1%   R-sq(adj) = 95.0%
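A quick numeric check of these measures, using SST computed from the Example 11-1 data and an SSE backed out of s = 1.911 (both values approximate):

```python
# R-squared, adjusted R-squared, and standard error of estimate.
sst, sse, n, k = 656.1, 25.6, 10, 2

r2 = 1 - sse / sst
r2_adj = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))
s = (sse / (n - (k + 1))) ** 0.5
print(round(r2, 3), round(r2_adj, 3), round(s, 2))  # ~0.961, ~0.95, ~1.91
```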

Measures of Performance in Multiple Regression and the ANOVA Table

Source of    Sum of     Degrees of
Variation    Squares    Freedom                Mean Square   F Ratio
Regression   SSR        k                      MSR           F = MSR/MSE
Error        SSE        n - (k+1) = n - k - 1  MSE
Total        SST        n - 1                  MST

F = MSR/MSE = (SSR/k) / (SSE/(n - (k+1))) = (R²/k) / ((1 - R²)/(n - (k+1)))

R² = SSR/SST = 1 - SSE/SST

R̄² = 1 - (SSE/(n - (k+1))) / (SST/(n - 1)) = 1 - MSE/MST

11-5 Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1) H0: β1 = 0    H1: β1 ≠ 0
(2) H0: β2 = 0    H1: β2 ≠ 0
. . .
(k) H0: βk = 0    H1: βk ≠ 0

Test statistic for test i:

t(n - (k+1)) = (bi - 0) / s(bi)
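One way to get these individual t tests (statsmodels is my tool choice, not the book's) is sketched below on the Example 11-1 data:

```python
# Individual t tests: t = b_i / s(b_i) with n-(k+1) degrees of freedom.
import numpy as np
import statsmodels.api as sm

y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
X2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)

X = sm.add_constant(np.column_stack([X1, X2]))
fit = sm.OLS(y, X).fit()
print(fit.params)   # b0, b1, b2
print(fit.tvalues)  # t statistic for each coefficient
print(fit.pvalues)  # two-sided p-values against H0: beta_i = 0
```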

Regression Results for Individual Parameters (Interpret the Table)

Variable   Coefficient Estimate   Standard Error   t-Statistic
Constant        53.12                  5.43            9.783 *
X1               2.03                  0.22            9.227 *
X2               5.60                  1.30            4.308 *
X3              10.35                  6.88            1.504
X4               3.45                  2.70            1.259
X5              -4.25                  0.38          -11.184 *

n = 150, t0.025 = 1.96; coefficients marked * are significantly different from 0 (|t| > 1.96).

Example 11-1: Using the Template

[Figure: template regression results for Alka-Seltzer sales.]

Using the Template: Example 11-2

[Figure: template regression results for Exports to Singapore, showing the coefficients.]

Using Minitab: Example 11-2

[Figure: Minitab regression results and the regression equation for Exports to Singapore.]

11-6 Testing the Validity of the Regression Model: Residual Plots

Residuals vs. M1 (Example 11-2): the residuals appear randomly distributed, with no pattern and with equal variance as M1 increases.

Residuals vs. Price (Example 11-2): the residuals appear to increase as Price increases; the variance of the residuals is not constant.

Normal Probability Plot for the Residuals (Example 11-2): a linear trend indicates that the residuals are normally distributed.

Residual Plots from Minitab to Help Assess Assumptions: Example 11-2

[Figure: four-panel Minitab residual plots for Exports - Normal Probability Plot, Histogram of the residuals, Residuals Versus Fits, and Residuals Versus Order.]

Investigating the Validity of the Regression: Outliers and Influential Observations

[Figure, Outliers: an outlier pulls the estimated line toward itself, so the regression line with the outlier differs from the regression line without it.]

[Figure, Influential Observations: a point with a large value of xi, far from a cluster of data in which there is no relationship, determines the regression line when all data are included.]

Possible Relation in the Region between the Available Cluster of Data and the Far Point

[Figure: some of the possible data between the original cluster and the far point with a large value of xi would reveal a more appropriate curvilinear relationship, seen only when the in-between data are known.]

Outliers and Influential Observations: Example 11-2

Unusual Observations
Obs.    M1    EXPORTS     Fit    Stdev.Fit   Residual   St.Resid
  1    5.10    2.6000   2.6420     0.1288     -0.0420    -0.14 X
  2    4.90    2.6000   2.6438     0.1234     -0.0438    -0.14 X
 25    6.20    5.5000   4.5949     0.0676      0.9051     2.80 R
 26    6.30    3.7000   4.6311     0.0651     -0.9311    -2.87 R
 50    8.30    4.3000   5.1317     0.0648     -0.8317    -2.57 R
 67    8.20    5.6000   4.9474     0.0668      0.6526     2.02 R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

11-7 Using the Multiple Regression Model for Prediction

[Figure: estimated regression plane for Example 11-1, showing Sales as a function of Advertising and Promotions.]

Prediction in Multiple Regression

A (1 - α)100% prediction interval for a value of Y given values of Xi:

ŷ ± t(α/2, n-(k+1)) √(s²(ŷ) + MSE)

A (1 - α)100% prediction interval for the conditional mean of Y given values of Xi:

ŷ ± t(α/2, n-(k+1)) s[Ê(Y)]
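A sketch of both intervals with statsmodels (again my tool choice); the prediction point X1 = 10, X2 = 5 is arbitrary:

```python
# Prediction interval for an individual Y (obs_ci_* columns) and interval
# for the conditional mean of Y (mean_ci_* columns), Example 11-1 data.
import numpy as np
import statsmodels.api as sm

y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
X2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)
X  = sm.add_constant(np.column_stack([X1, X2]))

fit = sm.OLS(y, X).fit()
new = np.array([[1.0, 10.0, 5.0]])  # intercept, X1, X2
print(fit.get_prediction(new).summary_frame(alpha=0.05))
```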

11-8 Qualitative (or Categorical) Independent Variables (in Regression)

An indicator (dummy, binary) variable of qualitative level A:

Xh = 1 if level A is obtained
     0 if level A is not obtained

EXAMPLE 11-3:

MOVIE   EARN   COST   PROM   BOOK
  1      28     4.2    1.0    0
  2      35     6.0    3.0    1
  3      50     5.5    6.0    1
  4      20     3.3    1.0    0
  5      75    12.5   11.0    1
  6      60     9.6    8.0    1
  7      15     2.5    0.5    0
  8      45    10.8    5.0    0
  9      50     8.4    3.0    1
 10      34     6.6    2.0    0
 11      48    10.7    1.0    1
 12      82    11.0   15.0    1
 13      24     3.5    4.0    0
 14      50     6.9   10.0    0
 15      58     7.8    9.0    1
 16      63    10.1   10.0    0
 17      30     5.0    1.0    1
 18      37     7.5    5.0    0
 19      45     6.4    8.0    1
 20      72    10.0   12.0    1

Picturing Qualitative Variables in Regression

A regression with one quantitative variable (X1) and one qualitative variable (X2):

ŷ = b0 + b1x1 + b2x2

[Figure: two parallel lines against X1 - the line for X2 = 0 has intercept b0, the line for X2 = 1 has intercept b0 + b2.]

A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):

ŷ = b0 + b1x1 + b2x2 + b3x3

[Figure: two parallel planes over (x1, x2), shifted by b3.]

Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables

A qualitative variable with r levels or categories is represented with (r - 1) 0/1 (dummy) variables.

A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3):

ŷ = b0 + b1x1 + b2x2 + b3x3

Category     X2   X3
Adventure     0    0
Drama         0    1
Romance       1    0

[Figure: three parallel lines against X1 - intercept b0 for X2 = 0, X3 = 0; intercept b0 + b2 for X2 = 1, X3 = 0; intercept b0 + b3 for X2 = 0, X3 = 1.]

Using Qualitative Variables in Regression: Example 11-4

Salary = 8547 + 949 Education + 1258 Experience - 3256 Gender
  (SE)   (32.6)   (45.1)          (78.5)           (212.4)
  (t)    (262.2)  (21.0)          (16.0)           (-15.3)

Gender = 1 if Female, 0 if Male

On average, female salaries are $3256 below male salaries.
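The mechanics of the Gender indicator can be sketched as below; the six data rows are invented solely to show how a 0/1 column enters the model, and are not the book's data.

```python
# Regression with a 0/1 indicator: the Gender coefficient estimates the
# female-male salary difference, holding Education and Experience fixed.
import numpy as np
import statsmodels.api as sm

educ   = np.array([12, 16, 14, 18, 12, 16], dtype=float)
exper  = np.array([ 5, 10,  8,  3, 12,  7], dtype=float)
gender = np.array([ 0,  1,  0,  1,  1,  0], dtype=float)  # 1 = Female
rng = np.random.default_rng(1)
salary = 8547 + 949*educ + 1258*exper - 3256*gender + rng.normal(0, 500, 6)

X = sm.add_constant(np.column_stack([educ, exper, gender]))
print(sm.OLS(salary, X).fit().params)  # last entry roughly -3256
```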

Interactions between Quantitative and Qualitative Variables: Shifting Slopes

A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):

ŷ = b0 + b1x1 + b2x2 + b3x1x2

[Figure: the line for X2 = 0 has intercept b0 and slope b1; the line for X2 = 1 has intercept b0 + b2 and slope b1 + b3.]

11-9 Polynomial Regression

One-variable polynomial regression model:

Y = β0 + β1X + β2X² + β3X³ + . . . + βmX^m + ε

where m is the degree of the polynomial - the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.

[Figure: fitted curves of increasing order - linear ŷ = b0 + b1X, quadratic ŷ = b0 + b1X + b2X², and cubic ŷ = b0 + b1X + b2X² + b3X³.]
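Because the polynomial model is still linear in the β's, it can be fit by ordinary least squares on the powers of X; a sketch with invented cubic data:

```python
# Polynomial regression of order m via least squares on 1, x, ..., x^m.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 4, 40)
y = 1.0 + 0.5*x - 0.8*x**2 + 0.2*x**3 + rng.normal(0, 0.3, x.size)

m = 3
X = np.vander(x, m + 1, increasing=True)   # columns: 1, x, x^2, x^3
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # estimates of beta0 ... beta3
```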

Polynomial Regression: Example 11-5 Using the Template

[Figure: template output for the polynomial fit.]

Polynomial Regression: Example 11-5 Using Minitab

[Figure: Minitab output for the polynomial fit.]

Polynomial Regression: Other Variables and Cross-Product Terms

Variable   Estimate   Standard Error   T-statistic
X1           2.34          0.92           2.54
X2           3.11          1.05           2.96
X1²          4.22          1.00           4.22
X2²          3.57          2.12           1.68
X1X2         2.77          2.30           1.20

11-10 Nonlinear Models and Transformations

The multiplicative model:

Y = β0 X1^β1 X2^β2 X3^β3 ε

The logarithmic transformation:

log Y = log β0 + β1 log X1 + β2 log X2 + β3 log X3 + log ε

Transformations: Exponential Model

The exponential model:

Y = β0 e^(β1X) ε

The logarithmic transformation:

log Y = log β0 + β1X + log ε
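A sketch of both transformations in practice (data invented): take logs, run an ordinary linear fit, then back out the original parameters.

```python
# Estimating multiplicative and exponential models after a log transform.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 100)

# Multiplicative: Y = b0 * X^b1 * eps  ->  log Y = log b0 + b1 log X
y_mult = 2.0 * x**1.5 * np.exp(rng.normal(0, 0.1, x.size))
b1, log_b0 = np.polyfit(np.log(x), np.log(y_mult), 1)
print(np.exp(log_b0), b1)   # roughly 2.0 and 1.5

# Exponential: Y = b0 * e^(b1 X) * eps  ->  log Y = log b0 + b1 X
y_exp = 0.5 * np.exp(0.3 * x) * np.exp(rng.normal(0, 0.1, x.size))
b1, log_b0 = np.polyfit(x, np.log(y_exp), 1)
print(np.exp(log_b0), b1)   # roughly 0.5 and 0.3
```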

Plots of Transformed Variables

[Figure: four panels -
Simple regression of Sales on Advertising: Y = 6.59271 + 1.19176X, R-squared = 0.895.
Regression of Sales on Log(Advertising): Y = 3.66825 + 6.784X, R-squared = 0.978.
Regression of Log(Sales) on Log(Advertising): Y = 1.70082 + 0.553136X, R-squared = 0.947.
Residual plots: Sales vs. Log(Advertising).]

Variance Stabilizing Transformations

Square root transformation: Y' = √Y
Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.

Logarithmic transformation: Y' = log(Y)
Useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y.

Reciprocal transformation: Y' = 1/Y
Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.

Regression with Dependent Indicator Variables

The logistic function:

E(Y|X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

Transformation to linearize the logistic function:

p' = log[p / (1 - p)]

[Figure: the S-shaped logistic function.]
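A sketch of the logit linearization with invented sample proportions: regressing log(p/(1-p)) on X gives rough estimates of β0 and β1.

```python
# Linearizing the logistic function via the logit transform.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
p = np.array([0.12, 0.25, 0.42, 0.60, 0.74, 0.86])  # invented proportions

logit = np.log(p / (1 - p))       # p' = log[p/(1-p)]
b1, b0 = np.polyfit(x, logit, 1)  # straight-line fit of p' on x
print(b0, b1)                     # estimates of beta0 and beta1
```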

11-11 Multicollinearity

[Figure: overlap diagrams of x1 and x2 -
Orthogonal X variables provide information from independent sources: no multicollinearity.
Some degree of collinearity: problems with regression depend on the degree of collinearity.
Perfectly collinear X variables provide identical information content: no regression.
A high degree of negative collinearity also causes problems with regression.]

Effects of Multicollinearity

Variances of regression coefficients are inflated.
Magnitudes of regression coefficients may be different from what is expected.
Signs of regression coefficients may not be as expected.
Adding or removing variables produces large changes in coefficients.
Removing a data point may cause large changes in coefficient estimates or signs.
In some cases, the F ratio may be significant while the t ratios are not.

Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors

Variance Inflation Factor

The variance inflation factor associated with Xh:

VIF(Xh) = 1 / (1 - Rh²)

where Rh² is the R² value obtained for the regression of Xh on the other independent variables.

[Figure: relationship between VIF and Rh² - VIF starts near 1 and grows without bound as Rh² approaches 1.]

Variance Inflation Factor (VIF)

Observation: The VIF values for the variables Lend and Price are both greater than 5. This indicates that some degree of multicollinearity exists with respect to these two variables.
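The VIF definition translates directly into code; a sketch that regresses each X on the remaining X's and applies 1/(1 - Rh²):

```python
# VIF(X_h) = 1 / (1 - R_h^2), with R_h^2 from regressing X_h on the others.
import numpy as np

def vif(X):
    """X: n-by-p array of regressors (no intercept column)."""
    out = []
    for h in range(X.shape[1]):
        others = np.delete(X, h, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        b, *_ = np.linalg.lstsq(A, X[:, h], rcond=None)
        resid = X[:, h] - A @ b
        r2 = 1 - resid.var() / X[:, h].var()  # R_h^2 for this regression
        out.append(1.0 / (1.0 - r2))
    return out
```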

Solutions to the Multicollinearity Problem

Drop a collinear variable from the regression.
Change the sampling plan to include elements outside the multicollinearity range.
Transformations of variables.
Ridge regression.

11-12 Residual Autocorrelation and the Durbin-Watson Test

An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

Lagged Residuals:

  i     ei    ei-1   ei-2   ei-3   ei-4
  1    1.0     *      *      *      *
  2    0.0    1.0     *      *      *
  3   -1.0    0.0    1.0     *      *
  4    2.0   -1.0    0.0    1.0     *
  5    3.0    2.0   -1.0    0.0    1.0
  6   -2.0    3.0    2.0   -1.0    0.0
  7    1.0   -2.0    3.0    2.0   -1.0
  8    1.5    1.0   -2.0    3.0    2.0
  9    1.0    1.5    1.0   -2.0    3.0
 10   -2.5    1.0    1.5    1.0   -2.0

The Durbin-Watson test (first-order autocorrelation):

H0: ρ1 = 0
H1: ρ1 ≠ 0

The Durbin-Watson test statistic:

d = Σ(i=2 to n) (ei - ei-1)² / Σ(i=1 to n) ei²

Critical Points of the Durbin-Watson Statistic: α = 0.05, n = Sample Size, k = Number of Independent Variables

          k = 1        k = 2        k = 3        k = 4        k = 5
  n     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
 15    1.08  1.36   0.95  1.54   0.82  1.75   0.69  1.97   0.56  2.21
 16    1.10  1.37   0.98  1.54   0.86  1.73   0.74  1.93   0.62  2.15
 17    1.13  1.38   1.02  1.54   0.90  1.71   0.78  1.90   0.67  2.10
 18    1.16  1.39   1.05  1.53   0.93  1.69   0.82  1.87   0.71  2.06
  .     .     .      .     .      .     .      .     .      .     .
 65    1.57  1.63   1.54  1.66   1.50  1.70   1.47  1.73   1.44  1.77
 70    1.58  1.64   1.55  1.67   1.52  1.70   1.49  1.74   1.46  1.77
 75    1.60  1.65   1.57  1.68   1.54  1.71   1.51  1.74   1.49  1.77
 80    1.61  1.66   1.59  1.69   1.56  1.72   1.53  1.74   1.51  1.77
 85    1.62  1.67   1.60  1.70   1.57  1.72   1.55  1.75   1.52  1.77
 90    1.63  1.68   1.61  1.70   1.59  1.73   1.57  1.75   1.54  1.78
 95    1.64  1.69   1.62  1.71   1.60  1.73   1.58  1.75   1.56  1.78
100    1.65  1.69   1.63  1.72   1.61  1.74   1.59  1.76   1.57  1.78

Using the Durbin-Watson Statistic

Positive           Test is         No                Test is         Negative
Autocorrelation    Inconclusive    Autocorrelation   Inconclusive    Autocorrelation
                dL              dU                4-dU            4-dL

For n = 67, k = 4: dU ≈ 1.73, so 4-dU ≈ 2.27; dL ≈ 1.47, so 4-dL ≈ 2.53 < 2.58, the computed statistic.

H0 is rejected, and we conclude there is negative first-order autocorrelation.
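Computed directly, using the ten residuals tabulated earlier:

```python
# Durbin-Watson statistic d = sum_{i=2}^n (e_i - e_{i-1})^2 / sum e_i^2.
import numpy as np

e = np.array([1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5])
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d)  # compare with d_L and d_U for the appropriate n and k
```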

11-13 Partial F Tests and Variable Selection Methods

Full model:
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

Reduced model:
Y = β0 + β1X1 + β2X2 + ε

Partial F test:
H0: β3 = β4 = 0
H1: β3 and β4 not both 0

Partial F statistic:

F(r, n-(k+1)) = [(SSER - SSEF) / r] / MSEF

where SSER is the sum of squared errors of the reduced model, SSEF is the sum of squared errors of the full model, MSEF is the mean square error of the full model [MSEF = SSEF/(n-(k+1))], and r is the number of variables dropped from the full model.
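A direct sketch of the computation:

```python
# Partial F statistic for dropping r variables from the full model.
from scipy import stats

def partial_f(sse_reduced, sse_full, n, k, r):
    mse_full = sse_full / (n - (k + 1))
    f = ((sse_reduced - sse_full) / r) / mse_full
    p = stats.f.sf(f, r, n - (k + 1))  # right-tail p-value
    return f, p
```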

Variable Selection Methods Using the Template: Example 11-2

All possible regressions: run regressions with all possible combinations of independent variables and select the best model.

A p-value of 0.001 indicates that we should reject the null hypothesis H0: the slopes for Lend and Exch. are zero.

Variable Selection Methods Using Minitab: Example 11-2

[Figure: Minitab variable-selection output.]

Variable Selection Methods

Stepwise procedures:

Forward selection
  Add one variable at a time to the model, on the basis of its F statistic.

Backward elimination
  Remove one variable at a time, on the basis of its F statistic.

Stepwise regression
  Adds variables to the model and subtracts variables from the model, on the basis of the F statistic.
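As a rough sketch of how forward selection might be coded (simplified relative to real stepwise implementations; the entry threshold and helper names are my assumptions):

```python
# Forward selection: repeatedly add the candidate variable with the best
# partial F (smallest p-value), stopping when none clears the threshold.
import numpy as np
from scipy import stats

def _sse(y, cols, X):
    """SSE of the regression of y on an intercept plus X[:, cols]."""
    parts = [np.ones(len(y))] + ([X[:, cols]] if cols else [])
    A = np.column_stack(parts)
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ b
    return r @ r

def forward_select(y, X, p_enter=0.05):
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    while remaining:
        sse_red = _sse(y, chosen, X)
        # candidate whose addition gives the smallest full-model SSE
        j = min(remaining, key=lambda v: _sse(y, chosen + [v], X))
        sse_full = _sse(y, chosen + [j], X)
        df = n - (len(chosen) + 2)          # n - (k+1) for the full model
        f = (sse_red - sse_full) / (sse_full / df)
        if stats.f.sf(f, 1, df) > p_enter:  # no candidate clears p_enter
            break
        chosen.append(j)
        remaining.remove(j)
    return chosen
```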

Stepwise Regression

1. Compute the F statistic for each variable not in the model.
2. Is there at least one variable with p-value < Pin? If no, stop.
3. If yes, enter the most significant (smallest p-value) variable into the model.
4. Calculate the partial F for all variables in the model.
5. Is there a variable with p-value > Pout? If yes, remove that variable and return to step 4; if no, return to step 1.

Stepwise Regression: Using the Computer (MINITAB) Example 11-2

[Figure: Minitab stepwise regression output.]

Using the Computer: MINITAB

[Figure: Minitab output.]
