COMPLETE BUSINESS STATISTICS
by Amir D. Aczel & Jayavel Sounderpandian
7th edition
Prepared by Lloyd Jaisingh, Morehead State University

Chapter 11
Multiple Regression

McGraw-Hill/Irwin
Copyright 2009 by The McGraw-Hill Companies, Inc. All rights reserved.

11 Multiple Regression (1)

Using Statistics
The k-Variable Multiple Regression Model
The F Test of a Multiple Regression Model
How Good Is the Regression?
Tests of the Significance of Individual Regression Parameters
Testing the Validity of the Regression Model
Using the Multiple Regression Model for Prediction

11 Multiple Regression (2)

Qualitative Independent Variables
Polynomial Regression
Nonlinear Models and Transformations
Multicollinearity
Residual Autocorrelation and the Durbin-Watson Test
Partial F Tests and Variable Selection Methods
Multiple Regression Using the Solver

11 LEARNING OBJECTIVES (1)

After studying this chapter, you should be able to:

Determine whether multiple regression would be applicable to a given instance
Formulate a multiple regression model
Carry out a multiple regression using a spreadsheet template
Test the validity of a multiple regression by analyzing residuals
Carry out hypothesis tests about the regression coefficients
Compute a prediction interval for the dependent variable

11 LEARNING OBJECTIVES (2)

After studying this chapter, you should be able to:

Use indicator variables in a multiple regression
Carry out a polynomial regression
Conduct a Durbin-Watson test for autocorrelation in residuals
Conduct a partial F test
Determine which independent variables are to be included in a multiple regression model
Solve multiple regression problems using the Solver macro

11-1 Using Statistics

Lines: Any two points (A and B), or an intercept and a slope (β0 and β1), define a line on a two-dimensional surface.

Planes: Any three points (A, B, and C), or an intercept and the coefficients of x1 and x2 (β0, β1, and β2), define a plane in three-dimensional space.

[Figure: a line y = β0 + β1x in the x-y plane, and a plane over the (x1, x2) axes.]

11-2 The k-Variable Multiple Regression Model

The population regression model of a dependent variable, Y, on a set of k independent variables, X1, X2, . . . , Xk is given by:

Y = β0 + β1X1 + β2X2 + . . . + βkXk + ε

where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, . . . , k is the slope of the regression surface - sometimes called the response surface - with respect to Xi.

[Figure: the regression plane y = β0 + β1x1 + β2x2 over the (x1, x2) plane.]

Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xi are uncorrelated with the error term.
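As a quick illustration (not from the text), the sketch below simulates data from a two-variable version of this population model and checks that least squares approximately recovers the β's; all parameter values are invented.

```python
# A minimal sketch, assuming invented parameters: simulate
# Y = beta0 + beta1*X1 + beta2*X2 + eps with eps ~ N(0, sigma^2),
# then recover the betas by least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
eps = rng.normal(0, 2.0, n)               # errors: mean 0, constant variance
y = 5.0 + 1.5 * X1 + 1.1 * X2 + eps       # hypothetical true parameters

X = np.column_stack([np.ones(n), X1, X2])   # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimates b0, b1, b2
print(b)                                    # should be close to [5.0, 1.5, 1.1]
```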

Simple and Multiple Least-Squares Regression

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line:

ŷ = b0 + b1x

In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane:

ŷ = b0 + b1x1 + b2x2

The Estimated Regression Relationship

The estimated regression relationship:

Ŷ = b0 + b1X1 + b2X2 + . . . + bkXk

where Ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms bi, for i = 0, 1, . . . , k are the least-squares estimates of the population regression parameters βi.

The actual, observed value of Y is the predicted value plus an error:

yj = b0 + b1x1j + b2x2j + . . . + bkxkj + ej,  j = 1, . . . , n.

Least-Squares Estimation: The 2-Variable Normal Equations

Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1, and b2 yields the following normal equations, which can be solved for b0, b1, and b2:

Σy   = nb0   + b1Σx1   + b2Σx2
Σx1y = b0Σx1 + b1Σx1²  + b2Σx1x2
Σx2y = b0Σx2 + b1Σx1x2 + b2Σx2²

Example 11-1

  Y    X1   X2   X1X2   X1²   X2²    X1Y    X2Y
 72    12    5     60   144    25    864    360
 76    11    8     88   121    64    836    608
 78    15    6     90   225    36   1170    468
 70    10    5     50   100    25    700    350
 68    11    3     33   121     9    748    204
 80    16    9    144   256    81   1280    720
 82    14   12    168   196   144   1148    984
 65     8    4     32    64    16    520    260
 62     8    3     24    64     9    496    186
 90    18   10    180   324   100   1620    900
---   ---  ---    ---   ---   ---    ---    ---
743   123   65    869  1615   509   9382   5040

Normal Equations:
 743 = 10b0  + 123b1  + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0  + 869b1  + 509b2

b0 = 47.164942
b1 = 1.5990404
b2 = 1.1487479

Estimated regression equation:

Ŷ = 47.164942 + 1.5990404X1 + 1.1487479X2
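The same solution can be verified numerically; a small sketch that feeds the column sums above into NumPy's linear solver:

```python
# Solving the three normal equations from Example 11-1 with NumPy,
# using the column sums from the table above.
import numpy as np

A = np.array([[ 10.0,  123.0,  65.0],    # n,       sum X1,    sum X2
              [123.0, 1615.0, 869.0],    # sum X1,  sum X1^2,  sum X1X2
              [ 65.0,  869.0, 509.0]])   # sum X2,  sum X1X2,  sum X2^2
rhs = np.array([743.0, 9382.0, 5040.0])  # sum Y,   sum X1Y,   sum X2Y

b0, b1, b2 = np.linalg.solve(A, rhs)
print(b0, b1, b2)  # approximately 47.164942, 1.5990404, 1.1487479
```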

Example 11-1: Using the Template

[Figure: template regression results for Alka-Seltzer sales, showing the coefficients.]

Example 11-1: Using Minitab

[Figure: the Minitab regression equation; coefficients are truncated to two decimal places.]

Decomposition of the Total Deviation in a Multiple Regression Model

Total deviation:      Y - Ȳ
Regression deviation: Ŷ - Ȳ
Error deviation:      Y - Ŷ

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE

11-3 The F Test of a Multiple Regression Model

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:

H0: β1 = β2 = ... = βk = 0
H1: Not all the βi (i = 1, 2, ..., k) are equal to 0

Source of    Sum of     Degrees of
Variation    Squares    Freedom       Mean Square              F Ratio
Regression   SSR        k             MSR = SSR/k              F = MSR/MSE
Error        SSE        n - (k+1)     MSE = SSE/(n - (k+1))
Total        SST        n - 1         MST = SST/(n - 1)
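A small sketch of this test (scipy is my tool choice here, not the book's): given the sums of squares, compute the F ratio and its right-tail p-value. The SSR and SSE values in the demo call are approximate, backed out of the Example 11-1 output (SST from the data, SSE from s = 1.911).

```python
# F test for the existence of a linear relationship, from SSR, SSE, n, k.
from scipy import stats

def f_test(ssr, sse, n, k):
    msr = ssr / k                      # mean square regression
    mse = sse / (n - (k + 1))          # mean square error
    f = msr / mse
    p = stats.f.sf(f, k, n - (k + 1))  # right-tail p-value
    return f, p

print(f_test(ssr=630.5, sse=25.6, n=10, k=2))  # F is approximately 86
```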

Using the Template: Analysis of Variance Table (Example 11-1)

[Figure: F distribution with 2 and 7 degrees of freedom; α = 0.01, F0.01 = 9.55, test statistic = 86.34.]

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we might conclude that the dependent variable is related to one or more of the independent variables.

Using Minitab: Analysis of Variance Table (Example 11-1)

[Figure: Minitab ANOVA output and the same F distribution with 2 and 7 degrees of freedom. The conclusion is identical: F = 86.34 exceeds the critical point for any common level of significance, so the null hypothesis is rejected.]

11-4 How Good Is the Regression?

The mean square error is an unbiased estimator of the variance of the population errors ε, denoted by σ²:

MSE = SSE / (n - (k+1)) = Σ(y - ŷ)² / (n - (k+1))

Standard error of estimate:

s = √MSE

The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

R² = SSR/SST = 1 - SSE/SST

Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination

SST = SSR + SSE

R² = SSR/SST = 1 - SSE/SST

The adjusted multiple coefficient of determination, R̄², is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

R̄² = 1 - (SSE/(n - (k+1))) / (SST/(n - 1))

Example 11-1:  s = 1.911   R-sq = 96.1%   R-sq(adj) = 95.0%
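A quick numeric check of these measures, using SST computed from the Example 11-1 data and an SSE backed out of s = 1.911 (both values approximate):

```python
# R-squared, adjusted R-squared, and standard error of estimate.
sst, sse, n, k = 656.1, 25.6, 10, 2

r2 = 1 - sse / sst
r2_adj = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))
s = (sse / (n - (k + 1))) ** 0.5
print(round(r2, 3), round(r2_adj, 3), round(s, 2))  # ~0.961, ~0.95, ~1.91
```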

Measures of Performance in Multiple Regression and the ANOVA Table

Source of    Sum of     Degrees of
Variation    Squares    Freedom                Mean Square   F Ratio
Regression   SSR        k                      MSR           F = MSR/MSE
Error        SSE        n - (k+1) = n - k - 1  MSE
Total        SST        n - 1                  MST

F = MSR/MSE = (SSR/k) / (SSE/(n - (k+1))) = (R²/k) / ((1 - R²)/(n - (k+1)))

R² = SSR/SST = 1 - SSE/SST

R̄² = 1 - (SSE/(n - (k+1))) / (SST/(n - 1)) = 1 - MSE/MST

11-5 Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1) H0: β1 = 0    H1: β1 ≠ 0
(2) H0: β2 = 0    H1: β2 ≠ 0
. . .
(k) H0: βk = 0    H1: βk ≠ 0

Test statistic for test i:

t(n - (k+1)) = (bi - 0) / s(bi)
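One way to get these individual t tests (statsmodels is my tool choice, not the book's) is sketched below on the Example 11-1 data:

```python
# Individual t tests: t = b_i / s(b_i) with n-(k+1) degrees of freedom.
import numpy as np
import statsmodels.api as sm

y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
X2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)

X = sm.add_constant(np.column_stack([X1, X2]))
fit = sm.OLS(y, X).fit()
print(fit.params)   # b0, b1, b2
print(fit.tvalues)  # t statistic for each coefficient
print(fit.pvalues)  # two-sided p-values against H0: beta_i = 0
```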

Regression Results for Individual Parameters (Interpret the Table)

Variable   Coefficient Estimate   Standard Error   t-Statistic
Constant        53.12                  5.43            9.783 *
X1               2.03                  0.22            9.227 *
X2               5.60                  1.30            4.308 *
X3              10.35                  6.88            1.504
X4               3.45                  2.70            1.259
X5              -4.25                  0.38          -11.184 *

n = 150, t0.025 = 1.96; coefficients marked * are significantly different from 0 (|t| > 1.96).

Example 11-1: Using the Template

[Figure: template regression results for Alka-Seltzer sales.]

Using the Template: Example 11-2

[Figure: template regression results for Exports to Singapore, showing the coefficients.]

Using Minitab: Example 11-2

[Figure: Minitab regression results and the regression equation for Exports to Singapore.]

11-6 Testing the Validity of the Regression Model: Residual Plots

Residuals vs. M1 (Example 11-2): the residuals appear randomly distributed, with no pattern and with equal variance as M1 increases.

Residuals vs. Price (Example 11-2): the residuals appear to increase as Price increases; the variance of the residuals is not constant.

Normal Probability Plot for the Residuals (Example 11-2): a linear trend indicates that the residuals are normally distributed.

Residual Plots from Minitab to Help Assess Assumptions: Example 11-2

[Figure: four-panel Minitab residual plots for Exports - Normal Probability Plot, Histogram of the residuals, Residuals Versus Fits, and Residuals Versus Order.]

Investigating the Validity of the Regression: Outliers and Influential Observations

[Figure, Outliers: an outlier pulls the estimated line toward itself, so the regression line with the outlier differs from the regression line without it.]

[Figure, Influential Observations: a point with a large value of xi, far from a cluster of data in which there is no relationship, determines the regression line when all data are included.]

Possible Relation in the Region between the Available Cluster of Data and the Far Point

[Figure: some of the possible data between the original cluster and the far point with a large value of xi would reveal a more appropriate curvilinear relationship, seen only when the in-between data are known.]

Outliers and Influential Observations: Example 11-2

Unusual Observations
Obs.    M1    EXPORTS     Fit    Stdev.Fit   Residual   St.Resid
  1    5.10    2.6000   2.6420     0.1288     -0.0420    -0.14 X
  2    4.90    2.6000   2.6438     0.1234     -0.0438    -0.14 X
 25    6.20    5.5000   4.5949     0.0676      0.9051     2.80 R
 26    6.30    3.7000   4.6311     0.0651     -0.9311    -2.87 R
 50    8.30    4.3000   5.1317     0.0648     -0.8317    -2.57 R
 67    8.20    5.6000   4.9474     0.0668      0.6526     2.02 R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

11-7 Using the Multiple Regression Model for Prediction

[Figure: estimated regression plane for Example 11-1, showing Sales as a function of Advertising and Promotions.]

Prediction in Multiple Regression

A (1 - α)100% prediction interval for a value of Y given values of Xi:

ŷ ± t(α/2, n-(k+1)) √(s²(ŷ) + MSE)

A (1 - α)100% prediction interval for the conditional mean of Y given values of Xi:

ŷ ± t(α/2, n-(k+1)) s[Ê(Y)]
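A sketch of both intervals with statsmodels (again my tool choice); the prediction point X1 = 10, X2 = 5 is arbitrary:

```python
# Prediction interval for an individual Y (obs_ci_* columns) and interval
# for the conditional mean of Y (mean_ci_* columns), Example 11-1 data.
import numpy as np
import statsmodels.api as sm

y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
X2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)
X  = sm.add_constant(np.column_stack([X1, X2]))

fit = sm.OLS(y, X).fit()
new = np.array([[1.0, 10.0, 5.0]])  # intercept, X1, X2
print(fit.get_prediction(new).summary_frame(alpha=0.05))
```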

11-8 Qualitative (or Categorical) Independent Variables (in Regression)

An indicator (dummy, binary) variable of qualitative level A:

Xh = 1 if level A is obtained
     0 if level A is not obtained

EXAMPLE 11-3:

MOVIE   EARN   COST   PROM   BOOK
  1      28     4.2    1.0    0
  2      35     6.0    3.0    1
  3      50     5.5    6.0    1
  4      20     3.3    1.0    0
  5      75    12.5   11.0    1
  6      60     9.6    8.0    1
  7      15     2.5    0.5    0
  8      45    10.8    5.0    0
  9      50     8.4    3.0    1
 10      34     6.6    2.0    0
 11      48    10.7    1.0    1
 12      82    11.0   15.0    1
 13      24     3.5    4.0    0
 14      50     6.9   10.0    0
 15      58     7.8    9.0    1
 16      63    10.1   10.0    0
 17      30     5.0    1.0    1
 18      37     7.5    5.0    0
 19      45     6.4    8.0    1
 20      72    10.0   12.0    1

Picturing Qualitative Variables in Regression

A regression with one quantitative variable (X1) and one qualitative variable (X2):

ŷ = b0 + b1x1 + b2x2

[Figure: two parallel lines against X1 - the line for X2 = 0 has intercept b0, the line for X2 = 1 has intercept b0 + b2.]

A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):

ŷ = b0 + b1x1 + b2x2 + b3x3

[Figure: two parallel planes over (x1, x2), shifted by b3.]

Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables

A qualitative variable with r levels or categories is represented with (r - 1) 0/1 (dummy) variables.

A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3):

ŷ = b0 + b1x1 + b2x2 + b3x3

Category     X2   X3
Adventure     0    0
Drama         0    1
Romance       1    0

[Figure: three parallel lines against X1 - intercept b0 for X2 = 0, X3 = 0; intercept b0 + b2 for X2 = 1, X3 = 0; intercept b0 + b3 for X2 = 0, X3 = 1.]

Using Qualitative Variables in Regression: Example 11-4

Salary = 8547 + 949 Education + 1258 Experience - 3256 Gender
  (SE)   (32.6)   (45.1)          (78.5)           (212.4)
  (t)    (262.2)  (21.0)          (16.0)           (-15.3)

Gender = 1 if Female, 0 if Male

On average, female salaries are $3256 below male salaries.
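The mechanics of the Gender indicator can be sketched as below; the six data rows are invented solely to show how a 0/1 column enters the model, and are not the book's data.

```python
# Regression with a 0/1 indicator: the Gender coefficient estimates the
# female-male salary difference, holding Education and Experience fixed.
import numpy as np
import statsmodels.api as sm

educ   = np.array([12, 16, 14, 18, 12, 16], dtype=float)
exper  = np.array([ 5, 10,  8,  3, 12,  7], dtype=float)
gender = np.array([ 0,  1,  0,  1,  1,  0], dtype=float)  # 1 = Female
rng = np.random.default_rng(1)
salary = 8547 + 949*educ + 1258*exper - 3256*gender + rng.normal(0, 500, 6)

X = sm.add_constant(np.column_stack([educ, exper, gender]))
print(sm.OLS(salary, X).fit().params)  # last entry roughly -3256
```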

Interactions between Quantitative and Qualitative Variables: Shifting Slopes

A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):

ŷ = b0 + b1x1 + b2x2 + b3x1x2

[Figure: the line for X2 = 0 has intercept b0 and slope b1; the line for X2 = 1 has intercept b0 + b2 and slope b1 + b3.]

11-9 Polynomial Regression

One-variable polynomial regression model:

Y = β0 + β1X + β2X² + β3X³ + . . . + βmX^m + ε

where m is the degree of the polynomial - the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.

[Figure: fitted curves of increasing order - linear ŷ = b0 + b1X, quadratic ŷ = b0 + b1X + b2X², and cubic ŷ = b0 + b1X + b2X² + b3X³.]
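Because the polynomial model is still linear in the β's, it can be fit by ordinary least squares on the powers of X; a sketch with invented cubic data:

```python
# Polynomial regression of order m via least squares on 1, x, ..., x^m.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 4, 40)
y = 1.0 + 0.5*x - 0.8*x**2 + 0.2*x**3 + rng.normal(0, 0.3, x.size)

m = 3
X = np.vander(x, m + 1, increasing=True)   # columns: 1, x, x^2, x^3
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # estimates of beta0 ... beta3
```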

Polynomial Regression: Example 11-5 Using the Template

[Figure: template output for the polynomial fit.]

Polynomial Regression: Example 11-5 Using Minitab

[Figure: Minitab output for the polynomial fit.]

Polynomial Regression: Other Variables and Cross-Product Terms

Variable   Estimate   Standard Error   T-statistic
X1           2.34          0.92           2.54
X2           3.11          1.05           2.96
X1²          4.22          1.00           4.22
X2²          3.57          2.12           1.68
X1X2         2.77          2.30           1.20

11-10 Nonlinear Models and Transformations

The multiplicative model:

Y = β0 X1^β1 X2^β2 X3^β3 ε

The logarithmic transformation:

log Y = log β0 + β1 log X1 + β2 log X2 + β3 log X3 + log ε

Transformations: Exponential Model

The exponential model:

Y = β0 e^(β1X) ε

The logarithmic transformation:

log Y = log β0 + β1X + log ε
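A sketch of both transformations in practice (data invented): take logs, run an ordinary linear fit, then back out the original parameters.

```python
# Estimating multiplicative and exponential models after a log transform.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 100)

# Multiplicative: Y = b0 * X^b1 * eps  ->  log Y = log b0 + b1 log X
y_mult = 2.0 * x**1.5 * np.exp(rng.normal(0, 0.1, x.size))
b1, log_b0 = np.polyfit(np.log(x), np.log(y_mult), 1)
print(np.exp(log_b0), b1)   # roughly 2.0 and 1.5

# Exponential: Y = b0 * e^(b1 X) * eps  ->  log Y = log b0 + b1 X
y_exp = 0.5 * np.exp(0.3 * x) * np.exp(rng.normal(0, 0.1, x.size))
b1, log_b0 = np.polyfit(x, np.log(y_exp), 1)
print(np.exp(log_b0), b1)   # roughly 0.5 and 0.3
```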

Plots of Transformed Variables

[Figure: four panels -
Simple regression of Sales on Advertising: Y = 6.59271 + 1.19176X, R-squared = 0.895.
Regression of Sales on Log(Advertising): Y = 3.66825 + 6.784X, R-squared = 0.978.
Regression of Log(Sales) on Log(Advertising): Y = 1.70082 + 0.553136X, R-squared = 0.947.
Residual plots: Sales vs. Log(Advertising).]

Variance Stabilizing Transformations

Square root transformation: Y' = √Y
Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.

Logarithmic transformation: Y' = log(Y)
Useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y.

Reciprocal transformation: Y' = 1/Y
Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.

Regression with Dependent Indicator Variables

The logistic function:

E(Y|X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

Transformation to linearize the logistic function:

p' = log[p / (1 - p)]

[Figure: the S-shaped logistic function.]
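A sketch of the logit linearization with invented sample proportions: regressing log(p/(1-p)) on X gives rough estimates of β0 and β1.

```python
# Linearizing the logistic function via the logit transform.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
p = np.array([0.12, 0.25, 0.42, 0.60, 0.74, 0.86])  # invented proportions

logit = np.log(p / (1 - p))       # p' = log[p/(1-p)]
b1, b0 = np.polyfit(x, logit, 1)  # straight-line fit of p' on x
print(b0, b1)                     # estimates of beta0 and beta1
```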

11-11 Multicollinearity

[Figure: overlap diagrams of x1 and x2 -
Orthogonal X variables provide information from independent sources: no multicollinearity.
Some degree of collinearity: problems with regression depend on the degree of collinearity.
Perfectly collinear X variables provide identical information content: no regression.
A high degree of negative collinearity also causes problems with regression.]

Effects of Multicollinearity

Variances of regression coefficients are inflated.
Magnitudes of regression coefficients may be different from what is expected.
Signs of regression coefficients may not be as expected.
Adding or removing variables produces large changes in coefficients.
Removing a data point may cause large changes in coefficient estimates or signs.
In some cases, the F ratio may be significant while the t ratios are not.

Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors

Variance Inflation Factor

The variance inflation factor associated with Xh:

VIF(Xh) = 1 / (1 - Rh²)

where Rh² is the R² value obtained for the regression of Xh on the other independent variables.

[Figure: relationship between VIF and Rh² - VIF starts near 1 and grows without bound as Rh² approaches 1.]

Variance Inflation Factor (VIF)

Observation: The VIF values for the variables Lend and Price are both greater than 5. This indicates that some degree of multicollinearity exists with respect to these two variables.
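The VIF definition translates directly into code; a sketch that regresses each X on the remaining X's and applies 1/(1 - Rh²):

```python
# VIF(X_h) = 1 / (1 - R_h^2), with R_h^2 from regressing X_h on the others.
import numpy as np

def vif(X):
    """X: n-by-p array of regressors (no intercept column)."""
    out = []
    for h in range(X.shape[1]):
        others = np.delete(X, h, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        b, *_ = np.linalg.lstsq(A, X[:, h], rcond=None)
        resid = X[:, h] - A @ b
        r2 = 1 - resid.var() / X[:, h].var()  # R_h^2 for this regression
        out.append(1.0 / (1.0 - r2))
    return out
```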

Solutions to the Multicollinearity Problem

Drop a collinear variable from the regression.
Change the sampling plan to include elements outside the multicollinearity range.
Transformations of variables.
Ridge regression.

11-12 Residual Autocorrelation and the Durbin-Watson Test

An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

Lagged Residuals:

  i     ei    ei-1   ei-2   ei-3   ei-4
  1    1.0     *      *      *      *
  2    0.0    1.0     *      *      *
  3   -1.0    0.0    1.0     *      *
  4    2.0   -1.0    0.0    1.0     *
  5    3.0    2.0   -1.0    0.0    1.0
  6   -2.0    3.0    2.0   -1.0    0.0
  7    1.0   -2.0    3.0    2.0   -1.0
  8    1.5    1.0   -2.0    3.0    2.0
  9    1.0    1.5    1.0   -2.0    3.0
 10   -2.5    1.0    1.5    1.0   -2.0

The Durbin-Watson test (first-order autocorrelation):

H0: ρ1 = 0
H1: ρ1 ≠ 0

The Durbin-Watson test statistic:

d = Σ(i=2 to n) (ei - ei-1)² / Σ(i=1 to n) ei²

Critical Points of the Durbin-Watson Statistic: α = 0.05, n = Sample Size, k = Number of Independent Variables

          k = 1        k = 2        k = 3        k = 4        k = 5
  n     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
 15    1.08  1.36   0.95  1.54   0.82  1.75   0.69  1.97   0.56  2.21
 16    1.10  1.37   0.98  1.54   0.86  1.73   0.74  1.93   0.62  2.15
 17    1.13  1.38   1.02  1.54   0.90  1.71   0.78  1.90   0.67  2.10
 18    1.16  1.39   1.05  1.53   0.93  1.69   0.82  1.87   0.71  2.06
  .     .     .      .     .      .     .      .     .      .     .
 65    1.57  1.63   1.54  1.66   1.50  1.70   1.47  1.73   1.44  1.77
 70    1.58  1.64   1.55  1.67   1.52  1.70   1.49  1.74   1.46  1.77
 75    1.60  1.65   1.57  1.68   1.54  1.71   1.51  1.74   1.49  1.77
 80    1.61  1.66   1.59  1.69   1.56  1.72   1.53  1.74   1.51  1.77
 85    1.62  1.67   1.60  1.70   1.57  1.72   1.55  1.75   1.52  1.77
 90    1.63  1.68   1.61  1.70   1.59  1.73   1.57  1.75   1.54  1.78
 95    1.64  1.69   1.62  1.71   1.60  1.73   1.58  1.75   1.56  1.78
100    1.65  1.69   1.63  1.72   1.61  1.74   1.59  1.76   1.57  1.78

Using the Durbin-Watson Statistic

Positive           Test is         No                Test is         Negative
Autocorrelation    Inconclusive    Autocorrelation   Inconclusive    Autocorrelation
                dL              dU                4-dU            4-dL

For n = 67, k = 4: dU ≈ 1.73, so 4-dU ≈ 2.27; dL ≈ 1.47, so 4-dL ≈ 2.53 < 2.58, the computed statistic.

H0 is rejected, and we conclude there is negative first-order autocorrelation.
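Computed directly, using the ten residuals tabulated earlier:

```python
# Durbin-Watson statistic d = sum_{i=2}^n (e_i - e_{i-1})^2 / sum e_i^2.
import numpy as np

e = np.array([1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5])
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d)  # compare with d_L and d_U for the appropriate n and k
```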

11-13 Partial F Tests and Variable Selection Methods

Full model:
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

Reduced model:
Y = β0 + β1X1 + β2X2 + ε

Partial F test:
H0: β3 = β4 = 0
H1: β3 and β4 not both 0

Partial F statistic:

F(r, n-(k+1)) = [(SSER - SSEF) / r] / MSEF

where SSER is the sum of squared errors of the reduced model, SSEF is the sum of squared errors of the full model, MSEF is the mean square error of the full model [MSEF = SSEF/(n-(k+1))], and r is the number of variables dropped from the full model.
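A direct sketch of the computation:

```python
# Partial F statistic for dropping r variables from the full model.
from scipy import stats

def partial_f(sse_reduced, sse_full, n, k, r):
    mse_full = sse_full / (n - (k + 1))
    f = ((sse_reduced - sse_full) / r) / mse_full
    p = stats.f.sf(f, r, n - (k + 1))  # right-tail p-value
    return f, p
```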

Variable Selection Methods Using the Template: Example 11-2

All possible regressions: run regressions with all possible combinations of independent variables and select the best model.

A p-value of 0.001 indicates that we should reject the null hypothesis H0: the slopes for Lend and Exch. are zero.

Variable Selection Methods Using Minitab: Example 11-2

[Figure: Minitab variable-selection output.]

Variable Selection Methods

Stepwise procedures:

Forward selection
  Add one variable at a time to the model, on the basis of its F statistic.

Backward elimination
  Remove one variable at a time, on the basis of its F statistic.

Stepwise regression
  Adds variables to the model and subtracts variables from the model, on the basis of the F statistic.
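As a rough sketch of how forward selection might be coded (simplified relative to real stepwise implementations; the entry threshold and helper names are my assumptions):

```python
# Forward selection: repeatedly add the candidate variable with the best
# partial F (smallest p-value), stopping when none clears the threshold.
import numpy as np
from scipy import stats

def _sse(y, cols, X):
    """SSE of the regression of y on an intercept plus X[:, cols]."""
    parts = [np.ones(len(y))] + ([X[:, cols]] if cols else [])
    A = np.column_stack(parts)
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ b
    return r @ r

def forward_select(y, X, p_enter=0.05):
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    while remaining:
        sse_red = _sse(y, chosen, X)
        # candidate whose addition gives the smallest full-model SSE
        j = min(remaining, key=lambda v: _sse(y, chosen + [v], X))
        sse_full = _sse(y, chosen + [j], X)
        df = n - (len(chosen) + 2)          # n - (k+1) for the full model
        f = (sse_red - sse_full) / (sse_full / df)
        if stats.f.sf(f, 1, df) > p_enter:  # no candidate clears p_enter
            break
        chosen.append(j)
        remaining.remove(j)
    return chosen
```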

Stepwise Regression

1. Compute the F statistic for each variable not in the model.
2. Is there at least one variable with p-value < Pin? If no, stop.
3. If yes, enter the most significant (smallest p-value) variable into the model.
4. Calculate the partial F for all variables in the model.
5. Is there a variable with p-value > Pout? If yes, remove that variable and return to step 4; if no, return to step 1.

Stepwise Regression: Using the Computer (MINITAB) Example 11-2

[Figure: Minitab stepwise regression output.]

Using the Computer: MINITAB

[Figure: Minitab output.]
