Chapter Seventeen
Correlation and Regression
Chapter Outline
1) Overview
2) Product-Moment Correlation
3) Partial Correlation
4) Nonmetric Correlation
5) Regression Analysis
6) Bivariate Regression
7) Statistics Associated with Bivariate
Regression Analysis
8) Conducting Bivariate Regression Analysis
i. Scatter Diagram
ii. Bivariate Regression Model
2007
Chapter Outline (cont.)
iii. Estimation of Parameters
iv. Standardized Regression Coefficient
v. Significance Testing
vi. Strength and Significance of Association
vii. Prediction Accuracy
viii. Assumptions
9) Multiple Regression
10) Statistics Associated with Multiple Regression
11) Conducting Multiple Regression
    i. Partial Regression Coefficients
    ii. Strength of Association
    iii. Significance Testing
    iv. Examination of Residuals
Chapter Outline
12) Stepwise Regression
13) Multicollinearity
14) Relative Importance of Predictors
15) Cross Validation
16) Regression with Dummy Variables
17) Analysis of Variance and Covariance with
Regression
18) Summary
Product Moment Correlation

From a sample of n paired observations on X and Y, the product moment correlation, r, is:

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2 \sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

Division of the numerator and denominator by (n − 1) gives:

$$r = \frac{\sum_{i=1}^{n}\frac{(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}}{\sqrt{\sum_{i=1}^{n}\frac{(X_i-\bar{X})^2}{n-1}\sum_{i=1}^{n}\frac{(Y_i-\bar{Y})^2}{n-1}}} = \frac{COV_{xy}}{S_x S_y}$$
Explaining Attitude Toward the City of Residence

Respondent data (n = 12):

Duration of Residence: 10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2
Attitude Toward the City: 6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2
Importance Attached to Weather: (values not recoverable in this copy)
$$\bar{X} = (10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2)/12 = 9.333$$

$$\bar{Y} = (6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2)/12 = 6.583$$

$$\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}) = 179.6668$$

$$\sum_{i=1}^{n}(X_i-\bar{X})^2 = 304.6668$$

$$\sum_{i=1}^{n}(Y_i-\bar{Y})^2 = 120.9168$$

Thus,

$$r = \frac{179.6668}{\sqrt{(304.6668)(120.9168)}} = 0.9361$$
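The hand calculation above can be sketched in a few lines of plain Python, using the duration (X) and attitude (Y) values from the running example:

```python
# Product moment correlation for the duration-of-residence (X) and
# attitude-toward-the-city (Y) data from the running example.

def pearson_r(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ssx = sum((xi - mx) ** 2 for xi in x)
    ssy = sum((yi - my) ** 2 for yi in y)
    return cov / (ssx * ssy) ** 0.5

X = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]  # duration of residence
Y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]     # attitude toward the city

r = pearson_r(X, Y)  # matches the hand calculation, r = 0.9361
```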
The squared correlation, r², measures the proportion of the total variation in one variable accounted for by the other:

$$r^2 = \frac{\text{Total variation} - \text{Error variation}}{\text{Total variation}} = \frac{SS_y - SS_{error}}{SS_y}$$
The statistical significance of the relationship between two variables measured by using r can be tested. The population correlation coefficient is denoted by ρ, the Greek letter rho. The hypotheses are:

H0: ρ = 0
H1: ρ ≠ 0

The test statistic is:

$$t = r\left[\frac{n-2}{1-r^2}\right]^{1/2}$$

which has a t distribution with n − 2 degrees of freedom. For the illustrative data, t = 0.9361[(12 − 2)/(1 − 0.8762)]^{1/2} = 8.414, which is significant at conventional levels.
Partial Correlation

A partial correlation coefficient measures the association between two variables after controlling for, or adjusting for, the effects of one or more additional variables:

$$r_{xy.z} = \frac{r_{xy} - (r_{xz})(r_{yz})}{\sqrt{1-r_{xz}^2}\sqrt{1-r_{yz}^2}}$$
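The partial correlation formula above is a direct computation from three pairwise correlations. A minimal sketch, using made-up input correlations rather than the chapter's data:

```python
# Partial correlation of x and y, controlling for z, from the three
# pairwise correlations (illustrative values only).

def partial_corr(rxy, rxz, ryz):
    return (rxy - rxz * ryz) / (((1 - rxz**2) ** 0.5) * ((1 - ryz**2) ** 0.5))

# Example: rxy = rxz = ryz = 0.50 gives (0.50 - 0.25) / 0.75 = 0.3333
r_partial = partial_corr(0.50, 0.50, 0.50)
```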
Part Correlation

The part correlation coefficient, r_y(x.z), removes the linear effect of z from x but not from y:

$$r_{y(x.z)} = \frac{r_{xy} - r_{yz}r_{xz}}{\sqrt{1-r_{xz}^2}}$$
Nonmetric Correlation

If the nonmetric variables are ordinal and numeric, Spearman's rho, ρs, and Kendall's tau, τ, are two measures of nonmetric correlation which can be used to examine the correlation between them.
Both these measures use rankings rather than the absolute values of the variables, and the basic concepts underlying them are quite similar. Both vary from −1.0 to +1.0 (see Chapter 15).
In the absence of ties, Spearman's ρs yields a closer approximation to the Pearson product moment correlation coefficient than Kendall's τ.
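Since Spearman's ρs is simply the Pearson correlation applied to ranks, it can be sketched without any library support. The data below are made up for illustration; the ranking shortcut assumes no ties, as the slide's comparison with Kendall's τ does:

```python
# Spearman's rho computed as the Pearson correlation of the ranks
# (valid only when there are no tied values).

def ranks(values):
    # Rank positions 1..n in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    m = (n + 1) / 2                     # mean rank is the same for both
    num = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    den = sum((a - m) ** 2 for a in rx)  # equals the y rank sum of squares
    return num / den

# A perfectly monotone (increasing) relationship gives rho = 1.0
rho = spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])
```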
Regression Analysis

Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:
- Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists.
- Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.
- Determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables.
- Predict the values of the dependent variable.
- Control for other independent variables when evaluating the contributions of a specific variable or set of variables.

Regression analysis is concerned with the nature and degree of association between variables and does not imply or assume any causality.
Statistics Associated with Bivariate Regression Analysis

Regression coefficient. The estimated parameter b is usually referred to as the nonstandardized regression coefficient.

Standardized regression coefficient. Also termed the beta coefficient or beta weight, this is the slope obtained by the regression of Y on X when the data are standardized.
Sum of squared errors. The least squares procedure estimates the regression line that minimizes the sum of squared errors, $\sum_j e_j^2$.
Bivariate Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + e_i$$

where β0 is the intercept, β1 is the slope, and e_i is the error term for the i-th observation.
[Fig. 17.3: Plot of Attitude (Y axis: 3, 6, 9) against Duration of Residence (X axis: 2.25, 4.5, 6.75)]

[Fig. 17.4: Four candidate straight lines (Lines 1 through 4) drawn through the same scatter]
Bivariate Regression

[Fig. 17.5: Fitted line $\hat{Y} = \beta_0 + \beta_1 X$ with observed values $Y_j$, predicted values $\hat{Y}_j$, and residuals $e_j$ shown at $X_1$ through $X_5$]
Estimation of Parameters

$$\hat{Y}_i = a + bX_i$$

where $\hat{Y}_i$ is the estimated or predicted value of $Y_i$, and a and b are estimators of β0 and β1, respectively. The slope may be computed as:

$$b = \frac{COV_{xy}}{S_x^2} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \frac{\sum_{i=1}^{n}X_iY_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n}X_i^2 - n\bar{X}^2}$$
$$\sum_{i=1}^{12}X_iY_i = (10)(6) + (12)(9) + (12)(8) + (4)(3) + (12)(10) + (6)(4) + (8)(5) + (2)(2) + (18)(11) + (9)(9) + (17)(10) + (2)(2) = 917$$

$$\sum_{i=1}^{12}X_i^2 = 10^2 + 12^2 + 12^2 + 4^2 + 12^2 + 6^2 + 8^2 + 2^2 + 18^2 + 9^2 + 17^2 + 2^2 = 1350$$
It may be recalled that $\bar{X}$ = 9.333 and $\bar{Y}$ = 6.583. Given n = 12, b can be calculated as:

$$b = \frac{917 - (12)(9.333)(6.583)}{1350 - (12)(9.333)^2} = 0.5897$$

$$a = \bar{Y} - b\bar{X} = 6.583 - (0.5897)(9.333) = 1.0793$$
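The least-squares estimates above follow directly from the deviation sums, and can be reproduced with the chapter's data:

```python
# Least-squares estimates a and b for the bivariate model,
# using the duration (X) and attitude (Y) data from the example.

X = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]  # duration of residence
Y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]     # attitude toward the city

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
b = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
a = my - b * mx
# b = 0.5897 and a = 1.0793, as in the hand calculation
```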
Significance Testing

H0: β1 = 0
H1: β1 ≠ 0

A t statistic with n − 2 degrees of freedom can be used, where

$$t = \frac{b}{SE_b}$$

and $SE_b$ denotes the standard error of the regression coefficient b.
Strength and Significance of Association

$$SS_{reg} = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2$$

$$SS_{res} = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$$
17-35
al n
t
To tio
ria y
a
V SS
X1
2007
X2
X3
X4
X5
17-36
The strength of association may then be calculated as follows:

$$r^2 = \frac{SS_{reg}}{SS_y} = \frac{SS_y - SS_{res}}{SS_y}$$

To illustrate the calculations of r², let us consider again the regression of attitude toward the city on duration of residence. It may be recalled from earlier calculations of the simple correlation coefficient that:

$$SS_y = \sum_{i=1}^{n}(Y_i-\bar{Y})^2 = 120.9168$$
The predicted values are obtained from the estimated equation $\hat{Y} = 1.0793 + 0.5897X$. Then:

$$SS_{reg} = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2$$
= (6.9763 − 6.5833)² + (8.1557 − 6.5833)² + (8.1557 − 6.5833)² + (3.4381 − 6.5833)² + (8.1557 − 6.5833)² + (4.6175 − 6.5833)² + (5.7969 − 6.5833)² + (2.2587 − 6.5833)² + (11.6939 − 6.5833)² + (6.3866 − 6.5833)² + (11.1042 − 6.5833)² + (2.2587 − 6.5833)²
= 0.1544 + 2.4724 + 2.4724 + 9.8922 + 2.4724 + 3.8643 + 0.6184 + 18.7021 + 26.1182 + 0.0387 + 20.4385 + 18.7021
= 105.9524
$$SS_{res} = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$$
= (6 − 6.9763)² + (9 − 8.1557)² + (8 − 8.1557)² + (3 − 3.4381)² + (10 − 8.1557)² + (4 − 4.6175)² + (5 − 5.7969)² + (2 − 2.2587)² + (11 − 11.6939)² + (9 − 6.3866)² + (10 − 11.1042)² + (2 − 2.2587)²
= 14.9644

It can be seen that $SS_y = SS_{reg} + SS_{res}$. Furthermore,

$$r^2 = SS_{reg}/SS_y = 105.9524/120.9168 = 0.8762$$
The same hypothesis may be examined with an F statistic: testing H0: β1 = 0 is equivalent to testing H0: ρ = 0, and

$$F = \frac{SS_{reg}}{SS_{res}/(n-2)}$$

which has an F distribution with 1 and n − 2 degrees of freedom.
For the illustrative data:

r² = 105.9522/(105.9522 + 14.9644) = 0.8762

F = 105.9522/(14.9644/10) = 70.8027
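Both summary statistics follow mechanically from the two sums of squares reported above:

```python
# r-squared and the overall F statistic from the reported sums of squares
# (bivariate case: 1 and n - 2 degrees of freedom).

ss_reg = 105.9522
ss_res = 14.9644
n = 12

r2 = ss_reg / (ss_reg + ss_res)      # ~0.8762
F = ss_reg / (ss_res / (n - 2))      # ~70.80
```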
Bivariate Regression
Table 17.2

Multiple R       0.93608
R²               0.87624
Adjusted R²      0.86387
Standard Error   1.22329

ANALYSIS OF VARIANCE
             df   Sum of Squares   Mean Square
Regression    1        105.95222     105.95222
Residual     10         14.96444       1.49644

F = 70.80266    Significance of F = 0.0000

Variable     b        SE_b      Beta      T      Significance of T
Duration   0.58972   0.07008   0.93608   8.414   0.0000
Prediction Accuracy

The standard error of estimate, SEE, is:

$$SEE = \sqrt{\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{n-2}} = \sqrt{\frac{SS_{res}}{n-2}}$$

or, more generally, with k independent variables,

$$SEE = \sqrt{\frac{SS_{res}}{n-k-1}}$$

For the data given in Table 17.2, the SEE is estimated as follows:

$$SEE = \sqrt{14.9644/(12-2)} = 1.22329$$
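The SEE calculation reduces to a single square root once $SS_{res}$ is known:

```python
# Standard error of estimate for the bivariate fit, from the reported SS_res.

ss_res = 14.9644
n, k = 12, 1                          # 12 observations, 1 predictor
see = (ss_res / (n - k - 1)) ** 0.5   # matches Table 17.2, ~1.22329
```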
Assumptions

- The error term is normally distributed.
- The mean of the error term is 0.
- The variance of the error term is constant (homoscedasticity).
- The error terms are uncorrelated across observations.
Multiple Regression

The general form of the multiple regression model is as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_k X_k + e$$

which is estimated by the following equation:

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \ldots + b_k X_k$$
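The estimated equation above can be sketched with numpy's least-squares solver. The predictor values and coefficients below are made up for illustration (the chapter's importance-attached-to-weather values are not reproduced here); because Y is constructed without noise, OLS recovers the coefficients exactly:

```python
import numpy as np

# Hypothetical predictors; Y is built from known coefficients so the
# recovered estimates can be checked against them.
X1 = np.array([1.0, 2, 3, 4, 5, 6])
X2 = np.array([2.0, 1, 4, 3, 6, 5])
Y = 1.0 + 0.5 * X1 + 0.25 * X2

A = np.column_stack([np.ones_like(X1), X1, X2])   # design matrix [1, X1, X2]
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b1, b2 = coef                                  # intercept and partial slopes
```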
Multiple Regression
Table 17.3

Multiple R       0.97210
R²               0.94498
Adjusted R²      0.93276
Standard Error   0.85974

ANALYSIS OF VARIANCE
             df   Sum of Squares   Mean Square
Regression    2        114.26425      57.13213
Residual      9          6.65241       0.73916

F = 77.29364    Significance of F = 0.0000

Variable       b        SE_b      Beta      T      Significance of T
IMPORTANCE   0.28865   0.08608   0.31382   3.353   0.0085
DURATION     0.48108   0.05895   0.76363   8.160
Strength of Association

$$SS_y = \sum_{i=1}^{n}(Y_i-\bar{Y})^2$$

$$SS_{reg} = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2$$

$$SS_{res} = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$$
The strength of association is measured by the coefficient of multiple determination:

$$R^2 = \frac{SS_{reg}}{SS_y}$$
Significance Testing

Testing the null hypothesis that the population R² is zero is equivalent to testing:

H0: β1 = β2 = β3 = . . . = βk = 0

The overall test can be conducted by using an F statistic:

$$F = \frac{SS_{reg}/k}{SS_{res}/(n-k-1)} = \frac{R^2/k}{(1-R^2)/(n-k-1)}$$

which has an F distribution with k and n − k − 1 degrees of freedom.
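Plugging the Table 17.3 values into the R² form of the statistic reproduces the reported F up to rounding of R²:

```python
# Overall F test computed from R-squared, using the values in Table 17.3.

r2, k, n = 0.94498, 2, 12
F = (r2 / k) / ((1 - r2) / (n - k - 1))   # ~77.29, matching the reported F
```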
Testing for the significance of the individual βi's can be done in a manner similar to that in the bivariate case by using t tests. The significance of the partial coefficient for importance attached to weather may be tested by the following equation:

$$t = \frac{b}{SE_b} = \frac{0.28865}{0.08608} = 3.353$$

which has a t distribution with n − k − 1 degrees of freedom.
Examination of Residuals

[Residual plot: residuals against predicted Y values]

[Residual plot: residuals against time, or the sequence of observations]

[Residual plot: residuals against predicted Y values]
Stepwise Regression

The purpose of stepwise regression is to select, from a large number of predictor variables, a small subset of variables that account for most of the variation in the dependent variable.
Multicollinearity

Multicollinearity arises when intercorrelations among the predictors are very high. Multicollinearity can result in several problems, including:
- The partial regression coefficients may not be estimated precisely. The standard errors are likely to be high.
- The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
- It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
- Predictor variables may be incorrectly included or removed in stepwise regression.
Relative Importance of Predictors

Unfortunately, because the predictors are correlated, there is no unambiguous measure of the relative importance of the predictors in regression analysis. However, several approaches are commonly used to assess the relative importance of predictor variables.
Cross-Validation

The available data are split into two parts, the estimation sample and the validation sample. The estimation sample generally contains 50-90% of the total sample.
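The split-and-refit procedure can be sketched with the chapter's bivariate data; the sample is far too small for real cross-validation, so this only illustrates the mechanics (a 75% estimation share is an arbitrary choice within the 50-90% range mentioned above):

```python
# Estimation/validation split: fit on the estimation sample, then predict
# the hold-out observations with the estimation-sample equation.

X = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
Y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]

split = int(0.75 * len(X))            # 75% estimation, 25% validation
Xe, Ye = X[:split], Y[:split]
Xv, Yv = X[split:], Y[split:]

n = len(Xe)
mx, my = sum(Xe) / n, sum(Ye) / n
b = sum((x - mx) * (y - my) for x, y in zip(Xe, Ye)) / sum((x - mx) ** 2 for x in Xe)
a = my - b * mx

pred = [a + b * x for x in Xv]        # predictions for the validation sample
```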
Regression with Dummy Variables

A categorical predictor with four categories (product usage) can be coded with three dummy variables:

Product Usage Category   D1   D2   D3
Nonusers                  1    0    0
Light users               0    1    0
Medium users              0    0    1
Heavy users               0    0    0

The model is then estimated as:

$$\hat{Y}_i = a + b_1 D_1 + b_2 D_2 + b_3 D_3$$
In regression with dummy variables, the predicted $\hat{Y}$ for each category is the mean of Y for that category:

Product Usage Category   Mean Value   Predicted Value
Nonusers                 a + b1       a + b1
Light users              a + b2       a + b2
Medium users             a + b3       a + b3
Heavy users              a            a
Analysis of Variance and Covariance with Regression

Regression with dummy variables is equivalent to one-way ANOVA:

$$SS_{res} = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 = SS_{within} = SS_{error}$$

$$SS_{reg} = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 = SS_{between} = SS_x$$

Overall F test = one-way ANOVA F test
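The equivalence can be demonstrated numerically: fitting three dummies (heavy users as the omitted base category) yields an intercept equal to the base-category mean and coefficients equal to the other category means minus it, exactly as in the table above. The usage data here are made up:

```python
import numpy as np

# Hypothetical Y values per product-usage category.
groups = {"nonusers": [2, 3, 4], "light": [5, 5, 6],
          "medium": [7, 8, 9], "heavy": [10, 11, 12]}

names = ["nonusers", "light", "medium"]     # heavy users = base category
Y, rows = [], []
for g, ys in groups.items():
    for y in ys:
        Y.append(float(y))
        # Row: intercept plus one dummy per non-base category.
        rows.append([1.0] + [1.0 if g == nm else 0.0 for nm in names])

coef, *_ = np.linalg.lstsq(np.array(rows), np.array(Y), rcond=None)
a, b1, b2, b3 = coef
# a equals the heavy-user mean; a + b1 the nonuser mean; and so on.
```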
SPSS Windows

The CORRELATE program computes Pearson product moment correlations and partial correlations with significance levels. Univariate statistics, covariance, and cross-product deviations may also be requested. Significance levels are included in the output. To select these procedures using SPSS for Windows, click:

Analyze>Correlate>Bivariate
Analyze>Correlate>Partial

Scatterplots can be obtained by clicking:

Graphs>Scatter>Simple>Define

REGRESSION calculates bivariate and multiple regression equations, associated statistics, and plots. It allows for an easy examination of residuals. This procedure can be run by clicking:

Analyze>Regression>Linear