This procedure performs multiple linear regression with five methods for entry and removal of variables. It also provides extensive analysis of residuals and influential cases. Caseweights (CASEWEIGHT) and regression weights (REGWGT) can be specified in the model fitting.
Notation

The following notation is used throughout this chapter unless otherwise stated:

$y_i$: Dependent variable for case $i$

$c_i$: Caseweight for case $i$; $c_i = 1$ if CASEWEIGHT is not specified

$g_i$: Regression weight for case $i$; $g_i = 1$ if REGWGT is not specified

$w_i$: Product of caseweight and regression weight: $w_i = c_i g_i$

$l$: Number of distinct cases

$W$: Sum of weights: $W = \sum_{i=1}^{l} w_i$

$C$: Sum of caseweights: $C = \sum_{i=1}^{l} c_i$

$x_{ki}$: Value of the $k$th independent variable for case $i$

$\bar X_k$: Sample mean for the $k$th independent variable: $\bar X_k = \sum_{i=1}^{l} w_i x_{ki} / W$

$\bar Y$: Sample mean for the dependent variable: $\bar Y = \sum_{i=1}^{l} w_i y_i / W$

$h_i$: Leverage for case $i$

$\tilde h_i$: $\tilde h_i = \dfrac{g_i}{W} + h_i$

$S_{kj}$: Sample covariance for $X_k$ and $X_j$

$S_{yy}$: Sample variance for $Y$

$S_{ky}$: Sample covariance for $X_k$ and $Y$

$p$: Number of independent variables in the equation

$p^*$: Number of coefficients in the model; $p^* = p$ if the intercept is not included, otherwise $p^* = p + 1$
Descriptive Statistics

The correlation matrix is

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1p} & r_{1y} \\ r_{21} & r_{22} & \cdots & r_{2p} & r_{2y} \\ \vdots & \vdots & & \vdots & \vdots \\ r_{y1} & r_{y2} & \cdots & r_{yp} & r_{yy} \end{pmatrix}$$

where

$$r_{kj} = \frac{S_{kj}}{\sqrt{S_{kk}\, S_{jj}}}$$

and

$$r_{yk} = r_{ky} = \frac{S_{ky}}{\sqrt{S_{kk}\, S_{yy}}}$$
The sample mean $\bar X_i$ and covariance $S_{ij}$ are computed by a provisional means algorithm. Define

$$W_k = \sum_{i=1}^{k} w_i$$
then

$$\bar X_i^{(k)} = \bar X_i^{(k-1)} + \frac{w_k}{W_k}\left(x_{ik} - \bar X_i^{(k-1)}\right)$$

and, if the intercept is included,

$$C_{ij}^{(k)} = C_{ij}^{(k-1)} + \left(x_{ik} - \bar X_i^{(k-1)}\right)\left(x_{jk} - \bar X_j^{(k-1)}\right)\left(w_k - \frac{w_k^2}{W_k}\right)$$

Otherwise,

$$C_{ij}^{(k)} = C_{ij}^{(k-1)} + w_k\, x_{ik}\, x_{jk}$$

where

$$\bar X_i^{(1)} = x_{i1}$$

and

$$C_{ij}^{(1)} = 0$$

The sample covariance $S_{ij}$ is computed as the final $C_{ij}$ divided by $C - 1$.
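The updating formulas above translate directly into a one-pass routine. Below is a minimal NumPy sketch of the provisional means algorithm for the intercept-included case; the function name and array layout are illustrative, not part of the procedure. With unit weights, the final $C_{ij} / (C - 1)$ reduces to the ordinary sample covariance, which the closing assertion checks.

```python
import numpy as np

def provisional_means(X, w):
    """One-pass (provisional means) computation of weighted means and the
    corrected cross-product matrix C_ij, per the updating formulas above."""
    n, m = X.shape
    mean = X[0].copy()                        # Xbar^(1) = x_1
    Cij = np.zeros((m, m))                    # C^(1) = 0
    Wk = w[0]                                 # running sum of weights
    for k in range(1, n):
        Wk += w[k]
        d = X[k] - mean                       # x_k - Xbar^(k-1)
        factor = w[k] - w[k] ** 2 / Wk        # w_k - w_k^2 / W_k
        Cij += factor * np.outer(d, d)        # update before the mean moves
        mean += d * w[k] / Wk                 # provisional mean update
    return mean, Cij

# With unit weights, C_ij / (C - 1) is the usual sample covariance.
X = np.array([[1.0, 2.0], [2.0, 3.5], [4.0, 1.0], [3.0, 2.5]])
w = np.ones(4)
mean, Cij = provisional_means(X, w)
S = Cij / (w.sum() - 1)
assert np.allclose(S, np.cov(X, rowvar=False))
```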
Sweep Operations

For a regression model of the form

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + e_i$$

sweep operations are used to compute the least squares estimates $b$ of $\beta$ and the associated regression statistics (Dempster, 1969). The sweeping starts with the correlation matrix $R$.
Let $\tilde R$ be the new matrix produced by sweeping on the $k$th row and column of $R$. The elements of $\tilde R$ are

$$\tilde r_{kk} = \frac{1}{r_{kk}}$$

$$\tilde r_{ik} = \frac{r_{ik}}{r_{kk}}, \quad i \neq k$$

$$\tilde r_{kj} = -\frac{r_{kj}}{r_{kk}}, \quad j \neq k$$

and

$$\tilde r_{ij} = r_{ij} - \frac{r_{ik}\, r_{kj}}{r_{kk}}, \quad i \neq k,\; j \neq k$$
If the matrix $R$ is partitioned as

$$R = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix}$$

where $R_{11}$ contains independent variables in the equation at the current step, the result of sweeping on the rows and columns of $R_{11}$ is

$$\tilde R = \begin{pmatrix} R_{11}^{-1} & -R_{11}^{-1} R_{12} \\ R_{21} R_{11}^{-1} & R_{22} - R_{21} R_{11}^{-1} R_{12} \end{pmatrix}$$
The lower right submatrix $R_{22} - R_{21} R_{11}^{-1} R_{12}$ can be used to obtain the partial correlations for the variables not in the equation, controlling for the variables already in the equation. Note that this routine is its own inverse; that is, exactly the same operations are performed to remove a variable as to enter a variable.
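To make the element formulas concrete, here is a small NumPy sketch of one sweep step. The placement of the minus sign on the swept row is an assumption on our part (the signs did not survive extraction); it is chosen so that the routine is its own inverse and reproduces the partitioned result above, which the assertions check.

```python
import numpy as np

def sweep(R, k):
    """Sweep R on row/column k per the element formulas above."""
    piv = R[k, k]
    out = R - np.outer(R[:, k], R[k, :]) / piv   # r_ij - r_ik r_kj / r_kk
    out[:, k] = R[:, k] / piv                    # column k:   r_ik / r_kk
    out[k, :] = -R[k, :] / piv                   # row k:    - r_kj / r_kk
    out[k, k] = 1.0 / piv                        # pivot:      1 / r_kk
    return out

R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
assert np.allclose(sweep(sweep(R, 0), 0), R)             # self-inverse
S = sweep(sweep(R, 0), 1)                                # sweep the R11 block
assert np.allclose(S[:2, :2], np.linalg.inv(R[:2, :2]))  # R11 inverse appears
```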
Let $r_{ij}$ be the element in the current swept matrix associated with $X_i$ and $X_j$. A variable $X_k$ is eligible for entry only if it is an independent variable not currently in the model with $r_{kk} \geq t$ (tolerance, with a default value of 0.0001) and, for each variable $X_j$ currently in the model,

$$\left( r_{jj} - \frac{r_{jk}\, r_{kj}}{r_{kk}} \right) t \leq 1$$
The above condition is imposed so that entry of the variable does not reduce the
tolerance of variables already in the model to unacceptable levels.
The F-to-enter value for $X_k$ is computed as

$$F\text{-to-enter}_k = \frac{\left(C - p^* - 1\right) V_k}{r_{yy} - V_k}$$

with 1 and $C - p^* - 1$ degrees of freedom, where

$$V_k = \frac{r_{yk}\, r_{ky}}{r_{kk}}$$

The F-to-remove value for $X_k$ is computed as

$$F\text{-to-remove}_k = \frac{-\left(C - p^*\right) V_k}{r_{yy}}$$

with 1 and $C - p^*$ degrees of freedom; note that $V_k$ is negative for variables already in the equation.
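Under the same swept-matrix conventions, both F statistics are direct functions of matrix elements. A sketch follows; the argument names are illustrative, with `y` indexing the dependent variable's row and column of a NumPy array and `p_star` the current number of coefficients.

```python
def V(swept, k, y):
    """V_k = r_yk r_ky / r_kk from the current swept matrix; negative for
    variables already in the equation under the sign convention above."""
    return swept[y, k] * swept[k, y] / swept[k, k]

def f_to_enter(swept, k, y, C, p_star):
    """F-to-enter for candidate X_k, with 1 and C - p* - 1 df."""
    vk = V(swept, k, y)
    return (C - p_star - 1) * vk / (swept[y, y] - vk)

def f_to_remove(swept, k, y, C, p_star):
    """F-to-remove for entered X_k, with 1 and C - p* df."""
    return -(C - p_star) * V(swept, k, y) / swept[y, y]
```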
Stepwise
If there are independent variables currently entered in the model, choose $X_k$ such that $F\text{-to-remove}_k$ is minimum. $X_k$ is removed if $F\text{-to-remove}_k < F_{out}$ (default = 2.71) or, if probability criteria are used, $P\left(F\text{-to-remove}_k\right) > P_{out}$ (default = 0.1). If the inequality does not hold, no variable is removed from the model.

If there are no independent variables currently entered in the model or if no entered variable is to be removed, choose $X_k$ such that $F\text{-to-enter}_k$ is maximum. $X_k$ is entered if $F\text{-to-enter}_k > F_{in}$ (default = 3.84) or, if probability criteria are used, $P\left(F\text{-to-enter}_k\right) < P_{in}$ (default = 0.05). If the inequality does not hold, no variable is entered.
At each step, all eligible variables are considered for removal and entry.
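Putting the pieces together, the sketch below is one plausible reading of the stepwise loop, using the default F thresholds quoted above; it omits the tolerance checks and the probability (PIN/POUT) criteria, and it illustrates the selection logic rather than the procedure's actual code path.

```python
import numpy as np

FIN, FOUT = 3.84, 2.71        # default F-to-enter / F-to-remove thresholds

def sweep(R, k):
    piv = R[k, k]
    out = R - np.outer(R[:, k], R[k, :]) / piv
    out[:, k] = R[:, k] / piv
    out[k, :] = -R[k, :] / piv
    out[k, k] = 1.0 / piv
    return out

def stepwise(R, y, C, max_steps=20):
    """Stepwise selection on a correlation matrix R whose row/column y
    belongs to the dependent variable; tolerance checks omitted."""
    p_all = R.shape[0] - 1
    in_model, swept = set(), R.copy()
    for _ in range(max_steps):
        p_star = len(in_model) + 1            # assumes an intercept
        V = {k: swept[y, k] * swept[k, y] / swept[k, k]
             for k in range(p_all)}
        if in_model:                          # removal phase
            f_rm = {k: -(C - p_star) * V[k] / swept[y, y] for k in in_model}
            k = min(f_rm, key=f_rm.get)
            if f_rm[k] < FOUT:
                swept = sweep(swept, k)       # sweeping is its own inverse
                in_model.discard(k)
                continue
        candidates = [k for k in range(p_all) if k not in in_model]
        if not candidates:
            break
        f_en = {k: (C - p_star - 1) * V[k] / (swept[y, y] - V[k])
                for k in candidates}          # entry phase
        k = max(f_en, key=f_en.get)
        if f_en[k] <= FIN:
            break
        swept = sweep(swept, k)
        in_model.add(k)
    return in_model, swept
```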
Forward
This procedure is the entry phase of the stepwise procedure.
Backward
This procedure is the removal phase of the stepwise procedure and can be used only
after at least one independent variable has been entered in the model.
Statistics
Summary
For the summary statistics, assume $p$ independent variables are currently entered in the equation, of which a block of $q$ variables has been entered or removed in the current step.
Multiple R
$$R = \sqrt{1 - r_{yy}}$$
R Square
$$R^2 = 1 - r_{yy}$$
Adjusted R Square
$$R^2_{adj} = R^2 - \frac{\left(1 - R^2\right) p^*}{C - p^*}$$
F Change

$$F_{ch} = \begin{cases} \dfrac{R^2_{ch}\left(C - p^*\right)}{q\left(1 - R^2_{current}\right)} & \text{for the addition of } q \text{ independent variables} \\[2ex] \dfrac{R^2_{ch}\left(C - p^* - q\right)}{q\left(1 - R^2_{previous}\right)} & \text{for the removal of } q \text{ independent variables} \end{cases}$$

where $R^2_{ch}$ is the change in $R^2$. The degrees of freedom for the addition are $q$ and $C - p^*$, while the degrees of freedom for the removal are $q$ and $C - p^* - q$.
Residual Sum of Squares
$$SS_e = r_{yy}\left(C - 1\right) S_{yy}$$

with degrees of freedom $C - p^*$.
Sum of Squares Due to Regression
$$SS_R = R^2\left(C - 1\right) S_{yy}$$

with degrees of freedom $p$.
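These summary quantities are all simple functions of $r_{yy}$ from the swept matrix and of the sample variance $S_{yy}$. A compact sketch under the definitions above (names are illustrative):

```python
import math

def summary_statistics(r_yy, S_yy, C, p, p_star):
    """Multiple R, R^2, adjusted R^2, and the two sums of squares,
    per the formulas above."""
    R2 = 1.0 - r_yy
    return {
        "R": math.sqrt(R2),
        "R2": R2,
        "R2_adj": R2 - (1.0 - R2) * p_star / (C - p_star),
        "SSe": r_yy * (C - 1.0) * S_yy,   # residual SS, df = C - p*
        "SSR": R2 * (C - 1.0) * S_yy,     # regression SS, df = p
    }
```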
ANOVA Table

Analysis of Variance:

Source        df           Sum of Squares    Mean Square
Regression    $p$          $SS_R$            $SS_R / p$
Residual      $C - p^*$    $SS_e$            $SS_e / \left(C - p^*\right)$
The estimated variances, covariances, and correlations of the coefficient estimates are computed from the swept matrix as

$$\widehat{\mathrm{var}}\left(b_k\right) = \frac{r_{kk}\, r_{yy}\, S_{yy}}{S_{kk}\left(C - p^*\right)}$$

$$\widehat{\mathrm{cov}}\left(b_k, b_j\right) = \frac{r_{kj}\, r_{yy}\, S_{yy}}{\sqrt{S_{kk}\, S_{jj}}\left(C - p^*\right)}$$

$$\widehat{\mathrm{cor}}\left(b_k, b_j\right) = \frac{r_{kj}}{\sqrt{r_{kk}\, r_{jj}}}$$
Selection Criteria
Akaike Information Criterion (AIC)
$$AIC = C \ln\left(\frac{SS_e}{C}\right) + 2 p^*$$
Amemiya's Prediction Criterion (PC)

$$PC = \frac{\left(1 - R^2\right)\left(C + p^*\right)}{C - p^*}$$
Mallows Cp (CP)
$$C_p = \frac{SS_e}{\hat\sigma^2} + 2 p^* - C$$

where $\hat\sigma^2$ is the mean square error from fitting the model that includes all the variables in the variable list.
Schwarz Bayesian Criterion (SBC)
$$SBC = C \ln\left(\frac{SS_e}{C}\right) + p^* \ln\left(C\right)$$
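All four criteria share inputs already defined above. A minimal sketch; `sigma2_full`, the full-model mean square error needed by Mallows' $C_p$, is taken as an input rather than recomputed:

```python
import math

def selection_criteria(SSe, R2, C, p_star, sigma2_full):
    """AIC, Amemiya's PC, Mallows' Cp, and SBC per the formulas above."""
    return {
        "AIC": C * math.log(SSe / C) + 2.0 * p_star,
        "PC": (1.0 - R2) * (C + p_star) / (C - p_star),
        "CP": SSe / sigma2_full + 2.0 * p_star - C,
        "SBC": C * math.log(SSe / C) + p_star * math.log(C),
    }
```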
Collinearity
Variance Inflation Factors
$$VIF_i = \frac{1}{r_{ii}}$$
Tolerance
$$Tolerance_i = r_{ii}$$
Eigenvalues

The eigenvalues of the scaled and uncentered cross-product matrix for the independent variables in the equation are computed by the QL method (Wilkinson and Reinsch, 1971).
Condition Indices
$$\eta_k = \sqrt{\frac{\lambda_{\max}}{\lambda_k}}$$

where $\lambda_k$ is the $k$th eigenvalue.
Variance-Decomposition Proportions
Let

$$\mathbf{v}_i = \left(v_{i1}, \ldots, v_{ip}\right)$$

be the eigenvector associated with eigenvalue $\lambda_i$. Also, let

$$\Phi_{ij} = \frac{v_{ij}^2}{\lambda_i} \quad \text{and} \quad \Phi_j = \sum_{i=1}^{p} \Phi_{ij}$$

The variance-decomposition proportion for the $j$th regression coefficient associated with the $i$th component is

$$\pi_{ij} = \frac{\Phi_{ij}}{\Phi_j}$$
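The collinearity diagnostics can be assembled from any symmetric eigensolver; the sketch below uses NumPy's LAPACK-backed routine in place of the QL method cited above, which is an implementation detail only. Note the orientation: rows of `pi` index variables and columns index components, the transpose of the $\pi_{ij}$ layout above.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Condition indices and variance-decomposition proportions from the
    scaled, uncentered cross-product matrix of the columns of X."""
    Z = X / np.linalg.norm(X, axis=0)           # unit-length columns
    lam, V = np.linalg.eigh(Z.T @ Z)            # eigenvalues, eigenvectors
    eta = np.sqrt(lam.max() / lam)              # condition indices
    phi = V**2 / lam                            # phi[j, k] = v_jk^2 / lambda_k
    pi = phi / phi.sum(axis=1, keepdims=True)   # each variable's row sums to 1
    return eta, pi
```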
Statistics for Variables in the Equation

The regression coefficient $b_k$ is computed as

$$b_k = r_{yk} \sqrt{\frac{S_{yy}}{S_{kk}}} \quad \text{for } k = 1, \ldots, p$$

The standard error of $b_k$ is estimated by

$$\hat\sigma_{b_k} = \sqrt{\frac{r_{kk}\, r_{yy}\, S_{yy}}{S_{kk}\left(C - p^*\right)}}$$

A 95% confidence interval for $\beta_k$ is constructed from

$$b_k \pm \hat\sigma_{b_k}\, t_{0.025,\, C - p^*}$$
If the model includes the intercept, it is estimated as

$$b_0 = \bar y - \sum_{k=1}^{p} b_k \bar X_k$$

and its variance is estimated as

$$\hat\sigma^2_{b_0} = \frac{\left(C - 1\right) r_{yy}\, S_{yy}}{C \left(C - p^*\right)} + \sum_{k=1}^{p} \bar X_k^2\, \hat\sigma^2_{b_k} + 2 \sum_{k=j+1}^{p} \sum_{j=1}^{p-1} \bar X_k \bar X_j\, \widehat{\mathrm{cov}}\left(b_k, b_j\right)$$
Beta Coefficients
$$Beta_k = r_{yk}$$

The standard error of $Beta_k$ is estimated by

$$\hat\sigma_{Beta_k} = \sqrt{\frac{r_{yy}\, r_{kk}}{C - p^*}}$$
The F statistic for testing $Beta_k$ is

$$F = \left(\frac{Beta_k}{\hat\sigma_{Beta_k}}\right)^2$$

with 1 and $C - p^*$ degrees of freedom.
The part correlation of $X_k$ is

$$\text{Part Corr}\left(X_k\right) = \frac{r_{yk}}{\sqrt{r_{kk}}}$$

and the partial correlation of $X_k$ is

$$\text{Partial Corr}\left(X_k\right) = \frac{r_{yk}}{\sqrt{r_{kk}\, r_{yy} - r_{yk}\, r_{ky}}}$$
Statistics for Variables Not in the Equation

The standardized regression coefficient $Beta_k$, if $X_k$ enters the equation at the next step, is

$$Beta_k = \frac{r_{yk}}{r_{kk}}$$

The F statistic for testing $Beta_k$ is

$$F = \frac{\left(C - p^* - 1\right) r_{yk}^2}{r_{kk}\, r_{yy} - r_{yk}^2}$$

with 1 and $C - p^* - 1$ degrees of freedom.
The partial correlation of $X_k$ is

$$\text{Partial}\left(X_k\right) = \frac{r_{yk}}{\sqrt{r_{yy}\, r_{kk}}}$$
Tolerance of Xk
$$Tolerance_k = r_{kk}$$
Minimum tolerance among variables already in the equation if Xk enters at the next step is
$$\min_{1 \leq j \leq p}\left\{ \frac{1}{r_{jj} - \dfrac{r_{kj}\, r_{jk}}{r_{kk}}},\; r_{kk} \right\}$$
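The minimum-tolerance expression maps directly onto the current swept matrix. A sketch with illustrative names; `swept` is a NumPy array and `in_model` the indices of entered variables:

```python
def min_tolerance_if_entered(swept, k, in_model):
    """Minimum tolerance among entered variables if X_k enters next,
    per the expression above."""
    new_tols = [1.0 / (swept[j, j] - swept[k, j] * swept[j, k] / swept[k, k])
                for j in in_model]
    return min(new_tols + [swept[k, k]])
```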
Residuals and Associated Statistics

Centered Leverage Values

For all cases, the leverage is

$$h_i = \begin{cases} \dfrac{g_i}{C - 1} \displaystyle\sum_{j=1}^{p} \sum_{k=1}^{p} \left(x_{ji} - \bar X_j\right)\left(x_{ki} - \bar X_k\right) \dfrac{r_{jk}}{\sqrt{S_{jj}\, S_{kk}}} & \text{if intercept is included} \\[2ex] \dfrac{g_i}{C - 1} \displaystyle\sum_{j=1}^{p} \sum_{k=1}^{p} x_{ji}\, x_{ki}\, \dfrac{r_{jk}}{\sqrt{S_{jj}\, S_{kk}}} & \text{otherwise} \end{cases}$$
For selected cases, the leverage is $h_i$; for an unselected case $i$ with positive caseweight, the leverage is

$$h_i = \begin{cases} \dfrac{g_i}{W}\left[\dfrac{1 + h_i}{1 + \dfrac{g_i}{W}\left(1 + h_i\right)} - \dfrac{1}{W + 1}\right] & \text{if intercept is included} \\[2ex] \dfrac{h_i}{1 + g_i h_i} & \text{otherwise} \end{cases}$$
Predicted Values

$$\hat Y_i = \begin{cases} \displaystyle\sum_{k=1}^{p} b_k X_{ki} & \text{if no intercept} \\[1ex] b_0 + \displaystyle\sum_{k=1}^{p} b_k X_{ki} & \text{otherwise} \end{cases}$$
Unstandardized Residuals
$$e_i = Y_i - \hat Y_i$$
Standardized Residuals
$$ZRESID_i = \begin{cases} \dfrac{e_i}{s} & \text{if no regression weight is specified} \\[1ex] \text{SYSMIS} & \text{otherwise} \end{cases}$$

where $s$ is the square root of the residual mean square, $s^2 = SS_e / \left(C - p^*\right)$.
Standardized Predicted Values

$$ZPRED_i = \begin{cases} \dfrac{\hat Y_i - \bar Y}{sd} & \text{if no regression weight is specified} \\[1ex] \text{SYSMIS} & \text{otherwise} \end{cases}$$

where $sd$ is computed as

$$sd = \sqrt{\frac{\displaystyle\sum_{i=1}^{l} c_i \left(\hat Y_i - \bar Y\right)^2}{C - 1}}$$
Studentized Residuals
$$SRES_i = \begin{cases} \dfrac{e_i}{s \sqrt{\left(1 - \tilde h_i\right) / g_i}} & \text{for selected cases with } c_i > 0 \\[2ex] \dfrac{e_i}{s \sqrt{\left(1 + \tilde h_i\right) / g_i}} & \text{otherwise} \end{cases}$$
Deleted Residuals
$$DRESID_i = \begin{cases} \dfrac{e_i}{1 - \tilde h_i} & \text{for selected cases with } c_i > 0 \\[1ex] e_i & \text{otherwise} \end{cases}$$
Studentized Deleted Residuals

$$SDRESID_i = \begin{cases} \dfrac{DRESID_i}{s^{(i)}} & \text{for selected cases with } c_i > 0 \\[2ex] \dfrac{e_i}{s^{(i)} \sqrt{\left(1 + \tilde h_i\right) / g_i}} & \text{otherwise} \end{cases}$$

where $s^{(i)}$ is computed as

$$s^{(i)} = \sqrt{\frac{\left(C - p^*\right) s^2 - \left(1 - \tilde h_i\right) g_i\, DRESID_i^2}{C - p^* - 1}}$$
Adjusted Predicted Values

$$ADJPRED_i = Y_i - DRESID_i$$
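For the unweighted, intercept-included special case ($c_i = g_i = 1$, so $C$ is the number of cases), the deleted-residual quantities reduce to ordinary hat-matrix algebra. A self-contained sketch under those assumptions, not the procedure's exact code path:

```python
import numpy as np

def deletion_diagnostics(X, y):
    """DRESID, s(i), and SDRESID for the unweighted, intercept-included
    case, per the formulas above."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    p_star = p + 1
    M = np.linalg.inv(Z.T @ Z)
    e = y - Z @ (M @ Z.T @ y)                 # unstandardized residuals
    h = np.einsum("ij,jk,ik->i", Z, M, Z)     # leverage h~_i
    s2 = e @ e / (n - p_star)                 # residual mean square
    dresid = e / (1.0 - h)                    # deleted residuals
    s_i = np.sqrt(((n - p_star) * s2 - (1.0 - h) * dresid**2)
                  / (n - p_star - 1))
    return dresid, s_i, dresid / s_i          # SDRESID = DRESID / s(i)
```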
DfBeta
$$DFBETA_i = b - b^{(i)} = \frac{g_i\, e_i \left(X^t W X\right)^{-1} X_i^t}{1 - \tilde h_i}$$

where

$$X_i^t = \begin{cases} \left(1, X_{1i}, \ldots, X_{pi}\right) & \text{if intercept is included} \\ \left(X_{1i}, \ldots, X_{pi}\right) & \text{otherwise} \end{cases}$$

and $W = \mathrm{diag}\left(w_1, \ldots, w_l\right)$.
Standardized DfBeta
$$SDBETA_{ij} = \frac{b_j - b_j^{(i)}}{s^{(i)} \sqrt{\left[\left(X^t W X\right)^{-1}\right]_{jj}}}$$

where $b_j^{(i)}$ is the $j$th coefficient estimated with case $i$ deleted.
DfFit

$$DFFIT_i = X_i\left(b - b^{(i)}\right) = \frac{\tilde h_i\, e_i}{1 - \tilde h_i}$$
Standardized DfFit
$$SDFIT_i = \frac{DFFIT_i}{s^{(i)} \sqrt{\tilde h_i}}$$
Covratio
$$COVRATIO_i = \left(\frac{s^{(i)}}{s}\right)^{2 p^*} \cdot \frac{1}{1 - \tilde h_i}$$
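The influence measures likewise have closed forms in the unweighted, intercept-included case ($w_i = g_i = 1$). A sketch under those assumptions, with illustrative names:

```python
import numpy as np

def influence_measures(X, y):
    """DFBETA, DFFIT, and COVRATIO for the unweighted, intercept-included
    case, using the closed forms above."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    p_star = p + 1
    M = np.linalg.inv(Z.T @ Z)
    b = M @ Z.T @ y
    e = y - Z @ b
    h = np.einsum("ij,jk,ik->i", Z, M, Z)            # leverage h~_i
    s2 = e @ e / (n - p_star)
    s_i2 = ((n - p_star) * s2 - e**2 / (1 - h)) / (n - p_star - 1)
    dfbeta = (M @ Z.T).T * (e / (1 - h))[:, None]    # row i is b - b(i)
    dffit = h * e / (1 - h)                          # X_i (b - b(i))
    covratio = (s_i2 / s2) ** p_star / (1 - h)       # (s(i)/s)^(2p*) / (1-h~)
    return dfbeta, dffit, covratio
```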
Mahalanobis Distance
For selected cases with $c_i > 0$,

$$MAHAL_i = \begin{cases} \left(C - 1\right) h_i & \text{if intercept is included} \\ C\, h_i & \text{otherwise} \end{cases}$$
For unselected cases with $c_i > 0$,

$$MAHAL_i = \begin{cases} C\, h_i & \text{if intercept is included} \\ \left(C + 1\right) h_i & \text{otherwise} \end{cases}$$
Cook's Distance (Cook, 1977)

For selected cases with $c_i > 0$,

$$COOK_i = \begin{cases} \dfrac{DRESID_i^2\, \tilde h_i\, g_i}{s^2 \left(p + 1\right)} & \text{if intercept is included} \\[2ex] \dfrac{DRESID_i^2\, h_i\, g_i}{s^2\, p} & \text{otherwise} \end{cases}$$
For unselected cases with $c_i > 0$,

$$COOK_i = \begin{cases} \dfrac{DRESID_i^2 \left(h_i + \dfrac{1}{W}\right)}{s^2 \left(p + 1\right)} & \text{if intercept is included} \\[2ex] \dfrac{DRESID_i^2\, h_i}{s^2\, p} & \text{otherwise} \end{cases}$$

where $h_i$ is the leverage defined above for unselected cases.
Standard Errors of the Mean Predicted Values

$$SEPRED_i = \begin{cases} s \sqrt{\tilde h_i / g_i} & \text{if intercept is included} \\[1ex] s \sqrt{h_i / g_i} & \text{otherwise} \end{cases}$$
95% Confidence Interval for a Single Observation

$$LICIN_i = \begin{cases} \hat Y_i - t_{0.025,\, C - p^*}\; s \sqrt{\left(\tilde h_i + 1\right) / g_i} & \text{if intercept is included} \\[1ex] \hat Y_i - t_{0.025,\, C - p^*}\; s \sqrt{\left(h_i + 1\right) / g_i} & \text{otherwise} \end{cases}$$

$$UICIN_i = \begin{cases} \hat Y_i + t_{0.025,\, C - p^*}\; s \sqrt{\left(\tilde h_i + 1\right) / g_i} & \text{if intercept is included} \\[1ex] \hat Y_i + t_{0.025,\, C - p^*}\; s \sqrt{\left(h_i + 1\right) / g_i} & \text{otherwise} \end{cases}$$

Durbin-Watson Statistic

$$DW = \frac{\displaystyle\sum_{i=2}^{l} \left(\tilde e_i - \tilde e_{i-1}\right)^2}{\displaystyle\sum_{i=1}^{l} c_i\, \tilde e_i^2}$$

where $\tilde e_i = e_i \sqrt{g_i}$.
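The Durbin-Watson statistic is a short computation once the residuals are available. A sketch whose signature (weights defaulting to 1) is illustrative:

```python
import numpy as np

def durbin_watson(e, c=None, g=None):
    """Durbin-Watson statistic as defined above, with caseweights c_i and
    regression weights g_i defaulting to 1."""
    e = np.asarray(e, dtype=float)
    c = np.ones_like(e) if c is None else np.asarray(c, dtype=float)
    g = np.ones_like(e) if g is None else np.asarray(g, dtype=float)
    et = e * np.sqrt(g)                       # e~_i = e_i sqrt(g_i)
    return np.sum(np.diff(et) ** 2) / np.sum(c * et ** 2)
```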
Missing Values

By default, a case that has a missing value for any variable is deleted from the computation of the correlation matrix on which all subsequent computations are based. Users are allowed to change the treatment of cases with missing values.
References

Cook, R. D. 1977. Detection of influential observations in linear regression. Technometrics, 19: 15–18.

Dempster, A. P. 1969. Elements of Continuous Multivariate Analysis. Reading, Mass.: Addison-Wesley.

Velleman, P. F., and Welsch, R. E. 1981. Efficient computing of regression diagnostics. The American Statistician, 35: 234–242.

Wilkinson, J. H., and Reinsch, C. 1971. Linear algebra. In: Handbook for Automatic Computation, Volume II, J. H. Wilkinson and C. Reinsch, eds. New York: Springer-Verlag.