3.3 Hypothesis Testing in Multiple Linear Regression
Questions:
What is the overall adequacy of the model?
Which specific regressors seem important?
Assume the errors are independent and follow a normal distribution with mean 0 and variance σ².
3.3.1 Test for Significance of Regression
Determine whether there is a linear relationship between y and the regressors x_j, j = 1, 2, ..., k.
The hypotheses are
H_0: β_1 = β_2 = ... = β_k = 0
H_1: β_j ≠ 0 for at least one j
ANOVA identity: SS_T = SS_R + SS_Res

SS_R / σ² ~ χ²_k, SS_Res / σ² ~ χ²_{n-k-1}, and SS_R and SS_Res are independent.

The test statistic:

$$ F_0 = \frac{SS_R / k}{SS_{Res} / (n - k - 1)} = \frac{MS_R}{MS_{Res}} \sim F_{k,\, n-k-1} $$
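A minimal computational sketch of this F test (not from the original slides), assuming NumPy and SciPy are available, X is the n × (k + 1) model matrix with a leading column of ones, and y is the response vector:

```python
import numpy as np
from scipy import stats

def significance_of_regression(X, y):
    """Overall F test of H0: beta_1 = ... = beta_k = 0."""
    n, p = X.shape                                  # p = k + 1 (intercept included)
    k = p - 1
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # least-squares estimate
    ss_t = np.sum((y - y.mean()) ** 2)              # total (corrected) sum of squares
    ss_res = np.sum((y - X @ beta_hat) ** 2)        # residual sum of squares
    ss_r = ss_t - ss_res                            # regression sum of squares
    f0 = (ss_r / k) / (ss_res / (n - k - 1))        # MS_R / MS_Res
    p_value = stats.f.sf(f0, k, n - k - 1)          # P(F_{k, n-k-1} > F0)
    return f0, p_value
```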
Under H_1, F_0 follows a noncentral F distribution with k and n - k - 1 degrees of freedom and noncentrality parameter

$$ \lambda = \frac{\beta^{*\prime} X_c' X_c \beta^{*}}{\sigma^2}, \qquad \beta^{*} = (\beta_1, \ldots, \beta_k)', \qquad X_c = \begin{bmatrix} x_{11}-\bar{x}_1 & \cdots & x_{1k}-\bar{x}_k \\ \vdots & & \vdots \\ x_{n1}-\bar{x}_1 & \cdots & x_{nk}-\bar{x}_k \end{bmatrix} $$

The expected mean squares are E(MS_Res) = σ² and

$$ E(MS_R) = \sigma^2 + \frac{\beta^{*\prime} X_c' X_c \beta^{*}}{k} $$
ANOVA table:

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F_0
Regression            SS_R             k                    MS_R          MS_R / MS_Res
Residual              SS_Res           n - k - 1            MS_Res
Total                 SS_T             n - 1
Example 3.3 The Delivery Time Data
R² and Adjusted R²
R² always increases when a regressor is added to the model, regardless of the contribution of that variable.
An adjusted R²:

$$ R^2_{adj} = 1 - \frac{SS_{Res}/(n-p)}{SS_T/(n-1)} $$

The adjusted R² will increase when a variable is added to the model only if the addition of that variable reduces the residual mean square.
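A short sketch of both statistics (assuming the sums of squares ss_res and ss_t, the sample size n, and the number of parameters p are already available, e.g. from the fit above):

```python
def r_squared(ss_res, ss_t):
    # R^2 = 1 - SS_Res / SS_T
    return 1.0 - ss_res / ss_t

def adjusted_r_squared(ss_res, ss_t, n, p):
    # R^2_adj = 1 - [SS_Res / (n - p)] / [SS_T / (n - 1)]
    return 1.0 - (ss_res / (n - p)) / (ss_t / (n - 1))
```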
3.3.2 Tests on Individual Regression Coefficients
For the individual regression coefficient:
H_0: β_j = 0 vs. H_1: β_j ≠ 0
Let C_jj be the j-th diagonal element of (X'X)^{-1}.
The test statistic:

$$ t_0 = \frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 C_{jj}}} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \sim t_{n-k-1} $$

This is a partial or marginal test because the estimate of the regression coefficient depends on all of the other regressors in the model.
The test measures the contribution of x_j given the other regressors in the model.
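A sketch of these marginal t tests under the same assumptions as before (X includes the intercept column; names are illustrative):

```python
import numpy as np
from scipy import stats

def coefficient_t_tests(X, y):
    """t0 = beta_hat_j / sqrt(sigma2_hat * C_jj) for each coefficient."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)                # C = (X'X)^{-1}
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)            # MS_Res
    se = np.sqrt(sigma2_hat * np.diag(xtx_inv))     # se(beta_hat_j)
    t0 = beta_hat / se
    p_values = 2 * stats.t.sf(np.abs(t0), n - p)    # two-sided
    return t0, se, p_values
```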
Example 3.4 The Delivery Time Data

The subset of regressors: partition β' = [β_1', β_2'], where β_1 is (p - r) × 1 and β_2 is r × 1, and write the model as y = Xβ + ε = X_1β_1 + X_2β_2 + ε. We wish to test H_0: β_2 = 0 against H_1: β_2 ≠ 0.

For the full model, the regression sum of squares is

$$ SS_R(\beta) = \hat{\beta}'X'y \quad (p \text{ degrees of freedom}) $$

Under the null hypothesis, the regression sum of squares for the reduced model is

$$ SS_R(\beta_1) = \hat{\beta}_1'X_1'y \quad (p - r \text{ degrees of freedom}) $$

The regression sum of squares due to β_2 given β_1 is

$$ SS_R(\beta_2 \mid \beta_1) = SS_R(\beta) - SS_R(\beta_1) $$

This is called the extra sum of squares due to β_2, and its degrees of freedom are p - (p - r) = r.

The test statistic:

$$ F_0 = \frac{SS_R(\beta_2 \mid \beta_1)/r}{MS_{Res}} \sim F_{r,\, n-p} $$
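A sketch of this extra-sum-of-squares (partial F) test, assuming X1 holds the intercept and the retained regressors and X2 holds the r regressors under test (all names hypothetical):

```python
import numpy as np
from scipy import stats

def partial_f_test(X1, X2, y):
    """F test of H0: beta_2 = 0 given the regressors in X1."""
    X_full = np.hstack([X1, X2])
    n, p = X_full.shape
    r = X2.shape[1]

    def ss_res(M):
        b = np.linalg.lstsq(M, y, rcond=None)[0]
        e = y - M @ b
        return e @ e

    ss_res_full = ss_res(X_full)
    ss_extra = ss_res(X1) - ss_res_full             # SS_R(beta_2 | beta_1)
    f0 = (ss_extra / r) / (ss_res_full / (n - p))   # numerator df = r, denominator MS_Res
    return f0, stats.f.sf(f0, r, n - p)
```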
If β_2 ≠ 0, F_0 follows a noncentral F distribution with noncentrality parameter

$$ \lambda = \frac{1}{\sigma^2}\,\beta_2' X_2'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2\,\beta_2 $$

Multicollinearity: this test actually has no power!
This test has maximal power when X_1 and X_2 are orthogonal to one another!
Partial F test: given the regressors in X_1, measure the contribution of the regressors in X_2.
Consider y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + ε.
SS_R(β_1 | β_0, β_2, β_3), SS_R(β_2 | β_0, β_1, β_3), and SS_R(β_3 | β_0, β_2, β_1) are single-degree-of-freedom sums of squares.
SS_R(β_j | β_0, ..., β_{j-1}, β_{j+1}, ..., β_k): the contribution of x_j as if it were the last variable added to the model.
This partial F test is equivalent to the t test (F_0 = t_0²).
SS_T = SS_R(β_1, β_2, β_3 | β_0) + SS_Res
SS_R(β_1, β_2, β_3 | β_0) = SS_R(β_1 | β_0) + SS_R(β_2 | β_1, β_0) + SS_R(β_3 | β_1, β_2, β_0)
Example 3.5 Delivery Time Data

3.3.3 Special Case of Orthogonal Columns in X
Model: y = Xβ + ε = X_1β_1 + X_2β_2 + ε
Orthogonal: X_1'X_2 = 0
Since the normal equations are (X'X)β̂ = X'y,

$$ \begin{bmatrix} X_1'X_1 & 0 \\ 0 & X_2'X_2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix} $$

so

$$ \hat{\beta}_1 = (X_1'X_1)^{-1}X_1'y \quad \text{and} \quad \hat{\beta}_2 = (X_2'X_2)^{-1}X_2'y $$
3.3.4 Testing the General Linear Hypothesis
Let T be an m × p matrix with rank(T) = r.
Full model: y = Xβ + ε, with

$$ SS_{Res}(FM) = y'y - \hat{\beta}'X'y \quad (n - p \text{ degrees of freedom}) $$

Reduced model: y = Zγ + ε, where Z is an n × (p - r) matrix and γ is a (p - r) × 1 vector. Then

$$ \hat{\gamma} = (Z'Z)^{-1}Z'y, \qquad SS_{Res}(RM) = y'y - \hat{\gamma}'Z'y \quad (n - p + r \text{ degrees of freedom}) $$

The difference, SS_H = SS_Res(RM) − SS_Res(FM), has r degrees of freedom. SS_H is called the sum of squares due to the hypothesis H_0: Tβ = 0.
The test statistic:

$$ F_0 = \frac{SS_H / r}{SS_{Res}(FM)/(n-p)} \sim F_{r,\, n-p} $$
Another form:

$$ F_0 = \frac{(T\hat{\beta})'\left[T(X'X)^{-1}T'\right]^{-1}(T\hat{\beta})/r}{SS_{Res}(FM)/(n-p)} $$

For H_0: Tβ = c vs. H_1: Tβ ≠ c,

$$ F_0 = \frac{(T\hat{\beta} - c)'\left[T(X'X)^{-1}T'\right]^{-1}(T\hat{\beta} - c)/r}{SS_{Res}(FM)/(n-p)} \sim F_{r,\, n-p} $$
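A sketch of the general linear test in its quadratic-form version (T, c, X, y are placeholders; T is assumed to have full row rank r):

```python
import numpy as np
from scipy import stats

def general_linear_test(X, y, T, c):
    """F test of H0: T beta = c."""
    n, p = X.shape
    r = T.shape[0]                                   # assumes rank(T) = r (full row rank)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    ms_res = resid @ resid / (n - p)                 # SS_Res(FM) / (n - p)
    d = T @ beta_hat - c
    ss_h = d @ np.linalg.inv(T @ xtx_inv @ T.T) @ d  # quadratic form in (T beta_hat - c)
    f0 = (ss_h / r) / ms_res
    return f0, stats.f.sf(f0, r, n - p)
```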
3.4 Confidence Intervals in Multiple Regression
3.4.1 Confidence Intervals on the Regression Coefficients
Under the normality assumption, β̂ is multivariate normal:

$$ \hat{\beta} \sim N\!\left(\beta,\; \sigma^2 (X'X)^{-1}\right) $$

so each β̂_j is normal with mean β_j and variance σ²C_jj, and a 100(1 − α)% confidence interval on β_j is

$$ \hat{\beta}_j - t_{\alpha/2,\,n-p}\sqrt{\hat{\sigma}^2 C_{jj}} \;\le\; \beta_j \;\le\; \hat{\beta}_j + t_{\alpha/2,\,n-p}\sqrt{\hat{\sigma}^2 C_{jj}} $$
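A small sketch turning the standard errors from the earlier t-test code into these marginal intervals (beta_hat and se are NumPy arrays, df = n − p):

```python
from scipy import stats

def coefficient_confidence_intervals(beta_hat, se, df, alpha=0.05):
    # beta_hat_j +/- t_{alpha/2, n-p} * se(beta_hat_j)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return beta_hat - t_crit * se, beta_hat + t_crit * se
```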
3.4.2 Confidence Interval Estimation of the Mean Response
A confidence interval on the mean response at a particular point x_0 = (1, x_01, ..., x_0k)'.
The unbiased estimator of E(y | x_0) is ŷ_0 = x_0'β̂, with E(ŷ_0) = E(y | x_0) = x_0'β and

$$ Var(\hat{y}_0) = \sigma^2\, x_0'(X'X)^{-1}x_0 $$

The 100(1 − α)% confidence interval on the mean response is

$$ \hat{y}_0 - t_{\alpha/2,\,n-p}\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1}x_0} \;\le\; E(y \mid x_0) \;\le\; \hat{y}_0 + t_{\alpha/2,\,n-p}\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1}x_0} $$
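A sketch of the mean-response interval (x0 is a vector starting with 1 and matching the columns of X; names are illustrative):

```python
import numpy as np
from scipy import stats

def mean_response_ci(X, y, x0, alpha=0.05):
    """100(1 - alpha)% confidence interval on E(y | x0)."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - p)     # MS_Res
    y0_hat = x0 @ beta_hat                                     # point estimate of the mean response
    half = stats.t.ppf(1 - alpha / 2, n - p) * np.sqrt(sigma2_hat * x0 @ xtx_inv @ x0)
    return y0_hat - half, y0_hat + half
```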
Example 3.9 The Delivery Time Data
3.4.3 Simultaneous Confidence Intervals on Regression Coefficients
A 100(1 − α)% joint confidence region for all of the parameters in β:

$$ \frac{(\hat{\beta} - \beta)'X'X(\hat{\beta} - \beta)}{p\, MS_{Res}} \le F_{\alpha,\, p,\, n-p} $$

This describes an elliptically shaped region.
Example 3.10 The Rocket Propellant Data

Another approach: construct intervals of the form

$$ \hat{\beta}_j \pm \Delta\, se(\hat{\beta}_j), \quad j = 0, 1, \ldots, k $$

Δ is chosen so that a specified probability that all intervals are correct is obtained.
Bonferroni method: Δ = t_{α/2p, n-p}
Scheffé S-method: Δ = (2F_{α, p, n-p})^{1/2}
Maximum modulus t procedure: Δ = u_{α, p, n-2}, the upper α tail point of the distribution of the maximum absolute value of two independent Student t random variables, each based on n − 2 degrees of freedom.
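A sketch computing the Bonferroni and Scheffé multipliers with SciPy (the maximum modulus t critical value u_{α,p,n-2} is tabulated rather than available in SciPy, so it is omitted):

```python
from scipy import stats

def simultaneous_multipliers(alpha, p, n):
    """Half-width multipliers Delta for p joint intervals from a fit with n - p error df."""
    bonferroni = stats.t.ppf(1 - alpha / (2 * p), n - p)      # t_{alpha/(2p), n-p}
    scheffe = (2 * stats.f.ppf(1 - alpha, p, n - p)) ** 0.5   # (2 F_{alpha, p, n-p})^{1/2}
    return bonferroni, scheffe
```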
Example 3.11 The Rocket Propellant Data
Find a 90% joint C.I. for β_0 and β_1 by constructing a 95% C.I. for each parameter.
The confidence ellipse is always a more efficient procedure than the Bonferroni method because the volume of the ellipse is always less than the volume of the space covered by the Bonferroni intervals.
Bonferroni intervals are easier to construct.
The length of the C.I.s: maximum modulus t < Bonferroni method < Scheffé S-method.
3.5 Prediction of New Observations
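The body of this slide was not captured; as a reference point, the standard 100(1 − α)% prediction interval on a new observation at x_0 (note the extra 1 relative to the mean-response interval) is:

```latex
\hat{y}_0 - t_{\alpha/2,\,n-p}\sqrt{\hat{\sigma}^2\bigl(1 + x_0'(X'X)^{-1}x_0\bigr)}
\;\le\; y_0 \;\le\;
\hat{y}_0 + t_{\alpha/2,\,n-p}\sqrt{\hat{\sigma}^2\bigl(1 + x_0'(X'X)^{-1}x_0\bigr)}
```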

3.6 Hidden Extrapolation in Multiple Regression
Be careful about extrapolating beyond the region containing the original observations!
The rectangle formed by the ranges of the regressors is NOT the data region.
Regressor variable hull (RVH): the convex hull of the original n data points.
Interpolation: x_0 ∈ RVH
Extrapolation: x_0 ∉ RVH
The diagonal elements h_ii of the hat matrix H = X(X'X)^{-1}X' are useful in detecting hidden extrapolation.
h_max: the maximum of the h_ii. The point x_i with the largest value of h_ii lies on the boundary of the RVH.
{x | x'(X'X)^{-1}x ≤ h_max} is an ellipsoid enclosing all points inside the RVH.
Let h_00 = x_0'(X'X)^{-1}x_0.
h_00 ≤ h_max: x_0 is inside the RVH (or on its boundary).
h_00 > h_max: x_0 is outside the RVH.
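A sketch of the h_00 versus h_max check (X is the model matrix used in the fit, x0 the candidate point including the leading 1):

```python
import numpy as np

def hidden_extrapolation_check(X, x0):
    """Return True if x0 falls outside the ellipsoid {x : x'(X'X)^{-1}x <= h_max}."""
    xtx_inv = np.linalg.inv(X.T @ X)
    h_diag = np.einsum('ij,jk,ik->i', X, xtx_inv, X)   # diagonal of H = X(X'X)^{-1}X'
    h_max = h_diag.max()
    h00 = x0 @ xtx_inv @ x0
    return h00 > h_max, h00, h_max
```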
MCE: minimum covering ellipsoid (Weisberg, 1985).

3.7 Standardized Regression Coefficients
It is difficult to compare regression coefficients directly because their magnitudes depend on the units of the regressors.
Unit normal scaling: standardize each regressor and the response as one would a normal r.v.:

$$ z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad y_i^{*} = \frac{y_i - \bar{y}}{s_y} $$
New model:

$$ y_i^{*} = b_1 z_{i1} + \cdots + b_k z_{ik} + \varepsilon_i, \quad i = 1, \ldots, n $$

There is no intercept. The least-squares estimator of b is

$$ \hat{b} = (Z'Z)^{-1}Z'y^{*} $$
Unit Length Scaling:

$$ w_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{S_{jj}}}, \qquad y_i^{0} = \frac{y_i - \bar{y}}{\sqrt{SS_T}}, \qquad S_{jj} = \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2 $$
New model:

$$ y_i^{0} = b_1 w_{i1} + \cdots + b_k w_{ik} + \varepsilon_i, \quad i = 1, \ldots, n $$

The least-squares estimator:

$$ \hat{b} = (W'W)^{-1}W'y^{0} $$







It does not matter which scaling we use! Both produce the same set of dimensionless regression coefficients.
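A sketch of unit length scaling (X_raw holds only the k regressor columns, no intercept; names are illustrative):

```python
import numpy as np

def unit_length_coefficients(X_raw, y):
    """Dimensionless coefficients b_hat = (W'W)^{-1} W' y0 from unit length scaling."""
    Xc = X_raw - X_raw.mean(axis=0)
    W = Xc / np.sqrt((Xc ** 2).sum(axis=0))                      # w_ij = (x_ij - xbar_j) / sqrt(S_jj)
    y0 = (y - y.mean()) / np.sqrt(np.sum((y - y.mean()) ** 2))   # (y_i - ybar) / sqrt(SS_T)
    return np.linalg.solve(W.T @ W, W.T @ y0)                    # no-intercept fit in the scaled variables
```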
3.8 Multicollinearity
A serious problem: multicollinearity, or near-linear dependence among the regression variables.
The regressors are the columns of X, so an exact linear dependence would result in a singular X'X.
Unit length scaling, orthogonal regressors (the case of Figure 3.12):

$$ W'W = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad (W'W)^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \frac{Var(\hat{b}_1)}{\sigma^2} = \frac{Var(\hat{b}_2)}{\sigma^2} = 1 $$
Soft drink data:

$$ W'W = \begin{bmatrix} 1 & 0.824 \\ 0.824 & 1 \end{bmatrix}, \qquad (W'W)^{-1} = \begin{bmatrix} 3.12 & -2.57 \\ -2.57 & 3.12 \end{bmatrix}, \qquad \frac{Var(\hat{b}_1)}{\sigma^2} = \frac{Var(\hat{b}_2)}{\sigma^2} = 3.12 $$

The off-diagonal elements of W'W are usually called the simple correlations between the regressors.
Variance inflation factors (VIFs):
The VIFs are the main diagonal elements of the inverse of X'X in correlation form ((W'W)^{-1} above).
From the two cases above: soft drink data: VIF_1 = VIF_2 = 3.12; Figure 3.12 (orthogonal case): VIF_1 = VIF_2 = 1.
VIF_j = 1/(1 − R_j²), where R_j² is the coefficient of multiple determination obtained from regressing x_j on the other regressor variables.
If x_j is nearly linearly dependent on some of the other regressors, then R_j² ≈ 1 and VIF_j will be large.
Serious problems: VIFs > 10.
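A sketch of the VIF computation via the inverse of the correlation matrix (X_raw again holds only the regressor columns):

```python
import numpy as np

def variance_inflation_factors(X_raw):
    """VIF_j = j-th diagonal element of the inverse correlation matrix of the regressors."""
    Xc = X_raw - X_raw.mean(axis=0)
    W = Xc / np.sqrt((Xc ** 2).sum(axis=0))        # unit length scaling
    return np.diag(np.linalg.inv(W.T @ W))         # equals 1 / (1 - R_j^2)
```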
Figure 3.13 (a): The fitted plane is unstable and very sensitive to relatively small changes in the data points.
Figure 3.13 (b): Orthogonal regressors.
3.9 Why Do Regression Coefficients Have the Wrong Sign?
Possible reasons for a wrong sign:
1. The range of some of the regressors is too
small.
2. Important regressors have not been included
in the model.
3. Multicollinearity is present.
4. Computational errors have been made.
For reason 1 (simple linear regression):

$$ Var(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}} = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} $$
Although it is possible to decrease the variance of the regression coefficients by increasing the range of the x's, it may not be desirable to spread the levels of the regressors out too far:
The true response function may be nonlinear.
It may be impractical or impossible.
For reason 2:



$$ \hat{y} = 1.835 + 0.463\,x_1 $$

where β̂_1 is a "total" regression coefficient.

$$ \hat{y} = 1.036 - 1.222\,x_1 + 3.649\,x_2 $$

Here β̂_1 is the effect of x_1 given x_2.
For reason 3: Multicollinearity inflates the variances of the coefficients, and this increases the probability that one or more regression coefficients will have the wrong sign.
For reason 4: Different computer programs handle round-off or truncation problems in different ways, and some programs are more effective than others in this regard.
