
THE ANOVA APPROACH TO THE ANALYSIS OF

LINEAR MIXED EFFECTS MODELS


We begin with a relatively simple special case. Suppose

$y_{ijk} = \mu + \tau_i + u_{ij} + e_{ijk}, \quad (i = 1, \ldots, t;\ j = 1, \ldots, n;\ k = 1, \ldots, m),$

$\beta = (\mu, \tau_1, \ldots, \tau_t)', \quad u = (u_{11}, u_{12}, \ldots, u_{tn})', \quad e = (e_{111}, e_{112}, \ldots, e_{tnm})',$

$\beta \in \mathbb{R}^{t+1}$, an unknown parameter vector,

$\begin{bmatrix} u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_u^2 I & 0 \\ 0 & \sigma_e^2 I \end{bmatrix} \right)$, where $\sigma_u^2, \sigma_e^2 \in \mathbb{R}^+$ are unknown variance components.

© 2012 Iowa State University, Statistics 511 (47 slides)

This is the standard model for a CRD with $t$ treatments, $n$ experimental units per treatment, and $m$ observations per experimental unit.

We can write the model as $y = X\beta + Zu + e$, where

$X = [1_{tnm \times 1},\ I_{t \times t} \otimes 1_{nm \times 1}] \quad \text{and} \quad Z = I_{tn \times tn} \otimes 1_{m \times 1}.$
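The Kronecker-product structure of $X$ and $Z$ can be sketched numerically. A minimal illustration assuming NumPy is available; the values of $t$, $n$, $m$ are arbitrary choices:

```python
import numpy as np

# Build the design matrices for the balanced CRD with subsampling,
# with illustrative dimensions t = 3 treatments, n = 2 experimental
# units per treatment, m = 4 observations per experimental unit.
t, n, m = 3, 2, 4

X = np.hstack([np.ones((t * n * m, 1)),                   # column for mu
               np.kron(np.eye(t), np.ones((n * m, 1)))])  # columns for tau_1..tau_t
Z = np.kron(np.eye(t * n), np.ones((m, 1)))               # one column per experimental unit

print(X.shape)  # (24, 4)
print(Z.shape)  # (24, 6)
```

Each row of $Z$ has a single 1 marking the experimental unit that observation belongs to.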

The ANOVA Table

Source                              DF
treatments                          $t - 1$
exp.units(treatments)               $t(n - 1)$
obs.units(exp.units, treatments)    $tn(m - 1)$
c.total                             $tnm - 1$

The ANOVA Table

Source        DF            Sum of Squares
trt           $t - 1$       $\sum_{i=1}^t \sum_{j=1}^n \sum_{k=1}^m (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$
xu(trt)       $t(n - 1)$    $\sum_{i=1}^t \sum_{j=1}^n \sum_{k=1}^m (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2$
ou(xu, trt)   $tn(m - 1)$   $\sum_{i=1}^t \sum_{j=1}^n \sum_{k=1}^m (y_{ijk} - \bar{y}_{ij\cdot})^2$
c.total       $tnm - 1$     $\sum_{i=1}^t \sum_{j=1}^n \sum_{k=1}^m (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2$

Source        DF           Sum of Squares                                                                 Mean Square
trt           $t - 1$      $nm \sum_{i=1}^t (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$
xu(trt)       $tn - t$     $m \sum_{i=1}^t \sum_{j=1}^n (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2$    SS/DF
ou(xu, trt)   $tnm - tn$   $\sum_{i=1}^t \sum_{j=1}^n \sum_{k=1}^m (y_{ijk} - \bar{y}_{ij\cdot})^2$
c.total       $tnm - 1$    $\sum_{i=1}^t \sum_{j=1}^n \sum_{k=1}^m (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2$
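The sums of squares above can be computed directly from a three-way data array. A sketch with simulated balanced data, assuming NumPy; all parameter values are illustrative:

```python
import numpy as np

# Simulate balanced data y[i, j, k] from the model and compute the
# ANOVA sums of squares using the formulas in the table above.
rng = np.random.default_rng(0)
t, n, m = 3, 2, 4
tau = np.array([0.0, 1.0, 2.0])                 # illustrative treatment effects
u = rng.normal(0, 1.0, size=(t, n))             # experimental-unit effects
e = rng.normal(0, 0.5, size=(t, n, m))          # observational errors
y = 10.0 + tau[:, None, None] + u[:, :, None] + e

ybar_i = y.mean(axis=(1, 2))    # treatment means  ybar_{i..}
ybar_ij = y.mean(axis=2)        # unit means       ybar_{ij.}
ybar = y.mean()                 # grand mean       ybar_{...}

SS_trt = n * m * np.sum((ybar_i - ybar) ** 2)
SS_xu = m * np.sum((ybar_ij - ybar_i[:, None]) ** 2)
SS_ou = np.sum((y - ybar_ij[:, :, None]) ** 2)
SS_total = np.sum((y - ybar) ** 2)

# The three component sums of squares add to the corrected total.
print(np.isclose(SS_trt + SS_xu + SS_ou, SS_total))  # True
```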

Expected Mean Squares

$E(\text{MS}_{trt}) = \frac{nm}{t-1} \sum_{i=1}^t E(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$

$= \frac{nm}{t-1} \sum_{i=1}^t E(\tau_i + \bar{u}_{i\cdot} + \bar{e}_{i\cdot\cdot} - \bar{\tau}_{\cdot} - \bar{u}_{\cdot\cdot} - \bar{e}_{\cdot\cdot\cdot})^2$

$= \frac{nm}{t-1} \sum_{i=1}^t E(\tau_i - \bar{\tau}_{\cdot} + \bar{u}_{i\cdot} - \bar{u}_{\cdot\cdot} + \bar{e}_{i\cdot\cdot} - \bar{e}_{\cdot\cdot\cdot})^2$

$= \frac{nm}{t-1} \sum_{i=1}^t \left[ (\tau_i - \bar{\tau}_{\cdot})^2 + E(\bar{u}_{i\cdot} - \bar{u}_{\cdot\cdot})^2 + E(\bar{e}_{i\cdot\cdot} - \bar{e}_{\cdot\cdot\cdot})^2 \right]$

$= \frac{nm}{t-1} \left[ \sum_{i=1}^t (\tau_i - \bar{\tau}_{\cdot})^2 + E\left\{ \sum_{i=1}^t (\bar{u}_{i\cdot} - \bar{u}_{\cdot\cdot})^2 \right\} + E\left\{ \sum_{i=1}^t (\bar{e}_{i\cdot\cdot} - \bar{e}_{\cdot\cdot\cdot})^2 \right\} \right]$

To simplify this expression further, note that

$\bar{u}_{1\cdot}, \ldots, \bar{u}_{t\cdot} \overset{\text{i.i.d.}}{\sim} N\!\left( 0, \frac{\sigma_u^2}{n} \right)$.

Thus,

$E\left\{ \sum_{i=1}^t (\bar{u}_{i\cdot} - \bar{u}_{\cdot\cdot})^2 \right\} = (t - 1)\frac{\sigma_u^2}{n}$.

Similarly,

$\bar{e}_{1\cdot\cdot}, \ldots, \bar{e}_{t\cdot\cdot} \overset{\text{i.i.d.}}{\sim} N\!\left( 0, \frac{\sigma_e^2}{nm} \right)$.

Thus,

$E\left\{ \sum_{i=1}^t (\bar{e}_{i\cdot\cdot} - \bar{e}_{\cdot\cdot\cdot})^2 \right\} = (t - 1)\frac{\sigma_e^2}{nm}$.

It follows that

$E(\text{MS}_{trt}) = \frac{nm}{t-1} \sum_{i=1}^t (\tau_i - \bar{\tau}_{\cdot})^2 + m\sigma_u^2 + \sigma_e^2$.

Similar calculations allow us to add an Expected Mean Squares (EMS) column to our ANOVA table.

Source        EMS
trt           $\sigma_e^2 + m\sigma_u^2 + \frac{nm}{t-1} \sum_{i=1}^t (\tau_i - \bar{\tau}_{\cdot})^2$
xu(trt)       $\sigma_e^2 + m\sigma_u^2$
ou(xu, trt)   $\sigma_e^2$

The entire table could also be derived using matrices

$X_1 = 1, \quad X_2 = I_{t \times t} \otimes 1_{nm \times 1}, \quad X_3 = I_{tn \times tn} \otimes 1_{m \times 1}.$

Source        Sum of Squares     DF                                      MS
trt           $y'(P_2 - P_1)y$   $\text{rank}(X_2) - \text{rank}(X_1)$
xu(trt)       $y'(P_3 - P_2)y$   $\text{rank}(X_3) - \text{rank}(X_2)$   SS/DF
ou(xu, trt)   $y'(I - P_3)y$     $tnm - \text{rank}(X_3)$
c.total       $y'(I - P_1)y$     $tnm - 1$

Expected Mean Squares (EMS) could be computed using

$E(y'Ay) = \text{tr}(A\Sigma) + E(y)'A\,E(y),$

where

$\Sigma = \text{Var}(y) = ZGZ' + R = \sigma_u^2\, I_{tn \times tn} \otimes 1 1'_{m \times m} + \sigma_e^2\, I_{tnm \times tnm}$

and

$E(y) = \begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_2 \\ \vdots \\ \mu + \tau_t \end{bmatrix} \otimes 1_{nm \times 1}.$

Furthermore, it can be shown that

$\frac{y'(P_2 - P_1)y}{\sigma_e^2 + m\sigma_u^2} \sim \chi^2_{t-1}\!\left( \frac{nm \sum_{i=1}^t (\tau_i - \bar{\tau}_{\cdot})^2}{\sigma_e^2 + m\sigma_u^2} \right),$

$\frac{y'(P_3 - P_2)y}{\sigma_e^2 + m\sigma_u^2} \sim \chi^2_{tn-t},$

$\frac{y'(I - P_3)y}{\sigma_e^2} \sim \chi^2_{tnm-tn},$

and that these three $\chi^2$ random variables are independent.

It follows that

$F_1 = \frac{\text{MS}_{trt}}{\text{MS}_{xu(trt)}} \sim F_{t-1,\ tn-t}\!\left( \frac{nm \sum_{i=1}^t (\tau_i - \bar{\tau}_{\cdot})^2}{\sigma_e^2 + m\sigma_u^2} \right)$

and

$F_2 = \frac{\text{MS}_{xu(trt)}}{\text{MS}_{ou(xu,trt)}} \sim \frac{\sigma_e^2 + m\sigma_u^2}{\sigma_e^2}\, F_{tn-t,\ tnm-tn}.$

Thus, we can use $F_1$ to test $H_0: \tau_1 = \cdots = \tau_t$ and $F_2$ to test $H_0: \sigma_u^2 = 0$.
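That $F_1$ is central $F_{t-1,\, tn-t}$ under $H_0$ can be checked by simulation. A Monte Carlo sketch assuming NumPy; parameter values are illustrative, and the check compares the simulated mean of $F_1$ to the known mean $d_2/(d_2 - 2)$ of an $F$ distribution with $d_2$ denominator DF:

```python
import numpy as np

# Under H0 (all tau_i equal), F1 = MStrt / MSxu(trt) should follow
# F_{t-1, tn-t}.  Simulate many datasets and check the mean of F1.
rng = np.random.default_rng(2)
t, n, m = 4, 3, 2
sigma_u, sigma_e = 1.0, 0.5

F1 = []
for _ in range(4000):
    u = rng.normal(0, sigma_u, size=(t, n))
    e = rng.normal(0, sigma_e, size=(t, n, m))
    y = 5.0 + u[:, :, None] + e                  # tau_i = 0 for all i
    ybar_i, ybar_ij, ybar = y.mean(axis=(1, 2)), y.mean(axis=2), y.mean()
    MStrt = n * m * np.sum((ybar_i - ybar) ** 2) / (t - 1)
    MSxu = m * np.sum((ybar_ij - ybar_i[:, None]) ** 2) / (t * n - t)
    F1.append(MStrt / MSxu)

d2 = t * n - t                                   # denominator DF = 8
print(round(np.mean(F1), 2), round(d2 / (d2 - 2), 2))  # both near 1.33
```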

Estimating $\sigma_u^2$

Note that

$E\left( \frac{\text{MS}_{xu(trt)} - \text{MS}_{ou(xu,trt)}}{m} \right) = \frac{(\sigma_e^2 + m\sigma_u^2) - \sigma_e^2}{m} = \sigma_u^2.$

Thus,

$\frac{\text{MS}_{xu(trt)} - \text{MS}_{ou(xu,trt)}}{m}$

is an unbiased estimator of $\sigma_u^2$.

Although $\left( \text{MS}_{xu(trt)} - \text{MS}_{ou(xu,trt)} \right) / m$ is an unbiased estimator of $\sigma_u^2$, this estimator can take negative values. This is undesirable because $\sigma_u^2$, the variance of the $u$ random effects, cannot be negative.
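Both properties, unbiasedness and the possibility of negative values, can be seen in simulation. A sketch assuming NumPy; the variance-component values are illustrative:

```python
import numpy as np

# The ANOVA estimator (MSxu - MSou)/m is unbiased for sigma_u^2 but
# can go negative; check both by simulation.
rng = np.random.default_rng(3)
t, n, m = 3, 4, 2
sigma_u2, sigma_e2 = 0.2, 1.0

ests = []
for _ in range(5000):
    u = rng.normal(0, np.sqrt(sigma_u2), size=(t, n))
    e = rng.normal(0, np.sqrt(sigma_e2), size=(t, n, m))
    y = u[:, :, None] + e
    ybar_i, ybar_ij = y.mean(axis=(1, 2)), y.mean(axis=2)
    MSxu = m * np.sum((ybar_ij - ybar_i[:, None]) ** 2) / (t * (n - 1))
    MSou = np.sum((y - ybar_ij[:, :, None]) ** 2) / (t * n * (m - 1))
    ests.append((MSxu - MSou) / m)

ests = np.array(ests)
print(round(ests.mean(), 2))   # near sigma_u2 = 0.2
print((ests < 0).mean())       # a nonzero fraction of estimates is negative
```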

As we have seen previously,

$\Sigma = \text{Var}(y) = \sigma_u^2\, I_{tn \times tn} \otimes 1 1'_{m \times m} + \sigma_e^2\, I_{tnm \times tnm}.$

It turns out that

$\hat{\beta}_\Sigma = (X'\Sigma^{-1}X)^- X'\Sigma^{-1}y = (X'X)^- X'y = \hat{\beta}.$

Thus, the GLS estimator of any estimable $C\beta$ is equal to the OLS estimator in this special case.

An Analysis Based on the Average for Each Experimental Unit

Recall that our model is

$y_{ijk} = \mu + \tau_i + u_{ij} + e_{ijk}, \quad (i = 1, \ldots, t;\ j = 1, \ldots, n;\ k = 1, \ldots, m).$

The average of the observations for experimental unit $ij$ is

$\bar{y}_{ij\cdot} = \mu + \tau_i + u_{ij} + \bar{e}_{ij\cdot}.$

If we define

$\varepsilon_{ij} = u_{ij} + \bar{e}_{ij\cdot} \ \forall\ i, j \quad \text{and} \quad \sigma^2 = \sigma_u^2 + \frac{\sigma_e^2}{m},$

we have

$\bar{y}_{ij\cdot} = \mu + \tau_i + \varepsilon_{ij},$

where the $\varepsilon_{ij}$ terms are i.i.d. $N(0, \sigma^2)$. Thus, averaging the same number ($m$) of observations per experimental unit results in a normal theory Gauss-Markov linear model for the averages $\{\bar{y}_{ij\cdot} : i = 1, \ldots, t;\ j = 1, \ldots, n\}$.
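The equivalence of the unit-means analysis can be sketched numerically: OLS on the unit means reproduces the treatment-mean BLUEs from the full data. A minimal illustration assuming NumPy; all values are illustrative:

```python
import numpy as np

# Simulate balanced data, average within experimental units, and fit
# the cell-means model to the unit means by OLS.
rng = np.random.default_rng(4)
t, n, m = 3, 2, 4
tau = np.array([0.0, 1.0, 2.0])
y = (10.0 + tau[:, None, None]
     + rng.normal(0, 1.0, size=(t, n))[:, :, None]
     + rng.normal(0, 0.5, size=(t, n, m)))

ybar_ij = y.mean(axis=2)                     # unit means: the "data" for OLS
Xbar = np.kron(np.eye(t), np.ones((n, 1)))   # cell-means design for the means
beta_hat, *_ = np.linalg.lstsq(Xbar, ybar_ij.reshape(-1), rcond=None)

# The OLS fit on the means equals the treatment means of the full data.
print(np.allclose(beta_hat, y.mean(axis=(1, 2))))  # True
```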

Inferences about estimable functions of $\beta$ obtained by analyzing these averages are identical to the results obtained using the ANOVA approach, as long as the number of observations per experimental unit is the same for all experimental units.

When using the averages as data, our estimate of $\sigma^2$ is an estimate of $\sigma_u^2 + \frac{\sigma_e^2}{m}$.

We can't separately estimate $\sigma_u^2$ and $\sigma_e^2$, but this doesn't matter if our focus is on inference for estimable functions of $\beta$.

Because

$E(y) = \begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_2 \\ \vdots \\ \mu + \tau_t \end{bmatrix} \otimes 1_{nm \times 1},$

the only estimable quantities are linear combinations of the treatment means $\mu + \tau_1, \mu + \tau_2, \ldots, \mu + \tau_t$, whose Best Linear Unbiased Estimators are $\bar{y}_{1\cdot\cdot}, \bar{y}_{2\cdot\cdot}, \ldots, \bar{y}_{t\cdot\cdot}$, respectively.

Thus, any estimable $C\beta$ can always be written as

$A \begin{bmatrix} \mu + \tau_1 \\ \mu + \tau_2 \\ \vdots \\ \mu + \tau_t \end{bmatrix}$

for some matrix $A$. It follows that the BLUE of $C\beta$ can be written as

$A \begin{bmatrix} \bar{y}_{1\cdot\cdot} \\ \bar{y}_{2\cdot\cdot} \\ \vdots \\ \bar{y}_{t\cdot\cdot} \end{bmatrix}.$

Now note that

$\text{Var}(\bar{y}_{i\cdot\cdot}) = \text{Var}(\mu + \tau_i + \bar{u}_{i\cdot} + \bar{e}_{i\cdot\cdot}) = \text{Var}(\bar{u}_{i\cdot} + \bar{e}_{i\cdot\cdot}) = \text{Var}(\bar{u}_{i\cdot}) + \text{Var}(\bar{e}_{i\cdot\cdot}) = \frac{\sigma_u^2}{n} + \frac{\sigma_e^2}{nm} = \frac{1}{n}\left( \sigma_u^2 + \frac{\sigma_e^2}{m} \right) = \frac{\sigma^2}{n}.$

Thus

$\text{Var}\begin{bmatrix} \bar{y}_{1\cdot\cdot} \\ \bar{y}_{2\cdot\cdot} \\ \vdots \\ \bar{y}_{t\cdot\cdot} \end{bmatrix} = \frac{\sigma^2}{n} I_{t \times t},$

which implies that the variance of the BLUE of $C\beta$ is

$\text{Var}\left( A \begin{bmatrix} \bar{y}_{1\cdot\cdot} \\ \vdots \\ \bar{y}_{t\cdot\cdot} \end{bmatrix} \right) = A\, \frac{\sigma^2}{n} I_{t \times t}\, A' = \frac{\sigma^2}{n} AA'.$

Thus, we don't need separate estimates of $\sigma_u^2$ and $\sigma_e^2$ to carry out inference for estimable $C\beta$. We do need to estimate $\sigma^2 = \sigma_u^2 + \frac{\sigma_e^2}{m}$.

This can equivalently be estimated by $\frac{\text{MS}_{xu(trt)}}{m}$ or by the MSE in an analysis of the experimental unit means $\{\bar{y}_{ij\cdot} : i = 1, \ldots, t;\ j = 1, \ldots, n\}$.

For example, suppose we want to estimate $\tau_1 - \tau_2$. The BLUE is $\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}$, whose variance is

$\text{Var}(\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}) = \text{Var}(\bar{y}_{1\cdot\cdot}) + \text{Var}(\bar{y}_{2\cdot\cdot}) = 2\frac{\sigma^2}{n} = 2\left( \frac{\sigma_u^2}{n} + \frac{\sigma_e^2}{mn} \right) = \frac{2}{mn}(\sigma_e^2 + m\sigma_u^2) = \frac{2\,E(\text{MS}_{xu(trt)})}{mn}.$

Thus,

$\widehat{\text{Var}}(\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}) = \frac{2\,\text{MS}_{xu(trt)}}{mn}.$

A $100(1 - \alpha)\%$ confidence interval for $\tau_1 - \tau_2$ is

$\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot} \pm t_{t(n-1),\ 1-\alpha/2} \sqrt{\frac{2\,\text{MS}_{xu(trt)}}{mn}}.$

A test of $H_0: \tau_1 = \tau_2$ can be based on

$t = \frac{\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}}{\sqrt{\frac{2\,\text{MS}_{xu(trt)}}{mn}}} \sim t_{t(n-1)}\!\left( \frac{\tau_1 - \tau_2}{\sqrt{\frac{2(\sigma_e^2 + m\sigma_u^2)}{mn}}} \right).$
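Computing this interval is mechanical once $\text{MS}_{xu(trt)}$ is in hand. A sketch assuming NumPy, with $t = 3$, $n = 3$, $m = 2$ so that $\text{DF} = t(n-1) = 6$; the critical value $t_{6,\,0.975} \approx 2.447$ is taken from a t table, and all data values are simulated for illustration:

```python
import numpy as np

# 95% CI for tau_1 - tau_2 based on MSxu(trt).
rng = np.random.default_rng(5)
t, n, m = 3, 3, 2
tau = np.array([0.0, 1.5, 3.0])
y = (tau[:, None, None]
     + rng.normal(0, 1.0, size=(t, n))[:, :, None]
     + rng.normal(0, 0.5, size=(t, n, m)))

ybar_i, ybar_ij = y.mean(axis=(1, 2)), y.mean(axis=2)
MSxu = m * np.sum((ybar_ij - ybar_i[:, None]) ** 2) / (t * (n - 1))

est = ybar_i[0] - ybar_i[1]                   # BLUE of tau_1 - tau_2
half = 2.447 * np.sqrt(2 * MSxu / (m * n))    # t_{t(n-1), 0.975} * SE
print(est - half, est + half)
```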

What if the number of observations per experimental unit is not the same for all experimental units?

Let us look at two miniature examples to understand how this type of unbalancedness affects estimation and inference.

First Example

$y = \begin{bmatrix} y_{111} \\ y_{121} \\ y_{211} \\ y_{212} \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$

$X_1 = 1, \quad X_2 = X, \quad X_3 = Z$

$\text{MS}_{trt} = y'(P_2 - P_1)y = 2(\bar{y}_{1\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2 + 2(\bar{y}_{2\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2 = (\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot})^2$

$\text{MS}_{xu(trt)} = y'(P_3 - P_2)y = (y_{111} - \bar{y}_{1\cdot\cdot})^2 + (y_{121} - \bar{y}_{1\cdot\cdot})^2 = \frac{1}{2}(y_{111} - y_{121})^2$

$\text{MS}_{ou(xu,trt)} = y'(I - P_3)y = (y_{211} - \bar{y}_{2\cdot\cdot})^2 + (y_{212} - \bar{y}_{2\cdot\cdot})^2 = \frac{1}{2}(y_{211} - y_{212})^2$

$E(\text{MS}_{trt}) = E(\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot})^2$
$= E(\tau_1 - \tau_2 + \bar{u}_{1\cdot} - u_{21} + \bar{e}_{1\cdot\cdot} - \bar{e}_{2\cdot\cdot})^2$
$= (\tau_1 - \tau_2)^2 + \text{Var}(\bar{u}_{1\cdot}) + \text{Var}(u_{21}) + \text{Var}(\bar{e}_{1\cdot\cdot}) + \text{Var}(\bar{e}_{2\cdot\cdot})$
$= (\tau_1 - \tau_2)^2 + \frac{\sigma_u^2}{2} + \sigma_u^2 + \frac{\sigma_e^2}{2} + \frac{\sigma_e^2}{2}$
$= (\tau_1 - \tau_2)^2 + 1.5\sigma_u^2 + \sigma_e^2$

$E(\text{MS}_{xu(trt)}) = \frac{1}{2}E(y_{111} - y_{121})^2 = \frac{1}{2}E(u_{11} - u_{12} + e_{111} - e_{121})^2 = \frac{1}{2}(2\sigma_u^2 + 2\sigma_e^2) = \sigma_u^2 + \sigma_e^2$

$E(\text{MS}_{ou(xu,trt)}) = \frac{1}{2}E(y_{211} - y_{212})^2 = \frac{1}{2}E(e_{211} - e_{212})^2 = \sigma_e^2$
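The EMS result for this tiny example, $E(\text{MS}_{trt}) = (\tau_1 - \tau_2)^2 + 1.5\sigma_u^2 + \sigma_e^2$, can be checked by Monte Carlo. A sketch assuming NumPy; $\mu$ is set to 0 since it cancels in $\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}$, and all parameter values are illustrative:

```python
import numpy as np

# Simulate the 4-observation example many times and average MStrt.
rng = np.random.default_rng(6)
tau1, tau2 = 0.0, 1.0
sigma_u2, sigma_e2 = 1.0, 0.5

MStrt = []
for _ in range(20000):
    u11, u12, u21 = rng.normal(0, np.sqrt(sigma_u2), size=3)
    e = rng.normal(0, np.sqrt(sigma_e2), size=4)   # e111, e121, e211, e212
    y111, y121 = tau1 + u11 + e[0], tau1 + u12 + e[1]
    y211, y212 = tau2 + u21 + e[2], tau2 + u21 + e[3]
    ybar1, ybar2 = (y111 + y121) / 2, (y211 + y212) / 2
    MStrt.append((ybar1 - ybar2) ** 2)             # MStrt = (ybar1 - ybar2)^2

expected = (tau1 - tau2) ** 2 + 1.5 * sigma_u2 + sigma_e2   # = 3.0
print(round(np.mean(MStrt), 1))   # near expected = 3.0
```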

SOURCE        EMS
trt           $(\tau_1 - \tau_2)^2 + 1.5\sigma_u^2 + \sigma_e^2$
xu(trt)       $\sigma_u^2 + \sigma_e^2$
ou(xu, trt)   $\sigma_e^2$

$F = \frac{\text{MS}_{trt}}{1.5\sigma_u^2 + \sigma_e^2} \bigg/ \frac{\text{MS}_{xu(trt)}}{\sigma_u^2 + \sigma_e^2} \sim F_{1,1}\!\left( \frac{(\tau_1 - \tau_2)^2}{1.5\sigma_u^2 + \sigma_e^2} \right)$

The test statistic that we used to test $H_0: \tau_1 = \cdots = \tau_t$ in the balanced case is not F distributed in this unbalanced case:

$\frac{\text{MS}_{trt}}{\text{MS}_{xu(trt)}} \sim \frac{1.5\sigma_u^2 + \sigma_e^2}{\sigma_u^2 + \sigma_e^2}\, F_{1,1}\!\left( \frac{(\tau_1 - \tau_2)^2}{1.5\sigma_u^2 + \sigma_e^2} \right).$

A Statistic with an Approximate F Distribution

We'd like our denominator to be an unbiased estimator of $1.5\sigma_u^2 + \sigma_e^2$ in this case.

Consider $1.5\,\text{MS}_{xu(trt)} - 0.5\,\text{MS}_{ou(xu,trt)}$. The expectation is

$1.5(\sigma_u^2 + \sigma_e^2) - 0.5\sigma_e^2 = 1.5\sigma_u^2 + \sigma_e^2.$

The ratio

$\frac{\text{MS}_{trt}}{1.5\,\text{MS}_{xu(trt)} - 0.5\,\text{MS}_{ou(xu,trt)}}$

can be used as an approximate F statistic with 1 numerator DF and a denominator DF obtained using the Cochran-Satterthwaite method.
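The Cochran-Satterthwaite denominator DF for a linear combination $\sum_i c_i \text{MS}_i$ is $\nu = (\sum_i c_i \text{MS}_i)^2 / \sum_i (c_i \text{MS}_i)^2 / \text{df}_i$. A sketch assuming NumPy-free plain Python; the function name and the mean-square values are illustrative, not from real data:

```python
# Satterthwaite DF for the linear combination 1.5*MSxu - 0.5*MSou
# (coefficients from the EMS table above; both DF are 1 in this example).
def satterthwaite_df(coefs, mean_squares, dfs):
    # nu = (sum c_i MS_i)^2 / sum_i (c_i MS_i)^2 / df_i
    num = sum(c * ms for c, ms in zip(coefs, mean_squares)) ** 2
    den = sum((c * ms) ** 2 / df for c, ms, df in zip(coefs, mean_squares, dfs))
    return num / den

MSxu, MSou = 2.0, 0.8                  # illustrative mean-square values
denom = 1.5 * MSxu - 0.5 * MSou        # estimates 1.5 sigma_u^2 + sigma_e^2
nu = satterthwaite_df([1.5, -0.5], [MSxu, MSou], [1, 1])
print(round(denom, 2), round(nu, 2))   # 2.6 0.74
```

Note how small the approximate DF is here, consistent with the warning below about this tiny dataset.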

The Cochran-Satterthwaite method will be explained in the next set of notes.

We should not expect this approximate F-test to be reliable in this case because of our pitifully small dataset.

Best Linear Unbiased Estimates in this First Example

What do the BLUEs of the treatment means look like in this case? Recall

$\beta = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$

$\Sigma = \text{Var}(y) = ZGZ' + R = \sigma_u^2 ZZ' + \sigma_e^2 I$

$= \sigma_u^2 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} + \sigma_e^2 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

$= \begin{bmatrix} \sigma_u^2 + \sigma_e^2 & 0 & 0 & 0 \\ 0 & \sigma_u^2 + \sigma_e^2 & 0 & 0 \\ 0 & 0 & \sigma_u^2 + \sigma_e^2 & \sigma_u^2 \\ 0 & 0 & \sigma_u^2 & \sigma_u^2 + \sigma_e^2 \end{bmatrix}$

It follows that

$\hat{\beta}_\Sigma = (X'\Sigma^{-1}X)^- X'\Sigma^{-1}y = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix} y = \begin{bmatrix} \bar{y}_{1\cdot\cdot} \\ \bar{y}_{2\cdot\cdot} \end{bmatrix}.$

Fortunately, this is a linear estimator that does not depend on unknown variance components.
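That the GLS coefficient matrix reduces to simple group averaging, whatever the variance components, can be verified directly. A sketch assuming NumPy; the variance-component values are arbitrary illustrations:

```python
import numpy as np

# First example: GLS with the true Sigma reduces to the group means,
# independent of sigma_u^2 and sigma_e^2.
su2, se2 = 2.0, 0.7                              # illustrative values
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
Sigma = su2 * Z @ Z.T + se2 * np.eye(4)

Si = np.linalg.inv(Sigma)
A = np.linalg.inv(X.T @ Si @ X) @ X.T @ Si       # GLS coefficient matrix
print(np.allclose(A, [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]]))  # True
```

Try other values of `su2` and `se2`: the coefficient matrix stays the same, which is exactly why this GLS estimator is linear.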

Second Example

$y = \begin{bmatrix} y_{111} \\ y_{112} \\ y_{121} \\ y_{211} \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

In this case, it can be shown that

$\hat{\beta}_\Sigma = (X'\Sigma^{-1}X)^- X'\Sigma^{-1}y = \begin{bmatrix} \frac{\sigma_e^2 + \sigma_u^2}{3\sigma_e^2 + 4\sigma_u^2} & \frac{\sigma_e^2 + \sigma_u^2}{3\sigma_e^2 + 4\sigma_u^2} & \frac{\sigma_e^2 + 2\sigma_u^2}{3\sigma_e^2 + 4\sigma_u^2} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} y_{111} \\ y_{112} \\ y_{121} \\ y_{211} \end{bmatrix} = \begin{bmatrix} \frac{2\sigma_e^2 + 2\sigma_u^2}{3\sigma_e^2 + 4\sigma_u^2}\, \bar{y}_{11\cdot} + \frac{\sigma_e^2 + 2\sigma_u^2}{3\sigma_e^2 + 4\sigma_u^2}\, y_{121} \\ y_{211} \end{bmatrix}.$

It is straightforward to show that the weights on $\bar{y}_{11\cdot}$ and $y_{121}$ are

$\frac{1/\text{Var}(\bar{y}_{11\cdot})}{1/\text{Var}(\bar{y}_{11\cdot}) + 1/\text{Var}(y_{121})} \quad \text{and} \quad \frac{1/\text{Var}(y_{121})}{1/\text{Var}(\bar{y}_{11\cdot}) + 1/\text{Var}(y_{121})},$

respectively.

This is a special case of a more general phenomenon: the BLUE is a weighted average of independent linear unbiased estimators, with weights proportional to the inverse variances of those estimators.
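The inverse-variance characterization of the weights can be checked against the closed-form expressions above. A sketch assuming NumPy; the variance-component values are arbitrary illustrations:

```python
import numpy as np

# Verify the GLS weights in the second example equal the normalized
# inverse variances of ybar_11. and y_121.
su2, se2 = 1.0, 0.5                       # sigma_u^2, sigma_e^2 (illustrative)

w1_formula = (2 * se2 + 2 * su2) / (3 * se2 + 4 * su2)
w2_formula = (se2 + 2 * su2) / (3 * se2 + 4 * su2)

v1 = su2 + se2 / 2                        # Var(ybar_11.): mean of 2 obs, shared u_11
v2 = su2 + se2                            # Var(y_121)
w1 = (1 / v1) / (1 / v1 + 1 / v2)
w2 = (1 / v2) / (1 / v1 + 1 / v2)

print(np.isclose(w1, w1_formula), np.isclose(w2, w2_formula))  # True True
```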

Of course, in this case and in many others,

$\hat{\beta}_\Sigma = \begin{bmatrix} \frac{2\sigma_e^2 + 2\sigma_u^2}{3\sigma_e^2 + 4\sigma_u^2}\, \bar{y}_{11\cdot} + \frac{\sigma_e^2 + 2\sigma_u^2}{3\sigma_e^2 + 4\sigma_u^2}\, y_{121} \\ y_{211} \end{bmatrix}$

is not an estimator because it is a function of unknown parameters. Thus, we use $\hat{\beta}_{\hat{\Sigma}}$ as our estimator (i.e., we replace $\sigma_e^2$ and $\sigma_u^2$ by estimates in the expression above).

$\hat{\beta}_{\hat{\Sigma}}$ is an approximation to the BLUE.

$\hat{\beta}_{\hat{\Sigma}}$ is not even a linear estimator in this case.

Its exact distribution is unknown.

When sample sizes are large, it is reasonable to assume that the distribution of $\hat{\beta}_{\hat{\Sigma}}$ is approximately the same as the distribution of $\hat{\beta}_\Sigma$.

$\text{Var}(\hat{\beta}_\Sigma) = \text{Var}[(X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}y]$
$= (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}\, \text{Var}(y)\, [(X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}]'$
$= (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}\, \Sigma\, \Sigma^{-1} X (X'\Sigma^{-1}X)^{-1}$
$= (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1} X (X'\Sigma^{-1}X)^{-1}$
$= (X'\Sigma^{-1}X)^{-1}$

$\text{Var}(\hat{\beta}_{\hat{\Sigma}}) = \text{Var}[(X'\hat{\Sigma}^{-1}X)^{-1} X'\hat{\Sigma}^{-1}y] \overset{????}{=} (X'\hat{\Sigma}^{-1}X)^{-1}$

Summary of Main Points

Many of the concepts we have seen by examining special cases hold in greater generality.

For many of the linear mixed models commonly used in practice, balanced data are nice because...

It is relatively easy to determine degrees of freedom, sums of squares, and expected mean squares in an ANOVA table.

Ratios of appropriate mean squares can be used to obtain exact F-tests.

For estimable $C\beta$, $C\hat{\beta}_{\hat{\Sigma}} = C\hat{\beta}$ (OLS = GLS).

When $\text{Var}(c'\hat{\beta}) = \text{constant} \cdot E(\text{MS})$, exact inferences about $c'\beta$ can be obtained by constructing t-tests or confidence intervals based on

$t = \frac{c'\hat{\beta} - c'\beta}{\sqrt{\text{constant} \cdot \text{MS}}} \sim t_{\text{DF(MS)}}.$

Simple analysis based on experimental unit averages gives the same results as those obtained by linear mixed model analysis of the full data set.

When data are unbalanced, the analysis of linear mixed models may be considerably more complicated.

Approximate F-tests can be obtained by forming linear combinations of mean squares to obtain denominators for test statistics.

The estimator $C\hat{\beta}_{\hat{\Sigma}}$ may be a nonlinear estimator of $C\beta$ whose exact distribution is unknown.

Approximate inference for $C\beta$ is often obtained by using the distribution of $C\hat{\beta}_\Sigma$, with unknowns in that distribution replaced by estimates.

Whether data are balanced or unbalanced, unbiased estimators of variance components can be obtained using linear combinations of mean squares from the ANOVA table.
