Experimental Design: Unified Concepts, Practical Applications, and Computer Implementation
Bruce L. Bowerman, Richard T. O'Connell, and Emily S. Murphree
Abstract
Experimental Design: Unified Concepts, Practical Applications, and Computer Implementation is a concise and innovative book that gives a complete
presentation of the design and analysis of experiments in approximately
one half the space of competing books. With only the modest prerequisite
of a basic (noncalculus) statistics course, this text is appropriate for the
widest possible audience.
Keywords
experimental design, fractional factorials, Latin square designs, nested
designs, one factor analysis, one-way ANOVA, randomized block design,
response surfaces, split plot design, two factor analysis, two level factorial
designs, two-way ANOVA
Contents
Preface ......... ix
Chapter 1  An Introduction to Experimental Design: One Factor Analysis ......... 1
Chapter 2  Two Factor Analysis ......... 45
Chapter 3
Chapter 4  Two Level Factorials, Fractional Factorials, Block Confounding, and Response Surfaces ......... 179
Appendix A  Statistical Tables ......... 249
References ......... 257
Index ......... 259
Preface
Experimental Design: Unified Concepts, Practical Applications, and Computer Implementation is a concise and innovative book that gives a complete presentation of the design and analysis of experiments in approximately one half the space of competing books. With only the modest prerequisite of a basic (noncalculus) statistics course, this text is appropriate for the widest possible audience: college juniors, seniors, and first year graduate students in business, the social sciences, the sciences and statistics, as well as professionals in business and industry. Using a unique and integrative approach, this text organizes and presents the two procedures for analyzing experimental design data, analysis of variance (ANOVA) and regression analysis, in such a way that the reader or instructor can move through the material more quickly and efficiently than when using competing books and so that the true advantages of both ANOVA and regression analysis are made clearer.
Because ANOVA is more intuitive, this book devotes most of its first
three chapters to showing how to use ANOVA to analyze the type of
experimental design data that it can be validly used to analyze: balanced
(equal sample size) data or unbalanced (unequal sample size) data from
one factor studies, balanced data from two factor studies (two-way
factorials and randomized block designs), and balanced data from three
or more factor studies. Chapter 3 includes a general ANOVA procedure
for analyzing balanced data experiments.
Regression analysis can be used to analyze almost any balanced or
unbalanced data experiment but is less intuitive than ANOVA. Therefore,
this book waits to discuss regression analysis until it is needed to analyze
data that cannot be analyzed by ANOVA. This is in Section 2.4, where
the analysis of unbalanced data resulting from two-way factorials is discussed. Waiting until Section 2.4 gives more space to explain regression
analysis from first principles to readers who have little or no background
in this subject and also allows concise discussion of the regression analyses of one factor studies and incomplete block designs. Section 2.4 also
CHAPTER 1
An Introduction to
Experimental Design:
One Factor Analysis
1.1 Basic Concepts of Experimental Design
In many statistical studies a variable of interest, called the response
variable (or dependent variable), is identified. Then data are collected
that tell us about how one or more factors (or independent variables)
influence the variable of interest. If we cannot control the factor(s) being
studied, we say that the data obtained are observational. For example,
suppose that in order to study how the size of a home relates to the sales
price of the home, a real estate agent randomly selects 50 recently sold
homes and records the square footages and sales prices of these homes.
Because the real estate agent cannot control the sizes of the randomly
selected homes, we say that the data are observational.
If we can control the factors being studied, we say that the data are
experimental. Furthermore, in this case the values, or levels, of the
factor (or combination of factors) are called treatments. The purpose
of most experiments is to compare and estimate the effects of the different treatments on the response variable. For example, suppose that an
oil company wishes to study how three different gasoline types (A, B,
and C ) affect the mileage obtained by a popular compact automobile
model. Here the response variable is gasoline mileage, and the company
will study a single factor: gasoline type. Because the oil company can
control which gasoline type is used in the compact automobile, the data
that the oil company will collect are experimental. Furthermore, the
Table 1.1 The gasoline mileage data

Gasoline type A: yA1 = 34.0, yA2 = 35.0, yA3 = 34.3, yA4 = 35.5, yA5 = 35.8
Gasoline type B: yB1 = 35.3, yB2 = 36.5, yB3 = 36.4, yB4 = 37.0, yB5 = 37.6
Gasoline type C: yC1 = 33.3, yC2 = 34.0, yC3 = 34.7, yC4 = 33.0, yC5 = 34.9

[Box plots of the mileages for gasoline types A, B, and C appear below the data in Table 1.1.]
selected from the remaining 990 available Lances. These autos will be
assigned to gasoline type C .
Each randomly selected Lance is test driven using the appropriate gasoline type (treatment) under normal conditions for a specified distance,
and the gasoline mileage for each test drive is measured. We let yij denote
the jth mileage obtained when using gasoline type i. The mileage data
obtained are given in Table 1.1. Here we assume that the set of gasoline
mileage observations obtained by using a particular gasoline type is a sample randomly selected from the infinite population of all Lance mileages
that could be obtained using that gasoline type. Examining the box plots
shown below the mileage data, we see some evidence that gasoline type B
yields the highest gasoline mileages.
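The pattern suggested by the box plots can be checked directly from the Table 1.1 data. The following sketch is illustrative only (the book's own computer implementation uses MINITAB, Excel, and SAS); the data values are transcribed from Table 1.1.

```python
# Sample treatment means for the gasoline mileage data of Table 1.1.
# (Illustrative sketch; data values transcribed from the text.)
mileage = {
    "A": [34.0, 35.0, 34.3, 35.5, 35.8],
    "B": [35.3, 36.5, 36.4, 37.0, 37.6],
    "C": [33.3, 34.0, 34.7, 33.0, 34.9],
}

means = {gas: sum(y) / len(y) for gas, y in mileage.items()}
print(means)  # type B has the largest sample treatment mean
```

The sample treatment means, 34.92, 36.56, and 33.98, agree with the evidence from the box plots that gasoline type B yields the highest mileages.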
Because the treatment effects sum to zero, that is,

Σ_{i=1}^p ti = 0

this implies that m., which we define to be the mean of the treatment means, is

m. = ( Σ_{i=1}^p mi ) / p = ( Σ_{i=1}^p (m + ti) ) / p = m + ( Σ_{i=1}^p ti ) / p = m

That is, the previously considered overall mean m is equal to m., the mean of the treatment means. Moreover, because mi = m + ti, the treatment effect ti is equal to mi − m = mi − m., the difference between the ith treatment mean and the mean of the treatment means.
The point estimate of the treatment mean mi is the sample treatment mean

ȳi = ( Σ_{j=1}^{ni} yij ) / ni

and the sample treatment standard deviation is

si = √[ Σ_{j=1}^{ni} (yij − ȳi)² / (ni − 1) ]
between treatment means. The validity of these formulas requires that the
following three ANOVA assumptions hold:
1. Constant variance: the p populations of values of the response variable associated with the treatments have equal variances. We denote the constant variance as s 2.
2. Normality: the p populations of values of the response variable associated with the treatments all have normal distributions.
3. Independence: the different yij response variable values are statistically independent of each other.
Because the previously described process of randomly assigning experimental units to the treatments implies that each yij can be assumed to be
a randomly selected response variable value, the ANOVA assumptions
say that each yij is assumed to have been randomly and independently
selected from a population of response variable values that is normally
distributed with mean mi and variance s 2. Stated in terms of the error
term of the one factor model yij = mi + eij , the ANOVA assumptions say
that each eij is assumed to have been randomly and independently selected
from a population of error term values that is normally distributed with
mean zero and variance s 2.
The one-way ANOVA results are not very sensitive to violations of
the equal variances assumption. Studies have shown that this is particularly true when the sample sizes employed are equal (or nearly equal).
Therefore, a good way to make sure that unequal variances will not be a
problem is to take samples that are the same size. In addition, it is useful
to compare the sample standard deviations s1 , s2 ,..., s p to see if they are
reasonably equal. As a general rule, the one-way ANOVA results will be
approximately correct if the largest sample standard deviation is no more than
twice the smallest sample standard deviation. The variations of the samples
can also be compared by constructing a box plot for each sample (as we
have done for the gasoline mileage data in Table 1.1). Several statistical
tests also employ the sample variances to test the equality of the population variances. See Section 1.3.
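The rule of thumb above can be checked with a short calculation. This is an illustrative sketch, not part of the original text; the data values are transcribed from Table 1.1.

```python
import statistics

# Rule-of-thumb check of the equal-variances assumption for the Table 1.1
# data: the largest sample standard deviation should be no more than twice
# the smallest. (Illustrative sketch.)
mileage = {
    "A": [34.0, 35.0, 34.3, 35.5, 35.8],
    "B": [35.3, 36.5, 36.4, 37.0, 37.6],
    "C": [33.3, 34.0, 34.7, 33.0, 34.9],
}

sds = {gas: statistics.stdev(y) for gas, y in mileage.items()}
ratio = max(sds.values()) / min(sds.values())
print(sds, ratio)  # ratio is well below 2, so the rule of thumb is satisfied
```

The three sample standard deviations (about .766, .850, and .835) are reasonably equal, so unequal variances should not be a problem here.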
The normality assumption says that each of the p populations is normally distributed. This assumption is not crucial. It has been shown that
This alternative says that at least two treatments have different effects on
the mean response.
To carry out such a test, we compare what we call the between-
treatment variability to the within-treatment variability. We can
understand and numerically measure these two types of variability by
defining several sums of squares and mean squares. To begin to do this
we define n to be the total number of experimental units employed in the
one-way ANOVA, and we define ȳ to be the overall mean of all observed values of the response variable. Then we define the following:
The treatment sum of squares is

SST = Σ_{i=1}^p ni (ȳi − ȳ)²
In order to compute SST, we calculate the difference between each sample treatment mean ȳi and the overall mean ȳ, we square each of these differences, we multiply each squared difference by the number of observations for that treatment, and we sum over all treatments. The SST measures the variability of the sample treatment means. For instance, if all the sample treatment means (ȳi values) were equal, then the treatment sum of squares would be equal to 0. The more the ȳi values vary, the larger will be SST. In other words, the treatment sum of squares measures the amount of between-treatment variability.
As an example, consider the gasoline mileage data in Table 1.1. In this
experiment we employ a total of
n = nA + nB + nC = 5 + 5 + 5 = 15
experimental units. Furthermore, the overall mean of the 15 observed gasoline mileages is

ȳ = 35.153

Then

SST = Σ_{i=A,B,C} ni (ȳi − ȳ)²
    = nA(ȳA − ȳ)² + nB(ȳB − ȳ)² + nC(ȳC − ȳ)²
    = 5(34.92 − 35.153)² + 5(36.56 − 35.153)² + 5(33.98 − 35.153)²
    = 17.0493
In order to measure the within-treatment variability, we define the following quantity:
The error sum of squares is

SSE = Σ_{j=1}^{n1} (y1j − ȳ1)² + Σ_{j=1}^{n2} (y2j − ȳ2)² + ... + Σ_{j=1}^{np} (ypj − ȳp)²

Here y1j is the jth observed value of the response in the first sample, y2j is the jth observed value of the response in the second sample, and so forth. The previous formula says that we compute SSE by calculating the squared difference between each observed value of the response and its
corresponding sample treatment mean and by then summing these squared differences over all observations in the experiment. For the gasoline mileage data, SSE = 8.028.
The total sum of squares is

SSTO = Σ_{i=1}^p Σ_{j=1}^{ni} (yij − ȳ)²
It can be shown that SSTO is the sum of SST and SSE . That is:
SSTO = SST + SSE
This says that the total variability in the observed values of the response must come from one of two sources: the between-treatment variability or the within-treatment variability. Therefore, the SST and SSE are said to
partition the total sum of squares. For the gasoline mileage study
SSTO = SST + SSE = 17.0493 + 8.028 = 25.0773
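The partition of the total sum of squares can be verified numerically. The following sketch (illustrative only; the book's computer implementation uses MINITAB, Excel, and SAS) recomputes all three sums of squares from the Table 1.1 data.

```python
# Verifying the partition SSTO = SST + SSE for the gasoline mileage data.
# (Illustrative sketch; data transcribed from Table 1.1.)
samples = [
    [34.0, 35.0, 34.3, 35.5, 35.8],   # gasoline type A
    [35.3, 36.5, 36.4, 37.0, 37.6],   # gasoline type B
    [33.3, 34.0, 34.7, 33.0, 34.9],   # gasoline type C
]

all_y = [y for s in samples for y in s]
ybar = sum(all_y) / len(all_y)        # overall mean, about 35.153

SST = sum(len(s) * (sum(s) / len(s) - ybar) ** 2 for s in samples)
SSE = sum((y - sum(s) / len(s)) ** 2 for s in samples for y in s)
SSTO = sum((y - ybar) ** 2 for y in all_y)

print(SST, SSE, SSTO)
assert abs(SSTO - (SST + SSE)) < 1e-9   # the partition holds
```

The computed values, 17.0493, 8.028, and 25.0773, match those given in the text.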
Using the treatment and error sums of squares, we next define two mean squares. The treatment mean square is

MST = SST/(p − 1)

and the error mean square is

MSE = SSE/(n − p)

To decide how large the ratio MST/MSE must be in order to reject H0, it can be shown that

E(MST) = s² + [ Σ_{i=1}^p ni (mi − m.)² ] / (p − 1)   and   E(MSE) = s²

If H0 is true, the part of E(MST) after the plus sign equals zero and thus E(MST) = s². This implies that E(MST)/E(MSE) = 1. On the other hand, if H0 is not true, the part of E(MST) after the plus sign is greater than 0 and thus E(MST) > s². This implies that E(MST)/E(MSE) > 1. We conclude that values of F = MST/MSE that are large (substantially greater than 1)
would lead us to reject H0. Specifically, define the F statistic

F = MST/MSE = [SST/(p − 1)] / [SSE/(n − p)]
Also define the p-value related to F to be the area under the curve of the F distribution having p − 1 numerator and n − p denominator degrees of freedom to the right of F.
Then, we can reject H0 in favor of Ha at level of significance a if either of the following equivalent conditions holds:
1. F > Fa
2. p-value < a
Here, Fa is the point on the horizontal axis under the curve of the F distribution having p − 1 numerator and n − p denominator degrees of freedom that gives a right-hand tail area equal to a.
[Figure: The curve of the F distribution having p − 1 numerator and n − p denominator degrees of freedom, with rejection point Fa cutting off a right-hand tail area equal to a, the level of significance. (a) If F(model) > Fa, reject H0 in favor of Ha; if F(model) ≤ Fa, do not reject H0. (b) If the p-value is smaller than a, then F(model) > Fa and we reject H0.]
For the gasoline mileage data,

MST = SST/(p − 1) = 17.0493/(3 − 1) = 8.525

and

MSE = SSE/(n − p) = 8.028/(15 − 3) = 0.669

It follows that

F = MST/MSE = 8.525/0.669 = 12.74
Table 1.2 ANOVA table for the gasoline mileage data

Source      Degrees of freedom    Sums of squares   Mean squares                  F statistic            p-value
Treatments  p − 1 = 3 − 1 = 2     SST = 17.0493     MST = SST/(p − 1) = 8.525     F = MST/MSE = 12.74    0.001
Error       n − p = 15 − 3 = 12   SSE = 8.028       MSE = SSE/(n − p) = 0.669
Total       n − 1 = 15 − 1 = 14   SSTO = 25.0773
mB − mA, mA − mC, and mB − mC. Here, for instance, the pairwise difference mB − mA can be interpreted as the change in mean mileage achieved by changing from using gasoline type A to using gasoline type B.
There are two approaches to calculating confidence intervals for pairwise differences. The first involves computing the usual, or individual, confidence interval for each pairwise difference. Here, if we are computing 100(1 − a) percent confidence intervals, we are 100(1 − a) percent confident that each individual pairwise difference is contained in its respective interval. That is, the confidence level associated with each (individual) comparison is 100(1 − a) percent, and we refer to a as the comparisonwise error rate. However, we are less than 100(1 − a) percent confident that all of the pairwise differences are simultaneously contained in their respective intervals. A more conservative approach is to compute simultaneous confidence intervals. Such intervals make us 100(1 − a) percent confident that all of the pairwise differences are simultaneously contained in their respective intervals. That is, when we compute simultaneous intervals, the overall confidence level associated with all the comparisons being made in the experiment is 100(1 − a) percent, and we refer to a as the experimentwise error rate.
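The gap between the comparisonwise and experimentwise error rates can be made concrete with a small calculation. If g interval statements were computed independently, each with error rate a, the chance that at least one of them fails would be 1 − (1 − a)^g. The comparisons in an actual experiment are not independent, so this is only a rough illustration of why the experimentwise rate grows with the number of comparisons, not a formula from the text.

```python
# Rough illustration: if g interval statements were independent, each with
# comparisonwise error rate a, the overall (experimentwise) error rate
# would be 1 - (1 - a)**g, which grows quickly with g.
a = 0.05
for g in (1, 2, 4, 6):
    print(g, round(1 - (1 - a) ** g, 4))
```

With g = 4 independent statements, the overall error rate would already be about .185, far above the .05 comparisonwise rate.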
Figure 1.2 MINITAB and Excel output of an analysis of variance of the gasoline mileage data in Table 1.1

[The outputs show, for each gasoline type, n = 5, the sample means 34.92, 36.56, and 33.98, the sample standard deviations 0.766, 0.850, and 0.835, the ANOVA table (df 2 and 12; SS 17.0493 and 8.028; MS 8.525 and 0.669; F = 12.74; p-value = 0.0011; F crit = 3.8853), individual 95% confidence intervals for each mean based on the pooled standard deviation, and Tukey simultaneous 95% confidence intervals for the pairwise differences.]
Figure 1.3 SAS output of an analysis of variance of the gasoline mileage data in Table 1.1

[The output shows the ANOVA table (Model df = 2, Error df = 12, Corrected Total df = 14; Sums of Squares 17.0493, 8.028, and 25.0773; Mean Squares 8.5247 and 0.6690; F Value = 12.74; Pr > F = 0.0011; Root MSE = 0.8179) and estimates of the pairwise differences MUB−MUA = 1.64, MUA−MUC = 0.94, and MUB−MUC = 2.58 and of MUB−(MUC+MUA)/2 = 2.11, each with its standard error, t statistic, and p-value.]

[Figure 1.4, also SAS output, gives the predicted value 36.56 for gasoline type B together with the 95% confidence limits for the mean, [35.763, 37.357], and for an individual value, [34.608, 38.512].]
An individual 100(1 − a) percent confidence interval for the pairwise difference mi − mh is

(ȳi − ȳh) ± ta/2 √( MSE (1/ni + 1/nh) )

A Tukey simultaneous 100(1 − a) percent confidence interval for mi − mh (for equal sample sizes of m) is

(ȳi − ȳh) ± qa √( MSE/m )

A 100(1 − a) percent confidence interval for the treatment mean mi is

ȳi ± ta/2 √( MSE/ni )

5. A point prediction of yi0 = mi + ei0, a randomly selected individual value of the response variable when using treatment i, is ȳi, and a 100(1 − a) percent prediction interval for yi0 is

ȳi ± ta/2 √( MSE (1 + 1/ni) )
Note that, because the ANOVA assumptions imply that the error term
ei 0 is assumed to be randomly selected from a normally distributed
population of error term values having mean zero, ei 0 has a fifty percent
chance of being positive and a fifty percent chance of being negative.
Therefore, we predict ei 0 to be zero, and this implies that the point
Example 1.3
In the gasoline mileage study, we are comparing p = 3 treatment means
(mA, mB, and mC). Furthermore, each sample is of size m = 5, there are a total of n = 15 observed gas mileages, and the MSE found in Table 1.2 is .669. Because q.05 = 3.77 is the entry found in Table A3 corresponding to p = 3 and n − p = 12, a Tukey simultaneous 95 percent confidence interval for mB − mA is
(ȳB − ȳA) ± q.05 √(MSE/m) = (36.56 − 34.92) ± 3.77 √(.669/5)
= [1.64 ± 1.379]
= [.261, 3.019]

Similarly, Tukey simultaneous 95 percent confidence intervals for mA − mC and mB − mC are, respectively,

[(ȳA − ȳC) ± 1.379] = [(34.92 − 33.98) ± 1.379] = [.94 ± 1.379] = [−.439, 2.319]

and

[(ȳB − ȳC) ± 1.379] = [(36.56 − 33.98) ± 1.379] = [2.58 ± 1.379] = [1.201, 3.959]
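These Tukey intervals can be reproduced without a printed table by using SciPy's studentized range distribution (scipy.stats.studentized_range, available in recent SciPy versions) in place of Table A3. This is an illustrative sketch, not the book's own computer implementation.

```python
import math
from scipy.stats import studentized_range

# Tukey simultaneous 95% intervals for the gasoline mileage study, using
# scipy.stats.studentized_range instead of Table A3. (Illustrative sketch.)
p_treat, dfe, m, MSE = 3, 12, 5, 0.669
q = studentized_range.ppf(0.95, p_treat, dfe)   # about 3.77, as in Table A3
half = q * math.sqrt(MSE / m)                   # half-length, about 1.379

means = {"A": 34.92, "B": 36.56, "C": 33.98}
for i, h in (("B", "A"), ("A", "C"), ("B", "C")):
    d = means[i] - means[h]
    print(f"m{i} - m{h}: [{d - half:.3f}, {d + half:.3f}]")
```

Because the interval for mB − mA lies entirely above zero, this computation confirms with 95 percent confidence that mB exceeds mA.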
A 95 percent confidence interval for mB is

ȳB ± t.025 √(MSE/nB) = 36.56 ± 2.179 √(.669/5)
= [35.763, 37.357]
This interval says we can be 95 percent confident that the mean mileage obtained by all Lances using gasoline type B is between 35.763 and
37.357 mpg. Also, a 95 percent prediction interval for yB 0 = mB + eB 0,
the mileage obtained by a randomly selected individual Lance when
driven using gasoline type B, is
ȳB ± t.025 √(MSE(1 + 1/nB)) = 36.56 ± 2.179 √(.669(1 + 1/5))
= [34.608, 38.512]
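Both intervals for gasoline type B can be recomputed with SciPy's t distribution in place of a t table. This sketch is illustrative only and assumes SciPy is available.

```python
import math
from scipy.stats import t

# 95% confidence interval for the type B mean mileage and 95% prediction
# interval for an individual type B mileage. (Illustrative sketch.)
ybarB, nB, MSE, dfe = 36.56, 5, 0.669, 12
tval = t.ppf(0.975, dfe)                     # about 2.179

ci_half = tval * math.sqrt(MSE / nB)
pi_half = tval * math.sqrt(MSE * (1 + 1 / nB))
ci = (ybarB - ci_half, ybarB + ci_half)
pi = (ybarB - pi_half, ybarB + pi_half)
print(ci, pi)   # about (35.763, 37.357) and (34.608, 38.512)
```

Note that the prediction interval is wider than the confidence interval, because it must account for the error term of an individual observation as well as the uncertainty in estimating mB.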
Notice that the 95 percent confidence interval for mB is graphed on the
MINITAB output of Figure 1.2, and both the 95 percent confidence
interval for mB and the 95 percent prediction interval for an individual
Lance mileage using gasoline type B are given on the SAS output in Figure 1.4. The MINITAB output also shows the 95 percent confidence
intervals for mA and mC , and a typical SAS output would also give these
intervals, but to save space we have omitted them. Also, the MINITAB
output gives Tukey simultaneous 95 percent intervals. For example, consider finding the Tukey interval for mB − mA on the MINITAB output. To do this, we look in the table corresponding to "Type A subtracted from" and find the row in this table labeled "Type B." This row gives the interval for Type A subtracted from Type B, that is, the interval for mB − mA. This interval is [.261, 3.019], as previously calculated. Finally, note that the half-length of the individual 95 percent confidence interval for a pairwise comparison is (because nA = nB = nC = 5)
t.025 √( MSE (1/ni + 1/nh) ) = 2.179 √( .669 (1/5 + 1/5) ) = 1.127
This half-length implies that the individual intervals are shorter than
the previously constructed Tukey intervals, which have a half-length
of 1.379. Recall, however, that the Tukey intervals are short enough to
allow us to conclude with 95 percent confidence that mB is greater than
mA and mC .
We next suppose in the gasoline mileage situation that gasoline type B contains a chemical, Chemical XX, that is not contained in gasoline types A or C. To assess the effect of Chemical XX on gasoline mileage, we consider

mB − (mC + mA)/2

This is the difference between the mean mileage obtained by using gasoline type B and the average of the mean mileages obtained by using gasoline types C and A. Note that

mB − (mC + mA)/2 = (−1/2)mA + (1)mB + (−1/2)mC
                 = aA mA + aB mB + aC mC
                 = Σ_{l=A,B,C} al ml
In general, a linear combination of the treatment means is

Σ_{i=1}^p ai mi = a1 m1 + a2 m2 + ... + ap mp

A point estimate of Σ_{i=1}^p ai mi is Σ_{i=1}^p ai ȳi, and an individual 100(1 − a) percent confidence interval for Σ_{i=1}^p ai mi is

Σ_{i=1}^p ai ȳi ± ta/2 s √( Σ_{i=1}^p ai²/ni )

If the coefficients sum to zero, that is, if Σ_{i=1}^p ai = 0, the linear combination is called a contrast.
Scheffé and Bonferroni simultaneous 100(1 − a) percent confidence intervals are as follows:

2a. The Scheffé interval for the pairwise difference mi − mh, considering all possible contrasts, is

(ȳi − ȳh) ± √( (p − 1) Fa^(p−1, n−p) ) s √( 1/ni + 1/nh )

2b. The Scheffé interval for the contrast Σ_{i=1}^p ai mi is

Σ_{i=1}^p ai ȳi ± √( (p − 1) Fa^(p−1, n−p) ) s √( Σ_{i=1}^p ai²/ni )

3a. The Scheffé interval for mi − mh, considering all possible linear combinations, is

(ȳi − ȳh) ± √( p Fa^(p, n−p) ) s √( 1/ni + 1/nh )

3b. The Scheffé interval for the linear combination Σ_{i=1}^p ai mi is

Σ_{i=1}^p ai ȳi ± √( p Fa^(p, n−p) ) s √( Σ_{i=1}^p ai²/ni )

4. A Bonferroni simultaneous 100(1 − a) percent confidence interval for mi − mh in a prespecified set of g linear combinations is

(ȳi − ȳh) ± ta/2g s √( 1/ni + 1/nh )

and the Bonferroni interval for the linear combination Σ_{i=1}^p ai mi is

Σ_{i=1}^p ai ȳi ± ta/2g s √( Σ_{i=1}^p ai²/ni )
Example 1.4
Consider the North American Oil Company problem. Suppose that we had decided, before we observed the gasoline mileage data in Table 1.1, that we wished to find Scheffé simultaneous 95 percent confidence intervals for all contrasts in the following set of contrasts:

Set I: mB − mA, mA − mC, mB − mC, mB − (mC + mA)/2

Suppose that we also wish to find such intervals for other contrasts that the data might suggest. That is, we are considering all possible contrasts.
To verify, for example, that mB − (mC + mA)/2 is a contrast, note that

mB − (mC + mA)/2 = (−1/2)mA + (1)mB + (−1/2)mC = aA mA + aB mB + aC mC

Here, aA = −1/2, aB = 1, and aC = −1/2, which implies that

Σ_{i=A,B,C} ai = aA + aB + aC = −1/2 + 1 − 1/2 = 0

Moreover,

Σ_{i=A,B,C} ai²/ni = aA²/nA + aB²/nB + aC²/nC = (−1/2)²/5 + (1)²/5 + (−1/2)²/5 = .3

Since s = √MSE = √.669 = .8179, it follows that a Scheffé simultaneous 95 percent confidence interval for mB − (mC + mA)/2 is (using formula 2b)
[ȳB − (ȳC + ȳA)/2] ± √( (p − 1) Fa^(p−1, n−p) ) s √( Σ_{i=A,B,C} ai²/ni )

= [36.56 − (33.98 + 34.92)/2] ± √( (3 − 1) F.05^(3−1, 15−3) ) (.8179) √.3

= [2.11 ± √(2(3.89)) (.8179) √.3]

= [.86, 3.36]
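The Scheffé calculation above can be checked with SciPy's F distribution in place of an F table. This sketch is illustrative only; the coefficient values are those of the Chemical XX contrast.

```python
import math
from scipy.stats import f

# Scheffé simultaneous 95% interval for mB - (mC + mA)/2 (formula 2b),
# using scipy.stats.f in place of the F table. (Illustrative sketch.)
p_treat, n, MSE = 3, 15, 0.669
s = math.sqrt(MSE)                                      # about .8179
contrast = 36.56 - (33.98 + 34.92) / 2                  # point estimate, 2.11
sum_a2_over_n = ((-0.5) ** 2 + 1 ** 2 + (-0.5) ** 2) / 5   # equals .3

mult = math.sqrt((p_treat - 1) * f.ppf(0.95, p_treat - 1, n - p_treat))
half = mult * s * math.sqrt(sum_a2_over_n)
interval = (contrast - half, contrast + half)
print(interval)   # about (.86, 3.36)
```

The computed interval agrees with the hand calculation to the two decimal places reported in the text.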
This interval says that we are 95 percent confident that mB is between .86 mpg and 3.36 mpg greater than (mC + mA)/2. Note here that Chemical XX might be a major factor causing mB to be greater than (mC + mA)/2. However, this is not at all certain. The chemists at North American Oil must use the previous comparison, along with their knowledge of the chemical compositions of gasoline types A, B, and C, to assess the effect of Chemical XX on gasoline mileage. The Scheffé simultaneous 95 percent confidence intervals for mB − mA, mA − mC, and mB − mC (the other contrasts in Set I) can be calculated by using formula 2a.
Next, suppose that we had decided, before we observed the gasoline mileage data in Table 1.1, that we wished to calculate Scheffé simultaneous 95 percent confidence intervals for all the linear combinations in Set II:

Set II: mA, mB, mC, mB − mA, mA − mC, mB − mC, mB − (mC + mA)/2

In addition, suppose that we wish to find such intervals for other linear combinations that the data might suggest. Note that mA, mB, and mC are not contrasts. That is, these means cannot be written as Σ_{i=A,B,C} ai mi, where

Σ_{i=A,B,C} ai = 0

For example, for mB we have aA = 0, aB = 1, and aC = 0, so that

Σ_{i=A,B,C} ai = 0 + 1 + 0 = 1
Whereas formulas 2a and 2b use

√( (p − 1) Fa^(p−1, n−p) ) = √( (3 − 1) F.05^(3−1, 15−3) ) = √( 2(3.89) ) = 2.7893

to calculate Scheffé simultaneous 95 percent confidence intervals for all possible contrasts, formulas 3a and 3b use the larger

√( p Fa^(p, n−p) ) = √( 3 F.05^(3, 15−3) ) = √( 3(3.49) ) = 3.2357

to calculate Scheffé simultaneous 95 percent confidence intervals for all possible linear combinations. Because the formulas 3a and 3b differ from the respective formulas 2a and 2b only by these comparative values, we are paying for desiring Scheffé simultaneous 95 percent confidence intervals for all possible linear combinations (some of which are not contrasts) by having longer (and thus less precise) Scheffé simultaneous 95 percent confidence intervals for the contrasts.
Next, consider finding Bonferroni simultaneous 95 percent confidence intervals for the prespecified linear combinations mB − mA, mA − mC, mB − mC, and mB − (mC + mA)/2. Since there are g = 4 linear combinations here, we need to find ta/2g = t.05/2(4) = t.00625. Using Excel to find t.00625 based on n − p = 15 − 3 = 12 degrees of freedom, we find that t.00625 = 2.934459. This t point is larger than the previously found Scheffé interval point √( (3 − 1) F.05^(3−1, 15−3) ) = 2.7893 for all possible contrasts, so the Bonferroni simultaneous 95 percent confidence intervals would be longer than the corresponding Scheffé intervals. On the other hand, consider
√( (3 − 1) F.05^(3−1, 15−3) )
t = (ȳi − ȳh) / ( s √( (1/ni) + (1/nh) ) )
Also, define the p-value to be twice the area under the curve of the t-distribution having n − p degrees of freedom to the right of |t|. Then we can reject H0: mi − mh = 0 in favor of Ha: mi − mh ≠ 0 at level of significance a if either of the following equivalent conditions holds:
1. |t| > ta/2, or equivalently, |ȳi − ȳh| > ta/2 s √( (1/ni) + (1/nh) )
2. p-value < a
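The individual t test can be sketched in a few lines using SciPy's t distribution; this is an illustration of mine (not the book's computer implementation), shown for type B versus type A.

```python
import math
from scipy.stats import t

# Individual t test for a pairwise difference of treatment means,
# illustrated for gasoline type B versus type A. (Illustrative sketch.)
MSE, dfe = 0.669, 12
s = math.sqrt(MSE)

def pairwise_t(ybar_i, ybar_h, n_i, n_h):
    tstat = (ybar_i - ybar_h) / (s * math.sqrt(1 / n_i + 1 / n_h))
    pval = 2 * t.sf(abs(tstat), dfe)       # two-sided p-value
    return tstat, pval

tstat, pval = pairwise_t(36.56, 34.92, 5, 5)
print(tstat, pval)   # t is about 3.17, with a p-value well below .05
```

The t statistic of about 3.17 matches the value shown on the SAS output, so at a comparisonwise level of .05 we would declare mB and mA to differ.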
The Scheffé method declares the difference between mi and mh to be statistically significant if

|ȳi − ȳh| > √( (p − 1) Fa^(p−1, n−p) ) s √( 1/ni + 1/nh )

In this case we are controlling the experimentwise error rate over all null hypotheses that set a contrast Σ_{i=1}^p ai mi equal to zero. The Tukey method declares the difference between mi and mh to be statistically significant if

|ȳi − ȳh| > qa √( MSE/m )
Here, we are controlling the experimentwise error rate over all possible
pairwise comparisons of treatment means. Recall from our discussion of
Tukey simultaneous 95 percent confidence intervals that the sample size
for each treatment is assumed to be the same value m, and qa is a studentized range value obtained from Table A3 corresponding to the values p
and n p.
A modification of the Tukey procedure is the Student-Newman-Keuls (SNK) procedure, which has us first arrange the sample treatment means from smallest to largest. Denoting these ordered sample means as ȳ(1), ȳ(2), ..., ȳ(p), the SNK procedure declares the difference between the ordered population means m(i) and m(h) (where i is greater than h) to be statistically significant if

|ȳ(i) − ȳ(h)| > qa(i − h + 1, n − p) √( MSE/m )

Here, we denote the fact that the studentized range value obtained from Table A3 depends upon i − h + 1, the number of steps between ȳ(i) and ȳ(h), by denoting this studentized range value as qa(i − h + 1, n − p).
For example, in the gasoline mileage example the three sample means ȳA = 34.92, ȳB = 36.56, and ȳC = 33.98 arranged in increasing order are ȳ(1) = 33.98, ȳ(2) = 34.92, and ȳ(3) = 36.56. To compare m(3) with m(1) (that is, mB with mC) at significance level .05, we look up q.05(3 − 1 + 1, 15 − 3) = q.05(3, 12) in Table A3 to be 3.77. Because |ȳ(3) − ȳ(1)| = |36.56 − 33.98| = 2.58 is greater than q.05(3, 12)√(MSE/m) = 3.77√(.669/5) = 1.379, we conclude that mB and mC differ. To compare m(3) with m(2) (that is, mB with mA) at significance level .05, we look up q.05(3 − 2 + 1, 15 − 3) = q.05(2, 12) in Table A3 to be 3.08. Because |ȳ(3) − ȳ(2)| = |36.56 − 34.92| = 1.64 is greater than q.05(2, 12)√(MSE/m) = 3.08√(.669/5) = 1.127, we conclude that mB and mA differ. To compare m(2) with m(1) (that is, mA with mC) at significance level .05, we look up q.05(2 − 1 + 1, 15 − 3) = q.05(2, 12) in Table A3 to be 3.08. Because |ȳ(2) − ȳ(1)| = |34.92 − 33.98| = .94 is not greater than q.05(2, 12)√(MSE/m) = 3.08√(.669/5) = 1.127, we cannot conclude that mA and mC differ. In general, the SNK procedure has neither a comparisonwise nor an experimentwise error rate. Rather, the SNK procedure controls the error rate at a for all comparisons of means that are the same number of ordered steps apart. For example, in the gasoline mileage example, the error
rate is a for comparing m(3) with m(2) (that is, mB with mA) and m(2) with m(1) (that is, mA with mC) because in both cases i − h + 1 = 2.
In general, because qa(i − h + 1, n − p)√(MSE/m) decreases as the number of steps apart i − h + 1 decreases, the SNK procedure is more liberal than the Tukey procedure in declaring significant differences between treatment means. However, the SNK procedure is more conservative than performing individual t tests in declaring such significant differences. It is important to note that since many users find individual t tests each at significance level a [or individual 100(1 − a) percent confidence intervals] easy to calculate and, therefore, use them for making multiple treatment mean comparisons (including those suggested by the data), statisticians recommend doing this only if the F test of H0: m1 = m2 = ... = mp rejects H0 at significance level a. Such a use of the F test as a preliminary test of significance, followed by making multiple, individual t test comparisons, is called Fisher's least significant difference (LSD) procedure. Simulation studies suggest that Fisher's LSD procedure controls the experimentwise error rate for the multiple comparisons at approximately a.
Lastly, in some situations it is important to use a control treatment and compare various treatment means with the control treatment
mean. This is true, for example, in medical research where various new
medicines would be compared with a placebo. The placebo might, for
example, be a pill with no active ingredients, and measuring the placebo
effect is important because sometimes patients react favorably simply
because they are taking a pill. Dunnett's procedure declares treatment mean mi to be different from the control mean mcontrol at significance level a if

|ȳi − ȳcontrol| > da(p − 1, n − p) √( 2 MSE/m )

For example, if in the gasoline mileage study we regard gasoline type A as the control treatment, then to compare ȳB and ȳC with ȳcontrol = ȳA,
we find from Table A4 that d.05(3 − 1, 15 − 3) = d.05(2, 12) is 2.50. Therefore, d.05(2, 12)√(2 MSE/m) = 2.50√(2(.669)/5) = 1.293. Because |ȳB − ȳcontrol| = |ȳB − ȳA| = |36.56 − 34.92| = 1.64 is greater than 1.293, we conclude that mB differs from mA. Because |ȳC − ȳcontrol| = |ȳC − ȳA| = |33.98 − 34.92| = .94 is not greater than 1.293, we cannot conclude that mC differs from mA.
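The Dunnett comparisons above can be sketched in a few lines. The critical value d.05(2, 12) = 2.50 is taken from the text's Table A4 rather than computed, so this is an illustrative sketch, not a full implementation of Dunnett's procedure.

```python
import math

# Dunnett comparisons with gasoline type A as the control, using the
# critical value d.05(2, 12) = 2.50 quoted from the text's Table A4.
# (Illustrative sketch.)
MSE, m, d05 = 0.669, 5, 2.50
crit = d05 * math.sqrt(2 * MSE / m)               # about 1.293

ybar = {"A": 34.92, "B": 36.56, "C": 33.98}       # A is the control
results = {gas: abs(ybar[gas] - ybar["A"]) > crit for gas in ("B", "C")}
print(crit, results)   # B is declared different from the control; C is not
```

As in the hand calculation, |ȳB − ȳA| = 1.64 exceeds the critical value 1.293 while |ȳC − ȳA| = .94 does not.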
Vat 1        Vat 2        Vat 3        Vat 4
6.1          7.1          5.6          6.5
6.6          7.3          5.8          6.8
6.4          7.3          5.7          6.2
6.3          7.7          5.3          6.3
ȳ1 = 6.35    ȳ2 = 7.35    ȳ3 = 5.6     ȳ4 = 6.45
Let yij denote the potency of the jth sample in the ith randomly
selected vat. Then the random model says that
yij = mi + eij
Here, mi is the mean potency of all possible samples of liquid medication
that could be randomly selected from the ith randomly selected vat. That
is, mi is the mean potency of all of the liquid medication in the ith randomly selected vat. Moreover, since the four vats were randomly selected,
mi is assumed to have been randomly selected from the population of all
possible vat means. This population is assumed to be normally distributed
with mean m and variance s m2 . Here, m is the mean potency of all possible
samples of liquid medication that could be randomly selected from all
possible vats. That is, m is the mean potency of all possible liquid medication. In addition, s m2 is the variance between all possible vat means.
We further assume that each error term eij has been randomly selected
from a normally distributed population of error term values having mean
zero and variance s 2 and that different error terms eij are independent of
each other and of the randomly selected means mi. Under these assumptions we can test the null hypothesis H0: sm² = 0. This hypothesis says that all possible vat means are equal. We test H0 versus the alternative hypothesis Ha: sm² > 0, which says that there is some variation between the vat means. Specifically, we can reject H0 in favor of Ha at significance level a = .05 if the F statistic of Section 1.2, F = MST/MSE, is greater than Fa = F.05 = 3.49, which is based on p − 1 = 4 − 1 = 3 numerator and n − p = 16 − 4 = 12 denominator degrees of freedom.
Table 1.4 tells us that since F = 45.5111 is greater than F.05 = 3.49, we can reject H0: sm² = 0 with a = .05. Therefore, we conclude that there is variation in the population of all vat means. That is, we conclude that some of the vat means differ. Furthermore, as illustrated in Table 1.4, we can calculate point estimates of the variance components s² and sm². These estimates are .0542 and .6031, respectively. Note here that the variance component s² measures the within-vat variability, while sm² measures the between-vat variability. In this case the between-vat variability is substantially higher than the within-vat variability. We can
also calculate a 95 percent confidence interval for m , the mean potency
Table 1.4 ANOVA for the potency data under the fixed and random models

Source   df           Sum of squares   Mean square    F statistic
Model    p − 1 = 3    SST = 7.4        MST = 2.4667   F = MST/MSE = 45.5111
Error    n − p = 12   SSE = .65        MSE = .0542

E(mean square), fixed model:    Model: σ² + [1/(p − 1)] Σ(i=1 to p) ni(μi − μ̄)²;   Error: σ²
E(mean square), random model:   Model: σ² + n̄σμ²;   Error: σ²
Null hypothesis tested:         fixed model, H0: μ1 = μ2 = ... = μp;   random model, H0: σμ² = 0

Notes:
1. n̄ = [Σ(i=1 to p) ni − (Σ(i=1 to p) ni²)/(Σ(i=1 to p) ni)]/(p − 1) (= m for equal sample sizes).

Furthermore, a 100(1 − α)% = 95% confidence interval for μ is

ȳ ± t(α/2) √(MST/(pm)) = 6.4375 ± t(.025) √(2.4667/(4(4)))
                       = [6.4375 ± 3.182(.3926)]
                       = [5.1881, 7.6869]

Here, t(α/2) is based on p − 1 = 4 − 1 = 3 degrees of freedom.
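Outside SAS, the quantities in Table 1.4 are easy to compute directly. The following Python sketch (the helper name random_model_anova and the sample data are ours, not from the text) computes MST, MSE, F = MST/MSE, and the method-of-moments point estimates of the variance components, σ̂² = MSE and σ̂μ² = (MST − MSE)/m, for a balanced one-factor random model. With the chapter's values (MST = 2.4667, MSE = .0542, m = 4), the second formula reproduces the reported estimate (2.4667 − .0542)/4 = .6031.

```python
def random_model_anova(groups):
    """Balanced one-factor random model: returns (MST, MSE, F,
    sigma2_hat, sigma_mu2_hat), the method-of-moments estimates
    of the within- and between-level variance components."""
    p = len(groups)                      # number of randomly selected levels
    m = len(groups[0])                   # observations per level (balanced)
    n = p * m                            # total sample size
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / m for g in groups]
    sst = sum(m * (ybar - grand) ** 2 for ybar in means)       # treatment SS
    sse = sum((y - ybar) ** 2                                  # error SS
              for g, ybar in zip(groups, means) for y in g)
    mst = sst / (p - 1)
    mse = sse / (n - p)
    f = mst / mse                        # compare with F_alpha on (p-1, n-p) df
    sigma2_hat = mse                     # within-level variability
    sigma_mu2_hat = max((mst - mse) / m, 0.0)   # between-level variability
    return mst, mse, f, sigma2_hat, sigma_mu2_hat
```

The estimator (MST − MSE)/m follows from the random-model expected mean squares in Table 1.4: E(MST) = σ² + n̄σμ² and E(MSE) = σ², with n̄ = m for equal sample sizes.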
null hypothesis that the population variances are equal. An alternative test
that does not require the populations to have normal distributions is the
Brown-Forsythe-Levene (BFL) test. To carry out this test, which involves
considerable calculation, we let zij = |yij − medi|, where medi denotes the
median of the observations in the ith sample. We then calculate

z̄i. = [Σ(j=1 to ni) zij] / ni    and    z̄.. = [Σ(i=1 to p) Σ(j=1 to ni) zij] / n

where n = n1 + n2 + ... + np is the total sample size. The BFL test then says
that we can reject H0: σ1² = σ2² = ... = σp² at level of significance α if

L = [Σ(i=1 to p) ni(z̄i. − z̄..)² / (p − 1)] / [Σ(i=1 to p) Σ(j=1 to ni) (zij − z̄i.)² / (n − p)]

is greater than Fα, which is based on p − 1 numerator and n − p denominator
degrees of freedom.
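As a check on the hand calculation, the BFL statistic can be sketched in a few lines of Python (the function name bfl_statistic is ours, not from the text). It is simply a one-way ANOVA F statistic computed on the absolute deviations from each sample's median; scipy.stats.levene with center='median' computes the same quantity.

```python
from statistics import median

def bfl_statistic(samples):
    """Brown-Forsythe-Levene statistic L: the one-way ANOVA F ratio
    computed on z_ij = |y_ij - med_i|, the absolute deviations from
    each sample's median. Compare with F_alpha on (p - 1, n - p) df."""
    p = len(samples)
    n = sum(len(s) for s in samples)
    z = [[abs(y - median(s)) for y in s] for s in samples]
    zbar_i = [sum(zi) / len(zi) for zi in z]          # per-sample means of z
    zbar = sum(sum(zi) for zi in z) / n               # overall mean of z
    num = sum(len(zi) * (zb - zbar) ** 2
              for zi, zb in zip(z, zbar_i)) / (p - 1)
    den = sum((zij - zb) ** 2
              for zi, zb in zip(z, zbar_i) for zij in zi) / (n - p)
    return num / den
```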
1.6 Exercises
1.1 An oil company wishes to study the effects of four different gasoline additives on mean gasoline mileage. The company randomly
selects four groups of six automobiles each and assigns a group of
six automobiles to each additive type (W , X , Y , and Z). Here, all
24 automobiles employed in the experiment are the same make and
model. Each of the six automobiles assigned to a gasoline additive is
test driven using the appropriate additive, and the gasoline mileage
DATA GASOLINE;
A 34.3
B 36.4
C 34.7
A 35.5
B 37.0
C 33.0
A 35.8
B 37.6
C 34.9
;
PROC GLM;                           } Specifies General Linear Models Procedure
CLASS GASTYPE;                      } Defines class variable GASTYPE
MODEL MILEAGE = GASTYPE / P CLM;    } Specifies model, and CLM requests confidence intervals
ESTIMATE 'MUB-MUA' GASTYPE -1 1;    } Estimates μB − μA
ESTIMATE 'MUA-MUC' GASTYPE 1 0 -1;  } Estimates μA − μC
ESTIMATE 'MUB-MUC' GASTYPE 0 1 -1;  } Estimates μB − μC
ESTIMATE 'MUB-(MUC+MUA)/2'
  GASTYPE -.5 1 -.5;                } Estimates μB − (μC + μA)/2

PROC GLM;
CLASS GASTYPE;
MODEL MILEAGE = GASTYPE / P CLI;    } CLI requests prediction intervals
Notes: 1. The coefficients in the above ESTIMATE statements are obtained by writing the quantity to be
estimated as a linear combination of the factor level means μA, μB, and μC, with the factor levels
considered in alphabetical order. For example, if we consider 'MUB-MUA' (that is, μB − μA),
we write this difference as

μB − μA = −1(μA) + 1(μB) + 0(μC)

Here, the trailing zero coefficient corresponding to μC may be dropped to obtain

ESTIMATE 'MUB-MUA' GASTYPE -1 1;

As another example, the coefficients in the ESTIMATE statement for
'MUB-(MUC+MUA)/2' (that is, μB − (μC + μA)/2) are obtained by writing this
expression as

μB − (μC + μA)/2 = −(1/2)(μA) + 1(μB) − (1/2)(μC)
                 = −.5(μA) + 1(μB) + (−.5)(μC)

Thus we obtain

ESTIMATE 'MUB-(MUC+MUA)/2' GASTYPE -.5 1 -.5;

2. Expressions inside single quotes (for example, 'MUB-MUA') are labels that may be up to 16
characters in length.
3. Confidence intervals (CLM) and prediction intervals (CLI) may not be requested in the same
MODEL statement when using PROC GLM.
Figure 1.4 SAS program to analyze the North American Oil Company data
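The t statistics that PROC GLM prints for ESTIMATE statements can be reproduced by hand: for a contrast c1μ1 + ... + cpμp, the point estimate is Σ ci ȳi and its standard error is √(MSE Σ ci²/ni). The following Python sketch (the helper name contrast_t is ours, not from the text) applies this to the gasoline additive data of Table 1.5.

```python
from math import sqrt

# Table 1.5 gasoline mileage data (additives W, X, Y, Z)
data = {
    "W": [31.2, 32.6, 30.8, 31.5, 32.0, 30.1],
    "X": [27.6, 28.1, 27.4, 28.5, 27.5, 28.7],
    "Y": [35.7, 34.0, 35.1, 33.9, 36.1, 34.8],
    "Z": [34.5, 36.2, 35.2, 35.8, 34.9, 35.3],
}

def contrast_t(data, coeffs):
    """Point estimate, standard error, and t statistic for the linear
    combination sum(c_i * mu_i) of the factor level means, using the
    pooled MSE from a one-way ANOVA."""
    keys = sorted(data)
    groups = [data[k] for k in keys]
    cs = [coeffs[k] for k in keys]
    means = [sum(g) / len(g) for g in groups]
    n = sum(len(g) for g in groups)
    p = len(groups)
    sse = sum((y - yb) ** 2 for g, yb in zip(groups, means) for y in g)
    mse = sse / (n - p)                                  # pooled variance estimate
    est = sum(c * yb for c, yb in zip(cs, means))        # point estimate
    se = sqrt(mse * sum(c * c / len(g) for c, g in zip(cs, groups)))
    return est, se, est / se

# The contrast [(muY + muZ)/2] - [(muX + muW)/2] from Figure 1.5
est, se, t = contrast_t(data, {"W": -.5, "X": -.5, "Y": .5, "Z": .5})
```

This reproduces the Figure 1.5 values for that contrast: estimate 5.4583, standard error 0.3056, and t = 17.86.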
for the test drive is recorded. The results of the experiment are given
in Table 1.5. A one-way ANOVA of this data is carried out by using
SAS. The PROC GLM output is given in Figure 1.5. Note that the
treatment means μW, μX, μY, and μZ are denoted as MUW, MUX,
MUY, and MUZ on the output.
(a) Identify and report the values of SSTO, SST, MST, SSE, and MSE.
(b) Identify, report, and interpret F and its associated p-value.
(c) Identify, report, and interpret the appropriate individual t statistics and associated p-values for making all pairwise comparisons
of μW, μX, μY, and μZ.
(d) Identify, report, and interpret the appropriate individual t statistic and associated p-value for testing the significance of
[(μY + μZ)/2] − [(μX + μW)/2].
(e) Identify, report, and interpret a point estimate of and a 95
percent confidence interval for μZ (see observation 24).
(f) Identify, report, and interpret a point prediction of and a 95
percent prediction interval for yZ0 = μZ + εZ0.
1.2 Consider the one-way ANOVA of the gasoline additive data in
Table 1.5 and the SAS output of Figure 1.5.
(a) Compute individual 95 percent confidence intervals for all possible pairwise differences between treatment means.
(b) Compute Tukey simultaneous 95 percent confidence intervals
for all possible pairwise differences between treatment means.
(c) Compute Scheffé simultaneous 95 percent confidence intervals
for all possible pairwise differences between treatment means.
(d) Compute Bonferroni simultaneous 95 percent confidence intervals for the (prespecified) set of all possible pairwise differences
between treatment means.
(e) Which of the above intervals are the most precise?
1.3 Consider the one-way ANOVA of the gasoline additive data in
Table 1.5 and the SAS output of Figure 1.5. Also consider the
prespecified set of linear combinations (contrasts):
μZ − μW
μY − μW
μZ − μX
[(μY + μZ)/2] − [(μX + μW)/2]
μY − μX
μZ − μY
μX − μW
Table 1.5 Gasoline mileage data for additives W, X, Y, and Z

        W         X         Y         Z
        31.2      27.6      35.7      34.5
        32.6      28.1      34.0      36.2
        30.8      27.4      35.1      35.2
        31.5      28.5      33.9      35.8
        32.0      27.5      36.1      34.9
        30.1      28.7      34.8      35.3
Mean    31.3667   27.9667   34.9333   35.3167
SAS
GENERAL LINEAR MODELS PROCEDURE

SOURCE             DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PR > F
MODEL               3    213.88125000      71.29375000     127.22    0.0001
ERROR              20     11.20833333       0.56041667
CORRECTED TOTAL    23    225.08958333

R-SQUARE    C.V.      ROOT MSE      MILEAGE MEAN
0.950205    2.3108    0.74860982    32.39583333

SOURCE     DF    TYPE I SS       F VALUE    PR > F
ADDTYPE     3    213.88125000     127.22    0.0001

SOURCE     DF    TYPE III SS     F VALUE    PR > F
ADDTYPE     3    213.88125000     127.22    0.0001

PARAMETER          ESTIMATE       T FOR H0:      PR > |T|    STD ERROR OF
                                  PARAMETER=0                ESTIMATE
MUZ-MUW            3.95000000      9.14          0.0001      0.43221008
MUZ-MUX            7.35000000     17.01          0.0001      0.43221008
MUZ-MUY            0.38333333      0.89          0.3857      0.43221008
MUY-MUW            3.56666667      8.25          0.0001      0.43221008
MUY-MUX            6.96666667     16.12          0.0001      0.43221008
MUX-MUW           -3.40000000     -7.87          0.0001      0.43221008
(Y+Z)/2-(X+W)/2    5.45833333     17.86          0.0001      0.30561868

OBSERVATION    OBSERVED       PREDICTED      RESIDUAL       LOWER 95% CL    UPPER 95% CL
               VALUE          VALUE                         FOR MEAN        FOR MEAN
24             35.30000000    35.31666667    -0.01666667    34.67916207     35.95417126

OBSERVATION    OBSERVED       PREDICTED      RESIDUAL       LOWER 95% CL    UPPER 95% CL
               VALUE          VALUE                         INDIVIDUAL      INDIVIDUAL
24             35.30000000    35.31666667    -0.01666667    33.62998806     37.00334528

Figure 1.5 SAS output of a one-way ANOVA of the gasoline additive test results
Table 1.6 Golf ball durability test results and one-way ANOVA

        Alpha    Best     Century    Divot
        281      270      218        364
        220      334      244        302
        274      307      225        325
        242      290      273        337
        251      331      249        355
Mean    253.6    306.4    241.8      336.6

ANOVA
Source of Variation    SS         df    MS           F            P-Value      F crit
Between Groups         29860.4     3    9953.4667    16.420798    3.853E-05    3.2388715
Within Groups           9698.4    16     606.15
Total                  39558.8    19
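The ANOVA entries above follow from the usual one-way decomposition: the between-groups sum of squares is Σ ni(ȳi − ȳ)², the within-groups sum of squares pools the squared deviations from each group mean, and F is the ratio of the corresponding mean squares. A minimal Python sketch (the function name one_way_anova is ours, not from the text), applied to the Table 1.6 durability data, reproduces SS between = 29860.4, SS within = 9698.4, and F = 16.42:

```python
def one_way_anova(groups):
    """Returns (SS_between, SS_within, F) for a one-way ANOVA."""
    n = sum(len(g) for g in groups)
    p = len(groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (yb - grand) ** 2 for g, yb in zip(groups, means))
    ssw = sum((y - yb) ** 2 for g, yb in zip(groups, means) for y in g)
    f = (ssb / (p - 1)) / (ssw / (n - p))   # compare with F crit on (p-1, n-p) df
    return ssb, ssw, f

# Table 1.6 golf ball durability data (brands Alpha, Best, Century, Divot)
golf = [
    [281, 220, 274, 242, 251],   # Alpha
    [270, 334, 307, 290, 331],   # Best
    [218, 244, 225, 273, 249],   # Century
    [364, 302, 325, 337, 355],   # Divot
]
ssb, ssw, f = one_way_anova(golf)
```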
1.5 Modify the golf ball durability data in Table 1.6 by assuming that
the four brands of golf balls have been randomly selected from the
population of all brands. Then, using the random model:
(a) Test H0: σμ² = 0 versus Ha: σμ² > 0 by setting α equal to .05.
(b) Find point estimates of σ² and σμ². Interpret.
(c) Find a 95 percent confidence interval for μ. Interpret this interval.
Index
Analysis of variance (ANOVA)
assumptions, 6
one-way analysis of variance
(one-way ANOVA), 4
table, 15, 16
two-way ANOVA, 53
using pooling, 223
Basic design, 197, 200
Block confounding, 214-234
Block sum of squares (SSB), 80
Bonferroni simultaneous intervals, 24,
25, 26, 29, 30, 86
Brown-Forsythe-Levene (BFL) test,
39
Column vector, 93
Complete model, 103
Completely randomized experimental
design, 2
Control treatment, 34
Covariate, 115
Cross-over design, 162-168
Dependent variable, 1
Designed experiment, 2
Design generator, 197, 200
Dunnett's procedure, 34
Error mean square, 11
Error sum of squares (SSE), 9, 80
Experimental units, 2
F (model) statistic, 96
F tests, 58, 59
Fisher's least significant difference
(LSD), 34
Fixed models, 35-38
Fold over design, 205
Fractional factorials
basic techniques, 189-204
fold over designs, 204-214
Plackett-Burman designs, 204-214
General analysis approach, 132-152
Hartley's test, 38
Independent variables, 1
ith factor level mean, 50
ith treatment effect, 4
jth factor level mean, 51, 105
L equations, 218
L procedure, 218
Latin square design, 158-162
Least squares point estimates, 93
Linear combination, 24
Matrix algebra, 94
Mean square error, 95
Mean squares, 8, 11
Multiple coefficient of variation, 95
Nested factors, 125-132
Null hypothesis, 8
One factor analysis
basic concepts of, 1-3
fixed models, 35-38
population variances, equality of,
38, 39
random models, 35-38
significant differences between
treatment means, 7-15
treatment means, linear
combinations of, 15-35