Chapter 12 Analysis of Variance
In the previous chapter we covered techniques for determining whether a
difference exists between the means of two independent populations. It is not
unusual to encounter situations in which we wish to test for a difference
among several independent means rather than between only two. The extension
of the two-sample t-test to three or more samples is known as the analysis of
variance.
12.1 One-Way Analysis of Variance
12.1.1 The Problem
When discussing the paired t-test, we examined data from a study that
investigates the effect of carbon monoxide exposure on patients with coronary
artery disease. The subjects involved in the study were recruited from three
different medical centers: the Johns Hopkins University School of Medicine, the
Rancho Los Amigos Medical Center, and the St. Louis University School of Medicine.
Before combining the subjects into one large group to conduct the analysis, we
can first examine some baseline characteristics to ensure that the patients from
the various centers are in fact comparable.
One characteristic that we might wish to consider is pulmonary function
before the start of the study; if the patients from one medical center begin with
measures of forced expiratory volume in 1 second (FEV₁) that are much larger, or
much smaller, than those from the other centers, then the results of the analysis
may be affected. Therefore, given that the populations of patients in the three
centers have mean baseline FEV₁ measurements $\mu_1$, $\mu_2$, and $\mu_3$,
respectively, we would like to test the null hypothesis that the population means
are identical. This may be expressed as
$$H_0: \mu_1 = \mu_2 = \mu_3.$$
The alternative hypothesis is that at least one of the population means differs
from the others.
In general, we are interested in comparing the means of $k$ different
populations. Suppose that the $k$ populations are independent and normally
distributed. We begin by drawing a random sample of size $n_1$ from the normal
population with mean $\mu_1$ and standard deviation $\sigma_1$. The mean of this
sample is denoted by $\bar{x}_1$ and its standard deviation by $s_1$. Similarly,
we select a random sample of size $n_2$ from the normal population with mean
$\mu_2$ and standard deviation $\sigma_2$, and so on for the remaining
populations. The numbers of observations in each sample need not be the same.
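                              | Group 1      | Group 2      | ⋯ | Group $k$
Population mean               | $\mu_1$      | $\mu_2$      | ⋯ | $\mu_k$
Population standard deviation | $\sigma_1$   | $\sigma_2$   | ⋯ | $\sigma_k$
Sample mean                   | $\bar{x}_1$  | $\bar{x}_2$  | ⋯ | $\bar{x}_k$
Sample standard deviation     | $s_1$        | $s_2$        | ⋯ | $s_k$
Sample size                   | $n_1$        | $n_2$        | ⋯ | $n_k$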
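The sampling model just described can be mimicked with a short simulation. The sketch below is a minimal Python illustration under assumed inputs: the function name `draw_samples` and every population parameter in it are hypothetical, not taken from the study. It draws one normal sample per group and reports $n_i$, $\bar{x}_i$, and $s_i$, the quantities in the table above.

```python
import random
import statistics


def draw_samples(means, sds, sizes, seed=0):
    """Draw one random sample per group from a normal population.

    `means`, `sds`, and `sizes` hold hypothetical values of mu_i, sigma_i, n_i.
    Returns a list of (n_i, xbar_i, s_i) tuples, one per group.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    summaries = []
    for mu, sigma, n in zip(means, sds, sizes):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # statistics.stdev uses the n - 1 denominator, matching s_i
        summaries.append((n, statistics.mean(sample), statistics.stdev(sample)))
    return summaries


# Hypothetical k = 3 populations; the sample sizes need not be equal.
for i, (n, xbar, s) in enumerate(
    draw_samples(means=[2.8, 2.8, 2.8], sds=[0.5, 0.5, 0.5], sizes=[10, 15, 20]),
    start=1,
):
    print(f"group {i}: n = {n}, sample mean = {xbar:.3f}, sample sd = {s:.3f}")
```

Each group keeps its own $n_i$, so unequal sample sizes need no special handling.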
For the study investigating the effects of carbon monoxide exposure on
individuals with coronary artery disease, the FEV₁ distributions of patients
associated with each of the three medical centers comprise distinct populations.
From the population of FEV₁ measurements for the patients at Johns Hopkins
University, we select a sample of size $n_1 = 21$. From the population at Rancho
Los Amigos we draw a sample of size $n_2 = 16$, and from the one at St. Louis
University we select a sample of size $n_3 = 23$. The data, along with their
sample means and standard deviations, are provided in Table 12.1 [1].
Presented with these data, we might attempt to compare the three population
means by evaluating all possible pairs of sample means using the two-sample
t-test. For a total of three groups, the number of tests required is
$\binom{3}{2} = 3$. We would compare group 1 to group 2, group 1 to group 3, and
group 2 to group 3. We assume that the variances of the underlying populations
are all equal, or
$$\sigma_1^2 = \sigma_2^2 = \sigma_3^2.$$
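The pairwise approach can be sketched directly from the summary statistics of Table 12.1. This is a minimal illustration, assuming the equal-variance (pooled) two-sample t statistic from the previous chapter; the names `GROUPS` and `pooled_t` are hypothetical, and no p-values are computed here.

```python
import itertools
import math

# Summary statistics from Table 12.1: (n_i, xbar_i, s_i), FEV1 in liters.
GROUPS = {
    "Johns Hopkins": (21, 2.63, 0.496),
    "Rancho Los Amigos": (16, 3.03, 0.523),
    "St. Louis": (23, 2.88, 0.498),
}


def pooled_t(g1, g2):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, degrees of freedom)."""
    n1, x1, s1 = g1
    n2, x2, s2 = g2
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance of this pair
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x1 - x2) / se, df


# itertools.combinations yields each of the C(3, 2) = 3 unordered pairs once.
for a, b in itertools.combinations(GROUPS, 2):
    t, df = pooled_t(GROUPS[a], GROUPS[b])
    print(f"{a} vs {b}: t = {t:.2f} on {df} df")
```

Each t statistic would then be compared against a t distribution with the printed degrees of freedom, once per pair.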
The pooled estimate of the common variance, which we denote $s_p^2$, combines
information from all three samples; in particular,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2}{n_1 + n_2 + n_3 - 3}.$$
This quantity is simply an extension of the pooled estimate of the variance
used in the two-sample t-test.
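The pooled estimate weights each sample variance by its degrees of freedom, $n_i - 1$. A minimal Python sketch of that formula, generalized to any number of groups and applied to the FEV₁ summary statistics of Table 12.1, could look like this (the function name `pooled_variance` is illustrative):

```python
def pooled_variance(groups):
    """Pooled estimate s_p^2 of a common variance from (n_i, s_i) pairs.

    Weights each sample variance s_i^2 by n_i - 1 and divides by
    n_1 + ... + n_k - k.
    """
    numerator = sum((n - 1) * s**2 for n, s in groups)
    df = sum(n for n, _ in groups) - len(groups)
    return numerator / df


# (n_i, s_i) for the three centers, FEV1 in liters
fev_groups = [(21, 0.496), (16, 0.523), (23, 0.498)]
print(round(pooled_variance(fev_groups), 3))  # 0.254
```

With two groups the same function reduces to the two-sample pooled variance.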
Table 12.1 Forced expiratory volume in 1 second for
patients with coronary artery disease sampled at three
different medical centers
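Johns Hopkins             | Rancho Los Amigos         | St. Louis
3.23                      | 3.22                      | 2.79
3.47                      | 2.88                      | 3.22
1.86                      | 1.71                      | 2.25
2.47                      | 2.89                      | 2.98
3.01                      | 3.77                      | 2.47
1.69                      | 3.29                      | 2.77
2.10                      | 3.39                      | 2.95
2.81                      | 3.86                      | 3.56
3.28                      | 2.64                      | 2.88
3.36                      | 2.71                      | 2.63
2.61                      | 2.71                      | 3.38
2.91                      | 3.41                      | 3.07
1.98                      | 2.87                      | 2.81
2.57                      | 2.61                      | 3.17
2.08                      | 3.39                      | 2.23
2.47                      | 3.17                      | 2.19
2.47                      |                           | 4.06
2.74                      |                           | 1.98
2.88                      |                           | 2.81
2.63                      |                           | 2.85
2.53                      |                           | 2.43
                          |                           | 3.20
                          |                           | 3.53
$\bar{x}_1 = 2.63$ liters | $\bar{x}_2 = 3.03$ liters | $\bar{x}_3 = 2.88$ liters
$s_1 = 0.496$ liters      | $s_2 = 0.523$ liters      | $s_3 = 0.498$ liters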
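As a consistency check, the summary rows of Table 12.1 can be recomputed from the individual measurements. The following is a minimal Python sketch using the standard-library `statistics` module; the dictionary name `fev` is illustrative. Its output should match the reported sample sizes, means, and standard deviations to rounding.

```python
import statistics

# FEV1 measurements (liters) from Table 12.1, one list per medical center
fev = {
    "Johns Hopkins": [3.23, 3.47, 1.86, 2.47, 3.01, 1.69, 2.10, 2.81, 3.28, 3.36, 2.61,
                      2.91, 1.98, 2.57, 2.08, 2.47, 2.47, 2.74, 2.88, 2.63, 2.53],
    "Rancho Los Amigos": [3.22, 2.88, 1.71, 2.89, 3.77, 3.29, 3.39, 3.86, 2.64, 2.71,
                          2.71, 3.41, 2.87, 2.61, 3.39, 3.17],
    "St. Louis": [2.79, 3.22, 2.25, 2.98, 2.47, 2.77, 2.95, 3.56, 2.88, 2.63, 3.38,
                  3.07, 2.81, 3.17, 2.23, 2.19, 4.06, 1.98, 2.81, 2.85, 2.43, 3.20, 3.53],
}

for center, values in fev.items():
    # statistics.stdev divides by n - 1, the usual sample standard deviation
    print(f"{center}: n = {len(values)}, mean = {statistics.mean(values):.2f}, "
          f"sd = {statistics.stdev(values):.3f}")
```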
Performing all possible pairs of tests is not a problem if the number of
populations is relatively small. In the instance where $k = 3$, there are only
three such tests. If $k = 10$, however, the process becomes much more
complicated. In this case, we would have to perform $\binom{10}{2} = 45$
different pairwise tests.
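The number of pairwise comparisons is $\binom{k}{2} = k(k-1)/2$, so it grows roughly with the square of the number of groups. A minimal Python sketch (the helper name `num_pairwise_tests` is illustrative) uses `math.comb` to tabulate it:

```python
import math


def num_pairwise_tests(k):
    """Number of distinct pairs among k groups: k choose 2 = k(k - 1)/2."""
    return math.comb(k, 2)


for k in (3, 5, 10, 20):
    print(k, num_pairwise_tests(k))  # k = 3 gives 3 and k = 10 gives 45, as in the text
```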
More important, another problem that arises when all possible two-sample
tests are conducted is that this procedure is likely to lead to an incorrect
conclusion. Suppose that the three population means are in fact equal and that we
conduct all three pairwise tests. Assume that the tests are independent and set
the significance level $\alpha = 0.05$ for each one. By the multiplicative rule, the
probability of failing to reject a null hypothesis of no difference in all three
instances would be
$$P(\text{fail to reject in all three tests}) = (1 - 0.05)^3 = (0.95)^3 = 0.857.$$
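The multiplicative rule above is easy to check numerically, and the same arithmetic shows what happens as $k$ grows. The sketch below is a minimal Python illustration; like the text, it assumes the pairwise tests are independent, which is only an approximation, since tests that share a group are correlated. The helper name `prob_no_rejection` is illustrative.

```python
import math


def prob_no_rejection(alpha, num_tests):
    """P(fail to reject every one of num_tests true null hypotheses).

    Assumes the tests are independent, as in the text.
    """
    return (1 - alpha) ** num_tests


for k in (3, 10):
    m = math.comb(k, 2)  # number of pairwise tests among k groups
    p = prob_no_rejection(0.05, m)
    print(f"k = {k}: {m} tests, P(no rejection) = {p:.3f}, "
          f"P(at least one rejection) = {1 - p:.3f}")
# k = 3: 3 tests, P(no rejection) = 0.857, P(at least one rejection) = 0.143
# k = 10: 45 tests, P(no rejection) = 0.099, P(at least one rejection) = 0.901
```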