Complete Business Statistics: Analysis of Variance

COMPLETE BUSINESS STATISTICS
by AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN
7th edition.
Prepared by Lloyd Jaisingh, Morehead State University
Chapter 9
Analysis of Variance
McGraw-Hill/Irwin
Copyright 2009 by The McGraw-Hill Companies, Inc. All
9 Analysis of Variance
Using Statistics
The Hypothesis Test of Analysis of Variance
The Theory and Computations of ANOVA
The ANOVA Table and Examples
Further Analysis
Models, Factors, and Designs
Two-Way Analysis of Variance
Blocking Designs
9 LEARNING OBJECTIVES
After studying this chapter you should be able to:
Explain the purpose of ANOVA

Describe the model and computations behind ANOVA
Explain the test statistic F
Conduct a one-way ANOVA
Report ANOVA results in an ANOVA table
Apply a Tukey test for pairwise analysis
Conduct a two-way ANOVA
Explain blocking designs
Apply templates to conduct one-way and two-way ANOVA
9-1 Using Statistics
ANOVA (ANalysis Of VAriance) is a statistical method for determining the existence of differences among several population means.
ANOVA is designed to detect differences among means from populations subject to different treatments.
ANOVA is a joint test: the equality of several population means is tested simultaneously or jointly.
ANOVA tests for the equality of several population means by looking at two estimators of the population variance (hence, analysis of variance).
9-2 The Hypothesis Test of Analysis of Variance
In an analysis of variance:
We have r independent random samples, each one corresponding to a population subject to a different treatment.
We have:
$n = n_1 + n_2 + n_3 + \dots + n_r$ total observations.
r sample means: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots, \bar{x}_r$
These r sample means can be used to calculate an estimator of the population variance. If the population means are equal, we expect the variance among the sample means to be small.
r sample variances: $s_1^2, s_2^2, s_3^2, \dots, s_r^2$
These sample variances can be used to find a pooled estimator of the population variance.
9-2 The Hypothesis Test of Analysis of Variance (continued): Assumptions
We assume independent random sampling from each of the r populations.
We assume that the r populations under study:
are normally distributed,
with means $\mu_i$ that may or may not be equal,
but with equal variances, $\sigma_i^2$.
[Figure: normal curves labeled Population 1, Population 2, ..., Population r]
9-2 The Hypothesis Test of Analysis of Variance (continued)
The hypothesis test of analysis of variance:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \dots = \mu_r$
$H_1$: Not all $\mu_i$ (i = 1, ..., r) are equal
The test statistic of analysis of variance:
$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from r samples}}{\text{Estimate of variance based on all sample observations}}$
That is, the test statistic in an analysis of variance is based on the ratio of two estimators of a population variance, and is therefore based on the F distribution, with (r-1) degrees of freedom in the numerator and (n-r) degrees of freedom in the denominator.
When the Null Hypothesis Is True
When the null hypothesis is true:
We would expect the sample means to be nearly equal, as in this illustration. And we would expect the variation among the sample means (between sample) to be small, relative to the variation found around the individual sample means (within sample).
[Figure: illustration of nearly equal sample means]
If the null hypothesis is true, the numerator in the test statistic is expected to be small, relative to the denominator:
$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from r samples}}{\text{Estimate of variance based on all sample observations}}$
When the Null Hypothesis Is False
When the null hypothesis is false:
$\mu_1$ is equal to $\mu_2$ but not to $\mu_3$,
$\mu_1$ is equal to $\mu_3$ but not to $\mu_2$,
$\mu_2$ is equal to $\mu_3$ but not to $\mu_1$, or
$\mu_1$, $\mu_2$, and $\mu_3$ are all unequal.
In any of these situations, we would not expect the sample means to all be nearly equal. We would expect the variation among the sample means (between sample) to be large, relative to the variation around the individual sample means (within sample).
If the null hypothesis is false, the numerator in the test statistic is expected to be large, relative to the denominator:
$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from r samples}}{\text{Estimate of variance based on all sample observations}}$
The ANOVA Test Statistic for r = 4 Populations and n = 54 Total Sample Observations
Suppose we have 4 populations, from each of which we draw an independent random sample, with $n_1 + n_2 + n_3 + n_4 = 54$. Then our test statistic is:
$F_{(4-1,\,54-4)} = F_{(3,50)} = \frac{\text{Estimate of variance based on means from 4 samples}}{\text{Estimate of variance based on all 54 sample observations}}$
[Figure: F Distribution with 3 and 50 Degrees of Freedom, f(F) against F(3,50), with right-tail area $\alpha = 0.05$ beyond the critical point 2.79]
The nonrejection region (for $\alpha = 0.05$) in this instance is $F \le 2.79$, and the rejection region is $F > 2.79$. If the test statistic is less than 2.79 we would not reject the null hypothesis, and we would have no evidence that the 4 population means differ. If the test statistic is greater than 2.79, we would reject the null hypothesis and conclude that the four population means are not all equal.
Example 9-1
Randomly chosen groups of customers were served different types of coffee and asked to rate the coffee on a scale of 0 to 100: 21 were served pure Brazilian coffee, 20 were served pure Colombian coffee, and 22 were served pure African-grown coffee.
The resulting test statistic was F = 2.02.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all three means equal
$n_1 = 21$, $n_2 = 20$, $n_3 = 22$, $n = 21 + 20 + 22 = 63$, $r = 3$
The critical point for $\alpha = 0.05$ is:
$F_{(r-1,\,n-r)} = F_{(3-1,\,63-3)} = F_{(2,60)} = 3.15$
$F = 2.02 < F_{(2,60)} = 3.15$
H0 cannot be rejected, and we cannot conclude that any of the population means differs significantly from the others.
[Figure: F distribution with 2 and 60 degrees of freedom, f(F) against F, with right-tail area $\alpha = 0.05$ beyond F(2,60) = 3.15; Test Statistic = 2.02]
9-3 The Theory and the Computations of ANOVA: The Grand Mean
The grand mean, $\bar{\bar{x}}$, is the mean of all $n = n_1 + n_2 + n_3 + \dots + n_r$ observations in all r samples.
The mean of sample i (i = 1, 2, 3, ..., r):
$\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i}$
The grand mean, the mean of all data points:
$\bar{\bar{x}} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} x_{ij}}{n} = \frac{\sum_{i=1}^{r} n_i \bar{x}_i}{n}$
where $x_{ij}$ is the particular data point in position j within the sample from population i.
The subscript i denotes the population, or treatment, and runs from 1 to r. The subscript j denotes the data point within the sample from population i; thus, j runs from 1 to $n_i$.
Using the Grand Mean: Table 9-1
Treatment (i)                   Sample point (j)   Value ($x_{ij}$)
i = 1 Triangle                  1                  4
      Triangle                  2                  5
      Triangle                  3                  7
      Triangle                  4                  8
Mean of Triangles                                  6
i = 2 Square                    1                  10
      Square                    2                  11
      Square                    3                  12
      Square                    4                  13
Mean of Squares                                    11.5
i = 3 Circle                    1                  1
      Circle                    2                  2
      Circle                    3                  3
Mean of Circles                                    2
Grand mean of all data points                      6.909
[Figure: the data points on a number line (ticks at 0 and 10) with $\bar{x}_3 = 2$, $\bar{x}_1 = 6$, $\bar{\bar{x}} = 6.909$, and $\bar{x}_2 = 11.5$ marked; arrows show the distance from data point to its sample mean and the distance from sample mean to grand mean]
If the r population means are different (that is, at least two of the population means are not equal), then it is likely that the variation of the data points about their respective sample means (within sample variation) will be small relative to the variation of the r sample means about the grand mean (between sample variation).
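The sample means and the grand mean of Table 9-1 can be checked with a few lines of code. The following is a minimal sketch in plain Python; the function names `sample_means` and `grand_mean` are illustrative choices, not part of the text.

```python
# Table 9-1 data, keyed by treatment (values as given in the table).
samples = {
    "Triangle": [4, 5, 7, 8],
    "Square": [10, 11, 12, 13],
    "Circle": [1, 2, 3],
}

def sample_means(groups):
    """Return the mean of each sample (x-bar_i), keyed by treatment name."""
    return {name: sum(xs) / len(xs) for name, xs in groups.items()}

def grand_mean(groups):
    """Return the mean of all data points pooled across the samples."""
    total = sum(sum(xs) for xs in groups.values())
    n = sum(len(xs) for xs in groups.values())
    return total / n

means = sample_means(samples)
gm = grand_mean(samples)

# The grand mean is also the n_i-weighted average of the sample means.
n = sum(len(xs) for xs in samples.values())
weighted = sum(len(xs) * means[name] for name, xs in samples.items()) / n

print(means)
print(round(gm, 3), round(weighted, 3))
```

Both routes to the grand mean (pooling all data points, or weighting each sample mean by its sample size) give the same value, which matches the two forms of the grand-mean formula.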
The Theory and Computations of ANOVA: Error Deviation and Treatment Deviation
We define an error deviation as the difference between a data point and its sample mean. Errors are denoted by e, and we have:
$e_{ij} = x_{ij} - \bar{x}_i$
We define a treatment deviation as the deviation of a sample mean from the grand mean. Treatment deviations, $t_i$, are given by:
$t_i = \bar{x}_i - \bar{\bar{x}}$
The ANOVA principle says:
When the population means are not equal, the average error (within sample) is relatively small compared with the average treatment (between sample) deviation.
The Theory and Computations of ANOVA: The Total Deviation
The total deviation ($\mathrm{Tot}_{ij}$) is the difference between a data point ($x_{ij}$) and the grand mean ($\bar{\bar{x}}$):
$\mathrm{Tot}_{ij} = x_{ij} - \bar{\bar{x}}$
For any data point $x_{ij}$:
$\mathrm{Tot}_{ij} = t_i + e_{ij}$
That is:
Total Deviation = Treatment Deviation + Error Deviation
Consider data point $x_{24} = 13$ from Table 9-1. The mean of sample 2 is 11.5, and the grand mean is 6.909, so:
$e_{24} = x_{24} - \bar{x}_2 = 13 - 11.5 = 1.5$
$t_2 = \bar{x}_2 - \bar{\bar{x}} = 11.5 - 6.909 = 4.591$
$\mathrm{Tot}_{24} = t_2 + e_{24} = 4.591 + 1.5 = 6.091$
or
$\mathrm{Tot}_{24} = x_{24} - \bar{\bar{x}} = 13 - 6.909 = 6.091$
[Figure: number line showing $\bar{\bar{x}} = 6.909$, $\bar{x}_2 = 11.5$, and $x_{24} = 13$, with the treatment deviation $t_2 = 4.591$, the error deviation $e_{24} = 1.5$, and the total deviation $\mathrm{Tot}_{24} = 6.091$]

ANOVA: Squared Deviations
Total Deviation = Treatment Deviation + Error Deviation
The total deviation is the sum of the treatment deviation and the error deviation:
$t_i + e_{ij} = (\bar{x}_i - \bar{\bar{x}}) + (x_{ij} - \bar{x}_i) = (x_{ij} - \bar{\bar{x}}) = \mathrm{Tot}_{ij}$
Notice that the sample mean term ($\bar{x}_i$) cancels out in the above addition, which simplifies the equation.
Squared Deviations
$t_i^2 + e_{ij}^2 = (\bar{x}_i - \bar{\bar{x}})^2 + (x_{ij} - \bar{x}_i)^2$
$\mathrm{Tot}_{ij}^2 = (x_{ij} - \bar{\bar{x}})^2$

The Sum of Squares Principle
Sums of Squared Deviations
$\sum_{i=1}^{r}\sum_{j=1}^{n_i} \mathrm{Tot}_{ij}^2 = \sum_{i=1}^{r} n_i t_i^2 + \sum_{i=1}^{r}\sum_{j=1}^{n_i} e_{ij}^2$
$\sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2 = \sum_{i=1}^{r} n_i (\bar{x}_i - \bar{\bar{x}})^2 + \sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
SST = SSTR + SSE
The Sum of Squares Principle
The total sum of squares (SST) is the sum of two terms: the sum of squares for treatment (SSTR) and the sum of squares for error (SSE).
SST = SSTR + SSE
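The sum of squares principle can be verified numerically on the Table 9-1 data. This is a small sketch in plain Python, not a library routine; `sums_of_squares` is a hypothetical helper name introduced here for illustration.

```python
# Table 9-1 data: triangles, squares, circles.
samples = [[4, 5, 7, 8], [10, 11, 12, 13], [1, 2, 3]]

def sums_of_squares(groups):
    """Return (SST, SSTR, SSE) for a list of samples."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Total: every data point against the grand mean.
    sst = sum((x - grand) ** 2 for g in groups for x in g)
    # Treatment: each sample mean against the grand mean, weighted by n_i.
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Error: every data point against its own sample mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return sst, sstr, sse

sst, sstr, sse = sums_of_squares(samples)
print(round(sst, 3), round(sstr, 3), round(sse, 3))
```

The three quantities are computed independently from their definitions, so the agreement of SST with SSTR + SSE (up to floating-point rounding) is a genuine check of the identity rather than a restatement of it.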

Picturing The Sum of Squares Principle
[Figure: SST partitioned into SSTR and SSE]
SST measures the total variation in the data set, the variation of all individual data points from the grand mean.
SSTR measures the explained variation, the variation of individual sample means from the grand mean. It is that part of the variation that is possibly expected, or explained, because the data points are drawn from different populations. It is the variation between groups of data points.
SSE measures unexplained variation, the variation within each group that cannot be explained by possible differences between the groups.

ANOVA: Degrees of Freedom
The number of degrees of freedom associated with SST is (n - 1).
n total observations in all r groups, less one degree of freedom lost with the calculation of the grand mean
The number of degrees of freedom associated with SSTR is (r - 1).
r sample means, less one degree of freedom lost with the calculation of the grand mean
The number of degrees of freedom associated with SSE is (n - r).
n total observations in all groups, less one degree of freedom lost with the calculation of the sample mean from each of r groups
The degrees of freedom are additive in the same way as are the sums of squares:
df(total) = df(treatment) + df(error)
(n - 1) = (r - 1) + (n - r)

ANOVA: The Mean Squares
Recall that the calculation of the sample variance involves the division of the sum of squared deviations from the sample mean by the number of degrees of freedom. This principle is applied as well to find the mean squared deviations within the analysis of variance.
Mean square treatment (MSTR):
$MSTR = \frac{SSTR}{r - 1}$
Mean square error (MSE):
$MSE = \frac{SSE}{n - r}$
Mean square total (MST):
$MST = \frac{SST}{n - 1}$
(Note that the additive properties of sums of squares do not extend to the mean squares: $MST \ne MSTR + MSE$.)
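As a quick illustration of the mean squares, the sketch below divides each sum of squares by its degrees of freedom, using the Table 9-1 values SSTR = 159.909 and SSE = 17 with r = 3 and n = 11. The helper name `mean_squares` is an assumption made for this example.

```python
def mean_squares(sstr, sse, r, n):
    """Divide each sum of squares by its degrees of freedom."""
    sst = sstr + sse  # sums of squares are additive
    return {
        "MSTR": sstr / (r - 1),
        "MSE": sse / (n - r),
        "MST": sst / (n - 1),
    }

# Sums of squares for the triangles/squares/circles data (r = 3, n = 11).
ms = mean_squares(sstr=159.909, sse=17.0, r=3, n=11)
for name, value in ms.items():
    print(f"{name} = {value:.4f}")

# Mean squares are not additive, unlike sums of squares.
print(ms["MST"] == ms["MSTR"] + ms["MSE"])
```

The final comparison prints `False`, which is the point of the note above: dividing by different degrees of freedom breaks the additivity that holds for the sums of squares.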

ANOVA: The Expected Mean Squares
$E(MSE) = \sigma^2$
and
$E(MSTR) = \sigma^2 + \frac{\sum_{i=1}^{r} n_i(\mu_i - \mu)^2}{r - 1}$, which is $= \sigma^2$ when the null hypothesis is true and $> \sigma^2$ when the null hypothesis is false,
where $\mu_i$ is the mean of population i and $\mu$ is the combined mean of all r populations.
That is, the expected mean square error (MSE) is simply the common population variance (remember the assumption of equal population variances), but the expected mean square treatment (MSTR) is the common population variance plus a term related to the variation of the individual population means around the grand population mean.
If the null hypothesis is true so that the population means are all equal, the second term in the E(MSTR) formulation is zero, and E(MSTR) is equal to the common population variance.
Expected Mean Squares and the ANOVA Principle
When the null hypothesis of ANOVA is true and all r population means are equal, MSTR and MSE are two independent, unbiased estimators of the common population variance $\sigma^2$.
On the other hand, when the null hypothesis is false, then MSTR will tend to be larger than MSE.
So the ratio of MSTR and MSE can be used as an indicator of the equality or inequality of the r population means.
This ratio (MSTR/MSE) will tend to be near to 1 if the null hypothesis is true, and greater than 1 if the null hypothesis is false. The ANOVA test, finally, is a test of whether (MSTR/MSE) is equal to, or greater than, 1.

ANOVA: The F Statistic
Under the assumptions of ANOVA, the ratio (MSTR/MSE) possesses an F distribution with (r-1) degrees of freedom for the numerator and (n-r) degrees of freedom for the denominator when the null hypothesis is true.
The test statistic in analysis of variance:
$F_{(r-1,\,n-r)} = \frac{MSTR}{MSE}$
9-4 The ANOVA Table and Examples
Treatment (i)   j   Value ($x_{ij}$)   ($x_{ij} - \bar{x}_i$)   ($x_{ij} - \bar{x}_i$)$^2$
Triangle        1   4                  -2                       4
Triangle        2   5                  -1                       1
Triangle        3   7                  1                        1
Triangle        4   8                  2                        4
Square          1   10                 -1.5                     2.25
Square          2   11                 -0.5                     0.25
Square          3   12                 0.5                      0.25
Square          4   13                 1.5                      2.25
Circle          1   1                  -1                       1
Circle          2   2                  0                        0
Circle          3   3                  1                        1
Total                                                           17

Treatment   ($\bar{x}_i - \bar{\bar{x}}$)   ($\bar{x}_i - \bar{\bar{x}}$)$^2$   $n_i(\bar{x}_i - \bar{\bar{x}})^2$
Triangle    -0.909                          0.826281                            3.305124
Square      4.591                           21.077281                           84.309124
Circle      -4.909                          24.098281                           72.294843
Total                                                                           159.909091

$SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = 17$
$SSTR = \sum_{i=1}^{r} n_i(\bar{x}_i - \bar{\bar{x}})^2 = 159.9$
$MSTR = \frac{SSTR}{r - 1} = \frac{159.9}{(3 - 1)} = 79.95$
$MSE = \frac{SSE}{n - r} = \frac{17}{8} = 2.125$
$F_{(2,8)} = \frac{MSTR}{MSE} = \frac{79.95}{2.125} = 37.62$
Critical point ($\alpha$ = 0.01): 8.65
$H_0$ may be rejected at the 0.01 level of significance.
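The full one-way computation for the triangles, squares, and circles data can be sketched as a single function. This is an illustrative plain-Python sketch; `one_way_anova` is a hypothetical name, and the critical point 8.65 is taken from the text rather than computed.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    r = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mstr = sstr / (r - 1)
    mse = sse / (n - r)
    return {
        "SSTR": sstr, "SSE": sse, "SST": sstr + sse,
        "df_treatment": r - 1, "df_error": n - r,
        "MSTR": mstr, "MSE": mse, "F": mstr / mse,
    }

# Table 9-1 data: triangles, squares, circles.
table = one_way_anova([[4, 5, 7, 8], [10, 11, 12, 13], [1, 2, 3]])
for key, value in table.items():
    print(f"{key}: {value:.3f}")

# Critical point for alpha = 0.01 with (2, 8) degrees of freedom, from the text.
CRITICAL_POINT = 8.65
reject = table["F"] > CRITICAL_POINT
print("Reject H0:", reject)
```

Carrying full precision gives an F ratio of about 37.63; the value 37.62 quoted in the text results from first rounding MSTR to 79.95. The decision is the same either way.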
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Treatment             SSTR=159.9       (r-1)=2              MSTR=79.95    37.62
Error                 SSE=17.0         (n-r)=8              MSE=2.125
Total                 SST=176.9        (n-1)=10             MST=17.69
The ANOVA Table summarizes the ANOVA calculations.
[Figure: F Distribution for 2 and 8 Degrees of Freedom, f(F) against F(2,8), with right-tail area 0.01 beyond the critical point 8.65; Computed test statistic = 37.62]
In this instance, since the test statistic is greater than the critical point for an $\alpha$ = 0.01 level of significance, the null hypothesis may be rejected, and we may conclude that the means for triangles, squares, and circles are not all equal.
Template Output
Decision: Reject the Null Hypothesis
Minitab Output
Decision: Reject the Null Hypothesis
Example 9-2: Club Med
Club Med has conducted a test to determine whether its Caribbean resorts are equally well liked by vacationing club members. The analysis was based on a survey questionnaire (general satisfaction, on a scale from 0 to 100) filled out by a random sample of 40 respondents from each of 5 resorts.
Resort            Mean Response ($\bar{x}_i$)
Guadeloupe        89
Martinique        75
Eleuthra          73
Paradise Island   91
St. Lucia         85

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    F Ratio
Treatment             SSTR= 14208      (r-1)= 4             MSTR= 3552     7.04
Error                 SSE=98356        (n-r)= 195           MSE= 504.39
Total                 SST=112564       (n-1)= 199           MST= 565.65
[Figure: F(4,200) distribution, f(F) against F, with right-tail area 0.01 beyond the critical point 3.41; Computed test statistic = 7.04]
The resultant F ratio is larger than the critical point for $\alpha$ = 0.01, so the null hypothesis may be rejected.
Example 9-3: Job Involvement
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    F Ratio
Treatment             SSTR= 879.3      (r-1)=3              MSTR= 293.1    8.52
Error                 SSE= 18541.6     (n-r)= 539           MSE=34.4
Total                 SST= 19420.9     (n-1)=542            MST= 35.83
Given the total number of observations (n = 543), the number of groups (r = 4), the MSE (34.4), and the F ratio (8.52), the remainder of the ANOVA table can be completed. The critical point of the F distribution for $\alpha$ = 0.01 and (3, 400) degrees of freedom is 3.83. The test statistic in this example is much larger than this critical point, so the p value associated with this test statistic is less than 0.01, and the null hypothesis may be rejected.
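The way the rest of the table follows from n, r, MSE, and the F ratio can be made explicit with a short sketch. The function `complete_anova_table` is a hypothetical helper written for this illustration; it simply inverts the definitions MSTR = F x MSE, SSTR = MSTR(r - 1), and SSE = MSE(n - r).

```python
def complete_anova_table(n, r, mse, f_ratio):
    """Fill in a one-way ANOVA table from n, r, MSE, and the F ratio."""
    df_treatment, df_error, df_total = r - 1, n - r, n - 1
    mstr = f_ratio * mse          # because F = MSTR / MSE
    sstr = mstr * df_treatment    # because MSTR = SSTR / (r - 1)
    sse = mse * df_error          # because MSE = SSE / (n - r)
    sst = sstr + sse              # sum of squares principle
    return {
        "SSTR": sstr, "SSE": sse, "SST": sst,
        "df_treatment": df_treatment, "df_error": df_error,
        "df_total": df_total,
        "MSTR": mstr, "MSE": mse, "MST": sst / df_total,
    }

# Values given in Example 9-3.
job = complete_anova_table(n=543, r=4, mse=34.4, f_ratio=8.52)
for key, value in job.items():
    print(f"{key}: {value:.2f}")
```

Rounded to the precision used in the table, the results agree with the printed entries.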
9-5 Further Analysis
The ANOVA Diagram
Data -> ANOVA -> Do Not Reject H0 -> Stop
Data -> ANOVA -> Reject H0 -> Further Analysis: Confidence Intervals for Population Means; Tukey Pairwise Comparisons Test
The sample means are unbiased estimators of the population means.
The mean square error (MSE) is an unbiased estimator of the common population variance.
Confidence Intervals for Population Means
A $(1 - \alpha)100\%$ confidence interval for $\mu_i$, the mean of population i:
$\bar{x}_i \pm t_{\alpha/2}\sqrt{\frac{MSE}{n_i}}$
where $t_{\alpha/2}$ is the value of the t distribution with (n - r) degrees of freedom that cuts off a right-tailed area of $\alpha/2$.
Resort            Mean Response ($\bar{x}_i$)
Guadeloupe        89
Martinique        75
Eleuthra          73
Paradise Island   91
St. Lucia         85
SST = 112564, SSE = 98356, $n_i$ = 40, n = (5)(40) = 200, MSE = 504.39
$\bar{x}_i \pm t_{\alpha/2}\sqrt{\frac{MSE}{n_i}} = \bar{x}_i \pm 1.96\sqrt{\frac{504.39}{40}} = \bar{x}_i \pm 6.96$
89 ± 6.96 = [82.04, 95.96]
75 ± 6.96 = [68.04, 81.96]
73 ± 6.96 = [66.04, 79.96]
91 ± 6.96 = [84.04, 97.96]
85 ± 6.96 = [78.04, 91.96]
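The interval arithmetic for the Club Med resorts can be reproduced with a few lines. This is a minimal sketch; `confidence_interval` is an illustrative helper name, and the critical value 1.96 is the one used in the text (it is supplied as an input, not looked up).

```python
import math

def confidence_interval(mean, mse, n_i, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sqrt(MSE / n_i)."""
    half_width = t_crit * math.sqrt(mse / n_i)
    return mean - half_width, mean + half_width

# Club Med sample means; MSE, n_i, and the critical value 1.96 are from the text.
resort_means = {
    "Guadeloupe": 89, "Martinique": 75, "Eleuthra": 73,
    "Paradise Island": 91, "St. Lucia": 85,
}
intervals = {
    name: confidence_interval(mean, mse=504.39, n_i=40, t_crit=1.96)
    for name, mean in resort_means.items()
}
for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.2f}, {high:.2f}]")
```

Because every resort has the same sample size and the same pooled MSE, all five intervals share one half-width; only the centers differ.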
The Tukey Pairwise-Comparisons Test
The Tukey Pairwise Comparison test, or Honestly Significant Differences (HSD) test, allows us to compare every pair of population means with a single level of significance.
It is based on the studentized range distribution, q, with r and (n-r) degrees of freedom.
The critical point in a Tukey Pairwise Comparisons test is the Tukey Criterion:
$T = q_\alpha\sqrt{\frac{MSE}{n_i}}$
where $n_i$ is the smallest of the r sample sizes.
The test statistic is the absolute value of the difference between the appropriate sample means, and the null hypothesis is rejected if the test statistic is greater than the critical point of the Tukey Criterion.
Note that there are $\binom{r}{2} = \frac{r!}{2!(r-2)!}$ pairs of population means to compare. For example, if r = 3:
$H_0: \mu_1 = \mu_2$      $H_0: \mu_1 = \mu_3$      $H_0: \mu_2 = \mu_3$
$H_1: \mu_1 \ne \mu_2$    $H_1: \mu_1 \ne \mu_3$    $H_1: \mu_2 \ne \mu_3$
The Tukey Pairwise Comparison Test: The Club Med Example
The test statistic for each pairwise test is the absolute difference between the appropriate sample means.
i   Resort         Mean
1   Guadeloupe     89
2   Martinique     75
3   Eleuthra       73
4   Paradise Is.   91
5   St. Lucia      85
The critical point $T_{0.05}$ for r=5 and (n-r)=195 degrees of freedom is:
$T = q_\alpha\sqrt{\frac{MSE}{n_i}} = 3.86\sqrt{\frac{504.4}{40}} = 13.7$
I.    $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$: |89-75|=14>13.7*
II.   $H_0: \mu_1 = \mu_3$, $H_1: \mu_1 \ne \mu_3$: |89-73|=16>13.7*
III.  $H_0: \mu_1 = \mu_4$, $H_1: \mu_1 \ne \mu_4$: |89-91|=2<13.7
IV.   $H_0: \mu_1 = \mu_5$, $H_1: \mu_1 \ne \mu_5$: |89-85|=4<13.7
V.    $H_0: \mu_2 = \mu_3$, $H_1: \mu_2 \ne \mu_3$: |75-73|=2<13.7
VI.   $H_0: \mu_2 = \mu_4$, $H_1: \mu_2 \ne \mu_4$: |75-91|=16>13.7*
VII.  $H_0: \mu_2 = \mu_5$, $H_1: \mu_2 \ne \mu_5$: |75-85|=10<13.7
VIII. $H_0: \mu_3 = \mu_4$, $H_1: \mu_3 \ne \mu_4$: |73-91|=18>13.7*
IX.   $H_0: \mu_3 = \mu_5$, $H_1: \mu_3 \ne \mu_5$: |73-85|=12<13.7
X.    $H_0: \mu_4 = \mu_5$, $H_1: \mu_4 \ne \mu_5$: |91-85|=6<13.7
Reject the null hypothesis if the absolute value of the difference between the sample means is greater than the critical value of T. (The hypotheses marked with * are rejected.)
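The ten pairwise comparisons can be enumerated mechanically. The sketch below is a plain-Python illustration; the studentized range value q = 3.86 is taken from the text as an input, and the variable names are choices made for this example.

```python
import math
from itertools import combinations

# Club Med sample means, indexed as in the text (1 = Guadeloupe, ..., 5 = St. Lucia).
means = {1: 89, 2: 75, 3: 73, 4: 91, 5: 85}

# Studentized range value, MSE, and sample size as used in the text.
Q_ALPHA, MSE, N_I = 3.86, 504.4, 40

# Tukey criterion: T = q * sqrt(MSE / n_i).
tukey_t = Q_ALPHA * math.sqrt(MSE / N_I)

rejected, not_rejected = [], []
for i, j in combinations(sorted(means), 2):
    diff = abs(means[i] - means[j])
    # Reject H0: mu_i = mu_j when the absolute difference exceeds T.
    (rejected if diff > tukey_t else not_rejected).append((i, j))

print(f"T = {tukey_t:.1f}")
print("Rejected:", rejected)
print("Not rejected:", not_rejected)
```

`itertools.combinations` yields exactly the r!/(2!(r-2)!) = 10 unordered pairs, so no pair is tested twice and none is missed.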
Picturing the Results of a Tukey Pairwise Comparisons Test: The Club Med Example
We rejected the null hypothesis which compared the means of populations 1 and 2, 1 and 3, 2 and 4, and 3 and 4. On the other hand, we did not reject the null hypotheses of the equality of the means of populations 1 and 4, 1 and 5, 2 and 3, 2 and 5, 3 and 5, and 4 and 5.
[Figure: the five populations in the order 3, 2, 5, 1, 4, with bars joining the groupings]
The bars indicate the three groupings of populations with possibly equal means: 2 and 3; 2, 3, and 5; and 1, 4, and 5.

NOTE: Zero is not included in the intervals. Thus there is a significant difference between the means for A and B, A and C, and B and C.
9-6 Models, Factors and Designs
A statistical model is a set of equations and assumptions that capture the essential characteristics of a real-world situation.
The one-factor ANOVA model:
$x_{ij} = \mu_i + \varepsilon_{ij} = \mu + \tau_i + \varepsilon_{ij}$
where $\varepsilon_{ij}$ is the error associated with the jth member of the ith population. The errors are assumed to be normally distributed with mean 0 and variance $\sigma^2$.
9-6 Models, Factors and Designs (Continued)
A factor is a set of populations or treatments of a single kind. For example:
One factor models based on sets of resorts, types of airplanes, or kinds of sweaters
Two factor models based on firm and location
Three factor models based on color and shape and size of an ad.
Fixed-Effects and Random Effects
A fixed-effects model is one in which the levels of the factor under study (the treatments) are fixed in advance. Inference is valid only for the levels under study.
A random-effects model is one in which the levels of the factor under study are randomly chosen from an entire population of levels (treatments). Inference is valid for the entire population of levels.
Experimental Design
A completely-randomized design is one in which the elements are assigned to treatments completely at random. That is, any element chosen for the study has an equal chance of being assigned to any treatment.
In a blocking design, elements are assigned to treatments after first being collected into homogeneous groups.
In a completely randomized block design, all members of each block (homogeneous group) are randomly assigned to the treatment levels.
In a repeated measures design, each member of each block is assigned to all treatment levels.
9-7 Two-Way Analysis of Variance
In a two-way ANOVA, the effects of two factors or treatments can be investigated simultaneously. Two-way ANOVA also permits the investigation of the effects of either factor alone and of the two factors together.
The effect on the population mean that can be attributed to the levels of either factor alone is called a main effect.
An interaction effect between two factors occurs if the total effect at some pair of levels of the two factors or treatments differs significantly from the simple addition of the two main effects. Factors that do not interact are called additive.
Three questions answerable by two-way ANOVA:
Are there any factor A main effects?
Are there any factor B main effects?
Are there any interaction effects between factors A and B?
For example, we might investigate the effects on vacationers' ratings of resorts by looking at five different resorts (factor A) and four different resort attributes (factor B). In addition to the five main factor A treatment levels and the four main factor B treatment levels, there are (5*4=20) interaction treatment levels.
The Two-Way ANOVA Model
$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
where $\mu$ is the overall mean;
$\alpha_i$ is the effect of level i (i=1,...,a) of factor A;
$\beta_j$ is the effect of level j (j=1,...,b) of factor B;
$(\alpha\beta)_{ij}$ is the interaction effect of levels i and j;
$\varepsilon_{ijk}$ is the error associated with the kth data point from level i of factor A and level j of factor B.
$\varepsilon_{ijk}$ is assumed to be distributed normally with mean zero and variance $\sigma^2$ for all i, j, and k.
Two-Way ANOVA Data Layout: Club Med Example
                     Factor B: Attribute
Factor A: Resort     Friendship   Sports   Culture   Excitement
Guadeloupe           n11          n12      n13       n14
Martinique           n21          n22      n23       n24
Eleuthra             n31          n32      n33       n34
Paradise Island      n41          n42      n43       n44
St. Lucia            n51          n52      n53       n54
Graphical Display of Effects
[Figure: two plots of Rating, one against Resort with a line for each Attribute (Friendship, Excitement, Sports, Culture) and one against Attribute with a line for each Resort (Eleuthra, St. Lucia, Paradise Island, Martinique, Guadeloupe). Annotation: Eleuthra/sports interaction: combined effect greater than additive main effects]
Hypothesis Tests in a Two-Way ANOVA
Factor A main effects test:
H0: $\alpha_i = 0$ for all i = 1,2,...,a
H1: Not all $\alpha_i$ are 0
Factor B main effects test:
H0: $\beta_j = 0$ for all j = 1,2,...,b
H1: Not all $\beta_j$ are 0
Test for (AB) interactions:
H0: $(\alpha\beta)_{ij} = 0$ for all i = 1,2,...,a and j = 1,2,...,b
H1: Not all $(\alpha\beta)_{ij}$ are 0
Sums of Squares
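In a two-way ANOVA:
$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
SST = SSTR + SSE
$\sum (x - \bar{\bar{x}})^2 = \sum (\bar{x}_{ij} - \bar{\bar{x}})^2 + \sum (x - \bar{x}_{ij})^2$
SSTR = SSA + SSB + SS(AB)
$\sum (\bar{x}_{ij} - \bar{\bar{x}})^2 = \sum (\bar{x}_i - \bar{\bar{x}})^2 + \sum (\bar{x}_j - \bar{\bar{x}})^2 + \sum (\bar{x}_{ij} - \bar{x}_i - \bar{x}_j + \bar{\bar{x}})^2$
so that SST = SSA + SSB + SS(AB) + SSE.
Each sum runs over all abn observations; $\bar{x}_{ij}$ is a cell mean, $\bar{x}_i$ and $\bar{x}_j$ are the factor A and factor B level means, and $\bar{\bar{x}}$ is the grand mean.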
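The decomposition above can be checked numerically. The following is a minimal sketch, assuming a balanced design (the same number n of observations in every cell); the function name and the small data set are made up for illustration and do not come from the chapter. The multipliers bn, an, and n play the role of summing each squared deviation over all observations.

```python
from statistics import mean

def two_way_sums_of_squares(cells):
    """cells[i][j] holds the n observations for level i of A and level j of B."""
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])  # balanced design assumed: every cell has n points
    all_obs = [x for row in cells for cell in row for x in cell]
    grand = mean(all_obs)
    cell_mean = [[mean(cell) for cell in row] for row in cells]
    a_mean = [mean([x for cell in row for x in cell]) for row in cells]
    b_mean = [mean([x for row in cells for x in row[j]]) for j in range(b)]

    ssa = b * n * sum((m - grand) ** 2 for m in a_mean)
    ssb = a * n * sum((m - grand) ** 2 for m in b_mean)
    ssab = n * sum(
        (cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    sse = sum(
        (x - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    sst = sum((x - grand) ** 2 for x in all_obs)
    return {"SSA": ssa, "SSB": ssb, "SS(AB)": ssab, "SSE": sse, "SST": sst}

# Illustrative data only: a = 2 levels of A, b = 3 levels of B, n = 2 per cell.
data = [
    [[4.0, 6.0], [7.0, 9.0], [5.0, 5.0]],
    [[8.0, 10.0], [6.0, 8.0], [12.0, 14.0]],
]
ss = two_way_sums_of_squares(data)
parts = ss["SSA"] + ss["SSB"] + ss["SS(AB)"] + ss["SSE"]
print(ss, "parts sum to SST:", abs(parts - ss["SST"]) < 1e-9)
```

For unbalanced data the four components no longer add up to SST in this simple way, which is why the sketch insists on equal cell sizes.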
The Two-Way ANOVA Table

Source of Variation  Sum of Squares  Degrees of Freedom  Mean Square                    F Ratio
Factor A             SSA             a-1                 MSA = SSA/(a-1)                F = MSA/MSE
Factor B             SSB             b-1                 MSB = SSB/(b-1)                F = MSB/MSE
Interaction          SS(AB)          (a-1)(b-1)          MS(AB) = SS(AB)/[(a-1)(b-1)]   F = MS(AB)/MSE
Error                SSE             ab(n-1)             MSE = SSE/[ab(n-1)]
Total                SST             abn-1

A Main Effect Test: F(a-1, ab(n-1))
B Main Effect Test: F(b-1, ab(n-1))
(AB) Interaction Effect Test: F((a-1)(b-1), ab(n-1))
Example 9-4: Two-Way ANOVA (Location and Artist)

Source of Variation  Sum of Squares  Degrees of Freedom  Mean Square  F Ratio
Location             1824            2                   912          8.94
Artist               2230            2                   1115         10.93
Interaction          804             4                   201          1.97
Error                8262            81                  102
Total                13120           89

$\alpha$ = 0.01, F(2,81) = 4.88: Both main effect null hypotheses are rejected.
$\alpha$ = 0.05, F(4,81) = 2.48: The interaction effect null hypothesis is not rejected.
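The arithmetic behind the Example 9-4 table can be reproduced in a few lines. This is only a sketch: the sums of squares and the critical values 4.88 and 2.48 are taken from the slide, while the degrees of freedom 2, 2, and 4 for Location, Artist, and Interaction are inferred here from SS divided by MS (and from the total of 89), so treat them as an assumption.

```python
# (sum of squares, degrees of freedom) per source, from Example 9-4.
# The df values 2, 2 and 4 are inferred from SS / MS, not read off the slide.
table = {
    "Location": (1824, 2),
    "Artist": (2230, 2),
    "Interaction": (804, 4),
    "Error": (8262, 81),
}

mean_squares = {src: ss / df for src, (ss, df) in table.items()}
mse = mean_squares["Error"]
f_ratios = {src: ms / mse for src, ms in mean_squares.items() if src != "Error"}

# Critical values as quoted on the slide (0.01 level for the main effects,
# 0.05 level for the interaction).
critical = {"Location": 4.88, "Artist": 4.88, "Interaction": 2.48}
reject = {src: f_ratios[src] > critical[src] for src in f_ratios}

for src in f_ratios:
    print(f"{src}: F = {f_ratios[src]:.2f}, reject H0: {reject[src]}")
```

The same pattern (mean square = sum of squares / degrees of freedom, F = mean square / MSE) applies to every row of a two-way ANOVA table.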
Hypothesis Tests

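[Figure: two F density plots, f(F) against F. First panel: rejection region for $\alpha$ = 0.01 beyond F0.01 = 4.88, with Location test statistic = 8.94 and Artist test statistic = 10.93. Second panel: rejection region for $\alpha$ = 0.05 beyond F0.05 = 2.48, with Interaction test statistic = 1.97.]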
Overall Significance Level and Tukey Method for Two-Way ANOVA
Kimball's Inequality gives an upper limit on the true probability of at least one Type I error in the three tests of a two-way analysis:
$\alpha \le 1 - (1-\alpha_1)(1-\alpha_2)(1-\alpha_3)$
Tukey Criterion for factor A:
$T = q_{\alpha}\sqrt{MSE/(bn)}$
where the degrees of freedom of the q distribution are now a and ab(n-1). Note that MSE is divided by bn.
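Both formulas above are simple enough to evaluate directly. The sketch below uses hypothetical function names; the value q = 3.4 is a placeholder chosen for illustration, not a lookup from a studentized range table, and b = 3, n = 10, MSE = 102 are assumed values in the spirit of Example 9-4.

```python
import math

def kimball_upper_bound(alphas):
    """Upper limit on P(at least one Type I error) across the given tests."""
    prob_no_error = 1.0
    for alpha in alphas:
        prob_no_error *= 1.0 - alpha
    return 1.0 - prob_no_error

def tukey_criterion_factor_a(q, mse, b, n):
    """T = q * sqrt(MSE / (b n)) for comparing factor A level means."""
    return q * math.sqrt(mse / (b * n))

# Three tests (A, B, interaction), each run at the 0.05 level.
bound = kimball_upper_bound([0.05, 0.05, 0.05])

# q is a placeholder; in practice it comes from the studentized range
# distribution with a and ab(n-1) degrees of freedom.
T = tukey_criterion_factor_a(q=3.4, mse=102.0, b=3, n=10)
print(f"Kimball bound: {bound:.6f}, Tukey T: {T:.3f}")
```

Two factor A level means would then be declared different when the absolute difference of their sample means exceeds T.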
Template for a Two-Way ANOVA
Extension of ANOVA to Three Factors

Source of Variation  Sum of Squares  Degrees of Freedom  Mean Square                             F Ratio
Factor A             SSA             a-1                 MSA = SSA/(a-1)                         F = MSA/MSE
Factor B             SSB             b-1                 MSB = SSB/(b-1)                         F = MSB/MSE
Factor C             SSC             c-1                 MSC = SSC/(c-1)                         F = MSC/MSE
Interaction (AB)     SS(AB)          (a-1)(b-1)          MS(AB) = SS(AB)/[(a-1)(b-1)]            F = MS(AB)/MSE
Interaction (AC)     SS(AC)          (a-1)(c-1)          MS(AC) = SS(AC)/[(a-1)(c-1)]            F = MS(AC)/MSE
Interaction (BC)     SS(BC)          (b-1)(c-1)          MS(BC) = SS(BC)/[(b-1)(c-1)]            F = MS(BC)/MSE
Interaction (ABC)    SS(ABC)         (a-1)(b-1)(c-1)     MS(ABC) = SS(ABC)/[(a-1)(b-1)(c-1)]     F = MS(ABC)/MSE
Error                SSE             abc(n-1)            MSE = SSE/[abc(n-1)]
Total                SST             abcn-1
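A quick way to sanity-check the degrees-of-freedom column of the three-factor table is to confirm that the component values add up to the total, abcn-1. The sketch below assumes a balanced design with n observations per cell; the function name and the example sizes are illustrative only.

```python
def three_factor_df(a, b, c, n):
    """Degrees of freedom for a balanced three-factor ANOVA, n points per cell."""
    df = {
        "A": a - 1,
        "B": b - 1,
        "C": c - 1,
        "AB": (a - 1) * (b - 1),
        "AC": (a - 1) * (c - 1),
        "BC": (b - 1) * (c - 1),
        "ABC": (a - 1) * (b - 1) * (c - 1),
        "Error": a * b * c * (n - 1),
    }
    df["Total"] = a * b * c * n - 1
    return df

# Illustrative sizes: 2 x 3 x 4 design with 5 observations per cell.
df = three_factor_df(a=2, b=3, c=4, n=5)
component_sum = sum(v for k, v in df.items() if k != "Total")
print(df, "components add to total:", component_sum == df["Total"])
```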
Two-Way ANOVA with One Observation per Cell
The case of one data point in every cell presents a problem in two-way ANOVA.
There will be no degrees of freedom for the error term.
What can be done?
If we can assume that there are no interactions between the main effects, then we can use SS(AB) and its associated degrees of freedom (a-1)(b-1) in place of SSE and its degrees of freedom.
We can then conduct main effects tests using MS(AB).
See the next slide for the ANOVA table.
Two-Way ANOVA with One Observation per Cell

Source of Variation  Sum of Squares  Degrees of Freedom  Mean Square                    F Ratio
Factor A             SSA             a-1                 MSA = SSA/(a-1)                F = MSA/MS(AB)
Factor B             SSB             b-1                 MSB = SSB/(b-1)                F = MSB/MS(AB)
Error                SS(AB)          (a-1)(b-1)          MS(AB) = SS(AB)/[(a-1)(b-1)]
Total                SST             ab-1
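The one-observation-per-cell table can be computed from raw data as follows. This is a minimal sketch under the no-interaction assumption stated above; the function name and the 3 x 3 data grid are invented for illustration.

```python
from statistics import mean

def two_way_anova_one_obs(x):
    """x[i][j] is the single observation for level i of A and level j of B."""
    a, b = len(x), len(x[0])
    grand = mean([v for row in x for v in row])
    row_mean = [mean(row) for row in x]
    col_mean = [mean([x[i][j] for i in range(a)]) for j in range(b)]

    ssa = b * sum((m - grand) ** 2 for m in row_mean)
    ssb = a * sum((m - grand) ** 2 for m in col_mean)
    # With one point per cell, the interaction sum of squares is all that is
    # left over, and it serves as the error term.
    ssab = sum(
        (x[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    sst = sum((v - grand) ** 2 for row in x for v in row)

    ms_ab = ssab / ((a - 1) * (b - 1))
    return {
        "SSA": ssa, "SSB": ssb, "SS(AB)": ssab, "SST": sst,
        "F_A": (ssa / (a - 1)) / ms_ab,
        "F_B": (ssb / (b - 1)) / ms_ab,
    }

# Illustrative data only.
grid = [
    [10.0, 12.0, 14.0],
    [11.0, 15.0, 13.0],
    [15.0, 16.0, 20.0],
]
result = two_way_anova_one_obs(grid)
print(result)
```

The F ratios would be compared with F(a-1, (a-1)(b-1)) and F(b-1, (a-1)(b-1)) critical values, respectively.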
9-8 Blocking Designs
A block is a homogeneous set of subjects, grouped to minimize within-group differences.
A completely randomized design is one in which the elements are assigned to treatments completely at random. That is, any element chosen for the study has an equal chance of being assigned to any treatment.
In a blocking design, elements are assigned to treatments after first being collected into homogeneous groups.
In a randomized complete block design, all members of each block (homogeneous group) are randomly assigned to the treatment levels.
In a repeated measures design, each member of each block is assigned to all treatment levels.
Model for Randomized Complete Block Design
$x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
where $\mu$ is the overall mean;
$\alpha_i$ is the effect of level i (i = 1,...,a) of factor A;
$\beta_j$ is the effect of block j (j = 1,...,b);
$\varepsilon_{ij}$ is the error associated with $x_{ij}$;
$\varepsilon_{ij}$ is assumed to be distributed normally with mean zero and variance $\sigma^2$ for all i and j.
ANOVA Table for Blocking Designs: Example 9-5

Source of Variation  Sum of Squares  Degrees of Freedom  Mean Square                 F Ratio
Blocks               SSBL            n-1                 MSBL = SSBL/(n-1)           F = MSBL/MSE
Treatments           SSTR            r-1                 MSTR = SSTR/(r-1)           F = MSTR/MSE
Error                SSE             (n-1)(r-1)          MSE = SSE/[(n-1)(r-1)]
Total                SST             nr-1

Source of Variation  Sum of Squares  df   Mean Square  F Ratio
Blocks               2750            39   70.51        0.69
Treatments           2640            2    1320         12.93
Error                7960            78   102.05
Total                13350           119

$\alpha$ = 0.01, F(2, 78) = 4.88
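As a check on Example 9-5, the mean squares and F ratios follow directly from the sums of squares and degrees of freedom in the table (values rounded to two decimals):

\[
\begin{aligned}
MSBL &= \frac{2750}{39} \approx 70.51, &\quad MSTR &= \frac{2640}{2} = 1320, &\quad MSE &= \frac{7960}{78} \approx 102.05,\\
F_{\text{blocks}} &= \frac{70.51}{102.05} \approx 0.69, &\quad F_{\text{treatments}} &= \frac{1320}{102.05} \approx 12.93.
\end{aligned}
\]

Since 12.93 exceeds the critical value F(2, 78) = 4.88, the treatment effect is significant at the 0.01 level. The degrees of freedom are consistent with n = 40 blocks and r = 3 treatments: n-1 = 39, r-1 = 2, (n-1)(r-1) = 78, and nr-1 = 119.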
Template for the Randomized Complete Block Design

Two-Way ANOVA Using the Template for Problem 9-42

Two-Way ANOVA Using Minitab for Problem 9-42
