2
The one-way analysis of variance is used to test
the claim that three or more population means
are equal
This is an extension of the two independent
samples t-test
One-way ANOVA – An analysis of variance
procedure using one dependent and one
independent variable.
3
The response variable is the variable you’re
comparing
The factor variable is the categorical variable
being used to define the groups
◦ We will assume k samples (groups)
It is called one-way because each value is classified in
exactly one way
◦ Examples include comparisons by gender, race, political
party, color, etc.
4
To use the one-way ANOVA test, the following
assumptions must be true
5
There is a “family” of F
distributions.
Each member of the family is
determined by two parameters:
◦ the numerator degrees of freedom
◦ the denominator degrees of freedom.
F cannot be negative, and it is a
continuous distribution.
The F distribution is positively
skewed.
Its values range from 0 to ∞
As F → ∞ the curve approaches
the X-axis.
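The properties listed above can be checked numerically. The following is a minimal sketch that is not part of the original slides: the helper name `f_pdf` is hypothetical, and it evaluates the standard F density formula using only Python's `math` module.

```python
import math

def f_pdf(x, dfn, dfd):
    """Density of the F distribution with dfn numerator and dfd denominator df."""
    if x <= 0:
        # F cannot be negative; the single point x = 0 carries no probability.
        return 0.0
    log_const = (math.lgamma((dfn + dfd) / 2)
                 - math.lgamma(dfn / 2) - math.lgamma(dfd / 2)
                 + (dfn / 2) * math.log(dfn / dfd))
    log_kernel = ((dfn / 2 - 1) * math.log(x)
                  - ((dfn + dfd) / 2) * math.log(1 + dfn * x / dfd))
    return math.exp(log_const + log_kernel)

# The density stays positive and shrinks toward 0 as F grows (long right tail).
for x in (0.5, 1, 2, 4, 8, 16):
    print(x, f_pdf(x, 2, 12))
```

The degrees of freedom (2, 12) are chosen to match the filling-machine example later in the slides; any positive pair can be passed instead.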
6
Only one classification factor is considered
[Diagram: data layout. Rows: treatments 1, 2, …, i (the levels of the factor; the samples). Columns: replicates 1, …, j (the objects assigned to a given treatment). Cells: the response/outcome/dependent variable.]
7
H0: µ1 = µ2 = µ3 = ... = µk
◦ All population means are equal
◦ No treatment effect
Ha: Not all µi are equal
◦ At least 2 population means are different
◦ Treatment effect
◦ Writing Ha as µ1 ≠ µ2 ≠ ... ≠ µk is wrong
[Figure: densities f(X) for the groups. Upper panel: µ1 = µ2 = µ3. Lower panel: µ1 = µ2 ≠ µ3. Labels: mean square (variance) within; mean square among.]
8
If the null hypothesis is true,
◦ we would expect all the sample means to be close
to one another (and as a result, close to the grand
mean).
9
Variation
◦ Variation is the sum of the squared
deviations between each value and the mean of
the values.
As long as the values are not identical,
there will be variation
Denoted as SS for Sum of Squares
10
Are all of the values identical?
◦ No, so there is some variation in the data
◦ This is called the total variation
◦ Denoted SS(Total) for the total Sum of
Squares (variation)
◦ Sum of Squares is another name for variation
11
VARIATION BETWEEN GROUPS
◦ Are all of the sample means identical?
No, so there is some variation between the groups
For each data value, look at the difference between its
group mean and the overall mean. This is called the
between-group variation
Sometimes called the variation due to the factor
Denoted SS(A) for Sum of Squares (variation)
between the groups
$$SS(A) = \sum_{obs} (\bar{x}_i - \bar{x})^2$$
12
VARIATION WITHIN GROUPS
◦ Are all of the values within each group identical?
No, there is some variation within the groups.
For each data value, look at the difference between that
value and the mean of its group. This is called the
within-group variation
Sometimes called the error variation
Denoted SS(E) for Sum of Squares (variation) within
the groups
$$SS(E) = \sum_{obs} (x_{ij} - \bar{x}_i)^2$$
13
Variation is described as a Sum of Squares
SS(Total) = SS(Between) + SS(Within)
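The split of the total variation into a between-group part and a within-group part can be verified on a small data set. This is a minimal sketch: the three groups below are hypothetical toy data, not values from the slides, and the variable names are illustrative only.

```python
# Hypothetical toy data: three groups of three values (not from the slides).
groups = {
    "g1": [4.0, 5.0, 6.0],
    "g2": [7.0, 8.0, 9.0],
    "g3": [1.0, 2.0, 3.0],
}

def mean(values):
    return sum(values) / len(values)

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# Total: every value against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
# Between: each value's group mean against the grand mean
# (one term per observation, hence the factor len(g)).
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within: every value against its own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

print(ss_total, ss_between, ss_within)  # prints: 60.0 54.0 6.0
```

Here most of the variation (54 of 60) lies between the groups, which is the situation in which the F statistic becomes large.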
14
ONE-WAY ANOVA TABLE
Source SS df MS F
Between
(Factor)
Within
(Error)
Total
15
“F” means “F test statistic”
One-way Analysis of Variance
Source DF SS MS F P
Factor 2 2510.5 1255.3 93.44 0.000
Error 12 161.2 13.4
Total 14 2671.7
Source DF SS MS F
Factor a-1 SS(Between) MSA MSA/MSE
Error n-a SS(Error) MSE
Total n-1 SS(Total)
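MSA = SS(Between)/(a-1)
MSE = SS(Error)/(n-a)
n-1 = (a-1) + (n-a)
$$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_{i}\sum_{j} x_{ij}^2 - \frac{\left(\sum x_{ij}\right)^2}{n}$$
$$SSE = \sum_{obs} (x_{ij} - \bar{x}_i)^2$$
$$SSA = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_{i} \frac{\left(\sum_{j} x_{ij}\right)^2}{n_i} - \frac{\left(\sum x_{ij}\right)^2}{n}$$
$$SST = SSA + SSE; \qquad MS = \frac{SS}{DF}; \qquad F = \frac{MSA}{MSE}$$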
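The relations MS = SS/DF and F = MSA/MSE can be applied to the Minitab-style output shown earlier (SS 2510.5 for the factor and 161.2 for error, with DF 2 and 12). The sketch below is illustrative only: the function name `anova_table` is hypothetical, and a = 3 groups and n = 15 observations are inferred from those degrees of freedom.

```python
def anova_table(ss_factor, ss_error, n_groups, n_obs):
    """Fill in df, MS and F of a one-way ANOVA table from its sums of squares."""
    df_factor = n_groups - 1          # a - 1
    df_error = n_obs - n_groups       # n - a
    ms_factor = ss_factor / df_factor # MSA
    ms_error = ss_error / df_error    # MSE
    return {
        "df": (df_factor, df_error, n_obs - 1),
        "SS": (ss_factor, ss_error, ss_factor + ss_error),
        "MS": (ms_factor, ms_error),
        "F": ms_factor / ms_error,
    }

table = anova_table(2510.5, 161.2, n_groups=3, n_obs=15)
print(table["df"], round(table["F"], 2))  # prints: (2, 12, 14) 93.44
```

The recomputed F agrees with the 93.44 reported in the output.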
19
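If the means are equal,
F = MST / MSE ≈ 1, where MST is the mean square for treatments (the MSA above).
Only reject if F is large!
[Figure: F density starting at 0, with the rejection region of area α to the right of the critical value F(α; k – 1, n – k), where k is the number of groups. Do not reject H0 to the left of the critical value; reject H0 to the right.]
Always one-tail!
If MST is close to MSE, then both have the same source of variation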
20
As production manager, you want to see if three filling
machines have different mean filling times. You assign
15 similarly trained and experienced workers, 5 per
machine, to the machines. At the 5% level of
significance, is there a difference in mean filling times?
21
The summary statistics for each of the three filling
machines are shown in the table
below
22
The H0 is that the means are all equal
◦ H0: All machines have equal mean filling times
23
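$$SSA = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_{i} \frac{\left(\sum_{j} x_{ij}\right)^2}{n_i} - \frac{\left(\sum x_{ij}\right)^2}{n}$$
$$= \frac{124.65^2}{5} + \frac{113.05^2}{5} + \frac{102.95^2}{5} - \frac{(340.65)^2}{15}$$
$$= 7783.326 - 7736.162$$
$$= 47.164$$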
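The SSA arithmetic can be reproduced from the three machine totals and the common sample size of 5. This is a small sketch with illustrative variable names; the totals are the ones used in the slide's calculation.

```python
group_totals = [124.65, 113.05, 102.95]  # sum of filling times per machine
group_sizes = [5, 5, 5]                  # workers per machine

grand_total = sum(group_totals)          # about 340.65
n = sum(group_sizes)                     # 15 observations in all

# Second term of the shortcut formula: (sum of all values)^2 / n.
correction = grand_total ** 2 / n
# First term: sum over groups of (group total)^2 / group size.
ssa = sum(t ** 2 / m for t, m in zip(group_totals, group_sizes)) - correction

print(f"SSA = {ssa:.3f}")
```

The same correction term is reused when computing SST, which is why it is kept as a separate variable.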
24
$$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_{i}\sum_{j} x_{ij}^2 - \frac{\left(\sum x_{ij}\right)^2}{n}$$
$$= \left[ 25.4^2 + 26.31^2 + 24.1^2 + \dots + 20.4^2 \right] - 7736.162$$
$$= 7794.379 - 7736.162$$
$$= 58.2172$$
25
SST = SSA + SSE
SSE = SST − SSA
= 58.2172 − 47.164
= 11.0532
26
Source               SS        df             MS        F
Between (Machines)   47.1640
Within (Error)       11.0532
Total                58.2172
27
Filling in the degrees of freedom gives this …
Source               SS        df             MS        F
Between (Machines)   47.1640   3 - 1 = 2
Within (Error)       11.0532   15 - 3 = 12
Total                58.2172   15 - 1 = 14
28
Completing the MS gives …
Source               SS        df             MS        F
Between (Machines)   47.1640   3 - 1 = 2      23.5820
Within (Error)       11.0532   15 - 3 = 12    0.9211
Total                58.2172   15 - 1 = 14
29
Adding F to the table …
Source               SS        df             MS        F
Between (Machines)   47.1640   3 - 1 = 2      23.5820   25.60
Within (Error)       11.0532   15 - 3 = 12    0.9211
Total                58.2172   15 - 1 = 14
30
H0: µ1 = µ2 = µ3
H1: Not all means are equal
α = .05
ν1 = 2, ν2 = 12
Critical value(s): F(.05; 2, 12) = 3.89
Test statistic: F = MST / MSE = 23.5820 / .9211 = 25.6
Decision: Reject H0 at α = .05
Conclusion: There is evidence that the three filling machines have different mean filling times
[Figure: F density starting at 0, with the rejection region of area α = .05 to the right of 3.89.]
31
One-way ANOVA: time versus Machine
Source DF SS MS F P
Machine 2 47.164 23.582 25.60 0.000
Error 12 11.053 0.921
Total 14 58.217
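The critical value 3.89 and the reported P of 0.000 can be checked without statistical tables in this particular case. The sketch below relies on a special-case identity: when the numerator has exactly 2 degrees of freedom, P(F > f) = (1 + 2f/ν2)^(−ν2/2). It does not apply to other numerator degrees of freedom, and the function names are hypothetical.

```python
def f_sf_dfn2(f, dfd):
    """P(F > f) for an F distribution with 2 numerator df and dfd denominator df."""
    return (1 + 2 * f / dfd) ** (-dfd / 2)

def f_crit_dfn2(alpha, dfd):
    """Value c with P(F > c) = alpha, valid for 2 numerator df only."""
    return (dfd / 2) * (alpha ** (-2 / dfd) - 1)

f_stat = 23.582 / 0.921            # MS(Machine) / MS(Error) from the output
crit = f_crit_dfn2(0.05, 12)       # F(.05; 2, 12)
p_value = f_sf_dfn2(f_stat, 12)

print(round(crit, 2), f_stat > crit, p_value < 0.0005)  # prints: 3.89 True True
```

A P-value below 0.0005 is displayed as 0.000 when rounded to three decimals, which is consistent with the output above. For a general number of groups, a statistics library or an F table would be needed instead of this closed form.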
32
33
An experiment was performed to determine whether
the annealing temperature of ductile iron affects its
tensile strength. Five specimens were annealed at each
of four temperatures. The tensile strength (in ksi) was
measured for each temperature. The results are
presented in the following table. Can you conclude that
there are differences among the mean strengths?
34
Temperature (°C)   Sample size (n)   Total
750
800
850
900
35
36
One-way ANOVA: strength versus Temperature
Source DF SS MS F P
Temperature 3 58.65 19.55 8.49 0.001
Error 16 36.84 2.30
Total 19 95.49
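As a check on the output above, the MS and F columns can be recomputed from the SS and DF columns (values rounded to two decimals):

$$
MS_{Temperature} = \frac{58.65}{3} = 19.55, \qquad
MS_{Error} = \frac{36.84}{16} \approx 2.30, \qquad
F = \frac{19.55}{2.3025} \approx 8.49
$$

The sums of squares and degrees of freedom also add up as expected: 58.65 + 36.84 = 95.49 and 3 + 16 = 19. With P = 0.001, H0 would be rejected at the 5% level, assuming that is the significance level intended for this example.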
37
38
Confidence interval for each mean, µi
$$\bar{x}_i \pm t_{\alpha/2,\, n-a} \sqrt{\frac{MSE}{n_i}}$$
39
$$(\bar{X}_1 - \bar{X}_2) \pm t \sqrt{MSE \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
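Both interval formulas can be applied to the filling-machine example. This is a sketch under stated assumptions: the critical value t(0.025; 12) = 2.179 is taken from a standard t table, the machine labels are hypothetical (the slides give only the three totals), and the intervals are ordinary unadjusted ones, not Tukey simultaneous intervals.

```python
import math

T_CRIT = 2.179       # t(0.025; 12), read from a standard t table (assumption)
MSE = 0.9211         # mean square error from the filling-machine ANOVA table
N_PER_MACHINE = 5    # workers per machine

# Machine labels are hypothetical; the totals are the three group totals
# used in the SSA calculation.
totals = {"machine 1": 124.65, "machine 2": 113.05, "machine 3": 102.95}
means = {name: total / N_PER_MACHINE for name, total in totals.items()}

def mean_ci(xbar, n):
    """Interval xbar +/- t * sqrt(MSE / n) for one group mean."""
    half = T_CRIT * math.sqrt(MSE / n)
    return xbar - half, xbar + half

def diff_ci(xbar1, xbar2, n1, n2):
    """Interval (xbar1 - xbar2) +/- t * sqrt(MSE * (1/n1 + 1/n2))."""
    half = T_CRIT * math.sqrt(MSE * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - half, diff + half

for name, xbar in means.items():
    low, high = mean_ci(xbar, N_PER_MACHINE)
    print(f"{name}: {low:.2f} to {high:.2f}")

low, high = diff_ci(means["machine 1"], means["machine 3"],
                    N_PER_MACHINE, N_PER_MACHINE)
print(f"machine 1 - machine 3: {low:.2f} to {high:.2f}")
```

An interval for a difference that lies entirely above or below 0 indicates that the two population means differ at the chosen confidence level.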
40
When the null hypothesis is rejected, it may
be desirable to find which mean(s) is (are)
different.
Two statistical inference procedures, geared
at doing this, are presented:
◦ “regular” confidence interval calculations
◦ Tukey test
41
Two means are considered different if the
confidence interval for the difference
between the corresponding sample
means does not contain 0.
In this case the larger sample mean is
believed to be associated with a larger
population mean.
42
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Machine
43
44
Only two classification factors are considered
[Diagram: data layout. Rows: levels 1, 2, … of Factor A. Columns: levels 1, 2, …, j of Factor B.]
45
The standard two-way ANOVA tests are valid under the
following conditions:
Total: df = abn-1,
$$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk}^2 - \frac{x_{...}^2}{abn}$$
where x... is the sum of all abn observations.
47
A chemical engineer is studying the effects of various reagents
and catalysts on the yield of a certain process. Yield is expressed
as a percentage of a theoretical maximum. 4 runs of the process
were made for each combination of 3 reagents and 4 catalysts.
Construct an ANOVA table and test whether there is an interaction
effect between reagents and catalysts.
Catalyst   Reagent 1               Reagent 2               Reagent 3
A          86.8 82.4 86.7 83.5     93.4 85.2 94.8 83.1     77.9 89.6 89.9 83.7
B          71.9 72.1 80.0 77.4     74.5 87.1 71.9 84.1     87.5 82.7 78.3 90.1
C          65.5 72.4 76.6 66.7     66.7 77.1 76.7 86.1     72.7 77.8 83.5 78.8
D          63.9 70.4 77.2 81.2     73.7 81.6 84.2 84.9     79.8 75.7 80.5 72.9
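The sums of squares for this example can be sketched as follows. The code uses the usual balanced two-way ANOVA decomposition (main effects, interaction, error), which is an assumption because the slide containing the full two-way table is missing. It also assumes that each catalyst row holds four runs per reagent, laid out as two values on each of the two lines. Variable names are illustrative.

```python
# yields[catalyst][reagent index] -> four runs (layout assumed from the slide)
yields = {
    "A": [[86.8, 82.4, 86.7, 83.5], [93.4, 85.2, 94.8, 83.1], [77.9, 89.6, 89.9, 83.7]],
    "B": [[71.9, 72.1, 80.0, 77.4], [74.5, 87.1, 71.9, 84.1], [87.5, 82.7, 78.3, 90.1]],
    "C": [[65.5, 72.4, 76.6, 66.7], [66.7, 77.1, 76.7, 86.1], [72.7, 77.8, 83.5, 78.8]],
    "D": [[63.9, 70.4, 77.2, 81.2], [73.7, 81.6, 84.2, 84.9], [79.8, 75.7, 80.5, 72.9]],
}

def mean(values):
    return sum(values) / len(values)

a, b, n = len(yields), 3, 4          # catalysts, reagents, runs per cell
cells = [runs for row in yields.values() for runs in row]
all_obs = [x for runs in cells for x in runs]
grand = mean(all_obs)

cat_means = [mean([x for runs in row for x in runs]) for row in yields.values()]
reag_means = [mean([x for row in yields.values() for x in row[r]]) for r in range(b)]

ss_cat = b * n * sum((m - grand) ** 2 for m in cat_means)
ss_reag = a * n * sum((m - grand) ** 2 for m in reag_means)
ss_cells = n * sum((mean(runs) - grand) ** 2 for runs in cells)
ss_inter = ss_cells - ss_cat - ss_reag            # interaction
ss_error = sum((x - mean(runs)) ** 2 for runs in cells for x in runs)
# Shortcut form of SST: sum of squares minus (grand total)^2 / abn.
ss_total = sum(x ** 2 for x in all_obs) - sum(all_obs) ** 2 / (a * b * n)

df_inter, df_error = (a - 1) * (b - 1), a * b * (n - 1)
f_inter = (ss_inter / df_inter) / (ss_error / df_error)
print(f"SS interaction = {ss_inter:.2f}, SS error = {ss_error:.2f}, F = {f_inter:.2f}")
```

The interaction F would then be compared with the critical value F(α; 6, 36) from a table or a statistics library to decide whether an interaction is present.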
48