Anova

252anova 1/26/07 (Open this document in 'Outline' view!)
Roger Even Bove
F. ANALYSIS OF VARIANCE
1. 1-Way Analysis of Variance
a. The ANOVA model - relation to regression
The one-way ANOVA model is used to compare the means of more than two samples, taken from
populations that are all assumed to have the same variance. Each sample (called a treatment) is usually
represented as a column, but there is no requirement that each column have the same number of items in
it.
We will assume the model $x_{ij} = \mu + a_j + e_{ij}$, where $i = 1$ through $n_j$ and $j = 1$ through $m$. We thus have $m$ treatments, $n_j$ items in each column and a total of $n = \sum_{j=1}^{m} n_j$ observations. (Thus $x_{ij}$ should be the number in row $i$ and column $j$.)
b. An ANOVA problem
The following data describe monthly expenses for energy in three random samples of essentially identical homes. Each column represents expenses on one fuel. $\alpha = .05$.

         Fuel 1   Fuel 2   Fuel 3
           89      104       86
          101      120       98
           87       98      100
           87      110       96
Sum       364      432      380

Our hypotheses are $H_0: \mu_1 = \mu_2 = \mu_3$ and $H_1$: Not all means equal.
In the notation used here, $i$ is replaced by a dot to indicate that a mean has been taken, that is, $\bar{x}_{.j}$ is the mean of column $j$. In particular, the mean of column 1 is $\bar{x}_{.1} = \frac{364}{4} = 91$, $\bar{x}_{.2} = \frac{432}{4} = 108$ and $\bar{x}_{.3} = \frac{380}{4} = 95$. The overall or grand mean is the mean of all the numbers in the problem, and is often indicated by $\bar{\bar{x}}$, but $\bar{x}$ seems to be a more appealing notation. $\bar{x} = \frac{364 + 432 + 380}{12} = 98$.
We compute three sums of squares.
(i) The total sum of squares is the same thing as the numerator of the sample variance of the numbers in the problem.
$$\begin{aligned}
SST &= \sum_j \sum_i (x_{ij} - \bar{x})^2 \\
&= (89-98)^2 + (101-98)^2 + (87-98)^2 + (87-98)^2 \\
&\quad + (104-98)^2 + (120-98)^2 + (98-98)^2 + (110-98)^2 \\
&\quad + (86-98)^2 + (98-98)^2 + (100-98)^2 + (96-98)^2 = 1148
\end{aligned}$$
(ii) The sum of squares within treatments has the same number of terms, but highlights the contribution to the total sum of squares generated by the difference between the individual numbers and the column (treatment) means.
$$\begin{aligned}
SSW &= \sum_j \sum_i (x_{ij} - \bar{x}_{.j})^2 \\
&= (89-91)^2 + (101-91)^2 + (87-91)^2 + (87-91)^2 \\
&\quad + (104-108)^2 + (120-108)^2 + (98-108)^2 + (110-108)^2 \\
&\quad + (86-95)^2 + (98-95)^2 + (100-95)^2 + (96-95)^2 = 516
\end{aligned}$$
(iii) The sum of squares between treatments also has the same number of terms, but it highlights the contribution to the total sum of squares generated by the difference between the column (treatment) means and the overall mean.
$$\begin{aligned}
SSB &= \sum_j \sum_i (\bar{x}_{.j} - \bar{x})^2 \\
&= (91-98)^2 + (91-98)^2 + (91-98)^2 + (91-98)^2 \\
&\quad + (108-98)^2 + (108-98)^2 + (108-98)^2 + (108-98)^2 \\
&\quad + (95-98)^2 + (95-98)^2 + (95-98)^2 + (95-98)^2 = 632
\end{aligned}$$
But, because of the repetition of the column mean, this can be simplified to
$$SSB = \sum_j n_j (\bar{x}_{.j} - \bar{x})^2 = 4(91-98)^2 + 4(108-98)^2 + 4(95-98)^2 = 632.$$
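The three sums of squares can be checked numerically. The following is a minimal Python sketch that applies the definitions directly to the fuel data; the variable names (`columns`, `col_means`, and so on) are illustrative choices and do not come from the notes.

```python
# Monthly energy expenses from the example; one list per fuel (treatment).
columns = [
    [89, 101, 87, 87],    # Fuel 1
    [104, 120, 98, 110],  # Fuel 2
    [86, 98, 100, 96],    # Fuel 3
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


all_values = [x for col in columns for x in col]
grand_mean = mean(all_values)
col_means = [mean(col) for col in columns]

# SST: every observation against the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# SSW: every observation against its own column mean.
ssw = sum((x - m) ** 2 for col, m in zip(columns, col_means) for x in col)

# SSB: each column mean against the grand mean, weighted by column size.
ssb = sum(len(col) * (m - grand_mean) ** 2 for col, m in zip(columns, col_means))

print("grand mean:", grand_mean, "column means:", col_means)
print("SST:", sst, "SSW:", ssw, "SSB:", ssb)
```

Because each column carries its own length, the same sketch works when the columns have different numbers of items.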
But note that $SSB + SSW = SST$, so that the computation of one of the three sums of squares is unnecessary. The material is summarized in a table like the one below.

Source    SS    DF        MS                          F
Between   SSB   $m - 1$   $MSB = \frac{SSB}{m - 1}$   $F = \frac{MSB}{MSW}$
Within    SSW   $n - m$   $MSW = \frac{SSW}{n - m}$
Total     SST   $n - 1$
We fill in the table with the numbers we have computed and compare the F that we have computed with
an F with the appropriate significance level and degrees of freedom shown in the DF column. If the
F that we have computed is larger than the table F , reject the null hypothesis.
Source    SS     DF   MS       F        $F_{.05}$                  $H_0$
Between   632    2    316      5.51 s   $F_{.05}^{(2,9)} = 4.26$   Column means equal
Within    516    9    57.333
Total     1148   11
The s for significant difference indicates that the null hypothesis of equality of means has been rejected.
ns for no significant difference would indicate that the null hypothesis has not been rejected.
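The way the table is filled in can be sketched in a few lines of Python. This is only an illustration: the function name `one_way_anova_table` is made up for this example, and the critical value 4.26 is simply copied from the F table entry quoted above rather than computed.

```python
def one_way_anova_table(ssb, ssw, m, n):
    """Fill in DF, MS and F for a one-way ANOVA with m columns and n items."""
    df_between = m - 1
    df_within = n - m
    msb = ssb / df_between
    msw = ssw / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n - 1,
        "MSB": msb,
        "MSW": msw,
        "F": msb / msw,
    }


table = one_way_anova_table(ssb=632, ssw=516, m=3, n=12)

# Critical value F.05 with (2, 9) degrees of freedom, taken from the notes.
f_critical = 4.26
reject_h0 = table["F"] > f_critical

print(table)
print("s" if reject_h0 else "ns")
```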
c. A format for ANOVA
If we use the same simplifications that we use in calculating a sample variance, we can get the tableau
below.
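                            Fuel 1      Fuel 2      Fuel 3      Sum
                              89         104          86
                             101         120          98
                              87          98         100
                              87         110          96
Sum                          364    +    432    +    380    =   1176 = $\sum\sum x_{ij}$
$n_j$                          4    +      4    +      4    =   12 = $n$
$\bar{x}_{.j}$             91.00      108.00       95.00        $\bar{x} = \frac{1176}{12} = 98$
SS ($\sum_i x_{ij}^2$)     33260    +  46920    +  36216    =   116396 = $\sum\sum x_{ij}^2$
$\bar{x}_{.j}^2$            8281    +  11664    +   9025    =   28970 = $\sum \bar{x}_{.j}^2$

$$SST = \sum\sum x_{ij}^2 - n\bar{x}^2 = 116396 - 12(98)^2 = 1148$$
$$SSB = \sum_j n_j \bar{x}_{.j}^2 - n\bar{x}^2 = 4(91)^2 + 4(108)^2 + 4(95)^2 - 12(98)^2 = 4(28970) - 12(98)^2 = 632$$

Source    SS     DF   MS       F        $F_{.05}$                  $H_0$
Between   632    2    316      5.51 s   $F_{.05}^{(2,9)} = 4.26$   Column means equal
Within    516    9    57.333
Total     1148   11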
Explanation: Since the Sum of Squares (SS) column must add up, 516 is found by subtracting 632 from 1148. Since $n = 12$, the total degrees of freedom are $n - 1 = 11$. Since there are 3 random samples or columns, the degrees of freedom for Between is $3 - 1 = 2$. Since the Degrees of Freedom (DF) column must add up, $9 = 11 - 2$. The Mean Square (MS) column is found by dividing the SS column by the DF column. 316 is MSB and 57.333 is MSW. $F = \frac{MSB}{MSW}$, and is compared with $F_{.05}$ from the F table ($df_1 = 2$, $df_2 = 9$). To see this as Minitab output go to 252anovaex1.
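The shortcut computation can be sketched in Python as follows. It uses only column sums and sums of squared values, in the same way as the computational formula for a sample variance; the variable names are illustrative and not part of the notes.

```python
columns = [
    [89, 101, 87, 87],    # Fuel 1
    [104, 120, 98, 110],  # Fuel 2
    [86, 98, 100, 96],    # Fuel 3
]

n = sum(len(col) for col in columns)                  # total number of items
sum_x = sum(sum(col) for col in columns)              # sum of all x_ij
sum_x2 = sum(x * x for col in columns for x in col)   # sum of all x_ij squared
grand_mean = sum_x / n

# SST = (sum of x_ij^2) - n * (grand mean)^2
sst = sum_x2 - n * grand_mean ** 2

# SSB = (sum over columns of n_j * column mean^2) - n * (grand mean)^2
ssb = sum(len(col) * (sum(col) / len(col)) ** 2 for col in columns) - n * grand_mean ** 2

# SSW follows by subtraction because the SS column must add up.
ssw = sst - ssb

print(n, sum_x, sum_x2, grand_mean)
print(sst, ssb, ssw)
```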
d. Confidence Intervals
i. A single Confidence Interval
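If we desire a single interval, we use the formula for the difference between two means when the variance is known. For example, if we want the difference between the means of column 1 and column 2, we use
$$\mu_1 - \mu_2 = (\bar{x}_{.1} - \bar{x}_{.2}) \pm t_{\alpha/2}^{(n-m)}\, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where $s = \sqrt{MSW}$.
ii. Scheffé Confidence Interval
If we desire intervals that will simultaneously be valid for a given confidence level for all possible intervals between column means, use
$$\mu_1 - \mu_2 = (\bar{x}_{.1} - \bar{x}_{.2}) \pm \sqrt{(m-1) F_{\alpha}^{(m-1,\,n-m)}}\; s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
iii. Bonferroni Confidence Interval
If we only need $k$ different intervals, use
$$\mu_1 - \mu_2 = (\bar{x}_{.1} - \bar{x}_{.2}) \pm t_{\alpha/(2k)}^{(n-m)}\, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
iv. Tukey Confidence Interval
This also applies to all possible differences.
$$\mu_1 - \mu_2 = (\bar{x}_{.1} - \bar{x}_{.2}) \pm q_{\alpha}^{(m,\,n-m)} \frac{s}{\sqrt{2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
This gives rise to Tukey's HSD (Honestly Significant Difference) procedure. Two sample means $\bar{x}_{.1}$ and $\bar{x}_{.2}$ are significantly different if $|\bar{x}_{.1} - \bar{x}_{.2}|$ is greater than
$$q_{\alpha}^{(m,\,n-m)} \frac{s}{\sqrt{2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$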
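All four intervals share the form (difference of sample means) ± multiplier × $s\sqrt{1/n_1 + 1/n_2}$ and differ only in the multiplier. The following Python sketch makes that explicit. The function names are hypothetical, critical values are passed in from tables rather than computed, and the Tukey multiplier assumes the $q/\sqrt{2}$ form. The worked call uses the Scheffé multiplier with the $F_{.05}^{(2,9)} = 4.26$ quoted earlier in these notes, applied to fuels 1 and 2.

```python
import math


def pairwise_interval(mean1, mean2, n1, n2, msw, multiplier):
    """Interval for mu1 - mu2: difference +/- multiplier * s * sqrt(1/n1 + 1/n2)."""
    s = math.sqrt(msw)
    diff = mean1 - mean2
    half_width = multiplier * s * math.sqrt(1 / n1 + 1 / n2)
    return diff - half_width, diff + half_width


def scheffe_multiplier(m, f_crit):
    """sqrt((m - 1) * F), where F has (m - 1, n - m) degrees of freedom."""
    return math.sqrt((m - 1) * f_crit)


def tukey_multiplier(q_crit):
    """q / sqrt(2), where q is the studentized range value for (m, n - m)."""
    return q_crit / math.sqrt(2)


# For a single or Bonferroni interval the multiplier is just the t table value.

msw = 516 / 9  # MSW from the one-way ANOVA table
low, high = pairwise_interval(91, 108, 4, 4, msw, scheffe_multiplier(3, 4.26))
print(low, high)
```

An interval that does not contain zero indicates a significant difference between the two column means at the chosen confidence level.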
2. 2-Way Analysis of Variance

a. The 2-way model
We will assume $R$ rows, $C$ columns and $P$ observations per cell. Thus our model reads $x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$, where $i = 1$ through $R$, $j = 1$ through $C$, and $k = 1$ through $P$.
We will be testing three pairs of hypotheses: (i) $H_{01}$: All row means equal (all $\alpha_i$ zero); $H_{11}$: Not all row means equal, (ii) $H_{02}$: All column means equal (all $\beta_j$ zero); $H_{12}$: Not all column means equal, (iii) $H_{03}$: No interaction (all $(\alpha\beta)_{ij}$ zero); $H_{13}$: Interaction.
This is similar to one-way ANOVA with $RC$ groups, but the between variation is itemized as to whether it is due to variation between row means, variation between column means or interaction. If we remember that $n = RCP$ and $m = RC$, we can rewrite the one-way ANOVA table diagram shown earlier as below. As previously, we get the items in the MS column by dividing the numbers in the SS column by the numbers in the DF column. The F is then found by dividing MSB by MSW.
Source    SS    DF            MS    F     $F_\alpha$   $H_0$
Between   SSB   $RC - 1$      MSB   ___   ___          Treatment means equal
Within    SSW   $RC(P - 1)$   MSW
Total     SST   $RCP - 1$

We can now rewrite the same table with the between items split up.

Source        SS    DF                 MS    F     $F_\alpha$   $H_0$
Rows          SSR   $R - 1$            MSR   ___   ___          Row means equal
Columns       SSC   $C - 1$            MSC   ___   ___          Column means equal
Interaction   SSI   $(R - 1)(C - 1)$   MSI   ___   ___          No Interaction
Within        SSW   $RC(P - 1)$        MSW
Total         SST   $RCP - 1$
b. An example
                         Insulation 1       Insulation 2
                         (Factor $B_1$)     (Factor $B_2$)
Fuel 1 (Factor $A_1$)    89, 101            87, 87
Fuel 2 (Factor $A_2$)    120, 110           98, 104
Fuel 3 (Factor $A_3$)    100, 98            86, 96

This problem has $R = 3$ rows, $C = 2$ columns and, within each cell, $P = 2$ measurements. We can compute a table of means which shows means for each cell, row and column, as well as an overall mean.

                         Insulation 1               Insulation 2               Row means
                         (Factor $B_1$)             (Factor $B_2$)
Fuel 1 (Factor $A_1$)    $\bar{x}_{11.} = 95$       $\bar{x}_{12.} = 87$       $\bar{x}_{1..} = 91$
Fuel 2 (Factor $A_2$)    $\bar{x}_{21.} = 115$      $\bar{x}_{22.} = 101$      $\bar{x}_{2..} = 108$
Fuel 3 (Factor $A_3$)    $\bar{x}_{31.} = 99$       $\bar{x}_{32.} = 91$       $\bar{x}_{3..} = 95$
Column means             $\bar{x}_{.1.} = 103$      $\bar{x}_{.2.} = 93$       $\bar{x} = 98$
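Now we do the computation of sums of squares, using the same simplification that we use in computing a sample variance.
$$\begin{aligned}
SST &= \sum_{ijk} (x_{ijk} - \bar{x})^2 \\
&= (89-98)^2 + (101-98)^2 + (87-98)^2 + (87-98)^2 + (120-98)^2 + \cdots + (96-98)^2 \\
&= 89^2 + 101^2 + 87^2 + 87^2 + 120^2 + \cdots + 96^2 - 12(98)^2 = 1148
\end{aligned}$$
$$\begin{aligned}
SSW &= \sum_{ijk} (x_{ijk} - \bar{x}_{ij.})^2 \\
&= (89-95)^2 + (101-95)^2 + (87-87)^2 + (87-87)^2 + (120-115)^2 + \cdots + (96-91)^2 \\
&= [89^2 + 101^2 - 2(95)^2] + [87^2 + 87^2 - 2(87)^2] + [120^2 + 110^2 - 2(115)^2] \\
&\quad + \cdots + [86^2 + 96^2 - 2(91)^2] = 192
\end{aligned}$$
$$\begin{aligned}
SSR &= CP \sum_i (\bar{x}_{i..} - \bar{x})^2 = 2(2)\left[(91-98)^2 + (108-98)^2 + (95-98)^2\right] \\
&= 2(2)\left[91^2 + 108^2 + 95^2 - 3(98)^2\right] = 632
\end{aligned}$$
$$\begin{aligned}
SSC &= RP \sum_j (\bar{x}_{.j.} - \bar{x})^2 = 3(2)\left[(103-98)^2 + (93-98)^2\right] \\
&= 3(2)\left[103^2 + 93^2 - 2(98)^2\right] = 300
\end{aligned}$$
$SSI = P \sum_{ij} (\bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x})^2$, but we do not compute this because $SST = SSR + SSC + SSI + SSW$, so that
$$SSI = SST - SSR - SSC - SSW = 1148 - 632 - 300 - 192 = 24.$$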
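The two-way sums of squares can be verified with a short Python sketch. Here the interaction sum of squares is computed directly from its definition instead of by subtraction, which gives an independent check on the arithmetic. The nested-list layout `cells[i][j]` and all names are illustrative assumptions, not part of the notes.

```python
# cells[i][j] holds the P observations for row i (fuel) and column j (insulation).
cells = [
    [[89, 101], [87, 87]],    # Fuel 1
    [[120, 110], [98, 104]],  # Fuel 2
    [[100, 98], [86, 96]],    # Fuel 3
]
R, C, P = len(cells), len(cells[0]), len(cells[0][0])


def mean(values):
    return sum(values) / len(values)


all_values = [x for row in cells for cell in row for x in cell]
grand = mean(all_values)
cell_means = [[mean(cell) for cell in row] for row in cells]
row_means = [mean([x for cell in row for x in cell]) for row in cells]
col_means = [mean([x for row in cells for x in row[j]]) for j in range(C)]

sst = sum((x - grand) ** 2 for x in all_values)
ssw = sum(
    (x - cell_means[i][j]) ** 2
    for i in range(R) for j in range(C) for x in cells[i][j]
)
ssr = C * P * sum((m - grand) ** 2 for m in row_means)
ssc = R * P * sum((m - grand) ** 2 for m in col_means)

# Interaction computed from its definition rather than by subtraction.
ssi = P * sum(
    (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
    for i in range(R) for j in range(C)
)

print(sst, ssr, ssc, ssi, ssw)
```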
Our ANOVA table is thus:

Source             SS     DF   MS    F         $F_{.05}$                  $H_0$
Rows (A)           632    2    316   9.88 s    $F_{.05}^{(2,6)} = 5.14$   Row means equal
Columns (B)        300    1    300   9.38 s    $F_{.05}^{(1,6)} = 5.99$   Column means equal
Interaction (AB)   24     2    12    0.38 ns   $F_{.05}^{(2,6)} = 5.14$   No Interaction
Within             192    6    32
Total              1148   11

So we reject $H_{01}$ and $H_{02}$, but do not reject $H_{03}$.
To explain further, we get the degrees of freedom for rows by taking the number of rows minus 1. We do the same for columns. Then the interaction degrees of freedom are the product of row and column degrees of freedom. The total degrees of freedom comes from subtracting 1 from the total number of items in the problem. The within degrees of freedom comes from subtracting the other degrees of freedom from the total degrees of freedom. The MS column comes from dividing the SS column by the DF column. The F column is calculated by dividing the items in the MS column by $s^2 = MSW = 32$.
To see this as Minitab output go to 252anovaex2. An example of 2-way ANOVA with one
measurement per cell is in 252anovaex3.
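The degrees of freedom, mean squares and F ratios for the two-way table can be sketched in Python as below. The function name `two_way_f_ratios` is made up for this illustration, and the critical values are the $F_{.05}$ table entries quoted in the notes, not computed ones.

```python
def two_way_f_ratios(ssr, ssc, ssi, ssw, R, C, P):
    """Return (degrees of freedom, F ratios) for a two-way ANOVA with P per cell."""
    df = {
        "rows": R - 1,
        "columns": C - 1,
        "interaction": (R - 1) * (C - 1),
        "within": R * C * (P - 1),
    }
    msw = ssw / df["within"]
    f = {
        "rows": (ssr / df["rows"]) / msw,
        "columns": (ssc / df["columns"]) / msw,
        "interaction": (ssi / df["interaction"]) / msw,
    }
    return df, f


df, f = two_way_f_ratios(ssr=632, ssc=300, ssi=24, ssw=192, R=3, C=2, P=2)

# F.05 critical values as quoted in the notes: (2,6), (1,6) and (2,6) df.
critical = {"rows": 5.14, "columns": 5.99, "interaction": 5.14}
decisions = {name: ("s" if f[name] > critical[name] else "ns") for name in f}

print(df)
print(f)
print(decisions)
```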
c. Confidence Intervals
i. A Single Confidence Interval
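If we desire a single interval we use the formula for a Bonferroni Confidence Interval below with $m = 1$.
ii. Scheffé Confidence Interval
If we desire intervals that will simultaneously be valid for a given confidence level for all possible intervals between means, use the following formulas.
For cell means, use
$$\mu_{11} - \mu_{21} = (\bar{x}_{11.} - \bar{x}_{21.}) \pm \sqrt{(RC-1) F_{\alpha}^{(RC-1,\,RC(P-1))}} \sqrt{\frac{2MSW}{P}}.$$
For row means, use
$$\mu_{1.} - \mu_{2.} = (\bar{x}_{1..} - \bar{x}_{2..}) \pm \sqrt{(R-1) F_{\alpha}^{(R-1,\,RC(P-1))}} \sqrt{\frac{2MSW}{PC}}.$$
For column means, use
$$\mu_{.1} - \mu_{.2} = (\bar{x}_{.1.} - \bar{x}_{.2.}) \pm \sqrt{(C-1) F_{\alpha}^{(C-1,\,RC(P-1))}} \sqrt{\frac{2MSW}{PR}}.$$
Note that if $P = 1$, replace $RC(P-1)$ with $(R-1)(C-1)$.
iii. Bonferroni Confidence Interval
If we only need $m$ different intervals, use for cell means
$$\mu_{11} - \mu_{21} = (\bar{x}_{11.} - \bar{x}_{21.}) \pm t_{\alpha/(2m)}^{(RC(P-1))} \sqrt{\frac{2MSW}{P}}.$$
Use for row means
$$\mu_{1.} - \mu_{2.} = (\bar{x}_{1..} - \bar{x}_{2..}) \pm t_{\alpha/(2m)}^{(RC(P-1))} \sqrt{\frac{2MSW}{PC}}.$$
Use for column means
$$\mu_{.1} - \mu_{.2} = (\bar{x}_{.1.} - \bar{x}_{.2.}) \pm t_{\alpha/(2m)}^{(RC(P-1))} \sqrt{\frac{2MSW}{PR}}.$$
iv. Tukey Confidence Interval
For cell means, use
$$\mu_{11} - \mu_{21} = (\bar{x}_{11.} - \bar{x}_{21.}) \pm q_{\alpha}^{(RC,\,RC(P-1))} \sqrt{\frac{MSW}{P}}.$$
For row means, use
$$\mu_{1.} - \mu_{2.} = (\bar{x}_{1..} - \bar{x}_{2..}) \pm q_{\alpha}^{(R,\,RC(P-1))} \sqrt{\frac{MSW}{PC}}.$$
For column means, use
$$\mu_{.1} - \mu_{.2} = (\bar{x}_{.1.} - \bar{x}_{.2.}) \pm q_{\alpha}^{(C,\,RC(P-1))} \sqrt{\frac{MSW}{PR}}.$$
Note that if $P = 1$, replace $RC(P-1)$ with $(R-1)(C-1)$.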
3. More than 2-Way analysis of Variance See 252anovaex4.
4. Kruskal-Wallis Test
Equivalent to one-way ANOVA when the underlying distribution is non-normal.
$H_0$: Columns come from same distribution or medians equal.
Example: Use same example as for one-way ANOVA, but assume that data comes from non-normal source. Assume that $\alpha = .05$. There are $n = 12$ data items, so rank them from 1 to 12. Let $n_i$ be the number of items in column $i$ and $SR_i$ be the rank sum of column $i$. $n = \sum n_i$.

              Original Data                            Ranked Data
         Treatment  Treatment  Treatment     Treatment  Treatment  Treatment
             1          2          3             1          2          3
            89        104         86             4         10          1
           101        120         98             9         12          6.5
            87         98        100             2.5        6.5        8
            87        110         96             2.5       11          5
$SR_i$                                          18.0       39.5       20.5
$n_i$                                            4          4          4

To check the ranking, note that the sum of the three rank sums is 18.0 + 39.5 + 20.5 = 78.0, and that the sum of the first $n$ numbers is $\frac{n(n+1)}{2} = \frac{12(13)}{2} = 78$.
Now, compute the Kruskal-Wallis statistic
$$H = \left[\frac{12}{n(n+1)} \sum_i \frac{SR_i^2}{n_i}\right] - 3(n+1) = \left[\frac{12}{12(13)} \left(\frac{18.0^2}{4} + \frac{39.5^2}{4} + \frac{20.5^2}{4}\right)\right] - 3(13) = \frac{1}{13}(576.125) - 39 = 5.3173.$$
If we look up this result in the (4, 4, 4) section of the Kruskal-Wallis table (Table 9), we find that the p-value for $H = 5.6538$ is .054 and that the p-value for $H = 4.6539$ is .097, so the p-value for $H = 5.3173$ must lie between these two. Since both are above $\alpha = .05$, do not reject $H_0$.
If the size of the problem is larger than those shown in Table 9, use the $\chi^2$ distribution, with $df = m - 1$, where $m$ is the number of columns. For example, if each of $m = 3$ columns contains 6 items, $\alpha = .05$ and $H = 5.3173$, compare $H$ with $\chi^{2(2)}_{.05} = 5.9915$. Since $H$ is smaller than $\chi^2_{.05}$, do not reject the null hypothesis.
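The ranking and the statistic $H$ can be reproduced with the following Python sketch. It assigns tied values the average of the ranks they occupy, as in the table above, and applies the formula for $H$ exactly as written in these notes, with no correction for ties. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis(columns):
    """Return (rank sums per column, H) using the formula in the notes."""
    flat = [x for col in columns for x in col]
    ranks = average_ranks(flat)
    n = len(flat)
    rank_sums, start = [], 0
    for col in columns:
        rank_sums.append(sum(ranks[start:start + len(col)]))
        start += len(col)
    total = sum(sr ** 2 / len(col) for sr, col in zip(rank_sums, columns))
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return rank_sums, h


columns = [
    [89, 101, 87, 87],
    [104, 120, 98, 110],
    [86, 98, 100, 96],
]
rank_sums, h = kruskal_wallis(columns)
print(rank_sums, round(h, 4))
```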
5. Friedman Test
Equivalent to two-way ANOVA with one observation per cell when the underlying distribution is non-normal.
$H_0$: Columns come from same distribution or medians equal. Note that the only difference between this and the Kruskal-Wallis test is that the data is cross-classified in the Friedman test.
Example: Three groups of 4 matched workers are trained to do a task by four different methods. When each worker is observed later, he or she is given a grade of 1 through 10 on performance of the task. Note that because this data is ordinal, ANOVA is not appropriate. Assume that $\alpha = .05$. In the data below, the methods are represented by $c = 4$ columns, and the groups by $r = 3$ rows. In each row the numbers are ranked from 1 to $c = 4$. For each column, compute $SR_i$, the rank sum of column $i$.

                     Original Data                         Ranked Data
           Method  Method  Method  Method       Method  Method  Method  Method
             1       2       3       4            1       2       3       4
Group 1      9       4       1       7            4       2       1       3
Group 2      6       5       2       8            3       2       1       4
Group 3      9       1       2       6            4       1       2       3
$SR_i$                                           11       5       4      10

To check the ranking, note that the sum of the four rank sums is 11 + 5 + 4 + 10 = 30, and that the sum of the $c$ numbers in a row is $\frac{c(c+1)}{2}$. However, there are $r$ rows, so we must multiply the expression by $r$. So we have $\sum SR_i = \frac{rc(c+1)}{2} = \frac{3(4)(5)}{2} = 30$.
Now compute the Friedman statistic
$$\chi^2_F = \left[\frac{12}{rc(c+1)} \sum_i SR_i^2\right] - 3r(c+1) = \left[\frac{12}{3(4)(5)} \left(11^2 + 5^2 + 4^2 + 10^2\right)\right] - 3(3)(5) = \frac{1}{5}(262) - 45 = 7.4.$$
If we find the place on the Friedman Table (Table 8) for 4 columns and 3 rows, we find that the p-value for $\chi^2_F = 7.4$ is .033. Since the p-value is below $\alpha = .05$, reject the null hypothesis.
If the size of the problem is larger than those shown in Table 10, use the $\chi^2$ distribution, with $df = c - 1$, where $c$ is the number of columns. For example, if each of $c = 5$ columns contains 6 items, $\alpha = .05$ and $\chi^2_F = 7.4$, compare $\chi^2_F$ with $\chi^{2(4)}_{.05} = 9.4877$. Since $\chi^2_F$ is not larger than $\chi^2_{.05}$, do not reject the null hypothesis.
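The Friedman computation can be sketched in Python as follows. For simplicity the sketch assumes there are no ties within a row, which holds for the worker data above; the function name `friedman_statistic` is illustrative.

```python
def friedman_statistic(rows):
    """Return (rank sums per column, Friedman statistic).

    Each row is ranked separately from 1 to c. This sketch assumes that no
    two values in the same row are tied.
    """
    r = len(rows)
    c = len(rows[0])
    rank_sums = [0] * c
    for row in rows:
        order = sorted(range(c), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    stat = 12 / (r * c * (c + 1)) * sum(sr ** 2 for sr in rank_sums) - 3 * r * (c + 1)
    return rank_sums, stat


# Grades for the three groups (rows) under the four methods (columns).
rows = [
    [9, 4, 1, 7],
    [6, 5, 2, 8],
    [9, 1, 2, 6],
]
rank_sums, stat = friedman_statistic(rows)
print(rank_sums, round(stat, 4))
```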
6. Tests for Equality of Variances See 252mvar.
