
Analysis of Variance

Introduction
Suppose that we are interested in the effect of four
different types of chemical fertilizers on the yield of rice,
measured in pounds per acre. If there is no difference
between the different types of fertilizers, then we would
expect all the mean yields to be approximately equal.
Otherwise, we would expect the mean yields to differ.
The different types of fertilizers are called treatments
and their effects are the treatment effects. The yield is
called the response. Typically we have a model with a response variable that is possibly affected by one or more treatments.

The study of these types of models falls under the purview of
design of experiments, which we discussed in Chapter 9. In
this chapter we concentrate on the analysis aspect of the data
obtained from the designed experiments. If the data came
from one or two populations, we could use the techniques
learned in Chapters 8 and 9. Here, we introduce some tests
that are used to analyze the data from more than two
populations. These tests are used to deal with treatment
effects, including tests that take into account other factors
that may affect the response. The hypothesis that the
population means are equal is considered equivalent to the
hypothesis that there is no difference in treatment effects.

Analysis of variance is one of the most flexible and practical techniques for comparing several means.


It is important to observe that analysis of variance is
not about analyzing the population variance.
In fact, we are analyzing treatment means by
identifying sources of variability of the data.
In its simplest form, analysis of variance can be considered as an extension of the hypothesis test for the equality of two means that we learned in an earlier chapter.
Actually, the so-called one-way analysis of variance is a
generalization of the two-means procedure to a test of
equality of the means of more than two independent,
normally distributed populations.

Treatment
The item that is to be compared is termed the treatment.
Factor and the Levels of the Factor
If we are interested in finding the effect of an item at different levels, such an item is called a factor and the different levels are termed the levels of the factor.
Example:
Suppose it is necessary to investigate the solubility of a catalyst at three different temperatures (say 30 °C, 60 °C, and 90 °C). The temperature is the factor, and 30 °C, 60 °C, and 90 °C are the levels of the factor.

Note:
Experiments that are carried out to study or compare the effects of one factor are called one-factor experiments. In a one-factor experiment, the levels of the factor can be considered as different treatments. A one-factor experiment is associated with one-way analysis of variance.
Replicate
To minimize the effect of random variation we usually
make more than a single measurement under the same
experimental condition.

The relevant measurements are called replicates and the process is called replication.

Analysis of Variance Technique


In the estimation and hypothesis testing material, we were restricted in each case to considering no more than two population parameters. Such was the case, for example, in testing for the equality of two population means using independent samples from normal populations with common but unknown variance, where it was necessary to obtain a pooled estimate of $\sigma^2$.
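As a reminder, the usual pooled estimate of $\sigma^2$ from two independent samples of sizes $n_1$ and $n_2$, where $s_1^2$ and $s_2^2$ here denote the two sample variances, is

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$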

This material dealing with two-sample inference represents a special case of what we call the one-factor problem. For example, the survival time is measured for two samples of mice, where one sample received a new serum for leukemia treatment and the other sample received no treatment. In this case we say that there is one factor, namely treatment, and the factor is at two levels. If several competing treatments were being used in the sampling process, more samples of mice would be necessary.

In this case the problem would involve one factor with more than two levels and thus more than two samples. In the k > 2 sample problem, it will be assumed that there are k samples from k populations. One very common procedure used to deal with testing population means is called the analysis of variance, or ANOVA.

One-Way Analysis of Variance (Completely Randomized Design)

Random samples of size n are selected from each of k populations. The k different populations are classified on the basis of a single criterion such as different treatments or groups. Today the term treatment is used generally to refer to the various classifications, whether they are different aggregates, different analysts, or different fertilizers.

Assumptions and Hypotheses in One-Way ANOVA

It is assumed that the k populations are independent and normally distributed with means $\mu_1, \mu_2, \ldots, \mu_k$ and common variance $\sigma^2$. The hypotheses are

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k,$$
$$H_1: \text{at least two of the means are not equal.}$$


Let $y_{ij}$ denote the jth observation from the ith treatment and arrange the data as follows:

Treatment 1:  $y_{11}, y_{12}, \ldots, y_{1n}$    Total $Y_{1.}$    Mean $\bar{y}_{1.}$
Treatment 2:  $y_{21}, y_{22}, \ldots, y_{2n}$    Total $Y_{2.}$    Mean $\bar{y}_{2.}$
  ...
Treatment k:  $y_{k1}, y_{k2}, \ldots, y_{kn}$    Total $Y_{k.}$    Mean $\bar{y}_{k.}$
                                                  Grand total $Y_{..}$    Grand mean $\bar{y}_{..}$

Here, $Y_{i.}$ is the total of all observations in the sample from the ith treatment, $\bar{y}_{i.}$ is the mean of all observations in the sample from the ith treatment, $Y_{..}$ is the total of all nk observations, and $\bar{y}_{..}$ is the mean of all nk observations.
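As a quick illustration of this dot notation, here is a minimal numpy sketch using made-up data (the array values and variable names are illustrative, not part of the original notes):

```python
import numpy as np

# Made-up balanced data: k = 3 treatments (rows), n = 4 replicates each (columns).
y = np.array([
    [64.0, 66.0, 70.0, 68.0],   # treatment 1
    [72.0, 81.0, 64.0, 77.0],   # treatment 2
    [74.0, 51.0, 65.0, 60.0],   # treatment 3
])

Y_i_dot = y.sum(axis=1)        # treatment totals  Y_i.
ybar_i_dot = y.mean(axis=1)    # treatment means   y-bar_i.
Y_dot_dot = y.sum()            # grand total       Y..
ybar_dot_dot = y.mean()        # grand mean        y-bar..

print(Y_i_dot, ybar_i_dot, Y_dot_dot, ybar_dot_dot)
```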

Model for One-way ANOVA


Each observation may be written in the form

$$Y_{ij} = \mu_i + \epsilon_{ij},$$

where $\epsilon_{ij}$ measures the deviation of the jth observation of the ith sample from the corresponding treatment mean.

The $\epsilon_{ij}$ term represents random error and plays the same role as the error terms in the regression models.

An alternative and preferred form of this equation is obtained by substituting $\mu_i = \mu + \alpha_i$, subject to the constraint $\sum_{i=1}^{k} \alpha_i = 0$. Hence, we may write

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij},$$

where $\mu$ is just the grand mean of all the $\mu_i$, that is,

$$\mu = \frac{1}{k}\sum_{i=1}^{k} \mu_i,$$

and $\alpha_i$ is called the effect of the ith treatment.

The null hypothesis that the k population means are equal against the alternative that at least two of the means are unequal may now be replaced by the equivalent hypotheses

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0,$$
$$H_1: \text{At least one of the } \alpha_i \text{ is not equal to zero.}$$
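As a small numerical illustration (the three means below are hypothetical, chosen only to show the reparameterization): if k = 3 and the treatment means were $\mu_1 = 10$, $\mu_2 = 12$, $\mu_3 = 17$, then the grand mean is $\mu = 13$ and

$$\alpha_1 = -3,\qquad \alpha_2 = -1,\qquad \alpha_3 = 4,\qquad \alpha_1 + \alpha_2 + \alpha_3 = 0,$$

so H0 asserts precisely that all of these effects are zero.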

Resolution of Total Variability into Components

Our test will be based on a comparison of two independent estimates of the common population variance $\sigma^2$. These estimates will be obtained by partitioning the total variability of our data, designated by the double summation

$$\sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{..}\right)^2,$$

into two components.

Theorem 10.1 (Sum-of-Squares Identity)

$$\sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{..}\right)^2 = n\sum_{i=1}^{k}\left(\bar{y}_{i.} - \bar{y}_{..}\right)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{i.}\right)^2$$

It will be convenient in what follows to identify the terms of the sum-of-squares identity by the following notation:

Three Important Measures of Variability

Total sum of squares:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{..}\right)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}^2 - \frac{Y_{..}^2}{N}$$

Treatment sum of squares:
$$SSA = n\sum_{i=1}^{k}\left(\bar{y}_{i.} - \bar{y}_{..}\right)^2 = \sum_{i=1}^{k}\frac{Y_{i.}^2}{n_i} - \frac{Y_{..}^2}{N}$$

(Here N is the total number of observations; with equal sample sizes, $n_i = n$ and $N = nk$.)

Error sum of squares:
$$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{i.}\right)^2 = SST - SSA$$

The sum-of-squares identity can then be represented symbolically by the equation

$$SST = SSA + SSE.$$

The identity above expresses how between-treatment and within-treatment variation add to the total sum of squares.

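A minimal numpy sketch of these three measures, using the same made-up data as before (the numbers are illustrative, not from the text), confirms the identity numerically:

```python
import numpy as np

# Made-up balanced data: k = 3 treatments (rows), n = 4 replicates each (columns).
y = np.array([
    [64.0, 66.0, 70.0, 68.0],
    [72.0, 81.0, 64.0, 77.0],
    [74.0, 51.0, 65.0, 60.0],
])
k, n = y.shape

grand_mean = y.mean()                                # y-bar..
treat_means = y.mean(axis=1)                         # y-bar_i.

SST = ((y - grand_mean) ** 2).sum()                  # total sum of squares
SSA = n * ((treat_means - grand_mean) ** 2).sum()    # treatment sum of squares
SSE = ((y - treat_means[:, None]) ** 2).sum()        # error sum of squares

assert np.isclose(SST, SSA + SSE)                    # SST = SSA + SSE
print(SST, SSA, SSE)
```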

Theorem 10.2:
$$E(SSA) = (k-1)\sigma^2 + n\sum_{i=1}^{k}\alpha_i^2$$

If H0 is true, an estimate of $\sigma^2$, based on $k-1$ degrees of freedom, is provided by the treatment mean square:

$$s_1^2 = \frac{SSA}{k-1}$$

If H0 is true and thus each $\alpha_i$ in Theorem 10.2 is equal to zero, we see that

$$E\left(\frac{SSA}{k-1}\right) = \sigma^2,$$

and $s_1^2$ is an unbiased estimate of $\sigma^2$. However, if H1 is true, we have

$$E\left(\frac{SSA}{k-1}\right) = \sigma^2 + \frac{n}{k-1}\sum_{i=1}^{k}\alpha_i^2,$$

and $s_1^2$ estimates $\sigma^2$ plus an additional term, which measures variation due to the systematic effects.

A second and independent estimate of $\sigma^2$, based on $k(n-1)$ degrees of freedom, is the familiar error mean square:

$$s^2 = \frac{SSE}{k(n-1)}$$

Use of F-Test in ANOVA


The estimate $s^2$ is unbiased regardless of the truth or falsity of the null hypothesis. It is important to note that the sum-of-squares identity has partitioned not only the total variability of the data, but also the total number of degrees of freedom. That is,

$$nk - 1 = (k - 1) + k(n - 1).$$

F-Ratio for Testing Equality of Means


When H0 is true, the ratio $f = s_1^2/s^2$ is a value of the random variable F having the F-distribution with $k-1$ and $k(n-1)$ degrees of freedom. Since $s_1^2$ overestimates $\sigma^2$ when H0 is false, we have a one-tailed test with the critical region entirely in the right tail of the distribution. The null hypothesis H0 is rejected at the $\alpha$-level of significance when

$$f > f_{\alpha}[k-1,\ k(n-1)].$$

Another approach, the P-value approach, suggests that the evidence in favor of or against H0 is

$$P = P\{F[k-1,\ k(n-1)] > f\}.$$
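As a sketch of how both decision rules can be evaluated in practice (assuming scipy is available; the sums of squares and sample sizes below are made-up illustrative values, not from the text):

```python
from scipy.stats import f as f_dist

# Illustrative values: k treatments, n replicates each, with given sums of squares.
SSA, SSE = 120.0, 300.0
k, n, alpha = 4, 6, 0.05

s1_sq = SSA / (k - 1)            # treatment mean square
s_sq = SSE / (k * (n - 1))       # error mean square
f = s1_sq / s_sq                 # computed f ratio

f_crit = f_dist.ppf(1 - alpha, k - 1, k * (n - 1))   # critical value f_alpha[k-1, k(n-1)]
p_value = f_dist.sf(f, k - 1, k * (n - 1))           # P{F[k-1, k(n-1)] > f}

print(f, f_crit, p_value)        # reject H0 when f > f_crit, equivalently p_value < alpha
```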
The computations for an analysis-of-variance problem
are usually summarized in tabular form as follows.

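In the balanced case, that summary takes the standard one-way ANOVA table form below (assembled here from the general formulas above):

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square                   Computed f
Treatments             SSA               k - 1                 $s_1^2 = SSA/(k-1)$           $f = s_1^2/s^2$
Error                  SSE               k(n - 1)              $s^2 = SSE/[k(n-1)]$
Total                  SST               nk - 1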

Example 1:
Test the hypothesis $\mu_1 = \mu_2 = \cdots = \mu_5$ at the 0.05 level of significance for the data in the table below on absorption of moisture by various types of cement aggregates.

(Moisture-absorption data table: five aggregates with six observations each.)

Solution:
The hypotheses are
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_5,$$
$$H_1: \text{At least two of the means are not equal.}$$
$\alpha = 0.05$.
Critical region: $f > 2.76$ with $v_1 = 4$ and $v_2 = 25$ degrees of freedom. The sum-of-squares computations give
SST = 209,377,
SSA = 85,356,
SSE = 209,377 - 85,356 = 124,021.

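The remaining steps of the test follow directly from these sums of squares; a quick check (assuming scipy is available for the critical value) might look like this:

```python
from scipy.stats import f as f_dist

# Sums of squares and degrees of freedom from Example 1 above.
SSA, SSE = 85356.0, 124021.0
v1, v2, alpha = 4, 25, 0.05

s1_sq = SSA / v1                        # treatment mean square = 21,339
s_sq = SSE / v2                         # error mean square, about 4,961
f = s1_sq / s_sq                        # computed f, about 4.30

f_crit = f_dist.ppf(1 - alpha, v1, v2)  # about 2.76
print(f, f_crit, f > f_crit)            # True: f falls in the critical region
```

Since $f \approx 4.30$ exceeds the critical value 2.76, H0 is rejected and we conclude that the aggregates do not all have the same mean moisture absorption.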

Example 2:
We wish to compare the cleansing action of three detergents on the basis of the following whiteness readings, made on 15 swatches of white cloth that were first soiled with India ink and then washed in an agitator-type machine with the respective detergents.
Detergent A: 77, 81, 71, 76, 80
Detergent B: 72, 58, 74, 66, 70
Detergent C: 76, 85, 82, 80, 77


The means of the three samples are 77, 68, and 80. Test at the 0.01 level of significance whether the differences among the means of the whiteness readings are significant.
Solution:
The hypotheses are
$$H_0: \alpha_i = 0 \text{ for } i = 1, 2, 3,$$
$$H_1: \text{At least one of the } \alpha_i \text{ is not equal to zero.}$$
$\alpha = 0.01$.
Critical region: reject the null hypothesis if $f > 6.93$ with $v_1 = 2$ and $v_2 = 12$ degrees of freedom, where f is obtained by a one-way analysis of variance and 6.93 is the value of $f_{0.01}(2, 12)$.
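A minimal check of this example with scipy (an assumed tool, not part of the original solution), using the whiteness readings given above:

```python
from scipy.stats import f_oneway, f as f_dist

# Whiteness readings for the three detergents, as listed above.
A = [77, 81, 71, 76, 80]
B = [72, 58, 74, 66, 70]
C = [76, 85, 82, 80, 77]

f_stat, p_value = f_oneway(A, B, C)   # one-way ANOVA F test across the three samples
f_crit = f_dist.ppf(0.99, 2, 12)      # critical value f_0.01 with 2 and 12 d.f. (about 6.93)

print(f_stat, p_value, f_stat > f_crit)
```

This gives $f \approx 8.48$ with a P-value below 0.01; since 8.48 > 6.93, the null hypothesis is rejected and the differences among the mean whiteness readings are judged significant.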
