Analysis of Variance
Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK.
With the advent of built-in spreadsheet functions and affordable dedicated statistical software packages, Analysis of Variance (ANOVA) has become relatively simple to carry out. This article will therefore concentrate on how to select the correct variant of the ANOVA method, the advantages of ANOVA, how to interpret the results and how to avoid some of the pitfalls. For those wanting more detailed theory than is given in the following section, several texts are available (2–5).

A bit of ANOVA theory
Whenever we make repeated measurements there is always some variation. Sometimes this variation (known as within-group variation) makes it difficult for analysts to see if there have been significant changes between different groups of replicates. For example, in Figure 1 (which shows the results from four replicate analyses by 12 analysts), we can see that the total variation is a combination of the spread of results within groups and the spread between the mean values (between-group variation). The statistic that measures the within and between-group variations in ANOVA is called the sum of squares and often appears in the output tables abbreviated as SS. It can be shown that the different sums of squares calculated in ANOVA are equivalent to variances (1). The central tenet of ANOVA is that the total SS in an experiment can be divided into the components caused by random error, given by the within-group (or sample) SS, and the components resulting from differences between means. It is these latter components that are used to test for statistical significance using a simple F-test (1).

Why not use multiple t-tests instead of ANOVA?
Why should we use ANOVA in preference to carrying out a series of t-tests? I think this is best explained by using an example; suppose we want to compare the results from 12 analysts taking part in a training exercise. If we were to use t-tests, we would need to calculate 66 t-values. Not only is this a lot of work but the chance of reaching a wrong conclusion increases. The correct way to analyse this sort of data is to use one-way ANOVA.

One-way ANOVA
One-way ANOVA will answer the question: Is there a significant difference between the mean values (or levels), given that the means are calculated from a number of replicate observations? 'Significant' refers to the observed spread of means that would not normally arise from the chance variation within groups. We have already seen an example of this type of problem in the form of the data contained in Figure 1, which shows the results from 12 different analysts analysing the same material. Using these data and a spreadsheet, the results obtained from carrying out one-way ANOVA are reported in Example 1. In this example, the ANOVA shows there are significant differences between analysts (F value > F crit at the 95% confidence level). This result is obvious from a plot of the data (Figure 1) but in many situations a visual inspection of a plot will not give such a clear-cut result. Notice that the output also includes a p-value (see Interpretation of the result(s) section, which follows).

Note: ANOVA cannot tell us which individual mean or means are different from the consensus value and in what direction they deviate. The most effective way to show this is to plot the data (Figure 1) or alternatively, but less effectively, carry out a multiple comparison test such as Scheffé's test (2). It is also important to make sure the right questions are being asked and that the right data are being captured. In Example 1, it is possible that the time difference between the analysts carrying out the determinations is the reason for the difference in the mean values. This example shows how good experimental design procedures could have prevented ambiguity in the conclusions.
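The points made above (the number of pairwise t-tests, the growing chance of a wrong conclusion, and the split of the total SS into within-group and between-group parts) can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet output of Example 1: the data are hypothetical, the function names are invented for this sketch, and the error-rate formula assumes independent tests, which pairwise t-tests on shared groups are not.

```python
from math import comb
from statistics import mean


def pairwise_tests(k):
    """Number of two-group t-tests needed to compare k groups pairwise."""
    return comb(k, 2)


def familywise_error(m, alpha=0.05):
    """Chance of at least one false positive across m tests.

    ASSUMPTION: the m tests are independent. Pairwise t-tests that share
    groups are not, so treat the result as a rough indication only.
    """
    return 1 - (1 - alpha) ** m


def one_way_anova(groups):
    """Partition the total sum of squares and form the F ratio."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]

    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between   # between-group variance estimate
    ms_within = ss_within / df_within      # within-group variance estimate
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": ms_between / ms_within,
    }


# HYPOTHETICAL results: three analysts, four replicates each.
analysts = [
    [10.0, 11.0, 9.0, 10.0],
    [12.0, 13.0, 12.0, 11.0],
    [10.0, 10.0, 11.0, 9.0],
]
table = one_way_anova(analysts)

print("t-tests for 12 analysts:", pairwise_tests(12))
print("rough chance of a false positive:", round(familywise_error(66), 3))
print(table)
```

The F ratio returned here would then be compared with the tabulated critical value for (df_between, df_within) degrees of freedom at the chosen confidence level, as a spreadsheet ANOVA tool does when it reports F crit.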
10 statistics and data analysis LCGC Europe Online Supplement
[Figure residue: columns of replicate values between 26.9 and 27.3 and an axis value of 60. Panel question: Does time and/or temperature affect the protein yield?]
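A question such as whether time and/or temperature affect the protein yield involves two factors, and with replicate measurements the mean squares are compared in the order set out in figure 3 (comparing mean squares in two-way ANOVA with replication). The sketch below follows that order for a balanced design. The data, the function name and the critical value passed in are all hypothetical; the critical value must be looked up for the actual degrees of freedom of a given experiment.

```python
from statistics import mean


def two_way_anova(cells, f_crit_interaction):
    """Two-way ANOVA with replication for a balanced design.

    cells[i][j] holds the replicates at level i of factor A and
    level j of factor B. f_crit_interaction is the tabulated F crit
    for the interaction test, supplied by the caller.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
    row_mean = [mean(cell_mean[i]) for i in range(a)]
    col_mean = [mean([cell_mean[i][j] for i in range(a)]) for j in range(b)]
    grand = mean(row_mean)

    ss_a = b * n * sum((m - grand) ** 2 for m in row_mean)
    ss_b = a * n * sum((m - grand) ** 2 for m in col_mean)
    ss_ab = n * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_within = sum(
        (x - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    df_a, df_b = a - 1, b - 1
    df_ab, df_within = df_a * df_b, a * b * (n - 1)
    ms_a, ms_b = ss_a / df_a, ss_b / df_b
    ms_ab, ms_within = ss_ab / df_ab, ss_within / df_within

    # Step 1: compare within-group mean squares with interaction mean squares.
    f_interaction = ms_ab / ms_within
    if f_interaction > f_crit_interaction:
        # Yes: compare interaction mean squares with factor mean squares.
        pooled, ms_error, df_error = False, ms_ab, df_ab
    else:
        # No: pool the within-group and interaction sums of squares.
        pooled = True
        df_error = df_within + df_ab
        ms_error = (ss_within + ss_ab) / df_error

    return {
        "F_interaction": f_interaction,
        "pooled": pooled,
        "F_A": ms_a / ms_error,
        "F_B": ms_b / ms_error,
        "df_error": df_error,
    }


# HYPOTHETICAL yields: 2 temperatures x 2 times, 2 replicates per cell.
yields = [
    [[10.0, 12.0], [14.0, 16.0]],
    [[20.0, 22.0], [24.0, 26.0]],
]
# 7.71 is the tabulated F crit for (1, 4) degrees of freedom at 95%.
result = two_way_anova(yields, f_crit_interaction=7.71)
print(result)
```

The returned F_A and F_B still have to be compared with their own critical values, using the factor degrees of freedom and df_error.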
[Figure 1: Analyte concentration (ppm), axis from 30 to 48, plotted against analyst ID (A1 to A12). Annotations: mean, total standard deviation.]
significant difference when none is present. The best way to avoid this pitfall is, as ever, to plot the data. There also exist a number of tests for heteroscedasticity (e.g., Bartlett's test (5) and Levene's test (2)). It may be possible to overcome this type of problem in the data structure by transforming the data, such as by taking logs (7). If the variability within a group is correlated with its mean value then ANOVA may not be appropriate and/or it may indicate the presence of outliers in the data (Figure 4). Cochran's test (5) can be used to test for variance outliers.
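The effect of a log transformation on unequal variances, and the statistic behind Cochran's test, can be illustrated with a short Python sketch. The data are hypothetical and were chosen so that the within-group spread grows with the group mean. The function names are invented here, and the critical values for Cochran's test come from published tables that are not reproduced.

```python
from math import log10
from statistics import variance


def cochran_c(groups):
    """Cochran's C: the largest group variance over the sum of variances."""
    variances = [variance(g) for g in groups]
    return max(variances) / sum(variances)


def variance_ratio(groups):
    """Largest group variance divided by the smallest (a crude check)."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# HYPOTHETICAL data: the spread within each group scales with its mean.
raw = [
    [9.0, 10.0, 11.0],
    [90.0, 100.0, 110.0],
    [900.0, 1000.0, 1100.0],
]
logged = [[log10(x) for x in g] for g in raw]

print("raw:    C =", round(cochran_c(raw), 3),
      " max/min variance =", round(variance_ratio(raw), 1))
print("logged: C =", round(cochran_c(logged), 3),
      " max/min variance =", round(variance_ratio(logged), 3))
```

On the raw values one group dominates the total variance, whereas after taking logs the three variances are essentially equal, so C falls to about one third (its minimum for three groups). Libraries such as SciPy also provide `scipy.stats.bartlett` and `scipy.stats.levene` for the heteroscedasticity tests mentioned above.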
[Figure 2: Interactive factors. Panels: (a) Y and Z are independent; (b) Y and Z are interacting. Each panel plots Response against Y (YLow to YHigh), with lines labelled ZLow and ZHigh.]

[Figure 3: Comparing mean squares in two-way ANOVA with replication. Flowchart: Start → Compare within-group mean squares with interaction mean squares → Significant difference? (F > F crit). Yes → Compare interaction mean squares with individual factor mean squares. No → Pool the within-group and interaction sums of squares → Compare pooled mean squares with individual factor mean squares.]

[Figure 4. Axis label: Variance. Annotations: "Unreliable high mean (may contain outliers)"; "Significantly different means by ANOVA".]

Conclusions
ANOVA is a powerful tool for determining if there is a statistically significant difference between two or more sets of data.
One-way ANOVA should be used when we are comparing several sets of observations.
Two-way ANOVA is the method used when there are two separate factors that may be influencing a result.
Except for the smallest of data sets, ANOVA is best carried out using a spreadsheet or statistical software package.
You should always plot your data to make sure the assumptions ANOVA is based on are not violated.

Acknowledgements
The preparation of this paper was supported under a contract with the UK Department of Trade and Industry as part of the National Measurement System Valid Analytical Measurement Programme (VAM) (8).

References
(1) S. Burke, Scientific Data Management 1(1), 32–38, September 1997.
(2) G.A. Milliken and D.E. Johnson, Analysis of Messy Data, Volume 1: Designed Experiments, Van Nostrand Reinhold Company, New York, USA (1984).
(3) J.C. Miller and J.N. Miller, Statistics for Analytical Chemistry, Ellis Horwood PTR Prentice Hall, London, UK (ISBN 0 13 0309907).
(4) C. Chatfield, Statistics for Technology, Chapman & Hall, London, UK (ISBN 0412 25340 2).
(5) T.J. Farrant, Practical Statistics for the Analytical Scientist, A Bench Guide, Royal Society of Chemistry, London, UK (ISBN 0 85404 442 6) (1997).
(6) K.V. Mardia, J.T. Kent and J.M. Bibby, Multivariate Analysis, Academic Press Inc. (ISBN 0 12 471252 5) (1979).