
An introduction to sphericity

The article is written for a general audience of post-graduate and graduate researchers.
The technical material goes slightly beyond what is covered in most text books,
although there is still some simplification (which is usually indicated in the text). The
aim is to give advice about best practice for checking the sphericity assumption and dealing with
violations of it in repeated measures ANOVA. Some of the content is personal opinion (which I have
tried to indicate in the text). I include a short bibliography of my sources at the end for
readers who want to explore the topic in more detail.

Dr Thom Baguley 2004

Some background

Sphericity is a mathematical assumption in repeated measures ANOVA designs. Let's start by
considering a simpler ANOVA design (e.g., one-way independent measures ANOVA).

In independent measures ANOVA one of the mathematical assumptions is that the variances of
the populations that groups are sampled from are equal. This homogeneity of variance
assumption more-or-less follows from the null hypothesis being tested in ANOVA: if the
treatment has no effect on the thing being measured (the DV) then we can consider all the groups
to be sampled from the same population.1 However, because we're taking samples we'd be very
lucky to observe exactly equal variances (even if the assumption were perfectly met). Real data
is rarely that neat! What we'd expect to get (most of the time) is groups with similar variances.2

The sphericity assumption can be thought of as an extension of the homogeneity of variance
assumption in independent measures ANOVA. Why does the assumption need to be extended?
To understand this we need to introduce the ANOVA covariance matrix.

The covariance matrix

What is a covariance matrix? In a nutshell, it is a matrix that contains the covariances between
levels of a factor in an ANOVA design.3 A covariance is the shared or overlapping variance
between two things (sometimes called variance in common). Let's look at an example of the
layout for a one-factor ANOVA design with four levels and therefore four samples (called A1,
A2, A3 and A4):

Samples:   A1     A2     A3     A4
A1         s1²    s12    s13    s14
A2         s21    s2²    s23    s24
A3         s31    s32    s3²    s34
A4         s41    s42    s43    s4²
The first thing to notice is the main diagonal cells in the matrix (running top left to bottom right)
contain the variances of the four levels (e.g., s1² is the variance of A1).4 The second thing to
notice is that the covariances are therefore in the cells off the main diagonals (called the off-
diagonal cells). The third thing to notice is that the covariances are mirrored above and below the
main diagonal. (The term s14 is the covariance between samples A1 and A4, while s41 is the
covariance between samples A4 and A1. As this is the variance they have in common, s14 = s41.)

What does the covariance matrix look like for independent measures ANOVA? Here is an example
for a one-way independent measures ANOVA design with 4 levels (and hence four groups):

Samples:   A1     A2     A3     A4
A1         s1²    0      0      0
A2         0      s2²    0      0
A3         0      0      s3²    0
A4         0      0      0      s4²

The most striking observation is that all the covariances are zero. Why? The answer is fairly
straightforward. In an independent measures design the observations should be independent and
therefore uncorrelated with each other.5 Two samples that are uncorrelated will share no variance
(and the covariance will be zero). So in this relatively simple case we only have to worry about
homogeneity of variance, which would lead us to expect that the observed variances on the
main diagonal should be similar.

Reminder: Assumptions such as homogeneity of variance, sphericity and so forth are assumptions
about the populations we are sampling from. I'll try and indicate this as I go
through, but it sometimes gets clumsy to keep repeating "in the population being sampled" all the
time! We expect samples to have similar characteristics to the populations being sampled, but
only in rare cases will the samples show exactly the same pattern of variance (or whatever) as the
population. It is also worth adding that large samples are more similar to the populations they are
sampled from than small samples.

Finally, please note that the statistical term "population" is an abstract one. We are referring to a
population of data points that we might potentially be sampling, not a fixed entity such as the
population of a country. (In other contexts, such as market research, people sometimes deal with
such fixed populations, but this requires slightly different methods from those used in most sciences.)

What is the sphericity assumption?

Compound symmetry

The sphericity assumption is an assumption about the structure of the covariance matrix in a
repeated measures design. Before we describe it in detail let's consider a simpler (but stricter)
condition. This one is called compound symmetry. Compound symmetry is met if all the
covariances (the off-diagonal elements of the covariance matrix) are equal and all the variances
are equal in the populations being sampled. (Note that the variances don't have to equal the
covariances.) Just as with the homogeneity of variance assumption we'd only rarely expect a real
data set to meet compound symmetry exactly, but provided the observed covariances are roughly
equal in our samples (and the variances are OK too) we can be pretty confident that compound
symmetry is not violated.

The good news about compound symmetry

If compound symmetry is met then sphericity is also met. So if you take a look at the covariance
matrix and the covariances are similar and the variances are similar then we know that sphericity
is not going to be a problem.6

The bad news about compound symmetry

As compound symmetry is a stricter requirement than sphericity we still need to check sphericity
if compound symmetry isn't met. This is where it gets technical (well, even more technical).

The sphericity assumption

Let's take a look at the raw data. Imagine that the first few observations of A1, A2, A3 and A4 are
as follows:

              A1   A2   A3   A4
Participant 1   8    9   12    4
Participant 2   6   11   16    3
Participant 3   9    8   12    5
etc.          ...  ...  ...  ...
For each possible pair of levels of factor A (e.g., A1 and A2 or A2 and A3) we can calculate the
difference between the observations. For example:

              A1-A2  A1-A3  A1-A4  etc.
Participant 1    -1     -4     +4
Participant 2    -5    -10     +3
Participant 3    +1     -3     +4
etc.            ...    ...    ...

We could then calculate variances for each of these differences (e.g., s1-2² or s2-4²).

The sphericity assumption is that all the variances of the differences are equal (in the
population sampled). In practice, we'd expect the observed sample variances of the differences to
be similar if the sphericity assumption was met.

Using the covariance matrix to check the sphericity assumption

We can check the sphericity assumption using the covariance matrix, but it turns out to be fairly
laborious. (Later on I'll discuss some simpler ways to check sphericity using output from SPSS
and similar statistics packages.) The variance of a difference can be computed using a version of the
variance sum law:

sx-y² = sx² + sy² - 2(sxy)


In other words, the variance of a difference is the sum of the two variances minus twice their
covariance. A simple check will show that this works out as zero if the two levels have equal
variances and share all of that variance (i.e., they are perfectly correlated).

(Note that we could also calculate the variances of the differences directly from
the raw data. We'd simply calculate the differences between all the possible pairs
of levels of a factor. For example, using Excel or SPSS we could define a new
column as one level minus another level and then calculate the variance of
each column using the built-in descriptive statistics of the program. This would
get very laborious if we had lots of levels, so if you really want the variances of
the differences and you already have a covariance matrix I'd recommend the
method above. Fortunately this isn't necessary in most cases, as I'll discuss later.)
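
For readers who like to check things numerically, here is a minimal sketch of that direct approach in Python (with NumPy). It uses only the three example rows shown above, so the resulting variances are purely illustrative rather than from a real data set.

import numpy as np
from itertools import combinations

# The three example rows from above; columns are A1, A2, A3 and A4.
data = np.array([
    [8,  9, 12, 4],   # Participant 1
    [6, 11, 16, 3],   # Participant 2
    [9,  8, 12, 5],   # Participant 3
])

labels = ["A1", "A2", "A3", "A4"]
for i, j in combinations(range(data.shape[1]), 2):
    diff = data[:, i] - data[:, j]       # e.g., A1 - A2 for every participant
    var_diff = diff.var(ddof=1)          # sample variance of that difference column
    print(f"var({labels[i]} - {labels[j]}) = {var_diff:.2f}")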

An example
This example is adapted from Kirk (1995). Imagine the observed covariance matrix for our
design above is this:

Samples:   A1   A2   A3   A4
A1         10    5   10   15
A2          5   20   15   20
A3         10   15   30   25
A4         15   20   25   40

sx-y² = sx² + sy² - 2(sxy)


s1-2² = 10 + 20 - 2(5) = 20
s1-3² = 10 + 30 - 2(10) = 20
s1-4² = 10 + 40 - 2(15) = 20
s2-3² = 20 + 30 - 2(15) = 20
s2-4² = 20 + 40 - 2(20) = 20
s3-4² = 30 + 40 - 2(25) = 20
This example has been contrived so that the variances of the differences are exactly equal (which
would be unusual in real data), but it does demonstrate that lack of compound symmetry does not
necessarily mean that sphericity is violated. (Compound symmetry is a sufficient, but not
necessary requirement for sphericity to be met.)7 In this example, compound symmetry is clearly
not met (the largest variances and covariances are 4 or 5 times bigger than the smallest), but
sphericity holds.
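
The same check can be automated. Below is a short sketch in Python (NumPy) that applies the variance sum law to the covariance matrix above; all six variances of differences should come out as 20, confirming that sphericity holds even though compound symmetry does not.

import numpy as np
from itertools import combinations

# The Kirk (1995) style covariance matrix from the example above.
S = np.array([
    [10,  5, 10, 15],
    [ 5, 20, 15, 20],
    [10, 15, 30, 25],
    [15, 20, 25, 40],
], dtype=float)

for i, j in combinations(range(S.shape[0]), 2):
    var_diff = S[i, i] + S[j, j] - 2 * S[i, j]        # sx² + sy² - 2(sxy)
    print(f"var(A{i + 1} - A{j + 1}) = {var_diff:g}")  # each prints 20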

What to do if sphericity is violated in repeated measures ANOVA

There are two broad approaches to dealing with violations of sphericity. The first is to use a
correction to the standard ANOVA tests. The second is to use a different test (i.e., one that
doesn't assume sphericity).

In the following sub-sections I give general advice on what to do if sphericity is violated. This
advice tends to hold well in most cases for factorial repeated measures designs but may be
problematic for mixed ANOVA designs (discussed later under Complications).

Correcting for violations of sphericity

The best known corrections are those developed by Greenhouse and Geisser (the Greenhouse-
Geisser correction) and Huynh and Feldt (the Huynh-Feldt correction).8 Each of these
corrections works roughly in the same way. They all attempt to adjust the degrees of freedom in
the ANOVA test in order to produce a more accurate significance (p) value. If sphericity is
violated the p values need to be adjusted upwards (and this can be accomplished by adjusting the
degrees of freedom downwards).

The first step in each test is to estimate something called epsilon.9 For our purposes we can
consider epsilon to be a descriptive statistic indicating the degree to which sphericity has been
violated. If sphericity is met perfectly then epsilon will be exactly 1. If epsilon is below 1 then
sphericity is violated. The further epsilon gets away from 1 the worse the violation.

How bad can epsilon get? Well, it depends on the number of levels (k) on the repeated measures
factor.

Lower bound of epsilon = 1/(k-1)

So for 3 levels epsilon can go as low as 0.5, for 6 levels it can go as low as 0.2 and so forth. The
more levels on the repeated measures factor the worse the potential for violations of sphericity.10

The three common corrections fall into a range of most to least strict. First consider the most
strict. We could use the lower bound value of epsilon and correct for the worst possible case.
Fortunately, there is a much better option. The Greenhouse-Geisser correction is a conservative
correction (it tends to underestimate epsilon when epsilon is close to 1 and therefore tends to
over-correct). Huynh and Feldt produced a modified version for use when the true value of epsilon is
thought to be near or above 0.75.

The Huynh-Feldt correction tends to overestimate epsilon (and therefore to under-correct), so some
statisticians have suggested using the average of the Greenhouse-Geisser and Huynh-Feldt corrections. My advice
would be to consider the aims of the research and the relative cost of Type I and II errors. If
Type I errors are considered more costly (especially if the estimates of epsilon fall below 0.75)
then stick to the more conservative Greenhouse-Geisser correction.

Using the correction is fairly simple. Replace the treatment and error d.f. by (epsilon*d.f.). So, for a
factor with three levels, an epsilon of 0.6 would turn an F(2,20) test into an F(1.2,12) test.11

Using these corrections seems to work well for relatively modest departures of epsilon from 1, or
when sample sizes are small.
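
To make the mechanics concrete, here is a hedged sketch in Python (NumPy and SciPy) of the standard Box/Greenhouse-Geisser style epsilon estimate computed from a covariance matrix, followed by the degrees-of-freedom adjustment. The F value and degrees of freedom in the second part are invented purely to illustrate the adjustment step, not taken from any real analysis.

import numpy as np
from scipy import stats

def gg_epsilon(S):
    """Greenhouse-Geisser (Box) epsilon from a k x k covariance matrix S."""
    k = S.shape[0]
    P = np.eye(k) - np.ones((k, k)) / k   # centring matrix
    D = P @ S @ P                         # double-centred covariance matrix
    return np.trace(D) ** 2 / ((k - 1) * np.sum(D ** 2))

# The example covariance matrix from earlier: sphericity holds, so epsilon is 1.
S = np.array([[10,  5, 10, 15],
              [ 5, 20, 15, 20],
              [10, 15, 30, 25],
              [15, 20, 25, 40]], dtype=float)
print(f"epsilon = {gg_epsilon(S):.3f}")   # 1.000 for this matrix

# Applying a correction: multiply both d.f. by epsilon and recompute the p value.
F, df1, df2, eps = 4.2, 2, 20, 0.6        # illustrative numbers only
p_uncorrected = stats.f.sf(F, df1, df2)
p_corrected = stats.f.sf(F, eps * df1, eps * df2)   # fractional d.f. are fine here
print(f"uncorrected p = {p_uncorrected:.4f}, corrected p = {p_corrected:.4f}")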

Using MANOVA
An alternative approach is to use a test that doesn't assume sphericity. In the case of repeated
measures ANOVA this usually means switching to multivariate ANOVA (MANOVA for short).
Some computer programs print out MANOVA automatically alongside repeated measures
ANOVA. While this can be confusing, it does make it easy to compare results for different tests
and corrections. If sphericity is met (i.e., epsilon = 1) all the p values for a given test should be
identical. The degree to which they differ can be informative. If there is a wide discrepancy
between the different tests or corrections then this suggests that the sphericity assumption may be
severely violated and that one of the more conservative tests should be reported (e.g.,
Greenhouse-Geisser or MANOVA).

In general MANOVA is less powerful than repeated measures ANOVA and therefore should
probably be avoided. However, when sample sizes are reasonably large (n > 10 + k) and epsilon
is low (< 0.7) MANOVA may be more powerful and should probably be preferred. Other factors
(such as the correlations between samples) can influence the relative power of MANOVA and
ANOVA, but these are beyond the scope of this summary.

How to check sphericity

In this section I will focus on information readily available in SPSS (and most good statistics
packages).

Factorial repeated measures ANOVA

If there is more than one repeated measures factor consider each factor separately. (See also
Special cases: factors with 2 levels.)

A warning about Mauchly's sphericity test

Many text books recommend using significance tests such as Mauchly's to test sphericity. In
general this is a very bad idea.

Why? First, tests of statistical assumptions (and Mauchly's is no exception) tend to lack
statistical power (they tend to be bad at spotting violations of assumptions when n is small).
Second, tests of statistical assumptions (and, again, Mauchly's is no exception) tend
not to be very robust (unlike ANOVA and MANOVA they are poor at coping with violations of
assumptions such as normality). Third, significance tests don't reveal the degree of violation
(e.g., with large n even a poor test like Mauchly's will show significance if there are very minor
violations of sphericity; with low n the poor power means that even severe violations may not be
detected). Fourth, significance tests of assumptions tend to be used as substitutes for looking at
the data: if you followed the advice of many popular texts you'd never look at the
descriptive statistics at all (e.g., the variances, the covariance matrix, estimates of epsilon and so
forth). Fifth, I don't like them.12

Using Mauchly's sphericity test

The test principle is fairly simple. The null hypothesis is that sphericity holds (I like to think of it
as a test that the true value of epsilon = 1). A significant result indicates evidence that sphericity
is violated (i.e., evidence that the true value of epsilon is below 1).

Epsilon

I would recommend using estimates of epsilon to decide whether sphericity is violated. If epsilon
is close to 1 then it is likely that sphericity is intact (or that any violation is very minor). If
epsilon is close to the lower bound (see above) then a correction or an alternative procedure such as
MANOVA is likely to be necessary. Exactly where to draw the line is a matter of personal
judgement, but it is often instructive to compare p values for the corrected and uncorrected tests.
If they are fairly similar then there is little indication that sphericity is violated. If the
discrepancy is large then one of the corrections (or MANOVA) should probably be used.

The covariance matrix

If estimates of epsilon are not readily available then lower-bound procedures can be used (see
above) or the covariance matrix can be consulted. If compound symmetry holds then it is safe to
proceed with repeated measures ANOVA. If compound symmetry does not hold it is relatively
simple (if time-consuming) to calculate the variances of the differences for each factor from the
covariance matrix.

Complications

Special cases: factors with 2 levels (and the paired t test)

If k = 2 (a repeated measures factor with only two levels) then the sphericity assumption is
always met. Using the lower-bound formula one can see that when k = 2 epsilon can't be lower
than 1/(k-1) = 1/(2-1) = 1. This is also true for the paired t test (in effect a one-way repeated
measures ANOVA where k = 2).

Why isn't sphericity a problem when there are only two levels? Well, think about the covariance
matrix:
Samples:   A1    A2
A1         s1²   s12
A2         s21   s2²

There are two covariances s21 and s12. The covariances above and below the main diagonal are
constrained to be equal (because the shared variance between level 1 and level 2 is the same
thing as the shared variance between level 2 and level 1). In effect there is only one covariance.
Similarly, if we calculated the variance of the difference s1-2² we should realize there is only
one such variance. Sphericity is met if all the variances of the differences are equal. As there is
only one, it can't fail to be equal to itself. For information, Mauchly's sphericity test can't be
computed if d.f. = 1 (i.e., if k = 2) and some computer programs give confusing messages or
printouts if you try.

Note that sphericity subsumes the standard homogeneity of variance assumption. In effect, we
are only interested in the variances of the differences. When k = 2 there is only one variance of
the difference between levels and we can ignore differences in the 'raw' level variances
themselves.

Multiple comparisons

In general Bonferroni t tests are recommended for repeated measures ANOVA (whether or not
sphericity is violated). The Bonferroni correction relies on a general probability inequality and
therefore isn't dependent on specific ANOVA assumptions. As Bonferroni corrections tend to be
conservative, a number of modified Bonferroni procedures have been proposed. Some are
specific to certain patterns of hypothesis-testing, but others such as Holm's test (or the similar
Larzelere and Mulaik test) are more powerful than standard Bonferroni corrections and should be
used more widely (and not just for ANOVA).
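
As a brief illustration, here is a small Python sketch (using statsmodels, if it is available) comparing a plain Bonferroni adjustment with Holm's sequential version; the four p values are invented purely for illustration.

from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.011, 0.020, 0.040]   # hypothetical p values from four paired comparisons

reject_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
reject_holm, p_holm, _, _ = multipletests(pvals, alpha=0.05, method="holm")

print("Bonferroni adjusted:", p_bonf.round(3), "reject:", reject_bonf)
print("Holm adjusted:      ", p_holm.round(3), "reject:", reject_holm)
# With these numbers Bonferroni rejects only the first two comparisons,
# whereas Holm rejects all four - the usual gain in power.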

Most statisticians seem to recommend specific (rather than pooled) error terms for repeated
measures factors (i.e., calculate the SE for t using only the conditions being compared, rather
than using the square root of the MSE term from the ANOVA table). This advice also extends to
contrasts, which can easily be calculated by performing paired t tests on weighted averages of the
appropriate means. Using a specific error term should avoid problems with sphericity (e.g., see
Judd et al., 1995) for more-or-less the same reason that sphericity is not a problem for factors
with only 2 levels.
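
To show what a specific error term looks like in practice, here is a small sketch in Python (SciPy) using the three example rows from earlier: a paired t test between two conditions uses only those two conditions, and a simple contrast can be run as a paired t test against a weighted average of conditions. The numbers are illustrative only.

import numpy as np
from scipy import stats

# The three example participants from earlier (columns A1, A2, A3).
a1 = np.array([8, 6, 9])
a2 = np.array([9, 11, 8])
a3 = np.array([12, 16, 12])

# Pairwise comparison: the error term involves only A1 and A2.
t, p = stats.ttest_rel(a1, a2)
print(f"A1 vs A2: t = {t:.2f}, p = {p:.3f}")

# A contrast comparing A1 against the average of A2 and A3, again with its own error term.
t, p = stats.ttest_rel(a1, (a2 + a3) / 2)
print(f"A1 vs mean(A2, A3): t = {t:.2f}, p = {p:.3f}")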

Mixed designs
Mixed designs (combining independent and repeated measures factors) muddy the waters
somewhat. Mixed measures ANOVA requires that multisample sphericity holds. This more-or-
less means that the covariance matrices should be similar between groups (i.e., across the levels
of the independent measures factors). Provided group sizes are equal (or at least roughly equal)
the Greenhouse-Geisser and Huynh-Feldt corrections perform well when multisample sphericity
doesn't hold and can therefore still be used. If these corrections are inappropriate, or if group
sizes are markedly unequal then more sophisticated methods are required (Keselman, Algina &
Kowalchuk, 2001). A description of these methods is beyond the scope of this summary
(possible solutions include multilevel methods found in SAS PROC MIXED, MLwiN and HLM,
though Keselman et al. also discuss a number of other options). If at all possible researchers
should keep group sizes in mixed ANOVA equal or as close to equal as possible.

Bibliography

Field, A. (1998). A bluffer's guide to ... sphericity. The British Psychological Society:
Mathematical, Statistical & Computing Section Newsletter, 6, 13-22.

Howell, D. C. (2002). Statistical methods for psychology. (5th ed.). Belmont, CA: Duxbury
Press.

Judd, C. M., McClelland, G. H., & Culhane, S. E. (1995). Data analysis: continuing issues in
everyday analysis of psychological data. Annual Review of Psychology, 46, 433-465.

Keselman, H. J., Algina, J., & Kowalchuk, R. K. (2001). The analysis of repeated measures
designs: a review. British Journal of Mathematical and Statistical Psychology, 54, 1-20.

Kirk, R. E. (1995). Experimental design: procedures for the behavioral sciences. (3rd ed.).
Pacific Grove: Brooks/Cole.

Footnotes:
1 As long as the treatment only has the effect of adding or subtracting to the group means (and
doesn't influence their variances) the homogeneity of variance assumption isn't a problem. This
special case is known as unit-treatment-additivity. Unfortunately life isn't always that simple:
there are good reasons why treatments might be expected to influence both means and variances.
For this reason it is always sensible to check the group variances in independent measures
designs.
2 As a rule of thumb the largest group variance should be no more than 3 or 4 times as large as
the smallest group variance.
3 Covariance matrices also crop up in all sorts of other statistics, but we can forget about that for
now.
4 The diagonals contain the variances because samples share all of their variance with
themselves. I've used s rather than the Greek sigma symbol because it displays better when
browsers with different fonts are used. s is normally used for samples and sigma for populations,
but I'm using s interchangeably in this example. You can generate Greek letters by using the
"symbol" font in many word processors (e.g., 's' for sigma, m for mu and so forth).
5 The covariance between groups will rarely be exactly zero in the samples. However, provided
people are randomly assigned to groups, and each person contributes only one data point then we
can be pretty certain that the covariances in the populations being sampled are zero (and
therefore the independence assumption is met). Even if random assignment to groups doesn't
occur the independence assumption is often reasonable. Any time we know or believe that the
measures will be correlated (e.g., in matched designs) a repeated measures analysis should be
used.
6 I probably should have mentioned this earlier, but covariances (like correlations) can be both
negative and positive (unlike variances which are always positive). Positive covariances occur
between samples when two samples are positively correlated. Negative covariances occur
between samples when two samples are negatively correlated. The idea of a negative covariance
is often tricky to grasp, but it just means that as one group tends to vary upwards in value
the other tends to vary downwards. So when checking covariances to see if they are similar bear
in mind the sign of the covariance as well as its magnitude (e.g., -124.3 is very different from
+124.3).
7 By now you can probably appreciate why many text books focus on compound symmetry and
don't cover sphericity in detail.
8 One of the nice things about this topic is that the tests have nice, proper statistical-sounding
names.
9 The Greek letter epsilon is usually used. Greenhouse-Geisser estimates of epsilon have a little
hat on top (^). Huynh-Feldt estimates have a little squiggle on top (~). You can generate Greek
letters by using the "symbol" font in many word processors (e.g., 'e' for epsilon).
10 Later on we discuss the special case of k = 2 and the analogous case of paired t tests. Feel free
to jump there now if you wish.
11 You won't find tables for fractional d.f., but exact p values can be calculated if d.f. are
fractional (most good computer packages do this automatically these days).
12 Why don't I like them? Apart from all the above reasons, I don't like the idea of using a
significance test to test the assumptions of a significance test. If that was a good idea, why don't
we use significance tests to test the assumptions of Mauchly's sphericity test or Levene's test of
homogeneity of variances? At some point you've got to look at the data (using graphical methods,
descriptive statistics and so forth) and make a considered judgement about what procedures to
use.
