Factorial Analysis of Variance (ANOVA) on SPSS
Practice reproducing the analyses yourself:
2 Factor Between (2 levels x 2 levels).sav
2 Factor Between (2 levels x 3 levels).sav
3 Factor Between (2 levels x 2 levels x 2 levels).sav
2 Factor Within (2 levels x 2 levels).sav

All on Portal
Reading
http://www.socialresearchmethods.net/kb/expfact.htm - a simple
summary of factorial designs
http://davidmlane.com/hyperstat/index.html - see sections 11 & 12
for between subjects designs and section 13 for within subjects
(repeated measures) designs. This is recommended: it is concise,
clear and to the point. It also contains a very good glossary from
which you can quickly refresh your memory for definitions of such
things as the Standard Error.
Chapters 10, 11 and 12 of Gravetter & Forzano cover between, within,
and factorial design issues.
Chapters 13, 14 and 15 of Gravetter & Wallnau cover the statistics
(ANOVA etc.). However, don't get bogged down with formulas for
calculating sums of squares (see the next slide).
Things you should know:
How to interpret interaction plots
How to interpret ANOVA tables and assumption tests
That the total degrees of freedom is always N-1 (N = total number of
data points). In a between subjects design the Error degrees of freedom
= N - the number of cells (e.g. 20 - 4 = 16 in a 2 x 2 design with 5
subjects per cell).
That the degrees of freedom for a test of a main effect of a factor =
number of levels the factor contains - 1.
That the degrees of freedom for a test of an interaction between two or
more factors = (the number of levels in one factor - 1) x (the number of
levels in the other - 1) x etc. Thus the DF for a 3 way interaction
between factors having 2, 2 and 4 levels is 1 x 1 x 3 = 3.
That ANOVA uses F tests and that the F statistic for any effect is the Mean
Square for the Effect divided by the Error Mean Square:
$F = MS_{condition} / MS_{error}$
That when you have an alpha level of .05 this means that, when there is
no real effect, the probability of not making a Type 1 error is 95% (.95)
for each test you do.
Thus if you have 20 (independent) F tests in your ANOVA table the
probability of none of them being spurious is .95 x .95 x .95 x ... =
$.95^{20}$, or $(1-\alpha)^{20}$.
This actually = .36 or 36%, which is why (in complex designs especially)
you should stick to examining a few predictions.
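
The arithmetic in the last point can be checked in a few lines of Python. This is only a sketch: it assumes the 20 tests are independent, and the function name is made up for illustration.

```python
def prob_no_type1_errors(alpha: float, n_tests: int) -> float:
    """Probability that none of n_tests independent tests gives a Type 1 error."""
    return (1.0 - alpha) ** n_tests


p_none = prob_no_type1_errors(alpha=0.05, n_tests=20)
p_at_least_one = 1.0 - p_none

print(f"P(no spurious result)           = {p_none:.2f}")          # 0.36
print(f"P(at least one spurious result) = {p_at_least_one:.2f}")  # 0.64
```

Put the other way round, with 20 tests there is roughly a 64% chance of at least one spurious "significant" result.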

Things you needn't worry about:
The precise way that Sums of Squares are calculated
(But it will help your understanding of ANOVA if you at least understand
the gist of how variability is partitioned).

How Levene's test or Mauchly's test are calculated, only that they test
the assumption of homogeneity of variance for between subjects designs
and its (more or less) equivalent in within subjects designs.

In the SPSS output you can largely ignore the following when doing
repeated measures analyses (at this stage at least):

The multivariate tests which you get at the beginning
Tests of within subject contrasts (although these can be a useful tool for
examining patterns in the data)
Any tests of between subjects effects that only involve an intercept (i.e.
you can ignore this output when all your factors are within subjects)
1. Between Subjects Designs
2 Factor designs

Data Format
All scores in a single column
Additional columns for each Factor
Main assumptions of ANOVA:

Assumptions:
There are 3 main assumptions underlying ANOVA

1. Homogeneity of variance
The error variance within each condition should be
statistically equal. Thus any differences between conditions
should only be a shift in the mean. Put another way, the
effect of treatment/condition manipulations is to add a
constant to each individual's score.


[Figure: distributions for conditions A, B and C, with means m_A, m_B, m_C and variances s²_A, s²_B, s²_C. One panel is labelled OK, the other NOT OK.]
Main assumptions of ANOVA:
2. Normality

The distribution of errors within each condition should
be normal. By errors we mean deviations from the
mean for that condition.

Because the errors are the deviations from the
condition means this is equivalent to saying that the
scores should be distributed normally about the
condition means.
Main assumptions of ANOVA:

3. Independence of observations
The data points should represent independent
observations. Knowing the value of one should not tell
you anything about the value of any other.

N.B. This assumption is obviously violated in repeated
measures experiments (because knowing that one data
point comes from subject x, who might be a particularly
fast responder, say, does tell you something about the
likelihood of another observation from subject x being
relatively fast). This is why Subjects have to be included
as a factor in the analysis of repeated measures
designs: the non-independent component is partialled
out.

Design
Experiment to investigate the effect of
stimulus duration and modality (Word vs
Picture) on Recognition performance.

Dependent Variable (Score)
Two Factors: Modality and Duration
Factor Levels
Modality: two levels
Word, Picture

Duration: two levels
200 msec, 800 msec
= 2 x 2 design

                   Modality
Duration     Pictures      Words
200 msec     5 subjects    5 subjects
800 msec     5 subjects    5 subjects
Data entry
View Factor Level Labels
This person scored 127.19
and was tested in the word
modality and with the 800msec
duration
Analyse / General Linear Model / Univariate
Dependent Variable : Score
Fixed Factors: duration + modality
Main effect plot for modality
Main effect plot for duration
Interaction plot (duration*modality)
Options: Condition means, descriptive stats,
test for homogeneity (equality) of variances.
Means: displays the overall mean, the means for each level of
duration, the means for each level of modality and the means for each
combination of duration by modality (= the interaction means).
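
As a rough illustration of what the Means option reports, the sketch below recomputes the marginal means from the four rounded cell means quoted later in these slides (144, 124, 103, 127). Because every cell has 5 subjects, each marginal mean is simply the average of the relevant cell means. The variable names are illustrative only.

```python
from statistics import mean

# Rounded cell means quoted later in the slides (5 subjects per cell).
cell_means = {
    (200, "Picture"): 144,
    (200, "Word"): 124,
    (800, "Picture"): 103,
    (800, "Word"): 127,
}

# With equal cell sizes a marginal mean is the plain average of its cells.
duration_means = {
    d: mean(v for (dur, _), v in cell_means.items() if dur == d)
    for d in (200, 800)
}
modality_means = {
    m: mean(v for (_, mod), v in cell_means.items() if mod == m)
    for m in ("Picture", "Word")
}
overall_mean = mean(cell_means.values())

for level, value in duration_means.items():
    print(f"Duration {level} msec: {value:.1f}")   # 134.0 and 115.0
for level, value in modality_means.items():
    print(f"Modality {level}: {value:.1f}")        # 123.5 and 125.5
print(f"Overall mean: {overall_mean:.1f}")         # 124.5
```

With unequal cell sizes the estimated marginal means reported by SPSS and the raw averages of the scores would no longer coincide, so this shortcut only applies to balanced designs.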
Homogeneity Test: produces Levene's test for homogeneity of
variance (one of the assumptions of ANOVA, i.e. that the variances
within each cell of the design are not significantly different).
Descriptive stats: gives descriptive statistics (mean, max, min, SD
etc.) by the experimental groups.
Output
Factors and Factor level labels
Output
Descriptives: cell means & SDs
Output
Levene's test. This significant result means
the assumption of equal group variances
has not been met.
In this case the analysis is not valid!
A data transformation may be of use here.
Output
** Some cell SDs considerably different
At this point either
Abandon the analysis
See if a data transformation removes the
problem (e.g. Log(score))
Report results but with extreme caution
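
To see why a log transformation can help, here is a minimal sketch using invented scores (not the data from the practice files). The second group is simply ten times the first, so its spread grows with its mean, which is the situation where taking logs tends to equalise the variances. This is an illustration of the idea, not a replacement for re-running Levene's test in SPSS.

```python
import math
from statistics import variance

# Hypothetical scores, invented for illustration: group_b is group_a x 10,
# so its spread grows with its mean.
group_a = [10, 12, 15, 18, 20]
group_b = [100, 120, 150, 180, 200]

ratio_raw = variance(group_b) / variance(group_a)

log_a = [math.log(x) for x in group_a]
log_b = [math.log(x) for x in group_b]
ratio_log = variance(log_b) / variance(log_a)

print(f"variance ratio, raw scores: {ratio_raw:.1f}")  # 100.0
print(f"variance ratio, log scores: {ratio_log:.1f}")  # 1.0
```

If the cell SDs differ for some other reason (e.g. a few outliers in one cell), a log transform will not necessarily remove the problem.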



Assume we have different data:
Levene's test, and any test that checks assumptions for an
analysis, should not be significant.

Here the p value of .271 says that there is no evidence for
any differences in variances between the groups, which is
what we want.
2 Factor Between (2 levels x 2 levels).sav
ANOVA Table
Test for the Main Effect of Duration (i.e. 200 vs 800 ms
pooling across both Modalities)

Significant effect of Duration, F(1,16) = 5.5, p = .032
(Ignore shaded items)
There was a significant effect of Stimulus Duration.
Participants who viewed the stimulus for 200 msec scored
higher (M =134) than those who viewed it for 800 msec (M =
115), F(1,16) = 5.5, p = .032.
[Figure: Duration Profile Plot. Estimated marginal means of SCORE (axis from 110 to 140) for DURATION 200 and 800. This difference is significant.]
ANOVA Table
Test for the Main Effect of Modality (i.e. Pictures vs Words
pooling across both Durations).

No Significant effect of Modality.
(Ignore shaded items)
[Figure: Profile Plot for Modality. Estimated marginal means of SCORE (axis from 123.5 to 126.0) for MODALITY Word and Picture. This difference is not significant.]
Check the scale!

Any graphs you present should be using the same scale.
By default SPSS changes the scale so that the data takes
up the whole graph area. Here are the two graphs on the
same scale:
[Figure: the Modality and Duration profile plots redrawn on the same scale (estimated marginal means of SCORE, axis from 110 to 140).]
ANOVA Table
Test for the Interaction between Modality and Duration.

There was a significant two-way interaction between modality
and duration, F(1,16) = 7.2, p = .017.
(Ignore shaded items)
Profile Plot of Modality by Duration interaction
[Figure: estimated marginal means of SCORE (axis from 100 to 150) by DURATION (200, 800), with separate lines for MODALITY (Picture, Word).]
[Figure: the same interaction plot with the 200 msec average and the 800 msec average marked, alongside the Duration profile plot.]
Main effect of Duration is still observable in the graph
Interpretation of the Modality by Duration Interaction
Several ways of describing the interaction:

At the 200 msec duration pictures resulted in scores
approximately 20 points higher than words, whereas at the
800 msec duration the opposite pattern was true, with
words producing scores approximately 20 points above
pictures, F(1,16) = 7.2, p = .017.





For words there was a small increase in performance
going from the 200 msec to the 800 msec duration. With
pictures, however, there was a large decrease in
performance.


Alternative Plot: same data
[Figure: estimated marginal means of SCORE (axis from 100 to 150) by MODALITY (Word, Picture), with separate lines for DURATION (200, 800).]
At the 200 msec duration
performance was better
with pictures (M = 144)
than words (M = 124)
whereas at the 800 msec
duration the opposite was
true with words giving
better performance (M =
127) than pictures (M =
103), F(1,16) = 7.2, p =
.017.
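
One way to see what the interaction test is picking up is to compute the "difference of differences" from the four cell means quoted above. A minimal sketch using the rounded means; the variable names are illustrative.

```python
# Rounded cell means quoted in the description above.
picture_200, word_200 = 144, 124
picture_800, word_800 = 103, 127

# Simple effect of modality (Picture minus Word) at each duration.
effect_at_200 = picture_200 - word_200   # +20: pictures better
effect_at_800 = picture_800 - word_800   # -24: words better

# With no interaction these two simple effects would be equal,
# so their difference would be zero.
interaction_contrast = effect_at_200 - effect_at_800

print(effect_at_200, effect_at_800, interaction_contrast)  # 20 -24 44
```

The F test for the interaction asks whether a contrast of this size is larger than would be expected from the error variance alone.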
Extension to factors with 3 Levels
10 extra participants at 500
msec duration - 5 with Words,
5 with Pictures
2 Factor Between (2 levels x 3 levels).sav
The analysis is the same, however the interpretation of
the main effect of DURATION is a little more complex:
Note the increased degrees of freedom for Duration and the
interaction
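
The degrees of freedom for both designs can be worked out from the number of levels and the number of participants. The sketch below assumes a between subjects design with two factors; the function names are invented for illustration and are not part of SPSS.

```python
from math import prod


def between_subjects_df(levels: dict[str, int], n_total: int) -> dict[str, int]:
    """Degrees of freedom for a 2 factor between subjects design."""
    (name_a, a), (name_b, b) = levels.items()
    return {
        name_a: a - 1,
        name_b: b - 1,
        f"{name_a} x {name_b}": (a - 1) * (b - 1),
        "error": n_total - a * b,   # N minus the number of cells
        "total": n_total - 1,
    }


def interaction_df(*levels: int) -> int:
    """DF for an interaction: product of (levels - 1) over the factors."""
    return prod(k - 1 for k in levels)


# 2 x 2 design with 20 participants, then 3 x 2 with the 10 extra participants.
df_2x2 = between_subjects_df({"duration": 2, "modality": 2}, n_total=20)
df_3x2 = between_subjects_df({"duration": 3, "modality": 2}, n_total=30)

print(df_2x2)
print(df_3x2)
print(interaction_df(2, 2, 4))  # 3
```

For the 3 level design this gives 2 DF for Duration, 2 for the interaction and 24 for error, which is where the F(2,24) in the write-up comes from.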
Duration Profile Plot:
A significant F test only says that not all the means are equal
To examine individual pair-wise comparisons:

1. If you make a priori predictions about which means you
are interested in comparing:

You can use simple t tests (LSD) for 3 means
Sidak or Bonferroni for a greater number of comparisons.

2. If you want to make post hoc comparisons:
You can use Tukey's Test



Note that the more conservative Tukey test only finds one
significant difference whereas LSD finds two. Note the Tukey
test requires equal sample sizes.
There was a significant main effect of stimulus duration,
F(2,24) = 8.07, p = .002. Post hoc comparisons using Tukey's
HSD showed that only the difference between the 200 (M =
134.3) and 500 (M = 101.8) durations was significant,
p = .001.
3 Factor Designs
3 Factor Between (2 levels x 2 levels x 2 levels).sav
Adding a third Noise factor
with two levels (Low, High)
requires doubling the
number of subjects,
assuming you still want 5
in each cell. In the
following analysis, for ease
of interpretation, we will go
back to having just two
levels of the duration factor
(200 vs. 800).

Logic of the analysis is the same but we now have:

3 possible main effects :
Duration
Modality
Noise

3 possible 2-way interactions:
Duration x modality
Duration x noise
Modality x noise

1 possible 3-way interaction
Duration x modality x noise
Both main effects of duration and noise significant. 3-way
interaction also significant.
Interpreting 3-Way interactions.
Much easier if you have some predictions about
the expected pattern

For instance in this example we might predict
that as well as generally decreasing performance
high levels of noise might obscure any
differences between the picture and word
conditions:
3-way interaction is a difference in the pattern of a 2-way
interaction at levels of the third factor
There was a significant 3 way interaction between duration,
modality and noise, F (1,32)=4.5, p = .041. In the low noise
condition pictures and words produced opposite effects on
performance at the two durations. At stimulus presentations of
200msec words gave rise to performance some 20 points lower
than pictures whereas the reverse pattern was true for the 800
msec duration. With high noise, however, there was very little
evidence of any interaction.
If you want to provide a bit more weight to your
conclusions concerning the interpretation of the 3-way
interaction you could perform a simple interaction effects
analysis.

This is actually very easy

You just run two separate ANOVAs, one at each
level of (in this example) the noise factor.

Each of these analyses has the factors duration and
modality but one uses the data from the high noise
condition and the other from the low noise condition.

You then interpret the 2-way interactions between
duration and modality at each level of noise
[Figure: the Duration by Modality interaction plots for low noise (left) and high noise (right), with one ANOVA run on each.]
You can then say whether it is true that the interaction on the
left (low noise) is significant whilst the one on the right (high
noise) is not.
There is one catch: the F ratio for the 2-way interactions in
each separate analysis needs to be computed using the
$MS_{error}$ from the original analysis.
Original 3 Factor ANOVA
$MS_{error}$ from the original analysis = 400.8 on 32 DF

You now need to run the two separate 2 way ANOVAs on
the data from the high and low noise conditions.

On SPSS the easiest way to do this is to first split the data
using the Split File command (Data / Split File).

Any subsequent commands, be they Tables, Plots or, as in this
case, ANOVAs, will now be done separately for each level of the
grouping variable (noise):
Having split the data file by the noise variable you now
simply perform a 2 way ANOVA, with factors duration and
modality as before:
Analyse / General Linear Model / Univariate
The noise factor is left out as it is the one used to split the file.
SPSS will now compute the two 2 way ANOVAs
This table is simply 2 ANOVA tables put together: one for
the low noise data and one for the high noise data.
However the F ratios are wrong as they need to be computed
using the $MS_{error}$ from the original 3 way ANOVA


F ratios are simply the result of dividing the Mean Square
for the effect by the error Mean Square ($MS_{error}$)

E.g. the duration F ratio is simply $MS_{duration} / MS_{error}$

For the simple interaction effects follow up we need to
compute our own F ratios for the modality by duration
interactions at each noise level by substituting the $MS_{error}$
from the original analysis.

For the low noise interaction the correct F ratio is 2393.978 / 400.8 = 5.97
For the high noise interaction the correct F ratio is 129.97 /400.8 = .32
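
The same two divisions can be done in a couple of lines of Python, which is handy if you have several simple effects to correct. A minimal sketch using the mean squares quoted above; the variable names are illustrative.

```python
MS_ERROR_ORIGINAL = 400.8   # error mean square from the 3 factor ANOVA (32 DF)

# Modality x duration interaction mean squares from the two split-file analyses.
ms_interaction = {"low noise": 2393.978, "high noise": 129.97}

# Corrected F ratio = interaction mean square / original error mean square.
f_ratios = {
    noise: ms / MS_ERROR_ORIGINAL for noise, ms in ms_interaction.items()
}

for noise, f in f_ratios.items():
    print(f"{noise}: F(1,32) = {f:.2f}")
# low noise: F(1,32) = 5.97
# high noise: F(1,32) = 0.32
```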
To work out the p value you need either to look it up in
F tables,

or to calculate the exact probability (very easily) using a
package such as Excel:

E.g. to calculate the p value associated with the low noise
modality x duration interaction:

The value we got was 5.97
This is based on 1 df for the effect and 32 df for error
Click in any cell in Excel and type:

=FDIST(5.97,1,32) and press return
NB. Don't forget the = at the start of the formula

Excel then gives the answer:
The simple interaction effect at the low noise level was
significant, F (1,32) = 5.97, p = .02.
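
If you do not have Excel to hand, the same upper-tail probability can be computed in plain Python. The sketch below is one possible implementation, not what Excel does internally: it uses the standard relationship between the F distribution and the regularised incomplete beta function, evaluated with a continued fraction. The function names are invented for illustration. If SciPy is installed, `scipy.stats.f.sf(5.97, 1, 32)` should give the same value.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularised_incomplete_beta(a, b, x):
    """I_x(a, b), the regularised incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation so the continued fraction converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f_value, df_effect, df_error):
    """P(F >= f_value): the quantity returned by =FDIST(f, df1, df2)."""
    x = df_error / (df_error + df_effect * f_value)
    return regularised_incomplete_beta(df_error / 2.0, df_effect / 2.0, x)


p_low_noise = f_upper_tail(5.97, 1, 32)
print(f"p = {p_low_noise:.2f}")  # p = 0.02
```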
Repeated Measures designs
These are where the same subject is tested in
the different experimental conditions

Advantage: the test is more sensitive
Disadvantages: things like order effects,
practice effects etc.
Not always possible in principle, e.g. if taking part
in one condition exposes subjects to information
that will ruin them for any other condition
2 Factor Within (2 levels x 2 levels).sav
Test is more sensitive because:

Individual differences are controlled for:

e.g. suppose a reaction time study:

Some people are just faster responders on
average than others. What we are usually
interested in is the relative effect of a
treatment on performance

Repeated measures (or within subjects)
designs examine the relative effect of
conditions on individuals
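
The gain in sensitivity can be illustrated with a few invented reaction times (these are not data from the practice files). Each subject is about 20 msec slower in condition B, but subjects differ from each other by hundreds of msec. Looking at each subject's own difference score removes those individual differences.

```python
from statistics import mean, stdev

# Hypothetical reaction times (msec), invented for illustration:
# each list position is one subject tested in both conditions.
cond_a = [300, 400, 500, 600]
cond_b = [322, 418, 521, 619]

# Between subjects view: the effect is swamped by individual differences.
spread_between = stdev(cond_a)

# Within subjects view: each subject acts as their own control.
differences = [b - a for a, b in zip(cond_a, cond_b)]
spread_within = stdev(differences)

print(f"mean effect             = {mean(differences):.1f} msec")
print(f"SD of raw scores (A)    = {spread_between:.1f} msec")
print(f"SD of difference scores = {spread_within:.1f} msec")
```

The effect is the same size in both views; what changes is the amount of noise it has to be detected against.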

Repeated measures ANOVA on SPSS
Interpretation of effects from the ANOVA table is the same

Main difference is in the data entry

Designs can be all repeated measures or a mixture

E.g. A two factor repeated measures design could have:

Both factors as repeated measures (or within subjects)
Or
One repeated measure and one between subjects measure



Both factors as repeated measures:

Each subject is tested under every combination

The order of the combinations would normally be
randomised for each subject

Or

Pseudo-randomised so that equal numbers of subjects
receive each order (this is the most common method)
Modality by Stimulus duration data
Assuming this experiment was carried out with both
factors as repeated measures:

This is how the data
is entered into SPSS.

Each row represents
scores from a single
subject.
Each subject has 4
data points.

These could be single
scores or the average of
many trials under that
condition. The latter is
common with measures
such as RT which are
inherently noisy (i.e. you
need to take the average
of many raw data points to
get a good estimate for
that subject under those
conditions).
Give the columns meaningful names: the first column contains data from
Duration level 1 (200 msec) and Modality level 1 (picture). You can use
shorthand for the actual column names and put the longer, more
meaningful, description as the variable label:
To avoid confusion later the columns should always be
ordered in a hierarchy - take a 3 Factor example (all with 2
levels and where F1(1) = Factor 1 Level 1):
            F1(1)                          F1(2)
    F2(1)          F2(2)           F2(1)          F2(2)
F3(1)  F3(2)   F3(1)  F3(2)    F3(1)  F3(2)   F3(1)  F3(2)
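
If you are unsure of the column order for a larger design, it can be generated mechanically. The sketch below is only an illustration (it is not something SPSS needs): it lists the columns for a hypothetical 3 factor design with 2 levels each, with F1 at the top of the hierarchy.

```python
from itertools import product

# Hypothetical 3 factor design, 2 levels each; F1 is the top of the hierarchy.
factor_levels = {"F1": 2, "F2": 2, "F3": 2}

# itertools.product varies the LAST factor fastest, which matches the
# hierarchical column order: F3 changes every column, F1 changes least often.
columns = [
    "".join(f"{name}({level})" for name, level in zip(factor_levels, combo))
    for combo in product(*(range(1, n + 1) for n in factor_levels.values()))
]

for position, column in enumerate(columns, start=1):
    print(position, column)
```

The first column is F1(1)F2(1)F3(1), the second is F1(1)F2(1)F3(2), and so on until the eighth, F1(2)F2(2)F3(2).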

To run the analysis:



First Factor is Duration and this has two levels
NB the first factor is the one at the top of the hierarchy:
Order in which you define the factors in SPSS: 1 = F1 (top of the
hierarchy), 2 = F2, 3 = F3.
Second factor is
modality with two
levels






This sets up all the factors. Now click Define to tell SPSS
where the columns are that correspond to each factor level
combination
The first question mark is asking "where is the column containing
the data from level 1 of factor 1 and level 1 of factor 2?" This is
our column 1 (d1m1)
Note at the top where it says Within-Subjects Variables you get a
reminder of which is the first and second factors. The order we
defined the factors in was duration then modality hence at the top
we have (duration, modality). The numbers in the brackets refer to
the levels of the corresponding factors.
The process continues until all the within subject
variables have been set up. NB: only when you
set up the factors in the data sheet according to
the hierarchy and define the factors starting from
the top of the hierarchy will they be in the
correct order already.
Once set up you can use the plots and options (display
means) in exactly the same way as with between subjects
designs.
SPSS Output

This is not quite the same as for between
subjects designs.


The first box just summarises the within-subjects
factors and allows you to check that they have
been entered in the right order:
You can ignore the multivariate tests output
unless you have special reason to question
certain assumptions.
Mauchly's Sphericity test is an important assumption test: it is the
repeated measures equivalent of Levene's test for homogeneity of
variance. IT SHOULD NOT BE SIGNIFICANT. NB when, as in this
case, a factor only has two levels, sphericity cannot be violated
and there is never a problem. The dots in the SIG column simply
mean that the test is not appropriate.
The Tests of Within Subjects Effects are where you find the
significance tests for all your within subjects factors and any
interactions involving any within subjects factor. Highlighted
here is the test for a main effect of Duration.

If for any test there is no violation of sphericity use the
"Sphericity Assumed" F and p value.
Suppose the test of sphericity for the interaction had given a
significant result (p < .05). Then when you came to interpret
the interaction effect in the main ANOVA table you would use
the Greenhouse-Geisser adjustment:
Notice also that in a repeated measures design any within-
subjects variable has its own error term and this should be
checked when giving the DFs for a test:

E.g. "... interaction was significant, F(1,19) = ..."

Here the 1, as before, comes from the DF associated with
the test of the interaction and the 19 comes from the DF
associated with the specific Duration x Modality error
term.
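
When every factor is within subjects, the error DF for an effect follows a simple rule: effect DF x (number of subjects - 1). A minimal sketch of that rule follows. The 20 subjects used here are an inference from the 19 error DF quoted above, not a figure stated in the slides, and the function name is invented.

```python
def within_error_df(effect_df: int, n_subjects: int) -> int:
    """Error DF for a within subjects effect: effect DF x (subjects - 1)."""
    return effect_df * (n_subjects - 1)


# 2 x 2 within subjects design: the interaction has (2-1) x (2-1) = 1 DF.
interaction_df = (2 - 1) * (2 - 1)

# Assumed number of subjects (inferred from the error DF, not stated).
n_subjects = 20

error_df = within_error_df(interaction_df, n_subjects)
print(f"F({interaction_df},{error_df})")  # F(1,19)
```

Always read the error DF from the row of the SPSS table that belongs to the effect you are reporting, and treat a calculation like this only as a cross-check.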
The Tests of Within-Subjects Contrasts only really apply
when you have a factor with more than 2 levels
and
you want to test for a particular trend (e.g. that
performance increases in a straight line (linear) fashion as
drug dosage increases).
Plots and tables of means can be interpreted in exactly the
same way as between subjects designs.
