
MicrobiologyBytes: Maths & Computers for Biologists: ANOVA with SPSS

Updated: February 6, 2009


ANOVA with SPSS


Never, ever, run any statistical test without performing EDA first!


What's wrong with t-tests? Nothing, except... If you want to compare three or more groups using t-tests with the usual 0.05 level of significance, you would have to compare the groups pairwise (A to B, A to C, B to C), so the chance of getting at least one wrong result would be: 1 - (0.95 x 0.95 x 0.95) = 14.3%. For four groups there are six pairwise comparisons, so the chance of a wrong result rises to 1 - (0.95)^6 = 26%, and for five groups (ten comparisons), 40%. Not good, is it? So we use ANOVA. Never perform multiple t-tests: anyone on this module discovered performing multiple t-tests when they should use ANOVA will be shot!

ANalysis Of VAriance (ANOVA) is such an important statistical method that it would be easy to spend a whole module on this test alone. Like the t-test, ANOVA is a parametric test which assumes:

- the data is numerical, representing samples from normally distributed populations
- the variances of the groups are "similar"
- the sizes of the groups are "similar"
- the groups are independent

Because of these assumptions, it's important to carry out EDA before starting ANOVA! In fact, ANOVA is quite a robust procedure, so as long as the groups are similar, the test is normally reliable. ANOVA tests the null hypothesis that the means of all the groups being compared are equal, and produces a statistic called F which is equivalent to the t-statistic from a t-test. But there's a catch. If the means of all the groups tested by ANOVA are equal, fine. But if the result tells us to reject the null hypothesis, we still don't know which of the means differ. We solve this problem by performing what is known as a "post hoc" (after the event) test.
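If you want to check the error rates quoted above, here is a minimal Python sketch (not part of the SPSS workflow) that computes the familywise error rate for all-pairwise t-tests:

```python
# Familywise error rate when comparing k groups pairwise at alpha = 0.05
from math import comb

alpha = 0.05
for k in (3, 4, 5):
    m = comb(k, 2)                # number of pairwise t-tests
    fwer = 1 - (1 - alpha) ** m   # P(at least one false positive)
    print(f"{k} groups: {m} tests, familywise error = {fwer:.1%}")
# 3 groups: 14.3%, 4 groups: 26.5%, 5 groups: 40.1%
```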

Reminder:
Independent variable: Variables which are experimentally manipulated by an investigator are called independent variables. Dependent variable: Variables which are measured are called dependent variables (because they are presumed to depend on the value of the independent variable).


ANOVA jargon:

Way = an independent variable, so a one-way ANOVA has one independent variable, a two-way ANOVA has two independent variables, etc. Simple ANOVA tests the hypothesis that means from two or more samples are equal (drawn from populations with the same mean). Student's t-test is actually a particular application of one-way ANOVA (two groups compared).

Factor = a test or measurement. Single-factor ANOVA tests whether the means of the groups being compared are equal and returns a yes/no answer; two-factor ANOVA simultaneously tests two or more factors, e.g. tumour size after treatment with different drugs and/or radiotherapy (drug treatment is one factor and radiotherapy is another). So "factor" and "way" are alternative terms for the same thing (independent variables).

Repeated measures: used when members of a sample are measured under different conditions. As the sample is exposed to each condition, the measurement of the dependent variable is repeated. Standard ANOVA is not appropriate here because it fails to take into account the correlation between the repeated measures, violating the assumption of independence. Repeated measures designs arise for several reasons, e.g. where the research requires them, such as longitudinal research which measures each sample member at each of several ages - age is a repeated factor. This approach is comparable to a paired t-test.

The array of options for different ANOVA tests in SPSS is confusing, so I'll go through the most important bits using some examples.

One-Way / Single-Factor ANOVA:


Data:
Pain Scores for Analgesics

Drug          Pain Scores
Diclofenac    0, 35, 31, 29, 20, 7, 43, 16
Ibuprofen     30, 40, 27, 25, 39, 15, 30, 45
Paracetamol   16, 33, 25, 32, 21, 54, 57, 19
Aspirin       55, 58, 56, 57, 56, 53, 59, 55

Since it would be unethical to withhold pain relief, there is no control group and we are just interested in knowing whether one drug performs better (lower pain score) than another, so we need to perform a one-way/single-factor ANOVA. We enter this data into SPSS using dummy values (1, 2, 3, 4) for the drugs so this numeric data can be used in the ANOVA:

It's always a good idea to enter descriptive labels for data into the Variable View window, or the output is difficult to interpret! EDA (Analyze: Descriptive Statistics: Explore) shows that the data is normally distributed, so we can proceed with the ANOVA: Analyze: Compare Means: One-Way ANOVA. Dependent variable: Pain Score. Factor: Drug.

SPSS allows many different post hoc tests. Click Post Hoc and select the Tukey and Games-Howell tests. The Tukey test is powerful and widely accepted, but it is parametric in that it assumes the population variances are equal. It also assumes that the sample sizes are equal; if they are not, you should use Gabriel's procedure, or if the sizes are very different, Hochberg's GT2. Games-Howell assumes neither equal population variances nor equal sample sizes, so it is a good alternative if these assumptions turn out to be broken. Click Options and select Homogeneity of Variance Test, Brown-Forsythe and Welch. The homogeneity of variance test is important since homogeneity is an assumption of ANOVA, but if this assumption turns out to be broken, the Brown-Forsythe and Welch options will display alternative versions of the F statistic, which means you may still be able to use the result. Click OK to run the tests.
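If you want to cross-check the SPSS output outside SPSS, a minimal SciPy sketch (variable names are mine) runs the same one-way ANOVA and Levene's test:

```python
# One-way ANOVA and Levene's test on the pain-score data with SciPy
from scipy import stats

diclofenac  = [0, 35, 31, 29, 20, 7, 43, 16]
ibuprofen   = [30, 40, 27, 25, 39, 15, 30, 45]
paracetamol = [16, 33, 25, 32, 21, 54, 57, 19]
aspirin     = [55, 58, 56, 57, 56, 53, 59, 55]

f, p = stats.f_oneway(diclofenac, ibuprofen, paracetamol, aspirin)
print(f"ANOVA: F = {f:.3f}, p = {p:.4f}")   # F = 11.967, as in the SPSS table below

# center='mean' gives the classic Levene statistic reported by SPSS;
# SciPy's default (center='median') is the Brown-Forsythe variant
w, p_lev = stats.levene(diclofenac, ibuprofen, paracetamol, aspirin, center='mean')
print(f"Levene: W = {w:.3f}, p = {p_lev:.4f}")
```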

Output:
Test of Homogeneity of Variances: Pain

Levene Statistic   df1   df2   Sig.
4.837              3     28    .008

The significance value for homogeneity of variances is <.05, so the variances of the groups are significantly different. Since this is an assumption of ANOVA, we need to be very careful in interpreting the outcome of this test:
ANOVA: Pain

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   4956.375         3    1652.125      11.967   .000
Within Groups    3865.500         28   138.054
Total            8821.875         31

This is the main ANOVA result. The significance value comparing the groups (drugs) is <.05, so we could reject the null hypothesis (there is no difference in the mean pain scores with the four drugs). However, since the variances are significantly different, this might be the wrong answer. Fortunately, the Welch and Brown-Forsythe statistics can still be used in these circumstances:
Robust Tests of Equality of Means: Pain

                 Statistic   df1   df2      Sig.
Welch            32.064      3     12.171   .000
Brown-Forsythe   11.967      3     18.889   .000
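Welch's version of the ANOVA can also be run outside SPSS; it is available in the third-party pingouin package (a sketch, with a long-format DataFrame layout of my own):

```python
# Sketch: Welch's ANOVA with the third-party pingouin package
import pandas as pd
import pingouin as pg

drugs = {
    'Diclofenac':  [0, 35, 31, 29, 20, 7, 43, 16],
    'Ibuprofen':   [30, 40, 27, 25, 39, 15, 30, 45],
    'Paracetamol': [16, 33, 25, 32, 21, 54, 57, 19],
    'Aspirin':     [55, 58, 56, 57, 56, 53, 59, 55],
}
# One row per observation: (drug, pain score)
df = pd.DataFrame([(d, s) for d, scores in drugs.items() for s in scores],
                  columns=['Drug', 'Pain'])
print(pg.welch_anova(data=df, dv='Pain', between='Drug'))
```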

The Welch and Brown-Forsythe significance values are both <.05, so we still reject the null hypothesis. However, this result does not tell us which drugs are responsible for the difference, so we need the post hoc test results:
Multiple Comparisons
Dependent Variable: Pain
(drugs coded 1 = Diclofenac, 2 = Ibuprofen, 3 = Paracetamol, 4 = Aspirin)

Tukey HSD
(I) Drug   (J) Drug   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   Upper
1          2          -8.750                  5.875        .457   -24.79         7.29
1          3          -9.500                  5.875        .386   -25.54         6.54
1          4          -33.500(*)              5.875        .000   -49.54         -17.46
2          1          8.750                   5.875        .457   -7.29          24.79
2          3          -.750                   5.875        .999   -16.79         15.29
2          4          -24.750(*)              5.875        .001   -40.79         -8.71
3          1          9.500                   5.875        .386   -6.54          25.54
3          2          .750                    5.875        .999   -15.29         16.79
3          4          -24.000(*)              5.875        .002   -40.04         -7.96
4          1          33.500(*)               5.875        .000   17.46          49.54
4          2          24.750(*)               5.875        .001   8.71           40.79
4          3          24.000(*)               5.875        .002   7.96           40.04

Games-Howell
(I) Drug   (J) Drug   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   Upper
1          2          -8.750                  6.176        .513   -27.05         9.55
1          3          -9.500                  7.548        .602   -31.45         12.45
1          4          -33.500(*)              5.194        .001   -50.55         -16.45
2          1          8.750                   6.176        .513   -9.55          27.05
2          3          -.750                   6.485        .999   -20.09         18.59
2          4          -24.750(*)              3.471        .001   -36.03         -13.47
3          1          9.500                   7.548        .602   -12.45         31.45
3          2          .750                    6.485        .999   -18.59         20.09
3          4          -24.000(*)              5.558        .014   -42.26         -5.74
4          1          33.500(*)               5.194        .001   16.45          50.55
4          2          24.750(*)               3.471        .001   13.47          36.03
4          3          24.000(*)               5.558        .014   5.74           42.26

* The mean difference is significant at the .05 level.

The Tukey test relies on homogeneity of variance, so we ignore these results. The Games-Howell post-hoc test does not rely on homogeneity of variance (this is why we used two different post-hoc tests) and so can be used. SPSS kindly flags (*) which differences are significant! Result: Drug 4 (Aspirin) produces a significantly different result from the other three drugs. Formal Reporting: When we report the outcome of an ANOVA, we cite the value of the F ratio and give the number of degrees of freedom, the outcome (in a neutral fashion) and the significance value. So in this case:

There is a significant difference between the pain scores for aspirin and the other three drugs tested, F(3,28) = 11.97, p < .05.
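For completeness, the Tukey HSD procedure is also available outside SPSS in statsmodels (the Games-Howell test is in the third-party pingouin package as pairwise_gameshowell). A sketch, reusing the group data from above:

```python
# Sketch: Tukey HSD with statsmodels on the pain-score data.
# Remember: Tukey assumes equal variances, which is violated here,
# so for this example the Games-Howell results are the ones to trust.
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = ([0, 35, 31, 29, 20, 7, 43, 16] +      # Diclofenac
          [30, 40, 27, 25, 39, 15, 30, 45] +    # Ibuprofen
          [16, 33, 25, 32, 21, 54, 57, 19] +    # Paracetamol
          [55, 58, 56, 57, 56, 53, 59, 55])     # Aspirin
groups = (['Diclofenac'] * 8 + ['Ibuprofen'] * 8 +
          ['Paracetamol'] * 8 + ['Aspirin'] * 8)
print(pairwise_tukeyhsd(scores, groups, alpha=0.05).summary())
```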

Two-Factor ANOVA
Do anti-cancer drugs have different effects in males and females?

Data:
Drug             Gender   Tumour Size
cisplatin        Female   65, 70, 60, 60, 60, 55, 60, 50
cisplatin        Male     50, 55, 80, 65, 70, 75, 75, 65
vinblastine      Female   70, 65, 60, 70, 65, 60, 60, 50
vinblastine      Male     45, 60, 85, 65, 70, 70, 80, 60
5-fluorouracil   Female   55, 65, 70, 55, 55, 60, 50, 50
5-fluorouracil   Male     35, 40, 35, 55, 35, 40, 45, 40

We enter this data into SPSS using dummy values for the drugs (1, 2, 3) and genders (1,2) so the coded data can be used in the ANOVA:

It's always a good idea to enter descriptive labels for data into the Variable View window, or the output is difficult to interpret! EDA (Analyze: Descriptive Statistics: Explore) shows that the data is normally distributed, so we can proceed with the ANOVA: Analyze: General Linear Model: Univariate Dependent variable: Tumour Diameter Fixed Factors: Gender, Drug:

Also select: Post Hoc: Tukey and Games-Howell:

Options: Display Means for: Gender, Drug, Gender*Drug Descriptive Statistics Homogeneity tests:
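The same two-factor analysis can be sketched outside SPSS with statsmodels (column names are my own; sum-to-zero contrasts are used so the Type III sums of squares match SPSS's convention):

```python
# Sketch: two-factor ANOVA with statsmodels on the tumour-size data
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = {
    ('cisplatin', 'Female'):      [65, 70, 60, 60, 60, 55, 60, 50],
    ('cisplatin', 'Male'):        [50, 55, 80, 65, 70, 75, 75, 65],
    ('vinblastine', 'Female'):    [70, 65, 60, 70, 65, 60, 60, 50],
    ('vinblastine', 'Male'):      [45, 60, 85, 65, 70, 70, 80, 60],
    ('5-fluorouracil', 'Female'): [55, 65, 70, 55, 55, 60, 50, 50],
    ('5-fluorouracil', 'Male'):   [35, 40, 35, 55, 35, 40, 45, 40],
}
df = pd.DataFrame([(drug, gender, size)
                   for (drug, gender), sizes in data.items() for size in sizes],
                  columns=['Drug', 'Gender', 'Diameter'])

# C(..., Sum) requests sum-to-zero contrasts, needed for Type III SS
model = ols('Diameter ~ C(Gender, Sum) * C(Drug, Sum)', data=df).fit()
print(sm.stats.anova_lm(model, typ=3))
```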

Output:
Levene's Test of Equality of Error Variances(a)
Dependent Variable: Diameter

F       df1   df2   Sig.
1.462   5     42    .223

Tests the null hypothesis that the error variance of the dependent variable is equal across groups. a Design: Intercept+Gender+Drug+Gender * Drug

The significance result for homogeneity of variance is >.05, which shows that the error variance of the dependent variable is equal across the groups, i.e. the assumption of the ANOVA test has been met.
Tests of Between-Subjects Effects
Dependent Variable: Diameter

Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   3817.188(a)               5    763.438       10.459     .000
Intercept         167442.188                1    167442.188    2294.009   .000
Gender            42.188                    1    42.188        .578       .451
Drug              2412.500                  2    1206.250      16.526     .000
Gender * Drug     1362.500                  2    681.250       9.333      .000
Error             3065.625                  42   72.991
Total             174325.000                48
Corrected Total   6882.813                  47

a R Squared = .555 (Adjusted R Squared = .502)

The highlighted values are significant (<.05), but there is no effect of gender (p = 0.451). Again, this does not tell us which drugs behave differently, so again we need to look at the post hoc tests:
Multiple Comparisons
Dependent Variable: Diameter

Tukey HSD
(I) Drug         (J) Drug         Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   Upper
cisplatin        vinblastine      -1.25                   3.021        .910   -8.59          6.09
cisplatin        5-fluorouracil   14.38(*)                3.021        .000   7.04           21.71
vinblastine      cisplatin        1.25                    3.021        .910   -6.09          8.59
vinblastine      5-fluorouracil   15.63(*)                3.021        .000   8.29           22.96
5-fluorouracil   cisplatin        -14.38(*)               3.021        .000   -21.71         -7.04
5-fluorouracil   vinblastine      -15.63(*)               3.021        .000   -22.96         -8.29

Games-Howell
(I) Drug         (J) Drug         Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   Upper
cisplatin        vinblastine      -1.25                   3.329        .925   -9.46          6.96
cisplatin        5-fluorouracil   14.38(*)                3.534        .001   5.64           23.11
vinblastine      cisplatin        1.25                    3.329        .925   -6.96          9.46
vinblastine      5-fluorouracil   15.63(*)                3.699        .001   6.50           24.75
5-fluorouracil   cisplatin        -14.38(*)               3.534        .001   -23.11         -5.64
5-fluorouracil   vinblastine      -15.63(*)               3.699        .001   -24.75         -6.50

Based on observed means.
* The mean difference is significant at the .05 level.

In this example, we can use the Tukey or Games-Howell results. Again, SPSS helpfully flags which results have reached statistical significance. We already know from the main ANOVA table that the effect of gender is not significant, but the post hoc tests show which drugs produce significantly different outcomes. Formal Reporting: When we report the outcome of an ANOVA, we cite the value of the F ratio and give the number of degrees of freedom, outcome (in a neutral fashion) and significance value. So in this case:

There is a significant difference between the tumour diameter for 5-fluorouracil and the other two drugs tested, F(5,42) = 10.46, p < .05.

Repeated Measures ANOVA


Remember that one of the assumptions of ANOVA is independence of the groups being compared. In lots of circumstances, we want to test the same thing repeatedly, e.g:

- patients with a chronic disease after 3, 6 and 12 months of drug treatment
- repeated sampling from the same location, e.g. spring, summer, autumn and winter

This type of study reduces variability in the data and so increases the power to detect effects, but violates the assumption of independence, so as with the paired t-test, we need to use a special form of ANOVA called repeated measures. In a parametric test, the assumption that the relationships between pairs of groups are equal is called "sphericity". Violating sphericity means that the F statistic cannot be compared to the normal tables of F, and so software cannot calculate a significance value. SPSS includes a procedure called Mauchly's test which tells us if the assumption of sphericity has been violated: if Mauchly's test statistic is significant (i.e. p < .05) we conclude that the condition of sphericity has not been met; if Mauchly's test statistic is non-significant (i.e. p > .05) it is reasonable to conclude that the variances of the differences are not significantly different. If Mauchly's test is significant then we cannot trust the F-ratios produced by SPSS unless we apply a correction (which, fortunately, SPSS helps us to do).

One-Way Repeated Measures ANOVA


i.e. one independent variable, e.g. pain score after surgery:
Patient1: 1, 2, 4, 5, 5, 6
Patient2: 3, 5, 6, 7, 9, 10
Patient3: 1, 3, 6, 4, 1, 3

This data can be entered directly into SPSS. Note that each column represents a repeated measures variable (patients in this case). There is no need for a coding variable (as with between-group designs, above):

It's always a good idea to enter descriptive labels for data into the Variable View window, or the output is difficult to interpret! Next: Analyze: General Linear Model: Repeated Measures

Within-Subject factor name: Patient Number of Levels: 3 (because there are 3 patients) Click Add, then Define (factors):
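For comparison, statsmodels' AnovaRM fits the same design outside SPSS from long-format data (column names are mine; note that AnovaRM reports only the uncorrected, sphericity-assumed F):

```python
# Sketch: one-way repeated measures ANOVA with statsmodels' AnovaRM.
# Long format: one row per (subject, condition) measurement.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    'subject': list(range(6)) * 3,
    'patient': ['Patient1'] * 6 + ['Patient2'] * 6 + ['Patient3'] * 6,
    'pain':    [1, 2, 4, 5, 5, 6,    # Patient1
                3, 5, 6, 7, 9, 10,   # Patient2
                1, 3, 6, 4, 1, 3],   # Patient3
})
print(AnovaRM(df, depvar='pain', subject='subject', within=['patient']).fit())
# F(2, 10) = 8.21, matching the "Sphericity Assumed" row of the SPSS output
```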

There are no proper post hoc tests for repeated measures variables in SPSS. However, via the Options button, you can use the paired t-test procedure to compare all pairs of levels of the independent variable, and then apply a Bonferroni correction to the probability at which you accept any of these tests. The resulting probability value should be used as the criterion for statistical significance. A Bonferroni correction is achieved by dividing the probability value (usually 0.05) by the number of tests conducted, e.g. if we compare all levels of the independent variable of these data, we make three comparisons and so the appropriate significance level is 0.05/3 = 0.0167. Therefore, we accept t-tests as being significant only if they have a p value <0.0167.
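A minimal sketch of that Bonferroni procedure with SciPy, using the pain-score data from above:

```python
# All pairwise paired t-tests with a Bonferroni-corrected criterion
# of 0.05 / 3 = 0.0167, as described in the text
from itertools import combinations
from scipy import stats

scores = {
    'Patient1': [1, 2, 4, 5, 5, 6],
    'Patient2': [3, 5, 6, 7, 9, 10],
    'Patient3': [1, 3, 6, 4, 1, 3],
}
pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)   # Bonferroni correction: three comparisons
for a, b in pairs:
    t, p = stats.ttest_rel(scores[a], scores[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}, significant: {p < alpha}")
```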

Output:
Mauchly's Test of Sphericity

Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.
patient                  .094          9.437                2    .009

Epsilon: Greenhouse-Geisser = .525, Huynh-Feldt = .544, Lower-bound = .500

Mauchly's test is significant (p < .05) so we conclude that the assumption of sphericity has not been met.
Tests of Within-Subjects Effects

Source                                Type III Sum of Squares   df      Mean Square   F       Sig.
patient          Sphericity Assumed   44.333                    2       22.167        8.210   .008
                 Greenhouse-Geisser   44.333                    1.050   42.239        8.210   .033
                 Huynh-Feldt          44.333                    1.088   40.752        8.210   .031
                 Lower-bound          44.333                    1.000   44.333        8.210   .035
Error(patient)   Sphericity Assumed   27.000                    10      2.700
                 Greenhouse-Geisser   27.000                    5.248   5.145
                 Huynh-Feldt          27.000                    5.439   4.964
                 Lower-bound          27.000                    5.000   5.400

Because the significance values are <.05, we conclude that there was a significant difference between the three patients, but this test does not tell us which patients differed from each other. The next issue is which of the three corrections to use: if epsilon is >0.75, use the Huynh-Feldt correction; if epsilon is <0.75, or nothing is known about sphericity at all, use the Greenhouse-Geisser correction. In this example, the Greenhouse-Geisser and Huynh-Feldt epsilon values from the Mauchly's test table are 0.525 and 0.544, both <0.75, so we use the Greenhouse-Geisser corrected values. Using this correction, F is still significant because its p value is 0.033, which is <.05.
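The correction itself is simple arithmetic: both degrees of freedom are multiplied by epsilon, as this small sketch shows:

```python
# Greenhouse-Geisser correction: scale both dfs by epsilon
epsilon = 0.525             # Greenhouse-Geisser epsilon from Mauchly's table
df1, df2 = 2, 10            # uncorrected within-subjects dfs
print(df1 * epsilon, df2 * epsilon)   # 1.05, 5.25 -> report F(1.05, 5.25)
```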

Post Hoc Tests:


Pairwise Comparisons

(I) patient   (J) patient   Mean Difference (I-J)   Std. Error   Sig.(a)   95% CI for Difference(a) Lower   Upper
1             2             -2.833(*)               .401         .003      -4.252                           -1.415
1             3             .833                    .946         1.000     -2.509                           4.176
2             1             2.833(*)                .401         .003      1.415                            4.252
2             3             3.667                   1.282        .106      -.865                            8.199
3             1             -.833                   .946         1.000     -4.176                           2.509
3             2             -3.667                  1.282        .106      -8.199                           .865

Based on estimated marginal means * The mean difference is significant at the .05 level. a Adjustment for multiple comparisons: Bonferroni.

Formal reporting:

Mauchly's test indicated that the assumption of sphericity had been violated (chi-square = 9.44, p < .05), therefore degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (epsilon = 0.53). The results show that the pain scores of the three patients differed significantly, F(1.05, 5.25) = 8.21, p < .05. Post hoc tests revealed that although the pain score of Patient2 was significantly higher than that of Patient1 (p = .003), Patient3's score was not significantly different from either of the other patients (both p > .05).

Two-Way Repeated Measures ANOVA


i.e. two independent variables: In a study of the best way to keep fields free of weeds for an entire growing season, a farmer treated test plots in 10 fields with either five different concentrations of weedkiller (independent variable 1) or five different-length blasts with a flamethrower (independent variable 2). At the end of the growing season, the number of weeds per square metre was counted. To exclude bias (e.g. a pre-existing seedbank in the soil), the following year the farmer repeated the experiment, but this time the treatments the fields received were reversed:
Treatment:   Weedkiller (severity 1-5)   Flamethrower (severity 1-5)
Field1       10  15  18  22  37          9   13  13  18  22
Field2       10  18  10  42  60          7   14  20  21  32
Field3       7   11  28  31  56          9   13  24  30  35
Field4       9   19  36  45  60          7   14  9   20  25
Field5       15  14  29  33  37          14  13  20  22  29
Field6       14  13  26  26  49          5   12  17  16  33
Field7       9   12  19  37  48          5   15  12  17  24
Field8       9   18  22  31  39          13  13  14  17  17
Field9       12  14  24  28  53          12  13  21  19  22
Field10      7   11  21  23  45          12  14  20  21  29

SPSS Data View:

It's always a good idea to enter descriptive labels for data into the Variable View window, or the output is difficult to interpret:

Analyze: General Linear Model: Repeated Measures Define Within Subject Factors (remember, "factor" = test or treatment): Treatment, (2 treatments, weedkiller or flamethrower) (SPSS only allows 8 characters for the name) Severity (5 different severities):

Click Define and define Within Subject Variables:

As above, there are no post hoc tests for repeated measures ANOVA in SPSS, but via the Options button, we can apply a Bonferroni correction to the probability at which you accept any of the tests:
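As before, the design can be sketched outside SPSS with statsmodels' AnovaRM, which handles two within-subject factors but applies no sphericity correction (column names are mine; the loop rebuilds the long-format table from the data above):

```python
# Sketch: two-way repeated measures ANOVA with statsmodels' AnovaRM.
# 'weeds' holds one row per field: five weedkiller severities,
# then five flamethrower severities.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

weeds = [
    [10, 15, 18, 22, 37,  9, 13, 13, 18, 22],
    [10, 18, 10, 42, 60,  7, 14, 20, 21, 32],
    [ 7, 11, 28, 31, 56,  9, 13, 24, 30, 35],
    [ 9, 19, 36, 45, 60,  7, 14,  9, 20, 25],
    [15, 14, 29, 33, 37, 14, 13, 20, 22, 29],
    [14, 13, 26, 26, 49,  5, 12, 17, 16, 33],
    [ 9, 12, 19, 37, 48,  5, 15, 12, 17, 24],
    [ 9, 18, 22, 31, 39, 13, 13, 14, 17, 17],
    [12, 14, 24, 28, 53, 12, 13, 21, 19, 22],
    [ 7, 11, 21, 23, 45, 12, 14, 20, 21, 29],
]
rows = [(field, treatment, severity, weeds[field][t * 5 + severity - 1])
        for field in range(10)
        for t, treatment in enumerate(['weedkiller', 'flamethrower'])
        for severity in range(1, 6)]
df = pd.DataFrame(rows, columns=['field', 'treatment', 'severity', 'count'])
print(AnovaRM(df, depvar='count', subject='field',
              within=['treatment', 'severity']).fit())
# Sphericity-assumed results: treatment F(1, 9) = 34.08,
# severity F(4, 36) = 83.49, interaction F(4, 36) = 20.73
```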

Output:
Mauchly's Test of Sphericity(b)
Measure: MEASURE_1

Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.
treatmen                 1.000         .000                 0    .
severity                 .092          17.685               9    .043
treatmen * severity      .425          6.350                9    .712

Epsilon
Within Subjects Effect   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
treatmen                 1.000                1.000         1.000
severity                 .552                 .740          .250
treatmen * severity      .747                 1.000         .250

The outcome of Mauchly's test is significant (p < .05) for the severity of treatment, so we need to correct the F-values for this, but not for the treatments themselves.
Tests of Within-Subjects Effects

Source                                        Type III Sum of Squares   df       Mean Square   F        Sig.
treatmen                 Sphericity Assumed   1730.560                  1        1730.560      34.078   .000
                         Greenhouse-Geisser   1730.560                  1.000    1730.560      34.078   .000
                         Huynh-Feldt          1730.560                  1.000    1730.560      34.078   .000
                         Lower-bound          1730.560                  1.000    1730.560      34.078   .000
Error(treatmen)          Sphericity Assumed   457.040                   9        50.782
                         Greenhouse-Geisser   457.040                   9.000    50.782
                         Huynh-Feldt          457.040                   9.000    50.782
                         Lower-bound          457.040                   9.000    50.782
severity                 Sphericity Assumed   9517.960                  4        2379.490      83.488   .000
                         Greenhouse-Geisser   9517.960                  2.209    4309.021      83.488   .000
                         Huynh-Feldt          9517.960                  2.958    3217.666      83.488   .000
                         Lower-bound          9517.960                  1.000    9517.960      83.488   .000
Error(severity)          Sphericity Assumed   1026.040                  36       28.501
                         Greenhouse-Geisser   1026.040                  19.880   51.613
                         Huynh-Feldt          1026.040                  26.622   38.541
                         Lower-bound          1026.040                  9.000    114.004
treatmen * severity      Sphericity Assumed   1495.240                  4        373.810       20.730   .000
                         Greenhouse-Geisser   1495.240                  2.989    500.205       20.730   .000
                         Huynh-Feldt          1495.240                  4.000    373.810       20.730   .000
                         Lower-bound          1495.240                  1.000    1495.240      20.730   .001
Error(treatmen*severity) Sphericity Assumed   649.160                   36       18.032
                         Greenhouse-Geisser   649.160                   26.903   24.129
                         Huynh-Feldt          649.160                   36.000   18.032
                         Lower-bound          649.160                   9.000    72.129

Since there was no violation of sphericity for the treatment factor, we can look at the comparison of the two treatments without any correction. The significance value (.000) shows that there was a significant difference between the two treatments; with only two treatments, no post hoc test is needed for this factor.

The output also tells us the effect of the severity of treatments, but remember there was a violation of sphericity here, so we must look at the corrected F-ratios. All of the corrected values are highly significant and so we can use the Greenhouse-Geisser corrected values as these are the most conservative.
Pairwise Comparisons

(I) severity   (J) severity   Mean Difference (I-J)   Std. Error   Sig.(a)   95% CI for Difference(a) Lower   Upper
1              2              -4.200(*)               .895         .011      -7.502                           -.898
1              3              -10.400(*)              1.190        .000      -14.790                          -6.010
1              4              -16.200(*)              1.764        .000      -22.709                          -9.691
1              5              -27.850(*)              2.398        .000      -36.698                          -19.002
2              1              4.200(*)                .895         .011      .898                             7.502
2              3              -6.200(*)               1.521        .028      -11.810                          -.590
2              4              -12.000(*)              1.280        .000      -16.723                          -7.277
2              5              -23.650(*)              2.045        .000      -31.197                          -16.103
3              1              10.400(*)               1.190        .000      6.010                            14.790
3              2              6.200(*)                1.521        .028      .590                             11.810
3              4              -5.800                  1.690        .075      -12.036                          .436
3              5              -17.450(*)              2.006        .000      -24.852                          -10.048
4              1              16.200(*)               1.764        .000      9.691                            22.709
4              2              12.000(*)               1.280        .000      7.277                            16.723
4              3              5.800                   1.690        .075      -.436                            12.036
4              5              -11.650(*)              1.551        .000      -17.373                          -5.927
5              1              27.850(*)               2.398        .000      19.002                           36.698
5              2              23.650(*)               2.045        .000      16.103                           31.197
5              3              17.450(*)               2.006        .000      10.048                           24.852
5              4              11.650(*)               1.551        .000      5.927                            17.373

* The mean difference is significant at the .05 level.
a Adjustment for multiple comparisons: Bonferroni.

This shows that there was only one pair of severity levels for which there was no significant difference: levels 3 and 4 (p = .075). The differences for all the other pairs are significant. So both factors matter to weed control: the type of treatment makes a difference (the flamethrower plots ended up with fewer weeds), and so does the severity, i.e. how much weedkiller or how long a burst of flame is used. Formal report: There was a significant main effect of the type of treatment, F(1, 9) = 34.08, p < .001. There was a significant main effect of the severity of treatment, F(2.21, 19.88) = 83.49, p < .001.

MicrobiologyBytes 2009.
