
Analysis of Variance (ANOVA)

Scott Harris October 2009

Learning outcomes
By the end of this session you should be able to choose between, perform (using SPSS) and interpret the results from:

Analysis of Variance (ANOVA), the Kruskal-Wallis test, and adjusted ANOVA (also called the Univariate General Linear Model or multiple linear regression).

Contents
Reminder of the example dataset.
Comparison of more than 2 independent groups (parametric / non-parametric):
  Test information.
  How to in SPSS.
Adjusting for additional variables:
  How to in SPSS.
  What to do when you add a continuous predictor.
  What to do when you add 2 or more categorical predictors.
  Interpreting the output.

Example dataset: Information


CISR (Clinical Interview Schedule: Revised) data:

Measure of depression: the higher the score, the worse the depression.
A CISR value of 12 or greater is used to indicate a clinical case of depression.
3 groups of patients (each receiving a different form of treatment: GP, CMHN and CMHN problem solving).
Data collected at two time points (baseline and then a follow-up visit 6 months later).
Age at interview calculated from the 2 dates (a syntax sketch for this is shown below).
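For readers who want to reproduce that derived variable, a minimal sketch in SPSS syntax, assuming the two dates are held in variables called DOB and IntDate (both names are hypothetical; the actual names in the dataset may differ):

* Hypothetical sketch: age at interview from two date variables .
* DOB and IntDate are assumed names, not taken from the dataset .
COMPUTE AgeInt = DATEDIF(IntDate, DOB, "years").
EXECUTE.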

Example CISR dataset

Comparing more than two independent groups


Analysis of variance (ANOVA) or Kruskal Wallis test

Normally distributed data


Analysis of variance (ANOVA)

More than 2 groups?


When there are more than 2 groups that you wish to compare, t tests are no longer suitable and you should use Analysis of Variance (ANOVA) techniques instead.

Analysis of Variance (ANOVA): Hypotheses

H0: μ1 = μ2 = μ3 = ... = μn
H1 : Not all the means are the same
The null hypothesis (H0) is that all of the groups are the same. The alternative hypothesis (H1) is that they are not all the same.

SPSS One-way ANOVA

Analyze → Compare Means → One-Way ANOVA



SPSS One-way ANOVA


* One-way ANOVA .
ONEWAY B0SCORE BY TMTGR
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS
  /POSTHOC=LSD BONFERRONI ALPHA(.05).


Info: One-way ANOVA in SPSS


1) From the menus select Analyze → Compare Means → One-Way ANOVA.
2) Put the variable that you want to test into the Dependent List: box.
3) Put the categorical variable, that indicates which group the values come from, into the Factor: box.
4) Click the Options button and then tick the box for Descriptive. Click Continue.
5) Click the Post Hoc button and then tick the boxes for the post hoc tests that you would like. Click Continue.
6) Finally click OK to produce the test results or Paste to add the syntax for this into your syntax file.


SPSS One-way ANOVA: Output


Group summary statistics (Descriptives option):

Descriptives: B0SCORE
Group     N    Mean     Std. Deviation  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
GP         28  23.8571  11.01418        2.08148     19.5863       28.1280        6.00    52.00
CMHN       40  28.8750  10.27116        1.62401     25.5901       32.1599        7.00    48.00
CMHN PS    41  28.1463  10.76699        1.68152     24.7479       31.5448        4.00    57.00
Total     109  27.3119  10.75286        1.02994     25.2704       29.3534        4.00    57.00

ANOVA: B0SCORE
                Sum of Squares  df   Mean Square  F      Sig.
Between Groups     460.469        2  230.234      2.029  .137
Within Groups    12026.926      106  113.462
Total            12487.394      108

The Sig. column is the 2-sided p value, with an alternative hypothesis of non-equality of at least one group. It is non-significant (P=0.137), hence there is no significant evidence to suggest differences between the groups.


SPSS One-way ANOVA: Output


These methods use t tests to perform all pair-wise comparisons between group means. The LSD section makes no adjustment for multiple comparisons (LSD option); the Bonferroni section gives p values adjusted for multiple comparisons (Bonferroni option). The Mean Difference column is the difference between Groups I and J, with its 95% confidence interval in the final two columns.

Multiple Comparisons - Dependent Variable: B0SCORE
            (I) TMTGR  (J) TMTGR  Mean Difference (I-J)  Std. Error  Sig.   95% CI Lower  95% CI Upper
LSD         GP         CMHN       -5.01786               2.62464     .059   -10.2215        .1857
            GP         CMHN PS    -4.28920               2.61143     .103    -9.4666        .8882
            CMHN       GP          5.01786               2.62464     .059     -.1857      10.2215
            CMHN       CMHN PS      .72866               2.36725     .759    -3.9647       5.4220
            CMHN PS    GP          4.28920               2.61143     .103     -.8882       9.4666
            CMHN PS    CMHN        -.72866               2.36725     .759    -5.4220       3.9647
Bonferroni  GP         CMHN       -5.01786               2.62464     .176   -11.4025       1.3668
            GP         CMHN PS    -4.28920               2.61143     .310   -10.6417       2.0633
            CMHN       GP          5.01786               2.62464     .176    -1.3668      11.4025
            CMHN       CMHN PS      .72866               2.36725    1.000    -5.0298       6.4872
            CMHN PS    GP          4.28920               2.61143     .310    -2.0633      10.6417
            CMHN PS    CMHN        -.72866               2.36725    1.000    -6.4872       5.0298

Non-normally distributed data


Kruskal Wallis test

SPSS Kruskal Wallis test

* Kruskal-Wallis test .
NPAR TESTS
  /K-W=M6SCORE BY TMTGR(1 3)
  /MISSING ANALYSIS.

Analyze → Nonparametric Tests → K Independent Samples


Info: Kruskal Wallis test in SPSS


1) From the menus select Analyze → Nonparametric Tests → K Independent Samples.
2) Put the variable that you want to test into the Test Variable List: box.
3) Put the categorical variable, that indicates which group the values come from, into the Grouping Variable: box.
4) Click the Define Range button and then enter the numeric codes for the minimum and maximum of the groups that you want to compare. Click Continue.
5) Ensure that the Kruskal-Wallis H option is ticked in the Test Type box.
6) Finally click OK to produce the test results or Paste to add the syntax for this into your syntax file.


SPSS Kruskal Wallis test: Output


Observed mean ranks:

Ranks: M6SCORE
TMTGR     N    Mean Rank
GP         28  68.66
CMHN       40  48.24
CMHN PS    41  52.27
Total     109

Test Statistics (Kruskal-Wallis test; grouping variable: TMTGR)
             M6SCORE
Chi-Square   7.388
df           2
Asymp. Sig.  .025

Asymp. Sig. is the 2-sided p value, with an alternative hypothesis of non-equality of the groups. It is significant (P=0.025), hence there is significant evidence that at least one of the groups is different.

If you want to find out where the differences are then you need to conduct a series of pair-wise Mann-Whitney U tests (a syntax sketch is shown below).
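A minimal sketch of one such pair-wise test, assuming (as on the contrast slide later) that TMTGR is coded 1 = GP, 2 = CMHN, 3 = CMHN PS; repeat with the other pairs of codes for the remaining comparisons:

* Pair-wise Mann-Whitney U test: GP (code 1) vs CMHN (code 2) .
NPAR TESTS
  /M-W=M6SCORE BY TMTGR(1 2)
  /MISSING ANALYSIS.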

Practical Questions
Analysis of Variance

Questions 1 and 2

Practical Questions
From the course webpage download the file HbA1c.sav by clicking the right mouse button on the file name and selecting Save Target As. The dataset is pre-labelled and contains data on Blood sugar reduction for 245 patients divided into 3 groups.
1) Assuming that the outcome variable is normally distributed:

Conduct a suitable statistical test to compare the finishing HbA1c level (HBA1C_2) between all of the 3 groups. What are your conclusions from this test if you don't worry about multiple testing? What about if you do, using a Bonferroni correction?
2) Assuming that the outcome variable is NOT normally distributed:

Conduct a suitable statistical test to compare the finishing HbA1c level (HBA1C_2) between all of the 3 groups. What are your conclusions from this test?
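For reference, a sketch of syntax that would carry out these two tests. The grouping variable is assumed to be called GROUP and coded 1 to 3, and the outcome may be stored as HB1AC_2 rather than HBA1C_2 in the file, so check the actual variable names before running:

* Question 1: one-way ANOVA with LSD and Bonferroni post hoc tests .
ONEWAY HBA1C_2 BY GROUP
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS
  /POSTHOC=LSD BONFERRONI ALPHA(.05).

* Question 2: Kruskal-Wallis test .
NPAR TESTS
  /K-W=HBA1C_2 BY GROUP(1 3)
  /MISSING ANALYSIS.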

Practical Solutions
1) The ANOVA table shows that at least one of the groups is significantly different from the others (p=0.010).
Descriptives: HB1AC_2
Group      N    Mean    Std. Deviation  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
Active A    83  5.7208  1.79766         .19732      5.3283        6.1133         1.36    10.36
Active B    80  6.0132  1.60600         .17956      5.6558        6.3706         2.31     9.88
Placebo     82  6.5105  1.60229         .17694      6.1584        6.8625         3.44    11.21
Total      245  6.0806  1.69735         .10844      5.8670        6.2942         1.36    11.21

ANOVA: HB1AC_2
                Sum of Squares  df   Mean Square  F      Sig.
Between Groups    26.261          2  13.131       4.696  .010
Within Groups    676.706        242   2.796
Total            702.967        244


Practical Solutions
Looking at the individual LSD and Bonferroni-corrected pair-wise comparisons, only one contrast shows a significant difference at the 5% level: Active A vs. Placebo, with the Placebo levels higher.
Multiple Comparisons - Dependent Variable: HB1AC_2
            (I) Treatment group  (J) Treatment group  Mean Difference (I-J)  Std. Error  Sig.   95% CI Lower  95% CI Upper
LSD         Active A             Active B             -.29239                .26200      .266    -.8085        .2237
            Active A             Placebo              -.78968*               .26037      .003   -1.3026       -.2768
            Active B             Active A              .29239                .26200      .266    -.2237        .8085
            Active B             Placebo              -.49728                .26278      .060   -1.0149        .0204
            Placebo              Active A              .78968*               .26037      .003     .2768       1.3026
            Placebo              Active B              .49728                .26278      .060    -.0204       1.0149
Bonferroni  Active A             Active B             -.29239                .26200      .797    -.9240        .3392
            Active A             Placebo              -.78968*               .26037      .008   -1.4174       -.1620
            Active B             Active A              .29239                .26200      .797    -.3392        .9240
            Active B             Placebo              -.49728                .26278      .179   -1.1308        .1362
            Placebo              Active A              .78968*               .26037      .008     .1620       1.4174
            Placebo              Active B              .49728                .26278      .179    -.1362       1.1308

*. The mean difference is significant at the .05 level.


Practical Solutions
2) For the non-parametric test there is, again, only a p value to report from the test itself (although the group medians could be reported from elsewhere; the pairwise comparisons need to be done as separate Mann-Whitney U tests, as shown in Analysing Continuous Data, and CIs for these differences could be calculated from CIA).
Ranks: HB1AC_2
Treatment group   N    Mean Rank
Active A           83  108.99
Active B           80  119.30
Placebo            82  140.79
Total             245

Test Statistics (Kruskal-Wallis test; grouping variable: Treatment group)
             HB1AC_2
Chi-Square   8.631
df           2
Asymp. Sig.  .013

The Kruskal-Wallis test shows that at least one of the groups is significantly different from the others (p=0.013).

Comparing groups and adjusting for other variables


Adjusted ANOVA

Adjusted ANOVA
Sometimes you wish to look at a relationship that is more complicated than one continuous outcome with one categorical group predictor. Adjusted ANOVA allows for the addition of other covariates (predictor variables). These can be either categorical, continuous or a combination of both. The next command in SPSS is one of the most powerful. SPSS calls it a Univariate General Linear Model (GLM). It can replicate one-way ANOVA and Linear regression.

It is also equivalent to multiple regression but with a bit more flexibility.
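As an illustration of that equivalence, a hypothetical sketch of the unadjusted group comparison fitted as a linear regression, assuming dummy variables D_CMHN and D_CMHNPS are first created from TMTGR (coded 1 = GP, 2 = CMHN, 3 = CMHN PS, as on the contrast slide later); the dummy variable names are invented for illustration only:

* Hypothetical sketch: dummy-code the 3-level treatment group .
RECODE TMTGR (2=1) (ELSE=0) INTO D_CMHN.
RECODE TMTGR (3=1) (ELSE=0) INTO D_CMHNPS.
EXECUTE.

* The same comparison fitted as a linear regression; the overall F test .
* should match the one-way ANOVA of B0SCORE by TMTGR .
REGRESSION
  /DEPENDENT B0SCORE
  /METHOD=ENTER D_CMHN D_CMHNPS.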


Example 1
Replicating the one-way ANOVA

SPSS Adjusted ANOVA


The Univariate dialog has boxes for the outcome variable, the categorical predictor variables and the continuous predictor variables.

Analyze → General Linear Model → Univariate



SPSS Adjusted ANOVA

Can produce pair-wise comparisons for multiple categorical variables

The same additional options can be set as for the one-way ANOVA, with post hoc pair-wise comparisons


SPSS Adjusted ANOVA

and simple descriptive statistics available.



Info: Adjusted ANOVA in SPSS (no continuous covariates)


1) From the menus select Analyze → General Linear Model → Univariate.
2) Put the variable that you want to test into the Dependent Variable: box.
3) Put any categorical variables, that indicate which group the values come from or some other category, into the Fixed Factor(s): box.
4) Click the Options button and then tick the box for Descriptive statistics. Click Continue.
5) Click the Post Hoc button and then move over the categorical variable(s) that you would like the pairwise comparisons for. Then tick the boxes for the post hoc tests that you would like. Click Continue.
6) Finally click OK to produce the test results or Paste to add the syntax for this into your syntax file (a sketch of the pasted syntax is shown below).
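For reference, a sketch of the syntax that Paste produces for this set-up, using the CISR example with B0SCORE as the outcome and TMTGR as the factor (the exact pasted syntax may differ slightly between SPSS versions):

* Adjusted ANOVA (Univariate GLM) replicating the one-way ANOVA .
UNIANOVA B0SCORE BY TMTGR
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=TMTGR(LSD BONFERRONI)
  /PRINT=DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=TMTGR.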


SPSS Adjusted ANOVA: Output


This is the same p value as from the one-way ANOVA and it is interpreted in the same way. Notice how the row uses the variable name (important for later).


SPSS Adjusted ANOVA: Output


The same post-hoc pair-wise results as before:


Example 2
Adjusting for continuous and categorical covariates

SPSS Adjusted ANOVA


The same Univariate dialog, now with 2 categorical predictor variables in the Fixed Factor(s): box and 1 continuous predictor variable in the Covariate(s): box.

Analyze → General Linear Model → Univariate



SPSS Adjusted ANOVA

As soon as we include a continuous covariate, the Post Hoc option is no longer available and we need to use the Contrasts option instead, which isn't quite as powerful.


SPSS Adjusted ANOVA

Select the Category that you want the contrast for and then you can select the type of contrast and the reference level. Simple is the standard contrast (simple differences between levels) and the reference category is the level that all other levels of the categorical variable are compared against.


Info: Adjusted ANOVA in SPSS (inc. continuous covariates)


1) From the menus select Analyze → General Linear Model → Univariate.
2) Put the variable that you want to test into the Dependent Variable: box.
3) Put any categorical variables, that indicate which group the values come from or some other category, into the Fixed Factor(s): box.
4) Put any continuous variables into the Covariate(s): box.
5) Click the Contrasts button and set up any contrasts that you want for any categorical variables. You need to select the variable, then choose the type of contrast (generally you use Simple). Next you need to select the reference level. This can be either first or last and it will dictate the level of the category variable that all other levels will be compared against (first will compare all other levels against the first: 2nd-1st, 3rd-1st, etc.), then click Change. When you are finished click Continue.
6) Click the Options button and then tick the boxes for Descriptive statistics. Click Continue.
7) Finally click OK to produce the test results or Paste to add the syntax for this into your syntax file (a sketch of the pasted syntax is shown below).
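A sketch of the pasted syntax for this kind of model, assuming for illustration the CISR example with M6SCORE as the outcome, TMTGR as the fixed factor, AgeInt as the continuous covariate and a simple contrast against the first level of TMTGR (the choice of outcome here is an assumption, and the pasted syntax may vary by SPSS version):

* Adjusted ANOVA with a continuous covariate and a simple contrast .
* Outcome M6SCORE is assumed for illustration .
UNIANOVA M6SCORE BY TMTGR WITH AgeInt
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /CONTRAST(TMTGR)=SIMPLE(1)
  /PRINT=DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=TMTGR AgeInt.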


SPSS Adjusted ANOVA


If you have more than 1 variable in the Fixed Factor(s): box then you need to go into the Model options.


SPSS Adjusted ANOVA


The default model is Full factorial. This will include all possible interactions between factors. Generally we want to consider only main effects (at least at the start). To do this select Custom and then set Type: to Main effects and move all Factors & Covariates into the Model: box.

Info: Adjusted ANOVA in SPSS (2+ categorical covariates)


1) From the menus select Analyze → General Linear Model → Univariate.
2) Put the variable that you want to test into the Dependent Variable: box.
3) Put the 2 or more categorical variables, that indicate which group the values come from or some other category, into the Fixed Factor(s): box.
4) Click the Model button and then select Custom. Change the Type: to Main effects and then move all Factors & Covariates into the Model: box.
5) Put any continuous variables into the Covariate(s): box.
6) Set up either the Post Hoc or Contrasts options depending on whether there are any continuous covariates or not (see the previous information slides).
7) Click the Options button and then tick the boxes for Descriptive statistics. Click Continue.
8) Finally click OK to produce the test results or Paste to add the syntax for this into your syntax file (a sketch of the pasted syntax is shown below).
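A sketch of the pasted syntax for a main-effects-only model, assuming for illustration the CISR example with M6SCORE as the outcome, TMTGR and SEX as fixed factors and AgeInt as a covariate (the outcome name is an assumption); note that the /DESIGN line lists only the main effects:

* Adjusted ANOVA with 2 categorical factors and a covariate, main effects only .
UNIANOVA M6SCORE BY TMTGR SEX WITH AgeInt
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /CONTRAST(TMTGR)=SIMPLE(1)
  /PRINT=DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=TMTGR SEX AgeInt.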


SPSS Adjusted ANOVA: Output


Descriptive statistics separated by all combinations of the factor variables.

This is now the p-value for the effect of TMTGR having adjusted for SEX and AgeInt. So having taken into account variability due to SEX and AgeInt there is no statistically significant difference between the treatment groups (p=0.121).
Similar statements can be made regarding the other variables in the model, i.e. having adjusted for TMTGR and AgeInt there is no statistically significant difference between the 2 sexes (p=0.261).

SPSS Adjusted ANOVA: Output


The Contrast results (interpretation is the same as the previous slide):
The TMTGR variable had 3 levels here: 1 = GP, 2 = CMHN, 3 = CMHN PS. By selecting the first level of the factor as the reference category the contrast will produce:
CMHN - GP (2 - 1)
CMHN PS - GP (3 - 1)
For each contrast the output gives the difference between the levels (e.g. between CMHN PS and GP), its 95% confidence interval and the P-value.


Practical Questions
Analysis of Variance

Question 3

Practical Questions
3) Using an Adjusted ANOVA with the final HbA1c level (HBA1C_2) as the outcome:

i. Replicate the model from question 1.
ii. Add the baseline level of HbA1c (HBA1C_1) in as a covariate. How does this affect the results?
iii. Add Gender to the model from part (ii). Look at just the main effects of the variables rather than any interactions. Does this change your results? Do you think Gender should be in the model?
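For reference, a sketch of syntax that would fit these three models. The grouping and sex variables are assumed to be called GROUP and GENDER, and the HbA1c variables may be stored as HB1AC_1/HB1AC_2 in the file, so check the actual names before running:

* Q3 i: replicate the one-way ANOVA as a Univariate GLM .
UNIANOVA HBA1C_2 BY GROUP
  /METHOD=SSTYPE(3)
  /POSTHOC=GROUP(LSD BONFERRONI)
  /PRINT=DESCRIPTIVE
  /DESIGN=GROUP.

* Q3 ii: add baseline HbA1c as a continuous covariate .
UNIANOVA HBA1C_2 BY GROUP WITH HBA1C_1
  /METHOD=SSTYPE(3)
  /CONTRAST(GROUP)=SIMPLE(1)
  /PRINT=DESCRIPTIVE
  /DESIGN=GROUP HBA1C_1.

* Q3 iii: add Gender, fitting main effects only .
UNIANOVA HBA1C_2 BY GROUP GENDER WITH HBA1C_1
  /METHOD=SSTYPE(3)
  /CONTRAST(GROUP)=SIMPLE(1)
  /PRINT=DESCRIPTIVE
  /DESIGN=GROUP GENDER HBA1C_1.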


Practical Solutions
3) i. By adding HBA1C_2 as the dependent variable and GROUP as a Fixed factor we can replicate the one-way ANOVA.

Practical Solutions

The same multiple comparisons:


Practical Solutions
3) ii. By adding HBA1C_1 to the model, GROUP has become more significant. We are explaining an additional amount of variability, hence increasing the precision.

Practical Solutions
We need to use the contrasts option when a continuous covariate is added to the model. To see the remaining contrast we need to re-run with a different reference category.

Practical Solutions
3) iii. Adding in another categorical covariate means we need to go into the Model options, or we will get interaction terms fitted by default.


Practical Solutions

The size of the effects alters slightly but the conclusion remains the same. Although Gender is not statistically significant it may still be important in the model. We can include terms if they are significant in our sample, if they are key variables that have been shown to be important in the literature, or if we want to test them for differences.

Practical Solutions


Summary
You should now be able to choose between, perform (using SPSS) and interpret the results from:
Comparing three or more independent groups:

Parametric: Analysis of Variance (ANOVA).
Non-parametric: Kruskal-Wallis test.


Comparing independent groups and adjusting for other variables:
Parametric: Adjusted ANOVA (also called Univariate GLM or multiple linear regression).

References
Parametric
Practical Statistics for Medical Research, D. Altman: Chapter 9.
Medical Statistics, B. Kirkwood, J. Sterne: Chapters 7 & 9.
An Introduction to Medical Statistics, M. Bland: Chapter 10.
Statistics for the Terrified: Testing for differences between groups.

Non-parametric
Practical Statistics for Medical Research, D. Altman: Chapter 9.
Medical Statistics, B. Kirkwood, J. Sterne: Chapter 30.
An Introduction to Medical Statistics, M. Bland: Chapter 12.
Statistics for the Terrified: Testing for differences between groups.
Practical Nonparametric Statistics (3rd ed.), W.J. Conover.

