
Australian Critical Care (2011) 24, 133–138

STATISTICS PAPER

Testing differences in proportions


Murray J. Fisher RN, ITU Cert, DipAppSc, BHSc, MHPEd, PhD a,*,
Andrea P. Marshall RN, PhD a,
Marion Mitchell RN, PhD b,c

a Sydney Nursing School (MO2), University of Sydney, NSW 2006, Australia
b Griffith University & Princess Alexandra Hospital, Australia

Received 25 August 2010; received in revised form 21 December 2010; accepted 11 January 2011

KEYWORDS
Statistics; Chi-square test; Test for goodness of fit; Test for independence; Fisher's exact test; McNemar's test; Confidence intervals

Summary
This paper is the sixth in a series of statistics articles recently published by Australian Critical Care. In this paper we explore the most commonly used statistical tests to compare groups of data at the nominal level of measurement. The chosen statistical tests are the chi-square test, chi-square test for goodness of fit, chi-square test for independence, Fisher's exact test, McNemar's test and the use of confidence intervals for proportions. Examples of how to use and interpret the tests are provided.

Crown Copyright © 2011 Published by Elsevier Australia (a division of Reed International Books Australia Pty Ltd) on behalf of Australian College of Critical Care Nurses Ltd. All rights reserved.

* Corresponding author. Tel.: +61 2 9351 0587; fax: +61 2 9351 0654.
E-mail address: murray.fisher@sydney.edu.au (M.J. Fisher).
c Address: Nurse Practice Development Unit, Princess Alexandra Hospital, Ipswich Road, Woolloongabba, Qld 4102, Australia.

1036-7314/$ – see front matter
doi:10.1016/j.aucc.2011.01.005

Introduction

This article presents the most commonly used statistical tests to compare groups of data at the nominal level of measurement. Nominal (or categorical) level of measurement is the sorting of cases into one of several categories (for example, types of religion), where the measure of dispersion is based on the count or frequency of cases in each category of measurement. Concepts around levels of measurement are explained in the second article of this series, 'Understanding Descriptive Statistics'.[1]

The number of cases in each category for a given sample is known as the frequency distribution. A common way of presenting frequency distributions for nominal data is in a table, sometimes referred to as a contingency table or cross-tabulation. An example is depicted in Table 1, which shows the frequency distribution of the incidence of diarrhoea in an intensive care unit (ICU) population over a 12-month period during which time an intervention was introduced.[2]

Table 1 Incidence of diarrhoea in intensive care following a bowel management protocol.

                             Pre-intervention, n (%)   Post-intervention, n (%)   Total
Patients with diarrhoea      138 (36)                  63 (23)                    201
Patients without diarrhoea   241 (64)                  214 (77)                   455
Total                        379                       277                        656

Specific methods of inferential statistics are required to determine differences between samples in nominal level measurement. In the example depicted in Table 1, we would be determining the difference in frequency of diarrhoea in patients

before implementation of a bowel management protocol with those after the protocol was implemented. The tests of significance for nominal data vary depending on the nature of the chosen measurements for the variables. Table 2 presents the most commonly used tests for comparing two groups using nominal measurement level.

Table 2 The tests of significance for nominal data.

Sample types                        Test of significance
One-sample case                     Chi-square goodness of fit
Two or more independent samples     Chi-square test for independence
Two dependent (paired) samples      McNemar's test for binomial distributions
Small samples                       Fisher's exact test

Chi-square test

The chi-square test compares the observed frequency distribution ($f_o$) for each category of the scale with the expected frequency distribution ($f_e$) of the null hypothesis. When using a chi-square test it is assumed that there has been random sampling; that 80% of the cells have an expected frequency of greater than five; that no cell has an observed frequency of 0; and that a large sample is used, as small sample sizes lead to a small expected frequency which causes large chi-square values.[3] A limitation of the chi-square test is that it is sensitive to either very small or large samples. Quantifying the minimum sample is difficult as it is dependent on the number of cells in the crosstab. A sample is considered too small when the above assumptions are not met. When these assumptions are not met the chi-square cannot be meaningfully interpreted.[4] The chance of finding a significant difference between samples is greater with larger samples. If you double the sample size, the chi-square statistic will double due to the large sample size rather than a strong pattern of dependence between the variables.[5]

When these assumptions are violated the results may lead to erroneous interpretation of the data; that is, the results may be considered statistically significant when in reality they may not be statistically significant. When samples are small and the assumptions for the chi-square are violated, Fisher's exact test could be used.[6]

The formula for calculating the chi-square statistic is

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $\chi^2$ is equal to the sum of the squared difference between the observed and the expected frequencies divided by the expected frequency for each cell.

The concept of degrees of freedom (df) is important and is a mathematical limitation that needs to be factored in when calculating an estimate of one statistic from an estimate of another. The df are used in conjunction with the table of critical values for chi-square. The df for a chi-square test is calculated with the following equation:

$$df = (R - 1)(C - 1)$$

where R equals the number of rows and C equals the number of columns.[3]

Chi-square test for goodness of fit

The chi-square test for goodness of fit is used for a single population and is a test used when you have one categorical variable. This test determines how well the frequency distribution from that sample fits the model distribution. Consider the data provided in the contingency table (Table 3) which reports the frequency of patients who developed diarrhoea for three different wards within a hospital.

Table 3 Frequency of diarrhoea in patients admitted to three wards.

Ward A   Ward B   Ward C   Total
30       25       40       95
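The goodness-of-fit calculation for the ward counts in Table 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' method: the function name `chi_square_goodness_of_fit` is hypothetical, and the sketch assumes the null hypothesis of equal expected frequencies, so each ward's expected count is the total divided by the number of wards (95/3 ≈ 31.67) and the df is the number of categories minus one. The critical value of 5.99 (df = 2, significance level 0.05) is taken from a standard chi-square table.

```python
def chi_square_goodness_of_fit(observed):
    """Return (statistic, df, expected) assuming equal expected frequencies.

    Hypothetical helper for illustration: the expected count in every
    category is total / number of categories, and df = categories - 1.
    """
    total = sum(observed)
    categories = len(observed)
    expected = total / categories
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, categories - 1, expected


ward_counts = [30, 25, 40]  # Ward A, Ward B, Ward C (Table 3)
chi2, df, expected = chi_square_goodness_of_fit(ward_counts)

CRITICAL_VALUE = 5.99  # chi-square table, df = 2, significance level 0.05

print(round(expected, 2), round(chi2, 2), df)  # 31.67 3.68 2
print("reject H0" if chi2 > CRITICAL_VALUE else "retain H0")  # retain H0
```

Because the statistic stays below the critical value, the sketch retains the null hypothesis that diarrhoea is equally frequent across the three wards.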
The chi-square test for goodness of fit determines difference by comparing the observed frequency distribution with the frequency distribution of the null hypothesis. The null hypothesis is that the expected frequency distribution of all wards is the same. That is, approximately 31.67 patients (95/3) in each ward would be expected to have developed diarrhoea.

$$\chi^2 = \frac{(-1.67)^2}{31.67} + \frac{(-6.67)^2}{31.67} + \frac{(8.33)^2}{31.67} = 0.09 + 1.40 + 2.19 = 3.68$$

The critical value for $\chi^2$ needs to be determined. First determine the df (see Box 1) and determine the level of significance (often set at 0.05) (please refer to the fourth article of this series, 'Statistical and clinical significance, and how to use confidence intervals to help interpret both'[7]). For a goodness-of-fit test the df is the number of categories minus one, here 3 − 1 = 2. Referring to a table listing the critical values of chi-square (available in most statistics texts) and using the calculated degrees of freedom (df = 2) and level of significance of 0.05, the critical value for $\chi^2$ is 5.99. The computed chi-square value of 3.68 is lower than the critical value of 5.99, therefore the null hypothesis is retained and we conclude that there is not a statistical difference in the distribution of the frequency of patients who developed diarrhoea for the three different wards.

Box 1
In statistics the degree of freedom is the number of values in the calculation of a statistic that are free to vary. To calculate the degree of freedom for a chi-square test on a contingency table you count the number of rows and subtract 1, and multiply by the number of columns with 1 subtracted. So for Table 1, df = (number of rows − 1) × (number of columns − 1) or (2 − 1) × (2 − 1) = 1.

Chi-square test for independence

The chi-square test for independence is also used for a single population but where there are two categorical variables. The test examines if there is a relationship between the two variables for the one sample.[3] Consider the observed frequency distribution on the difference in the incidence of diarrhoea before and after the implementation of a bowel management protocol (Table 1).

The contingency table (Table 1) demonstrates that 36.41% of the pre-intervention sample and 22.74% of the post-intervention sample experienced diarrhoea. In order to determine whether there is a statistical difference between the pre-intervention and post-intervention groups, the chi-squared test of independence is used as these are two independent samples. The chi-square statistic compares the observed frequency distribution ($f_o$), for example the frequencies that are depicted in Table 1, with the expected frequency distribution of the null hypothesis ($f_e$). The null hypothesis expresses the expected frequency for each category if there is no statistical difference between categories (see previous publication in this series[7] for further information on hypothesis testing).

In this case the null hypothesis is that there is no statistical difference between the number of patients with diarrhoea in the pre-intervention sample as compared to those in the post-intervention sample. To calculate the frequency distribution of the null hypothesis ($f_e$) the following formula is used:

$$f_e = \frac{f_c \, f_r}{n}$$

where $f_c$ is the frequency total for the column, $f_r$ is the frequency total for the row and $n$ is the total sample size. To calculate the expected frequency for each cell you simply substitute the observed frequency with the calculated expected frequency using the formula.

Table 4 Calculating the expected number of patients with diarrhoea in the pre-intervention sample for the null hypothesis.

                             Pre-intervention   Post-intervention   Total
Patients with diarrhoea      ? (fe)             63                  201 (fr)
Patients without diarrhoea   241                214                 455
Total                        379 (fc)           277                 656 (n)

Refer to Table 4 and the calculation below to determine the expected number of patients with diarrhoea in the pre-intervention sample for the
null hypothesis. In this case the $f_e$ would be:

$$f_e = \frac{379 \times 201}{656} = 116.13$$

The expected frequency distribution for the null hypothesis in this example would be calculated as depicted in Table 5.

Table 5 Expected frequency distribution for patients with and without diarrhoea at pre-intervention and post-intervention time periods.

                             Pre-intervention   Post-intervention   Total
Patients with diarrhoea      116.13             84.87               201
Patients without diarrhoea   262.87             192.13              455
Total                        379                277                 656

At a glance it would appear that in this example there is a difference between the observed frequency ($f_o$) and the expected frequency ($f_e$). Table 6 presents the difference between the observed and expected frequency for each cell.

To calculate whether there is a statistical difference the chi-square formula is used.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $\chi^2$ is equal to the sum of the squared difference between the observed and the expected frequency divided by the expected frequency for each cell. In this case the chi-square statistic is equal to:

$$\chi^2 = \frac{(21.87)^2}{116.13} + \frac{(-21.87)^2}{84.87} + \frac{(-21.87)^2}{262.87} + \frac{(21.87)^2}{192.13} = 4.12 + 5.63 + 1.82 + 2.49 = 14.06$$

The critical value for $\chi^2$ needs to be determined; first calculate the df (see Box 1) and determine the level of significance. Referring to a table listing the critical values of chi-square and using the calculated df (1) and level of significance of 0.05, the critical value for $\chi^2$ is 3.84. The chi-square value of 14.06 calculated above exceeds that of the critical value, therefore the null hypothesis is rejected and we conclude that there is a statistical difference between the number of patients with diarrhoea in the pre-intervention sample as compared to those in the post-intervention sample.

The Statistical Package for the Social Sciences (SPSS version 18) was used to examine this sample of patients with or without diarrhoea. The reported SPSS output confirms that there was a statistical difference in the incidence of diarrhoea between the pre-intervention and post-intervention samples, $\chi^2$ (1, n = 656) = 14.06, p < 0.0001. In the original study Ferrie and East[2] identified a statistical difference in the incidence of diarrhoea between the two samples (p < 0.0001), however this claim could have been strengthened by reporting the $\chi^2$ statistic.

Fisher's exact test

Fisher's exact test is used in cases where there are cells with an expected frequency ($f_e$) less than 5 and/or with small sample sizes, as Fisher's exact test has no sample size restriction.[6] The method of calculation of Fisher's exact test is different to the chi-square statistic and is calculated by determining the probability of getting the observed frequency distribution by establishing and comparing to all other possible distributions where the column and row totals remain the same as the observed distribution. In this case the null hypothesis indicates that all the cells would be close to equal. The calculation of Fisher's exact test is complex and is not available in all statistical packages but can be performed using the Statistical Package for Social Sciences.
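Both the chi-square test for independence worked through above and the enumeration idea behind Fisher's exact test can be sketched for the counts in Table 1. The code below is a standard-library illustration under stated assumptions, not the SPSS procedure used by the authors: the function names are hypothetical, and the two-sided Fisher p-value uses one common convention (summing the probabilities of all tables with the same row and column totals that are no more probable than the observed table).

```python
from math import comb


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, df) for an R x C table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def fisher_exact_two_sided(table):
    """Two-sided Fisher p-value for a 2 x 2 table (assumed convention:
    sum of all table probabilities not exceeding the observed one)."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )


# Rows: with / without diarrhoea; columns: pre- / post-intervention (Table 1).
table_1 = [[138, 63], [241, 214]]
chi2, df = chi_square_independence(table_1)
p_fisher = fisher_exact_two_sided(table_1)

print(round(expected_counts(table_1)[0][0], 2))  # 116.13
print(round(chi2, 2), df)
print(p_fisher < 0.05)
```

Without intermediate rounding the statistic comes out at roughly 14.07, which agrees with the hand calculation of 14.06 based on rounded differences. Both approaches point to the same conclusion for these data.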
Table 6 Difference between frequency observed and expected frequency.

                             Pre-intervention   Post-intervention
Patients with diarrhoea      21.87              −21.87
Patients without diarrhoea   −21.87             21.87

McNemar's test

The McNemar test compares dependent (paired or matched) samples in terms of a dichotomous variable.[4] It is the best test for comparing dichotomous variables with two dependent sample studies as opposed to the chi-square test which examines nominal level variables with two samples that are independent of each other.[4] A dichotomous variable has only two possible outcomes, for example yes or no, and it results in a binomial distribution.[8] The McNemar test may be used for pretest–posttest designs or in time series data where the same sample is tested at two or more points in time. The main assumption of the McNemar test is that the data come from two samples that are matched. This can either be as a paired sample or a before/after sample. The McNemar test is a non-parametric test and thus does not assume that the data are normally distributed.[4]

Consider the following fictitious two by two contingency table (Table 7) which shows the incidence of diarrhoea at two time periods in a sample (n = 200). The McNemar test is similar to the chi-squared test in that it examines the difference between expected and observed cell frequencies. The following formula is used to calculate the McNemar test. There is one df, which is derived from the following equation: (rows − 1) × (columns − 1) = 1.

Table 7 Contingency table of diarrhoea at two time periods (with cells named).

                 Time 1
                 No            Yes           Total
Time 2   Yes     Cell A = 40   Cell B = 67   107
         No      Cell C = 60   Cell D = 33   93
         Total   100           100           200

$$\chi^2_M = \frac{(|N_a - N_d| - 1)^2}{N_a + N_d}$$

where $N_a$ equals the frequency of observed responses in the cell marked A in Table 7 and $N_d$ equals the frequency of observed responses in the cell marked D in Table 7.

In the above example the $\chi^2_M$ is equal to:

$$\chi^2_M = \frac{(|40 - 33| - 1)^2}{40 + 33} = \frac{36}{73} = 0.49$$

With one degree of freedom and level of significance of 0.05, based on the chi-square distribution the critical value for $\chi^2_M$ is 3.84. The McNemar chi-square value is less than that of the critical value, therefore the null hypothesis ($H_0$) is retained and we conclude that there is not a statistical difference in the incidence of diarrhoea between the two time periods.

Confidence intervals for differences in proportion

Confidence intervals (CI) are now being reported along with p values in clinical studies and their use has been described in an earlier paper in this series.[5] Calculation of CI is based on the assumption that the variable is normally distributed in the population and is dependent on the level of measurement and therefore the statistical test used. The formula to construct the CI for a proportion will be available in most statistics textbooks.[9] There are also computer programmes that perform these tasks for researchers. The statistical programme will calculate the CI and the researcher selects the level of confidence (for example, 95%).

Below is an example of how CIs for nominal data may be used in determining clinical significance. In this sample the proportional difference of those with diarrhoea before the intervention compared to those with diarrhoea after the intervention is 13% (36% − 23%, Table 1). We will now calculate the CI around this sample result.

The equation for an approximate 95% confidence interval for the difference between two population proportions ($p_1 - p_2$) based on two independent samples of size $n_1$ and $n_2$ with sample proportions $p_1$ and $p_2$ is given by the following equation:

$$(p_1 - p_2) \pm 1.96 \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

Example: Using the data provided in Table 1, we will calculate the 95% CI using the equation above where $p_1$ = 36%; $p_2$ = 23%; $n_1$ = 379 and $n_2$ = 277. The figure of 1.96 indicates we are computing a CI of 95%.

$$CI = (0.36 - 0.23) \pm 1.96 \sqrt{\frac{0.36(1 - 0.36)}{379} + \frac{0.23(1 - 0.23)}{277}} = 0.13 \pm 1.96 \sqrt{0.000607916 + 0.000639350} = 0.13 \pm 1.96 \times 0.035317 = 0.13 \pm 0.069221$$

upper limit = 0.199 and lower limit = 0.061
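The interval arithmetic above can be checked with a short Python sketch. It is an illustration only: the function name `diff_proportion_ci` is hypothetical, the inputs are the rounded proportions 0.36 and 0.23 used in the text, and the default multiplier of 1.96 assumes a 95% level of confidence with a normal approximation.

```python
from math import sqrt


def diff_proportion_ci(p1, n1, p2, n2, z=1.96):
    """Approximate CI for p1 - p2 from two independent samples.

    Hypothetical helper: z = 1.96 corresponds to a 95% confidence level
    under the normal approximation described in the text.
    """
    difference = p1 - p2
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * standard_error
    return difference - margin, difference + margin


# Rounded proportions with diarrhoea and sample sizes from Table 1.
lower, upper = diff_proportion_ci(0.36, 379, 0.23, 277)
print(f"95% CI for the difference: {lower:.3f} to {upper:.3f}")
```

Expressed as percentages, the limits round to about 6% and 20% around the observed difference of 13%.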
These results indicate that the lower limit of the 95% CI is 6% and the upper limit is 20%, with the sample proportion difference at 13%. Note that CIs may not be symmetrical around the sample proportion; it just happens to be so in this instance. With a 0.05 level of significance, there is a significant result with p < 0.0001 (as reported earlier in the paper) and the CI provides additional information as it gives a range in which the population difference in proportions is likely to lie. The proportion of patients experiencing diarrhoea is somewhere between 6 and 20 percentage points lower with the intervention than without it. The clinical significance and research conclusions should be drawn from the individual context for the study.[3]

Conclusion

This paper has provided an introduction to the statistical tests commonly used to test differences in proportions for nominal level data. Chi-square tests are commonly used in health care research and where sample sizes are small, Fisher's exact test may be used. If the data from dependent, paired samples are binomial, then the McNemar test may be more appropriate. As with many other statistical tests, assessment of the critical values, p values and CI may assist the reader in determining the clinical and statistical significance of the results.

References

1. Fisher M, Marshall AP. Understanding descriptive statistics. Aust Crit Care 2009;22:93–7.
2. Ferrie S, East V. Managing diarrhoea in intensive care. Aust Crit Care 2007;20:7–13.
3. Corty EW. Using and interpreting statistics: a practical text for the health, behavioral and social sciences. St. Louis: Mosby Elsevier; 2007.
4. Argyrous G. Statistics for social research. Melbourne: Macmillan Education Australia Pty. Ltd.; 1996.
5. Smithson MJ. Statistics with confidence: an introduction for psychologists. Canberra: Sage; 2000.
6. Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1996.
7. Periera SMC, Leslie G. Hypothesis testing. Aust Crit Care 2009;22:187–91.
8. Polit DF, Beck CT. Nursing research: generating and assessing evidence for nursing practice. 8th ed. St. Louis: Mosby Elsevier; 2008.
9. Carlin JB, Doyle LW. Statistics for Clinicians 6: comparison of means and proportions using confidence intervals. J Paediatr Child Health 2001;37:583–6.

Available online at www.sciencedirect.com
