
Introduction to Nonparametric Statistics

Craig L. Scanlan, EdD, RRT


Parametric statistics assume (1) that the distribution characteristics of a sample's population are
known (e.g. the mean, standard deviation, normality) and (2) that the data being analyzed are at
the interval or ratio level. Frequently, however, these assumptions cannot be met. Commonly,
this occurs with nominal- or ordinal-level data (for which there are no measures of means or
standard deviations). Alternatively, continuous data may be so severely skewed from normal that they cannot be analyzed using regular parametric methods.

In these cases we cannot perform analyses based on means or standard deviations. Instead, we
must use nonparametric methods. Unlike their parametric counterparts, nonparametric tests make no assumptions about the distribution of the data, nor do they rely on estimates of population parameters such as the mean to describe a variable's distribution. For this reason, nonparametric tests are often called 'distribution-free' or 'parameter-free' statistics.

Given that nonparametric methods make less stringent demands on the data, one might wonder
why they are not used more often. There are several reasons. First, nonparametric statistics cannot provide definitive estimates of the size of differences between populations. A nonparametric test may tell you that two interventions differ, but it cannot provide a confidence interval for the difference or even a simple mean difference between the two.

Second, nonparametric procedures discard information. For example, if we convert severely skewed interval data into ranks, we discard the actual values and retain only their order. Because this information is discarded, nonparametric tests are less powerful (more prone to Type II errors) than parametric methods. This also means that nonparametric tests typically require comparatively larger sample sizes to demonstrate an effect when one is present.

Last, there are certain types of information that only parametric statistical tests can provide. A
good example is independent variable interaction, as provided by factorial analysis of variance.
There is simply no equivalent nonparametric method for analyzing such interactions.

For these reasons, you will see nonparametric analysis used primarily on an as-needed basis,
either (1) to analyze nominal or ordinal data or (2) to substitute for parametric tests when their
assumptions are grossly violated, e.g., when a distribution is severely skewed. Discussion here
will be limited to the analysis of nominal or ordinal data.

Nominal (Categorical) Data Analysis

We previously have learned that the Pearson product-moment correlation coefficient (r) is
commonly used to assess the relationship between two continuous variables. If instead the two
variables are measured at the nominal level (categorical in nature), we assess their relationship
by crosstabulating the data in a contingency table. A contingency table is a two-dimensional
(rows x columns) table formed by 'cross-classifying' subjects or events on two categorical
variables. One variable's categories define the rows while the other variable's categories define
the columns. The intersection (crosstabulation) of each row and column forms a cell, which displays the count (frequency) of cases classified as being in the applicable category of both
variables. Below is a simple example of a hypothetical contingency table that crosstabulates
patient gender against survival of chest trauma:

                  Outcome
           Survives    Dies    Total
Male           34        16       50
Female          7        43       50
Total          41        59      100

Testing for Independence (Chi-square and Related Tests)


Based on simple probability, we can easily compute the expected values for each cell, i.e., the
number of cases we would expect based on their total distribution in the sample. For example,
given that the sample contains exactly 50% male and 50% female, were there no relationship
between gender and outcome (the null hypothesis of independence), we would expect exactly
half of those surviving (41) to be male, i.e., 41/2 = 20.5.* Similar expected values can be
computed for all cells in the table.

The greater the difference between the observed (O) and expected (E) cell counts, the less likely
that the null hypothesis of independence holds true, i.e., the stronger the evidence that the two
variables are related. In our example, the large difference between the observed (O = 34) and
expected (E = 20.5) cell counts for the Male/Survives cell suggests that being male is associated
with greater likelihood of survival.

To determine whether the row and column categories for the table as a whole are independent of each other, we compute the Chi-square statistic (χ²):

χ² = ∑ [(O − E)² / E]

where O = the observed frequency and E = the expected frequency. As the formula indicates, one first computes the difference between the observed and expected frequencies in each cell, squares this difference, and then divides the squared difference by that cell's expected frequency. These values are then summed across all cells (the ∑ symbol), yielding the value of chi-square (χ²). In our example χ² = 30.14.

The resulting χ² statistic is then compared to a critical value obtained from a Chi-square distribution table, based on the table's degrees of freedom, (rows − 1) × (columns − 1), and the chosen significance level. If the computed χ² statistic is less than this critical value, we fail to reject the null hypothesis and conclude that the data provide no evidence of an association, i.e., that the variable categories are independent of each other. If, on the other hand, the computed

*
The actual formula for computing the expected count (E) in any contingency table cell is:
E = (row total x column total)/grand total
For the Male/Survives cell E = (50 x 41) / 100 = 20.5

χ2 statistic exceeds the critical value, then we reject the null hypothesis and conclude that the
variable categories are indeed related.

In our example the table has (2 − 1) × (2 − 1) = 1 degree of freedom, for which the critical value of χ² at the 0.05 significance level is 3.84. Since our computed χ² of 30.14 clearly exceeds this critical value, we can conclude that the variable categories are indeed related, i.e., that gender is associated with survival after chest trauma (hypothetical example).
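For readers who want to verify such a calculation by computer, here is a minimal sketch in Python, assuming the NumPy and SciPy libraries are available; the table values are the hypothetical data above and the variable names are illustrative only:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical gender-by-survival table from the text
    # rows: Male, Female; columns: Survives, Dies
    observed = np.array([[34, 16],
                         [ 7, 43]])

    # correction=False gives the uncorrected Pearson chi-square;
    # by default SciPy applies Yates' continuity correction to 2 x 2 tables
    chi2, p, dof, expected = chi2_contingency(observed, correction=False)

    print(expected)   # expected counts, e.g., 20.5 for the Male/Survives cell
    print(chi2, p)    # approximately 30.14 with a very small p-value

Because the function evaluates the statistic against the Chi-square distribution internally, it returns a p-value directly rather than requiring a table of critical values.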

If the minimum expected count for any cell in a contingency table is less than 5, the resulting χ² statistic may not be accurate. In this case an alternative is needed. The alternative to the χ² test in such situations is Fisher's Exact Test. Most authors recommend using Fisher's Exact Test instead of χ² whenever one or more of the expected cell counts is less than 5 or when the row or column totals are very uneven.
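As a rough sketch (again Python with SciPy assumed, and with a made-up 2 x 2 table whose small counts would make the χ² approximation questionable), Fisher's Exact Test can be run as follows:

    from scipy.stats import fisher_exact

    # Hypothetical small table: rows are groups, columns are outcomes
    table = [[8, 2],
             [1, 5]]

    odds_ratio, p_value = fisher_exact(table, alternative='two-sided')
    print(odds_ratio, p_value)   # exact p-value, valid even for small counts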

It is important to note that both χ2 and Fisher's Exact Test are nondirectional (symmetrical) tests,
i.e., they make no assumptions as to directionality or cause and effect. If one is assessing the
relationship between cause and effect, other nonparametric tests would need to be considered.

Testing for the Strength of Categorical Relationships


χ2 and Fisher's Exact Test only test whether or not there is a relationship between categorical
variables. To test the strength of such relationships we use correlation-like measures such as the
Contingency Coefficient, the Phi coefficient or Cramer's V. These coefficients can be thought of
as Pearson product-moment correlations for categorical variables. However, unlike the Pearson r,
which can assume negative values, these coefficients range only from 0 to +1 (you cannot have a 'negative' relationship between categorical variables).

The contingency coefficient (CC) is computed as follows:

CC = √[χ² / (χ² + N)]

where χ² = the Chi-square value and N = the sample size. Unfortunately, the maximum value of the contingency coefficient varies with table size (being larger for larger tables). For this reason, it is difficult to compare the strength of association across tables of different sizes using this coefficient.

The Phi coefficient (φ) is a measure of nominal association applicable only to 2 x 2 contingency tables. It is calculated using the following formula:

φ = √(χ² / N)

In our example, φ = √(30.14/100) ≈ 0.55, a moderately strong association.

If we are conducting crosstabulation on contingency tables larger than 2 x 2, Cramer's V is the nominal association measure of choice. The formula for Cramer's V is:

V = √[χ² / (N(k − 1))]

where N is the total number of cases and k is the lesser of the number of rows or columns. Because in 2 x 2 tables k = 2 and k − 1 = 1, Cramer's V equals Phi for 2 x 2 analyses.
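The following is a brief sketch of how these three coefficients might be computed from the chi-square value, using Python with NumPy/SciPy assumed and the hypothetical table from earlier; the formulas are those given above:

    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[34, 16],
                         [ 7, 43]])
    chi2, _, _, _ = chi2_contingency(observed, correction=False)

    n = observed.sum()            # total number of cases (100)
    k = min(observed.shape)       # lesser of the number of rows or columns (2)

    phi = np.sqrt(chi2 / n)                      # about 0.55 for this table
    cc = np.sqrt(chi2 / (chi2 + n))              # contingency coefficient
    cramers_v = np.sqrt(chi2 / (n * (k - 1)))    # equals phi for a 2 x 2 table

    print(phi, cc, cramers_v)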

Ordinal (Ranked) Data Analysis

Testing for the Strength of Ordinal (Ranked) Relationships


As with continuous and nominal data, measures exist to quantify the strength of association between variables measured at the ordinal level. The two most common ordinal measures of association are Spearman's rho (ρ) and Kendall's rank order correlation coefficient, or Kendall's tau (τ). Both Spearman's rho and Kendall's tau require that the two variables, X and Y, consist of paired observations, with both variables measured at least at the ordinal level.* Like the parametric Pearson product-moment correlation coefficient, both measures range between -1.0 and +1.0, with a positive correlation indicating that the ranks increase together and a negative correlation indicating that as the rank of one variable increases the rank of the other decreases.

Spearman's rho. In principle, Spearman's rho is simply a special case of the Pearson product-
moment coefficient in which the data are converted to ranks before calculating the coefficient. In
practice, however, a simpler procedure is normally used to calculate ρ. The raw scores are
converted to ranks, and the differences D between the ranks of each observation on the two
variables are calculated. ρ is then computed as:

ρ = 1 − (6∑D²) / [N(N² − 1)]

where:
D = the difference between the ranks of corresponding values of X and Y, and
N = the number of pairs of values.
As an example, suppose we rank a group of eight people by height and by weight (here person A
is tallest and third-heaviest, and so on):

Case A B C D E F G H

Rank by Height 1 2 3 4 5 6 7 8

Rank by Weight 3 4 1 2 5 7 8 6

The differences between the ranks for the 8 subjects (height rank – weight rank) are:
-2, -2, 2, 2, 0, -1, -1, 2
Squaring and then summing these values:

*
If the data are at the interval or ratio level, nonparametric correlation tests like Spearman's rho and Kendall's tau
simply replace these data with their ranks.

∑D² = 4 + 4 + 4 + 4 + 0 + 1 + 1 + 4 = 22
Computing the denominator:
N(N² − 1) = 8(8² − 1) = 8(64 − 1) = 504

And finally, Spearman's ρ:

ρ = 1 − [(6 × 22) / 504] = 1 − (132/504) = 0.738

Like a Pearson r, a Spearman's rho (ρ) of 0.738 would be considered a moderately strong
positive correlation, in this case indicating that as a person's height rank increases, so too does
their weight rank.
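The same result can be obtained in a line or two of Python (a sketch assuming SciPy is available; the rank data are those from the example):

    from scipy.stats import spearmanr

    height_rank = [1, 2, 3, 4, 5, 6, 7, 8]
    weight_rank = [3, 4, 1, 2, 5, 7, 8, 6]

    rho, p_value = spearmanr(height_rank, weight_rank)
    print(rho)   # approximately 0.738, matching the hand calculation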
Kendall's tau. An alternative measure used to test for the strength of a relationship between
ordinal (ranked) variables is Kendall's rank order correlation coefficient or Kendall's tau (τ). The
main advantage of using Kendall's tau over Spearman's rho is that one can interpret its value as a
direct measure of the probabilities of observing concordant and discordant pairs.
As long as the cases are first sorted in order on one of the variables, Kendall's tau can be computed using the following formula:

τ = [2P / (½ n(n − 1))] − 1

where P is the sum, over all cases, of the number of cases ranked after the given case by both rankings, and n is the number of paired observations.
Using the same data as we employed to compute Spearman's rho, we note that the paired
observations are sorted in order of height, so we will compute P based on the weight data. In the
Weight row of this table, the first entry, 3, has five higher ranks to the right of it; so its
contribution to P is 5. Moving to the second entry, 4, we see that there are four higher ranks to
the right of it and its contribution to P is 4. Continuing this way, we find that

P = 5 + 4 + 5 + 4 + 3 + 1 + 0 + 0 = 22
And thus
2P = 2 × 22 = 44
Computing the denominator:
½ n(n − 1) = 4(8 − 1) = 4 × 7 = 28
And finally computing Kendall's tau:

τ = (44/28) − 1 = 1.571 − 1 ≈ 0.57
Again, we see a positive correlation between the height and weight ranks, albeit less strong than
that revealed by Spearman's rho.
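A short Python sketch (SciPy assumed) that both counts P directly and checks the result against the library routine; note that SciPy's kendalltau computes the tau-b variant, which coincides with the tau described here when there are no tied ranks:

    from scipy.stats import kendalltau

    height_rank = [1, 2, 3, 4, 5, 6, 7, 8]
    weight_rank = [3, 4, 1, 2, 5, 7, 8, 6]

    # Count P by hand: for each weight rank, count the higher ranks to its right
    P = sum(sum(1 for later in weight_rank[i + 1:] if later > weight_rank[i])
            for i in range(len(weight_rank)))
    print(P)     # 22, as in the worked example

    tau, p_value = kendalltau(height_rank, weight_rank)
    print(tau)   # approximately 0.571, matching the hand calculation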

Testing for Group Differences on Ordinal (Ranked) Data
There are many times when researchers want to compare two or more groups on an outcome that
is measured at the ordinal level (as opposed to interval or ratio data). Alternatively, interval- or ratio-level measurements on groups may be so skewed as to make regular parametric analysis inappropriate. In these cases, comparable nonparametric approaches to traditional t-testing or
analysis of variance (ANOVA) are needed. The following table summarizes the nonparametric
equivalents to traditional t-testing or ANOVA:
Purpose or Need | Parametric Approach | Nonparametric Approach
Differences between 2 independent groups | Independent t-test | Mann-Whitney U test (aka Wilcoxon rank-sum test)
Differences between 2 related groups (repeated measures) | Paired (dependent) t-test | Wilcoxon signed rank test for paired data
Differences between 3 or more independent groups | One-way ANOVA (F-test) | Kruskal-Wallis ANOVA
Differences between 3 or more related groups (repeated measures) | Repeated measures ANOVA | Friedman two-way ANOVA
Adapted from: Dallal, G.E. Nonparametric statistics. In The Little Handbook of Statistical Practice available at:
http://www.tufts.edu/~gdallal/LHSP.HTM

Comparing Two Groups by Ranks – the Mann-Whitney U Test. The Mann-Whitney U test (also
known as the Wilcoxon Rank Sum Test) is a nonparametric test used to determine whether two
samples of ordinal/ranked data differ.* It is the nonparametric equivalent to conducting an
independent t-test comparing two groups on a normally distributed continuous variable.

The Mann-Whitney U test ranks all the cases from both groups together, from the lowest to the highest value. Then a mean rank, sum of ranks, and 'U' score are computed for each group.

Two U scores are computed: U1 and U2. U1 is defined as the number of times that a score from
group 1 is lower in rank than a score from group 2. Likewise U2 is defined as the number of
times that a score from group 2 is lower in rank than a score from group 1. U1 and U2 are
computed as follows:

U1 = n1n2 + (n1(n1 + 1))/2 - R1


U2 = n1n2 + (n2(n2 + 1))/2 - R2

where:

n1 = number of observations in group 1


n2 = number of observations in group 2
R1 = sum of ranks assigned to group 1
R2 = sum of ranks assigned to group 2

*
If the data are at the interval or ratio level, nonparametric tests like the Mann-Whitney U simply replace these data
with their ranks. However, if the sample data are continuous and normally distributed, then nonparametric tests like
the Mann-Whitney U Test should not be employed since they are less powerful than their parametric equivalents and
thus more likely to miss a true difference between groups.

The Mann-Whitney U statistic is defined as the smaller of U1 or U2. The Wilcoxon W statistic
(Wilcoxon rank-sum test) is simply the smaller of the two groups’ Sums of Ranks. Since the
sampling distributions for both the U and W statistics approach that of a normal curve (as long as
N > 20), we can use a simple Z-score to judge the significance of group differences in ranks. If
the rank distributions are identical to one another, then the Z-score will equal 0. Positive Z-
scores indicate that the sums of the ranks of group 2 are greater than that of group 1, while
negative Z-scores indicate the opposite, i.e., that the sums of the ranks of group 2 are less than
that of group 1. At the conventional 0.05 significance level, any Z-score whose absolute value exceeds 1.96 indicates a statistically significant difference in the distribution of ranks.
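The following sketch (Python with SciPy assumed, using made-up ordinal scores for two small groups) shows both the hand computation of U1 and U2 described above and SciPy's built-in test. Depending on the SciPy version, the reported U may be the statistic for the first sample rather than the smaller of the two, but the p-value is the same either way:

    from scipy.stats import rankdata, mannwhitneyu

    # Hypothetical ordinal scores for two independent groups
    group1 = [3, 5, 2, 6, 4, 7]
    group2 = [8, 9, 6, 10, 7, 9]

    # Rank all cases together, then split the ranks back into the two groups
    ranks = rankdata(group1 + group2)
    r1 = ranks[:len(group1)].sum()
    r2 = ranks[len(group1):].sum()

    n1, n2 = len(group1), len(group2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    print(min(u1, u2))   # the Mann-Whitney U statistic as defined above

    u_stat, p_value = mannwhitneyu(group1, group2, alternative='two-sided')
    print(u_stat, p_value)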

Note that if the observations are paired instead of independent of each other (e.g., a pre/post
measure conducted on the same subjects), then we use the Wilcoxon signed rank test for paired
data (not to be confused with the Wilcoxon rank-sum test described above) instead of the Mann-
Whitney U test.
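A minimal sketch of the paired case (Python with SciPy assumed, using made-up pre/post scores for the same eight subjects):

    from scipy.stats import wilcoxon

    # Hypothetical pre/post ordinal scores for the same subjects
    pre  = [12, 15,  9, 14, 11, 13, 10, 16]
    post = [14, 18, 10, 13, 15, 17, 12, 19]

    stat, p_value = wilcoxon(pre, post)   # signed-rank test on the paired differences
    print(stat, p_value)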

Comparing More than Two Groups by Ranks – the Kruskal-Wallis Test. The Kruskal-Wallis test is a nonparametric generalization of the Wilcoxon rank-sum test used to determine whether more than two groups of ordinal/ranked data differ.* It is the nonparametric equivalent to conducting a one-way ANOVA comparing multiple groups on a normally distributed continuous variable. The Kruskal-Wallis statistic, H, is computed as follows, with the result compared to a critical value from the Chi-square distribution:

H = [12 / (N(N + 1))] × ∑(Ri² / ni) − 3(N + 1)

where:

k = number of samples (groups)


ni = number of observations for the i-th sample or group
N = total number of observations (sum of all the ni)
Ri = sum of ranks for group i

As an example, consider the following comparison of four diet plans (labeled as plan A, B, C &
D) enrolling a total of 19 patients. The observations represent kilograms of weight lost over a 3-month period.

*
If the data are at the interval or ratio level, nonparametric tests like the Kruskal-Wallis simply replace these data with their ranks. However, if the sample data are continuous and normally distributed, then nonparametric tests like the Kruskal-Wallis should not be employed since they are less powerful than their parametric equivalents and thus more likely to miss a true difference between groups.

  A      B      C      D
 4.2    3.3    1.9    3.5
 4.6    2.4    2.4    3.1
 3.9    2.6    2.1    3.7
 4.0    3.8    2.7    4.1
        2.8    1.8    4.4

The first step in conducting a Kruskal-Wallis analysis is to rank order ALL the observations
from lowest (1) to highest (19) and then sum the ranks for each plan:

                 A       B       C       D
                17      10       2      11
                19     4.5     4.5       9
                14       6       3      12
                15      13       7      16
                         8       1      18
Sum of Ranks    65    41.5    17.5      66

Based on the sum of ranks for each group, we apply the computational formula for H:

H = [12 / (19 × 20)] × [(65² / 4) + (41.5² / 5) + (17.5² / 5) + (66² / 5)] − 3(19 + 1)
  = (12 / 380) × (1056.25 + 344.45 + 61.25 + 871.20) − 60
  = 73.678 − 60 = 13.678

Last, using a Chi-square table, we determine that the critical value at the 0.05 level for three degrees of freedom (degrees of freedom = # groups − 1) is 7.815. Since 13.678 is greater than this critical value, we reject the null hypothesis and conclude that the rankings of weight loss do differ among the four diet plans. Inspecting the sums of ranks suggests that plans A and D produce the greatest weight loss (and are nearly equivalent), whereas plan C ranks lowest.
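For comparison, here is a sketch of the same analysis in Python (SciPy assumed). SciPy applies a correction for the tied ranks (the two 2.4 values), so its H statistic comes out slightly above the uncorrected hand-computed value of 13.678:

    from scipy.stats import kruskal

    # Kilograms of weight lost under the four hypothetical diet plans
    plan_a = [4.2, 4.6, 3.9, 4.0]
    plan_b = [3.3, 2.4, 2.6, 3.8, 2.8]
    plan_c = [1.9, 2.4, 2.1, 2.7, 1.8]
    plan_d = [3.5, 3.1, 3.7, 4.1, 4.4]

    h_stat, p_value = kruskal(plan_a, plan_b, plan_c, plan_d)
    print(h_stat, p_value)   # H of roughly 13.7 with p well below 0.05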

Note that if the observations are repeated more than once (e.g., pre-test, post-test, other follow-
up), then we cannot use the Kruskal-Wallis test and instead must use a nonparametric alternative
to the repeated-measures ANOVA, e.g., Friedman's two-way ANOVA.
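As a rough illustration of that alternative (Python with SciPy assumed, using made-up repeated measurements on six subjects under three conditions):

    from scipy.stats import friedmanchisquare

    # Hypothetical ordinal ratings of the same six subjects on three occasions
    time1 = [3, 2, 4, 3, 2, 4]
    time2 = [4, 3, 5, 4, 3, 5]
    time3 = [2, 1, 3, 2, 1, 3]

    stat, p_value = friedmanchisquare(time1, time2, time3)
    print(stat, p_value)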

Reference Bibliography
Agresti, A. (1996). Introduction to categorical data analysis. New York: Wiley.
Altman, D.G. (1991). Comparing groups – categorical data (Chapter 10). In Practical statistics
for medical research. Boca Raton, FL: Chapman & Hall.
Becker, L.A. (1999a). Crosstabs: Measures for nominal data. University of Colorado at Colorado
Springs. Retrieved January 9, 2004 from http://web.uccs.edu/lbecker/SPSS/ctabs1.htm
Becker, L.A. (1999b). Crosstabs: Measures for ordinal data. University of Colorado at Colorado
Springs. Retrieved January 9, 2004 from http://web.uccs.edu/lbecker/SPSS80/ctabs2.htm
Becker, L.A. (1999c). Testing for differences between two groups: Nonparametric tests.
University of Colorado at Colorado Springs. Retrieved January 9, 2004 from
http://web.uccs.edu/lbecker/spss80/nonpar.htm
Connor-Linton, J. (2003). Chi-square tutorial. Georgetown University. Retrieved November 30,
2004 from http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed). New York: Wiley.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed). Boston: PWS-Kent
Dallal, G.E. (2000). Nonparametric statistics. In The little handbook of statistical practice.
Retrieved November 22, 2002 from http://www.tufts.edu/~gdallal/npar.htm
Dallal, G.E. (2000). Contingency tables. In The little handbook of statistical practice. Retrieved
November 22, 2002 from http://www.tufts.edu/~gdallal/ctab.htm
Daniel, W.W. (2004). The Chi-square distribution and the analysis of frequencies (Chapter 12).
In Biostatistics: A foundation for analysis in the health sciences, 8th ed. New York: Wiley
Daniel, W.W. (2004). Nonparametric and distribution-free statistics (Chapter 13). In
Biostatistics: A foundation for analysis in the health sciences, 8th ed. New York: Wiley
deRoche, J. (2004). Measures of association. Cape Breton University. Retrieved May 7, 2004
from http://anthrosoc.capebretonu.ca/Measures%20of%20association.doc
Field, A. (2005). Categorical data (Chapter 16). In Discovering statistics using SPSS. 2nd ed.
London: Sage Publications
Friel, C.M. (2004a). Nonparametric tests. Sam Houston State University. Retrieved February 19,
2004 from http://www.shsu.edu/~icc_cmf/cj_685/mod9.doc
Friel, C.M. (2004b). Nonparametric correlation techniques. Sam Houston State University.
Retrieved February 19, 2004 from http://www.shsu.edu/~icc_cmf/cj_685/mod12.doc
Garson, G.D. (1998a). Chi-square significance tests. In Statnotes: Topics in multivariate
analysis. Retrieved February 26, 2004 from http://www2.chass.ncsu.edu/garson/pa765/chisq.htm

Garson, G.D. (1998b). Fisher exact test of significance. In Statnotes: Topics in multivariate
analysis. Retrieved February 26, 2004 from http://www2.chass.ncsu.edu/garson/pa765/fisher.htm
Garson, G.D. (1998c). Nominal association: Phi, contingency coefficient, Tschuprow's T,
Cramer's V, lambda, uncertainty coefficient. In Statnotes: Topics in multivariate analysis.
Retrieved February 26, 2004 from http://www2.chass.ncsu.edu/garson/pa765/assocnominal.htm
Garson, G.D. (1998d). Ordinal association: gamma, Kendall's tau-b and tau-c, Somers' d. In
Statnotes: Topics in multivariate analysis. Retrieved February 26, 2004 from
http://www2.chass.ncsu.edu/garson/pa765/assocordinal.htm
Garson, G.D. (1998e). Tests for two independent samples: Mann-Whitney U, Kolmogorov-
Smirnov Z, & Moses extreme reactions tests. In Statnotes: Topics in multivariate analysis.
Retrieved May 17, 2004 from http://www2.chass.ncsu.edu/garson/pa765/mann.htm
Gibbons, J. (1993). Nonparametric measures of association. Quantitative applications in the
social sciences series. Thousand Oaks, CA: Sage Publications
Gibbons, J. (1992). Nonparametric statistics: An introduction. Quantitative applications in the
social sciences series. Thousand Oaks, CA: Sage Publications
Gibbons, J. D., & Chakraborti, S. (1992). Nonparametric statistical inference (3rd ed.). New
York: Marcel Dekker
Gore, A. P., Deshpande, J. V., & Shanubhogue, A. (1993). Statistical analysis of non-normal
data. New York: Wiley.
Hollander, M., & Wolfe, D.A. (1999). Nonparametric statistical methods (2nd ed). New York:
Wiley.
Lehmkuhl, L.D. (1996). Nonparametric statistics: Methods for analyzing data not meeting
assumptions required for the application of parametric tests. J Prosthetics Orthotics, 8, 105-113.
Michael, R.S. (2001) Crosstabulation & Chi square. Indiana University. Retrieved November 30,
2004 from http://www.indiana.edu/~educy520/sec5982/week_12/chi_sq_summary011020.pdf
Noether, G. E. (1991). Introduction to statistics: the nonparametric way. New York: Springer-
Verlag.
Norman, G.R. & Streiner, D.L. (2000). Test of significance for categorical frequency data
(Chapter 20). In Biostatistics: The bare essentials (2nd ed). Hamilton, Ontario: B.C. Decker
Norman, G.R. & Streiner, D.L. (2000). Measures of association for categorical data (Chapter
21). In Biostatistics: The bare essentials (2nd ed). Hamilton, Ontario: B.C. Decker
Norman, G.R. & Streiner, D.L. (2000). Tests of significance for ranked data (Chapter 22). In
Biostatistics: The bare essentials (2nd ed). Hamilton, Ontario: B.C. Decker
Norman, G.R. & Streiner, D.L. (2000). Measures of association for ranked data (Chapter 23). In
Biostatistics: The bare essentials (2nd ed). Hamilton, Ontario: B.C. Decker

Pett, P. (1997). Nonparametric statistics in health care research: Statistics for small samples and
unusual distributions. Thousand Oaks, CA: Sage Publications
Reynolds, H.T. (1984). Analysis of nominal data. Sage series on Quantitative Applications in the
Social Sciences. Newbury Park CA: Sage.
Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd
ed.). New York: McGraw-Hill.
Statsoft (2006). Nonparametric statistics. In Electronic Statistics Textbook. Tulsa, OK:
StatSoft, Inc. Retrieved April 22, 2004 from http://www.statsoft.com/textbook/stnonpar.html
van Belle, G., Fisher, L.D., Heagerty, P.J., & Lumley, T.S. (2004). Categorical data:
Contingency tables (Chapter 7). Biostatistics: A methodology for the health sciences, 2nd ed.
New York: Wiley
van Belle, G., Fisher, L.D., Heagerty, P.J., & Lumley, T.S. (2004). Nonparametric, distribution-free
and permutation models: Robust procedures. (Chapter 8). Biostatistics: A methodology for the
health sciences, 2nd ed. New York: Wiley
Weaver, B. (2002). Nonparametric tests (Chapter 3). Northern Ontario School of Medicine.
Retrieved August 2, 2006 from http://www.angelfire.com/wv/bwhomedir/notes/nonpar.pdf
Weaver, B. (2005). Analysis of categorical data (Chapter 2). Northern Ontario School of
Medicine. Retrieved August 2, 2006 from
http://www.angelfire.com/wv/bwhomedir/notes/categorical.pdf
Williams, R. (2005). Categorical data analysis. University of Notre Dame, Department of
Sociology. Retrieved March 25, 2007 from http://www.nd.edu/~rwilliam/stats1/x51.pdf
