
INNOVATIVE TEACHING
2014, Volume 3, Article 1
ISSN 2165-2236
DOI 10.2466/11.IT.3.1
© Dave S. Kerby 2014
Attribution-NonCommercial-NoDerivs CC-BY-NC-ND
Received May 27, 2014
Accepted December 27, 2014
Published February 14, 2014

The simple difference formula: an approach to teaching nonparametric correlation¹
Dave S. Kerby
Oklahoma Department of Corrections

Abstract
Although teaching effect sizes is important, many statistics texts omit the topic for the Mann-Whitney U test and the Wilcoxon signed-rank test. To address this omission, this paper introduces the simple difference formula. The formula states that the correlation equals the simple difference between the proportion of favorable and unfavorable evidence; in symbols this is r = f – u. For the Mann-Whitney U, the evidence consists of pairs. For the signed-rank test, the evidence consists of rank sums. Also, the formula applies to the Binomial Effect Size Display. The formula r = f – u means that a correlation r can yield a prediction so that the proportion correct is f and the proportion incorrect is u.
CITATION
Kerby, D. S. (2014). The simple difference formula: an approach to teaching nonparametric correlation. Innovative Teaching, 3, 1.

Effect sizes are a key issue in teaching statistics in psychology. An important early statement of this fact was by Jacob Cohen in 1962. Surveying published studies, he found that researchers had little chance of rejecting the null, “unless the effect they sought was large” (Cohen, 1962, p. 151). He concluded that the low power seen in much published research was due to a lack of awareness of effect sizes. The issue of effect sizes and power was part of the debate in the 1990s over the value of the null hypothesis decision strategy, and the American Psychological Association addressed the debate with a Task Force on Statistical Inference. When the Task Force issued its report in 1999, one key recommendation echoed Cohen's concern about the importance of effect sizes. “Always present effect sizes for primary outcomes” (Wilkinson & APA Task Force, 1999, p. 599).
In the wake of this recommendation, textbooks on statistics in psychology have paid more attention to the topic of effect sizes; however, these textbooks still have gaps. For example, many popular texts do not mention effect sizes for two common nonparametric procedures: the Mann-Whitney U and the Wilcoxon signed-rank test (e.g., Kirk, 2008; Nolan & Heinzen, 2008; Kiess & Green, 2010; Aron, Aron, & Coups, 2011; Spatz, 2011; Howells, 2012; Privitera, 2012; Gravetter & Wallnau, 2013). Thus, teachers face a challenge in teaching effect sizes for at least two simple methods usually covered in the introductory course in statistics.
The goal of this paper is to present an effect size formula for two commonly used nonparametric methods. The simplicity of the formula provides insight into the meaning of rank correlation, and it permits its use in other psychology classes to convey a meaningful sense of the size of an effect.
Effect Size for the Mann-Whitney U

Frank Wilcoxon (1945) developed a now widely used nonparametric test called the
rank-sum test. The test assigns ranks to all the scores considered as one group, then
sums the ranks of each group. The null hypothesis is that the two samples come from
the same population, so any difference in the two rank sums comes only from sampling
error. The rank-sum test is often described as the nonparametric version of the t test for
two independent groups.
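The pooled ranking and rank sums just described, carried through to the U values (Equations 3 and 4) and the simple difference formula developed later in this paper, can be checked with a short Python sketch. It is a minimal illustration under stated assumptions, not the author's procedure. The function names (`pooled_ranks`, `rank_sums`, `simple_difference_r`) are hypothetical. Ties are assumed to receive average ranks (midranks), which is consistent with counting a tied pair as one half favorable and one half unfavorable. The example converts the Table 1 running times to seconds, and group membership is inferred from the ranks the text describes: the sprint group holds ranks 1, 2, 3, 5 and 6, and the control group holds ranks 4, 7, 8 and 9.

```python
from fractions import Fraction


def pooled_ranks(values):
    """Rank all values as one group (rank 1 = smallest).

    Tied values receive the mean of the positions they occupy (midranks),
    which makes each tied pair count one half toward each U.
    """
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [Fraction(0)] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = Fraction(i + j + 2, 2)  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks


def rank_sums(group1, group2):
    """Return (sum of ranks in group1, sum of ranks in group2) after pooled ranking."""
    ranks = pooled_ranks(list(group1) + list(group2))
    n1 = len(group1)
    return sum(ranks[:n1]), sum(ranks[n1:])


def simple_difference_r(low_group, high_group):
    """Return (r, U1, U2) for the hypothesis that low_group has the lower scores.

    U1 (Eq. 3) counts pairs in which the low_group member ranks higher (unfavorable);
    U2 (Eq. 4) counts pairs in which it ranks lower (favorable); r = f - u.
    """
    n1, n2 = len(low_group), len(high_group)
    sum1, sum2 = rank_sums(low_group, high_group)
    u1 = sum1 - Fraction(n1 * (n1 + 1), 2)  # observed minus minimum rank sum
    u2 = sum2 - Fraction(n2 * (n2 + 1), 2)
    pairs = n1 * n2
    f, u = u2 / pairs, u1 / pairs  # proportions of favorable and unfavorable pairs
    return f - u, u1, u2


# Table 1 running times in seconds (2:20 = 140 s); a lower time is favorable.
sprints = [140, 141, 143, 146, 148]  # Art, Bill, Carl, Ed, Frank (inferred membership)
control = [145, 149, 150, 152]       # Dan, Gary, Hal, Ira (inferred membership)

print(rank_sums(sprints, control))
r, u1, u2 = simple_difference_r(sprints, control)
print(u1, u2, r, float(r))
```

Run as written, the sketch prints rank sums of 17 (sprints) and 28 (control), then `2 18 4/5 0.8`. These are the 2 unfavorable and 18 favorable pairs and the rank-biserial r = .80 reported for Table 1. Exact fractions are used so that midranks and the proportions f and u carry no rounding error.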
A mathematically equivalent version of the rank-sum test is the Mann-Whitney U test (Mann & Whitney, 1947). In their original paper, Mann and Whitney defined U as a count of the number of times that a y score (a score from Group 1) precedes in rank order an x score (a score from Group 2). They illustrated the idea with an example of a drug study, with rats assigned to either a treatment group or to a control group; the hypothesis was that the rats given the drug would live longer. U is computed twice, once for each group; and the test statistic is the smaller of the two.

Ammons Scientific, www.AmmonsScientific.com. Address correspondence to Dave S. Kerby, Kerby Behavioral Development, 2205 Cardinal Lane, McAlester OK 74464 or e-mail (Dave.S.Kerby@gmail.com).

The value of each U can be computed from the rank sums. For each group, U is equal to the observed rank sum, ΣR, minus the minimum value that the rank sum could be for the group size, ΣRmin. The formulas for U can be expressed most simply as follows:
$$U_1 = \Sigma R_1 - \Sigma R_{\min,1} \qquad (1)$$
$$U_2 = \Sigma R_2 - \Sigma R_{\min,2} \qquad (2)$$
Replacing ΣRmin with an expression using only the number in each group, another still quite simple way to compute U is with the formulas below:
$$U_1 = \Sigma R_1 - \frac{n_1(n_1 + 1)}{2} \qquad (3)$$
$$U_2 = \Sigma R_2 - \frac{n_2(n_2 + 1)}{2} \qquad (4)$$
Teachers of statistics can impart to students some insight into the meaning of this formula by noting that U is zero when one group has all the smallest ranks. To illustrate, consider the rat example used by Mann and Whitney (1947), and imagine five rats in the control group and five in the treatment group. Suppose that the five rats in the control group have the shortest lives, and so have ranks one (shortest life) through five (fifth shortest life). The actual sum of ranks for the control group is ΣR = 1 + 2 + 3 + 4 + 5 = 15. This is also the minimum value, so U for this group is 15 minus 15, which equals zero, meaning that none of the rats in the control group lived longer than a rat in the treatment group.
Favorable and Unfavorable Pairs
A helpful term for teaching students the meaning of U is the word pair. Consider a study with human participants on learning vocabulary: a pair exists when a person in the experimental group is compared to a person in the control group.
When a pair is formed, there are three possible outcomes. One outcome is that the pair may support the hypothesis; using the vocabulary example, the student in the experimental group learned more words. The teaching approach introduced in this paper describes such a pair with the term favorable, because the data are favorable to the hypothesis. The second outcome is that the pair may not support the hypothesis; using the vocabulary example, the student in the experimental group learned fewer words. The teaching approach introduced in this paper describes such a pair with the term unfavorable, because the data are unfavorable to the hypothesis. Finally, the pair may have the same score, in which case they are tied.
There are two reasons to use the terms favorable and unfavorable when teaching the Mann-Whitney test to students. First, the terms are informative. To say that an x score precedes in rank order a y score is accurate, but it is more informative to say that the pair is favorable to the hypothesis. The second reason is that the terms are general, and they can be used in other contexts, such as with the Wilcoxon signed-rank test, as will be shown below. To use the Mann and Whitney (1947) example, if a rat in the treatment group does live longer than a rat in the control group, then these two rats are said to form a favorable pair.
The Definition of U
Finding the test statistic U requires two steps. First, compute the number of favorable and unfavorable pairs; or what is the same thing, compute U1 and U2, as defined in Equations 1 and 2. Second, select the smaller of the two numbers; this smaller number is the test statistic U.
An easy way to teach these steps to students is with a structured data table, as shown in Table 1. The first column lists the name or ID of each participant, as a reminder that each person has one score. The second column lists the scores, and an important part of the structure of the table is that the scores are in rank order; the scores can be in descending or ascending order, whichever seems most convenient for the problem at hand. The last two columns list the ranks, with one column for the treatment group and one column for the control group. The hypothetical data in Table 1 concern running times for an 800-meter run. The hypothesis is that wind-sprint training will yield faster runners than a control method of training.
With the data structured in this manner, students may now examine the pairs. An easy way to teach pair formation is to draw a line from a rank in the experimental group to a rank in the control group.

TABLE 1
Mann-Whitney U Test Sample Data
                     Ranks
Person   Score   Sprints   Control
Art      2:20    1
Bill     2:21    2
Carl     2:23    3
Dan      2:25              4
Ed       2:26    5
Frank    2:28    6
Gary     2:29              7
Hal      2:30              8
Ira      2:32              9
Note. The hypothesis is that the runners in the sprint group will run faster. Of the total of 20 pairs, 18 (90%) are favorable to the hypothesis and 2 (10%) are unfavorable; hence, the rank-biserial correlation is r = .90 – .10 = .80.

First, consider the favorable pairs. In the current example, a favorable pair exists when the runner in
the experimental group is faster than the runner in the control group. For example, Art in the treatment group came in first place, so he has a rank of one. Dan in the control group came in fourth place, so he has a rank of four. A line that connects the two ranks will slope down to the right, a direction that indicates a favorable pair. Art can be paired in this way with three more runners, so Art is a member of four favorable pairs. Counting in this way for each person in the treatment group, the result is a total of 18 favorable pairs.
Next, consider the unfavorable pairs. In the current example, a pair is unfavorable when the runner in the control group is faster; when this is the case, the line that connects the two ranks slopes up to the right. For example, Ed in the treatment group has a rank of five, and Dan in the control group has a rank of four. The two runners form an unfavorable pair, so the line between the two ranks slopes up to the right. Counting in this way, for the entire table, the result is only two unfavorable pairs.
Table 1 contains no tied pairs, but tied pairs are counted as one half favorable and one half unfavorable. By counting pairs in this simple manner, students obtain the same result as applying the formulas for U1 and U2. Applying the formulas, the result is 18 and 2; counting the pairs with connecting lines, the result is 18 favorable pairs and 2 unfavorable pairs. In this way, the meaning of the formulas can be made clear.
While on the topic of ties, it should be noted that a few ties cause little concern with the Mann-Whitney U. However, when there are numerous ties, “it may now be misleading to use tabulated critical values” (Sprent & Smeeton, 2001, p. 152). The misleading results would also apply to effect sizes discussed in this paper, so caution should be used when applying the Mann-Whitney U to data with numerous ties (see also Conover, 1999).
U is Non-Directional
A key feature of U as a statistic is that it is non-directional. For example, consider a study on the treatment of depression. Suppose that five people in the treatment group have the five lowest scores on depression, so the ranks are 1, 2, 3, 4, and 5. The four people in the control group have the highest scores on depression, so the ranks are 6, 7, 8, and 9. In such a case, there are 20 favorable pairs, and 0 unfavorable pairs, so U = 0. On the other hand, suppose the treatment backfired, so that the five people in the treatment group had the five highest scores on depression. In such a case, there are 0 favorable pairs, and 20 unfavorable pairs, so U still equals zero.
The point here is that in both cases, U equals zero. That is, if one only knows that U is zero, then one of two states is true: either the data in the form of ranks are as good as they can possibly be, or they are as bad as they can possibly be. In short, U is non-directional.
Though U is non-directional, it is always possible to state a directional hypothesis. In fact, in the field of psychology a directional hypothesis is common. Researchers prefer that depressed patients become less depressed, that students make higher grades, and that healthy behaviors lead to a longer life. In cases where there is no preferred direction, one can select an arbitrary direction as the hypothesis.
When a direction is stated, and when the results are in the expected direction, then U is easy to interpret: it is the number of pairs that are unfavorable to the hypothesis. For example, consider a study with 10 people in the control group and 10 in the treatment group (for a total of 100 pairs). If there are 70 favorable pairs and 30 unfavorable pairs, then U is 30. Dividing U by the total number of pairs will yield the proportion of unfavorable pairs, a proportion that can be represented by u. In the same way, the proportion of favorable pairs can be represented by f, which can be used as a measure of effect size with the Mann-Whitney U test.
The Common Language Effect Size
McGraw and Wong (1992) discussed using the proportion of favorable pairs as a measure of effect size, which they called the common language effect size. Given assumptions such as normality and equal variances, effect sizes expressed as Cohen's d or the Pearson r can be converted into the common language effect size. The effect size also works well with the Mann-Whitney U test. The purpose of the common language effect size, as the name implies, is to express the meaning of an effect size in the everyday language of a percent. McGraw and Wong give the example of the height difference between men and women. While a simple way of reporting the effect size is to note that the average man is 5.4 inches taller than the average woman, the common language effect size is also easily interpreted, because one can say that “in 92 out of 100 blind dates among young adults, the male will be taller than the female” (p. 361). Thus, one advantage of the common language effect size is that it is easily interpreted.
A second advantage of reporting the common language effect size is that it allows for a clear statement of the null hypothesis: if the null is true, then in the long run the amount of favorable evidence is equal to the amount of unfavorable evidence. In other words, when the null is true, the expected value of the common language effect size is fifty percent. With the Mann-Whitney U test, this amounts to saying that the sampling distribution has a mean of 50% favorable pairs.
A third advantage of the common language effect size is that it allows easy interpretation when the results are against the prediction. For example, suppose that a study backfires, so that the analysis yields 30 favorable pairs and 70 unfavorable pairs. The value of U is still 30, just as it was in the previous example when the data
were in the predicted direction. However, by stating the effect size as 30% favorable pairs, one knows that the data are not in the predicted direction, because most of the evidence is not in accord with the hypothesis.
Rank-Biserial Correlation
While the common language effect size is useful, a more widely used measure in statistics is the correlation. A correlation effect size exists for the Mann-Whitney U test, and it is known as the rank-biserial correlation. Three formulas have been proposed for computing this correlation.
Edward Cureton (1956) introduced and named the rank-biserial correlation. To compute the correlation, Cureton stated a direction; that is, one group was hypothesized to have higher ranks. Then he used the concepts of an agreement and an inversion, his terms for what in this paper are called a favorable pair and an unfavorable pair. Cureton denoted the number of agreements (favorable pairs) with P, and he denoted the number of inversions (unfavorable pairs) with Q. He then computed the correlation as the difference between P and Q, divided by the maximum value. In symbols, r = (P – Q)/Pmax.
A second formula for the rank-biserial correlation was developed by Gene Glass (1965) in the course of his work on item analysis for scale development. His goal was to derive a formula to estimate the Spearman correlation, in the same way that the biserial r estimates the Pearson r. As Glass worded it, “One can derive a coefficient defined on X, the dichotomous variable, and Y, the ranking variable, which estimates Spearman's rho between X and Y in the same way that biserial r estimates Pearson's r between two normal variables” (p. 91). He ended with a formula that was mathematically equivalent to Cureton's formula. The Glass formula is convenient for item analysis, because a high correlation occurs when test takers who correctly answer an individual item have a higher rank on the total score than those who answered incorrectly. The Glass formula is equal to twice the difference in the mean ranks of those people who answered correctly ($\bar{Y}_1$) and those who missed the item ($\bar{Y}_0$), divided by the total number of people who are ranked: $r = 2(\bar{Y}_1 - \bar{Y}_0)/N$. Among those few textbooks that present a formula, the Glass formula is used (e.g., Cohen, 2008; Jaccard & Becker, 2009).
Hans Wendt (1972) presented a third formula, one based on U. Wendt was motivated to develop his formula because he observed in published research a “neglect of correlation in favor of significance statistics” (p. 463). His goal was to derive an easy-to-use formula that would promote the reporting of effect sizes with the Mann-Whitney U test. The Wendt formula computes the rank-biserial correlation from U and from the sample size (n) of the two groups: r = 1 – (2U)/(n1 × n2). One can see that the correlation is at a maximum of r = 1 when U is zero. Because U is by definition non-directional, the rank-biserial as computed by the Wendt formula is also non-directional and is never negative.
The Simple Difference Formula
Though the three formulas mentioned above are useful, they were introduced before the common language effect size. Also, two decades of teaching statistics at the college level has convinced me that none of the three formulas convey much meaning to the typical student. In this paper, I introduce a fourth formula for computing the rank-biserial correlation, one that is based on the common language effect size. To do this, begin with the formula presented by Cureton (1956), then break the ratio in two. This yields a formula that is the simple difference between two ratios: r = (P/Pmax) – (Q/Pmax). The first ratio is the proportion of favorable pairs, which is the common language effect size; the second ratio is the proportion of unfavorable pairs. If the proportion of favorable pairs is represented as f, and if the proportion of unfavorable pairs is represented as u, then the formula can be written as the simple difference between the two proportions: r = f – u.
This is the simple difference formula. In words, the formula states that the nonparametric correlation equals the simple difference between the proportion of favorable and unfavorable evidence; in the case of the Mann-Whitney U test, the evidence consists of pairs.
An advantage of the simple difference formula is that it expresses one effect size (the rank-biserial correlation) in terms of another easily understood measure of effect size (the common language effect size). The other formulas for the rank-biserial express it in terms that are less easily interpreted. A second advantage of the simple difference formula is that it gives meaning to the direction of the sign. A positive correlation means that the data were in the predicted direction; a negative correlation, that they were against the predicted direction.
A third advantage of the simple difference formula is that it is readily understood by students, for an analogy is easily made to weighing information in a balance. If the favorable data is more than the unfavorable data, then the scales are tipped in favor of the hypothesis, and the correlation is positive. Using the data in Table 1 as an example, the favorable evidence outweighs the unfavorable 90% to 10%, so the overall balance is 0.90 minus 0.10, yielding a rank-biserial correlation of r = .80. If the data are all favorable, then the scales are tipped as much as possible, and the correlation is a perfect one. If the amount of favorable data is equal to the amount of unfavorable data, then the scales are not tipped either way, and the correlation is zero.
In summary, because many introductory texts omit a discussion of a correlational effect size for the Mann-Whitney U, teachers of introductory statistics may find the simple difference formula a useful way to address
this omission. The simple difference formula allows students to compute an effect size for the Mann-Whitney U, and it provides insight into the meaning of the correlation.
The Wilcoxon Signed-rank Test
Another popular method taught in introductory courses is the Wilcoxon signed-rank test. While the U test compares two independent groups, the signed-rank test compares two matched groups (Wilcoxon, 1945). The signed-rank test is often described as the nonparametric version of the paired t test.
Various symbols have been used for the test statistic. In his original paper, Wilcoxon used r to refer to the smaller sum of like-signed ranks; however, the letter r is used for the correlation coefficient, so this symbol has rarely been adopted. The popular book by Sidney Siegel (1956) introduced many behavioral researchers to nonparametric methods, and Siegel referred to the smaller of the like-signed rank sums with the letter T, which has become popular in the social sciences. Another approach is to add both the negative and positive sums to produce a test statistic called W (e.g., Glantz, 2005).
As mentioned above, it is common for introductory texts to omit a discussion of an effect size with the signed-rank test. One exception to this state of affairs is King, Rosopa, and Minium (2011), who note that an appropriate effect size is the matched-pairs rank-biserial correlation. Though they do not provide a citation, King et al. present a formula for the correlation in terms of the smaller of the like-signed rank sums (T), the sum of the positive ranks (R+), the sum of the negative ranks (R–), and the sample size (N). Using r for the correlation, the formula is as follows:
$$r = \frac{4\,\lvert T - (R_{+} + R_{-})/2 \rvert}{N(N + 1)} \qquad (5)$$
The formula can be daunting for students in their first course of statistics, so it would be convenient if a simpler form were available. In fact, a simpler form is possible, because this formula can be converted into the simple difference formula. To simplify, first change the four in the numerator to two divided by one half; this change makes the value in the denominator equal to the total sum of ranks, which can be symbolized as S. Next, the value in parentheses is merely the expected value when the null hypothesis is true, so the value in parentheses can be replaced with the letter E. The absolute value sign allows a change from T – E to E – T, so the formula becomes: r = 2|E – T|/S. To express the correlation in a directional manner, eliminate the absolute value sign in the numerator. Also, when a direction is stated, T is equal to the sum of the unfavorable ranks, which can be expressed as SU, yielding the formula: r = (2E – 2SU)/S.
Just as SU represents the sum of the unfavorable ranks, let SF represent the sum of the favorable ranks. Then the expected sum E can be expressed as (SF + SU)/2, so the numerator 2E – 2SU equals SF – SU. This difference is W, which is sometimes used as the test statistic (e.g., Glantz, 2005). Thus, the formula is now r = W/S, where, for a directional hypothesis, W is the difference between the favorable and unfavorable rank sums. The result is that the matched-pairs rank-biserial correlation can be expressed r = (SF/S) – (SU/S), a difference between two proportions. One can note that the rank-biserial as defined by Cureton (1956) can be stated in a similar form, namely r = (P/Pmax) – (Q/Pmax). Therefore, the formula for the matched-pairs rank-biserial correlation also reduces to: r = f – u.
Here again is the simple difference formula. In words, the formula states that the nonparametric correlation is the simple difference between the proportion of favorable and unfavorable evidence; in the case of the Wilcoxon signed-rank test, the evidence consists of rank sums.
Sample Problem
An easy way to teach the signed-rank test is to place the data in a structured table. Table 2 displays such a data table. The first column lists the names or IDs for each participant. The next two columns contain the pretest scores and posttest scores. The fourth column lists the change score, computed as the posttest score minus the pretest score. A directional hypothesis is stated, either that scores are predicted to increase or to decrease. A favorable rank is one that is in accord with this prediction; an unfavorable rank, one that is not in accord with it. The last two columns contain the ranks for the absolute value of the change scores, with one column for favorable ranks and one column for unfavorable ranks. An important part of the structure of such a table is that the ranks are placed in order; they can be ascending or descending, whichever seems more convenient for the problem at hand.
For the hypothetical study in Table 2, eight people participate in a program to increase marital happiness, and a scale of marital happiness is given before and after the program. Because there are eight people in the study, the total rank sum is 36 (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 36). Given the hypothesis that the program will increase the happiness score, a person's data is favorable when the score increases after the program; in the same way, the data is unfavorable when the score decreases after the program. The data in Table 2 illustrate how to apply the simple difference formula, and these particular data yield a correlation of 0.50.
The major advantage of the simple difference formula is its simplicity, especially as compared with the formula presented in the text by King et al. (2011). A second advantage is that students do not have to learn a
new formula; they can instead apply the same rank formula used for the Mann-Whitney U test. A third advantage is that the simple difference formula is directional and gives meaning to the direction of the sign; when the data are in the predicted direction, the correlation is positive; when the data are not in the predicted direction, the correlation is negative.

TABLE 2
Wilcoxon Signed-rank Test Sample Data
                 Score                   Rank of Change
Person   Before   After   Change   Favorable (+)   Unfavorable (–)
A          20       38      18          8
B          22       37      15          7
C          19       33      14          6
D          20       29       9          5
E          22       14      –8                           4
F          18       12      –6                           3
G          24       20      –4                           2
H          20       22       2          1
Rank Sums                               27               9
Note. The hypothesis is that the scores will increase. Of the total rank sum of 36, the favorable rank sum is 27 (75%), and the unfavorable rank sum is 9 (25%), so the matched-pairs rank-biserial correlation is r = .75 – .25 = .50.

A fourth advantage is again the ease of understanding that comes from making an analogy to weighing the evidence in a balance. Using the data in Table 2, the favorable evidence outweighs the unfavorable 75% to 25%, so the overall balance is 0.75 minus 0.25, yielding a matched-pairs rank-biserial correlation of r = .50. One would expect that students would report little insight into the meaning of the involved formula used by King et al. (2011); by contrast, they find the analogy to weighing evidence for and against a hypothesis to be intuitively meaningful.
The Binomial Effect Size Display
An additional benefit of teaching the simple difference formula is that it applies to another topic in psychological statistics – the Binomial Effect Size Display (BESD). Based on a 2 × 2 chi-square, the BESD was developed to provide insight into the meaning of the size of a correlation (Rosenthal & Rubin, 1982). The effect size in this case is a version of the Pearson r sometimes referred to as phi.
The concepts of favorable and unfavorable evidence still apply. But whereas the evidence for the Mann-Whitney U consists of pairs, and whereas the evidence for the Wilcoxon signed-rank test consists of rank sums, the evidence for the BESD consists of counts. To illustrate, consider a study of reading ability in fifth grade students. The 100 students in the control group receive teaching as usual, and the 100 students in the treatment group receive a new method; the outcome measure is reading at or above grade level. Suppose that the count of students at or above grade level is 70 in the treatment group, but only 30 in the control group. The BESD is the simple difference between the two proportions, r = .70 – .30 = .40. Thus, for those teachers who cover the BESD in class, the simple difference formula which is used for the BESD can be carried over to the Mann-Whitney U test and the Wilcoxon signed-rank test.
Three-valued Logic for the Null Hypothesis
Another possible use of the simple difference formula is in teaching three-valued logic for the null hypothesis. A long tradition exists in introductory texts for teaching a two-valued testing strategy: reject the null, or fail to reject. Despite this tradition, Wainer and Robinson (2003) note that Ronald Fisher, the founder of null hypothesis testing, in fact used a three-valued approach. “When p … is small (less than .05), he declared that an effect has been demonstrated. When it is large (p is greater than .2), he declared that, if there is an effect, it is too small to be detected with an experiment this size. When it lies between these extremes, he discussed how to design the next experiment to estimate the effect better” (p. 23).
Harris (1997) provides a good discussion on three-valued logic in testing the null hypothesis, and recommends its use. He noted that two-valued logic leads to “such absurdities as stating whether or not results are statistically significant, but not in what direction” (p. 8). Given that the simple difference formula is easy to use and readily indicates a direction, the formula may prove useful to those who wish to apply three-valued logic in testing the null.
Teaching Effect Sizes In Other Psychology Classes
The simple difference formula is so readily understood that it can be used to convey the meaning of research results in psychology classes other than statistics. As an example, consider a course in personality, for which the topic of effect sizes is important.
In the 1960s, many researchers attacked the field of personality. Two researchers describe the mood of the decade: “During the 1960s when we were graduate students, we frequently heard predictions from experimental psychologists and experimental social psychologists that in 20 or so years differential psychology would be a dead field” (Schmidt & Hunter, 2004, p. 162). The dismissal of personality traits rested on the supposedly small effect sizes between traits and relevant outcomes, as detailed in Mischel (1968). He noted that much research found that traits correlated with social outcomes at about r = .30, which he said was too small to be important, and which he mockingly called the personality coefficient. Nisbett (1980) later had to admit that many well-designed personality studies regularly obtained effect sizes near r = .40, but the criticism remained that this was too small to be important. However, Nisbett and other social psychologists continued to claim that effect sizes in social psychology were large.
This line of argument based on effect sizes was shown to be unfounded in a seminal paper by Funder and Ozer (1983). Selecting high-profile social psychology studies widely agreed to show large effects, Funder and Ozer computed the effect sizes. The result was that the median effect size of these large effects was r = .38. If this effect is large in social psychology, then it must also be large in personality psychology, refuting the claim that the effects of traits are too small to be important. A later meta-analysis of 25,000 studies in social psychology over the past century found that the average effect size was r = .21, with a standard deviation of .15 (Richard, Bond, & Stokes-Zoota, 2003). These results soundly reject the claim that social psychology has larger effects than personality psychology.
Because this debate looms large in the history of personality, teachers of personality may wish to convey to students the meaning of a correlation of r = .40. An easy way to do this is to use a simple rank method, such as the simple difference formula. Table 3 presents a sample of data that can be used for this purpose. The data consist of scores on a trait, which are ranked in the table from high to low; and an outcome variable is scored as success or failure. Thus, the data could illustrate using emotional stability to predict a given level of marital happiness, extraversion to predict meeting a sales target, or core self-evaluations to predict a satisfactory job evaluation by a work supervisor.

TABLE 3
Example to Illustrate a Rank Correlation of .40
Person ID   Trait Score   Ranks (Failure / Success)
A1          70            40
A2          67            39
A3          60            38
A4          59            37
A5          58            36
A6          57            35
A7          56            34
A8          55            33
A9          54            32
A10         52            30
A11         51            31
A12         50            29
A13         48            28
A14         47            27
A15         46            26
A16         43            25
A17         42            24
A18         41            23
A19         40            22
A20         35            21
B1          32            20
B2          31            19
B3          30            18
B4          25            17
B5          24            16
B6          23            15
B7          21            14
B8          20            13
B9          19            12
B10         18            11

The data in Table 3 consist of 40 people, of which 20 succeed on the outcome, and 20 fail. The ranks of the trait scores are in the last two columns, with the ranks of successful people in one column, and the ranks of unsuccessful people in the other. The rank-biserial r for these data is r = .40. To reject the null at the .05 level with the Mann-
B11 15 10
Whitney U requires a U of 127 or less, so the obtained U of
B12 14 9
120 allows one to reject the null hypothesis.
B13 13 8
A good way to teach students the meaning of a cor-
B14 12 7
relation is in terms of a prediction. For the rank-biserial
B15 11 6
r, the prediction is with pairs. When all possible pairs
are formed between a person with success and a person B16 10 5
with failure, the prediction is that the person with the B17 8 4
higher rank on the trait score is the one rated successful. B18 6 3
In Table 3 there are 400 pairs, and the prediction is cor- B19 5 2
rect in 70% of cases (n = 280 pairs) and incorrect in 30% B20 4 1
of cases (n = 120 pairs); in other words, the odds of being Note The hypothesis is that the people with
correct are seven to three. higher trait scores will be more likely to show
A line in Table 3 divides the upper half from the success on the outcome. Of the total of 400 pairs,
lower half; this line shows the close association between 280 (70%) are favorable to the hypothesis and
the rank-biserial r and the BESD. For the BESD, the pre- 120 (30%) are unfavorable; hence, the rank-
biserial correlation is r = .70 – .30 = .40.
diction is with counts of people: the prediction is that
those people in upper half of trait scores will have
success. Note that this prediction is correct in 70% of three. Of course, this means those people in the lower
cases (n = 14 people) and wrong in 30% (n = 6 people); half of trait scores are predicted to fail; this prediction
put another way, the odds of being correct are seven to is also correct in 70% of cases (n = 14 people) and wrong

in 30% (n = 6 people), with odds of being correct also at seven to three.

Notice that these figures of 70% and 30% are those of the simple difference formula: r = .70 – .30 = .40. In general terms, the simple difference formula r = f – u means that a given correlation r can yield a prediction for two independent groups such that the proportion correct is f and the proportion incorrect is u. For the data in Table 3, this holds whether one uses the 400 pairs to compute the rank-biserial r or the 40 people to compute the BESD. An example such as this can give students insight into the meaning of r = .40.

Discussion

Though the importance of teaching effect sizes in introductory courses is widely recognized, many introductory textbooks do not provide an effect size measure for two nonparametric methods commonly covered in the course: the Mann-Whitney U test and the Wilcoxon signed-rank test. When textbooks do discuss a correlational effect size, they use two different formulas: the Glass (1965) formula for the Mann-Whitney U, and a different formula for the Wilcoxon signed-rank test (Cohen, 2008; King et al., 2011).

This paper has shown that the simple difference formula can be used for both inferential tests. The simple difference formula states that the correlation is equal to the difference between the proportions of favorable and unfavorable evidence; in symbols, r = f – u. When expressed in terms of favorable pairs, the formula computes the rank-biserial correlation for the Mann-Whitney U. When expressed in terms of favorable sums, the simple difference formula computes the matched-pairs rank-biserial correlation for the Wilcoxon signed-rank test. Its ease of use and its generality make the simple difference formula a useful concept to teach in the introductory course in psychological statistics.

Another advantage of adopting the formula in class is that it is related to other concepts in statistics. When expressed in terms of favorable counts, the simple difference formula is an easy introduction to the BESD, a method designed to teach students how to interpret the size of a correlation. Also, the formula can be used to introduce students to the common language effect size, which in the Mann-Whitney U test is equal to the proportion of favorable pairs (.70 for the data in Table 3). In summary, teachers of psychological statistics may find the simple difference formula a useful addition to their class, and teachers of other classes in psychology may find the formula an easy way to convey the meaning of a correlation.

References

Aron, A., Aron, E. N., & Coups, E. J. (2011) Statistics for psychology (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
Cohen, B. (2008) Explaining psychological statistics (3rd ed.). Hoboken, NJ: John Wiley.
Cohen, J. (1962) The statistical power of abnormal-social psychological research. Journal of Abnormal and Social Psychology, 65(3), 145-153. DOI: 10.1037/h0045186
Conover, W. J. (1999) Practical nonparametric statistics (3rd ed.). New York, NY: John Wiley.
Cureton, E. E. (1956) Rank-biserial correlation. Psychometrika, 21(3), 287-290. DOI: 10.1007/BF02289138
Funder, D. C., & Ozer, D. J. (1983) Behavior as a function of the situation. Journal of Personality and Social Psychology, 44(1), 107-112. DOI: 10.1037/0022-3514.44.1.107
Glantz, S. A. (2005) A primer of biostatistics (6th ed.). New York, NY: McGraw-Hill Medical.
Glass, G. V. (1965) A ranking variable analogue of biserial correlation: implications for short-cut item analysis. Journal of Educational Measurement, 2(1), 91-95. DOI: 10.1111/j.1745-3984.1965.tb00396.x
Gravetter, F. J., & Wallnau, L. B. (2013) Statistics for the behavioral sciences (8th ed.). Belmont, CA: Wadsworth Cengage Learning.
Harris, R. J. (1997) Significance tests have their place. Psychological Science, 8(1), 8-11.
Howells, D. C. (2012) Statistical methods for psychology (8th ed.). Belmont, CA: Thomson Wadsworth.
Jaccard, J., & Becker, M. A. (2009) Statistics for the behavioral sciences (5th ed.). Pacific Grove, CA: Wadsworth Publishing.
Kiess, H. O., & Green, B. A. (2010) Statistical concepts for the behavioral sciences (4th ed.). Boston, MA: Allyn & Bacon.
King, B. M., Rosopa, P. J., & Minium, E. W. (2011) Statistical reasoning in the behavioral sciences (6th ed.). Hoboken, NJ: John Wiley.
Kirk, R. E. (2008) Statistics: an introduction (5th ed.). Belmont, CA: Thomson Wadsworth.
Mann, H. B., & Whitney, D. R. (1947) On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18(1), 50-60. DOI: 10.1214/aoms/1177730491
McGraw, K. O., & Wong, J. J. (1992) A common language effect size statistic. Psychological Bulletin, 111(2), 361-365. DOI: 10.1037/0033-2909.111.2.361
Mischel, W. (1968) Personality and assessment. New York, NY: Wiley.
Nisbett, R. E. (1980) The trait construct in lay and professional psychology. In L. Festinger (Ed.), Retrospections on social psychology. New York, NY: Oxford University Press. Pp. 109-130.
Nolan, S. A., & Heinzen, T. E. (2008) Statistics for the behavioral sciences. New York, NY: Worth Publishers.
Privitera, G. J. (2012) Statistics for the behavioral sciences. Thousand Oaks, CA: Sage Publications, Inc.
Richard, F. D., Bond, C. F., Jr., & Stokes-Zoota, J. J. (2003) One hundred years of social psychology quantitatively described. Review of General Psychology, 7(4), 331-363. DOI: 10.1037/1089-2680.7.4.331
Rosenthal, R., & Rubin, D. B. (1982) A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74(2), 166-169. DOI: 10.1037/0022-0663.74.2.166
Schmidt, F. L., & Hunter, J. (2004) General mental ability in the world of work: occupational attainment and job performance. Journal of Personality and Social Psychology, 86(1), 162-173. DOI: 10.1037/0022-3514.86.1.162
Siegel, S. (1956) Nonparametric statistics for the behavioral sciences. New York, NY: McGraw-Hill Book Company.
Spatz, C. (2011) Basic statistics: Tales of distributions (10th ed.). Belmont, CA: Wadsworth, Cengage Learning.
Sprent, P., & Smeeton, N. C. (2001) Applied nonparametric statistical methods. Boca Raton, FL: Chapman & Hall/CRC.
Wainer, H., & Robinson, D. H. (2003) Shaping up the practice of null hypothesis significance testing. Educational Researcher, 32(7), 22-30.
Wendt, H. W. (1972) Dealing with a common problem in social science: a simplified rank-biserial coefficient of correlation based on the U statistic. European Journal of Social Psychology, 2(4), 463-465. DOI: 10.1002/ejsp.2420020412
Wilcoxon, F. (1945) Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83. DOI: 10.2307/3001968
Wilkinson, L., & APA Task Force on Statistical Inference (1999) Statistical methods in psychology journals: guidelines and explanations. American Psychologist, 54(8), 594-604. DOI: 10.1037/0003-066X.54.8.594
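The arithmetic in the Table 3 example can be checked with a short script. What follows is a minimal Python sketch added for illustration; it is not code from the article, and the function names (simple_difference, count_pairs, rank_biserial_from_u, phi_2x2) are hypothetical. It uses only the summary figures reported in the text: 400 pairs with 280 favorable and 120 unfavorable, U = 120, and a 14/6 split on each side of the line in Table 3. The sketch makes two assumptions, both also noted in the code comments. First, a tie between a successful and an unsuccessful person counts as neither favorable nor unfavorable. Second, the U-based check uses the Wendt (1972) form of the rank-biserial correlation, r = 1 – 2U/(n1n2).

```python
import math


def simple_difference(favorable, unfavorable, total):
    """Simple difference formula r = f - u.

    f and u are the proportions of favorable and unfavorable evidence
    (pairs, rank sums, or people) out of `total` units of evidence.
    """
    return (favorable - unfavorable) / total


def count_pairs(success_scores, failure_scores):
    """Pair every successful person with every unsuccessful person.

    Assumption: a pair is favorable when the successful person has the
    higher trait score, unfavorable when the unsuccessful person does,
    and a tie counts as neither.
    """
    favorable = unfavorable = ties = 0
    for s in success_scores:
        for f in failure_scores:
            if s > f:
                favorable += 1
            elif s < f:
                unfavorable += 1
            else:
                ties += 1
    return favorable, unfavorable, ties


def rank_biserial_from_u(u, n1, n2):
    """Assumed Wendt (1972) form of the rank-biserial r: 1 - 2U / (n1 * n2)."""
    return 1 - (2 * u) / (n1 * n2)


def phi_2x2(a, b, c, d):
    """Pearson r (phi) for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Summary figures for Table 3 as reported in the text:
    # 20 successes x 20 failures = 400 pairs; 280 favorable, 120 unfavorable.
    print(simple_difference(280, 120, 20 * 20))  # 0.4
    print(rank_biserial_from_u(120, 20, 20))     # 0.4 (U = 120)
    print(280 / (20 * 20))                       # 0.7, common language effect size

    # BESD counts from the two halves of Table 3 (rows: upper, lower half;
    # columns: success, failure): 14 and 6 above the line, 6 and 14 below.
    print(phi_2x2(14, 6, 6, 14))                  # 0.4
    print(simple_difference(14 + 14, 6 + 6, 40))  # 0.4

    # Hypothetical toy data (not from the article): there are no ties here,
    # so U equals the number of unfavorable pairs and both routes agree.
    fav, unfav, tied = count_pairs([5, 3], [4, 1])
    print(fav, unfav, tied)  # 3 1 0
    print(simple_difference(fav, unfav, 4), rank_biserial_from_u(unfav, 2, 2))  # 0.5 0.5
```

Every route to r prints 0.4: the pairs, U, and the 2 × 2 counts. The proportion of favorable pairs prints as 0.7. Treating ties as neither favorable nor unfavorable keeps r = f – u equal to 1 – 2U/(n1n2), provided U credits each tied pair with one half.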
