
Week 1:

Nonparametric tests

Mikhail Zhelonkin
Erasmus University Rotterdam

Assume you have the following grades from students in two groups:

Group 1 (X)   Group 2 (Y)
5.1           3.1
6.3           8.0
6.2           5.6
5.5           4.0
7.2           6.5
5.7           6.1

The question is: Is Group 1 better than Group 2?

This is a test of a hypothesis.

Testing procedure

Formulate a null hypothesis, for instance

$$H_0: \mu_X = \mu_Y$$

Formulate an alternative hypothesis, say

$$H_a: \mu_X > \mu_Y$$

Under the null hypothesis we calculate a test statistic.

Testing procedure, contd

The null hypothesis together with the other assumptions provides a known distribution for the test statistic. From this distribution we obtain the critical values for a given level of significance $\alpha$.
The significance level (level of the test) is the probability that we reject the null hypothesis even though it is true (type I error).

              Null hypothesis is:
              True            False
Reject        Type I error    Correct
Not reject    Correct         Type II error

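t-Test
The test statistic is:

$$t = \frac{\bar{X} - \bar{Y}}{s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}$$

where

$$s_p = \sqrt{\frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}}$$

is the square root of the pooled variance, and $\bar{X} = \frac{1}{n_x}\sum_{i=1}^{n_x} X_i$, $\bar{Y} = \frac{1}{n_y}\sum_{i=1}^{n_y} Y_i$.

If the random variables X and Y follow a normal distribution, then under the null hypothesis the test statistic follows a t-distribution with $n_x + n_y - 2$ degrees of freedom.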

In our example
$\bar{X} = 6$, $\bar{Y} = 5.55$, $s_x^2 = 0.544$, $s_y^2 = 3.123$, $n_x = n_y = 6$

$$s_p = \sqrt{\tfrac{1}{2}(0.544 + 3.123)} \approx 1.354$$

$$t = \frac{6 - 5.55}{s_p \sqrt{1/3}} \approx 0.576$$

The critical value is $t_{10,0.05} = 1.812$ (or $t_{10,0.025} = 2.228$), hence we do not reject the null.

But...
We assumed normality of the X's and Y's, or a large number of observations, which is not very realistic in our example.
We need something nonparametric.
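
The pooled t statistic can be recomputed from the raw grades. The following is a minimal sketch using only the Python standard library; the function name `pooled_t` is an illustrative choice, not something from the slides.

```python
from math import sqrt
from statistics import mean, variance

# Grades from the two groups in the example
x = [5.1, 6.3, 6.2, 5.5, 7.2, 5.7]
y = [3.1, 8.0, 5.6, 4.0, 6.5, 6.1]


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (equal-variance form)."""
    nx, ny = len(x), len(y)
    # statistics.variance divides by n - 1, i.e. it is the sample variance
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    sp = sqrt(sp2)
    t = (mean(x) - mean(y)) / (sp * sqrt(1 / nx + 1 / ny))
    return t, sp


t_stat, s_p = pooled_t(x, y)
print(f"s_p = {s_p:.4f}, t = {t_stat:.4f}")
```

The resulting t is then compared with the tabulated critical value for 10 degrees of freedom.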

Sign test (Dutch: tekentoets)

For paired data!
If the two groups in the table have the same ability, then the probability that the first member of a pair has a better grade than the second is 1/2.
Hence, our null hypothesis is
$H_0: P(X_i > Y_i) = 0.5$, for each pair i.
Clearly, this hypothesis does not depend on the underlying distribution of the data. For each distribution with the same location (median) the probability will be 1/2.
The alternative hypothesis is therefore
$H_a: P(X_i > Y_i) > 0.5$ (one-sided), or $H_a: P(X_i > Y_i) \neq 0.5$ (two-sided).

The test statistic is

$$M = \sum_{i=1}^{n} I(X_i > Y_i),$$

where I(A) is an indicator function, i.e. I(A) = 1 if A is true, and zero otherwise.
What distribution does M follow? Binomial with p = 0.5 under the null:

$$P(M = k) = \binom{n}{k} 0.5^n$$

Critical values:
The critical values cut off the tails at each end of the distribution.
For the t-test we can find the critical values in the table.
For the sign test we can calculate them ourselves.
For a one-sided test we start from the bottom, i.e. X > Y zero times, then 1 time, etc. Or we can start from the top. We stop when the sum of the probabilities becomes greater than $\alpha$; the rejection region consists of the values included before that point.
For a two-sided test we start at both ends and move inwards: P(0) + P(n), then P(0) + P(1) + P(n - 1) + P(n), and so on until the probability is larger than $\alpha$.
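
The accumulation of tail probabilities described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `sign_test_rejection_region` is hypothetical, and the one-sided version is assumed to reject for large M (alternative P(X > Y) > 0.5).

```python
from math import comb


def sign_test_rejection_region(n, alpha=0.05, two_sided=False):
    """Values of M (number of pairs with X > Y) for which H0 is rejected.

    Returns the rejection region and its exact probability under H0.
    """
    pmf = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    region, total = [], 0.0
    for step in range(n + 1):
        # Values added at this step: the current top value, plus the
        # mirrored bottom value for a two-sided test.
        new = [n - step]
        if two_sided and step != n - step:
            new.append(step)
        added = sum(pmf[k] for k in new)
        # Stop once alpha would be exceeded (or the two ends have met)
        if total + added > alpha or any(k in region for k in new):
            break
        total += added
        region.extend(new)
    return sorted(region), total


print(sign_test_rejection_region(6, 0.05))
print(sign_test_rejection_region(9, 0.05, two_sided=True))
```

Because the binomial distribution is discrete, the probability of the rejection region is usually strictly smaller than $\alpha$.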

In our example

Group 1 (X)   Group 2 (Y)   sign
5.1           3.1           +
6.3           8.0           -
6.2           5.6           +
5.5           4.0           +
7.2           6.5           +
5.7           6.1           -

Then the test statistic is M = 4.

The critical value follows from

$$p(6) = \binom{6}{6} 0.5^6 = 0.0156,$$

$$p(5) = \binom{6}{5} 0.5^6 = 0.0938.$$

Since p(6) < 0.05 but p(5) + p(6) = 0.1094 > 0.05, the one-sided critical value is 6, which means that we reject only if all members of Group 1 have better results than their counterparts in Group 2. Hence we do not reject the null.
Remark: What if the two members of a matched pair have the same grade? We delete those observations and continue with n - 1 observations.

Example 2
The table below gives the reaction times of 9 people in two experiments.

Person   Experiment 1   Experiment 2
1        9.4            10.3
2        7.8            8.9
3        5.6            4.1
4        12.1           14.7
5        6.9            8.7
6        4.2            7.1
7        8.8            11.3
8        7.7            5.2
9        6.4            7.8

Let us test whether the distribution of the reaction times for experiment 1 is at a significantly different location from that for experiment 2, using the sign test.
Formulate the hypotheses:
$H_0: \mu_1 = \mu_2$,
$H_a: \mu_1 \neq \mu_2$.
With this formulation of the question we perform a two-sided test.

Test statistic: We know that the data are from a paired experiment since the same person is tested twice. Hence, we use the sign test.

Person   Experiment 1   Experiment 2   Difference
1        9.4            10.3           -
2        7.8            8.9            -
3        5.6            4.1            +
4        12.1           14.7           -
5        6.9            8.7            -
6        4.2            7.1            -
7        8.8            11.3           -
8        7.7            5.2            +
9        6.4            7.8            -

The test statistic is M = 2.

Critical values

$$p(0) = \binom{9}{0} 0.5^9 = 0.0020, \qquad p(9) = \binom{9}{9} 0.5^9 = 0.0020$$

$$p(0) + p(9) = 0.0039 \ (< 0.05)$$

$$p(1) = \binom{9}{1} 0.5^9 = 0.0176, \qquad p(8) = \binom{9}{8} 0.5^9 = 0.0176$$

$$p(0) + p(9) + p(1) + p(8) = 0.0391 \ (< 0.05)$$

Still less than 5%, so we continue.

$$p(2) = \binom{9}{2} 0.5^9 = 0.0703, \qquad p(7) = \binom{9}{7} 0.5^9 = 0.0703$$

$$p(0) + p(9) + p(1) + p(8) + p(2) + p(7) = 0.1797$$

This is already too much. The rejection region is {0, 1, 8, 9}.
The value of our test statistic, M = 2, is outside the rejection region, hence we do not reject the null hypothesis that the locations are the same.
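
An equivalent way to reach the same decision is to compute the exact two-sided p-value and compare it with 5%. The p-value formulation is an addition to the slides, which work with the rejection region; the sketch below assumes the usual convention of doubling the smaller binomial tail, and the function name `sign_test` is illustrative.

```python
from math import comb

exp1 = [9.4, 7.8, 5.6, 12.1, 6.9, 4.2, 8.8, 7.7, 6.4]
exp2 = [10.3, 8.9, 4.1, 14.7, 8.7, 7.1, 11.3, 5.2, 7.8]


def sign_test(x, y):
    """Return (M, n, exact two-sided p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop tied pairs
    n = len(diffs)
    m = sum(d > 0 for d in diffs)                    # number of '+' signs
    # Probability of a count at least as extreme as m in the nearer tail
    k = min(m, n - m)
    tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return m, n, min(1.0, 2 * tail)


m, n, p_value = sign_test(exp1, exp2)
print(m, n, round(p_value, 4))
```

A p-value above 0.05 corresponds to M lying outside the rejection region {0, 1, 8, 9}.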

If n is large, then it becomes increasingly difficult to calculate the critical values.
But as $n \to \infty$ the Binomial distribution converges to the normal distribution. n > 25 can be considered big enough to yield a close enough approximation for most situations.
Then we have a z-test:

$$\frac{M - E(M)}{STD(M)} \sim N(0, 1),$$

where $E(M) = np = n/2$, $Var(M) = np(1 - p) = n/4$, $STD(M) = \sqrt{n}/2$. Then

$$z = \frac{M - n/2}{\sqrt{n}/2} \sim N(0, 1).$$
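
The z-test for the sign test is a one-line computation. The numbers below (30 pairs, 21 of them with X > Y) are hypothetical and chosen only to illustrate the formula; they do not come from the examples in these slides.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_z(m, n):
    """Standardised sign-test statistic z = (M - n/2) / (sqrt(n)/2)."""
    return (m - n / 2) / (sqrt(n) / 2)


# Hypothetical data: n = 30 pairs, M = 21 pairs with X > Y
z = sign_test_z(21, 30)
p_one_sided = 1 - NormalDist().cdf(z)   # P(Z >= z), for Ha: P(X > Y) > 0.5
print(f"z = {z:.3f}, one-sided p = {p_one_sided:.4f}")
```

For a one-sided test at the 5% level, z is compared with 1.645.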

Wilcoxon signed-rank test

We can use additional information in the data without making further assumptions about the distribution of the data, except that we assume that the differences have a symmetric distribution:
The differences between pairs of observations can be positive or negative, but they can also be smaller or larger.
We attach no weight to the exact size of a difference; we simply observe whether it is smaller or larger than the others, i.e. we use its rank.
If the distributions have the same location, the negative and positive differences should be of about the same size.

The null hypothesis and the alternative are not altered, but we have a different test statistic.

Test statistic: Example 1
The test statistics are the sum of the ranks of the positive differences and the sum of the ranks of the negative differences.

Group 1 (X)   Group 2 (Y)   sign   abs.diff   rank
5.1           3.1           +      2.0        6
6.3           8.0           -      1.7        5
6.2           5.6           +      0.6        2
5.5           4.0           +      1.5        4
7.2           6.5           +      0.7        3
5.7           6.1           -      0.4        1

$T^+ = 6 + 2 + 4 + 3 = 15$, and $T^- = 5 + 1 = 6$

For a two-sided test we use the minimum: $T = \min(T^+, T^-)$.

One-sided test: $H_a$: f(X) is to the right of f(Y).
Then the differences can be expected to be large when positive and small when negative, so $T^-$ is the test statistic.
One-sided test: $H_a$: f(X) is to the left of f(Y) (X < Y).
Then we expect large negative and small positive differences, so $T^+$ is the test statistic.
In each case we reject for small values of the test statistic.
The critical values are in Table 9 in the book (Wackerly et al.).

In our example we take the one-sided test with $H_a$: f(X) is to the right of f(Y). The test statistic is $T = T^- = 6$.
The critical value is 1. Since 6 > 1, we do not reject the null hypothesis.

Remark: What if the two observations in a pair have the same value? As in the sign test, we delete that pair from the sample and work with the remaining sample of n - 1 observations.
What if two pairs of observations have the same rank? Each of them gets the average rank.

Group 1 (X)   Group 2 (Y)   sign   abs.diff   rank
5.1           3.1           +      2.0        6
6.3           8.0           -      1.7        5
6.2           5.6           +      0.6        2.5
5.5           4.0           +      1.5        4
7.2           6.6           +      0.6        2.5
5.7           6.1           -      0.4        1

Example 2 (reaction time).

Person   Ex 1   Ex 2   Sign   Abs.diff   Rank
1        9.4    10.3   -      0.9        1
2        7.8    8.9    -      1.1        2
3        5.6    4.1    +      1.5        4
4        12.1   14.7   -      2.6        8
5        6.9    8.7    -      1.8        5
6        4.2    7.1    -      2.9        9
7        8.8    11.3   -      2.5        6.5
8        7.7    5.2    +      2.5        6.5
9        6.4    7.8    -      1.4        3

We get $T^+ = 4 + 6.5 = 10.5$ and $T^- = 1 + 2 + 8 + 5 + 9 + 6.5 + 3 = 34.5$. For a two-sided test we use the smaller of the two, here $T^+ = 10.5$. The critical value is 6; since 10.5 > 6, we do not reject the null hypothesis.
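
The ranking with average ranks for ties, and the two rank sums, can be sketched as follows. The helper names `average_ranks` and `signed_rank` are illustrative; rounding the differences to 10 decimals is an implementation assumption, made so that floating-point noise does not hide ties such as the two absolute differences of 2.5.

```python
def average_ranks(values):
    """Ranks 1..n of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank(x, y):
    """Return (T_plus, T_minus) for paired samples; zero differences dropped."""
    diffs = [round(a - b, 10) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus


exp1 = [9.4, 7.8, 5.6, 12.1, 6.9, 4.2, 8.8, 7.7, 6.4]
exp2 = [10.3, 8.9, 4.1, 14.7, 8.7, 7.1, 11.3, 5.2, 7.8]
print(signed_rank(exp1, exp2))
```

The smaller of the two returned values is the statistic to compare with the tabulated critical value in a two-sided test.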

For large n:
If n is large, we can approximate the distribution of T with the normal distribution:

$$E(T) = \frac{n(n + 1)}{4},$$

$$Var(T) = \frac{n(n + 1)(2n + 1)}{24}.$$

Hence

$$Z = \frac{T - n(n + 1)/4}{\sqrt{n(n + 1)(2n + 1)/24}} \sim N(0, 1).$$
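
As a purely illustrative calculation, the formulas can be applied to the reaction-time example with n = 9 and $T = T^+ = 10.5$. Note that n = 9 is too small for the approximation to be trusted; the numbers only show how the terms fit together.

```latex
E(T) = \frac{9 \cdot 10}{4} = 22.5, \qquad
Var(T) = \frac{9 \cdot 10 \cdot 19}{24} = 71.25, \qquad
Z = \frac{10.5 - 22.5}{\sqrt{71.25}} \approx \frac{-12}{8.441} \approx -1.42
```

Since |Z| < 1.96, this agrees with the table-based decision not to reject in a two-sided test at the 5% level.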

Wilcoxon Rank-sum test

The sign test and the Wilcoxon signed-rank test make an important assumption: the observations are in pairs.

So the question is: What can we do when the observations are not in pairs?

If we have two populations with the same distribution, then the sizes of the observations should also be equally distributed between the two populations.

For example, if I assume that economics students and econometrics students are equally tall, then it cannot be that when I line up the students by height all the econometrics students are on one side and the economics students on the other.
Put differently, if I draw an observation from a population, I am just as likely to draw a large one from either distribution.
So what we can do is look at the ranks of the observations in the samples. If the sum of the ranks of one sample is much larger than that of the other, then it is less likely that the distributions of the two populations have the same location.

Compared to the Wilcoxon signed-rank test, we do not calculate the ranks of the differences, but the ranks of the observations themselves.

The null and alternative remain the same. The assumptions on the data change and therefore the test statistic changes: it is now the sum of the ranks of the observations in the sample.

Rank   Student   Grade
1      Y1        3.1
2      Y4        4.0
3      X1        5.1
4      X4        5.5
5      Y3        5.6
6      X6        5.7
7      Y6        6.1
8      X3        6.2
9      X2        6.3
10     Y5        6.5
11     X5        7.2
12     Y2        8.0

Rank-sum(Y) = 1 + 2 + 5 + 7 + 10 + 12 = 37, Rank-sum(X) = 3 + 4 + 6 + 8 + 9 + 11 = 41

These rank-sums form our test statistic.
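
The pooling and ranking step can be reproduced with a short sketch. It assumes there are no ties, which holds for this data set, so a rank is simply the position in the sorted pooled sample; the variable names are illustrative.

```python
x = [5.1, 6.3, 6.2, 5.5, 7.2, 5.7]   # Group 1
y = [3.1, 8.0, 5.6, 4.0, 6.5, 6.1]   # Group 2

# Pool the observations, remembering which group each one came from
pooled = sorted([(v, "X") for v in x] + [(v, "Y") for v in y])

# No ties here, so rank = position in the sorted pooled sample
rank_sum = {"X": 0, "Y": 0}
for rank, (_, group) in enumerate(pooled, start=1):
    rank_sum[group] += rank

print(rank_sum)
```

The two sums always add up to N(N + 1)/2 with N = 12, i.e. 78, which is a quick check on the ranking.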

Critical values
We can calculate the critical values ourselves.

We have 12 ranks, hence 12! possible permutations.

The most extreme case is that all observations in one group have the highest ranks and those of the other sample the lowest ranks: there are 6! permutations for group X and 6! permutations for group Y. The probability of this event is

$$p = \frac{6!\,6!}{12!} = 0.0011$$

At the other end of the spectrum we could have all the highest ranks for the observations in group Y and the lowest for those in group X. The probability of this would be

$$p = \frac{6!\,6!}{12!} = 0.0011$$

The next possibility is that the observations in group Y have the following ranks:
1, 2, 3, 4, 5, 7, sum = 22
The probability is again 0.0011.
We have to continue until we reach $\alpha$. This is very tedious...
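
The tedious enumeration is easy to hand to a computer. The sketch below lists every way of assigning 6 of the 12 ranks to group Y, which is equivalent to counting permutations because the order within a group does not matter. It assumes no ties, and the function name `lower_tail` is illustrative.

```python
from collections import Counter
from itertools import combinations

n_x, n_y = 6, 6
ranks = range(1, n_x + n_y + 1)

# Under H0 every choice of n_y ranks for group Y is equally likely
counts = Counter(sum(c) for c in combinations(ranks, n_y))
total = sum(counts.values())          # C(12, 6) equally likely assignments


def lower_tail(w):
    """P(rank-sum of Y <= w) under H0."""
    return sum(n for s, n in counts.items() if s <= w) / total


print(total, lower_tail(21), lower_tail(22))
```

The critical value for a one-sided test is the largest w for which `lower_tail(w)` does not exceed $\alpha$.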

The Mann-Whitney U-test

At face value the Mann-Whitney U-test is a different test from the Wilcoxon rank-sum test; however, we will show that the two tests are actually equivalent.
For the Mann-Whitney U-test we ask a different question: for each observation, how many observations in the other group are of lower rank, i.e. smaller?
If the population distributions have the same location, then the ranks in the two groups should be similar. A difference in the ranks suggests that the location is not the same.

Observation   Smaller Y's      Observation   Smaller X's
X1            2                Y1            0
X4            2                Y4            0
X6            3                Y3            2
X3            4                Y6            3
X2            4                Y5            5
X5            5                Y2            6

Hence for X the U-test statistic is $U_x = 2 + 2 + 3 + 4 + 4 + 5 = 20$, and for Y: $U_y = 0 + 0 + 2 + 3 + 5 + 6 = 16$.
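
The counting can be written directly as a double loop. In this sketch the function name `u_statistic` is illustrative, and a tie is counted as 1/2, which is the convention the reaction-time example uses later on.

```python
x = [5.1, 6.3, 6.2, 5.5, 7.2, 5.7]
y = [3.1, 8.0, 5.6, 4.0, 6.5, 6.1]


def u_statistic(a, b):
    """Number of (a_i, b_j) pairs with b_j < a_i; a tie counts as 1/2."""
    return sum(1.0 if bj < ai else 0.5 if bj == ai else 0.0
               for ai in a for bj in b)


u_x = u_statistic(x, y)
u_y = u_statistic(y, x)
print(u_x, u_y)
```

Every one of the 6 x 6 = 36 pairs is counted exactly once in total, so the two statistics always add up to 36.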

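Test statistic.
What are the possible values for U? $0, 1, 2, \ldots, n_1 n_2$ (with $n_1 = n_x$ and $n_2 = n_y$).
The distribution of U is symmetric about $n_1 n_2 / 2$.
For each c > 0 we have that

$$P\left(U \le \frac{n_1 n_2}{2} - c\right) = P\left(U \ge \frac{n_1 n_2}{2} + c\right)$$

Thus

$$P(U \le U_0) = P(U \ge n_1 n_2 - U_0)$$

Quick check: If $U_0 = \frac{n_1 n_2}{2} - c$ then

$$n_1 n_2 - U_0 = n_1 n_2 - \left(\frac{n_1 n_2}{2} - c\right) = \frac{n_1 n_2}{2} + c$$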

Critical values

The critical values are in Table 8 in the book by Wackerly et al. and are given for $P(U \le U_0)$, so only for one of the tails of the distribution. We therefore take the smaller of the two U's.

For a two-sided test we use $\alpha/2$: we choose $U_0$ such that $P(U \le U_0) = \alpha/2$ and reject if $U \le U_0$ or, alternatively, if $U \ge n_1 n_2 - U_0$.

In our example above the critical value is 8.

Connection to the Wilcoxon rank-sum test:

$$U_1 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - W_2,$$

or simpler,

$$U_1 = W_1 - \frac{n_1(n_1 + 1)}{2}.$$

From the Mann-Whitney U-statistic we can therefore calculate the Wilcoxon rank-sum test statistic, and vice versa. Hence, there is no need for two sets of critical values. If you prefer to order all observations and calculate the Wilcoxon test statistic, you can convert it to a U statistic and compare it to the tabulated critical values.

Example 2 (reaction time) Using Wilcoxon:

Ex 1   rank     Ex 2   rank
9.4    14       10.3   15
7.8    9.5      8.9    13
5.6    4        4.1    1
12.1   17       14.7   18
6.9    6        8.7    11
4.2    2        7.1    7
8.8    12       11.3   16
7.7    8        5.2    3
6.4    5        7.8    9.5

Rank sum of group 1: 14 + 9.5 + 4 + 17 + 6 + 2 + 12 + 8 + 5 = 77.5. Rank sum of group 2: 15 + 13 + 1 + 18 + 11 + 7 + 16 + 3 + 9.5 = 93.5.

Then according to the formula

$$U_i = 81 + \frac{90}{2} - W_j = \begin{cases} 32.5 & \text{for } W_j = 93.5 \\ 48.5 & \text{for } W_j = 77.5 \end{cases}$$

Mann-Whitney test directly.

Ex 1   exceeds     Ex 2   exceeds
9.4    6           10.3   8
7.8    3.5         8.9    7
5.6    2           4.1    0
12.1   8           14.7   9
6.9    2           8.7    6
4.2    1           7.1    4
8.8    5           11.3   8
7.7    3           5.2    1
6.4    2           7.8    5.5

$U_1 = 6 + 3.5 + 2 + 8 + 2 + 1 + 5 + 3 + 2 = 32.5$, and $U_2 = 8 + 7 + 0 + 9 + 6 + 4 + 8 + 1 + 5.5 = 48.5$.
The critical value is 17. The smaller statistic, 32.5, is above it, so we cannot reject.
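
The equivalence of the two routes can be checked numerically on the reaction-time data. This is a minimal sketch with illustrative helper names (`midrank`, `u_direct`); tied values share the average rank, and a tie counts as 1/2 in the direct count.

```python
exp1 = [9.4, 7.8, 5.6, 12.1, 6.9, 4.2, 8.8, 7.7, 6.4]
exp2 = [10.3, 8.9, 4.1, 14.7, 8.7, 7.1, 11.3, 5.2, 7.8]
pooled = exp1 + exp2


def midrank(v):
    """Rank of v in the pooled sample; tied values share the average rank."""
    smaller = sum(p < v for p in pooled)
    equal = sum(p == v for p in pooled)
    return smaller + (equal + 1) / 2


def u_direct(a, b):
    """Count pairs with b_j < a_i, counting a tie as 1/2."""
    return sum((bj < ai) + 0.5 * (bj == ai) for ai in a for bj in b)


n1, n2 = len(exp1), len(exp2)
w1 = sum(midrank(v) for v in exp1)     # Wilcoxon rank sum, experiment 1
w2 = sum(midrank(v) for v in exp2)     # Wilcoxon rank sum, experiment 2

# U obtained from the rank sums: U = W - n(n + 1)/2
u1_from_w = w1 - n1 * (n1 + 1) / 2
u2_from_w = w2 - n2 * (n2 + 1) / 2
print(w1, w2, u1_from_w, u2_from_w)
print(u_direct(exp1, exp2), u_direct(exp2, exp1))
```

Both routes give the same pair of U values, so only one table of critical values is needed.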

What if n is large?
For large n we use the normal approximation.

$$E(U) = \frac{n_1 n_2}{2},$$

and

$$Var(U) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}.$$

Hence the standardized U-test statistic is

$$Z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}.$$
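
The standardisation can be sketched as below. The function name `u_z` is illustrative, no correction for ties is applied, and the reaction-time sample (n1 = n2 = 9) is small, so the result is only indicative of how the formula is used.

```python
from math import sqrt
from statistics import NormalDist


def u_z(u, n1, n2):
    """Standardised Mann-Whitney statistic (no tie correction applied)."""
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 * (n1 + n2 + 1) / 12
    return (u - mean_u) / sqrt(var_u)


# Reaction-time example: U = 32.5 with n1 = n2 = 9
z = u_z(32.5, 9, 9)
p_two_sided = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.3f}")
```

Using the other statistic, U = 48.5, gives the same z with the opposite sign, which reflects the symmetry of the distribution of U.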
