
Manuscript Not for reproduction or citation without permission by the author

Algorithm for bootstrapping a distribution of $\alpha$

Klaus Krippendorff kkrippendorff@asc.upenn.edu

2006.08.16

Revised 2011.6.12

In the absence of a theoretically motivated distribution for $\alpha$, and especially because reliability data may be small and have various metrics (levels of measurement), the distribution of $\alpha$ is obtained by bootstrapping. It provides the probabilities of the $\alpha$-values that could be expected if very many similar samples of reliability data were coded. This bootstrapping algorithm randomly draws a great number of samples from the cell contents of a matrix of observed coincidences and obtains a hypothetical disagreement $D_o$ for each. Together with the original expected disagreement $D_e$, these give rise to a probability distribution, $p_\alpha$, of likely $\alpha$-values.

Given:

The square matrix of observed coincidences $o_{ck}$, which gave rise to the $\alpha$ as calculated, $\alpha = 1 - \dfrac{D_o}{D_e}$, including the total number $n_{\cdot\cdot} = \sum_{c=1}^{v} \sum_{k=1}^{v} o_{ck}$ of values contributing to pair comparisons.

The expected disagreement $D_e$ in the denominator of the observed $\alpha$.

The applicable metric difference ${}_{\text{metric}}\delta^{2}_{ck}$.

The number X of resamples to be drawn, chosen by the analyst.

The bootstrapping algorithm is defined in four steps:

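First.

Define the function ${}_{\text{metric}}\delta^{2}_{ck} = f(R)$, where R is a uniformly distributed random number between 0 and 1 within a continuum of adequate precision. That continuum is segmented by the probabilities

$p_{ck} = \dfrac{o_{ck}}{n_{\cdot\cdot}}$; $\sum_{c=1}^{v} \sum_{k=1}^{v} p_{ck} = 1$

so that each segment $p_{ck}$ of R is associated with its corresponding ${}_{\text{metric}}\delta^{2}_{ck}$:

[Diagram: the interval from R = 0 to R = 1, divided into consecutive segments of lengths $p_{11}, \ldots, p_{ck}, \ldots, p_{vv}$ and labeled with ${}_{\text{metric}}\delta^{2}_{11}, \ldots, {}_{\text{metric}}\delta^{2}_{ck}, \ldots, {}_{\text{metric}}\delta^{2}_{vv}$. The segment for cell c-k ends at the cumulative sum of the $p_{gh}$ over all cells g-h up to and including c-k.]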
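The segment lookup described above can be sketched as a cumulative-sum table searched by bisection. This is a minimal illustration, not the author's implementation: the name `make_f`, the list-of-lists matrices, and the row-by-row ordering of the cells are assumptions, and the 2-by-2 coincidence matrix with nominal differences in the usage lines is made-up toy data.

```python
import bisect
import itertools


def make_f(o, delta2):
    """Build f(R) from a v-by-v coincidence matrix o and a matching
    matrix delta2 of metric differences (both lists of lists)."""
    v = len(o)
    n_total = sum(sum(row) for row in o)  # n.. = sum of all o_ck
    # Only non-zero cells own a segment of the [0, 1) continuum.
    cells = [(c, k) for c in range(v) for k in range(v) if o[c][k] > 0]
    # Upper boundary of each segment: cumulative sum of p_ck = o_ck / n..
    bounds = list(itertools.accumulate(o[c][k] / n_total for c, k in cells))
    bounds[-1] = 1.0  # guard against floating-point shortfall

    def f(R):
        # Index of the first segment whose upper boundary exceeds R.
        i = min(bisect.bisect_right(bounds, R), len(cells) - 1)
        c, k = cells[i]
        return delta2[c][k]

    return f


# Toy data (assumed for illustration): segments 0.6, 0.1, 0.1, 0.2.
f = make_f([[6, 1], [1, 2]], [[0, 1], [1, 0]])
print(f(0.30), f(0.65), f(0.90))
```

Because the boundaries are sorted, each lookup costs a logarithmic number of comparisons in the number of non-zero cells, which matters when f(R) is called X times M times.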

Second. Determine the number M of random draws with replacement from the data, capped by a practical limit.

Let Q = the number of non-zero c-k coincidences, $o_{ck} > 0$:

$M = \min[25Q, (m-1)\,n_{\cdot\cdot}/2]$

Third. Bootstrap the distribution of $\alpha$:

Set the array $n_\alpha = 0$, where $-1 \le \alpha \le +1$ and $\alpha$ has at least 4 significant digits.

Do X times (X is chosen by the analyst; by default X = 20,000)
    SUM = 0
    Do M times
        Pick a random number R between 0 and 1 (uniform distribution)
        Determine ${}_{\text{metric}}\delta^{2}_{ck}$ by means of the function f(R)
        SUM <= SUM + ${}_{\text{metric}}\delta^{2}_{ck}$
    $\alpha = 1 - \dfrac{\text{SUM}}{M \cdot D_e}$
    If $\alpha < -1.000$: $n_{\alpha=-1}$ <= $n_{\alpha=-1} + 1$
    Otherwise: $n_\alpha$ <= $n_\alpha + 1$

Fourth. Correct the frequencies $n_\alpha$ for situations in which the lack of variation should cause $\alpha$ to be indeterminate ($\alpha = 1 - 0/0$):

Set $n_x = 0$.

If the matrix of coincidences contains exactly one non-zero diagonal cell, $o_{cc} > 0$:

$n_x = n_{\alpha=1}$ and $n_{\alpha=1} = 0$

If the matrix of coincidences contains two or more non-zero diagonal cells, $o_{cc} > 0$:

$n_x = X \sum_{c=1}^{v} \left( \dfrac{o_{cc}}{n_{\cdot\cdot}} \right)^{M}$ and $n_{\alpha=1}$ <= $n_{\alpha=1} - n_x$

The resulting distribution of $\alpha$ is expressed in terms of the probabilities $p_\alpha = \dfrac{n_\alpha}{X - n_x}$.
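The four steps can be put together in a short sketch. It is an illustration under stated assumptions rather than a reference implementation: the function name `bootstrap_alpha`, the `seed` parameter, rounding $\alpha$ to four decimals, truncating M to an integer, and clamping $n_x$ so that no frequency becomes negative are all choices made here, and `m` is passed in as a plain parameter because the text does not define it. The toy matrix and its `D_e` value in the usage lines are made up for the example.

```python
import bisect
import itertools
import random
from collections import Counter


def bootstrap_alpha(o, delta2, D_e, m, X=20000, seed=None):
    """Sketch of the four steps; returns (p_alpha, n_x, M)."""
    rng = random.Random(seed)
    v = len(o)
    n_total = sum(sum(row) for row in o)  # n..

    # First: segment [0, 1) by p_ck = o_ck / n.. over the non-zero cells.
    cells = [(c, k) for c in range(v) for k in range(v) if o[c][k] > 0]
    bounds = list(itertools.accumulate(o[c][k] / n_total for c, k in cells))
    bounds[-1] = 1.0  # guard against floating-point shortfall
    values = [delta2[c][k] for c, k in cells]

    # Second: M = min[25Q, (m - 1) n.. / 2]; truncation is an assumption.
    Q = len(cells)
    M = int(min(25 * Q, (m - 1) * n_total / 2))

    # Third: X resamples of M draws each, tallied per alpha value.
    counts = Counter()
    for _ in range(X):
        total = 0.0
        for _ in range(M):
            i = bisect.bisect_right(bounds, rng.random())
            total += values[min(i, Q - 1)]
        alpha = round(1.0 - total / (M * D_e), 4)
        counts[max(alpha, -1.0)] += 1  # values below -1 are counted at -1

    # Fourth: remove resamples whose alpha = 1 should be indeterminate.
    diagonal = [o[c][c] for c in range(v) if o[c][c] > 0]
    n_x = 0.0
    if len(diagonal) == 1:
        n_x = counts.pop(1.0, 0)
    elif len(diagonal) >= 2:
        expected = X * sum((occ / n_total) ** M for occ in diagonal)
        n_x = min(expected, counts[1.0])  # clamp is an added guard
        counts[1.0] -= n_x

    denom = X - n_x
    p_alpha = {a: n / denom for a, n in counts.items()} if denom > 0 else {}
    return p_alpha, n_x, M


# Toy nominal data (assumed): marginals 7 and 3, so D_e = 2*7*3 / (10*9).
p_alpha, n_x, M = bootstrap_alpha(
    [[6, 1], [1, 2]], [[0, 1], [1, 0]], D_e=42 / 90, m=2, X=2000, seed=1
)
print(M, round(n_x, 2), sorted(p_alpha))
```

Fixing `seed` makes a run repeatable, which helps when comparing the effect of different X on the stability of the resulting distribution.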

This distribution offers two important statistical properties of $\alpha$:

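The confidence interval for $\alpha$ at a chosen level p of statistical significance (two-tailed), $\alpha_{\text{smallest}} \le \alpha \le \alpha_{\text{largest}}$:

$\alpha_{\text{smallest}}$ = the smallest $\alpha$ for which $\sum_{\alpha' \le \alpha} \dfrac{n_{\alpha'}}{X - n_x} \ge \dfrac{p}{2}$

$\alpha_{\text{largest}}$ = the largest $\alpha$ for which $\sum_{\alpha' \le \alpha} \dfrac{n_{\alpha'}}{X - n_x} \le 1 - \dfrac{p}{2}$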

The probability q that the reliability data fail to reach the smallest acceptable $\alpha_{\text{min}}$:

$q = \sum_{\alpha < \alpha_{\text{min}}} \dfrac{n_\alpha}{X - n_x}$
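Both properties can be read off the bootstrapped distribution with one pass over the sorted $\alpha$-values. The sketch below assumes the distribution is held as a dictionary mapping each $\alpha$-value to its probability $n_\alpha / (X - n_x)$. The names `confidence_interval` and `prob_below` are hypothetical, and the reading of the interval bounds as the cumulative p/2 and 1 - p/2 points is an assumption about the intended inequalities. The small distribution in the usage lines is made up.

```python
def confidence_interval(p_alpha, p=0.05):
    """Two-tailed interval: the smallest alpha whose cumulative probability
    reaches p/2 and the largest alpha whose cumulative probability does
    not exceed 1 - p/2 (assumed reading of the bounds)."""
    smallest = largest = None
    cumulative = 0.0
    for a in sorted(p_alpha):
        cumulative += p_alpha[a]
        if smallest is None and cumulative >= p / 2:
            smallest = a
        if cumulative <= 1 - p / 2:
            largest = a
    return smallest, largest


def prob_below(p_alpha, alpha_min):
    """q: total probability of alpha-values below alpha_min."""
    return sum(pr for a, pr in p_alpha.items() if a < alpha_min)


# Made-up distribution for illustration only.
dist = {0.2: 0.1, 0.6: 0.5, 0.8: 0.4}
print(confidence_interval(dist, p=0.05))
print(prob_below(dist, alpha_min=0.667))
```

With only a handful of distinct $\alpha$-values, as in this toy distribution, the interval is coarse. With four-digit $\alpha$-values and the default X = 20,000 the cumulative steps are much finer.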