
Mean and Variance of the HyperGeometric Distribution Page 1

In a drawing of $n$ distinguishable objects without replacement from a set of $N$ ($n < N$) distinguishable objects, $a$ of which have characteristic A ($a < N$), the probability that exactly $x$ objects in the draw of $n$ have the characteristic A is given by the number of different ways the $x$ objects can be chosen from the $a$ available, times the number of different ways the $n-x$ objects in the draw which don't have A can be chosen from the $N-a$ available, divided by the number of different ways $n$ distinguishable objects can be chosen from a set of $N$. The resulting probability distribution for the random variable $x$ is called the hypergeometric distribution. In symbols,
$$f(x) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}.$$
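As a quick numerical illustration of this formula, the following minimal sketch evaluates $f(x)$ with exact rational arithmetic. The helper names `binom` and `hypergeom_pmf`, and the card-deck values for $N$, $a$ and $n$, are assumptions chosen for illustration only. The helper `binom` returns zero when either $j$ or $k-j$ is negative, so impossible draws get probability zero.

```python
from fractions import Fraction
from math import comb


def binom(k: int, j: int) -> int:
    """Binomial coefficient, taken to be zero when j or k - j is negative."""
    if j < 0 or k - j < 0:
        return 0
    return comb(k, j)


def hypergeom_pmf(x: int, N: int, a: int, n: int) -> Fraction:
    """Probability of exactly x objects with characteristic A in a draw of n."""
    return Fraction(binom(a, x) * binom(N - a, n - x), binom(N, n))


# Illustrative values (not from the text): 5 cards drawn from a
# 52-card deck that contains 13 hearts.
N, a, n = 52, 13, 5
probs = [hypergeom_pmf(x, N, a, n) for x in range(n + 1)]

for x, p in enumerate(probs):
    print(f"f({x}) = {p} ~ {float(p):.6f}")
print("total probability:", sum(probs))
```

Using `Fraction` keeps every probability exact, so the total can be compared with 1 without rounding concerns.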
The binomial coefficient $\binom{k}{j} = \frac{k!}{j!\,(k-j)!}$ is defined to be zero if either $j$ or $k-j$ is negative, so that the probability of the null event of drawing more objects than those available is zero. To prove that

$$\sum_{x=0}^{n} f(x) = \sum_{x=0}^{n} \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}} = 1,$$

consider the factorization $(B+C)^N = (B+C)^a (B+C)^{N-a}$. From the binomial theorem,
$$(B+C)^a (B+C)^{N-a} = \sum_{j=0}^{a} \binom{a}{j} B^{a-j} C^{j} \sum_{l=0}^{N-a} \binom{N-a}{l} B^{N-a-l} C^{l} = \sum_{j=0}^{a} \sum_{l=0}^{N-a} \binom{a}{j} \binom{N-a}{l} B^{N-(l+j)} C^{l+j}.$$

Now use the diagonal rearrangement suggested by the figure below: set $l = n - j$, with the intercept $n$ running from 0 to $N$ and $j$ running from 0 to $a$. This generates more than the $(a+1)(N-a+1)$ terms in the above sum. However, all of the new terms generated vanish, since they have either $l > N-a$ or $l < 0$.
$$(B+C)^a (B+C)^{N-a} = \sum_{n=0}^{N} \sum_{j=0}^{a} \binom{a}{j} \binom{N-a}{n-j} B^{N-n} C^{n}.$$
Now, for $n > a$, extending the sum over $j$ to $n$ would only add terms which are zero, because of the factor $\binom{a}{j}$. Similarly, if $n < a$, the terms in the sum over $j$ from $j = n+1$ to $j = a$ are all zero due to the factor $\binom{N-a}{n-j}$. Thus,
$$(B+C)^a (B+C)^{N-a} = \sum_{n=0}^{N} \sum_{j=0}^{a} \binom{a}{j} \binom{N-a}{n-j} B^{N-n} C^{n} = \sum_{n=0}^{N} \sum_{j=0}^{n} \binom{a}{j} \binom{N-a}{n-j} B^{N-n} C^{n}.$$

Al Lehnen Madison Area Technical College 11/30/2011



But from a second use of the binomial theorem,

$$(B+C)^a (B+C)^{N-a} = \sum_{n=0}^{N} \sum_{j=0}^{n} \binom{a}{j} \binom{N-a}{n-j} B^{N-n} C^{n} = (B+C)^N = \sum_{n=0}^{N} \binom{N}{n} B^{N-n} C^{n}.$$
The only way the two sums can be equal for all values of $B$ and $C$ is for

$$\sum_{j=0}^{n} \binom{a}{j} \binom{N-a}{n-j} = \binom{N}{n}. \tag{1}$$
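Equation (1) can also be spot-checked by brute force. The sketch below is only a sanity check over small values, not a substitute for the proof; the function names `vandermonde_lhs` and `check_identity` and the bound `max_N` are hypothetical choices made for this illustration.

```python
from math import comb


def binom(k: int, j: int) -> int:
    """Binomial coefficient, taken to be zero when j or k - j is negative."""
    if j < 0 or k - j < 0:
        return 0
    return comb(k, j)


def vandermonde_lhs(N: int, a: int, n: int) -> int:
    """Left-hand side of Equation (1): sum over j of C(a, j) * C(N - a, n - j)."""
    return sum(binom(a, j) * binom(N - a, n - j) for j in range(n + 1))


def check_identity(max_N: int = 12) -> bool:
    """Compare both sides of Equation (1) for every 0 <= a, n <= N <= max_N."""
    return all(
        vandermonde_lhs(N, a, n) == binom(N, n)
        for N in range(max_N + 1)
        for a in range(N + 1)
        for n in range(N + 1)
    )


print(check_identity())
```

Because the arithmetic is done on Python integers, the comparison is exact for every triple tested.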
This in turn implies that the hypergeometric probabilities do indeed form a valid probability distribution, i.e.

$$\sum_{x=0}^{n} f(x) = \sum_{x=0}^{n} \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}} = 1.$$
The mean or expected value of the hypergeometric random variable is given by

$$\mu_x = \langle x \rangle = \sum_{x=0}^{n} x\, f(x) = \binom{N}{n}^{-1} \sum_{x=0}^{n} x \binom{a}{x} \binom{N-a}{n-x}.$$
Now, using Equation (1),


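$$\sum_{x=0}^{n} x \binom{a}{x} \binom{N-a}{n-x} = \sum_{x=1}^{n} \frac{x\, a!}{x!\,(a-x)!} \binom{N-a}{n-x} = \sum_{x=1}^{n} \frac{a\,(a-1)!}{(x-1)!\,[(a-1)-(x-1)]!} \binom{(N-1)-(a-1)}{(n-1)-(x-1)}$$

$$= a \sum_{x=0}^{n-1} \frac{(a-1)!}{x!\,[(a-1)-x]!} \binom{(N-1)-(a-1)}{(n-1)-x} = a \sum_{x=0}^{n-1} \binom{a-1}{x} \binom{(N-1)-(a-1)}{(n-1)-x}$$

$$= a \binom{N-1}{n-1}.$$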
This gives that

$$\mu_x = \langle x \rangle = \sum_{x=0}^{n} x\, f(x) = a \binom{N}{n}^{-1} \binom{N-1}{n-1} = a\, \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!} = \frac{na}{N}.$$
Using the notation of the binomial distribution that $p = \frac{a}{N}$, we see that the expected value of $x$ is the same for both drawing without replacement (the hypergeometric distribution) and with replacement (the binomial distribution).

$$\mu_x = \langle x \rangle = \frac{na}{N} = np \tag{2}$$
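The closed form for the mean can be compared with a direct summation of $x\,f(x)$. This is a minimal sketch assuming $0 \le a \le N$ and $0 \le n \le N$; the function name `hypergeom_mean` and the numerical values are illustrative assumptions, not part of the derivation.

```python
from fractions import Fraction
from math import comb


def hypergeom_mean(N: int, a: int, n: int) -> Fraction:
    """Mean of x obtained by summing x * f(x) over x = 0..n.

    Assumes 0 <= a <= N and 0 <= n <= N, so every argument passed to
    math.comb is non-negative (comb returns 0 when the lower index
    exceeds the upper one).
    """
    total = sum(x * comb(a, x) * comb(N - a, n - x) for x in range(n + 1))
    return Fraction(total, comb(N, n))


# Illustrative values (not from the text).
N, a, n = 52, 13, 5
p = Fraction(a, N)

mean_by_sum = hypergeom_mean(N, a, n)
mean_closed_form = n * p  # the value n*a/N = n*p

print(mean_by_sum, mean_closed_form)
```

The same value $np$ is the mean of a binomial distribution with $n$ trials and success probability $p$, which is the comparison made in the text.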

The variance of the hypergeometric distribution can be computed from the generic formula that $\sigma_x^2 = \langle (x - \mu_x)^2 \rangle = \langle x^2 \rangle - \mu_x^2$. Again from Equation (1),
$$\sum_{x=0}^{n} x(x-1) \binom{a}{x} \binom{N-a}{n-x} = \sum_{x=2}^{n} \frac{x(x-1)\, a!}{x!\,(a-x)!} \binom{N-a}{n-x} = \sum_{x=2}^{n} \frac{a(a-1)\,(a-2)!}{(x-2)!\,[(a-2)-(x-2)]!} \binom{(N-2)-(a-2)}{(n-2)-(x-2)}$$

$$= a(a-1) \sum_{x=0}^{n-2} \frac{(a-2)!}{x!\,[(a-2)-x]!} \binom{(N-2)-(a-2)}{(n-2)-x} = a(a-1) \sum_{x=0}^{n-2} \binom{a-2}{x} \binom{(N-2)-(a-2)}{(n-2)-x}$$

$$= a(a-1) \binom{N-2}{n-2}.$$
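So,

$$\langle x(x-1) \rangle = \binom{N}{n}^{-1} \sum_{x=0}^{n} x(x-1) \binom{a}{x} \binom{N-a}{n-x} = \binom{N}{n}^{-1} a(a-1) \binom{N-2}{n-2}$$

$$= a(a-1)\, \frac{(N-2)!}{(n-2)!\,(N-n)!} \cdot \frac{(N-n)!\, n!}{N!} = \frac{a(a-1)\, n(n-1)}{N(N-1)}$$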
and

$$\langle x^2 \rangle = \langle x(x-1) \rangle + \langle x \rangle = \frac{a(a-1)\, n(n-1)}{N(N-1)} + \frac{an}{N} = \frac{an}{N} \left[ \frac{(a-1)(n-1)}{N-1} + 1 \right].$$
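The two moment formulas just obtained can be checked against direct summation. The sketch below assumes $0 \le a \le N$, $0 \le n \le N$ and $N > 1$; the helper `expectation` and the chosen values are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from math import comb
from typing import Callable


def expectation(g: Callable[[int], int], N: int, a: int, n: int) -> Fraction:
    """Expected value of g(x) under the hypergeometric distribution.

    Assumes 0 <= a <= N and 0 <= n <= N so all math.comb arguments
    are non-negative.
    """
    total = sum(g(x) * comb(a, x) * comb(N - a, n - x) for x in range(n + 1))
    return Fraction(total, comb(N, n))


# Illustrative values (not from the text).
N, a, n = 52, 13, 5

# Factorial moment <x(x-1)> and second moment <x^2> by direct summation.
factorial_moment = expectation(lambda x: x * (x - 1), N, a, n)
second_moment = expectation(lambda x: x * x, N, a, n)

# Closed forms from the derivation.
factorial_closed = Fraction(a * (a - 1) * n * (n - 1), N * (N - 1))
second_closed = Fraction(a * n, N) * (Fraction((a - 1) * (n - 1), N - 1) + 1)

print(factorial_moment == factorial_closed, second_moment == second_closed)
```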


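Thus,

$$\sigma_x^2 = \langle x^2 \rangle - \mu_x^2 = \frac{an}{N} \left[ \frac{(a-1)(n-1)}{N-1} + 1 \right] - \frac{an}{N} \cdot \frac{an}{N} = \frac{an}{N} \left[ \frac{N(a-1)(n-1)}{N(N-1)} + \frac{N(N-1)}{N(N-1)} - \frac{an(N-1)}{N(N-1)} \right]$$

$$= \frac{an}{N} \left[ \frac{Nan - Na - Nn + N + N^2 - N - Nan + an}{N(N-1)} \right] = \frac{an}{N} \left[ \frac{N^2 - Na - Nn + an}{N(N-1)} \right]$$

$$= \frac{an}{N} \left[ \frac{N(N-a) - n(N-a)}{N(N-1)} \right] = \frac{an}{N} \left[ \frac{(N-n)(N-a)}{N(N-1)} \right] = \frac{an}{N} \cdot \frac{N-a}{N} \cdot \frac{N-n}{N-1}$$

$$= \frac{an}{N} \left( 1 - \frac{a}{N} \right) \frac{N-n}{N-1} = np(1-p)\, \frac{N-n}{N-1}.$$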

The last factor, $\frac{N-n}{N-1}$, is called the finite population correction and is the reason that the variance of the binomial distribution, $np(1-p)$, differs from that of the hypergeometric distribution. For $N$ large compared to the sample size $n$, the two distributions are essentially identical.
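The variance formula and the behaviour of the finite population correction can be illustrated numerically. This is a minimal sketch assuming $0 \le a \le N$, $0 \le n \le N$ and $N > 1$; the function names and the sample values of $N$, $a$ and $n$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_variance(N: int, a: int, n: int) -> Fraction:
    """Variance computed directly as <x^2> - <x>^2 by summing over x = 0..n."""
    denom = comb(N, n)
    weights = [comb(a, x) * comb(N - a, n - x) for x in range(n + 1)]
    mean = Fraction(sum(x * w for x, w in enumerate(weights)), denom)
    second = Fraction(sum(x * x * w for x, w in enumerate(weights)), denom)
    return second - mean**2


def finite_population_correction(N: int, n: int) -> Fraction:
    """The factor (N - n) / (N - 1); requires N > 1."""
    return Fraction(N - n, N - 1)


def closed_form_variance(N: int, a: int, n: int) -> Fraction:
    """n p (1 - p) times the finite population correction, with p = a / N."""
    p = Fraction(a, N)
    return n * p * (1 - p) * finite_population_correction(N, n)


# Illustrative values (not from the text).
print(hypergeom_variance(52, 13, 5) == closed_form_variance(52, 13, 5))

# With the sample size fixed at n = 10, the correction approaches 1 as N grows.
corrections = [finite_population_correction(N, 10) for N in (20, 200, 2000, 20000)]
for N, c in zip((20, 200, 2000, 20000), corrections):
    print(N, float(c))
```

As the correction tends to 1, the hypergeometric variance tends to the binomial value $np(1-p)$.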
