
Properties of the Confidence Interval for the Negative Binomial Distribution
Advisor: Dr. Dyer
Jason Gilgenbach
University of Texas at Arlington
November 2009
Jason Gilgenbach (UTA) 11/09 1 / 26
Introduction
Review the negative binomial (NB) distribution
Review interval estimation
Derive a confidence interval (CI) for the parameter p in the NB
distribution
Investigate properties of the CI for the NB distribution
Investigate using the midpoint of the CI as an estimate of p
Derive a CI for p in the truncated NB distribution
Investigate properties of the CI for the truncated NB distribution
Negative Binomial Distribution
Used when conducting independent Bernoulli trials until a specified
number of successes is achieved
Probability density function

f(x; r, p) = \binom{x-1}{r-1} p^{r} (1-p)^{x-r}

r \in \{1, 2, \ldots\} is the number of successes
x = r, r+1, \ldots is the number of trials required to achieve r successes
p \in [0, 1] is the probability of success on each trial
Cumulative distribution function

F(x; r, p) = \sum_{k=r}^{x} f(k; r, p) = F_{BETA}(p;\, r,\, x-r+1)

F_{BETA}(y; a, b) is the CDF for the beta (BETA) distribution
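The beta-CDF identity above is easy to check numerically. The sketch below is my own illustration (not from the slides): at integer parameters, F_BETA(p; r, x-r+1) equals a binomial tail probability, so both sides can be computed with exact combinatorics from the standard library.

```python
# Sanity check of F(x; r, p) = F_BETA(p; r, x - r + 1), pure stdlib.
from math import comb

def nb_cdf(x, r, p):
    """P(X <= x) for X = number of trials needed to reach r successes."""
    return sum(comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)
               for k in range(r, x + 1))

def beta_cdf_int(p, a, b):
    """F_BETA(p; a, b) for integer a, b, via the binomial-tail identity
    I_p(a, b) = P(Bin(a + b - 1, p) >= a)."""
    m = a + b - 1
    return sum(comb(m, k) * p**k * (1 - p)**(m - k) for k in range(a, m + 1))

r, p = 3, 0.2
for x in range(r, 30):
    assert abs(nb_cdf(x, r, p) - beta_cdf_int(p, r, x - r + 1)) < 1e-12
```

The two sides are computed from entirely different sums, so their agreement is a genuine (if small) check of the identity.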
Interval Estimation
The pivotal quantity method cannot be used for the NB distribution
The general method can be used
Theorem: Suppose that the statistic S is discrete with CDF G(s; \theta), which is
an increasing function of \theta. Then the following statements hold:
1. A one-sided lower 100(1-\alpha)% confidence limit, \theta_L, is provided by a
solution of G(s; \theta_L) = \alpha
2. A one-sided upper 100(1-\alpha)% confidence limit, \theta_U, is provided by a
solution of G(s-1; \theta_U) = 1-\alpha
Confidence Interval Derivation
It can be shown that
X_i \sim NB(1, p) = GEO(p), i = 1, \ldots, n
S = \sum_{i=1}^{n} X_i is a sufficient statistic for p
S \sim NB(n, p)

F_S(s; n, p) = \int_0^p \frac{\Gamma(s+1)}{\Gamma(n)\,\Gamma(s-n+1)}\, y^{n-1} (1-y)^{s-n} \, dy

And so the CDF G(s; p) is an increasing function of p
Apply the previous theorem to solve for p_L and p_U:

p_L = G^{-1}(\alpha; s) = BETAINV(\alpha;\, n,\, s-n+1)
p_U = G^{-1}(1-\alpha; s-1) = BETAINV(1-\alpha;\, n,\, s-n)

Note there is no solution for p_U when s = n, so set p_U = 1
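As a sketch of how these limits can be computed (my own illustration; the slides only name BETAINV): for integer parameters the beta CDF reduces to a binomial tail, so BETAINV can be obtained by bisection in pure Python.

```python
from math import comb

def beta_cdf_int(p, a, b):
    """F_BETA(p; a, b) for integer a, b >= 1 (binomial-tail identity)."""
    m = a + b - 1
    return sum(comb(m, k) * p**k * (1 - p)**(m - k) for k in range(a, m + 1))

def betainv_int(q, a, b, tol=1e-12):
    """BETAINV(q; a, b): invert the (increasing) beta CDF by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def nb_ci(s, n, alpha=0.05):
    """One-sided limits p_L, p_U after s total trials for n successes."""
    p_l = betainv_int(alpha, n, s - n + 1)
    # There is no solution for p_U when s = n, so set p_U = 1.
    p_u = 1.0 if s == n else betainv_int(1 - alpha, n, s - n)
    return p_l, p_u
```

For example, n = 1 success observed on trial s = 300 gives p_U just below 0.01, consistent with the quality demonstration table on a later slide.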
Properties of the Confidence Interval
Assume that we are generally searching for rare events, i.e. p \le 0.1, and so
p_L \approx 0
length(CI) = p_U
midpoint(CI) = p_U / 2
Since the NB distribution is discrete, the confidence interval estimate
is "conservative"
Following are graphs of
the midpoint as a function of s for fixed n and \alpha, and
the simulated confidence
Quality demonstration values can also be determined as a function of
p and \alpha
Midpoint as a Function of s
[Figure: midpoint = p_U/2 of the 95% confidence interval versus s (0 to 250), shown for n = 1, 2, 3, 5, 10]
Simulated Confidence
[Figure: simulated confidence of the 95% interval versus n (1 to 5) and p (0.01 to 0.1); the simulated confidence ranges from about 0.95 to 0.975]
Quality Demonstration
Question: For a given confidence, the nth failure should occur on or after
what trial to demonstrate that the process failure rate is p?

  p     n    s    p_U        p     n    s    p_U
 0.01   1   300   0.01      0.03   1   100   0.0298
        2   474   0.01             2   158   0.0299
        3   629   0.01             3   209   0.0300
        4   774   0.01             4   258   0.0299
        5   914   0.01             5   304   0.0300
 0.02   1   150   0.0199    0.05   1    60   0.0495
        2   237   0.0199           2    94   0.0500
        3   314   0.0200           3   125   0.0499
        4   387   0.0200           4   154   0.0499
        5   457   0.0200           5   182   0.0499
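These table values can be reproduced by a direct search, sketched below under the assumption that SciPy is available (its `beta.ppf` plays the role of BETAINV). Note that here the rare event (a failure) plays the role of the NB "success", so p is the failure rate.

```python
from scipy.stats import beta

def demo_trials(p_target, n, alpha=0.05):
    """Smallest s with p_U = BETAINV(1 - alpha; n, s - n) <= p_target,
    i.e. the n-th rare event must occur on or after trial s."""
    s = n + 1
    while beta.ppf(1 - alpha, n, s - n) > p_target:
        s += 1
    return s

# Reproduces the n = 1 rows of the table above:
for p_target, expected in [(0.01, 300), (0.02, 150), (0.03, 100), (0.05, 60)]:
    assert demo_trials(p_target, 1) == expected
```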
Using the Midpoint as an Estimator for p
Investigate how well the midpoint of the CI (p_CI) estimates p:
Bias
Mean Squared Error (MSE)
Compare the bias and MSE of p_CI with those of
the Maximum Likelihood Estimator p_MLE
the Minimum Variance Unbiased Estimator p_MVUE
Initially all bias and MSE values were simulated, but exact solutions
can be calculated for p_MLE and p_MVUE
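The simulated values on the next two slides can be produced with a small Monte Carlo routine of the following shape. This is my own sketch, assuming SciPy is available; it uses the deck's approximation midpoint = p_U/2 and the one-sided 95% setup.

```python
import math
import random
from scipy.stats import beta

def midpoint_bias_mse(p, n, alpha=0.05, reps=20000, seed=1):
    """Simulated bias and MSE of p_CI = p_U / 2."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        # S = trials to the n-th success: a sum of n geometric variates,
        # drawn by inverse transform.
        s = sum(math.ceil(math.log(rng.random()) / math.log(1.0 - p))
                for _ in range(n))
        p_u = 1.0 if s == n else beta.ppf(1 - alpha, n, s - n)
        errors.append(p_u / 2 - p)
    bias = sum(errors) / reps
    mse = sum(e * e for e in errors) / reps
    return bias, mse
```

For small p and small n the routine returns a positive bias, consistent with the bias plots that follow.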
Simulated Bias for the Midpoint of the Confidence Interval
[Figure: simulated bias of the midpoint of the 95% confidence interval versus p (0.01 to 0.1), for n = 1, 2, 3, 5, 10]
Simulated MSE for the Midpoint of the Confidence Interval
[Figure: simulated MSE of the midpoint of the 95% confidence interval versus p (0.01 to 0.1), for n = 1, 2, 3, 5, 10]
Exact Solution for the MLE Bias and MSE
It can be shown that

p_{MLE} = \frac{n}{s}

Bias(p_{MLE}) = E[p_{MLE} - p] = n\,E\!\left[\frac{1}{s}\right] - p

MSE(p_{MLE}) = E\!\left[(p_{MLE} - p)^2\right] = n^2\,E\!\left[\frac{1}{s^2}\right] - 2np\,E\!\left[\frac{1}{s}\right] + p^2

Solving for E[1/s] and E[1/s^2] is difficult since it involves computing

\sum_{s=n}^{\infty} \binom{s-1}{n-1}\left(\frac{1}{s} \text{ or } \frac{1}{s^2}\right) p^{n} (1-p)^{s-n}
Exact Solution for the MLE Bias and MSE Continued
With some difficulty it can be shown that

E\left[\frac{1}{s}\right] =
\begin{cases}
-\dfrac{p}{1-p}\,\ln p, & n = 1 \\[1ex]
\left(\dfrac{p}{1-p}\right)^{n}(-1)^{n}\ln p + \displaystyle\sum_{j=2}^{n}\frac{(-1)^{j}}{n-j+1}\left(\frac{p}{1-p}\right)^{j-1}, & n \ge 2
\end{cases}

E\left[\frac{1}{s^2}\right] =
\begin{cases}
\dfrac{p}{1-p}\,\mathrm{Li}_2(1-p), & n = 1 \\[1ex]
-\left(\dfrac{p}{1-p}\right)^{2}\left(\mathrm{Li}_2(1-p) + \ln p\right), & n = 2 \\[1ex]
(-1)^{n+1}\left(\dfrac{p}{1-p}\right)^{n}\left(\mathrm{Li}_2(1-p) + \displaystyle\sum_{j=2}^{n}\frac{\ln p}{n-j+1}\right) + \displaystyle\sum_{j=2}^{n-1}\sum_{i=2}^{n-j+1}\frac{(-1)^{j+i}}{(n-j+1)(n-j-i+2)}\left(\frac{p}{1-p}\right)^{j+i-2}, & n \ge 3
\end{cases}

where \mathrm{Li}_2(x) = \sum_{k=1}^{\infty} \frac{x^k}{k^2} is the 2nd-order polylogarithm
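Since several minus signs are easy to lose in expressions like these, the closed forms can be checked against the defining series directly. The following pure-Python verification is my own sketch, not part of the original derivation.

```python
import math

def li2(x, terms=2000):
    """2nd-order polylogarithm Li_2(x) = sum_{k>=1} x^k / k^2, for |x| < 1."""
    return sum(x**k / k**2 for k in range(1, terms + 1))

def e_inv_s_exact(n, p):
    """Closed form for E[1/s] stated above."""
    r = p / (1 - p)
    if n == 1:
        return -r * math.log(p)
    return (r**n * (-1)**n * math.log(p)
            + sum((-1)**j / (n - j + 1) * r**(j - 1) for j in range(2, n + 1)))

def e_inv_s2_exact(n, p):
    """Closed form for E[1/s^2] stated above."""
    r = p / (1 - p)
    if n == 1:
        return r * li2(1 - p)
    if n == 2:
        return -r**2 * (li2(1 - p) + math.log(p))
    main = (-1)**(n + 1) * r**n * (
        li2(1 - p) + sum(math.log(p) / (n - j + 1) for j in range(2, n + 1)))
    extra = sum((-1)**(j + i) / ((n - j + 1) * (n - j - i + 2)) * r**(j + i - 2)
                for j in range(2, n) for i in range(2, n - j + 2))
    return main + extra

def e_series(n, p, k, smax=5000):
    """Direct (truncated) series for E[1/s^k], S ~ NB(n, p)."""
    return sum(math.comb(s - 1, n - 1) / s**k * p**n * (1 - p)**(s - n)
               for s in range(n, smax + 1))

p = 0.3
for n in (1, 2, 3, 5):
    assert abs(e_inv_s_exact(n, p) - e_series(n, p, 1)) < 1e-9
    assert abs(e_inv_s2_exact(n, p) - e_series(n, p, 2)) < 1e-9
```

A handy spot check: for n = 2, p = 1/2 the closed form gives E[1/s] = 1 - ln 2.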
Derivation of the MLE Bias
From the CDF of the NB distribution,

\sum_{s=n}^{\infty} \binom{s-1}{n-1} p^{n} (1-p)^{s-n} = 1

Multiply both sides by p^{-n}(1-p)^{n-1} to get

\sum_{s=n}^{\infty} \binom{s-1}{n-1} (1-p)^{s-1} = p^{-n}(1-p)^{n-1}

Integrate both sides with respect to p to get

-\sum_{s=n}^{\infty} \binom{s-1}{n-1} \frac{1}{s} (1-p)^{s} + C = \int p^{-n}(1-p)^{n-1}\,dp
Derivation of the MLE Bias Continued
Let t = \frac{1-p}{p} \;\Longrightarrow\; \int p^{-n}(1-p)^{n-1}\,dp = -\int \frac{t^{n-1}}{t+1}\,dt

Note that

t^{n-1} =
\begin{cases}
1, & n = 1 \\
(-1)^{n+1} + (t+1)\displaystyle\sum_{j=2}^{n} (-1)^{j}\, t^{n-j}, & n \ge 2
\end{cases}

And so

\int \frac{t^{n-1}}{t+1}\,dt =
\begin{cases}
\ln(1+t) + C, & n = 1 \\
(-1)^{n+1}\ln(1+t) + \displaystyle\sum_{j=2}^{n} \frac{(-1)^{j}}{n-j+1}\, t^{\,n-j+1} + C, & n \ge 2
\end{cases}
Derivation of the MLE Bias Continued
Recall that t = \frac{1-p}{p} and 1+t = \frac{1}{p}

And so

\int p^{-n}(1-p)^{n-1}\,dp =
\begin{cases}
\ln p + C, & n = 1 \\
(-1)^{n+1}\ln p + \displaystyle\sum_{j=2}^{n} \frac{(-1)^{j+1}}{n-j+1}\left(\frac{1-p}{p}\right)^{n-j+1} + C, & n \ge 2
\end{cases}
= -\sum_{s=n}^{\infty} \binom{s-1}{n-1}\frac{1}{s}(1-p)^{s} + C

Solve for the constant by letting p = 1

Finally, multiply both sides by -p^{n}(1-p)^{-n} to get the desired result
Derivation of the MLE MSE
Using the previous result and the same process as for E[1/s], it can be shown that

E\left[\frac{1}{s^2}\right] =
\begin{cases}
\dfrac{p}{1-p}\,\mathrm{Li}_2(1-p), & n = 1 \\[1ex]
-\left(\dfrac{p}{1-p}\right)^{2}\left(\mathrm{Li}_2(1-p) + \ln p\right), & n = 2 \\[1ex]
(-1)^{n+1}\left(\dfrac{p}{1-p}\right)^{n}\left(\mathrm{Li}_2(1-p) + \displaystyle\sum_{j=2}^{n}\frac{\ln p}{n-j+1}\right) + \displaystyle\sum_{j=2}^{n-1}\sum_{i=2}^{n-j+1}\frac{(-1)^{j+i}}{(n-j+1)(n-j-i+2)}\left(\frac{p}{1-p}\right)^{j+i-2}, & n \ge 3
\end{cases}

where \mathrm{Li}_2(x) = \sum_{k=1}^{\infty} \frac{x^k}{k^2} is the 2nd-order polylogarithm
Exact Solution for the MVUE Bias and MSE
It can be shown that

p_{MVUE} = \frac{n-1}{s-1}, \quad n \ge 2

Bias(p_{MVUE}) = E[p_{MVUE} - p] = E\!\left[\frac{n-1}{s-1} - p\right] = (n-1)\,E\!\left[\frac{1}{s-1}\right] - p

MSE(p_{MVUE}) = E\!\left[(p_{MVUE} - p)^2\right] = E\!\left[\left(\frac{n-1}{s-1} - p\right)^2\right] = (n-1)^2\,E\!\left[\frac{1}{(s-1)^2}\right] - p^2

(The cross term collapses because E[(n-1)/(s-1)] = p, i.e. p_{MVUE} is
unbiased, so -2p\,E[p_{MVUE}] + p^2 = -p^2.)
Derivation of the MVUE Bias and MSE
Using a similar process to finding E[1/s], it can be shown that

E\left[\frac{1}{s-1}\right] = \frac{p}{n-1}, \quad n \ge 2

E\left[\frac{1}{(s-1)^2}\right] =
\begin{cases}
-\dfrac{p^2}{1-p}\,\ln p, & n = 2 \\[1ex]
\dfrac{1}{n-1}\left[(-1)^{n+1}\, p^{n}(1-p)^{-(n-1)}\ln p + \displaystyle\sum_{j=2}^{n-1} \frac{(-1)^{j}}{n-j}\, p^{j}(1-p)^{1-j}\right], & n \ge 3
\end{cases}
Comparison of the Bias for the Estimators
[Figure: bias of p_CI and p_MLE versus p (0.01 to 0.1), in four panels for n = 1, 2, 3, and 10]
Comparison of the MSE for the Estimators
[Figure: MSE of p_CI, p_MLE, and (for n >= 2) p_MVUE versus p (0.01 to 0.1), in four panels for n = 1, 2, 3, and 10]
Truncated Negative Binomial Distribution
Used when a NB experiment is terminated before the nth success
Consider a random variable Y = \tilde{S} + n - X_{\tilde{S}}, where
\tilde{S} = \min(s, n^{+})
s is the number of trials until the nth success
n^{+} is the integer limit on the number of trials to perform
n is the number of successes needed to stop
X_{\tilde{S}} is the number of successes obtained
Cumulative Distribution Function

F(y; n, p) =
\begin{cases}
\displaystyle\sum_{i=0}^{y-n} \binom{y}{i} p^{y-i} (1-p)^{i}, & y = n, n+1, \ldots, n^{+} \equiv Y_1 \\[1ex]
\displaystyle\sum_{i=0}^{y-n} \binom{n^{+}}{i} p^{n^{+}-i} (1-p)^{i}, & y = n^{+}+1, \ldots, n^{+}+n-1 \equiv Y_2
\end{cases}
Confidence Interval Derivation for the TNB Distribution
It can be shown that

F(y; n, p) =
\begin{cases}
F_{BETA}(p;\, n,\, y-n+1), & y \in Y_1 \\
F_{BETA}(p;\, n^{+}+n-y,\, y-n+1), & y \in Y_2
\end{cases}

Just like the CDF for the NB distribution, the CDF for the TNB
distribution is an increasing function of p, and so p_L and p_U are given by

p_L =
\begin{cases}
BETAINV(\alpha;\, n,\, y-n+1), & y \in Y_1 \\
BETAINV(\alpha;\, n^{+}+n-y,\, y-n+1), & y \in Y_2
\end{cases}

p_U =
\begin{cases}
BETAINV(1-\alpha;\, n,\, y-n), & y \in Y_1 \\
BETAINV(1-\alpha;\, n^{+}+n-y+1,\, y-n), & y \in Y_2
\end{cases}

Note there is no solution for p_U when y = n, so set p_U = 1
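The TNB interval can be sketched in code as follows (my own illustration, assuming SciPy is available, with `beta.ppf` standing in for BETAINV):

```python
from scipy.stats import beta

def tnb_ci(y, n, n_plus, alpha=0.05):
    """One-sided limits p_L, p_U for the truncated NB outcome y."""
    if y <= n_plus:                      # y in Y_1: n-th success was reached
        p_l = beta.ppf(alpha, n, y - n + 1)
        # No solution for p_U when y = n, so set p_U = 1.
        p_u = 1.0 if y == n else beta.ppf(1 - alpha, n, y - n)
    else:                                # y in Y_2: stopped at n_plus trials
        p_l = beta.ppf(alpha, n_plus + n - y, y - n + 1)
        p_u = beta.ppf(1 - alpha, n_plus + n - y + 1, y - n)
    return p_l, p_u
```

When the experiment completes (y <= n^+), this reduces to the untruncated NB interval derived earlier.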
Midpoint as a Function of y
[Figure: midpoint = p_U/2 of the 95% confidence interval versus y (0 to 100), for n = 1, 2, 3, 5, 10]
Conclusions
A CI for the NB distribution can be calculated using the general
method
For n = 1, p_CI estimates p with lower MSE than p_MLE, and p_MVUE
does not exist
Even for n = 2 there are values of p for which p_CI estimates p
with lower MSE than p_MLE and p_MVUE
Exact solutions for the bias and MSE of p_MLE and p_MVUE exist
A CI for the TNB distribution can be calculated using the general
method