
Statistical Science

2001, Vol. 16, No. 2, 101–133

Interval Estimation for


a Binomial Proportion
Lawrence D. Brown, T. Tony Cai and Anirban DasGupta

Abstract. We revisit the problem of interval estimation of a binomial
proportion. The erratic behavior of the coverage probability of the standard
Wald confidence interval has previously been remarked on in the
literature (Blyth and Still, Agresti and Coull, Santner and others). We
begin by showing that the chaotic coverage properties of the Wald interval
are far more persistent than is appreciated. Furthermore, common
textbook prescriptions regarding its safety are misleading and defective
in several respects and cannot be trusted.
This leads us to consideration of alternative intervals. A number of
natural alternatives are presented, each with its motivation and context.
Each interval is examined for its coverage probability and its length.
Based on this analysis, we recommend the Wilson interval or the equal-tailed
Jeffreys prior interval for small n and the interval suggested in
Agresti and Coull for larger n. We also provide an additional frequentist
justification for use of the Jeffreys interval.

Key words and phrases: Bayes, binomial distribution, confidence
intervals, coverage probability, Edgeworth expansion, expected length,
Jeffreys prior, normal approximation, posterior.

1. INTRODUCTION

This article revisits one of the most basic and methodologically important problems in statistical practice, namely, interval estimation of the probability of success in a binomial distribution. There is a textbook confidence interval for this problem that has acquired nearly universal acceptance in practice. The interval, of course, is p̂ ± z_{α/2} n^{−1/2} (p̂(1 − p̂))^{1/2}, where p̂ = X/n is the sample proportion of successes, and z_{α/2} is the 100(1 − α/2)th percentile of the standard normal distribution. The interval is easy to present and motivate and easy to compute. With the exceptions of the t test, linear regression, and ANOVA, its popularity in everyday practical statistics is virtually unmatched. The standard interval is known as the Wald interval as it comes from the Wald large sample test for the binomial case.

So at first glance, one may think that the problem is too simple and has a clear and present solution. In fact, the problem is a difficult one, with unanticipated complexities. It is widely recognized that the actual coverage probability of the standard interval is poor for p near 0 or 1. Even at the level of introductory statistics texts, the standard interval is often presented with the caveat that it should be used only when n · min(p̂, 1 − p̂) is at least 5 (or 10). Examination of the popular texts reveals that the qualifications with which the standard interval is presented are varied, but they all reflect the concern about poor coverage when p is near the boundaries.

In a series of interesting recent articles, it has also been pointed out that the coverage properties of the standard interval can be erratically poor even if p is not near the boundaries; see, for instance, Vollset (1993), Santner (1998), Agresti and Coull (1998), and Newcombe (1998). Slightly older literature includes Ghosh (1979), Cressie (1980) and Blyth and Still (1983).

(Author affiliations: Lawrence D. Brown is Professor of Statistics, The Wharton School, University of Pennsylvania, 3000 Steinberg Hall-Dietrich Hall, 3620 Locust Walk, Philadelphia, Pennsylvania 19104-6302. T. Tony Cai is Assistant Professor of Statistics, The Wharton School, University of Pennsylvania, 3000 Steinberg Hall-Dietrich Hall, 3620 Locust Walk, Philadelphia, Pennsylvania 19104-6302. Anirban DasGupta is Professor, Department of Statistics, Purdue University, 1399 Mathematical Science Bldg., West Lafayette, Indiana 47907-1399.)

Agresti and Coull (1998)


particularly consider the nominal 95% case and show the erratic and poor behavior of the standard interval's coverage probability for small n even when p is not near the boundaries. See their Figure 4 for the cases n = 5 and 10.

We will show in this article that the eccentric behavior of the standard interval's coverage probability is far deeper than has been explained or is appreciated by statisticians at large. We will show that the popular prescriptions the standard interval comes with are defective in several respects and are not to be trusted. In addition, we will motivate, present and analyze several alternatives to the standard interval for a general confidence level. We will ultimately make recommendations about choosing a specific interval for practical use, separately for different intervals of values of n. It will be seen that for small n (40 or less), our recommendation differs from the recommendation Agresti and Coull (1998) made for the nominal 95% case. To facilitate greater appreciation of the seriousness of the problem, we have kept the technical content of this article at a minimal level. The companion article, Brown, Cai and DasGupta (1999), presents the associated theoretical calculations on Edgeworth expansions of the various intervals' coverage probabilities and asymptotic expansions for their expected lengths.

In Section 2, we first present a series of examples on the degree of severity of the chaotic behavior of the standard interval's coverage probability. The chaotic behavior does not go away even when n is quite large and p is not near the boundaries. For instance, when n is 100, the actual coverage probability of the nominal 95% standard interval is 0.952 if p is 0.106, but only 0.911 if p is 0.107. The behavior of the coverage probability can be even more erratic as a function of n. If the true p is 0.5, the actual coverage of the nominal 95% interval is 0.951 at the rather small sample size n = 17, but falls to 0.919 at the much larger sample size n = 40.

This eccentric behavior can get downright extreme in certain practically important problems. For instance, consider defective proportions in industrial quality control problems. There it would be quite common to have a true p that is small. If the true p is 0.005, then the coverage probability of the nominal 95% interval increases monotonically in n all the way up to n = 591 to the level 0.945, only to drop down to 0.792 if n is 592. This unlucky spell continues for a while, and then the coverage bounces back to 0.948 when n is 953, but dramatically falls to 0.852 when n is 954. Subsequent unlucky spells start off at n = 1279, 1583 and on and on. It should be widely known that the coverage of the standard interval can be significantly lower at quite large sample sizes, and this happens in an unpredictable and rather random way.

Continuing, also in Section 2 we list a set of common prescriptions that standard texts present while discussing the standard interval. We show what the deficiencies are in some of these prescriptions. Proposition 1 and the subsequent Table 3 illustrate the defects of these common prescriptions.

In Sections 3 and 4, we present our alternative intervals. For the purpose of a sharper focus we present these alternative intervals in two categories. First we present in Section 3 a selected set of three intervals that clearly stand out in our subsequent analysis; we present them as our recommended intervals. Separately, we present several other intervals in Section 4 that arise as clear candidates for consideration as a part of a comprehensive examination, but do not stand out in the actual analysis.

The short list of recommended intervals contains the score interval, an interval recently suggested in Agresti and Coull (1998), and the equal tailed interval resulting from the natural noninformative Jeffreys prior for a binomial proportion. The score interval for the binomial case seems to have been introduced in Wilson (1927); so we call it the Wilson interval. Agresti and Coull (1998) suggested, for the special nominal 95% case, the interval p̃ ± z_{0.025} ñ^{−1/2} (p̃(1 − p̃))^{1/2}, where ñ = n + 4 and p̃ = (X + 2)/(n + 4); this is an adjusted Wald interval that formally adds two successes and two failures to the observed counts and then uses the standard method. Our second interval is the appropriate version of this interval for a general confidence level; we call it the Agresti–Coull interval. By a slight abuse of terminology, we call our third interval, namely the equal-tailed interval corresponding to the Jeffreys prior, the Jeffreys interval.

In Section 3, we also present our findings on the performances of our recommended intervals. As always, two key considerations are their coverage properties and parsimony as measured by expected length. Simplicity of presentation is also sometimes an issue, for example, in the context of classroom presentation at an elementary level. On consideration of these factors, we came to the conclusion that for small n (40 or less), we recommend that either the Wilson or the Jeffreys prior interval should be used. They are very similar, and either may be used depending on taste. The Wilson interval has a closed-form formula. The Jeffreys interval does not. One can expect that there would be resistance to using the Jeffreys interval solely due to this reason. We therefore provide a table simply listing the

limits of the Jeffreys interval for n up to 30 and in addition also give closed form and very accurate approximations to the limits. These approximations do not need any additional software.

For larger n (n > 40), the Wilson, the Jeffreys and the Agresti–Coull intervals are all very similar, and so for such n, due to its simplest form, we come to the conclusion that the Agresti–Coull interval should be recommended. Even for smaller sample sizes, the Agresti–Coull interval is strongly preferable to the standard one and so might be the choice where simplicity is a paramount objective.

The additional intervals we considered are two slight modifications of the Wilson and the Jeffreys intervals, the Clopper–Pearson "exact" interval, the arcsine interval, the logit interval, the actual Jeffreys HPD interval and the likelihood ratio interval. The modified versions of the Wilson and the Jeffreys intervals correct disturbing downward spikes in the coverages of the original intervals very close to the two boundaries. The other alternative intervals have earned some prominence in the literature for one reason or another. We had to apply a certain amount of discretion in choosing these additional intervals as part of our investigation. Since we wish to direct the main part of our conversation to the three recommended intervals, only a brief summary of the performances of these additional intervals is presented along with the introduction of each interval. As part of these quick summaries, we indicate why we decided against including them among the recommended intervals.

We strongly recommend that introductory texts in statistics present one or more of these recommended alternative intervals, in preference to the standard one. The slight sacrifice in simplicity would be more than worthwhile. The conclusions we make are given additional theoretical support by the results in Brown, Cai and DasGupta (1999). Analogous results for other one parameter discrete families are presented in Brown, Cai and DasGupta (2000).

2. THE STANDARD INTERVAL

When constructing a confidence interval we usually wish the actual coverage probability to be close to the nominal confidence level. Because of the discrete nature of the binomial distribution we cannot always achieve the exact nominal confidence level unless a randomized procedure is used. Thus our objective is to construct nonrandomized confidence intervals for p such that the coverage probability P_p(p ∈ CI) ≥ 1 − α, where α is some prespecified value between 0 and 1. We will use the notation C(p, n) = P_p(p ∈ CI), 0 < p < 1, for the coverage probability.

A standard confidence interval for p based on normal approximation has gained universal recommendation in the introductory statistics textbooks and in statistical practice. The interval is known to guarantee that for any fixed p ∈ (0, 1), C(p, n) → 1 − α as n → ∞.

Let φ(z) and Φ(z) be the standard normal density and distribution functions, respectively. Throughout the paper we denote κ ≡ z_{α/2} = Φ^{−1}(1 − α/2), p̂ = X/n and q̂ = 1 − p̂. The standard normal approximation confidence interval CI_s is given by

(1)  CI_s = p̂ ± κ n^{−1/2} (p̂ q̂)^{1/2}.

This interval is obtained by inverting the acceptance region of the well known Wald large-sample normal test for a general problem:

(2)  |(θ̂ − θ)/ŝe(θ̂)| ≤ κ,

where θ is a generic parameter, θ̂ is the maximum likelihood estimate of θ and ŝe(θ̂) is the estimated standard error of θ̂. In the binomial case, we have θ̂ = p̂ = X/n and ŝe(θ̂) = (p̂ q̂)^{1/2} n^{−1/2}.

The standard interval is easy to calculate and is heuristically appealing. In introductory statistics texts and courses, the confidence interval CI_s is usually presented along with some heuristic justification based on the central limit theorem. Most students and users no doubt believe that the larger the number n, the better the normal approximation, and thus the closer the actual coverage would be to the nominal level 1 − α. Further, they would believe that the coverage probabilities of this method are close to the nominal value, except possibly when n is small or p is near 0 or 1. We will show how completely both of these beliefs are false. Let us take a close look at how the standard interval CI_s really performs.

2.1 Lucky n, Lucky p

An interesting phenomenon for the standard interval is that the actual coverage probability of the confidence interval contains nonnegligible oscillation as both p and n vary. There exist some "lucky" pairs (p, n) such that the actual coverage probability C(p, n) is very close to or larger than the nominal level. On the other hand, there also exist "unlucky" pairs (p, n) such that the corresponding C(p, n) is much smaller than the nominal level. The phenomenon of oscillation is both in n, for fixed p, and in p, for fixed n. Furthermore, drastic changes in coverage occur in nearby p for fixed n and in nearby n for fixed p. Let us look at five simple but instructive examples.
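For concreteness, the standard interval (1) is trivial to compute. A minimal Python sketch (our own helper names; the paper's own calculations were done in S-PLUS):

```python
import math

Z_975 = 1.959964  # kappa = z_{alpha/2} at the nominal 95% level

def standard_interval(x, n, kappa=Z_975):
    """Wald interval (1): p-hat +/- kappa * n**(-1/2) * (p-hat * q-hat)**(1/2)."""
    phat = x / n
    half = kappa * math.sqrt(phat * (1.0 - phat) / n)
    return phat - half, phat + half
```

For example, x = 20 successes in n = 50 trials gives the interval (0.264, 0.536). Note that x = 0 or x = n produces a degenerate interval of zero length, one root of the boundary problems discussed in this section.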

Fig. 1. Standard interval; oscillation phenomenon for fixed p = 0.2 and variable n = 25 to 100.

The probabilities reported in the following plots and tables, as well as those appearing later in this paper, are the result of direct probability calculations produced in S-PLUS. In all cases their numerical accuracy considerably exceeds the number of significant figures reported and/or the accuracy visually obtainable from the plots. (Plots for variable p are the probabilities for a fine grid of values of p, e.g., 2000 equally spaced values of p for the plots in Figure 5.)

Example 1. Figure 1 plots the coverage probability of the nominal 95% standard interval for p = 0.2. The number of trials n varies from 25 to 100. It is clear from the plot that the oscillation is significant and the coverage probability does not steadily get closer to the nominal confidence level as n increases. For instance, C(0.2, 30) = 0.946 and C(0.2, 98) = 0.928. So, as hard as it is to believe, the coverage probability is significantly closer to 0.95 when n = 30 than when n = 98. We see that the true coverage probability behaves contrary to conventional wisdom in a very significant way.

Example 2. Now consider the case of p = 0.5. Since p = 0.5, conventional wisdom might suggest to an unsuspecting user that all will be well if n is about 20. We evaluate the exact coverage probability of the 95% standard interval for 10 ≤ n ≤ 50. In Table 1, we list the values of "lucky" n [defined as C(p, n) ≥ 0.95] and the values of "unlucky" n [defined for specificity as C(p, n) ≤ 0.92]. The conclusions presented in Table 1 are surprising. We note that when n = 17 the coverage probability is 0.951, but the coverage probability equals 0.904 when n = 18. Indeed, the unlucky values of n arise suddenly. Although p is 0.5, the coverage is still only 0.919 at n = 40. This illustrates the inconsistency, unpredictability and poor performance of the standard interval.

Example 3. Now let us move p really close to the boundary, say p = 0.005. We mention in the introduction that such p are relevant in certain practical applications. Since p is so small, now one may fully expect that the coverage probability of the standard interval is poor. Figure 2 and Table 2 show that there are still surprises and indeed we now begin to see a whole new kind of erratic behavior. The oscillation of the coverage probability does not show until rather large n. Indeed, the coverage probability makes a slow ascent all the way until n = 591, and then dramatically drops to 0.792 when n = 592. Figure 2 shows that thereafter the oscillation manifests in full force, in contrast to Examples 1 and 2, where the oscillation started early on. Subsequent unlucky values of n again arise in the same unpredictable way, as one can see from Table 2.

2.2 Inadequate Coverage

The results in Examples 1 to 3 already show that the standard interval can have coverage noticeably smaller than its nominal value even for values of n and of np(1 − p) that are not small. This subsec-

Table 1
Standard interval; lucky n and unlucky n for 10 ≤ n ≤ 50 and p = 0.5

Lucky n      17     20     25     30     35     37     42     44     49
C(0.5, n)    0.951  0.959  0.957  0.957  0.959  0.953  0.956  0.951  0.956
Unlucky n    10     12     13     15     18     23     28     33     40
C(0.5, n)    0.891  0.854  0.908  0.882  0.904  0.907  0.913  0.920  0.919
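The exact coverage C(p, n) behind these examples is a finite binomial sum, so the lucky/unlucky classification of Table 1 is easy to reproduce. A sketch in Python (our own code; the paper used S-PLUS):

```python
import math

Z_975 = 1.959964  # z_{alpha/2} at the nominal 95% level

def coverage(p, n, kappa=Z_975):
    """Exact coverage C(p, n) of the standard interval, by direct summation."""
    total = 0.0
    for x in range(n + 1):
        phat = x / n
        half = kappa * math.sqrt(phat * (1.0 - phat) / n)
        if phat - half < p < phat + half:  # the interval from this x covers p
            total += math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)
    return total

# The classification used in Table 1 (p = 0.5, 10 <= n <= 50)
lucky = [n for n in range(10, 51) if coverage(0.5, n) >= 0.95]
unlucky = [n for n in range(10, 51) if coverage(0.5, n) <= 0.92]
```

The same function reproduces the small-p behavior of Example 3: coverage(0.005, 591) is about 0.945, while coverage(0.005, 592) collapses to about 0.792.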

Table 2
Standard interval; late arrival of unlucky n for small p

Unlucky n     592    954    1279   1583   1876
C(0.005, n)   0.792  0.852  0.875  0.889  0.898

tion contains two more examples that display further instances of the inadequacy of the standard interval.

Example 4. Figure 3 plots the coverage probability of the nominal 95% standard interval with fixed n = 100 and variable p. It can be seen from Figure 3 that in spite of the large sample size, significant change in coverage probability occurs in nearby p. The magnitude of oscillation increases significantly as p moves toward 0 or 1. Except for values of p quite near p = 0.5, the general trend of this plot is noticeably below the nominal coverage value of 0.95.

Example 5. Figure 4 shows the coverage probability of the nominal 99% standard interval with n = 20 and variable p from 0 to 1. Besides the oscillation phenomenon similar to Figure 3, a striking fact in this case is that the coverage never reaches the nominal level. The coverage probability is always smaller than 0.99, and in fact on the average the coverage is only 0.883. Our evaluations show that for all n ≤ 45, the coverage of the 99% standard interval is strictly smaller than the nominal level for all 0 < p < 1.

It is evident from the preceding presentation that the actual coverage probability of the standard interval can differ significantly from the nominal confidence level for moderate and even large sample sizes. We will later demonstrate that there are other confidence intervals that perform much better in this regard. See Figure 5 for such a comparison. The error in coverage comes from two sources: discreteness and skewness in the underlying binomial distribution. For a two-sided interval, the rounding error due to discreteness is dominant, and the error due to skewness is somewhat secondary, but still important for even moderately large n. (See Brown, Cai and DasGupta, 1999, for more details.) Note that the situation is different for one-sided intervals. There, the error caused by the skewness can be larger than the rounding error. See Hall (1982) for a detailed discussion on one-sided confidence intervals.

The oscillation in the coverage probability is caused by the discreteness of the binomial distribution, more precisely, the lattice structure of the binomial distribution. The noticeable oscillations are unavoidable for any nonrandomized procedure, although some of the competing procedures in Section 3 can be seen to have somewhat smaller oscillations than the standard procedure. See the text of Casella and Berger (1990) for introductory discussion of the oscillation in such a context.

The erratic and unsatisfactory coverage properties of the standard interval have often been remarked on, but curiously still do not seem to be widely appreciated among statisticians. See, for example, Ghosh (1979), Blyth and Still (1983) and Agresti and Coull (1998). Blyth and Still (1983) also show that the continuity-corrected version still has the same disadvantages.

2.3 Textbook Qualifications

The normal approximation used to justify the standard confidence interval for p can be significantly in error. The error is most evident when the true p is close to 0 or 1. See Lehmann (1999). In fact, it is easy to show that, for any fixed n, the

Fig. 2. Standard interval; oscillation in coverage for small p



Fig. 3. Standard interval; oscillation phenomenon for fixed n = 100 and variable p.

confidence coefficient C(p, n) → 0 as p → 0 or 1. Therefore, most major problems arise as regards coverage probability when p is near the boundaries. Poor coverage probabilities for p near 0 or 1 are widely remarked on, and generally, in the popular texts, a brief sentence is added qualifying when to use the standard confidence interval for p. It is interesting to see what these qualifications are. A sample of 11 popular texts gives the following qualifications:

The confidence interval may be used if:

1. np, n(1 − p) are ≥ 5 (or 10);
2. np(1 − p) ≥ 5 (or 10);
3. np̂, n(1 − p̂) are ≥ 5 (or 10);
4. p̂ ± 3 (p̂(1 − p̂)/n)^{1/2} does not contain 0 or 1;
5. n quite large;
6. n ≥ 50 unless p is very small.

It seems clear that the authors are attempting to say that the standard interval may be used if the central limit approximation is accurate. These prescriptions are defective in several respects. In the estimation problem, (1) and (2) are not verifiable. Even when these conditions are satisfied, we see, for instance, from Table 1 in the previous section, that there is no guarantee that the true coverage probability is close to the nominal confidence level. For example, when n = 40 and p = 0.5, one has np = n(1 − p) = 20 and np(1 − p) = 10, so clearly either of the conditions (1) and (2) is satisfied. However, from Table 1, the true coverage probability in this case equals 0.919, which is certainly unsatisfactory for a confidence interval at nominal level 0.95. The qualification (5) is useless and (6) is patently misleading; (3) and (4) are certainly verifiable, but they are also useless because in the context of frequentist coverage probabilities, a data-based prescription does not have a meaning. The point is that the standard interval clearly has serious problems and the influential texts caution the readers about that. However, the caution does not appear to serve its purpose, for a variety of reasons.

Here is a result that shows that sometimes the qualifications are not correct even in the limit as n → ∞.

Proposition 1. Let γ > 0. For the standard confidence interval,

(3)  lim inf_{n→∞} inf_{p: np ≤ γ ≤ n(1−p)} C(p, n) ≤ P(a_γ < Poisson(γ) ≤ b_γ),

Fig. 4. Coverage of the nominal 99% standard interval for fixed n = 20 and variable p.

Fig. 5. Coverage probability for n = 50.

Table 3
Standard interval; bound (3) on limiting minimum coverage when np ≤ γ ≤ n(1 − p)

γ                                              5      7      10
lim inf_{n→∞} inf_{p: np ≤ γ ≤ n(1−p)} C(p, n)   0.875  0.913  0.926

It is clear that qualification (1) does not work at all and (2) is marginal. There are similar problems with qualifications (3) and (4).
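The Poisson bound of Proposition 1 is easy to evaluate numerically. A sketch in Python (our own code, not the paper's; the nominal 95% value κ = 1.96 is used):

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), by direct summation."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total

def proposition1_bound(gamma, kappa=1.959964):
    """P(a < Poisson(gamma) <= b), with a and b the integer parts of
    (kappa**2 + 2*gamma -/+ kappa*sqrt(kappa**2 + 4*gamma)) / 2."""
    root = kappa * math.sqrt(kappa * kappa + 4.0 * gamma)
    a = int((kappa * kappa + 2.0 * gamma - root) / 2.0)
    b = int((kappa * kappa + 2.0 * gamma + root) / 2.0)
    return poisson_cdf(b, gamma) - poisson_cdf(a, gamma)
```

For γ = 5, 7 and 10 the bound stays well below 0.95, which is the content of Table 3: no matter how large n is, prescriptions such as np ≥ 5 or np ≥ 10 cannot rescue the standard interval.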


where a_γ and b_γ are the integer parts of

  (κ² + 2γ ∓ κ(κ² + 4γ)^{1/2})/2,

where the − sign goes with a_γ and the + sign with b_γ.

The proposition follows from the fact that the sequence of Bin(n, γ/n) distributions converges weakly to the Poisson(γ) distribution and so the limit of the infimum is at most the Poisson probability in the proposition by an easy calculation.

Let us use Proposition 1 to investigate the validity of qualifications (1) and (2) in the list above. The nominal confidence level in Table 3 below is 0.95.

Table 4
Values of x for the modified lower bound for the Wilson interval

1 − α    x = 1    x = 2    x = 3
0.90     0.105    0.532    1.102
0.95     0.051    0.355    0.818
0.99     0.010    0.149    0.436

3. RECOMMENDED ALTERNATIVE INTERVALS

From the evidence gathered in Section 2, it seems clear that the standard interval is just too risky. This brings us to the consideration of alternative intervals. We now analyze several such alternatives, each with its motivation. A few other intervals are also mentioned for their theoretical importance. Among these intervals we feel three stand out in their comparative performance. These are labeled separately as the recommended intervals.

3.1 Recommended Intervals

3.1.1 The Wilson interval. An alternative to the standard interval is the confidence interval based on inverting the test in equation (2) that uses the null standard error (pq)^{1/2} n^{−1/2} instead of the estimated standard error (p̂ q̂)^{1/2} n^{−1/2}. This confidence interval has the form

(4)  CI_W = (X + κ²/2)/(n + κ²) ± (κ n^{1/2}/(n + κ²)) (p̂ q̂ + κ²/(4n))^{1/2}.

This interval was apparently introduced by Wilson (1927) and we will call this interval the Wilson interval.
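In code, the Wilson limits in (4) come straight from the formula; a minimal Python sketch (our own transcription). As a check, each endpoint should solve the score-test equation (p̂ − p)² = κ² p(1 − p)/n exactly:

```python
import math

def wilson_interval(x, n, kappa=1.959964):
    """Wilson score interval (4)."""
    phat = x / n
    qhat = 1.0 - phat
    center = (x + kappa ** 2 / 2.0) / (n + kappa ** 2)
    half = (kappa * math.sqrt(n) / (n + kappa ** 2)) \
        * math.sqrt(phat * qhat + kappa ** 2 / (4.0 * n))
    return center - half, center + half
```

The endpoints are simply the two roots of the quadratic obtained by squaring the score statistic, which is why the interval is never degenerate even at x = 0 or x = n.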
The Wilson interval has theoretical appeal. The
interval is the inversion of the CLT approximation

to the family of equal tail tests of H₀: p = p₀. Hence, one accepts H₀ based on the CLT approximation if and only if p₀ is in this interval. As Wilson showed, the argument involves the solution of a quadratic equation; or see Tamhane and Dunlop (2000, Exercise 9.39).

3.1.2 The Agresti–Coull interval. The standard interval CI_s is simple and easy to remember. For the purposes of classroom presentation and use in texts, it may be nice to have an alternative that has the familiar form p̂ ± z (p̂(1 − p̂)/n)^{1/2}, with a better and new choice of p̂ rather than p̂ = X/n. This can be accomplished by using the center of the Wilson region in place of p̂. Denote X̃ = X + κ²/2 and ñ = n + κ². Let p̃ = X̃/ñ and q̃ = 1 − p̃. Define the confidence interval CI_AC for p by

(5)  CI_AC = p̃ ± κ (p̃ q̃)^{1/2} ñ^{−1/2}.

Both the Agresti–Coull and the Wilson interval are centered on the same value, p̃. It is easy to check that the Agresti–Coull intervals are never shorter than the Wilson intervals. For the case when α = 0.05, if we use the value 2 instead of 1.96 for κ, this interval is the "add 2 successes and 2 failures" interval in Agresti and Coull (1998). For this reason, we call it the Agresti–Coull interval. To the best of our knowledge, Samuels and Witmer (1999) is the first introductory statistics textbook that recommends the use of this interval. See Figure 5 for the coverage of this interval. See also Figure 6 for its average coverage probability.

3.1.3 Jeffreys interval. Beta distributions are the standard conjugate priors for binomial distributions and it is quite common to use beta priors for inference on p (see Berger, 1985).

Suppose X ~ Bin(n, p) and suppose p has a prior distribution Beta(a₁, a₂); then the posterior distribution of p is Beta(X + a₁, n − X + a₂). Thus a 100(1 − α)% equal-tailed Bayesian interval is given by

  [B(α/2; X + a₁, n − X + a₂), B(1 − α/2; X + a₁, n − X + a₂)],

where B(α; m₁, m₂) denotes the α quantile of a Beta(m₁, m₂) distribution.

The well-known Jeffreys prior and the uniform prior are each a beta distribution. The noninformative Jeffreys prior is of particular interest to us. Historically, Bayes procedures under noninformative priors have a track record of good frequentist properties; see Wasserman (1991). In this problem the Jeffreys prior is Beta(1/2, 1/2), which has the density function

  f(p) = π^{−1} p^{−1/2} (1 − p)^{−1/2}.

The 100(1 − α)% equal-tailed Jeffreys prior interval is defined as

(6)  CI_J = [L_J(x), U_J(x)],

where L_J(0) = 0, U_J(n) = 1 and otherwise

(7)  L_J(x) = B(α/2; X + 1/2, n − X + 1/2),
(8)  U_J(x) = B(1 − α/2; X + 1/2, n − X + 1/2).

The interval is formed by taking the central 1 − α posterior probability interval. This leaves α/2 posterior probability in each omitted tail. The exception is for x = 0 (n), where the lower (upper) limits are modified to avoid the undesirable result that the coverage probability C(p, n) → 0 as p → 0 or 1.

The actual endpoints of the interval need to be numerically computed. This is very easy to do using software such as Minitab, S-PLUS or Mathematica. In Table 5 we have provided the limits for the case of the Jeffreys prior for 7 ≤ n ≤ 30.

The endpoints of the Jeffreys prior interval are the α/2 and 1 − α/2 quantiles of the Beta(x + 1/2, n − x + 1/2) distribution. The psychological resistance among some to using the interval is because of the inability to compute the endpoints at ease without software.

We provide two avenues to resolving this problem. One is Table 5 at the end of the paper. The second is a computable approximation to the limits of the Jeffreys prior interval, one that is computable with just a normal table. This approximation is obtained after some algebra from the general approximation to a Beta quantile given in page 945 in Abramowitz and Stegun (1970).

The lower limit of the 100(1 − α)% Jeffreys prior interval is approximately

(9)  (x + 1/2) / (n + 1 + (n − x + 1/2)(e^{2ω} − 1)),

where

  ω = κ (4 p̂ q̂/n + (κ² − 3)/(6n²))^{1/2} / (4 p̂ q̂) + (1/2 − p̂)(p̂ q̂ (κ² + 2) − 1/n) / (6n (p̂ q̂)²).

The upper limit may be approximated by the same expression with κ replaced by −κ in ω. The simple approximation given above is remarkably accurate. Berry (1996, page 222) suggests using a simpler normal approximation, but this will not be sufficiently accurate unless n p̂(1 − p̂) is rather large.

Table 5
95% Limits of the Jeffreys prior interval

x n=7 n=8 n=9 n = 10 n = 11 n = 12

0 0 0.292 0 0.262 0 0.238 0 0.217 0 0.200 0 0.185


1 0.016 0.501 0.014 0.454 0.012 0.414 0.011 0.381 0.010 0.353 0.009 0.328
2 0.065 0.648 0.056 0.592 0.049 0.544 0.044 0.503 0.040 0.467 0.036 0.436
3 0.139 0.766 0.119 0.705 0.104 0.652 0.093 0.606 0.084 0.565 0.076 0.529
4 0.234 0.861 0.199 0.801 0.173 0.746 0.153 0.696 0.137 0.652 0.124 0.612
5 0.254 0.827 0.224 0.776 0.200 0.730 0.180 0.688
6 0.270 0.800 0.243 0.757

x n = 13 n = 14 n = 15 n = 16 n = 17 n = 18

0 0 0.173 0 0.162 0 0.152 0 0.143 0 0.136 0 0.129


1 0.008 0.307 0.008 0.288 0.007 0.272 0.007 0.257 0.006 0.244 0.006 0.232
2 0.033 0.409 0.031 0.385 0.029 0.363 0.027 0.344 0.025 0.327 0.024 0.311
3 0.070 0.497 0.064 0.469 0.060 0.444 0.056 0.421 0.052 0.400 0.049 0.381
4 0.114 0.577 0.105 0.545 0.097 0.517 0.091 0.491 0.085 0.467 0.080 0.446
5 0.165 0.650 0.152 0.616 0.140 0.584 0.131 0.556 0.122 0.530 0.115 0.506
6 0.221 0.717 0.203 0.681 0.188 0.647 0.174 0.617 0.163 0.589 0.153 0.563
7 0.283 0.779 0.259 0.741 0.239 0.706 0.222 0.674 0.207 0.644 0.194 0.617
8 0.294 0.761 0.272 0.728 0.254 0.697 0.237 0.668
9 0.303 0.746 0.284 0.716

x n = 19 n = 20 n = 21 n = 22 n = 23 n = 24

0 0 0.122 0 0.117 0 0.112 0 0.107 0 0.102 0 0.098


1 0.006 0.221 0.005 0.211 0.005 0.202 0.005 0.193 0.005 0.186 0.004 0.179
2 0.022 0.297 0.021 0.284 0.020 0.272 0.019 0.261 0.018 0.251 0.018 0.241
3 0.047 0.364 0.044 0.349 0.042 0.334 0.040 0.321 0.038 0.309 0.036 0.297
4 0.076 0.426 0.072 0.408 0.068 0.392 0.065 0.376 0.062 0.362 0.059 0.349
5 0.108 0.484 0.102 0.464 0.097 0.446 0.092 0.429 0.088 0.413 0.084 0.398
6 0.144 0.539 0.136 0.517 0.129 0.497 0.123 0.478 0.117 0.461 0.112 0.444
7 0.182 0.591 0.172 0.568 0.163 0.546 0.155 0.526 0.148 0.507 0.141 0.489
8 0.223 0.641 0.211 0.616 0.199 0.593 0.189 0.571 0.180 0.551 0.172 0.532
9 0.266 0.688 0.251 0.662 0.237 0.638 0.225 0.615 0.214 0.594 0.204 0.574
10 0.312 0.734 0.293 0.707 0.277 0.681 0.263 0.657 0.250 0.635 0.238 0.614
11 0.319 0.723 0.302 0.698 0.287 0.675 0.273 0.653
12 0.325 0.713 0.310 0.690

x n = 25 n = 26 n = 27 n = 28 n = 29 n = 30

0 0 0.095 0 0.091 0 0.088 0 0.085 0 0.082 0 0.080


1 0.004 0.172 0.004 0.166 0.004 0.160 0.004 0.155 0.004 0.150 0.004 0.145
2 0.017 0.233 0.016 0.225 0.016 0.217 0.015 0.210 0.015 0.203 0.014 0.197
3 0.035 0.287 0.034 0.277 0.032 0.268 0.031 0.259 0.030 0.251 0.029 0.243
4 0.056 0.337 0.054 0.325 0.052 0.315 0.050 0.305 0.048 0.295 0.047 0.286
5 0.081 0.384 0.077 0.371 0.074 0.359 0.072 0.348 0.069 0.337 0.067 0.327
6 0.107 0.429 0.102 0.415 0.098 0.402 0.095 0.389 0.091 0.378 0.088 0.367
7 0.135 0.473 0.129 0.457 0.124 0.443 0.119 0.429 0.115 0.416 0.111 0.404
8 0.164 0.515 0.158 0.498 0.151 0.482 0.145 0.468 0.140 0.454 0.135 0.441
9 0.195 0.555 0.187 0.537 0.180 0.521 0.172 0.505 0.166 0.490 0.160 0.476
10 0.228 0.594 0.218 0.576 0.209 0.558 0.201 0.542 0.193 0.526 0.186 0.511
11 0.261 0.632 0.250 0.613 0.239 0.594 0.230 0.577 0.221 0.560 0.213 0.545
12 0.295 0.669 0.282 0.649 0.271 0.630 0.260 0.611 0.250 0.594 0.240 0.578
13 0.331 0.705 0.316 0.684 0.303 0.664 0.291 0.645 0.279 0.627 0.269 0.610
14 0.336 0.697 0.322 0.678 0.310 0.659 0.298 0.641
15 0.341 0.690 0.328 0.672
110 L. D. BROWN, T. T. CAI AND A. DASGUPTA

Fig. 6. Comparison of the average coverage probabilities. From top to bottom: the Agresti–Coull interval CI_AC, the Wilson interval CI_W, the Jeffreys prior interval CI_J and the standard interval CI_s. The nominal confidence level is 0.95.
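Throughout, C(p, n) denotes the exact coverage probability of an interval procedure, obtained by summing the binomial probabilities of the sample points x whose interval covers p. A minimal Python sketch (the function names are illustrative, not from the article), using the Wilson interval as the example:

```python
from math import sqrt
from scipy.stats import binom, norm

def wilson_interval(x, n, alpha=0.05):
    """Wilson (score) interval: invert the CLT approximation to the score test."""
    kappa = norm.ppf(1 - alpha / 2)
    p_hat = x / n
    center = (x + kappa ** 2 / 2) / (n + kappa ** 2)
    half = (kappa * sqrt(n) / (n + kappa ** 2)) * sqrt(p_hat * (1 - p_hat)
                                                       + kappa ** 2 / (4 * n))
    return center - half, center + half

def coverage(p, n, interval, alpha=0.05):
    """Exact coverage C(p, n): total binomial probability of the x
    whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, alpha)
        if lo <= p <= hi:
            total += binom.pmf(x, n, p)
    return total
```

Plotting coverage(p, 50, wilson_interval) over a grid of p reproduces curves of the kind shown in Figure 5.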

In Figure 5 we plot the coverage probability of the standard interval, the Wilson interval, the Agresti–Coull interval and the Jeffreys interval for n = 50 and α = 0.05.

3.2 Coverage Probability

In this and the next subsections, we compare the performance of the standard interval and the three recommended intervals in terms of their coverage probability and length.

Coverage of the Wilson interval fluctuates acceptably near 1 − α, except for p very near 0 or 1. It might be helpful to consult Figure 5 again. It can be shown that, when 1 − α = 0.95,

lim inf_{n→∞} C(1/n, n) = 0.92,
lim inf_{n→∞} C(5/n, n) = 0.936

and

lim inf_{n→∞} C(10/n, n) = 0.938

for the Wilson interval. In comparison, these three values for the standard interval are 0.860, 0.870, and 0.905, respectively, obviously considerably smaller.

The modification CI_MW presented in Section 4.1.1 removes the first few deep downward spikes of the coverage function for CI_W. The resulting coverage function is overall somewhat conservative for p very near 0 or 1. Both CI_W and CI_MW have the same coverage functions away from 0 or 1.

The Agresti–Coull interval has good minimum coverage probability. The coverage probability of the interval is quite conservative for p very close to 0 or 1. In comparison to the Wilson interval it is more conservative, especially for small n. This is not surprising because, as we have noted, CI_AC always contains CI_W as a proper subinterval.

The coverage of the Jeffreys interval is qualitatively similar to that of CI_W over most of the parameter space [0, 1]. In addition, as we will see in Section 4.3, CI_J has an appealing connection to the mid-P corrected version of the Clopper–Pearson exact intervals. These are very similar to CI_J over most of the range, and have similar appealing properties. CI_J is a serious and credible candidate for practical use. The coverage has an unfortunate fairly deep spike near p = 0 and, symmetrically, another near p = 1. However, the simple modification of CI_J presented in Section 4.1.2 removes these two deep downward spikes. The modified Jeffreys interval CI_MJ performs well.

Let us also evaluate the intervals in terms of their average coverage probability, the average being over p. Figure 6 demonstrates the striking difference in the average coverage probability among four intervals: the Agresti–Coull interval, the Wilson interval, the Jeffreys prior interval and the standard interval. The standard interval performs poorly. The interval CI_AC is slightly conservative in terms of average coverage probability. Both the Wilson interval and the Jeffreys prior interval have excellent performance in terms of the average coverage probability; that of the Jeffreys prior interval is, if anything, slightly superior. The average coverage of the Jeffreys interval is really very close to the nominal level even for quite small n. This is quite impressive.

Figure 7 displays the mean absolute errors, ∫₀¹ |C(p, n) − (1 − α)| dp, for n = 10 to 25, and n = 26 to 40. It is clear from the plots that among the four intervals, CI_W, CI_AC and CI_J are comparable, but the mean absolute errors of CI_s are significantly larger.

3.3 Expected Length

Besides coverage, length is also very important in evaluation of a confidence interval. We compare
INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 111

Fig. 7. The mean absolute errors of the coverage of the standard (solid), the Agresti–Coull (dashed), the Jeffreys (+) and the Wilson (dotted) intervals for n = 10 to 25 and n = 26 to 40.

both the expected length and the average expected length of the intervals. By definition,

Expected length = E_{n,p}(length(CI))
                = Σ_{x=0}^{n} (U(x, n) − L(x, n)) (n choose x) p^x (1 − p)^{n−x},

where U and L are the upper and lower limits of the confidence interval CI, respectively. The average expected length is just the integral ∫₀¹ E_{n,p}(length(CI)) dp.

We plot in Figure 8 the expected lengths of the four intervals for n = 25 and α = 0.05. In this case, CI_W is the shortest when 0.210 ≤ p ≤ 0.790, CI_J is the shortest when 0.133 ≤ p ≤ 0.210 or 0.790 ≤ p ≤ 0.867, and CI_s is the shortest when p ≤ 0.133 or p ≥ 0.867. It is no surprise that the standard interval is the shortest when p is near the boundaries. CI_s is not really in contention as a credible choice for such values of p because of its poor coverage properties in that region. Similar qualitative phenomena hold for other values of n.

Figure 9 shows the average expected lengths of the four intervals for n = 10 to 25 and n = 26 to 40. Interestingly, the comparison is clear and consistent as n changes. Always, the standard interval and the Wilson interval CI_W have almost identical average expected length; the Jeffreys interval CI_J is comparable to the Wilson interval, and in fact CI_J is slightly more parsimonious. But the difference is not of practical relevance. However, especially when n is small, the average expected length of CI_AC is noticeably larger than that of CI_J and CI_W. In fact, for n till about 20, the average expected length of CI_AC is larger than that of CI_J by 0.04 to 0.02, and this difference can be of definite practical relevance. The difference starts to wear off when n is larger than 30 or so.

4. OTHER ALTERNATIVE INTERVALS

Several other intervals deserve consideration, either due to their historical value or their theoretical properties. In the interest of space, we had to exercise some personal judgment in deciding which additional intervals should be presented.

4.1 Boundary modification

The coverage probabilities of the Wilson interval and the Jeffreys interval fluctuate acceptably near

Fig. 8. The expected lengths of the standard (solid), the Wilson (dotted), the Agresti–Coull (dashed) and the Jeffreys (+) intervals for n = 25 and α = 0.05.

Fig. 9. The average expected lengths of the standard (solid), the Wilson (dotted), the Agresti–Coull (dashed) and the Jeffreys (+) intervals for n = 10 to 25 and n = 26 to 40.
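The expected-length quantities plotted in Figures 8 and 9 follow directly from the definition in Section 3.3; a minimal sketch (names are ours, and the interval argument is any function returning a (lower, upper) pair):

```python
from scipy.stats import binom

def expected_length(p, n, interval, alpha=0.05):
    """E_{n,p} length(CI) = sum over x of (U(x, n) - L(x, n)) * P(X = x)."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, alpha)
        total += (hi - lo) * binom.pmf(x, n, p)
    return total

def average_expected_length(n, interval, alpha=0.05, grid=500):
    """Approximate the integral of the expected length over p in (0, 1)
    by a midpoint Riemann sum with `grid` cells."""
    h = 1.0 / grid
    return h * sum(expected_length((i + 0.5) * h, n, interval, alpha)
                   for i in range(grid))
```

Any of the intervals discussed (standard, Wilson, Agresti–Coull, Jeffreys) can be plugged in as `interval` to reproduce the comparisons.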

1 − α for p not very close to 0 or 1. Simple modifications can be made to remove a few deep downward spikes of their coverage near the boundaries; see Figure 5.

4.1.1 Modified Wilson interval. The lower bound of the Wilson interval is formed by inverting a CLT approximation. The coverage has downward spikes when p is very near 0 or 1. These spikes exist for all n and α. For example, it can be shown that, when 1 − α = 0.95 and p = 0.1765/n,

lim_{n→∞} P_p(p ∈ CI_W) = 0.838,

and when 1 − α = 0.99 and p = 0.1174/n,

lim_{n→∞} P_p(p ∈ CI_W) = 0.889.

The particular numerical values 0.1174, 0.1765 are relevant only to the extent that, divided by n, they approximate the location of these deep downward spikes.

The spikes can be removed by using a one-sided Poisson approximation for x close to 0 or n. Suppose we modify the lower bound for x = 1, ..., x*. For a fixed 1 ≤ x ≤ x*, the lower bound of CI_W should be replaced by a lower bound of λ_x/n, where λ_x solves

(10) e^{−λ}(λ⁰/0! + λ¹/1! + ⋯ + λ^{x−1}/(x − 1)!) = 1 − α.

A symmetric prescription needs to be followed to modify the upper bound for x very near n. The value of x* should be small. Values which work reasonably well for 1 − α = 0.95 are

x* = 2 for n ≤ 50 and x* = 3 for 51 ≤ n ≤ 100.

Using the relationship between the Poisson and χ² distributions,

P(Y ≤ x) = P(χ²_{2(1+x)} ≥ 2λ),

where Y ~ Poisson(λ), one can also formally express λ_x in (10) in terms of the χ² quantiles: λ_x = (1/2)χ²_{2x, α}, where χ²_{2x, α} denotes the 100αth percentile of the χ² distribution with 2x degrees of freedom. Table 4 gives the values of λ_x for selected values of x and α.

For example, consider the case 1 − α = 0.95 and x = 2. The lower bound of CI_W is ≈ 0.548/(n + 4). The modified Wilson interval replaces this by a lower bound of λ/n, where λ = (1/2)χ²_{4, 0.05}. Thus,

Fig. 10. Coverage probability for n = 50 and p ∈ [0, 0.15]. The plots are symmetric about p = 0.5 and the coverage of the modified intervals (solid line) is the same as that of the corresponding interval without modification (dashed line) for p ∈ [0.15, 0.85].
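The replacement lower bound λ_x/n is one line of code once a χ² quantile function is available; a sketch using scipy (the function name is ours). For x = 2 and α = 0.05 this gives λ ≈ 0.355:

```python
from scipy.stats import chi2

def modified_wilson_lower(x, n, alpha=0.05):
    """Poisson-based replacement lower bound lambda_x / n for small x,
    with lambda_x = (1/2) * (100*alpha-th percentile of chi-square on 2x df)."""
    lam = chi2.ppf(alpha, df=2 * x) / 2.0
    return lam / n
```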

Fig. 11. Coverage probability of other alternative intervals for n = 50.

from a χ² table, for x = 2 the new lower bound is 0.355/n. We denote this modified Wilson interval by CI_MW. See Figure 10 for its coverage.

4.1.2 Modified Jeffreys interval. Evidently, CI_J has an appealing Bayesian interpretation, and its coverage properties are appealing again except for a very narrow downward coverage spike fairly near 0 and 1 (see Figure 5). The unfortunate downward spikes in the coverage function result because U_J(0) is too small and symmetrically L_J(n) is too large. To remedy this, one may revise these two specific limits as

U_MJ(0) = p_l and L_MJ(n) = 1 − p_l,

where p_l satisfies (1 − p_l)^n = α/2 or, equivalently, p_l = 1 − (α/2)^{1/n}.

We also made a slight, ad hoc alteration of L_J(1) and set

L_MJ(1) = 0 and U_MJ(n − 1) = 1.

In all other cases, L_MJ = L_J and U_MJ = U_J. We denote the modified Jeffreys interval by CI_MJ. This modification removes the two steep downward spikes and the performance of the interval is improved. See Figure 10.

4.2 Other intervals

4.2.1 The Clopper–Pearson interval. The Clopper–Pearson interval is the inversion of the equal-tail binomial test rather than its normal approximation. Some authors refer to this as the "exact" procedure because of its derivation from the binomial distribution. If X = x is observed, then the Clopper–Pearson (1934) interval is defined by CI_CP = [L_CP(x), U_CP(x)], where L_CP(x) and U_CP(x) are, respectively, the solutions in p to the equations

P_p(X ≥ x) = α/2 and P_p(X ≤ x) = α/2.

It is easy to show that the lower endpoint is the α/2 quantile of a beta distribution Beta(x, n − x + 1), and the upper endpoint is the 1 − α/2 quantile of a beta distribution Beta(x + 1, n − x). The Clopper–Pearson interval guarantees that the actual coverage probability is always equal to or above the nominal confidence level. However, for any fixed p, the actual coverage probability can be much larger than 1 − α unless n is quite large, and thus the confidence interval is rather inaccurate in this sense. See Figure 11. The Clopper–Pearson interval is wastefully conservative and is not a good choice for practical use, unless strict adherence to the prescription C(p, n) ≥ 1 − α is demanded. Even then, better exact methods are available; see, for instance, Blyth and Still (1983) and Casella (1986).
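Both the Clopper–Pearson interval and the equal-tailed Jeffreys interval reduce to beta quantiles, so they are easy to compute; a sketch assuming scipy, whose `beta.ppf(q, a, b)` is the Beta(a, b) quantile function (function names are ours):

```python
from scipy.stats import beta

def clopper_pearson(x, n, alpha=0.05):
    """Equal-tailed 'exact' interval via beta quantiles."""
    lo = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
    hi = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
    return lo, hi

def jeffreys(x, n, alpha=0.05):
    """Equal-tailed Jeffreys interval: the alpha/2 and 1 - alpha/2
    quantiles of the Beta(x + 1/2, n - x + 1/2) posterior."""
    return (beta.ppf(alpha / 2, x + 0.5, n - x + 0.5),
            beta.ppf(1 - alpha / 2, x + 0.5, n - x + 0.5))
```

A quick numerical check confirms the containment noted in Section 4.3: the Jeffreys interval always lies inside the Clopper–Pearson interval.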

4.2.2 The arcsine interval. Another interval is based on a widely used variance stabilizing transformation for the binomial distribution [see, e.g., Bickel and Doksum, 1977]: T(p) = arcsin(p^{1/2}). This variance stabilization is based on the delta method and is, of course, only an asymptotic one. Anscombe (1948) showed that replacing p̂ by p̌ = (X + 3/8)/(n + 3/4) gives better variance stabilization; furthermore

2n^{1/2}[arcsin(p̌^{1/2}) − arcsin(p^{1/2})] → N(0, 1) as n → ∞.

This leads to an approximate 100(1 − α)% confidence interval for p,

(11) CI_Arc = [sin²(arcsin(p̌^{1/2}) − (1/2)κ n^{−1/2}), sin²(arcsin(p̌^{1/2}) + (1/2)κ n^{−1/2})].

See Figure 11 for the coverage probability of this interval for n = 50. This interval performs reasonably well for p not too close to 0 or 1. The coverage has steep downward spikes near the two edges; in fact it is easy to see that the coverage drops to zero when p is sufficiently close to the boundary (see Figure 11). The mean absolute error of the coverage of CI_Arc is significantly larger than those of CI_W, CI_AC and CI_J. We note that our evaluations show that the performance of the arcsine interval with the standard p̂ in place of p̌ in (11) is much worse than that of CI_Arc.

4.2.3 The logit interval. The logit interval is obtained by inverting a Wald type interval for the log odds λ = log(p/(1 − p)) (see Stone, 1995). The MLE of λ (for 0 < X < n) is

λ̂ = log(p̂/(1 − p̂)) = log(X/(n − X)),

which is the so-called empirical logit transform. The variance of λ̂, by an application of the delta theorem, can be estimated by

V̂ = n / (X(n − X)).

This leads to an approximate 100(1 − α)% confidence interval for λ,

(12) CI_λ = [λ_l, λ_u] = [λ̂ − κ V̂^{1/2}, λ̂ + κ V̂^{1/2}].

The logit interval for p is obtained by inverting the interval (12),

(13) CI_Logit = [e^{λ_l}/(1 + e^{λ_l}), e^{λ_u}/(1 + e^{λ_u})].

The interval (13) has been suggested, for example, in Stone (1995, page 667). Figure 11 plots the coverage of the logit interval for n = 50. This interval performs quite well in terms of coverage for p away from 0 or 1. But the interval is unnecessarily long; in fact its expected length is larger than that of the Clopper–Pearson exact interval.

Remark. Anscombe (1956) suggested that λ̌ = log[(X + 1/2)/(n − X + 1/2)] is a better estimate of λ; see also Cox and Snell (1989) and Santner and Duffy (1989). The variance of Anscombe's λ̌ may be estimated by

V̌ = (n + 1)(n + 2) / [n(X + 1)(n − X + 1)].

A new logit interval can be constructed using the new estimates λ̌ and V̌. Our evaluations show that the new logit interval is overall shorter than CI_Logit in (13). But the coverage of the new interval is not satisfactory.

4.2.4 The Bayesian HPD interval. An exact Bayesian solution would involve using the HPD intervals instead of our equal-tails proposal. However, HPD intervals are much harder to compute and do not do as well in terms of coverage probability. See Figure 11 and compare to the Jeffreys equal-tailed interval in Figure 5.

4.2.5 The likelihood ratio interval. Along with the Wald and the Rao score intervals, the likelihood ratio method is one of the most used methods for construction of confidence intervals. It is constructed by inversion of the likelihood ratio test which accepts the null hypothesis H₀: p = p₀ if −2 log(Λ_n) ≤ κ², where Λ_n is the likelihood ratio

Λ_n = L(p₀) / sup_p L(p, X) = p₀^X (1 − p₀)^{n−X} / [(X/n)^X (1 − X/n)^{n−X}],

L being the likelihood function. See Rao (1973). Brown, Cai and DasGupta (1999) show by analytical calculations that this interval has nice properties. However, it is slightly harder to compute. For the purpose of the present article, which we view as primarily directed toward practice, we do not further analyze the likelihood ratio interval.

4.3 Connections between Jeffreys Intervals and Mid-P Intervals

The equal-tailed Jeffreys prior interval has some interesting connections to the Clopper–

Pearson interval CI_CP can be written as

CI_CP = [B(α/2; X, n − X + 1), B(1 − α/2; X + 1, n − X)].

It therefore follows immediately that CI_J is always contained in CI_CP. Thus CI_J corrects the conservativeness of CI_CP.

It turns out that the Jeffreys prior interval, although Bayesianly constructed, has a clear and convincing frequentist motivation. It is thus no surprise that it does well from a frequentist perspective. As we now explain, the Jeffreys prior interval CI_J can be regarded as a continuity corrected version of the Clopper–Pearson interval CI_CP.

The interval CI_CP inverts the inequality P_p(X ≤ L_α(p)) ≤ α/2 to obtain the lower limit and similarly for the upper limit. Thus, for fixed x, the upper limit of the interval for p, U_CP(x), satisfies

(14) P_{U_CP(x)}(X ≤ x) = α/2,

and symmetrically for the lower limit.

This interval is very conservative; undesirably so for most practical purposes. A familiar proposal to eliminate this over-conservativeness is to instead invert

(15) P_p(X ≤ L_α(p) − 1) + (1/2)P_p(X = L_α(p)) = α/2.

This amounts to solving

(16) (1/2)[P_{U_mid-P(x)}(X ≤ x − 1) + P_{U_mid-P(x)}(X ≤ x)] = α/2,

which is the same as

(17) U_mid-P(x) = (1/2)B(1 − α/2; x, n − x + 1) + (1/2)B(1 − α/2; x + 1, n − x),

and symmetrically for the lower endpoint. These are the mid-P Clopper–Pearson intervals. They are known to have good coverage and length performance. U_mid-P given in (17) is a weighted average of two incomplete Beta functions. The incomplete Beta function of interest, B(1 − α/2; x, n − x + 1), is continuous and monotone in x if we formally treat x as a continuous argument. Hence the average of the two functions defining U_mid-P is approximately the same as the value at the halfway point, x + 1/2. Thus

U_mid-P(x) ≈ B(1 − α/2; x + 1/2, n − x + 1/2) = U_J(x),

exactly the upper limit for the equal-tailed Jeffreys interval. Similarly, the corresponding approximate lower endpoint is the Jeffreys lower limit.

Another frequentist way to interpret the Jeffreys prior interval is to say that U_J(x) is the upper limit for the Clopper–Pearson rule with x − 1/2 successes and L_J(x) is the lower limit for the Clopper–Pearson rule with x + 1/2 successes. Strawderman and Wells (1998) contains a valuable discussion of mid-P intervals and suggests some variations based on asymptotic expansions.

5. CONCLUDING REMARKS

Interval estimation of a binomial proportion is a very basic problem in practical statistics. The standard Wald interval is in nearly universal use. We first show that the performance of this standard interval is persistently chaotic and unacceptably poor. Indeed its coverage properties defy conventional wisdom. The performance is so erratic and the qualifications given in the influential texts are so defective that the standard interval should not be used. We provide a fairly comprehensive evaluation of many natural alternative intervals. Based on this analysis, we recommend the Wilson or the equal-tailed Jeffreys prior interval for small n (n ≤ 40). These two intervals are comparable in both absolute error and length for n ≤ 40, and we believe that either could be used, depending on taste.

For larger n, the Wilson, the Jeffreys and the Agresti–Coull intervals are all comparable, and the Agresti–Coull interval is the simplest to present. It is generally true in statistical practice that only those methods that are easy to describe, remember and compute are widely used. Keeping this in mind, we recommend the Agresti–Coull interval for practical use when n ≥ 40. Even for small sample sizes, the easy-to-present Agresti–Coull interval is much preferable to the standard one.

We would be satisfied if this article contributes to a greater appreciation of the severe flaws of the popular standard interval and an agreement that it deserves not to be used at all. We also hope that the recommendations for alternative intervals will provide some closure as to what may be used in preference to the standard method.

Finally, we note that the specific choices of the values of n, p and α in the examples and figures are artifacts. The theoretical results in Brown, Cai and DasGupta (1999) show that qualitatively similar phenomena as regarding coverage and length hold for general n and p and common values of the coverage. (Those results there are asymptotic as n → ∞, but they are also sufficiently accurate for realistically moderate n.)
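The approximation U_mid-P(x) ≈ U_J(x) above is easy to check numerically; a sketch (function names are ours, B(q; a, b) is scipy's `beta.ppf(q, a, b)`):

```python
from scipy.stats import beta

def midp_upper(x, n, alpha=0.05):
    """Mid-P Clopper-Pearson upper limit, the average of two beta
    quantiles as in equation (17)."""
    return (0.5 * beta.ppf(1 - alpha / 2, x, n - x + 1)
            + 0.5 * beta.ppf(1 - alpha / 2, x + 1, n - x))

def jeffreys_upper(x, n, alpha=0.05):
    """Upper limit of the equal-tailed Jeffreys interval."""
    return beta.ppf(1 - alpha / 2, x + 0.5, n - x + 0.5)
```

For moderate n the two upper limits agree to within a few thousandths across the whole sample space.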

APPENDIX
Table A.1
95% limits of the modified Jeffreys prior interval (for each x, the pair of entries under each n gives the lower and upper limits)

x n=7 n=8 n=9 n = 10 n = 11 n = 12

0 0 0.410 0 0.369 0 0.336 0 0.308 0 0.285 0 0.265


1 0 0.501 0 0.454 0 0.414 0 0.381 0 0.353 0 0.328
2 0.065 0.648 0.056 0.592 0.049 0.544 0.044 0.503 0.040 0.467 0.036 0.436
3 0.139 0.766 0.119 0.705 0.104 0.652 0.093 0.606 0.084 0.565 0.076 0.529
4 0.234 0.861 0.199 0.801 0.173 0.746 0.153 0.696 0.137 0.652 0.124 0.612
5 0.254 0.827 0.224 0.776 0.200 0.730 0.180 0.688
6 0.270 0.800 0.243 0.757

x n = 13 n = 14 n = 15 n = 16 n = 17 n = 18

0 0 0.247 0 0.232 0 0.218 0 0.206 0 0.195 0 0.185


1 0 0.307 0 0.288 0 0.272 0 0.257 0 0.244 0 0.232
2 0.033 0.409 0.031 0.385 0.029 0.363 0.027 0.344 0.025 0.327 0.024 0.311
3 0.070 0.497 0.064 0.469 0.060 0.444 0.056 0.421 0.052 0.400 0.049 0.381
4 0.114 0.577 0.105 0.545 0.097 0.517 0.091 0.491 0.085 0.467 0.080 0.446
5 0.165 0.650 0.152 0.616 0.140 0.584 0.131 0.556 0.122 0.530 0.115 0.506
6 0.221 0.717 0.203 0.681 0.188 0.647 0.174 0.617 0.163 0.589 0.153 0.563
7 0.283 0.779 0.259 0.741 0.239 0.706 0.222 0.674 0.207 0.644 0.194 0.617
8 0.294 0.761 0.272 0.728 0.254 0.697 0.237 0.668
9 0.303 0.746 0.284 0.716

x n = 19 n = 20 n = 21 n = 22 n = 23 n = 24

0 0 0.176 0 0.168 0 0.161 0 0.154 0 0.148 0 0.142


1 0 0.221 0 0.211 0 0.202 0 0.193 0 0.186 0 0.179
2 0.022 0.297 0.021 0.284 0.020 0.272 0.019 0.261 0.018 0.251 0.018 0.241
3 0.047 0.364 0.044 0.349 0.042 0.334 0.040 0.321 0.038 0.309 0.036 0.297
4 0.076 0.426 0.072 0.408 0.068 0.392 0.065 0.376 0.062 0.362 0.059 0.349
5 0.108 0.484 0.102 0.464 0.097 0.446 0.092 0.429 0.088 0.413 0.084 0.398
6 0.144 0.539 0.136 0.517 0.129 0.497 0.123 0.478 0.117 0.461 0.112 0.444
7 0.182 0.591 0.172 0.568 0.163 0.546 0.155 0.526 0.148 0.507 0.141 0.489
8 0.223 0.641 0.211 0.616 0.199 0.593 0.189 0.571 0.180 0.551 0.172 0.532
9 0.266 0.688 0.251 0.662 0.237 0.638 0.225 0.615 0.214 0.594 0.204 0.574
10 0.312 0.734 0.293 0.707 0.277 0.681 0.263 0.657 0.250 0.635 0.238 0.614
11 0.319 0.723 0.302 0.698 0.287 0.675 0.273 0.653
12 0.325 0.713 0.310 0.690

x n = 25 n = 26 n = 27 n = 28 n = 29 n = 30

0 0 0.137 0 0.132 0 0.128 0 0.123 0 0.119 0 0.116


1 0 0.172 0 0.166 0 0.160 0 0.155 0 0.150 0 0.145
2 0.017 0.233 0.016 0.225 0.016 0.217 0.015 0.210 0.015 0.203 0.014 0.197
3 0.035 0.287 0.034 0.277 0.032 0.268 0.031 0.259 0.030 0.251 0.029 0.243
4 0.056 0.337 0.054 0.325 0.052 0.315 0.050 0.305 0.048 0.295 0.047 0.286
5 0.081 0.384 0.077 0.371 0.074 0.359 0.072 0.348 0.069 0.337 0.067 0.327
6 0.107 0.429 0.102 0.415 0.098 0.402 0.095 0.389 0.091 0.378 0.088 0.367
7 0.135 0.473 0.129 0.457 0.124 0.443 0.119 0.429 0.115 0.416 0.111 0.404
8 0.164 0.515 0.158 0.498 0.151 0.482 0.145 0.468 0.140 0.454 0.135 0.441
9 0.195 0.555 0.187 0.537 0.180 0.521 0.172 0.505 0.166 0.490 0.160 0.476
10 0.228 0.594 0.218 0.576 0.209 0.558 0.201 0.542 0.193 0.526 0.186 0.511
11 0.261 0.632 0.250 0.613 0.239 0.594 0.230 0.577 0.221 0.560 0.213 0.545
12 0.295 0.669 0.282 0.649 0.271 0.630 0.260 0.611 0.250 0.594 0.240 0.578
13 0.331 0.705 0.316 0.684 0.303 0.664 0.291 0.645 0.279 0.627 0.269 0.610
14 0.336 0.697 0.322 0.678 0.310 0.659 0.298 0.641
15 0.341 0.690 0.328 0.672
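The entries of Table A.1 can be reproduced from beta quantiles together with the boundary modifications of Section 4.1.2; a sketch (assuming scipy's Beta quantile function; the function name is ours):

```python
from scipy.stats import beta

def modified_jeffreys(x, n, alpha=0.05):
    """Equal-tailed Jeffreys limits with the boundary changes of
    Section 4.1.2; reproduces Table A.1 up to the printed rounding."""
    lo = beta.ppf(alpha / 2, x + 0.5, n - x + 0.5)
    hi = beta.ppf(1 - alpha / 2, x + 0.5, n - x + 0.5)
    p_l = 1.0 - (alpha / 2) ** (1.0 / n)  # solves (1 - p)^n = alpha/2
    if x == 0:
        lo, hi = 0.0, p_l
    elif x == 1:
        lo = 0.0
    elif x == n - 1:
        hi = 1.0
    elif x == n:
        lo, hi = 1.0 - p_l, 1.0
    return lo, hi
```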

ACKNOWLEDGMENTS

We thank Xuefeng Li for performing some helpful computations and Jim Berger, David Moore, Steve Samuels, Bill Studden and Ron Thisted for useful conversations. We also thank the Editors and two anonymous referees for their thorough and constructive comments. Supported by grants from the National Science Foundation and the National Security Agency.

REFERENCES

Abramowitz, M. and Stegun, I. A. (1970). Handbook of Mathematical Functions. Dover, New York.
Agresti, A. and Coull, B. A. (1998). Approximate is better than "exact" for interval estimation of binomial proportions. Amer. Statist. 52 119–126.
Anscombe, F. J. (1948). The transformation of Poisson, binomial and negative binomial data. Biometrika 35 246–254.
Anscombe, F. J. (1956). On estimating binomial response relations. Biometrika 43 461–464.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, New York.
Berry, D. A. (1996). Statistics: A Bayesian Perspective. Wadsworth, Belmont, CA.
Bickel, P. and Doksum, K. (1977). Mathematical Statistics. Prentice-Hall, Englewood Cliffs, NJ.
Blyth, C. R. and Still, H. A. (1983). Binomial confidence intervals. J. Amer. Statist. Assoc. 78 108–116.
Brown, L. D., Cai, T. and DasGupta, A. (1999). Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Statist., to appear.
Brown, L. D., Cai, T. and DasGupta, A. (2000). Interval estimation in discrete exponential family. Technical report, Dept. Statistics, Univ. Pennsylvania.
Casella, G. (1986). Refining binomial confidence intervals. Canad. J. Statist. 14 113–129.
Casella, G. and Berger, R. L. (1990). Statistical Inference. Wadsworth & Brooks/Cole, Belmont, CA.
Clopper, C. J. and Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26 404–413.
Cox, D. R. and Snell, E. J. (1989). Analysis of Binary Data, 2nd ed. Chapman and Hall, London.
Cressie, N. (1980). A finely tuned continuity correction. Ann. Inst. Statist. Math. 30 435–442.
Ghosh, B. K. (1979). A comparison of some approximate confidence intervals for the binomial parameter. J. Amer. Statist. Assoc. 74 894–900.
Hall, P. (1982). Improving the normal approximation when constructing one-sided confidence intervals for binomial or Poisson parameters. Biometrika 69 647–652.
Lehmann, E. L. (1999). Elements of Large-Sample Theory. Springer, New York.
Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion; comparison of several methods. Statistics in Medicine 17 857–872.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. Wiley, New York.
Samuels, M. L. and Witmer, J. W. (1999). Statistics for the Life Sciences, 2nd ed. Prentice Hall, Englewood Cliffs, NJ.
Santner, T. J. (1998). A note on teaching binomial confidence intervals. Teaching Statistics 20 20–23.
Santner, T. J. and Duffy, D. E. (1989). The Statistical Analysis of Discrete Data. Springer, Berlin.
Stone, C. J. (1995). A Course in Probability and Statistics. Duxbury, Belmont, CA.
Strawderman, R. L. and Wells, M. T. (1998). Approximately exact inference for the common odds ratio in several 2 × 2 tables (with discussion). J. Amer. Statist. Assoc. 93 1294–1320.
Tamhane, A. C. and Dunlop, D. D. (2000). Statistics and Data Analysis from Elementary to Intermediate. Prentice Hall, Englewood Cliffs, NJ.
Vollset, S. E. (1993). Confidence intervals for a binomial proportion. Statistics in Medicine 12 809–824.
Wasserman, L. (1991). An inferential interpretation of default priors. Technical report, Carnegie-Mellon Univ.
Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. J. Amer. Statist. Assoc. 22 209–212.

Comment
Alan Agresti and Brent A. Coull

Alan Agresti is Distinguished Professor, Department of Statistics, University of Florida, Gainesville, Florida 32611-8545 (e-mail: aa@stat.ufl.edu). Brent A. Coull is Assistant Professor, Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115 (e-mail: bcoull@hsph.harvard.edu).

In this very interesting article, Professors Brown, Cai and DasGupta (BCD) have shown that discreteness can cause havoc for much larger sample sizes than one would expect. The popular (Wald) confidence interval for a binomial parameter p has been known for some time to behave poorly, but readers will surely be surprised that this can happen for such large n values.

Interval estimation of a binomial parameter is deceptively simple, as there are not even any nuisance parameters. The gold standard would seem to be a method such as the Clopper–Pearson, based on inverting an exact test using the binomial dis-

Fig. 1. A comparison of mean expected lengths for the nominal 95% Jeffreys (J), Wilson (W), Modified Jeffreys (M-J), Modified Wilson (M-W) and Agresti–Coull (AC) intervals for n = 5, 6, 7, 8, 9.

tribution rather than an approximate test using the normal. Because of discreteness, however, this method is too conservative. A more practical, nearly gold standard for this and other discrete problems seems to be based on inverting a two-sided test using the exact distribution but with the mid-P value. Similarly, with large-sample methods it is better not to use a continuity correction, as otherwise it approximates exact inference based on an ordinary P-value, resulting in conservative behavior. Interestingly, BCD note that the Jeffreys interval (CI_J) approximates the mid-P value correction of the Clopper–Pearson interval. See Gart (1966) for related remarks about the use of 1/2 additions to numbers of successes and failures before using frequentist methods.

1. METHODS FOR ELEMENTARY STATISTICS COURSES

It's unfortunate that the Wald interval for p is so seriously deficient, because in addition to being the simplest interval it is the obvious one to teach in elementary statistics courses. By contrast, the Wilson interval (CI_W) performs surprisingly well even for small n. Since it is too complex for many such courses, however, our motivation for the Agresti–Coull interval (CI_AC) was to provide a simple approximation for CI_W. Formula (4) in BCD shows that the midpoint p̃ for CI_W is a weighted average of p̂ and 1/2 that equals the sample proportion after adding z²_{α/2} pseudo observations, half of each type; the square of the coefficient of z_{α/2} is the same weighted average of the variance of a sample proportion when p = p̂ and when p = 1/2, using ñ = n + z²_{α/2} in place of n. The CI_AC uses the CI_W midpoint, but its squared coefficient of z_{α/2} is the variance p̃q̃/ñ at the weighted average p̃ rather than the weighted average of the variances. The resulting interval p̃ ± z_{α/2}(p̃q̃/ñ)^{1/2} is wider than CI_W (by Jensen's inequality), in particular being conservative for p near 0 and 1 where CI_W can suffer poor coverage probabilities.

Regarding textbook qualifications on sample size for using the Wald interval, skewness considerations and the Edgeworth expansion suggest that guidelines for n should depend on p through (1 − 2p)²/[p(1 − p)]. See, for instance, Boos and Hughes-Oliver (2000). But this does not account for the effects of discreteness, and as BCD point out, guidelines in terms of p̂ are not verifiable. For elementary course teaching there is no obvious alternative (such as t methods) for smaller n, so we think it is sensible to teach a single method that behaves reasonably well for all n, as do the Wilson, Jeffreys and Agresti–Coull intervals.

2. IMPROVED PERFORMANCE WITH BOUNDARY MODIFICATIONS

BCD showed that one can improve the behavior of the Wilson and Jeffreys intervals for p near 0 and 1 by modifying the endpoints for CI_W when x = 1, 2, n − 2, n − 1 (and x = 3 and n − 3 for n > 50) and for CI_J when x = 0, 1, n − 1, n. Once one permits the modification of methods near the sample space boundary, other methods may perform decently besides the three recommended in this article.

For instance, Newcombe (1998) showed that when 0 < x < n the Wilson interval CI_W and the Wald logit interval have the same midpoint on the logit scale. In fact, Newcombe has shown (personal communication, 1999) that the logit interval necessarily

Fig. 2. A comparison of expected lengths for the nominal 95% Jeffreys (J), Wilson (W), Modified Jeffreys (M-J), Modified Wilson (M-W) and Agresti–Coull (AC) intervals for n = 5.
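The Agresti–Coull recipe discussed in this comment (add z²_{α/2}/2 pseudo successes and failures, then apply the Wald formula) is a few lines of code; a sketch (the function name is ours):

```python
from math import sqrt
from scipy.stats import norm

def agresti_coull(x, n, alpha=0.05):
    """Add z^2/2 pseudo successes and z^2/2 pseudo failures, then
    apply the Wald recipe with n~ = n + z^2 and p~ = (x + z^2/2)/n~."""
    z = norm.ppf(1 - alpha / 2)
    n_t = n + z ** 2
    p_t = (x + z ** 2 / 2) / n_t
    half = z * sqrt(p_t * (1 - p_t) / n_t)
    return max(0.0, p_t - half), min(1.0, p_t + half)
```

For α = 0.05, z²_{α/2} ≈ 3.84 ≈ 4, which is why the "add two successes and two failures" simplification works so well.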

contains CI_W. The logit interval is the uninformative one [0, 1] when x = 0 or x = n, but substituting the Clopper–Pearson limits in those cases yields coverage probability functions that resemble those for CI_W and CI_AC, although considerably more conservative for small n. Rubin and Schenker (1987) recommended the logit interval after 1/2 additions to numbers of successes and failures, motivating it as a normal approximation to the posterior distribution of the logit parameter after using the Jeffreys prior. However, this modification has coverage probabilities that are unacceptably small for p near 0 and 1 (see Vollset, 1993). Presumably some other boundary modification will result in a happy medium. In a letter to the editor about Agresti and Coull (1998), Rindskopf (2000) argued in favor of the logit interval partly because of its connection with logit modeling. We have not used this method for teaching in elementary courses, since logit intervals do not extend to intervals for the difference of proportions and (like CI_W and CI_J) they are rather complex for that level.

For practical use and for teaching in more advanced courses, some statisticians may prefer the likelihood ratio interval, since conceptually it is simple and the method also applies in a general model-building framework. An advantage compared to the Wald approach is its invariance to the choice of scale, resulting, for instance, both from the original scale and the logit. BCD do not say much about this interval, since it is harder to compute. However, it is easy to obtain with standard statistical software (e.g., in SAS, using the LRCI option in PROC GENMOD for a model containing only an intercept term and assuming a binomial response with logit

[…]

suggest that the boundary-modified likelihood ratio interval also behaves reasonably well, although conservative for p near 0 and 1.

For elementary course teaching, a disadvantage of all such intervals using boundary modifications is that making exceptions from a general, simple recipe distracts students from the simple concept of taking the estimate plus and minus a normal score multiple of a standard error. (Of course, this concept is not sufficient for serious statistical work, but some oversimplification and compromise is necessary at that level.) Even with CI_AC, instructors may find it preferable to give a recipe with the same number of added pseudo observations for all α, instead of z²_{α/2}. Reasonably good performance seems to result, especially for small α, from the value 4 ≈ z²_{0.025} used in the 95% CI_AC interval (i.e., the "add two successes and two failures" interval). Agresti and Caffo (2000) discussed this and showed that adding four pseudo observations also dramatically improves the Wald two-sample interval for comparing proportions, although again at the cost of rather severe conservativeness when both parameters are near 0 or near 1.

3. ALTERNATIVE WIDTH COMPARISON

In comparing the expected lengths of the three recommended intervals, BCD note that the comparison is "clear and consistent as n changes," with the average expected length being noticeably larger for CI_AC than CI_J and CI_W. Thus, in their concluding remarks, they recommend CI_J and CI_W for small n. However, since BCD recommend modifying CI_J and CI_W to eliminate severe downward
or identity link function). Graphs in Vollset (1993) spikes of coverage probabilities, we believe that a
120 L. D. BROWN, T. T. CAI AND A. DASGUPTA

more fair comparison of expected lengths uses the Finally, we are curious about the implications of
modied versions CIMJ and CIMW . We checked the BCD results in a more general setting. How
this but must admit that gures analogous to much does their message about the effects of dis-
the BCD Figures 8 and 9 show that CIMJ and creteness and basing interval estimation on the
CIMW maintain their expected length advantage Jeffreys prior or the score test rather than the Wald
over CIAC , although it is reduced somewhat. test extend to parameters in other discrete distri-
However, when n decreases below 10, the results butions and to two-sample comparisons? We have
change, with CIMJ having greater expected width seen that interval estimation of the Poisson param-
than CIAC and CIMW . Our Figure 1 extends the eter benets from inverting the score test rather
BCD Figure 9 to values of n < 10, showing how the than the Wald test on the count scale (Agresti and
comparison differs between the ordinary intervals Coull, 1998).
and the modied ones. Our Figure 2 has the format One would not think there could be anything
of the BCD Figure 8, but for n = 5 instead of 25. new to say about the Wald condence interval
Admittedly, n = 5 is a rather extreme case, one for for a proportion, an inferential method that must
which the Jeffreys interval is modied unless x = 2 be one of the most frequently used since Laplace
or 3 and the Wilson interval is modied unless x = 0 (1812, page 283). Likewise, the condence inter-
or 5, and for it CIAC has coverage probabilities that val for a proportion based on the Jeffreys prior
can dip below 0.90. Thus, overall, the BCD recom- has received attention in various forms for some
mendations about choice of method seem reasonable time. For instance, R. A. Fisher (1956, pages 63
to us. Our own preference is to use the Wilson inter- 70) showed the similarity of a Bayesian analysis
val for statistical practice and CIAC for teaching in with Jeffreys prior to his ducial approach, in a dis-
elementary statistics courses. cussion that was generally critical of the condence
interval method but grudgingly admitted of limits
4. EXTENSIONS obtained by a test inversion such as the Clopper
Other than near-boundary modications, another Pearson method, though they fall short in logical
type of ne-tuning that may help is to invert a test content of the limits found by the ducial argument,
permitting unequal tail probabilities. This occurs and with which they have often been confused, they
naturally in exact inference that inverts a sin- do full some of the desiderata of statistical infer-
gle two-tailed test, which can perform better than ences. Congratulations to the authors for brilliantly
inverting two separate one-tailed tests (e.g., Sterne, casting new light on the performance of these old
1954; Blyth and Still, 1983). and established methods.
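The "add two successes and two failures" recipe mentioned above is simple enough to state in a few lines of code. The sketch below is our own illustration, not from the article (the function name `agresti_coull` is ours), using the general form with z²/2 pseudo successes and z²/2 pseudo failures:

```python
from math import sqrt

def agresti_coull(x, n, z=1.96):
    """Agresti-Coull interval: add z^2/2 pseudo successes and z^2/2
    pseudo failures (about 2 and 2 when z = 1.96), then apply the
    Wald formula with the adjusted count n + z^2."""
    n_tilde = n + z ** 2
    p_tilde = (x + z ** 2 / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return max(0.0, p_tilde - half), min(1.0, p_tilde + half)

# With x = 0 and n = 10 the interval is not degenerate at 0,
# unlike the Wald interval, which collapses to the single point 0.
lo, hi = agresti_coull(0, 10)
```

The clipping to [0, 1] reflects the fact that, unlike the Wilson interval, the adjusted Wald form can overshoot the parameter space near the boundary.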

Comment
George Casella

George Casella is Arun Varma Commemorative Term Professor and Chair, Department of Statistics, University of Florida, Gainesville, Florida 32611-8545 (e-mail: casella@stat.ufl.edu).

1. INTRODUCTION

Professors Brown, Cai and DasGupta (BCD) are to be congratulated for their clear and imaginative look at a seemingly timeless problem. The chaotic behavior of coverage probabilities of discrete confidence sets has always been an annoyance, resulting in intervals whose coverage probability can be vastly different from their nominal confidence level. What we now see is that for the Wald interval, an approximate interval, the chaotic behavior is relentless, as this interval will not maintain 1 − α coverage for any value of n. Although fixes relying on ad hoc rules abound, they do not solve this fundamental defect of the Wald interval and, surprisingly, the usual safety net of asymptotics is also shown not to exist. So, as the song goes, "Bye-bye, so long, farewell" to the Wald interval.

Now that the Wald interval is out, what is in? There are probably two answers here, depending on whether one is in the classroom or the consulting room.

Fig. 1. Coverage probabilities of the Blyth–Still interval (upper) and Agresti–Coull interval (lower) for n = 100 and 1 − α = 0.95.

2. WHEN YOU SAY 95%

In the classroom it is (still) valuable to have a formula for a confidence interval, and I typically present the Wilson/score interval, starting from the test statistic formulation. Although this doesn't have the pleasing "p̂ ± something" form, most students can understand the logic of test inversion. Moreover, the fact that the interval does not have a symmetric form is a valuable lesson in itself; the statistical world is not always symmetric.

However, one thing still bothers me about this interval. It is clearly not a 1 − α interval; that is, it does not maintain its nominal coverage probability. This is a defect, and one that should not be compromised. I am uncomfortable in presenting a confidence interval that does not maintain its stated confidence; when you say 95% you should mean 95%!

But the fix here is rather simple: apply the continuity correction to the score interval (a technique that seems to be out of favor for reasons I do not understand). The continuity correction is easy to justify in the classroom using pictures of the normal density overlaid on the binomial mass function, and the resulting interval will now maintain its nominal level. (This last statement is not based on analytic proof, but on numerical studies.) Anyone reading Blyth (1986) cannot help being convinced that this is an excellent approximation, coming at only a slightly increased effort.

One other point that Blyth makes, which BCD do not mention, is that it is easy to get exact confidence limits at the endpoints. That is, for X = 0 the lower bound is 0 and for X = 1 the lower bound is 1 − (1 − α)^(1/n) [the solution to P(X = 0) = 1 − α].

3. USE YOUR TOOLS

The essential message that I take away from the work of BCD is that an approximate/formula-based approach to constructing a binomial confidence interval is bound to have essential flaws. However, this is a situation where brute force computing will do the trick. The construction of a 1 − α binomial confidence interval is a discrete optimization problem that is easily programmed. So why not use the tools that we have available? If the problem will yield to brute force computation, then we should use that solution.

Blyth and Still (1983) showed how to compute exact intervals through numerical inversion of tests, and Casella (1986) showed how to compute exact intervals by refining conservative intervals. So for any value of n and α, we can compute an exact, shortest 1 − α confidence interval that will not display any of the pathological behavior illustrated by BCD. As an example, Figure 1 shows the Agresti–Coull interval along with the Blyth–Still interval for n = 100 and 1 − α = 0.95. While the Agresti–Coull interval fails to maintain 0.95 coverage in the middle p region, the Blyth–Still interval always maintains 0.95 coverage. What is more surprising, however, is that the Blyth–Still interval displays much less variation in its coverage probability, especially near the endpoints. Thus, the simplistic numerical algorithm produces an excellent interval, one that both maintains its guaranteed coverage and reduces oscillation in the coverage probabilities.

ACKNOWLEDGMENT

Supported by NSF Grant DMS-99-71586.
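Casella's endpoint observation can be checked directly. The sketch below is our own (the function name is hypothetical); it solves P(X = 0) = (1 − p)ⁿ = 1 − α for p, which is the exact lower bound he cites for X = 1:

```python
def exact_lower_bound(n, alpha=0.05):
    """Exact lower confidence bound for p when X = 1: the solution
    in p of P(X = 0) = (1 - p)^n = 1 - alpha."""
    return 1 - (1 - alpha) ** (1.0 / n)

# Sanity check: plugging the bound back into (1 - p)^n recovers 1 - alpha.
p = exact_lower_bound(10)
residual = (1 - p) ** 10 - 0.95
```

For n = 10 and α = 0.05 this gives a lower bound of about 0.005, and no iterative root-finding is needed at the endpoint.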

Comment
Chris Corcoran and Cyrus Mehta

Chris Corcoran is Assistant Professor, Department of Mathematics and Statistics, Utah State University, 3900 Old Main Hill, Logan, Utah 84322-3900 (e-mail: corcoran@math.usu.edu). Cyrus Mehta is Professor, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115 and is with Cytel Software Corporation, 675 Massachusetts Avenue, Cambridge, Massachusetts 02139.

We thank the authors for a very accessible and thorough discussion of this practical problem. With the availability of modern computational tools, we have an unprecedented opportunity to carefully evaluate standard statistical procedures in this manner. The results of such work are invaluable to teachers and practitioners of statistics everywhere. We particularly appreciate the attention paid by the authors to the generally oversimplified and inadequate recommendations made by statistical texts regarding when to use normal approximations in analyzing binary data. As their work has plainly shown, even in the simple case of a single binomial proportion, the discreteness of the data makes the use of some asymptotic procedures tenuous, even when the underlying probability lies away from the boundary or when the sample size is relatively large.

The authors have evaluated various confidence intervals with respect to their coverage properties and average lengths. Implicit in their evaluation is the premise that overcoverage is just as bad as undercoverage. We disagree with the authors on this fundamental issue. If, because of the discreteness of the test statistic, the desired confidence level cannot be attained, one would ordinarily prefer overcoverage to undercoverage. Wouldn't you prefer to hire a fortune teller whose track record exceeds expectations to one whose track record is unable to live up to its claim of accuracy? With the exception of the Clopper–Pearson interval, none of the intervals discussed by the authors lives up to its claim of 95% accuracy throughout the range of p. Yet the authors dismiss this interval on the grounds that it is wastefully conservative. Perhaps so, but they do not address the issue of how the wastefulness is manifested.

What penalty do we incur for furnishing confidence intervals that are more truthful than was required of them? Presumably we pay for the conservatism by an increase in the length of the confidence interval. We thought it would be a useful exercise to actually investigate the magnitude of this penalty for two confidence interval procedures that are guaranteed to provide the desired coverage but are not as conservative as Clopper–Pearson. Figure 1 displays the true coverage probabilities for the nominal 95% Blyth–Still–Casella (see Blyth and Still, 1983; Casella, 1984) confidence interval (BSC interval) and the 95% confidence interval obtained by inverting the exact likelihood ratio test (LR interval; the inversion follows that shown by Aitken, Anderson, Francis and Hinde, 1989, pages 112–118).

Fig. 1. Actual coverage probabilities for BSC and LR intervals as a function of p (n = 50). Compare to authors' Figures 5, 10 and 11.

There is no value of p for which the coverage of the BSC and LR intervals falls below 95%. Their coverage probabilities are, however, much closer to 95% than would be obtained by the Clopper–Pearson procedure, as is evident from the authors' Figure 11. Thus one could say that these two intervals are uniformly better than the Clopper–Pearson interval.

We next investigate the penalty to be paid for the guaranteed coverage in terms of increased length of the BSC and LR intervals relative to the Wilson, Agresti–Coull, or Jeffreys intervals recommended by the authors. This is shown by Figure 2. In fact the BSC and LR intervals are actually shorter than Agresti–Coull for p < 0.2 or p > 0.8, and shorter than the Wilson interval for p < 0.1 and p > 0.9. The only interval that is uniformly shorter than BSC and LR is the Jeffreys interval. Most of the time the difference in lengths is negligible, and in the worst case (at p = 0.5) the Jeffreys interval is only shorter by 0.025 units. Of the three asymptotic methods recommended by the authors, the Jeffreys interval yields the lowest average probability of coverage, with significantly greater potential relative undercoverage in the (0.05, 0.20) and (0.80, 0.95) regions of the parameter space. Considering this, one must question the rationale for preferring Jeffreys to either BSC or LR.

The authors argue for simplicity and ease of computation. This argument is valid for the teaching of statistics, where the instructor must balance simplicity with accuracy. As the authors point out, it is customary to teach the standard interval in introductory courses because the formula is straightforward and the central limit theorem provides a good heuristic for motivating the normal approximation. However, the evidence shows that the standard method is woefully inadequate. Teaching statistical novices about a Clopper–Pearson type interval is conceptually difficult, particularly because exact intervals are impossible to compute by hand. As the Agresti–Coull interval preserves the confidence level most successfully among the three recommended alternative intervals, we believe that this feature when coupled with its straightforward computation (particularly when α = 0.05) makes this approach ideal for the classroom.

Simplicity and ease of computation have no role to play in statistical practice. With the advent of powerful microcomputers, researchers no longer resort to hand calculations when analyzing data. While the need for simplicity applies to the classroom, in applications we primarily desire reliable, accurate solutions, as there is no significant difference in the computational overhead required by the authors' recommended intervals when compared to the BSC and LR methods. From this perspective, the BSC and LR intervals have a substantial advantage relative to the various asymptotic intervals presented by the authors. They guarantee coverage at a relatively low cost in increased length. In fact, the BSC interval is already implemented in StatXact (1998) and is therefore readily accessible to practitioners.

Fig. 2. Expected lengths of BSC and LR intervals as a function of p, compared respectively to Wilson, Agresti–Coull and Jeffreys intervals (n = 25). Compare to authors' Figure 8.
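Coverage curves such as those in Figures 1 and 2 come from direct enumeration over the n + 1 binomial outcomes. A sketch of that computation (our own, with the Wald interval plugged in purely as an illustration; any of the procedures compared above could be substituted):

```python
from math import comb, sqrt

def coverage(interval, n, p):
    """Exact coverage probability at p: total binomial probability of
    the outcomes x whose interval [lo, hi] contains p."""
    return sum(
        comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(n + 1)
        if interval(x, n)[0] <= p <= interval(x, n)[1]
    )

def wald(x, n, z=1.96):
    """Standard Wald interval, used here only as an example plug-in."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# The Wald interval undercovers even at n = 50, p = 0.5.
cov = coverage(wald, 50, 0.5)
```

Sweeping p over a grid and plotting `coverage` reproduces the oscillating curves discussed throughout this article.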

Comment
Malay Ghosh

Malay Ghosh is Distinguished Professor, Department of Statistics, University of Florida, Gainesville, Florida 32611-8545 (e-mail: ghoshm@stat.ufl.edu).

This is indeed a very valuable article which brings out very clearly some of the inherent difficulties associated with confidence intervals for parameters of interest in discrete distributions. Professors Brown, Cai and DasGupta (henceforth BCD) are to be complimented for their comprehensive and thought-provoking discussion about the chaotic behavior of the Wald interval for the binomial proportion and an appraisal of some of the alternatives that have been proposed.

My remarks will primarily be confined to the discussion of Bayesian methods introduced in this paper. BCD have demonstrated very clearly that the modified Jeffreys equal-tailed interval works well in this problem and recommend it as a possible contender to the Wilson interval for n ≤ 40.

There is a deep-rooted optimality associated with Jeffreys prior as the unique first-order probability matching prior for a real-valued parameter of interest with no nuisance parameter. Roughly speaking, a probability matching prior for a real-valued parameter is one for which the coverage probability of a one-sided Bayesian credible interval is asymptotically equal to its frequentist counterpart. Before giving a formal definition of such priors, we provide an intuitive explanation of why Jeffreys prior is a matching prior. To this end, we begin with the fact that if X1, …, Xn are iid N(θ, 1), then X̄n = Σⁿᵢ₌₁ Xᵢ/n is the MLE of θ. With the uniform prior π(θ) ∝ c (a constant), the posterior of θ is N(X̄n, 1/n). Accordingly, writing zα for the upper 100α% point of the N(0, 1) distribution,

P(θ ≤ X̄n + zα n^(−1/2) | X1, …, Xn) = 1 − α = P(θ ≤ X̄n + zα n^(−1/2) | θ),

and this is an example of perfect matching. Now if θ̂n is the MLE of θ, under suitable regularity conditions, θ̂n | θ is asymptotically (as n → ∞) N(θ, I^(−1)(θ)), where I(θ) is the Fisher information number. With the transformation g(θ) = ∫^θ I^(1/2)(t) dt, by the delta method, g(θ̂n) is asymptotically N(g(θ), 1). Now, intuitively one expects the uniform prior π(θ) ∝ c as the asymptotic matching prior for g(θ). Transforming back to the original parameter θ, Jeffreys prior is a probability matching prior for θ. Of course, this requires an invariance of probability matching priors, a fact which is rigorously established in Datta and Ghosh (1996). Thus a uniform prior for arcsin(θ^(1/2)), where θ is the binomial proportion, leads to Jeffreys Beta (1/2, 1/2) prior for θ. When θ is the Poisson parameter, the uniform prior for θ^(1/2) leads to Jeffreys prior θ^(−1/2) for θ.

In a more formal set-up, let X1, …, Xn be iid conditional on some real-valued θ. Let θ1−α(π; X1, …, Xn) denote a posterior (1 − α)th quantile for θ under the prior π. Then π is said to be a first-order probability matching prior if

(1) P(θ ≤ θ1−α(π; X1, …, Xn) | θ) = 1 − α + o(n^(−1/2)).

This definition is due to Welch and Peers (1963) who showed by solving a differential equation that Jeffreys prior is the unique first-order probability matching prior in this case. Strictly speaking, Welch and Peers proved this result only for continuous distributions. Ghosh (1994) pointed out a suitable modification of criterion (1) which would lead to the same conclusion for discrete distributions. Also, for small and moderate samples, due to discreteness, one needs some modifications of Jeffreys interval as done so successfully by BCD.

This idea of probability matching can be extended even in the presence of nuisance parameters. Suppose that θ = (θ1, …, θp)^T, where θ1 is the parameter of interest, while (θ2, …, θp)^T is the nuisance parameter. Writing I(θ) = (Ijk(θ)) as the Fisher information matrix, if θ1 is orthogonal to (θ2, …, θp)^T in the sense of Cox and Reid (1987), that is, I1k(θ) = 0 for all k = 2, …, p, extending the previous intuitive argument, π(θ) ∝ I11^(1/2)(θ) belongs to the general class of first-order probability matching priors

π(θ) ∝ I11^(1/2)(θ) h(θ2, …, θp)

as derived in Tibshirani (1989). Here h is an arbitrary function differentiable in its arguments.

In general, matching priors have a long success story in providing frequentist confidence intervals, especially in complex problems, for example, the Behrens–Fisher or the common mean estimation problems where frequentist methods run into difficulty. Though asymptotic, the matching property seems to hold for small and moderate sample sizes as well for many important statistical problems. One such example is Garvan and Ghosh (1997) where such priors were found for general dispersion models as given in Jorgensen (1997). It may be worthwhile developing these priors in the presence of nuisance parameters for other discrete cases as well, for example when the parameter of interest is the difference of two binomial proportions, or the log-odds ratio in a 2 × 2 contingency table.

Having argued so strongly in favor of matching priors, I wonder, though, whether there is any special need for such priors in this particular problem of binomial proportions. It appears that any Beta (a, a) prior will do well in this case. As noted in this paper, by shrinking the MLE X/n toward the prior mean 1/2, one achieves a better centering for the construction of confidence intervals. The two diametrically opposite priors Beta (2, 2) (symmetric concave with maximum at 1/2 which provides the Agresti–Coull interval) and Jeffreys prior Beta (1/2, 1/2) (symmetric convex with minimum at 1/2) seem to be equally good for recentering. Indeed, I wonder whether any Beta (α, β) prior which shrinks the MLE toward the prior mean α/(α + β) becomes appropriate for recentering.

The problem of construction of confidence intervals for binomial proportions occurs in first courses in statistics as well as in day-to-day consulting. While I am strongly in favor of replacing Wald intervals by the new ones for the latter, I am not quite sure how easy it will be to motivate these new intervals for the former. The notion of shrinking can be explained adequately only to a few strong students in introductory statistics courses. One possible solution for the classroom may be to bring in the notion of continuity correction and somewhat heuristically ask students to work with (X + 1/2, n − X + 1/2) instead of (X, n − X). In this way, one centers around (X + 1/2)/(n + 1), à la Jeffreys prior.

Comment
Thomas J. Santner

Thomas J. Santner is Professor, Ohio State University, 404 Cockins Hall, 1958 Neil Avenue, Columbus, Ohio 43210 (e-mail: tjs@stat.ohio-state.edu).

I thank the authors for their detailed look at a well-studied problem. For the Wald binomial p interval, there has not been an appreciation of the long persistence (in n) of p locations having substantially deficient achieved coverage compared with the nominal coverage. Figure 1 is indeed a picture that says a thousand words. Similarly, the asymptotic lower limit in Theorem 1 for the minimum coverage of the Wald interval is an extremely useful analytic tool to explain this phenomenon, although other authors have given fixed p approximations of the coverage probability of the Wald interval (e.g., Theorem 1 of Ghosh, 1979).

My first set of comments concern the specific binomial problem that the authors address and then the implications of their work for other important discrete data confidence interval problems.

The results in Ghosh (1979) complement the calculations of Brown, Cai and DasGupta (BCD) by pointing out that the Wald interval is too long in addition to being centered at the wrong value (the MLE as opposed to a Bayesian point estimate such as is used by the Agresti–Coull interval). His Table 3 lists the probability that the Wald interval is longer than the Wilson interval for a central set of p values (from 0.20 to 0.80) and a range of sample sizes n from 20 to 200. Perhaps surprisingly, in view of its inferior coverage characteristics, the Wald interval tends to be longer than the Wilson interval with very high probability. Hence the Wald interval is both too long and centered at the wrong place. This is a dramatic effect of the skewness that BCD mention.

When discussing any system of intervals, one is concerned with the consistency of the answers given by the interval across multiple uses by a single researcher or by groups of users. Formally, this is the reason why various symmetry properties are required of confidence intervals. For example, in the present case, requiring that the p interval [L(X), U(X)] satisfy the symmetry property

(1) [L(x), U(x)] = [1 − U(n − x), 1 − L(n − x)]

for x ∈ {0, …, n} shows that investigators who reverse their definitions of success and failure will be consistent in their assessment of the likely values for p. Symmetry (1) is the minimal requirement of a binomial confidence interval. The Wilson and equal-tailed Jeffrey intervals advocated by BCD satisfy the symmetry property (1) and have coverage that is centered (when coverage is plotted versus true p) about the nominal value. They are also straightforward to motivate, even for elementary students, and simple to compute for the outcome of interest.

However, regarding p confidence intervals as the inversion of a family of acceptance regions corresponding to size α tests of H0: p = p0 versus HA: p ≠ p0 for 0 < p0 < 1 has some substantial advantages. Indeed, Brown et al. mention this inversion technique when they remark on the desirable properties of intervals formed by inverting likelihood ratio test acceptance regions of H0 versus HA. In the binomial case, the acceptance region of any reasonable test of H0: p = p0 is of the form [L(p0), U(p0)]. These acceptance regions invert to intervals if and only if L(p0) and U(p0) are nondecreasing in p0 (otherwise the inverted p confidence set can be a union of intervals). Of course, there are many families of size α tests that meet this nondecreasing criterion for inversion, including the very conservative test used by Clopper and Pearson (1934). For the binomial problem, Blyth and Still (1983) constructed a set of confidence intervals by selecting among size α acceptance regions those that possessed additional symmetry properties and were small (leading to short confidence intervals). For example, they desired that the interval should move to the right as x increases when n is fixed and should move to the left as n increases when x is fixed. They also asked that their system of intervals increase monotonically in the coverage probability for fixed x and n in the sense that the higher nominal coverage interval contain the lower nominal coverage interval.

In addition to being less intuitive to unsophisticated statistical consumers, systems of confidence intervals formed by inversion of acceptance regions also have two other handicaps that have hindered their rise in popularity. First, they typically require that the confidence interval (essentially) be constructed for all possible outcomes, rather than merely the response of interest. Second, their rather brute force character means that a specialized computer program must be written to produce the acceptance sets and their inversion (the intervals).

Fig. 1. Coverage of nominal 95% symmetric Duffy–Santner p intervals for n = 20 (bottom panel) and n = 50 (top panel).

However, the benefits of having reasonably short and suitably symmetric confidence intervals are sufficient that such intervals have been constructed for several frequently occurring problems of biostatistics. For example, Jennison and Turnbull (1983) and Duffy and Santner (1987) present acceptance set inversion confidence intervals (both with available FORTRAN programs to implement their methods) for a binomial p based on data from a multistage clinical trial; Coe and Tamhane (1989) describe a more sophisticated set of repeated confidence intervals for p1 − p2 also based on multistage clinical trial data (and give a SAS macro to produce the intervals). Yamagami and Santner (1990) present an acceptance set–inversion confidence interval and FORTRAN program for p1 − p2 in the two-sample binomial problem. There are other examples.

To contrast with the intervals whose coverages are displayed in BCD's Figure 5 for n = 20 and n = 50, I formed the multistage intervals of Duffy and Santner that strictly attain the nominal confidence level for all p. The computation was done naively in the sense that the multistage FORTRAN program by Duffy that implements this method was applied using one stage with stopping boundaries arbitrarily set at (a, b) = (0, 1) in the notation of Duffy and Santner, and a small adjustment was made to insure symmetry property (1). (The nonsymmetrical multiple stage stopping boundaries that produce the data considered in Duffy and Santner do not impose symmetry.) The coverages of these systems are shown in Figure 1. To give an idea of computing time, the n = 50 intervals required less than two seconds to compute on my 400 MHz PC.

To further facilitate comparison with the intervals whose coverage is displayed in Figure 5 of BCD, I computed the Duffy and Santner intervals for a slightly lower level of coverage, 93.5%, so that the average coverage was about the desired 95% nominal level; the coverage of this system is displayed in Figure 2 on the same vertical scale and compares favorably. It is possible to call the FORTRAN program that makes these intervals within SPLUS, which makes for convenient data analysis.

I wish to mention that there are a number of other small sample interval estimation problems of continuing interest to biostatisticians that may well have very reasonable small sample solutions based on analogs of the methods that BCD recommend.

Fig. 2. Coverage of nominal 93.5% symmetric Duffy–Santner p intervals for n = 50.

Most of these would be extremely difficult to handle by the more brute force method of inverting acceptance sets. The first of these is the problem of computing simultaneous confidence intervals for p0 − pi, 1 ≤ i ≤ T, that arises in comparing a control binomial distribution with T treatment ones. The second concerns forming simultaneous confidence intervals for pi − pj, the cell probabilities of a multinomial distribution. In particular, the equal-tailed Jeffrey prior approach recommended by the author has strong appeal for both of these problems.

Finally, I note that the Wilson intervals seem to have received some recommendation as the method of choice in other elementary texts. In his introductory texts, Larson (1974) introduces the Wilson interval as the method of choice although he makes the vague, and indeed false, statement, as BCD show, that the user can use the Wald interval if n is large enough. One reviewer of Santner (1998), an article that showed the coverage virtues of the Wilson interval compared with Wald-like intervals advocated by another author in the magazine Teaching Statistics (written for high school teachers), commented that the Wilson method was the standard method taught in the U.K.
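Symmetry property (1) can be verified mechanically for any given procedure. The following sketch (our own, not from the comment) confirms it for the Wilson interval:

```python
from math import sqrt

def wilson(x, n, z=1.96):
    """Wilson (score) interval for a binomial proportion."""
    p_hat = x / n
    center = (x + z ** 2 / 2) / (n + z ** 2)
    half = (z / (n + z ** 2)) * sqrt(p_hat * (1 - p_hat) * n + z ** 2 / 4)
    return center - half, center + half

# Property (1): [L(x), U(x)] = [1 - U(n - x), 1 - L(n - x)] for all x.
n = 20
symmetric = all(
    abs(wilson(x, n)[0] - (1 - wilson(n - x, n)[1])) < 1e-9
    and abs(wilson(x, n)[1] - (1 - wilson(n - x, n)[0])) < 1e-9
    for x in range(n + 1)
)
```

The check passes because swapping x for n − x replaces p̂ by 1 − p̂, which leaves the half-width unchanged and reflects the midpoint about 1/2.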

Rejoinder
Lawrence D. Brown, T. Tony Cai and Anirban DasGupta

We deeply appreciate the many thoughtful and atic than previously believed. We are happy to see
constructive remarks and suggestions made by the a consensus that the Wald interval deserves to
discussants of this paper. The discussion suggests be discarded, as we have recommended. It is not
that we were able to make a convincing case that surprising to us to see disagreement over the spe-
the often-used Wald interval is far more problem- cic alternative(s) to be recommended in place of
INTERVAL ESTIMATION FOR BINOMIAL PROPORTION 129

this interval. We hope the continuing debate will add to a greater understanding of the problem, and we welcome the chance to contribute to this debate.

A. It seems that the primary source of disagreement is based on differences in interpretation of the coverage goals for confidence intervals. We will begin by presenting our point of view on this fundamental issue. We will then turn to a number of other issues, as summarized in the following list:

B. Simplicity is important.
C. Expected length is also important.
D. Santner's proposal.
E. Should a continuity correction be used?
F. The Wald interval also performs poorly in other problems.
G. The two-sample binomial problem.
H. Probability-matching procedures.
I. Results from asymptotic theory.

A. Professors Casella, Corcoran and Mehta come out in favor of making coverage errors always fall only on the conservative side. This is a traditional point of view. However, we have taken a different perspective in our treatment. It seems more consistent with contemporary statistical practice to expect that a γ% confidence interval should cover the true value approximately γ% of the time. The approximation should be built on sound, relevant statistical calculations, and it should be as accurate as the situation allows.

We note in this regard that most statistical models are only felt to be approximately valid as representations of the true situation. Hence the resulting coverage properties from those models are at best only approximately accurate. Furthermore, a broad range of modern procedures is supported only by asymptotic or Monte Carlo calculations, and so again coverage can at best only be approximately the nominal value. As statisticians we do the best within these constraints to produce procedures whose coverage comes close to the nominal value. In these contexts when we claim γ% coverage we clearly intend to convey that the coverage is close to γ%, rather than to guarantee it is at least γ%.

We grant that the binomial model has a somewhat special character relative to this general discussion. There are practical contexts where one can feel confident this model holds with very high precision. Furthermore, asymptotics are not required in order to construct practical procedures or evaluate their properties, although asymptotic calculations can be useful in both regards. But the discreteness of the problem introduces a related barrier to the construction of satisfactory procedures. This forces one to again decide whether γ% should mean "approximately γ%," as it does in most other contemporary applications, or "at least γ%," as can be obtained with the Blyth–Still procedure or the Clopper–Pearson procedure. An obvious price of the latter approach is its decreased precision, as measured by the increased expected length of the intervals.

B. All the discussants agree that elementary motivation and simplicity of computation are important attributes in the classroom context. We of course agree. If these considerations are paramount, then the Agresti–Coull procedure is ideal. If the need for simplicity can be relaxed even a little, then we prefer the Wilson procedure: it is only slightly harder to compute, its coverage is clearly closer to the nominal value across a wider range of values of p, and it can be easier to motivate since its derivation is totally consistent with Neyman–Pearson theory. Other procedures such as Jeffreys or the mid-P Clopper–Pearson interval become plausible competitors whenever computer software can be substituted for the possibility of hand derivation and computation.

Corcoran and Mehta take a rather extreme position when they write, "Simplicity and ease of computation have no role to play in statistical practice" [italics ours]. We agree that the ability to perform computations by hand should be of little, if any, relevance in practice. But conceptual simplicity, parsimony and consistency with general theory remain important secondary conditions to choose among procedures with acceptable coverage and precision. These considerations will reappear in our discussion of Santner's Blyth–Still proposal. They also leave us feeling somewhat ambivalent about the boundary-modified procedures we have presented in our Section 4.1. Agresti and Coull correctly imply that other boundary corrections could have been tried and that our choice is thus somewhat ad hoc. (The correction to Wilson can perhaps be defended on the principle of substituting a Poisson approximation for a Gaussian one where the former is clearly more accurate; but we see no such fundamental motivation for our correction to the Jeffreys interval.)

C. Several discussants commented on the precision of various proposals in terms of expected length of the resulting intervals. We strongly concur that precision is the important balancing criterion vis-à-vis coverage. We wish only to note that there exist other measures of precision than interval expected length. In particular, one may investigate the probability of covering wrong values. In a

charming identity worth noting, Pratt (1961) shows the connection of this approach to that of expected length. Calculations on coverage of wrong values of p in the binomial case will be presented in DasGupta (2001). This article also discusses a number of additional issues and presents further analytical calculations, including a Pearson tilting similar to the chi-square tilts advised in Hall (1983).

Corcoran and Mehta's Figure 2 compares average length of three of our proposals with Blyth–Still and with their likelihood ratio procedure. We note first that their LR procedure is not the same as ours. Theirs is based on numerically computed exact percentiles of the fixed sample likelihood ratio statistic. We suspect this is roughly equivalent to adjustment of the chi-squared percentile by a Bartlett correction. Ours is based on the traditional asymptotic chi-squared formula for the distribution of the likelihood ratio statistic. Consequently, their procedure has conservative coverage, whereas ours has coverage fluctuating around the nominal value. They assert that the difference in expected length is negligible. How much difference qualifies as negligible is an arguable, subjective evaluation. But we note that in their plot their intervals can be on average about 8% or 10% longer than the Jeffreys or Wilson intervals, respectively. This seems to us a nonnegligible difference. Actually, we suspect their preference for their LR and BSC intervals rests primarily on their overriding preference for conservativity in coverage whereas, as we have discussed above, our intervals are designed to attain approximately the desired nominal value.

D. Santner proposes an interesting variant of the original Blyth–Still proposal. As we understand it, he suggests producing nominal γ% intervals by constructing γ′% Blyth–Still intervals, with γ′ chosen so that the average coverage of the resulting intervals is approximately the nominal value, γ%. The coverage plot for this procedure compares well with that for Wilson or Jeffreys in our Figure 5. Perhaps the expected interval length for this procedure also compares well, although Santner does not say so. However, we still do not favor his proposal. It is conceptually more complicated and requires a specially designed computer program, particularly if one wishes to compute γ′ with any degree of accuracy. It thus fails with respect to the criterion of scientific parsimony in relation to other proposals that appear to have at least competitive performance characteristics.

E. Casella suggests the possibility of performing a continuity correction on the score statistic prior to constructing a confidence interval. We do not agree with this proposal from any perspective. These continuity-corrected Wilson intervals have extremely conservative coverage properties, though they may not in principle be guaranteed to be everywhere conservative. But even if one's goal, unlike ours, is to produce conservative intervals, these intervals will be very inefficient at their nominal level relative to Blyth–Still or even Clopper–Pearson. In Figure 1 below, we plot the coverage of the Wilson interval with and without a continuity correction for n = 25 and α = 0.05, and the corresponding expected lengths. It seems clear that the loss in precision more than neutralizes the improvements in coverage and that the nominal coverage of 95% is misleading from any perspective.
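Coverage claims like those above are easy to audit, since for any binomial interval the exact coverage at a given p is a finite sum. A minimal sketch in Python (the function names and the z = 1.96 default are our choices, not notation from the paper):

```python
from math import comb, sqrt

def wilson(x, n, z=1.96):
    # Wilson (score) interval, recentered at ptilde = (x + z^2/2)/(n + z^2).
    phat = x / n
    center = (x + z**2 / 2) / (n + z**2)
    half = z * sqrt(n * phat * (1 - phat) + z**2 / 4) / (n + z**2)
    return center - half, center + half

def agresti_coull(x, n, z=1.96):
    # Agresti-Coull interval: Wald form built from ptilde and ntilde = n + z^2.
    ntilde = n + z**2
    ptilde = (x + z**2 / 2) / ntilde
    half = z * sqrt(ptilde * (1 - ptilde) / ntilde)
    return ptilde - half, ptilde + half

def coverage(interval, p, n):
    # Exact coverage probability at p: sum the binomial pmf over all x
    # whose interval contains p.
    return sum(comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1)
               if interval(x, n)[0] <= p <= interval(x, n)[1])
```

Comparing coverage(wilson, p, n) and coverage(agresti_coull, p, n) over a grid of p reproduces the qualitative picture described here: the two intervals share the same center (x + z²/2)/(n + z²), and the Agresti–Coull interval is never shorter, hence never less conservative, than Wilson.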

Fig. 1. Comparison of the coverage probabilities and expected lengths of the Wilson (dotted) and continuity-corrected Wilson (solid) intervals for n = 25 and α = 0.05.
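The comparison in Figure 1 can be checked numerically. The sketch below inverts the score test with and without a continuity correction; we use one common form of the corrected acceptance region, |x − np| − 1/2 ≤ z·sqrt(np(1 − p)), which is an assumption on our part since continuity corrections come in several variants:

```python
from math import comb, sqrt

def accepts(x, n, p, z=1.96, cc=0.0):
    # Score-test acceptance region for p, with optional continuity correction cc.
    return abs(x - n * p) - cc <= z * sqrt(n * p * (1 - p))

def coverage(p, n, z=1.96, cc=0.0):
    # Exact coverage at p of the interval obtained by inverting the test:
    # an observed x covers p exactly when the test accepts p.
    return sum(comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1) if accepts(x, n, p, z, cc))

n = 25
for p in (0.1, 0.3, 0.5):
    plain = coverage(p, n)               # Wilson
    corrected = coverage(p, n, cc=0.5)   # continuity-corrected Wilson
    print(f"p = {p}: Wilson {plain:.4f} vs corrected {corrected:.4f}")
```

Since the corrected test accepts a superset of p values for every x, its coverage is uniformly at least that of the plain Wilson interval; the cost is exactly the extra expected length that Figure 1 displays.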

Fig. 2. Comparison of the systematic coverage biases. The y-axis is n·S_n(p). From top to bottom: the systematic coverage biases of the Agresti–Coull, Wilson, Jeffreys, likelihood ratio and Wald intervals with n = 50 and α = 0.05.

F. Agresti and Coull ask if the dismal performance of the Wald interval manifests itself in other problems, including nondiscrete cases. Indeed it does. In other lattice cases such as the Poisson and negative binomial, both the considerable negative coverage bias and the inefficiency in length persist. These features also show up in some continuous exponential family cases. See Brown, Cai and DasGupta (2000b) for details.

In the three important discrete cases, the binomial, Poisson and negative binomial, there is in fact some conformity in regard to which methods work well in general. Both the likelihood ratio interval (using the asymptotic chi-squared limits) and the equal-tailed Jeffreys interval perform admirably in all of these problems with regard to coverage and expected length. Perhaps there is an underlying theoretical reason for the parallel behavior of these two intervals constructed from very different foundational principles, and this seems worth further study.

G. Some discussants very logically inquire about the two-sample binomial situation. Curiously, in a way, the Wald interval in the two-sample case for the difference of proportions is less problematic than in the one-sample case. It can nevertheless be somewhat improved. Agresti and Caffo (2000) present a proposal for this problem, and Brown and Li (2001) discuss some others.

H. The discussion by Ghosh raises several interesting issues. The definition of first-order probability matching extends in the obvious way to any set of upper confidence limits, not just those corresponding to Bayesian intervals. There is also an obvious extension to lower confidence limits. This probability matching is a one-sided criterion. Thus a family of two-sided intervals (L_n, U_n) will be first-order probability matching if

    Pr_p(p <= L_n) = α/2 + o(n^{-1/2}) = Pr_p(p >= U_n).

As Ghosh notes, this definition cannot usefully be literally applied to the binomial problem here, because the asymptotic expansions always have a discrete oscillation term that is O(n^{-1/2}). However, one can correct the definition.

One way to do so involves writing asymptotic expressions for the probabilities of interest that can be divided into a smooth part, S, and an oscillating part, Osc, that averages to O(n^{-3/2}) with respect to any smooth density supported within (0, 1). Readers could consult BCD (2000a) for more details about such expansions. Thus, in much generality one could write

(1)    Pr_p(p <= L_n) = α/2 + S_{L_n}(p) + Osc_{L_n}(p) + O(n^{-1}),

where S_{L_n}(p) = O(n^{-1/2}) and Osc_{L_n}(p) has the property informally described above. We would then say that the procedure is first-order probability matching if S_{L_n}(p) = o(n^{-1/2}), with an analogous expression for the upper limit, U_n.

In this sense the equal-tailed Jeffreys procedure is probability matching. We believe that the mid-P Clopper–Pearson intervals also have this asymptotic property. But several of the other proposals, including the Wald, the Wilson and the likelihood ratio intervals, are not first-order probability matching. See Cai (2001) for exact and asymptotic calculations on one-sided confidence intervals and hypothesis testing in the discrete distributions.

The failure of this one-sided, first-order property, however, has no obvious bearing on the coverage properties of the two-sided procedures considered in the paper. That is because, for any of our procedures,

(2)    S_{L_n}(p) + S_{U_n}(p) = 0 + O(n^{-1}),

even when the individual terms on the left are only O(n^{-1/2}). All the procedures thus make compensating one-sided errors, to O(n^{-1}), even when they are not accurate to this degree as one-sided procedures.

This situation raises the question as to whether it is desirable to add as a secondary criterion for two-sided procedures that they also provide accurate one-sided statements, at least to the probability-matching O(n^{-1/2}). While Ghosh argues strongly for the probability matching property, his argument does not seem to take into account the cancellation inherent in (2). We have heard some others argue in favor of such a requirement and some argue against it. We do not wish to take a strong position on this issue now. Perhaps it depends somewhat on the practical context: if in that context the confidence bounds may be interpreted and used in a one-sided fashion as well as the two-sided one, then perhaps probability matching is called for.

I. Ghosh's comments are a reminder that asymptotic theory is useful for this problem, even though exact calculations here are entirely feasible and convenient. But, as Ghosh notes, asymptotic expressions can be startlingly accurate for moderate sample sizes. Asymptotics can thus provide valid insights that are not easily drawn from a series of exact calculations. For example, the two-sided intervals also obey an expression analogous to (1),

(3)    Pr_p(L_n <= p <= U_n) = 1 − α + S_n(p) + Osc_n(p) + O(n^{-3/2}).

The term S_n(p) is O(n^{-1}) and provides a useful expression for the smooth center of the oscillatory coverage plot. (See Theorem 6 of BCD (2000a) for a precise justification.) The plot in Figure 2 for n = 50 compares S_n(p) for five confidence procedures. It shows how the Wilson, Jeffreys and chi-squared likelihood ratio procedures all have coverage that well approximates the nominal value, with Wilson being slightly more conservative than the other two.

As we see it, our article articulated three primary goals: to demonstrate unambiguously that the Wald interval performs extremely poorly; to point out that none of the common prescriptions on when the interval is satisfactory are correct; and to put forward some recommendations on what is to be used in its place. On the basis of the discussion we feel gratified that we have satisfactorily met the first two of these goals. As Professor Casella notes, the debate about alternatives in this timeless problem will linger on, as it should. We thank the discussants again for a lucid and engaging discussion of a number of relevant issues. We are grateful for the opportunity to have learned so much from these distinguished colleagues.

ADDITIONAL REFERENCES

Agresti, A. and Caffo, B. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Amer. Statist. 54. To appear.
Aitkin, M., Anderson, D., Francis, B. and Hinde, J. (1989). Statistical Modelling in GLIM. Oxford Univ. Press.
Boos, D. D. and Hughes-Oliver, J. M. (2000). How large does n have to be for Z and t intervals? Amer. Statist. 54 121–128.
Brown, L. D., Cai, T. and DasGupta, A. (2000a). Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Statist. To appear.
Brown, L. D., Cai, T. and DasGupta, A. (2000b). Interval estimation in exponential families. Technical report, Dept. Statistics, Univ. Pennsylvania.
Brown, L. D. and Li, X. (2001). Confidence intervals for the difference of two binomial proportions. Unpublished manuscript.
Cai, T. (2001). One-sided confidence intervals and hypothesis testing in discrete distributions. Preprint.
Coe, P. R. and Tamhane, A. C. (1993). Exact repeated confidence intervals for Bernoulli parameters in a group sequential clinical trial. Controlled Clinical Trials 14 19–29.
Cox, D. R. and Reid, N. (1987). Orthogonal parameters and approximate conditional inference (with discussion). J. Roy. Statist. Soc. Ser. B 49 113–147.
DasGupta, A. (2001). Some further results in the binomial interval estimation problem. Preprint.
Datta, G. S. and Ghosh, M. (1996). On the invariance of noninformative priors. Ann. Statist. 24 141–159.
Duffy, D. and Santner, T. J. (1987). Confidence intervals for a binomial parameter based on multistage tests. Biometrics 43 81–94.
Fisher, R. A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh.
Gart, J. J. (1966). Alternative analyses of contingency tables. J. Roy. Statist. Soc. Ser. B 28 164–179.
Garvan, C. W. and Ghosh, M. (1997). Noninformative priors for dispersion models. Biometrika 84 976–982.
Ghosh, J. K. (1994). Higher Order Asymptotics. IMS, Hayward, CA.
Hall, P. (1983). Chi-squared approximations to the distribution of a sum of independent random variables. Ann. Statist. 11 1028–1036.
Jennison, C. and Turnbull, B. W. (1983). Confidence intervals for a binomial parameter following a multistage test with application to MIL-STD 105D and medical trials. Technometrics 25 49–58.
Jorgensen, B. (1997). The Theory of Dispersion Models. CRC Chapman and Hall, London.
Laplace, P. S. (1812). Théorie Analytique des Probabilités. Courcier, Paris.
Larson, H. J. (1974). Introduction to Probability Theory and Statistical Inference, 2nd ed. Wiley, New York.

Pratt, J. W. (1961). Length of confidence intervals. J. Amer. Statist. Assoc. 56 549–567.
Rindskopf, D. (2000). Letter to the editor. Amer. Statist. 54 88.
Rubin, D. B. and Schenker, N. (1987). Logit-based interval estimation for binomial data using the Jeffreys prior. Sociological Methodology 17 131–144.
Sterne, T. E. (1954). Some remarks on confidence or fiducial limits. Biometrika 41 275–278.
Tibshirani, R. (1989). Noninformative priors for one parameter of many. Biometrika 76 604–608.
Welch, B. L. and Peers, H. W. (1963). On formulae for confidence points based on integrals of weighted likelihoods. J. Roy. Statist. Soc. Ser. B 25 318–329.
Yamagami, S. and Santner, T. J. (1993). Invariant small sample confidence intervals for the difference of two success probabilities. Comm. Statist. Simul. Comput. 22 33–59.
