
Improved Confidence Intervals for the Difference Between Two Proportions and Number Needed to Treat (NNT).

Version 1.49

Worksheet Tab         Description
1. Abstract           A brief outline of the problem.
2. Explanation        How the CI calculator works.
3. Example            An illustrative example.
4. Calculate CI       The CI calculator itself.
5. Derivation         Outline of derivation of score intervals.
6. References         Selected references.
7. Acknowledgments    Thanks to those who helped.

To download the latest version of this spreadsheet: http://www.cebm.net/ (go to 'toolbox' to get the file).
Please send comments to:
Dan Tandberg, MD
The George F. Key, M.D. Endowed Professor
Research Director
University of New Mexico School of Medicine
Department of Emergency Medicine
Ambulatory Care Center, 4-West
Albuquerque, New Mexico, USA 87131-5246
Phone (505) 272-5062
FAX (505) 272-6503
Internet: tandberg@salud.unm.edu

Improved Confidence Intervals for the Difference Between Two Proportions and Number Needed to Treat (NNT). Dan Tandberg, M.D.
[See Newcombe RG: Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods. Statistics in Medicine 1998;17:873-890.]

Objective: The number of patients one needs to treat to help one (NNT), together with its confidence interval (CI), is a useful way of reporting the results of randomized controlled clinical trials. Since these estimates are derived from the difference between two proportions (control=p1, treatment=p2), reliable methods for estimating appropriate confidence intervals for this difference (p1-p2) have become increasingly important. Standard statistics textbooks and computer software packages typically provide only simple asymptotic confidence intervals for p1-p2 that are based on the normal distribution. While these are easy to calculate, they have been shown to be surprisingly unreliable when sample sizes are modest or when the proportions, p1 or p2, are close to 0 or 1. Exact confidence intervals for p1-p2 are now available using fast, iterative computer algorithms, but these require expensive software and are often unnecessarily conservative (wide).

Methods: I reviewed the mathematical statistics literature describing alternative methods for estimating confidence intervals for p1-p2. Papers using computer-intensive methods for comparing different methods were reviewed in detail.

Results: More than a dozen different methods were found and reviewed. A hybrid efficient score method (reference 7) performed best overall, irrespective of sample size and closeness of the two proportions to 0 or 1. These intervals have a closed-form solution and do not require calculus or iteration. They can be calculated using only spreadsheet or other algebraic software.

Conclusion: Estimates of confidence intervals for the difference between two binomial proportions (and NNT) calculated with currently available statistical software may be inadequate. Improved modern methods should be made widely available and more frequently used.

Confidence Intervals for the Difference in Two Proportions:


Tab 4 of this spreadsheet calculates Newcombe-Wilson hybrid score confidence intervals, without a continuity correction (see reference 7). Results from the classical Wald-type asymptotic methods are provided for comparison. Method 1 provides the best coverage and is probably optimal.

Definitions:
Control:
X1= number of failures in the control group
n1= sample size of the control group
p1=CER= control event rate=X1/n1

Experimental:
X2= number of failures in the experimental group
n2= sample size of the experimental group
p2=EER= experimental event rate=X2/n2

n= n1+n2
a= two-sided type I error rate (alpha)
p1-p2=CER-EER= control event rate-experimental event rate
ARR= absolute risk reduction=CER-EER
NNT= 1/ARR

Method
1 Newcombe-Wilson hybrid score, not continuity corrected.
2 Wald-type "classical" asymptotic method, continuity corrected.
3 Wald-type "classical" asymptotic method, not continuity corrected.

Notes:
The number needed to treat (NNT) is calculated as the reciprocal of p1-p2 using method 1.
The estimated confidence intervals for NNT are left blank in cases of non-significance, i.e., where the CI of p1-p2 does not exclude zero. Altman DG: Confidence Intervals for the Number Needed to Treat. BMJ 1998;317:1309-1312 gives a clear description of this problem.
Yates' Chi Square is flagged in cases in which expected cell counts are less than 5.
A 2x2 table is generated at the bottom (row 60) of the CI calculator.
"No Cont. Corr." and "Cont. Corr." refer to continuity correction.
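The NNT rule in the notes above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet's own formula: the function name `nnt_with_ci` and its return convention (`None` for a blank interval) are assumptions made for the example.

```python
from typing import Optional, Tuple


def nnt_with_ci(arr: float, arr_lower: float, arr_upper: float
                ) -> Tuple[float, Optional[Tuple[float, float]]]:
    """Return NNT = 1/ARR and its confidence limits.

    The limits are None (left blank) when the CI of p1-p2 does not
    exclude zero, mirroring the note above.
    """
    nnt = float("inf") if arr == 0 else 1.0 / arr
    if arr_lower <= 0.0 <= arr_upper:
        return nnt, None
    # Taking reciprocals reverses the order of the limits.
    low, high = sorted((1.0 / arr_upper, 1.0 / arr_lower))
    return nnt, (low, high)


# ARR and limits of the size reported in the worked example (tab 3).
print(nnt_with_ci(2 / 3, 0.0981, 0.9032))
# A non-significant difference: the NNT interval is left blank.
print(nnt_with_ci(0.12, -0.3179, 0.5496))
```

Returning `None` rather than a pair of numbers keeps a non-significant result from being mistaken for a finite interval.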


An example showing the advantages of Newcombe-Wilson hybrid score based confidence intervals.

Abstract: Ann Emerg Med 1984 Mar;13(3):155-7
Evaluation of prophylactic oxacillin in cat bite wounds.
Elenbaas RM, McNabney WK, Robinson WA

A prospective, double-blind, placebo-controlled study was undertaken to determine the influence of prophylactic oxacillin on the frequency of infection in cat bite wounds. Adult patients with uninfected full-thickness wounds presenting within 24 hours of injury were considered. Emergency department management consisted of cleansing irrigation, debridement, and closure as indicated; no topical antibiotics were applied. Patients were randomly assigned to receive oxacillin 500 mg qid for five days or identically appearing placebo. Home wound care was standardized and patients were observed at least every two days for a minimum of five days, or until wounds were sufficiently healed to allow discharge from the study. Clinical assessment of infection was confirmed microbiologically when possible. Twelve patients were admitted and 11 completed the study. Oxacillin (n = 5) and placebo (n = 6) groups were identical in sex, age, number of wounds per patient, wound location and type, delay to emergency department presentation, length of follow-up observation, medication compliance, and adequacy of home wound care. Four of six patients receiving placebo, but none of the five receiving oxacillin, developed a wound infection (P = .045). Material obtained from three of these four patients yielded Pasteurella multocida as the responsible organism. Prophylactic oxacillin was thus associated with a significant reduction in the frequency of infection following cat bites. We recommend such therapy in the care of these wounds.

Summary 2x2 table:

Observed:    Placebo    Oxacillin    Total
Infected     4          0            4
Well         2          5            7
Total        6          5            11

CI for Difference of Two Proportions:
Control: X1= 4   n1= 6
Experimental: X2= 0   n2= 5
n= 11   a= 0.05   Za/2= 1.96
CER= 0.667   EER= 0

                    Newcombe-Wilson          Wald
                                             No Cont. Corr.   Cont. Corr.
Method Number       1(Optimal)     NNT       2                3                'Exact'
Upper 95.0% CI=     0.90           1.1       1.04             1.23             0.99
CER-EER=            0.67           1.5       0.67             0.67             0.67
Lower 95.0% CI=     0.10           10.2      0.29             0.11             0.06

Note that the Wald-type "classical" asymptotic methods 2 and 3 both yield nonsensical upper confidence limits that are greater than 1.0. This type of aberration is referred to as "overshoot" and is one of several problems seen with the classical methods when sample size is small or observed probabilities are near one or zero. Newcombe-Wilson hybrid score confidence intervals provide sensible estimates even under such extreme conditions. The confidence limits calculated using the exact permutation distribution (StatXact-3, version 3.0.2) are 0.06 and 0.99. These may be needlessly conservative.
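The overshoot described above is easy to reproduce. The following Python sketch applies the classical Wald interval to the cat bite data. The function name `wald_diff_ci` is hypothetical, and the continuity correction term (1/n1 + 1/n2)/2 is an assumption: it is the usual textbook form and it reproduces the figures in the table above, but the worksheet does not spell it out.

```python
import math

Z_975 = 1.959963985  # Za/2 for a = 0.05, as listed on the calculator tab


def wald_diff_ci(x1, n1, x2, n2, z=Z_975, continuity=False):
    """Classical Wald (asymptotic) CI for p1 - p2.

    Returns (lower, difference, upper). Nothing is clipped to [-1, 1],
    so overshoot stays visible.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    half_width = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if continuity:
        # Assumed form of the continuity correction (see lead-in).
        half_width += 0.5 * (1 / n1 + 1 / n2)
    return diff - half_width, diff, diff + half_width


# Cat bite example: 4 of 6 infected on placebo, 0 of 5 on oxacillin.
for cc in (False, True):
    lower, diff, upper = wald_diff_ci(4, 6, 0, 5, continuity=cc)
    note = "overshoot" if upper > 1 else "ok"
    print(f"cont. corr.={cc}: {lower:.2f} {diff:.2f} {upper:.2f} ({note})")
```

Both upper limits exceed 1.0, which is impossible for a difference between two proportions.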

4. Calculate CI

CI for Difference of Two Proportions:


Use the scroll bars to set the values.
Control: X1= 4   n1= 6
Experimental: X2= 0   n2= 5
n= 11   a= 0.05
p1=CER= 0.8990   p2=EER= 0.7790
Za/2= 1.95996
Yates' Chi Square= 2.75327   p-value= 0.097055518
***some expected cell values < 5

[Chart: XY plot of the three confidence intervals; vertical axis from -1.25 to 1.25.]

                    Newcombe-Wilson     Wald-Asymptotic
                    Hybrid Score        No Cont. Corr.   Cont. Corr.
Method Number       1(Optimal)          2                3
Upper 95.0% CI=     0.5496              0.5564           0.7397
CER-EER=            0.1200              0.1200           0.1200
Lower 95.0% CI=     -0.3179             -0.3164          -0.4997
NNT                 8.3

[Please see Newcombe RG: Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods. Statistics in Medicine 1998;17:873-890 for the derivation of method 1.]

Calculations:
CER= 0.89900
1-CER= 0.10100
EER= 0.77900
1-EER= 0.22100

8.5563505 7.6836495 2.5563505 1.6836495 7.5563505 6.6836495 3.4013605 6.8027211 5.9276823 14.880952

Asymptotic CI:
SEp1-p2= 0.22263191
Lower 95.0% CI= -0.3163505
CER-EER= 0.12000
Upper 95.0% CI= 0.55635052

Asymptotic, continuity corrected CI:
SEp1-p2= 0.22263191
Lower 95.0% CI= -0.4996839
CER-EER= 0.12000
Upper 95.0% CI= 0.73968386

Newcombe-Wilson hybrid score CI:
lower1= 0.49892495   upper1= 0.98758829
lower2= 0.35862307   upper2= 0.95693571
Lower 95.0% CI= -0.3178598
CER-EER= 0.12000
Upper 95.0% CI= 0.54960987

-10 -10.3163505 -9.44364948

8 2 7

term1= 0.22340194 term2= 0.21919274

a= 0.05 Za= 1.959963985 Z2a= 3.841458821

Yates' Chi Square 2x2

Observed:    col_1       col_2        total
row_1        4           2            6
row_2        0           5            5
total        4           7            11

Expected:    col_1       col_2        total
row_1        2.181818    3.8181818    6
row_2        1.818182    3.1818182    5
total        4           7            11

Yates' Chi Square cell contributions:
             col_1       col_2
row_1        0.796402    0.4550866
row_2        0.955682    0.5461039

Yates' Chi Square= 2.75327381   p-value= 0.097056

Workarea for alpha scrollbar: 50

Workarea for XY Plot of CI:
Method Number       1(Optimal)    2              3
Upper 95.0% CI=     0.42960987    0.436350522    0.61968386
CER-EER=            0.12          0.12           0.12
Lower 95.0% CI=     0.43785975    0.436350522    0.61968386
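The Yates' Chi Square cells on the calculator tab can be re-derived with a short Python sketch. This is a minimal illustration, not the spreadsheet's implementation: the function name `yates_chi_square` is an assumption, and the p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail is erfc(sqrt(x/2)).

```python
import math


def yates_chi_square(table):
    """Yates' continuity-corrected chi-square for a 2x2 table.

    table is [[a, b], [c, d]] of observed counts. Returns
    (chi_square, p_value, small_expected), where small_expected is True
    when any expected cell count is below 5.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    small_expected = False
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            small_expected = small_expected or expected < 5
            deviation = max(abs(table[i][j] - expected) - 0.5, 0.0)
            chi_square += deviation ** 2 / expected
    # Upper tail of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value, small_expected


# Observed table from the worksheet: rows 4, 2 and 0, 5.
print(yates_chi_square([[4, 2], [0, 5]]))
```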

Derivation of Newcombe-Wilson Hybrid Score Confidence Intervals for the Difference Between Two Binomial Proportions.
I. Derivation of classical, Wald-type confidence intervals for a single binomial proportion and for the difference between two binomial proportions.
Let X equal the number of successes out of a sample of n trials. Let $\hat{p}$ equal the observed sample proportion, X/n. Let p equal the true population proportion. Let $z_{\alpha/2}$ equal the $1-\alpha/2$ quantile of the standard normal distribution, with $\alpha$ being the type I error rate. The Wald-type hypothesis test uses a standard error of the estimate (the square root term) calculated at the maximum likelihood estimate, $\hat{p}$, and rejects a hypothesized value p when

$$ z_{\alpha/2} < \frac{|\hat{p} - p|}{\sqrt{\hat{p}(1-\hat{p})/n}} $$

[Equation 1]

A 100(1-$\alpha$)% confidence interval for p may then be calculated by inverting this test, that is, by solving the inequality for the values of p that are not rejected:

$$ \hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} < p < \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} $$

[Equation 2]

(For clarity, from this point on we will drop the subscript from $z_{\alpha/2}$.) By a similar inversion of the Wald-type test for the difference between two independent binomial proportions, $p_1 - p_2$, a 100(1-$\alpha$)% confidence interval may then be calculated as:

$$ (\hat{p}_1 - \hat{p}_2) - z\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2} $$

[Equation 3]

where the subscripts indicate the first and second binomial proportions. These are the methods most often presented in introductory textbooks of statistics and most often made available in software.
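As a worked check of Equation 3, the values shown on the calculator tab (CER = 0.899, EER = 0.779, n1 = 6, n2 = 5, z = 1.95996) can be substituted directly. The last line assumes the usual continuity correction term (1/n1 + 1/n2)/2, which the worksheet does not spell out but which reproduces its continuity-corrected limits.

```latex
\begin{align*}
SE_{\hat{p}_1-\hat{p}_2} &= \sqrt{\frac{0.899\,(0.101)}{6} + \frac{0.779\,(0.221)}{5}}
  = \sqrt{0.015133 + 0.034432} \approx 0.22263 \\
z \cdot SE_{\hat{p}_1-\hat{p}_2} &\approx 1.95996 \times 0.22263 \approx 0.43635 \\
\text{Wald CI} &= 0.12 \pm 0.43635 = (-0.31635,\ 0.55635) \\
\text{Wald CI, continuity corrected} &= 0.12 \pm \left(0.43635 + \tfrac{1}{2}\left(\tfrac{1}{6} + \tfrac{1}{5}\right)\right)
  = 0.12 \pm 0.61968 = (-0.49968,\ 0.73968)
\end{align*}
```

These agree with the "Asymptotic CI" and "Asymptotic, continuity corrected CI" cells on the calculator tab.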

II. Derivation of Wilson score confidence interval for a single binomial proportion.
Let X equal the number of successes out of a sample of n trials. Let $\hat{p}$ equal the observed sample proportion, X/n. Let p equal the true population proportion. Let $z_{\alpha/2}$ equal the $1-\alpha/2$ quantile of the standard normal distribution (written z below). The Wilson-type hypothesis test estimates the standard error of $\hat{p}$ (the square root term) at the null hypothesis value p. This is the score test approach to hypothesis testing.

$$ z < \frac{|\hat{p} - p|}{\sqrt{p(1-p)/n}} $$

[Equation 4. Compare this to Equation 1.] To calculate confidence limits we will set z equal to the right side of the inequality. After squaring both sides, we can put this into standard quadratic form and solve for p.

$$ z = \frac{|\hat{p} - p|}{\sqrt{p(1-p)/n}} $$

Squaring both sides

$$ z^2 = \frac{\hat{p}^2 - 2\hat{p}p + p^2}{p(1-p)/n} $$

Then simplifying

$$ z^2\,p(1-p)/n = \hat{p}^2 - 2\hat{p}p + p^2 $$

$$ z^2 p/n - z^2 p^2/n = \hat{p}^2 - 2\hat{p}p + p^2 $$

$$ \hat{p}^2 - 2\hat{p}p + p^2 - z^2 p/n + z^2 p^2/n = 0 $$

Putting this into quadratic form, $ap^2+bp+c=0$, yields

$$ \frac{n+z^2}{n}\,p^2 - \left(2\hat{p} + \frac{z^2}{n}\right)p + \hat{p}^2 = 0 $$

Now solve for p, using the quadratic formula

$$ p = \frac{-\left(-\left(2\hat{p} + \frac{z^2}{n}\right)\right) \pm \sqrt{\left(-\left(2\hat{p} + \frac{z^2}{n}\right)\right)^2 - 4\left(1 + \frac{z^2}{n}\right)\hat{p}^2}}{2\,\frac{n+z^2}{n}} $$

This simplifies by algebra

$$ p = \frac{2\hat{p} + \frac{z^2}{n} \pm \sqrt{4\hat{p}^2 + \frac{4\hat{p}z^2}{n} + \frac{z^4}{n^2} - 4\hat{p}^2 - \frac{4\hat{p}^2z^2}{n}}}{2\,\frac{n+z^2}{n}} $$

$$ p = \frac{2\hat{p} + \frac{z^2}{n} \pm \sqrt{\frac{4\hat{p}z^2}{n} + \frac{z^4}{n^2} - \frac{4\hat{p}^2z^2}{n}}}{2\,\frac{n+z^2}{n}} $$

$$ p = \frac{2\hat{p} + \frac{z^2}{n} \pm \sqrt{\frac{z^4}{n^2} + \frac{4z^2}{n}\,\hat{p}(1-\hat{p})}}{2\,\frac{n+z^2}{n}} $$

$$ p = \frac{2n\hat{p} + z^2 \pm z\sqrt{z^2 + 4n\,\hat{p}(1-\hat{p})}}{2(n+z^2)} $$

These two roots provide score type upper and lower 100(1-$\alpha$)% confidence limits for p.

$$ U = \frac{2n\hat{p} + z^2 + z\sqrt{z^2 + 4n\,\hat{p}(1-\hat{p})}}{2(n+z^2)} $$

[Equation 5]

$$ L = \frac{2n\hat{p} + z^2 - z\sqrt{z^2 + 4n\,\hat{p}(1-\hat{p})}}{2(n+z^2)} $$

[Equation 6]
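Equations 5 and 6 translate directly into code. The Python sketch below is illustrative only: the function name `wilson_ci` is an assumption, and it takes the observed proportion rather than the count X so that it can be checked against the lower1/upper1 and lower2/upper2 cells shown on the calculator tab.

```python
import math

Z_975 = 1.959963985  # Za/2 for a = 0.05, as listed on the calculator tab


def wilson_ci(p_hat, n, z=Z_975):
    """Wilson score limits (L, U) for one proportion, per Equations 5 and 6."""
    centre = 2 * n * p_hat + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * n * p_hat * (1 - p_hat))
    denominator = 2 * (n + z ** 2)
    lower = (centre - spread) / denominator
    upper = (centre + spread) / denominator
    # Guard against floating point rounding slightly outside [0, 1].
    return max(0.0, lower), min(1.0, upper)


# Proportions and sample sizes as shown on the calculator tab.
print(wilson_ci(0.899, 6))   # compare with lower1, upper1
print(wilson_ci(0.779, 5))   # compare with lower2, upper2
```

Unlike the Wald limits, these can never fall outside the interval from 0 to 1.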

III. Derivation of Newcombe-Wilson hybrid score confidence limits for the difference between two binomial proportions.
These are formed by calculating the Wilson score intervals [Equations 5,6] for each of the two independent binomial proportion estimates, $\hat{p}_1$ and $\hat{p}_2$. The first proportion, $\hat{p}_1$, with sample size $n_1$, has score limits of $L_1$ and $U_1$. The second proportion, $\hat{p}_2$, with sample size $n_2$, has score limits of $L_2$ and $U_2$.

These are then substituted into the standard error terms of the inequality for Wald-type confidence intervals for the difference in two proportions. Starting with Equation 3 from above we have

$$ (\hat{p}_1 - \hat{p}_2) - z\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2} $$

Replacing the observed proportions in each standard error term (the square root terms) with their corresponding score interval estimates gives us

$$ (\hat{p}_1 - \hat{p}_2) - z\sqrt{L_1(1-L_1)/n_1 + U_2(1-U_2)/n_2} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z\sqrt{U_1(1-U_1)/n_1 + L_2(1-L_2)/n_2} $$

[Equation 7]

where the subscripts indicate the first and second proportions. Notice that the standard error term for the lower limit is calculated from the lower score limit for the first proportion and the upper score limit for the second proportion. The standard error term for the upper limit is calculated from the upper score limit for the first proportion and the lower score limit for the second proportion. These provide upper and lower Newcombe-Wilson hybrid score 100(1-$\alpha$)% confidence limits for $p_1 - p_2$.

$$ \text{Upper Limit} = (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{U_1(1-U_1)/n_1 + L_2(1-L_2)/n_2} $$

$$ \text{Lower Limit} = (\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{U_2(1-U_2)/n_2 + L_1(1-L_1)/n_1} $$
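The hybrid limits of Equation 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the spreadsheet's cell formulas: the names `wilson_ci` and `newcombe_wilson_ci` are hypothetical, and the functions take observed proportions rather than counts so that the result can be compared with the calculator tab.

```python
import math

Z_975 = 1.959963985  # Za/2 for a = 0.05, as listed on the calculator tab


def wilson_ci(p_hat, n, z=Z_975):
    """Wilson score limits (L, U) for one proportion (Equations 5 and 6)."""
    centre = 2 * n * p_hat + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * n * p_hat * (1 - p_hat))
    denominator = 2 * (n + z ** 2)
    lower = (centre - spread) / denominator
    upper = (centre + spread) / denominator
    # Guard against floating point rounding slightly outside [0, 1].
    return max(0.0, lower), min(1.0, upper)


def newcombe_wilson_ci(p1, n1, p2, n2, z=Z_975):
    """Newcombe-Wilson hybrid score limits for p1 - p2 (Equation 7).

    Returns (lower, difference, upper), without continuity correction.
    """
    l1, u1 = wilson_ci(p1, n1, z)
    l2, u2 = wilson_ci(p2, n2, z)
    diff = p1 - p2
    # Lower limit pairs L1 with U2; upper limit pairs U1 with L2.
    se_lower = math.sqrt(l1 * (1 - l1) / n1 + u2 * (1 - u2) / n2)
    se_upper = math.sqrt(u1 * (1 - u1) / n1 + l2 * (1 - l2) / n2)
    return diff - z * se_lower, diff, diff + z * se_upper


# Values as shown on the calculator tab.
print(newcombe_wilson_ci(0.899, 6, 0.779, 5))
# Cat bite example from tab 3: 4 of 6 versus 0 of 5.
print(newcombe_wilson_ci(4 / 6, 6, 0.0, 5))
```

For the cat bite data the upper limit stays below 1.0, in contrast to the Wald limits.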

Selected References: CI for Difference in Two Proportions
1 Wilson EB: Probable Inference, the Law of Succession, and Statistical Inference. J Am Stat Assoc 1927;22(158):209-212.
2 Clopper CJ, Pearson ES: The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial. Biometrika 1934;26(4):404-413.
3 Blyth CR, Still HA: Binomial Confidence Intervals. J Am Stat Assoc 1983;78(381):108-116.
4 Ghosh BK: A Comparison of Some Approximate Confidence Intervals for the Binomial Parameter. J Am Stat Assoc 1979;74(368):894-900.
5 Agresti A, Coull BA: Approximate is Better than Exact for Interval Estimation of Binomial Proportions. The American Statistician 1998;52(2):119-126.
6 Newcombe RG: Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine 1998;17:857-872.
7 Newcombe RG: Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods. Statistics in Medicine 1998;17:873-890.
8 Peskun PH: A New Confidence Interval Method Based on the Normal Approximation for the Difference of Two Binomial Probabilities. J Am Stat Assoc 1993;88(422):656-661.
9 Altman DG: Confidence Intervals for the Number Needed to Treat. BMJ 1998;317:1309-1312.
10 Laupacis A, Sackett DL, Roberts RS: An Assessment of Clinically Useful Measures of the Consequences of Treatment. N Engl J Med 1988;318(26):1728-1733.
11 Cook RJ, Sackett DL: The Number Needed to Treat: a Clinically Useful Measure of Treatment Effect. BMJ 1995;310(6977):452-454.
12 Cordell WH: Number Needed to Treat (NNT). Ann Emerg Med 1999;33(4):433-436.
13 Feinstein AR: Fraud, Distortion, Delusion, and Consensus: The Problems of Human and Natural Deception in Epidemiologic Science. Am J Med 1988;84:475-478.
14 Feinstein AR: Invidious Comparisons and Unmet Clinical Challenges. Am J Med 1992;92:117-120.
15 Forrow L, Taylor WC, Arnold RM: Absolutely Relative: How Research Results Are Summarized Can Affect Treatment Decisions. Am J Med 1992;92:121-124.
16 Naylor CD, Chen E, Strauss B: Measured Enthusiasm: Does the Method of Reporting Trial Results Alter Perceptions of Therapeutic Effectiveness? Annals of Internal Med 1992;117(11):916-921.

Acknowledgment: I am indebted to Robert G. Newcombe, Senior Lecturer in Medical Statistics, University of Wales College of Medicine for his tireless assistance (over the internet) during the development of this implementation of his work. Clifford Qualls, Ph.D. and Ed Bedrick, Ph.D. of the Department of Mathematics and Statistics, University of New Mexico, helped me with much discussion and guidance. You may share this implementation with anyone who might find it useful. Any errors you run across are, of course, entirely my own. Any helpful feedback would be much appreciated. Dan Tandberg, MD
