
Basic Statistics For Doctors Singapore Med J 2003 Vol 44(4) : 172-174

Randomised Controlled
Trials (RCTs) – Sample Size:
The Magic Number?
Y H Chan

Clinical Trials and Epidemiology Research Unit
226 Outram Road
Blk A #02-02
Singapore 169039

Y H Chan, PhD
Head of Biostatistics

Correspondence to:
Y H Chan
Tel: (65) 6317 2121
Fax: (65) 6317 2122
Email: chanyh@cteru.gov.sg

INTRODUCTION
A common question posed to a biostatistician by a medical researcher is "How many subjects do I need to obtain a significant result for my study?". That magic number! In the manufacturing industry, it is permissible to test thousands of components in order to derive a conclusive result, but in medical research the sample size has to be "just large enough" to provide a reliable answer to the research question. If the sample size is too small, the study is a waste of time as no conclusive results are likely to be obtained; if the sample size is too large, extra subjects may be given a therapy which could have been shown to be non-efficacious with a smaller sample size(1).

Another major reason, besides the scientific justification for doing a study, why a researcher wants an estimate of the sample size is to calculate the cost of the study, which will determine the feasibility of conducting the study within budget. This magic number will also help the researcher to estimate the length of his/her study. For example, the calculated sample size may be 50 (a manageable number), but if the yearly accrual of subjects is 10 (assuming all subjects give consent to be in the study), it will take at least five years to complete the study! In that case, a multicentre study is encouraged.

STATISTICAL THEORY ON SAMPLE SIZE CALCULATIONS
The Null Hypothesis is set up to be rejected. The philosophical argument is: it is easier to prove a statement false than to prove it true. For example, suppose we want to prove that "all cats are black". Even if you point to black cats everywhere, there is still doubt that a white cat could be lying under a table somewhere. But once you bring me a white cat, the hypothesis that "all cats are black" is disproved.

Hence, if we are interested in comparing two therapies, the null hypothesis will be "there is no difference" versus the Alternative Hypothesis of "there is a difference". From the above philosophical argument, not being able to reject the null hypothesis does not mean that it is true (just that we do not have enough evidence to reject it).

We want to reject the null hypothesis, but could be committing a Type I Error: rejecting the null hypothesis when it is true. In a research study, there is no such thing as "my results are correct", only "how much error am I committing". For example, suppose that in the population there is actually no difference between the two therapies (but we do not know this; that is why we are doing the study), and after conducting the study a significant difference is found, given by p<0.05.

There are only two explanations for this significant difference (assuming that we have controlled for bias of any kind): either there is actually a difference between the two therapies, or it arose by chance. The p-value gives us this "amount of chance". If the p-value is 0.03, the probability that the significant difference is due to chance is 3%. If the p-value is very small, then this difference happening by chance is "not possible", and it should thus be due to a real difference between the therapies (still with a small possibility of being "wrong").

The other situation is not being able to reject the null hypothesis when it is actually false (Type II Error). As mentioned, the main aim of clinical research is to reject the null hypothesis, and we achieve this by controlling the type II error(2). This is expressed as the Power of the study (1 – type II error): the probability of rejecting the null hypothesis when it is false. Conventionally, the power is set at 80% or more; the higher the power, the bigger the sample size required.

To be conservative, a two-sided test (requiring a larger sample size) is usually carried out rather than a one-sided test, which assumes that the test therapy will perform clinically better than the standard or control therapy.

SAMPLE SIZE CALCULATIONS
To estimate a sample size which will ethically answer the research question of an RCT with a reliable conclusion, the following information should be available.
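The constants used in the sample size formulas later in the article (c = 7.9 for 80% power, c = 10.5 for 90% power, at two-sided 5%) are rounded values of (z at 1 – α/2 plus z at the power level) squared, under the usual normal approximation. A minimal Python check (the function name is mine, not from the article):

```python
from statistics import NormalDist  # standard library, Python 3.8+

def c_constant(alpha: float, power: float) -> float:
    """c = (z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return (z(1 - alpha / 2) + z(power)) ** 2

print(round(c_constant(0.05, 0.80), 2))  # 7.85, quoted as 7.9 in the article
print(round(c_constant(0.05, 0.90), 2))  # 10.51, quoted as 10.5
```

The unrounded values are about 7.85 and 10.51; the article rounds them to 7.9 and 10.5 for hand calculation.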

Type of comparison(3)

Superiority trials
To show that a new experimental therapy is superior to a control treatment.
Null Hypothesis: The test therapy is not better than the control therapy by a clinically relevant amount.
Alternative Hypothesis: The test therapy is better than the control therapy by a clinically relevant amount.

Equivalence trials
Here the aim is to show that the test and control therapies are equally effective.
Null Hypothesis: The two therapies differ by a clinically relevant amount.
Alternative Hypothesis: The two therapies do not differ by a clinically relevant amount.

Non-inferiority trials
For non-inferiority, the aim is to show that the new therapy is as effective as, but need not be superior to, the control therapy. This applies when, for example, the test therapy is cheaper or has fewer side effects.
Null Hypothesis: The test therapy is inferior to the control therapy by a clinically relevant amount.
Alternative Hypothesis: The test therapy is not inferior to the control therapy by a clinically relevant amount.
A one-sided test is performed in this case.

Type of configuration(4)

Parallel design
The most commonly used design. The subjects are randomised to one or more arms of different therapies treated concurrently.

Crossover design
In this design, subjects act as their own control and are randomised to a sequence of two or more therapies with a washout period between therapies. It is appropriate for chronic conditions which return to their original level once therapy is discontinued.

Type I error and Power(5)
The type I error is usually set at two-sided 5% and the power at 80% or 90%.

Effect size of therapies
The effect size specifies the accepted clinical difference between two therapies that a researcher wants to observe in a study.

There are three usual ways to get the effect size:
a. from past literature.
b. if no past literature is available, from a small pilot study to determine the estimated effect sizes.
c. from clinical expectations.

To calculate the sample size, besides knowing the type of design to be used, one has to classify the type of the primary outcome.

Proportion outcomes
The primary outcome of interest is dichotomous (success/failure, yes/no, etc). For example, suppose 25% of the subjects on the standard therapy had a successful outcome and it is of clinical relevance only if we observe a 40% (effect size) absolute improvement for those on the study therapy (i.e. 65% of the subjects will have a successful outcome). How many subjects do we need to observe a significant difference?

For a two-sided test at 5%, a simple formula to calculate the sample size is given by

m (size per group) = c × [π1(1 – π1) + π2(1 – π2)] / (π1 – π2)²

where c = 7.9 for 80% power and 10.5 for 90% power, and π1 and π2 are the proportion estimates. Thus, from the above example, π1 = 0.25 and π2 = 0.65. For 80% power, we have

m (size per group) = 7.9 × [0.25(1 – 0.25) + 0.65(1 – 0.65)] / (0.25 – 0.65)² = 20.49

Hence 21 × 2 = 42 subjects will be needed.

Table I shows the required sample size per group for π1 and π2 in steps of 0.1 for powers of 80% and 90% at two-sided 5%.

Table I
π1\π2  0.2        0.3        0.4        0.5        0.6        0.7        0.8        0.9
0.1    199 (266)  62 (82)    32 (42)    20 (26)    14 (17)    10 (12)    7 (9)      5 (6)
0.2    –          294 (392)  82 (109)   39 (52)    23 (30)    15 (19)    10 (13)    7 (9)
0.3    –          –          356 (477)  93 (125)   42 (56)    24 (31)    15 (19)    10 (12)
0.4    –          –          –          388 (519)  97 (130)   42 (56)    23 (30)    14 (17)
0.5    –          –          –          –          388 (519)  93 (125)   39 (52)    20 (26)
0.6    –          –          –          –          –          356 (477)  82 (109)   32 (42)
0.7    –          –          –          –          –          –          294 (392)  62 (82)
0.8    –          –          –          –          –          –          –          199 (266)
Numbers in ( ) are for 90% power
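The proportion-outcome formula and the worked example above can be reproduced with a short script (the function name and the choice to round up are mine; published tables such as Table I may use unrounded constants, so their entries can differ from this by a subject or two):

```python
import math

def size_per_group(p1: float, p2: float, c: float = 7.9) -> int:
    """m = c * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2, rounded up.
    c = 7.9 for 80% power, 10.5 for 90% power (two-sided 5%)."""
    m = c * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return math.ceil(m)

m = size_per_group(0.25, 0.65)  # worked example: 20.49, rounded up to 21
print(m, 2 * m)                 # 21 42
```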

Continuous outcomes

Two independent samples
The primary outcome of interest is the mean difference in an outcome variable between two treatment groups. For example, if it is postulated that a good clinical response difference between the active and placebo groups is 0.2 units with an SD of 0.5 units, how many subjects will be required to obtain statistical significance for this clinical difference?

A simple formula, for a two-sided test at 5%, is

m (size per group) = 2c / δ² + 1

where δ = (µ2 – µ1) / σ is the standardised effect size, µ1 and µ2 are the means of the two treatment groups, σ is the common standard deviation, and c = 7.9 for 80% power and 10.5 for 90% power.

From the above example, δ = 0.2/0.5 = 0.4 and, for 80% power, we have m (size per group) = (2 × 7.9)/(0.4 × 0.4) + 1 = 99.75. Hence 100 × 2 = 200 subjects will be needed.

Table II shows the required sample size per group for values of δ in steps of 0.1 for powers of 80% and 90% at two-sided 5%.

Paired samples
In this case, we have the pre- and post-treatment mean difference of the two treatment groups, and a simple formula is

Total sample size = c / δ² + 2

Table III shows the total size required for values of δ in steps of 0.1 for powers of 80% and 90% at two-sided 5%.

SAMPLE SIZE SOFTWARE
There are many sample size calculation software packages available on the Internet and even on most computers. The main point to note in using software is to understand the proper instructions for obtaining the sample size. One could enter some data into a program and, unless an error message is obtained, the magic number generated will most likely be accepted by the user. For this number to be "correct", the right formula must be used for the right type of design and primary outcome. It is important to note that nearly all the programs provide the sample size for one group and not the total (except for paired designs).

A simple-to-use, affordably priced PC-based sample size package is Machin et al's(6) Sampsize version 2.1, but it can only be installed on Windows 98 and below. Software with network capabilities includes SPSS (www.spss.com), STATA (www.stata.com) and Power & Precision (www.PowerAnalysis.com), to mention a few. Thomas and Krebs(7) gave a review of the various statistical power analysis software packages, comparing their pros and cons.

CONCLUSIONS
This article has covered the basics of simple sample size calculations with two aims in mind. Firstly, a researcher could calculate his/her own sample size given the types of design and measures of outcome mentioned above; secondly, it provides some knowledge of what information will be needed when coming to see a biostatistician for sample size determination. If one is interested in doing an equivalence/non-inferiority study, or a study with survival outcomes, it is recommended that a biostatistician be consulted.

REFERENCES
1. Fayers PM, Machin D. Sample size: how many patients are necessary? British Journal of Cancer 1995; 72:1-9.
2. Muller KE, Benignus VA. Increasing scientific power with statistical power. Neurotoxicology and Teratology 1992; 14:211-9.
3. Schall R, Luus H, Erasmus T. Type of comparison. In: Karlberg J, Tsang K, editors. Introduction to Clinical Trials. 1998; pp 258-66.
4. Chan YH. Study design considerations: study configurations. In: Karlberg J, Tsang K, editors. Introduction to Clinical Trials. 1998; pp 249-57.
5. Thomas L, Juanes F. The importance of statistical power analysis: an example from animal behaviour. Animal Behaviour 1996; 52:856-9.
6. Machin D, Campbell M, Fayers P, Pinol A. Sample Size Tables for Clinical Studies, 2nd edition. Blackwell Science, 1997.
7. Thomas L, Krebs CJ. A review of statistical power analysis software. Bulletin of the Ecological Society of America 1997; 78(2):126-39.

Table II
δ          0.1    0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
80% power  1,571  394  176  100  64   45   33   26   21
90% power  2,103  527  235  133  86   60   44   34   27
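The two-independent-samples calculation can be sketched in the same way (names are mine; with the article's rounded c = 7.9 this reproduces the worked example of 100 per group):

```python
import math

def size_per_group_continuous(diff: float, sd: float, c: float = 7.9) -> int:
    """m = 2c / delta^2 + 1, where delta = diff / sd is the standardised
    effect size. c = 7.9 for 80% power, 10.5 for 90% power (two-sided 5%)."""
    delta = diff / sd
    return math.ceil(2 * c / delta ** 2 + 1)

m = size_per_group_continuous(0.2, 0.5)  # delta = 0.4: 99.75, rounded up to 100
print(m, 2 * m)                          # 100 200
```

For small δ, Table II appears to be based on the unrounded constant (about 7.85 for 80% power) rather than 7.9: for example, c = 7.849 at δ = 0.1 gives 1,571 per group, matching the table.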

Table III
δ          0.1    0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
80% power  792    200  90   52   34   24   19   15   12
90% power  1,052  265  119  68   44   32   24   19   15
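The paired-samples formula can be checked the same way (the function name is mine); with the rounded constants it reproduces the Table III entries, e.g. a total of 52 subjects for δ = 0.4 at 80% power:

```python
import math

def total_size_paired(diff: float, sd: float, c: float = 7.9) -> int:
    """Total sample size = c / delta^2 + 2 for a paired (pre/post) design,
    where delta = diff / sd. c = 7.9 for 80% power, 10.5 for 90% power."""
    delta = diff / sd
    return math.ceil(c / delta ** 2 + 2)

print(total_size_paired(0.2, 0.5))        # delta = 0.4: 51.375, rounded up to 52
print(total_size_paired(0.2, 0.5, 10.5))  # 90% power: 68
```

Note that, unlike the two preceding formulas, this one returns the total sample size rather than the size per group.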
