BAYESIAN IDEAS AND HEALTH-CARE EVALUATION 157
This paper presents a personal perspective based on experience of trying to communicate to “classically” trained statisticians. For such an audience it can be helpful to break the additional benefits of a Bayesian approach into two main strands: those that are inherent to the Bayesian philosophy, and those that arise through the ability provided by the “MCMC revolution” to handle complex models. For example, a recent issue of Statistics in Medicine (Volume 22, Number 10) consisted entirely of Bayesian analyses using Markov chain Monte Carlo (MCMC) methods: of these ten papers, all exploited the ability to handle complex models, but only one gave any attention to an informative prior distribution, and then only minimally (Hanson, Bedrick, Johnson and Thurmond, 2003).

This division is reflected in the structure of this paper. In Section 2 we focus on three selected features of basic Bayesian analysis, which are then illustrated in a fairly detailed example. We then go on in Section 3 to identify three important features of complex modelling, followed again by an example. It will be clear that the emphasis throughout is firmly on inference rather than decision-making: this is primarily to avoid overlap with Berry (2004), but also reflects personal enthusiasm. This issue is briefly discussed in Section 4, which also attempts to put current developments in perspective and outlines some issues in increasing appropriate use of Bayesian methods.

Of course, this paper can only scratch the surface of a burgeoning literature, and only a sample of references is provided in the text. For basic arguments for the Bayesian approach in this context it is difficult to improve upon the classic papers by Jerome Cornfield, for example, Cornfield (1966, 1969, 1976). More recent introductions and polemics include Etzioni and Kadane (1995), Berry and Stangl (1996a) and Kadane (1995), while Spiegelhalter, Myles, Jones and Abrams (2000) systematically reviews the literature, Berry and Stangl (1996b) contains a wide range of applications, and O’Hagan and Luce (2003) is an excellent free primer. Much of the material presented in this paper is taken from Spiegelhalter, Abrams and Myles (2004), to which we refer for further detail.

2. THREE SELECTED FEATURES OF BASIC BAYESIAN ANALYSIS

A number of generic characteristics of the Bayesian paradigm make it especially suitable for application to health-care evaluations. Here we focus on a limited selection: acknowledgment of subjectivity and context, simple use of Bayes theorem and use of a “community” of prior distributions in order to assess the impact of new evidence. Of course many other important issues could be identified, including the ease of prediction, reporting probabilities of events of direct interest, use of prior distributions in sample size assessment and power calculations and so on: these features are reflected in the references given above.

2.1 Acknowledgment of Subjectivity and Context

Bayesian analysis is rooted in probability theory, whose basic rules are generally considered as self-evident. However, as Lindley (2000) emphasizes, the rules of probability can be derived from “deeper” axioms of reasonable behavior of an individual (say You) in the face of Your own uncertainty. The vital point of this subjective interpretation is that Your probability for an event is a property of Your relationship to that event, and not an objective property of the event itself. This is why, pedantically speaking, one should always refer to probabilities for events rather than probabilities of events, since the probability is conditioned on the context, which includes the observer and all the observer’s background knowledge and assumptions. Bayesian methods therefore explicitly allow for the possibility that the conclusions of an analysis may depend on who is conducting it and their available evidence and opinion, and therefore an understanding of the context of the study is vital:

    Bayesian statistics treats subjectivity with respect by placing it in the open and under the control of the consumer of data (Berger and Berry, 1988).

This view appears particularly appropriate to the complex circumstances in which evaluations of health-care interventions are carried out. Apart from methodological researchers, at least five different viewpoints might be identified:

• sponsors—for example, the pharmaceutical industry, medical charities or granting bodies such as the U.S. National Institutes of Health, and the U.K. Medical Research Council;

• investigators—that is, those responsible for the conduct of a study, whether industry or publicly funded;

• reviewers—for example, journal editors regarding publication, and regulatory bodies for approval of pharmaceuticals or devices;

• policy makers—for example, agencies responsible for setting health policy taking into account cost-effectiveness, such as the U.K. National Institute
158 D. J. SPIEGELHALTER
for Clinical Excellence (NICE), or individual health maintenance organizations (HMOs);

• consumers—for example, individual patients or clinicians acting on their behalf.

Each of these broad categories can be further subdivided. Thus a characteristic of health-care evaluation is that the investigators who plan and conduct a study are generally not the same body as that which makes decisions on the basis of the evidence provided in part by that study. An immediate consequence of this complex cast of stakeholders is that it is not generally straightforward to implement a decision-theoretic approach that is based around a single decision-maker. In addition, there is a range of possible prior distributions, which may in turn be used for design but possibly not in reporting results. All this reinforces the need for an extremely flexible approach, with a clear specification of whose beliefs and values are being expressed, and the necessity of taking forward in parallel a range of possible opinions.

2.2 Simple Use of Bayes Theorem

The use of MCMC methods can lead to extravagantly complex models, but here we consider two important applications of Bayes theorem used in its simplest analytic form: the interpretation of positive trial results, and the use of approximate normal likelihoods and priors.

Bayes theorem is often introduced through examples based on diagnostic testing for a disease of known prevalence. For fixed sensitivity and specificity, the posterior probability after a positive test result (or the “predictive value positive”) can be calculated, and the frequent conflict of this value with naive intuition can be a good educational warning to take into account the prior probability (prevalence). In the context of health-care evaluation, the equivalent of a positive test is a “significant” finding in a clinical trial, which almost inevitably receives disproportionately more publicity than a negative finding.

There have been frequent attempts to adapt Bayesian ideas as an aid to interpretation of positive clinical trial results. For example, Simon (1994) points out that if one carries out clinical trials with Type I error α = 0.05 and power (1 − Type II error) 1 − β = 0.80, then if only 10% of investigated treatments are truly effective (not an unreasonable estimate), then Bayes theorem shows that 36% of claimed “discoveries” will be false positives. This figure will be even higher if there is additional external evidence against a particular intervention, prompting Grieve (1994) to suggest that Bayes theorem provides “a yardstick against which a surprising finding may be measured.” Increasing attention to “false discovery rates” (Benjamini and Hochberg, 1995), which are essentially measures of the “predictive value positive,” has refocused attention on this concept within the context of classical multiple testing.

Our second example concerns the simple use of Bayes theorem when data have been analyzed using standard statistical packages. In parametric models Bayes theorem is often taught using binomial likelihoods and conjugate beta distributions, but this framework does not fit in well with comparative evaluations, for which interest will generally lie with odds ratios or hazard ratios, classically estimated using logistic or Cox regression analyses provided within standard statistical packages. Practitioners will therefore generally have data summaries comprising estimates and standard errors of a log(odds ratio) or a log(hazard ratio), and these can be interpreted as providing normal likelihoods and incorporated into a Bayesian analysis.

To be specific, suppose we have a classical estimate y, with standard error s, of a true log(odds ratio) or log(hazard ratio) θ, and this is interpreted as providing a normal likelihood based on the sampling distribution y ∼ N[θ, σ²/m], where s = σ/√m, with σ known. Then if we are willing to approximate our prior distribution for θ by θ ∼ N[µ, σ²/n0], the simplest application of Bayes theorem gives a posterior distribution

    θ|y ∼ N[(n0µ + my)/(n0 + m), σ²/(n0 + m)].

The choice of σ is essentially one of convenience, since we are matching three parameters (σ, m, n0) to two specified quantities (the likelihood and prior variability), but perhaps remarkably it turns out that σ = 2 leads to a value m that is generally interpretable as the “effective number of events” (Spiegelhalter, Abrams and Myles, 2004): for example, Tsiatis (1981) shows that in a balanced trial with a small treatment effect, the estimated log(hazard ratio) has approximate variance 4/m, where m is the observed number of events. This formulation aids interpretation, as one can translate both prior input and evidence from data onto a common scale, as we shall see in Section 2.4.

2.3 Flexible Prior Specification

For a “classical” audience, it is important to clarify a number of possible misconceptions that may arise concerning the prior distribution. In particular, a prior is not necessarily specified beforehand: Cox (1999) states that:
    I was surprised to read that priors must be chosen before the data have been seen. Nothing in the formalism demands this. Prior does not refer to time, but to a situation, hypothetical when we have data, where we assess what our evidence would have been if we had had no data. This assessment may rationally be affected by having seen the data, although there are considerable dangers in this, rather similar to those in frequentist theory.

Naturally when making predictions or decisions one’s prior distribution needs to be unambiguously specified, although even then it is reasonable to carry out sensitivity analysis to alternative choices.

The prior is also not necessarily unique, since the discussion in Section 2.1 should make clear that there is no such thing as the “correct” prior. Instead, Kass and Greenhouse (1989) introduced the term “community of priors” to describe the range of viewpoints that should be considered when interpreting evidence, and therefore a Bayesian analysis is best seen as providing a mapping from a space of specified prior beliefs to appropriate posterior beliefs.

Members of this “community” may include the following:

• “Clinical” priors representing expert opinion—Elicitation methods for such priors were reviewed by Chaloner (1996), who concluded that fairly simple methods are adequate, using interactive feedback with a scripted interview, providing experts with a systematic literature review, basing elicitation on 2.5% and 97.5% percentiles, and using as many experts as possible. Recent reports of elicitation before clinical trials include Fayers et al. (2000) and Chaloner and Rhame (2001).

• “Evidence-based” priors representing a synthesis of available evidence—Since conclusions strongly based on beliefs that cannot be “objectively” supported are unlikely to be widely regarded as convincing, it is valuable to summarize available evidence. Possible models for incorporation of past data are discussed in Section 3.2.

• “Reference” priors—It is attractive to seek a “non-informative” prior to use as a baseline analysis, and such analyses have been suggested as a way of making probability statements about parameters without being explicitly Bayesian (Burton, 1994; Shakespeare, Gebski, Veness and Simes, 2001). But the problems are well known: uniform priors on one scale are not uniform on a transformed scale, and apparently innocuous prior assumptions can have a strong impact, particularly when events are rare. Special problems arise in hierarchical modelling, both with regard to appropriate priors on nuisance parameters such as baseline risks, and selection of a default prior for the between-group variability. For the latter, attention has concentrated on placing a prior directly on the degree of shrinkage (Christiansen and Morris, 1997b; Daniels, 1999; Natarajan and Kass, 2000; DuMouchel and Normand, 2000; Spiegelhalter, 2001), although a half-normal prior on the between-group standard deviation appears to be a transparent and flexible means of incorporating a degree of prior information (Spiegelhalter, Abrams and Myles, 2004).

• “Sceptical” priors that express archetypal doubts about large effects—Informative priors that express scepticism about large treatment effects have been put forward both as a reasonable expression of doubt, and as a way of controlling early stopping of trials on the basis of fortuitously positive results. Kass and Greenhouse (1989) suggest that a

    cautious reasonable sceptic will recommend action only on the basis of fairly firm knowledge,

but that these sceptical

    beliefs we specify need not be our own, nor need they be the beliefs of any actual person we happen to know, nor derived in some way from any group of “experts.”

Mathematically speaking, a sceptical prior about a treatment effect will have a mean of zero and a shape chosen to include plausible treatment differences, which determines the degree of scepticism. Spiegelhalter, Freedman and Parmar (1994) argue that a reasonable degree of scepticism may correspond to a feeling that the trial has been designed around an alternative hypothesis that is optimistic, formalized by a prior with only a small probability γ (say 5%) that the treatment effect is as large as the alternative hypothesis θA.

In Section 2.2 we emphasized how a “significant” positive trial result may be tempered by taking into account prior prevalence, and Matthews (2001) extended those ideas to allow for the full likelihood observed. Specifically, he derives a simple formula for working backward from the observed likelihood to the sceptical prior centered on 0 that would just
give a 95% posterior interval that included 0—if that degree of scepticism were considered plausible, then the trial results could not be considered as convincing. An example is provided in Section 2.4.

Sceptical priors have been used in a number of case studies (Fletcher et al., 1993; Parmar, Ungerleider and Simon, 1996; DerSimonian, 1996; Heitjan, 1997; Dignam et al., 1998; Cronin et al., 1999; Harrell and Shih, 2001). A senior Food and Drug Administration (FDA) biostatistician (O’Neill, 1994) has stated that he

    would like to see [sceptical priors] applied in more routine fashion to provide insight into our decision making.

• “Enthusiastic” priors that express archetypal optimism—As a counterbalance to the pessimism expressed by the sceptical prior, Spiegelhalter, Freedman and Parmar (1994) suggest an “enthusiastic” prior centered on the alternative hypothesis and with a low chance (say 5%) that the true treatment benefit is negative.

The community of prior opinions becomes particularly important when faced with the difficult issue of whether to stop a clinical trial. Kass and Greenhouse (1989) express the crucial view that “the purpose of a trial is to collect data that bring to conclusive consensus at termination opinions that had been diverse and indecisive at the outset,” and this idea may be formalized as follows:

1. Stopping with a “positive” result (i.e., in favor of the new treatment) might be considered if a posterior based on a sceptical prior suggested a high probability of treatment benefit.

2. Stopping with a “negative” result (i.e., equivocal or in favor of the standard treatment) may be based on whether the results were sufficiently disappointing to make a posterior based on an enthusiastic prior rule out a treatment benefit.

In other words, we should stop if we have convinced a reasonable adversary that they are wrong. Fayers, Ashby and Parmar (1997) provide a tutorial on such an approach, and Section 2.4 describes its application by a data monitoring committee of two cancer trials.

2.4 Example 1: Bayesian Monitoring of the CHART Trials

Here we illustrate the selected ideas described previously in the context of two clinical trials in which the data monitoring committee used Bayesian techniques to inform the decision whether to stop early. More detail is provided in Parmar et al. (2001) and Spiegelhalter, Abrams and Myles (2004).

Intervention. In 1986 a new radiotherapy technique called CHART (continuous hyper-fractionated accelerated radiotherapy) was introduced. Its concept was to give radiotherapy continuously (no weekend breaks), in many small fractions (three a day) and accelerated (the course completed in twelve days). There are clearly considerable logistical problems in efficiently delivering CHART, as well as concerns about possible increased side effects.

Aim of studies. Promising nonrandomized and pilot studies led the U.K. Medical Research Council to instigate two large randomized trials to compare CHART to conventional radiotherapy in both nonsmall-cell lung and head-and-neck cancer, and in particular to assess whether CHART provides a clinically important difference in survival that compensates for any additional toxicity and problems of delivering the treatment.

Study design. The trials began in 1990, randomized in the proportion 60:40 in favor of CHART, with planned annual meetings of the data monitoring committee (DMC) to review efficacy and toxicity data. No formal stopping procedure was specified in the protocol.

Outcome measure. Full data were to become available on survival (lung) or disease-free survival (head-and-neck), with results presented in terms of estimates of the hazard ratio h, defined as the ratio of the hazard under CHART to the hazard under standard treatment. Hence hazard ratios less than 1 indicate superiority of CHART.

Planned sample sizes. 600 patients were to be entered in the lung cancer trial, with 470 expected deaths, giving 90% power to detect at the 5% level a 10% improvement (15% to 25% survival). Under a proportional hazards assumption, this is equivalent to an alternative hypothesis (hazard ratio) of hA = log(0.25)/log(0.15) = 0.73. The head-and-neck trial was to have 500 patients, with 220 expected recurrences, giving 90% power to detect at the 5% level a 15% improvement (45% to 60% disease-free survival), equivalent to an alternative hypothesis of hA = log(0.60)/log(0.45) = 0.64.

Statistical model. A proportional hazards Cox model provides an approximate normal likelihood (Section 2.2) for θ = log(h) = log(hazard ratio), based on

    y_m ∼ N[θ, σ²/m].
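The conjugate normal update of Section 2.2 that underlies this monitoring can be sketched in a few lines of Python. The interim numbers below are purely hypothetical illustrations, not the CHART data; σ = 2 is used so that n0 and m are interpretable as effective numbers of events:

```python
from math import log, sqrt
from statistics import NormalDist

SIGMA = 2.0  # sigma = 2 makes n0 and m effective numbers of events

def normal_posterior(mu, n0, y, m, sigma=SIGMA):
    """Conjugate update of Section 2.2 for theta = log(hazard ratio).

    Prior theta ~ N[mu, sigma^2/n0]; likelihood y ~ N[theta, sigma^2/m].
    Returns mean and sd of theta | y ~ N[(n0*mu + m*y)/(n0 + m), sigma^2/(n0 + m)].
    """
    mean = (n0 * mu + m * y) / (n0 + m)
    sd = sigma / sqrt(n0 + m)
    return mean, sd

# Hypothetical interim result: estimated hazard ratio 0.80 based on m = 200
# events, combined with a sceptical prior centred on h = 1 (mu = 0) worth
# n0 = 100 effective events.
mean, sd = normal_posterior(mu=0.0, n0=100, y=log(0.80), m=200)
prob_benefit = NormalDist(mean, sd).cdf(0.0)  # P(theta < 0 | y): CHART better
```

With these illustrative numbers the sceptical posterior still leaves roughly a one-in-ten chance that the new treatment is not superior, showing how a sceptical prior tempers an apparently convincing likelihood.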
TABLE 1
Summary data reported at each meeting of the CHART lung trial DMC. The “effective number of deaths” m is derived from the likelihood-based 95% interval, in that the standard error of the estimated log(hazard ratio) is assumed to be 2/√m
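The derivation described in the caption can be checked in a couple of lines: recover the standard error of the log(hazard ratio) from the 95% interval and invert se = 2/√m. For illustration it is applied here to a hazard ratio estimate of 0.95 with 95% interval 0.79 to 1.14, the head-and-neck result quoted in the text:

```python
from math import log

def effective_events(lower, upper, sigma=2.0, z=1.96):
    """Effective number of events m implied by a likelihood-based 95%
    interval (lower, upper) for a hazard ratio, inverting
    se(log hazard ratio) = sigma / sqrt(m)."""
    se = (log(upper) - log(lower)) / (2 * z)
    return (sigma / se) ** 2

# Head-and-neck final result: estimate 0.95, 95% interval 0.79 to 1.14.
m = effective_events(0.79, 1.14)
```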
For the head-and-neck cancer trial, the data reported at each meeting of the independent data monitoring committee showed no strong evidence of benefit at any stage of the study: at the final analysis the likelihood-based hazard ratio estimate was 0.95 with 95% interval 0.79 to 1.14.

Bayesian interpretation. For the lung trial, the DMC were presented with survival curves, and posterior distributions and tail areas arising from a reference prior [uniform on a log(h) scale]. Following the discussion in Section 2.3, the posterior distribution resulting from the sceptical prior was emphasized in view of the positive findings, in order to check whether the evidence was sufficient to persuade a reasonable sceptic.

Figure 3 shows the sceptical prior distributions at the start of the lung cancer trial, and the likelihood (essentially the posterior under the reference prior) and posterior for the results available in subsequent years. Under the reference prior there is substantial reduction in the estimated effect as the trial progresses, while the sceptical results are remarkably stable, and Table 1 shows that the initial estimate in 1992 remains essentially unchanged.

Before the trial the clinicians were demanding a 13.5% improvement before changing treatment (Parmar et al., 2001); however, the inconvenience and toxicity were found to be substantially less than expected, and so the probability of an improvement of at least 7% was calculated, around half the initial demands, and equivalent to an h of around 0.80. Such “shifting of the goalposts” is entirely reasonable in the light of the trial data. Figure 3 and the final column of Table 1 show that the sceptical posterior distribution is centered around these clinical demands, showing that these data should persuade even a sceptic that CHART both improves survival and, on balance, is the pragmatic treatment of choice.

Since the results for the head-and-neck trial were essentially negative, it is appropriate to monitor the trial assuming a more enthusiastic prior: we adopt the experts’ clinical prior, since this expresses considerable optimism. The initial clinical demands were a 13% change in survival from 45% to 58%, but in parallel with the lung trial we have reduced this to a 7% improvement. The results are described in detail in Parmar et al. (2001): the final posterior expresses a 17% chance that CHART reduces survival, an 8% chance that it produces a clinically significant (> 7%) improvement and a 75% chance that the effect lies in the “grey area” in between. The data should therefore be sufficient to convince a reasonable enthusiast that, on the basis of the trial evidence, CHART is not of clinical benefit in head-and-neck cancer.

Sensitivity analysis. We can use the results of Matthews (2001) to see what degree of scepticism would have been necessary not to have found the final lung results convincing. Had the effective number of events underlying our sceptical prior been n0 = 701, this would have just led the 95% posterior interval for the hazard ratio to include 1. Since this prior distribution would have restricted plausible hazard ratios to a 95% prior interval of 0.86 to 1.14, this can be considered too great a degree of reasonable scepticism, and hence the trial results can be considered convincing of survival benefit.

Comments. There are two important features of the prospective Bayesian analysis of the CHART trials. First, while classical stopping rules may well have led the DMC to stop the lung trial earlier, perhaps in 1993 when the two-sided P-value was 0.001; this would have overestimated the benefit. The DMC allowed the trial to continue, and consequently produced a strong result that should be convincing to a wide range of opinion. Second, after discovering that the secondary aspects of the new treatment were less unfavorable than expected, the DMC was allowed to “shift the goalposts” and not remain with unnecessarily strong clinical demands.
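The reverse-Bayes idea behind this sensitivity analysis can be sketched as follows. The summary statistics below are hypothetical stand-ins (the actual CHART summaries are those of Table 1); the prior is sceptical in the Section 2.3 sense, with mean zero and σ = 2:

```python
from math import sqrt

def critical_n0(y, m, sigma=2.0, z=1.96):
    """Reverse-Bayes calculation in the spirit of Matthews (2001): the
    effective prior sample size n0 of a sceptical prior theta ~ N[0, sigma^2/n0]
    at which the 95% posterior interval for theta just reaches 0, given a
    likelihood y ~ N[theta, sigma^2/m] with y < 0."""
    return (abs(y) * m / (z * sigma)) ** 2 - m

# Hypothetical final summary: log(hazard ratio) estimate -0.25 from 300 events.
y, m = -0.25, 300
n0 = critical_n0(y, m)

# Check: with this prior the upper 95% posterior limit sits exactly at 0.
post_mean = m * y / (n0 + m)
post_sd = 2.0 / sqrt(n0 + m)
upper = post_mean + 1.96 * post_sd
```

If the critical n0 corresponds to a prior most observers would regard as unreasonably sceptical, the observed result can be called convincing, which is exactly the argument applied to the lung trial above.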
FIG. 3. Prior, “likelihood” (posterior based on reference prior) and posterior distributions for the CHART lung cancer trial assuming a sceptical prior. The likelihood becomes gradually less extreme, providing a very stable posterior estimate of the treatment effect when adopting a sceptical prior centered on a hazard ratio of 1. Demands are based on a 7% improvement from 15% to 22% 2-year survival, representing a hazard ratio of 0.80.
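The hazard-ratio conversions quoted in the trial descriptions and in the caption above all follow from the proportional-hazards relation S_new(t) = S_control(t)^h, so that h = log(s_new)/log(s_control). A minimal check:

```python
from math import log

def hazard_ratio(s_control, s_new):
    """Under proportional hazards S_new(t) = S_control(t)**h, the hazard
    ratio equivalent to moving survival from s_control to s_new is
    h = log(s_new) / log(s_control)."""
    return log(s_new) / log(s_control)

h_lung = hazard_ratio(0.15, 0.25)       # lung alternative hypothesis, 0.73
h_head_neck = hazard_ratio(0.45, 0.60)  # head-and-neck alternative, 0.64
h_demand = hazard_ratio(0.15, 0.22)     # revised 7% demand in Figure 3, 0.80
```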
3. THREE SELECTED FEATURES OF COMPLEX BAYESIAN MODELLING

It is important to note that many of the advantages claimed for the Bayesian approach follow from the ability to handle complex models. In particular, there has been extensive use of hierarchical models in health-care evaluation: see, for example, applications in subset analysis (Dixon and Simon, 1991; Simon, Dixon and Friedlin, 1996), multicenter analysis (Gray, 1994; Stangl and Greenhouse, 1998), cluster randomized trials (Spiegelhalter, 2001; Turner, Omar and Thompson, 2001), multiple N-of-1 studies (Zucker et al., 1997), institutional comparisons (Goldstein and Spiegelhalter, 1996; Christiansen and Morris, 1997a; Normand, Glickman and Gatsonis, 1997) and meta-analysis (Sutton et al., 2000; Whitehead, 2002). However, many of these analyses minimize the role of prior information and could have been carried out using flexible likelihood methods, such as simulating the distribution of functions of maximum likelihood estimates and so on.

Here we focus on three aspects that reflect a specifically Bayesian input into complex modelling: computation, incorporation of historical information, and inference on complex functions of parameters.
3.1 Computation

the historical evidence by taking its likelihood p(y_h | θ_h) to a power α. For example, Greenhouse and Wasserman (1995) downweight a previous trial with 176 subjects to be equivalent to only 10 subjects; Fryback, Stout and Rosenberg (2001) also discounted past trials to create a prior for the GUSTO analysis. We note, however, that Eddy, Hasselblad and Shachter (1992) are very strong in their criticism of this method, as it does not have any operational interpretation and hence no clear means of assessing a suitable value for α.

(e) Functional dependence—The current parameter of interest is a logical function of parameters estimated in historical studies: this option is further explored in Section 3.3.

(f) Equal—Past studies are measuring precisely the parameters of interest and data can be directly pooled—this is equivalent to assuming exchangeability of individuals.

Various combinations of these techniques are possible. For example, Berry and Stangl (1996a) assume a fixed probability p that each historical patient is exchangeable with those in the current study [i.e., either option (f) (complete pooling) with probability p or option (a) (complete irrelevance) with probability 1 − p], while Racine, Grieve, Fluhler and Smith (1986) assume a certain prior probability that the entire historical control group exactly matches the contemporaneous controls and hence can be pooled. Given the wide range of options concerning the way in which historical data may be incorporated into a model, there is clearly a need for both qualitative and quantitative input into the modelling, based on both judgements and substantive knowledge.

Whitehead (1996) and Hasselblad (1998) consider a range of hierarchical models for this problem, while Song, Altman, Glenny and Deeks (2003) carry out an empirical investigation and report that such comparisons arrive at essentially the same conclusions as head-to-head comparisons. A specific application arises in the context of “active control” studies. Suppose an established treatment C exists for a condition, and a new intervention T is being evaluated. The efficacy of T would ideally be estimated in a randomized trial with a placebo P as the control group, but because of the existence of C this may be considered unethical. Hence C may be used as an “active control” in a head-to-head clinical trial, and inferences about the efficacy of T may have to be estimated indirectly, using past data on C-versus-P comparisons.

A more complex situation is as follows. Suppose we are interested in drawing inferences on a quantity f about which no direct evidence exists, but where f can be expressed as a deterministic function of a set of “fundamental” parameters θ = θ1, . . . , θN. For example, f might be the response rate in a new population made up of subgroups about which we do have some evidence. More generally, we might assume we have available a set of K studies in which we have observed data y1, . . . , yK which depend on parameters ψ1, . . . , ψK, where each ψk is itself a function of the fundamental parameters θ. This structure is represented graphically in Figure 6. This situation sounds very complex but in fact is rather common, when we have many studies, each of which informs part of a jigsaw, and which need to be put together to answer the question of interest. An example is provided in Section 3.4.
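For the normal likelihoods of Section 2.2, the power-prior discounting criticized above has a simple closed form that makes the implied downweighting explicit. A sketch (the function name is ours, and the correspondence between subjects and effective events is only illustrative):

```python
from math import sqrt

def power_prior_discount(m_hist, alpha, sigma=2.0):
    """Raising a normal historical likelihood y_h ~ N[theta, sigma^2/m_hist]
    to a power alpha yields, up to a constant, N[theta, sigma^2/(alpha*m_hist)]:
    the historical data count as only alpha*m_hist effective observations.
    Returns the effective count and the inflated standard error."""
    m_eff = alpha * m_hist
    return m_eff, sigma / sqrt(m_eff)

# Greenhouse and Wasserman's discounting of a 176-subject historical trial to
# the weight of 10 subjects corresponds to alpha = 10/176.
m_eff, se = power_prior_discount(176, 10 / 176)
```

The arbitrariness of α that Eddy, Hasselblad and Shachter object to is visible here: nothing in the formulation itself suggests why 10/176 rather than any other value.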
3.3 Inference on Complex Functions
TABLE 2
Available data from relevant studies, generally only allowing direct estimation of functions of fundamental parameters of interest. Also provided are estimates and intervals based on full data, and the cross-validatory P-values based on excluding data source 4

Data items and sources | Parameter being estimated | Data | Observed proportion | Estimate | 95% interval | P-value (excl. 4)
Costs and utilities. Ades and Cliffe (2002) specify the cost per test as T = £3, and the net benefit K per maternal diagnosis is judged to be around £50,000, with a range of £12,000 to £60,000. In this instance there is explicit monetary net benefit from maternal diagnosis and so it may be reasonable to take K as an unknown parameter, and Ades and Cliffe (2002) perform a probabilistic sensitivity analysis by giving K a somewhat complex prior distribution. In contrast, we prefer to continue to treat K as a willingness-to-pay for each unit of benefit, and therefore we conduct a deterministic sensitivity analysis in which K is varied up to £60,000.

The pre-natal population in London is N = 105,000, and hence the annual incremental net benefit (INB) of implementing full rather than targeted screening is

    INB = N(1 − a − b)[Ke(1 − h) − T(1 − eh)].

We would also like to know, for fixed K, the probability Q(K) = P(INB > 0 | data): when plotted as a function of K this is known as the cost-effectiveness acceptability curve (CEAC); see, for example, Briggs (2000) and O’Hagan, Stevens and Montmartin (2000, 2001) for detailed discussion of these quantities from a Bayesian perspective.
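The mechanics of the CEAC can be illustrated with simulated posterior draws. Apart from N and T, every number below is a placeholder: the distributions for the parameters a, b, e and h are invented for illustration and are not the Ades and Cliffe posteriors. The point is only how Q(K) = P(INB > 0 | data) is estimated by Monte Carlo:

```python
import random

random.seed(1)

N_POP = 105_000  # pre-natal population in London (from the text)
T_COST = 3.0     # cost per test T, in pounds

def inb(K, a, b, e, h):
    """Annual incremental net benefit of full versus targeted screening:
    INB = N(1 - a - b)[K e (1 - h) - T (1 - e h)]."""
    return N_POP * (1 - a - b) * (K * e * (1 - h) - T_COST * (1 - e * h))

# Placeholder posterior draws for the epidemiological parameters (a, b, e, h);
# a real analysis would use MCMC output from the evidence-synthesis model.
draws = [
    (random.gauss(0.40, 0.02),     # a
     random.gauss(0.20, 0.02),     # b
     random.gauss(0.003, 0.0005),  # e
     random.gauss(0.30, 0.05))     # h
    for _ in range(5000)
]

def ceac(K):
    """Q(K) = P(INB > 0 | data), estimated over the posterior draws."""
    return sum(inb(K, a, b, e, h) > 0 for a, b, e, h in draws) / len(draws)

curve = {K: ceac(K) for K in (0, 10_000, 30_000, 60_000)}
```

Because INB is increasing in K for each draw (the coefficient e(1 − h) is positive), Q(K) is nondecreasing, which is why plotting curve against K yields the familiar rising acceptability curve.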
We can also conduct a "value of information" analysis (Claxton, Lacey and Walker, 2000). For some unknown quantity θ, the "value of perfect information" VPI(θ) is defined as the amount we would gain by knowing θ exactly: VPI(θ) is 0 when INB(θ) > 0, and −INB(θ) when INB(θ) < 0, and hence can be expressed as

    VPI(θ) = max{−INB(θ), 0}.

Hence the "expected value of perfect information" EVPI is

    (1)   EVPI = E[max{−INB(θ), 0} | data].

This may be calculated in two ways: first using MCMC methods, and second by assuming a normal approximation to the posterior distribution of INB(K) and using a closed-form identity. Taking a 10-year horizon and discounting at 6% per year gives a multiplier of 7.8 (not discounting the first year) to the annual figure.

Bayesian interpretation. Following the previous findings the analysis is conducted without data-source 4. Figure 8(a) shows the normal approximations to the posterior distributions of INB for different values of K. The expected INB and 95% limits are shown in Figure 8(b) for K up to £60,000, indicating that the policy of universal testing is preferred on balance provided that the benefit K from a maternal diagnosis is greater than around £10,000: K is certainly judged to exceed this value. The cost-effectiveness acceptability curve in Figure 8(c) points to a high probability of universal testing being cost-effective for reasonable values of K. Figure 8(d) shows the EVPI (±2 Monte Carlo errors) calculated using 100,000 MCMC iterations and also using the normal approximation to the distribution of INB, which provides an adequate approximation.

FIG. 8. (a) and (b) incremental net benefits; (c) cost-effectiveness acceptability curve; and (d) expected value of perfect information for universal versus targeted prenatal testing for HIV. Note the EVPI is maximized at the threshold value of K at which the optimal decision changes.
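The two calculation routes for equation (1) can be sketched directly: a brute-force Monte Carlo average of max(−INB, 0) over posterior draws, and, when INB is approximately N(m, s²), the standard closed-form identity EVPI = s·φ(m/s) − m·Φ(−m/s), with φ and Φ the standard normal density and distribution function. The values of m and s below are hypothetical placeholders, not figures from the HIV-testing example; the final line also verifies the 10-year, 6% discount multiplier of 7.8 quoted above.

```python
import math
import random

def evpi_normal(m, s):
    """Closed-form EVPI when INB ~ N(m, s^2): s*phi(m/s) - m*Phi(-m/s)."""
    z = m / s
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi_neg = 0.5 * (1.0 + math.erf(-z / math.sqrt(2.0)))
    return s * phi - m * big_phi_neg

def evpi_mc(m, s, n=200_000, seed=1):
    """Monte Carlo EVPI: average of max(-INB, 0) over simulated draws."""
    rng = random.Random(seed)
    return sum(max(-rng.gauss(m, s), 0.0) for _ in range(n)) / n

# Hypothetical posterior mean and sd of INB (illustrative units):
m, s = 1.0, 2.0
closed, simulated = evpi_normal(m, s), evpi_mc(m, s)

# Discount multiplier: 10-year horizon at 6%, first year undiscounted.
multiplier = sum(1.06 ** (-t) for t in range(10))  # about 7.80
```

Note that the EVPI is largest when m is small relative to s, that is, near the threshold value of K at which the optimal decision changes, which is exactly the behaviour seen in Figure 8(d).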
170 D. J. SPIEGELHALTER
The EVPI is substantial for low values of K, but for values around £50,000 the EVPI is negligible. Hence there appears to be little purpose in further research to determine the parameters more accurately.

4. CONCLUSIONS

4.1 Current Status of Bayesian Methods

As mentioned in Section 1, there has been a major growth in Bayesian publications but these mainly comprise applications of complex modelling. A notable exception is the use of Bayesian models in cost-effectiveness analysis, in which informative prior distributions may be based on a mixture of evidence synthesis and judgement (O'Hagan and Luce, 2003).

When considering health-care evaluations one cannot ignore the regulatory framework which controls the release onto the market of both new pharmaceuticals and medical devices. Given the need to exercise strict control, it is hardly surprising that institutions such as the U.S. Food and Drug Administration adopt a fairly conservative line in statistical innovations and retain a strong interest in frequentist properties of any statistical method. Nevertheless, it is important to note that the latest international statistical guidelines for pharmaceutical submissions to regulatory agencies state that "the use of Bayesian and other approaches may be considered when the reasons for their use are clear and when the resulting conclusions are sufficiently robust" (International Conference on Harmonisation E9 Expert Working Group, 1999). Unfortunately they do not go on to define what they mean by clear reasons and robust conclusions, and so it is still open as to what will constitute an appropriate Bayesian analysis for a pharmaceutical regulatory body.

A recent example has proved, however, that it is possible to obtain regulatory approval for a large and complex adaptive trial that uses a Bayesian monitoring procedure. Berry et al. (2002) describe the design of a phase II/III dose-finding study in acute stroke, in which 15 different doses were to be given at random at the start of randomization, with steady adaptation to the range of doses around the ED95, that is, the minimum dose that provides 95% of the maximum efficacy. The original decision-theoretic stopping criterion was replaced by one based on posterior tail-areas being less than a certain value: a frequentist assessment of the size and power of the study was based on pretrial simulations and approved by the FDA. The trial was closely monitored, with the statistician of the data monitoring committee (myself) receiving weekly summaries of the posterior distributions in order to check whether the critical boundaries had been crossed. The DMC recommended stopping when the "futility" boundary was crossed after recruiting over 900 patients; the trial stopped immediately and subsequently reported an essentially "flat" dose–response curve (Krams et al., 2003).

The greatest enthusiasm for Bayesian methods appears to be in the U.S. FDA Center for Devices and Radiological Health (CDRH) (Campbell, 1999). Devices differ from pharmaceuticals in having better understood physical mechanisms, which means that effectiveness is generally robust to small changes. Since devices tend to develop in incremental steps, a large body of relevant evidence often exists and companies did not tend to follow established phases of drug development. The fact that an application for approval might include a variety of studies, including historical controls and registries, suggests that Bayesian methods for evidence synthesis might be appropriate.

4.2 The Role of Decision Theory

The debate about the appropriate role of formal decision theory in health-care evaluation continues. Claims for a strong role of decision theory include the following:

• In the context of clinical trials, Lindley (1994) categorically states that

    clinical trials are not there for inference but to make decisions,

while Berry (1994) states that

    deciding whether to stop a trial requires considering why we are running it in the first place, and this means assessing utilities.

Healy and Simon (1978) consider that

    in my view the main objective of almost all trials on human subjects is (or should be) a decision concerning the treatment of patients in the future.

• Within a pharmaceutical company it is natural to try to maximize profitability, and this naturally leads to the use of utilities.

• Within a health-policy setting, decision theory and economic argument clearly state that maximized expected utility is the sole criterion for choosing between two options. Therefore measures of "significance," posterior tail areas of incremental net
benefit and so on are all irrelevant (Claxton and Posnett, 1996). Claxton, Lacey and Walker (2000) point out that:

    Once a price per effectiveness unit has been determined, costs can be incorporated, and the decision can then be based on (posterior) mean incremental net benefit measured in either monetary or effectiveness terms.

Uncertainty is only taken into account through evaluating the benefit of further experimentation, as measured by a value of information analysis.

• To maximize the health return from the limited resources available from a health budget, health-care purchasers should use a rational resource allocation procedure. Otherwise the resulting decisions could be considered as irrational, inefficient and unethical.

• Overall, a decision-theoretic framework provides a formal basis for designing trials, assessing whether to approve an intervention for use, deciding whether an intervention is cost-effective and commissioning further research.

Claims against the use of decision theory include the following:

• It is unrealistic to place clinical trials within a decision-theoretic context, primarily because the impact of stopping a trial and reporting the results cannot be predicted with any confidence: Peto (1985), in the discussion of Bather (1985), states that:

    Bather, however, merely assumes . . . "it is implicit that the preferred treatment will then be used for all remaining patients" and gives the problem no further attention! This is utterly unrealistic, and leads to potentially misleading mathematical conclusions.

Peto goes on to argue that a serious decision-theoretic formulation would have to model the subsequent dissemination of a treatment.

• The idea of a null hypothesis (the status quo), which lies behind the use of "statistical significance" or posterior tail-areas, is fundamentally different from an alternative hypothesis (a novel intervention). The consequences and costs of the former are generally established, whereas the impact of the latter must contain a substantial amount of judgement. Often, therefore, a choice between two treatments is not a choice between two equal contenders to be decided solely on the balance of net benefit: some convincing evidence is required before changing policy.

• A change in policy carries with it many hidden penalties: for example, it may be difficult to reverse if later found to be erroneous, and it may hinder the development of other, better, innovations. It would be difficult to explicitly model these phenomena with any plausibility.

• Value of information analysis is strongly dependent on having the "correct" model, which is never known and generally cannot be empirically checked. Sensitivity analysis can only compensate to some extent for this basic ignorance.

Whitehead (1997, page 208) points out that the theory of optimal decision making only exists for a single decision-maker, and that no optimal solution exists when making a decision on behalf of multiple parties with different beliefs and utilities. He therefore argues that internal company decisions at phase I and phase II of drug development may be modelled as decision problems, but that phase III trials cannot.

The discussion in Section 2.1 has revealed the complexity of the context in which health-care evaluation takes place, and clearly a simplistic decision-theoretic approach is inappropriate. Nevertheless, in the context of any real decision that must be made, it would seem beneficial to have at least a qualitative expression of the potential gains and losses, and from there to move toward a full quantitative analysis.

4.3 Increasing the Appropriate Use of Bayesian Methods

We conclude with some brief personal opinions about innovations that may lead to wider and improved use of Bayesian methods.

First, we need a set of good worked examples based on realistic situations and that set good standards in specification and analysis. Second, we need a structure for reporting Bayesian analyses that permits rapid critical appraisal: as Berry (2002) says:

    There is as much Bayesian junk as there is frequentist junk. Actually, there's probably more of the former because, to the uninitiated, the Bayesian approach appears to provide a free lunch.

Third, following the running theme of this paper, there is a need to understand and integrate with the current methodology and software used in studies. Finally, it should be acknowledged that Bayesian
methods do not provide a panacea. Problems should be clearly highlighted and it should be acknowledged that sampling properties of systems may be important in some contexts. The general statistical community, who are not stupid, have justifiably found somewhat tiresome the tone of hectoring self-righteousness that has often come from the Bayesian lobby. Fortunately that period seems to be coming to a close, and with luck the time has come for the appropriate use of Bayesian thinking to be pragmatically established.

REFERENCES

ADES, A. E. and CLIFFE, S. (2002). Markov chain Monte Carlo estimation of a multiparameter decision model: Consistency of evidence and the accurate assessment of uncertainty. Medical Decision Making 22 359–371.
BATHER, J. A. (1985). On the allocation of treatments in sequential medical trials (with discussion). Internat. Statist. Rev. 53 1–13, 25–36.
BENJAMINI, Y. and HOCHBERG, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
BENSON, K. and HARTZ, A. (2000). A comparison of observational studies and randomized controlled trials. New England J. Medicine 342 1878–1886.
BERGER, J. O. and BERRY, D. A. (1988). Statistical analysis and the illusion of objectivity. American Scientist 76 159–165.
BERRY, D. A. (1994). Discussion of "Bayesian approaches to randomized trials," by D. J. Spiegelhalter, L. S. Freedman and M. K. B. Parmar. J. Roy. Statist. Soc. Ser. A 157 399.
BERRY, D. A. (2002). Adaptive clinical trials and Bayesian statistics (with discussion). In Pharmaceutical Report—American Statistical Association 9 1–11. Amer. Statist. Assoc., Alexandria, VA.
BERRY, D. A. (2004). Bayesian statistics and the efficiency and ethics of clinical trials. Statist. Sci. 19 175–187.
BERRY, D. A., MÜLLER, P., GRIEVE, A., SMITH, M., PARKE, T., BLAZEK, R., MITCHARD, N. and KRAMS, M. (2002). Adaptive Bayesian designs for dose-ranging drug trials. In Case Studies in Bayesian Statistics. Lecture Notes in Statist. 162 99–181. Springer, New York.
BERRY, D. A. and STANGL, D. K. (1996a). Bayesian methods in health-related research. In Bayesian Biostatistics (D. A. Berry and D. K. Stangl, eds.) 3–66. Dekker, New York.
BERRY, D. A. and STANGL, D. K., eds. (1996b). Bayesian Biostatistics. Dekker, New York.
BRIGGS, A. (2000). Handling uncertainty in cost-effectiveness models. Pharmacoeconomics 17 479–500.
BRITTON, A., MCKEE, M., BLACK, N., MCPHERSON, K., SANDERSON, C. and BAIN, C. (1998). Choosing between randomised and non-randomised studies: A systematic review. Health Technology Assessment 2 1–124.
BROOKS, S. P. (1998). Markov chain Monte Carlo method and its application. The Statistician 47 69–100.
BROPHY, J. and JOSEPH, L. (2000). A Bayesian analysis of random mega-trials for the choice of thrombolytic agents in acute myocardial infarction. In Meta-Analysis in Medicine and Health Policy (D. K. Stangl and D. A. Berry, eds.) 83–104. Dekker, New York.
BURTON, P. R. (1994). Helping doctors to draw appropriate inferences from the analysis of medical studies. Statistics in Medicine 13 1699–1713.
CAMPBELL, G. (1999). A regulatory perspective for Bayesian clinical trials. Food and Drug Administration, Washington.
CASELLA, G. and GEORGE, E. (1992). Explaining the Gibbs sampler. Amer. Statist. 46 167–174.
CHALONER, K. (1996). Elicitation of prior distributions. In Bayesian Biostatistics (D. A. Berry and D. K. Stangl, eds.) 141–156. Dekker, New York.
CHALONER, K. and RHAME, F. (2001). Quantifying and documenting prior beliefs in clinical trials. Statistics in Medicine 20 581–600.
CHESSA, A. G., DEKKER, R., VAN VLIET, B., STEYERBERG, E. W. and HABBEMA, J. D. F. (1999). Correlations in uncertainty analysis for medical decision making: An application to heart-valve replacement. Medical Decision Making 19 276–286.
CHRISTIANSEN, C. L. and MORRIS, C. N. (1997a). Improving the statistical approach to health care provider profiling. Annals of Internal Medicine 127 764–768.
CHRISTIANSEN, C. L. and MORRIS, C. N. (1997b). Hierarchical Poisson regression modeling. J. Amer. Statist. Assoc. 92 618–632.
CLAXTON, K., LACEY, L. F. and WALKER, S. G. (2000). Selecting treatments: A decision theoretic approach. J. Roy. Statist. Soc. Ser. A 163 211–225.
CLAXTON, K. and POSNETT, J. (1996). An economic approach to clinical trial design and research priority-setting. Health Economics 5 513–524.
CORNFIELD, J. (1966). A Bayesian test of some classical hypotheses—with applications to sequential clinical trials. J. Amer. Statist. Assoc. 61 577–594.
CORNFIELD, J. (1969). The Bayesian outlook and its application. Biometrics 25 617–657.
CORNFIELD, J. (1976). Recent methodological contributions to clinical trials. American J. Epidemiology 104 408–421.
COX, D. R. (1999). Discussion of "Some statistical heresies," by J. K. Lindsey. The Statistician 48 30.
CRONIN, K. A., FREEDMAN, L. S., LIEBERMAN, R., WEISS, H. L., BEENKEN, S. W. and KELLOFF, G. J. (1999). Bayesian monitoring of phase II trials in cancer chemoprevention. J. Clinical Epidemiology 52 705–711.
DANIELS, M. J. (1999). A prior for the variance components in hierarchical models. Canad. J. Statist. 27 567–578.
DECISIONEERING (2000). Crystal ball. Technical report. Available at http://www.decisioneering.com/crystal_ball.
DERSIMONIAN, R. (1996). Meta-analysis in the design and monitoring of clinical trials. Statistics in Medicine 15 1237–1248.
DIGNAM, J. J., BRYANT, J., WIEAND, H. S., FISHER, B. and WOLMARK, N. (1998). Early stopping of a clinical trial when there is evidence of no treatment benefit: Protocol B-14 of the National Surgical Adjuvant Breast and Bowel Project. Controlled Clinical Trials 19 575–588.
DIXON, D. O. and SIMON, R. (1991). Bayesian subset analysis. Biometrics 47 871–881.
DOMINICI, F., PARMIGIANI, G., WOLPERT, R. and HASSELBLAD, V. (1999). Meta-analysis of migraine headache treatments: Combining information from heterogeneous designs. J. Amer. Statist. Assoc. 94 16–28.
DUMOUCHEL, W. and NORMAND, S. (2000). Computer modeling and graphical strategies for meta-analysis. In Meta-Analysis in Medicine and Health Policy (D. K. Stangl and D. A. Berry, eds.) 127–178. Dekker, New York.
EDDY, D. M., HASSELBLAD, V. and SHACHTER, R. (1992). Meta-Analysis by the Confidence Profile Method: The Statistical Synthesis of Evidence. Academic Press, San Diego.
ETZIONI, R. D. and KADANE, J. B. (1995). Bayesian statistical methods in public health and medicine. Annual Review of Public Health 16 23–41.
FAYERS, P. M., ASHBY, D. and PARMAR, M. K. B. (1997). Tutorial in biostatistics: Bayesian data monitoring in clinical trials. Statistics in Medicine 16 1413–1430.
FAYERS, P. M., CUSCHIERI, A., FIELDING, J., CRAVEN, J., USCINSKA, B. and FREEDMAN, L. S. (2000). Sample size calculation for clinical trials: The impact of clinician beliefs. British J. Cancer 82 213–219.
FLETCHER, A., SPIEGELHALTER, D., STAESSEN, J., THIJS, L. and BULPITT, C. (1993). Implications for trials in progress of publication of positive results. The Lancet 342 653–657.
FRYBACK, D. G., STOUT, N. K. and ROSENBERG, M. A. (2001). An elementary introduction to Bayesian computing using WINBUGS. International J. Technology Assessment in Health Care 17 98–113.
GILBERT, J. P., MCPEEK, B. and MOSTELLER, F. (1977). Statistics and ethics in surgery and anesthesia. Science 198 684–689.
GILKS, W. R., RICHARDSON, S. and SPIEGELHALTER, D. J., eds. (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall, New York.
GOLDSTEIN, H. and SPIEGELHALTER, D. J. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance (with discussion). J. Roy. Statist. Soc. Ser. A 159 385–443.
GOULD, A. L. (1991). Using prior findings to augment active-controlled trials and trials with small placebo groups. Drug Information J. 25 369–380.
GRAY, R. J. (1994). A Bayesian analysis of institutional effects in a multicenter cancer clinical trial. Biometrics 50 244–253.
GREENHOUSE, J. B. and WASSERMAN, L. (1995). Robust Bayesian methods for monitoring clinical trials. Statistics in Medicine 14 1379–1391.
GRIEVE, A. P. (1994). Discussion of "Bayesian approaches to randomized trials," by D. J. Spiegelhalter, L. S. Freedman and M. K. B. Parmar. J. Roy. Statist. Soc. Ser. A 157 387–388.
HANSON, T., BEDRICK, E., JOHNSON, W. and THURMOND, M. (2003). A mixture model for bovine abortion and foetal survival. Statistics in Medicine 22 1725–1739.
HARRELL, F. E. and SHIH, Y. C. T. (2001). Using full probability models to compute probabilities of actual interest to decision makers. International J. Technology Assessment in Health Care 17 17–26.
HASSELBLAD, V. (1998). Meta-analysis of multi-treatment studies. Medical Decision Making 18 37–43.
HEALY, M. J. R. and SIMON, R. (1978). New methodology in clinical trials. Biometrics 34 709–712.
HEITJAN, D. F. (1997). Bayesian interim analysis of phase II cancer clinical trials. Statistics in Medicine 16 1791–1802.
HIGGINS, J. P. and WHITEHEAD, A. (1996). Borrowing strength from external trials in a meta-analysis. Statistics in Medicine 15 2733–2749.
IBRAHIM, J. G. and CHEN, M.-H. (2000). Power prior distributions for regression models. Statist. Sci. 15 46–60.
INTERNATIONAL CONFERENCE ON HARMONISATION E9 EXPERT WORKING GROUP (1999). Statistical principles for clinical trials: ICH harmonised tripartite guideline. Statistics in Medicine 18 1905–1942. Available at http://www.ich.org.
IOANNIDIS, J. P. A., HAIDICH, A. B., PAPPA, M., PANTAZIS, N., KOKORI, S. I., TEKTONIDOU, M. G., CONTOPOULOS-IOANNIDIS, D. G. and LAU, J. (2001). Comparison of evidence of treatment effects in randomized and nonrandomized studies. J. American Medical Association 286 821–830.
KADANE, J. B. (1995). Prime time for Bayes. Controlled Clinical Trials 16 313–318.
KASS, R. E. and GREENHOUSE, J. B. (1989). A Bayesian perspective. Comment on "Investigating therapies of potentially great benefit: ECMO," by J. H. Ware. Statist. Sci. 4 310–317.
KRAMS, M., LEES, K., HACKE, W., GRIEVE, A., ORGOGOZO, J. and FORD, G. (2003). Acute stroke therapy by inhibition of neutrophils (ASTIN): An adaptive dose–response study of UK-279,276 in acute ischemic stroke. Stroke 34 2543–2548.
KUNZ, R. and OXMAN, A. D. (1998). The unpredictability paradox: Review of empirical comparisons of randomised and non-randomised clinical trials. British Medical J. 317 1185–1190.
LAROSE, D. T. and DEY, D. K. (1997). Grouped random effects models for Bayesian meta-analysis. Statistics in Medicine 16 1817–1829.
LAU, J., SCHMID, C. H. and CHALMERS, T. C. (1995). Cumulative meta-analysis of clinical trials builds evidence for exemplary medical care. J. Clinical Epidemiology 48 45–57.
LINDLEY, D. V. (1994). Discussion of "Bayesian approaches to randomized trials," by D. J. Spiegelhalter, L. S. Freedman and M. K. B. Parmar. J. Roy. Statist. Soc. Ser. A 157 393.
LINDLEY, D. V. (2000). The philosophy of statistics (with discussion). The Statistician 49 293–337.
MATTHEWS, R. A. J. (2001). Methods for assessing the credibility of clinical trial outcomes. Drug Information J. 35 1469–1478.
NATARAJAN, R. and KASS, R. E. (2000). Reference Bayesian methods for generalized linear mixed models. J. Amer. Statist. Assoc. 95 227–237.
NORMAND, S.-L., GLICKMAN, M. E. and GATSONIS, C. A. (1997). Statistical methods for profiling providers of medical care: Issues and applications. J. Amer. Statist. Assoc. 92 803–814.
O'HAGAN, A. and LUCE, B. (2003). A Primer on Bayesian Statistics in Health Economics and Outcomes Research. Centre for Bayesian Statistics in Health Economics, Sheffield, UK.
O'HAGAN, A., STEVENS, J. W. and MONTMARTIN, J. (2000). Inference for the cost-effectiveness acceptability curve and cost-effectiveness ratio. Pharmacoeconomics 17 339–349.
O'HAGAN, A., STEVENS, J. W. and MONTMARTIN, J. (2001). Bayesian cost-effectiveness analysis from clinical trial data. Statistics in Medicine 20 733–753.
O'NEILL, R. T. (1994). Conclusions. 2. Statistics in Medicine 13 1493–1499.
PALISADE EUROPE (2001). @RISK 4.0. Technical report. Available at http://www.palisade-europe.com.
PARMAR, M. K. B., GRIFFITHS, G. O., SPIEGELHALTER, D. J., SOUHAMI, R. L., ALTMAN, D. G. and VAN DER SCHEUREN, E. (2001). Monitoring of large randomised clinical trials—a new approach with Bayesian methods. The Lancet 358 375–381.
PARMAR, M. K. B., SPIEGELHALTER, D. J. and FREEDMAN, L. S. (1994). The CHART trials: Bayesian design and monitoring in practice. Statistics in Medicine 13 1297–1312.
PARMAR, M. K. B., UNGERLEIDER, R. S. and SIMON, R. (1996). Assessing whether to perform a confirmatory randomized clinical trial. J. National Cancer Institute 88 1645–1651.
PETO, R. (1985). Discussion of "On the allocation of treatments in sequential medical trials," by J. Bather. Internat. Statist. Rev. 53 31–34.
POCOCK, S. (1976). The combination of randomized and historical controls in clinical trials. J. Chronic Diseases 29 175–188.
PREVOST, T. C., ABRAMS, K. R. and JONES, D. R. (2000). Hierarchical models in generalized synthesis of evidence: An example based on studies of breast cancer screening. Statistics in Medicine 19 3359–3376.
RACINE, A., GRIEVE, A. P., FLUHLER, H. and SMITH, A. F. M. (1986). Bayesian methods in practice—experiences in the pharmaceutical industry (with discussion). Appl. Statist. 35 93–150.
REEVES, B., MACLEHOSE, R., HARVEY, I., SHELDON, T., RUSSELL, I. and BLACK, A. (2001). A review of observational, quasi-experimental and randomized study designs for the evaluation of the effectiveness of healthcare interventions. In The Advanced Handbook of Methods in Evidence Based Healthcare (A. Stevens, K. Abrams, J. Brazier, R. Fitzpatrick and R. Lilford, eds.) 116–135. Sage, London.
SANDERSON, C., MCKEE, M., BRITTON, A., BLACK, N., MCPHERSON, K. and BAIN, C. (2001). Randomized and non-randomized studies: Threats to internal and external validity. In The Advanced Handbook of Methods in Evidence Based Healthcare (A. Stevens, K. Abrams, J. Brazier, R. Fitzpatrick and R. Lilford, eds.) 95–115. Sage, London.
SHAKESPEARE, T. P., GEBSKI, V. J., VENESS, M. J. and SIMES, J. (2001). Improving interpretation of clinical studies by use of confidence levels, clinical significance curves, and risk-benefit contours. The Lancet 357 1349–1353.
SIMON, R. (1994). Some practical aspects of the interim monitoring of clinical trials. Statistics in Medicine 13 1401–1409.
SIMON, R., DIXON, D. O. and FRIEDLIN, B. (1996). Bayesian subset analysis of a clinical trial for the treatment of HIV infections. In Bayesian Biostatistics (D. A. Berry and D. K. Stangl, eds.) 555–576. Dekker, New York.
SONG, F., ALTMAN, D., GLENNY, A. and DEEKS, J. J. (2003). Validity of indirect comparison for estimating efficacy of competing interventions: Empirical evidence from published meta-analyses. British Medical J. 326 472–476.
SPIEGELHALTER, D. J. (2001). Bayesian methods for cluster randomized trials with continuous responses. Statistics in Medicine 20 435–452.
SPIEGELHALTER, D. J., ABRAMS, K. R. and MYLES, J. P. (2004). Bayesian Approaches to Clinical Trials and Health Care Evaluation. Wiley, New York.
SPIEGELHALTER, D. J. and BEST, N. G. (2003). Bayesian approaches to multiple sources of evidence and uncertainty in complex cost-effectiveness modelling. Statistics in Medicine 22 3687–3708.
SPIEGELHALTER, D. J., FREEDMAN, L. S. and PARMAR, M. K. B. (1994). Bayesian approaches to randomized trials (with discussion). J. Roy. Statist. Soc. Ser. A 157 357–416.
SPIEGELHALTER, D. J., MYLES, J., JONES, D. and ABRAMS, K. (2000). Bayesian methods in health technology assessment: A review. Health Technology Assessment 4 1–130.
STANGL, D. K. and GREENHOUSE, J. B. (1998). Assessing placebo response using Bayesian hierarchical survival models. Lifetime Data Analysis 4 5–28.
SUTTON, A. J., ABRAMS, K. R., JONES, D. R., SHELDON, T. A. and SONG, F. (2000). Methods for Meta-Analysis in Medical Research. Wiley, New York.
TSIATIS, A. A. (1981). The asymptotic joint distribution of the efficient scores test for the proportional hazards model calculated over time. Biometrika 68 311–315.
TURNER, R., OMAR, R. and THOMPSON, S. (2001). Bayesian methods of analysis for cluster randomized trials with binary outcome data. Statistics in Medicine 20 453–472.
WHITEHEAD, A. (2002). Meta-Analysis of Controlled Clinical Trials. Wiley, New York.
WHITEHEAD, J. (1997). The Design and Analysis of Sequential Clinical Trials, 2nd ed. Wiley, New York.
ZUCKER, D. R., SCHMID, C. H., MCINTOSH, M. W., D'AGOSTINO, R. B., SELKER, H. P. and LAU, J. (1997). Combining single patient (N-of-1) trials to estimate population treatment effects and to evaluate individual patient responses to treatment. J. Clinical Epidemiology 50 401–410.