The American Statistical Association, the Institute of Mathematical Statistics, and the Interface Foundation of America are collaborating with JSTOR to digitize, preserve, and extend access to the Journal of Computational and Graphical Statistics.
SPECIAL SECTION: WINNING ENTRIES OF THE 1998 STUDENT PAPER COMPETITION

BAYESIAN ANALYSIS OF A TWO-STATE MARKOV MODULATED POISSON PROCESS
Additional Fraud Process: N
Observed Process
Figure 1. Illustration of the Markov modulated Poisson process described in Section 1. Notice that the end of a contamination episode does not generate an observed event. We get to see the event times in the observed process, but we are unable to distinguish boxes (honest events) from X's and vertical lines (fraudulent events).
2. GIBBS SAMPLING
This section gives the details of our posterior sampling algorithm, requiring heavier notation than the rest of the article. We begin by assuming a conjugate prior distribution for (λ0, λ1, γ, δ), where the variables are a priori independent with

λ0 ~ Γ(a_λ0, b_λ0),   λ1 ~ Γ(a_λ1, b_λ1),   γ ~ Γ(a_γ, b_γ),   δ ~ Γ(a_δ, b_δ).
Here the gamma density is parameterized so that if x ~ Γ(a, b), then E(x) = a/b and var(x) = a/b². Under this prior, the joint posterior density of all quantities appearing in the model yields the complete-data full conditionals

λ0 ~ Γ(n0 + a_λ0, T + b_λ0),        λ1 ~ Γ(n1 − n01 + a_λ1, T1 + b_λ1),
γ ~ Γ(n01 + a_γ, T0 + b_γ),         δ ~ Γ(n10 + a_δ, T1 + b_δ),

where n0 and n1 count the events attributed to N0 and N1, n01 and n10 count the 0→1 and 1→0 transitions of C(·), and T0 and T1 are the amounts of time C(·) spends in states 0 and 1.
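Drawing θ from these four conditionals is mechanical once the sufficient statistics are tallied. A minimal sketch (assuming NumPy; the function name and the packing of the statistics are ours, not the paper's; NumPy's gamma generator takes a scale argument, so under the mean = a/b parameterization the rate b enters as 1/b):

```python
import numpy as np

def draw_parameters(rng, stats, prior):
    """Draw (lambda0, lambda1, gamma, delta) from their complete-data
    gamma full conditionals.  `stats` holds the sufficient statistics
    (n0, n1, n01, n10, T0, T1, T); `prior` holds the four (a, b) pairs.
    With x ~ Gamma(a, b) meaning E(x) = a/b, NumPy's scale is 1/b."""
    n0, n1, n01, n10, T0, T1, T = stats
    (a_l0, b_l0), (a_l1, b_l1), (a_g, b_g), (a_d, b_d) = prior
    lam0 = rng.gamma(n0 + a_l0, 1.0 / (T + b_l0))
    lam1 = rng.gamma(n1 - n01 + a_l1, 1.0 / (T1 + b_l1))
    gam = rng.gamma(n01 + a_g, 1.0 / (T0 + b_g))
    delta = rng.gamma(n10 + a_d, 1.0 / (T1 + b_d))
    return lam0, lam1, gam, delta
```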
Since it is trivial to sample θ given complete data, interest in Gibbs sampling rests on drawing the missing data. The purpose of this section is to show that we are able to draw the missing data directly from their posterior distribution given the observed data and model parameters. For convenience define τ0 = 0, τ_{n+1} = T, and for vectors x write x_j^k to mean (x_j, ..., x_k). The event times τ_1^{n+1} provide a natural partition of the interval (0, T] that allows us to work with a discrete time stochastic process containing the same information as a continuous time realization of C(·). Let I_j = (τ_{j−1}, τ_j], j = 1, ..., n + 1, and say that Δτ_j = |I_j| = τ_j − τ_{j−1}. The behavior of C(·) over I_j may be classified according to one of the five states illustrated in Figure 2. If we let h_j denote which of these five states describes the behavior of C(·) over I_j, and if we write w_j for the amount of time C(·) spent in state 1 over I_j, then there is a one-to-one mapping between C(t), t ∈ (0, T] and (h_j, w_j), j ∈ {1, ..., n + 1}.
Let h = h_1^n, w = w_1^n, and Δτ = Δτ_1^n. Once the algorithm for drawing h, w, and y is understood, only minor changes are needed to draw the endpoints h_{n+1} and w_{n+1}. These repetitive details are omitted for the sake of brevity. The first step in drawing the missing data is to factor its distribution as

p(h, w, y | θ, Δτ) = p(h | θ, Δτ) × p(w | h, Δτ, θ) × p(y | w, h, Δτ, θ).
h_j = 0, 1, 2, 3, 4
Figure 2. The five possible states describing the behavior of C(·) over the interval (τ_{j−1}, τ_j]. A transition from 1 to 0 is only possible in the interior of the interval, while a transition from 0 to 1 is only possible at the right endpoint.
The next two sections give details on how to draw h, w, and y from their respective distributions via nested conditioning.
First, the transition probabilities q_{rs} = p(h_j = s | h_{j−1} = r, θ) depend only on the value of C(·) at the end of I_{j−1}. If I_{j−1} ends with C = 0 (r ∈ {0, 2}), the next observed event comes either from N0 or from a 0→1 transition, so

q_{r0} = λ0/(λ0 + γ),   q_{r1} = γ/(λ0 + γ),   q_{r2} = q_{r3} = q_{r4} = 0,

while if I_{j−1} ends with C = 1 (r ∈ {1, 3, 4}),

q_{r0} = q_{r1} = 0,   q_{r2} = [δ/(λ0 + λ1 + δ)][λ0/(λ0 + γ)],   q_{r3} = (λ0 + λ1)/(λ0 + λ1 + δ),   q_{r4} = [δ/(λ0 + λ1 + δ)][γ/(λ0 + γ)].
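As a sketch, the five-by-five matrix of these transition probabilities can be assembled directly, assuming states 0 and 2 end with C = 0 while states 1, 3, and 4 end with C = 1 (our reading of Figure 2; all names are ours):

```python
import numpy as np

def transition_matrix(lam0, lam1, gam, delta):
    """q[r, s] = p(h_j = s | h_{j-1} = r) for the five interval states.
    Rows r in {0, 2} end with C = 0; rows r in {1, 3, 4} end with C = 1."""
    q = np.zeros((5, 5))
    # Starting from C = 0: the next event is from N0 (rate lam0) or is
    # a contamination onset (rate gam), each an observed event.
    from0 = np.array([lam0 / (lam0 + gam), gam / (lam0 + gam), 0.0, 0.0, 0.0])
    # Starting from C = 1: either an event (rate lam0 + lam1) arrives
    # before the unobserved 1 -> 0 drop (rate delta), giving state 3,
    # or the drop happens first and the chain continues from C = 0.
    p_drop = delta / (lam0 + lam1 + delta)
    from1 = np.array([0.0, 0.0,
                      p_drop * lam0 / (lam0 + gam),
                      1.0 - p_drop,
                      p_drop * gam / (lam0 + gam)])
    for r in (0, 2):
        q[r] = from0
    for r in (1, 3, 4):
        q[r] = from1
    return q
```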
Second, observe that the distribution of Δτ_j is determined by three processes: N0, N1, and C(·). If we condition on h_j then we know the value of C(·) on the boundary of I_j. Because C(·) obeys the Markov property and the other two processes are memoryless, the distribution of Δτ_j given h_j is independent of the behavior of N0, N1, and C(·) outside of I_j. Therefore p(Δτ_j | h, Δτ_{−j}, θ) depends only on h_j and θ. Specifically, if E(λ) represents the exponential distribution with rate λ, then
p(Δτ_j | h_j = s, h_{−j}, Δτ_{−j}, θ) = p_s(Δτ_j | θ) =
    E(λ0 + γ)                      if s ∈ {0, 1}
    E(λ0 + λ1 + δ)                 if s = 3                    (2.2)
    E(λ0 + λ1 + δ) ∗ E(λ0 + γ)     if s ∈ {2, 4},

where ∗ denotes convolution. Combining q_{rs} with (2.2) yields the forward recursion

P_{jrs} = p(h_{j−1} = r, h_j = s | Δτ_1^j, θ) ∝ [Σ_{r′} P_{j−1, r′r}] q_{rs} p_s(Δτ_j | θ),

where the proportionality is reconciled by Σ_r Σ_s P_{jrs} = 1 for each j. It is worth noting that computing P_2, ..., P_n is identical to the forward step in the standard forward-backward algorithm for HMMs developed by Baum, Petrie, Soules, and Weiss (1970).
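A compact sketch of this forward step, assuming the interval-length densities of Equation (2.2) and normalizing each P_j to sum to 1 (the convolution density assumes the two exponential rates are distinct; helper names are ours):

```python
import numpy as np

def density_by_state(dt, lam0, lam1, gam, delta):
    """p_s(dt): density of the interval length under each of the five
    states (Equation (2.2)); states 2 and 4 use the convolution of two
    exponentials, assumed to have distinct rates."""
    b1 = lam0 + lam1 + delta   # total rate while C = 1
    b2 = lam0 + gam            # total rate while C = 0
    exp0 = b2 * np.exp(-b2 * dt)
    exp1 = b1 * np.exp(-b1 * dt)
    conv = b1 * b2 / (b2 - b1) * (np.exp(-b1 * dt) - np.exp(-b2 * dt))
    return np.array([exp0, exp0, conv, exp1, conv])

def forward(dts, q, pdf, init):
    """Forward step: P[j][r, s] propto (sum over r' of P[j-1][r', r]) *
    q[r, s] * p_s(dt_j), each matrix normalized to sum to 1.  `init` is
    the distribution of the first state; `pdf(dt)` returns the 5-vector
    of state densities."""
    Ps = []
    marg = np.asarray(init, dtype=float)   # current marginal of h_{j-1}
    for dt in dts:
        P = marg[:, None] * q * pdf(dt)[None, :]
        P /= P.sum()
        Ps.append(P)
        marg = P.sum(axis=0)               # marginal of h_j
    return Ps
```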
Once P_2, ..., P_n have been computed, draw (h_{n−1}, h_n) from the multinomial distribution determined by P_n, which conditions on all the elements of Δτ. Notice that p(h | Δτ, θ) factors as follows:

p(h | Δτ, θ) = p(h_{n−1}, h_n | Δτ, θ) Π_{j=2}^{n−1} p(h_{n−j} | h_{n−j+1}, Δτ_1^{n−j+1}, θ),     (2.3)

where

p(h_{n−j} = r | h_{n−j+1} = s, Δτ_1^{n−j+1}, θ) = P_{n−j+1, rs} / Σ_{r′} P_{n−j+1, r′s}.

Complete the draw of h by generating h_{n−j} from the distribution in (2.3) for j ∈ {2, ..., n − 1}. At least one of the probabilities appearing in the final sum of Equation (2.3) must be positive, since h_{n−j+1} has already been drawn. Thus, there is no danger of dividing by zero when computing the backwards multinomial probabilities.
2.3 COMPUTATIONAL ISSUES
3. NUMERICAL RESULTS
Figure 3(a) shows a jittered dot plot of event times simulated from the model with parameter values T = 1,000, λ0 = 1, λ1 = 1, γ = .001, and δ = .1. The simulation resulted in 1,035 events and two contamination intervals. We also plot the posterior probabilities of contamination, given the event times and the true model parameters, in Figure 3(b). These probabilities are available from the standard (nonstochastic) forward-backward algorithm in Baum et al. (1970).
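A simulation along these lines can be sketched as a competing-exponentials loop (our own helper, not the code used for the figures), in which a 0→1 transition of C(·) produces an observed event while a 1→0 transition does not:

```python
import numpy as np

def simulate_mmpp(rng, T, lam0, lam1, gam, delta):
    """Simulate the two-state MMPP: N0 runs at rate lam0 always, N1 at
    rate lam1 while C(t) = 1; a 0 -> 1 transition of C is itself an
    observed event, a 1 -> 0 transition is not.  Returns event times."""
    t, c, events = 0.0, 0, []
    while True:
        rate = lam0 + (gam if c == 0 else lam1 + delta)
        t += rng.exponential(1.0 / rate)
        if t > T:
            return np.array(events)
        u = rng.random() * rate
        if c == 0:
            if u < lam0:
                events.append(t)     # honest event from N0
            else:
                c = 1
                events.append(t)     # contamination onset: observed
        else:
            if u < lam0 + lam1:
                events.append(t)     # event during contamination
            else:
                c = 0                # episode ends: no observed event
```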
Figure 4(a) shows posterior density estimates for the simulated dataset when all parameters are given exchangeable Γ(1, 1) priors. For comparison purposes, log(.001) = −6.91 and log(0.1) = −2.30. The Gibbs sampler was started from the true parameter values and run for 100,000 iterations. A primary task of the Gibbs sampler is to attribute each event to one of the three processes N0, N1, or C(·). Although the sampler spends much of its time in the vicinity of the parameter values responsible for generating the data, inspection of the sample paths of the sufficient statistics, which are not shown due
668 S. L. SCOTT
Figure 4. Posterior density estimates for the simulated data set based on Gibbs output. (a) Weak exchangeable priors. (b) Strong prior on γ.
4. DISCUSSION
The task undertaken in a simulation study is to draw a random variable x = (x_1, ..., x_k) from its distribution p(x). In ideal cases this simulation can be done directly, and inferences can be based on the sequence of draws x^(1), x^(2), ... as if it were an independent sample from p(·). In cases where direct simulation is not possible, we may employ the Gibbs sampler, which breaks x into smaller components (x_1, ..., x_m) with m ≤ k. The Gibbs algorithm then draws x_j from p(x_j | x_{−j}) for j = 1, ..., m. We refer to m as the number of steps per iteration. Each iteration of the Gibbs algorithm, that is, each completion of m steps, produces a draw of x from a Markov chain whose stationary distribution is p(x). Direct simulation corresponds to the case when m = 1, and as m grows we move farther and farther from this ideal. As m becomes large we increasingly rely on the convergence of the underlying Markov chain to correct for relationships between the components of x. This increased reliance translates directly into serial dependence in the sequence of draws output by the sampler.
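As a self-contained illustration of these m-step iterations, consider a bivariate normal target with correlation ρ, where m = 2 and each full conditional is itself normal (a standard textbook example, not from this article):

```python
import numpy as np

def gibbs_bvn(rng, rho, iters):
    """Gibbs sampler for a bivariate normal with unit variances and
    correlation rho: x = (x1, x2), m = 2 steps per iteration, each
    component drawn from its full conditional
    x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2."""
    x1 = x2 = 0.0
    draws = np.empty((iters, 2))
    s = np.sqrt(1.0 - rho * rho)
    for i in range(iters):
        x1 = rng.normal(rho * x2, s)   # step 1: draw from p(x1 | x2)
        x2 = rng.normal(rho * x1, s)   # step 2: draw from p(x2 | x1)
        draws[i] = (x1, x2)
    return draws
```

The serial dependence described above shows up directly in these draws: the closer |ρ| is to 1, the more slowly the chain traverses the target.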
An important feature of the algorithm presented in this article is that, although it samples 3n + 4 quantities in each iteration, the sampling is done in only two steps. One step draws the parameters given the missing data, and the other draws the missing data given the parameters. It is possible to implement sampling schemes other than the one we employ. For instance, a naive approach would be to generate each of the variables in w and y from its full conditional distribution given the most recent draw of the other variables. The naive approach must abandon explicit consideration of h, since that would destroy the ergodicity of the Markov chain underpinning the Gibbs algorithm. [Note: To see this, fix j and consider a case where the current draw of y_j is 1 and 0 < w_j < Δτ_j. The full conditional for h_j puts probability 1 on h_j = 4. Later, when it is time to draw w_j and y_j given h_j = 4, all the mass will be on the events y_j = 1 and 0 < w_j < Δτ_j. Therefore h_j = 4 is an absorbing state in the Gibbs Markov chain, which does not allow for convergence to the desired stationary distribution.] This makes the full conditional for w_j more complicated. The Markov dependence represented by h is shifted to w, and the full conditional for w_j may consist of both a density on (0, Δτ_j) and a point mass at either 0 or Δτ_j, depending on the values of w_{j−1} and w_{j+1}.
In a similar hidden Markov model setup, Robert, Celeux, and Diebolt (1993) argued for the naive Gibbs sampler on the premise that the forward-backward algorithm is too complicated. Our implementation of the forward step takes less than 50 lines of reusable C code, and the stochastic backward step merely involves sampling from multinomial distributions. The slight increase in complication is repaid with a more computationally efficient algorithm. The naive Gibbs sampler has 2n + 1 steps per Gibbs iteration, compared with only two such steps for the sampler involving the stochastic forward-backward algorithm. Not only does the naive sampler have more steps per iteration, but the number of steps increases linearly with n.

Finally, our example is simple in the sense that the parameters may be generated directly once the missing data have been drawn. It is not hard to imagine a case where MCMC methods are required to sample model parameters, even given the missing data. If there is extensive serial dependence in the MCMC algorithm that draws the parameters, then an algorithm that draws the missing data in a single step is not likely to seriously
REFERENCES

Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970), "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains," The Annals of Mathematical Statistics, 41, 164-171.

Chib, S. (1996), "Calculating Posterior Distributions and Modal Estimates in Markov Mixture Models," Journal of Econometrics, 75, 79-97.

Robert, C. P., Celeux, G., and Diebolt, J. (1993), "Bayesian Estimation of Hidden Markov Chains: A Stochastic Implementation," Statistics and Probability Letters, 16, 77-83.