SPECIAL SECTION: WINNING ENTRIES OF THE 1998 STUDENT PAPER COMPETITION

Bayesian Analysis of a Two-State Markov Modulated Poisson Process

Steven L. SCOTT

We postulate observations from a Poisson process whose rate parameter modulates between two values determined by an unobserved Markov chain. The theory switches from continuous to discrete time by considering the intervals between observations as a sequence of dependent random variables. A result from hidden Markov models allows us to sample from the posterior distribution of the model parameters given the observed event times using a Gibbs sampler with only two steps per iteration.

Key Words: Forward-backward; Gibbs sampler; Hidden Markov model; Markov chain Monte Carlo.

1. INTRODUCTION, MOTIVATION, AND MODEL DESCRIPTION
The Markov modulated Poisson process arises naturally in the study of event histories that may be contaminated by an outside source. Suppose transactions on a customer's account are observed at times τ₁ < ··· < τₙ in the fixed interval (0, T]. In the problem motivating this article the transactions are phone calls, and τ_j is the time the jth call was placed. We refer to τ = (τ₁, ..., τₙ) as the event history for the customer. A criminal may gain access to the account and generate additional traffic. When a criminal is present we say the account has been "contaminated," and we refer to the interval from his arrival to his departure as a "contamination episode."

Suppose that a customer's transactions in an interval (a, b] ⊂ (0, T] follow a Poisson process N₀(a, b] with rate parameter λ₀. A criminal present for the entire interval (a, b] generates fraudulent traffic according to an independent Poisson process N₁(a, b] with rate parameter λ₁. Let C(t) = 1 if a criminal is present at time t, and C(t) = 0 otherwise, so that C(·) is a random step function of continuous time.

Steven L. Scott is Assistant Professor, Marshall School of Business, Bridge Hall 401-H, University of Southern California, Los Angeles, CA 90089-1421 (Email: sls@usc.edu).
©1999 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America
Journal of Computational and Graphical Statistics, Volume 8, Number 3, Pages 662-670


[Figure 1 shows four aligned timelines: the honest process N₀, the contamination process C(·), the additional fraud process N₁, and the observed process.]

Figure 1. Illustration of the Markov modulated Poisson process described in Section 1. Notice that the end of a contamination episode does not generate an observed event. We get to see the event times in the observed process, but we are unable to distinguish boxes (honest events) from X's and vertical lines (fraudulent events).

We model C(·) as a Markov process with generator matrix

    ( −γ    γ )
    (  φ   −φ ),

that is, successive waiting times to arrivals and departures are independent exponential random variables with rates γ and φ.
The arrival of a criminal is defined as the time of his first fraudulent event in a contamination episode, thus eliminating the possibility of empty contamination intervals. An equivalent assumption is that a transition of C(·) from 0 to 1 generates a fraudulent event in the event history. To allow contamination episodes that contain a single fraudulent event, assume that transitions of C(·) from 1 to 0 generate no events.

The model is illustrated in Figure 1, where we refer to N₀ as the honest process, C(·) as the contamination process, and N₁ as the additional fraudulent process. Notice that events in the interior of a contamination interval may have been produced by either N₀ or N₁, so that C(τ_j) = 1 is not sufficient to imply that an event at time τ_j is fraud. Let y = (y₁, ..., yₙ), where y_j = 0 if the event at time τ_j was produced by N₀, and y_j = 1 otherwise. Taken together, C(·), y, and τ convey enough information to recreate N₀ and N₁ from the event history τ.
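A minimal simulation sketch of this generative model may help fix ideas. The code below is our own illustration, not part of the original article; the function and variable names (simulate_mmpp, lam0, lam1, and so on) are hypothetical, and the event conventions follow the description above: criminal arrivals generate a fraudulent event, departures are silent.

```python
import numpy as np

def simulate_mmpp(lam0, lam1, gamma, phi, T, rng=None):
    """Simulate event times from the two-state Markov modulated Poisson process.

    Honest events arrive at rate lam0 throughout (0, T].  While C = 0 a criminal
    arrives at rate gamma (the arrival itself is an observed fraudulent event);
    while C = 1 fraudulent events arrive at rate lam1 and the criminal departs
    silently at rate phi.  Returns the observed event times and fraud indicators y.
    """
    if rng is None:
        rng = np.random.default_rng()
    t, c = 0.0, 0                      # current time and contamination state C(t)
    times, fraud = [], []
    while True:
        if c == 0:
            # Wait for the first of an honest event (lam0) or a criminal arrival (gamma).
            t += rng.exponential(1.0 / (lam0 + gamma))
            if t >= T:
                break
            if rng.random() < gamma / (lam0 + gamma):
                c = 1                  # arrival: generates a fraudulent event
                times.append(t); fraud.append(1)
            else:
                times.append(t); fraud.append(0)
        else:
            # Wait for an honest event, a fraudulent event, or a silent departure.
            total = lam0 + lam1 + phi
            t += rng.exponential(1.0 / total)
            if t >= T:
                break
            u = rng.random()
            if u < phi / total:
                c = 0                  # departure: no observed event
            elif u < (phi + lam1) / total:
                times.append(t); fraud.append(1)
            else:
                times.append(t); fraud.append(0)
    return np.array(times), np.array(fraud)
```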
The goal of this article is to simulate draws from the posterior distribution of θ = (λ₀, λ₁, γ, φ) given an observed τ, but not the missing data C(·) and y. The primary obstacle that must be overcome is that the missing data include a process occurring in continuous time. Section 2 explains the algorithm employed for posterior sampling, and Section 3 presents numerical results for event times simulated from this model. Section 4 concludes by discussing the merits of the sampling algorithm.

2. GIBBS SAMPLING

This section gives the details of our posterior sampling algorithm, requiring heavier notation than the rest of the article. We begin by assuming a conjugate prior distribution for (λ₀, λ₁, γ, φ), where the variables are a priori independent with

    λ₀ ~ Γ(a_{λ₀}, b_{λ₀}),   λ₁ ~ Γ(a_{λ₁}, b_{λ₁}),   γ ~ Γ(a_γ, b_γ),   φ ~ Γ(a_φ, b_φ).

Here the gamma density is parameterized so that if x ~ Γ(a, b), then E(x) = a/b and var(x) = a/b². Under this prior, the joint posterior density of all quantities appearing in the model is

    p(θ, τ, y, C(·)) ∝ λ₀^{n₀ + a_{λ₀} − 1} exp(−λ₀(T + b_{λ₀}))
                     × λ₁^{n₁ − n₀₁ + a_{λ₁} − 1} exp(−λ₁(T₁ + b_{λ₁}))
                     × γ^{n₀₁ + a_γ − 1} exp(−γ(T₀ + b_γ))
                     × φ^{n₁₀ + a_φ − 1} exp(−φ(T₁ + b_φ)).                    (2.1)

The complete data sufficient statistics (i.e., those quantities that would be sufficient statistics had the missing data been observed) in Equation (2.1) are as follows: n₀ and n₁ are the numbers of honest and fraudulent events (n₀ + n₁ = n), n₀₁ and n₁₀ are counts of transitions by C(·) from 0 to 1 and from 1 to 0, T₁ is the total time C(·) spent in state 1, and T₀ is the total time spent in state 0 (T₁ + T₀ = T).
Had we observed C(·) and y in addition to τ, then the elements of θ would remain independent of one another in their posterior distribution, which would be the product of the following densities:

    λ₀ ~ Γ(n₀ + a_{λ₀}, T + b_{λ₀}),        λ₁ ~ Γ(n₁ − n₀₁ + a_{λ₁}, T₁ + b_{λ₁}),
    γ ~ Γ(n₀₁ + a_γ, T₀ + b_γ),             φ ~ Γ(n₁₀ + a_φ, T₁ + b_φ).
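Given the complete data sufficient statistics, the parameter-drawing step therefore amounts to four gamma draws. The following is a minimal sketch of that step (our own illustration with hypothetical names; note that numpy's gamma generator takes a scale, the reciprocal of the rate used above):

```python
import numpy as np

def draw_theta(stats, prior, rng):
    """One Gibbs step: draw (lam0, lam1, gamma, phi) from their complete-data
    conditionals.  `stats` holds n0, n1, n01, n10, T0, T1 and the total time T;
    `prior` holds the (a, b) hyperparameters for each rate."""
    s, p = stats, prior
    lam0  = rng.gamma(s["n0"] + p["a_lam0"],           1.0 / (s["T"]  + p["b_lam0"]))
    lam1  = rng.gamma(s["n1"] - s["n01"] + p["a_lam1"], 1.0 / (s["T1"] + p["b_lam1"]))
    gamma = rng.gamma(s["n01"] + p["a_gamma"],          1.0 / (s["T0"] + p["b_gamma"]))
    phi   = rng.gamma(s["n10"] + p["a_phi"],            1.0 / (s["T1"] + p["b_phi"]))
    return lam0, lam1, gamma, phi
```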
Since it is trivial to sample θ given complete data, interest in Gibbs sampling rests on drawing the missing data. The purpose of this section is to show that we are able to draw the missing data directly from their posterior distribution given the observed data and model parameters. For convenience define τ₀ = 0, τ_{n+1} = T, and for vectors x write x_j^k to mean (x_j, ..., x_k). The event times τ_1^{n+1} provide a natural partition of the interval (0, T] that allows us to work with a discrete time stochastic process containing the same information as a continuous time realization of C(·). Let I_j = (τ_{j−1}, τ_j], j = 1, ..., n+1, and say that Δτ_j = |I_j| = τ_j − τ_{j−1}. The behavior of C(·) over I_j may be classified according to one of the five states illustrated in Figure 2. If we let h_j denote which of these five states describes the behavior of C(·) over I_j, and if we write w_j for the amount of time C(·) spent in state 1 over I_j, then there is a one-to-one mapping between C(t), t ∈ (0, T] and (h_j, w_j), j ∈ {1, ..., n+1}.
Let h = h_1^n, w = w_1^n, and Δτ = Δτ_1^n. Once the algorithm for drawing h, w, and y is understood, only minor changes are needed to draw the endpoints h_{n+1} and w_{n+1}. These repetitive details are omitted for the sake of brevity. The first step in drawing the missing data is to factor its distribution as

    p(h, w, y | θ, Δτ) = p(h | θ, Δτ) × p(w | h, Δτ, θ) × p(y | w, h, Δτ, θ).

[Figure 2 shows the five possible configurations of C(·) over an interval, labeled h_j = 0, 1, 2, 3, 4.]

Figure 2. The five possible states describing the behavior of C(·) over the interval (τ_{j−1}, τ_j]. A transition from 1 to 0 is only possible in the interior of the interval, while a transition from 0 to 1 is only possible at the right endpoint.

The next two sections give details on how to draw h, w, and y from their respective distributions via nested conditioning.

2.1 DRAWING h GIVEN θ AND Δτ
Two facts allow us to draw h directly from p(h | θ, Δτ). The first is that h behaves as a discrete time Markov chain if we do not condition on Δτ. This is clear since h results from the discretization of a continuous time Markov process. The transition matrix Q = (q_{rs}) of this Markov chain is derivable from well-known properties of the minimum of independent exponential random variables. Specifically,

    q_{r0} = λ₀/(λ₀ + γ),   q_{r1} = γ/(λ₀ + γ),   q_{r2} = q_{r3} = q_{r4} = 0

if r ends its interval with C(·) = 0 (r = 0 or 2), and

    q_{r0} = q_{r1} = 0,   q_{r2} = [φ/(λ₀ + λ₁ + φ)][λ₀/(λ₀ + γ)],
    q_{r3} = (λ₀ + λ₁)/(λ₀ + λ₁ + φ),   q_{r4} = [φ/(λ₀ + λ₁ + φ)][γ/(λ₀ + γ)]

if r ends its interval with C(·) = 1 (r = 1, 3, or 4).
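The matrix Q can be written down directly from these expressions. A short sketch (our own illustration; the state labels follow Figure 2 as reconstructed above):

```python
import numpy as np

def transition_matrix(lam0, lam1, gamma, phi):
    """5x5 transition matrix Q = (q_rs) for the discrete chain h.
    Rows 0 and 2 end an interval with C = 0; rows 1, 3, and 4 end with C = 1."""
    Q = np.zeros((5, 5))
    from_off = np.array([lam0 / (lam0 + gamma),        # next state 0: honest event
                         gamma / (lam0 + gamma),       # next state 1: criminal arrives
                         0.0, 0.0, 0.0])
    p_drop = phi / (lam0 + lam1 + phi)                 # 1 -> 0 transition comes first
    from_on = np.array([0.0, 0.0,
                        p_drop * lam0 / (lam0 + gamma),        # next state 2
                        (lam0 + lam1) / (lam0 + lam1 + phi),   # next state 3
                        p_drop * gamma / (lam0 + gamma)])      # next state 4
    for r in (0, 2):
        Q[r] = from_off
    for r in (1, 3, 4):
        Q[r] = from_on
    return Q
```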
Second, observe that the distribution of Δτ_j is determined by three processes: N₀, N₁, and C(·). If we condition on h_j then we know the value of C(·) on the boundary of I_j. Because C(·) obeys the Markov property and the other two processes are memoryless, the distribution of Δτ_j given h_j is independent of the behavior of N₀, N₁, and C(·) outside of I_j. Therefore p(Δτ_j | h, Δτ_{−j}, θ) depends only on h_j and θ. Specifically, if E(·) represents the exponential distribution with rate ·, then

    p(Δτ_j | h_j = s, h_{−j}, Δτ_{−j}, θ) = P_s(Δτ_j | θ)
        = E(λ₀ + γ)                     if s ∈ {0, 1}
        = E(λ₀ + λ₁ + φ)                if s = 3                              (2.2)
        = E(λ₀ + λ₁ + φ) * E(λ₀ + γ)    if s ∈ {2, 4}.
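The densities in Equation (2.2) have closed forms; the convolution of two exponentials with distinct rates r₁ and r₂ has density r₁r₂(e^{−r₂t} − e^{−r₁t})/(r₁ − r₂). A minimal sketch of the evaluation (our own, assuming the two rates differ):

```python
import numpy as np

def interval_density(dt, lam0, lam1, gamma, phi):
    """P_s(dt | theta) of Equation (2.2) for s = 0, ..., 4, as a length-5 array."""
    r_off = lam0 + gamma          # total rate of interval-ending events while C = 0
    r_on  = lam0 + lam1 + phi     # total rate of events and transitions while C = 1
    p = np.empty(5)
    p[0] = p[1] = r_off * np.exp(-r_off * dt)
    p[3] = r_on * np.exp(-r_on * dt)
    # Convolution of Exp(r_on) and Exp(r_off); assumes r_on != r_off.
    p[2] = p[4] = r_on * r_off * (np.exp(-r_off * dt) - np.exp(-r_on * dt)) / (r_on - r_off)
    return p
```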

The two conditions, h a Markov chain and p(Δτ_j | h, Δτ_{−j}, θ) = p(Δτ_j | h_j, θ), imply that Δτ behaves according to a hidden Markov model (HMM) with hidden states h. This means that we can employ the stochastic forward-backward recursions (Chib 1996) to produce h from p(h | Δτ, θ). The first step in the algorithm recursively computes the matrices P₂, ..., Pₙ, where P_j = (p_{jrs}) and p_{jrs} = Pr(h_{j−1} = r, h_j = s | Δτ_1^j, θ). In other words, P_{j+1} is the distribution of the jth transition given the observed data up to time j + 1. Let π_j(r) = Pr(h_j = r | Δτ_1^j, θ) and observe that knowledge of P_j implies

knowledge of π_j. It is straightforward to show that the recursion is defined by

    p_{jrs} ∝ π_{j−1}(r) q_{rs} P_s(Δτ_j | θ),

where the proportionality is reconciled by Σ_{r,s} p_{jrs} = 1 for each j. It is worth noting that computing P₂, ..., Pₙ is identical to the forward step in the standard forward-backward algorithm for HMMs developed by Baum, Petrie, Soules, and Weiss (1970).
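A sketch of this forward step (our own illustration; the initial distribution π₁ is taken flat here, which is an assumption of the sketch rather than something specified above):

```python
import numpy as np

def forward_filter(dts, Q, dens):
    """Compute P_2, ..., P_n, where P_j[r, s] is proportional to
    pi_{j-1}(r) * q_rs * P_s(dt_j | theta), normalized to sum to one.
    `dts` holds dt_1, ..., dt_n; `dens(dt)` returns the length-5 vector of
    Equation (2.2) densities.  Returns the list P (1-based) and the final pi."""
    n = len(dts)
    pi = np.full(5, 1.0 / 5)                 # pi_1; flat start is an assumption
    P = [None, None]                         # placeholders so P[j] is P_j
    for j in range(2, n + 1):
        M = pi[:, None] * Q * dens(dts[j - 1])[None, :]   # dt_j sits at index j-1
        M /= M.sum()
        P.append(M)
        pi = M.sum(axis=0)                   # pi_j(s) = sum_r p_jrs
    return P, pi
```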
Once P₂, ..., Pₙ have been computed, draw (h_{n−1}, h_n) from the multinomial distribution determined by Pₙ, which conditions on all the elements of Δτ. Notice that p(h | Δτ, θ) factors as follows:

    p(h | Δτ, θ) = p(h_n | Δτ, θ) ∏_{j=1}^{n−1} p(h_{n−j} | h_{n−j+1}^n, Δτ, θ),

where

    Pr(h_{n−j} = r | h_{n−j+1}^n, Δτ, θ) = Pr(h_{n−j} = r | h_{n−j+1} = s, Δτ_1^{n−j+1}, θ)
                                         = p_{n−j+1,r,s} / Σ_r p_{n−j+1,r,s}.            (2.3)

Complete the draw of h by generating h_{n−j} from the distribution in (2.3) for j ∈ {2, ..., n−1}. At least one of the probabilities appearing in the final sum of Equation (2.3) must be positive, since h_{n−j+1} has already been drawn. Thus, there is no danger of dividing by zero when computing the backwards multinomial probabilities.
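The stochastic backward step therefore reduces to multinomial draws. A sketch consistent with Equation (2.3), again with hypothetical names and using the P list produced by the forward sketch above:

```python
import numpy as np

def backward_sample(P, rng):
    """Draw h_1, ..., h_n given the forward matrices P_j.  First (h_{n-1}, h_n) is
    drawn jointly from P_n; each earlier h_j is then drawn from column h_{j+1} of
    P_{j+1}, as in Equation (2.3)."""
    n = len(P) - 1                              # P[2..n] filled; P[0], P[1] are placeholders
    h = np.zeros(n + 1, dtype=int)              # 1-based: h[1..n]
    flat = rng.choice(25, p=P[n].ravel())       # joint draw over the 25 cells of P_n
    h[n - 1], h[n] = divmod(flat, 5)
    for j in range(n - 2, 0, -1):               # j = n-2, ..., 1
        col = P[j + 1][:, h[j + 1]]
        h[j] = rng.choice(5, p=col / col.sum())
    return h[1:]
```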

2.2 DRAWING y AND w GIVEN h, Δτ, AND θ

If we condition on h_j, Δτ_j, and θ, then y_j and w_j are independent of one another and of all other quantities appearing in the model. The conditional distribution of y_j is

    y_j = 0 with probability 1              if h_j ∈ {0, 2}
    y_j = 1 with probability 1              if h_j ∈ {1, 4}
    y_j ~ Bernoulli(λ₁/(λ₁ + λ₀))           if h_j = 3.

The corresponding density for w_j is

    w_j = 0 with probability 1              if h_j ∈ {0, 1}
    w_j = Δτ_j with probability 1           if h_j = 3
    p(w_j | h, Δτ, θ) = (λ₁ + φ − γ) exp(−(λ₁ + φ − γ) w_j) / [1 − exp(−(λ₁ + φ − γ) Δτ_j)],
                        w_j ∈ (0, Δτ_j),    if h_j ∈ {2, 4}.                              (2.4)

If λ₁ + φ − γ > 0, then p(w_j | h_j ∈ {2, 4}, Δτ, θ) is the truncated exponential density on (0, Δτ_j), which tends to the uniform density as λ₁ + φ − γ → 0. Equation (2.4) remains a proper density on [0, Δτ_j] even if λ₁ + φ − γ < 0. Regardless of the value of λ₁ + φ − γ, it is trivial to sample from (2.4) by CDF inversion.
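A sketch of the (y_j, w_j) draw, with the truncated exponential of Equation (2.4) sampled by CDF inversion as just described (our own illustration; the tolerance used for the limiting uniform case is an arbitrary choice):

```python
import numpy as np

def draw_y_w(h, dt, lam0, lam1, gamma, phi, rng):
    """Draw (y_j, w_j) given h_j = h and dt_j = dt, following the conditionals above."""
    if h in (0, 2):
        y = 0
    elif h in (1, 4):
        y = 1
    else:                                    # h == 3: event came from N0 or N1
        y = int(rng.random() < lam1 / (lam0 + lam1))
    if h in (0, 1):
        w = 0.0
    elif h == 3:
        w = dt
    else:                                    # h in {2, 4}: truncated exponential on (0, dt)
        rate = lam1 + phi - gamma
        u = rng.random()
        if abs(rate) < 1e-12:                # limiting uniform case
            w = u * dt
        else:                                # invert F(w) = (1 - exp(-rate*w)) / (1 - exp(-rate*dt))
            w = -np.log(1.0 - u * (1.0 - np.exp(-rate * dt))) / rate
    return y, w
```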
[Figure 3 appears here: (a) a jittered dot plot of the simulated event times; (b) the posterior probability of contamination, Pr(C(·) = 1 | τ, θ), plotted against time.]

2.3 COMPUTATIONAL ISSUES

Even though simulating h, y, and w are three conceptually different operations, the conditional independence properties satisfied by y_j and w_j allow them to be drawn immediately following the draw of h_j, saving the effort of multiple loops through the data. The complete data sufficient statistics may also be updated as each piece of missing data is drawn, resulting in a double savings. Not only do we obviate the need for an additional loop, but w and y need not be stored. The complete data sufficient statistics may be decomposed in the following way:

    n₁ = Σ_j y_j,    n₀ = Σ_j (1 − y_j),    n₀₁ = Σ_j I(h_j = 1 or 4),
    n₁₀ = Σ_j I(h_j = 2 or 4),    T₁ = Σ_j w_j,    T₀ = Σ_j (Δτ_j − w_j).
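In code, this bookkeeping might look like the following sketch (our own illustration using the hypothetical draw_y_w helper above; in practice the (y_j, w_j) draws would be folded into the same pass that produces each h_j, so nothing beyond the running totals is stored):

```python
def accumulate_stats(hs, dts, draw_y_w_fn):
    """One pass over the intervals: draw (y_j, w_j) as soon as h_j is available and
    fold them into the complete data sufficient statistics."""
    stats = {"n0": 0, "n1": 0, "n01": 0, "n10": 0, "T0": 0.0, "T1": 0.0}
    for h, dt in zip(hs, dts):
        y, w = draw_y_w_fn(h, dt)
        stats["n1"] += y
        stats["n0"] += 1 - y
        stats["n01"] += h in (1, 4)
        stats["n10"] += h in (2, 4)
        stats["T1"] += w
        stats["T0"] += dt - w
    return stats
```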

3. NUMERICAL RESULTS

Figure 3(a) shows a jittered dot plot of event times simulated from the model with parameter values T = 1,000, λ₀ = 1, λ₁ = 1, γ = .001, and φ = .1. The simulation resulted in 1,035 events and two contamination intervals. We also plot the posterior probabilities of contamination, given the event times and the true model parameters, in Figure 3(b). These probabilities are available from the standard (nonstochastic) forward-backward algorithm in Baum et al. (1970).

Figure 4(a) shows posterior density estimates for the simulated dataset when all parameters are given exchangeable Γ(1, 1) priors. For comparison purposes, log(.001) = −6.91 and log(0.1) = −2.30. The Gibbs sampler was started from the true parameter values and run for 100,000 iterations. A primary task of the Gibbs sampler is to attribute each event to one of the three processes N₀, N₁, or C(·). Although the sampler spends much of its time in the vicinity of the parameter values responsible for generating the data, inspection of the sample paths of the sufficient statistics, which are not shown due

[Figure 4 appears here: panels of posterior density estimates for λ₀, λ₁, log(γ), and log(φ) against parameter value, under the two prior specifications.]

Figure 4. Posterior density estimates for the simulated data set based on Gibbs output. (a) Weak exchangeable priors. (b) Strong prior on γ.

to space considerations, reveals that it is prone to long excursions where most of the points are attributed to the contamination process. This accounts for the two distinct modes present in Figure 4(a). Even though there are only two contamination episodes in the simulated dataset, n₀₁ and n₁₀ can take on values close to 1,000 during one of these excursions. When this happens, probability mass focuses on large values of γ and φ, making it more likely that n₀₁ and n₁₀ will be large in the next iteration. Such excursions may last for several thousand iterations, but the sampler eventually returns to the "true" regime. The excursions are lengthy enough to give substantial evidence of bimodality in the posterior distribution.

Prior information about γ and φ, if present, can be used to curb the aberrant behavior causing the second mode. Figure 4(b) shows posterior density estimates for the same situation as Figure 4(a), except that the prior distribution for γ has been replaced by Γ(1, 100). The information in this prior is equivalent to investigating 100 time units of a similar customer's transaction record and finding that he has not suffered contamination. Essentially, we are communicating to the model our belief that fraud is a rare event. Notice that the firm prior on γ is centered at the wrong value (.01 rather than .001), yet it succeeds in eliminating the pathological second mode. The marginal posterior densities of the parameters are now centered close to the true values that generated the data.

4. DISCUSSION

The task undertaken in a simulation study is to draw a random variable x = (x₁, ..., x_k) from its distribution p(x). In ideal cases this simulation can be done directly, and inferences can be based on the sequence of draws x⁽¹⁾, x⁽²⁾, ... as if it were an independent sample from p(·). In cases where direct simulation is not possible, we may employ the Gibbs sampler, which breaks x into smaller components (x₁, ..., x_m) with m < k. The Gibbs algorithm then draws x_j from p(x_j | x_{−j}) for j = 1, ..., m. We refer to m as the number of steps per iteration. Each iteration of the Gibbs algorithm (that is, each completion of m steps) produces a draw of x from a Markov chain whose stationary distribution is p(x). Direct simulation corresponds to the case when m = 1, and as m grows we move farther and farther from this ideal. As m becomes large we increasingly rely on the convergence of the underlying Markov chain to correct for relationships between the components of x. This increased reliance translates directly into serial dependence in the sequence of draws output by the sampler.
An important feature of the algorithm presented in this article is that, although it samples 3n + 4 quantities in each iteration, the sampling is done in only two steps. One step draws the parameters given the missing data, and the other draws the missing data given the parameters. It is possible to implement sampling schemes other than the one we employ. For instance, a naive approach would be to generate each of the variables in w and y from its full conditional distribution given the most recent draw of the other variables. The naive approach must abandon explicit consideration of h, since that would destroy the ergodicity of the Markov chain underpinning the Gibbs algorithm. [Note: To see this, fix j and consider a case where the current draw of y_j is 1 and 0 < w_j < Δτ_j. The full conditional for h_j puts probability 1 on h_j = 4. Later, when it is time to draw w_j and y_j given h_j = 4, all the mass will be on the events y_j = 1 and 0 < w_j < Δτ_j. Therefore h_j = 4 is an absorbing state in the Gibbs Markov chain, which does not allow for convergence to the desired stationary distribution.] This makes the full conditional for w_j more complicated. The Markov dependence represented by h is shifted to w, and the full conditional for w_j may consist of both a density on (0, Δτ_j) and a point mass at either 0 or Δτ_j, depending on the values of w_{j−1} and w_{j+1}.
In a similar hidden Markov model setup, Robert, Celeux, and Diebolt (1993) argued for the naive Gibbs sampler on the premise that the forward-backward algorithm is too complicated. Our implementation of the forward step takes less than 50 lines of reusable C code, and the stochastic backward step merely involves sampling from multinomial distributions. The slight increase in complication is repaid with a more computationally efficient algorithm. The naive Gibbs sampler has 2n + 1 steps per Gibbs iteration, compared with only two such steps for the sampler involving the stochastic forward-backward algorithm. Not only does the naive sampler have more steps per iteration, but the number of steps increases linearly with n.

Finally, our example is simple in the sense that the parameters may be generated directly once the missing data have been drawn. It is not hard to imagine a case where MCMC methods are required to sample model parameters, even given the missing data. If there is extensive serial dependence in the MCMC algorithm that draws the parameters, then an algorithm that draws the missing data in a single step is not likely to seriously

further impede the convergence of the sampler to its stationary distribution. Conversely, an algorithm that adds a number of steps proportional to the size of the dataset could slow convergence to the point where Gibbs sampling is no longer feasible.

[Received August 1998. Revised February 1999.]

REFERENCES
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970), "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains," The Annals of Mathematical Statistics, 41, 164-171.

Chib, S. (1996), "Calculating Posterior Distributions and Modal Estimates in Markov Mixture Models," Journal of Econometrics, 75, 79-97.

Robert, C. P., Celeux, G., and Diebolt, J. (1993), "Bayesian Estimation of Hidden Markov Chains: A Stochastic Implementation," Statistics and Probability Letters, 16, 77-83.
