
Journal of Statistical Computation and Simulation

ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20

A Bayesian wavelet approach to estimation of a change-point in a nonlinear multivariate time series
Robert M. Steward & Steven E. Rigdon

To cite this article: Robert M. Steward & Steven E. Rigdon (2015): A Bayesian wavelet approach to estimation of a change-point in a nonlinear multivariate time series, Journal of Statistical Computation and Simulation, DOI: 10.1080/00949655.2015.1116535
To link to this article: http://dx.doi.org/10.1080/00949655.2015.1116535

Published online: 16 Dec 2015.


JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2015


http://dx.doi.org/10.1080/00949655.2015.1116535

A Bayesian wavelet approach to estimation of a change-point in a nonlinear multivariate time series
Robert M. Steward^a and Steven E. Rigdon^b

a Department of Mathematics and Computer Science, Saint Louis University, Saint Louis, MO, USA; b Department of

Biostatistics, College for Public Health and Social Justice, Saint Louis University, Saint Louis, MO, USA
ABSTRACT
We propose a semiparametric approach to estimate the existence and location of a statistical change-point in a nonlinear multivariate time series contaminated with an additive noise component. In particular, we consider a p-dimensional stochastic process of independent multivariate normal observations where the mean function varies smoothly except at a single change-point. Our approach involves conducting a Bayesian analysis on the empirical detail coefficients of the original time series after a wavelet transform. If the mean function of our time series can be expressed as a multivariate step function, we find our Bayesian-wavelet method performs comparably with classical parametric methods such as maximum likelihood estimation. The advantage of our multivariate change-point method is seen in how it applies to a much larger class of mean functions that require only general smoothness conditions.

ARTICLE HISTORY
Received 29 August 2014
Accepted 2 November 2015

KEYWORDS
Semiparametric; scaling coefficient; detail coefficient; discrete wavelet transform; Haar wavelet

1. Introduction
The change-point problem has been studied in a variety of settings since at least the 1920s, when, in an effort to improve quality control, Walter Shewhart developed his now ubiquitous statistical control charts to detect various statistical changes in industrial processes.[1] Although control chart methods proved useful in practice, more theoretically grounded approaches involving maximum likelihood estimation (MLE) [2] and Bayesian techniques [3] later allowed the practitioner to rigorously associate confidence intervals with their conclusions. While initially the univariate case of a single
change-point in the mean function was the focus, efforts expanded to include various other related
problems such as multiple statistical change-points,[4–6] a change in variance,[7] and a simultaneous
change in mean and variance.[8] The case where the error component is not from a normal distribution has also been studied by various authors.[9,10] While many of these methods have proven to be
valuable diagnostic data analysis tools, they generally either apply only in a single dimension or after
making strict assumptions on the time series model.
There appears to be a gap in the change-point literature when it comes to the change-point problem for nonlinear multivariate time series. Classical parametric approaches such as MLE and Bayesian methods exist to detect and estimate the location of one or more statistical change-points in multivariate time series.[8,11,12] Many variations of such parametric approaches exist for detecting multivariate statistical change-points,[13–16] but invariably these methods require strict assumptions on the time series mean function. Müller [17] developed an approach to detect discontinuities in derivatives using left and right one-sided kernel smoothers for one-dimensional smooth functions.
CONTACT Steven E. Rigdon srigdon@slu.edu
© 2015 Taylor & Francis

More recently, Ogden and Lynch,[18] Ciuperca,[19] and Battaglia and Protopapas [20] all have results
for estimating change-point locations in one-dimensional nonlinear time series. Matteson and James
[21] developed a fully nonparametric approach for estimating the location of multiple change-points in multivariate data. While their work is perhaps the method most relevant to the change-point problem in this article, their method still only applies to data sets where the mean function is piecewise
constant.
The multivariate change-point problem is an important problem with direct applications in a surprising number of otherwise seemingly unrelated fields. In statistical process control (SPC), the multivariate change-point problem is important for quickly detecting and estimating changes in many industrial processes.[22] The US Department of Transportation has applied multivariate change-point methods to estimate statistical change-points around a speed limit increase from 55 to 65 mph.[13] Additional applications occur in such unrelated fields as biosurveillance, financial market analysis, and hydrology, to name a few.[23,24] In practice, however, imposing strict assumptions
on the time series may be impractical when encountering the change-point problem for real world
multivariate data. Unfortunately, in the multivariate time series setting there have not been many
other good options. In this article we propose a method that attempts to bridge this gap by developing a generalization of the approach from Ogden and Lynch.[18] The method we propose detects
and estimates the location of a statistical change-point for multivariate data through a Bayesian analysis on empirical wavelet detail coefficients and applies even when strict assumptions about the true
underlying mean function cannot be made.

2. Background: why wavelets in the change-point problem?


The attractiveness of wavelets springs from both their simplicity in theory and flexibility in
application.[25] From a statistical point of view, wavelets offer alternative methods in data smoothing,
density estimation, and multiscale time series analysis.[26] The multiscale characteristic of wavelets
is particularly important in our setting because of the flexibility it offers to various change-point
problems we might encounter. The wavelet transform divides the original time series into scaling
coefficient and detail coefficient components. The scaling coefficients represent varying degrees of
time series smoothing or averaging while the details contain information pertaining to how much
the function is changing at a certain resolution level. Figure 1 illustrates this concept by displaying
a time series along with two example plots of corresponding scaling and detail coefficients. In some
sense, wavelets offer a method to analyse the change of a time series through a lens that may be readily
zoomed in or zoomed out as required.[27]
Figure 1. Original function (left), scaling coefficients at level 5 smoothing the time series (middle), and detail coefficients capturing how the time series is changing at detail level 6 (right).

All wavelets we consider in this article constitute an orthonormal basis of the usual inner product space of square integrable functions, L²(ℝ). In particular, we may approximate any f ∈ L²(ℝ) arbitrarily closely in the L²(ℝ) sense as

$$f(t) = \sum_{k=-\infty}^{\infty} w_{j_0,k}\,\phi_{j_0,k}(t) + \sum_{j=j_0}^{\infty}\sum_{k=-\infty}^{\infty} d_{j,k}\,\psi_{j,k}(t), \qquad (1)$$

where w_{j,k} and d_{j,k} are called the scaling and detail coefficients, respectively. Through an integration which we define below, each w_{j,k} and d_{j,k} coefficient is associated with a particular scaling and wavelet function φ_{j,k} and ψ_{j,k}, respectively. Each φ_{j,k} and ψ_{j,k} are in turn related to each other by the so-called father and mother wavelets, φ and ψ, expressed as

$$\{\phi_{j,k}(t) = 2^{j/2}\,\phi(2^{j}t - k)\}_{j,k}, \qquad \{\psi_{j,k}(t) = 2^{j/2}\,\psi(2^{j}t - k)\}_{j,k}. \qquad (2)$$

Explicitly, w_{j,k} and d_{j,k} are given by

$$w_{j,k} = \langle f, \phi_{j,k}\rangle = \int f(t)\,\phi_{j,k}(t)\,dt, \qquad d_{j,k} = \langle f, \psi_{j,k}\rangle = \int f(t)\,\psi_{j,k}(t)\,dt.$$

From Equation (2), we see wavelets by definition are simply systems of dilations and translations. The simplest wavelet that we may explicitly express in closed form is the Haar wavelet, given by

$$\phi(t) = \begin{cases} 1, & 0 \le t < 1 \\ 0, & \text{otherwise} \end{cases}
\qquad\qquad
\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Besides being orthonormal bases for L²(ℝ), all wavelet systems in this article also possess other special properties. By fixing a particular j in Equation (2), we denote the closed linear spans of the scaling and wavelet functions as V_j and O_j, respectively, as k ranges through the integers. Each V_j is an approximation space for the next finer approximation space of spanning scaling functions, V_{j+1}, with the difference in information being precisely O_j. Figure 1 (middle) demonstrates this concept of an approximation space by illustrating how a particular scaling level approximates the original function by retaining the overall function shape while losing localized function characteristics. In particular, V_j is orthogonal to O_j, with the direct sum of these orthogonal subspaces equal to V_{j+1}, that is, V_{j+1} = V_j ⊕ O_j. Such a construction leads to the so-called multiresolution analysis and allows us to approximate (smooth) a signal at various approximation levels while precisely keeping track of detail levels. The detail levels capture function change at these different resolutions and will play a key role in our analysis of the change-point problem.
While the above formulation represents fundamental concepts of wavelet theory for general L²(ℝ) functions and provides intuition for the approach, in practice we always apply the discrete wavelet transform (DWT) with actual data. We start with a discrete time series x = {x_i} of length 2^J for some natural number J. Next, we let j be an index across the DWT resolution levels ranging from J − 1 down to 0. At each resolution level we produce 2^j scaling and detail coefficients. For the Haar DWT,


the finest level of scaling (w_{jk}) and detail (d_{jk}) coefficients are computed by the formulas

$$w_{(J-1),k} = (x_{2k} + x_{2k-1})/\sqrt{2} \quad\text{and}\quad d_{(J-1),k} = (x_{2k} - x_{2k-1})/\sqrt{2}. \qquad (3)$$


We then compute all subsequent levels of scaling and detail coefficients recursively by the formulas

$$w_{(j-1),k} = (w_{j,2k} + w_{j,2k-1})/\sqrt{2} \quad\text{and}\quad d_{(j-1),k} = (w_{j,2k} - w_{j,2k-1})/\sqrt{2}. \qquad (4)$$
This process terminates when j = 0 and we produce just 2^0 = 1 additional scaling and detail coefficient. After we take the DWT, we have in total N − 1 scaling and detail coefficients. Of course, if we chose a different wavelet, the above formulas would also be different. We refer the reader to Daubechies,[25] Nason,[26] and Mallat [28] for in-depth treatments of wavelet theory.
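To make the recursion in Equations (3) and (4) concrete, the following minimal sketch (our own illustration in Python/NumPy, not code from the paper; the helper name haar_dwt is ours) computes the Haar scaling and detail coefficients of a dyadic-length series.

```python
import numpy as np

def haar_dwt(x):
    """Haar DWT of a length-2^J series using the recursion of Equations (3)-(4).
    Returns the detail coefficients level by level (finest level first) and the
    single level-0 scaling coefficient."""
    x = np.asarray(x, dtype=float)
    J = int(np.log2(len(x)))
    assert len(x) == 2 ** J, "series length must be a power of two"
    details, w = [], x                    # the data act as the level-J scaling coefficients
    for _ in range(J):
        even, odd = w[1::2], w[0::2]      # x_{2k} and x_{2k-1} in the 1-based notation of Eq. (3)
        details.append((even - odd) / np.sqrt(2))   # detail coefficients at this level
        w = (even + odd) / np.sqrt(2)               # scaling coefficients passed to the next level
    return details, w[0]

# a step function concentrates its detail energy in very few coefficients
details, w0 = haar_dwt(np.concatenate([np.zeros(8), np.ones(8)]))
print([np.round(d, 3).tolist() for d in details])
```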
Given a discrete noisy signal, we can take its DWT and analyse the resulting detail coefficients
to distinguish the signal from the statistical noise. Recall that a function is said to be smooth if it
possesses derivatives of all orders. Donoho and Johnstone [29] showed the DWT of a noisy signal
with an underlying smoothly varying mean function results in a sparse representation of the detail
coefficients provided the signal-to-noise ratio is sufficiently high. In particular, the contribution of
the signal to the high-level detail coefficient magnitudes should be close to zero leaving the energy
of the true signal concentrated in a relatively sparse number of low-level detail coefficients representing overall signal change. The noise component of the original signal, however, does not have a
sparse representation; rather, it is again transformed to noise after the DWT and spread throughout
all resolution levels. We will exploit this difference between signal and noise detail representation in
our change-point detection and estimation method described below. Explicitly we model our original
time series as
$$x_i = g(i) + \varepsilon_i,$$

where g(·) is our true underlying smooth (except possibly at a change-point) mean function observed at a discrete number of equally spaced time intervals and ε_i is some additive noise component. Next, we take the DWT of our time series and obtain a transformed data model of the following form:

$$d^{*}_{jk} = d_{jk} + \varepsilon_{jk},$$

where d*_{jk} is the empirical detail coefficient we actually observe. In the case of the Haar wavelet, d*_{jk} would be the computation results after recursively applying Equations (3) and (4). Next, d_{jk} is the true (but unknown) detail coefficient of the underlying smooth mean function we wish to estimate. Finally, ε_{jk} is the transformed additive noise component from the original time series that transforms again to noise.[26]
If we assume ε_i is generated from a Gaussian process, then ε_{jk} will also be Gaussian.[30] Wang [31] connected these properties of the DWT to the change-point problem when he recognized that, under suitable conditions, the largest detail coefficients result from those places where the time series is changing most rapidly and are probably not attributable to noise. Wang then hypothesized that the places where the time series is most rapidly changing may be due to a statistical change-point. While Wang's method works well for change-point problems with relatively high signal-to-noise ratios, it becomes much less reliable as the additive noise is increased. Additionally, there is also the issue of determining how best to combine the information from different detail levels to use in the analysis. In the following section we develop a method that capitalizes on these statistical properties while addressing the shortcomings of Wang's method in a complete Bayesian model framework.

3. Bayesian-wavelet approach to the change-point problem


From Section 2 we know any additive noise component of a time series is again transformed to an
additive noise component after a DWT. Ogden and Lynch [18] exploited these properties of the DWT

by proposing a method for estimating the change-point location of a one-dimensional time series by applying Bayesian techniques in the wavelet domain. In this section, we generalize a similar methodology to an arbitrary dimensional time series and extend the approach to answer the inference question.

Figure 2. Two example mean functions with a change-point at time point 81 (top) along with their respective detail coefficients (bottom). Each detail level is normalized by its ℓ norm. Notice at the finest four resolution levels the detail coefficients are essentially identical to each other.

The DWT allows us to analyse a time series at varying resolution levels and stores the resulting details of smooth functions in a similar way. Observe Figure 2, which displays two example mean functions that are smooth except at a change-point at time point 81 (top), along with the respective detail coefficient values (bottom). Observe that the detail coefficient values are essentially
respective detail coefficient values (bottom). Observe that the detail coefficient values are essentially
identical for the finest three resolution levels (levels 4, 5, and 6), despite the fact that the mean functions are quite different. While some coefficient values at the lowest four resolution levels do begin
to diverge, at least 112 of the total 127 detail coefficients in this 128 element time series very closely
agree. This suggests that, from the wavelet perspective, the two change-points in Figure 2 are equally
difficult to detect when using just the highest detail coefficient levels.
The phenomenon in Figure 2 illustrates the sparsity property of the DWT and holds in general for
any smoothly varying mean functions which share a common change-point. In particular, any otherwise smooth function with a change-point should have a similar detail coefficient representation
at the finest levels as a step function with a change-point at the same location. This observation provides the intuition behind why an analysis of wavelet detail coefficients may be an effective approach
in estimating the change-point location of time series with otherwise smooth functions such as those
shown in Figure 2.
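A quick numerical illustration of this observation (our own example, reusing the hypothetical haar_dwt helper sketched in Section 2): a step function and a sine function with the same jump produce nearly identical detail coefficients at the finest levels, because the smooth part of the sine contributes very little there.

```python
import numpy as np

t = np.arange(1, 129)
tau = 81                                       # common change-point location, as in Figure 2
step = np.where(t <= tau, 0.0, 1.0)            # step mean with a jump after time point 81
sine = np.sin(2 * np.pi * t / 128) + step      # smooth mean with the same jump

d_step, _ = haar_dwt(step)                     # haar_dwt from the earlier sketch
d_sine, _ = haar_dwt(sine)

# the largest discrepancy at each of the four finest levels is small compared
# with the detail coefficient produced by the jump itself (about 0.7)
for level in range(4):
    print(level, round(float(np.max(np.abs(d_step[level] - d_sine[level]))), 3))
```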
Consider a multi-dimensional time series of independent observations {x_i}_{i=1}^{N} for N ∈ ℕ, where x_i is a p-dimensional vector, such that

$$x_i \sim N_p(\mu_i, \Sigma). \qquad (5)$$


In the typical case where Bayesian or likelihood techniques are applied to the multivariate change-point problem, μ_i is assumed to be a p-dimensional step function. For our more general analysis, however, μ_i is assumed to be generated by a p-dimensional function, g(·), smoothly changing except at a single point in time where the shift occurs. Throughout this article, we denote the unknown time series change-point location with the symbol τ. We also assume Σ is an unknown but constant p × p covariance matrix throughout our time series. A particular observation of the time series takes the form


$$x_i = g(i) + \varepsilon_i, \quad\text{where}\quad \varepsilon_i \sim N_p(0, \Sigma).$$
Next, we let the N × p matrix X represent our time series, where each row represents an observation at a particular time. Additionally, we introduce the idealized N × p matrix, H, which we compare against X:

$$X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{\tau 1} & x_{\tau 2} & \cdots & x_{\tau p} \\
x_{\tau+1,1} & x_{\tau+1,2} & \cdots & x_{\tau+1,p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Np}
\end{pmatrix}_{N \times p}
\quad\text{and}\quad
H = \begin{pmatrix}
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 \\
1 & 1 & \cdots & 1 \\
\vdots & \vdots & \ddots & \vdots \\
1 & 1 & \cdots & 1
\end{pmatrix}_{N \times p} \qquad (6)$$

The zero rows in H represent those observations before the change-point and the one rows indicate observations after the change-point. We assume for now that our time series is of dyadic length, that is, of length N = 2^J for some J ∈ ℕ. While this appears to be a restrictive requirement, in practice there are several padding techniques that remedy this apparent difficulty.[26] For example, we might simply concatenate low-level statistical noise to the front end of the time series to achieve the required dyadic length if we have data available from the in-control time series state. Another method is to reflect the beginning of the time series to obtain the required dyadic length, as sketched below. For example, a data set with six elements (x_1, x_2, x_3, x_4, x_5, x_6) could be modified as (x_3, x_2, x_1, x_2, x_3, x_4, x_5, x_6) to achieve the required dyadic length. The latter approach is what we will apply for our practical example in Section 6.2.
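The reflection padding just described is easy to implement; the sketch below (an assumed helper of ours, applied column by column for multivariate data) reproduces the (x3, x2, x1, x2, x3, x4, x5, x6) example.

```python
import numpy as np

def reflect_to_dyadic(x):
    """Pad a one-dimensional series to the next power-of-two length by
    reflecting its beginning, as described in the text."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    N = 1 << int(np.ceil(np.log2(n)))      # required dyadic length
    pad = N - n                            # number of reflected values to prepend
    return np.concatenate([x[pad:0:-1], x])

print(reflect_to_dyadic([1, 2, 3, 4, 5, 6]))   # -> [3. 2. 1. 2. 3. 4. 5. 6.]
```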
We now take a one-dimensional discrete wavelet transform (DWT) of both X and H column by column, which produces two (N − 1) × p matrices in the wavelet domain, D* and Q. We can normalize each detail level by its ℓ norm, which has the effect of weighting coefficients from different resolution levels equally. With ℓ-normalized detail levels our subsequent analysis becomes less sensitive to change information contained in the lowest resolution levels. In Section 6 we apply our algorithm both with and without normalized detail coefficients. When the rows of zeroes and ones of H exactly correspond to the rows of X before and after the change-point, the rows of D* and Q will closely relate to each other in a meaningful manner, as we describe below. Since the statistical properties of the additive noise component of the time series are retained after a one-dimensional DWT, it can easily be shown using the linearity of the DWT that the expected covariance matrix after the transform remains Σ.
Notationally, we index our detail matrices to emphasize the detail levels of each row. More explicitly, supposing our time series is of length 2^J, we denote a p-dimensional detail coefficient as d_{jk} = (d_{jk,1}, d_{jk,2}, ..., d_{jk,p}), where j represents a particular detail level and k the translation index at the given



detail level. We then express the DWT of X and H as the matrices D* and Q, where

$$D^{*} = \begin{pmatrix}
d^{*}_{01,1} & d^{*}_{01,2} & \cdots & d^{*}_{01,p} \\
d^{*}_{11,1} & d^{*}_{11,2} & \cdots & d^{*}_{11,p} \\
d^{*}_{12,1} & d^{*}_{12,2} & \cdots & d^{*}_{12,p} \\
d^{*}_{21,1} & d^{*}_{21,2} & \cdots & d^{*}_{21,p} \\
\vdots & \vdots & \ddots & \vdots \\
d^{*}_{jk,1} & d^{*}_{jk,2} & \cdots & d^{*}_{jk,p} \\
\vdots & \vdots & \ddots & \vdots \\
d^{*}_{(J-1)\,2^{J-1},1} & d^{*}_{(J-1)\,2^{J-1},2} & \cdots & d^{*}_{(J-1)\,2^{J-1},p}
\end{pmatrix}_{(N-1)\times p}$$

and

$$Q = \begin{pmatrix}
q_{01,1} & q_{01,2} & \cdots & q_{01,p} \\
q_{11,1} & q_{11,2} & \cdots & q_{11,p} \\
q_{12,1} & q_{12,2} & \cdots & q_{12,p} \\
q_{21,1} & q_{21,2} & \cdots & q_{21,p} \\
\vdots & \vdots & \ddots & \vdots \\
q_{jk,1} & q_{jk,2} & \cdots & q_{jk,p} \\
\vdots & \vdots & \ddots & \vdots \\
q_{(J-1)\,2^{J-1},1} & q_{(J-1)\,2^{J-1},2} & \cdots & q_{(J-1)\,2^{J-1},p}
\end{pmatrix}_{(N-1)\times p}. \qquad (7)$$

Next, we define δ = [δ_1, δ_2, ..., δ_p] as the amount our mean function shifts at the unknown change-point. It is important to note that here δ is not a vector, but rather a set of coefficients. We use the [ ] notation to distinguish this from, say, q_{11} = (q_{11,1}, q_{11,2}, ..., q_{11,p}), which is a p-dimensional vector. So in particular, we define δq_{11} = (δ_1 q_{11,1}, δ_2 q_{11,2}, ..., δ_p q_{11,p}) using element-by-element scalar multiplication.
We know the additive noise component of the original time series is again transformed to an additive noise component after the dimension-by-dimension DWT is taken of the original time series. Furthermore, as illustrated in Figure 2, we know, at least for the finest level detail vectors, that the true detail vector values should very closely match the detail vectors of Q. In the case when the mean function of our time series is a multivariate step function, all true detail vectors will match the detail vectors of Q. In the more general case where the true underlying mean function is unknown, we will ultimately retain only the finest level detail vectors in our final analysis. With these properties in mind, those retained empirical detail coefficient vectors, d*_{jk}, may therefore be modelled as

$$d^{*}_{jk} \sim N_p(d_{jk}, \Sigma) = N_p(\delta q_{jk}, \Sigma),$$
where d_{jk} is the true detail vector, while d*_{jk} = (d*_{jk,1}, d*_{jk,2}, ..., d*_{jk,p}) and q_{jk} = (q_{jk,1}, q_{jk,2}, ..., q_{jk,p}) are the jk rows of the matrices D* and Q, respectively. Using Bayes' theorem, our posterior distribution of τ, δ, and Σ takes the form of the product of our likelihood and prior distribution; that is,

$$p(\tau, \delta, \Sigma \,|\, D^{*}) \propto \prod_{j}\prod_{k} f(d^{*}_{jk} \,|\, \tau, \delta, \Sigma)\, p_0(\tau, \delta, \Sigma), \qquad (8)$$

where f is a p-dimensional multivariate normal probability density function.


In Equation (8) we use the double index notation to emphasize that we are taking the product
over distinct detail coefficients by their resolution and translation indices. In our model Σ is a constant but unknown covariance matrix. Following the discussion above, any prior covariance matrix


information we have in our original time series directly applies after our transform. For example, we could put a Wishart distribution as an informative prior on Σ if we have sufficient prior knowledge of Σ. For the most general case, however, we will apply Jeffreys' noninformative prior, given as p_0(τ, δ, Σ) ∝ |Σ|^{−1/2}. We also note that implicit in this prior is that we assign a uniform prior to the change-point location throughout the time series. Our posterior distribution takes the form

$$p(\tau, \delta, \Sigma \,|\, D^{*}) \propto |\Sigma|^{-m/2} \exp\left( -\frac{1}{2} \sum_{j}\sum_{k} (d^{*}_{jk} - \delta q_{jk})^{T} \Sigma^{-1} (d^{*}_{jk} - \delta q_{jk}) \right) |\Sigma|^{-1/2},$$

where m represents the actual number of detail coefficients used in the analysis. In the appendix, we provide details of the calculations where we integrate out δ and Σ to arrive at the marginalized posterior distribution function that we apply in Sections 6 and 7,

$$p(\tau \,|\, D^{*}) \propto C^{-1/2} \left| \sum_{j}\sum_{k} d^{*}_{jk} d^{*T}_{jk} - \frac{1}{C} BB^{T} \right|^{-(m-p-1)/2}, \qquad (9)$$

where

$$A = \sum_{j}\sum_{k} d^{*T}_{jk}\Sigma^{-1}d^{*}_{jk}, \quad B = \sum_{j}\sum_{k} q_{jk}\,d^{*}_{jk}, \quad B^{T} = \sum_{j}\sum_{k} q_{jk}\,d^{*T}_{jk}, \quad\text{and}\quad C = \sum_{j}\sum_{k} q_{jk}^{2}.$$

Formally, we estimate the change-point of the time series as τ̂ = arg max_τ p(τ | D*). In particular, there are N − 1 possible values of τ and, with probability one, a maximum value always exists. Notice that Equation (9) is neither wavelet nor detail level specific. Depending on what we know (or do not know) about the time series, different wavelet- and detail-level combinations may be more appropriate. Depending on the true underlying mean function of the time series, we found through simulation studies that the choice of wavelet had a minor, but noticeable, effect on correctly estimating the change-point location. In the simplest case, when the mean function is represented by a multivariate step function, studies show it is also the simplest wavelet (i.e. the Haar wavelet) that performs marginally better. In the case of a smoothly varying mean function, the Daubechies 10-tap wavelet became the best choice for correctly estimating the change-point location.
We also need to decide which detail levels to apply. This decision is fairly straightforward depending on what is known about the true mean function. In general, the more applicable detail vectors that we can use in Equation (9), the more confidence we will be able to attribute to our conclusions. So long as the mean function is smooth except at the change-point location, our model assumptions apply and at least the finest three or four detail levels should be applied. If more information about the mean is available, it may be optimal to use more detail levels. For example, in the case of a multivariate step function, all detail levels should be applied.
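The sketch below illustrates one way Equation (9) could be evaluated over all candidate change-points, under our reading of that equation and using the Haar wavelet; it is an assumed implementation (function names are ours, and the optional detail-level normalization is omitted), not the authors' code.

```python
import numpy as np

def haar_details(x, n_levels):
    """The n_levels finest levels of Haar detail coefficients of a dyadic-length
    series, via the recursion of Equations (3)-(4); finest level first."""
    w, out = np.asarray(x, dtype=float), []
    for _ in range(n_levels):
        out.append((w[1::2] - w[0::2]) / np.sqrt(2))
        w = (w[1::2] + w[0::2]) / np.sqrt(2)
    return np.concatenate(out)

def changepoint_posterior(X, n_levels=4):
    """Posterior p(tau | D*) over change-point locations, following our reading
    of Equation (9); X is an (N x p) array with N a power of two."""
    N, p = X.shape
    D = np.column_stack([haar_details(X[:, j], n_levels) for j in range(p)])  # m x p matrix D*
    m = D.shape[0]
    S0 = D.T @ D                                         # sum_{j,k} d*_{jk} d*_{jk}^T
    logpost = np.full(N - 1, -np.inf)
    for tau in range(1, N):
        h = (np.arange(1, N + 1) > tau).astype(float)    # one column of the idealized matrix H
        q = haar_details(h, n_levels)                    # q_{jk}, identical across dimensions
        C = float(np.sum(q ** 2))
        if C <= 0:
            continue                                     # step invisible at the retained levels
        B = D.T @ q                                      # sum_{j,k} q_{jk} d*_{jk}
        S = S0 - np.outer(B, B) / C
        sign, logdet = np.linalg.slogdet(S)              # S assumed positive definite (m >> p)
        logpost[tau - 1] = -0.5 * np.log(C) - 0.5 * (m - p - 1) * logdet
    post = np.exp(logpost - logpost.max())
    return post / post.sum()
```

With this sketch, post.argmax() + 1 plays the role of τ̂ = arg max p(τ | D*), and approximate credible intervals can be read off the normalized posterior.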

4. Bayesian-wavelet approach to detecting the existence of a change-point


We determine the existence of a change-point by taking a model selection approach and applying a form of the Schwarz information criterion (SIC). Let M_1 denote the model in which a single change-point occurs in the mean function of our time series and let M_2 denote the model where no change occurs. We first compute the likelihood of observing the data under each of these two models. Since it is unclear which constants will cancel in the ratio of these two models, we must retain them throughout the calculations. Then, similar to our previous derivation of Equation (9), we obtain the following


likelihood for M_1:

$$P(D^{*} \,|\, M_1) = K\,(2\pi)^{(p-mp)/2}\, 2^{mp/2}\, \Gamma_p\!\left(\frac{m}{2}\right) C^{-1/2} \left| \sum_{j}\sum_{k} d^{*}_{jk} d^{*T}_{jk} - \frac{1}{C} BB^{T} \right|^{-(m-p-1)/2}, \qquad (10)$$

where Γ_p(·) is the multivariate gamma function, defined as

$$\Gamma_p(x) = \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma\!\left[x + \frac{1-i}{2}\right],$$

K is a constant common to both models, and all other terms are as previously defined.
In M_2, calculations are simplified since δ is assumed to be the p-dimensional zero vector. Once again adopting a similar approach as before, we obtain the likelihood of observing our data under M_2:

$$P(D^{*} \,|\, M_2) = K\,(2\pi)^{-mp/2}\, 2^{(m+1)p/2}\, \Gamma_p\!\left(\frac{m+1}{2}\right) \left| \sum_{j}\sum_{k} d^{*}_{jk} d^{*T}_{jk} \right|^{-(m-p)/2}. \qquad (11)$$

We note the difference in the number of free parameters in M_1 and M_2 is k_1 − k_2 = p, namely the dimension of δ. This suggests a form of the SIC,

$$\Delta(\mathrm{SIC}) = 2\left(\log P(D^{*} \,|\, M_1) - \log P(D^{*} \,|\, M_2)\right) - (k_1 - k_2)\log N.$$

For our multi-dimensional change-point problem, we maximize Equation (10) for τ to obtain our final result,

$$\Delta(\mathrm{SIC}) = 2\left(\log P(D^{*} \,|\, M_1) - \log P(D^{*} \,|\, M_2)\right) - p\log N, \qquad (12)$$

where Equation (12) implicitly assumes equal prior probability of realizing either M_1 or M_2. In certain instances the modeller may have reason to favour one model over the other, and so the prior odds ratio of the two models would not be 1. Recall that the posterior odds ratio may be expressed as

$$\frac{P(M_1 \,|\, D^{*})}{P(M_2 \,|\, D^{*})} = \frac{P(D^{*} \,|\, M_1)}{P(D^{*} \,|\, M_2)} \cdot \frac{P(M_1)}{P(M_2)} = \text{Bayes Factor} \times \frac{P(M_1)}{P(M_2)}. \qquad (13)$$

We may modify Equation (12) to incorporate a prior belief to a priori favour one model over the other. In our setting this may be accomplished by substituting the data-dependent terms in Equation (12) with 2 times the log of Equation (13). For the later examples and simulations we provide, we note that each model is given equal weight and Equation (12) is implemented in its present form.
Our selection process is now a straightforward calculation of Δ(SIC). We select the no-change model when Δ(SIC) < 0 and infer a change-point exists in the time series when Δ(SIC) > 0. We note slightly positive values (e.g. Δ(SIC) ≤ 3) should be treated with caution. Although the change-point model is favoured in such cases, the evidence is not particularly strong. Values computed farther from zero (i.e. Δ(SIC) > 3) denote strong evidence of the existence of a change-point, with more assurance obtained with larger computed values.
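Assuming the forms of Equations (10)–(12) as written above, the Δ(SIC) computation can be sketched as follows (again our own hedged illustration, reusing the hypothetical haar_details helper from Section 3; the common constant K cancels and is dropped).

```python
import numpy as np
from scipy.special import multigammaln       # log of the multivariate gamma function

def delta_sic(X, n_levels=4):
    """Delta(SIC) of Equation (12): positive values favour the change-point model M1.
    P(D*|M1) is maximized over tau, as described in the text."""
    N, p = X.shape
    D = np.column_stack([haar_details(X[:, j], n_levels) for j in range(p)])
    m = D.shape[0]
    S0 = D.T @ D
    # log P(D* | M2), Equation (11), dropping K
    logM2 = (-m * p / 2) * np.log(2 * np.pi) + ((m + 1) * p / 2) * np.log(2) \
            + multigammaln((m + 1) / 2, p) - ((m - p) / 2) * np.linalg.slogdet(S0)[1]
    # log P(D* | M1), Equation (10), maximized over tau and dropping K
    logM1 = -np.inf
    for tau in range(1, N):
        q = haar_details((np.arange(1, N + 1) > tau).astype(float), n_levels)
        C = float(np.sum(q ** 2))
        if C <= 0:
            continue
        B = D.T @ q
        S = S0 - np.outer(B, B) / C
        ll = ((p - m * p) / 2) * np.log(2 * np.pi) + (m * p / 2) * np.log(2) \
             + multigammaln(m / 2, p) - 0.5 * np.log(C) \
             - ((m - p - 1) / 2) * np.linalg.slogdet(S)[1]
        logM1 = max(logM1, ll)
    return 2 * (logM1 - logM2) - p * np.log(N)
```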

5. Extending to the case of multiple change-points


The preceding methods may be extended to the case of time series containing multiple statistical
change-points. In this section we demonstrate how the so-called binary segmentation algorithm may be applied in conjunction with the methods developed in Sections 3 and 4 to


(1) estimate the number of change-points in a nonlinear multivariate time series and (2) estimate the
locations of these change-points. In Section 6 we also provide an illustrative example of how this may
be applied to a data set containing multiple change-points.
Assume we observe a p-dimensional time series, X = {x_i}_{i=1}^{N}, where N ∈ ℕ, such that

$$x_i \sim N_p(\mu_i, \Sigma).$$


We assume Σ is an unknown constant covariance matrix throughout the time series, while μ_i is determined by an unknown multivariate mean function g(·) smoothly varying except at the set of points {τ_i}_{i=1}^{M}. We focus our attention on determining M and each τ_i. The binary segmentation algorithm may now be applied as follows (a code sketch is given below):
(1) Apply Equation (12) to the time series X. If Δ(SIC) < 3, terminate the algorithm and conclude the time series has no change-points.
(2) Apply Equation (9) and record the change-point location τ̂.
(3) Segment the original time series into two time series, from elements 1 through τ̂ and τ̂ + 1 through N.
(4) Return to step 1 for each segment.
The algorithm runs until all segments terminate. This approach may be efficiently applied to time
series with an arbitrary number of change-points. Furthermore, no new theoretical machinery is
required thereby simplifying its implementation.
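A sketch of the binary segmentation recursion is given below; it reuses the hypothetical delta_sic, changepoint_posterior, and reflect_to_dyadic helpers from the earlier sketches (the last applied column by column), and segments shorter than min_len are simply not split further.

```python
import numpy as np

def binary_segmentation(X, min_len=16, threshold=3.0):
    """Recursive binary segmentation: test each segment with Delta(SIC) of Eq. (12)
    and, when a change is indicated, locate it with Eq. (9) before splitting."""
    found = []

    def recurse(lo, hi):                            # examine the segment X[lo:hi]
        n = hi - lo
        if n < min_len:
            return
        pad = (1 << int(np.ceil(np.log2(n)))) - n   # front padding to dyadic length
        seg = np.column_stack([reflect_to_dyadic(X[lo:hi, j]) for j in range(X.shape[1])])
        if delta_sic(seg) < threshold:              # step 1: no evidence of a change-point
            return
        tau_pad = int(changepoint_posterior(seg).argmax()) + 1   # step 2: locate the change
        tau = lo + tau_pad - pad                    # map back to the original time axis
        if tau <= lo:
            return                                  # estimate fell inside the padded prefix
        found.append(tau)
        recurse(lo, tau)                            # steps 3-4: split and repeat on each piece
        recurse(tau, hi)

    recurse(0, X.shape[0])
    return sorted(found)
```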
One technical issue the practitioner should be aware of concerns instances when the signal-to-noise ratio is not sufficiently high for the algorithm to pick out the exact change-point location. For example, if in step 2 of the algorithm the change-point location estimate misses by even one time point, then the subsequent segmented time series will contain a change-point already accounted for in the previous step. If this possibility is not accounted for in advance, the algorithm could incorrectly indicate the presence of false change-points. One possibility for addressing this issue is to not allow change-points within a fixed distance of each other. In practice, when change-points are at least five time units away from each other, this problem is not encountered for time series whose dimensional components have a signal-to-noise ratio of at least one. Alternatively, if the algorithm returns multiple change-points adjacent to each other, then the modeller may often safely interpret this as representing a single change-point.

6. Examples
6.1. Illustrative examples
We provide an illustrative example to demonstrate how our Bayesian-wavelet approach to the multivariate change-point problem easily adapts to various mean functions. For this example, we simulate
data from a three-dimensional normal distribution centred around 0 for the first 85 elements of the
time series and then introduce a shift of 1 unit in the first and third dimensions for the remaining
43 elements. The covariance matrix remains constant throughout the time series and has 0.25 on all
diagonal elements and 0 on all off-diagonal elements. Figure 3 depicts a plot of our time series where
the shift in the first and third dimensions is visually evident.
Applying a classical likelihood-based approach to this time series correctly returns time point 85
as the estimated change-point location. A purely Bayesian approach such as the one described by
Perreault et al. [15] also returns time point 85 as the estimated change-point location along
with a 95% credible interval of [84, 86]. Applying our Bayesian-wavelet approach we first calculate the
SIC using Equation (12) to determine the existence of a change-point. Equation (12) returns a value of 53.25, providing us with near certainty that a change-point exists in the data. Estimating the change-point location with Equation (9) also correctly returns the change-point location at time point 85.
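For readers who want to reproduce the flavour of this example, the sketch below simulates data with the stated structure and runs the hypothetical changepoint_posterior and delta_sic helpers from Sections 3 and 4 (exact numbers will differ from those reported here, since the simulated noise differs).

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, tau = 128, 3, 85
mean = np.zeros((N, p))
mean[tau:, [0, 2]] = 1.0          # unit shift in dimensions 1 and 3 after time point 85
X = mean + rng.multivariate_normal(np.zeros(p), 0.25 * np.eye(p), size=N)

print(delta_sic(X, n_levels=7))                  # all detail levels of a length-128 series
post = changepoint_posterior(X, n_levels=7)
print(post.argmax() + 1)                         # MAP estimate, expected to be near 85
```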


Figure 3. A three-dimensional time series where the mean function is a three-dimensional step function. In particular, a shift occurs at time point 85 in the first and third dimensions.

Figure 4. Marginal posterior distribution of Equation (9) applied to the time series in Figure 3 with all detail levels used (left) and only the four highest detail levels used (right). Notice in each case the concentrated probability is correctly centred at time point 85, but with a slightly wider credible interval for the case on the right when not all detail coefficients are used.

In this case, since the true underlying mean function is a multivariate step function, all detail levels should be applied. In practice, we may not know the structure of the mean function and so would only apply the four highest detail levels. In both cases the Bayesian-wavelet method correctly estimates the change-point location, but with different 95% credible intervals. With all detail levels used we obtain a 95% credible interval of [84, 86], and in the second case, with only the four highest levels used, we obtain a slightly less precise 95% credible interval of [82, 89] (see Figure 4).
To illustrate the power of the Bayesian-wavelet approach, suppose we now impose one period of a sine wave on the same data set in each dimension. This new data set now represents the scenario where the mean function of our time series is nonlinear. Figure 5 depicts this new time series, where we see the change-point at time point 85 is much more obscured. Applying the likelihood and pure Bayesian approaches to this time series with a nonlinear mean function returns meaningless results, as the assumptions upon which they are based are now violated. Directly inputting the new time series into the MLE algorithm, for example, incorrectly estimates the change-point location at time 63.
Our Bayesian-wavelet approach, however, easily adapts to this more complicated situation. Using the four highest detail coefficient levels we calculate an SIC of 12.5, indicating the presence of a change-point in the time series. Maximizing Equation (9) for τ correctly estimates the change-point location once again at time point 85. Figure 6 displays the relative probabilities for the change-point location, with a slightly less concentrated 95% credible interval of [82, 88].
As a final illustrative example, we generate a five-dimensional time series now with multiple change-points at time points 50, 100, 150, and 200. Figure 7 illustrates the first dimension of this time series, where segments 1, 2, 3, 4, and 5 are centred around mean vectors μ_1^T = (0, 0, 0, 0, 0), μ_2^T = (1, 1, 1, 1, 1), μ_3^T = (0.5, 0.5, 0.5, 0.5, 0.5), μ_4^T = (2, 2, 2, 2, 2), and μ_5^T = (0.5, 0.5, 0.5, 0.5, 0.5), respectively.


Figure 5. This is the same data set as in Figure 3, only now with one period of the trigonometric function sin(2πt/128) added to the elements in each dimension.

Figure 6. Marginal posterior distribution from the time series in Figure 5 with concentrated probability at the correct change-point at time point 85.

Applying Equation (12) to the original time series returns a value of 70.5, indicating with near certainty the presence of a statistical change-point; we therefore apply Equation (9) to estimate the location of the change-point. The first application of Equation (9) estimates the change-point location at time point 200, corresponding to the largest shift of the time series. Next we segment the time series into time points 1–200 and 201–256 and repeat this process. Continuing in such a way until all segments terminate, the algorithm correctly estimates the presence of four statistical change-points, at time points 51, 100, 151, and 200, each with an associated 95% credible interval within 5 time units of the actual change-point location.
6.2. Practical example
We present a practical example implementing the methods developed in this article, involving six hydrological sequences in the Northern Québec Labrador region, as represented in Figure 8. In particular, we analyse the streamflow, in units of l/(km² s), measured in the springs from 1957 to 1995. It has been noted that a perceptible general decrease in streamflow seemed to occur in the 1980s in this region. The regional proximity of the rivers suggests a likely relationship between the rivers, but the specific covariance structure is unclear a priori. Hence, a multivariate analysis certainly appears more appropriate than six individual univariate river studies.


Figure 7. The left figure represents the first dimension of a five-dimensional time series with change-points at time points 50, 100, 150, and 200. The right figure delineates the time series into segments as estimated by the binary segmentation algorithm in conjunction with Equations (9) and (12).

Figure 8. Plots of riverflows of six rivers (Churchill Falls, Romaine, Outardes, Manicouagan, Sainte-Marguerite, and à la Baleine) in the Northern Québec Labrador region. The dashed lines for à la Baleine are years where river flows are estimated from a linear regression since the actual data are unavailable.

The assertion is that, due to causes attributed perhaps to climate change or other regional factors, a change-point in streamflow has occurred. Applying our methods, we would like to determine whether or not they support this assertion and, if so, estimate the change-point year.
Perreault et al. [15] originally applied a retrospective Bayesian change-point analysis to this data set. The principal advantage of our Bayesian-wavelet method over Perreault's pure Bayesian approach to this data set is that our method applies even if the true underlying mean function is not a step function. Perreault spends considerable time justifying rather strict assumptions on the data and the choice of hyperparameters used in the model. While Perreault's analysis appears largely valid in this case, the strict assumptions required by such a purely Bayesian approach limit its applicability in more general contexts and often make conclusions less compelling. With the Bayesian-wavelet approach, however, we have no need to elicit informative priors for the mean vectors both before and after the


unknown change-point, nor for the covariance matrix, to construct our model. As discussed above, we require only that the true underlying mean function be smooth except at the single change-point and that the random component be normally distributed.

Figure 9. Posterior distribution of a change-point for the six hydrological sequences in the Northern Québec Labrador region.
To begin our analysis, we note measurements for one river, à la Baleine, are unavailable for the years 1957–1962 inclusive. To handle this discrepancy we took two different approaches. In the first case, we simply analysed the data for the common years from 1963 to 1995 inclusive. In the second approach, we treat river flows for à la Baleine as a dependent variable and perform a linear regression for the years with complete data against the other five rivers. With the linear model in hand, we estimate river flows for à la Baleine for the years 1957–1962 from the linear model using the data from the other rivers with complete data sets. The dashed line in Figure 8 for à la Baleine represents these estimated values. After a comparison of our analyses, we find very similar results are obtained in both cases. As such, we present results from only the latter case.
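The regression imputation used for the missing à la Baleine values can be sketched as follows (our own illustration of the approach described above; the array Y and its layout are assumptions).

```python
import numpy as np

def impute_by_regression(Y, target_col):
    """Fill missing values (np.nan) in one column of Y by ordinary least squares
    on the remaining columns, fitted on the rows with complete data.
    Here Y would be the 39 x 6 array of spring streamflows for 1957-1995."""
    others = np.delete(np.arange(Y.shape[1]), target_col)
    missing = np.isnan(Y[:, target_col])
    A = np.column_stack([np.ones(Y.shape[0]), Y[:, others]])   # intercept plus the other rivers
    beta, *_ = np.linalg.lstsq(A[~missing], Y[~missing, target_col], rcond=None)
    Y = Y.copy()
    Y[missing, target_col] = A[missing] @ beta                 # predicted flows for 1957-1962
    return Y
```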
We implement the Daubechies 10-tap wavelet since it has known properties particularly well suited to detecting abrupt time series change.[32] Based on Perreault's analysis, the mean function is some unknown multivariate step function. If this property actually holds, we should be able to apply all detail levels with the Bayesian-wavelet method and arrive at the same answer. Standardizing detail coefficients as described in Section 3, we thus apply all detail coefficients in our analysis. Finally, we note this time series is not of dyadic length, as required to apply any DWT. We remedy this situation by simply reflecting the beginning of the time series to achieve the required dyadic length, as described in Section 3.
With our wavelet parameters in hand, we next must determine whether or not a statistical change-point in the mean vector even exists in our data set. A computation of the SIC returns a value of 14.53, which represents strong evidence for the existence of a statistical change-point. Next, we estimate the location of the change-point by maximizing the Bayesian-wavelet change-point equation for τ. This returns the year 1984 as the change-point location estimate, with posterior probability of nearly 0.85. Furthermore, we note a 90% credible interval around this estimate of the change-point location ranges over [1983, 1986] (see Figure 9). We note these results are similar to Perreault, who also estimated the change-point year as 1984, but with a 90% credible interval of [1983, 1985].[15]


7. Simulations
In order to compare the performance of the Bayesian-wavelet method with a likelihood-based
method, we ran simulations and compared how often the estimate of the change-point was within
two time units of the true change-point.


7.1. Multivariate step mean function


For the simulations in this section we generate multivariate time series with an underlying mean function represented as a multivariate step function. The time series length in each case is 128, where the change-point is randomly selected somewhere in the middle 90% of the time series elements. Before the change-point, elements are centred around the zero mean vector, μ = (0, 0, ..., 0). After the change-point the mean vector shifts to μ = (δ, δ, ..., δ). Furthermore, we generate simulated data for two separate covariance matrices: Σ_1 = I, the identity covariance matrix, and then Σ_2, a covariance matrix with 1s along the diagonal and 0.5s on all off-diagonal elements. We record the percentage of runs in which each method estimates the change-point location within two time units of the true change-point location, for each 1000-simulation run.
Before we can begin our simulations, we must decide which detail levels and which wavelet to apply. In the case of a stationary time series with a single change-point, there is no underlying trend contributing to time series change except the single change-point. In this case, the change information contained in the detail coefficients pertains only to the change-point itself. Hence, we apply all wavelet details in our simulations. For the wavelet function itself, we present results from the Haar wavelet, although applying the Daubechies 10-tap wavelet yielded similar results (Table 1).
We compared the effectiveness of the Bayesian-wavelet and likelihood methods for estimating
the change-point location by varying the dimension, jump size, and covariance matrix of our time
series. It is interesting to note that the simulation results suggest there is very little difference between
the likelihood and Bayesian-wavelet approach in correctly estimating the change-point. Furthermore,
what differences may exist become less important as the dimension increases. Thus, we obtain comparable results to the likelihood method with our Bayesian-wavelet method without the same stringent
likelihood time series assumptions.
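One replication of this design can be sketched as below (our own illustration using the hypothetical changepoint_posterior helper; the tables that follow are based on 1000 replications per configuration and also include the MLE comparison, which we omit here).

```python
import numpy as np

def one_replication(p=10, delta=1.0, rho=0.0, N=128, rng=None):
    """One run of the Section 7.1 design: step shift of size delta in every dimension
    at a change-point drawn from the middle 90% of the series, with unit variances and
    common off-diagonal correlation rho (rho = 0 gives I, rho = 0.5 gives Sigma_2).
    Returns True when the estimate lands within two time units of the truth."""
    rng = rng or np.random.default_rng()
    tau = int(rng.integers(int(0.05 * N), int(0.95 * N)))
    Sigma = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=N)
    X[tau:] += delta
    post = changepoint_posterior(X, n_levels=int(np.log2(N)))   # all detail levels
    return abs(int(post.argmax()) + 1 - tau) <= 2

rng = np.random.default_rng(0)
print(np.mean([one_replication(p=10, delta=1.0, rho=0.5, rng=rng) for _ in range(200)]))
```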
Table 1. Percentage of change-point estimations within two time units of the actual change-point after 1000 simulations. In all cases the initial mean vector is μ = (0, 0, ..., 0) and then shifts to μ = (δ, δ, ..., δ).

                              Time series dimension
Method   δ     Σ       –      –      10     25     50     75     100
BW      0.5    I      0.36   0.60   0.79   0.98   0.98   0.98   0.93
MLE     0.5    I      0.37   0.59   0.79   0.94   0.98   0.98   0.92
BW      1.0    I      0.77   0.96   0.99   1.00   1.00   1.00   1.00
MLE     1.0    I      0.77   0.96   0.99   1.00   1.00   1.00   1.00
BW      1.5    I      0.95   0.99   1.00   1.00   1.00   1.00   1.00
MLE     1.5    I      0.96   0.99   1.00   1.00   1.00   1.00   1.00
BW      2.0    I      0.99   1.00   1.00   1.00   1.00   1.00   1.00
MLE     2.0    I      0.99   1.00   1.00   1.00   1.00   1.00   1.00
BW      0.5    Σ2     0.21   0.14   0.20   0.13   0.07   0.06   0.05
MLE     0.5    Σ2     0.22   0.16   0.21   0.13   0.08   0.07   0.05
BW      1.0    Σ2     0.60   0.56   0.71   0.62   0.41   0.24   0.12
MLE     1.0    Σ2     0.60   0.56   0.71   0.63   0.41   0.26   0.14
BW      1.5    Σ2     0.85   0.81   0.89   0.87   0.76   0.57   0.32
MLE     1.5    Σ2     0.84   0.81   0.90   0.88   0.76   0.57   0.31
BW      2.0    Σ2     0.96   0.97   0.98   0.97   0.91   0.88   0.54
MLE     2.0    Σ2     0.96   0.97   0.98   0.97   0.91   0.88   0.53

Notes: BW indicates the Bayesian-wavelet approach and MLE indicates the maximum likelihood estimation approach. Simulations are conducted with two covariance matrices, the identity covariance matrix (I) and a covariance matrix with 1s along the diagonal and .5s on all off-diagonal elements (Σ2).


Table 2. Percentage each method estimates the change-point location within 2 time units of the true change-point location, where each run represents 1000 simulations. In all cases the initial mean vector is μ = (sin(2πt/128), sin(2πt/128), ..., sin(2πt/128)) and then shifts to μ = (sin(2πt/128) + 1, sin(2πt/128) + 1, ..., sin(2πt/128) + 1).

                              Time series dimension
Method   σ²            –      –      10     25     50     –      –
BW      0.2           0.98   0.99   1.00   1.00   1.00   1.00   1.00
MLE     0.2           0.00   0.00   0.00   0.00   0.00   0.00   0.00
BW      0.4           0.88   0.99   0.99   1.00   1.00   1.00   1.00
MLE     0.4           0.00   0.00   0.00   0.00   0.00   0.00   0.00
BW      0.6           0.71   0.92   0.99   0.99   0.99   1.00   1.00
MLE     0.6           0.00   0.00   0.00   0.00   0.00   0.00   0.00
BW      0.8           0.60   0.87   0.94   0.97   0.99   1.00   1.00
MLE     0.8           0.01   0.01   0.00   0.00   0.00   0.00   0.00
BW      1.0           0.53   0.78   0.89   0.96   0.97   1.00   1.00
MLE     1.0           0.02   0.01   0.00   0.00   0.00   0.00   0.00
BW      1.2           0.44   0.70   0.89   0.90   0.95   0.99   1.00
MLE     1.2           0.01   0.00   0.00   0.00   0.00   0.00   0.00
BW      1.4           0.39   0.62   0.76   0.83   0.89   0.99   0.99
MLE     1.4           0.01   0.01   0.00   0.00   0.00   0.00   0.00
BW      1.6           0.31   0.54   0.69   0.79   0.87   0.98   0.99
MLE     1.6           0.01   0.00   0.00   0.00   0.00   0.00   0.00

Notes: BW indicates the Bayesian-wavelet approach and MLE indicates the maximum likelihood estimation approach. Throughout the simulations the covariance matrix used is the identity multiplied by σ².

7.2. Multivariate piecewise smooth function with a single mean function shift
We next investigate how these methods perform when the underlying mean function does not conform to a multivariate step function. In particular, since the Bayesian-wavelet method requires only that the underlying mean function be smooth except at the change-point, we consider a multivariate time series with a nonconstant, smoothly varying mean function.
We generate time series with a smoothly varying mean function except at a single change-point. Specifically, we set the initial mean to μ_t = sin(2πt/128)·1, t = 1, 2, ..., τ, and then after the change-point the mean vector becomes μ_t = sin(2πt/128)·1 + 1, t = τ + 1, τ + 2, ..., 128. That is, the shift vector is δ = (1, 1, ..., 1) for all simulations. We then incrementally adjust the variance of the additive noise by changing the diagonal terms of the covariance matrix. We set our covariance matrix equal to the identity multiplied by the constant σ², as given in Table 2. The change-point is randomly selected from the middle 90% of the time series and the Daubechies 10-tap wavelet is applied using the four highest detail coefficient levels.
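A single draw from this design looks as follows (our own sketch; note it reuses the Haar-based changepoint_posterior helper for simplicity, whereas the results in Table 2 use the Daubechies 10-tap wavelet).

```python
import numpy as np

def sine_shift_series(p=10, sigma2=0.6, N=128, rng=None):
    """One series from the Section 7.2 design: sinusoidal mean in every dimension,
    a unit shift after a random change-point, and covariance sigma2 times the identity."""
    rng = rng or np.random.default_rng()
    tau = int(rng.integers(int(0.05 * N), int(0.95 * N)))
    t = np.arange(1, N + 1)
    mean = np.tile(np.sin(2 * np.pi * t / 128), (p, 1)).T
    mean[tau:] += 1.0                                   # shift vector delta = (1, ..., 1)
    X = mean + np.sqrt(sigma2) * rng.standard_normal((N, p))
    return X, tau

X, tau = sine_shift_series()
post = changepoint_posterior(X, n_levels=4)             # four finest detail levels
print(tau, int(post.argmax()) + 1)
```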
Simulation results provide evidence that the Bayesian-wavelet method does well in seeing through the additive noise component of the time series and estimating the true change-point location. Applying Equation (9) exactly as we did in Section 7.1, only now with just the four highest detail levels and the Daubechies 10-tap wavelet, we have a method that easily adapts to estimate change-points in a very different time series. Methods such as MLE or a purely Bayesian approach that make strict assumptions on the true underlying mean function do not share this same flexibility. We see the underlying form of the oscillating mean function violates the likelihood assumptions in such a way that this method has no ability to correctly estimate the change-point location. Only in the lower dimensional cases with high variance, when the time series more closely resembles pure noise, does the MLE register a few correct estimates by chance alone. In the other cases the geometry of the time series forces the MLE method away from the true change-point location.

8. Conclusion
In this article we presented a methodology for both inferring the existence of one or more statistical change-points in a multivariate time series and estimating their location. We see this general


approach is not limited to just changes in mean, but can also be adapted to estimate covariance structure change-point locations as well. Finally, it can be shown that Equation (5) is invariant to dimension-preserving linear transformations. This property suggests applications to the change-point problem
for high dimensional time series in conjunction with a dimension reduction through a random matrix
multiplication. All these topics are currently under investigation.
Another interesting aspect of this approach is how it may be used as an indirect tool to validate
certain data set assumptions. When parametric methods such as MLE or purely Bayesian models
are applied to infer and estimate the location of a single change-point in a multivariate time series,
the true underlying mean function is typically a multivariate step function. In principle using all
detail levels of our Bayesian-wavelet method should always return very nearly identical change-point
location estimates in such cases. If a discrepancy exists between the above parametric methods with
our Bayesian-wavelet method, then either the time series signal-to-noise ratio is not sufficiently high
or the model assumptions are simply not valid.
We found our multivariate Bayesian-wavelet approach for detecting statistical change-points performs comparably with the classical likelihood method when the true mean function of the time
series is a multivariate step function. The advantage to our approach is seen in how our method also
easily extends to more general situations. The simulations demonstrate how the likelihood method
fails when its model assumptions become invalid, but also show how the Bayesian-wavelet method still performs well. We chose a multivariate trigonometric function as an example in our simulations, but the
approach applies equally well to any other such piecewise smooth multivariate functions. We thus
conclude that the Bayesian-wavelet method affords the modeller greater flexibility in much more general situations and potentially serves as a valuable diagnostic tool in the setting of the multivariate
change-point problem.

Acknowledgments
We would like to thank both Professor Darrin Speegle and the anonymous referees for their careful consideration of
this paper. Their suggestions and helpful advice certainly improved the final form of this paper.

Disclosure statement
No potential conflict of interest was reported by the authors.

References
[1] Montgomery D. Introduction to statistical quality control. 6th ed. Hoboken, NJ: Wiley; 2009.
[2] Worsley K. On the likelihood ratio test for a shift in location of normal populations. J Amer Statist Assoc. 1979;74:365–367.
[3] Smith AFM. A Bayesian approach to inference about a change-point in a sequence of random variables. Biometrika. 1975;62:407–416.
[4] Barry D, Hartigan J. A Bayesian analysis for change point problems. J Amer Statist Assoc. 1993;88(421):309–319.
[5] Chib S. Estimation and comparison of multiple change-point models. J Econ. 1998;86(2):221–241.
[6] Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–732.
[7] Chen J, Gupta AK. Testing and locating variance change-points with application to stock prices. J Amer Statist Assoc. 1997;92:739–747.
[8] Zamba K, Hawkins D. A multivariate change-point model for change in mean vector and/or covariance structure. J Quality Technol. 2009;41(3).
[9] Carlin B, Gelfand A, Smith A. Hierarchical Bayesian analysis of changepoint problems. Appl Stat. 1992;389–405.
[10] Pettitt A. A non-parametric approach to the change-point problem. Appl Stat. 1979;126–135.
[11] Bai J. Estimation of a change point in multiple regression models. Rev Econ Stat. 1997;79(4):551–563.
[12] Sullivan J, Woodall W. Change-point detection of mean vector or covariance matrix shifts using multivariate individual observations. IIE Trans. 2000;32(6):537–549.
[13] Chen J, Gupta AK. Parametric statistical change point analysis. New York: Birkhäuser; 2012.
[14] Horváth L, Kokoszka P. Testing for changes in multivariate dependent observations with an application to temperature changes. J Multivariate Anal. 1999;68:96–119.
[15] Perreault L, Parent E, Bernier J, Bobée B, Parent E. Retrospective multivariate Bayesian change-point analysis: a simultaneous single change in the mean of several hydrological sequences. J Multivariate Anal. 2000;235:221–241.
[16] Son YS, Kim SW. Bayesian single change point detection in a sequence of multivariate normal observations. Statistics. 2005;39(5):373–387.
[17] Müller HG. Change-points in nonparametric regression analysis. Ann Stat. 1992;20:737–761.
[18] Ogden R, Lynch J. Bayesian analysis of change-point models. Lecture Notes Stat. 1999;141:67–82.
[19] Ciuperca G. Estimating nonlinear regression with and without change-points by the LAD method. Ann Inst Stat Math. 2011;63:717–743.
[20] Battaglia F, Protopapas MK. Multi-regime models for nonlinear nonstationary time series. Comput Stat. 2012;27:319–341.
[21] Matteson DS, James NA. A nonparametric approach for multiple change point analysis of multivariate data. J Amer Statist Assoc. 2014;109:334–345.
[22] Mason R, Young J. Multivariate statistical process control with industrial applications. Philadelphia, PA: Society for Industrial and Applied Mathematics; 2002.
[23] Perreault L, Bernier J, Bobée B, Parent E. Change-point analysis in hydrometeorological time series. Part 1. The normal model revisited. J Multivariate Anal. 2000;235:221–241.
[24] Wagner M. Handbook of biosurveillance. Burlington, MA: Elsevier Academic Press; 2006.
[25] Daubechies I. Ten lectures on wavelets. Philadelphia, PA: Society for Industrial and Applied Mathematics; 1992.
[26] Nason G. Wavelet methods in statistics with R. New York: Springer Science+Business Media, LLC; 2008.
[27] Vidakovic B. Statistical modeling by wavelets. Danvers, MA: Wiley; 1999.
[28] Mallat SG. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell. 1989;11(7):674–693.
[29] Donoho DL, Johnstone JM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81:425–455.
[30] Mardia K, Kent J, Bibby J. Multivariate analysis. New York: Academic Press; 1979.
[31] Wang Y. Jump and sharp cusp detection by wavelets. Biometrika. 1995;82:385–397.
[32] Jensen A, la Cour-Harbo A. Ripples in mathematics: the discrete wavelet transform. Berlin: Springer; 2001.

Appendix
We derive Equation (9) beginning with the posterior distribution

$$p(\tau, \delta, \Sigma \,|\, D^{*}) \propto |\Sigma|^{-m/2} \exp\left( -\frac{1}{2} \sum_{j}\sum_{k} (d^{*}_{jk} - \delta q_{jk})^{T} \Sigma^{-1} (d^{*}_{jk} - \delta q_{jk}) \right) |\Sigma|^{-1/2}.$$

Here, m represents the actual number of detail coefficients used in the analysis. We integrate out δ and Σ to obtain the marginal posterior distribution function

$$p(\tau \,|\, D^{*}) \propto \int_{PD(p)} \int_{\mathbb{R}^{p}} |\Sigma|^{-(m+1)/2} \exp\left( -\frac{1}{2} \sum_{j}\sum_{k} (d^{*}_{jk} - \delta q_{jk})^{T} \Sigma^{-1} (d^{*}_{jk} - \delta q_{jk}) \right) d\delta\, d\Sigma, \qquad (A1)$$

where PD(p) represents the space of p-dimensional positive-definite matrices.


Notice, by how Q is defined, that all the elements of any q_{jk} are identical. With this observation in mind, we let q_{jk} be a scalar representative for a given row of Q, corresponding to the value of each element in that particular row. Next, we let δ represent a vector of the mean function shift at the change-point in the natural way. With this change of notation in hand, we may equivalently write Equation (A1) as

$$p(\tau \,|\, D^{*}) \propto \int_{PD(p)} \int_{\mathbb{R}^{p}} |\Sigma|^{-(m+1)/2} \exp\left( -\frac{1}{2} \sum_{j}\sum_{k} (d^{*}_{jk} - q_{jk}\delta)^{T} \Sigma^{-1} (d^{*}_{jk} - q_{jk}\delta) \right) d\delta\, d\Sigma. \qquad (A2)$$

Expanding the exponent of Equation (A2), we obtain

$$p(\tau \,|\, D^{*}) \propto \int_{PD(p)} \int_{\mathbb{R}^{p}} |\Sigma|^{-(m+1)/2} \exp\left( -\frac{1}{2} \sum_{j}\sum_{k} \left( d^{*T}_{jk}\Sigma^{-1}d^{*}_{jk} + q_{jk}^{2}\,\delta^{T}\Sigma^{-1}\delta - q_{jk}\,\delta^{T}\Sigma^{-1}d^{*}_{jk} - q_{jk}\,d^{*T}_{jk}\Sigma^{-1}\delta \right) \right) d\delta\, d\Sigma$$
$$\propto \int_{PD(p)} \int_{\mathbb{R}^{p}} |\Sigma|^{-(m+1)/2} \exp\left( -\frac{1}{2} \left( A + C\,\delta^{T}\Sigma^{-1}\delta - \delta^{T}\Sigma^{-1}B - B^{T}\Sigma^{-1}\delta \right) \right) d\delta\, d\Sigma, \qquad (A3)$$


where

$$A = \sum_{j}\sum_{k} d^{*T}_{jk}\Sigma^{-1}d^{*}_{jk}, \quad B = \sum_{j}\sum_{k} q_{jk}\,d^{*}_{jk}, \quad B^{T} = \sum_{j}\sum_{k} q_{jk}\,d^{*T}_{jk}, \quad\text{and}\quad C = \sum_{j}\sum_{k} q_{jk}^{2}.$$

Continuing from Equation (A3), we provide the following detailed calculations:

$$p(\tau \,|\, D^{*}) \propto \int_{PD(p)} \int_{\mathbb{R}^{p}} |\Sigma|^{-(m+1)/2} \exp\left( -\frac{C}{2}\left[ \left(\delta - \frac{B}{C}\right)^{T}\Sigma^{-1}\left(\delta - \frac{B}{C}\right) + \frac{A}{C} - \frac{B^{T}\Sigma^{-1}B}{C^{2}} \right] \right) d\delta\, d\Sigma$$
$$= \int_{PD(p)} |\Sigma|^{-(m+1)/2} \exp\left( -\frac{1}{2}\left[ A - \frac{1}{C}B^{T}\Sigma^{-1}B \right] \right) \int_{\mathbb{R}^{p}} \exp\left( -\frac{C}{2}\left(\delta - \frac{B}{C}\right)^{T}\Sigma^{-1}\left(\delta - \frac{B}{C}\right) \right) d\delta\, d\Sigma$$
$$\propto \int_{PD(p)} |\Sigma|^{-(m+1)/2}\, |\Sigma|^{1/2}\, C^{-1/2} \exp\left( -\frac{1}{2}\left[ A - \frac{1}{C}B^{T}\Sigma^{-1}B \right] \right) d\Sigma$$
$$= \int_{PD(p)} |\Sigma|^{-m/2}\, C^{-1/2} \exp\left( -\frac{1}{2}\left[ \sum_{j}\sum_{k} d^{*T}_{jk}\Sigma^{-1}d^{*}_{jk} - \frac{1}{C}B^{T}\Sigma^{-1}B \right] \right) d\Sigma$$
$$= \int_{PD(p)} |\Sigma|^{-m/2}\, C^{-1/2} \exp\left( -\frac{1}{2}\,\mathrm{tr}\!\left[ \Sigma^{-1}\left( \sum_{j}\sum_{k} d^{*}_{jk}d^{*T}_{jk} - \frac{1}{C}BB^{T} \right) \right] \right) d\Sigma$$
$$\propto C^{-1/2} \left| \sum_{j}\sum_{k} d^{*}_{jk}d^{*T}_{jk} - \frac{1}{C}BB^{T} \right|^{-(m-p-1)/2}, \qquad (A4)$$

where in the last step Equation (9) follows by dropping multiplicative constants and applying the known form of the Wishart distribution.
