Stochastic Modelling
CTSM 2.3
Mathematics Guide
Contents

1 The mathematics behind the algorithms of CTSM
  1.1 Parameter estimation
    1.1.1 Model structures
    1.1.2 Parameter estimation methods
    1.1.3 Filtering methods
    1.1.4 Data issues
    1.1.5 Optimisation issues
    1.1.6 Performance issues
  1.2 Other features
    1.2.1 Various statistics
    1.2.2 Validation data generation

References
1 The mathematics behind the algorithms of CTSM
The following is a complete mathematical outline of the algorithms of CTSM.
1.1 Parameter estimation
The primary feature of CTSM is estimation of parameters in continuous-discrete stochastic state space models on the basis of experimental data.
1.1.1 Model structures

CTSM differentiates between three different model structures for continuous-discrete stochastic state space models as outlined in the following.
1.1.1.1 Nonlinear models

The most general of these model structures is the nonlinear (NL) model, which can be described by the following equations:

dx_t = f(x_t, u_t, t, θ) dt + σ(u_t, t, θ) dω_t   (1.1)

y_k = h(x_k, u_k, t_k, θ) + e_k   (1.2)

where t is time, x_t is the state vector, u_t is the input vector, y_k is the output vector, θ is the parameter vector, {ω_t} is a standard Wiener process and e_k is a Gaussian white noise measurement error with covariance S(u_k, t_k, θ).
1.1.1.2 Linear time-varying models

A special case of the nonlinear model is the linear time-varying (LTV) model, which can be described by the following equations:

dx_t = ( A(x_t, u_t, t, θ) x_t + B(x_t, u_t, t, θ) u_t ) dt + σ(u_t, t, θ) dω_t   (1.3)

y_k = C(x_k, u_k, t_k, θ) x_k + D(x_k, u_k, t_k, θ) u_k + e_k   (1.4)

1.1.1.3 Linear time-invariant models

A special case of the linear time-varying model is the linear time-invariant (LTI) model, which can be described by the following equations:

dx_t = ( A(θ) x_t + B(θ) u_t ) dt + σ(θ) dω_t   (1.5)

y_k = C(θ) x_k + D(θ) u_k + e_k   (1.6)
1.1.2 Parameter estimation methods

1.1.2.1 Maximum likelihood estimation

Given a particular model structure and a sequence of measurements:

Y_N = [y_N, y_{N-1}, …, y_1, y_0]   (1.7)

maximum likelihood (ML) estimation of the unknown parameters amounts to maximizing the likelihood function, i.e. the joint probability density:

L(θ; Y_N) = p(Y_N | θ)   (1.8)

or equivalently:

L(θ; Y_N) = ( ∏_{k=1}^{N} p(y_k | Y_{k-1}, θ) ) p(y_0 | θ)   (1.9)
where the rule P(A ∩ B) = P(A|B)P(B) has been applied to form a product of conditional probability densities. In order to obtain an exact evaluation of the likelihood function, the initial probability density p(y_0|θ) must be known and all subsequent conditional densities must be determined by successively solving Kolmogorov's forward equation and applying Bayes' rule (Jazwinski, 1970), but this approach is computationally infeasible in practice. However,
since the diffusion terms in the above model structures do not depend on the
state variables, a simpler alternative can be used. More specifically, a method
based on Kalman filtering can be applied for LTI and LTV models, and an
approximate method based on extended Kalman filtering can be applied for
NL models. The latter approximation can be applied, because the stochastic
differential equations considered are driven by Wiener processes, and because
increments of a Wiener process are Gaussian, which makes it reasonable to
assume, under some regularity conditions, that the conditional densities can be
well approximated by Gaussian densities. The Gaussian density is completely
characterized by its mean and covariance, so by introducing the notation:
ŷ_{k|k-1} = E{ y_k | Y_{k-1}, θ }   (1.10)

R_{k|k-1} = V{ y_k | Y_{k-1}, θ }   (1.11)

and:

ε_k = y_k − ŷ_{k|k-1}   (1.12)

the likelihood function can be written as follows:
L(θ; Y_N) = ( ∏_{k=1}^{N} exp( −½ ε_k^T R_{k|k-1}^{-1} ε_k ) / ( √(det(R_{k|k-1})) (√(2π))^l ) ) p(y_0 | θ)   (1.13)
where, for given parameters and initial states, ε_k and R_{k|k-1} can be computed by means of a Kalman filter (LTI and LTV models) or an extended Kalman filter (NL models) as shown in Sections 1.1.3.1 and 1.1.3.2 respectively. Further conditioning on y_0 and taking the negative logarithm gives:
−ln(L(θ; Y_N | y_0)) = ½ Σ_{k=1}^{N} ( ln(det(R_{k|k-1})) + ε_k^T R_{k|k-1}^{-1} ε_k ) + ½ ( Σ_{k=1}^{N} l ) ln(2π)   (1.14)
and ML estimates of the parameters (and optionally of the initial states) can
now be determined by solving the following nonlinear optimisation problem:
θ̂ = arg min_θ { −ln(L(θ; Y_N | y_0)) }   (1.15)
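As an illustration, the evaluation of the negative log-likelihood (1.14) for given innovations and covariances can be sketched as follows. This is a minimal Python sketch, not CTSM code; the innovations ε_k and covariances R_{k|k-1} are assumed to be supplied by a (extended) Kalman filter:

```python
import numpy as np

def negative_log_likelihood(innovations, covariances):
    """Evaluate eq (1.14) from innovations eps_k (dimension l each)
    and their covariances R_{k|k-1}."""
    nll = 0.0
    for eps, R in zip(innovations, covariances):
        l = eps.shape[0]
        # 0.5 * (ln det R + eps^T R^{-1} eps) + 0.5 * l * ln(2*pi)
        sign, logdet = np.linalg.slogdet(R)
        nll += 0.5 * (logdet + eps @ np.linalg.solve(R, eps))
        nll += 0.5 * l * np.log(2.0 * np.pi)
    return nll
```

Minimizing this quantity over θ (with the filter re-run for each trial θ) is exactly the optimisation problem (1.15).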
1.1.2.2 Maximum a posteriori estimation

If prior information about the parameters is available in the form of a prior probability density function p(θ), Bayes' rule can be applied to give an improved estimate by forming the posterior probability density function:

p(θ | Y_N) = p(Y_N | θ) p(θ) / p(Y_N) ∝ p(Y_N | θ) p(θ)   (1.16)
and subsequently finding the parameters that maximize this function, i.e. by performing maximum a posteriori (MAP) estimation. A nice feature of this expression is the fact that it reduces to the likelihood function when no prior information is available (p(θ) uniform), making ML estimation a special case of MAP estimation. In fact, this formulation also allows MAP estimation on a subset of the parameters (p(θ) partly uniform). By introducing the notation:

μ_θ = E{θ}   (1.17)

Σ_θ = V{θ}   (1.18)

and:

ε_θ = θ − μ_θ   (1.19)
and by assuming that the prior probability density of the parameters is Gaussian, the posterior probability density function can be written as follows:
p(θ | Y_N) ∝ ( ∏_{k=1}^{N} exp( −½ ε_k^T R_{k|k-1}^{-1} ε_k ) / ( √(det(R_{k|k-1})) (√(2π))^l ) ) p(y_0 | θ) × exp( −½ ε_θ^T Σ_θ^{-1} ε_θ ) / ( √(det(Σ_θ)) (√(2π))^p )   (1.20)
Further conditioning on y_0 and taking the negative logarithm gives:

−ln(p(θ | Y_N, y_0)) = ½ Σ_{k=1}^{N} ( ln(det(R_{k|k-1})) + ε_k^T R_{k|k-1}^{-1} ε_k ) + ½ ( ( Σ_{k=1}^{N} l ) + p ) ln(2π) + ½ ln(det(Σ_θ)) + ½ ε_θ^T Σ_θ^{-1} ε_θ   (1.21)
and MAP estimates of the parameters (and optionally of the initial states) can
now be determined by solving the following nonlinear optimisation problem:
θ̂ = arg min_θ { −ln(p(θ | Y_N, y_0)) }   (1.22)
1.1.2.3 Using multiple independent data sets

If multiple consecutive, but yet separate, sequences of measurements are available, a similar approach can be applied by expanding the expression for the posterior probability density function to:

p(θ | Y) ∝ ( ∏_{i=1}^{S} ∏_{k=1}^{N_i} exp( −½ (ε_k^i)^T (R_{k|k-1}^i)^{-1} ε_k^i ) / ( √(det(R_{k|k-1}^i)) (√(2π))^l ) p(y_0^i | θ) ) × exp( −½ ε_θ^T Σ_θ^{-1} ε_θ ) / ( √(det(Σ_θ)) (√(2π))^p )   (1.23)

where:

Y = [Y_{N_1}^1, Y_{N_2}^2, …, Y_{N_i}^i, …, Y_{N_S}^S]   (1.24)
and where the individual sequences of measurements are assumed to be stochastically independent. This formulation allows MAP estimation on multiple data sets, but, as special cases, it also allows ML estimation on multiple data sets (p(θ) uniform), MAP estimation on a single data set (S = 1) and ML estimation on a single data set (p(θ) uniform, S = 1). Further conditioning on:
y_0 = [y_0^1, y_0^2, …, y_0^i, …, y_0^S]   (1.25)

and taking the negative logarithm gives:

−ln(p(θ | Y, y_0)) = ½ Σ_{i=1}^{S} Σ_{k=1}^{N_i} ( ln(det(R_{k|k-1}^i)) + (ε_k^i)^T (R_{k|k-1}^i)^{-1} ε_k^i ) + ½ ( ( Σ_{i=1}^{S} Σ_{k=1}^{N_i} l ) + p ) ln(2π) + ½ ln(det(Σ_θ)) + ½ ε_θ^T Σ_θ^{-1} ε_θ   (1.26)
and estimates of the parameters (and optionally of the initial states) can now
be determined by solving the following nonlinear optimisation problem:
θ̂ = arg min_θ { −ln(p(θ | Y, y_0)) }   (1.27)

1.1.3 Filtering methods
CTSM computes the innovation vectors ε_k (or ε_k^i) and their covariance matrices R_{k|k-1} (or R_{k|k-1}^i) recursively by means of a Kalman filter (LTI and LTV models) or an extended Kalman filter (NL models) as outlined in the following.
1.1.3.1 Kalman filtering

For LTI and LTV models, ε_k (or ε_k^i) and R_{k|k-1} (or R_{k|k-1}^i) can be computed for a given set of parameters θ and initial states x_0 by means of a continuous-discrete Kalman filter, i.e. by means of the output prediction equations:

ŷ_{k|k-1} = C x̂_{k|k-1} + D u_k   (1.28)

R_{k|k-1} = C P_{k|k-1} C^T + S   (1.29)

the innovation equation:

ε_k = y_k − ŷ_{k|k-1}   (1.30)

the Kalman gain equation:

K_k = P_{k|k-1} C^T R_{k|k-1}^{-1}   (1.31)

the updating equations:

x̂_{k|k} = x̂_{k|k-1} + K_k ε_k   (1.32)

P_{k|k} = P_{k|k-1} − K_k R_{k|k-1} K_k^T   (1.33)

and the state prediction equations:

dx̂_{t|k}/dt = A x̂_{t|k} + B u_t , t ∈ [t_k, t_{k+1}[   (1.34)

dP_{t|k}/dt = A P_{t|k} + P_{t|k} A^T + σσ^T , t ∈ [t_k, t_{k+1}[   (1.35)

with initial conditions:

x̂_{t_k|k} = x̂_{k|k}   (1.36)

P_{t_k|k} = P_{k|k}   (1.37)
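The measurement-update part of the filter, eqs (1.28)-(1.33), can be sketched as follows. This is an illustrative Python sketch, not the CTSM implementation (in particular, CTSM propagates factorized covariances instead, cf. Section 1.1.3.5):

```python
import numpy as np

def kalman_update(x_pred, P_pred, y, u, C, D, S):
    """One measurement update of the Kalman filter, eqs (1.28)-(1.33).
    Returns the innovation, its covariance and the updated state/covariance."""
    y_pred = C @ x_pred + D @ u                  # (1.28)
    R = C @ P_pred @ C.T + S                     # (1.29)
    eps = y - y_pred                             # (1.30)
    K = P_pred @ C.T @ np.linalg.inv(R)          # (1.31)
    x_upd = x_pred + K @ eps                     # (1.32)
    P_upd = P_pred - K @ R @ K.T                 # (1.33)
    return eps, R, x_upd, P_upd
```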
The state prediction equations (1.34) and (1.35) can be replaced by their discrete time counterparts, which can be derived from the solution to the stochastic differential equation:
dx_t = ( A x_t + B u_t ) dt + σ dω_t , t ∈ [t_k, t_{k+1}[   (1.38)

i.e. from:

x_{t_{k+1}} = e^{A(t_{k+1} − t_k)} x_{t_k} + ∫_{t_k}^{t_{k+1}} e^{A(t_{k+1} − s)} B u_s ds + ∫_{t_k}^{t_{k+1}} e^{A(t_{k+1} − s)} σ dω_s   (1.39)
which yields:

x̂_{k+1|k} = E{ x_{t_{k+1}} | x_{t_k} } = e^{A(t_{k+1} − t_k)} x̂_{k|k} + ∫_{t_k}^{t_{k+1}} e^{A(t_{k+1} − s)} B u_s ds   (1.40)

and:

P_{k+1|k} = V{ x_{t_{k+1}} | x_{t_k} } = e^{A(t_{k+1} − t_k)} P_{k|k} e^{A^T(t_{k+1} − t_k)} + ∫_{t_k}^{t_{k+1}} e^{A(t_{k+1} − s)} σσ^T e^{A^T(t_{k+1} − s)} ds   (1.41)

where, assuming first order hold on the inputs between sampling instants:

u_s = u_k + α(s − t_k) , s ∈ [t_k, t_{k+1}[   (1.42)

whereas, assuming zero order hold:

u_s = u_k , s ∈ [t_k, t_{k+1}[   (1.43)
In order to be able to use (1.40) and (1.41), the integrals of both equations must be computed. For this purpose the equations are rewritten to:

x̂_{k+1|k} = e^{A(t_{k+1} − t_k)} x̂_{k|k} + ∫_{t_k}^{t_{k+1}} e^{A(t_{k+1} − s)} B u_s ds
          = e^{A(t_{k+1} − t_k)} x̂_{k|k} + ∫_{t_k}^{t_{k+1}} e^{A(t_{k+1} − s)} B ( α(s − t_k) + u_k ) ds
          = Φ_s x̂_{k|k} + ∫_0^{τ_s} e^{As} B ( α(τ_s − s) + u_k ) ds
          = Φ_s x̂_{k|k} − ∫_0^{τ_s} e^{As} s ds B α + ∫_0^{τ_s} e^{As} ds B ( α τ_s + u_k )   (1.44)
and:

P_{k+1|k} = e^{A(t_{k+1} − t_k)} P_{k|k} e^{A^T(t_{k+1} − t_k)} + ∫_{t_k}^{t_{k+1}} e^{A(t_{k+1} − s)} σσ^T e^{A^T(t_{k+1} − s)} ds
          = e^{Aτ_s} P_{k|k} e^{A^Tτ_s} + ∫_0^{τ_s} e^{As} σσ^T e^{A^Ts} ds
          = Φ_s P_{k|k} Φ_s^T + ∫_0^{τ_s} e^{As} σσ^T e^{A^Ts} ds   (1.45)

where τ_s = t_{k+1} − t_k, Φ_s = e^{Aτ_s} and:

α = ( u_{k+1} − u_k ) / ( t_{k+1} − t_k )   (1.46)
The integral in (1.45), and Φ_s itself, can be computed by means of a matrix exponential (van Loan, 1978):

exp( [ [−A , σσ^T] , [0 , A^T] ] τ_s ) = [ [H_1(τ_s) , H_2(τ_s)] , [0 , H_3(τ_s)] ]   (1.47)

where:

Φ_s = e^{Aτ_s} = H_3(τ_s)^T   (1.48)

and:

∫_0^{τ_s} e^{As} σσ^T e^{A^Ts} ds = H_3(τ_s)^T H_2(τ_s)   (1.49)

Alternatively, the integral can be determined from the identity:

Φ_s σσ^T Φ_s^T − σσ^T = A ∫_0^{τ_s} e^{As} σσ^T e^{A^Ts} ds + ∫_0^{τ_s} e^{As} σσ^T e^{A^Ts} ds A^T   (1.50)
but this approach has been found to be less feasible. The integrals in (1.44) are not as easy to deal with, especially if A is singular. However, this problem can be solved by introducing the singular value decomposition (SVD) of A, i.e. A = UΣV^T, transforming the integrals and subsequently computing these, i.e.:

∫_0^{τ_s} e^{As} s ds = U ∫_0^{τ_s} U^T e^{As} U s ds U^T = U ∫_0^{τ_s} e^{Ãs} s ds U^T   (1.51)

where Ã = U^T A U = ΣV^T U. If A is singular, the matrix Ã has a special structure:

Ã = [ [Ã_1 , Ã_2] , [0 , 0] ]   (1.52)

where Ã_1 is a nonsingular matrix whose dimension equals the number of nonzero singular values, and where the zero rows correspond to the zero singular values. Expanding the matrix exponential in a power series and using Ã^n = [ [Ã_1^n , Ã_1^{n-1}Ã_2] , [0 , 0] ], n ≥ 1, gives:

e^{Ãs} = I + Ãs + ½Ã²s² + ⋯ = [ [e^{Ã_1 s} , Ã_1^{-1}(e^{Ã_1 s} − I)Ã_2] , [0 , I] ]

and hence:

∫_0^{τ_s} e^{Ãs} s ds = [ [Ã_1^{-1}( e^{Ã_1 τ_s} τ_s − Ã_1^{-1}(e^{Ã_1 τ_s} − I) ) , Ã_1^{-1}( Ã_1^{-1}( e^{Ã_1 τ_s} τ_s − Ã_1^{-1}(e^{Ã_1 τ_s} − I) ) − ½Iτ_s² )Ã_2] , [0 , ½Iτ_s²] ]   (1.53)

Similarly, the transformed transition matrix is given by:

Φ̃_s = U^T Φ_s U = e^{Ãτ_s} = [ [e^{Ã_1 τ_s} , Ã_1^{-1}(e^{Ã_1 τ_s} − I)Ã_2] , [0 , I] ]   (1.54)

and:

∫_0^{τ_s} e^{As} ds = U ∫_0^{τ_s} U^T e^{As} U ds U^T = U ∫_0^{τ_s} e^{Ãs} ds U^T   (1.55)

with:

∫_0^{τ_s} e^{Ãs} ds = [ [Ã_1^{-1}(e^{Ã_1 τ_s} − I) , Ã_1^{-1}( Ã_1^{-1}(e^{Ã_1 τ_s} − I) − Iτ_s )Ã_2] , [0 , Iτ_s] ]   (1.56)
Using these integrals, and using j to index the (possibly subsampled) time instants within each sample interval, the discrete time state prediction equation (1.44) becomes:

x̂_{j+1|j} = Φ_s x̂_{j|j} − U ( ∫_0^{τ_s} e^{Ãs} s ds ) U^T B α + U ( ∫_0^{τ_s} e^{Ãs} ds ) U^T B ( α τ_s + u_j )   (1.57)

with:

∫_0^{τ_s} e^{Ãs} ds = [ [Ã_1^{-1}(e^{Ã_1 τ_s} − I) , Ã_1^{-1}( Ã_1^{-1}(e^{Ã_1 τ_s} − I) − Iτ_s )Ã_2] , [0 , Iτ_s] ]   (1.58)

and:

∫_0^{τ_s} e^{Ãs} s ds = [ [Ã_1^{-1}( e^{Ã_1 τ_s} τ_s − Ã_1^{-1}(e^{Ã_1 τ_s} − I) ) , Ã_1^{-1}( Ã_1^{-1}( e^{Ã_1 τ_s} τ_s − Ã_1^{-1}(e^{Ã_1 τ_s} − I) ) − ½Iτ_s² )Ã_2] , [0 , ½Iτ_s²] ]   (1.59)

For zero order hold on the inputs (α = 0) the prediction equation reduces to:

x̂_{j+1|j} = Φ_s x̂_{j|j} + U ( ∫_0^{τ_s} e^{Ãs} ds ) U^T B u_j   (1.60)

with:

∫_0^{τ_s} e^{Ãs} ds = [ [Ã_1^{-1}(e^{Ã_1 τ_s} − I) , Ã_1^{-1}( Ã_1^{-1}(e^{Ã_1 τ_s} − I) − Iτ_s )Ã_2] , [0 , Iτ_s] ]   (1.61)

If A is nonsingular, the SVD and the corresponding transformation are not needed, and for first order hold on the inputs the prediction equation becomes:

x̂_{j+1|j} = Φ_s x̂_{j|j} − ( ∫_0^{τ_s} e^{As} s ds ) B α + ( ∫_0^{τ_s} e^{As} ds ) B ( α τ_s + u_j )   (1.62)

with:

∫_0^{τ_s} e^{As} ds = A^{-1}( Φ_s − I )   (1.63)

and:

∫_0^{τ_s} e^{As} s ds = A^{-1}( Φ_s τ_s − A^{-1}( Φ_s − I ) )   (1.64)

whereas for zero order hold on the inputs:

x̂_{j+1|j} = Φ_s x̂_{j|j} + ( ∫_0^{τ_s} e^{As} ds ) B u_j   (1.65)

with:

∫_0^{τ_s} e^{As} ds = A^{-1}( Φ_s − I )   (1.66)

Finally, if A = 0, in which case Φ_s = I, the corresponding results for first order hold on the inputs are:

x̂_{j+1|j} = x̂_{j|j} − ( ∫_0^{τ_s} e^{As} s ds ) B α + ( ∫_0^{τ_s} e^{As} ds ) B ( α τ_s + u_j )   (1.67)

with:

∫_0^{τ_s} e^{As} ds = Iτ_s   (1.68)

and:

∫_0^{τ_s} e^{As} s ds = ½Iτ_s²   (1.69)

and for zero order hold on the inputs:

x̂_{j+1|j} = x̂_{j|j} + ( ∫_0^{τ_s} e^{As} ds ) B u_j   (1.70)

with:

∫_0^{τ_s} e^{As} ds = Iτ_s   (1.71)
1.1.3.2 Extended Kalman filtering

For NL models, ε_k (or ε_k^i) and R_{k|k-1} (or R_{k|k-1}^i) can be computed for a given set of parameters θ and initial states x_0 by means of a continuous-discrete extended Kalman filter, i.e. by means of the output prediction equations:

ŷ_{k|k-1} = h( x̂_{k|k-1}, u_k, t_k, θ )   (1.72)

R_{k|k-1} = C P_{k|k-1} C^T + S   (1.73)

the innovation equation:

ε_k = y_k − ŷ_{k|k-1}   (1.74)

the Kalman gain equation:

K_k = P_{k|k-1} C^T R_{k|k-1}^{-1}   (1.75)

the updating equations:

x̂_{k|k} = x̂_{k|k-1} + K_k ε_k   (1.76)

P_{k|k} = P_{k|k-1} − K_k R_{k|k-1} K_k^T   (1.77)

where:

A = ∂f/∂x_t |_{x=x̂_{k|k-1}, u=u_k, t=t_k, θ} , C = ∂h/∂x_t |_{x=x̂_{k|k-1}, u=u_k, t=t_k, θ}   (1.78)

and σ = σ(u_k, t_k, θ), S = S(u_k, t_k, θ), and the state prediction equations:

dx̂_{t|k}/dt = f( x̂_{t|k}, u_t, t, θ ) , t ∈ [t_k, t_{k+1}[   (1.79)

dP_{t|k}/dt = A P_{t|k} + P_{t|k} A^T + σσ^T , t ∈ [t_k, t_{k+1}[   (1.80)

Within CTSM the code needed to evaluate the Jacobians is generated through analytical manipulation using a method based on the algorithms of Speelpenning (1980).
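As noted above, CTSM generates analytic Jacobian code. Purely as an illustration of what A = ∂f/∂x_t represents, a forward-difference approximation could be sketched as follows (the function f and the step delta here are placeholders, not part of CTSM):

```python
import numpy as np

def jacobian_fd(f, x, delta=1e-7):
    """Forward-difference approximation of the Jacobian df/dx at x.
    Illustrative only; CTSM uses analytically generated Jacobians."""
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += delta
        J[:, j] = (np.asarray(f(xp)) - fx) / delta
    return J
```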
Being nonlinear, the state prediction equations (1.79) and (1.80) cannot in general be solved analytically. Within CTSM they are therefore solved by subsampling the prediction interval [t_k, t_{k+1}[ into shorter intervals [t_j, t_{j+1}[ and linearizing the model locally, i.e. by solving:

dx̂_{t|j}/dt = f( x̂_{j|j-1}, u_j, t_j, θ ) + A( x̂_{t|j} − x̂_{j|j-1} ) + B α(t − t_j) , t ∈ [t_j, t_{j+1}[   (1.81)

and:

dP_{t|j}/dt = A P_{t|j} + P_{t|j} A^T + σσ^T , t ∈ [t_j, t_{j+1}[   (1.82)

where:

A = ∂f/∂x_t |_{x=x̂_{j|j-1}, u=u_j, t=t_j, θ} , B = ∂f/∂u_t |_{x=x̂_{j|j-1}, u=u_j, t=t_j, θ}   (1.83)

and σ = σ(u_j, t_j, θ), S = S(u_j, t_j, θ). The solution to (1.82) is equivalent to the solution to (1.35), i.e.:

P_{j+1|j} = Φ_s P_{j|j} Φ_s^T + ∫_0^{τ_s} e^{As} σσ^T e^{A^Ts} ds   (1.84)

with τ_s = t_{j+1} − t_j and:

Φ_s = e^{Aτ_s}   (1.85)

The solution to (1.81) can be found by introducing:

α = ( u_{j+1} − u_j ) / ( t_{j+1} − t_j )   (1.86)

and the transformation:

z_t = U^T x̂_{t|j}   (1.87)

based on the singular value decomposition A = UΣV^T.
Rewriting (1.81) in terms of the transformed variables gives:

dz_t/dt = U^T f + U^T A U U^T ( x̂_{t|j} − x̂_{j|j-1} ) + U^T B α(t − t_j)
        = f̃ + Ã( z_t − z_j ) + B̃ α(t − t_j) , t ∈ [t_j, t_{j+1}[   (1.88)

where f̃ = U^T f, Ã = U^T A U = ΣV^T U and B̃ = U^T B. Now, if A is singular, the matrix Ã has a special structure:

Ã = [ [Ã_1 , Ã_2] , [0 , 0] ]   (1.89)
which makes it possible to split up the previous result in two distinct equations:
dz^1_t/dt = f̃_1 + Ã_1( z^1_t − z^1_j ) + Ã_2( z^2_t − z^2_j ) + B̃_1 α(t − t_j) , t ∈ [t_j, t_{j+1}[

dz^2_t/dt = f̃_2 + B̃_2 α(t − t_j) , t ∈ [t_j, t_{j+1}[   (1.90)

which can then be solved one at a time for the transformed variables. Solving the equation for z^2_t, with the initial condition z^2_{t=t_j} = z^2_j, yields:

z^2_t = z^2_j + f̃_2 (t − t_j) + ½ B̃_2 α (t − t_j)² , t ∈ [t_j, t_{j+1}[   (1.91)
which can then be substituted into the equation for z^1_t to give:

dz^1_t/dt = f̃_1 + Ã_1( z^1_t − z^1_j ) + Ã_2( f̃_2 (t − t_j) + ½ B̃_2 α (t − t_j)² ) + B̃_1 α(t − t_j) , t ∈ [t_j, t_{j+1}[   (1.92)

Introducing:

E = ½ Ã_2 B̃_2 α , F = Ã_2 f̃_2 + B̃_1 α , G = f̃_1 − Ã_1 z^1_j   (1.93)

this can be rewritten as:

dz^1_t/dt = Ã_1 z^1_t + E(t − t_j)² + F(t − t_j) + G , t ∈ [t_j, t_{j+1}[   (1.94)
The complete solution to this ordinary differential equation is:

z^1_t = e^{Ã_1 t} ( ∫ e^{−Ã_1 t} ( E(t − t_j)² + F(t − t_j) + G ) dt + c ) , t ∈ [t_j, t_{j+1}[   (1.95)

which can be rearranged to:

z^1_t = −Ã_1^{-1}( I(t − t_j)² + 2Ã_1^{-1}(t − t_j) + 2Ã_1^{-2} )E − Ã_1^{-1}( I(t − t_j) + Ã_1^{-1} )F − Ã_1^{-1}G + e^{Ã_1 t} c , t ∈ [t_j, t_{j+1}[   (1.96)
where c is a constant of integration, which can be determined from the initial condition z^1_{t=t_j} = z^1_j, i.e. from:

z^1_j = −Ã_1^{-1}( 2Ã_1^{-2}E + Ã_1^{-1}F + G ) + e^{Ã_1 t_j} c

as:

c = e^{−Ã_1 t_j} ( Ã_1^{-1}( 2Ã_1^{-2}E + Ã_1^{-1}F + G ) + z^1_j )   (1.97)
which in turn gives:

z^1_t = −Ã_1^{-1}( I(t − t_j)² + 2Ã_1^{-1}(t − t_j) + 2Ã_1^{-2} )E − Ã_1^{-1}( I(t − t_j) + Ã_1^{-1} )F − Ã_1^{-1}G + e^{Ã_1(t − t_j)} ( Ã_1^{-1}( 2Ã_1^{-2}E + Ã_1^{-1}F + G ) + z^1_j ) , t ∈ [t_j, t_{j+1}[   (1.98)
Evaluating this expression at t = t_{j+1}, with τ_s = t_{j+1} − t_j and Φ̃_s^1 = e^{Ã_1 τ_s}, gives:

z^1_{j+1} = −Ã_1^{-1}( Iτ_s² + 2Ã_1^{-1}τ_s + 2Ã_1^{-2} )E − Ã_1^{-1}( Iτ_s + Ã_1^{-1} )F − Ã_1^{-1}G + Φ̃_s^1 ( Ã_1^{-1}( 2Ã_1^{-2}E + Ã_1^{-1}F + G ) + z^1_j )

which, upon substituting E, F and G from (1.93) and collecting terms, reduces to:

z^1_{j+1} = z^1_j − ½ Ã_1^{-1} Ã_2 B̃_2 α τ_s² − Ã_1^{-1}( Ã_1^{-1} Ã_2 B̃_2 α + Ã_2 f̃_2 + B̃_1 α )τ_s + Ã_1^{-1}( Φ̃_s^1 − I )( Ã_1^{-1}( Ã_1^{-1} Ã_2 B̃_2 α + Ã_2 f̃_2 + B̃_1 α ) + f̃_1 )   (1.99)
and:

z^2_{j+1} = z^2_j + f̃_2 τ_s + ½ B̃_2 α τ_s²   (1.100)

where Φ̃_s^1 can be obtained from:

Φ̃_s = U^T Φ_s U = [ [Φ̃_s^1 , Φ̃_s^2] , [0 , I] ]   (1.101)

and where the desired solution in terms of the original variables x̂_{j+1|j} can be found by applying the reverse transformation x̂_t = U z_t.

Depending on the specific singularity of A (see Section 1.1.3.3 for details on how this is determined in CTSM) and the particular nature of the inputs, several different cases are possible as shown in the following.
General case: Singular A, first order hold on inputs

In the general case, the extended Kalman filter solution is given as follows:

z^1_{j+1|j} = z^1_{j|j} − ½ Ã_1^{-1} Ã_2 B̃_2 α τ_s² − Ã_1^{-1}( Ã_1^{-1} Ã_2 B̃_2 α + Ã_2 f̃_2 + B̃_1 α )τ_s + Ã_1^{-1}( Φ̃_s^1 − I )( Ã_1^{-1}( Ã_1^{-1} Ã_2 B̃_2 α + Ã_2 f̃_2 + B̃_1 α ) + f̃_1 )   (1.102)

and:

z^2_{j+1|j} = z^2_{j|j} + f̃_2 τ_s + ½ B̃_2 α τ_s²   (1.103)

Special case no. 1: Singular A, zero order hold on inputs

Setting α = 0 in (1.102) and (1.103) gives:

z^1_{j+1|j} = z^1_{j|j} − Ã_1^{-1} Ã_2 f̃_2 τ_s + Ã_1^{-1}( Φ̃_s^1 − I )( Ã_1^{-1} Ã_2 f̃_2 + f̃_1 )   (1.104)

and:

z^2_{j+1|j} = z^2_{j|j} + f̃_2 τ_s   (1.105)

Special case no. 2: Nonsingular A, first order hold on inputs

The solution to this special case can be obtained by removing the SVD-dependent parts, i.e. by replacing z^1_t, Ã_1, B̃_1 and f̃_1 with x̂_t, A, B and f respectively, which gives:

x̂_{j+1|j} = x̂_{j|j} − A^{-1} B α τ_s + A^{-1}( Φ_s − I )( A^{-1} B α + f )   (1.106)
Special case no. 3: Nonsingular A, zero order hold on inputs

The solution to this special case can be obtained by removing the SVD-dependent parts, i.e. by replacing z^1_t, Ã_1, B̃_1 and f̃_1 with x̂_t, A, B and f respectively, and by setting α = 0, which gives:

x̂_{j+1|j} = x̂_{j|j} + A^{-1}( Φ_s − I ) f   (1.107)

Special case no. 4: A = 0, first order hold on inputs

With Φ_s = I, the solution reduces to:

x̂_{j+1|j} = x̂_{j|j} + f τ_s + ½ B α τ_s²   (1.108)

Special case no. 5: A = 0, zero order hold on inputs

x̂_{j+1|j} = x̂_{j|j} + f τ_s   (1.109)
Within CTSM the output prediction equations and the updating equations of the extended Kalman filter can optionally be iterated at each sampling instant to provide a better approximation in the presence of significant nonlinearities (iterated extended Kalman filtering), i.e.:

ŷ^i_{k|k-1} = h( η_i, u_k, t_k, θ )   (1.110)

R^i_{k|k-1} = C_i P_{k|k-1} C_i^T + S   (1.111)

K^i_k = P_{k|k-1} C_i^T ( R^i_{k|k-1} )^{-1}   (1.112)

η_{i+1} = x̂_{k|k-1} + K^i_k ( y_k − h( η_i, u_k, t_k, θ ) − C_i( x̂_{k|k-1} − η_i ) )   (1.113)

starting with η_1 = x̂_{k|k-1} and stopping after a pre-specified number of iterations M, upon which:

x̂_{k|k} = η_M   (1.114)

P_{k|k} = P_{k|k-1} − K^i_k R^i_{k|k-1} ( K^i_k )^T   (1.115)

where:

C_i = ∂h/∂x_t |_{x=η_i, u=u_k, t=t_k, θ}   (1.116)
1.1.3.3 Determination of singularity

Computing the singular value decomposition (SVD) of a matrix is a computationally expensive task, which should be avoided if possible. Within CTSM the determination of whether or not the A matrix is singular, and thus whether or not the SVD should be applied, is therefore not based on the SVD itself, but on an estimate of the reciprocal condition number, i.e.:

κ^{-1} = 1 / ( ||A||_1 ||A^{-1}||_1 )   (1.117)

where ||A||_1 is the 1-norm of the A matrix and ||A^{-1}||_1 is an estimate of the 1-norm of A^{-1}. This quantity can be computed much faster than the SVD, and only if its value is below a certain threshold (e.g. 1e-12) is the SVD applied.
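A sketch of this test follows. Here A^{-1} is formed explicitly for clarity, whereas in practice a 1-norm estimator would be used, as the text above notes:

```python
import numpy as np

def recip_cond_1norm(A):
    """Reciprocal condition number in the 1-norm, eq (1.117)."""
    norm_A = np.linalg.norm(A, 1)
    norm_Ainv = np.linalg.norm(np.linalg.inv(A), 1)
    return 1.0 / (norm_A * norm_Ainv)

def needs_svd(A, threshold=1e-12):
    """Apply the SVD only when A is (close to) singular."""
    try:
        return recip_cond_1norm(A) < threshold
    except np.linalg.LinAlgError:  # exactly singular matrix
        return True
```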
1.1.3.4 Initial states and covariances

In order for the (extended) Kalman filter to work, the initial states x_0 and their covariance matrix P_0 must be specified. Within CTSM the initial states may either be pre-specified or estimated by the program along with the parameters, whereas the initial covariance matrix is calculated in the following way:

P_0 = P_s ∫_{t_0}^{t_1} e^{As} σσ^T ( e^{As} )^T ds   (1.118)

i.e. as the integral of the Wiener process and the dynamics of the system over the first sample, which is then scaled by a pre-specified scaling factor P_s ≥ 1.
1.1.3.5
The (extended) Kalman filter may be numerically unstable in certain situations. The problem arises when some of the covariance matrices, which are
known from theory to be symmetric and positive definite, become non-positive
definite because of rounding errors. Consequently, careful handling of the covariance equations is needed to stabilize the (extended) Kalman filter. Within
CTSM, all covariance matrices are therefore replaced with their square root
free Cholesky decompositions (Fletcher and Powell, 1974), i.e.:
P = LDLT
(1.119)
(1.120)
(1.121)
and for this purpose a number of different methods are available, e.g. the
method described by Fletcher and Powell (1974), which is based on the modified
Givens transformation, and the method described by Thornton and Bierman
(1980), which is based on the modified weighted Gram-Schmidt orthogonalization. Within CTSM the specific implementation of the (extended) Kalman
filter is based on the latter, and this implementation has been proven to have
a high grade of accuracy as well as stability (Bierman, 1977).
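A square root free Cholesky (LDL^T) factorization, eq (1.119), can be sketched as follows. This is illustrative Python, not the Fletcher-Powell or Thornton-Bierman update algorithms themselves:

```python
import numpy as np

def ldl_decompose(P):
    """Factor a symmetric positive definite P as P = L D L^T, eq (1.119),
    with L unit lower triangular and d the diagonal of D."""
    n = P.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for i in range(n):
        d[i] = P[i, i] - np.sum(L[i, :i] ** 2 * d[:i])
        for j in range(i + 1, n):
            L[j, i] = (P[j, i] - np.sum(L[j, :i] * L[i, :i] * d[:i])) / d[i]
    return L, d
```

Propagating L and d instead of P keeps the represented covariance symmetric and positive definite by construction.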
20
1.1.4 Data issues
Raw data sequences are often difficult to use for identification and parameter
estimation purposes, e.g. if irregular sampling has been applied, if there are
occasional outliers or if some of the observations are missing. CTSM also
provides features to deal with these issues, and this makes the program flexible
with respect to the types of data that can be used for the estimation.
1.1.4.1 Irregular sampling

The continuous-discrete state space formulation makes it easy to handle irregular sampling, because the state prediction equations of the (extended) Kalman filter are simply solved over time intervals of varying length.
1.1.4.2 Occasional outliers

To reduce the influence of occasional outliers on the parameter estimation, a robust estimation method is applied, where the quadratic term in the objective function:

ν_k^i = ( ε_k^i )^T ( R_{k|k-1}^i )^{-1} ε_k^i   (1.123)

is replaced with a threshold function φ(ν_k^i), which returns the argument for small values of ν_k^i, but is a linear function of ε_k^i for large values of ν_k^i, i.e.:

φ(ν_k^i) = ν_k^i , for ν_k^i < c²
φ(ν_k^i) = c( 2√(ν_k^i) − c ) , for ν_k^i ≥ c²   (1.124)

where c > 0 is a pre-specified constant (Huber, 1981).
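The threshold function (1.124) can be sketched as:

```python
import numpy as np

def huber_threshold(nu, c):
    """Threshold function (1.124): identity for small arguments,
    linear in the innovation (via sqrt(nu)) for large ones."""
    nu = np.asarray(nu, dtype=float)
    return np.where(nu < c ** 2, nu, c * (2.0 * np.sqrt(nu) - c))
```

Note that the two branches agree at ν = c², so the function is continuous there.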
1.1.4.3 Missing observations
The algorithms of the parameter estimation methods described above also make it easy to handle missing observations, i.e. to account for missing values in the output vector y_k^i, for some i and some k, when calculating the terms:

½ Σ_{i=1}^{S} Σ_{k=1}^{N_i} ( ln(det(R_{k|k-1}^i)) + ( ε_k^i )^T ( R_{k|k-1}^i )^{-1} ε_k^i )   (1.125)

and:

½ ( ( Σ_{i=1}^{S} Σ_{k=1}^{N_i} l ) + p ) ln(2π)   (1.126)
in (1.26). To illustrate this, the case of extended Kalman filtering for NL models is considered, but similar arguments apply in the case of Kalman filtering for LTI and LTV models. The usual way to account for missing or non-informative values in the extended Kalman filter is to formally set the corresponding elements of the measurement error covariance matrix S in (1.73) to infinity, which in turn gives zeroes in the corresponding elements of the inverted output covariance matrix R_{k|k-1}^{-1} and the Kalman gain matrix K_k, meaning that no updating will take place in (1.76) and (1.77) corresponding to the missing values. This approach cannot be used when calculating (1.125) and (1.126), however, because a solution is needed which modifies ε_k^i, R_{k|k-1}^i and l to reflect that the effective dimension of y_k^i is reduced. This is accomplished by replacing (1.2) with the alternative measurement equation:
ȳ_k = E ( h( x_k, u_k, t_k, θ ) + e_k )   (1.127)

where E is an appropriate permutation matrix, constructed from a unit matrix by removing the rows that correspond to the missing values in y_k. If, e.g., the second element of a three-dimensional output vector is missing:

E = [ [1 , 0 , 0] , [0 , 0 , 1] ]   (1.128)
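Constructing E from a list of missing output indices can be sketched as follows (illustrative Python; the argument names are placeholders):

```python
import numpy as np

def permutation_matrix(missing, l):
    """Build the matrix E of (1.127)-(1.128): a unit matrix with the rows
    corresponding to missing elements of the l-dimensional output removed."""
    keep = [i for i in range(l) if i not in set(missing)]
    E = np.zeros((len(keep), l))
    for row, col in enumerate(keep):
        E[row, col] = 1.0
    return E
```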
Equivalently, the equations of the extended Kalman filter are replaced with the following alternative output prediction equations:

ȳ_{k|k-1} = E h( x̂_{k|k-1}, u_k, t_k, θ )   (1.129)

R̄_{k|k-1} = E C P_{k|k-1} C^T E^T + E S E^T   (1.130)

the alternative innovation equation:

ε̄_k = ȳ_k − ȳ_{k|k-1}   (1.131)

the alternative Kalman gain equation:

K_k = P_{k|k-1} C^T E^T R̄_{k|k-1}^{-1}   (1.132)

and the alternative updating equations:

x̂_{k|k} = x̂_{k|k-1} + K_k ε̄_k   (1.133)

P_{k|k} = P_{k|k-1} − K_k R̄_{k|k-1} K_k^T   (1.134)

The state prediction equations remain the same, and the above replacements in turn provide the necessary modifications of (1.125) to:

½ Σ_{i=1}^{S} Σ_{k=1}^{N_i} ( ln(det(R̄_{k|k-1}^i)) + ( ε̄_k^i )^T ( R̄_{k|k-1}^i )^{-1} ε̄_k^i )   (1.135)

and of (1.126) through a correspondingly reduced value of l.
1.1.5 Optimisation issues
CTSM uses a quasi-Newton method based on the BFGS updating formula and
a soft line search algorithm to solve the nonlinear optimisation problem (1.27).
This method is similar to the one described by Dennis and Schnabel (1983), except for the fact that the gradient of the objective function is approximated by a set of finite difference derivatives. In analogy with ordinary Newton-Raphson methods for optimisation, quasi-Newton methods seek a minimum of a nonlinear objective function F(θ): R^p → R, i.e.:
min_θ F(θ)   (1.136)

i.e. a point θ* where the gradient g(θ) = ∂F(θ)/∂θ satisfies:

g(θ*) = 0   (1.137)

Both types of methods are based on the Taylor expansion of g(θ) to first order:

g( θ^i + δ ) = g( θ^i ) + ( ∂g(θ)/∂θ |_{θ=θ^i} ) δ + o(δ)   (1.138)

and, by setting this expression equal to zero, a step δ^i is obtained as the solution to:

H^i δ^i = −g( θ^i )   (1.139)

giving the iteration:

θ^{i+1} = θ^i + δ^i   (1.140)

where:

H^i = ∂g(θ)/∂θ |_{θ=θ^i}   (1.141)
but unfortunately neither the Hessian nor the gradient can be computed explicitly for the optimisation problem (1.27). As mentioned above, the gradient
is therefore approximated by a set of finite difference derivatives, and a secant
approximation based on the BFGS updating formula is applied for the Hessian. It is the use of a secant approximation to the Hessian that distinguishes
quasi-Newton methods from ordinary Newton-Raphson methods.
1.1.5.1 Finite difference derivative approximations

Initially, forward difference approximations are used, i.e.:

g_j( θ^i ) ≈ ( F( θ^i + δ_j e_j ) − F( θ^i ) ) / δ_j , j = 1, …, p   (1.142)

where g_j( θ^i ) is the jth component of g( θ^i ) and e_j is the jth basis vector. The error of this type of approximation is O(δ_j). Subsequently, i.e. when ||g(θ)|| becomes small near a minimum of the objective function, central difference approximations are used instead, i.e.:

g_j( θ^i ) ≈ ( F( θ^i + δ_j e_j ) − F( θ^i − δ_j e_j ) ) / ( 2δ_j ) , j = 1, …, p   (1.143)

for which the error is O(δ_j²). The optimal step lengths are:

δ_j = η^{1/2} θ_j , j = 1, …, p   (1.144)

for forward differences and:

δ_j = η^{1/3} θ_j , j = 1, …, p   (1.145)

for central differences, where η is the relative error of calculating F(θ) (Dennis and Schnabel, 1983).
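The approximations (1.142)-(1.145) can be sketched as follows. This is an illustrative Python sketch (it assumes θ_j ≠ 0, since the step lengths are relative):

```python
import numpy as np

def gradient_fd(F, theta, eta=1e-16, central=False):
    """Finite difference gradient approximations, eqs (1.142)-(1.145).
    eta is the assumed relative error of evaluating F."""
    p = theta.size
    g = np.zeros(p)
    for j in range(p):
        # optimal relative step lengths (1.144)/(1.145)
        delta = (eta ** (1 / 3) if central else eta ** (1 / 2)) * theta[j]
        e = np.zeros(p)
        e[j] = 1.0
        if central:
            g[j] = (F(theta + delta * e) - F(theta - delta * e)) / (2 * delta)
        else:
            g[j] = (F(theta + delta * e) - F(theta)) / delta
    return g
```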
1.1.5.2 The BFGS updating formula

The secant approximation to the Hessian is obtained by means of the BFGS updating formula:

B^{i+1} = B^i + ( y^i (y^i)^T ) / ( (y^i)^T s^i ) − ( B^i s^i (s^i)^T B^i ) / ( (s^i)^T B^i s^i )   (1.146)
where y^i = g( θ^{i+1} ) − g( θ^i ) and s^i = θ^{i+1} − θ^i. Necessary and sufficient conditions for B^{i+1} to be positive definite are that B^i is positive definite and that:

(y^i)^T s^i > 0   (1.147)

This last demand is automatically met by the line search algorithm. Furthermore, since the Hessian approximation is symmetric and positive definite, it can also be written in terms of its square root free Cholesky factors, i.e.:

B^i = L^i D^i (L^i)^T   (1.148)
1.1.5.3 The soft line search algorithm

With δ^i being the secant direction from (1.139) (using H^i = B^i obtained from (1.146)), the idea of the soft line search algorithm is to replace (1.140) with:

θ^{i+1} = θ^i + λ^i δ^i   (1.149)

and choose a value of λ^i > 0 that ensures that the next iterate decreases F(θ) and that (1.147) is satisfied. Often λ^i = 1 will satisfy these demands and (1.149) reduces to (1.140). The soft line search algorithm is globally convergent if each step satisfies two simple conditions. The first condition is that the decrease in F(θ) is sufficient compared to the length of the step s^i = λ^i δ^i, i.e.:

F( θ^{i+1} ) < F( θ^i ) + γ g( θ^i )^T s^i   (1.150)

where γ ∈ ]0, 1[. The second condition is that the step is not too short, i.e.:

g( θ^{i+1} )^T s^i ≥ β g( θ^i )^T s^i   (1.151)

where β ∈ ]γ, 1[, which implies:

(y^i)^T s^i = ( g( θ^{i+1} ) − g( θ^i ) )^T s^i ≥ ( β − 1 ) g( θ^i )^T s^i > 0   (1.152)
which guarantees that (1.147) is satisfied. The method for finding a value of λ^i that satisfies both (1.150) and (1.151) starts out by trying λ^i = λ_p = 1. If this trial value is not admissible because it fails to satisfy (1.150), a decreased value is found by cubic interpolation using F( θ^i ), g( θ^i ), F( θ^i + λ_p δ^i ) and g( θ^i + λ_p δ^i ). If the trial value satisfies (1.150) but not (1.151), an increased value is found by extrapolation. After one or more repetitions, an admissible λ^i is found, because it can be proved that there exists an interval λ^i ∈ [λ_1, λ_2] where (1.150) and (1.151) are both satisfied (Dennis and Schnabel, 1983).
1.1.5.4 Constraints on parameters

To ensure stability in the computation of the objective function, simple constraints are imposed on the parameters, i.e.:

θ_j^min < θ_j < θ_j^max , j = 1, …, p   (1.153)

These constraints are satisfied by solving the optimisation problem with respect to a transformation of the original parameters, i.e.:

θ̃_j = ln( ( θ_j − θ_j^min ) / ( θ_j^max − θ_j ) ) , j = 1, …, p   (1.154)
A problem arises with this type of transformation when θ_j is very close to one of the limits, because the finite difference derivative with respect to θ_j may be close to zero, but this problem is solved by adding an appropriate penalty function to (1.26) to give the following modified objective function:

F(θ) = −ln( p(θ | Y, y_0) ) + P( λ, θ, θ^min, θ^max )   (1.155)

which is then used instead. The penalty function is given as follows:

P( λ, θ, θ^min, θ^max ) = λ Σ_{j=1}^{p} |θ_j^min| / ( θ_j − θ_j^min ) + λ Σ_{j=1}^{p} |θ_j^max| / ( θ_j^max − θ_j )   (1.156)
for |θ_j^min| > 0 and |θ_j^max| > 0, j = 1, …, p. For proper choices of the Lagrange multiplier λ and the limiting values θ_j^min and θ_j^max, the penalty function has no influence on the estimation when θ_j is well within the limits, but will force the finite difference derivative to increase when θ_j is close to one of the limits. Along with the parameter estimates, CTSM computes normalized (by multiplication with the estimates) derivatives of F(θ) and P( λ, θ, θ^min, θ^max ) with respect to the parameters to provide information about the solution. The derivatives of F(θ) should of course be close to zero, and the absolute values of the derivatives of P( λ, θ, θ^min, θ^max ) should not be large compared to the corresponding absolute values of the derivatives of F(θ), because this would indicate that the corresponding parameters are close to one of their limits.
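The transformation (1.154) and its inverse (a logistic mapping back into the interval) can be sketched as:

```python
import numpy as np

def to_unconstrained(theta, lo, hi):
    """Transformation (1.154): maps theta in ]lo, hi[ to the real line."""
    return np.log((theta - lo) / (hi - theta))

def to_constrained(theta_t, lo, hi):
    """Inverse transformation: maps the real line back into ]lo, hi[."""
    return lo + (hi - lo) / (1.0 + np.exp(-theta_t))
```

The optimizer works on the unconstrained variable, so every trial point maps back to a parameter value strictly inside its limits.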
1.1.6 Performance issues

To reduce computation times on multi-processor systems, CTSM supports shared memory parallel computing (see Figure 1.1).
1.2 Other features

1.2.1 Various statistics
An estimate of the uncertainty of the parameter estimates is obtained by using the fact that the estimator is asymptotically Gaussian with covariance:

Σ_θ̂ = H^{-1}   (1.157)

where the matrix H = {h_ij} is given by:

h_ij = −E{ ∂²/∂θ_i∂θ_j ln( p(θ | Y, y_0) ) } , i, j = 1, …, p   (1.158)

and where an approximation to H can be obtained from:

h_ij ≈ −( ∂²/∂θ_i∂θ_j ln( p(θ | Y, y_0) ) ) |_{θ=θ̂} , i, j = 1, …, p   (1.159)

which is the Hessian evaluated at the minimum of the objective function, i.e. H^i|_{θ=θ̂}. As an overall measure of the uncertainty of the parameter estimates,
Figure 1.1. Performance (execution time vs. no. of CPUs) and scalability (no. of CPUs vs. no. of CPUs) of CTSM when using shared memory parallel computing. Solid lines: CTSM values; dashed lines: theoretical values (linear scalability).

the negative logarithm of the determinant of the Hessian can be used, i.e.:

−ln( det( H |_{θ=θ̂} ) )   (1.160)
The lower the value of this statistic, the lower the overall uncertainty of the parameter estimates. A measure of the uncertainty of the individual parameter estimates is obtained by decomposing the covariance matrix as follows:

Σ_θ̂ = σ_θ̂ R σ_θ̂   (1.161)

into σ_θ̂, which is a diagonal matrix of the standard deviations of the parameter estimates, and R, which is the corresponding correlation matrix. The asymptotic Gaussianity of the estimator also allows marginal t-tests to be performed, i.e. tests of the hypothesis:

H0: θ_j = 0   (1.162)

against the corresponding alternative:

H1: θ_j ≠ 0   (1.163)
The test quantity is the value of the parameter estimate divided by its standard deviation, i.e.:

z_t( θ_j ) = θ̂_j / σ_θ̂,j   (1.164)

which, under H0, is asymptotically t-distributed with a number of degrees of freedom DF that equals the total number of observations minus the number of estimated parameters,
where, if there are missing observations in y_k^i for some i and some k, the particular value of l is reduced with the number of missing values in y_k^i. The critical region for a test on significance level α is given as follows:

z_t( θ_j ) < t(DF)_{α/2}  ∨  z_t( θ_j ) > t(DF)_{1−α/2}   (1.165)

and the corresponding p-value is:

P( t < −|z_t( θ_j )|  ∨  t > |z_t( θ_j )| )   (1.166)

In CTSM, the test quantity is converted into a quantity that is approximately standard normal, i.e.:

z_N( θ_j ) = z_t( θ_j ) ( 1 − 1/(4DF) ) / √( 1 + z_t( θ_j )²/(2DF) ) ∼ N(0, 1)   (1.168)
1.2.2 Validation data generation

To facilitate e.g. residual analysis, CTSM can also be used to generate validation data, i.e. state and output estimates corresponding to a given input data set, using either pure simulation, prediction, filtering or smoothing.

1.2.2.1 Pure simulation

The state and output estimates that can be generated by means of pure simulation are x̂_{k|0} and ŷ_{k|0}, k = 0, …, N, along with their standard deviations SD(x̂_{k|0}) = √diag(P_{k|0}) and SD(ŷ_{k|0}) = √diag(R_{k|0}), k = 0, …, N. The estimates are generated by the (extended) Kalman filter without updating.
1.2.2.2 Prediction

The state and output estimates that can be generated by prediction are x̂_{k|k-j}, j ≥ 1, and ŷ_{k|k-j}, j ≥ 1, k = 0, …, N, along with their standard deviations SD(x̂_{k|k-j}) = √diag(P_{k|k-j}) and SD(ŷ_{k|k-j}) = √diag(R_{k|k-j}), k = 0, …, N. The estimates are generated by the (extended) Kalman filter with updating.
1.2.2.3 Filtering

The state estimates that can be generated by filtering are x̂_{k|k}, k = 0, …, N, along with their standard deviations SD(x̂_{k|k}) = √diag(P_{k|k}), k = 0, …, N. The estimates are generated by the (extended) Kalman filter with updating.
1.2.2.4 Smoothing

The state estimates that can be generated by smoothing are x̂_{k|N}, k = 0, …, N, along with their standard deviations SD(x̂_{k|N}) = √diag(P_{k|N}), k = 0, …, N. The estimates are generated by means of a nonlinear smoothing algorithm based on the extended Kalman filter (for a formal derivation of the algorithm, see Gelb (1974)). The starting point is the following set of formulas:

x̂_{k|N} = P_{k|N} ( P_{k|k-1}^{-1} x̂_{k|k-1} + P̄_{k|k}^{-1} x̄_{k|k} )   (1.169)

P_{k|N} = ( P_{k|k-1}^{-1} + P̄_{k|k}^{-1} )^{-1}   (1.170)
which states that the smoothed estimate can be computed by combining a forward filter estimate based only on past information with a backward filter estimate based only on present and future information. The forward filter estimates x̂_{k|k-1}, k = 1, …, N, and their covariance matrices P_{k|k-1}, k = 1, …, N, can be computed by means of the EKF formulation given above, which is straightforward. The backward filter estimates x̄_{k|k}, k = 1, …, N, and their covariance matrices P̄_{k|k}, k = 1, …, N, on the other hand, must be computed using a different set of formulas. In this set of formulas, a transformation of the time variable is used, i.e. τ = t_N − t, which gives the SDE model, on which the backward filter is based, the following system equation:

dx_{t_N−τ} = −f( x_{t_N−τ}, u_{t_N−τ}, t_N−τ, θ ) dτ + σ( u_{t_N−τ}, t_N−τ, θ ) dω_τ   (1.171)
In terms of the transformed backward filter variable s_t = P̄_t^{-1} x̄_t, the formulas (1.169) and (1.170) translate into:

x̂_{k|N} = P_{k|N} ( P_{k|k-1}^{-1} x̂_{k|k-1} + s_{k|k} )   (1.172)

P_{k|N} = ( P_{k|k-1}^{-1} + P̄_{k|k}^{-1} )^{-1}   (1.173)

The backward filter consists of the updating equations:

s_{k|k} = s_{k|k+1} + C_k^T S_k^{-1} ( y_k − h( x̂_{k|k-1}, u_k, t_k, θ ) + C_k x̂_{k|k-1} )   (1.174)

P̄_{k|k}^{-1} = P̄_{k|k+1}^{-1} + C_k^T S_k^{-1} C_k   (1.175)

initialized with:

s_{N|N+1} = 0 , P̄_{N|N+1}^{-1} = 0   (1.176)

and the propagation equations:

ds_{t_N−τ|k}/dτ = A^T s_{t_N−τ|k} + P̄_{t_N−τ|k}^{-1} ( f( x̂_{t_N−τ|k}, u_{t_N−τ}, t_N−τ, θ ) − A x̂_{t_N−τ|k} )   (1.177)

dP̄_{t_N−τ|k}^{-1}/dτ = P̄_{t_N−τ|k}^{-1} A + A^T P̄_{t_N−τ|k}^{-1} − P̄_{t_N−τ|k}^{-1} σσ^T P̄_{t_N−τ|k}^{-1}   (1.178)

which are solved, e.g. by means of an ODE solver, for τ ∈ [τ_k, τ_{k+1}]. In all of the above equations the following simplified notation has been applied:

A = ∂f/∂x_t |_{x=x̂_{t_N−τ|k}, u=u_{t_N−τ}, t=t_N−τ, θ} , C_k = ∂h/∂x_t |_{x=x̂_{k|k-1}, u=u_k, t=t_k, θ} , σ = σ( u_{t_N−τ}, t_N−τ, θ ) , S_k = S( u_k, t_k, θ )   (1.179)

Finally, the smoothed state estimates and their covariance matrices are computed from:

x̂_{k|N} = P_{k|N} ( P_{k|k}^{-1} x̂_{k|k} + s_{k|k+1} )   (1.180)

P_{k|N} = ( P_{k|k}^{-1} + P̄_{k|k+1}^{-1} )^{-1}   (1.181)

by realizing that the smoothed estimate must coincide with the forward filter estimate for k = N. The smoothing feature is only available for NL models.
References

Bierman, G. J. (1977). Factorization Methods for Discrete Sequential Estimation. Academic Press, New York, USA.

Dennis, J. E. and Schnabel, R. B. (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, USA.

Fletcher, R. and Powell, M. J. D. (1974). On the Modification of LDL^T Factorizations. Math. Comp., 28, 1067-1087.

Gelb, A. (1974). Applied Optimal Estimation. The MIT Press, Cambridge, USA.

Hindmarsh, A. C. (1983). ODEPACK, A Systematized Collection of ODE Solvers. In R. S. Stepleman, editor, Scientific Computing (IMACS Transactions on Scientific Computation, Vol. 1), pages 55-64. North-Holland, Amsterdam.

Huber, P. J. (1981). Robust Statistics. Wiley, New York, USA.

Jazwinski, A. H. (1970). Stochastic Processes and Filtering Theory. Academic Press, New York, USA.

Moler, C. and van Loan, C. F. (1978). Nineteen Dubious Ways to Compute the Exponential of a Matrix. SIAM Review, 20(4), 801-836.

Sidje, R. B. (1998). Expokit: A Software Package for Computing Matrix Exponentials. ACM Transactions on Mathematical Software, 24(1), 130-156.

Speelpenning, B. (1980). Compiling Fast Partial Derivatives of Functions Given by Algorithms. Technical Report UILU-ENG 80 1702, University of Illinois at Urbana-Champaign, Urbana, USA.

Thornton, C. L. and Bierman, G. J. (1980). UDU^T Covariance Factorization for Kalman Filtering. In C. T. Leondes, editor, Control and Dynamic Systems. Academic Press, New York, USA.

van Loan, C. F. (1978). Computing Integrals Involving the Matrix Exponential. IEEE Transactions on Automatic Control, 23(3), 395-404.