
An Introduction to Identification

J. P. Norton
Department of Electronic and Electrical Engineering,
University of Birmingham, England

1986

ACADEMIC PRESS
Harcourt Brace Jovanovich, Publishers
London  Orlando  San Diego  New York
Austin  Montreal  Sydney  Tokyo  Toronto

ACADEMIC PRESS INC. (LONDON) LTD
24/28 Oval Road,
London NW1

United States Edition published by
ACADEMIC PRESS INC.
Orlando, Florida 32887
Copyright © 1986 by
ACADEMIC PRESS INC. (LONDON) LTD
All Rights Reserved
No part of this book may be reproduced in any form, by microfilm or any
other means, without written permission from the publishers

British Library Cataloguing in Publication Data
Norton, J. P.
An introduction to identification.
1. System analysis  2. Engineering — Mathematical models
I. Title
003  QA402
ISBN

Printed in Great Britain by Galliard (Printers) Ltd, Great Yarmouth
Preface
Mathematical models of dynamical systems (systems with links between past history and present behaviour) are required in engineering, physics, medicine, economics, ecology and in most areas of scientific enquiry. In control engineering, model-building from measurements on a dynamical system is known as identification, and has enjoyed a sustained boom as a research topic for a decade and a half. In that time technique and theory have developed at such a pace that, although there are a number of good advanced or specialised textbooks on identification, a gap has opened in the coverage at the undergraduate and introductory graduate level. This book is aimed at that gap. As the book gives a broad view at a fairly modest mathematical level, it should also suit the reader who, with a particular modelling problem in mind, needs a quick appraisal of established methods and their limitations.

A serious attempt has been made to recognise that identification, like any engineering activity, is an art backed up by some science, and not a branch of applied mathematics. The presentation is therefore informal and more concerned with usefulness than with elegance. It is also highly selective, inevitably so in such a diverse and eclectic field.

The mathematical requisites increase gradually from Chapter 1 to Chapter 7, but never go far beyond what appears in most first-degree courses in electrical or control engineering. All necessary topics are well covered in many textbooks, so brief reviews and references are given in the text rather than in additional mathematical appendices. With few exceptions results are derived in full, but questions of rigour are mentioned only when absolutely necessary. Chapters 2 and 3 use classical linear-system methods embracing superposition, Fourier and Laplace transforms, z transforms, transfer functions, correlation functions, power spectra and a minute amount of probability theory. Matrix algebra first appears in Chapter 4, including inner products, quadratic forms, gradient vectors, Jacobian and Hessian matrices, inverses and singularity, positive definiteness, rank and linear dependence, Euclidean norm, orthogonality of vectors, orthogonal matrices, eigenvalues and eigenvectors. Probability and statistics are required from Chapter 5 on, but they are introduced in an elementary way, that is, intuitively rather than axiomatically. Convergence of random sequences is discussed in connection with asymptotic properties of estimators, but no background in analysis is needed. Some acquaintance with state-variable methods will help in Sections 7.3, 7.5, 8.2, 8.6 and 8.7.

Selections for special purposes could be made as follows: basic topics for an undergraduate course, introducing classical methods with modest mathematical requirements: Chapters 1 to 4 as far as Section 4.2.1; estimation theory, suitable for state estimation as well as identification: Chapters 4, 5, 6 and 7 up to Section …; computational methods for parameter estimation: Chapters 3, 4, 7, 8 and 10 (presupposing some background in probability); review of some recent and current areas of activity: Section 7.5, Chapters 8 and 9; practice of identification for linear, lumped systems: Chapters 2, 3, 4, 7, 9 and 10. The most specialised sections are starred.

My debt to a handful of prominent workers in identification is obvious from the text and is shared by everyone in the field. Less obvious but no less appreciated are debts to friends and collaborators over the years, which range from odds and ends of technique to shaping of an attitude to the subject. Rather than attempting a list, I shall just mention two people whose influence at crucial times has been especially valued: Percy Hammond, in whose group at the NPL I encountered identification as an inexhaustible source of interest, and Keith Godfrey, whose stimulus is responsible for this book being written. Thanks are also due to Graham Goodwin for facilities and encouragement during a sabbatical at the University of Newcastle, New South Wales, in which early sections were written, and to my long-suffering family.

Provision of records by the Hydro-Electric Commission of Tasmania, the Institute of Hydrology and Dr. Alan Knell of Warwick Hospital is gratefully acknowledged, as is the provision of Exercise 9.2.1 by Dr. Alan Robins of British Aerospace, Dynamics Group (Bristol).

Finally, I must thank Mandy Dunn for her cheerful good nature and efficiency throughout the preparation of the typescript, a far longer job than either of us expected.
Birmingham
November 1985

J. P. Norton
Contents

Preface
List of Abbreviations

Chapter 1  Introduction
1.1  What is identification?
1.2  Why should we want a model?
     1.2.1  Models to satisfy scientific curiosity
     1.2.2  Models for prediction and control
     1.2.3  Models for state estimation
     1.2.4  Modelling for diagnosis of faults and inadequacies
     1.2.5  Models for simulation and operator training
     1.2.6  Models and serendipity
1.3  What sort of model?
     1.3.1  Dynamical models; model order
     1.3.2  Lumped models
     1.3.3  Time-invariant models
     1.3.4  Linear models
     1.3.5  Other model categories
1.4  How do you construct a model?
     1.4.1  Stages in identification
     1.4.2  Constraints on identification methods and results
1.5  How to read this book
References
Problems

Chapter 2  Classical methods of identification: impulse, step and sine-wave testing
2.1  Description of response
     2.1.1  Impulse response and initial-condition response
     2.1.2  Discrete-time forced response: unit-pulse response
     2.1.3  Step response
2.2  Direct measurement of impulse and step responses
     2.2.1  Measurement of impulse response
     2.2.2  Measurement of step response
2.3  Transform description of response
     2.3.1  Identification of Laplace transfer function
     2.3.2  Discrete-time transfer function
     2.3.3  Frequency transfer function
     2.3.4  Measurement of frequency transfer function
References
Problems

Chapter 3  Identification based on correlation functions
3.1  Time averaging to reduce effects of noise
     3.1.1  Time-average relations between signals
     3.1.2  Input-output relation in terms of correlation functions
     3.1.3  Power spectral density; white noise
3.2  Correlation-based identification with special perturbations
     3.2.1  … test
     3.2.2  Pseudo-random binary sequence test
Further reading
References
Problems

Chapter 4  Least-squares model fitting
4.1  Finding the "best-fit" model
     4.1.1  Least squares
     4.1.2  Ordinary least squares
     4.1.3  Orthogonality
     4.1.4  Weighted least squares
4.2  Computational methods for ordinary least squares
     4.2.1  Choleski factorisation
     4.2.2  … technique
     4.2.3  Householder transformation of regressor matrix
     4.2.4  Singular-value decomposition
4.3  Non-linear least-squares estimation
     4.3.1  Generalised normal equations
     4.3.2  Gauss-Newton algorithm
     4.3.3  Levenberg-Marquardt algorithm
4.4  Why bother with statistics?
Further reading
References
Problems

Chapter 5  Statistical properties of estimators
5.1  Introduction
     5.1.1  Bias of estimators
     5.1.2  Unbiased linear estimator for linear-in-parameters model
5.2  Bias of estimates
     5.2.1  Bias with regressors deterministic
     5.2.2  Bias with regressors random
     5.2.3  Bias due to noisily observed regressors: the "errors in variables" problem
     5.2.4  Bias due to presence of output among regressors
     5.2.5  Convergence, probability limits and consistency
5.3  Covariance of estimates
     5.3.1  Definition of covariance
     5.3.2  Covariance of linear functions of estimate; significance of minimum-covariance estimate
     5.3.3  Covariance of ordinary least-squares and weighted least-squares estimates
     5.3.4  Minimum-covariance property of ordinary least squares when error is uncorrelated
     5.3.5  Minimum-covariance estimate when error is autocorrelated/non-stationary: the Markov estimate
     5.3.6  Instrumental variables
     5.3.7  Minimum-mean-square-error estimation: ridge regression
     5.3.8  … linear estimator and orthogonality
5.4  Efficiency
     5.4.1  Cramér-Rao bound
     5.4.2  Efficiency
Further reading
References
Problems

Chapter 6  Optimal estimation: Bayes and maximum-likelihood estimators
6.1  Introduction
6.2  Bayesian approach to optimal estimation
     6.2.1  Optimality: loss functions and risk
     6.2.2  Posterior probability density of parameters: Bayes' rule
     6.2.3  Bayes estimation: details
6.3  Minimum-risk estimators
     6.3.1  Minimum quadratic cost
     6.3.2  Minimum expected absolute error
     6.3.3  Minimax estimator
     6.3.4  "Most likely" estimate
     6.3.5  Bayes estimation with Gaussian probability density function
6.4  Maximum-likelihood estimation
     6.4.1  Conditional maximum-likelihood estimator
     6.4.2  Maximum-likelihood estimator with Gaussian measurements and linear model
     6.4.3  Unconditional maximum-likelihood estimator
     6.4.4  Properties of maximum-likelihood estimators
     6.4.5  Maximum-likelihood estimation with Gaussian vector measurements and unknown regression-equation error covariance
6.5  Practical implications of this chapter
Further reading
References
Problems

Chapter 7  Computational algorithms for identification
7.1  Assumptions and model form
     7.1.1  Assumptions
     7.1.2  Standard linear single-input-single-output model
     7.1.3  Output-error and equation-error algorithms
     7.1.4  A.r.m.a.x. model
7.2  Batch (off-line) identification algorithms
     7.2.1  Role of batch algorithms
     7.2.2  Iterative generalised least squares
     7.2.3  Maximum-likelihood algorithm
7.3  Recursive estimation
     7.3.1  Linear unbiased updating
     7.3.2  Minimum-covariance linear unbiased updating
     7.3.3  Recursive minimum-covariance estimator derived from least squares
     7.3.4  Information updating
7.4  Recursive identification including a noise-structure model
     7.4.1  Extended least squares
     7.4.2  Extended matrix method
     7.4.3  Extended least squares as approximate maximum likelihood; recursive maximum likelihood 2
     7.4.4  Stochastic approximation
     7.4.5  Recursive instrumental variable algorithm
     7.4.6  Prediction-error algorithms
*7.5  Convergence analysis for recursive identification algorithms
     7.5.1  Formulation of an archetype algorithm
     7.5.2  Asymptotic behaviour of archetype algorithm
     7.5.3  Convergence theorems for archetype algorithm
Further reading
References
Problems

Chapter 8  Specialised topics in identification
8.1  Recursive identification of linear, time-varying models
     8.1.1  Role of time-varying models
     8.1.2  Modification of recursive algorithms to track time variation
     8.1.3  Recursive least squares with a forgetting factor
     8.1.4  Covariance resetting
     8.1.5  Explicit modelling of parameter variation
     8.1.6  Optimal smoothing
8.2  Identifiability
     8.2.1  Factors affecting identifiability
     8.2.2  Deterministic identifiability
     8.2.3  Signal requirements for identifiability: persistency of excitation
     *8.2.4  Persistency of excitation conditions and convergence
8.3  Identification in closed loop
     8.3.1  Effects of feedback on identifiability
     *8.3.2  Conditions on feedback and external inputs to ensure identifiability
     8.3.3  Self-tuning control
8.4  Addition of regressors: recursion in model order
     8.4.1  Order-incrementing equations
     *8.4.2  Orthogonality in order incrementing
     8.4.3  Lattice algorithms and identification
8.5  Model reduction
     8.5.1  Moment matching: Padé approximation
     8.5.2  Continued-fraction approximation
     8.5.3  Moment matching in discrete time
     8.5.4  Other reduction methods
8.6  Identification in bounded noise
     8.6.1  Bounded-noise model
     8.6.2  Recursive algorithm: ellipsoidal bound
     *8.6.3  Toleranced prediction
8.7  Identification of multivariable systems
     8.7.1  Parameterisation of multi-input-multi-output models
     8.7.2  Cost functions for multivariable models
8.8  Identification of non-linear systems
     8.8.1  … model
     8.8.2  Block-oriented models
     8.8.3  Regression models for non-linear systems
8.9  Simultaneous estimation of parameters and state
     8.9.1  State augmentation and extended Kalman filtering
     8.9.2  Alternated parameter and state estimation
     8.9.3  Maximum-likelihood estimator
     8.9.4  Bayes estimator
References
Problems

Chapter 9  Experiment design and choice of model structure
9.1  Introduction
9.2  Experiment design
     9.2.1  Adequacy of input and output
     *9.2.2  Optimisation of input
     *9.2.3  Optimisation of output-sampling schedule
9.3  Selection of model structure
     9.3.1  Model order determination
     9.3.2  F test
     9.3.3  The Akaike information criterion
     9.3.4  Product-moment matrix test
9.4  Summary
Further reading
References
Problems

Chapter 10  Model validation
10.1  Introduction
     10.1.1  Nature of validation tests
     10.1.2  What do we test?
10.2  Checks before and during estimation
     10.2.1  Checks on records
     10.2.2  Checks on parameter estimates
     10.2.3  Checks on residuals
     10.2.4  Checks on covariance
10.3  … checks
     10.3.1  Employment of models
     10.3.2  Checks on parameter estimates
     10.3.3  Checks on residuals
     10.3.4  Runs
     10.3.5  Informal checks
10.4  Epilogue
Further reading
References

Subject index
List of Abbreviations

a.c.f.      Autocorrelation function
a.m.l.      Approximate maximum likelihood (algorithm)
a.r.        Autoregression, autoregressive
a.r.m.a.    Autoregressive moving-average
a.r.m.a.x.  Autoregressive moving-average exogenous
a.s.        Almost surely
c.c.f.      Cross-correlation function
d.c.        Direct current
e.l.s.      Extended least-squares (algorithm)
e.m.m.      Extended matrix method
g.l.s.      Generalised least squares (algorithm)
i.c.        Initial condition
i.r.w.      Integrated random walk
m.a.        Moving average
m.f.d.      Matrix-fraction description
m.i.m.o.    Multi-input-multi-output
m.l.        Maximum likelihood
m.m.s.e.    Minimum mean-square error
m.s.        Mean square
m.s.e.      Mean-square error
o.l.s.      Ordinary least squares
p.d.f.      Probability density function
p.e.        Persistently exciting
p.r.b.s.    Pseudo-random binary sequence
p.s.d.      Power spectral density
q.r.        Quadratic residue
r.i.v.      Recursive instrumental variable (algorithm)
r.m.l. 1    Recursive maximum likelihood 1 (algorithm)
r.m.l. 2    Recursive maximum likelihood 2 (algorithm)
r.m.s.      Root mean square
s.a.        Stochastic approximation
s.i.s.o.    Single-input-single-output
s.n.r.      Signal-to-noise ratio
s.r.w.      Simple random walk
s.s.i.      Strongly system identifiable
u.p.r.      Unit-pulse response
w.l.s.      Weighted least squares
w.p. 1      With probability 1
Chapter 1

Introduction
1.1 WHAT IS IDENTIFICATION?

Identification is the process of constructing a mathematical model of a dynamical system from observations and prior knowledge. This definition raises quite a few questions. Why should we want a model? What exactly does "dynamical" signify? What sort of mathematical model? What sort of prior knowledge and observations? How do you construct such a model? How do you decide whether it is any good? The first two questions can be answered fairly quickly, as we shall soon see, but the others will take the rest of the book to answer even partially.

1.2 WHY SHOULD WE WANT A MODEL?

To make any sense, identification must have some definite purpose, although it is sometimes not clearly stated. Dynamical models find application in areas as diverse as engineering and the hard sciences, economics, medicine and the life sciences, ecology and agriculture; the references at the start of Chapter 7 give some idea of the range. The same few basic purposes underlie identification in all these fields.
1.2.1 Models to Satisfy Scientific Curiosity

A characteristic of science is its use of mathematical models to extract the essentials from complicated evidence and to quantify the implications. Identification has a long history in science.

Example 1.2.1  Halley conducted an identification exercise in 1704 when, realising that reports of comets in 1531, 1607 and 1682 related to a single object, he calculated the parameters of its orbit from the limited observations, and hence predicted its return in 1758 (Asimov, Muldin and Allen). The orbit determination relied on prior knowledge, from Newtonian dynamics and gravitational theory, that the orbit would be an ellipse. We shall return to this example more than once, appropriately enough.  Δ
The aim in scientific modelling is to increase understanding of some mechanism by finding the connections between observations relating to it. Any predictive ability of the model is an incidental benefit, valuable as a means of testing the model.

Example 1.2.2  Halley would have been pleased, no doubt, to know that the comet did return in 1758, sixteen years after he died, and pleased that his model was the reason for people reaching for their binoculars in 1985/6, but his immediate satisfaction came from understanding better how the comet behaved.  Δ
1.2.2 Models for Prediction and Control

A wish to predict is a common and powerful motive for dynamical modelling. On a utilitarian view, a prediction model should be judged solely on the accuracy of its predictions. The plausibility and simplicity of the prediction model and its power to give insight are all incidental, although they help in model construction and acceptance. The narrowness of prediction as an aim paradoxically makes the choice of model wider.

Example 1.2.3  Hydrologists have a range of techniques for predicting river flow from measurements of rainfall and flow (Kitanidis and Bras, 1980; Kashyap and Rao, 1976). At one extreme a runoff-routing model represents in detail the physical features of areas within the catchment, and traces the passage of water through all the areas. At the other extreme a black-box model is estimated from measurements of flow at one place and rainfall at one place or averaged over the catchment. It makes no attempt to depict the internal workings of the catchment, but aggregates the catchment dynamics as they affect that particular flow. Runoff-routing models require much more field measurement to construct. They force a detailed examination of catchment peculiarities and, perhaps for that reason, inspire confidence in spite of the difficulty of testing such a large model. Black-box models have the advantages of a simple and standard form, e.g. linear difference equations, and fairly standard estimation techniques. They are simple enough to be updated continually according to recent prediction performance, though they may not be flexible enough, even with updating, to match the complicated non-linear dynamics of the catchment.  Δ

Prediction by a dynamical model is important in control-system design. Design of any scheme more ambitious than the traditional trial-and-error-tuned two-term controller requires a model (D'Azzo and Houpis, 1981; Richards, 1979). To keep the design procedure tractable, the model must be simple, even if that means rough. An accurate model may not be a realistic aim in any case, because of variability in the system to be controlled. Ideally the model would indicate the extent of the variability, so that the controller could be designed to have acceptable worst-case performance. This is less than straightforward, as disturbances, measurement inaccuracy and the limitations of the model structure also contribute uncertainty to the model (Ashworth, 1982).

Most control-design methods employ a model for prediction only in the sense of saying how the system will respond to a standard input such as a step in the desired output value, or a specified disturbance. Prediction is more directly involved in two control techniques, feed-forward and self-tuning control, which have been underexploited through lack of reliable models. In feed-forward control, a disturbance is detected early in its progress through the system and fed forward, suitably shaped and inverted, to cancel its own effects further on. Self-tuning control recalculates the controlling input to the system periodically by reference to a periodically updated prediction model of the effect of that input. The identification aspects of self-tuning control are discussed briefly in Section 8.3.3.
1.2.3 Models for State Estimation

The object of state estimation is to track variables which characterise some dynamical behaviour, by processing observations affected by errors, wholly or partly random. The 1960s and early 1970s saw strikingly successful examples of state estimation in space navigation, including the Apollo moon landings, Mariner Mars orbiter and flybys of Venus and Mercury. The state estimated then was the position and velocity of a space vehicle, or equivalently its orbital parameters. The range of applications of state estimation has expanded rapidly, now embracing radar tracking, terrestrial navigation, stabilisation of ship motion, remote sensing, geophysical exploration, regulation of various industrial processes, monitoring of power systems and applications in demography (IEEE Transactions on Automatic Control, special issue, 1983).

State estimators rely on a model to relate the state variables to each other, to the observations and to the forcing. Commonly, some model parameters are initially unknown and must be identified, before or during state estimation. Section 8.9 considers combined state and parameter estimation.
Example 1.2.4  When a digital message is transmitted over a communication channel at a rate close to the maximum attainable, the channel dynamics smear out each signalling pulse and cause each received pulse to overlap several others. This inter-symbol interference must be corrected if the transmitted message is to be recovered (Clark, 1977). Fixed filters called equalisers can do the job if the channel dynamics are fixed. In a switched system such as the public telephone network the channel varies from connection to connection. It is also affected by temperature changes, and there is noise due to switching, poor contacts, crosstalk and induction from power apparatus. The equaliser must therefore adapt to the channel and, ideally, the noise characteristics. Many adaptive equaliser structures have been proposed, one being to estimate the message computationally as the state of an initially unknown and varying system consisting of the channel, its filters, coder and decoder, modulator and demodulator. The message estimator requires a model of the channel dynamics, which is updated continually (Lee and Cunningham, 1976; Luvison and Pirani, 1979).  Δ

1.2.4 Modelling for Diagnosis of Faults and Inadequacies

A great benefit of identification, seldom acknowledged, is its ability to uncover shortcomings and anomalies. For instance, when an attempt is made to identify the dynamics of a system, the measurements are often found to be inadequate. Examples are a noisy thermocouple on a distillation column, incomplete economic statistics, too few rain gauges or too many ungauged inflows in hydrology, and physiological measurements too widely spaced because of the discomfort they cause. A deficiency like this may not be easy to put right, but the awareness of its importance may be worth the effort vainly spent on identification. Disclosure of unexpected or untypical behaviour of the system is equally valuable whether unpremeditated, as in the first example below, or the main reason for identifying a model, as in the second.
Example 1.2.5  A digital simulation of a pilot-scale gas-heated catalytic reactor was developed from results of tests on one reactor tube, physical chemistry and design information (Norton and Smith, 1972). The simulation was initially unable to match the observed steady-state temperature profile of the reagents. The trouble was traced to stagnation of the heating gas at one end of the reactor, pointing to a potential design improvement. The reactor model was also useful in a positive way in explaining an unexpected difference between the responses of the reagent temperature to perturbations of the inlet heating-gas temperature and flow rate.  Δ

Example 1.2.6  A methionine tolerance test consists of giving a human subject an oral dose of methionine then sampling its concentration in the blood at five to ten instants over the next three hours or so. Abnormality in the variation of the concentration may be due to liver disease or diabetes. To aid interpretation and classification of the response, a model made up of two or three rate equations may be fitted to it (Brown et al., 1979).  Δ
1.2.5 Models for Simulation and Operator Training

Models make it possible to explore situations which in the actual system would be hazardous, difficult or expensive to set up. Aircraft and space-vehicle simulators are well-known examples. Comprehensiveness and accuracy are at a premium for this application, whereas cheapness and simplicity are less so. As well as operator training, simulation models are valuable for "what if?" analyses. Accuracy and completeness may be less crucial when qualitative outcomes are being explored rather than precise numerical consequences. The discussion and thought stimulated by notorious world-growth models some years ago (Forrester, 1971; Meadows, 1973) justified their construction, imperfect as they may have been.

1.2.6 Models and Serendipity

We all sometimes stumble across something interesting when looking for something else entirely. This also happens in identification.

Example 1.2.7  In 1758 Messier was searching for Halley's comet, to validate Halley's orbit model. He found the Crab nebula and labelled it M1 in his catalogue. It was subsequently found to be a strong radio source (1948) and to have a pulsar at its centre (1968). In fact, it turned out to be more interesting than Halley's comet.  Δ

Serendipity is hardly a motive for modelling, but it can be a weighty retrospective justification.
1.3 WHAT SORT OF MODEL?

One family of dynamical models has identification methods far more fully developed than the rest: linear, lumped, time-invariant, finite-order models. The reason is that they are versatile, yet comparatively simple to identify, analyse and understand. We had better examine the properties of these models. In doing so, we shall incidentally say what we mean by "dynamical".

1.3.1 Dynamical Models; Model Order

The feature that distinguishes a dynamical system is that its output at any instant depends on its history, not just on the present input as in an instantaneous system. In other words, a dynamical system has memory. Often the memory can be attributed to some easily recognisable stored energy. If the present output can be expressed in terms of the input and output values an infinitesimal time ago, the relation is a differential equation.
Example 1.3.1  The voltage v(t) at time t across the capacitor in Fig. 1.3.1 is related to the source voltage u(t) by dv(t)/dt = [u(t) - v(t)]/CR. This equation is the limit, as time increment δt tends to zero, of

v(t) = v(t - δt) + δt[u(t - δt) - v(t - δt)]/CR.

[Fig. 1.3.1 A dynamical system: a source voltage u(t) (input) charging a capacitor C through a resistor R, with the capacitor voltage v(t) as output.]

An initial condition v(t0) and knowledge of u(t) from t0 onwards will give v(t) from t0 on. Conceptually, we find v(t) from v(t - δt) successively at t = t0 + δt, t0 + 2δt and so on. In the limit, we integrate the differential equation to obtain

v(t) = v(t0) exp[-(t - t0)/CR] + (1/CR) ∫_{t0}^{t} u(τ) exp[-(t - τ)/CR] dτ

whatever the value of t0, so v(t0) is enough to determine the effects of the history up to t0 on the later behaviour. The capacitor charge q(t0) is Cv(t0), so the stored electric-field energy q²(t0)/2C is determined by v(t0) and can be regarded as the memory of the system.  Δ
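The limiting argument in Example 1.3.1 lends itself to a quick numerical check. The short Python sketch below is not part of the original text; the time constant, step input and step sizes are illustrative assumptions. It marches the difference equation forward and compares the result with the exact integral solution: as δt shrinks, the two agree.

```python
# Sketch (not from the book): the difference equation of Example 1.3.1
# approaches the differential-equation solution as the time increment shrinks.
import math

CR = 1.0                      # time constant CR, in seconds (illustrative)

def u(t):
    return 1.0                # a unit step applied at t = 0

def difference_equation(dt, t_end):
    """March v(t) = v(t - dt) + dt*[u(t - dt) - v(t - dt)]/CR from v(0) = 0."""
    v, t = 0.0, 0.0
    while t < t_end - 1e-12:
        v += dt * (u(t) - v) / CR
        t += dt
    return v

t_end = 2.0
exact = 1.0 - math.exp(-t_end / CR)       # exact step response with v(0) = 0
for dt in (0.5, 0.1, 0.01, 0.001):
    print(f"dt = {dt:6.3f}   difference eq. {difference_equation(dt, t_end):.5f}"
          f"   exact {exact:.5f}")
```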
The past of a system or model influences the future by way of a number of initial conditions or stored energies (one, in Example 1.3.1). The number is called the system or model order. Any model must describe how each of its energy storages contributes to the output, so the number of model parameters is at least equal to the model order. Some parameters may be known in advance, e.g. known to be zero, and the size of the model may well be reduced further by ignoring some energy storages because they give rise to dynamics which are too rapid, too slow or too small in amplitude to show.
Example 1.3.2  The resistor and wiring in Example 1.3.1 have a small stray inductance L, which modifies the input-output relation to LC d²v(t)/dt² + RC dv(t)/dt + v(t) = u(t). Two initial conditions are now needed before we can solve for v(t), as there are two energy storages, electric-field energy in the capacitor and magnetic-field energy LC²(dv(t)/dt)²/2 in the inductance. However, the magnetic-field energy can be ignored unless we are interested in circumstances giving very large rates of change of v(t).  Δ
The story is more complicated if there is significant pure delay in the system. Some delay is always present, since changes propagate through the system at a finite velocity, but it may be negligible. A delay t_d right at the input merely delays the response by t_d, so it adds nothing to the difficulty of analysing the response. The same goes for a delay right at the output, and we have only to write y(t + t_d) for y(t) at the end of the analysis or u(t - t_d) for u(t) at the start in either case. Such delay is called dead time. With noticeable delay anywhere else, the response from any instant onwards depends not just on the conditions at that instant and the forcing, but also on the behaviour of the delayed variable throughout the delay interval, i.e. at an infinite number of instants. The number of initial conditions to be specified, and hence the system order, is infinite, making analysis more difficult (Problem 1.1).

We shall be paying most attention to models which relate the output at a succession of evenly spaced sampling instants to input and output at earlier sample instants. They also are hard to analyse when they contain a delay not at the input or output, even if it is an integer number of samples long, if the variables being sampled exist between samples as well. The difficulty does not arise, however, if the delay is in a part of the system which is entirely discrete in time, such as a digital filter or controller, since a complete specification of the delayed variable then only amounts to a finite number of sample values (Problem 1.2).
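As an aside not in the original, the point about delay in an entirely discrete-time part of a system can be seen in a few lines of Python. The first-order difference equation and its coefficients below are invented for illustration, in the spirit of Problem 1.2: a delay of k samples on the input merely shifts the unit-pulse response, so only a finite number of sample values is ever needed.

```python
# Sketch (not from the book): an input delay of k samples in a discrete-time
# model just shifts the unit-pulse response; it does not raise the model order.
a, b = 0.8, 1.0                             # illustrative coefficients

def pulse_response(k, n_samples=10):
    u = [1.0] + [0.0] * (n_samples - 1)     # unit pulse at time zero
    y = [0.0] * n_samples
    for t in range(1, n_samples):
        u_delayed = u[t - k] if t - k >= 0 else 0.0
        y[t] = a * y[t - 1] + b * u_delayed
    return y

print(pulse_response(k=1))   # response starts one sample after the pulse
print(pulse_response(k=3))   # identical shape, shifted two samples further
```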
1.3.2 Lumped Models

When we write the variables describing a system as functions of time only, we imply that each is located at one point in space or has no spatial significance: the system is lumped, not distributed.

Example 1.3.3  Halley's comet extends over a considerable distance and alters in shape as it orbits. Its velocity is theoretically a function of three spatial dimensions and time. An astronomer is, in practice, content to know fairly precisely how its centre of gravity moves and, as a separate issue, approximately what happens to its shape.  Δ
Example 1.3.4  Studies of water quality in rivers and lakes are concerned with diffusion and circulation of pollutants, nutrients and dissolved oxygen. Rather than modelling these quantities through partial differential equations as functions of two or three spatial dimensions as well as time, it may be permissible to represent rivers as cascades of well-mixed reaches, and lakes by two- or three-dimensional arrays of compartments. Exchanges of material between reaches or compartments are then described by a set of ordinary differential equations (Whitehead et al., 1979).  Δ
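The lumping in Example 1.3.4 is easy to mimic numerically. The sketch below is not from the original text, and the number of reaches, exchange rate and inflow are invented; it treats a river as a cascade of well-mixed reaches, each described by one ordinary differential equation, and integrates them crudely.

```python
# Sketch (not from the book): a cascade of well-mixed reaches, one first-order
# ordinary differential equation per reach, as a lumped stand-in for a
# distributed river. All values are illustrative.
n_reaches = 4
k = 0.5                       # exchange rate constant per reach (1/hour), assumed
dt, t_end = 0.01, 10.0        # Euler step and horizon (hours)
c_in = 1.0                    # constant inflow concentration to the first reach

c = [0.0] * n_reaches         # pollutant concentration in each reach
t = 0.0
while t < t_end:
    upstream = [c_in] + c[:-1]                      # what feeds each reach
    c = [ci + dt * k * (up - ci) for ci, up in zip(c, upstream)]
    t += dt

print([round(ci, 3) for ci in c])  # downstream reaches lag behind the inflow
```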
When a distributed variable is represented by one or more lumped variables, approximation error is incurred in the dynamics in addition to loss of resolution. Even so, the question is how to lump, not whether to lump, since digital computation requiring a lumped representation will be necessary at some stage in the analysis of the system unless the system and its boundary conditions are very simple.

1.3.3 Time-Invariant Models

A dynamical system is time-invariant if the sole effect of delaying its forcing and initial conditions is to delay its response by the same amount. In other words, the input-output relations do not vary with time and are relatively easy to analyse.

A time-varying model may be preferred to a more comprehensive but complicated time-invariant model, with the time variation showing the effects of the omitted part of the time-invariant model. Section 8.1 discusses this point further.
Example 1.3.5 (Dumont, 1982)  Chip refiners in the wood-pulp and paper industry consist of two contra-rotating grooved plates which grind a mixture of wood chips and water. The wood-chip feed rate and motor power to the plates must be adjusted to control the energy input per unit mass of wood fibres. The motor load is adjusted by a hydraulic actuator which varies the gap between the grinding plates. Unfortunately the gain from plate gap to load power is non-stationary, because the plates wear, relatively slowly in normal operation but rapidly if the plates clash.

An "open-loop" estimate of the gain may be obtained from a wear index which records the plate age and number and severity of clashes. This is implicitly a model of the mechanism determining the gain. A more satisfactory solution is to update an empirical estimate of the gain at short intervals, as part of a closed identification and control loop.  Δ
1.3.4 Linear Models

Consider a system with no initial stored energy. If its response to an input u1(t) is y1(t) and its response to u2(t) is y2(t), it is linear if its response to αu1(t) + βu2(t), with α and β any constants, is αy1(t) + βy2(t). A similar definition of linearity goes for systems with more than one input variable and response (output) variable. Linearity allows us to find the response to any input, however complicated, by breaking the input into simple components then adding the responses to each component. In identification, this implies that only the response to a suitable standard input need be identified, as in Chapters 2 and 3.
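Superposition is simple to verify numerically for the circuit of Example 1.3.1. The sketch below is not in the original; the two inputs, constants and step size are illustrative. It simulates the response to a weighted sum of two inputs and to the two inputs separately, and the difference is at rounding-error level.

```python
# Sketch (not from the book): numerical check of superposition for the
# first-order system dv/dt = (u - v)/CR, starting from zero stored energy.
CR, dt, n = 1.0, 0.01, 500            # illustrative constants

def simulate(u_seq):
    v, out = 0.0, []
    for u in u_seq:
        v += dt * (u - v) / CR
        out.append(v)
    return out

u1 = [1.0] * n                                    # a step
u2 = [0.5 * k * dt for k in range(n)]             # a ramp
alpha, beta = 2.0, -3.0

combined = simulate([alpha * a + beta * b for a, b in zip(u1, u2)])
separate = [alpha * a + beta * b for a, b in zip(simulate(u1), simulate(u2))]
print(max(abs(x - y) for x, y in zip(combined, separate)))   # ~0, i.e. linear
```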
From Chapter 4 onwards, the model is required to be linear in its coefficients but not necessarily in its dynamics. That is, if we write the model as

f(y, y^(1), ..., y^(n), u, u^(1), ..., u^(m); θ1, θ2, ..., θp) = 0     (1.3.1)

where y^(i) means d^i y/dt^i and similarly for u^(j), and θ1 to θp are the coefficients to be estimated, the derivative of f with respect to each θ must be independent of all the θ's, but the same is not necessary for the input- and output-dependent arguments of f, as it would be for linear dynamics.
Example 1.3.6  The circuit in Example 1.3.2 has linear dynamics but is non-linear in parameters L, C and R. It is linear in LC and RC, though, and they can be regarded as its parameters if we do not insist on keeping L, C and R separate.  Δ
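The practical force of "linear in the coefficients" is that such coefficients can be estimated by ordinary least squares, the subject of Chapter 4. The sketch below is not from the original text; the component values, the crude simulation and the finite-difference derivatives are all illustrative assumptions. It approximately recovers LC and RC of Example 1.3.6 from simulated records of the circuit, even though the model is non-linear in L, C and R separately.

```python
# Sketch (not from the book): LC*d2v/dt2 + RC*dv/dt + v = u is linear in the
# coefficients LC and RC, so they can be fitted by least squares.
L, C, R = 0.1, 1e-3, 30.0              # "true" components (arbitrary units)
dt, n = 1e-4, 5000

# simulate L*di/dt = u - v - R*i, C*dv/dt = i for a unit step input
i_cur, v, vs = 0.0, 0.0, []
for _ in range(n):
    di = (1.0 - v - R * i_cur) / L
    dv = i_cur / C
    i_cur, v = i_cur + dt * di, v + dt * dv
    vs.append(v)

# finite-difference derivatives; regress u - v on (d2v/dt2, dv/dt)
rows = []
for k in range(1, n - 1):
    v1 = (vs[k + 1] - vs[k - 1]) / (2 * dt)
    v2 = (vs[k + 1] - 2 * vs[k] + vs[k - 1]) / dt ** 2
    rows.append((v2, v1, 1.0 - vs[k]))

s11 = sum(a * a for a, b, y in rows); s12 = sum(a * b for a, b, y in rows)
s22 = sum(b * b for a, b, y in rows)
t1 = sum(a * y for a, b, y in rows);  t2 = sum(b * y for a, b, y in rows)
det = s11 * s22 - s12 * s12
print("LC estimate", (t1 * s22 - t2 * s12) / det, "true", L * C)
print("RC estimate", (t2 * s11 - t1 * s12) / det, "true", R * C)
```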
1.3.5 Other Model Categories

Within the family of linear, lumped, time-invariant, finite-order models there are quite a few further distinctions to be made.
(i) Input-Output versus State-Variable Models.  State-variable models have the structure shown in Fig. 1.3.2, with state variables as intermediaries between the inputs and outputs. The dynamics are expressed by a set of first-order ordinary differential equations, one per state variable. By this means the analysis of linear models of different orders is unified and brought within the scope of linear algebra (Kailath, 1980).
[Fig. 1.3.2 State-variable model: the forcing input u(t) and initial conditions x(t0) drive a differential state equation (the dynamics); a non-dynamical observation process turns the state into the observed output y(t).]
Models with multiple inputs and/or outputs fit into this framework as comfortably as single-input-single-output systems.

Any given input-output relation can be realised with any one of an infinity of equally valid choices of state variables. A suitable choice can either put the state equations into a convenient form, e.g. with certain coefficients zero (Problem 1.3), or make as many state variables physically meaningful as possible. This free choice, all within a standard form of model, is helpful in general but makes identification more complicated. The trouble is that the coefficients in a preferred state-variable model may not be identifiable from input-output behaviour alone, a point which must be checked for every candidate choice of state.
Example 1.3.7  The second-order input-output relation LC d²v/dt² + RC dv/dt + v = u of Example 1.3.2 can be rewritten as two first-order equations in state variables v and i, the capacitor voltage and source current:

di/dt = (-Ri - v + u)/L,        dv/dt = i/C

or equally well in terms of v and v + Ri:

d(v + Ri)/dt = (1/RC - R/L)(v + Ri) - v/RC + Ru/L,        dv/dt = (v + Ri)/RC - v/RC

or v and dv/dt, the latter denoted by w to avoid the cryptic equation dv/dt = dv/dt:

dw/dt = (-RCw - v + u)/LC,        dv/dt = w

Any two independent linear combinations of v and i will do as state variables. Notice that the second and third alternatives have 1/RC, 1/LC and R/L as parameters, easily related to the parameters of the second-order input-output equation and thus identifiable from the input-output behaviour. The first pair of state equations, however, has parameters R/L, 1/L and 1/C, only the first of which can be identified from the input-output relation between u and v.  Δ
Since the choice of state-variable models for identification requires more background in linear algebra than the input-output identification models, we touch on it only briefly among the more advanced topics in Chapter 8.
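A numerical illustration, not in the original (component values and the crude Euler integration are assumptions), of the point about realisations: the three state-variable choices of Example 1.3.7 are simulated below with the same step input, and they produce the same output v(t) to within rounding error even though their coefficients differ.

```python
# Sketch (not from the book): three state-variable realisations of the same
# second-order circuit give identical input-output behaviour.
L, C, R = 0.1, 1e-3, 30.0
dt, n, u = 1e-4, 3000, 1.0                  # Euler step, length, step input

def simulate(deriv, x0=(0.0, 0.0)):
    """Euler-integrate dx/dt = deriv(x); the second state is the output v."""
    x, out = list(x0), []
    for _ in range(n):
        dx = deriv(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        out.append(x[1])
    return out

f1 = lambda x: [(-R * x[0] - x[1] + u) / L, x[0] / C]            # state (i, v)
f2 = lambda x: [(1 / (R * C) - R / L) * x[0] - x[1] / (R * C) + R * u / L,
                x[0] / (R * C) - x[1] / (R * C)]                 # state (v + R*i, v)
f3 = lambda x: [(-R * C * x[0] - x[1] + u) / (L * C), x[0]]      # state (dv/dt, v)

v1, v2, v3 = simulate(f1), simulate(f2), simulate(f3)
print(max(abs(a - b) for a, b in zip(v1, v2)),
      max(abs(a - b) for a, b in zip(v1, v3)))                   # both ~0
```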
(ii) Time-Domain versus Transform Models.  Linear, time-invariant models may be differential equations or impulse responses in the time domain, or transfer functions in the frequency or Laplace-transform domain (Gabel and Roberts, 1980; Ziemer et al., 1983). The two are formally equivalent; we can move readily from one to the other, and the choice is a matter of practical convenience (Ljung and Glover, 1981). The great majority of recent developments in identification concern time-domain methods and so does most of this book.

(iii) Deterministic versus Stochastic Models.  Identification methods span a range from making no provision for uncertainty in the measurements or the model to treating the model coefficients as random variables and modelling the errors and impairments in the measurements in some detail. The resulting models are respectively deterministic, i.e. certain, and stochastic, i.e. probabilistic with time as an independent variable (Helstrom, 1984; Melsa and Sage, 1973). As a compromise, some methods identify deterministic models but take care to allow for impaired measurements. Chapters 2-4 examine such methods.
(iv) Single-Input-Single-Output versus Multi-Input-Multi-Output Models.  Identification methods for single-input-single-output (s.i.s.o.) models will be our main focus. They form the foundation of methods for multi-input-multi-output (m.i.m.o.) models. Section 8.7 looks briefly at m.i.m.o. models. An important feature of linearity in the dynamics is additivity of the output responses to separate inputs. Non-linear dynamics cause the response to any one input variable to be affected by the behaviour of the other inputs, and so it is necessary to identify the relations between all the input variables and the output simultaneously. The relations may be identified one at a time in a linear system, in principle, treating all but one of the inputs as sources of output disturbance (admittedly structured) while each input-output relation is identified.
(v) Continuous-Time versus Discrete-Time Models.  All large systems and many small ones are identified from records taken at a succession of discrete instants, either because the data-logging or parameter estimation is digital or because the observations become available only periodically. Typical periodical observations are quarterly or monthly economic statistics, shift records from industrial processes and sampled signals from digital communication channels. The natural thing to do with discrete-time records is to identify a model which relates sample-time values but says nothing about what happens between samples. Such a model is convenient if it is intended for digital control design, state estimation or periodic prediction.

Information in continuous-time variables is lost when they are sampled unless the sampling rate is high enough (Gabel and Roberts, 1980, Chapter 5; Reid, 1983, Chapter 3; Ziemer et al., 1983, Chapter 7). Without going into details, we note that the rate must be at least 2f to preserve a component at a frequency f, in the sense that accurate recovery of the component is theoretically possible by low-pass filtering of the sample sequence, if the sampler and filter are perfect. Allowing for imperfections and the gradual rather than abrupt decline of significant content with increasing frequency, a realistic sampling rate is 10 or so per period at the frequency at which the power starts to drop off rapidly, or at the cut-off frequency of a low-pass filter applied to the variable before sampling. Conversely, it is unwise to draw conclusions from a discrete-time model about behaviour at frequencies approaching half the sampling frequency.
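The sampling-rate requirement can be illustrated in a few lines; the sketch below is not from the original and the frequencies are invented. Sampled below the rate 2f, a sinusoid of frequency f produces samples indistinguishable from those of a lower-frequency sinusoid, which is why a discrete-time model says nothing trustworthy near or above half the sampling frequency.

```python
# Sketch (not from the book): sampling a sinusoid of frequency f below 2f makes
# it masquerade as a lower ("aliased") frequency.
import math

f = 10.0                        # component frequency, Hz (illustrative)
for fs in (50.0, 25.0, 12.0):   # sampling rates, Hz; the last is below 2f
    samples = [math.sin(2 * math.pi * f * k / fs) for k in range(8)]
    apparent = abs(f - fs * round(f / fs))     # frequency the samples suggest
    print(f"fs = {fs:5.1f} Hz  apparent frequency = {apparent:4.1f} Hz  "
          + " ".join(f"{s:+.2f}" for s in samples))
```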
Too-rapid sampling has its own drawbacks, fortunately serious only in rather specialised circumstances. It yields non-minimum-phase discrete-time models of some minimum-phase continuous-time systems (Åström et al., 1984), making some adaptive control methods unfeasible. We shall not pay any further attention to this problem. Sampling will be assumed to be at a satisfactory rate and uniform in time whenever we consider discrete-time models.

(vi) Parametric versus Non-Parametric Models.  One way to represent dynamical behaviour is by a function, say the output response h(t) of a linear system to an impulse input, which is not parameterised. That is, the function is specified directly by the result it gives for each value of its argument. The alternative is to nominate first a family of functions, such as all sums of exponentials, then one or more parameter values to pick one member out, such as the number of exponentials to be included in our model, and finally coefficient values, like the initial value and coefficient of time in the exponent for each exponential.
The benefit of restricting the model to a parametric family is economy; relatively few parameters and coefficients are needed to describe it. Such economy is achieved by taking the trouble to find a suitable model structure, then going through a more complicated and less general identification procedure than for a non-parametric model. The impulse-response, step-response and frequency-response methods of Chapters 2 and 3 are non-parametric, while the methods of later chapters are most often applied to parametric models. Wellstead (1981) reviews non-parametric identification methods from a practical viewpoint.

In practice the distinction between parametric and non-parametric models is not sharp. For instance, the number of instants we evaluate h(t) at, and the interval, are in effect parameters. A working definition of "parametric model" is "pretty restrictive model, identified in stages (structure, parameters, coefficients)". "Non-parametric model" might be interpreted as "fairly unrestrictive model, identified all in one go".

A potential source of confusion is that "parameter" is used in identification both in the sense employed here and to mean any number which is not a variable, e.g. the coefficient of each term in a non-parametric linear model. The mix-up is firmly established and there is a lack of alternative words with precise enough meanings, so we shall just try not to read too much into the word.
(vii) Sectioned versus Unitary Models.  Sectioning a model can simplify identification by separating aspects of behaviour which can be identified one at a time. A natural and often intuitively obvious basis for sectioning is physical subdivision. For example, a distillation column might be modelled on two scales, local s.i.s.o. relations between feed flow rate or temperature and product flow rate or column temperature, for instance, forming parts of a model of the column as a whole. The column model might then form part of a model of a refinery. Control is conveniently split up in the same way, with local controllers co-ordinated by manipulation of their set-points, and overall control exerted through a relatively slow scheduling process, in a control hierarchy.
Differences in time scale are also a basis for sectioning. Indeed, models are often split up by time scale with little conscious thought, treating slow components of an output as drift while identifying faster dynamics, and fast dynamics as instantaneous while identifying slower ones. When the spread of speeds is large, separate treatment of fast and slow dynamics is highly desirable, as otherwise the choice of sampling rate is difficult. A rate high enough for the fastest dynamics implies a large number of samples to cover the slowest. Resolution of the fast may be lost, or estimation of the slow dynamics spoilt by cumulative error.
Example 1.3.8  With R²C/L = 100 in the circuit of Example 1.3.7, the response v(t) to an impulse input u(t) contains two decaying exponentials. The ratio between the exponents is 98:1. In an interval L/R, the faster exponential decays by a factor 2.718 and the slower by less than 2%. To determine the faster component accurately the output should be sampled at intervals not much over 0.4L/R, the time to decay by 33%. An unparameterised impulse-response model would then need over 750 terms to cover the time up to when the slower component has decayed to 5% of its initial value.

A parametric transfer-function model (Section 2.3) gives the same response with only three coefficients. However, an error of 0.1% in the decay of the slower component over one 0.4L/R interval gives an error of 112% in the value of the impulse response at a lag of 750 samples, so the model coefficients would have to be found to an impossible degree of accuracy. The remedy is to use a longer sampling interval, around 40L/R, in estimating the slower component (Problem 2.3).  Δ
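The numbers in Example 1.3.8 follow from the pole positions, and the short calculation below (not part of the original text) reproduces them: the exponent ratio near 98, the per-sample decays, the roughly 750 samples needed, and the way a 0.1% per-sample error compounds to about 112%.

```python
# Sketch (not from the book): arithmetic behind Example 1.3.8.
import math

ratio = 100.0                                    # R*R*C/L
d = math.sqrt(1 - 4 / ratio)                     # poles of LC*s^2 + RC*s + 1 = 0,
s_slow, s_fast = (-1 + d) / 2, (-1 - d) / 2      # in units of R/L
print("exponent ratio      ", s_fast / s_slow)   # about 98

T = 0.4                                          # sampling interval, units of L/R
print("fast decay per step ", math.exp(s_fast * T))   # ~0.67, i.e. ~33% decay
print("slow decay per step ", math.exp(s_slow * T))   # ~0.996

n = math.log(0.05) / (s_slow * T)                # samples until slow mode is at 5%
print("samples needed      ", n)                 # roughly 740-750

print("compounded error    ", 1.001 ** 750 - 1)  # 0.1% per sample -> about 112%
```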
Systems with relatively few internal connections are easily split into sections if those connections are accessible for measurement. So are systems with response components covering a large range of speeds but not too many of similar speed. It is a different story for large systems with complicated internal connections and systems with a fairly uniform spread of response speeds. Reduction of models with linear dynamics has received a good deal of attention, with an eye on simplified models for control design, and is the subject of Section 8.5. Methods for decomposing large systems for identification (Mahmoud and Singh, 1981) have not yet had much impact on the cut-and-try approach usual in identification. A fundamental obstacle to automation of the identification of large systems (and small ones, come to that) is the need, in the end, to decide part by part whether the model is credible. If model testing and validation has to be piecemeal, and it usually does, there is less incentive to avoid a piecemeal approach to identification.

We should note that a system may be large and strongly interacting without necessarily being difficult to section for identification and control design. The accessibility of variables for measurement is the determining factor. An example is a steelworks cold-rolling mill (Bryant and Higham, 1973), where the effects of control action at different stands interact strongly but through a small number of well-instrumented variables such as strip gauge, tension and speed.
(viii) Markov-Chain Models.  We shall look only at models based directly or indirectly on differential equations, or, in discrete time, difference equations, with the system inputs and outputs as variables. Other types of model are valuable in some applications, notably Markov-chain models. These specify the probability of every possible transition from a state at one instant to a state at the next (Luenberger, 1979, Chapter 7; Wadsworth and Bryan, 1974, Chapter 9). The variables are the probabilities of being in each possible state at each instant. The transition probabilities are assumed
constant and independent of previous history. At first sight a Markov-chain model looks very different from a difference-equation model, which gives the next output in terms of input values over a range of times. When the difference equation includes an additive random-variable "noise" term to account for unknown disturbances and measurement error, some similarities can be traced. We can rewrite the difference-equation model as a set of first-order difference equations, i.e. a state-variable model, with the next state expressed in terms of present, not past, state, input and noise. Moreover, the next state is given as a probability distribution, determined by the probability distributions of the noise and present state, although for simplicity we usually quote (and compute) only the mean value of the output and perhaps some measure of its variability about the mean. Thus we see that a stochastic difference-equation model also gives future state probabilities from present state probabilities. The remaining difference between Markov-chain and ordinary difference-equation models is one of emphasis: the difference-equation model gives the next state mainly as a deterministic function of present state and known input, with the uncertainty brought in via the noise. The Markov-chain model is entirely probabilistic, with no mechanism describing how the next state is determined, other than the transition probabilities.
Identification of a Markov-chain model presupposes either enough observations of each possible state transition to yield its probability, or a good knowledge of the causes which determine the probabilities.
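For concreteness (this sketch is not in the original, and the two-state chain and its transition probabilities are invented), here is the bare mechanism of a Markov-chain model: a transition-probability matrix carries the vector of state probabilities from one instant to the next, with no deterministic state equation underneath.

```python
# Sketch (not from the book): a Markov-chain model propagates state
# probabilities using only its transition probabilities.
P = [[0.9, 0.1],          # P[i][j] = probability of moving from state i to j
     [0.4, 0.6]]

p = [1.0, 0.0]            # start in state 0 with certainty
for step in range(1, 6):
    p = [sum(p[i] * P[i][j] for i in range(2)) for j in range(2)]
    print(step, [round(x, 4) for x in p])
# the probabilities settle towards the stationary distribution (0.8, 0.2)
```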
1.4 HOW DO YOU CONSTRUCT A MODEL?

1.4.1 Stages in Identification

No straight answer can be given to the question of how to construct a model. The best way to construct a model depends on a host of practicalities, not all foreseeable, and all we can do is generalise by pointing out a few of the stages and some of the constraints on the choice of technique.

Chapters 2-8 introduce identification methods, Chapter 9 talks about experiment design and Chapter 10 discusses the analysis of results. Very seldom does an identification project consist of a single pass through the sequence (i) pick a model, (ii) design the experiment, (iii) do the experiment and (iv) analyse the results. Figure 1.4.1 sketches identification more realistically. Most items in Fig. 1.4.1 are just common sense, but two of these items deserve more attention than they tend to get in the literature, namely informal checks on the records and model validation. Both are discussed at length in Chapter 10.
[Fig. 1.4.1 Identification: a flowchart running from problem formulation (purpose and scope of the model, what effort is justified, what is known of the system and environment, what models exist now) through experiment design (choice of inputs to perturb, outputs to observe and identification method; examination of actuators, instruments, access and operating constraints; choice of perturbation type and parameters and of sampling schedule), taking, selecting and tidying up records, informal checks on the records (errors, anomalies, missing values, effects of unrecorded inputs, drift), estimation of structural parameters and model coefficients, and validation (estimates credible? structure adequate? good fit to observations?), to application of the model for its intended purpose and documentation of the model, with revision loops back to earlier stages whenever a check fails.]
1.4.2 Constraints on Identification Methods and Results

Once the aim of the identification exercise is clear and the system to be modelled has been defined, the methods adopted and results obtained depend on
(i) access to the system: historical records, e.g. economic or hydrological, with no opportunity to influence them, or normal operating records with no control over the input but some chance to improve instrumentation, or responses to planned input perturbations in the presence of other disturbances and drift, or responses from bench tests in controlled conditions, or detailed examination of a system which can be dismantled and tested in sections, or, most likely of all, some combination of these; repeatability of tests and volume of records are other important factors;
(ii) time available: the response of a bandpass communication channel might be found in milliseconds or less at little cost, while a blast furnace with transient responses lasting a day or two has to be logged for weeks or months to get sufficient records (Norton, 1973), so its identification is costly (but so is ignorance of its dynamics);
(iii) instruments and actuators: the average power, instantaneous value, rate of change and smoothness of change of input perturbations are limited by the input actuators and the system; the size of an output response to a perturbation is normally stringently limited in process plant, to ensure usable product; the sampling rate may be limited by instruments or logger, or by the time taken to collect records, particularly in the life sciences or economics; test duration may be limited by instrument and actuator unreliability or by human endurance when manual collection of records is involved (Stebbing et al., 1984); instrument noise or sampling error may be the predominant factor limiting the quality of the records;
(iv) computing facilities: lack of computing power is not the constraint it once was, but an important question is still whether the model can be updated sequentially as measurements are taken, or requires iteration through the records; the back-of-an-envelope calculation of approximate model parameters, e.g. gains and time constants from Bode plots or step responses (Chapters 2 and 3), is still a great attraction of some classical identification methods;
(v) availability of specialised equipment and methods: transfer-function analysers and correlators (Chapters 2 and 3) make identification relatively quick and easy in circumstances that suit them; specialised methods of estimating models have developed in many fields, such as reverberation testing of rooms (Parkin and Humphreys, 1958) or "curve-peeling" for separating exponentials in biomedical test results (Jacquez, 1972, Chapter 7), but we have not the space to cover them;
(vi) precision and completeness required: effort is not unlimited, even for academics, and the law of diminishing returns applies as much to identification as to anything else.

1.5 HOW TO READ THIS BOOK

Identification is not a spectator sport. The only way to find out what the various techniques really can and cannot do is to try them. There are pieces of fairly portable technique with a firm theoretical base, otherwise this book would hardly be justified, but they all have their weaknesses, and every application seems to have some non-standard feature to test them. The book recognises the numerical and empirical nature of identification by resorting to numerical examples as often as possible. They are intended to be followed through in detail and often raise significant points; they are not just illustrations. In many of them it would be worthwhile to alter some of the numbers or details and explore the consequences. Similarly, the end-of-chapter problems are not primarily drill exercises, but are intended to encourage scrutiny of further practical issues. The best accompaniments to the text, though, are a set of records from an actual dynamic system, and someone who knows their peculiarities and wants a model from them.
REFERENCES
Ashworth, M. J. (1982). "Feedback Design of Systems with Significant Uncertainty". Research Studies Press, Wiley, New York.
Asimov, I. (1983). "The Universe", 3rd ed. Penguin, London.
Åström, K. J., Hagander, P., and Sternby, J. (1984). Zeros of sampled systems. Automatica 20, 31-38.
Brown, R. F., Godfrey, K. R., and Knell, A. (1979). Compartmental modelling based on methionine tolerance test data: a case study. Med. Biol. Eng. Comput. 17, 223-229.
Bryant, G. F., and Higham, J. D. (1973). A method for realizable noninteractive control design for a five stand cold rolling mill. Automatica 9, 453-466.
Clark, A. P. (1977). "Advanced Data-Transmission Systems". Pentech Press, London.
D'Azzo, J. J., and Houpis, C. H. (1981). "Linear Control System Analysis and Design", 2nd ed. McGraw-Hill Kogakusha, Tokyo.
Dumont, G. A. (1982). Self-tuning control of a chip refiner motor load. Automatica 18, 307-314.
Forrester, J. W. (1971). "World Dynamics". Wright-Allen, Cambridge, Massachusetts.
Gabel, R. A., and Roberts, R. A. (1980). "Signals and Linear Systems", 2nd ed. Wiley, New York.
Helstrom, C. W. (1984). "Probability and Stochastic Processes for Engineers". Macmillan, New York.
IEEE (1983). Special issue on Applications of Kalman Filtering. IEEE Trans. Autom. Contr. AC-28, 3.
Jacquez, J. A. (1972). "Compartmental Analysis in Biology and Medicine". Elsevier, Amsterdam.
Kailath, T. (1980). "Linear Systems". Prentice-Hall, Englewood Cliffs, New Jersey.
Identifkalion is not a spectator sport. Thc only way to lind out what the
various techniqucs really caJl and callnol do is to try them. There are pieccs of
fairly portable lechnique \vith a firm theoretical base; otherwise this book
would hardly be justilied, but they ,,11 have their weaknesses, and every
application seems to have some non-standard feature to test them. The book
recognises the Ilunlerical and elllpiritainaturc ofidcntiJication by resorting to
numerical cxatllliles as often as possible. They are intended to be followed
through in detail and often raise signiJicant points; they are nol just
illustrations. III many of them it would be worthwhile to alter sOl11e o[ the
!lumbers ur details and explore the consequences. Similarly, the cnd-of-
chapter probk:lIls arc not primurily drill exercises, but are intended to
encourage scrutiny of further practical issues. The best accompa'Ji!mcnts to
the lext, though, are a set of records from an actual dynamic system, and
someone who knows their peculiarities and wants a model from them.
1
,
INTRODUCTION
1.4.2 Constraints on Idenlilic<,tion Methods and Resnlls
IH
20 INTRODUCTION PROBLEMS
21
1.2 The output)'{I) from a digital controller is related to the input II{I ) by the
discrete-time equation
where Tis the sllulpling interval, 1I and b arc constants and It is a fixed positive
integer. The output is zero up to and inclUding time zero, A unit-pulse input,
whkh is one at timc zcro und zero at all other timcs, is applied. For(i) k = I, (ii)
k = J lilH.lthc resulting output. over a long enough period for its behaviour to
becomc clear. Docs the cxtra delay ill (ii) make the output allY morc
cOlllplkalcd '!
1.3 Veriry that for the system or Exercises 1.3.2 and 1.3.7, state variables
give decouplcd state equations of the form
.<, = Alx, +au/CR, .<, = A,x, - all/CR
where a = (I - 4L/CR')I/' and A" A, are the poles (eigenvalues) oCthe sYstem.
What is the obset'vation equation relating v{l) to these state By
integrating the state equations find the response v(t) to a unit-impulse input
u(t) for (i) 4L < CR', (ii) 4L> CR', What funclions of R, Land C can be
idcntiJieli from this response'llfthe amplitude of the response is unknown but
its waveform is otherwise known accurately, is there any change in what cun be
idenlified '!
1.4 (Mainly (or electrical engineers) How wonld you find LCand RC in Ihe
input-output o,d,e, of Exercises 1,3.2 and 1,3.7, if you could choose u(t) freely
but only record v(t), making no other measurements? If the network were in a
vandal-proof box all the bench, with the input and output terminals labelled,
but you knew only that the box contained passive, bilateral components,
could you identify the nature. configuration and values of the components? If
so, you woulu be relying on electrical cngineering background.
This problem illustrates the large dilrerellee between "black box"
idcntiJication of input-output dynamics and identification of the internal
structure of the system from eXlernal measurements. The latter requires more
thonght and more hackground knowledge about the system.
x, = (I - a)v/2 - Lai/CR
i = 1,2,3" ..
x, = (! +aj"/2 + Lai/CR,
I'UT) - l'[(i - I) Tj
. 'T +(/)'[U-k)Tj=bll[(i-ljTj,
1.1 A system containing a pure delay r is uescribcu by
)'(I) +(/)'{I - r) = IIII{I)
Its output y(l) is zero over the interval r lip to time input is zero
at all times except zero, whcn it is a vcry short, unit-area 1J1lpulse. hnll y( r),
y(2r), y(3r), and if you have the patience, y(4t), by 0 < I:::; T,
T < I ::; 2r and so Oil, Compare these values ofy(I), and the ellort It cosls to gel
them, with those of the system described by
);(I) +(/)'(t) = b(/{I)
Kashyap, R. L.. and Ruo, A. R. (1976). "'DYllumit: Stochastic Models from Elllpirical DaIU",
Academic Press, New York lind London.
Kitanidis, P. K., anti Bras, R. L. (1980). Real-time forcc<I:;lillg wjth 11 com;cptual hydrtJlogil:
model. Water R('sollr. Res. 16.
Lee, T. S . and Cunningham, D. R. (1976). Kuhnllnliltcr equilizaliOil for QPSK cOllllllunications.
IEEE TrallS. COif/mUll. COM-24, 361-364.
Ljung, L. t and Glover. K. (1981). Frequency u(llllain versus lillie uomain 111l:lhods in 5yStCll!
identification. AuloIIIU/;W 17.71-86.
Lucnbergcr, D. G. (1979). "lnlrouucliOil 10 Dynamic Systems", Wiley" New York. . .
LuvisOll, A" amI Pirani, G. (1979). Design lind JlcrfonmlliLc of nn adaptIve K"lman receIver IOi"
synchronous unta lrallsmission. IEEE Tnms. Aao.lf/tIl'(' "','/1'011 . .sJ'.I'. 15,
Mahmoud. M. S., amJ Singh. M. G. (19M I). "Largc Scale System Modelling". I'Clg.IITIOIl, Oxl onJ.
Meadows, D. L. (1973). "Dynamics ur Growth in a Finite World". WrightAllcll, Cambridge,
Masslll.:husctlS.
Mclsa, J. L.. and Sage, A. P. (1973). "An InlrmJudioli to I'l'ohabiHty I1ml Sll1l:lHlslic I'roccs!'ics".
Prentice-Hall. Englcwom.l Clilrs, New Jersey.
Murdin, P. and Allen, D. (1979). "Catalogue ur Ihe Universc", Cambridgc Ulliv. Press, Lundon
and New York.
Norton. J. P. (1973). Praclical problems in blasl rurnaee identilication, Mer/,I;. COl/frol ('.
Norton. J. P., and Smith, W. (1972). Digilal simulatiun or lhe dylllllllll:S of u lixcu-bed calalyill.:
reaclor. M"Qs. COIlIra15,
Parkin, P. H., amJ Humphreys, H. R. (1958). "Acuusties, Noise and Buildings", Fabel, London.
Reid, J. G. (1983). "Uncal' Syslem Fundamentals". l\tkGru\v-1-I ill , New York.
Richards. R. J. (1979). "An Introductionla Dynamics und Control". Longman. London.
Stebbing, A. R. D., Norton, J, P., and Brinslcy, M. D. (1984). DYlHllllics of growth control in a
marine ycast subjccted to perturbation. J. Gm, Microhiol. 130, 1799-1808.
Wadsworth, G. P., and Bryan. J. G. (1974). "Applicatiuns or I'robubility and RamJolll
Variablcs", 2nd cd. McGraw-Hili, Ncw York.
Wellstcud, P. E.lI981). NOIl-paramctric mcthods of systcm idcntilkatioll. 17, 55--6\}.
Whitehead, P. G. Young, P. c., and HlJrnberger, G. (1979). A systems model 01 I1m".:und water
qUlility in the Bedrord Ousc river systcm--I. Sircamllow mudelling. I!'a/a Re.\. IJ,
1155-1169.
Ziemer, R. E., Tranter, W. H., and Fannin, D. R. (1983). "Signals and Systcms: Conlinuous and
Discretc". Macmillan, Ncw York.
I'IWIlLEMS
Chapter 2
Classical Methods of Idcntification:
Impnlse, Step and Sinc-Wave Testing
2.1 T1ME-OUMAIN O/,SCRIPTIUN OF RESPONSE
The idcntiJication mcthods or this chapter rely on the theory of s.i.s.o., linear.
dynamical systems as covered by electrical and control
engineering degree courses and many others. Suitable textbooks, useful also
: for later chapters, include Gabel and Roberts (1980), Reid (1983) and Ziemer
1 el al. (1983). A reminder of the theoretical background will be given foreach
,J rnelhod to minimise the need for background reading.
11
2.1.1 IlIIlul!se Response and Response
Linearity allows us to lind the response ora linear syslem to any forcing u(t),
t ;0, 0, by
(i) calculaling or measuring the response 10 some very simple standard
waveform, say a step or size I or a short rectangular pulse of area I, thcn
(ii) breaking 1/(1) lip into a collection of components, each a scaled and
delayed version of the standard waverorm, approximating u(t) if necessary,
and finally,
(iii) building up the lotal response by sUl11ll1ing the responses to lhe
COlllPOllCllls ("superposition").
Figure 2.1.1 illuslrales this procedure with a short, reclangular, unit-area
pulse as the standard waveform. The shorter the pulse, the beller 1/(1) can be
approximated. Ifwc make the pulse width tend to zero, keeping the area unity,
the standard waveform becomes a unit (area) impulse or Dirac ()jimctioll. The
unit impulse at time zero is defined by
'l(l) = for all 1 '" 0,
f,J(t)dt=1 (2.1.1)
25
(2.1.3) !I" probably zero, integer I :i:: 0
2.1 TIME-DOMAIN DESCRIPTION OF RESPONSE
Provided a l und,a
1
are not too close, we can estimate them from the impulse-
response test by saying that y(tJ is dominated by the slower-to-decay
exponciltial towards the end orthe transient. fiLting a single exponential to this
tail, subtracting it from J'(t) anu JJtting a faster exponcntial to the remainder.
Hence 1 and til are estimated, -Howcver, b can be found only ify(O_) and
Ji(O _) are known in advance. The response to any spccifieu initial conditions
can be determined without knowing b, and therefore without
y(O_) and .';(0") in the impulse-response test. 6
The forced response of a discrete-time system is easily found by su per position.
The obvious stHndard input is a unit pulse, i.e, one at time zero. The zero-
initial-cundition response (u lhis input, the ulli/-pulse response (u.p.r.),
consists of a sequence of pulses or size 11
0
,11 1.11 2' ' .. at times 0, T, 2T, ... ,
wherc Tis the sampling period. Nu real systcm rcspolH.ls instantly to:l1l inpul,
so 11 0 is zero, although it sometimes makes sense to ignore a delay much
smaller than T. Superpositioll gives the response at s:lInple instant t to all
input sequcncc "
0
, Ill' Ul' .. at O. 7:,2 T, ' .. as thc ('olluolll/ioll .\/Iln
2.1.2 Uiscrete-Timc Furced Response: Unit-Pulse Response
ExulIIl"e 2.1.1 A stuble system is modelled by y + (/ ,.1; + a,.I' bu. Taking
Laplace transforms (Jf this equatiun,
s[sY(s) - ylO - .''10.) +", [s1'(s) - .1'(0 Jl +", Y(s) = b U(s)
so
. bUis). (s+",)r(O_J+l'(O_)
Y(s) = lorced response ,-.._'----- -I- I.e:. response ., - -
s--I- a
l
s -/-tl
1
s--I-a
l
s-l-a
1
Consider the case where Sl -I- (I IS -I- (11 can be factorised into (s - ell)(s - IX.,)
wi til el 1 alld Cf.
1
being rea I, and u(t ) i", a unit iIII pulse. so U(s) is I. We spli t Y(;,)
into partial I'ractions and illvcrl thcm tu Hull thc total respunse
I
y(t) = ", {[b - IX,y(O +}(O"J]exp(a,l) - [b - a,y(O_) +}(O_)]exp(a,I)
Example 2.1.2 At intervals of Tscconds, a radar gives a sample of the position
x of a target moving in a straight line. A microprocessor forms an estimate
g, = (XI - 2xl _ 1 +x,_1)/T
1
or the target's acceleration, which is added tu the
signal conlrolling the torque or the motor rotaling the radar antenna, by way
of a digital-analogue converter, hold and power ampliHer.
(2.1.2)
.. lutul



Response 10
standard Input
---y(t)
Linear
system
2 CLASSICAL METHOLJS 01' IIJENTIFICATION
Standard input
waveform
u(t}---
Input and its Total response
components and its components
Fig. 2.1.1 Finding respoUse of lilicar system by superpositioJl.
y(1 J= f' h(1 - r )II(r) dI,
o
Any non-zero initial cOIH..IiLions, due to initial slofctl energy ill the syslem,
also contribute to the response. The initiukondition response just adds to the
forced response, because of linearity. In fact, it t:Hn be thought of as the
response to U11 input before time zero, which was carefully tlcsigncJ to
establish the specified conditions al lillie zero. The presence of all initial-
condition response in an idcntilicatioll experiment means that tu identify the
system we must first know, or find out, the initiul conditions, e.g. .1'(0Jand Y(O)
for a second-order system. As they generally include tlcrivativcs of a noisy
signal, they may not be easy to measure. We therefore either arrange Zero
initial conditions by lelling the system settle to quiescence berore perturbing it,
or observe y(t) for long enough for the rorced response to predominate as the
initial-condition response dies away. The (unit-) impulse response is round by
solving (2.1.2), e.g. by Laplace transrorms. II' a parametric Illodel such as a
differential equation is then Jitled, the response to any speciIied initial
conditions (i.c.) eun also be round.
The section of input covering a short time ,11" about time r is roughly a
rectangular pulse with area II(r) As tiT tends to zero, this tends to lI(r)
times a unit impulse occurring at time r, (5(t - r). If h(l) is the wlit-impulse
respollse or the system to cl(l), the response to the impulse ,)(1 - rJII(t) is
Iz(t - r )u( r) Lh. Summing thecon tri butiollS from infinitesimally short sections
or input all the way rrom its start at r 0 to the present time I = I we lind the
total forced response as the COllvolulitm Or superposition illfl!gral
24
COll1monly all cstinwtcd 1I.p.r. describes the sampled behaviour of a
continuous-time system. We must be careful to recognise the limitations 01" a
u.p.r. in those circumstances. The u.p.r. may miss signincant fast uynumicsby
having too large a sampling period, or slow dynamics by being too restrictcu in
duration. The dead time (pure delay) is delermined only to wilhin one
sampling period,
The response g to a unit pulse X
o
= I at lime zero is 1/1'2 altimc 'c' - 2/7'2
at T + Ie and I/T
i
at 21' -I- Ie' where Ie is the delay in the microprocessor.
I-Ienec the initial response go is zero, and the holo output is 1/1'2 at tillle r,
-2/T' al 2T and I/T' al 3T. However. if I, T and the molor lorque
responds rapidly to the control signal, the anlenna angular position is more
accurately calculatcu by iglloring tile delay I co Oil the uther hand, 1he iller! in 01'
the motor and antenna prevents ungulaI' position l'rom rcspom..lillg rapidly to
torque changes, so lhe response or sampled :Jngular position ttl .r
l
lin-il shows
at sumple I -I- I, and 11
0
is taken as zero ill the uverall 1I.p.r. 6.
Example 2.1.3 (i) Figure 2.1.2a shows two u.p.r:s, one wilh h, 1.1 (0.6)'"" 1 -
0.1(0.95)'-1, I;:' I and lhe olher wilh h, = 0.555'- I, I;:' I. Over lhe !irsl!ive
samples they look very similar, yet the correspolluing steauy-state gains (final
responses to a sampled unil Slep) are 0.75 and 2.247, respectively. The
explanation is that the steauy-state gain is the sum of all the u,p.r. values h
ll
to
h
w
' and lhe low-amplilude slowly decaying second component of lhe first
u.p,r. ultimately cancels 73/;, of the sum uue to the faster nrst component,
even though its effect on hi to !Is is small.
(ii) The lwo sections of u.p.r. in Fig. 2.1.2b dilfer by almost 5 '\ or so. They
are h, = 0.8' - 0, I', I ;:, I and h, = 0.72(0.828)' - I, I;:' I. They represenl lhe
sampled behaviour of zero-ucau-limc continuous-timc systcms wilh grcally
diJfering impulse responscs, That 01" the lirst syslem rises at a !inite ralc from
zero, while lhal of lhe laller jumps abruplly lo 0.87 (Problem 2.5). The former
has fast dynamics which are initially important in the impulse rcsponse but
hardly show al all in lhe U.p.r.
(iii) Consider lhe second u.p.r. in 0). It could derive from sampling a
continuous-time system with negligible dcau timc, impulsc response
exp[U/T) In 0,555J/0.555, steady-stale gain (area under lhe impulse response)
3.06 and impulse-response peak 1.802, At the other extreme, it mighl come
from a system with dead time just less than T, impulsc
response exp[(I/T - I) In 0.555J, I ;:, T, sleady-slate gain 1.698 and impulse-
response peak I. 6
27
(2.1.4)
H(jw) = (l/.iw)(1 - e- j"J)
1 0
(0)
o
h(l) Jl(t) - Jl(t - T),
2.1 TIME-DOMAIN DESCRIPTION OF RESPONSE
Fig.2.1.2 (a) Unit-pulse responses, Exanlple 2.I.J(i). 11/=0.555'-1, 0; II
r
=
O,I(O.S15)'-I, D. (b) Unit-pulse responses, Example 2.1.3(ii). 11,=0.8'-0.1', 0; 11,=
0"7210"828)'"'. D.
Another point to note when identifying the u.p.r. of a continuous-time
system is that a sampled input is usually applied to a continuous-time system
through a hold circuit, which reconstructs a continUOUS-lime signal frolll the
samples. The dynamics of the hold circuit will be included with those of lhe
system in the identiHed model, and Illight noticeably affect them. The simplest
and most widely used hold is the zero-order hold, which provides a constant
outpulequal to the mosl recenl input. Ifit is taken 10 respond to lhearea oflhe
input pulse rather thun its amplitude, as is rcalistic, its impulse response h(t) is
a pulse of heighl I and duralion T, lhe sampling inlerval. Hence h(t) and its
frequency transfer function are
fl
,,' O!J'
0
i 8
1
0
j
0
0
08
(b)
8
0
0
8
"
04" 0

,.1
)
I
! 2 CLASSICAL METHODS OF IDENTIFICATION 26
The dimculties and dangers of" fitting models lo impulsc responscs arc
further illustrated, wilh many numerical examples, by Godfrey (1983).
where p(l) is the unit slep al time zero. The effects of the hold are negligible al
frequencies much below l/nTHz (Problem 2.7) .

2.2.1 J"Icmmrclllcnt uf hllJlulsc RespUllse


2.2 DmECr MEASUIlEMENT OF IMPULSE AND STEP
IlESI'ONSES
2lJ
(2.2.1 )
2.2 DIRECT MEASUREMENT OF IMPULSE AND STEP RESPONSES
Another potclItial dilliculty is that the aClUator may have dynamics
comparable in timc scale with, am..l inseparablc from, the dynamics of the rcst
of the system, but we may not want to include thcm in the model.
A stcp is an indefinite succession of contiguous, equal, short, rectangular
pulses and produces a much larger respoIlse and hjgher s.n.L than does One
shari pulse of the same peuk amplitude. Conversely, a given peak oUlput
amplitude is achievable with a smaller step input that pulse input, and
therefore there is less risk of saturation within the systelll.
The cllcct of noise on a panll1lcter estimate d iJrers according to whether the
estimate is obtained frol11 a step response or from an impulse or unit-pulse
enough to give an acceptable output S.Il.r. The actual response is the ideal
impulse response convolved with the inpul pulse waveform, us in (2.1.2). The
effect is to blur the impulse response. For instance, a rectangular pulse of
duration I
d
and area I would give by (2.1.2)
If'"
1"(1) = h(l .- t)dt ooh(l)
. ',1 0
rather than the ideal h(l).
A large-alllplitude short pulse .may he diJIkult to produce. Moreover, it
leaves some doubt as to whether the response is linear and typical or is alrected
by largc-signalnon-lineariIY such as saturation. The pulse amplitude is limiled
by the input actuator range, and usually by the maximum size of perturbation
regarded as acceptablc.
EXlllllf)lc 2.2.1 The silitoll content of the pig iron produced by a blast furnace
is a good indicator or how the furnace is running. It is influenced by the
tcmperature of the hot air blast. The response of silicon content to a blast
temperature perturbation lasts of the order of a dny and allects several
successive easls of iron (Unbchauen und Diekmunn, I ~ 8 3 ; Norton', 1973), The
temperature can be altered quite rapidly, but a short pulse of higher or lower
temperature would have to be very large for its ellect to be discernible in the
normal castto-cast silicon variation. Even if it could be produced, such a
pulsc would not be risked. The prime concern is to run the furnace smoothly.
A large temperature pulse might cause irregular behaviour, dillicult to predict
or correct and untypical of 1I0nnai opcration. In an extreme case, hanging and
slipping of the burden (ore, sinter and coke) could occur, perhaps damaging
the rel'ractory lining. 6
2.2.2 Mcusuremenl of Slep Response
f
I
2 CLASSICAL METIIODS OF IDENTiFICATION
2.1.3 Slcp Rcspunse
at(e-Jd"/f + e-lk -I )1"il +... +e ~ 1")f + I)
=llT(!_e-tHIl1'f1HI_e Fir)
Example 2.1.4 A continuous-time system has unit-impulse response (/(' -Ilr. Its
unit-step response is therefore ar(I - e - f/f), whiL:h when sHmpled gives
ar(l_c-
kT
/
r
) at sample k. The response at lillic kTto a sall1plcd-lIllitstcp
inpul, on the olher hund, is
The simplest of a II iuelltilicatioll techniques is to lind the impulse respollse by
pUlling in an impulse and seeing wll1lt cOllles out. The Illude! is acquircd in a
single response IIlcasurclllclil if lIuisc is 1I0t cxcc!;sivc. The inllucllcc 01" lI11ise
cun be reduced if neccssary by repeating the perlurbiJliOIl and averaging the
responscs. The a vcrilging depends 011 the incollsistency or the noise. SuIII III ing
N responses gives N times the consistent part but less than N timcs the
inconsistent part, and thus improves the signal:noiseratio (S.ILL). A more
formal statistical justiflcation is investigated in Problcm 5.4. Measured
impulse responses may also contain structured disturbances such as slow drift
or periodic responses to unmeasured forcing inputs, not necessarily reduced
by averaging. They must be filtered oul instead. Freehand interpolation and
subtraction of the disturbance on a responsc plot may be enough. Other
options are bandpass fillering or even Jitting an explicit model to thc
dislurbance so lhal it cun be sublracted.
A practicable test input has Jinite duration and :'Illlplitut!c, and can only
approximate the () fUllction. The duration must bc short t:olllparcd with the
fastest feature of interest in the impulse response, and the amplituuc large
A convenient feature of lincar systems is that an ideal illtcgralor at the input
has the same cfl:Cet as Olle at the outpul. The response to an integrated L1llit
impulse, i.e. a unil step, is therefore the time-integral or the impulse response.
We can, in principle, measure the step response and lind the illlj1ulsc rC."poIISC
by tliJlcrcntiatioll, flow pr'actical tllis is we discus:; ill Sccliull 2.2.2.
We must be rather careful with tliscrctc-tilllc systellls 011 this point. Tlie
U.p.r. can be found, in principle, bydiJrcrcllcing the response to a sampled unit
step but not the sampled respunse to an 1lIlSi.UUplel1 unit slep.
28
2.3 TRANSFORM DESCRII'T10N OF RESPONSE
response. For example, steady-stale gain is easily rouno, in the absence of
drift, from the initial and tina I values of the step response, even in considerable
noise. By contrast, the value obtained by integrating the impulse response
includes an unknown contribution frol11l1oise, and a value found by summing
u,p.r. samples would be susceptible to dead-time uncertainLy introduced by
the sampling, as in Example 2.I.J(iii). Allother CX:llllp!C, favouring the
impulse response this lime, is estimation of the impulsc-rcspullsC peak.
Dilfercntiatiol1 of a noisy step response \Youlll exaggerate widcbund noise
because Jillcrcntiatioll amounts 10 1I1lcring with gain proportiunal tu
frequency. Unless the peak of the derivative were ruunu 1'1'0111 a parametric
model fitted to the whole step responsc with high confidcnce, its value would
be very uncertain.
We note finally that a step is the easiest of all inpuls to IJI"oduce wilh
acceptable fidelity, and repeated steps in a square wave arc equally easy.

-I- l'a,e"'!I(t-r)dr
cit lJ JlI
= -e"'!I(O) -I- !I(t) -I- a,/i(e'" - e"')
We now colleci lerms and cancel, 10 lind !I(t) (a, - a
1
J/iexp(a,I). By
contrasl, Equation (2.3.2) is
(a, - a,)/i/(s - a, J(s - "',) = fJ(s)/(s - a,)
giving l1(s) and hence h(l) with negligible drort.
31 2.3 TILANSFORM DESCRIPTION OF RESPONSE
We start a time-domain solotion for !I(t) by dilrerentiating:
I
' d!l(I-r)
[J(a.](!7-11_a.,c:lll)=h(O)ellt+ e
l1f
_ dT
- 0 dt
We thcll put -clh(r [)jdr fur clh(r - T)jclt and intcgrate by parts:
2 CLASSICAL METHODS OF IDENTIFICATION JU
or the fourier version with jw for s. In .identification we are interested in
solving (2.3.2) for the transfer function /its), which is thc Laplace transform
of the impulse response h(t). Usually (2.3.2) is much easier to solve than the
integral equation (2.1.2), and if we want h(I), we can /iJld it by inverse Laplace
transformation of N(s).
Example 2.3.1 The output or a certain initially quiescent, linear, time-
invariant system forced by u(t) = e(;q', 1;;:: 0, is
.1'(1 J= - e"')
so the convolution relation (2.1.2) is
- e"') 1'!I(t - tJe'" <it
J"
2.3.1 Idenlilicalion uf Laplace Transfer Funetiun
-"t
We now review briel1y the description ofsamplcu signals by:= transforms and
at the same time establish notation which will be useful later. 111 discrete-time
syslems the relation between a sampled signal and its transform description is
very direct. If a signal /(t) is sampled every T seconds, the result is fully
described by listing the sample values and their times. As before, we denote
samplef(k T) by/;. and we shall usually be concerned with signals which start
at lime zero. The list of sample values/;,,/; ,/;, ... is denoted by {fl, the curly
brackcts being real! as ""sequcnce". To complete the description we need an
operator LO shirt a sample along to thc correct place in the sequence. If a plain
number .\:' is taken to mean a sample at time zero of value.Y, and aT-second
2.3.2 Discrete-Time Transfer Functiun
Through (2.3.2) we call in principle idciltify".IJ{s) as Y(s)/ U(s) uS1I1g any
Laplace-transformable signal as input. We have to fit a Laplace transform to
the waverorm .\'(1) by breaking it into cOillpOIlCnts with known Laplace
lransronns. slll:h aSi:! constant. a ramp and exponentials. By doing so we arc
selecting a parameterised model, a lllorc dcmanding business than merely
recoruing all 1I1lparumclcriscd illipulse response. To add to the complication,
Y(s) contains components stemming from U(.q as well as I-!(s), as in Example
2.3.1. This wcighs against any inpllt more complicated than a step. but a more
elaborate waveform may bc I'orecu 011 us by input-actuator limitations.
,
I
j
(2.3.2)
(2.3.1 J
Y(s) = lI(s)U(s)
Laplace and Fourier transl'orms are the basis or classical control design and
much of the analysis of" electrical systems. We recall their delinitiolls
2'l/(I)] '" F(s) '" r' /(I)e'" dl
J"
Jl/(I)] '" F(jw) '" I' /(tle )""dl
J"
Their popularity is due to the way they simplify the inputoutput
relation (2.1.2) to a mUliiplicalion of Laplace transforJll'
Thc convolution sum suggests an identiHcation method in which Yo to YN
antlu
o
to II,..,. arc recorded, then the N +- I simultaneous linear equations givcn
by (2.1.3) arc solved for!lo to I1
N
Assuming the input is zero before time zero,
(2.U) gives
32 2 CLASSICAL METHODS OF IDENTIFICATION
delaying or a sample is denoted by an operator :-1, a sample}; at time kT is
conventionally The entire sampled signal is then
(2.3.3)
2.3 TRANSFORM DESCRIPTION OF RESPONSE
33
(2.3.9)
,.1 -';r
+ ... + rull +- jJJ/Z-1 +- fJ,;=-2 + ... ,y..!)
+,.c '_ + ... +- )'/1 I
1-/1
1
,,-1 1-/1,:-1 I-{J"z
1'1(1-/I,:"I)(I-{l.,: 1)( ... )(1-/1":-1)+)"(... )+ ... ),,,(,,,)
(I i)
( 1+(/ ... +b ... +I) _-"+I)U(_-I)
1 u- . 0 1 /1- 1 '"
(2.3.10)
I' = -ll I'
_ I J. I
so it looks as il'we call solve sliccessively 1'01' hlP hi' hJ.' ctc. The only thing that
spoils the idea is the presence of output noisc. Careful choicc 01'11
0
to liN might
keep the errors ill "0 to "N tolerable (Problem 2.8), but a beller
solution is to recortl more input and output samples, and find estimates of 11
0
to "N which give a good overall Htto the observed output. Chapter 4 rollows up
this idea ill detail.
Let us now think abollt itlentifying the z-transform transfer function
writtcn as a rational polynomial function of z - 1 rather than a long; or even
infinite, power series. A typical u.p.r. consists of 11 samplcd exponentials, and
can be written as
The coellicicnts or ='-' 011 each side give us a difference equalion
+- .. (2.3.11)
Il
o
+- h
l
:::- l+- .. +- h
ll
I::: -II + 1
.. -- all::: -';;--
Here the dcatltimc has becntakcll as zcro; more unCIl it will be nOll-zero, with
the elrcd of making hI)' anti perhaps other leading numerator cocllicients,
zero. From (2.3.9) ano (2.3.6),
from which
(2.3.12)
Likc the convolution sum, this equation 1'01'.1', is linear in its transfer-function
(2.3.7)
(2.3.4)
(2.3.5)
oJ
'1
:1
r
(2.3.6) "
!
,
cun be
/>0
F(z - 1) = I + e::tJ'z - 1 + e1J.'fz -1 + ... + ('''''ilL::: - N
l_elN+1)::tTz-N-I
I-erz I
the output is altogether
+' .. )Ul/+-;:::-1(h
o
+h
1
::: 1 +"')11
1
+ ;:-2(11
0
+h
1
=-1 + .. ')11
2
+ ...
+ ... + ... )==II(=-I)U(: I)
Clearly (2.3.6) is the discrete"time coullterpart or (2.3.2). Sample
picked out:
We recognise this us the convolution sum (2.1.3), with a timc marker
attached. In discrete time thc step from the operational input-output rclation
(2.3.6) to the explicit expression (2.3.7) ror the output is trivial, comparcd with
the step from the transform relation (2.3.2) to thc convolution (2.1.2) in
continuous time. Furthermore, the t1iscrete-timc convolution sum is easy to
compute.
A z-trallsrorm input-output equation comes straight from superposition or
the responses to individual input samples. With u.p.r. and input respectively
li(: - I) == "0 +" I: - I +",: .. , + .
UU-
I
) == lIo + 11
1
:-
1
+ 11
2
=-2 + .
This polynomial ill:: I is the or the Sillllplcd /(/). Its length is
tiniLc or inlinitc m.:con.lillg to whcnf(l) CIlUS, alld for lllOSI Sililplc wavcfOrlils
F{Z-l) cUn be wriucn concisely in closed form as the quotient of two
polynomials in Z-I (that is, a rational polynomial function). f:or example,
when a finite-duration exponelltial f(l) = (-':II, 0::; 1::; NT, is sampled, its =
tmnsform is
J4 2 CLASSICAL METHODS OF IlJENTIFICATION
2.3 TRANSFORM DESCRIPTION OF RESPONSE
35
cocllicients, and we should expect that the u's and b'g can be estimated iIllllllCh
the same way as the u.p.r. cocJlkicnls h
o
to h
N
Latcr chapters will show how
rar lhis is lrue. The mosl important auvantage or (2.J.Y) (2.J.12) is Ihat I'm
most systems 11 can be quite small, typically'2 or J, whereas the number or
significant u.p.r. samples may be very Il1lich larger.
2.3.3 FCClllICIlCY Trnnsfcr Function
shineu Olllpul, su the gain IH(jw)1 anu phase change LJJ(jw) apply to allY
input sinc-wave at frequency wl2rr Hz,
For sinusoidal or periodic inputs we nced tile complex value or HUn)) only
at the frequcncy of the input amI its harmonics. ifany, This is the situation. for
instance, in steady-state analysis 01" power systeni.s. More generally we wanl
the ji'('(/IICJU'.1' Jimcliul1 lI(jUJ) over the whole range of frequencies
passed by the system. so we can predict the rcsponse to allY input waveform
which has a Fourier transform, using
Although we shall concentrate on the identification or discrete-time systems,
we should not ignore the 1110s1 wiucly used identification method of all,
wave testing to obtain the frequency transfer function lJ(jw) of a con tillllOUS-
time system.
The response of a system to a sine-wave input CUll be found algebraically by
first writing down its response to Cj"'l, applied from time - The response to
e- j(lI! follows by putting -j for j, and the response to sin wI or cos wi by
expressing them in terms of ej,n
l
unO e - j,nl, The convolution integral gives the
rcsponse to ejl"t as
Thc impulse rcsponse can be recovcrcd from H(jw), in theory by inverse
Fourier transformation and in practice by fitting a parametric model with a
known inverse Lo an experimental H(jw), as in the next section.
Althis point il is worth nOling that the delay operator :-1 in discrete-time
systems can be interpreted as a compressed notation for e-
ST
, where Tis the
sampling period. That is, at timc kTis regarded as a /5 function of
which has a Laplace Two beneJHs are
conferred by this view, First, we can putjw for s und obtain the frequency
I l .; .... ,
transfer function of a discretc-timc system from its z-transform' transfer
fUI1l:tion, anu the spectrum F(e - jlUT) of a signal from its z-transfol'm. Second,
we can use Laplace transfurms to analyse the sampled input response of a
system helll'eell thc sampling instants. writing for U(:::-l) and
lI{e lor 11(: -- I). The only thing that thcll marks out discrete-timc linear
systems frolll allY otht.;1's is tilt.; relative complicatiun or t!leir Laplace and
frequency translcr fUJ1Ctions.
j
"
.1'(1) = , h(l - T )'J"" ciT
Substitution of r ' for f - t thell gives
(2.J.1 J)
Y(jw) = H(jru)U(jru)
(2.J.17)
where fl(jw) is the braekeleu integral, i.e. lhe one-siueu Fourier lransrorm or
the impulse response. II' I/(jw) is A +jlJ with A anu lJ real (anu rrcljucncy-
uepenuent), a sine-wave inpul
(2.J.15)
gives an output
.1'(1) = [H(jw)e
j
"" - H( -jw)e - j""J/2j
= [(A +jB)(eos u)/ +jsin wI) - (A -jB)(eos wI -jsin wI)]/2j
= A sin wI + Beos wI = (A' + B')'12 sin(w/ + Ian -1(B/A (2.J.16)
In other words, the output is a sine-wave of' the same frequency as 11(1),
multiplieu in amplituuc by (A' + B
2
)1/', which is Ill(jw)l, anu auvanceu in
phase by tan-l(B/Aj, which is LJJ(jw). Since the syslem is linear anu lime-
invariant, a scaled or time-shifteu input produces a similarly scaled or timc-
Examl)lc 2.3.2 A sampled exponential lu} = (':l:t, 1 = 0, T, 21'. " . if..! with rJ. < 0
has z lransl"orm U(z-')= I +e,rz-l +e"l"z-2+ ... YJ=J/[I-e,Tz-lJ,
anu Laplace lransrOnn I/ll - e" -'Ill The response 01" a syslem with transl"er
runelion I/(s -)') to III J has Laplace transrorm
so
y( t) = ei'iJl{l) +eTe
rlt
- I)Il{l - 1') + J T)Il(r - 2T) + ., .y.)
where 1/(1) is the unit step at time zero. That is, each sample from (ll} excites a
new response starting at the sampling instant and persisting for ever. The
expressiun ror .1'(1) is good ror all times, not JUSI lhe sample inslants.
The spectrum of (li} is Since e
2
rr.j is J, e(:l:-j,,,/T=
e1:l- jlt"- (Jnki1'lrr for any integer k, so the spectrum repeats itself at intervals
or 21[/7' in w. [:,
36
2 CLASSICAL METHODS OF IDENTIFICATION 2.3 TRANSFORM DESCRIPTION OF RESPONSE 37
and the last integral is zero if /)(t) is constant, a ramp or any sinusoid al an
integer multiple of the input frequency (Problem 2.11). Ii is small if N is large
and 1)(1) is random noise im.lepenuent of the input or a sinusoiu at a
frequency unrelated to that of the input. Similarly, C gives Vsin 0 plus a
term which can be made small, so altogether
Example 2.3.3 The opcn-Ioop dynamics of a motor-speed cOiltrol system,
skelched in Fig. 2.3.1, are to be identiHed by sine-wave testing. The sinusoid is
added to the d.c. speed reference voltage, and the gain and phase change
measured by Ihe lachogeneralor voltage normally fed back. The amplidyne is
Speed
reference Speed feedback
against 10gw(D'AzZQ and Houpis, 1981; Melsa and Schultz, 1969), arc
particularly useful in suggesting parameterised rational polynomial transfer-
fUlictiun 111Odcls. Think of
ff(i",) K(jWI, + Ij(iwr, + I j ... (iwr"cl :':1) = GLI! (2.3.21)
. (iwT, + I}(. . }(jwT" + I)
with the r's and T's real. As we increase the frequency J"rOI11 ncar zero, the low-
frequency gain is roughly 1\, then each factorjwT
i
+ I contributes signiJicunt
gail1 and phase advance frolll about W= l/r
j
up, and each 1/(jwT
1
+ I)
conlributes Httenualion and phase lag from about w = liT, up. If the decibel
gain 2010g
lll
G is plotted, eachjlUr, + I is asymplotically proportional 10 w,
and contributes 20dB more gain per decade increHse in w. Each I/(jwT, + I)
gives 20 dB less. With w also plotted logarilhmically the asymptole is a
straight lille, approached within 0.17 dB (2 i';.J at w = 5/r, or SIT,. If
we iii straight-line seclions with slopes integer multiples of 20dB/decade to
the measured gain Bode plot, UJ at each junction between two successive
sections determines a liT; or liT;, according as the bend is upwards or
downwards. The phase plot provides a rough check, sincejwt; 4Y"
lead at w = I/r" 1/(./w1', + I ) a 45
0
lag at w = lIT" and the othcr Caclo'r; litlle
lag or lead al Crequcncies well bclow their values oC I/I or II1'but almost 90"
well above those rrequencies.
Deau tillle is visiblc on the phase plot as a constant rate of increase of phase
lag \\'ith OJ, since a uclay 1" gives rise to eil"l
cl
in lhe transfer function.
Complex conjugale roots 01" the numerator or Jcnominator of HUctJ) are less
straightforward to estimate, but Exercise 2.3.3 will give an example. Factors
jlV in the denominator contribule 90 lag and -20dB/decade gain over all IV,
and are easiest to detect from the low-frequcncy behaviour which they
dominate.
(2.3.19)
2.3.4 IVlcasurclllcnl of FrClIU(!IICY Transfer Functiull
Some input actuators cannot apply a sine-wave. Many valves ill process plant
open and close at fixed rales. for instance. AllY hysteresis (backhlsh) in <111
input acluatorwill distort a sine-wave bull101 spoil <I step. If we arc lucky and a
sine-wave can be applied, anu the output is undislorlcd and tlie s.ll.r. high, the
gain and phase change at a number or frcquclIcics call be lllcasurcu by just
looking at input and output, once the iniliul-eolluilion response due to
switching the input on has subsidcu and the respunse looks as il'thc sine-wave
has been going for eVer. More often the output contains noise, harmonics
caused by non-linearity (i.e. components at integer ll1ultiples or the input
frequency), constant bias and perhaps drift. Periodic disturbances such as
diurnal variation in biological systems, seasonal factors in economic records
or mains-frequency interference in electrical systems arc also cOl11mon. Gain
and phase-change measurcmentsctln be mauc less susceptible to such
impairments by extracting the fundamental Fourier componcnt from the
output and measuring its amplitude and phase. That is, we compute the
averages
w II) J.'
12N
lrli'';
S=--- )'(r)lsmwlell .I'(I)Co""lelf (2.3.IH)
2Nn {) 2Nn 0
over N cycles of the output. An inpul Vsin flJlpruduccs all oulpUI
GJlsin(wl + 0) + 1'(1) where G is the gain, II is tlie phase change and 1'(1)
comprises all the impairmcnts. Now
GVw II2Nlr'i'" ...
S = -7- [hcos 0 - cos(2wf +0) +1'(1) sin Wf] elf
_Nn ()
GV GVw I"N"'!'" .
= --cos 0 +--- 1'(1) sm wf ell
2 2Nn u
Commercial transfer-function analyscrs work on this principle.
Frequency transfer runctions are attractive 1'01' several reasolls. They arc
familiar to electrical engineers through lheir role in a.c. circuit analysis, anu to
control engineers through classical stability analysis and control uesign. The
Bode plots used in control engineering, plots of log G againsl logw amI ()
0"" tan -I(C/Sj (2.3.20)
TaellO-
G generator
Fig. 2,3.1 Spccd-contl'ol system.
3Y
REFERENCES
Frequcncy-response ioentitlcation is nol short of disaovantages to balance
its virtues. It rcquires it slIccession ol"tests <It different frequencies, taking time
ano necessitating triul ailo ern,,' to arrivc at a suitable range ano spacing or
frcljucncies. At cach frcqucncy thc initial-condition respollse due to sudden
transition from no input to a must die out before the steady-state
forced response is observed. For that reason s\",eeping or the frequency may
nul be acceptable. A model is 110l convenient for some
applications. For instancc, the intcrsymbol interference and echo behaviour
of a data-communication chanllel is bettcr modelled in the time domain, by
i mpul se responses or u.p. r:s. Finally, frequency tmnsfer functions of discrctc-
ti me systems are complicatcd, as Problem 2. I0 discovers.
and 1'} and leaving no extra lag to suggest significant dead time. The low-
frequency gain gives K 18 in the transfer function. The high-frequency
behaviour is morc thought-provoking. A very sharp resonance peak at aboul
171lz suggests a translcr-rulIl:tion factor l/l(jw/w
n
)2+ 2(iw/w" + I] with (, the
damping ratio ([)'Azzo and Iioupis, IlJH I J, very small. Thc full-line curves ill
I:ig. 2..1.2 givL:11 hy IIi" 11.1 IIDd e..- tHL"i arc a good iii ill phase hut ollly a
11IOLkratL: lit in guill. It louks as ir even this' may be too high. At this stagc
some physical insight is csscntial to make scnse of the results. A likcly source
of very lightly damped resonancc is torsional oscillation between the
armatures of motor :Jnd tachogenerator. The motor-tachogenerator
combination is, in fuct, a laboratory set with two identical
large armatures. Further testing with the taehogenerator electrically loaded
eonlirmed this explanation; a lOW load provided enough damping to reduce
the resonance peak height to about 1.5 d B.
Two lessons may be drawn from this example. First, surprisingly smooth
and apparclltly accurale results arc obtained by averaging y(f}sinw! and
.I'U) cos wi (over 10 cycles up to 5 Hz, IliO cycles up to 10 I-Iz, then 1000 cycles)
even ill thc prcsencc or extreme impairmcnt of the output. Second, a
convincing model can only bc found by interplay between test results and
background knowledge of the system. /':,
REFERENCES
O'Aizo. J. J., and Houpis, C. H. (1981). "Linear Control System Analysis and Design", 2nd cd.
McGraw-Hili, New York.
Gabel,R.A., anti Roberts, R. A. (1980). "Signals and Linear Syslems", 2nd cd. Wiley. New York.
Gourrey, K. R. (1983). "Compartmental Models and Their Appli<:utinn'. A<:ildcmic Press, New
Ynrk ;1Ilt! LlJlltlulI.
t\ilelsa, J. L., anti Sdlultz, D. G. (llJ69). "Lillcar Control Syslellls". McGraW-Hili, New York.
Norton. J. P. (1913). Praclil;al problems in blast furnace idelltificatioll. M"as. CUI/trut 6, 29-.34.
. 30

, .'-_..,-
10
o
?u
10
lU
".\
2 CLASSICAL METIIODS 01' IOENTIFICATION
30
m

c
-f80
u
&
o

Q.
30
-,

20
ill
<.1
L IU
"
CO
0
10
0
38
a high-gain d.c. generator acting as all amplifier. The dynamics might be quite
complicatcu, as they include the umplidync Held tillle l:oJlstanl, the armature'"
circuit time constant of the umplidync and mOlor, and the mechanical time
constant of the motor-armature: tachogcncra tor-armature: motor-load
combination. The output, ofLm.s, amplitude betwcen about 0.25 V and a fcw
volts, has superimposed on it about 25 V d.c., a ncar-sinusoidal 200-Hz
com1l1utator ripple of 5-10 V Lm.s. and a few volts LnLS. of wideband
commutation noise. Given thc opportunity, onc would tcst the various
machines separately, but testing the overall dynamics is quicker and nceds less
instrumentation.
The Bode plots of the test rcsults are given in Fig. 2.3.2. The downward
breaks urthc straight-line approximations to the gain plot, to - 20d U/dccude
slope at 1.8rad/s and -40dB/deeade at 12.6rad/s, indicate denolllinator
factorsjwT + I with T
1
= 0.556 and T, = 0.080. The phase changes at those
frequencies are close to '-45
0
and 90 Q - 145, confirming the values or 1')

FrcquClicy (HI)
Fig. 2.3.2 Botle gainulld phase-change pl,ols for spccd-c(llllrllJ systelll. Vertical burs indicate
uncertain measurements.
\
I'ROIlLEMS
a(t) 0.05J(t - 0.05) +0.1 ,\(t - 0.1) +0.15<5(t - 0.15)
Reid, J. G. (1983). "Linear System Fundamentals", McGraw-Hill. New York.
Unbchaucn, B., und Dickmann, 1<. (19fD). Application of Ill.LIl1.0. idclltilk:llioll In u blast
furnace. In "JdcntHiculion and Syslem Parameler Estimation 11)81" (0. A. Hckey Hlld
G. N. Saridis, cds.), pp. Pergamon, Oxford.
Ziemer, R. E" Tranter, W. B.. and Fannin, D. R. (1983). "Signals ami Systems: Continuous ilmi
Discrete", Macmillan, New York.
t'tWIILI'MS
sleady-state gains equals lhe common ratio between successive samples in
their u.p.r.'s, anJ so docs the ratio or thcir peak impulsc-response values.
2.7 Show thalthc gain lIud phase change of the zero-order hold described in
Sect ion 2.1 .2 arc 2 sin( w T/2)jw and - (I) T/2. where (I) is the angular frequency.
Explain the aSylllpllllk gain as (:) tends to zero. At what frequency docs the
gain dilfer by 1dB rl\)ln its zero-frequency value?
2.M For lhe idcntilkaliull Illcllwd tClltalively suggestcd in Section 2.3.2,
solving for the u.p.r. ordinates, consider the elfects of an estimation
error ()II;in Ilion the estilnates oflatcr ordinates hi-j- i' etc. \Vhat features orthe
input sequencc would cause tile error to increase as it propagfited? \Vould a
tliverging sampled exponcntial be a good test input? \Vould a converging
exponential?
2.9 Find anu sketch the amplitude and phase spectra or the sampled
exponenLial l/(iT) = exp(aiT). i = O. 1,2, .... considered in Example 2.3.2.
with IJ. real anu negative. lRather than grinding Ollt algcbraic expressions,
think or exp(a -jw)Tas a vecLor or length explaT) at an angle - wTradians
to the positive real axis, anu do somc geometry.]
2.10 A microprocessor takes ... or a signal/(I) at intervals T
and 101"1115 the three-term moving average g/ = (/; I +'/;_1)/3. find the
transfer runctiull (i(jlt)jF(j(J). Sketch how the gain and clHlnge vary
with frequency.
2.11 Rcfcrril.lg to (2.3.ltJ), verify that conslant, ramp or harnlOnic-l'requellcy
CUlIlpllllenls III the uutput have no clfet:l Ull the frequency lransfer function
measured by the Fourier analysis nlcthod or Section 2.3.4.
2.12 Two adjaccnt break frequcnl:ies lll1 a Bouc gain plot arc scparated by a
factor or 3. Thc gain contribution at the lower break frequency of the transfcr-
function ractol'which gives rise to the LIpper break is therefore less thall 0.5 dB,
negligible for most practical purposes. What is its phase contribution? What
do you concluue about the relative convenience of gain and phase Bode plots
for identification'!
2.13 Roughly, what deau time would be the smallesl reliably detectable in tesl
results or the apparent quality of those in Example 2.3.3'1
2.14 (For control engineers) Whal limitations, if any, would Ihe high-
frequency reSOnance found in Example 2.3.3 place on Lhe steady-staLe-error
capability orthe closed-loop control system? Would your anslVerchange ifthe
resonance peak were aL -3dB rather than -15dll or so'!
2 CLASSICAL METHODS OF IDENTIFICATIlJN
(iv) the continuous-Lime response to the input in (iii).
2.2 investigate the effects ora non-ideal input in an impulse-response tcst by
plotting the response or the system with impulse response h(l) = exp( -I) -
exp( -51) to a rectangular pulse input ofullit arca and duration (i) 0.1 ,(ii) 0.2,
(iii) 0.5. Compare each response wilh hU).
2.3 A system has a lLp.r. l:ollsisting 01" two sampled exponential
components. one fast anu one slow. In each sampling interval, thc fast one
ralls to {XI times its valuc at the start of the interval. The corresponding figure
for the slow One is (X2' and (XI is about 0:; with r a large integer. As ill Examplc
1.3.8, the diffcrence in speeu causcs uilliculty in ioclltirying 0:; auu 0:
2
adequately. They could be ioculilied in two separate experimcnts with
sampling intervals diffcring by a factor f. ShoulJ the results ror (XI and 0:; be
combined in a z-transform transfer function or the rorm
H(z-')=b,!(I-a,z-')+b,!(I-a;z-'j'!
Ir not, how should they be combined?
2.4 Find and sketch Ihe sampled unit-step responses or the two systems or
Example 2.1 J(i). .
2.5 Ily treating Ihe lirst u.p.r. in Example 2.1.3(ii) as two sampled
exponentials and the second 1I.p.L as a single samplcd exponential, verify that
the continuous-time impulse responses of the two systems just aner ti mc zero
behave as statcd in that example.
2.6. Two first-order continuous-time systems have the sal11e u.p.r.'s when
their outputs arc sampled, but one system has a dcau time mllch less than one
sampling intcrval, and the other a deaJ time just less lhan onc sampling
interval, as consiJered in Example 2.1.3(iii). Show lhat thc ratio or their
4U
2.1 A system with impulse response h(t) =exp( - 101) is quiescent at time
zero. Find and sketch (i) its response to an input a(t) = I. 0,; I < 0.2; (ii) the
sampled response to this input, the samples being at time intervals of 0.U5
from time zero; (iii) the sampled response to the sampled version of this input,
I.e.
Chllilicr J
hlcnlilicalion Bascd on Correlalion Fnnctions
3.1 TIME AVERAGING TO REDUCE EFFECTS OF NOISE
. 3.LI Time-Ayerage Relations between Signals
A basic problcm in iuenlificatioll is to distinguish the effects orthe input from
noise in the observed output. Averaging the responses \vas recommended in
the last chapter 1'01' impulse and step tests, unu in a rrequcncy rcsponse test,
Examplc 2.3.3, \vt achieved impressive rejectiun uJ' lIoise. and other
impairments by averaging the product of the output and siH ;lJI or'cos wi,
signals uerivcd from the input. The idea oravcragingcan be cxtcnded to other
forms of input by employing the jilllclion (c.c.!".) I'urt r,l)
between inputll(r) and output .1'(1). dclined as
1'",.(r,I)=t'[II(1)'(1-1-r)] (3,1.1)
The notation El' Jsignifies "the ex pceted value or .", so I'UI'( t, f) is the average
of 11(1 )'(1 -I- r) over all possible values. regarding 11(1 ),1'(1 -I- r) for given I and r
as a random variable (Helstrom, 1984; Melsa and Sage, 1973), For our
immediate purposes, we can read it as "the average of," and interprct it as the
I ime average
I'",.(r) = 2
1
l ,[r1/(1)'(1 -I- r)tll (3.1.2)
This average is a function of lag r but n,ot or 1. The two averages do 110t
coincide for all signals, since [11(1)'(1 -I- r)J might well vary with I. That is,
averaging over time in one long experiment need not give the same result as
averaging at olle time over a large number of experiments, even inlhe limit as
experiment length alld number of experimellts tend to infinity. A random
variable for which they do coincide, as we shall be assuming, is called ergodic.
The reaSOn rur IClling the start time - T ill (3.1.2) tend to 1.) rather than
fixing it at zero is that \\'cshulJ be interested in thc steady-state forced response
to an input which started long ago, not the transient response to a signal which
starts at time zero. Thc c.c.r. defined by (3.1.1) measures how closely .r(f + t) is
4.1
3 IDENTIFICATION IlASED ON ('ORRFlAIION FliNCIIONS
related 10 lI(I); the value" which minimises E[(a//(I) - y(l + r)']
shown 10 be f,,).(r)/E[//'(I)] (Problem J.I).
A dist.:rclc-tilllc cuuIIlcrparl or p.I.2) is
45
(3.1.7)
so:thc ex.r. bctween 111} and l.l'J is
:\.1 TIME AVERAGING TU REI>U('E EFFECTS OF NOISE
initial-condition response has long since vanished, is
(J.I.J)
is casily
k integer
N
. I l:' ,. (Ii) = Ii III --"'"""-.. 1/. l'.
uy. N-.,,,2N-j-1 r.llk
i N
44
which CUll be approximated by an average computet! from finite records.
The ulltocorre{llioll jimetioll (a.c.r.) uf a single signal is ocflncd
analogously. For instance, the continuous-time a.c.r. of an ergodic 1/(1) is
N
f",.(k) = lim 2--/1,_1_-
1
\' "
i
\' IIPi"_j
N ... +
j=- j=-O
I
T
f",,( r) = E[//(I )//(1 + r) J= rT _,."(1 )//(1 +r) dl
and the discrele-lime a.c.r. is
(J.1.4)
(3.1.8)
3.1.2 Rclatiun ill Tenus uf Currchltiuu FUllclitms
The a.c.f. covers negative as well as positive lags T or k. and is all even I"unction
or lag. From (3.1.5), for inslancc,
N-k
f",,(k) '" k" \' "i(Y;" +Vi +k) '" +f",.(k) (3.1.1 0)
. N-j'+ -!....,
i=-M
So long as {v: is unrelated to :u} and zero-mean, the long-IeI'm average or
LljV
H
-
k
is very likely to be dose to zero. We formalise this by saying [u} and {vJ
are JIll/wally ll1u.:orre/ated if f'lw(k) is zero 1'01' all lags k. A morc restrictive way
to ensure that r(k) is zero is to aSSU!11C that ll. and V'+k arc statistically
independent, so E[UiVH-d is Eluj]Elpilk]' and lone orlthem is zero-mean,
i.e. E[uJ or E[v
i
I kl is zero. The precise assumption is unimportant, as we
cannot in any case verify tllLlt {rIll' J is llcgligiblc during lhe experilllciit since Iv l
is unobservable. A heuristic "engineering" assumption is that we can be prelly
sllre {r'JI,J is ncgligible if we takc care lo HvoiJ treating any or
f",.(r)= i>(t)f",,(r-t)dl ""', (3,1.9)
wherc I, is thc scttling tillle beyonu which 11(1) is negligible. The equation
originally arose in optimal filter design. We wanl to solve (3.1.8) 1'01' the u.p.r.
111 I, which call be truncated al say, in ail asymptotically stable system,
CUlling 011' the negligible part of the dccaying tail. The unknowns "u lo cnter
linearly, so wecan compule "u\.(k) for s + I values of k, inserllhem into (3.1.8)
with the corresponding f'uu{k) 'to rllU{k '-' s), and solve by matrix inversion. We
go to the trouble of computing (I'1I1'} and {run} to reduce the influence of noise.
II' the observed QlItput U'l is comj)oscd of clean ou lput lye l plus Iwise {V}, we
compute
This is lhe IVienef-!Jonl' equation. The continuous-time version, derived ill
much the same way, is
(J.1.6j
(J.I.5)
N-I Ii
I
,...' 1 Ii
I "i"i_k=r",,(-kj
i"" - NIl;
I
I
=. 1111 ---
N"" 2N + I
I
I
f (k) = lin ' ..
Uti N-"L 2N + I
since the starting point or the sUlllmation is immaterial. Although the c.c.r.
also exists for negative lags, it is not an even function, and values or rUl' at
negative lags are seldom of interest, as J' docs nol then depend on u. .
The a.c.f. fUll and c.c.r. "11\' are important in identilication because they are
related through the impulse response or u.p.r. of the system with /I as input
and y as oulpul. A good eslimale or the impulse response or u.p.r. is oftcn
obtainable from the relation, since "u)" and ir necessary r
ulI
' can be measured
by a time average on which noise has little ctfect.
The discrete-time relatiun is most usc to LIS. The oUlput al sample insltmt
j +k due to an input thal started an inddinilcly lung timc ago, so that the
3.1.3 Puwer Spcctnll Density: \Vhill' Nuise
constant components of the output as. Boise. We can be surer still that the
computed II"",.} in (3.I.S) is less afrected by noise than 1.1' I in the input-output
convolution (2.3.8) which olrered an alternative way to lind the u.p.r.
Solution for {II} is very easy if we employ an input with al1ullcomplicatcd
a.c,f. The best of all is a II'hile input which has ruu{k -j) zero cXl.:epl when lag
I< - j is zero, yielding I" i directly from
47
(3.1.15)
U(z)U(: ') =jU(z"JI' =IU(e-j,uf'W
3.1 TIME AVERAGING TO HEDUcE EFFEcTS OF NOiSE
N - AI is im:reased, the z-transrorm of the c.c.r. gets closer and closer to
U(z)Y(z" )/(N - M -I- I). If the sampling interval is T', say, the frequency.
domain behaviour or the c.c.f. is therel'ore found by writing =as cxp(jwT'J
in U(z)r(z -, )I(N - M -I- I). The a.e.L is dealt with similarly, using
U(:) U(z' )/( N - M -I- I J, and is mote informative since
That is, thc Fourier transform of the a.c.r. gives the square orthe amplitude of
thc signal spectrum. By analogy with a sinusoid, the square of the anlplitudc is
proportional to the signal power at the frequency in question, or
accurately the power per unit frequency, since the signal is represented as a
continuum of frequency components. The Fourier transform of the a.c.f. is
therefore called the power (allto-) .\jJectral density. The c.c.r. gives rise to the
cross-spectral pOII'i'r density, less readily interpreted.
For completeness let us Fourier transform the c,c.f. and a.c.r. of
continuous-lillll' sigllals and lind a similar inll.:rprctatiol1. \ViLlI rlll.(T) as in
(3.1.12),
(3.1.11) I< =0, I, .. .,S
3 IDENTIFICATION liASI'D ON CORRFIAIJON FUNCTIONS 46
1"",.(1<) = :2.>/",,(1< - j) = ",(J,;,
j==O
where is "uu(O), the 111.5. value of Ii.
Before we examine specific white or near-while input signals for
idcntiHcation, it is worth laking a look at the frequency-domain signilkancc or
correlation functions and of while signals.
(3.1.17)
(3.I.IY)
(3.I.IS)
1'/,. Ir)/= I" ,. (rl<' '''''dr
. 11\' III
RUl(jw) =(I/2T)U( -jw)Y(jw)
In exactly the same way, the transform of the a.c.L of 1/(1) is
Ruu(jw) = (1/2T) U( - jw)U(jwJ = (1/2T)/ U(jw)/'
A rrequency-response idclllilicatioll mcthod call be based on
lI(jw) = Y(jw)IU(jwJ = Jiu,.(jw)IJiuu(jw)
I fT . fe .
= u(t )('i"" ..' y(1 + r)e JUJ{1 -I- r) dr tit
2T -I' -x
(3.1.16)
In the inner integral I stays constant, so ilr equals iI(I -I- r) and the integral
gives the Fourier transform Y(jw) of the output. The transform exists
provided the response to any short sectioll of 11(1) is absolutely integrable,
which is so in any system or which all poles have negative real parts. As ll(t)
extends only from - Tto T, the outer integral is the Fourier transform integral
with j in place of - j, so
J
',. 1 fl' .
=. 2T .. 1/(I)e'''''y(1 +T)e-
J
'''''' "d'dT
- I
(3.1.11)
(3.1.14)
(3.1.13)
for
As we arc considering signals whidl go Oil ror CVl:r, we canllot discuss their
frequency-domain charaderistics without first ensuring that their Fourier
transforms exist. The transform ur u(f) Illay 1I0t exist unlcss ll{f) is absulutely
integrable, i.e. J: ,11I(I)lill is finite (Gabel and Roberts, IYSO; IJracclVell, InS).
To make sure it is, We restrict u(t) to a finite bUllong duration rrom - T to T,
and correspondingly rcdellnc I'll.\'( T) as
1'",.(r) = -'- IT 11(1 )'(1 -I- r) ill
. 2T_
T
and similarly for fuuer). Discrete-time signals arc treated the same way, with
the a.c.f. and c.c.f. regarded as finite-time averages over as Jong an interval as
we please.
The z transforms of (ll} and {Yl, both extcnding from sample iH to sample
N, are
+lIM-I-IZ-M .. j +...
y(:-I)=.I'Mt-M + ...
and it is easy to see that
[cocllicicnt or :-k in U(:)Y(: I)J = UMJ'Mlk + I1,1f! l.l'Ml-h! I +... +II,..,' h.l',v
If we divide through by N - M -I- I - 1<, this approximales 1'",.(1<), and as and some commercial frequency-response analysers work that way, but the
time-domain alternative is more convenient, as it avoids thc practical
2(1 +acoswT'+1I
2
cos2wT'+' .. iJJ)-1
= lie[2(1 +aexpjwT' +u'exp 2iwT' +... y ) - I]
I I I _ ,,'
Ii Uw) = + - I =
uu l-aexpjw'f' I -acxp-jwT' 1+0
2
-211coswT'
or written as
Exumple 3.1.1 A sampled signal has a.c.r. tuu(k) = alAI with -I < a < I. Ils
p.s.d. is found [rolll {turJ written as the: transform I +a(z+z-')+
a
2
(z2+ z -2)+ ... Cf) with cxpjwT' for t. Sinccjul < I, the inJillitc scricsci.ln
be summed to get
3.2.1 Test
3.2 COIWELATION-IlASED IDENTIFICATION WITH SPECIAL
PERTURIlATION
It is important to recognise thc limitations of white noise as a basis for
modelling structured signals. Real-life inputs and noisc more oftcn than not
contain fcatures which cannot be reprcsented as filtercd while noise. Sustained
deterministic componcnts or sporadic disturbances are often present.
Sometimcs the model call be extcnded to include theIll, but sometimcs wc are
reduced La hoping thcy do not malleI' too Illuch, or selccting records where
they arc not loo promincnt.
3.2 CORRELATION-BASED IDENTIFICATION WITH PERTURBATION
Since U(j",) is lI(jw)/V(jw), IU(jw)I' is IIJ(jwJI'IW(jwJl' so the ]l.s.d. or
11(1) is 11I(jw!l'I', i.e'I'/(w'Ti + I). The larger T
r
, the narrower the spectral
spreau of 11(1). 6
3 IDENTIFICATION !lASED ON COllRELATION FUNCTIONS 48
problems of aliasing, leakage ano windowing in numerical Fourier
transformation (Ziemer elal. 1983).
Although we have written the transform uf the input a.c.r. ill terms of the
input spectrum in deriving the power spectrum, we would in practice usually
find the p.s.d. from the a.c.r. directly.
and summed.
The p.s.d. li
uu
Uw) is periodic in w alld oscillates betwcen (I - ,,)/( I +a) at
w= rr/T', 3rr/T', ... and (I +a)/(I-a) at w=O, 2rr/T', 4rr/T', ....
6
One special case merits close attention. A sequence {IF} with a.c.r. zero
except at lag zero has a constant p.s.d., since the :-transfonn of {t
ll

II
.} isjuSl
the m.s. value of \1', and pUlling expjwT' for z does not it. In
continuous time, a c}-function rll'll'(r) transforms to a llat, infinite-bandwidth
p.s.d. A signal with a flat power spectrum is called II'Mll! noise, by loose
analogy with while light. The total power of such an infinite-bandwidth signal
would be infinite if the power in any flnite bandwidth were non-zero, so pure
white noise is pure fiction; we are actually concerned with Iinite"banuwidth,
flnite-power signals with flat power spectra. A signal with a f}-function or
single-impulse a.c.r. has no consistent time structure. Its future values do not
depend on its present value. Because of its lack of structure, white noise
represents an ideal against which to mcasure the output errors 01" a model,
since 110 model can do murc than embody all the uf the output.
White noise is also a convcnicnt raw matcrial for modelling structured signals.
The idea is to describe a structured noisc or input signal as the result of lincar
filtering of white noise,
Example 3.1.2 A test signal u(t) is gcnerated by filtering while noise II'U) or
constant p.s,d. p ovcr the bandwidth of interesl. The lilter transfer function is
HUw) = 1/(jwT
p
+ I). The p.s.d. or 11(1) is to be rOUIlU.
A white discrete-limc perturbation signal {Ii} can readily be' generated by
sampling a physical source of widebund llat-spectrum noise such as thermal
noise (Helstrom, 1984), looking up a random-number table or l:omputing a
deterministic long-period number sequence indistinguishable from noise
samples. as in the pseudo-random number generators provided by high-level
languages, Jl is wise always to compute the sample a.c,r. and reject uny
unreprcscntative test sequence. The standard deviation of a sample a.c.f.
computed from N samplcs ofa genuillely white, zero-mean {II} is E[lI']/.jN at
any 1101l-zero lag (Problem 3.4). As E[II'] is the autocorrelation at lag zero, a
trial {II/might reasol1ably be rejected if the sample a.c.r. at any small non-zero
lag exceeded 2/JNtimes the sample autocorrelation at lag zero, roughly
the 95 confidence limits for a Gaussian variate. A more refIned test is hardly
justified, as the criteriun for rejection is subjcctive anyway.
Idcntification of the u.p,r. from (3.1,8) using a white input {It} is known as
while-noise testing. A white test signal is called "noise" to cmphasisc its
unslructured nature, evcn when it is completely known. Whitc-noise testing
IHIS some drawbacks. Very largc input values Illay Ol:l:ur, tlepclH.ling 011 the
source, and be clipped by the converter or input actuator,
altering the a,c.r. A signal frol11 a genuinely random noise source is not
reproducible but can, or course, be rccurdctl to allow cUlllparison or
experiments. More scriously, a tinite stretch of a white sequence has a finite
risk of a rar-rrom-iueal sample ,..c.r., illvaliuating thc usc or (3./.11) on the
sample a.c.r. anu c.c.r.
The reliability of results from finite records can be assessed more easily if
50 3 IDENTIFICATION IlASED ON COHHELATION FUNCTIONS
J.2 COHRELATIONBASED IDENTIFtCATION WITH PEHTUHllATlON 51
(3.2.1 )
2 3 "P-I Ptj
ITI-r---T P
(b)
k (bds)
(0)
Figure J.2.la shows the a.c.r. of a gcneralm-scqlicnce with binary Icvcls
h. The sequence is normally applied via a zero-ordcr hold to the input of a
continuous-time system, and the output is sampled. The u.p.r. is thcn a
description of lhe sample-time dynamics of the zero-order hold and system
combincd. An allcrnative is to vicw the output of the zero-ordcf','hold as a
continuous-limc input, observe the system output continuously and form
7
I
(lI
i
-e)(lI
i
,,-e) - 7
r . .(/1) = = I' (k) - 'ell +c-
Ull 7 rm-
i= 1
T( X 1h)
Fig. .1.2.1 ta) Aulocorrclation function of IIHequcllcc. The negative v,l!ucsof "u" areaJJ - /J!';P.
(b) autocorrelation functioll of output of zcroonJcr hold driven by /II-sequence.
amI so 011: r
uu
(3) =I'lIu(4) =r
llu
(5) ='-1111(6) =1, rllll?) =4 =rut,{O) , 1'111,(8) =
I'rw( I), .... The a.c.f. is far from that of white noise but can be brought
closer by a d.c. shift in {II}. II' we subtract e fr0111 every bit, forming {II'},
where If is the mean or ju The choice (' = (4 ,/2)/7 makes r'l'lI(k) zero at
all lugs except multiples or 7. The resulting binary levels 0.227 and -0.773 or
0.631 and -0.369 may be less convenient than symmetrical levels.
Symmctrical Icvels give non-zero autocorrelations at all lags, but the
measured \r
lly
} call be adjusted to allow for them, as described shortly. 6
{u} is a deterministic pseudo-noise signal, with a.c.r. behaviour over a linilc
interval precisely known. A family of pscuuonoisc signals with several
convenient features is considered nexl.
(i) deterministic but pseudo-random in the sense thal its a.cJ. is close to
zero, compared with the value at lag zero, over a range of non-zero lags;
(ii) binary, a great advantage as it maximises the power for a given
maximum amplitude, simplil1cs digital generation and storage of the
sequence, suits most actuators and makes e.c.r. COIll pUl<tliulI easy as ex plai II cd
later;
(iii) periodic, so its a.c.r. is periodic and is foulIl.l al:curatcJy by averaging
over a single period, since if we take 2p periods,
1'-101 1)/'-]
I' (k) = lim \' I \' "
i
"
i
"
1111 P G
j= -I' i '" j/'
1'- 1 /,- I
=*I "i "iH =* I "i " i 'HII' = I'",,(k + 11')
i=O 1=0
where I is an integer and P the period;
(iv) synchronous, i.e. the samples are produccL! regularly at one per bit
in terval t h'
The best-known p.r.b.s. is the maximal-length sequence or III-sequcnce. An
m-sequence has period 2" - I during which every n-bit binary number exceptll
zeros starts exactly once. For example, one periou of the with 11 = 3
is 1011100, so the three-bit numbers 101,011, III, 110, 100,001,010 start
successively in that period (,ind run into the /irst two bits orthc next). The a.c.r.
characteristics of m-sequences arc best seen in an example.
A pseudo-random binary sequence (p.r.b.s.) (Pcterson, 1961; Golomb, 1967;
Godfrey, 1969) is
3.2.2 UhUlf)' SCIllH.'IICC Tes.
Example 3.2.1 We calculate the a.c.r. or the seven-bit /II-sequence by
averaging LJjUj+!i over seven successive samples, starting at i = I auu taking
lI
iH
rr0111 the second period 1011100 as required. We lind
1'",,(0) = (1 + 0 + I + I + I + 0 + 0)/7 = 4/7
1',,,,(1)=(0+0+ I + I +0+0+0)/7=2/7
1'",,(2) = (I + 0 + I + 0 + 0 + 0 + 0)/7 = 2/7
I'",.(k) l>J'""lk - j) +I'",.(k)
j'" II
I'",.(r) by analogue means, for any lag r of interest. Hellce II(T) at particular
values of r is found, rather than the u.p.r. Figure 3.2.1 b shows thc a.cJ. of the
zero-order hold output. It deviates from the ideal morc L1lan the discrete-time
a.c.r. of the Ill-sequence, by virtue of the two-bit-interval width of the spikes.
The c.c.f. corresponding to Fig. 3.2.la, with output noise Iv}
present, is
(0)
53 3.2 CORRELATIONBASED IDEN I'IFICATION WITH PERTURIlATION
Two dilrercnl cross-correia tors were used. One produced the continuous-
limc cross-correlation one lag at a time by analogue integration of the output,
with sign reversed whenever the input changed binary level. Irregular drift of
about 0.2 Hz bandwidth in the output made values at successive lags
inconsistent and unrepeatable, even with careful biasing of the oulput to zero
long-tenn d.c. level and avcraging over several periods. Susceptibility to drin
is common in open-loop systcms normally controlled by feedback, since they
have high gain at low frequencies.
3 IDENTIFICATION IlASED ON CORRELATION FUNCTiONS 52
,(/' + J )11, - g
'" 1,- _....
J'
1
C'
g
0 '___,
20 40 60 ->'80" '. JOO

....t- -1

o 1
C'

-e
<1
0
0.2
..!:: r{s)
S
, -1
Fig. 3.:U Rcsults or opcn-loop p.r.b.s. tcst of spced-control systcm (a) c.c.f. computcd from
2048 samplcs showing dominant time constants (sampling intcrvallO ms), and (b) c.c.[ t.:omputcd
rrom 16384 samplcs showing resonancc as ringing.
Orin-correction scherncs for m-sequcnce testing are well investigated
(Brown, 1969) allJ fairly straightforward, but only compensate for drift
adequately represented by a polynomial or short Pourier series. Unavoidable
erratic drift is bettcr modelled as non..deterministic, as in some methods in
Chapter 7, where the disturbances as well as the dynamics are explicitly
iJentilied.
The second COI'relator sampled the input and output at a rate well above the
bit rate and computed the c.c.f. over a range of lags from the same samples. As
Fig. 3.2.2 shows, acceptable results were obtained, now repeatable. They are
consistent with the dominant time constants 0.08 and 0.56 s found in Example
2.3.3, and reveal a similar high-frequency resonance. Averaging over 15 or
1110re p.r.b.s. periods was necessary to obtain sulliciently repeatable and
plausibly smooth u.p.r.'s. Ii
where g is the steady-stale gain ' II. of the sytilcm, i.e. lhe IInal value of its
L...j= II )
unit-sampled-step response. It can be Illcasurcu ill a step-lest, inrcl'rcu from
steady input and output levels, calcnlatcd as PI; 0" I'"... (k )/1 (/' - .1')11
2
) or
estimated from steady-state pcrfonnancc specifications.
Continuous- or discrete-time cross-correlation is particularly easy when the
binary input has symmetrical levels or one level zero, since Illultiplication by
the input only rcquires sign reversal or switching 011 and all' of the laggcd
output. '
Experiment design for p.r.b.s. tcsts is straightforward. Thc bit interval
should be short compared with thc shortest feature of interest in the impulse
response or u.p.r., to avoid blurring it or missing it, respectively. The period
should be longer than the settling time of the impulse response or u.p.r., to
avoid superimposing the effects of II
j
and II
j
+1' in (3.1.8) or similarly in (3.1.9).
The swing between binary levels shoulJ bc as large as pcrmittcd to maximise
the output s.n.r., and the experimcnt as long as possible to minimise the
contribution of noise to the input-oulpUt c.c.r. Several short pcriods' of
p.r.b.s. are preferable to onc long onc, to prevcnt one or two short breakdowns
from ruining the experimcnt; the good a.e.!'. propcrties apply unly ttl complete
periods.
Industrial applications of p.r.b.s. tcsting arc describcd by Godfrcy (1970)
and Cumming (1972), who discuss their results in Jetail.
Example 3.2.2 The open-loop speed-control system or Examplc 2.3.3 was
perturbed by a 63-bit III-sequence with bit interval 25 ms anJ mcan-square
value similar to that of the input in Example 2.3.3.
Ib) oj I
Pseudo-random binary sequence resulls have the disadvantage that non-
linearity or drift may have effects indistinguishable from those of noise, in
contrast to sine-wave responses. Inspeclion or the response waveform before
any processing is highly advisable in either cuse, A major advantage of p.r.b.s.
testing is its relative speed, even with considerable averaging, comparcu with a
succession of sine-wave tests, providing the cruss-correlations al all lags arc
found from the same rccon.1s, Fast frequency-sweeping, sillc-wavc-bascd
unalysers exist, however (Docbclin. IYHO). In them, the
output is passed thruugh a narruwband liltcr to extract the fundamental
without long averaging, uno the variablc-frequcncy output is hClcrotlyncd to a
fixetl frequcncy so that a fixed lilter CUll be uscd. Determining the fastest
permissible sweep rate may not be very easy when, as usual, Ihe required
frequency resolution is initially unknown.
M-scquences are easy to generate comparctl with thc othcr main family of
periodic p.r.b.s., quadratic-residue codes (Everell, 1966; Godfrey, 1969). The
m-sequcnce of period 2
11
- I appears in the right-most stage oran ll-stuge shift
register fed at the lefl-hand end by the modul0-2 sum (i.e. Ihe remainder Oflhe
sum when divided by 2) of the outputs of the right-most slage and OUe or 1110re
others. Table 3.2.1 gives the stages sumlllcd for nup to II.
Example 3.2.3 The conlenls or the 3-stagc shirt register in Fig. 3.2.3 arc
initially set [0 001. The first clock pulse shifts them rightward one stage and
trausfers I, Ihe modul0-2 sum or bits 2 and 3, into stage I, making the canlents
100. Succeeding clock pulses produce eonlents 0 I0, 101. I 10. I I I, UII, 001 ;
Ihenthe sequence repeats. Stage 3 gives lhe 7 bit Ill-sequence 100 I0 1I. Stages I
and 2 give Ihe sequence delayed by 5 and 6 bits (nol2 and I bits!). and other
delayed versions arc easily obtailluble; for instance, the lllodulo-2 sum of stage
1 and 3 contenls is 1100101, the sequence delayed by I bit. ", . 6.
55
Fig. 3.2.3 Shin-reg.ister gcncratioll of 7-billll-sequencc.
Shift register
----l
Clock
I
f-t
l
....-. M-sequence
output
J
REFERENCES
J IDENTIFICATION BASED ON COHRELATION FUNCTIONS 54
Tah/e 3.2.1
Feedback COll/lec//o/U to ge1/l'l'ltft' 1II-.H'I/IICIlCCS FUIUIIER READING
Modulo-2 addiLioll of two bits A and B obeys the accompanying exclusive-OR
trulh table
Numbcr of
stages
"
3
4
5
I>
7
H

10
II
A
o
o
I
I
Se4uence
pcriO(J
(bilS)
7
15
31
(I]
127
255
511
I02l
2047
B
o
I
o
I
Input to stage I
is lllutJ-2 slim of
stagcs
2, J
3.4
J, 5
5, fJ
4, 7
2, J, 4, H

7, 10
lJ. II
{II +B)mod2
o
I
I
o
Ilendat and Piersol (1980) and Jenkins and Walls (1968) cover spectral and
correlation methods of idenlilication in depth. Further examples of p.r.b.s.
identification arc given by Billings (1981) and Hogg (1981).
REFERENCES
llcllual, J. S., ant..! Picrsol, 1\. U. ( IYXO)... Engincering AppliL'lItiollS of CurrclutillIl alltl Spectral
Analysis". Wiley, New York.
Billings, S. A. (llJH I). Modelling HIllI idclItifh:atioll {II' thlce-phase cicci ric-arc furnace. In
"MoJclling of Oynarnical Syslems" (II. Nicholson. cd), Vol. 2. Peler Percgrinus, London,
Braccwell, R. N. (l97H). "Thc Fourier Inlegral and Its Applications". McGraw-Hill. New York.
Drown, R. F. (1969). Review and t:omparison of drirt-correction schemes for periodic cross-
correlalion. Eit'ctroll. Left. 5,179-181.
Cumming, I. G. (1972). On-line identification of a steel mill. Au/omalicll 8, 531-541.
Docbclin. E. O. (1980). "Syslcm Modeling and Rcsponse". Wiley, New York.
Everclt, D. ( 19(6). Pcriodic digital sequences with pseudonoise GEe J. 33, 115-126.
Gabel, R. A., anl! Roberts, R. A. ( ItJ80). "Signals and Linear Systems", 2nded. Wiley, New York.
Godfrey, K. R. (1969). The thcory of the correlation method of dynamic analysis and its
application to industriul processcs and nuclear pOWer plant. MeQs. C01/fro12, T65-T72.
Godfrey, K. R. (1970). The application of pseudorandom sequences to industrial processes and
!'IlOIlLEMS
nuclear power plant. 2nd I FA CSymp. 011 Jdl!fIlijiCll1iOfl lindSystem Parameter Estimation,
Praguc, CZCCho.flovakia. Pap. 7.1.
Golomb, S. W. (1967). "Shift Register Sequences", San Francisco, California.
Helstrom, C. W. (1984). "Probabilily ami Stochastic Processes for Engineers", Macmillan, New
York.
Hogg, B. W. (1981). Representalion and conlrol of turbugcllcrators in electric pow,cr systems. /11
"Modelling ofDynumicaI SysLems" (H. Nicholson, cd.', Vol. 2. Peter Pcrcgnnus, Loudon.
Jenkins, a. M., and Walls, D. G. (1968). "Speclral Analysis and Its Applications", Holden-Day,
San Francisco, California.
Melsu. J. W" and Sage, A. P. (1973). "Alllnlrouuclioll to Probability and Slochaslic Processes".
Prenticc-Hall, Englewood Cliffs, New Jcrsey.
Petcrson, W. W. (1961). "Error Correcting Codes". MIT Tl'c1micul PI'l'.fS, Cambridge,
Massachusetts.
Ziemer, R. E., Tranter, W. H., amll:;"unlliri, D. R. "Signals and Systems: COIllinuous and
Discrctc". Macmillan, New York.
3.1 Show that, in the nutation orSee:tion 3.1.1, the gain cnvhiLh IHakes 0:11(1)
as e10se as possible to .1'(1 -I- r), in (he sellse Ihat E[(au(l) .1'(1 -I- I)2] is
minimised, is I'lI},{t)/r
UI
/{O).
3.2 A diserele"time system wilh unit"pulse respollse If(z - I) "0 -1-" I Z - I -I-
+... is driven by a white-noise input\ II' }und has output \.1' \. Dy writing
in terms of the autocorrelation function of \II'J, show that the power
of the system, Le. m.s. output/m.s. input, is +!IT + + ... for this
input.
3.3 A linear system with input 11(1) has an oUlpli t Z(I) consisting of noise-free
output .1'(1) plus noise uneorrelated wilh y(l). If the lransfer fUlletioll H(jw) of
the system is to be identil1cd from correlation functions Hnd the ell'cct or noise
is to be as small as possible, wOlild II uzUw)/ 11 uuUw) or II
zz
( -jw)/ II uA -jeo)
be preferable as the eslimator of l/(jIU)'/
3.4 A sample autocorrelation is computed as ':'Il/{k) = I Ifillif dN. Show
that if successivc samples from {II J are inucpellllcnt, zero-Illcan and _C?r
constant m.s. value rulI(k) has a mean Zero anti Lm.S. valuc fT
I
7J...../N.
[Nole that. no assumption need be made about the amplitude probability
distribution of III }.J
3.5 A discrete-timc dcterministic test signal has a period of Ai samplcs and
sampling interval T. Find (i) Ihe interval in lag at which its autocorrelation
funclion repeats itself; (ii) the interval in frequency at which ils discrete power
spectrum repeats itself; (iii) how many values on its discrete power spectrum
can be specified independently whcn the signal is being designed in the
frequellcy domain. . .
3.6 Find modul0-2 sums of stage eonlents and/or gate olltput III Fig. 3.2.3(0
give the 7-bit m-seqUence delayed by 2, 3 and 4 bits.
PROBLEMS 57
3.7 An inverse-repeat sequence of period 2P is the result ofchanging the sign
of alternate bits, say the even-numbered bils, in 2 periods of a P-bit !,,_
sequence. Find the a.c.f. of such a scquence. Compare the contribution of a
constant-plus-linear-drift output component to the input-output c.c.c., when
input is an sequcnce, with that given by an lI1-sequcncc
III pu l.
3.8 Show that the cross-correlation function between an m-sequence and the
inverse-repeal sequence deriveu from it as in Problem 3.7 is zero at all lags.
Verify (hal Ihis makes il possible 10 idenlify the two u.p.r.'s of a (wo-inpul,
one-output lincar system simultancously, using the Wiener-HopI" equation.
3.9 The quadralie-residue sequellce of period 7 is I I - I I - I - I -I- I. Take
whichever sign you like for the last bit and check whelher the a.cS. of Ihis
sequcnce is the saillc as that or the 7-bit lI1-scqucnce 100 I 011. How is [his 111-
sequence rclated to thc q.r. sequence'? [Note that not all q.r. sequences have
the same period as an lI1-sequcnce; a q.L sequence with period 4k _ 1 exists
whenever 4k - I is a prime (GoMrey, 1%9).J
3.10 In an iJclltilicatiun cxperiment, the lirst bit in each pcriod ofthe 7-bit 1JI-
sequence 1001011 used as an input is inadvertently changed to O. The mistake
is later discovered. \Vhat alterations to the 1I.p.r. estimates obtained bYway of
the equation arc necessary to correct the error?
A discrete-time modcl of a linear, time-invariant s.i.s.o. system is to be
lound by a p.r.b.s. tesl. The system is known to have a continuous-time
impulse respollse approximaling 10 il(l) = IOlJ(exp( -O.lt) - exp( -I). Find
acceptable values 01" the p.r.b.s. bit inlerval, sequence length 2N - I bits
(N integer) and amplitude, if the mean-square value of the sampled output is
not to exceed 25.
{Hint /01' last part: lind the discrete-time power gain or thc syslcm for all
ullL'Orrelalcd input, Oil the reasonable assumption that the p.r.b.s.
apprOXilllatcs white noise. J
3 IDENTIFICATION BASED ON CORRELATiON FUNCTIONS 56
ChUfller 4
Lcasl-StIUlU'CS Model Filling
4.1 FINDING TilE "BESTFlT" MODEL
In the last two chapters we reduced the inlluence of noise on the estimate of
step respunse, impulse response or transfer function by time-averaging. The
justification was essentially statistical. We relied on the zero-mean noise-
depelltlent terills alTccting the estimates becoming negligible if averaged over a
long enough interval. For discrete-time models, we had to take many more
observations than would be needed in the absence of noise, i . e . ~ manyJl10re
observations than unknowns in the model.
In this chapter we take a diJrercnt appruach to the problem of estimating a
model from a large set of observations. We find the model, of specified
structure, which fits the observations best according to a deterministic
measure of error between model output and observed output, totalled over all
the observations. Initially we appeal to slatistiealtheory as little as possible
(not at all. in raet. ror most or this chapter). We examine the resulting
estimators in a probabilistic setting in later chapters.
4.1.1 Lcasl S'I,,,,rcs
We shall lind the values ortlle coellicienls in a givcl1modcl which minimise the
sum of the squared errors between the model output and the observations of
the output: leasl-sCjIIllr'S estimates of the coeflicients. We might well consider
other measures ornlthan output error squared, but this measure has two big
advantages. First, large errors are heavily penalized: an error twice as large is
fOUf times as bad. This usually accords with common sense, but there are
exceptions. For instance, when a few observations are very poor, or even
totally spurious misrcadings, the best thing may be to ignore them alLOgethcr,
and the worst thing to take a lot or notice or them. The other advantage is
mathematical tractability, The formula giving the least-squares estimates is
obtained by quite simple matrix algebra, and the estimates arc computed as
59
60 4 LEASTSQUARES MODEL FrITING
4.1 FINDING THE "BESTFIT" MODEL 61
To make the algebra tidy. collect all the samples y, to y" into an N-veetor y.
all the II, vectors into an N x p matrix U and e I to eN into e, giving
the solution to a set of linear equations. Moreover, the properties or the
estimates are relatively easy to analyse. Gauss, who dcviscLi least-squares
estimation at roughly the same time as Legendre, wrote: "' ... of all these
principles ours is the most simple; by the others we woulll be led into the most
complicatcd calculalions' (Gauss.
ami
UO+e (4.1.6)
The valuc (}which minimises S luakes the gradient ofSwith respect to 0 zero:
= U (4.18)
To evaluate ,15/<111 we need Iwo slandard results ror derivatives or vector-
mutrixcxprcssions, namely
S=e'e=(y' -1I'U')(y- UII)
4.1.2 Ordinary Least Sltuares
The model we use relates an observed variable )'1' the I'l!gl'l!sslllul. to fJ
explanatory variables, the regressurs If
ll
to "jill all known in advance or
observed. In dynamical models the sample-indexing variable / is time. bUlthe
method is not restricted to dynamical models, ami t need nol represent time.
For instance, an econometric model might rclate expenditure .1', to such
indicators as income, age and family size. In that case I would index observed
individuals or groups.
The model has one unknown coelliciellt 0, per explanatory variable. Irlhe
u's for one sample and the U's are collected into jJ-vcclors and
cJ(a'O) ( . .
== vector with clement I
ao
p
c '\' )
DU
j
/....; a}}j = a
j"" I
(4.1.7)

(4.1.13)
(4.1.12)
(4.1.11)
j= 1 k= I
15
-2U
T
y +2U
T
UO
l' r
= (vector with clement i La;kll, + I ajiOi)
k"" I j I
=(11 +11')0 (4.1.10)
,1(0'AO) ( .
._-- = vector with element i
DO
"
b5 = 2 JilTU' U 6IJ = 2( U bill'" U 6IJ = 2 L((element I of U JO)l:
I"" I
The 0 that makes the gradient or 5 zero is thererore
Ii = [U
T
U] - 1 UTy
To check that 0gives a minimum of S, not a maximum or saddle point, We
must see whelher any small change clIi about Ii increases 5. With cJ51DO zero.
We Illlliliply lllllthl.: l.:xprl.:SSillll 1"01' S' ill (4.1.7), 1l11lc lhalO
I
U I Yis idcllticallu
y'l'UO since it is a scalar, and putting UTy ror a and U
"
U ror A in and
(4.1.10) obtain
(4.1.4)
(4.1.3)
(4.1.2)
(4. L1) 0,,] , 0,
/ = 1.2.3... N
t=I.2.3... ,N
lipll
T
,
" "
'\' \'
S:;;;' L, (',' = L, (Y, -/(n,. O)J'
,"" I '''''I
y, =/(u" 0) +e"
then the model is
for thc practically useful case wherc/(,) is linear in Ihe UlIknOIl'JI
making up 0. That is.
It is important to realise that the Illodel nced not be linear in the physical
variables giving rise to the u's. For example, we might model the smooth
trajectory of a radar target in One dimension by
y, = 0, +0,/ +O.lt' +e, (4.1.5)
The model is clearly non-linear in t. but linear in 0,,0, and 0
"
Notice that 0,
covers any constant component of y, so {e I can be assumed zerO-Incan. The
explanatory "variable" whose coellicient is 0, is I lor all samples.
where e, accounts lor observation error (mcasurcll1cntlloisc) ano modelling
error, sincc even without observation error few modcls arc perfect. Wc aim to
lind the value Ii or which Ininimiscs
ExanJl)lc 4.1.1 The positive x of a raJar target moving in a straight Iinc is
observed at intervals oC 0.2 s over I s. Its position is to be predicted. A simple
way is to assume constant acceleration ovcr the observation and prediction
interval, estimate the initial position Xu' velocity V
iJ
anu acceleration a anu
predict future position using the model xU) = X
o
+I\lr +or:!'/2. PUlling the
time origin at the first observation, the radar gives
The condition Cor bS to be posilive whalever the sci of small changes ,;0 is
thereCore that V bO should not be zero. In other words, the columns of V must
not be linearly dependent, or to put it another way, nOllc of the regressors may
be totally redundant by being a linear combinatioll or the others at every
sample. A corollary states that UI" U is positivc-dclinitc, ensuring that
<)(JIU
I
U /W is pmdtivc and also guaranteeing thaI the inverse ur U
l
U exisls,
since VI' U MJ cannot be zero for allY real, IIOIl-zero M), It docs nol guarantee
that the inverse is easy to compute accurately, thuugh, as we shall sec in a
moment.
The {j given by (4.12) is called the ordinary /e{fSI-.H/lilifes (o./.s.) ('stimale or
O. We shall lind that it has some olil-of-the-ordinary properties.
<,I
63
(4.1.14) VTVO = V'y
A third point is that the normal matrix Illuy be ncar-singular. Computing its
inverse woultJ thcll be ill-collditioncu, involving at some stage small
dill'crenccs lll' large quantities. In Example 4.1.1 the computation is not very
hut! Vi VI and the coractors of some clements or u-r V are an
ordcr or so smaller than any clcment of [{rV.ln more serious cases like
Exal11ple 4.1.2, ill-conditioning may prevent satisCactory solution of lhe
normal equations. When this happens. it signals that at least onc regressor is
not pulling its weight, as it is close to being linearly dependent on the other
regressors, which would cause U to lose rank und U
T
U to become singular.
Poor lllllllcri'cal conditioning therefore indicates a badly constructed
model, with ncar-redundancy among its explanatory variables. Such ncar-
redundancy can be induced by a bad choice of co-ordinates I'or the
observations, obscuring the information in one or more regressors, as
Example 4.1.2 will show.
coeHicicnt matrices. Some are described in Section 4.2. However, if N is large,
u lot or computation is relJuired just to form the 110rmal equatiolls
4.1 FINDING TilE "'BEST-FIr' MODEL
4 LEAST-SQUARES MODEL FITTING 62
Of = "0 a] = [4.79 234 55,4J
Several practical points show up in this example. First, the dimensions of
the normal matrix U
T
U are fixed by the relatively small number ofcocHicicnts
bcing estimated; thrce here, however many observations there may be.
Sccond, VI" U is symmctric and, as we saw earlier, positive-definite; ami there
arc spccial emdent mcthods or solving sets of linear elJuations with such
331.J J
3483. Y
J8348.392
380.39474 J
- 72.532895
6.9078947
63
b(12.2
3483.9
n 3995.7237
761.74342
-72.532895
l
6
U
'
U = 63
331.1
[
20963.816
[V'U] -I = -3995.7237
380.39474
V'y = J'.,
45687.96
and
11'=[157 -52,4 5.71J
The posilion at II s given by lhis 0is - 73.9 m, clearly at odds wilh lhe given
observations, so something is amiss. The observations are actually
.\"(1) 5 +250(1 - 10) +5(1 - 10)' +e(l)
= n 1995 + 150/ + 5/' +e(l)
and the observatiollnoisc e(t) has samples with r.l11.s. value 6.12, 50'0 is wildly
inaccurate. The normal malrix V'" U is now very ill-conditiuned, and the
so, rounding lInally to three Ilgures,
EXUlIIlJlc 4.1.2 The satHc radar observatioJis arc obtained as III Example
4.1.1, at intervals of 0.2 s but starting at 10 5 rather than time zero. The same
model as in Example 4.1, I, quuLlratic in t, is tiued. With the ne\v values of t,
10(0.2) II. keeping eight significant ligures to try to maintain UCl'UI'i1CY gives
I
264
4,46]
-335
67.0
3 1.1 j
2.2 (J.9
0.9 0.3916
0.8
218
-2.95
18.2
-33.5
0.6
151
V'u= I
LII
[
0.821
[VTVr'= -2.95
4.46
[
793J
Ury = 580 ,
476
and to three figures
so
I(s) 0 0.2 0,4
X(I1I) 3 59 98
Here (} is [x
o
"0 aJ' and y is [3 59 264J'. Malrix V has all I's in
column I, the sample times 0,0.2... , I ill coluJlIn 2 i.lnd r
1
/2 valucs
0,0.02, ... , 0.5 in column 3, so
4.1.3 Orlhogonalily
65
(4.1.18)
(4.1.16)
IIYll2 = yry = j'y + 2y'e -i-CTc = IIY11' + liCI1
2
4.1 FINDING TilE "/JEST FIT" MODEL
..-----:'
roo Y-9 _
_______ I

. PkJne containing
all vectors of the form
..


ui
Fig. 4.1.1 Urdinary least-squares estimation vIelVed as orthogonal projcctiutl.
is compared with the observed output y. The vector of errors between yand y is
y
(Incidcntally, it may seem odd to ucnnc error So that overestimating the
outpul gives a Ilegative Crror. It isodd. but it is all almost universal convcntiun
amI we are stl1l:k with il.) Louking at the sUllIol'lhc prOlIucts ol'corn:sponding
model-output amI samples, wc lind that
)"e = 0' Vr(y - VO) = y'U[ VT(y - V[ Vi Vr' VTy)
=y'(V[VTVr'V
T
- V[U'Vr'VT)y=O (4.1.17)
Two vectors whose inner product is zero, like the model-output and output-
error vectors here, are said to be orthogonal. This follows from Pythagoras'
theorem amI the delinition or Euclidean length of a vector as the square root of
its inner product with itself, that is, the sum of squares of its elements.
Denoting the length of a vector by 11'11, il' f'e is zero we have
i.e. y, c and y I'orm a right-angled triangle. That makes sense, as in 0.1.5.
estimation Ii is chosen to make c as short as possible, subject to y, being j)fthe
form uO, a linear combinalion or the column vectors to of U: .
Y
=0 u' +0 u' +",+0 u' (4.1.19)
I 1 1 1 II r
In other words, yhas to lie in the hyperplanc spanlled by to u;,. Figure 4.1.1
shuws lhe situalioll rur I' = 2. Plainly, the shortest error vector is obtained by
(4.1.15)
4 LEASTSQUARES MODEL FIITING
y VIi
If T is large (10 here), a small error in a or vI) corresponds to large errors in
coellicients x( - T) and v( - T) or the present Illodel. The 10-ligure
calculations give a jj which, although it looks poor, implies X
o
= 15.9,
V
o
== 224, a = 53-4, so the indirect estimates of "0 and lJ arc lillie worse than in
Example 4.1.1. The errOr in X
o
is larger because of" the ill-colHJitioning but
much smaller than might have been guessed from the poor O. NeVCrlhelcss, it
leads to a mcan error or 26.3 between model output and observed position,
enough to detect the llloJcl ueliciclicy easily. 6
Two lessons can be drawn from Example 4.1.2. First, the model and the
observation co-ordinates should be choscn to avoid approaching linear
dependence between the regressors. Second, the sensitivity or the model
coefiicient estimates to noise may be much higher than that or the gooul1css of
fit. The importance of avoiding high sensitivity varies according to whether
interest centres on the accuracy of the coell1cients or their ability to lit the
observations.
the model is
X(I) = [x
o
- voT +(aT
2
/2)] +(u
o
- aT)1 +(at
2
/2) +e(l)
= x( - T) +v( - T)I +(al'/2) +e(t)
64
A property ofgreat value in interpreting least slJuares, Orlhogol1ality, becomes
apparent when the model output
1
2
= (r +10.5)' '" 21 t + 110.25 = 211- 110.25
so in V, column 3 is almost equal to 10.5 x column 2 - 55.125 X column I,
making V
T
V almost singular.
If 10 significant figures are kept, the ill-conditioning has less clreet but 0I is
[417 -310 53.4J, still far from the correct values. Now the reason is that
the coefficients of I and 1 are very sensitive to allY noise-induced error in the
acceleration and initial velocity implied by the observations. Spceilieally, with
the observations generated by
x(t) = X
o
+vo(l - 1') +(a12)(1 - T)2 +e(t)
calculated IU
r
VI, 0.608, is about cight orders smaller thall the individual
product terllls in its calculation. The cause is the new choice til" time origin,
which has made the regressors very nearly Iincurly ucpcnucnL Over tile
observation range 10:::; f ::::;; II. denoting I - 10.5 by r and recognising that r is
small,
67
(4.1.27)
(4.1.25)
(4.1.29)
(4.1.26)
(4.1.28)
(4.1.30)
(4.1.31)
(4.1.24)
1'.\
k =0, 1" .. ,.1'-1
k= 1,2, ... ,S
R"" = (ljN)V'U
] ]
-
II I YI
)' ,
",,' ,{,
I, = lv'uj' I U'y
U
N
-
2
N +/- I
I'",.{k) = 2: "
j
' kYj'
j=/
r,,/,= (ljN)V'y,
[
"
0
Iii
V=

[-'1,1' = Rl/lI(L, 5)h
We can take L equal to s alld compule the u.p.r. estimate
- I
h = R
Uli
(S,5)ru.l'
where I anL! 111 tire any convenient starting timcs from which the nccessary
samples ur II ami .J' are available. We c01l1Ll, for instance, use
which is identical to the o.l.s. estimate based on the same observations. The
model in both cases is
The u.p.r. estimate fi found by inverting the Wiener-HopI' equation would
then be
with
The sole (Jilrcrcllcc, illsigniJicall( for long records. is that the t\\'o methods
might start some calculations at slightly JiJI'ercnt points in the streams or
samples.
4.1 FlNlJlNti THE "UEST-Flr r-.H>DI.':L
(4.1.22) becomes
Here R
,W
is synlmetric, as autocorrelation values at equal positive and negative
lags arc, by dcJinition, equal. It is also positive-delinite and hlvertible unless
somc exact linear relation holds between any s successive input samples, an
easy situation to avoid.
In practice the correlation values in r UJ' and R
U'I
would be replaced by Hnite-
sample estimates
(4.1.22)
(4.1.21)
(4.1.20)
(4.1.23)
1
266.9
-2.929
1'",,( I -.I)j
'III/(L - .\)
0.8
210.1
7.929
k = 1,2, ... , L
4 LEASTSVLJAIlES MUUEL FITTING
0.6
155.4
-4.429
l'u,,(L - 2}
"",,(-1 )
I'UII(O}
0.4
103.0
- 5.000
0.2
52.79
6.214
y= vii =UIV'U! I V' y =1'( V)y
I'",.(k) = 2: 1r,1'",,(k - i),
i=O
U'(y - V[U'Vr I V'y) V'y - V'y =0
o
4.786
- 1.786
[
1'",,(0)
R (L.I)= I',,,,tl)
1m , .
I'",,(L - I)
J'
e
66
With the view that yis the OrlllOgollal projection or y 011 to tile hyperplane
ddincu by U, it is natural to speak ora IJIlffrix 1'( U) projecting)' 011
to V:
We sec that o.l.s. is a linear cstimator, in that {) and yare both linear in the
observations y.
Another useful interprctation of o.l.s. is ill terms orcorrelalion functions.
The discrete-time Wiener-Hopf equatioll approach to identifying the unitw
pulse response (Section 3.1) was based on
where 'u,.tk) is the input-output cross-correlation at lag k, rur,(k - i) the input
autocorrelation at lag k -'- i and h; the unit-pUlse response at lag i. Thesum in
(4.1.23) is in practice from i = I to i =s, the eOective seilling time beyond
which the u.p.r. is negligible. Collecting rll),(l) to I'lIlL) into a vector r
uJ
" the
significantly nOll-zero 1I.p.r. values 11 I to into h, am.! the I'JlII values into un
Lx s matrix
making y the orthogonal projection 01" y 011 to the hyperplanc of the UC's. The
picture also makcs it clear that cis orthogonal to each inuividual regressor
vector U
C
in the hyperplane.,Algebraically,
and fIe is -0.0012, near enough to zero. The regressors I, I and (2/2 give
x 10'6, -4.6 x 10'" and -1.7 x 10'''. Notice that whenever a
constant term is included in the regressioIl cquation, the samplc mean of the
model-output error will be zero. 6
Example 4 .1.3 The o.l.s. Illodcl obtained 111 Exaniph: 4.1.1 gives, rOllnding
la-figure computations,
A correlation interpretation of the orthogonality of yand eis that, if yand e
are each composed of an ergodic sequence of samples, y'J(! is N limes the
sample discrete-time correlatioll between lJ
i
} ano {(;} allag zero. In that case
one can say that the output error is uncorrclatcu (in the tinite sample) with the
corresponding model output. Intuition ugrccs that there should be nO
correlation between the explained part.l; of the output and the ullexplained
part eif ecannot be reduced further by adjusting the explanatory terms.
the prescnce of IVmight imply a large increase in computation, as lVis N x N.
So long as IV is diagonal, however, the increase is sl11all. One need only
multiply )'i aild u
i
in each regression cquation by 11'//2 then calculate as if for
0.1.5. or, equally simply, multiply each column u
i
of U
T
by 11'", producing Zr,
say, then calculate
69
(4.1.35)
4.2 COMPUTATIONAL METIIODS FOR OROINARY LEASTSQUARES 4 LEASTS(jUARES MODEL Fin INa 6H
4, 1.4 Least S'luares
The expression for Ow is lillic lIlorc Clllliplicatcd thall thc o.l.s. cslililtltC. Hill.!
is still symmctric. With IV tu ensure thal ,')'w is
positive, Uf,VU is also positive-definite and thcrefore invertible. Gcncrally,
Example 4.1.4 Once more we use the radar observations frol11 Example
4.1.1. This time. we have prior knowledge that the third, fourth and fifth
observations arc subjeetlo larger noise than the olhers. (the sequence Ie} is,
in fact, 2 - 3.8 7.8 5.8 - 9.8 - 4.) We decide to wcightlhe squares of
thcse crrors ,Ii as strongly <IS thc others, making II'JJ = 1l'.J..j. = 11'55 = I and
11'11 = 11'22 = Il'hh = 4 ill a t.Iiaguual IV. Note that the absolute scaling of IV is
immaterial. U
l
IV is
"I

4 I I I

0.8 0.4 0.6 0.8
0.08 0.08 0.18 0.32
givillg
[1.5
6.6
2.66 J
I [1771 J [4.59j
Ow= 6.6 5.32 2.412 1407.4 = 250
2.66 2.412 1.1428 637.5 19.0
The weighling has greatly improved the estimates of the initial velocity V
o
== 0
1
and acceleration ti ==
3
, Even rough prior information on noise behaviour is
seen to be valuable, particularly in estimating parameters whkh depend, as
here, on derivatives of an observed variable. L::..
We have discovered two good reasons ror secking bctter methods or solving
the normal cquations than gcneral-purpose matrix-inversion routines: thc fact
that thc normal matrix VI' U has sllcc:ial properties which should be exploilcd.
IHIIlH.:ly SYllllllCl ry aud thc possi bili ty lhat VI U is ill-
cOI1lJilionc<.l, cHusing inaccuracy in its inversion, All the methods we shall
examine cnlail splitting UrU iuto matrix
4.2 COMPUTATIONAL METHODS FOIl OIlDlNAIlY LEAST-
SQUARES
Both these ideas are capable of extension to non-diagonal IV. They are
followed up in Section 5.3 in connection with the Markov estimate and
instrumental vari ables, respectively.
(4.1.33)
(4.1.34)
. (4.1.32)
N N
Sw ; e
T
lVe = LLn'ue,ej
j= 1 j= I
iJ
w
=
rJS
w
/DU=2(U
T
IVU()- UTlVy)
giving as the weighted-least-squares (w.l.s,) estimate
A diagonal IV can be used, each lI'jj c1cmcnt wcighting an individual squared
error. Wecould even specify non-zero orr-diagonallenns to penalize products
of errors, but at the moment it is not clear what circumstances would require
this. It will prove necessary when we consider estimation in autocorrelatcd
noise, in Section 5.3.5.
The estimate Ow of 0 minimising S"" is easily found by making
Ordinary least squares estimation weights each output error in the same
the significance of an error depends only 011 its size, not all its position in the
succession of N samples. There are occasions when it makes sense to weight
errors at some points more heavily than others. 1;01' example, in tracking a
radar target one might be morc worried by errors in reccnt positit)JI than by
older ones. Another possibility is that some observations might be distrusted
more than others, so that one would wish to take Icss notice or thc
corresponding cr'rbrs. Such evcntualities arc the concern of this scction.
The algebra deriving the 0.1.5. estimate is scarcely ahered if e
'
lVe is used
instead of e
1
'e, as the error measure to be minimised, with HI a symmetric
matrix showing thedesircd weighting or individual error lcrms contributing to
the total
Matrix square rools arc not unique, for if B is a square fool of A and P is allY
orthogonal matrix of the appropriate dimension, then as ppl is I,
and f[z contains, in general, all p z's, Ji'z contains the last jJ - I or them, and so
on, so that L
T
is upper-triangular and /. lower-triangular. As (4.2.3) is lrue
whatever the value of z, LL
T
equals II as rcquired.
71
(4.2.7)
ul
- 2.95 4.46J
18.2 -33.5
- 33.5 67.0
0.5)6 0.183.)6]
jU.7 0.5)0.7 z
o 0.1222
[
,-
y6
L'z=
[
0.821
/1-'= -2.95
4.46
so
4.2 COMPUTATIONAL METHODS FOR ORDINARY LEASTSQUARES
glVlllg
rounued to three Jlgures (from tcn-Jigurc calculations). This agrees with the
inverse found in Example.4.1.1. C::.
Now L - I can be round by solving Liz = (. starting with the last row and
working up\\'urds:
2.1 '" \.1/0.1222 '" 8.183\,
2, =\,I.jO.7 -0.50, '" 1.195(, -4,092\.1
z, =(,IJ6.6 - 0.5=, - 0.183 2.1 '" 0.4082\, - 0.5916\, + 0.5455\,
An allernalive and easily programmcd way lo find L is by direcl idenlificalion
of colulllns of LL
T
with the same columns of II, one at a time. The first column
of L is found by noling lhal/" times column I of L equals column lorA, and
If I equals a 1 I' Column 2 of L is then the only unknown in
',,(column 1 of L) + ',,(columll 2 of L) = column 2 of A (4.2.5)
and 1
11
is obtainable frol11
(4.2.6)
The process is continued until all the colu1lllls of L arc round.
Hoth mClhous or linding L will break down if at any stage i the expression
for It is negative. The positive-definiteness of A ensures that this will not
happen, as we can sec by considering the situation after i-I successrul stages
of cOinplelillg the square. By then, zTAz is in the form
where 11 is thc quadralic remainder in =j l' ito =/,. and ai:= I) is the square of
(4.2.4)
(4.2.3)
(4.2.2)
(4.2.1)
I' ,
I r
2
1
r
"
A =LL'
L
r
=
4 LEASTSQUARES MODEL FITTING
4.2.1 Cholcski ""ctorisalioll
A =BB' = BIB' = BPI" B' = BP(BP)'
zTAz (lrz)' + (liz)' + ... + (I/Z)' = (/. IZ)/(/.' z) = zl L/.' z
7U
Example 4.2.1 The I1llnnallllUlrix A in I:Xi.llnplc 4.1.1 gives
,'Az =6z; +60,z, +2.20,0, + 1.8=,2,
= 6(0; + z,o, + 0.360, =,) + + 1.8=,=, +
(Jii(o, +0.50, +0.l83=,)' + (2.2
+ (1.8 - 1.1)=,=., + (0.3916 - 0.20 16
= (J6(z, +0.50, +0.1832,)' + (JO.7 (2, -1-0.5=.1)' + (0.12222,)'
where
The simplest factorisation method is Cholcski factorisation into a real
triangular matrix and its transpose. Denoting Vi U by A 1'01' convenience,
Choleski factorisation linds a mUfrix square rouf L of II
satisfying
so BP is also a square root of A. A lower-triangular square roollllay be made
unique by requiring all ilS principal-diagonal e1emenls lo be posilive. Once A
is in the form LL
T
, it is easy to illvcn, as A- I is L"'TL - I and a lower-lriangular
matrix is invertible, a ruw at a lilllC, by suc1,;cssivc substitution.
Cholcski decomposition of II is nothing 1I10re cxolit.:: than c'ompll.'lillK file
square, familiar from school algcbra. The quadratic Zl liz is rc\vriUcn as a sum
of squares of linear combinations of the z's:
the coeJl1cicnt of Zj in the next square to be computed, U. For C
i
to be found,
at- IJ must be positive. Now zTAz is positive for any real, nOll-zero Z, including
that in which Zj is I, Zj -I- 1 to zp zero, and z1 to Zi I chosen to make (I to (i- 1 all
zero (certainly feasible since the coelIicicllts of: I to Zj I in 'I to (i _I form an
upper-triangular matrix with non-zero principal-diagonal clements, which is
consequently non-singular). For this z, zTAz is just II, so llg - I) is certainly
positive.
A refinement removing the need to compute the square fools at-
lli12
is to
replace LL
T
by LDLf, where D is diagonal with eli; = II and L is still lower-
triangular but with I's all along its principal diagonal. An uppcr'-lriangular
matrix version is called and is the basis ora least-squares
algorithm widely used in estimating the state uf dynamical systems (Bierman,
1977). (Slale estimation is discussed brielly in Section 7.3.2.)
The aim of matrix-square-rout mcthods 1'01' finding the 0.1.5. estitnale is to
red uce inaccuracy due to i of the normal lIlatrix, They work by
reducing the range of numbcr magnitudes, Hill.! hence the seriousness of
rounding errors. Computation with matrix square roots gives accuracy
comparable with that obtained by keeping twice as many signiJicant figures
without resort to matrix square roots.
and
73
(4.2.11)
(4.2.14)
(4.2.13)
where V is an upper-triangular p x p matrix. Notice that. since
V*' V* = )/'V= V'Q'QV = v'v
and V is upper-triangular. V is a Choleski factor or v
T
V.
Ifwc now denote the top l' elements or y* by yf and the other N - l' by yj,
we lind from (4.2.9) and (4.2.10) that
S = S = (yf - VO) '(Yf - VO) + yjTyj (4.2.12)
In this sum of N squared terms, only the Jirst p depend on 6, so S is minimised
by making thelll zero. Hence
The Ii thus found is the j)I'UUlIct of the inverse.: or all upper-triangular lIlatrix,
easy lo compute by back-substitulion, and the lirst p samples frum y*, a
linearly Jiltered version of}'. Jf\:Ve Cunnow lino a reasonably simple method of
constructing an orthogonal Qto give QV in the desired form (4.2.10) we shall
have all attractive way to calculate the o.l.s. estimate of U
forming and inverting the normal matrix.
4.2 COMPUTATtONAL METHODS FOR ORDINARY LEAST-SQUARES 4 LEAST-SQUARES MODEL FrI1"ING 72
(4.2.17)
(4.2.15)
(4.2.16)
Uli) = pli-llpti-J.J . , P(1)U
*4.2.3 Householder Tnlllsformation of Regressor J\'latrix
The required method is provided by Householder transformations, in which
U is premulliplicd successively by orthogonal matrices pt I '. plJ.l, ... , pl/", each
of the form
when UOl is prclllultiplicd by pIll, while making clements i + I to N of column i
zero. Not only is this possible, but ",liJ can also be chosen to leave the Jirsl i-I
rows o!' uti) unchanged, as follows. Denoting column i of VIi) and U(i + I) by
and ul
l
-/-
1
),
Each ",Iii has to do two things: make pHI orthogonal so that in the long run
plpJprp - I J... p( II is orthogonal, and perform a stage of the uppcr-
triangularisalion or QU by making a column zero. Writing out plOTpliJ shows
that ",HHwliJ musl be I lo make ptil orthogonal. The upper-triangularisation or
U call be carrieu out column by column if each n,li) is choscnlo leave unaltered
the firsl i-I columns of the matrix
(4.2.9)
(4.2.8)
(4.2.10)
V*
y*g,Qy=QVO+Qe= V*O+e*
This sum can be made equal to S, the sum or squared errors using (} in the
original equation, by choosing an orthogonal matrix as Q, so that QTQ is I.
Furthermore, there is enough freedolll ill clwu!)ing Qfor us to insistlliat u*
should be in the especially convenient form
with modified observations, regressors and errors, but the same O. The sum of
squares of the output errors given by the o.l.s. estimate 0and the modified
model is
A more recent alternative way around numerical dilTiculties in solving the
normal equations is the GolulrBouseholder technique (Golub. 1965). A
linear transformalion is applied to V and y (i.e. they arc preJ1lulliplied by a
matrix) so as to modify the regression equation, without changing U, and make
it easier to solve. PreJ1lultiplying the regression equations (4.1.6) by an N x N
matrix Q gi ves
4.2.2 Teehnillne
I
74 4 LEAST-SQUARES MODEL FITTING
4.2 COMPUTATIONAL METHODS FOR ORDINARY LEAST-SQUARES 75
so
(4.2.18)
The technique just described has excellentnumerieal
properties in cases where other methods are vulnerable to

-0.11019
-0.05019
0.04981
0.18981
0.36981
- 1.2247
-0.15505
0.04495

0.44495
0.64495
-323.74
- 35.722
3.2785
56.278
123.28
169.28

-I
0 0
I 0.2 0.02
[III) '" [I
I 0.4 0.08
1 0.6 0.18
I 0.8 0.32
1 I 0.5
- 2.4495
o
o
o
o
o
)'12) :::::y(1) _2\\.. I().\\.II)l)'I()
3
59
98
151
218
264
rr -./ 6 ==
U\2(1 = - a 5gn 11\1
1
) = - 2.4495
all) = 12rrllJii,'1 + rr) J 1/1 = 4.1108
lrjll::::: uj\l/o:lll = 0.24326 ror j = 2, .... 6
= (ll\\l +- (T 5gn ul
l
l
n/a(1) = 0.83912
Theil
w
li
" [111
1
= [2.0554 0.72978 0.26759]
and
amI similarly
EXUIIIIJlc 4.2.2 Let us carry out the o.l.s. computation for the problem of
Example 4.1.1 by the Householder method. Starting with
we compute in turn
(J = 15um' or squares or clements ito p of column i of U1ij :1,'2
with i= I, so
(4.2.21 )
(4.2.25)
(4.2.23)
(4.2.20)
(4.2.24)
, (4.2.22)
N
III = Lil/I! g a
2
II /.....; J'
j"" I
Iluji+ IJUZ = =
= U1ilTU1il = lI ui/
l
li
l
{

(Xli) = (12 +2a-lul;'1 +/.....;"j:l! = {2a(a + luWl) jill
j",j
ul:+ 11.= -dsgn uW
N
ali' = - ui
"
I'll = - 11::+ II)' + L
j=i+ I
where
The first i-I elements of ui
i
+ I) arc the same as in ujil and the last N - i arc
zero, so (4.2.21) requires that
\I!y' = 0, i-I ..
which says that ""U} is a scalar times the change from nIH to + Ii. To make pOl
orthogonal, w1ilTn,(i), that isll'w{i
l
lll, must be I, so ",(I) is in fact
(U1
i1
- nil + - + 1111. The sign is immaterial, and we slHllltakc it as
positive. Now ",HI can be fixed, except for "'lif, by specifying that the lirst
i-I clements uf ul
i
", II be unchanged I'nllli those of uV' and tlie last N - i
elements be zero, giving
The sign of u:: + I) is chosen to maximise 111'11 so as to avoid un necessary loss of
numerical accuracy:
and, substituting (4.2.23) into (4.2.20),
where (J is positive, thell
The only unknown element ui:+ I) of I) is determined by noting that since
p(iJ is orthogonal
Some practical points to notice are that the s4uarc root in (4.2.25) need. not
be computed, as P(I) involves only second-degree terms in elements urn
l
!); that
W(i)W
liJT
need not be computed and used explicitly, as UU+ 1) is composed only
of U
W
, W
lilT
VIi) and ",h); and that (J will be zero only if U is singular.
and
77
(4.2.32)
(4.2.31)
(4.2.26)
(4.2.27)
i= 1,2, .. . ,p
v = pRQ'
V'VQ = QR
T
PTpRQ'Q QRTR
aQ(vector zero but for I in position i) = a(column i of Q) = ll(lI;
After the 0* which minimises S*, and therefore S, is found by this trivial
calculation, is readily obtained us QO*, since the inverse of QT, un brlhogonal
matrix, is Q. Furthermore, near-linear dependence between the regressors will
show up as a very small value for one or more of the rHOs. In an eXlreme case,
some r.. might be negligible or even brought to zero by roundolf error. so S
would be all"ected by adding a large number ll( to Or As Qt!* is 0, the
corresponding change to 6 is the addition lo tJ of
The first p clclllcnts uf lW* rjJOT lo and the rest are zero, so S
minimised by
S* g, (y* - RU*)'(y* - /W*) = (y - VO)'pp'(y - VO) =S (4.2.30)
A factorisation method for 0.1.5. which indicates exactly where the cause of
any ill-conditioning lies is singular-value decomposition (Forsythe el al.,
1977). It has much in common with the Golub-Householder method. The
technique is based on decomposing the regressor matrix U into
where I'ano (! are orthogonalilltllriccs, rcspcL:tively N x N i.llllip x p, auu Ris
all N x fJ IIHHrix zero but for lllllHwgative clements fji' IS; i S;/,.
elements arc sLJuare roots of the eigenvalues of the normal matrix VI" U. To see
why, notice that
and R1R is a p x p uiagonalmutrix with as element U, i), so each column qf
of Q satislies
V
T
Vq; = rf,<!; (4.2_28)
In other words, is an eigenvalue of VI V, with (If as the corresponding
eigenvector. Clearly a zero value for 1'jj would indicate exact line,Hdependence
between the L:olumns or VI" V, reJlecting similar dependence between lhe
regressors.
Again we transform the regression equation linearly, to make the normal
equations easier to solve. Prelllultiplying by pT,
y* = ply = pi VU -I- piC = 1" pRQ'O -I- piC RU -I- c* (4.2.29)
where 0* denotes QI'O and c* denotes pOl c. Because pT is orthogonal, the
transl'ormation tIoes not alter the SUIlI ur sljui.lredcrrurs:
4.2 COMPUTATIONAL METHOUS FOR ORDINARY LEAST-SQUARES
4,2.4 Singulur-Vulnc IJccompusiliun
ll(llJ = 1.2882

0.41833
0.12220
4 LEAST-SQUARES MOUEL FIlTING
- 1.2247
0.83666
o
- 323. 74
219.32
6.7647

12.473
1.4119
uW = 0.83666,
[
-2,4495
V= 0
o
y* =y(4-) =
-2,4495 - 1.2247 -0,44907
- 323. 74
0 0.83666 0.38916
219.32
U(JI=
0 0 -0.07282
},U) =
-8.2814
0 -0.17315
ano
0
-6.7169
0 0 -0.03423
8.8475
0 0 0.04451
3,4120
w
'lJl
V'z, = [0 0.64410 0.3432('1
W
'3lT
V'" = [0 0 0.10953J
w
'ZlT
= [0 -0.76984 0.034893 0.19015 0.34540 0.50066J
and
(J = 0.83666,
l:;'or i = 2, we compute
76
a = 0.12220= a
U1
= 0.21906
W"'T = [0 0 -0.89630 -0.36852 -0.21598 0.119151
Hence
giving
Finally, for i = 3,
Matrix lj'4} huslhe same first two rows as UiJl, only nOll-zero iii row 3, and
rows 4 to 6 all zero. As lj(4) is U*, with Vas its lirst 3 x 3 SUb-matrix, we have
and
Solving VO = y*, we find 0, = 55.4, 0, = 234,0, = 479 by back-substitution,
The sum of squares of outpnt errors S is given by yj ryj as 157.8. f':,
4,3 NON-LINEAR LEAST-SQUARES ESTIMATION
79
Solving the normal equations amounts to aujusting 6until each of a collection
offunction5 or 6becomes zero. One approach is Newton's method. In a scalar
problem adjusting {j to bring g( (]), say, to zero, iteration k 01" Newton's method
replaces 0" - " by
a'" a" -"- (Dg/rJ0I"e,i" _"r 'g(0" - ") (4.3.4a)
The rationale is that for slllall changes about 1), g(O) varies <:llmost linearly
with a, su this adjustment will bring g(a"') c1use to zero. The multivariable
version, similarly motivated, is
0'" 0" "- [J"gl'h"" ..J 'g(O'k - ") (4.3.4b)
Example 4.3.1 For mallY biolllcuical applications, as the stu.dy of bloud
cUllccnll'atiull or a drug artl,;r a uust.:, a low-on.ler multi-exponential response.
model is cmpluyed, eithcr as all end in itself or as a half-way stage to a set.ol
rate equatiuns making up a compartmental model. One such model, wIth
initial and final output zero, is
-,,(I) = c(exp()" I) - exp(A,I)) + e(I), I ;:>: 0
The vector 0 is [c )'1 i
2
]\ and if y(1) is sampled at times 1
1
,/
2
,. , . ,IN' WC
have exp(A,I) - exp(), ,I) as '1!/aO, ' CI; exp(A,I) as '1!/aO, and - cl; exp()" I;)
as ,11;/,00. Cullecting samples expU,I) intu a vector 'I" exp(A,I) mto '12-
'jexp(}.j'i) into '1 antl/ jcxp(..1.
2
1;) into '2' the normal equatIOns become
"I ',' .. Li'.
-2[",-1', c" -c',J'(y-C(II,-II2)=O
From the first or these equations, c call be estimated as
c' = (II, -II,)TY/(II, -'12)'(11, - 112)
once )'1 and )., have been estimated. At each iteration and).2 are to
try tu satisfy lhe other two normal equations, with for c, cHher
numerically rrom the previous iteration or algebraIcally glVlllg
,) )'(11, -- 'I,) 'tIl, -- 'I!) = ('I, - 'I,) , y') ('I, .- '12), j = I, 2 f':..
4.3.2 Gauss-Ncn'(oll Algorithm
4.3 NON-LINEAR LEAST-SQUARES ESTIMATtON
equations. Fur a model linear in 0, fis VO and J"lis V. The extra dillieulty with
a non-linear mouel is that Juris a function oro, so the normal equatIOns are no
longer linear ill 0, and have to be solved iteratively. If some of the coefficients
in 0 enter r linearly, as in the multi-exponential model, they can be found at
each itcratilHl by o.l.s. ancr the nOll-linearly entering coemcicnts have been
fixed.
(4.3.2)
(4.3.3)
4 LEASTSQUARES MUDEL FiniNG
y=f(V,O)+c
rJS/rJO = - 2[J,,/J r(y - f) = 0
78
Let the model be
In many situations where the Illudel is not lincar ill all its coellicicnts, it still
makes sense to find the coemcient values which minimise the slim of the
squared errors between model output and observed output. An example is
multi-exponential modelling of an impulse response
y(l) =1I, expU,I) +lI,exp(A,1) + ... +lI"cxp(I."t) +"(1) (4.3.1)
where the a's enter linearly butthe A's du nut. The algcbra which g"ve the u.l.s.
estimate no longer applies, but we Can still exploit the fact that the ,"unction
being minimised is i.l SUIlI or squared errors; we arc nut J"acl:d with <l gl:IH.:ral
unconstrained algebraic minimisation problelll.
4.3.1 Gcncndiscll Nurmal E(luatiulIs
If addition of all; to 0 has lillie elrcct 011 S, VI/; must be negligible. In other
words, the linear combination or regressors with the clements of (Ii as
coeIlicients contributes nothing to the performance or the Illodel. To
summarise, the a tlractioll of the singular-value Jccomposi lion is lha 1it reveals
any ill-conditioning clearly, and provides an easy way to prevent it from
causing numerical dillkullics. By selling 07 to zefo if r
ii
is belliw SOIlIC small
specified vallie, a corn:sj1ollding lincar cOlllhinatioll (Iiii willi lillie iilllucllcc
UII Sis sd tu zero. II' it \Vcrc lIot, it could he so large as La obscure the
meaningful parl or u.
The mechanics of ubtaining P and Q .Ire a lillie coJllplicatcd (Fon;ylhc el
al., 1977) and will not be detailed here. IJrielly, Householder lransfunnatiulls
are used to introduce zcros below the leading diagonal of U by
premultiplication, and above the Jjrst superdiagunal by pustJ1lultiplicatiun,
then a version or the QIi algurithm is empluyed tu reduce Ihe bidiagunal
matrix thus obtained to diagonal form, iteratively.
where y is the vector composed or all the output sanlples, U is the matrix of
explanatory-variable samples, and fcomprises the functions moLiclling .l'. The
least-squares estimate or 0 has to make the graLiient of S zcro, i.c.
whcre JuI is the Jacobian matrix of r with respect to 0, i.e. the matrix with
21/
aO
j as clemenI (i,j). We have in (4.3.3) a gCllcralised versioll uf the normal
and
[ilOl,\I" - ,"Ol,'z"']
Example 4.3.2 The two-exponential impulse response model of Example
4.3.1 is to be filted to observations
81
-16030J
525U
1.199 x IO-"J
5.564 x 10 .,
5' Ii '" 275.8 i
'
Ii '" 268.8,
JW)r./HlJ.::::: [ 74390
- 16030
(oJI lfl'l ,_[3.927 x 10-
5
1.1.1 .- I.IYYxIO"
One way of ensuring some progress in reducing Swould be to take a smull step
in the local downhill gradient direction from II, to
O(kl =Olk- II - a(DSjDO)lu"w-1l (4.3.9)
Oi"uinury esliUlUlioll call be regardlxl as the singl\,; Gauss-
Newton step rcquircu to readl the optimulll 0 frolll a starting guess ur
zero when the 11I0uei is linear ill U. Fur a Ilon-linear model, rapiJ convergence
can be expected only when we are justiHed in taking J as near-constant over
each step and ignoring second derivatives. These assumptions can only be
checked at the expense of a great deal of extra computing, and if they are
invalid the iterations may not converge at all. There is, therefore, some
incentive to look for a more reliable iterative method.
I"''' (y - jiOI) '" [- 15pO J
. -499.2
Rounding six-Jigurc calculations to four ligures as above, the new values of Xl
and Xl ,He found to be

= 0(11 = Oil" -\- [.I'"'' I""J-' 1""'ly _ jiOI) '" [-0.821IJ


).'zli - ". - 2.465
whence
[
1:,::J =
XiiI -1.249
arc obtaintu, at l1Iuch greater computational cost.
4.3.3 Levcnhcrg-Maflluardt Algorithm
A reasonable improvement has been achieved in this step. It is interesting to
find that, if c is retained as an unknown to be adjusted in the N'ewLon
step, the very poor values
glvmg
and
4.3 NON-LINEAR LEAST-SQUARES ESTIMATION
(4.3.5)
(4.3.7)
(4.3.8)
2.25
40
1.75
55
4 LEAST-SQUARES MODEL FITriNG
1.25
85
0.75
115
0.5
90
o
)' 0
80
N
L
1/ 1('
[Jug] '" 2 ':.J.."--'.. = 2l.l Il'l.l d]
ao DO " ,
1= I
so, denoting 101' at lj by )(11.-1/ and similarly ror f, we have the
Gauss-Newton algorithm
(jlk) = (j(k-lJ + [Jlk-I)TJlk-I)J-IJlk- I Ji'(y _ II)
given by a methionine tolerance test (Brown l!f al., IY79). Starting guesses
-0.7 for llIO' and -2 for arc made ancr inspection or a pll;t of the
ubservations. They allow calculatiun or
II',"" '" [I 0.7047 O.591fl 0.4169 0.293X 0.20701
Looking back at the normal equations,
g,(O) = - 2(row i of [.I,dl' J(y - f)
so a typical clement of the Jacobian matrix iii (4.3.4b) is
N
ali, = 2 '\'f t7f; .,1/; _ a'l; . .}
vO
j
LAvo, vO
j
DO, aO
j
U, -.I,)
/;:.1
Most of the labour in evaluating Jug is due to the Np(p -\- I )/2 second
derivatives. However, if our current estimate 11 already gives quite small
output errors )',-f" we may perhaps ignore all the terms in (4.3.6) involving
second derivatives. If so,
1,'zIT", [I 0.3679 0.2231 0.0821 0.03U2 0.0/11 J
which with the first of the normal equations gives the o.l.s. estimate for c:
i/O) = (1/\) - IJio1fry/(,lI0l - IliOifftll\O} - Ilio
,
) :::: 263.2
The model output errors can then be calculated as
y - jiOI = Y_ lilOI(II',ol _ I,'z"'), giving S'OI as 676.7
Jf c is excluded from the vector of unknown coellicicllLs Lo be found in the
Gauss-Newton step, because II will be cakulatcdlatcr frol11 1/\11 and Ilill in
the sal11e way us e
lll
), )101 is just
4.4 WHY BOTlln WITH STATISTICS'!
Appealing though the idea or minimising the 5ul\i or the slJuared model
output errors is, other pussibilities exist. Tu reach a proper assessment of
83 FURTHER READING
FURTHER READING
Econometrics texts such as Goldberger (1964) and Johnston (1972) give some
of the clearest aeeounls of least squares. Draper and Smith (1981) provide a
detailed basic account least squares, including non-linear least squares,
many examples and exercises. Another text with plenty of examples is
Chatterjee and Price (1977). Computational methods based on singular-value
decomposition and Householder transformation are covered in detail by
Lawson and I-Ianson (1974), with FORTRAN listings. They also consider
equality and inequality constrainIs such as prior knowledge that model
coefiicients are non-negative, and selective deletion of regressors from a
tentative model. Ivlatrix factorisation techniques for least squares afe
described by Bierman (1977) with recursive (sequential) processing of the
observations in millu. as in Chapter 7. Sorenson (1970) reviews the history of
least squares.
Least-squares routines are <Ivailablc ill many software libraries. but it is
unwise to lise them without an apprecialion of the techniques they usc and the
least-squares estimation, we must ask how its performance compares with that
of other estimators. Conclusions about estimator performance based on one
set or records, or even several. ure not likely to be entirely reliable. Questions
about perrormance arc thereforc essentially probabilistic, asking what will
happcn over a sel of estimation experimcnts spccilied in statistical
tcrllls.
It may scem a shame to start talking in probabilistic terms when, aftcr all,
the purely dctcnilinistically motivated least-squares methods orten give
perfectly acceptable resulls, There arc several reasons why the etrort is
worthwhile. We shall fllld that sometimes the performance of ordinary least-
squares can be improved by straightforward modifications, for instance when
something is known or can be estimated about the correlation structure of the
noise present in t.he observations. We shall see that least"-squares estimation
thus modified has attractive statistical properties in addition to its algebraic
simplicity and relative computational convenience. We shall encounter, and
learn to avoid. problems arising when some of the regressors contain noise
correlated with the observation noise. Finally, we may find that in the broader
methodology or itlenlificalion, the ability of a statistically motivated
estimatiun method tu provide not just 0bu t also an estimate orits rcJihbility, in
the rorm of ils covariance (Section 5.3), is valuable.
The next chapter describes how the probabilistic behaviour of an estimator
is characterised, and how least-squares estimation looks from a probabilistic
viewpoint.
4 LEASTSQUARES MODEL FITnNG
with a a scalar small enough to make the elfccts of second ami higher
derivatives of S(O) over the step negligible. The weakness orthis idea is that to
lind as large as possible a value ror ex one would have to eXHmine the local
shape of 8(0), and having gone to that trouble woulLl have 110 eXCllse for not
using the shape information in a more ambitious algorithm. II', instead, a
conservative value of a were used, progress would he slow, parlil:ularly as the
gradient became small ncar the optimulI1.
The basic idea or the Le!ll'llherK A1arc/lwrdt 1llgol'UIII11 (Wolfe, J Y7H) is to
compromise between the uownhill grmiicnl directioll ami the Jircclioll given
by the algorithm, by finding the step that satisfies
[Jik - IIIJIk - 11.+ (,Ik - II1]((i"" - lilk - ") - nSlnOI"eti"" (4.3.10)
with II a positive scalar. II is chosell to be small, the step is almost a
step,and if fLlk - II is large it is almost a uownhill gradient step
with 1/1,"-11 fora. NOlice that ror any sensible model, JI.I is posnllve-m:JnIllC,
since otherwise Je50would have to be zero ror somc non-zero (iO, implying
two diJferellt values of 0 give' precisely the same output values. Positive-
definiteness guarantees that flJ call be invcrteu in the Gauss-Newton
algorithm and, withilL positive and 1 positivc-uclinitc. also guanilltccs that the
Levenbcrg- Marquardt step is feasible. The radurisatiulllllcthoos uist:usscd in
Section 4.2 are also applicable here.
The scalar I"in (4,3.10) is adjusted at each slep according lo how progress in
the previous step t:ompar6.J with what was cXj1t:cted. Implcmcntations of the
algoi'ithm dilfcr ill their rules for aujusting:ll, aiming
to avoiu an excessive number of trial evaluatiuns uf S' anu steps'in V, and in
their safeguards when J1 becomes very small. The algorithm is vcry widely
used, but as with most search algorithms cases Can be found in which it "hangs
up" before reaching an acceptable Ii. Although this sounds serious, a
disposition to hang up is orten a good indication that something is wrong
the form of the model or the starting guesses VIOl. Moreover, it is far easier
be fooled into accepting an ill-chosen model Slrueture by an acceptable lit
one set of records than to be deceived by a hung-up search into thinking an
optimum {j has been found for the given model when it has not. In the
case more runs with dilJ'erent starting guesses will oneil resolve
uncertainty; in thc former, morc records are required, and even then it
not be easy to recognise un uneconomical model as such.
82
I'
84 4 LEASTSQUARES MODEL FITTING PROBLEMS
85
An impulse-response test gives
o+- 0.2 {).4 0.6 0.8
potential problems of ill-conditioning. Van dCIl Bns (19H3) discusses ill-
conditioning in non-linear least squares.
Hamming (1973) sums up the point or exercises such as least sqwlrcs filting
pithily: "The purpose of computing is insight, not numbers",
4.3
Timc I
Rcsponsc h(l) 3.4 2.3 1.7 1.2 0.9
HEFEHENClcS
PROBLEMS
Input II, -0.64 0.36 0.52 0.49 -0.58 -0.36 -0.32 0.72
Oulpul Y, O.4J -0.41 -1.32 -1.05 -0.21 0.27 0.40 0.09 -0.10 -0.12 0.44
Find the o.l.s. estimates of the unit-pulse-response ordinates "I and "2 in the
model ;'i = II I lI, _ 1 +"lU, _ 2 +el' usi ng as lIlany of the observations as
possible.
4.2 Repeat Problem 4.1, but including a collstantterm inlhe model. Usc the
Choleski method to inverl the normal matrix.
I 1
no ')
By o.l.s., lind thc cstill",lcs of 1\ and r in the model h(l) = 1\ ex!'( -I/r) -I- e(l)
which give a least-squares lit to the decibel value 2010g,uh(t), i.e. which
minimise the Slim of the S4Uai'ed proportional or percentage, rather than
absolute, errors in h(I).
4.4 Verify algebraically lhat for any column vector from U, P( Uj"i is ui,
where P( Uj is U[U
r
UJ - 'U
r
. Wha t is the geometrical reason?
4.5 Show that P( U) is idempotent. Why is it, geometrically? Defining P'( U)
as 1- P( U), show that P'( U) is also idempotent. What are the geometrical
interpretations 0)' I'( and P( where is any real vector conformable
with P(U)?
4.6 By rewriting the transfer-function model
h ::-l+b,z-';!'+"'+b::-
u
,- " U(z-')
I +a1z 1+ ... +a/lz II
as a dilference equation relating output sample y, to earlier samples of the
input and output, produce a regression equation which would, in principle,
allow 0.1.5. estimation of the transfer-function coeJncicnts. [This idea will be
pursned in Section 7.2.1
4.7 What happens tu the expressiun fur the o.l.s. estimate if the columns of
the regressor matrix U form an orthogonal set'? What happens if they arc
orthonormal'! [*Note the connection with singular-value decomposition.}
4.8 Find the least-squares estimates of acceleration, initial velocity and
initial position as in Example 4.1.1, bUI weighting lhe sqnared errors linearly
from weight I at lime zero to weight 6 at time 1.0
l
Le. penalising recent errors
more heavily. Use the estimates to predict larget positioll al time 1.2, and
compare the predieti911 error (the aclual position being at312.2m)lwith that ill
Example 4.1.1.
4.9 Express the o.l.s. sum of squared errors s(ii) in terms of y and P( U).
4.10 Compute the second derivatives 02j;/oO,DO
j
which appear in (4.3.6), for
the model and observations of Example 4.3.2, and check the effects of
omitting them from the calculation of [Jog] by (4.3.7).
10 9 8 7 6 5 4 J 2 o Time (
4.1 A data-logging run on un process gives
Bierman, G. J. (1977). "FaclOrizatioll Mcthous for Discrete Scqucnthd Estimation", Academic
Press, New York and London.
Brown, F. R., Godfrey, K. R., and Knell, A. (1979). Compartmental modelling based on
methionine tolerance test data: a case study. Ml'd. Bioi. Ellg. CVII/pur. 17. 223-229.
ChutLcrjcc, S., and Price, n. (1977). "Regression Analysis by Example", Wiley, New York.
Draper, N. R'
t
and Smith, H. (1981). "Applied Regression Analysis", 2m.! cu. Wiley, New York.
Forsythe, G. E. Malcolm, M. A., and Moler, C. U. (1977). "Computer Methods for
Mathematical Computations". Prenticc-Hall, Englcwood ClilTs, New Jersey.
Gauss, K. F. (1809, transl. 1963). 'Theory nfthe Motion ortlle Heavenly Uodies about the Sun in
Conic Scctions". Dover, New York.
Goldberger. A. S. (J9M). "Econometric Theory". Wiley, New York.
Golub. G. (1965). Numerical methods ror solving least squares problems. NUll/cr. Math. 7,
206-216.
Hamming. R. W. (I973). "Numerical MethoiJs for Scientists ami Engineers", 2111.1 cd. McGraw-
Hill, New York.
Johnston, J. (1972). "Economctric Methods", 2nd cd. McGraw-Hili, New York.
Lawson, C. L.. and Hanson. R. J. (1974). "Solving Least Squarcs Problems". Jlrcllike-Hall,
Englcwood Clill's, New Jerscy.
Sorcnson, H. W. (1970). Leo.lst squares cslilllatillll: from Uauss to Kalman. lEE"!:: IJil('CII"I1II1 7.
'
van den Bos,A. (1983). Limits 10 resolution ill nonlinear lcast squares mol1ellitling: IEEE TrailS.
Amol1l. COlllrol AC-28, 1118-1120.
Wolle, M. A. (1978). "Numericul Methods for Unconstrained Optimiztltion". Van Nostrand
Reinhold, Wokingham.
Chapter :;
5.1 INTRODUCflON
Whenever noisc is present in thc observations from which a model is
estimated, the paramcter estimates are afiected by it and are therefore ranuom
variables; laking another set of observations would not give precisely the same
results. Any output prediction making use of the parameter estimates is also a
random variable. We may choose to regard the actual parameters as
deterministic or as themselves random variables. The latter view impliesthu.tu
range of possible systems as well as signals should be considered.
When dealing with random variables, it is natural to ask how the estimalor
will perform Oil arcragL' over all possible noise realisations, and perhaps all
possible adual paramcter va lucs. Ou r measure or how the estimator performs
should becunsislcnl with the ilitcnded usc of thc IlWtiC!' For instancc, we 111<1)'
be intercsted in how well it predicts fUlure output values. The accuracy of
individual parameters may not then be of direct interest, particularly 1'01'
parameters with no clear physical significance. Conversely, the whole aim muy
to find good values fur certain parameters. \Vc need statisticalmcasures of
model accuracy which are flexible enough to suit either situation.
While thinking about the accuracy of parameter estimates, we should bear
in mind that the m o ~ d e l structure will rarely coincide exactly with the
mechanism generating the observations. The structure is usually a
compromise between simplicity and power to account for observed behaviour.
Moreover, the ultimate test of the model is adequacy for a speciJied purpose,
not optimality und still less truth.1n these circumstances it is not strictly to the
point to speak of'true' or 'optimal' parameter values, although we shall orten
do so for convenience. Adequacy is a diflicult attribute to analyse or to
generalise to a variety of applications.
Ideally, we should enquire into the statistical behaviour of an estimator by
examining the joint probability density function (p,d,f.) of the estimates, This
is the basis of Bayes estimatiun, discussed in Section 6.2. Practically, we
virtually always settle for knowing about the mean anL! scatter, bcculisc
H7
Statistical ProJlerties of Estilllators
(5.1.3)
(5.1.4)
1= 1,2,. ", N
ii '" E[b(O)] =E[E[O I0] - oJ
o 0 0/0
J', = call + ('/'
h(lI) '" [0 IOJ - 0 '" fOp(O I0)<10 - II
I1I11
IN
with 1'(0,,01 , .. . ,0,,) the joint p.d.f. of the parameters. When 0 is trealed as
random, the conditional hias
5.1 INTRODUCTION
_L
N
{Ele,]}! IN{Ele,l}!
- a+-- N-a= -- N
Il, 1I ,
1= 1 1= I
The estimator, which docs not use any very slllall values of II, is"
'. '
is dclined, based 011 the cunditional p.d.L P{O jll). We can then cOllsitlcr a
larger collection of experiments, covering all possible values of 0 as well as all
realisations of the signals and noise. The overall unconditional bias is then
;'Ii
I-E'['] _L{Elall, +e,p! ')- 0: -(1,- __ \ N-a
. III J
I
Example 5.1.1 The gain fX of a device is estimated from N measurements of its
input u and output y with additive noise e, the model being
Ea.ch will be .regarded as dctcrmiilistic und known exactly, any uncertainty
bcmgmcluded 111 Cr' Each Y
r
is 1.1 random variable, as it contains noise. The bias
&is
E and L like this is allowed by the linearity of the expectation
operation; 1'1'0111 its Jclinitioll, the expected vallie of a sum is the SUIll or the
expected values or its parts, whether or nul they afe inuependent.
The bias here depends on the Ill)isc mean alllI input values butnOI 011 I.X. If
the noise has a conslantlllc,lIl {',the bias is {'tillll':S thc mean or 1/11 thl'llugh the
(5.1
5 STATISTICAL PROPERTIES OF ESTIMATORS
5.1.1 Ilias of Eslimators
Our first statistical question is whether estimates obtained from
experiments will cluster about the true value.
For a single fixed but unknown parameter 0, the bias is
between the expected value of its estimate Dand 0:
b '" E[Ol - 0", f01'(0) <10 - 0
1/
Here the expectation operator E indicates taking the mean of its argument,
and1'(0) denotes the p.d.. of O. The integration is over the range of all possible
ovalues. The definitiol1 of bias extends readily to a vector 0 of parameters:
b'" l,;lO]- 0 = f f01'(0,.0, ... ,0,,)<10, ... <10,,- II
=1'01'(0)<10- II (5.1.2)
further analysis is either too hard or requires unrealistic quantities of prior
information to be supplied, such as the entire noise probability density
function. Confining our attention to the estimates' mean and scatter
(covariance, defined shortly) may ilOl be as big a limitation as it seems. In the
special case of a Gaussian p.d.f., which we can sometimes accept as realistic,
the mean and covariance arc enough to ucfinc the p.cJ.r. shape completely.
The Gaussian assumption simplifies a great tical of estimation theory, as we
shall sec, but cunnot be made uncritically.
Initially we shall regard the true parameters () as unknown constants. Later
we shall treat them as random variables. Throughout the chapter, statistical
properties of estimators will be discussed by reference to least-squares
estimation, as it is simple, familiar and important in practice. However, the
ideas apply to any estimator; and will recur in connection with
estimators in later chapters. We shall also assume that every sampled noisy
waveform, i.e. every ram/om process or family of random variables (the
samples, indexed by time) is wide-sense stationary. That is, at least its mean
and variance (or, for a vector process, covariance: sec Section 5.3.1) are finite
and constant, and the correlation bctwecn its values at any two times depends
only on the dilference between the two times. Hence we would, 1'01' instance,
treat a noisy sinusoid as a deterministic sinusoid plus a constant-mean
random waveform, not as' a varying-mean randum waveform. For a Gaussian
random process, wide-sense stationarity implies strict statiunarity, Le. totally
time-invariant statistics, since the p.d.r. is completely dell ned by the mean and
variance (or covariance for a vector).
88
5.1.2 L1nbiased Linear Estimator for Lincar-in-Paramcters IVludcl
linitc-stlmplc-lIJibiascd estimatur is obtainable just by rcsl:uling s to
Ns/(N - I). 6.

(5.1.5)
(5.1.7)
(5.1.6)
(5.1.8)
(/ = Ay
E[AU]=/
y = UII + c.
E[i] - y = E[U((/ - II) - e] = Ub - E[e]
For a fixed II. the bias is
b E[Ay] - II = E[A U - /]11 + E[Ae]
5.1 INTRUDUCTION
So long as the Hnite-sample bias can be [oundin terms of quantities we can
evaluate or estimate well, we can choose the sample size N to make the bias
acceptably smull. Even so, the estimation error in anyone experiment may be
large; to accept an estimate is an act offaith. We normally wish to bolster our
raith with some assurance Ihat the scalter or the eSlimates produced by the
estimator is, on average, small, and even then there is some risk that we shall
be disappointed.
It seems reasonable to select, whenever possible, an unbiased estimator.
However. we shall See in Seetion 5.3.6 that a biased estimator may produce
estimates with so much less scallcr than an allernative unbiased estimator that
it has smaller mdlI1-square error; and is preferable.
Here A will depend in some perhaps complicaled way on samples or the
explanatory variables. Ifallthose variables are uncorrelated with e. which is so
if the model is good at explaining the systematic behaviour of the output, and
if also c is zero"mean, the la,tterm in (5.1.6) is zero, so the bias is zero ir
This restriction 011 A will be invoked in Chapter 7 when deriving recursive
estimators, but more immediately we check ir it is obeyed by least-squares
estimators in the next section.
An important property or mudels with zero-Illean
c uncorrclatcd with U is thai unbiased paralllclcresliIll<llcS imply llllbiascu
model preuictions ); of the output due to any spccilied U, since then
Estimators which are linear in the output observations l"ormitlg 'Y nrc
attractively simple, cOlnputationally ano algcbraically. Whcn the model
relating y to explanatory variables through parameters 0 is also linear, as in
o.l.s. anu w.1.s., it il is easy to find the conditions for the estimator to be
unbiased. The moueI and estimatOl' are
5 SIATISTICAL I'IWI'ERTIES UF ESTIMATURS
N N
= -E[I I v,v,JI N' +,;' = - (NE[II'] + N(N - I),,')/N' + ,,,
1= I s= 1
N N N N
=[v'] - 2I I E[v,v,]/N' +E[I I V,V,lN'J - E[v'] +2';' - ii'
f=I 1=:15=1
N
I IE[v! - 2v,;; + ;;'] II N - E[v' - 2v'; + ,i]
1= 1
= -(E[1I
2
]-,;')/N=-s/N
The bias is asymptotically zero since s is finite, but is nOll-zero for a lillite N, in
spite or the plausible look or". Allhough the bias depends on the unknown s,a
N
b = E[II(V, - ;;)'IJIN - [(v - ,,)']
1= I
where vis the sample mean I:v,", I vJN. The bias ill .v is
N
I
'(V -;;)',
j I f

1= I
Example 5.1.2 The variance s '" [(v - til'] or a wide-sense stationary signal
v(t) with mean ti is estimated rrom N independent samples v, to.l'N by
experiment, so unless if is zero, the larger the It values the hetler, as intuition
suggests. If jj is zero, 0: is unbiased for any selection of nOll-zero input samples
and any number or measurements N. D.
As records arc of finite length in reallifc, we wish to know lhe./inite-.,wmple
bias of an estimate based on N samples of each variable, as ill Example 5.1.1.
We may make do with the asymptutic bills in the limit as N tends to inHnity, as
second best, since it may be possible to determine whether <.Ill estimator is
asymptotically biased even Ihough its finite-sample bias is dilfIcult or
impossible to evaluate.
(5.2.6)
(5.2.9)
(5.2.8)
(5.2.7)
(5.2.5)
(5.2.4)
y; = u;O +Il,
J', = Y; +v" t = 1,2,3, .. " N
1>= E[[V'Vj-1VTC]
U
C[[V'Vj-IV'IE[e]
v
from measurements
t Kendall ano Stuart, 1979; Johnston, t972.
in which. rrom (5.2.6'-5.2.8),
afrected by mutually uncorrelated, zero-mean noises 11', and v,, The modelling
error n, is uncorrclalcd with 11', and V, and has mean zero, The regression
equation is
5.2.3 Bias due to Noisily Obserl'ed Regressors: The HErrors in Variables"
Problemt
5.2.2 Dins n'ith ]{cgressors ]{andom
Let us HI'sl examine o.l.s. estimation of a scalar parameter 0 in a model
and alice morc {) is unbiased ife On the other hand, if Uand eare
not indcpcndcilt, thc bias is Ilol gCllcnllly zero. The bias in these cin,:ulllstances
is investigatcd in Section 5.2.5, using probability limits. .,.1
Two cOl11mon causes of dependence between regressors and regrcssion-
equation error are noise in observing the regressors, and inclusion of earlier
samples or the output (regressand) among the regressors in a dynamical
model. They are the subjects or the next two scctions.
When V is partly or wholly random, the bias has to be averaged over Vas well
as e and for o.l.s. becomes
lillie extra complication cllsues ir e is independent or V, since then (5.1.7) is
still lruc, .
5.2 BIAS 01' LEAST-SQUARES ESTtMATE 93
the row or Vi due to the constantterm{! consists wholly of I's, we see rrom
UTe being zero that the residuals add up to zero even though the error samples
do not. The presence or the constant term in the model ensures that the
constant component of the residuals, but not e in this case, is zero. 6
(5.2.3)
(5.2.2)
(5.2.1)
5 STATISTICAL PROPERTIES 01' ESTIMATORS
b = [V'WVj-1 U' WE[ej
5.2 BIAS OF LEAST-SQUARES ESTIMATE
which is zero so long as e is zero-mean.
11'0 is taken to be random, the conditional bias 1>(0) deli ned by (5.1.3) must
be considered, but in ract the o.l.s. and w.l.s. biases we have just round are
independent of 0, sO 1>(0) and its mean bcoincide with b.
Example5.2.1 Observations ofy(l) are alfected by linear drirt d(l) = at +{3 as
well as zero-mean noise 11(1). A regression with a constant term f3 but no at
term is tried. The corresponding regression-equation error is
e = at + n
5.2.1 Bins with Regressors Deterministic
92
Proviucd c is zerO-llleall, the U.1.5. estimate is thcl'cl'orc unhinsed, for any
number N of samples making up y. As in Example 4.1.1, the mean uf c can
made zero even when y UI}(j/or the regressors cOlltain constant components,
by including a constant term in the regression Illodel.
For a w.l.s. estimate, (4.1.34) has [V I IVV] -, Vi Was A, so again AV is I,
and the bias is
where t comprises the sampling instants. Since the lI1ean of cis nol zero, the
o.l.s. and w.l.s. estimates will be biased. Thc o,l.s. residuals c Ii!, y - VII have
thc propcrty that
VTc= V"(y- V[UrVrIVly)= U'y- Vly=()
so the bias could be regarded us caused by 0.1.5. forcing Vi e to be zero when
UTe is not. Weighted-least-squares similarly forces UTIVe to be zero. Since
E[A V] = [V'Vj-1 U' V = I
so rrom (5.1.6) the bias in 0 is
I> = [VI U1- IV Ilei
To analyse the bias of least-squares estimates, we must distinguish between
deterministic and random regressors, and between regressors correlated and
uncorrelated with the regression-equation error.
The o.l.s. estimate (4.1.12) is linear in y and based on the linear-in-
parameters model (4.1.4), so (5.1.5) applies, with [U' Vr I Vi ror A. Treating
both 0 and the regressors forming U as deterministic,
95
(5.2.14)
(5.2.16)
011
1 = 1,2,3, ...
-a" hi
1=1.2,3.... =: u,
l
(} + { ' ~ ,
Example 5.2.2 A systcm is described by thc lirst-onJcr discrele-timc model
Y(o-')=(h,o' '/(1 +(/,o-'l)V(o-') + V(o-')
The modcl is rewrittcn as a regrcssioll equation
e
l
=v, -l-1l1t'1_1
Here k is the dead time in sample intervals, V(z - I) represents the output noise,
and
A(Z-I)=1lIZ-I+1l2Z-2+ ... +11uZ-", lJ(z-I)=btz-I+ .. +b
m
:::-
m
(5.2.15)
is rewritten in regression-equation form via
and the regressor vector urI' is
where
{YI-I .. , .l'1-U "1_/;_1 " , - k - mJ
From (5.2.14), t', is seen to be a moving average orn + I successive samples of
the o,.igin"lnoise sequence {t'} in (5.2.13), so Ie} is aUlocolielated even if {F) is
not.
We shall see in Section 5.3.5that autocorrelation of re} afrects the scatter of
least-squares estimates of O. Our present concern is bias, though. Each .1\-i
among the regrcssors in (5.2.16) is directly ailecled by e,_i' We "Iso sec, by
writing (5.2.16) with I - i in place of 1, that .1',-1 depends indirectly on C
I
_
i
-I to
e r ~ i-II lhrough )'1_ j_ 1 to .1',_ i-II and yet more indirectly on all earlier samples
from le J. Thus correlation of C!I with any earlier e
l
_
i
leads to correlation
between (', and one or more regressor J ' I ~ 1 to Y
r
-
II
. Bias will result in 0, as
shown by (5.2.4).
One aim of the instrumental variable method described in Section 5.3.6 is
to avoid such bias.
Interpreting Z-I as the 0nc-siHllple-delay operator
l
we obtain the difference
equation
The paramder vector () to be estimated is
5.2 BIAS OF LEAST-SQUARES ESTIMATE
(5.2.13)
(5.2.10)
(5.2.11)
5 STATISTICAL PROPERTIES 01' ESTIMATORS
b =[[UTVj-' UTe] = -[[( U' + W)'( U' + /VJr' [V' + IVJrWjO
(5.2.12)
5.2.4 Bias duc to Prescncc uf Output muong Regressurs
CrOSS-COri'elatiol1 between regressors alld the regressiolJ-eqlHltiOiI elTor may
also arise when a :-transform transfer-function model
N N
= [2:1(11; + 11',)(11, + I', - 11',(1) IILl II,' :J
I=: I ''''1
Note that the bias is duc to dependence betwecn U and c, so noise in the
inputs to a system being identined by 0,1.5. causes bias only when it appears
both in the input measurements and in e. Bias would nol arise, for instance,
from actuator noiseaJfccting the input u' actually applied when a kllowl1lesl
signal u was intended, sint:e thellll would be ullcorrclated with II' ano hence e
l
even though If' would be correlated with II'.
where the last line follows from the assumed uncorrelatetlness and zero
means. The estimate is clearly biased in general, 1'01' deterI11inistic or randOin u;
and for any N. The o.l.s. estimate of a vector 0 is similarly biased in the same
circumstances: "
so the bias is
so there is dependence between e, and regressor H, through 11'(. The 0.1.5.
estimate of 0 is
94
97
(5.2.23)
(5.2.19)
(5.2.21)
(5.2.20J
(5.2.22)
limits of o.l.s. parameter
Ul
= (plim 0'
plim =
plim(ABJ = plim A plim B
plim U=pliml[U'Ur' UTI UO + eJ}
= 0 + UTUr'}Plim{L UTe}
=O+I<-'c
so, for example
exist, the o.l.s. cstimate has a probability limit
We call now enquire illto the probability
estimates. Assuming that the probability limits
lrom x to zero as N tends to infinity. Almost sure and mean square
convergence each imply convergence in neither is implied by it.
Convergence in probability is a weaker property than a.s. or m.s. convergence,
but is usually easier (and certainly no harder!J to prove.
If converges in p['obability to x, x is said to be the [Jl'obabiliIJ'!illlir
01 The big attraction 01 probability limits is the property (Wilks,
1962J that 10" any continuous lunction j
5.2 BIAS OF LEAST-SQUARES ESTIMATE
and for two maliices A and B, both functions of the same random variables,
we can Hnu the probability limit or each clement of AB from
By its uelinitioll. R is posilive-ucJillitc. alllitherefol'c invertible, eXcel'll in the
degenerate case where exact linear dependcnce between the regressors makes
Vex zero for some non-zero a. AsR -I is non-singular, the bias R-1c is zero if
and only if c is zero. Wilh the regressors and lc I stationary, c lurns Gullo be
the vector of Eluje] between regressors and le l, so the
necessary and sulIicient condilion lor () to be asymplotically unbiased, in lhe
sense lhat it converges in probability to 0, is that every regressor is
uncon-elated with the regression-equation error.
All estimator U(N) which converges in probability to 0 is said to be (weaklyJ
consistent. We have just seen, then, that the consistency of UTe/ N as an
estimator of lhe cross-correlations between lhe regressors and (e] guarantees
consistency of 6, provided the cross-correlations are all zero.
(5.2.18)
(5.2.17)
lor all i:2: 2
for all N> No
5 STATISTICAL PROPERTIES OF ESTIMATORS
- xl < t;) > I -1/
lim N) - xl> 1:) = 0
N-':i"
which implies that
Analysis of bias soon rUlls up against the problem of IInJing expectations or
relatively complicated functions 01 random variables, as ill (5.2.4) if U
contains random variables. The problem can be avoided by considering
asymptotic bias and employing probability limits. Probability limits refer to
one particular way in which estimates may settle down as the number N of
observations they are based 011 is increased. A sequence of random variables
with N increasing (for instance, paramcter estimates computed from
longer and longer records) is said to converge in probability to .Y if for any
positive real numbers e and If we can fwd a value No of N such that
5.2.5 Convergence, Probability Limils and Consistency
when, as usual, the input sequence Ill} is independent of {v).
Regressor Y,_ I is correlated with C!, since bolll depend on u
1
_ \. SpcciJically,
,. (0) = El(v + a v)'] = E[v'J +,,' E[,,' J= (I +0' )cr'
1''' I 1 1-1 I I ,- I I
Put less lormally, the chance that is lurther than" Irom x becomes, then
remains, as small as we like as N increascs past No' He careful not to interpret
this as implying that almost every realisation of converges withinl; of x
and remains there; the tiny proportion or realisations not within I: may consist
of diflerent realisations at different values or N. Nor docs convergence
in probability to x mean that every realisation has x as the limit 01
Permanent convergence of almost every realisation to within I; or x is called
convergence with probability I (w.p.l) or almost sure (a.s.) convergence. An
alternative form of convergence is mean-square (m.s.) convergence or
convergence in quadratic mean, defined as convergence of the m.s. deviation of
Suppose that {u} is,zero-mcall, uncorrcla tcd and of COllstan tva riance (T.!. Even
then, {e} is autocorrelated:
96
99
(5.3.7) eov( Oi) = E[( cO - E[Oi])( cO - E[CO])T] ~ C(eov O)C'
5.3.2 Covariance of Linear Functions of Estimate;
SiglliJicmlcc of lVlinhuum-Covariancc Estimate
the (co)variance of the scalar observation noise sequence {u} would be
eov(1', 1- s) = ,,' ,5(1- s)
If we decided instead to use a state-space model
I + a ~
o
'"
EX31111J1e 5.3.1 If the noise sequence Iv) in the transfer-function model of
Example 5.2.2 is white and has constant variance a
2
and zero-mean, the
regression-equatiun error vector e has
i.e. the covai'iance is zero except at lag 1 - S zero, when it is R. This occurs in
the process- or observation-noise part of state-space models, which we shall be
employing in Section 8, I.
Example 5.3.2 An unbiased estimate (j(i) with variance p is obtained from
5.3 COVARIANCE OF ESTIMATES
Another Common situation occurs when x consists or simultaneous
samples of a colleelion of separate variables, all white, and
eov(x,l-s)=RJ(t-s) (5.3.6)
In comparing two estimators or assessing the quality of a model, it is often
necessary lO examine thecovarial1ce of a fixed linear function of parameter
estimates making up a vector 0. Keeping it as general as possible, the function
is cO wilh C a matrix, although most orten C will only be a row veelor. An
example is a row vector u
r
of specified regressor values for which we want to
know the model-output mean-square error E[(uT(O - 0))'] allribulable to
parameter error. The covariance or cO is
(5.3.1)
(5.3.3)
(5.3.2)
(5.3.5)
]
1'.u(N)
r,:(N-I)
1'.,.,(0)
1'.,,(2) 1'.,.,(1 )
ru(O)
5 STATISTICAL PROPERTIES OF ESTIMATORS
5.3.1 Definition or Covariance
var x'" ,,'(x) g, E[(x - Ex)']
eov x'" Ii(x) g, [Ix - Ex)!x - Ex)'1
eov(x,.I', I) g, El('., - EX-!Ix, - Ex,)']
5.3 COVARIANCE OF ESTIMATES
l
ru(O)
I' (I)
R = cov x = .l.\
. ~ x ' :
r,.,{ N)
Element fij of R(x) is the covariance J:.'!(x
j
- 1.::.\")(-'".1 - 1::.\) I between clements
i unoj of x, so the variances urindividual clements of x make up the principal
diagonal of R(x). A random process x,. i.e. a random variable with a time
argument I, has its covariance tlcflncd as
or standard deviation a(x). For a vector variable x, the coullterparl is the
cvVl1r;Uf1te 11Ialrix
The average scatter ora scalar ranuom vliriablc x about its mean is described
by its variance
eov x ~ E[n" - Ex' x" - xEx
T
+ Ex' ExT] = Eln
T
] - ExEx I (5.3.4)
so for a zero-mean random variable, Elxx
l
] amI cov x can be used
interchangeably.
In addition to the covariance or an estimate, we shall onen be "interested in
the covariance of input, noise or error samplcs which have been written as a
vector, like e in the regression model, usually comprising successive samples
uniform in time. Element (i,) of cov x is thcn autocorrelation rn<li -)0 at lag
Ii '"'-)1 sampling intervals, so
A function of two time arguments like this is cumbersome, and we morc often
deal with the covariance CDV(X, 1 - s) of a wide-sense stationary process. In
that case element (i,) is the cross-corrclation at lag 1 ~ s between Xi and Xj'
For the moment we are concerned mainly with the simplest dcJlnition (5.3.2),
applied lo O.
The covariance matrix is easily expressed in terms of the "mean-square
value" matrix and thc mean: "
A special case is whenlxl is an uncorrelatcd (white) sequence, with rxAO)
equal to a
2
and the autocorrelation zero at all other lags, $0 that R.
n
is all.
98
batch i of some observations. We enquire how much the variance could be
reduced by taking M batches of observations and computing as Ii Ihe mean of
DOl to OIMI, assuming the batches are independent auu statistically identical.
The vari,lIlce of the mean {j is
lUI
(5;3,10)
(5.3.12)
x'Ay =I I -',a,,)', =trlAyx'J
k
Since. when U is non-random and cov c is a
2
i,p
S=c'c=(y- VO)'(y- vri)=yTU
N
- VlVTVj-'VT)'y
V[V'Vj'UT)yy'j (5.3.11)
where iN denotcs thc N x N identity malrix, the trace tr is the sum of the
elemcnts, and we have used the relation
and the expectation operation can be dropped when U is llon'-random. Since
[V'Vr I can be extracled easily from the o.l.s. computation, cava can be
estimated readily providing 0"2 can be estimated. One would guess that the
sum of squares of the residuals is proportional to a
2
In fact
eovli = E[<IV
'
Vj-I V'y - II)([V
'
Vj I V'), - O)'j
= E[<IV I 'U'l VO+ c) - 1I)([U'Uj-1 U'I VO +e) - II)']
=E[lU
'
Vj V[ u' VI - Ij (5.3.9)
5.3.3 Ctl\'ariancc or Ordhmr.y Least-S(luUrc and \Veighted Least-Square
Esfilua(cs
E[Sj=lr:U
N
- VlVTVr'V')E[yyTjJ
,
= trl(l
N
- VIV' Ur' V')( VIIOTU
I
+ (J'IN):
(J"trJI
N
- VlV'Uj - I V
T
j (J"(trJI
N
J -tr{[ V'Uj-' VI V j)
(5.3.13)
we conclude thai SI( N - p) is an unbiased estimator of (J"-
In the simplest case, cove is a
2
i, the elements of e are uncorrelated and all of
variance a
2
That is so for uniform sampling of a system with stationary
e(t), autocorrclatcd only over lags less than one sampling interval. If so,
howevcr, minimise it in the special case where the regression-equation error is
while. We follow up this point in Sections 5.3.4 and 5.3.5.
5.3 COVARIANCE OF ESTIMATES
When U is indepcndcnt of" c and e is zero-mean, the o.l.s. estimate has
covariancc
(5.3.8)
5 STATISTiCAL PROPERTIES OF ESTIMATORS
var(cTO) - var(c'rO*) = cTpc - clp*c = e'(P - P*)c
OUI )'J I'" I'" .[(OUI - O)(OIil - 0) J Va< 0 = E ....J- I .. - 0 = I;
M M M
i= I j= I
so cT{j* has the lower variance if P - p* is posilive-ueliliite. or, it
another way, if p* is smaller than P. What is more, this is true 1l'11ll;ever aur
choicl! As well as implying lower variance of the model output due to any
specified regressor values, smaller p. implies lower variance rur each
individual clement 0i of t]* than for tlie correspolluing clement of fi, We see
this by choosing as c a vector which is zero but rur Olle in position i. Amung
unbiased estimates, smaller variance implies smaller n1.s. error, since I11.S.
error is variance plus (mean errur) squared.
The good implications of minimum parameter-estimate covariance arc so
wide that we are justified in paying covariance close allcntion as an indicator
of estimator quality, particularly in the context of unbiased estimators.
A warning is in order here. It is easy to confuse the variance of an unbiased
estimate of a model output utO with the mean-square output error actually
Obtained in filling Ihe model. The Iwo are not the samc. The o.l.s. paramcter
estimates minimise the actual mean-square model-output error over the
record, but do not generally minimise the mcan-square model-output error
over all realisations for any specified regressor
values, not even for the values occurring in the record. The o.l.s. estimates do,
Armed with (5.3.7) we can investigate how to get the minimum-variance
estimator of any scalar linear function eTO. We denote a cundiuale unbiased
estimate of 0 by 0* und its covariance by P*. We slla II compare the variance
of ct{j* with that of cTI), where 6is any unbiased estimate. with covariance P.
PUlling c
T
for C in (5.3.7),
=MplM' =1'1114
The next-la-last step above recognises thai OUI - 0is independent of OUI_ 0
and has zerO mean, since (](il is unbiased, so all the cross-prmlucllcrms in the
sum are zero.
In the nolation 01'(5.3.7), C is a row vector with every entry 11M, Ii is the
column vector containing ()( 1) to DIAl) and COy (] is pl. Notice that the variance
of Ii tends to zero as M is increased indefinitely. /'0,
100
cov 0" = E[(Ay - O)(Ay - 0)1] = E[AeeTA
I
] = a'E[AATj (5.3.15)
IOJ
(5.3.22)
(5.3.23)
(5.3.21)
(5.3.20) U'=QU y' =Qy,
y' = Q( UO +e) = U'O +e'
cove' = E[QeeTQT] = QRQT
where R is cove. By choosing Q such that

From the original regression equation we then have
where e' is Qe. This ne\v errur is still zero-mean, but its covariance is
When the covariance of the regression-equalion error e is not of the form all,
we no longer have any reason to suppose that the o.l.s. estimate of (} has the
smallest covariance or all linear. unbiased estilllHtcS. We can, however. still
obtain the minimum-covariance eSlimate irwe Iirst operate oil the regression
equation so as to turn it inlo an equation with all error veclorwhich docs have
covariance tTl f. The required operation is lincar fillering. That is. y and U arc
pre-l11ultiplied by some N x N matrix Q to give
5.3.5 !\rJinimum-CO\'ariancc Estimate \Vhcn Error ]s Autocorrelaled/Non-
Stationary: The Markov Estimate
D = E[AA
I
] - E[[U'Uj-lj=covO" -covO (5.3.18)
and D is positivc-semi-definite, being the mean of the positive-semi-definite
product or a real.matrix and its transpose, so
cOV 0" :2: eov 0 (5.3.19)
This minimum-covariance property is a powerful incentive to employ o.l.s.
when the error is uncorrelated and of constant variance.
It would be nice to have a comparably simple minimum-covariance estimate
when the error-vector elements are correlated and/or of differing
Such an estimate is round in the next section.
so
unbiased estimates we shall encounter are o[ this [orm. Keeping A and U
stochastic, consider
D@'E[(A-[UrU]-IUr)(A-[UTUrIUT)lj
E[AAlj - E[A U[UIUj -I] _ E[[U
T
U]-I UTA
T
) +E[[U
r
Uj-l
(5.3.16)
Replacing A by [BUr I B, we see that
EIA U[UIU] II = El[U
I
Uj-I] = E[[ UIU] -I UIA'] (5.3.17)
5.J COVARIANCE OF ESTI MATES
(5.3.14)
-155
95H
-1763
5 STATISTICAL PROPERTIES OF ESTIMATORS
[
43.2
52.6[U
I
,Uj 1= -155
235
covO
w
= E[[UTWUr I UTWRWTU[U
I
WUj-l]
5.3.4 Minimum-Cm'uriuncc Property of Ordinary Least Squares, \Vheu
Error ]s Uncorrclulcd
where R is cove. As [U
T
U J- t U
T
IV is the matrix relating Ow to y, it might
appear that cOV Ow can be computed cheaply, given R. Direct computation
would, on the contrary, be expensive since R is N x iV, normally very large. It
can be avoided in the most important w.l.s. estimator. the Markov estimator
which minimises cov Ow, as discussed in Section 5.3.5.
under the, assumption that e is uncorrclated with the samples making up A.
We could now assume that A and U are non-random, allowing us to
drop the expectation signs in (5.3.10) and (5.3.15). then show fairly easily
(Problem 5.4) that ,,2(AA
T
- [Ur UJ-I) is positive-semi-definite, demo
onstrating that no 0A gives cov 0" smaller than cov O. A less restrictive
assumption is that A may be stochastic bUI is of Ihe [orm [BUriB. which
satisfies the condition (5.1.7) that ensures unbiasedness. All the linear,
For the w.l.s. estimate Ow given by (4.1.34). the covariallce is found by the
same steps as in (5.3.9) to be
We shall now discover that when the regressiun-equation errors forming c afC
zero-mean, ul1correlatcu and all or the same variance, sO that R is (11 f, the
covariance of the o.l.s. estimate {j is the smallest of any linear. unbiased
estimate. I[we denote any such estimate Ay by 0..
1
then, using (5.1.7) to ensure
zero bias,
Example 5.3.3 In Example 4.1.1, the o.I.s. estimates of initialtargel position
x
o
, velocity "0 and acceleration a were :(0 = 4 79. Do = 234. d = 55.4. The sum
of squares of the residuals was S = 157.9, so ,,' is estimated as
S/(N - p) = 157.9/(6 - 3) = 52.6, and cov 0 is estimated as
235 j
-1763
3526
102
The square roots of the clements give estilllated stanuard
deviations 6.57 for "0,31.0 [or Do andl59.4 for d. As N is so small, we should
not trust the covariance estimate too much, however. 6
105
1.64
o
U
t = 1,2,.,., N
1= 1,2, ... , N
o
U
o
-U.8
[j
-0.8/T 0 0
:J
/T -0.8/T 0
it I)"
"
I.M -O.X
-0.8 I.M
cov C'" II = /T' 0
o
The only dilliculty is that we do not know the initial conditions, and yin
here, anti must choose them arbitrarily. So long as the filtering from {Ii} und
I)'} to III'} and IY'} is stable, the elrects of incorrect initial conditions
eventually die out, so we need only discard the lirsl few (we hope) values in Ill:
and {.v'I, 6.
gives
We saw in Section 5.2.4 that the combination of autocorrclated kJ and
inclusion of earlier oUlpuls among the regressors (like y, _ I in Example 5.3.4)
Generally we can sec that when e, is a linear combination of 11 + I successive
samples v, to V, -
H
, R has 2n + I non-zero diagonals containing r{',,(O) to r"l'(lI),
and each row or S has 11 + I non-zero elements, proportional to the coeHkients
of V, to V'_II in e,. Equating clements of SST and R gives those coetlicients
uniquely. We then want to invert the linear relation between {e} and {v} sO as
to find the filler that tUl"ns]/f J into iu'} and 1.1') into 1.1" j, just as Q-I was
inverted to give Qto prclnulliply U and y. As S is Ilot square, it is not obvious
how to do su (0'. I docs nul exist). We can, however, invert U =SU' and
y = Sy' one sample at a time. In this example
Gaussian elimination inverts this tridiagonal matrix quite quickly for use in
(5.3 .25)( Fenner; 1974), but a nealer and more informative solution is 10 notice
thut II can be factorized as SST where S is the N x (N -I- I) matrix
as found in Example 5.2.2. In practice, Ie I is not accessible from jy} and III}
without exact knowledge or a and b, which would do away with the need for
idcntilication. Instead, we have to use the o.l.s. residuals {e to for instance,
which dilrer syslematically from ie} because the o.1.s. estimates aand 6are
inexact. \Ve finu. we hope,
5.3 COVARIANCE OF ESTIMATES
(5.3.25)
I = 1,2, ... , N
5 STATISTICAL PROPERTIES OF ESTIMATORS
Y, = - ay, _1 + btl, _ I + e/
with
cove' = QQ-1Q-"Q' = I
II is always possible to factorize II. as in (5.3.23), because R is positive-definite
(unless the elements of e arc always linearly dependent, nol a practical
possibi lily); the factorisation anWullls to rewriting the LJuad rn tic !'orlll as
and positivc..t!cJinitcllcss ensures that at ICasllJllc clement of
Q-I is nOll-zero, so Q exists. The Cholcski factorisatiun ill Chapter 4 is Olle
such factorisation.
The 0.1.5. estimate based on the lillcrcLi equation (5.3.21) is
0' = [V'TV']-l V'T
y
' = [VIQTQV]-l V1QTQy
= [VTR-
I
V]-' V'II-'y
Suppose that in reality a is -0.8 and the noise sequence Iv) of the
function model is uncorrelateu. If the autocorrelation runction of {e) could be
calculated exactly, it would be
1',.,(0)=(l-l-a2>a2=I.64/T', 1',,(I)=1I/T'=-0.8/T', I',.,.(i) =0, i:?2
Example 5.3.4 We return to the plant and model of Exal11ple 5.2.2.
transfer-function model was rewrillen as a regression equation
so it turns out that the millimWl1-Covuriallce, linear, Ill/biased eSlimlltl! is w.l.s.,
with the inverse of ,he covarianCe! the regression-equation error as the
weighting mattix. The estimate is callcd the iHarkov, Aitken or generalized-
least squares (g.l.s.) estimate. Since the filtering or the regression equation not
only uncorrelates the error {e) but also makes its variance unity, the
covariance of 0' is, from (5.3.10), E[[ V,r V'l - I] or Ell V'R I V] - 1].
Direct implementation of(5.3.25) is unallractive, as R is N x N, normally
large. A beller solution is to find a low-order linear filter with Ihe required
"noise whitening" effect on Ie}. This may be done iteratively, processing all the
regression-equation errors at once as in Section 7.2.2, or recursively, running
through the observations and regression-equation errors one at a time, as in
Section 7.4. In either method we find the required IiIter by identifying the
structure of the regressioil-equation error. The l1i'oblcl11 is a special case of the
usual input-output identificatiun problem, wilh "output" (', nlOdcllcd as a
linear function of carlier samples or itself and white-noise forcing. The
sequence {e) is, of course, not directly available and must be approximated by
residuals {.I', - urO) with 0 the best estimate of 0 al hand.
104
We make the covariance
107
(5.3.29)
=,,' E[[ZTV/Nj-I[ZTZ/Nj[VTZ/NJ-lj/N'" (5.3,28)
z,U
/
Like Z r V, Z' be has to be divided by N to get a finite probability limit R
zz
.
Plim V I L/N i(just and the inversion or zr V/N and V'Z/ N inverts Rzu
and so altogether
plilll cov til = IT' plillllL I Vj Nr ' plilll[L' L/ Nj plim[ V I L/NI '1/ N
= E[[Z'Vj-'Zr E [ceT!Z[VTZj-'J
Z,U l'IZ,11
,,' E[[Z'VJ-IZ'ZlVIZ)"j
z.u
1/11
e.f/
=. E [[Z'VI-1ZTeeTZ[U'ZI-IJ
i,U,,,
Now (5.3,29) rcveals a uangcr. If, in making Z uncorrelateu with Ie l. we
should rcnder Z almost uncorrelated with V, Ii
w
and H:
vz
wonld bc small and
their inverses large. With Ru. not particularly small, we could then have an
undesirably large asymptotic covariance for 0z' even though 01. might be
consistent. It seems Z must consist of instrumental variables correlated as
lillIe as possible with Ie I but as much as possible with V. There need be no
connict, in principle; the error-correlated variables in U arc functions of both
the noise-free variables driving the system ({u l here) and the regression-
equation error, and we are merely asking ror the rormer to be emphasised
and the latter suppressed in Z. However, not ktiOwing e exactly, we cannot
check accurately how closely Z approaches the desired correlation behaviour.
The required correlation properties or Z do not prescribe explicitly what we
should choose as instrumental variables, Several possible choices will now be
reviewed in un example.
is
(Melsa and Sagc, 1973, p. 162), we tind that
cov Oz g, z.f)(O, - 0)(0, - 0)' j
5.l COVARIANCE OF ESTIMATES
covariance. Taking cove to be ,,2t. the simplest possibility, and using the fact
that in gencral
(5.3.26)
5 STATISTICAL PROPERTIES OF ESTIMATORS
Oz = [Zr Vj -I Zr
y
106
The d.lIc to c.orrelution between regressors and fcgfcssiollwcquHlion error,
described 111 SectIOn 5.2.4, can be avoided by modirying the o.l.s. estimale into
the instrumental variable estimate
causes bias in 6,. due to correlation between L', ano those earlier-output
r.cgressors..The docs not arise if 1e J has its autocorrelation removed by
lInear. filtefIng, slllc.e then the correlation between regressors antI rcgrcssion-
equatIOn error vi.llllshcs. An allcrnativc method of avoiding such bias is the
subject of the next section.'
where Z is a in which the regressors of U arc replaced
by other va.nables (the instrumental variables, or just i1lstruments) not
correlated with the error. A suitable choice of Z will make Oz a consistent
estimator of 0, for .
plim Oz = plim([ZTVr IZ'(VO +e = 0 + plim([ZTUJ - IZ' e)
= 0 + plim([ZTVj/N r ') plim(ZTe/ N) = 0 + ii;J ,o
z
, (5.3.27)
We _here that putting in the 1/ N is enough to make the
limits Rzu and '\1' exist. The elrcct is that, rorinstallco clcment
(i,J) or Rzu becomes the probability limit or the mcan, uver lhe N in
column I or Z and columni or U, or the product or the variables rorming those
olle would gucss, in many cases each such N-samplc tnean tends
wIth IIlcreaslllg N to the expected value or the product. Hcnce, irthe variables
in Z, Uand e arc zero-mean, the elements of R
zu
ano i:
z
" arc the covariances
between thc corresponding variables. (Caution is needed here, 1'01' situations
can be devised in which an N"sample mean is asymptotically biased but still
A proponion or realisations might be biased, the proportion
to zero as N increases and so not preventing
COllslstency, but the bias in that proportion might rise more rapidly with N
alld produce nell bias. We cannotthererore assume blindly that ror
any vanable 'N wIth a probability limit, plim and lim Ec coincide)
R' N N-+tr: .N '
eturlllng to. (5.3.27) we sec that plim 0, is 0 ir ,0" is zero. When ,0" is the
vector of covanances between the instrumental variables and Ie} as discussed
above, we conclude that Oz is consistent if the instrumental variables are
uncorrelated with {el.
Our choice or Z is rurther guided by wanting liz to have a small asymptotic
5.3.6 Ilistrulllelital Variables
t09
Example 5.3.5  Bias was caused in Examples 5.2.2 and 5.3.4 by the presence of y_{t-1}, correlated with e_t, as one of the regressors. To avoid bias, we replace y_{t-1} by an instrumental variable z_{t-1} uncorrelated with e_t. We must also take care that

    R_zu = plim (1/N) Σ_{t=1}^{N} [z_{t-1}; u_{t-1}] [y_{t-1}   u_{t-1}] = [r_zy(0)   r_zu(0); r_uy(0)   r_uu(0)]

is not close to singularity, inflating plim N cov θ̂_z. Let us examine R_zu for three choices of z_{t-1}, each uncorrelated with e_t:

(i) z_{t-1} = u_{t-1}, so that r_zy(0) = r_uy(0) and r_zu(0) = r_uu(0). Clearly, R_zu becomes singular. We should have foreseen this disaster, as we have introduced exact linear dependence between the columns of Z, namely column 1 − column 2 = 0. Linear dependence or near-dependence among columns of Z might be much harder to foresee in an example with more regressors.

(ii) z_{t-1} = y_{t-2}, so that r_zy(0) = r_yy(1) and r_zu(0) = r_yu(1). With e_t = v_t + a v_{t-1}, y_{t-2} depends on u's up to u_{t-3} and e's up to e_{t-2}, and thus on v's up to v_{t-2}. As {u} is uncorrelated with {v}, r_uy(0) is then zero providing r_uu(i) is zero for all i ≥ 1. Similarly, r_yu(1) depends on r_uu and is zero if {u} is white. Hence |R_zu| = r_yy(1) r_uu(0) − (a contribution which depends on r_uu and is zero if r_uu(i) = 0 for all i ≥ 1). It is not difficult to see that r_yy(1) is not generally zero, so |R_zu| is non-zero and R_zu⁻¹ exists. Problem 5.5 examines the resulting probability limit of N cov θ̂_z when {u} is not strongly autocorrelated.

(iii) z_{t-1} = y_{t-1} − e_{t-1}. Now z_{t-1} is not obtainable exactly, since e_{t-1} is not known exactly, but it could be approximated by −â y_{t-2} + b̂ u_{t-2} using tentative estimates of a and b, for instance from o.l.s. This choice of z_{t-1} is appealing because it modifies the troublesome regressor y_{t-1} as little as necessary to uncorrelate it from e_t. If {u} is white, it is fairly easy to show (Problem 5.6) that R_zu, R_zz and R_uz are all diagonal, with principal-diagonal elements (a²g̃² + b²) r_uu(0) and r_uu(0), where g̃² is the "power gain" from r_uu(0) to r_zz(0), obtainable by calculating from a and b the unit-pulse response then squaring and summing its ordinates, as in Problem 3.2. Hence

    plim N cov θ̂_z = ((1 + a²) r_vv(0) / r_uu(0)) [1/(a²g̃² + b²)   0; 0   1]

and high input power compared with noise power, i.e. low r_vv(0)/r_uu(0), is seen to be beneficial, as one would expect.   △

5.3.7 Estimation: Ridge Regression

Up to now we have asked first that an estimator should be unbiased, then that it should have minimum covariance among unbiased estimators. Reasonable as this seems, it is not always the best thing to do, as it does not guarantee minimum mean-square error (m.s.e.) in the estimates. The m.s.e. matrix for estimate θ̂ of θ, with mean Eθ̂ equal to θ̄, is (treating θ as non-random)

    M = E[(θ̂ − θ)(θ̂ − θ)'] = E[(θ̂ − θ̄ + θ̄ − θ)(θ̂ − θ̄ + θ̄ − θ)']
      = E[(θ̂ − θ̄)(θ̂ − θ̄)'] + E[θ̂ − θ̄](θ̄ − θ)' + (θ̄ − θ)E[θ̂ − θ̄]' + (θ̄ − θ)(θ̄ − θ)'
      = cov θ̂ + bb'                                                  (5.3.30)

where b is the bias in θ̂. This matrix counterpart of the familiar "mean-square value equals variance plus mean squared" indicates that a finite bias may be worth exchanging for a reduced covariance.

Reduction of m.s.e. is the aim of ridge regression, which modifies the o.l.s. estimate to

    θ̂_R = [U'U + K]⁻¹ U'y                                             (5.3.31)

with K some symmetric matrix. Several forms have been suggested for K (Hoerl and Kennard, 1970; Goldstein and Smith, 1974), the simplest being kI with k a positive scalar. To see how a reduction in m.s.e. comes about, consider a scalar θ, for which (5.3.31) becomes

    θ̂_R = Σ_{t=1}^{N} u_t y_t / (Σ_{t=1}^{N} u_t² + k)                  (5.3.32)
        = (Σ_{t=1}^{N} u_t(u_t θ + e_t) + kθ − kθ) / (Σ_{t=1}^{N} u_t² + k)

Assuming {e} is not autocorrelated or correlated with {u}, and has mean zero and variance σ², and writing Σ_{t=1}^{N} u_t² as S, the m.s.e. of θ̂_R is

    m = E[(θ̂_R − θ)²] = (Sσ² + k²θ²)/(S + k)²                          (5.3.33)

A stationary value of m is achieved when

    ∂m/∂k = 2S(kθ² − σ²)/(S + k)³ = 0                                   (5.3.34)

which requires k to be σ²/θ². The stationary value is, in fact, a minimum since ∂²m/∂k² at that point is entirely composed of positive terms. We cannot choose the best k in advance even in the scalar case, not knowing σ or θ, but we could find an acceptable k by trial and error (Hoerl and Kennard, 1970), checking that the estimates are credible and the sum of squares of residuals is not unduly inflated by using k.
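Before turning to the vector case, a small numerical illustration may help. The Python/NumPy sketch below compares the o.l.s. and ridge estimates (5.3.31) with K = kI when the columns of U are nearly dependent; the data and the value of k are illustrative assumptions, not taken from the text.

    import numpy as np

    # Minimal sketch of the ridge estimate (5.3.31) with K = kI versus o.l.s.
    rng = np.random.default_rng(1)
    N, sigma = 50, 0.5
    theta_true = np.array([1.0, 2.0])
    u1 = rng.standard_normal(N)
    U  = np.column_stack([u1, u1 + 0.01 * rng.standard_normal(N)])   # nearly collinear regressors
    y  = U @ theta_true + sigma * rng.standard_normal(N)

    def ridge(U, y, k):
        p = U.shape[1]
        return np.linalg.solve(U.T @ U + k * np.eye(p), U.T @ y)

    print("o.l.s.    ", ridge(U, y, 0.0))      # k = 0 recovers ordinary least squares
    print("ridge k=1 ", ridge(U, y, 1.0))      # biased but far less erratic
    print("singular values of U:", np.linalg.svd(U, compute_uv=False))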
For a vector θ, analysis of how k affects M is not easy, and Hoerl and Kennard avoid it by considering the mean sum of squares of estimate errors E[(θ̂ − θ)'(θ̂ − θ)]. This is less satisfactory, particularly if the elements of θ are of differing orders of magnitude, as all the squared errors are weighted equally. Goldstein and Smith justify the choice kI through its effect in reducing ill-conditioning. The singular-value decomposition described in Section 4.2.4 is applied to U, so

    U = PRQ'                                                          (5.3.35)

where P and Q are orthogonal matrices and R is N × p and zero but for non-negative singular values r_ii, 1 ≤ i ≤ p. As in Section 4.2.4, P, Q and R are used to transform the normal equations to

    R'Rθ* = R'y*                                                      (5.3.36)

from the transformed regression equation

    y* = P'(Uθ + e) = P'PRQ'θ + P'e = Rθ* + e*                         (5.3.37)

where θ* = Q'θ and we have used the fact that P and Q are both orthogonal. Ill-conditioning appears as at least one r_ii being very small, making the sum of squares of residuals (model-output errors)

    S = Σ_{i=1}^{p} (y_i* − r_ii θ̂_i*)² + Σ_{i=p+1}^{N} y_i*²           (5.3.38)

very insensitive to the related θ̂_i*. Since S is the same as in the original problem, the corresponding θ_i* is a linear combination of elements of θ which has little influence on the fit, and is consequently poorly estimated. The ill-conditioning can be alleviated by replacing every 1/r_ii by r_ii/(r_ii² + k) with k positive, preventing any r_ii from being too small. The result is to change the estimate from

    θ̂_i* = y_i*/r_ii,     i = 1, 2, ..., p                             (5.3.39)

(which minimises S in (5.3.38)) to r_ii y_i*/(r_ii² + k), say θ̂_Ki*. The diagonal matrix R'R in (5.3.36) is thereby modified to R'R + kI, so solving (5.3.36),

    θ̂_K* = [R'R + kI]⁻¹ R'y* = [Q'U'UQ + kQ'Q]⁻¹ R'P'y                  (5.3.40)

and transforming back,

    θ̂_K = Qθ̂_K* = Q[Q'U'UQ + kQ'Q]⁻¹ Q'U'y = [U'U + kI]⁻¹ U'y           (5.3.41)

and ridge regression emerges as the result of preventing any singular values of U from being very small.

Ridge regression can be shown to be capable of reducing the m.s.e. of each element of θ̂. The proof consists of writing the derivative of the m.s.e. at k = 0, i.e. at the point where ridge regression departs from o.l.s., as a sum of negative terms.

5.3.8 Linear Estimator and Orthogonality

Having established that the minimum-mean-square-error (m.m.s.e.) estimator is not generally the minimum-covariance unbiased estimator, let us find out what it is. Initially we shall consider the scalar weighted m.s.e.

    Q = E[(θ̂ − θ)'W(θ̂ − θ)]                                            (5.3.43)

rather than the m.s.e. matrix M in (5.3.30). The difference is less significant than it might seem, as we are usually interested in m.s. weighted errors of the form

    E[(c'(θ̂ − θ))²] = E[(θ̂ − θ)'cc'(θ̂ − θ)] = E[c'(θ̂ − θ)(θ̂ − θ)'c] = c'Mc

We also restrict the estimator to the form

    θ̂ = Ay                                                            (5.3.44)

linear in the observations, like all the least-squares estimators, for computational and analytical simplicity.
Each element a_ij of A must give

    ∂Q/∂a_ij = E[∂(θ̂ − θ)'W(θ̂ − θ)/∂a_ij] = E[(element i of 2W(θ̂ − θ)) y_j] = 0        (5.3.45)

so, writing out ∂Q/∂a_ij for all rows i and columns j of A,

    ∂Q/∂A = 2W E[(θ̂ − θ)y'] = 0                                                       (5.3.46)

Whatever the value of W, ∂Q/∂A will be zero if A gives

    E[(θ̂ − θ)y'] = 0                                                                  (5.3.47)

These orthogonality conditions say that, on average over all possible values of the observations, the error in each parameter should be unrelated to each observation. The orthogonality conditions (4.1.20) for the output estimate based on o.l.s. are rather similar; they imply that there is no relation between the output error and each regressor, on average over the samples in one record. The output estimate is linear in the regressors, just as θ̂ is linear in the observations here.

We can find A for the m.m.s.e. estimator explicitly where the observations are generated by

    y = Uθ + e,      Ee = 0,      cov e = R                                            (5.3.48)

with U and θ deterministic. The weighted m.s.e. in θ̂ is calculated over all realisations of e for a particular value of U and θ. Substituting (5.3.48) into (5.3.47),

    E[((AU − I)θ + Ae)(Uθ + e)'] = (AU − I)θθ'U' + AR = 0                               (5.3.49)

so

    A = θθ'U'[Uθθ'U' + R]⁻¹                                                            (5.3.50)

The inverse in this expression exists, since R is a positive-definite covariance and Uθθ'U' is non-negative-definite. Although such an expression for A in terms of the unknown θ is not directly usable, it does allow us to work out the bias and m.s. error of the theoretically optimal estimator. We can then compare this estimator with the g.l.s. minimum-covariance estimator.

First, we must simplify A. We can easily verify that

    A = θθ'U'R⁻¹/(1 + α),      α = θ'U'R⁻¹Uθ                                            (5.3.51)

by postmultiplying this expression and the original one (5.3.50) by Uθθ'U' + R. Hence the bias is

    E[Ay − θ] = E[AUθ + Ae − θ] = AUθ − θ = (α/(1 + α))θ − θ = −θ/(1 + α)                (5.3.52)

so the m.m.s.e. estimator is indeed biased. Its m.s.e. matrix is

    M = E[(θ̂ − θ)(θ̂ − θ)'] = E[((AU − I)θ + Ae)(e'A' + θ'(U'A' − I))]
      = (AU − I)θθ'(U'A' − I) + ARA'                                                    (5.3.53)

and from (5.3.51),

    M = θθ'/(1 + α)² + αθθ'/(1 + α)² = θθ'/(1 + α)                                      (5.3.54)

The g.l.s. estimate θ̂′ given by (5.3.25) is unbiased, since

    E[θ̂′ − θ] = E[[U'R⁻¹U]⁻¹U'R⁻¹(Uθ + e) − θ] = E[[U'R⁻¹U]⁻¹U'R⁻¹e] = 0                 (5.3.55)

and its m.s.e. matrix is

    M′ = E[[U'R⁻¹U]⁻¹U'R⁻¹ee'R⁻¹U[U'R⁻¹U]⁻¹] = [U'R⁻¹U]⁻¹                                (5.3.56)

We show that M′ is larger than M, in the sense that M′ − M is positive-definite, by first noting that

    U'R⁻¹U(M′ − M)U'R⁻¹U = U'R⁻¹U − U'R⁻¹Uθθ'U'R⁻¹U/(1 + α) = U'[R + Uθθ'U']⁻¹U > 0       (5.3.57)

The last step can be verified by multiplying the inner matrices by R + Uθθ'U', and the inequality follows from [R + Uθθ'U']⁻¹ being the inverse of a positive-definite matrix, as noted earlier, and therefore itself positive-definite. Now since U'R⁻¹U is invertible, any quadratic form ζ'(M′ − M)ζ can be rewritten as ξ'U'R⁻¹U(M′ − M)U'R⁻¹Uξ with ξ equal to [U'R⁻¹U]⁻¹ζ, so (5.3.57) shows that M′ − M is positive-definite. The practical conclusion is that the g.l.s. estimate has a larger mean-square weighted error

    E[(c'(θ̂′ − θ))²] = c'M′c                                                            (5.3.58)

than the theoretical m.m.s.e. estimator, as the penalty for being unbiased.
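The inequality M′ − M > 0 is easy to confirm numerically. The sketch below (illustrative only; U, R and θ are arbitrary choices, not from the text) forms both m.s.e. matrices and checks the eigenvalues of their difference.

    import numpy as np

    # Minimal check that M' - M is positive-definite, with
    # M = theta theta'/(1+alpha) and M' = (U'R^{-1}U)^{-1}.
    rng = np.random.default_rng(2)
    U = rng.standard_normal((20, 3))
    R = np.diag(rng.uniform(0.5, 2.0, 20))          # positive-definite noise covariance
    theta = np.array([1.0, -2.0, 0.5])

    Rinv = np.linalg.inv(R)
    alpha = theta @ U.T @ Rinv @ U @ theta
    M  = np.outer(theta, theta) / (1.0 + alpha)     # m.s.e. of the biased m.m.s.e. estimator
    Mp = np.linalg.inv(U.T @ Rinv @ U)              # m.s.e. (= covariance) of unbiased g.l.s.

    print("eigenvalues of M' - M:", np.linalg.eigvalsh(Mp - M))   # all positive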
5.4 EFFICIENCY

5.4.1 Cramér-Rao Bound

Besides establishing that an estimator converges, we may wish to measure its performance against some standard. A standard for estimation covariance is provided by the Cramér-Rao bound. The bound applies to any unbiased estimator θ̂(y) of a parameter vector θ using measurements y. For instance, y might comprise all the elements of y and U in the usual regression model; the bound is not restricted to any particular model or any particular estimator form. Some at least of the measurements are random variables (noise is present), so they are described by their joint probability density function p(y|θ), which is influenced by θ. Subject to some conditions on p(y|θ), discussed below, the covariance of θ̂(y) cannot be less than the Cramér-Rao bound F⁻¹, where

    F = E[(∂ ln p(y|θ)/∂θ)(∂ ln p(y|θ)/∂θ)']                                            (5.4.1)

Matrix F is called the Fisher information matrix. Without attempting a detailed interpretation of F, we can accept the name as associating lowest potential covariance F⁻¹ of an estimate with most information about it. It also seems reasonable that the information about θ is conveyed by its influence on the measurements through p(y|θ).

The Cramér-Rao inequality

    cov θ̂(y) ≥ F⁻¹                                                                      (5.4.2)

is proved by considering the covariance of the augmented vector

    ψ = [θ̂(y); ∂ ln p(y|θ)/∂θ]

First we find E[∂ ln p(y|θ)/∂θ]:

    E[∂ ln p(y|θ)/∂θ] = ∫ (∂ ln p(y|θ)/∂θ) p(y|θ) dy = ∫ ∂p(y|θ)/∂θ dy
                      = (∂/∂θ) ∫ p(y|θ) dy = ∂(1)/∂θ = 0                                 (5.4.3)

assuming that p(y|θ) is well enough behaved to allow reversal of the order of differentiation and integration. To be more specific, the regularity conditions on p(y|θ) are that the range of integration (over which p(y|θ) is non-zero) must not depend on θ, and the integral must converge in spite of the differentiation in its integrand. Next let us examine E[θ̂(y)(∂ ln p(y|θ)/∂θ)'] in the same fashion:

    E[θ̂(y)(∂ ln p(y|θ)/∂θ)'] = ∫ θ̂(y)(∂p(y|θ)/∂θ)' dy = (∂/∂θ) ∫ θ̂(y) p(y|θ) dy
                             = ∂ E[θ̂(y)]/∂θ = ∂θ/∂θ = I                                  (5.4.5)

since θ̂(y) is by assumption unbiased. Hence

    cov ψ = [cov θ̂(y)   I; I   F]                                                        (5.4.6)

Like any other covariance matrix, cov ψ is positive-semi-definite, so in particular

    [I   −F⁻¹] cov ψ [I   −F⁻¹]' = cov θ̂(y) − F⁻¹ ≥ 0

proving (5.4.2).

The Fisher information matrix relates easily to the second-derivative matrix of ln p(y|θ) with respect to θ, for

    ∂² ln p(y|θ)/∂θ_i∂θ_j = (∂/∂θ_i)[(1/p(y|θ)) ∂p(y|θ)/∂θ_j]
      = (1/p(y|θ)) ∂²p(y|θ)/∂θ_i∂θ_j − (1/p²(y|θ))(∂p(y|θ)/∂θ_i)(∂p(y|θ)/∂θ_j)            (5.4.8)

and if once more we can reverse the order of integration and differentiation,

    E[(1/p(y|θ)) ∂²p(y|θ)/∂θ_i∂θ_j] = ∫ ∂²p(y|θ)/∂θ_i∂θ_j dy = (∂²/∂θ_i∂θ_j) ∫ p(y|θ) dy = 0   (5.4.9)

giving

    [F]_ij = E[(∂ ln p(y|θ)/∂θ_i)(∂ ln p(y|θ)/∂θ_j)] = −E[∂² ln p(y|θ)/∂θ_i∂θ_j]          (5.4.10)

We shall re-encounter ln p(y|θ) when we cover maximum-likelihood estimation in Chapter 6.

5.4.2 Efficiency

An unbiased estimate is said to be efficient if its covariance equals the Cramér-Rao bound. We can define the efficiency of a scalar estimate as the Cramér-Rao bound divided by the estimation variance. The main practical significance of efficiency is in determining whether further efforts to devise a lower-covariance estimate would be futile because the present estimate is efficient or nearly efficient. Even so, efficiency is not always critical, as a far-from-efficient estimate may be the best practicable and, more to the point, may be acceptably accurate.

Investigation of efficiency may require some idealising assumption about the form of p(y|θ), as in the following example.

Example 5.4.1  We want to estimate parameter α of the probability density p(x|α) = exp(−x/α)/α, x ≥ 0, from N independent samples of x. We shall find the Cramér-Rao bound on the variance of any unbiased estimator of α, and then check whether the unbiased estimator α̂ = Σ_{t=1}^{N} x_t / N attains it.

Integrating xp(x|α) and x²p(x|α) by parts, we find the mean x̄ to be α and the m.s. value to be 2α². The samples of x have a joint probability density

    p(y|θ) = p(x_1, x_2, ..., x_N | α) = Π_{t=1}^{N} p(x_t|α) = α^{−N} exp(−Σ_{t=1}^{N} x_t / α)

for x_1 to x_N non-negative, so

    ln p(y|α) = −N ln α − Σ_{t=1}^{N} x_t / α

and

    ∂ ln p(y|α)/∂α = −N/α + Σ_{t=1}^{N} x_t / α²

From this we find, given the independence of the samples, that

    F = E[(−N/α + Σx_t/α²)²] = N²/α² − 2N E[Σx_t]/α³ + E[(Σx_t)²]/α⁴
      = N²/α² − 2N²/α² + (N² + N)/α² = N/α²

so the Cramér-Rao bound on the variance of α̂ is α²/N.

The mean of α̂ is α and the variance of α̂ is

    var α̂ = E[(Σx_t/N − α)²] = (1/N²) Σ var x_t = (2α² − α²)/N = α²/N

so α̂ does attain the Cramér-Rao bound; we couldn't do better with any other unbiased estimator, however ingenious.   △

We next examine an equally simple example in which things are not so straightforward.

Example 5.4.2  The gain g of a plant modelled by

    y_t = g u_t + e_t

is to be estimated from N independent pairs of measurements (u_t, y_t). The model error e_t is believed to be uniformly distributed over [−r, r]. What is the Cramér-Rao bound for g? Here

    p(y|θ) = (2r)^{−N}  if g u_t − r ≤ y_t ≤ g u_t + r for t = 1, 2, ..., N;  zero otherwise

The range over which p(y|θ) is non-zero clearly depends on θ (g here), so the regularity conditions are not all satisfied, and the Cramér-Rao bound is inapplicable. A moment's thought reveals that in principle the variance of ĝ can be made as small as you please by using large enough absolute values of the input samples u_t.   △
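A quick Monte Carlo check of Example 5.4.1 is straightforward; the sketch below (the values of α, N and the number of runs are illustrative assumptions) confirms that the sample-mean estimator of α is unbiased with variance close to the bound α²/N.

    import numpy as np

    # Monte Carlo check of the Cramer-Rao bound in Example 5.4.1.
    rng = np.random.default_rng(3)
    alpha, N, runs = 2.0, 50, 20000
    x = rng.exponential(alpha, size=(runs, N))
    alpha_hat = x.mean(axis=1)

    print("mean of estimates       :", alpha_hat.mean())      # close to alpha (unbiased)
    print("variance of estimates   :", alpha_hat.var())
    print("Cramer-Rao bound a^2/N  :", alpha**2 / N)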
Example 5.4.3  We shall test the efficiency of the Markov estimate introduced in Section 5.3.5. With the usual regression model, and considering U not to be random,

    p(y|θ) = prob(y = observed value | θ) = prob(e = y − Uθ | θ)

To get any further we have to assume some form for the p.d.f. of e. A popular assumption which makes the algebra easy and may bear some resemblance to the truth is that e is a zero-mean, Gaussian random variable with p.d.f.

    p(e|θ) = exp(−½ e'R⁻¹e)/√((2π)^N |R|)

where R is cov e as usual. Putting y − Uθ for e,

    ln p(y|θ) = −½(y − Uθ)'R⁻¹(y − Uθ) + const independent of θ

so, with observed y and U inserted,

    ∂ ln p(y|θ)/∂θ = U'R⁻¹(y − Uθ)

and then using (5.4.11), we obtain the Fisher information matrix

    F = E[U'R⁻¹(y − Uθ)(y − Uθ)'R⁻¹U] = U'R⁻¹U

so the Cramér-Rao bound is [U'R⁻¹U]⁻¹. The Markov estimate θ̂′ given by (5.3.25),

    θ̂′ = [U'R⁻¹U]⁻¹ U'R⁻¹y

has covariance [U'R⁻¹U]⁻¹ as we saw in Section 5.3.5, so θ̂′ has covariance equal to the Cramér-Rao bound F⁻¹ and is efficient.   △

FURTHER READING

Wadsworth and Bryan (1974) and Helstrom (1984), among many other books, give the basic material on probability in detail and are generous with examples and problems. Whittle (1970) is more advanced but very concise and readable, and discusses the convergence of random sequences. This topic is also introduced by Papoulis (1965), who provides a wide background in stochastic processes and looks at least-squares estimation in a stochastic setting. Silvey (1975) covers unbiased estimation and the main topics of Chapter 6, and is beautifully concise. A good selection of the estimation theory we need is summarised by Goodwin and Payne (1977), and the appendices of that book contain several standard results we shall find useful later.

REFERENCES

Fenner, R. T. (1974). "Computing for Engineers". Macmillan, London.
Goldstein, M., and Smith, A. F. M. (1974). Ridge-type estimators for regression analysis. J. Roy. Stat. Soc. B 36, 284-291.
Goodwin, G. C., and Payne, R. L. (1977). "Dynamic System Identification: Experiment Design and Data Analysis". Academic Press, New York and London.
Helstrom, C. W. (1984). "Probability and Stochastic Processes for Engineers". Macmillan, New York.
Hoerl, A. E., and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55-68.
Johnston, J. (1972). "Econometric Methods", 2nd ed. McGraw-Hill, New York.
Kendall, M. G., and Stuart, A. (1979). "The Advanced Theory of Statistics", 4th ed., Vol. 2. Griffin, London.
Melsa, J. L., and Sage, A. P. (1973). "An Introduction to Probability and Stochastic Processes". Prentice-Hall, Englewood Cliffs, New Jersey.
Papoulis, A. (1965). "Probability, Random Variables, and Stochastic Processes". McGraw-Hill Kogakusha, New York and Tokyo.
Silvey, S. D. (1975). "Statistical Inference". Chapman & Hall, London.
Wadsworth, G. P., and Bryan, J. G. (1974). "Applications of Probability and Random Variables", 2nd ed. McGraw-Hill, New York.
Whittle, P. (1970). "Probability". Penguin, Harmondsworth, England.
Wilks, S. (1962). "Mathematical Statistics". Wiley, New York.

PROBLEMS

5.1  Show that, in the notation of Example 5.1.2, the squared sample mean is a biased estimator of the squared true mean, with bias s/N. If two independent batches of N samples of v have sample means v̄⁽¹⁾ and v̄⁽²⁾, is v̄⁽¹⁾v̄⁽²⁾ an unbiased estimator of the squared true mean? Show that (1/N)Σ_{t=1}^{N} v_t(v_t − v̄⁽²⁾) is an unbiased estimator of s, where v̄⁽²⁾ is based on N samples independent of v_1 to v_N. Does it matter whether v_1 to v_N are mutually independent in this estimator?

5.2  Find the variance of the estimator in Problem 5.1.

5.3  If, in the "errors in variables" situation described in Section 5.2.3, θ is a vector but only one regressor is affected by noise, does that noise cause bias in all of θ̂, only the element multiplying the affected regressor, or none of θ̂? Would your answer change if θ were estimated by w.l.s. with any positive-definite weighting matrix?

5.4  In Problem 4.5, I − U[U'U]⁻¹U' was found to be idempotent. Verify that, because it is idempotent and symmetric, it is positive-semi-definite. With A a matrix such that AU = I, write AA' − [U'U]⁻¹ as a symmetric expression in A and I − U[U'U]⁻¹U', and hence show that it is positive-semi-definite. [Section 5.3.4 brings out the relevance of this problem to the minimum-covariance property of o.l.s.]
5.5  Assuming that the input sequence {u} in Example 5.3.4 is white, show (by expressing r_yy(1) in terms of r_yv(0) and r_uu(0), and then r_yv(0) in terms of r_vv(0) and r_uu(0)) that r_yy(1) is not generally zero. Hence find the probability limit of N cov θ̂_z in part (ii) of Example 5.3.5 in terms of r_uu(0), r_vv(0), a and b.

5.6  Verify that in part (iii) of Example 5.3.5, R_zu and R_zz are as claimed there.

5.7  Is u_{t-1} a suitable instrumental variable to replace y_{t-1} in the model in Example 5.3.4? Specifically, is it uncorrelated with e_t but strongly correlated with y_{t-1}?

5.8  Is y_{t-1} − v_{t-1} an acceptable instrumental variable to replace y_{t-1} in Example 5.3.4?
Chapter 6

Optimal Estimation, Bayes and Maximum-Likelihood Estimators

6.1 INTRODUCTION

A large number of identification methods will be described in Chapter 7, yet they represent only a fraction of the methods available. We must somehow classify and compare the throng of competing methods, whether we want a good technique for one application or a broad perspective on the whole field. A framework which will accommodate many identification methods is set up in this chapter; the chapters which follow go on to see how the methods are implemented.

It is convenient to categorise methods initially according to what measure of estimation goodness they try to optimise. Model structure and computational tactics can be considered later. Our basis for categorising will be Bayes estimation, which is easy to appreciate, almost all-embracing and appealing to common sense. Other frameworks are possible, and one in particular, the prediction-error formulation (discussed briefly in Chapter 7), is a very useful basis for analysis of asymptotic properties of parameter estimators. Least-squares methods for models linear in their parameters will fit into either framework, and will be the main object of our attention in later chapters.

We shall be asking of each method "Is it simple and computationally cheap?" and "Are its assumptions realistic?". The answers will often be "no", and will lead us to simplify the methods and be wary of their results.

6.2 BAYESIAN APPROACH TO OPTIMAL ESTIMATION

The least-squares estimators we have concentrated on were motivated by the simple idea of fitting model output to observed output as closely as possible. They proved to have attractive statistical properties under suitable assumptions, including optimality in the sense of minimum covariance among all linear, unbiased estimators. The following questions about them remain unanswered, however:

(1) Can we improve on them by using some other type of estimator?
(2) What estimator is best if the criterion is something other than minimum output m.s. error or parameter-estimate covariance?
(3) How can we bring in prior information on the most likely values for the parameters?
(4) Regarding the parameters as random variables, can we estimate their joint p.d.f. rather than just their means?

The aim of Sections 6.2 and 6.3 is to answer these questions. We start by setting in a broader context the definition of the best estimator.

6.2.1 Optimality: Loss Functions and Risk

Think for the moment of a scalar parameter θ. We could express how seriously we take estimation errors of different sizes by nominating a scalar loss function L(θ̂, θ), larger for a worse error. Some possible loss functions are (θ̂ − θ)², which we have already met; |θ̂ − θ|, which gives less weight than (θ̂ − θ)² to large errors; ((θ̂ − θ)/θ)², which implies that proportional rather than absolute error is important; max|θ̂ − θ|, a pessimist's choice, which weighs only the worst error; 0 for |θ̂ − θ| ≤ α and 1 for |θ̂ − θ| > α, which indicates indifference to errors up to α and equal dislike of all larger errors, i.e. classifies each error as "serious" or "not serious"; or (θ̂ − θ)² for |θ̂ − θ| ≤ α and 2α|θ̂ − θ| − α² for |θ̂ − θ| > α, a compromise between (θ̂ − θ)² and |θ̂ − θ|. Evidently choosing a loss function is a subjective matter, and depends on how bad the consequences of an error of any given size are perceived to be. Sometimes the ultimate application of the model makes the choice easy, but more often not.

Example 6.2.1  A river-flow predictor based on an estimated model of the catchment dynamics and rainfall measurements is required to give warning of any flow likely to overtop a coffer dam protecting civil engineering works. If the overtopping flow is f_0 and the predicted flow f̂, a prediction error is fairly important when f̂ > f_0 and f < f_0, as it will precipitate an unnecessary and expensive halt and evacuation, extremely important when f > f_0 and f̂ < f_0, as everything and everybody will get wet, and unimportant in all other cases. The simplest loss function reflecting this situation is 0 when (f − f_0)(f̂ − f_0) > 0, a when f < f_0 and f̂ > f_0, and b when f > f_0 and f̂ < f_0, with b > a > 0. Refinements are possible, and the practicability of designing an estimator with this loss function is an open question.   △

Once a loss function is chosen we can begin to design an estimator to minimise the scalar risk

    r(θ) = E L(θ̂, θ)                                                   (6.2.1)

defined as the average loss over all possible realisations Y of the measured output and explanatory-variable values (y and U together, in regression). The minimum-covariance estimator, for instance, minimises the risk with (c'(θ̂ − θ))² as L(θ̂, θ), whatever the real, non-zero value of c. At this point, we can open up an entirely new possibility by extending the aim of the estimator to minimising the average risk r̄ over all possible values of θ:

    r̄ = E r(θ)                                                         (6.2.2)

The crucial importance of this extension is that it makes use of a prior p.d.f. p(θ) embodying all the available background knowledge about the likely parameter values. The need to provide a prior p.d.f. for the parameters characterises Bayes estimation and, as we shall see presently, is responsible for both its power and its practical weakness.

Example 6.2.2  In compartmental models of drug metabolism in the body, each rate constant for transfer of a drug between compartments is non-negative by definition. An experienced investigator may be able to quote maximum credible values for each. In the absence of any further information, each rate constant k_ij can be assigned a uniform prior probability density 1/a_ij over the range zero to its maximum credible value a_ij. The rate constants do in fact vary from subject to subject, so it is reasonable to treat them as random variables.   △

6.2.2 Posterior Probability Density of Parameters; Bayes' Rule

The easiest way to find the minimum-risk estimator which minimises r̄ is to think of Y as fixed and determine the estimator θ̂(Y) which minimises the average loss over all possible θ for that Y. If the estimator minimises E L(θ̂(Y), θ) for every realisation Y, it minimises the average risk over all θ and Y. With Y fixed, averaging over θ requires use of the posterior p.d.f. p(θ|Y) of θ given Y:

    E[L(θ̂(Y), θ) | Y] = ∫ L(θ̂(Y), θ) p(θ|Y) dθ                          (6.2.3)

Subsequent averaging over Y would produce r̄:
    E_Y{E[L(θ̂(Y), θ) | Y]} = ∫ {∫ L(θ̂(Y), θ) p(θ|Y) dθ} p(Y) dY
                           = ∫∫ L(θ̂(Y), θ) p(θ, Y) dθ dY
                           = ∫∫ L(θ̂(Y), θ) p(Y|θ) p(θ) dY dθ
                           = E_θ{E[L(θ̂(Y), θ) | θ]} = r̄                  (6.2.4)

Buried in (6.2.4) is the relation between the posterior p.d.f. p(θ|Y) and the prior p.d.f. p(θ) given by

    Bayes' rule:   p(θ|Y) = p(Y|θ) p(θ) / p(Y)                           (6.2.5)

Before seeing in detail how Bayes' rule is employed in minimum-risk estimation, let us pause to weigh up the idea of finding p(θ|Y).

The most we could ask of any estimation method is that it should find the entire p.d.f. of the parameters, given the measurements. The p.d.f. says much more about the parameters than a point estimate θ̂ and its covariance could. Figure 6.2.1 exemplifies p(θ|Y) for a scalar parameter. It indicates that in this instance too little is yet known to locate θ confidently, values over a considerable range being estimated as about equally likely. We should want to refine p(θ|Y) by adding more measurements to Y. Nonetheless, it is already clear that θ is unlikely to be negative. Notice the danger of relying on a point estimate. The "most likely" value of θ might reasonably be taken as θ_1, at the global maximum of p(θ|Y), but a good case could be argued for the notably different centroid θ_2 and for other measures of the middle of p(θ|Y). An associated variance estimate, even if accurate, would fail to warn adequately of the uncertainty in θ, as we might be quite happy with that variance if the p.d.f. were unimodal.

Fig. 6.2.1  Posterior probability density function of a parameter, showing the peak at θ_1, the centroid θ_2 and a range of almost equi-probable values.

Granted that p(θ|Y) is highly desirable in itself and will also take us along the road to a minimum-risk estimate, how is it computed?

6.2.3 Bayes Estimation: Details

Once the measurements Y have been taken, p(Y) in Bayes' rule (6.2.5) is just a number, and serves only to scale p(θ|Y) so that its integral over θ is unity. According to what estimator is used, p(Y) may or may not have to be computed. When it must, it is found by integrating p(Y|θ)p(θ), i.e. p(θ, Y), over all θ. The prior density p(θ) is provided by the user from background knowledge or guesswork, and p(Y|θ) comes from the model relating θ to Y, together with the p.d.f. of each random variable influencing Y.

Example 6.2.3  A plant is modelled by

    y_t = g u_{t-1} + e_t,      t = 1, 2, ..., N

and its gain g is to be estimated from N independent measurements y_1 to y_N. Each noise sample e_t is believed to be a Gaussian random variable with mean zero and variance σ². The input sequence u_0 to u_{N-1} is known exactly, and so can be treated as deterministic, leaving only y_1 to y_N as Y. Thus with g as θ, p(Y|θ) is

    p(y_1, ..., y_N | g) = Π_{t=1}^{N} p_e(y_t − g u_{t-1})
                        = (2πσ²)^{−N/2} exp(−Σ_{t=1}^{N}(y_t − g u_{t-1})²/(2σ²))

where each p.d.f. refers to the random variable indicated by its subscript. Before the measurements are made, g is known on physical grounds to lie between 1 and 5, but that is all, so we assign to g the uniform p.d.f.

    p(g) = 1/4,      1 ≤ g ≤ 5

Inserting the known values of u and measurements of y, we can compute

    p(g|Y) = p(Y|g)p(g)/p(Y) = (1/4)(2πσ²)^{−N/2} exp(−Σ_{t=1}^{N}(y_t − g u_{t-1})²/(2σ²)) / p(Y),      1 ≤ g ≤ 5

We could calculate p(Y) by integrating p(Y|g)p(g) over all g, but it is unnecessary as p(Y) does not affect the shape of p(g|Y). The shape is easily seen (Fig. 6.2.2) to be a Gaussian p.d.f., with peak at ĝ = Σ_{t=1}^{N} y_t u_{t-1} / Σ_{t=1}^{N} u_{t-1}² and variance σ²/Σ_{t=1}^{N} u_{t-1}², chopped off at g = 1 and g = 5.   △

Fig. 6.2.2  Probability density functions p(g), p(Y|g) and p(g|Y) for Example 6.2.3.

One of the most valuable features of Bayes estimation is its aptness for estimation in steps, bringing in new measurements at each. Chapter 7 covers stepwise estimation, but we should note here that it is necessary whenever measurements are received and must be processed in real time, and that it is a convenient way to estimate a time-varying model, even off-line. Bayes estimation in steps involves using the posterior p.d.f. from each step as the prior p.d.f. for the next, after allowing, usually straightforwardly, for any dynamics of the parameters themselves if the parameters are time-varying.

Example 6.2.4  The measurements in Example 6.2.3 could have been processed one at a time. As g is assumed constant, it is not necessary to update ĝ at each time to account for its evolution; we need only bring in the information conveyed by the new measurement. On receiving y_k, we update p(g | y_1, ..., y_{k-1}) to

    p(g | y_1, ..., y_k) = p(y_k | g) p(g | y_1, ..., y_{k-1}) / p(y_k)

The effect is to adjust the location of the peak of the posterior p.d.f. and sharpen it, reducing its variance (discounting the truncation at g = 1 and g = 5) from σ²/Σ_{t=1}^{k-1} u_{t-1}² to σ²/Σ_{t=1}^{k} u_{t-1}².   △

Having recognised the attractions of Bayes estimation, let us pass on to its drawbacks. They result from its membership of the luxury class, providing everything you could want but at a high price in information and computation. The need to provide a prior p.d.f. is hardest to meet; indeed, some statisticians find themselves unable to do so with a clear conscience, because it is subjective. Before you dismiss these scruples, consider an example. In Example 5.4.1 we estimated the parameter α of the p.d.f. p(x) = exp(−x/α)/α. If we had opted for Bayes estimation, but knew in advance only that α lay between 1 and 2, we might well have taken the prior p(α) as uniform at 1 from 1 to 2. If alternatively we had written the model as p(x) = β exp(−βx) with β for 1/α, then knowing only that β was between ½ and 1 we should have taken a uniform p.d.f. with p(β) equal to 2 over that range. However, this is equivalent to a prior p.d.f. p(α) = p(β)|dβ/dα| = 2/α². The paradox would not arise if we had enough previous experience of parameter values in similar cases to guess their relative frequencies of occurrence, in other words, if we had an empirical prior p.d.f. Rarely is this so. Instead we have to interpret the prior p.d.f. as stating degrees of belief, however shakily founded, in each possible parameter value.

A further factor is that often the information conveyed by the measurements far outweighs that contained in the prior p.d.f., so the final parameter estimate is not very sensitive to the prior p.d.f. A fair question is then "Why use a prior p.d.f. at all, if it has an insignificant final effect?". The answer is that Bayes estimation is still an appealing conceptual framework even if its use of a prior p.d.f. is not, in the upshot, numerically significant.

The other fundamental drawback of Bayes estimation is the amount of work entailed in refining the posterior p.d.f. and then extracting the minimum-risk estimate. A short-cut procedure which forms or updates the estimate without computing the entire posterior p.d.f. is more likely to be acceptable, especially when many parameters must be estimated at once.

For these reasons full-blown Bayes estimators are seldom implemented (Moore and Jones, 1978), but the Bayes framework is often helpful in interpreting other algorithms. An important special case is that of Gaussian prior and posterior p.d.f.'s, which are completely defined by their means and variances (or covariance matrices, for vector r.v.'s). We shall see in Chapter 7 that several recursive estimators operate by updating a parameter estimate and its estimated covariance each time a new measurement is made. For a Gaussian-distributed estimate, the updating can be viewed as computing the posterior p.d.f., via its mean and covariance, from a prior p.d.f. also described by its mean (the old estimate) and covariance.

The next section examines some minimum-risk estimators.
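The posterior computation of Examples 6.2.3 and 6.2.4 can be sketched numerically by evaluating p(g|Y) on a grid and applying Bayes' rule one measurement at a time. The Python/NumPy program below is illustrative only; the simulated gain, noise variance and input values are assumptions made for the sketch.

    import numpy as np

    # Grid-based posterior for the gain g in Examples 6.2.3 / 6.2.4.
    rng = np.random.default_rng(4)
    g_true, sigma, N = 2.2, 2.0, 10
    u = rng.uniform(0.5, 1.5, N)                      # u_0 ... u_{N-1}, known exactly
    y = g_true * u + sigma * rng.standard_normal(N)   # y_t = g u_{t-1} + e_t

    g = np.linspace(1.0, 5.0, 2001)                   # grid over the prior support [1, 5]
    dg = g[1] - g[0]
    posterior = np.ones_like(g) / 4.0                 # uniform prior p(g) = 1/4
    for t in range(N):                                # Bayes' rule, one y_t at a time
        posterior *= np.exp(-(y[t] - g * u[t])**2 / (2 * sigma**2))
        posterior /= posterior.sum() * dg             # rescale so the p.d.f. integrates to 1

    print("peak of posterior :", g[np.argmax(posterior)])
    print("least-squares peak:", np.sum(y * u) / np.sum(u * u))   # as in Example 6.2.3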
6.3 MINIMUM-RISK ESTIMATORS

6.3.1 Minimum Quadratic Cost

The estimate θ̂ minimising the expected value of (θ̂ − θ)² over all possible θ, given measurements Y, is found from

    ∫ (θ̂ − θ) p(θ|Y) dθ = 0                                            (6.3.1)

giving

    θ̂ = ∫ θ p(θ|Y) dθ = E[θ|Y]                                          (6.3.2)

A minimum is found by (6.3.2) since the second derivative equals 2, which is positive. Equation (6.3.2) says that the minimum-quadratic-cost estimator is the conditional (posterior) mean.

Example 6.3.1  The posterior p.d.f. of the gain g in Example 6.2.3 was Gaussian, but truncated at g = 1 and 5. In one experiment, the input values, measurements and noise variance put the peak at g = 2.2 and make the variance before truncation 4. A table of the cumulative Gaussian distribution allows us to calculate that, for the area under the truncated p.d.f. to be 1, the p.d.f. between g = 1 and 5 must be 3.101 times the Gaussian p.d.f. with mean 2.2 and variance 4. Numerical integration of g p(g|Y), using a table of the Gaussian p.d.f., then gives the conditional mean as E[g|Y] ≈ 2.73.   △

For a vector θ, the minimum-quadratic-cost estimator is found in much the same way. For any weighting matrix W,

    (∂/∂θ̂) ∫ (θ̂ − θ)'W(θ̂ − θ) p(θ|Y) dθ = 2W ∫ (θ̂ − θ) p(θ|Y) dθ = 2W(θ̂ − E[θ|Y]) = 0    (6.3.3)

The second-derivative matrix is 2W, so if the minimum is to be unique, W must be positive-definite, and to satisfy (6.3.3), θ̂ must equal E[θ|Y]: again the estimator is the conditional mean.

We should pause here to note the distinction between the minimum-quadratic-cost estimate and the minimum-covariance estimate in Section 5.3.2, which also minimised the expected value of a quadratic cost. There the m.s. error of any linear function c'θ̂ was minimised, i.e.

    c' cov θ̂ c = c'E[(θ̂ − θ)(θ̂ − θ)']c = E[(c'(θ̂ − θ))²] = E[(θ̂ − θ)'cc'(θ̂ − θ)]           (6.3.4)

which is the expected quadratic cost with cc' for W. The strong similarity of the two estimators is only superficial. The averaging in (6.3.4) is over measurement realisations with θ fixed, but in (6.3.3) it is over realisations of θ with the measurements fixed. In (6.3.4) the random variable is θ̂; in (6.3.3) it is θ. Although the two estimates might coincide in particular cases, they are not the same in general.

The conditional mean has been shown (Sherman, 1958; Deutsch, 1965) to be optimal for a broader class of loss functions than quadratic cost. The conditional mean is the minimum-risk estimator provided that

(i) the loss function (of θ̂ − θ only) is symmetrical about zero error and monotonically non-decreasing each side of zero error;
(ii) the posterior p.d.f. is symmetrical about the mean; and
(iii) either the loss function is also convex or the cumulative distribution function is also convex below the mean.

(A convex function f(x) satisfies f(λx_1 + (1 − λ)x_2) ≤ λf(x_1) + (1 − λ)f(x_2) for all x_1 and x_2 and any 0 ≤ λ ≤ 1. That is, the section of f(x) between any two points on f(x) lies entirely under or on the straight line joining them.) Of the list of loss functions in Section 6.2.1, the first two are convex and symmetrical about zero error, but not the fifth, which is not convex. Asymmetrical p.d.f.'s abound, but many of them are not skewed enough to make the conditional mean a bad estimator. Convexity of the cumulative distribution function is destroyed if the p.d.f. increases at any point as you move away from the mean; such behaviour may well result from the presence of two or more sources of estimation error with different means, or from a conflict between the prior p.d.f. and the evidence in the measurements. However, the cumulative distribution function need not be convex nor the p.d.f. symmetrical for the conditional mean to be the minimum mean-square-error estimate (Deutsch, 1965).
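The conditional-mean calculation of Example 6.3.1 is easy to reproduce by crude numerical integration, as in the Python/NumPy sketch below (illustrative only; the grid size is an arbitrary choice). It also returns the conditional median, anticipating Example 6.3.2.

    import numpy as np

    # Truncated-Gaussian posterior of Example 6.3.1: peak 2.2, variance 4, support [1, 5].
    g = np.linspace(1.0, 5.0, 100001)
    dg = g[1] - g[0]
    p = np.exp(-(g - 2.2)**2 / (2 * 4.0))        # unnormalised truncated Gaussian
    p /= p.sum() * dg                            # normalise so it integrates to one

    cond_mean = np.sum(g * p) * dg               # E[g|Y]
    cdf = np.cumsum(p) * dg
    cond_median = g[np.searchsorted(cdf, 0.5)]   # cuts the area under p(g|Y) in half
    print("conditional mean  :", cond_mean)      # compare with Example 6.3.1
    print("conditional median:", cond_median)    # compare with Example 6.3.2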
6.3.2 Minimum Expected Error

For a vector θ, absolute error does not lend itself to forming a single scalar risk function. Instead, we can minimise the risks E[|θ̂_i − θ_i| | Y] for all the elements of θ at once, by requiring their derivatives with respect to θ̂ to be zero:

    (∂/∂θ̂_i) ∫ |θ̂_i − θ_i| p(θ|Y) dθ = ∫_{θ_i < θ̂_i} p(θ|Y) dθ − ∫_{θ_i > θ̂_i} p(θ|Y) dθ = 0,      i = 1, 2, ..., p      (6.3.5)

In (6.3.5), the contribution to the derivative due to variation of θ̂_i in the integration limits is zero since |θ̂_i − θ_i| is zero at that point. The integrals in the last expression of (6.3.5) are the cumulative probabilities of θ_i being respectively below and above θ̂_i. To satisfy (6.3.5) they must be equal, so we conclude that θ̂_i must be the median of the marginal posterior p.d.f. of θ_i; the minimum-expected-absolute-error estimator is the conditional (posterior) median.

Here the normalising constant p(Y) in Bayes' rule need not be calculated, whereas it must be to get the conditional mean.

Example 6.3.2  Refer again to Example 6.2.3, with numbers as in Example 6.3.1. The minimum-expected-absolute-error estimate of g cuts the area under the posterior p.d.f. in half. From a table of the Gaussian distribution, we find that 0.2743 of the area under the untruncated p.d.f. is chopped off at g = 1 (0.6σ below the peak) and 0.0808 at g = 5 (1.4σ above the peak), leaving 0.6449. The proportion of the area under the untruncated p.d.f. below ĝ is therefore 0.2743 + ½(0.6449) = 0.5968. The table then gives ĝ as 0.245σ above the peak, so ĝ = 2.2 + 0.245(2.0) = 2.69. This conditional-median estimate differs from the conditional mean found in Example 6.3.1 since p(g|Y) is asymmetrical, but the difference is quite small.   △

6.3.3 Minimax Estimator

The most pessimistic choice is the minimax estimator, which minimises the expected maximum possible error. The idea makes sense only with regard to a scalar parameter, unless we are prepared to measure the error in a vector parameter by some additional cost function. Leaving aside such complications for the moment, an instant's thought shows that the minimax θ̂ is the midpoint of the range of possible values of θ as indicated by the extremities of p(θ|Y).

Minimax estimation fits into a Bayes context by virtue of the fact that it minimises the loss function lim (θ̂ − θ)^{2n} as n tends to infinity, weighting extreme values infinitely more heavily than all others. Bayes estimation does not, however, seem a natural context, since we need not compute the whole of p(θ|Y) if we are interested only in its endpoints. Furthermore, the most obvious reason for restricting attention to the ends of the range of θ would be that the range was the only convincing information, and too little was known to determine p(Y|θ) and p(θ).

Example 6.3.3  If, in the problem of Example 6.2.3, we did not know the noise p.d.f. but knew only that the noise in each observation was between −5 and 5, we could still establish the range of possible values of θ, i.e. g, as follows. Since

    p(Y|θ) = p(y_1, y_2, ..., y_N | g) = Π_{t=1}^{N} p_e(y_t − g u_{t-1})

we know that p(Y|θ) is zero outside the range

    max_{1≤t≤N} {(y_t − 5)/u_{t-1}} ≤ g ≤ min_{1≤t≤N} {(y_t + 5)/u_{t-1}}

which implies that p(θ|Y) is also confined to that range. As in Example 6.2.3, our prior information on g is that 1 ≤ g ≤ 5, so p(θ) is zero for θ < 1 and θ > 5. We conclude from Bayes' rule that p(θ|Y) is zero below max(g_L, 1) and above min(g_U, 5), where g_L and g_U are the two extremes just found. These values define the possible range for g, and the minimax estimate of g is their mean.   △

The problem presented by Example 6.3.3 is to identify the range of parameter values consistent with given prior bounds and with measurements containing noise described only by bounds. Only relatively recently has this problem, estimation based on a bare minimum of statistical information, received much attention in the engineering literature (Schweppe, 1973; Fogel and Huang, 1982). Unlike the more general minimax problem, it generalises readily to cover a vector parameter, becoming the problem of identifying the region in parameter space consistent with a prior region and with the bounded-noise measurements. We consider this further in Section 8.6.
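The bound calculation of Example 6.3.3 is a one-line computation once the data are available, as the Python/NumPy sketch below shows. The simulated gain, inputs and noise bound are illustrative assumptions; positive inputs are assumed so the bounds keep the orientation used in the example.

    import numpy as np

    # Range of g consistent with |e_t| <= 5, prior 1 <= g <= 5 and y_t = g u_{t-1} + e_t.
    rng = np.random.default_rng(5)
    g_true, N = 2.5, 20
    u = rng.uniform(1.0, 3.0, N)                  # positive inputs (assumption)
    y = g_true * u + rng.uniform(-5.0, 5.0, N)

    g_low  = max(np.max((y - 5.0) / u), 1.0)      # max_t (y_t - 5)/u_{t-1}, then the prior bound
    g_high = min(np.min((y + 5.0) / u), 5.0)      # min_t (y_t + 5)/u_{t-1}, then the prior bound
    print("possible range for g:", (g_low, g_high))
    print("minimax estimate    :", 0.5 * (g_low + g_high))   # midpoint of the range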
6.3.4 "Most Likely" Estimate

The simplest idea of all is to take as θ̂ the value giving the largest value of p(θ|Y): the conditional (posterior) mode or maximum a posteriori estimator. Figure 6.2.1 shows that this is not always a good idea. The location of the mode may give an incomplete or misleading impression when the posterior p.d.f. is strongly skewed or has two or more peaks of not very different heights. To some extent the same criticism can be made of any other point estimate, as the information in the p.d.f. cannot always be adequately summarised by a single θ̂. Even so, unless the peak is very high it seems wise to choose an estimator which is at the middle, in some defined sense, of the p.d.f.

The conditional-mode estimator is the limit, as α tends to zero, of a minimum-risk estimator with loss function zero for θ̂ within a distance α of θ and unity everywhere else.

Example 6.3.4  The conditional-mode estimate based on the p.d.f. of Fig. 6.2.2 is ĝ = 2.2, quite a way from the conditional-mean and conditional-median estimates. Had the peak been sharper or the truncation more nearly symmetrical, the three estimates would have differed less.   △

6.3.5 Bayes Estimation with Gaussian Probability Density Function

A Gaussian posterior p.d.f. simplifies analysis greatly, and often approximates the truth well enough. For parameter θ with p elements, a Gaussian posterior p.d.f. is

    p(θ|Y) = ((2π)^p |R|)^{−1/2} exp(−½(θ − θ̄)'R⁻¹(θ − θ̄))               (6.3.6)

where R⁻¹ is a positive-definite matrix, and θ̄ and R depend on the prior p.d.f. and Y.

The conditional mean, median and mode can be found for this p.d.f. by a transformation of variables. Since R⁻¹ is positive-definite, it can be factorised into QQ' where Q is a square non-singular matrix. If we define φ as Q'(θ − θ̄), we obtain

    p(θ|Y) = ((2π)^p |R|)^{−1/2} exp(−½ φ'φ)                              (6.3.7)

The exponential is at its maximum, maximising p(θ|Y), where φ is zero and

    θ = θ̄                                                               (6.3.8)

The conditional mode is thus at θ̄, and because p(θ|Y) is totally symmetrical about θ = θ̄, the conditional mean and median also occur at θ̄. We conclude that θ̄ is simultaneously the "most likely", minimum-quadratic-cost and minimum-expected-absolute-error estimate of θ.

The coincidence of these three optimal estimates is an incentive to assume, or even pretend, that the p.d.f. is Gaussian. For the estimates to coincide, the posterior p.d.f. need only be symmetrical about its peak. The Gaussian p.d.f. is, however, the most popular to assume for other reasons as well, including its significance in maximum-likelihood estimation (Section 6.4.2) and its relative ease of analysis.

6.4 MAXIMUM-LIKELIHOOD ESTIMATION

The reliance of Bayes estimation on a prior p.d.f. is both its principal strength and its most worrying aspect. Maximum-likelihood (m.l.) estimation forgoes the strength but avoids the worry.

6.4.1 Conditional Maximum-Likelihood Estimator

The joint p.d.f. p(Y|θ) of the measurements is determined, as before, by the model structure together with the p.d.f.'s of the noise and of the inputs if they are stochastic. Once the measurements have been made and numbers can be substituted for Y, p(Y|θ) is a function of the unknown parameters θ only. The maximum-likelihood (m.l.) estimate of θ is the value which maximises p(Y|θ). That is, once Y is known, p(Y|θ) is taken to indicate the likelihood of θ. It may help if we imagine θ being stepped in very small increments over a wide range, and a fixed large number of sets of measurements being made at each θ. If we then examine only those results where the measurements are very close to a particular set of values Y, we shall find more generated by θ values such that p(Y|θ) is high than by values with p(Y|θ) low.

Computation of an m.l. estimate is simple in principle. Given the model form and numerical Y, we write down p(Y|θ) and find its global maximum. In practice, p(Y|θ) is usually a complicated function of θ, and any trick which simplifies its maximisation is welcome. If the measurement set Y can be arranged to consist of a number of much smaller independent sets Y_1 to Y_N, then p(Y|θ) is Π p(Y_t|θ). An effective trick to make this product easier to maximise is to take logs, giving Σ log p(Y_t|θ) as log p(Y|θ). Since log is a monotonically increasing function of positive values of its argument, the θ which maximises the log-likelihood function log p(Y|θ) also maximises p(Y|θ).
Example 6.4.1  A known input sequence {u} is applied to a system with known unit-pulse response {h}, but unknown constant output disturbance μ, and N samples of the output y are observed. The noise {e} is zero-mean and has p.d.f. p_e(e) = exp(−|e|/α)/2α with α positive but unknown. The noise samples affecting successive output observations may be assumed independent. We wish to find m.l. estimates of μ and α.

From the model

    y_t = h_1 u_{t-1} + h_2 u_{t-2} + ... + μ + e_t

we can calculate at each observation instant t an effective measurement

    y_t' = y_t − h_1 u_{t-1} − h_2 u_{t-2} − ... = μ + e_t

whose p.d.f., given μ and α, is just the p.d.f. of e_t, so the log-likelihood function is

    L(θ) = log p(Y|θ) = −Σ_{t=1}^{N} |y_t' − μ|/α − N(log 2 + log α)

The values μ̂ and α̂ maximising L(θ) are found by examining

    ∂L/∂α = Σ_{t=1}^{N} |y_t' − μ|/α² − N/α,      ∂L/∂μ = (1/α) Σ_{t=1}^{N} sgn(y_t' − μ)

Setting ∂L/∂α to zero gives α̂ as Σ|y_t' − μ̂|/N. As ∂L/∂μ is discontinuous at each y_t', it requires a little more thought. If N is even, any value of μ̂ between the (N/2)th smallest y_t' and the next larger makes ∂L/∂μ zero; L is a piecewise-linear continuous function of μ with a flat top, for any given α. If N is odd, μ̂ must coincide with the middle-ranking y_t' as, although ∂L/∂μ is undefined at that value, a small change in μ in either direction reduces L. For completeness, the whole shape of L(μ, α) about μ̂ and α̂ should be checked to verify that a maximum has been found.   △

Even an example as simple as Example 6.4.1 brings out the need for care in maximising the log-likelihood function, particularly where a discontinuity or local maximum may exist. Before enquiring into the statistical properties of m.l. estimates, we look at two special cases in the next two sections.

6.4.2 Maximum-Likelihood Estimator with Gaussian Measurements and Linear Model

Let us return to the regression model

    y = Uθ + e                                                         (6.4.1)

with U deterministic and e zero-mean and of known covariance R. If e is assumed Gaussian, as is often reasonable, and it has N elements, its p.d.f. is

    p_e(e) = ((2π)^N |R|)^{−1/2} exp(−½ e'R⁻¹e)                          (6.4.2)

and the log-likelihood function for θ is

    L(θ) = ln p_e(y − Uθ | θ) = −½ ln((2π)^N |R|) − ½(y − Uθ)'R⁻¹(y − Uθ)   (6.4.3)

With R independent of θ, we can maximise L(θ) by minimising (y − Uθ)'R⁻¹(y − Uθ). From Section 5.3.4, the minimising θ is the Markov estimate

    θ̂ = [U'R⁻¹U]⁻¹ U'R⁻¹y                                               (6.4.4)

Hence the m.l. estimator for a model linear in the parameters and with Gaussian additive noise is identical to the Markov estimator and shares its properties of zero bias and minimum covariance among all linear, unbiased estimators. For this reason, the Markov estimation algorithms described in Chapter 7 often go under the name of maximum-likelihood, but the name is accurate only if p(y|θ) is Gaussian.

6.4.3 Unconditional Maximum-Likelihood Estimator

The conditional-mode Bayes estimator in Section 6.3.4 bears some similarity to the m.l. estimator in Section 6.4.1, but maximises p(θ|Y) rather than p(Y|θ). Now for a given Y, p(θ|Y) differs from p(Y, θ) only by the factor p(Y), a number independent of θ, so the conditional-mode estimator can be viewed as an m.l. estimator based on the unconditional joint p.d.f. p(Y, θ) rather than the conditional p.d.f. p(Y|θ).

What is more, if we have no prior information on θ and so take p(θ) as flat and of unlimited extent, p(Y, θ) is the same shape as p(Y|θ), being p(Y|θ)p(θ), and the distinction between conditional and unconditional m.l. estimators vanishes.
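The closed-form results of Example 6.4.1 are easy to verify by simulation: with Laplace ("double-exponential") noise the m.l. estimate of the offset is the sample median of the effective measurements, and α̂ is the mean absolute deviation about it. The sketch below is illustrative only; the simulated values are assumptions.

    import numpy as np

    # m.l. estimates for Example 6.4.1 (Laplace noise): median and mean absolute deviation.
    rng = np.random.default_rng(6)
    mu_true, alpha_true, N = 1.5, 0.8, 1001              # odd N gives a unique median
    y_eff = mu_true + rng.laplace(0.0, alpha_true, N)    # effective measurements y_t' = mu + e_t

    mu_hat    = np.median(y_eff)                         # maximises the log-likelihood over mu
    alpha_hat = np.mean(np.abs(y_eff - mu_hat))          # alpha_hat = sum|y_t' - mu_hat| / N
    print("mu_hat   :", mu_hat)
    print("alpha_hat:", alpha_hat)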
6.4.4 Properties of Maximum-Likelihood Estimators

Maximum-likelihood estimators have finite-sample bias in some instances. An example is estimating the standard deviation σ of uncorrelated additive noise in the model (6.4.1). Following through (6.4.2) and (6.4.3) with σ²I for the noise covariance R, and allowing for an unknown noise mean ē, we find

    L(θ) = −(N/2) ln(2π) − N ln σ − (1/2σ²) Σ_{t=1}^{N} (y_t − u_t'θ − ē)²

where u_t' is row t of U. Differentiating,

    ∂L/∂ē = (1/σ²) Σ_{t=1}^{N} (y_t − u_t'θ − ē),      ∂L/∂σ = −N/σ + (1/σ³) Σ_{t=1}^{N} (y_t − u_t'θ − ē)²

so to make ∂L/∂ē and ∂L/∂σ zero, the estimates must be

    ê = (1/N) Σ_{t=1}^{N} (y_t − u_t'θ),      σ̂² = (1/N) Σ_{t=1}^{N} (y_t − u_t'θ − ê)²

The sample mean ê is unbiased but, as we saw in Example 5.1.2, the sample mean-square deviation from the sample mean, σ̂², is biased for finite N. Maximum-likelihood estimates are nevertheless asymptotically unbiased in this instance and in general.

Maximum-likelihood estimates from independent, identically distributed measurements are strongly consistent (w.p.1). The proof is not simple (Wald, 1949). When the measurements are not all identically distributed because the system generating them varies or the distribution of the system's forcing or noise does, the m.l. estimate may well not be consistent (Kendall and Stuart, 1979, Chapter 18). The root of the trouble is that measurements then correspond to realisations of random variables whose p.d.f.'s have different parameter values at each sampling instant. The amount of information about each parameter value no longer increases continually as more measurements are taken, so the small-sample bias persists.

An additional assumption that L(θ) is everywhere twice differentiable enables a consistent m.l. estimate to be proved unique (ibid.). The covariance of m.l. estimates asymptotically reaches the Cramér-Rao bound, so they are asymptotically efficient (Cramér, 1946; Wald, 1943).

Granted that with appropriate assumptions m.l. estimates have good asymptotic statistical properties, a further property becomes significant, namely invariance. Invariance is the property that the m.l. estimate of a vector f(θ) of functions, no more in number than the dimension of θ, is just f(θ̂) where θ̂ is the m.l. estimate of θ. This applies whether or not the θ corresponding to any particular value of f(θ) is unique. The invariance property saves an enormous amount of work enquiring into the behaviour of practically important functions of estimated parameters, as Example 6.4.2 demonstrates.

The explanation for this helpful property is quite simple. The maximised L(θ̂) is no smaller than L(θ) for any other θ, including all those values which give f(θ) different from f(θ̂). The m.l. estimate of f is found by evaluating the same log-likelihood but regarding f as its argument; at each value of f we pick the largest log-likelihood given by any θ which gives the required f(θ). As we have just remarked, no value of f different from f(θ̂) will result in a larger log-likelihood than does f(θ̂), so f(θ̂) is the m.l. estimate of f.

Example 6.4.2  Suppose we have found m.l. estimates â_1 to â_n and b̂_1 to b̂_n of the coefficients in the model

    y_t + a_1 y_{t-1} + ... + a_n y_{t-n} = b_1 u_{t-1} + ... + b_n u_{t-n} + e_t            (6.4.5)

and we require estimates of the steady-state (d.c.) gain and poles and zeros of the input-output relation. In z-transforms the model is

    (1 + a_1 z⁻¹ + ... + a_n z⁻ⁿ) Y(z) = (b_1 z⁻¹ + ... + b_n z⁻ⁿ) U(z) + E(z)                (6.4.6)

Letting z tend to 1, the gain is

    (b_1 + ... + b_n)/(1 + a_1 + ... + a_n)                                                  (6.4.7)

and its m.l. estimate is

    (b̂_1 + ... + b̂_n)/(1 + â_1 + ... + â_n)

with negligible further computation and no further analysis. Similarly, the m.l. estimates of the poles and zeros are simply the zeros of 1 + â_1 z⁻¹ + ... + â_n z⁻ⁿ and of b̂_1 z⁻¹ + ... + b̂_n z⁻ⁿ, respectively. Notice that although the poles and zeros correspond to unique values of the original parameters a_1 to a_n and b_1 to b_n, the steady-state gain does not.   △
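The invariance property of Example 6.4.2 amounts to nothing more than evaluating the functions of interest at the estimated coefficients, as the short sketch below shows. The coefficient values are illustrative assumptions, standing in for m.l. estimates obtained elsewhere.

    import numpy as np

    # Steady-state gain, poles and zeros from estimated difference-equation coefficients.
    a_hat = np.array([-1.2, 0.35])        # coefficients of 1 + a1 z^-1 + a2 z^-2 (assumed)
    b_hat = np.array([0.5, 0.3])          # coefficients of b1 z^-1 + b2 z^-2 (assumed)

    gain_hat  = b_hat.sum() / (1.0 + a_hat.sum())          # (b1+...+bn)/(1+a1+...+an)
    poles_hat = np.roots(np.concatenate(([1.0], a_hat)))   # roots of z^n + a1 z^(n-1) + ... + an
    zeros_hat = np.roots(b_hat)                            # roots of b1 z^(n-1) + ... + bn
    print("steady-state gain:", gain_hat)
    print("poles            :", poles_hat)
    print("zeros            :", zeros_hat)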
6.4.5 Maximum-Likelihood Estimation with Gaussian Vector Measurements and Unknown Regression-Equation Error Covariance

We conclude our look at maximum-likelihood estimation by finding the m.l. estimate of θ from r-vector measurements y_t given by

    y_t = U_t θ + e_t,      t = 1, 2, ..., N                                              (6.4.8)

with the covariance R of e_t independent of t but unknown. Every e_t is taken as zero-mean, Gaussian and, in contrast to Section 6.4.2, independent of all the others. Note that R is now the covariance between errors in one sample, not different samples as in Section 6.4.2. If also each U_t is deterministic,

    p(Y | θ, R) = p_e(e_1, e_2, ..., e_N) = Π_{t=1}^{N} ((2π)^r |R|)^{−1/2} exp(−½ e_t'R⁻¹e_t)
                = (2π)^{−Nr/2} |R|^{−N/2} exp(−½ Σ_{t=1}^{N} e_t'R⁻¹e_t)

so, by taking logs, the log-likelihood function is

    L(θ, R) = −(N/2)(r ln 2π + ln|R|) − ½ Σ_{t=1}^{N} e_t'R⁻¹e_t                             (6.4.10)

Differentiation with respect to θ, with e_t related to θ by (6.4.8), gives one of the conditions for a maximum of L:

    ∂L/∂θ = Σ_{t=1}^{N} U_t'R⁻¹e_t = 0                                                      (6.4.12)

Also ∂L/∂R must be zero. To find ∂L/∂R we need the standard result (Goodwin and Payne, 1977)

    ∂(ln|R|)/∂R = R⁻¹                                                                       (6.4.14)

and a little manipulation of e_t'R⁻¹e_t:

    Σ_{t=1}^{N} e_t'R⁻¹e_t = Σ_{t=1}^{N} tr(e_t e_t' R⁻¹)                                    (6.4.15)

Then, remembering that the covariance R is symmetric,

    ∂L/∂R = −(N/2)R⁻¹ + ½ Σ_{t=1}^{N} R⁻¹ e_t e_t' R⁻¹                                       (6.4.16)

The maximum of L is therefore where

    R̂ = (1/N) Σ_{t=1}^{N} e_t e_t' = (1/N) Σ_{t=1}^{N} (y_t − U_t θ̂)(y_t − U_t θ̂)'            (6.4.18)

and from (6.4.15) and (6.4.18),

    Σ_{t=1}^{N} e_t'R̂⁻¹e_t = tr(Σ_{t=1}^{N} e_t e_t' R̂⁻¹) = N tr(I) = Nr

so, back in (6.4.10),

    L(θ̂, R̂) = −(N/2)(r(1 + ln 2π) + ln|R̂|)

The coupled equations (6.4.12) and (6.4.18) give the m.l. estimates θ̂ and R̂. In the scalar case, R̂⁻¹ cancels in (6.4.12) leaving the o.l.s. estimate, and the m.l. estimate of the error variance is, from (6.4.18), the sample mean-square error.

6.5 PRACTICAL IMPLICATIONS OF THIS CHAPTER

We have seen that Bayes estimation has a satisfying rationale, and provides a broad framework in which other estimators can usually be seen as simplified versions of a Bayes estimator. The need in Bayes estimation for a prior p.d.f. and a loss function is, depending on your viewpoint, either an advantage, allowing the estimator to be tailored to the problem in hand and to any background information, or a disadvantage, introducing subjective and even arbitrary judgements. Ambitious optimists with time to experiment feel the former, conservative pessimists in a hurry the latter.

Bayes estimators are almost always too demanding in computation to
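One common way to solve the coupled conditions (6.4.12) and (6.4.18) is by alternation: for a given R̂, (6.4.12) is a weighted least-squares problem in θ; for a given θ̂, (6.4.18) gives R̂ from the residuals. The Python/NumPy sketch below illustrates this under assumed simulated data; the model sizes and values are arbitrary choices, and the alternation scheme itself is one possible strategy rather than a prescription from the text.

    import numpy as np

    # Alternating solution of (6.4.12) and (6.4.18) for theta_hat and R_hat.
    rng = np.random.default_rng(7)
    N, r, p = 400, 2, 3
    theta_true = np.array([1.0, -0.5, 2.0])
    R_true = np.array([[1.0, 0.3], [0.3, 0.5]])
    Us = rng.standard_normal((N, r, p))                       # U_t, one r-by-p matrix per sample
    es = rng.multivariate_normal(np.zeros(r), R_true, N)
    ys = np.einsum('trp,p->tr', Us, theta_true) + es          # y_t = U_t theta + e_t

    theta_hat, R_hat = np.zeros(p), np.eye(r)
    for _ in range(20):                                       # alternate until it settles
        W = np.linalg.inv(R_hat)
        A = sum(U.T @ W @ U for U in Us)
        b = sum(U.T @ W @ y for U, y in zip(Us, ys))
        theta_hat = np.linalg.solve(A, b)                     # satisfies (6.4.12) for this R_hat
        res = ys - np.einsum('trp,p->tr', Us, theta_hat)
        R_hat = res.T @ res / N                               # (6.4.18): residual sample covariance
    print("theta_hat:", theta_hat)
    print("R_hat:\n", R_hat)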
implement fully. A small number of features of the posterior p.d.f. is computed rather than the whole p.d.f., most often the mean because it is optimal in several ways as discussed earlier, and the covariance as an indicator of the spread of possible parameter values. The idea of proceeding from a prior to a posterior p.d.f. is well suited to recursive estimation, considered in Chapter 7.

Maximum-likelihood estimation is popular because of its good asymptotic properties, reasonable computational demands and considerable intuitive appeal. The "maximum-likelihood" algorithms popular in the identification community are, in fact, Markov least-squares algorithms. This is not to say they are good only for observations with Gaussian random components; the

FURTHER READING

Bayes and maximum-likelihood estimation are introduced in an easy-to-read fashion by Mood et al. (1974), a good general reference for Chapters 5 and 6. The material is also covered by Zacks (1971, 1981) at a rather more advanced level, and by an enormous number of other textbooks. Sage and Melsa (1971) give a clear account of estimation theory slanted towards control-engineering applications.

The incorporation of prior background knowledge into estimation and observation-based decision-making has had considerable attention in the engineering literature (Jaynes, 1968; Kashyap, 1971; Potter and Anderson, 1980). Approximate Bayesian computational methods are relatively well developed for state estimation, closely related to identification as we shall see in Chapter 7, and for combined state and parameter estimation, discussed in Section 8.9. The best-known technique (Sorenson and Alspach, 1971) uses a weighted sum of Gaussian p.d.f.'s to approximate the non-Gaussian posterior p.d.f. of the state or parameters. Gaussian sums have been used, for instance, to decide between possible manoeuvres in target tracking. The underlying idea has quite a long history (Magill, 1965).

REFERENCES

Cramér, H. (1946). "Mathematical Methods of Statistics". Princeton Univ. Press, Princeton, New Jersey.
Deutsch, R. (1965). "Estimation Theory". Prentice-Hall, Englewood Cliffs, New Jersey.
Fogel, E., and Huang, Y. F. (1982). On the value of information in system identification—bounded noise case. Automatica 18, 229-238.
Goodwin, G. C., and Payne, R. L. (1977). "Dynamic System Identification: Experiment Design and Data Analysis". Academic Press, New York and London.
Jaynes, E. T. (1968). Prior probabilities. IEEE Trans. Syst. Sci. Cybern. SSC-4, 227-241.
Kashyap, R. L. (1971). Probability and uncertainty. IEEE Trans. Inf. Theory IT-17, 641-650.
Kendall, M. G., and Stuart, A. (1979). "The Advanced Theory of Statistics", 4th ed., Vol. 2. Griffin, London.
Magill, D. T. (1965). Optimal adaptive estimation of sampled stochastic processes. IEEE Trans. Autom. Control AC-10, 434-439.
Mood, A. M., Graybill, F. A., and Boes, D. C. (1974). "Introduction to the Theory of Statistics", 3rd ed. McGraw-Hill, New York.
Moore, R. J., and Jones, D. A. (1978). Coupled Bayesian-Kalman filter estimation of parameters and states of a dynamic water quality model. In "Applications of Kalman Filtering Theory and Technique to Hydrology, Hydraulics and Water Resources" (C.-L. Chiu, ed.). AGU Chapman Conf., Dept. Civ. Eng., Univ. of Pittsburgh.
Potter, J. M., and Anderson, B. D. O. (1980). Partial prior information and decisionmaking. IEEE Trans. Syst. Man Cybern. SMC-10, 125-133.
Sage, A. P., and Melsa, J. L. (1971). "Estimation Theory with Applications to Communications and Control". McGraw-Hill, New York.
Schweppe, F. C. (1973). "Uncertain Dynamic Systems". Prentice-Hall, Englewood Cliffs, New Jersey.
Sherman, S. (1958). Non-mean-square error criteria. IRE Trans. Inf. Theory IT-4, 125-126.
Sorenson, H. W., and Alspach, D. L. (1971). Recursive Bayesian estimation using Gaussian sums. Automatica 7, 465-479.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Amer. Math. Soc. 54, 426-482.
Wald, A. (1949). Note on the consistency of the maximum-likelihood estimate. Ann. Math. Stat. 20, 595-601.
Zacks, S. (1971). "The Theory of Statistical Inference". Wiley, New York.
Zacks, S. (1981). "Parametric Statistical Inference". Pergamon, Oxford.

PROBLEMS

6.1  After observations Y, the information about an unknown θ is in the form of the posterior p.d.f. p(θ|Y) = e^{x−θ}, θ ≥ x. Find the minimum-expected-absolute-error, minimum-quadratic-cost and unconditional maximum-likelihood estimates of θ.

6.2  A noisy observation y = θ + v = 2.5 is made of an unknown θ with prior p.d.f. p(θ) = 2θ, 0 ≤ θ ≤ 1. The noise v is known to have p.d.f. p(v) = v/2, 0 ≤ v ≤ 2, and is independent of θ. Find the conditional-m.l., posterior-mode, conditional-median and conditional-mean estimates of θ.

6.3  An unknown θ is observed indirectly and noisily by y = θ² + v, where noise v has a uniform p.d.f. over 0 to 2 and is independent of θ, and the prior p.d.f. of θ is uniform over 0 to 1. Find the posterior p.d.f. of θ if (i) y = 2.5 and (ii) y = 0.5.

6.4  A scalar loss function which is sometimes realistic is L(θ̂, θ) = 0, |θ̂ − θ| < D; L(θ̂, θ) = 1, |θ̂ − θ| ≥ D. It implies that errors in θ̂ up to D are acceptable,
Markov estimator was derived us tile nlinimulll-covariancc, unbiased, linear,
generalised least-squares estimator withollt reference to any p.d.r.
140
and larger errors unacceptable. How is the minimum-risk estimate tJ bascd on
this loss function found from the posterior p.dj'. of IJ'! In what circumstances
might (] be non-unique? How could the non-uniqueness be removed or
avoided in those circumstances'? [Hint: Think of maximising somcthing.]
6.5 Find the minimum-risk 0, according to the loss function of Problem 6.4,
given by the observation and p.d.f.'s of Problem 6.2.
6.6 Repeat Problem 6.4 for the loss function Lui, OJ = Iii - 01, 10 - 01 < D;
L(O, 0) = I, 10 - 01 :2: D.
6.7 Repeal Problem 6,5 for the loss function of Problem 6.6. Note that a
straightforward analytical solution is not possible. but numerical search is not
necessary.
142 " BAYeS AND MAXIMUM,L1KELJIIOOD ESTIMATION
Chapter 7
COlII(lutational Algorithllls for Identification
The control-engineering literature of the past two decades describes work on a
huge variety of identification methods, problems and applications (Eykhoff,
1974; Bekey and Saridis, 1983; Isermann, 1980; 'Eykhoff. 1981).
On the theorctical side, one of the greatest successes has been the
uniHcation of lllallY algorithms and experimental situations; notably in
analyses of asymptotiL: behaviour. Just the same, we cannot hope to cover
more than a smull fral:lion or the available methods, even at an introdw'::lory
level. Our selection is on the basis of popularily and proven
relative simplil.:ity and value as cxamples. The selcction is also influenced, of
course, by personal bias. l:rom rJ user's point or view, a good practical
appreciation of a few methods is more valuable than a theoretical
wilh a great many.
i
7.1 ASSUMPTIONS AND MODEL FORM
7.1.1 ASSUlIlpliolls
The algorithms to be described all caler for s.i.s.o. systC!ms lI'ith linear
dynamics. Thcy will accept non-linear functions of the observed variables as
explanatory variables, but like least-squares most of them rely on the model
being linear in the paramelers. The model may have to be split up or rewritten
to lhat cnd. Methods for non-linear systems are reviewed in Section 8.8.
We pass over m.l.m.o. systems because they raise a new complication.
Unlikc s.i.s.o. systeills. a Ill.i.I11.0. system can be represented by more tlmn one
minimum-order "transfer-function" model (matrix-fraclion dcscription: sec
Section X.7) wilh exactly tile salllC input-output behaviour, so the lirst
problem is to deciJc which one to iJclltify. Allemative representations which
are input"outpul equivalent may diJIer. for instance, in case or physical
interpretation or in how well condi tioned the parameter estimation will be. \Vc
have too little space to dojustice lo l11ultivariable representalion theory, so we
143
or in transfer-function form
145
(7,1.3)
All the algorithms ucscribed eml.'lloy aile or uther specialisation or the model
illustrated in Fig, 7, I, I:
B(--')--' I +C(--')
Yi:-
I
) U(Z-I)
I +//(Z-I) I
(iii) The injJut and noise are independent, This natural-looking
assumption means \\'c shall have to consider separately the iuentification of
closed-loop systems, where the input depends on the red-back noisy output.
Section 8.3 docs so.
(iv) The model artier is assumed.lixed, which may cause diJliculty in on-line
identification, Uncertainty about the best mouel order and doubt whether it
can be aujustcd reliably 011 line lllay leall to too high al1 order being used. The
identilkalion then runs a risk of" ill-conditioning because or ncar-redundancy
of some parameters. As an example, Illodelterms in too-recent input samples
may be incluued if the dead time is uncertain or varia ble. Modcl-order testing
is covered in Section 9.4, but existing techniques are aimed mainly at oO"-'ine
usc.
(v) Asymptotic projJerties are assumed important. They are, but the
emphasis on them is largely due to a lack or finite-sample theory, An algorithm
with better asymptotic properties, such as clTIciency, may also perform beller
on flnile recorus, but it may nol. The 1'.111.1. 2 algorithm of Section 7.4.3 is a
casc ill point, where a 1Il0dilkatioll to improve asymptotic bchaviour tends to
dcstabilisc the algorithm, ruining its performance on short records (Norton,
1977), "
in which
7.1.2 Stand,,,,1 Linear Single-lnllUl-Single-Oull'nl Model
+... +a
ll
z-
Il
, B(=-I)=b Z-l + ... +b .:-Ili
1 m (7.1.4)
C(:-')=C,Z-I +,., +c,z-", +'" +",z-'
Often IJ1 is taken equal to 11. but this is not essential. Integer k, the dead
time, is specified in advance. The eSlimation of k is discussed in Chapter 10,
but we noted under assumption (iv) that too small a value for k can result in ill-
conditioning. If the actual dead time is k" and the Illodel dead time k.
parameters hi to "k" j I Ii. arc redundant. \Vhat is mOlT, irthc order 11 of 1)
is also higher than necessary, ncar-cancelling pole--zero pairs will be
estimated, cl..Jl1tributing almost nothing to the input-output behaviour. The
spurious poles and zeros have little efrect on the model performance so long as
the poles remain stable, but the accompanying ill-conditioning may all"cctlhe
7,1 ASSUMPTIONS AND MODEL FORM
(7.L I)
7 COMPUTATIONAL ALGORITHMS FOR IIl1'N'r1FI(','TlnN 144
(B"'(Z-')U"'(Z-') +'" + B'"'(z-' + E(z- '
Y(z-!)= ',_, ,,_m_, __,,__ (7,1.2)
(I +//(Z-I
confine ourselves to an example or two in Section 8.7 showing the new reatures
with no S.LS.D. counterparts. Leaving asiuc the problem of choosing a model
structure, we may cxtenuthe S.LS.D. regression model heuristically by adding
terms in further inputs. A m.i.s.o. model with inputs ,,(I) to ui'/l and outputy
would then be
Note the restriction that all the input-output relations have the same poles in
this model.
Singlc-input-multi-OlilpUl models arc probably bcsllrcatcd as a collection
or s.i .5.0. models. The altcrna live, empluying a vector-out put Il1tH.ld. might in
principle require fewer paramelers, as parllJrthc Illodel would bc COmI11011 to
more than one outpuL An example is the stale equation or a statc-space
model. On the other ham.!, idcntiJica tion of scpara tc s.i .5.0. Illodels, eaC'h with
no more parameters than necessary to describe its uominanl dynamics, could
well be computalionally cheaper than simultaneous estimation 01" all the
parameters in a s.i.m.o. Illodel. It could also be more convcliicilt because it
. does not require access to all the outputs at the samc timc. ,
Several other assumptions, some dubious, ulH.lerlie the i0entilication
algorithms in this chapter. We nssume:
(i) A l/lllldratic/il11ctitm of residuals, prediction errors or parameler errors
is the performance criterion, mainly for mathematical convenience but also
because or a lack or well tried rules ror choosing other criteria in specific
experimcntal situations.Maximum-Iikelihood estimation is an exccption, as it
relates the function to be minimised to the p.d.L or the observations, and
hence to the noise p.dJ. Howevcr, the algorithm of that namc in
control engineering and described in Section 7.2.3 assullles a Gaussian p.dJ.
and minimises a quadratic cost functioll.
(ii) The plam ami ,wise parameters are taken as coiistall1 or at most slowly
changing, The dcsign or thc algorithms takes no aCCtlllnt or changes in
tlynamics due to common occurrences like variation ur feedstock quality,
operating point or demand ill process plant. and the elrcds or 1II1nlOnilorcd
inputs generally. Nor tlocs it cunsider !lon-statiunary noise dominated at
dilrerent timcs by dilrerent sources; Some well established techniques for
tracking time-varying dynamics arc describetl in Section S, I.
147
(7.1.7)
(7.1.8)

r",,(OI =)(I -I- I - I) =)
ror j> 0
1'01' i < O.
Y=-AY-/-Bz-'U-/-E
l'=-Al'-/-B:'U-I-W;
7.1.3 Output-Error and EljUation-Error Algorithms
/
7.1 ASSUMPTIONS AND MODEL FORM
r".,,(i) = j(lJ.5)'
and the other tcrms give
so
However, E is (I -/- A) W, so evell ir {IV} is white, Ie} is not. Consequently Ie} is
correlated with the noise content of the Jagged output samples in Al'. As seen
in section 5.2.4, such correlation causes bias in 1.5. estimates.li and E. To avoid
bias, equation-error algorithms identiry a noise-structure model aiong with A
and B, as we shall see later.
Outpul-error algorithms (Dugard and Landau, 1980) instead adjust Aand
iJ to minimise the error {IV} in
z -, anti z, giving the z-lransforms of the a.c,r. for positive and negative lags
rcspectively.
Example 7 .1.1 A noise-gcncrating sequence IV J of zero mean and m.s. value 1
produces a noise sequence til' J through
The infinite-series expansion or the first term gives
1(1 -I-O.5z-
1
-/-0.25:-
2
-/-(/e)
so the a.c.r. or :II' J can be round from
II (: 1 )J/(:) = - = (---'--.-_- -/- _I)
(1-0.5: ')(1-0.5z) 3 1-0.5: I 1-0.5z
The estimation algorithms in this chapter all identify the coellicicnls a
j
Lo G
n
and 11
1
to "
m
in model (7.1.3), but differ in how they treat the nuise par(. Twu
basic approaches to the noise can bc distinguished, leading to outpllt-error
and equation-error algorithms. Figure 7.1.2 shows the dilrerence. Equution-
error methods rewrite (7.1.3) into a rorm suitable ror 1.5. estimation.
Multiplying (7.1.3) by I -/- I) and dropping :-1 lor brevity, we obtain
(7.1.5)
(7.1.6)
7 COM i'UTATIONAL ALGORITHMS FOR IDENTIFICATION
= '1' x coellieient or :-i in H(:-I )H(:)
" II. . -,.
= (f- x cae lClcnt 01 Z 111---... 0 __ _
(I -I-D(o-'))(I -/-D(:)
Ivl
I +C(z-')
l+D(.l-'}
JwI
1
+
lui
z -, 8(z-1)
l+A(z- 'J
+
IyI
I
I I
I
Dead Plant PIon!
I
I
lime numeralor denominator
I
j dynamics dynomics j
Observed
Observed
0'
known
146
The a.c.r. of 111'} at lag i is
1'II'w(i) = E[W,II"+i]
= [(1100, +h,l',_, +hJYI_:!. +... HhoL"+i +h,t"+i_ 1-+ ... co)}
= (fJ.(IJ
O
h
i
+ h1h
i
+ 1 +... 00)
Fig. 7.1.1 Standard z-transrorm model for linear S.LS.D. system where Iv) is the while noise-
generating sequence, {II} the input, and 111'1 the structured noise.
non-redundant parameters too. Luckily ncar-cancelling pairs are
easy to detect, at the cost or racturising I-/- A(:-I) and B(:-').
The noise in (7.1.3), saYl\l'), is rcprcscnlcu as the result oflillcl'ing a zero-
mean stationary white 'scqucllcclv: through lhe transler fUllction
(I -/- C(z - 1/( I -/- D(z - 1i), thereby shaping the noise autucorrelation rune-
tion and power spectral density. We assume thal the input sequence Ill) is
independellt or 1f) Janti hcnce or {II' l,so we do 110t allow fectlbaek of the noisy
output The connection between the noise-shaping fiiler cacllicicnts and the
a.c.r. or 11I'j can be seen with the help or the unit-pulse respunse H(:-I) or the
filter, given by
where (J2 is the Ill.S. val uc or lll: _The Iil1i.1l exprcssioll ill (7.1.6) is a ratiunal
polynomial fUlldion or ::-1 01' Z, so tll'l is said to bc a stochastic process with
ratiO/lUI .\jH'ctJ'lIl dellsity_ NlIll1crical a.c.r. valucs 1'01' 111'11..:<111 bc found by
splitting thc rational polynomial function into partial rrm:tions symmetrical in
Equation (7.1.7) gives the regression-type equation
140
7.2 llATCIlIOITL1NEJ IDENTIFICATION ALGORITHMS
7.2 BATCH (OFF-LINE) IDENTIFICATION ALGORITHMS
Batch algorithms process all the observations of y and LI simultaneously and
produce a single estimatc of the parametcr vcctor. By contrast, the recursive
methods ofSectiolls 7.5 process the observations one sampling instant at
a time amI update the parameter estimates each time. Batch algorithms are
suitable only when estimates are rC4uircd once and for all or at Iong'intervals.
or when computing is chcap, since they process the entire record every time.
Most i'eal-timc applications are bettcr mct by rccursive algorithms since time
und computing power are strictly limited. An important example is
microprocessor-bascd self-luning control (Astrolll eJ al., 1977; Wellstcad eJ
a/., 1979), in which a new control value is computed by reference to a freshly
updaled model arter each sampling of the output. Batch methods have Some
advantage in iterative processing, where the estimates are improved by a
sllccession of iteralions, each processing the whole reCOi'd. They allow
monitoring of output or e4uation errors at the end of each iteration, using the
model obtained in that iteration. Progress can be checked and anomalics like
large isolated errors due to untrustworthy observations can be detected and
removed. This may be less easy in recursive estimation, where the 4uality of
the estimates varies during an iteration, starting relatively poor but improving
as more observations are proccssed. The significance of a given error value
correspondingly increases in the course of lhe iteration. The eO'eet is most
pronounced in early iterations and with short records.
The high cost or i"ecomputing the estimate at short intervals bv batch
methods makes recursive methods preferable for timc-varying
The tlexibility and computational economy of recursive algorithms has led
to great praclical and theoretical emphasis on them over the pasl decade. We
should not forget batch methods, liowever, as apart [rom their occasional
advantages they provide an introduction and motivation 1'01' recursive
methods.
The present output Y
I
is partly an autoregression (the -QiJ'l-i terms) and
partly a moving average (the bill
r
-
k
-
i
terms) of the exogenous (externally
generated) input {It}: altogether an autoregressive-moving average-exogenous
(o.r.m.a.x.) modd. II' no observed exogenous input were present, Lv} could
still be mooelled by an allloregre,Ysive-movillg average or a.r.m.a. model
driven by an unobscrved white noisc-generating sequcnce. Several of the
identilkatioll algorithms we shall discuss are based on least-squares applied to
(7.1.9).
7.2.1 Role of Batch Algorithms
(7.1.9)
Output
error
Adjustment
1----1
J
Adjuslment
7. COMPUTATIONAL ALUOIUTIIMS FOR IDENTIFICATION
101
Iyl
System
lui
.,
Model -
I
B(.t-i)z-k
1+
Iii
I
I
7.1.4 A.r.m.a.x. Model
\.-.. Equation
I error
I
I
L J
AdJustmenl
Fig. 7.1.2 (a) Output-crror, and (b) equation-crror identification, where 111) is the input.
Ib)
I
r------
Iyl
1+A(Z-IJ I System
I
lui
,
--'-'-'---
-
8(Z-1),,-*
t
14M
The explanatory variables in AY are all lagged versions of {II} and thus
uncorrelated with {IV}. Bias does not arise since {;;} is free of the noise which
alTects {y}. Clearly the lagged versions of {.I;} in AYare instrumental variables.
Section 7.4.5 discusses instrumental-variable identification further. The
output-error approach looks direct, but has its own complication, the need to
estimate the unobservable Ii} usiilgsome prior 11 and E. [n otherwords,.{IV}
in (1.1.8) is non-linear in A and B, whereas the equation error {c} in (7.1.7) is
linear in A and B.
151
(7.2.3)
(7.2.4)
(7.2.H)
1= 1,2.... , N
N
N I I LUi, rr) = - In(21t) - N In rr _. _ ,,2
2 26
1
__ I
1'= I
-I-o"-I-bm/{,_k_m
+C I l"/_ I + ... + c
t
/
r
'I + t'r
U
I
I
='" LI'r=1 )0/ n
11/- Ii - 1 1ft-Ii-Ill v/_ I
O'=[-lI, -(//1 h, bill C
""I
,
ASlrOIll anI.! Bllhlin, 1%6.
Y=-AY-I-il:-'U-I-(I -I-C)V
or in dillcrclll.:c-cquatioll form
7.2 UATCII (OFF-LINE) J1)ENTIFI(ATIUN ALGORITHMS
7.2.3 Maximum- Likelihood Algorillllllt
The 1110del (7.1.3) is rewritlen to look like a regression equalion, by taking D as
ldcnlical to A then multiplying by I + A:
(7.2.2) -I-D')Y=-AY*-I-B:"U*-I- V
1 COMPUTATIONAL ALGORITHMS FOR IDENTIFICATION
7.2.2 Iteralive Generalised Leasl-S'luarest
With 1v1white, there is no correlation between r;, and regressors I to.l'7-11 by
way of v
r
1 and earlier samples. There is no correlation betweenl r: and {I/*
either, so minimum-covariance linear unbiased estimates of the cocllicicnts in
A and B can be found by 0.1.5. The problem is how to Jiml D', not normally
known in advance. It is estimated iteralively, allernately with A and B, as
fullows.
(iii) til an autoregressive mouel
Eli} = _ [yof.' (i) -I-
to Elq by o.l.s., yielding 6<i1;
(iv) filter y*ii-II and U*u-Il to form
Step (i) or lhe lIrst iteration uses Yand U as y*HJI and U*lll). Subs',q
iterations gradually build up a noise-whitening filter as a t.:ast.:nue or
i = I; 2, .... Each I + ,j!ij is oflow urder, up to oruer 3 or so. The
rate is markedly inllucnced by the structure or but is usually quile rapid,
with insignilkant rcduction of the sum uf squares or resiJuals from slep (iii)
beyond about five to len iteratiuns.
t Clarke, 1967.
Balch Itcralh'c g.l.s. Iteration i is
(i) estimate ;jti} and i3(i} by o.1.s. using liltered I) and U*U-l}
produced in the previous iteration;
(ii) rorm residuals
tli} = (I -I- IIU})y*U-l J _ iJU): -Ii U*li- I)
Model (7.1.3) is used wilh Czero and I -I- D or lhdorm (I -I- A)( I -I- D'). giving
Y=-AY-I-B:"U-I-(I/(I -I-D'))V=-AY-I-B:"U-I-E (7.2.1)
Unless D' is zero, which is improb<lblc, I'-'ll is aUltlcorrclatcd. AnClJUiltion-
error Illcthou therefore Blust take action to avoid bias Hrising as ill Section
5.2.4. The algorithm !illers 1.1' I anti lu I with a transfer runClioll I +D',
producing lY*} and lu* l related by
150
153
Y,
f
I
I
I
1
J
Model
I
H,
1
I I
I
I;
I
f
I
I
f
I
I - +
i
f
'i,
f f
f
I
f
I
Updafe X H
I
f
I
i "I f
f 1
f I
____________J
I
j
7.3 RECURSIVE ESTIMATION
able, with simple modifications at most, to track time-varying dynamics as
discussed ill Scetion g, I. 'riley also allow prior information in the form of an
existing estimate to be exploited as the starting value for the recursion.
Some recursive estimators update a covariai1cc to indicate the reliability of
the estimates. A starting value for the covariance is provided by the user, along
with the initial cstimate itself, and tells the estimator how good the initial
estimate is. This sorl of estimator can be viewcd as a pared-down Bayes
estimator. Each ujJdate produces ill places of the posterior p.d.L just the
posterior mean and covariance. which provide prior values for the next
update. For a p.d.L which is completcly determined by its mean and
covariance, as a Gaussian p.d.r. is, the updating amounts lo full Bayes
estimation. We shall not normally take this view of the estimators. partly to
avoid thinking or the mean HIlLI covariance as necessarily the whole story, and
partly to retain. the option of regarding the parameters as unknown but
determinislic.
Recursive updating is popular in state estimation (Jazwinski, 1970;
Maybeck. 11.)71.) 1'01' lhe samc reasons as it is in identification, and some
identificatiull algorithms diller rrom slate-estimation algorilhms unly in
detailcd interpretation. The next lwo sections will emphasize the similarity by
1 ' .:":':" i...
x ;::;
Fig. 7.3.1 OtiC step uf recursive estimalor, where P, - f. is estimated covariance (If x, _I' II, is
observation (regressor) llullrix,)', is observed OULpUl, Y, is predicted output and v, is prcLlicliulI
(innovation).
for I:s; i
for f1 + I :s; i :::; 211 + I
for 211 + 2 :s; i :s; 311 +
7 COMPUTATIONAL ALGORITHMS FOR IDENTIFICATION 152
Applications of this widely used algorithm appear in Astrom, (1967)
Gustavsson (1969).
in which all a
2
v,/iJU,aO
j
arc zerO except those with i and.i belween 211 + 2 and
311 + 1 inclusive, which arc obtained by dillcrentiating (7.2.9). Only the
derivatives involving {v J have to be recalculated at cath iteration, the
being zero or fixed by {II} and {y). Lastly aLltJ(J and tJ
2
LID(J2 arc found
dilferentiating (7.2.8).
7.3 RECURSIVE ESTIMATION
Batch iteratil'c 1ll.I. Iteration i for the parameter vector O'T rOTa] is
(i) recalculate all first and second derivatives which depend on 0', at
11, and form aLlaO' and the Hessian matrix [l/""L];
(ii) update 0' by a Newton-Raphson step
0'1fl = 0'''- 11 - (If u"L] - 1 aLlao'
(iii) check whether the estimates and/or likelihood function
sell led enough to stop.
Recursive estimation consists or repeatedly updating thc c:-;timatcs, each
update processing only onc output observation, It:-; Illo:-;tubviuliS application
is to real-time control or prcdiction. An example is hourly updating ofa
prediction model ora river catchmcnt on receipt of a new sHl11ple orinput (an
hour's rainfall) and output (present river 110w). Less obviously, recursive
estimators arc valuable otf line because their structure is simple anti they


av, ,OVr_'j . aV,_n _ _
aU
i
+(I----au;-++(Il-iJO
j
_It,-k-i+n+1
V'_H_:!II+ I
Each oL/aU
j
is found via {auf/Dod, which in turn come from a recurrence
relation obtained by dilrcrcntiating (7.2.4). The algorithm takes 111 equal to
II + I and q equal to II, then from (7.2.4)
The second derivatives of L follow from (7.2.8):
155
(7.3.8)
(7.3.9)
I[ 5J) =[ 5-I- l.5k:
I'
J
J _ 2 _ 2 -I- l.5k: 21
[
5J [kl"J(
Xl = -2 + ';;::1 4.5 - [1
We haye yct to Jix K,.
7.3 RECURSIVE ESTIMATION
in which the noise is zero-mean. The new estimate is to be linear and unbiased,
so it has to be of the form
7.3.2 IVlinimulIl-Covariancc Linear Unbiased Updathlg
next ask for x, to have the smallest possible covariance. Its covariance is
1', = EI(x, - Ex,)( J1J= [(1- K,IJ,)x,_ 1 -I- K,Y, - x)( )']
= [(1- K,II,)(x, _1 x) -I- K,(Y, - 11,x)( )']
= (1- K/J,)1"_I(I- li,rKi) -I- K,RK,' (7.3.6)
where (.) denotes a repeat of the previous bracketed exprcssion. The last step
in (7.3.6) relics on the noise Y, Ii,x, that is Y" being uneorrelate.d, ,'yilh. the
error I-X. We can find the K
I
that minimises PI by writingdowl{the
change I1PI due to a small change I1K
I
and choosing K, to make the rate of
change of P, with K
,
zero:
I1P, =(1- (K, -I- I1K,)H,)1',_ 1 (1- U,T (K,T -I- I1K,T
/
-I- (K, -I- I1K,)R(K,' -I- I1K,T).L (1- K,Ii,)1',_ 1(1- 1i,"K:) - K,RK,T
'" I1K,( - 11,1', _1 (1- 11,"K,') RK.')+ ( - (1- K,H,JP, _111/ -I- K,R) 11K:
(7.3.7)
The optimal gain matrix ror the updating is thererore
K
l
= Pl_IH,'tH,p,_IH;r + R,-l
-(1- K,11,J1"_IH,' -I- K,R = 0
The inverse in this expression exists unless Y
I
is both 110iserree, making R zero,
and part-redundant, making HlP,_ 1 f1/" singular. The neglected second-
degree terms in 11K, are readily seen to be with the same
exceptions, conIlrming that a minimum of,P, has been achieved.
Here the second-degree terms in t1K
I
have been ncglected as we are about to
make !J.K, tenu to zero. For !J.P, to have zero rate of change with !J.K,
whatever the relative sizes of the clements of 11K" the expressions multiplying
11K, anu !J.Ki in (7.3.7) must both be zero, requiring
(7.3.2)
(7.3.5)
(7.3.4)
(7.3.3)
.I, -I- K,fl, = J
7 COMPUTATIONAL ALGORITIIMS FOR IDENTIFICATION
x, (J - K,l1,)x, _1 -I- K,Y, = X, _, -I- K,(Y, - 11,x, .. 1J
Hence
and so
Y, = lI,x -I- Y, (7.3.1)
Suppose we have at time ( an old unbiased estimate X, _ I of a veClor x
(parameter or slate) with a covariance P, __ I' We receive new noisy
observations making up y;, related linearly lo x by
7.3.1 Linear Unbiased Updaling
Example 7 Old unbiased estimates = 5, = 2 of two para-
meters xO
J
and X(2) arc to be updated using a new observation
Y
I
=xO
I
+X
l21
+ v, = 4.5
154
The observation noise VI has zero mean and covariailcc R, and is assumed
uncorrelated with the crror in Xl_I' We wish tu combine X
l
_
1
ami YI linearly
(to keep the computation and analysis simple), forming a new estimate XI. In
other words, we want
discussing estimation rather than specifically identification, and employing a
more general notation. Figure 7.3.1 introduces the notation and shows an
updating step. The broken lines indicate parts not present in all recursive
estimators. For generality, a vector olilput y is considered. Our exploration of
recursive estimators starts by seeing how the updating mechanism of Fig. 7.3.1
arises if we demand that the estimator is lincar and has good finite-record
statistical properties.
Our new linear and unbiased estimate must therefore aLid to the old estimate a
correction Inoportional to the predktioll error between thc ncw obscrvation
and its value predicted by the olu estimate. The preuiction error is often called
the innovation. Already the struclUre of Fig. 7.3.1 is parlly explained.
with matrices J,and K,chosellto make XI a good estimate. If we ask for x,to be
unbiased, it means that for any x and any given H,
157
(7.3.13)
0.2]
-O.4J
0.9
-0.2J -I- [0.2J5[0.2
0.8 0.2
)[
4 OJ 2.4
IJ 0 1 -0.4
0J[ 0.8
I -0.2
"'-II,x
u
+(y, -II,",,) = y,
-0.2J[4
O.B IJ
-0.6J
1.0
[
0.8
1'-
,- -0.2
=[ 2.8
-0.6
H,x, =H,(x
u
-I- K,(y, -li,,,o
=H,(x" -I- Ii, puH,'(H, PullJ -I- R) - '(y, - lI,x
o

At this juncture we can look back at Fig. 7.3.1 and recognise the whole
mechanism in our equations. The only other things required are initial
conditions X
o
and Po to slart lhe recursion when J I arrives. Ifx
u
is poor and Po
very large, the effect is that
The variances of .\": II auJ .r:
21
are larger, and the Jilrercnce bel ween this P, and
the optimal one is easilY,seen to be non-negative-definite, e.g. by Sylvester's
criterion. H 6
That is to lile correction or X
o
to xI is almost enough to make H J XI Ht Y1
exactly. The inJluence of 50.
0
is negligible because its uncertainly, specified by
Po, is much larger than that in y l' specified by the noise covariance R.
Generally, the larger Po the smaller the influence of x
o
'
In state estimation, (7.3.9), (7.3.5) and (7.3.6) or their alternatives listed
below are" large part or the ramou, KalllJalljillet(Kalll1an, 1960; Jazwinski,
1970; M"ybeck, 1979). The only items missing account ror time variation of
the state, as described by a state equation. We are not primarily concerned
with stale estimation, bUl we shall bring in a state equation to describe
evolution of the parameters of systems in Section 8.1.5. The
resemblance betWeen state and parameter cstima tion was poi 11 tell out not lung
arter the Kalman IiIter was devised (Mayne, 1963; Lee, 1964).
Example 7.3.3 The covariance or x, in Example 7.3.2 is given by (7.3.11) as
7.3 RECURSiVE ESTIMATION
sothe cstimated standard deviation of is )24' 1.55, that of is
jOJi:::::: 0.95, unu unll arc negatively correlated.
Sub-optimal correction gains k: 1) = k:
21
= 0.2, say, would make the
covariance, givcn by (7.3.6),
allli /I 5
7 COMPUTATIONAL ALGORITHMS FOR IDENTIFICATION
0J[IJ( [4 JI-IJ ).. , [4J/ [04J
I I [I Ij 0 I _I +5 = I (
5
+5) = 0: I
I
(i) We can see in this small example why is larger than k:
21
, According
to P
t
-
I
, t is less reliable than I (it has a larger variance) and I is not
correlated with .i)=-',; as in addition H, shows that and .Y::\ allect.\"
equally, the error between .1'/ and .1\ should induce a larger correction in
than in .\"(2),
(ii) The dimensions of the matrix inverted in the computation of K, are
fixed by the dimensions or y,. Here y, is a scalar and the matrix is I x I. f',
Notice that
We can find the minimal covariance or x, by substi luting (7.3.9 j ilito (7.3.6):
:> _ T 'T ' -_) _ 'I' ''I'
P, = P'-l - KIHtP
r
_
1
-1,_ IK
r
+ A,UI,J '_111, -\- R)K,
= P,_ 1- P,_1ll;'UI,Pr"_IJl/ -\- R) - J II
I
P
i
_ I
= (1- K,II,)I" _,
A simpler expression l'or K, can be round rrom (7.3.10) and (7.3.9):
1'11
'./1-' =1' l{r(/_(H I' II' +/1)-'11 P 11')/1-'
I r ,- I I , ,-I I r , - I ,
= 1',_ ,11,'(.'/,1',- ,II,' + /1)- '(11,1',_,11,' -\- /I -11,/,,_,11,' j/l-'
= 1',_ ,H,' + /1)-' = K,
Equations (7.3.11) and (7.3.12) ror 1', and K, simpliry the algebra bUI are 1I0t
necessarily good reci pes for computation, where sensi tivity to round-off error
may have to be considered. They could not be used together, in any case,
PI would require K, and vice versa.
.[- 5-I- 1.5k:' 'J [ 5.6-J
x, = _ 2 -I- 1.5k: 21 = - 1.85
Example 7.3.2 We are told the covariance of the olu estimate x, _1 in Example
7.3.1 and the noise variance of the new observation Yl' respectively
The minimum-covariancc linear unbiasctlllcw estimate of x is obtained
the correction gain maLi"ix
so
156
159
(7,3.11)
(7.3.20)
(7.3.14)
(7.3.15)
(7.3.16)
[
/I'J [V'J
II"!, '1
'1
1
=
II, "
jlj=
XI = I ;}fl__ \ tW, _ I + lJ;rR,- 1Y,) = "1_1 + HIT R,- 'Y
,
)
= 1- U;r R,-I 1+ H/RI-'y,)
= "1_1 + PJJ/R,-1CYI - H,x, _
I
)
[I
0
:,1
n
0
"J
i}fl =
II, U
jf-I _
11';' U . ',\ \ . 0
'i -
11,-1 0 0
(7.3,17)
Hence
,
I' - , = MY rjf 'MY = \' Ir'lI- , II. = .''', JiY + 11'/' - 'II
I I' I 'I I I r "1 1.
71
,_ I' 1- 1 "I 'I " ,
i= I
and so
=1',-_', (7.3.19)
Similarly,
.Jf,'.clf,-' =[11,'11,' lIill;' II,'II,-'J (7.3.18)
and defining (fli ;)fl- 1,Y(/) - 1 as PI' which wc know from the end of Section
5.3.5 is the covarialH:c or x" we have
7.3 RECURSIVE ESTIMATION
where
vector-output case. The regression equation (7.3.1) from to time ( gives
altogether
The Markov estimate based on (7.3: 14) is
x _(1(T'II/-' Jf )-' Jf'l'II/-'U"
. 1- . '" I ." " ,. " ;y,
where :!Il, is the covariance matrix of rl' As in the previous section. we assume
that Iv i is white, with Coy v, given by R,. Therefore, cov f"1 is a block-diagonal
matrix ano has a block-diagonal inversc:
7 COMPUTATIONAL ALGORITHMS FOR IllENTIFICA-rJON
Let us summarise what we have found so far.
7.3.3 Recursive Estimator Ucrh'cd from Least
Squares
The idea behind Ihis derivation is to apply Ihe balch Markov (g.!.s.) estimate
given by (5.3.15) to all the observations, and pick Ollt the clrecl or the new
observation and corresponding regressor samples at time t. Again we 00 the
for t = 1,2,. ", N; x
o
, Po given.
The updating equations can be written in several other ways, differing in
numerical properties such as sensitivity to round-ofl' error.
It is important to remember that we assumed "I to be uncorrclated with
xl_l-x. Since "'_I depends on Y,-l and through 011 all carlier
observations, it depends on the noise present in those observations. Our
assuolption therefore implies that {v J is while. The covariance R is between
noise variables all at one time, and docs not describe correlation between
successive samples.
We have treated 111 as deterministic in deriving this algorithm. The
observation equation (7.3.1) is a vector-output generalisation or our usual
regression-type model, with HI made up of regressors, When we rewrite a
transfer-function model like (7.1.3) in regression-equation rUrlll, (7'.1.9) Or
(7.2.4), the regressors arc parlly siochaslic since Ihey include earlier'oulput
samples, and they are also generally correlated with the regrcssion-cquation
noise. A vector-output version would have II, stm.:hastit.:: and correlated with v,
in p ..3.I). Sectiun 7.4 is cOIH.:el'lleU with IilH.ling IlIininllllll-COVill'iulICe linear
recursivc estimates in those marc compIicateu circulilstam.:es.
Havingjust derived the estimator l'roJlllirst principles, by t1irctt minilllisa-
lion or the covariance subject to linearity ano 1Illbiul-icuncss, we trace an
alternative derivation in the next section.
Uecursh'c IVlinimullI-Co\'uriance Linear UlIlJiascd Eslimator In on.ler of
computation,
K,=I',_/I,'UI,I',_,II,' +11)" (7.3.')) 117.3.10)
X,=x,_,+K,IY,-II,x,_,) 17..1.5) Dr (7.3.11)
1', = (/ - K,II,)I', _, (/ - II,' K,') -I- K,II K,' 17.3.(,) (7.3.5)
(or (7.3.11))
158
161
(7.3.29)
(7.3.30)
I
264
6
0.8
218
5
4]
0.6
151
4
[4 4
P, = PI-I - P,_lh,h}"j)I_I/(1 + h;PI_Ihl)
x
r
= XI_I -I- Prh,(y, - I)
I 0 0.2 0.4
.'(Il=.\', 3 59 98
I I 2 3
J.J RECURSI VE ESTI MATI ON
In our present notalion,
and the data were
Example 7.3.4 In example 4.1.4, a weighled l.s. estimate uf the parameter
vector [initial position. initial velocity, accelerationp was computed for the
tracking problcl11 or Example 4.1.1. Observations at six sampling instants
were processed using the batch \V.l.s. estimator
Ow = [VI I V
T
IVy
This algorithm is the L"orncrstOllc 0 rs.i.s.o. rccui'sive idcll tiJieation. Although
the algorithm was gcneraliscd and brought to prominence by PlackcLL (1950).
its essentials were derived. without the bCllcJil or matrix algcbra. by Gauss
(Young, 1984).
noise variance a; may well be constant but unknown. If so. we can define
a normalised covariance P
r
as P,/a
2
and write
Owis equal to the Markov estimate for a system with white noise ofcovariance
R = diag[j I I I -]. i.e. of time-varying vitriancc 1. 1. 1, I, 1,1, since
Markov anu w.l.s. estimates dilfcr only by R- t replacing IV. V\'c use our
recursive Markuv i.:stillwtor inslead. The mouel was
The weighting matrix IY was diagonal with principal diagonal
hi = [I I I'/2] = [I 0.2(1 - I) 0.02(1 - 1)']
,- Let us go through two recursion steps, starting with a guess x(} = () and
Po = 10'1, which statcs correctly that "0 is vcry poor. Calculation 10 aboul
eight figures is i1ctessary because of ill-conditioning, but results will be quoted
four figures for conciseness.
(7.3.24)
(7.3.23)
( 7.3.22)
(7.3.26)
(7.3.25)
(7.3.28)
(7.3.27)
7 COMPUTATIONAL ALGORITHMS FOR IDENTIFICATION
X,=X
I
_
1
+
for t = 1, 2, ... , N; X
Ul
Po given.
and is often helpful in obtaining alternative ways of writing the updating
equations. Although (7.3.25) looks more complicated than (7.3.19), it uses
only one matrix inversion to go from P,_ I Lo PI' whereas (7.3.19) takes two.
Moreover, the matrix inverted in (7.3.25) has as many rows or COIUlPIlS as the
dimension of Yl' usually 51,naller than the dimension or x, so the matrix is
smaller than P'-l or Pro If the covariance is uptlatcd On a shurt-ward-Iength
computer over many stcps, (7.3.25) is risky, sincc ruund-ulrcrrur may inHate
the last tcrm and ultimately cause PI to be indefinite rather than positive-
definite. Safer alternatives exist (Uierman, 1977; Maybeek, 1979), but
floating-point computation with longer word length
l
say 30 bits, seems to
avoid trouble.
The scalar-output algorithm has hI' for H, and rr; for R,.
Scalar-Output Rccursh'c I\Ilinimum-Cm'ariulIcc Linear Unbiased Estimator
We recognise this updating equation as identicalto.(7.3.5) with K, replaced
by P,H,' R,- I as in (7.3.12). The covariance updating equation (7.3.19) is less
familiar. We can twist it into a recognisable forin by wriling
Pi _1 = P,_ 1P,- I PI = P
r
- I -I- Jl;rR,- I IJj)P,
which gives
This is (7.3.10). The equivalence 0[(7.3.19) and (7.3.25) is called th" //Iatrix-
inversion lemma. It is a special case of
160
On substituting (7.3.23) into (7.3.22), we lind
P,_ I = P, + 1H/'R,- 1(1 +- Il,Pr_tlJ/U,- I) IIJJJr_ I
so
7.3.4 Informalion Ulldating
163
350
o

6
Ib)
a
a

a
a

a
-366
a

a a

---.--------------
Actual xz, l1 t.

a
a
__ __
Acfual x"
5.0 -------------...
la)
lOa
a
- 175
"
0

<>,
2.5

Aclual
x"
a
a
7.3 RECURSI VE ESTI MATI ON
First we must realise Ihat the p.d.L PIX,) can aelthe part of the p.d.r. of Ihe
observations given the parameters, in the definition (5.4.1) of lhe information
malrix. We CUn think of x, asa vector of processed observulions, conlaining
the illformalion in all the original observalions up lo time r. Omitting the
explicil dependence or x, on x from our nolalion for brevity, we have
p(x,) = eonst x exp( - :\(x, XJf1',- '(X, - x,)) (7.3.31)
Fig. 7.3.2 Recursive minimum-eovariance unbiased estimates and their computed variances,
Ext,m"lc 7.3A, where 0: .i
ll
; 6: ":21; 0: .i
k
]
o
o
IC)"'
2000
o
IU'

-0.1236J
-988.9
9901
0]. Equalions o 0], h:l'o = [10' o
o
o

(1-
1
-Ill') '" [;t
I:;' :: JlilJ
U 10' Lu 0.25 Lo
hj = [I U.2 0.02], hjp, '" [0.25
7 COMPUTATIONAL ALGORITHMS FOR IDENTIFICATION
[
0.2498 - 1.236
1'2'" -1.236 111.2
-0.1236 -988.9
[
1.545 x 10-']
p
2
h
2
", 1.236
0.1236
.1'2 - hix, '" 56, x
2
'"
27.69
At r = 0.2, ( = 2,
+ hil',h
2
",404.5,
At r =0, {is I and hT [I
and (7.3.28) then give
[
10
8
1', = 10'/- 0

[
0.25
x,,,,U+
The updating equation (7.3.19) 1'01' p-I is worth closer inspection.
illuminating inlcrpretation is lhat p-l measures inrormation, and (7.3.
says how much informalioll aboul x is supplied by the new observalion Yr'
a Gaussian x" we can show that P,- 1 is in racllhc Fisher information
162
Figure 7.3.2 shows the evolution or x, and P, over six updates. Convergence is
rapid, and x" is [4.613 250.1 19.30]T. close 10 Ihe bate)] eSlimate
[4,592 250.2 18.97]'. As the inilial error in "0 is "bout 250, ils initial
estimated variance 10
4
uscd as elcment (2, 2) or Po is rather small. A guide is
that each principal-diagonal elemcnt of Po should not be smaller than the
square of the largest inilial crror in thal paramcter which would be
unremarkable. For a Gaussian random variable, lhe (J limits encompass
about or all samples, so lhc guide is reasonable.
'the main numerical ditliculty in this example is in calculling lhe gain Plhj';
t"
165
(7.4.4)
(7.4.3)
(7.4.5)
(7.4.6)
(7.4.7)
V'-'1 J "'_I
i=Ij,Ij-I, ... ,1
E'tended Least-Squares Algorithm
For t 1,2, ... , N
(i) Update h'_1 to h, as in (7.4.2), using v,_. for V,_I
T T I
(ii) P, =P,_I -'- P'_IIt,h, P
f
- J/(rr; + It, P,_I I,).
(iii) Calculate innovation v, = y, - h;rx,_I'
(iv) x, = x, _1 + P,h,l'J(J,
2
; x
o
' Po given.
Unknown early samples of v and perhaps II and y in early h,'s are taken as
zero. Alternatively, batch o.l.s. can be performed on the lirst few sets of
to stand in for the unknown V,_I to V,_q in It,. By doing so, we introduce an
indirect link between the earlier eSlimates x,_, and x," The asymptotic and
finite-sumple behaviour of the estimator is thereby altered, and we can no
longer be sure even that it is consistent. These worries are postponed to
Section 7.5. For now, it is enough to know that misbehaviour is possible but is
very seldom seen in practice.
The e'tended least squares (e.l.s.) algorithm became popular (Panuska,
1968; Young, 1968) under a variety of other names such as Pallllska'smethod,
'I .. , .. ; <.,'
r.m./. I (recursive ma,imum-likelihood I ; un.1. 2 is discussed later) and, When
used for noise-structure estimation as in Section 7.4.5, a.m.l. (approximate
maximull1-likelihood). A common variant of e.l.s. uses the innovation
in place or VI i' As x, _i-I is OIlC step Ollt of date, v, _j approximates v, i less
wellthun docs li,_i' amI the performance of the algorithm sufrers. There is
rarely any computational saving since the residuals are needed for model
validation anyway. SOllie aulhol's reserve the lH\llICS 1'.111.1. I for lhe version
using (7.4.4) and a.m.1. for that using (7.4.3).
where
II: = l.1',
:<=[-01 -lIlI b , bllJ ('I c'11 (7.4.2)
with {II I zerO-Blean, wilite alld usually ul'collstalll variance (J2. lrwe kllew v, _1
to V'_'I at time I, (7.3.27) and (7.3.28) would givc minimum-covariancc
estimates, unbiaseu as I', is ulH;orrclaleu with h,. As we stcp throug.h the
records updating X, we can generate residuals
7.4 RECURSIVE IDENTIFICATION INCLUDING NOISE STRUCTURE
(7.4.1)
(7.3.33)
F[ iJ I ". (D I ".)' ]
F, = ax, ( np(',) Dx, ( np(,,
= P,-I E[(x, - X,)(x
1
- X/ll',-l = P,- I
7 COMPUTATIONAL ALGORITHMS FOR IJ)ENTIFICATION
With this interpretation, the role or H/R,-' H, in (7.3.19) is clear. A larger
observation-noise covariance implies that y, brings less information about x.
A large Po is seen to signal that X
o
contains littlc information.
The idea of updating p-I rather than P has received a lot of attention in
state estimation (Bierman, 1977) but not apparently in identification. The
relative dimensions of y and x determine whether there is any computational
saving.
= -'-P,-I(X, - ',) (7.3.32)
and the Fisher information matrix for x, is
7.4 RECURSIVE IDENTIFICATION INCLUDING A NOISE-
STRUCTURE MODEL
We now examine several techniques whkh approximate the lIliJlil11l1l11-
covariance estimate whcn the noise is Hol white but can be JllOut.:llcll as linearly
filtered white noise. The notation that x is the paramcter vector and h the
"regressor" vector will be retained, to allow easy cOlllpariS<ll1 wilh the
preceding recursive algorithms and avoid confusion between a regressor
vector nand inputs ii. It also helps when, in Chapter S, we borrow further statc-
estimation methods.
The aim of this algorithm is to modify (7.3.27) alld (7.3.28) as little as possible
yet attain acceptably small covariance and bias in the prcsence of
autocorrelated noise. The algorithm takes D equal to A in the standard model
(7.1.3) like the batch m.1. algorithm. The result is (7.2.4), or in our present
notation
7.4.1 Extended Lellst Squares
164
since XI is the mean of the unbiased Xl and PI its covariance, Hence
D . rl
.- ((In p(x, = 1:- (consl - - ',)' '(x, - ',!)
ax, l. XI -
167
(7.4.17)
(704.16)
(704.15)
(7.4.14)
(7.4.13)
,
[H,,5,] "" 2 Lh,h!
l== I
, ,
DS, La", 'L T' '
",,-=2 -:;-.' v
l
=-2 (h,+[J)I,J xlv,
eX ox
I::: I /:= I
the Jacobian again. The Whole Hessian is Lhen roughly
RECURSIVE IDENTIFICATION INCLUDING NOISE STRUCTURE
It relates easily to P, in e.l.s., since rrom the information equation (7.3.19)
specialised to scalar output
7.4.3 Extended Lcasl-S(IUareS as Approximate /VlaxhnulII- Likelihood;
Hecursivc l\thlximullt-Likclihood 2
The gradiellt is
Talman and van den Boom advise exponenLial weighting-out of old
residuals, as in Section 8.1.3, to wcaken the influence of poor initial estimates.
\ '. i
where clement (i.j) or the Jacobian maLrix [J).,] is Dh'),)'<j' The Jacobian is
nOll-zero because x allcds 1:/_
1
to ii/_., in hI' which arc computed as
.J'/-j - h/I_jx, I::; i::; q. Precise evaluation of as/ax would mean going back
over the cntire recoru to recalculate every I; and II at each new value of x; the
algorithm would 1101 be recursivc. Ttl kecp the computation recursive and
simple, wc might ignore at lime Ilhe illllucm:e ors 011 all earlier I' ano It values,
neglecting even lJ)l
f
]. Only - 211A would remain in aSiax, and we could
regard h,l', in (7.4.7) as approximating -1as/iJx evaluated al x
r
_ I'
Element (i,j) or the Hessian matrix or 5, with respecL to " is
Extended least-sLjuares appears in a new light irwe think or (704.7) as a sLep of
a numerical search routinc to minimise the sUm of squares of residuals
(7.4.8)
(704.9)
_li,]T
(704.10)
-dl c'
,
c'
\ -0" b
l
7 COMPUTATIONAL ALGORITHMS FOR IDENTIFICATION
Y=-AY-I-B:-'U-I-(I +C')II/(I + f)
= -Al' + Bt-'U -I- C'V - DE -I- V
x=[-al
The e.l.s. updating equations are used to estimate
7.4.2 Extended Mutrix Method
a rcgrcssion cquation canbe written in lenl1s uf buth Eanu 1/ whit:h is linear in
all the anti noisc-lllo11el coellkicnts:
Assuming enough previous samples are available, approximate values of v
r
_
1
and "\_1 to update h'_1 can be by
e
t
_
1
= J'r- I + tl
l
Y,_l + ... + tlnJ',-u- I - hlll
r
- ... - I
(7.4.11)
V'_I ==-CIV,-2--iir-Q-1 + ... +cii,-r-t (7.4.12)
where the coeHicicllt estimatcs come from X,_I' Initially zcros are lIsed for e
and vin h, until e,_,_1 to e'_1 have all been round rrom (704.11). From that
point on, h fills wiLh values calculated by (704.11) and (704,12).
A weakness of c.l.s. is that the noise-model order Illay have to be high.
Coemcicnts C I to ("I arc the u.p.r, ordinate offhc filter which shapes the noise.
II' this transrer runcLion, more generally (I + C)( I + A)/( I + D), includes a
z-plane pole just inside the unit circle, the u.p.r. has a long Lail and" has La be
large. Estimating an a.r.m.a. noise model avoids this problem (Talmon
van den Boom, 1973). WiLh
C)(I...+ A) V g, 1+ C' V
I +D, -I-D
166
samples La get "0' Po and the earliest residuals, bUL the exLra programming
efrort is scarcely worthwhile.
A constant tcrm can be added to the motlel by adjoining I as an extra
element of hI' to go with the unknown constant as all element of x. The extra
term copes automutically with unknown means or 11{ J. 1.1' J anull
J
l providing
they do not change too quickly. DClcnninislk trcnds, c .g. ralllps,l.:ouid <llso be
accommodated by extra regression terlllS, but we shall lind lllore llcxiblc and
elegant methods ill Section 8.1.
MosL or the resulLs in Chapter 10 were obLained by e.l.s.
..
q /-q
7.4.4 Stochastic ApJlfoximution
169
(7.4.22)
" "(7:4.23) II1(X) = Y
Stochastic AIJIJi'oximatioll Algorithm
(i) Update "'_I to h,.
(ii) Calculate innovation \', = y, - hrx,_ I'
(iii) x, = +)j/h/",
forl=1,2, .. :,N; Ir}givcn.
Stochastic approximation (Robbins and Monro, 1951) originated as a trial-
and-error technique for finding the root x of a scalar equation
correction ga in some other way, while keepi ng the corrector form of updating
for x, but only at the price 01" a larger actual covariance for x. Unbiaseuness,
by contrast, ducs 110l uepcllll on the choice of correction-gain matrix, K, in
(7.3.5), proviuing the rcgressors arc not correlatcu with the noise.
The simplest schcme would be to replace in (7.3.28) or (7.4.7) by a
predetefillincu positive scalar )'
1
, giving a stochastic approximation (s.u.)
algorithm.
from noisy observations 1'-'(:'-/), 1 = 1,2, ... , where the forms of m{ . ) anu the
noise p.d.f. arc not necessarily known, but the noise has zero mean. Successive
trial values :i, are found by the scalar version of (7.4.22) with y -IIJ(_Y,_I)
for h, v/. A uistincti vc feature 01" s.a. is lha t m( . ) need not be paramcterised, so
its adoption ror parameter estimation is ironical.
As the correction to xis still proportional to h/\'1' we can interpret S.H. as
another gradient-based method for minimising S,. The step-size factors {J'}
have to be both large enough for xto reach the correct value and small enough
finally for xto stay there. As 1', depends on the noise, h, v, is a random variable
in general, a finite covariance, so {x Ican settle to a zero-covariance limit
as I-J> (fJ only if )l,-J>O. In c,l.s. the corrections uecrease automatically as PI
decreases. A sullicient condition for {x} to converge w.p.1 (and in mean
square) is that I:= I y? stays finite as I-J> if..J. The commonest choice for }'" Ct-C(
with! < 0::S: I and C a positive constant, satisfies this condition. The sequence
h
1
} can decrease more slowly without losing convergence if the noise is
bounded, which it is in actual observations; 0 < a.:s 1 is allowed (Ljung,
1977a). II' {r) is to decrease slowly enough for {xJ to reach the corrcct value
however far away il starteu at x
O
, I:= I)'I must tend to infinity as 1- 00.
There isa large literature on how to select I)' I. but s.a. incurs such a penalty
in slower convergcnce through replacing a Illulrix-tlcpcnden l LI pua ting gain by
a gain uepenuil1g on a scalar that it is useful only as a laSl resort, when
computing power is very limitcd. One or the original aLlnictions of s.a., the
7.4 RECURSIVE IDENTIFICATION INCLUDING NOiSE STRUCTURE
7 COMPUTATIONAL ALGORITHMS FOR IDENTIFICATJON 168
Consequently -h, is rJy,/rJx filtcred by 1+ C, and rJ,;,/rJx is obtainable by
filtenng - h, by 1/(1 + C), instead of using - h, for rJ';,/Dx as we did fOlrmerlv:
The modification improves the asymptotic properties of
consistency for all parameter values, which c.l.s.laeks (Ljung, Soderstrom and
Guslavsson, 1975). The modified algorithm is' often called Lm.1. 2
distinguish it from Lm.1. 1, which is e.l.s.
The actual performance of 2 at realistic 5.11.1'. is an object lesson in
dangers of on asymptotic results. The filtering by 1/( I + C) is often
unstable, as C is initially poor (Norton, 1977). Even when it is not unstable, the
filtering is liable to be counterproductive, because it uses (7.4.21) with old
values from q p:evious steps as ev
l
_ I/O.);; to ov
l
_.,IrJ.t;. Hence clvl/D.':; is still n.ot
accurate, espeCIally When xis changing rapidly from step to step, and errors
persist and accumulate whenever 1/( I + C) has one or lIlore poles close to
unit circle. A stability check is therefore no guarantee of good perfClrtllarlce.
we have for each element .f; of x
elt:
I
= _ (elem.en l i) _ _ ... _
U_\; 01 h, ax; ., D.f;
1', -hili, k 1- -h u
11.-/1 _.- m/-k-m
X='x,_ I + P,h, v,/cr
2

., I I
We recognise this as a Newton-Raphson step (Adby and
towards minimising S,.
Extended least-squares can be modified relatively cheaply to include the
Jacobian in the gradient calculation (7.4.14) by finding an expression for
Jvt/J" which does not necessitate going back over the whole record
(Soderstrom, 1973). Since
so that if thc noisc variance 0-1 is constant :'lIIU Po is large,
The heaviest computational task in e.J.s. Hill.! rclatcu lllelhous is upuating the
covariance. We could dispense with the covariance upoating and calculate the
t Sbderslrbm and Sloica, 1983; Young, 1974, 1984.
7.4.5 Recursive IlIslrumelllal-Variahle Algorithmt
171
6
6
o
o

o
o
,

o
o
o

o
o

o
o
o

o
o

o
o
o
'--__&-___ __--"-O__ __..J
o 6
10.
20 (ii)
o
o
60
120 1,1
1000 -
2000 -
3000 (Iii}
-1000
Fig. 7.4.1 Slochllslicapproximalion estimates, Example 7.4.1, where 0: XI'; Do: x 2'; 0: xJ,.
}',= l/t, (ii) i', == 1/10t and (iii) Y
l
= 10ft.
RECURSIVE IDI'NTIFICATION INCLUDING NOISE STRUCTURE
noise models of the alternatives are only as good as the parameter estimates
they depend on. Early eSlimates are poor, so bias occurs initially, declining,
one hopes; as more observations are processed. It is not at all obvious that any
of the mClhods will indeed converge lo unbiascd parameter estimates wilh
acceptable cnvarlullcc. Wc pursue convergence furthcr ill Sectioll 7.5.
The recursive illstrutllelltul-variable (r.i.v.) algorithm lo be described has a
distinctive realure, separation or planl-paralllctcr from noise-parameter
estimation. The separation allows a free choice of noise-parameter estimation
7 COMPUTATIONAL AUWIUTIIMS J'UR IDENTIFICATION
The e.I.s., e.m.m. and r.m.1. 2 algorithms cirCUmvel1led Ihe bias due to
correlation between regressors and regression-equation noise by integrating a
noise mOdel into the regression equation. The remaining noise-generating
term v, was while and uncorrelated with the input, and as a resull uncorrelated
with the regressors. An alternative stratagem, seen in balch form in Section
5.3.6, is to replace the noise-correlated regressors by instrumental variables,
which are totally uncorrelated with the noise. They musl also be strongly
correlated with the information-bearing regressors to yield low-covariance
estimates. The idea can be implemented recursively, but it shares a weakness
with the alternative recursive methods. The instrumental variables and the
Example 7.4.1 We shall repeat Example 7.3.4, bUI using the S.a. updating
equalion (7.4.22) with
(i) y, = I/t so that lhe updale al I = I is the same as in Example 7.3.4;
(ii) 1', = 1/IOt;
(iii) y, = 10/1.
Figure 7.4.1 shows the resulting estimates.
Case (i) shows clearly the ill elfeets of restricting the correction 10 be a scalar
times hi; the initial-position estimate rises far too rapidly and the initial
velocilY far too slowly. The resulls also underline lhe fact lhat consistency
(ensured by this choice of lr J) is no guarantee of good, or even tolerable,
small-sample behaviour. Case (ji) shows the c1rCcls or observation noise less,
but is even slower to converge llmll case (i). Case (iii) illustrates the serious
consequences or making )', larger. We call explain them by examining the
rccursion resulting from (7.4.22),
XI =(1- )'lh,h/)X,_1 + )',h,I',
where xis x-x. Thesum of the eigenvalues of the transition matrix 1 - r,b,llt is
its trace p - }',hnl/.lf 1'/ is large enough and hihl not too small, at least one
eigenvalue must be below - I. The recursion slep is lhen uns\.able, and
contributes 10 divergence or x. D.
17U
relalive ease with which convergence can be guaranteed, has lust its
significance now that the convergence properties of morc ambitious
algorithms are beller understood (Ljung, 1977a; Goodwill and Sin, 1984;
Solo, 1979).
(7.4.31)
If
l
= II; -I- (iju;_, + ... -I-
(7.4.29)
k III'
0
i",
h
m
0 0
- [
0 0 Ii I b
m
0 0
[ J
"i:
0 lJ
lII
1I;_k_1
.1'1-/1
l/=
II,
I a, (/"
0 0

, ,
0 I au 0 0
II, ,
'"
(J
all
== (iIi;
( 7.4.30)
II', = -dill', 1=' ,. - d"I1'",!, -I- ('It',_ I -1- ... c"V""'1 -/- P,
where II; is colullln t of an alternative i. v. Illatrix entirely
of {II' I. Iltllrlls ollithal Gin (7.4.30) (a Sylvester IIIl1tfix) is inverLible unlessB
and I +.11 have a zero in comlllOIl, so u; gives precisely the same parameter
estimates as zl' algebraically. There may, however, be numerical ditrc.rences,
especially before the effects of initial unknown values subside in the recursion
giving {II' J from lu I
The auxiliary Illodel may be updated afler every planl-parameter update,
or less Oneil. For instance, the algorithm may be iteraled, performing repeated
passcs through lhc records with the auxiliary model updated at the cnd or
each. I tcratioll of a recursive algorilhm is clumsy and may be too slow for real-
time identification, but it may help in gelling usable results from short records.
If the final 114 or P 1'1'0111 otic itcration is carried oyer as the initial value for the
nexl, the records are being lrealed arbitrarily as periodic with period equal to
their originallenglh.lnstead, it has been suggesled (Young, 1984) that in Li.v.
each ileration be initialised with M from (l1/)th of the way through lhe
previous iteration, Where I itcralions in all arc performed. This secms equally
arbilrary and resulLs in heavy depenJenee of thdinal M on the firsl (111)th of
the record, bul appears lO yield gooJ resulls.
Approximalc maximum-likelihood estimates or the noise paramcters arc
obtained from the Illodel (7.1.5), written us the a.Ln1.a. model
we can wrile column I of 2'1 as
7.4 RECURSIVE JlJEN"fIFICATION INCLUlJINU NOiSE STRUCTURE 173
A surprising consequence is lhal the numerator iJ of the auxiliary mouel has
no encct on the i.v. estimates. To see this, consider the sequence {tI' \ produced
by IiIlering 1111 with I/( I + Since
(7.4.26)
(7.4.25)
an h,
J'1l
7 COMPUTATIONAL ALUUIUTIIMS I'UR IDENTIFICATION
h,""[Y,
Recursive Instrumental-Variable Algorithll1
For ( = I, 2, ... , N
(i) Updale I and I Lo h, ami z,.
(ii) A4
1
=Al'_1 -/vl, _ Il,h/ AJ,._ d(l +11/1'11, IZr)'
(iii) Calculate innovation ", = y, - h}'x,_ I'
(iv) XI = x,_, + ,Hll/V
I
;
mechanism for generating III given; Xu' A/
o
given;
172
Here z_t is column t of Z^T. We see from (7.4.25) that M_t is not symmetric, and of
course it is no longer the error covariance of x̂_t.
We have yet to specify z_t. Intuitively, the ideal thing would be to replace the
noisy output samples in h_t, which cause all the trouble, by "clean" values ŷ_{t−1}, …, ŷ_{t−n}
computed, as in the output-error approach of Section 7.1.3, by

ŷ_t = −â₁ŷ_{t−1} − ⋯ − â_n ŷ_{t−n} + b̂₁u_{t−k−1} + ⋯ + b̂_m u_{t−k−m}        (7.4.27)

We do not know the plant parameters, and the best we can do is use the
estimates. More generally, we can generate an i.v. sequence as the output of
an auxiliary model.
We have some freedom in generating the instrumental variables, since θ̂ in
(7.4.24) is unaffected by any non-singular linear transformation of Z, i.e. by
any linear reversible filtering of the sequences of samples forming Z, for if we
premultiply Z^T by G,

θ̂′ = [GZ^T U]⁻¹GZ^T y = [Z^T U]⁻¹G⁻¹GZ^T y = θ̂        (7.4.28)
method, but the best-tested is a.m.l., the a.r.m.a. version of e.l.s. More
significantly, splitting the estimation of an (m + n + q + r)-vector of parameters
into separate estimation of an (m + n)-vector and a (q + r)-vector saves
computing, as the computing demand rises more than linearly with the
number of parameters.
The mechanics of r.i.v. plant-parameter estimation are straightforward.
From (5.3.6), the batch i.v. estimate is

θ̂ = [Z^T U]⁻¹Z^T y

with Z the matrix of instrumental variables. We need only replace h_t at the appropriate
points in the algebra leading to (7.3.21) and (7.3.25), (7.3.27) and (7.3.28) by z_t to
obtain the algorithm.
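For comparison, the batch estimate itself is a single linear solve; the sketch below (hypothetical data, not from the text) forms it by solving Z^T U θ = Z^T y rather than inverting Z^T U:

```python
import numpy as np

# Minimal sketch: batch i.v. estimate theta = [Z^T U]^{-1} Z^T y with made-up data.
rng = np.random.default_rng(1)
N = 200
U = rng.normal(size=(N, 2))                    # regressor matrix
Z = U + 0.1 * rng.normal(size=(N, 2))          # instruments: correlated with U, not with the noise
theta_true = np.array([1.5, -0.7])
y = U @ theta_true + rng.normal(size=N)        # noisy observations
theta_iv = np.linalg.solve(Z.T @ U, Z.T @ y)
print(theta_iv)                                # close to theta_true for large N
```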
r_t(x) ≜ E[V_t(x)] ≜ E[ Σ_{i=1}^{t} f(y_i − ŷ_i, x) ]        (7.4.34)

by noting that in the corresponding difference equation

y_t = −a₁y_{t−1} − ⋯ − a_n y_{t−n} + b₁u_{t−k−1} + ⋯ + b_m u_{t−k−m} + c₁v_{t−1} + ⋯ + c_q v_{t−q} + v_t
The way the simplified view of identification as predictor-building allows us
to defer consideration of the underlying model and the process generating the
records has led to strong emphasis on prediction-error algorithms recently
(Ljung and Soderstrom, 1983). These algorithms aim to minimise a scalar risk
function such as (7.4.34). In z-transforms,

Ŷ = Y − V = −AY + Bz^{−k}U + CV = (C − A)Y + Bz^{−k}U − CŶ

so the one-step-ahead predictor is a recursion

ŷ_t = c′₁y_{t−1} + ⋯ + c′_s y_{t−s} + b₁u_{t−k−1} + ⋯ + b_m u_{t−k−m} − c₁ŷ_{t−1} − ⋯ − c_qŷ_{t−q}

where c′_j is c_j − a_j and s is max(n, q). Although the predictor came from the
model (7.2.3), it could equally have been devised empirically or simply chosen
as a convenient structure. For prediction, the parameter vector which must be
identified is [c′₁ ⋯ c′_s  b₁ ⋯ b_m  −c₁ ⋯ −c_q]^T. However, the
original model implies some redundancy in these parameters if q is greater
than n. △
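A direct transcription of the predictor recursion may help; the sketch below is only illustrative (coefficient values and data are hypothetical, and the coefficients are assumed known rather than estimated):

```python
# Minimal sketch: one-step-ahead predictor y_hat_t from past y, u and past predictions,
# with c_prime[j] = c[j+1] - a[j+1] and s = max(n, q); values below are hypothetical.
def predict_one_step(y_past, u_past, yhat_past, c_prime, b, c, k):
    """y_past[i] = y_{t-1-i}, u_past[i] = u_{t-1-i}, yhat_past[i] = yhat_{t-1-i}."""
    y_hat = sum(cp * y_past[j] for j, cp in enumerate(c_prime))
    y_hat += sum(bj * u_past[k + j] for j, bj in enumerate(b))   # u_{t-k-1-j}
    y_hat -= sum(cj * yhat_past[j] for j, cj in enumerate(c))
    return y_hat

# first-order example: a1 = 0.5, c1 = 0.3, b1 = 1.0, k = 0
print(predict_one_step(y_past=[2.0], u_past=[1.0], yhat_past=[1.8],
                       c_prime=[0.3 - 0.5], b=[1.0], c=[0.3], k=0))
```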
The loss function f measures the error in predicting output y_i one sample
ahead. The expectation is over the observations up to t, with the parameter
vector x fixed. The algorithms actually minimise V_t(x) in lieu of r_t(x) by
adjusting x according to the derivatives of V_t(x) with respect to x, just as
r.m.l.2 did in Section 7.4.3. They consequently require each output prediction
to be a known and sufficiently differentiable function of x. The algorithms'
inability to compute r_t(x) without making assumptions about the p.d.f. of the
observations does not matter asymptotically (Ljung, 1981), for under weak
assumptions the x̂_t that minimises V_t(x) tends w.p.1 to the x that minimises
lim_{t→∞} r_t(x), as t tends to infinity.
If A, B, C and k are known, the only quantity on the right-hand side of this
difference equation not known at instant t − 1 is v_t. Since Ev_t is zero, the
conditional-mean predictor ŷ_t is y_t − v_t, obtained by setting v_t to zero and
using the known values for all the other terms making up y_t. We can avoid
working out the terms in v_{t−q} to v_{t−1} explicitly by expressing ŷ_t in terms of
u's, y's and ŷ's up to t − 1, in z-transforms.
h_t^{nT} = [ŵ_{t−1} ⋯ ŵ_{t−r}  v̂_{t−1} ⋯ v̂_{t−q}],        x^n ≜ [−d₁ ⋯ −d_r  c₁ ⋯ c_q]^T
where superscript n denotes noise. Both {w} and {v} are unobservable, of course, and
have to be estimated, as in (7.4.32).

7.4.6 Prediction-Error Algorithms

Example 7.4.1 A one-step-ahead predictor of the output can be developed
from the regression-equation model (7.2.3):

Y = −AY + Bz^{−k}U + (1 + C)V
With a few exceptions, we have examined algorithms which give parameter
estimates with zero bias and minimum covariance. As notet! in Section 5.3.2,
the attnlction or minimum parameter-estimate covariance \is that it
guarantees, for any scalar quantity lincar in the parameters, the smallest
variance and m.s. error obtainable with Zero bias. Il thereby kills two birds
with one stone; it gives a minimum m.s. crror both in each illl.lividual
parameter and in the output at fixed valucs of the explanatory vatiablcs.
To find the parameter-estimate covariance, we must spccify not 'only the
estimation algorithm but also the rorlll or the model al1d the system actually
generating the records. However, whcn the model is intendcu for prediction,
we are not interested in the parameter estimates or moucl structure for their
own sake, and can concentrate on prediction performance. In comparing the
performance of a predictor estimated by u particular algorithm with the best
Hllainable by that type or predictor, we need not specify either the original
model from which the predictor arose or the actual plant and noise dynamics.
In these circumstances, it is silnplestto regard the predictur itself as the model,
and think of identification: as predictor-building. The adequacy of the
parameter estimates fur purpusl:s other than preuictiull can lhcn bc len as a
separate issue.
for i = 1, 2, …, r and j = 1, 2, …, q        (7.4.33)

where ŵ is found as in (7.4.27) with estimated plant parameters, and v̂ is
defined analogously to ŵ.
A long series of refinements and extensions to the basic r.i.v./a.m.l.
combination has been reported (Young and Jakeman, 1979, 1980; Jakeman
and Young, 1979), and supported by extensive results from real and simulated
records.
Example 7.5.1 Extended least squares is based on

y_t = h_t^T x + v_t

in which y_t and the regressors h_t form

φ_t = [y_t  y_{t−1} ⋯ y_{t−n}  u_{t−k−1} ⋯ u_{t−k−m}  v̂_{t−1} ⋯ v̂_{t−q}]^T

To obtain φ_t, elements 2 to n + 1, elements n + 2 to n + m + 1 and elements
m + n + 2 to m + n + q of φ_{t−1} are shifted down one place, and the new values
y_t = h_t^T x + v_t, u_{t−k−1} and v̂_{t−1} are inserted.
The evolution of φ in a linear system is altogether described by

φ_t = K(x̂_{t−1})φ_{t−1} + L(x̂_{t−1})e_t        (7.5.4)
and write an equation stating how φ_t evolves from φ_{t−1}. To obtain φ_t from
φ_{t−1} we delete the oldest samples of input, output and, if present, noise
estimate, shift the rest down one place and insert the new values. Unless the
new input sample can be treated as deterministic, it is modelled as a function of
earlier input samples and a white driving sequence and, if feedback is present,
earlier output samples. In self-tuning controllers the control input at time
t − 1 depends on the parameter estimates x̂_{t−1} used in the control
computation, and this relation must be included in the equation for φ_t. The
new output y_t depends on the rest of φ_t via the system dynamics, and the new
noise estimate v̂_{t−1} depends on x̂_{t−1} and part of φ_{t−1} through the model, e.g.
Parameter-updating equations (7.3.28) for recursive minimum-covariance
estimation, (7.4.7) for e.l.s. and r.m.l., (7.4.22) for s.a. and (7.4.26) for r.i.v.
are all of the form

x̂_t = x̂_{t−1} + γ_t g_t(y_t − h_t^T x̂_{t−1})        (7.5.1)

where γ_t is a scalar gain, sometimes 1. Let us collect the regressor vector h_t and
output y_t into the single vector φ_t = [y_t  h_t^T]^T. In (7.5.4), K and L depend on
x̂_{t−1} in general but not in some of the simpler algorithms, and {e} is a white sequence.
Minimisation of I',(x) for a given predictor structure provides initial
motivation for a prediction-error algorithm, but as 50011 as we enquire
whether a difTerent predictor structure might be belter, we have to discuss the
process actually generating the records. Statements can be made about the
asymptotic covariance of x" i.e. the eHicicncy of the algorilluI1, at the price of
assuming that the model underlying the prcuictor can explain the
observations, in the sense thal the observations could have been generated by
the model with some true value of x.
A recursive prediction-error algorithm call be developed in quite a general
form (Ljung and Soderstrom, 1983) and then specialised by choice of loss
function, model structure and detailed implementation. Many
algorithms can be obtained in this way as prediction-error or approximate
prediction-error algorithms. This reinterpretation may suggest modifications
to improve asymptotic properties, as whcn c.l.s. was mollified into LIll.J. 2.
The prediction-errol' approach is direct and elegant when the problem really is
prediction, but it is also valuable because it allows degree or specialisation to
be traded against power or aSYI1l plotic COil vcrgclll.:e and dIicicm:y results when
the algorithms arc analysed.
Since the early 1970s, several methods have been developed for convergence
analysis of a wide range of recursive identification algorithms. Their
development was stimulated by two trends, a proliferation of recursive
algorithms with similar updating structure (Astrom and Eykhoff, 1971) and a
rebirth of adaptive control in the shape of self-tuning controllers (Astrom and
Wittenmark, 1973). Self-tuning controllers combine a recursive identifier and
a control law based on the updated model, as will be shown in Fig. 8.3.2 and
covered in Section 8.3.3. A convergence analysis was needed for such closed-
loop systems as well as systems with input independent of earlier output. The
analysis outlined below (Ljung, 1977a) is general enough to cover both.
As even an informal account of the analysis is quite complicated, and its
implications are much easier to see in specific examples, a careful reading of
Examples 7.5.1–7.5.3 is strongly advised.
*7.5 CONVERGENCE ANALYSIS FOR RECURSIVE IDENTIFICATION ALGORITHMS

7.5.1 Formulation of an Archetype Algorithm

Our first aim is to fit a broad class of recursive algorithms into a single
framework, the archetype algorithm. It has to encompass a variety of
recursive schemes, including o.l.s., e.l.s., s.a. and the commonest self-tuning
controllers. It consists of two difference equations, one the parameter-
updating equation and the other relating current samples of the output and
regressors to earlier samples. Covariance updating will be added later.
φ_i = Π_{j=1}^{i} K(x̂_{j−1})φ₀ + Σ_{k=1}^{i} ( Π_{j=k+1}^{i} K(x̂_{j−1}) ) L(x̂_{k−1})e_k

≈ Σ_{k=1}^{i} K(x̂_t)^{i−k} L(x̂_t)e_k ≜ φ̄_i(x̂_t), say
We can make this look very like (7.5.1) by writing it as

P_t⁻¹/t = P_{t−1}⁻¹/(t − 1) + (1/t)( h_t h_t^T/σ_t² − P_{t−1}⁻¹/(t − 1) )

For each column r̄_{ij} of P_t⁻¹/t, defining γ_t as 1/t, we obtain an update of the
same form. We then need only define an augmented "parameter" vector consisting of
x̂_t followed by the columns r̄_1 to r̄_p to make (7.5.8) and (7.5.9) into (7.5.5).
Example 7.5.2 illustrates the procedure for e.l.s.
7.5.2 Asymptotic Behaviour of Archetype Algorithm

Rigorous analysis of the archetype algorithm is complicated and difficult, but
its spirit is conveyed by the following loose and heuristic argument.
In (7.5.4), the influence of the forcing e at one instant on later φ's will
eventually fade until negligible in almost any conceivable practical
circumstances, and so will the effect of the initial value φ₀. Also, if the
parameter estimates converge, x̂_t − x̂_{t−1} gets very small as t increases. (We are
not embarking on a circular argument along the lines "if it converges then … it
converges", as we shall be enquiring into the value to which x̂ converges and its
manner of convergence, not whether it converges.) We should not then be too
far out if, in calculating φ_i at any time i close to a large time t, we replace all
earlier x̂ values by x̂_t and ignore the effect of φ₀, giving the approximation
φ̄_i(x̂_t) above. Furthermore, if {x̂} is stationary and K(x̂_t) has only stable
eigenvalues, φ̄_i(x̂_t) will become a stationary random vector as i increases.
For the e.l.s. example, φ_t can be written as

φ_t = Ψφ_{t−1} + Fφ_t + Θφ_{t−1} + Je_t, say,

where the partitioning separates samples of y, u and v̂, the forcing e_t is
[v_t  u_{t−l−1}]^T, and Ψ accounts for the downward shift of most elements from
φ_{t−1} to φ_t. Hence (7.5.4) for this algorithm and input is

φ_t = [I − F]⁻¹[(Ψ + Θ)φ_{t−1} + Je_t] = K(x̂_{t−1})φ_{t−1} + L e_t
In (7.5.1), g_t depends on h_t directly, and in most algorithms also indirectly
through P_t or M_t. The correction term in the parameter-updating equation is
consequently a complicated function of φ_t, so we write

x̂_t = x̂_{t−1} + γ_t Q(φ_t, x̂_{t−1})        (7.5.5)

where Q(·, ·) is a vector of functions we must specify, along with the exact
contents of φ, for each algorithm analysed. If {u} is, for instance, an
autoregression of order l < m, the white sequence driving it forms part of e_t.
The convergence analysis applies to the archetype algorithm (7.5.4) and
(7.5.5). Covariance updating is adjoined to (7.5.5), as we see shortly. The
analysis is difficult because (7.5.4) and (7.5.5) are coupled; in particular, x̂_t
depends not only on x̂_{t−1} but also, through φ_t, on all earlier x̂'s in general.
Let us now see how l.s. covariance and parameter updating fit into (7.5.5).
As usual it is easiest to consider the information-updating equation

P_t⁻¹ = P_{t−1}⁻¹ + h_t h_t^T/σ_t²

where L does not depend on x̂_{t−1} but K does; both depend on x̂.
Only v̂_{i−1} to v̂_{i−q} are affected, in fact.
Denoting E[h̄_i(y_i − h̄_i^T x̂_t)] by d(x̂_t) and E[h̄_i h̄_i^T] by G(x̂_t), we obtain

f(x̂_t) = E[Q(φ̄_i(x̂_t), x̂_t)] = [ R_t⁻¹ d(x̂_t)/σ² ;  stacked columns of G(x̂_t)/σ² − R_t ]

The differential equation (7.5.15) is then

ẋ(τ) = R⁻¹(τ) d(x(τ))/σ²

for the x part of f, and for the remainder, writing the columns of R and G side
by side once more,

Ṙ(τ) = G(x(τ))/σ² − R(τ)
For (7.5.15) to approximate the large-sample behaviour of {x̂} adequately,
some regularity conditions must be imposed on {γ}, K, L, {e} and Q. Without
wishing to go into them in detail, we should look briefly at them to see that
they are not very restrictive. They are:
(i) {e} must be a sequence of independent random variables, not
necessarily stationary although we took them to be so in the heuristic
argument.
(ii) Either (a) e_t must be bounded w.p.1 at all t (real-life disturbances and
forcing are bounded), and Q must be continuously differentiable with respect
to φ and x̂, the derivatives being bounded at all t, or (b) to permit analysis with
e_t not bounded, other conditions on e_t and Q must be assumed (Ljung, 1977a).
(iii) K and L must be Lipschitz continuous (Vidyasagar, 1978).
(iv) Q must be such that Df(x̂_t) exists.
(v) {γ} must be a decreasing sequence with Σ_{t=1}^∞ γ_t infinite, Σ_{t=1}^∞ γ_t^α finite
for some α, and 1/γ_t − 1/γ_{t−1} finite as t → ∞.
The conditions on Q, K and L have to be met over some region of x̂, say D_R,
throughout which K has only strictly stable eigenvalues.
Assuming no redundancy among the regressors in h̄_i, G is positive-definite and,
for bounded input and output, finite. So therefore is the asymptotic R. Hence
R⁻¹ is finite, so d must generally converge to zero. That is to say, the residuals
y_i − h̄_i^T x̂_i are asymptotically uncorrelated with the regressors h̄_i. We recognise
this as the probabilistic counterpart of the orthogonality property of l.s. △
The converged state is given by ẋ and Ṙ both zero, i.e.
This can be viewed as the discrete-time version of a time-invariant differential
equation.
Example 7.5.2 We continue with the e.l.s. algorithm, defining x̄_t as x̂_t
augmented by the stacked columns r̄_1 to r̄_p of P_t⁻¹/t, and R_t as P_t⁻¹/t (R_t is
Ljung's notation, unrelated to regression-equation noise covariance). The
joint parameter and covariance updating equation (7.5.5) for x̄_t then has

Q(φ̄_i(x̄_t), x̄_t) = [ R_t⁻¹ h̄_i(y_i − h̄_i^T x̂_t)/σ² ;  stacked columns of h̄_i h̄_i^T/σ² − R_t ]
where σ² is constant as {e} has been assumed stationary, and h̄_i is what the
regressor vector would be if the fixed value x̂_t were used for evaluating K and L
at every update from the indefinite past to time i.
The asymptotic behaviour of {x̄_t} can then be investigated by solving the
differential equation numerically or, with luck, analytically.
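For instance, a scalar case can be stepped forward by Euler integration in the transformed time τ; the sketch below is purely illustrative (the form of f is a hypothetical stand-in of the kind met in Example 7.5.3, where d(x̂) is linear in the parameter error):

```python
# Minimal sketch: Euler integration of the mean ODE dx/dtau = f(x),
# with the hypothetical scalar f(x) = G * (x_true - x).
G, x_true = 2.0, 1.5
f = lambda x: G * (x_true - x)
x, dtau = 0.0, 0.01
for _ in range(2000):
    x += dtau * f(x)          # one Euler step in transformed time
print(x)                      # approaches x_true, the stable equilibrium of the ODE
```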
x̂_{t+s} − x̂_t ≈ Σ_{i=t+1}^{t+s} γ_i Q(φ̄_i(x̂_{i−1}), x̂_{i−1}) ≈ Σ_{i=t+1}^{t+s} γ_i Q(φ̄_i(x̂_t), x̂_t) ≈ f(x̂_t) Σ_{i=t+1}^{t+s} γ_i        (7.5.13)

This last approximation is clearly acceptable if we are interested in the
behaviour of x̂, since the term neglected has zero mean. It is not clear without a
more careful analysis how far (7.5.13) is true for individual realisations of {x̂}.
We can tidy up (7.5.13) without essentially altering it by the trick of treating
γ_i as the increment of a transformed time τ, giving

ẋ(τ) ≈ f(x(τ))        (7.5.15)
At large enough i, Q(φ̄_i(x̂_t), x̂_t) also becomes a stationary random variable by
virtue of its dependence on φ̄_i (with x̂_t regarded strictly as a parameter, not a
variable, at the moment). We can thus express Q in terms of its deviation p_i
from its mean f(x̂_t):

Q(φ̄_i(x̂_t), x̂_t) = E[Q(φ̄_i(x̂_t), x̂_t)] + p_i ≜ f(x̂_t) + p_i

We can now approximate the change in x̂ over s steps from time t using (7.5.5):
Theorem 2 says that if x̂_t has non-zero probability of converging to within a
distance ρ of some point x* however small ρ is, then x* must be a stable
equilibrium point of (7.5.15). It applies so long as cov Q(φ̄(x*), x*) is positive-
definite and f(x̂_t) has continuous derivatives throughout a neighbourhood of
x*, the derivatives converging uniformly as t → ∞.
An algorithm therefore cannot give consistent estimates unless the true
value x is a stable equilibrium point of (7.5.15).
Example 7.5.3 (Ljung, 1977b). In Example 7.5.2 we found that for e.l.s. d(x̂_t)
is zero in the converged state. Let us check whether the actual x is an
equilibrium point by examining d(x̂_t). By definition

d(x̂_t) = E[h̄_i(y_i − h̄_i^T x̂_t)] = E[h̄_i ε_i]

where
ε_i = y_i − h̄_i^T x̂_t = h_i^T x + v_i − h̄_i^T x̂_t = x^T(h_i − h̄_i) + h̄_i^T(x − x̂_t) + v_i

Now in e.l.s. h_i − h̄_i is zero except for the last q elements, v̂_{i−1} − v̄_{i−1} to
v̂_{i−q} − v̄_{i−q}, so

ε_i − v_i + x^T(h̄_i − h_i) = ε_i − v_i + c₁(v̄_{i−1} − v̂_{i−1}) + ⋯ + c_q(v̄_{i−q} − v̂_{i−q})

We therefore get ε_i − v_i if we pass h̄_i^T(x − x̂_t) through the filter 1/(1 + C(z⁻¹)), i.e.

ε_i − v_i = h̃_i^T(x − x̂_t)

where h̃_i is h̄_i filtered by 1/(1 + C(z⁻¹)). Consequently,
d(x̂_t) = E[h̃_i(h̄_i^T(x − x̂_t) + v_i)] = E[h̃_i h̄_i^T](x − x̂_t) = G(x − x̂_t)

where G is E[h̃_i h̄_i^T], itself a function of x̂_t. Here we are regarding x̂_t as a
parameter, not a random variable, and E[h̃_i v_i] is zero since {v} is white. Any
converged value of x̂_t has to make d, and hence G(x − x̂_t), zero. One such value
is x, but there may be others.
We discover how any other x̂_t might make d zero by looking at
(x − x̂_t)^T G(x − x̂_t), which is zero whenever d is zero. Defining ε̃ and ε̄ through

(x − x̂_t)^T G(x − x̂_t) = E[(x − x̂_t)^T h̃_i h̄_i^T(x − x̂_t)] = E[ε̃_i ε̄_i]

we see that when d is zero, the cross-correlation at lag zero between ε̃ and ε̄ is
zero. As ε̃ is ε̄ filtered by 1/(1 + C(z⁻¹)), the cross-correlation is easily written
in terms of the power spectral density Φ_ε̄(jω) by Parseval's theorem:

E[ε̃_i ε̄_i] = (T/2π) ∫_{−π/T}^{π/T} Φ_ε̄(jω)/(1 + C(e^{−jωT})) dω

Like any auto-spectral density, Φ_ε̄(jω) is real and non-negative at all ω, so the
integrand is negative only when Re{1/(1 + C(e^{−jωT}))} is. We can say more
about the x̂_t values which make d zero if we assume Re{1/(1 + C(e^{jωT}))} to be
positive for all ωT between −π and π, i.e. 1/(1 + C(z⁻¹)) to be strictly positive
real. If it is, d is zero only if Φ_ε̄(jω) is zero over this range of ω,
implying that ε̄ has zero power. That is, e.l.s. can converge at some x̂ other
than x only if (x − x̂_t)^T h̄_i is zero at all instants i w.p.1. Since (x − x̂_t)^T h̄_i is ε̄_i − v_i,
we conclude that the residuals {ε̄_i} of the converged model are identical to {v_i}.
The model output thus coincides with that given by
7.5.3 Convergence Theorems for Archetype Algorithm
Theorem 3 says how closely {x̂} follows a trajectory of (7.5.15). It states that if
the solution of (7.5.15) from x̂_t onwards is compared at instants t₁ to t_N with
{x̂} obtained from the same x̂_t, the probability of the difference exceeding ε at
any of the instants is no more than K Σ_{i=0}^{N}(γ_{t_i}/ε⁴)^α for any α ≥ 1 and any ε up
to ε₀ and t₀ beyond T₀, where ε₀, T₀ and the constant K depend on α and on the
minimum spacing between instants.
Now if we increase t₀ but nothing else, so that N stays the same, then with
{γ_t} decreasing and Σ γ_t^α finite for some α (two of our earlier assumptions in
(v)), K Σ_{i=0}^{N}(γ_{t_i}/ε⁴)^α gets smaller and smaller. Hence the estimates from x̂_{t₀}
onwards stay close to the solution of (7.5.15) more and more certainly as we
increase t₀; the later the section of {x̂} considered, the more certainly the
trajectory starting at the first x̂ of the section approximates the rest of {x̂} well.
Example 7.5.3 completes the analysis of e.l.s. and introduces the important
property of positive realness.
Three theorems (Ljung, 1977a) state how individual sequences {x̂} of
estimates relate to the differential equation (7.5.15).
Theorem 1 says that the distance between x̂_t and the nearest point of a region
D_c, which contains only trajectories of (7.5.15) which have been and will
remain in it forever, tends to zero w.p.1 as t → ∞. At its simplest, D_c (called an
invariant set of (7.5.15)) is a single equilibrium point where f is zero. The
theorem applies when, w.p.1 and at an infinite number of instants, |x̂_t| is below
some value and x̂_t is in a region from which no trajectory leaves and all
trajectories converge to D_c as t → ∞.
The usefulness of Theorem 1 depends on how easily the domain of
attraction of D_c, the region from which trajectories of (7.5.15) converge to D_c,
can be identified. Lyapunov stability analysis (Vidyasagar, 1978) is required
for all but trivial cases, and is notoriously more an art than a science.
Alternative approaches, some not relying on Lyapunov theory, have more
recently been developed (Goodwin et al., 1980; Solo, 1980; Fogel, 1981;
Goodwin et al., 1984).

REFERENCES
Adby, P. R.. and M. A. I-I. (1974). "Introduction lo Oplill1iziltion f\.lcthods".
ClliljJrniin & I-lilil. Loudon.
AstrolTl. K. J. (1967). Cl1mputer control of .i paper application of linear stochastic
cOlltrol theory. IBM .I. Re.\'. Vel'. 11,389-405.
Astrom, K. J, {I 970). "Introduction to SLochastic Control Theory". Academic Press. New York
and London.
Astrolll, K. J .. amillohlill, T. (1966). Numerical identification of lineilr dynamic systems frortl
norlllal operating records. III "Theory of SelfAdaptive Control Systems" (p. H.
IlulUmond, cd.). PlenullJ, New York.
Astroll1. K. J., amI Eykholr, P. (1971). System survey. Aurvmarica 7,123-164.
Astrolll, K. J., amI Willetlillark, B. (1973).011 sell:'wiling regulators. Amvl/laticu9, 185-t99.
Astrolli. K. J.. Borisson. U., Ljung.. L., <Jlld Wiucllmark, 13. (1977). Theory and applications of
reguilitors. AIIIIJlJlt/rica 13; 457-476.
Autolllatic;1 Special (ItJHI). On IdentiJlciltitlll :lIlt! SYS!elll Parameter Eslilnatiull,
.-I",ollllll;c//. 17.
lJekey, <..l.A., illHI Smidis. U. N. (cds.) (ILJHJ). "JdellLilkatioll alltl Systelll Parameter Estimation
1982" (2 vols.). Pl'uc. IFAC Symp., 6rh, Washingtoll, D.G., .Iw//! I!)?:a. Pergamon,
Oxford.
DicflTHlll. G. J. (ItJ77). "Factorization tvlcthods for Discrete Sequelltial Estimation". Acadelllit:
Press. New York and London.
Clurke. D. W. (l1J(7). Generalized-least-squares estimation of the parumcters or a dynamic
lIlodel. I FACSYM P. JdClIl(/ica,ioll Auto",. COIlll'ol, Pl'l1glll!. Czechvslomkia. paper 3.17.
analysis or algorithms and the connections between them. Young (1984)
introduces recursive identilication via least-squares and pays close practical
atlcntioll to lime-varying models (which we discuss in Section 8.1) and
instrumental-variable nlgorithms. Soderstrom and Stoica (1983) analyse
instrumental-variable algorithms at a fairly advanced level, but point out the
practical implications of their results and present a number of studies. A
detailed account 01" stochastic approximation is given in an older book by
Wasan (1969).
We took only a cursory glance, in Section 7.1.2, at the connection between
an a.r.m.a. model of a stochastic process and its autocorrelation I"unction, for
two reasoils. First, the inpuls anu lIoise in ac1ual dynamic systems are at best
only approximately modelled by low-order a.Lm.a. models. Secoml,
identification is llloSt convcllicntly carricu out by trying successive models,
each estimaled straight from the input and outpul records, until an adequate
onc is round. It is not normally feasible to choose even the noise-model
structure mainly by reference to observed correlation functions. However,
correlation functions might on occasion give some guidance, and an
.acquaintance with representalion theory for stochastic processes might help,
so two widcly respected textbooks arc recommcnded: Astrom (j!l,70U1'.ld
Whilile (1963, extended and reprinted 1984). . ..
d(x̂ − x)/dτ = −(x̂ − x)

which is stable without the strictly positive-real condition on 1/(1 + C(z⁻¹)). △
FURTIIEI{ READING
Several excellent books on recursive identification have appeared recently.
Ljung and Soderstrom (1983) give broad coverage and are strong on the
At x̂ = x, R is G/σ², and the linearised equation for x̂ − x is

d(x̂ − x)/dτ = −G⁻¹Ḡ(x̂ − x)

with G, Ḡ and ∂G/∂x evaluated at x. The second equation, for R, is stable if the
first is, so that the term in x̂ − x decays. Stability of the first follows from G⁻¹Ḡ
being positive-definite and G + Ḡ^T positive-semi-definite, but only under the strictly
positive-real assumption. Ljung (1977b) gives the details. △
the true parameter and regressor values. If the model order is minimal (no
pole-zero cancellation) x̂ actually equals x, as the parameters are transfer-
function coefficients.
Without the strictly-positive-real assumption, no such conclusions can be
drawn; the converged estimates may differ from the true values. A stable and
unremarkable example which is not strictly positive real is

1/(1 + C(z⁻¹)) = 1/(1 + 1.6z⁻¹ + 0.8z⁻²)

which has a negative real part for any z between the 98.4° and 148.6° or −98.4°
and −148.6° radii on the unit circle. A positive-real condition turns up in
many recursive algorithms (Ljung, 1977b).
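The condition is easy to check numerically; the sketch below (not from the book) evaluates the real part of 1/(1 + C(e^{−jω})) around the unit circle for the example coefficients just quoted:

```python
import numpy as np

# Minimal sketch: test strict positive realness of 1/(1 + C(z^-1)) on the unit circle
# for C(z^-1) = 1.6 z^-1 + 0.8 z^-2, the example quoted above.
omega = np.linspace(0.0, np.pi, 2000)
z_inv = np.exp(-1j * omega)
H = 1.0 / (1.0 + 1.6 * z_inv + 0.8 * z_inv ** 2)
neg = np.real(H) < 0
print("minimum real part:", np.real(H).min())
print("negative real part for omega roughly in [%.1f, %.1f] degrees"
      % (np.degrees(omega[neg].min()), np.degrees(omega[neg].max())))
```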
We complete our check whether {x̂} converges w.p.1 to x by testing the
local stability of the solution of (7.5.15) about the equilibrium point x, as
required by Theorem 2. From Example 7.5.2,
Example 7.5.4 The motivation for the r.m.l.2 modification of e.l.s. (Section
7.4.3) is clear in the present context. Recursive maximum-likelihood 2 filters h_t
by 1/(1 + Ĉ(z⁻¹)), with the effect asymptotically of replacing h̄_i by h̃_i in G,
making Ḡ and G identical. The linearised differential equation for x̂ about x is
then
PROBLEMS
7.1 An autocorrelateu sequence is represented as the resull of passing
a while sequence though a filter with lransler function (I +0.5;-1)/
(1- 0.8;-' J. Find Ihe a.cJ. of Ihe autocorrelated sequence, normalised by
the I11.S, value. Show that the autucurrelateu sequence has in,S. value
5.694 limes that of the white sequence
7.2 In the recursive, minimum-covariance algorithm of Section 7.3.2,
HIP/_III;r + R must be inverted. If the noise and modelling error
contributions to the observed variables are negligible, I? is zero. If also there is
some redundancy in the vector, the rOws or II, are linearly
dependent, so HJJ/_ IlJ,' is singular and the inversion is not feasible. Can we
make it feasible by deleting observed variables unlilthe redundancy vanishes?
Does the answer change if only one of the observed variables is free of noise
and modelling error'! What is special about P When one of the observed
variables contains no noise or modelling error? Can the information-updating
form of the algorithm be employed in such a case?
of linear dynamic systems. Rep. 7308. Division Automatic Control, Lund Institute of
Technology, Sweden.
S6derslrihn. T., and Stoica, P. G. (1983). "lnstrumeIltal Variable Methods for System
Identificution". Springer-Verlag, Berlin and New York.
Solo, V. (1979). The convergence of AML. IEEE TrailS. Autom. ControJ AC-24, 958-962.
Solo, V. (1980). Some aspects of recursive parameter estimation. 1111. J. COlltrol 32,395-410.
Talmon, J. L., <lnd van den Bouin, A. J. W. (1973). On the estimation of the transfer function
parameters of process and noise dynamics using a single-stage estimation. III
"Idenlification and System Parameler Estimation" (P. Eykhorr, cd.); Proc. IFAC S)'mp.,
3rd, ParI 2, The Hague/Dl'Nl, The NerllcrlmJ(/s, .If/Ill' 1973. North-Holland, Amsterdam
and Amcrican Elsevier. New York.
Vidyasagar, M. (1978). "Nonlinear System Analysis". Prentice-Hall, Englcwood Cliffs, New
Jersey.
Wasan, M. 1'. (1969). "Stochastic Approximation". Cambridge Univ. Press, London and New
York.
Wellstead, P. E., Edmunds, J. M., Prager, D. and Zunker, P. (1979). pole/zero
assiglllllent'reguilitors. Jilt. J. Co//trol 30. 1--26.
Whillle, P. (llJ84). "Prediction and Regulation", 2ud cd. Blackwell, Oxford.
Young, P. C. (1%8). The use of lincar regression and related procedures for the identilication of
dynamic proccsscs. Prot'. IEEE Symp. Adaptil'l' 7th, Ullil'. of Calilumill. Los
Angeles. IEEE, New York.
Young, P. C. (llJ74). Recursive approaches (0 time scries analysis. Bull. IMA 10,209-224.
Young, P. C. (llJH4J. "Recursive Estimution and Timc-Series Analysis". Springer.:iYerlu.g;)3erlin
and New York. ..
Young, P. C. and Jakeman, A. J. (1979). Relined instrumental variable methoLls of recursive
(imc-series analysis, Purt I: Single input; single output systcms. 111/. J. Colltrol19, 1-30.
Young, P. C, and Jakeman, A. J. (1980). Relincd instrumenlal variable methods of recursive
time-series analysis, Part Ill: Ex(ensions. II/t. J. COlltrol 31,741-764.
Dugard, L., and Landau, I. D. (1980). Recursive output error identifkation algorithms
and evaluation. Automatica 16,443-462.
Eykholr, P. (1974). "System Identification". Wiley, New York.
Eykhoff,j P_ (cd.) (1981). "Trends and Progress in System Identification". Pergamon. Oxford.
Fogel, E. (1981). A fundamental approach to tbe convergence analysis of least squares
algorithms. IEEE Trans. AllIUm. COlltrol AC-26.
Goodwin, G. C, and Sin, K. S. (1984). "Adaptive Filtering Prediction amI Control".
Hall, Englewood Clurs, New Jersey.
Goodwin, G. C, Ramadge, P. J., and Caines, P. E. (1980). Illultivuriable adaptive
control. IEEE TrailS. AUlom. Cmitrol AC-25, 449-456.
Goodwin, G. C, Hill, D. J., and Palaniswumi, M. (1984). A perspective on convergence
adaptive control algorithms. Alltomutica 20, 519-53!.
Gus{avsson, l. (1969). Maximum likelihood identification of dynamics of the Agcs{a reactor and
comparison with results of spectral analysis. Rep. 6903. Division of Automatic Control,
Lund Institute of Technology, Sweden.
Isermann, R. (cd.) (1980). "Identification and System Parameter Estimation" (2 vofs.). Proc.
IFAC Symp. 51h, Darmstadt, FUG, Sl'ptl'l1lbl'r Jl)71). Pergamun, Oxfurd.
Jakeman, A. J., and Young, P. C. (1979). Rclincd instrumental variable mcthods of recursive
time-series analysis, Parl II: Multivllriablc systems. lilt . .I. ('onlroI29, 621(,44.
Jazwinski, A. H. (197U). "Stochustit; Processes and Filtering Thcory". AClIdemit; Prcss, New York
and London.
Kalman, R. E. (1960). A ncw approadl to linear filtering lltlU prcLlktion. hall.\". ASME. Sa. D. J,'
IJa.\ic Eng. 82, 35-45.
Kurz, I-I., and Goedeckc, W. (19HI). Digital paramctcr-adaptive control uf pruccsses with
unknuwn dead time. AUlOmatica 17, 245-252.
Lec, R. C. K. (1964). "Optimal Estimation, Idcntilication anLl Control", MIT pressl,cambridge,
Massachusctts. .
Ljung, L. (1977a). Analysis of recursive stochastic algorithms. IEEE Tral/.... AlllvlII. Colltrol
AC-22, 551-575.
Ljung, L. (1977b). On positive reallransfer functions and the convergence recursive
schemes. IEEE Trcm.... Autum. COl/trol AC-22, 539-551.
Ljung, L. (1981). Analysis of a general recursive predktion error identification algorithm,
Automatica 17, 89-99.
Ljung, L.. and Soderstrom, T. (1983). "Theory and Practke of Recursive Identification". MIT
Press, Cambridge, Massachusctts, and London.
Ljung, L., Soderstrom, T.,and Gustavsson, I. (1975). Countcrcxample to tile gelieml convergence
ora commonly used recursive idclllilicatioulllethoLl. IEEE l'rtllJo'i. Alilolll. ('mitrol AC-IU,
643-652.
Maybeck, P. S. (1979). "Stochastic Modds, Estimation, and Control", Vol. I. Academic Press,
New York and Londou:
Mayne, D. Q. (1963). Optimalnon-stulionary estimation or the pal'illl1i::ters of a linear system
with Gaussian inputs. J. Electron. Cunlrol 14, 101-112.
Nortoll, J. P. (1977). Initial convergence of recursive idelitiJicatioll algorithms. ElcCll"f!l/. Le1l. 13,
621-622.
Panuska, V. (1968). A stochastic approximation method for identification uflincar systems using
adaptive filtering. Proc. Joint Auto",. Comrol COIl/. AIm Arbor, Michigan,
Plackett, R. L. (1950). Some theorems in least squares. IJiOlllt'lrika 371,
Robbins, H., and Monro, S. (1951). A stochastic approximation method. AIm. Marh. Siat.
400-407.
Soderstrom, T. (1973). An algorithm for approximate maximum-likelihood id""1,n,a'i,,,
7.3 Repeat Example 7.3.4 retaining only five figures at the end of each stage
of calculation, to see the effects of illconditioning.
7.4 Section 7.3.2 mcntioned state estimation, in which x is modelled as
evolving in accordance with a state equation XI = :fi,- I XI _ 1 + 1"", - I W, - 1 where
{w} is a zero-mean, white sequence. As we shall sec in Section 8.1, it sometimes
makes sense to model a parameter vector x in this way. With that in mind,
show that the only unbiased estimator of x, in the form Ax, I' where x,_ I is
unbiased, is <1>,_ 1x,_ "
7.5 Two possible choices for the step-size factors 1')11 in stochastic
approximation (in addition La those ill Example 7.4.1) afC )', = l/t
1i2
and
}', = 10/1
1/2
. Try them out over six steps 011 the problem cOllsiucrcd in
Examples 7.3.4 and 7.4.1.
7.6 This problem traces the steps proving stability of the differential
equation for x̂ − x at the end of Example 7.5.3, and thus showing that the true
value x is a possible convergence point for e.l.s.
(i) Show that if G + Ḡ^T is positive-semi-definite and G⁻¹Ḡ positive-definite,
a positive-definite matrix P can be found such that P(−G⁻¹Ḡ) +
(−G⁻¹Ḡ)^T P ≤ 0. [The easiest way to show it is to produce a suitable P.]
(ii) By considering m^T(PA + A^T P)m, where m is any eigenvector of A and
P is positive-definite, and then n^T(A + A^T)n, where n is any eigenvector of A,
show that negative-semi-definiteness of PA + A^T P ensures that all eigenvalues
of A are in the left-hand half plane.
(iii) Notice that with −G⁻¹Ḡ as A, (ii) proves that the equation in x̂ − x in
Example 7.5.3 is stable when the assumptions on G and Ḡ are met.
Chapter 8

Specialised Topics in Identification

8.1 RECURSIVE IDENTIFICATION OF LINEAR, TIME-VARYING MODELS
8.1.1 Role of Time-Varying Models
Up to this point we have treated the dynamics, and hence the model structure
and parameters, as constant. We could argue that we have no choice, since
a parameter is essentially not a variable; a sufficiently general constant-
parameter model should be able to represent the dynamics throughout.
Practical factors force us to take a less dogmatic view, however. We may know
or care too little about the detailed dynamics to propose a comprehensive
model structure. The model may have to be linear and low-order because
modelling effort is limited or the end use requires a simple model, even at the
price of the parameters having to vary to accommodate non-linear or higher-
order behaviour. In those circumstances the distinction between parameters
and variables is blurred, so we adopt the ad hoc definition that a parameter is
anything we want to regard as such, usually because of its physical
interpretation or its place in a standard model structure.
ExumpJe 8.1.1 We need a model, for control design. of an industrial boiler in
which water passes through tubes heated from outside by an oil burner. In
time the heat transfer through the lube walls slows as soot from incomplete
combustillll builds lip oli the outsidc anti mincrals m;cretc un the insillc or the
lubes. With enough instrumentation, lime, access and skill a constant-
parameter model could be built. no doubt nonlinear, relating the heat
transfer coclIicicnt to the history of burner air/fuel ratio, fuel rate. water or
steam flmv nile, anti fuel and waleI' composition. In practice. the coeflkicllL
woultl at most be treated as a parameter and measured, directly or by
estimalion from operating records, periodically or marc probably once and
for "II. More probably still, its influence would be lumped with the rest or the
IH9
boiler dynamics into a simplified overall mouel relating fuel, steam
feedwatcr now rales to steam temperalUre and pressure.
y_t = h_t^T x_t + v_t        (8.1.1)

In recursive o.l.s., x̂_t differs little from x̂_{t−1} at large t, because x̂_{t−1} minimises
the sum of squared residuals from time 1 to t − 1 and the new residual has a
proportionately small influence on the sum from 1 to t. If we think the
dynamics are time-varying, we can attach more weight to recent than to earlier
residuals, reasoning that x̂_t should not have to give small residuals at past
times when x was very different from x_t. Thus we are led to minimise a
weighted sum

S_t = Σ_{i=1}^{t} w_i ε_i² = ε^T W_t ε        (8.1.3)
1".1.3 I{ccursh'c \Veighlcd-Lcasl-Squarcs with a Forgclting Fador
8.1.2 l\'lodilicatiun of I{ecurshe AlgorHhms 10 Track Time
Our task is to modify the recursive algorithms of Sections 7.3 and 7.4 to track
time-varying parameters. The observation (or regression) equation (8.1.1)
with parameter vector x_t constant gave rise to updating equations (7.3.28),
(7.4.7), (7.4.22) and (7.4.26), all of the form

x̂_t = x̂_{t−1} + k_t(y_t − h_t^T x̂_{t−1})        (8.1.2)

which can track parameter variation provided the correction gain k_t is not too
small. The problem is that the gain decreases as t increases, in any algorithm
which for time-invariant dynamics yields ever-increasing accuracy for x̂_t. If
the prediction error y_t − h_t^T x̂_{t−1} is due less and less to error in x̂_{t−1} and is
ultimately due mostly to observation noise, a small correction gain is
appropriate. On the other hand, with time-varying dynamics x̂_{t−1} may not be
a good estimator of x_t even at large t, and a larger gain is necessary. To
improve tracking ability, the gain k_t has to be increased in some systematic
way. In the Markov, e.l.s. and other recursive l.s. algorithms, k_t is
P_t h_t/σ_t², so k_t may be increased by increasing P_t, that is by making the
covariance reflect less confidence in the updated x̂_t. We shall examine three
ways of doing so.
As the highest gains coincide with flow peaks and the lowest with relalively
dry spells. the variation is plainly due to changillg soil dryness, with
consequent variation in the proportion of rainfall running off rapidly. A
model allowing for storage and saturation would be worth investigating, it
seems. 6.
Fig. 8.1.1 Time variation of river-catchment model, Example 8.1.2: (a) steady-state gain,
(b) last hour's rainfall, and (c) flow; horizontal axes are time (h).
Example 8.1.2 Figure 8.1.1 shows time variation of the estimated steady-state
gain of a river catchment, the Mackintosh in Tasmania, with hourly rainfall as
input and river flow as output. The gain was estimated by an extension of the
e.l.s. algorithm as described in Section 8.1.5.
A less obvious justification for a time-varying model is that systematic
variation of parameters in a tentative model, induced by unmodelled
behaviour, can be very effective as a guide to how the model should be
extended or modified.
x̂_{t|t−1} = Φ_{t−1} x̂_{t−1}        (8.1.10)
One simple yet flexible model is a random walk for each parameter:

x_t = x_{t−1} + w_{t−1},        cov w_{t−1} = Q_{t−1}        (8.1.7)

Here w_{t−1} is independent of x_{t−1}, zero-mean and white, i.e. E[w_s w_t^T] is zero for
s ≠ t. It can usually be taken as wide-sense stationary, so Q_{t−1} can be written
as just Q. In the absence of special background knowledge we take Q as
diagonal, implying that the parameters vary independently, and thus we need
only specify the mean-square variation of each parameter. This simple random
walk (s.r.w.) model is a special case of the more general parameter-evolution
model

x_t = Φ_{t−1} x_{t−1} + Γ_{t−1} w_{t−1}        (8.1.8)

Since x̂_{t−1} is unbiased and Ew_{t−1} is zero, to satisfy (8.1.9), whatever the value
of x_{t−1}, we must have A equal to Φ_{t−1} and b zero, so (8.1.10) follows.
The covariance Pi!l _ I of X/If I is found by noting lIlat neither x/ __ 1 nor X,_ 1 is
We recognise (H.I.S) as a slale equation (D'Azzo and Houpis, 1981; Gabel and
Roberts, 1980) 10 accompany Ihe observation equation (8,1.1), Estimation of
XI is therel"ore a sort of state estimation problem, with a parameter vector as
the state. We should not lind this too paradoxical, given the haziness or the
distinction between time-varying parameters and variables, discussed carlier.
!he view or paramcter estimation as slate estimation is enormously
It opens the door to a great armoury orstate-estimation technique, as we shall
soon sec. There is one diHerence between parameter and state estimation. The
observation vector hI {matrix fit for vector observations} is taken as known
and deterministic in state estimation, but in parameter estimalion it is usually
stochastic, containing noisy previous output samples and/or samples of lhe
noisl.:-gelleratillg variahle, as well ,IS input samples which Illay be vicwcu as
stochastic. We saw in Section 7.5 that the stochastic and orten complic<.llcd
nature of h, makes analysis 01" recursive parameter estimalorl-i dillicull.
\Vc lIornHilly kllow luo little (0 specify (I) <Inti r ill the lull partllllc(cr-
variation llIodel UL I.H), but we loan see its cllccts on recursive least-squares
algorithms wilh very litlle more algebraic ellortlhan considering (8.1.7), anu
end lip with all algorithm recognisable as a standard stale estimator. State
equalion (8.1,8) auds a new slage 10 each recursion step. Jt is used 10 projeci
x
t
_
1
forward in time to a new prior estimate ).:111-1 of Xf' The second sub-
script indicates that X'!I-l is based on observations up to [ - I. For X. to
b I
, d b' . '1'-'
e Illear an un JUsed, It must be or the form Ax + b will]
I ;
Alternatively, PI can be prevented from becoming 100 smull as t increases by
being reset to a fixed large value whenever its size, mcasured Cor instance by
tr PI' falls below a certain value. By doing l-iU we exprcl-is Llisbelicf thai x, is
really as good as PI says, and dismiss the confidence ill XI derived I"rom earlier
observations.
The mcthod has the tcchnical virtuc 01" allowing COllvcfgel1cc to be proved
relalively easily 1'01' time-invarianl paramelers (Goouwin and Sill. 1984). On
the other hand, it gives much marc innucncc to observations immediately after
a covariance resetting than Lo those just before, generally for no good reason.
8.1.5 Explicit Modelling of Parameter Variation
where W_t is a diagonal matrix and the weights w_i increase with time. The
scalar-output Markov estimator in Section 7.3.3 attaches weight 1/σ_i² to each
squared residual, giving (7.3.29) and (7.3.30). If we put w_i for 1/σ_i² we obtain (8.1.2) with
the gain and covariance below.
We cannot easily pick a forgetting factor or resetting threshold in advance,
even if we know roughly what parameter variation to expect. The same
forgetting factor may in any case be a poor compromise when some
parameters vary much more rapidly than others. Greater flexibility and
simpler incorporation of prior knowledge can be achieved by basing the
estimator on an explicit model of the parameter variation. If we are to avoid a
substantial extra identification problem, we must keep the model of the
variation very simple.
k_t = P_{t−1} h_t/(μ + h_t^T P_{t−1} h_t),        P_t = (P_{t−1} − k_t h_t^T P_{t−1})/μ

The commonest choice is a constant forgetting factor μ just below 1,
generating an exponentially increasing sequence of weights w_i = μ^{t−i}. The
value of μ is adjusted until credible parameter variation and acceptable
residuals are obtained. A typical value is between 0.95 and 0.99. These
equations are often written in terms of w_t, P_t and h_t, with

k_t = w_t P_t h_t
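A minimal sketch of the resulting recursion (hypothetical data and forgetting factor, not the book's code) is:

```python
import numpy as np

# Minimal sketch: recursive least squares with a constant forgetting factor mu.
# k_t = P_{t-1} h_t / (mu + h_t^T P_{t-1} h_t),  P_t = (P_{t-1} - k_t h_t^T P_{t-1}) / mu.
def rls_forgetting_update(x_hat, P, h_t, y_t, mu=0.98):
    k_t = P @ h_t / (mu + h_t @ P @ h_t)
    x_new = x_hat + k_t * (y_t - h_t @ x_hat)
    P_new = (P - np.outer(k_t, h_t @ P)) / mu
    return x_new, P_new

rng = np.random.default_rng(2)
x_hat, P = np.zeros(2), 100.0 * np.eye(2)
for t in range(1, 200):
    h_t = np.array([np.sin(0.1 * t), 1.0])              # hypothetical regressors
    y_t = 2.0 * h_t[0] - 1.0 + 0.01 * rng.normal()      # "true" parameters [2, -1]
    x_hat, P = rls_forgetting_update(x_hat, P, h_t, y_t)
print(x_hat)
```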
8.1.4 Covariance Resetting
Defining

r_t = [ P^{(11)}_{t|t−1} h_t ; P^{(21)}_{t|t−1} h_t ]        (8.1.17)

we obtain

x̂_t = Φ x̂_{t|t−1} + r_t(y_t − h_t^T Φ x̂_{t|t−1})/(1 + h_t^T r_t)        (8.1.18)

P_t = P_{t|t−1} − r_t r_t^T/(1 + h_t^T r_t)        (8.1.19)

It is much easier to find, by trial and error, a satisfactory value for Q in the
i.r.w. model (8.1.14) than in the s.r.w. model (8.1.7), because the effect of a less-
than-ideal choice in the i.r.w. model is merely to make the parameter-estimate
sequence {x̂} a little too smooth or rough. The overall extent of time variation
in {x̂}, which is usually our main interest, is not very sensitive to Q in the i.r.w.
model.
8.1.6 Optimal Smoothing
When the parameters arc modelled as time-va I'yi ng and as in (8.1 .8),
the eXlra uncertainty introduced by r
l
_ tn',-I adds I to the
covariance of the parameter estimates. as in (8.1.11). The increase in
uncertainty makes it impOl'lant that at every sample instant xshould utilise
the inf'orlllatioll ill as many observations as possible, Also, as the detailed timc
variation in :x I may be of great interest, we should like good parameter
eSlimates throughout Ihe record, early as well as late. This applies especially
short records, where error in Ix) due to a poor initial gucss Xu may
decrease slowly enough to obscure the timc variation over much orthe record.
The key 10 improved parameter eSlimates is Ihe faci Ihat x, influences all
later values of x through the state equation, and hence allialer observations
up to the last. IN' Consequently x
r
should embody information from all hiler
observations as well as those up to .J',. That is, we should compute x,)s. not just
Xli" ComputaLioll ofx
rlN
is the function of/ixel!-i11tel'val oplimals11l00lhing in
slale estimation (Jazwinski. 1970; Bierman, 1977: Maybcck, 1982). In Ihe
same way lhal recursivc I.s. identification is identical to Kalman filtering
Both x′ and s are estimated, so

x_t = [x′_t ; s_t],        Φ = [ I  I ; 0  I ]        (8.1.15)

In the observation equation (8.1.1), h_t is padded with zeros to multiply s_t. We
partition P_{t|t−1} into (p × p) blocks to match x′ and s:

P_{t|t−1} = [ P^{(11)}_{t|t−1}  P^{(12)}_{t|t−1} ; P^{(21)}_{t|t−1}  P^{(22)}_{t|t−1} ]        (8.1.16)

and similarly for P_t.
or its w.l.s. counterpart (8.1.13). For vector observations, (8.1.10) and (8.1.11)
are unchanged. Together with (7.3.9), (7.3.5) and (7.3.6) they form the complete
algorithm. The algorithm is identical to the Kalman filter for state estimation.
The notation x̂_{t|t} and P_{t|t} is usual for what we have been calling x̂_t and P_t, to
emphasise that the latest observation y_t has been processed.
Parameter tracking with an r.w. model adds only (8.1.12) to the recursion.
The choice of Q must avoid over-inflation of P and consequent inefficiency of
x̂ on the one hand, and inability to follow rapid variations on the other. Prior
knowledge is not usually enough to fix Q, so we must adjust Q with reference to
some performance measure. A simple and effective technique (Norton, 1975)
is to compare the m.s. innovations and residuals at trial values of Q with those
obtained when Q is zero.
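A minimal sketch of one recursion step with the s.r.w. model (hypothetical Q, noise variance and data; the time update follows (8.1.10) and (8.1.12), and the observation update is the usual scalar recursive l.s./Kalman step):

```python
import numpy as np

# Minimal sketch: one step of random-walk parameter tracking in Kalman-filter form.
def rw_tracking_step(x_prev, P_prev, h_t, y_t, Q, r):
    x_pred = x_prev                       # time update, (8.1.10) with Phi = I
    P_pred = P_prev + Q                   # time update, (8.1.12)
    k_t = P_pred @ h_t / (r + h_t @ P_pred @ h_t)
    x_new = x_pred + k_t * (y_t - h_t @ x_pred)
    P_new = P_pred - np.outer(k_t, h_t @ P_pred)
    return x_new, P_new

Q = np.diag([1e-4, 1e-4])                 # hypothetical drift covariance
x_hat, P = np.zeros(2), np.eye(2)
x_hat, P = rw_tracking_step(x_hat, P, np.array([1.0, 0.5]), 0.7, Q, r=0.01)
print(x_hat)
```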
Background knowledge rarely provides Φ_{t−1} and Γ_{t−1} in the more general
model (8.1.8) either. It is not easy to estimate them along with x, partly
because too many unknowns are to be estimated and partly because products
of unknown elements of Φ_{t−1} with the unknown elements of x_{t−1} make the
overall estimation problem non-linear. This problem is pursued further in
Section 8.9. There is, however, one versatile model more elaborate than the
s.r.w. but actually easier to use, the integrated random walk (i.r.w.) model
(Norton, 1976) for a parameter vector x′:

x′_t = x′_{t−1} + s_{t−1},        s_t = s_{t−1} + w_{t−1}        (8.1.14)
correlated with w as both depend only on earlier values of w. Hence,
denoting x_{t−1} − x̂_{t−1} by x̃_{t−1},

P_{t|t−1} = E[(Φ_{t−1}x̃_{t−1} − Γ_{t−1}w_{t−1})(Φ_{t−1}x̃_{t−1} − Γ_{t−1}w_{t−1})^T]
        = Φ_{t−1} P_{t−1} Φ_{t−1}^T + Γ_{t−1} Q_{t−1} Γ_{t−1}^T        (8.1.11)

For the s.r.w. model (8.1.7) with constant-covariance {w}, this reduces to

P_{t|t−1} = P_{t−1} + Q        (8.1.12)

and x̂_{t|t−1} is just x̂_{t−1}.
The rest of the recursion step is exactly as for time-invariant parameters,
with x̂_{t|t−1} and P_{t|t−1} replacing x̂_{t−1} and P_{t−1} in (8.1.2) and in the covariance
equation.
The parameter increments s are now random walks and Q dictates the
smoothness of the parameter variation rather than the m.s. variation itself.
197
ti';
1= N - I, N - 2, ... ,0
(8.1.28)
I " /II' /i-I(, /I '. )
Ar = ( )r + I Ar + 1- ,+ ( , +I ) 1+ 1- I + I x
r
+ (IN ,
X
llN
= (1',- I(x/+ liN +r,Q/rJA,)
but it has a fatal Ilaw: it is unstable.
= I(J J - (D,- 'II(J I - (V;+ I - HrR- "HJJ - (V,-I) -'(J(V,-'rQrTI =.0
(8.1.31 )
Boundary conditions for these equations are the initial guess, x
Olal
and the
last cstimatex
NIN
computed by the ordinary non-smuothingMarkov
estimator. Once 1N_1 is round fr011l (8.1.26) using X
NIN
' the backwards,in'
time recursion ronned by (8.1.24), (8.1.27) and (8.1.22) gives 1,_" ""_I and
X
/
_liN successively, providing I is invertible. The recursion can be written
morc concisely as
I
A
II
. All
Pro41hal $moothillg Algorithm (8.1.28) is Ullslable Substituting,l" from
the fIrst equation into the second and dropping inessential subscripts, one step
of the backward recursion is
[
X'INJ = '(1 + rQfTH'R-
1
H)
1, H'R-II/
[
(1,,- IrQI-THTR - 'J
- }} I /i - I y, "
'" IINJ + rorcing by y,,. I (8.1.29)
1+ I
For stability, all eigenvalucs of \)1, i.e. zeros (J of la1+ lpl, must lie inside or on
thc unit circle. The identities (Guudwin and Payne, 1977)
give
luI - \1'1 = luI - q,;+ dluI - <lJ,-'(1 +rQrTH"R-IH)
-<vlrQrT(v
T
(UI-<lJT )-IHT/i-II/I
, 1+ 11+1
= luI - <1>;+ ,lIa1- (1',-1 - uq,,- IrQrT(aI - (V;+ ,) - '/ITR-II/I
= I(JJ-<P,-I (J'D,-lrQrTI
HTR-IH uI-q';+1
Now the zeros of laI - II arc the eigenvalues of I. which are the
reciprocals or Ihose of II',. For any stable parameter-evolution model (8.1.8),
(8.1.23)
(8.1.21)
18.1.22)
(8.1.27)
X
Olll
givell
(8.1.20)
I 1.2.... N
I =1.2..... N
1=1,2, ... ,N'-I
1= 1,2, .... tV
cuv,"" = R."
cov"'r __ 1 =Qr-I'
cov xUI!J = l"\qo,
y/- Hrx/=l',
XI = (!J, _ I x
r
_ I + r
/
_I W, I'
XU = xO/ 1J + x
Ulll
'
8L p-I' .) ,"'1"-0
-,- = oIO(XO!N - x
olo
+ ""OA
O
-
aX
olN
=0
aXNIN
aL Q-I' rT, -0 12 N
= '-IA'-I-' t=, , ... ,
w/_
I
196
and setting aL/a1,_I' aL/ax'IN and aL/a"',_1 to zerO to find the constrained
minimum of SN' The constraints (8.1.22) are satislied by aL/D1, _I being zero.
Also
aL 'I' -iI II' )' ("'1"-0
--,-=-H,R, y,- rXtlN -A
j
_
t
+ V,A,- I
aX1IN
We usc the Lagrange mulliplier method (Adby and Dempster, 19.14), defining
N
L
/= I
except for the stochastic nature of II" optimal smoothing algorithms
developed for stale cs(imatioll can be used ullchanged for parameter
estimation.
We can derive a fixed-interval smoothing algorithm as the optimally
weighted Markov I.s. estimator ufx, from YI to YN' Vector observations do 110t
add to lhe complication, so we shall consider thelll. We start hy writing
all the available information about lxJ:
The Markov estimates x
olX
to ),:,\/,\ and the COITCSpo1H..Iillg eSlimates to
Wi\' _ I must minimise
N
S" = L:(y, - lI,x,'N) IIJ', - I/,x, ,,\) + ",' IQ,- " ,i', ,:
r= 1
+(x
nls
- - x
011I
)
subject to equality constraints
xrl",=II)r--l x , I'
(8.1.41)
(8.1.40)
... ,0 (8.1.39)
I =N-1, N-2, ... ,0
X,I 21/11 = (I), IIXII-II,-I I
= (1
1
,+ I(X,+ II' + P, +!lr+ ,Ii,'+ I R'--I\(l,+ 1- 111 + I
X
,+ II'))
= <1
1
,11(/- PI-I III-IIH/+ I RI-t-\H,+ I)XII-III
+ forcing not depenuent on XI + I I'
A noteworthy feature of these algorithms is that the covariance P,+ liN of
XI + liN neeu not be computed unless we want it. or course we may wish to
assess the improvement due to smoothing, particularly ofl" line when extra
eompoting is no problem. Despite the simplicity of (8.1.36) and (8.1.39),
P
I
+ tiN is very complicated to derive algebraically. We can find it much more
easily rrolllthc orthogonality conditions 011 the smoothetl estimate. Much as
in Section 4.1.3, they say that the error ill Xis orthllgoilalto the contributiun
of each observatioll to X. amI hence orthogullaltll the ctJIIlribulion 01" allY set
Summary A fixed-interval optimal smoothing algorithm for time-
varying parameters or states consists of either (8.1.36) with (8.1.38),
or (8.1.39) with (8.1.40).
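For orientation only, a backward pass consistent with the relation x̂_{t|N} − x̂_{t|t} = P_{t|t}Φ^T P_{t+1|t}^{−1}(x̂_{t+1|N} − x̂_{t+1|t}), which appears later in the section, can be sketched in the standard Rauch–Tung–Striebel form below; it is not claimed to be identical in detail to (8.1.36)–(8.1.40):

```python
import numpy as np

# Minimal sketch: fixed-interval smoothing as a backward pass over stored results
# of a forward Kalman (recursive l.s.) pass.  x_filt/P_filt are filtered estimates,
# x_pred/P_pred the one-step predictions, Phi the parameter-evolution matrix.
def fixed_interval_smooth(x_filt, P_filt, x_pred, P_pred, Phi):
    N = len(x_filt) - 1
    x_smooth = [None] * (N + 1)
    x_smooth[N] = x_filt[N]
    for t in range(N - 1, -1, -1):
        C_t = P_filt[t] @ Phi.T @ np.linalg.inv(P_pred[t + 1])
        x_smooth[t] = x_filt[t] + C_t @ (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth
```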
As R.'--'l." and P
r
+ q, + I are symmetric, the transition matrix in (8.1.38) or
(8.1.40) is the transpose of that 01'(8.1.41). Hence. it has the same eigenvalues,
all stable since the forward updating (8.1.41) is stable (Jazwinski, 1970;
MeGarty, 1974). :,:,
ProoF Ihal (8. LJH) (Ifld (S.IAO) are slahle The transition matrix of eith"er
recursiun is (I 1III
l
I R,"'i\ 1/, + I P
I1
II' I I )(1
1
/ 1 I' The updating rrom x, I' II' to
x'I-11, + I is alLog.cther
The malehing recursion for 1, follows by substituting (8.1.39) into the first
equation or (S.1.28) rewritten as (S.!.3?) bUL with xl+lli+1 for x,+II,:
The question of stability does not arise in (8.1.36) or (8.1.39) as they arc not
recursions. bUI we musl prove thai (8.1.38) and (8.1.40) are slable.
Alternatively, Xt-l liN can be found in terms of X
I
+ 11,+ I by substituting
(8.1.36) inlo (8.1.33), which gives
(8.1.38)
I 0, I, ... , N - I
I = N - I, N - 2, ... , 0
A recursion for lt in terms of x
l
+ III can be obtained by writing the
equation of (8.1.28) as
lt = 111+ I - I R
I
-+
I
I
(Y,+ i - H
I
+ I x
l
+ III)
+ HI + I (XI + tiN - X, + II')
then substituting x,+ liN - X,+ III from (8.1.36) and usmg the
equation (7.3.19) from to PI-+IIII+I' 1
1
is
lt =. (l- H:+ I R,-+1
1
HI +I P,+ 111+ 1)($;+ III + I -l{;+ I RI-+II(YI + I - H, + I XI + lid)
, (8.1.35)
Running forwards in lime, the left,hand side of (8.1.34) is zero, i.e.
An initial condilion is found by adding <!I"P" I0 times (8.1.25) to (8.1.32) wilh t
zero:
X, + III - X,+ llN = $'(X'II - X/IN) +(P, + III .... (I)rPrl/lJ;r).,
and from the observation-updating equations giving x
f
+ lit + 1 from x, + lit and
P,-/
11
,+ 1 from P,-/
li,
as in (7.3.21) and (7.3.19), substituted into the
equation of (8. 1.28),
The rearranged optimal,smoothing algorithmllses (8.1.38) and (8.1.36) lo find
XI + liN in a backwunl run, using the results XII II' ami P/I III uf the on..linary,
non-smoothing forward recursion.
The instability 01'(8.1.28) is not all essential property of the ).'s or x's, but is due
to the way they are calculated. It can be avoided by computing KilN rrom X'!I-l
or X'ii rather than X
f
+ liN. The mechanics of establishing a suitable relation are
boring, and will only be sketched brieily. From the second equation of
(8.1.28), (8.1.10) and (8.1.11),
A, == $;+ 1A,+ 1 + P,-+\ 11+ j(X,+ liN - X,+ 11/+ \) + Pj-+\ [reXr+ III - X,+ liN)
PrelTIultiplying (8.1.33) by 1[',+ I P'-:"'.I,+ 1 and then putting I for I + I and
subslituting the result into (8.1.32) yiclJs a recursion fur x, + Ilf - x,+ IJN ......
P,+ Ill),':
, , 1" 111/' /"1 (' " /) , )
x'+II,-x'+lIN- I+ilr"'j= fill II/-J '1' IAI I
the eigenvalues of (1),-' I arc unstable, so from (8.1.31) \J1 has some unstable
eigenvalues.:,:,
Comportmenl
1
Input
Uj Observed
//.'-' x,
Comporlmenl
2
8.2 IDENTIFIABILITY

8.2.1 Factors Affecting Identifiability
Fig. 8.2.1 Two-compartment model.
Example 8.2.1 Compartmental models of the type shown in Fig. 8.2.1
(Carson et al., 1981; Godfrey, 1983) are often employed in biomedical studies
of how various substances are metabolised. The two-compartment model in
Fig. 8.2.1 represents rate equations

ẋ = [ −k₀₁ − k₂₁    k₁₂ ;  k₂₁    −k₀₂ − k₁₂ ] x + [ u₁ ; 0 ]

for flow into and out of the compartments. The rate constants k₀₁, k₀₂, k₁₂
and k₂₁ are to be found. To make physical sense they must be non-negative.
We assume that only compartment 1 can be perturbed and compartment 2
observed, both directly. The observations and estimation method allow the
transfer function X₂(s)/U₁(s) to be found with negligible error. Can the rate
constants be determined uniquely?
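One way to see what the data determine is to form the transfer function symbolically; the sketch below (using sympy, not from the book) derives X₂(s)/U₁(s) from the rate equations, confirming the form α/(s² + βs + γ) used in the discussion:

```python
import sympy as sp

# Minimal sketch: transfer function X2(s)/U1(s) of the two-compartment model,
# showing which combinations of the rate constants the observations fix.
s, k01, k02, k12, k21 = sp.symbols('s k01 k02 k12 k21', positive=True)
A = sp.Matrix([[-k01 - k21, k12],
               [k21, -k02 - k12]])
b = sp.Matrix([1, 0])                       # input perturbs compartment 1 only
G = sp.simplify(((s * sp.eye(2) - A).inv() * b)[1])   # observed state is x2
print(G)
# equals k21 / (s**2 + (k01 + k02 + k12 + k21)*s + (k01 + k21)*(k02 + k12) - k12*k21),
# so only alpha = k21 and the denominator coefficients beta, gamma are determined.
```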
Identilia bility is,. joint property of an idenlilieation experiment and a model.
establishes that the model parameters can bc estimated adequately from the
experiment. The model and experimcnt need not bc complicated for u test of
Iheir identifiability to be non-trivial.
(8.1.36), (8.1.38), (8.1.39) and (8.1.40) entail lillIe extra computation if the
covariance is not rcquired. For instance, the transition matrix from ),'+ 1 to)"
is; as we saw earlier, the transpose of that from x
,It
-
1
to X,+ lit. Scalar
observations and an s.r.w. or Lr.w. parameter-variation model simplifY the
algorithms further (Norton, 1975, 1976). The weighty covariance calculation
can be organised so as to avoid matrix inversion. Economical computing
arrangements for identifICation are discussed further by Nortoll (1975).
(8.1
(8.1.44)
(8.1.45)
H SPECIALISED "fOI'ICS IN IDENTIFICATION
X
IIN
- X
tlt
= p,!t<biP;-+'I It(X
t
+ liN - X: ,+ IJ')
Now (8.1.42) and (8.1.43) imply respectively that
[x" 11,,0<,+ II' - x,+ II")'"] = E[x, +II"x;:,. II'] - P, + II" = 0 (8.1.47)
20(J
From (8.1.36),
and, similarly,
[(X'IN - x'I,)(xI/N = -PI/N + P'I' (8.1.50)
II follows that, multiplying each side of (8.1.46) by its transpose and taking
expectations,
P
llt
- P
,JN
= ptl,<!,-tP,-/llt(P
t
.j. I JI - PI + lIN)P,-/III(l>,P
tll
(8.1
Summary The backlvllNl\' recursioll /01' the {.'OVl1l'iallCe the smoothed
estimate is
P
tlN
= PIli + .. llI,(PI+ liN - P, + 1I1)P'-:f'\ f/{I),P/
1J
(8.1.52)
and (8.1.39) with 1 for 1+ I gives
from which
E[(x,+ lIN - x,+ IJI)(X/+ liN - X, + Jlt)TJ
.\
= - E[x,+ 1II(X'+ liN - X, + IIJ
f
]
=-[x'+!I,x[+!I,,]+P'.'.II,=-P'+IIN+P'+II" (8.1.49)
We have P
NJN
from the forward recursion.
Fixed-interval optimal smoothing formnlae for state estimation have been
derived in a wide variety offorms (referellces ill Nortoll, 1975). The versions in
so
of observations. Dcnoting x - xby xas usual, we conduuc llwt
E[x, +11"(x" II" - x" II,ill U (8.1.42)
as X
I
+ liN depends on YI to YN and Xu Ilion Y'I to }'/. Similarly,
[x'I,,(x'I" - x'l,lr] = 0 (8.1
X,(s)/U,(s)=a/(.I" +/Js+)')
203 8.2 IDENTIFIABI LlTY
nature and location orthe inputs, paramelerisation and existing knowledge of
the model. and properties of the estimation algorithm. These factors
strongly. but we can single out and analyse some fairly restricted aspects of
identifiability. Passing over the properties or estimators, whieh we have
already enquircd into. let us examine model paramcterisation and input
properties.
We can slart identifiability analysis by asking, as in Example 8.2.1, whether
the experiment and model structure yield unique parameter values in
principlc, without regard to numerical accuracy or stoclUistic uncertainty. The
topic is oftCIl called the structural hlelll[fiaNlity proh!C:'1I1 (Bellman and
Astr6m, 1970), with "structural" 1IlH.lcrstoou to mcan "1"01' almost all
parameter values". The uniquc idclltiJinbility for all t' round ill part (iii) or
Exalllple H.2, I is Ilut strll(.:tural. as it applies uuly uvcr Ull infinitesimal
proportioll or IY.. {I alill )' values. Nor is thc 1I1liqllCIICSS ill purt (ii),
which applics for mallY parameter values bu tnot a Imost a lJ. Iden tilic.;PtiOHPf a
usable model docS not always require structural identifiability, \ve conclude.
The term "structural identifiability" is rather misleading. since identifiability
may depend on prior information or on what combination of input waveforms
is applied (Godfrey, 1983, Chapler 6; Problem 8.1), as well as on the model
structure. We prefer to speak of lktermil1istic itl'l1ojiabilit.1'.
The dcterministic identifiability problem is quile distinci fi'omlhe problem,
imponant for multivariable models (Section 8.7), or finding an economical
standard "cunonical" model to represent the obsetved behaviour uniquely.
Our choice of model is conditioned not only by a desire for uniqueness and
simplicity, but also by the intended usc of the model, the physical significance
of its parameters and our background knowledge about it. Example 8.2.1 is a
caSe in point, where a second-order transfer function with three parameters
completelYdcscribes the relation between II, (I) and .\",(1), but is less physically
informalive than the four-parameter compartmental model, and cannot
easily take inlo account the non-negativity or the rate constants.
for S,l.s.o. transfer-function. dilrcrential- or difTerence-equation
input, output ll1oucJs, dcterministic idcntiliability merely requires lhat there is
no nxlunuancy, i.e. the moucl order is nol too high, and thal lhe input
stimulatcs all the behaviour to be moueHed. Seetion8.2.J considers adequacy
of the input. Stale-space models pose a much stiller problem, becausc of the
greal variely or possible panllllctcrisutiollS ror any givcn input'lHllpUI
beha viou!". As [:xHmplc H.2. I has shown, Laplacc-t ransform unal ysis CUll, wi Ih
care, test delerministic identifiability of low-order state-space models, but it
8.2.2 Deterministic Identifiability
8 SPECIALISED TOPICS IN
Taking Laplace transforms of the rate equations and eliminating X, (s),
find
Example 8.2.1 deals with a linear, luw-un.lcf, tilllc-invariulll Illudel whose
transfer function is estimated accurately. Checking its idcI11illubility
nevertheless takes a little thought. Ideally, we should design an identification
experiment with the help of analysis, running through a number of
combinations of usable model and feasible expcrimcnt until we find u model
we can in theory identify uniquely. UnfortulHltcly, this may be diJliculL, as
identifiability depends onlllany things: scope and quality of Ihe observations,
(i) With k 0' = p, we can find unique values k '" = ()' - 11(/1 - a - fI))fa and
k,,=(/I-a-p)(1 +pfaJ-J'/a.
(ii) 'With k
02
=p, we have k," +k
12
=II-a-fI and pk", +k
01
k
12
=
from which -\-(a-ll)k"l +)'-a,,=O and
In general, k
OI
and k 12 are non-unique because of tile quadratic. However, no
ncgalivc value for k (J I or k 12 woulll make physical sense so, ucpcnding on the
actual numbers, we lIlay be able to pick out a u/lique solution for hUlh. For
example, a = I, 11=4 and )' = 2 give
k", = 1.5 0.5,11 +4'1 and k" = 1.5 - I' +0.5-} I +4"
To make k
OI
non-negative the ambiguolls sign in k
OI
must be positive ifp > 2.
That would make k 12 negative, though, so the model willr, p > 2 is
incompatible with the observations. If 2 -,j2 < " :0; 2, non-negativity of k I'
requires the positive square root for k 12 and hence the negative square root for
k
o
" and we obtain a unique solution. For 0:0;,,:0; 2 -,j2 both solutions for
k
01
and k 12 are non-negative, so the solution is nOl unique.
(iii) With k 12 =" and a, {I and J' as before, we have kill - 2k" I + I' - I =0
and k
01
=3 - P - k
oJ
' We must take the larger solution for k
U1
to make
non-negative if" < I, but if p> I the smaller solution for k
u
, is required to.
keep k
01
In both cases the oth57f rate constant is non-negative as
a result. The sign ambiguity is Ihereby resolved for any practicable", but only
by c!Jance; a small change in ex, 1101")' would leave it unresolved, and the model
not uniquely identified, for some values of".
where 0: = k
21
, fJ = k
OI
+ k
o
2. + /.;12 +k
21
and)' = k
ol
k
02
+ k
O
]k
l2
+k].
The experiment finds a. Ii and )'. With k 21 given by a, two equations remain
in three unknowns, so we neetl some auJitiollaI information. Let us
whether prior knowledge of one ol"k
o
\. k
02
or k 12 allows us to Hnd the others
uniquely. Consider each in turn having a known value p (perhaps zero).
202
205
(8.2_8)
for i =)
ror li;6)
rlq, = 0 (not needed),
I Jas rl. The amplitUdes of the exponentials inl!J2I(tJ!then
and '11 unknown. From (8.2.8).
B.2 Il)ENTiFtAUlLITY
By definition we also have
r!qj =
(i) Irwc klluw k
OI
=fl. thcll since k
01
is -all -111.1'
-k
ol
=([11 +([21 =(r
l
+rl)lAcll =-,,
is lincar ill r 1 anu tugetllt.:r with r: (12 = 0 gives r I uniquely ill gcncral. The
rCITlililtling equations give q, uniquely, so M and M-I, and hence A and the
constants, arc round uniquely.
we lhree equations ill foul' lIJlkntlWIlS, anu the mudcl is lilliLlcntiliable
rurther information.
Example 8.2.2 The problem of Example 8.2.1 will be tackled b)' normal-mode
analysis.
The impulse response from III to x
2
' with Band C known, gives
The eigenvalucs are readily found by Jilting exponentials to the observed
",prs, so/\ inI8.2.5) and eM in (8.2.7) arc known. All our information about
the model is now in the form of bilinear equations in the unknown rows of M
and columns of ,\1- I. Prior knowledge consisting of lincar equations in
elements of A can be expressed in that form also, by use of (8.2.5).
If We can solve uniquely for M and M-' and know /\, we can find A
uniquel)' through (8.2.5). We first choose the value or anyone non-zero
clement of each eigenvector at will, since the scaling of each eigenvector is left
free by ils defining equation. It is usually besllo make onc row r or column q
entirely oncs, or and zcros if zeros arc presenl1caving some free choices
elsewhere. The pusitions or zeros in iH and A/- I can be round casily from the
pattern of zeros in A imposcd by the model struelure (Norton. 1980a). With
olle r 01' (I known, some of the bilincar cquatiolls becomc lincar, amI the
idcntiliabilily lcst has beeillurucd into it test whether a mixed set of linear and
bilinear equations has a unique solution (Norton, 1980b; Norton el al:: 1980).
This reformulation of the original problem of testing whether equations of
degree lip lo /1 have a unique solulion makes a quick solution more likely. It
does not, however. produce a Ileat general criterion for unique identifiability,
and it increases the number of equations.
(8.2.
Vis) CX(s) + DU(s)
)'(1) Cx(li + DU(l)
H SPECIALISED TOPICS IN 1I)1I'N'f!I"lr,_TI
x(I) = AX(I) + Bu(l),
JX(J) - x(O-) = AX(s) + BU(s),
2U4
M/\ '" M diag(l. I ' 1. , ' _. _i.,,) = AM
and element (i,j) or A is
lIu = elcmcllt (i.j) or AfAAl 1 =rlAfl,
For allY positive integer k,
MI'I'=AMI'I' '=,I(AMI'I'
so A4e!ll is eA1Jltf, and the impulse-response matrix CeA1B is CAle
ll
lft4 - lB.
brevity we shall assume that inLlividuul c1L:1l1L:111S of Al{'i\IAl 1can be
rrom the impulse-response mutrix. This is so ill Example H.2.1 and whenever
and C are known bL:c:Jusc or our ehoke or slule. Tile ohscrvathllis 1l1ell
one or morc response components
iJ.(/) = r'e'''11
t./ I}
and the output is
V(s)=C(sl-A)-'x(O-)-I- IC(sl-A) '/J+lJiU(s)
The elements of the transler-function matrix C(s/- A I IIJ -1- D have
common denominator lsI - AI of degree 11 in s anu the clements or A. To tcst
the elements of A can be found uniquely from C(.\/- A) IIJ + IJ requires
method of testing whether a set of simullanculis equations uf Jcgrcc up to II
has a unique solution, whatever the numerical values. No such gCllcrulll1cthod
exists. We arc rcuuccd to ad huc searching for a unique algebraic solution, as
in Example 8.2.1. Similar comments apply irwe look at the impulse-response
matrix CeAIB.
The dilficulty can be cased, but not removed, by bridging the gill' betw"en
impulse-response matrix and stale-space model with a
expansion (Reid. 1983, Chapter 10; Blackman, 1977, Chapter 2). The idea is to
express A and CeAIB ill terms of the eigcnvalues ).1 to }.n or A, the rows rT to
or the modal matrix M and the columns q I to q" of M - I. The columns of M
are the eigenvectors of .A, so from the denning equation of eigcnvalues
eigenveclors,
can be very cumbersome 1'01' model onJcr as low as three (Norton. 1982<1),
reason is that the equations relating the parameters in the preferred model
the directly identifiable transfer-function or impulse-response coclIicients arc
of degree up lO II, the model order. In continuous time the slate model
(with x(O-) and u(I) givenl is Laplace-transformcd to
207
(8.2, II)
(8.2.12)
(8.2.10)
U
'
!X=- . u:x
I
(I' U'U(I = L(element I or Ua)' = 0
I"" I
N
where uis the veclor of regressor means. \Vhcn each row or U' is p successive
samples from a single stationary sequence lu - Di. the elements or V'et. arc the
result uf filtering lilt - i/ \ with a moving-average transfer fUIH.:tion
(XI + Cl.
2
=- I + ... + (i,,:: 1.-1- I. A moving average of a zero-mean sequence
Example 8.2.3 We contemplate 0.1.5. identification of the u.p.r. :11: iii the
model
where Iii is the meiJn or regressur i and culumn i or U'consists of zero-mean
samples I/Ii - Iii' The condition for Uri 10 be zero is then
Clearly Vex must be zero, i.e. the colulllns of V Illust belincarly dependent. We
can write U as
y,=h
1
11,-k-1 +h].1I, Ii--]. + ... +h/f,_k_I,+e1
with III I periodic with period P. Ifp > p. the last jJ - P regressors repeat the
first p - p. making U
T
U singular. Furthermore, if the d.c. componenl 01'11/ l
(the mean over a period) is zero. P must exceed p since any P successive
regressors II, to ll'-li""i-I'+ 1 wouill always sum to zero. Even whcn P> p
there may be trouble. e.g. irsuccessive half-cycles or 11/ l are symmetrical about
zero. so that the sum of' any two regressors hall' a cycle apart is always zero.
Further possible dependences cHI easily be fOlllld (Problcm X.2) but
enumerating them is tediolls. and it is simpler just 10 say lhat ill: must make
U
T
U 6.
For Vi U (0 be singular. a real. lIun-zero ex must exist such lhat VI'
which implies that
although they would nOI be linearly dependent for an arbitrary waveform,
they arc for a particular choice or waveform. The risk is significant since we
prefer a simple waveform, all other things being equal.
8.2 IIJENTIFIAUILITY
(8.2.9)
H SPECIALISED TOPICS IN IDENTIFICATION
Detcrministic identifiability assures us that we are not prevented 1'1'0111 Hnding
the parameters by practical restrictions on what variables call act as inputs and
observed outputs. We must next make sure that we are not prevented by
failure of the input to excite all the dynamics. For example, a singlc.. sinusoid,
input in Example 8.2, I would allow us only to lind one gain and phase change,
too lillie to determine the three transfer-function coellieients a, fi and J', Two
sinusoids would be enough.
Conditions on signals in an idcntiIlcutioll experiment to ensure adequate
cxcitaUon of the dynamics arc called persistency of excitation conditions.
They elleetively specify how many independent components Illust be present
in the input signal, and not surprisingly the Humber depends on the order of
the model. The conditions Ci\1l be inlerpretell in the frequency llomain. as we
have just done for Example 8.2.1, or the tillle dumain. They apply to both
deterministic and stochastic signals.
We ask first what conditions must be impused in 0.1.5. The o.l.s. estimate 0
satisfies the normal equations
We cannot solve for ij if VI" V is singular. i.e. if the regressors forming the
columns of U are linearly dependcnt. Commonly. several regressors are
lagged versions of the same signal, the output. The possibility arises lhat
8.2.3 Signal Requirements for Identifiuhilit}': I'ctsistcllcy uf Excitation
(ii) Ifwe know k
01
=1', then much as in (i), (I', +1',>"'1\<11 = -p, and we
have altogether two linear and two bilinear equations in r
l
and til' In this
small example, it is easy to eliminate all but one unknown, cl1lling up with a
quadratic and two roots in general. III a larger example. unique identiHubility
is rapidly established when a succession of linear-equation solutions gives all
the unknowns, as in (i), but not when some equations remain bilinear, as here.
(iii) If we know k " = p, then r:l\q, = /' and the analysis is very like (ii),
t:,.
Surprisingly, exhaustive deterministic idcntillability analysis for even a
modest class of models, such as third-order linear compartmental
(Norton, 1982a), exposes quite a variety of experiment-model combinations
which give delcnninale but non-unique solutions. as in Example 8.2.1 (ii) and
(iii). Sometimes the has a fairly obviolls cause, but sOl11climcs
nolo Atlcmpts at general deterministic idcutiliability anulysis have been
offered by many authors (Cobelli el (If., I n9; Dcllorge, 1980, 1981; Walter,
1982) with varying but incomplete success (Norton,
206
209
(8.2.17)
(8.2.16)
(8.2.18) I = 1,2, ... .1', = li/O.
0, = 0,
with U
o
and Jjp given. \Ve mightlirsl investigale its convergencc in the absence
of noise and errors in Illouel structurc. i.e. with
Persislency of exciwlioll conditions oll regressors often playa parl in proving
convergcnce of I.s.-based estimation algorithms. Detailed convergence proofs
are beyond the scope or this book (Goodwin ano Sin, 1984; Ljllllg and
S6dcrslriim, 1983), bul we can alforo a delailed look at one algorilhm,
recursive o.l.s .. lo gel a reel for the role of persistency of excitation. In our
presellt notation tlml with P interpreted as covOjrr
2
, the algorithm is
P
r
- (u,urp
r
-
I
or P
r
= p/ ,_ I -
I + II, J ,'_ 1u/
I -I- P,II,Crr - 1I}'(Jr_l)
*8.2.4 Persistency of Excitation Conditions and COIl\'crgcllce
ExamJllc H.2.4 Let liS examine a possible input (u J obtained by moving-
average filtering of whitc noise 11I'J:
Iii = 11"/ +1111'/ I + ... +/,,11'/ ' I'
AllY lincar combination or /I succcssive input samples, suy
has a o-lnlnsll".,,, G(O-I)( 1+ F(o' I ))11'(:- '), with obvious definitions of F
ano G. 1':;01' all cxact tincar depcndcnce to hold between every 11 siicc'cssive
input ,ampies, G(o - ')( I + n= -') must be zero. Clearly lhis is impossible for
nOll-zero and causal G(.:-l), so such an input signal is persistently exciting of
any required order. In the frequency domain, the p.s.d. of Jill' J is nat and the
filter I + I ) can only have jJ spectral nulls, leaving an infinity of non-zero
spectral componcnts in \11]. 6.
is noll-zerO. Coellicicnls ct., to an canll1ake H(e -i,,'1) zero at no more thanll - I
frequencies, as they also have to Hx its overall amplitude. Consequently no a:
can make (1IR
IW
IX zero provided II'uCjw) is non-lero atl1 br more frequencies. In
other words, a scalar ergodic signal is p.e. ufordcr 11 ifit contains energy atl1 or
more oistillct frequencies. The proof cxtcnds to vector signals (Soderstrom
and Stoica, !SHU), The convcrse is also true if (if I is scalar, since if <j'u(jw) is
non-zcro at /I - I or fcwer frequcncics, a: can be chosen to makc
IH(e-iIIJ1)12fl'u(jw) zero throughout. However, it is not gencrally neccssary for
a vector signal to contain energy at n frequencies to be p.e. of ordcr 11.
8.2 IDENTIFIAIlILiTY
N
L(II, -1')(11, , ,- II)
, ,
i= 1.2.... 11:/= 1.2.... ,11
8 SPECIALiSED TOPICS IN IDENTIFICATJO
I"rw(i) = lim

N
U = lim _!- 11
,v-." NL I'
I'" I
Ruu = !I'uu(i -iJ]ij'
is positive-definite.
208
cannot be constant and non-zero, so if iiT(X is nol zero, (8.2.12) cannot be
salisfied if 111 - II) is of period N or less; nor, clearly, will il be salislicdlby
stochastic {u - it}. With UTa: zero, it can be satisfied only if U'a. is zero.
CUll be sure of avoiding trouble so long as V'I" Viis posilivc-dclinite, since
(U'a)TU'a., and hence V'a.; cannot be zero.
We now have the motivation for a definition or a usable signal Ill).
or its sample mean and a.c.r. exist W.f1. I, ilnd if the 1l1,Hrix
If lit I is ergodic, expectations call replace the sample averages, and the signal
p.C. of order 11 if the mean and a.c.f. exisl and the 11 x 11 covariance matrix
posi live-dennile.
The frequency-domain tondilions for an ergodic 1/1 t to be p.e. of order
can be found from C
= - u)(u, - = - u))']
where ur is [U, If, I II, III and uis now [II II itll". According
to (8.2.13), R
UII
is singular only if a real. nOll-zero a: exists making a:r(u, - ii)
zero always. We can regard a:1(u, - u) as the outpllt 01" a filter with transfer
function
Definition A signal {u} is persisteJltly ('xtf/jug (p.e.) order 11 if the
limits
driven by \11 - (j \. Its spectral densi ty is thereftlre IJJ((' ";"")l211).Jj(l)), where Tis
the sumpling interval ami <I};,(jw) the spectral density of lu- iii. Parsevul's
theorem then gives
The speclral densi ty 1!),,{jW) is Jlon-Ilcgative a t all frequencies, so the only way
a:
T
R,ma: call be zero is if Ilk-
i
"''!) is zero at cvery frequency at whil:h 11'Il(jw)
211
(8.2.27)
(8.2.30)
(8.2.31)
(8.2.32)
1=0.M,2M, ...
iM
, (1'- I) . ( 1 'T) .
I. mill j,\1 = I m,,, )'0 + L Uk"k 2': 1'0 + II:
k=-I
. -'I' - .. - 1 -r - -,. )-1-
(liO +It)OU.fO/1ll :::;:; Amit1(PjM )OiMOiM :::;:; O/{I,fJ iM 0iM
:::;:; liJ
U
= )'o0riD
a
8.2 IDENTlFlAlliLlTY
A strengthening of the p.c. condition will enable us to say something abollt
lhe convergence rate. If
, ,
lim i.",,,,(' u,u
i
) = lim , u,u
t
) '" lim II: = .J. (8.2.28)
'-'J L ,', IL r--'t.
o Ii "'" I k "'I
I: independent or u" that is, batches or 1"1 successive u's contribute
lIVC-llelllllJlC increments to rr I, and if also we take I as 1'01 with To
By (8.2.23), O/P,-IO, is a non-increasing function of I, so to ensure that O,TO,
converges to zero as t tends to infinity we need only assume that
1 u/;ut) tends to infinity. Different though it looks, this is equivalent
to a p.e. condition on {u}, for if R
uu
is positive-definite,
and
Here we have recogniscd that 1'01 has all its eigenvalues equal to )'o, and that
smallest eigenvalue oftlie sumof two symmetric matrices is no less than the
sum or their smallest eigenvalues. From (8,2.30), (8.2.24) and (8.2.23) applied
1=1 to l=iM,
We conclude that OTO measured at intervals of Mrecursion steps converges to
zero at a rate asymptotically inversely proportional to time.
(8.2.15)
(8.2.26)
u;rp'_1
I + u;r Pi _I",
H SPECIALISEO TOPICS IN IDENTIFICATJON
,
(
, ')-'- ::::: All1ih /....; uj,u
k
0, 0/
k =- 1
O,TO,:;; O/P,-,O,!"''''''' (Iukut)
k=- 1
, ,
II/P,-IO, = 0.(1',;'1 +I nkut)II,:?; II, I u,uiO,
k == I k=- I
210
and from (8.2.16)
uTp = (I , ,
so
llJ, = I
so
A recursion fot the parameter error iJ is round by subtracting each side
(8.2.17) from II after substituting for .1', from (8.2.18):
.... . -r- . . 1-
0, = 0, I - P/U/US 0,._ I = 1
With the equation for P,-I in (8.2.16), (8.2.20) is in a sense a complete
statement of how the error evolves; it has very little intuitive appeal, though. A
slightly better idea of how 10} behaves is obtainable from thc scalar O/p,-IO,.
From (8.2.19) and (8.2.20)
for any real x (Mirsky, 1955, p. 388). Hence
-T . T-
0r_IU,u, 0,_ I
1
From (8.2.16) P,__\ and hence P'-l is non-negative definite, U,I'P'_IU
I
is non-
negative and the last term in (8.2.23) is positive unless the prediction error
in Y
I
is zero. That is, jjTp-ljj decreases unless the model predicts the
Olltput exactly.
No p.e. conditions have yet been imposed on Ill}. If they are, we can show
that the more easily interpreted OTO converges. A standard result'is that the
smallest (real) eigenvalue "'ml,(A) or a real, symmetric matrix A satisfies
xTAx A
mit1
(A)x
T
x
So
Even with the reference input zero, the colulllns of U arc linearly independent
unless u, _ I' u
l
1 and u,_) arc linearly dependent. The closed-loop transfer
213
(8,3,1)
Output
Y(r
l
).
Fig. l:U.1 Identification in dosedloop syslcm.
I I 1.-
Vlz" )
1+ C(Z-l)
1+D{z
,)
Controller
i-----------l
PIon!
Noise
I I
1+ G{z-'j I z-*8(Z-1)

I
-
I+FU-
l
)
i
1+ A(Z-1) +
I
I
I
I I
I I
I I
I
H(z-l)
I
I
I
I
I
Reference
inpLit
R(z-l)
k J must be such that
B3 IDENTIFICATION IN CLOSED LOOP
function from Ie \ to {u I gives a relation of the form
-g(:-')=(l +<>1:-1 +<>,:-z+<>,:-')U(:-I)
Hence for UI-I' U,_1 und to be linearly oependent, so that, say,
(I +(1
1
:-
1
+(I,:-')U(:-')=O
(I +(1,:-1 +(Iz:-')(:-')=O
Providing Ie I is p,e, of order 3, this is not so and the model is identifiable,
(iii) Model as in (i) but conlrollaw II, = g(r, - )',) -/;11,_, -/;", _" Wilh
the refercnce input zero, U is
(-(Ur +./.111 I U
I
_ I u
l
-
2
1
so the model is identifiable subject to a p.C. condition on :e I much as in (ii).
The cxalnplc illustrates how controller complexity ano a non-zero reference
inpul 01' p,e, oulput noise help closed-loop idcntiliabilily, A more general
picture can be seen by reference (0 Fig. 8.3.1. The coefficients in the
denominator I + A and numerator B of the plant transfer function are to be
estimated. \Ve have as usual
The paramcter
II, I ill
S SPECIALISED TOPICS IN IDENTIFICATION
8.3.1 Elfee!s of Feedback 00 Identifiability
8.3 IDENTIFICATION IN CLOSED LOOP
(i) Proposed model YI = -(/Yr _I + b
l
III I + h,2l/r -,2 +. i',.
vector is [-a b
l
11
1
] and the regressor matrix U is [)'I-I
obvious notation. Because of the controller,
=f,_I-{U
r
__
1
+.!u
l
-
1
)lg
Example 8.3.1 We want to idenlify by o,Ls, the forward-path dynamics ofa
system which has a si.llllplcd-dala controller with reference input i,.:. The
control law, computed with negligible delay, is
tI, = g(l'/ - .J'r) -jil,._ t
We test idelltiflability in various casts by checking ror lincar depelldence
among the regressors.
II' the rclcrcntc input is always zero, as it Illiglll be ill il regulator, the
columns of U arc linearly dependent through the control law. Fur any el, V'l is
zero if II = <>[g I iT
r
, SO <>[g I iTI' could be added to lhe parameter
estimates without afTecting the model output. The model is consequently not
iden tili" ble,
Jr the reference input varies, the mouel is identiJiable unless f
r
- I is a
combination ofu
l
I and U,_ l' and hem:-e or Y, I and u, l' We need not worry
about this eventualily, as the rererence input will not depclltl on the current
outpu!.
(ii) Proposed model +e
l
Now
[YI I y, 1 U
I
I]' which from the control law is
[f, I - I r'_1 - (u, +.!i.ll _ J)lg !"I- iJ
212
Many systems have feedback which canllot be inlcrruptct.l for an identification
cxpcrimcnl. The feedback may be inherent, as ill a dCIIl allll-' supply-price loop
in economics, or externally applied but no less essential, as when an existing
controller cannol safely be disconnected from an industrial process.
Somelimes feedback causes no diilicully. for inslance when the overall closed
loop behaviour is to be identified and the set-point can be perturbed. or when
lhe feedback is known in advance and e[1I1 be allowed for in estimating the
f"orward-path dynamics from the c1osed-Joop response, In other cases
feedback may render lhe syslem unidenlitiable.
with J any polynomial in : - I, and still gel precisely the same transfer function
215
(8.3.8)
(8.3.7)
j= 1,2, .. .,.1 (8.3.9)
Gill' = -( I -I- FjU + error
U(O-I) = )W(:") + y,,(:., )\,(:-1),
wilh the controller switched between,\" dilrerent control laws to bring about
identifiability whcnnccessary. Each.;f'; and .!f
j
is a rational-transf'cr-functioll
We now examine a wiJely applicable identifiability condition for multi
w
variable lincar feedback :;ystems (Si.iderstrOIlll'f al., 1976). The system has /111
control variables. J\ outputs anu /111' external inputs apan from the output
noise. The external inputs J1w\ may be perturbed relcrellce inputs. signals
added to Ihe conlrol variables, or bOlh. The model and plant are or the rorm
will be obtained. impressively act.:urate but entirely useless if VI)' is already
kno\vll. We t.:;J1Il101. for instant.:e, clnp!oy the spectral or correlation methods
or Chapter 3.
(iv) A method with the advantages ol'(ii) anu (iii) is to reconstruct lu J froll1
jrl, l,l': i.lnuthe controller eLJuatioll (H.3.2). The plant parameters call then be
estimated directly. Since 11') is at our uisposal wc have some choice in lit:
rather than having il imposed on us bYlrl as in method (iii).
The accuracy attainable by method (ii), treating the over-determined
equations as observations and finding the Markov estimates of A, Band C, is
in principle the same as that achieved by a prediction-error algorithm in
mel hod (iii) (Soderslrom "Ill!., 1976), but round-olferror may have dinerent
elTecls. Melhod (iv) avoids the polentilllnumerical difficulties or
estimating A, Band C directly, and lu \ does nol have to be instrumented
(except in the unlikely event thal the controller is not known accurately
enough).
where Y. LJ Hild V are the :-lrallsJ'Oi'IllS of the vector output. control anu
OUlpuHlOise-gcnerating variables, and .YJ and If, are matrices oj' rational
lransler functions (with some weak assumptions ensuring good behaviour).
The reed back mechanism is
*8.3,2 COllditiolls 011 Feedbaek alld External [1I1'lIls to Ellsllre [dellliliability
8.3 lOENTIFICATION IN CLOSED LOOP
did not alter the orders or I -I- A and B. Depending on the orders or A, ll, F, G
and /-I, it may be thai 110 such undetectable terms exist, as in (ii) and (iii) or
Example 8.3.1, and the model is identifiable,
With this method, the identification technique is also important, for if it
does not enforce causality on the model, the non-causal model
8 SPECIALISED TOPICS IN IIJENTIFICATiON
U = (G/(I -I- F))(R -Ill')
Y (I +C)(I +1')
V= (i + .'1)(1 + 1; + o-'BGii
We inrer thai A and Bcannot be identilied rrom Y/V, but Ccan, and so eanlh
coefficients in the characteristic equation, and hence the closed-loop poles.
Switching between two or more controllers can make A and B iden(iliable by'
this method, as will be seen in Section 8.3.2. .
(ii) We apply a perlurbation to the relerenee input, identiry the parameters
of( 1 -I- A)( I + F) + 0 -kllGll, 0 -'BG and (I + C)( I -f- F) in (8.3.3), then solve
ror A, II and C knowing 1', G, II and k. The lirsl pari presenls no problems
beyond the usual open-loop Diles such as choosing a suitable perturbation and
gelling the model orders and dead time right. The second pari involves solving
two over-determined sets oflincarequations, one lor A and B UIHJ the other for
C. The rormer may be ill-conditioned, especially when the controller. is
intended to make the closed-loop dynamics insensitive to some of" the plant
parameters.
(iii) With the reference input zero, we observe III \ and \.1': and identify th
parameters or (8.3.1) directly. The Icedback may destroy the linea
independence or the explanatory variables in (8.3.1), as in ease (I) in Exampl
8.3.1, making the method unreasible. The trouble, as with method (i), is
indistinguishability or (8.3.5) rrom (8.3.1), bul now I + A and :;-'B arc
idenlilied separately, rather than combined in the denominator or y/v. Thus
the extra terms - GHJ and (I + FlJ in (8.3.5)could go undetected only irthey
(I+FlU=-GIIY
so we could replace model (8.3.1) by
Y=(-A-I-GIIJ)Yt-(o-'ll+(I-I-F)JJU+(1 +C)II
There arc several possible approaches to identifying the system.
(i) With the rc!crcllt.:c input zero, iJcnlify the cocllkicnls in the rational
transfer function Y/ Vby a.r.m.a. modelling of 1.1' l (with 110 exogenous input).':
Pole-zero cancellation between (I +11)( I + Fj +: -, llGII and I + C would
prevent calculation or A and ll, bUI is unlikel'y. Amarc seriolls snag is Ihat with
R zero,
and the controller is
214
so, eliminating U,
I -I-A)(I -I-F)-I-o-'BGII)Y=o-kllGR-I-(l-f-C)(1 +1')11
217
I ,I "!'! i/.
8.3.3 Self:runing Control
8.3 JJ)ENTIFICATION IN CLOSED LOOP
with .if' square. As Jcontributes PI' to the rank, the rank condition requires.K
10 be or full rank Pu for almost all ![ is immaterial. Full row rank for .X
means that the control variables must contain independent contributions
from w, and full column rank requires Ihe corollury that no linear
combination of external inputs has zero ellccL on u.
(L:) If thcre arc 110 external inputs, j is
We next examine brielly the prime example of identification in a closed-loop
system: self-llIlling contro!'
SO ..!I'I toY', must contribute a total 01" Pr, linearly inuependent rows (and
columns). That is, the output must proviuc, in time, PII inucpendent
control signals,anu some combination or outputs and feedback transrer
functions, P
J
in all, must bc guaranteed to excite u. 6..
Recursive identitication allows a cOillrolier to tune itself, i.e. adjust its control-
law coellkients on line by reference to a periodically updated plant model.
Self-lUlling, if 1'<.1 pid enough, makes initial man ual selling-up of the controller
unnecessary and enables a controller to cope with a strongly time-varying
plant. Figure 8.3.2 shows a self-tuning controller. Identifkation and control
synthesis arc carricu out by the same digital processor, and can sometimes be
merged so that the model is only implicit, simplifying computation, as we shall
sec later.
Ideally, the cOlltrol syllihesis should optimise plant pcrformance taking Ihe
uncertainly in the Illodel into act:oullt as well as the pcrformance critcrion.
The conlrol signal resulting from such an ovcrall optimisation has to.
compromise betwecn conJlicting requirements. It must ailll at good plant
behaviour, e.g. a well-regulated output, but with caution as dictated by
eSlimation uncertainty in thc model, and at the same time it must excite the
plant enough for idenlilicalion. The combined optimal identifi-
cation and conlrol problcm has been investigated over a long period
(Feldbaum, 1960, 1965), bUI complele solulions have been round only for the
simplest cases. An inviting if somewhat risky simplification is to treat the
model as accurate during conlrol synthesis: the certainty equivalellce
principle. Self-luning control schemes based on this idea and implemented by
microprocessor have developed rapidly since the early 1970's and are now
!PJ!J rows ,\ II
I P.r rows
S SPECIALISED TOPICS IN JJ)ENTIFICATION 216
The condition tests the feedback structure but not the model structure; it tests
identifiability for every model of the given tlimellsions. We arc ,he"ef""e
strongly reassured when the condition is satislied, but a model 0\ specified
order may be identiflablc eVen when the condition is not satislicd because not
all models with dimensions PIt and PI' are iucntiJiable.
The need fur more than aile controller "W" ano .!.II is clear, sint:e,f has fewer
thun flu +PI' columns if PU' is less than Pr, alld .\' = I, and t:allnot have rank
flu -I- P
r
Geilcrally "'(PII' -I- 1\) must bc allcast Pu -I- PI"' alld this turns out to be
sullicicnl as well as Ilccess,lry.
Condition for model (8.3.8) sUbject to feedback (8.3.9) to be s.s';.
(Soderstrom et al., 1976) With the stated assumptions, the model is
s.s.i. if and only if the matrix
(with Pw ... Pw,P
y
p). columns) is o[runk Pu -I- P
y
for almost every z.
Example 8.3.2 (a) In Example 8.3.1, I, so for s(P,..+Py)?'
Pu +P
y
' PI\' must be 1 or more. That is, not every Inodel is identifiable without
an external input. Some models are, us we saw in cases Oi) and (iii). With one
external input, the rank condition is satisfied and all models, even case (0, are
identifiable: the model configuration (8.3.8) is s.s.i.
(b) Ifin general there arc as many external inputs as control variables
only one control law, 'PI\' = flu and s = I, so f is
matrix. Any correlation between I and {vl complicates matters slightlYt
without changing essentials. The idcnlil1catioll is assumed to be indirect as in
(i) and (ii) of Sectiol1 8.3.1, or direct as in (iii) and (iv) and by a prcdiction-error
algorithm. The input {w J is p.e. of any finite order (e.g. linearly filtered white
noise). The dead time in Ihe plant-controller loup is not zero. Finally, the
model is said to be strongly syslC!m it/elllffiah!e (s.s.i,) when the parameter
estimates of every paral11cterisatioJl of .!IJ alld ((, capable or the same
input-output and Jloisc-04LpUl behaviour as lhe actual system converges
W.p. I, as the record length tends to inJinity, to the values giving that
behaviour. (The reference to "every paramclcrisation" should become clearer
in Section 8.7.)
219
updating gain, while the control-input signal loscs persistency or excitation
because it derives from a well-regulated output, can cause divergence.
We conclude this brier glance at an increasingly important topic with an
example. It illustrates how identifkation and control cun be merged, and
introduces the idea of recursive adaptive prediction.
Example N.J.J Let us L1cvell)!1 a certainty-equivalence self-tuning cuntroller
for a systelll dcst.:'ribed by
I C(_-I)
1)= - - _ U{O-I)+ F(O-I)
+A(o ) I +A(o )
8.3 IDENTIFICATION IN CLOSED LOOP
where 1/(:'-1) the transform ora zero-mean, uncorrclatcd noise sequence {oJ,
und
A(:'-!)==U
I
: I+"'+ou:"-n,
1)=:('1:-
1
+ ...
The dead time k is assumed known, so b[ is nOll-zero. The controller is to
compute a neW control value at each sampling instant, on receiving the latest
output sample, At instant I, the controller computes 1I, to
squarc crror
(I +C)/(I +A)= I + F+o-'G/(l +A)
with F of degree k in : - 1 and remainder G of degree.i = max(n. q - k). We
(ben have
y= U v) + (I + FW
I +A I +A
J=E[(Yrll..+I-Y;".H+i)2 j/ ]
r H+ I' the earliest output influenced by lit. Here .1';".1-/.: -1- 1 is the desired value
.1'1 H f i' amI the conLiitioning on , indicates that J is cakulatcu from
knowlcLlgc available at instant I. incluuing .1'"
As {lYIH+ IH'lI
t
is hi'
{lJ/i.... and /'12J/l1U/'=2hi>o
so Jis minimised by the ii, which makes E[y
l
+/.: + 1 II] equal.l';".11i + I' For brevity
[.1', +Ii + 1 II] will be called il +Ii + I and'argumcnt : - I dropped from transforms
from now on. The only uncertainty in +1.. + 1 at instant I is due to ii, and 1', + 1 to
VI+!.. + I' since we know all inputs and outputs up to III _1 and .rl' and can find all
v's up to 1'1 via the systcm modcl. We find /, + I.. -1- 1 by first splitling all' Ihe
contribution to Y,+k+ 1 due to r,l 1 10 l',+k+ I' This entails long division to
obtain
Sarnpler
Output
Control
synthesis
Disturbances
8 SPECIALISED TOPICS IN
ii,
Recursive
identification
lnihal estimates
and covdrldnce lor
model coefficients
j I
\
u"
Nbflonol
sampler
Fig. 8.3.2 Self-tuning conLroller.
218
available commercially. Aslriim and Wiltenmark (1984) give a very rerldable
introduction to self-tuning control, while Goodwin and Sin (1984) go into
greater detail. Certainly-equivalence sci r-lullcrs cail usc any of a of
control-synthesis techniques, among them pole placement spccJlYlllg the
closed-loop poles, linear-dynamics quadratic-cost
optimal control. deadbeat control and minimum-variance control
a quadratic function or present control and expected Olltput error at a future
instant. Permuted with uny of the simple recursive idcntilh.:ation algorithnWti
they provide a large range of schemes.>
Any adaptive control scheme, even a certainly-equivalence self-luner
applied to a linear, time-invariant plant. is a non-linear and timc-varying
closed-loop system. Nevertheless, swbilily and convergence analyses able to
deal with many such schemes under reasunable assumptlOils are ,no,W
available. Section 7.5 sketched that of Ljung (1977), and others appeur 111
Goodwin and Sin (1984) and its references, and the references of Astriim and
Wiltenmark (1984). An important point in implementing a certainly-
equivalence self-tuning scheme is that the idcnlificat!oll algorithm nol
become over-confident and "freeze", The danger arises when an
appropriate to dynamics is applied to a P,hlllt whose dynamics
dril"i slowly or change infrequently bUI abruptly. Aclion 10 prevent over-
confidence, for instance by the methods or Section 8, I, must also aVOId
opposite danger, of divergence. An excessive cumulative increase in
221
MA.I Orllcr-IIU.Tclllcllting Etluatiolls
8,4 ADDITION OP REGRESSORS: RECURSION IN MODEL ORDER
8.4, AUUITION OF REGRESSOnS: RECURSION IN MODEL
ORDER
and we need nol idenliry <') to <"I' We see thai it pays to match the rorm orthe
model 10 the joh in hand. /':,.
u))
8 SPECIALISED TUPICS IN IDENTIPICATION
f = =-'(
= = ' (IJ(I + F., ='(;)U'I' (i(r"": 'Ii
I -I-C . 11,'/ 1-1,1
where the lerm beginning: -I; gives thaI part ofy, I I; + I known at instantt,
(I + F) V the part due to later noise. The latter COlI tri bules zero to -I- k + I since
tv: is zero-mean, sO
220
We now have a recursion ror the (k + I )-step prediction or .1':
.1\+1;+ 1= IlIll , + h'21(1_1 + ... + m; I + gl.l't + ...
+gj.l',-j+ I - -'" - C'I.I\I'I;.'I/+ I
(8.4.1)
(8.4.5)
(8.4.3)
(8.4.4)
(8.4.6)
,pI' ,." = (V,: V" - V; V,,[ U;, V"r) V;,v,,) - )V,:(y - VA,)
H
7
V,:c"
We now substitute ;j,II'.'1 into (8.4.3) to oblain
- T T - IT, . 'T-
(JI'+" = (JI'- [UI'V
I
,] UI,I'I/W'"'IC/'
- - I I T--
(Jld'i = 0/, - lUI' VI'] U/,1- 'If/JI'+'!
which Sllbslituleu into the second gives
V;;U1,(O/l- [U;;UI>]-IUI';V'lrjJl'lq)+
y = [V" v"J[:] -I- e
Denoting the mDdel order by subscripts Dn 0 and'" as well as U and writing
[VI' V,l as U
p
+,' we have the new normal equations
Vi U ,=[U;,.v"O,,] (84 1 )
1,1,/ 1>1'/ I I.'I[} /'1//.1 I'+I/,} /..'1, .......
(, I' ! 'I 'I I' 'I 'I .,' /' ! 'I 'I)-
The first row partilion gives
We saw the ill-clrcCls orindllding Ilctir-reuulldan( regressors ill I.s. in Chapter
4. Such regressurs can be oClcC'led by singular-value decomposition as in
Seclion 4.2.4 or hy Ihe mOdel-order tests or Chapler 9. In this section we
consider the reverse process or building up a model by adding regressors. It
might be beller 10 sUlrl with an over-large set or regressors and whillie it dDwn,
but sometimes the observation of exIra variables is expensive or inconvenient
enDugh tD jusliry working rrom the bollDm up. The results or this section are
alsD one way Dr apprDaching lallice algDrithms (Graupe et al. 1980; Lee et al.
1982), which are recursive in bDth lime and model Drder.
We start by adding q new parameters f/J to p original ones 0, adjoining uJI:',
column regressor matrix '7'1 to Up to form the new model
order-incremcntingequtltion (8.4.6) resembles the time-update equation
so
" J)
'I
<'
) gl
_ ., k
- , (Il( I 1 F) V Hd')
I ,1- C
to upuale a vector of estimales
where +k stands ror [.1'1 +k II - I] and so on, anu
8 ' == h
,
t
:-
t
+ ... + == B(I + F)
An pxplicil minil1lll1il-Vlll'iance eSlimates A. IJ and C, computes
fl', Fand G, finds.l\ +k _ 'I -I- I to.l\ H recursively (which requires I + C to have all
its zeros inside the unit circle) and calculates lit to make /r H -I- 1 I'
II, = (Y*It/ II + (-I)\+k + .. , + i\i\+k_Il-l- 1- 1-'"
-bk+mur-k-m+l-glJ't _ .. ,
then pUlling the updated eslimales into the conlrol law to find III as in
explicit self-tuner. In other words, lhe controller consists only or i:I
updated adaptive (k -I- I)-slep predictor.
In regulator problems .1'* is constanl, anu lhe origin for II and y can be
chosen to make 1'* Zero. Since the minimum-variance control law setS.l\+k to
.1'1*+1;' +k - I to _ i and so all, all the terms in lhe control law are
zero. The control law simplilies to
m+l tglJ',++gjYJ'-j+I)/fJ'I
An implicit simplil1cs the computing by firsl lIsing lhe (k + I
prediction equation us a regression equation
.1'1 = + .. , + 21; m +gl.l'1 I; 1+'"
+gj.l"-1--/..-
C
l.1\ 1-"'-
C
'IY' 'I
Example 8.4,1 An extra lerm is added to the lUrgel-position model in the
Dolh (8.4.7) and (8.4.10) arc easy 10 vcril"y.
The resemblance to covariallcc-lI rda ting eq ualioll (7.3.25) is qui lc strong il'
we write
223
(804.12)
55.44]"'"
-4.952JT
y= V[V'Vr' VTy = P( V)y
8.4 ADDITION OF REGRESSORS: RECURSION IN MODEL ORDER
=
(i,,+, = [4.663 235.1
-40.30
and from (8.4.9), partition (I, I) of M
p
+, is
[
0.7477 -1.929 0.02481J
-1.929 8.067 -0.4897
0.0248 I - 0.4897 2.551
*8.4.2 Orthogonality in Ordcr-Incremcnting
The extra term contributes little to y, and fj has the large estimated standard
deviation JV
I
/
l
= 16.42. Moreover. its prcsence increases the s,d.'s of '\':0 and
lio considerably but only reduces the sum of' squares of output errors from
157,910 157.8. It is clearly not worlh including.
'rhe example sllOWS how litlle COlllputillg is Ilccucd (0 auu a single terlll; IV
is a scalar, a vector and tvl p+q correspondingly easy to form, 6
v:v, = 3.647 x'10-
1
V:,v, = [0.3 0.2611 0.1I8J'
=[-0.02811 0.1495 0.01837]'
IV = 269.567
radar problem of Example 4.1.1,10 account for rale of change of acceleration,
making the model
xU) = 'Y
o
+ 1'0/ +a
ll
/
l
/1 -I- br
J
/6
Here 0 is [x
o
t'o aoJ
l
, already estimated, and the new parameter e/) is b. with
q = I. AI" is also to hand. The new regressor vector "'I is
[0 1.3" x 10 .I 1.06 x 10 2 _Hi x 10 2 8.53" x 10 2 0.16')'
so to four flgures
We encountered orthogonality between the error }' - 5' and each regressor
\> vector in Section 4.1.3. and established its connection with the condilional-
mean estimatc in Section 6.3.6.Thc recursive I.s. algorithms of ChapleI' 7 can
be derived elegantly by reference 10 orthogonality, bUI we opted for an
algebraic derivation. To gain somC idea of the power and economy of a more
geometrical approach, we shall derive the order-incrementing equations by an
appeal to orthogonalily.
Recall that the o_l.s, estimate of y is
(804.10)
- A - 'B,
W
-1M -V1VtVIV) ,,'IV )-1]
/' I'qq 'Ii' ""'/,,
II"
8 SPECIALISED TOPICS IN IDENTIFICATION
BJ = [A , + A ',BWCA ,
D - WCA'-'
W (/) - CA 'Ii) ,
[
A
C
[
I
\J )-J
_ I I' P qq q r
- - WV;UpM,.
[
U'U UIV]-'
M - "" I'
,,+,,- VTv VII?
q p ., q
[
M" OJ [-M"V:,v'J
M,,+,,= 0 0 - M V'V
I' Jl l'
X (V: (V"M,.V;' - f)JI,) '[_ V,: V"M" V:' U"M"J
and we would expect much of the technique for avoiding numerical difficullies
in covariance updating to curry over to order incrementing.
Before interpreting the order-incrementing equations further. let us do
numerical example.
222
(7.3.21) of recursive I.s. Time updaling adds new raws ralher Ihan columns
the regressor ma tfix.
The malrix [V
T
V] musl also bc ordcr-incremcnlcd, 10 obtain a
covariance estimate ror the parameters. To do so we apply i.l partitioned-
matrix inversion Icmrna (Goodwin alllJ Payne, 1977, App. E)
[
A =r -(A -lJlJ"C)-'lJD" 'J
C D I - (D - eA 'IJ)' CA' (/J - C/ '!J) ,
and puI V:; V" for A, V:,v'1lor B, V" for C and V,: V" for D. Wc fmd Ihat,
we denole [V' V by M
I' P Ii'
where IV is delined in (8.4.5). The matrix inversions in part ilion (I, I) can
avoided by use or the matrix-inversion lemma (7.3.26) to give
(M,- V,; V,,)"' = M" + I<!"V;Y" /VV,: V"M" (8.4.9)
and partition (I, 2) is the transpose or partition (2. I) since Al/
I
+ '/ is symmetric.
Alternatively, partition (J. I) can be found directly in Ihe form on the righl of
(8.4.9) from
(8.4.16)
VT(y _ P( V)y) = UTy - UTy = 0
225
(85.1 )
_ Bm{.!..2__ =.b{J!)"\'.. ..
I +AIIl(s) 1+a,s+,+a,,/Il
Oile way to fit a reduced transfer function
8.5 MODEL REDUCTION
The fewer parameters a moud has, the easier it is to understand and apply.
The neatest way to ensure lIlUl a model has nO more parumeters than
necessary is to conduct order tests' during identification, as described in
Chapter 9, Nevertheless, we sometimes have to reduce an existing model,
perhaps to check whether order reduction alters the overall behaviour
significantly. There are many approaches (Bosley and Lees. 1972) of which we
shall examine a few of the most popular, applied to transfer-function rather
than state-space models.
8.5.1 Moment Matching: Pad" Approxilllation
A great deal of interest has been aroused in signal processing by the
development of la lIice algorithms (Friedlander, 1982). The algorithms employ
ail <I.r. time-series model ror I.s. signal estimation, and are implemented as a
cascade or idelllkaJ sections. each (;'orresponding to an illLTeaSe or olle in the
a.r. on..kt. 'flie <I Igorilhllls arc aUracl ive fur their t:lHlipUlalional Ct:OIlOIllY and
good IlUillerkal properties, ami are potentially useful 1'01' identification.
However. their economy depends on the model being an autoregression. For
an a.r., the regressor vector at time' is that at' - I shifted down one place and
with OIle new entry at the top. The normal matrix is correspondingly updated
mainly by shirting south-east. Withoot going into the details, we can
appreciate that .this simplifies a combined time-updating and order-
incrementing algorithm greally, For identification, we are rarely happy with a
purely a.r. model, and almost always require an a.r.m,a. model with
exogenous inputs plus a noise model, and perhaps also a constant term. The
updating is much less simple. with several new samples entering at each
update. The result is that computational economy is lost (Robills and
Wellstead, 1981 l. and the lallice method has no overwhelming to
counterbalance its relatively complicated progranulling and dilIiculty of
interpretation in identification.
8.5 MODEL REf)UCTION
8,4,3 Lallice Algorithms and Identification
(8.4.21)
(8.4.18)
(8.4,14)
8 SPECIALISED TOPICS IN "'ONTII",'.
114(12) = - /14 VIV A1
1ll
)
/' I' 'I
U/)"+,, - V"1>,,, "I = U
We recognise this immediately as the normal equations (8.4.2).
Similarly. (8.4.16) requi,es thaI
V'y-V'(V M"'l+ V M
I
22I)V
1
y=U
II "I' 'I 'I
Swapping Up and V
q
all through gives 1\41 III as in (8.4.8), and .M(21) is A/
o2
l
T
,
as by definition MJI+'/ is symmetric. _
The order-incrementing equations for (J and'" also comes from (8.4.
but written as
Now (8.4.15) must be true for any values of U". V" and y. including y non-zero
but UJ,y zero, so that
VT(V Mll"+ V M
122
I)V'y =0 (8.4.17j
l' I' 'I 'I
To satisfy (8.4.1?) it is sulIicicnt to make U;,( V,/kIll 2) +- V,,'H
f22J
) giving
V
Ty_ VT(V M"" + V MI2IljV
T
v_ U'(U M""+ V M
l21l
jV
T
y=O
{I P P 'I l'.J {' II " 'I
(8.4.
VTy_ V'(V 11'"'+ V M
'2II
jU'v- V'(V M
11
"+ V M'22ljV
T
v=O
II II pi 'I II. 'I I' 'I II.!

V"jM".,.q[Vp VqlY =U
"1 V'I
or in terms of the partitions A;flll J, AP 121, ki
flll
and A1
122
) of Al/1hl'
and it is enough if 1- V
T
( U M"" + V M'l2
I
) is zero which on substitution
'1 {I 'I' .
of M"" from (8.4.18) gives
MI22I=(I/'V _J/rV M V'V)"'=W
q II 'I Jl l' Jl q
By the sallle token y - POV"V,/H,Y is orthogollallu cuch l.'olulIIII 01" U
1
, aIll] Vq
so, with AI Jcnoling the invcf:-;c or the lIorJllul Ilwlrix as bcltllT.
where the projection matrix P( V) projeels y orthogonally on to the
formed by all lincar combillations of the columns of V, Hence the
y - P( V)y is orthogonal to each column of V, as is easily vcrified:
224
H(s) = I - O.7s -/- 0.6375s' ... 0.6265s
J
-/- ..
227
"
1.5 10
Time
05
(0)
(b)
Time
':,;;:-----.
--------


--
.. .. _._ --... - ... _ ... _._1".
o 05 1.0 1.5
10
20
o

c
o
o

c
o

e! 05
o
'"
"'
8.5.2 Continued-Fraction AIJIJroximatioll
"(0 -/-) = lim sH(s) '" lim s B,.,(s)
S-a) S-a) I + A",(s)
10

."
o
E
H
8.5 MODEL REDUCTION
Fig. 8.5.1 (a) Step and (b) impUlse responses, Example 8.5.1. : original model;
---: second-order reduced model; Jirst-onJer reduced model.
Amodel-reduction method popularised by Chen and Shieh (1968) is to expand
the original rational transfer function as a continued fraction ill the second
reduced models. Except very early on, the second-order modellits the step
response well. The impulse-response fit is less impressive, with quite wrong
behaviour initially. The trouble can be traced to a dill'crence in pole-zero
excess, two for the original model and one for each reduced model,
invalidating the approximation
8 SPECIALISED TOPICS IN 1I)1r.NT1F'Ir.HI
Y(.I)
Example 8.5.1 The model
(I -/- 0.5s)( I -/- 0.2s) .
. .... U(.I) -/- nOise
(I -/-s)(I -/-0.25s)(1 +O.ls)(1 -/-0.05.1)
An mlh-order reduced model then mutches the coeHicienls of SO to s2m-l
to a larger continuous-time model is to expand the transfer function of
larger model as a power series in s:
H(s) h
u
-/- h ,s -/- h, .I' -/- ... '70 (8.5.2)
then pick Ihe 2m coellicienls in (8.5.1) so as to match terms of(8.5.2) up
"2m_I",2111-I, The process lJf rational-fullction approximation via a Taylor
series is callcu jJmle approximation (Watson, lYHU). The numerator and
denominator degrees can be chosen at will; we have maLic BIJI{s) of degree onc
less than I + 1"J
m
{s) to give a realistic !inite bunJwidth.
Matching h
o
matches the steady-state gain, i.e. the linal value
lim,_osH(s)!s of the step response, and we can interpret matching higher
powers of s as paying allention to the response to higher derivatives of the
input. The signilicance of the matching is bcst seen in terms of the impulse
response. For a stable system
Id'H(s) {I d'H(s)}
i!1l; = lims--
1
-.- = lim .5//-
1
---.-
S-'U S {:l ,-'x, s cls'
B,.,(s) = l/(s)(1 -/- A,.,(.I)
(i) First-order model: b" = b" and h, = boa, -/- h I = 0 so b
u
= I. a,
(ii) Second-order model: h
u
= b
o
' b, = hua, + b
"
b, = b"a, -/- b,a, -/- Ii, =
0, b
J
=h,a, -/-h
2
a, -/-li
J
=0 so b
o
= I. b, a, = 1.222. a
2
=0.218.
Figure 8.5.1 gives the step and impulse responses of the original and
and
(Gabel and Roberts, 1980). We see that matching h, matches the ith
moment of the impulse response /t([). That seems sensible enough.
is to be reduced to first or second order. To do so, we write the m"1>'<[
function numerator and denominator ill ascending powers Or.'i' then expand
H(s) by long division, quicker Ihan repeated dill'crentiation and (8.5.3).
obtain
{
ldiH(I.)} I'
2.'-' --I.: = (-I)'h(t)dl
s l..\ {)
226
229
c,
if .-::, G
s
if
J
+
(e)

l'J +

J (',2 +('4/.1'
(b)
+
(a)
V(s) -1 1 y( 5)
8.5 MODEL REDUCTION
Fig.. 1:1.5.2 Continued-fraction model rcl.luctioll. (il) System with cOJllinued-fraction transfer
function of 211 quoLients, (b) system when G(s) is deleted.
r(.I") (',/.1' I
==
and (c) second-order system when G(s) is deleted.
ns) I + (''\('4-ls) + ('4-/\;

(8.5.5)
(8.5.6)
I
'"' +sIJ(s)/IJ(s)
'"' = I/bo
c, = bo/(a, - b,/b
o
)
C
J
= (a, - b,/b
o
)/
(b, - b
O
(a
2
- b,/bo)/(a, b,/h
a
)
8 SPECIALISED TOPICS IN IDENTIFICATION
I
(.'1 +
('2 I

s I

C
-'+" .
s
'" ... .---
, (l/s)(lJ(s)/D(s
B(s)
I + A(s)
B(s) I
I +A(s) = (I +A(sJJ71J(s)
I
228
then truncate it after 2/11 quotients and reconstruct the reduced-model rational
transfer function B..,(s)/(I + A..,(s. The continued fraction can be thought
as the transfer function of the nested feedback-feedforward model in Fig.
8.5.2(a). Figures 8.5.2(b) and (c) show that truncation of the continued
fraction after twound fouf quotients respectively gives valid llrst- and
order approximations provided the innermost retained feed forward gain is
mueh larger thun the finite gain of the deleted section. At small enough s this is
certainly so.
Coefiicient CI is produced by one stage of long division 011 the reciprocal of
the original transfer function, leaving a remainder slJ(s)/li(,\') say:
One stuge of long division on B(s)/D(s) then gives '"" and so on.
coefficients turn out to be the ratios of successive elements in the first w,uu,u
of the Routh array
Cauer form
231
(8.5.12)
bj-a,b
O
(1 +(1,)2
Gml" 1= bt:t," 1:= 1
d
2
G", I = 2':,(1,bo =-"-0
dz-
2
,=1 (I +a
l
)'
sO we solve
Exumple 8.5.3 The reduced-order modcl
B.,(2 - 1)/( I + Am(z-I) = (b
o
+ b, z- ')/! 1 + alz -I)
is to be filled to G(z-I)=0.5z-'/(I + I.Iz-
1
+024z-
2
). Though G(Z-I)
is unrealistically small, the example will bring out the main features of
the procedure_ The three reduced-model parameters allow us to match G,
dG/dz-
'
and d'G/dz-
'
at z = I. Denoting the reduced transfer function
by G
m

we can express Ali as a weighted sum of G and its first i derivatives, all
evaluated at z = I;
Aill=GI",
Ai, =
{ "" : '" I
d'G I dG I
Ai +----
2 /:::-2 ="'1 d::.-
1
:='1

dz ==1 d:. :=1 d.. ':=1
and so OIL Hel1cc we can match the original and reduced models through
either the moments or the same number of derivatives of their u.p.r.'s. We
conclude that moment-matching in discrete time is equivalent to matching the
leading terms of the Taylor series expansions of the transfer functions about
z = I, in clear correspondence with expansion about s = ain continuous time
if we interpret z as esT.
For the reduced model, B",(z- I )/(1 + A",(z- ')) replaces G(z-
derivatives. For the original mollel, the derivatives or moments cun'be'
computed directly I"rom its u.p.f. if it is available and short-lived. II" 110t, the
original model is also writlcn in rational form and the derivatives found from
thaI.
8.5 MODEL REDUCTION
(8.5.9)
(8.5.10)
(8.5. II)
c, = I
c, = 1.429
'"., = 3.322
<'" =O.%W
0.04375
o
8 SPECIALISED TOPICS IN IDENTIFICATION
bo+b1z-
1
+ +blllz-
m
l+a
t
z '+ +a
lll
z
1Il
0.4425
0.1
0.04375
1.4
0.7
0.3425
0.0375
BIIl(z- I)
I +Am(z ')
d:?,1 == \'k(k-I)"'(k-r+l)g,
cI.. :::=:IG
k'" r
I
I
0.7
0.2107
0.2179
230
but less readily than in continuous time. The ith moment is
so the firstorder model, retamlllg only c
1
and c" is I(c, +sic,) =
1/(1 +0.7s). Retaining c
i
to '"" gives the same second-ordcr model as in
Example 8.5.!. to.
To change the reduced-model order we simply add or deletc continued.
fraction coefficients; the only reworking is to turn the continued fraction back
into a rational polynomial function.
8.5.3 Moment Matching ill Oiscrclc Tillie
Example 8.5.2 For the transfer function of Example 8.5.1, the Routh array is
Most of the literature on model reduction by moment matching concerns
continuous-time systems, but we are mainly interested in discrete-time
models. Expansion of a z-transform transfer function as a power series in Z-1
just yields the unit-pulse response ofdinatcs go. g I"" in
G(z-')=go+g,z-' +g2z-'+ ... w (8.5.8)
so the matching process of Section 8.5.1 would only match the start of the
u.p.f., ignoring the rest. The time-moments of {g) can still he related to
coefficients of the reduced function
and since
-0.4
Fig. 8.5.3 Unit-impulse responses of original unu reJuccu 1ll()(.lc!S. 0: original model;
,6,; reduced model; 4: reduced model with currect dead timc.
233
8.6.1 Bounded-Noise, Parameter-Bounding .l\tlodel
tcchni4ues to avoiJ this sllag have becn suggested. They usually deterl1line
I + A", first, keeping the stable dominant poles of the original model. The
simplest just throwaway the fastest poles, i.e. the leftmost in the ,,-plane or
nearest to the origin in the z-plallc. The numerator coellicients can then be
chosen to match mmoments, achieving 11 poorer fit than Pade approximation
does (if stablc) matching 2m moments. These alternatives have their own
drawbacks (Shamash, 1983), notably the risk of retaining a slow pole even
when it is almost cancelled by a zero anti sO has little effect.
8.6 RECURSIVE IDENTIFICATION IN BOUNDED NOISE
The traditional noise model we have adhered to until now is white noise passed
throngh a low-order linear filter. The white-noise seqnenee is characterised by
its variance or covariance and its mean. When considering its p.dJ.. we have
usually taken it as Gaussian. Real noise is orten far from Gaussian, and tends
to exhibit complications such as isolated abrnpt cvents due to unmonitored
control actions, intermittent disturbances in <lmbient conditions or inputs
such as feedstock quality. and instrument misreadings or
More gradual changes amounting to time variation of the noise statistics
often occur, as noted in Section 8.1. Chapter 10 contains examples of real
noise behaviour.
The fIltered-white-Gaussian noise model can be defined on grounds of
mathematical convenience, allied to a hope that an estimator with good
properties in idealised noise will still perform well in real noise. With enongh
knowledge of the plant and its environment, a detailed noise p.d.1'. might be
formulated and employed by a Bayes or m.1. estimator, but shortage of prior
infonnation or excessive computing demands normally rule out that option. A
more realistic aim is to match the noise representation to the extent of prior
knowledge, keeping it very simple when necessary bnt allowing for empirical
refinement of the noise model during identification. We shalll10w examine an
alternative to probabilistic noise modelling which does not pretend to more
knowledge than is actually available, bnt allows us to discover more about the
noise as we go.
8.6 RECURSIVE IDENTIFICATION IN UOUNDED NOISE
8 SPECIALISED TOPICS IN IUENTIFICATlON

5 10
Time (sample intervals)
OB
v

v
0
0

0.4
a
v

,
232
The derivatives of Gare rather tedious to find for even a second-order rational
transfer [unction, so we might think of summing kigk' k = 0, I, ... , i = 0, 1,2,
to get the moments direclly (or computing the derivatives by (8.5.11. In fact,
is 33% Jess than Ni:!.. and the sum up to g20 still 16'J:llcss, even
though g 15 and g20 are only 0.0352 and 0.0115. Accuracy mighl be poorer still
with a real U.p.r. model because of scatter ami bias.
Figure 8.5.3 gives the u.p.r. of each model. Only the first few points are
much in error. Matching M
o
has made the errors sum to zero, but only by
driving the reduced-model u.p.r. negative initially. This implies non-
minimum-phase continuous-time behaviour, contrary to the original model.
A model B,,(z- J )/(1 +A,,(z - I)) = (blz- I + b
1
z - 2)/(1 +aI z - I) gives a much
belter fit, also shown in Fig. 8.5.3, with no negative excursion. Similar'
anomalous u.p.r. behaviour will appear in models estimated from real records
in Chapter 10, again due to too short a model dead time. L;;
8.5.4 Other ReducHon Methods
Pade approximation has the serious drawback that an unstable reduced
model may be obtaincd from a stable original model (Problem 8.5). Many
Given that the noise p.d.L and correlation structure are initially unknown,
perhaps cumplicateu, and dillicult to estimate reliably by way or residuals
frOIll limited records, is there a simple ilOll-probabilistic way to characterise
the iluise'! The answer is yes: by boullus. \Ve specify only the largest credible
234 8 SPECIALISED TOPICS I N IDENTIFICATION 8.6 RECURSIVE IDENTIFICATION IN UOUNDED NOISE 235
values that noise eould take, so the usual model linear in the parameters
becomes
Here the origin for )', is chosen to make the bounds 011 V, symmetrical, for
convenience. The bounds are taken as constant unless we know better.
Observation Y, tells us that
y, - r,'::; n",rO .s;y, + r
f
(8.6.2)
These two constraints on may be interpreted as hyperplanes in space,
between which must lie. A sequence of observations {J'I' Yz, ... 'YN ) gives N
pairs of hyperplanes, which together confine to some region D
N
After
processing YN' we know that 0 is somewhere in D
N
but we assign no
probabilities to ditferent positions, and make no attempt to extract a "best"
estimate.
Lack of a unique estimate of is at first worrying, but we can reassure
ourselves by reflecting that engineering design is largely a matter of
tolerancing for adequate performance in the worst casco For this purpose,
parameter bounds arc just what we wanl.
450
B,
\ .' /
\ //
\ / .
\ //
\ . /
\ //
.(/
/ ~ \
./' \
'/ \
23
a 150
-150
Observation
Fit!. H.6.1 1':lralllclCl' hUUlltlS, E:wtllplc H.ll. J. {'russ-section \11' /)1' hllll'hcu: m:i 1I11 I II drdcd.
(8.6.1) 1=1,2... ,N jv,l ,;; r, (known), y, = u;O + VI'
found by o.l.s. in Example 5.3.3. That is, 02 alld 0.1 are individually about
equally ill defined and the errors in O
2
and 03 are very likely to have opposite
sIgns.
As well as being conceptually straighlforwanl, parameter bounding throws
light on the strengths and weaknesses of the observations} and is potentially
valuable in experiment design. In this example it is clear that an observation
yielding constraints with d0
3
/dO, positive and about 1, and 20/t apart in the 0,
(8.6.4)
(8.6.3)
(8.6.5)
11'/11 u'll u
T

'/ = 11"11 u= Ilull'u = nju


U
,
u'll
11'/11 = - - ~ = 11011 cos a
lI
u
ll
and
Summary: Euclidean Inner-Product Space (Hadley, 1961; Luenberger,
1973; Halmos, 1958; Rockafellar, 1970) In a real inner-product space,
the length (or norm) 11011 of vet tor is defined as (OrO)I/2 and Euclidean
geometry applies. Ifq is the orthogonal projection ofonto u (Fig. 8.6.2)
and' is -'/, then by Pythagoras' theorem
so '/ and' are orthogonal if and only if ,i
r
, is zero. II' '/ is yu then
As we seem to be talking geometry, let us recall a few facts about Vectors in
Euclidean space. Those dealing with orthogonality are already familiar, at
least.
direction, would have reduced D considerably. An observation at r ~ t":' l.:with
noise 5, for instance, wuuld have given the chain-dOlled constraints in Fig',
8.6.1 (but only if the target Hew backwards). 6
[0
2
O,]r for
-1763 ]
3526
[
958
-1763
Example 8.6.1 In the radar target problem of Example 4.1.1, we decide to
bound the parameters on the (currect) assumption thut the noise is between
10, at every sample. Hence
y(r)-IO';;x
o
+00r+ar'/2,;;y(r) + 10, r=0,0.2, ... , 1.0
defines the range of the parameters 0= [xo v" aT' compatible with the
observations. Figure 8.6.1 shows the cross-section of the resulting.three-
dimensional polyhedron D
6
at X
o
= 5, the correct value. The observation at
r = 0 merely constrains x" to be between -7 and 13, i.e. puts bounding
hyperplanes parallel to our cross-section, sO they do not show.
The slopes - 2/r of the constraints fall in a fairly narrow range, so [0, 0,]
is ill defined in one direetion but well defined at right angles to it. The
elongated D
6
says much the sallle as the estimated covariance
237
(8,6.9)
(8.6.10)
1,1
t =.1,2, ...
(y, - - V,II),; 1 (8.6.11)
An appealing idea is to use an ellipsoid
-
,.(0-0,) P, (0-0,)'; I
Here we generalise lhe noise constraint to another ellipsoid, its defining
IlIHlfix R, becoming I} in (8.6.1). Figure H.6.J shuws the upuating of
the bounding cllipsoiu for O. Any 0 in or on both E,_ I anu F, satisfies
8.6 RECUIlSIVE JI)EN rJFICATION IN UOUNDEl> NOtSE
8.6.2 Parameter-Bounding Algorithm: Ellipsoidal Bound
in place of D" with the centre Or and symmetric positive-definite matrix PI
adjusted to lit D, as closely as possible. As ii, and p,-I have p and p(p + 1)/2
parameters, respectively, we need only update a fixed and quite small number
of parameters. What is more, the npdating turns out to be reasonably
uncomplicated, as dcmonstrated by Schweppe (1968) for state estimation and
Fogel and Huang (1982) for identilication.
The algorithm finds an ellipsoid , which includes all 0 contained in both
,_ I and the region F, between the hyperplanes dne to the latest observation.
That is, , contains all parai1leter values compatible with the latest
observation and, through E
r
-
1
, all earlier ones. Since the intersection of ,- I
and F, is not an ellipsoid, E
r
also contains some values incompatible with the
observations. It is pessimistic about the uncertainty in 0, and gives an outer
bonnd on D,. Most of the npdating process can be followed easily for the
vector-observation case, with the model
in up the best, viewed in isolation, is that which maximises lIu,1I and so brings
the hyperplanes as close together as possible. The distance 2r,/lIu,1I can be
interpreted as a noise: signal ratio; the best choice of signal maximises
signal: noise ralio, ullsurprisingly.
We must return from this diversion into experiment design and see how the
parameter-bounding region D call be calculated. The principle is very simple;
instead of IHwing to find (j to minimise some risk, log likelihood or Ls. cost, we
deline D by listing the constraints (8.6.2) up to date. For some applications like
toleranced prediction, discnssed later, this is good enough, but for others a
more concise description of D is essential. The number of vertices of D, rises
far more rapidly than I, so listing them would not be practicable. It would also
be diffJcnltto npdate a vertex list observation by observation. We are thus led
to look for some easily specilicd and updated approximation to D.
(8.6.8)
(8.6.6)
(8.6.7)
u
z < U 0(1) < Z
'1 - P - 2
so for any 0 ,; A'; I,
" ,;,to'" +(I - AjO''',; 'z
and the whole urthe line joining Oil' and 0'" is in the sel. The same goes
for a more complicated polyhedron D formed by more than 2p
hyperplanes, not necessarily in pairs.
H SPECIALiSED TOPICS IN IDENTIFiCATION
with V non-singular.
The pcints in and on the polyhedron form a convex set; that is, if
zit ;; Zi2 for i = 1,2, ... ,p, any such points 0
01
and O(2) satisfy
8
Fig. 8.6.2 Orthogollal pHljCl:lioll.
Hence UTO = Z says that the length of the orthogonal projection of 0
onto uisz/llull. This accounts for all 0 in a hyperplane onto which tu/llull'
is the perpendicular from the origin. A pair of hyperplanes u}O = Zil and
uTo = =12 are parallel and 1=11 - z"lIl1u,1I apart, with u
l
normal to both.
In p dimensions, any p linearly independent pairs deline a polyhedron
wilh 2
P
vertices given by
Assume that distance (vector length) as deli ned above is appropriate
measure the uncertainty in O. (If not, (01'WO),/z with a specilied weighting
matrix W can be accommodated by transforming 0 to 11* '" GO where G
1'
G
W, so that 0*1'0* is OT WO.) From the discussion just before (8.6.6) we see that
the hyperplanes (8.6.2) are 2r'/lIu,1I apart. It follows that if we have any
236
239
(8.6.18)
(8.6.20)
(8.6.22)
(8.6.24)
(8.6.25)
(8.6.26)
1',111 = MI\.
If +bcTl =11 +cTbl
(0 - 0,)'1',-1(0 0,) = {TM-
I
1',-1 = - '{,;, 1
8.6 RECURSIVE JlJENTIFICATION IN UOUNDED NOISE
By transferring the tidied-up term in v, to the right-hand side 01'(8.6.15) then
dividing through by the right-hand side, we tind that the updated ellipsoid is
(8.6.9) with
(8.6.23)
p
V, = nAi/2 = 11\.1
1/2
= IM-
I
1',MII/' =.11111- 1/'11',II/'IM1
1/
2= 11',1'/
2
j"" I
We have yet to choose 1', to make E, as tight a bound as possible. With D,
unknown, tightening , has to be interpreted as making E, small. The volume
of E, is proportional 10 11',1
1/
', as follows. As 1', is positive-detinite, all its
eigenvalues AI to AI' are positive and its eigenvectors nil to III
p
are orthonormal
if suitably scaled, i.e. milllj is zero for i '" j and I for i = j. If
I\. &;, diag(AI , A" ... , AI')' 111 = [111
1
111
2
.. III
p
] (8.6.19)
then by definition of eigenvalues and eigenvectors
and 1"1' is AI -1. The description of is simpler ill terms of e, dellncd as
M'(O - 0,), Since
= = (0 - O,J M'(O - 0,) = 110 - 0,11' '(8.6.'11).
this change of co-ordinates just shifts the origili to Of and rotates the axes;
without altering distances measured from the new origin. Now
P;-I = P'-l - PIP, -
1
U;I'(R
f
+ f'IUfP,_1 U,T)-I U/p
t
-
1
For scalar observations, we can write this as
P;-l = (1- p/pt_1u/u;r/{r? + PIU; p/-
1
uI)P
,
_
1
then lind 11';-11 via the leml11a (Goodwin and Payne, 1977, Appendix E)
We must express jP
f
!as a ("unction of P" First we tackle IP; _JI. The matrix-
inversion lemma turns (8.6.14) into
so E, is centred at the origin, with its axes aligned with the {axes and of half-
length A,1/2, i = 1,2, ... ,1'. Thinking of E, as a unit hypersphere I
squashed or stretched successively in each direction by a factor ).//2, we see
its volume is proportional to
(8.6.14)
(8.6.16) 0,=0'_1 +P,P;_IU/R,-l",
H SPECIALISED TOPICS IN IDENTIFICATION
Fig. 8.6.3 Updating or cllipsoillal outer boulll.J for 11.
v, = y, - U/i'_l"
23H
where
where
so for any non-negative PI'
(0 O'_I)T1',--\ (0 - 0, _I) +p,(Y, - V,O)l R,- I (y, - V,II) ,;, I +p, (8.6.12)
The left-hand side is quadratic in II and can be re",-ranged into the form (8.6.9),
to give us " Writing UfO as U/J,_ I + U,(J - 0,_ I) and collecting terms, we
tind
- T 1 . - -I' I ..
(0-11'_1) 1"_I(O-O,_,)+p,(y,- V,II'_I) R, (y,- V,O,_,)
- 2p,(y, - V,O, _, )TR,- I V,(O - 0, _I) ,;, I +1',
(8.6.13)
Already echoes of the Kalman liIter and recursive I.s. are heard. In (8.6.13),
both terms in II - 0,_ I can be incorporated in the new quadratic by completing
the square:
(0 - 0,- 1 - P,P;-l U;rR,-IVJfp;:::: :(0 - 0,_ J - 1 IV,)
-I- 'v,"'-- R,-l U,P;_1 U/R,- I l "
= (0 - O,)Tp;::::: (0 "- Of) -I- I - fJ,R,-1 V,P; _I U/R,- j )"
f
I -I- PI
(8.6.15)
The quadratic in I', in (8.6.15) is scalar, and independent oro. Weean tidy it up
by substituting 1';-1 from (8.6.14) then applying the matrix-inversion lemma
(7.3.26) backwards, giving
R-I - R-
I
UP' VTR-
I
I PI r t /- I I f
= R,-I - p,R,- I V,1',_ 1(1 + p,V,rR,- I V,1',_,) -I V,'II,-I
T - I
= (R, + PI V,1"_1 V, ) (8.6.17)
(8.6.29)
241
9,
o J[J'OOJ
I.sg x 10
6

-209 J[ 327J
- 2.09 x lOs 275
2.11 X 10
6
27.5
9.J3 x 10' J'[ 4.
70
J
-2.16 x lOs .. 286
1.03 x 10
6
- 254
o
1.50 X 10
6
o
-2.09 x to
J
3.88 x 10
4
-2.09 x lOs
_ 3.28 X 10
3
5.09 X 10
4
-2.16 x lOs
P,
min(u1"O) -"s +
(/ED o,"v
*8.6.3 Tolcraneed Prediclion
The output due to a spcciHetl Us in a bounded-noise model, with 0 known to
be within a region D, is preJictcu by stating bounds between which it will rail.
For a purpose like alarm scanning Of checking whether the output will be
within spccificalion, such a prcJiction is altractive, since it states deflllitely
whether, according to the model, an alarm condition could arise or whether
the output specification will certainly be mel. Similar comments apply
whenever the object of the prcuiction is to facilitate a yes/no decision.
With
An allernative to bounding the parameters by an ellipsoid is to compute a
"box" of bounds on the individual parameters (Mil'anese and Belforte, 1982).
The computation comprises a number of linear programming problems, one
per bound.
OUf look at parameter bounding ends with an example of an application
for which the feasible-parameter region need not be explicitly calculated.
P,
g,
v, 1',
P;-I
[
JOO
10' J 0.500 /.50 0
0
[ 424
2 6.09 x 104- 56.0 0.472 1.42 - 2.09 x 10
3
-209
[ 540
J 5.03 x IU
J
-17.4 0.445 1.39
-3.28 x 10J
9.33 x 10
3
As in recursive I.s., updating P takes the bulk of the computing. The extra
work to {iuu p is insignificant.
An interesting thing happens if we take lU
4
1 as Po' The first step is lillie
allcctcu and in the SCCOIlU we get a reasonable-looking P2' 0.333, but
according to (H.6.IH) P, is -1.291",. Negative principal-diagonal elements
appear in P" and the algorithm breaks down (E, is not an ellipsoid). The
explanation is that F
1
does not intersect EI' which is too small
too small. We must make Eo large enough to be sure it contains aBO
compatible with the observations. 6.
8.6 RECURSiVE IDENTIFICATION IN BOUNDED NOiSE 8 SPECIALISED TOPICS IN iDENTIFICATION
(I' - I +((2p - 1 - g, +vng,fll + - - g,) = 0 (8.6.28)
Ilelforte and Ilona ( 1985) have recently pointed out that the bound E, can be
tightened, at observations for which only one of the two hyperplanes forming
F
1
intersects E
t
_ I ' by replacing the one which <..loes not intersect 1_1 by a
parallel hyperplane tangent to E'_I before computing p,.
I{ecursive Ellipsoidal-Outer-Bounding Algorithm for Paramclcr.. 8ounding
Identification (Fogel and ('Iuang, 1982) With model (8.6.1), the
algorithm updates the ellipsoid
E:(O - ijf'r 1(0 - 0)';: I
as the onter bound for the feasible-parameter legion.
With 0
0
, Po specified (e.g. 0
0
=0, Po =
(i) Calculate gf = P,_ i u
f
and v, = Y, -
(ii) Find PI as positive root of(8.6.28); ifno positive real root, set PI
to;zero or stop and review I11odel.
(iii) Calculale P;_I from (8.6.25).
(iv) Set 0
1
= 0
1
_ I +PIP; _I (sealar-y version of (8.6. 16)).
(v) Set P, = (I +P, - +p,g,)P;-1 (sealar-y version of
(8.6.18.
Example 8.6.2 We apply the recursive algorithm to the problem of Example
8.6.1 with 0
0
zero and Po = 10
6
1. The lirst three steps give, rounded to three
figures,
Taking detenninants of P;_ I and P, in (8.6.18), we' then have IP,I in terms of p,.
The P, which minimises V, follows by routine algebra, setting oV';op, or
oIP,l/op, to Zero. It is the positive root of
If both roots are negative or complex, we infer that ,_ I and F, do not
intersect, in other words y, and Of arc jointly incompatible with r-1 and the
model at the assumed 1"/, This explicit warning allows us to revise the model or
r" or set P, to zero and ignore )', as all outlier.
with pP
f
- 1"' for band - u/(r; + Pi -tUr) for c. Denoting U}"P'_1 Uf by
240
243

,I''
I-O.8:-'J
I - 0.4:-'
1- 1.8:-' - 1.6:-'J
1- 1.40-' - 2:-'
The corresponding rational-transfer-function matrix is
') '" '(:-' )8,(:-')
8.7 IDENTIFICATION OF MULTIVARIABLE SYSTEMS
the model has exactly the same trallsfer-functiollmatrix G as in (i), bul A, and
B
1
have fewer parameters than A 1 and B
1
, and, depending on how far the
degrees 01" the elements arc known ill advance, arguably fewer than G. Single-
input-single-output systems have no such choice between two m.Ld.'s.
If
with many fewer parameters than we might have expected from the degrees of
elements in A I and BI' Nothing similar can happen in s.i.s.o. models where, in
the absence of polc--zcro cancellation (which incidel1 tally docs not OCCur here),
all cocllicicnts ill A t and B
1
in G.
(ii) All equally valid model is Ihe right mfd.
\'(:-') = lJ,(:-' )11;'1:-' )U{:-') + noise
Example 8.7.\ A multivariable system with input u and output y could be
described by the :-transform model
y(:-') = G(o-' )U(:-') + noise
where G is a malrix of rational transICr functions. there is more than one way
to write this model.
(i) 11" we slart with a difl"erence-equalion model and take :-transforms,
getting an a.r.Ii.l.a.x. model
II, (:- , )\'(: - ') B, (:- , )U(: - , ) + noise
with A I and BI matrices or polynomials in z-. I and A I non-singular except at
isolated values of z- I, we obtain the !e./i mutrix:!i'uction description (Ill.f.d.)
(Kailath, 1980; Goodwin and Payne, 1977)
\'(:-') = A '(:-')B,(:-' )U(:-') + noise
For example, we might have a two-input, two-oulput moLiel with
8 SPECIALISED TOPICS IN IDENTIFICATION
OUf limited aim in this section is to introduce two ilew aspects
opened up by m.i.m.o. systems: choice of parameterisatioll and choice
function. One of the most striking things about the Ii Leraturf on identification
is its concentration on s.i.s.o. systems (or occasionally m.i.s.o. or
which raise few new issues). Only a small proporliOI1 deals with m.i.I11.o.
identification, in spite of its importance, and this book is: no exception.
reason is not mere faintheartedness. As well as generally having
parameters than s.i.s.o. systems, m.i.m.o. systems raise substantial
problems. One, considered later, is how to base a cost function or risk on
vector of output variables. Other dillicultics arise when we start to choose
model structure. First, our fundamental nced in any idcntilication exercise
to understand whal goes on in the system well enough to judge the validity
the results and their practical implications. A m.i.m.o. system need not have
many inputs and outputs lor its overall behaviour to be lao complex to
all at once.' We are then forced to investigate onc s.i.s,o. (or cOl1c<civably
m.i.s.o.) relation at a tin-leo Scientific method itself owes much of its analytical
nature to this facl. By breaking a Ill,LIlLO, problem UOWIl in this way, we
also sidestep another new dilliculty, which is to choose an acceptable
parameterisatioll.
8.7.1 l'aramelerisation of Mulli-Inl'ul-Mulli-OulpUI Models
It is a non-trivial matter to decide how to paramcterise a m.i.m.o. model even
when its general type has been selected. The difliculty is best appreciated
through examples.
Example 8.6.3 We want to predict the target position at r = 2 from the results
of Example 8.6.1. To stay on Fig. 8.6.1 we assume X
o
= 5. In practice we would
keep X
o
free, of course. At r = 2
min(5 + 20, + 20,) - r," Y," max(5 + 20, + 20,) + r,
/JED /lEO
8.7 IDENTIFICATION OF MULTIVARIABLE S\'STEMS
y, is predicled by finding the extrema of a linear function of 0, subject to linear
inequality constraints if D is a polyhcdron. This is the standard linear
programming problem (Hadley, 1961; Luenberger, 1973) for which ellicient
methods able to handle hundreds of constraints exist.
so we need only find the extrema of 20, + 20, in the feasible region. They are
where lines ofslope - I touch the emls of D, al(O" 0,) = (261.25, - 18.75) and
(207.75, 122.5), the southernmost and northernmost vertices in Fig. 8.6.1.
corresponding range for Y... with r, =10, is from 480 to 675.5. D,
242
8 SPECIALISED TOPICS IN IDENTIFICATJON
245
8.7 IDENTIFICATION OF MULTIVARIABLE SYSTEMS
structure enuugh to make it uniquely identifiable if we are interested in the
parameters themselves, or minimal (in number of parameters) if we are only
interested in the model output but do not want to risk ill-conditioned
estimation. The choice of structure is further complicated by doubt over the
minimal number of parameters to fit the actual system adequately, and by the
desire for a structure which suits the application, e.g. control design. Selection
of a parameterisation is di,cussed by Glover and Willems (1974), Denham
(1974) and Gevers and Wertz (1984). Analogous comments apply to the other
main family of multivariable models, slate-space descriptions.
These examplcs and references also indicate that a large body of
multivariable theory underlies mj.m.o, idcntification. Ils unfamiliarity is
another rcason why wc cannot pursuc m.i.m.o. systems further.
0.45:-' ]
0.5 + 0.15z'
-0.5z-
2
]
0.5 - 0.35:.
1
- O.4z-'
(iii) The right m.Ld. with
A.-I _ [ I - 0.9z 1
,(. ) - -0.3
B,(Z-I) = [0.7 1
are still polynomial I1lalrices. Such all is called a right divisor of A2 and B
2
.
It does not alter G because
also gives the same Gis in (i) and (ii). The dillerences from A, and B, arc due to
the existence of a polynomial matrix M(z '), non-singular lor all but isolated
values of :.1, such that
244
(8.7.1)
8]A
3
1
= B
2
w/-
1
!vIA;' = B
2
A;'
We define the degree of the m.f.d. as the degree of IAI, since the common
denominator polynomial of the clements of G is I.ln arising frolll A- I. In
model (iii),
so deglA ,I = deglA ,I - deglMI
and going from A2 and B
2
to AJ and B] reduces the m.f.d. degree unless IMl is
of degree zero, i.e. a constant. That is why M is called a divisor. In the present
case
8.7.2 Cost Functions ror Multivariablc Models
Many s.Ls.o. identification algorithms minimise a scalar I.s. cost function of
the errors between observed and model outputs. The vector-output
generalisalion of the sum of squares of output errors is the matrix! L cO",ti>.
N
SN = L(Y, - Y,)(y, - Y,J
T
/""1
,['+0.3:-
1
"/(z- ) =
0.6
so we arc prcsented with an extra choice, how to derive from SN a scalar VN to
minimise. We might decide to weigh all elements of Y, - Yr equally and use
Weighted I.s. wilh a weighting matrix. IV can also be implemented through
(8.7.2)
(8.7.3)
(8.7.4)
,..." N
VN = L(y, - y,ll Wry, - y,) = Ltr W(y, - Y,)(y, - y,l r
/""1 /""1
=tr IVS
N
Finally, the cost function
turns out to have considerable appeal. If we assume lhat the error sequence
{Y - 5'l is composed of independent, Gaussian, zero-mean random variables
The example shows Ihat redundancy in an m.Ld. multivariable model,
introducing unnecessary parameters and consequent ill-conditioning, may be
non-trivial to detect even if the degree of the matrix-fraction description is as
small as possible, as in (ii), (iii) and (iv). An infinity of apparently diJferent
models has the same input-output behaviour. We must restrict the model
so 1MI is 2, of degree zero, and the degree of the m.Ld. is not reduced from (ii)
to (iii). A polynomial matrix with constant determinant is called lin/modlilar.
(iv) A len m.Ld. of degree one less than in (i) and the same G can be
obtained using the len divisor

which is not unimodular since IMI is Z-I. The details arc len to Problem 8.7
::,
8.8.1 Volterra-Series Model
8,8 IDENTIFICATiON OF NON-LINEAR SYSTEMS
247
(8.8.1)
18.8.2)
(8.8.3)
, .\ "J,!.",.
E[1/(1 - r 1)11(1 - ',).1'(1)] = ,5(, J - ,,)Ey(l) -I- 2/1, (, I ' ,,)
,
-I- f.,' ... f.,' iI,.(T I , T" ... , T,) rII/(r - ")d,, -1- .. Cfe
;= 1
worse for non-linear systems. The generalisation of the
inputoutput convolution
.1'(1) = J"'hlr)l/(r - ,)d,
II
H.H IDENTIFICATION OF NON-LINEAR SYSTEMS
Much elrort has gone into ways to identify the VO/IC/TG'kerne!sl!zI' "1' etc.,
mostly based on gcneralising the correlation mcthods of Chapter 3 (Billings,
1980). For example, for a Gaussian white-noise input (but not Gllywhite noise)
[1/(1- ').1'(1)1 = /11(')
and with ergodic signals time averages can be computed to approximate the
left-hand sides.
A representation as general as (8.8.2) would only be contemplated if the
system were too poorly understood to suggest a more specific model.
However. in those circumstances only an extreme optimist would expcct much
fron1 a model like (8.8.2). Unconstrained estimation orthe 11th Volterra kernel
is impracticable rOf 11 > 2 and uninviting for n = 2, simply because of the
number of points (or componenl functions) required to specify a function of"
indepcndenl variables. If a typical impulse response takes about 15-30 points
to describe it in the absence of good prior information 011 its shape, /12, might
take several hundred and h] Illany thousands. Large quantities of
observations and computing arc required to estimate the points ade4uately.
Even theil, the prospects of extracting any meaning from the results are not
good. It is from trivial to pick out the important Ji:alurcs from a lillear-
system u. p. r. cSlilllu (e in the presence of noise. Interprela tion of it dClailcJ and
Iloisc-alrcctcd h2 (r I- T
1
) is much harder, and the crucial step of infclTing the
form of a parametric model from the un parameterised results is daunting.
Simple non-linearitics such as hard saturation give rise to non-simple kernel
behaviour, to add to this dilliculty. Marmarelis and Marmarelis (1978) present
case Sludies whieh illustrate these points well. A number of ways of alleviating
the difficulties of Volterra-series identification have been investigated,
including usc ofa dcterministic pseudo-noise input (Barker and Davy, 1978)
is the I/o/Ierra series (S(:hetzcll, 1980)
y(r) = I.' iI
j
IT)II(1 - r)d,-I- I.' J" iI,lr j ,,)1/(1- 'j)I/(r-,,)d'j dr, -I- ...
() . () IJ
8 SPECIALISED lOPICS IN IDENTIFICATION
Techniques for analysing linear Llynamical systems afC clfcL:livc and relatively
easy to usc. We can swap rCi.luily from one to anuther as cunvenience dictates,
from Laplace tranSrOrillS to slale equations say, or from dillcrcncc equations
to transfer functions. We rapidly acquire an intuitive understand-
ing of lincar systems and arc happy to think illlcflllS of poles and zeros, power
spectra ano correlation functions, step ilia..! impulse responses. The contrast
with analysis of non-linear dynamics (Vidyasagar, 1978) is sharp. Methods for
non-linear systems tend to apply only to restricted classes, to give only partial
or approximate information and to be cumbersome. The reason is that
linear behaviour is vastly diverse and complex. Linearity and timc-invariance
impose tight constraints on possible behaviour. If wc say a linear system is
stable we don't have to add ifs and buts about initial conditions or size of
disturbance. We need not worry about limit cycles, birurcation and chaos
(Mees and Sparrow, 1981). Superposition makes it routine, almost trivial, to
relate linear-system response to excitation and initial conditions. An impulse
response or transfer [unction says all there is to say about the input-output
behaviour of a linear, time'-invadant systcm. Jump rcsonance, subharmonic
oscillation and generation of harmonics do not occur.
We might expect that non-linear systems arc also generally much harder
identify than linear systems, and this is so. A coherent body of WC"'ClljIlCCIl,
well tried and widely applicable identilication technique ror
systems does not exist. The rest or this scctioll reviews brielly some of the
diHiculties in identification poscd by non-Iincarity, and some situations where
progress can be illade.
The first dilliculty, once non-linearity has been detected, is to lind a versatile
but economical form of model. We know that for linear s.i.s.o.
versatility conflicts with economy. FOJ: example, an impulse response
(estimated as a discrete-time u.p.r.) will cope with any such system without the
need to determine the model order and dead time explicitly, but requires
ItlOre coetlicients than an equivalent rational transfer function. The conl1ict
246
with constant but unknown covariance, the paramctci' estimates which
minimise 15,,/NI arc the 111.1. estimates. They arc therefore asymptotically
efficient. Furthermore, S,,/N is the m.1. estimate ofcov(y - y) (Ljung, 1976).
249
(8.8.11 )
r-y(1)
Linear
Fig. 8.8.1 Ulock-orienled cascalle model.
u(/)
8.8 IDENTIFICATION OF NONL1NEAR SYSTEMS
(Sectioll 3.1). Assumillg.\'(I) ill Fig. 8.8.1 to be separable with respect 10 itself,
the cross-correlation aCross the l1on-linearity is
1'.,,,.(1, - I, I = f,', J' x(l ,J/(.r(l ,p(x(l,), dX(l,) <lX(I,)
= r, f(x(l,))g,(x(t,))g,(I, -1,)dx(/,) (8.8.7)
With g,(X(/, from (8.8.5) and g,(I, - I,) from (8.8.6),
,. ,(1,-1 )=f"' /(X(I p(x(l,))X(l,)g,(O)rxx(l,-/,) /.
,.., -, .' (0) . u(I,)
-'r g2 l.n.(O)
t,u-(tl-ll) ..
= ru(O) E[X(I,)f(x(l,]=cr.,(I, -I,) '<I, (8,8.8)
where c depenJs ollf() and the p.d.r. oLr(l) but nol on the time-structure of
x(l).
to (8.8.8) the non-linearity has the Same effect on r.l:1I' as a gain c.
!h15 enables us eventually to write 1'11,\0 in terms of h
ll
and !lh' the
,mpulse respouses in Fig. 8.8.1, as follows. Providing linear operators do not
destroy separability, so that 11(1) is separable with respect to x(l), the Fourier
transform of (8.8.8) implies that "
R"".(jw) = Rm(jw)!IJ,,(jw) =' cR.liw)!IJ,,(jw) = cR,x(jw) (8.8.9)
Back in the time domain. (8.8.9) says we can replace,., by cr so
lIll' . 11.'1"
,.",.( r) = 1"h"(,, )""".(r - ")<1,,
= e i'" h,,(r,) i'u h"(t,)r,,,,(, -, I - ")d,, <I" (8.8.10)
o {I
whe..e last stepfollows from the Wiener-Hopf equation. lfwe employ an
IIlpUl whIch IS white as well as separable, t/lu(r - T1- T1) is zero except at
T2 = r - T1 where it is d
1
7" and
8 SPECIALISED TOI'ICS IN IDENTIFICATION 248
1(, (x(l, ))1(,(0) =p(x(l,) )x(l,)
and so the a.c.f. of x at lag I, -'I, is
f
"'
_q x, (I,)p(x, (I, ),x,(I,)dx, (I,)
= f:', x,(I,)p(x,(I,)1 x,(I,))p(x,(t,))dx,(I,)
= p(X,(I ,)E[x, (I,) I x ,(I ,)J = g, (x,(I ,)g,(I, - I,)
r,)I, - I,) = f:'" x(l,) f:" .r(l, )1'(.\'(1,), x(l,)) '/.\'(1,) '/x(l ,)
= f'., X(/,)g,(X(l,))g,(/, -I,)'/x(l,)
= f:" x'(I,)I'(.\'(t,))(g,(I, -/,)!g,(O)d.\'(I,)
= (g,(I,-/,)ru(O)!g,(O)
Separability allows us to replace the instantaneous non-linearity by an
equivalent gain, in equations analogous to the Wiencp... Hopf "1"m.v..
or a mixture of sinusoids together with a rational-transfer-function version of
the series (LaWrence, 1981), but nothing can nullify the essential wastefulness
of the representation.
Faced with these dilliculties, we shall pass. on to more restricted models.
11.11.2 Block-Oriented Models
then x, (t,) is said to be separable with respeetly x ,(I,). Among others,jointly
Gaussian processes are separable. We can fino expressions for g I and g:! in the
case where Xl and Xl arc the same process x. Putting II equal to t.2' (8.8.4)
Many systems display significant nOll-linearity only ill olle or two Illcl110ryless
relations, and can be represented as cascades of lincar dynamical and non-
Iil1car instantaneous sub-systems. Figure 8.8.1 shows one blvck-oriel1ted
model of this sort. With a suitable input, the contents of each block ill this
model can be identified from cross-correlatioll functions by exploiting the
property of separabilily (Billings and Fakhouri, 1978 and 1982). If two
processes xdl) and X,(I) have ajoint p.d.r. p(Xd',), x,(I,) such that
I1I1
I
1111:
lill
m!
\
i
11111
fill:
!II
i

251
IH.Y.I)
COVl', = R,
y/ = }-fIX, + l'f'
xf=(I,/ ,XI I +B,_Ju/",
8.9 SIMULTANEOUS ESTIMATION OF PARAMEIERS AND STATE
and has a regressor vector [U
I
_
k
II
I
_
k
(Y,_1 - v, _I)]T at instant t. The noise is
partly additive. partly multiplicativc in the regression equation although
physically additive.
The regressor containing the unmeasured 1 might be replaced by a
recursive estimate
The standard slale-estiJ1lation problem (Kalman. 1960; Jazwinski, 1970;
Maybeck, 1979) is to estimate x, recllrsively from known inputs {II }'up to'.U,'c1
and observations !}'} up to Yf' with given initial conditions (x
o
and its
covariance) and the model
where (i, hand YI-1 - 6, come from the previous step. An interesting
alternative would be to treat b + {l/',_, as a time-varying first parameter, with
the regressor vector [u
l
_
k
lll_Ll'l_1r
r
. The temptation to extemporise is
considerable. 6
8.9 SIMULTANEOUS ESTIMATION OF PARAMETERS AND
STATE
Matrices HI" (1)1 1,81 - I' r
l
_ I' Q,_ I and R, are taken as completely krwwn: In
the aerospace applications where slate estimation was so sliccessful in the
1960's, the state model often described Newtonian dynamics and most
parameters were indeed well known. Even in those circumstances. filter
performance can sometimes be improved by relining uncertain parameter
values. An example is when approximations have been made to simplify the
model, so that some parameters hide non-linearity or high-order dynamics.
and vary as a result. For other applications such as process control, state
estimation may well not be feasible without estimation of uncertain
parameters,
State estimation in general. and combined parameter and state estimation
in particular. are heavily technological subjects (Bierman. 1977; Maybeck,
1982). We call an'ord, however, a brief look at the main ways or tackling the
combincd estimation problem. Estimation of QI_ J and R
I
on-line wilillot be
considered. as specialised lechniques tend 10 be employed (Mehra. 1974;
Maybcck. 19H2).
8 SPECIALISED TOPICS IN 250
As observed in Chapter 4. a regression model for 0,1.5. or its variants may be
non-linear in the variables, although it must be linear in the parameters. Non-
linearity may, however, cause a new dilIkulty in noise J1lOuclling (Billings,
Noise
v,
8.8.3 Regression Models for Non-Lillear Systems
Fig. 8.8.2 Non-linear model with additive nuise. Example H.8.1.
1984). Noise which is physically additive at the output will appear .
additively in the regression equation whenever earlier values of the
output enter non-additively into the regressors.
Example 8.8.1 The system modelled by Fig. 8.8.2 is described by
y, = bll
l
_
k
-llll,_dy, 1- ['/- I) -I- 1'1
The integral is the convolution of II" and II" that is, the overall
response of the linear sections cascaded. The non-linearity shows up. only
gain c. The overall linear dynamics can therefore be identified, to within a
factor, by cross-correlation just as in Chapter 3, but with a more re,;tri,ct"d
choice of input, separable as well as white.
Billings and Fakhouri (1978) show that with suitable assumptions cross-
correlating 11
2
(1) with .1'(1) gives
r.',.(r) = const xl'" 11/>( r ,)11.;( r - r I) ilr I (8.8.
"
and that the z-lransform transfer functions of the linear sections can then be
estimated straightforwardly from the transfer-function products
ing to rhe convolutions in (8.8. JI) and (8.8.12). An important byproduct
approach is a model-structure test. If the first linear block is absent, i.e. 1I,,(t)
6(1), (8.8.11) and (8.8.12) give the same result to within a scale factor, and
second linear block is absent, thc result of (8.8.12) is a constant times
square of that of (8.8.11).
'I'
"
II
lill
I
I
[II
[iii
253
(8.9.4)
(8.9.5)
(8.9.6)
(8.9.7)
(8.9.8)
y, = g(x;, v,)
8.9 SIMULTANEOUS ESTIMATION OF PARAMETERS AND STATE
If the state and observatioll equations before lillearisatioll are
anti lhe devialions arc
where rand g are differentiable with respect to X;_I and x; respectively, the
Iincarised equations are HI.
.Lt.:
y, = g(x;, 0),
JY2., =;t4-., ()xl., +.t
Lt
<5x
4
., + .\"5,1 + v
2
.
1
where <h 1.1 and <)X4-., are deviations from nominal values .X: 1.1 and .\"4." and
JY2., is lhe devialion 01'.1'2., from,\" 1."\"4.1' Suilable values for,\" 1.1 and .t...
1
arc the
estilllates based 011 observalions up to sample ( - I. Noise 1'1.l is optional. and
accounts for tile lincarisatioll error (h'
L1
(nol ullColTcialCd with lhe stale
as the filter assumcs, in fact).
Thc presence of .\'4 and x ... increases lhe computing substantiiilly aild
illtroduces sume uoubt over the ellects of the linearisatioll error. 6.
The augmented state equations are still linear, but the second observation
equation must be Iincarised lo
,5y, G, ,5x; -I- g(x;, v,) - g(x;. 0)
where elemellts (i,j) of F, _I alld G, are
ar I
[F,_dij=;:j-.-;-'- ,
L"l:j.I_1 _;_1
with thc nominal values given by
Notice that the 1I0ise terms are allocateu to the linearised state and
observation equations rather thall to the 1I01l-lillear time-update (8.9.8).
The obvious choice for I is the up-to-date estimate x; I' The tirne-
updated nominal slate x; is then used ill place of x;I'_ I' and Y, in place of the
one-step-ahead prediction H,x;I'_I' in calculating the innovation and
updating the stale estimate when observation J', is received. The remainder of
the Kalmall filter, with F,_ I for $'_1 and G, for H" is (8, 1.11) and, with P",_ I
for P, _I' (7.3.12), This state estimator based on equations Iinearised about
;(,_ is called the extended Kalman/ilter.
0.0025J
w, _,
X:;,t =X:;.t_1
8 SPECIALISED TOPICS IN
8.9.1 State Augmentation and Extended Kalman Filtering
Example 8.9.1 The position of the accelerating targct of Example 4.1.1 is
observed every 0.1 s by two instruments. At sample instant ( one instrument
gives YI.! subject to zero-mean error, uncorrelated with earlier errors.
other gives )'2." noise-free but affected by unknown gain and constant biaS.
The acceleration of the target varies unpredictably from aile sample instant to
the next, but the changes are assumed to have zcro mcan and known ViII lil'IC".
The target position x and velocity.\.' arc to be estimated aftcr each observat.lon
instant.
We define the state as [x .\.' .\:y and derive discrete-lime stale equations
by trapezoidal integration of ., then .<, yielding
[
I 0.1 0.005J [I 0.05
x,= 0 I 0.1 x,_,-I- 0 I
o 0 I 0 0
252
where elcments I and 2 of W, _ I account for the errors in integrating vc,ve",
arid acceleration, The gain and bias of the second instrumcnt are treated as
extra state variables X4- and X:;, so
The unknown parameters 0 can be regarded, as in Section 8.1.3, as state
variables distinguished only by our preference to regard them as such and by
their simple dynamics. They can be adjoined to the other state variables to give
a state equation
x; =['\ I I f\}, ,I [\ I I;};, (H.9.2)
where w; _1 is w,_ I augmented by 1l0ll
w
zci'o clements for any clements of 0
modelled as time-varying. The augmcnlcu-statc observation equation is
y, = [H, O]x; -I- v, (8.9.3)
Simultaneous estimation of x, and 0, can now go ahead, once a suitable
covariance has been specified for the elements of w;_, which represent
changes in O. Trial and error may be necessary in finding this covariance, but a
more serious difficulty is that any unknown parameter in (I', _ I' r, '- I Of H,
gives rise to a non-linear term in the augmented stale x; or augmented state
and noise w,_ I' As the standard recursive state-estimation algorithm. the
Kalman filter, has covariance-updating equations which rely on linearity of
the stute and observation equations and additivity of the noise, we must
remedy the situation by local linearisation.
255
(8.9.9)
(8.9.10)
, - I
=p(x, Iy". - .. y" 0) fJp(y,- i Iy, i- , ..... y" 0)
i"'O
8.9 SIMULTANEOUS ESTIMATION OF PARAMETERS AND STATE
At sample time I, Ihe likelihood of 0 is, by Bayes' rule applied repeatedly.
jJ(l'IO)=p(x"y,.Y,_, ..... y, 10)
=l't
x
,ly,, .. yl.O)Pty,, ... YIIO)
,- ,
I\' T
- 2 L {In((2rr)qIJit - iPt - ifr i-I /11_ i + R, - J)
i=O
FUrlher progress dcpcnds 011 assuming a form lor the p.d.f:s. If they are
assumed Gaussian, each is fully specified by its mean and covariance. The
mean of p(x, IJ'" ... i Y1,0) is simply the conditional-mean estimate X'II or XI
given 0 and the observations up to Yt. Both x
lis
and the covariance PIli
would be given. by a Kalman filter with parameter O. The mean of
P()'I- i!Yt -j- I' ... ,Y1,0) is the conditional-mean estimate H,_ jXr-ill-i _ I of
Y,-i' and its covariance, with contributions from the mutually uncorrelated
error in and noise VI_i' is Ht-iP'-i!I-i-tH:_i + R
t
-
il
which would
also be given by a Kalman tiller. Hence the log likelihood is
LW, xf } = -In((2nyrjP'!11) -1<X
I
- xtll)Tp;i/(x
j
- X'I') HI
X (y,_j-H
I
j:\:l-ill-,--I)\
Selling ('L(O. X,)/("'x, to zero we obtain
-1'",'(x,-x'I')=O (8.9.11)
which merely says that X'II is the 111.1. estimate of x
t
' under the Gaussian
assumption, provided the 111.1. estimate of 0 is lIsed in computing XIII' Il is
clearly impracticable to run a Kalman/ilter over the whole observalion set for
each conceivable value of 0 and lind what valuc maximises L(O, XI)' so the
maximisation must be performed algebraically, selling DL(O, x,)/DO to zero.
The resulting equations arc complicated, mainly because each Pt-ijl-i-l
depends on O. They do not allow all explicit solution for 0, but have to be
solved iteratively. Maybeck (1982) gives a full account of how they can be
solved.
The drawback of this procedure is its very high cOillputing demand. which
necessitates a succession of approximations detailed by Maybeck (1982).
These include approximating the Hessian matrix alL/iJO]. by a matrix using
the gradients bUl not second derivatives of L, using the matrix
Siole
estimate
Xtl t - f
8 SPECIALISED TOPICS IN IDENTIFICATION
Fig. 8.9.1 Allcrnalcd paramcicr und slale estimation.
8.9.2 Alternated Parameter nnd State Estimation
One-sample
Innovation v,
delay
I
YI
PredictIon
K Filter
+
error
One -somple b:--
Irme and
v,
algOrithm
8, delay 8
t
l
/
I observation
update
1/- ,
Observa fion
equalion
Sampler r
254
The arrangement illl;identally requires less COlllpulillg than extended
Kalman littering with an augmented state. The computing demands of the
parameter estimation algorithm arc comparable with those or state
estimator for any given Ilumbcr or unkllowlls, whereas the dCllli.lnd 01 the state
estimator rises more than linearly with the number or state variables.
8.9.3 Maximum- Lil<elihood Estimalor
Both techniques discussed so far involve substituting nominal parallTIeler
values for best values when estimating the state. (We say "best" rather than
"actual'; values because the model is not an exact representation of the
underlying dynamical process.) Whalever their asymptotic properties,
schemes are approximate in finite samples and rather heuristic. Amore
procedure (Maybeck, 1982) would be to minimise at each recursion step
log-iikelihood function of the parameters. The likeiihood funclion
the entire history of the state estimates and covariunces, since they are all
affected by any change in 0 applying throughout the recursion.
The need for local lincarisation can be avuided by separating parameter
updating from stale updating. Figure g.9.1 shows a sl:hcmc su.ggestcd .by
Goodwin and Sin (1984). A similar arrangement has been used In adaptive
tUtering for equalisation of data-communication channels and
Norton, 1979). A Kalman-filter slep, wilh the current estimale 0, _I as 0,
computes x and passes the predictiollcrror (innovation)
Y
- y' (x OfJI I) to a prediction-error recursive idclltifkation algorithm of
I I' ,-I . . . ..
the type discussed in Section 7.4.6. A step of the IdentlhcatlOn algOrIthm then
produces 0, for the next stateestimation step.
Barker, H. A., and Davy, R. W. (1978). Measurement of second-order Volterra kernels using
pseudorandom ternary signals. lilt. J. Comrol 27, 277-291.
Dclforte. G., aild Bona, B. (1985). An improved paramcter identiJication algorilhm for signals
with unknown-but-boUI){led errors, IFAC Symp. IdelltificaIion System Parameter
Estimatioll. York, England, 1507-1512.
Bellman, R. A., and Astrom, K. J. (1970). On structural identifiability. Alath. Riosci. 7, 329-339.
Bierman, G. J. (1977). "Factorizatioll Methods for Discrcte Sequential Estimation". Acudemic
Press, New York <Iud London.
Billings, S. A. (1980). JdentiJicalion of nonlinear survey. Proc. lEE, Pt. D 127,
272-285.
Billings, S. A. (1984). Identification of nonlinear syslems. III "Non-linear Syslem Design" (S. A.
Billings, J. O. Gray and D. H. Owens, cds.). Peter Peregrinus, London.
Billings, S. A., and Fakhouri, S. Y. (1978). Identification of nonlinear systems using correlation
analysis. Proc. lEE 125, 691-697.
Billings, S. A., and Fakhouri, S. Y. (1982). Identification of systems containing linear dynamic
and stlltic nonlinear ,elements. AlifOnuitica 18, 15-26.
Blackman, P. F. (1977). "Introduction to State-variable Analysis". Macmillan, London.
Bosley, M. J., and Lees, F. P. (1972). A survey of simple transfer function derivations from high-
order models. Alito/llaIica 8, 765-775.
Carson, E. R., Cobelli, C, and Finkelstein, L. (1981). The identification of metabolic systems-a
review. ..1111. J, Physiol. 240. RI20-RI29.
Chen, C. F., llnd Shieh, L. S. (1968). A novel approach to linear model simplifioatioll'l/m, J.
COl/froIS,561-570. ., .
Cobelli, C, Lcpschy, A., and G. (1979). IdcntiJiabiJity of compartmcntal systems
anti related structural properties. Math. Biosci. 44,1-18.
D'Azzo, J. J. and Houpis, C. H. (1981). "LinearConlrol System Analysis and Design", 2nd cd.
McGraw.llill Kogakusha, Tokyo.
Delforge, J. (1980). New resulls on the proble.1l1 of identifiability ofa linear system. ModI. Biosci.
52, 73-96.
Delforge, J, (l981). Necessary and suflkient structural conditions for local identifiability of a
system with lincar compartmcnts. Math. Riosci. 54, 159-180.
Denham, M. J. (1974). Cunonical forms for the identification of multivariable linear systems.
IEEE TrailS. All/om. COI/Irol AC-19, 646-656.
Feldbuull1, A. A. (1960). Dual conlrol theory. AIIIofllatioil Remote COl/trol 21, 874-880,
1033-1039; 22, H2, 109-121.
Feldbaum, A. A. (1965). "Optinial Control Systems". Academic Press, .New York and London.
Fogel, E., and Huang, Y. F. (1982). On the value of information in syslem identification-
bounded noise case. AUlOmatica 18,229-238.
Friedlander, B. (1982). Lattice filters for adaplive processing. Proc. IEEE 70, 829-867.
Gabel, R. A., and Robcrts, R. A. (1980). "Signals and Linear Systems". Wiley, New York.
Gevers, M" and Wertz, V. (1984). Uniquely idenlifiable stale-space and ARMA paramctrizaliolls
for muhivariable linear systems. Alltomatica 2U, 333-347.
Glover, K., and Willems, J. C. (1974). Parametrizations of linear dynamical systems: Canonical
forms and identiJiabiliLy. IEEE Trans. Autom. COlllrol AC-19, 640-646.
Godfrey, K. R. (1983). "Compartmental Models and Their Applicalion". Academic Press, New
York and London.
Goodwin, G. c., alld P'.ylle, R. L. (1977). "Dynamic Systelll IdentilkallOu Experimellt Design
alld Oala Analysis". Acaucmic Press, New York and Lundon.
Goodwin, G. C, and Sin. K. S. (1984). "Adaplive Filtering Prediclion and COillrol".
Hall; EllglewUlld Gill's, New Jersey.
257 REFERENCES
REFERENCES
and by Bayes' rule,
(
III y)=p(y,III'Y<-I)p(1I1 Y,-,) OCjJ(y III y )jJ(lIlr _ ) (8.9.13)
J!, ( II' ) , ' , , , ,
P Y, I-I
As p(y,llI, Y,_ ,) has mean Ifx'I' _ , and covariance H,I'", _,1I,r + R, calculable,
given II, by a Kalman filter, (8.9.13) looks like a usable recursion for p(lIln,
assuming that PlY, III, Y, _,) is Gaussian and fully specified by its mean and
covariance. With both p.dJ.'s on the right-hand side of(8.9.12) accounted for,
we seem to be home and dry. However, each Kalman HiLer gives the
conditional mean and covariance for only one specific value of O. In both
(8,9.13) and (8.9.12) We need them for the entire range of possible values of II.
The idea is therefore only computationally feasible if II is is restricted 10 a
sufficiently small number of discrete values for one Kalman filter to be
assigned to each possible value. It might be possible to reduce the number of
values oro as the recursion goes on, asp(1I1 Y,) is likely to become more sharply
peaked as I increases. One virtue of the Bayes approach, the ability to cope
with a changing shape of 1'(111 n, would be lost by doing so.
Recall from Chapter 6 that Bayes' estimators arc derived from the posterior
p.d.f. of the unknowns. If we denote the observation history y" y, _, ' ... , y, by
Y" the posterior p.d.L we arc interested in is 0 I Y,). The most attractive
Bayes' estimator is the conditional mean (Section 6.3.1), which is optimal for a
wide range of loss functions. If we supply 0 and assume that PIx, I0, Y,) is
Gaussian, the Kalman filter computes the conditional mean as x'I'. The
question is, can we exploit this when 0 is not given, but is also unknown '!
The joint posterior p.d.L of x, and 0 is
p(x" 0 IY,) = pix, I0, Y,)p(1I1 Y,) (8.9.12)
8.9.4 Bayes' Estimatur
Al.Iby, P. R. lind Dempster, M. A. H. (1974). "JlltrotllH:tioli 10 OptimizlItion Methods".
Chapmun & Hull, Lonuull.
Astr6rn, K, J., UIll..! WiuclIlllurk, l.J. lIY84), "Computer CUlItfullctJ Systellls". Prentice-Hall,
Ellglcwom.l Clifrs, New Jersey.
256 8 SPECIALISED TOPICS IN IDENTIFICATION
(8x,_ '1'_'_ ,/80)(8x'_'I'_ ,_ ,/80)' in place of its expected value, updating Ii less
frequently than x and, more drastically, neglecting the dependence of P on 0
and updating 8
2
L/80
2
infrequently.
259
PROULEMS
gIves
ns) = ., __ +kill +k,,) V, (,1)+ k" V, (s)) ._
s- + (kol +kzl +k01 +kIZ)s+ (k
ol
k
01
+k
U1
k
11
+kozk:!.l)
Show thai, if V,(s) and V , (s) are exactly known, [he deterministic
identinability of [he model from Y(s) depends on the choice of input
waveforms as I'ollows:
Norton, J. P., Brown. R. F., and Godfrey, K. R. (1980). Modal analysis or identifiability of linear
cOlllpartmclItallllOuels. Proc. lEE PI. D 127,83--92.
Reid, J. G. (1983). "Lincar System Fundamcntals". Mt.:Graw-Hill. New York.
Robins, A. J. lind Wcllslcad, P. E. (1981). Recursive systcm identification using fast algorithms.
1m. J. Colltrol JJ. 455-480.
Rockafellar. R. T. (1970). "Collvex Alltilysis". Princeton Univ. Press. Princeton, New Jcrsey.
Schel1cll, M. (11)80). "The Volterra and Wiener Theorics uf Nonlilleur Systcms". Wiley. Ncw
York.
Schweppe, F. C. (1968). Itecursive state cstiniation: unknown but boundcd crrors and systcm
inputs. IEEE trwls. A1110111. COIl/rol AC-13, 22-28.
Shamash, Y. (1983). Critical review of methods for deriving stable reduced-order models. In
"Identification and System Parameter Estimation 1982" (G. A.Bekey and G. N. Saridis,
cds.). Pergamon, Oxford.
Soderstrom, T., Gustavsson, I., and Ljung, L. (1976). Identifiability conditions for linear
illuhivllriable systems operating undcr feedback. IEEE TrailS. AII(OIll. Control AC-2I,
837-840,
Soderstrom, T.. and Stoica, P. G. (19B3). "Instrumental Variable Methods for Systcm
Identilkution". Springer-Verlag, Berlin and New York.
Vidyasagar, M. (1978). "Nonlinear Systems Analysis". Prenticc-Hall, Engl;rood Clilfs. New
Jersey.
Walter. E. (191:12). "Identiliability of State Space Models". Springer-Verlag, Bcrlin and New York.
Walson, G. A. (1980). "Approximation Theory ulld Numcrkal Methods". Wiley, Chlc.l1ll'iler and
New York. '
PROULEMS
8.1 Consider the lwo-comparlmCill system of Example S.l.1 with only
compartment I observed, the observation gain being ('I' Verify thal the Illodel
k" J. ["'J
X X+ ,
k
ZI
-k
01
-k
I1
l{1
(i) Ifinpuls II, and Il, have sleps or impulses applied separalely and the
impulse responses from til lo y and 11 Z to .I' arc roulld, lhe model is completely
idenli/lable, including ('I'
(ii) If simultancous steps or impulses arc applied as "l ,111<.1 II,. the rate
conslants in the moLiel cannot be iLienlilicd. -
(iii) Ira slep is applied as III and an impulse as tiz (treated as a () function).
lhen the motlel is complelely identifiable only if ('I is known in advance.
8 SPECIALISED TOPICS IN IDENTIF[CATION 258
Graupe, D" Jain, V. K., and ISalahi, J. (1980). A comparative analysis of various least-squares
idenlHication algorithms. AII/olI/atica 16. 66J-6M I.
Hadley, G. (1961). "Linear Algcbm". Adtlison-Wcslcy, Reading, MussadlUsclts.
Hahnos. P. R. (1958). "Finite-dimensional Vector Spaces", Van Nllstrallt] Reinhold, Princeton,
New Jersey.
Juzwinski, A. H. (1970). "Stochastic Processes and Filtering Theory". Academic Prcs5, New York
und Londun.
Kailath, T. (I9BO). "Linear Systems". Prentice-Hull, Englewood em}"s, New Jersey.
Kalman, R. E. (1960). A new approach to lincar fillcringUlid predictioll problems. J. Basic Eng.
TrailS. ASME Sa. D 82, 35-45.
Lawrence, P. J. (1981). Estimation of the Volterra functional series of a nonlinear system using
frequency-response data. Proc. lEE Pi. D 128,
Lee, D. T. L., Friedlander, B., and Morr. M. (19M2). Recursive laddcr algorithms for ARMA
modclling. IEEE TrailS, Autum. CO/ltrol AC-27, 753-763.
Ljung, L. (1976). On the consistency of prediction error idcntiJication methods. 1/1 "System
Identification: Advances and Case Studies" (R. K. Mchra and D. G. Lailliotis, cds.),
Acadcmic Press, New York and London.
Ljung, L. (1977). Analysis ofrccursive stochastic algorithms. IEEE Tnim. AUlom. COIl/rol AC-
22,55[-575,
Ljung, L., and Soderstrom, T. (l983). "Thcory and Practicc or Recursive Identification". MIT
Press, Cambridge, Massachusclts.
Luenberger. D. G. (1973). "Introduction to Lincar and Nonlinear Programming". Addison-
Weslcy, Rcading, Massachusetts.
Marmarelis, P. Z., and Marrnarelis, V. Z. (1978). "Analysis of Physiological Systems". Plenum,
New York and London.
Maybeck, P. S. (1979). "Stochastic Models, Estimatioll; and Control", Vul. I. Acadcmic Press,
Ncw York lind London.
Maybeck, P. S. (1982). "Stochastic Models, Estimation, and COillrol", Vul. 2. Acadcmic Prcss,
New York and London.
McGarty, T. P. (l974), "Stochastic Systems and State Estimation". Wiley, New
Mees, A., and Sparrow, C. T. (1981). Chaos Proc. lEE PI. J) 128,201-205.
Mchra, R. K. (1972). Approaches to adaptive filtering. IEEE TrallJ. AU/olII. COl/lrol AC-17,
693-698,
Milancse, M., and Bclforte, G. (1982). Estimation theury nild ullcertainty intervals evaluation in
presence or unknown but bounded errors: linear families of models and estimators.
TrOllS. AI/tom. Control AC-27, 408-414.
Mirsky, L. (1955). "An Introduction to Linear Algebra". Oxford Univcrsity Press, London.
Nicholson, G., and Norton, J. P. (1979). Kalman lilter equalization for a timc-varying
communication channel. Ausf. Telecowm. Res. 13,3-12.
Norton, J. P. (1975). Optimal smoothing in the idcniilication of linear time-varying
Proc. lEE 122, 663-668.
Norton, J. P. (1976). Identification by optimal smoothing using integrated random walks.
lEE 123, 451-452,
Norton, J. P. (1980a). Structural zeros in the modallllatrix and its inverse. IEEE TrailS. Autom.
Conlrol AC-25, 980-981.
Norton, J. P. (l980b). Normal-mode identiliability analysis of linear compartmental systems in
linear stages. Malh. lJio.l'ci. 50. 95-115.
Norton, J. p, (1982a). An investigation of the sourt.:es or non-uniqueness in deterministic
iderltifiability. Math. Biosci. 60, 89-108.
Norton, J. P. (1982b). Letter to the Editor. Malh. lJio.\'ci. (.1, 2951t.JK.
261
PROBLEMS
8.8 Consider allcrnated state and parameter estimation as described in
Section 8.9.2, with recursiVe l.s. as the identification algorithm, and recall that
this algorilhl1l is identical in form, although not in interpretation, to a Kalman
filter. The identilication algorithm is applied to (8.9.1) with 0 appearing
lInearly III (I>, 1 and HI' and not appearing in B, _lor r
l
; assume 0 is known to
be constant. Write out the updating equations for 0and its covariance. and for
xand its covariance.
Compare the combined set ofequations with the extended Kalman liIter for
the Same problem, as described in Section 8.9.1. Would the extended Kalman
filter with the covariance replaced by a block-diagonal matrix
and the system is to be s.s.i.
give the same set of equations?
8,9* Investigate the conditions for the model (8.3.8) of a linear feedback
system to be strongly system identifiable if the system has feedback of the form
(8.3.9), two control variables, two outputs and one external input apart from
output nOise, and the controller is switched between two different clJntrollaws.
Specifically,. find which elements of the transfer-function mutrix 2
1
:'lnY8;'3.9)
musl diner Irum the corresponding clements of,!!}1. if the matrkcs ,,;('1 and %,
are both of the form -
8 SPECIALISED TOPICS IN IDENTIFICAtION 260
(ivj If an impulse is applied as '" and a step as "" the model is completely
identifiable even if CI is not initially known.
8,2 A proposed input signal for o.l.s. identification of a u.p.r. model as in
Example 8.2.3 has period Psamples. Is it persistently exciting of order Pif(i) it
has mean, anti successive half-cycles arc symmetrical about the
mean; Oi) successive half-cycles are nol symmetrical abotlllhc mean but each
cycle is symmetrical about its half-way point in time'!
8.3 Consider updating of the covariance of the recursive w.l.s.
discussed in Section 8.1.3. The batch w.l.s. estimate has en"";,,,,e.
E[( VTWVj-1 VTWRW
T
V( VTWVj-lj according to (5.3.14), where V is the
regressor matrix, R the covariance of the regression-equation error and the
weighting matrix. which in Section 8.1.3 is diagonal. First verify that (8.1.5)
updates (V
T
WV) - I by writing V," W, V, in terms of V,'_ I W, _ I V, _ I and an
increment at time t and using the lemma, much as in Section
7.3.3. Then produce an updating scheme 1'01' the covariance of the w.l.s.
estimate in tlie case of zero-mean regression-equation error of constant
. ,
vanance a- .
8,4 With g and/known in advance, is model (i) of Example 8.3.1 identifiable
from observations of the reference input anu the output by lilting a c1osed-
loop transfer function to the response to a deterministic refercnce input and
thcn solving for a, hI and h1. from thc coclIicients in the tran:-;fer fUllction'lls
the model still identifiable ifeither g or/is unknown? lThi:-; is not su easy to
answer as it scems at first sighLJ
Alternatively, could the model be idcntified by exciting the system with a
suitable reference input sequence and rewriting the model as a rcgression
equation relating YI to earlier output samplcs and to reference-input samples
(eliminating Ill))'! Would o.l.s. do?
8.5 Show that a Pad" approximation b/( I + 11.1') to the transler function
H(s) = (I + lis)/(I +s)(I + as is unstable if Ii> I +a. Can this happen even
if H(s) is both stable and minimum-phase (i.e. if all the zeros and poles of H(s)
are in the left-hand half plane)'! For such an H(s), can the alternative
approximation (b
o
+ b,s)/(I + 11.1') be unstable'!
8.6 Construct the cross-section at 0I = 5 of the polyhedron formed'
bounding 0 as in Example 8.6.1, with the same observations as in that example
but measuring time from midway between the third and fourth samples.
Notice whether the feasible-parameter region ends up larger or smaller than
Example 8.6.1.
8.7 Find A (z- ') and B( z - I) of the reduced-degree left-matrix-fraction
description of the system or Example 8.7.1 using the left divisor given in part
(iv) of the example. Verily that the degree is rcduced and cxplain the reduction
in tcrms of removing a common factor frol1l the numeralor and denominator
of G as given by A" I BI in part (i).
C1H1I11cr 9
Experimcntal Design and Choicc of Model Structure
9.1 INTRODUCTION
As alrcady cmphasiscd, idcntil1cation is not a mattcr of applying standard
techniques in a speciJled way and gctting guarantecd results. Rather we look at
the intended usc of I he model, the observations obtaina ble, the possibilities for
expcrimcntation and tllC time and clTurt available, then, if not dissuaded from
going any further, put together a mollel by an untidy combination; of
cxperimcnt. computation, analysis and revision. Eve"y stage is uncertain, and
cOlllmon cxperience is that each idclltilication exercise raises Some ncw
problclll. This is n rclleclion not priinarily of immaturity in identification
techniques, but rather of thc immense variety of dynamical behaviour,
experimcntal constraints amI purpo'ses for modelling. For thesc reasons,
designing a solhvare package for identilication is extremely dillicull, and the
1110st successful packages presuppose a great deal of intervention by the user
(Box and Jenkins, 1976; Young, 1984).
There is little point in trying to prescribe a comprehensive list of steps in
identifying a model and techniques for each step. Instead, the next two
chapters consider a nUlrtber of aspects of experiment design, model structure
selection and 1110del validation, with no pretence that all eventualities are
covered.
9.2 EXPERIMENT DESIGN
9.2.1 Adequacy of Input and Output
We cncouJlLercu pL:rSiSlcllCy of cxdlation conditions ill Section S.2.J, with
regard to possiblc rcdunlli.lncy allHJI1g the lagged input samples fOi"ming the
regressors in least-squares cstil1Hllion or a unit-pulse response. The input
sequence ju} was persistently exciting (p.e.) of order p if no selection from
263
Example 9.2.1 The inlluctlcc'or input bandwidth can bc secn in an example
(Robins, 1984) of the identification of aircraft dynamics. La teral acceleration I
and yaw rate r are measured as the aircran rudder dellection , is perturbed.
The aircraft velocity along its roll axis is ii, constant Juring the tcst, and along
265
4 2
Time (s)
Fig. 9.2.1 Recursive estimates of Y" Example 9.2.1.
o
10
its pitch axis is v. The problem is to identify the aerodynamic derivatives Y.... Y{'
Ill" Il
r
and lie ill thc mOllcl (for negligible roll motion)
I Ii +1''' = J'.P +Y,(
; = ll ... V +" r r +
from noisy measurements. Figure 9.2.la shows the estimates of Y, obtained
from a simulaled test with a 0.5 Hz balldwidth rudder-perturbation signal.
The estimates converge, but rather slowly. The reason becomes clear on
inspecting the transfer functions from' to I and r;
LI./",) - I' .w' - I' /I /'w + ( 1')1 . - I' /I )11
-I, .'r.. 'I,(.v'
= - w
2
- (J' ... +Il
r
)}w +ll(,ll +}\,11r
RUUJ) 1l,}W +Y,fl(. -Yl''''
ZUw) = - w' - j(Y,I+ /I,)jw +/I,ll +YJI,
with Y,"j' much smaller than y ... n, in practice. At low frequencies the
numerators are dominated by --' Y/l,U and - Yt/I" respectively, and Y,does not
appear in the denominator, So the measurements contain little
about Yr' However, at sulliciently high frequencies, LUw)/ZUw) A"
higher-bandwidlh input signal should therefore give better estimates of Ye'
Figure 9.2.1 b confirms that it does; a 5 Hz bandwidth rudder-delleclioll signal
results in much raster convergence of the Y, estimate. 6
II
9.2 EXPERIMENT DESIGN
9 EXPERIMENTAL DESIGN AND CIIOICE OF MODEL STRUCTURE
every p successive samples was exactly linearly dependent. The
domain counterpart is that the in put con tains power all' Of more frequencies.
An input whkh is p.e. or uHler Ii allows us to estimate fJ il,p.r. onJinates by
least squares.
Much reccnt en'ort has gone into deriving persistency of excitation
conditions to ensure convergeu<.:c of recursive algorithms for both
identification and adaptive control (Yuan and Wonham, 1977; Moore, 1983;
Goodwin ef al., 1985; Solo, 1983; Goodwin and Sin, 1984). The main
significance of such conditions in open-loop identification is to warn against
over-simple choices of input, such as a deterministic signal with a short period.
It is not normally difficult to ensure adequate excitation by employing, for
instance, a linearly ftItered white sequence as input (Example 8.2.4), so long as
we can choose the input. The situation is dillerent in adaptive cantral, where
the input depends on the output, but we shall not pursue that problem further
than it was taken in Section 8.3. Persistency of excitation also plays a part in
proving consistency of identification algorithms, as we saw for recursiveo.l.s.
in Section 8.2.5. An early invocation of p.e. conditions (Astrom and Bohlin,
1966) was to prove that the "maximum-likelihood" algorithm of Section
is consistent if the system is stable and completely slale controllable from the
input {II} or noise tel (O'AzZD and Hou!,is, 1981) allowing response
components due to every pole orthe system to be cxci tctl , and ifalso {Il} is p.e.
of order 2n and the model is
Y,=-atYI_l +, .. -t-h
ll
lf,_k_1l
+V,+CjL'I_1 + ... (9.2.1)
with {e} zero-mean, uncorrelated and Gaussian.
Given that an input is p.e. of the required order, the question remains
whether it has enough power in the pass band of the system to yield
model-coelficient estimates from a record of realistic length. The modest
requirements, nOll-zero power at some minimum number of frequencies
(Section 8.2.3), may ensure asymptotic convergence but do not guarantee
satisfactory finite-sample performance. Theoretically the input bandwidth
can be much smaller than thal of the system and still allow a model to be
identified, since the model structure relates behaviour beyond the
bandwidth to behaviour within it. Practically, the bandwidths must be
comparable. The spectral JisLribuLion or lhe noise power Illay also have Lo be
taken into account.
264
267
(9.2.4)
(9.2.5)
(9.2.6)
(9.2.7) k= 1,2, ... , N
x(I) = A(O)x(I) + B(O)\I(I)
y(l) = Y'(I) +e(l) = h(O)TX(I) +e(l)
J'
E['\' l1',(I}, - 0,)'1 .]'" tr wpl .
L 0"'11" 0=
0
i= I
sampled at times
9.2 EXPERIMEN I DESIGN
with specified forcing inputs u( I) and initial state x(O), e.g. zero with the system
qUiescent bel Ore bell1g perturbed. We consider noisy scalar observations
where Ydenotes the entire set of observations. One reasonable scalar measure
of estimation accuracy is .
where the weights Il'i making up the diagonal matrix 111 are chosen to suit the
model application, and 0
0
is a prior estimate oro. An alternative is -logIP-II,
which is nol affected by the scaling of individual O,'s.
Any numerical optimisation of an identification experiment has the
drawback that it depends on prior knowledge of 0, so a good design is
guaranteed only when it is least needed. However, a variety of helpful
qual!tative results input design have been obtained, particularly in
the Irequency domam (Goodwin and Payne, 1977).
...
*9.2.3 OIHimisntion of Output-Sampling Schedule
For CUIIl.:isCllCSS .I'(ld will bc wrillclI .l'k' '-HILI similarly allLl ('/;. The nuise
will be assul.l1cd Gaussian, with mean zero and variance at time 'k'
and I11dependent of one another. Since y1' depends only on x and u, it is
Formal optimisalion has also been applied successfully to the design of
output-salllpllllg schedules for experiments in which only very limited
observalions eanbe made (Di Slefano, 1980). With these applications in mind,
we shall look 1I1to the process of optimising the Fisher information
matrix F.
First we must make some assumptions about the model form and noise
probability distribution, lo enable us to write down p(YI 0) in (9.2.3). An
important case leading to tractable algebra is the linear state-variable model
(time-invariant for simplicity, although this is not essential):
(9.2.3)
[
D n' ]
p-' =F= E--(Inp(Y10)_ (lnjJ(YIO
riO DO DO
9 EXPERI MENTAL IJESIGN ANIJ CIIOICE 01' MOIJEL STRUCTURE
The model structure is assumed to have been selected already. The
optimisation is subject to constraints on, for instance, input or output power
or amplitude, number of output samples or experimcnl duration. To avoid
specialising the results to a particular estimator, 0is assumed to achieve the
Cramer-Rao bound on accuracy (Section 5.4.1), so
cov 0", l' = [(0 - 0)(0 OJ
r
]
*9.2.2 Oplimis:.ItitHl of IlIlIut
If a specially designed inpul perturbation is allowed or the output-sampling
schedule can be chosen, it is worth considering whether they can be designed
to optimise model accuracy. Consider input optimisation first (Goodwin and
Payne, 1977). The basic idea is to maximise some scalar measure of estimation
accuracy derived from the information matrix p-I or covariance P of
unbiased estimate 0 or the parameler vector o. Recall that for an
estimator, EO is 0 and
The need for a high enough sampling rate for input and output, long
enough records to determine the slower components of the response and allow
the initiakondition response to subside, and a long enough period if the input
is periodic, have been discussed in Chapters 2 and 3. Checking these poinls
requires rough prior knowledge of the dynamics, but in engineering
applications this is very often at hand, and is in any case highly desirable for
assessing the credibility of the final model before it is put into use. Some prior
experimentation may be required, to get information such as the spread of
time constants, approximate gains, the nature and frequency ofdisturhane,,,
and the incidence of drift.
When serious data-logging starts, instrument and data-logger unreliability
may be troublesome. For systems with slow dynamics, breaks in records
very likely, often as simultaneous short breaks in several records. When
simultaneously interrupted records arc shifted relative to one another as
required hy their relative timing within the model, each break results in two
equal intervals or one longer interval during which one 01' other record is
unavailable. The time shirts in computing correlation functions have a similar
effect. Reliability is a worse problem in multi-input or Illulti-output models, of
course, and this is one reason for the concentration on Illcthous for s.i.s.o.
identification.
266
269
u
k J [IJ
12 x -I- If
-k'2-k02 0
XI
Obse rved
",,
'"
,
ExallllJle 9.2.2 We once more a compartmental model as employed
in biomedical sludies. The model shown in Fig. 9.2.2 represents the /low of
material into and alit of the compartments by rate equations
.,1
Fig. 9.2.2 Compartmenlal lilodel, Example 9.2.2.
The whole procedure for evaluating the information matrix F at a given 0
comprises inlegrating lirsllhe slate equation (9.2.5), then, with x(c) known,
the sensilivity equations (9.2.13), and finally substituting the sample-lime
values of lhe derivatives into (9.2.12) and thence (9.2.11). Trial-and-crror
optimisation of F by repetition of this procedure is plainly a heavy
computational task. On the olher hand, as Di Stefano points out, useful
guidance can be obtaincu by a few trial evaluations or Fwithout liuJing the
optimal solution. The clfects of any change in the experiment can be predictetI
by comparing two evaluations. Di Stefano COniments that in his meJical
applications trials of this sort can be performed "prior to drawing a single Jrop
of biological /luid".
in the compartment contents x I (I) and x
2
{f). Rate constant /.:02 ror loss from
compartment 2 to the environment is known in advance, anti we have to design
an experiment to estimate k 12 and k21' the rate constants for flow between the
eomparlments. A bolus dose (impulse) is introduced inlo compartment I of
the previously empty system, and the ensuing variation of the amount in that
compartment sampled. We wish to compare two schedules, taking samples at
times I., 2, 3 and 2, 3, 4. Our measure of estimation accuracy will be tr F, which
will be evaluated al nominal (guessed) values k 12 = 0.05, k21 = 0.4. Ti,e
9.2 EXPERIMENT IJESIGN
(9.2.8)
(9.2.9)
(9.2.12)
(9.2.13)
(9.2.11 )
(9.2.10)
)=1,2, ... ,p
(y, - h'x,)'I}

(y, - h
T
X,)2
1
)}

a T T T
Do(h x,) = [Jox,] h+[Joh] x,
N
o . "'{I', - hTx, a 'r}.
ao(1n
p
(YI 0 = L -'-;r-3ii (h x,)
k::= I
N
p(yt 0) = p(y!'y" ... 'YN I0) nplY,l 0)
k::= I
N
Inp(YIO)=
k::= I
ax DA ax an au
- =-x +A -- +n--,
ao; DO; DO; DO; ao}
N
L
{
lOT aT '1' }
= ,-(II x,l-(h x,)
u, DO ao
k=::l
The only remaining problem in relaling F to design paramelers such as lhe
sampling times or inpul waveforms is 10 calculate a(hTx,)/ao for each sample
instant. It is
and [Jox,J, the Jacobian malrix of x, wilh respect 10 0, can be found column by
column by differenliating lhe slale equalion (9.2.5) wilh rcspeet to each
unknown V
j
, yielding
Hence
268 9 EXPERIMENTAL DESIGN AND CIIOICE OF MODEL STRUCTURE
As the noise samples Yk - hTx
k
are independent and zero-mean, the expected
value oflhe product of any two oflhem is zero, so on subslituting (9.2.10) into
the expressibn for the Cramer-Rao bound, we are len with
and so
deterministic, so Yk is Gaussian with mean hTx
k
and variance and the y
samples are independent. Consequently,
i
,
"
271
4 3
[
0.3447 0.2927] [0.5304 0.2319]
0.2927 0.2485 0.2319 0.1014
-4.230J
5.642
2
[
0.1418 0.2378]
0.2378 0.3989
,[ 4.155
(f-
-4.230
[
O.OIB7 0.0780]
0.0780 0.3256 in F)
ar x (increase
Ih..
"'0 I =;\: (- 21 -/- '-/') exp( - 0.21) -/- (lUI - \0) exp( -0.51) I
f' 2
(fl[ 8.027 - 5.020J
-5.020 4.167
for sampling at I, 2 and 3, compared with
for sampling at 2, 3 and 4. The reason why the former schedule allows more
accurate estimation of k
21
,as shown by element (2, 2) of F- 1, is that the faster
decaying of the two exponential components in x I (1) is much more sensitive in
both amplitUde and lime constant to k" than is the slower component, and it
is beller defined by the earlier sampling schedule. The beller defillition of k 12
by the laler schedule, indicatcd by elemenl (I, I) being smaller, is less readily
explaineu. The slower time constant is actually less sensitive to k12 than the
raster one, anu thc absolute scnsitivitics of the exponcntial amplitudes to k
li
are equal. Howevcr, thc slower componcnt has only aboul half the 1lmplituJc
or the fastcr one, so is rclatively morc scnsitive. It is cvitlClllly not vcry casy to
predict which schedule will be beller for k 12 wilhoutthe full analysis 10 lind
F-
1
Their initial cOlltlitiuns arc zero, because x(O) is independent or o. They arc
solved easily, if tediously, obtaining fr0111 the convolution integral
ii r
= (t -/- .,;') exp( - 0.21) - (51 -/- 'j"J exp( -0.51) J
{. ,
Inserting the chosen sample times l
k
, we can now evaluate F for each trial
schedule:
The contributiuns to F from samples at times 1,2, 3 and 4 arc gi:v.eJ) i1h,thc
accompanying tabulation, but the inJlucnce of an individual sample on the
accuracy of iJ cannot be seen from these figures. The most informative
indicator is the covariance F- I of O. If the noise variance is a
2
for all samples,
F-
1
is
1).2 EXI'EHIMENT IJESILiN

t, exp( - 0.21) - t, exp( - 0.51)J
exp( -0.21) -/- .\ exp( -0.51)
aA [-I
ao, = I
0.05J ax 1[-eXp(-o.21)-2eXp(-o.5Il]
-0.3 ;]0
2
-/-:3 exp( -0.21) -/- 2exp( -0.51)
0.05J ax 4[ exp( -0.21) - exp( -0.51)J
-0.3 -exp(-O.21)-/-exp(-O.51)
DA [0
DO, = 0
I) EXPERIMENTAL IJESIUN AND CHoiCE OF 1\'!llI)EL STRUCTURE
= [-0.4
0.4
=[-0.4
0.4
[
:\exp( - 0.21) -/- Jexp( -0.51)
e
A1
- .. .
- 1exp( - 0.21) -1exp( -0.51)
A= [-k"
k
ll
ax ax [ x'J
0o, = Aao, -/- -x;
and with n, C and U(I) all independcnt of 0, thc scnsitivity equations are
.1',=[1 O]X,-/-f"
The tranSItIon matrix e
AI
is found, by inverse Laplace transl"ormation of
[,1'1 - Aj-' or otherwise, to be
known k
02
is 0.25, the observation gain is I ano the input uosc is J, so the
observations are
270
here, with
The first eolumn gives X(I). Gradient D(hrx,)/DO in the expression (9.2.11) for
F is Dx,(ID/aO, so only the derivatives of X,(I) need be computed from the
sensitivity equations. With 01 == k 12 and 0i == k 2 J '
and standard linear-system theory gives
where 0* minimises V, V" is the second-derivative matrix of V and P the
asymptotic covariance of the normalised estimation error fi(0 - 0*). Jftwo
model structures .//
1
and .ll
z
being compared arc hierarchical in the sense
that ./1] is contained in ..i1
2
, and the system the observations has
lhe structure .1/
1
, F(O,) will be no larger than V(02) if the estimator achieves
either the Cramer-Rao bound on P-' or
273
(9.3.2)
(9.3.3)
(Y.3.1)
".1,
V(O) = IE[ecT]1
v = Ii [V(O*)] '" V((i) -I- tr( V"((i*)I')
or noise terms, as appropriate. Sometimcs it coincides with the order of the
dynamics, but more often nol. Folklore has something to say about model
order determination. All other things being equal, the simpler of two models
each encompassing the actual syslem behaviour is felt to be belter. This feeling
has been dignilied with the title of the parsimony principle. Although it has
some basis in the numerical ill-conditioning and statistical ineffIciency
associated with indiscriminate addition of terms to H model, it is an
oversimplification. Whether a simpler model is on average better depends on
the intended nse of the model, the family of alternative models contemplated
and the estimator employed.
Stoica and Soderstrom (1982) discuss under exactly what condilions the
parsimony principle applies. They compare structures by calculating for e.ach
or them the mean value V, over all possible values of the model estimates 0, of
any scalar measu're V(O) of model performance for beller
performance) which is dilferentiable twice with respect to O. By Taylor senes
expansion they show that as the record length N lends to infinity,
where R is the covariance of the white noise e making up the unpredictable
part of the observed system output, and cis the model-output error.
Hierarchical structures include important cases like autoregressions or
moving averages of different orders, but not, for example, a two-term moving
average as .1/\ and a third-order autoregression as ./I
z
. The assumption
the structures are hierarchical can be dropped if the performance measure IS
9.3 SELECTION OF MODEL STRUCTURE
and Ihe covariance satislies (9.3.2). In that case V"(O*) is 2lRlp-l and Vis
IRI dim 0, so a mouel structure with fewer unknowns to estimate is better,
hierarchical or not. Counterexamples are presented to show that the
parsimony principle docs 1101 upply ill gCl1cralunlcss the conditions above arc
[net.
These rcsulls arc very general they restrict the model structure and
proposed application very little, and apply tu large classes of estimators and
9.3.1 Model OnJcr Uetcrmination
The detcrmination of Illodel order is an important problem, for which
techniques are well developed. Model onlcr will be taken, rather loosely, to
mean either the total number of parameters or the llumbcr of thc input, output
9.3 SELECflON OF MODEL STRUCrURE
272 9 EXPERIMENTAL DESIGN AND CIIOICE OF MODEL
For sampling at 1,2 and 3, tr F-
I
is 12.19IT
2
, compared with 9.80IT'
the other schedule. The other measure -loglFl is 0.9165 + 410g IT
0.7442 + 410gIT, respectively. A scalar of accuracy is rather un-
when, as here and as usual, Some parameters improve and
others get worse when schedules are changed. Weighting individnal parameter
variances as in (9.2.4) supposes that we know in advance how seriously we
individual errors. Unfortunately we do not know until we see what vanallc,,,
arc obtained, so trial-::lI1d
M
crror adjustment of the weights is necessary. 6.
The selection of a model structure starts before dcsign of the identification
experiment, and continues during and after it. Decisions 011 the scope and
form of the model must be made. The scope determines what variables shoald
be included, what time scale the model is to operate on, what range
operating conditions shaull! be covered, what observations should be used
and what information the lIlodel has lU provide. Model I"onll was discussed in
detail in Chapter I.
130tl1l"Ofl11 and scope arc greatly Iimilctl by what turns out to be practicable
in collecting observations. Anyone concerned with identification in industry
has had the deBating experience of having what seems a perfectly reasonable
suggestion dismissed on unanswerable grounds such as unserviceable
instrumentation, unwillingness to interrupt JlOfmal operation, inability to
wait for the results, lack of manpower, missing records, or a conviction
the results are already known or impossible to obtain. As rational plans are so
often frustrated, there is little point in generalising rurther about these
informal aspects of structure selection.
Before going on to the topic of model order determination, we should note
that very often one starts with a strong prcference for one particular model
structure. The preference may stem from farililiaritYI proven elrectiveness or
mathematical tractability of the structure. The belief "beller the devil you
know than the devil you don't know" at times explail1s relention of a structure
with admitted weaknesses. In other cases onc hesitates to drop a "classical"
model on which much effort has been spent, even when it has obvious defects.
Example 9.3.1 The rainfall-river flow model
Y, =-OIY/-' - ... - anYr-1I + buu, +- ... +blll-lu/-m +c,c,_, +...
+cqe,_q + d +e,
275
(9.3.6)
(9.3.7)
Mot' I
Ii
'"
P
V
2
]
9 0.074286
2 4 10 '0.073542
]
J 4 II 0.073248
The log-likelihood function discussed in Seclion 6.4 is the basis of an
alternative test of model structure (Akaike, 1974). The
mean information is deJined as
9.3.3 The Akaike Information Criterion
total number of coelJkicnls estimated and V the mcan-square output enOL
For models I and 2, f is
7.44 )( 1445
.- ._-_.__ . ..- = 14 6
0.073542.1 .
If this is greater than the value exceeded by a F(I,1445) variate with
,.1 "
probability a, the hypothesis lhatthe extra coeflieient in model 2 is redundant
is rejected at level a. For a = 0.05 the value is 3.84, and for a = 0.01 it is 6.63, so
the hypothesis is rejected at either level. Comparing models 2 and 3, f is
2.94 x 10-
4
1444
=,5.80
0.073248 I
was filled by the extended least-squares algorithm to 1455 hourly samples {y I
of 1I0w in the Afon Hirnant in Wales. The input samples {II} are means of
readings from six rain gauges. The gauges register total rainfall oVer an hour,
and the elfective pure delay of the flow response to the areal mean rainfall is
under an hour, so the model has b
o
non-zero. For all runs the noise orderlfwas
3. Three models gave the values in the accompanying tabulation, whcrcp is the
where L(l1) is the function of 0 given Ihe set of observations Y
and 0
0
is the "truc" valuc oro. If we suppress unydoubts about the meaning of
the "lrue" 0, il turns oul t1Wl IU)jl00) l.:all be appruxilllalctl by the sum or a
term in 0
o
, lhe same fur any candidate model structure
l
and a term
proponional to the criterion
9.3 SELECTiON OF J'v!ODEL STRUCTURE
so model 3 is taken as significantly better at level 0.05 but not at level 0.01.
0:
(9.3.5)
(9.3.4) i= 1,2
9 EXPERIMENTAL [)ESIGN AND CHOICE OF MODEL STRUCTURE
Ji' - V N - P1
I' = _.! .. --._...,.
. V
2
P2 - PI
has an F(P2 - PI' N - P2) distributioll. The hypothesis that .1/
1
is adequate
can be testetl at ally chosen signiJicancc level by eOlllparinglwith the value
exceeded with the corresponding probability.
The F test operates on the sample mean-square model-output crrors
N
- I\',
V,(O,) = Ii L c, (0,),
/== ,
9,3.2 F Test
274
from two alternative model structures .1/
1
and .112.' which usually ddter
in their numbers of terms. 11" .11, has PI parameters amI .11
2
a larger number
P2' and if the output errors from .11, and .11
2
form sequences of in,le[lellldent,
Gaussian, zero-mean, constant-variance random variables, then V2 and
V, - V
2
are independent X
2
variates with N - P2 and P2 - PI degrees
freedom (Wadsworth and Bryan, 1974). It follows first that .1/
1
is an adequate
model, in that its output errors havc no timc structure. and SCCOlH..I that the
statistic
noise and input distributions. However, the assumption that .//
1
includes the
process generating the records is doubtful. The l11oslcommon practical
siluation is for a model to be estimated in the full knowledge that it is only a
simplified and partial representatiol1 of the system behaviour, adequate fot a
stated purpose. For instance, a low-order model may be required so that a
simple controller can be designed from it, or nOll-linear behaviour may be
treated as time variation of a simple lincar model rather than modelled
explicitly. The consequence is a blurring of questions of model goodness. A
larger model might well be a beller fit 10 the observations, and a beller
predictor, but unacceptable because of its complexity. We should not lose
sight of the fact that tests of model structure are to help us compromise
between complexity and performance, exclude grossly deficient models aud
avoid ill-conditioned computation, rather than to determine the "correct"
strucLUre.
We shall review three popular ways of testing model struclure: F lests, the
Akaike information criterion and comparison of prOllucHnomenl matrices.
Ifwe substitute this into (9.3.7) and drop the part independent of 0, we obtain
the statistic
277
(9.3.13)
Moreover, with P2 and pz- PI much smaller than N,
9.3 SELECTION OF MODEL STRUCTURE
N -I', (exp(l:.. (I', _ 1',) _1) _N -I', 2 (1'2 _ PI) '" 2
1'2-1', N 1',-1', N
This value corresponds to a not very stringent significance level of roughly
for P2 -PI = 1 (addition of one parameter) and little lower for
1', - 1', "" 2, both for large N.
The main doubt about the usefulness of the F and AIC tests arises from
their assumption of Gaussian Ie}. In many identification problems the noise
and residuals arc distinctly non-Gaussian. An approach to model-order
testing which relies less on an assumption about the probability distribution of
the noisc would be ofintcresl. For.a linear model, the product-moment matrix
provides just such an approach.
9,].4 Product-Mulllent Matrix Test
(9.3.8)
(93.9)
9 EXPERIMENTAL DESIGN AND CHOICE 01' MODEL 276
C; = N In V, + 21',
to lest. Notice the assumption that OJ is the 111.1. estimate for .IIi'
The test then consists of comparing C1 with C
I
and accepting .//, as adequate
if C, is smaller. (The information criterion is abbreviated to AIC. According
to Akaike, the A stands for A, as distinct from D, C, etc.; according to
everyone else, it stands for Akaike.)
From Section 6.4.5, we know that scalar observations alreeted by Gaussian
errors of unknown constant variance give rise to a maximum log;-Itl<el:lho,od
N
L(O,) = - (I + In2l! + L
,=. 1
(9.3.16)
111]
(9.3.15)
1= 1,2, ... ,N
Uti_I
1=1,2, ... ,N
["
X,i_1 Xl
1
Un
I
X
nt
I I
llri+ I
V(lI, x, Ii) =
1
1
I
X
N 1
I liN
The idea bchillll mouel-structure testing using product-moment matrices
(Lce, I%4) is that thc noisc-frce output x, from a system with input-ou'tput
dynamics
(9.3.14)
is exactly linearly dependent on the set of samples X
r
-
1
to and U'_I to
"/_/l' For any trial model order ,i greater than 11, the dependence reduces the
rank of
by the numbcr Ii - 11 of X colull1ns linearly dependelll on the other colu111ns. If
the 'clean' output x were accessible, a simple tcst of system order would be to
check the rank of U or, more conveniently, the small (211 square) matrix
Vi VI N, henceforth called A. Assuming thatt he input is p.c. of order 11 + I or
more, the u columns arc still linearly independent when Ii is II + I, so the onset
of singularity of A at that Ii indicates unequivocally that the system order is 11,
thc largcst Ii lor which A remains non-singular. Unfortunately, {x} is not
accessible, and the rank deficiency of A is obscurcd by noise {e} in the
observed output samples {y}:
(9.3.12)
(9.3.10)
(9.3.11)
Model V
I'
C'
t 0.07429 9 -3765
2 0.07354 10 -3777
3 0-07325 II -3781
Nln V, +21'1 < Nln V
2
+21'2
(
V )N-p, N-p, ((2 ))
J= --'.- I - < - exp --(1'2 -PI) -I
V, 1'2-1', 1'2-1'1 N
then
- PI)
and so the Ale tcst is equivalent to an F lest giving
The attraction of the AIC test is that it does not require a significance level
to be chosen subjcctivcly. As Soderstrom (1977) has pointcd out, thc AIC test
can be viewed as an F test, for if by the AIC lest
Both the Ftest and the AIC test seem to favour the larger model unless the m.s.
output errors are very close. This point is followed up in problems 9.4 and 9.5,
',.
Example 9,3,2 The models quoted in Example 9.3.1 were obtained by the
c.l.s. algorithm, which is approximately m.1. il'thc noise is assumed Gaussian.
The models give the values in the accompanying tabulation, so model 3 is
preferred. A reduction of 0.26 in V
z
would be enough to make C; < c;;
279
(9.3.23)
(9.3.26)
(9.3.24)
(9.3.25)
(9.3.27)
'" III J
. '" [Y,,-I
... U
N
-
Il
+
1
U
n
_ I
UN _ I
.1'1
J'N-I/+ 1
H
;,,[y,,-,
,,-I - .
J',.... - I
9.3 SELECTION OF MODEL STRUCTURE
where R:
c
{1i) is the normalised autocorrelation matrix and Ii is large enough to
justify (9.3.22). Equation (9.3.22) says 1',,(0) is a generalised eigenvalue of
A (II, y, Ii).
However A(II, x, Ii) is oblained, il will be iIl-eonditioned rather than
singular, because of the approximation error. Some way has to be found of
deciding when the matrix is ill enough conditioned to indicate that ij is greater
than n. Woodside suggests forming either the determinant ratio
as a normalised scalar statistic, or another determinant ratio pili) which
approximates the sum of the squared model-output errors obtainable in the
absence of noise. As Ii is increased, p(li) should jump upwards alii = II, and
f'-,(Ii) should drop sud4enly at Ii = II, as output modelling errol' due to too Iowa
model order vanishes. The derivation of Vt(li) is a bit tedious, and you may
pre reI' just 10 uote Ihc rcsults, (9.3.29) and (9.3.31).
First we write down the sum 8
11
_ I of squared regression-equation errors for
the o.l.s. iilOdcl of order It - I. Next 'We express Sn_ I as the quotient of two
determinants which relate closely to A(II, y, II - 1) and A (lI,y, II). Finaijy,
replace A(II, Y, II - I) and A(II, y,lI) by /J (II, x, 11- 1) and A(II, x, II) sO as to ;'.
reduce the effects of the noise in {y}, and compulc an uenhanced" sum of
squares of errors. The details are as follows.
The regressor m<ltrix for the model given by (9.3.14) and (9.3.16), but with
order n - 1 rather than n, is
The o.1.s. model outputs .1\1 to J'N form the vector
and the sum of squares of errors is
8"_1 = (y" - y)'(y" - y) = - y)
since y is orthogonal 10 y" - y. All the quantities in (9.3.25) and (9.3.26) may
be computed from the partitioned matrix
Expansion of IB(u,)', 11)1 by its lirst row gives
IB(u,y, 1/)1 = y,:Y"IH,:_ 1H,,_ tI- 1 1H,,_, )H,:_IY" (9.3.28)
(9.3.22)
(9.3.21)
(9.3.19)
(9.3.17)
, (9.3.20)
ti,]
;=O,I, ... ,li-1 (9.3.18)
I ,)'Ju,; yJU'i" 1 .. ,
I
I
Y[YI I y1u11
I
I
I I. I
U';Yl I ",; II"
I
I I r
"jYl I UrU'i
U(u,J'"i) = [Yri YIi-l Y I U,i U,j_1
yI-i=[Yti-i Yri-i+l .. , J'N-i]'
I
A(II,Y"i) =--;;
uIY'i
,.
U.;Y,;
. [R,.,.(li) OJ
A(II,X,II)=A(II,y,II)-
It may be possible to measure thc noise a.c:l'. .by holding the input
that all output variation is noisc. If not, 1I11ormcd gucsswork maY.Yleld a
usable estimate of the normalised a.e.r. (normalised by 1',.,(0), the n.OIse m.s.
value). A possible way of providing 1',,(0), suggested by WoodsIde, IS to take
the smallest solution of
I
. [11:.,.(11)
IA(II,x,li)l= A(II,y,II)-I',,(Oj
. [R".(li) OJ
A(II,y,li)-+A(II,X,II) +
where element (i,}) of R,.,.(li) is the autocorrelation of {e I at lag.i -j.
on the observed input {Li} can be treated by allowlI.lg an
autocorrelation matrix as the bottom right partition. Estimates of the nOIse
autocorrelation ordinates have to be supplied if we are to reconstruct the
noise-free A from
We assume that {e) is uncorrelated with {II} and Iherefore with (x), which
depends only On {II}, and also assume that (e) and {II} are ergodiC and the
system timeinvariant. As the number of observations N rtses, these
assumptions give
and similarly for uri_j. then
278 9 EXPERIMENTAL DESIGN AND CHOICE OF MODEL
For high output signal-to-noise ratios, it maybe possible to {y} in plaee
{x} in V and still detect the onset of ill-eonditlolllng 01 A as illS :a,sed, but for
realistic amounts of noise some modification is normally reqUired. An early
suggestion (Woodside, 1971) was to estimate and remove explicitly the elTect
of noise on A. If we denote V(II, y, Ii) by
where
281
FURTHER READING
Optimal experiment design is covered by Silvey (1980), Beck and Arnold
(1977), Zarrop (1979), GoodlVin and Payne (1977) and Kalaba and Springarn
FURTHER REAlJING
In this chapter we have reviewed several results and techniques which may be
some help in choosing a model structure and ucsigning an experiment to
estimate its eocJlicicnts, The reason for such diJIldcnt wording is that success in
identifying a useful model depends much more on accurate appreciation of
what the model must do and recognition of what is going on in the results of
identification experiments than on virtuosity in applying analytical tech-
niques. The next chapler will illustrate some or the problems that arise when
we start experimenting in earnest.
9.4 SUMMARY
Generalisations of the instrumental product-moment matrix method are
presented by Wellstead and Rojas (1982). They point out that minor
amendments to U allow the test to cover different orders 11 and m for the
autoregressive and moving-average parts orthe model (9.3.14) and to find the
dead time k. The testing proceeds in stages. First Ii and IiI +k are increased
togcthcr to identiry the larger or II and //I +k, then Ii and Iii +f( are reduced
alternately to find which is the smaller of II and III +k, and establish its value.
Finally Iii and f( are varied to find III and k.
Taking a more statistical view, the product-moment matrix tests are
methods or detecting ill-conditioning or the covariance matrix or the model-
coellicicnt estimates. Thc normal matrix for o.l.s. with the model (9.3.14) and
(9.3.16) is NA (II, y, II), so on the assumption of uncorrelated regression-
. equation error noise"), the covariance of the coefficient estimates is
',.,.(O)A -, (11,.1', II)/N. It is estimated as an intrinsic part of the e.l.s. algorithm,
and can also be obtained easily from qlC product-moment matrix inverse
updated by the recursive instrumental-variable algorithm (Young er al.,
1980). Scalar measnres of covariance ill-conditioning are discussed in this
reference also.
We should recall at this point that Chapter 4 has already provided ways of
testing Ihe structure of linear models. The utility of each term can be tested by
singular-value decomposition of U. as described in Section 4.2.4.
Alternatively, the Golub-Householder method described in Section 4.2.2
makes trial deletion or terms rrom the model easy; one need only place those
terms last, so that deleting them merely removes corresponding columns from
the extreme right or the triangular matrix V in (4.2, 10), leaving the rest of the
computation unchanged.
(9.3.35)
(9,3.29)
IB(lI,y,II)1
jH,;_,Hu_,l
[A'(lI,y,II)] = E[A'(lI,x,II)]
9 EXPERIMENTAL DESIGN AND CHOICE OF MODEL STRUCTURE
T 'H [H
T
H ]-I//T
Su_ I = Y/lYu - Y/I - /I-I II-I /1- I u-IY/l
We can relate H;'_ 1H"_1 to A(11, .1', II - I) by noticing that
Y,,-I U,,_I ] (9.3.30)
YN YN-n+2 liN ... li
N
-
Il
+
2
.
A'(lI, .1', II) = (U
1
(1I, Z, II) U(lI, .1',11/ N
where {z} is an instrumental-variable sequence which, as in section 7.4.5, is
strongly correlated with Ill} ami {x} but uncorrelated with the noise
le}. Provided that =, and u
i
arc uncorrelated with I up to e""II_!'
Now A'(II,X,II) sulrers rank deficiency in exactly the same way as does
A(ll,X,U) since U(ll,X, Il) is present in both, so the determinant-ratio tests
described above can be performed equally well using A'(ll, x, Il) in place
A(11, x, II). At the small price or having to treat the inpnt as noise-rree,
provides a computationally cheap alternative to Woodside's
method. Caution is necessary ir {z J is generated by passing {II} through a
linear filter; if the order or the lilter is less than the trial order Ii, the
determinant ratios will detect the liIter order rather than the system order.
so that
NA(lI,y,II-I)= U(lI,y,II-I)lU(lI,y,II-I)
X [YN YN-,,+2liN (9.3.31)
For N large, the right-hand sidc or (9.3.31) is close to H.:'_ Ji,,_I' Also,
U(lI,y,lI) = [y" Y"_I u" U,,_I] (9.3.32)
so B(ll,y,ll) could be obtained by deleting the "/I column row from
NA(11,.1', II). If we reconstruct the approximate noise-rree A(11, X, II - I) and
A(11, x, II), and pick out B(lI, x, II) rrom the latter, we can compute in place
(9.3.29)
p,(I1- I) = IB(lI,x,II)I/INA(lI,x,lI- 1)1
Woodside round that this "enhanced" statistic indicates the correct order
simulated records with mean-square signal:noise ratios down to a\;iout 10.
Instead orfindingA(11, x, II) by estimating R
ee
(II), Well stead (1978) sug;ges;ts
calculating
so
280
f
j(
\1
I
I
i
f
, i
ft
"
i
r
PIWBLEMS
.1/.1: ;'i: =l1
l
X +112X2 -t- ... -I- GIIX" +1m
What combinations oftwu or more of these structures are hierarchical? [Slate
any restrictions necessary,]
.11,: ." =ax +U:'/(x +g + bll
.11,: ." = ax +Ii +bll
283
I'J(UULEMS
Silvey, S. D. (1980). "Optimal Design". Chapman & Hall, London.
S6derstrfHn, T. (I 977}. On mOdel structure testing in system idcntilicalion. 1111. J. C01llro/26,
H8.
Solo, V. (1983). "Advanced Topics in Time Series Analysis". Springer. Verlag, Berlin and Ncw
York.
Sioica, P.. and Soderstrom, T. (1982).011 the parsimony principle. I"t. J. CO/J(roI36, 409-418.
van den Boom, A. J. W. and van den Enden (1974). The determination of the orders of process
and noise dynamics. AIl/oII/atica 10, 245-256.
Wadsworth, G. P., and Bryan. J. G. (1974). "Applications of Probability and Random
Variables", 2nd ct!. McGraw-Hill, New York.
Wcllslead. P. E. (1978). An instrumental product inoment test for model order cslimation.
AI/(omatica 14, 89-91.
1 Wellslcad, P. E., and Rojas, R. A. (1982). Instrumental product moment model-order testing:
Extensions and applications. lilt. J. C01l/roI35;-IOI3-1027.
Woodsiue, C. M. (1971). Estimation of tile order of linear systems. Alilomatica 7,
Young, P. C. (1984). "Recu,:sive Estimation and Time-Series Analysis". Berlin
and New York.
, Young. P. c., Jakeman. A., and McMurtrie, R. (19HO). An instrumental vuriuble method for
moLlel order identiliciltioll. Alitoll/l/tica 16,281-294.
Yuan, J. S-c., and Wonham, W. M. (1977). Probing signals for model reference idcntification.
IEEE 1'/'(111.\'. Amom. ('01l/rol AC-22, 5]()538.
Zarrop, M. B. (llJ7lJj. "Optimal Experiment Design for Dynamic System
Springer-Verlag. Berlin and New York.
9.] Which of the cueflicients J'r, l 11 pI ll, and 11(in Example 9.2.1 arc likely to be
hard to identify by an experiment in which the input signal \(1) is (i) of small
bandwidth, (ii) narrowband, at a frequency high enough for the frequency-
independent terms to have little en'ect on LIZ and RIZ? [Assume Y, is known
in ativance.J
9.2 Investigate the improvement in accuracy of k11 and k21 in Example
, 9.2.2, compared to the results of sampling at times 1,2 and 3, due to (i)
sampling al times 1,2, 3 and 4; (ii) sampling at 1, 2, 3 and 10; (iii) adding a
second sample at time J, independent of the lirst; (iv) adding a second sample
at time 3, independent of the first; (v) halving the sampling interval.
9.3 Three alternative model structures arc
REFERENCES
Akaike,H.(1974).A new look at thestalistical model identification. IEEE TrailS. Autum, COlltrol
AC-19.716-723.
Astrom; K. J" and Bohlin. T. (1966). Numerical klcnlilkatiol1 or lincar dynamic systems from
normal operating records. /11 "Theory or Scif-AJuptivc Control Systems" (P. H.
Hammond, cd.). Plenum, New York.
Beck, J. V" and Arnold, K. J. (1977) ... Parameter Estimation in Engineering and Science". Wiley,
New York.
Box., G. E. P., and Jenkins, G. M. {I 976). "Timc Series Anulysis Forecusting and Control';.
Holden-Day, San Francisco, California.
D'AZiO. J. J., and Houpis, C. H. (1981). "Lincar Control Systcm Analysis lind Design". 2nd cd.
McGraw-Hili, New York.
Oi Stefano, J. J. (1980). Design ant.! optimisation of tracer experiments in physiology and
medicine. Fed. Proc. 39, 84-90.
Goodwin, G. c., and Payne, R. L. (1977). "Dynamic System Identification: Experiment Design
and Data Analysis". Academic Press, New York and London.
Goodwin, G. c., and Sin, K. S. (1984). "Adaptive Filtering Prediction and Control':. Prentice-
Hall, Englewood Cliffs. New Jersey. .. .
Goodwin, G. c.. Norton, J. P., and Viswanathan, M. N. (1985). PersIstency of excllaUon [01"1
rionminimal models of systems having purely deterministic disturbances. IEEE Trans.
AIl(om. COllfrol AC-30, 589-592.
Guidorzi, R. P., Losito, M. P., ami Muratori, T. (1982). The range error test in the structual
identification of linear Illultivuriuble syslems. IEEE TrOllS. ,.11/(011/, C01!frol AC-27,
1044-1053.
Kalaba, R., and Springarn, K. (1982). "Control, Identilication and Input Optimization". Plenum,
New York.
Kashyap, R. L., and Run, A. R. 0976). "Dynamic Stochastic Models from Empirical Data";
Academic Press New York and Lom..loll.
Lee, R. C. K. (1964). "Optimal Estimation, Identilic,ltioll and Control". MIT Press, Cambridge,
Massachusctls.
Mehra, R. K., and Lainiotis, D. G. (cds.) (1976). "Systemldentilication". Academic Press, New
York and London.
Moore, J. B. (l983). Persistence of excitation in extended least squarcs. IEEE Trans. All/Olll.
Conlrol AC-28, 60-68.
Robins, A. J. (1984). Identification of aerodynamic derivatives using an Kalman
Lecture notes, graduate course on Kalman nltering, Dept. of Eleclrolllc and Eleclncal
Eng., Univ. of Birmingham, U.K.
282 9 EXPERIMENTAL IJESIGN ANIJ CHUICE OF MODEL STRUCTURE
(1982). Mehra and Lainiotis (1976) have two substantial sections on thetopic,
along with interesting identification case studies. We. have consldc:cd
multivariable systems, although the basic ideas concerning expenmenl design
and parsimony apply to them too. Model structure and
accuracy for mullivariable systems are the subjects of a chapter in Kashyap
and Rao (l,976). Guidorzi el al. (1982) olrer a test for multivariable model
structure which does not require prior filling of a selection of models.
Comparative studies of a number of structure selection methods have been
carried out by van den Boom and van den Enden (1974).
,,' 1./ :.'; ",:",
10.1 INTRODUCTION
[a, 0, IJ, IJ,J = [-1.527 0.598 -0.141 0.992]
10.1.1 Nature of Validation Tests
Chapler II)
Model Validation
Example 10.1.1 The model
A mouel is validated by answering two questions: is it credible, and docs it
work'! The lirst supposes we have background knowledge and wanllhc
mLH.lel to conl'orln with il.
is filled lo hourly recordings or rainrall {II) in the catchment of the River Eden
in north-west England and corresponding recordings of river flow {y}. The
aim is to develop an on-line flow predictor.
After 100 steps or the e.l.s. algorilhm, we have
Sillce b, is the first non-zero ordinate or the u.p.r., it eannol be negative, which
would imply thal the inilial ell'ect of rainfall is to reduce Ilow. Hence, IJ, is
implausible. The reason is lhat the model has too short a dead time; the latest
input affectingy, is u
l
2' In the absence ofany influence of u
l
_ I On Yr' the value
of jjl is determined by its elrect on the subsequent u.p.r. shape, Le. coefiicients
rrom 1r
2
on in
When the u.p.r. or a model with ncgative Iii and too short a dcad timc is
plotted, it normally approximales the u.p.r. of the model with the correct dead
lime quite well at lags beyond the correct dead time. f':,
185
10 8 (, 4 2 q
Y EXPERIMENTAL DESIGN AND CHOICE OF MODEL
9.4 If two linear Illodel structures dillcring by q terms are lo be compared
an F test, and the records are 200 or more samples long, the F U"""'VU'UL'"
relevant to the lesl is close to F(q, (0). The values offexceeded by an F(q,
variate with probability 0.05 are
284
3.84 3.00 2.37 2.10 1.94 1.83
Tabulate the percentage excess of the smaller model's sum of squared output
errors over that of the larger model when the hypolhesis lhat the
model is no belter is just accepted by the F tesl, for lhese values of q and for
records of length N = 200, 1000 and 5000. Ponder the likely praCllca,
significance of percenlage dillcrences of this size.
9.5 Tabulate the percentage excess of a smaller model's sum of squared
output errors over that of u larger Illodel when the Akuikc information
criterion says that two linear, time-invariant mudel structures difl'ering by q
lerms are equally acceplable, ror q = I, 2,4,6,8 and 10, and ror records
length N =200, 1000 and 5000. [Usc thc mean-square <>Ut!,ut error or
larger model as ,,'.J Compare this table wilh the aile compiled in Problem
is
E[(y, - y,(x
N
))' J= E [(hf (x - XN) +l',)(lii(x - XN) + vi)r]
= h?E[(x - xN)(x - xN)T]h
i
+E[vn
= h?PNh, + (10.1.2)
287
10.1.2 What Do We Test'!
1U.2 CHECKS BEFORE AND DURING ESTIMATION
Valuable information can be gained by looking at a plot of the records as SOOll
as they are received.
,,'
10.2 CHECKS BEFORE AND DURING ESTIMATION
We shall examine these tests entirely by examples, and try to resist too-
sweeping conclusions.
The short answer is "everything we can", but more specifically we Can test
(i) the records, before we do anything with!them;
(ii) the parameter estimates, in the light of background knowledge;
(iii) the fit 01" the model to the records, through the residuals (y _ y);
(IV) the eslimated covariance of the parameter estimates; and
(v) the behaviour of the model as a whole, measured for instance by
steady-state gain, u.p.r., poles and zeros.
10.2,1 Checks on Records
,Example 10.2.1 The rainfall and river now records used in Example 10.1.1 are
shown in Fig. 10.2.1. We see immediately that lwo distinct situations
alternate: rapidly changing flow with frequent or continuous rainfall, and
smooth monotonic flow decrease (recession) with little or no rainfall. The
question arises Whether one constant-parameter model can cater for both
situations. We also see that the low-frequency gain from rainfall to fiow varies
from one lIow peak to anolher; for instance, the peak near 213 h is higher than
the one near 189 h, but preceded by less rainfall in the previous 10 h or so. We
should Hot expect too much from a time-invariant linear model, and may well
have to resort to a time-varying model for prediction, the ultimate aim.
A time-invariant model might be preferable if we knew enough to
chol)sC its form, bUl we shall !lol pursue that option.
The spread of time constants looks large, from two hours or so as ilidicated
by the rises to perhaps ten hours or more for the slowest recession
components. The flow record is quile smooth, So the sampling interval is
probably short enough to avoid aliasing. The rainfall record is bolh uneven
and heavily quantised, but each point is the integrated rainfall over an hour
and the rainfall-to-llow dynamics are clearly of low bandwidth. Any increase
in explanatory powcr of the rainfall record achieved by shortening the
(10.1.3)
(10.1.1)
10 MODEL VALIDATION 286
Here h
j
has been treated as deterministic as we are not averaging over a range
of possible hi' and (lO.l.l) has been assumed an adequate description of Yi.
Also, Vi js assumed to be zero-mean and not correlated with x
N
. The actual
covariance PN and variance would be replaced by their computed estimates
to approximate the expected sample m.s. prediction error
A direct and revealing test of whether a model works is to try it on records
dilTerent from those it was estimated from. However, there could well be a
noticeable finite-sample dilTerence between the performances of the model on
the two sets of records, even if the model were optimal in some statistical sense
and strnctnrally well chosen. We might consider testing the significance of the
difference in performance by comparing a statistic from the new records, for
instance the m,s. output prediction error, with its theoretical value, computed
from the estimated covariance P N of the final parameter estimate x
N
of the
original records. At sample instant i in the new records, the theoretical m.s.
error in the output prediction )\(x
N
) obtained from un unbiased estimate "N'
for a system described by
Such a statistic is easy to compute, for given Ih) and but has some
unconvincing aspects. We should really take into account the uncertainty in
P
N
and the estimates of (J;/. To assess the significam:e of a deviation of the
statistic from its theoretical value given by (10.1.3) we need its sampling
distribution. The distribution may be dimcult to specify, as thc prediction
errors may well not form a stationary sequcnce. Forlllal tcsts of this sort may
have a role in refining an already good model, but in earlier stages of model
validation or when a good model is not realistically altainable, less formal
checks less reliant on idealising assumptions are more to the point.
The remainder of this chapter illustrates a selection of validation checks
applied to results from actual records. The tests are 1110stly quite informal, and
bring out typical weaknesses in models. As usual, we do not pretend the tests
are comprehensive or universally applicable.
289
(l0.2.3)
(10.2.2)
i = 0, 1,2, ... (J.; = (exp( - T/r; '" exp( - iT/r},
10.2 CHECKS BEFORE AND DURING ESTIMATION
Ii, = -lilli,., -'" - li,Ii,., + [" + ... + (10.2.1)
where is I for t =k +i and 0 otherwise. The steady-state gain is
The poles are readily interpreted in terms of time constants. A positive real
pole z = (J. corresponds to a sampled exponential component proportionallo
in the u.p.r. Hence the lime constant t is - Tlln 0:, where T is the sampling
interval. Complex-conjugale poles can be interpreted as in the following
example. .
[li l Il, h, [,2J = [-1.401 0.5135 0.5399 0.8917J
Example 10.2.2 We decided that the hydrological records in Example 10.2.1
exhibited a wide spread of time constants, and there was no evidence of
oscillatory response in the flow, so we expect positive real poles between 0
and I.
A model
u.p.r., steady-state (zero-frequency) gain and poles and zeros can be computed
easily from a.Lm.a.X. parameter esLimates, 011 line if necessary, and checked
against background knowledge. The u.p.r. {Ii} is given by
i.e. an a.r.m.a.x. model with (/I, 111, q} = (2,2,3) and dead time I, was filled by
e.l.s. to 280 input-output pairs from the hydrological records. The final
a.r.m.a.x. coefficient estimates were
The poles are complex conjugate, and. since the of
exp( -t/t)SiIl/it and exp(
have denominator 1- 2z" exp( - Tit) cos /iT + z' 2exp( -2T/t), the en-
velope time constant t of the damped oscillatory u.p.r. is - 2T/ln li
"
i.e. 3.0 h,
and the oscillation period 2"//i is 2"T/cos'
l
( -a,/2y7i,), i.e. 29.6h. The
oscillation has no obvious physical explanation, and takes the u.p.r. negative
for lags between 17 and 31 h, contrary to expectation.
We can permit a wider range of u.p.r. shapes by raising the m.a. order l1i to
6. The (2, 6, 3) model obtained with dead time zero (in case there is some small
early response) has [a, ,i,J = [ - 1.086 0.2629], implying time constants
200
IlJ MODEL VAel..,'A
Time {hI
100
100
Time (h)
(a) Rainfall and (b) river-How records. Fig. 10.2.1
10.2.2 Checks on Parameter Estimates
In Example 10.1.1, unrealistic model behaviour was detected through the
value of a single parameter. Quantities affected hy all the parameters, such
Other things to look for, .not evidcnt in the example, arc in"'rlllm"'ni
breakdowns, transcription errors and patching-up of interrupted records.
Errors are hard to avoid in records taken or transcribed manually, but they are
easy to detect in smooth records and need not be detected if infrequent
comparable with noise from other sources. Breaks in records, particularly
to oversight, are sometimes repaired by crude interpolation without
recipient being informed, So any constant or straight-line section should
regarded with suspicion.
sampling interval would consequently be outweighed by the increase in
quantisation error. The sampling interval may be too short to allow precise
estimation of the longest time constant of the u.p.r.
The dead time cannot be assessed from the raw records in this instance, but
is not more than a few hours.
We shall employ these records in many examples, since their behaVIClllr
complex enough to be a good test of identification and validation ,mett,odls,
288
E
6
(0)
E
=
3 0
-
c
0
0:
0
0
80
(bl
-;;;
,
..,
E
40
0
H
;;:
0
0
290 10 MODEL VALIDATION 1D.2 CHECKS BEFORE AND DURING ESTIMATION 291
The steady-state gain will be of most interest when we examine time-varying
models, later.
3.07 and 0.99 h and a completely nOlHlegative u.p.r. As Fig. 10.2.2 shows, the
u.p.r.'s of the (2, 2, 3) and (2, 6, 3) models are quite close over the entire range
of lags, so we should expect similar prediction performances. The smaller
model in fact has r.m.s. prediction error 9 higher than the larger one over
these records. 6
. .
-... :
.......
....
25
( bl
c
0

0
m
,

u
0

U1
H
0 100 200
Time {hi
-;;;
10
(o )
,

E
c
0
a
0
>
0
c
oS
-10
Fig. 10.2.3 (ill One-step prediction errors and (b) steady-state gain for (2,2,3) model of
Example 10.2.2.
Fig. 10.2.2 Unit-pulse responses of (2, 2, 3) and (2, 6, 3) catchment models. □: n = 2, m = 2, q = 3; ○: n = 2, m = 6, q = 3.
10.2.3 Checks on Residuals

The recursive algorithms of Chapter 7 correct the parameter estimates at each update by an amount proportional to the most recent innovation y_t - h_t'x̂_{t-1}. Although e.l.s. and some other algorithms calculate the correction gain on a logical basis, the gain will still be poor if the calculation is based on inadequate information about the reliability of x̂_{t-1} or the relation between y_t and x. An awkward and easily overlooked point is that excessive correction may give x̂_t such that y_t - h_t'x̂_t is very small but y_{t+1} - h_{t+1}'x̂_t is large; this is particularly likely when parameters are represented as time-varying and assigned too much short-term variability. We must therefore keep an eye on both the residuals sequence {y_t - h_t'x̂_t} and the innovations sequence {y_t - h_t'x̂_{t-1}}. They should be similar in m.s. value if {x̂_t} is about right (Norton, 1975).

Before we compare m.s. residuals and innovations, model deficiencies can be detected on line or off by noting large and time-structured residuals or innovations.
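The bookkeeping for this comparison is trivial once the recursion stores each innovation y_t - h_t'x̂_{t-1} and each residual y_t - h_t'x̂_t as it goes; a minimal sketch, with the interpretation of the ratio hedged as above:

import numpy as np

def ms_check(innovations, residuals):
    # Compare mean-square innovation and mean-square residual; if {x_hat}
    # is about right the two should be similar (Norton, 1975).  A ratio
    # much greater than 1 suggests excessive correction (over-responsive
    # estimates); similar but persistently large values suggest the
    # correction gain is too low for the actual parameter variation.
    ms_innov = np.mean(np.square(innovations))
    ms_resid = np.mean(np.square(residuals))
    return ms_innov, ms_resid, ms_innov / ms_resid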
Example 10.2.3 Figure 10.2.3 shows the innovations produced by the (2, 2, 3) model of Example 10.2.2, and the accompanying parameter variation as reflected in the steady-state gain at intervals of 5 h. From about 130 h, parameter updating fails to respond to the fall in prediction accuracy during flow rises. The prediction performance over the less severely disturbed period from 160 h on is unimpressive, and there is sustained error even during flow recession between 140 and 160 h. It appears that the parameters should be treated as time-varying, by one of the methods of Section 8.1, since when they are represented as constant, the correction gain is too low. △

Although poor performance may be detected on line as in this example, the remedial action needed will vary. For that reason on-line adaptive identification will be difficult, whether it modifies the model structure and assumptions or adjusts the correction gain directly. Experience in state estimation, where adaptive recursive filtering is more an art than a science (Maybeck, 1982), bears this out. With enough prior experimentation, adaptive state-estimation algorithms can sometimes be made to work, but the best technique in a particular case is hard to predict. In short, recursive algorithms must be tuned off-line.
The markedly non-stationary innovations sequence in Example 10.2.3 suggested a need for a higher-order or time-varying or non-linear model. As it is even less feasible to draw general conclusions about identification from non-linear examples than from linear ones, we shall focus on time-varying and higher-order models. By time-varying we mean with the dynamics represented as time-varying in the model structure; we are not thinking of the incidental

We next examine results retrospectively (off-line) to find out whether and how the model should be modified. Off-line working allows us to reprocess the records and estimates at leisure, which will prove especially helpful in models with strongly time-varying parameters.
10.3 POST-ESTIMATION CHECKS
10.3.1 Employment of Time-Varying Models
since b̂₁ is far from constant. The standard deviation √p₃₃ itself gives a better picture, showing for instance that its increase at about 120 h more than cancels the fall in |b̂₁|. Even it is less than ideal, though, as it relies on a dubious estimate of σ², too small in disturbed periods and too large in smooth recessions. It can be seen from Fig. 10.2.4 that √p₃₃ is initially erratic, partly because σ̂² is estimated from a small number of residuals. Later it is under-responsive, e.g. around 180 h, since σ̂² is estimated from many residuals, not all still relevant. Between 120 and 210 h, √p₃₃/|b̂₁| warns clearly that b̂₁ is unreliable. The warning is less clear from 85 to 120 h, when b̂₁ is unrealistically negative. None of the quantities in Fig. 10.2.4 seems to be a trustworthy guide to the reliability of b̂₁.

Further evidence that the reliability of parameter estimates cannot always be gauged effectively by such quantities appears in the results of Example 10.2.2. The estimates â₁ and â₂, which implied unconvincing complex-conjugate poles in the (2, 2, 3) model, have √p₁₁/|â₁| = 0.025 and √p₂₂/|â₂| = 0.062. We should therefore expect both estimates to be highly reliable, yet in the (2, 6, 3) model, which has credible poles, √p₁₁/|â₁| = 0.083 and √p₂₂/|â₂| = 0.268 (a result of spreading the information in the records over more parameter estimates). Evidently choices between model structures cannot be made on this basis.

A final comment is that at any one point in the recursion, all the process a.r. coefficients (â₁, etc.) have similar estimated variances, seldom differing by more than a factor of 2. This empirical fact applies also to the process m.a. coefficients (b̂₁, etc.) and the noise-model coefficients, and seems to be true in a wide range of examples, not only this one. Estimated variances appear not to be very sensitive to poor choices of model structure. △
10.2.4 Checks on Covariance
An obvious way to assess the reliability of parameter estimates is to inspect their estimated covariance in the algorithms which, like e.l.s., compute it. The most convenient implementation of e.l.s. updates the normalised covariance S_t. The algorithm is then

S_t = S_{t-1} - S_{t-1}h_t h_t'S_{t-1}/(1 + h_t'S_{t-1}h_t)        (10.2.4)

together with the corresponding updating of x̂_t. The covariance of x̂_t can be estimated as σ̂_t²S_t, with

σ̂_t² = [1/(t - t₀)] Σ (y_s - h_s'x̂_s)²,  the sum running from s = t₀ + 1 to t        (10.2.6)

if σ² is assumed independent of t. The summation should start late enough to miss the residuals greatly affected by the poor initial guess x̂₀.

Example 10.2.4 Figure 10.2.4 shows the estimated standard deviation of the estimate of the third parameter, b₁, which became unrealistically negative in Example 10.1.1. It is found by (10.2.6) with t₀ = 50. Also shown are the standard deviation as a proportion of |b̂₁| and the square root of the corresponding principal-diagonal element of S_t. The normalised s.d. is uninformative about the reliability of b̂₁

Fig. 10.2.4 Computed standard deviation of b̂₁, standard deviation as a proportion of |b̂₁|, and square root of principal-diagonal element of S_t, in model of Example 10.1.1.
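A sketch of this covariance bookkeeping follows. It uses an ordinary-least-squares recursion standing in for e.l.s. (whose regressor rows would also contain estimated noise terms), the normalised covariance update of (10.2.4), and the residual-based noise-variance estimate of (10.2.6); the initial values are illustrative.

import numpy as np

def recursive_ls_with_sd(H, y, t0=50):
    # H: N-by-p regressor matrix (rows h_t), y: length-N output vector.
    N, p = H.shape
    x = np.zeros(p)
    S = 1e4 * np.eye(p)                      # large initial S: vague prior
    ssq, sd_history = 0.0, []
    for t in range(N):
        h = H[t]
        k = S @ h / (1.0 + h @ S @ h)        # correction gain
        x = x + k * (y[t] - h @ x)           # innovation-driven update
        S = S - np.outer(k, h @ S)           # normalised covariance update (10.2.4)
        if t > t0:                           # start late: skip start-up residuals
            ssq += (y[t] - h @ x) ** 2
            sigma2 = ssq / (t - t0)          # noise-variance estimate (10.2.6)
            sd_history.append(np.sqrt(sigma2 * np.diag(S)))   # s.d. of each estimate
    return x, S, sd_history

Plotting sd_history for the parameter of interest, alone and as a proportion of the estimate, gives the quantities examined in Example 10.2.4.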
10.3.2 Checks on Parameter Estimates
The detailed variation of parameter estimates or derived quantities such as u.p.r. and steady-state gain can be checked against qualitative knowledge of the physics underlying the dynamics.
during flow rises are even sharper than those in ĝ_t, suggesting that the dead time shortens.

Changes in the initial part of the u.p.r., as the dead time and shortest time constant vary, are made clear in Fig. 10.3.2 by plotting the u.p.r. normalised by its peak value, for various significant instants. Shortening of the dead time and shortest time constant occurs during heavy rainfall between 110 and

Fig. 10.3.1 Estimates of time-varying (a) steady-state (zero-frequency) gain and (b) b̂₁, Example 10.3.1.

Fig. 10.3.2 Unit-pulse responses at various instants, normalised to have unity peak values, Example 10.3.1. Unit-pulse responses are shown for the model at 110, 130 and 250 h.
Example 10.3.1 Figure 10.3.1 shows the time-varying steady-state gain ĝ_t and coefficient b̂₁ estimated by e.l.s. and optimal smoothing from the records of Example 10.2.1. The (2, 6, 3) model has dead time 1. The principal-diagonal elements of Q are 10⁻⁴ and 10⁻² for the a.r. and m.a. coefficients in A(z⁻¹) and B(z⁻¹), respectively, and zero for the noise polynomial C(z⁻¹). These values result from trial-and-error adjustment by factors of 10 to achieve the smallest m.s. residual obtainable without excessive time variation of the parameters and consequent inflation of the m.s. innovation (Norton, 1975). There is a substantial rise in ĝ_t during each flow rise (Fig. 10.2.1), and a fall during dry spells. The interpretation is that more of the rainfall is absorbed into the ground after dry spells and less after heavy rain. The increases in b̂₁
time variation of recursive estimates of time-invariant dynamics as they settle from inaccurate initial guesses.

As we saw in Section 8.1.5, random walks provide a flexible representation of time-varying model coefficients x. We model x by

x_t = x_{t-1} + w_{t-1},   E w_{t-1} = 0,   cov w_{t-1} = Q        (10.3.1)

where Q is diagonal and each principal-diagonal element controls the variation of one coefficient. In e.l.s., Q merely increases P before each new observation is processed.

The estimation of a time-varying model is not only of interest when the final model will be time-varying, but is also valuable as a bridge between a very simple first-attempt model and a refined and extended time-invariant final model. The nature of the time variation in a simple model is a good pointer to the extra or modified features the final model should have.

To get the most benefit from the random-walk representation, we must distinguish genuine time variation in x from the initial variation of x̂ as it converges from a poor x̂₀. The only way we can do so is by improving early estimates retrospectively, by optimal smoothing, which brings into x̂_t all the information about x_t contained in later observations, as outlined in Section 8.1.6.

We employ time-varying models and retrospective updating extensively from now on. However, some of the validation checks can equally be applied to models with coefficients represented as constant.
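The mechanics of the random-walk representation are simple: the only change from the constant-parameter recursion is the addition of Q to the covariance before each observation is processed, as (10.3.1) implies. A minimal sketch, with q_diag illustrating values like those of Example 10.3.1:

import numpy as np

def rw_recursive_ls(H, y, q_diag, p0=1e4):
    # Recursive least squares with each coefficient modelled as a simple
    # random walk (10.3.1).  q_diag holds the principal-diagonal elements
    # of Q, e.g. 1e-4 for a.r., 1e-2 for m.a. and 0 for noise-model terms.
    N, p = H.shape
    Q = np.diag(q_diag)
    x = np.zeros(p)
    P = p0 * np.eye(p)
    X = np.zeros((N, p))                      # store x_{t|t} for later checks
    for t in range(N):
        P = P + Q                             # random-walk step increases uncertainty
        h = H[t]
        k = P @ h / (1.0 + h @ P @ h)
        x = x + k * (y[t] - h @ x)
        P = P - np.outer(k, h @ P)
        X[t] = x
    return X, P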
10.3.3 Checks on Residuals
ŷ(t) = 5.41 × 10⁴(exp(-t/0.714) - exp(-t/0.710))
Fig. 10.3.3 Variation of parameter estimates and r.m.s. output error with dead time, methionine tolerance test, Example 10.3.2.
Example 10.3.3 Time-varying (2, 2, 3) models with various dead times were estimated from the records shown in Fig. 10.2.1, with Q as in Example 10.3.1. The sample a.c.f. of the innovations is given in Fig. 10.3.4 for dead times 1 and 9. The a.c.f. for dead time 9 is quite compatible with the assumption that the

function of the innovations differs significantly from that of a white sequence, i.e. zero at all non-zero lags. The idea is that a good model predicts all the systematic part of the output, leaving an unstructured innovations sequence. Unfortunately, sample autocorrelation is not always a good measure of whether a sequence is structured, as we shall now see.
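The sample a.c.f. and its 2-standard-deviation band for an uncorrelated sequence are simple to compute; a minimal sketch (max_lag is an arbitrary choice):

import numpy as np

def acf_whiteness_check(innov, max_lag=20):
    # Sample autocorrelation function of an innovations sequence, with the
    # +/- 2/sqrt(N) band appropriate for an uncorrelated sequence.  Values
    # inside the band at all non-zero lags are consistent with whiteness,
    # but see Example 10.3.3 for how misleading this can be.
    e = np.asarray(innov, dtype=float)
    e = e - e.mean()
    N = len(e)
    denom = np.sum(e * e)
    acf = np.array([np.sum(e[k:] * e[:N - k]) / denom for k in range(max_lag + 1)])
    band = 2.0 / np.sqrt(N)
    outside = [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]
    return acf, band, outside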
time t (h)      0      0.5    0.75   1.25   1.75   2.25
response y      0      90     115    85     55     40
The fit is moderate, with an r.m.s. error of 6.03, about 16% of the r.m.s. deviation of the samples from their mean. However, â is implausibly large and the time constants suspiciously similar. A possible reason is omission of dead time. Figure 10.3.3 shows how the r.m.s. error, â, τ̂₁ and τ̂₂ were found to vary with dead time. The optimal dead time 0.35 h reduces the r.m.s. error to 1.13 and gives credible values for â, τ̂₁ and τ̂₂, reassuring us that the improvement in fit is not merely due to increasing by one the number of parameters estimated from a very short record. △
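A fit of this kind can be reproduced with a general-purpose non-linear least-squares routine. The sketch below uses scipy's Levenberg-Marquardt option as a stand-in for the method of Section 4.3.3; the starting guess and the log-parameterisation of the time constants (to keep them positive) are implementation choices, not taken from the original study, and with no dead time the problem is ill-conditioned, which is exactly the symptom discussed above.

import numpy as np
from scipy.optimize import least_squares

# Samples from the methionine tolerance test, as tabulated above; they are
# unevenly spaced, so the model is fitted directly in continuous time.
t_obs = np.array([0.0, 0.5, 0.75, 1.25, 1.75, 2.25])    # h
y_obs = np.array([0.0, 90.0, 115.0, 85.0, 55.0, 40.0])

def model(theta, t, dead_time=0.0):
    # theta = (a, log tau1, log tau2); logs keep the time constants positive
    a, l1, l2 = theta
    ts = np.maximum(t - dead_time, 0.0)       # response is zero before the dead time
    return a * (np.exp(-ts / np.exp(l1)) - np.exp(-ts / np.exp(l2)))

def fit(dead_time=0.0, theta0=(200.0, 0.0, -1.0)):
    res = least_squares(lambda th: model(th, t_obs, dead_time) - y_obs,
                        theta0, method="lm")  # Levenberg-Marquardt
    a, tau1, tau2 = res.x[0], np.exp(res.x[1]), np.exp(res.x[2])
    rms = np.sqrt(np.mean(res.fun ** 2))
    return (a, tau1, tau2), rms

# Repeating the fit over a grid of dead times gives the kind of comparison
# plotted in Fig. 10.3.3.
for d in (0.0, 0.35):
    print(d, *fit(dead_time=d))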
A plot of this response and several others suggested that they might be fitted by y(t) = a(exp(-t/τ₁) - exp(-t/τ₂)), with the aim of investigating whether the response parameters are related to clinical condition. Because the samples are unevenly spaced, the parameters a, τ₁ and τ₂ were estimated directly by non-linear least squares rather than by trying to fit a transfer-function model. The Levenberg-Marquardt method of Section 4.3.3 produced
The following example illustrates (in a constant-parameter model) how a deficiency in model structure may show up clearly in unlikely parameter values although it is not obvious from the residuals.

Example 10.3.2 In a methionine tolerance test (Brown et al., 1979), the response of methionine concentration in the blood of a subject following a rapid dose was
130 h. Comparison of the u.p.r.'s at 130 and 250 h finds a lengthening of the longest time constant and a further reduction in dead time, both due to cumulative wetting of the catchment. We conclude that ideally a non-linear model should give rising gain, shortening dead time and lengthening overall response as the catchment gets wetter. △
A straightforward plot of the residuals or innovations can say a great deal about the adequacy of the model, as in Example 10.2.3. Another post-estimation check often proposed is to test whether the sample autocorrelation
Fig. 10.3.4 Sample autocorrelation function of innovations for models with different dead times, Example 10.3.3. Model dead time ○: 1 h; ●: 9 h. Lag axis in hours.
We next ask whether the r.m.s. innovation is an adequate guide to the model. At the same time we see the effects on the innovations of retrospective re-estimation of {x̂_t} by optimal smoothing.

Example 10.3.4 Root-mean-square innovations are plotted in Fig. 10.3.5a for (2, 2, 3) and (2, 6, 3) models, with and without optimal smoothing of {x̂_t}, for a range of model dead times. In all cases x is represented as a random walk, with Q the same as before. The optimally smoothed estimates are retrospective, being based on the entire record of N input-output pairs rather than the observations received up to each point in the recursion. The one-step
From Fig. 10.3.5a, u_{t-3} is the latest input essential for predicting y_t; the (2, 2, 3) and (2, 6, 3) models perform well only when u_{t-3} is included in the explanatory variables. Only a small deterioration in prediction performance results when too short a dead time is specified in the (2, 6, 3) model, even though the redundant leading m.a. terms then contribute nothing. Evidently fewer than six m.a. terms are necessary if the dead time is well chosen. The closeness of the r.m.s. innovation values of the (2, 2, 3) and (2, 6, 3) models at dead time 1 or 2 confirms this.

Figure 10.3.5b gives the r.m.s. values of the residuals {y_t - h_t'x̂_{t|t}} and {y_t - h_t'x̂_{t|N}}. They demonstrate how effective optimal smoothing is in reducing noise-induced spurious short-term variation of the parameter estimates. Such variation makes the r.m.s. innovation three to four times the size of the r.m.s. residual in the forward recursion. In the backward recursion which produces {x̂_{t|N}}, {y_t - h_t'x̂_{t-1|N}} and {y_t - h_t'x̂_{t|N}}, the r.m.s. innovation is only 20 to 30% larger. We conclude that the optimally smoothed estimates {x̂_{t|N}} contain considerably less spurious variation than the on-line estimates {x̂_{t|t}}. △

Fig. 10.3.5 (a) Variation of r.m.s. innovation with dead time, for time-varying models obtained with and without optimal smoothing, Example 10.3.4, and (b) variation of r.m.s. residual with dead time. Without smoothing: ○: (2, 2, 3) model; □: (2, 6, 3) model. With optimal smoothing: ●: (2, 2, 3) model; ■: (2, 6, 3) model.
"predictions" they yield are not therefore available on line. However, the smoothed "predictions" {h_t'x̂_{t-1|N}} should give a better indication of long-term one-step prediction performance than {h_t'x̂_{t-1|t-1}}, because early in the records the latter is still much affected by the error in the initial estimate.

In this example, optimal smoothing reduces the r.m.s. value of the innovations by about a factor of 2. The variation with dead time and process m.a. order is, however, much the same with and without smoothing.
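For the random-walk coefficient model the retrospective estimates {x̂_{t|N}} can be obtained by a forward recursion followed by a backward sweep. The sketch below uses one common fixed-interval (Rauch-Tung-Striebel type) form; Section 8.1.6's exact formulation may differ in detail, and the function and variable names are illustrative.

import numpy as np

def rw_forward_and_smooth(H, y, q_diag, p0=1e4):
    # Forward random-walk recursion, then fixed-interval smoothing to x_{t|N}.
    N, p = H.shape
    Q = np.diag(q_diag)
    x, P = np.zeros(p), p0 * np.eye(p)
    xf, Pf, xp, Pp = [], [], [], []           # filtered and one-step-predicted
    for t in range(N):
        xpred, Ppred = x, P + Q               # random-walk prediction step
        xp.append(xpred); Pp.append(Ppred)
        h = H[t]
        k = Ppred @ h / (1.0 + h @ Ppred @ h)
        x = xpred + k * (y[t] - h @ xpred)
        P = Ppred - np.outer(k, h @ Ppred)
        xf.append(x); Pf.append(P)
    xs = [None] * N
    xs[-1] = xf[-1]
    for t in range(N - 2, -1, -1):            # backward sweep
        A = Pf[t] @ np.linalg.inv(Pp[t + 1])  # smoother gain (state transition = I)
        xs[t] = xf[t] + A @ (xs[t + 1] - xp[t + 1])
    return np.array(xf), np.array(xs)

Comparing the r.m.s. of {y_t - h_t'x̂_{t|t}} with that of {y_t - h_t'x̂_{t|N}} reproduces the kind of comparison made in Example 10.3.4.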
The explanation for the failure of the test is that a few large innovations dominate the sample a.c.f., the reliability of which is therefore low. For instance, the five largest innovations for dead time 9 account for 53.6% of the sum of squares. Incidentally, none of the large innovations is near the start of the records and so avoidable by a later start for the a.c.f. calculation. We conclude that the sample a.c.f. is helpful only if the innovations sequence is reasonably stationary, a rare event in practice. △
innovations sequence is white, with only one value outside the 2-standard-deviation lines for an uncorrelated sequence, and that only marginally. For dead time 1 the a.c.f. looks more structured and is beyond the 2-standard-deviation lines at two lags and close at a third. Nevertheless, dead time 1 is far closer to the truth. It gives smaller residuals, better predictions and smaller estimated standard deviations for the parameter estimates. It is also consistent with the look of the records; dead time 9 is not.
Simulation-mode results for short sections of record where the parameter estimates of a time-varying model change rapidly help us to understand the changes and assess whether they are sufficient. The "before change" and "after change" models are compared according to how well the simulation-mode {ŷ} fits {y} in the vicinity of the change.
correction gain have become small. Simulation-mode results are pessimistic if the correction gain does indeed stay large enough to allow significant variation of x̂.

Raising the model orders to (6, 6, 3) in this example improves the simulation-mode performance very little, reinforcing the conclusion that only a time-varying representation of the a.r.m.a. coefficients will match the dynamics better. △
Example 10.3.6 A (2, 2, 3) model with dead time 1 and coefficients represented as time-varying was estimated from the records of Fig. 10.2.1. The estimates change rapidly during the flow rise at about 120 h. Figure 10.3.7 plots the simulation-mode {ŷ} generated by the model as it stood at 115 and 125 h, for a period covering the rise, peak and recession. The model at 115 h has far too low a gain and is too slow; the flow peak is 2 h late. The 125 h model, by contrast, has about the right gain and places the peak at the correct time. Its recession time constant is too short, but that is of little consequence for on-line flow prediction; any fairly long time constant ensures good one-step prediction while the flow changes slowly, and predictions during a recession are in any case of little importance. We can be satisfied with the change in x̂ during the rise, it seems. △
Fig. 10.3.7 Observed flow and simulation-mode flows given by models before and after changes during flow rise, Example 10.3.6. Observed flow is a continuous line. Flows modelled at ●: 115 h; ○: 125 h.
10.3.4 Simulation-Mode Runs

Fig. 10.3.6 Observed and simulation-mode flows, Example 10.3.5. Observed flow is a continuous line.
Example 10.3.5 A (2, 6, 3) model with dead time 1 was estimated from the records shown in Fig. 10.2.1, with the parameters represented as constant. Figure 10.3.6 compares the observed and simulation-mode flows over part of the record. The shortcomings of the model are plain to see: it overestimates the lower peak flows and underestimates the largest peak, misses the slowest dynamics in the recessions and gives too large and rapid a response to the start of rain after a long dry spell. These features appear much more clearly than in the on-line residuals or innovations, for several reasons. The simulation-mode computation deliberately omits the noise model, which on line takes up some of the output behaviour not captured by the input-output part of the model. Missing or inaccurate dynamics in the model causes cumulative output error in the simulation mode, but the error is constantly removed on line by use of observed flows in predicting the present flow. Finally, the parameter estimates evolve on line even if represented as constant with Q zero, and can follow time-varying dynamics to some extent. This is so until the covariance P and
A severe and informative test of a model is to run it over the records in simulation mode, that is, with earlier model-output samples in place of the observed output in the explanatory variables, generating

ŷ_t = -â₁ŷ_{t-1} - ··· - â_nŷ_{t-n} + b̂₁u_{t-k} + ··· + b̂_m u_{t-k-m+1}        (10.3.2)

The recursion is started with y₀ to y_{n-1} for ŷ₀ to ŷ_{n-1}. Deficiencies in model structure and poor parameter estimates or dead time give rise to obvious systematic error in the simulation-mode output sequence.
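A simulation-mode run along the lines of (10.3.2) takes only a few lines; the sketch below assumes coefficient vectors a = [a1..an] and b = [b1..bm] and a dead time k in samples, and deliberately omits the noise model.

import numpy as np

def simulation_mode(u, y, a, b, dead_time):
    # Earlier model outputs replace observed outputs among the explanatory
    # variables; the recursion is started from the first n observed outputs.
    n, m = len(a), len(b)
    N = len(y)
    y_sim = np.array(y, dtype=float)          # first n values start the recursion
    for t in range(n, N):
        ar = -sum(a[i] * y_sim[t - 1 - i] for i in range(n))
        ma = sum(b[j] * u[t - dead_time - j] for j in range(m)
                 if t - dead_time - j >= 0)
        y_sim[t] = ar + ma
    return y_sim

# r.m.s. simulation error over the record, for comparison between models:
# rms = np.sqrt(np.mean((simulation_mode(u, y, a, b, k) - y) ** 2))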
10.4 EPILOGUE
Let us finish with a story.

Once upon a time, a researcher interested in identification was invited by an industrial research association to identify the dynamics of a certain process. He was told records could be provided but were likely to be very noisy. He agreed to take the job on. There was some delay in producing the records, and the researcher moved overseas. He was keen to do the work, even though contact with the people producing the records was now less easy, so the agreement stood. In due course the records arrived. Day after day he laboured to fit a convincing model, but with no success. Finally he gave up and, rather shamefaced, wrote to the research association to admit defeat. Many weeks passed, until one day a letter arrived from the research association. It read:

Thank you for your recent letter. We appreciate your efforts on our behalf, and were sorry that they met with no success. You will be pleased to hear that we can now account for the difficulty. The input record we sent you was for Tuesday 23rd March. The output record was for Tuesday 16th March.

Yours sincerely,

FURTHER READING
and just outside it. Figure 10.3.8 gives 180 h of observed flow and the simulation-mode flows calculated from rainfall at each gauge on its own, using (2, 6, 3) constant-parameter models fitted to 217 h of record. Gauge 1 predicts a flow peak at 32 h absent from the observed flow, and misses the peak at 70 h almost completely. A peak at 168 h predicted by gauge 2 fails to eventuate, and the observed peak at 114 h is missed by gauge 2. Apparently the catchment is too large and the rainfall too local for two rain gauges to be enough. Furthermore, the absolute as well as relative success of the two gauges varies greatly, so a weighted sum of their readings would not do, either. △
Tests for absence of structure in residual sequences are discussed by Box and Jenkins (1970), Kendall (1976) and the regression texts mentioned at the end of Chapter 4.
"
,- ! \,
, , ,
',' ,
J ,J (\ \ ,........,
r'-\ : I \ \ i '.
, "i,""". i . \ I \
1\1' \\. \
/' \. .,
/
"--'"
\--r Gouge 2
Gouge I
o
-;;;
,
'"E
- 62.5

E
u-
10.3.5 Informal Checks
Fig. 10.3.8 Observed flow and simulation-mode flows given by two separate rain-gauge records, Example 10.3.7.
Worthwhile checks after an estimation run are for:
(i) isolated large residuals (outliers), which may reveal transcription or instrument errors;
(ii) short periods of large and highly structured, or anomalously small, residuals, which may show up doctored records;
(iii) abrupt and unexpected changes in the parameter estimates or residuals, which may point to unrecorded incidents such as a feed-stock change, unrecorded control action, change or inconsistency in the way measurements are made or shift-to-shift variations in process-operating practice;
(iv) input features with no apparent output consequences or output features with no apparent cause, as shown by large residuals over short periods, suggesting that more extensive or better measurements or a higher sampling rate may be required;
(v) periodicity: specialised models for periodic phenomena may be necessary, an important topic we have too little space to pursue (Box and Jenkins, 1970).
Simulation-mode runs are effective in bringing out such features.
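Simple screens for checks (i)-(iii) can also be run automatically after each estimation run; the sketch below is one possible form, with the window length and thresholds chosen arbitrarily for illustration.

import numpy as np

def informal_checks(residuals, estimates, outlier_sd=3.0, jump_sd=4.0):
    # residuals: length-N array; estimates: N-by-p array of x_{t|t}.
    r = np.asarray(residuals, dtype=float)
    s = r.std()
    outliers = np.where(np.abs(r) > outlier_sd * s)[0]            # check (i)
    # Check (ii): 20-sample windows whose r.m.s. is under a tenth of the overall
    window = 20
    rms = np.array([np.sqrt(np.mean(r[i:i + window] ** 2))
                    for i in range(0, len(r) - window)])
    quiet = np.where(rms < 0.1 * np.sqrt(np.mean(r ** 2)))[0]
    # Check (iii): parameter steps large compared with their own scatter
    dx = np.diff(np.asarray(estimates, dtype=float), axis=0)
    jumps = np.where(np.abs(dx) > jump_sd * dx.std(axis=0))       # (row, column) pairs
    return outliers, quiet, jumps

Any sample or window flagged by such a screen still has to be judged against the original records and operating logs, of course.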
Example 10.3.7 Hourly records were taken of flow in the Mackintosh River in Western Tasmania, together with rainfall at two gauges, one not far above the stream-gauging point and the other diametrically across the catchment
Identification case studies are given by Bohlin (1976), Ljung and Söderström (1983), Olsson (1976), Söderström and Stoica (1983) and Young (1984). Gustavsson (1975) surveys applications in the process industries and gives 143 references.
REFERENCES
Bohlin, T. (1976). Four cases of identification of changing systems. In "System Identification: Advances and Case Studies" (R. K. Mehra and D. G. Lainiotis, eds.). Academic Press, New York and London.
Box, G. E. P., and Jenkins, G. M. (1970). "Time Series Analysis, Forecasting and Control". Holden-Day, San Francisco, California.
Brown, R. F., Godfrey, K. R., and Knell, A. (1979). Compartmental modelling based on methionine tolerance test data: A case study. Med. Biol. Eng. Comput. 17, 223-229.
Gustavsson, I. (1975). Survey of applications of identification in chemical and physical processes. Automatica 11, 3-24.
Kendall, M. G. (1976). "Time-Series", 2nd ed. Griffin, London.
Ljung, L., and Söderström, T. (1983). "Theory and Practice of Recursive Identification". MIT Press, Cambridge, Massachusetts.
Maybeck, P. S. (1982). "Stochastic Models, Estimation and Control", Vol. 2. Academic Press, New York and London.
Norton, J. P. (1975). Optimal smoothing in the identification of linear systems. Proc. IEE 122, 663-668.
Olsson, G. (1976). Modeling and identification of a nuclear reactor. In "System Identification: Advances and Case Studies" (R. K. Mehra and D. G. Lainiotis, eds.). Academic Press, New York and London.
Söderström, T., and Stoica, P. G. (1983). "Instrumental Variable Methods for System Identification". Springer-Verlag, Berlin and New York.
Young, P. C. (1984). "Recursive Estimation and Time-Series Analysis". Springer-Verlag, Berlin and New York.
Index
A
Aitken estimate, 103-105
Akaike information criterion, 275-277, 284
  viewed as F test, 276
A.m.l., 165
Approximate maximum-likelihood, 165, 167, 172-174
A.r.m.a. modelling, 149, 214
Autocorrelation function, 44, 56, 66, 248, 278
  of noise, 146
  of innovations, 296-298
Autoregression, 149

B
Batch identification algorithms, 149-152
Bayes estimation, 139
  computational methods for, 140
  parameter and state, 256
  with Gaussian p.d.f., 132
Bayes' rule, 124, 256
Bias, 88-91, 119
  asymptotic, 90
  conditional, 89
  due to presence of output among regressors, 94-96
  of least-squares estimate, 92-96
  unconditional, 89
Bode plot, 36-39, 41
Bounded-noise estimate, 131, 233-241
C
Certainty equivalence principle, 217
Choleski factorisation, 70-72, 84
Commutation noise, 38
Conditional-mean estimator, 128, 141, 256
Conditional-median estimator, 130, 141
Conditional-mode estimate, 132
Consistency, 97
  of recursive estimators, 182
Continued-fraction approximation, 227-230
Convergence, 96
  almost sure, 96
  in probability, 96
  in quadratic mean, 96
  mean-square, 96
  of recursive identification algorithms, 176-184
  of self-tuning controllers, 218
  rate of o.l.s., 211
  theorems for recursive estimators, 182
  with probability one, 96, 169, 175, 182, 216
Convolution
  integral, 24, 247
  sum, 25, 32
Correlation functions, 43-46, 49-52
  interpretation of ordinary least squares in terms of, 66
Covariance, 98
  checks on computed, 292
  of least-squares estimates, 101-103
  of linear function of estimate, 99
  of random process, 98
  resetting, 192
  updating, 159-161, 178, 194, 222, 292
Cramér-Rao bound, 114-117, 137, 266, 273
Cross-correlation function, 43, 66, 248, 250
Cross-correlators, 53

D
Dead time, 7, 20, 26, 30, 40, 285, 295-299
Determinant-ratio test, 279
Difference equation, 33, 85
Drift correction, 53

E
Efficiency, 114-118
Eigenvalues and eigenvectors, 77
Equaliser, 4
Equation-error algorithm, 147
Ergodic signal, 208
Errors in variables, 93, 119
Euclidean inner-product space, 235
Expected value, 43
Experiment design, 263-272, 281
  for perturbation testing, 52
Extended least squares, 164-166
  as approximate maximum-likelihood, 167
  convergence of, 184
Extended matrix method, 166

F
F-test, 274, 284
Fisher information matrix, 114, 162, 164, 267
Forgetting factor, 192
Fourier transform, 46

G
Gauss-Newton algorithm, 79-81
Generalised least squares, 103-105, 150
  iterative, 150
Golub-Householder technique, 72-76

H
Hierarchical model structures, 273, 283
Hold circuit, 17
Householder transformation, 73-76
I
Identifiability, 201-211
  conditions on feedback and inputs, 215-217
  deterministic, 203-209, 259
  effects of feedback on, 212-215
  normal-mode analysis of, 204-206
  of state-space models
  persistency of excitation conditions for, 206-211
  structural, 203
Identification
  case studies, 304
  constraints on, 18
  49-55, 247-250
  in closed loop, 212-221
  of multivariable systems, 242-246
  of non-linear systems, 246-251
  purposes of, 3-5
  stages in, 15-17
Ill-conditioned computation, 63, 72, 78, 84, 110, 245, 279, 281
Impulse response, 23-26, 40, 204, 259, 296
  measurement of, 28
Information updating, 162-164, 178, 187
Initial-condition response, 23-25
Inner product
  derivative of, 61
Innovation, 154, 296
  autocorrelation function of, 296-298
  root-mean-square, 298
Input signal
  choice of, 263-267
  optimisation of, 266
Instrumental variables, 105-109, 120, 170-174
  recursive, 170-174, 281
Invariance, 137
Iterative processing, 149
J
Jacobian matrix, 78
K
Kalman filter, 157, 195, 256, 261
  extended, 252, 261
Kullback-Leibler mean information, 275

L
Lattice algorithms, 225
Least squares, 59-82, 158-161
  extended, 164-166
  generalised, 103-105, 150, 158
  non-linear, 78-82, 296
  ordinary, 60-68, 279
  weighted, 68, 85, 161, 191, 245, 260
Levenberg-Marquardt algorithm, 81, 296
Linear estimator, 91, 154
Log-likelihood function, 134, 275
Loss functions, 122, 141, 142, 175
  convex, 129

M
Markov estimate, 103-105, 118, 135, 158, 196, 215
Matrix-fraction description, 243, 260
Matrix-inversion lemma, 160
Matrix square root, 70
Maximum a posteriori estimate, 132
Maximum-likelihood estimation, 131-140, 144, 151, 246
  batch iterative algorithm, 152
  conditional, 133, 141
  consistency of, 136
  invariance of, 137
  parameter and state, 254-256
  properties of, 136
  unconditional, 135, 141
  with Gaussian measurements and linear model, 135
  with Gaussian vector measurements, 138
Mean-square-error matrix, 109, 113
Methionine tolerance test, 5, 296
Minimum-covariance estimate, 99, 129, 155-162, 164-176, 187
  recursive linear unbiased, 155-162, 187
  with autocorrelated or non-stationary error, 103-105
Minimum-mean-square-error estimate, 109-113
Minimum-risk estimate, minimax, 131
Minimum-risk estimator, 123, 128-133, 142
  minimum-expected-absolute-error, 130, 141
  minimum-quadratic-cost, 128, 141, 144
Model
  aircraft dynamics, 264, 283
  a.r.m.a.x., 148, 225, 243, 289
  black box, 2, 21
  blast furnace, 29
  block-oriented, 248-250
  canonical, 203
  compartmental, 123, 201, 206, 259, 269
  continuous-time, 12
  deterministic, 11
  discrete-time, 12
  distributed, 7
  dynamical, 6
  for control-system design, 3
  for simulation and operator training, 5
  for state estimation, 3
  hydrological, 2, 122, 190, 274, 285, 287, 289
  input-output, 11
  linear, 9, 143
  local linearisation of, 252
  lumped, 7
  Markov-chain, 14
  minimal, 245
  multi-input-multi-output, 11, 143
  multi-input-single-output, 144
  noise, 233
  non-causal, 215
  non-parametric, 12
  parameter-bounding (bounded-noise), 233-241
  parametric, 12
  prediction, 2, 3, 122, 174
  reactor, 4
  runoff-routing, 2
  sectioned, 13
  single-input-multi-output, 144
  single-input-single-output, 11, 143
  standard linear s.i.s.o., 145
  state-variable, 9-11, 203, 251, 267
  stochastic, 11
  time-invariant, 8, 144
  time-varying, 8, 189-201
  transform, 13
  unitary, 13
  world-growth, 5
Model order, 6, 145
  incrementing, 221-224
  testing, 289
Model reduction, 225-233
Model-structure selection, 230, 272-281
Model validation, 285-303
  tests for, 287
Moment matching, 225-227
  discrete-time, 230-232
'Most likely' estimate, 132
Moving average, 149
m-sequence, see Sequence
Multivariable systems
  cost functions for models of, 245
  model parameterisation for, 242-245

N
Non-linear least squares, 78-82, 296
Non-linear systems
  identification of, 246-251
  models for, 250
Normal equations, 63, 206, 221
  generalised, 78
Normal matrix, 62, 69
  covariance of o.l.s. estimates from, 101, 281
  factorisation methods for, 69-78
  updating in autoregressive model, 225

O
Optimal estimation, 121
Optimal smoothing, 195-201, 295
  covariance computation, 199-201
  fixed-interval, 195-201
Ordinary least squares, 60-68, 279
  by Choleski factorisation, 70-72
  by Golub-Householder technique, 76
  by singular-value decomposition, 77
  computational methods for, 69-78
  minimum-covariance property of, 102, 119
Orthogonal matrices, 73, 77
Orthogonality, 64, 235
  and minimum-mean-square-error linear estimator, 111-113
  correlation interpretation of, 68
  in order-incrementing, 223
  of model output and output error, 65, 279
  of regressors and output error, 66, 181, 223
  of state estimate and its error, 199
Output-error algorithm, 147
Output-sampling schedule, choice of, 267-272

P
Padé approximation, 260
Panuska's method, 165
Parameter and state estimation
  alternated, 254, 261
  Bayes, 256
  combined, 251-256, 261
  maximum-likelihood, 254-256
Parameter-bounding algorithm
  ellipsoidal, 237-241
  linear bounding, 241
Parameter variation
  explicit modelling of, 294
  simple random walk, 193, 294
Parsimony principle, 273
  conditions for applicability of, 273
Partitioned-matrix inversion lemma, 222
Periodicity, 302
Persistency of excitation conditions, 206, 263
  for convergence of adaptive control, 264
  for convergence of recursive o.l.s.
  in ordinary least squares, 206-211
Persistently exciting signal, 208, 260
Poles, interpretation of estimated, 289
Posterior probability density function, 123, 127, 129, 141, 256
Power gain, 56, 57
Power spectral density, 47, 56, 208
Prediction error, mean-square, 286
Prediction-error algorithms, 174-176, 215
Predictor
  recursive adaptive, 219
  toleranced, 241
Prior probability density function, 123, 127, 141
Probability limit, 96, 120
Product-moment matrix, 277-281
Projection matrix, 66, 224
Pure delay, see Dead time
Q
Quadratic form, derivative of, 61
Quadratic-residue code, see Sequence

R
Radar tracking example, 62, 63, 69, 85, 222, 234, 240, 242
Random process, 88
Random variable, ergodic, 43
Random walk
  integrated, 194
  simple, 193, 294
Rational spectral density, 146
Recursive identification, 149, 152-176
  in bounded noise, 260
  including noise-structure model, 164-170
  initial conditions for, 157
  linear unbiased minimum-covariance, 155-162
  of time-varying models, 191-201, 293-301
Regressors
  addition of, 221, 277-281
  linear dependence between, 64, 77, 206, 212, 277
Residuals, 290
Resonance peak, 39, 41
Ridge regression, 109-111
Right divisor, 244
Risk, 123, 175
R.m.l. 1, 165
R.m.l. 2, 167, 184

S
Sampling interval, choice of, 52, 287
Self-tuning control, 149, 176, 217-221
  minimum-variance, 219
Separability, 248
Sequence
  inverse-repeat, 57
  maximal-length (m-sequence), 50-57
  mutually uncorrelated, 45
  pseudo-random binary, 50-55
  quadratic-residue, 54, 57
  statistically independent, 45, 145
  zero-mean, 45
Signal
  perturbation, 49, 265
  white, 46
Simulation-mode runs, 300-303
  for time-varying models, 301
Singular-value decomposition, 77, 85, 110, 120
Speed-control system, 37-39
  p.r.b.s. test of, 52
State estimation, 3, 153, 157, 164, 188, 195, 237, 251
  and simultaneous parameter estimation, 251-256, 261
  applications, 3
  model for, 3, 251
Stationarity
  strict, 88
  wide-sense, 88
Statistical properties of estimators, 87-118, 145
Steady-state gain, 289, 291, 294
Step response, 28, 40
  measurement of, 29
Stochastic approximation, 168-170, 188
Strict positive realness, 183
Strongly system-identifiable model, 216, 261
Superposition
  integral, 24

T
Tale, cautionary, 303
Transfer-function analysers, 36, 54
Transfer function, 30-39, 204, 265
  discrete-time, 31-34, 85
  frequency, 34-39, 265
  Laplace, 31

U
U-D factorisation, 72
Unbiased estimator, 91, 119, 154, 266
Unimodular matrix, 244
Unit-pulse response, 25-27, 32-34, 40, 45, 289
  identification of, 49-53, 66
V
Volterra series, 246-248

W
White noise, 46-49
Wiener-Hopf equation, 45, 57, 66, 249
  design, 234
Weighting function, see Unit-pulse response
Weighted least squares, 68, 85, 161, 191, 245, 260
  recursive, 191, 260

Z
Zero-order hold, 27, 41, 51
z-transform, 32