Mathematical System Theory
The Influence of R. E. Kalman

A Festschrift in Honor of Professor R. E. Kalman
on the Occasion of his 60th Birthday
With 49 Figures
Springer-Verlag
Berlin Heidelberg GmbH
ISBN 978-3-662-08548-6
The use of registered names, trademarks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: Thomson Press (India) Ltd., New Delhi
Printed on acid-free paper.
Contents

Introduction . . . 1
List of Technical Publications of R.E. Kalman . . . 9

Adaptive Control
K.J. Åström . . . 437

Controllability Revisited
M. Fliess . . . 463

Subject Index . . . 593
Introduction
Scientific activities can be divided today into two broad categories. The first
category includes by and large the natural sciences: physics, (most of) chemistry,
geology, materials, astronomy, (some of) biology, etc. Its objective is to
investigate fundamental properties of matter, the big bang, black holes, and
similar problems. The second category is concerned with phenomena and
structures which display high complexity. These can be found in nature;
biological phenomena and the structure of molecules, like DNA, are two
examples. But for the most part they are artificial, generated by disciplines such
as engineering, computer science, cybernetics, ecology, operations research,
economics, etc. The main distinction between these two categories is their
system-component: the former has a small system-component while the latter
has a large system-component. In the sequel, we will attempt to define the
concept of system-related or system-theoretic activity.
The scientific methodology followed by the former category of sciences is
well established: experiment, theory, verification. After an experiment is carried
out, a theory is postulated, whose validity is subsequently verified. This
methodology is well suited for simple or simplified phenomena (like the
photoelectric effect). It does not work, however, for the scientific activities with
a significant system-component. This is due to their high complexity. Think for
example of the brain or the computer; there is no simple experiment which
could capture all aspects of the functions of either the brain or the computer.
Consequently, there is no simple theory which would work either. The
methodology of the system-related sciences is not as yet well established. What
is certain, however, is that the concept of model plays a prominent role. A model
is an abstract construction (a set of rules), which helps (a) in analyzing the
problem at hand, (b) in determining what can be achieved and what not, and
finally, (c) in giving prescriptions on how certain goals can be achieved.
After these remarks let us attempt to define what is meant by system-related
activities, or system theory for short. It is a science which deals with phenomena
whose complexity cannot be described by simple laws. It is concerned not with
the actual world but with models of the actual world. System theory is not only
descriptive like the natural sciences, but prescriptive as well. This means that
system theory does not only tell us how systems are (analysis), but how systems
should be (synthesis).
The system-related sciences were barely existent a century ago. Since then
their growth has steadily increased, accelerating especially after the middle of
this century. Judging from the number of journals devoted to this area, it has
become today one of the most important areas of scientific endeavor.
Since system theory studies models, the natural means for
system-theoretic investigations is mathematics. Historically, system theory has
been concerned with large classes of systems, like linear, bilinear, analytic, etc.
(especially the former) and has answered questions regarding their possibilities
and their performance limits. Since the main tool for studying system-theoretic
problems is mathematics, system theory, like mathematics, is universal. This
means that a system-theoretic result can be applied to a physical system or a
biological system or an economic system, or any other system, provided that
the assumptions under which the results were derived are satisfied. Thus
mathematics, through abstraction at different levels, provides the necessary tools
for penetrating the complexity which characterizes a great many of the
problems today.
To shed more light on the above picture of today's scientific activities, the
interplay between the two categories already mentioned is illustrated by the
following example. Airplanes, and in particular newly designed, high-performance
airplanes, are inherently unstable. This instability is precisely a consequence of
their good aerodynamic properties. For these planes to be able to fly, the use
of (very sophisticated) feedback control mechanisms is imperative. Thus, the idea
of feedback being one of the typical system-theoretic concepts, in building
airplanes one needs to respect not only the laws of aerodynamics but those of
system theory as well.
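The role of feedback here can be indicated with a toy computation (the scalar plant, the time step, and the gain value are illustrative assumptions, not taken from the text): an open-loop unstable state grows without bound, while a simple proportional feedback law drives it to zero.

```python
# Hypothetical scalar plant dx/dt = a*x + u with a > 0 (unstable open loop),
# simulated with small Euler steps; all numbers are illustrative.
a, dt, steps = 1.0, 0.01, 500

def simulate(k):
    """Run x' = a*x + u from x(0) = 1 with state feedback u = -k*x."""
    x = 1.0
    for _ in range(steps):
        x += dt * (a * x - k * x)
    return x

open_loop = simulate(0.0)    # no feedback: x grows roughly like e^(a*t)
closed_loop = simulate(3.0)  # feedback shifts the pole from +a to a - k = -2
print(open_loop, closed_loop)
```

The same plant is rendered stable or unstable purely by the choice of the feedback gain, which is the sense in which the laws of system theory, and not only those of aerodynamics, must be respected.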
The most influential and dominant figure in system theory over the past 30
years has undoubtedly been Rudolf E. Kalman. There is hardly a research area
in this field which has not been influenced by his thinking. In the pages that
follow, there is ample documentation of Kalman's influence in the field, which
earned him the reputation of the founder of mathematical system theory.
Kalman's first major contribution resulted from his little known master's
thesis [7]. Therein, much of the present development concerning the chaotic
behavior of dynamical systems is anticipated. His adaptive control paper [12]
followed. It proposed the self-tuning regulator, which is widely used in practice
today and whose theoretical analysis was completed some twenty years later.
Between 1959 and 1965 Kalman wrote a series of seminal papers. First, the
new approach to the filtering problem, known today as Kalman Filtering, was
put forward [19], [25]. Its enormous success and appeal lies in the fact that
the structure of the optimal filter is explicitly given, while the unknown gain
is computed recursively, by solving a matrix Riccati differential or difference
equation. In the meantime the all-pervasive concept of controllability and its
dual, the concept of observability, were formulated [24], [34]. Simple rank
conditions were derived for checking their validity. These notions were essential
in Kalman's treatment of the linear-quadratic problem in optimal control. By
combining the filtering and the control ideas, the first systematic theory for
control synthesis emerged.
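The rank conditions mentioned above can be sketched as follows (this is the standard textbook formulation; the example pair is my own illustration): the pair (A, B) is controllable exactly when the block matrix [B, AB, ..., A^(n-1)B] has full rank n, and observability is checked dually on (Aᵀ, Cᵀ).

```python
import numpy as np

def controllability_rank(A, B):
    """Rank of the controllability matrix [B, AB, ..., A^(n-1)B]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.linalg.matrix_rank(np.hstack(blocks))

# Illustrative example: a double integrator driven through its second state.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
print(controllability_rank(A, B))  # 2 = full rank, so (A, B) is controllable
```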
The first paper (Anderson & Moore) provides an overview of the Kalman
Filter and a comparison with the Wiener Filter; the key differences are
summarized in Table 2. Furthermore, related topics, like the equivalence between
the main tools of the two filters, i.e. the Riccati equation and spectral
factorization, and the way innovations come into the picture, are explored. The
second paper (Kailath) examines the very rich area surrounding the Kalman
Filter and adds many historical remarks. It is thus explained how the study of
innovations and martingales arises naturally in connection with the Kalman
and the Wiener Filtering problems, in attempts to find the relationship between
them. The third paper (Faurre) discusses how navigation and guidance were
influenced by the introduction of the Kalman Filter. There are five concrete
applications on which the author's company, SAGEM, has been working. In
two cases pictures of the actual hardware are displayed. The chapter concludes
with an account of the quantum Kalman Filter, where quantum probabilistic
techniques are used to produce an explicit solution of the discrete-time filtering
problem for a class of classical stochastic processes which is neither Markovian
nor Gaussian (Accardi).
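The recursion at the heart of this chapter can be sketched in a few lines (scalar, discrete-time; the model x' = a·x + w, y = c·x + v and the noise levels are illustrative assumptions): the filter gain is not guessed but computed at every step from the error variance p, which itself evolves through a Riccati difference equation.

```python
import numpy as np

# Assumed scalar model: x' = a*x + w (variance q), y = c*x + v (variance r).
a, c, q, r = 0.95, 1.0, 0.1, 0.5
rng = np.random.default_rng(0)

x_true, x_hat, p = 0.0, 0.0, 1.0
for _ in range(200):
    # simulate the plant and a noisy measurement
    x_true = a * x_true + rng.normal(scale=np.sqrt(q))
    y = c * x_true + rng.normal(scale=np.sqrt(r))
    # time update (prediction): Riccati step for the error variance
    x_hat = a * x_hat
    p = a * p * a + q
    # measurement update: the Kalman gain is computed recursively from p
    k = p * c / (c * p * c + r)
    x_hat = x_hat + k * (y - c * x_hat)
    p = (1.0 - k * c) * p

print(p)  # the error variance settles to a steady-state value
```

The structure of the filter (predict, then correct by a gain times the innovation y − c·x̂) is fixed in advance; only the gain k changes from step to step, exactly as the text describes.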
Chapter 3. The LQG Problem
The dual of the Kalman Filtering problem, known as the Linear-Quadratic-
Gaussian (LQG) problem, is discussed in this chapter. Kimura's essay argues that
LQG is the first systematic control synthesis theory and differs from classical
synthesis methods because it is model-based. Precisely the fact that it is model-
based raises the issue of the theory-practice gap, which in turn prepares the
ground for robust control. It is also argued that the more recent H∞-control
theory is not a counterpart but rather a successor of the LQG theory. This last
point is taken up in the next contribution (Khargonekar), where it is actually
shown how the state feedback H∞ control problem and the H∞ filtering problem
can be regarded as generalizations of the LQG and the Kalman filtering
problems, respectively. The Riccati equation turns out to be the main tool in
both cases. Moreover, for the limiting value of a parameter, the Riccati equations
involved in the LQG and the Kalman Filter are recovered. The third paper in
this chapter (Goodwin and Salgado) surveys a newly developed framework
which unifies the discrete- and continuous-time LQG theories. A formalism is
introduced in terms of the sampling period; as it goes to zero, the continuous-
time case is recovered, while for unity sampling period, the discrete-time results
are obtained.
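The sampling-period formalism can be indicated schematically (this is the standard difference-operator device; attributing this exact form to the surveyed framework is my assumption): one works with the scaled difference operator rather than the shift q,

```latex
\delta x(k\Delta) \;=\; \frac{x(k\Delta + \Delta) - x(k\Delta)}{\Delta},
\qquad
\lim_{\Delta \to 0} \delta x(t) = \dot{x}(t),
\qquad
\delta = q - 1 \ \ \text{when } \Delta = 1,
```

so that results phrased in terms of δ pass to the continuous-time theory as Δ → 0 and reduce to the usual discrete-time theory at unit sampling period.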
Chapter 4. The Realization Problem
Another one of Kalman's major contributions was to solve the realization
problem, i.e. the problem of constructing the state from the external behavior
in the case of linear systems. An equivalence between external (input-output)
and internal (input-state-output) descriptions of linear systems was thus
established.
In the first paper Fuhrmann remarks that this dual
nature of the theory of linear systems accounts for the richness of the field and
for the depth of the results. A prime example of this duality is Kalman's
unification of various stability criteria using modular arithmetic (algebraic
calculations modulo a given polynomial). Besides this, the fundamental
connection between modules and coprime factorizations, and in particular the
so-called polynomial models which provide the link between the two, is surveyed.
The paper concludes with an account of the circle of ideas around the partial
realization problem, the Euclidean algorithm, continued fractions, canonical
forms, etc. In the second paper Hautus and Heymann use the module
framework to obtain a unifying treatment of both realizations and feedback.
The application of module-theoretic ideas to the treatment of zeros of
multi-input, multi-output systems is surveyed in Wyman's contribution, while
Hammer shows how the module framework can be used for the extraction of
structural invariants which are relevant in the study of dynamic compensation
problems. The last paper in this section (Kamen) discusses how more general
classes of linear systems, like delay systems, two-dimensional systems, etc., can
actually be interpreted as systems with coefficients in appropriate rings, as
opposed to fields.
5.2 Families of Systems. The global theory of families of linear systems is
summarized by Tannenbaum in the next essay. This research area has motivated
many mathematicians since it was introduced by Kalman in 1974. Its
connections with algebraic geometry and classical invariant theory are most
noteworthy.
5.3 Related Developments. Pearson's essay recalls the state of control system
design in the late 50s and argues that the introduction of the concepts of
controllability and observability opened the door to the discovery of the
parametrization of internally stabilizing controllers some 15 years later. This
parametrization forms the basis for all robust control approaches currently in
use. The paper by Baras shows that the operations with polynomials which are
at the heart of the algebraic approach to linear system theory can be
automated and the numerical problems completely circumvented using
recent advances in error-free polynomial computation. The chapter concludes
with a survey of the area of linear discrete system stability by Mansour and
Jury. The investigations, which were motivated by the Kalman and Bertram
papers [21], [22], are centered around the discrete Schwarz matrix introduced
by Mansour, which today bears his name.
I first met R.E. Kalman in Zürich in May 1975. At the time, I was still an
undergraduate student in the department of mathematics and physics at the
ETH, working on my diploma thesis in theoretical physics, an area which had
fascinated me since my high school years. It was my intention to pursue a
doctorate degree in this research direction. Having been nurtured and raised
in a classical European academic environment, I was led to regard theoretical
physics, with the necessary background in mathematics, as the branch of science
offering the ultimate intellectual challenge. Retrospectively, one can safely claim
that this picture in which I was brought up was one-sided. Physics being by
and large a descriptive science, there was no room in it for the prescriptive
sciences, the sciences of the artificial, which are a dominant force in the scientific
scene today.
I would like to thank J.L. Massey of the ETH Zürich for extensive discussions
in the early stages of this project, as well as for recommending this Festschrift
for publication to Springer. I am also indebted to J.C. Willems of the University
of Groningen, for being a constant source of help and counsel since the beginning
of this undertaking. I would also like to thank J.B. Pearson of Rice University
for his feedback during the later stages of the project.
In collaboration with the Centro Matematico V. Volterra, of the University
of Rome, Tor Vergata, a symposium was organized on the occasion of Kalman's
60th birthday, May 17-19, 1990. During this meeting, which took place in
Villa Mondragone, Frascati, most of the contributions to this Festschrift were
presented. The help of L. Accardi with regard to the financial and local
arrangements is gratefully acknowledged.
March 1990
A.C. Antoulas
Rice University, Houston
Electrical Engineering, MIT. (Servomechanisms Laboratory, MIT, internal report, May 1954,
68 pp)
[6] Analysis and design principles of second and higher-order saturating servomechanisms, Trans
AIEE (Applications and Industry), 74 II (1955) 294-310
Article reprinted in Optimal and Self-optimizing Control, edited by Rufus Oldenburger, MIT
Press, 1966, pp 102-118. (LC Card No. 66-21356)
[7] Nonlinear aspects of sampled-data control systems, in Proceedings of Second Symposium
on Nonlinear Circuit Analysis, edited by J. Fox, Polytechnic Institute of Brooklyn 1956,
pp 273-313. (LC Card No. 55-3575)
[8] Physical and mathematical mechanisms of instability in nonlinear automatic control systems,
* All re-published (and possibly translated) versions of an article or book are listed together, under
the same number.
[22] (with J.E. Bertram) Control system analysis and design via the 'second method' of Lyapunov.
II. Discrete-time systems, Trans ASME (J. Basic Engineering), 82 D (1960) 394-399
Above two articles reprinted in Nonlinear Systems: Stability Analysis, edited by J.K. Aggarwal
and M. Vidyasagar, Dowden, Hutchinson, and Ross, 1977, pp 58-87. (LC Card No.
76-15382)
[23] Contributions to the theory of optimal control, Boletin de la Sociedad Matematica Mexicana,
5 (1960) 102-119
Article reprinted in Simposium Internacional de Ecuaciones Diferenciales Ordinarias,
University of Mexico, September 1959, pp 102-119
Author's reply to discussion, IEEE Trans on Automatic Control, AC-17 (1972) 179-180
Author's reflective comments on article published as a Citation Classic in Current Contents,
PC & ES, no. 32, August 6, 1979, pp 14
[24] On the general theory of control systems, in Proceedings First IFAC Congress on Automatic
Control, Moscow, 1960; Butterworths, London, 1961, Vol 1, pp 481-492. [Also, Russian
translation, IFAC preprint, 29 pp]
[25] (with R.S. Bucy) New results in linear filtering and prediction theory, Trans ASME (J. Basic
Engineering), 83 D (1961) 95-108
Article reprinted in Random Processes, Part I: Multiplicity Theory and Canonical
Decompositions, edited by A. Ephremides and J.B. Thomas, Dowden, Hutchinson, and
Ross, 1973, pp 181-194. (LC Card No. 75-96190)
Article reprinted in Kalman Filtering: Theory and Application, edited by H.W. Sorenson,
IEEE Press, 1985, pp 34-47. (LC Card No. 85-14253)
[26] Lectures on the calculus of variations and optimal control, Aerospace Corporation, internal
lectures, August 7-18, 1961. [Typed manuscript, 35 pp, not published.]
[27] New methods and results in linear prediction and filtering theory, RIAS Technical Report
61-1, February 1961, 135 pp
Report almost completely reprinted as New methods in Wiener filtering theory, in Proceedings
First Symposium on Engineering Applications of Random Function Theory and Probability,
edited by J. Bogdanoff and F. Kozin, Wiley, 1963, pp 270-388. (LC Card No. 63-1803)
Report reprinted in ASD Technical Report 61-27, Appendix, pp 109-268
[28] (with T.S. Englar and R.S. Bucy) Fundamental study of adaptive control systems,
ASD-TR-61-27, 1961, 300 pp. [Contains full text of [25] and [27], with connecting
narrative and examples.]
[29] Control of randomly varying linear dynamical systems, in Proceedings of Symposia on Applied
Mathematics, American Mathematical Society, Vol 13, 1962, pp 287-298. (LC Card No.
50-1183)
[30] Discussion of paper by L. Marcus and E.B. Lee, Trans ASME (J. Basic Engineering), 84 D
(1962) 9-10
[31] Canonical structure of linear dynamical systems, Proc. National Academy of Sciences (USA),
48 (1962) 596-600
[32] The variational principle of adaptation: filters for curve fitting, presented at IFAC Symposium
on Adaptive Systems, April 1962, Rome. [Unpublished. Complete manuscript available, 15
pp]
[33] On the stability of linear time-varying systems, Trans IEEE on Circuit Theory, CT-9 (1962)
420-422. Discussion, ibid., CT-10 (1963) 540-542
[34] (with Y.C. Ho and K.S. Narendra) Controllability of linear dynamical systems, Contributions
to Differential Equations, Vol 1 (1963) 189-213
[35] Mathematical description of linear dynamical systems, SIAM J. Control, 1 (1963) 152-192
[36] The theory of optimal control and the calculus of variations, in Mathematical Optimization
Techniques, edited by R. Bellman, University of California Press, 1963, chapter 16, pp
309-331. (LC Card No. 63-12816)
[37] First-order implications of the calculus of variations in guidance and control, Proc. Optimum
Systems Synthesis Conference, Technical Report ASD-TDR-63-119 (Flight Control
Laboratory, Wright-Patterson Air Force Base, Ohio), February 1963, pp 365-371
[38] Lyapunov functions for the problem of Lur'e in automatic control, Proc. National Academy
of Sciences (USA), 49 (1963) 201-205
Article reprinted in Nonlinear Systems: Stability Analysis, edited by J.K. Aggarwal and M.
Vidyasagar, Dowden, Hutchinson, and Ross, 1977, pp 201-205. (LC Card No. 76-15382)
[39] On a new characterization of linear passive systems, in Proc. 1st Allerton Conference, 1963,
pp 456-470. (Also RIAS Technical Report 64-7, April 1964)
[40] (with G. Szegő) Sur la stabilité absolue d'un système d'équations aux différences finies,
Comptes rendus (Paris), 257 (1963) 388-390
[41] When is a linear control system optimal?, Trans. ASME (J. Basic Engineering), 86 D (1964)
51-60
Article reprinted in Frequency Response Methods, edited by A.J.C. MacFarlane, IEEE Press,
1979, pp 71-80. (LC Card No. 79-90572)
[42] On canonical realizations, Proc. 2nd Allerton Conference, 1964, pp 32-41
Article reprinted in Arch. Automatyki i Telemechaniki (Warsaw), 10 (1965) 3-10
[43] Toward a theory of computation in optimal control, in Proc. IBM Symposium on Scientific
Computation, October 1964, pp 25-42. (LC Card No. 66-19007)
[44] (with L. Weiss) Contributions to linear system theory, International J. Engineering Science,
3 (1965) 141-171
[45] On the Hermite-Fujiwara theorem in stability theory, Q. Applied Mathematics, 23 (1965)
279-282
[46] Algebraic structure of linear dynamical systems, I. The module of Σ, Proc. National Academy
of Sciences (USA), 54 (1965) 1503-1508
[47] Irreducible realizations and the degree of a rational matrix, SIAM J., 13 (1965) 520-544
[48] Linear stochastic filtering theory - reappraisal and outlook, Proceedings Symposium on System
Theory, edited by J. Fox, Polytechnic Institute of Brooklyn, 1965, pp 197-205. (LC Card
No. 65-28522)
[49] (with B.L. Ho) Effective construction of linear state-variable models from input/output data,
in Proceedings 3rd Allerton Conference, 1965, pp 449-459
Article reprinted in Regelungstechnik, 14 (1966) 545-548
[50] Algebraic theory of linear systems, in Proceedings 3rd Allerton Conference, 1965, pp
563-577
Article reprinted in Arch. Automatyki i Telemechaniki (Warsaw), 11 (1966) 119-129
[51] On structural properties of linear constant, multivariable systems, in Proc. 3rd IFAC
Congress, London, 1966
[52] (with T. Englar) A user's manual for the automatic synthesis program (Program C), NASA
Contractor Report CR 475, June 1966, 526 pp
[53] The Riccati equation, chapter 7 of above reference
[54] (with B.D.O. Anderson, R.W. Newcomb, and D.C. Youla) Equivalence of linear time-invariant
dynamical systems, J. Franklin Institute, 281 (1966) 371-378
[55] (with B.L. Ho) Spectral factorization using the Riccati equation, in Proc. 4th Allerton
Conference, 1966. [Also Aerospace Report No. TR-1001 (2307)-1]
[56] Algebraic aspects of the theory of dynamical systems, in Differential Equations and Dynamical
Systems, edited by J.K. Hale and J.P. LaSalle, Academic Press, 1967, pp 133-146
[57] New developments in systems theory relevant to biology, in Proceedings III Systems
Symposium, Case Institute of Technology, 1966; published as Systems Theory and Biology,
edited by M.D. Mesarovic, Springer, 1968, pp 222-232. (LC Card No. 68-21813)
[58] Realization theory for non-constant linear systems, January 1968, finished manuscript, 79
pp. Intended as chapter 12 for item [64] but not included
[59] On the mathematics of model building, in Proc. Summer School on Neural Networks,
Ravello, 1967; published as Neural Networks, edited by E.R. Caianiello, Springer, 1968, pp
170-177. (LC Card No. 68-8783)
[60] (in Russian) Raspoznavanie obrazov polilineinymi mashinami, in Proc IFAC Conference on
Adaptive Systems, Erevan, USSR, September 1968, pp 7-30, Izdatel'stvo Nauka,
Moskva, 1971
Article republished in revised and annotated English translation as Pattern recognition
properties of multilinear response functions, I-II, Control and Cybernetics, 8 (1979) 331-361
[61] Introduction to the algebraic theory of linear dynamical systems, in Proc. International
Summer School on Mathematical Systems Theory, Varenna, 1967; published as Mathematical
Systems Theory and Economics, edited by H.W. Kuhn and G.P. Szegő, Springer Lecture
Notes in Operations Research and Mathematical Economics, Vol 11, 1969, pp 41-65.
(LC Card No. 70-81409)
[62] Lectures on controllability and observability, in Proc. C.I.M.E. Summer School at Pontecchio
Marconi, Bologna, July 1968; published as Controllability and Observability, Edizioni
Cremonese, Roma, 1969, pp 1-149
[63] Algebraic characterization of polynomials whose zeros lie in certain algebraic domains,
Proc National Academy of Sciences (USA), 64 (1969) 818-823
[64] (with P.L. Falb and M.A. Arbib) Topics in Mathematical System Theory, McGraw-Hill,
1969, 358 pp (LC Card No. 68-31662)
Russian translation as Ocherki po matematicheskoi teorii sistem, Izdatel'stvo "MIR", Moskva,
1971, 400 pp
Romanian translation as Teoria sistemelor dinamice, Editura Tehnica, Bucuresti, 1975, 326
pp
[65] Some computational problems and methods related to invariant factors and control theory,
in Proc of Conference on Computational Problems in Abstract Algebra, edited by John
Leach, Oxford, 1967, Pergamon Press, 1969. (LC Card No. 75-84072)
[66] New algebraic methods in stability theory, Proc. 5th International Congress on Nonlinear
Oscillations, Kiev, 1969; published in Izdanie Instituta Matematiki Akademia Nauk USSR,
Kiev, 1970, Vol 2, pp 189-199
[67] (edited with N. DeClaris) Aspects of Network and Systems Theory (a collection of papers in
honor of E.A. Guillemin), Holt, Rinehart, and Winston, 1971, 648 pp (LC Card No.
77-115455)
[68] On minimal partial realizations of a linear input/output map, in Aspects of Network and
System Theory (a collection of papers in honor of E.A. Guillemin), edited by R.E. Kalman
and N. DeClaris, Holt, Rinehart, and Winston, 1971, pp 385-408. (LC Card No.
77-115455)
[69] (with M.L.J. Hautus) Realization of continuous-time linear dynamical systems: Rigorous theory
in the style of Schwartz, in Proc 1971 NRL-MRC Conference on Ordinary Differential
Equations, edited by L. Weiss, Academic Press, 1971, pp 151-164. (LC Card No.
77-187234)
[70] Kronecker invariants and feedback, in Proc. 1971 NRL-MRC Conference on Ordinary
Differential Equations, edited by L. Weiss, Academic Press, 1972, pp 459-471. (LC Card
No. 77-187234)
[71] (with Y. Rouchaleau and B.F. Wyman) Algebraic structure of linear dynamical systems. III.
Realization theory over a commutative ring, Proc. National Academy of Sciences (USA), 69
(1972) 3404-3406
[72] Remarks on mathematical brain models, in Biogenesis, Evolution, Homeostasis, edited by A.
Locker, Springer, 1973, pp 173-179. (LC Card No. 72-96743)
[73] (with Y. Rouchaleau) Realization theory of linear systems over a commutative ring, in
Automata Theory, Languages, and Programming, edited by M. Nivat, North Holland, 1973,
pp 61-65. (LC Card No. 72-93493)
[74] Filtraggio statistico nella tecnologia spaziale, in Scienza & Tecnica 73, Arnoldo Mondadori,
Milano, 1973, pp 403-408
[75] Algebraic-geometric description of the class of linear systems of constant dimension, Proc
8th Annual Princeton Conference on Information Sciences and Systems, 1974, pp 189-191
[76] Comments on the scientific aspects of modeling, in Towards a Plan of Actions for Mankind,
edited by M. Marois, North Holland, 1974, pp 493-505. (LC Card No. 75-319415)
[77] Optimization, mathematical theory of, IV: Control theory, Encyclopaedia Britannica, 15th
Edition, 1974, Macropaedia, Vol 13 (Newman to Peisistratus), pp 634-638
[78] (with Michiel Hazewinkel) Moduli and canonical forms for linear dynamical systems, Report
7504/M, Erasmus Universiteit Rotterdam, April 1974, 30 pp
[79] Algebraic aspects of the generalized inverse, in Generalized Inverses and Applications, edited
by M. Zuhair Nashed, Academic Press, 1976, pp 111-124. (LC Card No. 76-4938)
[80] Realization theory of linear dynamical systems, in Control Theory and Functional Analysis,
Vol II, International Atomic Energy Agency, Vienna, 1976, pp 235-256
[81] (with Michiel Hazewinkel) On invariants, canonical forms and moduli for linear, constant,
finite dimensional, dynamical systems, in Mathematical System Theory, edited by
G. Marchesini and S.K. Mitter, Springer Lecture Notes in Economics and Mathematical
Systems, 1976, pp 48-60
[82] A retrospective after twenty years: from the pure to the applied, in Applications of Kalman
Filter to Hydrology, Hydraulics and Water Resources, edited by Chao-lin Chiu, Dept. of Civil
Engineering, University of Pittsburgh, 1978, pp 31-54. (LC Card No. 78-069752)
[83] Nonlinear realization theory, in Transactions of the Twenty-Fourth Conference of Army
Mathematicians, US Army Research Office, Triangle Park, NC, May 1978, pp 259-269
[84] (with A. Lindenmayer) DOL-realization of the growth of multicellular organisms (extended
abstract), Proc 4th International Symposium on the Mathematical Theory of Networks and
Systems, Delft, July 1979
[85] On partial realizations, transfer functions, and canonical forms, Acta Polytechnica
Scandinavica, Mathematics and Computer Sciences Series No. 31, 1979, pp 9-32
[86] A system-theoretic critique of dynamic economic models, in Global and Large-scale System
Models, edited by B. Lazarevic, Springer, 1979, pp 1-24. (LC Card No. 81-461283)
[87] Theory of modeling, Proceedings of the IBM System Science Symposium, Oiso, Japan,
October 1979, pp 53-69
[88] System-theoretic critique of dynamic economic models, Int. J. Policy Analysis and Information
Systems, 3 (1980) 3-22
[89] Mathematical system theory: the new Queen?, Texas Tech University Mathematics Series,
No. 13, 1981, American Mathematical Heritage: Algebra and Applied Mathematics,
pp 121-127
[90] Dynamic econometric models: a system-theoretic critique, in New Quantitative Techniques
for Economic Analysis, edited by G.P. Szegő, Academic Press, New York, 1982, pp 19-28
[91] Identifiability and problems of model selection in econometrics, in Advances in Econometrics,
edited by W. Hildebrand, Cambridge University Press, 1982, pp 169-207. (LC Card No.
81-18171)
Identifikalhatosag es a modellvalasztas problemai az okonometriaban (Hungarian translation
of the preceding), Szigma, 15 (1982) 87-119
[92] On the computation of the reachable/observable canonical form, SIAM J. Control and
Optimization, 20 (1982) 258-260
[93] Realization of covariance sequences, in Toeplitz Centennial, edited by I. Gohberg, Birkhäuser,
1982, pp 135-164. (LC Card No. 82-1319)
[94] System identification from noisy data, in Dynamical Systems II, edited by A.R. Bednarek
and L. Cesari, Academic Press, 1982, pp 331-342. (LC Card No. 82-11476) (Proceedings of
a University of Florida International Symposium)
[95] Identifiability and modeling in econometrics, Developments in Statistics, edited by P.R.
Krishnaiah, Academic Press, 1982, Vol 4, pp 97-136. (LC Card No. 77-11215)
[96] Identification from real data, in Current Developments in the Interface: Economics,
Econometrics, Mathematics, edited by M. Hazewinkel and A.H.G. Rinnooy Kan, D. Reidel,
Dordrecht, 1982, pp 161-196. (LC Card No. 82-16694)
[97] We can do something about multicollinearity, Communications in Statistics, 13 (1984)
115-125
[98] Identification of noisy systems, Uspekhi Mat. Nauk, 40 (1985) 29-37. Russian Mathematical
Surveys
[99] Transcript of Kyoto Prize Lectures, November 10 & November 11, 1985
[100] (edited with G.I. Marchuk, A.E. Ruberti, and A.J. Viterbi) Recent Advances in Communication
and Control Theory, Optimization Software, Inc., 1987, 489 pp (LC Card No. 87-18604)
[101] The problem of prejudice in scientific modeling, in Recent Advances in Communication and
Control Theory, edited by R.E. Kalman, G.I. Marchuk, A.E. Ruberti, and A.J. Viterbi,
Optimization Software, Inc., 1987, pp 448-461. (LC Card No. 87-18604)
[102] Nine Lectures on Identification (book), Springer, Lecture Notes on Economics, to appear
[103] Prolegomena to a theory of modeling, to appear in International J. of Mathematical Modeling
[104] A theory for the identification of linear relations, to appear in Lions Festschrift, edited by
H. Brezis and P.G. Ciarlet
Chapter 1
Axiomatic Framework
J. C. Willems

1 Introduction
I consider it a privilege to contribute the opening article to this Festschrift on
the occasion of the 60th birthday of Rudolf Kalman.
The development of the field of System Theory as a scientific discipline owes
more to the vision and to the research work of Kalman than to that of any
other individual. True, control theoretic questions (and even a few answers)
date back all the way to the days of James Clerk Maxwell and to the pre-World
War II era, when graphical algorithms for analyzing simple feedback schemes
developed by Bode, Nyquist, and others at Bell Laboratories were elevated to
the status of an Idea. True, the observation that biological systems interact with
their environment in an intelligent (feedback) fashion had led Wiener to coin
the term Cybernetics, but it proved hard to build a discipline on the shaky basis
of one single word, albeit a truly beautiful one at that. True, there was General
Systems Theory, but a few fuzzy ideas failed to provide the requisite variety
needed to take root as a basic interdisciplinary scientific endeavor.
These critical innuendos notwithstanding, one ought to give credit to Cybernetics and General Systems Theory for realizing the need for a theory of the artificial, for a framework for studying man-made systems, for a discipline which addresses the problems of the prescriptive sciences. By their very nature, Cybernetics and General Systems Theory profess an abstract purpose, and as such they ran shipwreck in their refusal to accept that abstract ideas can be properly articulated only in the language of mathematics, the field which provides a vocabulary of abstract notions and concepts, and a grammar for unfolding deductions from these.
Simultaneously with all this, more substantial work was underway in electrical engineering. Indeed, in the fifties, we witnessed the development of electrical network analysis and synthesis, which, among other things, laid the foundation of linear system theory. Unfortunately, the mainstream work in this area remained physics-based, one of the perceived prerequisites being the need to capture the restrictions imposed on the electrical circuit by the physical constraints of the elements and the interconnections. Ironically, it was the invention of solid state electronic devices which all but eliminated these physical constraints as an
J. C. Willems
important consideration for what can and what cannot be achieved, for what can and what cannot be designed.
Another parallel development was communication theory. This area, more specifically information theory, is a perfect example of a discipline which combines a solid mathematical foundation, a potential and a penchant for The Big Idea, and, through the coding algorithms, an immediate technological relevance. Regrettably, the scientific impact of this area remained isolated both in time and in place, and nowadays communication theory is a highly successful and active area of research within electrical engineering but with limited influence outside its immediate environment.
Electrical network theory and communication theory indeed had the potential to serve as a breeding ground for the growth of theoretical engineering. Unfortunately, this promise remained largely unfulfilled and it was to be another area of electrical engineering which was destined to combine many of these ideas with the notion of feedback and the emerging focus on optimization as a centripetal principle in design. This area was automatic control, which until the mid-fifties had remained a field thriving on a rather narrow intellectual basis. In a sense it consisted of little more than a good understanding of the notion of a scalar transfer function and a few ad-hoc algorithms for the design of single loop 3-term PID-controllers. The group around Lefschetz and LaSalle at RIAS (the Research Institute for Advanced Studies, a scientific research group of the Martin Company), and in particular, the vision of Kalman, played a crucial role in giving automatic control the momentum required for taking this field over the threshold.
In the late fifties control theory was the scene of a number of important happenings. Firstly, we saw the development of the maximum principle, a subtle set of necessary conditions for the optimality of an open loop control policy. Secondly, there was the popularization of dynamic programming, which laid the basis for a flexible view of feedback control for dynamical systems in the presence of uncertainty. Thirdly, and most importantly, we saw the appearance of Kalman filtering, which provided a mathematical theory for recursive estimation and prediction of an unknown time-function on the basis of another, observed one.
Kalman filtering, together with its dual, the linear-quadratic problem, combined a number of catalytic features necessary for a successful development in applied mathematics. Based on a convincing problem formulation, it obtained its solution in a convenient recursive form requiring the off-line solution of a Riccati (differential) equation. It provided a beautiful algorithm suitable for almost immediate computer implementation. Further, the infinite-time version required the exploitation of the combined properties of observability and controllability, very compelling concepts in their own right. Moreover, the analysis of the Riccati equation, certainly in the infinite-time case, provided an example of a puzzle of the type which appears to be an absolute requirement for a thriving activity in normal science.
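The off-line solution of the Riccati equation mentioned above can be made concrete with a small numerical sketch. The system matrices below are made-up illustrative numbers (not from the text); the snippet iterates the discrete-time filter Riccati recursion until the error covariance settles at its steady-state fixed point.

```python
import numpy as np

# Hypothetical system matrices (illustrative only).
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
C = np.array([[1.0, 0.0]])               # observation map
Q = 0.1 * np.eye(2)                      # process noise covariance
R = np.array([[1.0]])                    # measurement noise covariance

def riccati_step(P):
    # One step of the discrete-time filter Riccati recursion:
    # P+ = A P A' + Q - A P C' (C P C' + R)^{-1} C P A'
    S = C @ P @ C.T + R
    K = A @ P @ C.T @ np.linalg.inv(S)
    return A @ P @ A.T + Q - K @ C @ P @ A.T

P = np.eye(2)
for _ in range(500):
    P = riccati_step(P)

# At the steady-state fixed point, one more iteration changes nothing.
assert np.allclose(P, riccati_step(P), atol=1e-8)
```

Since this (A, C) pair is observable and Q is positive definite, the iteration converges to the unique stabilizing solution regardless of the initial covariance, which is the "infinite-time" phenomenon alluded to above.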
If one studies the literature of this era, one is struck by the breadth of ideas put forward by Kalman at that time. He perceived long before it was to become
2 Mathematical Models

The language which we developed as a mathematical vocabulary for modelling is based on a conceptual triptych consisting of the behavior, behavioral equations, and latent variables. We view a mathematical model as an exclusion law: it states that certain outcomes of a phenomenon are forbidden, are declared impossible, while others are declared as being (in principle) possible. Thus we define a mathematical model as a pair M = (U, B) with U a set called the universum, and B ⊆ U the behavior of the model. In most applications, the behavior will be specified as the solution set of a system of equations. We will call these behavioral equations. Formalizing, we have two maps f1, f2 from the universum U into a space E, called the equating space, and the behavior is defined through the equations f1(u) = f2(u) by B = {u ∈ U | f1(u) = f2(u)}. Clearly f1, f2 define B, but the converse is obviously not true. Thus in mathematical modelling, equations should be considered as a means to an end. Per se, they are not the essence in a modelling exercise.
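A minimal sketch of a model as an exclusion law (the universum and the maps f1, f2 below are our own toy choices, not from the text): the universum is all pairs of reals, and the behavioral equation f1(u) = f2(u) carves out the behavior.

```python
# Toy model M = (U, B): U = R^2, behavioral equation f1(u) = f2(u).
# Here f1(u1, u2) = u2 and f2(u1, u2) = 2*u1, so the behavior is the
# line B = {(u1, u2) : u2 = 2*u1}; all other outcomes are "forbidden".

def f1(u):
    return u[1]

def f2(u):
    return 2 * u[0]

def in_behavior(u, tol=1e-12):
    # The model as an exclusion law: u is possible iff f1(u) = f2(u).
    return abs(f1(u) - f2(u)) <= tol

assert in_behavior((1.0, 2.0))      # a permitted outcome
assert not in_behavior((1.0, 3.0))  # an excluded outcome
```

Note that a different pair of maps (say f1'(u) = 2*u2, f2'(u) = 4*u1) cuts out the same behavior, illustrating why the behavior, not the equations, is the essence of the model.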
When models are deduced from first principles, it will invariably prove convenient to introduce auxiliary variables. We will call these variables latent variables and, in order to provide contrast, we will call the elements of the universum U manifest variables. Thus a latent variable model is a triple M_f = (U, L, B_f) with U the universum of manifest variables, L the universum of latent variables, and B_f the full behavior. The latent variable model M_f induces the manifest model M = (U, B) with B = {u ∈ U | ∃ l ∈ L such that (u, l) ∈ B_f}, the manifest behavior.
In the case of dynamical systems, the universum will consist of time-functions, maps from the time-axis T ⊆ ℝ into the signal space W, and the behavior B consists of the family of W-valued time-trajectories which are compatible with the laws of the dynamical system. Formally, a dynamical system Σ is a triple (T, W, B) with T ⊆ ℝ the time-axis, W the signal space, and B ⊆ W^T the behavior. As with general models, dynamical systems will usually be described by behavioral equations. Often, these will take the form of difference or differential equations. As with general models, dynamical systems will also often be described through latent variables, yielding a latent variable dynamical system Σ_f = (T, W, L, B_f) with T ⊆ ℝ the time axis, W the signal space of manifest variables, L the latent variable space, and B_f ⊆ (W × L)^T the full behavior. Σ_f now induces a (manifest) dynamical system Σ = (T, W, B) in the obvious way.
For motivational material and more details on this setting for studying dynamical systems, we refer the reader to [W1, W2, W3].
Example. Kepler's laws. If a definition is to show proper respect for and do justice to history, Kepler's laws should provide the very first example of a dynamical system. They do. Take T = ℝ, W = ℝ³, and B = {w: ℝ → ℝ³ | Kepler's laws are satisfied}. Thus the behavior B in this example consists of the planetary motions which according to Kepler are possible, all trajectories mapping the time axis ℝ into ℝ³ (the position space for the planets) which satisfy his three famous laws. Since for a given map w: ℝ → ℝ³ one can unambiguously decide whether or not it satisfies Kepler's laws, B is indeed well-defined. Kepler's laws form a beautiful example of a dynamical system in the sense of the above definition, since it is one of the few instances in which B is described explicitly, and not indirectly in terms of behavioral equations. It took no lesser man than Newton to think up appropriate behavioral equations for this dynamical system.
3 Linear Systems

A dynamical system Σ = (T, W, B) is said to be linear if W is a vector space (over a field F) and B is a linear subspace of W^T. Thus linear systems obey the superposition principle in its very simplest form: {w1(·), w2(·) ∈ B; α, β ∈ F} ⇒ {αw1(·) + βw2(·) ∈ B}.
For proofs and further definitions, we refer the reader to [W1, W2, W3].
The above proposition makes it evident why polynomial matrices play such an overwhelming role in linear system theory.
Let R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹]. We will call the system of AR-equations R(σ, σ⁻¹)w = 0 minimal (for simplicity we also call R minimal) if {R'(s, s⁻¹) ∈ ℝ^{g'×q}[s, s⁻¹], R' ~_AR R} ⇒ {g' ≥ g}. Here R' ~_AR R means that ker R'(σ, σ⁻¹) = ker R(σ, σ⁻¹).

Proposition. Every Σ ∈ ℒ^q admits a minimal AR-representation R(σ, σ⁻¹)w = 0. Moreover
(i) {R is minimal} ⇔ {R(s, s⁻¹) is of full row rank}.
(ii) If R is minimal, then {R'(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹], R' ~_AR R, R' minimal} ⇔ {R and R' are unimodularly left-equivalent, that is, there exists a unimodular U(s, s⁻¹) such that R' = UR}.

The above result identifies how all minimal AR-representations may be deduced from one: as the orbit of the transformation group R ↦ UR, where U ranges over the unimodular polynomial matrices with a suitable number of rows and columns. This group is the unimodular group acting on the left. Note that the above proposition implies in particular that the full row rank polynomial matrices with q columns form a canonical form, since each element of ℒ^q admits a minimal AR-representation.
We use the notion of canonical form here in the following sense. Let (𝒫, π) be a parametrization of ℳ. The map π: 𝒫 → ℳ induces the equivalence relation E on 𝒫 defined by {p1 E p2} :⇔ {π(p1) = π(p2)}. This equivalence relation leads to canonical forms and to invariants. A subset 𝒫_c ⊆ 𝒫 will be called a canonical form for the parametrization (𝒫, π) if 𝒫_c ∩ p(mod E) is non-empty for all p ∈ 𝒫, i.e., if π(𝒫_c) = ℳ, i.e., if (𝒫_c, π) is itself a parametrization of ℳ. It is called a trim canonical form if 𝒫_c ∩ p(mod E) consists of exactly one point for all p ∈ 𝒫, i.e., if π|𝒫_c : 𝒫_c → ℳ is a bijection, i.e., if (𝒫_c, π) is a trim parametrization of ℳ.
All the notions related to dynamical systems introduced so far (linearity, time-invariance, completeness, etc.) are trivially generalized to systems with latent variables. Sometimes it is obvious that the manifest model inherits a property from the latent variable model (for example, linearity and time-invariance), sometimes it is less obvious (for example, completeness).
Let R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹], M(s, s⁻¹) ∈ ℝ^{g×d}[s, s⁻¹] and consider the system of behavioral difference equations

R(σ, σ⁻¹)w = M(σ, σ⁻¹)a    (ARMA)
relating the time-series w: ℤ → ℝ^q (expressing the evolution of the manifest variables) to the latent variable time-series a: ℤ → ℝ^d. We will call this an AutoRegressive-Moving-Average system. The term R(σ, σ⁻¹)w in (ARMA) is called the AutoRegressive part, while M(σ, σ⁻¹)a is called the Moving-Average part. Clearly these equations represent a dynamical system with latent variables (ℤ, ℝ^q, ℝ^d, B_f) with B_f = ker[R(σ, σ⁻¹) ⋮ −M(σ, σ⁻¹)]. From the AR-representation theorem, we know that precisely every such system with B_f ∈ ℒ^{q+d}, that is, every linear time-invariant complete latent variable dynamical system with T = ℤ, W = ℝ^q, and L = ℝ^d, can be described by an ARMA-system of behavioral equations.
An ARMA-system induces a manifest dynamical system (ℤ, ℝ^q, B) with B = {w | ∃a such that (ARMA) is satisfied}; equivalently, B = (R(σ, σ⁻¹))⁻¹ im M(σ, σ⁻¹) (with (·)⁻¹ the inverse image), R(σ, σ⁻¹): (ℝ^q)^ℤ → (ℝ^g)^ℤ, and M(σ, σ⁻¹): (ℝ^d)^ℤ → (ℝ^g)^ℤ. Clearly (ℤ, ℝ^q, B) is linear and time-invariant. The question arises if it is also complete. The answer is in the affirmative:

Theorem. Let the dynamical system with latent variables Σ_f = (ℤ, ℝ^q, ℝ^d, B_f) be linear, time-invariant, and complete. Then the manifest dynamical system which it represents, Σ = (ℤ, ℝ^q, B), is also linear, time-invariant, and complete.

In terms of an ARMA-system of behavioral equations R(σ, σ⁻¹)w = M(σ, σ⁻¹)a, the above theorem states that the latent variables a can be completely eliminated, resulting in an AR-system of equations. That is, there will exist a polynomial matrix R'(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹] such that the AR-system of equations R'(σ, σ⁻¹)w = 0 represents the manifest behavior. In other words, this equation captures all the restrictions imposed on w by (ARMA). This elimination may result in an increase in the lag of the resulting AR-system as compared to the ARMA-system. However, the number of equations in the AR-system need never be larger than the number of equations of the original ARMA-system. We will refer to the above theorem and to the resulting consequence for ARMA- and AR-equations as the ARMA elimination theorem.
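A minimal numerical sketch of this elimination (our own toy example, not from the text): take the ARMA-system w1 = a, σw2 = a over T = ℤ. Eliminating the latent variable a yields the AR-system σw2 − w1 = 0, and every manifest trajectory produced through a latent trajectory indeed satisfies it.

```python
import random

# Toy ARMA-system: w1(t) = a(t) and (sigma w2)(t) = w2(t+1) = a(t).
# Eliminating the latent variable a gives the AR-law w2(t+1) - w1(t) = 0.

a = [random.random() for _ in range(51)]   # an arbitrary latent trajectory

w1 = [a[t] for t in range(51)]             # w1(t) = a(t)
w2 = [0.0] + [a[t] for t in range(50)]     # w2(t+1) = a(t); w2(0) arbitrary

# Every manifest trajectory obtained this way satisfies the eliminated AR-law.
assert all(abs(w2[t + 1] - w1[t]) < 1e-12 for t in range(50))
```

The latent variable a has disappeared from the final check: the single AR-equation captures all the restrictions that the ARMA-system imposes on the manifest variables, as the theorem asserts.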
An especially important class of ARMA-systems are those in which R(s, s⁻¹) = I, yielding the Moving Average system

w = M(σ, σ⁻¹)a    (MA)

The intrinsic behavior B of an MA-system equals im M(σ, σ⁻¹). Of course, B ∈ ℒ^q, and the elements of ℒ^q which can be represented this way are precisely those subspaces of (ℝ^q)^ℤ which are images of polynomial operators in the shift. We have seen that every B ∈ ℒ^q allows an AR-representation, that is, that it is the kernel of a polynomial operator in the shift. Does it also allow an MA-representation? In other words, is it also the image of a polynomial operator in the shift? If not, does the restriction of having an MA-representation imply some interesting system theoretic property? It may come as a surprise that these abstract questions lead us to the concept of controllability!
The above results allow a generalization to the continuous-time case. However, from a 'technical' mathematical point of view the theory becomes somewhat more intricate, because there is the difficulty of giving a natural definition of the behavior induced by a differential equation.
Let R_0, R_1, ..., R_L ∈ ℝ^{g×q} and consider the system of differential equations

R_L (d^L w / dt^L) + R_{L−1} (d^{L−1} w / dt^{L−1}) + ... + R_1 (dw/dt) + R_0 w = 0
The first thing to note is that the relevant polynomial ring is now (as in the case T = ℤ₊) ℝ[s], with the adapted notion of unimodularity, etc.
The above set of differential equations is the analog of an AR-system. The analog of an ARMA-system becomes

R(d/dt)w = M(d/dt)a

with R(s) ∈ ℝ^{g×q}[s] and M(s) ∈ ℝ^{g×d}[s]. The analog of an MA-system follows from there.
These differential equations define a dynamical system (possibly with latent variables) with time axis T = ℝ (or ℝ₊, the theory now being completely identical) and signal space W = ℝ^q. But how should one define the behavior?
The most logical approach is to include distributions in the behavior.
We denote by 𝔇^q the set of ℝ^q-valued distributions on ℝ, equipped with the usual topology. Then R(d/dt)w = 0 defines the dynamical system Σ = (ℝ, ℝ^q, B) with B ⊆ 𝔇^q defined as ker R(d/dt), with R(d/dt) viewed as a map from 𝔇^q to 𝔇^g. In this setting, the result regarding the elimination of latent variables for ARMA-systems generalizes to the differential equation case. Actually, the same holds if we consider all trajectories to be C^∞, but working with this space has many other disadvantages.
The above results show the nice interplay between the behavior and behavioral equations. In our ideology, it is imperative to define concepts and properties of systems on the level of the behavior and to develop tests on the behavioral equations for verifying whether the induced system has a particular property. In this essay we will use the concepts of controllability and observability as a case in point.
4 Controllability
We start with some historical comments. The notion of controllability-related
to the possibility of transferring the state of a system-has been introduced by
KaIman [K1, K2, K3, KHN1, KFA1] around 1960 and immediately became one
of the key concepts in control theory, related to the very possibility of exerting
effective control. It enters as a crucial condition in (infinite time) LQG- and
H co -control, in pole placement, in time optimal control, etc. Soon after the
introduction of controllability, realization theory appeared on the scene. One
of the important paradigms which was learned from this development is that
every transfer function and every convolution system can be represented by a
minimal state space system and that minimality is equivalent to controllability
and observability. This state of affairs has created the impression that
controllability is not an intrinsic property of a dynamical system, but merely a
property of astate realization, of a specific representation of a dynamical system.
Nevertheless, the idea that lack of controllability be ars relation to the presence
of common factors in a transfer function (and can thus, in some sense, be
considered as an external property) is very much part of the system theory
folklore. As indicated above, there are good system theoretic reasons why this
is dubious. It is equally dubious for good mathematical reasons: a transfer
function is a rational function and common factors are by definition cancelled.
We will now introduce a point of view which will make history out of these
historical remarks and this folklore. Indeed, we will put forward a convincing
definition of controllability which makes it into a property of the manifest
behavior and which is in principle applicable to any dynamical system. Thus
controllability will become a genuine property of a dynamical system.
Definition. Let Σ = (T, W, B), T = ℤ or ℝ, be a time-invariant dynamical system. Σ is said to be controllable if for all w1, w2 ∈ B there exists a t ∈ T, t ≥ 0, and a w: T ∩ [0, t] → W such that w' ∈ B, with w': T → W defined by

w'(t') :=  w1(t')       for t' < 0
           w(t')        for 0 ≤ t' ≤ t
           w2(t' − t)   for t' > t
Note that this notion of controllability is convincing in its own right and
representation independent. Obviously, it is desirable to be able to read off
from the behavioral equations whether or not a system enjoys a certain property. For AR-models a rather concrete test can be obtained:

Theorem. Let B ∈ ℒ^q be represented by the AR-equations R(σ, σ⁻¹)w = 0 with R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹]. Then Σ is controllable if and only if the rank of the matrix R(λ, λ⁻¹) ∈ ℂ^{g×q} is independent of λ for 0 ≠ λ ∈ ℂ.
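A numerical sketch of this rank test (the polynomial matrices below are our own toy examples, and sampling finitely many points is only a heuristic check, not a proof): evaluate R(λ, λ⁻¹) at a handful of nonzero points and compare ranks; a rank drop at some 0 ≠ λ ∈ ℂ signals lack of controllability.

```python
import numpy as np

def rank_profile(coeffs, powers, samples):
    # coeffs[k] is the constant matrix multiplying s**powers[k];
    # evaluate R(lam, 1/lam) at each nonzero sample and record the rank.
    ranks = []
    for lam in samples:
        R = sum(c * lam ** p for c, p in zip(coeffs, powers))
        ranks.append(np.linalg.matrix_rank(R))
    return ranks

samples = [0.5, 1.0, 2.0, -1.0, 1.0 + 1.0j]
powers = [0, 1]

# Controllable toy example: R(s) = [s - 1, s - 2]; the two entries never
# vanish simultaneously, so the rank is 1 for every nonzero lambda.
c_coeffs = [np.array([[-1.0, -2.0]]), np.array([[1.0, 1.0]])]
assert rank_profile(c_coeffs, powers, samples) == [1, 1, 1, 1, 1]

# Uncontrollable toy example: R(s) = [s - 1, 2(s - 1)]; at lambda = 1 the
# rank drops from 1 to 0, so the rank is not independent of lambda.
u_coeffs = [np.array([[-1.0, -2.0]]), np.array([[1.0, 2.0]])]
profile = rank_profile(u_coeffs, powers, samples)
assert profile[0] == 1 and profile[1] == 0
```

In the second example the common factor (s − 1) in both entries is exactly what produces the rank drop, tying the test back to the "common factors" folklore discussed above.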
This theorem can be viewed as a sweeping generalization of Hautus' controllability test for the ubiquitous state space system σx = Ax + Bu. We can also prove the following interesting representation theorem:

Theorem. Σ = (ℤ, ℝ^q, B) ∈ ℒ^q is controllable if and only if there exists M(s, s⁻¹) ∈ ℝ^{q×•}[s, s⁻¹] such that B = im M(σ, σ⁻¹), i.e., if and only if it allows an MA-representation w = M(σ, σ⁻¹)a.
We will refer to the above theorem as the MA-representation theorem. We will denote the set of controllable systems in ℒ^q by ℒ^q_controllable. Thus ℒ^q_controllable := {Σ = (ℤ, ℝ^q, B) ∈ ℒ^q | Σ is controllable}. The theorem implies that (ℝ^{q×•}[s, s⁻¹], π) with π: M(σ, σ⁻¹) ↦ (ℤ, ℝ^q, im M(σ, σ⁻¹)) defines a polynomial matrix parametrization of ℒ^q_controllable. This leads in our usual way to an equivalence on ℝ^{q×•}[s, s⁻¹] denoted by M1 ~_MA M2, which hence signifies im M1(σ, σ⁻¹) = im M2(σ, σ⁻¹). In system theoretic terms, it means that the associated MA-systems have the same manifest behavior.
The other extreme from controllability are the autonomous systems, in which the past of a trajectory implies its future completely.
Let Σ = (T, W, B), T = ℤ or ℝ, be a time-invariant dynamical system. Σ is said to be autonomous if {w1, w2 ∈ B and w1(t) = w2(t) for t < 0} ⇒ {w1 = w2}.

Proposition. Let Σ = (ℤ, ℝ^q, B) ∈ ℒ^q. Then the following are equivalent:
(i) Σ is autonomous;
(ii) B is finite-dimensional;
(iii) Σ admits an AR-representation with R(s, s⁻¹) ∈ ℝ^{q×q}[s, s⁻¹] having det R ≠ 0;
(iv) Σ admits an AR-representation R(σ, σ⁻¹)w = 0 with ker R(λ, λ⁻¹) = {0} for some 0 ≠ λ ∈ ℂ (and hence for all but a finite number of 0 ≠ λ ∈ ℂ).
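A sketch of the autonomy property for a toy AR-system (our own example, not the text's): for B = ker(σ² − σ − 1), i.e. w(t+2) = w(t+1) + w(t), the trajectory is pinned down by two consecutive values, so the past determines the future, and the behavior is a 2-dimensional linear space, matching (ii); here R(s) = s² − s − 1, so det R ≢ 0 as in (iii).

```python
def run(w0, w1, n):
    # Unroll the autonomous AR-law w(t+2) = w(t+1) + w(t) from two
    # consecutive initial values; the rest of the trajectory is forced.
    w = [w0, w1]
    for _ in range(n - 2):
        w.append(w[-1] + w[-2])
    return w

# Autonomy: starting two trajectories from the same pair of consecutive
# values forces them to coincide forever.
wa = run(1.0, 2.0, 30)
wb = run(wa[0], wa[1], 30)
assert wa == wb

# Finite-dimensionality: every trajectory is a linear combination of the
# two "basis" trajectories started from (1, 0) and (0, 1).
e1, e2 = run(1.0, 0.0, 30), run(0.0, 1.0, 30)
w = run(3.0, 5.0, 30)
assert all(abs(w[t] - (3.0 * e1[t] + 5.0 * e2[t])) < 1e-9 for t in range(30))
```

This stands at the opposite extreme from controllability: no past trajectory can be steered toward a different future, precisely because there are no free variables left.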
5 Observability
We now turn to the notion of observability. In our view, observability will be
a property of systems which produce two kinds of signals, one which is observed,
and another which should be deduced from the observed signal.
Definition. Let L = (11', W 1 X W z, \B), 1I' = ~ or IR, be a time-invariant
dynamical system. Then W2 is said to be observable from W1 in L if
m an d w 1 = w 1 = w2 = w2 .
wl' w2 ' wl' w2 E:.v
{ (
("
")
"}
{'
"}
(but a finite) number of rows. Each element R(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹] induces the dynamical system (ℤ, ℝ^q, ker R(σ, σ⁻¹)). Let π denote the map which associates this dynamical system with the polynomial matrix R. Above we have seen that (ℝ^{•×q}[s, s⁻¹], π) defines a parametrization of ℒ^q. The question whether this is a continuous parametrization is a much more delicate one. To begin with, it requires specifying a topology on ℒ^q and on ℝ^{•×q}[s, s⁻¹].
For ℒ^q we will use the following topology. We have seen that (ℤ, ℝ^q, B) belongs to ℒ^q if and only if B is a linear shift-invariant closed subspace of (ℝ^q)^ℤ, equipped with the topology of pointwise convergence. This leads to a topology on ℒ^q with the following notion of convergence. A family of systems Σ_ε = (ℤ, ℝ^q, B_ε) ∈ ℒ^q, with ε > 0 a real number, is defined to converge to Σ_0 = (ℤ, ℝ^q, B_0) ∈ ℒ^q if
(i) whenever w_{ε_k} ∈ B_{ε_k}, k ∈ ℕ, lim_{k→∞} ε_k = 0, and lim_{k→∞} w_{ε_k} = w_0 (pointwise convergence), then w_0 ∈ B_0, and
(ii) whenever w_0 ∈ B_0, then there exist w_ε ∈ B_ε such that lim_{ε→0} w_ε = w_0.
For ℝ^{•×q}[s, s⁻¹], on the other hand, we will use the following notion of convergence. Let R_ε(s, s⁻¹) ∈ ℝ^{g_ε×q}[s, s⁻¹], ε ≥ 0. Then we define lim_{ε→0} R_ε to be R_0 if
(i) g_ε = g_0 for ε sufficiently small,
(ii) the expansions R_ε(s, s⁻¹) = R^ε_{L_ε} s^{L_ε} + R^ε_{L_ε−1} s^{L_ε−1} + ... + R^ε_{l_ε+1} s^{l_ε+1} + R^ε_{l_ε} s^{l_ε} satisfy l ≤ l_ε ≤ L_ε ≤ L for all ε ≥ 0, and
(iii) lim_{ε→0} R^ε_k = R^0_k for all l ≤ k ≤ L. This last convergence is componentwise in the entries of the matrices.
Thus system convergence means that convergent time-series from the
behavior of the convergent systems approach a limit in the behavior of the limit
system and, conversely, that each time-series in the behavior of the limit can
be approximated by elements in the behavior of the convergent systems.
Polynomial matrix convergence simply means convergence of the matrix
coefficients.
It is not possible to prove in full generality that the polynomial matrix parametrization ℝ^{•×q}[s, s⁻¹] of ℒ^q is a continuous one. For one thing, we need to restrict our attention to full row rank polynomial matrices with q columns. In fact it is easy to prove that (ℝ_f^{•×q}[s, s⁻¹], π) also defines a parametrization of ℒ^q (ℝ_f^{•×q}[s, s⁻¹] denotes the polynomial matrices of full row rank). This parametrization corresponds, as we have seen, to the minimal AR-representation. In fact, we have seen that if R(σ, σ⁻¹)w = 0 is one minimal AR-representation of Σ, then the transformation group (the unimodular group) R ↦ UR, where U(s, s⁻¹) ranges over the unimodular polynomial matrices, generates precisely all minimal AR-representations of Σ. This tremendous non-uniqueness of behavioral equation representations is, among other things, the source of difficulty in continuity considerations.
In order to state our result from [NW1] on continuous parametrization, we need to introduce the notion of the memory span of an element (ℤ, ℝ^q, B) ∈ ℒ^q.
It can be shown that B has the property that it has finite memory span, that is, that there exists a Δ ∈ ℤ₊ such that w1, w2 ∈ B and w1(t) = w2(t) for 0 ≤ t ≤ Δ implies that w1 ∧ w2 ∈ B, where w1 ∧ w2 denotes the concatenation of w1 and w2, defined as (w1 ∧ w2)(t) := w1(t) for t < 0 and (w1 ∧ w2)(t) := w2(t) for t ≥ 0. The smallest of such numbers Δ ∈ ℤ₊ will be called the memory span of B.
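A sketch of the finite memory span property for a toy two-variable AR-law (our own example, not the text's): for w1(t+2) = w1(t+1) + w1(t) + w2(t) with w2 free, the lag is 2 and the memory span is Δ = 1, so two trajectories that agree on the window {0, 1} can be concatenated at t = 0 without breaking the law.

```python
import random

# Toy law: w1(t+2) = w1(t+1) + w1(t) + w2(t), with w2 a free variable.
LO, HI = -10, 10

def make_traj(shared):
    # Build a trajectory on t = LO..HI, pinned to shared values at t = 0, 1
    # but otherwise random: recurse the law forward and backward in time.
    w1 = {0: shared["w1"][0], 1: shared["w1"][1]}
    w2 = {t: random.random() for t in range(LO, HI + 1)}
    w2[0], w2[1] = shared["w2"]
    for t in range(0, HI - 1):            # forward:  solve for w1(t+2)
        w1[t + 2] = w1[t + 1] + w1[t] + w2[t]
    for t in range(-1, LO - 1, -1):       # backward: solve for w1(t)
        w1[t] = w1[t + 2] - w1[t + 1] - w2[t]
    return w1, w2

def satisfies_law(w1, w2, lo, hi):
    return all(abs(w1[t + 2] - (w1[t + 1] + w1[t] + w2[t])) < 1e-9
               for t in range(lo, hi - 1))

# Two distinct trajectories agreeing on the window {0, 1}:
shared = {"w1": (1.0, 2.0), "w2": (0.3, 0.7)}
wa1, wa2 = make_traj(shared)
wb1, wb2 = make_traj(shared)

# Concatenation: past of trajectory a, future of trajectory b.
wc1 = {t: (wa1[t] if t < 0 else wb1[t]) for t in range(LO, HI + 1)}
wc2 = {t: (wa2[t] if t < 0 else wb2[t]) for t in range(LO, HI + 1)}
assert satisfies_law(wc1, wc2, LO, HI)
```

The agreement window of length Δ + 1 = 2 is exactly what the law needs at the switching instant, which is why the concatenated trajectory still lies in the behavior.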
Let R(s, s⁻¹) = R_L s^L + R_{L−1} s^{L−1} + ... + R_l s^l ∈ ℝ^{•×q}[s, s⁻¹], with R_L ≠ 0 and R_l ≠ 0. Then we call L − l the degree of R. Let us denote by ℝ_f^{•×q,Δ}[s, s⁻¹] the collection of elements of ℝ_f^{•×q}[s, s⁻¹] with degree ≤ Δ. Also, let us denote by ℒ^q_Δ those elements of ℒ^q with memory span ≤ Δ. In [NW1] we have obtained the following interesting continuity result:

Theorem. (ℝ_f^{•×q,Δ}[s, s⁻¹], π) defines a continuous parametrization of ℒ^q_Δ.

Thus with the restriction imposed in the above theorem, linear time-invariant complete dynamical systems converge if and only if their AR-representations converge.
The electrical circuit example described in [WN1] shows that this will not be automatic, and it is refreshing to take note of the fact that observability provides the key for a positive result in this direction!
Theorem. Assume that the ARMA-systems converge, lim_{ε→0} B_f^ε = B_f^0, and that in the limit system the latent variables are observable from the manifest variables. Then the manifest behaviors will converge: lim_{ε→0} B^ε = B^0.
Corollary. Under the assumptions of the above theorem, there will exist full row rank polynomial matrices R'_ε(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹] such that R'_ε(σ, σ⁻¹)w = 0 represents B^ε, with R'_ε(s, s⁻¹) → R'_0(s, s⁻¹) as ε → 0.

It would stand to reason to think that a_ε → a_0 as ε → 0. That, however, need not be the case.
Clearly one can construct examples in which w_ε(t) = 0 for all t while the latent variable trajectories a_ε fail to converge as ε → 0. Such a latent variable system can be specified by equations of the form R'_ε(σ, σ⁻¹)w = 0; a = R''_ε(σ, σ⁻¹)w, which together specify B_f^ε.
It follows that, under the conditions of the previous theorem, we can choose R'_ε such that R'_ε → R'_0 as ε → 0. However, it follows from the above example that even then it may not be possible to choose R''_ε such that R''_ε → R''_0 as ε → 0. If it is the case, then w_ε ∈ B^ε converges to w_0 (which will then belong to B^0) if and only if (w_ε, a_ε) ∈ B_f^ε also converges: (w_ε, a_ε) → (w_0, a_0) (which then belongs to B_f^0).
This is a particular type of ARMA-model (identify w with [u; y] and a with x). Assume that A_ε → A_0, B_ε → B_0, C_ε → C_0, D_ε → D_0 as ε → 0, and that (A_0, C_0) is an observable pair. Then B_f^ε → B_f^0 and (since the conditions of our main result are now automatically satisfied) it follows that B^ε → B^0. This result has been shown in [N1]. However, in this case it can also be seen that (u_ε, y_ε) ∈ B^ε converges to (u_0, y_0) (which will belong to B^0) if and only if (u_ε, y_ε, x_ε) ∈ B_f^ε converges to (u_0, y_0, x_0) (which will then belong to B_f^0). This can be seen as follows. There holds (n is the dimension of the state vector x):

y_ε = C_ε x_ε
−C_ε B_ε u_ε + σ y_ε = C_ε A_ε x_ε
−C_ε A_ε B_ε u_ε − C_ε B_ε σ u_ε + σ² y_ε = C_ε A_ε² x_ε
...
−C_ε A_ε^{n−2} B_ε u_ε − ... − C_ε B_ε σ^{n−2} u_ε + σ^{n−1} y_ε = C_ε A_ε^{n−1} x_ε
Since (A_0, C_0) is an observable pair, the matrix col(C_0, C_0 A_0, ..., C_0 A_0^{n−1}) has rank n. One can therefore select n linearly independent rows, and obtain, by taking the corresponding rows from the above equations, a relation of the form M_ε x_ε = (an expression in u_ε, y_ε, and their shifts), with M_ε square, M_ε → M_0 as ε → 0, and M_0 nonsingular. Now pre-multiply with M_ε⁻¹.
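The row-selection argument above can be sketched numerically. A minimal illustration with a made-up observable pair (A, C) and, for brevity, the input set to zero, so that the stacked equations y(k) = C A^k x(0) alone determine the state:

```python
import numpy as np

# Hypothetical observable pair (A, C); input set to zero for brevity.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
n = A.shape[0]

x0 = np.array([3.0, -2.0])

# Outputs y(0), ..., y(n-1) of the autonomous system x(t+1) = A x(t), y = C x.
x, ys = x0, []
for _ in range(n):
    ys.append(C @ x)
    x = A @ x
ys = np.concatenate(ys)

# Stack the equations y(k) = C A^k x(0). Here the observability matrix O
# is square (n rows) and nonsingular, so x(0) is recovered directly.
O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
x0_hat = np.linalg.solve(O, ys)
assert np.allclose(x0_hat, x0)
```

Because the map from x(0) to the stacked outputs is invertible (the role played by M_ε in the argument above), convergence of the outputs forces convergence of the state trajectory.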
9 Postscript

In this article, I have given a brief outline of some aspects of the theory of dynamical systems which I developed during the last decade. The behavior of a dynamical system plays the lead role in this approach. However, most dynamical models encountered in practice will be given in the form of behavioral equations. Typically these take the form of difference equations or differential equations. In a dynamical system described in terms of behavioral equations, the behavior is specified as the solution set of the behavioral equations. Moreover, models obtained from first principles will invariably require the introduction of auxiliary variables in order to specify the laws of the dynamical system. We call these auxiliary variables latent variables in order to distinguish them from the manifest variables. These are the variables which our model aims at describing. This triptych, with the behavior in central stage and with behavioral equations and latent variables as important supporting characters, provides the basic conceptual framework on which our approach is built.
In this approach we aim at introducing, at defining, all properties of a dynamical system at the level of the behavior. We have developed here two examples of such concepts: the important notions of controllability and observability. Controllability refers to the possibility of matching any feasible past trajectory to any feasible future trajectory, while observability refers to the possibility of deducing the latent variable trajectory from the manifest variable trajectory.
Many other notions from classical and modern linear systems theory can be introduced effectively from this point of view. We mention a few. Inputs and outputs: the input is a maximal set of free variables. The state of a dynamical system: the state is a latent variable having the property that the past behavior is independent of the future behavior, given the present state. The transfer function and, more generally, the frequency response: these simply classify the exponential trajectories in the behavior (they can hence be introduced without reference to Laplace transforms). The fact that behavioral equations provide a many-to-one way of defining a dynamical system leads to the problem of finding (trim) canonical forms and (complete) invariants (see [W3]). Finally, it is possible to fit the classical realization theory within this framework. In fact, through the notion of the most powerful unfalsified model (see [W1, W3]), this problem becomes both more natural and greatly generalized in the sense that it can be applied to any set of observed vector time-series, and not just to the impulse response.
Our approach is very much inspired by Kalman's seminal work. The first and most basic aim is to provide (in the spirit of the first chapter of [KFA1]) a suitable axiomatic framework for the study of dynamical systems. This framework, of course, allows for free variables (inputs). This is in contrast to the very limited point of view followed in topological dynamics, where the initial conditions uniquely specify the solution. However, in our framework the input/output structure is deduced from the behavior and not imposed ab initio. Also, the crucial role which the state plays in Kalman's work is in our framework brought to its full fruition through the more general idea of latent variables.
As a first generation student of mathematical system theory, it is a pleasure
to acknowledge the constant inspiration which Kalman's work has provided
for my own scientific thinking. The present article is more than evidence of this
profound influence.
References

[K1] R.E. Kalman, On the General Theory of Control Systems, Proceedings of the First International Congress of the IFAC, Moscow 1960, Butterworths, London, pp 481-492, 1960
[K2] R.E. Kalman, Mathematical Description of Linear Dynamical Systems, SIAM Journal on Control, Vol 1, pp 152-192, 1963
[K3] R.E. Kalman, Lectures on Controllability and Observability, Centro Internazionale Matematico Estivo, Bologna, Italy, 1968
[KFA1] R.E. Kalman, P.L. Falb, and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, 1969
[KHN1] R.E. Kalman, Y.C. Ho, and K. Narendra, Controllability of Linear Dynamical Systems, Contributions to Differential Equations, Vol 1, No 2, pp 189-213, 1963
[N1] J.W. Nieuwenhuis, More About Continuity of Dynamical Systems, Systems & Control Letters, Vol 14, No 1, pp 25-30, 1990
[NW1] J.W. Nieuwenhuis and J.C. Willems, Continuity of Dynamical Systems: A System Theoretic Approach, Mathematics of Control, Signals, and Systems, Vol 1, No 2, pp 147-165, 1988
[W1] J.C. Willems, From Time Series to Linear System, Part I. Finite Dimensional Linear Time Invariant Systems, Part II. Exact Modelling, Part III. Approximate Modelling, Automatica, Vol 22, No 5, pp 561-580, 1986; Vol 22, No 6, pp 675-694, 1986; Vol 23, No 1, pp 87-115, 1987
[W2] J.C. Willems, Models for Dynamics, Dynamics Reported, Vol 2, pp 171-269, 1989
[W3] J.C. Willems, Paradigms and Puzzles in the Theory of Dynamical Systems, IEEE Transactions on Automatic Control, Vol AC-36, No 3, pp 259-294, March 1991
[WN1] J.C. Willems and J.W. Nieuwenhuis, Continuity of Latent Variable Models, IEEE Transactions on Automatic Control, Vol AC-36, No 6, June 1991
Chapter 2
Kalman Filtering

1 Introduction

Undoubtedly, one of the major contributions of R.E. Kalman has been the Kalman filter, [1, 2], the magnitude of the contribution being specifically recognized in the award of the Kyoto Prize.
In this contribution, we shall try to put the Kalman filter in historical perspective, by recalling the state of filtering theory before its appearance, as well as some of the major developments which it spawned. It is impossible to be comprehensive in the allotted space, especially so in making a selection from major developments.
t, i.e. as t evolves; optimal means that the estimate, call it ŷ(t), should offer minimum mean square error, i.e. E{[y(t) − ŷ(t)]²} should be minimized. In case y(·) and v(·) are jointly gaussian, this means that ŷ(t) is a conditional mean estimate, viz. E[y(t)|z(s), s < t].
The solution to this problem is depicted in Fig. 2.2. The block labelled "Wiener filter" is a linear, time-invariant, causal, stable system, describable by an impulse response h(·) or transfer function, thus

ŷ(t) = ∫_{−∞}^{t} h(t − s)z(s)ds  (2.1)
The signal y(·) and noise v(·) are often represented as the outputs of stable linear systems excited by white noise, see Fig. 2.3. If ε_y(·), ε_v(·) denote zero mean, unit variance, white noise, i.e.

E[ε_y(t)ε_y(s)] = E[ε_v(t)ε_v(s)] = δ(t − s)  (2.2)

then

φ_yy(jω) = |W_y(jω)|²,  φ_vv(jω) = |W_v(jω)|²  (2.3)
[Fig. 2.2. The Wiener filter: a linear, time-invariant, causal system driven by z(·)]
It is clear that the key problem in Wiener filtering is to describe the procedure which leads from the pair φ_yy(jω), φ_vv(jω) (or W_y(jω) and W_v(jω)) to the impulse response h(t) or its transfer function H(jω). The most crucial step is that of spectral factorization. The spectrum of z(·), at least when y(·) and v(·) are independent, is given by

φ_zz(jω) = φ_yy(jω) + φ_vv(jω)  (2.4)

One seeks a factorization

φ_zz(jω) = W_z(jω)W_z(−jω)  (2.5)

in which W_z and its inverse are both stable; the causal Wiener filter is then

H(jω) = (1/W_z(jω))[φ_yy(jω)/W_z(−jω)]_+  (2.6)

where [·]_+ denotes the causal part. The case of vector z(·) and matrix φ_zz(·) is much more complicated.
The following table sums up some important characteristics of the Wiener
filtering problem and its solution:
Table 1. Assumptions for and properties of the Wiener filter

Initial time t_0:         t_0 = −∞; stationarity required
Signal y(·):              (i) the spectrum φ_yy(·) is enough, but W_y(jω) is acceptable;
                          (ii) y(·) is stationary, W_y(jω) must be stable;
                          (iii) φ_yy(·) and W_y(·) are not necessarily rational
Measurement noise v(·):   (i) usually independent of y(·); (ii) stationary;
                          (iii) not necessarily white; (iv) φ_vv(·) is not necessarily rational
Wiener filter:            time invariant and stable, but not necessarily with
                          rational transfer function
Main calculation burden:  spectral factorization
Quantity estimated:       ŷ(t)
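For rational spectra, the factorization step reduces to root-finding. The sketch below is a minimal, purely illustrative discrete-time scalar example (the covariances c0, c1 are invented, and nothing here is from the text): it factors the spectrum φ(z) = c1·z + c0 + c1·z⁻¹ as σ²(1 + θz⁻¹)(1 + θz) with |θ| < 1, i.e. into a causal, causally invertible factor of the kind the Wiener theory requires.

```python
import math

def spectral_factor_ma1(c0, c1):
    """Factor phi(z) = c1*z + c0 + c1/z as sigma2*(1 + theta/z)*(1 + theta*z).

    Matching coefficients gives sigma2*(1 + theta**2) = c0 and
    sigma2*theta = c1, hence theta**2 - (c0/c1)*theta + 1 = 0.
    The root with |theta| < 1 gives the causal, minimum-phase factor.
    Requires c0 > 2*|c1| so that phi is a genuine (positive) spectrum.
    """
    ratio = c0 / c1
    theta = (ratio - math.copysign(math.sqrt(ratio * ratio - 4.0), ratio)) / 2.0
    sigma2 = c1 / theta
    return theta, sigma2

theta, sigma2 = spectral_factor_ma1(2.5, 1.0)
# The factor must reproduce the original covariances.
assert abs(sigma2 * (1 + theta ** 2) - 2.5) < 1e-12
assert abs(sigma2 * theta - 1.0) < 1e-12
```

The same coefficient-matching idea, carried out with polynomial or state-space machinery, is what the general (vector, rational) factorization algorithms automate.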
ẋ(t) = F(t)x(t) + G(t)w(t)  (3.1a)
z(t) = H'(t)x(t) + v(t)  (3.1b)

with

E{[w(t); v(t)][w'(s) v'(s)]} = [Q(t) S(t); S'(t) R(t)]δ(t − s)  (3.2)
with R(t) = R'(t) > 0 for all t. Very frequently, S(t) = 0, i.e. w(·) and v(·) are independent. We shall make this assumption. Then Q(t) = Q'(t) ≥ 0.
In the first instance, we shall assume a finite initial time t_0. Further, x(t_0) will be assumed to be a gaussian random variable, of mean x̄_0 and covariance P_0. Equation (3.1a) defines a Gauss-Markov model, since x(·) is a Markov process which is gaussian. [In fact, [x'(·) z'(·)]' is also gaussian and Markov.]
The estimation task is to use measurements of z(s) for s < t to estimate x(t); the estimate, call it x̂(t), is to be on line and to minimize E[||x(t) − x̂(t)||²]. (This means that x̂(t) is necessarily a conditional-mean estimate.) The now well-known solution is obtained as follows. Define P(t) = P'(t) ≥ 0 as the solution of

Ṗ(t) = F(t)P(t) + P(t)F'(t) − P(t)H(t)R⁻¹(t)H'(t)P(t) + G(t)Q(t)G'(t),  P(t_0) = P_0  (3.3)

and take

dx̂(t)/dt = F(t)x̂(t) + P(t)H(t)R⁻¹(t)[z(t) − H'(t)x̂(t)],  x̂(t_0) = x̄_0  (3.4)

(Note that Fig. 3.2 depicts a standard observer, in terms of structure, with a special gain P(t)H(t)R⁻¹(t), sometimes referred to as the Kalman gain.) Further, there holds

E{[x(t) − x̂(t)][x(t) − x̂(t)]'} = P(t)  (3.5)
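In the scalar case the content of (3.3) is easy to check numerically. The sketch below is purely illustrative (all scalar parameter values are invented for the example): it Euler-integrates the scalar Riccati equation and shows P(t) settling at the positive root of the corresponding algebraic equation, regardless of the initial condition.

```python
import math

def riccati_variance(f, g, h, q, r, P0, T=10.0, dt=1e-3):
    """Euler-integrate the scalar form of the Riccati equation (3.3):
        dP/dt = 2*f*P - (h**2/r)*P**2 + g**2*q,   P(t0) = P0.
    """
    P = P0
    for _ in range(int(T / dt)):
        P += dt * (2.0 * f * P - (h * h / r) * P * P + g * g * q)
    return P

# With f = -1 and g = h = q = r = 1 the steady state solves
# P**2 + 2*P - 1 = 0, i.e. P = sqrt(2) - 1, for any initial P0 >= 0.
for P0 in (0.0, 5.0):
    P = riccati_variance(-1.0, 1.0, 1.0, 1.0, 1.0, P0)
    assert abs(P - (math.sqrt(2) - 1.0)) < 1e-3
```

The insensitivity to P0 observed here is exactly the kind of behavior formalized later in the asymptotic results.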
Table 2. Comparison of the Wiener and Kalman filters

Wiener                                    Kalman
t_0 = −∞                                  t_0 ≥ −∞
Stationarity                              Nonstationarity acceptable
Infinite dimensional OK                   Finite dimensional
Measurement noise not necessarily white   Measurement noise white
Spectral factorization                    Riccati equation solution
Signal estimation                         State estimation

Measurement noise v(·). In both the Wiener and Kalman formulations, the noise is usually independent of y(·). In the Wiener formulation it is stationary, but this is not required for the Kalman formulation. The major difference is that the noise is required to be white in the Kalman formulation; this is not required in the Wiener formulation (though it turns out that whiteness does carry with it significant simplifications).
Filter. The Kalman filter is in general time varying, and stability is not guaranteed (and of course, over a finite interval it is of limited relevance). It is finite dimensional. The Wiener filter may not be, since its transfer function is not necessarily rational.
Main calculation burden. Spectral factorization and Riccati matrix solution, the two key computational tasks, could not appear more dissimilar.
Quantity estimated. The Wiener filter estimates y(t), the Kalman filter x(t). In the Kalman filter formulation, y(t) = H'(t)x(t), and an estimate of y(t) follows as

ŷ(t) = H'(t)x̂(t)  (3.6)

One can also estimate x(t + Δ), Δ > 0, from z(s), s < t, via

E[x(t + Δ)|z(s), s < t] = φ(t + Δ, t)x̂(t)  (3.7)

where φ(·,·) is the transition matrix of F(·). The smoothing problem (Δ is negative instead of positive) took some time to resolve; it is discussed later.
P̄ = lim_{t→∞} P(t)  (4.1)

exists, and P̄ satisfies the algebraic (steady-state) Riccati equation

FP̄ + P̄F' − P̄HR⁻¹H'P̄ + GQG' = 0  (4.2)
The boundedness of P(t) is intuitively reasonable: under observability, it is not surprising that the error covariance in estimating x(t), viz. P(t), should remain bounded. There are a number of other consequences of this result:
(i) the Kalman filter (3.4) is asymptotically time invariant.
(ii) if t_0 → −∞, rather than t → ∞, the fact that the right side of the differential equation (3.3) has no explicit time dependence yields

P̄ = lim_{t_0→−∞} P(t)

for all t.
(iii) if t_0 → −∞, the signal model (3.1) with constant F, G may produce unbounded E[x(t)x'(t)] unless Re λ_i(F) < 0, i.e. unless it is stable. And if t_0 remains finite and t → ∞, the same is true.
Result 1 says nothing about stability of the Kalman filter, nor of the dependence of P̄ on P_0. The circle is closed with Result 2. Again, we suppose that F, G, H, Q and R are time invariant.
Result 2. Suppose [F, H] is observable and [F, GQ^{1/2}] is controllable. Then P̄ as defined in Result 1 is independent of P_0, and

Re λ_i(F − P̄HR⁻¹H') < 0
Notice that this stability property is just what is required to ensure that the Kalman filter (3.4) is asymptotically stable.
Summarizing, if Re λ_i(F) < 0, with constant F, G, H, Q and R and with t_0 → −∞, the signals x, y = H'x, v and z are all stationary, and the Kalman filter is time invariant and asymptotically stable provided observability and controllability conditions are fulfilled. (Even if Re λ_i(F) < 0 fails, the latter statements concerning the Kalman filter remain true.)
The parallel with the Wiener filter becomes more apparent in this result.
Let us note that observability and controllability are a little stronger than
needed; in fact, it is not hard to relax these requirements to detectability and
stabilizability, see e.g. [5,6].
Even in the nonstationary case, it still makes sense to contemplate the
possibility of t -+ + 00, and to ask about the stability of the KaIman filter and
the forgetting of initial conditions. A resolution of these questions was really suggested by the observation of [1, 2] that the Kalman filter problem in many
ways is a dual of the linear-quadratic regulator problem, where infinite-time
behavior and stability are key issues, even for time-varying systems. The
fundamental paper [7] had dealt with these issues, and duality pointed the way
to the corresponding filtering results:
Result 1 (TV). Suppose [F(t), H(t)] is uniformly completely observable, and F(t), G(t), H(t), Q(t) and R(t) are bounded. Then P(t) is bounded for all t ∈ [t_0, ∞). Moreover, if t_0 → −∞, the limit lim_{t_0→−∞} P(t) exists for all t.
x_{k+1} = F_k x_k + G_k w_k  (5.1a)
z_k = H'_k x_k + v_k  (5.1b)

with

E{[w_k; v_k][w'_l v'_l]} = [Q_k 0; 0 R_k]δ_{kl}  (5.2)

and {w_k}, {v_k} are zero mean sequences. For convenience, let the initial time be k = 0. Then the data include the mean x̄_0 and variance P_0 of x_0, which is independent of {w_k}, {v_k}. All variables are gaussian.
The key idea is to distinguish the effect of dynamics and measurements in the filter. More precisely, let x̂_{k/k} be the optimal estimate, again a conditional mean estimate, of x_k given z_l, l ≤ k, and let x̂_{k+1/k} be E[x_{k+1}|z_l, l ≤ k], the one-step prediction estimate. Since w_k is independent of z_l for l ≤ k, (5.1a) yields

x̂_{k+1/k} = F_k x̂_{k/k}  (5.3)

This shows how to update an estimate as a result of the system dynamics, when no extra measurements appear. Along with (5.3), there holds

Σ_{k+1/k} = F_k Σ_{k/k} F'_k + G_k Q_k G'_k  (5.4)

Here Σ_{k/k} and Σ_{k+1/k} are the error covariances associated with x̂_{k/k} and x̂_{k+1/k}.
The measurement update equations indicate how to pass from x̂_{k+1/k} and Σ_{k+1/k} to x̂_{k+1/k+1} and Σ_{k+1/k+1}. They are

x̂_{k+1/k+1} = x̂_{k+1/k} + Σ_{k+1/k}H_{k+1}[H'_{k+1}Σ_{k+1/k}H_{k+1} + R_{k+1}]⁻¹[z_{k+1} − H'_{k+1}x̂_{k+1/k}]  (5.5a)

Σ_{k+1/k+1} = Σ_{k+1/k} − Σ_{k+1/k}H_{k+1}[H'_{k+1}Σ_{k+1/k}H_{k+1} + R_{k+1}]⁻¹H'_{k+1}Σ_{k+1/k}  (5.5b)
Observe that F_k, G_k and Q_k enter only in the time or dynamical update equation, while H_k and R_k enter only in the measurement update equation. This separate accounting for dynamics and measurements, necessarily blurred in the continuous-time filter equations, is explicit in the discrete-time equations. Some of these ideas are also to be found in [9].
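The separate time and measurement updates (5.3)-(5.5) are only a few lines each in the scalar case. The following sketch is an illustration with invented scalar parameters, not code from the text; iterating the covariance recursion (which does not depend on the data) shows the predicted variance converging, for F = G = H = Q = R = 1, to the golden ratio (1 + √5)/2.

```python
import math

def time_update(x, S, F, G, Q):
    """Dynamical update (5.3)-(5.4): only F, G, Q enter."""
    return F * x, F * S * F + G * Q * G

def measurement_update(x, S, z, H, R):
    """Measurement update (5.5): only H, R enter."""
    gain = S * H / (H * S * H + R)
    return x + gain * (z - H * x), S - gain * H * S

# Covariance iteration alone (the data z never affect it).
S = 10.0  # initial variance P0
for _ in range(200):
    _, S_pred = time_update(0.0, S, F=1.0, G=1.0, Q=1.0)
    _, S = measurement_update(0.0, S_pred, z=0.0, H=1.0, R=1.0)

# The steady-state predicted variance M = S + 1 solves M**2 - M - 1 = 0.
M = S + 1.0
assert abs(M - (1 + math.sqrt(5)) / 2) < 1e-9
```

Note that the blending of dynamics and measurements happens only when the two functions are composed, mirroring the separate accounting pointed out above.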
[I + H'(sI − F)⁻¹K]R[I + K'(−sI − F')⁻¹H] = R + H'(sI − F)⁻¹GQG'(−sI − F')⁻¹H  (6.1)

where

K = P̄HR⁻¹  (6.2)

The quantity on the right side of (6.1) is the spectrum of the measurement process z(·). The left side defines a spectral factorization. Notice that

[I + H'(sI − F)⁻¹K]⁻¹ = I − H'(sI − F + KH')⁻¹K

and equating coefficients on the two sides of (6.1) shows that, given K,

FP̄ + P̄F' − KRK' + GQG' = 0  (6.3)

and this shows that P̄ can be defined as the solution of a linear Lyapunov matrix equation. So the two apparently distinct filter calculations, spectral factorization and (steady state) Riccati equation solution, are effectively equivalent in this case.
A related result concerns the so-called innovations process. Consider Fig. 3.2, and suppose that P(t) ≡ P̄, with F, H and R all time-invariant. Then the transfer function matrix from z to ν = z − H'x̂ can be computed to be

I − H'(sI − F + P̄HR⁻¹H')⁻¹P̄HR⁻¹ = I − H'(sI − F + KH')⁻¹K = [I + H'(sI − F)⁻¹K]⁻¹  (6.4)

It follows that the spectral matrix of ν, which is termed the innovations process, is

Φ_νν(jω) = [I + H'(jωI − F)⁻¹K]⁻¹Φ_zz(jω)[I + K'(−jωI − F')⁻¹H]⁻¹ = R

i.e. ν(·) is a white noise process. This remarkable result continues to hold even
in the nonstationary case, see [12] for a further discussion, though the proof
is obviously very different. The observation has proved useful in developing
various extensions of the basic theory, motivating conjectures, etc. It suggests
that the time-varying Riccati equation is achieving a form of time-varying
spectral factorization. (Indeed, this is true, see [13].) It also makes nice contact with the highly motivated derivation of the Wiener filter of [4].
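The whiteness of the innovations is easy to observe in simulation. The following sketch is purely illustrative (a scalar model with invented parameters): it runs the steady-state scalar filter on synthetic data and checks that the lag-one sample correlation of ν_k = z_k − H'x̂_k is negligible.

```python
import math
import random

random.seed(0)

# Illustrative scalar model: x_{k+1} = x_k + w_k, z_k = x_k + v_k, Q = R = 1.
# Steady-state predicted variance M solves M**2 - M - 1 = 0; gain K = M/(M+1).
M = (1 + math.sqrt(5)) / 2
K = M / (M + 1)

x = 0.0      # true state
xhat = 0.0   # one-step predicted estimate
innov = []
for _ in range(20000):
    x += random.gauss(0.0, 1.0)      # state transition (F = G = 1)
    z = x + random.gauss(0.0, 1.0)   # noisy measurement (H = 1)
    nu = z - xhat                    # innovation
    innov.append(nu)
    xhat = xhat + K * nu             # steady-state predictor update

n = len(innov)
mean = sum(innov) / n
var = sum((e - mean) ** 2 for e in innov) / n
lag1 = sum((innov[i] - mean) * (innov[i + 1] - mean) for i in range(n - 1)) / (n * var)
assert abs(lag1) < 0.05  # sample innovations are essentially uncorrelated
```

The sample innovation variance also comes out close to M + R, as the steady-state theory predicts.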
7 Development 2: Smoothing
Filtering is concerned with obtaining E[x(t)|z(s), s ∈ [t_0, t)] on line. Smoothing is concerned with obtaining one or more of the following:

E[x(t_0)|z(s), s ∈ [t_0, t)]  (fixed t_0, varying t)
E[x(t − Δ)|z(s), s ∈ [t_0, t)]  (fixed t_0, Δ, and varying t)
E[x(t)|z(s), s ∈ [t_0, T]]  (fixed t_0, T, varying t)

These tasks are known as fixed point, fixed lag and fixed interval smoothing. We shall now outline how these problems can be solved.
ẋ(t) = F(t)x(t) + G(t)w(t)  (7.1a)
ẋ_a(t) = 0,  x_a(t_0) = x(t_0)  (7.1b)

[Fig. 7.1. The augmented signal model, with measurement noise v(·) and measurements z(·)]

with

E[x(t_0); x_a(t_0)] = [x̄_0; x̄_0],  cov[x(t_0); x_a(t_0)] = [P_0 P_0; P_0 P_0]  (7.2)
The set-up is a standard one in terms of the Kalman filter. The best estimate of x_a, viz. x̂_a(t), is E[x_a(t)|z(s), s < t]. However, x_a(·) is so constructed that x_a(t) = x_a(t_0) = x(t_0) for all t. So x̂_a(t) = E[x(t_0)|z(s), s < t].
The Riccati equation for the augmented system decomposes into the Riccati equation for the original model plus linear equations, and the construction of the smoothed estimate is not at all hard.
This approach was suggested by [14, 15]. It could not really have come out of a theory requiring stationarity.
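In discrete time the augmentation trick is a few lines: adjoin a constant state a_k ≡ x_0 and run the ordinary filter on the pair. The sketch below is an illustrative scalar model (all parameters invented, and the 2 × 2 covariance stored as the triple (S11, S12, S22)); it propagates only the covariance recursion and shows the error variance of the smoothed estimate of x_0 decreasing as more data arrive.

```python
def augmented_cov_step(S, f, g, q, r):
    """One covariance step for the augmented state (x_k, a_k), a_{k+1} = a_k.

    S = (S11, S12, S22) is the symmetric 2x2 covariance; the measurement is
    z_k = x_k + v_k, i.e. h = [1, 0], with noise variance r.
    """
    S11, S12, S22 = S
    # time update: x_{k+1} = f*x_k + g*w_k, a_{k+1} = a_k
    S11, S12 = f * f * S11 + g * g * q, f * S12
    # measurement update with h = [1, 0]; innovation variance s = S11 + r
    s = S11 + r
    return (S11 - S11 * S11 / s,
            S12 - S11 * S12 / s,
            S22 - S12 * S12 / s)

P0 = 1.0
S = (P0, P0, P0)  # a_0 = x_0, so the initial joint covariance is singular
history = [S[2]]
for _ in range(50):
    S = augmented_cov_step(S, f=0.9, g=1.0, q=0.5, r=1.0)
    history.append(S[2])

# The variance of the smoothed estimate of x_0 never increases,
# and strictly improves on the prior variance P0.
assert all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
assert history[-1] < P0
```

The S11 recursion alone is exactly the ordinary scalar Riccati iteration; the S12 and S22 recursions are the "linear equations" the text refers to.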
Fixed lag smoothing. This problem was examined in the Wiener filtering context, see, e.g. [3]. The optimum smoother is normally infinite dimensional (unless Δ = ∞), and this does not augur well for a Kalman filtering approach. However, by switching to discrete time we avoid this problem. Consider the standard discrete-time signal model, augmented as depicted in Fig. 7.2.
[Fig. 7.2. The discrete-time signal model augmented by a chain of N delays, so that x_k^{(j)} = x_{k−j} for j = 1, …, N]

The filtered estimate of the augmented state includes

x̂^{(j)}_{k/k} = E[x_{k−j}|z_l, l ≤ k]  (7.3)
This is nothing but a fixed-lag estimate with lag j. Again, it is easy to compute the filter for the augmented system, and therefore the fixed-lag smoother. It turns out that if not all lagged estimates between 1 and N are required, considerable simplification is possible, see [5]. This approach originally appeared in [16]. Approaches to the continuous-time problem can be found in [17].
Fixed-lag estimates will always offer a lower error covariance than filtered estimates of the same quantity, since more data is being used to generate the estimate. When the Kalman filter is exponentially stable, it turns out that all the improvement can in practice be obtained by taking the lag equal to 4 to 5 times the dominant time constant of the Kalman filter.
Fixed interval smoothing. One way to solve the fixed interval smoothing problem becomes available when the Kalman filter is exponentially stable. Let Δ correspond to 4 to 5 times the dominant time constant of the Kalman filter. Then

E[x(t)|z(s), s ∈ [t_0, T]] ≈ E[x(t)|z(s), s ∈ [t_0, t + Δ)]  (7.4)
8 Miscellaneous Developments
Nonlinear Kalman filter. Following the success of the (linear) Kalman filter, it became natural to try to extend the ideas to nonlinear systems. In one thrust, see e.g. [21-22], the aim was to provide equations for the evolution of the conditional probability density.
9 Conclusions
Though the preceding sections have solely discussed theoretical issues, we should note the great practical importance of the Kalman filter. Applications in tracking and guidance abound, and as noted in the preceding section, the Kalman filter is a major constituent of many controller designs. In truth, it represents one of the major post-war advances of engineering science.
References
[1] R.E. Kalman, "A new approach to linear filtering and prediction problems", J Basic Eng, Trans ASME, Series D, Vol 82, March 1960, pp 35-45
[2] R.E. Kalman and R.S. Bucy, "New results in linear filtering and prediction theory", J Basic Eng, Trans ASME, Series D, Vol 83, March 1961, pp 95-108
[3] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, MIT Press, Cambridge, Mass, 1949
[4] H.W. Bode and C.E. Shannon, "A simplified derivation of linear least square smoothing and prediction theory", Proc IRE, Vol 38, April 1950, pp 417-425
[5] B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice Hall, Inc, Englewood Cliffs, NJ, 1979
[6] V. Kucera, "The discrete Riccati equation of optimal control", Kybernetika, Vol 8, 1972, pp 430-447
[7] R.E. Kalman, "Contributions to the theory of optimal control", Bol Soc Matem Mex, 1960, pp 102-119
[8] B.D.O. Anderson and J.B. Moore, "Detectability and stabilizability of discrete-time linear systems", SIAM J on Control & Optimization, Vol 19, 1981, pp 20-32
[9] P. Swerling, "A proposed stagewise differential correction procedure for satellite tracking and prediction", J Astronaut Sci, Vol 6, 1959, pp 46-59
[10] P. Faurre, M. Clerget, and F. Germain, Operateurs rationnels positifs, Dunod, Paris, 1979
[11] B.D.O. Anderson and S. Vongpanitlerd, Network Analysis and Synthesis, Prentice Hall, Inc, Englewood Cliffs, NJ, 1973
[12] T. Kailath, "A view of three decades of linear filtering theory", IEEE Trans Info Theory, Vol IT-20, March 1974, pp 146-181
[13] B.D.O. Anderson, J.B. Moore, and S.G. Loo, "Spectral factorization of time-varying covariance functions", IEEE Trans Info Theory, Vol IT-15, September 1969, pp 550-557
[14] L.E. Zachrisson, "On optimal smoothing of continuous-time Kalman processes", Information Sciences, Vol 1, 1969, pp 143-172
[15] W.W. Willman, "On the linear smoothing problem", IEEE Trans Auto Control, Vol AC-14, February 1969, pp 116-117
[16] J.B. Moore, "Discrete-time fixed-lag smoothing algorithms", Automatica, Vol 9, March 1973, pp 163-174
[17] S. Chirarattanon and B.D.O. Anderson, "Stable fixed-lag smoothing of continuous-time processes", IEEE Trans Info Theory, Vol IT-20, January 1974, pp 25-36
[18] D.C. Fraser and J.E. Potter, "The optimum linear smoother as a combination of two optimum linear filters", IEEE Trans Auto Control, Vol AC-14, August 1969, pp 387-390
[19] J.E. Wall, A.S. Willsky, and N.R. Sandell, "The fixed-interval smoother for continuous-time processes", Proc 19th IEEE Conference on Decision and Control, 1980, pp 385-389
[20] H.E. Rauch, "Solutions to the linear smoothing problem", IEEE Trans Auto Control, Vol AC-8, October 1963, pp 371-372
[21] E. Wong, Stochastic Processes in Information and Dynamical Systems, McGraw-Hill Book Co., New York, 1971
[22] R.J. Elliott, Stochastic Calculus and Applications, Springer Verlag, New York, 1982
[23] D.L. Snyder, The State-Variable Approach to Continuous Estimation, MIT Press, Cambridge, Mass., 1969
[24] B.D.O. Anderson and J.B. Moore, Optimal Control: Linear-Quadratic Methods, Prentice Hall, Englewood Cliffs, NJ, 1989
This paper is an account of the development of some of several researches inspired by Kalman's
seminal work on linear least-squares estimation for processes with known state-space models.
1 Introduction
I first met Rudy Kalman in October 1960 at a conference in Santa Monica,
California organized by Richard Bellman. Rudy spoke about the theory of
optimal control and the calculus of variations [1], while my paper was on
Gaussian signal detection problems in which the likelihood ratios were expressed
in terms of "smoothed" (noncausal) least-squares estimates [2]. I assume Rudy
told me then about his paper on discrete-time state-space estimation [3] and
the continuous time paper with R. Bucy [4]. However I knew nothing about
state equations and was not particularly interested in recursive estimates;
moreover the papers [3]-[4] stated the determination of smoothed estimates
as being more complicated and still unsolved. So while Rudy and I met again at MIT and in fact explored my spending the summer of 1961 with him at RIAS (which did not happen because I celebrated my graduation after four years at MIT by a visit home to India), it was not until the mid-sixties that I
began to study his papers. The motivation was that in certain feedback
communication schemes (see, e.g. [5], [6]) and in a new Gaussian signal
detection formula of Schweppe [7] based on causal least-square estimates,
recursive estimates were important. So with the help of some graduate students
(J. Omura, B. Gopinath and P. Frost) I began the study of state-space systems
and Kalman filters. That was indeed a fortunate occurrence, because in one way or the other for the last quarter century a significant part of my research has been influenced by Kalman's work on system theory. It is therefore a special
This work was supported in part by the U.S. Army Research Office under Contract DAAL03-89K-0109 and the Air Force Office of Scientific Research, Air Force Systems Command under Contract
AF88-0327. This manuscript is submitted for publication with the understanding that the US
Government is authorized to reproduce and distribute reprints for Government purposes
notwithstanding any copyright notation thereon.
T. Kailath
K_zy(t) = ∫_0^∞ h(τ)K_yy(t − τ)dτ,  t ≥ 0  (1)

The solution h(·) can be shown to determine the linear least-squares estimate, say ẑ(·), of the values of a zero-mean stationary stochastic process z(·) from noisy measurements:

ẑ(t) = ∫_0^∞ h(τ)y(t − τ)dτ  (2)

where

y(τ) = z(τ) + v(τ),  −∞ < τ < t  (3a)
Ev(t)z(s) ≡ 0  (3b)
Ev(t)v(s) = δ(t − s)  (3c)
1. For as the preacher said in Ecclesiastes, "The race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all."
ẋ(t) = F(t)x(t) + G(t)u(t),  z(t) = H(t)x(t),  t ≥ 0  (5a)

where F(·), G(·), H(·) are known n × n, n × m and p × n matrices, while x(0), u(·) and v(·) are zero-mean random quantities with

Eu(t)x*(0) ≡ 0,  Ev(t)x*(0) ≡ 0,  Ex(0)x*(0) = Π_0  (5b)
Eu(t)u*(s) = Q(t)δ(t − s),  Ev(t)v*(s) = Iδ(t − s),  Eu(t)v*(s) ≡ 0  (5c)

The star denotes (Hermitian) transpose, and the matrices Π_0 and Q(·) are also assumed to be known. We also remark that for vectors, we shall take a ⊥ b to mean E[ab*] = 0. [It is perhaps not as well known as it should be that inner products can be allowed to be matrix-valued.] Now if we define

K_xy(t, s) = E[x(t)y*(s)] = E[x(t)x*(s)]H*(s)  (6)

and

x̂(t) = ∫_0^t h_xy(t, s)y(s)ds  (7)
E[(x(t) − x̂(t))y*(s)] = 0,  0 ≤ s < t ≤ T  (8)
What Kalman and Bucy did in their celebrated paper [4] was to note that the state-space assumptions (5) implied, after some calculation, that

∂h_xy(t, s)/∂t = [F(t) − h_xy(t, t)H(t)]h_xy(t, s),  s < t  (9)

so that differentiation of (7) yields

d/dt x̂(t) = ∫_0^t [∂h_xy(t, s)/∂t]y(s)ds + h_xy(t, t)y(t)
          = F(t)x̂(t) + h_xy(t, t)[y(t) − H(t)x̂(t)],  x̂(0) = 0  (10)
Moreover, the error covariance obeys

P(t) ≜ E{[x(t) − x̂(t)][x(t) − x̂(t)]*} = E[x(t) − x̂(t)]x*(t)  (11)
     = K_xx(t, t) − ∫_0^t h_xy(t, s)K_yx(s, t)ds  (12)

where the last equality follows from the defining integral equation (8). Therefore we have

h_xy(t, t) = P(t)H*(t)  (13)
It is customary to redefine h_xy(t, t) as h_xy(t, t) = K(t), so that

dx̂(t)/dt = F(t)x̂(t) + K(t)[y(t) − H(t)x̂(t)],  x̂(0) = 0  (14)

Finally some simple calculations, e.g. by working with the differential equation for the state error x̃(t) = x(t) − x̂(t), show that P(·) obeys a nonlinear matrix differential equation of Riccati type

Ṗ(t) = F(t)P(t) + P(t)F*(t) − P(t)H*(t)H(t)P(t) + G(t)Q(t)G*(t),  P(0) = Π_0  (15)
The equations (14), (15) are the celebrated Kalman-Bucy filtering equations,
which by now have probably launched more reports and papers than Helen's
face ever did ships.
Brief Historical Comment
The paper [4] of Kalman and Bucy explains the historical background of their work, based on reports by Follin and Carlton (1956), Hanson (1957) and Bucy (1959), and Kalman's independent discrete-time solution [3]. The other authors were motivated by attempts to obtain a practical solution to a nonstationary tracking problem by starting with an estimating filter in state-space form and then choosing the coefficients to minimize the mean-square error. In conversation with me, and as evidenced by his paper [3], Kalman stated that
he was not directly influenced by any practical application, but was motivated
by the wish to use in stochastic problems the state-space descriptions that he
had been (a pioneer in) advocating for studies of deterministic linear (control) systems. If I may say so, here is another example, if more are needed, of the
value of pursuing research based on challenge and vision, rather than immediate
relevance. However, in this case, practical applications were very much at hand,
with the launching of Sputnik and the space age in 1957. Therefore the work
of Kalman and Bucy was immediately picked up by engineers at NASA (e.g.
S. Schmidt, G. Smith), the Draper Laboratories of MIT (R. Battin, J. Potter)
and elsewhere. Moreover it should not be surprising that closely related results
were being obtained by others at about the same time. One of the most relevant
was a paper by P. Swerling (1959) on a stage-wise smoothing procedure for
satellite observations. Anyone familiar with the famous stories of Gauss'
calculations ofthe orbits of asteroids might expect that his name should resurface
in this subject and indeed it does; H.H. Rosenbrock wrote a note on this in
1965 and Y. Genin (1970) wrote a careful exposition of the connection. Finally,
very related results in continuous-time were obtained by the Soviet physicist,
R.L. Stratonovich [11], whose goal was to show how going to the study of
Markov processes rather than Gaussian processes enabled (communications)
engineers to go beyond linear detection and estimation problems to nonlinear
and recursive solutions. The connection of their models to Gauss-Markov
processes was not explicitly mentioned by Kalman and Bucy in [3] and [4], though they could not have been unaware of it. On the other hand,
Stratonovich's approach was to explicitly seek to generalize the Fokker-Planck
equation for Markov density functions to obtain nonlinear filtering equations
for the conditional probability density (anticipating similar but later studies by
Kushner). Then in studying what he calls the "Gaussian approximation," Stratonovich obtains (in a different notation, of course) the basic (conditional mean and variance) equations (10), (14), (15) of the Kalman filter. For ease of reference, rather than give separate citations, I might mention that the above-cited papers of Kolmogorov (1939) (1941), Carlton and Follin (1956), Swerling (1959), and Genin (1970), as well as Kalman [3], Zadeh and Ragazzini [10],
and Stratonovich [11] have been reprinted, along with some others and with
some commentary, in the collection [12]. To avoid misunderstanding, it should
perhaps also be noted that this discussion is inserted not to detract from the
contributions of Kalman and Bucy, but to show the richness and depth of the
problem to which they made such an elegant and influential contribution;
moreover they were able to add important results based on the controllability
and observability concepts recently formalized (and in the case of observability first introduced) by Kalman, such as the theorem on stability of the filter (see
the discussion at the end of Sect. 4) and the connection via duality with the
linear regulator problem and the Hamiltonian equations (see (28) below).
in a short note of Bucy [15], but came up with a slightly different formula than
in [14]. When he showed this to me, my discussions with Zakai about stochastic
integrals and ordinary integrals came to mind, and I saw that the transformation
of Duncan's version from Itô to ordinary (now called Stratonovich) integrals
reconciled the two. Later, this confusion between integrals also helped me to
clarify some of Schweppe's likelihood ratio formulas-see [16].
In any case, to return to the linear estimation problem, examination of
the Kalman-Bucy equations led me to the conclusion that the process
y(t) − H(t)x̂(t) was a white-noise process. It is hard to recall now the exact train
of thought, but at the time there was considerable confusion about this fact.
Thus my local experts (Frost and Clark) at first argued that this could not be true since

y(t) − H(t)x̂(t) = H(t)x̃(t) + v(t)  (16)

and the error x̃(t) had nondiagonal covariance P(·) and was correlated with the noise v(·). Then as I recall we saw a preprint of a note by L.D. Collins [17],
showing by a detailed calculation using the filter equations (10), (14), (15) that
the process was indeed white.² At about the same time, related proofs were given
by Wonham (1967), Kushner (1967) and Anderson and Moore (1967); detailed
references can be found in [18]. However this was not what I needed. Rather
than using the Kalman-Bucy equations to show this property, I wanted to go the other way: if one could show that the process y(·) − H(·)x̂(·) was white, and
causally equivalent to the process y(.), then the filtering equations could be
readily obtained by the two-step Bode-Shannon procedure: whitening, followed
by the simple problem of estimation given white noise. In fact, after some effort,
the following result was obtained. Consider a process

y(t) = z(t) + v(t),  0 ≤ t ≤ T  (17a)

with

Ev(t)v*(s) = Iδ(t − s),  Ev(t)z*(s) ≡ 0,  Ez(t)z*(s) = K(t, s)  (17b)

Let

∫_0^T E|z(t)|²dt < ∞  (18)

and let ẑ(t) denote the causal linear least-squares estimate

ẑ(t) = ∫_0^t h(t, s)y(s)ds,  with h(·,·) square-integrable on 0 ≤ t, s ≤ T  (19)

Then the process
2. I noticed recently that while in discrete time Kalman in [3, p. 42] states, "the signal after the first summer is white noise since ỹ(t|t − 1) is obviously [because it was obtained by a Gram-Schmidt orthogonalization] an orthogonal random process," no similar remark is made in the continuous-time paper [4], presumably because detailed calculations such as those of Collins [17] and others needed to be made.
ν(t) = y(t) − ẑ(t)  (20)

is white with the same covariance as v(·), i.e. Eν(t)ν*(s) = Iδ(t − s). No state-space model is needed, but in fact if there is one then an easy calculation leads to the Kalman-Bucy equations (14)-(15) (see, e.g. [18], Sect. 6.5). (We may note that a one-sided dependence between v(·) and z(·) is also permitted: we can allow z(·) to be correlated with past v(·), as may happen in feedback communication and control problems.)
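A finite-dimensional analogue of this causal whitening is Gram-Schmidt orthogonalization of the observations, i.e. a lower-triangular Cholesky factorization of their covariance (compare Kalman's remark quoted in footnote 2). The sketch below is an invented 3 × 3 numerical example, not anything from the text: it computes the lower-triangular L with LL' = K_yy, so that ν = L⁻¹y is a causal functional of y with identity covariance.

```python
def cholesky3(K):
    """Lower-triangular L with L @ L^T = K (plain Gram-Schmidt, n x n)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (K[i][i] - s) ** 0.5
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

# Invented covariance of three successive observations y_1, y_2, y_3.
K = [[2.0, 1.0, 0.5],
     [1.0, 2.0, 1.0],
     [0.5, 1.0, 2.0]]
L = cholesky3(K)

# L is causal (lower triangular), and L L^T reproduces K, so the
# innovations nu = L^{-1} y have covariance L^{-1} K L^{-T} = I.
for i in range(3):
    for j in range(3):
        recon = sum(L[i][k] * L[j][k] for k in range(3))
        assert abs(recon - K[i][j]) < 1e-12
assert all(L[i][j] == 0.0 for i in range(3) for j in range(3) if j > i)
```

Because L and L⁻¹ are both lower triangular, y can be causally recovered from ν and vice versa, which is the discrete counterpart of the causal equivalence used above.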
The point is that to estimate the states from the white noise process is easy: if

x̂(t) = ∫_0^t g(t, s)ν(s)ds  (21a)

then, for 0 ≤ r < t, orthogonality yields

Ex(t)ν*(r) = ∫_0^t g(t, s)Eν(s)ν*(r)ds = g(t, r)

Therefore

x̂(t) = ∫_0^t [Ex(t)ν*(r)]ν(r)dr  (21b)
This seems circular, but some thought shows that (21) states that x̂(t) is a function of earlier values of x̂(·). This suggests that a differential equation may be around, and in fact differentiation of (21) yields

dx̂(t)/dt = ∫_0^t [∂Ex(t)ν*(r)/∂t]ν(r)dr + [Ex(t)ν*(t)]ν(t)

Recalling that x(·) obeys the differential equation (5) and noting that u(t) ⊥ ν(r), r < t, leads after a simple calculation to the Kalman-Bucy differential equation (14). The Riccati equation can now be obtained fairly directly, in several ways; perhaps the simplest is indicated in [19, footnote 7]. Some people might be concerned about the lack of rigor in the above, especially in differentiating (21b). However there is no less rigor used here than in starting with the white-noise driven state equation (5) in the first place. A rigorous derivation using integrated versions throughout (and the Itô differential rule) is easy to pattern on the above outline, and was done in [18a]. It should be noted that the rigorous formulation does make some things simpler, e.g. handling the previously mentioned one-sided correlation between v(·) and z(·).
Of course to justify the use of (21a), one has to show that the original process y(·) can be causally recovered from the process ν(·). Here again my communi-
ν = (I − h)y  (22a)

so that

y = (I + k)ν  (23a)

where it is easy to check that k = h + h² + ··· is also Volterra and square-integrable. Then the covariance of y can be written down, in operator form, as

Eyy* = I + K = (I + k)Eνν*(I + k*) = (I + k)(I + k*),  (23b)

say. Write

(I + K)⁻¹ = I − H,  (23c)

where H(t, s) is square-integrable on [0, T] × [0, T], and obtain the identity

(I − h*)(I − h) = I − H  (23d)

or

H = h + h* − h*h.  (23e)
I noted that H in fact solved the smoothing problem, so that we now could see that the solution of the (noncausal) smoothing problem was completely determined by knowledge of the (causal) filtering solution. This was pleasing, because Kalman [3] and Kalman and Bucy [4] had left smoothing problems aside as not yielding immediately to their approaches. Smoothing problems are discussed further in Sect. 4.
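In a discrete, finite-interval setting the identities (22)-(23) can be checked directly with matrices. The following sketch is an invented 3 × 3 numerical example: a strictly lower-triangular ("causal Volterra") h makes the series k = h + h² + ··· terminate, and one verifies that I − (h + h' − h'h) is exactly the inverse of I + K = (I + k)(I + k)'.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def madd(A, B, sign=1.0):
    # elementwise A + sign*B
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(r) for r in zip(*A)]

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Strictly lower-triangular causal kernel; entries invented for the example.
h = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.2, 0.3, 0.0]]

# k = h + h^2 (h^3 = 0 for 3x3), so that I + k = (I - h)^{-1}.
k = madd(h, matmul(h, h))

# Covariance of y in operator form: I + K = (I + k)(I + k)'.
IK = matmul(madd(I3, k), transpose(madd(I3, k)))

# Smoothing operator of (23e): H = h + h' - h'h.
H = madd(madd(h, transpose(h)), matmul(transpose(h), h), -1.0)

# Identity (23c)-(23d): (I + K)(I - H) = I.
check = matmul(IK, madd(I3, H, -1.0))
for i in range(3):
    for j in range(3):
        assert abs(check[i][j] - I3[i][j]) < 1e-12
```

The check works because (I − h')(I − h) = I − (h + h' − h'h) while (I + K) = (I − h)⁻¹(I − h')⁻¹, which is the matrix transcription of the operator argument in the text.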
The operator identities (22)-(23) have a lot of other implications: for example, the Kalman-Bucy formulas implicitly also specify the function h(t, s). [In fact, h(t, s) = H(t)Φ(t, s)K(s) = H(t)Φ(t, s)P(s)H*(s).] Therefore it must be true that Fredholm and Wiener-Hopf-type integral equations with kernels K(t, s) arising from state-space models can be solved via Riccati differential equations. This led to various interesting results that we do not have space to elaborate here;
dx̂/dt = Fx̂ + Kν,  x̂(0) = 0
y = Hx̂ + ν  (24)
then it is easy to see that the integral of the process ν(·) is a martingale process. Moreover the fact that ν(·) is white means that its integral has variance t. Therefore Levy's theorem applied, and we immediately had a surprising result: if v(·) is white and Gaussian and independent of past values of z(·), then whether z(·) is Gaussian or not, the process ν(·) = y(·) − ẑ(·), ẑ(·) being the (generally nonlinear) least-squares estimate of z(·) given past y(·), is itself a white Gaussian noise process with the same covariance as v(·).⁴
This was such a striking result that Paul Frost abandoned an almost
complete dissertation on nonlinear estimation (using the then traditional
Bayes-rule based approaches of Kushner and others) and started to develop an
innovations approach. The name came from the fact that we could write
ν(t) ≜ y(t) − ẑ(t) = y(t) − ẑ(t|t−)

Consider now the detection problem

H₁: y(t) = z(t) + v(t),   H₀: y(t) = v(t)

But this is in the form of a "(conditionally) known signal ẑ(·) in white Gaussian
noise" problem, which suggests that the likelihood ratio (Radon-Nikodym
derivative) can be written in the well-known (matched filter or correlation-detection)
form

dP₁/dP₀ = exp{ ∫₀ᵀ ẑ(t)y(t)dt − ½ ∫₀ᵀ ẑ²(t)dt }   (25)
⁴ It is clear that some care is needed here: if ν(·) and v(·) are both Gaussian and have the same
covariance, why are they not identical? The fact is that the integral of ν(·) is a martingale with
respect to the (smallest) family of sigma-fields generated by the variables {y(τ), τ ≤ t}, 0 ≤ t ≤ T,
while the integral of v(·) is a martingale not with respect to this family of sigma-fields but with
respect to a larger (underlying) family of sigma-fields. The interested reader can pursue the details
in books such as [36], [37] (see also [32]).
However since z(·) is random (being known only in the conditional sense that
ẑ(t) is determined by the past observations {y(τ), τ < t}), there is a question
as to the interpretation of the first integral in (25). It turns out that in order
to reduce to previously known results for z(·) of various forms (e.g.
z(t) = cos(2πt + θ), where θ is uniformly distributed over (−π, π)), the integral
had to be taken in the Itô sense. This led to various interesting calculations,
described in the paper [22a]. The result (25) has had various theoretical and
practical applications; in particular, it provides a general structure for the
optimal detector which can be a guide to simple suboptimal implementations
(see, e.g. [23]).
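The effect of the Itô convention can be illustrated by a small simulation (a sketch, independent of the models in the text): with left-endpoint sums the stochastic integral of W against dW converges to (W(T)² − T)/2, the extra −T/2 being the Itô correction that ordinary calculus would miss:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1.0, 100_000
dW = rng.normal(0, np.sqrt(T / n), n)        # Brownian increments
W = np.concatenate(([0.0], np.cumsum(dW)))   # Brownian path on [0, T]

ito_sum = float(W[:-1] @ dW)                 # left-endpoint (Itô) evaluation
print(ito_sum, (W[-1] ** 2 - T) / 2)         # Itô value; calculus would give W(T)^2 / 2
```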
On the theoretical side however the original "proof" in [22a] was not
satisfactory (though it did bring in, apparently for the first time in such
applications, a powerful theorem of a Soviet mathematician, Girsanov [24],
which has since become a major tool in stochastic control theory). The
achievement of a rigorous proof came about in an interesting way. I had met
Professor T. Hida of Nagoya University in Japan at a statistics conference in
the U.S. and he invited me to visit him on my travels between California and
India. Hida had written a basic paper on canonical representations of Gaussian
processes, which was related to my studies of the problem of "singular" detection
(related to absolute continuity of measures). The first of these visits was in 1968,
soon after (25) had been conjectured. Hida introduced me to a student of his,
M. Hitsuda, working in a nearby university, who gave me a preprint of a paper
soon to appear in the Osaka Journal of Mathematics [25]. Hitsuda's paper
established conditions under which a Gaussian process could be related to a
Wiener process in a causal and causally invertible fashion, using new martingale
and stochastic integral methods developed in 1966-67 by H. Kunita and
S. Watanabe [26]. One consequence of Hitsuda's work was an alternative proof
(to the one in [18] using integral operators) in the Gaussian case of the
equivalence of the innovations and the original process {y(·)}. More useful to
me were the facts that Kunita was also at Nagoya and that from discussions
with Hitsuda and him I learned enough about the new martingale methods to
obtain a simple and rigorous proof of (25) [22b]. I also enlisted their help in
getting a correct proof of our conjectured causal equivalence of ν(·) and y(·) in
the non-Gaussian case. They did not succeed in this, but Kunita and his student
Fujisaki, joined later by Kallianpur on receipt of a preprint from them,
circumvented the equivalence result by showing that even without it, functionals
of y(·) could be written as stochastic integrals with respect to ν(·), but with
integrands that depended on y(·); see [27]. With this representation, the rest
of the arguments proceed essentially as in the linear case.
However while the general nonlinear filtering analogs of the linear Kalman-Bucy
results are thus obtained in a conceptually simple way, this very simplicity
showed clearly that the nonlinear problem is in general impossible to solve
exactly. The difficulty is that the equation for the conditional mean has terms
dependent on higher-order moments, and so also for the conditional variance,
conditional third moment and so on. There appears to be no simple way of
along a path that after a while led to scattering theory, the Chandrasekhar
equations and fast algorithms for matrices with displacement structure.
However, before moving on, some remarks on smoothed estimates are
appropriate, because their study even in the linear case continues in a useful
way even to the present [38]. A proper understanding of the smoothing problem
can be important in understanding the properties of stochastic processes with
state-space models, as noted by Faurre, Lindquist, Picci and others (see, e.g.
[39]).
The problem of smoothed estimates, e.g. finding x̂(t|T), the least-squares
estimate of x(t) given data {y(τ), 0 ≤ τ ≤ T}, was left aside by Kalman and Bucy
[3], [4] as not yielding easily to their approaches. However a direct approach
using the innovations immediately leads to the following basic result [40]: Let

P(t, s) = Ex̃(t)x̃*(s),   x̃(t) = x(t) − x̂(t|t)

Then the (noncausal) smoothed estimate, x̂(t|T), is completely determined by
the (causal) filtered estimate via the formula

x̂(t|T) = x̂(t|t) + ∫ₜᵀ P(t, s)H*(s)ν(s)ds   (26)

To see this, write

x̂(t|T) = ∫₀ᵀ g(t, s)ν(s)ds

and apply the orthogonality condition x(t) − x̂(t|T) ⊥ {y(τ), 0 ≤ τ ≤ T} to see that

g(t, s) = Ex(t)ν*(s)

Then decomposing the integral over [0, T] into one over [0, t] and [t, T], and
checking that Ex(t)ν*(s) = P(t, s)H*(s) for s > t yields (26). [There is a close
relationship between (26) and the operator formula (23e).]
Note that this basic formula does not depend upon having a state-space
model for x(·). However if such a model exists, then we can say more about the
smoothed estimate by plugging in the Kalman-Bucy formulas. In this way, we are led
x̂(t|T) = x̂(t|t) + P(t)λ(t|T)   (27a)

where

λ(t|T) = ∫ₜᵀ ψ*(s, t)H*(s)ν(s)ds

and ψ(t, s) is the state-transition matrix of [F(·) − K(·)H(·)], K(·) = P(·)H*(·).
Therefore we also have the differential equation description

λ̇(t|T) = −[F*(t) − H*(t)K*(t)]λ(t|T) − H*(t)ν(t),   λ(T|T) = 0   (27b)
Other special smoothing problems are also easy to solve. Note that, by
regarding t as fixed and T increasing, (26) immediately solves the so-called fixed-point
smoothing problem. By letting t vary and T = t + Δ, Δ > 0 and fixed, we have
a fixed-lag smoothing formula. Note that if we have a state-space model, (27a)
will still apply but because the upper limit is variable, the differential equation
for λ(t|t + Δ) will have an extra term. And so on.
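As a numerical sketch of the basic principle (in discrete time, with made-up scalar parameters; the continuous-time formula (26) becomes a sum over future innovations), one can verify that the filtered estimate plus a sum over future innovations reproduces the batch least-squares smoothed estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
f, q, r, p0, N = 0.9, 0.5, 1.0, 1.0, 40   # made-up scalar model parameters
x = np.zeros(N)
y = np.zeros(N)
x[0] = rng.normal(0, np.sqrt(p0))
for u in range(N):
    y[u] = x[u] + rng.normal(0, np.sqrt(r))
    if u + 1 < N:
        x[u + 1] = f * x[u] + rng.normal(0, np.sqrt(q))

# causal Kalman filter: filtered estimates, innovations nu and their variances S
xp, Pp = 0.0, p0
xf = np.zeros(N); Pf = np.zeros(N); nu = np.zeros(N); S = np.zeros(N); gain = np.zeros(N)
for u in range(N):
    nu[u] = y[u] - xp
    S[u] = Pp + r
    k = Pp / S[u]; gain[u] = k
    xf[u] = xp + k * nu[u]; Pf[u] = (1 - k) * Pp
    xp, Pp = f * xf[u], f * f * Pf[u] + q

# smoothed estimate of x_t = filtered estimate + weighted future innovations,
# the discrete analog of (26)
t = 10
xs = xf[t]
c = f * Pf[t]                     # c = E[xtilde(t|t) xtilde(u|u-1)] for u = t+1
for u in range(t + 1, N):
    xs += c / S[u] * nu[u]        # E[x_t nu_u] = c, since h = 1 here
    c *= f * (1 - gain[u])

# batch (joint-Gaussian) smoothed estimate, for comparison
Px = np.zeros(N); Px[0] = p0
for u in range(1, N):
    Px[u] = f * f * Px[u - 1] + q
C = np.array([[f ** abs(i - j) * Px[min(i, j)] for j in range(N)] for i in range(N)])
xs_batch = C[t] @ np.linalg.solve(C + r * np.eye(N), y)
print(xs, xs_batch)               # the two smoothed estimates coincide
```

Both computations give the same linear least-squares estimate; the innovations route needs only the causal filter quantities.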
We shall go in another direction by introducing another representation (not
solution) for the smoothed estimate, which will lead us to a fascinating scattering
interpretation of the whole state-space estimation problem. For this note that
by using the Kalman-Bucy filter equations, differentiation of (27a) leads to a
set of so-called canonical differential equations (well known from the calculus of
variations):
dx̂(t|T)/dt = F(t)x̂(t|T) + G(t)Q(t)G*(t)λ(t|T)
−dλ(t|T)/dt = −H*(t)H(t)x̂(t|T) + F*(t)λ(t|T) + H*(t)y(t)   (28a)

with the boundary conditions

x̂(0|T) = Π₀λ(0|T),   λ(T|T) = 0   (28b)

and the Hamiltonian matrix

M(t) = [ F(t)          G(t)Q(t)G*(t) ]
       [ −H*(t)H(t)    −F*(t)        ]   (28c)
These equations are not directly useful, because the two-point nature of the
boundary conditions makes them hard to solve except by going back to (27)
or to the so-called Rauch-Tung-Striebel equation (obtained by substituting
for λ(·|T) from (27) into the first equation of (28a)). Kalman was familiar with the
Hamiltonian matrix (28c) from his studies (see, e.g. [41]) of the linear quadratic
regulator problem, where what is encountered is actually the "dual" matrix M*.
He used this fact to show that the solution P(·) of the Riccati differential equation
could be expressed in terms of the elements of the fundamental (or state-transition)
matrix of M(·), noting however that this was not of computational
value except perhaps in the constant parameter case.
Scattering Models
As stated before, our reason for introducing (28) is that it enables us to establish
a useful transmission-line (or scattering) model for state-space estimation
problems. Such models were extensively studied by Redheffer in the late fifties
(see [42]), motivated by work in electromagnetic theory and neutron transport.
The stochastic least-squares problem was very far from his knowledge or
consciousness. How I came to this subject, first explored with L. Ljung and
B. Friedlander, is described in Sect. 5. To see this, change notation for
convenience
and make the simple Euler approximations for the derivative, e.g.

x̂(s + Δ|t) = x̂(s|t) + Δ · dx̂(s|t)/ds + o(Δ)

and also for λ(·|t), to obtain

[x̂(s + Δ|t)]   [I + F(s)Δ      G(s)Q(s)G*(s)Δ] [x̂(s|t)    ]   [    0      ]
[ λ(s|t)   ] = [−H*(s)H(s)Δ    I + F*(s)Δ    ] [λ(s + Δ|t)] + [H*(s)y(s)Δ]   (29)
Note that because of the minus sign in (28a), the arguments of the λ(·|t)
terms are reversed from those of the x̂(·|t) terms. Because of this we can regard
x̂(·|t) as a forward wave and λ(·|t) as a backward wave travelling through an
incremental section at s of some scattering medium specified by the incremental
forward and backward transmission coefficients {I + F(s)Δ, I + F*(s)Δ} and
incremental left and right reflection coefficients {−H*(s)H(s)Δ, G(s)Q(s)G*(s)Δ};
the section has an incremental internal source y(s)Δ. Now let us consider a
macroscopic section of the medium from say s = r to s = t; this is shown in
Fig. 1, followed by an incremental section.
We shall collect the operators in the macroscopic section into a so-called
scattering matrix S₀(t, r).   (30)

The left reflection operator of the macroscopic section is denoted P₀(r, t) for
the following reason. By tracing paths through Fig. 1 we can see that
[Fig. 1. The macroscopic section S₀(t, r), with reflection operators P₀, W₀ and transmission operators, followed by an incremental section with coefficients I + FΔ, etc.; the figure and the associated expansion of S₀(t + Δ, r) are garbled in this copy.]
where o(Δ) denotes terms that go to zero faster than Δ as Δ → 0; for simplicity
we have not shown all the arguments for the terms on the LHS. Therefore

dP₀(t, r)/dt = F(t)P₀(t, r) + P₀(t, r)F*(t) + G(t)Q(t)G*(t) − P₀(t, r)H*(t)H(t)P₀(t, r)   (32)
[Fig. 2. A macroscopic section with boundary inputs x̂(t|t) and λ(t|t) = 0, output λ(T|t), and internal operators P₀, Ψ₀, W₀.]
where x̂₀, λ₀ obey the same differential equations as x̂ and λ of (10) and (27),
except that P is replaced by P₀. These relations will be used presently.
Several other nice results also follow easily from Fig. 2. For example, we
can see that the left reflection coefficient, say P, of the combined sections in
Fig. 2 obeys the same differential equation as P₀ but with P(r, r) = Π₀; so also
the equations for Ψ and W are the same as for {Ψ₀, W₀} but with P replacing P₀.
However by again tracing flows through Fig. 2 we can write

P(t, r) = P₀(t, r) + Ψ₀[Π₀ − Π₀W₀Π₀ + Π₀W₀Π₀W₀Π₀ − ···]Ψ₀*   (33)
        = P₀(t, r) + Ψ₀Π₀[I + W₀Π₀]⁻¹Ψ₀*   (34)
Two-Filter Formulas
From Fig. 2, we can by inspection read out (recalling (32)) a relation for the
estimates (analogous to (34)),

x̂(t|t) = x̂₀(t|t) + Ψ₀(t, r)x̂(r|t)   (35)

(36)
left of the point r and trace the flow again to obtain the backwards evolution
equations

∂S₀(t, r)/∂r = [ ···   P₀GQG*P₀* ]
               [ ···   (F + GQG*W₀)*P₀* ]   (37)

or

−∂λ₀(r|t)/∂r = (F − W₀GQG*)λ₀(r|t) + H*(r)y(r)   (38)

Note that in reverse time, it is W₀, and not P₀, that obeys a Riccati equation.
The Mayne-Fraser formula is often called a two-filter formula [44b] because
it can be rewritten, by defining

(39)

as

x̂(r|t) = [Π₀⁻¹ + P_b⁻¹]⁻¹(···)   (40)

where

−∂P_b(t, r)/∂r = −FP_b − P_bF* + GQG* − P_bH*HP_b   (41a)

−∂x̂_b(r|t)/∂r = (···)   (41b)

x̂_b(t|t) arbitrary,   P_b(t, t) = ∞   (41c)
Equation (41) is regarded as defining the (backwards) Kalman filter for the
usual state-space model running backwards in time,

−ẋ(r) = F(r)x(r) + G(r)u(r)

This is not correct however, since in such a model we shall clearly have a
dependence between the white noise u(·) and the boundary state value x(T),
which will destroy the basic Markovian property of the state-space model (see
[45a]), and prevent simple application of the Kalman filter arguments. This
difficulty is not recognized in many papers and even textbooks, where the issue
is avoided by the weakly motivated assumption that the variance of x(T) is
infinite, which decorrelates u(·) and x(T). However the variance of x(T) is not
infinite; it is in fact the solution at T of the well-behaved linear (Lyapunov)
differential equation

Π̇ = FΠ + ΠF* + GQG*,   Π(t₀) = Π₀
A true resolution of the problem needs more care in defining a proper reverse-time
(or backwards) model. This was initially easier to approach in the scattering
model (intuitively, because going from left to right or vice versa are not as
different in a spatial wave model as reversing time is in the unidirectional
Markovian state-space time model); see [45]-[46] for the details and more
results. Reference [38] gives a different perspective on backwards-time and
two-filter smoothing formulas. Here the reader may find it amusing to check
that a proper (Markovian state-space) backwards-time model for the simple
state equation (for a Wiener process)

ẋ(t) = u(t),   0 ≤ t ≤ T,   Eu(t)x(0) = 0,   Ex²(0) = 0

is

−ẋ(t) = −x(t)/t + μ(t),   0 ≤ t ≤ T,   Eμ(t)x(T) = 0,   Ex²(T) = T
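A quick Monte Carlo check of this backwards model (a sketch; the sample sizes are arbitrary): the backwards drift −x(t)/t implies E[x(s)|x(t)] = (s/t)x(t) for s < t, which shows up as the least-squares regression slope of W(s) on W(t):

```python
import numpy as np

rng = np.random.default_rng(2)
M, s, t = 100_000, 0.5, 1.0
ws = rng.normal(0, np.sqrt(s), M)            # W(s)
wt = ws + rng.normal(0, np.sqrt(t - s), M)   # W(t) = W(s) + independent increment

slope = (ws @ wt) / (wt @ wt)                # regression slope of W(s) on W(t)
print(slope)                                  # close to s/t = 0.5, as the backwards drift predicts
```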
if a section S₁ = [a b; c d] is followed by a section S₂ = [A B; C D],   (42a)

the combined section is

S = S₁ ⋆ S₂ = [ A(I − bC)⁻¹a         B + Ab(I − Cb)⁻¹D ]
              [ c + dC(I − bC)⁻¹a    d(I − Cb)⁻¹D      ]   (42b)

(42c)

I = S ⋆ S⁻¹ = S⁻¹ ⋆ S   (43)

i.e. the star-product inverse and the usual inverse coincide when both exist.
(Actually the star-product inverse can sometimes be defined even when S⁻¹
fails to exist.)
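The composition rule (42b) is easily implemented and tested numerically; the following sketch (block sizes and scalings are arbitrary) checks that the star product is associative and that the identity section is neutral, two basic facts of the scattering algebra:

```python
import numpy as np

def star(S1, S2):
    # Redheffer star product of two sections, each given as rows of 2x2 blocks
    # ((a, b), (c, d)), composed according to the rule (42b)
    (a, b), (c, d) = S1
    (A, B), (C, D) = S2
    n = a.shape[0]
    t1 = np.linalg.inv(np.eye(n) - b @ C)    # (I - bC)^{-1}
    t2 = np.linalg.inv(np.eye(n) - C @ b)    # (I - Cb)^{-1}
    return ((A @ t1 @ a, B + A @ b @ t2 @ D),
            (c + d @ C @ t1 @ a, d @ t2 @ D))

rng = np.random.default_rng(0)
def section():
    # a random section with small blocks, so all inverses exist
    return ((0.2 * rng.standard_normal((3, 3)), 0.2 * rng.standard_normal((3, 3))),
            (0.2 * rng.standard_normal((3, 3)), 0.2 * rng.standard_normal((3, 3))))

S1, S2, S3 = section(), section(), section()
lhs, rhs = star(star(S1, S2), S3), star(S1, star(S2, S3))
err = max(np.abs(lhs[i][j] - rhs[i][j]).max() for i in range(2) for j in range(2))

# the identity section is neutral for the star product
I2 = ((np.eye(3), np.zeros((3, 3))), (np.zeros((3, 3)), np.eye(3)))
chk = star(S1, I2)
same = max(np.abs(chk[i][j] - S1[i][j]).max() for i in range(2) for j in range(2))
print(err, same)
```

Associativity holds because the star product is just the composition of the cascaded port maps.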
One can now develop a star product or scattering algebra. The examples
already given indicate that this algebra is perhaps the most insightful and often
the simplest way of obtaining many old and new results connected with
Homogeneous Media
If the medium is homogeneous (or time-invariant in our case), the incremental
parameters {F, G, Q, H} will be independent of time, and the properties of the
medium scattering matrix S₀(t, r) will depend only upon the thickness t − r of
the section. An immediate consequence is a "doubling formula"

S₀(2τ, 0) = S₀(τ, 0) ⋆ S₀(τ, 0)   (44)

which can be used to quickly calculate the limiting behavior of S₀(t, 0) as t → ∞,
an observation first made and used in radiative transfer theory (by Van de
Hulst). A direct state-space derivation is much more involved (see, e.g. [49],
p. 158); moreover the fact that the doubling formula for the Riccati variable
P(·) will also involve the quantities {Ψ, W} is much more natural to see in the
scattering derivation. The point is that introducing the star product enables the
notation to carry most of the computational burden, leaving more scope for
conceptual understanding.
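Here is a scalar numerical sketch of the doubling idea (all parameter values made up): star-squaring an incremental section k times produces a slab of thickness 2^k·Δ, and the reflection block converges to the positive root of the algebraic Riccati equation:

```python
def star(S1, S2):
    # scalar Redheffer star product; each section is (a, b, c, d) for [a b; c d],
    # composed according to the rule (42b)
    a, b, c, d = S1
    A, B, C, D = S2
    t = 1.0 / (1.0 - b * C)        # scalar (I - bC)^{-1} = (I - Cb)^{-1}
    return (A * t * a, B + A * b * t * D, c + d * C * t * a, d * t * D)

f, g, q, h = 0.5, 1.0, 1.0, 1.0    # made-up scalar parameters; note f is unstable
dt = 1e-4
# incremental section, cf. (29)
S = (1 + f * dt, g * q * g * dt, -h * h * dt, 1 + f * dt)

for _ in range(24):                # thickness dt * 2^24 after 24 star-squarings
    S = star(S, S)

P_limit = S[1]                     # reflection block of the thick slab
P_are = (1 + 5 ** 0.5) / 2         # positive root of 2fP + g^2 q - h^2 P^2 = 0
print(P_limit, P_are)
```

Only 24 star products are needed to reach a slab more than a thousand units thick, which is the computational appeal of doubling.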
Another consequence of homogeneity is that the forward and backward
evolution equations must be the same (because adding an incremental layer to
the right is the same as adding one to the left), i.e. we have

∂S₀(t, r)/∂t = −∂S₀(t, r)/∂r   (45)

In particular, for the reflection coefficient this yields

FP₀ + P₀F* + GQG* − P₀H*HP₀ = −∂P₀(t, r)/∂r = Ψ₀GQG*Ψ₀*   (46)
As mentioned before, perhaps the key contribution of Kalman and Bucy [4]
was their analysis of the asymptotic properties of the Riccati equation and the
steady-state filter. Thus they showed for time-invariant (and a special class of
time-variant) models, that if {F, GQ} is controllable and {H, F} observable, then
as t → ∞ (even if F is unstable) P(·) converges to a constant value P independent
of the initial value P(t₀) = Π₀, provided Π₀ is nonnegative definite. P is the
unique positive definite solution of the so-called algebraic Riccati equation
(ARE)
0 = FP + PF* + GQG* − PH*HP   (47)
and also the unique P that makes the closed-loop state matrix (F − PH*H)
stable. The significance of this result is that it shows that errors in computing
P(t), at any time t, will die out as time progresses rather than build up. However
there is an interesting issue here: there is no guarantee that numerical errors
will not make P(t), at some time t, indefinite or even negative-definite. Therefore
it would be desirable to investigate convergence for more general initial
conditions. Here the most general results are apparently those of Willems [50],
who showed that convergence took place for all Π₀ > P₋, P₋ being the infimum
over all solutions to the ARE (47). This, and related results of several others,
are all based on a detailed study of the nonlinear ARE. In my opinion, a proof
of convergence that avoids a detailed study of the limit itself would be more desirable.
I proposed this problem in 1974, to a new postdoctoral scholar, Lennart Ljung,
and we obtained such a proof [51], using an identity that was very natural in the
scattering framework, but apparently new in the estimation and control
literature:
Ψ₀(t, 0) = Ψ₀ᵃ(t, 0)   (48)

where Ψ₀(·, 0) is the state-transition matrix of (F − P₀(·)H*H), P₀(·) being the
solution of the Riccati differential equation (RDE) with P(t₀) = Π₀ = 0, while
Ψ₀ᵃ(·, 0) is the state-transition matrix for (F − GQG*P₀ᵃ(·)), where P₀ᵃ(·) satisfies
the adjoint RDE,
Ṗ₀ᵃ(t) = F*P₀ᵃ + P₀ᵃF + H*H − P₀ᵃGQG*P₀ᵃ,   P₀ᵃ(t₀) = 0   (49)

(50)
P̄ᵃ is the steady-state solution of (49) and can be shown to exist under the above
conditions on {F, G, Q, H}. We may note that if {H, F} is observable, then it
can be shown that P̄ᵃ is invertible, and that (50) reduces to

(51)

which is the result of Willems. Simple examples show that (50) can hold even
if (51) fails.
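The convergence statement itself is easy to observe numerically; in the following scalar sketch (made-up values, with F = 0.5 unstable) the RDE is integrated from several nonnegative initial conditions, all of which reach the same positive root of the ARE, with a stable closed loop:

```python
# scalar RDE dP/dt = 2 f P + g^2 q - h^2 P^2, with f = 0.5 and g = q = h = 1
f, dt, T = 0.5, 1e-3, 40.0

def riccati_limit(P):
    # Euler integration of the RDE from the initial condition P
    for _ in range(int(T / dt)):
        P += dt * (2 * f * P + 1.0 - P * P)
    return P

P_are = (1 + 5 ** 0.5) / 2                  # positive root of the ARE 2fP + 1 - P^2 = 0
limits = [riccati_limit(P0) for P0 in (0.0, 1.0, 10.0)]
print(limits, P_are, f - P_are)             # all limits agree; f - P < 0, a stable closed loop
```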
Hopefully, enough has been described of the scattering approach to show
that it can provide a powerful and physically intuitive framework for the study
of Riccati differential equations. More can be found in [46]-[48]. And we might
mention that we have exploited less than half of the results in Redheffer's basic
paper [42]; I understand that Redheffer is currently preparing a monograph
on his theory.
(52a)

with

K(t, s) = ∫ e^{−α|t−s|} w(α)dα,   0 ≤ s, t ≤ T   (52b)
Such kernels arise in radiative transfer theory, where w(α) is the intensity of
light impinging on the atmosphere, say, from a direction α, and e^{−αt} represents
the attenuation at depth t. Casti, Kalaba and Murthy [52a] noted that the
solution of (52a) could be reduced to the solution of a coupled set of nonlinear
integro-differential equations

(53a)

∂Y(t, α)/∂t = −(1/α)Y(t, α) − X(t, α)∫Y(t, β)w(β)dβ   (53b)

X(0, α) = 1 = Y(0, α),   α ≤ 1   (53c)
(54)

Ṗ(t) = FP(t) + P(t)F* + GQG* − P(t)H*HP(t),   t ≥ 0   (55a)
P(0) = Π₀   (55b)

(56)

It can then be shown that

Ṗ(t) = Ψ(t, 0)Ṗ(0)Ψ*(t, 0)   (57)

where Ψ(·, 0) is the state-transition matrix of [F − P(·)H*H],

∂Ψ(t, 0)/∂t = [F − P(t)H*H]Ψ(t, 0),   Ψ(0, 0) = I   (58)
Equation (57) shows the striking property that the rank and in fact the inertia
(i.e. the number of positive, negative and zero eigenvalues) of Ṗ(·) is constant
with time: it depends only upon the rank (or inertia) of

Ṗ(0) = FΠ₀ + Π₀F* + GQG* − Π₀H*HΠ₀   (59)
Now it can well happen that Ṗ(0) has low rank. For example,

if Π₀ = 0,   Ṗ(0) = GQG* has rank ≤ min(n, m)   (60)

where m is the number of inputs. Similarly, if Π̄ obeys the Lyapunov equation

FΠ̄ + Π̄F* + GQG* = 0   (61a)

then

if Π₀ = Π̄,   Ṗ(0) = −Π₀H*HΠ₀ has rank ≤ min(n, p)   (61b)

where p is the number of outputs. When Π₀ = Π̄, the processes x(·) and z(·)
are stationary.
This suggests that the n × n matrix P(·) can be propagated using lower rank
matrices, with a saving in computation; then P(·) and K(·) could be computed
when desired by a quadrature. In fact, the situation is even nicer: Let

α = rank Ṗ(0) = rank[FΠ₀ + Π₀F* + GQG* − Π₀H*HΠ₀]   (62a)

and let A be an α × α signature matrix,

A = diag(1, …, 1, −1, …, −1)   (62b)

with as many +1's (−1's) as Ṗ(0) has positive (negative) eigenvalues. Also let L₀
be any matrix such that

Ṗ(0) = L₀AL₀*   (62c)

[Note that L₀ is not unique; it can be modified by any A-unitary matrix Θ,
ΘAΘ* = A. We shall ignore this possibility here; it does have useful
consequences.] Then (57) can be written as

Ṗ(t) = L(t)AL*(t)   (63)
⁶ Note that (60) is just a generalized Stokes identity (46), noted in our discussion of homogeneous
scattering media (corresponding to time-invariant state-space models) in Sect. 4. The more general
identity (57) can also be obtained in the scattering context; see [47b].
where

L(t) = Ψ(t, 0)L₀   (64)

so that L(·) and the gain K(·) = P(·)H* obey the coupled equations

K̇(t) = L(t)AL*(t)H*,   K(0) = Π₀H*   (65a)
L̇(t) = [F − K(t)H]L(t)   (65b)
L(0) = L₀   (65c)

with

P(t) = Π₀ + ∫₀ᵗ L(τ)AL*(τ)dτ   (66)
The point of course is that whenever p ≪ n, α ≪ n, the new equations (65) can
provide a considerable reduction in complexity over solving for the n × n coupled
Riccati equations of the general (for time-variant as well as time-invariant
systems) Kalman-Bucy filter. For example, in the special case of scalar stationary
processes, we see from (61) that α = 1 = p, and the n(n + 1)/2 coupled Riccati
equations reduce to 2n coupled equations, which with some goodwill⁷ the reader
will recognize as a finite-dimensional version of the X and Y equations (53).
Therefore what we have found in (65) is a generalization to a special class
of nonstationary processes of the results of Chandrasekhar for stationary
processes. These generalized Chandrasekhar equations can provide dramatic
computational simplifications when n is large, as in image processing problems
and in distributed parameter systems; we refer here only to the papers [59].
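The mechanics can be checked in a few lines (a sketch with a made-up 2-state, single-input, single-output model; for n = 2 there is of course no saving, only a correctness check): with Π₀ = 0 we have Ṗ(0) = GQG*, so L₀ = G and A = 1, and integrating the Chandrasekhar variables reproduces the Riccati solution:

```python
import numpy as np

# made-up time-invariant model with n = 2 states, one input, one output
F = np.array([[0.0, 1.0], [-1.0, -0.5]])
G = np.array([[0.0], [1.0]])      # Q = 1
H = np.array([[1.0, 0.0]])
dt, T = 1e-3, 5.0

# direct Riccati integration (Euler), with Pi_0 = 0
P = np.zeros((2, 2))
# Chandrasekhar variables: Pi_0 = 0 gives Pdot(0) = GQG*, so L0 = G and A = 1
K = np.zeros((2, 1))
L = G.copy()
Pc = np.zeros((2, 2))             # quadrature (66): Pc(t) = integral of L A L*
for _ in range(int(T / dt)):
    P = P + dt * (F @ P + P @ F.T + G @ G.T - P @ H.T @ H @ P)
    Kdot = L @ (L.T @ H.T)        # (65a): Kdot = L A L* H*
    Ldot = (F - K @ H) @ L        # (65b): Ldot = (F - KH) L
    Pc = Pc + dt * (L @ L.T)
    K = K + dt * Kdot
    L = L + dt * Ldot

print(np.abs(P - Pc).max())       # the Riccati and Chandrasekhar propagations agree
```

Here only the n·α entries of L and the n·p entries of K are propagated, instead of the n(n + 1)/2 entries of P; the saving materializes when n is large and α, p are small.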
The Chandrasekhar equations led to many further results of various sorts,
for both continuous and discrete-time systems, with and without state-space
models, which are too extensive to describe here. However because of their
important role in later work, I should mention here the papers [60] and [61],
dealing with certain "square-root" or "array" versions of the discrete-time
(Riccati and) Chandrasekhar equations.
It should also be mentioned that it was my reading in the radiative transfer
literature that led me to the work of Redheffer on transmission lines and
⁷ Actually it is worth noting that in radiative transfer theory, one started with given covariance
functions and there is no state-space model. This would have made comparison with the Kalman-filter
formulas difficult, except that a couple of years earlier, Roger Geesey and I had shown how
to re-express the Kalman filter equations in terms of the coefficients of the covariance function
[57]; this fortunate circumstance made the exploration of the radiative transfer literature much
easier.
scattering [42], where Riccati equations again played a key role. However I
put it aside until I had better understood the results (52)-(54) of Ambartzumian
and Chandrasekhar. Then in 1974, Lennart Ljung came to Stanford as a
postdoctoral scholar, strongly recommended by his adviser in Sweden, K. J.
Åström. I suggested to Lennart, and to a new Ph.D. student B. Friedlander,
that it would be worthwhile to relate Redheffer's work to Riccati equation
results we knew from Kalman filtering theory. The result, after a couple of
frustrating initial months till we found the right framework, was the paper
[47b] and several others; later another student, G. Verghese, whose own Ph.D.
work was on linear systems, pointed out the usefulness of starting with the
Hamiltonian equations (28), which enabled us to also obtain results on the
estimates themselves [47a].
Displacement Rank
However the key question that led beyond state-space models, at least initially,⁸
was the meaning of the parameter α, which seemed to arise almost out of nowhere
in the arguments leading to the generalized Chandrasekhar equations.
Moreover, the formula (62a) for α is not invariant in that it seems to depend
upon the particular state-space model {F, G, H, Q} used to model the signal
process z = Hx. In fact, α is an invariant quantity completely determined by
the covariance function of the process z(·). To explain this, let us remark that
the original derivations of Ambartzumian, Chandrasekhar, Bellman, Kalaba,
Casti and others were all based on the fact that the covariance function was
stationary, or in their language, had a displacement or Toeplitz form,

K(t, s) = f(t − s)   (67a)

or equivalently

(∂/∂t + ∂/∂s)K(t, s) = 0   (67b)

When instead

(∂/∂t + ∂/∂s)K(t, s) ≠ 0,   (68)
it turns out that K(t, s) arising from a time-invariant state-space model is not
⁸ Recent work with Lev-Ari has taken us back to state-space models, in a different, not really
dynamic, sense: to what Livsic [62] has called nodes or colligations {F, G, H, J}.
completely arbitrary: it has finite rank in the sense that we can write

(∂/∂t + ∂/∂s)K(t, s) − K(t, 0)K(0, s) = Σᵢ εᵢφᵢ(t)φᵢ(s)   (69)

where εᵢ = +1 or −1 (as many times, in fact, as the inertia of Ṗ(0) in (61)). The
RHS of (69) is what is called a "degenerate" kernel in the theory of integral
equations, so that while K(t, s) is not of displacement form, it is "close" in some
sense to a displacement kernel.
The number α is a measure of this closeness, and therefore we called it a
displacement rank. The measure was shown to have operational significance in
the sense that it takes α times as many computations to solve an integral
equation with a kernel of displacement rank α as it takes for a Toeplitz kernel
(see [63]). In [64], we relate these general results to the Chandrasekhar equations
for state-space systems.
Further developments have stemmed largely from attempting to work out
the above ideas for discrete-time processes, or even more simply, for finite
matrices. This has turned out to be a very long story indeed, starting with the
discrete-time versions of the generalized Chandrasekhar equations and their
connections to the Levinson and Schur algorithms and matrix factorization;
see, e.g. [65]-[67]. Here we shall only comment on the analog of (69).
We start by noting that the analog of a displacement or Toeplitz kernel is
a Toeplitz matrix, i.e. one of the form

T = [c_{i−j}],   0 ≤ i, j ≤ N.
Many nice results are known for such matrices, especially the fact that linear
equations with Toeplitz coefficient matrices can be solved with O(n²) flops
(floating point operations) rather than the O(n³) flops required for a general
matrix. However in applications we often need to work with closely related
matrices, e.g. having the forms T₁T₂, or T₁T₂ − T₃T₄, or T₁T₂⁻¹T₃, where the
{Tᵢ} are Toeplitz matrices. These composite matrices are not Toeplitz in general,
and so the question arises as to whether it would take O(n³) flops to solve the
corresponding linear equations. When pressed, one would have to say no, and
in fact it turns out that suitable concepts of displacement structure can be
introduced: one family of definitions has the form

∇R = R − F₁RF₂*   (70)

where {F₁, F₂} are lower-triangular matrices. The simplest case, and the one
closest to the continuous-time definition (67), is perhaps to choose F₁ = F₂ = Z,
the lower-shift matrix with zeros everywhere except for 1's on the first subdiagonal.
The survey [68] gives a fairly recent account of some of the many
properties and applications following from this definition. Further results,
especially on matrices such as T₁T₂⁻¹T₃, can be obtained by using block-shift
F-matrices of the form [Z_{n₁} ⊕ Z_{n₂} ⊕ ··· ⊕ Z_{nᵣ}] (see, e.g. [69a, b, and c]).
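These statements are easy to check numerically; the following sketch (random Toeplitz matrices of modest size) computes the rank of the displacement (70) with F₁ = F₂ = Z for a Toeplitz matrix (at most 2) and for a product of two Toeplitz matrices (still small, here at most 5, independent of n):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# random Toeplitz matrices T = [c_{i-j}]
c1 = rng.standard_normal(2 * n - 1)
c2 = rng.standard_normal(2 * n - 1)
T1 = np.array([[c1[i - j + n - 1] for j in range(n)] for i in range(n)])
T2 = np.array([[c2[i - j + n - 1] for j in range(n)] for i in range(n)])

Z = np.eye(n, k=-1)                        # lower-shift matrix

def disp_rank(R):
    # rank of the displacement R - Z R Z*, cf. (70) with F1 = F2 = Z
    return int(np.linalg.matrix_rank(R - Z @ R @ Z.T))

print(disp_rank(T1), disp_rank(T1 @ T2))   # small ranks, regardless of n
```

For a Toeplitz T, the displacement T − ZTZ* is nonzero only in its first row and column, which is why its rank is at most 2.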
It is worth noting here that the paper [69] deals with certain now well-known matrix identities for the inverse of Toeplitz matrices first introduced by
where L₁ and L₂ are lower-triangular Toeplitz matrices. From this it can be
seen that the displacement T⁻¹ − ZT⁻¹Z* has the same rank (and inertia) as
the displacement T − ZTZ*, which was a key observation in the development
of the theory of displacement structure; see the paper [70] written just as the
beginning of displacement theory was taking shape.
I first saw these important formulas in a Russian book sent to me by Gohberg
in 1973 a few months before the appearance of the English version [71]. That
this fortunate circumstance was based on an even more fortunate earlier accident
is a story that is too long to tell here. One connection is the operator formulas
noted earlier in (23); a further indication may be gained from the following
somewhat unusual acknowledgement in the paper [72]: "The first steps to the
results of this paper were taken with R. Geesey. Of course, the work progressed
very rapidly with the discovery of the deep and beautiful studies of Gohberg
and Krein; for the inadvertent discovery in May 1968 of their books, T. Kailath
is indebted to G. Wallenstein and his insistence on browsing in a Leningrad
bookstore." It is indeed a pleasure to mention here the considerable stimulation
and several other serendipitous benefits I have gained from having the good
fortune through this work to fall into the "orbit" of Professor Israel Gohberg.
As to the further development of displacement structure, that is a story to
be told some other time. However for reference to recent results I might mention
the study of some very useful connections of displacement rank structure to
inverse scattering problems, which led to a unified framework for dealing with
the Schur algorithms, identification of discrete transmission lines, matrix
factorizations, lattice filtering and partial realization of linear systems (see
[73a]-[73d]) and the Ph.D. dissertations of J. Chun [74a] and D. Pal [74b],
the results of which are in process of publication. These deal with both
Toeplitz- and Hankel-related matrices and with Bezoutian matrices (which are
inverses of Toeplitz and Hankel matrices). Now Bezoutians have displacement
structure and so in particular their tri angular factors (and thereby their inertia)
can be determined via fast O(n 2 ) algorithms-see [75]. On the other hand,
from the time of Hermite it has been known that the inertia of these matrices
arise determines the root distribution of polynomials with respect to the
imaginary axis and the unit circle. It may therefore not be surprising that the
fast algorithms turn out to yield very naturally, and in a unified way, the famous
Routh-Hurwitz-and the Schur-Cohn criteria, among several others. This is
shown in [75] for the regular case; extensions to singular cases are discussed
in [74]; not surprisingly, KaIman has also made some elegant contributions to
this classical area of system theory-see [76]-[78].
The above is only a partial account of the influence Rudy Kalman's seminal
contributions have had on my work and directions of research. Topics such as
6 Acknowledgments
The topics mentioned in my review are fairly extensive, so any reasonable
referencing scheme must be consciously incomplete and unconsciously subjective. Therefore I would like to say explicitly that any significant omissions
are inadvertent. This being said, it is a special pleasure for me to recollect the
many pleasant and important interactions I have had with a host of students
and colleagues in many areas of mathematical system theory. As with references,
they are really too many to comfortably list here. But because of the nature
and depth of certain associations, and at least as far as the topics in this review
are concerned, I should like to especially thank Paul Frost, Adrian Segall,
Martin Morf, Lennart Ljung, Patrick Dewilde, Freddy Bruckstein, and Hanoch
Lev-Ari for many an enjoyable discussion and insight.
Finally I am indebted to Thanos Antoulas for his invitation to add my
contribution to the many others in this volume.
References
[1] R.E. Kalman, "The theory of optimal control and the calculus of variations in mathematical
optimization techniques," Mathematical Optimization Techniques, ed. R. Bellman, pp 309-331, Univ of Calif Press, 1963
[2] T. Kailath, "Adaptive matched filters," ibid. pp 109-140
[3] R.E. Kalman, "A new approach to linear filtering and prediction problems," J Basic Eng,
Vol 82, pp 34-45, Mar 1960
[4] R.E. Kalman and R.S. Bucy, "New results in linear filtering and prediction theory," Trans
ASME, Ser D, J Basic Eng, Vol 83, pp 95-107, Dec 1961
[5] J.P. Schalkwijk and T. Kailath, "Coding with wideband additive noise channels with
feedback, part I: no bandwidth limitation," IEEE Trans on Inform Thy, Vol IT-12,
pp 172-182, April 1966
[6] J.P. Schalkwijk, "Center of gravity information feedback," IEEE Trans Inform Thy,
Vol IT-14, pp 324-331, Mar 1968
[7] F.C. Schweppe, "Evaluation of likelihood functions for Gaussian signals," IEEE Trans
Inform Thy, Vol IT-11, pp 61-70, 1965
[8] W. Davenport and W.L. Root, Random Signals and Noise, McGraw-Hill, 1958
[9] H.W. Bode and C.E. Shannon, "A simplified derivation of linear least square smoothing and
prediction theory," Proc IRE, Vol 38, pp 417-425, Apr 1950
[10] L.A. Zadeh and J.R. Ragazzini, "An extension of Wiener's theory of prediction," J Appl
Phys, Vol 21, pp 645-655, July 1950
[11] R.L. Stratonovich, "Application of the theory of Markov processes for optimum filtration
of signals," Radio Eng Electron Phys (USSR), Vol 1, pp 1-19, Nov 1960
[12] T. Kailath, ed., Benchmark Papers in Linear Least-Squares Estimation, Dowden, Hutchinson &
Ross, Stroudsburg, PA, 1977 (now distributed by Academic Press)
[13] E. Wong and M. Zakai, "On the relation between ordinary and stochastic differential
equations and applications to stochastic problems in control theory," in Proc 3rd IFAC
Congr, London: Butterworth, 1966
[14] R.L. Stratonovich and Yu.G. Sosulin, "Optimal detection of a Markov process in noise,"
Eng Cybern, Vol 6, pp 7-19, Oct 1964
86
T. Kailath
[15] R.S. Bucy, "Nonlinear filtering theory," IEEE Trans Automat Contr, Vol AC-IO, p 198,
April 1965
[16] T. Kailath, "Likelihood ratios for Gaussian processes," IEEE Trans Inform Theory, IT-16,
pp 276-288, May 1970
[17] L.D. Collins, "Realizable whitening filters and state-variable realizations," Proc IEEE,
Vo156, pp 100-101, Jan 1968
[18] T. Kailath, "An innovations approach to least-squares estimation-part I: linear filtering in
additive white noise," IEEE Trans Automat Contr, Vol AC-13, pp 646-655, Dec 1968
[18a] T. Kailath, "A note on least-squares estimation by the innovations method," SIAM Journal
Contr., Vol. 10, no. 3, pp 477-486, Aug 1972
[19] T. Kailath, "The innovations approach to detection and estimation theory," Proc IEEE,
Vo158, pp 680-695, May 1970
[20] R.K. Mehra, "On the identification of variances and adaptive Kaiman filtering," IEEE Trans
Automat Contr, Vol AC-15, pp 175-184, 1970. See also Vol AC-16, pp 12-21, 1971
[21a] C.E. Benes, "On Kailath's innovations conjecture," Bell Syst Tech J, Vol IT-55,
pp 981-1001, Sept 1976
[21b] D.F. Allinger and S. K. Mitter, "New results on the innovations problem for nonlinear
filtering," Stochastics, Vo14, pp 339-348, 1981
[22a] T. Kailath, "A generallikelihood-ratio formula for random signals in Gaussian noice," IEEE
Trans Inform Theory, Vol IT-15, pp 350-361, May 1969
[22b] T. Kailath, "A further note on a generallikelihood formula for random signals in Gaussian
noise," IEEE Trans on Inform Theory, Vol IT-16, pp 393-396, July 1970
[22c] T. Kailath, "The structure of Radon-Nikodym derivatives with respect to Wiener measure,"
Ann Math Stat, Vo142, pp 1054-1067, 1971
[23] M.H.A. Davis and E. Andreadakis, "Exact and approximate filtering in signal detection,"
ibid., Vol IT-23, pp 768-772, 1977
[24] LV. Girsanov, "On transforming a certain class of stochastic processes by absolutely
continuous substitution of measures," Theor Probability Appl, Vol 5, pp 285-301, 1960
[25] M. Hitsuda, "Representation of Gaussian processes equivalent to Wiener processes," Osaka
J Math, Vol 5, pp 299-312, 1968
[26] H. Kunita and S. Watanabe, "On square-integrable martingales," Nagoya Math. J, Vol 30,
pp 209-245, Aug 1967
[27] M. Fujisaki, G. Kallianpur, and H. Kunita, "Stochastic differential equations for the nonlinear
filtering problem," Osaka J Math, Vo19, pp 19-40, 1972
[28] M. Hazewinkel and J.C. Willems, eds., Stochastic systems: the mathematics of jiltering and
identification and applications, D. Reidel, 1981
[29] H. Sorenson, ed., Kaiman jiltering theory and applications, IEEE Press, New York, 1985
[30] T. Kailath, "Some extensions of the innovations theorem," Bell Syst Tech J, Vo150,
pp 1487-1494, Apr 1971
[31] R.E. Kaiman, "Linear stochastic filtering theory: reappraisal and outlook," Proc Symp
System Theory, pp 197-205, Polytechnic Inst, Brooklyn, 1965
[32] P.A. Meyer, "Sur un probleme de filtration," Seminaire de probabilites, part VII, leeture
notes in mathematics, Vo1321, pp 223-247, Springer-Verlag, New York, 1973
[33] D.L. Snyder, Random point processes, J Wiley, New York, 1975
[34] A. Segall and T. Kailath, "The modeling of randomly modulated jump processes," IEEE
Trans Inform Thy, Vol IT-21, pp 135-143, 1975. See also A. Segall, M. H. A. Davis and
T. Kailath, "Nonlinear filtering with counting observations, ibid., pp 143-149
[35] A. Segall and T. Kailath, "Orthogonal functionals of independent-increment processes," IEEE
Trans Inform Theory, Vol IT-22, pp 287-298, 1976
[36] P. Bremaud, Point processes and queues: martingale dynamics, Springer-Verlag, 1981
[37a] R.S. Liptser and A. N. Shiryaev, Statistics of random processes, Vols land II, SpringerVerlag, 1977; original Russian edition, 1974
[37b] R.S. Liptser aiid A.N. Shiryaev, Theory ofmartingales, Kluwer, Amsterdam, 1989
[38] R. Ackner and T. Kailath, "Complementary models and smoothing," IEEE Trans Automat
Contr, Vol AC-34, pp 963-969, Sept 1989
[39] P. Faurre, M. Clerget, F. Germain, Operateurs rationnels positifs, Dunod, Paris, 1979
[40] T. Kailath and P. Frost, "An innovations approach to least-squares estimation, part II: linear
smoothing in additive white noise," IEEE Trans Automat Contr, Vol AC-13, pp 655-660,
Dec 1968
87
[41] R.E. Kaiman, "Contributions to the theory of optimal control," Bol. Soc. Mat. Mexicana,
Second Ser, Vol 5, pp 102-119, 1960
[42] R. RedheITer, "On the relation of transmission-line theory to scattering and transfer," J M ath
Phys, Vol 41, p 141, 1962
[43] D.G. Lainiotis, "Partitioned estimation algorithms, 11: linear estimation," Information
Sciences, Vo17, pp 317-340,1974
[44a] D.Q. Mayne, "A solution ofthe smoothing problem for linear dynamic systems," Automatica,
Vo14, pp 73-92, 1966
[44b] D.C. Fraser and J.E. Potter, "The optimal linear smoother as a combination oftwo optimum
linear filters," IEEE Trans Automat Contr, Vol AC-14, pp 387-390, 1969
[45a] L. Ljung and T. Kailath, "Backwards Markovian models for second-order stochastic
processes," IEEE Trans Infor Thy, Vol IT-22, No 4, pp 488-491, July 1976
[45b] G. Verghese and T. Kailath, "A further note on backwards Markovian models," IEEE Trans
Inform Thy, Vol IT-25, No 1, pp 121-124, January 1979; correction, Vol IT-25, p 501, July
1979
[46] L. Ljung and T. Kailath, "A unified approach to smoothing formulas," Automatica, Vol 12,
No 2, pp 147-157, March 1976
[47a] G. Verghese, B. Friedlander, T. Kailath, "Scattering theory and linear least-squares
estimation, part 111: the estimates," IEEE Trans Auto Contr, 1980
[47b] L. Ljung, T. Kailath, B. Friedlander, "Scattering theory and linear least-squares estimation,
part I: continuous-time problems," Proc IEEE, Vol 64, No 1, pp 131-139, January 1976
[48] B. Levy, D.A. Castanon, G.C. Verghese and A. Willsky, "A scattering frame-work for
decentralized estimation problems," Automatica, Vol 19, pp 373-384, 1983
[49] B.D.O. Anderson and J.B. Moore, Optimaljiltering, Prentice-Hall, 1979
[50] J.c. Willems, "Least-squares stationary optimal control and the algebraic Riccati equation,"
IEEE Trans Automat Contr, Vol AC-16, pp 621-634,1971
[51] T. Kailath and L. Ljung, "The asymptotic behavior of constant-coefficient Riccati dilTerential
equations," IEEE Trans Automat Contr, Vol AC-21, pp 385-388, 1976
[52a] J.L. Casti, R.E. Kalaba and V.K. Murthy, "A new initial-value method for on-line filtering
and estimation," IEEE Trans Inform Theory, Vol IT-18, pp 515-518, July 1972
[52b] J. Buell, J.L. Casti, R.E. Kalaba and S. Ueno, "Exact solution of a family of matrix integral
equations for multiply-scattered partially polarized radiation," J Math Phys, Vo111,
pp 1673-1678, 1970
[53] R.E. Kalaba, H.H. Kagiwada, S. Ueno, Multiple scattering processes, inverse and direct,
Addison-Wesley, MA, 1975
[54] R.E. Bellman and G.M. Wing, Introduction to invariant imbedding, J. Wiley, New York, 1975
[55] S. Chandrasekhar, Radiative transfer, Oxford University Press, London, 1950. (Dover
Publications, New York, 1960)
[56a] S. Chandrasekhar, "On the radiative equilibrium for a stellar atmosphere. XXII (conc\uded),
Astrophysical Journal, Vo1108, pp 188-215, 1948
I56b] V.V. Sobolev, A treatise on radiative transfer, Van Nostrand Co., Princeton, NJ, 1963; Russian
original, 1956
[57] T. Kailath, R. Geesey, "An innovations approach to least squares estimation, part IV:
recursive estimation given lumped covariance functions," IEEE Trans Automat Contr,
Vol AC-16, No 6, pp 720-727, 1971
[58] T. Kailath, "Some new algorithms for recursive estimation in constant linear systems," IEEE
Trans Inform Theory, Vol IT-19, No 6, pp 750!.760, November 1973
[59a] J. Casti and O. Kirschner, "Numerical experiments in linear control theory using generalized
S-Y equations," IEEE Trans Automat Contr, Vol AC-21, pp 792-795, 1966
[59b] M. Sorine, "Sur les equations de Chandrasekhar associes au probleme de contrle d'un
systeme parabolique," C R Acad Sc, Paris, t 285, pp 863-865, 1977
[59c] J. Bums and R.K. Powers, "Factorization and reduction methods for optimal control of
hereditary systems," Mat Aplic Comp, Vol 5, No 3, pp 203-248, 1986
[60] M. Morf and T. Kailath, "Square-root algorithms for linear least squares estimation," IEEE
Trans on Autom Contr, Vol AC-20, No 4, pp 487-497, Aug 1975
[61] T. Kailath, A. Vieira and M. Morf, "Orthogonal transformation (square-root) implementat ions of the generalized Chandrasekhar and generalized Levinson Aigorithms," in Inter'l
Symp on Syst Optimization & Analysis, ed. by A. Bensoussan and J.L. Lions, pp 81-91,
Springer-Verlag, New York, 1979
88
T. Kailath
[62aJ M.S. Livsic, "Operators, oscillations, waves (Open systems)," Amer Math Soc Translations,
Vo134, 1973; Russian original, Nauka, Moscow, 1966
[62bJ M.S. Livsic and A.A. Yantsevich, "Operator colligations in Hilbert space," Nauka, Moscow,
1971; English translation, J. Wiley, New York, 1979
[63] T. Kailath, L. Ljung and M. Morf, "Generalized Krein-Levinson equations for efficient
caIculation of Fredholm resolvents of nondisplacement kerneIs," Topics in Functional
Analysis, Vo13, pp 169-184, ed. by I.C. Gohberg and M. Kac, Academic Press, New York,
1978
[64] T. Kailath, "Some new resuIts and insights in linear least-squares estimation theory," First
Joint IEEE-USSR Workshop on Inform Thy, pp 97-104, Moscow, USSR, December 1975.
(Reprinted with corrections as Appendix A in T. Kailath, Lectures in Wiener and Kaiman
Filtering, Springer-Verlag, 1981)
[65] T. Kailath, M. Morf and G. Sidhu, "Some new algorithms for recursive estimation in constant
linear discrete-time systems," Proc Seventh Princeton Conf on Inform Sei & Systs, pp 344-352,
Princeton, N.J., March 1973. See also IEEE Trans Automat Contr, Vol AC-19, pp 315-323,
Aug 1974
[66] M. Morf, "Fast algorithms for multivariable systems," Ph.D. dissertation, Dept of Elec Eng,
Standord, CA, Aug 1974
[67J P. De.wilde, A. Vieira and T. Kailath, "On a generalized Szeg-Levinson realization algorithm
for optimal linear prediction based on a network synthesis approach," IEEE Trans Circuits
and Systems, Vol CAS-25, No 9, pp 663-675, Sept 1978
[68J T. Kailath, "Signal processing applications of some moment problems," Proc of Symposia
in Appl Math, Vo137, pp 71-109, AMS Annual Meeting, short course reprinted in Moments
in Mathematics, ed. H. Landau, San Antonio, TX, January 1987
[69aJ T. Kailath and J. Chun, "Generalized Gohberg-Semencul formulas for matrix inversion,"
pp 231-246 in The Gohberg Anniversary Collection, Vol I, ed. H. Dym et al., Birkhauser,
Basel, 1989
[69bJ J. Chun and T. Kailath, "Displacement structure for Hankel, Vandermonde and related
matrices," Linear Algebra and Its Appls., 1991
[69cJ J. Chun and T. Kailath, "Divide-and-conquer solutions of least-squares problems for
matries with displacement structure," SIAM Journal of Matrix-Analysis, 1991.
[70J T. Kailath, A. Vieira, and M. Morf, "Inverses of Toeplitz operators, innovations, and
orthogonal polynomials," SIAM Review, Vo120, No 1, pp 106-119, Jan 1978
[71J I.e. Gohberg and I.A. Fel'dman, "Convolution equations and projection methods for their
solution," Amer Math Soc Translations, Vo141, 1974; Russian original, Nauka, Moscow, 1971
[72] T. Kailath and D.L. Duttweiler, "An RKHS approach to detection and estimation theory,
part III: generalized innovations representations and a Iikelihood-ratio formula," IEEE
Trans on Inform Thy, Vol IT-18, No 6, pp 730-745, Nov 1972
[73aJ T.K. Citron, A.M. Bruckstein and T. Kailath, "An inverse scattering approach to the partial
realization problem," Proc. 23rd IEEE Conference on Decision & Contr., pp 1503-1506,
Las Vegas, NV, Dec 1984.
[73bJ T. Kailath, A. Bruckstein and D. Morgan, "Fast matrix factorization via discrete transmission
lines," Linear Algebra and Its Appls, Vol 75, pp 1-25, Mar 1986
[73cJ A. Bruckstein and T. Kailath, "An inverse scattering framework for several problems in
signal processing," ASSP Magazine, Vo14, No 1, pp 6-20, Jan 1987.
[73dJ A. Bruckstein and T. Kailath, "Inverse scattering for discrete transmission-line models,"
SIAM Review, Vo129, No 3, pp 359-389, Sept 1987.
[74aJ J. Chun, Fast array algorithmsfor structured matrices, Ph.D. dissertation, Dept of Elec Eng,
Stanford University, CA, June 1989
[74bJ D. Pal, Fast algorithmsfor structured matrices with arbitrary rank profile, Ph.D. dissertation,
Dept of Elec Eng, Stanford University, CA, June 1990
[75J H. Lev-Ari, Y. Bistritz and T. Kailath, "Generalized Bezoutians and families of efficient
root-Iocation.procedures," IEEE Trans Cir and Sys, Feb 91
[76J R.E. KaIman, "On the Hermite-Fujiwara theorem in stability theory," Quart Appl Math,
Vo123, pp 279-287, 1965
[77J R.E. KaIman, "Algebraic characterization ofpolynomials whose zeros lie in certain algebraic
domains," Proc Nat Acad Sci, U.S.A., Vo164, pp 818-823, 1969
[78J R.E. KaIman, "On partial realizations, transfer functions and canonical forms," Acta
Polytechnica Scandinavica, MA 31, pp 9-32, 1979
When dynamics and real time considerations enter the picture, one is led to
the filtering problem with two main methods:
¹ Pierre Faurre.
P. Faurre et al.
dx/dt = Fx + v     (1.4)
where v is a white noise:
E[v(s)vᵀ(t)] = Qδ(s − t)     (1.5)
and the observation is
y = Hx + w     (1.6)
where w is a white noise of covariance E[w(s)wᵀ(t)] = Rδ(s − t). The optimal estimate x̂ is then given by
dx̂/dt = Fx̂ + PHᵀR⁻¹(y − Hx̂)     (1.7)
dP/dt = FP + PFᵀ + Q − PHᵀR⁻¹HP     (1.8)
where P = E[x̃x̃ᵀ], x̃ = x − x̂,
gives then in addition the performance of the filtering process.
To successfully apply the algorithm (1.7)-(1.8), called the Kalman filter, two more
considerations have to be resolved:
(a) to design the Markovian representation (1.4) for the signals or noises related
to the case under consideration.
We refer to Box-Jenkins [9], Faurre [10]-[11], and Young [12] for either
statistical or more theoretical treatments of this statistical identification
and representation problem.
(b) to implement on a digital computer the algorithm, in a stable way.
We refer to Bierman [13] who uses numerical factorization methods introduced
earlier by numerical analysts such as Golub [14] for updating least squares
normal equations when new data are added (in a certain way, Gauss [3]
previously had a similar idea).
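As an aside, the discrete-time form of the filter (1.7)-(1.8) is straightforward to implement; the sketch below uses illustrative matrices F, H, Q, R (not taken from the text) and NumPy.

```python
import numpy as np

# Discrete-time Kalman filter sketch for x(k+1) = F x(k) + v(k),
# y(k) = H x(k) + w(k); all numerical values here are illustrative.
F = np.array([[1.0, 0.1],
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)          # process noise covariance
R = np.array([[0.25]])        # measurement noise covariance

def kalman_step(x, P, y):
    # time update: propagate estimate and error covariance
    x, P = F @ x, F @ P @ F.T + Q
    # measurement update: correct with the innovation y - H x
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (y - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

rng = np.random.default_rng(0)
x_true, x_est, P = np.array([0.0, 1.0]), np.zeros(2), np.eye(2)
for _ in range(100):
    x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    y = H @ x_true + rng.normal(0.0, 0.5, size=1)
    x_est, P = kalman_step(x_est, P, y)
```

The error covariance P settles to the steady-state solution of the corresponding discrete Riccati recursion, which, as in (1.8), also quantifies the filtering performance.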
1.2 Navigation
The Navigation Problem
The navigation problem (most usually in the vicinity of the earth) consists in
determining for a vehicle in real time:
-its position
-its velocity
-its attitude or orientation
Two classes of methods can be distinguished:
(i) inertial navigation, which uses only measurements made on board
the vehicle (i.e. without any exchange with the external world) and so is
completely autonomous,
(ii) all other methods of navigation, which use exchanges of signals or measures
with the external world: celestial navigation, radio navigation, etc.
We are going to describe in more detail what the principles of inertial navigation
are.
Inertial Navigation
More basically, two time parameters are really fundamental for the geoid:
T = 2π/Ω = 24 hours (the period of the earth's rotation, of rate Ω)
Ts = 2π√(R/g) = 84 minutes (the Schuler period, R being the earth radius)
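These two constants are easy to check numerically; the rounded values of R and g below are assumptions.

```python
import math

R = 6.371e6    # mean earth radius, m (rounded)
g = 9.81       # surface gravity, m/s^2 (rounded)
OMEGA = 2 * math.pi / (24 * 3600)    # earth rotation rate, rad/s

T = 2 * math.pi / OMEGA              # earth rotation period: 24 hours
Ts = 2 * math.pi * math.sqrt(R / g)  # Schuler period: about 84 minutes
```

Ts / 60 evaluates to about 84.4, the familiar Schuler period in minutes.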
Foucault [15] was the first to implement those ideas, in 1851, with
his famous pendulum experiment and with the invention of the gyroscope.
At the beginning of the century, Sperry and Anschütz invented the gyrocompass.
However, it was during the Second World War that the complete design of an
inertial navigation system was achieved [16], determining all parameters from
inertial instruments.
With gyroscopes and accelerometers set on a "cluster" (Fig. 1.2) one gets
enough information to measure absolute orientation and specific force.
If we know a model g(r) of the gravity field, the vehicle position r can be
computed by integrating
Fig. 1.3. Principles of an INS (Inertial Navigation System): a) strapdown; b) with platform
d²r/dt² = Γ + g(r)     (1.9)
where Γ is the specific force measured by the accelerometers.
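Equation (1.9) can be integrated numerically once a gravity model g(r) is chosen; the point-mass model and the Euler step below are simplifying assumptions for illustration.

```python
import numpy as np

MU = 3.986004418e14   # earth's gravitational constant, m^3/s^2

def gravity(r):
    # point-mass gravity model g(r) = -MU r / |r|^3 (a simplification)
    return -MU * r / np.linalg.norm(r) ** 3

def ins_step(r, v, gamma, dt):
    # one Euler step of d2r/dt2 = Gamma + g(r), Gamma = measured specific force
    return r + dt * v, v + dt * (gamma + gravity(r))

# sanity check: free fall from rest at the surface (specific force is zero)
r, v = np.array([6.371e6, 0.0, 0.0]), np.zeros(3)
for _ in range(10):
    r, v = ins_step(r, v, np.zeros(3), 0.1)
```

After one second the speed is about 9.8 m/s toward the earth center, as expected; a real INS would use a better integrator and gravity model.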
it in a form suitable for Kalman filtering; see [19]. Many of the successful
applications of Kalman filters were done through a similar process.
References
[1] H.H. Goldstine, A History of Numerical Analysis from the 16th through the 19th Century,
Springer Verlag, 1977
[2] A.M. Legendre, Nouvelles Methodes pour la Determination des Orbites de Cometes, Courcier,
1806
[3] C.F. Gauss, Theoria Motus, Goettingen, 1809
[4] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, John Wiley,
1949
[5] N. Levinson, The Wiener Root Mean Square Error Criterion in Filter Design and Prediction,
Journal of Mathematics and Physics, XXV, No 4, pp 261-278, 1947
[6] J.L. Doob, The Elementary Gaussian Process, Annals of Mathematical Statistics, 15, pp 229-282, 1944
[7] R.E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Journal of Basic
Engineering, pp 35-45, 1960
[8] R.E. Kalman, New Methods in Wiener Filtering Theory, Proceedings of the 1st Symposium
on Engineering Applications of Random Function Theory, pp 270-388, Wiley, 1963
[9] G. Box, G. Jenkins, Time Series Analysis, Holden-Day, 1969
[10] P. Faurre, Realisations Markoviennes de Processus Stationnaires, Rapport LABORIA No 13,
IRIA, 1973
[11] P. Faurre, M. Clerget, F. Germain, Operateurs Rationnels Positifs. Applications a
l'Hyperstabilite et aux Processus Aleatoires, Dunod, 1979
[12] P. Young, Recursive Estimation and Time Series Analysis, Springer Verlag, 1984
[13] G.J. Bierman, Factorization Methods for Discrete Sequential Estimation, Academic Press, 1977
[14] G.H. Golub, C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 2nd
edition, 1989
[15] L. Foucault, Recueil de ses Travaux Scientifiques, Gauthier-Villars, 2 volumes, 1878
[16] C.S. Draper, W. Wrigley, J. Hovorka, Inertial Guidance, Pergamon, 1960
[17] K.J. Astrom, Some Problems of Optimal Control in Inertial Guidance, IBM Research Paper
RJ-229, San Jose, 1962
[18] L.D. Brock, G.T. Schmidt, Statistical Estimation in Inertial Navigation Systems, pp 554-558,
AIAA/JACC Guidance and Control Conference, Seattle, August 15-17, 1966
[19] P. Faurre, et al., Navigation Inertielle Optimale et Filtrage Statistique, Dunod, 1971
2.1 Introduction
SIGAL,² a family of strapdown inertial systems, has been developed to answer
the needs for miniature wideband attitude and navigation systems for missile
guidance. For reasons of size, the only sensors used in these systems are three
miniature gyro/accelerometer multisensors, instead of the three accelerometers and
two gyroscopes generally used to provide the necessary angular rate and specific
force measurements.
In the following sections, we first describe the gyro/accelerometer multisensor, which is an unbalanced Dry-Tuned Gyro (DTG), and then the design
and test results of the digital regulator and estimator using Linear Quadratic
Gaussian (LQG) control.
Due to their small size and low price, high-speed microprocessors are
increasingly used to implement regulator algorithms. This allows the implementation of sophisticated digital control loops and wider bandwidth for
guidance systems. Earlier work [1] on the design of a digital servo-control
loop for a gyro led to problems of robustness because of parameter variations.
More recently, Ribeiro [2] proposed an LQG control for a similar gyroscope,
using a simplified model. Both studies supposed that the gyro angular rate is
equal to the controller torque, which is not the case during fast variations of
the angular rate. The LQG control described in this section is based on a more
sophisticated model inspired by Craig [3] and takes into account the multivariable nature of the DTG.
In contrast to previous studies, a stochastic model of the angular rate and linear
acceleration of the system is used instead of taking these entries proportional
to controller torques [2], and the bandwidth of the estimator is separate from,
and higher than, the bandwidth of the closed loop.
[Figure: cutaway view of the gyro/accelerometer multisensor — rotor, ring magnet, pickoff coil, gyro case, motor stator, motor rotor.]
Two magnetic torquers, oriented along these gyro sensing axes, apply a
moment on the rotor represented in case-fixed coordinates by Hwc(t), where
H is a 2 x 2 constant matrix.
The gyro case motion is defined by its absolute angular rate w(t) and its
specific force, I(t), defined as the difference between the absolute linear
acceleration and the gravity acceleration. Both entries are resolved along the
case-fixed coordinates.
Because of the gas contained inside the case, an aerodynamic moment as well
as a damping moment act on the rotor.
The same method as the one used in Craig [3], [7] for the case of an
unbalanced gyro can be used to obtain the following simplified equation of
motion [4]:
(2.2)
with
N    rotor speed,
ν    nutation frequency,
Ix, Iy, Iz    principal moments of inertia of the rotor,
A, B, C    principal moments of inertia of the gimbal,
I = ½(Ix + Iy + B),
P    pendulosity,
Kψ = (kx + ky − (A + B − C)N²)/2IzN     (2.3)
instead of wc.
The corresponding discrete-time system is controllable, considering u as the
control, and observable, considering y as the observation. So the discrete
algebraic Riccati equation leading to the optimal gain L has one and only one
positive semidefinite solution. The control can be computed as:
u(k) = − L xc(k)     (2.4)
In order to solve the estimation problem, a random drift is chosen for the
stochastic model of w(k):
w(k + 1) = w(k) + v(k)     (2.5)
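The uniqueness claim for the discrete algebraic Riccati equation can be illustrated by iterating the Riccati recursion to its fixed point; the matrices below, including a random-walk state as in (2.5), are invented for the sketch.

```python
import numpy as np

# Iterate P -> F (P - P H'(H P H' + R)^-1 H P) F' + Q to the unique
# positive semidefinite fixed point; system matrices are illustrative.
F = np.array([[0.9, 1.0],
              [0.0, 1.0]])        # second state: a random walk as in (2.5)
H = np.array([[1.0, 0.0]])
Q = np.diag([0.0, 0.01])
R = np.array([[0.1]])

P = np.eye(2)
for _ in range(500):
    S = H @ P @ H.T + R
    P = F @ (P - P @ H.T @ np.linalg.inv(S) @ H @ P) @ F.T + Q

K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # steady-state filter gain
```

Observability of (F, H) and the noise driving the random-walk state guarantee convergence to the unique stabilizing solution.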
w is appended to the state to form an augmented state. The new system is still
observable using y and controllable using v. Therefore the discrete algebraic
Riccati equation leading to the optimal filter gain K has one and only one
positive semidefinite solution, and the steady-state Kalman filter is
x̂(n + 1) = Φx̂(n) + K[y(n) − Hx̂(n)]
Now the LQG control problem defined by (2.2) to (2.5) can be solved, using the
control and filter gains L and K. The system with state x, output y and input
wc is observable but not stabilizable, but the loss function is stabilizable;
therefore it can be minimized [6]. It can also be shown that the optimal control
is
wc(n) = − [L, −1] x̂(n)
with integral action, because of the model (2.5) chosen for w. A block diagram
of the system is shown in Fig. 2.3. A notch filter is added at the frequencies
N, 2N, 3N, ... because of a periodic noise on the pickoff outputs. Then the LQG
control is generalized to the case of the three unbalanced gyroscopes of the
SIGAL inertial guidance system used in agile missiles.
Figure 2.4 shows the relations between the reference frame of each gyroscope
and the reference frame of the system. It can be noticed that: the balanced gyro
3 is only sensitive to wy and wz, the absolute angular rates of the vehicle along
the y and z axes; the unbalanced gyro 2 is sensitive to wx, wz, and to Ix, Iz, components
of the specific force; the inverted unbalanced gyro 1 is sensitive to wx, wz, Ix, Iy,
with a pendulosity coefficient opposed to the pendulosity coefficient of the
second gyro.
Therefore the whole system can detect w and I along each reference axis.
[Fig. 2.3: block diagram — state feedback and global estimator. Fig. 2.4: reference frames of the inverted unbalanced gyro 1, the unbalanced gyro 2 and gyro 3.]
[Figure: measured (solid) and simulated (dashed) frequency response ŵx/wx — magnitude in dB and phase in degrees versus frequency (log scale).] The effect of the notch filters at 250 Hz can be seen, and a 140 Hz bandwidth is shown.
This is twice the bandwidth of conventional implementations, where the
estimate ŵ is taken to be wc, thus limiting the bandwidth of the estimator to
that of the control loop. In the proposed method these two bandwidths are
distinct: the stabilization error is used to improve the estimation. On the Black plot
of the open-loop response (Fig. 2.7) a gain margin of 6 dB, a phase margin of
40 degrees and a closed-loop resonance of 3 dB can be measured. This Black plot
corresponds to the x output of one of the 3 gyros. It can be noticed that the
closed-loop resonance is greater than the estimation resonance. Compared to
classical control, stiffness is three times higher and the sensor can stand angular
accelerations up to 6000°/s². This is made possible because the control loop
can be tuned without detuning the estimator.
In conclusion, the LQG control method, together with an accurate model,
is, as expected, of great use for solving such a multivariable estimation and
control problem.
An LQG control for the non-stationary model (2.1) is currently under study.
The concept could result in a single instrument providing both measurement
of angular rate and specific force along two axes.
References
[1] G.K. Steel, S.N. Puri, Direct digital control of dry tuned rotor gyros, Pergamon Press, Oxford,
Automatic Control in Space, Vol 2, pp 79-85, 1980
[2] J.F. Ribeiro, A LQG regulator for the angular motion of the rotor of a tuned gyroscope, Instituto
de Pesquisas Espaciais, Sao Jose dos Campos (Brazil), INPE-4280, PRE/1152, August 1987
[3] R.J.G. Craig, Theory of operation of an elastically supported tuned gyroscope, IEEE Transactions
on Aerospace and Electronic Systems, Vol AES-8, No 3, pp 280-288, May 1972
[4] P. Constancis, M. Sorine, Wideband linear quadratic gaussian control of strapdown dry tuned
gyro/accelerometers, AIAA Guidance, Navigation and Control Conference, Boston MA, Paper
No 89-3441, pp 141-145, August 1989
[5] H. Shingu, Study on non-interacting control of dynamically tuned dry gyro, Trans. Soc. Instrum.
and Control Eng. (Japan), Vol 20, No 6, pp 554-560, 1984
[6] M. Sorine, Sur l'équation de Riccati stationnaire associée au problème de contrôle d'un système
parabolique, Comptes Rendus de l'Académie des Sciences, t. 287, Série A, September 1978, p 445
[7] R.J.G. Craig, Dynamically tuned gyros in strapdown systems, NATO-AGARD Conference
Proceedings No 116 on Inertial Navigation, Components and Systems, AD-758127, Paris,
February 1973
3.1 Introduction
The line-of-sight stabilization error of a pointing and tracking system caused
by response to gimbal bearing friction torque is often of sufficient magnitude
to be the object of an intense design effort [1]. This torque acts on the stabilized
member of the system's gimbal as a function of relative angular motion between
that member and the gimbal's base. It is counteracted in conventional systems
by the torque motor of a stabilization feedback loop. A gyroscope mounted on
the stabilized member is used in the feedback loop. This loop produces corrective
motor torque as a function of error measured by the inertial sensor. Feedback
operation reduces friction-related errors. Nevertheless, precision is limited by
loop stability which bounds feedback gain. Stabilization errors are then often
unacceptably large.
When friction torque can be accurately predicted in real-time, it is possible
to improve precision by using a feedforward compensation of this torque before
its effect is measured by the feedback sensor. In that case stabilization error is
no longer a function of the full friction torque but only of the mismatch between
actual and predicted friction torques. In addition, it is possible that the coefficients of the model vary with temperature, time and operating conditions.
This motivates making the friction compensation adaptive.
The detailed knowledge of friction behavior necessary to achieve accurate
real-time modeling has been improved recently [2], [1]. This is particularly
true concerning the transient behavior of friction caused by relative motion
reversals of the system's gimbal members. Characterization of damping in solid
friction oscillators is given by Dahl's model which behaves as Coulomb's
model for large amplitudes and as viscous and structural damping for medium
and small amplitudes.
Adaptive friction compensation has been considered before. A feedforward
compensator adapted with model reference techniques and based on the
'Coulomb/stiction' model has been used [3]. Canudas, Åström and Braun [4]
have proposed an adaptive scheme with explicit identification of Coulomb's
model. This scheme uses the a priori information available, i.e. the structure of
the nonlinearity and the knowledge of some of the parameters.
Here we propose an adaptive friction compensator whose implementation
is based on Dahl's model. The friction models proposed in the literature are
briefly discussed in Sect. 3.2. In Sect. 3.3 we describe the indirect adaptive
friction compensator, built around an Extended Kalman Filter.
dC/dt = σ(dα/dt)(1 − (C/Cmax) sgn(dα/dt))     (3.1)
with
C       friction torque,
Cmax    maximum friction torque,
α       relative gimbal angle,
σ       model parameter.
This model is probably the most accurate at describing friction transient behavior
during reversals, but it does not take stiction into account. A mathematical study
of Dahl's model has been done by Bliman [5]; in particular, the following result
[Figure: friction torque as a function of dα/dt, with levels ±Cc and ±Cs.]
The stabilization angle δ, governed by (3.2),
is stabilized to zero with a feedback control Cm. To reduce the effects of the
friction terms by a nonlinear compensation, it is necessary to obtain an estimate
Ĉ of the friction torque C.
Assume that y is a noisy measurement of δ̇ (output of a gyroscope):
y = δ̇ + v     (3.3)
and let the parameter vector of the friction model be
θ = [σ, Cmax]ᵀ     (3.4)
Now, (3.1), (3.2) and (3.5) are state equations of a system observed through (3.3).
When δ is small, α̇ is close to the angular rate of the base of the pointing and
tracking system and can be estimated separately.
Fig. 3.2. Stabilization angle error without compensation
Fig. 3.3. Stabilization angle error with compensation
measured to be 40 arc minutes peak-to-peak. Figures 3.2 and 3.3 show the
stabilization error without and with compensation. The peak-to-peak error
is shown to be divided by 3 using compensation (80 arc seconds peak-to-peak
to 27 arc seconds peak-to-peak). The friction torque estimate Ĉ is shown in
Fig. 3.4.
References
[1] C.D. Walrath, Adaptive Bearing Friction Compensation Based on Recent Knowledge of Dynamic
Friction, Automatica, Vol 20, No 6, pp 717-727, 1984
[2] P.R. Dahl, Solid Friction Damping of Spacecraft Oscillations, Paper No 75-1104, AIAA Guidance
and Control Conference, Boston, Mass., August 1975
[3] Gilbart, Winston, Adaptive Compensation for an Optical Tracking Telescope, Automatica,
Vol 10, pp 125-131, 1974
[4] C. Canudas, K.J. Åström, K. Braun, Adaptive Friction Compensation in DC Motor Drives,
IEEE Journal of Robotics and Automation, Vol RA-3, No 6, December 1987
[5] P.A. Bliman, Étude mathématique d'un modèle de frottement sec: le modèle de P.R. Dahl, Thèse
de Docteur en Sciences, Université Paris IX, to be published
[Fig. 4.1: GPS geometry — satellite at position Rs (known) and user antenna at position R (unknown), both referred to the earth center. Below: block diagram of the GPS receiver — antenna, preamplifier, RF module, a digital section with parallel channels, and the processor interface.]
GPS Measurements
The code phase, or pseudo-range p, allows measuring the whole time difference
between the transmit time and the received time. The measurement p is related
to the user-to-satellite distance ρ by (Fig. 4.1):
p = ρ + c(bs − br) + dion + dtrop + ds + wp,  where ρ = |Rs − R|     (4.1)
with
Rs = [Xs, Ys, Zs]ᵀ   satellite's position in earth-centered earth-fixed coordinates, as given by the navigation data message,
R = [X, Y, Z]ᵀ       user's position in the same reference frame,
c                    speed of light in vacuum,
bs                   satellite clock offset from GPS time,
br                   receiver clock offset from GPS time,
dion, dtrop          ionospheric and tropospheric propagation errors,
ds                   error in the transmitted satellite position,
wp                   code tracking error of the receiver.
The carrier phase, or pseudo-range rate φ, can only be measured relative to
the time when the loop has locked onto the incoming signal. The measurement
φ is related to the satellite-to-user distance ρ by:
φ = ρ + φ0 + c(bs − br) − dion + dtrop + ds + wφ     (4.2)
where φ0 is the unknown offset at lock-on.
GPS-related errors b_s and d_s are usually small, from 1 to 5 meters. Propagation residual errors d_ion and d_trop vary from 5 to 30 meters. Receiver clock errors are much greater and must be evaluated to compute a good position.
The code tracking error and the carrier tracking error are of different magnitudes because of the corresponding wavelengths. Code resolution is of the order of 1 to 30 meters, depending on the type of code tracked and the receiver design. Carrier resolution is of the order of 1 to 3 centimeters. Consequently a good design must take advantage of the accurate but relative carrier phase measurement and of the absolute but less accurate code measurement.
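A classical way to combine the two kinds of measurement (a sketch of one standard technique, not necessarily the design described here) is carrier smoothing of the code, often called a Hatch filter: the smoothed pseudo-range keeps the absolute level of the code while inheriting the low noise of the carrier increments, and the unknown carrier offset φ_0 cancels in the differences:

```python
import numpy as np

def hatch_filter(code, carrier, n=100):
    """Carrier-smoothed pseudo-range (classical Hatch filter).

    code:    absolute but noisy code pseudo-ranges (meters)
    carrier: relative but very low-noise carrier phase ranges (meters)
    n:       smoothing window (epochs)
    """
    smoothed = np.empty_like(code)
    smoothed[0] = code[0]
    for k in range(1, len(code)):
        # propagate the previous estimate with the precise carrier increment,
        # then blend in a small fraction of the absolute code measurement
        predicted = smoothed[k - 1] + (carrier[k] - carrier[k - 1])
        smoothed[k] = code[k] / n + predicted * (n - 1) / n
    return smoothed

# Synthetic check: constant-velocity range, 3 m code noise, 1 cm carrier noise.
rng = np.random.default_rng(0)
t = np.arange(600.0)
true_range = 2.0e7 + 800.0 * t
code = true_range + rng.normal(0.0, 3.0, t.size)
carrier = true_range + 123.4 + rng.normal(0.0, 0.01, t.size)  # 123.4 = unknown phi_0
sm = hatch_filter(code, carrier)
```

With a 100-epoch window the code noise is reduced by roughly a factor of ten, while the non-ambiguous character of the code is preserved.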
After the satellite clock and propagation corrections, the code phase measurement reduces to

p = ρ − c b_r + w_p

which is a non-linear function of the vehicle's position and must be linearized around the position estimate. Then the code phase observation model is:

δy_p = p − ρ̂ = −u^T δR − c δb_r + w_p

Similarly, over the sampling interval Δt, the carrier phase observation model is:

δy_φ = Δφ/Δt − Δρ̂/Δt = δρ̇ − c δf_r + w_φ

where δf_r is the receiver clock frequency error.
The vehicle dynamics are modeled by a third-order Markov model:

Ṙ = V
V̇ = A
Ȧ = −(1/τ) A + ν
with
R = [X, Y, Z]^T  the position in earth-centered earth-fixed coordinates,
V = [V_x, V_y, V_z]^T  the velocity in the same coordinates,
A = [A_x, A_y, A_z]^T  the acceleration in the same coordinates,
ν  a white noise.
The complete model is then of the form

δẊ = F δX + ν
δY = H δX + w

with

δX = [δX, δY, δZ, δV_x, δV_y, δV_z, δA_x, δA_y, δA_z, δb_r, δf_r]^T
δY = [(p − ρ̂), (Δφ/Δt − Δρ̂/Δt)]^T
This non-stationary 11-state model is observable if measurements to 4 or more satellites can be made.
It can be seen that processing 4 or more simultaneous measurements of different satellites will give an estimate of position and velocity at measurement time independent of vehicle dynamics. Of course, the best result is reached when simultaneous measurements are taken from all visible satellites, which is sometimes called the "All-in-View" concept.
If less than 4 channels are used, the filter will rely on the vehicle dynamics model to correlate measurements made at different times. Consequently the estimates will be affected not only by measurement noise but also by the propagation of state noise between measurements. In these conditions, time differences
between satellite sampling times, level of vehicle dynamics, and receiver clock
stability are critical issues.
For this reason, when dealing with high dynamics, four or more channels
are generally required. For lower dynamics, one or two channel receivers can
be used, depending on their channel sampling time and clock stability.
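To make the geometry of simultaneous measurements concrete, here is a minimal numerical sketch (illustrative, not the receiver software) of the instantaneous position and clock solution from four pseudo-ranges, using the same linearization −u^T δR − c δb_r as above and the sign convention p = ρ − c b_r:

```python
import numpy as np

def position_fix(sat_pos, pseudo_ranges, iters=20):
    """Gauss-Newton estimate of [X, Y, Z, c*b_r] from >= 4 pseudo-ranges.

    Measurement model (document convention): p = rho - c*b_r + w_p, with
    satellite clock and propagation terms assumed already corrected."""
    x = np.zeros(4)  # position (m) and clock term c*b_r (m)
    for _ in range(iters):
        diff = sat_pos - x[:3]                  # user-to-satellite vectors
        rho = np.linalg.norm(diff, axis=1)      # predicted geometric ranges
        u = diff / rho[:, None]                 # unit line-of-sight vectors
        H = np.hstack([-u, -np.ones((len(rho), 1))])  # Jacobian rows [-u^T, -1]
        dy = pseudo_ranges - (rho - x[3])       # pseudo-range residuals
        x = x + np.linalg.lstsq(H, dy, rcond=None)[0]
    return x

# Synthetic check: four satellites, user near the Earth's surface,
# receiver clock offset of 1 ms (about 3e5 m in range units).
C = 299_792_458.0
sats = np.array([[2.0e7, 0.0, 0.0], [0.0, 2.0e7, 0.0],
                 [0.0, 0.0, 2.0e7], [1.2e7, 1.2e7, 1.2e7]])
truth = np.array([6.37e6, 1.0e5, 2.0e5])
cbr = C * 1.0e-3
pr = np.linalg.norm(sats - truth, axis=1) - cbr
est = position_fix(sats, pr)
```

With four or more satellites in view, this snapshot solution needs no dynamics model at all, which is exactly the "All-in-View" situation described above.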
References
[1] D.E. Wells, et al., Guide to GPS Positioning, Canadian GPS Associates, Fredericton, N.B., Canada, 1986
[2] J.J. Spilker, GPS Signal Structure and Performance Characteristics, Journal of the Institute of Navigation, Vol 1, pp 29-54, 1980
[3] R.P. Denaro, P.V.W. Loomis, GPS Navigation Processing and Kalman Filtering, AGARD No 161, pp 11.1-11.9, 1989
[4] L. Camberlein, B. Capit, Uliss G, a Fully Integrated "All-in-one" and "All-in-View" Inertia-GPS Unit, IEEE PLANS 1990 Symposium, to appear
[5] J. Ashjaee, On the Precision of the C/A Code, IEEE PLANS 86 Proceedings, pp 214-217
5.1 Introduction
To achieve the same navigation accuracy as gimballed inertial navigation systems (gimballed INS's), strapdown inertial navigation systems (strapdown INS's) require better gyros and accelerometers and, therefore, a better calibration accuracy.
A 1 Nm/h-1 m/s class strapdown INS for fighter aircraft typically requires gyro stabilities and calibration accuracies of 0.005 °/h for drifts, 5 ppm for scale factors and 15 microradians for misalignments. Accordingly, it requires for its accelerometers stabilities and calibration accuracies of 50 ppm for scale factors, 40 micro-g for bias and 15 microradians for misalignments.
For transport aircraft and commercial aviation, numbers 2 to 3 times larger are usually satisfactory due to lower flight dynamics and to less stringent velocity accuracy needs. Except for gyro drifts, most of these values are about or more than an order of magnitude lower than what is required for a gimballed INS for the same type of application.
Usually, the calibration of an inertial navigation system consists in comparing, while the system is under test, the gyroscopes' outputs with the Earth rate and the accelerometers' outputs with the local gravity. This is done for different angular positions with respect to the local geographic axes by using an accurate turntable on which the cluster is rigidly fastened.
However, in the particular case of ring laser gyro strapdown inertial navigation systems, this method cannot be used because the elastic isolation of the cluster (on which the laser gyros and accelerometers are mounted) must not be clamped. Therefore, a specific calibration method has to be used.
δV̇ = T_p/m E_a + g × φ,   δV(0) = δV_0   (5.1)
φ̇ = −T_p/m E_g,   φ(0) = φ_0   (5.2)
Ẋ_a = 0,   X_a(0) = X_a0   (5.3)
Ẋ_g = 0,   X_g(0) = X_g0   (5.4)

Trademark of a family of ring laser gyro inertial navigation systems developed by SAGEM.
with

E_a = F_a X_a + v_a   (5.5)
E_g = F_g X_g + v_g   (5.6)

and where

δV  is the velocity error vector in geographic axes,
φ  is the small angle between the analytical platform and the geographic axes (it is the attitude alignment error),
X_a  is the accelerometer error state vector,
X_g  is the gyro error state vector,
E_a  is the accelerometer total error in cluster axes, a linear function of X_a,
E_g  is the gyro total error in cluster axes, a linear function of X_g,
T_p/m  is the rotation matrix between the analytical platform axes [p] and the cluster (or measurement) axes [m] (it is computed in the INS by integration using the gyro outputs),
f_a  is the specific force as measured by the accelerometers.
These equations are gathered in the canonical state form

δẊ = F δX + v,   δX(0) = δX_0   (5.7)

where

δX = [δV, φ, X_a, X_g]^T
F  is the corresponding dynamics matrix,
v  is the process noise,

together with the observation equation

δY = H δX + W   (5.8)

where

δY  is the observation vector, built from the INS standard navigation outputs,
H  is the observation matrix,
W  is the measurement noise.

Written in the canonical form of equations (5.7) and (5.8), calibration becomes a filtering problem which can be solved by Kalman filtering techniques, assuming all the errors are observable. The observability depends on the gyro and accelerometer error models and on the sequence of angular positions of the cluster.
[Fig. 5.1. Calibration simulation: typical recovered errors and Kalman filter estimated errors versus time and cluster angular position sequence. Top panel: identified values as functions of time and angular position; bottom panel: standard deviations of calibration errors.]
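Once the problem is in the canonical form (5.7)-(5.8), it is handled by the standard discrete Kalman recursion; a generic minimal sketch (the matrices below are illustrative placeholders, not the actual calibration models):

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle for x[k+1] = F x[k] + v, z[k] = H x[k] + W."""
    # time update (prediction)
    x = F @ x
    P = F @ P @ F.T + Q
    # measurement update
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Toy example: a constant sensor bias (the state) observed directly in noise,
# the simplest instance of the constant-error states (5.3)-(5.4).
rng = np.random.default_rng(1)
F = np.array([[1.0]]); H = np.array([[1.0]])
Q = np.array([[0.0]]); R = np.array([[0.04]])
true_bias = 0.7
x, P = np.array([0.0]), np.array([[1.0]])
for _ in range(200):
    z = np.array([true_bias + rng.normal(0.0, 0.2)])
    x, P = kalman_step(x, P, z, F, H, Q, R)
```

The shrinking covariance P is exactly the filter-estimated calibration accuracy whose simulated behaviour is shown in the bottom panel of Fig. 5.1.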
Calibration Validation
An accurate simulation of a ring laser gyro INS has been used to extensively validate the accuracy of this calibration technique. This simulation takes into account all the errors modeled (accelerometer scale factor errors, misalignments and bias, and gyro scale factor errors, misalignments and drifts).
Simulation results are illustrated in Fig. 5.1 and Table 5.1. The typical evolution of recovered errors and Kalman estimated errors as a function of time and angular position is given in Fig. 5.1. Typical simulated and recovered errors as well as Kalman filter estimated and true errors can be compared in Table 5.1. The accuracy and the consistency of these results may be noticed.
Table 5.1 Typical calibration results. SF = scale factor, B = bias, M = misalignment, G = gyro, A = accelerometer.
[For each error source (SFGX/SFGY/SFGZ in ppm, BGX/BGY/BGZ in 0.001 °/h, gyro misalignments in µrd, SFAX/SFAY/SFAZ in ppm, BAX/BAY/BAZ in µg, MAYX/MAZX/MAZY in µrd) the table lists the simulated error, the recovered error, the Kalman filter estimated error and the Kalman filter true error. The recovered errors match the simulated ones closely, e.g. recovered 101/201/299 against simulated 100/200/300, within Kalman filter estimated accuracies of a few units.]
5.5 Conclusion
This calibration technique for ring laser gyro systems features several definite advantages compared to conventional methods:
-high calibration accuracy,
-use of a low-cost two-axis turntable,
-fast and fully automatic operation,
-use of INS standard navigation outputs, velocity and attitude,
-great flexibility, easy modification of the calibration procedure sequence without any reprogramming,
-real time implementation providing calibration data as well as an estimate of their accuracy.
References
[1] L. Camberlein, F. Mazzanti, Calibration Technique for Laser Gyro Strapdown Inertial Navigation Systems, Symposium Gyro Technology Proceedings, pp 5.1-5.12, Universität Stuttgart, September 1985
[2] J. Mark, D. Tazartes, T. Hildy, Fast Orthogonal Calibration of the Ring Laser Strapdown System, Symposium Gyro Technology Proceedings, pp 13.0-13.21, Universität Stuttgart, 1986
[3] P. Faurre, et al., Navigation Inertielle Optimale et Filtrage Statistique, Dunod, 1971
[Fig. 6.1. "ALIDADE", at-sea alignment of the carrier-based Super-Etendard of the French Navy: infrared transmission link between the aircraft and the carrier reference system.]
V̇ = T_g/p f_p + g_p − A(ρ_g + 2Ω_g) V,   V(0) = V_0   (6.1)
φ̇ = T_g/p ω_p − ω_g + ν_d,   φ(0) = φ_0   (6.2)
(sin φ_z)˙ = ν_s   (6.3)
(cos φ_z)˙ = ν_c   (6.4)

with

V  the velocity of the aircraft inertial system,
φ = [φ_x, φ_y]^T  the inertial platform vertical angle errors (assumed to be small angles),
T_g/p = Z(−φ_z)(I + A(φ))  the projection on the x-y plane of the rotation matrix from the platform frame [p] to the [g] frame, where Z is a rotation matrix and A(φ) corresponds to a cross product operator,
φ_z  the inertial platform azimuth error (wide angle),
f_p = [f_x, f_y, f_z]^T  the specific force measured by the aircraft platform accelerometers,
g_p  the gravity vector.
The gyro drifts are assumed to be small and almost negligible. They are taken into account by ν_d, ν_s, ν_c, uncorrelated zero mean Gaussian white noises.
The velocity output of the aircraft inertial system is V + δV, where δV is the velocity error.
V_R = V + ω_b × l + W   (6.5)
V_R = V + T_g/b A(ω_b) L_b + W   (6.6)
V_R = V + B(ω_b) L_b + W   (6.7)

with

L̇_x = 0,  L̇_y = 0,  L_x(0) = L_x0,  L_y(0) = L_y0   (6.8)
and where

V_R  is the carrier reference velocity output,
V  is the true velocity at the aircraft inertial system location,
T_g/b  is the coordinate transformation matrix from the carrier frame [b] to [g],
ω_b  is the carrier angular rate in frame [b], provided by the carrier reference,
L_b = [L_x, L_y, L_z]^T  is the lever arm vector l in frame [b]; L_z is constant and known, L_x and L_y are unknown, constant for a specific alignment but random for different alignments,
W  represents the measurement error and the small movement perturbations, which are modeled as a zero mean Gaussian white noise.
[Fig. 6.2. Block diagram of the at-sea alignment: the carrier reference system and its navigation filter provide the observation δy to the alignment filter, which updates the aircraft INS platform, position and attitude.]
Ẋ = f(X) + v   (6.9)
Y = H X + W   (6.10)

with

X = [V_x, V_y, φ_x, φ_y, sin φ_z, cos φ_z, L_x, L_y]^T   (6.11)
Y = [V_Rx, V_Ry]^T   (6.12)
The alignment problem, i.e. the estimation of the errors in the state vector components of the dynamical model, is solved by an extended Kalman filter developed about the current estimate. Once these errors are estimated the aircraft INS is updated by correcting its states (Fig. 6.2). This implementation is robust because of the very weak non-linearity of the model. The Alidade at-sea alignment takes between 6 and 10 minutes, depending on the required accuracy, and thus is twice as fast as conventional methods. In addition to the alignment Kalman filter, two other filters were also developed for "Alidade", one for the hybrid navigation of the carrier reference system and one for estimating the carrier angular rate. All three have been fully operational for more than 10 years.
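To illustrate the lever-arm part of the model: L_x and L_y enter the observation (6.5) linearly through the cross product with the measured carrier angular rate, so with the ship's angular motion providing excitation their estimation reduces to a small linear problem. An illustrative sketch (all values invented, not the Alidade implementation):

```python
import numpy as np

# Recover the unknown lever-arm components Lx, Ly from the velocity
# difference V_R - V = omega_b x l + W of (6.5); Lz is assumed known.
rng = np.random.default_rng(2)
L_true = np.array([8.0, -3.0, 5.0])   # meters; Lz = 5.0 known
Lz = L_true[2]

rows, rhs = [], []
for k in range(200):
    t = 0.1 * k
    # carrier angular rate: roll/pitch/yaw motion of the ship (rad/s)
    w = np.array([0.05 * np.sin(0.8 * t), 0.03 * np.cos(0.5 * t), 0.01])
    y = np.cross(w, L_true) + rng.normal(0.0, 0.002, 3)  # measured V_R - V
    # omega x l is linear in (Lx, Ly): move the known Lz part to the right side
    A = np.array([[0.0, -w[2]],
                  [w[2], 0.0],
                  [-w[1], w[0]]])
    b = y - np.array([w[1] * Lz, -w[0] * Lz, 0.0])
    rows.append(A); rhs.append(b)

A = np.vstack(rows); b = np.concatenate(rhs)
Lxy_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
```

The oscillating roll and pitch rates are what make the two components observable; a motionless carrier would leave the lever arm unobservable.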
References
[1] L. Camberlein, J.P. Paccard, M. de Cremiers, "Alidade", Optimisation Coût-Performance de l'Alignement d'un Système Inertiel sur Porte-Avions, AGARD Symposium Proceedings, pp 63.1-63.18, May 1979
[2] L. Camberlein, Evolutions Techniques de la Navigation par Inertie pour Avion, Revue Navigation de l'Institut Français de Navigation, pp 287-310, July 1979
[3] P. Faurre, et al., Navigation Inertielle Optimale et Filtrage Statistique, Dunod, 1971
[4] C.T. Leondes, ed., Theory and Applications of Kalman Filtering, AGARDograph No 139, February 1970
[5] A.A. Sutherland Jr, The Kalman Filter in Transfer Alignment of Inertial Guidance Systems, Journal of Spacecraft, Vol 5, No 10, October 1968
7.1 Introduction
Much has been said and written about the exceptional complementarity of GPS and INS, about the several possible levels of hardware and software integration, and about the corresponding performance/cost and synergy efficiency trade-offs. These subjects, therefore, will not be discussed in detail, since they are covered in many references, as in [3], [4], [7] and [10].
In brief, the exceptional complementarity comes from the very high long term accuracy of GPS x, y, z position and velocity, and from the very high short term accuracy and high pass-band of INS x, y, z position, velocity and acceleration. It also comes: 1) from their equally remarkable coverage all over the world 24 hours a day, 2) from INS being self-contained and insensitive to jamming, and GPS not, 3) from INS providing attitude and GPS time.
The several possible levels of hardware and software integration range from separate standalone INS and GPS units, to complete hardware and software integration in a single unit where a single adaptive Kalman filter would control the GPS channels as well as the hybrid navigation. Between these two extremes, synergy varies from its minimum to its maximum, although the latter extreme, as most ideals, may very likely turn out to be impractical to implement.
The performance/cost and synergy efficiency trade-offs usually bear on:
-aircraft installation constraints, i.e. number of units, size, weight, power, bus load, more or less difficult validation and testing,
-performance, i.e. accuracy, resistance to jamming, to attitude maneuvers (masking), to dynamics (high g's) and GPS shortages, rapid inertial in-flight or at-sea alignment capability,
-integrity monitoring, reliability, robustness, redundancy, stability of the hybrid navigation solution.
The subject of this section is to present the ULISS Inertia-GPS Multisensor unit (16 kg). The ULISS small size and light weight "all-in-one" unit demonstrates the remarkable INS and GPS hardware synergy achieved, compared to that of separate INS and GPS units.

[Fig. 7.2. The RF and digital module of the 8 channel embedded GPS receiver, totaling only 0.5 l, 0.5 kg and 9 W]

…results for the embedded GPS receiver in small size (0.5 liters), weight (0.5 kilograms) and power dissipation (9 watts).
Inertial Aiding of the Embedded GPS Receiver
The inertial aiding of the embedded receiver is twofold: fast acquisition at receiver turn-on and re-acquisition after loss of lock in case of attitude maneuvers (masking) or jamming, and improvement of the receiver resistance to jamming and to high dynamics.
a) Fast acquisition and re-acquisition are achieved by initializing the states of the code and carrier phase tracking loops with values computed from the vehicle's position and velocity given by pure inertia and from the satellite ephemeris and current time.
b) Improvement of the receiver jamming and high g's resistance is done by tight inertial dynamic aiding of the code and carrier phase tracking loops, which allows bandwidth reduction.
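The benefit of b) can be illustrated with a toy first-order tracking loop: when the loop's NCO is driven by the inertially derived range rate, a narrow-bandwidth loop still tracks through high dynamics, whereas the unaided loop builds up a large error (all numbers illustrative):

```python
import numpy as np

dt, K = 0.01, 2.0            # loop update interval (s), narrow loop gain (1/s)
accel = 50.0                 # line-of-sight acceleration, m/s^2 (high dynamics)
t = np.arange(0.0, 5.0, dt)
true_range = 0.5 * accel * t**2

def track(aiding_rate):
    """First-order code/carrier tracking loop with external rate aiding."""
    est, out = 0.0, []
    for k in range(len(t)):
        err = true_range[k] - est          # discriminator output
        out.append(err)
        # NCO command = inertial rate aiding + narrow-band loop correction
        est += dt * (aiding_rate[k] + K * err)
    return np.array(out)

inertial_rate = accel * t * 1.001          # INS-derived rate, 0.1% scale error
err_unaided = track(np.zeros_like(t))
err_aided = track(inertial_rate)
```

The aided loop only has to follow the small residual rate left over by the INS error, so its narrow bandwidth (better jamming rejection) costs almost nothing in dynamic tracking error.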
[Figure: hybrid navigation block diagram. Pure and aided GPS (position, velocity, time), TERCOR terrain correlation (terrain mass memory, baro altitude) and other position updates pass through tests, selection and prefiltering before the hybrid navigation Kalman filter, which combines them with pure inertia navigation (position, velocity, attitude) to produce the hybrid navigation outputs and the alignment/navigation modes.]
Given the real system accelerometer measurements f_a and the inertial platform attitude angle error φ, it can be shown [10] that the equations of perfect inertial navigation are, in the true frame [v]:

Ż = V_z, …

where [v], [t], [p] are the coordinate frames used, P_v the position, V the velocity, φ the platform attitude error, f_a the measured specific force, E_a and E_g the accelerometer and gyro total errors, g_p(L, G, Z) the gravity vector as a function of latitude, longitude and altitude, and Ω the Earth rotation rate.
These equations of perfect inertial navigation contain implicitly all accelerometer and gyro errors. In practice only the most significant and observable accelerometer and gyro errors are modeled, i.e. typically:

ḣ_z = ν_z
ḋ = ν_d

where ν_z and ν_d are uncorrelated zero mean Gaussian white noises.
The resulting set of equations of perfect inertial navigation is non-stationary and non-linear. It is linearized about the best estimate to provide the inertial navigation error model used by the hybrid navigation extended Kalman filter.
Model of the Embedded GPS Receiver
The embedded GPS receiver provides two kinds of very accurate measurements: the code phase and the carrier phase rate to each satellite, also called pseudo-range and pseudo-range rate. These measurements are used as observations by the Kalman filter. The corresponding observation models, linearized about the current estimate, are the same as those given in Sect. 4.2:

δy_p = p − ρ̂ = −u^T δR − c δb_r + w_p
δy_φ = Δφ/Δt − Δρ̂/Δt = δρ̇ − c δf_r + w_φ

where p, φ are the code and carrier phase measurements, δR, δV the position and velocity errors, c the speed of light, δb_r, δf_r the receiver clock offset and frequency errors, and w_p, w_φ the code and carrier tracking errors.
Z_a = h(b_i, Z) + W_a
ḃ_i = −(1/τ_i) b_i + ν_i

where

Z_a  is the barometric altitude measurement,
Z  is the altitude,
b_i  are the sensor bias parameters, modeled as first-order Markov processes.

This non-linear model is linearized about the current estimate to provide the observation model.
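The first-order Markov bias model ḃ_i = −(1/τ_i) b_i + ν_i is discretized for the filter in the usual way; a small numerical check (illustrative values) that the discrete covariance recursion reproduces the stationary variance σ²:

```python
import numpy as np

tau, sigma2, dt = 300.0, 4.0, 1.0   # correlation time (s), stationary variance, step
a = np.exp(-dt / tau)               # discrete transition: b[k+1] = a*b[k] + w[k]
qd = sigma2 * (1.0 - a * a)         # matching discrete process noise variance

# Propagate the variance P[k+1] = a^2 P[k] + qd from P = 0 to steady state.
P = 0.0
for _ in range(20000):
    P = a * a * P + qd
```

Choosing the discrete noise variance as σ²(1 − a²) is what keeps the filter's bias model consistent with the continuous Markov process at any sampling rate.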
Model of Terrain Correlation and Other Position Updates

δy_R = R_m − R̂ = δR + W_R

where

R_m  is the position update, with two or three dimensions,
R̂, δR  are respectively the position current estimate and error,
W_R  is the position measurement error, generally correlated between axes.
Kalman Filter Implementation
The ULISS unit implements an accurate and elaborate inertial-GPS-multisensor tridimensional hybrid navigation by means of an 18 state extended Kalman filter. The same Kalman filter is used for ground, in-flight or at-sea alignment, providing fast reaction time as well as a smooth transition to and an accurate initialization of the navigation mode. This Kalman filter is based on precise dynamic modeling of kinematics and sensors. It includes the following residual error components of the state:
-position: 3 scalars,
-velocity: 3 scalars,
-attitude: 3 scalars,
-accelerometer bias: 1 scalar,
-gyro drifts: 3 scalars,
-barometric altitude parameters,
-GPS clock error and clock error-rate.
[Figures: (top) pure inertia longitude and latitude errors compared with the inertia-terrain correlation latitude error, on a ±1500 m scale; (bottom) pure inertia, GPS and INS/GPS position errors, on a ±4000 m scale.]
Multisensor INS hybrid navigation reduces the standard 1 Nm/h, 1 m/s INS errors to a few tens of meters in position and better than 0.1 m/s in velocity when updates are constantly made. If, for any reason (antenna masking, jamming for GPS, or failure of the radio altimeter), updates cannot be achieved any more, hybrid navigation keeps the memory of the improvement and provides a remarkable survival of the performance achieved.
References
[1] P. Lloret, Inertial + Total Station + GPS: A "Golden Tripod" for High Productivity Surveying, PLANS 90, to appear in March 1990
[2] L. Camberlein, P. Lloret, Medium to High Accuracy Navigation with Pure and Hybrid Inertial Systems and Related Mission Planning, GIFAS Conference, Delhi, Bangalore, February 15-22, 1989
[3] D.A. Tazartes, J.G. Mark, Integration of GPS Receivers into Existing Inertial Navigation Systems, Navigation, Vol 35, 1988-1989
[4] M.A. Sturza, C.C. Richards, Embedded GPS Solves the Installation Dilemma, PLANS 1988, Proceedings of the Symposium, pp 374-380, 1988
[5] J.A. Soltz, J.J. Donna, R.L. Greenspan, An Option for Mechanizing Integrated GPS/INS Solutions, Navigation, Vol 35, No 4, Winter 1988-89, pp 443-457
[6] C. Kervin, R. Cnossen, C. Kiel, M. Lynch, Development of a Tightly Integrated Ring Laser Gyro Based Navigation System, PLANS 1988, Proceedings of the Symposium, pp 545-552
[7] R.P. Denaro, G.J. Geier, GPS/Inertial Navigation System Integration for Enhanced Navigation Performance and Robustness, AGARD 1988, Proceedings of the Conference, LS-161
[8] J. Ashjaee, On the Precision of the C/A Code, PLANS 86, Proceedings of the Symposium, pp 214-217
[9] P. Lloret, B. Capit, Inertie GPS: un mariage de raison à l'essai, Revue Navigation, Institut Français de Navigation, Vol 35, No 139, July 1987, pp 295-325
[10] P. Faurre, et al., Navigation Inertielle Optimale et Filtrage Statistique, Dunod, 1971
8 Further Ideas
Although great progress has been made over the last thirty years and Kalman filtering is a tremendous advance in digital data processing, one should not be discouraged about future progress. We should not imitate the very great engineer Bode, so famous for his work related to feedback amplifier design, who, in April 1960, unaware of the Kalman paper which had appeared one month sooner, and reviewing his own very important work related to feedback, was becoming dubious about future research and wrote [1] "one wonders whether there are enough really good problems to go around among all workers now in the field, or whether effort may not be lost by duplication or micro-engraving".
Progress will be permanent and will not end with us. A lot of effort is being made to find good directions [2]. We can modestly point out two main directions of research:
-one related to the design of new components or instruments which will become simpler and simpler from a hardware point-of-view, but will require more and more software for data processing,
-the other one connected with the new possibilities of parallel processing and new digital structures [3]-[4]: one could look at the transition from Wiener filters to Kalman filters as associated with the transition from analog electronics to standard digital computers. What will be accompanying the transition from standard processors to massively parallel computers?
References
[1] H.W. Bode, Feedback, the History of an Idea, Symposium on Active Networks and Feedback Systems, Polytechnic Institute of Brooklyn, Polytechnic Press, April 19-21, 1960
[2] Challenges to Control, a Collective View, IEEE Transactions on Automatic Control, 32, No 4, pp 275-285, 1987
[3] J.M. Ortega, Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press, 1988
[4] S. Kim, D.P. Agrawal, Least-Squares Multiple Updating Algorithms on a Hypercube, Journal of Parallel and Distributed Computing, Vol 8, No 1, January 1990
Pierre Faurre.
The notion of state is crucial in all sciences, yet the amount of literature specifically devoted to analyzing this notion is negligible compared to its importance (cf. [1] for a detailed discussion and references). Particularly delicate is the notion of quantum mechanical state, so it is not surprising that the filtering and prediction problem corresponding to these states has not yet received a completely satisfactory solution.
In the present paper, which anticipates some results obtained in collaboration with M. Ohya [3] and V.P. Belavkin [2], I want to show that the quantum filtering problem can be correctly stated and solved within the framework of the theory of quantum Markov chains. An interesting fall-out of this is an explicit and easily computable formula for the solution of the filtering problem relative to a class of classical stochastic processes which is considerably larger than the class of processes for which an explicit solution of the filtering problem was available.
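For orientation, in the purely classical case the filtering problem is solved by the familiar recursive Bayes update of the posterior distribution of a hidden Markov chain; the quantum formula derived below contains this as a special case. A minimal sketch of the classical recursion (toy numbers):

```python
import numpy as np

def bayes_filter_step(prior, T, likelihood):
    """One step of classical filtering for a hidden Markov chain.

    prior:      current posterior over hidden states, shape (n,)
    T:          transition matrix, T[i, j] = P(next = j | current = i)
    likelihood: P(observed symbol | hidden state), shape (n,)
    """
    predicted = prior @ T                # Markov prediction step
    unnorm = predicted * likelihood      # condition on the new observation
    return unnorm / unnorm.sum()         # normalized posterior state

# Two hidden states, noisy binary observations.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
emit = np.array([[0.8, 0.2],    # P(obs | state 0)
                 [0.3, 0.7]])   # P(obs | state 1)
p = np.array([0.5, 0.5])
for obs in [1, 1, 1, 0, 1]:
    p = bayes_filter_step(p, T, emit[:, obs])
```

The posterior p plays exactly the role of the posterior state below, with the commutative algebra of functions on the state space replacing the general operator algebra.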
The notion of state introduced below includes both the classical and the quantum notions.
A set of experimental operations is said to prepare a system in a definite state if, as a result of them, the values of each observable of the system have a well defined probability distribution (the classical states correspond to the case in which all these distributions are delta functions).
A change of state corresponds to a change of our information on the system and here we have to distinguish two possibilities:
i) we acquire information on the system by interacting directly with it (hence the gain of information on one observable might correspond to a loss on some other one);
ii) we acquire information on a given system 1 by interacting with some other system 2 which had previously interacted with system 1 but whose interaction with system 1 at the moment of the measurement is negligible. In this case the change of state of system 1 is due only to our change of information about it and does not correspond to a real physical change of system 1.
The measurements of type (ii) are called nondemolition measurements.
L. Accardi
the quantum state of the atom at time k given that the results of our measurements up to time k are (ω_1, …, ω_k) =: ω_k]. This state is usually called the posterior state of the atom given the set of observations ω_k]. Its precise definition is given in the Appendix.
We shall denote D_k the set of all the possible results of the measurements of the given observable at time k and

Ω_k] := D_1 × ⋯ × D_k   (1.1)

The posterior states then satisfy a recursion of the form

p_{k+1}(ω_{k+1]}) = F_{k+1}(p_k(ω_k]), ω_k], ω_{k+1})   (1.2)
For the construction of our model, the assumption that the measurements are nondemolition will be crucial.
If E* is affine, we call it a linear lifting; if it maps pure states into pure states, we call it pure.
To every lifting from A_1 to A_1 ⊗ A_2 we can associate two channels: one from A_1 to A_1, defined by

(E*_1 ρ_1)(a_1) := (E* ρ_1)(a_1 ⊗ 1),   ∀ a_1 ∈ A_1   (2.2)

the other from A_1 to A_2, defined by

(E*_2 ρ_1)(a_2) := (E* ρ_1)(1 ⊗ a_2),   ∀ a_2 ∈ A_2   (2.3)

such that

φ | 1 ⊗ A_2 = ρ_2   (2.4)
The idea of this definition being that the interaction with system 2 does not
alter the state of system 1.
Remark. The notions of "lifting" and of "nondemolition lifting" discussed here are essentially (i.e. up to minor technicalities) included in the more abstract notions of "state extension" and "canonical state extension" introduced by Cecchini and Petz [6], [7] (cf. also Cecchini and Kümmerer [7]).
It is clear that a positive identity preserving linear map E: A_1 ⊗ A_2 → A_1 defines by duality a linear lifting from A_1 to A_1 ⊗ A_2. In some cases the converse is also true. For example, if A_1, A_2 are W*-algebras and A_1 ⊗ A_2 denotes their W*-tensor product, then any linear lifting E* from A_1 to A_1 ⊗ A_2 defines by duality a positive linear map E: A_1 ⊗ A_2 → A_1 characterized by

ρ_1(E(a_1 ⊗ a_2)) := (E* ρ_1)(a_1 ⊗ a_2);   ρ_1 ∈ S(A_1),
3 The Model
For each instant k we consider the algebra

B_k] = B ⊗ B ⊗ ⋯ ⊗ B   (k times)

of all the observations up to time k (included). This algebra describes all the possible results of all the possible measurements that one can perform in the first k instants on the output field.
It is mathematically convenient to embed all the algebras B_k] into the single algebra

B_N := ⊗_N B

which is the tensor product of countably many copies of B.
Denote, for k ∈ N,

j_k : B → B_N

the natural embedding into the k-th factor and let, for each I ⊆ N,

A_I := B_I ⊗ A_0;   A_k] := B_k] ⊗ A_0   (k ≥ 1)

The idea of nondemolition measurement is built into this model because the observables of the EM field (elements of B_N) commute with those of the atom (elements of A_0).
For any C*-algebra C we shall denote C* its dual and S(C) the set of states of C.
For each I ⊆ N there is a natural embedding

i_I : b_I ∈ B_I → b_I ⊗ 1_{A_0} ∈ B_I ⊗ A_0 = A_I

we shall write
(3.1)
From this we see that the explicit formula for φ_k] is:

φ_k] | B_{k−1]} = φ_{k−1]}

and therefore it defines a unique state φ_B on B_N, characterized by the property

φ_B | B_k] = φ_k];   ∀ k ∈ N.

The state φ_B will be called the a priori state of the output channel: it describes the a priori statistics of the output field. This state is a quantum Markov chain.
Since a measurement at a fixed time can only be performed on compatible observables, it follows that the choice of an observable of the output system, i.e. the radiation field, to be measured at time k, is equivalent to the choice of an abelian algebra

C_k ⊆ B_k;   k ∈ N
F_k] := F_1 ⊗ ⋯ ⊗ F_k

we have

F_k] ≅ C(Ω_k]) ≅ C(Ω_1 × ⋯ × Ω_k) ≅ ⊗_{j=1}^k C(Ω_j)

According to Lemma (A2.1) and Definition (A.2) and given the above identifications, for each k ∈ N and for each ω_k] ∈ Ω_k] the posterior state p_k](ω) of A_0 given the measurement of F_k], the initial state φ_k] and the result ω_k] of the observations up to time k is well defined for φ-almost all ω.
Moreover one has

φ_k](F_k]) = ∫_{Ω_k]} ⟨p_k](ω_k]), F_k](ω_k])⟩ φ(dω_k])   (3.3)

for all F_k] ∈ F_k] ⊗ A_0 ≅ C(Ω_k]; A_0). We write φ_k] for the restriction of φ on F_k] ≅ C(Ω_k]) and we use the same symbol for the Baire measure induced on Ω_k]. The conditional probability of φ_{k+1]} given F_k] will be denoted

φ_{k+1]}(dω_{k+1} | ω_k])

Remark. Since for each natural integer k, F_k] ⊆ B_k], because of the identity (3.3) we can interpret both p_k] and F_k] as F_k]-adapted functions on Ω.
Notice moreover that for this conclusion we do not need the very special structure of φ given by (3.2); the only important thing is that the conditions of Lemma (A2.1) are fulfilled.
With these notations the main result on discrete filtering theory for quantum Markov chains can be formulated as follows:

Theorem. Let the sequences (φ_k]) and (p_k]) be defined respectively by (3.1) and (3.4). Then for every k ∈ N and φ_k]-almost every ω_k] ∈ Ω_k], the S(A)-valued measure on Ω_{k+1} given by the restriction of E*_{k+1} p_k(ω_k]) on B_{k+1} ⊗ 1_{A_0} ≅ C(Ω_{k+1}) is absolutely continuous with respect to φ_{k+1]}(· | ω_k]) and its Radon-Nikodym derivative is p_{k+1}(ω_k], ·).
In other terms:
Proof. For any a ∈ A_0, f_k] ∈ C(Ω_k]) and f_{k+1} ∈ C(Ω_{k+1}), one has:

φ_{k+1]}(f_k] ⊗ f_{k+1} ⊗ a)
= ∫_{Ω_k]} φ_k](dω_k]) f_k](ω_k]) ∫_{Ω_{k+1}} ⟨p_{k+1}(ω_k], ω_{k+1}), a⟩ f_{k+1}(ω_{k+1}) φ_{k+1]}(dω_{k+1} | ω_k])

and also

φ_{k+1]}(f_k] ⊗ f_{k+1} ⊗ a)
= ∫_{Ω_k]} φ_k](dω_k]) f_k](ω_k]) ⟨p_k(ω_k]), E_{k+1}(f_{k+1} a)⟩
= ∫_{Ω_k]} φ_k](dω_k]) f_k](ω_k]) ⟨E*_{k+1} p_k(ω_k]), f_{k+1} a⟩,   ∀ a, ∀ f_{k+1}

or equivalently:

p_{k+1}(ω_k], ω_{k+1}) = E*_{k+1}[p_k(ω_k])](dω_{k+1}) / φ_{k+1]}(dω_{k+1} | ω_k])
Define, for a ∈ A, E(a) := λ(E^φ(1 ⊗ a)). Then for each ω ∈ Ω the map

p(ω) : a ∈ A → E(a)(ω) =: ⟨p(ω), a⟩

defines a state on A, and for F = Σ_{j∈I} f_j ⊗ a_j one has

⟨p(ω), F(ω)⟩ = Σ_{j∈I} ⟨p(ω), f_j(ω) a_j⟩ = Σ_{j∈I} f_j(ω) E^φ(a_j)(ω) = E^φ(Σ_{j∈I} f_j a_j)(ω) = E^φ(F)(ω)
Definition (A2.2). In the notations of Lemma (A.1), for φ_B-almost each ω ∈ Ω, the state p(ω) ∈ S(A) is called the posterior state of A given the measurement of F, the initial state φ and the result ω of the observation.
References
Chapter 3
1 Introduction
Control system design is the task of finding a controller which satisfies prescribed
specifications, under a given set of constraints. Desirable controller parameters
must satisfy a certain set of identities and inequalities that represent the plant
dynamics, constraints and specifications. Hence, control system design
amounts to solving a set of equations subject to inequality constraints that
contain the controller parameters as unknowns. Since the given set of equations
and inequalities is often very complicated and not precisely known, a solution
should be found by a cut-and-try method guided by intuition and experience,
which has been a major tool of practical control system design.
If control theory gives a basis for design, it must capture the essential features
of the design and formulate it as a mathematically tractable problem. A design
problem which theory formulates mathematically is bound to be a simplification
of real design problems and only a small part of specifications and constraints
is explicitly represented. Universal applicability and explicit solvability are vital
to a theory, but they are usually contradictory to the complexity and the
ambiguity of real problems. The key issue of design theory is to enhance its
universality (generality) and solvability without sacrificing its reality.
The LQG theory established by Kalman [1] was undoubtedly the first comprehensive and successful design theory in the history of control engineering, striking a balance among universal applicability, explicit solvability and reality. Though it is clear that Kalman's initial motivation when he established the whole framework of LQG theory in the early 1960's was not brought forth by practical needs, LQG theory had a great impact on control system design. The purpose of this article is to discuss the impact of LQG theory on control system design.
Before the LQG theory emerged, control theory was composed of a collection of classical results bearing the names of great forerunners, such as the Routh-Hurwitz stability test, the Nyquist stability test, Lur'e's theorem on absolute stability, the Paley-Wiener physical realizability theorem, Bode's integral formula representing the gain-phase relation, and so on, together with a collection of design tools like lead-lag compensation, the root-locus method, dead-beat control of sampled-data systems, and so on.
H. Kimura
paradigm to fill the gap. Even before the LQG theory, there was a gap between theory and practice. For instance, Routh-Hurwitz stability theory remained unknown to practical control engineers for more than 60 years. However, the issue of the theory-practice gap did not come into being until the LQG theory was established. Paradoxically, the creation of this issue is one of the major contributions of the LQG theory to control engineering. In the subsequent discussions, we shall identify the most essential feature of the theory-practice gap that was brought in by LQG theory.
ÿ = u   (1)
This can be regarded as the model of a vehicle moving on a line with normalized mass of inertia, where y denotes its displacement and u the force applied to it. The purpose of control is to move the vehicle to the origin as soon as possible, starting from an initial condition (y0, ẏ0). A state-space form of (1) is given by
d/dt [x1; x2] = [0 1; 0 0][x1; x2] + [0; 1]u,   (2a)
y = x1   (2b)
Consider the feedback law
u = −αx1 − βx2 = −αy − βẏ   (3)
where α and β are gains to be determined. The closed-loop system then satisfies
ÿ + βẏ + αy = 0   (4)
It is clear that (y, ẏ) = (x1, x2) tends to the origin as t → ∞ for any positive α and β. From the engineering point of view, it is better to find the unique optimal gains with respect to some performance criterion.
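As a quick numerical illustration (added here, not part of the original text), one can confirm this claim for the closed loop (4): for any positive α and β the companion matrix [[0, 1], [−α, −β]] is Hurwitz, so (y, ẏ) → 0.

```python
import numpy as np

# Closed-loop matrix of (4): d/dt (y, y') = A (y, y') with A = [[0, 1], [-a, -b]]
def hurwitz(a, b):
    A = np.array([[0.0, 1.0], [-a, -b]])
    return bool(np.all(np.linalg.eigvals(A).real < 0))

# every positive gain pair on a sample grid stabilizes the vehicle
grid = [0.1, 1.0, 10.0, 100.0]
assert all(hurwitz(a, b) for a in grid for b in grid)
```

With β = 0 the eigenvalues sit on the imaginary axis, which is the output-feedback failure discussed later in this section.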
Since our main concern is the speed of response, we might use the so-called squared transient response area
J1(α, β) = ∫0^∞ y(t)² dt   (5)
as a performance criterion. Obviously, the smaller J1 is, the quicker the response. Solving the second-order differential equation (4) under the initial
condition (y(0), ẏ(0)) = (y0, ẏ0) and substituting the solution into (5) yields
J1(α, β) = (β/(2α) + 1/(2β)) y0² + (1/α) y0 ẏ0 + (1/(2αβ)) ẏ0²   (6)
It is clear that J1(α, β) given in (6) can be made arbitrarily small by choosing α and β sufficiently large. Since the input u cannot be arbitrarily large, the criterion (5) is not appropriate for practical purposes. In order that the criterion be practical, the gains must be bounded, say
α ≤ M1,  β ≤ M2,   (7)
where M1 and M2 denote the upper bounds of the gains. However, the optimum values of α and β under (7) cannot be represented in closed form and depend on the initial conditions (y0, ẏ0). The constraints (7) thus give a very complicated non-linear control law. Therefore, the minimization of (6) under the constraints (7) is again far from practical.
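The closed form (6) can be checked numerically (a sketch added here, using a Lyapunov equation; the gain and initial-condition values are arbitrary):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# J1 = x0' P x0, where A'P + PA + C = 0 and A is the closed-loop matrix of (4)
a, b = 2.0, 3.0            # arbitrary positive gains (alpha, beta)
y0, yd0 = 1.5, -0.5        # arbitrary initial condition
A = np.array([[0.0, 1.0], [-a, -b]])
C = np.array([[1.0, 0.0], [0.0, 0.0]])   # integrand y^2 = x' C x
P = solve_continuous_lyapunov(A.T, -C)   # solves A'P + PA = -C

J1_lyap = np.array([y0, yd0]) @ P @ np.array([y0, yd0])
J1_formula = (b/(2*a) + 1/(2*b))*y0**2 + y0*yd0/a + yd0**2/(2*a*b)
assert np.isclose(J1_lyap, J1_formula)
```

The Lyapunov-equation route is exactly how such integral criteria are evaluated without solving the differential equation explicitly.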
Instead of bounding the gains as in (7), we can penalize large inputs by taking
J(α, β) = ∫0^∞ (q y²(t) + u²(t)) dt   (8)
as the criterion, where q > 0 weights the response against the control effort. Minimizing (8) with respect to α and β yields the optimal gains
α* = √q,   (10a)
β* = √(2α*) = √(2√q)   (10b)
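The optimal gains α* = √q and β* = √(2√q) coincide with the LQR solution for the double integrator; a numerical cross-check (added here) via the algebraic Riccati equation:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

q = 4.0                                   # arbitrary positive weight
A = np.array([[0.0, 1.0], [0.0, 0.0]])    # double integrator in state-space form
B = np.array([[0.0], [1.0]])
Q = np.array([[q, 0.0], [0.0, 0.0]])      # penalizes y^2 = x1^2
R = np.array([[1.0]])                     # penalizes u^2

P = solve_continuous_are(A, B, Q, R)
K = (B.T @ P)[0]                          # optimal state feedback u = -K x
assert np.allclose(K, [np.sqrt(q), np.sqrt(2*np.sqrt(q))])
```

For q = 4 this gives K = (2, 2), i.e. α* = 2 and β* = 2.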
Define the squared response area and the control energy
U = ∫0^∞ y(t)² dt,   V = ∫0^∞ u(t)² dt.
For any input u(t), U and V satisfy a lower-bound inequality (11), with equality holding in the optimal case. Figure 2.1 shows the inequality (11). The curve which represents the performance of the optimal regulator forms the boundary dividing the unrealizable zone (unshaded) and the realizable zone (shaded). Using optimal regulator theory, we can always depict a performance curve like Fig. 2.1 for more general cases.
As was shown in the above example, the optimal regulator gives deep insight into the trade-off between the response speed and the applied control power. It was a sort of common sense that the larger the injected control power, the better the control performance. However, no design method had been available before the emergence of LQG theory to deal with the trade-off between the control performance and the control power.

Fig. 2.1. The performance curve of the optimal regulator in the (V, U) plane: the boundary between the realizable (shaded) and unrealizable (unshaded) zones
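The trade-off curve of Fig. 2.1 can be reproduced numerically (an added sketch, with arbitrary normalization y0 = 1, ẏ0 = 0): sweeping the weight q in (8) traces the optimal pairs (V, U), and U decreases monotonically as V grows.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, 0.0])                # y0 = 1, y'0 = 0

def UV(q):
    """Optimal (U, V) = (int y^2 dt, int u^2 dt) for weight q in criterion (8)."""
    P = solve_continuous_are(A, B, np.diag([q, 0.0]), np.array([[1.0]]))
    K = B.T @ P
    Acl = A - B @ K
    # int x' C x dt = x0' X x0 with Acl' X + X Acl = -C
    U = x0 @ solve_continuous_lyapunov(Acl.T, -np.diag([1.0, 0.0])) @ x0
    V = x0 @ solve_continuous_lyapunov(Acl.T, -(K.T @ K)) @ x0
    return U, V

pts = [UV(q) for q in np.logspace(-1, 2, 8)]
Us, Vs = zip(*pts)
# larger q -> more control energy V, smaller response area U: the boundary curve
assert all(v2 > v1 for v1, v2 in zip(Vs, Vs[1:]))
assert all(u2 < u1 for u1, u2 in zip(Us, Us[1:]))
```

Each swept point lies on the boundary curve of Fig. 2.1; any non-optimal controller lands strictly above it.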
If β = 0 in (3), the closed-loop system is not stable. In other words, output feedback alone cannot stabilize the closed-loop system; state feedback is essential in this case. The optimal regulator fully exhibited the power of state feedback. In the state-space paradigm, the state is regarded as the information concerning the past behavior of a system which is necessary and sufficient for predicting its future behavior. In other words, the state is an interface between the past and the future, according to Akaike [5]. Therefore, state feedback uses the complete information and represents the ideal feedback scheme. What is not attainable by state feedback cannot be attained by any other feedback scheme. The optimal regulator provides an effective way of using this complete information.
is an interface that fuses the two opposite worlds, the abstract theory and the
real plant.
However, to build a model of a given plant is in general a difficult task, for a multitude of reasons. We frequently use the term physical model, but it is important to recognize the difference between a physical model and a model based on physics. Only an isolated, idealized and simple process can be described by means of physical laws. A physical process taking place in industrial systems is always a compound of several subprocesses interacting with each other.
For instance, steel rolling is a relatively simple mechanical process based on plastic deformation. However, an actual rolling process is far from simple
due to the use of a tandem mill system, which generates interstand tensions that delicately affect the thickness. The traveling speed is crucially dependent on the friction between the material and the mill. No physical law can identify all
these processes quantitatively. If we take the deformation profile of the strip
into account, the problem becomes three-dimensional. No mechanical theory of deformation is available to deal with such three-dimensional phenomena.
We must recognize that physics is far from sufficient to describe industrial processes quantitatively. Physics has been built on the belief that Nature is simple, or becomes simple after the removal of macroscopic diversity. Conversely, engineering is based on the belief that Nature is complex. Instead of fundamental equations, phenomenological or empirical formulas are used extensively to describe
the physical process of the plant under consideration. Engineering systems
belong to the artificial world. They cannot be fully described by the knowledge
of natural science whose objective is to understand the natural world, though
they obey and utilize the natural laws. This is an important point in order to
understand the difficulty of modeling.
Recently, some attempts have been made to extend the scope of physics to
describe the complexity and the diversity of the macroscopic physical world.
We expect that the new trend in physics will help to mitigate the difficulty of
modeling [7].
In many subjects of engineering, such as the theory of heat transfer, an effort is made to fill the gap between fundamental physical laws and the complexity of the physical world. These subjects are essentially based on natural science. Though the
knowledge of such engineering is essential to the modeling of plants, something
more is definitely required in order to give some principle to decompose, to
simplify, to parameterize the model and to design experiments to identify the
plant parameters quantitatively. It would be some sort of "metaphysics" that belongs to "the science of the artificial" [22]. This is exactly what we lack and hope to have in the future, to eliminate the theory-practice gap to some extent.
namely, how accurate should the model of uncertainties be? and how should the performance index be defined to consolidate design specifications and model inaccuracy? Furthermore, how can one guarantee the relative insensitivity of the final control system right from the start? Of course, these questions are not new; nonetheless, the existence of systematic computer-aided design techniques has returned the focus upon these age-old questions [17].
The argument on the robustness of the LQG method dates back to the celebrated paper by Kalman [4] on the inverse regulator problem. Kalman derived the so-called Kalman equation and established that the sensitivity of the optimal regulator is never greater than one for SISO systems, irrespective of the selection of the performance index. This also implies that the optimal regulator has infinite gain margin and a phase margin of at least 60°, again irrespective of the performance index. These robustness properties of the SISO LQG method were extensively discussed in a beautiful book [9].
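These margins follow from Kalman's return difference inequality |1 + L(jω)| ≥ 1, where L is the LQR loop gain; a numerical check for the double integrator (an illustration added here, with unit weight q = 1):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator plant
B = np.array([[0.0], [1.0]])
P = solve_continuous_are(A, B, np.diag([1.0, 0.0]), np.array([[1.0]]))
K = B.T @ P                              # LQR state-feedback gain

def loop_gain(w):
    """L(jw) = K (jwI - A)^{-1} B, the loop transfer broken at the plant input."""
    return (K @ np.linalg.inv(1j*w*np.eye(2) - A) @ B)[0, 0]

# |1 + L(jw)| >= 1 at every test frequency: the Nyquist plot avoids the unit
# disk around -1, giving infinite gain margin and at least 60 deg phase margin
ws = np.logspace(-3, 3, 200)
assert all(abs(1 + loop_gain(w)) >= 1 - 1e-9 for w in ws)
```

For this plant one can verify analytically that |1 + L(jω)|² = (1 + ω⁴)/ω⁴ ≥ 1.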
The implications of the Kalman equation for the general MIMO case were investigated by many authors (e.g. [10] [11] [12]). It is worth noting that a complete multivariable extension of the solution of the inverse regulator problem was obtained quite recently by Fujii [13], who derived a novel design method for robust control based on the inverse regulator theory [14]. The general robustness issue for the MIMO case was addressed by Safonov et al. [15], who derived a natural extension of the stability margin of the scalar regulator to the multivariable case. This work showed that the nice robustness properties of SISO regulators carry over to the MIMO case, and it really gave the initial impetus for the subsequent progress of robust control theory.
Unfortunately, however, it was pointed out that the nice robustness properties of optimal regulators hold only when state feedback is available [16]. They no longer hold when the optimal regulator is implemented by output feedback using Kalman filters or observers. As a remedy, the so-called loop transfer recovery (LTR) method was proposed. However, since it relies on a kind of high-gain strategy, the practical applicability of the LTR method is questionable.
The robustness issue concerning the LQG method opened the possibility of a serious theoretical approach to dealing with model uncertainty. Though most applications of the LQG method to real systems exhibit the robustness that was proved theoretically, some serious drawbacks have been pointed out [18]. Nevertheless, LQG theory will remain a milestone for robust control theory, as is seen in the recent rapid progress of quadratic stabilization theory [19] [20].
6 Conclusion
As the founder of LQG theory, Kalman made an enormous impact on the methodology of control system design. LQG theory was the first comprehensive and systematic design theory in the long history of control engineering. First, it demonstrated the superiority of model-based control. It established a method of systematic and logical design of control systems which is sharply different from conventional design methods based on experience and intuition. It opened up the possibility of changing control system design from an art to a science. Second, it extended the scope of control system design from the control of local process variables to the integrated control of the whole plant. Third, it created the field of CAD for control systems. It converted control system design from a paper-and-pencil job to large-scale computation using interactive software packages.
It is also true that many criticisms have been raised against LQG theory since its emergence. The underlying ideas of these criticisms have changed from time to time. While some of them have settled down, the fundamental issue of modeling remains under debate. The argument over the theory-practice gap has given continuous impetus for creating a new paradigm. Now we have a new design method called H∞ control, which was established as an alternative to LQG. Different from various design methods proposed in the past, the H∞ method is actually not an alternative to LQG but rather includes LQG as an extreme case [21]. This is analogous to the fact that quantum physics includes classical physics as an extreme case. In this respect, the H∞ method is really a successor of the LQG method on a higher logical level, of course giving more versatile design strategies. The spirit of LQG survives in H∞ theory.
References
[1] R.E. Kalman, "Contributions to the theory of optimal control," Boletin de la Sociedad Matematica Mexicana, Vol 5, pp 102-119, 1960
[2] G.C. Newton, L.A. Gould and J.F. Kaiser, Analytical Design of Linear Feedback Controls, John Wiley and Sons, New York, 1957
[3] L.S. Pontryagin, et al., The Mathematical Theory of Optimal Processes, Interscience Publishers, New York, 1962
[4] R.E. Kalman, "When is a linear control system optimal?" Trans ASME, J of Basic Engineering, Vol 86, pp 1-10, 1964
[5] H. Akaike, "Markovian representation of stochastic processes by canonical variables," SIAM J Control, Vol 13, pp 162-173, 1975
[6] Special Issue on Linear-Quadratic-Gaussian Estimation and Control Problems, IEEE Trans Automat Contr, Vol AC-16, 1971
[7] I. Prigogine and I. Stengers, Order Out of Chaos: Man's New Dialogue with Nature, Bantam Books, New York, 1984
[8] P. Dorato (ed.), Robust Control, IEEE Press, New York, 1987
[9] B.D.O. Anderson and J.B. Moore, Linear Optimal Control, Prentice-Hall, New Jersey, 1971
[10] B.D.O. Anderson, "Sensitivity improvement using optimal design," Proc IEE, Vol 113, pp 1084-1086, 1966
[11] J.B. Cruz and W.R. Perkins, "A new approach to the sensitivity problem in multivariable feedback systems," IEEE Trans Automat Contr, Vol AC-9, pp 216-223, 1964
[12] E. Kreindler, "Closed-loop sensitivity reduction of linear optimal control systems," ibid, Vol AC-13, pp 254-262, 1968
[13] T. Fujii, "A complete optimality condition in the inverse problem of optimal control," SIAM J Control and Optimiz, Vol 22, pp 327-341, 1984
[14] T. Fujii, "A new approach to the LQ design from the viewpoint of the inverse regulator problem," IEEE Trans Automat Contr, Vol AC-32, pp 995-1004, 1987
[15] M.G. Safonov and M. Athans, "Gain and phase margin for multiloop LQG regulators," ibid, Vol AC-22, pp 173-179, 1977
[16] J.C. Doyle and G. Stein, "Robustness with observers," ibid, Vol AC-24, pp 607-611, 1979
[17] M. Athans, "Editorial: On the LQG problems," IEEE Trans Automat Contr, Vol AC-16, p 528, 1971
[18] H.H. Rosenbrock, "Good, bad or optimal?" ibid, Vol AC-16, 1971
[19] I.R. Petersen, "A Riccati equation approach to the design of stabilizing controllers and observers for a class of uncertain linear systems," IEEE Trans Automat Contr, Vol AC-30, pp 904-907, 1985
[20] K. Zhou and P.P. Khargonekar, "An algebraic Riccati equation approach to H∞ optimization," Systems & Control Letters, Vol 11, pp 85-92, 1988
[21] J.C. Doyle, K. Glover, P.P. Khargonekar and B.A. Francis, "State-space solutions to standard H2 and H∞ control problems," IEEE Trans Automat Contr, Vol AC-34, pp 831-847, 1989
[22] H.A. Simon, The Sciences of the Artificial, MIT Press, 1969
Some of the recent developments in H∞ control theory using a state-space approach are discussed. The close connections to the LQG problem are highlighted. Moreover, a new proof of the necessary conditions for the solvability of the standard problem of H∞ control theory is given.
1 Introduction
Among the many contributions of Kalman to mathematical system theory, his work (Kalman [1960], Kalman and Bucy [1961]) on linear-quadratic-Gaussian (LQG) optimal control and filtering problems has had a very strong influence on the subject. In all of his work, he emphasized that the notion of state plays a vital role in a deeper understanding of system theoretic problems. Indeed, some of the appeal and beauty of Kalman filtering and linear-quadratic optimal control lies in the simple and intuitive structure of the solutions.
Motivated by robustness considerations, Zames introduced the problem of H∞ optimal control in his pioneering paper Zames [1981]. The essential idea was to design a controller to optimize performance for the worst exogenous input. Thus, while in Kalman filtering and the LQG problem the power spectrum of the exogenous input (noise) is assumed known (usually white noise), in the H∞ control problem the power spectrum is assumed unknown and the controller is designed for the worst case.
Early research in H∞ control theory was conducted using frequency domain methods. The key tools were the Youla-Jabr-Bongiorno-Kucera parametrization of all stabilizing controllers, inner-outer factorizations of transfer functions, Nevanlinna-Pick interpolation theory, the Nehari distance theorem, the commutant lifting theorem, etc. Since the frequency domain approach is not the main topic of this paper and since there is an extensive amount of literature on this approach, I will not discuss it here. The survey paper by Francis and Doyle [1987] and the expository book by Francis [1987] are excellent sources for the frequency domain approach to the H∞ control
* Supported in part by NSF under grant no. ECS-9096109, AFOSR under contract no. AFOSR-90-0053, and ARO under contract no. DAAL03-90-G-0008.
P. P. Khargonekar
2 State-Space Approach to H∞ Control Theory

Consider the finite-dimensional linear time-invariant (FDLTI) system Σ:
dx/dt = Fx + G1w + G2u,
z = H1x + J11w + J12u,   (1)
y = H2x + J21w + J22u
Here x, w, u, z, y denote respectively the state, the exogenous input, the control input, the regulated output, and the measured output. Kalman made pioneering contributions to the problem of designing a controller to optimize the variance of z when w is a stochastic process. This classical solution, commonly known as the LQG theory, has had a very significant influence on linear multivariable control theory over the last 30 years.
The standard problem of H∞ control theory is: Given the linear system Σ and a positive number γ, find a causal (dynamic) controller K such that the closed loop system is well-posed and internally stable, the closed loop input-output operator
Tzw : L2 → L2 : w ↦ z
is bounded, and the induced norm
||Tzw|| := sup{ ||z||2 / ||w||2 : ||w||2 ≠ 0 } < γ
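For a stable FDLTI system this induced L2 norm equals the supremum over frequency of the largest singular value of the transfer matrix; a small added illustration (a first-order lag, where the supremum is the DC gain):

```python
import numpy as np

# T(s) = c (s - a)^{-1} b for the scalar stable system dx/dt = a x + b w, z = c x
a, b, c = -2.0, 1.0, 3.0           # arbitrary stable example

def largest_sv(w):
    """Largest singular value of the (here 1x1) transfer matrix at s = jw."""
    T = c * (1j*w - a)**-1 * b
    return abs(T)

ws = np.logspace(-3, 3, 400)
hinf = max(largest_sv(w) for w in ws)
# for a first-order lag the supremum is approached at w = 0: |c b / a| = 1.5
assert np.isclose(hinf, 1.5, atol=1e-3)
```

The frequency sweep is only a sketch of the definition; production software computes the same quantity with a bisection on Hamiltonian-matrix eigenvalues.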
A controller K is called admissible iff it is causal and the closed loop system is well-posed and internally stable. If K is also an FDLTI system
dξ/dt = Aξ + By,
u = Cξ + Dy   (2)
then the closed loop system is well-posed iff (I − J22D) is invertible, and the closed loop system is internally stable iff the closed loop system matrix
[ F + G2(I − DJ22)⁻¹DH2    G2(I − DJ22)⁻¹C
  B(I − J22D)⁻¹H2          A + BJ22(I − DJ22)⁻¹C ]
has no eigenvalues in the closed right half complex plane. Moreover, in this
case, if we let Tzw(s) denote the closed loop transfer function matrix, then
||Tzw|| = ||Tzw||∞ := sup{ σ̄(Tzw(s)) : Re(s) ≥ 0 }
It is a well-known fact that under suitable assumptions the LQG controller is also the unique solution to the problem of minimizing the quadratic norm of the closed loop transfer function matrix Tzw over all internally stabilizing controllers. Thus, the key difference between the LQG and the H∞ control problems is in the choice of norm on transfer functions. This is intimately related to the underlying assumptions on the exogenous signals mentioned in the introduction. Also, it turns out that the H∞ norm arises naturally in many robust control problems.
In order to present the most important concepts clearly, I will make certain simplifying assumptions. Most of these assumptions can be easily removed.
A.1 J11 = J22 = 0.
A.2 J'12[H1 J12] = [0 I].
A.3 G1J'21 = 0 and J21J'21 = I.
A.4 (a technical assumption on Σ, discussed below).
A.5 (F, G2) is stabilizable and (F, H2) is detectable.
A.6 The controller K is an FDLTI system.
Assumption A.3 is the dual of assumption A.2 and is analogous to the standard assumption in the Kalman filtering problem that the process noise and the measurement noise are uncorrelated and that the measurement noise is nonsingular and normalized. Assumption A.4 is a technical assumption and guarantees that certain solutions to certain algebraic Riccati equations are nonsingular. Assumption A.5 is necessary and sufficient to guarantee the existence of an internally stabilizing controller for the system Σ. The reader is referred to Glover and Doyle [1988, 1989], Safonov and Limebeer [1988], and Zhou and Khargonekar [1988] for techniques for removing the assumptions A.1-A.5.
The assumption A.6 also causes no loss of generality, as shown by Khargonekar and Poolla [1986]. They showed that the infimum of the norm of Tzw over all causal internally stabilizing nonlinear controllers is no less than that over all FDLTI internally stabilizing controllers. In fact, a simple time-domain proof of their result can be constructed using some of the techniques introduced in this paper.
(It should be noted that the state-feedback problem does not satisfy assumption A.3 above.) The papers by Mageirou and Ho [1977] and Petersen [1987] contain some of the first results on the state-feedback H∞ problem. (The paper by Mageirou and Ho [1977] contains a result, in the context of robust stabilization, which is essentially equivalent to the result of Petersen [1987].) Under the assumption J12 = 0, Petersen [1987] considered the problem of
Theorem 2.1. Consider the system Σ and suppose (3) holds. Then γ* = inf{ ||Tzw||∞ : K is an internally stabilizing dynamic state-feedback controller } = inf{ ||Tzw||∞ : K is an internally stabilizing static state-feedback gain }.
Theorem (2.1) was proved by Khargonekar, Petersen, and Rotea [1988] under the condition that J12 = 0. It was generalized further by Khargonekar, Petersen, and Zhou [1987], Zhou and Khargonekar [1988], and Doyle, Glover, Khargonekar, and Francis [1989]. Recently, Scherer [1990] has shown the following:
Theorem 2.2. Consider the system Σ and suppose (3) holds. Let γ* be as in Theorem (2.1). Suppose there exists an admissible dynamic state-feedback controller K(s) such that the norm of the closed loop transfer function satisfies ||Tzw||∞ = γ*. Then there also exists a stabilizing static feedback gain L such that the norm of the closed loop transfer function satisfies ||Tzw||∞ = γ*.
The following theorem shows how one can obtain state-feedback control laws by solving certain algebraic Riccati equations. It is taken from Doyle, Glover, Khargonekar, and Francis [1989]. Results analogous to Theorem (2.3) were obtained earlier by Mageirou and Ho [1977], Petersen [1987], Khargonekar, Petersen, and Zhou [1987], and Zhou and Khargonekar [1988].
Theorem 2.3. Consider the system Σ. Suppose assumptions A.1, A.2, and (3) hold. Then there exists an admissible controller K such that
||Tzw||∞ < γ   (4)
if and only if there exists a real symmetric matrix P such that
F'P + PF + P((G1G'1)/γ² − G2G'2)P + H'1H1 = 0,   (5)
F + ((G1G'1)/γ² − G2G'2)P is asymptotically stable, and P > 0. In this case, the control law
u = −G'2Px   (6)
is one such admissible controller.
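Equation (5) can be solved numerically through its Hamiltonian matrix. The sketch below (an added illustration with hypothetical plant data, not taken from the paper) extracts the stabilizing solution P from the stable invariant subspace and verifies the conditions of the theorem:

```python
import numpy as np

# Hypothetical plant data; gamma chosen comfortably above the optimal level
F  = np.array([[0.0, 1.0], [0.0, 0.0]])
G1 = np.array([[0.5], [0.0]])   # exogenous input matrix
G2 = np.array([[0.0], [1.0]])   # control input matrix
H1 = np.array([[1.0, 0.0]])     # regulated output (normalized as in A.2)
gamma = 5.0

R = G1 @ G1.T / gamma**2 - G2 @ G2.T   # indefinite quadratic coefficient in (5)
Q = H1.T @ H1

# Hamiltonian matrix associated with the Riccati equation (5)
Ham = np.block([[F, R], [-Q, -F.T]])
w, V = np.linalg.eig(Ham)
X = V[:, w.real < 0]                   # basis of the stable invariant subspace
P = np.real(X[2:, :] @ np.linalg.inv(X[:2, :]))

residual = F.T @ P + P @ F + P @ R @ P + Q
assert np.allclose(residual, 0, atol=1e-8)                  # (5) holds
assert np.all(np.linalg.eigvalsh((P + P.T) / 2) > 0)        # P > 0
assert np.all(np.linalg.eigvals(F + R @ P).real < 0)        # stability condition
```

The same invariant-subspace construction underlies standard ARE solvers; it works here even though the quadratic term is indefinite.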
There are a number of interesting features of this result. Note that the control law is obtained by solving an algebraic Riccati equation, which is analogous to the classical results of Kalman on the linear-quadratic optimal control problem. The algebraic Riccati equation (5) is similar to the corresponding equation in the linear-quadratic optimal control problem except that the quadratic term in (5) is indefinite. Indeed, (5) is identical to the ARE that arises in linear-quadratic games. This is intuitively appealing, since in the H∞ control problem the inputs w and u act as opposing players: the exogenous input w tries to maximize (the norm of) z while u is designed to minimize it. This connection between linear-quadratic games and H∞ control theory has been discussed and explored in a number of recent papers. See, e.g., Khargonekar, Petersen, and Zhou [1987], Tadmor [1988, 1989], Limebeer, Anderson, Khargonekar, and Green [1989], Basar [1989a, 1989b], Uchida and Fujita [1989]. From the game theory perspective, the works of Banker [1972], Mageirou [1976], and Mageirou and Ho [1977] are of special interest with respect to the current developments in H∞ control theory.
Tadmor [1990, 1989] and Limebeer, Anderson, Khargonekar, and Green [1989] have obtained analogues of Theorem (2.3) for linear time-varying systems. Tadmor [1989] has also obtained results analogous to Theorem (2.3) for infinite-dimensional systems.
Note that as γ goes to ∞, the controller (6) approaches the LQR solution. In other words, as the H∞ norm constraint on the closed loop transfer function is relaxed, the control law given by (6) approaches the corresponding LQR control law.
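This limit is easy to observe numerically (an added sketch with arbitrary data: the G1 part of the quadratic term vanishes as γ grows, so the solution P(γ) of (5) tends to the LQR Riccati solution):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

F  = np.array([[0.0, 1.0], [0.0, 0.0]])
G1 = np.array([[0.5], [0.0]])
G2 = np.array([[0.0], [1.0]])
Q  = np.array([[1.0, 0.0], [0.0, 0.0]])   # H1'H1

def P_hinf(gamma):
    """Stabilizing solution of (5), via the Hamiltonian of the indefinite ARE."""
    R = G1 @ G1.T / gamma**2 - G2 @ G2.T
    Ham = np.block([[F, R], [-Q, -F.T]])
    w, V = np.linalg.eig(Ham)
    X = V[:, w.real < 0]
    return np.real(X[2:, :] @ np.linalg.inv(X[:2, :]))

P_lqr = solve_continuous_are(F, G2, Q, np.array([[1.0]]))
errs = [np.linalg.norm(P_hinf(g) - P_lqr) for g in (2.0, 20.0, 200.0)]
assert errs[0] > errs[1] > errs[2]        # monotone approach to the LQR solution
assert errs[2] < 1e-3
```

The monotone decrease reflects the monotonicity of ARE solutions in the quadratic coefficient: P(γ) ≥ P_lqr, with equality in the limit.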
It is interesting to note that the paper of Petersen [1987] was for the singular case, i.e. when the rank of J21 is less than the dimension of u. Indeed, in that paper J21 = 0. The results of Khargonekar, Petersen, and Zhou [1987] and Zhou and Khargonekar [1988] also apply to the general singular case (J21 is not necessarily of full rank). However, in these papers the singular case is taken care of by introducing certain small perturbations to make the problem nonsingular. Since we are dealing with a strict inequality in (4), the existence of an appropriate small perturbation is not difficult to establish. In a different and more appealing approach, Stoorvogel and Trentelman [1988] have obtained interesting results on the singular case in terms of quadratic matrix inequalities. Their approach is also related to the almost disturbance decoupling problem.
Theorem (2.3) naturally leads to the H∞ analog of the inverse problem of optimal control formulated by Kalman [1964]. This inverse problem of H∞ control theory was considered by Fujii and Khargonekar [1988], who showed that state-feedback H∞ controllers inherit the nice robustness properties (Kalman [1964]) of LQR controllers and in a certain sense are even more robust.
Consider now the FDLTI system
dx/dt = Fx + G1w,
z = H1x,   (7)
y = H2x + J21w
In this section, we will assume that (F, G1) is reachable, (F, H2) is detectable, and A.3 holds.
The problem is to estimate the output z using the measurements y. Nagpal and Khargonekar [1989] have considered both the filtering (the estimator is required to be causal) and the smoothing (the estimator is allowed to be noncausal) problems. So, let ℱ be a causal estimator and let
ẑ = ℱ(y)
The estimator ℱ is called unbiased if the estimate ẑ vanishes whenever the measurement y does.
Let us now define an H∞, i.e. worst case, performance measure. There are two cases to consider: (i) the initial state is known (and, without loss of generality, = 0), and (ii) the initial state is unknown. In the latter case, it is assumed that the best a priori estimate of x(0) is zero; in other words, the estimator initial state is taken to be zero. Let 0 ≤ T ≤ ∞. Define
J1(ℱ, T) := sup{ ||z − ẑ||2 / ||w||2 : w ∈ L2[0, T], x(0) = 0, ||w||2 ≠ 0 }   (8)
and
J2(ℱ, R, T) := sup{ ||z − ẑ||2 / [||w||2² + x'0Rx0]^{1/2} : w ∈ L2[0, T], x(0) = x0, ||w||2² + x'0Rx0 ≠ 0 }   (9)
Theorem 2.4. Consider the FDLTI system (7). Let γ > 0 be given and let T = ∞. Then there exists an unbiased linear filter ℱ such that
J2(ℱ, R, ∞) < γ   (10)
if and only if there exists a bounded symmetric matrix function Q(t), t ∈ [0, ∞), such that
dQ/dt = FQ(t) + Q(t)F' − Q(t)(H'2H2 − H'1H1/γ²)Q(t) + G1G'1,  Q(0) = R⁻¹   (11)
and the system ẋ = (F − Q(t)(H'2H2 − H'1H1/γ²))x is exponentially stable.
Moreover, if these conditions hold, then one filter that satisfies the performance bound (10) is given by
dx̂/dt = Fx̂(t) + Q(t)H'2(y(t) − H2x̂(t)),  x̂(0) = 0,   (12)
ẑ(t) = H1x̂(t)   (13)
Theorem 2.5. Consider the FDLTI system (7). Let x(0) = 0. Let γ > 0 be given and let T = ∞. Then there exists an unbiased linear filter ℱ such that
J1(ℱ, ∞) < γ   (14)
if and only if there exists a real symmetric matrix Q such that
FQ + QF' − Q(H'2H2 − H'1H1/γ²)Q + G1G'1 = 0,   (15)
the matrix F − Q(H'2H2 − H'1H1/γ²) has no eigenvalues in the closed right half plane, and Q > 0.
Moreover, if these conditions hold, then with Q(t) replaced by Q, the filter given by (12) and (13) satisfies the performance bound (14).
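Equation (15) is dual in structure to (5), so it can be solved by the same invariant-subspace construction; an added numerical sketch with arbitrary plant data (not from the paper):

```python
import numpy as np

F  = np.array([[0.0, 1.0], [-1.0, -1.0]])   # arbitrary stable plant
G1 = np.array([[0.0], [1.0]])
H1 = np.array([[1.0, 0.0]])                 # quantity to estimate
H2 = np.array([[0.0, 1.0]])                 # measurement
gamma = 10.0

# (15) is an ARE in Q: F Q + Q F' - Q S Q + G1 G1' = 0, S = H2'H2 - H1'H1/gamma^2
S = H2.T @ H2 - H1.T @ H1 / gamma**2
Ham = np.block([[F.T, -S], [-G1 @ G1.T, -F]])
w, V = np.linalg.eig(Ham)
X = V[:, w.real < 0]
Q = np.real(X[2:, :] @ np.linalg.inv(X[:2, :]))

res = F @ Q + Q @ F.T - Q @ S @ Q + G1 @ G1.T
assert np.allclose(res, 0, atol=1e-8)                   # (15) holds
assert np.all(np.linalg.eigvals(F - Q @ S).real < 0)    # stability condition
assert np.all(np.linalg.eigvalsh((Q + Q.T) / 2) > 0)    # Q > 0
```

As γ → ∞ the term H'1H1/γ² disappears and Q reduces to the steady-state Kalman filter covariance, mirroring the remark that follows.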
These results are the H∞ analog of the well-known Kalman filtering results. The key differences are in the Riccati differential equation (11) and the algebraic Riccati equation (15), which are very similar to the covariance equations for the Kalman filtering problem with the exception of the term QH'1H1Q/γ². Thus, in H∞ filtering, the states to be estimated influence the filter itself, unlike in Kalman filtering, where the optimal estimate of any state-functional is obtained from the
Theorem 2.6. Consider the system Σ. Then there exists an admissible controller K such that
||Tzw||∞ < γ   (16)
if and only if the following conditions hold:
1. there exists a (unique) real symmetric matrix P such that
F'P + PF + P((G1G'1)/γ² − G2G'2)P + H'1H1 = 0,   (17)
F + ((G1G'1)/γ² − G2G'2)P is asymptotically stable, and P > 0;
2. there exists a (unique) real symmetric matrix Q such that
FQ + QF' + Q((H'1H1)/γ² − H'2H2)Q + G1G'1 = 0,
F + Q((H'1H1)/γ² − H'2H2) is asymptotically stable, and Q > 0;
3. the spectral radius of QP is less than γ².
Moreover, in this case one such controller is given by
dx̂/dt = (F + (G1G'1P)/γ²)x̂ + G2u + ZQH'2(y − H2x̂),
u = −G'2Px̂   (18)
where
Z = (I − QP/γ²)⁻¹
The notation in the above theorem is meant to be suggestive. Note that the above controller (18) has many similarities to (and some differences from) the classical LQG controller. It is called the H∞ central controller and has many interesting interpretations and properties. As γ goes to ∞, this controller approaches the LQG controller.
It is shown in Doyle, Glover, Khargonekar, and Francis [1989] that the above controller is an H∞ estimator for the state-feedback control law (6) in the presence of the disturbance w = (G'1P/γ²)x. More specifically, let
r := w − (G'1P/γ²)x
and consider the system
dx/dt = (F + (G1G'1P)/γ²)x + G1r + G2u,
y = H2x + J21r   (19)
with the quantity to be estimated being the state-feedback control law u = −G'2Px;
if we apply Theorem (2.5) to the above system, then the resulting filter is precisely the output feedback controller in Theorem (2.6).
The above result is also closely connected to the separation principle in the risk-sensitive control problem obtained by Whittle [1986]. See Glover and Doyle [1988] and Doyle, Glover, Khargonekar, and Francis [1989] for further details along these lines.
Recently, Stoorvogel [1989] has obtained interesting extensions of Theorem (2.6) to the singular case, i.e. the case when J12 and J21 are not required to have full rank. As mentioned previously, Tadmor [1990, 1989] and Limebeer, Anderson, Khargonekar, and Green [1989] have extended these results to linear time-varying and infinite-dimensional systems. Sampei, Mita, and Nakamichi [1990] have extended the approach taken in Zhou and Khargonekar [1988] to the output feedback case.
Proof of Theorem (2.6). Without loss of generality, set γ = 1.
(Necessity) Suppose there exists an admissible FDLTI controller K such that the norm of the closed loop transfer function matrix satisfies ||Tzw||∞ < 1. It follows
we have
∫_{−T1}^{T2} (||w(t)||² − ||z(t)||²) dt ≥ 0   (20)
(Here T1, T2 can be ∞.) We will break up this proof into three parts.
Part 1: Necessity of Condition 2
Following the key idea from Nagpal and Khargonekar [1989], we will first construct an exogenous signal w such that y ≡ 0. Let J⊥21 be such that the matrix formed by stacking J21 on J⊥21 is a square orthogonal matrix. Now set
w := J'21v2 + J⊥'21v1   (21)
Then
||w||² = ||v1||² + ||v2||²,  G1w = G1J⊥'21v1,  J21w = v2
With this choice of w, the system Σ becomes
dx/dt = Fx + G1J⊥'21v1 + G2u,
z = H1x(t) + J12u(t),   (22)
y = H2x(t) + v2(t)
If
v2 = −H2x(t),   (23)
then y ≡ 0.
Now let T > 0 be arbitrary. For any v1 ∈ L2[−T, 0] and v2 and w as above, it is easy to verify that w ∈ L2[−T, 0]. Since K is a causal linear controller and y ≡ 0, it follows that u ≡ 0. Now the equations for the system Σ become
dx/dt = Fx + G1J⊥'21v1   (24)
z = H1x
Now if we set v1 = v2 = 0 for t > 0, then
∫_{−T}^{0} (||w(t)||² − ||z(t)||²) dt ≥ 0   (25)
Define
J−(x0) := inf{ ∫_{−T}^{0} (||w(t)||² − ||z(t)||²) dt : v1 ∈ L2[−T, 0], x(−T) = 0, x(0) = x0, T > 0 }   (27)
Note that in J−, we take the infimum over all T > 0 as well. By the reachability of (F, G1J⊥'21), it follows that J−(x0) is finite, and from (25) that it is non-negative.
It follows from Willems [1971, Theorem 7] that there exists a real symmetric matrix X such that
F'X + XF − XG1G'1X + H'2H2 − H'1H1 = 0   (28)
and the eigenvalues of (F − G1G'1X) are in the closed right half plane. [Here we have used the fact that G1G'1 = G1J⊥'21J⊥21G'1.] Moreover,
0 ≤ J−(x0) = −x'0Xx0
Thus, X ≤ 0.
Next we show that X < 0 and all eigenvalues of F − G1G'1X are in the open right half plane. Since ||Tzw||∞ < 1, there exists a (sufficiently small) δ > 0 such that if z̃ := (z' δx')', then ||Tz̃w||∞ < 1. Now arguing as above, we can conclude that there exists a (unique) real symmetric matrix X̃ such that
F'X̃ + X̃F − X̃G1G'1X̃ + H'2H2 − H'1H1 − δ²I = 0   (29)
and the eigenvalues of F − G1G'1X̃ are in the closed right half plane. Using the monotonicity properties of solutions to algebraic Riccati equations, it follows that X̃ ≤ X ≤ 0. In fact, strict inequality holds here. To see this, suppose v is such that v'(X − X̃)v = 0. It follows that X̃v = Xv. Then multiplying (29) and (28) by v' on the left and v on the right, and subtracting, we get δ²||v||² = 0 and so v = 0. Therefore, X̃ < X. Since ||Tz̃w||∞ < 1, we can repeat the above argument for X̃ and therefore X < 0. Let Y and Ỹ denote the unique stabilizing solutions to the algebraic Riccati equations (28) and (29) respectively. Then, using the maximality and minimality properties of stabilizing and antistabilizing solutions of algebraic Riccati equations, we have Ỹ < Y. Thus we are in the strict inequality case of Theorem 5 of Willems [1971], and all eigenvalues of F − G1G'1X are in the open right half plane.
Define Q := −X^{-1}. Then Q > 0 and, multiplying (28) on the left and on the right by X^{-1}, one finds that Q satisfies

  FQ + QF' + Q(H1'H1 − H2'H2)Q + G1G1' = 0

Furthermore, along the trajectories of Σ, completing the squares gives

  d(x'Px)/dt = ||u + G2'Px||^2 − ||w − G1'Px||^2 + ||w||^2 − ||z||^2   (30)
Now set w̄(t) := G1'P e^{(F + (G1G1' − G2G2')P)t} x0 for t ≥ 0. Using the fact that F + (G1G1' − G2G2')P is asymptotically stable, it follows that w̄ belongs to L2[0, ∞). Now for any internally stabilizing controller K, we have

  ∫_0^∞ (||w̄(t)||^2 − ||z(t)||^2) dt ≤ sup_u ∫_0^∞ (||w̄(t)||^2 − ||z(t)||^2) dt

The supremum on the right hand side is taken over all u ∈ L2[0, ∞) for which the resulting state trajectory of Σ is such that x ∈ L2[0, ∞). Since w̄ is fixed, this is a linear-quadratic optimal control problem. Now it is not difficult to verify that ū(t) = −G2'P e^{(F + (G1G1' − G2G2')P)t} x0 is the (unique) solution to this linear-quadratic optimal control problem. (This can be done in many different ways. For example, one can check that the w̄, ū and the corresponding state trajectory x̄(t) = e^{(F + (G1G1' − G2G2')P)t} x0 solve the associated two point boundary value problem.) Moreover, with w̄, ū, x̄ as above, it follows that ū(t) = −G2'Px̄(t) and w̄(t) = G1'Px̄(t). Now integrating (30) from 0 to ∞, we get

  sup_u ∫_0^∞ (||w̄(t)||^2 − ||z(t)||^2) dt = −x0'Px0   (31)
From part (1) of the proof, for every α > 0 one can choose v1 and T such that

  J(v1, T) = x0'Q^{-1}x0 + αε||x0||^2 ≤ x0'[Q^{-1} + αI]x0   (32)

with 0 < ε ≤ 1.
From part 2 above, set w̄(t) := G1'P e^{(F + (G1G1' − G2G2')P)t} x0, t ≥ 0. For any admissible controller K, we get from (31) above

  ∫_0^∞ (||w̄(t)||^2 − ||z(t)||^2) dt ≤ −x0'Px0   (33)

Combining (32) and (33) yields

  x0'[Q^{-1} + αI]x0 − x0'Px0 ≥ 0

for all x0 and all α > 0. Therefore

  Q^{-1} − P ≥ 0
To prove the strict inequality, choose δ > 0 as in part (1) of the proof. Applying the above argument to T_z̃w and with the obvious notation it follows that

  Q̃^{-1} ≥ P̃

Thus the following chain of inequalities holds:

  Q^{-1} > Q̃^{-1} ≥ P̃ > P   (34)
In the coordinates (x, e), where e denotes the estimation error, the closed loop system has the realization

  Φ := [ F − G2G2'P      G2G2'P
         −G1G1'P         F + G1G1'P − ZQH2'H2 ]

  Γ := [ G1
         G1 − ZQH2'J21 ]

  Θ := [ H1 − J12G2'P    J12G2'P ]   (35)
where

  Π := [ P    0
         0    (Q^{-1} − P)^{-1} ]

It follows that Φ has all eigenvalues in the left half plane. Thus, from (35) one can conclude that the closed loop system is internally stable. Again using conditions 1, 2, and 3, it is not hard to show that (Φ + Π^{-1}Θ'Θ) = (−Π^{-1}[Φ + ΓΓ'Π^{-1}]'Π) has no eigenvalues on the imaginary axis. Now from (35) and Lemma 4 of Doyle, Glover, Khargonekar, and Francis [1989], it follows that the closed loop system satisfies ||T_zw||_∞ < 1.
This completes the proof. □
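The completion-of-squares identity (30) is easy to check numerically in the scalar case. The sketch below is illustrative only: the system data and the quadratic equation used for p are assumptions (a scalar stand-in for the Riccati equation satisfied by P, under the usual normalizations J12'J12 = I and H1'J12 = 0), not the chapter's exact setup.

```python
import math

# Scalar numerical check of the completion-of-squares identity (30).
# All data are illustrative assumptions: p is taken as a root of the scalar
# stand-in 2*f*p + (g1*g1 - g2*g2)*p*p + h1*h1 = 0 for the Riccati equation,
# and z = (h1*x, u) encodes the usual normalizations.
f, g1, g2, h1 = -1.0, 0.5, 1.0, 1.0

a, b, c = g1*g1 - g2*g2, 2*f, h1*h1
p = (-b - math.sqrt(b*b - 4*a*c)) / (2*a)      # one real root

def identity_gap(x, u, w):
    """d(x'Px)/dt minus the right-hand side of (30); should be zero."""
    dxdt = f*x + g1*w + g2*u
    lhs = 2*p*x*dxdt                           # d(p*x^2)/dt along trajectories
    z_sq = (h1*x)**2 + u**2                    # ||z||^2
    rhs = (u + g2*p*x)**2 - (w - g1*p*x)**2 + w**2 - z_sq
    return lhs - rhs

gap = max(abs(identity_gap(x, u, w))
          for x in (-2.0, 0.3, 1.7)
          for u in (-1.0, 0.0, 2.5)
          for w in (-0.7, 1.2))
```

The gap vanishes identically in x, u, w precisely because p solves the quadratic: the cross terms cancel and the residual is x^2 times the Riccati residual.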
Consider now the system

  dx/dt = Fx + G11 w1 + G12 w2 + G2 u
  z1 = H11 x + J112 u,   z2 = H12 x + J122 u
  y = H2 x + J21 w + J22 u   (36)

where w1 and w2 are two exogenous (vector) inputs and z1 and z2 are two regulated (vector) outputs. The multiple objective optimal control problem of interest is defined as follows:

Optimal LQG Performance Subject to an H∞ Constraint
Find an admissible controller that minimizes ||T_{z1w1}||_2 subject to the constraint ||T_{z2w2}||_∞ < γ.
This problem arises in many situations. For example, it represents the problem of designing a controller to optimize the nominal performance subject to a robust stability constraint.
3 Concluding Remarks
Many new insights into the H∞ control problem have been obtained by taking a state-space approach. Perhaps the most important is the insight into the structure of the solution. A major motivation for considering the H∞ norm comes from robust control problems. While the recent developments in H∞ control theory represent very significant progress in robust control, the problem of robust performance remains a challenging open problem.
Acknowledgements
I would like to thank K.M. Nagpal and M.A. Rotea for many helpful
conversations.
References
1. M. Banker [1972]. Linear stationary quadratic games, Proceedings of IEEE Conference on
Decision and Control, pp 193-197
2. T. Basar [1989a]. Disturbance attenuation in LTI plants with finite horizon: Optimality of
nonlinear controllers, Systems and Control Letters, Vol 13, pp 183-192
29. A.A. Stoorvogel [1989]. The singular H∞ control problem with measurement feedback, to appear in SIAM J Control and Optimization
30. A.A. Stoorvogel and H. Trentelman [1988]. The quadratic matrix inequality in singular H∞ control with state feedback, to appear in SIAM J Control and Optimization
31. G. Tadmor [1988]. Worst-case design in the time domain: the maximum principle and the standard H∞ problem, Mathematics of Control, Signals, and Systems, Vol 3, pp 301-324
32. G. Tadmor [1989]. The standard H∞ problem and the maximum principle: The general linear case, Tech. Rep. 192, University of Texas at Dallas, May 1989
33. K. Uchida and M. Fujita [1989]. On the central controller: Characterization via differential games and LEQG problems, Systems and Control Letters, Vol 13, pp 9-14
34. P. Whittle [1986]. A risk-sensitive certainty equivalence principle, in Essays in Time Series and Allied Processes, Applied Probability Trust, London, pp 383-388
35. J.C. Willems [1971]. Least squares stationary optimal control and the algebraic Riccati equation, IEEE Trans on Automat Contr, Vol AC-16, No 6, pp 621-634
36. G. Zames [1981]. Feedback and optimal sensitivity: Model reference transformations, multiplicative seminorms, and approximate inverses, IEEE Trans on Automatic Control, Vol AC-26, pp 301-320
37. K. Zhou and P.P. Khargonekar [1988]. An algebraic Riccati equation approach to H∞ optimization, Systems and Control Letters, Vol 11, pp 85-92
One of the key contributions to arise in system science over the past 40 years has been LQG theory. The purpose of this paper is to review this theory, emphasising the connection between continuous and discrete time cases. We also briefly review certain robustness issues including the concept of loop transfer recovery.
1 Introduction
LQG theory, as first described in the 1960's [14], [15], [16], [18], represented a major breakthrough in system science, since it provided a new viewpoint within which problems could be formulated. Prior to the appearance of this
theory, the principal emphasis had been on frequency domain techniques for
single input single output systems. LQG theory made possible the treatment
of multivariable, nonstationary problems and therefore opened new horizons
in control and estimation theory.
Since its appearance, LQG theory has proved to be enormously successful. The reasons for this success include the fact that the problem formulation is easy to understand and to relate to physical design objectives; the fact that the resultant optimization problem fits into a rich mathematical framework, leading to elegant and mathematically tractable solutions; and the fact that the results have a clear and intuitive interpretation in practice.
Over the past 30 years there has been an enormous amount of work done
on the LQG problem including extensions of the theory and a large number
of successful applications. It would be impossible to do justice to a survey of
this work. Thus, we will give a tutorial introduction to the theory but with a
novel twist: treating the continuous and discrete cases in a unified framework.
The essential components of the LQG problem are as follows [25]. The
underlying system is described by a linear state space model having "white
noise" disturbances in both the state evolution and output measurements. In
continuous time, this model has the following incremental form [3]:
  dx = Ac x dt + Bc u dt + dv;   x(0) = x0   (1.1)
  dz = Cc x dt + dw   (1.2)
where x ∈ R^n, u ∈ R^m, z ∈ R^r are the state, input and integral of the output respectively. Also, v and w are independent Wiener processes having incremental covariance Qc dt and Γc dt respectively. (A useful heuristic is to think of dv/dt and dw/dt as continuous time "white noise" processes, in which case Qc and Γc have the interpretation of power spectral densities. In this context, the system output is yc = dz/dt.) The initial state x0 is taken to be a random variable, independent of v and w, and having covariance P0.
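As a small illustration of this incremental model, the sketch below propagates the state covariance of a scalar version of (1.1); the values of a, qc and p0 are assumptions, not taken from the text. It compares a forward-Euler integration of the variance equation dP/dt = 2aP + qc against its closed form.

```python
import math

# Covariance propagation for a scalar version of (1.1): with
# dx = a*x dt + dv and incremental covariance E[dv^2] = qc*dt,
# P(t) = E[x(t)^2] obeys dP/dt = 2*a*P + qc.  a, qc, p0 are assumptions.
a, qc, p0 = -0.8, 0.4, 1.0

def propagate(p, dt, steps):
    for _ in range(steps):
        p += (2*a*p + qc) * dt     # forward-Euler step of the variance ODE
    return p

t_final = 2.0
p_euler = propagate(p0, t_final/20000, 20000)

# Closed-form solution for comparison
p_exact = (p0 + qc/(2*a)) * math.exp(2*a*t_final) - qc/(2*a)
err = abs(p_euler - p_exact)
```

For stable a < 0 the covariance settles at the steady-state value qc/(−2a), the scalar analogue of the filtering Riccati equilibrium.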
Given the model (1.1), (1.2), the design objective is to minimize the following quadratic criterion, with the input expressed as a function of past data:

  J = E{ x(tf)' Σf x(tf) + ∫_0^{tf} (x'Qx + u'Ru) dt }   (1.3)
2 LQ Control Problem
Consider the special case when the noise is zero and the state is directly measured. The model (1.1), (1.2) then becomes

  dx/dt = Ac x + Bc u;   x(0) = x0   (2.1)
  yc = Cc x   (2.2)
With a zero order hold input and sampling period Δ, the corresponding discrete model is

  x((k + 1)Δ) = Aq x(kΔ) + Bq u(kΔ);   x(0) = x0   (2.3)
  y(kΔ) = Cq x(kΔ)   (2.4)

where

  Aq = e^{Ac Δ}   (2.5)
  Bq = ∫_0^Δ e^{Ac(Δ − τ)} Bc dτ   (2.6)
  Cq = Cc   (2.7)
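The discretization formulas (2.5)-(2.6) can be evaluated by truncated power series. The following sketch does this for a scalar system; ac, bc and Delta are illustrative assumed values.

```python
import math

# Zero-order-hold discretization (2.5)-(2.6) for an assumed scalar system
# xdot = ac*x + bc*u, evaluated by truncated power series:
#   Aq = exp(ac*Delta),  Bq = (integral over [0, Delta] of exp(ac*s) ds) * bc.
ac, bc, Delta = -2.0, 1.0, 0.1

def zoh(ac, bc, Delta, terms=30):
    aq = sum((ac*Delta)**k / math.factorial(k) for k in range(terms))
    # term-by-term integral of the exponential series
    bq = bc * sum(ac**k * Delta**(k+1) / math.factorial(k+1) for k in range(terms))
    return aq, bq

aq, bq = zoh(ac, bc, Delta)
aq_exact = math.exp(ac*Delta)
bq_exact = (math.exp(ac*Delta) - 1.0)/ac * bc
```

In the scalar case the integral in (2.6) has the closed form (e^{ac Δ} − 1)/ac, which the series reproduces to machine precision.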
The subscript 'q' refers to the shift operator formulation which is standard for the discrete time case. However, a difficulty with this description is that it does not have a meaningful limit as the sampling period goes to zero. For example, we have

  lim_{Δ→0} Aq = I;   lim_{Δ→0} Bq = 0   (2.8)

This difficulty arises since, whereas (2.1) is in incremental form (i.e. the equation describes the differential of the state), (2.3) is in absolute form (i.e. the equation describes the end point of the state transition). This motivates the following alternative form of (2.3):
  [x((k + 1)Δ) − x(kΔ)] / Δ = A' x(kΔ) + B' u(kΔ)   (2.9)
  yc(kΔ) = C' x(kΔ)   (2.10)

where

  C' = Cq = Cc   (2.11)
  A' = T Ac,   B' = T Bc   (2.12)
  T = I + (Δ Ac)/2! + (Δ Ac)^2/3! + ...   (2.13)

We also introduce the unified derivative/difference operator ρ:

  ρx := dx/dt   (continuous time)   (2.14)
  ρx(kΔ) := [x((k + 1)Δ) − x(kΔ)] / Δ = [(q − 1)/Δ] x(kΔ)   (discrete time)   (2.15)

where q is the usual forward shift operator. In this notation the continuous model (2.1), (2.2) and the discrete model (2.9), (2.10) can be written in a unified way as

  ρx = Ax + Bu;   x(0) = x0   (2.16)
  yc = Cx   (2.17)

where (A, B, C) stands for (Ac, Bc, Cc) in continuous time and for (A', B', C') in discrete time.
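The convergence of the incremental-form matrices, in contrast to (2.8), can be seen numerically. The sketch below (scalar, with assumed values) computes A' from the series (2.13) and checks both the identity A' = (Aq − 1)/Δ and the convergence A' → ac as Δ → 0.

```python
import math

# The incremental-form matrices of (2.12)-(2.13), scalar case with assumed
# values: A' = T*ac with T = I + (ac*Delta)/2! + (ac*Delta)^2/3! + ...,
# equivalently A' = (Aq - 1)/Delta.  Unlike Aq and Bq in (2.8), A' and B'
# have the sensible limits ac and bc as Delta -> 0.
ac, bc = -2.0, 1.0

def incremental(ac, bc, Delta, terms=30):
    T = sum((ac*Delta)**k / math.factorial(k+1) for k in range(terms))
    return T*ac, T*bc                       # A', B'

errs = []
for Delta in (0.1, 0.01, 0.001):
    a_prime, b_prime = incremental(ac, bc, Delta)
    # cross-check against the closed form (Aq - 1)/Delta
    assert abs(a_prime - (math.exp(ac*Delta) - 1.0)/Delta) < 1e-12
    errs.append(abs(a_prime - ac))          # shrinks as Delta -> 0
```

The error |A' − ac| decays linearly in Δ, which is exactly why the incremental description admits the continuous model as its limit.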
In the sequel, integrals and sums are also treated in a unified way by means of the operation

  S_0^t f := ∫_0^t f(τ) dτ   (continuous time)
  S_0^t f := Δ · ∑_{k=0}^{(t/Δ)−1} f(kΔ)   (discrete time)   (2.18)

In this notation, the noise-free LQ problem is to minimize

  J = x(tf)' Σf x(tf) + S_0^{tf} (x'Qx + u'Ru);   x(0) = x0   (2.20)

subject to (2.16). Introducing the costate λ with terminal condition

  λ(tf) = Σf x(tf)   (2.21)

the optimal input has the form

  u(t) = −R^{-1} B^T λ(t + Δ)   (2.22)

and x, λ together satisfy the two point boundary value problem

  [ I   ΔBR^{-1}B^T ] [ ρx(t) ]   [ A    −BR^{-1}B^T ] [ x(t) ]
  [ 0   (I + A^TΔ)  ] [ ρλ(t) ] = [ −Q   −A^T        ] [ λ(t) ]   (2.23)

Since (I + A^TΔ) is nonsingular for the models (2.1), (2.9), then (2.23) can be rewritten as

  [ ρx(t) ]       [ x(t) ]
  [ ρλ(t) ]  =  M [ λ(t) ]   (2.24)

where

  M = [ I   ΔBR^{-1}B^T ]^{-1} [ A    −BR^{-1}B^T ]   (2.25)
      [ 0   (I + A^TΔ)  ]      [ −Q   −A^T        ]

which for Δ = 0 (the continuous case) reduces to

  M = [ A    −BR^{-1}B^T ]   (2.26)
      [ −Q   −A^T        ]
It follows that, with J := [ 0 I ; −I 0 ],

  J^T = −J   (2.28)

and

  J^{-1} M^T J = −(I + MΔ)^{-1} M   (2.29)

Substituting (2.29) into (2.23) and using the fact that ρ(Σx) = (ρΣ)x + Σ(ρx) + Δ(ρΣ)(ρx) leads to the Riccati Difference (Differential) Equation (RDE)

  [ −Σ(t)   I ] M [ I ; Σ(t) ] = −ρΣ(t − Δ);   Σ(tf) = Σf   (2.31)

together with the feedback law

  u(t) = −L(t)x(t),   L(t) = [R + ΔB^TΣ(t + Δ)B]^{-1} B^T Σ(t + Δ)(I + ΔA)   (2.35)

In steady state, Σ(t) ≡ Σ and (2.31) becomes

  [ −Σ   I ] M [ I ; Σ ] = 0   (2.36)

This nonlinear equation is called the Algebraic Riccati Equation (ARE) and has a family of solutions. One of these solutions is of particular interest and is obtained as follows:
Let X = [X11 ; X21] ∈ R^{2n×n} be the matrix formed from the generalized eigenvectors corresponding to the eigenvalues of M inside or on the stability boundary; i.e.

  M X = X Λ   (2.37)

where Λ is a diagonal matrix of eigenvalues (more generally a Jordan form matrix). From (2.37) we have (provided X11^{-1} exists) that

  M [ I ; X21 X11^{-1} ] = [ I ; X21 X11^{-1} ] X11 Λ X11^{-1}   (2.38)

It is then clear from (2.37), (2.36) that Σs = X21 X11^{-1} satisfies the ARE. Also from the top equation in (2.37) we have

  X11 Λ X11^{-1} = A − BL   (2.39)

where L is obtained from (2.35) with Σ(t + Δ) replaced by Σs. Hence, the closed loop system corresponding to Σs has eigenvalues inside or on the stability boundary. This solution is the strong solution to the ARE [8]. If all of the closed loop poles lie inside the stability boundary, then the solution is called the stabilizing solution.
Next, we clarify the conditions under which the solutions of the RDE (2.31) converge to the strong solution of the ARE [7].
We factor Q as C̄'C̄. We then have the following result:
Theorem 2.1 [7], [8], [21].
(i) Existence and Uniqueness. The ARE has a unique strong solution if and only if (A, B) is stabilizable.
(ii) Stabilizability only. Subject to (Σf − Σs) ≥ 0, then lim_{t→∞} Σ(t) = Σs if and only if (A, B) is stabilizable, where Σ(t) is the solution of the RDE with final condition Σf, and Σs is the unique strong solution of the ARE.
□
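The eigenvector construction (2.37)-(2.39) is easy to carry out by hand in the scalar continuous-time case. The sketch below uses assumed values of a, b, q, r and checks that S = X21/X11 solves the scalar ARE and yields a stable closed loop.

```python
import math

# Scalar continuous-time instance of (2.36)-(2.39), with assumed data:
# for xdot = a*x + b*u and weights q, r, the Hamiltonian matrix is
#   M = [[a, -b*b/r], [-q, -a]], eigenvalues +/- mu, mu = sqrt(a*a + q*b*b/r).
a, b, q, r = 1.0, 1.0, 2.0, 1.0
mu = math.sqrt(a*a + q*b*b/r)

# Eigenvector of M for the stable eigenvalue -mu: from the first row,
# a*x11 - (b*b/r)*x21 = -mu*x11, so x21/x11 = r*(a + mu)/(b*b).
x11 = 1.0
x21 = r*(a + mu)/(b*b)
S = x21 / x11                            # Sigma_s = X21 * X11^(-1)

are_residual = 2*a*S - (b*b/r)*S*S + q   # scalar ARE, Delta = 0 case of (2.36)
closed_loop = a - (b*b/r)*S              # a - b*L, cf. (2.39)
```

Here the closed-loop pole lands exactly at −mu, the stable Hamiltonian eigenvalue, illustrating (2.39).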
Theorem 2.2 [21]. For the steady state version of the control law (2.35) we have:
(1) for all ω ∈ R,

  |g(ω)| ≥ 1 − √a,   where g := 1 + L(γI − A)^{-1}B

evaluated along the stability boundary, and

  a = ΔB^TΣs B / (R + ΔB^TΣs B)

(3) The phase margin is at least

  cos^{-1}(1 − ½(1 − √a)^2)

Note that as Δ → 0, the above results reduce to the continuous time results described earlier.
The sampled-data model corresponding to (1.1), (1.2) is

  x((k + 1)Δ) = Aq x(kΔ) + Bq u(kΔ) + vq(kΔ)   (3.1)
  y(kΔ) = Cq x(kΔ) + wq(kΔ)   (3.2)

where y(t) is the pre-filtered version of yc(t) and where vq(kΔ), wq(kΔ) are discrete gaussian white noise processes having covariance given by

  E{ [vq(kΔ) ; wq(kΔ)] [vq(lΔ)^T  wq(lΔ)^T] } = [ Qq   Sq ; Sq^T   Rq ] δ(k − l)   (3.3)

As might be expected from comments made in Sect. 2, the model format given in (3.1), (3.2) will not have a sensible limit as Δ → 0. Indeed, in addition to (2.8), which also holds here, we have that Qq is of order Δ and Rq is of order 1/Δ, and hence

  lim_{Δ→0} Qq = 0,   lim_{Δ→0} Rq = ∞   (3.4)

As in Sect. 2, this motivates an incremental form of the model,

  ρx = A'x + B'u + v'   (3.5)
  y = C'x + w'   (3.6)

whose noise spectral densities

  [ Q'   S' ; S'^T   Γ' ]   (3.8)

remain finite and nonzero as Δ → 0.
We can then write the models (1.1), (1.2) and (3.5), (3.6) in unified form as

  ρx = Ax + Bu + v   (3.9)
  y = Cx + w   (3.10)

where (v^T, w^T)^T has spectral density [ Q  S ; S^T  Γ ]. The associated (unified) Kalman filter is the observer

  ρx̂ = Ax̂ + Bu + H(y − Cx̂)   (3.11)

where

  H = [(ΔA + I)PC^T + S][Γ + ΔCPC^T]^{-1}   (3.12)

and P is the solution of the corresponding Riccati equation (3.13).
From the structure of the solution given in (3.12), (3.13) it follows that the properties of the Kalman filter are duals of the corresponding properties of an associated optimal control problem. For example, if we factor Q as DD^T, then we have the following dual to Theorem 2.1:
Theorem 3.1
(i) Existence and uniqueness. The ARE has a unique strong solution if and only if (C, A) is detectable.
(ii) Detectability only. Subject to P0 − Ps ≥ 0, then lim_{t→∞} P(t) = Ps if and only if (C, A) is detectable, where P(t) is the solution of the RDE with initial condition P0, and Ps is the unique strong solution of the ARE.
(iii) Detectability and no unreachable modes on stability boundary. Subject to P0 > 0, then the detectability of (C, A) and the nonexistence of unreachable modes of (A, D) on the stability boundary are necessary and sufficient conditions for lim_{t→∞} P(t) = Ps (exponentially fast), where P(t) is the solution of the RDE with initial condition P0, and Ps is the unique stabilizing solution of the ARE.
□
The above results cover a wide range of filtering problems. For example, part (ii) of the above theorem applies to sinewave estimation in noise. This allows the traditional methods of Discrete Fourier Transforms to be viewed as a special case of Kalman filtering [5].
As stated in Sect. 1, the solution of the general problem given in equations (1.1) to (1.3) is to combine the state feedback solution described in Sect. 2 with the optimal state estimator described above. This result is commonly known as the Separation Theorem or the use of Certainty Equivalence [1], [3].
4 Robustness Issues
Unfortunately the nice robustness properties of the LQ optimal controller described in Sect. 2 are often lost when x̂ is fed back rather than x [9]. To see why this is so, we place the Kalman filter in a more general setting. In the time invariant case, the Kalman estimator gives x̂ as a particular linear function of y and u. We generalize this to any function of the following form:

  x̂ = T1(ρ)u + T2(ρ)y   (4.1)

where T1, T2 are rational, proper, stable linear time invariant operators. For example, any full order steady state linear observer (including the Kalman filter) leads to

  T1(ρ) = (ρI − A + HC)^{-1}B,   T2(ρ) = (ρI − A + HC)^{-1}H   (4.2)
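The observer-based choice (4.2) reproduces the state exactly, for every frequency. A scalar numerical check (with assumed values of A, B, C, H) is sketched below.

```python
# Scalar check of the observer-based choice (4.2): with
# T1(s) = (s - A + H*C)^(-1) B and T2(s) = (s - A + H*C)^(-1) H,
# the identity T1(s) + T2(s) C (s - A)^(-1) B = (s - A)^(-1) B
# holds for every s.  A, B, C, H are illustrative scalars (assumptions).
A, B, C, H = -1.0, 2.0, 1.0, 3.0

def gap(s):
    T1 = B / (s - A + H*C)
    T2 = H / (s - A + H*C)
    # difference between the reconstructed and the true state transfer function
    return abs(T1 + T2*C*(B/(s - A)) - B/(s - A))

worst = max(gap(s) for s in (1j, 0.5 + 2j, -0.3 + 10j))
```

Algebraically, T1 + T2·C(s − A)^{-1}B = (s − A + HC)^{-1}[(s − A) + HC](s − A)^{-1}B = (s − A)^{-1}B, which is why the gap is zero at every test frequency.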
Fig. 4.1

For x̂ to reproduce the state exactly for all inputs, T1 and T2 must be such that

  T1(ρ) + T2(ρ)C(ρI − A)^{-1}B = (ρI − A)^{-1}B   (4.3)
Full state feedback and state estimate feedback lead to different sensitivities of the closed loop. We illustrate this idea in the case of an unmeasured input disturbance. Consider a MIMO system under linear feedback control based on estimates of the states as shown in Fig. 4.1.
From Fig. 4.1, we have

  u' = u − d = −Lx̂   (4.4)

and hence, under estimate feedback,

  u = S_ef d   (4.5)

where

  S_ef = [I + L(ρI − A)^{-1}B]^{-1}[I + LT1(ρ)]   (4.7)

Under full state feedback, on the other hand,

  u' = u − d = −Lx = −L(ρI − A)^{-1}Bu   (4.8)

so that

  u = S_sf d   (4.9)

with

  S_sf = [I + L(ρI − A)^{-1}B]^{-1}   (4.10)

Comparing (4.7) and (4.10),

  S_ef = S_sf [I + LT1(ρ)]   (4.11)
From (4.11) it is clear that the sensitivities to input disturbances are exactly equal if and only if LT1(ρ) = 0. This can be achieved by setting T1(ρ) = 0, but then it may not be possible to construct a proper, stable T2 which satisfies (4.3). A practical solution to this dilemma involves requiring LT1(ρ) to be zero only in the frequency band of the disturbance. Then a stable T2 can be sought which ensures satisfaction of (4.3) in a bandwidth appropriate for the problem at hand. If the plant is minimum phase, such a practical solution is easy to achieve. In the single input single output case it suffices to put

  T2(ρ) = adj(ρI − A)B / (N(ρ)E(ρ))   (4.12)

where N(ρ) is the numerator of the system transfer function and where E(ρ) is a stable polynomial introduced to ensure that (4.12) is proper. By choosing E(ρ) to roll off outside the bandwidth of interest, it is readily seen that (4.3) is satisfied over this bandwidth. In the non-minimum phase case, a compromise is required between equalizing the sensitivities to input disturbances and satisfying (4.3).
For more general problems, we can use the Kalman filter framework to design a robust filter by introducing additional terms in the noise spectral densities as an artifice for dealing with different types of uncertainty. For example, returning to the input disturbance case, this can be captured in the Kalman filter setting by choosing Q = BB^T and then letting Γ → 0 [14]. In the case of a minimum phase, relative degree one system, this leads to the solution given in (4.12). This latter approach is generally known as Loop Transfer Recovery [10], [11], [23], [24]. Using the same general approach, sensitivity to other forms of uncertainty can be addressed [20].
Work along the general directions outlined above has contributed to a better understanding of the robustness properties of the LQG solution. This has further extended the practical appeal of the method.
5 Conclusions
This paper has given a brief review of linear quadratic control and estimation
theory from a unified perspective. This dass of techniques provides a flexible
framework within which feedback control problems can be precisely specified
and solved. The method has found wide spread acceptance and is frequently
used especially for complex multi variable systems.
Of course, the raison d'etre for the use of feedback is to have a control
system which has low sensitivity to uncertainty. Hence, a complete design will
generally require a combination of feedback and feedforward strategies with
the feedback bandwidth being made as high as possible so as to reduce sensitivity.
No matter wh at method is used to design the feedback, the ultimate achievable
bandwidth is limited by non-minimum phase zeros, time delays, high relative
188
Chapter 4
Introduction
Dynamical systems can be described in two different ways. The first is the
external, input/output or black-box description and the second is the internal or
state-space description. The former is solely in terms of external variables, the
causes or inputs and the responses or outputs. The latter is in terms of additional
variables, the so-called state variables. The state variables represent the internal
dynamics of the system; they summarize its past history and are sometimes
referred to as the memory of the system.
The state-space description has a long history going back to Newton.
Famous examples are the Lagrangian and the Hamiltonian frameworks in
mechanics. The input/output description is more recent. In an electric network,
the impedance linking the input current and the resulting voltage constitutes
an example of such a description.
The question of the relationship between the external and the internal
descriptions of a given system arises. In principle, going from the internal to
the external description is straightforward, as it involves mere elimination of
the state variables. The converse step, i.e. deriving the internal from the
complete, or from an incomplete, external description is highly non-trivial. This is the problem of modelling, i.e. the problem of constructing the state from measurements on the external variables. Realization is the special case where the systems to be modelled are assumed to be linear and time-invariant, and (complete or incomplete) measurements of the impulse response are provided.
The solution of the realization problem is mainly due to R.E. Kalman. It constituted the first systematic approach to the modelling question and a big step towards the understanding of the structure of linear systems.
Historically, the realization problem for linear, time-invariant, finite-dimensional systems from the transfer function matrix was solved in 1963, in a paper by Gilbert and in another by Kalman, published back-to-back in the SIAM Journal on Control (see Gilbert [16] and Kalman [17]). Using a partial
*The work of the third author was partially supported by the Inamori Foundation.
A. C. Antoulas et al.
  σx(t) = Fx(t) + Gu(t),   x(0) = 0,
  y(t) = Hx(t)   (1)
where the time t is either a continuous variable t ∈ R, or a discrete variable t ∈ Z. The operator σ is defined as follows:

  σx(t) := dx(t)/dt, when t ∈ R;   σx(t) := x(t + 1), when t ∈ Z;

u, x, y are elements of the input space U := R^m, the state space X := R^n, and the output space Y := R^p, respectively, while F, G, H are linear maps between the following spaces:

  F: X → X,   G: U → X,   H: X → Y

Such a triple will be denoted by

  Σ := (H, F, G)

Two triples Σ = (H, F, G) and Σ̄ = (H̄, F̄, Ḡ) are called equivalent if there exists a nonsingular map T such that

  T F̄ = F T,   T Ḡ = G,   H̄ = H T

All systems belonging to the same equivalence class as Σ will be denoted by [Σ].
The second family of systems we will consider consists of the so-called convolution systems, i.e. systems where the output y is obtained by convolving the input u with the impulse response A:

  y(t) = (A ∗ u)(t)   (3)

that is,

  y(t) = ∫ A(t − τ) u(τ) dτ,   t ∈ R   (4.1)

  y(t) = ∑_{τ=0}^{t} A_{t−τ} u(τ),   t ∈ Z   (4.2)
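A small numerical sketch of the link between the state-space description (1) and the convolution description (4.2): for an assumed discrete-time triple (H, F, G), the simulated zero-initial-state output coincides with the convolution of the input with the Markov parameters.

```python
# State-space vs. convolution description: the zero-initial-state response
# of an (assumed, illustrative) discrete-time triple (H, F, G) equals the
# convolution of the input with the Markov parameters A_t = H F^(t-1) G.
F = [[0.0, 1.0], [-0.5, 1.0]]
G = [1.0, 0.0]
H = [1.0, 1.0]

def mat_vec(M, v):
    return [sum(M[i][j]*v[j] for j in range(len(v))) for i in range(len(M))]

def simulate(u):
    x, y = [0.0, 0.0], []
    for ut in u:
        y.append(sum(hi*xi for hi, xi in zip(H, x)))         # y(t) = H x(t)
        x = [xi + g*ut for xi, g in zip(mat_vec(F, x), G)]   # x(t+1) = F x(t) + G u(t)
    return y

def markov(n):
    out, v = [], G[:]
    for _ in range(n):
        out.append(sum(hi*vi for hi, vi in zip(H, v)))       # A_t = H F^(t-1) G
        v = mat_vec(F, v)
    return out

u = [1.0, -2.0, 0.5, 0.0, 3.0, 1.0]
A_seq = markov(len(u))
# y(t) = sum over tau < t of A_(t-tau) u(tau): no direct feedthrough term
y_conv = [sum(A_seq[t - tau - 1]*u[tau] for tau in range(t)) for t in range(len(u))]
err = max(abs(a - b) for a, b in zip(simulate(u), y_conv))
```

This is precisely the statement that the sequence S = (A1, A2, ...) determines the external behavior of Σ.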
195
t ~ 0,
tER,
t>O,
tEZ,
in both cases
At=HP-1G,
t=1,2, ...
Recall, that for both discrete- and continuous-time systems the transfer function
(the transform of A) is
Z(o") = H(01 - F)-lG = HG(J-l
This provides another way of seeing that S completely specifies the external
behavior for systems of the form (3). We can therefore define a map
(5.1)
(5.2)
The next question is: is 4> invertible? This question if far from trivial. It is,
in fact, the realization question: given complete data about the impulse response,
can we associate to it a state-space tripIe 1:' (4) surjective), uniquely (4) injective)?
The answer is: both the domain and the range of 4> need to be restricted for it
to become an invertible map.
In the sequel, R(F, G) := [G  FG  F^2G  ...] denotes the reachability matrix and O(H, F) := R'(F', H') the observability matrix. To the sequence S we attach the behavior matrix

  B(S) := [ A1  A2  A3  ...
            A2  A3  A4  ...
            A3  A4  A5  ...
            ...           ]

This matrix has block Hankel structure and infinitely many rows and columns. It turns out that the image of φ_n is

  im φ_n = F_ext^n := {S ∈ F_ext : rank B(S) = n}

Summarizing, we have the following
Main result: Realization with complete data. The map

  φ_n : [Σ] ↦ S,

whose action is defined by (5.2), is a bijection between the equivalence classes of F_int^n and F_ext^n.
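The rank condition in the main result can be observed numerically: for an assumed two-dimensional triple (H, F, G), every finite section of the Hankel matrix B(S) has rank n = 2. The sketch below uses exact rational arithmetic.

```python
from fractions import Fraction

# The main result says rank B(S) = n = dim Sigma.  For the (assumed,
# illustrative) two-dimensional triple (H, F, G) below, the Markov
# parameters A_t = H F^(t-1) G fill a Hankel matrix whose rank stays
# equal to n = 2 no matter how large a section we take.
F = [[Fraction(0), Fraction(1)], [Fraction(-2), Fraction(3)]]
G = [Fraction(1), Fraction(0)]
H = [Fraction(1), Fraction(0)]

def markov(n):
    out, v = [], G[:]                     # v holds F^(t-1) G
    for _ in range(n):
        out.append(sum(h*x for h, x in zip(H, v)))
        v = [sum(F[i][j]*v[j] for j in range(2)) for i in range(2)]
    return out

def hankel_rank(seq, rows, cols):
    M = [[seq[i + j] for j in range(cols)] for i in range(rows)]
    rank, pivot_row = 0, 0
    for col in range(cols):
        pr = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pr is None:
            continue                      # no pivot in this column
        M[pivot_row], M[pr] = M[pr], M[pivot_row]
        for r in range(pivot_row + 1, rows):
            f = M[r][col] / M[pivot_row][col]
            M[r] = [a - f*b for a, b in zip(M[r], M[pivot_row])]
        rank, pivot_row = rank + 1, pivot_row + 1
        if pivot_row == rows:
            break
    return rank

S = markov(10)
ranks = [hankel_rank(S, k, k) for k in (2, 3, 4, 5)]
```

Here the Markov sequence satisfies the recursion induced by the characteristic polynomial of F, so all rows beyond the second are linear combinations of the first two.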
Consider now a finite sequence S_N := (A1, ..., A_N). The associated behavior matrix is the partially defined block Hankel matrix

  B(S_N) := [ A1   A2   A3   ...  A_N
              A2   A3   ...  A_N  ?
              A3   ...  A_N  ?    ?
              ...
              A_N  ?    ...  ?    ? ]

where the ?s stand for as yet unknown matrices which conserve the block Hankel structure of B(S_N). The following definitions will be needed in the sequel (see, e.g. Kalman [27], Bosgra [15]). We will say that the jth column (row) of B(S_N) is linearly independent from the previous ones if there exists a j × j submatrix of the first j columns (rows) of B(S_N) which is nonsingular, independently of the choice of the unknown elements ?. Similarly, the ith column (row) is linearly dependent on the previous columns (rows) of B(S_N) if the determinants of all i × i submatrices of the first i columns of this behavior matrix depend on some free parameters (this implies that they can be made zero for some choice of the free parameters). It follows that the rank of B(S_N) is r if there exists an r × r submatrix which is nonsingular independently of the free parameters ?. Consider the family
  F_ext^n(N) := {S_N : rank B(S_N) = n}

In a similar way to (5.1), (5.2) we can define the following map between the family of state-space systems F_int^n and the family of input/output systems just defined:

  ψ_n : F_int^n → F_ext^n(N)   (6.1)
  ψ_n : Σ ↦ S_N(Σ) := (A1, ..., A_N)   (6.2)
It is easy to check that this map is surjective but not injective. To see that it is not injective, consider the case N = 1 and S1 = (1): all members of a one-parameter family of two-dimensional triples Σ2, e_i ∈ R, realize S1, yet no two of them are equivalent.
The classification of the solutions makes use of the reachability indices κ_j, j ∈ m, and the observability indices ν_i, i ∈ p, of Σ ∈ F_int^n. These indices are invariants with respect to equivalence and satisfy

  ∑_{j∈m} κ_j = ∑_{i∈p} ν_i = n = dim Σ
With linear dependence and independence of the columns (rows) of the partially defined behavior matrix as defined above, we attach to B(S_N) unique column indices c_j, j ∈ m, and unique row indices r_i, i ∈ p, as follows. If c_j = l (r_i = l), the jth column (ith row) of the (l + 1)st block column (row) of B(S_N) is linearly dependent on the previous columns (rows), while the jth column (ith row) of the lth block column (row) is linearly independent of the previous columns (rows). It follows that

  ∑ c_j = ∑ r_i = n = rank B(S_N)

Let us now define the family

  F̂_ext^n(N) := {S ∈ F_ext^n(N) : c_j + r_i ≤ N, j ∈ m, i ∈ p}

On this family uniqueness is restored: the realization of S_N ∈ F̂_ext^n(N) is unique up to equivalence, with indices κ_j = c_j, j ∈ m, and ν_i = r_i, i ∈ p.
Remarks. (a) As in the previous section, the constructions and proofs are
omitted. The interested reader is referred to Bosgra [15] and Antoulas [3] for
details.
(b) Comparing the cases of realization with complete data (infinite sequence S) and realization with partial data (finite sequence S_N), we see that completeness of the data forces uniqueness. In the latter case there is no uniqueness. Actually, there is an infinity of (non-equivalent) solutions. In order to classify this set of solutions additional conditions are needed. The above result shows that these additional constraints guarantee uniqueness; they are expressed in terms of inequalities among the indices κ_j, ν_i and the amount of information (i.e. the length of the sequence) available.
(c) Given S_N, suppose that the conditions for uniqueness are not satisfied. What we need to do is construct all continuations S̄ of S_N such that κ_j + ν_i ≤ N. Each such continuation corresponds then to one of the solutions with indices κ_j, ν_i.
3 Further Developments
a. Synthesis Problems
Consider a plant described by

  ( y1 )   ( Z11  Z12 ) ( u1 )
  ( y2 ) = ( Z21  Z22 ) ( u2 )

We are looking for all dynamic compensators given by the proper rational matrix Zc:

  u2 = −Zc y2,

such that the resulting closed-loop transfer matrix Zy, defined by y1 = Zy u1, has certain properties, the most important being internal stability and the stability of Zy itself. Using the so-called Youla-Kucera parametrization, the achievable Zy can be expressed as

  Zy = Z1 + Z2 Zx Z3

where Z1, Z2, Z3 are readily derived from the Zijs (see, e.g. [28]). The starting
point for this calculation is the expression of the Zijs in terms of matrix fractions.
In doing so two possibilities are available: the first is to use polynomial matrix
fractions and the second is to use proper stable rational matrix fractions. The
latter approach has the advantage that the properness of the compensator Zc
is guaranteed; this is not the case with the former approach. The advantage of
the former approach on the other hand, is that it allows us to keep track of
the complexity (McMillan degree) of the compensator; this in turn is not the
case with the latter approach.
The problem stated above was indeed investigated in Antoulas [2] making
use of polynomial factorizations. It turns out that the properness constraint can
be dealt with in terms of partial realizations. More precisely, Zc is proper if
and only if the parameter Zx is a partial realization of a sequence derived from
the Zijs. This shows the relevance of the realization problem to synthesis
problems.
b. Recursiveness Issues
In addition, there exist positive integers n_i < n_{i+1}, i = 1, 2, ..., such that there is a one-one correspondence between

  Σ_1 ↔ S_1 := (A1, ..., A_{n1}),
  Σ_{1,2} ↔ S_{1,2} := (A1, ..., A_{n1}, ..., A_{n2}), ...

where Σ_{1,k} denotes the interconnection of the first k subsystems in the above cascade. The main property of this correspondence is that Σ_{1,k} is a minimal partial realization of S_{1,k} for k = 1, 2, ....
This implies that the above structure is compatible with recursiveness, since changing any Markov parameter A_l, l > n_k, does not affect the interconnection Σ_{1,k} of the first k subsystems. This fundamental result provides the basis for a
theory of recursive realizations for multi-input, multi-output systems. The
natural tool for achieving this is a certain polynomial unimodular matrix of
size p + m, which can be associated to every system with m inputs and p outputs.
The result described above has numerous connections to other topics like
(matrix) continued fraction expansions, (matrix) Euclidean algorithm, (matrix)
linear fractional transformations, Kronecker indices, etc.. Furthermore, for
ni = i, an explicit construction of the systems in the above diagram, from the
Markov parameters, has been worked out [3]. This construction constitutes
the matrix generalization of the well known Berlekamp-Massey recursive
algorithm.
The results described above were triggered by Kalman's paper on the partial realization problem [27]. For details, the main source is Antoulas [3]. Further
investigations along the same lines can be found in Antoulas [8], [11], Antoulas
and Bishop [5]. In [11] it is shown that the machinery introduced in [3] leads
to a new test for minimality of (both scalar and matrix) realizations based on the
degrees of the entries of an appropriate Bezout identity. In [5] various
recursiveness results are translated to a state-space setting and contact with geometric control is established. Furthermore, in [8], it is shown that many seemingly diverse results in system theory can be explained in a unified way using the cascade interconnection of two-port systems mentioned above as a tool. In particular, this refers to passive network synthesis results like Darlington synthesis and Inverse Scattering, the underlying Nevanlinna-Pick algorithm,
and of course, the recursive realization results just mentioned. Finally, this
recursiveness structure can be extended to very general classes of modelling
problems (see Antoulas and Willems [14]).
c. Rational Interpolation
Consider the formal power series

  Z̄(σ) := Z(1/σ) = A1 σ + A2 σ^2 + ... + A_N σ^N + ...

It follows that

  A_k = (1/k!) (d^k Z̄(σ)/dσ^k) |_{σ=0}

Hence, realization can be interpreted as interpolation at the origin. Moreover, the complexity of Z(σ) is the same as that of Z̄(σ).
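The interpretation of the Markov parameters as Taylor coefficients at the origin can be checked numerically. The sketch below (scalar, with assumed values f, g, h) recovers A_k = h f^{k−1} g from Z̄(σ) = Z(1/σ) by a discretized Cauchy integral.

```python
import cmath

# Recovering the Markov parameters of the (assumed) scalar transfer
# function Z(sigma) = h*g/(sigma - f) as Taylor coefficients of
# Zbar(sigma) := Z(1/sigma) at the origin, via a discretized Cauchy
# integral; they must agree with A_k = h * f^(k-1) * g.
f, g, h = 0.5, 2.0, 3.0

def Zbar(s):
    return h*g*s / (1.0 - f*s)            # Z(1/s)

def taylor_coeff(k, r=0.5, m=256):
    # trapezoidal discretization of (1/2*pi*i) * integral of Zbar(s)/s^(k+1) ds
    # over the circle |s| = r
    total = 0.0 + 0.0j
    for n in range(m):
        w = r * cmath.exp(2j * cmath.pi * n / m)
        total += Zbar(w) / w**k
    return (total / m).real

err = max(abs(taylor_coeff(k) - h * f**(k - 1) * g) for k in range(1, 7))
```

The aliasing error of the m-point discretization decays like |f·r|^m, so with m = 256 the recovered coefficients match the Markov parameters to machine precision.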
It is easy to see that black-box experiments (with linear systems) using
exponentials instead of impulses result in information on the value (and the
value of a certain number of derivatives) of the transfer function at points in the
complex plane which correspond to the frequency of these exponentials. Thus
experiments of this sort result in interpolation data.
The question arises as to whether realization and interpolation can be treated
in a unifying framework. As shown in Antoulas and Anderson [4] this is indeed
the case. Actually, the tool replacing the (partially defined) behavior matrix B(S_N), which has Hankel structure, is the so-called Löwner matrix. It can be shown that the Löwner matrix reduces to a Hankel matrix whenever we are dealing with single point interpolation (which was shown above to include the realization problem). Therefore the Löwner matrix framework provides a generalization of the realization problem.
Various other investigations on the rational interpolation problem have followed. In Antoulas [6] the (scalar) rational interpolation problem, with a different degree of complexity (namely the sum of the numerator and the denominator degrees, as opposed to the maximum of the numerator and the denominator degrees), is considered. It is shown that the Euclidean algorithm, and the degrees of the successive quotients, provide the key to parametrizing all solutions, where the complexity is the parameter defined above. This clarifies many aspects of the Padé and the Cauchy approximation problems, not fully understood in the literature. In Antoulas and Anderson [7], various classical results in rational interpolation theory are revisited and reinterpreted using the Löwner matrix introduced above. An important result in this regard is the fact that the celebrated Nevanlinna-Pick algorithm consists of nothing more than unconstrained minimal interpolation of the original set of data, together with a mirror-image set of data. In Antoulas and Anderson [9], further insight is provided on the use of the Löwner matrix in the study of the scalar rational interpolation problem. In particular, all solutions are constructed both in a state-space and a polynomial setting. In Anderson and Antoulas [10], the construction of solutions of the matrix rational interpolation problem, in state-space form, is derived using the (block) Löwner matrix. This theory generalizes the construction of realizations by means of the (block) Hankel
matrix. Finally, Antoulas and Willems [13] provide a novel approach which in [12] leads, for the first time, to the solution of a general matrix rational interpolation problem with the McMillan degree as complexity; the computation of minimal interpolants follows as a corollary. This is done by defining, directly from the data, a pair of matrices. All admissible interpolant degrees are then obtained from the reachability indices of this pair. Moreover, the corresponding linear dependencies of the columns of the reachability matrix of this pair provide a parametrization of all interpolants of appropriate complexity. These results are also extended in [12] to a very general bitangential interpolation problem by the appropriate definition of a second pair of matrices.
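The role played by the reachability matrix of a pair and the linear dependencies of its columns can be sketched numerically. The pair (F, G) below is hypothetical, not the data-defined pair of [12], [13]; the sketch only shows how the reachability matrix is stacked and how its dependent columns are located:

```python
import numpy as np

def reachability_matrix(F, G, k=None):
    """Stack [G, FG, F^2 G, ...] with k block columns (default: dim of F)."""
    n = F.shape[0]
    k = n if k is None else k
    blocks, M = [], G
    for _ in range(k):
        blocks.append(M)
        M = F @ M
    return np.hstack(blocks)

def first_dependent_columns(R, tol=1e-9):
    """Scan columns left to right; return indices of the columns that are
    linearly dependent on the columns preceding them."""
    basis, dependent = np.zeros((R.shape[0], 0)), []
    for j in range(R.shape[1]):
        candidate = np.hstack([basis, R[:, j:j + 1]])
        if np.linalg.matrix_rank(candidate, tol=tol) > basis.shape[1]:
            basis = candidate
        else:
            dependent.append(j)
    return dependent

# hypothetical pair: a 3x3 shift-like F and a 2-column G
F = np.array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]])
G = np.array([[0., 0.], [0., 0.], [1., 0.]])
R = reachability_matrix(F, G)
print(np.linalg.matrix_rank(R), first_dependent_columns(R))
```

For this pair the first input column generates a chain of length three while the second contributes nothing, so the dependent column indices are 1, 3, 5.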
References
[1] B.D.O. Anderson and S. Vongpanitlerd, Network analysis and synthesis: a modern systems theory approach, Prentice Hall (1973)
[2] A.C. Antoulas, A new approach to synthesis problems in linear systems, IEEE Transactions on Automatic Control, AC-30, pp 465-474 (1985)
[3] A.C. Antoulas, On recursiveness and related topics in linear systems, IEEE Transactions on Automatic Control, AC-31, pp 1121-1135 (1986)
[4] A.C. Antoulas and B.D.O. Anderson, On the scalar rational interpolation problem, IMA J of Mathematical Control and Information, Special Issue on Parametrization problems, edited by D. Hinrichsen and J.C. Willems, 3, pp 61-88 (1986)
[5] A.C. Antoulas and R.H. Bishop, Continued fraction decomposition of linear systems in the state space, Systems and Control Letters, 9, pp 43-53 (1987)
[6] A.C. Antoulas, Rational interpolation and the Euclidean algorithm, Linear Algebra and Its Applications, 108, pp 157-171 (1988)
[7] A.C. Antoulas and B.D.O. Anderson, On the stable rational interpolation problem, Linear Algebra and Its Applications, Special Issue on Linear Control Theory, 122/123/124, pp 301-329 (1989)
[8] A.C. Antoulas, The cascade structure in system theory, in Three decades of mathematical system theory, edited by H. Nijmeijer and J.M. Schumacher, Springer Lecture Notes in Control and Information Sciences, 135, pp 1-18 (1989)
[9] A.C. Antoulas and B.D.O. Anderson, State space and polynomial approaches to rational interpolation, Progress in Systems and Control Theory: Realization and Modelling in System Theory, M.A. Kaashoek, J.H. van Schuppen, and A.C.M. Ran, Eds., vol. I, pp 73-82, Birkhäuser (1990)
[10] B.D.O. Anderson and A.C. Antoulas, Rational interpolation and state-variable realizations, Linear Algebra and Its Applications, Special Issue on Matrix Problems, 137/138: 479-509 (1990)
[11] A.C. Antoulas, On minimal realizations, Systems and Control Letters, 14: 319-324 (1990)
[12] A.C. Antoulas, J.A. Ball, J. Kang, and J.C. Willems, On the solution of the minimal rational interpolation problem, Linear Algebra and Its Applications, Special Issue on Matrix Problems, 137/138: 511-573 (1990)
[13] A.C. Antoulas and J.C. Willems, Rational interpolation and Prony's method, Analysis and Optimization of Systems, J.L. Lions and A. Bensoussan, Eds., Springer Verlag, Lecture Notes in Control and Information Sciences, 144: 297-306 (1990)
[14] A.C. Antoulas and J.C. Willems, Linear modeling and recursive modeling, ECE Rice University, Tech. Report 91-05 (1991)
[15] O.H. Bosgra, On parametrizations of the minimal partial realization problem, Systems & Control Letters, 3: 181-187 (1983)
[16] E.G. Gilbert, Controllability and observability in multivariable control systems, SIAM J Control, 1: 128-151 (1963)
[17] R.E. Kalman, Mathematical description of linear dynamical systems, SIAM J Control, 1: 152-192 (1963)
A. C. Antoulas et al.
[18] R.E. Kalman, Lyapunov functions for the problem of Lur'e in automatic control, Proc National Academy of Sciences (USA), 49: 201-205 (1963)
[19] R.E. Kalman, On a new characterization of linear passive systems, Proc First Allerton Conference on Circuits and Systems, University of Illinois, pp 456-470 (1963)
[20] R.E. Kalman, Irreducible realizations and the degree of a rational matrix, SIAM J Control, 13: 520-544 (1965)
[21] B.L. Ho and R.E. Kalman, Effective construction of linear state-variable models from input/output data, Regelungstechnik, 14: 545-548 (1966)
[22] R.E. Kalman, On minimal partial realizations of a linear input/output map, in Aspects of network and system theory, R.E. Kalman and N. DeClaris, Eds., Holt, Rinehart, and Winston, pp 385-408 (1971)
[23] R.E. Kalman, P.L. Falb, and M.A. Arbib, Topics in mathematical system theory, McGraw-Hill (1969)
[24] R.E. Kalman, Kronecker invariants and feedback, in Proc 1971 NRL-MRC Conf on Ordinary Differential Equations, edited by L. Weiss, Academic Press (1972)
[25] R.E. Kalman and Y. Rouchaleau, Realization theory of linear systems over a commutative ring, in Automata theory, languages, and programming, edited by M. Nivat, North Holland, pp 61-65 (1972)
[26] R.E. Kalman, Realization theory of linear dynamical systems, in Control theory and functional analysis, Vol II, International Atomic Energy Agency, Vienna, pp 235-256 (1976)
[27] R.E. Kalman, On partial realizations, transfer functions, and canonical forms, Acta Polytechnica Scandinavica, Mathematics and Computer Science Series, 31: 9-32 (1979)
[28] J.B. Pearson, On the parametrization of input-output maps for stable linear systems, this volume, chapter V, pp 345-354
difficult to expect every state to be reachable. Instead, one usually requires that the reachable states constitute a dense subspace. This notion is called approximate reachability. A system that is approximately reachable and observable is said to be weakly canonical.
This notion of canonicity appears to be a natural extension of the finite-dimensional case, so one may expect that weakly canonical realizations may be unique. However, Baras, Brockett, and Fuhrmann [2] have shown that there exist two (in fact, infinitely many) systems Σ₁ and Σ₂ having the same impulse response, both weakly canonical, yet not isomorphic. The difficulty here is of topological nature. They both have a Hilbert state space, and there exists a continuous system morphism

T: Σ₁ → Σ₂,   (1)

yet T is not continuously invertible. In other words, under the weak notion of canonicity, the topology of the system cannot be uniquely determined from its external data.
The reason for this nonuniqueness is that approximate reachability or observability imposes too little restriction on the topology of the state space. This suggests the need of strengthening the notion of canonicity to obtain the desired uniqueness. Kalman himself did not regard the notions of controllability and observability as ones that cannot be changed. Rather, they should be properly modified according to the context in which systems are considered. Let us quote from his CIME Lecture Notes ([9]):
"The chief current problem in controllability theory is the extension to more elaborate algebraic structures." (p. 141, Historical Comments)
Reading "topological" for "algebraic" precisely applies to the present context. In fact, it has become a standard understanding that it is appropriate to consider the notions of reachability and observability in the category in which systems are considered.
There are a number of approaches toward the uniqueness principle along this line. Helton [8] gave a uniqueness theorem under the requirement of exact reachability. Brockett and Fuhrmann [3] derived a different uniqueness theorem by restricting to systems with certain symmetry properties.
There is, however, another aspect that Kalman emphasized. In his k[z]-module approach to realization, input functions are sequences of bounded length, and every canonical realization can be obtained as a result of computing the Nerode equivalence classes ([14]) of such input functions ([11]). He pursued this line of approach in [12] for the study of continuous-time systems. However, to make the module theory for this case parallel to the discrete-time case, a space of distributions was introduced as an input function space. (See also Matsuo [15] and Kamen [13] for related treatments.) This introduction of a space of distributions leads to a non-Hilbert space structure, and it is in marked contrast with the works [1], [3], [4], [8], where the theory is essentially in the realm of L²-theory, and the Nerode equivalence classes are not explicitly present.
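In the finite-dimensional discrete-time setting, the computation of a canonical realization from input/output data is the classical construction of Ho and Kalman [21], which factors the Hankel matrix of the Markov parameters. A minimal sketch in the scalar case, using an SVD in place of the elementary operations of the original paper:

```python
import numpy as np

def ho_kalman(markov, n):
    """Sketch of the Ho-Kalman construction (scalar case): factor the
    Hankel matrix of the Markov parameters h_1, h_2, ... to recover a
    state-space triple (A, b, c) of order n with h_k = c A^(k-1) b."""
    N = len(markov) // 2
    H = np.array([[markov[i + j] for j in range(N)] for i in range(N)])
    Hshift = np.array([[markov[i + j + 1] for j in range(N)] for i in range(N)])
    U, s, Vt = np.linalg.svd(H)
    U, s, Vt = U[:, :n], s[:n], Vt[:n, :]
    Obs = U * np.sqrt(s)                 # observability factor
    Reach = np.sqrt(s)[:, None] * Vt     # reachability factor
    A = np.linalg.pinv(Obs) @ Hshift @ np.linalg.pinv(Reach)
    b, c = Reach[:, :1], Obs[:1, :]
    return A, b, c

# hypothetical impulse response h_k = (1/2)^(k-1), realizable with one state
h = [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
A, b, c = ho_kalman(h, n=1)
h_rec = [float(c @ np.linalg.matrix_power(A, k) @ b) for k in range(6)]
```

The reconstructed Markov parameters h_rec reproduce the data, and the state dimension equals the rank of the Hankel matrix.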
Therefore, in this context, more advanced tools are needed to derive the uniqueness theorem for canonical realizations. Matsuo [16] made use of Pták's open mapping/closed graph theorem ([17]), and proved the uniqueness theorem under the requirements that
1. the input function space be a so-called Pták space ([17]);
2. the state-space be a barreled space;
3. "canonical" means exactly reachable and observable.
The above approaches (except [3]) all focus upon the notion of reachability. An alternative approach is, however, also possible. This approach places more emphasis upon observability, and introduces a stronger notion of observability; on the other hand, exact reachability is not required. To motivate it, let us return to the question of extending the notions of reachability and observability, and focus on the notion of observability.
A state-space is a gadget which stores the past history of inputs to the system that is enough to determine the future behavior of the system outputs ([11]). States are not directly observed, but rather only through observation of outputs. From the viewpoint of realization theory, therefore, states can be anything that satisfies this requirement. In the finite-dimensional case, this led to the notion of abstract vector spaces, where the coordinate system chosen is of secondary importance. It is mainly for the convenience of computation, conciseness of expression, etc.
A similar argument can be made on topologies of the state-space in infinite dimensions. Since states cannot be directly observed, the closeness of two states (i.e. topology) is observed only through the corresponding outputs. Of course, two close states should produce close outputs. But if we require observability, we want to conclude that two states are close if the corresponding outputs are close. This is precisely Kalman's basic idea on observability, adopted in the topological context. In fact, for the finite-dimensional case, observability guarantees this property. In other words, initial state determination is always well posed in the finite-dimensional case. This is an implicit consequence of observability in finite-dimensional systems.
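The finite-dimensional well-posedness can be made concrete: for an observable pair, the stacked observability matrix is injective, so the initial state is recovered stably from output samples. A sketch with hypothetical system matrices:

```python
import numpy as np

# Discrete-time sketch: with y_k = c A^k x0, observability makes the
# observability matrix O injective, so recovering x0 from outputs is
# well posed (the normal equations below have an invertible Gramian).
A = np.array([[0.9, 1.0], [0.0, 0.8]])
c = np.array([[1.0, 0.0]])

O = np.vstack([c @ np.linalg.matrix_power(A, k) for k in range(4)])
x0 = np.array([2.0, -1.0])
y = O @ x0                                  # noise-free output samples

x0_hat = np.linalg.solve(O.T @ O, O.T @ y)  # least-squares recovery of x0
print(x0_hat)
```

In infinite dimensions, as the text explains, injectivity of this map no longer guarantees a continuous inverse, which is exactly the gap topological observability closes.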
However, due to the very infinite-dimensionality, this well-posedness does not automatically hold for systems with infinite-dimensional state spaces, even though they are observable. Observability merely requires that the correspondence

initial states ↦ outputs   (2)

be one-to-one, but it is not necessarily well posed. This means the following: If (2) is not well posed, then even if we observe two outputs to be very close, the
corresponding states can be very far away. Since there is nothing else to rely upon other than the observation of outputs, there would be no way to recover the knowledge of closeness of states in such a case.
This observation naturally leads to the notion of topological observability ([18]): A system is said to be topologically observable if the correspondence (2) is well posed, i.e. it is continuously invertible. We say that a system is canonical if it is both approximately reachable and topologically observable. This is a strong restriction, and there is a question as to whether it may be an obstruction to the existence of a canonical realization.
It turns out that a canonical realization in this sense always exists and is unique. We conclude this section with this theorem.
Theorem 11.2 (Existence and Uniqueness of Canonical Realizations). Let f be a linear input/output map. Then its canonical (i.e. approximately reachable and topologically observable) realization always exists, and is unique up to isomorphism.
is continuously invertible for some T > 0. If this property holds, the system is said to be topologically observable in bounded time. Interestingly enough, this condition gives a necessary and sufficient condition for the shift realization (described in the last section) to have a Banach state space ([18]).
Kamen [13] considered a related notion of finiteness. In the study of bounded-time controllability, he is led to the study of impulse responses of the form

A = q⁻¹ * p,   (4)

where q and p are Schwartz distributions of compact support contained in (−∞, 0]. This is a precise analog of Kalman's k[z]-module setting for discrete-time systems. There, polynomials are regarded as signals of finite length given on (−∞, 0] ∩ ℤ, and transfer functions are ratios of such objects. In (4), one takes distributions of compact support in (−∞, 0] instead of polynomials in z. A typical example is given by delay-differential equations. For example, let W(s) be the transfer function

W(s) = 1/(se^s − 1).   (5)

The inverse Laplace transforms of s and e^s are δ′ (the derivative of Dirac's delta distribution) and δ₋₁ (Dirac's delta at point −1), respectively. Then the corresponding impulse response becomes

A = (δ′₋₁ − δ)⁻¹.   (6)
impulse response, which in turn yields the finite-time reconstruction of the states. In what follows, we show that this finite-time property enables us to compute the Nerode equivalence classes very naturally as a generalization of the works [11] and [6].
Let A = q⁻¹ * p be pseudorational. Consider the following subspace X^q of L²_loc[0, ∞):

X^q := {x ∈ L²_loc[0, ∞); supp(q * x) ⊂ (−∞, 0]}.   (7)

The space X^q is the set of all output functions generated by the denominator distribution q⁻¹. As can be imagined from the finite-dimensional theory ([11], [6]), this is a dual way of representing the Nerode equivalence classes of the impulse response A = q⁻¹ * p. In view of the separate continuity of convolution, this space is closed in L²_loc[0, ∞). It is also easy to check that (7) is closed under the left shift operators:

(σₜx)(τ) := x(t + τ).   (8)

Since {σₜ} comprise a strongly continuous semigroup in L²_loc[0, ∞), they also form a semigroup in X^q. Let F be the infinitesimal generator of this semigroup. Then the following differential equation description gives a realization Σ^{q,p} of A:

State Space: X^q
System Equations:
(d/dt)x(t) = Fx(t) + pu(t),   (9)
y(t) = x(t)(0),   (10)

where

Fx(τ) := dx/dτ, with domain D(F) = W^{1,2}_loc[0, ∞) ∩ X^q.   (11)
In the rest of the section, we will see the following properties of Σ^{q,p}:
1. When explicit forms of q and p are given, it is possible to exhibit Σ^{q,p}.
2. Σ^{q,p} often agrees with natural function space models, such as the M²-model for delay systems.
For the example (5), the condition defining X^q reads

supp(q * x) ⊂ (−∞, 0],   (14)

i.e. q * x = 0 on (0, ∞). This yields

ẋ(t + 1) − x(t) = 0.   (16)

Hence

x(t) = x(1) + ∫₀^{t−1} x(s) ds   (17)

for 1 ≤ t < 2. Iteration of this formula implies that under (16), x(t) is completely determined by the data x|[0,1) and x(1). Taking the closure of all such x in L²_loc[0, ∞) to obtain X^q, we conclude

X^q ≅ L²[0, 1] × ℝ.   (18)

Write (x, z(0)) for an element in L²[0, 1] × ℝ. To derive the system equation (9) in this space, we need only to compute

lim_{ε→0} (1/ε)[σ_ε(x, z(0)) − (x, z(0))],   (19)

under the equivalence (18), and the fact that x and z(0) give the data of x|[0,1) and x(1). An easy calculation ([22]) yields

(d/dt)[zₜ(θ); x(t)] = [∂zₜ(θ)/∂θ; zₜ(0)] + [0; 1]u(t),
y(t) = zₜ(0).   (20)
This is nothing but the well known M²-model for the delay-differential equation:

ẋ(t) = x(t − 1) + u(t),
y(t) = x(t − 1).   (21)

(22)
if and only if
1. there is no common zero between the Laplace transforms q̂(s) and p̂(s), and
2. sup{t; t ∈ supp q ∪ supp p} = 0.
Since in the present case the pair (q, p) = (δ′₋₁ − δ, δ) satisfies these conditions, the realization above is canonical. Moreover, the spectrum of F is characterized by

σ(F) = {λ; q̂(λ) = 0}.   (24)

Every λ ∈ σ(F) is an eigenvalue, and has a finite multiplicity which is equal to the dimension of the corresponding generalized eigenspace.
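The delay system of the example, ẋ(t) = x(t − 1) + u(t), can be integrated numerically by carrying along exactly the M²-data: the trajectory segment over the last unit interval plus a point value. A forward-Euler sketch, with a hypothetical step size and constant initial history:

```python
import numpy as np

# Forward-Euler sketch of the delay system xdot(t) = x(t-1) + u(t),
# y(t) = x(t-1), with zero input and the (hypothetical) constant
# initial history x(t) = 1 on [-1, 0].
dt = 0.001
delay = int(round(1.0 / dt))             # number of grid steps in one delay
x = np.ones(3 * delay + 1)               # history on [-1, 0] + grid on [0, 2]

for k in range(2 * delay):               # integrate over [0, 2]
    i = delay + k                        # index of time t = k*dt
    x[i + 1] = x[i] + dt * x[i - delay]  # xdot(t) = x(t - 1), u = 0

# method of steps: x(t) = 1 + t on [0, 1], so x(0.5) = 1.5, x(1) = 2
x_half = x[delay + delay // 2]
```

On the first delay interval the derivative is the constant history, so the Euler recursion reproduces the method-of-steps solution exactly; on later intervals the history buffer plays precisely the role of the segment zₜ in the M²-state.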
3 Some Remarks
Mainly due to the page limitation, we could not discuss the L² input/output framework (and its transformed form, H² theory) developed in [1], [2], [4], [5], [8], etc. For more details, see [7] and references therein. The H² framework has recently received renewed interest in connection with H∞-control theory, whose consequences are yet to be seen in future developments. In all
The use of state-space models for modelling and processing of random signals was introduced by Kalman at the very beginning of the history of System Theory. Although spectacular successes have emerged from the introduction of these models (Kalman filtering to name just one), until quite recently there has not been any serious effort of putting together in a logically consistent way a theory of modelling and model representation in the stochastic frame. Expanding applications to diverse fields like Econometrics etc. and a multitude of non-standard estimation problems arising in engineering applications seem now to render the need for such a theory more urgent.
In this paper we discuss some ideas which are believed to be the central concepts needed for understanding stochastic modelling. Stochastic realization is seen as the problem of transforming models of "phenomenological" type (called external) into models possessing more structure, which require the introduction of auxiliary variables (internal models).
1 Introduction
Mathematical models of dynamic phenomena can be classified in two broad categories: external models, which are mathematical relations involving only the external variables of the system [external variables are by definition those directly accessible to observation (measurement variables) or control (decision variables)] and internal models, which, besides the external, also involve auxiliary variables. Auxiliary variables (also called internal or latent variables) need not have any direct physical or economic meaning and are introduced for the purpose of giving the model a special mathematical structure. In a sense they play the role of additional dynamical parameters which, if necessary, can be eliminated, returning to an external description.
Realization may be broadly defined as the problem of transforming external models into internal models (to within a specified structure). According to this definition, realization may then abstractly be viewed as a problem of parametrization, falling into the same general category as, say, representing an algebraic curve Γ = {(x, y); F(x, y) = 0} in the plane (here x and y are the external variables and F(x, y) = 0, F a polynomial, is the external model) in parametric form Γ = {(x, y); x = φ(t), y = ψ(t), t ∈ I}, the parameter t ∈ I being the "latent" auxiliary variable and the above "parametric" description the wanted "internal" model of Γ.
G. Picci
R.E. Kalman originated realization theory in the early sixties studying the following setup:
- The class of external models consists of causal linear input-output maps ("input" here is used as a synonym of control).
- An internal auxiliary variable x is defined by the property of making past inputs and future outputs conditionally independent given the current value of x. This is called the state property and x the state variable of the system. Internal models with the auxiliary variable x possessing the state property are called state-space systems.
It is a fact that the introduction of state-space systems has had a profound influence on the development of modern engineering sciences. On one side, a natural mathematical framework has emerged for stating and solving many basic problems of control and communication engineering. The role of "sufficient statistic" of the state variable, alluded to above, permits the translation of control and filtering problems into control and filtering of the state, thus leading to general prototype problems which are formulated and solved in a universal format, leading for the first time to an effective "theory of communication and control".
From the computational viewpoint, the concept of state-space system appears very much as the right generalization of the concept of recursively computable function to the (infinite cardinality) continuous case. The solution of a control/observation problem stated using state-space models can most naturally be given by producing a new state-space system which does not exhibit the solution in closed form but instead performs the signal processing required for implementing it. Thus state-space dynamical systems play a dual role, as models and as computational schemes. The importance of this aspect has taken some time to be fully appreciated, although recent emphasis on computational methods has led to quite a change of perspective. Nowadays even Bode plots are computed by state-space methods [15].
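The remark about Bode plots can be made concrete: the frequency response of a state-space model is obtained by solving one linear system per frequency point. A sketch with a hypothetical second-order model:

```python
import numpy as np

# Bode magnitude from a state-space model (A, b, c, d):
# G(jw) = c (jwI - A)^{-1} b + d on a logarithmic frequency grid.
# The model below is a hypothetical lightly damped oscillator,
# G(s) = 1 / (s^2 + 0.2 s + 1).
A = np.array([[0.0, 1.0], [-1.0, -0.2]])
b = np.array([[0.0], [1.0]])
c = np.array([[1.0, 0.0]])
d = 0.0
I = np.eye(2)

w = np.logspace(-1, 1, 400)
mag_db = np.array([
    20 * np.log10(abs((c @ np.linalg.solve(1j * wk * I - A, b))[0, 0] + d))
    for wk in w
])
peak_db = mag_db.max()       # resonance peak near w = 1, about 14 dB
```

Solving (jωI − A)x = b at each frequency avoids forming the transfer function explicitly, which is the numerical advantage of the state-space route.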
For the above reasons, in the last two decades there have been substantial efforts to generalize modelling and realization theory beyond Kalman's original input-output setup. In particular, motivated by areas like statistical signal processing and econometrics, the question of understanding stochastic modelling and building realization theory in the context of stochastic models has naturally arisen. A first indication in this direction was given already in [7]. Also, one would like to include in the theory the case of autonomous (deterministic) systems, where there are no control variables and input-output maps cannot be taken as a primary external description. The originators of Stochastic Realization were Akaike [1], Ruckebusch [22] and Picci [16], but the main body of the theory is especially due to Lindquist-Picci and Ruckebusch and is summarized in the survey papers [13], [22]; see also [23]. The "autonomous" setting has been developed into a very articulate theoretical construction by
J.C. Willems [25-28] and by the Dutch school. It is remarkable that these two apparently different contexts turn out to involve in reality quite similar ideas [20]. In this respect, we would like to present this paper much more as an attempt to sort out general ideas on modelling and model structure than as a report on specific results on stochastic realization. We believe that many ideas, although originated in a probabilistic context and described here in a probabilistic language, have general significance. Some can be recognized in Willems' theory even if there are no probability measures around.
We shall not attempt any survey of the literature on stochastic realization theory. We should however point out that, at least for the linear-Gaussian case, a distributional modelling theory, based on spectral factorization and the so-called Positive-Real Lemma (the Yakubovich-Kalman-Popov Lemma), has been available since the late sixties [2], [6]. This was not satisfactory, however, as in virtually all of the applications the processing of "random" signals requires processing of a specific time trajectory of the signal. Therefore a theory based on consideration of "sample values" is needed, not just a distributional one. Now, this brings in directly the question of defining stochastic models, the basic objects of our study.
Definition. A stochastic dynamical system is a stochastic process z := {z(t)}, t ∈ T, defined on a parametrized family of probability spaces {Ω, 𝒜, μ_u}. The parameter u is the control variable, a deterministic function of time t ∈ T, belonging to some set 𝒰 of admissible control functions, and the dependence of the probability measure μ_u upon u is causal, i.e. for every event A belonging to the past history of y at time t, μ_u(A) depends only on values taken by u before and at time t.
The process z will in general take its values in a product space Y × X (of external and internal signal alphabets) and the relative components y and x are declared external and internal (or latent) variables. We shall usually write z as an ordered pair z = (y, x). The system is called an external description if z ≡ y and an internal description if internal variables (x) are present. Naturally the probability space is only assigned up to stochastic equivalence (it can be fixed in some canonical way). The σ-algebra² 𝒜 represents the events being modelled by the system. Clearly 𝒜 contains at least the events relative to z, i.e. 𝒜 ⊃ 𝒵 := σ{z(t); t ∈ T}, but it may be bigger. Finally u, called the control (or decision) variable, is a variable with no dynamical description (i.e. without a probabilistic description). Assigning a dynamics to u, thus making it also into a stochastic process, is actually what control theory is about and need not concern us here. The causality of the map u ↦ μ_u is postulated because of the meaning of u as a decision variable (no clairvoyance).
The notion of autonomous dynamical system descends from the general definition by deleting control variables (i.e. taking 𝒰 to be a singleton). This is
² All σ-algebras will be assumed μ-complete and the qualification "μ-almost surely" is tacitly understood whenever appropriate.
2 Splitting Variables
There are two fundamental types of auxiliary variables which enter in the construction of internal models of random phenomena. We shall call them splitting variables and noise variables. A splitting variable parametrizes the probabilistic dependence between external variables. In the common probability space {Ω, 𝒴, μ}, let 𝒴ᵢ = σ(yᵢ), i = 1, 2, be the σ-algebras induced by the random variables yᵢ. A random variable x is said to be splitting for (y₁, y₂) if 𝒳 = σ(x) makes 𝒴₁ and 𝒴₂ conditionally independent given 𝒳, i.e.
(i) μ(A₁ ∩ A₂ | 𝒳) = μ(A₁ | 𝒳)μ(A₂ | 𝒳), A₁ ∈ 𝒴₁, A₂ ∈ 𝒴₂, or equivalently,
(ii) μ(A₂ | 𝒴₁ ∨ 𝒳) = μ(A₂ | 𝒳), A₂ ∈ 𝒴₂, or also,
(iii) μ(A₁ | 𝒴₂ ∨ 𝒳) = μ(A₁ | 𝒳), A₁ ∈ 𝒴₁.
Notations: 𝒴₁ ⊥ 𝒴₂ | 𝒳 or y₁ ⊥ y₂ | x.
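In the Gaussian case, conditional independence given x reduces to a covariance computation: the conditional cross-covariance of y₁ and y₂ given x must vanish. A numerical sketch with a hypothetical one-factor model:

```python
import numpy as np

# Gaussian sketch of the splitting property: x splits (y1, y2) exactly
# when the conditional cross-covariance cov(y1, y2 | x) vanishes.
# Hypothetical model: y1 = 2x + w1, y2 = -x + w2, with x, w1, w2
# independent and of unit variance.
a1, a2 = 2.0, -1.0
S = np.array([                        # joint covariance of (y1, y2, x)
    [a1 * a1 + 1.0, a1 * a2,       a1 ],
    [a1 * a2,       a2 * a2 + 1.0, a2 ],
    [a1,            a2,            1.0],
])
S12, S1x, Sx2, Sxx = S[0, 1], S[0, 2], S[2, 1], S[2, 2]
cond_cov = S12 - S1x * Sx2 / Sxx      # cov(y1, y2 | x), Schur complement
print(cond_cov)
```

The unconditional covariance S12 is nonzero, yet conditioning on the common factor x removes all the dependence, which is exactly the splitting property in this linear-Gaussian setting.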
The definition is easily extendable to any number of external variables. Since σ-algebras have a natural ordering by refinement, there is a natural notion of minimal (or coarsest) splitting σ-algebra, and of minimal splitting variable as one inducing a minimal splitting σ-algebra. Minimal splitting σ-algebras are non-unique, and a classical counterexample to uniqueness is offered by the two predictor algebras

E(𝒴₁ | 𝒴₂) := σ{μ(A₁ | 𝒴₂); A₁ ∈ 𝒴₁}
E(𝒴₂ | 𝒴₁) := σ{μ(A₂ | 𝒴₁); A₂ ∈ 𝒴₂}

each of which is minimal splitting. The two predictor algebras coincide only when 𝒴₁ and 𝒴₂ intersect perpendicularly, i.e. when 𝒴₁ ⊥ 𝒴₂ | 𝒴₁ ∩ 𝒴₂, which is a rather special condition.
Splitting variables are a generalization of the idea of (Bayesian) sufficient statistic. In particular, a minimal 𝒴₁-measurable sufficient statistic is a "coarsest" function x = φ(y₁) which does exactly as well as the whole y₁ for the purpose of predicting y₂. This property is generalized by property (ii) above. Note that a splitting variable need not necessarily be a function of (y₁, y₂). If this is the case the variable is called (y₁, y₂)-induced (the word "internal" is also used, but since it may cause confusion in the present context it will be avoided). In general x may even require a larger probability space than the one supporting (y₁, y₂). A prototype abstract problem in stochastic realization is to construct splitting variables as functions of certain available random data: Given in a probability space {Ω, 𝒜, μ} random variables y₁, y₂ and perhaps also an exogenous independent random element w, find all minimal splitting variables x (or σ-algebras 𝒳) for (y₁, y₂) which are functions of (y₁, y₂, w) (resp. contained in 𝒴₁ ∨ 𝒴₂ ∨ σ(w)).
As we shall see later on, different classes of internal stochastic models can naturally be characterized in terms of a particular splitting property. Perhaps the most important is the characterization of state-space internal models, which we proceed to illustrate below. [The notations used are as follows: if y = {y(t)}, t ∈ ℝ,
(𝒴ₜ⁻ ∨ 𝒳ₜ⁻) ⊥ (𝒴ₜ⁺ ∨ 𝒳ₜ⁺) | 𝒳ₜ   (MS)
The external process y is then called the output and x the state of the system. The system is stationary if (y, x) are jointly stationary, and finite dimensional if x(t) takes values, for each t ∈ ℝ, in a finite dimensional space X.
From the definition of conditional independence reported at the beginning, it is seen that 𝒴₁ ⊥ 𝒴₂ | 𝒳 implies 𝒜₁ ⊥ 𝒜₂ | 𝒳 for any sub-σ-algebras 𝒜₁ ⊂ 𝒴₁ and 𝒜₂ ⊂ 𝒴₂. It then follows from condition (MS) that

𝒴ₜ⁻ ⊥ 𝒴ₜ⁺ | 𝒳ₜ   (S)

and

𝒳ₜ⁻ ⊥ 𝒳ₜ⁺ | 𝒳ₜ   (M)

for all t ∈ ℝ. (S) is referred to as the splitting property of x, while (M) is just the Markov property of the state process x. In the present continuous-time setting it can be shown that (S) and (M) imply, and hence are equivalent to, (MS), the Markovian Splitting property of the Definition, but in more general situations this may not be the case. The two properties (S) and (M) constitute the natural generalization to the stochastic framework of the deterministic properties of a state variable.
The following implication is an immediate consequence of (ii) or (iii):

𝒴₁ ⊥ 𝒴₂ | 𝒳 ⇒ 𝒴₁ ∩ 𝒴₂ ⊂ 𝒳,

so that (S) implies 𝒴ₜ ⊂ 𝒳ₜ and hence there is a Borel function hₜ: X → ℝᵐ such that

y(t) = hₜ(x(t)).   (MR)
Theorem. In the probability space {Ω, 𝒜, μ}, where 𝒜 ⊃ 𝒴, let {𝒮ₜ} and {𝒮̄ₜ} be two flows of σ-algebras satisfying the following conditions:
1. {𝒮ₜ} is increasing and {𝒮̄ₜ} is decreasing with t.
2. 𝒮ₜ ∨ 𝒮̄ₜ = 𝒜 for all t.
3. 𝒮ₜ ⊃ 𝒴ₜ⁻ and 𝒮̄ₜ ⊃ 𝒴ₜ⁺ for all t.
4. 𝒮ₜ ⊥ 𝒮̄ₜ | 𝒮ₜ ∩ 𝒮̄ₜ for all t, i.e. {𝒮ₜ} and {𝒮̄ₜ} are perpendicularly intersecting.
Then 𝒳ₜ := 𝒮ₜ ∩ 𝒮̄ₜ is Markovian splitting and 𝒳ₜ⁻ = 𝒮ₜ, 𝒳ₜ⁺ = 𝒮̄ₜ. Viceversa, all Markovian splitting σ-algebras 𝒳ₜ are generated in this way, with 𝒮ₜ = 𝒴ₜ⁻ ∨ 𝒳ₜ, 𝒮̄ₜ = 𝒴ₜ⁺ ∨ 𝒳ₜ and 𝒜 = 𝒴 ∨ 𝒳. □
This result is a generalization of the fundamental scattering representation of Markovian Splitting Subspaces first obtained by Lindquist-Picci in the wide-sense (linear) framework (the original reference being listed in [13]). A proof of this theorem is a result of joint work with J.H. van Schuppen and will appear in a separate publication.
In the stationary wide sense (or Gaussian) setting, one can work in the Hilbert subspace of L²(Ω, 𝒜, μ) linearly generated by the components of the process under study, and the abstract procedure given before for constructing the state of a process y can be made quite explicit. Conditional independence reduces to conditional orthogonality of subspaces of a Hilbert space, written

H₁ ⊥ H₂ | X

with the equivalent meanings,
i′. ⟨λ₁ − E(λ₁ | X), λ₂ − E(λ₂ | X)⟩ = 0
ii′. E(λ₂ | H₁ ∨ X) = E(λ₂ | X)
iii′. E(λ₁ | H₂ ∨ X) = E(λ₁ | X)
where λᵢ ∈ Hᵢ, ⟨·,·⟩ denotes inner product, E(· | X) is the orthogonal projection operator onto X, and H ∨ X is the closed vector sum of subspaces. We shall use the standard notations H(y), Hₜ⁻(y) and Hₜ⁺(y) to denote the subspaces spanned by (the scalar components of) the process y, the past history of y up to time t, and the future history of y after time t. There is a unitary group {Uₜ} of linear operators, called the shift group, which stationarily propagates the process y, Uₜyₖ(s) = yₖ(t + s), k = 1, …, m, and all time-dependent quantities in H(y) are
y(t) = Cx(t)   (MR′)

where C is a fixed linear operator from the state-space of the Markov process {x(t)} into ℝᵐ. The general result on construction of splitting σ-algebras given above specializes to
Proposition. Given a stationary Gaussian space H ⊃ H(y) with shift {Uₜ}, consider the intersection X = S ∩ S̄ of a pair (S, S̄) of subspaces of H satisfying the following conditions:
1′. UₜS ⊂ S and U₋ₜS̄ ⊂ S̄ for all t ≤ 0, i.e. S and S̄ are invariant under the left and right shift semigroups.
2′. S ∨ S̄ = H.
3′. S ⊃ H⁻(y), S̄ ⊃ H⁺(y).
4′. S and S̄ are perpendicularly intersecting.
y₁ = A₁x + w₁
⋮
y_N = A_N x + w_N   (FA)

then any set of generators x = {x₁, …, xₙ} for X defines a F.A. model. If x is a minimal set, the operators Aₖ are uniquely determined. The model (FA) is minimal if and only if X = H(x) is a minimal splitting subspace. □
The stochastic realization problem for Factor Analysis models is very difficult
even in the seemingly simpler static case. The intrinsic feature of non-uniqueness
of the latent variable x has created a wave of mystery around these objects and
their use has been discouraged. Obviously the non-uniqueness of the factor x
or better, of the minimal splitting subspaces X, is always present in parametrization problems of this nature. The problem here seems to be much more one of
classification of the minimal factors than of achieving "identifiability". This
viewpoint transpires also in Kalman's work [8,9]. A fairly complete treatment
of the "two blocks" case (N = 2) is given in [18, 19].
3 Noise Variables
In the engineering literature the term white noise is used to denote a stochastic
process w = {w(t)} having the "canonical" property that its values {w(t)} are a
maximally uncorrelated family of random variables. In discrete time "white
noise" generally means a stationary zero mean process with independent or
uncorrelated variables. In continuous time several choices are possible. We shall
(WNR)
where 𝒩 is the trivial σ-algebra {∅, Ω} mod μ. It follows that any process y
admitting a causal or anticausal white noise representation must have trivial
∩_t S_t = {0},   ∩_t S̄_t = {0}   (PND')
There are examples showing that one of these two conditions does not necessarily imply the other.
η = ∫_{−∞}^{+∞} φ(s) dw(s)   (R)
process independent of y and jointly stationary and let H := H(y) ∨ H(v) have
multiplicity p. Then to each white noise generator w of H there corresponds an
m × p matrix valued function W ∈ L²(ℝ; ℝ^{m×p}) which represents y as

y(t) = ∫_{−∞}^{+∞} W(t − s) dw(s)   (WNR')

The subspaces generated by

∫_{−∞}^{+∞} W(t − s) dw(s)   and   ∫_{−∞}^{+∞} W̄(t − s) dw̄(s)
form a scattering pair spanning H := H^−(y) ∨ X ∨ H^+(y). Vice versa, given any
H ⊃ H(y) with the properties of Propositions 1 and 2 above and given any scattering
pair (S, S̄) in H, the intersection S ∩ S̄ is a Markovian splitting subspace and S, S̄
have the representation (SR).
The pair (S, S̄) is called the Scattering Representation of X.
We shall now restrict further to finite dimensional Markovian splitting
subspaces and choose a basis x(0) = [x_1(0), ..., x_n(0)]ᵀ in each X. The
n-dimensional process x(t) := U_t x(0), t ∈ ℝ, will be wide-sense stationary and
Markov.
dx(t) = A x(t)dt + B dw(t)   (FR)
dx(t) = Ā x(t)dt + B̄ dw̄(t)   (BR)
These are in fact differential versions of the causal and anticausal white noise
representations of x corresponding (via Proposition 2) to the scattering pair
(S, S̄). The matrix A is asymptotically stable, i.e. Re[λ_i(A)] < 0, while Ā is totally
unstable, i.e. Re[λ_i(Ā)] > 0. The pair (A, B) must be a reachable pair (since x(0),
being a basis, has a positive definite covariance) and similarly for (Ā, B̄). Note
that (FR), resp. (BR), must then be integrated forward (resp. backward) in time
in order to get a stationary process, i.e.
x(t) = ∫_{−∞}^{t} e^{A(t−s)} B dw(s),   x(t) = ∫_{t}^{+∞} e^{Ā(t−s)} B̄ dw̄(s)
For this reason (FR) is called a forward and (BR) a backward representation of x.
Also, the denomination "differential equations" is a bit ambiguous, since there
are no arbitrary initial conditions associated with the representations. They may
actually be regarded as boundary value systems, (FR) being associated with the
fixed boundary condition x(−∞) = 0 and (BR) with x(+∞) = 0. But there is
more. Since (S, S̄) intersect perpendicularly, there is a very precise relation
between w and w̄, and this in turn reflects into a relation between the parameters
of the two representations (FR) and (BR).
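A discrete-time analogue of these stationary integral formulas is easy to check numerically: for a stable A, the truncated sum P ≈ Σ_k A^k BBᵀ(Aᵀ)^k approximates the stationary state covariance, which satisfies the Stein (discrete Liapunov) equation P = APAᵀ + BBᵀ. The matrices below are made-up numbers, purely for illustration:

```python
from fractions import Fraction as F

A = [[F(1, 2), F(0)], [F(1, 4), F(1, 3)]]   # stable: eigenvalues 1/2 and 1/3
B = [[F(1)], [F(1)]]

def mmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def madd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

# truncation of P = sum_k A^k B B' (A')^k
P = [[F(0)] * 2 for _ in range(2)]
term = mmul(B, transpose(B))
for _ in range(40):
    P = madd(P, term)
    term = mmul(mmul(A, term), transpose(A))

# P almost satisfies the Stein equation P = A P A' + B B'
R = madd(mmul(mmul(A, P), transpose(A)), mmul(B, transpose(B)))
residual = max(abs(a - b) for ra, rb in zip(R, P) for a, b in zip(ra, rb))
print(residual < F(1, 10**6))  # True: the truncation error is tiny
```

The residual is exactly the first neglected term A^40 BBᵀ(Aᵀ)^40, which decays geometrically because A is stable — the discrete counterpart of the backward-extending integral above.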
Conclusion
Stochastic models should be useful to describe physical systems, but stochastic
modelling done on the sole basis of "physical intuition" may lead to nonsense.
A variety of linear models with random noise inputs (possibly modelled as
filtered white noise) are used in identification, quite often introduced on the
basis of "physical modelling", like measurement errors etc. It may happen that
these models either give a poor fit or are not "identifiable" on the basis of
the observed data. What has been presented above should convince the reader
that random inputs are just internal variables and have no more "physical"
meaning than any other auxiliary variable (say the state in deterministic systems).
What matters is the external (observable) process being modelled, and it may
just happen that the "physical" model describes in reality too narrow a class
of stochastic processes to fit the data reasonably. Also, quite likely a physical
model is not a "canonical" representation of the external process being described.
In simple cases the standard practice of converting to the "innovation representation" works, but this may turn out to be highly nontrivial in more general
situations.
References
[1] Akaike H. (1975) Markovian representation of stochastic processes by means of canonical
variables, SIAM J Control, 13, pp 165-173
[2] Anderson B.D.O. (1969) The inverse problem of stationary covariance generation, J Stat Phys
1, pp 133-147
[3] Cramer H. (1960) On some Classes of Non-Stationary Stochastic Processes, Proc IV Berkeley
Symp Statistic and Appl Probability II, pp 57-78
[4] Doob J.L. (1953) Stochastic Processes, Wiley
[5] Ephremides T., Thomas J. (1973) Random Processes, Multiplicity Theory and Canonical
Decompositions, Dowden Hutchinson & Ross
[6] Faurre P. (1973) Realisations Markoviennes de processus aleatoires stationnaires, INRIA
Report n 13, INRIA, Le Chesnay
[7] Kalman R.E. (1965) Linear stochastic filtering: reappraisal and outlook, Proc Symp System
Theory, Polytechnic Inst of Brooklyn, pp 197-205
[8] Kalman R.E. (1982) Identification from real data, in Current Developments in the Interface:
Economics, Econometrics and Mathematics (M. Hazewinkel and A.H.G. Rinnooy Kan eds),
Reidel, Dordrecht, pp 161-196
[9] Kalman R.E. (1982) System Identification from Noisy Data, in Dynamical Systems II (A.R.
Bednarek and L. Cesari eds), Academic Press, pp 135-164
[10] Lewis J.T., Thomas L.C. (1974) How to make a Heat Bath, in Functional Integration (A.M.
Arthurs ed), Clarendon Press, Oxford, pp 97-123
[11] Lewis J.T., Maassen H. (1984) Hamiltonian models of classical and quantum stochastic
processes. Quantum Probability and Applications (A. Frigerio and V. Gorini eds), Springer
L.N. in Mathematics, 1055, Springer Verlag
[12] Lindquist A., Picci G. (1979) On the Stochastic Realization Problem, SIAM J Control Optimiz,
17, pp 365-369
[13] Lindquist A., Picci G. (1985) Realization Theory for multivariate stationary Gaussian
processes, SIAM J Control Optimiz, 23, pp 809-857
[14] Lindquist A., Picci G. (1990) A geometric approach to modeling and estimation of linear
stochastic systems-to appear in Journal of Math Systems Estimation and Control
[15] Laub A. et al. (1988) MATLAB Control System Toolbox, The Math Works Inc
[16] Picci G. (1976) Stochastic realization of Gaussian processes, Proc IEEE 64, pp 112-122
Chapter 5
Department of Mathematics, Ben-Gurion University of the Negev, Beer Sheva, 84120 Israel
The role of algebraic methods in general and module theory in particular as a unifying framework
in the theory of linear systems will be surveyed. We will focus on polynomial algebra, realization
theory, stability and applications of continued fractions in this area.
1 Introduction
A Persian rug is valued for its beauty, the originality of its design, the harmony
of its colouring and the density of the knots. It turns out that a valuable rug
is also very practical and connoisseurs value its usefulness. An area of science
is not that different. We are guided by a combination of different criteria. Even
if applications are our motivation, it is wise to be guided by aesthetic considerations; it is important to grasp precisely the structural properties and to evaluate
how any particular part of science is related to the main body. One of the
landmarks of a successful scientific domain is its rich connectivity to other areas.
Isolation implies low oxygen supply which rapidly leads to sterility and decay.
It seems necessary to resort to some kind of imagery, for the young
newcomers of our field, exposed very early to a bombardment of highly technical
and sophisticated mathematics, find it hard to realize how the field of systems and
control reached this point.
Clearly system theory is the result of a joint effort of many scientists, going
back at least to J. C. Maxwell in the preceding century. However, the time when
things started to move from the vague to the precise, from the computational
to the conceptual, i.e. from problem solving to the creation of a theory, is clearly
the late fifties and early sixties, and in this process the dominant figure was, no
doubt, R.E. Kalman. His contributions were, from the technical point of view, quickly superseded by people better trained in their respective fields. However, in
no case was the technical contribution the important part. The importance
lay in the tremendous insight into the heart of the problem. So it was with
optimal control and with the fundamentally new point of view in filtering. The same
is true of the conceptual foundation of the theory of finite dimensional linear systems.
P. A. Fuhrmann
2 Modular Arithmetic
In Kalman [1969] the use of algebra as a computational tool, extremely well
suited for computers, is stressed. This point really raises the question of how
abstract one can make a theory while retaining the ability for easy computability.
Clearly this is related to what the most convenient representations of abstract
objects are. The situation is analogous to the case of linear transformations in finite
dimensional vector spaces and their representations as matrices. However, even
in this case, this well established representation is by no means the only one,
and, as I will try to show, not even the most convenient, certainly not
the most illuminating from the structural point of view. The importance of
polynomial algebra in this connection was realized early by Kalman, not
only as a key to structure via the use of polynomial modules, but also as a
convenient computational tool, namely through the use of modular arithmetic.
This of course was possible only if one restricted attention to the scalar case, i.e. to single
input/single output systems. Simultaneously the systematic use of polynomial
matrices in system theory was begun by Rosenbrock [1970]. While this turned
out to be an extremely powerful tool, it paid less attention to the fundamental
conceptual underpinnings of the subject.
I was in a lucky position, see Fuhrmann [1976], to be able to realize that
Kalman's emphasis on the abstract theory of modules and Rosenbrock's use of
coprime factorizations and polynomial matrices, the generally available state
space methods, as well as the theory of infinite dimensional systems beginning
its development at that time, were all different facets that could be unified in
one approach. Specifically one goes from the abstract theory of modules to
certain polynomial representations. This leads to a modular arithmetic on
polynomial vectors and matrices. Finally looking at the new objects, namely
polynomial models, any choice of linear basis leads to matrix or state space
representations. From this point of view the use of polynomial models is a
balancing act between the abstract and the concrete.
From an external point of view regarding systems, an input output map
was defined by Kalman as a module homomorphism, over the ring of polynomials, between appropriately defined spaces of input and output functions.
Of course this implies a linear context and the module property implies time
invariance. The causality property can be introduced by requiring the invariance
of an appropriate submodule under the input/output map. Alternatively one
can implicitly introduce causality by considering the restricted input/output map

f: Ω → Γ   (1)

where Ω is the space of past input functions and Γ the space of future output
functions. The assumptions on the various objects appearing have a profound
influence on the development of the theory. Kalman's choice was Ω = F^m[z]
and Γ = z^{−1}F^p[[z^{−1}]]. We shall study these and related objects in some detail
and return to Kalman's abstract realization in Sect. 5. It is convenient to
take a slightly more general setting.
Let F denote an arbitrary field. We will use the following notation. F[z]
will denote the ring of polynomials over F, F(z) the field of rational functions
and F[[z^{−1}]] the ring of formal power series in z^{−1} with coefficients in F, i.e.
if f ∈ F[[z^{−1}]] then f(z) = Σ_{j=0}^{∞} f_j z^{−j}. By z^{−1}F[[z^{−1}]] we denote the subspace of
F[[z^{−1}]] consisting of all power series with vanishing constant term and by
F((z^{−1})) we denote the field of truncated Laurent series, namely of series of the
form

h(z) = Σ_{j=−∞}^{n(h)} h_j z^j   with   n(h) ∈ ℤ
F^m[z] →(j) F^m((z^{−1})) →(π) F^m((z^{−1}))/F^m[z]

with j the embedding of F^m[z] into F^m((z^{−1})) and π the canonical projection
onto the quotient module.
Elements of F^m((z^{−1}))/F^m[z] are equivalence classes and two elements are
in the same equivalence class if and only if they differ in their polynomial terms
only. Thus a natural choice of element in each equivalence class is the one
element whose polynomial terms are all zero. This leads to the identification
of F^m((z^{−1}))/F^m[z] with z^{−1}F^m[[z^{−1}]]. With this identification we denote by
π_− the canonical projection, i.e.

π_− Σ_{j=−∞}^{n} f_j z^j = Σ_{j=−∞}^{−1} f_j z^j
Since Ker π_− = F^m[z] and Im π_− = z^{−1}F^m[[z^{−1}]] we have a direct sum
decomposition, over F,

F^m((z^{−1})) = F^m[z] ⊕ z^{−1}F^m[[z^{−1}]]

We will denote by π_+ the complementary projection onto F^m[z], i.e.

π_+ = I − π_−

or equivalently

π_+ Σ_j f_j z^j = Σ_{j≥0} f_j z^j
At this point we will introduce a special operator, namely the shift operator
S defined by

(Sf)(z) = zf(z)   for f ∈ F^m((z^{−1}))

Clearly S is a linear map that is invertible and S^{−1}f = z^{−1}f. The name
derives from the representation of the map in terms of the sequence of coefficients.
Indeed if f(z) = Σ_j f_j z^j and we make the correspondence

f(z) ↔ (..., f_2, f_1, f_0, f_{−1}, ...)
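These projections and the shift are easy to mimic on concrete coefficient data; a minimal Python sketch (the dict-of-exponents representation of a truncated Laurent series is an illustrative choice, not the text's):

```python
# truncated Laurent series as {exponent: coefficient}
f = {2: 1, 0: 5, -1: 7, -3: 2}          # z^2 + 5 + 7z^{-1} + 2z^{-3}

def pi_minus(f):
    """Strictly negative powers: the class of f in F((z^{-1}))/F[z]."""
    return {j: c for j, c in f.items() if j < 0}

def pi_plus(f):
    """Polynomial part, the complementary projection pi_+ = I - pi_-."""
    return {j: c for j, c in f.items() if j >= 0}

def shift(f):
    """(Sf)(z) = z f(z): shifts every coefficient one exponent up."""
    return {j + 1: c for j, c in f.items()}

print(pi_minus(f))   # {-1: 7, -3: 2}
print(pi_plus(f))    # {2: 1, 0: 5}
print(shift(f))      # {3: 1, 1: 5, 0: 7, -2: 2}
```

Note that `pi_minus(shift(h))` on a series with only negative exponents is exactly the operator S_− discussed next.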
S_− h = π_− z h,   h ∈ z^{−1}F^m[[z^{−1}]]

We note that S_+ is injective but not surjective whereas S_− is surjective but not
injective.
The map S_− has many eigenfunctions. In fact, each α in F is an eigenvalue
of S_−, with the eigenfunctions given by

v(z) = (z − α)^{−1}ξ,   ξ ∈ F^m
In the same way one can show the existence of generalized eigenfunctions of
arbitrary order. Contrary to this richness of eigenfunctions, the shift S_+ does
not have any eigenfunctions. The previous theorem indicates that the spectral
structure of the shift S_−, with finite multiplicity, is rich enough to model all
finite dimensional linear transformations, up to similarity transformations. This
is indeed the case as the next theorem, proved originally by Rota [1960] in a
different setting, shows.
Theorem 2.1 (Rota). Let A be a linear transformation in a finite dimensional vector
space over F, which we take to be F^m. Then A is isomorphic to S_− restricted
to a finite dimensional S_−-invariant subspace of z^{−1}F^m[[z^{−1}]].
Proof. Define the set L by L = {(zI − A)^{−1}ξ | ξ ∈ F^m}. Since

(zI − A)^{−1}ξ = Σ_{j=0}^{∞} A^j ξ z^{−(j+1)}

and

π_− z(zI − A)^{−1}ξ = π_−(zI − A + A)(zI − A)^{−1}ξ = π_−[ξ + A(zI − A)^{−1}ξ] = (zI − A)^{−1}Aξ
Define the map Φ: F^m[z] → F^m by

Φ Σ_{j≥0} v_j z^j = Σ_{j≥0} A^j v_j

Then clearly

Φ S_+ = A Φ

If an F[z]-module structure is defined on F^m by

p · v = p(A)v   for p ∈ F[z]

then Φ becomes a module homomorphism and F^m ≅ F^m[z]/Ker Φ. To identify
Ker Φ, let v(z) = (zI − A)w(z) with w(z) = Σ_{j≥0} w_j z^j, so that

v(z) = Σ_{j≥0} w_j z^{j+1} − Σ_{j≥0} A w_j z^j

So

Φ(v) = Σ_{j≥0} A^{j+1} w_j − Σ_{j≥0} A^j A w_j = 0
Conversely let v ∈ Ker Φ; then if v(z) = Σ_{j≥0} v_j z^j, we have Σ_{j≥0} A^j v_j = 0. This
implies that

v(z) = Σ_{j≥0} v_j z^j − Σ_{j≥0} A^j v_j = Σ_{j≥0} (z^j I − A^j)v_j

But

z^j I − A^j = (zI − A)(z^{j−1} I + z^{j−2} A + ... + A^{j−1})
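A small numerical sketch of Rota's theorem closes this section (the 2×2 matrix and vector below are made-up choices): the expansion (zI − A)^{−1}ξ = Σ_j A^jξ z^{−(j+1)} has coefficient sequence (ξ, Aξ, A²ξ, ...), and applying S_− = π_− z simply drops the leading coefficient, producing the sequence of (zI − A)^{−1}(Aξ):

```python
from fractions import Fraction as F

A = [[F(0), F(1)], [F(-2), F(-3)]]
xi = [F(1), F(0)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def coeffs(v, n):
    """First n coefficients of (zI - A)^{-1} v, i.e. v, Av, A^2 v, ..."""
    out = []
    for _ in range(n):
        out.append(v)
        v = matvec(A, v)
    return out

# S_ acts on the series for xi by dropping the leading coefficient,
# and the result is exactly the series for A xi:
print(coeffs(xi, 5)[1:] == coeffs(matvec(A, xi), 4))  # True
```

This is precisely the intertwining S_−(zI − A)^{−1}ξ = (zI − A)^{−1}Aξ established in the proof, read off on coefficient sequences.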
3 Coprimeness
In this section we focus on coprimeness. This is a classical topic, going back
to Greek mathematics, that culminated in the Euclidean algorithm. In the fabric
of system theory the notion of coprimeness, the Euclidean algorithm, and the
related Bezout equation are all pervasive. They relate to the geometry of
invariant subspaces, spectral problems and inversion of operators, stability
criteria, canonical forms, controllability criteria, continued fraction expansions,
recursive algorithms, and this is only a partial list.
We will introduce coprimeness on the level of polynomial matrices and relate
it to geometric ideas as well as to spectral representations. Finally we will
apply it to the study of inversion of the module homomorphisms characterized
in Theorem 3.6.
All these ideas have their counterpart in functional analysis and operator
theory. In its most far reaching extension the idea of coprimeness is present in
the celebrated corona theorem, proved by Carleson [1962]. For the application
of this to spectral theory see Fuhrmann [1968], and for infinite dimensional
realization theory we refer to Fuhrmann [1981]. Recently a very nice application
to the robust stabilization problem has been provided by Georgiou and Smith
[1990]. For an effort to present a coherent connection between polynomial
coprimeness and H^∞ coprimeness see Fuhrmann [1991].
We begin now with the arithmetization of the lattice operations in the set
of submodules of F^m[z]. First we recall some definitions. Given elements D_i in
a ring R, then D is a common left divisor, or common left factor, if there exist E_i
in R such that D_i = DE_i. E is a common right multiple of the D_i if there exist E_i
such that D_i E_i = E. D is a greatest common right divisor if it is a common
right divisor and is a right multiple of any other right divisor. Elements D_i are
left coprime if their g.c.l.d. is a unit. The symmetric concepts are similarly defined.
The set of submodules of F^m[z] is partially ordered by inclusion. Inclusion
is related to factorization of the representing polynomial matrices.
Theorem 3.1. Let M = DF^m[z] and N = EF^m[z]; then M ⊂ N if and only if D = EG
for some G in F^{p×m}[z].
Corollary 3.1. (i) Given D_i, i = 1, ..., s, in F^{p×m}[z], then the D_i have a g.c.l.d. D
which can be expressed as

D = Σ_{i=1}^{s} D_i E_i

(ii) Given D_i, i = 1, ..., s, in F^{p×m}[z], then the D_i have a g.c.r.d. D which can
be expressed as

D = Σ_{i=1}^{s} E_i D_i
Corollary 3.2. (i) Elements D_i, i = 1, ..., s, in F^{p×m}[z] are left coprime if and only
if there exist E_i such that

Σ_{i=1}^{s} D_i E_i = I

(ii) Elements D_i, i = 1, ..., s, in F^{p×m}[z] are right coprime if and only if there
exist E_i such that

Σ_{i=1}^{s} E_i D_i = I
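In the scalar case the coefficients E_i of Corollary 3.2 can be produced by the extended Euclidean algorithm; a sketch over ℚ[z] (polynomials as coefficient lists with constant term first; the two sample polynomials are arbitrary coprime choices):

```python
from fractions import Fraction as F

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def pmul(p, q):
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [F(0)] * (n - len(p)), q + [F(0)] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def pdivmod(p, q):
    p, q = trim([F(c) for c in p]), trim([F(c) for c in q])
    quo = [F(0)] * max(1, len(p) - len(q) + 1)
    while p != [F(0)] and len(p) >= len(q):
        d, c = len(p) - len(q), p[-1] / q[-1]
        quo[d] += c
        p = padd(p, [F(0)] * d + [-c * a for a in q])
    return trim(quo), p

def bezout(d1, d2):
    """Return (e1, e2) with d1*e1 + d2*e2 = 1, for coprime scalar d1, d2."""
    r0, r1 = trim([F(c) for c in d1]), trim([F(c) for c in d2])
    s0, s1, t0, t1 = [F(1)], [F(0)], [F(0)], [F(1)]
    while r1 != [F(0)]:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, padd(s0, [-c for c in pmul(q, s1)])
        t0, t1 = t1, padd(t0, [-c for c in pmul(q, t1)])
    g = r0[0]                       # nonzero constant, since d1, d2 coprime
    return [c / g for c in s0], [c / g for c in t0]

d1, d2 = [-1, 0, 1], [2, 1]         # z^2 - 1 and z + 2 are coprime
e1, e2 = bezout(d1, d2)
print(padd(pmul(d1, e1), pmul(d2, e2)))  # [Fraction(1, 1)]
```

The matrix version of the corollary needs the full polynomial-matrix machinery, but the Euclidean mechanism behind the Bezout identity is the same.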
Set operations on invariant subspaces of X_D are reflected in the factorizations. This is summed up in the following.
Theorem 3.3. Let M_i, i = 1, ..., s, be submodules of X_D, having the representations
M_i = E_i X_{F_i}, that correspond to the factorizations

D = E_i F_i

Then the following statements are true.
(i) M_1 ⊂ M_2 if and only if E_1 = E_2 R, i.e. if and only if E_2 is a left factor of E_1.
(ii) ∩_{i=1}^{s} M_i has the representation E_ν X_{F_ν} with E_ν the l.c.r.m. of the E_i and F_ν
the g.c.r.d. of the F_i.
(iii) M_1 + ... + M_s has the representation E_μ X_{F_μ} with E_μ the g.c.l.d. of the E_i and
F_μ the l.c.l.m. of all the F_i.
□
X_D = E_1 X_{F_1} + ... + E_s X_{F_s} if and only if the E_i are left coprime.

X_D = E_1 X_{F_1} ⊕ ... ⊕ E_s X_{F_s}

is a direct sum if and only if D = E_i F_i for all i, the E_i are left coprime and the F_i
are right coprime.
□
and
(3)
Z: X_D → X_D,   H_G: F^m[z] → z^{−1}F^p[[z^{−1}]]
(4)
z^{−1}F^p[[z^{−1}]] by
(6)
H = D_1^{−1} Z π_D
(7)
So H satisfies the Hankel functional equation (5). By Theorem 4.1 there exists a
G such that H = HG. So
(8)
or
D_1 π_− Gf = Z π_D f

f we get D_1 G = N_1, or
(9)
G = D_1^{−1} N_1
π_− GDf = 0,
i.e. N = GD is a polynomial matrix. So
G = ND^{−1}
(10)
and we have
(11)
Notice that no coprimeness assumptions have been made. Also we note that
equation (11) is equivalent to the intertwining relation (2).
Conversely, assume Theorem 3.6 holds. Let H be a Hankel operator and
assume Im H is finite dimensional. So obviously H: F^m[z] → Im H is surjective.
Define Z: X_D → X_{D_1} by Z = D_1 H. We claim ZS_D = S_{D_1}Z. Indeed, for f ∈ X_D,
(12)
Hf = D_1^{−1} π_{D_1} N_1 f = π_− D_1^{−1} N_1 f = π_− Gf
5 Realization Theory
We saw, in the proof of Theorem 3.6, the natural way in which matrix fractions
arise. We now use these matrix fractions to write down an extremely simple
realization procedure. While this is a basis free approach, it goes without saying
that special choices of basis lead to special matrix realizations, and in turn to
a variety of canonical forms. This is a very general method which we do not
explore in full in this paper. However we will give some examples in connection
with continued fractions. We begin by recalling some concepts.
Definition 5.1. A discrete time, constant linear system Σ is a triple of F-linear
spaces X, U and Y and a triple of maps (A, B, C) with A ∈ L(X, X), B ∈ L(U, X)
and C ∈ L(X, Y). The triple of maps represents the system of equations

x_{n+1} = A x_n + B u_n
y_n = C x_n   (13)

whose solution, for zero initial state, is

y_n = Σ_{j=0}^{∞} C A^j B u_{n−j−1}

We note that the input/output relation depends on the triple (A, B, C) only
through the maps C A^j B, which are called the Markov parameters of the system.
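This dependence on the Markov parameters alone is easily seen on a toy simulation (the matrices below are made-up numbers, with no direct feedthrough): feeding in an impulse reads off 0, CB, CAB, CA²B, ... directly.

```python
A = [[0, 1], [-2, -3]]
B = [1, 0]   # single input
C = [1, 1]   # single output

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def simulate(u):
    x, ys = [0, 0], []
    for un in u:
        ys.append(sum(c * xi for c, xi in zip(C, x)))       # y_n = C x_n
        x = [a + b * un for a, b in zip(matvec(A, x), B)]   # x_{n+1} = A x_n + B u_n
    return ys

# impulse input: output is 0, CB, CAB, CA^2B, ...
print(simulate([1, 0, 0, 0]))  # [0, 1, -2, 4]
```

Any other triple with the same Markov parameters (e.g. a similarity transform of this one) produces exactly the same output sequence for every input.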
Definition 5.2. Given the system Σ = (A, B, C) with the state-space X. A state
x ∈ X is called reachable if there exists a sequence of inputs driving the system
from the zero state to x. A state x ∈ X is called unobservable if in the absence
of inputs all outputs of the system with x as the initial state are zero. We say
the system Σ is reachable if every state is reachable and observable if the only
unobservable state is the zero state.
We state without proof the following simple criteria for reachability and
observability.
Theorem 5.1. The system Σ = (A, B, C) is reachable if and only if

Σ_{i=0}^{∞} Im A^i B = X   (14)

and observable if and only if

∩_{i=0}^{∞} Ker C A^i = {0}   (15)

In the finite dimensional case these conditions reduce to

Σ_{i=0}^{n−1} Im A^i B = X   (16)

and

∩_{i=0}^{n−1} Ker C A^i = {0}   (17)

respectively.
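Condition (16) is the familiar rank test on the block matrix [B, AB, ..., A^{n−1}B]; a small sketch with exact row reduction over ℚ (the example pair is a hypothetical choice):

```python
from fractions import Fraction as F

def rank(M):
    """Number of pivots after exact Gaussian elimination over Q."""
    M = [[F(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[0, 1], [-2, -3]]
B = [1, 0]                               # single-input pair (A, B)
cols = [B]
for _ in range(len(A) - 1):
    cols.append(matvec(A, cols[-1]))     # B, AB, ..., A^{n-1}B
R = [[col[i] for col in cols] for i in range(len(A))]
print(rank(R) == len(A))                 # True: the pair (A, B) is reachable
```

The dual test stacks C, CA, ..., CA^{n−1} as rows and checks the same full-rank condition for observability.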
Conditions (16) and (17) are equivalent to the Kalman reachability and
observability rank conditions. Define now the reachability map ℛ: F^m[z] → X
and observability map 𝒪: X → z^{−1}F^p[[z^{−1}]] of the system Σ = (A, B, C) by

ℛ Σ_{i} u_i z^i = Σ_{i} A^i B u_i

and

𝒪 x = Σ_{i=1}^{∞} C A^{i−1} x z^{−i}
The next lemma summarizes the basic properties of the reachability and
observability maps.
Lemma 5.1. With the module structure induced in X by A, the reachability map ℛ
and observability map 𝒪 of the system Σ are F[z]-module homomorphisms. ℛ
is surjective if and only if Σ is reachable and 𝒪 is injective if and only if Σ is
observable.
□
Theorem 5.2. Given a restricted i/o map f and a realization Σ = (A, B, C). Then
f admits a factorization

f = 𝒪 ℛ

where ℛ and 𝒪 are the reachability and observability maps of Σ, i.e. the
corresponding diagram is commutative.
Theorem 5.3. Given a restricted i/o map f, then canonical realizations of f always
exist.

F^m[z] →(g) Im f →(h) z^{−1}F^p[[z^{−1}]]

where now g differs from f only in the range module and h is the canonical
injection of Im f into z^{−1}F^p[[z^{−1}]].
□
Our representation theorems, using the polynomial and rational models X_D
and X^D, enable us to pass from the abstract realization of an i/o map f to a
more concrete one. For this we use the matrix fraction representation of rational
matrices and their relation to the kernel and image of the induced Hankel map.
Let G = ND^{−1} be a matrix fraction representation of a rational G, no
coprimeness assumptions being made. Define the shift realization to be the
realization (A, B, C) in the state-space X_D defined by
A f = S_D f,   B ξ = π_D ξ,   C f = (N D^{−1} f)_{−1}   (18)

Indeed

C A^{i−1} B ξ = (z^{i−1} G ξ)_{−1} = (π_− z^{i−1} G ξ)_{−1} = G_i ξ,

i.e. we have a realization. We compute next the reachability and observability
maps of this realization.
ℛ Σ_{i=0}^{n} u_i z^i = Σ_{i=0}^{n} S_D^i π_D u_i = Σ_{i=0}^{n} π_D z^i π_D u_i = Σ_{i=0}^{n} π_D z^i u_i = π_D Σ_{i=0}^{n} z^i u_i   for all u ∈ F^m[z]
Obviously, since X_D = Im π_D, ℛ is surjective and hence the previous realization
is reachable. For the observability map 𝒪 we have

𝒪 f = Σ_{i=1}^{∞} (N D^{−1} π_D z^{i−1} f)_{−1} z^{−i} = Σ_{i=1}^{∞} (N D^{−1} D π_− D^{−1} z^{i−1} f)_{−1} z^{−i} = π_− G f
We can rewrite this realization using the rational model rather than the
polynomial one. Thus the state-space is chosen as X^D and (A, B, C, D) is defined
through
(20)
balanced realization.
The two realization procedures outlined above can be combined into a single
one by looking at more general representations of rational transfer functions.
Thus we will assume the transfer function of a system is given by
(21)
Our approach to the analysis of these systems is to associate with each
representation of the form (21) a state-space realization in the following way.
We choose X_T as the state-space and define the triple (A, B, C), with A: X_T → X_T,
B: U → X_T and C: X_T → Y,
(22)
Cf = (V T^{−1} f)_{−1}
if and only if T
6 Stability Theory
Stability theory is, in the context of linear, finite dimensional time invariant
systems, concerned with the root location of polynomials. The origin of this problem
goes back at least to the middle of the previous century, with the work of Jacobi
and Borchardt. This work utilized the theory of quadratic forms. This powerful
tool reached its perfection at the hands of a master, C. Hermite [1856]. In a
completely different direction we have the pioneering work of Liapunov [1893].
This reduces in our case to the analysis of the celebrated Liapunov equation.
Now these two widely differing approaches to the study of stability are
manifestations of a constant phenomenon in system theory. The main
characterizing object of a time invariant system can be taken to be, in view of
realization theory, its transfer function. The transfer function of a finite
dimensional system is rational, and a rational function can be viewed in two
very different ways. On the one hand we can see it as an algebraic object and
apply to its study algebraic methods. On the other hand we can view it as a
complex valued function and study it in analytic terms. This Janus-like character
of system theory accounts in no small part for the richness of the field and the
depth of some of the results. In multifaceted situations like this, especially in a
dynamic field of research such as the theory of dynamical systems, it becomes
increasingly difficult to maintain a global encompassing point of view. But
whenever such a view can be obtained it is highly profitable and provides some
extra insight into the theory.
It is in this context that one can view Kalman's [1969] contribution to the
area of stability.
In this work Kalman set out to give a unified treatment of the classical
stability criteria of Hermite, Hurwitz, Schur-Cohn and Liapunov. In the process
the algebraic theory of quadratic forms and the Liapunov equation are unified.
While this is not the first result relating Liapunov's method to the method of
quadratic forms (apparently the first demonstration is due to Parks
[1962]), it is a highly ingenious one. The approach is algebraic and uses modular
arithmetic in two variables. The result can be described in the following way.
We associate a matrix C(p) to any polynomial p ∈ ℂ[x, y], where p(x, y) =
Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} c_{ij} x^i y^j. Given a polynomial φ ∈ ℂ[x] we will denote by Ψ the ideal
in ℂ[x, y] generated by (φ(x), ψ(y)), where ψ is obtained from φ by conjugating
its coefficients. The half plane result can be stated in the following way.
Theorem 6.1. If φ ∈ ℂ[x] and Ψ is the previously defined ideal, then all zeroes
of φ are in the open right half plane if and only if C((x + y)^{−1} mod Ψ) is positive
definite.
□
This result can be transformed into other domains as follows. Let
r(x, y) ∈ ℂ[x, y] be such that C(r) has rank 2 and signature 0. Then the following
is true.

Theorem 6.2. If φ ∈ ℂ[x] and Φ is the previously defined ideal, then all zeroes
of φ are in the domain Re r(λ, λ̄) > 0 if and only if C(r^{−1} mod Φ) is positive
definite.
□
It is worth mentioning that Kalman's method has been extended, see Djaferis
and Mitter [1977], to give a constructive algorithm for the solution of Liapunov
related equations.
From my point of view I find the Kalman [1969] approach especially
interesting because of the intensive use of modular arithmetic. This leads to a
very direct connection to Liapunov's theorem.
We proceed now to a brief review of the classical approach using quadratic
forms. This is the road taken originally by Hermite [1856], and an excellent
scholarly exposition of it is to be found in Krein and Naimark [1936].
The basic quadratic forms related to the analysis of root location problems
are the Hankel and Bezoutian forms. For this set of problems the Bezoutian
certainly proves itself the more powerful tool, the reason being its
linearity properties with respect to its defining polynomials. The Bezoutian
B(q, p) of the two polynomials q and p is defined as the quadratic form B(q, p) =
(b_{ij}) where

q(z)p(w) − p(z)q(w) = Σ_{i=1}^{n} Σ_{j=1}^{n} b_{ij} z^{i−1}(z − w)w^{j−1}   (23)
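The matrix (b_{ij}) of (23) can be computed by treating q(z)p(w) − p(z)q(w) as a polynomial in z and dividing synthetically by (z − w); a sketch over ℚ (coefficient lists with constant term first; the sample pair is an arbitrary choice):

```python
from fractions import Fraction as F

def bezoutian(q, p):
    """Bezout matrix (b_ij): q(z)p(w) - p(z)q(w) = sum b_ij z^{i-1}(z-w)w^{j-1}."""
    n = max(len(q), len(p)) - 1
    q = list(q) + [0] * (n + 1 - len(q))
    p = list(p) + [0] * (n + 1 - len(p))
    # c[k][l] = coefficient of z^k w^l in q(z)p(w) - p(z)q(w)
    c = [[F(q[k]) * p[l] - F(p[k]) * q[l] for l in range(n + 1)]
         for k in range(n + 1)]
    # divide by (z - w) in z by Horner: b_{k-1}(w) = c_k(w) + w * b_k(w)
    rows, carry = [None] * n, [F(0)] * (n + 1)
    for k in range(n, 0, -1):
        rows[k - 1] = [a + b for a, b in zip(c[k], [F(0)] + carry[:-1])]
        carry = rows[k - 1]
    return [row[:n] for row in rows]

# q = z^2 - 1, p = z: here (q(z)p(w) - p(z)q(w))/(z - w) = zw + 1
print(bezoutian([-1, 0, 1], [0, 1]))  # the 2x2 identity matrix
```

The division is exact because the numerator vanishes on z = w, and the resulting matrix is symmetric, as a Bezoutian must be.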
Bezoutians see Helmke and Fuhrmann [1989]. It seems that the most powerful
way to study the Bezoutian is the following characterization, derived in
Fuhrmann [1981a].
Theorem 6.3. Let p, q ∈ F[z], with deg p ≤ deg q. Then the Bezoutian B = B(q, p)
of q and p satisfies
(24)
In the root location problem several Bezoutian related quadratic forms turn
out to be useful. In particular let Q = (Q_{jk}), where

Q(z, w) = −i [q(z)q̄(w) − q̄(z)q(w)]/(z − w) = Σ_{j=1}^{n} Σ_{k=1}^{n} Q_{jk} z^{j−1} w^{k−1}   (25)

and let

Q = Σ_{j=1}^{n} Σ_{k=1}^{n} Q_{jk} ξ_j ξ̄_k

be the Hermitian form with the Q_{jk} defined by (25). Then the number of real
zeros of q together with the number of zeros of q arising from complex conjugate
pairs is equal to δ. There are π more zeros of q in the upper half plane and ν
more in the lower half plane. In particular all zeros of q are in the upper half
plane if and only if Q is positive definite.
□
Note that the Hermitian form −iB(q, q̄) can also be written as the Bezoutian
of two real polynomials, i.e. polynomials with real coefficients. Indeed, let q_r, q_i
denote the real and imaginary parts of q, which are defined by

q_r(z) = [q(z) + q̄(z)]/2,   q_i(z) = [q(z) − q̄(z)]/2i
we have

−iB(q, q̄) = −iB(q_r + iq_i, q_r − iq_i) = −i[B(q_r, q_r) + iB(q_i, q_r) − iB(q_r, q_i) + B(q_i, q_i)] = 2B(q_i, q_r)

or

−iB(q, q̄) = 2B(q_i, q_r)   (26)
[q(z)q*(w) − q*(z)q(w)]/(z + w) = Σ_{i=1}^{n} Σ_{j=1}^{n} h_{ij} z^{i−1} w^{j−1},   q*(z) := q̄(−z)   (27)

The quadratic form associated with the generating function (27) is called the
Hermite-Fujiwara form.
Let now q(z) be a real polynomial. We define its even and odd parts, which
we denote by q_+ and q_− respectively, by

q_+(z²) = [q(z) + q(−z)]/2,   z q_−(z²) = [q(z) − q(−z)]/2   (28)
Definition 6.1. Let q(z) be a real monic polynomial of degree m with real simple
zeroes α_1, ..., α_m, and let p(z) be a real polynomial with positive leading coefficient
and zeroes β_1, ..., β_l. Then we say that q and p are a real pair if the zeroes satisfy

α_1 < β_1 < α_2 < β_2 < ... < β_{m−1} < α_m   if deg p = m − 1

and

β_1 < α_1 < β_2 < ... < β_m < α_m   if deg p = m

We say that q and p form a positive pair if they form a real pair and α_m < 0.
We are ready to give a summary of the Bezoutian related stability criteria.
For a proof of the related Hurwitz determinantal conditions in the spirit of this
paper we refer to Helmke and Fuhrmann [1989].
Then the
(29)
One form contains only even terms, the other only odd ones. So the positive
definiteness of the Hermite-Fujiwara form is equivalent to the positive definiteness of the two Bezoutians B(q_+, q_−) and B(zq_−, q_+).
Let g be a rational transfer function with real coefficients. The Cauchy index
of g, denoted by I_g, is defined as the number of jumps of g from −∞ to +∞
minus the number of jumps from +∞ to −∞. The central result concerning
the Cauchy index is the possibility of evaluating the Cauchy index of a rational
function as the signature of a quadratic form. This result is generally known
I_g = σ(H_n) = σ(B(q, p))   (30)

where H_n denotes the Hankel matrix formed from the coefficients of the
expansion of g.
As a consequence of Theorem 6.6(iii) and the Hermite-Hurwitz theorem
we can state the following.
Theorem 6.8. Let q be a real monic polynomial, and let q_+, q_− be defined as in
(28). Let the rational function g be defined by

g(z) = q_−(z)/q_+(z)
Then
1. g is proper for odd n and strictly proper for even n.
2. g can be expanded in a power series in z^{−1},

g(z) = g_0 + g_1 z^{−1} + ...
3. For the Hankel matrices

⎡ g_1      ...  g_{n−1}  ⎤
⎢  ...           ...     ⎥
⎣ g_{n−1}  ...  g_{2n−3} ⎦

and

⎡ g_2   ...  g_n      ⎤
⎢  ...        ...     ⎥
⎣ g_n   ...  g_{2n−2} ⎦
Going back to equation (29) we are led directly to the following result, see
Gantmacher [1959] pp. 177-178. For

g(z) = −z q_−(−z²)/q_+(−z²)

or

g(z) = (q_{n−1} z^{n−1} − q_{n−3} z^{n−3} + ...)/(z^n − q_{n−2} z^{n−2} + ...)

we have I_g = n.
Proof. It suffices to compute the Bezoutian of −zq_−(−z²) and q_+(−z²). This
we proceed to do:

[−q_+(−z²) w q_−(−w²) + z q_−(−z²) q_+(−w²)]/(z − w)
= [−q_+(−z²) w q_−(−w²) + z q_−(−z²) q_+(−w²)](z + w)/(z² − w²)
= [z² q_−(−z²) q_+(−w²) − q_+(−z²) w² q_−(−w²)]/(z² − w²)
□

C_q = [S_q]_{st}^{st}   (31)
7 Continued Fractions
Continued fractions have a long history in mathematics. Kalman's contribution
to this subject, Kalman [1979], is by no means the first use of continued fractions
in the area of system theory. A variety of ad-hoc methods related to continued
fractions and Pade approximation have been used in system theoretic problems.
However, Kalman's contribution seemed the first coherent exposition of the
connection between the Euclidean algorithm, continued fractions, the partial
realization problem, canonical forms etc. This was a highly influential paper
that triggered a lot of other work in this area. The papers by Gragg and
Lindquist [1983], Antoulas [1986] and Helmke and Fuhrmann [1989] are just
a small sample.
The use of continued fractions as a prametrization for Rat(n) has been
suggested in that paper. Some work on the topological aspects of this approach
was started by Fuhrmann and Krishnaprasad [1986J and continued by Helmke,
Hinrichsen and Manthey [1989]. Also, the idea of using continued fractions
for the construction of balanced realizations in Fuhrmann [1991J is along the
lines of KaIman [1979J and Ober [1987].
Let g be a strictly proper rational function and let g = p/q be an irreducible representation of g with q monic of degree n. We define, using the division rule for polynomials, a sequence of polynomials q_i, a sequence of nonzero constants β_i, and monic polynomials a_{i+1}(z), referred to as atoms, by

    q_{-1} = q,  q_0 = p,
    q_{i+1}(z) = a_{i+1}(z) q_i(z) - β_i q_{i-1}(z)     (32)

with deg q_{i+1} < deg q_i. The procedure ends when q_r is the g.c.d. of p and q. Since p and q are assumed coprime, q_r is a nonzero constant.

The atoms a_1, ..., a_r are real monic polynomials of degrees n_1, ..., n_r such that

    n_1 + ... + n_r = n.     (33)
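The recursion (32) is easy to carry out with exact arithmetic. In the sketch below (hypothetical helper names, not the paper's), dividing q_{i-1} by q_i gives a quotient whose leading coefficient fixes β_i, the normalized quotient is the monic atom a_{i+1}, and q_{i+1} is the rescaled negative remainder. For g(z) = z/(z^2 - 1) the atoms come out as a_1(z) = a_2(z) = z with β_0 = β_1 = 1, and the continued fraction built from them reproduces g:

```python
from fractions import Fraction

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """a = c*b + r for ascending coefficient lists (exact arithmetic)."""
    a = trim([Fraction(x) for x in a])
    b = trim([Fraction(x) for x in b])
    c = [Fraction(0)] * max(1, len(a) - len(b) + 1)
    while len(a) >= len(b) and a != [0]:
        k = len(a) - len(b)
        f = a[-1] / b[-1]
        c[k] = f
        for i, bi in enumerate(b):
            a[i + k] -= f * bi
        trim(a)
    return c, a

def atoms(p, q):
    """Atoms a_i (monic) and constants beta_i of g = p/q, via (32)."""
    qm1 = [Fraction(x) for x in q]
    q0 = [Fraction(x) for x in p]
    a_list, betas = [], []
    while True:
        c, r = poly_divmod(qm1, q0)
        beta = Fraction(1) / c[-1]          # chosen so the atom is monic
        a_list.append([beta * ci for ci in c])
        betas.append(beta)
        qm1, q0 = q0, trim([-beta * ri for ri in r])
        if q0 == [0]:                       # previous q was the g.c.d.
            return a_list, betas

def cf_eval(a_list, betas, z):
    """Evaluate beta_0/(a_1(z) - beta_1/(a_2(z) - ...))."""
    val = Fraction(0)
    for a, b in zip(reversed(a_list), reversed(betas)):
        az = sum(ci * z**i for i, ci in enumerate(a))
        val = b / (az - val)
    return val

A, B = atoms([0, 1], [-1, 0, 1])       # g(z) = z/(z^2 - 1)
print(A, B)                            # atoms z, z with beta = 1, 1
assert cf_eval(A, B, Fraction(2)) == Fraction(2, 3)   # g(2) = 2/3
```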
In terms of the β_i and the a_i(z), g has the continued fraction representation

    g(z) = β_0/(a_1(z) - β_1/(a_2(z) - β_2/(a_3(z) - ... - β_{r-2}/(a_{r-1}(z) - β_{r-1}/a_r(z)))))     (35)
a(Hg ) =
sign
i=1
nP
i- 1
j=O
1 + ( _ 1)degai -
(36)
o
The continued fraction representation (35) can be translated immediately into a canonical form realization which incorporates the atoms.

Theorem 7.2. Let the strictly proper transfer function g = p/q have the sequence of atoms {a_{i+1}(z), β_i}, and assume that n_1, ..., n_r are the degrees of the atoms and

    a_k(z) = Σ_{i=0}^{n_k - 1} a_i^{(k)} z^i + z^{n_k}.     (37)
Then g is realized by a triple (A, b, c) in which A is block tridiagonal,

    A = [ A_11  A_12
          A_21  A_22  .
                 .     .
                      A_{r,r-1}  A_rr ]

where the diagonal blocks are the n_i × n_i companion matrices of the atoms,

    A_ii = [ 0  ...  0  -a_0^{(i)}
             1          -a_1^{(i)}
                 ...     ...
             0       1  -a_{n_i-1}^{(i)} ],   i = 1, ..., r,     (39)

the off-diagonal blocks (40) vanish except for a single corner entry carrying the constant β_i, and b and c are coordinate vectors,

    c = (0 ... 0 1 0 ... 0),     (41)

the 1 being in the n_1 position.
Proof. Two sequences of polynomials P_k and Q_k are defined by the three-term recursion formulas

    P_{-1} = -1,  P_0 = 0,
    P_{k+1}(z) = a_{k+1}(z) P_k(z) - β_k P_{k-1}(z)     (42)

and

    Q_{-1} = 0,  Q_0 = 1,
    Q_{k+1}(z) = a_{k+1}(z) Q_k(z) - β_k Q_{k-1}(z).     (43)

The set B_or of polynomials so obtained is clearly a basis for X_q, as it contains one polynomial for each degree between 0 and n - 1. We call this basis the orthogonal basis because of its relation to orthogonal polynomials. In this connection see Gragg [1974] and Fuhrmann [1988].

To complete the proof we use the shift realization (18) of g, which is minimal by the coprimeness of p and q. Its matrix representation with respect to the basis B_or proves the theorem. In the computation of the matrix representation we lean heavily on the recursion formulas (43).  □
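The recursions (42)-(43) can be sketched directly (hypothetical helper names). For g(z) = z/(z^2 - 1), whose atoms are a_1(z) = a_2(z) = z with β_0 = β_1 = 1, the final convergent P_r/Q_r recovers p/q:

```python
from fractions import Fraction

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += Fraction(ai) * Fraction(bj)
    return out

def poly_sub(a, b):
    n = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (n - len(a))
    b = list(b) + [Fraction(0)] * (n - len(b))
    return [Fraction(x) - Fraction(y) for x, y in zip(a, b)]

def convergents(atom_list, betas):
    """Three-term recursions (42)-(43):
       P_{k+1} = a_{k+1} P_k - beta_k P_{k-1},  P_{-1} = -1, P_0 = 0,
       Q_{k+1} = a_{k+1} Q_k - beta_k Q_{k-1},  Q_{-1} =  0, Q_0 = 1."""
    P, Pm = [Fraction(0)], [Fraction(-1)]
    Q, Qm = [Fraction(1)], [Fraction(0)]
    for a, b in zip(atom_list, betas):
        P, Pm = poly_sub(poly_mul(a, P), [Fraction(b) * c for c in Pm]), P
        Q, Qm = poly_sub(poly_mul(a, Q), [Fraction(b) * c for c in Qm]), Q
    return P, Q

# atoms of g(z) = z/(z^2 - 1): a_1 = a_2 = z, beta_0 = beta_1 = 1
P, Q = convergents([[0, 1], [0, 1]], [1, 1])
assert P == [0, 1, 0]       # numerator p(z) = z (up to a trailing zero)
assert Q == [-1, 0, 1]      # denominator q(z) = z^2 - 1
```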
and so only the first step of the recursion (32) changes. However, as q̃_1/p̃ = q_1/p, all other atoms of g and g̃ coincide. Thus, changing the transfer function g(z) by output feedback amounts to rescaling β_0 by a nonzero constant and to changing the constant part of a_1(z) by an arbitrary real number, i.e. a_1(z) ↦ a_1(z) - α. In particular, the degrees n_i of the atoms are output feedback invariants; see Fuhrmann and Krishnaprasad [1986] and Helmke and Fuhrmann [1989]. It follows that any transfer function g(z) as in equation (35) is output feedback equivalent to a unique transfer function g̃ given by
    g̃(z) = 1/(ã_1(z) - β_1/(a_2(z) - β_2/(a_3(z) - ... - β_{r-2}/(a_{r-1}(z) - β_{r-1}/a_r(z)))))     (44)

where

    ã_k(z) = Σ_{i=0}^{n_k - 1} ã_i^{(k)} z^i + z^{n_k}.     (45)
The transfer function g̃ is then realized by a triple (A, b, c) with

    c = (0 ... 0 1 0 ... 0),     (46)

the 1 being in the n_1 position, and with A block tridiagonal as before,

    A = [ A_11  A_12
          A_21  A_22  . ]

where the diagonal blocks are the companion matrices

    A_ii = [ 0  ...  0  -ã_0^{(i)}
             1          -ã_1^{(i)}
                 ...     ...
             0       1  -ã_{n_i-1}^{(i)} ],   i = 1, ..., r.     (49)  □
    AΣ + ΣA* = -BB*
    A*Σ + ΣA = -C*C     (50)

The matrix Σ is called the gramian of the system (A, B, C, D) and its diagonal entries are called the singular values of the system. They are equal to the singular values of an induced Hankel operator H_g, where this Hankel operator acts between the Hardy spaces of the right and left half-planes.

Thus let g = d*/d be an asymptotically stable, all-pass transfer function. This means (see Glover [1984] or Fuhrmann [1991]) that d is stable and the singular values satisfy σ_1 = ... = σ_n = 1.
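A one-dimensional illustration of (50) (my own example, not the paper's): with d(s) = s + 1, the all-pass function g = d*/d = (1 - s)/(1 + s) is realized by A = -1, B = C = √2, D = -1; the gramian Σ = 1 satisfies both Lyapunov equations, and |g(iω)| = 1 along the imaginary axis, so the single singular value is 1:

```python
import math

# d(s) = s + 1:  g(s) = d(-s)/d(s) = (1 - s)/(1 + s), stable and all-pass
A, B, C, D = -1.0, math.sqrt(2.0), math.sqrt(2.0), -1.0
Sigma = 1.0

# the two Lyapunov equations of (50), here scalar
assert abs(A * Sigma + Sigma * A + B * B) < 1e-12   # A Sigma + Sigma A* = -B B*
assert abs(A * Sigma + Sigma * A + C * C) < 1e-12   # A* Sigma + Sigma A = -C* C

def g(s):
    """Transfer function C (sI - A)^{-1} B + D."""
    return C * B / (s - A) + D

for w in (0.0, 0.5, 1.0, 10.0):
    assert abs(abs(g(1j * w)) - 1.0) < 1e-12        # all-pass on the axis
print("Sigma = 1 solves both equations; sigma_1 = 1")
```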
Write

    d(z) = d_+(z^2) + z d_-(z^2).     (51)

Next we put

    f(z) = -z d_-(z^2)/d_+(z^2),   h(z) = z d_-(-z^2)/d_+(-z^2).     (52)
Then h admits two useful representations. One is the partial fraction expansion

    h(z) = Σ_{k=1}^{n} c_k/(z - α_k)

with α_1 < ... < α_n. The other representation, and the more interesting one for our purposes, is a continued fraction representation of the form given in equation (35), i.e.

    h(z) = β_0/(a_1(z) - β_1/(a_2(z) - β_2/(a_3(z) - ... - β_{n-2}/(a_{n-1}(z) - β_{n-1}/a_n(z)))))     (53)

where the a_i(z), i = 1, ..., n, are monic of degree one and all the β_i are positive.
Theorem 7.4. Let g be an asymptotically stable all-pass function. Then g has a balanced canonical realization (A, b, c) of the form (54), with

    c = (0 ... 0 α_0).     (54)
with the β_i chosen so that the a_i are monic and deg q_{i+1} < deg q_i < deg q_{i-1}. Let a_i(z) = z - α_i, i = 1, ..., n. We will show by induction that the sequence of polynomials is alternatingly even and odd. In fact the initialization ensures q_{-1} to be even and q_0 to be odd. So assume parity alternates for all indices up to i. From equation (55) it follows that

    q_{i-1}(z) = (1/β_i)[a_i(z) q_i(z) - q_{i+1}(z)]     (56)

and hence

    q_{i-1}(z)/q_i(z) = (1/β_i)[a_i(z) - q_{i+1}(z)/q_i(z)].     (57)

Now q_i(z)/q_{i-1}(z) is odd by the induction hypothesis. So a_i(z) - q_{i+1}(z)/q_i(z) is also odd. This forces both a_i(z) and q_{i+1}(z)/q_i(z) to be odd. Thus a_i(z) = z, and q_{i+1} has parity opposite to that of q_i. Thus h has the representation
    h(z) = α_0^2/(z - α_1^2/(z - α_2^2/(z - ... - α_{n-2}^2/(z - α_{n-1}^2/z))))     (58)

By Theorem 9.4 in Helmke and Fuhrmann [1989], h(z) has a realization of the form

    [ 0       1
      α_1^2   0    .
              .    .   1
                 α_{n-1}^2  0 ]

which is similar, through the matrix diag(ρ_0, ..., ρ_{n-1}) with ρ_0 = 1 and ρ_{i+1}/ρ_i = α_i, to the tridiagonal matrix

    [ 0     α_1
      α_1   0    .
            .    .   α_{n-1}
               α_{n-1}   0 ],    c = (0 ... 0 α_0).
So

    f(z) = -iα_0^2/(iz - α_1^2/(iz - α_2^2/(iz - ... - α_{n-2}^2/(iz - α_{n-1}^2/iz))))     (59)
Using again the continued fraction canonical form we get a realization for f given by

    [ 0      -1
      α_1^2   0    .
              .    .   -1
                 α_{n-1}^2  0 ]

and, by similarity, to

    [ 0      -α_1
      α_1     0    .
              .    .   -α_{n-1}
                 α_{n-1}   0 ],    c = (0 ... 0 α_0).

Now g is obtained back from f by the inverse transformation. Thus g is realized as in (60), with k = α_0^2. Clearly the realization in (60) is balanced with Σ = I.
References
[1965] N.I. Akhiezer, The Classical Moment Problem, Hafner, New York
[1986] A.C. Antoulas, "On recursiveness and related topics in system theory", IEEE Trans Aut Control, AC-31, 1121-1135
[1970] R.W. Brockett, Finite Dimensional Linear Systems, Wiley, New York
[1962] L. Carleson, "Interpolation by bounded analytic functions and the corona problem", Ann Math, 76, 547-559
[1977] T.E. Djaferis and S.K. Mitter, "Exact solution of some linear matrix equations using algebraic methods", M.I.T. Report ESL-P-746
[1976] P.A. Fuhrmann, "Algebraic system theory: An analyst's point of view", J Franklin Inst, 301, 521-540
[1977] P.A. Fuhrmann, "On strict system equivalence and similarity", Int J Contr, 25, 5-10
[1981] P.A. Fuhrmann, Linear Systems and Operators in Hilbert Space, McGraw-Hill, New York
[1981a] P.A. Fuhrmann, "Polynomial models and algebraic stability criteria", Proceedings of the Joint Workshop on Feedback and Synthesis of Linear and Nonlinear Systems, ZIF Bielefeld, June 1981
[1981b] P.A. Fuhrmann, "Duality in polynomial models with some applications to geometric control theory", IEEE Trans Aut Control, AC-26, 284-295
[1983] P.A. Fuhrmann, "On symmetric rational transfer functions", Linear Algebra and Appl, 50, 167-250
[1984] P.A. Fuhrmann, "On Hamiltonian transfer functions", Lin Alg Appl, 84, 1-93
[1988] P.A. Fuhrmann, "Orthogonal matrix polynomials and system theory", Rend Sem Mat Univers Politecn Torino, Special issue on Control Theory, 68-124
[1991] P.A. Fuhrmann, "A polynomial approach to Hankel norm approximations", Lin Alg Appl, 146, 133-220
[1989] P.A. Fuhrmann and B.N. Datta, "On Bezoutians, van der Monde matrices and the Lienard-Chipart stability criterion", Lin Alg Appl, 120, 23-38
[1986] P.A. Fuhrmann and P.S. Krishnaprasad, "Towards a cell decomposition for Rat(n)", IMA J Math Contr Info, 3, 137-150
[1926] M. Fujiwara, "Über die algebraischen Gleichungen, deren Wurzeln in einem Kreise oder in einer Halbebene liegen", Math Z, Vol 24, 161-169
[1959] F.R. Gantmacher, The Theory of Matrices, Chelsea, New York
[1990] T.T. Georgiou and M.C. Smith, "Optimal robustness in the gap metric", IEEE Trans Aut Control, AC-35, 673-686
[1984] K. Glover, "All optimal Hankel-norm approximations and their L∞-error bounds", Int J Contr, 39, 1115-1193
[1983] W.B. Gragg and A. Lindquist, "On the partial realization problem", Linear Algebra and Appl, 50, 277-319
[1989] U. Helmke and P.A. Fuhrmann, "Bezoutians", Lin Alg Appl, Vols 122-124, 1039-1097
[1989] U. Helmke, D. Hinrichsen and W. Manthey, "A cell decomposition of the space of real Hankels of rank ≤ n and some applications", Lin Alg Appl, Vols 122-124, 331-355
[1856] C. Hermite, "Sur le nombre des racines d'une equation algebrique comprise entre des limites donnees", J Reine Angew Math, Vol 52, pp 39-51
[1965] R.E. Kalman, "On the Hermite-Fujiwara theorem in stability theory", Quarterly of Appl Math, 23, 279-282
[1965] R.E. Kalman, "Algebraic structure of linear dynamical systems. I. The module of Σ", Proc Nat Acad Sci (USA), 54, 1503-1508
[1969a] R.E. Kalman, "Lectures on controllability and observability", CIME Summer School (Pontecchio Marconi, Italy 1968), Edizioni Cremonese, Roma
[1969b] R.E. Kalman, "Introduction to the algebraic theory of linear dynamical systems", in Mathematical System Theory and Economics (edited by H.W. Kuhn and G.P. Szegö), Springer Lecture Notes in Operations Research and Mathematical Economics, Vol 11, 41-65
[1969c] R.E. Kalman, "Algebraic characterization of polynomials whose zeros lie in algebraic domains", Proc Nat Acad Sci, 64, 818-823
[1970] R.E. Kalman, "New algebraic methods in stability theory", Proc 5th International Congress on Nonlinear Oscillations, Kiev 1969, Vol 2, 189-199
[1979] R.E. Kalman, "On partial realizations, transfer functions and canonical forms", Acta Polyt Scand, 31, 9-32
[1969] R.E. Kalman, P.L. Falb and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, New York
[1936] M.G. Krein and M.A. Naimark, "The method of symmetric and Hermitian forms in the theory of the separation of the roots of algebraic equations", English translation in Linear and Multilinear Algebra, Vol 10 (1981), 265-308
[1980] P.S. Krishnaprasad, "On the geometry of linear passive systems", in Linear Systems Theory, edited by C.I. Byrnes and C.F. Martin, Lectures in Applied Mathematics, Vol 18, 253-275, Amer Math Soc, Providence, R.I.
[1890] L. Kronecker, "Algebraische Reduction der Schaaren bilinearer Formen", S.-B. Akad Berlin, 763-776
[1893] A.M. Liapunov, "Probleme general de la stabilite du mouvement", Ann Fac Sci Toulouse, 9 (1907), 203-474 (French translation of the Russian paper published in Comm Soc Math Kharkow)
[1914] A. Lienard and M. Chipart, "Sur le signe de la partie reelle des racines d'une equation algebrique", J de Math, Vol 10, pp 291-346
[1981] B.C. Moore, "Principal component analysis in linear systems: controllability, observability and model reduction", IEEE Trans on Auto Contr, 26, 17-32
[1957] Z. Nehari, "On bounded bilinear forms", Ann Math, 65, 153-162
[1985] N.K. Nikolskii, Treatise on the Shift Operator, Springer Verlag, Berlin
[1987] R. Ober, "Asymptotically stable all-pass transfer functions: canonical form, parametrization and realization", Proceedings IFAC World Congress, Munich 1987
[1970] H.H. Rosenbrock, State Space and Multivariable Theory, J. Wiley, New York
[1960] G.C. Rota, "On models for linear operators", Comm Pure and Appl Math, 13, 469-472
[1967] D. Sarason, "Generalized interpolation in H∞", Trans Amer Math Soc, 127, 179-203
[1970] B. Sz.-Nagy and C. Foias, Harmonic Analysis of Operators on Hilbert Space, North Holland, Amsterdam
[1931] B.L. van der Waerden, Moderne Algebra, Springer Verlag, Berlin
1 Introduction

In a series of seminal papers published between 1960 and 1965 [8-13] (see also [14, 15]), R.E. Kalman laid the foundations of what has since become known as Mathematical System Theory. The cornerstones of Kalman's theory were the celebrated concepts of Controllability, Observability and (Canonical) Realization. The first formal introduction of the concepts of controllability and observability as fundamental structural properties of (linear) systems was made by Kalman in [8, 9], and the canonical realization problem and its relation to controllability and observability was first studied extensively in [11]. The relation between the structure of canonical realizations and that of transfer matrices was investigated by Kalman extensively in [12]. The crucial insights in Kalman's theory derived from the discovery that the concepts of controllability and observability are linked in an essential way to the system's structure and that many of the system's structural features are encoded in its controllable and observable behaviors.

In the early stages, the mathematical tools for the study of the structure of linear systems were only basic linear algebra (the theory of invariant subspaces) and the theory of matrices (in the style of Gantmacher [7]). A crucial contribution to the theory of linear systems was Kalman's discovery [13] that the theory can be embedded naturally in classical module theory. Specifically, module theory was shown to be a natural setting for an abstract theory of realization in which the concepts of state, controllability and observability arise in a completely natural way.

A further central result on linear system structure was obtained by Wonham in [19], where he showed that (in multi-input systems) controllability is equivalent to "pole-assignability". This result indicated (probably for the first time) the deep interrelation between controllability and (state-)feedback capabilities of a given system. (The pole assignability result had been known much earlier for single-input systems.) The research on linear feedback received a great boost by the pole assignment result, and proceeded in two basic parallel avenues: the "geometric" approach, developed primarily by Wonham and Morse (see, e.g., [18]), in which framework such important problems were investigated as feedback decoupling, regulator design, and design of model-following
2 Input/Output Maps

A system is a device that accepts inputs and produces outputs (based on the received inputs). We consider linear discrete-time systems, i.e. we assume that the following is given:

1. A field K. Typically, K will be the field of real or complex numbers, or a

Let f: ΛU → ΛY be an i/o map. We define the Markov parameters of f as the K-linear maps T_t: U → Y, T_t := P_t ∘ f ∘ i_U, where i_U: U → ΛU denotes the canonical
3 State-Space Realizations

Suppose that in addition to U and Y, we are given the space X, which we will refer to as state space, and consider sequences (x_t), (u_t), (y_t) related by the following equations:

    x_{t+1} = F x_t + G u_t,   y_t = H x_t,     (3.1)

where F, G, and H are linear maps between the corresponding spaces. For each u ∈ ΛU, one can construct a unique x ∈ ΛX satisfying the first equation of (3.1), and subsequently y ∈ ΛY by the second equation. This defines an i/o map, which can be written as

    y = T(z)u,     (3.2)

    T(z) = Σ_t T_t z^{-t},     (3.3)

where

    T_t := H F^{t-1} G  if t > 0,  and T_t := 0 otherwise.     (3.4)

We will call (3.1) a state space representation of the i/o map defined by either (3.3) or (3.4). Realization theory is concerned with the opposite direction of the above computation: construct a state-space representation from a given i/o map. Fundamental questions are the existence, the uniqueness and the actual construction of realizations. As explained in the introduction, the foundation for an elegant theory was provided by R.E. Kalman. The starting point of this theory is a precise description of the concept of state. The state is a time-dependent variable that satisfies two fundamental properties:

(3.5)
(-∞, 0] can be identified with polynomials. We will use the notation ΩK or K[z] for the set of polynomials in z and, similarly, ΩU for the set of polynomials with coefficients in U. Hence we have a map g: ΩU → X, where X denotes the state space. The quantity x = g(u) represents the state at time t = 1 resulting from the input u.

The second statement says that, if we are only interested in the influence of past inputs (i.e. polynomial inputs) on future values of the output, then there is a map h from the state space X to the set ΓY of all sequences of future outputs. The quantity y = h(x) represents the output sequence for t > 0 resulting from input zero and initial state x. We will specify the set ΓY and the map h in a moment. Taking the composite of the two maps g and h, we have a map f̄ := h ∘ g: ΩU → ΓY, which represents the effect of past inputs on future outputs.

In order to make this discussion more concrete, we have to define the space ΓY more precisely. When talking about future outputs, we do not want to imply that past outputs have been zero. Rather, we want to express the fact that we are only interested in the future values. That is, we will identify two output sequences when their values coincide for t > 0. This consideration gives rise to the introduction of quotient spaces. Specifically: ΓY := ΛY/ΩY. Hence, two outputs are identified when their difference is a polynomial. For any K-linear space S, the spaces ΛS, ΩS and ΓS are connected via canonical maps, viz. j = j_S: ΩS → ΛS, the canonical embedding, i.e. the restriction of the identity map in ΛS to ΩS, and π = π_S: ΛS → ΓS, the canonical projection, mapping each element of ΛS to its equivalence class. Now we can define the map f̄, introduced in the previous paragraph, more precisely. In fact, this map is defined by the following commutative diagram,

    ΩU --j--> ΛU --f--> ΛY --π--> ΓY     (3.6)

hence, f̄ := π ∘ f ∘ j. This map is sometimes called the Kalman i/o map, or the restricted i/o map, or the Hankel map. The conceptual experiment, in which only past inputs and future outputs are considered, is sometimes called the Kalman experimental setup.

Obviously, it is of interest to find out how time invariance can be incorporated into diagram (3.6) by the use of a suitable algebraic structure. The space ΛU is a ΛK-linear space, but this is obviously not the case for ΩU. In ΩU, multiplication by z is possible, but division by z can take the element out of ΩU. Therefore we can say that ΩU is an ΩK-module. Here ΩK = K[z] is defined to be a ring in the standard way. Similarly, ΩY is an ΩK-module and, by a standard construction in algebra, ΓY = ΛY/ΩY is an ΩK-module. Furthermore, the maps j, f and π are ΩK-homomorphisms, so that
the map f̄ uniquely determines the original map f, due to the strict-properness condition.
We now turn to the actual realization problem. As we have noticed before, if we have a realization there is an intermediate space X, and there are maps g: ΩU → X and h: X → ΓY such that f̄ = h ∘ g. Thus we can extend diagram (3.6):

    ΩU --j--> ΛU --f--> ΛY --π--> ΓY
       \                         ↗
        g  ↘                ↗  h
                  X                     (3.7)

We have not specified anything about X, g and h yet, but in view of the linear framework we are working in, it is obvious to require that X be a K-linear space and g and h be K-linear maps. (Note that ΩU and ΓY can be viewed as K-linear spaces in a canonical way.) However, we also want to exploit the ΩK-module structure of diagram (3.6) or, equivalently, the time invariance of the system. For this purpose, we extend the ΩK-module structure to diagram (3.7), i.e. we define a multiplication by z in X in such a way that g and h commute with the multiplication by z. It is easily seen that this suffices for determining an ΩK-module structure on X in such a way that the maps g and h become ΩK-homomorphisms. Let us see what happens when we apply the map g to zu instead of u. The input zu is obtained by shifting u one time unit to the left. By time invariance, the corresponding output ỹ := f(zu) is obtained from y := f(u) by shifting to the left. Since the input is zero for positive t, we find that if x_0 is the initial state, y_i = HF^{i-1}x_0 and ỹ_i = HF^i x_0 = HF^{i-1}(Fx_0). Hence, ỹ is the sequence that corresponds to the initial state Fx_0 instead of x_0. So, replacing u by zu has the same effect on the output as replacing x_0 by Fx_0. Therefore, an obvious idea is to identify the action x ↦ zx with x ↦ Fx. It is easily verified that with this definition, X is an ΩK-module, g and h are ΩK-homomorphisms and (3.7) a commutative diagram of ΩK-homomorphisms. The following theorem formulates the fundamental result of realization theory:

(3.8) Theorem. Let f: ΛU → ΛY be an i/o map. To every realization (F, G, H) there corresponds a unique ΩK-module factorization (g, h) of the restricted map f̄ corresponding to f. Conversely, every factorization of f̄ gives rise to a unique realization (F, G, H). The two correspondences are inverses of each other.

In short: realization is equivalent to factorization. It will be seen that a number of concepts and properties are much more easily expressed in terms of the factorization than directly in terms of the realization.
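For a concrete finite-dimensional instance of Theorem (3.8) (a sketch with hypothetical names): g sends a polynomial past input to the state x = Σ_k F^k G u(-k) reached at time 1, h sends a state to its free output sequence (HF^{i-1}x)_{i≥1}, and the composite h ∘ g coincides with the restricted map computed directly from the Markov parameters T_t = HF^{t-1}G.

```python
from fractions import Fraction

def mat_vec(M, v):
    return [sum(Fraction(M[i][j]) * v[j] for j in range(len(v)))
            for i in range(len(M))]

F = [[0, 1], [-1, 1]]
G = [[0], [1]]
H = [[1, 0]]

def g_map(u):
    """State at time 1 from the past input u = sum_k u[k] z^k,
    where u[k] is the input value applied at time t = -k."""
    x = [Fraction(0)] * len(F)
    v = [Fraction(G[i][0]) for i in range(len(G))]   # F^k G, k = 0
    for uk in u:
        x = [a + Fraction(uk) * b for a, b in zip(x, v)]
        v = mat_vec(F, v)
    return x

def h_map(x, count):
    """Free future output (h x)_i = H F^{i-1} x, i = 1, ..., count."""
    out = []
    for _ in range(count):
        out.append(mat_vec(H, x)[0])
        x = mat_vec(F, x)
    return out

def restricted(u, count):
    """The Kalman i/o map directly: (f u)_i = sum_k T_{i+k} u[k]."""
    out = []
    for i in range(1, count + 1):
        v = [Fraction(G[r][0]) for r in range(len(G))]
        for _ in range(i - 1):
            v = mat_vec(F, v)                        # v = F^{i-1} G
        acc = Fraction(0)
        for uk in u:
            acc += Fraction(uk) * mat_vec(H, v)[0]   # T_{i+k} u[k]
            v = mat_vec(F, v)
        out.append(acc)
    return out

u = [1, -2, 3]      # u(0) = 1, u(-1) = -2, u(-2) = 3
assert h_map(g_map(u), 5) == restricted(u, 5)   # realization = factorization
```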
273
(3.9)
G:Oll ~f!l':UHg(iU(U)),
H:f!l' ~W:XHP1(h(x))
~l
rY
This means, g2 = (tgl and h1 = h2(t. In terms of the realizations this can be
written as:
whieh is the familiar coneept of isomorphism. Henee, we have the result that
eanonieal realizations are essentially unique, i.e. unique up to isomorphism.
4 Finite-Dimensional Realizations

Up to now, we have not made any assumption about the dimension of the spaces involved. In this section, we assume that U and Y are finite dimensional. Specifically, we let U = K^m and Y = K^p. We are interested in the question of the existence of a finite-dimensional realization, i.e. a realization in which the state space X is finite dimensional. Abstract conditions for this to be the case can be given in module-theoretic terms. To this end, we use the following result:

(4.1) Lemma. Let Δ be a submodule of ΩU. Then the following statements are equivalent:

1.
2.
3.
4.

The second statement follows from the fact that in a reachable realization, the state space is isomorphic with ΩU/ker(g).

A more concrete (and well-known) condition for the existence of a finite-dimensional realization is the rationality of the transfer function. The above framework enables us to relate any reachable finite-dimensional realization in an essentially unique way to a matrix fraction representation of the transfer function, which in particular implies that the transfer matrix is rational. Specifically, assume that (g, h) is such a factorization and denote ker(g) by Δ. Since Δ is full, it is generated by m vectors. Hence we can write Δ = DΩU
(i.e. the image of ΩU under the map induced by D) for some nonsingular m × m polynomial matrix D. The matrix D not only induces a K-homomorphism in ΩU but also a ΛK-linear map in ΛU. As such, it can be composed with the map f. We define N := f ∘ D. An easy calculation yields that N: ΛU → ΛY is a polynomial map, i.e. N(ΩU) ⊆ ΩY. Consequently, f = N ∘ D^{-1} is a representation of f as a left matrix fraction of polynomials. We formulate this as a theorem:

(4.3) Theorem. Let U = K^m and Y = K^p. For every reachable finite-dimensional realization (g, h), the i/o map can be written as the matrix fraction f = ND^{-1}, where D is any basis matrix of ker(g) and N := f ∘ D. This representation is unique up to right multiplication of N and D by a unimodular matrix. Conversely, for every matrix fraction f = ND^{-1}, one can construct a reachable realization (g, h) by defining X := ΩU/DΩU and taking for g the canonical map. The realization is unique up to isomorphism.
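A scalar sketch of N := f ∘ D (my own example): the i/o map with Markov parameters T_t = 2^(t-1) is g = 1/(z - 2); its kernel is generated by D(z) = z - 2, and multiplying the formal series by D collapses it to the polynomial N = 1, exhibiting g = N D^{-1}.

```python
# Markov parameters of g = 1/(z - 2): T_t = 2^(t-1), t = 1, 2, ...
T = [2 ** (t - 1) for t in range(1, 12)]

# multiply the series sum_t T_t z^{-t} by D(z) = z - 2:
# the z^0 coefficient is T_1, the z^{-t} coefficient is T_{t+1} - 2 T_t
const = T[0]
tail = [T[t + 1] - 2 * T[t] for t in range(len(T) - 1)]

assert const == 1 and all(c == 0 for c in tail)   # N(z) = 1 is a polynomial
print("g = N D^{-1} with N = 1 and D = z - 2")
```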
5 State Feedback

We want to find out to what degree it is possible to modify an existing system using a state feedback transformation u = -Kx + Lv, where v is a new input variable. Such a feedback transformation is denoted as (K, L).

(5.1)

where

(5.3)

We conclude that the effect of state feedback can also be achieved by cascading the system Σ with a precompensator with transfer function T_{K,L}. Conversely, we may ask which precompensators can alternatively be implemented using state feedback. That is, for which i/o maps T does there exist a state feedback transformation (K, L) such that T = T_{K,L}? It is obvious from the definition that T_{K,L} is invertible and has a causal inverse, in addition to being causal itself. Such a map will be called a bicausal isomorphism. Consequently,
The interpretation of this statement is immediate: if for t > 0 the input is zero and the output of f_s is zero at time t = 1, then the output will be zero for all positive t.

If we start from an arbitrary i/o map f: ΛU → ΛY, and we assume we have a realization (g, h), we can define the i/s map f_s using e.g. the state-space equations (3.1). The maps f and f_s have the obvious relationship f = H ∘ f_s. Note that we have extended the map H: X → Y (since y = Hx). In general, we will call an i/s map f_s a semirealization of an i/o map f if there exists a static map H such that f = H ∘ f_s. Here a static map is the obvious extension of a map H: X → Y to a map ΛX → ΛY, obtained by applying the map H to each coefficient of the power series.

Now we ask the question: which condition must be satisfied for a given i/s map f_s to be a semirealization of a given i/o map f?

The following result answers this question, and it also shows that this answer is characteristic for i/s maps:

(5.5) Theorem. Let f_1 be an i/o map. Then f_1 is a reachable i/s map iff the following holds: for every i/o map f satisfying ker(f_1) ⊆ ker(f), the map f_1 is a semirealization of f.

Note that kernels are taken of the restricted maps. Using matrix fraction representations, as discussed in Theorem 4.3, we can reformulate the previous result as follows:

(5.6) Corollary. Let SD^{-1} be an i/o map. Then SD^{-1} is a reachable i/s map iff the following holds: for every i/o map ND^{-1}, there exists a (unique) static map H such that N = HS.
Next we turn to the state feedback problem. The basic question is: given a bicausal map T, when does there exist a feedback compensator (K, L) such that T = T_{K,L}?

It is easily seen that a bicausal map can be written uniquely as the sum of an invertible static map and a strictly causal map. Also, it follows from (5.3) that L is the static part of T_{K,L}. Therefore, upon multiplying T with the inverse of its static part, we obtain the situation with static part I, corresponding to the feedback compensator (K, I). The objective then is to find a map K such that T = (I + Kf_s)^{-1}, equivalently, such that T^{-1} = I + Kf_s. The map f̂ := T^{-1} - I is strictly proper, and hence it can be seen as an i/o map. The equation to be satisfied by K reduces to Kf_s = f̂. That is, our question is equivalent to: is f_s a semirealization of f̂? For this we have seen an answer in Theorem (5.5): this holds iff ker(f_s) ⊆ ker(f̂).
Because of the definition of the restricted map this is equivalent to: if the input u is polynomial and f_s(u) is polynomial, then f̂(u) is polynomial, or equivalently, T^{-1}(u) is polynomial. Thus we obtain the following result.

(5.7) Theorem. Let f_s: ΛU → ΛX be a reachable i/s map and T: ΛU → ΛU a ΛK-linear map. There exists a feedback transformation (K, L) such that T = T_{K,L} iff

(i) T is a bicausal isomorphism,
(ii) for u ∈ ΩU with f_s(u) ∈ ΩX, we have T^{-1}(u) ∈ ΩU.

This result can be reformulated in terms of the matrix fraction representation of Theorem 4.3. To this end, assume that f_s: ΛU → ΛX is a reachable i/s map. According to Theorem 4.3, f_s can be written as SD^{-1} for some polynomial matrices S and D, where D is a basis matrix of ker f̄_s (= ker P_1 ∘ f_s, see Lemma 5.4). Then we have: u ∈ ΩU and f_s(u) ∈ ΩX iff u ∈ im D, i.e. u = Dv for some v ∈ ΩU. Hence, according to the theorem, T = T_{K,L} for some feedback transformation iff T^{-1} ∘ D v ∈ ΩU for all v ∈ ΩU, i.e. iff T^{-1}D is polynomial. In order to simplify the formulas, and without loss of generality, we assume that L = I and, correspondingly, that T^{-1} is of the form T^{-1} = I + T̂, where T̂ is strictly causal. Then the above condition is: Q := T̂D is polynomial. Hence, we can write T as T^{-1} = (D + Q)D^{-1}, i.e. T = D(D + Q)^{-1}. The final result is:

□
[3] J. Hammer, "Assignment of Dynamics for Nonlinear Recursive Feedback Systems", Int J Control, 48, pp 1183-1212, 1988
[4] J. Hammer and M. Heymann, "Causal Factorization and Linear Feedback", SIAM J on Contr and Optim, 19, pp 445-468, 1981
[5] M.L.J. Hautus and M. Heymann, "Linear Feedback - An Algebraic Approach", SIAM J Contr and Optim, 16, pp 83-105, 1978
[6] M. Heymann, Structure and Realization Problems in the Theory of Dynamical Systems, Springer, New York, 1975
[7] F.R. Gantmacher, The Theory of Matrices, Chelsea, New York, 1959
[8] R.E. Kalman, "Contributions to the Theory of Optimal Control", Bol Soc Mat Mexicana, 5, pp 102-119, 1960
[9] R.E. Kalman, "On the General Theory of Control Systems", Proc 1st IFAC Congress, Moscow, Butterworths, London, 1960
[10] R.E. Kalman, "Canonical Structure of Linear Dynamical Systems", Proc Nat Acad Sci (U.S.A.), 48, pp 595-600, 1962
[11] R.E. Kalman, "Mathematical Description of Linear Dynamical Systems", SIAM J Control, 1, pp 152-192, 1963
[12] R.E. Kalman, "Irreducible Realizations and the Degree of a Rational Matrix", SIAM J Control, 3, pp 520-544, 1965
[13] R.E. Kalman, "Algebraic Structure of Linear Dynamical Systems. I. The Module of Σ", Proc Nat Acad Sci (U.S.A.), 54, pp 1503-1508, 1965
[14] R.E. Kalman, Lectures on Controllability and Observability, Lecture Notes, CIME, July 1968, Cremonese, Rome, 1969
[15] R.E. Kalman, P.L. Falb, and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, 1969
[16] H.H. Rosenbrock, State Space and Multivariable Theory, John Wiley & Sons, New York, 1970
[17] W.A. Wolovich, Linear Multivariable Systems, Springer Verlag, New York, 1974
[18] W.M. Wonham, Linear Multivariable Control, 3rd ed., Springer Verlag, New York, 1985
[19] W.M. Wonham, "On Pole Assignment in Multi-input Controllable Linear Systems", IEEE Trans Auto Contr, AC-12, pp 660-665, 1967
1 Introduction

In a brief expository paper presented in 1967 [K5], Kalman asked the following questions:

What is a system? How can it be effectively described in mathematical terms? Is there a deductive way of passing from experiments to mathematical models? How much can be said about the internal structure of a system on the basis of experimental data? What is the minimal set of components from which a system with given characteristics can be built?
B. F. Wyman
y(t) = Hx(t)
Following classical ideas from abstract algebra (as found, for example, in Van der Waerden's text), Kalman defined a K[z]-module structure on the state space X by p(z)·x = p(F)x for any state vector x. (If F is considered as a square matrix, p(F) is also a square matrix for any polynomial p(z), and the right hand side is simply matrix-vector multiplication.) The study of this K[z]-module structure is equivalent to the study of F up to similarity, which can be done using the language of invariant factors or of canonical forms. In fact, from this point of view the module-theoretic treatment unifies the two approaches and gives the proof of the canonical form theorems as corollaries of more general results in commutative algebra.
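The module action can be sketched in a few lines (my own names; a Horner-style evaluation of p(F)x): z·x is just Fx, and the characteristic polynomial of F annihilates every state vector, which is Cayley-Hamilton in module language.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def act(p, F, x):
    """Module action p(z) . x = p(F) x, computed Horner-style;
    p = [p_0, p_1, ..., p_d] lists ascending coefficients."""
    out = [p[-1] * xi for xi in x]
    for c in reversed(p[:-1]):
        out = [fo + c * xi for fo, xi in zip(mat_vec(F, out), x)]
    return out

F = [[0, 1], [2, 1]]           # characteristic polynomial z^2 - z - 2
x = [1, 1]

assert act([0, 1], F, x) == mat_vec(F, x)     # z . x = F x
assert act([-2, -1, 1], F, x) == [0, 0]       # (z^2 - z - 2) . x = 0
print(act([3, 0, 1], F, x))                   # (z^2 + 3) . x
```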
The observation that the dynamical structure on the state space of a linear
system corresponds to a polynomial module structure opens up new possibilities
for generalizations and connections between system theory and algebra, but
perhaps by itself is not so interesting. Kalman's major contribution in [AS]
was the discovery that not only the state space, but also the spaces of input
and output sequences admit natural polynomial module structures, and that
the three structures fit together perfectly. The input/output spaces are in fact
not finite dimensional, so the full power of module theory is needed to give a
good conceptual view of the situation.
In his study of input and output strings, Kalman began with a finite dimensional space V, say, and a time set T consisting of the integers with the standard order. Trajectories in 𝕋V are maps v: T → V such that there is an integer N = N(v) with v(t) = 0 for all t < N. The value v(t) contains information of some kind which occurs at time t. Past trajectories (up to and including the present moment) are given by the set ΩV of all trajectories vanishing for t > 0, while future trajectories in ΓV vanish for t < 1. These spaces can be given various algebraic structures using the Z-transform. If v is a trajectory, the
Z-transform is the formal Laurent series given by
Z(v) = Σ_{t≥N} v(t) z^{-t},   z·Z(v) = Σ_{t≥N-1} v(t+1) z^{-t},
so that multiplication by z has a dynamic meaning as a time shift one unit into the past. In particular, the set ΩV of past trajectories is invariant under this action. In other words, ΩV becomes a K[z]-module itself, and as such it is a free module of rank equal to the vector space dimension of V.
So far we have seen that the space of trajectories and the space of past trajectories can be viewed as K[z]-modules, and that these module actions have a natural dynamical meaning as time shifts. What about the space ΓV of future trajectories? Kalman's module action of K[z] on ΓV, introduced in [AS] and described in more detail in Chapter X and the CIME notes [K7, 8], is one of the most important ideas of the algebraic approach. As a first attempt, try the obvious action: z·(Σ_{t≥1} v(t)z^{-t}) = Σ_{t≥1} v(t)z^{-t+1}. This result does not lie in ΓV, since it begins with v(1), so Kalman elected to erase this initial term, obtaining
the new action z·(Σ_{t≥1} v(t)z^{-t}) = Σ_{t≥1} v(t+1)z^{-t}. From a dynamical point of view, this piece of algebra is extremely natural, corresponding to the intuitive principle "multiplication by z is a time shift, ignoring data which falls into the past." Furthermore, this action is equally natural from several algebraic points of view, leading to the well-known Kalman realization diagram.
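As a toy illustration of this truncated shift (my own, with made-up data): represent a future trajectory as a finitely supported map t ↦ v(t) for t ≥ 1, and let z shift one step toward the past, erasing whatever would land at t = 0.

```python
def z_action(v):
    """Kalman's action on a future trajectory v = {t: v(t), t >= 1}:
    shift one unit into the past and discard the term that leaves t >= 1."""
    return {t - 1: val for t, val in v.items() if t - 1 >= 1}

v = {1: 5.0, 2: 7.0, 3: 9.0}            # v(1)=5, v(2)=7, v(3)=9
assert z_action(v) == {1: 7.0, 2: 9.0}  # the initial term v(1) is erased
```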
Suppose given a linear system in the form (X, U, Y, F, G, H). The idea that every past input string should set up a state leads to a mapping 𝔾: ΩU → X defined by setting 𝔾(uz^i) = F^iGu for all u in U, and extending linearly to all polynomials. The mapping 𝔾 becomes a K[z]-module map if X is given the standard dynamical structure and ΩU is given the shift module structure discussed above. In fact, 𝔾 is the "correct" abstract version of the controllability matrix, and the system (F, G, H) is controllable exactly when 𝔾 is surjective.
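In finite dimensions the image of 𝔾 is spanned by the vectors F^iGu, so surjectivity of 𝔾 amounts to the classical rank test. A small sketch of my own, with hypothetical matrices:

```python
import numpy as np

# The image of the map G-hat is spanned by the vectors F^i G u, so G-hat
# is surjective exactly when the classical controllability matrix
# [G, FG, ..., F^(n-1) G] has full row rank (Cayley-Hamilton caps the
# powers needed at n-1).

def controllability_matrix(F, G):
    n = F.shape[0]
    blocks, Fk = [], np.eye(n)
    for _ in range(n):
        blocks.append(Fk @ G)   # columns F^k G
        Fk = Fk @ F
    return np.hstack(blocks)

F = np.array([[0.0, 1.0], [0.0, 0.0]])
G = np.array([[0.0], [1.0]])
C = controllability_matrix(F, G)
assert np.linalg.matrix_rank(C) == 2   # this (F, G) pair is controllable
```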
For studies of observability, Kalman introduced a map ℍ: X → ΓY defined by ℍ(x) = Σ_{t≥1} HF^{t-1}x z^{-t}. Then ℍ is a module homomorphism which serves as the abstract version of the classical observability matrix, and is one-to-one if the system is observable.
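Dually, injectivity of ℍ reduces to the classical rank test on the observability matrix; another small sketch of mine, again with hypothetical matrices:

```python
import numpy as np

# H-hat(x) = sum_{t>=1} H F^(t-1) x z^(-t) is one-to-one exactly when the
# classical observability matrix [H; HF; ...; HF^(n-1)] has full column rank.

def observability_matrix(F, H):
    n = F.shape[0]
    blocks, Fk = [], np.eye(n)
    for _ in range(n):
        blocks.append(H @ Fk)   # rows H F^k
        Fk = Fk @ F
    return np.vstack(blocks)

F = np.array([[0.0, 1.0], [0.0, 0.0]])
H = np.array([[1.0, 0.0]])
assert np.linalg.matrix_rank(observability_matrix(F, H)) == 2   # observable
```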
The controllability and observability morphisms fit perfectly into a commutative diagram, the Kalman realization triangle, which has had an enormous influence on system theory.
[Diagram: the Kalman realization triangle, with 𝔾: ΩU → X, ℍ: X → ΓY, and the input/output map ΩU → ΓY along the bottom.]
We saw above that the past trajectories ΩU appear as a subset, and even a sub-K[z]-module, of the space 𝕋U of all input trajectories. The factor module "all trajectories modulo past trajectories," 𝕋Y/ΩY, is then exactly the space ΓY of future output trajectories, and the induced K[z]-action on the factor module is exactly Kalman's dynamical action discussed above. That is, ΩU injects K[z]-linearly into 𝕋U and 𝕋Y projects K[z]-linearly onto ΓY. The resulting five space diagram
With this definition, the state space (or the pole module of the transfer function) is a K[z]-module and the dynamics mapping F on X can be defined as the action of the variable z. The K[z]-linear mappings 𝔾 and ℍ come free from standard lemmas in commutative algebra, and 𝔾 is surjective while ℍ is injective. Finally, the K-linear mappings G and H can be derived from 𝔾 and
ℍ.

[Diagram: U → ΩU → X, relating the K-linear map G to the K[z]-linear map 𝔾.]
In other words, the correspondence G ↔ 𝔾 establishes an isomorphism of K-modules Hom_K(U, X) ≅ Hom_R(ΩU, X). That is, the construction Ω must be a left adjoint of the forgetful functor which takes an R-module to its underlying K-module. In the Arbib and Manes theory, such an adjoint can be rather elaborate, but in the present ring theory context it is easy to construct: ΩU is just given by the tensor product ΩU ≅ R ⊗_K U, which is a free R-module if U is free over K (as for vector spaces). The map 𝔾 also has a standard construction. In the classical case this is exactly Kalman's construction, so that we have a wonderful blending of dynamical intuition and fancy algebra.
The output side is more difficult. Of course, an appropriate diagram can be easily constructed.

[Diagram: the output-side analog of the adjoint construction, involving ΓY and Y.]

The right hand side is just Y(z)/Y[z], but the left hand side is "too big" for this attractive isomorphism to hold. But, in fact, the truth is better. We have

Hom_K(K[z], Y) ≅ Y((z^{-1}))/Y[z] ≅ ΓY
via the correspondence sending f: K[z] → Y to Σ_{i≥0} f(z^i) z^{-i-1} (mod Y[z]). That is, Kalman's Γ construction is exactly the polynomial case of the general right adjoint functor approach, and again we have a perfect match between dynamical intuition and algebra.
The idea that the polynomial ring K[z] can be replaced by a more or less arbitrary ring R acting on the state-space X ("the module action is the dynamics!") has led to a lot of examples and nice results. The idea is to find a ring whose action describes somehow a dynamics worthy of study, and then to use Kalman's philosophy to compute suitable input and output string spaces.
The paper by Ed Kamen in this volume will address many of these issues. The
articles ([W3, W4]), partly inspired by Kamen's work on time-varying systems,
deal with general rings and some exotic specific rings. A somewhat different
direction in the application of module theory to system theory, namely the
theory of "families of systems," or "systems over rings," was also begun under
Kalman's influence [RW, RWK]. We say no more about this area here, since
it will be discussed in detail in Kamen's article. Here we discuss more concrete
examples closely related to Kalman's original work on stationary systems over
fields of scalars.
Although the polynomial ring K[z] is the most familiar subring of the field
K(z) of rational functions, many other subrings R of K(z) also have system
theoretic significance. The algebraic study of the behavior of transfer functions
at infinity is based on the ring O_∞ of all proper rational functions, those rational functions with numerator degree no larger than the denominator degree. The ring O_∞ is a local ring with a unique maximal ideal corresponding to the point at infinity. Classical transfer functions have no poles at infinity, so the study of O_∞-modules becomes important only for the study of singular systems, discussed immediately below, or for the study of modules of multivariable zeros (which we discuss in the next section).
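The membership test for O_∞ is just a degree comparison; here is a quick illustrative check (my own helper, not part of the theory):

```python
# A rational function n(z)/d(z) lies in the ring O_infinity of proper
# rational functions, i.e. has no pole at infinity, exactly when
# deg n <= deg d.

def is_proper(num_coeffs, den_coeffs):
    """Coefficient lists are highest-degree first, e.g. [1, 0] means z."""
    def deg(c):
        i = 0
        while i < len(c) - 1 and c[i] == 0:   # skip leading zeros
            i += 1
        return len(c) - 1 - i
    return deg(num_coeffs) <= deg(den_coeffs)

assert is_proper([1], [1, 0])             # 1/z: proper (even strictly)
assert is_proper([1, 0], [1, 0, 1])       # z/(z^2 + 1): proper
assert not is_proper([1, 0, 0], [1, 1])   # z^2/(z + 1): pole at infinity
```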
The discrete-time version of singular (improper, semistate) systems deals
with equations of the form

Ex(t + 1) = Fx(t) + Gu(t),
y(t) = Hx(t),
where the matrix E may be singular. The study of improper systems began late
in the history of system theory, but recently there has been a great deal of new
activity in this area [LN, M]. In the continuous-time case, such systems may
involve differentiation of inputs or impulsive responses [VLK], and in the
discrete-time case improper systems simulate predictors. On the other hand,
they can arise abstractly from inversion problems of various kinds, and systems
in descriptor form (or, synonymously, systems with embedded statics) behave
formally like predictors in certain cases [L]. Furthermore, from an algebraic
or algebro-geometric point of view the class of (possibly) improper systems is
the natural object of study. The point at infinity must be included in any
comprehensive study of poles and the zeros of linear systems. Many common
problems such as system inversion or solving transfer function equations lead
naturally to improper systems even if the given data involve only proper or
strictly proper systems. If these systems are to be studied by means of the Kalman linear realization theory philosophy, then a new ring must be sought to replace the polynomials K[z]. In fact O_∞ is the correct choice, at least for the "purely improper" case in which the dynamics mapping F is just the identity map. In this case, the transposed equation x(t) = Ex(t + 1) - Gu(t) shows that the system can be viewed as evolving backwards in time. In the Z-transform theory the study of backwards evolution involves multiplication by z^{-1}, which lies in O_∞ but not in the polynomial ring. New techniques, involving a global state space construction, must be used to study the most general singular system.
The approach to singular systems via the algebraic theory of systems at infinity
was begun by Conte and Perdon in [CP1, 2], and some more recent work on
global systems can be found in [WCP, WSCP].
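The backwards-evolution remark can be made concrete. Under the purely improper assumption F = I, iterating x(t) = Ex(t+1) - Gu(t) with a nilpotent E expresses the state through finitely many future inputs, so the system behaves as a predictor. A sketch of my own, with a hypothetical E and G:

```python
import numpy as np

# With F = I, iterating x(t) = E x(t+1) - G u(t) and using nilpotency of E
# gives x(t) = -sum_k E^k G u(t+k): the state is a finite combination of
# *future* inputs.

E = np.array([[0.0, 1.0], [0.0, 0.0]])   # nilpotent: E @ E = 0
G = np.array([[0.0], [1.0]])

def state_from_future_inputs(u_future):
    x = np.zeros(2)
    Ek = np.eye(2)
    for u in u_future:
        x -= Ek @ (G * u).ravel()
        Ek = Ek @ E
        if not Ek.any():   # nilpotency truncates the sum
            break
    return x

assert np.allclose(state_from_future_inputs([2.0, 3.0]), [-3.0, -2.0])
```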
Rings which are more general than these, but which are still subrings of K(z), are also useful. If X is a classical state space viewed as a module over K[z] and S is a multiplicative subset of K[z], then we can consider the localization R = S^{-1}K[z] and the corresponding localized state space S^{-1}X. For example, if S is the set of polynomials with all roots in the right half plane, then S^{-1}X is the stable part of the state space. (Localization at a set of modes, which is essentially just expanding the ring of operators to allow more denominators, is a functorial algebraic technique which kills off the set of states corresponding to those modes.)
to those modes.) Furthermore, localization of K[z] can be combined with the
theory at infinity to develop a theory of proper stable or proper unstable transfer
functions and their corresponding state spaces. The idea here is that a very
small amount of category theory allows us to construct perfect analogs of
Kalman's input Q spaces and output r spaces, and to mimic his realization
triangle exactly. The foundations of this theory have been worked out in some
detail in [WCP].
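For a scalar transfer function the effect of this localization is easy to see numerically: killing the modes in S leaves only the complementary poles. An illustrative computation (my own choice of denominator):

```python
import numpy as np

# Localizing at S = {polynomials with all roots in the right half plane}
# kills the right-half-plane modes, so the localized state space S^{-1}X
# keeps only the stable (left-half-plane) poles.

den = np.array([1.0, -1.0, -6.0])    # z^2 - z - 6 = (z - 3)(z + 2)
poles = np.roots(den)
stable_part_dim = sum(1 for p in poles if p.real < 0)
assert stable_part_dim == 1          # only the pole at z = -2 survives
```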
I will conclude this section with the admission that, as far as I know, the
category theoretical approach has not yet succeeded in capturing the full power
of Kalman's original intuition. I view the power of the algebra in [AS] as
coming in two stages. First, the observation that the dynamics on a state space can be conveniently captured by a module structure, together with the intuitive module structure on the Ω and Γ spaces, led to the realization triangle. Then the big Kalman diagram, with a trapezoid relating transfer functions and the Kalman input/output map, led to a full-fledged realization theory with many classical connections. Now the left-adjoint and right-adjoint functors available in the various category theory approaches give a powerful generalization of the realization triangle. On the other hand, the top trapezoid, involving Z-transforms of trajectories on the whole time line, must be created in an ad hoc manner for each problem attacked. So far, for each reasonable dynamical problem and each reasonable choice of rings, some sort of trajectory construction has turned out to be available, but I look forward to a new, very general construction which will tell us what trajectories ought to be in some elegant generality.
X(T) = U[z] / (U[z] ∩ T^{-1}(Y[z]))
in which the state can be thought of as inputs modulo inputs with trivial outputs.
It turns out that zeros are more complicated than poles, since the rational null space of the transfer function supplies a rather different sort of zero, so that the vector space kernel of T(z) must be factored out to ensure a finite-dimensional answer for the zero module.
The module theoretic approach to zeros has achieved some important
successes in unifying and explaining zero related behavior. The intuitive idea
that "a zero of a system is a pole of the inverse system" has been refined to the
assertion that the zero module of a system is a factor module of the pole module
of any right inverse. (Left inverses use submodules.) Additional poles of an
inverse system, not forced by direct zeros, are now well understood [WS3]. The
relationship between zeros and geometric control theory has now been clarified
by an explicit exact sequence, and recent work untangles various structural
information for non-minimal systems contained in the Rosenbrock system
matrix [WS2, 6]. More recently [WCP2, SW] problems of pole/zero cancellation for interconnected systems have been worked out. I would like to emphasize here that Kalman's conviction that the module-theoretic view of stationary linear system theory is the "correct" view, or at least a very powerful one, has been borne out in the theory of multivariable zeros as well as the theory of poles.
We have proved that these Wedderburn numbers are in fact the dimensions of
new finite-dimensional vector spaces which measure the size of two sorts of
"generic zeros": those arising from the failure of G(z) to be injective and from
the failure of G(z) to be surjective. Furthermore, the number of ordinary
"lumped" zeros is best viewed as the dimension of a zero-module (either finite or at infinity), and of course the number of poles is the dimension of the pole-module, or minimal realization state space (possibly at infinity), as discussed in
previous sections. With an expanded interpretation, the counting principle is
not only true numerically, but it becomes an exact sequence relating the usual
pole-modules, the lumped zero-modules, and the new generic "Wedderburn
zero spaces."
We devote a little space to some of the new ideas in [WSCP]. The crucial
step of describing generic zeros by means of a finite dimensional vector space
stems from a construction mentioned by Forney in notes written at Stanford in 1972. Since it is closely associated with earlier work of Wedderburn, we call it the Wedderburn-Forney construction. Let C be any subspace of V(z), where V is finite dimensional over K. By choosing a basis of V over K and using it as a basis of V(z) over K(z), we can identify V(z) as a space of column vectors with coefficients in K(z). A vector in V(z) is polynomial (that is, in the Kalman space V[z]) or proper (in Ω_∞V, the Kalman input space at infinity) if its coefficients lie in K[z] or O_∞. With these conventions, we define coordinatewise maps π₊ (polynomial part of a rational function) and π₋ (strictly proper part). We define the Wedderburn-Forney space associated to C by
W(C) = π₋(C) / (C ∩ z^{-1}(Ω_∞V))
That is, W(C) consists of the set of vectors which are strictly proper parts of vectors in C, modulo the set of vectors of C which are themselves strictly proper. It is clear that W(C) is a vector space over K, and the first technical result is that for any subspace C, W(C) is finite dimensional over K. The fine structure of the Wedderburn-Forney space W(C) is closely related to earlier work on minimal indices and minimal polynomial bases, as found, for example, in Forney's paper [Fo]. For example, its dimension is given by the sum of the minimal indices of T(z). Some of the main ideas were already worked out by Wedderburn and Kronecker [We; R, pp 95-99], and an excellent summary can be found in [Ka, Chap. 6].
Let T(z) be a transfer function. We define the global pole space X̂(T) by X̂(T) = X(T) ⊕ X_∞(T), where X(T) is the usual Kalman pole module and X_∞(T) is its analog at infinity. Similarly, the global zero space is defined to be Ẑ(T) = Z(T) ⊕ Z_∞(T). The principal result is given by the following exact sequence.
Fundamental Pole-Zero Exact Sequence: Let T(z) be a transfer function.
Then there is an exact sequence of finite dimensional vector spaces over K,

0 → Ẑ(T) → X̂(T)/W(ker T(z)) → W(im T(z)) → 0
6 An Appreciation
I hope I will be forgiven for some personal remarks on this occasion. In 1971, when I was coming to the end of a temporary position at Stanford, I was an algebraist without much direction. Rudy Kalman asked me a question, and encouraged me to work with Yves Rouchaleau, who was close to finishing his dissertation on systems over rings. Together they managed to teach me a little system theory and get me going on a fascinating series of problems. I am still trying to understand the algebraic foundations of linear system theory, and I hope I have succeeded to some extent. In any case, I have enjoyed myself thoroughly, and I'm still having a good time. Thanks, Rudy.
References
[AS] See [K2]
[BL] C. Byrnes, A. Lindquist, eds., Frequency Domain and State Space Methods for Linear Systems, North-Holland: Amsterdam, 1986
[BMS] C. Byrnes, C. Martin, R. Saeks, eds., Linear Circuits, Systems, and Signal Processing: Theory and Applications, North-Holland, 1988
[CP1] G. Conte, A. Perdon, "On polynomial matrices and finitely generated torsion modules," in AMS-NASA-NATO Summer Seminar on Linear System Theory, 18, Lectures in Applied Math, Amer Math Society: Providence, 1980
[CP2] G. Conte, A. Perdon, "Generalized State Space Realizations of Non-proper Transfer Functions," Systems and Control Letters, 1, 1982
[CP3] G. Conte, A. Perdon, "Infinite Zero Module and Infinite Pole Modules," VII Int Conf Analysis and Opt of Systems: Nice, Lec Notes Control and Inf Science, Springer 62, pp 302-315, 1984
[Fo] G.D. Forney, Jr., "Minimal Bases of Rational Vector Spaces with Applications to Multivariable Linear Systems," SIAM J Control, 13, pp 493-520, 1975
[F] P. Fuhrmann, Linear Systems and Operators in Hilbert Space, Chap. I, McGraw-Hill International, 1981
[FH] P. Fuhrmann, M. Hautus, "On the Zero Module of Rational Matrix Functions," Proc 19th IEEE Conf Decision and Control, pp 161-184, 1980
[K1] R.E. Kalman, "Mathematical Description of Linear Dynamical Systems," SIAM J Control, 1, pp 152-192, 1963
[K2] R.E. Kalman, "Algebraic Structure of Linear Dynamical Systems. I. The Module of Σ," Proc Nat Acad Sci (USA), 54, pp 1503-1508, 1965
[K3] R.E. Kalman, "Irreducible Realizations and the Degree of a Rational Matrix," J Soc Indust Appl Math, Vol 13, No 2, 1965
[K4] R.E. Kalman, "New Developments in System Theory Relevant to Biology," in Systems Theory and Biology (M. Mesarovic, ed.), New York: Springer, 1968
[K5] R.E. Kalman, "On the Mathematics of Model Building," Proceedings of the School on Neural Networks, Ravello, June 1967
[K6] R.E. Kalman, "Algebraic Aspects of the Theory of Dynamical Systems," in Differential Equations and Dynamical Systems, J.K. Hale and J. LaSalle, eds., New York: Academic Press, 1967
[K7] R.E. Kalman, Lectures on Controllability and Observability [The CIME Notes], Centro Internazionale Matematico Estivo, Bologna, 1968
[K8] R.E. Kalman, "Algebraic Theory of Linear Systems," Chapter X in [KFA]
[KFA] R.E. Kalman, P. Falb, M. Arbib, Topics in Mathematical System Theory, McGraw-Hill: New York, 1969
[Ka] T. Kailath, Linear Systems, Prentice-Hall: Englewood Cliffs, 1980
[L] D. Luenberger, "Dynamic Equations in Descriptor Form," IEEE Trans Auto Control, AC-22, pp 312-321, March 1977
[LN] F.S. Lewis and R.W. Newcomb, guest editors, "Special Issue: Semistate Systems," Circuits, Systems, and Signal Processing, 5, No 1, 1986
[M] M. Malabre, "Generalized Linear Systems: Geometric and Structural Approaches," Special Issue on Control Systems, Linear Algebra and Applications, 122/123/124, pp 591-620, 1989
[Ma] E.G. Manes, ed., Category Theory Applied to Computation and Control, Springer Lec Notes Comp Sci, 25, 1975
[R] H. Rosenbrock, State Space and Multivariable Theory, Wiley: New York, 1970
[RW] Y. Rouchaleau, B. Wyman, "Linear Dynamical Systems over Integral Domains," J Computer Sys Sci, 9, pp 129-142, 1974
[RWK] Y. Rouchaleau, B. Wyman, R. Kalman, "Algebraic Structure of Linear Dynamic Systems III. Realization Theory over a Commutative Ring," Proc Nat Acad Sci, 69, pp 3404-3406, 1972
[VLK] G. Verghese, B. Levy, T. Kailath, "A Generalized State Space for Singular Systems," IEEE Trans Auto Control, AC-26, pp 811-831, August 1981
[W1] B. Wyman, "Linear Systems over Commutative Rings," Lecture Notes, Stanford University, 1972
[W2] B. Wyman, "Linear Systems over Rings of Operators," in [Ma], pp 218-223
[W3] B. Wyman, "Time Varying Linear Discrete-time Systems: Realization Theory," Advances in Math Supplementary Studies, Vol 1, pp 233-258, Acad Press, 1978
[W4] B. Wyman, "Estimation and Duality for Time-Varying Linear Systems," Pacific J of Math (Hochschild volume), 86, pp 361-377, 1980
[WCP1] B. Wyman, G. Conte, A. Perdon, "Local and Global Linear System Theory," in [BL], pp 165-181
[WCP2] B. Wyman, G. Conte, A. Perdon, "Fixed Poles in Transfer Function Equations," SIAM J Cont Opt, 26, pp 356-368, March 1988
[We] J.H.M. Wedderburn, Lectures on Matrices, Chap. 4, A.M.S. Colloquium Publications, 17, 1934
[Wo] M. Wonham, Linear Multivariable Control: A Geometric Approach, 3rd Edition, Springer, 1985
[WS1] B. Wyman, M.K. Sain, "The Zero Module and Essential Inverse Systems," IEEE Trans Circ Systems, CAS-28, pp 112-126, 1981
[WS2] B. Wyman, M.K. Sain, "On the Zeros of a Minimal Realization," Lin Alg Appl, 50, pp 621-637, 1983
[WS3] B. Wyman, M.K. Sain, "On the Design of Pole Modules for Inverse Systems," IEEE Trans Circ Systems, CAS-32, pp 977-988, 1985
[WS4] B. Wyman, M.K. Sain, "A Unified Pole-Zero Module for Linear Transfer Functions," Systems and Control Letters, 5, pp 117-120, 1984
[WS6] B. Wyman, M.K. Sain, "Module Theoretic Structures for System Matrices," SIAM J Control and Opt, January 1987, pp 86-99
[WSCP] B. Wyman, M. Sain, G. Conte, A.-M. Perdon, "On the Zeros and Poles of a Transfer Function," Special Issue on Systems and Control, Linear Algebra and Applications, 122/123/124, pp 123-144, 1989
The algebraic module theoretic stability framework for linear time-invariant systems is reviewed.
The main theme is that Kalman's algebraic realization theory has evolved much beyond its initial
objective of providing an abstract framework for the derivation of mathematical models of systems.
It has become a powerful tool for the extraction of structural invariants, permitting the exact
characterization of all options for dynamics assignment through internally stable linear dynamic
compensation. This characterization is provided by a set of integers: the stability indices of the system.
1 Introduction
In a sense, realization theory is the basic mechanism of science through which the conceptualization of observation is achieved. It formulates the mathematical
guiding principles that lead from measurement of behavior to laws of nature.
Duly stated, realization theory is the abstract theory of mathematical modeling.
It forms the bridge from experiment to theory, and, in a way, is a grand
mathematical scheme for data compression, facilitating compact mathematical
description of vast realms of experimental data. All this is, of course, well known.
The main point of the present note is to show that realization theory has matured beyond its innate mission of providing guiding principles for modeling, and has become a refined tool for scientific analysis, capable of singling out the important aspects of experimental data and filtering away the clutter of secondary details. In mathematical terms, realization theory has become a sophisticated tool for the extraction of structural invariants of systems.
Historically and philosophically, realization theory may be conceived as the driving force behind the scientific revolution that started in the eighteenth century. Nevertheless, it seems that explicit mathematical treatments of basic aspects of realization theory did not appear in the scientific literature until around the middle of the present century. At that time, realization theory formed
1 This research was supported in part by the National Science Foundation, USA, Grant numbers 8896182 and 8913424. Partial support was also provided by the Office of Naval Research, USA, Grant number N00014-86-K0538, by US Air Force Grant AFOSR-87-0249, and by the US Army
Research Office, Grant number DAAL03-89-0018, through the Center for Mathematical System
Theory, University of Florida, Gainesville, FL 32611, USA.
J. Hammer
for the given system, the theory should provide a clear indication of the ways
in which the system has to be modified to facilitate the desired performance.
From a mathematical point of view, the limitations on design performance for
a given class of systems are usually presented in terms of system invariants. In
qualitative terms, the invariants characterize the fundamental underlying structure of the system which cannot be altered by external intervention; they provide the skeleton upon which the designer may build.
The module theoretic approach to the linear realization problem initiated in Kalman [1965] has matured into a powerful tool for the derivation of structural invariants for linear time-invariant systems. Specifically, it yields a set of integer invariants that entirely characterize all possible dynamical behaviors that can be assigned to a system through the use of external dynamic compensation, subject to the requirement that the final closed loop configuration be internally stable. This result is derived by developing an algebraic module theoretic realization theory over certain rings of stable rational functions, which replace the ring of polynomials used in the original Kalman realization theory. In this way, algebraic realization theory becomes more than just a tool for obtaining dynamical models of linear systems; it provides the means to extract the inherent structure of the system from the input/output data on its behavior.
It is quite fascinating that the fundamental invariant structure of a linear
time-invariant system can be entirely characterized by a finite set of integers.
When investigating the possible dynamical properties that can be endowed to
a given system through an internally stable closed loop configuration, all one
needs to know are these integer invariants, despite the fact that the complete
description of the dynamical model of the system requires a much larger number
of real parameters. The question of whether certain dynamical properties can
or cannot be assigned to a system by internally stable compensation simply
reduces to the comparison of some integers. This fact provides a deep indication
of the fundamental simplicity of linear time-invariant control systems. It is, of course, in line with the classical results on pole assignment (Wonham [1967]) and on the assignment of invariant factors (Rosenbrock [1970]).
of the form
u = Σ_{t≥t₀} u_t z^{-t}   (2.1)
(2.2)
In his [1972] lecture notes, Wyman noted that, through (2.2), time invariance is related to the linearity of the system Σ over the field of scalar Laurent series, and this idea was then further expounded in Hautus and Heymann [1978].
We review this point next.
Let K be a field, let S be a K-linear space, and let ΛS denote the set of all Laurent series of the form

s = Σ_{t≥t₀} s_t z^{-t},   (2.3)

where the initial integer t₀ may vary from one sequence to another, and where s_t ∈ S for all integers t ≥ t₀. In particular, taking S = K, it is easy to see that the set ΛK forms a field under the operations of coefficientwise addition as addition and series convolution as multiplication. The set ΛS becomes then a ΛK-linear space.
space. Furthermore, it can readily be shown that if the dimension of S as a
K -linear space is n, then AS is a finite dimensional AK-linear space of dimension
n as weIl. The importance of the notion of AK-linearity is that it permits the
fusion of two seemingly disparate notions-the notion of linearity and the
notion of time invariance. In fact, every AK -linear map f: A U --+ A Y clearly
satisfies (2.2), and whence represents a K-linear time-invariant system, which has
the K-linear space U as its input space, and the K-linear space Y as its output
space (Wyman [1972J). Moreover, it can be shown that every K-linear
time-invariant system ~ which is causa I and has a finite dimensional state-space
represents a AK -linear map f: A U --+ A Y, where U is the input space of ~ and
Y is its output space. Thus, for the systems we intend to consider in this note,
the notion of a AK-linear map is equivalent to the input/output description of
a system.
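The field operations on ΛK can be exercised on truncated series. The sketch below is my own representation choice (not the paper's): a Laurent series is encoded by its starting exponent t₀ and a finite coefficient list, with Cauchy convolution as multiplication; up to truncation, the geometric series inverts 1 - z^{-1}, illustrating why ΛK is a field.

```python
# A truncated Laurent series is a pair (t0, coeffs), standing for
# sum_i coeffs[i] * z^-(t0 + i); multiplication is Cauchy convolution.

def convolve(a, b, terms):
    t0 = a[0] + b[0]
    ca, cb = a[1], b[1]
    out = [sum(ca[i] * cb[k - i]
               for i in range(len(ca)) if 0 <= k - i < len(cb))
           for k in range(terms)]
    return (t0, out)

one_minus = (0, [1, -1])         # 1 - z^-1
geometric = (0, [1, 1, 1, 1])    # 1 + z^-1 + z^-2 + z^-3, truncated
assert convolve(one_minus, geometric, 4) == (0, [1, 0, 0, 0])
```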
As with any linear map between finite dimensional linear spaces, a ΛK-linear map f: ΛU → ΛY may be represented by a matrix, relative to specified bases of its domain ΛU and its codomain ΛY. Among the bases of the space ΛU, a particularly significant role is played by bases of the original K-linear space U. It is easy to verify that every basis u₁, u₂, ..., u_m of the K-linear space U is also a basis of the ΛK-linear space ΛU. A matrix representation of the ΛK-linear
The space ΛS is, of course, an Ω₀K-module as well. Let s₁, s₂, ..., sₙ be a basis of the K-linear space S, and let Ω₀S be the Ω₀K-submodule of ΛS generated by this basis, namely,

Ω₀S = {Σ_{i=1}^n α_i s_i : α_i ∈ Ω₀K}.

Then, it is easy to see that Ω₀S is the same for any basis s₁, s₂, ..., sₙ of S, and its rank as an Ω₀K-module is equal to the dimension of S as a K-linear space. We denote by

Ω₀S → ΛS   (2.6)

the identity injection which maps each element of Ω₀S into the same element in ΛS. By

π: ΛS → ΛS/Ω₀S   (2.7)
we denote the canonical projection π which maps each element s ∈ ΛS into the equivalence class s + Ω₀S in ΛS/Ω₀S. It follows then that a ΛK-linear map f: ΛU → ΛY is input/output stable if and only if f(Ω₀U) ⊂ Ω₀Y.
The final algebraic structure that we need to review deals with the combination of the notions of causality and stability. Let Ω₀⁻K := Ω₀K ∩ Ω⁻K, which consists of all the stable and causal elements in ΛK. Then, the following is true (Morse [1975]):

(2.8) Proposition. Ω₀⁻K is a principal ideal domain.
(3.1)
In intuitive terms, the map f̄ associates with each input sequence terminating at the present the future part of the output sequence generated by it. Of particular importance is the set of all past input sequences that generate zero future outputs, namely the set

(3.2)
[Diagram: the semirealization triangle, with g: ΩU → X and h: X → ΓY factoring the restricted input/output map f̄.]

x_{k+1} = Fx_k + Gu_k,
y_k = Hx_k,
in a way that we do not detail here (see Kalman, Falb, and Arbib [1969] or Hautus and Heymann [1978]). The realization (X, g, h) is called reachable whenever g is surjective; observable whenever h is injective; and canonical whenever it is both reachable and observable. The pair (X, g) is called a semirealization of f. It can be readily seen from the diagram that every realization (X, g, h) of f must satisfy ker g ⊂ ker f̄; and a reachable realization (X, g, h) is canonical if and only if ker g = ker f̄ (= Δ_K). A canonical semirealization can be simply constructed by taking g as the projection ΩU → ΩU/Δ_K, and the canonical state-space is simply given by the quotient module X = ΩU/Δ_K. These facts indicate that the module Δ_K is the basic quantity in realization theory for linear time-invariant systems.
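Concretely, the dimension of the canonical state space ΩU/Δ_K of a rational input/output map can be read off from data, Ho-Kalman style, as the rank of a Hankel matrix of Markov parameters. This numerical sketch is my own illustration, not a construction from the text:

```python
import numpy as np

# For a rational input/output map, dim(Omega U / Delta_K) equals the rank
# of the Hankel matrix built from the Markov parameters h_1, h_2, ... of
# the impulse response.

def hankel_rank(markov, size):
    Hk = np.array([[markov[i + j] for j in range(size)]
                   for i in range(size)])
    return np.linalg.matrix_rank(Hk)

# Impulse response of y(k+1) = 0.5 y(k) + u(k): h_t = 0.5^(t-1), t >= 1
markov = [0.5 ** t for t in range(8)]
assert hankel_rank(markov, 3) == 1   # one-dimensional canonical state space
```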
In stronger terms, the polynomial module Δ_K exactly contains the critical information that is necessary in order to construct a dynamical mathematical model of a given linear time-invariant system from its input/output behavior f. However, as we shall show in the sequel, this information does not directly provide the designer with a clear indication of the design options at his disposal when trying to control the given system using an internally stable control configuration. The information provided by Δ_K, although complete, contains far too many details within which the critical information characterizing the design options is buried. Nevertheless, the module theoretic stability framework which we have briefly reviewed earlier will extract the sought-after information from the deep underlying algebraic structure of the input/output map f. It will provide an accurate characterization of all design options in the most obvious and condensed form, in terms of a set of m integers, where m is the dimension of the input space U.
Perhaps the most fundamental quantity in the analysis of the invariant structure of linear time-invariant systems is the strict observability module associated with an input/output map f: ΛU → ΛY, given by ker πf (Hammer and Heymann [1983]). The strict observability module ker πf consists of all input sequences (not necessarily restricted to the past) which
produce zero future outputs from the system described by the input/output map f. Since f and π are both ΩK-homomorphisms, it is an ΩK-module, namely, a module over the polynomials. In contrast to the realization module Δ_K, which contains only polynomial vectors as elements, the module ker πf may also contain non-polynomial elements. As a direct consequence of the definition, we have that

Δ_K = ker πf ∩ ΩU,

so that Δ_K ⊂ ker πf. In view of the well known fact that the realization module Δ_K is of full rank whenever f is a rational function, the last containment implies that the module ker πf contains a basis of the ΛK-linear space ΛU.
The basic algebraic quantity of our present discussion is the stability module Δ₀ of Hammer [1983a], given by

Δ₀ := ker πf ∩ Ω₀U.    (3.3)

It consists of all stable input sequences that produce zero future outputs for the system described by the input/output map f. Since ker πf and Ω₀U are both ΩK-modules (the latter being implied by the fact that ΩK ⊂ Ω₀K), it follows that Δ₀ is an ΩK-module. The module Δ₀ will allow us to derive a complete set of invariants that characterize the set of all dynamical properties that can be assigned to the system described by the input/output map f by internally stable control. These invariants are derived through a standard procedure for the extraction of integer invariants from ΩK-modules, which is described in the next section.
Another module which is critical to our discussion is the pole module Δ⁰, also introduced in Hammer [1983a], and given by

(3.4)
it is more common to use the notion of degree, given by deg s := −ord s. The leading coefficient ŝ of s is an element of the K-linear space S, given by ŝ := s_{ord s} if s ≠ 0 and ŝ := 0 if s = 0. A set of elements u₁, u₂, …, u_n in the space ΛS is said to be properly independent if the leading coefficients û₁, û₂, …, û_n are K-linearly independent. A proper basis of the ΛK-linear space ΛS is a basis of ΛS that consists of properly independent elements. An ordered proper basis of an ΩK-module Δ ⊂ ΛS is a basis d₁, …, d_n of Δ that consists of properly independent elements, with deg d_i ≤ deg d_{i+1} for all i = 1, …, n − 1. As it turns out, the degrees of the elements of ordered proper bases are of deep significance in linear time-invariant system theory. The origin of this fact stems from the notion of causality, but we will not explore this connection here in great detail (see Hammer and Heymann [1981, 1983] and Hammer [1983a]).
It is quite interesting that the degrees of the elements of an ordered proper basis of an ΩK-submodule Δ of the ΛK-linear space ΛS are uniquely determined by Δ, and can be derived without the explicit construction of any proper bases. This fact was probably first noticed (somewhat implicitly) in Rosenbrock [1970], but the specific procedure used here to derive these degrees was developed in Hautus and Heymann [1978], Hammer and Heymann [1981, 1983], and Hammer [1983a]. To describe the procedure, let Δ ⊂ ΛS be an ΩK-module. For every integer k, let S_k be the K-linear subspace of S spanned by the leading coefficients of all elements s ∈ Δ satisfying ord s ≥ k. Since Δ is an ΩK-module (thus permitting shifts to the left in the discrete-time interpretation), it follows that the sequence of subspaces {S_k} creates a chain ⋯ ⊃ S₋₁ ⊃ S₀ ⊃ S₁ ⊃ ⋯, which is called the order chain of Δ. The sequence of the dimensions of the elements of this chain, namely, the sequence of integers η_k := dim_K S_k, k = …, −1, 0, 1, …, is called the order list of Δ. An ΩK-module Δ ⊂ ΛS is said to be rational if the intersection Δ ∩ ΩS is of rank equal to the dimension of the K-linear space S. Also, the ΩK-module Δ is said to be bounded if there is an integer α such that ord s ≥ α for all nonzero elements s ∈ Δ.
Now, let Δ be a rational ΩK-submodule of the ΛK-linear space ΛS, let {η_k} be the order list of Δ, and let n be the dimension of the K-linear space S. The degree indices μ₁ ≤ μ₂ ≤ ⋯ ≤ μ_n of Δ are then defined as follows. For every integer j satisfying η_i ≤ j < η_{i−1}, the degree index μ_j := −i; if lim_{k→∞} η_k ≠ 0, set μ_j := 0 for j = 1, …, lim_{k→∞} η_k. It can then be shown that, for a rational and bounded ΩK-module Δ, the degree indices describe the degrees of every ordered proper basis of Δ, as follows (Hammer and Heymann [1983]).
(4.1) Theorem. Let Δ be a rational and bounded ΩK-submodule of the ΛK-linear space ΛS, and let μ₁ ≤ μ₂ ≤ ⋯ ≤ μ_n be its degree indices. Then, Δ is of rank n = dim_K S, and
[1974], Forney [1975], and Hautus and Heymann [1978] (the present version of the result is taken from the last reference).
(4.2) Theorem. Let f: ΛU → ΛY be a linear input/output map, and let Δ_K be the Kalman realization module of f. Then, the degree indices of Δ_K are the reachability indices of a canonical realization of the system represented by f.
As well known, the reachability indices play an important role in the theory of linear control, as evidenced by the Rosenbrock Theorem (Rosenbrock [1970]). They are also referred to as the 'Kronecker invariants' of the system (Kalman [1971]).
Of fundamental importance to the theory of internally stable linear control are the degree indices of the stability module Δ₀, which were introduced and studied in Hammer [1983a, b, and c], and which form the main motto of this note. We review from these references the following basic definition. (Recall that a linear input/output map is simply a strictly causal and rational ΛK-linear map.)
[Figure: block diagram of the internally stable control configuration generating the composite system f_{(W,V,r)}.]
(5.3)

f_{(W,V,r)} = f_{(V,r)} W    (5.4)

(5.5)
x_{k+1} = F x_k + G u_k,
y_k = H x_k,
θ₁ ≤ θ₂ ≤ ⋯ ≤ θ_m, and let k := rank_{ΛK} f. Let φ₁, …, φ_k be a set of monic stable polynomials φ_i
References
(For a more complete list of references refer to Hammer [1983a, b, and c])
G.D. Forney [1975] "Minimal bases of rational vector spaces with applications to multivariable linear systems," SIAM J Control, Vol 13, pp 643-658
J. Hammer [1983a] "Stability and nonsingular stable precompensation: an algebraic approach," Mathematical Systems Theory, Vol 16, pp 265-296
J. Hammer [1983b] pp 37-61
J. Hammer [1983c] "Pole assignment and minimal feedback design," International J Control,
J. Hammer and M. Heymann [1981] "Causal factorization and linear feedback," SIAM J Control,
J. Hammer and M. Heymann [1983] "Strictly observable rational linear systems," SIAM J Control, Vol 21, pp 1-16
M.L.J. Hautus and M. Heymann [1978] "Linear feedback - an algebraic approach," SIAM J Control, Vol 16, pp 83-105
R.E. Kalman [1965] "Algebraic structure of linear dynamical systems, I: the module of Σ," Proceedings of the National Academy of Sciences (USA), Vol 54, pp 1503-1508
R.E. Kalman [1968] Lectures on Controllability and Observability, CIME
R.E. Kalman [1971] "Kronecker invariants and feedback," in Ordinary Differential Equations, NRL-MRC Conference, L. Weiss ed, pp 459-471, Academic Press, NY
R.E. Kalman [1980] "Identifiability and problems of model selection in econometrics," Proceedings of the 4th World Congress of the Econometric Society, Aix-en-Provence, France, August 1980
R.E. Kalman, P.L. Falb, and M.A. Arbib [1969] Topics in Mathematical System Theory, McGraw-Hill, NY
A.S. Morse [1975] "System invariants under feedback and cascade control," in Lecture Notes in Economics and Mathematical Systems, Vol 131, G. Marchesini and S. Mitter eds, Springer Verlag, Berlin
A. Nerode [1958] "Linear automaton transformations," Proceedings of the American Mathematical Society, Vol 9, pp 541-544
H.H. Rosenbrock [1970] State Space and Multivariable Theory, Nelson, London
W.A. Wolovich [1974] Linear Multivariable Systems, Applied Mathematical Sciences series, No 11, Springer Verlag, NY
W.M. Wonham [1967] Linear Multivariable Control: A Geometric Approach, Lecture Notes in Economics and Mathematical Systems, No 101, Springer Verlag, NY
B.F. Wyman [1972] Linear Systems over Commutative Rings, Lecture notes, Stanford University
O. Zariski and P. Samuel [1958] Commutative Algebra, D. Van Nostrand Co, NY
The paper begins with a brief history of the role played by R.E. Kalman in the establishment of the field of linear systems over a commutative ring. The theory of systems over rings is motivated by considering integer systems, systems with time delays, parameter-dependent systems, and multidimensional systems including spatially-distributed systems. Then a brief survey is given of existing work on the control of systems over rings, along with some discussion of open research problems.
1 The Beginning
My first contact with Professor Kalman was during my third year as a graduate student at Stanford University. In particular, during the Fall 1969 and Winter 1970 quarters, I took Professor Kalman's two-course sequence on mathematical system theory. I remember very clearly Professor Kalman telling the class on the first day that linear algebra and abstract algebra (rings, modules, etc.) were required for the course. Most, if not all, of the electrical engineering students in the class had had courses in linear algebra, but not many of us were experts in modern algebra. So for me and some of the other students in the class the race to learn abstract algebra was begun at full throttle.
Professor Kalman's two-course sequence had a profound impact on me; in fact my going into mathematical system theory as an area of research was primarily a result of my interest in the material he taught in the sequence. I found Professor Kalman to be a very effective teacher: his lectures were always well motivated, very clear, and very precise. But what impressed me the most was the tremendous degree of intuition he displayed on the topics he was covering.
The first quarter of Professor Kalman's course dealt with linear time-invariant discrete-time systems given by the dynamical equations

x(t + 1) = Fx(t) + Gu(t)    (1.1)
y(t) = Hx(t) + Ju(t)    (1.2)
E. W. Kamen
[19,20]). Here K[z] is the ring of polynomials in the symbol (or indeterminate)
z with coefficients in the field K. In Kalman's original work, the K[z]-module
structure was used to derive a number of results on realization, system structure,
and control. In more recent work, Bostwick Wyman and his colleagues have
extended the module-theoretic approach to the study of the zeros of multivariable systems (e.g. see [50]). The K[z]-module approach to linear
time-invariant discrete-time systems is considered in the paper by Wyman in
this volume, so we shall not pursue the topic any further here.
Although the problem was not introduced in his course (as I recall), sometime around 1970 Professor Kalman asked the question as to how much of the theory of linear time-invariant discrete-time systems "goes through" if the field K is generalized to a commutative ring R. In other words, in the representation (1.1)-(1.2) now suppose that the matrices F, G, H, J are over a commutative ring R and that the input u(t), state x(t), and output y(t) are column vectors with entries in R. The representation (1.1)-(1.2) with F, G, H, J over R is an example of a system over a ring, that is, a system whose coefficients belong to a ring.
Part of the original motivation for studying systems over a ring was due to the interest in discrete-time systems over the ring Z of integers. A discrete-time system of the form (1.1)-(1.2) with F, G, H, J over Z processes integer-valued inputs u(k) using integer operations. Thus systems over Z can be implemented "exactly" on a digital computer, assuming that magnitudes are less than 10¹² for 12-digit precision. Systems over Z arise in coding theory (Johnston [17]) and in digital signal processing (Kurshan and Gopinath [32]). In [32], the authors consider the generation and detection of single tones using digital filters given by the difference equation

y(k) = a₁y(k − 1) + a₂y(k − 2) + ⋯ + a_n y(k − n) + u(k)

where the a_i are integers. Systems over Z also appear in CCD technology, where new types of programmable CCD filters with integer coefficients have been fabricated (see Hewes et al. [15]).
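As a concrete illustration (a sketch, not taken from the paper), the difference equation above can be simulated exactly in integer arithmetic. The coefficients a₁ = 1, a₂ = −1 below are a hypothetical choice whose characteristic roots lie at e^{±iπ/3}, so a unit pulse produces an exact period-6 integer oscillation, in the spirit of the single-tone generators of Kurshan and Gopinath:

```python
def integer_filter(a, u):
    """Simulate y(k) = a1*y(k-1) + ... + an*y(k-n) + u(k) over the ring Z.

    a : list of integer coefficients [a1, ..., an]
    u : list of integer input samples
    """
    n = len(a)
    y, hist = [], [0] * n          # hist holds [y(k-1), ..., y(k-n)]
    for uk in u:
        yk = sum(ai * yi for ai, yi in zip(a, hist)) + uk
        y.append(yk)
        hist = [yk] + hist[:-1]    # shift the delay line
    return y

# A unit pulse into y(k) = y(k-1) - y(k-2) + u(k): an exact integer
# "single tone" with period 6 and no round-off error whatsoever.
tone = integer_filter([1, -1], [1] + [0] * 11)
```

Because every operation is integer addition and multiplication, the simulation on a digital computer reproduces the system over Z with no quantization effects at all.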
Professor Kalman was especially interested in the problem of realization for systems over rings, which is defined as follows. Given a sequence {A₁, A₂, A₃, …} of p × m matrices over a commutative ring R, when do there exist n × n, n × m, p × n matrices F, G, H over R such that

A_i = HF^{i−1}G,  i = 1, 2, 3, … ?    (1.3)

If we define the transfer matrix by the formal power series

W(z) = A₁z⁻¹ + A₂z⁻² + ⋯,    (1.4)
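As a quick sanity check (the matrices below are hypothetical integer matrices, not from the paper), the realization condition A_i = HF^{i−1}G can be verified directly by generating the Markov parameters of a candidate triple (F, G, H) over the ring Z:

```python
def matmul(A, B):
    """Matrix product over any commutative ring (here, plain integers)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def markov(F, G, H, n):
    """First n Markov parameters A_i = H F^{i-1} G, i = 1..n,
    i.e. the coefficients of the formal power series W(z)."""
    out, FG = [], G
    for _ in range(n):
        out.append(matmul(H, FG))   # A_i = H (F^{i-1} G)
        FG = matmul(F, FG)          # advance to F^i G
    return out

# Hypothetical 2-state single-input single-output system over Z.
F = [[0, 1], [-1, 1]]
G = [[1], [0]]
H = [[1, 0]]
A = markov(F, G, H, 3)
```

Since all arithmetic stays inside Z, the computed parameters A_i are themselves elements of the ring, illustrating why the realization question makes sense over an arbitrary commutative ring.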
the ring R[δ⁽¹⁾, π₁, π₂, …, π_q] is isomorphic to the ring R[σ₁, σ₂, …, σ_{q+1}] of polynomials in q + 1 symbols with coefficients in R.
Now consider a system whose impulse response function matrix W is over the quotient field R(δ⁽¹⁾, π₁, π₂, …, π_q). One can then ask if there exists a realization given by a quadruple (F, G, H, J) of matrices over the quotient field R(π₁, π₂, …, π_q) such that

(2.1)

Since convolution with δ⁽¹⁾ corresponds to taking the generalized derivative, the factorization (2.1) defines the following state model

(2.2)
(2.3)
y(k, r) = Σ_{i=−∞}^{k} Σ_{j=−∞}^{∞} w(k − i, r − j)u(i, j),   k, r ∈ Z    (3.1)
Here u(k, r) is the input array, y(k, r) is the output array, and w(k, r) is the point-spread function. For any k, r ∈ Z, u(k, r), y(k, r), and w(k, r) are elements of the reals R. The summation in (3.1) ends at i = k since we are assuming that w(k, r) = 0 for all k < 0. In other words, we are assuming that the system is half-plane causal. If w(k, r) = 0 for all k < 0 and all r < 0, the system is said to be quarter-plane causal. We are also assuming that w(k, r) and/or u(k, r) is constrained so that the sum on j in (3.1) is well defined. We will make this precise shortly.
Let V denote the set of all real-valued functions defined on Z with support bounded on the left. With pointwise addition and with convolution, V is a commutative ring with identity. We will show that the 2-D system defined by (3.1) can be viewed as a 1-D system whose coefficients belong to the ring V. First, for each integer k, let u_k, y_k, and w_k denote the functions from Z into R defined by u_k(r) = u(k, r), y_k(r) = y(k, r), and w_k(r) = w(k, r). The functions u_k, y_k, and w_k are the kth rows of the corresponding arrays.
Now suppose that for each k ∈ Z, u_k, y_k, and w_k are elements of the ring V; in other words, the rows of the input array, output array, and point-spread function have supports bounded on the left. Then we can write (3.1) as a convolution expression with entries in V given by

y_k = Σ_{i=−∞}^{k} w_{k−i} * u_i    (3.2)
where for each fixed k and i, w_{k−i} * u_i is the convolution of the ith row of the input array u(k, r) with the (k − i)th row of the point-spread function w(k, r).
It should be noted that the representation (3.2) is not unique; for example, we could represent the input and output arrays in terms of columns or rotated columns, rather than rows. Some results on 2-D systems are dependent on which representation is used, whereas some are not.
The class of 2-D systems given by (3.2) includes the class of nonsymmetric half-plane causal 2-D systems. We can also consider a large class of symmetric half-plane causal 2-D systems by assuming that the rows u_k, y_k, and w_k are elements of the convolution ring S of absolutely summable real-valued functions defined on Z.
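To make the identification concrete, here is a small sketch (with hypothetical data, not from the paper) in which rows of the arrays are elements of V, represented as finitely supported sequences starting at index 0, and one output row of (3.2) is evaluated as a sum of products in the ring V:

```python
def conv(a, b):
    """Multiplication in the ring V: 1-D convolution of two finitely
    supported sequences (lists indexed from 0)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def vadd(a, b):
    """Addition in the ring V: pointwise sum (pad the shorter sequence)."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

# Hypothetical rows of the point-spread function w and the input u.
w = [[1, 2], [0, 1]]   # w_0, w_1 as elements of V
u = [[3], [1, 1]]      # u_0, u_1 as elements of V

# (3.2) for k = 1: y_1 = w_1 * u_0 + w_0 * u_1, computed entirely in V.
y1 = vadd(conv(w[1], u[0]), conv(w[0], u[1]))
```

The 2-D input/output behavior is thus reproduced by 1-D ring arithmetic: convolution of rows is multiplication in V, and the sum over i in (3.2) is addition in V.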
With R equal to either V or S, consider the system given by (3.2) with w_k ∈ R. The transfer matrix W(z) of the system is defined to be the formal power series in z⁻¹ with coefficients in R given by

W(z) = Σ_{i=1}^{∞} w_i z⁻ⁱ.    (3.3)

Σ_{i=1}^{∞} W_i z⁻ⁱ,    (3.4)
W(z) = H(zI − F)⁻¹G + J    (3.5)

x_{t+1} = F * x_t + G * u_t    (3.6)
y_t = H * x_t + J * u_t    (3.7)
x(t + 1, r) = Σ_{j=−∞}^{∞} F(r − j)x(t, j) + Σ_{j=−∞}^{∞} G(r − j)u(t, j)    (3.8)

y(t, r) = Σ_{j=−∞}^{∞} H(r − j)x(t, j) + Σ_{j=−∞}^{∞} J(r − j)u(t, j)    (3.9)
Thus the 2-D state representation (3.8)-(3.9) can be studied in terms of the state representation (3.6)-(3.7) defined over the ring R. We should note that there is also a continuous-time version of (3.6)-(3.7) which arises in the study of spatially-distributed continuous-time systems (see [29]). Also, it is obvious that (3.6)-(3.7) can be generalized to the N-dimensional case (for N > 1) by considering a ring of functions from the N-fold Cartesian product of Z into R.
Parameter-Dependent Systems. Linear systems whose coefficients are functions of one or more parameters can be studied in terms of systems over a ring. This ring approach to parameter-dependent systems originated in the work of Byrnes [3, 4, 5]. The definition of the ring setup is given below.
With N equal to a fixed positive integer, let W be a subset of N-dimensional Euclidean space Rᴺ and let C denote the ring of real-valued continuous functions defined on W. Here addition and multiplication are defined pointwise in the usual way. Now a quadruple (F, G, H, J) of n × n, n × m, p × n, p × m matrices over C defines an m-input p-output n-dimensional linear continuous-time or discrete-time system whose coefficients depend continuously on the entries of a parameter vector w ∈ W. In the continuous-time case, the system is given by the state representation over the ring C given by

ẋ(t) = Fx(t) + Gu(t)    (3.10)
y(t) = Hx(t) + Ju(t)    (3.11)

where for each fixed value of t, u(t), x(t), and y(t) are column vectors over C. Evaluating (3.10)-(3.11) at every w ∈ W, we obtain the collection of equations

ẋ(t, w) = F(w)x(t, w) + G(w)u(t, w)    (3.12)
y(t, w) = H(w)x(t, w) + J(w)u(t, w),   w ∈ W    (3.13)
In the discrete-time case the dynamical equations are given by (1.1)-(1.2) with F, G, H, J defined over C.
Note that the input u(t, w) depends on the parameter vector w. Dependency of the input on w may result, for example, from taking u(t, w) to be a feedback signal generated from the state x(t, w) or output y(t, w) of the system.
One can also consider parameter-dependent systems defined over a subring of C, such as the ring R[w] of polynomials in the entries of w with real coefficients, or the ring of rational functions a(w)/b(w) in the entries of w with b(w) nonzero for all w ∈ W. If the parameter set W is a closed bounded set, by the Weierstrass approximation theorem we can approximate any function in C by a polynomial in R[w], with the approximation as close as desired (in terms of the standard sup norm on C). Hence when W is closed and bounded, any system over C can be approximated by a system over R[w].
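A small sketch of the ring viewpoint (the system below is hypothetical, not from the paper): a single-input system whose state matrix depends polynomially on a scalar parameter w, so it is a system over R[w]. The determinant of its reachability matrix turns out to be a nonzero constant, i.e. a unit of R[w], so the system is reachable as a system over the ring, uniformly in the parameter:

```python
def matvec(A, v):
    """Matrix-vector product with entries in any commutative ring."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical parameter-dependent system over the ring R[w]:
def F(w):
    return [[0.0, 1.0], [w, 0.0]]

g = [0.0, 1.0]   # single input column (constant in w)

def reach_det(w):
    """det of the reachability matrix U(w) = [g, F(w)g] for n = 2."""
    Fg = matvec(F(w), g)
    return g[0] * Fg[1] - g[1] * Fg[0]

# det U(w) = -1 for every parameter value: a unit of the ring R[w],
# so U is right invertible over the ring itself, not just pointwise.
dets = [reach_det(w) for w in (-2.0, 0.0, 3.5)]
```

The point of the ring formulation is exactly this uniformity: a feedback computed over R[w] is a single polynomial expression in the parameter, valid for all w ∈ W at once, rather than a separate gain recomputed at each parameter value.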
The system is coefficient assignable if for any b_i ∈ R, i = 0, 1, …, n − 1, there exists an L over R such that

det(zI − F + GL) = zⁿ + Σ_{i=0}^{n−1} b_i zⁱ,

where "det" denotes the determinant. The system is pole assignable if for any c_i ∈ R, i = 1, 2, …, n, there exists an L over R such that

det(zI − F + GL) = (z − c₁)(z − c₂)⋯(z − c_n).
When R is a field, it is well known that pole and coefficient assignability are equivalent to reachability, which in turn is equivalent to right invertibility (or full rank) of the reachability matrix U = [G FG ⋯ Fⁿ⁻¹G]. When R is a ring, reachability is still equivalent to right invertibility of the matrix U, and in the single-input case (m = 1), it is still true that coefficient assignability, pole assignability, and reachability are all equivalent. Further, in the m = 1 case existing expressions (such as Ackermann's formula) for the feedback L which gives a desired characteristic polynomial in the field case are also valid in the ring case.
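As an illustration of the m = 1 ring case (a sketch with hypothetical matrices over Z, not from the paper), Ackermann's formula L = [0 ⋯ 0 1] U⁻¹ p(F) carries over verbatim whenever U is invertible over the ring; here det U = −1, a unit of Z, so the inverse exists over the integers:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical single-input system over the ring Z of integers.
F = [[0, 1], [0, 0]]
G = [[0], [1]]
# Reachability matrix U = [G, FG] = [[0, 1], [1, 0]]: det U = -1 is a
# unit of Z, so U is invertible over the ring and Ackermann applies.
Uinv = [[0, 1], [1, 0]]   # this particular U is its own inverse

def ackermann(c1, c0):
    """Feedback row L over Z with det(zI - F + GL) = z^2 + c1*z + c0."""
    F2 = matmul(F, F)
    # p(F) = F^2 + c1*F + c0*I, the desired polynomial evaluated at F
    pF = [[F2[i][j] + c1 * F[i][j] + c0 * (i == j) for j in range(2)] for i in range(2)]
    last_row = Uinv[1]     # the row [0 1] U^{-1}
    return [sum(last_row[k] * pF[k][j] for k in range(2)) for j in range(2)]

L = ackermann(5, 6)        # target characteristic polynomial z^2 + 5z + 6
```

One can check that with this L the closed-loop matrix F − GL has trace −5 and determinant 6, i.e. characteristic polynomial z² + 5z + 6, with every intermediate quantity remaining an integer.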
The first major work on the control of systems over a ring in the multi-input case was Morse's paper [34], which contains a constructive proof that reachability implies pole assignability when R is the ring K[σ] of polynomials in a symbol σ with coefficients in a field K. (In fact, Morse's construction applies to systems over any principal ideal domain.) In his survey paper [43], Eduardo Sontag proved that reachability is necessary for pole assignability, and he showed that reachability and coefficient assignability are equivalent when R is a semilocal ring. In the paper by Bumby et al. [2], a counterexample was given showing that reachability does not imply pole assignability in general when R = R[σ₁, σ₂]. Later Tannenbaum [48] showed that reachability does not always imply pole assignability when R is any polynomial ring in two or more symbols with coefficients in a field. In more recent work, an equivalence has been established between the pole assignability property for reachable systems (F, G, H) over a ring R and the property that the image of G contains a unimodular element (or a rank-one summand) for every reachable system over R (see Sharma [42], Naude and Naude [35], and Sontag [47]).
The problem of pole or coefficient assignability can be reduced to the single-input (m = 1) case by using the notion of feedback cyclizability: a system (F, G, H) over R is feedback cyclizable if there exist an m-element column vector b over R and an m × n matrix Q over R such that the pair (F − GQ, Gb) is reachable; that is, the n × n matrix U(b, Q) = [Gb (F − GQ)Gb ⋯ (F − GQ)ⁿ⁻¹Gb] is invertible over R. If U(b, Q) is invertible, using results for the m = 1 case, we can compute a feedback row vector l over R such that F − GQ − Gbl has the desired characteristic polynomial. Then with L = Q + bl, F − GL has the desired characteristic polynomial.
When R is a field, it was Heymann [16] who showed that reachability and feedback cyclizability are equivalent. When R is a ring, it is still true that feedback cyclizability implies reachability; however, the counterexamples of Sontag [43] show that in general reachability does not imply feedback cyclizability. The following necessary and sufficient condition for feedback cyclizability was given in [26]: (F, G, H) is feedback cyclizable if and only if there exist m-element column vectors u_i over R for i = 0, 1, …, n − 1 such that the elements g_i, i = 0, 1, …, n − 1, generate Rⁿ, where g₀ = Gu₀ and g_i = Fg_{i−1} + Gu_i for i = 1, 2, …, n − 1. Here Rⁿ is the R-module of all n-element column vectors over R. For results on feedback cyclizability when R is a principal ideal domain, see Schmale [40, 41].
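The generating condition can be checked directly. The sketch below (a hypothetical two-input system over Z, not an example from [26]) computes g₀ = Gu₀ and g₁ = Fg₀ + Gu₁ for one choice of u₀, u₁ and verifies that g₀, g₁ generate Z² by checking that the matrix [g₀ g₁] has determinant ±1, i.e. is unimodular over the integers:

```python
def matvec(A, v):
    """Matrix-vector product with entries in a commutative ring."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical two-input reachable system over the ring Z.
F = [[1, 0], [0, 2]]
G = [[1, 0], [0, 1]]

u0, u1 = [1, 1], [0, 0]      # chosen m-element column vectors
g0 = matvec(G, u0)           # g0 = G u0
g1 = [a + b for a, b in zip(matvec(F, g0), matvec(G, u1))]  # g1 = F g0 + G u1

# g0, g1 generate Z^2 iff det [g0 g1] is a unit (+1 or -1) of Z.
det = g0[0] * g1[1] - g0[1] * g1[0]
```

Here det = 1, so the condition holds and the system is feedback cyclizable; over a field, any nonzero determinant would do, which is exactly where the ring and field cases part ways.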
In the work of Emre and Khargonekar [11], it is shown that for any reachable system (F, G, H) over a ring R, coefficient assignability can be achieved by using a dynamic feedback system (A, B, C, D) defined over R and with the dimension q of the feedback system bounded above by n². In the discrete-time interpretation, the feedback control signal is given by

u(t) = Cg(t) + Dx(t),    (4.1)

where x(t) is the state of the given system and g(t) is the state of the feedback system with state equation g(t + 1) = Ag(t) + Bx(t). With the feedback control
(4.1), the closed-loop state matrix on the composite state (x(t), g(t)) is

[ F + GD   GC ]
[ B        A  ]    (4.2)
(4.3)
From the results in [10], the Bezout identity (4.3) implies that there is a dynamic state feedback control system such that the characteristic polynomial of the resulting closed-loop system belongs to S.
Existing results on stabilization based on the Bezout identity (4.3) are not constructive in general, due to the difficulty in determining the existence of matrices X(z) and Y(z) satisfying (4.3). When R is a C*-algebra, stabilizability can be checked using local criteria (based on the Gelfand transform), and stabilizing feedback gain matrices can be computed from Riccati equations (as in the field case). For example, the rings S and C defined in Sect. 3 admit the structure of a C*-algebra, and thus two-dimensional systems (such as spatially-distributed systems) and parameter-dependent systems can be stabilized using a Riccati-equation type approach. A key feature of the C*-framework is the existence of a star operation on the algebra, in terms of which one can define the notion of positivity. For results on the stabilization of systems defined over algebras, see Byrnes [4], Green and Kamen [14], and Kamen [29].
An interesting open area of research is the possible extension of the Riccati-equation framework to systems over rings which are not closed under an appropriate star operation. For example, consider the class of continuous-time systems with commensurate time delays defined over the polynomial operator ring R[d]. The appropriate star operation appears to be d* = d⁻¹, but as noted in Sect. 2, d⁻¹ corresponds to the ideal predictor and thus is noncausal. An open question is whether or not the Riccati-equation setup can be modified to yield causal stabilizing solutions for systems over R[d].
5 Concluding Remarks
One of the objectives of this paper is to show the influence and role played by R.E. Kalman in the establishment of the field now called systems over rings. The paper is not intended to be a complete survey of systems over rings; in fact, there are numerous papers on the subject that are not mentioned here due to space limitations. Also omitted is the growing body of literature on the system-over-ring approach to linear time-varying systems.
The theory of systems over rings is sufficiently well developed (and I believe
of sufficient interest) to justify the generation of tutorial treatments in the form
of textbooks. A book emphasizing the mathematical aspects of the subject has
appeared [1]; however, as of yet there are no textbooks focusing on the
engineering side of the theory. This is a task that Yves Rouchaleau and I set
out to do several years ago, but we have not been able to finish the job. Perhaps
this paper will stimulate us (or someone else) to "fill the gap."
References
[1] J.W. Brewer, J.W. Bunce, and F.S. Van Vleck, Linear Systems over Commutative Rings, Marcel
Dekker, New York, 1986
[2] R. Bumby, E.D. Sontag, H.J. Sussmann, and W. Vasconcelos, "Remarks on the pole-shifting problem over rings," Journal of Pure and Applied Algebra, Vol 20, pp 113-127, 1981
[3] C.I. Byrnes, "On the stabilizability of linear control systems depending on parameters," in Proc 18th IEEE Conf Decision and Control, Ft Lauderdale, pp 233-236, 1979
[4] C.I. Byrnes, "Realization theory and quadratic optimal controllers for systems defined over Banach and Frechet algebras," in Proc 19th Conf Decision and Control, Albuquerque, New Mexico, pp 247-251, 1980
[5] C.I. Byrnes, "Algebraic and geometric aspects of the analysis of feedback systems," in Geometrical Methods for the Theory of Linear Systems, C.I. Byrnes and C. Martin (Eds.), D. Reidel, Dordrecht, pp 85-124, 1980
[6] P.J. Cahen and J.L. Chabert, "Elements quasi-entiers et extensions de Fatou," J Algebra, Vol 36, pp 185-192, 1975
[7] R. Eising, "Realization and stabilization of 2-D systems," IEEE Trans Automatic Control, Vol AC-23, pp 793-799, 1978
[8] R. Eising and M.L.J. Hautus, "Realization algorithms for a system over a PID," Math Systems Theory, Vol 14, pp 353-366, 1981
[9] E. Emre, "On necessary and sufficient conditions for regulation of linear systems over rings," SIAM J Control and Optimization, Vol 20, pp 155-160, 1982
[10] E. Emre, "Regulation of linear systems over rings by dynamic output feedback," Systems and Control Letters, Vol 3, pp 57-62, 1983
[11] E. Emre and P.P. Khargonekar, "Regulation of split linear systems over rings: Coefficient assignment and observers," IEEE Trans Automatic Control, Vol AC-27, pp 104-113, 1982
[12] P. Fatou, "Séries trigonométriques et séries de Taylor," Acta Math, Vol 30, pp 335-400, 1905
[13] M. Fliess, "Matrices de Hankel," J Math Pures et Appl, Vol 53, pp 197-224, 1974
[14] W.L. Green and E.W. Kamen, "Stabilizability of linear systems over a commutative normed algebra with applications to spatially-distributed and parameter-dependent systems," SIAM J Control and Optimization, Vol 23, pp 1-18, 1985
[15] C.R. Hewes, R.W. Brodersen, and D.D. Buss, "Applications of CCD and switched capacitor filter technology," Proc IEEE, Vol 67, pp 1403-1415, 1979
[16] M. Heymann, "Comments on pole assignment in multi-input controllable linear systems," IEEE Trans Automatic Control, Vol AC-13, pp 748-749, 1968
[17] R. Johnston, "Linear systems over various rings," Ph.D. Dissertation, M.I.T., 1973
[18] R.E. Kalman, "Algebraic structure of linear dynamical systems. I. The module of Σ," Proc National Academy of Sciences (USA), Vol 54, pp 1503-1508, 1965
[19] R.E. Kalman, "Lectures on controllability and observability," Centro Internazionale Matematico Estivo, Bologna, 1968
[20] R.E. Kalman, P.L. Falb, and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, New York, 1969
[21] R.E. Kalman and M.L.J. Hautus, "Realization of continuous-time linear dynamical systems: Rigorous theory in the style of Schwartz," in Ordinary Differential Equations, 1971 NRL-MRC Conf, L. Weiss (Ed.), pp 151-164, Academic Press, New York, 1972
[22] E.W. Kamen, "A distributional-module theoretic representation of linear dynamical continuous-time systems," Ph.D. Dissertation, Stanford University, 1971
[23] E.W. Kamen, "An algebraic realization theory for linear continuous-time systems," in Proc Sixth Hawaii Int Conf on Systems Science, pp 32-34, 1973
[24] E.W. Kamen, "On an algebraic theory of systems defined by convolution operators," Math Systems Theory, Vol 9, pp 57-74, 1975
[25] E.W. Kamen, "Module structure of infinite-dimensional systems with applications to controllability," SIAM J Control and Optimization, Vol 14, pp 389-408, 1976
[26] E.W. Kamen, "Lectures on algebraic system theory: Linear systems over rings," NASA Contractor Report 3016, 1978
[27] E.W. Kamen, "An operator theory of linear functional differential equations," J Differential Equations, Vol 27, pp 274-297, 1978
[28] E.W. Kamen, "Asymptotic stability of linear shift-invariant two-dimensional digital filters," IEEE Trans Circuits and Systems, Vol CAS-27, pp 1234-1240, 1980
[29] E.W. Kamen, "Stabilization of linear spatially-distributed continuous-time and discrete-time systems," in Multidimensional Systems Theory, N.K. Bose (Ed.), pp 101-146, D. Reidel, Dordrecht, 1985
[30] P.P. Khargonekar, "On matrix fraction representations for linear systems over commutative rings," SIAM J Control and Optimization, Vol 20, pp 172-197, 1982
[31] P.P. Khargonekar and E.D. Sontag, "On the relation between stable matrix fraction factorizations and regulable realizations of linear systems over rings," IEEE Trans Automatic Control, Vol AC-27, pp 627-638, 1982
[32] R.P. Kurshan and B. Gopinath, "Digital single-tone generator-detectors," Bell System Technical Journal, Vol 55, pp 469-476, 1976
[33] E.B. Lee and A.W. Olbrot, "On reachability over polynomial rings and a related genericity problem," Int J Systems Science, Vol 13, pp 109-113, 1982
[34] A.S. Morse, "Ring models for delay-differential systems," Automatica, Vol 12, pp 529-531, 1976
[35] C.G. Naude and G. Naude, "Comments on pole assignability over rings," Systems and Control Letters, Vol 6, pp 113-115, 1985
[36] R.W. Newcomb, "On the realization of multivariable transfer functions," Report EERL 58, Cornell Univ., Ithaca, NY, 1966
[37] Y. Rouchaleau, "Linear, discrete-time, finite-dimensional dynamical systems over some classes of commutative rings," Ph.D. Dissertation, Stanford University, 1972
[38] Y. Rouchaleau and B.F. Wyman, "Linear dynamical systems over integral domains," J Computer and System Sciences, Vol 9, pp 129-142, 1974
[39] Y. Rouchaleau, B.F. Wyman, and R.E. Kalman, "Algebraic structure of linear dynamical systems. III. Realization theory over a commutative ring," Proc National Academy of Sciences, Vol 69, pp 3404-3406, 1972
[40] W. Schmale, "Feedback cyclization over certain principal ideal domains," Int J Control, Vol 48, pp 89-96, 1988
[41] W. Schmale, "Three-dimensional feedback cyclization over C[y]," Systems and Control Letters, Vol 12, pp 327-330, 1989
[42] P.K. Sharma, "Some results on pole-placement and reachability," Systems and Control Letters, Vol 6, pp 325-328, 1986
[43] E.D. Sontag, "Linear systems over commutative rings: A survey," Ricerche di Automatica, Vol 7, pp 1-34, 1976
[44] E.D. Sontag, "On linear systems and noncommutative rings," Math Systems Theory, Vol 9, pp 327-344, 1976
[45] E.D. Sontag, "On split realizations of response maps over rings," Information and Control, Vol 37, pp 23-33, 1978
[46] E.D. Sontag, "On first-order equations for multidimensional filters," IEEE Trans Acoustics, Speech, and Signal Proc, Vol ASSP-26, pp 480-482, 1978
[47] E.D. Sontag, "Comments on 'Some results on pole-placement and reachability'," Systems and Control Letters, Vol 8, pp 79-83, 1986
[48] A. Tannenbaum, "Polynomial rings over arbitrary fields in two or more variables are not pole assignable," Systems and Control Letters, Vol 2, pp 222-224, 1982
[49] N.S. Williams and V. Zakian, "A ring of delay operators with applications to delay-differential systems," SIAM J Control and Optimization, Vol 15, pp 247-255, 1977
[50] B.F. Wyman, M.K. Sain, G. Conte, and A. Perdon, "On the zeros and poles of a transfer function," Linear Algebra and Its Applications, Vol 122, pp 123-144, 1989
[51] Y. Yamamoto, "Module structure of constant linear systems and its application to controllability," J Math Analysis and Appl, Vol 83, pp 411-437, 1981
[52] D.C. Youla, "The synthesis of networks containing lumped and distributed elements," in Proc Symp on Generalized Networks, Vol XVI, Polytechnic Inst of Brooklyn, pp 289-343, 1966
Chapter 5
In this paper we discuss an invariant-theoretic construction of the Kalman space, which is a universal parameter space for linear time-invariant dynamical systems of fixed state space and input/output dimensions.
1 Introduction
Families of dynamical systems appear in all aspects of systems and control theory. Indeed, the essential need for feedback in control systems stems from the fact that the plant model is only an approximation, and so we must in reality design for a whole family of plants. Of course, the appropriate notion of family depends upon the type of problem in which we are interested. For example, in robust control, families are modelled by certain natural norm-bounded perturbations of a given nominal plant. This is a local analytic point of view.
In the early 1970's, Kalman [10] undertook a global algebraic approach to the problem of system parametrization when he constructed a universal parameter space of linear time-invariant systems of fixed state space and input/output dimensions. In doing this, he initiated a powerful algebro-geometric framework for studying families of linear time-invariant dynamical systems. This approach has had major ramifications in algebraic systems theory and basically opened up a new branch of study. Indeed, a whole conference at Harvard in 1979 was dedicated just to this research area. Even today, almost two decades later, a number of prominent researchers are continuing along this research stream.
Besides the introduction of algebraic geometry into control, Kalman's paper [10] (see also [13]) illuminated the deep connection between invariant theory and a number of control problems. Given the introduction of invariant theory and algebraic geometry into control, it was only a small step to bring geometric invariant theory into the picture. Indeed, geometric invariant theory may be regarded as the natural tool for carrying out the quotient constructions discussed below.
* This research was supported in part by grants from the NSF (ECS-8704047), NSF (DMS-8811084),
the Air Force Office of Scientific Research AFOSR-88-0020, AFOSR-90-0024, U.S. Army Research
Office, and the ONR through Honeywell Systems and Research Center, Minneapolis.
A. Tannenbaum
2 On Algebraic Groups
We will assume throughout this paper that the reader has a basic knowledge of algebraic geometry. Good references for this are [4], [9], [18]. In this section, we will very briefly review some basic notions in the theory of algebraic groups following [3], [9], [18]. By a morphism of algebraic varieties we will mean an analytic map which is locally rational in the coordinates of the given varieties.

Definition 1. Let G be an algebraic variety which is endowed with a group structure. Let μ: G × G → G be defined by μ(x, y) := xy, and i: G → G be given by i(x) := x⁻¹. If μ and i are morphisms, then we say that G is an algebraic group.

Many of the most common Lie groups are in fact algebraic groups. Below all of our varieties will be assumed to be complex.
Examples 1.
(1) This is the most important example of an algebraic group. Let GL_n denote the group of invertible n × n complex matrices. Since matrix multiplication and inversion are given by rational formulas in the matrix entries, GL_n is an algebraic group.
3 Remarks on Systems

In this paper, we will assume that all of our systems are defined over the complex numbers C, even though most of the constructions go over in the more physically realistic case of systems defined over the real numbers R.

As is standard, we will identify the linear time-invariant system given by the differential equation

ẋ(t) = Fx(t) + Gu(t)

with the pair of matrices (F, G) ∈ C^{n×n} × C^{n×m}. This is precisely the set of linear time-invariant systems of state space dimension n and input dimension m. For simplicity we have suppressed the output part of the system. (See however Sect. 9.) We will also identify at certain times in what follows the space C^{n×n} × C^{n×m} with C^{n²+nm}.
Now it is well known that the general linear group GL_n acts on C^{n²+nm} by change of basis in the state space. Namely, for (F, G) ∈ C^{n²+nm} and g ∈ GL_n, we define the action g·(F, G) := (gFg⁻¹, gG). From an input/output point of view, the systems (F, G) and (gFg⁻¹, gG) are identical [11]. Accordingly, we define the equivalence relation on C^{n²+nm} by (F, G) ~ (F', G') if there exists g ∈ GL_n such that

(F', G') = (gFg⁻¹, gG).
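To make the change-of-basis action concrete, here is a small numerical sketch (our own illustration, not from the original text). It checks that the reachability matrix R(F, G) = [G, FG, ..., F^{n−1}G] transforms equivariantly, R(gFg⁻¹, gG) = g R(F, G), so that in particular complete reachability is a property of the whole GL_n-orbit:

```python
import numpy as np

def reach(F, G):
    # reachability matrix R(F, G) = [G, FG, ..., F^{n-1}G]
    n = F.shape[0]
    blocks, B = [], G
    for _ in range(n):
        blocks.append(B)
        B = F @ B
    return np.hstack(blocks)

rng = np.random.default_rng(1)
n, m = 3, 2
F = rng.standard_normal((n, n))
G = rng.standard_normal((n, m))
g = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably invertible

Fg, Gg = g @ F @ np.linalg.inv(g), g @ G          # the action g.(F, G)
# R(gFg^{-1}, gG) = g R(F, G), hence equivalent pairs have equal rank
print(np.allclose(reach(Fg, Gg), g @ reach(F, G)))
```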
We will see below that the Kalman space is a quotient in some suitable sense. Indeed we have the following:

Definition 3. Let G be an algebraic group acting on a variety X. A quotient of X by G is a pair (Y, α) where Y is a variety and α: X → Y is a morphism such that

(i) α is constant on the orbits;
(ii) given a variety Y' and a morphism α': X → Y' constant on the orbits, there exists a unique morphism φ: Y → Y' such that α' = φ∘α.

Note that if a quotient exists, then it is unique up to isomorphism.
Example 3. Let C^{n×n} denote the space of n × n complex matrices. Consider the polynomial map α: C^{n×n} → C^n defined by sending an n × n matrix to its characteristic coefficients, i.e. the coefficients of its characteristic polynomial. One can prove (e.g. using Richardson's theorem [18]) that (C^n, α) is a quotient of C^{n×n} relative to the conjugation action of the group GL_n.
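As a quick numerical sanity check (our illustration, not part of the original text), the map α is indeed constant on conjugation orbits; numpy's poly routine returns exactly the characteristic coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
g = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably invertible

def alpha(M):
    # characteristic coefficients [1, c_1, ..., c_n] of M
    return np.poly(M)

conj = g @ A @ np.linalg.inv(g)
# alpha is constant on the conjugation orbit of A
print(np.allclose(alpha(A), alpha(conj)))
```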
Remark 1. We will assume that the reader is familiar with the basic notions concerning sheaf theory. Recall then that to any algebraic variety X, we may associate the structure sheaf O_X, which may be characterized as follows: for U ⊂ X open, we let O_X(U) be the ring of regular functions defined on U. Note that a function is regular if it is analytic and may be written as the ratio of two polynomials on U.

If the variety X is affine, then O_X(X) defines the coordinate ring R of X. Then X (via the Hilbert Nullstellensatz) may be identified with the set of maximal ideals of R.
In practice, the notion of quotient is a bit too weak; indeed, a quotient may not be an orbit space. Referring to Example 3 above, for the quotient α: C^{n×n} → C^n, we see that each fiber contains a unique closed orbit, consisting of semi-simple matrices, and a unique relatively open orbit, consisting of matrices with a cyclic vector. These coincide if all the eigenvalues are distinct. Hence C^n cannot in any reasonable sense be regarded as parametrizing the orbits of the action of GL_n on C^{n×n}. We therefore need a definition which combines the geometric notion of orbit space with the invariant-theoretic notion of quotient, in short a "geometric quotient." As we alluded to above, the key property lies in the closedness of the orbits. This leads to the following fundamental definition:
Definition 4. Let G be an algebraic group acting on a variety X. A geometric quotient is a pair (Y, φ) consisting of a variety Y and a morphism φ: X → Y such that

(i) for every y ∈ Y, φ⁻¹(y) is an orbit;
(ii) for each invariant open subset U ⊂ X, there exists an open subset U' ⊂ Y with φ⁻¹(U') = U;
(iii) for every open subset U' ⊂ Y, φ*: O(U') → O(φ⁻¹(U')) defines an isomorphism of O(U') onto the ring of invariant functions O(φ⁻¹(U'))^G of φ⁻¹(U').
Theorem 2 (Mumford). Let G be a linearly reductive algebraic group acting on an affine variety X. Then a quotient (Y, φ) of X by G exists.
Proof. Since the proof is so nice, we give a short sketch here. For full details see [14], [3]. First, if A denotes the coordinate ring of X, then A is a finitely generated C-algebra. Let A^G denote the ring of invariant functions. Then since G is linearly reductive, A^G is also finitely generated; see [14], [3]. (This is really the key point of the proof.) Hence A^G determines an affine variety Y, and the inclusion A^G ⊂ A determines a morphism φ: X → Y. It is then elementary to show that (Y, φ) is a quotient. □
Recall that the pair (F, G) is completely reachable if and only if the reachability matrix

R(F, G) := [G, FG, ..., F^{n−1}G]

has rank n; let V_nm ⊂ C^{n²+nm} denote the set of completely reachable pairs. Moreover, clearly V_nm is invariant under the action of GL_n. What we will show now is that GL_n acts on the space of completely reachable systems V_nm with closed orbits.
Theorem 3. Let (F, G) ∈ C^{n²+nm}. Then dim stab(F, G) = 0 if and only if (F, G) is completely reachable.
Proof. The proof we give here works over C or R. For a proof over an arbitrary field see [18], [19]. First we may consider V := C^n as a finitely generated C[x]-module via the action

xv := F(v).

Let V' ⊆ V denote the C[x]-submodule generated by the columns of G, and let 𝒜 denote the linear space

Hom_{C[x]}(V/V', V).

Certainly the dimension of the linear space 𝒜 is equal to the dimension of stab(F, G), and clearly dim 𝒜 = 0 if and only if V = V'. But V = V' if and only if (F, G) is completely reachable. □
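The criterion dim stab(F, G) = 0 if and only if (F, G) is completely reachable can be checked numerically. The sketch below (our own illustration, not from the text) computes the dimension of the Lie algebra {X : XF = FX, XG = 0} of the stabilizer by vectorizing the two linear constraints:

```python
import numpy as np

def stab_dim(F, G):
    # dimension of the Lie algebra of stab(F, G):
    # solutions X of XF - FX = 0 and XG = 0, found via vectorization,
    # using vec(XF) = (F^T kron I) vec(X), vec(FX) = (I kron F) vec(X)
    n = F.shape[0]
    I = np.eye(n)
    M = np.vstack([np.kron(F.T, I) - np.kron(I, F),
                   np.kron(G.T, I)])
    return n * n - np.linalg.matrix_rank(M)

n = 3
F = np.diag([1.0, 2.0, 3.0])
G = np.ones((n, 1))                    # (F, G) completely reachable
print(stab_dim(F, G))                  # 0: trivial stabilizer

G0 = np.array([[1.0], [0.0], [0.0]])   # (F, G0) is not reachable
print(stab_dim(F, G0))                 # 2: positive-dimensional stabilizer
```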
We can now prove the following key result:

Corollary 1. GL_n acts on V_nm with closed orbits.

Proof. Indeed, from Theorem 3 we have that all the orbits of GL_n in V_nm have constant (maximal) dimension. Hence by the closed orbit lemma (Theorem 1), these orbits must be closed. □
Remark 2. Actually one can prove even more than Corollary 1, namely that V_nm is precisely the set of pre-stable points of C^{n²+nm} under the GL_n-action. For details see [18].
Then since f_IJ is invariant, it descends to a function on the quotient V_I. Define

V_IJ := V_I \ {(F, G) : f_IJ(F, G) = 0}.

One checks that

π_I⁻¹(V_IJ) = U_I ∩ U_J = π_J⁻¹(V_JI).

Thus the sets {V_I}_{I nice} patch together to form a variety M_nm. This is precisely the Kalman space [10].
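The charts behind this patching are easy to explore numerically. The sketch below (our own illustration; for simplicity it tests all n-column selections of R(F, G) and ignores the combinatorial "niceness" condition on selections) lists the selections I with det R(F, G)_I ≠ 0 for a random reachable pair, each of which places the pair in a chart U_I:

```python
import numpy as np
from itertools import combinations

def reach(F, G):
    # reachability matrix R(F, G) = [G, FG, ..., F^{n-1}G]
    n = F.shape[0]
    blocks, B = [], G
    for _ in range(n):
        blocks.append(B)
        B = F @ B
    return np.hstack(blocks)

rng = np.random.default_rng(2)
n, m = 3, 2
F = rng.standard_normal((n, n))
G = rng.standard_normal((n, m))
R = reach(F, G)                               # n x nm matrix

# selections I of n columns of R with det R(F,G)_I != 0; a completely
# reachable pair lies in at least one such chart
charts = [I for I in combinations(range(n * m), n)
          if abs(np.linalg.det(R[:, I])) > 1e-9]
print(len(charts) > 0)
```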
Now it is very easy to get a projective embedding of M_nm. Indeed, the set of functions {f_IJ} forms a Čech 1-cocycle, and hence determines a line bundle L on M_nm. It can be proven [18] that L is ample, i.e. for some N > 0, the sections of L^N determine an embedding of M_nm into some projective space. This means that M_nm is quasi-projective, i.e. is isomorphic to an open subset of a projective variety.

Recall that GL_n = SL_n · C*.
As above, we identify C^{n×n} × C^{n×m} with C^{n²+nm} =: X. Let (Y, π) be the quotient of X by SL_n. This exists by Mumford's theorem. If X' denotes the subset of X fixed by the action of C*, then it is easy to see that X' ≅ C^{n²}. Let Y' := π(X'). From Example 3 above, we have that Y' ≅ C^n.

Now one can check that Y is a C*-variety. Let O_{C*}(y) denote the orbit of y ∈ Y under C*. Then it is easy to show that the closure of O_{C*}(y) meets Y' in a single point (the "vertex" of a cone). Recall that for two varieties V₁ and V₂, a morphism φ: V₁ → V₂ is projective if it may be factored as a closed immersion of V₁ into P^N × V₂ followed by the natural projection of P^N × V₂ onto V₂.
We can now state the following lemma from [18]:

Lemma 1. Set Ỹ := Y \ Y'. Then a geometric quotient Ỹ/C* for Ỹ exists relative to C*, and further there exists a natural projective morphism φ: Ỹ/C* → Y'.
Note that

C^r \ {0}/C* ≅ P^{r−1}.

One can then show that

Ỹ/C* = M_nm,

and we have a commutative diagram relating Ỹ, M_nm, and Y'.
The following theorem follows immediately from our above discussion:

Theorem 5. There exists a projective morphism ψ: M_nm → C^n.
π ∘ σ = identity. For m = 1, we know that algebraic canonical forms exist, namely the control canonical form.
that is,

Hom(S, Z) ≅ ℱ(S) for all S ∈ Var.

This means that ℱ and h_Z are equivalent functors, or that Z represents the functor ℱ. Such a Z has been called by Mumford a fine moduli space. The point of the present section is that the Kalman space M_nm is this fine moduli space Z.
We first should note that one has a natural morphism of functors ℱ → h_{M_nm}. More precisely, given S ∈ Var and (𝒱, F, g₁, ..., g_m) ∈ ℱ(S), for every s ∈ S (after choosing a basis in 𝒱(s)) we get a completely reachable pair (F(s), G(s)). This pair is unique modulo choice of basis, and thus determines a point of M_nm. Hence (𝒱, F, g₁, ..., g_m) determines a morphism S → M_nm, and so we have the required morphism of functors ℱ → h_{M_nm}.

We can now state:

Theorem 6. M_nm is a universal parameter space, i.e. ℱ(S) ≅ Hom(S, M_nm) for all S ∈ Var.
Proof. First, for fixed S ∈ Var, we have that for U ⊂ S open, the mapping U → ℱ(U) defines a sheaf of sets. For each nice selection I, define a subfunctor ℱ_I of ℱ by letting, for each S ∈ Var, ℱ_I(S) be the subset of ℱ(S) consisting of (m+2)-tuples (𝒱, F, g₁, ..., g_m) such that for each point s ∈ S, F(s): 𝒱(s) → 𝒱(s), g₁(s), ..., g_m(s) ∈ 𝒱(s) have the property that for some (and hence every) choice of basis in 𝒱(s), the corresponding completely reachable pair (F(s), G(s)) satisfies the condition det R(F(s), G(s))_I ≠ 0.

Now one can show easily that the ℱ_I for each I nice are open subfunctors of ℱ; see [18]. Moreover, we claim that the functors ℱ_I are representable. Indeed, to verify this, we will prove that if (V_I, π_I) is the geometric quotient of U_I, then ℱ_I ≅ h_{V_I}. First note that

U_I ≅ GL_n × C^{nm}.

We define an endomorphism

F_I: V_I × C^n → V_I × C^n

by

F_I(x, y) := (x, F(x)y),

and morphisms g_{Ii}: V_I → V_I × C^n by g_{Ii}(x) := (x, g_i(x)) for i = 1, ..., m. It is easy to see then that (V_I × C^n, F_I, g_{I1}, ..., g_{Im}) is a universal family, and so V_I represents ℱ_I as claimed.
Next we note that the {ℱ_I}_{I nice} form an open covering of ℱ. We are now almost done. Indeed, let

ℱ_IJ = ℱ_I ×_ℱ ℱ_J.

Then identifying the functor ℱ_I with the variety V_I which represents it, we have that the map ℱ_IJ → ℱ_I is an open immersion of varieties. From the fact that ℱ is covered by the open representable subfunctors ℱ_I, the ℱ_I patch along the ℱ_IJ to form a variety, and since ℱ is a Zariski sheaf (i.e. U → ℱ(U) is a sheaf for U ⊂ S open), this variety must represent ℱ. Finally, since M_nm and ℱ are patched together in the same way, we must have that M_nm represents ℱ, i.e., M_nm is the universal parameter space. □
Remark 4.
(i) In [7], Hazewinkel shows that M_nm may be constructed over any commutative ring with identity (e.g. the ring of integers Z) using certain local arguments. The above argument immediately implies this fact as well. See [18].
(ii) From the proof of Theorem 6 we also deduce the existence of a universal family of completely reachable pairs which corresponds to the identity morphism M_nm → M_nm under the isomorphism Hom(M_nm, M_nm) ≅ ℱ(M_nm).
9 Canonical Systems

After the initial work done on the Kalman space M_nm, a number of researchers (see [5-7], [2], [18] and the references therein) considered the problem of the quotient space of canonical systems. Since this makes strong contact with another of Kalman's key contributions, namely realization theory, we would like to discuss this work briefly here as well. In this case, we will take into account the output part of the system, i.e. we consider systems of the form

ẋ(t) = Fx(t) + Gu(t)
y(t) = Hx(t)

with (F, G, H) ∈ C^{n×n} × C^{n×m} × C^{p×n}. Such a system is called canonical if it is completely reachable and completely observable, i.e. if rank R(F, G) = n and rank [H; HF; ...; HF^{n−1}] = n.
Notice that change of basis in the state space induces the following GL_n action on C^{n×n} × C^{n×m} × C^{p×n}:

g·(F, G, H) := (gFg⁻¹, gG, Hg⁻¹)

for g ∈ GL_n. Clearly then the set V^c_{n,m,p} ⊂ C^{n×n} × C^{n×m} × C^{p×n} of canonical systems is a Zariski open subset which is invariant under this GL_n-action.

Now based on our arguments from Sect. 7, we may show that there exists a geometric quotient (M^c_{n,m,p}, π_{n,m,p}) for V^c_{n,m,p} with respect to the above GL_n-action. The orbit space M^c_{n,m,p} will be smooth and quasi-projective. As the following result due to Hazewinkel [6] shows, it is a beautiful fact from realization theory that M^c_{n,m,p} is even quasi-affine:
Theorem 7. M^c_{n,m,p} is a quasi-affine variety.
Proof. We will indicate the basic idea of the proof, just to show where the realization theory comes in. See [6], [2], [18] for complete details.

Define a morphism ψ: C^{n×n} × C^{n×m} × C^{p×n} → C^{(n+1)²mp} by sending (F, G, H) to the block Hankel matrix of Markov parameters,

ψ(F, G, H) := [HF^{i+j−2}G], 1 ≤ i, j ≤ n+1,

i.e. the (n+1) × (n+1) block matrix whose first block row is HG, HFG, ..., HF^nG and whose last block row is HF^nG, HF^{n+1}G, ..., HF^{2n}G.
Since ψ is constant on GL_n-orbits, it descends to a morphism φ on the quotient; set

ℋ := φ(M^c_{n,m,p}).

Note that ℋ is precisely the space of Hankel matrices of rank n, and so ℋ is clearly quasi-affine. We have therefore established the existence of a bijective morphism φ: M^c_{n,m,p} → ℋ. The fact that φ is bijective is not enough to imply that it is an isomorphism of varieties. To do this requires some more powerful machinery of algebraic geometry [6], [2], [18]. One argument can be based on the facts that ℋ is smooth and that φ can easily be shown to be an immersion of M^c_{n,m,p}, from which it follows that φ is an isomorphism. □
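Numerically, the Hankel map ψ and the rank condition are easy to illustrate (our own sketch, not from the text; the canonical system below is an arbitrary example with distinct stable modes):

```python
import numpy as np

def hankel(F, G, H):
    # block Hankel matrix [H F^{i+j} G], 0 <= i, j <= n, of Markov parameters
    n = F.shape[0]
    markov = [H @ np.linalg.matrix_power(F, k) @ G for k in range(2 * n + 1)]
    return np.block([[markov[i + j] for j in range(n + 1)]
                     for i in range(n + 1)])

n = 3
F = np.diag([-1.0, -2.0, -3.0])
G = np.ones((n, 1))
H = np.ones((1, n))     # (F, G, H) is reachable and observable, i.e. canonical

# for a canonical system of state dimension n, the Hankel matrix has rank n
print(np.linalg.matrix_rank(hankel(F, G, H)))   # 3
```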
Remark 5.
(i) The above result of Hazewinkel is interesting from the purely realization-theoretic point of view. Indeed, the above proof shows that realizations are continuous in their parameters when restricted to canonical systems.
(ii) As we mentioned above, it is possible to work out most of the above constructions over R, and therefore define a real version of M^c_{n,1,1} which we will call M^c_{n,1,1}(R). The space M^c_{n,1,1}(R) has undergone a particularly rigorous topological study [1], [17], [18] (and many others). Indeed, M^c_{n,1,1}(R) may be identified with the space (in Brockett's notation [1])

Rat(n) := {strictly proper real rational functions p/q with p, q coprime and deg q = n}.
10 Conclusions

It has now been almost twenty years since Kalman initiated the geometric approach to families of systems outlined above. Of course, even today the concept of family remains fundamental in systems and control, and is really the underlying object of study in both adaptive and robust control. Especially relevant is adaptive control with its emphasis on the notion of identification, since much of the interest in the topology and geometry of the moduli spaces of systems was precisely for identification-theoretic reasons. However, it is important to note that despite the use of very high-powered techniques from topology, complex analysis, Lie groups, algebras, differential and algebraic geometry, the precise global structure of these universal families and parameter spaces still remains an open problem to this day, and one of active research interest.
In the past ten years, there has also been an extensive program of research using a local analytic approach in both adaptive and robust control. It turned out to be highly profitable, both from the theoretical and practical standpoints, to consider families defined in weighted H∞ balls using techniques from operator and interpolation theory. There is also much work being carried out on the melding of the robust and adaptive control approaches to system uncertainty and families of systems. In short, the study of families of systems, whether from the algebraic or analytic, local or global point of view, lies at the heart of feedback control theory. Certainly, the Kalman construction of the moduli space of dynamical systems is one of the major achievements in this area.
References

[1] R. Brockett, "Some geometric questions in the theory of linear systems," IEEE Trans Automatic Control, Vol AC-21, pp 449-464, 1976
[2] C. Byrnes, "On the control of certain deterministic infinite dimensional systems by algebro-geometric techniques," Amer J Math, Vol 100, pp 1333-1381, 1979
[3] J. Fogarty, Invariant Theory, Benjamin, New York, 1965
[4] R. Hartshorne, Algebraic Geometry, Springer-Verlag, New York, 1977
[5] M. Hazewinkel, "Moduli and canonical forms for linear dynamical systems II: the topological case," Math Systems Theory, Vol 10, pp 363-385, 1977
[6] M. Hazewinkel, "Moduli and canonical forms for linear dynamical systems III: the algebro-geometric case," in Proc of the 1976 Ames Conference on Geometric Control Theory (edited by R. Hermann and C. Martin), Math Sci Press, 1977
[7] M. Hazewinkel, "(Fine) moduli (spaces) for linear systems: What are they and what are they good for," Lectures given at the NATO-AMS Study Inst. on Algebraic and Geometric Methods in Linear Systems Theory, Harvard Univ, 1979
[8] M. Hazewinkel and R.E. Kalman, "On invariants, canonical forms, and moduli for linear, constant, finite dimensional dynamical systems," in Proc CNR-CISM Symp on Algebraic System Theory, Udine (1975), Lecture Notes in Economics and Mathematical Systems 131, pp 48-60, Springer, New York, 1976
[9] J. Humphreys, Linear Algebraic Groups, Springer, New York, 1975
[10] R.E. Kalman, "Algebraic geometric description of the class of linear systems of constant dimension," 8th Annual Princeton Conference on Information Sciences and Systems, Princeton, NJ, 1974
[11] R.E. Kalman, M. Arbib, and P. Falb, Topics in Mathematical System Theory, McGraw-Hill, New York, 1969
[12] R.E. Kalman, "On minimal partial realizations of an input/output map," in Aspects of Network and System Theory (edited by R.E. Kalman and N. DeClaris), Holt, Rinehart, and Winston, New York, pp 385-407, 1971
[13] R.E. Kalman, "System theoretic aspects of the theory of invariants," unpublished manuscript, 1974
[14] D. Mumford, Geometric Invariant Theory, Springer, New York, 1965
[15] D. Mumford and K. Suominen, "Introduction to the theory of moduli," in Proc 5th Nordic School in Math (edited by F. Oort), Oslo, 1970, Wolters-Noordhoff, Groningen, pp 171-222, 1972
[16] D. Quillen, "Projective modules over polynomial rings," Invent Math, Vol 36, pp 167-171, 1976
[17] G. Segal, "The topology of spaces of rational functions," Acta Mathematica, Vol 143, pp 39-72, 1979
[18] A. Tannenbaum, Invariance and System Theory: Algebraic and Geometric Aspects, Springer, New York, 1981
[19] A. Tannenbaum, "On the stabilizer subgroup of a pair of matrices," Linear Algebra and Its Applications, Vol 50, pp 527-544, 1983
Chapter 5
1 Introduction
I was introduced to the topic of linear multi-input-multi-output (MIMO)
systems in the spring of 1960 at Purdue University by the late Rangasami
Sridhar. At that time, reference material consisted of papers by Freeman [1,2]
and Kavanagh [3,4] and analysis and synthesis of MIMO systems was in its
infancy. Transfer function matrix representation was used and both analysis
and synthesis involved inversion of square rational matrices. There was no
understanding of synthesis procedures that would be guaranteed to lead to
proper stabilizing controllers nor was there any real understanding of how
closed-loop stability could be determined from an input-output representation.
Although it was quite clear that considerable work was needed in order to
develop effective tools to use on these problems, it was definitely not clear where
to start.
The proper start was on the representation of MIMO systems and was made by R.E. Kalman [5] via introduction of the concepts of controllability and observability and recognition of their importance for system structure.
This was one of the most fundamental contributions made to linear system
theory and the purpose of this essay is to trace the development of subsequent
results that now furnish a complete characterization of all input-output maps
for a stable linear system.
The fact that this development has taken the better part of the past thirty
years attests to the nontrivial nature of the problem and to the importance of
Kalman's contribution.
In the following, the emphasis is on the relationship between systems described by transfer functions and by state equations. Kalman emphasized the state space description and in this manner was able to characterize minimal realizations of transfer functions. This was the major step toward the results to be discussed.
J. B. Pearson
Fig. 1

Fig. 2
It is very easy to establish that the system with inputs (r₁, r₂) and outputs (y₁, y₂) is controllable and observable if and only if G and K are controllable and observable. Of course, this means that in any stability analysis of the closed-loop system we will have all the system modes available. That is, if G and K have minimal realizations of orders n₁ and n₂ respectively, then we will have n₁ + n₂ closed-loop modes to investigate.

It should be clear from this discussion that the system in Fig. 2 is stable if and only if each of the four transfer function matrices has left-half-plane poles only. Controllability and observability allow us to conclude that these poles represent all n₁ + n₂ system modes under our control.

Controllability and observability have now enabled us to analyze the closed-loop stability of an interconnected system in terms of its transfer function matrix. The next step is concerned with representation of rational transfer function matrices as matrix fractions, either over the ring of polynomials or the ring of stable rational functions.
Writing G = A⁻¹B and K = M⁻¹N as left coprime polynomial matrix fractions, the closed-loop equations of Fig. 2 become

[ A(s)   -B(s) ] [ y₁ ]   [ B(s)    0   ] [ r₁ ]
[ -N(s)   M(s) ] [ y₂ ] = [  0    N(s)  ] [ r₂ ].

The closed-loop system is controllable and observable precisely when

[ A(s)   -B(s)   B(s)    0   ]
[ -N(s)   M(s)    0    N(s)  ]

has full row rank for all s, but this is the same as

[ A(s)   B(s)    0      0   ]
[  0      0     M(s)   N(s) ]

having full row rank, which is true since [A(s) B(s)] has full row rank, as does [M(s) N(s)], by left coprimeness.

Therefore the stability of the system is determined by the zeros of

det [ A(s)   -B(s) ]
    [ -N(s)   M(s) ].
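In the scalar case this determinant reduces to A(s)M(s) − B(s)N(s), whose zeros are the closed-loop modes. A small numerical sketch (our own illustration; the plant and controller are arbitrary choices, not from the text):

```python
import numpy as np

# scalar example: plant g = B/A, controller k = N/M as polynomial fractions
A = np.array([1.0, -1.0])        # A(s) = s - 1  (unstable open-loop pole)
B = np.array([1.0])              # B(s) = 1
M = np.array([1.0])              # M(s) = 1
N = np.array([-2.0])             # N(s) = -2, i.e. the controller k = -2

# det [A -B; -N M] = A*M - B*N; its zeros are the closed-loop modes
char = np.polysub(np.polymul(A, M), np.polymul(B, N))
poles = np.roots(char)
print(poles)                     # [-1.]: the loop is stabilized
```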
(see [12] for an early synthesis problem using polynomial MFD's and [13] for more recent results) and led to further work in MIMO system representation designed to avoid this problem.

The work of Pernebo [14] involved what he called A-generalized polynomials, which were rational functions that behaved like polynomials. This particular system representation was somewhat awkward and quickly gave way to the so-called "stable rational function" MFD. Here the matrices have entries that are stable proper rational functions rather than polynomials. This representation was first suggested by Vidyasagar [15] and developed by Desoer et al. [16]. In my opinion, it is a less satisfying representation than the polynomial representation, since only the right-half-plane poles and zeros of a system are retained in all coprime factorizations, but these are apparently the only system attributes that play major roles in system stability and performance. Computation of "stable rational" MFD's is quite straightforward, as shown by Nett et al. [17], and proper controllers result from all synthesis procedures which use the YJBK parametrization [18-20] of all stabilizing controllers. This was the final breakthrough that led to the parametrization of all stable input-output maps of a linear system, and it is the subject of this essay.
Let us now assume that the coprime factorizations are in terms of stable rational functions; the above statement then becomes: the system is stable if and only if

(MA₁ - NB₁)

is a unimodular matrix, and hence its determinant is a unit. In this case a unit is a proper stable rational function whose inverse is also proper, stable and rational.
Define N_Q and D_Q as follows:

[ N  M ] [ X   -B₁ ]
         [ Y    A₁ ] = [ N_Q  D_Q ]     (*)
where

[ A    B  ] [ X   -B₁ ]   [ I   0 ]
[ -Y₁  X₁ ] [ Y    A₁ ] = [ 0   I ]

is the double-Bezout identity [21]. Since (M, N) are left coprime and

[ X   -B₁ ]
[ Y    A₁ ]

is unimodular with inverse given by the double-Bezout identity, multiplying (*) on the right by this inverse and scaling by D_Q⁻¹ shows that the controller K = M⁻¹N can be written as

K = (QB + X₁)⁻¹(QA - Y₁),     (**)

where Q = D_Q⁻¹N_Q is stable and (QB + X₁) and (QA - Y₁) are left coprime. Therefore every stabilizing controller can be represented as in (**) where Q is a stable proper rational matrix.

On the other hand, given any stable proper rational Q, define

M = QB + X₁

and

N = QA - Y₁.
Then the closed-loop transfer matrix of Fig. 2 becomes

[ G(I - KG)⁻¹     G(I - KG)⁻¹K ]   [ B₁(QB + X₁)   B₁(QA - Y₁) ]
[ (I - KG)⁻¹KG    (I - KG)⁻¹K  ] = [ (A₁Q - Y)B    A₁(QA - Y₁) ].

Notice that each entry is a stable rational function when Q is stable rational. Also notice that every entry is linear-affine in Q.
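The affineness in Q is easy to check numerically. The sketch below (our own illustration) uses the standard scalar Youla normalization, in which N and M denote the plant's stable-rational coprime factors with NX + MY = 1 and all stabilizing controllers are C = (X + MQ)/(Y − NQ), a different labeling convention from the matrix development above; it verifies that the sensitivity equals M(Y − NQ), affine in Q:

```python
# Scalar Youla sketch (illustration only): plant P = N/M = 1/(s - 1),
# with stable-rational factors and a Bezout pair N*X + M*Y = 1.
N = lambda s: 1 / (s + 1)
M = lambda s: (s - 1) / (s + 1)
X = lambda s: 2.0                   # check: N*X + M*Y = (2 + s - 1)/(s + 1) = 1
Y = lambda s: 1.0

def sens(Q, s):
    # sensitivity 1/(1 + P*C) for the controller C = (X + M*Q)/(Y - N*Q)
    P = N(s) / M(s)
    C = (X(s) + M(s) * Q(s)) / (Y(s) - N(s) * Q(s))
    return 1 / (1 + P * C)

# the closed loop is affine in Q: 1/(1 + P*C) = M*(Y - N*Q)
ok = all(abs(sens(Q, s) - M(s) * (Y(s) - N(s) * Q(s))) < 1e-9
         for Q in (lambda s: 0.0, lambda s: 1 / (s + 2))
         for s in (1j, 0.5j, 2 + 3j))
print(ok)
```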
The next thing to notice is the primary reason for using stable rational MFD's. Assume G = A⁻¹B = B₁A₁⁻¹ is strictly proper. This means that

lim_{s→∞} B₁(s) = 0   and   lim_{s→∞} B(s) = 0.

Since G is proper, det A(∞) ≠ 0 and det A₁(∞) ≠ 0. Therefore

K(∞) = [X₁(∞)]⁻¹[Q(∞)A(∞) - Y₁(∞)],

and since

X₁(∞)A₁(∞) = I,

it is clear that K is a proper rational function. Even when B(∞) ≠ 0, it is straightforward to impose the constraint that det(Q(∞)B(∞) + X₁(∞)) ≠ 0.
Fig. 3

Referring to Fig. 3, write

D = [ D₂   0  ]
    [ 0    D₃ ]

and define the partial states ξ₁ and ξ₂. Then

[ z  ]   [ N₁   N₂ ] [ ξ₁ ]
[ y₁ ] = [ N₃   N₄ ] [ ξ₂ ]

and

V y₂ = U(y₁ + r₂).
It is now quite easy to prove that these relations constitute a bicoprime factorization [21] of the system transfer function, and hence that system stability is determined by the zeros of

det(D₂) det(W) det(VA₂ + UB₂),

where (B₂, A₂) is a right coprime factorization of G₂₂ and W is a greatest common right divisor of (N₄, D₃).

Clearly the system is stabilizable if and only if det(D₂) and det(W) are units, and this condition can be shown to be equivalent to the statement that G and G₂₂ have the same unstable poles. This, of course, is the same thing as saying that all the unstable modes of G must be controllable from u and observable from the measured output.
The closed-loop transfer function then becomes

Φ = G₁₁ + G₁₂(I + KG₂₂)⁻¹KG₂₁ = G₁₁ + (G₁₂YAG₂₁) - (G₁₂A₁)Q(AG₂₁),

so that with H := G₁₁ + G₁₂YAG₂₁, define

U := G₁₂A₁

and

V := AG₂₁.
Then it can be shown that when G is admissible, H, U and V are stable rational functions, and all transfer functions that can be achieved with a stabilizing controller are given by

Φ = H - UQV,

where Q is any stable rational matrix and H, U and V are stable rational matrices determined by G. Of course, this equation is well known and widely used today in synthesis problems involving minimization of the H₂, H∞ and l₁ norms [22-25], but when I recall the state of affairs 30 years ago, I claim that this equation represents one of the most outstanding achievements in linear system theory, and that the door to developing this equation was opened for us by R.E. Kalman.
References

[1] Freeman, H., "A synthesis method for multipole control systems," AIEE Trans Appl Ind, Vol 76, Part II, pp 28-31, March 1957
[2] Freeman, H., "Stability and physical realizability considerations in the synthesis of multipole control systems," AIEE Trans Appl Ind, Vol 77, Part II, pp 1-5, March 1958
[3] Kavanagh, R.J., "Noninteracting controls in linear multivariable systems," AIEE Trans Appl Ind, Vol 76, Part II, pp 95-99, May 1957
[4] Kavanagh, R.J., "Multivariable control system synthesis," AIEE Trans Appl Ind, Vol 77, Part II, pp 425-429, November 1958
[5] Kalman, R.E., "On the general theory of control systems," Proc 1st International Congress on Automatic Control, Moscow, 1960, Butterworths, 1961
[6] Kalman, R.E., "Canonical structure of linear dynamical systems," Proc Nat Acad Sci, Vol 48, No 4, pp 596-600, 1962
[7] Gilbert, E.G., "Controllability and observability in multivariable control systems," SIAM J Control, Vol 1, No 2, pp 128-151, 1963
[8] Kalman, R.E., "Mathematical description of linear dynamical systems," SIAM J Control, Vol 1, No 2, pp 152-192, 1963
[9] Rosenbrock, H.H., State-Space and Multivariable Theory, John Wiley and Sons, Inc, New York, 1970
[10] Wolovich, W.A., Linear Multivariable Systems, Springer-Verlag, New York, 1974
[11] Kailath, T., Linear Systems, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1980
[12] Cheng, L. and Pearson, J.B., "Frequency domain synthesis of linear multivariable regulators," IEEE Trans on Automatic Control, Vol AC-23, pp 3-15, 1978
[13] Antoulas, A.C., "A new approach to synthesis problems in linear systems," IEEE Trans on Automatic Control, Vol AC-30, pp 465-473, May 1985
[14] Pernebo, L., "An algebraic theory for the design of controllers for linear multivariable systems; Part I: Structure matrices and feedforward design; Part II: Feedback realizations and feedback design," IEEE Trans on Automatic Control, Vol AC-26, pp 171-194, Feb 1981
[15] Vidyasagar, M., "Input-output stability of a broad class of linear time-invariant multivariable feedback systems," SIAM J Control, Vol 10, pp 203-209, 1972
[16] Desoer, C.A., Liu, R.W., Murray, J., and Saeks, R., "Feedback system design: the fractional representation approach to analysis and synthesis," IEEE Trans on Automatic Control, Vol AC-25, pp 399-412, June 1980
[17] Nett, C.N., Jacobson, C.A., and Balas, M.J., "A connection between state-space and doubly coprime fractional representations," IEEE Trans A-C, Vol AC-29, No 9, pp 831-832, Sep 1984
[18] Kucera, V., "Algebraic theory of discrete optimal control for multivariable systems," Kybernetika (Prague), Vols 10-12, pp 1-240, 1974
[19] Kucera, V., Discrete Linear Control, John Wiley and Sons, New York, N.Y., 1979
[20] Youla, D.C., Bongiorno, J.J., and Jabr, H.A., "Modern Wiener-Hopf design of optimal controllers, Part II: The multivariable case," IEEE Trans on Automatic Control, Vol AC-21, pp 319-338, June 1976
[21] Vidyasagar, M., Control System Synthesis: A Factorization Approach, MIT Press, Cambridge, MA, 1985
[22] Doyle, J.C., "Advances in multivariable control," ONR/Honeywell Workshop, Minneapolis, MN, 1984
[23] Chang, B.-C. and Pearson, J.B., "Optimal disturbance reduction in linear multivariable systems," IEEE Trans Automatic Control, Vol AC-29, No 10, pp 880-887, October 1984
[24] Francis, B.A., Helton, J.W., and Zames, G., "H∞-optimal feedback controllers for linear multivariable systems," IEEE Trans Automatic Control, Vol AC-29, No 10, pp 888-900, October 1984
[25] Dahleh, M.A. and Pearson, J.B., "l¹-optimal feedback controllers for MIMO discrete-time systems," IEEE Trans A-C, Vol AC-32, No 4, pp 314-322, April 1987
J. S. Baras
Algebraic system theory as introduced by Kalman provided a unifying framework for the frequency domain and state-space approaches to linear finite dimensional systems. More significantly, it allowed a rapprochement with automata theory which led to the development of extensions to infinite dimensional systems and nonlinear systems. Another important consequence was the popularization of algebraic methods for constructing and analyzing models of systems over arbitrary rings and fields. An important obstacle to utilizing these powerful mathematical tools in practical applications has been the unavailability of efficient and fast algorithms to carry through the precise error-free computations required by these algebraic methods. Recently, with the advent of computer algebra, this has become possible. In this paper we develop highly efficient, error-free algorithms for most of the important computations needed in linear systems over fields or rings. We show that the structure of the underlying rings and modules is critical in designing such algorithms. We also discuss the importance of such algorithms for controller synthesis.
1 Introduction
My first exposure to algebraic system theory was through a graduate course at Harvard in 1972, based on the text [17], which presented a more complete version of ideas presented in Kalman's seminal papers [18, 32]. I was inspired enough by the elegance of Kalman's methods and their implications to undertake a research program to develop similar constructs and theories for infinite dimensional systems [1-3]. What I found significant and highly promising in these ideas was the algebraic methodology which, through a rapprochement with automata theory, offered a way of thinking about infinite dimensional and nonlinear systems free of the constraints of finite dimensionality and linearity. I had many opportunities to interact with Kalman through the years on these topics. The theories developed are by now well known and cover a diverse variety of systems: linear systems over rings and fields [33, 34], nonlinear systems [11, 21, 35], infinite dimensional systems [2]. Eventually these ideas influenced the more traditional approaches to control systems description and synthesis through the development of polynomial matrix methods [16, 24, 36, 37]. More
* The results reported here represent joint research with David MacEnany. Robert Munach performed most of the computer experiments. This research was supported in part by NSF grant NSF CDR-85-00108 under the Engineering Research Centers Program, and AFOSR University Research Initiative grant 87-0073. David MacEnany was partially supported by an IBM Fellowship.
finite sequence of elementary row operations. In other words, they are related by left multiplication by a unimodular matrix U(s) such that B(s) = U(s)A(s) or A(s) = U^-1(s)B(s).
Any m x n polynomial matrix A(s) in M(Q[s]) of rank r >= 1 is row equivalent to an upper triangular form (or upper trapezoidal form if m is not equal to n). Therefore, it can be reduced by a sequence of elementary row operations to an upper triangular (trapezoidal) matrix T(s) in M(Q[s]). Said another way, there exists a unimodular matrix U(s) such that U(s)A(s) = T(s) with T(s) in M(Q[s]) upper triangular. If T(s) in M(Q[s]) satisfies the following conditions, it is said to be in column Hermite form.
1. If m > r then the last m - r rows are identically zero.
2. Each nonzero diagonal entry has degree greater than the entries above it.
3. Each diagonal entry is monic.
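The three conditions are easy to test mechanically. The sketch below is an illustrative helper (not from the text), representing each polynomial entry as a high-to-low coefficient list:

```python
def deg(p):
    # degree of a polynomial given as a high-to-low coefficient list; -1 if zero
    for i, c in enumerate(p):
        if c != 0:
            return len(p) - 1 - i
    return -1

def is_column_hermite(T, r):
    """Check conditions 1-3 for an upper triangular/trapezoidal polynomial
    matrix T (entries are coefficient lists) of rank r."""
    m = len(T)
    # 1. the last m - r rows are identically zero
    for i in range(r, m):
        if any(deg(p) >= 0 for p in T[i]):
            return False
    for j in range(r):
        d = T[j][j]
        dd = deg(d)
        # 3. each diagonal entry is monic (leading coefficient equals 1)
        if dd < 0 or d[len(d) - 1 - dd] != 1:
            return False
        # 2. entries above the diagonal have strictly smaller degree
        if any(deg(T[i][j]) >= dd for i in range(j)):
            return False
    return True

# T = [[s, 5], [0, s^2]] is in column Hermite form
T = [[[1, 0], [5]],
     [[0],    [1, 0, 0]]]
```

Replacing the diagonal entry s by 2s, or raising the degree of the (1, 2) entry to 2, makes the check fail, matching conditions 3 and 2 respectively.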
(  1     0 ) ( a(s)  c(s) )   ( a(s)       c(s)        )
( -q(s)  1 ) ( b(s)  d(s) ) = ( r(s)   d(s) - q(s)c(s) )

with a, b, c, d, q in Q[s]
one encounters the generic computation alpha + beta*gamma, with alpha, beta, gamma in Q. If these are expressed as ratios of integers alpha = Na/Da, beta = Nb/Db, gamma = Ng/Dg, all reduced to lowest terms, then

alpha + beta*gamma = (Na*Db*Dg + Nb*Ng*Da) / (Da*Db*Dg)

This computation requires six integer multiplications, one integer addition and the calculation of a GCD. Although there are more efficient methods (see [26], p. 313), it remains a fact that rational arithmetic is computationally expensive, due in large part to the need for GCD calculations. On the other hand, if it can be arranged so that alpha, beta and gamma are all integers, then the same computation obviously requires only two integer multiplications, one integer addition and no GCD calculation. Clearly, by multiplying each row of any A(s) in M(Q[s]) by a large enough integer, the denominators of every coefficient of every entry of A(s) can be cancelled, and such a diagonal operation is certainly unimodular in M(Q[s]). Because this involves a fixed overhead, assume, for convenience, that A(s) is in M(Z[s]). This still creates difficulties because Z[s] is not a Euclidean ring. For instance, it is easy to see that the remainder of two polynomials in Q[s] with integer coefficients has, in general, rational coefficients; consider the remainder of 2s after division by 3s - 1. In other words, Euclidean division is not defined for Z[s]. However, Z[s] is an instance of a unique factorization domain (UFD) and there exists a procedure called Pseudo-division which is defined for polynomials with coefficients in a UFD and which avoids rational arithmetic altogether.
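The operation count is easy to illustrate with Python's fractions module, which re-reduces every result through a GCD; this small sketch is not from the text:

```python
from fractions import Fraction

# alpha + beta*gamma with alpha = Na/Da, beta = Nb/Db, gamma = Ng/Dg in
# lowest terms: six integer multiplications plus a GCD-based reduction
Na, Da = 1, 3   # alpha = 1/3
Nb, Db = 2, 5   # beta  = 2/5
Ng, Dg = 3, 7   # gamma = 3/7
num = Na * Db * Dg + Nb * Ng * Da   # 1*5*7 + 2*3*3 = 53
den = Da * Db * Dg                  # 3*5*7 = 105
reduced = Fraction(num, den)        # Fraction normalizes via a GCD
assert reduced == Fraction(Na, Da) + Fraction(Nb, Db) * Fraction(Ng, Dg)

# with alpha, beta, gamma already integers, the same expression costs only
# two multiplications and one addition, and no GCD at all
alpha, beta, gamma = 4, 2, 3
result = alpha + beta * gamma
```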
Pseudo-Division Lemma

Given polynomials a(s), b(s) in Z[s] with deg b(s) >= deg a(s), there exist polynomials r(s), N(s) in Z[s] and a positive integer D such that

D b(s) = N(s) a(s) + r(s)

with either r(s) identically zero or deg r(s) < deg a(s). Furthermore, D is the smallest such integer, i.e. for any other triple (r'(s), N'(s), D') satisfying the above, it follows that D <= D'. It also follows that D <= l_a^k, where k = deg b(s) - deg a(s) + 1, with l_a denoting the leading coefficient of a.

Proof. In division of the polynomials b(s) and a(s) over Q, explicit division by l_a is performed k times. Thus, if b(s) and a(s) start off with integer coefficients, then the only denominators which appear in the coefficients of the quotient q(s) and remainder r(s) are divisors of l_a^k. This suggests that it is possible to find polynomials q(s), r(s) in Z[s] such that l_a^k b(s) = q(s) a(s) + r(s) (see [26], p. 407). Denote l_a^k by D' and q(s) by N'(s). As an example, write

u(s) = s^8 + s^6 - 3s^4 - 3s^3 + 8s^2 + 2s - 5

and

v(s) = 3s^6 + 5s^4 - 4s^2 - 9s + 21

This example is discussed in [26, p 408]. Using Knuth's algorithm one obtains

27 u(s) = v(s)(9s^2 - 6) + (-15s^4 + 3s^2 - 9)

while the minimal D gives

9 u(s) = v(s)(3s^2 - 2) + (-5s^4 + s^2 - 3)
Of course in this example the difference is negligible. However, if the size of the leading coefficient of v(s) is large, the difference in computational burden can be quite substantial.
Algorithm D: Pseudo-Division of Polynomials

Given two polynomials b(s) = b0 s^n + b1 s^(n-1) + ... + bn and a(s) = a0 s^m + a1 s^(m-1) + ... + am in Z[s] with n >= m and a(s) nonzero, this algorithm computes the smallest pseudo-division multiplier D and the pseudo-quotient N(s) defined above. This algorithm computes D and N(s) directly, instead of computing D' and N'(s) and then finding g'. Direct calculation of D and N(s) involves computing GCD's on the fly which involve smaller numbers than those used to compute g'. Bigger numbers cost more in GCD calculations and, given the size of the integers encountered in polynomial matrix computations (e.g. easily greater than 1000 digits), this algorithm can save a substantial amount of time.
mnm <- min(m, n - m)
g <- GCD(b0, a0)
d <- a0/g
D <- d
b0 <- b0/g
For i = 1 thru n - m Do
    For j = i thru n - m Do
        bj <- bj * d
    EndDo
    For j = 1 thru min(mnm, n - m - i + 1) Do
        ...
    EndDo
    g <- GCD(bi, a0)
    d <- a0/g
    D <- D * d
    bi <- bi/g
EndDo

The algorithm terminates with the first n - m + 1 coefficients of b(s) overwritten according to {b0, b1, ..., b(n-m)} <- {N0, N1, ..., N(n-m)}.
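A minimal classical pseudo-division is easy to sketch in a few lines. This illustrative version (not Algorithm D above) uses the non-minimal multiplier D = l_a^k, but it stays entirely in integer arithmetic:

```python
def poly_mul(p, q):
    # multiply polynomials given as high-to-low integer coefficient lists
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def pseudo_divmod(b, a):
    """Return (D, N, r) with D*b(s) = N(s)*a(s) + r(s), all over Z[s],
    using the classical choice D = la**(deg b - deg a + 1)."""
    n, m = len(b) - 1, len(a) - 1
    k = n - m + 1
    la = a[0]
    D = la ** k
    r = [c * D for c in b]      # scale once up front
    N = []
    for i in range(k):
        q = r[i] // la          # exact division by construction
        N.append(q)
        for j in range(m + 1):
            r[i + j] -= q * a[j]
    return D, N, r[k:]          # remainder has degree < deg a

# the example from the text: remainder of 2s after division by 3s - 1
D, N, rem = pseudo_divmod([2, 0], [3, -1])
```

For that example this returns D = 3, N(s) = 2 and r(s) = 2, i.e. 3*(2s) = 2*(3s - 1) + 2; on Knuth's u(s), v(s) pair above it reproduces the identity 27u(s) = v(s)(9s^2 - 6) + (-15s^4 + 3s^2 - 9).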
Introduction of a zero in the (2, 1) position of the matrix in the 2 x 2 example can now be performed using the Pseudo-division algorithm. This freedom from rational arithmetic is not without drawbacks, however. Consider triangularizing a polynomial matrix A(s) with small entries such as 1, s, 45s, 7 - 5s, -10s - 10 and 6s^2 - 1: the triangularized product R A(s) already contains integer entries as large as 65025, -9905 and -89145.

The superfluous left content of this matrix can then be discarded, since this is equivalent to multiplying A(s) by a unimodular matrix R^-1, thereby keeping the coefficient size to a minimum. Note that the above polynomial matrix A(s) is in pseudo-column Hermite form, and that the column Hermite form of A(s) has rational coefficients with denominators such as 9905, 1981 and 57.

x x x x x
0 x x x x
0 0 x x x
0 0 0 x x
0 0 0 0 x
which returns the index of the row of A whose kth entry is a non-zero polynomial of lowest degree among the rows {k, k + 1, ..., N}. If A(k,k)(s) = A(k+1,k)(s) = ... = A(N,k)(s) is identically zero, then it returns -infinity.

For k = 1 thru N - 1 Do
    index <- MinimumDegreeIndex(A, k)
    If index is not -infinity Then
        ...
        End EndlessLoop
    EndIf
EndDo
End
It is emphasized that this is a 'paper' algorithm. In fact, the working code based
on the above is more efficient and can handle singular and non-square matrices
and the entries can initially belong to Q[s]. However, these considerations just
complicate matters and obscure the basic operation.
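The pivot-selection helper that the listing relies on can be sketched directly. Entries are high-to-low coefficient lists and, departing from the text's convention of returning minus infinity, a fully zero subcolumn is signalled with None (an illustrative sketch, not the working code mentioned above):

```python
def deg(p):
    # degree of a polynomial given as a high-to-low coefficient list; -1 if zero
    for i, c in enumerate(p):
        if c != 0:
            return len(p) - 1 - i
    return -1

def minimum_degree_index(A, k):
    """0-based index of the row among rows k..N-1 whose k-th entry is a
    nonzero polynomial of lowest degree; None if that subcolumn is zero."""
    best, best_deg = None, None
    for i in range(k, len(A)):
        d = deg(A[i][k])
        if d >= 0 and (best_deg is None or d < best_deg):
            best, best_deg = i, d
    return best
```

Repeatedly swapping the selected row into position k and pseudo-dividing the rows below it yields the triangularization loop above.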
Algorithm M: Minor Oriented Triangularization of Polynomial Matrices

This algorithm is similar to the one above, except that it performs the zeroing process in a leading principal minor oriented fashion, so that the algorithm consists of N - 1 stages where the k x k leading principal submatrix is in triangular form by the end of the kth stage. Furthermore, the algorithm employs an additional substage which reduces the degrees of the polynomial entries above the diagonal on the fly using Pseudo-Euclidean division. The order in which the degrees are reduced is important and is based upon notions from the integer case contained in Kannan and Bachem [23]. The order is shown pictorially below. The output matrix is in its unique Pseudo-Hermite form, not simply triangularized, i.e. it is a triangular matrix with entries above the diagonal of degree less than the diagonal entry. It is technically not in Hermite form since the diagonal entries are not monic. But this form is easily obtained by left multiplication with the appropriate diagonal matrix of rational numbers, a unimodular matrix with respect to Q[s].
[Diagram: the order in which the above-diagonal entries are degree-reduced as each leading principal submatrix is triangularized.]
6 Simulation Results
Macsyma is a Lisp based system for performing formal, symbolic calculations using both error-free and arbitrary precision arithmetic. Since it is Lisp based, Macsyma runs the fastest on Lisp machines, computers which have the Lisp instructions hard-coded into their microprocessors, such as the Texas Instruments Explorer.
7 Summary of Functions
The following is a summary of the high-level auxiliary programs which we have
to date implemented in Macsyma and Mathematica. They perform most of
the common, high-level tasks arising in the control system design process.
computed.
Bezout(N(s), D(s)) - Finds the homogeneous and particular solutions to the Bezout equation, i.e. finds polynomial matrices Xp(s), Yp(s), Xh(s), Yh(s) such that Xp(s)X(s) + Yp(s)N(s) = I and Xh(s)X(s) + Yh(s)N(s) = 0. Used for
References
[1] Baras, J.S. and R.W. Brockett, "H^2 Functions and Infinite Dimensional Realization Theory," SIAM Journal of Control, Vol 13, No 1, pp 221-241, Jan 1975
[2] Baras, J.S., "Algebraic Structure of Infinite Dimensional Linear Systems in Hilbert Space," in Mathematical System Theory, G. Marchesini and S.K. Mitter (Eds.), Springer-Verlag Lecture Notes in Economics and Mathematical Systems, Vol 131, pp 193-203, 1975
[3] Baras, J.S. and P. Dewilde, "Invariant Subspace Methods in Linear Multivariable Distributed Systems and Lumped-Distributed Network Synthesis," IEEE Proceedings, Special Issue on Recent Trends in Systems Theory, pp 160-178, Feb 1976
[4] Bareiss, E.H., "Computational Solutions of Matrix Problems over an Integral Domain," J Inst Maths Applics, V10, pp 69-104, 1972
[5] Bareiss, E.H., "Sylvester's Identity and Multistep Integer-Preserving Gaussian Elimination," Math Comp, V22, pp 565-578, 1968
[6] Brown, W.S., "On Euclid's Algorithm and the Computation of Polynomial Greatest Common Divisors," J ACM, V18 (4), pp 478-504, Oct 1971
[7] Brown, W.S. and J.F. Traub, "On Euclid's Algorithm and the Theory of Subresultants," J ACM, V18 (4), pp 505-514, Oct 1971
[8] Chou, T.J. and G.E. Collins, "Algorithms for the Solution of Systems of Linear Diophantine Equations," SIAM J Comp, V11 (4), pp 687-708, Nov 1982
[9] Cohen, G., P. Moller, J.P. Quadrat and M. Viot, "Algebraic Tools for the Performance Evaluation of Discrete Event Systems," IEEE Proceedings, Vol 77, pp 39-58, 1989
[10] Collins, G.E., "Subresultants and Reduced Polynomial Remainder Sequences," J ACM, V14 (1), pp 128-142, Jan 1967
[11] Fliess, M., "Un Outil Algebrique: Les Series Formelles Non Commutatives," in Mathematical System Theory, G. Marchesini and S.K. Mitter (Eds.), Springer-Verlag Lecture Notes in Economics and Mathematical Systems, Vol 131, pp 122-148, 1975
[12] Gantmakher, F.R., Theory of Matrices, New York: Chelsea, 1959
[13] Gregory, R.T. and E.V. Krishnamurthy, Methods and Applications of Error-Free Computation, Berlin: Springer, 1984
[14] Hartley, B. and T.O. Hawkes, Rings, Modules and Linear Algebra, London: Chapman and Hall, 1970
[15] Hungerford, T.W., Algebra, Berlin: Springer, 1974
[16] Kailath, T., Linear Systems, Englewood Cliffs: Prentice-Hall, 1980
[17] Kalman, R.E., P.L. Falb, and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, 1969
[18] Kalman, R.E., "Algebraic Structure of Linear Dynamical Systems I. The Module of Sigma," Proc Nat Acad Sci (USA), 54, pp 1503-1508
[19] Kalman, R.E., "On Minimal Partial Realizations of a Linear Input/Output Map," in Aspects of Network and System Theory, R.E. Kalman and N. DeClaris (Eds.), Holt, Rinehart and Winston, 1971
[20] Kalman, R.E., "Introduction to the Algebraic Theory of Linear Systems," in Mathematical Systems Theory and Economics I, Lecture Notes in Operations Research and Mathematical Economics, Vol 11, pp 41-65, Springer-Verlag, 1969
[21] Kalman, R.E., "Pattern Recognition Properties of Multilinear Machines," in Proc of IFAC International Symposium on Technical and Biological Systems of Control, Yerevan, Armenian SSR, 1968
[22] Kannan, R., "Solving Systems of Linear Equations over Polynomials," Report CMU-CS-83-165, Dept. of Comp. Sci., Carnegie-Mellon University, Pittsburgh, 1983
[23] Kannan, R. and A. Bachem, "Polynomial Algorithms for Computing the Smith and Hermite Normal Forms of an Integer Matrix," SIAM J Comp, V8 (4), pp 499-507, Nov 1979
[24] Khargonekar, P. and E. Sontag, "On the Relation Between Stable Matrix Fraction Factorization and Regulable Realizations of Linear Systems Over Rings," IEEE Trans on Aut Control, Vol AC-27, No 3, pp 627-638, June 1982
[25] Keng, H.L., Introduction to Number Theory, Berlin: Springer, 1982
[26] Knuth, D.E., The Art of Computer Programming, Vol 2, Reading, Mass: Addison-Wesley, 1981
[27] Krishnamurthy, E.V., Error-Free Polynomial Matrix Computations, Berlin: Springer, 1985
[28] Lipson, J.D., Elements of Algebra and Algebraic Computing, Reading: Addison-Wesley, 1981
[29] MacDuffee, C.C., The Theory of Matrices, New York: Chelsea, 1950
[30] McClellan, M.T., "The Exact Solution of Systems of Linear Equations with Polynomial Coefficients," J ACM, V20 (4), pp 563-588, Oct 1973
[31] Newman, M., Integral Matrices, New York: Academic Press, 1972
[32] Rouchaleau, Y., B.F. Wyman, and R.E. Kalman, "Algebraic Structure of Linear Dynamical Systems III. Realization Theory Over a Commutative Ring," Proc of Nat Acad of Sciences, Vol 69, No 11, pp 3404-3406, 1972
[33] Sontag, E., "Linear Systems Over Commutative Rings: A Survey," Ricerche di Automatica, Vol 7, No 1, pp 1-34, July 1976
[34] Sontag, E., "Linear Systems Over Commutative Rings: A (Partial) Updated Survey," Proc of 1981 IFAC, Kyoto, 1981
[35] Sontag, E., Polynomial Response Maps, Lecture Notes in Control and Information Sciences, Vol 13, Springer-Verlag, 1979
[36] Vidyasagar, M., Control System Synthesis, Cambridge: MIT Press, 1985
[37] Wolovich, W.A., Linear Multivariable Systems, Berlin: Springer, 1974
[Figure: log-scale plots of time to triangularize (sec) versus square matrix dimension (3 to 16), measured on a TI Explorer II with 16 mb physical memory and 128 mb virtual memory; x marks estimated values for runs that ran out of memory.]
Fig. 2. Time to triangularize (sec), Minor oriented algorithm, polynomial degrees 1 through 6
In this report some developments in the area of stability of discrete systems, originally motivated by a paper by Kalman and Bertram, are overviewed. It is shown that an analog to the Schwarz form was developed for discrete systems. This form was applied in determining the margin of stability, identification, signal processing using lattice filters, and model reduction of one-dimensional and multi-dimensional systems.
1 Introduction
Wall [1] has shown that a polynomial

f(s)   (1)

has all its roots with negative real parts if and only if c1, c2, ..., cn in the following continued fraction are all positive:

g(s)/f(s) = 1 / (c1 s + 1 + 1/(c2 s + 1/(c3 s + ...)))   (2)

where g(s) is defined from f(s) as in (3). Let

f_i(s), i = 1, ..., n, and c0 = 1

Then g(s) and f(s) satisfy the recursion (4), (5) for m = 0, 1, ..., n - 1, where f0 = 1, f(-1) = 1 and fn = f(s). Here f1(s), f2(s), ..., fn(s) are the successive denominators of the continued fraction (4).
It was also shown in [1] that f(s) can be written as the determinant (6) of a tridiagonal matrix built from b1, ..., bn, i.e. as the characteristic polynomial of the matrix (7). Other variations of the matrix (7) which have the same characteristic equation are the forms (8a, b, c).
It is noted here that this matrix or one of its variations has been called in the literature the "Schwarz matrix."
Schwarz [2] developed a numerical method of elementary transformations to transform a given matrix A to the above normal form (8a), so that the Hurwitz-stability of the matrix A is determined from b1, b2, ..., bn, i.e. A is Hurwitz-stable if and only if b1, b2, ..., bn are all positive.
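The claim is easy to verify numerically for n = 2, where the Schwarz form has characteristic polynomial s^2 + b1 s + b2. The sketch below uses the standard 2 x 2 layout, which may differ from (8a) by a permutation or sign convention; it is an illustration, not part of Schwarz's method:

```python
import cmath

def schwarz_eigs_2(b1, b2):
    # eigenvalues of [[0, 1], [-b2, -b1]], i.e. roots of s^2 + b1*s + b2
    d = cmath.sqrt(b1 * b1 - 4 * b2)
    return [(-b1 + d) / 2, (-b1 - d) / 2]

def hurwitz_stable_2(b1, b2):
    # Hurwitz stability: every eigenvalue has negative real part
    return all(s.real < 0 for s in schwarz_eigs_2(b1, b2))
```

With b1, b2 > 0 both eigenvalues lie in the open left half plane; making either parameter nonpositive destroys stability, consistent with the criterion above.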
Kalman and Bertram [3] used the second method of Lyapunov to prove the Hurwitz-stability of a system matrix in Schwarz form; e.g., if we consider the linear system

x' = A1 x   (9)

where A1 is in Schwarz form (8a), then the Lyapunov function

V = x^T P1 x   (10)

with P1 the diagonal matrix whose entries are built from the products b1 b2 ... bn, can be used to prove Hurwitz-stability:

V' = - x^T Q1 x   (11)

where Q1 is the positive semidefinite matrix (12); b1, b2, ..., bn > 0 are necessary and sufficient conditions for the stability of the system (9).
Parks [4] showed that the first column of the Routh array consists of the elements (13). As the connection between the Routh and Hurwitz criteria is known (see Gantmacher [5]), the elements of the first column of the Routh array are (14)
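Parks's observation suggests a small computational check: build the Routh array and read off its first column. The sketch below (not from the text) works over exact rationals and assumes no zero appears in the first column, so the singular cases that need the usual epsilon-trick are not handled:

```python
from fractions import Fraction

def routh_first_column(coeffs):
    """First column of the Routh array of a(s) = a0*s^n + ... + an (a0 != 0).
    All entries positive <=> all roots have negative real parts."""
    n = len(coeffs) - 1
    top = [Fraction(c) for c in coeffs[0::2]]
    bot = [Fraction(c) for c in coeffs[1::2]]
    w = len(top)
    bot += [Fraction(0)] * (w - len(bot))
    rows = [top, bot]
    for _ in range(n - 1):
        r1, r2 = rows[-2], rows[-1]
        # standard Routh recurrence for the next row
        rows.append([(r2[0] * r1[j + 1] - r1[0] * r2[j + 1]) / r2[0]
                     for j in range(w - 1)] + [Fraction(0)])
    return [r[0] for r in rows[:n + 1]]

# (s + 1)^3 = s^3 + 3s^2 + 3s + 1 is Hurwitz: first column 1, 3, 8/3, 1
```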
In terms of the Hurwitz determinants Di this gives

b1 = D1,  b2 = D2/D1,  b3 = D3/(D1 D2),  b4 = D1 D4/(D2 D3), ...   (15)

With a(0,0) = 1, the coefficients a(j,k) of the characteristic polynomial of the kth order Schwarz matrix are polynomials in b1, ..., bn (16); for example,

for n = 1:  a(1,1) = b1
for n = 2:  a(1,2) = b1,  a(2,2) = b2
for n = 3:  a(1,3) = b1,  a(2,3) = b2 + b3,  a(3,3) = b1 b3

These relations can be collected in the triangular array (17), and (18) is the corresponding companion matrix A of the polynomial with coefficients a1, ..., an. The transformation matrix P11 of (19)-(21) has entries built from square roots of the bi, e.g. (b1/b2)^(1/2), ..., (b1/(b2 b3 ... bn))^(1/2) and entries of the form b_n^(1/2) and -b_n^(1/2).
It is noted that the matrix (20) was first used in [9]. In [10] it was shown that the analogous results hold in the discrete case for a polynomial Fn(z) (22), (23).
The necessary and sufficient condition for the roots of (22) to lie inside the unit circle is that the zero order terms in Fn(z) and in the n - 1 polynomials obtained successively through the transformation (24) satisfy the sign conditions (25), (26), i.e.

a(1,1) = Delta_1, ...   (27)

where the remaining a(j,k) are built from Delta_1, ..., Delta_n through expressions such as -Delta_(n-2) Delta_(n-1) and 1 - Delta_(n-2)^2   (28), (29)
(31)

Delta V is negative semidefinite and does not vanish identically for any general sequence of vectors.
Thus the Schur-Cohn stability criterion is proved using the second method of Lyapunov, and a diagonal matrix is used here for the proof, similar to [3].
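The criterion itself is easy to run numerically. The sketch below implements the standard Schur-Cohn step-down recursion for real coefficients (an illustrative implementation, not the Lyapunov matrix proof of the text): the polynomial has all roots strictly inside the unit circle exactly when every reflection coefficient has magnitude less than one.

```python
def schur_cohn_stable(coeffs, tol=1e-12):
    """True iff all roots of f(z) = c0*z^n + ... + cn (real coefficients,
    c0 != 0) lie strictly inside the unit circle."""
    f = [float(c) for c in coeffs]
    while len(f) > 1:
        k = f[-1] / f[0]              # reflection coefficient
        if abs(k) >= 1.0 - tol:
            return False
        rev = f[::-1]                 # reciprocal (reversed) polynomial
        # one Schur step: the constant term cancels, so the degree drops by one
        f = [(f[i] - k * rev[i]) / (1.0 - k * k) for i in range(len(f) - 1)]
    return True

# (z - 0.2)(z - 0.3) = z^2 - 0.5z + 0.06 is stable;
# (z - 2)(z - 0.1) = z^2 - 2.1z + 0.2 is not
```

The successive reflection coefficients play exactly the role of the Delta_i parameters above, which is why |Delta_i| < 1 appears as the stability assumption in the model-reduction sections that follow.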
Let 1 - Delta_i^2 = b_i' and P3 = P3'; then the matrix in (32)-(34) has entries of the form

-Delta_(n-1) Delta_n,  -Delta_(n-2) Delta_n b'_(n-1),  ...,  -Delta_1 Delta_n b'_2 ... b'_(n-1),  -Delta_n b'_1 b'_2 ... b'_(n-1)

Another form can be obtained using the transformation matrix T of [14], [15]:

T(i,j) = T(i-1,j-1) + Sum over k >= 2 of Delta_(n-i+1) Delta_(n-i+k) T(i-k,j-1),  T(i,0) = 0 for i > 0   (35)

As = T A T^-1   (36)

whose entries involve -Delta_(n-1) Delta_n, -Delta_(n-2) Delta_n (1 - Delta_(n-1)^2), -Delta_(n-3) Delta_n (1 - Delta_(n-2)^2)(1 - Delta_(n-1)^2), ..., and one gets the corresponding Qs as in (38).
In [10] the results were extended to the complex case and simplifications
analogous to those provided by the Lienard-Chipart criterion are obtained for
the real case.
3 Applications
3.1 Estimation of the Margin of Stability
In [14] the margin of stability of a discrete system given by the nth order difference equation (39), or in state form

x(k + 1) = A x(k)   (40)

is estimated with a Lyapunov function (41) whose decrement is

Delta V = - x^T Qs x   (42)

Ps is given by (37) and Qs by (38). The margin of stability is determined using the weighted sum of square error J = Sum over k >= 0 of k^r y^2(k). Due to the fact that T defined in (35) is a lower triangular matrix with (1, 1) entry equal to 1,

J = Sum over k >= 0 of k^r y^2(k) = Sum over k >= 0 of k^r x1^2(k)   (43)

For r = 0, J0 (44) can be computed directly for different initial conditions. For example, if

y(0) = y(1) = ... = y(n - 2) = 0  and  y(n - 1) = 1

then (45) holds, where the lambda_i are the roots of (39). For the weighted sum of square error (r >= 1), Jr can be obtained in terms of solutions of the Lyapunov equation for linear discrete systems [14].

x(k + 1) = A x(k) + b u(k)
y(k) = f^T x(k)   (46), (47)
In this case there are 2n parameters to be identified in the system (47). The advantage of the transformation to the form (47) is that M is in Hessenberg form, which has some numerical advantages. The performance criterion is chosen as

J = Sum over k >= 1 of |y*(k) - y(k)|   (48)

x(k + 1) = M x(k) + b u(k)
y(k) = f^T x(k)   (50)

where M is in Mansour form,

b = [Delta_(n-1)  Delta_(n-2)  ...  Delta_1  1]^T,   f^T = [f1  f2  ...  fn]

The system (50) can be represented by the following cascade structure, Fig. 1, with a diagonal matrix transformation

T = diagonal[(1 + eps_n Delta_n), (1 + eps_n Delta_n)(1 + eps_(n-1) Delta_(n-1)), ..., Product over i of (1 + eps_i Delta_i)]   (51)

where eps_i = +/- 1.

Fig. 1
Fig. 2

M' = ...   (52)

with entries built from the Delta_i and the signs eps_i, for example -Delta_(n-1) Delta_n, -Delta_(n-2) Delta_n (1 + eps_(n-1) Delta_(n-1)), 1 - eps_(n-1) Delta_(n-1), -Delta_1 Delta_2 and -Delta_2 (1 + eps_1 Delta_1).

The system (52) can be represented by the lattice digital filter of Fig. 2. This realization has the least number of multipliers. Normally, there is a direct coupling between u and y. The signs in Fig. 2 must be chosen in the same sequence and are determined so that the coefficient sensitivity and the output noise become small. The structure corresponds to a lossless cascade transmission line.
x(k + 1) = M x(k) + b u(k)
y(k) = f^T x(k)   (53)

where M is in Mansour form, with entries such as -Delta_1, -Delta_2, 1 - Delta_1^2 and -Delta_1 Delta_2, and

b = [1  0  ...  0]^T,   f^T = [C1  C2  ...  Cn]
It is assumed that (53) is stable, i.e. |Delta_i| < 1. If |Delta_n / Delta_(n-1)| is sufficiently small, then x_n will reach its quasi steady state much faster than x_(n-1). From the last system equation in (53) we get, after putting x_n(k) instead of x_n(k + 1) on the left side of the equation, a quasi-steady-state expression for x_n(k) as a linear combination of the remaining states, with coefficients such as -Delta_n / (1 + Delta_(n-1) Delta_n) and -Delta_(n-2) Delta_n / (1 + Delta_(n-1) Delta_n)   (54).

Substituting this expression into the remaining equations yields a reduced model of order n - 1 (55), with entries such as 1 - Delta_(n-2)^2 and -Delta_(n-2) Delta_(n-1), where

Delta'_(n-1) = (Delta_(n-1) + Delta_n) / (1 + Delta_(n-1) Delta_n)

The reduced matrix M' is in the same form. One can easily prove that stability and the steady state response are preserved after reduction. Further reduction can be continued in the same manner.
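Freezing the fastest state at its quasi steady state amounts to eliminating the last state variable from the steady-state equations, which is why the DC behavior is preserved exactly. A generic numerical sketch (hypothetical helper, not the Delta-parameterized formulas of the text):

```python
def qss_reduce(A, B):
    """Eliminate the last state of x(k+1) = A x(k) + B u(k) by setting
    x_n(k+1) ~ x_n(k) and substituting; returns the reduced (A, B)."""
    n = len(A)
    d = 1.0 - A[n - 1][n - 1]              # assumes A[n-1][n-1] != 1
    Ar = [[A[i][j] + A[i][n - 1] * A[n - 1][j] / d for j in range(n - 1)]
          for i in range(n - 1)]
    Br = [B[i] + A[i][n - 1] * B[n - 1] / d for i in range(n - 1)]
    return Ar, Br

# 2-state example: the reduced model reproduces the full model's
# steady-state value of x1 for a constant unit input
A = [[0.5, 0.2], [0.1, 0.3]]
B = [1.0, 1.0]
Ar, Br = qss_reduce(A, B)
x1_reduced = Br[0] / (1.0 - Ar[0][0])
# full steady state from (I - A) x = B, solved explicitly for the 2x2 case
det = (1 - A[0][0]) * (1 - A[1][1]) - A[0][1] * A[1][0]
x1_full = ((1 - A[1][1]) * B[0] + A[0][1] * B[1]) / det
```

The elimination step is exactly Gaussian elimination of the last variable in (I - A)x = Bu, so the steady-state gains of the full and reduced models coincide.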
In [19] and [20] two multivariable Mansour forms for model reduction are derived. The first one is obtained from the Luenberger first form using a similarity transformation.
In the Luenberger first form the pair (A, B) has the block structure indicated in (56), and the transformed matrix M has the block form (57) with diagonal blocks M11, M22, ....
The diagonal blocks are in Mansour form. The extra degree of freedom in the coupling elements in (57) can be used to get a nice structure of M. For example, for 2 inputs (m = 2), M takes the block form (58) with diagonal blocks A11 and A22. The reduction of the first or the second subsystem can be done independently by the same method used in the single input case.
The second multivariable normal form [20] is obtained by first transforming the Luenberger first form to the block-controllability form by means of elementary similarity transformations and then applying a generalized matrix Schur-Cohn-Jury table. This second form is then constructed in the same manner as the single input case.
Let the controllability of the nth order system with m inputs be described by the block controllability form

A = [block companion form with identity blocks Im on the superdiagonal and blocks -Aq, ..., -A1 in the last block row]   (59)

M = [block Mansour form with entries such as -theta_1, 1 - theta_1^2 and -theta_1 theta_2]   (60)

The elements theta_1, theta_2, ..., theta_q are m x m matrices obtained from the matrix Schur-Cohn-Jury table.
The reduction procedure is similar to the single input case (for details of the transformation and model reduction see [20]).
In [21] further justification of the Badreddin-Mansour method is presented, where the connections between root location properties, coefficient properties and Schur-Cohn coefficient properties are established. It is also shown that the lattice realization of the reduced system differs from the lattice realization of the original system by replacing the last delay by a unity gain element.
For two-dimensional systems the output equation takes the form

y(i, j) = [C1 | C2] [x^h(i, j); x^v(i, j)] + D u(i, j)   (61)
Conclusion
The Schwarz canonical form of continuous systems and the discrete Schwarz canonical form, or Mansour form, for discrete systems are discussed. An overview is given of the characteristics and applications of the Mansour form in the areas of determination of the margin of stability, identification, signal processing using lattice filters, and model reduction of one-dimensional and two-dimensional systems. The work of Kalman and Bertram in 1960 on using Lyapunov theory to prove the stability of the Schwarz form has inspired the subsequent work for the discrete case.
References
[1] Wall, H.S.: Polynomials whose zeros have negative real parts. Amer Math Monthly, 52 (1945), pp 308-322
[2] Schwarz, H.: Ein Verfahren zur Stabilitätsfrage bei Matrizen-Eigenwertproblemen. Zeit f angew Math u Physik, 1956, pp 473-500
[3] Kalman, R., Bertram, J.: Control system analysis and design via the second method of Lyapunov. J of Basic Engineering, June 1960, p 371
[4] Parks, P.: A new proof of the Hurwitz-stability criterion by the second method of Lyapunov with applications to optimum transfer functions. Fourth Joint Automatic Control Conference, June 1963
[5] Gantmacher, F.R.: Applications of the theory of matrices. Interscience, New York, 1959
[6] Mansour, M.: Stability criteria of linear systems and the second method of Lyapunov. Scientia Electrica, Vol XI, 1965, pp 65-104
[7] Mansour, M.: Die Stabilität linearer Abtastsysteme und die zweite Methode von Lyapunov. Regelungstechnik, Heft 12, 1965, pp 592-596
[8] Chen, C.F., Chu, H.: A matrix for evaluating Schwarz-form. IEEE Trans on Aut Control, 1966, pp 303-305
[9] Puri, N., Weygandt, C.: Second method of Lyapunov and Routh canonical form. J of the Franklin Institute, Nov 1963, p 365
[10] Anderson, B.D.O., Jury, E.I., Mansour, M.: Schwarz-Matrix Properties for Continuous and Discrete Time Systems. Int J Control, Vol 23, 1976, pp 1-16
[11] Schur, I.: Ueber Potenzreihen, die im Innern des Einheitskreises beschränkt sind. J Reine u Angewandte Math, 147 (1917), pp 205-232
[12] Cohn, A.: Ueber die Anzahl der Wurzeln einer algebraischen Gleichung in einem Kreise. Math Zeit, 14 (1922), pp 110-148
[13] Jury, E.I.: Theory and Applications of the z-Transform Method. Krieger, 1982
[14] Mansour, M., Jury, E.I., Chaparro, L.F.: Estimation of the margin of stability for linear continuous and discrete systems. Int J Control, Vol 30, 1979, pp 49-69
[15] Mansour, M.: A note on the stability of linear discrete systems and Lyapunov method. IEEE Trans on Aut Control, Vol AC-27, No 3, June 1982, pp 707-708
[16] Dourdoumas, N.: Ein Beitrag zur Identifikation und Approximation von Systemen mit Hilfe linearer diskreter mathematischer Modelle. Archiv für Elektrotechnik, 62 (1980), pp 1-4
[17] Takizawa, M., Kishi, H., Hamada, N.: Synthesis of lattice digital filters by the state variable method. Electron Commun Japan, 65-A, pp 27-36
[18] Badreddin, E., Mansour, M.: Model reduction of discrete time systems using the Schwarz canonical form. Electron Lett, Vol 16, No 20, Sep 25, 1980, pp 782-783
[19] Badreddin, E., Mansour, M.: A multivariable normal-form for model reduction of discrete-time systems. Syst Control Lett, 4 (1983), pp 271-285
[20] Badreddin, E., Mansour, M.: A second multivariable normal-form for model reduction of discrete-time systems. Syst Control Lett, 4 (1984), pp 109-117
[21] Anderson, B.D.O., Jury, E.I., Mansour, M.: On Model Reduction of Discrete Time Systems. Automatica, Vol 22, No 6, 1986, pp 717-721
[22] Antoulas, A.C., Mansour, M.: On Stability and the Cascade Structure. Technical Report, 1990
[23] Jury, E.I., Premaratne, K.: Model Reduction of Two-Dimensional Discrete Systems. IEEE Trans on Circuits and Systems, Vol CAS-33, No 5, May 1986, pp 558-562
[24] Premaratne, K., Jury, E.I., Mansour, M.: Multivariable Canonical Forms for Model Reduction of 2-D Discrete Time Systems. IEEE Trans on Circuits and Systems, Vol CAS-37, No 4, April 1990, pp 488-501
Chapter 6
Introduction
The use of state space systems to represent dynamical systems of various types
has a long history in mathematics and physics. Two fundamental examples are
the well-known methods for analysing high order ordinary differential equations
via the corresponding first order equations and the formulation of Hamiltonian
mechanics-within which c1assical celestial mechanics forms a magnificent
special case. Equally, in theoretical engineering, inc1uding the information and
control sciences, state space methods have played and continue to play a
fundamental role. From a mathematical viewpoint, the pure state evolution
models of dynamical system theory undergo an enormous generalization in
control theory to input-output systems, which then fall into equivalence classes
according to their input-output behavior. Furthermore, from an engineering
viewpoint, a vast array of problems are either initially posed in state space
terms (aerospace engineering provides many examples) or are first presented in
terms of input-output models which are then transformed into input-state-output form (process control being one source of examples). As a result, one
of the basic subjects of study in mathematical system theory is the relationship
between input-state-output systems and input-output systems. The generality
of this relationship is conveyed by the Nerode Theorem which states that any
non-anticipative (set-valued) input-(set-valued) output system S possesses a state space realization Σ(S), that is to say, there exists an input-state-output system Σ(S) which generates the same input-output trajectories as S.
One of the fundamental contributions of Professor Kalman is the recognition of the great importance of time invariant finite dimensional linear input-state-output systems (henceforth simply called linear state space systems) in the mathematical study of dynamical control systems, and of the consequences of this for theoretical engineering, in particular circuit theory and control theory. Three major aspects of this theory, which are due in whole or in part to Professor Kalman, are, first, the solution of the prediction problem for stochastic processes generated by state-space systems; second, the solution of the realization problem for time invariant linear input-output systems of finite Smith-McMillan degree (henceforth called linear input-output systems); and, third, the initiation of the
P. E. Caines
(labeled the output process) and an ℝ^m-valued process u (labeled the input process). A parameterized class of predictors {ŷ_k(θ) = f(y_1^{k−1}, u_1^k, θ); k ∈ ℤ_1} is specified, where θ lies in a ν-dimensional manifold Θ. Here ℤ_1 denotes the natural numbers 1, 2, .... The resulting set of representations

    ŷ_k(θ) = f(y_1^{k−1}, u_1^k, θ),   k ∈ ℤ_1,   (1)

where e_k(θ) ≜ y_k − ŷ_k(θ), k ∈ ℤ_1, is called a family of approximate system models for (y, u). We emphasize that a parameter is an element of a parameter space of systems, which is a set of systems endowed with a manifold structure; the term parameter is not to be confused with any particular co-ordinate description of a parameter. In this connection the reader is referred to Kalman [1980].
Restricting the most general formulation somewhat for the sake of brevity, we now introduce a family of continuous loss functions l_k(·,·): ℝ^p × Θ → ℝ_1, k ∈ ℤ_1, that yield the cost function L_N(·,·): ℝ^{Np} × Θ → ℝ_1, N ∈ ℤ_1, on a sample path of prediction errors e(θ), via

    L_N(e(θ), θ) ≜ (1/N) Σ_{k=1}^N l_k(e_k(θ), θ),   N ∈ ℤ_1.   (2)

For each N ∈ ℤ_1, one may now define the sample path dependent minimum prediction error (MPE) estimator θ̂_N = θ̂_N(y_1^{N−1}, u_1^N) as a (y_1^{N−1}, u_1^N)-measurable selection of a parameter in the set arg min_{θ∈Θ} L_N(e(θ), θ) (whenever the minimum exists) and define the deterministic optimal (ASM) estimator Θ_N as the set of parameters arg min_{θ∈Θ} L_N(e(θ), θ) (again whenever the minimum exists).
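To make the MPE definition concrete, the following is a minimal sketch of our own (not from the text) for a hypothetical scalar AR(1) predictor family ŷ_k(θ) = θ y_{k−1} with quadratic loss l_k(e, θ) = e², minimizing L_N over a finite grid standing in for the parameter manifold Θ:

```python
import numpy as np

def mpe_estimate(y, thetas):
    """Minimum prediction error (MPE) estimate over a parameter grid.

    Predictor family (hypothetical): yhat_k(theta) = theta * y_{k-1};
    loss l_k(e) = e^2, so L_N(e(theta), theta) = (1/N) sum_k e_k(theta)^2.
    """
    y = np.asarray(y, dtype=float)
    costs = [np.mean((y[1:] - th * y[:-1]) ** 2) for th in thetas]
    return thetas[int(np.argmin(costs))]   # a measurable selection of arg min

# Data generated by y_k = 0.6 y_{k-1} + w_k; the MPE estimate recovers theta.
rng = np.random.default_rng(0)
y = np.zeros(5000)
for k in range(1, len(y)):
    y[k] = 0.6 * y[k - 1] + rng.standard_normal()
theta_hat = mpe_estimate(y, np.linspace(-0.95, 0.95, 381))
```

For quadratic loss this grid search approximates the least squares estimator, so theta_hat lands near the true value 0.6 for large N.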
In this general framework, for a given family of ASMs and loss functions, we say that θ is (asymptotically) identifiable if there exists a unique limit θ ∈ Θ, in the manifold topology, for all sequences chosen from the sequence of sets {Θ_N; N ∈ ℤ_1}, and say that the prediction error estimator process {θ̂_N; N ∈ ℤ_1} is (strongly) consistent if θ̂_N → θ a.s. as N → ∞, again in the manifold topology.

The benefits of the generality of this formulation include the following:
(3) In the most general version of the theory, (0.3) includes a term that measures the complexity of the model, and {θ̂_N; N ∈ ℤ_1} and {Θ_N; N ∈ ℤ_1} then automatically take account of the trade-off between the minimization of prediction errors along a given block of data and the corresponding increase in the complexity of the best fitting model. This trade-off is a fundamental issue in the philosophy of science, and the reader is referred to Chapter 5 of Caines [1988] for a discussion of this topic together with a bibliography. Our formulation permits one, for example, to express the Hannan-Quinn and Hannan-Deistler theories (see Hannan-Deistler [1988] and Chap. 8, Caines [1988]) of system parameter and structure estimation within a coherent overall framework. Furthermore, the ASM-PEM formulation is at least consistent with recent theories of parameter and structure estimation, such as those due to Willems [1986, 1987] and Rissanen [1989]. (The latter presents a generalized form of Shannon information theory as its framework for the study of system identification, and for the study of statistics in general.)
(4) The ASM-PEM formulation of system identification specializes smoothly to the standard central problems of system identification such as the maximum likelihood identification of Gaussian state-space and autoregressive moving average systems with exogenous observed inputs (called ARMAX systems).
cally stable finite dimensional linear time invariant system, where the system acts on an input process composed of an observed stochastic or deterministic process u and an unobserved stochastic process w. Moreover, this class of systems has representations both in the autoregressive moving average form (with exogenous inputs), known as ARMAX representations, and in input-state-output form (see equations (1.6) below).
We begin by considering a class 𝒴 of p-component stochastic processes defined on the probability space (Ω, ℬ, P), y ∈ 𝒴 being the sum of a process ζ, generated nonanticipatively from an observed input u in a given class 𝒰, and a process ξ in the class Ξ of processes generated from a class of unobserved inputs. So we write

    y_k = ζ_k + ξ_k,   k ∈ ℤ.   (1.1)
    ξ_k = Σ_{i=0}^∞ W_i w_{k−i},   W_0 = I,   a.s.,   k ∈ ℤ.   (1.2)
    y(z) = Z(z)u(z) + W(z)w(z),   (1.3)
where it is assumed that (i) the matrix [Z(z), W(z)] of formal transfer functions is the positive power series expansion of a (p × (m + p)) matrix [Z(z), W(z)] = [Z*(z^{−1}), W*(z^{−1})] of rational functions of z with Smith-McMillan degree δ in z^{−1}, where (ii) [Z(z), W(z)] is non-anticipative as an operator on formal power series and hence is proper, satisfying

    [Z(z), W(z)]|_{z=0} = [Z*(z^{−1}), W*(z^{−1})]|_{z^{−1}=∞} < ∞,

and which is such that (iii) all entries have asymptotically stable poles (which implies in particular [Z(0), W(0)] < ∞), (iv) det W(z) has asymptotically stable zeros, and (v) W(0) = I.
In this case, we know from realization theory (see Kalman [1963, 1965], Heymann [1975], and for recent discussions Bokor and Keviczky [1987], Delchamps [1988], Hannan and Deistler [1988] and Deistler and Gevers [1988]) that [Z(z), W(z)] above has the following representation.
The ARMAX Representation

The matrix of rational functions [Z(z), W(z)] satisfying [Z(0), W(0)] < ∞ has a left matrix fraction description

    [Z(z), W(z)] = A^{−1}(z)[B(z), C(z)],   (1.4)

and has a state space realization

    x_{k+1} = F x_k + G^u u_k + G^w w_k,
    y_k = H x_k + D^u u_k + w_k,   (1.5)
where (H, F, [G^u, G^w], [D^u, I]) are respectively (p × δ), (δ × δ), δ × (m + p) real matrices and (i) F is asymptotically stable, (ii) det(H(Iz^{−1} − F)^{−1}G^w + I) has asymptotically stable zeros, (iii) (H, F) is observable and (F, [G^u, G^w]) is controllable, or, equivalently, (iii') the state space dimension δ in (1.5) is the smallest amongst all such state space realizations.

SSX stands for state space system with exogenous inputs, and a state space system satisfying the conditions above will be said to satisfy SSX.
We now specify our admissible set of ARMAX and state space systems by
considering the set of linear systems having the properties ARMAX or SSX
above. To be specific, we adopt the following definition:
Definition 1.1. For given p, δ ∈ ℤ_1, m ∈ ℤ_+, 𝒮(p, m, δ) shall denote the set of
single output systems for which δ = 1 and W(0) < ∞. They are given by the set of rational functions (α + βz^{−1})/(γ + δz^{−1}), where there are no common factors between numerator and denominator, α and β are not both zero, and γ ≠ 0, that is, |α| + |β| > 0, αδ − βγ ≠ 0 and γ ≠ 0. This set may also be written as bz/(a + z) + c, a ≠ 0, b ≠ 0. The manifold structure of this family of functions is given by M^ν = M^{ν′} × ℝ^1 ⊂ ℝ^3, where M^{ν′} is the analytic manifold of dimension ν′ = δ(p + m) = 2 which is equal to the disjoint union of the four chart neighborhoods O_± = {a ≷ 0, b ≷ 0; (a, b) ∈ ℝ^2}.

It is evident that M^{ν′} × ℝ^1 is in one-to-one correspondence with the formal transfer functions of the set of systems Σ_{1,1} and that 𝒮(1, 0, 1) ⊂ Σ_{1,1} is parameterized by P(1, 0, 1) ≜ M^ν ∩ {|a| > 1, |b + 1| < |a|} ∩ {c = 1}.

Observe that in the induced topology on {c = 1} ⊂ ℝ^3 the parameter set P(1, 0, 1) is the open set given by the disjoint union

    {a > 1, |b + 1| < |a|} ∪ {a < −1, |b + 1| < |a|}.

The fact that M^{ν′} × ℝ^1 is the disjoint union of four open connected sets is also a consequence of the following general fact:
    Rat(n) ≜ { b(z)/a(z) = (b_n z^n + b_{n−1} z^{n−1} + ⋯ + b_0)/(a_n z^n + a_{n−1} z^{n−1} + ⋯ + a_0) ; |a_n| + |b_n| > 0, a(z) and b(z) relatively prime },   n ∈ ℤ_+,
and this will yield a system in Σ_{1,2} whenever [H, F] is observable. In the case where the first and third span ℝ^2, we may take a basis so that

    [H, F, G]_{Chart 1} = [ [h_11 h_12 ; h_21 h_22],  [0 1 ; α_2 α_1],  [0 g_12 ; 1 g_22] ],
    [H, F, G]_{Chart 2} = [ [h_11 h_12 ; h_21 h_22],  [0 1 ; α_2 α_1],  [g_11 0 ; g_21 1] ],
and in the case where the first and fourth span ℝ^2, we may take

    F = [ a_11  a_12 ; −a_21  a_11 ],
ARMAX

    A_ψ(z)y_k = B_ψ(z)u_k + C_ψ(z)w_k,   k ∈ ℤ_+,   (1.6a(i))
    E w_k w_j^T = Σ δ_kj,   ∀ k, j ∈ ℤ,   ψ ∈ Ψ,   Σ ∈ 𝒫,   (1.6a(ii))

and

SSX

    x_{k+1} = F_ψ x_k + G_ψ^u u_k + G_ψ^w w_k,   k ∈ ℤ_+,   ψ ∈ Ψ,   (1.6b(i))
    y_k = H_ψ x_k + D_ψ^u u_k + w_k,   (1.6b(ii))
    E w_k w_j^T = Σ δ_kj,   Σ ∈ 𝒫.   (1.6b(iii))
    …   (1.7)

as k → −∞, ∀ N ∈ ℤ_1, ∀ z_1^N ∈ ℝ^{Np} (ℝ^{N(m+p)}, respectively),

    θ = θ′.   (1.8)
Let us assume for the moment that the random variable z_1^N is distributed over ℝ^{Nr} with the parameterized density f(z_1^N; θ), where θ takes a value in the set Θ. Then the likelihood function of the observations z_1^N is given by f(z_1^N; ·): Θ → ℝ^1, and the maximum likelihood estimate θ̂_N (of θ) is given by the maximizing argument (when it exists) of f(z_1^N; θ) over Θ.

To begin with we observe that if z is a full rank zero mean stationary Gaussian stochastic process with (infinite) process covariance matrix 0 < Σ^z = (Σ^z_{i,j} = Σ^z_{i−j}; i, j ∈ ℤ), where Σ^z_{i,j} = E z_i z_j^T, i, j ∈ ℤ, then the likelihood function f(·;·) is parameterized by Σ^z ∈ Θ when we identify Θ with the set of (Nr × Nr) strictly positive submatrices on the diagonal of Σ^z(·) corresponding to z_1^N. Further, f(·;·) has the form

    f(z_1^N; θ) = (2π)^{−Nr/2} (det Σ_N^z(θ))^{−1/2} exp{ −½ (z_1^N)^T [Σ_N^z(θ)]^{−1} (z_1^N) },   θ ∈ Θ.   (2.1)
    f(y_1^N; u, θ) = Π_{i=1}^N f(y_i | y_1^{i−1}; u, θ),   (2.2a)

    f(y_1^N, u_1^N; θ) = Π_{i=1}^N f(y_i | y_1^{i−1}, u_1^i; θ) f(u_i | y_1^{i−1}, u_1^{i−1}).   (2.2b)

Let

    ŷ_{i|i−1}(θ) ≜ ∫_{ℝ^p} y_i f(y_i | y_1^{i−1}; u, θ) dy_i,   i ∈ ℤ_1,

that is, ŷ_{i|i−1} is the conditional expectation of y_i given y_1^{i−1} in the case where (u, θ) are deterministic. And let

    ŷ_{i|i−1}(θ) ≜ ∫_{ℝ^p} y_i f(y_i | y_1^{i−1}, u_1^i; θ) dy_i,

that is, ŷ_{i|i−1} is the conditional expectation of y_i given (y_1^{i−1}, u_1^i) in the case where (y, u) are jointly randomly distributed and θ is deterministic.

Making the change of variables

    y_i → y_i − ŷ_{i|i−1}(θ),   1 ≤ i ≤ N,

in (2.2a) and (2.2b) respectively, and using the change of variables formula, we obtain

    f(y_1^N; u, θ) = Π_{i=1}^N f(y_i − ŷ_{i|i−1}(θ) | y_1^{i−1}; u, θ)   (2.3a)
and

    f(y_1^N, u_1^N; θ) = Π_{i=1}^N f(y_i − ŷ_{i|i−1}(θ) | y_1^{i−1}, u_1^i; θ) · J(y_1^N, u_1^N),   (2.3b)

where

    J(y_1^N, u_1^N) ≜ Π_{i=1}^N f(u_i | y_1^{i−1}, u_1^{i−1}).   (2.4)

Then (2.3a) and (2.3b) yield a likelihood function on the observations of the form

    exp{ Σ_{i=1}^N log f_0(v_i(θ)) + log J(y_1^N, u_1^N) },   (2.5)

where f_0(·) denotes f(v_i(θ) | y_1^{i−1}; u, θ) or f(v_i(θ) | y_1^{i−1}, u_1^i; θ), v_i(θ) ≜ y_i − ŷ_{i|i−1}(θ), and ŷ_{i|i−1}(θ) denotes the conditional expectation of y_i given the observations, this being computed using the appropriate conditional distribution in each case.
We now postulate some further conditions on the processes appearing in
the system equations (1.6) so that the likelihood function (2.5) on the observations will be computable in a convenient recursive manner. These conditions are
typical of those imposed on ARMAX and SSX systems to yield a satisfactory
theory of filtering and stochastic optimal control. We shall let δ̄ denote the maximum of the row degrees of A_ψ(z) in (1.6a(i)).
INP 3A: The random variables x̄_1 ≜ (y_{−δ̄+1}^0, w_{−δ̄+1}^0) and x_1, in ARMAX (1.6a) and SSX (1.6b) respectively, are jointly distributed with and orthogonal to w_1^N for each N ∈ ℤ_1, where w is a full-rank orthogonal process and the joint distribution is Gaussian with zero mean.

In case (A) the process u enters the recursions (1.6) in a deterministic manner and the Gaussian density on the respective initial conditions propagates to a Gaussian distribution on (y_{−δ̄+1}^0, w_{−δ̄+1}^0) parameterized by θ ≡ (ψ, Σ). This gives (y_1^N) a (nonsingular) Gaussian distribution for each N ∈ ℤ_1.

When INP 1B and INP 2 are in force, we enunciate the following condition:

INP 3B: (x̄_1 ≜ (y_{−δ̄+1}^0, u_{−δ̄+1}^0, w_{−δ̄+1}^0), w_1, w_2, ...) and (x_1, w_1, w_2, ...) are full-rank zero-mean Gaussian processes with w_{n+1} independent of {x̄_1, (u_1, w_1), ..., (u_n, w_n), u_{n+1}} and (x_1, (u_1, w_1), ..., (u_n, w_n), u_{n+1}) respectively, for all n ∈ ℤ_+.

If the observed input process u is absent, INP 3A and 3B become the single hypothesis that the initial conditions for (1.6) and the (full rank) orthogonal process (w_0, w_1, ...) are jointly Gaussian, mutually orthogonal and have zero mean.
Since INP 1-3 give a likelihood function of the form (2.5) for (1.6), and since the process v is Gaussian with density N(0, Σ_{i|i−1}(θ)), Σ_{i|i−1}(θ) > 0, i ∈ ℤ_1, the
following expression gives the θ = (ψ, Σ)-dependent part of the logarithm of the likelihood function (scaled by −2/N):

    −(2/N) Σ_{i=1}^N log{ (2π)^{−p/2} (det Σ_{i|i−1}(θ))^{−1/2} exp(−½ ‖v_i(θ)‖²_{[Σ_{i|i−1}(θ)]^{−1}}) }   (2.6a)

    = p log 2π + (1/N) Σ_{i=1}^N { log det Σ_{i|i−1}(θ) + ‖v_i(θ)‖²_{[Σ_{i|i−1}(θ)]^{−1}} }.   (2.6b)

For convenience, we often refer to (2.6), less the quantity p log 2π, as the log-likelihood function and denote it by L_N(y_1^N; u, θ) in case A, and L_N(y_1^N, u_1^N; θ) in case B. Both L_N(y_1^N; u, θ) and L_N(y_1^N, u_1^N; θ) are given by

    (1/N) Σ_{i=1}^N { log det Σ_{i|i−1}(θ) + ‖v_i(θ)‖²_{[Σ_{i|i−1}(θ)]^{−1}} }.   (2.7)
The matrix inverse Σ_{i|i−1}^{−1}(θ) in (2.6) and (2.7) exists for all i ∈ ℤ_1 and for all θ ∈ Θ since (1.6b) yields

    Σ_{i|i−1}(θ) = H_θ V_{i|i−1}(θ) H_θ^T + Σ_θ,   i ∈ ℤ_1,

where V_{i|i−1}(θ) denotes the state estimation error covariance and we have Σ_θ > 0 for all θ ∈ Θ by the definition of Θ.

Note that the initial condition covariances E x̄_1 x̄_1^T and E x_1 x_1^T are parametric quantities required to specify the joint distributions (x̄_1, y_1^N), (x_1, y_1^N), (x̄_1, y_1^N, u_1^N) and (x_1, y_1^N, u_1^N), respectively, and to initiate the recursions for the sequence {V_{i|i−1}; i ∈ ℤ_1} in cases (A) and (B) respectively. As remarked in Sect. 1, in none of the cases can this quantity be consistently estimated and it is not included in the parameter θ.
The function (2.7) will be our main object of study in Sect. 3. As stated earlier, the important point is that the ARMAX and SSX models together with the INP hypotheses in their final form as given in Sect. 3 permit the recursive construction of the prediction errors {v_k(θ); 1 ≤ k ≤ N} via the techniques of linear recursive filtering.
The following example is instructive.
Example 2.1. Consider the construction of the likelihood function for a
first order autoregression y_{k+1} = θ y_k + w_{k+1}, with |θ| < 1 and E w_k² = σ². In this case

    E y_{k+τ} y_τ = θ^k σ² / (1 − θ²),   ∀ k, τ ∈ ℤ,
so that the covariance matrix Σ_N of (y_N, ..., y_1) has i, j-th entry θ^{|i−j|} σ²/(1 − θ²), 1 ≤ i, j ≤ N, and the Gaussian assumption on w makes the entire y process Gaussian with density

    f(y_1^N; θ) = (2π)^{−N/2} |Σ_N|^{−1/2} exp{ −½ [y_N, y_{N−1}, ..., y_2, y_1] Σ_N^{−1} [y_N, y_{N−1}, ..., y_2, y_1]^T }.
In this simple example we can carry out the matrix manipulations to obtain

    f(y_1^N; θ) = (2π)^{−N/2} ( (σ²/(1 − θ²)) (σ²)^{N−1} )^{−1/2} exp{ −½ [y_N, y_{N−1}, ..., y_2, y_1] L^T D^{−1} L [y_N, y_{N−1}, ..., y_2, y_1]^T },

where L is the lower bidiagonal matrix with 1's on the diagonal and −θ on the subdiagonal, and D = diag(σ², ..., σ², σ²/(1 − θ²)).
Hence

    −(2/N) log f(y_1^N; θ) = log 2π + (1/N) { log(σ²/(1 − θ²)) + (N − 1) log σ² + (1 − θ²) y_1²/σ² + Σ_{i=2}^N (y_i − θ y_{i−1})²/σ² }.   (2.9)
On the other hand, we can obtain (2.9) in the form (2.7) as follows: From (2.5) we obtain

    −log f(y_1^N; θ) = (N/2) log 2π + ½ Σ_{i=1}^N log Σ_{i|i−1} + ½ Σ_{i=1}^N (y_i − ŷ_{i|i−1})² (Σ_{i|i−1})^{−1},   (2.10)

and

    ŷ_{1|0} = 0,   ŷ_{i|i−1} = θ y_{i−1},   i ≥ 2.

Hence the solution sequence to the Riccati equation (with its initial condition determined by the Lyapunov equation) is

    σ²/(1 − θ²), σ², σ², ...,
and the prediction errors are

    y_i − ŷ_{i|i−1} = y_i − θ y_{i−1}   for all i ≥ 2.
This simple example shows how the calculation of (y_1^N)^T (Σ_N)^{−1} (y_1^N) in the direct version of the likelihood function is transformed into a sum of quantities that depend upon the solution of the appropriate Lyapunov and Riccati equations, in the "filter" form of the likelihood function (2.7). This illustrates how, at the cost of the complexity of the Riccati equation (equivalently the Cholesky factorization algorithms), the direct likelihood function of a Gaussian process generated by the system (1.6a)-(1.6b) is transformed into the version that depends on the process innovations.
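The equality of the two versions can be checked numerically. The following sketch (ours, with hypothetical function names) evaluates −log f(y_1^N; θ) for the first order autoregression both directly, from the Toeplitz covariance Σ_N, and in filter form, from the innovation variances σ²/(1 − θ²), σ², σ², ... and prediction errors y_1, y_i − θ y_{i−1}:

```python
import numpy as np

def neg_log_lik_direct(y, theta, sigma2):
    """-log f(y; theta) from the full N x N Toeplitz covariance Sigma_N
    with i,j entry theta^{|i-j|} * sigma2 / (1 - theta^2)."""
    N = len(y)
    idx = np.arange(N)
    Sigma = sigma2 / (1 - theta**2) * theta ** np.abs(idx[:, None] - idx[None, :])
    _, logdet = np.linalg.slogdet(Sigma)
    quad = y @ np.linalg.solve(Sigma, y)
    return 0.5 * (N * np.log(2 * np.pi) + logdet + quad)

def neg_log_lik_filter(y, theta, sigma2):
    """The same quantity from the innovations: variances
    sigma2/(1-theta^2), sigma2, ... and errors y_1, y_i - theta*y_{i-1}."""
    N = len(y)
    S = np.full(N, sigma2)
    S[0] = sigma2 / (1 - theta**2)                        # Lyapunov initial condition
    v = np.concatenate(([y[0]], y[1:] - theta * y[:-1]))  # prediction errors
    return 0.5 * np.sum(np.log(2 * np.pi * S) + v**2 / S)

rng = np.random.default_rng(1)
y = rng.standard_normal(60)
a = neg_log_lik_direct(y, 0.7, 1.3)
b = neg_log_lik_filter(y, 0.7, 1.3)
```

For any data vector y and |θ| < 1 the two functions agree to rounding error, which is precisely the identity between the direct and filter forms discussed above.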
the overall hypotheses INP B below include the analytic closed loop solvability (analytic CLS) condition of Chap. 10, Sect. 1 of Caines [1988], which merely states that if the set of equations for (y, u) is solved in terms of w and the feedback disturbance v, then the resulting map (w, v) → (y, u) is given asymptotically by an asymptotically stable matrix transfer function (i.e. a matrix transfer function analytic in a neighborhood of the closed unit disc D). Further, although the coefficient matrices L^y, L^u and Σ^v of the feedback are not needed to compute the maximum likelihood estimate, they are needed to compute the asymptotic covariance of the estimate.

In the theorem statement below, we gather together, for ease of reference, all the relevant input hypotheses. The reader should note the addition of the important "persistent excitation" condition in both sets of hypotheses: the exogenous input, whether deterministic or random, is required to asymptotically behave like a wide sense stationary process which is not linearly deterministic (with respect to the regressions of order less than twice an integer depending upon the system structure). We also note that the maximum likelihood estimate is defined as the minimizing argument of the function L_N in (2.7) (which in the MPE framework is interpreted as a prediction error loss function) and that the hypothesis of the existence of a conditional density for u_k given (y_1^{k−1}, u_1^{k−1}) is not required in either of the theorems below.
Before stating the main theorem, we stress that it is not hypothesized that the observed y process ((y, u) process, respectively) is stationary.

Let δ̄ denote the maximum row degree of A_ψ(z) and B_ψ(z) for all ψ ∈ Ψ for some set of co-ordinate charts covering the compact subset C ⊂ Ψ cited in the statement of Theorem 3.2. This co-ordinate system will remain fixed throughout the rest of the discussion.

We now collect together the two sets of input hypotheses under the headings INP A and INP B.
INP A:

(1) u is deterministic (i.e. {∅, Ω} measurable) and bounded.
(2) w is a zero mean wide sense stationary Gaussian process with E w_k w_j^T = Σ δ_kj, k, j ∈ ℤ, with Σ ∈ 𝒫.
(3) x̄_1 = (y_{−δ̄+1}^0, w_{−δ̄+1}^0) and x_1, in (1.6a) and (1.6b), respectively, are jointly distributed with and orthogonal to w_1^N for each N ∈ ℤ_+ and the joint distribution is Gaussian with zero mean.
(4) For the process u the following limits exist:

    lim_{N→∞} (1/(N+1)) Σ_{i=0}^N u_i u_{i−k}^T = M_k,   k ∈ ℤ,   (3.1a)

and
INP B:

(1) The input process u is generated by the feedback law

    u_k = Σ_{j=1}^∞ L_j^y y_{k−j} + Σ_{j=1}^∞ L_j^u u_{k−j} + v_k;   (3.1b)

v is called the exogenous part of the input process and is said to be (random) persistently exciting of order 2δ.
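Persistent excitation can be probed numerically. Under our reading of condition (3.1a) in the scalar-input case, u is persistently exciting when the Toeplitz matrix of covariances M_{|i−j|} is positive definite; the sketch below (our own, with sample covariances standing in for the limits M_k and a heuristic eigenvalue tolerance) distinguishes a rich input from a single sinusoid:

```python
import numpy as np

def sample_covariances(u, max_lag):
    """Sample analogues of M_k = lim (1/(N+1)) sum_i u_i u_{i-k}."""
    N = len(u)
    return np.array([np.dot(u[k:], u[:N - k]) / N for k in range(max_lag + 1)])

def persistently_exciting(u, n):
    """Heuristic check: is the (n+1) x (n+1) Toeplitz matrix with i,j entry
    M_{|i-j|} positive definite (up to a sampling tolerance)?"""
    M = sample_covariances(u, n)
    idx = np.arange(n + 1)
    T = M[np.abs(idx[:, None] - idx[None, :])]   # Toeplitz covariance matrix
    return np.linalg.eigvalsh(T).min() > 1e-3    # tolerance for sample estimates

rng = np.random.default_rng(2)
white = rng.standard_normal(100000)           # rich input: exciting at any order
sinusoid = np.sin(0.3 * np.arange(100000))    # spans only a 2-dimensional space
```

Here white passes the check at every tested order, while sinusoid passes for a 2 × 2 Toeplitz matrix but fails from 3 × 3 on, reflecting that a single sinusoid can only identify two parameters.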
Versions of part (1) of the next theorem exist in the literature (see, e.g. Ljung [1987]). Here we give a precise statement and proof linking the class of systems 𝒮(p, m, δ), the notion of asymptotic identifiability and the hypotheses INP (A1)-(A4), in particular the persistent excitation (order 2δ) hypothesis. Part (2) should be compared with the feedback system identification results in Chap. 10 of Caines [1988], where stronger conditions are required as the dynamics of the feedback loop are not assumed known in the most general case considered there. We also remark that the singular spectral distribution hypothesis for v in INP B above can be relaxed in both Theorem 3.1 and Theorem 3.2. For Theorem 3.1 the proof becomes a little more elaborate than that given here and for Theorem 3.2 the proof statement is identical.
Proof. Since systems satisfying ARMAX or SSX are asymptotically stable, and since w (under INP A) and (w, v) (under INP B) are Gaussian processes, the limits below exist and

    lim_{k→∞} E z_{k+1}^{k+N}(θ) z_{k+1}^{k+N T}(θ) − lim_{k→∞} E z_{k+1}^{k+N}(θ′) z_{k+1}^{k+N T}(θ′) = 0.

But the asymptotic stability of systems in 𝒮(p, m, δ) and INP A(1)-A(2) then imply
imply
(i)
<XJ
<XJ
i=O
i=O
(ii)
For the first equation of the pair above we shall invoke the (order 2(5)
persistent excitation (3.1a) and use an argument based on Hankel matrix
realization theory.
Put Y_τ = Σ_{i=0}^∞ Z_{θ,i} M_{τ−i}, τ ∈ ℤ_+. Then we have the infinite matrix equation

    [Z_0 Z_1 Z_2 ⋯] M_{2δ}^{2δ} = [Y_0 Y_1 ⋯ Y_{2δ}],

where M_{2δ}^{2δ} denotes the semi-infinite block Toeplitz matrix with i, j-th block M_{i−j}, and where we have suppressed mention of θ for simplicity of notation.
By the persistent excitation condition INP A(4), M_{2δ}^{2δ} > 0; so there exists a semi-infinite lower triangular matrix S with m(2δ + 1) × m(2δ + 1) identity
matrices on the diagonal and all other non-zero entries in the left-most m(2δ + 1) block column, such that

    [Z_0 Z_1 Z_2 ⋯] S^{−1} S M_{2δ}^{2δ} = [Y_0 Y_1 ⋯ Y_{2δ}],

which gives

    [[Z_0 Z_1 Z_2 ⋯] S^{−1}]_{2δ} = [Y_0 Y_1 ⋯ Y_{2δ}] [M_{2δ}^{2δ}]^{−1}.
Let Q denote the semi-infinite matrix with (i) I_{m(2δ+1)} in the top left position, (ii) below each block diagonal matrix I_m, a block column [a_0^{−1}a_1 I_m, a_0^{−1}a_2 I_m, ..., a_0^{−1}a_q I_m]^T corresponding to the linear relation Σ_{i=0}^q a_i Z_{j+i} = 0, q ≤ δ, a_0 ≠ 0 (which holds for all j ∈ ℤ_1 by virtue of the hypothesis δ(Z(z)) = δ and the Cayley-Hamilton Theorem), and (iii) zeros in all other entries; explicitly:
and

    H = [ Z_2 ⋯ Z_δ Z_{δ+1} ; Z_3 ⋯ Z_{δ+1} Z_{δ+2} ; ⋮ ; Z_{δ+1} ⋯ Z_{2δ−1} Z_{2δ} ],

    Σ_{i=1}^∞ Z_i M_{−i}.
However (ii), together with the uniqueness of the admissible spectral factors parameterized by Θ, yields

    W_θ(z) = W_θ′(z),   Σ_θ = Σ_θ′.
Part 2. Let us assume hypotheses INP B hold. Then the mean value of the process (y, u) converges to zero as N → ∞ and the limiting Gaussian distribution of the process is characterized by its spectral distribution matrix. We shall show that under INP B this spectral distribution is in one-to-one relation with the elements of Θ.

Now the limiting steady state behavior of (y, u) can be realized at any (and hence all) finite time instants by taking a suitable (Gaussian) probability distribution on the initial conditions. We shall assume this change of time origin from infinity to the finite part of the time axis ℤ has been carried out. In this case we may write the input-output equations of the system (1.6) in the form

    y(z) = Z(z)u(z) + W(z)w(z)   (W(0) = I)
    u(z) = L(z)w(z) + v(z)   (L(0) = 0)
and hence the spectral distribution matrix of (y, u) takes the form

    dF^{(y,u)}(γ) = … + [ Z(e^{iγ}) dF_v̄(γ) Z^T(e^{−iγ})   Z(e^{iγ}) dF_v̄(γ) ; dF_v̄(γ) Z^T(e^{−iγ})   dF_v̄(γ) ].
    x_{k+1} = F_ψ x_k + G_ψ^u u_k + G_ψ^w w_k,   (1.6b(i))
    y_k = H_ψ x_k + D_ψ^u u_k + w_k.   (1.6b(ii))
Part 1. Let the initial conditions for the system and the system inputs u and w be defined on the probability space (Ω, ℬ, P) and let one of the two alternative sets of assumptions INP A or INP B hold.

When the input hypotheses INP A are in force the observations on a system in 𝒮(p, m, δ) consist of the entire sample path on ℤ of the deterministic observed process u and the sample path of y on ℤ_1.

When the input hypotheses INP B are in force the observations on a system in 𝒮(p, m, δ) consist of the sample paths of (y, u) on ℤ_1.

For a system in 𝒮(p, m, δ), parameterized by θ ∈ Θ, generating the observed process subject to the assumptions INP A or INP B, let the maximum likelihood estimate θ̂_N of θ be given by a (uniquely specified) minimizing argument of L_N(y_1^N; u, θ), or L_N(y_1^N, u_1^N; θ), respectively, over θ ∈ C, where C is a compact subset of Θ containing θ.

Then θ̂_N is strongly consistent, that is

    θ̂_N → θ   a.s.
Part 2. Let the hypotheses INP A or INP B hold. In addition, assume the parameter θ lies in the interior C° of C.
as N → ∞.
    H_{ij}(θ) = ∂²L(θ)/∂θ_i ∂θ_j,   1 ≤ i, j ≤ η,   (3.2a)

when INP A holds, and

    H_{ij}(θ) = ∂²L(θ)/∂θ_i ∂θ_j,   1 ≤ i, j ≤ η,   (3.2b)
when INP B holds, where the indicated matrix inverse and second partial differentials exist. Here H is block diagonal under both input hypotheses A and B, with blocks corresponding to Σ and ψ, where
    Φ(θ) ≜ (1/2π) ∫_0^{2π} W_θ^{−1}(e^{iγ}) W(e^{iγ}) Σ(θ) W^T(e^{−iγ}) W_θ^{−T}(e^{−iγ}) dγ,   (3.3a)

    … − U(e^{−iγ}) Z̃_θ^T(e^{−iγ})] W_θ^{−T}(e^{−iγ})} dγ,   (3.3b)

and

    Ω(θ) ≜ (1/2π) ∫_0^{2π} W_θ^{−1}(e^{iγ}) Z̃_θ(e^{iγ}) dF(e^{iγ}) Z̃_θ^T(e^{−iγ}) W_θ^{−T}(e^{−iγ}),   (3.3c)
where

    Z_θ(z) = A_θ^{−1}(z)B_θ(z),   W_θ(z) = A_θ^{−1}(z)C_θ(z),   and   Z̃_θ(z) ≜ Z_θ(z) − Z(z),   (3.3d)

and where F is the non-decreasing function on [0, 2π] given by F_u under INP A and F_v̄ under INP B such that

    M_k = (1/2π) ∫_0^{2π} e^{−ikγ} dF(γ),   k ∈ ℤ.
Then the covariance matrix

    H^{−1}(θ) [ P(θ)  Q(θ) ; Q^T(θ)  R(θ) ] H^{−1}(θ)

of the limiting distribution of √N(θ̂_N − θ) equals 2H^{−1}(θ).
The entries in the matrix P(θ) under input hypotheses A and B are given by

    P_{ij} = 2 Tr[ Σ^{−1}(θ) (∂Σ(θ)/∂θ_i)|_{θ=θ} Σ^{−1}(θ) (∂Σ(θ)/∂θ_j)|_{θ=θ} ],   1 ≤ i, j ≤ p(p+1)/2.   (3.4i)
The entries of the matrix R under input hypothesis A are given by

    R_{ij} = 4 Tr[ M_{ij}(θ) Σ^{−1}(θ) ] + …,   (3.4ii)

where

    M_{ij}(θ) ≜ …,   (3.4iii)

where u_k^{(i)}(θ) denotes Σ_{j=0}^∞ (∂/∂θ_i) K_j(θ) u_{k−j}, when K_j(θ) is the coefficient matrix of z^j appearing in W_θ^{−1}(z) Z̃_θ(z). (The first summand in (3.4ii) also appears as the first summand in (3.4iv) below where it is given by its integral representation with u^∞ replacing u.) Under input hypotheses B the entries of the matrix R are given by

    R_{ij} = (4/2π) ∫_0^{2π} Tr{ [ W_θ^{−1} (∂/∂θ_i) … ] [ … (∂/∂θ_j) … ]^T } dγ.   (3.4iv)
Outline of Proof
We begin by defining the process v^∞(θ), which is the prediction error process generated by the predictor for y_{k+1} based upon y_1^k, u_1^k using the parameter θ ∈ C. v^∞(θ) is given by

    v^∞(θ)(z) = C_θ^{−1}(z)A_θ(z)[(A_ψ^{−1}(z)B_ψ(z) − A_θ^{−1}(z)B_θ(z))u(z) + ⋯].

Our first objective is to study the asymptotic properties of the steady state version L̄_N(θ) of L_N(y_1^N; u, θ), N ∈ ℤ_1, where we recall

    L̄_N(θ) = (1/N) Σ_{i=1}^N { log det Σ(θ) + ‖v_i^∞(θ)‖²_{[Σ(θ)]^{−1}} }   (3.5)

over Θ. Then
Lemma 3.1. Subject to the general hypotheses of the theorem and subject to hypotheses INP A(1)-A(3),

    L̄_N(θ) → L(θ)   (3.6)

as N → ∞ for all θ ∈ Θ; moreover this convergence is uniform over any compact set D ⊂ Θ.

In (3.6) the function L(θ) is given by

    L(θ) = log(det Σ(θ)) + Tr[[Φ(θ) + Ω(θ)] Σ^{−1}(θ)],   (3.7)(i)
where

    Φ_θ ≜ E η_0(θ) η_0^T(θ) = (1/2π) ∫_0^{2π} W_θ^{−1}(e^{iγ}) W(e^{iγ}) Σ(θ̄) W^T(e^{−iγ}) W_θ^{−T}(e^{−iγ}) dγ,   (3.7)(ii)

    Ω_θ ≜ (1/2π) ∫_0^{2π} W_θ^{−1}(e^{iγ}) Z̃_θ(e^{iγ}) dF_u(e^{iγ}) Z̃_θ^T(e^{−iγ}) W_θ^{−T}(e^{−iγ}),   (3.7)(iii)
and
(3.7)(iv)
and where F_u(·) is defined in the theorem statement.
(We remark that F_u exists since, for all N, the Nm × Nm matrix with i, j-th block entry M_{i−j} is positive and hence we may apply the theorem of Herglotz to produce the required function.)
The next fact is that the exact likelihood function converges as described in

Lemma 3.2. Subject to the general hypotheses of the theorem and subject to the input hypotheses INP A(1)-A(3),

    L_N(y_1^N; u, θ) − L̄_N(θ) → 0   a.s.   (3.8)

as N → ∞ for all θ ∈ Θ and uniformly over any compact set D ⊂ Θ.
By the definition of θ̂_N,

    L_N(y_1^N; u, θ̂_N) ≤ L_N(y_1^N; u, θ)   a.s.   (3.9)

for each N ∈ ℤ_1 and all θ ∈ C. Now C is a compact set so it is sequentially compact and the sequence {θ̂_N; N ∈ ℤ_1} has a convergent subsequence {θ̂_{N_M}; M ∈ ℤ_1} such that θ̂_{N_M} → θ* as M → ∞ in the topology of Θ. Further, we observe that θ* is a ℬ-measurable Θ-valued random variable.
The next step is to insert θ* in the left hand side of (3.9), and take limits along the sequence {N_M; M ∈ ℤ_1}. Then by Lemmas 3.1 and 3.2 we obtain

    L(θ*) ≤ L(θ)   a.s.
Lemma 3.3. Under the general hypotheses of the theorem and the input hypotheses INP A(1)-A(4) we have

    L(θ) ≥ L(θ̄) = log(det Σ(θ̄)) + p   a.s.,   ∀ θ ∈ Θ.   (3.10)

Writing L(θ) in the form

    L(θ) = log(det Σ(θ)) + Tr[[Φ(θ) + Ω(θ)] Σ^{−1}(θ)],   (3.11)

we note that for X = X^T > 0 and Q = Q^T > 0 the inequality

    log det X + Tr[Q X^{−1}] ≥ log det Q + p   (3.12)

holds with the lower bound on the right-hand side of (3.12) attained only at X = Q.
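The matrix inequality (3.12) can be spot-checked numerically; the following sketch (our own, not from the text) verifies log det X + Tr[QX^{−1}] ≥ log det Q + p on random symmetric positive definite matrices, with equality at X = Q:

```python
import numpy as np

def lhs(X, Q):
    """log det X + Tr[Q X^{-1}] for symmetric positive definite X, Q."""
    return np.linalg.slogdet(X)[1] + np.trace(Q @ np.linalg.inv(X))

def rhs(Q):
    """log det Q + p: the lower bound, attained only at X = Q."""
    return np.linalg.slogdet(Q)[1] + Q.shape[0]

def random_spd(p, rng):
    """A random strictly positive definite p x p matrix."""
    A = rng.standard_normal((p, p))
    return A @ A.T + p * np.eye(p)

rng = np.random.default_rng(3)
Q = random_spd(4, rng)
gaps = [lhs(random_spd(4, rng), Q) - rhs(Q) for _ in range(200)]
```

Every gap is nonnegative and lhs(Q, Q) agrees with rhs(Q) to rounding error; this is the inequality that pins down the minimizing Σ(θ) in the lemma.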
Identifying terms between (3.11) and (3.12), we obtain the lower bound

    log(det[Φ_ψ(θ) + Ω_ψ(θ)]) + p   (3.13)

for (3.11). Now the function log(·) is strictly monotone increasing on ℝ_1 and det A > det B for A ≠ B, A = A^T ≥ B = B^T > 0. (This follows since det(X + I) > det I when X = X^T ≥ 0 and X ≠ 0.) Consequently we may obtain a unique global minimum for (3.13) at ψ = ψ̄ if we can show that Φ_ψ(θ) + Ω_ψ(θ) has a unique strictly positive global minimum (as a positive matrix) at ψ = ψ̄, that is, Φ_ψ(θ) + Ω_ψ(θ) ≥ Φ_ψ̄(θ) > 0 with equality only if ψ = ψ̄. We do this by showing Φ_ψ(θ) ≥ Φ_ψ̄(θ) = Σ(θ̄) > 0 and Ω_ψ(θ) ≥ Ω_ψ̄(θ) = 0 with equality in both only if ψ = ψ̄.
    I + P(θ, z) = I + Σ_{j=1}^∞ P_j(θ) z^j,   I + Q(θ, z) = I + Σ_{j=1}^∞ Q_j(θ) z^j,
(3.14)
Now, since Σ(θ̄) > 0 for all θ ∈ Θ, equality holds in (3.15) if and only if W_θ^{−1}(z)W(z) = I a.e. on T. We conclude that Φ(θ) is minimized as a positive matrix if and only if
    W_θ(z) = W(z),   (3.16)

in which case Φ(θ) = Σ(θ̄).
It is clear from the definition of Ω_θ in (3.7)(iii) and (3.7)(iv) that Ω(θ) ≥ Ω(θ̄) = 0 for all θ ∈ Θ. We want to show that Ω(θ) = 0 only if Z_θ(z) = Z(z). Let us write Z̃_θ(z) in the left coprime factorization (m.f.d.) form F_θ^{−1}(z)G_θ(z), where F_θ(z) and G_θ(z) are polynomial matrices of order less than or equal to 2δ.
Now since W_θ^{−1} F_θ^{−1} G_θ dF_u G_θ^* F_θ^{−*} W_θ^{−*} ≥ 0, Ω(θ) = 0 implies

    0 = (1/2π) ∫_0^{2π} G_θ(e^{iλ}) dF_u(e^{iλ}) G_θ^T(e^{−iλ}) = Σ_{i,j=0}^{2δ} G_i(θ) M_{j−i} G_j^T(θ),

where

    G_θ(z) ≜ Σ_{i=0}^{2δ} G_i(θ) z^i.   (3.17)
But then the hypothesis that u is persistently exciting of order 2δ (INP A(4)) yields G_θ(z) = 0, and so

    Z_θ(z) = Z(z).   (3.18)
But again using the fact that log det X + Tr QX^{−1} ≥ log det Q + p (for X = X^T > 0 and Q = Q^T > 0) with equality only if X = Q, we obtain Σ(θ) = Σ(θ̄). Hence θ = θ̄ is the unique globally minimizing parameter of L(θ) over Θ.
Observe that the persistent excitation (rank 2δ) hypothesis of INP A(4) has played a key role in the proof of this lemma, but it was not necessary to explicitly prove asymptotic identifiability of the parameterization of Θ.
(3.19)
for θ̄_N^i an interior point of the line segment [θ̂_N, θ], for all N sufficiently large, where the differential ∂²/∂θ ∂θ_i yields a row vector.
We observe that the definition of L(θ) in (3.11) and the analyticity of the parameterization imply that all derivatives required in this section exist and are continuous.
Now in analogy with Lemma 3.1 we analyze the behavior of the relation
(3.19). We begin with the statement of the following lemma:
Lemma 3.4. Subject to the general hypotheses of the theorem and hypotheses INP A(1)-A(3), we have

    ∂²L̄_N(θ)/∂θ_i ∂θ_j → ∂²L(θ)/∂θ_i ∂θ_j,   1 ≤ i, j ≤ η,   a.s.,   (3.20)

and where Φ(θ) and Ω(θ) are defined in the statement of the theorem.
Lemma 3.5. Subject to the general hypotheses of the theorem and the hypotheses INP A(1)-A(4), the (η × η) matrix H(θ) is non-singular and block diagonal with (η′ × η′) and (η − η′) × (η − η′) blocks on the diagonal, where the entries in these blocks are given in formulae (3.2) and (3.3) in the theorem statement. □
Lemma 3.6. Subject to the general hypotheses of the theorem and hypotheses INP A(1)-A(3), we have

    ∂²L_N(y_1^N; u, θ)/∂θ_i ∂θ_j → ∂²L̄_N(θ)/∂θ_i ∂θ_j,   1 ≤ i, j ≤ η,   a.s.
The convergence of the matrix term on the right-hand side of (3.19) and an
asymptotic analysis of the explicit formulae
(3.21a)
and
(3.21b)
then permits one to establish the formulae (3.4(i)-(iii)) in the theorem statement.
The plan of the proof under the feedback hypotheses INP B(1)-B(4) is
similar.
0
Concluding Remarks
In this paper we have given a principal maximum likelihood estimation result
for finite dimensional linear systems driven by an orthogonal Gaussian
disturbance process and by deterministic or feedback control inputs. General
ASM-PEM methods include these results as particular cases, but the stronger
hypotheses of Theorem 3.2 above yield the more specific and explicit results
given in its statement.
References
Bokor, J. and L. Keviczky, ARMA Canonical Forms Obtained from Constructibility Invariants, Int J Control, 45 (3), 861-873 (1987)
Brockett, R.W., Some Geometric Questions in the Theory of Linear Systems, IEEE Trans Auto Control, AC-21 (3), 449-455 (1976)
Byrnes, C., The Moduli Space for Linear Dynamical Systems, in the 1976 Ames Research Center (NASA) Conf on Geometric Control Theory, eds. C. Martin and R. Hermann, Vol VII, Lie Groups: History, Frontiers and Applications, Math Sci Press, Brookline, MA (1977)
Byrnes, C. and N.E. Hurt, On the Moduli of Linear Dynamical Systems, Stud Anal Adv Math Suppl Stud, 4, 83-122 (1978)
Caines, P.E., Linear Stochastic Systems, John Wiley, NYC (1988)
Caines, P.E. and J. Rissanen, Maximum Likelihood Estimation of Parameters in Multivariate Gaussian Stochastic Processes, IEEE Trans Inf Theory, IT-20 (1), 102-104 (1974)
Clark, J.M.C., The Consistent Selection of Local Coordinates in Linear System Identification, Joint Automatic Control Conference, Purdue Univ, Lafayette, Indiana, July (1976)
Deistler, M., The Properties of the Parameterization of ARMAX Systems and their Relevance for Structural Estimation, Econometrica, 51, 1187-1207 (1983)
Deistler, M. and M. Gevers, Properties of the Parameterization of Monic ARMA Systems, 8th IFAC/IFORS Symposium on Identification and System Parameter Estimation, Preprints 2, 1341-1348 (1988)
Delchamps, D.F., Global structure of families of multivariable linear systems with an application to identification, Math Systems Theory, 18, 329-380 (1985)
Delchamps, D.F., State Space and Input-Output Linear Systems, Springer-Verlag, NYC (1988)
Denham, M.J., Canonical forms for the identification of multivariable linear systems, IEEE Trans Autom Control, AC-19, 646-656 (1974)
422
P. E. Caines
Dickinson, B.W., T. Kailath and M. Morf, Canonical matrix fraction and state-space descriptions
for deterministic and stoehastic linear systems. IEEE Trans Autom Control, AC-19, 656-667
(1974)
Dunsmuir, W., A Central Limit Theorem for Parameter Estimation in Stationary Vector Time
Series and its Application to Models for a Signal Observed in Noise, Ann Stat, 7 (3), 490-506
(1979)
Forney, D., Minimal bases of rational vector spaces, with applications to multivariable linear
systems, SIAM J Control, 13,493-520 (1975)
Gevers, M. and V. Wertz, Uniquely identifiable state-space and ARMA parameterizations for
multivariable linear systems, Automatica, 20 (3), 333-347 (1984)
Glover, K. and J.C. Willems, Parameterizations of linear dynamical system: Canonical forms and
identifiability, IEEE Trans Automat Contr, Vol AC-19, 640-646 (1974)
Hannan, E.J., The Statistical Theory of Linear Systems, Chapter 2 in Developments in Statistics,
edited by P. Krishnaiah, Academie Press, N.Y., 83-121 (1979)
Hannan, E.J. and M. Deistier, The Statistical Theory of Linear Systems, John Wiley, NYC (1988)
Hanzon, B., Identifiability, Recursive Identification and Spaces of Linear Dynamical Systems, Ph.D.
Thesis, University of Grningen, Grningen, (1986).
Hazewinkel, M. and R.E. Kaiman, On Invariants, Canonical Forms and Moduli for Linear,
Constant, Finite Dimensional Dynamical Systems, Mathematical Systems Theory, Udine 1975,
Spinger-Verlag, NYC, 48-60 (1976)
Heymann, M., Structure and Realization Problems in the Theory of Dynamical Systems, Springer,
CISM Courses and Leetures No. 204, NY (1975)
Kaiman, R.E., Mathematieal Description of Linear Dynamical Systems, SI AM J Control, 1, 152-192
(1963)
Kaiman, R.E., Irreducible Realizations and the Degree of a Rational Matrix, SIAM J Appl Math,
13, 520-544 (1965)
Kaiman, R.E., Aigebraic Geometrie Deseription of the Class of Linear Systems of Constant
Dimension, 8th Annual Princeton Conference on Information Sciences as Systems, Princeton, NJ
(1974)
Kaiman, R.E., Identifiability and Problems of Model Selection in Eeonometrics, 4th World Congress
ofthe Econometric Society, Aix-en-Provence, August (1980)
Ljung, L., System Identification: Theory for the User, Prentice Hall, New Jersey, (1987)
Rissanen I., Basis of Invariants and Canonical Forms for Linear Dynamic Systems, Automatica,
10, 175-182 (1974)
Rissanen I., Stochastic Complexity in Statistical Inquiry, World Scientifie, Singapore, (1989)
Rissanen, I. and P.E. Caines, The Strong Consistency ofMaximum Likelihood Estimators of ARMA
Proeesses, The Annals of Statistics, 7 (2), 297-315 (1979)
Rissanen, I. and L. Ljung. Estimation of Optimum Structures and Parameters for Linear Systems
Mathematical Systems Theory, Udine, 92-110 (1975)
Segal, G., The Topology of Spaces of Rational Functions, Acta Mathematica, 143,39-72 (1979)
Van Overbeck, A.J.M., and L. Ljung, On Line Structure Selection for Multivariable State Spaee
Models, Automatica, 18 (5), 529-543 (1982)
Willems, I.C., From Time Series to Linear Systems, Automatica, Part 1,22 (5), 561-580, Part 11,
22 (6), 675-694 (1986); Part III 23 (1), 87-115 (1987)
Introduction
In identification of linear systems the "main stream" approach to noise modeling is to add all noise to the outputs (assuming orthogonality), or to the equations (which is the same for our analysis). In econometrics these models are named errors-in-equations models. Here we are concerned with the case where in principle all variables may be contaminated by noise. Such models are called errors-in-variables (EV) or latent variables models, or, using a slightly different but equivalent formulation, factor models. Whereas in the errors-in-equations approach the deterministic system is embedded into its stochastic environment in an asymmetric way, EV modeling is (in principle) more general and corresponds to symmetric noise modeling. The asymmetry of errors-in-equations modeling can be justified in many situations, in particular in prediction; however, there are a number of cases where this asymmetry is not appropriate and leads to "prejudiced" results. The symmetric EV modeling is appropriate, for instance:
(i) If we are interested in the true system generating the data (rather than in prediction or in encoding the data by system parameters) and we cannot be sure a priori that the observed inputs are not corrupted by noise.
(ii) If we want to approximate a high dimensional data vector by a relatively small number of factors.
(iii) If we have no sufficient a priori information about the number of equations in the system or about the classification of the variables into inputs and outputs; then we have to perform a more symmetric system modeling, which in turn demands a more symmetric noise model.
The statistical analysis of EV models has a long history in econometrics, psychometrics and statistics (see, e.g. Frisch 1934; Koopmans 1937; Bekker and de Leeuw 1987). Recently there has been a resurging interest in such models; these models also attracted attention in system and control theory. Partly this

Support by the Austrian "Fonds zur Förderung der wissenschaftlichen Forschung" "Schwerpunkt Angewandte Mathematik" (S32/02) is gratefully acknowledged.
    w(z)ẑ_t = 0    (1.1)

where

    w(z) = Σ_{j=-∞}^{∞} w_j z^j;   w_j ∈ ℝ^{m×n}    (1.2)
We will call w(z) the relation function of the exact relation (1.1) (compare Willems 1986). Clearly, systems of the form (1.1) are symmetric in the sense that we need no a priori classification of the variables ẑ_t into inputs and outputs and no a priori information about causality directions; without restriction of generality we will assume that m ≤ n holds and that w(z) contains no linearly dependent rows; also, in general m is not known a priori.
The observed variables are of the form

    z_t = ẑ_t + u_t    (1.3)
general every system would be compatible with the second moments of the observations; thus some additional assumptions have to be imposed. By assumption (iv) the common effects are attributed to the system and the individual effects to the noise. Clearly (iv) cannot be justified universally; in other words, it will be a "prejudice" in a number of applications. However, it is a reasonable assumption for a sufficiently large class of cases.
In our analysis the frequency λ will be kept fixed. In this sense Σ, Σ̂ and D are considered as (constant) Hermitian matrices rather than as spectral densities. From (1.3) we have

    Σ(λ) = Σ̂(λ) + D(λ)    (1.4)

Clearly (1.4) may also be interpreted as coming from a (static) relation between ℂⁿ-valued random variables z, ẑ and u:

    z = ẑ + u;   wẑ = 0;   Σ = Ezz*;   Σ̂ = Eẑẑ*;   D = Euu*    (1.5)
In this paper we will analyse the relation between the second moments Σ of the observations and the system and noise characteristics w(z) and D. Such an analysis is a necessary first step for an analysis of the properties of estimation and inference procedures. The main questions are (compare Deistler and Anderson 1989):
(a) Find the maximum number, m* say, of (linearly independent) rows of w(z) among the set of all w(z) compatible with given Σ. Sometimes we also use the symbol m_c(Σ) for m* if we want to make the dependence on Σ explicit.
(b) Give a description of the set of all (w(z), D) compatible with given Σ; in addition, describe the subsets corresponding to different numbers of linear relations m.
(c) Describe the set of all Σ corresponding to a given m*, n > m* ≥ 1.
Thus the problems we consider are (a) to find the (maximum) number of equations for given Σ, (b) to describe the set of all observationally equivalent (based on second moments only) signal and noise characteristics, and (c) to describe the set of spectral densities corresponding to a given m*. There is no general solution available for these problems up to now.
In this paper, we focus on the case of general n and m* = 1 [for n general and m* = n − 1 see Anderson and Deistler (1990)]. For the static case, where ẑ_t and u_t are (real) white noise processes (and thus Σ(λ) and Σ̂(λ) are constant with real entries) and w(z) is constant with real entries, this problem has a long history; see, e.g. Frisch (1934), Koopmans (1937), and Kalman (1982), (1983). The dynamic case turns out to be significantly more complicated. The cases n = 2 and n = 3 have been treated in detail in Anderson and Deistler (1984), (1987).
The paper is organized as follows. In Sects. 2 and 3 we are concerned with question (b) and, to a small extent, also with question (a). The set of all rows of
    x(Σ − D) = 0    (2.1)

The set of all solutions corresponding to a given Σ is called the solution set L (of Σ); sometimes we also use the notation L_Σ. Analogously we define 𝒟 as the set of all feasible matrices D corresponding to Σ. Since L is the union of linear spaces of dimension greater than zero, we may find a normalization useful. In most parts of the paper, the first component x₁ of x is normalized to one.
Let us define the matrix S = (s_ij s_ii^{-1}), i, j = 1, …, n, and let s_j denote the j-th row of S.
Proof. The proof is straightforward: it can be seen by projecting the n-th component of ẑ (see 1.5) on the linear space spanned by the other components of ẑ. □

Lemma 2. Let D = diag{d_ii} be feasible and let d_ii^{(i)} correspond to the i-th elementary regression. Then

    (2.5)

    (2.6)

holds. The expression is zero for d_11 = d_11^{(1)}, and hence is negative for d_11 > d_11^{(1)}.
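The objects in Lemma 2 can be checked numerically. The sketch below (plain NumPy; the matrix Σ is a randomly generated stand-in for real second moments, and the function name is ours) computes the elementary-regression variance d_ii^{(i)} by projecting the i-th component on the others, verifies the classical identity d_ii^{(i)} = 1/(Σ⁻¹)_ii, and confirms that Σ − diag{0, …, d_ii^{(i)}, …, 0} is singular but still positive semidefinite, i.e. that each elementary regression yields a feasible noise matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)      # a positive definite stand-in for Sigma
n = Sigma.shape[0]

def elementary_regression_variance(Sigma, i):
    """Residual variance d_ii^(i) of regressing component i on the others."""
    idx = [j for j in range(Sigma.shape[0]) if j != i]
    S_oo = Sigma[np.ix_(idx, idx)]   # covariance of the remaining components
    S_io = Sigma[i, idx]             # cross covariances with component i
    return Sigma[i, i] - S_io @ np.linalg.solve(S_oo, S_io)

for i in range(n):
    d = elementary_regression_variance(Sigma, i)
    # Equivalent closed form via the inverse covariance matrix.
    assert np.isclose(d, 1.0 / np.linalg.inv(Sigma)[i, i])
    D = np.zeros((n, n))
    D[i, i] = d
    eigs = np.linalg.eigvalsh(Sigma - D)   # eigenvalues in ascending order
    assert eigs[0] > -1e-9                 # Sigma - D stays >= 0 ...
    assert abs(eigs[0]) < 1e-8             # ... and becomes singular
print("all", n, "elementary regressions are feasible")
```

The identity d_ii^{(i)} = 1/(Σ⁻¹)_ii is the standard Schur-complement formula for the residual variance of a regression.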
    x(Σ − D) = 0,   x ∈ L, D ∈ 𝒟    (2.7)
Let us consider the case m_c(Σ) = 1 in more detail now. We investigate the solution set and the set of all feasible matrices D.

Lemma 3. Let m_c(Σ) = 1. Then every component of every x ∈ L, x ≠ 0, is unequal to zero.

Proof. We give a proof by contradiction. If, e.g., x₁ = 0 holds for x ∈ L, x ≠ 0, then as seen from (2.7) we may put d_11 = 0 and D remains feasible. Also the last n − 1 rows of Σ − D are clearly linearly dependent. Then the first row of Σ̃ = (Σ − D) is linearly independent from the other rows of Σ̃, since otherwise m_c(Σ̃) > 1 would hold. Now performing the first elementary regression for Σ̃ (not Σ), using an evident notation, corresponds to determining s̃₁ and D̃ so that

    s̃₁Σ̃ = s̃₁D̃,

where

    D̃ = diag{d̃_11, 0, …, 0}

and 0 ≤ D̃ ≤ Σ̃. From

    Σ̃ ≥ Σ̃ − D̃ = Σ − (D + D̃) ≥ 0

we see that D + D̃ is feasible; Lemma 1 then implies rk(Σ − (D + D̃)) < rk Σ̃ and thus m_c(Σ) > 1, a contradiction. □

Therefore, in the case m_c(Σ) = 1, the normalization x₁ = 1 is no restriction of generality and we can consider the (normalized) solution set

    L̄ = {x̄ | (1, x̄) ∈ L}
The relation between L̄ and 𝒟 (remember that Σ is kept fixed) then is of the form

    (2.8)

where

    Σ = ( σ_11  Σ_12 ; Σ′_12  Σ_22 ),   D = ( d_11  0 ; 0  D_22 )    (2.9)

    (2.10)

and

    σ_11 + x̄Σ′_12 = d_11    (2.11)
Theorem 2.1
(a) m_c(Σ) = 1 if and only if no x ∈ L, x ≠ 0, has a zero entry.
(b) For m_c(Σ) = 1, the relation between L̄ and 𝒟 defined by (2.9)-(2.11) is a homeomorphism.
(c) For m_c(Σ) = 1, L̄ and 𝒟 are compact. If we consider L̄ as a subset of ℝ^{2(n−1)} ≅ ℂ^{n−1}, then L̄ is of real dimension n − 1.

Proof
(a) One part is just Lemma 3. Conversely, if m_c(Σ) > 1 holds, then L contains at least one linear subspace of dimension greater than one, and in this subspace clearly there is an element x ≠ 0 with one zero component.
(b) As has been shown already, the relation is a bijection. The continuity in both directions is easily seen from (2.9)-(2.11).
(c) Clearly, 𝒟 is bounded. Let Σ̂_n = Σ − D_n, D_n ∈ 𝒟, be a convergent sequence with limit Σ̂. We then have Σ ≥ Σ̂ ≥ 0, and Σ̂ is singular since det Σ̂_n = 0 holds and the determinant is a continuous function of its entries; therefore Σ̂ is feasible. Thus every convergent sequence D_n ∈ 𝒟 takes its limit D in 𝒟; in other words, 𝒟 is closed. Since the image of a compact set under a continuous mapping is compact, L̄ is compact by (b). As is seen from (2.9) and (2.11), for given d_22, …, d_nn (and given Σ of course), x̄ and thus d_11 are uniquely determined. Note that D = λ_min(Σ)·I, where λ_min(A) is used to denote the smallest eigenvalue of A ≥ 0, is always feasible. This follows from

    (2.12)
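The last claim, that D = λ_min(Σ)·I is always feasible, is easy to confirm numerically (a sketch with a randomly generated Σ standing in for real data):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + np.eye(5)             # a positive definite stand-in for Sigma

lam_min = np.linalg.eigvalsh(Sigma)[0]  # smallest eigenvalue of Sigma
D = lam_min * np.eye(5)                 # the candidate noise covariance

eigs = np.linalg.eigvalsh(Sigma - D)    # spectrum of Sigma shifted down by lam_min
assert eigs[0] > -1e-10                 # Sigma - D >= 0: D is feasible
assert abs(eigs[0]) < 1e-9              # Sigma - D is singular, so the rank drops
print("D = lambda_min * I is feasible")
```

Subtracting λ_min·I shifts every eigenvalue of Σ down by λ_min, so the smallest eigenvalue of Σ − D is exactly zero: the residual is positive semidefinite and singular.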
Remark 1. Note that for the static case a significantly more far-reaching result is available; see Frisch (1934), Koopmans (1937) and Kalman (1982). In this
    ax + (1 − a)y,   a ∈ ℂ    (3.1)

    (ax + (1 − a)y)Σ = (ax + (1 − a)y)D = axD_x + (1 − a)yD_y    (3.2)

    (3.3)
which gives

    d_jj = d_jj^{(j)} (1 + (a/(1 − a)) · (s_jl/s_jj))    (3.4)

By Lemma 2, d_jj^{(j)} ≤ d_jj must hold for every feasible D. Also note that s_jl/s_jj > 0; thus (3.3) and (3.4) imply a ∈ [0, 1]. Put

    d_ii = 0,   i ≠ l, i ≠ j;    (3.5)

then such a prescription for D satisfies (3.2) and D ≥ 0 for every a ∈ [0, 1]. In order to show that a D given by (3.3)-(3.5) is feasible for every a ∈ [0, 1], it remains to show that Σ − D ≥ 0 holds. Note that for the j-th elementary regression d_jj^{(j)} is the unique solution of the equation

    (3.6)

in the variable d_jj ∈ ℝ. This is a direct consequence of the fact that (3.6) is a linear equation with a positive coefficient for (σ_jj − d_jj) (compare 2.6). Now, performing the j-th elementary regression for Σ − aD_1, a ∈ (0, 1), we see that the corresponding noise covariance matrix is diag{0, …, d_jj, 0, …, 0} with d_jj given by (3.4), and thus Σ − D = (Σ − aD_1) − diag{0, …, d_jj, 0, …, 0} ≥ 0.
Let us consider

    (3.7)

    (3.8)

    (3.9)

for the j-th equation. Again we put d_ii = 0, i ≠ l, i ≠ j. Equations (3.2) then are equivalent to

    ((1/ā) − 1) · (s_jl/s_ll) ≥ 0,

which in turn is equivalent to

    arg((1/a) − 1) = arg(s_jl/s_ll)    (3.10)
Fig. 1 (sketch in the complex a-plane; cases (1) and (2))
Using the same argument as above, we can show that Σ − D ≥ 0 holds. Now let us discuss condition (3.10). We may distinguish between three cases.
Proof. We have to show that for every Σ ∈ Δ₁ there is a neighborhood contained in Δ₁. If this were not true, there would be a sequence Σ_n converging to Σ with m_c(Σ_n) > 1 for all n. Thus, there would exist Σ̂_n feasible for Σ_n with rk Σ̂_n < n − 1. By Σ̂_n ≤ Σ_n and Σ_n → Σ, (Σ̂_n) is a bounded sequence and thus has at least one limit point, Σ̂ say. Since the determinant is a continuous function of the entries of a matrix, we then have rk Σ̂ < n − 1 and 0 ≤ Σ̂ ≤ Σ, and thus m_c(Σ) > 1, in contradiction to our assumption. □
From the idea of the proof above, we immediately obtain:

Corollary. Δ₁ ∪ Δ₂ ∪ ⋯ ∪ Δ_m, 1 ≤ m ≤ n − 1, is open.
Consider the function ι, defined on Δ₁, which attaches to every Σ the corresponding (normalized) solution set L̄ = L̄_Σ. For any two compact subsets, L, M say, of ℂ^{n−1}, a metric can be defined by

    d(L, M) = max{ sup_{x∈L} inf_{y∈M} ‖x − y‖,  sup_{y∈M} inf_{x∈L} ‖x − y‖ }    (4.1)

As the norm is a continuous function and the (normalized) solution sets are compact, we may replace the inf by min and the sup by max.
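The metric (4.1) is the Hausdorff distance between compact sets; for finite point sets, the case one meets numerically, the inf and sup are attained, as noted above. A minimal sketch:

```python
import numpy as np

def hausdorff(L, M):
    """Hausdorff distance (4.1) between two finite point sets (rows of L, M)."""
    L, M = np.atleast_2d(L), np.atleast_2d(M)
    d = np.linalg.norm(L[:, None, :] - M[None, :, :], axis=-1)  # all ||x - y||
    return max(d.min(axis=1).max(),   # sup over L of the distance to M
               d.min(axis=0).max())   # sup over M of the distance to L

L = np.array([[0.0, 0.0], [1.0, 0.0]])
M = np.array([[0.0, 0.5], [1.0, 0.0], [2.0, 0.0]])
print(hausdorff(L, M))  # -> 1.0, attained at the point (2, 0) of M
```

Both directed distances are needed: the point (2, 0) of M is far from L even though every point of L is within 0.5 of M.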
As we are in the case m* = 1, every principal minor up to size n − 1 of (Σ − D), D ∈ 𝒟, is strictly positive. Since these minors are continuous functions of D and since 𝒟 is compact by Theorem 2.1, all these principal minors are bounded away from zero by a positive constant. Furthermore, by the same type of argument we see that there is a compact neighborhood, V_δ = {A ≥ 0 | ∃D ∈ 𝒟 s.t. ‖A − (Σ − D)‖ ≤ δ}, of the set {Σ − D | D ∈ 𝒟}, where the same statement remains valid, i.e. there is a δ > 0 and a corresponding c > 0 such that all principal minors in the set V_δ are uniformly bounded away from zero by c. In the following we will assume that ε₁ > 0 is chosen sufficiently small so that we stay in V_δ in performing the subsequent steps.
Since Σ_n → Σ, for every ε₁ ∈ (0, 1) there is an n₀ such that

    (4.4)
Put D_n = (1 − ε₁)D. Then D_n ≥ 0 and

    2ε₁Σ + (1 − ε₁)Σ̂ = (1 + ε₁)Σ − (1 − ε₁)D ≥ Σ_n − D_n ≥ (1 − ε₁)Σ̂ ≥ 0    (4.5)

    (4.6)

    (4.7)
Proof. Since Σ_n converges to a singular matrix, the variances d_ii^{(i)} of the corresponding noise terms for the elementary regressions converge to zero. Thus, by Lemma 2, we have Σ̂_n → Σ for any Σ̂_n feasible for Σ_n. The rest is straightforward from the explicit expression for x in terms of Σ₁₂ and Σ₂₂. □
References

Anderson, B.D.O. and M. Deistler, 1984, Identifiability in dynamic errors-in-variables models, Journal of Time Series Analysis 5, 1-13
Anderson, B.D.O. and M. Deistler, 1987, Dynamic errors-in-variables systems with three variables, Automatica 23, 611-616
Anderson, B.D.O. and M. Deistler, 1990, Identification of dynamic systems from noisy data: The case m* = n − 1, Mimeo
Bekker, P. and J. de Leeuw, 1987, The rank of reduced dispersion matrices, Psychometrika 52, 125-135
Deistler, M., 1989, Symmetric modeling in system identification, in: H. Nijmeijer and J.M. Schumacher, eds., Three Decades of Mathematical System Theory, Springer Lecture Notes in Control and Information Sciences, no. 135, Springer-Verlag, Berlin
Deistler, M. and B.D.O. Anderson, 1989, Linear dynamic errors-in-variables models, some structure theory, Journal of Econometrics 41, 39-63
Dieudonné, J., 1969, Foundations of Modern Analysis, Academic Press, New York
Frisch, R., 1934, Statistical confluence analysis by means of complete regression systems, Publication no. 5, Economic Institute, University of Oslo, Oslo
Kalman, R.E., 1982, System identification from noisy data, in: A. Bednarek and L. Cesari, eds., Dynamic Systems II, a University of Florida international symposium, Academic Press, New York, NY
Kalman, R.E., 1983, Identifiability and modeling in econometrics, in: P.R. Krishnaiah, ed., Developments in Statistics, Vol. 4, Academic Press, New York, NY
Koopmans, T.C., 1937, Linear regression analysis of economic time series, Netherlands Economic Institute, Haarlem
Picci, G. and S. Pinzoni, 1986, Dynamic factor-analysis models for stationary processes, IMA Journal of Mathematical Control and Information 3, 185-210
Scherrer, W., M. Deistler, M. Kopel and W. Reitgruber, 1990, Solution sets for linear dynamic errors-in-variables models, Mimeo
Willems, J.C., 1986, From time series to linear systems, Part I, Automatica 22, 561-580
Adaptive Control*
K. J. Åström
Department of Automatic Control, Lund Institute of Technology, S-221 00 Lund, Sweden

This paper gives an overview of some ideas in adaptive control that originate from a paper published by Kalman in 1958. The key ideas are given using quotes from Kalman's paper. It is described how the ideas were developed to give practically useful adaptive controllers. The paper ends with a few personal remarks.
1 Introduction

Several schemes for adaptive control systems originate from the fifties. Stimulation came from two application areas: flight control systems and computerized process control. With the emergence of supersonic aircraft, which operated over a wide range of speeds and heights, it was found that controllers having constant gain were inadequate. An active program in adaptive control was initiated to find controllers that could cope with the situation (see [Gregory 1959]). Ideas like self-oscillating adaptive systems and model reference adaptive systems were explored, but the practical solution turned out to be gain scheduling. The possibility to use digital computers for process control also generated a lot of interest in the late fifties [Åström and Wittenmark 1990]. Before starting his graduate studies, Kalman worked with these problems in the Engineering Research Laboratory at DuPont. In connection with this, he suggested the adaptive control scheme which is now known as the self-tuning regulator. To describe the idea we quote from the paper [Kalman 1958]:

(i) The dynamic characteristics of the process are to be represented in the form of Difference Equation [6], the coefficients of which are to be computed from measurements. The number n = q is assumed arbitrarily. In general, the higher n, the more accurate the representation of the process by the Difference Equation [6].
(ii) The coefficients a_j and b_j should be determined anew at each sampling instant so as to minimize the weighted mean-square error E(N).
(iii) The calculations necessary for determining the coefficients consist of modifications of the classical least-squares filtering procedure and are given in the Appendix.
* This paper was written while the author was Visiting Professor at
(iv) The choice of an optimal controller is largely arbitrary, depending on what aspect of system response is to be optimized. The determination of the coefficients in the describing equations of the controller is a routine matter if the coefficients of the pulse-transfer function are known.

Only second order systems were considered, and the least squares algorithm was simplified in order to implement it. Kalman's paper also describes a special purpose computer that was used to implement the algorithm. The reasons for using a special purpose computer give an interesting perspective on the development of computer technology that has taken place since 1958. We quote:

As soon as the operations discussed in the foregoing sections have been reduced to a set of numerical calculations (see Appendix) the machine has been synthesized in principle. This means that any general-purpose digital computer can be programmed to act as the self-optimizing machine. In practical applications, however, a general-purpose digital computer is an expensive, bulky, extremely complex, and somewhat awkward piece of equipment. Moreover, the computational capabilities (speed, storage capacity, accuracy) of even the smaller commercially available general-purpose digital computers are considerably in excess of what is demanded in performing the computations listed in the Appendix.
For these reasons, a small special-purpose computer was constructed ...

This paper presents some of the developments of the basic idea that have occurred after the publication of Kalman's paper. Section 2 gives more details about adaptive algorithms. In particular, the notions of direct and indirect adaptive control are introduced. Adaptive systems are analysed in Sect. 3. It is interesting to note that the algorithm for recursive least squares estimation has the same structure as the Kalman filter. This can be exploited in analysis. Some unexpected properties of direct adaptive algorithms are investigated in Sect. 4. Industrial uses of the algorithm are briefly mentioned in Sect. 5. In the Conclusions, I give some historical remarks and acknowledge Kalman's impact on my own research.
2 Algorithms

Kalman's suggestion for an adaptive controller is captured by the block diagram given in Fig. 1. There are obviously many different choices of control and estimation algorithms, as was pointed out in quote (iv) in Sect. 1.
To be specific, assume that the process to be controlled can, with sufficient accuracy, be described by

    A(q⁻¹)y(t) = B(q⁻¹)u(t − d₀) + v(t)    (1)
Fig. 1 (block diagram: an Estimation block produces process parameters, a Design block computes controller parameters from them, and the Controller acts on the Process)
Parameter Estimation

A recursive algorithm for estimating the parameters of (1) will first be given. If least squares estimation is used, it is straightforward to derive the following equations; see, e.g., [Åström and Wittenmark 1989]:

    θ̂(t) = θ̂(t − 1) + K(t)e(t)
    e(t) = y(t) − φᵀ(t − 1)θ̂(t − 1)    (2)

    K(t) = P(t − 1)φ(t − 1)(R₂ + φᵀ(t − 1)P(t − 1)φ(t − 1))⁻¹
    P(t) = (I − K(t)φᵀ(t − 1))P(t − 1) + R₁    (3)

where φ is a vector of regressors

    φ(t − 1) = [u(t − d₀), …, u(t − d₀ − m), −y(t − 1), …, −y(t − n)]ᵀ    (4)
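Equations (2)-(4) translate directly into code. The sketch below (NumPy; the first-order plant, the noise level and the choices R₁ = 0, R₂ = 1 are illustrative stand-ins, not from the paper) estimates the two parameters of a model y(t) + a·y(t−1) = b·u(t−1) + v(t) from simulated data:

```python
import numpy as np

def rls_step(theta, P, phi, y, R1, R2):
    """One update of the recursive least squares equations (2)-(3)."""
    e = y - phi @ theta                            # prediction error
    K = P @ phi / (R2 + phi @ P @ phi)             # gain vector
    theta = theta + K * e
    P = (np.eye(len(theta)) - np.outer(K, phi)) @ P + R1
    return theta, P

rng = np.random.default_rng(2)
a_true, b_true = -0.8, 0.5                         # illustrative plant parameters
theta = np.zeros(2)                                # estimates of [b, a]
P = 100.0 * np.eye(2)
R1, R2 = np.zeros((2, 2)), 1.0                     # constant parameters: R1 = 0
y_prev, u_prev = 0.0, 0.0
for t in range(2000):
    y = -a_true * y_prev + b_true * u_prev + 0.1 * rng.standard_normal()
    phi = np.array([u_prev, -y_prev])              # regressor vector as in (4)
    theta, P = rls_step(theta, P, phi, y, R1, R2)
    y_prev, u_prev = y, rng.standard_normal()      # persistently exciting input
print(theta)   # approaches [b, a] = [0.5, -0.8]
```

With R₁ = 0 the gain K(t) decays as data accumulate, which is the ordinary least squares case; a nonzero R₁ keeps the estimator alert to drifting parameters.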
Control Design

There are many possible choices of control design techniques in an adaptive system. Many properties of adaptive systems can be derived without considering the details of the control design. Many different approaches have been explored: minimum variance control, pole placement, LQG, etc. See [Goodwin and Sin 1984] and [Åström and Wittenmark 1989].
The simplification that the estimated parameters are used instead of the true parameters when computing the control law is called the certainty equivalence assumption. Notice that the least squares method also gives an estimate of the parameter uncertainties. It is, however, a significant complication to take the uncertainties into account.
The algorithm is called indirect because the controller parameters are obtained indirectly, by first estimating process parameters and then computing a control law. The control design can be described mathematically by a nonlinear mapping. A difficulty is that many control designs have mappings that are singular for certain process parameters, typically when reachability or observability is lost due to a pole-zero cancellation. See [Kalman et al. 1963].
By a simple reparameterization of the model it is possible to obtain algorithms where the controller parameters are updated directly. To do the reparameterization, however, it is necessary to introduce the control design method explicitly. A simple control design method will be introduced before the direct algorithm is described. Consider the model

    A(q⁻¹)y(t) = B(q⁻¹)u(t − d₀) + C(q⁻¹)e(t)    (5)

where e(t) is white Gaussian noise. This is a general model for linear stochastic SISO systems (see [Åström 1970]). The polynomial C can be assumed stable and of degree n without losing generality. The model (5) is a good way to
    (6)

To solve this equation for arbitrary C, it must be required that A and B are relatively prime. Among the infinitely many polynomial solutions, it is desirable to find a solution where R has the lowest degree. This will make the moving average as short as possible. There is a unique solution such that

    deg R = d₀ + deg B − 1 = n − 1

There are, however, solutions having lower degree. They are obtained when R and B have common factors. This means that the controller cancels some process zeros. To describe these solutions, introduce the factorization B = B⁺B⁻ and assume that B⁺ divides R, i.e. R = R₁B⁺. Hence

    RC/(AR + q^{-d₀}BS) = R₁C/(AR₁ + q^{-d₀}B⁻S),   AR₁ + q^{-d₀}B⁻S = C    (7)

where

    deg R₁ = d₀ + deg B⁻ − 1 = n − 1 − deg B⁺

The degree of the moving average is smaller than n but larger than d₀ − 1. The design parameter is thus limited to a few integer values. Notice that unstable factors of B cannot be canceled. This further limits the choice.
If the polynomial B is stable, there is a controller that makes the output a
    … = R₀ + q^{-d₀}Q    (8)

where the last equality follows from (5) with C = 1. Observing that BR₀ = R, we get

    y(t) = R(q⁻¹)u(t − d) + S(q⁻¹)y(t − d) + e(t)    (9)

Notice that this equation may be considered a reparameterization of the model (5) in the controller parameters. Hence, if the parameters of (9) are estimated, the controller parameters are obtained directly. An adaptive controller based on estimation of the parameters of this equation is therefore called a direct adaptive algorithm. The adaptive algorithm can be formulated as follows:

Data: Given the prediction horizon d, and the orders k and l of the polynomials R and S.
Step 1. Estimate the coefficients of the polynomials R and S of the model given by equation (9) by recursive least squares, i.e. equation (2), where the parameter vector is

    θ = [r₀ … r_k  s₀ … s_l]ᵀ

and e(t) and φ(t − 1) in the right hand side of equation (2) are replaced by

    e(t) = y(t) − R(q⁻¹)u(t − d) − S(q⁻¹)y(t − d) = y(t) − φᵀ(t − d)θ(t − 1),

with φ(t − d) the corresponding vector of delayed inputs and outputs.
Although the direct adaptive algorithm was derived under the assumption that C = 1, it has interesting and unexpected properties when applied to a system with C ≠ 1. This will be discussed in Sect. 4.
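A minimal closed-loop sketch of the direct algorithm for a first-order plant (all numbers, the dither signal used for excitation, and the division guard are our own illustrative choices; here C = 1, so the assumptions of the derivation hold):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = -0.5, 1.0                     # plant y(t+1) + a*y(t) = b*u(t) + e(t+1), C = 1
theta = np.array([1.0, 0.0])         # estimates of [r0, s0] in the model (9), d = 1
P = 10.0 * np.eye(2)
y, u = 0.0, 0.0
for t in range(5000):
    y_new = -a * y + b * u + 0.1 * rng.standard_normal()
    phi = np.array([u, y])                       # regressors u(t-1), y(t-1)
    err = y_new - phi @ theta                    # prediction error of the model (9)
    K = P @ phi / (1.0 + phi @ P @ phi)          # recursive least squares, eq. (2)
    theta = theta + K * err
    P = P - np.outer(K, phi) @ P
    y = y_new
    r0 = theta[0] if abs(theta[0]) > 1e-3 else 1e-3     # guard against r0 ~ 0
    u = -theta[1] / r0 * y + 0.05 * rng.standard_normal()  # control plus dither
print(theta)   # approaches [b, -a] = [1.0, 0.5]
```

With C = 1 the model (9) is exact here, so the estimates approach [b, −a] and the feedback gain −s₀/r₀ approaches the minimum variance gain (a − 0)/b = −0.5; the small dither keeps the closed-loop regressors persistently exciting.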
3 Expected Properties

In this section it will be shown that an adaptive system has many properties that may be expected intuitively. In particular, it will be shown that much can be said about an adaptive system without going into the details of the particular control design method used. Such an approach is very much along the lines of quote (iv) in Sect. 1. A straightforward approach is to investigate if the estimate converges. If it does, the limiting value will tell what the limiting control law is.

Interpretation of the Estimator as a Kalman Filter

For the purpose of analysis it is useful to view the parameters of the system (1) as random processes. Although this may look like a complication, it admits a nice interpretation of the parameter estimator as a Kalman filter, and it also makes it possible to use martingale theorems for convergence analysis.
To be specific, it is assumed that the parameters of (1) are stochastic processes described by

    θ(t + 1) = θ(t) + w(t)
    y(t) = φᵀ(t)θ(t) + v(t)    (10)
where {w(t)} and {v(t)} are sequences of independent Gaussian random vectors with zero mean and covariances R₁ and R₂. The initial condition is assumed to be Gaussian with mean θ⁰ and covariance P⁰. The parameters are thus viewed as states, and equation (1) is viewed as the output equation. See [Åström 1970]. The vector w represents the drift in the parameters. The parameters are constant but unknown if w = 0.
With these assumptions, equation (2) for the recursive least squares estimation can be formally interpreted as Kalman filtering [Kalman 1960] if the initial conditions are chosen as θ̂(0) = θ⁰ and P(0) = P⁰.
Notice that to allow the Kalman filtering interpretation, it is necessary that v in eq. (1) is white Gaussian noise. The filtering problem is, however, not in the standard form, because the vector φ(t) is not known a priori.
The following result is proven in [Åström 1978].
Notice that the sum in this expression is closely related to the inverse of the matrix P(t). If the matrix R is regular, it follows that the estimates converge to the random variable θ(0), which is the initial condition of (10). In the Bayesian setting this may be regarded as the true parameter.
The regularity of R is closely related to persistency of excitation or sufficient
richness. For a system described by equation (1), the white noise signal v provides sufficient excitation. For open loop systems, this was established in [Sternby 1977]. For closed loop systems, there is an additional complication because the feedback may create dependencies. In [Rootzén and Sternby 1984], it was shown that feedback does not create particular difficulties in this case. This is also shown in [Kumar 1990].
Notice that if R is regular, it implies that the matrix P(t) goes to zero as t → ∞. In [Sternby 1977] and [Rootzén and Sternby 1984], it was shown that this is in fact a necessary and sufficient condition for the least squares estimate to converge.
Once it is established that parameter estimates converge, it is straightforward to prove that adaptive control laws based on the estimates also converge, and that the input and output signals in adaptive systems are mean square bounded. This is done in detail for many different control design methods in [Kumar 1990].
Adaptive algorithms based on the assumption that the process can be described by equation (1) are thus well understood in the case when the disturbance v is white noise. The theoretical results are satisfying in that they rely heavily on the properties of the Kalman filtering interpretation of least squares estimation and that they can be applied to a wide range of control design methods.
4 U nexpected Properties
The assumption that {v(t)} in equation (1) is white noise is restrictive. It would be highly desirable to have algorithms for systems described by equation (5) where {e(t)} is white Gaussian noise, since this is a general model for linear stochastic SISO systems [Åström 1970]. An indirect adaptive algorithm for this process is complicated since the parameters of the polynomial C have to be estimated. The direct Algorithm 2 described in Sect. 2 has, however, interesting properties when applied to the process (5). This is illustrated by a numerical experiment.
A Computational Experiment. A computational experiment is made to gain insight into the problem. Consider a first order system described by

y(t + 1) + ay(t) = bu(t) + e(t + 1) + ce(t)    (12)

where u is the control signal, y the measured signal and e discrete time white noise. Parameters a and b are arbitrary, but parameter c is assumed to be less than one in magnitude. If the parameters are known, it follows from equation (8) that the minimum variance controller is given by

u(t) = ((a - c)/b) y(t)    (13)

i.e. a proportional controller.
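The closed loop formed by (12) and (13) is easy to simulate. In the sketch below (my own illustration; the parameter values are arbitrary choices, not from the chapter) the controlled output reduces to the innovations e, so the output variance approaches σ².

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c, sigma = 0.9, 0.5, 0.3, 1.0   # |c| < 1; illustrative values
N = 20000

y, e_prev = 0.0, 0.0
ys = []
for t in range(N):
    u = (a - c) / b * y                   # minimum variance controller (13)
    e = sigma * rng.normal()
    y = -a * y + b * u + e + c * e_prev   # process (12)
    e_prev = e
    ys.append(y)

# In closed loop y(t) = e(t), so the output variance equals sigma^2.
print(np.var(ys))   # approximately 1.0
```
Substituting (13) into (12) gives y(t + 1) = -c y(t) + e(t + 1) + c e(t), which has y(t) = e(t) as its solution; the simulation confirms this.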
K. J. Åström

[Fig. 2: parameter estimates as functions of time]

[Fig. 3: accumulated loss function as a function of time]
A result proved in [Åström and Wittenmark 1973] gives insight into the behavior of the direct adaptive controller. If the parameter estimates converge, then the limiting controller is such that

E[y(t + τ)y(t)] = 0,   τ = d, d + 1, ..., d + l
E[y(t + τ)u(t)] = 0,   τ = d, d + 1, ..., d + k    (14)
Algorithm 2 can thus be interpreted as attempting to drive certain covariances of the process output, and certain cross covariances between input and output, to zero. The parameter d and the orders of the polynomials R and S determine which covariances are driven to zero. Notice that the limiting property does not depend on the character of the system that is controlled.
Stronger statements can be made if more is known about the system to be controlled. The following result was also shown in [Åström and Wittenmark 1973].

Theorem 3. Let Algorithm 2 with d ≥ d₀ be used to control a system described by equation (5). Assume that

min(k, l) ≥ n - 1    (15)

If the estimates converge and the limiting controller is such that R and S are relatively prime, then the equilibrium solution is such that

E[y(t + τ)y(t)] = 0,   τ = d, d + 1, ...    (16)

□
Theorem 3 implies that a moving-average controller is a possible equilibrium when Algorithm 2 is used to control a process given by equation (5). The particular moving average controller obtained is governed by the choice of the parameter d.
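A minimal direct self-tuner in the spirit of Algorithm 2 (my own simplified sketch, not the chapter's algorithm: a least squares fit of a one-step predictor combined with the certainty equivalence control u = -(s₀/r₀)y, preceded by a short excitation phase) applied to the process (12) illustrates the covariance property (14): the lag-one output covariance is driven toward zero even though c is unknown.

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c, sigma = 0.9, 0.5, 0.3, 1.0   # illustrative values; c is unknown to the tuner
N = 50000

theta = np.array([0.0, 0.1])          # predictor parameters [s0, r0]
P = 100.0 * np.eye(2)
y, e_prev = 0.0, 0.0
ys = []
for t in range(N):
    if t < 200:
        u = rng.normal()              # brief open loop excitation before closing the loop
    elif abs(theta[1]) > 1e-3:
        u = -theta[0] / theta[1] * y  # certainty equivalence control from the estimates
    else:
        u = 0.0
    e = sigma * rng.normal()
    y_next = -a * y + b * u + e + c * e_prev
    # least squares update of the predictor y(t+1) = s0*y(t) + r0*u(t)
    phi = np.array([y, u])
    K = P @ phi / (1.0 + phi @ P @ phi)
    theta = theta + K * (y_next - phi @ theta)
    P = P - np.outer(K, phi @ P)
    y, e_prev = y_next, e
    ys.append(y)

ys = np.array(ys[N // 2:])            # discard the transient
r1 = np.mean(ys[1:] * ys[:-1]) / np.mean(ys * ys)
print(r1)   # normalized lag-one output covariance, driven toward zero as in (14)
```
The predictor model is wrong (it ignores the C polynomial), yet the limiting controller makes the output close to a moving average, exactly the "unexpected property" discussed here.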
It remains to show that the estimates converge. The results from the previous section do not apply since the signal v is not white noise. Analysis of the algorithm therefore requires different tools, and the results are not as simple and complete as in the case of white equation noise.
The special case where d = d₀ and the polynomial B is stable has been investigated by [Ljung 1977]. He derived convergence conditions using averaging methods. In the case where an estimator based on stochastic approximation was used instead of least squares, it was shown that the closed loop system is asymptotically stable and that the estimates converge if the polynomial B is stable and the function

G(z) = 1/C(z) - 1/2

is strictly positive real.
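Ljung's condition on G(z) = 1/C(z) - 1/2 can be checked numerically by evaluating the real part of G along the unit circle. The sketch below is my own illustration; it writes C as a polynomial in the delay operator q⁻¹ (an assumption about the convention), and the coefficient values are arbitrary examples.

```python
import numpy as np

def spr_margin(c_coeffs, n_grid=4096):
    """Minimum over frequency of Re[1/C - 1/2], where
    C(q^-1) = 1 + c1 q^-1 + ... + cn q^-n is evaluated on the unit circle."""
    w = np.linspace(0.0, np.pi, n_grid)
    q_inv = np.exp(-1j * w)
    C = np.ones_like(q_inv)
    for k, ck in enumerate(c_coeffs, start=1):
        C = C + ck * q_inv**k
    return np.min(np.real(1.0 / C - 0.5))

# First order C(q^-1) = 1 + 0.3 q^-1: the margin is positive for any |c| < 1
print(spr_margin([0.3]) > 0)      # True
# A stable second order C can violate the condition
print(spr_margin([-1.6, 0.8]))    # negative margin: condition fails
```
For first order C the condition thus never binds, which is consistent with the first order experiment above; the interesting failures occur for higher order C polynomials.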
5 Industrial Use
Kalman expressed the following views on the application of his ideas to process control in the Author's Closure of his 1958 paper:

The author may not be unduly optimistic in expressing his feeling that (disregarding economic considerations) sufficient theoretical and technological know-how exists already to bring practical process control close to the best performance achievable in the light of the limitations imposed by physical measuring equipment.

This was a good conjecture, but additional theoretical work and the invention of the microprocessor were required to make adaptive process control an industrial reality.
The simple self-tuner based on recursive least squares and moving average control has found significant industrial use. It was used in a commercial ship steering autopilot as early as 1979. It is currently installed in more than 70 ships in regular traffic. SSPA in Sweden has recently developed a new ship steering autopilot called RollNix. This system uses the rudder both for steering and roll stabilization. The system also uses the self-tuning regulator based on least squares estimation. Currently the system is used in 9 ships. A slight modification of Algorithm 2 is used in a process control system made by ABB (Asea Brown Boveri) for industrial process control. These controllers are installed in about 3,000 loops worldwide. An indirect algorithm where the control design is based on pole placement is used in another system for industrial process control developed by First Control AB in Sweden. This system is installed in about 450 loops. The algorithms have also been used in biomedical systems. Gambro in Sweden is using self-tuning regulators in more than 4,000 dialysis machines. Many additional products are under development. The adaptive controllers have performed significantly better than conventional controllers. More information about applications of adaptive control is given in [Åström and Wittenmark 1989].
6 Conclusions
As is clear from this paper, there is now an understanding of adaptive algorithms of the type that was discussed in Kalman's 1958 paper. Several books have recently appeared where many more details are given ([Anderson et al. 1986], [Åström and Wittenmark 1989], [Goodwin and Sin 1984], [Narendra and Annaswamy 1989], and [Sastry and Bodson 1989]). It is also interesting to see that the ideas are, to an increasing extent, finding their way to industrial use.
I had the personal fortune to learn about Kalman's work early in my career. The first time I heard about his work on adaptive control was from a colleague from Sweden who spent some time in Prof. Ragazzini's group at Columbia University. Kalman's ideas about adaptive control intrigued me and have inspired a lot of my research. I had renewed and more intense contact with Kalman when I joined IBM Research at San Jose. Control systems research at IBM was led by Jack Bertram, a colleague of Kalman's from DuPont and later a fellow student at Columbia. Bertram did joint work with Kalman on dead-beat control [Kalman and Bertram 1958], sampled data systems [Kalman and Bertram 1959] and Lyapunov theory [Kalman and Bertram 1960]. Kalman was also a consultant to IBM. He appeared for seminars and discussions, and I had the fortune to hear about his ideas first hand. One member of the IBM control systems research group, Dick Koepcke, had also implemented Kalman's algorithms on a hybrid computer consisting of an analog computer and a large IBM computer. This work was unfortunately interrupted when the research group moved to San Jose. It took a while for me to understand adaptive control. It was necessary to obtain a deeper understanding of system identification and system theory. This has all been very interesting work that has occupied me for many years. In closing this paper I would like to thank Rudy for giving the initial inspiration and for the many discussions and communications we have had since 1961.
References
Anderson, B.D.O., R.R. Bitmead, C.R. Johnson, Jr., P.V. Kokotovic, R.L. Kosut, I.M.Y. Mareels, L. Praly, and B.D. Riedle (1986), Stability of Adaptive Systems: Passivity and Averaging Analysis, MIT Press, Cambridge, MA
Åström, K.J. (1970), Introduction to Stochastic Control Theory, Academic Press, New York
Åström, K.J. (1978), "Stochastic control problems," in Coppel, W.A. (editor), Mathematical Control Theory, Lecture Notes in Mathematics, Springer-Verlag, Berlin, F.R.G.
Åström, K.J. (1980), "Maximum likelihood and prediction error methods," Automatica, 16, pp 551-574
Åström, K.J. and B. Wittenmark (1973), "On self-tuning regulators," Automatica, 9, pp 185-199
Åström, K.J. and B. Wittenmark (1989), Adaptive Control, Addison Wesley, Reading, MA
Åström, K.J. and B. Wittenmark (1990), Computer Controlled Systems, Second Edition, Prentice Hall, Englewood Cliffs, NJ
Chow, Y. and H. Teicher (1988), Probability Theory: Independence, Interchangeability, Martingales, Second Edition, Springer-Verlag, New York
Gregory, P.C. (editor) (1959), Proc Self Adaptive Flight Control Symposium, Wright Air Development Center, Wright-Patterson Air Force Base, Ohio
Goodwin, G.C. and K.S. Sin (1984), Adaptive Filtering Prediction and Control, Information and Systems Science Series, Prentice-Hall, Englewood Cliffs, NJ
Holst, J. (1979), "Local stability of some recursive stochastic algorithms," Proc 5th IFAC Symposium on Identification and System Parameter Estimation, Pergamon Press, Oxford, pp 1139-1146
Kalman, R.E. (1958), "Design of a self-optimizing control system," Trans ASME, 80, pp 468-478
Kalman, R.E. (1960), "A new approach to linear filtering and prediction problems," Trans ASME J Basic Eng, 82, pp 35-45
Kalman, R.E. and J.E. Bertram (1958), "General synthesis procedure for computer control of single and multi-loop linear systems," Trans AIEE, 77:II, pp 602-609
Kalman, R.E. and J.E. Bertram (1959), "A unified approach to the theory of sampling systems," J Franklin Inst, 267, pp 405-436
Kalman, R.E. and J.E. Bertram (1960), "The 'second method' of Lyapunov in the analysis and optimization of control systems, I-Continuous-time systems," Trans ASME J Basic Eng, 82, pp 371-393
Kalman, R.E. and J.E. Bertram (1960), "The 'second method' of Lyapunov in the analysis and optimization of control systems, II-Discrete-time systems," Trans ASME J Basic Eng, 82, pp 394-399
Kalman, R.E., Y.C. Ho, and K.S. Narendra (1963), "Controllability of linear dynamical systems," in Contributions to Differential Equations, 1, Interscience-Wiley, New York, pp 189-213
Kumar, P.R. (1990), "Convergence of adaptive control schemes using least-squares parameter estimates," IEEE Trans Autom Contr, AC-35, to appear
Ljung, L. (1977), "On positive real transfer functions and the convergence of some recursive schemes," IEEE Trans Autom Contr, AC-22, pp 539-550
Ljung, L. and T. Söderström (1983), Theory and Practice of Recursive Identification, MIT Press, Cambridge, MA
Narendra, K.S. and A.M. Annaswamy (1989), Stable Adaptive Systems, Prentice Hall, Englewood Cliffs, NJ
Rootzén, H. and J. Sternby (1984), "Consistency in least squares: a Bayesian approach," Automatica, 20, pp 471-475
Sastry, S. and M. Bodson (1989), Adaptive Control: Stability, Convergence, and Robustness, Prentice Hall, Englewood Cliffs, NJ
Sternby, J. (1977), "On consistency for the method of least squares using martingale theory," IEEE Trans Autom Contr, AC-22, pp 346-352
Chapter 7
The notion of controllability was identified by Kalman as one of the central properties determining system behavior. His simple rank condition is ubiquitous in linear systems analysis. This article presents an elementary and expository overview of the generalizations of this test to a condition for testing accessibility of discrete and continuous time nonlinear systems.
1 Introduction
The state-space approach to control systems analysis took center stage in the late 50's. Right from the beginning, it was recognized that certain nondegeneracy assumptions were needed in establishing results on optimal control. However, it was not until Kalman's work ([9], [10]) that the property of controllability was isolated as of interest in and of itself, as it characterizes the degrees of freedom available when attempting to control a system.

The study of controllability for linear systems, first carried out in detail by Kalman and his coworkers in [10], has spanned a great number of research directions, and Kalman's citation for the IEEE Medal of Honor in 1974 attests to this influence. Associated topics such as testing degrees of controllability, and their numerical analysis aspects, are still the subject of much research (see, e.g. [12], [19], and references there). This paper deals with the questions associated with testing controllability of nonlinear systems, both those operating in continuous time, that is, systems of the type
ẋ(t) = f(x(t), u(t))    (CT)

and those operating in discrete time, that is, systems of the type

x⁺(t) = f(x(t), u(t))    (DT)

where the superscript "+" is used to indicate time shift (x⁺(t) = x(t + 1)). In principle, one wishes to study controllability from the origin. This is the property that for each state x ∈ ℝⁿ there be some control driving 0 to x in finite time.
E. D. Sontag
(The terminology "reachability" is also used for this concept.) As shown below, in order to obtain elegant general results one has to weaken the notion of controllability.

To simplify matters, it will be assumed that the states x(t) belong to a Euclidean space ℝⁿ, controls u(t) take values in a Euclidean space ℝᵐ, and the dynamics function f is (real-)analytic in (x, u). Many generalizations, such as allowing x to evolve on a differentiable manifold, or letting f have less smoothness, are of great interest; however, in order to keep the discussion as elementary as possible the above assumptions are made here. (Analyticity allows stating certain results in a necessary and sufficient, rather than merely sufficient, manner.) The controls u(·) are allowed to be arbitrary measurable essentially bounded functions. The origin is assumed to be an equilibrium state, that is,

f(0, 0) = 0.

In the discrete time case it is assumed in addition that the map f_u := f(·, u) is a diffeomorphism for each fixed u; in other words, this map is bijective and has a nonsingular differential at each point. Imposing invertibility simplifies matters considerably, and is a natural condition for equations that arise from the sampling of continuous time systems, which is one of the main ways in which discrete time systems appear in practice.
When the system is linear, that is,

f(x, u) = Ax + Bu

for suitable matrices A (of size n × n) and B (of size n × m), controllability from the origin is equivalent to the property that the rank of the n × nm Kalman block matrix

(B, AB, A²B, ..., Aⁿ⁻¹B)    (1)

equals the dimension n of the state space. This is a useful and simple test, and much effort has been spent on trying to generalize it to nonlinear systems in various forms.
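The rank test (1) is immediate to implement numerically. A minimal NumPy sketch (my own illustration; the example systems are chosen here, not taken from the chapter):

```python
import numpy as np

def controllable(A, B, tol=1e-9):
    """Kalman rank test: rank (B, AB, ..., A^{n-1} B) == n."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.linalg.matrix_rank(np.hstack(blocks), tol=tol) == n

# Double integrator: x1' = x2, x2' = u  ->  controllable
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
print(controllable(A, B))          # True

# Two identical decoupled modes driven by the same input  ->  not controllable
A2 = np.array([[1.0, 0.0], [0.0, 1.0]])
B2 = np.array([[1.0], [1.0]])
print(controllable(A2, B2))        # False
```
In floating point the rank decision requires a tolerance; `matrix_rank` handles this, which is one of the numerical analysis aspects alluded to above.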
The systematic study of controllability questions for continuous time nonlinear systems began in the early 70's. At that time, the papers [14], [22], and [13], building on previous work ([2], [5]) on partial differential equations, gave a nonlinear analogue of the above Kalman controllability rank condition. This analogue provides only a necessary test, not a sufficient one. It becomes necessary and sufficient if one is interested instead in the accessibility property, a weaker form of controllability which will be discussed below and which corresponds to being able to reach from the origin a set of full dimension (not necessarily the entire space). Analogous results hold also in discrete time.
However, this work did not settle the question of characterizing controllability, a question which remains open and which is the subject of a major current research effort, at least insofar as characterizations of local analogues are concerned. (One does know that local controllability can be checked in principle in terms of linear relations between the Lie brackets of the vector fields defining the system ([20]), and isolating the explicit form of these relations has been a major focus of research. It is impossible to even attempt here to give a reasonably complete list of references to this very active area of research. The reference [21] can be used as a source of further bibliography.)
This brief overview article will discuss accessibility for discrete and
continuous time, as well as some results which exhibit examples where
accessibility and controllability coincide. Some ultimate limitations on the
possibility of effectively checking controllability will also be mentioned. For
more details on accessibility at an expository level, see for instance [6], [20],
or [7] in continuous time, and [8] in discrete time. These references should
also be consulted for justifications of all statements given here without proof.
The level of the presentation will be kept as elementary as possible, in order
to explain the main ideas in very simple terms.
2 Accessibility
Let Σ be either (CT) or (DT). The reachable set ℛ is by definition the set of states reachable from the origin, that is, the set of states of the form

{φ(t, 0, 0, w) | t ≥ 0, w admissible control}

where φ(t, s, x, w) denotes the value x(t) at time t of the solution of (CT) or (DT) respectively, with initial condition x at time s and control function w. The function w is an arbitrary sequence in the discrete time case, and is required to be measurable essentially bounded in the continuous case. If the solution of (CT) is undefined for a certain w and x, then φ is also undefined.

The system Σ will be said to be accessible (from the origin) if the reachable set ℛ has a nonempty interior in ℝⁿ.
Remark 2. One may also define accessibility from arbitrary initial states (rather than just from the origin). When the initial state is not an equilibrium state, however, one must distinguish between accessibility, as defined here, and "strong accessibility," which corresponds to the requirement that there be a fixed time T > 0 such that the reachable set in time T has a nonempty interior.
In the discrete time case, consider for each k the k-step transition map

f^k(x, u_1, ..., u_k) := f(f(... f(f(x, u_1), u_2) ...), u_k),

defined for every state x and sequence of controls u_1, ..., u_k. By Sard's Theorem, for each fixed k it is either the case that the map f^k(0, ·) has at least one point where its Jacobian has rank n, or its image has measure zero. Since a countable union of negligible sets again has measure zero, accessibility implies that there must exist some k and some sequence of controls u_1, ..., u_k so that the Jacobian of f^k(0, ·) evaluated at that input sequence,

f^k(0, ·)_* [u_1, ..., u_k],

has rank n.
For a linear system

ẋ = Ax + Bu

the functions f_u are all affine, and the Lie brackets are again of the same form. It is easy to show that all elements of ℒ have the form

AᵏBv.
Theorem. The continuous time system (CT) is accessible if and only if the accessibility rank condition holds at the origin.
In the discrete time case, for each control value u and each i = 1, ..., m, let

X_{u,i}(x) := (∂/∂ε)|_{ε=0} f_u ∘ f_{u+εe_i}⁻¹(x),

where e_i denotes the ith coordinate vector, and more generally for all u, i and each integer k ≥ 0 let

(Ad_u^k X_{u,i})(x) := (∂/∂ε)|_{ε=0} f_u^k ∘ f_u ∘ f_{u+εe_i}⁻¹ ∘ f_u^{-k}(x).

The accessibility Lie algebra is now defined in terms of iterated Lie brackets of these vector functions, and the accessibility rank condition is defined in terms of this, analogously to the continuous time case. The main fact is, then, as follows.
Theorem. The discrete time system (DT) is accessible if and only if the accessibility rank condition holds at the origin.
Again, for linear (discrete time) systems, the condition reduces to the Kalman controllability test. The vectors Ad_u^k X_{u,i} are in fact all of the type AᵏBu, for vectors u ∈ ℝᵐ.
Consider the two dimensional system

ẋ = u₁ (1, 0)' + u₂ (0, α(x₁))',

where α is a smooth but not analytic function, for instance α(x) = e^{-1/x} for x > 0 and α(x) ≡ 0 for x ≤ 0. This system is easily shown to be accessible (in fact, it is completely controllable: any state can be steered to any other state), but the accessibility rank condition does not hold.

□
The time-reversed system is

ẋ = -f(x, u)

in continuous time, or

x⁺ = f_u⁻¹(x)

in discrete time. Since the accessibility Lie algebra ℒ is a vector space, the same Lie algebra results for the time-reversed of a continuous time system, proving the equivalence of both notions in that case. The same result turns out to be true in the discrete case, though the proof is much less trivial. This is summarized then by:

Theorem. The system Σ is accessible if and only if its time-reversed system is accessible. □
Remark. The proof in the discrete case relies roughly on the following argument. Introduce a superscript "-" in the notation for the vectors Ad_u^k X_{u,i} introduced above, and use ℒ⁻ instead of ℒ for the Lie algebra generated by these. Consider also the analogous vectors built from the time-reversed system, now with k ≥ 0, and let ℒ⁺ be the algebra generated by these vectors. This algebra is the same as the algebra ℒ obtained for the time-reversed system. One first proves that it is also possible to generate the same Lie algebra using negative k in the definition of the vectors Ad_u^k X⁻_{u,i} (that is, with the middle term in the definition being f_u ∘ f_{u+εe_i}⁻¹ rather than f_u⁻¹ ∘ f_{u+εe_i}). Thus the only obstruction is due to the use of negative instead of positive k. But since the operator

Ad₀: X ↦ Ad₀(X)

on the Lie algebra of all vector fields preserves the tangent space at 0 (because the origin is an equilibrium state), this induces an isomorphism between the two linear subspaces ℒ⁺(0) and ℒ⁻(0), giving the desired equality of ranks.

□
6 An Example

The following is a well-known ("folk") example from differential geometry illustrating the use of the accessibility rank condition in continuous time; because the resulting system is "symmetric" in the sense that f(x, -u) = -f(x, u), and accessibility holds from every state, it can be shown that this example is completely controllable, but here we only concentrate on the local aspect about zero.

Assume that we model an automobile in the following way, as an object in the plane. The position of the center of the front axle has coordinates (x, y), its orientation is specified by the angle φ, and θ is the angle its wheels make relative to the orientation of the car.

We assume that the angle θ can take values on an interval (-θ₀, θ₀), corresponding to the maximum allowed displacement of the steering wheel, and that φ can take arbitrary values. As controls we take the steering wheel moves (u₁) and the engine speed (u₂). Using elementary trigonometry, the following model results:
(ẋ, ẏ, φ̇, θ̇)' = u₁ (0, 0, 0, 1)' + u₂ (cos(φ + θ), sin(φ + θ), sin θ, 0)'    (2)

with state space ℝ × ℝ × ℝ × (-θ₀, θ₀) ⊆ ℝ⁴. (In fact, it is more natural to identify φ and φ + 2π and take as state space the manifold ℝ × ℝ × S¹ × (-θ₀, θ₀); this leads to control systems on manifolds different from Euclidean spaces.) We take the controls as having values in ℝ²; a more realistic model of course incorporates constraints on their magnitude.
A control with u₂ ≡ 0 corresponds to a pure steering move, while one with u₁ ≡ 0 models a pure driving move in which the steering wheel is fixed in one position. We let g₁ = steer be the vector field (0, 0, 0, 1)' and g₂ = drive the vector field (cos(φ + θ), sin(φ + θ), sin θ, 0)'. It is intuitively clear that the system is completely controllable, but one can check accessibility using the rank condition. Indeed, computing

wriggle := [steer, drive]

and

slide := [wriggle, drive]

it is easy to see that the determinant of the matrix consisting of the columns (steer, drive, wriggle, slide) is nonzero everywhere, and in particular at the origin. For φ = θ = 0 and any (x, y), wriggle is the vector (0, 1, 1, 0)', a mix of sliding in the y direction and a rotation, and slide is the vector (0, 1, 0, 0)' corresponding to sliding in the y direction. This means that one can in principle implement infinitesimally both of the above motions. The "wriggling" motion corresponding to wriggle is, from the definition of Lie bracket, basically that corresponding to many fast iterations of the actions:

steer, drive, reverse steer, reverse drive, repeat

which one often performs in order to get out of a tight parking space. Interestingly enough, one could also approximate the pure sliding motion: wriggle, drive, reverse wriggle, reverse drive, repeat, corresponding to the last vector field.
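The brackets in this example can be verified symbolically. The sketch below is my own illustration (using the convention [f, g] = (Dg)f - (Df)g for the Lie bracket); it recovers wriggle and slide and confirms that the determinant of (steer, drive, wriggle, slide) is a nonzero constant.

```python
import sympy as sp

x, y, phi, theta = sp.symbols('x y phi theta')
state = sp.Matrix([x, y, phi, theta])

def bracket(f, g):
    # Lie bracket [f, g] = (Dg) f - (Df) g
    return g.jacobian(state) * f - f.jacobian(state) * g

steer = sp.Matrix([0, 0, 0, 1])
drive = sp.Matrix([sp.cos(phi + theta), sp.sin(phi + theta), sp.sin(theta), 0])

wriggle = bracket(steer, drive)
slide = bracket(wriggle, drive)

M = sp.Matrix.hstack(steer, drive, wriggle, slide)
det = sp.simplify(M.det())
print(det)                                   # a nonzero constant
print(wriggle.subs({phi: 0, theta: 0}).T)    # (0, 1, 1, 0)
print(slide.subs({phi: 0, theta: 0}).T)      # (0, 1, 0, 0)
```
Since the determinant never vanishes, the four vector fields span ℝ⁴ at every state, so the accessibility rank condition holds everywhere, as claimed in the text.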
3. Brunovsky, P., "Local controllability of odd systems," Banach Center Publications 1 (1974):
39-45
Controllability Revisited
M. Fliess
Laboratoire des Signaux et Systèmes, CNRS-ESE, Plateau de Moulon,
F-91192 Gif-sur-Yvette Cedex, France
1 Introduction
Since its introduction in the early sixties, Kalman's controllability [22-25] has played a prominent role in understanding and clarifying many structural properties of finite-dimensional constant linear systems. A good characterization of the importance of a theoretical concept is its ubiquity, i.e. its possibility of embodying far-reaching extensions to other situations. One of the most important generalizations was done a few years later on finite-dimensional nonlinear systems; see, e.g. the works of Hermann [18], Lobry [28], Sussmann and Jurdjevic [38], Hermes [19], Krener [27] and many others. This was the starting point of methods utilizing Lie brackets of vector fields and foliations, stemming from differential geometry, to attack various problems in nonlinear control theory (see, e.g. Sussmann [37] and Isidori [20]).

An alternative approach employing differential fields has recently been proposed by the author [7, 8] for the study of linear and nonlinear systems. It has the advantage of solving some long-standing problems, such as input-output inversion, and of throwing new light on state-variable realizations. A (nonlinear) dynamics is now a finitely generated differentially algebraic extension K/k⟨u⟩, where k⟨u⟩ is the differential field generated by the ground field k and by the control variables u = (u₁, ..., u_m). Recall that a differentially algebraic extension is the natural differential analogue of the well-known notion of algebraic extension for usual (non-differential) fields (see Kolchin [26]). It means that any element in K satisfies an algebraic differential equation with coefficients in k⟨u⟩. Pommaret [30] has suggested in this context a most elegant interpretation of controllability by identifying it with the fact that the ground field k is differentially algebraically closed in K.
2 Differential Fields

2.1. Differential Algebra

Differential algebra was introduced between the two World Wars by the American mathematician Ritt [31] in order to develop for differential algebraic equations a theory which, to some extent, would be close to the theory of algebraic equations.

2.2. A differential field K is a field equipped with a single derivation a ↦ ȧ = da/dt such that

d/dt (a + b) = ȧ + ḃ,
d/dt (ab) = ȧb + aḃ.

A constant of K is an element c ∈ K such that ċ = 0. The set of constants of K is a subfield of K, which is called the field of constants.
2.3. Remarks
(i) We restrict ourselves to systems with lumped parameters, i.e. to ordinary
differential equations and to ordinary differential fields with a single
derivation. We do not treat partial differential fields with several pairwise
commuting derivations.
(ii) For simplicity's sake we also limit ourselves to fields of characteristic zero, as differential equations over fields of strictly positive characteristic have not yet appeared in physics or engineering.
2.4. Examples
(i) The fields ℚ, ℝ, ℂ of rational, real and complex numbers are trivial fields of constants.
(ii) The field of not necessarily proper rational transfer functions in a single variable s, over the field ℝ, is a differential field with respect to the derivation d/ds.
3 Nonlinear Dynamics

3.1. Let k be a given differential ground field and k⟨u⟩ be the differential field generated by k and the control variables u = (u₁, ..., u_m).

3.2. A minimal (generalized) state ξ = (ξ₁, ..., ξ_n) satisfies implicit differential equations of the form

A_i(ξ̇_i, ξ, u, u̇, ..., u^{(α)}) = 0,   i = 1, ..., n,    (I)

where A₁, ..., A_n are polynomials over k. We should pinpoint that the equations (I) are implicit with respect to ξ̇₁, ..., ξ̇_n. When the corresponding Jacobian matrix is invertible, (I) can be solved for ξ̇₁, ..., ξ̇_n, which yields an explicit representation (E); (E) is only "locally valid", i.e. valid in domains where the Jacobian matrix has full rank.

3.3. ξ is a minimal (generalized) state. Take two such minimal states ξ and ζ, i.e. two transcendence bases of K/k⟨u⟩. Any component of ζ is k⟨u⟩-algebraically dependent on ξ. This amounts to saying that there exists a control-dependent implicit relation of the type

P_i(ζ_i, ξ, u, u̇, ..., u^{(ρ)}) = 0,   i = 1, ..., n.
4 Controllability

4.1. The dynamics K/k⟨u⟩ is said to be controllable [30] if, and only if, the ground field k is differentially algebraically closed in K.

4.2. Consider a dynamics of the form

ẋ = f₀(x) + Σᵢ₌₁ᵐ uᵢ fᵢ(x),    (17)
where the control variables appear linearly, as is quite often the case in the differential geometric approach (cf. [20]). The fᵢ's are formal polynomial vector fields over k. As the derivatives of the components of x = (x₁, ..., x_n) appear linearly, it is rather routine work to verify that the differential ideal [26] corresponding to (17) is prime.¹

¹ Recall that, in our abstract algebraic setting, this theorem has only a heuristic value.
4.3. Let ℒ (resp. 𝒜) be the Lie K-algebra spanned by f₀, f₁, ..., f_m (resp. f₁, ..., f_m). Denote by ℒ₀ the Lie ideal of 𝒜 in ℒ. The usual differential geometric notions possess a formal counterpart (see, e.g. Botelho [2], Nichols and Weisfeiler [29]). The involutive distribution L (resp. L₀) corresponding to ℒ (resp. ℒ₀) is the k(x)-vector space spanned by ℒ (resp. ℒ₀), where k(x) is the (non-differential) field generated by k and the components of x. The following well-known accessibility definitions (cf. [37]) now read:

(17) is said to be weakly (resp. strongly) accessible if, and only if, the k(x)-dimension of L (resp. L₀) is n.
4.4. Our main result is:

Theorem. The dynamics k⟨u, x⟩/k⟨u⟩ is controllable if, and only if, (17) is strongly accessible.
4.5. Remark. The elementary example ẋ = x, where n = 1 and with no input, i.e. m = 0, is weakly accessible. As x is differentially algebraic over k, it is not controllable in the sense of 4.1, which is therefore not equivalent to weak accessibility.
4.6. We now outline the proof of the theorem. Let k̄ be the differential algebraic closure of k in k⟨u, x⟩, i.e. the set of all elements in k⟨u, x⟩ which are differentially algebraic over k. The next lemma, which is easy, will be most useful:

Lemma. k is not differentially algebraically closed in k⟨u, x⟩ if, and only if, the
4.8. Assume that the k(x)-dimension of L₀ is δ ≤ n. From [2, 29], we know the existence of a basis ∂/∂t₁, ..., ∂/∂t_δ of pairwise commuting vector fields and therefore the existence of formal local coordinates ξ₁, ..., ξ_δ, ξ_{δ+1}, ..., ξ_n which yield (see, e.g. [20]) the same type of decomposition as in 4.7: ξ₁, ..., ξ_δ (resp. ξ_{δ+1}, ..., ξ_n) satisfy differential equations which are control dependent (resp. independent).
4.9. The former decomposition is thus possible if, and only if, one of the two following conditions is satisfied:
5 Linear Controllability²

5.1. We are defining linear dynamics via module theory, which is more familiar to control theorists than the differential vector spaces [26] which we employed in our previous publications [7, 8, 12]. This latter framework has perhaps the advantage of making the connection with the general nonlinear case clearer.

5.2. Let k be an arbitrary differential ground field. We denote by k[d/dt] the ring of linear differential operators over k of the form

Σ_finite a_α (d^α/dt^α),   a_α ∈ k.

k[d/dt] is a commutative ring if, and only if, k is a field of constants. It is known (cf. Cohn [6]) that k[d/dt] is a principal ideal ring.

5.3. We denote by [w] the left k[d/dt]-module spanned by a set w. A linear dynamics M is a finitely generated left k[d/dt]-module containing [u], such that the quotient module M/[u] is torsion. The input u is said to be independent if, and only if, the module [u] is free. The dynamics M is said to be constant (resp. time-varying) if, and only if, k is (resp. is not) a field of constants.
5.4. As M/[u] is torsion and finitely generated, it is necessarily finite-dimensional as a k-vector space. Take a finite set ξ = (ξ₁, ..., ξ_n) of elements in M such that its canonical image in M/[u] is a basis of the latter vector space. This yields the linear counterpart of (I) and (E) in 3.2:

ξ̇ = Aξ + Σ_β B_β u^{(β)}   (finite sum),

where A and the B_β's are matrices over k of appropriate sizes. As in 3.3, another (generalized) minimal state ζ = (ζ₁, ..., ζ_n) is related to ξ by a control-dependent transformation

ξ = Pζ + Σ_ν Q_ν u^{(ν)},

where P is a square invertible matrix and the Q_ν's are matrices of appropriate sizes.
² The results of Sections 4 and 5 have already been announced in [10, 11].
5.5. Consider the dynamics ξ̇ = ξ + u + u̇, where m = n = 1. The state transformation x = ξ - u yields ẋ = x + 2u, which does not contain derivatives of the control variable. This elimination can easily be extended to the general case. The transformation

ζ = ξ - B_α u^{(α-1)}

lowers the order of the highest derivative of u to α - 1. By repeating this procedure one ends up with a representation

ẋ = 𝔉x + 𝔊u,

where x is a (minimal) Kalman state. Note that two minimal Kalman states x and x̄ are related by the usual control independent transformation x̄ = 𝔓x, where 𝔓 is a square invertible matrix.
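The elimination in the first order example can be verified symbolically. A small SymPy check (my own illustration, not from the paper):

```python
import sympy as sp

t = sp.symbols('t')
u = sp.Function('u')(t)
xi = sp.Function('xi')(t)

# dynamics: xi' = xi + u + u'
xi_dot = xi + u + sp.diff(u, t)

# state transformation x = xi - u, so x' = xi' - u'
x = xi - u
x_dot = xi_dot - sp.diff(u, t)

# x' should equal x + 2u, with no derivative of u remaining
print(sp.simplify(x_dot - (x + 2 * u)))   # 0
```
The derivative of u cancels exactly, leaving the Kalman state equation ẋ = x + 2u announced in the text.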
5.6. Remark. We should remember that this elimination of the derivatives of the control variables is in general impossible for nonlinear dynamics (Freedman and Willems [14], Glad [16]).

5.7. From now on, we assume that the input u is independent. A linear dynamics M is said to be controllable if, and only if, its Kalman representation is controllable³.

5.8. Example. The Kalman representation of ξ̇ = u̇, m = n = 1, is ẋ = 0, where x = ξ - u. It is of course not controllable. Note, however, that any value of ξ is reachable by a suitable choice of u. This implies that in our approach controllability and reachability are not equivalent.
5.9. A well-known property (cf. Cohn [6]) of finitely generated left modules
over principal ideal rings tells us that the dynamics M can be decomposed in
the following way:

M = F ⊕ T,

where F (resp. T) is a free (resp. torsion) module. This fact is of course equivalent
to the Kalman decomposition of a linear dynamics into the controllable and
uncontrollable parts
d/dt [x⁺; x♯] = [F₁₁ F₁₂; 0 F₂₂] [x⁺; x♯] + [G₁; 0] u.
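Numerically, this decomposition can be produced from any pair (A, B) by an orthogonal change of basis adapted to the image of the controllability matrix. The following is a sketch (matrix data and tolerance are arbitrary illustration choices, not from the paper):

```python
import numpy as np

def kalman_decomposition(A, B, tol=1e-10):
    """Orthogonal change of basis T adapted to the controllable subspace:
    in the new basis At = T.T @ A @ T, Bt = T.T @ B, the lower-left
    (n - r) x r block of At and the last n - r rows of Bt vanish."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    n = A.shape[0]
    C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
    U, s, _ = np.linalg.svd(C)
    r = int(np.sum(s > tol))          # dimension of the controllable part
    return U, U.T @ A @ U, U.T @ B, r

A = np.array([[1.0, 0.0], [1.0, 2.0]])
B = np.array([[0.0], [1.0]])
T, At, Bt, r = kalman_decomposition(A, B)
print(r)              # a one-dimensional controllable part
print(abs(At[1, 0]))  # ~0: the uncontrollable block is decoupled
```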
4.1. Recall that it is well known (see, e.g., [20]) that the strong accessibility of
a constant linear system implies Kalman's controllability.
5.10. In the last section we have proved the following characterization of linear
controllability, which seems to be new:
Theorem. A linear dynamics M is controllable if, and only if, it is a free left
k[d/dt]-module.
6.5. The set of hidden modes or decoupling zeros is the union of the sets of
input- and output-decoupling zeros.
7.4. Remark. One should note that for some peculiar choices of u and of the
corresponding trajectories of x, the controllability of the linearized dynamics
might fail. This can be expressed precisely by computing the rank condition for
checking controllability. Interesting related results have been obtained by
Charlet, Lévine, and Marino [5] in the context of dynamic feedback linearization.
References
[1] H. Blomberg and R. Ylinen, Algebraic Theory of Multivariable Linear Systems, Academic
Press, London, 1983
[2] J.N. Botelho, Le théorème de Frobenius formel, J Diff Geom, 12 (1977), pp 319-325
[3] N. Bourbaki, Algèbre (Chap. 4-7), Masson, Paris, 1981
[4] F.M. Callier and C.A. Desoer, Multivariable Feedback Systems, Springer-Verlag, New York,
1983
[5] B. Charlet, J. Lévine and R. Marino, On Dynamic Feedback Linearization, Systems Control
Lett, 13 (1989), pp 143-151
[6] P.M. Cohn, Free Rings and their Relations, Academic Press, London, 1971
[7] M. Fliess, Automatique et corps différentiels, Forum Math, 1 (1989), pp 227-238
[8] M. Fliess, Generalized Linear Systems with Lumped or Distributed Parameters and
Differential Vector Spaces, Int J Control, 49 (1989), pp 1989-1999
[9] M. Fliess, Some Remarks on Nonlinear Input-Output Systems with Delays, in "New Trends
in Nonlinear Control Theory" (Proc Conf Nantes 1988), J. Descusse, M. Fliess, A. Isidori
and D. Leborgne eds, Lect Notes Control Inform Sci, 122 (1989), pp 172-181, Springer-Verlag,
Berlin
[10] M. Fliess, Commandabilité, matrices de transfert et modes cachés, C.R. Acad Sci Paris, I-309
(1989), pp 847-851
[11] M. Fliess, Geometric Interpretation of the Zeros and of the Hidden Modes
of a Constant Linear System via a Renewed Realization Theory, Proc IFAC Workshop
"System Structure and Control: State-space and Polynomial Methods" (1989), pp 209-213,
Prague
[12] M. Fliess, Generalized Controller Canonical Forms for Linear and Nonlinear Dynamics,
IEEE Trans Automat Control, 35 (1990), pp 994-1001
[13] M. Fliess, Automatique en temps discret et algèbre aux différences, Forum Math, 2 (1990),
pp 213-232
[14] M.I. Freedman and J.C. Willems, Smooth Representation of Systems with Differentiated
Inputs, IEEE Trans Automat Control, 23 (1978), pp 16-21
[15] E. Freund, Zeitvariable Mehrgrößensysteme, Lect Notes Operat Res Math Systems, 57 (1971),
Springer-Verlag, Berlin
[16] S.T. Glad, Nonlinear State Space and Input Output Descriptions Using Differential
Polynomials, in "New Trends in Nonlinear Control Theory" (Proc Conf Nantes 1988), J.
Descusse, M. Fliess, A. Isidori and D. Leborgne eds, Lect Notes Control Inform Sci, 122
(1989), pp 182-189, Springer-Verlag, Berlin
[17] A. Haddak, Differential Algebra and Controllability, in "Nonlinear Control Systems Design"
(Proc IFAC Symp Capri, 1989), A. Isidori ed, Pergamon Press, Oxford
[18] R. Hermann, Differential Geometry and the Calculus of Variations, Academic Press, New
York, 1968
[19] H. Hermes, On Local and Global Controllability, SIAM J Control, 12 (1974), pp 252-261
[20] A. Isidori, Nonlinear Control Systems (2nd edition), Springer-Verlag, Berlin, 1989
[21] J. Johnson, Kähler Differentials and Differential Algebra, Annals Math, 89 (1969), pp 92-98
[22] R.E. Kalman, On the General Theory of Control Systems, in "Automatic and Remote Control",
Proc 1st IFAC World Congress Moscow 1960, Vol. 1 (1961), pp 481-492, Butterworth, London
[23] R.E. Kalman, Mathematical Description of Linear Systems, SIAM J Control, 1 (1963),
pp 152-192
[24] R.E. Kalman, Lectures on Controllability and Observability, CIME Summer Course,
Cremonese, Roma, 1968
[25] R.E. Kalman, P.L. Falb and M.A. Arbib, Topics in Mathematical System Theory,
McGraw-Hill, New York, 1969
[26] E.R. Kolchin, Differential Algebra and Algebraic Groups, Academic Press, New York, 1973
[27] A.J. Krener, A Generalization of Chow's Theorem and the Bang-Bang Theorem to Nonlinear
Control Systems, SIAM J Control, 12 (1974), pp 43-52
[28] C. Lobry, Contrôlabilité des systèmes non linéaires, SIAM J Control, 8 (1970), pp 573-605
[29] W. Nichols and B. Weisfeiler, Differential Formal Groups of J.F. Ritt, Amer J Math, 104
(1982), pp 943-1003
[30] J.-F. Pommaret, Lie Groups and Mechanics, Gordon and Breach, New York, 1988
[31] J.F. Ritt, Differential Algebra, Amer Math Soc, New York, 1950
[32] A. Robinson, Introduction to Model Theory and to the Metamathematics of Algebra,
North-Holland, Amsterdam, 1974
[33] H.H. Rosenbrock, State-space and Multivariable Theory, Nelson, London, 1970
[34] A.J. van der Schaft, Structural Properties of Realizations of External Differential Systems, in
"Nonlinear Control Systems Design" (Proc IFAC Symp Capri, 1989), A. Isidori ed, Pergamon
Press, Oxford
[35] C.B. Schrader and M.K. Sain, Research on System Zeros: A Survey, Proc 27th IEEE Conf
Decision Control, 1988, pp 890-901, Austin, TX
[36] E.D. Sontag, Finite-dimensional open-loop control generators for nonlinear systems, Int J
Control, 47 (1988), pp 537-556
[37] H.J. Sussmann, Lie Brackets, Real Analyticity and Geometric Control, in "Differential
Geometric Control Theory" (Proc Conf Michigan Tech Univ 1982), R.W. Brockett, R.S.
Millman and H.J. Sussmann eds, Birkhäuser, Boston, 1983, pp 1-116
[38] H.J. Sussmann and V. Jurdjevic, Controllability of Nonlinear Systems, J Diff Equations, 12
(1972), pp 95-116
[39] D. Williamson, Observation of Bilinear Systems with Application to Biological Control,
Automatica, 13 (1977), pp 243-254
[40] C. Wood, The Model Theory of Differential Fields Revisited, Israel J Math, 25 (1976),
pp 331-352
[41] M. Zeitz, Canonical Forms for Nonlinear Systems, in "Nonlinear Control Systems Design"
(Proc IFAC Symp Capri, 1989), A. Isidori ed, Pergamon Press, Oxford
1 Introduction
One of Kalman's most relevant contributions to system and control theory
has been the development of a rigorous conceptual framework in which the
relations between external (input-output) and internal (state) variables
associated with the mathematical descriptions of a dynamical system can be fully
understood. A cornerstone of this theory was the so-called "canonical structure
theorem", first published in [1], "which describes abstractly the coupling
between the external and internal variables of any linear dynamical system"
([2], p. 153). This fundamental result played a crucial role in a number of
major problems of analysis and design, like the construction of irreducible
realizations of an impulse response matrix (a problem completely solved, for
the first time, in [2]), the assignment of the eigenvalues in a feedback system,
the disturbance decoupling and noninteracting control problems, to mention a
few.
Since the appearance of this result in 1962, many authors were interested
in finding extensions to more general classes of systems than those considered
in the paper [1]. A first set of contributions in this direction, addressed to the
extension of the "canonical structure theorem" to time-varying linear systems,
can be found in the works of Kalman and Weiss [3], Youla [4], Silverman
and Meadows [5], Weiss [6], d'Alessandro, Isidori and Ruberti [7]. A second
stage of development included the theory of bilinear systems, whose canonical
decomposition was studied by Brockett [8] and d'Alessandro, Isidori and
Ruberti [9]. Finally, a canonical structure theorem, along with methods for
the construction of irreducible realizations, became available also for nonlinear
systems, as an outcome of the works of Sussmann and Jurdjevic [10], Brockett
[11], Sussmann [12], Hermann and Krener [13], Isidori, Krener, Gori Giorgi,
and Monaco [14], Fliess [15].
In this paper, which is prepared on the occasion of the 60th birthday of
Rudolf E. Kalman, we intend to review some of our own research work, done
in collaboration with the coauthors of [7] and [14], on the subject of the
canonical structure of control systems, with particular emphasis on the problem
of decomposing a given system into reachable/unreachable and, respectively,
ẋ = F(t)x + G(t)u    (2.1a)
y = H(t)x    (2.1b)

with x ∈ Rⁿ, u ∈ Rᵐ, y ∈ Rᵖ, and F(·), G(·), H(·) matrices of continuous functions.
We suppose that the reader is familiar with the concepts of reachability, controllability, observability and constructibility (at a specific time t), whose definitions
can be found e.g. in [3]. Our goal is to find a (time-varying) coordinate
transformation:

z(t) = T(t)x(t)

(where T(·) is an invertible matrix of continuous functions) such that, in the
new coordinates, the equations describing the system become:
ż₁ = F₁₁(t)z₁ + F₁₂(t)z₂ + G₁(t)u    (2.2a)
ż₂ = F₂₂(t)z₂    (2.2b)
y = H₁(t)z₁ + H₂(t)z₂    (2.2c)
or, respectively:

ż₁ = F₁₁(t)z₁ + F₁₂(t)z₂ + G₁(t)u
ż₂ = F₂₂(t)z₂ + G₂(t)u
y = H₂(t)z₂
with minimal (constant) dimension of Zl in the first case and minimal (constant)
dimension of Z2 in the second case. For reasons of space we discuss only the
first situation and we refer to the literature [7J for the other one, which can be
dealt with in a completely dual manner.
It is well known that, in the case of a time-invariant system, a decomposition
of the form (2.2) can be obtained by choosing a new basis in the state space
with the property that the first, say n₁, vectors of this basis span the set of
reachable states. In the case of time-varying systems this simple argument cannot
be further pursued because, as is also well known, the set of reachable states may
depend on the time and, what is worse from the point of view of performing a
"decomposition" into subsystems, its dimension also may vary with time (as a
matter of fact, it is easy to check that the latter is a nondecreasing function of
t). A similar difficulty arises also if one looks at the property of controllability,
because the set of controllable states may as well depend on time and so may
its dimension (which, by the way, is a nonincreasing function of t). Fortunately
enough, however, a suitable blend of these two properties does the job, in the
sense that it enables us to identify a subset whose dimension is constant, which
depends continuously on the time, and which asymptotically coincides either
with the set of controllable states (as time tends to −∞) or with the set of
reachable states (as time tends to +∞). The following statement expresses these
properties in a precise manner.
Lemma (see [7]). Let ℛ(t) denote the set of states reachable (from 0) at time t
and let 𝒞(t) denote the set of states controllable (to 0) at time t. The subspace

𝒮(t) = ℛ(t) + 𝒞(t)    (2.3)

has constant dimension and satisfies 𝒮(t') = Φ(t', t)𝒮(t)
for all t' and t, where Φ(t', t) is the state transition matrix of the system (2.1).
In particular, there exist times T₁ ≤ T₂, with the properties that:

𝒮(t) = 𝒞(t) for all t < T₁
𝒮(t) = ℛ(t) for all t > T₂
The results of this Lemma can be clearly taken as the point of departure
for the construction of a canonical decomposition of the form (2.2). From (2.3),
one deduces the existence of a nonsingular matrix of continuous functions
ẋ = f(x) + Σ_{i=1}^{m} gᵢ(x)uᵢ    (3.1a)
yⱼ = hⱼ(x),  j = 1, ..., p    (3.1b)

with state x evolving on an open set V of Rⁿ; f, g₁, ..., g_m are smooth Rⁿ-valued
mappings (vector fields) and h₁, ..., h_p are smooth R-valued mappings, all
defined on the open set V.
Our purpose is to describe how Kalman's canonical structure theorem can
be extended to this class of systems. We proceed in two steps: first, we show
how to find a coordinate system in which the state variables are separated into
two groups, ζ₁ and ζ₂, such that the second group ζ₂ is not affected by the first
group ζ₁ and by the input to the system, or, respectively, the first group ζ₁
does not affect the second group ζ₂ and the output of the system. Then, we
interpret these decompositions in terms of reachability and observability
properties.
Formally, the first step of the program amounts to the solution of the
following two problems.
Problem 1. Consider a system of the form (3.1) and let x⁰ be a fixed point of
V. Find, if possible, a neighborhood V⁰ of x⁰ and a local coordinate transformation z = Φ(x) defined on V⁰ such that, in the new coordinates, the system
(3.1a) is represented by equations of the form:

ζ̇₁ = f₁(ζ₁, ζ₂) + Σ_{i=1}^{m} g₁ᵢ(ζ₁, ζ₂)uᵢ    (3.2a)
ζ̇₂ = f₂(ζ₂)    (3.2b)
Problem 2. Consider a system of the form (3.1) and let x⁰ be a fixed point of
V. Find, if possible, a neighborhood V⁰ of x⁰ and a local coordinates transformation z = Φ(x) defined on V⁰ such that, in the new coordinates, the control
system (3.1) is represented by equations of the form:

ζ̇₁ = f₁(ζ₁, ζ₂) + Σ_{i=1}^{m} g₁ᵢ(ζ₁, ζ₂)uᵢ    (3.3a)
ζ̇₂ = f₂(ζ₂) + Σ_{i=1}^{m} g₂ᵢ(ζ₂)uᵢ    (3.3b)
yⱼ = hⱼ(ζ₂),  j = 1, ..., p    (3.3c)
is used.
Pointwise, a distribution identifies a vector space, a subspace of Rⁿ. On the
basis of this fact, it is possible to extend to the notion of distribution a number
of elementary concepts related to the notion of vector space. Thus, if Δ₁ and
Δ₂ are distributions, their sum Δ₁ + Δ₂ and intersection Δ₁ ∩ Δ₂ are defined
pointwise as the sum and, respectively, intersection of the subspaces Δ₁(x) and
Δ₂(x). A distribution Δ₁ contains a distribution Δ₂, written Δ₁ ⊃ Δ₂, if
Δ₁(x) ⊃ Δ₂(x) for all x. A vector field f belongs to a distribution Δ, written
f ∈ Δ, if f(x) ∈ Δ(x) for all x. The dimension of a distribution at a point
x of V is the dimension of the subspace Δ(x).
The manner in which Δ(x) depends on x leads to a number of additional
characterizations. A distribution Δ, defined on an open set V, is nonsingular if
there exists an integer d such that

dim[Δ(x)] = d

for all x in V. A point x⁰ of V is a regular point of a distribution Δ if there
exists a neighborhood V⁰ of x⁰ with the property that Δ is nonsingular on V⁰.
A distribution Δ is involutive if the Lie bracket [τ₁, τ₂] of any pair of vector
fields τ₁ and τ₂ belonging to Δ is a vector field which also belongs to Δ:

τ₁ ∈ Δ, τ₂ ∈ Δ ⇒ [τ₁, τ₂] ∈ Δ
We recall that the Lie bracket [τ₁, τ₂] of the vector fields τ₁ and τ₂ is a new
vector field defined by

[τ₁, τ₂](x) = (∂τ₂/∂x)τ₁(x) − (∂τ₁/∂x)τ₂(x),

where ∂τ/∂x denotes the Jacobian matrix of τ with respect to x₁, ..., xₙ.
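For intuition, the bracket can be approximated by finite differences. A sketch (the two vector fields and the evaluation point are arbitrary illustration choices):

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of the vector field f at x."""
    x = np.asarray(x, dtype=float)
    J = np.zeros((x.size, x.size))
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        J[:, i] = (f(x + e) - f(x - e)) / (2 * h)
    return J

def lie_bracket(f, g, x):
    """[f, g](x) = (dg/dx) f(x) - (df/dx) g(x)."""
    return jacobian(g, x) @ f(x) - jacobian(f, x) @ g(x)

f = lambda x: np.array([x[1], 0.0])   # f = x2 d/dx1
g = lambda x: np.array([0.0, x[0]])   # g = x1 d/dx2
print(lie_bracket(f, g, np.array([1.0, 2.0])))  # close to [-1, 2]
```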
if and only if it
is involutive.
are linearly independent at the point x⁰. Then, it is always possible to choose,
in the set of functions:

x₁(x), ..., xₙ(x)

a subset of d functions whose differentials at x⁰, together with those of the set
(3.4), form a set of exactly n linearly independent row vectors. Let φ₁, φ₂, ..., φ_d
denote the functions thus chosen and set:

φ_{d+1}(x) = λ₁(x), ..., φₙ(x) = λ_{n−d}(x)

By construction, the Jacobian matrix of the mapping:

z = Φ(x) = col(φ₁(x), ..., φ_d(x), φ_{d+1}(x), ..., φₙ(x))

has rank n at x⁰ and, therefore, the mapping Φ qualifies as a local diffeomorphism
(i.e. a local smooth coordinates transformation) around the point x⁰. Now,
suppose τ is a vector field of Δ. In the new coordinates, this vector field is
represented in the form:

τ̃(z) = [(∂Φ/∂x) τ(x)]_{x = Φ⁻¹(z)}
Consistently with the notation introduced before for invariant distributions, the
property in question is usually written as L_f Ω ⊂ Ω, where L_f Ω = span{L_f ω :
ω ∈ Ω}. It is easy to show that the notion thus introduced is the dual version
of the notion of invariance of a distribution. More precisely, it can be proven
that if a smooth distribution Δ is invariant under the vector field f, then the
codistribution Ω = Δ⊥ is also invariant under f and, conversely, that if a smooth
codistribution Ω is invariant under the vector field f, then the distribution
Δ = Ω⊥ is also invariant under f.
The key role of the notion of invariant distribution is explained by the
following statement, in which we consider an involutive distribution Δ, invariant
under a vector field f.
f̃(z) = col(f₁(z₁, z₂), f₂(z₂))    (3.6)
ẋ = f(x)    (3.8a)
ż₂ = f₂(z₂)    (3.8b)
z = Φ(x),    (3.11)
at any time t.
Two initial conditions xᵃ and xᵇ satisfying (3.10) belong, by definition, to a
slice of the form (3.9). As we have just seen, the two corresponding trajectories
xᵃ(t) and xᵇ(t) of (3.7) necessarily satisfy (3.11), i.e. at any time t belong necessarily
to a slice of the form (3.9). Thus, we can conclude that the flow of (3.7) carries
slices (of the form (3.9)) into slices.
We have at this point recalled all the notions needed to understand how
Problems 1 and 2 stated at the beginning of the section can be solved. The
following two statements illustrate their solutions.
Proposition 3.1. Let Δ be a nonsingular involutive distribution of dimension d and
assume that Δ is invariant under the vector fields f, g₁, ..., g_m. Moreover, suppose
that the distribution span{g₁, ..., g_m} is contained in Δ. Then, for each point
x⁰ it is possible to find a neighborhood V⁰ of x⁰ and a local coordinates transformation z = Φ(x) defined on V⁰ such that, in the new coordinates, the control
system (3.1a) is decomposed as in the equation (3.2).
Proof. From the Lemma it is known that there exists, around each x⁰, a local
coordinates transformation yielding a representation of the form (3.6) for the
vector fields f, g₁, ..., g_m. In the new coordinates the vector fields g₁, ..., g_m,
which by assumption belong to Δ, are represented by vectors whose last (n − d)
components are vanishing (see (3.5)). This proves the Proposition. □
Proposition 3.2. Let Δ be a nonsingular involutive distribution of dimension d and
assume that Δ is invariant under the vector fields f, g₁, ..., g_m. Moreover, suppose
that the codistribution span{dh₁, ..., dh_p} is contained in the codistribution Δ⊥.
Then, for each point x⁰ it is possible to find a neighborhood V⁰ of x⁰ and a local
coordinates transformation z = Φ(x) defined on V⁰ such that, in the new
coordinates, the control system (3.1) is decomposed as in the equations (3.3).
Proof. As before, we know that there exists, around each x⁰, a coordinates
transformation yielding a representation of the form (3.6) for the vector fields
f, g₁, ..., g_m. In the new coordinates, the covector fields dh₁, ..., dh_p, which by
assumption belong to Δ⊥, must be represented by covectors whose first d
components are vanishing, and this completes the proof. □
+ ... + g_m(x)u_m

passing through the point x(T_k). Suppose that the assumptions of Proposition
3.1 are satisfied, choose a point x⁰, and set x(0) = x⁰. For small values of t the
state evolves on V⁰ and we may use the equations (3.2) to interpret the behavior
for every input u. Since the two states xᵃ and xᵇ produce the same output under
any input, they are indistinguishable.
From the previous discussion we could obtain only "negative" results, in
the sense that we identified, in the case of decomposition (3.2), a smooth
hypersurface of dimension d < n in which all states reachable at time t = T are
necessarily included and, in the case of decomposition (3.3), a smooth
hypersurface of dimension d < n whose states are necessarily indistinguishable.
If more "positive" information is sought, about the actual "thickness" of the
set of states reachable at some fixed time, or about the actual thickness of the
sets of states indistinguishable at some fixed time, one has obviously to look
first at decompositions of the form (3.2) in which the dimension d of the first
group of coordinates is the smallest possible, or at decompositions of the form
(3.3) in which the dimension d of the first group of coordinates is the largest
possible.
In view of the results summarized in the previous section, the first problem
amounts to finding, if possible, the smallest distribution invariant under the vector
fields f, g₁, ..., g_m which contains the vector fields g₁, ..., g_m. On the other hand,
the second problem amounts to finding, if possible, the smallest codistribution
invariant under the vector fields f, g₁, ..., g_m which contains the covector fields
dh₁, ..., dh_p.
The two objects in question can be calculated by means of appropriate (and
simple) algorithms. Consider the sequence of distributions recursively defined
as:

Δ₀ = span{g₁, ..., g_m}    (4.1a)
Δ_k = Δ_{k−1} + [f, Δ_{k−1}] + Σ_{i=1}^{m} [gᵢ, Δ_{k−1}]    (4.1b)
This sequence identifies the distribution needed in order to perform the minimal
decomposition of the form (3.2). In fact, the following is true.
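For a linear system, f(x) = Ax with constant fields gᵢ = bᵢ, the recursion (4.1) collapses to a plain subspace iteration, since [Ax, b] = −Ab for a constant field b. A numerical sketch (the data and tolerance are arbitrary illustration choices):

```python
import numpy as np

def _orth(M, tol=1e-10):
    """Orthonormal basis of the column space of M."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol]

def smallest_invariant_distribution(A, B):
    """Linear specialization of (4.1): Delta_0 = Im B and
    Delta_k = Delta_{k-1} + A Delta_{k-1}, iterated to a fixed point."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    basis = _orth(B)
    while True:
        bigger = _orth(np.hstack([basis, A @ basis]))
        if bigger.shape[1] == basis.shape[1]:
            return basis
        basis = bigger

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
D = smallest_invariant_distribution(A, B)
print(D.shape[1])  # 2: the smallest invariant distribution is all of R^2
```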
Ω_k = Ω_{k−1} + L_f Ω_{k−1} + Σ_{i=1}^{m} L_{gᵢ} Ω_{k−1}    (4.2b)

This sequence identifies the codistribution needed in order to perform the maximal
decomposition of the form (3.3). In fact, the following is true.
Theorem 4.1. Let P denote the smallest distribution invariant under f, g₁, ..., g_m
which contains g₁, ..., g_m. Suppose P and P + span{f} are both nonsingular. Let
p denote the dimension of P. Then, for each x⁰ ∈ V it is possible to find a neighborhood V⁰ of x⁰ and a coordinate transformation z = Φ(x) defined on V⁰ with the
following properties:
(a) the set R(x⁰, T) of states reachable at time t = T starting from x⁰ at t = 0
along trajectories entirely contained in V and under the action of piecewise
constant input functions is a subset of the slice
for all
Theorem 4.2. Let Q denote the annihilator of the smallest codistribution invariant
under f, g₁, ..., g_m which contains dh₁, ..., dh_p. Suppose Q is nonsingular and let
q denote its dimension. Then, for each x⁰ ∈ V it is possible to find a neighborhood
V⁰ of x⁰ and a coordinate transformation z = Φ(x) defined on V⁰ with the following
properties:
5 Irreducible Realizations
As shown for the first time by Kalman for linear systems in [2], the "canonical
structure theorem" is an important ingredient in the process of constructing
irreducible realizations of an input-output map. Starting from the observation
that a linear input-output map

y(t) = ∫₀ᵗ W(t, τ)u(τ)dτ

is realizable by a finite dimensional linear system if and only if the kernel W(t, τ)
is separable, i.e. there exist differentiable functions Q(t) and P(τ) such that:

W(t, τ) = Q(t)P(τ)

one can in fact use the canonical structure theorem in order to extract from
the "trivial" realization:

ẋ = P(t)u
y = Q(t)x

an irreducible one.
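For instance, the kernel W(t, τ) = e^(t−τ) is separable with Q(t) = e^t and P(τ) = e^(−τ). The following sketch (example data chosen by us, not from the paper) compares the trivial realization against direct evaluation of the input-output integral:

```python
import math

# Separable kernel W(t, tau) = Q(t) P(tau) = exp(t - tau).
Q = lambda t: math.exp(t)
P = lambda tau: math.exp(-tau)

def y_state(u, T=1.0, n=20000):
    """Trivial realization: xdot = P(t) u(t), y = Q(t) x, via Euler steps."""
    dt, x, t = T / n, 0.0, 0.0
    for _ in range(n):
        x += dt * P(t) * u(t)
        t += dt
    return Q(T) * x

def y_kernel(u, T=1.0, n=20000):
    """Direct evaluation of y(T) = int_0^T W(T, tau) u(tau) dtau."""
    dt = T / n
    return sum(dt * Q(T) * P(k * dt) * u(k * dt) for k in range(n))

u = lambda t: math.cos(3.0 * t)
print(abs(y_state(u) - y_kernel(u)))  # agreement up to rounding
```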
A similar procedure can be followed also in the problem of constructing
irreducible realizations of a nonlinear input-output map, starting from an
appropriate version of the separability condition which also holds in this more
general setting. For, recall (see [17] and [18]) that under suitable assumptions
on the input function, the output response of a (single-input) nonlinear system
of the form (3.1) can be expanded in the form of a Volterra series:

y(t) = w₀(t) + Σ_{k=1}^{∞} ∫₀ᵗ ∫₀^{τ₁} ... ∫₀^{τ_{k−1}} w_k(t, τ₁, ..., τ_k) u(τ₁) ... u(τ_k) dτ₁ ... dτ_k
where:
As in the case of a linear system, from the two functions P(t, x) and Q(t, x)
which separate the kernels one can define the "trivial" realization, which has
the form:

ẋ = P(t, x)u(t),  x(0) = x⁰
y = Q(t, x)x

and from this an irreducible realization can be extracted by means of the
canonical structure theorem.
References
[1] R.E. Kalman, Canonical structure of linear dynamical systems, Proc Nat Acad Sci U.S.A. 48,
pp 596-600 (1962)
[2] R.E. Kalman, Mathematical description of linear dynamical systems, SIAM J Contr 1, pp
152-192 (1963)
[3] R.E. Kalman, L. Weiss, Contributions to linear system theory, Int J Engng Sci 3, pp 141-171
(1965)
[4] D.C. Youla, The synthesis of linear dynamical systems from prescribed weighting patterns,
SIAM J Appl Math 14, pp 527-549 (1966)
[5] L.M. Silverman, H.E. Meadows, Controllability and observability of time-variable linear
systems, SIAM J Contr 5, pp 64-73 (1967)
[6] L. Weiss, On the structure theory of linear differential systems, SIAM J Contr 6, pp 659-680
(1968)
[7] P. d'Alessandro, A. Isidori, A. Ruberti, A new approach to the theory of canonical
decomposition of linear dynamical systems, SIAM J Contr 10, pp 148-158 (1972)
[8] R.W. Brockett, On the algebraic structure of bilinear systems, Theory and Applications of
Variable Structure Systems, R. Mohler and A. Ruberti eds, Academic Press, pp 153-168 (1972)
[9] P. d'Alessandro, A. Isidori, A. Ruberti, Realization and structure theory of bilinear dynamical
systems, SIAM J Contr 12, pp 517-535 (1974)
[10] H. Sussmann, V. Jurdjevic, Controllability of nonlinear systems, J Diff Eqs 12, pp 95-116 (1972)
[11] R.W. Brockett, System theory on group manifolds and coset spaces, SIAM J Contr 10, pp
265-284 (1972)
[12] H. Sussmann, Existence and uniqueness of minimal realizations of nonlinear systems, Math
Syst Theory 10, pp 263-284 (1977)
[13] R. Hermann, A.J. Krener, Nonlinear controllability and observability, IEEE Trans Aut Contr
AC-22, pp 728-740 (1977)
[14] A. Isidori, A.J. Krener, C. Gori Giorgi, S. Monaco, Nonlinear decoupling via feedback: a
differential geometric approach, IEEE Trans Aut Contr AC-26, pp 331-345 (1981)
[15] M. Fliess, Réalisation locale des systèmes non linéaires, algèbres de Lie filtrées transitives et
séries génératrices non commutatives, Invent Math 71, pp 521-537 (1983)
[16] A. Isidori, Nonlinear control systems, 2nd ed, Springer-Verlag, pp 1-480 (1989)
[17] R.W. Brockett, Volterra series and geometric control theory, Automatica 12, pp 167-176 (1976)
[18] C. Lesjak, A.J. Krener, The existence and uniqueness of Volterra series for nonlinear systems,
IEEE Trans Aut Contr AC-23, pp 1091-1095 (1978)
[19] A. Isidori, A. Ruberti, A separation property of realizable Volterra kernels, Syst Contr Lett
1, pp 309-311 (1982)
1 Introduction to HUM
Let us consider a Distributed Parameter System, i.e. a system with a state
equation given by a P.D.E. (Partial Differential Equation) of evolution type.
Let us write, in a formal fashion for the time being,

∂y/∂t + 𝒜(y) = ℬv    (1.1)

where y denotes the state, v denotes the control; in (1.1) 𝒜 is a P.D.O. (Partial
Differential Operator) which is linear or non linear. The operator ℬ maps the
space of controls into the space of "acceptable" right hand sides.
This way of writing things is of course formal.
In "real" situations, v can appear only on the boundary of the spatial domain
Ω where (1.1) is considered, or it can appear inside Ω (boundary, resp.
distributed, control in the first, resp. second, situation).
We confine ourselves here to deterministic P.D.E.'s. In (1.1), y can be a scalar
function or a system y = {y₁, ..., y_N}.
In the above notations, the wave equation

(1.2)

is written in a system form (i.e. y becomes in fact the couple {y, ∂y/∂t}).
Of course to (1.1) one should add boundary conditions and initial conditions.
We shall assume that the boundary conditions are implicitly taken care of
in (1.1). But of course this has to be made precise in each specific situation.
The initial condition is

y(0) = y⁰    (1.3)

where y⁰ is given in a suitable function space and where y(0) denotes the function
x → y(x, 0), if y(x, t) denotes "the" solution of (1.1), (1.3) subject to appropriate
boundary conditions.
J. L. Lions
We shall denote by

y(x, t; v) = y(v)

any solution of (1.1) and (1.3) (subject to appropriate boundary conditions).
Adaptations of what follows to cases where y(v) is not unique are given (at least
in some cases) in J.L. Lions [2], A.V. Fursikov [1] and [2].
Exactly as in the finite dimensional case, one considers next a cost function
defined by

J(v) = φ(y(v), v)    (1.4)

where φ(y, v) is a function from Y × 𝒰_ad → ℝ, where Y (resp. 𝒰_ad) denotes the space
(resp. set) of states (resp. admissible controls).
One looks then for

inf J(v),  v ∈ 𝒰_ad,    (1.5)

where y(v) can also be subject to further restrictions (the state constraints).
In the (very many) problems and still open questions which present
themselves along these lines, R.E. Kalman's work has always been a "guiding
line". What part of Kalman's theory can be extended to the infinite dimensional
situations?
0
The first question to consider is of course the Kalman filter. One can indeed
obtain an infinite dimensional analogue of the Riccati equation of Kalman's
filter, based on the general theory of non homogeneous boundary value problems
(as studied in J.L. Lions and E. Magenes [1]) and on the use of L. Schwartz's
kernel theorem (L. Schwartz [1]); cf. J.L. Lions [1] and [3].
0
Another question is to study the controllability problem.
For linear systems, a general method has been introduced in J.L. Lions [4]
and [5]. Let us describe it in a formal fashion and in a very particular situation.
Let us consider the wave equation

∂²y/∂t² − Δy = vχ_𝒪 in Ω × (0, T)    (1.6)

where 𝒪 denotes an open subset of the bounded open set Ω of ℝⁿ and where
χ_𝒪 denotes the characteristic function of 𝒪.
The boundary condition is chosen to be (just in order to fix ideas; what
follows is completely general)

y = 0 on Γ × (0, T),  Γ = ∂Ω = boundary of Ω    (1.7)
y(0) = 0,  ∂y/∂t(0) = 0.    (1.8)
Let T > 0 be given. Let z⁰, z¹ be two functions given in "appropriate" function
spaces.
We shall say that the problem is exactly controllable if for any couple {z⁰, z¹}
there exists a function v (in a suitable function space) such that the corresponding
solution y(v) satisfies

y(T; v) = z⁰,  ∂y/∂t(T; v) = z¹.    (1.9)
A few remarks are in order.
Remark 1.2. Of course the above problem is ambiguous. We have to make
precise what are the function spaces which are considered. This is a highly non
trivial question, as we shall indicate below. In order to "start" the machinery,
we shall assume that

v spans L²(𝒪 × (0, T))    (1.10)
Remark 1.3. Due to the finite speed of propagation of, say, singularities, it is
clear that finding v for "any" couple {z⁰, z¹} is impossible unless T is "large
enough", a condition which can be made precise, as in J.L. Lions [4] and
[5].
0
Remark 1.4. In the linear case, having zero initial conditions as in (1.8) is not
a restriction.
0
Remark 1.5. Assume there is exact controllability. Let ε > 0 be chosen so small
that there is still exact controllability between ε and T. Then let us choose v₁,
an arbitrary L² function in 𝒪 × (0, ε). This choice will drive the system to the state

y(ε; v₁) = ξ⁰,  ∂y/∂t(ε; v₁) = ξ¹

and, exact controllability holding between ε and T, there exists v₂ such that

v = v₁ in 𝒪 × (0, ε),  v = v₂ in 𝒪 × (ε, T)    (1.11)

drives the system from {0, 0} at time 0 to {z⁰, z¹} at time T. Therefore there are
infinitely many controls v such that (1.9) holds true. It becomes then very natural
to look for

inf (1/2) ∫∫_{𝒪 × (0,T)} v² dx dt    (1.12)

among all the v's (if they exist) such that (1.9) holds true (state constraints).
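The finite-dimensional counterpart of the minimization (1.12) has a closed-form solution through the controllability Gramian, and illustrates the idea behind HUM. A sketch (the double integrator, horizon and target are arbitrary illustration choices, not from the paper):

```python
import numpy as np

# Minimum-L2-norm control for the double integrator xdot = Ax + Bu,
# steering x(0) = 0 to x(T) = z: u(t) = B^T exp(A^T (T - t)) W^{-1} z,
# with W the controllability Gramian.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
T, z = 1.0, np.array([1.0, -1.0])

# For this A, exp(A s) = [[1, s], [0, 1]], so the Gramian is closed-form:
W = np.array([[T**3 / 3, T**2 / 2],
              [T**2 / 2, T]])
lam = np.linalg.solve(W, z)

def u(t):
    # B^T exp(A^T (T - t)) lam  =  (T - t) lam_0 + lam_1
    return (T - t) * lam[0] + lam[1]

n = 20000
dt = T / n
x = np.zeros(2)
for k in range(n):
    x = x + dt * (A @ x + B[:, 0] * u(k * dt))
print(x)  # close to the target z = [1, -1]
```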
Lv = {y(T; v), ∂y/∂t(T; v)}    (1.13)

We define next

G(q⁰, q¹) = 0 if q⁰ = z⁰ and q¹ = z¹,  G(q⁰, q¹) = +∞ otherwise,    (1.14)

F(v) = (1/2) ∫∫_{𝒪 × (0,T)} v² dx dt    (1.15)

inf [F(v) + G(Lv)] = J₀    (1.16)
Using duality theory (of course with the necessity to prove that we can
indeed use it here!), one has

J₀ = −inf [F*(L*q) + G*(−q)]    (1.17)

F*(v) = F(v)    (1.18)
(1.19)
and that
∂²ρ/∂t² − Δρ = 0,
ρ(x, T) = ρ⁰,  ∂ρ/∂t(x, T) = ρ¹ in Ω,
ρ = 0 on Γ × (0, T).    (1.20)
Then, multiplying (1.20) by y = y(v) and integrating by parts, one finds that

∫∫_{𝒪 × (0,T)} ρ v dx dt.    (1.21)
(1.22)
ρ⁰, ρ¹
where
(1.23)
with ρ = solution of (1.20).
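The identity used in (1.21) follows from two integrations by parts; a sketch of the computation, using the zero initial data (1.8), the boundary conditions, and the terminal data of (1.20):

```latex
0 = \int_0^T\!\!\int_\Omega y\Big(\frac{\partial^2\rho}{\partial t^2}-\Delta\rho\Big)\,dx\,dt
  = \int_\Omega\Big(y\,\frac{\partial\rho}{\partial t}-\frac{\partial y}{\partial t}\,\rho\Big)(x,T)\,dx
    + \int_0^T\!\!\int_\Omega\Big(\frac{\partial^2 y}{\partial t^2}-\Delta y\Big)\rho\,dx\,dt,
```

so that, by (1.6) and the terminal conditions ρ(T) = ρ⁰, ∂ρ/∂t(T) = ρ¹,

∫_Ω ( ∂y/∂t(x, T)ρ⁰ − y(x, T)ρ¹ ) dx = ∫∫_{𝒪 × (0,T)} ρ v dx dt.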
We now observe that

( ∫∫_{𝒪 × (0,T)} ρ² dx dt )^{1/2}    (1.24)
Remark 1.6. Of course in order to obtain more precise results, one has to
possibly characterize the space F (and in any case to obtain inclusion relations
with "classical" spaces). We refer here to J.L. Lions [4] and [5] and to many other
works, in particular by A. Haraux, V. Komornik, and E. Zuazua.
The space F can depend on (!). Formally F = F((!)) becomes "very large" as
becomes "very small", so that F' = F'((!)) gets smaller as (!) decreases. This is
coherent with common sense: the smaller the set (!), the smaller is the set of
ZO,Zl that we can reach using vEL2((!) x (0, T)). This remark leads to a natural
question: one observes first that if (!) = Q (and if T is large enough), then
F = HMQ) x L2 (n), where H~(Q) denotes the Sobolev space of functions
cpEL2 (Q) such that ocp/ox;EL2 (n), i ~ i ~ n and such that cp = on r = on.
What are the conditions on 𝒪 which imply that F(𝒪) = H₀¹(Ω) × L²(Ω)? Here y = y(v) denotes the state of the system

∂²y/∂t² − Δy = v χ_𝒪  in Ω × (0, T),   (1.26)
y = 0  on Γ × (0, T),   (1.27)
y(x, 0) = 0,  ∂y/∂t (x, 0) = 0  in Ω.   (1.28)
Remark 1.11. The situation is slightly different for parabolic systems (i.e. time-irreversible systems). We shall not enter into this topic here, referring to J.L. Lions [5] Vol. 2 and [6]. □
Remark 1.12. There are several applications which also motivate the above considerations. The main one is related to the control and stabilization of flexible structures. Cf. [6] loc. cit. for other questions. Of course all this directly leads to nonlinear P.D.E.'s. A few remarks concerning these cases are presented in the next section.
∂²y/∂t² − Δy + y g(y) = v χ_𝒪.   (2.1)

One considers the linearized state equation

∂²y/∂t² − Δy + y g(ξ) = v χ_𝒪   (2.2)

and one chooses a control v = v(ξ) such that

y(T; v) = z⁰,  ∂y/∂t (T; v) = z¹,   (2.3)

v(ξ) being, for instance, the control of minimal L²(𝒪 × (0, T)) norm.   (2.4)

This uniquely defines y = y(x, t; v(ξ)) = M(ξ). One will have obtained a control v driving (2.1) from state {0, 0} to state {z⁰, z¹} at time T if one finds a fixed point of M:

M(ξ) = ξ.   (2.5)
∂²y/∂t² − Δy + |∂y/∂t|^{p−2} ∂y/∂t = v χ_𝒪,   (2.6)

with the boundary and initial conditions (1.7) and (1.8),   (2.7)

and with

p > 2.   (2.8)

Existence and uniqueness of a solution of (2.6), (1.7), and (1.8) is well known.
The counterexample runs as follows. One introduces a function ψ = ψ(x) with

1/p + 1/q = 1/2,   (2.9)

and one shows that

∫_Ω ψ (∂y/∂t (x, T))² dx ≤ C  for every v ∈ L²(𝒪 × (0, T)),   (2.10)

so that every state reachable from {0, 0} satisfies a bound of this type; exact controllability is therefore impossible.
The conclusion imposes itself: many open questions remain for the controllability of nonlinear distributed systems. We refer also to the papers [6] and [7] of the Author.
Note added in Proof. Many refined results going much beyond the content of Remark 2.3 have been obtained by I. Diaz. A complete report of these results
Chapter 8
Influence in Mathematics
In 1960 Kalman introduced the state space method as a systematic tool for the study of linear dynamical systems

ẋ = Ax + Bu,  x(0) = x₀,
y = Cx + Du.

When one assumes that the state vector x(t) is subject to zero initial condition (x₀ = 0), then after Laplace transformation the relation between the input function u(s) and the output function y(s) is given by

y(s) = W(s)u(s),

where

W(s) = D + C(sI − A)⁻¹B   (1)
is the transfer function of the system. The converse problem is to realize a given proper rational matrix function as the transfer function of a linear system, i.e. given W(s) find matrices A, B, C, D so that (1) holds. Using the notions of controllability, observability and minimal realization, Kalman clarified the precise connection between an input-output or classical frequency domain description of a linear dynamical system on the one hand and a state space model on the other. Kalman also introduced a primitive axiomatic framework for linear dynamical systems and pursued applications of the state space method to specific engineering problems, such as the linear quadratic regulator problem and the Kalman filter.
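The realization formula (1) is easy to exercise numerically. The following sketch (a hypothetical numpy example, not taken from the paper; the matrices A, B, C, D are illustrative) evaluates the transfer function of a small state space model:

```python
import numpy as np

def transfer_function(A, B, C, D, s):
    """Evaluate W(s) = D + C (sI - A)^(-1) B for a state space model."""
    n = A.shape[0]
    return D + C @ np.linalg.solve(s * np.eye(n) - A, B)

# Companion-form realization of W(s) = 1 / (s^2 + 3s + 2).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # eigenvalues -1 and -2
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

W1 = transfer_function(A, B, C, D, 1.0)
print(W1[0, 0])   # 1/(1 + 3 + 2) = 1/6
```

Minimality of the realization (controllability plus observability) is exactly what keeps the order of A as small as possible for a given W.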
Some of these same ideas appeared in a different form earlier in the operator model theory of Livsic and Brodskii [7] and also concurrently but independently in the model theory of Sz.-Nagy and Foias [13] and de Branges and Rovnyak [6]. The characteristic function attached to an operator on a Hilbert space in this theory is in fact the transfer function for a system having extra symmetries. For a recent overview of the connections between system theory and operator model theory, see [2]. Starting in the middle 1970's, researchers such as Baras, Brockett, Dewilde, Fuhrmann, and Helton analyzed the connections between operator model theory (in particular the Beurling-Lax theorem and shift invariant subspaces in H²) and system theory. Some of this work led in turn to the polynomial module approach to systems theory.
It has now become clear that the state space method introduced by Kalman is relevant to the understanding of rational matrix functions from the purely mathematical point of view as a generalization of classical function theory. Issues to be understood (also basic to a multitude of applications) are matricial analogues of all sorts of factorization and interpolation problems. The factorization problems include: polynomial, minimal (no pole-zero cancellations), spectral and Wiener-Hopf (splitting poles and zeros of the factors to specific regions of the complex plane), inner-outer and coprime. Interpolation problems include: finding a polynomial with given zeros, finding a rational function with given poles and zeros, finding a rational inner (all-pass) function with given zeros in the right half plane, Lagrange interpolation for polynomials, Pade and Cauchy interpolation for rational functions, and the classical interpolation problems of Schur, Nevanlinna-Pick, Caratheodory-Toeplitz, Schur-Takagi and Nehari. (These last topics are particularly relevant to the active area of H∞ control which has developed in the past decade.) Work up to 1970 on rational matrix functions generally dealt directly with polynomial coefficients of matrix entries, expressed many ideas in terms of determinants and minors, developed solution algorithms using row and column operations, or took the linear dynamical system as the main object of study rather than the rational matrix function itself. Examples of this flavor of work are the spectral factorization algorithm of Youla [17] and the well-known book of Rosenbrock [15].
Somewhat later there began a more systematic exploitation of state space methods in the study of rational matrix functions. The relevance of the state space method to understanding rational matrix functions is the idea of using four matrices A, B, C, D to express a rational matrix function W(z) in the realized form (1) as the transfer function of a linear system. The next step is to formulate the problem under study in terms of the matrices A, B, C, D and then to reduce the solution of the problem to standard linear algebra manipulations on finite matrices. This has the additional payoff that one is able to get explicit formulas for the final solution rather than merely existence theorems. An early state space algorithm for spectral factorization was obtained by Anderson [1]. Later Bart, Gohberg, Kaashoek [5] applied a state space approach for general minimal factorizations to spectral factorization. A major more recent breakthrough was Glover's state space solution (including a linear fractional parametrization of the set of all solutions) of the Nehari problem [8]. Here we discuss a particular line of development pursued in recent years by us and together with a number of our colleagues. Inspiration for the work comes both from system theory and operator model theory; a good broad sample of the work can be found in the books [3, 5, 9, 10, 11, 12]. In this approach, zero and pole data (including geometric directional information) are encoded in a pair of matrices ((C, A) or (A, B)) which forms a piece of a realization of the associated function (see [11, 3]).
To illustrate, consider the scalar Nevanlinna-Pick problem on the right half plane Π⁺: given points α₁, …, αₘ on the imaginary line and complex numbers w₁, …, wₘ, find a rational function f analytic on Π⁺ with

sup{|f(z)| : z ∈ Π⁺} ≤ 1   (2)

such that

f(αᵢ) = wᵢ  for i = 1, …, m.   (3)

If any wᵢ has |wᵢ| > 1, then (2) and (3) are incompatible and the problem is not solvable. On the other hand, it can be shown that if |wᵢ| ≤ 1 for all i, then the problem always has a solution (see Chap. 21 in [3]). If |wᵢ| < 1 for all i, one can always solve the problem even when additional derivative constraints are imposed on f at αᵢ. The simplest boundary Nevanlinna-Pick interpolation is to assume also that |wᵢ| = 1 for i = 1, …, m and to impose interpolation constraints on the derivative of f as well as the value at the point αᵢ:

f′(αᵢ) = γᵢ,  i = 1, …, m.   (4)

It turns out that for any f satisfying (2) and (3) necessarily f′(αᵢ) has the form f′(αᵢ) = −wᵢρᵢ where ρᵢ ≥ 0. This fact has to do with classical properties of angular derivatives for Schur-class functions; we refer to [16] for a recent elegant Hilbert space derivation of the basic properties of angular derivatives for general (nonrational) Schur-class functions and to [3] for a more elementary account of the rational case.
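For interpolation nodes in the interior of Π⁺ (the usual Nevanlinna-Pick setting), solvability is classically equivalent to positive semidefiniteness of the Pick matrix with entries (1 − wᵢw̄ⱼ)/(zᵢ + z̄ⱼ). This criterion is easy to check numerically; the sketch below is a hypothetical illustration (the data are made up, and the criterion is the standard scalar one, not the matrix version treated in this paper):

```python
import numpy as np

def pick_matrix(z, w):
    """Half-plane Pick matrix P[i, j] = (1 - w_i * conj(w_j)) / (z_i + conj(z_j))."""
    z = np.asarray(z, dtype=complex)
    w = np.asarray(w, dtype=complex)
    return (1 - np.outer(w, w.conj())) / (z[:, None] + z.conj()[None, :])

def solvable(z, w, tol=1e-10):
    """The interpolation problem is solvable iff the Pick matrix is PSD."""
    return np.linalg.eigvalsh(pick_matrix(z, w)).min() >= -tol

z = [1.0, 2.0 + 1.0j]                  # nodes in the right half plane
print(solvable(z, [0.5, 0.25]))        # modest targets: True
print(solvable(z, [0.99, -0.99]))      # near-unimodular, far apart: False
```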
We now turn to the simplest matrix analogue of the interpolation problem (2)-(4) (where γᵢ = −wᵢρᵢ with ρᵢ ≥ 0). We also permit interpolation conditions in the interior of Π⁺ as in the usual Nevanlinna-Pick problem. We are given n points z₁, …, zₙ in Π⁺, n nonzero 1 × N row vectors x₁, …, xₙ, n 1 × N row vectors y₁, …, yₙ, together with m points α₁, …, αₘ on the imaginary line, 2m N × 1 column vectors ξ₁, …, ξₘ and η₁, …, ηₘ all of unit length, and m positive real numbers ρ₁, …, ρₘ. Then the problem is to find all rational N × N matrix functions F, analytic on Π⁺ and of norm at most 1 there, satisfying interpolation conditions (5)-(8); among these are boundary conditions of the form

F(αⱼ)ξⱼ = ηⱼ.

The solution (Theorem 1) is given by a linear fractional parametrization built from the data: a Pick-type matrix Λ with entries

Λ_{kl} = (ξₖ*ξₗ − ηₖ*ηₗ)/(ᾱₖ + αₗ)  if k ≠ l,  Λ_{kk} = ρₖ,

the matrix

X = [(xᵢξⱼ − yᵢηⱼ)/(zᵢ − αⱼ)],  1 ≤ i ≤ n, 1 ≤ j ≤ m,

and finally a 2N × 2N rational matrix function

Θ(z) = [θ₁₁(z)  θ₁₂(z); θ₂₁(z)  θ₂₂(z)] = I + C_Θ(zI − A_Θ)⁻¹T⁻¹B_Θ,   (9)

where

A_Θ = [A_ζ  0; 0  A₀],  B_Θ = [B_ζ; B₀],  C_Θ = [C_ζ  C₀].
The idea behind the proof of Theorem 1 is as follows. One can show by a direct winding number argument that a J-lossless rational matrix function Θ(z), i.e. Θ(z) such that, for J = I_N ⊕ (−I_N),

Θ(z)*JΘ(z) = J  if  ℜz = 0,   (10)

Θ(z)*JΘ(z) ≤ J  if  ℜz > 0,   (11)

and which has the appropriate pole and zero structure at the interpolation nodes z₁, …, zₙ, α₁, …, αₘ, provides a linear fractional map which gives a description of the set of all solutions. To find a rational matrix function Θ having these properties, one assumes that Θ(z) is in the realized form Θ(z) = I + C(zI − A)⁻¹B of the transfer function of a linear system and derives the corresponding conditions on the matrices A, B and C, which in turn are solvable.
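Once Θ is in hand, each solution F is obtained from a free Schur-class parameter g by the linear fractional map F = (θ₁₁g + θ₁₂)(θ₂₁g + θ₂₂)⁻¹. A hypothetical scalar sketch of the evaluation (the particular Θ and g below are illustrative only):

```python
def linear_fractional(theta, g, z):
    """Evaluate F(z) = (t11(z)*g(z) + t12(z)) / (t21(z)*g(z) + t22(z))
    for scalar-valued blocks t11, ..., t22 of theta."""
    t11, t12, t21, t22 = theta(z)
    gz = g(z)
    return (t11 * gz + t12) / (t21 * gz + t22)

# With identity blocks the map returns the free parameter itself.
theta = lambda z: (1.0, 0.0, 0.0, 1.0)
g = lambda z: 0.5
print(linear_fractional(theta, g, 1.0 + 1.0j))   # 0.5
```

A nontrivial Θ deforms the parameter g while preserving the norm bound; that is exactly what the J-losslessness conditions (10)-(11) guarantee.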
The method generalizes easily to the more complicated situation where the interpolation conditions involve higher order derivatives of the unknown function F. In this case, the interpolation conditions themselves are expressed more conveniently in terms of a collection of matrices reminiscent of state space ideas. Complete details on this general case can be found in Chap. 21 of the monograph [3]. A direct elementary proof of this special case will appear in [4].
References

[1] B.D.O. Anderson, An algebraic solution to the spectral factorization problem, IEEE Trans Auto Control AC-12 (1967), 410-414
[2] J.A. Ball and N. Cohen, De Branges-Rovnyak operator models and systems theory: a survey, in Proc Workshop on Matrix and Operator Theory, Rotterdam (June 1989), to appear
[3] J.A. Ball, I. Gohberg and L. Rodman, Interpolation of Rational Matrix Functions, Birkhäuser Verlag (Basel), OT 45, 1990
[4] J.A. Ball, I. Gohberg and L. Rodman, Boundary Nevanlinna-Pick interpolation for rational matrix functions, J Math Systems, Estimation and Control, to appear
[5] H. Bart, I. Gohberg and M.A. Kaashoek, Minimal Factorization of Matrix and Operator Functions, Birkhäuser Verlag (Basel), 1979
[6] L. de Branges and J. Rovnyak, Appendix on square summable power series, Canonical models in quantum scattering theory, in Perturbation Theory and its Applications in Quantum Mechanics (ed. C.H. Wilcox), Wiley (New York), 1966
[7] M.S. Brodskii, Triangular and Jordan Representations of Linear Operators, Transl Math Monographs Vol 32, Amer Math Soc (Providence), 1971
[8] K. Glover, All optimal Hankel-norm approximations of linear multivariable systems: relations to approximations, Int J Control 39 (1984), 1115-1193
[9] I. Gohberg (ed.), Topics in Interpolation Theory of Rational Matrix-valued Functions, Birkhäuser Verlag, OT 33 (Basel), 1988
[10] I. Gohberg and M.A. Kaashoek (ed.), Constructive Methods of Wiener-Hopf Factorizations, Birkhäuser Verlag, OT 21 (Basel), 1986
[11] I. Gohberg, P. Lancaster and L. Rodman, Matrix Polynomials, Academic Press (New York), 1982
[12] I. Gohberg, P. Lancaster and L. Rodman, Invariant subspaces of matrices with applications, J. Wiley and Sons (New York), 1986
[13] B. Sz.-Nagy and C. Foias, Harmonic Analysis of Operators on Hilbert Space, American Elsevier (New York), 1970
[14] L. Rodman, An Introduction to Operator Polynomials, Birkhäuser Verlag, OT 38 (Basel), 1989
[15] H.H. Rosenbrock, State Space and Multivariable Theory, John Wiley (New York), 1970
[16] D. Sarason, Angular derivatives via Hilbert space, Complex Variables: Theory and Applications 10 (1988), 1-10
[17] D.C. Youla, On the factorization of rational matrices, IRE Trans on Information Theory (1961), 172-189
In this paper explicit formulas are given for the solutions of singular integral
equations with a rational symbol. One of the main ideas is to use the state
space method from linear systems theory to reduce the problem to a problem
for input/output systems. Also, the Fredholm characteristics are described and
a review is given of the factorization method.
Introduction
The state space method is based on the fact that a proper rational matrix function W(λ) can be written in the form

W(λ) = D + C(λ − A)⁻¹B,   (1)

where A is a square matrix whose order may be much larger than the size of W(λ), and B, C, and D are matrices of appropriate sizes. The representation (1) allows one to reduce problems about rational matrix functions in a successful way to problems about constant matrices, and it often suggests explicit and easily computable formulas for the solutions. In the last decade the state space approach has also proved to be effective in solving various problems of mathematical analysis (see [3] and the references therein).
In the present paper we apply the state space method to solve singular integral equations. Such equations serve as a tool to solve problems in many fields of applications (statistics, mathematical physics, mechanics, etc.). For the general theory and examples of applications we refer to the books [8], [10], [11], and [14]. The equations we deal with will have a rational matrix symbol and are of the form

a(λ)φ(λ) + b(λ)(Sφ)(λ) = f(λ),  λ ∈ Γ.   (2)

Here the contour Γ consists of a finite number of disjoint smooth simple Jordan curves, S denotes the operator of singular integration along Γ (see (2.2) below), a(·) and b(·) are given m × m rational matrix functions, which have no poles on Γ, and f is a given function from L₂ᵐ(Γ), the space of all ℂᵐ-valued functions that are square integrable on Γ. The matrix function W(λ),

W(λ) := [a(λ) − b(λ)]⁻¹[a(λ) + b(λ)],
we write
W(A) = I
+ C(AG -
A)-lB
(3)
Here A,B, and C are as in (1), G is a square matrix oLth.e same order as A,
and I stands for the m x m identity matrix. In terms of the, matrices A, G, B,
and C we shall give necessary and sufficient conditions for the inversion of the
equation (2) and an explicit formula for its solution. Also the Fredholm characteristics of equation (2) will be described in terms of these four matrices.
Section 1 is of preliminary character. Here we list a number of facts about the representation (3) and we review the spectral separation theorem for a pencil λG − A. Section 2 contains the main theorems. The proofs are given in Sect. 3. In Sect. 4 the state space method is also used to derive an explicit presentation of the factorization approach. This last section may be viewed as a continuation of the theory developed in [2], [6], which concerns Wiener-Hopf integral operators, infinite block Toeplitz matrices and singular integral operators with proper rational symbols.
1 Preliminaries
Throughout this paper r is a contour of the type appearing in (2). Thus r
consists of a finite number of disjoint smooth simple Jordan curves. The inner
domain of r will be denoted by Li + and its outer domain by Li _. In wh at
follows we ass urne that 00 ELi _.
Whenever convenient we identify a p x q matrix with the linear transformation from (Cq into (CP defined by the canonical action of the matrix relative to
the standard bases in (Cq and (CP. The symbol I denotes an identity operator
or a square identity matrix.
1a. Matrix pencils. Let A and G be n × n complex matrices. The expression λG − A, where λ is a complex parameter, is called a (linear matrix) pencil. We say that the pencil λG − A is Γ-regular if det(ζG − A) ≠ 0 for each ζ on the contour Γ. In that case one can define the following matrices:

P = (1/2πi) ∫_Γ G(ζG − A)⁻¹ dζ,  Q = (1/2πi) ∫_Γ (ζG − A)⁻¹G dζ.   (1.1)
We shall need the following spectral decomposition result. For its proof we
refer to [12], see also [6], Sect. 2.
Proposition 1.1. Let λG − A be Γ-regular, and let the matrices P and Q be defined by (1.1). Then P and Q are projections which have the following properties:
(1) PG = GQ and PA = AQ;
(2) (λG − A)⁻¹P = Q(λG − A)⁻¹ on Γ and this function has an analytic continuation on Δ₋ which vanishes at ∞;
(3) (λG − A)⁻¹(I − P) = (I − Q)(λG − A)⁻¹ on Γ and this function has an analytic continuation on Δ₊.
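Proposition 1.1 can be checked numerically. The sketch below (a hypothetical numpy example; the pencil is illustrative) approximates the contour integrals (1.1) over the unit circle by the trapezoidal rule and recovers the spectral projection:

```python
import numpy as np

def pencil_projections(G, A, npts=400):
    """Approximate P, Q of (1.1) for Gamma the unit circle by quadrature."""
    n = G.shape[0]
    P = np.zeros((n, n), dtype=complex)
    Q = np.zeros((n, n), dtype=complex)
    for k in range(npts):
        zeta = np.exp(2j * np.pi * k / npts)
        R = np.linalg.inv(zeta * G - A)
        w = zeta / npts          # (1/2πi) · dζ for this quadrature node
        P += w * (G @ R)
        Q += w * (R @ G)
    return P, Q

# G = I, A with eigenvalues 0.5 (inside Gamma) and 2.0 (outside).
G = np.eye(2)
A = np.diag([0.5, 2.0])
P, Q = pencil_projections(G, A)
print(np.round(P.real, 6))   # projection onto the eigenspace of 0.5
```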
Here G and A are square matrices of the same size, n × n say, the pencil λG − A is Γ-regular, and B and C are matrices of sizes n × m and m × n, respectively.
Proof. Since W has no poles on the contour Γ, we can decompose W as W = W₊ + W₋, where W₊ and W₋ are rational matrix functions, W₊ has no poles on Δ₊ ∪ Γ, and W₋ is strictly proper and has no poles on Δ₋ ∪ Γ. By the classical realization theorem W₋ admits a representation of the form

W₋(λ) = C₁(λ − A₁)⁻¹B₁,

where A₁ is a square matrix that has all its eigenvalues in Δ₊, and B₁ and C₁ are matrices of appropriate sizes. Next, choose a fixed λ₀ ∈ Δ₊, and put

Ψ(λ) = λ⁻¹[I − W₊(λ₀ + λ⁻¹)].   (1.3)

The matrix function Ψ is strictly proper and Ψ has no poles in the set

{λ ∈ ℂ : λ₀ + λ⁻¹ ∈ Δ₊ ∪ Γ}.   (1.4)
Hence, by the realization theorem again, Ψ admits a representation

Ψ(λ) = C₂(λ − A₂)⁻¹B₂   (1.5)

such that A₂ has no eigenvalues in the set (1.4). From (1.3) and (1.5) it follows that

W₊(z) = I − (z − λ₀)⁻¹ Ψ((z − λ₀)⁻¹) = I + C₂(zA₂ − (I₂ + λ₀A₂))⁻¹B₂,

where I₂ is the identity matrix of the same order as A₂. Since A₂ has no eigenvalues in the set (1.4), we have

det[zA₂ − (I₂ + λ₀A₂)] ≠ 0,  z ∈ Δ₊ ∪ Γ.
Now put
AG_A=(A-A 1
C = (Cl
AA 2 - (l2
B=
C 2 ),
+ Ao A 2)
(~:)
Then W(λ) = I + C(λG − A)⁻¹B for λ ∈ Γ. Moreover, with A^× := A − BC,

W(λ)⁻¹ = I − C(λG − A^×)⁻¹B,  λ ∈ Γ.   (1.6)

2 The Main Theorems

Consider the singular integral equation (2),

a(λ)φ(λ) + b(λ)(Sφ)(λ) = f(λ),  λ ∈ Γ,   (2.1)

where S is the operator of singular integration,

(Sφ)(λ) = (1/πi) ∫_Γ φ(μ)/(μ − λ) dμ,  λ ∈ Γ,   (2.2)
(2.2)
where the integral is taken in the sense of the Cauchy principal value. The
operator S defined in this way can be extended by continuity to a bounded
linear operator, again denoted by S, on all of L~(T) (see, e.g. [4], Sect. 1.3).
The operator S enjoys the property that S2 = I. Hence Pr= -!-(l + S) and
With the multiplication operator

(M_Wφ)(λ) = W(λ)φ(λ),  λ ∈ Γ,   (2.4)

equation (2.1) may be rewritten as

(M_WP_Γ + Q_Γ)φ = g,  g := (a − b)⁻¹f.   (2.5)

We shall refer to M_WP_Γ + Q_Γ as the singular integral operator with symbol W. (Strictly speaking the symbol is the diagonal matrix W(·) ⊕ I_Γ, where I_Γ stands for the function which is identically equal on Γ to the m × m identity matrix; in what follows we shall omit this second function.)
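When Γ is the unit circle, S, P_Γ and Q_Γ act diagonally on Fourier coefficients: P_Γ keeps the coefficients of λᵏ with k ≥ 0 (boundary values of functions analytic in Δ₊), Q_Γ keeps those with k < 0, and S = P_Γ − Q_Γ. A toy sketch for trigonometric polynomials (hypothetical code, coefficients stored in a dict):

```python
def riesz_split(c):
    """Split coefficients {k: c_k} into the P part (k >= 0) and Q part (k < 0)."""
    plus = {k: v for k, v in c.items() if k >= 0}
    minus = {k: v for k, v in c.items() if k < 0}
    return plus, minus

def S(c):
    """Singular integration operator on coefficients: S = P_Gamma - Q_Gamma."""
    plus, minus = riesz_split(c)
    out = dict(plus)
    out.update({k: -v for k, v in minus.items()})
    return out

# S^2 = I on a sample trigonometric polynomial.
c = {-2: 1.0, -1: 0.5, 0: 2.0, 3: -1.0}
print(S(S(c)) == c)   # True
```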
Equation (2.1) has a unique solution φ ∈ L₂ᵐ(Γ) for each choice of f ∈ L₂ᵐ(Γ) if and only if the singular integral operator M_WP_Γ + Q_Γ is invertible, and in that case the solution φ is given by

φ = [M_WP_Γ + Q_Γ]⁻¹g.

Write the symbol in the realized form

W(λ) = I + C(λG − A)⁻¹B,  λ ∈ Γ,   (2.6)

put A^× = A − BC, and, whenever the pencil λG − A^× is Γ-regular, define

P = (1/2πi) ∫_Γ G(ζG − A)⁻¹ dζ,  P^× = (1/2πi) ∫_Γ G(ζG − A^×)⁻¹ dζ.   (2.7)

The operator M_WP_Γ + Q_Γ turns out to be invertible if and only if λG − A^× is Γ-regular and ℂⁿ = Im P ⊕ Ker P^×. In that case

(M_WP_Γ + Q_Γ)⁻¹g(λ) = g(λ) − C(λG − A^×)⁻¹B(P_Γg)(λ) + C(λG − A^×)⁻¹(P^×|Im P)⁻¹( (1/2πi) ∫_Γ P^×G(ζG − A^×)⁻¹Bg(ζ) dζ ),  λ ∈ Γ.
An operator T on L₂ᵐ(Γ) is called Fredholm if Im T is closed and

dim Ker T < ∞,  codim Im T = dim(L₂ᵐ(Γ)/Im T) < ∞.

Here Ker T denotes the null space and Im T the range of T. If T is Fredholm, then its index is the integer

ind T := dim Ker T − codim Im T.

We say that T⁺ is a generalized inverse of T if TT⁺T = T.
Theorem 2.2. Let T = M_WP_Γ + Q_Γ be the singular operator on L₂ᵐ(Γ) with symbol (2.6). Put A^× = A − BC. Then T is a Fredholm operator if and only if the pencil λG − A^× is Γ-regular, and in that case the following equalities hold:

Ker T = { C(λG − A^×)⁻¹y : y ∈ Im P ∩ Ker P^× },   (2.8)

Im T = { g ∈ L₂ᵐ(Γ) : (1/2πi) ∫_Γ P^×G(ζG − A^×)⁻¹Bg(ζ) dζ ∈ P^×[Im P] },   (2.9)

dim Ker T = dim(Im P ∩ Ker P^×),  codim Im T = dim ℂⁿ/(Im P + Ker P^×),   (2.10)

ind T = dim Im P − dim Im P^×.   (2.11)

Here P and P^× are as in (2.7), and n is the order of the matrices A and G. Furthermore, a generalized inverse T⁺ of T is given by

(T⁺g)(λ) = g(λ) − C(λG − A^×)⁻¹B(P_Γg)(λ) + C(λG − A^×)⁻¹J⁺( (1/2πi) ∫_Γ P^×G(ζG − A^×)⁻¹Bg(ζ) dζ ),   (2.12)

where J⁺ : Im P^× → Im P is a generalized inverse of the map P^×|Im P (see (3.19) below).
Proposition 3.1. Let M_WP_Γ + Q_Γ be the singular integral operator on L₂ᵐ(Γ) with symbol (2.6), and let g ∈ L₂ᵐ(Γ). Put A^× = A − BC. Assume that λG − A^× is Γ-regular, and let P and P^× be the projections defined by (2.7). Then the equation

(M_WP_Γ + Q_Γ)φ = g   (3.1)

has a solution φ ∈ L₂ᵐ(Γ) if and only if

(1/2πi) ∫_Γ P^×G(ζG − A^×)⁻¹Bg(ζ) dζ ∈ P^×[Im P],   (3.2)

and in that case a solution is given by

φ(λ) = g(λ) − C(λG − A^×)⁻¹B(P_Γg)(λ) + C(λG − A^×)⁻¹y,   (3.3)

where y is any vector such that

y ∈ Im P,  P^×y = (1/2πi) ∫_Γ P^×G(ζG − A^×)⁻¹Bg(ζ) dζ.   (3.4)

Proof. For φ ∈ L₂ᵐ(Γ), put

φ₊(λ) = ½φ(λ) + (1/2πi) ∫_Γ φ(ζ)/(ζ − λ) dζ,  λ ∈ Γ,   (3.5a)

φ₋(λ) = ½φ(λ) − (1/2πi) ∫_Γ φ(ζ)/(ζ − λ) dζ,  λ ∈ Γ.   (3.5b)
Assume now that φ ∈ L₂ᵐ(Γ) is a solution of (3.1). We shall show that in that case g satisfies (3.2) and that φ is given by (3.3). Introduce the auxiliary function p(λ) = (λG − A)⁻¹Bφ₊(λ), λ ∈ Γ. From the representation (2.6) for W it follows that the connection between φ and g in (3.1) is described by the following input/output system:

(λG − A)p(λ) = Bφ₊(λ),  φ(λ) + Cp(λ) = g(λ),  λ ∈ Γ.   (3.6)

Note that p ∈ L₂ⁿ(Γ). The first identity in (3.6) implies that the function (λG − A)p(λ) is in Im P_Γ (where P_Γ is now considered on L₂ⁿ(Γ)), and hence
by (3.5a)

½(λG − A)p(λ) = (1/2πi) ∫_Γ (ζG − A)p(ζ)/(ζ − λ) dζ = Gx + (λG − A)( (1/2πi) ∫_Γ p(ζ)/(ζ − λ) dζ ),  λ ∈ Γ,

where

x = (1/2πi) ∫_Γ p(ζ) dζ ∈ ℂⁿ.   (3.7)
Since PGx ∈ Im P, we may apply Proposition 1.1(2) to show that (λG − A)⁻¹PGx extends to an analytic function on Δ₋ which vanishes at ∞. The function p₋ has the same properties. Thus, by (3.7), also the function (λG − A)⁻¹(I − P)Gx may be extended to an analytic function on Δ₋ which vanishes at ∞. On the other hand, by Proposition 1.1(3), the function (λG − A)⁻¹(I − P)Gx is analytic on Δ₊ ∪ Γ. Thus this function is an entire function which is zero at infinity. Therefore, by Liouville's theorem, (λG − A)⁻¹(I − P)Gx is identically zero, which implies that Gx = PGx ∈ Im P.
From (3.7) it follows that the first identity in (3.6) can be written as:

λGp₊(λ) = Ap₊(λ) − Gx + Bφ₊(λ),  λ ∈ Γ.   (3.8)

Applying P_Γ to the second identity in (3.6) gives

φ₊(λ) = g₊(λ) − Cp₊(λ),  λ ∈ Γ.   (3.9)

Now multiply (3.9) from the left by B and subtract the resulting identity from (3.8). This yields

λGp₊(λ) = A^×p₊(λ) − Gx + Bg₊(λ),  λ ∈ Γ,   (3.10)
and thus

p₊(λ) = (λG − A^×)⁻¹Bg₊(λ) − (λG − A^×)⁻¹Gx,  λ ∈ Γ.   (3.11)

From Proposition 1.1 (with λG − A^× in place of λG − A) we know that the function (λG − A^×)⁻¹(I − P^×)Gx extends to a function which is analytic at each point of Δ₊ ∪ Γ, and thus the function (λG − A^×)⁻¹(I − P^×)Gx belongs to Im P_Γ = Ker Q_Γ. Also, p₊ ∈ Ker Q_Γ. Therefore, Q_Γ applied to (3.11) gives:

½(λG − A^×)⁻¹Bg₊(λ) − (1/2πi) ∫_Γ (ζG − A^×)⁻¹Bg₊(ζ)/(ζ − λ) dζ = (λG − A^×)⁻¹P^×Gx,  λ ∈ Γ,   (3.12)
and so, writing (λ − ζ)G + (ζG − A^×) for λG − A^× in the integrand,

P^×Gx = ½Bg₊(λ) − (1/2πi) ∫_Γ Bg₊(ζ)/(ζ − λ) dζ + (1/2πi) ∫_Γ G(ζG − A^×)⁻¹Bg₊(ζ) dζ.   (3.13)

Proposition 1.1(1) and 1.1(2) imply that the last integral does not change if in the integrand G is replaced by P^×G. But P^×G(ζG − A^×)⁻¹B is analytic on Δ₋ and vanishes at ∞. Therefore

(1/2πi) ∫_Γ P^×G(ζG − A^×)⁻¹Bg₋(ζ) dζ = 0,

and thus

P^×Gx = (1/2πi) ∫_Γ P^×G(ζG − A^×)⁻¹Bg(ζ) dζ,   (3.14)

which shows that (3.2) is fulfilled.
Put y = Gx. Then (3.4) holds true. Furthermore, by the second identity in (3.6) and formulas (3.7) and (3.11), we have for λ ∈ Γ

φ(λ) = g(λ) − Cp₋(λ) − Cp₊(λ) = g(λ) − C(p₁ + p₂ + p₃)(λ),

where p₁ = p₋, p₂(λ) = −(λG − A^×)⁻¹y and p₃(λ) = (λG − A^×)⁻¹Bg₊(λ). From y ∈ Im P and the spectral results in Section 1a it follows that p₁ ∈ Im Q_Γ. Since y satisfies (3.4),

(Q_Γp₂)(λ) = −(λG − A^×)⁻¹P^×y,  λ ∈ Γ.
Furthermore,

(Q_Γp₃)(λ) = (Q_ΓBg₊)(λ) + (1/2πi) ∫_Γ G(ζG − A^×)⁻¹Bg₊(ζ) dζ = (λG − A^×)⁻¹( (1/2πi) ∫_Γ P^×G(ζG − A^×)⁻¹Bg₊(ζ) dζ ).

To prove the last equality one uses the same type of reasoning as in (3.13). From the above calculation it follows that (3.12) holds with y in place of Gx, which shows that

(Q_Γp₃)(λ) = (λG − A^×)⁻¹P^×y,  λ ∈ Γ.

Thus Q_Γ(p₂ + p₃) = 0, and since p = p₁ + p₂ + p₃, we conclude that

φ₋ = g₋ − Cp₁,  φ₊ = g₊ − C(p₂ + p₃).

Conversely, if g satisfies (3.2) and φ is defined by (3.3), then with p as above

W(λ)φ₊(λ) + φ₋(λ) = φ₊(λ) + C(λG − A)⁻¹Bφ₊(λ) + φ₋(λ) = φ₊(λ) + Cp(λ) + φ₋(λ) = g(λ),  λ ∈ Γ,

and thus φ is a solution of (3.1).
Proof of Theorem 2.2. From the general theory of singular integral equations (see [5], also [4]) it is known that T = M_WP_Γ + Q_Γ is Fredholm if and only if det W(λ) ≠ 0, λ ∈ Γ. But then, by Proposition 1.2, T is Fredholm if and only if det(λG − A^×) ≠ 0 for each λ ∈ Γ.
Assume that the latter condition holds true. A straightforward application of Proposition 3.1 (with g = 0) yields (2.8). Also (2.9) follows directly from Proposition 3.1; one only has to note that for x ∈ Im P^×
To prove the first identity in (2.10) it suffices to show that for y ∈ Im P ∩ Ker P^× the identity

C(λG − A^×)⁻¹y = 0,  λ ∈ Γ,   (3.15)

implies y = 0. Since y ∈ Im P, the left hand side of (3.15) extends to an analytic function on Δ₋ which vanishes at ∞. From (3.15) and the resolvent identity (λG − A^×)⁻¹ = (λG − A)⁻¹ − (λG − A)⁻¹BC(λG − A^×)⁻¹ it follows that

(λG − A^×)⁻¹y = (λG − A)⁻¹y,  λ ∈ Γ.   (3.16)

Apply G to both sides of (3.16) and integrate over the contour Γ. Since y ∈ Im P and y ∈ Ker P^×, one sees that y = Py = P^×y = 0, and thus the first identity in (2.10) is proved. In an analogous way (or using a duality argument) one proves the second identity in (2.10).
From (2.10) it follows that

ind T = dim(Im P ∩ Ker P^×) − dim ℂⁿ/(Im P + Ker P^×)
      = dim Im P − dim Im P/(Im P ∩ Ker P^×) − dim ℂⁿ/(Im P + Ker P^×)
      = dim Im P − dim (Im P + Ker P^×)/Ker P^× − dim ℂⁿ/(Im P + Ker P^×)
      = dim Im P − dim ℂⁿ/Ker P^×
      = dim Im P − dim Im P^×,

which proves (2.11).
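In finite-dimensional terms the index formula (2.11) is just a rank difference; a hypothetical numpy sketch:

```python
import numpy as np

def fredholm_index(P, Px):
    """ind T = dim Im P - dim Im P^x, computed as a rank difference (2.11)."""
    return np.linalg.matrix_rank(P) - np.linalg.matrix_rank(Px)

# Illustrative projections on C^3.
P  = np.diag([1.0, 1.0, 0.0])   # dim Im P   = 2
Px = np.diag([1.0, 0.0, 0.0])   # dim Im P^x = 1
print(fredholm_index(P, Px))    # 1
```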
Finally, let us show that the operator T⁺ defined by (2.12) is a generalized inverse of T. Take an arbitrary φ ∈ L₂ᵐ(Γ), and put g = Tφ. Then (3.2) holds true, that is,

(1/2πi) ∫_Γ P^×G(ζG − A^×)⁻¹Bg(ζ) dζ ∈ P^×[Im P].   (3.17)

In the invertible case one has

dim Ker T = 0,  codim Im T = 0,   (3.18)

and J⁺ may be taken to be

J⁺x = (I − Π)x,  x ∈ Im P^×,   (3.19)

where Π is the projection of ℂⁿ onto Ker P^× along Im P.
4 The Factorization Method

Suppose the symbol admits a canonical factorization W(λ) = W₋(λ)W₊(λ), λ ∈ Γ, where W₊ and W₊⁻¹ are analytic on Δ₊ ∪ Γ and W₋ and W₋⁻¹ are analytic on Δ₋ ∪ Γ. Then the inverse of the singular integral operator is given by

(M_WP_Γ + Q_Γ)⁻¹g(λ) = W₊(λ)⁻¹(P_Γ(W₋⁻¹g))(λ) + W₋(λ)(Q_Γ(W₋⁻¹g))(λ),  λ ∈ Γ.   (4.2)

Writing, as before,

W(λ) = I + C(λG − A)⁻¹B,  λ ∈ Γ,

the factors can be described in state space form: there is a projection Π, with (I − Π)A^×Π = 0 and ΠG = GΠ, such that

W₋(λ) = I + C(λG − A)⁻¹(I − Π)B,  W₊(λ) = I + CΠ(λG − A)⁻¹B,  λ ∈ Γ.   (4.4)

A computation then yields

{W₊(λ)⁻¹ − W₋(λ)}W₋(ζ)⁻¹ = −C(λG − A^×)⁻¹B + (ζ − λ)(…).   (4.5)

By inserting (4.5) and (1.6) in (4.3) we obtain

(M_WP_Γ + Q_Γ)⁻¹g(λ) = g(λ) − ½C(λG − A^×)⁻¹Bg(λ) − C(λG − A^×)⁻¹B( (1/2πi) ∫_Γ g(ζ)/(ζ − λ) dζ ),  λ ∈ Γ.
References
[1] H. Bart, I. Gohberg, M.A. Kaashoek: Minimal factorization of matrix and operator functions. Operator Theory: Advances and Applications, Vol 1, Birkhäuser Verlag, Basel, 1979
[2] H. Bart, I. Gohberg, M.A. Kaashoek: Explicit Wiener-Hopf factorization and realisation. In: Operator Theory: Advances and Applications, Vol 21, Birkhäuser Verlag, Basel, 1986, pp 235-316
[3] H. Bart, I. Gohberg, M.A. Kaashoek: The state space method in problems of analysis. In: Proceedings of the first international conference on industrial and applied mathematics, Contributions from the Netherlands, Centre for Mathematics and Computer Science, Amsterdam, 1987, pp 1-16
[4] K. Clancey, I. Gohberg: Factorization of matrix functions and singular integral operators. Operator Theory: Advances and Applications, Vol 3, Birkhäuser Verlag, Basel, 1981
[5] I. Gohberg: The factorization problem in normed rings, functions of isometric and symmetric operators, and singular integral equations, Russian Math Surveys 19 (1) (1964), 63-114
[6] I. Gohberg, M.A. Kaashoek: Block Toeplitz operators with rational symbols. In: Operator Theory: Advances and Applications, Vol 35, Birkhäuser Verlag, Basel, 1988, pp 385-440
[7] I. Gohberg, M.A. Kaashoek, A.C.M. Ran: Interpolation problems for rational matrix functions with incomplete data and Wiener-Hopf factorization. In: Operator Theory: Advances and Applications, Vol 33, Birkhäuser Verlag, Basel, 1988, pp 73-108
[8] I. Gohberg, N.Ya. Krupnik: Introduction to the theory of one-dimensional singular integral operators. Kishinev: Shtiintsa, 1973 (Russian); German transl: Birkhäuser Verlag, Basel, 1979
[9] R.E. Kalman, P.L. Falb, M.A. Arbib: Topics in mathematical system theory. McGraw-Hill, New York, 1969
[10] E. Meister: Randwertaufgaben in der Funktionentheorie. Teubner, 1983
[11] N.I. Muskhelishvili: Singular integral equations. Boundary problems of function theory and their applications to mathematical physics, 2nd ed, Fizmatgiz, Moscow, 1962 (Russian); English transl of 1st ed, Noordhoff, Groningen, 1953
[12] F. Stummel: Diskrete Konvergenz linearer Operatoren. II. Math Z 120 (1971), 231-264
[13] A.E. Taylor, D.C. Lay: Introduction to functional analysis (2nd ed). Wiley, New York, 1980
[14] N.P. Vekua: Systems of singular integral equations and some boundary problems. GITTL, Moscow, 1950 (Russian); English transl, Noordhoff, Groningen, 1967
Chapter 9
Applications
This paper is a review of the algebraic theory of convolutional codes through the focus of the author's 1970 paper [F1], including its origins in the work of Kalman et al. and of Massey and Sain. This paper grew out of and in turn influenced the development of algebraic system theory, primarily through the author's 1975 paper [F3], as is indicated by a synopsis of relevant parts of Kailath's text and by a citation index of [F3]. The paper includes a summary exposition of the contents of [F1], [F3], and a less-known intermediate paper [F2], as well as a brief introduction to subsequent work in convolutional coding theory citing [F1], particularly that of Piret.
1 Introduction
There are two principal classes of codes for digital communications: block codes
and convolutional codes.
There exists a well-developed algebraic theory of block codes, documented in books such as Peterson [1961], Berlekamp [1968], Peterson and Weldon [1971], MacWilliams and Sloane [1977], McEliece [1977], van Lint [1982], and Blake and Mullin [1985]. The principal classes of block codes (e.g. Reed-Solomon and BCH codes) are constructed from the algebraic theory; their principal properties, such as their minimum distances, are determined by algebraic arguments; and their decoding algorithms are typically algebraic. 'Algebraic coding theory,' which effectively means 'the theory of algebraic block codes,' remains an active research area [IT Special Issue, 1988].
By contrast, the algebraic theory of convolutional codes is relatively meager. Convolutional codes are typically found by more or less exhaustively searching for codes with good distance properties, and their decoding algorithms are typically also search procedures (e.g. the Viterbi algorithm, sequential decoding). However, for most applications, convolutional codes have proved to have a better performance-complexity tradeoff than block codes, and therefore they have generally been preferred for implementation.
It was with the hope of understanding this superiority and laying the basis for a general algebraic theory of convolutional codes that I wrote a paper [F1]¹.

¹ [F1] received the 1970 Browder J. Thompson Prize, for the best paper in any IEEE publication by an author under the age of 30.
References
[F1] G.D. Forney, Jr., "Convolutional codes I: Algebraic structure," IEEE Transactions on Information Theory, Vol IT-16, pp 720-738, 1970; correction appears in Vol IT-17, p 360, 1971
[F2] G.D. Forney, Jr., "Structural analysis of convolutional codes via dual codes," IEEE Transactions on Information Theory, Vol IT-19, pp 512-518, 1973
[F3] G.D. Forney, Jr., "Minimal bases of rational vector spaces, with applications to multivariable linear systems," SIAM J Control, Vol 13, pp 493-520, 1975
[1] W.W. Peterson, Error-Correcting Codes. New York: Wiley, 1961
[2] E.R. Berlekamp, Algebraic Coding Theory. New York: McGraw-Hill, 1968
[3] W.W. Peterson and E.J. Weldon, Error-Correcting Codes, 2nd edition. Cambridge, Mass.: MIT Press, 1971
[4] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes. Amsterdam: North-Holland, 1977
[5] R.J. McEliece, The Theory of Information and Coding. Reading, Mass.: Addison-Wesley, 1977
[6] J.H. van Lint, Introduction to Coding Theory. New York: Springer-Verlag, 1982
[7] I.F. Blake and R.C. Mullin, The Mathematical Theory of Coding. New York: Academic Press, 1985
[8] D.J. Costello, Jr. and J.H. van Lint, eds, Special issue on coding techniques and coding theory, IEEE Transactions on Information Theory, Sept 1988
[9] S. Lin and D.J. Costello, Jr., Error Control Coding: Fundamentals and Applications. Englewood Cliffs, N.J.: Prentice-Hall, 1983
[10] R.E. Blahut, Theory and Practice of Error Control Codes. Reading, Mass.: Addison-Wesley, 1983
[11] P. Piret, Convolutional Codes: An Algebraic Approach. Cambridge, Mass.: MIT Press, 1988
[12] T. Kailath, Linear Systems. Englewood Cliffs, N.J.: Prentice-Hall, 1980
2 Origins
In this section we give a brief introduction to convolutional codes in the context of multidimensional linear system theory, describe the pioneering work of Massey and Sain in this field, and finally describe the main algebraic concept upon which [F1] was based (the invariant factor theorem and its extensions), whose usefulness had been shown by the work of Kalman et al. [KFA].
zero. In coding theory G is called a generator matrix, and its elements are represented by formal power series g_ij(D) in the indeterminate D (the delay operator) over F. When the system is finite-dimensional, every g_ij(D) is a rational function, i.e. a quotient of polynomials in D. The set of all rational functions in D over F is denoted by F(D), and the set of all polynomials by F[D]. Indeed, every g_ij(D) is a realizable function, i.e. a rational function that represents a causal sequence when expanded as a formal Laurent series in D (sometimes called a proper rational function). We denote the set of all realizable functions in D over F as F_rz[D].
If x is a k-tuple of sequences of elements of F, called an input sequence, then the encoder generates an n-tuple of output sequences, called the code sequence y = xG. The code C is the set of all code sequences, C = {y = xG}, as x ranges through all possible input sequences. An input sequence may continue indefinitely, but for technical reasons we require, if it is nonzero, that it 'start' at some time; i.e. x ∈ [F((D))]^k, where F((D)) denotes the set of all formal Laurent series $f(D) = f_d D^d + f_{d+1} D^{d+1} + \cdots$, where the initial index d, called the delay, may be negative; it is denoted by del f(D) = d. (The delay of the zero sequence 0(D) is defined as del 0(D) = ∞.) The set of all formal power series is the set of all formal Laurent series with nonnegative delay, and is denoted by F[[D]]; all realizable functions are thus in F[[D]]. We say that a formal Laurent series f(D) is finite if all of its coefficients are zero after some time, and then define its degree deg f(D) as the index of the last nonzero coordinate. The set F[D] of polynomials is the subset of F((D)) with del f(D) ≥ 0 and finite degree, plus the zero sequence 0(D), for which we define deg 0(D) = −∞.
For example, Fig. 1 illustrates a simple rate-1/2 convolutional encoder over the binary field F = GF(2). The input bit sequence enters a two-stage shift register, and two output sequences are generated from the current input bit and the two stored bits by linear additions over GF(2) (mod-2 addition). The generator matrix of this encoder is G = [1 + D^2, 1 + D + D^2]; the code is the set of code sequences C = {y = xG : x ∈ F((D))}; and the state space has dimension ν = 2, with q^ν = 2^2 = 4 distinct states. The dimension of the state space is the maximum degree of the two generator polynomials.
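As a concrete illustration (ours, not the paper's, with hypothetical function names), this encoder can be sketched in a few lines of Python, carrying the two memory elements of the shift register as explicit state:

```python
# Sketch of the rate-1/2 encoder of Fig. 1 with G = [1 + D^2, 1 + D + D^2]
# over GF(2), realized as a two-stage shift register.

def encode(bits):
    """Return the pair (y1, y2) produced for each input bit."""
    s1 = s2 = 0                # the two memory elements: 2^2 = 4 states
    out = []
    for x in bits:
        y1 = x ^ s2            # taps of 1 + D^2
        y2 = x ^ s1 ^ s2       # taps of 1 + D + D^2
        out.append((y1, y2))
        s1, s2 = x, s1         # shift-register update
    return out

# The impulse response recovers the generator coefficients:
# encode([1, 0, 0, 0]) -> [(1, 1), (0, 1), (1, 1), (0, 0)]
```

Feeding in the unit impulse returns the coefficient sequences of the two generator polynomials, as the transfer-function picture y = xG predicts.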
The most important parameter for code performance is the minimum Hamming distance d_H(C) between code sequences, sometimes called the free distance, where the Hamming distance between two sequences is the number of symbols (elements of F) in which they differ. The main object of convolutional code design is to maximize d_H(C) for a given rate k/n and state space dimension ν. Since a convolutional code C is linear, this is the same as minimizing the
elements, and the primes are the elements of R that have no factors other than themselves, up to units. Let P be the set of all primes in R (with the ambiguity due to units resolved by some convention); then any nonzero element of R can be written uniquely in the form

$r = u \prod_{p \in P} p^{e_p(r)},$

where u is a unit in R, and e_p(r) ≥ 0 is the order of the prime p in r. The order e_p(0) is defined conventionally as ∞ for all p ∈ P.
Thus such useful concepts as 'greatest common divisor' and 'least common multiple' are well defined. Specifically, if r = {r_j} is a set of elements of R, their greatest common divisor is defined as

$\gcd r = \prod_{p \in P} p^{e_p(r)},$

where the order e_p(r) ≥ 0 of a prime p in r = {r_j} is defined as $e_p(r) = \min_j e_p(r_j)$.
but now e_p(q) can be negative for p ∈ P'. The numerator and denominator of an element q of Q may be defined as

$\operatorname{num} q = u \prod_{p \in P_n} p^{e_p(q)}, \qquad \operatorname{den} q = \prod_{p \in P_d} p^{-e_p(q)},$

where P_n is the set of all p ∈ P for which e_p(q) > 0, and P_d is the set of all p ∈ P' for which e_p(q) < 0. Similarly, the notion of greatest common divisor generalizes,

$\gcd q = \prod_{p} p^{e_p(q)},$

where the order e_p(q) of a prime p in q = {q_j} is defined as e_p(q) = min_j e_p(q_j), which can now be negative for p ∈ P'. In words, gcd q is the greatest common divisor of the numerators of q divided by the least common multiple of the denominators.
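As a small sanity check on this definition (ours, with hypothetical helper names), take R to be the integers and Q the rationals; the gcd of a set of fractions is then exactly the gcd of the numerators over the lcm of the denominators:

```python
from fractions import Fraction
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def gcd_of_fractions(qs):
    """gcd of a set of nonzero rationals: gcd of numerators divided by
    lcm of denominators (Fraction keeps each q_j in lowest terms, so the
    prime orders e_p(q_j) are read off the reduced form)."""
    g, l = 0, 1
    for q in qs:
        g = gcd(g, abs(q.numerator))
        l = lcm(l, q.denominator)
    return Fraction(g, l)
```

For example, 3/4 = 2^(-2)·3 and 9/2 = 2^(-1)·3^2 have minimum orders 2^(-2) and 3^1, so their gcd is 3/4, which is what the function returns.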
The invariant factor theorem can then be extended in the following way, using the idea that a Q-matrix {q_ij} becomes an R-matrix if every element q_ij is multiplied by the least common multiple of their denominators. This is sometimes called the Smith-McMillan canonical form [KFA]:

Invariant Factor Theorem (Extension). Let R be a principal ideal domain, let Q be a ring of fractions of R, and let G be a k × n Q-matrix. Then G has an invariant-factor decomposition with respect to R

$G = A \Gamma B,$

where A is a k × k R-matrix with an R-matrix inverse A^{-1}; B is an n × n R-matrix with R-matrix inverse B^{-1}; and Γ is a k × n diagonal Q-matrix, whose diagonal elements γ_i = α_i/β_i, 1 ≤ i ≤ k, are called the invariant factors of G with respect to R. The invariant factors are unique, and are computable as follows: let Δ_i be the greatest common divisor of the i × i minors of G, with Δ_0 = 1 by convention; then γ_i = Δ_i/Δ_{i−1}. Each invariant factor γ_i divides the next, in the sense that the orders e_p(γ_i) are nondecreasing with i for all p ∈ P.
If Q is the field of quotients of R, then G generates a vector space over Q of dimension equal to the number of nonzero invariant factors of G with respect to R, which is called the rank of G. The rank of G is thus the greatest value of i for which the i × i minors of G are not all zero. G has an n × k Q-matrix right inverse G^{-1} = B^{-1} Γ^{-1} A^{-1} if and only if G has full rank k.
References
[1] P. Elias, "Error-free coding," IRE Transactions on Information Theory, Vol PGIT-4, pp 29-37, 1954
[2] J.M. Wozencraft and B. Reiffen, Sequential Decoding. Cambridge, Mass.: MIT Press, 1961
[3] J.L. Massey, Threshold Decoding. Cambridge, Mass.: MIT Press, 1963
[4] A.J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, Vol IT-13, pp 260-269, 1967
[5] J.L. Massey and M.K. Sain, "Codes, automata, and continuous systems: Explicit interconnections," IEEE Transactions on Automatic Control, Vol AC-12, pp 644-650, 1967
[6] J.L. Massey and M.K. Sain, "Inverses of linear sequential circuits," IEEE Transactions on Computers, Vol C-17, pp 330-337, 1968
[7] R.R. Olson, "Note on feedforward inverses for linear sequential circuits," IEEE Transactions on Computers, Vol C-19, pp 1216-1221, 1970
[8] D.J. Costello, "Construction of convolutional codes for sequential decoding," Tech Rpt EE-692, U. Notre Dame, August 1969
[9] R.E. Kalman, "Irreducible realizations and the degree of a rational matrix," J SIAM Control, Vol 13, pp 520-544, 1965
[10] R.E. Kalman, P.L. Falb and M.A. Arbib, Topics in Mathematical System Theory. New York: McGraw-Hill, 1969; Chap. 10
[11] C.W. Curtis and I. Reiner, Representation Theory of Finite Groups and Associative Algebras. New York: Interscience, 1962; pp 94-96
[12] I.N. Herstein, Topics in Algebra. New York: Ginn-Blaisdell, 1964
[13] B. Hartley and T.O. Hawkes, Rings, Modules and Linear Algebra. London: Chapman and Hall, 1970
3 Main Results
We now summarize the main results of [F1], [F2], and [F3], with a few embellishments.
3.1 Inverses
Questions about whether convolutional encoders (or more general linear systems) characterized by a transfer-function matrix G have inverses of various kinds are often easily answered by appeal to the invariant factor theorem or its extension.
Generally, if G is a k × n matrix over some ring of fractions Q of a p.i.d. R, where k ≤ n, then G has an R-matrix inverse G^{-1} if and only if the numerator α_k of the last invariant factor γ_k of G with respect to R is 1; i.e. if the orders e_p(γ_k) are nonpositive for all p ∈ P (which implies that e_p(γ_i) ≤ 0 for 1 ≤ i ≤ k for all p ∈ P). In this case, the inverse is given by G^{-1} = B^{-1} Γ^{-1} A^{-1}.
If G has full rank, then it has an R-matrix pseudo-inverse G^{-1} given by G^{-1} = α_k B^{-1} Γ^{-1} A^{-1} such that G G^{-1} = α_k I_k, and if G' is any other R-matrix such that GG' = α I_k for some α ∈ R, then α_k divides α; i.e. G has a pseudo-inverse with factor α if and only if

$e_p(\alpha) \ge e_p(\gamma_k), \quad \text{all } p \in P.$
$\deg y \le \max\,\{\deg x_i + \deg g_i,\ 1 \le i \le k\}.$
The predictable degree property ensures that inequality never occurs in this expression, so that we can predict the degree of a code sequence y if we know the degrees deg x_i of the k components of the input sequence x. This means that the set of all code sequences y of a given degree can be easily enumerated, which is the content of statement (d).
Conversely, if we can easily tabulate the short polynomial code sequences in C (e.g. by searching a trellis diagram for C), then we can find a minimal encoder. Any polynomial code sequence of least degree may be taken as the first generator g_1. Then any polynomial code sequence of least degree that is not linearly dependent on g_1 may be taken as the second generator g_2, and so forth. When k linearly independent code sequences g_i have been found in this way, they must form a minimal encoder G. All minimal encoders thus have the same constraint lengths ν_i.
Given one encoder G for a code C, an equivalent minimal encoder can be
found by various algebraic manipulations of G in F[D] or F(D), as described
in [F1] and [F3]. However, as a practical matter, the method of tabulation of
the shortest linearly independent code sequences may be easier.
A minimal encoder does not just have the least overall constraint length of any polynomial encoder in the obvious realization; there is no realization of any encoder G for the code C that has fewer states. This is proved by observing that the generators g_i are code sequences that must be generated by any encoder for C, and the time shifts D^{-j} g_i of these sequences, 1 ≤ i ≤ k, 0 ≤ j ≤ ν_i − 1, truncated to time index zero and higher, are all nonzero sequences which must arise from different physical states at time zero in any realization of any encoder, since, from the main theorem about minimal encoders, no two of these truncated sequences differ by a code sequence in C. These sequences represent a state space of dimension ν = Σ_i ν_i over F, so the dimension of the state space of any encoder must be at least ν.
A rate-1/n encoder G comprises only a single generator n-tuple g. A minimal encoder is obtained by multiplying through by the least common multiple of the denominators of the rational functions g_j(D), and then dividing through by the greatest common divisor of the numerators; minimal encoders are thus unique, up to units of F. Thus the full algebraic apparatus is not needed for rate-1/n codes.
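For the polynomial case (denominators already cleared), this reduction is just a polynomial gcd computation. A Python sketch (our names, not the paper's), using the common trick of packing GF(2)[D] coefficients into the bits of an integer:

```python
# GF(2)[D] polynomials packed into Python ints:
# bit i = coefficient of D^i, so 0b111 represents 1 + D + D^2.

def pdivmod(a, b):
    """Quotient and remainder of GF(2)[D] polynomials a and b != 0."""
    q = 0
    while a and a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift        # subtract (= add) the shifted divisor mod 2
    return q, a

def pgcd(a, b):
    """Euclidean algorithm in GF(2)[D]; the only unit is 1, so no ambiguity."""
    while b:
        a, b = b, pdivmod(a, b)[1]
    return a

def minimal_rate_1n(gens):
    """Divide a polynomial n-tuple through by the gcd of its entries."""
    g = 0
    for p in gens:
        g = pgcd(g, p)
    return [pdivmod(p, g)[0] for p in gens]
```

For example, the tuple (1 + D, 1 + D^2) has the common factor 1 + D (since 1 + D^2 = (1 + D)^2 over GF(2)); dividing it out leaves the minimal encoder (1, 1 + D).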
In summary, minimal encoders are polynomial encoders with polynomial
inverses (and thus noncatastrophic) that can be realized in the obvious realization with as few memory elements as any encoder for the same code, and that
also have the predictable degree property.
$G\,[G^{-1}\,;\,H^T] = [I_k\,;\,0_{k\times(n-k)}],$

$\cdots = I_n;$

i.e. they are each others' inverses. Consequently they are both unimodular polynomial matrices ('scramblers,' in the terminology of [F1]).
These matrices may be used in an interesting decomposition. Let r be any element of [F((D))]^n; then

$r\,[G^{-1}\,;\,H^T] = [x(r)\,;\,s(r)],$
$\cdots = (1 + OD)^{\mu - \mu'},$

a polynomial of formal degree μ − μ'. The set P* of primes in F{D} therefore consists of the set P of ordinary primes in F[D] (the irreducible polynomials), plus one more, the polynomial 1 + OD, whose formal degree is 1. Every nonzero element of F{D} can then be written uniquely as

$f = u \prod_{p \in P^*} p^{e_p(f)},$

where u is a unit (nonzero element of F), and e_p(f) ≥ 0 is the order of the prime p ∈ P* in f. (The order of any prime p ∈ P* in the zero polynomial is defined conventionally as e_p(0) = ∞.)
Now every nonzero ordinary rational function in F(D) can be written uniquely as the ratio of two elements of F{D} of the same formal degree, after reduction to lowest terms. Specifically, if g = f_1/f_2 is the ratio of two ordinary polynomials of degrees μ_1 and μ_2, respectively, then the numerator is multiplied by (1 + OD)^{μ_2 − μ_1} if μ_1 ≤ μ_2, or the denominator by (1 + OD)^{μ_1 − μ_2} otherwise. Every nonzero rational function can then be written uniquely as

$g = u \prod_{p \in P^*} p^{e_p(g)},$
where now the order e_p(g) can be negative. For example, the delay of a rational function g is given by

$\operatorname{del} g = e_D(g).$

Because the formal degrees of the numerator and denominator are equal, we have the product formula (a sum here, because we are dealing with exponents):

$\sum_{p \in P^*} e_p(g)[\deg p] = 0,$

or equivalently

$\sum_{p \in P} e_p(g)[\deg p] = -e_{1+OD}(g).$
The prime factors of positive order are factors of the numerator, and may be called the zeros of a nonzero rational function g, whereas its prime factors of negative order are factors of the denominator and may be called its poles. The product formula says that the number of zeros, weighted by their degree, is equal to the number of poles, similarly weighted. If the prime D has positive (resp. negative) order, then g is said to have a zero (resp. pole) at zero of order e_D(g) (resp. −e_D(g)); similarly, the prime D^{-1} is regarded as a zero or pole at infinity. (If we are working in F(D^{-1}) = F(z), the definitions of 'zero' and 'infinity' are interchanged.)
We may obtain subsets R of F(D) that are rings, in fact p.i.d.s, by excluding certain poles or zeros. The ring of realizable (proper) functions F_rz[D] is the set of elements of F(D) that have no poles at zero (or, equivalently, the elements of F(D^{-1}) that have no poles at infinity); i.e. in which the order of the prime D is nonnegative. The ring of polynomials F[D] is the set of elements that have no poles anywhere except at infinity; i.e. the order of every prime but D^{-1} is nonnegative; the ring of polynomials F[D^{-1}] is the set of elements that have no poles anywhere except at zero; i.e. the order of every prime but D is nonnegative. The ring of finite sequences is the set of elements that have no poles except at zero or infinity. We shall denote the ring of elements of F(D) that have no poles at a particular prime p ∈ P* as F_p[D]; for example, F_rz[D] = F_D[D].
Since F(D) is the field of quotients of any of these subrings R, the extended invariant factor theorem may be used to decompose any k × n rational matrix G into a product A Γ B, where A and B are invertible (unimodular) k × k and n × n R-matrices, while Γ is a diagonal F(D)-matrix whose diagonal elements are the invariant factors of G with respect to R. When R = F_p[D], each invariant factor γ_i is a power of p, γ_i = p^{e_p(γ_i)}, since p is the only prime in F_p[D]. The invariant factors may be calculated as follows. Let e_p(Δ_i) be the minimum order of p in any of the i × i minors of G. Then

$e_p(\gamma_i) = e_p(\Delta_i) - e_p(\Delta_{i-1}),$

where by convention e_p(Δ_0) = 0. The extended IFT shows that the orders e_p(γ_i) are nondecreasing with i.
If S is some set of primes p ∈ P*, and F_S[D] is the ring of all elements of F(D) that have no poles in S, then F_S[D] is the intersection of the rings {F_p[D], p ∈ S}, and the invariant factors of G with respect to F_S[D] are the products of the invariant factors of G with respect to each of the rings F_p[D] individually, since the orders of the i × i minors of G are unchanged for each p ∈ S. (This assumes that F(D) is the field of quotients of F_S[D], which occurs if and only if the complement of S contains at least one element of P* of degree 1; for example, in view of the product formula, F_{P*}[D] = F, so F(D) is not the field of quotients of F_{P*}[D].)
If q = {q_j} is a set of rational functions, then the order e_p(q) of a prime p in q is defined as

$e_p(q) = \min_j e_p(q_j).$
We still have

$e_p(q) \le \infty$, with equality iff $q = 0$;

$e_p(kq) = e_p(k) + e_p(q)$, $k \in F(D)$;

$\sum_{p \in P^*} e_p(q)[\deg p] \le \sum_{p \in P^*} e_p(q_j)[\deg p];$

$\sum_{p \in P^*} e_p(q)[\deg p] \le 0, \quad q \neq 0;$

$\operatorname{def} G = \infty$ if $\operatorname{rank}(G) < k$, and $\operatorname{def} G = -\sum_{p \in P^*} e_p(G)[\deg p]$ if $\operatorname{rank}(G) = k$.
then define the zero degree and pole degree of the set q as follows:

$\operatorname{zdg} q = \sum_{p \in P_n} e_p(q)[\deg p]; \qquad \operatorname{pdg} q = \sum_{p \in P_d} -e_p(q)[\deg p],$

where P_n is the set of p ∈ P* for which e_p(q) > 0, and P_d is the set of p ∈ P* for which e_p(q) < 0. It follows that zdg q ≥ 0, pdg q ≥ 0, and

$\operatorname{def} q = \operatorname{pdg} q - \operatorname{zdg} q.$

Thus def q ≤ pdg q, with equality if and only if zdg q = 0; i.e. iff e_p(q) ≤ 0 for all p ∈ P*.
If the elements q_j of q are expressed as ratios of elements of F{D^{-1}}, then pdg q is the degree of the least common multiple of the denominators. Note that if q is a set of realizable functions, then e_D(q) ≥ 0, since e_D(q_j) ≥ 0 for any realizable function q_j. So pdg q is also the degree of the least common multiple of the denominators if the elements q_j are expressed as ratios of elements of the ring of polynomials F[D^{-1}].
Main theorem of realization theory. Let G be a k × n matrix of realizable functions, let e_p(Δ_i) be the orders of the primes p ∈ P* in the set of i × i minors of G, 1 ≤ i ≤ k, and let e_p(γ_i) = e_p(Δ_i) − e_p(Δ_{i−1}), 1 ≤ i ≤ k, with e_p(Δ_0) = 0 for all p ∈ P* by convention. Let

$\mu_i = \operatorname{pdg} \gamma_i = \sum_{p \in P_d(\gamma_i)} -e_p(\gamma_i)[\deg p],$

where P_d(γ_i) is the set of p ∈ P* for which e_p(γ_i) < 0, and let μ = Σ_i μ_i. Then there exists a realization of G with μ memory elements, and there exists no realization with fewer than μ memory elements.
Since e_p(γ_i) is nondecreasing with i, μ_i = pdg γ_i is nonincreasing with i, and the sets P_d(γ_i) are nested, with P_d(γ_k) the smallest. If we define def γ_i = −Σ_{p ∈ P*} e_p(γ_i)[deg p] and zdg γ_i = pdg γ_i − def γ_i, then def γ_i is nonincreasing with i, zdg γ_i is nondecreasing with i, and

$\operatorname{zdg} \gamma_i = \sum_{p \in P_n(\gamma_i)} e_p(\gamma_i)[\deg p] \ge 0,$

where P_n(γ_i) is the set of p ∈ P* for which e_p(γ_i) > 0. Thus pdg γ_i ≥ def γ_i, with equality if and only if e_p(γ_i) ≤ 0 for all p ∈ P*. Finally, e_p(G) = e_p(Δ_k) = Σ_i e_p(γ_i) for all p ∈ P*. Thus

$\operatorname{def} C = \operatorname{def} G = -\sum_{p \in P^*} e_p(G)[\deg p] = \sum_i \operatorname{def} \gamma_i \le \sum_i \mu_i = \mu,$

with equality if and only if e_p(γ_i) ≤ 0 for all p ∈ P*, 1 ≤ i ≤ k. Since e_p(γ_i) is nondecreasing with i, this occurs if and only if e_p(γ_k) ≤ 0 for all p ∈ P*.
for all p ∈ P*. If inequality never occurs for some p ∈ P*, then the set G is said to be p-orthogonal.
If G is a generalized minimal encoder, then the matrix G' with rows g'_i = p^{-e_p(g_i)} g_i has e_p(g'_i) = 0, 1 ≤ i ≤ k, and thus has an F_p[D]-matrix inverse (G')^{-1}. Then

$y = xG = x'G',$

where $x'_i = x_i\,p^{e_p(g_i)}$.
4 Subsequent Work
It is not possible in a paper of this length to give a full account of further
decoders, such as in the papers of Schalkwijk et al. [1976], [1978] and those
of Reed and Truong [1983], [1984].
References
[1] J.L. Massey, D.J. Costello, Jr. and J. Justesen, "Polynomial weights and code constructions," IEEE Transactions on Information Theory, Vol IT-19, pp 101-110, 1973
[2] J. Justesen, "New convolutional code constructions and a class of asymptotically good time-varying codes," IEEE Transactions on Information Theory, Vol IT-19, pp 220-235, 1973
[3] J. Justesen, "Algebraic construction of rate-1/v convolutional codes," IEEE Transactions on Information Theory, Vol IT-21, pp 577-580, 1975
[4] E. Paaske, "Short binary convolutional codes with maximal free distance for rates 2/3 and 3/4," IEEE Transactions on Information Theory, Vol IT-20, pp 683-689, 1974
[5] T.J. Shusta, "Enumeration of minimal convolutional encoders," IEEE Transactions on Information Theory, Vol IT-23, pp 127-132, 1977
[6] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Transactions on Information Theory, Vol IT-28, pp 55-67, 1982
[7] J.P.M. Schalkwijk and A.J. Vinck, "Syndrome decoding of binary rate-1/2 convolutional codes," IEEE Transactions on Communications, Vol COM-24, pp 977-985, 1976
[8] J.P.M. Schalkwijk, A.J. Vinck and K.A. Post, "Syndrome decoding of rate-k/n convolutional codes," IEEE Transactions on Information Theory, Vol IT-24, pp 553-562, 1978
[9] I.S. Reed and T.K. Truong, "New syndrome decoder for (n, 1) convolutional codes," Electronics Letters, Vol 19, pp 344-346, 1983
[10] I.S. Reed and T.K. Truong, "New syndrome decoding techniques for the (n, k) convolutional codes," IEE Proceedings F-Communications, Radar, and Signal Processing, Vol 131, pp 412-416, 1984
H(s) = N(s)D^{-1}(s), Kailath introduces column (or row) reduced matrices and poles and zeros at infinity. He defines the square matrix D(s) as column reduced if the degree of its determinant is equal to the sum of the degrees of its columns, and comments:
We note first that the set of all rational m × 1 vectors {f(s)} such that H(s)f(s) = 0 is a vector space (over the field of scalar rational functions) called the right null-space of H(s)... If we had a vector space over the real or complex numbers, its dimension would essentially characterize it. But for our more general situation of a rational vector space, it turns out that there is a richer structure, first noted by Kronecker (in connection with linear pencils) and exposed in detail by Wedderburn and later, independently, by Forney [F3]. Here we follow the slightly different development in [Verghese, 1978], [Verghese and Kailath, 1979] (see also [Vekua, 1967, Sect. 5]).
This structure is captured by the notion of a minimal polynomial basis for the (right null-)space...
[Kailath then develops the structure and properties of minimal bases, and shows their relation to Kronecker indices.]
I.C. Gohberg first pointed out to the author that the significance of minimal bases was perhaps first realized by J. Plemelj in 1908 and then substantially developed in 1943 by N.I. Muskhelishvili and N.P. Vekua (see the discussion in [Vekua, 1967, Sect. 5]). These authors were studying the so-called Riemann-Hilbert problem, which was later shown to be closely related to the theory of Wiener-Hopf integral equations, as described for example in the definitive paper of Gohberg and Krein [1960]. Certain so-called 'factorization indices' play an important role in this theory, and it is therefore not surprising that these are closely related to the Kronecker indices [as described more fully later]...
(Although [F2] is not cited here, the right null-space is what would be called the dual code in coding theory, and the right minimal (Kronecker) indices are the constraint lengths ν_i of a minimal dual encoder, which play a central role in [F2].)
The next section defines the defect of a transfer function H(s) as in [F3], and shows that the defect of H(s) is equal to the sum of the left and right Kronecker indices of H(s).
The following section discusses equations of the form H(s)A(s) = B(s), and shows that the question of whether they can be solved with no poles p in some subset S of P* can be answered by arguments in the same vein.
A series of 25 exercises develops a wide variety of additional specific results.
Finally, in the last part of the chapter, there is a discussion of "Popov or polynomial-echelon matrix-fraction descriptions, ... [which] are quite interesting [and] were first introduced by Popov [1969], and later studied by Morf [1974], Forney [F3], Eckberg [1973], and Kung and Kailath [1977]..." This development yields the Popov parameters {α_ij} as system invariants. Connections are made to the problems of finding a left matrix-fraction description A^{-1}B equal to ND^{-1}; to finding inverse systems; and to the minimal partial realization problem.
In summary, the mathematics exposed in [F1]-[F3] seems to connect broadly to wide areas of the algebraic theory of multivariable linear systems, and to be a basic tool in this field. Of course, it is evident from Kailath's
scholarship that much of this was a rediscovery or recasting of concepts and
results that had been at least partially known previously, perhaps in other
contexts, and that a great many authors contributed to the development and
application of these ideas. Nonetheless, it seems fair to conclude that the
presentation of these ideas in 1970-1975 did substantially impact this field, and
that these ideas are now basic.
5 Conclusion
This snapshot of one scientific development is an interesting vignette that
illuminates how science works.
In reviewing this history, one is struck by how the accidents of time and place sometimes combine. The fact that Massey, the coding theorist, and Sain, the linear system theorist, were both at Notre Dame was of course a large factor in the appearance of their pioneering work. But Massey was also a consultant to Codex, where we were daily being presented with evidence of the practical superiority of convolutional codes, and where we learned of his work with great interest. Once the question of minimal encoders for a convolutional code was clearly framed in 1968-69, it was natural to consult the literature of realization theory; the fact that Kalman, Falb, and Arbib [KFA] appeared in 1969 was an extremely happy coincidence. The confluence of all these circumstances accounts in large part for [F1].
The fact that the coding framework (finite field, discrete time) leads to a purely algebraic approach was certainly helpful. Also, questions that arise naturally in coding, such as inverses and equivalence of encoders, might not have arisen so prominently in another context. So it was probably fortunate that the work was initially motivated by coding problems.
Another piece of great good fortune was my spending the 1971-72 academic year at Stanford, where the algebraic system theory ideas of Kalman and his intellectual descendants were "in the air." Although [F2] was written at Stanford,
it was not greatly influenced by these more general ideas. However, enough
understanding of the state of multivariable system theory was obtained so that
the idea of writing up the results of [F1] for a system-theory audience took
shape, ultimately taking the form of [F3].
The timing of [F3] seems to have been just about right for multivariable linear system theory. Enough related results were already known so that the
rather purely mathematical approach of [F3] could be readily related to known
and potential applications. Therefore it seems to have been rapidly assimilated
into the field. The fact that Kailath and his group were intimately familiar with
the work must also have greatly helped its propagation and acceptance.
(As a sidelight, it may interest the reader to know that I was so far out of the field of linear system theory that I never even saw a copy of the published paper [F3] until recently. "I shot an arrow in the air; it fell to earth, I know not where." In a telephone call in about 1987, B.F. Wyman told me that [F3] was "famous," to my complete surprise.)
It is somewhat ironic to find that while these ideas have borne relatively little fruit in the field of coding theory for which they were developed, they have been so successful elsewhere. Nonetheless, this must happen often in the development of science. The history as a whole can be viewed as a contribution in the development of algebraic system theory that was due in part to the accidents of time and place, and in part to the benefits of a fresh point of view.
This is a brief survey of some recent research trends in econometrics which make extensive use of techniques developed in system theory. In particular, we pay attention to the following subjects: cointegration, error correction, and the representation of systems; path controllability, system inversion, and trackability; inputs, outputs, and errors-in-variables.
1 Introduction
System theory interacts with the theory of economics and econometrics in rather
diverse ways, and the past few decades have seen the arrival and sometimes
also the departure of a rich variety of research trends in the interface. The story
might begin with The Mechanism of Economic Systems [55], a book that was published in 1953 although it was based on notes that the author, Arnold Tustin, had written immediately after World War II. In this book, Tustin proposed to model the workings of a national economy by analog simulation using clever mechanical and electrical devices which he described in some detail.
Apparently his hope, as an electrical engineer, was to use such nonlinear models
to explain and remedy business cycles much in the same way as unwanted
oscillatory motions in servomechanisms can be suppressed by appropriate
controller design. As noted by Aoki [3], this approach doesn't seem to have
had widespread influence among economists.
There have been other trends, however, which did acquire a status of permanence in the economic and econometric literature. Optimal control theory, in the style that emerged in the fifties, has found its way into the economic realm and is alive and well there. This is evidenced in recent textbooks such as [16] and [53]. Optimal stochastic control theory has found application in financial management; a recent survey is provided in [31]. There are other areas that are more or less allied to system theory and that are extensively used in economics, such as the theory of differential games, but we will leave these out of our discussion.
An example of a standard and full-fledged subject in system theory that has had an undeniable influence in econometrics is, of course, the Kalman filter.
J. M. Schumacher
Its importance was recognized in the standard reference [22], and the Kalman filter can now be considered as one of the standard tools in the study of time series and dynamic economic models (cf. [14, 48]). Further interaction between system theory and econometrics takes place in the field of identification. The fundamental problems that are involved here were stirred up by R.E. Kalman [30]. A recent detailed elaboration of some of the points raised by Kalman can be found in [36, 37]. At a more technical level, the recent book by Hannan and Deistler [23] provides an excellent reference for the way that system theory and statistics interact to solve identification problems.
In this paper, we shall attempt to highlight some of the newer research trends in econometrics which make extensive use of ideas and techniques from system theory. First, we shall discuss the issue of 'cointegration' which has been heavily debated in econometric circles during the past decade. One of the central
points in the discussion is a result known as the Granger representation theorem;
this is basically a theorem about alternative representations for linear dynamic
systems, which in system-theoretic terms would fall under the heading of
realization theory (or as some would perhaps prefer to say: the theory of system
representations and transformations). There is also an aspect of control in the
cointegration debate; in particular, the tracking of targets is involved. The ability
of a system to track a given target is a classical subject in system theory, and
recently there have been some efforts to extend this older work and to apply
it in specific economic contexts. We shall briefly discuss the results in this area
in Sect. 3. Our final topic will concern the selection of 'inputs' and 'outputs'
('endogenous' and 'exogenous' variables, in econometric terminology). This
subject allows a four-fold decomposition brought about by the two divisions
static/dynamic and deterministic/stochastic; we shall discuss all four cases, to
bring out some interesting analogies. The final Sect. 5 contains concluding
remarks.
In this paper we will not cover all of the impulses to the application of system-theoretic ideas in economics that are due to Aoki and his co-workers, such as the ideas concerning aggregation and reduction by balancing; instead we refer to Aoki's recent book [4]. For additional material, we also refer to the special issue of the Journal of Economic Dynamics and Control on Economic Time Series with Random Walk and Other Nonstationary Components (Vol. 12-2/3 (1988), edited by M. Aoki), the special issues of Computers & Mathematics with Applications on System-Theoretic Methods in Economic Modeling (Vols. 17-8/9 (1989) and 18-6/7 (1989), edited by S. Mittnik), and the survey paper by E.J. Moore [38].
+ Bzt - 1] =
C(L)et
(The notation here is the econometric one: L is the lag operator that maps (x_t)_t to (x_{t-1})_t; Δ = 1 - L is the difference operator, which maps (x_t)_t to (x_t - x_{t-1})_t; A_1(z), B_1(z), D(z), and C(z) are polynomial matrices; (e_t)_t is white noise.) This way of incorporating long-term dynamics into short-term dynamic models originates in [9, 47].
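The lag and difference operators are easy to make concrete. The following sketch is our own illustration (not from the paper; the helper names and padding conventions are ours) of L and Δ = 1 - L acting on a finite sample path:

```python
# Illustration (not from the paper): the lag operator L and the
# difference operator Δ = 1 - L acting on a finite sample path.

def lag(x):
    """L: (x_t) -> (x_{t-1}); the first entry has no predecessor."""
    return [None] + x[:-1]

def diff(x):
    """Δ = 1 - L: (x_t) -> (x_t - x_{t-1}), defined from t = 1 on."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

x = [1.0, 3.0, 6.0, 10.0]
print(diff(x))   # -> [2.0, 3.0, 4.0]
```

Applying diff twice gives the second difference Δ², which is the operation relevant for series integrated of order 2.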
A precise formulation of the connection between cointegrated models and error correction models has been proposed by C.W.J. Granger in an unpublished manuscript [20] and in the paper [13]. Specifically, Granger calls a process (x_t)_t cointegrated of order d, b if all components are integrated of order d, and if some nontrivial linear combination z_t = α'x_t is integrated of order d - b where b > 0. A process x_t in R^n that is cointegrated of order 1, 1 is said to have cointegrating rank r if α'x_t is stationary for some r x n matrix α' of full row rank, and if β'x_t is nonstationary for any matrix β' whose rank exceeds r. The
Granger representation theorem gives the connection between representations of 'autoregressive' and 'moving-average' type for time series that are cointegrated of order 1, 1. The following version uses a formulation proposed by Johansen [26].
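Before stating the theorem, the definition can be made tangible by a small numerical sketch (our own illustration, not from the paper): two series that are individually integrated of order 1 but admit a stationary linear combination, so that (x_t)_t is cointegrated of order 1, 1 with cointegrating vector α = (-1, 1)'.

```python
import numpy as np

# Our own illustration of cointegration, not taken from the paper.
rng = np.random.default_rng(0)
T = 500
w = np.cumsum(rng.standard_normal(T))   # random walk: integrated of order 1
s = rng.standard_normal(T)              # stationary series

x1 = w          # both components share the common stochastic trend
x2 = w + s

# The combination z_t = α'x_t with α = (-1, 1)' removes the common trend:
alpha = np.array([-1.0, 1.0])
z = alpha @ np.vstack([x1, x2])

# By construction z_t = s_t, a stationary process.
print(np.allclose(z, s))   # -> True
```

Each component inherits the random walk w and is nonstationary, yet α'x_t is stationary; this is exactly the order 1, 1 situation treated by the representation theorem below.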
The Granger Representation Theorem. Assume that the R^n-valued process (x_t)_t satisfies

Δx_t = C(L) a_t    (1)
J. M. Schumacher
where (a_t)_t is zero-mean white noise of unit variance, and C(z) is an n x n matrix-valued function that is holomorphic on the disk |z| < 1 + ρ and that is nonsingular on the same disk except at 1, where C(1) has rank n - r. Let α and β be n x r matrices of full column rank such that α'C(1) = 0 and C(1)β = 0. If the r x r matrix α'(dC/dz(1))β is nonsingular, then the process (x_t)_t is cointegrated of order 1, 1 with cointegrating rank r and satisfies the equation
(2)

where the remaining coefficient matrices of the error correction representation are determined by C(z).
The proof of the Granger representation theorem in [13] is somewhat hard to follow. Engle sketches a different proof, due to B.S. Yoo, in [12]. This proof is based on what Engle calls the Smith-McMillan-Yoo form; it is actually a Smith form with respect to the ring of causal stable rational functions. In [26], Johansen uses the context of functions that are holomorphic on an open disk containing the unit circle (which is more general than the rational context used by Yoo), and he provides a third proof. Apparently it has not been noticed in this literature that essentially a matrix generalization is involved here of the following simple rule from complex function theory: if f(z) is holomorphic in a neighborhood of z_0, then f^{-1}(z) has a simple pole at z_0 if and only if df/dz(z_0) is nonzero, and in that case the residue of f^{-1}(z) at z_0 (i.e. the coefficient of (z - z_0)^{-1} in the Laurent series development of f^{-1}(z) around z_0) is given by (df/dz(z_0))^{-1}.
In the matrix case, one has to take directions into account, and the resulting residue formula is given below. We shall say that a matrix function G(z) has a simple pole at a point z_0 of the complex plane if G(z) has a pole at z_0 but (z - z_0)G(z) doesn't have a pole there.
Residue Formula. Let F(z) be an n x n matrix function that is holomorphic in a neighborhood of z_0, and suppose that F(z) is nonsingular in a neighborhood of z_0 except at z_0 itself. Let the rank of F(z_0) be n - r; let α and β be n x r matrices of full column rank such that α'F(z_0) = 0 and F(z_0)β = 0. Under these conditions, the matrix function F^{-1}(z) has a simple pole at z_0 if and only if the constant r x r matrix α'(dF/dz(z_0))β is nonsingular, and in that case the residue of F^{-1}(z) at z_0 is given by

β (α'(dF/dz(z_0)) β)^{-1} α'.
We shall continue to assume that the matrix function C(z) is holomorphic on an open disk containing the unit circle and that C(z) is nonsingular on the same disk except at z = 1. It is natural to define the cointegration space of order k as the set of all vectors α such that Δ^k α'x_t is stationary. If we denote the dimension of this space by n_k, then we may call the indices (n_0, n_1, ..., n_d) the cointegration indices of the process (x_t)_t. In this terminology, an R^n-valued process is integrated of order 1, 1 with cointegrating rank r if and only if its cointegration indices are (r, n). The cointegration indices can be easily expressed in terms of the coefficients of the power series development of C(z) around z = 1: writing

C(z) = Σ_{j=0}^∞ C_j (1 - z)^j,

we have

n_{d-i} = dim ker [C_0 C_1 ... C_{i-1}]'.
The important point to note is that the cointegration indices are not in any one-one relation with the orders of the zeros at 1 of the matrix function C(z). (We recall that a nonsingular meromorphic matrix function F(z) allows, with respect to a given z_0 ∈ C, a 'local' Smith form

F(z) = W_1(z) diag((z - z_0)^{κ_1}, ..., (z - z_0)^{κ_n}) W_2(z)    (3)

with W_1(z) and W_2(z) holomorphic and invertible at z_0.) Using the notation H(1) for the space of vector functions that are holomorphic in a neighborhood of 1, we have the following formula (adapted from [41]) for the number ν_j of zeros at 1 of C(z) of order ≥ j:

ν_j = dim {α(1) | α(z) ∈ H(1), (1 - z)^{-j} C'(z)α(z) ∈ H(1)}    (4)

Clearly we have

n_{d-j} ≤ ν_j    (5)

but equality does not hold in general, as can be seen from simple examples.
The most important exception to this is, of course, the case of first-order integration.
It can easily be seen that the vector functions α(z) which appear in (4) may be restricted to be vector polynomials, without impairing the validity of the statement. Therefore, if we allow cointegrating vectors to be polynomial rather than constant and change the definition of 'cointegration indices' accordingly, we do obtain a one-one relation between cointegration indices and orders of zeros at 1. The importance of polynomial cointegrating vectors (PCIVs) has been emphasized by Yoo (cf. [12]). A slightly different approach is taken by Johansen [24]. He introduces what we have called the 'cointegration indices',
and notes that their sum can at most be equal to the order r of the zero of det C(z) at 1. The case in which equality holds is referred to by Johansen as the 'balanced' case; since it is easily verified that the order of the zero of det C(z) at 1 equals Σ_j ν_j, we can see that this case is the one in which equality holds in (5) for each j = 1, ..., d and, moreover, ν_j = 0 for j > d. Johansen proceeds to show that, after
constant row transformations which are summarized in a nonsingular matrix T, we can write

T C(z) = [ C_0(z) ; (1 - z) C_1(z) ; ... ; (1 - z)^k C_k(z) ]   (stacked row blocks)

where, in the balanced case, the matrix C̄(z) = [C_0'(z) ... C_k'(z)]' is nonsingular at 1. We may also write this in a slightly different way:

C(z) = T^{-1} diag(I, (1 - z)I, ..., (1 - z)^k I) C̄(z)
Comparing this with (3), we see that the balanced case is characterized by the fact that the local Smith form around z = 1 can be obtained using only a constant transforming matrix on the left side. In general, one will have to use a non-constant transformation; although the local Smith form in principle calls for holomorphic transformations, Johansen proves by a direct argument that a polynomial transformation on the left-hand side will suffice. (In the rational case, one might appeal to the Smith-McMillan form to prove this; in fact, this is what Yoo does.) The polynomial transformation can then be interpreted as a transformation of the variables in which linear combinations are taken of contemporaneous and lagged components.
So, either by introduction of polynomial cointegrating vectors or by polynomial transformations of the variables, the structure of cointegrated systems can be studied through the zero structure of an associated matrix function at z = 1. This may help to solve remaining problems, such as the formulation of analogs of the Granger representation theorem for higher-order cointegrated series (partial results on this can be found in [24] and [8]). Another important question is to what extent polynomial cointegrating vectors (or polynomial transformations of the variables) are unique; the answer to this is of course critical to the discovery of 'target relations'.
In the above, we have emphasized what might be called the 'structural' aspect of cointegration. There is of course also a 'statistical' side to the matter, which is concerned with the testing of hypotheses about the cointegration structure and with the estimation of cointegrating vectors, and most of the journal literature in fact concentrates on this aspect (see for instance [13, 25, 42]).
Virtually all of this work is concerned with first-order cointegrated systems. It
seems, however, that even in this context there are some basic questions that
remain to be answered, in particular in connection with hypothesis testing.
Engle notes: "The null hypothesis of cointegration would be far more useful in
empirical research than the natural null of non-cointegration. The selection of
a 5% test of the non-cointegration null is very arbitrary and many researchers
are assuming cointegration when these tests are only rejected at larger
significance levels" [12, p. 26/27]. One may argue about what is natural; in a
sense, the hypothesis of cointegration is the more highly structured one, and is
therefore simpler and more natural. From a certain point of view, the
cointegrated situation is also the more singular one, which may explain the
difficulties that classical statistical methods have with adopting cointegration
as the null hypothesis. Possibly the theory of zeros of matrix functions may
also be of help here to unravel the singularities.
time' or 'policy lead', any given path of the target variables can be tracked exactly by proper choice of the instrument variables. The definition in the continuous-time case is slightly different, but the criterion (at least in the linear constant-parameter case) is the same: path controllability holds if and only if the transfer matrix G(z) from instruments to targets has full row rank as a rational matrix [6, p. 559]. This is a rather attractive generalization of the static rule of Tinbergen.
Further work within the system theory community on this subject has concentrated on finding simple conditions for right invertibility in terms of the state space representation

x(k + 1) = Ax(k) + Bu(k),    x(k) ∈ X, u(k) ∈ U
y(k) = Cx(k) + Du(k),        y(k) ∈ Y.

Consider the subspace recursion

T_0 = {0}
T_{k+1} = {x ∈ X | x = Ax̄ + Bū for some x̄ ∈ T_k and ū such that Cx̄ + Dū = 0}.

It is easily seen that the sequence (T_k)_k is nondecreasing, and so the sequence must have a limit, which is denoted by T*. The system given by the parameters (A, B, C, D) is right invertible (in the sense that the transfer matrix G(z) = C(zI - A)^{-1}B + D is right invertible as a rational matrix) if and only if

CT* + im D = Y.
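The recursion and the rank test are straightforward to carry out numerically. The sketch below is our own illustration (function names and the orthonormal-basis representation are ours, not from the paper): each T_k is stored as an orthonormal basis matrix, and the criterion CT* + im D = Y becomes a rank check.

```python
import numpy as np

def _orth(M, tol=1e-10):
    """Orthonormal basis of the column space of M."""
    if M.size == 0:
        return np.zeros((M.shape[0], 0))
    u, s, _ = np.linalg.svd(M, full_matrices=False)
    return u[:, : int((s > tol).sum())]

def _kernel(M, tol=1e-10):
    """Orthonormal basis of the kernel of M."""
    _, s, vh = np.linalg.svd(M)
    rank = int((s > tol).sum())
    return vh[rank:].T

def right_invertible(A, B, C, D, tol=1e-10):
    """Test right invertibility of G(z) = C(zI - A)^{-1}B + D via the
    recursion T_0 = {0}, T_{k+1} = {Ax + Bu : x in T_k, Cx + Du = 0}."""
    n, p = A.shape[0], C.shape[0]
    T = np.zeros((n, 0))                      # basis of T_0 = {0}
    for _ in range(n + 1):                    # the recursion stabilizes
        K = _kernel(np.hstack([C @ T, D]))    # admissible pairs (x, u)
        image = np.hstack([A @ T, B]) @ K     # their images Ax + Bu
        T = _orth(np.hstack([T, image]))      # basis of T_{k+1}
    return np.linalg.matrix_rank(np.hstack([C @ T, D]), tol=tol) == p
```

For instance, the scalar system with A = 0, B = C = 1, D = 0 (so G(z) = 1/z) is right invertible, while the same system with B = 0 (so G(z) = 0) is not.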
The state space framework suggests extensions to the non-constant-parameter case and the nonlinear case. A characterization of path controllability for linear systems with time-varying parameters has been given by Engwerda [15]; necessarily the condition is more involved than in the constant-parameter case, but an analogy with the Morse-Wonham result can still be drawn. Necessary and sufficient conditions for (local) path controllability of discrete-time nonlinear systems have been given by Nijmeijer [40], who also establishes the close relation that exists between path controllability and decouplability (the possibility of introducing a control policy in which each target is influenced by only one instrument). Recently, state space algorithms have become available to decide on the right invertibility of systems that are given in implicit form, rather than in solved form [33]. This is a return to the original formulation by Tinbergen, who starts in [54] with implicit equations rather than with a 'final form'.
One may reasonably argue that the invertibility of dynamic systems should
play an important role in dynamic economic theory, simply because invertibility
models, and the results of control theory may help to provide such information
in the form of constraints that must be satisfied for control action to be effective.
Rw = 0    (6)
where we may assume that the matrix R has full row rank. If we believe that it is reasonable to require that the inputs are not restricted by the equations and that the outputs are completely determined by the inputs and by the equations, then the standard procedure applies: select output variables by finding a maximal set of independent columns among the columns of R, name the associated components y, name the remaining components u, rewrite (6) as R_1 y + R_2 u = 0 and, noting that R_1 is invertible by construction, obtain

y = -R_1^{-1} R_2 u

which clearly has the desired characteristics. In general, the choice of inputs is not unique; however, the number of inputs is determined by the data (6). Any selection of this number of variables will 'generically' be valid as a choice of inputs.
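The selection procedure can be sketched in a few lines; the following is our own illustration (the greedy column search and the names are ours, not from the paper):

```python
import numpy as np

def select_io(R, tol=1e-10):
    """Given R (full row rank) with Rw = 0, pick a maximal independent
    set of columns as outputs y; the rest are inputs u, with y = M u."""
    k, q = R.shape
    out = []
    for j in range(q):   # greedy maximal independent column set
        if np.linalg.matrix_rank(R[:, out + [j]], tol=tol) == len(out) + 1:
            out.append(j)
        if len(out) == k:
            break
    inp = [j for j in range(q) if j not in out]
    M = -np.linalg.solve(R[:, out], R[:, inp])   # from R1 y + R2 u = 0
    return out, inp, M

# Two equations in three variables: one input, two outputs.
R = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
out, inp, M = select_io(R)
print(out, inp)   # -> [0, 1] [2]
print(M)          # here y = -u for both outputs
```

Scanning the columns in a different order would give another admissible selection, illustrating that only the number of inputs, not the selection itself, is determined by the data.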
There is a certain asymmetry in the selection procedure based on (6) since
we first select the outputs and then simply let the inputs be what is left. However,
if we would have represented the subspace ker R which effectively appears in
(6) as the image rather than as the kernel of some matrix, then we would have
selected the inputs first by taking a maximal set of independent rows of the
representing matrix. So the seeming priority of outputs over inputs in the
selection procedure above is just a consequence of the chosen representation.
where σ denotes the (forward) shift and R(z) is a polynomial matrix which we may assume to have full row rank. The basic technique is to write R(z) in the form T(z)B(z) where T(z) is an invertible rational matrix and B(z) is 'right bicausal', i.e. B(z) is proper rational and has full row rank at infinity. This factorization may be achieved by the reduction of R(z) to row reduced form [27, p. 386]; indeed, note that this procedure factorizes R(z) as U(z)Δ(z)B(z) where U(z) is unimodular, Δ(z) is diagonal with diagonal elements of the form z^{k_i}, and B(z) is right bicausal. A proposed selection of inputs and outputs will induce a partitioning of R(z) as [R_1(z) R_2(z)] (after possible reordering of the columns), and a corresponding partitioning of B(z) as [B_1(z) B_2(z)]. Now, R_1(z) will be invertible if and only if B_1(z) is invertible, and R_1^{-1}(z)R_2(z) = B_1^{-1}(z)B_2(z) will be proper rational if and only if B_1(z) doesn't have a zero at infinity. (The 'only if' holds because B_1(z) and B_2(z) are coprime as matrices over the ring of proper rational functions, so there can't be a pole-zero cancellation.) The result is that the proposed selection of inputs and outputs is admissible if and only if the matrix B_1(∞) is nonsingular. In other words, what we have to do is to select a maximal number of independent columns from the full row rank matrix B(∞) - we might say that the problem is reduced to the static case.
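When R(z) is already in row reduced form, B(∞) is simply the leading row coefficient matrix of R(z), so the admissibility test becomes a constant-matrix determinant check. The sketch below is our own illustration under that assumption (the coefficient-list representation and function names are ours):

```python
import numpy as np

def leading_row_matrix(coeffs):
    """coeffs[d] is the coefficient matrix of z^d in R(z).
    Row i of the result is the coefficient of z^{k_i}, k_i = row degree."""
    C = np.array(coeffs)                  # shape (deg+1, k, q)
    L = np.zeros(C.shape[1:])
    for i in range(C.shape[1]):
        degs = [d for d in range(C.shape[0]) if np.any(C[d, i, :] != 0)]
        L[i, :] = C[max(degs), i, :]
    return L

def admissible(coeffs, out_cols):
    """For row reduced R(z): selection admissible iff B_1(inf), i.e. the
    selected columns of the leading row coefficient matrix, is nonsingular."""
    sub = leading_row_matrix(coeffs)[:, out_cols]
    return sub.shape[0] == sub.shape[1] and abs(np.linalg.det(sub)) > 1e-10

# R(z) = [[z, 1, 0], [0, 1, 1]]: row degrees 1 and 0.
coeffs = [np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]),   # z^0 coefficient
          np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])]   # z^1 coefficient
print(admissible(coeffs, [0, 1]))   # -> True
print(admissible(coeffs, [1, 2]))   # -> False
```

In the second case R_1(z) is invertible but R_1^{-1}(z)R_2(z) is not proper, exactly the situation the B_1(∞) test rules out.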
Of course, this solution is hardly surprising to the econometrician, who is used to representing transfer matrices as quotients of matrices of polynomials in z^{-1} (the backward shift). In models of the form

B(σ^{-1}) y = A(σ^{-1}) u

where A(z) and B(z) are polynomial matrices, the condition that B(0) should be invertible is known as the 'causality condition'; in fact, such models are often specified with the condition B(0) = I (see for instance [22, p. 13]).
In order to make a comparison with the stochastic situation that will be discussed below, let us see how much more difficult the problem becomes when we require that the transfer matrix from inputs to outputs should not only be proper, but also stable. In principle, the same technique as above applies: if we can write R(z) in the form T(z)B(z) where T(z) is an invertible rational matrix and B(z) is now a proper stable rational matrix having full row rank for all z with |z| ≥ 1, then a selection of inputs and outputs will be admissible if and only if the corresponding matrix B_1(z) is nonsingular for all z with |z| ≥ 1. The desired factorization of R(z) can be obtained by a Wiener-Hopf factorization with respect to the unit circle [7] (cf. the interpretation of the reduction to row
w = Nx + e    (7)
where x generates the observations and e is noise. The observed vector w will be normally distributed with zero mean and covariance matrix Q, and so all observational data are summarized in Q. In the model (7), we could select independent rows from the matrix N (which may be assumed to be of full column rank) and we might convert the model to an input-output form just as in the deterministic case. However, without further assumptions on the noise, the model (7) is hopelessly non-unique. Not even the number of inputs is well-defined; it may vary from rk Q (no noise) to 0 (all noise).
One possible constraint on the noise covariance matrix Γ, which is well-motivated when the observation space R^q is considered as the Cartesian product of q different one-dimensional spaces, is to require that Γ should be diagonal. This, of course, leads to the factor analysis model, which has experienced renewed interest following Kalman's critique of the concept of identifiability in econometrics [28, 29]. What we called 'the number of inputs' becomes 'the number of common factors' in the context of factor analysis, and it is natural to define this number as the minimal length of the vector x for which a representation of the form (7) (with cov(ee^T) diagonal) is possible. In contrast to the unconstrained case, this number is now well-determined, but unfortunately its determination is an open problem.
From the point of view of selecting inputs and outputs, it may be more natural to think of R^q not as the product of q one-dimensional spaces, but as the product of an input space and an output space (yet to be determined). A possible constraint to impose would be that the noise covariance matrix should be block diagonal corresponding to this decomposition. This leads to an alternative interpretation of the vector x, since it can be shown that the model

y = H_1 x + e_1,    u = H_2 x + e_2    (8)

(with x, e_1, and e_2 independent) holds if and only if y and u are conditionally independent given x. The conditional independence property is also used to define the notion of 'state' in stochastic systems (see for instance [52]), and so
the problem of constructing a model of the form (8) for a given decomposition of w into components u and y is sometimes called a realization problem [17, 45].
Let us say that a decomposition of w into inputs u and outputs y is 'admissible' if there is a model of the form (8) in which x has minimal length among all models of the same type corresponding to the same decomposition, and in which the matrix H_2 is invertible. The invertibility of H_2 will allow the model to be rewritten in an input-output form:

ȳ = H_1 H_2^{-1} ū,    y = ȳ + e_1,    u = ū + e_2

(with ȳ = H_1 x and ū = H_2 x).
This is the errors-in-variables form (see for instance [11]). The decomposition of w into y and u leads to a partitioning of the covariance matrix Q_ww:

Q_ww = ( Q_yy  Q_yu )
       ( Q_uy  Q_uu )

Q_yu = H_1 E[xx^T] H_2^T = Q_yx Q_xx^{-1} Q_xu

and, writing Q for the covariance matrix of ū = H_2 x, Q_yu Q^{-1} Q_uy ≤ Q_yy.
Using the singular value decomposition, one can easily show that the latter inequality can be rewritten as a lower bound on Q, of the form Q ≥ Q_min. The corresponding (non-unique) 'true' linear relation between the latent variables y and u is given by Q_yu Q^{-1}.
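The algebra behind these statements is easy to verify numerically in the static case. The following sketch is our own construction (not from the paper): it builds exact second moments from a model of the form (8) with H_2 invertible and checks that G = Q_yu Q^{-1} recovers H_1 H_2^{-1}, while the bounds Q ≤ Q_uu and G Q G^T ≤ Q_yy hold.

```python
import numpy as np

# Our own static errors-in-variables example with exact covariances:
# y = H1 x + e1, u = H2 x + e2, all covariance choices are illustrative.
H1 = np.array([[1.0, 2.0], [0.0, 1.0]])
H2 = np.array([[1.0, 0.0], [1.0, 1.0]])   # invertible
Qxx = np.diag([2.0, 1.0])                 # cov of x
S1 = np.diag([0.3, 0.2])                  # cov of e1
S2 = np.diag([0.5, 0.4])                  # cov of e2

Qyy = H1 @ Qxx @ H1.T + S1
Quu = H2 @ Qxx @ H2.T + S2
Qyu = H1 @ Qxx @ H2.T

# 'True' input covariance Q = cov(H2 x) and transfer G = Qyu Q^{-1}:
Q = H2 @ Qxx @ H2.T
G = Qyu @ np.linalg.inv(Q)

print(np.allclose(G, H1 @ np.linalg.inv(H2)))                   # -> True
print(np.all(np.linalg.eigvalsh(Quu - Q) >= -1e-12))            # Q <= Quu
print(np.all(np.linalg.eigvalsh(Qyy - G @ Q @ G.T) >= -1e-12))  # G Q G' <= Qyy
```

Here Quu - Q and Qyy - G Q G^T reduce to the noise covariances S2 and S1, which is why both inequalities hold with equality exactly when the corresponding noise vanishes.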
In the static case, several proposals have been formulated to reduce the non-uniqueness of the errors-in-variables model by bringing in some extra information; see for instance [1]. Let us see what the dynamic case has to offer. We follow the development in [21] and [43].
Our goal will be to verify the admissibility of a given decomposition of w(t) into inputs u(t) and outputs y(t). The observational data are supposed to be summarized in a spectral density matrix Q_ww(z) for w, which is partitioned according to the proposed decomposition as

Q_ww(z) = ( Q_yy(z)  Q_yu(z) )
          ( Q_uy(z)  Q_uu(z) )

We are looking for a 'true' transfer matrix G(z) and a 'true' input spectral density Q(z) which should satisfy

G(z)Q(z) = Q_yu(z)
Q(z) ≤ Q_uu(z),                   |z| = 1
G(z)Q(z)G^T(z^{-1}) ≤ Q_yy(z),    |z| = 1
Under suitable assumptions, the development in the static case can be followed (replace the field R by the field R(z), the partial order '≤' by the partial order '≤ pointwise for |z| = 1', and the involution M → M^T by the involution M(z) → M^T(z^{-1})). As in the static case, the set of all minimal solutions will be parametrized by the spectral density matrices Q(z) that fall between an upper and a lower bound determined by the data, and the corresponding transfer matrices are then given by

G(z) = Q_yu(z) Q^{-1}(z)

where Q_yu(z) is factored by a Wiener-Hopf factorization as Q_yu(z) = F_-(z) D(z) F_+(z) with

D(z) = diag(z^{κ_1}, ..., z^{κ_m}),

and where F_-(z) (F_+(z)) is unimodular as a matrix over the ring R_-(z) (R_+(z)) of rational functions having all their poles inside (outside) the unit circle. (We
also used the fact that Q_yu(z) must have full column rank, as in the static case.) For any rational matrix M(z), write M*(z) = M^T(z^{-1}); note that M* will be R_-(z)-unimodular if M is R_+(z)-unimodular, and vice versa. Now, write

G = F_- D F_+ Q^{-1} = F_- D Q̂ (F_+*)^{-1}

where

Q̂ = F_+ Q^{-1} F_+*
,,= - L "i
i= 1
5 Conclusions
An econometrician once told me that he was amazed that system theory is still an active field, since he couldn't imagine that the analysis of the Kalman filter would not be completed by now. Apparently, the full variety of system-theoretic
methods has as yet failed to disclose itself to the field of econometrics. System theory provides a rich set of examples which illustrate the pitfalls of modeling, and how to avoid these; Kalman has used such examples in his contributions to the ongoing debate on the fundamentals of mathematical modeling and identification. System theory also provides a large body of knowledge about state space techniques, and the applicability of such techniques to econometric problems has been shown in the work of Aoki and others. But the collection of mathematical techniques that are familiar to and developed by system theorists allows an even more intensive contact. As shown in this paper, matrix factorizations and pole-zero considerations play an important role in econometric problems, and system theorists have applied these for a long time. There is an econometric interest in representation problems, which is something about which system theory has a lot to say. The invertibility of systems is a natural concept in dynamic economic analysis, destined to play a role similar to the invertibility of matrices in static analysis; and again, system theory provides the necessary tools. While some of the questions here are no doubt more modest than the fundamental issues with which R.E. Kalman has confronted the econometric profession, they may still be a worthwhile subject for research and lead to results that will satisfy system theorists and econometricians alike.
Acknowledgement
It is a pleasure to thank Manfred Deistler, Theo Nijman, and Henk Nijmeijer
for the useful information they kindly provided to me.
References
1. D.J. Aigner, C. Hsiao, A. Kapteyn, T. Wansbeek (1984). Latent variable models in econometrics.
Z. Griliches, M.D. Intriligator (eds). Handbook of Econometrics, North-Holland, Amsterdam,
1321-1393
2. M. Aoki (1975). On a generalization of Tinbergen's condition in the theory of policy to dynamic
models. Review of Economic Studies 42, 293-296
3. M. Aoki (1976). Optimal Control and System Theory in Dynamic Economic Analysis, North-
Holland, New York
San Francisco
6. R.W. Brockett, M.D. Mesarovic (1965). The reproducibility of multivariable systems. J Math
Anal Appl 11, 548-563
7. K. Clancey, I.C. Gohberg (1981). Factorization of matrix functions and singular integral operators,
Birkhauser, Basel
8. J.E. Davidson (1987). Cointegration in linear dynamic systems. (Paper presented at the
Econometric Society European Meeting, Copenhagen, August 1987)
9. J.E. Davidson, D.F. Hendry, F. Srba, S. Yeo (1978). Econometric modelling of the aggregate
time-series relationship between consumers' expenditure and income in the United Kingdom.
The Economic Journal 88, 661-692
10. J.E.H. Davidson, D.F. Hendry (1981). Interpreting economic evidence: The behavior of
consumers' expenditure in the U.K. European Economic Review 16, 177-192
11. M. Deistler (1989). Symmetric modeling in system identification. H. Nijmeijer, J.M. Schumacher
(eds). Three Decades of Mathematical System Theory. A Collection of Surveys at the Occasion
of the 50th Birthday of Jan C. Willems, Lect Notes Contr Inform Sci 135, Springer, Berlin,
129-147
12. R.F. Engle (1987). On the theory of cointegrated economic time series. (Paper presented at the
Econometric Society European Meeting, Copenhagen, August 1987)
13. R.F. Engle, C.W.J. Granger (1987). Co-integration and error correction: Representation,
estimation, and testing. Econometrica 55, 251-276
14. R.F. Engle, M. Watson (1985). Applications of Kalman filtering in econometrics. (Paper presented
at the 5th World Congress of the Econometric Society, Boston, Mass., 1985)
15. J.C. Engwerda (1988). Control aspects of linear discrete time-varying systems. Int J Contr 48,
1631-1658
16. G. Feichtinger, R.F. Hartl (1986). Optimale Kontrolle Oekonomischer Prozesse, De Gruyter,
Berlin
17. L. Finesso, G. Picci (1984). Linear statistical models and stochastic realization theory. A.
Bensoussan, J.L. Lions (eds). Analysis and Optimization of Systems (Proc 6th Intern Conf Anal
Optimiz Syst, Nice, June 1984), Lect Notes Contr Inform Sci 62, Springer, Berlin, 445-470
18. B.A. Francis, M. Vidyasagar (1983). Algebraic and topological aspects of the regulator problem
for lumped linear systems. Automatica 19, 87-90
19. P.A. Fuhrmann, J.C. Willems (1979). Factorization indices at infinity for rational matrix
functions. Integr Eq Oper Th 2, 287-301
20. C.W.J. Granger (1983). Co-integrated variables and error-correcting models, UCSD Discussion
Paper 83-13
21. M. Green, B.D.O. Anderson (1986). Identification of multivariable errors-in-variables models
with dynamics. IEEE Trans Automat Contr AC-31, 467-471
22. E.J. Hannan (1970). Multiple Time Series, Wiley, New York
23. E.J. Hannan, M. Deistler (1988). The Statistical Theory of Linear Systems, Wiley, New York
24. S. Johansen (1988). The mathematical structure of error correction models. N.U. Prabhu (ed).
Statistical Inference from Stochastic Processes (Proc AMS/IMS/SIAM Joint Summer Research
Conf, August 1987), Contemporary Mathematics, Vol 80, Amer Math Soc, Providence, RI,
359-386
25. S. Johansen (1988). Statistical analysis of cointegration vectors. J Econ Dyn Contr 12, 231-254
26. S. Johansen (1989). Estimation and hypothesis testing of cointegration vectors in Gaussian vector
autoregressive models, Preprint 3, Institute of Mathematical Statistics, Univ of Copenhagen
27. T. Kailath (1980). Linear Systems, Prentice-Hall, Englewood Cliffs, NJ
28. R.E. Kalman (1982). System identification from noisy data. A.R. Bednarek, L. Cesari (eds).
Dynamical Systems II (Univ Florida Intern Symp), Ac Press, New York
29. R.E. Kalman (1982). System identification from real data. M. Hazewinkel, A.H.G. Rinnooy Kan
(eds). Current Developments in the Interface: Economics, Econometrics, Mathematics, Reidel,
Dordrecht, 161-196
30. R.E. Kalman (1983). Identifiability and modeling in econometrics. P.R. Krishnaiah (ed).
Developments in Statistics 4, Ac Press, New York, 97-136
31. I. Karatzas (1989). Optimization problems in the theory of continuous trading. SIAM J Contr
Optimiz 27, 1221-1259
32. T. Kloek (1984). Dynamic adjustment when the target is nonstationary. International Economic
Review 25, 315-326
33. M. Kuijper, J.M. Schumacher (1990). State space formulas for transfer poles at infinity.
(Manuscript in preparation)
34. H. Kwakernaak, R. Sivan (1972). Linear Optimal Control Systems, Wiley, New York
35. P. Lancaster (1966). Lambda-matrices and Vibrating Systems, Pergamon Press, Oxford
36. C.A. Los (1989). The prejudices of least squares, principal components and common factors
schemes. Computers Math Appl 17, 1269-1283
37. C.A. Los (1989). Identification of a linear system from inexact data: A three-variable example.
Computers Math Appl 17, 1285-1304
38. E.J. Moore (1985). On system-theoretic methods and econometric modeling. Intern Econ Rev
26, 87-110
39. A.S. Morse, W.M. Wonham (1971). Status of noninteracting control. IEEE Trans Automat
Contr AC-16, 568-581
40. H. Nijmeijer (1989). On dynamic path decoupling and dynamic path controllability in economic
systems. J Econ Dyn Contr 13, 21-39
41. H. Nijmeijer, J.M. Schumacher (1985). On the inherent integration structure of nonlinear
systems. IMA J Math Contr Inf 2, 87-107
42. P.C.B. Phillips, S.N. Durlauf (1986). Multiple time series regression with integrated processes.
Review of Economic Studies 53, 473-495
577
43. G. Picci, S. Pinzoni (1986). A new class of dynamic models for stationary time series. S. Bittanti
(ed). Time Series and Linear Systems, Lect Notes Contr Inf Sci 86, Springer, Berlin, 69-114
44. A.J. Preston (1974). A dynamic generalization of Tinbergen's theory of policy. Review of Economic
Studies 41, 65-74
45. C. van Putten, J.H. van Schuppen (1983). The weak and strong Gaussian probabilistic realization
problem. J Multivar Anal 13, 118-137
46. M. Salmon (1988). Error correction models, cointegration and the internal model principle. J
Econ Dyn Contr 12, 523-549
47. J.D. Sargan (1964). Wages and prices in the United Kingdom: A study in econometric
methodology. P.E. Hart, G. Mills, J.N. Whittaker (eds). Econometric Analysis for National
Economic Planning, Butterworths, London
48. W. Schneider (1986). Der Kalmanfilter als Instrument zur Diagnose und Schätzung variabler
Parameter in ökonometrischen Modellen, Physica-Verlag, Heidelberg
49. J.M. Schumacher (1983). The algebraic regulator problem from the state-space point of view.
Lin Alg Appl 50, 487-520
50. J.M. Schumacher (1986). Residue formulas for meromorphic matrices. C.I. Byrnes, A. Lindquist
(eds). Computational and Combinatorial Methods in System Theory, North-Holland, Amsterdam,
97-111
51. J.M. Schumacher (1989). Linear system representations. H. Nijmeijer, J.M. Schumacher (eds).
Three Decades of Mathematical System Theory. A Collection of Surveys at the Occasion of the
50th Birthday of Jan C. Willems, Lect Notes Contr Inform Sci 135, Springer, Berlin, 382-408
52. J.H. van Schuppen (1989). Stochastic realization problems. H. Nijmeijer, J.M. Schumacher (eds).
Three Decades of Mathematical System Theory. A Collection of Surveys at the Occasion of the
50th Birthday of Jan C. Willems, Lect Notes Contr Inform Sci 135, Springer, Berlin, 480-523
53. A. Seierstad, K. Sydsaeter (1987). Optimal Control Theory with Economic Applications, Advanced
Textbooks in Economics, Vol 24, North-Holland, Amsterdam
54. J. Tinbergen (1952). On the Theory of Economic Policy (2nd ed), North-Holland, Amsterdam
55. A. Tustin (1953). The Mechanism of Economic Systems. An Approach to the Problem of Economic
Stabilisation from the Point of View of Control-System Engineering, Heinemann, London
56. J.C. Willems (1979). System theoretic models for the analysis of physical systems. Ricerche di
Automatica 10, 71-106
57. J.C. Willems (1983). Input-output and state-space representations of finite-dimensional linear
time-invariant systems. Lin Alg Appl 50, 581-608
58. J.C. Willems (1986). From time series to linear system. Part I: Finite dimensional linear time
invariant systems. Automatica 22, 561-580
59. J.C. Willems (1988). Models for dynamics. U. Kirchgraber, H.D. Walther (eds). Dynamics
Reported (Vol 2), Wiley/Teubner, 171-269
60. W.M. Wonham (1979). Linear Multivariable Control: a Geometric Approach (2nd ed), Springer,
New York
New York
In this paper we define and study a class of nonlinear filters capable of solving a range of problems which arise in the design of adaptive analog systems. Even though the definitions used are compelling in terms of the phenomena and natural in terms of mathematics, the filters require careful analysis because they can exhibit discontinuous behavior. Using some recent results on a type of gradient flow on the orthogonal group, we are able to construct a differential equation realization of a smooth approximation to these filters.
1 Introduction
During the summer of 1970 I spent a pleasant period at Stanford at R.E.
Kalman's invitation. Although his published work stemming from this period
is, for the most part, devoted to problems in algebraic system theory, on that
occasion he had invited people working in a variety of areas, to participate in
teaching a summer school. The visitors included Stephen Grossberg who
delivered, with considerable enthusiasm, a set of lectures on learning. I must
confess that even though the neural models were presented with conviction, the
complexity of the nonlinear differential-difference equation models being
advanced left me feeling that I wasn't ready for work on learning. Of course,
in the meantime cognitive science has become one of the most exciting fields
of inquiry with important contributions coming not only from neuroscientists
but also from computer scientists, electrical engineers, physicists, and statisticians. I can't be sure that the Stanford experience played a significant role in
the decision, but nonetheless, twenty years later, I find myself writing about
dynamical models for learning.
The topics we are concerned with here are standard fare in the neural
network literature. Even so, clear statements about underlying mathematics are
often missing. Our work is most specifically related to the properties of what
are often called linear feedforward networks. We will develop an alternative to
the usual algorithms but most importantly we insist that there be no distinctions
(This work was supported in part by the U.S. Army Research Office under grant DAAL03-86-K-0171, the National Science Foundation under grant CDR-85-00108, and DARPA grant AFOSR-89-0506.)
R. W. Brockett
between the learning phase and the operational phase of the system. Singular value decomposition applied to operators on finite dimensional spaces, be this expressed in the language of statistics (principal component analysis), stochastic processes (Karhunen-Loève expansion) or ordinary linear regression, plays an important role. In terms of applications, there are points of contact with the work of Kohonen on autoassociation (see especially Chap. 4 of [1]), work on adaptive arrays (see the last few Chaps. of [2]) and other topics in neural networks ([3] is a timely reference). The recent papers of Bourlard and Kamp [4], and Baldi and Hornik [5] have done a great deal to clarify the role of singular value decomposition in this context. In particular, an important aspect of the use of principal component analysis vis-à-vis the learning phase of the weight selection process is investigated in [5]. In the spirit of Kalman's pioneering work on the realization problem [6], one of the goals of this paper is to capture in one state space model all aspects of the phenomena being investigated. Thus in Sect. 5 we present a complete, input-output, learning-phase/operational-phase model for learning subspaces; the operations needed to generate the flow are just addition, subtraction and multiplication.
Most of the literature in this area deals with difference equation models but one can make the case that differential equations are at least as plausible in terms of biological applicability and we will work with them exclusively. There would be no intrinsic difficulty in recasting the ideas in difference equation form.
$$\int_0^t \langle u(\sigma), e_1(\sigma)\rangle^2\,d\sigma \;\ge\; \int_0^t \langle u(\sigma), e_2(\sigma)\rangle^2\,d\sigma \;\ge\; \cdots \;\ge\; \int_0^t \langle u(\sigma), e_m(\sigma)\rangle^2\,d\sigma$$

That is, the e-basis is ordered by the integral squared value of $\langle u, e_i(t)\rangle$ on $[0,t]$. Such a basis always exists (we will show how to construct it later) and it is unique up to the choice of sign unless it happens that two or more of the e's span spaces of equal "energy". We use this definition and the set $\{\lambda_i\}_{i=1}^m$ to define a mapping $\mathscr{F}_\lambda : E^m[0,t] \to E^m[0,t]$ whose action is given by

$$\mathscr{F}_\lambda(u)(t) = \sum_{i=1}^m \lambda_i\,\langle u(t), e_i(t)\rangle\,e_i(t)$$
Because the choice of sign for the $e_i(t)$ cancels out, this definition is unambiguous unless there are two or more $e_i$'s which define subspaces with equal energy. In that case we resolve the ambiguity by multiplying all the components of $u(t)$ which lie in an equal energy subspace by the average of the corresponding $\lambda$'s. We call $\mathscr{F}_\lambda$ the ideal, nonfading memory, adaptive subspace filter with weight vector $\lambda$.
If all the $\lambda$'s are the same, $\lambda_1 = \lambda_2 = \cdots = \lambda_m = a$, then $\mathscr{F}_\lambda(u(t)) = au(t)$; otherwise $\mathscr{F}_\lambda$ shapes $u(t)$ in a way that depends nonlinearly on the past of $u$. Even so the definition implies that $\mathscr{F}_\lambda$ is homogeneous of degree 1: $\mathscr{F}_\lambda(au(t)) = a\,\mathscr{F}_\lambda(u(t))$. Also, $\mathscr{F}_\lambda$ is additive with respect to the weights:

$$\mathscr{F}_\lambda(u(t)) + \mathscr{F}_\mu(u(t)) = \mathscr{F}_{\lambda+\mu}(u(t))$$
The relationship between weight multiplication and the composition law is more interesting.

Lemma. If $\lambda$ is monotone decreasing in the sense that $\lambda_1 > \lambda_2 > \cdots > \lambda_m$ then $\mathscr{F}_\nu = \mathscr{F}_\mu(\mathscr{F}_\lambda)$ with $\nu = (\lambda_1\mu_1, \lambda_2\mu_2, \ldots, \lambda_m\mu_m)$. If $\lambda$ and $\mu$ are both monotone decreasing, then $\mathscr{F}_\nu = \mathscr{F}_\lambda(\mathscr{F}_\mu) = \mathscr{F}_\mu(\mathscr{F}_\lambda)$.

Proof. Consider $\mathscr{F}_\mu(\mathscr{F}_\lambda)$. If $\{e_i(t)\}_{i=1}^m$ is adapted to $u$ and if $\lambda$ is monotone decreasing, then $\{e_i(t)\}_{i=1}^m$ is also adapted to $\mathscr{F}_\lambda(u)$ and so the result follows. The same reasoning applies to $\mathscr{F}_\lambda(\mathscr{F}_\mu)$.

If $\lambda$ is not monotone, there will be choices for $u$ such that the e-basis for $u$ is not an e-basis for $\mathscr{F}_\lambda(u)$. In general the composition of two subspace filters is not a subspace filter.
Even though $\mathscr{F}_\lambda(u)(t)$ depends on the behavior of $u$ on the whole interval $[0,t]$, the only aspects of the past of $u$ which matter are those which are needed to compute the $e_i(t)$. If we let

$$Q(t) = \int_0^t u(\sigma)u^T(\sigma)\,d\sigma$$
then $Q(t)$ provides a summary of the past of $u$ which is adequate to permit one to compute the $e_i(t)$. In fact, because $Q(t)$ is symmetric and positive semidefinite, there necessarily exists an orthogonal matrix $\Theta(t)$ such that $\hat{Q} = \Theta(t)Q(t)\Theta^T(t)$ is diagonal with the eigenvalues of $Q$, arranged in decreasing order, along the diagonal of $\hat{Q}$. As above, if the eigenvalues of $Q$ are unrepeated, then $\Theta(t)$ is unique to within a premultiplication by a diagonal matrix $D$ whose diagonal entries are $\pm 1$. Regardless of how this choice of signs is made, the rows of $\Theta(t)$ are suitable basis vectors for the definition of $\mathscr{F}_\lambda$. This is the construction of $\{e_i(t)\}_{i=1}^m$ that was promised. In view of this we can assert the following theorem.
Theorem 1. Let $\Lambda$ be the m by m matrix $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$. Except for those values of $t$ for which $Q(t)$ has repeated eigenvalues, the input-output system $\mathscr{F}_\lambda$ is realized by

$$\dot{Q}(t) = u(t)u^T(t); \qquad Q(0) = 0$$
$$y(t) = \Theta^T(t)\Lambda\Theta(t)u(t)$$

with $\Theta(t)$ being any orthogonal matrix such that $\hat{Q}(t) = \Theta(t)Q(t)\Theta^T(t)$ is diagonal with the diagonal entries of $\hat{Q}$ arranged in decreasing order along the diagonal. Moreover, given any $\bar{Q} = \bar{Q}^T > 0$ and any $t > 0$ there exists $u \in E^m[0,t]$ which drives $Q$ to $\bar{Q}$ at time $t$ and, if $\Lambda$ is not a multiple of the identity, for any $Q_1 \neq Q_2$ there is a $u$ which evokes a different response from $Q(0) = Q_1$ than it does from $Q(0) = Q_2$.
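In discrete time, the realization of Theorem 1 amounts to accumulating the sample covariance and rediagonalizing at each step. The following NumPy sketch is ours (an illustrative Euler discretization with hypothetical names; it is not code from the text):

```python
import numpy as np

def subspace_filter_step(Q, u, lam, dt=1.0):
    """Accumulate Qdot = u u^T by one Euler step, then apply y = Theta^T Lambda Theta u."""
    Q = Q + dt * np.outer(u, u)
    w, V = np.linalg.eigh(Q)          # eigenvalues returned in increasing order
    Theta = V[:, ::-1].T              # rows = e-basis, energies in decreasing order
    y = Theta.T @ np.diag(lam) @ Theta @ u
    return Q, y

# Sanity check: with equal weights lambda_i = a the filter reduces to y = a u.
m = 4
rng = np.random.default_rng(0)
u = rng.standard_normal(m)
_, y = subspace_filter_step(np.zeros((m, m)), u, lam=np.full(m, 2.0))
assert np.allclose(y, 2.0 * u)
```

Because `eigh` orders eigenvalues increasingly, the column reversal plays the role of the decreasing ordering required of the e-basis.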
Suppose that

$$Q_0(t) = \int_0^t u_0(\sigma)u_0^T(\sigma)\,d\sigma$$

has unrepeated eigenvalues $q_1 > q_2 > \cdots > q_m$. Let $\Theta_0(t)$ be such that $\Theta_0(t)Q_0(t)\Theta_0^T(t) = \mathrm{diag}(q_1, q_2, \ldots, q_m)$ and let $Q_\#$ denote the skew-symmetric matrix whose $ij$-th entry, for $i \neq j$, is $(q_j - q_i)^{-1}$. Assume that $\mathscr{F}_\lambda : E^m[0,t] \to E^m[0,t]$ maps $u_0$ into $y_0$. If $\mathscr{F}_\lambda(u_0 + \varepsilon\,\delta u) = y_0 + \varepsilon\,\delta y$, then to first order in $\varepsilon$

$$\delta y(t) = \Theta_0^T(t)\Lambda\Theta_0(t)\,\delta u(t) + \left[\Omega^T(t)\Theta_0^T(t)\Lambda\Theta_0(t) + \Theta_0^T(t)\Lambda\Theta_0(t)\,\Omega(t)\right]u(t)$$

where $\Omega$ is the skew-symmetric matrix, determined below, which describes the first order variation of the diagonalizing basis. To first order in $\varepsilon$

$$Q_0 + \varepsilon\,\delta Q = \int_0^t (u(\sigma) + \varepsilon\,\delta u(\sigma))(u(\sigma) + \varepsilon\,\delta u(\sigma))^T\,d\sigma$$

so that

$$\delta Q = \int_0^t u(\sigma)\,\delta u^T(\sigma) + \delta u(\sigma)\,u^T(\sigma)\,d\sigma$$
Since $\Theta$ is to be orthogonal and since $(I + \varepsilon\Omega)$ is orthogonal to first order in $\varepsilon$ if and only if $\Omega = -\Omega^T$, we express the variation in $\Theta$ required to keep $\hat{Q}$ diagonal to first order as $\Theta_0(I + \varepsilon\Omega)$ with $\Omega = -\Omega^T$. Thus we have

$$\Theta_0(I + \varepsilon\Omega)(Q_0 + \varepsilon\,\delta Q)(I - \varepsilon\Omega)\Theta_0^T = \hat{Q} + \varepsilon\,\delta\hat{Q}$$

Since $\hat{Q}$ is diagonal, the effect of the two $\Omega$ terms on the left side of this equation is to multiply the $ij$-th element of $\Theta_0\Omega\Theta_0^T$ by $-(q_i - q_j)$. Since the matrix $\Theta_0\Omega\Theta_0^T\hat{Q} - \hat{Q}\,\Theta_0\Omega\Theta_0^T$ which appears on the left is clearly zero on the diagonal, we see that $\delta\hat{Q} = \mathrm{diag}\,\Theta_0\,\delta Q\,\Theta_0^T$ and that

$$\Theta_0\Omega\Theta_0^T = -\,Q_\# \circ (\Theta_0\,\delta Q\,\Theta_0^T - \mathrm{diag}\,\Theta_0\,\delta Q\,\Theta_0^T)$$

where $\circ$ denotes the entry-by-entry (Hadamard) product. But since $Q_\#$ is also zero on the diagonal, $Q_\# \circ \mathrm{diag}\,\Theta_0\,\delta Q\,\Theta_0^T$ is zero and

$$\Theta_0\Omega\Theta_0^T = -\,Q_\# \circ (\Theta_0\,\delta Q\,\Theta_0^T)$$
$$\eta_1 = \min_{\{x_i\}} \sum_{i=1}^n \|y_i - Kx_i\|^2; \qquad x_i \in E^r$$
Problem 2. Given two sets of n vectors in $E^m$, $\{y_i\}_{i=1}^n$, $\{x_i\}_{i=1}^n$, and given an integer $r \le m$, find

$$\eta_2 = \min_K \sum_{i=1}^n \|y_i - Kx_i\|^2; \qquad \mathrm{rank}\,K = r$$
$$\eta_3 = \min_{K,\,x_i} \sum_{i=1}^n \|y_i - Kx_i\|^2; \qquad \mathrm{rank}\,K = r$$

For fixed $K$ the minimizing $x_i$ is

$$x_i = (K^TK)^{-1}K^Ty_i$$

Using this, problem 3 can be recast by substituting for $x_i$ its optimal value as a function of $K$. This gives

$$\eta_3 = \min_K \sum_{i=1}^n \|(I - K(K^TK)^{-1}K^T)y_i\|^2; \qquad \mathrm{rank}\,K = r$$
$$\Sigma_{yy} = \sum_{i=1}^n y_iy_i^T; \qquad \Sigma_{yx} = \sum_{i=1}^n y_ix_i^T; \qquad \Sigma_{xy} = \sum_{i=1}^n x_iy_i^T; \qquad \Sigma_{xx} = \sum_{i=1}^n x_ix_i^T$$

$$A = \Sigma_{yx}\Sigma_{xx}^{-1/2}B(BB^T)^{-1}$$

Substituting this into the expression for $\eta_2$ and taking advantage of a cancellation of terms gives

$$\eta_2 = \min_B \mathrm{tr}\big(\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1/2}B(BB^T)^{-1}B^T\Sigma_{xx}^{-1/2}\Sigma_{xy}\big)$$
Example 1. The autoassociative problem. (See [1], [4], [5].) The autoassociation problem is that of finding a system which is capable of categorizing a set of examples and then using what it has learned to assign subsequent inputs to the correct category, even if they are somewhat inaccurate or incomplete. More precisely, suppose one is given an input $u \in E^m[0,t]$ and that, for the most part, the interval $[0,t]$ consists of segments during which $u(t)$ is drawn from a finite set of constant vectors with only an occasional transition between these segments. Say
$$u(\sigma) = u_{i_1}; \qquad 0 \le \sigma \le t_1$$
$$u(\sigma) = u_{i_2}; \qquad t_1 + \delta \le \sigma \le t_2$$
$$\vdots$$
$$u(\sigma) = u_{i_r}$$

The problem is to find matrices $G$ and $H$ such that

$$\eta = \int_0^t \min_i \|HGu(t) - u_i\|^2\,dt$$
is as small as possible. This problem is solved by the $\mathscr{F}_\lambda$ filter which maps $E^m[0,t]$ into itself and has $\lambda = (1,1,\ldots,1,0,0,\ldots,0)$ with $r$ ones and $m - r$ zeros. To see this, form the sample covariance matrix $Q(t) = \int_0^t u(\sigma)u^T(\sigma)\,d\sigma$ and define $y$ by

$$y(t) = \Theta^T(t)\Lambda\Theta(t)u(t)$$

with $\Theta(t)Q(t)\Theta^T(t) = \hat{Q} = \mathrm{diag}(q_1, q_2, \ldots, q_m)$ as in Theorem 1. This rule will associate to $u(t)$ an element in the space spanned by $\{u_i\}_{i=1}^r$ approximating $u$ as well as possible in the least squares sense. In view of the form of $\Lambda$ we should let $G$ and $H$ be the first $r$ columns of $\Theta^T$ and the first $r$ rows of $\Theta(t)$, respectively.
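A numerical sketch of Example 1 (sizes, names and the discretization are illustrative, not from the text): with $\lambda$ consisting of $r$ ones and $m - r$ zeros, the matrix $\Theta^T\Lambda\Theta$ built from the sample covariance acts as a projection onto the span of the prototype vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
m, r = 6, 2
prototypes = rng.standard_normal((r, m))      # the finite set of constant vectors

samples = np.repeat(prototypes, 50, axis=0)   # piecewise-constant input, sampled
Q = samples.T @ samples                       # discrete sample covariance sum

w, V = np.linalg.eigh(Q)
Theta = V[:, ::-1].T                          # rows ordered by decreasing energy
lam = np.array([1.0] * r + [0.0] * (m - r))
W = Theta.T @ np.diag(lam) @ Theta            # the map u -> Theta^T Lambda Theta u

u_corrupted = 0.3 * prototypes[0] + 0.7 * prototypes[1] + rng.standard_normal(m)
y = W @ u_corrupted
assert np.allclose(W @ y, y, atol=1e-8)       # output lies in the learned subspace
```

Corrupted queries are thus mapped back to the least-squares-closest point of the learned subspace.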
Example 2. Total least squares approximation. This involves generating an approximating linear transformation associated with empirical data. The approximation is to be optimal in a total least squares sense. Suppose that $a \in E^n$ and $b \in E^n$ are, to within a certain error, related by $b = Ta$ and that over a period of time a number of instantiations of this relationship have been made available. We write these as $(a(t), b(t))$. Consider the 2n-dimensional input and 2n-dimensional output adaptive subspace filter $\mathscr{F}_\lambda$ with $\lambda$ being n ones followed by n zeros, i.e. $\lambda = (1,1,\ldots,1,0,0,\ldots,0)$. Form a 2n-dimensional vector

$$u(t) = \begin{bmatrix} a(t) \\ b(t) \end{bmatrix}$$
If the relation $b(t) = Ta(t)$ holds, then the $Q$ of Theorem 1 is given by the 2n by 2n symmetric matrix

$$Q(t) = \begin{bmatrix} Q_a(t) & Q_a(t)T^T \\ TQ_a(t) & TQ_a(t)T^T \end{bmatrix}; \qquad Q_a(t) = \int_0^t a(\sigma)a^T(\sigma)\,d\sigma$$

A short calculation shows that in this ideal case, unless $a$ is confined to a proper subspace of $E^n$ on all of $[0,t]$, the $\Theta$ which diagonalizes $Q(t)$ is such that ($\Lambda = \mathrm{diag}(1,1,\ldots,1,0,0,\ldots,0)$)

$$\Theta^T\Lambda\Theta = \begin{bmatrix} (I + T^TT)^{-1} & (I + T^TT)^{-1}T^T \\ T(I + T^TT)^{-1} & T(I + T^TT)^{-1}T^T \end{bmatrix}$$

Of course if the pairs $a(t)$, $b(t)$ are distorted by noise, then $\Theta$ will track some nearby appropriate average and $\Theta^T\Lambda\Theta u$ will only approximate the desired output.
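The block formula above can be verified numerically in a discretized setting (our construction; the paper works in continuous time): $\Theta^T\Lambda\Theta$ converges to the orthogonal projection onto the graph of $T$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
T = rng.standard_normal((n, n))              # the underlying linear relation
A = rng.standard_normal((n, 200))            # columns are samples a(t)
U = np.vstack([A, T @ A])                    # u = [a; b] with b = T a exactly

Q = U @ U.T                                  # 2n x 2n sample covariance sum
w, V = np.linalg.eigh(Q)
Theta = V[:, ::-1].T
lam = np.array([1.0] * n + [0.0] * n)        # n ones followed by n zeros
P = Theta.T @ np.diag(lam) @ Theta

M = np.linalg.inv(np.eye(n) + T.T @ T)
P_expected = np.block([[M, M @ T.T],
                       [T @ M, T @ M @ T.T]])
assert np.allclose(P, P_expected, atol=1e-6)
```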
If we define $\|Q\|$ to be $\big(\sum q_{ij}^2\big)^{1/2} = (\mathrm{tr}\,Q^TQ)^{1/2}$, the so-called Frobenius norm, then along solutions of $\dot{Q} = uu^T - u^TQu\,Q$ we have

$$\frac{d}{dt}\|Q\|^2 = \frac{d}{dt}\,\mathrm{tr}\,Q^TQ = 2\,u^TQu\,(1 - \|Q\|^2)$$

Thus if $\|Q\|$ is initially less than one, it will grow and approach one but not grow beyond it; if it is initially greater than one, it will decrease to one but not go below it. If $\|Q\|$ is initially one, it will remain so.
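A crude Euler simulation (step size, duration and input statistics are our choices, not the paper's) illustrates the attractivity of the unit Frobenius sphere:

```python
import numpy as np

rng = np.random.default_rng(4)
m, dt, steps = 3, 1e-3, 50000

def frobenius_after_run(Q):
    """Integrate Qdot = u u^T - (u^T Q u) Q and return the final Frobenius norm."""
    for _ in range(steps):
        u = rng.standard_normal(m)
        Q = Q + dt * (np.outer(u, u) - (u @ Q @ u) * Q)
    return np.linalg.norm(Q)          # np.linalg.norm is Frobenius by default

assert abs(frobenius_after_run(0.1 * np.eye(m)) - 1.0) < 0.1   # grows toward one
assert abs(frobenius_after_run(2.0 * np.eye(m)) - 1.0) < 0.1   # decays toward one
```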
With this normalization the equation for $Q$ can be rewritten as

$$Q(t) = \int_0^t u(\sigma)u^T(\sigma)\,e^{-\int_\sigma^t u^T(\xi)Q(\xi)u(\xi)\,d\xi}\,d\sigma$$

which, although still implicit, clearly places in evidence the fact that past values of $u$ are discounted more heavily than recent values. This change in the definition of $Q$ suggests that we introduce a corresponding version of the adaptive subspace filter, based now on

$$\dot{Q}(t) = u(t)u^T(t) - u^T(t)Q(t)u(t)\,Q(t); \qquad Q(0) = I/\sqrt{m}$$
In [11] it is shown that

$$\dot{\Theta} = (\Theta Q\Theta^T N - N\Theta Q\Theta^T)\Theta$$

is the steepest descent equation for the function $\mathrm{tr}\,\Theta Q\Theta^T N$ and that $\mathrm{tr}\,\Theta Q\Theta^T N$ has $2^n \cdot n!$ stationary points exactly $2^n$ of which are stable. Thus if $N$ is diagonal we may assert that, for almost all initial conditions on $\Theta$, the solution of this equation approaches a value which makes $\Theta Q\Theta^T$ diagonal and ordered similarly with $N$.

Definition. If $u(t) \in E^m$, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ and $N = \mathrm{diag}(1, 2, \ldots, m)$, we will refer to

$$\dot{Q}(t) = u(t)u^T(t) - u^T(t)Q(t)u(t)\,Q(t); \qquad Q(0) = I/\sqrt{m}$$
$$\dot{\Theta}(t) = (\Theta Q\Theta^T N - N\Theta Q\Theta^T)\Theta$$
$$y(t) = \Theta^T(t)\Lambda\Theta(t)u(t)$$

as the smooth, fading memory, adaptive subspace filter with weight vector $\lambda$.
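The coupled equations of the definition can be integrated numerically; the sketch below (our Euler discretization, with a QR re-orthonormalization step that the continuous flow does not need) checks that $\Theta Q\Theta^T$ becomes essentially diagonal for a constant input:

```python
import numpy as np

m, dt, steps = 3, 1e-3, 50000
N = np.diag(np.arange(1.0, m + 1))
rng = np.random.default_rng(5)
u = rng.standard_normal(m)                 # hold the input constant

Q = np.eye(m) / np.sqrt(m)
Theta = np.eye(m)
for _ in range(steps):
    H = Theta @ Q @ Theta.T
    Theta = Theta + dt * (H @ N - N @ H) @ Theta
    Qf, R = np.linalg.qr(Theta)            # Euler drifts off the orthogonal group
    Theta = Qf * np.sign(np.diag(R))       # re-orthonormalize, keeping column signs
    Q = Q + dt * (np.outer(u, u) - (u @ Q @ u) * Q)

H = Theta @ Q @ Theta.T
off_diagonal = H - np.diag(np.diag(H))
assert np.linalg.norm(off_diagonal) < 0.05   # nearly diagonal at the end
```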
Of course this system assigns to $y(t)$ exactly the same value that $\mathscr{F}_\lambda$ does if the $\Theta$ equation has reached the equilibrium corresponding to one of its minima. Roughly speaking, the quality of the approximation improves as the rate of change of $u$ is reduced. However, this system defines a continuous map of $E^m[0,t] \to E^m[0,t]$ and cannot realize $\mathscr{F}_\lambda$ exactly.

Evaluation of the right-hand side of this system of equations only involves multiplication, addition and subtraction. The $\Theta$ equation plays the role here that back propagation plays in that it can be thought of as adjusting weights. The absence of local minima reported in [5] can be interpreted as a special case of the assertion in [11] about the stationary values of $\mathrm{tr}(\Theta Q\Theta^T N)$.
6 A Master Equation
The point of the previous set of equations is that although it is common to discuss feedforward networks as operating in two stages, a learning or training stage followed by an operational stage, in many cases this division is artificial and can be done away with. Here we want to offer a slightly different form of these equations, one which generates the sample covariance data and its normalized eigenvectors at the same time.
Consider $P = \Theta Q$ with $\Theta$ and $Q$ as above. Then

$$\dot{P} = \dot{\Theta}Q + \Theta\dot{Q}$$
$$= \Theta Q\Theta^T N\Theta Q - N\Theta Q\,Q + \Theta uu^T - u^TQu\,\Theta Q$$
$$= P\Theta^T NP - N\Theta Q^2 + \Theta uu^T - u^TQu\,P$$

From the definitions we see that $\Theta Q$ is the left polar decomposition of $P$ and clearly $P^TP = Q^TQ = Q^2$. Thus we may write $Q = \sqrt{P^TP}$ and $\Theta = P(\sqrt{P^TP})^{-1}$. This gives

$$\dot{P} = P(\sqrt{P^TP})^{-1}P^TNP - NP\sqrt{P^TP} + P(\sqrt{P^TP})^{-1}uu^T - u^T\sqrt{P^TP}\,u\,P$$

This simplifies to

$$\dot{P} = P(\sqrt{P^TP})^{-1}(P^TNP + uu^T) - NP\sqrt{P^TP} - u^T\sqrt{P^TP}\,u\,P$$
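The polar-decomposition identities underlying the master equation ($P^TP = Q^2$, $Q = \sqrt{P^TP}$, $\Theta = P(\sqrt{P^TP})^{-1}$) are easy to confirm numerically (an illustration of ours, not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
m = 4
Theta, _ = np.linalg.qr(rng.standard_normal((m, m)))   # a random orthogonal Theta
B = rng.standard_normal((m, m))
Q = B @ B.T + m * np.eye(m)                            # symmetric positive definite Q

P = Theta @ Q
w, V = np.linalg.eigh(P.T @ P)
sqrtPTP = V @ np.diag(np.sqrt(w)) @ V.T                # symmetric square root of P^T P

assert np.allclose(P.T @ P, Q @ Q)
assert np.allclose(sqrtPTP, Q)                         # Q is recovered from P alone
assert np.allclose(P @ np.linalg.inv(sqrtPTP), Theta)  # and so is Theta
```

This is what allows the master equation to be written in terms of $P$ alone.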
7 Conclusion
It seems that learning problems often have both a calculus (or continuous
1985
[3] D.E. Rumelhart and J.L. McClelland, Parallel and Distributed Processing, MIT Press, Cambridge, MA, 1986
[4] H. Bourlard and Y. Kamp, "Auto-Association by Multilayer Perceptrons and Singular Value Decomposition," Biological Cybernetics, Vol 59 (1988) pp 291-294
[5] P. Baldi and K. Hornik, "Neural Networks and Principal Component Analysis: Learning from Examples Without Local Minima," Neural Networks, Vol 2 (1989) pp 53-58
[6] R.E. Kalman, "Canonical Structure of Linear Dynamical Systems," Proc. Nat. Acad. Sciences, Vol 48 (1962) pp 596-600
[7] E. Oja, "A Simplified Neuron Model as a Principal Component Analyzer," J of Math Biology, Vol 15 (1982) pp 267-273
[8] S. Shinomoto, "Memory-Maintenance in Neural Networks," Journal of Physics, A20, 1987
[9] R.J.T. Morris and W.S. Wong, "A Short-Term Neural Network Memory," SIAM Journal on Computing, Vol 17 No 6 (1988) pp 1103-1118
[10] Terence D. Sanger, "Optimal Unsupervised Learning in a Single-Layer Linear Feed-Forward Network," Neural Networks, Vol 2 (1989) pp 459-473
[11] R.W. Brockett, "Least Squares Matching Problems," Linear Algebra and Its Applications, Vols 122/123/124 (1989) pp 761-777
[12] R.W. Brockett, "Dynamical Systems That Sort Lists, Diagonalize Matrices and Solve Linear Programming Problems," Proceedings of the 1988 IEEE Conference on Decision and Control, IEEE, New York (1988) pp 799-803
[13] C.E. Shannon, "Symbolic Analysis of Relay and Switching Circuits," Trans AIEE, Vol 57 (1938) pp 713-723
[14] J.B. Dennis, Mathematical Programming and Electrical Networks, J Wiley, New York, 1959
[15] S. Grossberg, "Nonlinear Difference-Differential Equations in Prediction and Learning Theory," Proc. Nat. Acad. Sciences, Vol 58 (1967) pp 1329-1334
[16] N. Karmarkar, "A New Polynomial Time Algorithm for Linear Programming," Combinatorica, Vol 4 (1984) pp 373-395
Subject Index
A
accessibility of nonlinear systems, 6, 454 ff.
to the origin, 459
computational complexity, 461
Lie algebra, 457 ff.
rank condition, continuous, discrete
time, 457 ff.
strong, and controllability, 468
Ackermann's formula, 319
adaptive control and identification, 387-450
adaptive control, 2, 6, 19, 437-450
algorithms, 438 ff.
direct, indirect, 438 ff.
indirect, unexpected properties, 445 ff.
industrial use, 448
LS estimator as Kalman filter, 443 ff.
microprocessor implementation, 448
model reference, 437
self-oscillating, 437
adaptive filters for learning systems, 580 ff.
descent equation, 589 ff.
realization, 582
algebraic bundle, 335
algebraic coding, see: coding
algebraic canonical form, 335
algebraic geometry, 6, 327 ff.
algebraic groups, 322 ff.
algebraic system theory, see: module theory
and linear system theory; linear
systems; polynomial matrices;
realization problem
algebraic varieties and morphisms, 328 ff.
affine varieties, 331
classification of varieties, 330
quasi-projective varieties, 335
algebraization of system theory, 234
algebras, C*, W*, von Neumann, 137
predictor, splitting, 218
algorithms,
B. L. Ho, 284
Berlekamp-Massey recursive, 201
Nevanlinna-Pick, 201, 202
for triangularizing polynomial
matrices, 363-365
analytic closed loop solvability (analytic
CLS), 407
C
calculus of variations, 70
canonical embedding and projection, 271, 299
canonical forms, 5, 23, 36, 239, 280
and continued fractions, 256, 257
canonical forms, global algebraic,
non-existence of, 337
canonical structure (decomposition)
theorem, 192, 196
extensions, 6, 475-488
nonlinear systems, 478 ff.
time-varying systems, 476 ff.
canonical (controllable, observable)
systems, 26, 284, 339-341
see also: minimal systems, minimal
realizations
canonical system, weakly, 205
cascade interconnection, 201
category theory and system theory, 279,
284-288
Cauchy approximation, 202
Cauchy index, 253
causality and stability, 301
Cayley-Hamilton theorem, 332, 340, 410
CCD filters as systems over rings, 312
chaotic behavior, 2
characteristic and transfer functions, 503
Cholesky factorization, 406
closed orbit lemma, 330
co-state or Lagrange multiplier, 180
coding, algebraic and system theory, 527-557
block codes, 527
catastrophic encoder, 531
convolutional codes, 7, 290, 527
algebraic methods, 7, 290, 550 ff.
encoder, 356, 529, 536 ff.
dual codes and encoders, 539 ff.
generalized minimal encoder, 545 ff.
minimal bases and poles at infinity, 542 ff.
parity-check matrix, 541
realization of encoders, 546
syndrome, 539
systematic encoders, 541 ff.
code sequence, code, 530
generator matrix, 530
Hamming (free) distance, 530
performance vs. complexity, 527
systems over the integers, 312
system theory and algebra, 532 fT.
trellis codes, 539 ff.
cointegration, 7, 560 ff.
and zero structure, 565
commutant lifting theorem, algebraic
version, 241
commutative algebra, 280, 283, 285
pole-assignability and feedback, 267
controllable systems and transfer functions, 30
controllers,
H∞ central and LQG classical, 168
PID, 18
stabilizing, YJBK parametrization of, 6,
159, 199, 349-353
synthesis and computer algebra, 355-367
convergence, of polynomial matrices, 32
of systems, 32 ff.
convolution systems, 194, 216, 244, 315
convolutional codes, see: coding
coprimeness,
of polynomial matrices, 239-242, 307, 348
and corona theorem, 239
approximate, 211
H∞, 239
observability, reachability, 248
corona theorem and coprimeness, 239
covariance, 45, 57, 90, 216, 227, 407
cross correlation, 222
cybernetics, 17
D
Darlington synthesis, 201, 264
data compression, realization problem, 193, 295
dead-beat control, 147
description, external, input/output, black
box, 4, 191, 213
internal, state space, 4, 191
see also: state space models
see also: modeling, realization problem,
behavioral approach
design theory, 147 ff.
detectability, 47, 48
detector, optimal, 66
differential algebra and fields, 6, 464 ff.
differentially algebraic elements,
extensions, 465 ff.
differentially transcendental elements,
extensions, 466 ff.
dimension of state space systems, 194
see also: canonical systems
Diophantine (Bezout) equations, 356
discrete event systems, 356
displacement, structure, 78 ff.
rank, 82 ff.
distributed and nonlinear systems, 451-500
distributed systems, control of, 491-499
distributions, Schwartz, 193, 205, 208
E-module framework, 314
smooth, involutive, integrable, 479 ff.
disturbance decoupling problem, almost, 164
division, Euclidean, 3, 5, 201, 202, 239, 256,
261, 359
pseudo-, 360 ff.
divisors, common, greatest,
left-right, 239-240, 358, 533
Dry-Tuned-Gyro (DTG), 96
duality, exact reachability and topological
observability, 207
duality, Kalman filtering and LQG
problem, 47, 60
dynamic programming, 18, 19
dynamics assignment and invariants, 306-308
dynamics, topological, 37
E
E-module framework for systems over rings, 314
econometrics,
error correction models, 561
errors-in-variables models, 569 ff., 572
factor analysis models, 571
inputs and outputs, 569 ff.
system-theoretic trends, 7, 559-577
tracking of targets, 566 ff.
economic time series, random drift, 560
electronic devices, solid state, 17
engineering,
theory, practice, and art, 296
vs. physics, 154
mathematization of, 296
theoretical, 18, 296
entropy minimization problem, 174
equations,
behavioral, 19, 21, 36, 216
Chandrasekhar, 69, 78, 81, 83
c1assical Wiener-Hopf, 78
delay differential, 211
forward and backward evolution, 76
gyro/accelerometer motion, 97
integral and Riccati equations, 64
integral, Fredholm-type, 63
Lyapunov differential, 74
Wiener-Hopf-type, 63, 78, 79
stochastic differential, 65, 227
equivalence, stochastic, 215
equivalent (linear) systems, 194, 330 ff.
errors-in-equations models, 423
errors-in-variables (EV) models, 216, 423,
569 ff., 572
estimates,
conditional mean, 42
least-squares, 6, 56, 65, 89, 438 ff.
filtered, smoothed, 56, 69
estimators,
asymptotic properties of, 393
causal, on-line, optimal, 41
consistent, 434
gyro/accelerometer, 99
maximum likelihood (ML), 400
consistency and asymptotic
normality, 406-420
minimum prediction error (MPE), 391 ff.
optimal state, 183
optimal (ASM), 391 ff.
H∞ problem,
standard control problem, 161
output-feedback control problem, 167
state-feedback control problem, 162
H∞ coprimeness, 239
H∞ filtering and Kalman filtering, 166
H∞ filtering and smoothing problem, 165
Hamiltonian, equations, 60, 82
framework, 191
matrix, 70, 180
Hamming (free) distance, 530
Hankel,
matrices (behavior matrices), 193, 340, 396
signature of, 341
partially determined, 197
functional equation, 242
map or Kalman input/output map, 271
operators and module
homomorphisms, 242-244
operators, singular values of, 260
quadratic form, 250
Hautus controllability test generalization, 27
heat bath, 217
Hermite form of polynomial matrices, 357,
359, 367
Hermite-Fujiwara, matrix, 256
quadratic form, 252-256
Hermite-Hurwitz stability theorem, 254
Hermitian quadratic form, 251
Hessenberg form, 380
hidden modes (decoupling zeros), 471 ff.
Hilbert, Nullstellensatz, 331
state space, 205
uniqueness method (HUM), 491, 495
Hurwitz stability, 372 ff.
I
Itô differential rule, 62
Itô stochastic integral, 60
K
Kalman experimental setup, 271
Kalman filtering, 2-4, 6, 7, 18, 19, 41-143,
185, 400, 438, 492, 559
Bierman factorization algorithms, 119
and H∞ filtering, 166
adaptive filtering, 53
advance in digital data processing, 134
asymptotic stability and time invariance, 47
Chandrasekhar type algorithms, 53
compensation of solid friction, 94
computational issues, 46, 53
continuous time (Kalman-Bucy filtering), 44
controller design, 53
discrete time, 48
dual of LQG problem, 47
extended or nonlinear, 52, 67, 106, 124
historical comment, 59
implementation on digital computer, 91
infinite time interval, 46
key differences with Wiener filtering, 46
LS estimator in adaptive control, 443 ff.
navigation and guidance, 89-134
quantum, 4, 135-143
separation of dynamics and
measurements, 48, 49
square root filtering, 53
stability, 60
state-space formulas, 78
Kalman-Bucy or continuous-time Kalman
filtering, 44, 56-60, 79, 81
Kalman-Bucy formulas, 63, 64, 69
Kalman,
gain, 44, 79
controllability (reachability) and
observability matrices, 196, 332, 454
input/output or Hankel map, 271, 282 ff.
realization diagram, 282
space of families of systems, 328 ff.
and the Grassmannian, 335
construction of, 334-335
geometric structure, 335-337
Karmarkar descent algorithm, 591
Kepler's laws, 20
Krohn-Rodes theory, 284
Kronecker indices, 3, 201, 305
Kyoto Prize, 41
L
Lagrange multiplier or co-state, 180
Lagrangian framework, 191
Laplace transforms, 36
LaSalle's bang-bang principle, 148
latent variables, 19, 36, 424
lattice filters, 84, 380 ff.
Laurent series, 282, 297 ff., 530
LTR (loop transfer recovery), 155, 187
Luenberger form, 382
Lure's theorem, 147
Lyapunov,
equation, 74, 256, 406
function, 379
stability theorem, 255
second method, 377
Lyapunov-Routh-Hurwitz stability, 374
M
MA models, 27 ff.
Macsyma and Mathematica, 357, 366
manifest variables, 20, 36
Mansour or discrete Schwarz form, 6,
375 ff.
map, continuously invertible, 205, 207
margin, gain and phase, 102, 183
Markov,
chains, quantum, 135, 141
density functions, Fokker-Planck equation
for, 59
diCfusions, 60
parameters, 192, 195, 244, 269, 270
process, 90
property, 74, 217
splitting property, 219 ff.
martingales, 4, 64, 66, 223
match between dynamical intuition and
algebra, 287
mathematical models,
behavior and behavioral equations, 19 ff.
external (phenomenological), internal, see:
description
see also: modeling, realization problem
mathematics and system theory, 501-523
mathematization of engineering, 296
matrices,
behavior or Hankel, see: Hankel
matrices
Löwner, 202
polynomial, see: polynomial matrices
scattering, 71
matrix factorization, 83, 504
matrix fractions for rational matrix
functions, 200, 243 ff., 275, 313, 347 ff.
coprime fractions, see: coprimeness
matrix pencils, 510
matrix rational interpolation
problems, 201 ff., 503 ff.
maximum likelihood estimators, 390, 400
consistency and asymptotic
normality, 406-420
maximum principle, 18, 19, 148
Maxwell, J.C., 17, 233
Mayne-Frazer smoothing formula, 73
McMillan degree, 192
see also: invariant factors
memory span, 22, 32
observability/reachability decomposition,
192, 196
see also: canonical structure theorem
observables, algebra of, 136
observer, 19, 44, 348
operation valued measures, 138
operator model theory, 503
operator, Markovian, 137
operator, Volterra, 63 ff.
optimal control, 145-188
inverse problem of, 148
and H∞, 164
control performance vs. input power, 151
optronic systems, 105
orbit space, 340
order chain and list, 304
orthogonality condition, 57, 62, 69
orthogonality of subspaces, conditional, 220
output and input spaces, see: input/output
spaces
output-feedback canonical form and
invariants, 258-259
outputs, regulated and measured, 161, 351
P
reachability, observability, 440
unstable, 346, 348
pole-zero exact sequence, 290
poles and zeros of linear systems, 268, 280,
289-292
poles at infinity and minimal bases, 542 ff.
polynomial (module) action and time-shift, 281
polynomial bases and indices, minimal, 291, 554
polynomial matrices, 5, 21
terminology of, 357-359
column, row reduced, 552
convergence of, 32
coprimeness, 239-242, 307, 348
elementary row and column operations, 358
fractions of, 200, 243, 275-277, 347 ff., 367, 395
poles, zeros at infinity, 552
Hermite forms, 357
predictable degree property, 538
triangularization of, 356, 359 ff.
unimodular, 29, 358
polynomial models, 5, 235, 247
polynomial modules, 235, 268, 280
and dynamical structure, 280
and input/output spaces, 281
polynomials, 3, 235, 281
computation, 348
error-free computation, 6, 357 ff.
generalized, 349
localizations of, 299 ff.
Pontryagin maximum principle, 18, 19, 148
positive definiteness and stability, 253
positive real lemma, 3, 192, 215
power series, formal, 235
rationality of, 313
see also: Laurent series
prediction, 46, 389
prediction error methods (PEM), 390 ff.
loss and cost functions, 391
recursive construction, 403
predictor algebras, 218
prejudice in identification, 425
principal ideal domain, 300, 315, 532
probability,
spaces, measures, distributions, 215 ff.
axiomatic framework for modeling, 216
conditional, 217
processes,
random (stochastic), 41, 45, 215 ff.
Gaussian stationary, 400
non-Gaussian, 67
Poisson, 67, 68
purely non-deterministic, 224
state-space description of nonstationary, 57
wide sense or second order, 216, 217, 394
properness of compensator and partial
realization, 200
pseudo-division lemma, 361
pseudorationality, 208
Ptak space, 206
Q
quadratic forms: Bezoutian, Hankel,
Hermite, Hermite-Fujiwara, 250-256
quadratic stabilization theory, 156
quadrature, 80
R
short exact sequence, 236
fundamental pole-zero, 291
SIGAL strapdown inertial systems, 96
signal,
estimation and detection, 68
processing and systems over the integers, 312
processing, statistical, 214
space, 20
singular values of linear systems, 260
singular value decomposition, 580
Smith canonical form and invariant factors,
284, 367
Smith-McMillan, form, 7, 289, 367, 395, 532,
534, 553
degree, 396
Smith-McMillan-Yoo form, 562
smoothing, 46, 50 ff., 70 ff.
Sobolev space, 496
solid friction, adaptive compensation, 105
spaces,
input, output, see: input/output spaces
state, see: state, state space realizations
probability, 215
Ptak, barreled, 206
spectral densities, 178, 184, 394, 424
spectral factorization, 4, 43, 90, 215, 225
of Wiener and Hopf, 56, 57
innovations, Riccati equation, 49
time-varying, 50
spectral matrix, 50
spectral problems, 239
splitting algebra, variables, 218-222
Sputnik, 59
SSX models, 395 ff.
stability,
margin, 378-379
internal, 297
of input/output map, 300
stability, module theoretic framework, 297 ff.
stability and causality, 301
stability module, 303
stability and module indices, 303-305
stability rings, 297
stability criteria,
Hermite-Hurwitz, 254
Lienard-Chipart, 253, 375, 378
Lyapunov, 255, 373 ff.
Routh-Hurwitz, 372 ff.
Schur-Cohn, 375 ff.
unified treatment, 250
stability theory,
linear systems, 249-256
discrete linear systems, 371-384
stabilizability, 47, 48
stabilizer subgroup, 333
stabilizing controllers, YJBK
parametrization, 349-353
state, 3, 135, 159, 214, 219, 270
see also: modeling or realization
construction, 4, 191 ff.
interface between past and future, 152
Wiener, N., 17
Wiener filtering, 3, 4, 41, 43, 49, 90
calculation burden, 46
and Kalman filtering, key differences, 46
signal model for, 42, 44
Wiener process (or Brownian motion), 64,
66, 68, 178, 223
causal, anticausal, 223, 225
Wiener-Hopf factorization, 56, 57, 570
Wiener-Hopf integral equations, 63, 510, 554
Wiener-Paley physical realizability theorem,
147
Wiener-Volterra expansion, 67
Wold representation, 224