
R.E. Kalman

Athanasios C. Antoulas (Ed.)

Mathematical System Theory
The Influence of R. E. Kalman

A Festschrift in Honor of Professor R. E. Kalman
on the Occasion of his 60th Birthday

With 49 Figures

Springer-Verlag
Berlin Heidelberg GmbH

Prof. Dr. Athanasios C. Antoulas

Department of Electrical and Computer Engineering
Rice University, Houston, Texas 77251, USA
Mathematical System Theory, E.T.H. Zürich
CH-8092 Zürich, Switzerland

ISBN 978-3-662-08548-6

Library of Congress Cataloging-in-Publication Data

Mathematical system theory: the influence of R.E. Kalman /
Athanasios Constantinos Antoulas, ed.
Includes index.
ISBN 978-3-662-08548-6
ISBN 978-3-662-08546-2 (eBook)
DOI 10.1007/978-3-662-08546-2
1. Control theory. 2. Kalman filtering. 3. System analysis.
I. Antoulas, Athanasios Constantinos, 1950-.

This work is subject to copyright. All rights are reserved, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, re-use of illustrations,
recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data
banks. Duplication of this publication or parts thereof is only permitted under the provisions
of the German Copyright Law of September 9, 1965, in its current version, and a copyright
fee must always be paid. Violations fall under the prosecution act of the German Copyright
Law.

© Springer-Verlag Berlin Heidelberg 1991
Originally published by Springer-Verlag Berlin Heidelberg New York in 1991

The use of registered names, trademarks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.

Typesetting: Thomson Press (India) Ltd., New Delhi
61/3020-543210 - Printed on acid-free paper.

Hypotheses non fingo (Newton)

Curriculum Vitae of R.E. Kalman

Rudolf Emil Kalman was born in Budapest, Hungary, on May 19, 1930. He received
the bachelor's degree (S.B.) and the master's degree (S.M.) in electrical engineering
from the Massachusetts Institute of Technology in 1953 and 1954, respectively. He
received the doctorate degree (D. Sci.) from Columbia University in 1957. His major
positions include that of Research Mathematician at R.I.A.S. (Research Institute for
Advanced Study) in Baltimore between 1958 and 1964, Professor at Stanford
University between 1964 and 1971, and, since 1971, Graduate Research Professor at
the Center for Mathematical System Theory, University of Florida, Gainesville.
Moreover, since 1973 he has also held the chair for Mathematical System Theory at
the ETH (Swiss Federal Institute of Technology) Zürich. He is the recipient of
numerous awards, including the IEEE Medal of Honor (1974), the IEEE Centennial
Medal (1984), the Kyoto Prize in High Technology from the Inamori Foundation,
Japan (1985), and the Steele Prize of the American Mathematical Society (1987). He
is a foreign member of the Hungarian and French Academies of Science, and has
received a number of honorary doctorates. He is married to Constantina née Stavrou,
and they have two children, Andrew and Elizabeth.

Contents

Introduction . . . 1
List of Technical Publications of R.E. Kalman . . . 9

Chapter 1: Axiomatic Framework

Dynamical Systems, Controllability, and Observability: A Post-Modern Point of View
J.C. Willems . . . 17

Chapter 2: Kalman Filtering

Kalman Filtering: Whence, What and Whither?
B.D.O. Anderson, J.B. Moore . . . 41

From Kalman Filtering to Innovations, Martingales, Scattering and Other Nice Things
T. Kailath . . . 55

Kalman Filtering and the Advancement of Navigation and Guidance
P. Faurre . . . 89

Quantum Kalman Filters
L. Accardi . . . 135

Chapter 3: The LQG Problem

LQG as a Design Theory
H. Kimura . . . 147

State-Space H∞ Control Theory and the LQG Problem
P.P. Khargonekar . . . 159

Unified Continuous and Discrete Time LQG Theory
G.C. Goodwin, M.E. Salgado . . . 177

Chapter 4: The Realization Problem

Linear Deterministic Realization Theory
A.C. Antoulas, T. Matsuo, Y. Yamamoto . . . 191

Stochastic Realization Theory
G. Picci . . . 213

Chapter 5: Linear System Theory: Module Theory

Algebraic Methods in System Theory
P.A. Fuhrmann . . . 233

Module Theory and Linear System Theory
M.L.J. Hautus, M. Heymann . . . 267

Models and Modules: Kalman's Approach to Algebraic System Theory
B.F. Wyman . . . 279

Linear Realization Theory, Integer Invariants and Feedback Control
J. Hammer . . . 295

Linear Systems Over Rings: From R.E. Kalman to the Present
E.W. Kamen . . . 311

Chapter 5: Linear System Theory: Families of Systems

Invariant Theory and Families of Dynamical Systems
A. Tannenbaum . . . 327

Chapter 5: Linear System Theory: Related Developments

On the Parametrization of Input-Output Maps for Stable Linear Systems
J.B. Pearson . . . 345

Algebraic System Theory, Computer Algebra and Controller Synthesis
J.S. Baras . . . 355

On the Stability of Linear Discrete Systems and Related Problems
M. Mansour, E.I. Jury . . . 371

Chapter 6: Identification and Adaptive Control

Finite Dimensional Linear Stochastic System Identification
P.E. Caines . . . 389

Identification of Dynamic Systems from Noisy Data: The Case m* = 1
M. Deistler, B.D.O. Anderson . . . 423

Adaptive Control
K.J. Åström . . . 437

Chapter 7: Generalizations to Nonlinear and Distributed Systems

Kalman's Controllability Rank Condition: From Linear to Nonlinear
E.D. Sontag . . . 453

Controllability Revisited
M. Fliess . . . 463

On the Extensions of Kalman's Canonical Structure Theorem
A. Ruberti, A. Isidori . . . 475

Some Remarks on the Control of Distributed Systems
J.L. Lions . . . 491

Chapter 8: Influence in Mathematics

The State Space Method in the Study of Interpolation by Rational Matrix Functions
J.A. Ball, I. Gohberg, L. Rodman . . . 503

The State Space Method for Solving Singular Integral Equations
I. Gohberg, M.A. Kaashoek . . . 509

Chapter 9: Applications

Algebraic Structure of Convolutional Codes and Algebraic System Theory
G.D. Forney, Jr. . . . 527

System-Theoretic Trends in Econometrics
J.M. Schumacher . . . 559

Dynamical Systems That Learn Subspaces
R.W. Brockett . . . 579

Subject Index . . . 593

Introduction

Scientific activities can be divided today into two broad categories. The first
category includes by and large the natural sciences: physics, (most of) chemistry,
geology, materials, astronomy, (some of) biology, etc. Its objective is to
investigate fundamental properties of matter, the big bang, black holes, and
similar problems. The second category is concerned with phenomena and
structures which display high complexity. These can be found in nature;
biological phenomena and the structure of molecules, like DNA, are two
examples. But for the most part they are artificial, generated by disciplines such
as engineering, computer science, cybernetics, ecology, operations research,
economics, etc. The main distinction between these two categories is their
system-component: the former has a small system-component, while the latter
has a large system-component. In the sequel, we will attempt to define the
concept of system-related or system-theoretic activity.

The scientific methodology followed by the former category of sciences is
well established: experiment, theory, verification. After an experiment is carried
out, a theory is postulated, whose validity is subsequently verified. This
methodology is well suited for simple or simplified phenomena (like the
photoelectric effect). It does not work, however, for the scientific activities with
a significant system-component. This is due to their high complexity. Think for
example of the brain or the computer; there is no simple experiment which
could capture all aspects of the functions of either the brain or the computer.
Consequently, there is no simple theory which would work either. The
methodology of the system-related sciences is not as yet well established. What
is certain, however, is that the concept of model plays a prominent role. A model
is an abstract construction (a set of rules), which helps (a) in analyzing the
problem at hand, (b) in determining what can be achieved and what not, and
finally, (c) in giving prescriptions on how certain goals can be achieved.

After these remarks let us attempt to define what is meant by system-related
activities, or system theory for short. It is a science which deals with phenomena
whose complexity cannot be described by simple laws. It is concerned not with
the actual world but with models of the actual world. System theory is not only
descriptive like the natural sciences, but prescriptive as well. This means that
system theory does not only tell us how systems are (analysis), but how systems
should be (synthesis).

The system-related sciences were barely existent a century ago. Since then
their growth has steadily increased, accelerating especially after the middle of
this century. Judging from the number of journals devoted to this area, it has
become today one of the most important areas of scientific endeavor.

Since system theory studies models, the natural means for system-theoretic
investigations is mathematics. Historically, system theory has been concerned
with large classes of systems, like linear, bilinear, analytic, etc. (especially the
former) and has answered questions regarding their possibilities and their
performance limits. Since the main tool for studying system-theoretic problems
is mathematics, system theory, like mathematics, is universal. This means that
a system-theoretic result can be applied to a physical system or a biological
system or an economic system, or any other system, provided that the
assumptions under which the results were derived are satisfied. Thus
mathematics, through abstraction at different levels, provides the necessary tools
for penetrating into the complexity which characterizes a great many of the
problems today.

To shed more light on the above picture of today's scientific activities, the
interplay between the two categories already mentioned is illustrated by the
following example. Airplanes, and in particular newly designed, high-performance
airplanes, are inherently unstable. This instability is precisely a consequence of
their good aerodynamic properties. For these planes to be able to fly, the use
of (very sophisticated) feedback control mechanisms is imperative. Thus, since
feedback is one of the typical system-theoretic concepts, in building airplanes
one needs to respect not only the laws of aerodynamics but also those of system
theory.

The most influential and dominant figure in system theory over the past 30
years has undoubtedly been Rudolf E. Kalman. There is hardly a research area
in this field which has not been influenced by his thinking. In the pages that
follow, there is ample documentation of Kalman's influence in the field, which
earned him the reputation of the founder of mathematical system theory.

Kalman's first major contribution resulted from his little known master's
thesis [7]. Therein, much of the present development concerning the chaotic
behavior of dynamical systems is anticipated. His adaptive control paper [12]
followed. It proposed the self-tuning regulator, which is widely used in practice
today and whose theoretical analysis was completed some twenty years later.

Between 1959 and 1965 Kalman wrote a series of seminal papers. First, the
new approach to the filtering problem, known today as Kalman Filtering, was
put forward [19], [25]. Its enormous success and appeal lies in the fact that
the structure of the optimal filter is explicitly given, while the unknown gain
is computed recursively, by solving a matrix Riccati differential or difference
equation. In the meantime the all-pervasive concept of controllability and its
dual, the concept of observability, were formulated [24], [34]. Simple rank
conditions were derived for checking their validity. These notions were essential
in Kalman's treatment of the linear quadratic problem in optimal control. By
combining the filtering and the control ideas, the first systematic theory for
control synthesis, known today as the Linear-Quadratic-Gaussian or LQG theory,
resulted [23]. Notice that the common thread which runs through all these
contributions is provided by the concepts of state and model. The next
contribution was the solution of the black box modelling problem in the linear
case, known as realization theory. This problem involves the construction of the
state from input/output measurements. The conditions for existence and
uniqueness of the solution were derived. The remarkable fact is that uniqueness
is equivalent to controllability and observability of the model; this highlights the
key role played by these concepts in the theory of linear systems [31], [35],
[42], [47], [49].

The next milestone in the sequence of contributions was the introduction
of module theory to the study of linear systems [46]. The result was a flurry
of activities, very much alive today, exploiting the fact that polynomials and
the associated Euclidean division algorithm entered into the picture. Related
contributions include investigations on systems over rings [71], [73], and
bilinear and multilinear systems [60]. Kalman's contributions to the positive
real lemma [38], [39], the location of the roots of polynomials [63], and the
study of the connection between Kronecker invariants and feedback [70] should
also be mentioned. Later came the founding of yet another area of research in
system theory, that of families of linear systems [75]. Over the past decade
Kalman has devoted his efforts to the understanding of the problem of
identification from noisy data, with particular attention to the connections with
econometrics and statistics [102].

There follows a guided tour of the contents of this Festschrift, which contains
more details on the above contributions.

Chapter 1. Axiomatic Framework

The opening paper (Willems) is concerned with a study of the axiomatic
foundations of system theory, a topic so dear to Kalman (see [64, Chap. 1]).
Using a behavior (trajectory) based approach, it reflects on and generalizes the
basic system-theoretic concepts: state, controllability, observability. What
becomes clear is that controllability is an attribute of the system which does
not need an internal representation to be defined. It can be defined in terms of
the behavior in such a way that it generalizes the usual concept of controllability.
Moreover, the problem of deciding which of the external variables are the causes
and which are the effects can be addressed in this framework.

Chapter 2. Kalman Filtering

In late 1958, as an application of state variable theory, Kalman set out to try
the model-based approach on the problem of Wiener Filtering (see [82]). What
ensued from this attempt in the months that followed was the celebrated Kalman
Filter, which turned out to be one of the most influential developments in
science of the past several decades. The second chapter is dedicated to this
topic. The first paper (Anderson & Moore) provides an overview of the Kalman
Filter and a comparison with the Wiener Filter; the key differences are
summarized in Table 2. Furthermore, related topics, like the equivalence between
the main tools of the two filters, i.e. the Riccati equation and the spectral
factorization, and the way innovations come into the picture, are explored. The
second paper (Kailath) examines the very rich area surrounding the Kalman
Filter and adds many historical remarks. It is thus explained how the study of
innovations and martingales arises naturally in connection with the Kalman
and the Wiener Filtering problems, in attempts to find the relationship between
them. The third paper (Faurre) discusses how navigation and guidance were
influenced by the introduction of the Kalman Filter. There are five concrete
applications on which the author's company, SAGEM, has been working. In
two cases pictures of the actual hardware are displayed. The chapter concludes
with an account of the quantum Kalman Filter, where quantum probabilistic
techniques are used to produce an explicit solution of the discrete-time filtering
problem for a class of classical stochastic processes which is neither Markovian
nor Gaussian (Accardi).

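
The filtering recursion surveyed in this chapter can be made concrete with a minimal sketch. Everything in the snippet below is an invented illustration (a scalar random-walk model with noise variances q and r chosen arbitrarily), not code from any of the papers: the filter structure is fixed in advance, and only the gain is computed recursively through the Riccati difference equation for the error variance.

```python
import numpy as np

# Minimal discrete-time Kalman Filter for a scalar random walk:
#   x[k+1] = x[k] + w[k],  w ~ N(0, q)   (state equation)
#   y[k]   = x[k] + v[k],  v ~ N(0, r)   (measurement equation)
# The filter structure is given explicitly; the gain K is computed
# recursively via the Riccati difference equation for the variance P.

def kalman_filter(ys, q=1e-4, r=0.25, x0=0.0, p0=1.0):
    x, P = x0, p0
    estimates = []
    for y in ys:
        P = P + q                  # time update (Riccati recursion, part 1)
        K = P / (P + r)            # gain from the predicted variance
        x = x + K * (y - x)        # the innovation y - x drives the estimate
        P = (1.0 - K) * P          # measurement update (Riccati, part 2)
        estimates.append(x)
    return estimates

rng = np.random.default_rng(0)
true_x = 1.0
ys = true_x + 0.5 * rng.standard_normal(200)   # noisy measurements
est = kalman_filter(ys)
print(est[-1])   # final estimate, close to true_x
```

The same two-step pattern (predict the variance, compute the gain, correct with the innovation) carries over verbatim to the matrix case.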
Chapter 3. The LQG Problem

The dual of the Kalman Filtering problem, known as the Linear-Quadratic-
Gaussian (LQG) problem, is discussed in this chapter. Kimura's essay argues that
LQG is the first systematic control synthesis theory and differs from classical
synthesis methods because it is model-based. Precisely the fact that it is model-
based raises the issue of the theory-practice gap, which in turn prepares the
ground for robust control. It is also argued that the more recent H∞-control
theory is not a counterpart but rather a successor of the LQG theory. This last
point is taken up in the next contribution (Khargonekar), where it is actually
shown how the state feedback H∞ control problem and the H∞ filtering problem
can be regarded as generalizations of the LQG and the Kalman filtering
problems, respectively. The Riccati equation turns out to be the main tool in
both cases. Moreover, for the limiting value of a parameter, the Riccati equations
involved in the LQG and the Kalman Filter are recovered. The third paper in
this chapter (Goodwin and Salgado) surveys a newly developed framework
which unifies the discrete- and continuous-time LQG theories. A formalism is
introduced in terms of the sampling period; as it goes to zero, the continuous-
time case is recovered, while for unity sampling period, the discrete-time results
are obtained.

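
The role of the Riccati equation as the common computational tool of LQG can likewise be sketched in a few lines. The double-integrator matrices and cost weights below are invented for the example; the loop iterates the discrete-time Riccati difference equation to (near) convergence and reads off the optimal state-feedback gain.

```python
import numpy as np

# Discrete-time linear-quadratic regulator gain via the Riccati difference
# equation, iterated to convergence. A, B, Q, R are invented for illustration.
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                            # state cost
R = np.array([[1.0]])                    # input cost

P = Q.copy()
for _ in range(500):
    # P <- Q + A'P(A - BK) with K = (R + B'PB)^{-1} B'PA
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # optimal feedback gain
closed_loop = A - B @ K
print(np.abs(np.linalg.eigvals(closed_loop)).max())  # spectral radius below 1
```

The filtering Riccati equation of the previous chapter is the dual of this recursion, which is the sense in which LQG and Kalman Filtering share one computational core.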
Chapter 4. The Realization Problem

Another one of Kalman's major contributions was to solve the realization
problem, i.e. the problem of constructing the state from the external behavior
in the case of linear systems. An equivalence between external (input-output)
and internal (input-state-output) descriptions of linear systems was thus
established, predicted already by Nerode a few years earlier on the set-theoretic
level. In this chapter an account of the deterministic (Antoulas, Matsuo, and
Yamamoto) as well as the more involved stochastic (Picci) realization problems
is given. In the former paper, surveys of the realization problem for both
finite- and infinite-dimensional linear systems are followed by overviews of
recent related results. The stochastic approach presented in the latter paper is
based on the conceptual point that random inputs should be regarded as
auxiliary variables, and have no more physical meaning than the state variables.

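
The deterministic realization idea admits a compact numerical sketch in the Ho-Kalman spirit: factor the Hankel matrix of impulse-response (Markov) parameters to recover a minimal state-space model. The second-order "black box" below is invented for illustration; only its impulse response is given to the algorithm.

```python
import numpy as np

# Realization from Markov parameters h_k = C A^k B (Ho-Kalman style sketch).
# The "black box" (A_true, B_true, C_true) is invented; the algorithm sees
# only the sequence h.
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
B_true = np.array([[1.0], [1.0]])
C_true = np.array([[1.0, 0.0]])
h = [C_true @ np.linalg.matrix_power(A_true, k) @ B_true for k in range(10)]

# Hankel matrix H[i, j] = h[i + j] and its shifted version.
n = 4
H  = np.block([[h[i + j]     for j in range(n)] for i in range(n)])
Hs = np.block([[h[i + j + 1] for j in range(n)] for i in range(n)])

# Factor H = O R via the SVD, truncating at the numerical rank.
U, s, Vt = np.linalg.svd(H)
r = int(np.sum(s > 1e-8))                # rank = minimal state dimension
O = U[:, :r] * np.sqrt(s[:r])            # observability factor
R = np.sqrt(s[:r])[:, None] * Vt[:r]     # reachability factor

A = np.linalg.pinv(O) @ Hs @ np.linalg.pinv(R)   # the shift yields A
B = R[:, :1]                                     # first column of R
C = O[:1, :]                                     # first row of O

# The realized model reproduces the Markov parameters.
h_hat = [C @ np.linalg.matrix_power(A, k) @ B for k in range(10)]
err = max(abs((a - b).item()) for a, b in zip(h, h_hat))
print(r, err)
```

The rank of the Hankel matrix is the minimal state dimension, and the realization is unique up to a change of basis; this is the controllability/observability characterization of minimality mentioned above, in computational form.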
Chapter 5. Linear System Theory

This chapter is dedicated to the structure of linear systems. The state space of
a linear (time-invariant, finite-dimensional) system, together with the map which
determines its dynamics, can be regarded as a polynomial module. Therefore,
the theory of linear systems is equivalent to the theory of polynomial modules.
This simple but penetrating observation, due to Kalman, converted the study
of linear systems into the study of polynomial maps and matrices. The numerous
and deep contributions in this area are surveyed by the next five papers. The
remaining two subsections are concerned with further developments, including
the global theory of linear systems.

5.1 Module Theory. The transfer function of a linear system can be regarded
as an algebraic object (a ratio of polynomials) or as an analytic object (a function
of a complex variable). In the first paper Fuhrmann remarks that this dual
nature of the theory of linear systems accounts for the richness of the field and
for the depth of the results. A prime example of this duality is Kalman's
unification of various stability criteria using modular arithmetic (algebraic
calculations modulo a given polynomial). Besides this, the fundamental
connection between modules and coprime factorizations, and in particular the
so-called polynomial models which provide the link between the two, is surveyed.
The paper concludes with an account of the circle of ideas around the partial
realization problem, the Euclidean algorithm, continued fractions, canonical
forms, etc. In the second paper Hautus and Heymann use the module
framework to obtain a unifying treatment of both realizations and feedback.
The application of module-theoretic ideas to the treatment of zeros of
multi-input, multi-output systems is surveyed in Wyman's contribution, while
Hammer shows how the module framework can be used for the extraction of
structural invariants which are relevant in the study of dynamic compensation
problems. The last paper in this section (Kamen) discusses how more general
classes of linear systems, like delay systems, two-dimensional systems, etc., can
actually be interpreted as systems with coefficients in appropriate rings, as
opposed to fields.

5.2 Families of Systems. The global theory of families of linear systems is
summarized by Tannenbaum in the next essay. This research area has motivated
many mathematicians since it was introduced by Kalman in 1974. Its
connections with algebraic geometry and classical invariant theory are most
noteworthy.

5.3 Related Developments. Pearson's essay recalls the state of control system
design in the late 50s and argues that the introduction of the concepts of
controllability and observability opened the door to the discovery of the
parametrization of internally stabilizing controllers some 15 years later. This
parametrization builds the basis for all robust control approaches currently in
use. The paper by Baras shows that the operations with polynomials which are
at the heart of the algebraic approach to linear system theory can be
automated, and the numerical problems completely circumvented, using the
recent advances in error-free polynomial computation. The chapter concludes
with a survey of the area of linear discrete system stability by Mansour and
Jury. The investigations, which were motivated by the Kalman and Bertram
papers [21], [22], are centered around the discrete Schwarz matrix introduced
by Mansour, which today bears his name.

Chapter 6. Identification and Adaptive Control

The first paper, by Caines, shows that the theories of the Kalman Filter, the
realization problem and the study of families of systems (all discussed in previous
chapters) can be combined to study in detail and with considerable success the
linear stochastic system identification problem. The paper by Deistler and
Anderson, inspired by Kalman's recent research in the area of identification
from noisy data, presents new results on identification with errors-in-variables
models. The last paper (Åström) surveys how the ideas put forward by Kalman
in his 1958 paper mentioned earlier resulted in the popular and widely used
adaptive control scheme known as the self-tuning regulator. An interesting aspect
of the self-tuning regulator is that the recursive least-squares estimation
algorithm proposed in that paper has a structure similar to the later discovered
Kalman Filter.

Chapter 7. Generalizations to Nonlinear and Distributed Systems

The first two papers (Sontag, Fliess) discuss various attempts to generalize the
concept of controllability to the nonlinear case and the difficulties involved.
The former argues that for both continuous- and discrete-time nonlinear systems
the weaker concept of accessibility is the right one to use. In the latter, a
differential algebra approach is proposed to deal with nonlinear controllability.
The third paper (Ruberti and Isidori) shows how Kalman's canonical structure
theorem, i.e. the decomposition of every system into controllable and/or
observable parts, can be generalized for certain classes of nonlinear systems.
The last paper of this chapter (Lions) reviews the status of attempts to generalize
the notions of controllability and observability in linear and nonlinear
distributed systems.
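
Kalman's controllability rank condition, the linear starting point of this chapter, is easy to state computationally: the pair (A, B) is controllable exactly when the matrix [B, AB, ..., A^(n-1)B] has full rank n. The example pairs below are invented for illustration.

```python
import numpy as np

# Kalman's controllability rank condition for x' = Ax + Bu:
# (A, B) is controllable iff rank [B, AB, ..., A^(n-1)B] = n.
def is_controllable(A, B):
    n = A.shape[0]
    blocks = [np.linalg.matrix_power(A, k) @ B for k in range(n)]
    return np.linalg.matrix_rank(np.hstack(blocks)) == n

# A controllable chain of integrators...
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
print(is_controllable(A, B))     # prints True

# ...and an uncontrollable pair: the input never reaches the second mode.
A2 = np.diag([1.0, 2.0])
B2 = np.array([[1.0], [0.0]])
print(is_controllable(A2, B2))   # prints False
```

The nonlinear generalizations surveyed by Sontag and Fliess replace this rank test on powers of A by rank conditions on Lie brackets of vector fields, which is where the difficulties discussed in the chapter arise.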

Chapter 8. Influence in Mathematics

In the first paper, Ball, Gohberg and Rodman survey the use of realization and
state-space theory for the study of interpolation problems for rational matrix
functions from the purely mathematical point of view. A result on boundary
Nevanlinna-Pick interpolation is quoted. The connection between system
theory and operator theory is also mentioned. The next paper (Gohberg and
Kaashoek) continues along the same lines in order to study the solution of
singular integral equations.

Chapter 9. Applications

The first paper of the last chapter (Forney) details the close interaction and
mutual influence between algebraic coding and algebraic system theory. An
account is given of how the theory of convolutional codes was influenced in
the early '70s by algebraic system theory, and in particular by the Smith-
McMillan form of rational functions, which is a manifestation of the invariant
factor theorem in module theory. In turn, results in coding led to the
development of the theory of minimal bases of rational vector spaces, put
forward by Forney, which became an important tool in system theory. The next
contribution (Schumacher) surveys the influence of system-theoretic methods in
econometrics. Not surprisingly, the Kalman Filter plays a prominent role. Most of
the paper, however, is dedicated to a discussion of the topic known in econometric
circles as cointegration. In system-theoretic terms, it can be cast under the heading
of realization or modelling. The last contribution is concerned with learning.
The author (Brockett) attempts to establish a realization theory for learning
systems, arguing that the main ingredient of such a theory lies in an appropriate
union of continuous and combinatorial analysis. At this point I would like to pay
tribute to A. Lindenmayer, whose projected contribution on system theory and
biology would have been part of this chapter. But fate dictated otherwise.

It is my pleasure to dedicate this volume to Professor R.E. Kalman with
respect and admiration. I would like to close this introduction with some
personal reminiscences.

I first met R.E. Kalman in Zürich in May 1975. At the time, I was still an
undergraduate student in the department of mathematics and physics at the
ETH, working on my diploma thesis in theoretical physics, an area which had
fascinated me since my high school years. It was my intention to pursue a
doctorate degree in this research direction. Having been nurtured and raised
in a classical European academic environment, I was led to regard theoretical
physics, with the necessary background in mathematics, as the branch of science
offering the ultimate intellectual challenge. Retrospectively, one can safely claim
that this picture in which I was brought up was one-sided. Physics being by
and large a descriptive science, there was no room in it for the prescriptive
sciences, the sciences of the artificial which are a dominant force in the scientific
scene today.

In the summer semester of 1975, I attended Kalman's first course after he
was appointed to the newly created chair of mathematical system theory at the
ETH. At the end of this course I was impressed by what the system-related
sciences had to offer in terms of intellectual challenge, motivation and purpose.
I was equally impressed by the lecturer himself. I have since dedicated myself
to the cause of better understanding and advancing system theory.

Now, some 15 years later, I am still as excited about system theory as I was
then. In great part, this enthusiasm was instilled in me by my Doktorvater,
R.E. Kalman. I recall the numerous seminars he organized in system theory,
usually on topics remote from my own thesis topic, and the ensuing discussions.
His penetrating, sometimes provocative, always insightful and critical comments
to the heart of the problem constituted my main guide in my quest for
intellectual maturity, for learning to distinguish the relevant from the irrelevant,
the problem from the non-problem. Kalman was a διδάσκαλος (a teacher) in the
ancient Greek meaning of the word.

Acknowledgements

I would like to thank J.L. Massey of the ETH Zürich for extensive discussions
in the early stages of this project, as well as for recommending this Festschrift
for publication to Springer. I am also indebted to J.C. Willems of the University
of Groningen, for being a constant source of help and counsel since the beginning
of this undertaking. I would also like to thank J.B. Pearson of Rice University,
for his feedback during the later stages of the project.

In collaboration with the Centro Matematico V. Volterra of the University
of Rome, Tor Vergata, a symposium was organized on the occasion of Kalman's
60th birthday, May 17-19, 1990. During this meeting, which took place in
Villa Mondragone, Frascati, most of the contributions to this Festschrift were
presented. The help of L. Accardi with regard to the financial and local
arrangements is gratefully acknowledged.

March 1990

A.C. Antoulas
Rice University, Houston

List of Technical Publications of R. E. Kalman*


[1] Discussion of paper by A.R. Bergen an J.R. Ragazzini, Trans AIEE (Applications and
Industry), 73 11 (1954) 245-246
[2] Phase-plane analysis of automatie control systems with nonlinear gain elements, Trans AIEE
(Applications and Industry), 73 11 (1954) 383-390
[3] Discussion of paper by H. Chestnut, Trans ASME, 76 (1954) 1362
[4] A critical survey of analysis and design methods for sampled-data control systems,

Servomechanisms Laboratory, internal report, MIT, 1954, 37 pp

[5] Phase-plane analysis of nonlinear sampled-data servomechanisms, M.S. Thesis, Dept. of

Electrical Engineering, MIT. (Servomechanisms Laboratory, MIT, internal report, May 1954,
68 pp)
[6] Analysis and design principles of second and higher-order saturating servomechanisms, Trans
AIEE (Applications and Industry), 74 11 (1955) 294-310
ArticIe reprinted in Optimal and Self-optimizing Control, edited by Rufus Oldenburger, MIT
Press, 1966, pp 102-118. (LC Card No. 66-21356)
[7] Nonlinear aspects of sampled-data control systems, in Proceedings of Second Symposium
on Nonlinear Circuit Analysis, edited by J. Fox, Polytechnic Institute of Brooklyn 1956,
pp 273-313. (LC Card No. 55-3575)
[8] Physieal and mathematical mechanisms of instability in nonlinear automatie control systems,

Trans ASME, 79 (1957) 553-566


[9] Discussion of paper by P. Sarachik and J.R. Ragazzini, Trans AIEE (Applications and
Industry), 76 11 (1957) 60
[10] Optimal non linear control of saturating systems by intermittent action, IRE Wescon
Convention Record, 1957, Voll, part 4, pp 130-135. [Also Columbia University, Electronics
Research Laboratories Final Report F/127, Vol 3, 21 pp]
[11] Analysis and synthesis of linear systems operating on randomly sampled data, Doctoral
dissertation, Dept. of Electrical Engineering, Columbia University, 1957, 149 pp
[12] Design of a self-optimizing control system, Trans ASME, 80 (1958) 468-478
[13] (with R.W. Koepcke) Optimal synthesis of linear sampling control systems using generalized
performance indexes, Trans ASME, 80 (1958) 1820-1826
[14] (with J.E. Bertram) General synthesis procedure for computer control of single and multi-loop
linear systems, Trans AIEE (Applications and Industry), 77 II (1958) 602-609
[15] Sampled-data control, in Handbook of automation, computation and control, edited by
E.M. Grabbe, S. Ramo, and D.E. Wooldridge, Wiley, 1958, Chapter 12, pp 12-01 to 12-09
[16] (with R.W. Koepcke) The role of digital computers in the dynamic optimization of chemical
reactors, Proceedings Western Joint Computer Conference, 1959, pp 107-116
[17] (with L. Lapidus and E. Shapiro) On the optimal control of chemical processes, in Proceedings
Joint Symposium on Instrumentation and Computation in Process Development and Plant
Design, London, 1959, pp 6-17
[18] (with J.E. Bertram) A unified approach to the theory of sampling systems, J. Franklin Institute,
267 (1959) 405-436
[19] A new approach to linear filtering and prediction problems, Trans ASME (J. Basic
Engineering), 82D (1960) 35-45
Article reprinted in Linear Least Squares Estimation, edited by T. Kailath, Dowden,
Hutchinson, and Ross, 1977, pp 254-264. (LC Card No. 77-7465)
Article reprinted in Kalman Filtering: Theory and Application, edited by H.W. Sorenson,
IEEE Press, 1985, pp 16-26. (LC Card No. 85-14253)
[20] (with L. Lapidus and E. Shapiro) Mathematics is the key, Chemical Engineering Progress,
56, 1960, No. 2, pp 55-61
[21] (with J.E. Bertram) Control system analysis and design via the 'second method' of Lyapunov.
I. Continuous-time systems, Trans ASME (J. Basic Engineering), 82 D (1960) 371-393

* All re-published (and possibly translated) versions of an article or book are listed together, under
the same number.


[22] (with J.E. Bertram) Control system analysis and design via the 'second method' of Lyapunov.
II. Discrete-time systems, Trans ASME (J. Basic Engineering), 82 D (1960) 394-399
Above two articles reprinted in Nonlinear Systems: Stability Analysis, edited by J.K. Aggarwal
and M. Vidyasagar, Dowden, Hutchinson, and Ross, 1977, pp 58-87. (LC Card No.
76-15382)
[23] Contributions to the theory of optimal control, Boletín de la Sociedad Matemática Mexicana,
5 (1960) 102-119
Article reprinted in Simposium Internacional de Ecuaciones Diferenciales Ordinarias,
University of Mexico, September 1959, pp 102-119
Author's reply to discussion, IEEE Trans on Automatic Control, AC-17 (1972) 179-180
Author's reflective comments on article published as a Citation Classic in Current Contents,
PC & ES, no. 32, August 6, 1979, p 14
[24] On the general theory of control systems, in Proceedings first IFAC Congress on Automatic
Control, Moscow, 1960; Butterworths, London, 1961, Vol 1, pp 481-492. [Also, Russian
translation, IFAC preprint, 29 pp]
[25] (with R.S. Bucy) New results in linear filtering and prediction theory, Trans ASME (J. Basic
Engineering), 83 D (1961) 95-108
Article reprinted in Random Processes, Part I: Multiplicity Theory and Canonical
Decompositions, edited by A. Ephremides and J.B. Thomas, Dowden, Hutchinson, and
Ross, 1973, pp 181-194. (LC Card No. 75-96190)
Article reprinted in Kalman Filtering: Theory and Application, edited by H.W. Sorenson,
IEEE Press, 1985, pp 34-47. (LC Card No. 85-14253)
[26] Lectures on the calculus of variations and optimal control, Aerospace Corporation, internal
lectures, August 7-18, 1961. [Typed manuscript, 35 pp, not published.]
[27] New methods and results in linear prediction and filtering theory, RIAS Technical Report
61-1, February 1961, 135 pp
Report almost completely reprinted as New methods in Wiener filtering theory, in Proceedings
First Symposium on Engineering Applications of Random Function Theory and Probability,
edited by J. Bogdanoff and F. Kozin, Wiley, 1963, pp 270-388. (LC Card No. 63-1803)
Report reprinted in ASD Technical Report 61-27, Appendix, pp 109-268
[28] (with T.S. Englar and R.S. Bucy) Fundamental study of adaptive control systems,
ASD-TR-61-27, 1961, 300 pp. [Contains full text of [25] and [27], with connecting
narrative and examples.]
[29] Control of randomly varying linear dynamical systems, in Proceedings of Symposia on Applied
Mathematics, American Mathematical Society, Vol 13, 1962, pp 287-298. (LC Card No.
50-1183)
[30] Discussion of paper by L. Markus and E.B. Lee, Trans ASME (J. Basic Engineering), 84 D
(1962) 9-10
[31] Canonical structure of linear dynamical systems, Proc. National Academy of Sciences (USA),
48 (1962) 596-600
[32] The variational principle of adaptation: filters for curve fitting, presented at IFAC Symposium
on Adaptive Systems, April 1962, Rome. [Unpublished. Complete manuscript available, 15
pp]
[33] On the stability of linear time-varying systems, Trans IEEE on Circuit Theory, CT-9 (1962)
420-422. Discussion, ibid., CT-10 (1963) 540-542
[34] (with Y.C. Ho and K.S. Narendra) Controllability of linear dynamical systems, Contributions
to Differential Equations, Vol 1 (1963) 189-213
[35] Mathematical description of linear dynamical systems, SIAM J. Control, 1 (1963) 152-192
[36] The theory of optimal control and the calculus of variations, in Mathematical optimization
techniques, edited by R. Bellman, University of California Press, 1963, chapter 16, pp
309-331. (LC Card No. 63-12816)
[37] First-order implications of the calculus of variations in guidance and control, Proc. Optimum
Systems Synthesis Conference, Technical Report ASD-TDR-63-119 (Flight Control
Laboratory, Wright-Patterson Air Force Base, Ohio), February 1963, pp 365-371
[38] Lyapunov functions for the problem of Lur'e in automatic control, Proc. National Academy
of Sciences (USA), 49 (1963) 201-205
Article reprinted in Nonlinear Systems: Stability Analysis, edited by J.K. Aggarwal and M.
Vidyasagar, Dowden, Hutchinson, and Ross, 1977, pp 201-205. (LC Card No. 76-15382)
[39] On a new characterization of linear passive systems, in Proc. 1st Allerton Conference, 1963,
pp 456-470. (Also RIAS Technical Report 64-7, April 1964)


[40] (with G. Szegö) Sur la stabilité absolue d'un système d'équations aux différences finies,
Comptes rendus (Paris), 257 (1963) 388-390
[41] When is a linear control system optimal?, Trans. ASME (J. Basic Engineering), 86 D (1964)
51-60
Article reprinted in Frequency Response Methods, edited by A.J.C. MacFarlane, IEEE Press,
1979, pp 71-80. (LC Card No. 79-90572)
[42] On canonical realizations, Proc. 2nd Allerton Conference, 1964, pp 32-41
Article reprinted in Arch. Automatyki i Telemechaniki (Warsaw), 10 (1965) 3-10
[43] Toward a theory of computation in optimal control, in Proc. IBM Symposium on Scientific
Computation, October 1964, pp 25-42. (LC Card No. 66-19007)
[44] (with L. Weiss) Contributions to linear system theory, International J. Engineering Science,
3 (1965) 141-171
[45] On the Hermite-Fujiwara theorem in stability theory, Q. Applied Mathematics, 23 (1965)
279-282
[46] Algebraic structure of linear dynamical systems, I. The module of Σ, Proc. National Academy
of Sciences (USA), 54 (1965) 1503-1508
[47] Irreducible realizations and the degree of a rational matrix, SIAM J., 13 (1965) 520-544
[48] Linear stochastic filtering theory-reappraisal and outlook, Proceedings Symposium on System
Theory, edited by J. Fox, Polytechnic Institute of Brooklyn, 1965, pp 197-205. (LC Card
No. 65-28522)
[49] (with B.L. Ho) Effective construction of linear state-variable models from input/output data,
in Proceedings 3rd Allerton Conference, 1965, pp 449-459
Article reprinted in Regelungstechnik, 14 (1966) 545-548
[50] Algebraic theory of linear systems, in Proceedings 3rd Allerton Conference, 1965, pp
563-577
Article reprinted in Arch. Automatyki i Telemechaniki (Warsaw), 11 (1966) 119-129
[51] On structural properties of linear constant, multivariable systems, in Proc. 3rd IFAC
Congress, London, 1966
[52] (with T. Englar) A user's manual for the automatic synthesis program (Program C), NASA
Contractor Report CR 475, June 1966, 526 pp
[53] The Riccati equation, chapter 7 of above reference
[54] (with B.D.O. Anderson, R.W. Newcomb, and D.C. Youla) Equivalence of linear time-invariant
dynamical systems, J. Franklin Institute, 281 (1966) 371-378
[55] (with B.L. Ho) Spectral factorization using the Riccati equation, in Proc. 4th Allerton
Conference, 1966. [Also Aerospace Report No. TR-I00l (2307)-1]
[56] Algebraic aspects of the theory of dynamical systems, in Differential Equations and Dynamical
Systems, edited by J.K. Hale and J.P. LaSalle, Academic Press, 1967, pp 133-146
[57] New developments in systems theory relevant to biology, in Proceedings III Systems
Symposium, Case Institute of Technology, 1966; published as Systems Theory and Biology,
edited by M.D. Mesarovic, Springer, 1968, pp 222-232. (LC Card No. 68-21813)
[58] Realization theory for non-constant linear systems, January 1968, finished manuscript, 79
pp. Intended as chapter 12 for item [64] but not included
[59] On the mathematics of model building, in Proc. Summer School on Neural Networks,
Ravello, 1967; published as Neural Networks, edited by E.R. Caianiello, Springer, 1968, pp
170-177. (LC Card No. 68-8783).
[60] (in Russian) Raspoznavanie obrazov polilineinymi mashinami, in Proc IFAC Conference on
Adaptive Systems, Erevan, USSR, September 1968, pp 7-30, Izdatel'stvo Nauka,
Moskva, 1971
Article republished in revised and annotated English translation as Pattern recognition
properties of multilinear response functions, I-II, Control and Cybernetics, 8 (1979) 331-361
[61] Introduction to the algebraic theory of linear dynamical systems, in Proc. International
Summer School on Mathematical Systems Theory, Varenna, 1967; published as Mathematical
Systems Theory and Economics, edited by H.W. Kuhn and G.P. Szegö, Springer Lecture
Notes in Operations Research and Mathematical Economics, Vol 11, 1969, pp 41-65.
(LC Card No. 70-81409)
[62] Lectures on controllability and observability, in Proc. C.I.M.E. Summer School at Pontecchio
Marconi, Bologna, July 1968; published as Controllability and Observability, Edizioni
Cremonese, Roma, 1969, pp 1-149
[63] Algebraic characterization of polynomials whose zeros lie in certain algebraic domains, in
Proc National Academy of Sciences (USA), 64 (1969) 818-823


[64] (with P.L. Falb and M.A. Arbib) Topics in Mathematical System Theory, McGraw-Hill,
1969, 358 pp (LC Card No. 68-31662)
Russian translation as Ocherki po matematicheskoi teorii sistem, Izdatel'stvo "MIR", Moskva,
1971, 400 pp
Romanian translation as Teoria sistemelor dinamice, Editura Tehnica, Bucuresti, 1975, 326
pp
[65] Some computational problems and methods related to invariant factors and control theory,
in Proc of Conference on Computational Problems in Abstract Algebra, edited by John
Leach, Oxford, 1967, Pergamon Press, 1969. (LC Card No. 75-84072)
[66] New algebraic methods in stability theory, Proc. 5th International Congress on Nonlinear
Oscillations, Kiev, 1969; published in Izdanie Instituta Matematiki Akademia Nauk USSR,
Kiev, 1970, Vol 2, pp 189-199
[67] (edited with N. DeClaris) Aspects of Network and Systems Theory (a collection of papers in
honor of E.A. Guillemin), Holt, Rinehart, and Winston, 1971, 648 pp (LC Card No.
77-115455)
[68] On minimal partial realizations of a linear input/output map, in Aspects of Network and
System Theory (a collection of papers in honor of E.A. Guillemin), edited by R.E. Kalman
and N. DeClaris, Holt, Rinehart, and Winston, 1971, pp 385-408. (LC Card No.
77-115455)
[69] (with M.L.J. Hautus) Realization of continuous-time linear dynamical systems: Rigorous theory
in the style of Schwartz, in Proc 1971 NRL-MRC Conference on Ordinary Differential
Equations, edited by L. Weiss, Academic Press, 1971, pp 151-164. (LC Card No.
77-187234)
[70] Kronecker invariants and feedback, in Proc. 1971 NRL-MRC Conference on Ordinary
Differential Equations, edited by L. Weiss, Academic Press, 1972, pp 459-471. (LC Card
No. 77-187234)
[71] (with Y. Rouchaleau and B.F. Wyman) Algebraic structure of linear dynamical systems. III.
Realization theory over a commutative ring, Proc. National Academy of Sciences (USA), 69
(1972) 3404-3406
[72] Remarks on mathematical brain models, in Biogenesis, Evolution, Homeostasis, edited by A.
Locker, Springer, 1973, pp 173-179. (LC Card No. 72-96743)
[73] (with Y. Rouchaleau) Realization theory of linear systems over a commutative ring, in
Automata Theory, Languages, and Programming, edited by M. Nivat, North Holland, 1973,
pp 61-65. (LC Card No. 72-93493)
[74] Filtraggio statistico nella tecnologia spaziale, in Scienza & Tecnica 73, Arnoldo Mondadori,
Milano, 1973, pp 403-408
[75] Algebraic-geometric description of the class of linear systems of constant dimension, Proc
8th Annual Princeton Conference on Information Sciences and Systems, 1974, pp 189-191
[76] Comments on the scientific aspects of modeling, in Towards a Plan of Actions for Mankind,
edited by M. Marois, North Holland, 1974, pp 493-505. (LC Card No. 75-319415)
[77] Optimization, mathematical theory of, IV: Control theory, Encyclopaedia Britannica, 15th
Edition, 1974, Macropaedia, Vol 13 (Newman to Peisistratus), pp 634-638
[78] (with Michiel Hazewinkel) Moduli and canonical forms for linear dynamical systems, Report
7504/M, Erasmus Universiteit Rotterdam, April 1974, 30 pp
[79] Algebraic aspects of the generalized inverse, in Generalized Inverses and Applications, edited
by M. Zuhair Nashed, Academic Press, 1976, pp 111-124. (LC Card No. 76-4938)
[80] Realization theory of linear dynamical systems, in Control Theory and Functional Analysis,
Vol II, International Atomic Energy Agency, Vienna, 1976, pp 235-256
[81] (with Michiel Hazewinkel) On invariants, canonical forms and moduli for linear, constant,
finite dimensional, dynamical systems, in Mathematical System Theory, edited by
G. Marchesini and S.K. Mitter, Springer Lecture Notes in Economics and Mathematical
Systems, 1976, pp 48-60
[82] A retrospective after twenty years: from the pure to the applied, in Applications of Kalman
Filter to Hydrology, Hydraulics and Water Resources, edited by Chao-lin Chiu, Dept. of Civil
Engineering, University of Pittsburgh, 1978, pp 31-54. (LC Card No. 78-069752)
[83] Nonlinear realization theory, in Transactions of the Twenty-Fourth Conference of Army
Mathematicians, US Army Research Office, Triangle Park, NC, May 1978, pp 259-269
[84] (with A. Lindenmayer) DOL-realization of the growth of multicellular organisms (extended
abstract), Proc 4th International Symposium on the Mathematical Theory of Networks and
Systems, Delft, July 1979


[85] On partial realizations, transfer functions, and canonical forms, Acta Polytechnica
Scandinavica, Mathematics and Computer Sciences Series No. 31, 1979, pp 9-32
[86] A system-theoretic critique of dynamic economic models, in Global and Large-scale System
Models, edited by B. Lazarevic, Springer, 1979, pp 1-24. (LC Card No. 81-461283)
[87] Theory of modeling, Proceedings of the IBM System Science Symposium, Oiso, Japan,
October 1979, pp 53-69
[88] System-theoretic critique of dynamic economic models, Int. J. Policy Analysis and Information
Systems, 3 (1980) 3-22
[89] Mathematical system theory: the new Queen?, Texas Tech. University Mathematics Series,
No. 13, 1981, American Mathematical Heritage: Algebra and Applied Mathematics,
pp 121-127
[90] Dynamic econometric models: a system-theoretic critique, in New Quantitative Techniques
for Economic Analysis, edited by G.P. Szegö, Academic Press, New York, 1982, pp 19-28
[91] Identifiability and problems of model selection in econometrics, in Advances in Econometrics,
edited by W. Hildebrand, Cambridge University Press, 1982, pp 169-207. (LC Card No.
81-18171)
Identifikálhatóság és a modellválasztás problémái az ökonometriában (Hungarian translation
of the preceding), Szigma, 15 (1982) 87-119
[92] On the computation of the reachable/observable canonical form, SIAM J. Control and
Optimization, 20 (1982) 258-260
[93] Realization of covariance sequences, in Toeplitz Centennial, edited by I. Gohberg, Birkhäuser,
1982, pp 135-164. (LC Card No. 82-1319)
[94] System identification from noisy data, in Dynamical Systems II, edited by A.R. Bednarek
and L. Cesari, Academic Press, 1982, pp 331-342. (LC Card No. 82-11476) (Proceedings of
a University of Florida International Symposium)
[95] Identifiability and modeling in econometrics, Developments in Statistics, edited by P.R.
Krishnaiah, Academic Press, 1982, Vol 4, pp 97-136. (LC Card No. 77-11215)
[96] Identification from real data, in Current Developments in the Interface: Economics,
Econometrics, Mathematics, edited by M. Hazewinkel and A.H.G. Rinnooy Kan, D. Reidel,
Dordrecht, 1982, pp 161-196. (LC Card No. 82-16694)
[97] We can do something about multicollinearity, Communications in Statistics, 13 (1984)
115-125
[98] Identification of noisy systems, Uspekhi Mat. Nauk, 40 (1985) 29-37. Russian Mathematical
Surveys
[99] Transcript of Kyoto Prize Lectures, November 10 & November 11, 1985
[100] (edited with G.I. Marchuk, A.E. Ruberti, and A.J. Viterbi) Recent Advances in Communication
and Control Theory, Optimization Software, Inc., 1987, 489 pp (LC Card No. 87-18604)
[101] The problem of prejudice in scientific modeling, in Recent Advances in Communication and
Control Theory, edited by R.E. Kalman, G.I. Marchuk, A.E. Ruberti, and A.J. Viterbi,
Optimization Software, Inc., 1987, pp 448-461. (LC Card No. 87-18604)
[102] Nine Lectures on Identification (book), Springer, Lecture Notes in Economics, to appear
[103] Prolegomena to a theory of modeling, to appear in International J. of Mathematical Modeling
[104] A theory for the identification of linear relations, to appear in Lions Festschrift, edited by
H. Brezis and P.G. Ciarlet

Chapter 1

Axiomatic Framework

Dynamical Systems, Controllability, and Observability:
A Post-Modern Point of View

J. C. Willems
Mathematics Institute, University of Groningen, P.O. Box 800, NL-9700 AV Groningen,
The Netherlands

1 Introduction
I consider it a privilege to contribute the opening article to this Festschrift on
the occasion of the 60-th birthday of Rudolf Kalman.
The development of the field of System Theory as a scientific discipline owes
more to the vision and to the research work of Kalman than to that of any
other individual. True, control theoretic questions (and even a few answers)
date back all the way to the days of James Clerk Maxwell and to the pre-World
War II era when graphical algorithms for analyzing simple feedback schemes
developed by Bode, Nyquist, and others at Bell Laboratories were elevated to
the status of an Idea. True, the observation that biological systems interact with
their environment in an intelligent (feedback) fashion had led Wiener to coin
the term Cybernetics, but it proved hard to build a discipline on the shaky basis
of one single word, albeit a truly beautiful one at that. True, there was General
Systems Theory, but a few fuzzy ideas failed to provide the requisite variety
needed to take root as a basic interdisciplinary scientific endeavor.
These critical innuendos notwithstanding, one ought to give credit to
Cybernetics and General Systems Theory for realizing the need for a theory of
the artificial, for a framework for studying man-made systems, for a discipline
which addresses the problems of the prescriptive sciences. By their very nature,
Cybernetics and General Systems Theory profess an abstract purpose, and as
such they ran aground on their refusal to accept that abstract ideas
can be properly articulated only in the language of mathematics, the field which
provides a vocabulary of abstract notions and concepts, and a grammar for
unfolding deductions from these.
Simultaneously with all this, more substantial work was underway in electrical
engineering. Indeed, in the fifties, we witnessed the development of electrical
network analysis and synthesis, which, among other things, laid the foundation
of linear system theory. Unfortunately, the mainstream work in this area remained
physics-based, one of the perceived prerequisites being the need to capture the
restrictions imposed on the electrical circuit by the physical constraints of the
elements and the interconnections. Ironically, it was the invention of solid state
electronic devices which all but eliminated these physical constraints as an

18

J. C. Willems

important consideration for what can and what cannot be achieved, for what
can and what cannot be designed.
Another parallel development was communication theory. This area, more
specifically information theory, is a perfect example of a discipline which combines
a solid mathematical foundation, a potential and a penchant for The Big Idea,
and, through the coding algorithms, an immediate technological relevance.
Regrettably, the scientific impact of this area remained isolated both in time and
in place, and nowadays communication theory is a highly successful and active
area of research within electrical engineering but with limited influence outside
its immediate environment.
Electrical network theory and communication theory had indeed the
potential as a breeding ground for the growth of theoretical engineering.
Unfortunately, this promise remained largely unfulfilled and it was to be another
area of electrical engineering which was destined to combine many of these
ideas with the notion of feedback and the emerging focus on optimization as
a centripetal principle in design. This area was automatic control which until
the mid-fifties had remained a field thriving on a rather narrow intellectual
basis. In a sense it consisted of little more than a good understanding of
the notion of a scalar transfer function and a few ad-hoc algorithms for the
design of single-loop 3-term PID-controllers. The group around Lefschetz and
LaSalle at RIAS (the Research Institute for Advanced Studies, a scientific
research group of the Martin Company), and in particular, the vision of Kalman,
played a crucial role in giving automatic control the momentum required for
taking this field over the threshold.
In the late fifties control theory was the scene of a number of important
happenings. Firstly, we saw the development of the maximum principle, a subtle
set of necessary conditions for the optimality of an open loop control policy.
Secondly, there was the popularization of dynamic programming, which laid the
basis for a flexible view of feedback control for dynamical systems in the presence
of uncertainty. Thirdly, and most importantly, we saw the appearance of Kalman
filtering which provided a mathematical theory for recursive estimation and
prediction of an unknown time-function on the basis of another, observed one.
Kalman filtering, together with its dual, the linear-quadratic problem,
combined a number of catalytic features necessary for a successful development
in applied mathematics. Based on a convincing problem formulation, it obtained
its solution in a convenient recursive form requiring the off-line solution of a
Riccati (differential) equation. It provided a beautiful algorithm suitable for
almost immediate computer implementation. Further, the infinite-time version
required the exploitation of the combined properties of observability and
controllability-very compelling concepts in their own right. Moreover, the
analysis of the Riccati equation, certainly in the infinite-time case, provided an
example of a puzzle of the type which appears to be an absolute requirement
for a thriving activity in normal science.
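The recursive structure and the Riccati recursion described above can be sketched in a few lines. The scalar model below (a random walk observed in noise, with made-up variances q and r) is a hypothetical illustration, not an example from the text:

```python
def kalman_step(x, P, y, a, c, q, r):
    """One predict/update cycle of a scalar discrete-time Kalman filter."""
    # Predict: propagate the estimate and its error variance through the model.
    x_pred = a * x
    P_pred = a * P * a + q
    # Update: blend the prediction with the new measurement y.
    S = c * P_pred * c + r          # innovation variance
    K = P_pred * c / S              # Kalman gain
    x_new = x_pred + K * (y - c * x_pred)
    P_new = (1 - K * c) * P_pred    # Riccati-type variance recursion
    return x_new, P_new

# Hypothetical random walk (a = c = 1) observed in noise.
x, P = 0.0, 1.0
for y in (0.9, 1.1, 1.0, 0.95):
    x, P = kalman_step(x, P, y, a=1.0, c=1.0, q=0.01, r=1.0)
```

Note that the recursion for the variance P never touches the data y, which is why, as remarked above, the Riccati equation can be solved off-line.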
If one studies the literature of this era, one is struck by the breadth of ideas
put forward by Kalman at that time. He perceived long before it was to become


common knowledge the possibility of combining the separation of state
observers with state feedback controllers on the one hand, and of on-line
identification with feedback control on the other hand, in order to obtain
sophisticated adaptive control schemes with a very appealing cybernetic
structure. He realized that the novel common feature in Pontryagin's maximum
principle, in Bellman's dynamic programming, and in the Kalman filtering
algorithm lies in the use of state models, and he was soon to initiate the
construction of such models as an independent topic, which he termed realization theory.
Last but not least, he immediately saw the potential of this circle of ideas as
providing a setting for a theory for studying systems in interaction with their
environment. By combining problems from communication theory, the tradition
of circuit theory, ideas from feedback control, algorithms for modelling on the
basis of observations, and, finally, schemes for adaptive decision making under
uncertainty, system theory was able to become a discipline harboring the
prescriptive sciences or, if you prefer to view them that way, the non-physics-based applied mathematics, let's say cybernetics in the best sense of the
term.
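The idea of realization theory, mentioned above, is to construct a state model that reproduces observed input/output data. A minimal sketch of the idea, assuming (purely for illustration) a scalar system of state dimension one, so that the Markov parameters satisfy h_k = c · a^k · b; this is not Kalman's general construction, which works from a Hankel matrix built out of the Markov parameters:

```python
def realize_first_order(markov):
    """Realize a state model (a, b, c) from Markov parameters h_0, h_1, ...,
    assuming (illustratively) state dimension 1, so that h_k = c * a**k * b
    and the Hankel matrix of the sequence has rank 1."""
    h0, h1 = markov[0], markov[1]
    a = h1 / h0          # ratio of successive Markov parameters
    b, c = h0, 1.0       # one of many equivalent factorizations of h0 = c * b
    return a, b, c

def markov_parameters(a, b, c, n):
    """Impulse response h_k = c * a**k * b of the realized model."""
    return [c * a**k * b for k in range(n)]

# Hypothetical impulse-response data 1, 1/2, 1/4, ... (geometric).
data = [1.0, 0.5, 0.25, 0.125]
a, b, c = realize_first_order(data)
reconstructed = markov_parameters(a, b, c, 4)
```

For this geometric data the sketch recovers a = 1/2 and reproduces the sequence exactly; data whose Hankel matrix has higher rank would require a higher-dimensional state.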
In this essay I will describe the framework for studying dynamical systems
which I have been developing during the last decade. This framework builds
on Kalman's work in that it views systems as a matter of principle in interaction
with their environment. In this context the notions of controllability and
observability play an essential role and become more natural and greatly
generalized. An important part in our framework is also set aside for the concept
of state, the cause célèbre in Kalman's seminal work. However, because of space
limitations we will in this article concentrate on the general framework and on
the notions of controllability and observability.

2 Mathematical Models
The language which we developed as a mathematical vocabulary for modelling
is based on a conceptual triptych consisting of the behavior, behavioral
equations, and latent variables. We view a mathematical model as an exclusion
law: it states that certain outcomes of a phenomenon are forbidden, are declared
impossible, while others are declared as being (in principle) possible. Thus we
define a mathematical model as a pair M = (𝕌, 𝔅) with 𝕌 a set called the
universum, and 𝔅 ⊆ 𝕌 the behavior of the model. In most applications, the
behavior will be specified as the solution set of a system of equations. We will
call these behavioral equations. Formalizing, we have two maps f₁, f₂ from the
universum 𝕌 into a space 𝔼, called the equating space, and the behavior is
defined through the equations f₁(u) = f₂(u) by 𝔅 = {u ∈ 𝕌 | f₁(u) = f₂(u)}. Clearly
f₁, f₂ define 𝔅 but the converse is obviously not true. Thus in mathematical
modelling, equations should be considered as a means to an end. Per se, they
are not the essence in a modelling exercise.
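The view of a model as an exclusion law can be made concrete in a few lines of code. The universum and the behavioral equation below (u1 + u2 = 3 on a small grid) are made-up illustrations, not examples from the text:

```python
# A mathematical model as an exclusion law: the universum lists the a priori
# conceivable outcomes, the behavior is the subset the model declares possible.
universum = [(u1, u2) for u1 in range(4) for u2 in range(4)]

# Behavioral equation f1(u) = f2(u); here (hypothetically) u1 + u2 = 3.
f1 = lambda u: u[0] + u[1]
f2 = lambda u: 3

# The behavior is the solution set of the behavioral equation.
behavior = [u for u in universum if f1(u) == f2(u)]

# Different equations can carve out the same behavior (e.g. 2*u1 + 2*u2 = 6),
# illustrating that equations are a means to an end, not the essence of the model.
g1 = lambda u: 2 * u[0] + 2 * u[1]
g2 = lambda u: 6
same_behavior = [u for u in universum if g1(u) == g2(u)]
```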


When models are deduced from first principles, it will invariably prove
convenient to introduce auxiliary variables. We will call these variables latent
variables and, in order to provide contrast, we will call the elements of the
universum 𝕌 manifest variables. Thus a latent variable model is a triple
Mf = (𝕌, 𝕃, 𝔅f) with 𝕌 the universum of manifest variables, 𝕃 the universum
of latent variables, and 𝔅f the full behavior. The latent variable model Mf
induces the manifest model M = (𝕌, 𝔅) with 𝔅 = {u ∈ 𝕌 | ∃ l ∈ 𝕃 such that
(u, l) ∈ 𝔅f}, the manifest behavior.
In the case of dynamical systems, the universum will consist of time-functions,
maps from the time-axis 𝕋 ⊆ ℝ into the signal space W, and the behavior 𝔅
consists of the family of W-valued time-trajectories which are compatible with
the laws of the dynamical system. Formally, a dynamical system Σ is a triple
(𝕋, W, 𝔅) with 𝕋 ⊆ ℝ the time-axis, W the signal space, and 𝔅 ⊆ W^𝕋 the
behavior. As with general models, dynamical systems will usually be
described by behavioral equations. Often, these will take the form of difference
or differential equations. As with general models, dynamical systems will also
often be described through latent variables, yielding a latent variable
dynamical system Σf = (𝕋, W, 𝕃, 𝔅f) with 𝕋 ⊆ ℝ the time axis, W the signal
space of manifest variables, 𝕃 the latent variable space, and 𝔅f ⊆ (W × 𝕃)^𝕋 the
full behavior. Σf now induces a (manifest) dynamical system Σ = (𝕋, W, 𝔅) in
the obvious way.
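The passage from a latent variable model to its manifest model is just a projection, which the following toy sketch illustrates (the grids and the equation u = l·l are hypothetical choices, not from the text):

```python
# A latent variable model specifies a full behavior of (manifest, latent) pairs;
# the manifest model is obtained by projection: u is possible iff some latent
# value l makes (u, l) possible. All sets below are made-up illustrations.
manifest_universum = range(10)
latent_universum = range(4)

# Full behavior: the pairs (u, l) allowed by the (hypothetical) equation u = l*l.
full_behavior = [(u, l) for u in manifest_universum
                 for l in latent_universum if u == l * l]

# Manifest behavior: project the full behavior onto the manifest component.
manifest_behavior = sorted({u for (u, l) in full_behavior})
```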
For motivational material and more details on this setting for studying
dynamical systems, we refer the reader to [W1, W2, W3].
Example. Kepler's laws. If a definition is to show proper respect for and do
justice to history, Kepler's laws should provide the very first example of a
dynamical system. They do. Take 𝕋 = ℝ, W = ℝ³, and 𝔅 = {w: ℝ → ℝ³ | Kepler's
laws are satisfied}. Thus the behavior 𝔅 in this example consists of the planetary
motions which according to Kepler are possible, all trajectories mapping the
time axis ℝ into ℝ³ (the position space for the planets) which satisfy his three
famous laws. Since for a given map w: ℝ → ℝ³ one can unambiguously decide
whether or not it satisfies Kepler's laws, 𝔅 is indeed well-defined. Kepler's laws
form a beautiful example of a dynamical system in the sense of the above
definition, since it is one of the few instances in which 𝔅 is described explicitly,
and not indirectly in terms of behavioral equations. It took no lesser man than
Newton to think up appropriate behavioral equations for this dynamical system.

3 Linear Systems
A dynamical system Σ = (𝕋, W, 𝔅) is said to be linear if W is a vector space
(over a field 𝔽) and 𝔅 is a linear subspace of W^𝕋. Thus linear systems obey
the superposition principle in its very simplest form: {w₁(·), w₂(·) ∈ 𝔅; α ∈ 𝔽} ⇒
{αw₁(·) + w₂(·) ∈ 𝔅}.
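As a concrete check of this superposition principle, take for the behavior (purely illustratively) the scalar trajectories over five time steps satisfying w(t+1) = 2·w(t), i.e. the kernel of a linear difference operator:

```python
# Superposition in its simplest form: if w1 and w2 are trajectories in the
# behavior of a linear system, so is alpha*w1 + w2. Toy behavior (hypothetical):
# scalar trajectories over five time steps satisfying w(t+1) = 2*w(t).
def in_behavior(w):
    return all(w[t + 1] == 2 * w[t] for t in range(len(w) - 1))

w1 = [1, 2, 4, 8, 16]
w2 = [3, 6, 12, 24, 48]
alpha = 5
combo = [alpha * a + b for a, b in zip(w1, w2)]
```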


A dynamical system Σ = (𝕋, W, 𝔅) with 𝕋 = ℤ or ℝ is said to be
time-invariant if σ^t 𝔅 = 𝔅 for all t ∈ 𝕋 (σ^t denotes the backwards t-shift:
(σ^t f)(t′) := f(t′ + t)). If 𝕋 = ℤ then this condition is equivalent to σ𝔅 = 𝔅. The
analogue of this definition when the time axis 𝕋 is ℤ₊ or ℝ₊ requires σ^t 𝔅 ⊆ 𝔅
for all t ∈ 𝕋.
The notion of behavioral equations is immediately applicable (take UJ =
Wll') to dynamical systems. We will be particularly concerned with behavioral
difference equations. A behavioral difference equation representation of a discretetime dynamical system with time axis 1I' = 7L+ and signal space W is defined by a
non negative integer L (calIed the lag, or the order of the difference equation), an
abstract set JE (calIed the equating space), and two maps f1' f2: W L+1 - JE.
They define the behavior by ~={w:7L+_Wlf1o(O"LW,O"L-1W,,,.,O"w,w)=
f2o(O"L W,O"L-1W,,,.,O"w,w)}. This is easily generalized to the ca se 1I'=7L.
However, in this case it is logical to consider the equation to have both positive
and negative lags, yielding the behavioral equations:
f1 (O"L W, O"L-1 W, ... , 0"1 + 1W, O"IW) = f2 (O"L W, O"L-1 W, ... , 0"1+ 1W, O"IW)
We will call L -I the lag of this difference equation. It is clear that the system
(7L, W,~) obtained this way defines a time-invariant dynamical system.
Let us now consider the following system of behavioral difference equations

R_L w(t+L) + R_{L−1} w(t+L−1) + ⋯ + R_{l+1} w(t+l+1) + R_l w(t+l) = 0

where R_L, R_{L−1}, …, R_{l+1}, R_l ∈ ℝ^{g×q}. (Note that this is the system of difference equations obtained by taking, in the above, W = ℝ^q, 𝔼 = ℝ^g, and f₁, f₂ linear.) Any finite set of linear equations relating the attribute time-series w₁, w₂, …, w_q to a finite set of their time lags will lead to a set of equations which in matrix notation can be written in the above form. We will assume that L, l ∈ ℤ, L ≥ l (we find it convenient not to assume that l = 0, even though at this point this would constitute no real loss of generality). Now introduce the polynomial matrix R(s, s⁻¹) = R_L s^L + R_{L−1} s^{L−1} + ⋯ + R_{l+1} s^{l+1} + R_l s^l ∈ ℝ^{g×q}[s, s⁻¹]. The above system of difference equations may be written in shorthand in terms of this polynomial matrix as

R(σ, σ⁻¹)w = 0.    (AR)
We will call this an AutoRegressive (AR) system. Note that every polynomial matrix defines such a system, with the number of columns equal to the dimension of the signal space and the number of rows equal to the number of (scalar) autoregressive equations. It is logical to consider the signal space (ℝ^q), and hence its dimension (q), as fixed by the system. However, the number of equations (g) in (AR) is a variable: it will depend on the system and on its representation. We will therefore assume that in R(s, s⁻¹) the number of columns is fixed but that the number of rows is in principle free. Hence R(s, s⁻¹) will be considered to be an element of ℝ^{•×q}[s, s⁻¹], the set of polynomial matrices with q columns and any number of rows.
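As a concrete, entirely illustrative sketch of a scalar AR behavior (taking l = 0 and q = g = 1 for simplicity; the coefficients and trajectories below are our own toy choices, not taken from the text), the following Python fragment checks whether a finite window of a time-series is compatible with a law R_L w(t+L) + ⋯ + R₀ w(t) = 0:

```python
def satisfies_ar(R, w, tol=1e-9):
    """R = [R_0, ..., R_L]: coefficients of the scalar AR law
    R_L w(t+L) + ... + R_0 w(t) = 0.
    Returns True if every length-(L+1) window of w satisfies the law."""
    L = len(R) - 1
    return all(
        abs(sum(R[k] * w[t + k] for k in range(L + 1))) < tol
        for t in range(len(w) - L)
    )

# The Fibonacci recursion w(t+2) - w(t+1) - w(t) = 0 as an AR behavior:
fib = [0, 1, 1, 2, 3, 5, 8, 13, 21]
print(satisfies_ar([-1, -1, 1], fib))        # True: fib lies in the behavior
print(satisfies_ar([-1, -1, 1], [1, 2, 4]))  # False: not in the behavior
```

The behavior defined by this law is exactly the set of trajectories on which the check succeeds for every window.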

J. C. Willems

The behavioral difference equation (AR) describes a dynamical system Σ = (𝕋, W, 𝔅) with time axis 𝕋 = ℤ, signal space W = ℝ^q, and behavior 𝔅 = {w: ℤ → ℝ^q | R(σ, σ⁻¹)w = 0}. Note that if we consider R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹] as inducing the linear map R(σ, σ⁻¹): (ℝ^q)^ℤ → (ℝ^g)^ℤ, then 𝔅 = ker R(σ, σ⁻¹). Let us call a time-invariant system Σ = (ℤ, ℝ^q, 𝔅) complete if {w ∈ 𝔅} ⇔ {w|[t₀,t₁] ∈ 𝔅|[t₀,t₁] for all t₀, t₁ ∈ ℤ}. It is easy to see that the system Σ defined above is linear, time-invariant, and complete. In fact:

Theorem 1. Let Σ = (ℤ, ℝ^q, 𝔅) be a dynamical system. The following are equivalent:

(i) Σ is linear, time-invariant, and complete;
(ii) Σ is linear, time-invariant, and has finite memory span;
(iii) ∃ R(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹] such that 𝔅 = ker R(σ, σ⁻¹);
(iv) 𝔅 ∈ 𝔏^q (𝔏^q := the family of all linear shift-invariant closed subspaces of L^q; L^q := (ℝ^q)^ℤ equipped with the topology of pointwise convergence).

AR-systems of behavioral equations play a central role in signal processing, discrete-time control, econometrics, etc. Against this background it should be of considerable interest to identify, on the level of the behavior, what is really behind the assumption that a system can be described by AR-equations. The above theorem shows precisely what these conditions are: linearity, time-invariance, and completeness. We will refer to the above theorem as the AR-representation theorem.

In our framework a mathematical model is a very abstract object indeed. When we represent it by behavioral equations it becomes a bit more concrete. Often, however, it is possible to view a model as being induced by some specific parameters. Let 𝔐 be a set, for example a family of mathematical models. In this case each element M ∈ 𝔐 denotes a mathematical model (𝕌, 𝔅). A parametrization (𝔓, π) of 𝔐 consists of a set 𝔓 and a surjective map π: 𝔓 → 𝔐. The set 𝔓 is called the parameter space. We think of 𝔓 as a concrete space and of an element p ∈ 𝔓 as a parameter, for example a set of real or complex numbers, or vectors, or polynomials, or polynomial matrices. Typically p determines a behavioral equation and in this way induces a mathematical model. If 𝔓 is a set of matrices we speak of (𝔓, π) as a matrix parametrization of 𝔐. A similar nomenclature is used for polynomial parametrizations, polynomial matrix parametrizations, etc. When 𝔓 is an abstract space related to 𝔐 then we will also call (𝔓, π) a representation of 𝔐. Hence representation (abstract) is essentially synonymous with parametrization (concrete). For representations, think of a behavioral equation representation of a mathematical model, or of a latent variable representation, or of a state representation, etc. Note that we do not require π to be injective. Indeed, in a particular parametrization, there will usually be many parameters inducing the same model. If, however, π is bijective, then we speak about a trim parametrization.
For proofs and further definitions, we refer the reader to [W1, W2, W3].


Reformulated in this language, the AR-representation theorem reads:

Proposition. Let 𝔏^q denote the set of linear time-invariant complete systems (𝕋, W, 𝔅) with 𝕋 = ℤ and W = ℝ^q. Then (ℝ^{•×q}[s, s⁻¹], ker) with ker: R(s, s⁻¹) ↦ (ℤ, ℝ^q, ker R(σ, σ⁻¹)) defines a (polynomial matrix) parametrization of 𝔏^q.

The above proposition makes it evident why polynomial matrices play such an overwhelming role in linear system theory.

Let R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹]. We will call the system of AR-equations R(σ, σ⁻¹)w = 0 minimal (for simplicity we also call R minimal) if {R′(s, s⁻¹) ∈ ℝ^{g′×q}[s, s⁻¹], R′ ∼_AR R} ⇒ {g′ ≥ g}. Here R′ ∼_AR R means that ker R′(σ, σ⁻¹) = ker R(σ, σ⁻¹).
Proposition. Every Σ ∈ 𝔏^q admits a minimal AR-representation R(σ, σ⁻¹)w = 0. Moreover:

(i) {R is minimal} ⇔ {R(s, s⁻¹) ∈ ℝ_f^{•×q}[s, s⁻¹], i.e., R is of full row rank};
(ii) if R is minimal, then {R′(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹], R′ ∼_AR R, R′ minimal} ⇔ {R and R′ are unimodularly left-equivalent (that is, there exists a unimodular U(s, s⁻¹) such that R = UR′)}.

The above result identifies how all minimal AR-representations may be deduced from one: as the orbit of the transformation group R ↦ UR, where U ranges over the unimodular polynomial matrices with a suitable number of rows and columns. This group is the unimodular group acting on the left. Note that the above proposition implies in particular that ℝ_f^{•×q}[s, s⁻¹] (the full-row-rank polynomial matrices with q columns) is a canonical form, since each element of 𝔏^q admits a minimal AR-representation.
We use the notion of canonical form here in the following sense. Let (𝔓, π) be a parametrization of 𝔐. The map π: 𝔓 → 𝔐 induces the equivalence relation E on 𝔓 defined by {p₁ E p₂} :⇔ {π(p₁) = π(p₂)}. This equivalence relation leads to canonical forms and to invariants. A subset 𝔓_c ⊆ 𝔓 will be called a canonical form for the parametrization (𝔓, π) if 𝔓_c ∩ p(mod E) is non-empty for all p ∈ 𝔓, i.e., if π(𝔓_c) = 𝔐, i.e., if (𝔓_c, π) is itself a parametrization of 𝔐. It is called a trim canonical form if 𝔓_c ∩ p(mod E) consists of exactly one point for all p ∈ 𝔓, i.e., if π|𝔓_c: 𝔓_c → 𝔐 is a bijection, i.e., if (𝔓_c, π) is a trim parametrization of 𝔐.
All the notions related to dynamical systems introduced so far (linearity, time-invariance, completeness, etc.) are trivially generalized to systems with latent variables. Sometimes it is obvious that the manifest model inherits a property from the latent variable model (for example, linearity and time-invariance); sometimes it is less obvious (for example, completeness).

Let R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹], M(s, s⁻¹) ∈ ℝ^{g×d}[s, s⁻¹] and consider the system of behavioral difference equations

R(σ, σ⁻¹)w = M(σ, σ⁻¹)a    (ARMA)


relating the time-series w: ℤ → ℝ^q (expressing the evolution of the manifest variables) to the latent variable time-series a: ℤ → ℝ^d. We will call this an AutoRegressive-Moving-Average (ARMA) system. The term R(σ, σ⁻¹)w in (ARMA) is called the AutoRegressive part, while M(σ, σ⁻¹)a is called the Moving-Average part. Clearly these equations represent a dynamical system with latent variables (ℤ, ℝ^q, ℝ^d, 𝔅_f) with 𝔅_f = ker[R(σ, σ⁻¹) ⋮ −M(σ, σ⁻¹)]. From the AR-representation theorem we know that precisely every such system with 𝔅_f ∈ 𝔏^{q+d}, that is, every linear time-invariant complete latent variable dynamical system with 𝕋 = ℤ, W = ℝ^q, and 𝕃 = ℝ^d, can be described by an ARMA-system of behavioral equations.

An ARMA-system induces a manifest dynamical system (ℤ, ℝ^q, 𝔅) with 𝔅 = {w | ∃a such that (ARMA) is satisfied}; equivalently, 𝔅 = (R(σ, σ⁻¹))⁻¹ im M(σ, σ⁻¹) (with (·)⁻¹ the inverse image), R(σ, σ⁻¹): (ℝ^q)^ℤ → (ℝ^g)^ℤ, and M(σ, σ⁻¹): (ℝ^d)^ℤ → (ℝ^g)^ℤ. Clearly (ℤ, ℝ^q, 𝔅) is linear and time-invariant. The question arises whether it is also complete. The answer is in the affirmative:

Theorem. Let the dynamical system with latent variables Σ_L = (ℤ, ℝ^q, ℝ^d, 𝔅_f) be linear, time-invariant, and complete. Then the manifest dynamical system which it represents, Σ = (ℤ, ℝ^q, 𝔅), is also linear, time-invariant, and complete.

In terms of an ARMA-system of behavioral equations R(σ, σ⁻¹)w = M(σ, σ⁻¹)a, the above theorem states that the latent variables a can be completely eliminated, resulting in an AR-system of equations. That is, there will exist a polynomial matrix R′(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹] such that the AR-system of equations R′(σ, σ⁻¹)w = 0 represents the manifest behavior. In other words, this equation captures all the restrictions imposed on w by (ARMA). This elimination may result in an increase in the lag of the resulting AR-system as compared to the ARMA-system. However, the number of equations in the AR-system need never be larger than the number of equations of the original ARMA-system. We will refer to the above theorem, and to the resulting consequence for ARMA- and AR-equations, as the ARMA elimination theorem.
An especially important class of ARMA-systems are those in which R(s, s⁻¹) = I, yielding the Moving-Average system

w = M(σ, σ⁻¹)a.    (MA)

The intrinsic behavior 𝔅 of an MA-system equals im M(σ, σ⁻¹). Of course, 𝔅 ∈ 𝔏^q, and the elements of 𝔏^q which can be represented this way are precisely those subspaces of (ℝ^q)^ℤ which are images of polynomial operators in the shift. We have seen that every 𝔅 ∈ 𝔏^q allows an AR-representation, that is, that it is the kernel of a polynomial operator in the shift. Does it also allow an MA-representation? In other words, is it also the image of a polynomial operator in the shift? If not, does the restriction of having an MA-representation imply


some interesting system-theoretic property? It may come as a surprise that these abstract questions lead us to the concept of controllability!
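To make the image representation concrete, here is a small sketch (the polynomial operator and the latent trajectory are our own illustrative choices): with the backwards shift (σa)(t) = a(t+1), the scalar MA law w = (1 + σ)a maps every free latent time-series a to the manifest trajectory w(t) = a(t) + a(t+1), so the resulting behavior is the image of a polynomial operator in the shift:

```python
def ma_image(M, a):
    """Apply w = M(sigma) a, where (sigma a)(t) = a(t+1) and
    M = [M_0, ..., M_L] is a scalar polynomial in the shift sigma.
    Returns w on the window of times where it is defined."""
    L = len(M) - 1
    return [sum(M[k] * a[t + k] for k in range(L + 1))
            for t in range(len(a) - L)]

a = [1, 0, -2, 3, 1]            # an arbitrary (free) latent trajectory
print(ma_image([1, 1], a))      # w(t) = a(t) + a(t+1)  ->  [1, -2, 1, 4]
```

Every choice of a yields a legitimate manifest trajectory; the question raised in the text is which behaviors arise as such images.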
The above results allow a generalization to the continuous-time case. However, from a 'technical' mathematical point of view the theory becomes somewhat more intricate, because there is the difficulty of giving a natural definition of the behavior induced by a differential equation.

Let R₀, R₁, …, R_L ∈ ℝ^{g×q} and consider the system of differential equations

R_L (d^L w / dt^L) + R_{L−1} (d^{L−1} w / dt^{L−1}) + ⋯ + R₁ (dw/dt) + R₀ w = 0.

Introducing the polynomial matrix R(s) = R_L s^L + R_{L−1} s^{L−1} + ⋯ + R₁ s + R₀ ∈ ℝ^{g×q}[s] yields the shorthand notation

R(d/dt)w = 0.

The first thing to note is that the relevant polynomial ring is now (as in the case 𝕋 = ℤ₊) ℝ[s], with the adapted notion of unimodularity, etc.
The above set of differential equations is the analogue of an AR-system. The analogue of an ARMA-system becomes

R(d/dt)w = M(d/dt)a

with R(s) ∈ ℝ^{g×q}[s] and M(s) ∈ ℝ^{g×d}[s]. The analogue of an MA-system follows from there.
These differential equations define a dynamical system (possibly with latent variables) with time axis 𝕋 = ℝ (or ℝ₊, the theory now being completely identical) and signal space W = ℝ^q. But how should one define the behavior? The most logical approach is to include distributions in the behavior.

We denote by 𝔇^q the set of ℝ^q-valued distributions on ℝ, equipped with the usual topology. Then R(d/dt)w = 0 defines the dynamical system Σ = (ℝ, ℝ^q, 𝔅) with 𝔅 ⊆ 𝔇^q defined as ker R(d/dt), with R(d/dt) viewed as a map from 𝔇^q to 𝔇^g. In this setting, the result regarding the elimination of latent variables for ARMA-systems generalizes to the differential equation case. Actually, the same holds if we consider all trajectories to be C^∞, but working with this space has many other disadvantages.
The above results show the nice interplay between the behavior and behavioral equations. In our ideology, it is imperative to define concepts and properties of systems on the level of the behavior, and to develop tests on the behavioral equations for verifying whether the induced system has a particular property. In this essay we will use the concepts of controllability and observability as a case in point.


4 Controllability
We start with some historical comments. The notion of controllability, related to the possibility of transferring the state of a system, was introduced by Kalman [K1, K2, K3, KHN1, KFA1] around 1960 and immediately became one of the key concepts in control theory, related to the very possibility of exerting effective control. It enters as a crucial condition in (infinite time) LQG- and H∞-control, in pole placement, in time optimal control, etc. Soon after the introduction of controllability, realization theory appeared on the scene. One of the important paradigms which was learned from this development is that every transfer function and every convolution system can be represented by a minimal state space system, and that minimality is equivalent to controllability and observability. This state of affairs has created the impression that controllability is not an intrinsic property of a dynamical system, but merely a property of a state realization, of a specific representation of a dynamical system. Nevertheless, the idea that lack of controllability bears relation to the presence of common factors in a transfer function (and can thus, in some sense, be considered as an external property) is very much part of the system theory folklore. As indicated above, there are good system theoretic reasons why this is dubious. It is equally dubious for good mathematical reasons: a transfer function is a rational function, and common factors are by definition cancelled.
We will now introduce a point of view which will make history out of these
historical remarks and this folklore. Indeed, we will put forward a convincing
definition of controllability which makes it into a property of the manifest
behavior and which is in principle applicable to any dynamical system. Thus
controllability will become a genuine property of a dynamical system.
Definition. Let Σ = (𝕋, W, 𝔅), 𝕋 = ℤ or ℝ, be a time-invariant dynamical system. Σ is said to be controllable if for all w₁, w₂ ∈ 𝔅 there exist a t ∈ 𝕋, t ≥ 0, and a w: 𝕋 ∩ [0, t] → W such that w′ ∈ 𝔅, with w′: 𝕋 → W defined by

w′(t′) := w₁(t′) for t′ < 0;  w′(t′) := w(t′) for 0 ≤ t′ ≤ t;  w′(t′) := w₂(t′ − t) for t′ > t.

In this definition, we should think of w₁ as a past trajectory, chosen by 'nature', of w₂ as a desired future trajectory imposed by the 'designer', and of w as a controlled trajectory judiciously selected by the 'control engineer'. Thus a controllable system can eventually be steered to the desired trajectory, whatever its history may be. Controllable systems preach a message of hope for sinners and comfort for educators: however dismal the past (w₁) may have been, by proper guidance (w) we can nevertheless achieve any desired future (w₂)!

Note that this notion of controllability is convincing in its own right and representation independent. Obviously, it is desirable to be able to read off from the behavioral equations whether or not a system enjoys a certain property. For AR-models a rather concrete test can be obtained:
Theorem. Let Σ ∈ 𝔏^q be represented by the AR-equations R(σ, σ⁻¹)w = 0 with R(s, s⁻¹) ∈ ℝ^{g×q}[s, s⁻¹]. Then Σ is controllable if and only if the rank of the matrix R(λ, λ⁻¹) ∈ ℂ^{g×q} is independent of λ for 0 ≠ λ ∈ ℂ.
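The rank condition lends itself to a simple numerical probe. The sketch below is our own illustration: the two example matrices and the handful of sampled values of λ are arbitrary choices, and sampling a few points is of course evidence rather than proof. It evaluates R(λ, λ⁻¹) at several nonzero λ and collects the observed ranks:

```python
def rank(M, tol=1e-9):
    """Numerical rank of a small complex matrix, via Gaussian elimination."""
    M = [list(row) for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[piv][c]) < tol:
            continue                      # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def R_eval(coeffs, lam):
    """Evaluate a polynomial matrix R(s, s^-1) at s = lam.
    coeffs maps each power of s to its (list-of-lists) coefficient matrix."""
    rows = len(next(iter(coeffs.values())))
    cols = len(next(iter(coeffs.values()))[0])
    out = [[0j] * cols for _ in range(rows)]
    for p, C in coeffs.items():
        for i in range(rows):
            for j in range(cols):
                out[i][j] += C[i][j] * lam ** p
    return out

samples = [1, -1, 2, 0.5, 1 + 1j]        # a few nonzero test frequencies

# R(s) = [s - 2, -1]: rank 1 at every sampled lambda  ->  controllable
R_c = {1: [[1, 0]], 0: [[-2, -1]]}
print(sorted({rank(R_eval(R_c, lam)) for lam in samples}))    # [1]

# R(s) = [(s-1)(s-2), s-1]: rank drops at lambda = 1  ->  not controllable
R_nc = {2: [[1, 0]], 1: [[-3, 1]], 0: [[2, -1]]}
print(sorted({rank(R_eval(R_nc, lam)) for lam in samples}))   # [0, 1]
```

The second example exhibits the common factor s − 1 in both entries, which is exactly the loss-of-controllability mechanism discussed later for I/O systems.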
The above theorem can be viewed as a sweeping generalization of Hautus' controllability test for the ubiquitous state space system σx = Ax + Bu. We can also prove the following interesting representation theorem:
Theorem. Σ = (ℤ, ℝ^q, 𝔅) ∈ 𝔏^q is controllable if and only if there exists M(s, s⁻¹) ∈ ℝ^{q×•}[s, s⁻¹] such that 𝔅 = im M(σ, σ⁻¹), i.e., if and only if it allows an MA-representation w = M(σ, σ⁻¹)a.
We will refer to the above theorem as the MA-representation theorem. We will denote the set of controllable systems in 𝔏^q by 𝔏^q_controllable. Thus 𝔏^q_controllable := {Σ = (ℤ, ℝ^q, 𝔅) ∈ 𝔏^q | Σ is controllable}. The theorem implies that (ℝ^{q×•}[s, s⁻¹], π) with π: M(s, s⁻¹) ↦ (ℤ, ℝ^q, im M(σ, σ⁻¹)) defines a polynomial matrix parametrization of 𝔏^q_controllable. This leads in our usual way to an equivalence on ℝ^{q×•}[s, s⁻¹], denoted by M₁ ∼_MA M₂, which hence signifies im M₁(σ, σ⁻¹) = im M₂(σ, σ⁻¹). In system theoretic terms, it means that the associated MA-systems have the same manifest behavior.
The other extreme from controllability are the autonomous systems, in which the past of a trajectory implies its future completely.

Let Σ = (𝕋, W, 𝔅), 𝕋 = ℤ or ℝ, be a time-invariant dynamical system. Σ is said to be autonomous if {w₁, w₂ ∈ 𝔅 and w₁(t) = w₂(t) for t < 0} ⇒ {w₁ = w₂}.
Proposition. Let Σ = (ℤ, ℝ^q, 𝔅) ∈ 𝔏^q. The following conditions are equivalent:

(i) Σ is autonomous;
(ii) 𝔅 is finite-dimensional;
(iii) Σ admits an AR-representation with R(s, s⁻¹) ∈ ℝ^{q×q}[s, s⁻¹] having det R ≠ 0;
(iv) Σ admits an AR-representation R(σ, σ⁻¹)w = 0 with ker R(λ, λ⁻¹) = {0} for some 0 ≠ λ ∈ ℂ (and hence for all but a finite number of 0 ≠ λ ∈ ℂ).

In the context of input/output representations, autonomous systems are those which contain no inputs. Note again that our definition of an autonomous system is a convincing one in its full generality.
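A minimal numerical illustration of autonomy (the recursion is our own example): for the scalar AR law w(t+1) = a·w(t), i.e. R(s) = s − a with det R ≠ 0, the single past value w(−1) determines the entire future trajectory:

```python
def unroll(a, w_minus1, n):
    """Autonomous scalar AR law w(t+1) = a*w(t): the past value
    w(-1) determines the whole future w(0), ..., w(n-1)."""
    w, prev = [], w_minus1
    for _ in range(n):
        prev = a * prev
        w.append(prev)
    return w

# Two trajectories that agree in the past (w(-1) = 3) agree forever:
print(unroll(2, 3, 5))   # [6, 12, 24, 48, 96]
```

Consistently with condition (ii) of the proposition, the behavior here is one-dimensional: it is spanned by the single trajectory t ↦ 2^t.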
As we have seen, controllable systems and autonomous systems form the extreme points in a range of possibilities. Every system in 𝔏^q is a combination of these two extremes:

Proposition. Let Σ = (ℤ, ℝ^q, 𝔅) ∈ 𝔏^q. There exist subsystems Σᵢ = (ℤ, ℝ^q, 𝔅ᵢ) ∈ 𝔏^q, i = 1, 2, of Σ, with Σ₁ controllable, Σ₂ autonomous, and 𝔅 = 𝔅₁ + 𝔅₂ (this sum may be taken to be a direct sum). In every such decomposition Σ₁ equals the


controllable part of Σ, whereas Σ₂ depends on the decomposition. However, if 𝔅 = 𝔅₁ ⊕ 𝔅₂, then the dimension of 𝔅₂ is fixed.

The controllable part of Σ is defined as the largest controllable subsystem of Σ. That is, it is the system Σ′ = (ℤ, ℝ^q, 𝔅′) with 𝔅′ ⊆ 𝔅, Σ′ controllable, and with 𝔅′ as large as possible. Equivalently, 𝔅′ is the closure, in the topology of pointwise convergence, of the set of elements of 𝔅 having compact support.

5 Observability
We now turn to the notion of observability. In our view, observability will be a property of systems which produce two kinds of signals: one which is observed, and another which should be deduced from the observed signal.

Definition. Let Σ = (𝕋, W₁ × W₂, 𝔅), 𝕋 = ℤ or ℝ, be a time-invariant dynamical system. Then w₂ is said to be observable from w₁ in Σ if {(w₁′, w₂′), (w₁″, w₂″) ∈ 𝔅 and w₁′ = w₁″} ⇒ {w₂′ = w₂″}.

In other words, observability implies that there exists a map F: (W₁)^𝕋 → (W₂)^𝕋 such that {(w₁, w₂) ∈ 𝔅} ⇒ {w₂ = Fw₁}. For systems described by AR-equations a concrete test for observability can be derived:

Theorem. Let Σ = (ℤ, ℝ^{q₁} × ℝ^{q₂}, 𝔅) ∈ 𝔏^{q₁+q₂} be represented by the AR-system R₁(σ, σ⁻¹)w₁ + R₂(σ, σ⁻¹)w₂ = 0 with R₁(s, s⁻¹) ∈ ℝ^{g×q₁}[s, s⁻¹] and R₂(s, s⁻¹) ∈ ℝ^{g×q₂}[s, s⁻¹]. Then w₂ is observable from w₁ in Σ if and only if the rank of the matrix R₂(λ, λ⁻¹) is equal to q₂ for all 0 ≠ λ ∈ ℂ.
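As with controllability, the column-rank condition can be probed numerically. In this toy sketch (the 2×2 matrix R₂ and the sampled values of λ are our own choices), R₂(s) = [[s, 1], [1, s]] has det R₂ = s² − 1, so the rank drops below q₂ = 2 at λ = ±1, and w₂ then fails to be observable from w₁:

```python
def rank2(M, tol=1e-9):
    """Rank of a 2x2 complex matrix."""
    (a, b), (c, d) = M
    if abs(a * d - b * c) > tol:
        return 2
    return 1 if any(abs(x) > tol for x in (a, b, c, d)) else 0

def R2(lam):
    """R_2(s) = [[s, 1], [1, s]]; det = s**2 - 1 vanishes at s = +/-1."""
    return [[lam, 1], [1, lam]]

# q2 = 2, but the rank drops to 1 at lambda = +/-1: w2 is NOT observable.
print([rank2(R2(lam)) for lam in [1, -1, 2, 1j]])   # [1, 1, 2, 2]
```

Replacing R₂ by the identity matrix would give full column rank at every nonzero λ, i.e. an observable configuration.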

If in a dynamical system with latent variables the latent variable is observable from the manifest variable, then we will call the latent variable system simply observable. In this sense observability becomes an intrinsic property of a latent variable dynamical system: if the latent variables can be recovered from the manifest variables (and the laws of the system), then the latent variable model is observable. For the ARMA-system R(σ, σ⁻¹)w = M(σ, σ⁻¹)a, the above theorem immediately leads to the conclusion that an ARMA-system is observable if and only if M(λ, λ⁻¹) is of full column rank for all 0 ≠ λ ∈ ℂ.
Let M(s, s⁻¹) ∈ ℝ^{q×d}[s, s⁻¹]. We will call the system of MA-equations w = M(σ, σ⁻¹)a minimal (for simplicity we also call M minimal) if {M′(s, s⁻¹) ∈ ℝ^{q×d′}[s, s⁻¹], M′ ∼_MA M} ⇒ {d′ ≥ d}. There holds:
Proposition. Every Σ ∈ 𝔏^q_controllable admits a minimal MA-representation. Moreover:

(i) {M is minimal} ⇔ {M(s, s⁻¹) is of full column rank};
(ii) if M(s, s⁻¹) ∈ ℝ^{q×d}[s, s⁻¹] is minimal, then {M′ minimal, M′ ∼_MA M} ⇔ {∃ F(s) ∈ ℝ^{d×d}(s), det F ≠ 0, such that M(s, s⁻¹) = M′(s, s⁻¹)F(s)};
(iii) if the MA-system w = M(σ, σ⁻¹)a is observable, then it is minimal, and {w = M′(σ, σ⁻¹)a observable, M′ ∼_MA M} ⇔ {M and M′ are unimodularly right-equivalent (that is, there exists a unimodular polynomial matrix U(s, s⁻¹) such that M = M′U)}.

The notions of controllability and observability are without doubt the paradigmatic examples of system theory concepts. It would perhaps be most logical to develop now some applications of our version of these concepts to the design of feedback control laws and the synthesis of observers. However, we will not do this: firstly, because this part of our theory is still in a somewhat preliminary stage, and secondly, because we want to use this occasion in order to put forward some less usual avenues of application. As an application of the notion of controllability we have chosen the interpretation of the notion of a transfer function in our setting. As an application of the notion of observability we have chosen its relevance to the question of continuous parametrization.

6 The Transfer Function and Controllability


For a system Σ = (ℤ, ℝ^q, 𝔅) ∈ 𝔏^q we will refer to the collection of all exponential solutions as the frequency response. Formally, however, we need to consider complex functions. Thus the frequency response associates with each λ ∈ ℂ a subset 𝔽_λ ⊆ ℂ^q defined by 𝔽_λ := {w ∈ ℂ^q | Re(exp_λ w) ∈ 𝔅 and Im(exp_λ w) ∈ 𝔅}, where exp_λ: ℤ → ℂ is defined as exp_λ(t) := λ^t. Clearly 𝔽_λ is a linear subspace. It is equal to ker R(λ, λ⁻¹), with R(σ, σ⁻¹)w = 0 an AR-representation of Σ. We call the map λ ∈ ℂ ↦ 𝔽_λ ⊆ ℂ^q the frequency response. Note that 𝔽_λ has constant dimension for 0 ≠ λ ∈ ℂ if and only if Σ is controllable. It is interesting to note that controllability corresponds exactly to the case that the frequency response defines a vector bundle over the Riemann sphere.
Now consider the dynamical system

P(σ, σ⁻¹)y = Q(σ, σ⁻¹)u,   w = [u; y]    (I/O)

with P(s, s⁻¹) ∈ ℝ^{p×p}[s, s⁻¹], Q(s, s⁻¹) ∈ ℝ^{p×m}[s, s⁻¹], and det P ≠ 0.

In the above system the variable u is 'maximally free'. By this we mean that for any u: ℤ → ℝ^m there exists a y: ℤ → ℝ^p such that [u; y] belongs to the behavior of (I/O). Moreover, there are no further free components. In particular, if u is chosen to be zero, then the compatible y's define an autonomous system.
We will denote the family of discrete-time linear time-invariant complete I/O dynamical systems (ℤ, ℝ^m, ℝ^p, 𝔅) (of course, 𝔅 ∈ 𝔏^{m+p}) by 𝔏^{i/o}_{m,p}. It follows that 𝔏^{i/o}_{m,p} admits a polynomial matrix parametrization (𝔓, π) with 𝔓 = {(P, Q) ∈ ℝ^{p×p}[s, s⁻¹] × ℝ^{p×m}[s, s⁻¹] | det P ≠ 0} and π(P, Q) = (ℤ, ℝ^m, ℝ^p, 𝔅) with 𝔅 = ker[Q(σ, σ⁻¹) ⋮ −P(σ, σ⁻¹)]. The equivalence relation induced by this parametrization will be denoted by ∼_{i/o}. It follows


immediately that {(P, Q) ∼_{i/o} (P′, Q′)} ⇔ {there exists a unimodular U(s, s⁻¹) ∈ ℝ^{p×p}[s, s⁻¹] such that P = UP′ and Q = UQ′}.
The controllable elements of 𝔏^{i/o}_{m,p} will be denoted by 𝔏^{i/o, controllable}_{m,p}, and for a system Σ ∈ 𝔏^{i/o}_{m,p} we will denote its controllable part again by Σ_controllable. It follows that (I/O) defines a controllable system if and only if the polynomial matrices P and Q are left co-prime. Common factors in P and Q consequently correspond to loss of controllability!
Let Σ ∈ 𝔏^{i/o}_{m,p} be described by (I/O). Define G(s) ∈ ℝ^{p×m}(s) by

G(s) := P⁻¹(s, s⁻¹)Q(s, s⁻¹).

This matrix of rational functions is called the transfer function of Σ. Clearly, if (P, Q) ∼_{i/o} (P′, Q′), then P⁻¹Q = (P′)⁻¹Q′, and so Σ uniquely specifies the transfer function. Is the converse also true? Not quite:
Proposition. Let Σᵢ ∈ 𝔏^{i/o}_{m,p}, i = 1, 2, and let Gᵢ(s) be their transfer functions. Then {G₁(s) = G₂(s)} ⇔ {Σ₁,controllable = Σ₂,controllable}.

Consequently, the transfer function determines only the controllable part of a system.
It is easy to see that the frequency response determines the transfer function and hence the controllable part of a system. In general, however, neither the frequency response nor the transfer function determines the behavior completely. The transfer function determines the system iff the system is controllable. The frequency response determines the system iff the AR-matrix for Σ, R(s, s⁻¹) ∈ ℝ_f^{•×q}[s, s⁻¹], can be written as R(s, s⁻¹) = F(s, s⁻¹)R′(s, s⁻¹) with R′(λ, λ⁻¹) of constant rank for 0 ≠ λ ∈ ℂ and F square and semi-simple. A square polynomial matrix F(s, s⁻¹) is said to be semi-simple if det F ≠ 0 and if every 0 ≠ λ ∈ ℂ has multiplicity as a root of det F(s, s⁻¹) equal to dim ker F(λ, λ⁻¹).
Let G(s) ∈ ℝ^{p×m}(s). Clearly there are many I/O systems (ℤ, ℝ^m, ℝ^p, 𝔅) such that G(s) is the associated transfer function: simply identify two polynomial matrices P(s, s⁻¹) and Q(s, s⁻¹) such that G(s) = P⁻¹(s, s⁻¹)Q(s, s⁻¹). All the resulting systems will have the same controllable part. However, there will be precisely one controllable system among these. It can be obtained as follows. Determine a left co-prime factorization of G(s), i.e., determine left co-prime polynomial matrices P(s, s⁻¹) ∈ ℝ^{p×p}[s, s⁻¹], det P ≠ 0, and Q(s, s⁻¹) ∈ ℝ^{p×m}[s, s⁻¹] such that G(s) = P⁻¹(s, s⁻¹)Q(s, s⁻¹). Then the behavioral equations

P(σ, σ⁻¹)y = Q(σ, σ⁻¹)u

define an AR-representation of the (unique) controllable system having transfer function G.
However, since when considering transfer functions we are, by default, studying controllable systems, we know that they also allow MA-representations. These are easy to obtain. Take any polynomial matrices N(s, s⁻¹) ∈ ℝ^{p×m}[s, s⁻¹], M(s, s⁻¹) ∈ ℝ^{m×m}[s, s⁻¹], det M ≠ 0, such that G(s) = N(s, s⁻¹)M⁻¹(s, s⁻¹), and consider the MA-system

u = M(σ, σ⁻¹)a;   y = N(σ, σ⁻¹)a.

This will be an MA-representation of the controllable I/O system with transfer function G(s). It will be an observable representation if and only if (N, M) is a right co-prime factorization of G.
The above analysis shows in what sense common factors in P and Q can be freely cancelled in the AR-description P(σ, σ⁻¹)y = Q(σ, σ⁻¹)u. In principle they should be respected but, if we are only interested in the controllable part, they can be cancelled. Cancelling or introducing common factors will result in a different behavior and different controllability properties. On the other hand, in an MA-representation u = M(σ, σ⁻¹)a; y = N(σ, σ⁻¹)a, common factors can be cancelled or introduced as far as the manifest behavior is concerned, but not as far as the full (internal) behavior or the observability properties are concerned.
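A scalar sketch of this cancellation, using exact rational arithmetic (the polynomials are our own illustrative choices): P(s) = (s−1)(s−2) and Q(s) = s−1 share the common factor s−1, and dividing it out leaves the co-prime pair (s−2, 1) with the same transfer function G(s) = 1/(s−2):

```python
from fractions import Fraction

def poly_divmod(a, b):
    """Polynomial division; coefficient lists of Fractions, lowest degree
    first, with b[-1] != 0.  Returns (quotient, remainder)."""
    a, q = a[:], []
    while len(a) >= len(b):
        f = a[-1] / b[-1]
        q.insert(0, f)
        for i in range(len(b)):
            a[len(a) - len(b) + i] -= f * b[i]
        a.pop()
    return q, a

def poly_gcd(a, b):
    """Monic greatest common divisor via the Euclidean algorithm."""
    while any(b):
        _, r = poly_divmod(a, b)
        while r and r[-1] == 0:       # drop vanished leading coefficients
            r.pop()
        a, b = b, r
    return [x / a[-1] for x in a]

F = lambda *c: [Fraction(x) for x in c]
P = F(2, -3, 1)   # P(s) = (s - 1)(s - 2) = s**2 - 3s + 2
Q = F(-1, 1)      # Q(s) = s - 1

g = poly_gcd(P, Q)
print([int(x) for x in g])                      # [-1, 1]: common factor s - 1
print([int(x) for x in poly_divmod(P, g)[0]])   # [-2, 1]: reduced P is s - 2
print([int(x) for x in poly_divmod(Q, g)[0]])   # [1]:     reduced Q is 1
```

Consistently with the discussion above, the reduced pair describes the controllable part, while keeping the common factor describes a different (non-controllable) behavior with the same transfer function.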
It follows that 𝔏^{i/o, controllable}_{m,p} is parametrized by (ℝ^{p×m}(s), π) with π(G) = (ℤ, ℝ^m, ℝ^p, 𝔅) and 𝔅 = ker[Q(σ, σ⁻¹) ⋮ −P(σ, σ⁻¹)] = im[M(σ, σ⁻¹); N(σ, σ⁻¹)], where G = P⁻¹Q = NM⁻¹ and (P, Q) is left co-prime. This parametrization is obviously a trim one. Hence transfer functions provide very effective characterizations of controllable I/O systems. However, in general transfer functions are not adequate as a description of a linear time-invariant system.

7 Parametrizations and Continuity


The application of controllability which we considered basically states that controllable systems are those for which the transfer function provides a trim parametrization. The application of observability which we will consider is perhaps more surprising: we will demonstrate the close relation between observability of an ARMA representation and its continuity as a parametrization. First, however, we need to introduce the concept of a continuous parametrization.

Let (𝔓, π) be a parametrization of 𝔐 (for example, a family of dynamical systems), and assume that 𝔓 and 𝔐 are endowed with a topology. We will call (𝔓, π) a continuous parametrization of 𝔐 if, whenever p_ε ∈ 𝔓, ε ≥ 0, satisfies lim_{ε→0} p_ε = p₀, then lim_{ε→0} π(p_ε) = π(p₀), and, conversely, whenever M_ε ∈ 𝔐, ε ≥ 0, satisfies lim_{ε→0} M_ε = M₀, then there exist p_ε ∈ 𝔓 such that π(p_ε) = M_ε and such that lim_{ε→0} p_ε = p₀. In other words, in a continuous parametrization convergence of the 'abstract' objects in 𝔐 can be put into evidence by convergence of the parameters representing these objects, that is, by the 'concrete' objects in 𝔓 with the corresponding specific and natural convergence with which typical parameter spaces are endowed.
Now consider the family of discrete-time linear time-invariant complete dynamical systems (ℤ, ℝ^q, 𝔅), denoted, as mentioned earlier, by 𝔏^q. Let ℝ^{•×q}[s, s⁻¹] denote the family of polynomial matrices with q columns and any (but a finite) number of rows. Each element R(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹] induces the dynamical system (ℤ, ℝ^q, ker R(σ, σ⁻¹)). Let π denote the map which associates this dynamical system with the polynomial matrix R. Above we have seen that (ℝ^{•×q}[s, s⁻¹], π) defines a parametrization of 𝔏^q. The question whether this is a continuous parametrization is a much more delicate one. To begin with, it requires specifying a topology on 𝔏^q and on ℝ^{•×q}[s, s⁻¹].
For 𝔏^q we will use the following topology. We have seen that (ℤ, ℝ^q, 𝔅) belongs to 𝔏^q if and only if 𝔅 is a linear shift-invariant closed subspace of (ℝ^q)^ℤ, equipped with the topology of pointwise convergence. This leads to a topology on 𝔏^q with the following notion of convergence. A family of systems Σ_ε = (ℤ, ℝ^q, 𝔅_ε) ∈ 𝔏^q, with ε > 0 a real number, is defined to converge to Σ₀ = (ℤ, ℝ^q, 𝔅₀) ∈ 𝔏^q if

(i) whenever w_{ε_k} ∈ 𝔅_{ε_k}, k ∈ ℕ, lim_{k→∞} ε_k = 0, and lim_{k→∞} w_{ε_k} = w₀ (pointwise convergence), then w₀ ∈ 𝔅₀, and
(ii) whenever w₀ ∈ 𝔅₀, then there exist w_ε ∈ 𝔅_ε such that lim_{ε→0} w_ε = w₀.
For ℝ^{•×q}[s, s⁻¹], on the other hand, we will use the following notion of convergence. Let R_ε(s, s⁻¹) ∈ ℝ^{g_ε×q}[s, s⁻¹], ε ≥ 0. Then we define lim_{ε→0} R_ε to be R₀ if

(i) g_ε = g₀ for ε sufficiently small,
(ii) R_ε(s, s⁻¹) = R^ε_{L_ε} s^{L_ε} + R^ε_{L_ε−1} s^{L_ε−1} + ⋯ + R^ε_{l_ε+1} s^{l_ε+1} + R^ε_{l_ε} s^{l_ε} satisfies l ≤ l_ε ≤ L_ε ≤ L for all ε ≥ 0, and
(iii) lim_{ε→0} R^ε_k = R⁰_k for all l ≤ k ≤ L. This last convergence is componentwise in the entries of the matrices.
Thus system convergence means that convergent time-series from the behaviors of the converging systems have their limit in the behavior of the limit system and, conversely, that each time-series in the behavior of the limit system can be approximated by elements in the behaviors of the converging systems. Polynomial matrix convergence simply means convergence of the matrix coefficients.
It is not possible to prove in full generality that the polynomial matrix parametrization (ℝ^{•×q}[s, s⁻¹], π) of 𝔏^q is a continuous one. For one thing, we need to restrict our attention to full row rank polynomial matrices with q columns. In fact, it is easy to prove that (ℝ_f^{•×q}[s, s⁻¹], π) also defines a parametrization of 𝔏^q (ℝ_f^{•×q}[s, s⁻¹] denotes the polynomial matrices of full row rank). This parametrization corresponds, as we have seen, to the minimal AR-representation. In fact, we have seen that if R(σ, σ⁻¹)w = 0 is one minimal AR-representation of Σ, then the transformation group (the unimodular group) R ↦ UR, where U(s, s⁻¹) ranges over the unimodular polynomial matrices, generates precisely all minimal AR-representations of Σ. This tremendous non-uniqueness of behavioral equation representations is, among other things, the source of difficulty in continuity considerations.
In order to state our result from [NW1] on continuous parametrization, we need to introduce the notion of the memory span of an element (ℤ, ℝ^q, 𝔅) ∈ 𝔏^q.


It can be shown that 𝔅 has the finite memory span property, that is, that there exists a Δ ∈ ℤ₊ such that w₁, w₂ ∈ 𝔅 and w₁(t) = w₂(t) for 0 ≤ t ≤ Δ imply that w₁ ∧ w₂ ∈ 𝔅, where w₁ ∧ w₂ denotes the concatenation of w₁ and w₂, defined as (w₁ ∧ w₂)(t) := w₁(t) for t < 0 and (w₁ ∧ w₂)(t) := w₂(t) for t ≥ 0. The smallest such number Δ ∈ ℤ₊ will be called the memory span of Σ.
Let R(s, s⁻¹) = R_L s^L + R_{L−1} s^{L−1} + ⋯ + R_l s^l ∈ ℝ^{•×q}[s, s⁻¹], with R_L ≠ 0 and
R_l ≠ 0. Then we call L − l the degree of R. Let us denote by ℝ_f^{•×q,Λ}[s, s⁻¹] the
collection of elements of ℝ_f^{•×q}[s, s⁻¹] with degree ≤ Λ. Also, let us denote by
𝔏^q_Λ those elements of 𝔏^q with memory span ≤ Λ. In [NW1] we have obtained
the following interesting continuity result:

Theorem. (ℝ_f^{•×q,Λ}[s, s⁻¹], π) defines a continuous parametrization of 𝔏^q_Λ.

Thus, with the restriction imposed in the above theorem, linear time-invariant
complete dynamical systems converge if and only if their AR-representations
converge.

8 Continuity of ARMA-Models and Observability


Let (𝕋, W, 𝕃, 𝔅_f) be a latent variable dynamical system. As we have seen, we
call it observable if {(w′, a′), (w″, a″) ∈ 𝔅_f and w′ = w″} imply {a′ = a″}.
Observability in other words implies that the latent variable trajectory a can
be deduced from the manifest variable trajectory w.
Now consider a family of ARMA-models

R_ε(σ, σ⁻¹)w = M_ε(σ, σ⁻¹)a

depending on a real parameter ε ≥ 0. Let 𝔅_ε^f denote its full behavior and 𝔅_ε
its manifest behavior. From the results from [NW1] mentioned in the previous
section we know that if [R₀(s, s⁻¹) ⋮ −M₀(s, s⁻¹)] is of full row rank, and if
lim_{ε→0} R_ε = R₀ and lim_{ε→0} M_ε = M₀, then 𝔅_ε^f → 𝔅₀^f as ε → 0. The question which we
now wish to address is if and when this (parameter or behavior) convergence
of the full behavior implies the convergence 𝔅_ε → 𝔅₀ of the manifest behavior.

The electrical circuit example described in [WN1] shows that this will not be
automatic, and it is refreshing to take note of the fact that observability provides
the key for a positive result in this direction!
Theorem. Assume that the ARMA-system

R₀(σ, σ⁻¹)w = M₀(σ, σ⁻¹)a

is minimal (i.e., that [R₀(s, s⁻¹) ⋮ −M₀(s, s⁻¹)] is of full row rank) and observable
(i.e., that M₀(λ, λ⁻¹) is of full column rank for all 0 ≠ λ ∈ ℂ). Then if
lim_{ε→0} R_ε(s, s⁻¹) = R₀(s, s⁻¹) and lim_{ε→0} M_ε(s, s⁻¹) = M₀(s, s⁻¹), the full behavior
as well as the manifest behavior will converge:

lim_{ε→0} 𝔅_ε^f = 𝔅₀^f  and  lim_{ε→0} 𝔅_ε = 𝔅₀.

By combining these results with those of [NW1] mentioned in the previous
section, a number of corollaries readily follow:

Corollary. Under the assumptions of the above theorem, there will exist full row
rank polynomial matrices R′_ε(s, s⁻¹) ∈ ℝ^{•×q}[s, s⁻¹] such that

R′_ε(σ, σ⁻¹)w = 0

is an (AR)-model representing the behavior 𝔅_ε, with

lim_{ε→0} R′_ε(s, s⁻¹) = R′₀(s, s⁻¹).

Corollary. Let Σ_ε^f = (ℤ, ℝ^q, ℝ^d, 𝔅_ε^f), ε ≥ 0, be a family of latent variable dynamical
systems and assume, for all ε ≥ 0, that 𝔅_ε^f is a linear shift-invariant closed subspace
of (ℝ^{q+d})^ℤ equipped with the topology of pointwise convergence. Assume moreover
that there exists a Λ ∈ ℤ₊ such that the memory span of 𝔅_ε^f is ≤ Λ for all ε ≥ 0.
Let Σ_ε denote the manifest dynamical system induced by Σ_ε^f. Then if Σ₀^f is
observable, there holds that lim_{ε→0} Σ_ε^f = Σ₀^f implies that lim_{ε→0} Σ_ε = Σ₀.
Consider again the family of ARMA-models R_ε(σ, σ⁻¹)w = M_ε(σ, σ⁻¹)a, ε ≥ 0.
Assume that the conditions of the previous theorem are satisfied. Note that
observability at ε = 0 need not imply observability for small ε. Assume therefore
explicitly observability for all ε ≥ 0. Then we have the following situation:
1. The main result tells us that w_ε ∈ 𝔅_ε and w_ε → w₀ as ε → 0 imply the existence of
an a₀ such that (w₀, a₀) ∈ 𝔅₀^f.
2. By observability at ε = 0 we know that this a₀ is unique.
3. By observability for ε > 0 we know that there exist unique a_ε's such that
(w_ε, a_ε) ∈ 𝔅_ε^f.

It would stand to reason to think that a_ε → a₀ as ε → 0. That, however, need not be
the case! This may be illustrated by an ARMA-system admitting a family of
solutions (w_ε, a_ε) in which

w_ε → 0 as ε → 0,

but a_ε(t) diverges for all t as ε → 0!


When R(σ, σ⁻¹)w = M(σ, σ⁻¹)a is observable, then (see Sect. 6) there exist
R′, R″ such that

R′(σ, σ⁻¹)w = 0;  a = R″(σ, σ⁻¹)w

specifies 𝔅_f. For R_ε(σ, σ⁻¹)w = M_ε(σ, σ⁻¹)a this leads to

R′_ε(σ, σ⁻¹)w = 0;  a = R″_ε(σ, σ⁻¹)w

It follows that, under the conditions of the previous theorem, we can choose
R′_ε such that R′_ε → R′₀ as ε → 0. However, it follows from the above example that even
then it may not be possible to choose R″_ε such that R″_ε → R″₀ as ε → 0. If it is the case,
then w_ε ∈ 𝔅_ε converges to w₀ (which will then belong to 𝔅₀) if and only if
(w_ε, a_ε) ∈ 𝔅_ε^f also converges: (w_ε, a_ε) → (w₀, a₀) as ε → 0 (which then belongs to 𝔅₀^f).

This can be used effectively to study the convergence of the input/state/output-model

σx = A_ε x + B_ε u;  y = C_ε x + D_ε u

This is a particular type of ARMA-model (identify w with [u; y] and a with x).
Assume that A_ε → A₀, B_ε → B₀, C_ε → C₀, D_ε → D₀ as ε → 0, and that (A₀, C₀)
is an observable pair. Then 𝔅_ε^f → 𝔅₀^f as ε → 0 and (since the conditions of our main
result are now automatically satisfied) it follows that 𝔅_ε → 𝔅₀ as ε → 0. This result
has been shown in [N1]. However, in this case it can also be seen that (u_ε, y_ε) ∈ 𝔅_ε
converges to (u₀, y₀) (which will belong to 𝔅₀) if and only if (u_ε, y_ε, x_ε) ∈ 𝔅_ε^f
converges to (u₀, y₀, x₀) (which will then belong to 𝔅₀^f). This can be seen as
follows. There holds (n is the dimension of the state vector x):

y_ε − D_ε u_ε = C_ε x_ε
−C_ε B_ε u_ε − D_ε σ u_ε + σ y_ε = C_ε A_ε x_ε
−C_ε A_ε B_ε u_ε − C_ε B_ε σ u_ε − D_ε σ² u_ε + σ² y_ε = C_ε A_ε² x_ε
⋮
−C_ε A_ε^{n−2} B_ε u_ε − ⋯ − C_ε B_ε σ^{n−2} u_ε − D_ε σ^{n−1} u_ε + σ^{n−1} y_ε = C_ε A_ε^{n−1} x_ε

Now choose a set of rows forming a non-singular sub-matrix of

col(C₀, C₀A₀, …, C₀A₀^{n−1})

and obtain, by taking the corresponding rows from the above equations, a
relation of the form

M_ε x_ε = N_ε(σ, σ⁻¹)w_ε,

where w_ε := (u_ε, y_ε) and N_ε collects the corresponding rows of the left-hand
sides, with M_ε square, M_ε → M₀ as ε → 0, and M₀ nonsingular. Now pre-multiply
with M_ε⁻¹ and obtain the desired

x_ε = R″_ε(σ, σ⁻¹)w_ε

with R″_ε convergent. Hence for input/state/output systems observability implies
not only convergence of the manifest behavior, but also of the state trajectory
corresponding to a convergent input/output pair.
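The row-selection argument can be checked numerically for a discrete-time input/state/output system with D = 0. The sketch below is illustrative only (the matrices A, B, C and the trajectory are invented): it recovers the initial state from input/output data by stacking y_k = Cx_k and y_{k+1} − CBu_k = CAx_k and inverting the observability matrix.

```python
import numpy as np

# Invented observable pair (A, C); B arbitrary; D = 0 for simplicity.
A = np.array([[0.0, 1.0], [-0.5, 1.2]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

rng = np.random.default_rng(0)
x = np.array([0.3, -0.7])
us, xs, ys = [], [], []
for k in range(3):
    u = rng.standard_normal()
    xs.append(x.copy()); us.append(u); ys.append((C @ x).item())
    x = A @ x + B.ravel() * u

# Stack y_k = C x_k and y_{k+1} - C B u_k = C A x_k, then invert the
# observability matrix (here square and nonsingular).
O = np.vstack([C, C @ A])
rhs = np.array([ys[0], ys[1] - (C @ B).item() * us[0]])
x0_hat = np.linalg.solve(O, rhs)
print(x0_hat)  # recovers xs[0]
```

Since the reconstruction map depends continuously on A, B, C, convergent system matrices yield convergent state trajectories, as claimed.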

9 Postscript

In this article, I have given a brief outline of some aspects of the theory of
dynamical systems which I developed during the last decade. The behavior of
a dynamical system plays the lead role in this approach. However, most
dynamical models encountered in practice will be given in the form of behavioral
equations. Typically these take the form of difference equations or differential
equations. In a dynamical system described in terms of behavioral equations,
the behavior is specified as the solution set of the behavioral equations.
Moreover, models obtained from first principles will invariably require the
introduction of auxiliary variables in order to specify the laws of the dynamical
system. We call these auxiliary variables latent variables in order to distinguish
them from the manifest variables. These are the variables which our model aims
at describing. This triptych, with the behavior in central stage and with
behavioral equations and latent variables as important supporting characters,
provides the basic conceptual framework on which our approach is built.
In this approach we aim at introducing, at defining, all properties of a
dynamical system at the level of the behavior. We have developed here two
examples of such concepts: the important notions of controllability and
observability. Controllability refers to the possibility of matching any feasible
past trajectory to any feasible future trajectory, while observability refers to
the possibility of deducing the latent variable trajectory from the manifest
variable trajectory.
Many other notions from classical and modern linear systems theory can
be introduced effectively from this point of view. We mention a few. Inputs and
outputs: the input is a maximal set of free variables. The state of a dynamical
system: the state is a latent variable having the property that the past behavior
is independent of the future behavior, given the present state. The transfer
function and, more generally, the frequency response: these simply classify the
exponential trajectories in the behavior (they can hence be introduced without
reference to Laplace transforms). The fact that behavioral equations provide a
many-to-one way of defining a dynamical system leads to the problem of finding
(trim) canonical forms and (complete) invariants (see [W3]). Finally, it is possible


to fit the classical realization theory within this framework. In fact, through the
notion of the most powerful unfalsified model (see [W1, W3]), this problem
becomes both more natural and greatly generalized in the sense that it can be
applied to any set of observed vector time-series, and not just to the impulse
response.
Our approach is very much inspired by Kalman's seminal work. The first
and most basic aim is to provide (in the spirit of the first chapter of [KFA1])
a suitable axiomatic framework for the study of dynamical systems. This
framework, of course, allows for free variables (inputs). This is in contrast to the
very limited point of view followed in topological dynamics, where the initial
conditions uniquely specify the solution. However, in our framework the
input/output structure is deduced from the behavior and not imposed ab initio.
Also, the crucial role which the state plays in Kalman's work is in our framework
brought to its full fruition through the more general idea of latent variables.
As a first generation student of mathematical system theory, it is a pleasure
to acknowledge the constant inspiration which Kalman's work has provided
for my own scientific thinking. The present article is more than evidence of this
profound influence.
References

[K1] R.E. Kalman, On the General Theory of Control Systems, Proceedings of the First
International Congress of the IFAC, Moscow 1960, Butterworths, London, pp 481-492,
1960
[K2] R.E. Kalman, Mathematical Description of Linear Dynamical Systems, SIAM Journal on
Control, Vol 1, pp 152-192, 1963
[K3] R.E. Kalman, Lectures on Controllability and Observability, Centro Internazionale
Matematico Estivo, Bologna, Italy, 1968
[KFA1] R.E. Kalman, P.L. Falb, and M.A. Arbib, Topics in Mathematical System Theory,
McGraw-Hill, 1969
[KHN1] R.E. Kalman, Y.C. Ho, and K. Narendra, Controllability of Linear Dynamical Systems,
Contributions to Differential Equations, Vol 1, No 2, pp 189-213, 1963
[N1] J.W. Nieuwenhuis, More About Continuity of Dynamical Systems, Systems & Control
Letters, Vol 14, No 1, pp 25-30, 1990
[NW1] J.W. Nieuwenhuis and J.C. Willems, Continuity of Dynamical Systems: A System Theoretic
Approach, Mathematics of Control, Signals, and Systems, Vol 1, No 2, pp 147-165, 1988
[W1] J.C. Willems, From Time Series to Linear System, Part I. Finite Dimensional Linear
Time Invariant Systems, Part II. Exact Modelling, Part III. Approximate Modelling,
Automatica, Vol 22, No 5, pp 561-580, 1986; Vol 22, No 6, pp 675-694, 1986; Vol 23,
No 1, pp 87-115, 1987
[W2] J.C. Willems, Models for Dynamics, Dynamics Reported, Vol 2, pp 171-269, 1989
[W3] J.C. Willems, Paradigms and Puzzles in the Theory of Dynamical Systems, IEEE
Transactions on Automatic Control, Vol AC-36, No 3, pp 259-294, March 1991
[WN1] J.C. Willems and J.W. Nieuwenhuis, Continuity of Latent Variable Models, IEEE
Transactions on Automatic Control, Vol AC-36, No 6, June 1991

Chapter 2

Kalman Filtering

Kalman Filtering: Whence, What and Whither?

B. D. O. Anderson and J. B. Moore
Department of Systems Engineering, Research School of Physical Sciences and Engineering,
The Australian National University, GPO Box 4, Canberra ACT 2601, Australia

1 Introduction

Undoubtedly, one of the major contributions of R.E. Kalman has been the
Kalman filter, [1,2], the magnitude of the contribution being specifically
recognized in the award of the Kyoto Prize.
In this contribution, we shall try to put the Kalman filter in historical
perspective, by recalling the state of filtering theory before its appearance, as
well as some of the major developments which it spawned. It is impossible to
be comprehensive in the allotted space, especially so in making a selection from
major developments.

2 The Wiener Filter

Wiener filtering [3,4] probably represented the first domain of endeavour where
two important ideas were merged: dynamic systems, and optimal estimation in
the presence of noise. The quantity being estimated is a "moving target", being
associated with the evolution of a dynamic system. More precisely, consider
the arrangement of Fig. 2.1, which depicts a signal y(·), contaminating noise
v(·) and a measurement z(·). Continuous and discrete-time problems can be
formulated, and vector y(·), v(·) and z(·) are permitted. Nevertheless, for
convenience, we shall consider scalar continuous time signals only. All signals
are defined on the time interval (−∞, ∞).
It is assumed that y(·) and v(·) are sample functions of stationary random
processes. Normally, they are zero mean, and independent. Further, they are
assumed to have known spectra, φ_yy(jω) and φ_vv(jω), ω ∈ ℝ. (Certain
well-behavedness properties, whose particular form does not concern us here, must
be fulfilled by the spectra also.)
The task of Wiener filtering is to use the measurements z(·) to estimate y(·).
More precisely, the estimation is required to be causal, on-line and optimal.
Causal means that y(t) is to be estimated using z(s) for s < t; on-line means that
at time t, the estimate of y(t) should be available, and this should hold for all

Fig. 2.1. Signal model for Wiener filter

t, i.e. as t evolves; optimal means that the estimate, call it ŷ(t), should offer
minimum mean square error, i.e. E{[y(t) − ŷ(t)]²} should be minimized. In case
y(·) and v(·) are jointly gaussian, this means that ŷ(t) is a conditional mean
estimate, viz. E[y(t)|z(s), s < t].
The solution to this problem is depicted in Fig. 2.2. The block labelled
"Wiener filter" is a linear, time-invariant, causal, stable system, describable by
an impulse response h(·) or transfer function, thus

ŷ(t) = ∫_{−∞}^{t} h(t − s) z(s) ds   (2.1)

The signal y(·) and noise v(·) are often represented as the output of stable
linear systems excited by white noise, see Fig. 2.3. If ε_y(·), ε_v(·) denote zero
mean, unit variance, white noise, i.e.

E[ε_y(t)ε_y(s)] = E[ε_v(t)ε_v(s)] = δ(t − s)   (2.2)

then

φ_yy(jω) = |W_y(jω)|²,  φ_vv(jω) = |W_v(jω)|²   (2.3)

Fig. 2.2. Structure of Wiener filter

Fig. 2.3. Alternative signal model for Wiener filter


It is clear that the key problem in Wiener filtering is to describe the procedure
which leads from the pair φ_yy(jω), φ_vv(jω) (or W_y(jω) and W_v(jω)) to the impulse
response h(t) or its transfer function H(jω). The most crucial step is that of
spectral factorization. The spectrum of z(·), at least when y(·) and v(·) are
independent, is given by

φ_zz(jω) = φ_yy(jω) + φ_vv(jω)   (2.4)

and the spectral factorization operation requires the determination of a transfer
function W_z(jω) such that W_z(s) and W_z⁻¹(s) are analytic in Re s ≥ 0 [i.e. W_z(·)
is causal and causally invertible] such that

φ_zz(jω) = |W_z(jω)|²   (2.5)

In [3], this spectral factorization operation is presented as a crucial step in the
calculation of H(·), while in [4], it is the key to a well-motivated derivation of
the optimum filter. The point behind this derivation is the following. Define a
signal s_z(·) as the output of a linear system of transfer function W_z⁻¹(jω) driven
by z(·). Because of the causality and causal invertibility of W_z(·), then s_z(·) is
informationally equivalent to z(·), i.e. estimation of y(t) using s_z(s) for s < t must
give the same result as estimation of y(t) using z(s) for s < t; but also, s_z(·) is
white. This is a very significant simplification, and is the key to the derivation
of the optimum filter in [4].
We noted above that construction of W_z(·) satisfying stability constraints
and (2.5) is a crucial step in the construction of H(·). The question arises as to
how this can be done. If φ_zz(·) is rational, polynomial factorization is the key.
Otherwise, one uses

W_z(jω₀) = lim_{ε→0} exp{ (1/2π) ∫_{−∞}^{+∞} log φ_zz(jω) / (−j(ω − ω₀) − ε) dω }   (2.6)

The case of vector z(·) and matrix φ_zz(·) is much more complicated.
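For a rational scalar spectrum the polynomial factorization amounts to splitting numerator and denominator roots between the half-planes. A minimal sketch (the spectrum φ_zz(s) = (4 − s²)/(9 − s²), with stable factor W_z(s) = (s + 2)/(s + 3), is an invented example):

```python
import numpy as np

# Invented rational spectrum: phi_zz(s) = (4 - s^2)/(9 - s^2).
num = [-1.0, 0.0, 4.0]   # -s^2 + 4
den = [-1.0, 0.0, 9.0]   # -s^2 + 9

def stable_factor(coeffs):
    """Monic polynomial built from the left-half-plane roots."""
    lhp = [r for r in np.roots(coeffs) if r.real < 0]
    return np.poly(lhp)

wz_num = stable_factor(num)   # roots {-2, 2} -> s + 2
wz_den = stable_factor(den)   # roots {-3, 3} -> s + 3

# Verify |W_z(jw)|^2 = phi_zz(jw) at a sample frequency, cf. (2.5).
w = 1.7
Wz = np.polyval(wz_num, 1j * w) / np.polyval(wz_den, 1j * w)
phi = np.polyval(num, 1j * w) / np.polyval(den, 1j * w)
print(abs(Wz) ** 2, phi.real)  # equal
```

Both W_z and W_z⁻¹ are then analytic in the closed right half-plane, as the factorization requires.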
The following table sums up some important characteristics of the Wiener
filtering problem and its solution:

Table 1. Assumptions for and properties of the Wiener filter

Initial time t₀:          t₀ = −∞
Random processes:         Stationarity required
Signal y(·):              (i) Spectrum φ_yy(·) enough, but W_y(jω) is acceptable.
                          (ii) y(·) is stationary, W_y(jω) must be stable.
                          (iii) φ_yy(·) and W_y(·) are not necessarily rational.
Measurement noise v(·):   (i) Usually independent of y(·). (ii) Stationary.
                          (iii) Not necessarily white.
                          (iv) φ_vv(·) is not necessarily rational.
Wiener filter:            Time invariant and stable, but not necessarily with
                          rational transfer function.
Main calculation burden:  Spectral factorization
Quantity estimated:       y(t)


3 The Kalman Filter (Continuous Time), [2]

We shall first recall the standard Kalman filter set-up, and then compare it with
the Wiener filter. The signal model is (see Fig. 3.1)

dx(t)/dt = F(t)x(t) + G(t)w(t)   (3.1a)
z(t) = H′(t)x(t) + v(t)   (3.1b)

in which F, G, H are n × n, n × m, and n × p respectively. The processes w(·) and
v(·) define zero mean white noise, such that

E{ [w(t); v(t)] [w′(s)  v′(s)] } = [Q(t)  S(t); S′(t)  R(t)] δ(t − s)   (3.2)

with R(t) = R′(t) > 0 for all t. Very frequently, S(t) = 0, i.e. w(·) and v(·) are
independent. We shall make this assumption. Then Q(t) = Q′(t) ≥ 0.
In the first instance, we shall assume a finite initial time t₀. Further, x(t₀)
will be assumed to be a gaussian random variable, of mean x̄₀ and covariance
P₀. Equation (3.1a) defines a Gauss-Markov model, since x(·) is a Markov
process which is gaussian. [In fact, [x′(·) z′(·)]′ is also gaussian and Markov.]
The estimation task is to use measurements of z(s) for s < t to estimate x(t);
the estimate, call it x̂(t), is to be on line and to minimize E[‖x(t) − x̂(t)‖²].
(This means that x̂(t) is necessarily a conditional-mean estimate.) The now
well-known solution is obtained as follows. Define P(t) = P′(t) ≥ 0 as the solution
of

Ṗ = PF′ + FP − PHR⁻¹H′P + GQG′,  P(t₀) = P₀   (3.3)

and x̂(·) by (see Fig. 3.2)

dx̂/dt = F(t)x̂(t) + P(t)H(t)R⁻¹(t)[z(t) − H′(t)x̂(t)]   (3.4)

(Note that Fig. 3.2 depicts a standard observer, in terms of structure, with a
special gain P(t)H(t)R⁻¹(t), sometimes referred to as the Kalman gain.) Further,
there holds

E{[x(t) − x̂(t)][x(t) − x̂(t)]′} = P(t)   (3.5)

Fig. 3.1. Signal model for Kalman filter

Fig. 3.2. Kalman filter

i.e. the performance of the optimal estimator, as measured by the error
covariance, is given through the solution of (3.3). (The existence of a solution
to (3.3) in (t₀, ∞) has to be proved, but this is not difficult.)
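A scalar instance of (3.3)-(3.4) can be integrated by Euler steps. All numbers below (f = −1, g = h = q = r = 1, and the made-up measurement record) are invented for illustration; for these values the steady-state solution of (3.3) is p = √2 − 1.

```python
import math

# Scalar instance of (3.1)-(3.4): dx/dt = f x + g w, z = h x + v.
# All constants and the measurement record are invented.
f, g, h, q, r = -1.0, 1.0, 1.0, 1.0, 1.0
dt = 1e-3
p, xhat, t = 2.0, 0.0, 0.0

for _ in range(20000):
    z = math.sin(t)                      # made-up measurement record
    # Riccati equation (3.3), Euler-discretized:
    p += dt * (2.0 * f * p - (p * h) ** 2 / r + g * q * g)
    # Filter equation (3.4) with Kalman gain p*h/r:
    xhat += dt * (f * xhat + p * h / r * (z - h * xhat))
    t += dt

print(p)  # approaches the steady-state value sqrt(2) - 1
```

Note that p(t) evolves independently of the data, so the gain can be precomputed off line; only the x̂ equation consumes measurements.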
Let us now study some differences with the Wiener filter, considering the
points listed in Table 1.
Initial time. In the above formulation, we have taken t₀ > −∞. This can be
adjusted, see the next section. So the Kalman filter allows either t₀ > −∞ or
t₀ = −∞, whereas the Wiener filter allows only t₀ = −∞.
Random processes. The exciting random processes v(·) and w(·) in the above
signal model are not necessarily stationary, since Q(·) and R(·) in (3.2) may be
time-varying. With t₀ > −∞, we cannot have stationarity. Finally, time-variable
F(t), G(t), H(t) mean that, even if Q(t) and R(t) were constant and t₀ were −∞, z(·)
would not have to be stationary. Thus, in contrast to the Wiener filter, stationarity
is normally not guaranteed or expected. Under special circumstances, it can be
present however. We discuss this point further in the next section.
Signal y(·). It is clearly appropriate to regard H′x in the signal model (3.1) as
corresponding to the signal y(·) in the previous section. For the Wiener filter,
it was enough to know the spectrum of y(·), or equivalently, its covariance.
There is such a quantity as a nonstationary covariance, and so in principle one
could contemplate constructing the Kalman filter knowing just the covariance
of y(·). Whether or not this is possible (it is, see [5]), this is certainly not the
way the Kalman filter was presented; rather, a detailed signal model of y(·) is
required, rather like requiring W_y(jω) in the Wiener filter problem as opposed
just to φ_yy(jω). Next, we note that for the Kalman filter problem, y(·) does not
have to be stationary, and the linear system linking w(·) to y(·) does not have
to be time-invariant or stable. This linear system however does have to be finite
dimensional. In the time-invariant case, this would correspond to W_y(·) being
rational, which is not a requirement in Wiener filtering. In summary, the cost
of dropping the stationarity assumption of the Wiener filtering problem is the
requirement for a model of y(·), not just of its covariance, and a finite-dimensional
model into the bargain.
Measurement noise v(·). In both the Wiener and Kalman formulations, the
noise is usually independent of y(·). In the Wiener formulation, it is stationary,
but this is not required for the Kalman formulation. The major difference is
that the noise is required to be white in the Kalman formulation. This is not

Table 2. Wiener filter and Kalman filter key differences

Wiener                                      Kalman
t₀ = −∞                                     t₀ ≥ −∞
Stationarity                                Nonstationarity acceptable
Infinite dimensional OK                     Finite dimensional
Measurement noise not necessarily white     Measurement noise white
Spectral factorization                      Riccati equation solution
Signal estimation                           State estimation

required in the Wiener formulation (though it turns out that whiteness does
carry with it significant simplifications).
Filter. The Kalman filter is in general time varying; stability is not guaranteed
(and of course, over a finite interval it is of limited relevance). It is finite
dimensional. The Wiener filter may not be, since its transfer function is not
necessarily finite dimensional.
Main calculation burden. Spectral factorization and Riccati matrix solution, the
two key computational tasks, could not appear more dissimilar.
Quantity estimated. The Wiener filter estimates y(t), the Kalman filter x(t). In
the Kalman filter formulation, y(t) = H′(t)x(t), and an estimate of y(t) follows as

ŷ(t) = H′(t)x̂(t)   (3.6)

Table 2 summarizes the key differences.


The problem of prediction is almost as easily solved as that of filtering. This
is the task of estimating x(t + Δ) for some positive Δ, given z(s) for s < t. There
holds

x̂(t + Δ) = Φ(t + Δ, t)x̂(t)   (3.7)

where Φ(·,·) is the transition matrix of F(·). The smoothing problem (Δ is
negative instead of positive) took some time to resolve; it is discussed later.

4 Kalman Filter Over an Infinite Time Interval

The deepest contribution of [2] lies in the treatment of infinite time problems.
Here, concepts of controllability and observability were cleverly used to tackle
issues such as time invariance and stability. We shall review the main ideas here.
Suppose that F, G, H, Q and R are all time invariant, with t₀ remaining finite.

Result 1. Given observability of [F, H], P(t) is bounded, and

P̄ = lim_{t→∞} P(t)   (4.1)

exists and satisfies the steady state Riccati equation

P̄F′ + FP̄ − P̄HR⁻¹H′P̄ + GQG′ = 0   (4.2)

The boundedness of P(t) is intuitively reasonable, because under observability,
it is not surprising that the error in estimating x(t), viz P(t), should remain
bounded. There are a number of other consequences of this result:
(i) the Kalman filter (3.4) is asymptotically time invariant.
(ii) if t₀ → −∞, rather than t → ∞, the fact that the right side of the differential
equation (3.3) has no explicit time dependence yields

P̄ = lim_{t₀→−∞} P(t)  for all t.

(iii) if t₀ → −∞, the signal model (3.1) with constant F, G may produce
unbounded E[x(t)x′(t)] unless Re λᵢ(F) < 0, i.e. unless it is stable. And if
t₀ remains finite and t → ∞, the same is true.
Result 1 says nothing about stability of the Kalman filter, nor of the dependence
of P̄ on P₀. The circle is closed with Result 2. Again, we suppose that
F, G, H, Q and R are time invariant.

Result 2. Suppose [F, H] is observable and [F, GQ^{1/2}] is controllable. Then P̄
as defined in Result 1 is independent of P₀, and

Re λᵢ(F − P̄HR⁻¹H′) < 0

Notice that this stability property is just what is required to ensure that the
Kalman filter (3.4) is asymptotically stable.
Summarizing, if Re λᵢ(F) < 0, with constant F, G, H, Q and R and with
t₀ → −∞, the signals x, y = H′x, v and z are all stationary; the Kalman filter is
time invariant and asymptotically stable provided observability and controllability
conditions are fulfilled. (Even if Re λᵢ(F) < 0 fails, the latter statements
concerning the Kalman filter remain true.)
The parallel with the Wiener filter becomes more apparent in this result.
Let us note that observability and controllability are a little stronger than
needed; in fact, it is not hard to relax these requirements to detectability and
stabilizability, see e.g. [5,6].
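Results 1 and 2 can be illustrated numerically: integrate (3.3) by Euler steps until it settles, then check the steady-state equation (4.2) and the eigenvalues of F − P̄HR⁻¹H′. The matrices below are invented (but satisfy the observability/controllability hypotheses); this is a sketch, not a production Riccati solver.

```python
import numpy as np

# Invented constant matrices; [F, H] observable, [F, G Q^{1/2}] controllable.
F = np.array([[0.0, 1.0], [-2.0, -0.1]])
G = np.eye(2); Q = np.eye(2)
H = np.array([[1.0], [0.0]]); R = np.array([[1.0]])

def riccati_rhs(P):
    # Right-hand side of (3.3): PF' + FP - P H R^{-1} H' P + G Q G'
    return P @ F.T + F @ P - P @ H @ np.linalg.solve(R, H.T @ P) + G @ Q @ G.T

P = np.zeros((2, 2))
dt = 1e-2
for _ in range(20000):           # integrate (3.3) until it settles
    P = P + dt * riccati_rhs(P)

closed_loop = F - P @ H @ np.linalg.solve(R, H.T)
print(np.linalg.norm(riccati_rhs(P)))        # ~0: P satisfies (4.2)
print(np.linalg.eigvals(closed_loop).real)   # all negative: Result 2
```

The limit is independent of the initial condition P(t₀) = P₀, which can be checked by restarting the iteration from a different positive semidefinite matrix.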
Even in the nonstationary case, it still makes sense to contemplate the
possibility of t → +∞, and to ask about the stability of the Kalman filter and
the forgetting of initial conditions. A resolution of these questions was really
suggested by the observation of [1,2] that the Kalman filter problem in many
ways is a dual of the linear-quadratic regulator problem, where infinite-time
behavior and stability are key issues, even for time-varying systems. The
fundamental paper [7] had dealt with these issues, and duality pointed the way
to the corresponding filtering results:

Result 1 (TV). Suppose [F(t), H(t)] is uniformly completely observable, and
F(t), G(t), H(t), Q(t) and R(t) are bounded. Then P(t) is bounded for all t ∈ [t₀, ∞).

Moreover, if t₀ → −∞,

P̄(t) = lim_{t₀→−∞} P(t)

exists and satisfies the differential equation (3.3).

Result 2 (TV). In addition to the hypotheses of Result 1 (TV), suppose that
[F(t), G(t)Q^{1/2}(t)] is uniformly completely controllable. Then P̄(t) is independent
of P₀, and the system ṗ = (F(t) − P̄(t)H(t)R⁻¹(t)H′(t))p is exponentially stable.

Relaxation of the observability/controllability to detectability/stabilizability
is surprisingly difficult, and took some time, [8].

5 Kalman Filter (Discrete Time), [1]

Practically all that has been stated for the continuous-time Kalman filter carries
over to the discrete-time filter. There is however one idea in the discrete-time
theory that is made more transparent than in the continuous-time theory, and
because of its applicability to more general problems it is worth recording. The
signal model is

x_{k+1} = F_k x_k + G_k w_k   (5.1a)
z_k = H_k′ x_k + v_k   (5.1b)

with

E{ [w_k; v_k] [w_l′  v_l′] } = [Q_k  0; 0  R_k] δ_{kl}   (5.2)

and {w_k}, {v_k} are zero mean sequences. For convenience, let the initial time be
k = 0. Then the data include the mean x̄₀ and variance P₀ of x₀, which is
independent of {w_k}, {v_k}. All variables are gaussian.
The key idea is to distinguish the effect of dynamics and measurements in
the filter. More precisely, let x̂_{k/k} be the optimal estimate, again a conditional
mean estimate, of x_k given z_l, l ≤ k, and let x̂_{k+1/k} be E[x_{k+1} | z_l, l ≤ k], the
one-step prediction estimate. Since w_k is independent of z_l for l ≤ k, (5.1a)
yields

x̂_{k+1/k} = F_k x̂_{k/k}   (5.3)

This shows how to update an estimate as a result of the system dynamics,
when no extra measurements appear. Along with (5.3), there holds

Σ_{k+1/k} = F_k Σ_{k/k} F_k′ + G_k Q_k G_k′   (5.4)

Hence Σ_{k/k} and Σ_{k+1/k} are the error covariances associated with x̂_{k/k} and x̂_{k+1/k}.


The measurement update equations indicate how to pass from x̂_{k+1/k} and
Σ_{k+1/k} to x̂_{k+1/k+1} and Σ_{k+1/k+1}. They are

x̂_{k+1/k+1} = x̂_{k+1/k} + Σ_{k+1/k} H_{k+1} [H_{k+1}′ Σ_{k+1/k} H_{k+1} + R_{k+1}]⁻¹ [z_{k+1} − H_{k+1}′ x̂_{k+1/k}]   (5.5a)
Σ_{k+1/k+1} = Σ_{k+1/k} − Σ_{k+1/k} H_{k+1} [H_{k+1}′ Σ_{k+1/k} H_{k+1} + R_{k+1}]⁻¹ H_{k+1}′ Σ_{k+1/k}   (5.5b)

Observe that F_k, G_k and Q_k enter only in the time or dynamical update
equation, while H_k and R_k enter only in the measurement update equation. This
separate accounting for dynamics and measurements, necessarily blurred in the
continuous-time filter equations, is explicit in the discrete-time equations.
Some of these ideas are also to be found in [9].
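The time update (5.3)-(5.4) and measurement update (5.5) translate directly into code. The sketch below uses an invented constant-velocity model with position measurements; the numbers are made up for illustration.

```python
import numpy as np

# Invented constant-velocity model; position measured: z_k = H' x_k + v_k.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.array([[0.5], [1.0]]); Qk = np.array([[0.01]])
H = np.array([[1.0], [0.0]]); Rk = np.array([[1.0]])

xhat = np.zeros(2)              # xhat_{0/0}
Sigma = 10.0 * np.eye(2)        # Sigma_{0/0}
for zk in [0.9, 2.1, 2.9, 4.2]:           # made-up measurements
    # Time update (5.3), (5.4):
    xpred = F @ xhat
    Spred = F @ Sigma @ F.T + G @ Qk @ G.T
    # Measurement update (5.5a), (5.5b):
    S = H.T @ Spred @ H + Rk              # innovation covariance
    K = Spred @ H @ np.linalg.inv(S)
    xhat = xpred + K @ (np.atleast_1d(zk) - H.T @ xpred)
    Sigma = Spred - K @ H.T @ Spred

print(xhat)
print(np.trace(Sigma) < np.trace(Spred))  # measurements reduce uncertainty
```

The two-step structure makes the separate roles of (F_k, G_k, Q_k) and (H_k, R_k) visible in the code itself, exactly as noted above.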

6 Development 1: Spectral Factorization and Innovation

The key computational tool in Wiener filtering is spectral factorization, and in
Kalman filtering, it is the Riccati equation.
Since there are some problems which can be treated either by Wiener or
Kalman filtering methods (those where t₀ = −∞, W_y(jω) = H′(jωI − F)⁻¹GQ^{1/2},
and v(·) is white), there should be a connection between the computational tools
for each approach, and indeed, this is now reasonably explored even in books,
see, e.g. [5,10,11]. We shall touch lightly on the connection here.
Take t₀ = −∞, F, G, H, Q and R constant, with F stable and the
observability/controllability condition fulfilled. From the steady-state Riccati
equation (4.2), it is possible to prove

(I + H′(sI − F)⁻¹K) R (I + K′(−sI − F′)⁻¹H) = R + H′(sI − F)⁻¹GQG′(−sI − F′)⁻¹H   (6.1)

where

K = P̄HR⁻¹   (6.2)

The quantity on the right side of (6.1) is the spectrum of the measurement
process z(·). The left side defines a spectral factorization. Notice that

[I + H′(sI − F)⁻¹K]⁻¹ = I − H′(sI − F + KH′)⁻¹K

and F − KH′ = F − P̄HR⁻¹H′, which is guaranteed stable. So the spectral
factorization produces a transfer function (matrix) which is stable, together with
its inverse. Evidently, the Riccati equation has led to a spectral factorization.
Conversely, if one has the spectral factorization (obtained say by Wiener
filtering methods), and is able to represent the spectral factor in the form
(I + H′(sI − F)⁻¹K)R^{1/2}, then (4.2) and (6.2) imply

P̄(F′ − HK′) + (F − KH′)P̄ = −GQG′ − KRK′   (6.3)

and this shows that P̄ can be defined as the solution of a linear Lyapunov matrix
equation. So the two apparently distinct filter calculations, spectral factorization
and (steady state) Riccati equation solution, are effectively equivalent in this case.
A related result concerns the so-called innovations process. Consider
Fig. 3.2, and suppose that P = P̄, with F, H and R all time-invariant. Then
the transfer function matrix from z to v = z − H′x̂ can be computed to be

I − H′(sI − F + P̄HR⁻¹H′)⁻¹P̄HR⁻¹ = I − H′(sI − F + KH′)⁻¹K
= [I + H′(sI − F)⁻¹K]⁻¹   (6.4)

It follows that the spectral matrix of v, which is termed the innovations process, is

φ_vv(jω) = [I + H′(jωI − F)⁻¹K]⁻¹ φ_zz(jω) [I + K′(−jωI − F′)⁻¹H]⁻¹ = R   (6.5)

i.e. v(·) is a white noise process. This remarkable result continues to hold even
in the nonstationary case, see [12] for a further discussion, though the proof
is obviously very different. The observation has proved useful in developing
various extensions of the basic theory, motivating conjectures, etc. It suggests
that the time-varying Riccati equation is achieving a form of time-varying
spectral factorization. (Indeed, this is true, see [13].) It also makes nice contact
with the highly motivated derivation of the Wiener filter of [4].
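The matrix-inversion identity used in (6.4) is easy to spot-check numerically; in the scalar case (all numbers below invented) it reads 1 − hk/(s − f + kh) = [1 + hk/(s − f)]⁻¹:

```python
# Scalar spot-check of the identity in (6.4):
#   I - H'(sI - F + KH')^{-1} K  =  [I + H'(sI - F)^{-1} K]^{-1}
f, h, k = -0.4, 1.3, 0.7        # invented scalar F, H, K
for s in (0.5j, 3.0j, 2.0 + 1.0j):
    lhs = 1 - h * k / (s - f + k * h)
    rhs = 1 / (1 + h * k / (s - f))
    assert abs(lhs - rhs) < 1e-12
print("identity verified at sample points")
```

Both sides simplify to (s − f)/(s − f + hk), which is why the check holds exactly at every sample point.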

7 Development 2: Smoothing

Filtering is concerned with obtaining E[x(t) | z(s), s ∈ [t₀, t)] on line. Smoothing
is concerned with obtaining one or more of the following:

E[x(t₀) | z(s), s ∈ [t₀, t)]      (fixed t₀, varying t)
E[x(t) | z(s), s ∈ [t₀, t + Δ)]   (fixed t₀, Δ, and varying t)
E[x(t) | z(s), s ∈ [t₀, T)]       (fixed t₀, T, varying t)

These tasks are known as fixed point, fixed lag and fixed interval smoothing.
We shall now outline how these problems can be solved.

Fixed point smoothing. Consider the standard signal model, augmented by a
second set of integrators which are undriven and unmeasured. The initial
condition on these integrators is identical with that on the main signal model.
The full model is then

d/dt [x; x_a] = [F 0; 0 0][x; x_a] + [G; 0]w    (7.1a)
z = [H' 0][x; x_a] + v    (7.1b)

Fig. 7.1. Augmented signal model for fixed point smoothing

with

E[x(t_0); x_a(t_0)] = [x̄_0; x̄_0],
E{([x(t_0); x_a(t_0)] - [x̄_0; x̄_0])([x(t_0); x_a(t_0)] - [x̄_0; x̄_0])'} = [P_0 P_0; P_0 P_0]    (7.2)

The set-up is a standard one in terms of the Kalman filter. The best estimate
of x_a, viz x̂_a(t), is E[x_a(t)|z(s), s < t]. However, x_a(·) is so constructed that
x_a(t) = x_a(t_0) = x(t_0) for all t. So x̂_a(t) = E[x(t_0)|z(s), s < t].
The Riccati equation for the augmented system decomposes into the Riccati
equation for the original model plus linear equations, and the construction of
the smoothed estimate is not at all hard.
This approach was suggested by [14,15]. It could not really have come out
of a theory requiring stationarity.
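A minimal discrete-time sketch of the augmentation idea (hypothetical scalar model; the continuous-time construction of (7.1)-(7.2) is analogous). The second state component is an undriven, unmeasured copy of x(0), and its error variance can only decrease as more data arrive:

```python
import numpy as np
rng = np.random.default_rng(0)

# hypothetical scalar model x_{k+1} = f x_k + w_k, z_k = h x_k + v_k
f, h, q, r, p0 = 0.9, 1.0, 0.3, 0.5, 1.0
N = 40

# augmented model in the spirit of (7.1): second component frozen at x(0)
A = np.array([[f, 0.0], [0.0, 1.0]])
C = np.array([[h, 0.0]])
Qm = np.diag([q, 0.0])
P = np.full((2, 2), p0)            # (7.2): both components equal x(0) initially
xh = np.zeros(2)

# simulate a data record
x = rng.normal(0.0, np.sqrt(p0)); x0 = x
zs = []
for _ in range(N):
    zs.append(h * x + rng.normal(0.0, np.sqrt(r)))
    x = f * x + rng.normal(0.0, np.sqrt(q))

traces = []
for z in zs:
    S = (C @ P @ C.T).item() + r       # innovations variance
    Kg = P @ C.T / S                   # Kalman gain
    xh = xh + Kg.ravel() * (z - (C @ xh).item())
    P = P - Kg @ C @ P                 # measurement update
    traces.append(P[1, 1])             # error variance of the x(0) estimate
    xh = A @ xh                        # time update (leaves component 2 alone)
    P = A @ P @ A.T + Qm

print(x0, xh[1])  # true x(0) vs. its fixed-point smoothed estimate

# each new measurement can only improve the fixed-point estimate
assert all(a >= b - 1e-12 for a, b in zip(traces, traces[1:]))
```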
Fixed lag smoothing. This problem was examined in the Wiener filtering
context, see, e.g. [3]. The optimum smoother is normally infinite dimensional
(unless Δ = ∞), and this does not augur well for a Kalman filtering approach.
However, by switching to discrete time we avoid this problem. Consider the
standard discrete-time signal model, augmented as depicted in Fig. 7.2 by a
chain of N delays, so that x_k^{(j)} = x_{k-j}.

Fig. 7.2. Augmented signal model for fixed-lag smoothing


This augmented model, with state col(x_k, x_k^{(1)}, ..., x_k^{(N)}) = col(x_k, x_{k-1}, ..., x_{k-N}),
is of a standard type, and therefore a Kalman filter can be found for it. The
state estimate of the Kalman filter will include as a subvector, for each j ∈ [1, N],

E[x_k^{(j)}|z_l, l ≤ k] = E[x_{k-j}|z_l, l ≤ k]    (7.3)

This is nothing but a fixed-lag estimate with lag j. Again, it is easy to compute
the filter for the augmented system, and therefore the fixed-lag smoother. It
turns out that if not all lagged estimates between 1 and N are required, considerable simplification is possible, see [5]. This approach originally appeared
in [16]. Approaches to the continuous-time problem can be found in [17].
Fixed-lag estimates will always offer a lower error covariance than filtered
estimates of the same quantity, since more data is being used to generate the
estimate. When the Kalman filter is exponentially stable, it turns out that all
the improvement can in practice be obtained by taking the lag equal to 4 to 5
times the dominant time constant of the Kalman filter.
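The augmentation of Fig. 7.2 can be sketched in a few lines (discrete time, hypothetical scalar model). Since the Kalman gain and error covariances do not depend on the data, the covariance recursion alone suffices to confirm that the lag-N estimate improves on the filtered one:

```python
import numpy as np

# hypothetical scalar model; only the covariance recursion is run
f, h, q, r, p0 = 0.9, 1.0, 0.3, 0.5, 1.0
Nlag, T = 3, 60

n = Nlag + 1                        # state = (x_k, x_{k-1}, ..., x_{k-Nlag})
A = np.zeros((n, n)); A[0, 0] = f
for j in range(1, n):
    A[j, j - 1] = 1.0               # delay line of Fig. 7.2
C = np.zeros((1, n)); C[0, 0] = h
Qm = np.zeros((n, n)); Qm[0, 0] = q

P = np.zeros((n, n)); P[0, 0] = p0
filt_var, lag_var = [], []

for k in range(T):
    S = (C @ P @ C.T).item() + r
    Kg = P @ C.T / S
    P = P - Kg @ C @ P                 # measurement update
    filt_var.append(P[0, 0])           # var of x_k given z_0..z_k
    if k >= Nlag:
        lag_var.append(P[Nlag, Nlag])  # var of x_{k-Nlag} given z_0..z_k
    P = A @ P @ A.T + Qm               # time update

# lagged estimates are never worse than the filtered ones
assert all(lv <= filt_var[k] + 1e-12 for k, lv in enumerate(lag_var))
```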
Fixed interval smoothing. One way to solve the fixed interval smoothing
problem becomes available when the Kalman filter is exponentially stable. Let
Δ correspond to 4 to 5 times the dominant time constant of the Kalman filter.
Then

E[x(t)|z(s), s ∈ [t_0, T]] ≈ E[x(t)|z(s), s ∈ [t_0, t + Δ)]    (7.4)

so long as t + Δ ≤ T. If we define H(t) = 0 for t > T, so that measurements of
z(t) for t > T contain no information about x(t), then (7.4) holds for all t ∈ [t_0, T].
Exact approaches are described in [18-20]. They generally involve running
a Kalman filter forward in time and storing either the filtered estimates or the
measurements. Then those stored estimates are run backwards to obtain the
fixed-interval smoothed estimates. In discrete time, there holds

x̂_{j-1/N} = x̂_{j-1/j-1} + Σ_{j-1/j-1} F'_{j-1} Σ^{-1}_{j/j-1} [x̂_{j/N} - x̂_{j/j-1}]    (7.5)

where the interval in question is [0, N]. Using stored values of x̂_{j/j}, x̂_{j/j-1},
equation (7.5) is implemented backwards in time, thus j = N, N - 1, N - 2, ....
Similarly,

Σ_{j-1/N} = Σ_{j-1/j-1} + Σ_{j-1/j-1} F'_{j-1} Σ^{-1}_{j/j-1} [Σ_{j/N} - Σ_{j/j-1}] Σ^{-1}_{j/j-1} F_{j-1} Σ_{j-1/j-1}    (7.6)
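The backward recursions (7.5)-(7.6) can be checked against brute-force Gaussian conditioning on a hypothetical scalar model; the backward pass reproduces E[x_j|z_0, ..., z_N] exactly:

```python
import numpy as np
rng = np.random.default_rng(2)

# hypothetical scalar model x_{j+1} = f x_j + w_j, z_j = h x_j + v_j
f, h, q, r, p0 = 0.9, 1.0, 0.3, 0.5, 1.0
N = 25
xs = [rng.normal(0.0, np.sqrt(p0))]
for _ in range(N):
    xs.append(f * xs[-1] + rng.normal(0.0, np.sqrt(q)))
zs = np.array([h * x + rng.normal(0.0, np.sqrt(r)) for x in xs])

# forward Kalman filter, storing x_{j/j}, x_{j/j-1}, S_{j/j}, S_{j/j-1}
xf, xp, Pf, Pp = [], [], [], []
xhat, P = 0.0, p0
for z in zs:
    xp.append(xhat); Pp.append(P)           # predicted quantities
    Kg = P * h / (h * h * P + r)
    xhat = xhat + Kg * (z - h * xhat)
    P = P - Kg * h * P
    xf.append(xhat); Pf.append(P)           # filtered quantities
    xhat, P = f * xhat, f * f * P + q       # predict (last one unused)

# backward pass, equations (7.5)-(7.6)
xs_s, Ps = xf[:], Pf[:]
for j in range(N, 0, -1):
    Aj = Pf[j - 1] * f / Pp[j]              # S_{j-1/j-1} F' S_{j/j-1}^{-1}
    xs_s[j - 1] = xf[j - 1] + Aj * (xs_s[j] - xp[j])
    Ps[j - 1] = Pf[j - 1] + Aj * (Ps[j] - Pp[j]) * Aj

# brute-force check: condition the joint Gaussian of (x, z) directly
n = N + 1
Cx = np.zeros((n, n))
Pk = p0
for i in range(n):
    Cx[i, i] = Pk
    for j in range(i + 1, n):
        Cx[i, j] = Cx[j, i] = f ** (j - i) * Pk   # Cov(x_i, x_j)
    Pk = f * f * Pk + q
Cz = h * h * Cx + r * np.eye(n)             # Cov(z)
xsm = (h * Cx) @ np.linalg.solve(Cz, zs)    # E[x_j | z_0..z_N], all j

assert np.allclose(xsm, np.array(xs_s))     # matches the RTS-type pass
assert all(ps <= pf + 1e-12 for ps, pf in zip(Ps, Pf))
```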

8 Miscellaneous Developments
Nonlinear Kalman filter. Following the success of the (linear) Kalman filter, it
became natural to try to extend the ideas to nonlinear systems. In one thrust,
see e.g. [21-22], the aim was to provide equations for the evolution of the
conditional probabilities p(x(t)|z(s), s < t). Of course, describing the evolution
of a function is much more complicated than describing the evolution of a finite
dimensional quantity, as constituted for example by a mean and variance. Most
practical applications of the nonlinear Kalman filter have therefore sought to
follow the approach of describing the evolution of the conditional mean and
error covariance. The catch is that approximations are inevitably involved, as
linearization underpins virtually all the suggested schemes. In reference [23],
the phase-locked loop for demodulation of angle modulated signals is derived
using such ideas: this is a real demonstration of their efficacy.
Computational issues. Attempts to implement the Kalman filter on practical
problems soon showed that, where there was a choice of analytically equivalent
procedures, numerical issues could play a major role in determining the utility
of algorithms, especially for discrete-time problems, see e.g. [5] for a summary.
Key ideas advanced included the following:
(a) The information filter; one works by updating the inverses of the covariance
matrices, and modifies the filter equation.
(b) Sequential processing; one processes a vector of output measurements one
at a time, or in blocks.
(c) Square root filtering; one works with the square roots of the covariance
matrices, rather than the matrices themselves.
(d) Prevention of filter divergence; to avoid gain matrices becoming too small,
artificial increase of the covariance of the process noise, {w_k} or w(·), or
exponential forgetting of old data, is introduced.
(e) Chandrasekhar-type algorithms; computational simplifications arise when
the defining matrices of the signal model are time-invariant.
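Item (c) can be illustrated concretely. In a square-root measurement update one never forms the covariance itself: an orthogonal transformation (here obtained from a QR factorization) triangularizes a pre-array built from the square roots, and a block of the result is the updated square root. A sketch with hypothetical matrices, checked against the conventional update:

```python
import numpy as np
rng = np.random.default_rng(3)

# hypothetical covariance data (measurement z = Hm x + v in this sketch)
n, p = 4, 2
M = rng.normal(size=(n, n)); Pm = M @ M.T     # prior covariance
Hm = rng.normal(size=(p, n))
Rm = 0.5 * np.eye(p)

S = np.linalg.cholesky(Pm)                    # P = S S'
Rh = np.linalg.cholesky(Rm)

# triangularize the pre-array; the (2,2) block is the updated root
A = np.block([[Rh, Hm @ S], [np.zeros((n, p)), S]])
L = np.linalg.qr(A.T)[1].T                    # A A' = L L', L lower triangular
Splus = L[p:, p:]

# conventional (non-square-root) update, for comparison
Sinn = Hm @ Pm @ Hm.T + Rm
Pplus = Pm - Pm @ Hm.T @ np.linalg.solve(Sinn, Hm @ Pm)

assert np.allclose(Splus @ Splus.T, Pplus)
```

The numerical attraction is that Splus, unlike Pplus, can never lose symmetry or positive semidefiniteness through rounding.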
Controller design. For high order, linear multivariable systems, the design of
controllers is a major task. One of the main tools is the linear-quadratic-Gaussian (LQG) design method, and a key component of such a design is the
inclusion of a Kalman filter. The filter produces estimates x̂(t) of the plant state
x(t), and they are fed back to the plant input via an optimal linear state feedback
law. For a recent treatment of LQG design, see [24].
Adaptive filtering. A critical assumption in applying Kalman filtering techniques
is that the signal model is known. Since inaccurate model knowledge (which
can certainly be the case in practice) may lead to poor estimation or prediction,
there is motivation to "tune" or "select" filter parameters on line. The key idea
of adaptive Kalman filtering is to monitor the variance of the innovations and
to tune or select the Kalman gain, and perhaps other parameters, to reduce
this. When the innovations sequence is white, and consequently of minimum
variance, there is guaranteed optimality. A second approach is to work with
signal models such that the original unknown model parameters are states, and
apply Kalman filtering to estimate the parameters. These parameter estimates
can then be used to tune the original Kalman filter. This coupled Kalman filter
arrangement is discussed in [5].


9 Conclusions
Though the preceding sections have solely discussed theoretical issues, we should
note the great practical importance of the Kalman filter. Applications in tracking
and guidance abound, and as noted in the preceding section, the Kalman filter
is a major constituent of many controller designs. In truth, it represents one of
the major post-war advances of engineering science.
References
[1] R.E. Kalman, "A new approach to linear filtering and prediction problems", J Basic Eng, Trans ASME, Series D, Vol 82, March 1960, pp 35-45
[2] R.E. Kalman and R.S. Bucy, "New results in linear filtering and prediction theory", J Basic Eng, Trans ASME, Series D, Vol 83, March 1961, pp 95-108
[3] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, MIT Press, Cambridge, Mass, 1949
[4] H.W. Bode and C.E. Shannon, "A simplified derivation of linear least square smoothing and prediction theory", Proc IRE, Vol 38, April 1950, pp 417-425
[5] B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice Hall, Inc, Englewood Cliffs, NJ, 1979
[6] V. Kucera, "The discrete Riccati equation of optimal control", Kybernetika, Vol 8, 1972, pp 430-447
[7] R.E. Kalman, "Contributions to the theory of optimal control", Bol Soc Matem Mex, 1960, pp 102-119
[8] B.D.O. Anderson and J.B. Moore, "Detectability and stabilizability of discrete-time linear systems", SIAM J on Control & Optimization, Vol 19, 1981, pp 20-32
[9] P. Swerling, "A proposed stagewise differential correction procedure for satellite tracking and prediction", J Astronaut Sci, Vol 6, 1959, pp 46-59
[10] P. Faurre, M. Clerget, and F. Germain, Operateurs rationnels positifs, Dunod, Paris, 1979
[11] B.D.O. Anderson and S. Vongpanitlerd, Network Analysis and Synthesis, Prentice Hall, Inc, Englewood Cliffs, NJ, 1973
[12] T. Kailath, "A view of three decades of linear filtering theory", IEEE Trans Info Theory, Vol IT-20, March 1974, pp 146-181
[13] B.D.O. Anderson, J.B. Moore, and S.G. Loo, "Spectral factorization of time-varying covariance functions", IEEE Trans Info Theory, Vol IT-15, September 1969, pp 550-557
[14] L.E. Zachrisson, "On optimal smoothing of continuous-time Kalman processes", Information Sciences, Vol 1, 1969, pp 143-172
[15] W.W. Willman, "On the linear smoothing problem", IEEE Trans Auto Control, Vol AC-14, February 1969, pp 116-117
[16] J.B. Moore, "Discrete-time fixed-lag smoothing algorithms", Automatica, Vol 19, March 1973, pp 163-174
[17] S. Chirarattanon and B.D.O. Anderson, "Stable fixed-lag smoothing of continuous-time processes", IEEE Trans Info Theory, Vol IT-20, January 1974, pp 25-36
[18] D.C. Fraser and J.E. Potter, "The optimum linear smoother as a combination of two optimum linear filters", IEEE Trans Auto Control, Vol AC-14, August 1969, pp 387-390
[19] J.E. Wall, A.S. Willsky, and N.R. Sandell, "The fixed-interval smoother for continuous-time processes", Proc 19th IEEE Conference on Decision and Control, 1980, pp 385-389
[20] H.E. Rauch, "Solutions to the linear smoothing problem", IEEE Trans Auto Control, Vol AC-8, October 1963, pp 371-372
[21] E. Wong, Stochastic Processes in Information and Dynamical Systems, McGraw-Hill Book Co., New York, 1971
[22] R.J. Elliott, Stochastic Calculus and Applications, Springer Verlag, New York, 1982
[23] D.L. Snyder, The State-Variable Approach to Continuous Estimation, MIT Press, Cambridge, Mass., 1969
[24] B.D.O. Anderson and J.B. Moore, Optimal Control: Linear-Quadratic Methods, Prentice Hall, Englewood Cliffs, NJ, 1989

From Kalman Filtering to Innovations,
Martingales, Scattering and Other Nice Things*
T. Kailath
Department of Electrical Engineering, Stanford University, Stanford, CA, 94305-4055, USA

This paper is an account of the development of several researches inspired by Kalman's
seminal work on linear least-squares estimation for processes with known state-space models.

1 Introduction
I first met Rudy Kalman in October 1960 at a conference in Santa Monica,
California organized by Richard Bellman. Rudy spoke about the theory of
optimal control and the calculus of variations [1], while my paper was on
Gaussian signal detection problems in which the likelihood ratios were expressed
in terms of "smoothed" (noncausal) least-squares estimates [2]. I assume Rudy
told me then about his paper on discrete-time state-space estimation [3] and
the continuous-time paper with R. Bucy [4]. However I knew nothing about
state equations and was not particularly interested in recursive estimates;
moreover the papers [3]-[4] stated the determination of smoothed estimates
as being more complicated and still unsolved. So while Rudy and I met again
at MIT and in fact explored my spending the summer of 1961 with him at
RIAS (which did not happen because I celebrated my graduation after four
years at MIT by a visit home to India), it was not until the mid-sixties that I
began to study his papers. The motivation was that in certain feedback
communication schemes (see, e.g. [5], [6]) and in a new Gaussian signal
detection formula of Schweppe [7] based on causal least-squares estimates,
recursive estimates were important. So with the help of some graduate students
(J. Omura, B. Gopinath and P. Frost) I began the study of state-space systems
and Kalman filters. That was indeed a fortunate occurrence, because in one
way or the other for the last quarter century a significant part of my research
has been influenced by Kalman's work on system theory. It is therefore a special

* This work was supported in part by the U.S. Army Research Office under Contract DAAL03-89K-0109 and the Air Force Office of Scientific Research, Air Force Systems Command under Contract
AF88-0327. This manuscript is submitted for publication with the understanding that the US
Government is authorized to reproduce and distribute reprints for Government purposes
notwithstanding any copyright notation thereon.


pleasure for me to contribute an article to this special volume. Limitations of


time and space prevent an extensive presentation of the various results that
arose from these studies, and in any case most of them have been documented
in the literature. On the other hand, I thought it might be interesting to describe
how some of these results grew out of my communications and signal processing
background, unlike the state-space based education of most of the control
theorists in this field. Hopefully this might encourage others to look across the
borders of their own fields not only for greener pastures beyond, but for a
fruitful symbiosis of different disciplines. Another objective is to illustrate again
the hesitant and uneven progress of most research and the role of chance and
luck therein. 1

2 The Kalman-Bucy Filter


For someone with a communications background, least-squares estimation is
first encountered via the Wiener theory (e.g. as in [8], which I had studied in
a prepublication version in my first course at MIT in the Fall of 1957). Two
salient facts here are that the smoothed (or "noncausal") estimates are easily
obtained via Fourier transformation, while filtered (or "causal") estimates are
more difficult. They are obtained via the beautiful spectral factorization
technique of Wiener and Hopf for solving the famous integral equation named
after them:

h(t) + ∫_0^∞ h(τ)K(t - τ)dτ = K(t),   t ≥ 0    (1)

The solution h(·) can be shown to determine the linear least-squares estimate,
say ẑ(·), of the values of a zero-mean stationary stochastic process z(·) from
noisy measurements:

ẑ(t) = ∫_0^∞ h(τ)y(t - τ)dτ    (2)

where

y(τ) = z(τ) + v(τ),   -∞ < τ < t    (3a)

with v(·) being a white noise process uncorrelated with z(·),

Ev(t)v(s) = δ(t - s),   Ev(t)z(s) ≡ 0    (3b)

and K(·,·) being the covariance function of z(·):

K(t - s) = Ez(t)z(s)    (3c)

1 For as the preacher said in Ecclesiastes, "The race is not to the swift, nor the battle to the strong,
neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men
of skill; but time and chance happeneth to them all."


The equation (1) follows from the so-called orthogonality condition

z(t) - ẑ(t) ⊥ y(τ),   -∞ < τ < t

where a ⊥ b means that Eab = 0.


While the Wiener-Hopf spectral factorization technique is very elegant, it
relies on properties of bilateral Laplace transforms and complex functions not
familiar to many engineers. A more favored solution method is based on the
use of causal and causally invertible whitening filters to replace the process y(·)
by a white noise process, say ν(·), from which the least-squares estimate ẑ(·) is
readily obtained without the need for solving any integral equation. This
approach was beautifully presented by Bode and Shannon [9], and in the
lesser-known (but more general) paper of Zadeh and Ragazzini [10]. (It was
only realized much later that Kolmogorov's solution (1939), (1941), contemporaneous with Wiener's, of the discrete-time least-squares problem was based on
the same idea.)
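In discrete time the whitening idea reduces to triangular factorization of the covariance matrix: the Cholesky factor is a causal, causally invertible filter, and its inverse whitens the observations. A small sketch with a hypothetical covariance:

```python
import numpy as np
rng = np.random.default_rng(4)

# hypothetical covariance of n samples of the observed process y = z + v
n = 6
B = rng.normal(size=(n, n))
Ry = B @ B.T + np.eye(n)            # positive definite

L = np.linalg.cholesky(Ry)          # causal (lower-triangular) factor
W = np.linalg.inv(L)                # causal whitening filter

assert np.allclose(W @ Ry @ W.T, np.eye(n))   # whitened output
assert np.allclose(np.triu(W, 1), 0)          # W is causal
assert np.allclose(np.triu(L, 1), 0)          # and causally invertible
```

Lower-triangularity is what "causal" means here: each whitened sample depends only on past and present observations, which is the Gram-Schmidt orthogonalization underlying Kolmogorov's solution.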
Over a decade or so, attempts were made to extend Wiener's results to
nonstationary processes z(·), now only observed over finite-time intervals. Such
problems lead to equations of the form

h(t, s) + ∫_0^t h(t, τ)K(τ, s)dτ = K(t, s),   0 ≤ s ≤ t ≤ T    (4)

They are said to be of Wiener-Hopf type, because of the (half-plane) constraints
0 ≤ s < t, but no general spectral (or rather covariance) factorization type of
solution was available. Kalman [3] and Kalman and Bucy [4] provided a
breakthrough by replacing the assumption of a known covariance function
K(·,·) by the assumption that the nonstationary process had a state-space
description

ẋ(t) = F(t)x(t) + G(t)u(t),   z(t) = H(t)x(t),   t ≥ 0    (5a)

where F(·), G(·), H(·) are known n × n, n × m and p × n matrices, while x(0), u(·)
and v(·) are zero-mean random quantities with

Eu(t)x*(0) ≡ 0,   Ev(t)x*(0) ≡ 0,   Ex(0)x*(0) = Π_0    (5b)

E{[u(t); v(t)][u*(s) v*(s)]} = [Q(t) 0; 0 I] δ(t - s)    (5c)

The star denotes (Hermitian) transpose and the matrices Π_0 and Q(·) are also
assumed to be known. We also remark that for vectors, we shall take a ⊥ b to
mean E[ab*] = 0. [It is perhaps not as well-known as it should be that inner
products can be allowed to be matrix-valued.] Now if we define

K_xy(t, s) = E[x(t)y*(s)] = E[x(t)x*(s)]H*(s)    (6)

and

x̂(t) = ∫_0^t h_xy(t, s)y(s)ds    (7)


then the orthogonality condition

E[(x(t) - x̂(t))y*(s)] = 0,   0 ≤ s < t ≤ T

leads to the equation (also of Wiener-Hopf type)

h_xy(t, s) + ∫_0^t h_xy(t, r)K_z(r, s)dr = K_xy(t, s),   0 ≤ s < t    (8)

What Kalman and Bucy did in their celebrated paper [4] was to note that the
state-space assumptions (5) implied, after some calculation, that

∂h_xy(t, s)/∂t = [F(t) - h_xy(t, t)H(t)]h_xy(t, s)    (9)

Then differentiating the integral expression in (7) gives

(d/dt)x̂(t) = ∫_0^t [F(t) - h_xy(t, t)H(t)]h_xy(t, s)y(s)ds + h_xy(t, t)y(t)
           = F(t)x̂(t) + h_xy(t, t)[y(t) - H(t)x̂(t)],   x̂(0) = 0    (10)

They then introduced the variance matrix of the error,

P(t) = E[x(t) - x̂(t)][x(t) - x̂(t)]* = E[x(t) - x̂(t)]x*(t)    (11)

and noted that

P(t)H*(t) = E[x(t) - x̂(t)]x*(t)H*(t) = K_xy(t, t) - ∫_0^t h_xy(t, r)H(r)K_xy(r, t)dr    (12)

where the last equality follows from the defining integral equation (8). Therefore
we have

h_xy(t, t) = P(t)H*(t)    (13)

It is customary to redefine h_xy(t, t) as

h_xy(t, t) = K(t),   the (Kalman) gain function,

so that (10) becomes

(d/dt)x̂(t) = F(t)x̂(t) + K(t)[y(t) - H(t)x̂(t)],   x̂(0) = 0    (14)

Finally some simple calculations, e.g. by working with the differential equation
for the state error x̃(t) = x(t) - x̂(t), show that P(·) obeys a nonlinear matrix
differential equation of Riccati type

(d/dt)P(t) = F(t)P(t) + P(t)F*(t) + G(t)Q(t)G*(t) - P(t)H*(t)H(t)P(t),   P(0) = Π_0    (15)


The equations (14), (15) are the celebrated Kalman-Bucy filtering equations,
which by now have probably launched more reports and papers than Helen's
face ever did ships.
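Equations (14)-(15) are easy to exercise numerically. The sketch below (hypothetical time-invariant data, crude Euler integration) integrates the Riccati equation (15) and confirms convergence to its algebraic steady state:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# hypothetical time-invariant data for (15) (recall Ev(t)v*(s) = I delta here)
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])           # p x n, as in (5a)
Q = np.array([[1.0]])
P = np.eye(2)                        # P(0) = Pi_0

def Pdot(P):
    # (15): dP/dt = FP + PF* + GQG* - PH*HP
    return F @ P + P @ F.T + G @ Q @ G.T - P @ H.T @ H @ P

dt = 1e-3
for _ in range(20000):               # crude Euler integration to t = 20
    P = P + dt * Pdot(P)

# the transient dies out and P approaches the algebraic Riccati solution
Pinf = solve_continuous_are(F.T, H.T, G @ Q @ G.T, np.eye(1))
assert np.allclose(P, Pinf, atol=1e-4)
```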
Brief Historical Comment

The paper [4] of Kalman and Bucy explains the historical background of their
work, based on reports by Follin and Carlton (1956), Hanson (1957) and Bucy
(1959), and Kalman's independent discrete-time solution [3]. The other authors
were motivated by attempts to obtain a practical solution to a nonstationary
tracking problem by starting with an estimating filter in state-space form and
then choosing the coefficients to minimize the mean-square error. In conversation with me, and as evidenced by his paper [3], Kalman stated that
he was not directly influenced by any practical application, but was motivated
by the wish to use in stochastic problems the state-space descriptions that he
had been a pioneer in advocating for studies of deterministic linear (control)
systems. If I may say so, here is another example, if more are needed, of the
value of pursuing research based on challenge and vision, rather than immediate
relevance. However, in this case, practical applications were very much at hand,
with the launching of Sputnik and the space age in 1957. Therefore the work
of Kalman and Bucy was immediately picked up by engineers at NASA (e.g.
S. Schmidt, G. Smith), the Draper Laboratories of MIT (R. Battin, J. Potter)
and elsewhere. Moreover it should not be surprising that closely related results
were being obtained by others at about the same time. One of the most relevant
was a paper by P. Swerling (1959) on a stage-wise smoothing procedure for
satellite observations. Anyone familiar with the famous stories of Gauss'
calculations of the orbits of asteroids might expect that his name should resurface
in this subject, and indeed it does; H.H. Rosenbrock wrote a note on this in
1965 and Y. Genin (1970) wrote a careful exposition of the connection. Finally,
very related results in continuous time were obtained by the Soviet physicist
R.L. Stratonovich [11], whose goal was to show how going to the study of
Markov processes rather than Gaussian processes enabled (communications)
engineers to go beyond linear detection and estimation problems to nonlinear
and recursive solutions. The connection of their models to Gauss-Markov
processes was not explicitly mentioned by Kalman and Bucy in [3] and
[4], though they could not have been unaware of it. On the other hand,
Stratonovich's approach was to explicitly seek to generalize the Fokker-Planck
equation for Markov density functions to obtain nonlinear filtering equations
for the conditional probability density (anticipating similar but later studies by
Kushner). Then in studying what he calls the "Gaussian approximation,"
Stratonovich obtains (in a different notation, of course) the basic (conditional
mean and variance) equations (10), (14), (15) of the Kalman filter. For ease of
reference, rather than give separate citations, I might mention that the above-cited papers of Kolmogorov (1939), (1941), Carlton and Follin (1956), Swerling
(1959), and Genin (1970), as well as Kalman [3], Zadeh and Ragazzini [10],


and Stratonovich [11] have been reprinted, along with some others and with
some commentary, in the collection [12]. To avoid misunderstanding, it should
perhaps also be noted that this discussion is inserted not to detract from the
contributions of Kalman and Bucy, but to show the richness and depth of the
problem to which they made such an elegant and influential contribution;
moreover they were able to add important results based on the controllability
and observability concepts recently formalized (and in the case of observability
first introduced) by Kalman, such as the theorem on stability of the filter (see
the discussion at the end of Sect. 4) and the connection via duality with the
linear regulator problem and the Hamiltonian equations (see (28) below).

3 The Innovations Process and Martingales


In the mid-sixties I didn't know of all this related work, except that of
Stratonovich, which had a lot of results on the topic of signal detection and
communications, my chief research interest in those days. With this background,
while the Kalman-Bucy analysis was not hard to follow, for a better personal
understanding I kept thinking about how a Bode-Shannon type of whitening
filter derivation [9]-[10] could be obtained. In retrospect, it was fortunate that
all my attention in that period was focused on continuous-time processes,
because as I discovered much later Kalman's discrete-time solution was in fact
based on a whitening filter approach, which however is much easier in discrete
time (being equivalent to a preliminary Gram-Schmidt orthogonalization).
Another fortunate and equally accidental circumstance was that in 1966-67,
I had taken on as a postdoctoral fellow Dr. J. Martin Clark, who had just
completed a thesis at Imperial College, London, on modelling stochastic
processes via the then very new (to engineers) Itô stochastic integral. My
introduction to this topic was sometime in 1965 when my good friend Moshe
Zakai called me from Berkeley in some excitement to say he wanted to come
down and explain to me how, in work on Markov diffusions, Gene Wong and
he had come upon the need for a new integral in which 2∫_0^t x dx ≠ x²(t) - x²(0).
[In return, I was able to give them a useful reference to a book of Stratonovich;
see the acknowledgment in [13].] The same issue had arisen in nonlinear
state-space estimation, so there was a great deal of discussion at this time about
stochastic integrals, especially the Itô integral and its relation to the so-called
Stratonovich integral obeying the "usual" integration rules. When Martin Clark
wrote enquiring about a postdoctoral position at Stanford, I seized on this as
a good way of learning about this fancy stuff. I also put a Ph.D. student,
T. Duncan, to work on such things, especially to try to rigorously derive, working
exclusively in continuous time, the nonlinear filtering extensions of the
Kalman-Bucy formulas and also some likelihood ratio formulas of Stratonovich
and Sosulin [14] for the detection of non-Gaussian Markov processes in white
Gaussian noise. Duncan studied this detection problem by a method described


in a short note of Bucy [15], but came up with a slightly different formula than
in [14]. When he showed this to me, my discussions with Zakai about stochastic
integrals and ordinary integrals came to mind, and I saw that the transformation
of Duncan's version from Itô to ordinary (now called Stratonovich) integrals
reconciled the two. Later, this confusion between integrals also helped me to
clarify some of Schweppe's likelihood ratio formulas; see [16].
In any case, to return to the linear estimation problem, examination of
the Kalman-Bucy equations led me to the conclusion that the process
y(t) - H(t)x̂(t) was a white-noise process. It is hard to recall now the exact train
of thought, but at the time there was considerable confusion about this fact.
Thus my local experts (Frost and Clark) at first argued that this could not be
true since

y(t) - H(t)x̂(t) = H(t)x̃(t) + v(t)    (16)

and the error x̃(t) had nondiagonal covariance P(·) and was correlated with
the noise v(·). Then as I recall we saw a preprint of a note by L.D. Collins [17],
showing by a detailed calculation using the filter equations (10), (14), (15) that
the process was indeed white.² At about the same time, related proofs were given
by Wonham (1967), Kushner (1967) and Anderson and Moore (1967); detailed
references can be found in [18]. However this was not what I needed. Rather
than using the Kalman-Bucy equations to show this property, I wanted to go
the other way: if one could show that the process y(·) - H(·)x̂(·) was white, and
causally equivalent to the process y(·), then the filtering equations could be
readily obtained by the two-step Bode-Shannon procedure: whitening, followed
by the simple problem of estimation given white noise. In fact, after some effort,
the following result was obtained. Consider a process

y(t) = z(t) + v(t),   0 ≤ t ≤ T    (17a)

with

Ev(t)v*(s) = Iδ(t - s),   Ev(t)z*(s) ≡ 0,   Ez(t)z*(s) = K(t, s)    (17b)

Let

∫_0^T E|z(t)|² dt < ∞    (18)

and let

ẑ(t) = ∫_0^t h(t, s)y(s)ds = the l.l.s.e. of z(t) given {y(s), s < t}

Then by using the Wiener-Hopf type of equation obeyed by h(t, s),

h(t, s) + ∫_0^t h(t, r)K(r, s)dr = K(t, s),   0 ≤ s ≤ t ≤ T    (19)

2 I noticed recently that while in discrete time Kalman in [3, p. 42] states, "the signal after the
first summer is white noise since ỹ(t|t - 1) is obviously [because it was obtained by a Gram-Schmidt
orthogonalization] an orthogonal random process," no similar remark is made in the continuous-time paper [4], presumably because detailed calculations such as those of Collins [17] and others
needed to be made.


it can be shown that the process

ν(·) = y(·) - ẑ(·) = z̃(·) + v(·)    (20)

is white with the same covariance as v(·), i.e. Eν(t)ν*(s) = Iδ(t - s). No state-space model is needed, but in fact if there is one then an easy calculation leads
to the Kalman-Bucy equations (14)-(15) (see, e.g. [18], Sect. 6.5). (We may note
that a one-sided dependence between v(·) and z(·) is also permitted: we can
allow z(·) to be correlated with past v(·), as may happen in feedback communication and control problems.)
The point is that to estimate the states from the white noise process is easy: if

x̂(t) = ∫_0^t g(t, s)ν(s)ds    (21a)

then the orthogonality condition

x(t) - x̂(t) ⊥ ν(r),   0 ≤ r < t

yields

Ex(t)ν*(r) = ∫_0^t g(t, s)Eν(s)ν*(r)ds = g(t, r)

Therefore

x̂(t) = ∫_0^t [Ex(t)ν*(r)]ν(r)dr = ∫_0^t [Ex(t)ν*(r)][y(r) - H(r)x̂(r)]dr    (21b)

This seems circular, but some thought shows that (21) states that x̂(t) is a
function of earlier values of x̂(·). This suggests that a differential equation may
be around, and in fact differentiation of (21) yields

(d/dt)x̂(t) = [Ex(t)ν*(t)]ν(t) + ∫_0^t [(∂/∂t)Ex(t)ν*(r)]ν(r)dr

Recalling that x(·) obeys the differential equation (5) and noting that u(t) ⊥ ν(r),
r < t, leads after a simple calculation to the Kalman-Bucy differential equation
(14). The Riccati equation can now be obtained fairly directly, in several
ways; perhaps the simplest is indicated in [19, footnote 7]. Some people might
be concerned about the lack of rigor in the above, especially in differentiating
(21b). However there is no less rigor used here than in starting with the white-noise driven state equation (5) in the first place. A rigorous derivation using
integrated versions throughout (and the Itô differential rule) is easy to pattern
on the above outline, and was done in [18a]. It should be noted that the rigorous
formulation does make some things simpler, e.g. handling the previously
mentioned one-sided correlation between v(·) and z(·).
Of course, to justify the use of (21a), one has to show that the original process
y(·) can be causally recovered from the process ν(·). Here again my communications background was helpful. I was accustomed to the operator shorthand
form of (20),

ν = (I - h)y    (22a)

This suggests that y can be found as

y = (I - h)^{-1}ν    (22b)

an operation that is meaningful if h is small, in particular if it is less than unity
(in norm). A little search in books on integral equations came up with the
desired result. In F. Smithies, Integral Equations (Cambridge University Press,
1958, p. 34), the causal invertibility of I - h as above is shown to hold if
the kernel h(t, s) of the Volterra integral operator h is square-integrable on
[0, T] × [0, T], a condition that is easy to check is ensured by the assumption
(18) on z(·); see [18, App. II.A].
I should remark that this operator notation was very helpful in suggesting
further important results. For example let us write

y = (I - h)^{-1}ν = (I + k)ν    (23a)

where it is easy to check that k = h + h² + ··· is also Volterra and square-integrable. Then the covariance of y can be written down, in operator form, as

Eyy* = I + K = (I + k)Eνν*(I + k*) = (I + k)(I + k*)    (23b)

which is a covariance factorization formula. Moreover we can write (the
Fredholm resolvent formula)

(I + K)^{-1} = I - H,   say    (23c)

where H(t, s) is square-integrable on [0, T] × [0, T], and obtain the identity

(I - H) = (I + k*)^{-1}(I + k)^{-1} = (I - h*)(I - h)    (23d)

or

H = h + h* - h*h.    (23e)

I noted that H in fact solved the smoothing problem, so that we now could see
that the solution of the (noncausal) smoothing problem was completely
determined by knowledge of the (causal) filtering solution. This was pleasing,
because Kalman [3] and Kalman and Bucy [4] had left smoothing problems
aside as not yielding immediately to their approaches. Smoothing problems are
discussed further in Sect. 4.
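The identities (22)-(23) have transparent finite-dimensional analogs in which h becomes a strictly lower-triangular (causal, Volterra-like) matrix; a sketch with hypothetical data:

```python
import numpy as np
rng = np.random.default_rng(5)

n = 5
h = np.tril(rng.normal(size=(n, n)), -1)   # strictly lower triangular = causal
I = np.eye(n)

# (22b)/(23a): (I - h)^{-1} = I + k with k = h + h^2 + ... (a finite sum
# here, since a strictly lower-triangular matrix is nilpotent)
k = sum(np.linalg.matrix_power(h, j) for j in range(1, n))
assert np.allclose(np.linalg.inv(I - h), I + k)

# (23d)-(23e): I - H = (I - h*)(I - h) with H = h + h* - h*h, i.e. the
# (noncausal) smoother is built from the causal filter and its adjoint
Hs = h + h.T - h.T @ h
assert np.allclose(I - Hs, (I - h.T) @ (I - h))
```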
The operator identities (22)-(23) have a lot of other implications: for example,
the Kalman-Bucy formulas implicitly also specify the function h(t, s). [In fact,
h(t, s) = H(t)Φ(t, s)K(s) = H(t)Φ(t, s)P(s)H*(s), with Φ(·,·) the transition matrix
of the filter.] Therefore it must be true that Fredholm and Wiener-Hopf-type
integral equations with kernels K(t, s) arising from state-space models can be
solved via Riccati differential equations. This led to various interesting results
that we do not have space to elaborate here;

64

T. Kailath

I should, however, say that it led me to a valuable connection with Professor
Alan Schumitzky of USC, who has written many interesting papers on
factorization of operators, integral equations and Riccati equations, etc. I might
also mention here an early review paper [19] that will give an indication of
the wide range of ideas and connections growing out of the identification of
y(·) - H(·)x̂(·) as the innovations process of y(·), i.e. not just the fact that the
process was white as noted by Collins [17] and others around 1968, but that
this result did not depend upon having the Kalman-Bucy solution and also
that y(·) could be causally recovered from ν(·). In the state-space context, the
invertibility amounted, as noted in [18, App. IID], to rewriting the
Kalman-Bucy filter equation (14) in the form

    dx̂/dt = Fx̂ + Kν,    x̂(0) = 0
    y = Hx̂ + ν    (24)

which immediately provided a (causal and causally invertible) innovations model
for the process y(·). This model is parametrized by the n x p elements of the
gain function K(·), as compared to the (n x m + m x m + n²) elements in
{Q, G, Π₀}. (We assume {F, H} fixed, for various reasons; one is that they are
readily identified from the covariance function of y(·), unlike {G, Q, Π₀}.) This
innovations representation has since been widely used in system identification
and adaptive Kalman filtering problems (see, e.g. Mehra [20]).
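A discrete-time sketch of the whitening property (all model constants here are illustrative assumptions): the Kalman filter's innovation variances are exactly the squared diagonal of the Cholesky factor of the measurement covariance, i.e. the filter triangularizes, and hence whitens, the observation process:

```python
import numpy as np

# Scalar discrete-time model (constants illustrative):
#   x_{t+1} = f x_t + w_t,  y_t = h x_t + v_t,
#   var w = q, var v = r, var x_0 = pi0, all mutually uncorrelated.
f, h, q, r, pi0, T = 0.9, 1.0, 0.5, 1.0, 2.0, 6

# State covariances Px_t and the full measurement covariance Sigma_y.
Px = np.zeros(T)
Px[0] = pi0
for t in range(T - 1):
    Px[t + 1] = f**2 * Px[t] + q
Sigma_y = np.empty((T, T))
for i in range(T):
    for j in range(T):
        lo, hi = min(i, j), max(i, j)
        Sigma_y[i, j] = h**2 * f**(hi - lo) * Px[lo] + (r if i == j else 0.0)

# Kalman filter (predicted-error form): innovation variances S_t.
P, S = pi0, np.empty(T)
for t in range(T):
    S[t] = h**2 * P + r
    P = f**2 * P + q - (f * P * h)**2 / S[t]

# The innovations representation triangularizes Sigma_y, so S_t must
# equal the squared diagonal of its Cholesky factor.
L = np.linalg.cholesky(Sigma_y)
assert np.allclose(S, np.diag(L)**2)
print("innovation variances match the Cholesky diagonal")
```

This is the finite-dimensional shadow of the covariance factorization (23b): the unit-lower-triangular Cholesky factor plays the role of (I + k).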
Here we shall go on to describe another very major set of ideas growing
out of the innovations result for linear filtering.

The Appearance of Martingales


From my studies with Paul Frost and Martin Clark on stochastic integrals, we
knew about a famous theorem of Lévy [Doob, Stochastic Processes, Wiley 1953;
Ch. 7, Thm. 11.9] that a martingale process³ with variance t is a Brownian
motion (or Wiener) process. Now if the process v(·) in (16) is Gaussian and the
estimate ẑ(·) in (19) is taken (not as the linear but) as the (nonlinear) least-squares
estimate, viz. as a conditional expectation

    ẑ(t) = E[z(t)|y(τ), τ < t]

then it is easy to see that the integral of the process ν(·) is a martingale process.
Moreover the fact that ν(·) is white means that its integral has variance t.
Therefore Lévy's theorem applied, and we immediately had a surprising result:
if v(·) is white and Gaussian and independent of past values of z(·), then whether
z(·) is Gaussian or not, the process ν(·) = y(·) - ẑ(·), ẑ(·) being the (generally

³ Roughly, I(·) is a martingale if E[I(t)|{y(τ), τ ≤ s}] = I(s), s < t (see also footnote 4).

2 Kalman Filtering-From Kalman Filtering to Innovations

65

nonlinear) least-squares estimate of z(·) given past y(·), is itself a white Gaussian
noise process with the same covariance as v(·).⁴
This was such a striking result that Paul Frost abandoned an almost
complete dissertation on nonlinear estimation (using the then traditional
Bayes-rule based approaches of Kushner and others) and started to develop an
innovations approach. The name came from the fact that we could write

    ν(t) ≜ y(t) - ẑ(t) = y(t) - ŷ(t|t-)

which displayed it as the "innovation" or the "new" information in y(t) given
all the past {y(s), s < t}. [The symbol ν(·) was chosen as a small pun on the
American pronunciation of "new".]
However to complete the argument, one had to show the causal equivalence
of ν(·) and y(·). Now as before we can write the relation between ν(·) and y(·)
in operator form, ν = (I - h)y, but unlike the Gaussian or linear case, h in
general is a highly nonlinear Volterra operator and no ready-made invertibility
results are at hand. In his dissertation, Paul Frost thought he had a proof, but
(mea culpa as his adviser by then) we had overlooked a reversed inequality at
an early stage in the proof. Our embarrassment was somewhat mitigated by
the fact that several other investigators, much more knowledgeable about
stochastic integrals and martingales, were to try for many years to establish
the equivalence. Partial results were obtained by Clark and by Benes [21a] but
the general result was only proved (under the assumption (18), and also complete
independence of z(·) and v(·)) in 1981 by Mitter and Allinger [21b], using
powerful results on stochastic differential equations developed well after 1968!
The lucky fact that the right mathematical tools had just recently been
developed was also important in a more successful application of the Gaussian
nature of the innovations. Thus note that the innovations property suggests
that the signal detection problem of choosing between the hypotheses

    H₁: y(t) = z(t) + v(t),    H₀: y(t) = v(t)

may be "replaced" by one of choosing between

    Ĥ₁: y(t) = ẑ(t) + ν(t),    Ĥ₀: y(t) = ν(t)

But this is in the form of a "(conditionally) known signal ẑ(·) in white Gaussian
noise problem," which suggests that the likelihood ratio (Radon-Nikodym
derivative) can be written in a well-known (matched filter or correlation-detection)
form

    dP₁/dP₀ = exp[ ∫₀ᵀ ẑ(t)y(t)dt - (1/2)∫₀ᵀ ẑ²(t)dt ]    (25)

⁴ It is clear that some care is needed here: if v(·) and ν(·) are both Gaussian and have the same
covariance, why are they not identical? The fact is that the integral of ν(·) is a martingale with
respect to the (smallest) family of sigma-fields generated by the variables {y(τ), τ ≤ t, 0 ≤ t ≤ T},
while the integral of v(·) is a martingale not with respect to this family of sigma-fields but with
respect to a larger (underlying) family of sigma-fields. The interested reader can pursue the details
in books such as [36], [37] (see also [32]).


However since ẑ(·) is random (being known only in the conditional sense that
ẑ(t) is determined by the past observations {y(τ), τ < t}), there is a question
as to the interpretation of the first integral in (25). It turns out that in order
to reduce to previously known results for z(·) of various forms (e.g.
z(t) = cos(2πt + θ), where θ is uniformly distributed over (-π, π)), the integral
had to be taken in the Itô sense. This led to various interesting calculations,
described in the paper [22a]. The result (25) has had various theoretical and
practical applications; in particular, it provides a general structure for the
optimal detector which can be a guide to simple suboptimal implementations
(see, e.g. [23]).
On the theoretical side however the original "proof" in [22a] was not
satisfactory (though it did bring in, apparently for the first time in such
applications, a powerful theorem of a Soviet mathematician, Girsanov [24],
which has since become a major tool in stochastic control theory). The
achievement of a rigorous proof came about in an interesting way. I had met
Professor T. Hida of Nagoya University in Japan at a statistics conference in
the U.S. and he invited me to visit him on my travels between California and
India. Hida had written a basic paper on canonical representations of Gaussian
processes, which was related to my studies of the problem of "singular" detection
(related to absolute continuity of measures). The first of these visits was in 1968,
soon after (25) had been conjectured. Hida introduced me to a student of his,
M. Hitsuda, working in a nearby university, who gave me a preprint of a paper
soon to appear in the Osaka Journal of Mathematics [25]. Hitsuda's paper
established conditions under which a Gaussian process could be related to a
Wiener process in a causal and causally invertible fashion, using new martingale
and stochastic integral methods developed in 1966-67 by H. Kunita and
S. Watanabe [26]. One consequence of Hitsuda's work was an alternative proof
(to the one in [18] using integral operators) in the Gaussian case of the
equivalence of the innovations and the original process {y(·)}. More useful to
me were the facts that Kunita was also at Nagoya and that from discussions
with Hitsuda and him I learned enough about the new martingale methods to
obtain a simple and rigorous proof of (25) [22b]. I also enlisted their help in
getting a correct proof of our conjectured causal equivalence of ν(·) and y(·) in
the non-Gaussian case. They did not succeed in this, but Kunita and his student
Fujisaki, joined later by Kallianpur on receipt of a preprint from them,
circumvented the equivalence result by showing that even without it, functionals
of y(·) could be written as stochastic integrals with respect to ν(·), but with
integrands that depended on y(·)-see [27]; with this representation, the rest
of the arguments proceed essentially as in the linear case.
However while the general nonlinear filtering analogs of the linear Kalman-
Bucy results are thus obtained in a conceptually simple way, this very simplicity
showed clearly that the nonlinear problem is in general impossible to solve
exactly. The difficulty is that the equation for the conditional mean has terms
dependent on higher-order moments, and so also for the conditional variance,
conditional third moment and so on. There appears to be no simple way of
satisfactorily truncating this infinite ascending chain of equations, which is also
encountered in other fields such as turbulence. While attempts continue, e.g.
using ideas from Lie group theory (see, e.g. the survey volume [28]), to tame
the problem of recursive nonlinear filtering, I remain skeptical. The nonlinear
state-space model may not be the appropriate one; more progress may come
by returning to the Wiener-Volterra expansion approach, especially since the
huge computational burdens can be alleviated by the successes of modern
integrated circuit technology. In the meantime, considerable success has been
obtained by using (properly tuned) "extended" Kalman filters (see, e.g. the special
applications volume [29]).
Realization of such difficulties led me to return to the linear theory, though
not without first exploring some other pleasing consequences of the links already
made with martingale theory. A step in increasing the generality of the results
was to use more advanced (local) martingale theory to establish the Gaussianness
of the innovations under the condition E|z(t)| < ∞, which is weaker than (18)
[30]. However more generality is possible. Realizing the fundamental
importance of martingale theory in these studies, I took the opportunity
provided by attendance at the 1972 Paris IFAC Congress⁵ to journey to the
little Black Forest town of Freiburg to try to meet the then world's leading expert
on martingales, Paul-André Meyer. Meyer was good enough to see me on a
Sunday morning and to listen patiently to the ideas I was groping towards. A
preprint sent to me a few weeks later showed that Meyer had clearly seen the
main point and immediately recast it in a more general framework-he related
the innovations to the concept the martingale theorists called "dual predictable
projections." I won't attempt to explain this here, but it may be amusing to
quote from the first paragraph of Meyer's paper [32]: "Aux gens qui disent que
les probabilités sont une branche des mathématiques appliquées, nous
répondons depuis des années que les probabilités que nous [i.e. the Strasbourg
school] faisons, au moins, ne peuvent servir à rien. Il faut se détromper: la
solution de certains problèmes posés par les ingénieurs exige maintenant une
partie de l'arsenal de la 'théorie générale des processus.'" [Roughly: to those
who say that probability is a branch of applied mathematics, we have been
replying for years that the probability we do, at least, can be of no use to
anyone. One must think again: the solution of certain problems posed by
engineers now requires part of the arsenal of the "general theory of processes."]

Filtering for Point Processes


Another important consequence of bringing martingales into the estimation
problem was the rapid progress made in tackling discontinuous processes,
arising from Poisson and other non-Gaussian independent increment processes
whose "rate" was modulated by some other unobserved process whose values
we wished to estimate. As with nonlinear Gaussian filtering, the first results
here were obtained via discretization and the use of Bayes' rule, see, e.g. Yashin
(1970), Frost (1971), Snyder and others cited for example in the book [33]. These
authors noted that though the models they studied contained what one might
think of as a multiplicative noise, their results on signal estimation and detection
were remarkably similar to earlier results for signals in additive noise. The
martingale formulation of these randomly modulated jump processes provides
a simple explanation. In the additive case, y(t) = z(t) + v(t) implied that
y(t) - ŷ(t|t-) was a white Gaussian noise process, or equivalently, the derivative
of a Wiener (martingale) process. So also if N(·) is a Poisson process whose
rate λ(t) is itself random (sometimes called a doubly stochastic Poisson process),
it turns out that N(t) - ∫₀ᵗ λ(τ)dτ is again a (discontinuous) martingale process.
So we again have a signal (λ(·)) plus white noise (derivative of the martingale
process) model! This fact was perhaps first noted by P. Bremaud in a 1972
Ph.D. thesis at UC Berkeley done under E. Wong and P. Varaiya. Further
results were obtained by a student of mine, A. Segall (Ph.D., 1973) and several
others (M.H.A. Davis, R. Boel, J. van Schuppen, P. Varaiya, E. Wong, J. Jacod
and others). I might mention [34], [35] for a somewhat more engineering-
oriented presentation of these facts; the book [36] by Bremaud gives a nice
account of the martingale approach.

⁵ This too has a story: I rarely attended IFAC Congresses (partly because, by the time they loom
into consciousness, the paper deadlines have long since passed). However I had helped Jorma
Rissanen with my knowledge of Wiener theory to improve a result of his on the stochastic partial
realization problem (a study sparked by a short Kalman paper [31]); in return, Jorma wrote up
the results and made sure of meeting the IFAC deadlines.
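The compensated-count martingale property is easy to illustrate by simulation (the distribution chosen for the integrated rate is an arbitrary illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 50_000

# Doubly stochastic Poisson counts: the rate itself is random.  Per
# trial we draw the integrated rate Lambda = int_0^T lambda(s) ds,
# then the count N(T) given that rate path is Poisson(Lambda).
Lam = rng.exponential(scale=1.0, size=trials)
N = rng.poisson(Lam)

# N(T) - int_0^T lambda(s) ds is a mean-zero (martingale) quantity,
# so its sample mean over many trials should be near zero.
m = np.mean(N - Lam)
assert abs(m) < 0.05
print(f"mean of compensated count over {trials} trials: {m:.4f}")
```

The sample mean has standard deviation about 1/sqrt(trials) here, so the bound is a comfortable many-sigma check rather than a tight one.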
In this context, I should mention my friendly contacts with A. Shiryaev and
R. Liptser, whom I had first met in 1968 when I visited the Soviet Union as a
member of the IEEE delegation to the annual meeting of the Popov society.
Shiryaev and Liptser were well aware of the Kalman filter, and had extended
it in interesting ways, e.g. to problems with linear state models but non-Gaussian
initial states. It turned out that Shiryaev had also noticed, about the same time
as Frost and I, the white Gaussian noise property of {ν(t) = y(t) - ŷ(t|t-)},
though he had not thought of using it as in [18] to derive the filtering formulas.
He was pleasantly surprised to hear from me of the important role of Girsanov's
theorem in our work. Having become aware of the innovations approach,
Shiryaev and several of his colleagues (e.g. Ershov, Skorokhod) also attempted
proofs of the equivalence of ν(·) and y(·) in the non-Gaussian case, but without
success. In fact, his student Cirelson generated a famous example showing that
equivalence need not hold when z(·) and v(·) are not independent, see [21a];
as mentioned before, the successful proof of Allinger and Mitter assumes
complete independence. The Soviet group, along with Jacod and others in
France, has explored many applications and extensions of the martingale
approach, and in particular Liptser and Shiryaev have made an important
contribution through their comprehensive and outstanding books [37].
The mathematical sophistication, if not the engineering relevance, of this
research area has grown by leaps and bounds especially as it moves into the
area of stochastic control (see, e.g. the journal Stochastics), going far beyond
my depth. Fortunately, a chance visit (in September 1971, I think) by John
Casti, a student of Bellman and Kalaba at USC, who had just moved to Systems
Control, Inc., in Palo Alto, led me back to linear least-squares problems and
along a path that after a while led to scattering theory, the Chandrasekhar
equations and fast algorithms for matrices with displacement structure.
However, before moving on, some remarks on smoothed estimates are
appropriate, because their study even in the linear case continues in a useful
way even to the present [38]. A proper understanding of the smoothing problem
can be important in understanding the properties of stochastic processes with
state-space models, as noted by Faurre, Lindquist, Picci and others (see, e.g.
[39]).

4 Smoothed Estimates and Scattering Models

The problem of smoothed estimates, e.g. finding x̂(t|T), the least-squares
estimate of x(t) given data {y(τ), 0 ≤ τ ≤ T}, was left aside by Kalman and Bucy
[3], [4] as not yielding easily to their approaches. However a direct approach
using the innovations immediately leads to the following basic result [40]: Let

    y(t) = H(t)x(t) + v(t)

where v(·) is unit-intensity white noise uncorrelated with past values of x(·).
Introduce the innovations process

    ν(t) = y(t) - H(t)x̂(t) = H(t)x̃(t) + v(t),    x̃(t) = x(t) - x̂(t)

and define

    P(t, s) = Ex̃(t)x̃*(s)

Then the (noncausal) smoothed estimate x̂(t|T) is completely determined by
the (causal) filtered estimate via the formula

    x̂(t|T) = x̂(t) + ∫ₜᵀ P(t, s)H*(s)ν(s)ds    (26)

To see this, write

    x̂(t|T) = ∫₀ᵀ g(t, s)ν(s)ds

and apply the orthogonality condition x(t) - x̂(t|T) ⊥ {y(τ), 0 ≤ τ ≤ T} to see that

    g(t, s) = Ex(t)ν*(s)

Then decomposing the integral over [0, T] into one over [0, t] and [t, T], and
checking that Ex(t)ν*(s) = P(t, s)H*(s) for s > t yields (26). [There is a close
relationship between (26) and the operator formula (23e).]
Note that this basic formula does not depend upon having a state-space
model for x(·). However if such exists, then we can say more about the smoothed
estimate by plugging in the Kalman-Bucy formulas. In this way, we are led


immediately to the Bryson-Frazier formula

    x̂(t|T) = x̂(t) + P(t)λ(t|T)    (27a)

where λ(·|T) is the so-called "adjoint" variable

    λ(t|T) = ∫ₜᵀ Ψ*(s, t)H*(s)ν(s)ds

and Ψ(t, s) is the state-transition matrix of [F(·) - K(·)H(·)], K(·) = P(·)H*(·).
Therefore we also have the differential equation description

    dλ(t|T)/dt = -[F*(t) - H*(t)K*(t)]λ(t|T) - H*(t)ν(t),    λ(T|T) = 0    (27b)

Other special smoothing problems are also easy to solve. Note that, by
regarding t as fixed, T increasing, (26) immediately solves the so-called fixed-point
smoothing problem. By letting t vary and T = t + Δ, Δ > 0 and fixed, we have
a fixed-lag smoothing formula. Note that if we have a state-space model, (27a)
will still apply but because the upper limit is variable, the differential equation
for λ(t|t + Δ) will have an extra term. And so on.
We shall go in another direction by introducing another representation (not
solution) for the smoothed estimate, which will lead us to a fascinating scattering
interpretation of the whole state-space estimation problem. For this note that
by using the Kalman-Bucy filter equations, differentiation of (27a) leads to a
set of so-called canonical differential equations (well known from the calculus of
variations):

    [  dx̂(t|T)/dt ]          [ x̂(t|T) ]   [     0     ]
    [             ] = M(t) [         ] + [           ]    (28a)
    [ -dλ(t|T)/dt ]          [ λ(t|T) ]   [ H*(t)y(t) ]

with coupled boundary conditions at 0 and T,

    λ(T|T) = 0    and    x̂(0|T) = Π₀λ(0|T)    (28b)

while M(·) has the Hamiltonian form

    M(t) = [ F(t)          G(t)Q(t)G*(t) ]
           [ -H*(t)H(t)    F*(t)         ]    (28c)

These equations are not directly useful, because the two-point nature of the
boundary conditions makes them hard to solve except by going back to (27)
or to the so-called Rauch-Tung-Striebel equation (obtained by substituting
for λ(·|T) from (27) into the first equation of (28a)). Kalman was familiar with the
Hamiltonian matrix (28c) from his studies (see, e.g. [41]) of the linear quadratic
regulator problem, where what is encountered is actually the "dual" matrix M*.
He used this fact to show that the solution P(·) of the Riccati differential equation
could be expressed in terms of the elements of the fundamental (or state-
transition) matrix of M(·), noting however that this was not of computational
value except perhaps in the constant parameter case.


Scattering Models

As stated before, our reason for introducing (28) is that it enables us to establish
a useful transmission-line (or scattering) model for state-space estimation
problems. Such models were extensively studied by Redheffer in the late fifties
(see [42]), motivated by work in electromagnetic theory and neutron transport.
The stochastic least-squares problem was very far from his knowledge or
consciousness. How I came to this subject, first explored with L. Ljung and
B. Friedlander, is described in Sect. 5. To see this, change notation for
convenience and make the simple Euler approximations for the derivative, e.g.

    dx̂(s|t)/ds = [x̂(s + Δ|t) - x̂(s|t)]/Δ + O(Δ)

and also

    ∫ₛ^(s+Δ) y(σ)dσ = y(s)Δ + o(Δ)

Then, neglecting o(Δ) terms, (28a) yields

    [ x̂(s + Δ|t) ]   [ I + F(s)Δ      G(s)Q(s)G*(s)Δ ] [ x̂(s|t)     ]   [      0      ]
    [            ] = [                               ] [            ] + [             ]    (29)
    [ λ(s|t)     ]   [ -H*(s)H(s)Δ    I + F*(s)Δ     ] [ λ(s + Δ|t) ]   [ H*(s)y(s)Δ ]

Note that because of the minus sign in (28a), the arguments of the λ(·|t)
terms are reversed from those of the x̂(·|t) terms. Because of this we can regard
x̂(·|t) as a forward wave and λ(·|t) as a backward wave travelling through an
incremental section at s of some scattering medium specified by the incremental
forward and backward transmission coefficients {I + F(s)Δ, I + F*(s)Δ} and
incremental left and right reflection coefficients {-H*(s)H(s)Δ, G(s)Q(s)G*(s)Δ};
the section has an incremental internal source y(s)Δ. Now let us consider a
macroscopic section of the medium from say s = τ to s = t; this is shown in
Fig. 1, followed by an incremental section.
We shall collect the operators in the macroscopic section into a so-called
scattering matrix

    S₀(t, τ) = [ Ψ₀(t, τ)     P₀(t, τ)  ]
               [ -W₀(t, τ)    Ψ₀*(t, τ) ]    (30)

The left reflection operator of the macroscopic section is denoted P₀(t, τ) for
the following reason. By tracing paths through Fig. 1 we can see that

    P₀(t + Δ, τ) = GQG*Δ + (I + FΔ)(P₀ - P₀H*HP₀Δ + o(Δ))(I + F*Δ)

[Figure 1 here]

Fig. 1. To determine the (forward) evolution equation of S₀(t, τ)

where o(Δ) denotes terms that go to zero faster than Δ as Δ → 0; for simplicity
we have not shown all the arguments for the terms on the RHS. Therefore

    dP₀(t, τ)/dt = lim_{Δ→0} [P₀(t + Δ, τ) - P₀(t, τ)]/Δ
                 = GQG* + FP₀ + P₀F* - P₀H*HP₀,    P₀(τ, τ) = 0

which is the same Riccati differential equation as for the Kalman filter (cf. (15))
but with P(t₀) = Π₀ = 0. (The more general boundary condition will be
introduced presently.)
Similar arguments as for P₀ show that

    ∂S₀(t, τ)/∂t = [ (F - P₀H*H)Ψ₀    FP₀ + P₀F* + GQG* - P₀H*HP₀ ]
                   [ -Ψ₀*H*HΨ₀        Ψ₀*(F - P₀H*H)*             ]    (31a)

with initial condition

    S₀(τ, τ) = I    (31b)

Therefore we can identify Ψ₀ as the state-transition matrix of the closed-loop
Kalman filter, and W₀ as the closed-loop observability Gramian. The special
initial conditions (31b) explain the subscript on the quantities in the scattering
matrix. The actual boundary conditions (28b) can be incorporated by adding a
special boundary layer as in Fig. 2.
One immediate result from Fig. 2 is that we can identify {q₁, q₂} as the
emerging waves from the medium when x̂₀(τ|τ) = 0, Π_τ = 0. Calculations as for
(31) will show that we can identify

    q₁(t, τ) = x̂₀(t|t),    q₂(t, τ) = λ₀(τ|t)    (32)


[Figure 2 here]

Fig. 2. Incorporating the boundary conditions

where x̂₀, λ₀ obey the same differential equations as x̂ and λ of (10) and (27),
except that P is replaced by P₀. These relations will be used presently.
Several other nice results also follow easily from Fig. 2. For example, we
can see that the left reflection coefficient, say P, of the combined sections in
Fig. 2 obeys the same differential equation as P₀ but with P(τ, τ) = Π_τ; so also
the equations for Ψ and W are the same as for {Ψ₀, W₀} but with P replacing P₀.
However by again tracing flows through Fig. 2 we can write

    P(t, τ) = P₀(t, τ) + Ψ₀[Π_τ - Π_τW₀Π_τ + Π_τW₀Π_τW₀Π_τ - ...]Ψ₀*    (33)
            = P₀(t, τ) + Ψ₀Π_τ[I + W₀Π_τ]⁻¹Ψ₀*    (34)

which is a well-known formula usually obtained after much calculation.

Two-Filter Formulas

From Fig. 2, we can by inspection read out (recalling (32)) a relation for the
estimates (analogous to (34)),

    x̂(t|t) = x̂₀(t|t) + Ψ₀(t, τ)x̂(τ|t)    (35)

This formula is not as well-known as (34) because it is harder to prove by
traditional methods; it was perhaps first derived by Lainiotis via his so-called
"partitioning" mechanism (see, e.g. [43]). To continue, we also note by inspection
of Fig. 2 that

    λ(τ|t) = λ₀(τ|t) - W₀x̂(τ|t)
    x̂(τ|t) = x̂(τ|τ) + Π_τλ(τ|t)

Eliminating λ(τ|t) gives

    Π_τ⁻¹x̂(τ|t) - Π_τ⁻¹x̂(τ|τ) = λ₀(τ|t) - W₀x̂(τ|t)    (36)

which is one form of a so-called Mayne-Fraser smoothing formula [44]. For
this interpretation, t should be fixed and it will be useful to consider λ₀(τ|t)
and W₀(t, τ) as τ decreases from τ = t (i.e. reverse-time evolution). This is not
hard to do in the scattering picture: we just add an incremental section to the


left of the point τ and trace the flow again to obtain the backwards evolution
equations

    -∂S₀(t, τ)/∂τ = [ Ψ₀(F - GQG*W₀)                        Ψ₀GQG*Ψ₀*        ]
                    [ -(F*W₀ + W₀F + H*H - W₀GQG*W₀)    (F - GQG*W₀)*Ψ₀* ]    (37)

with S₀(t, t) = I. So also we can define

    -∂λ₀(τ|t)/∂τ = (F* - W₀GQG*)λ₀(τ|t) + H*(τ)y(τ).    (38)
Note that in reverse time, it is W₀ and not P₀ that obeys a Riccati equation.
The Mayne-Fraser formula is often called a two-filter formula [44b] because
it can be rewritten, by defining

    P_b(t, τ) ≜ W₀⁻¹(t, τ),    x̂_b(τ|t) ≜ P_b(t, τ)λ₀(τ|t)    (39)

as

    x̂(τ|t) = [Π_τ⁻¹ + P_b⁻¹]⁻¹[Π_τ⁻¹x̂(τ|τ) + P_b⁻¹x̂_b(τ|t)]    (40)

where some simple algebra shows from (37) that

    -∂P_b(t, τ)/∂τ = -FP_b - P_bF* + GQG* - P_bH*HP_b    (41a)

    -∂x̂_b(τ|t)/∂τ = -(F + P_bH*H)x̂_b(τ|t) + P_bH*y(τ)    (41b)

with boundary values

    x̂_b(t|t) arbitrary,    P_b(t, t) = ∞    (41c)

Equation (41) is regarded as defining the (backwards) Kalman filter for the
usual state-space model running backwards in time,

    -dx(τ)/dτ = F(τ)x(τ) + G(τ)u(τ)

This is not correct however, since in such a model we shall clearly have a
dependence between the white noise u(·) and the boundary state value x(T),
which will destroy the basic Markovian property of the state-space model (see
[45a]), and prevent simple application of the Kalman filter arguments. This
difficulty is not recognized in many papers and even textbooks, where the issue
is avoided by the weakly motivated assumption that the variance of x(T) is
infinite, which decorrelates u(·) and x(T). However the variance of x(T) is not
infinite; it is in fact the solution at T of the well-behaved linear (Lyapunov)
differential equation

    dΠ/dt = FΠ + ΠF* + GQG*,    Π(t₀) = Π₀


A true resolution of the problem needs more care in defining a proper reverse-
time (or backwards) model. This was initially easier to approach in the scattering
model (intuitively, because going from left to right or vice versa are not as
different in a spatial wave model as reversing time is in the unidirectional
Markovian state-space time model)-see [45]-[46] for the details and more
results. Reference [38] gives a different perspective on backwards-time and
two-filter smoothing formulas. Here the reader may find it amusing to check
that a proper (Markovian state-space) backwards-time model for the simple
state equation (for a Wiener process)

    dx(t)/dt = u(t),    0 ≤ t ≤ T
    Eu(t)u(s) = δ(t - s),    Eu(t)x(0) = 0,    Ex²(0) = 1

is

    -dx(t)/dt = -x(t)/(1 + t) + μ(t),    0 ≤ t ≤ T
    Eμ(t)μ(s) = δ(t - s),    Eμ(t)x(T) = 0,    Ex²(T) = 1 + T
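Assuming the reconstruction above, the amusing check can be carried out by quadrature: the backwards transition (1 + s)/(1 + T) together with the unit-intensity driving noise must reproduce Var x(s) = 1 + s for every s:

```python
import numpy as np

# Var x(s) = Phi_b(s,T)^2 Var x(T) + int_s^T [(1+s)/(1+sig)]^2 dsig,
# with Phi_b(s,T) = (1+s)/(1+T) and unit-intensity noise mu.
T = 3.0
for s in [0.0, 0.7, 1.5, 2.4]:
    sig = np.linspace(s, T, 20001)
    w = ((1 + s) / (1 + sig))**2
    drive = np.sum((w[:-1] + w[1:]) / 2 * np.diff(sig))  # trapezoid rule
    var = ((1 + s) / (1 + T))**2 * (1 + T) + drive
    assert abs(var - (1 + s)) < 1e-6
print("backwards model reproduces Var x(s) = 1 + s")
```

Since the forward model has Var x(t) = 1 + t, this confirms that the backwards model generates the same second-order statistics while keeping μ(·) uncorrelated with x(T).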

Redheffer's Star Product

We wish to continue with a few more results but, as the reader may suspect,
there is no need to keep using flow-tracing arguments. A scattering algebra
(and calculus) can be derived, starting with the following fact (which can be
verified in several simple ways, including flow-tracing): if

    S₁ = [ a  b ]  is followed by  S₂ = [ A  B ]
         [ c  d ]                       [ C  D ]    (42a)

then the cascade has scattering function

    S = [ A(I - bC)⁻¹a          B + Ab(I - Cb)⁻¹D ]
        [ c + dC(I - bC)⁻¹a    d(I - Cb)⁻¹D       ]    (42b)

Following Redheffer (see, e.g. [42]) we shall write this as

    S = S₁ ⋆ S₂    (42c)

and call it a star product. It is an interesting and useful fact that

    I = S ⋆ S⁻¹ = S⁻¹ ⋆ S    (43)

i.e. the star product inverse and the usual inverse coincide when both exist.
(Actually the star-product inverse can sometimes be defined even when S⁻¹
fails to exist.)
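The block formula (42b) and the property (43) are easy to verify numerically (the block sizes are illustrative, and the matrices are kept close to the identity so that the indicated inverses exist):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
I = np.eye(n)

def star(S1, S2):
    # Redheffer star product of 2n x 2n block matrices, per (42b).
    a, b, c, d = S1[:n, :n], S1[:n, n:], S1[n:, :n], S1[n:, n:]
    A, B, C, D = S2[:n, :n], S2[:n, n:], S2[n:, :n], S2[n:, n:]
    E1 = np.linalg.inv(I - b @ C)
    E2 = np.linalg.inv(I - C @ b)
    return np.block([[A @ E1 @ a, B + A @ b @ E2 @ D],
                     [c + d @ C @ E1 @ a, d @ E2 @ D]])

def rand_S():
    # Near-identity scattering matrices keep (I - bC) invertible.
    return np.eye(2 * n) + 0.2 * rng.standard_normal((2 * n, 2 * n))

S1, S2, S3 = rand_S(), rand_S(), rand_S()

# The star product is associative (cascading sections) ...
assert np.allclose(star(star(S1, S2), S3), star(S1, star(S2, S3)))
# ... and, per (43), the star inverse coincides with the matrix inverse.
assert np.allclose(star(S1, np.linalg.inv(S1)), np.eye(2 * n))
assert np.allclose(star(np.linalg.inv(S1), S1), np.eye(2 * n))
print("star product checks pass")
```

Associativity is what lets one compose, and in particular "double," sections freely in the next subsection.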
One can now develop a star product or scattering algebra. The examples
already given indicate that this algebra is perhaps the most insightful and often
the simplest way of obtaining many old and new results connected with
state-space estimation (filtering and smoothing) problems. We refer to [46]-[48]
for more details and several applications. Here we shall only note some results
for the important special case of time-invariant models.

Homogeneous Media

If the medium is homogeneous (or time-invariant in our case), the incremental
parameters {F, G, Q, H} will be independent of time, and the properties of the
medium scattering matrix S₀(t, τ) will depend only upon the thickness t - τ of
the section. An immediate consequence is a "doubling formula"

    S₀(2t, 0) = S₀(t, 0) ⋆ S₀(t, 0)    (44)

which can be used to quickly calculate the limiting behavior of S₀(t, 0) as t → ∞,
an observation first made and used in radiative transfer theory (by Van de
Hulst). A direct state-space derivation is much more involved (see, e.g. [49],
p. 158); moreover the fact that the doubling formula for the Riccati variable
P(·) will also involve the quantities {Ψ, W} is much more natural to see in the
scattering derivation. The point is that introducing the star product enables the
notation to carry most of the computational burden, leaving more scope for
conceptual understanding.
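A scalar sketch of the doubling idea (the constants and discretization are illustrative assumptions): star-squaring an Euler-discretized incremental section drives its (1,2) entry to the steady-state solution of the algebraic Riccati equation:

```python
import numpy as np

# Scalar time-invariant model x' = f x + u, y = x + v (g = q = h = 1).
f = 1.0
dt = 1e-4

def star(S1, S2):
    # Scalar-block Redheffer star product, per (42b).
    a, b, c, d = S1.ravel()
    A, B, C, D = S2.ravel()
    e = 1.0 / (1 - b * C)
    return np.array([[A * e * a, B + A * b * e * D],
                     [c + d * C * e * a, d * e * D]])

# Incremental section [I + F dt, GQG* dt; -H*H dt, I + F* dt].
S = np.array([[1 + f * dt, dt], [-dt, 1 + f * dt]])
for _ in range(40):          # thickness dt * 2**40: effectively infinite
    S = star(S, S)

P = S[0, 1]                  # the reflection entry P0(t, 0) as t -> inf
P_are = 1 + np.sqrt(2)       # positive root of 2P + 1 - P^2 = 0
assert abs(P - P_are) < 1e-2
print(f"doubled P = {P:.4f}, ARE solution = {P_are:.4f}")
```

Forty star-squarings reach an astronomically thick medium; the transmission entries decay to zero (the closed loop is stable), while the reflection entry freezes at the ARE solution, up to the O(dt) bias of the Euler slab.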
Another consequence of homogeneity is that the forward and backward
evolution equations must be the same (because adding an incremental layer to
the right is the same as adding one to the left), i.e. we have

    -∂S₀(t, τ)/∂τ = ∂S₀(t, τ)/∂t    (45)

We have called the resulting identities generalized Stokes identities because an
elementary form of the one obtained by equating the (1,2) terms, viz.,

    FP₀ + P₀F* + GQG* - P₀H*HP₀ = dP₀(t, τ)/dt = Ψ₀GQG*Ψ₀*    (46)

was derived by Stokes (1881) in studying the propagation of light through a
pile of glass plates. We shall see the significance of (46) in the next section.
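For a scalar homogeneous medium the Stokes identity can be checked by integrating the Riccati and transition equations side by side (F = 0, G = Q = H = 1 is an illustrative choice, for which P₀ = tanh t and Ψ₀ = sech t in closed form):

```python
import numpy as np

# With F = 0, G = Q = H = 1:
#   P0'   = 1 - P0^2,     P0(0) = 0    (Riccati; solution tanh t)
#   Psi0' = -P0 Psi0,     Psi0(0) = 1  (closed-loop transition; sech t)
# and the Stokes identity (46) reads 1 - P0^2 = Psi0^2.
def rhs(z):
    P, Psi = z
    return np.array([1 - P**2, -P * Psi])

z = np.array([0.0, 1.0])
dt, T = 1e-3, 2.0
for _ in range(int(T / dt)):        # classical fourth-order Runge-Kutta
    k1 = rhs(z); k2 = rhs(z + dt / 2 * k1)
    k3 = rhs(z + dt / 2 * k2); k4 = rhs(z + dt * k3)
    z = z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

P0, Psi0 = z
assert abs(P0 - np.tanh(T)) < 1e-8
assert abs((1 - P0**2) - Psi0**2) < 1e-8
print("Stokes identity verified: 1 - P0^2 = Psi0^2")
```

Here 1 - tanh² t = sech² t is of course an elementary identity; the point is that (46) holds for arbitrary homogeneous {F, G, Q, H}, with the same mechanism.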
Asymptotic Behavior

As mentioned before, perhaps the key contribution of Kalman and Bucy [4]
was their analysis of the asymptotic properties of the Riccati equation and the
steady-state filter. Thus they showed for time-invariant (and a special class of
time-variant) models, that if {F, GQ} is controllable and {H, F} observable, then
as t → ∞ (even if F is unstable) P(·) converges to a constant value P̄ independent
of the initial value P(t₀) = Π₀, provided Π₀ is nonnegative definite. P̄ is the
unique positive definite solution of the so-called algebraic Riccati equation
(ARE)

    0 = FP + PF* + GQG* - PH*HP    (47)

and also the unique P that makes the closed-loop state matrix (F - PH*H)
stable. The significance of this result is that it shows that errors in computing
P(t), at any time t, will die out as time progresses rather than build up. However
there is an interesting issue here: there is no guarantee that numerical errors
will not make P(t), at some time t, indefinite or even negative-definite. Therefore
it would be desirable to investigate convergence for more general initial
conditions. Here the most general results are apparently those of Willems [50]
who showed that convergence took place for all Π > P₋, P₋ being the infimum
over all solutions to the ARE (47). This, and related results of several others,
are all based on a detailed study of the nonlinear ARE. In my opinion, a proof
of convergence that avoids a detailed study of the limit itself was more desirable.
I proposed this problem in 1974, to a new postdoctoral scholar, Lennart Ljung,
and we obtained such a proof [51], using an identity that was very natural in the
scattering framework, but apparently new in the estimation and control
literature:
Ψ₀(t, 0) = Ψ₀ᵃ(t, 0)    (48)

where Ψ₀(·, 0) is the state-transition matrix of (F − P₀(·)H*H), P₀(·) being the solution of the Riccati differential equation (RDE) with P(t₀) = Π₀ = 0, while Ψ₀ᵃ(·, 0) is the state-transition matrix for (F − GQG*P₀ᵃ(·)), where P₀ᵃ(·) satisfies the adjoint RDE,

(d/dt)P₀ᵃ(t) = F*P₀ᵃ + P₀ᵃF + H*H − P₀ᵃGQG*P₀ᵃ,    P₀ᵃ(t₀) = 0    (49)

Moreover we were able to prove what appears to be the strongest convergence result to date: if {F, GQ} is stabilizable and {H, F} is detectable, then convergence holds for all symmetric initial conditions Π₀ such that

x*(PᵃΠ₀Pᵃ + Pᵃ)x > 0,    ∀x such that Pᵃx ≠ 0    (50)

where Pᵃ is the steady-state solution of (49), which can be shown to exist under the above conditions on {F, G, Q, H}. We may note that if {H, F} is observable, then it can be shown that Pᵃ is invertible, and that (50) reduces to

Π₀ > −(Pᵃ)⁻¹    (51)

which is the result of Willems. Simple examples show that (50) can hold even if (51) fails.
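The asymptotic behavior described above is easy to check numerically. The following sketch (the matrices F, G, H, Q are illustrative choices, not from the text, picked so that {F, GQ} is controllable and {H, F} observable) Euler-integrates the Riccati differential equation from two different nonnegative initial conditions and verifies that both trajectories reach the same limit, that the limit satisfies the ARE (47), and that it makes the closed-loop matrix F − PH*H stable, even though F itself is unstable.

```python
import numpy as np

# Sketch, with an illustrative unstable model (assumed for this example only):
# integrate dP/dt = F P + P F* + G Q G* - P H* H P from two different
# nonnegative initial conditions and check convergence to a common limit.

F = np.array([[0.5, 1.0], [0.0, -1.0]])   # unstable (eigenvalue 0.5)
G = np.array([[1.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[1.0]])

def riccati_rhs(P):
    return F @ P + P @ F.T + G @ Q @ G.T - P @ H.T @ H @ P

def integrate(P0, dt=1e-3, T=40.0):
    P = P0.copy()
    for _ in range(int(T / dt)):
        P = P + dt * riccati_rhs(P)   # simple Euler step
    return P

P_a = integrate(np.zeros((2, 2)))      # Pi_0 = 0
P_b = integrate(10.0 * np.eye(2))      # Pi_0 = 10 I
print(np.allclose(P_a, P_b, atol=1e-6))    # True: limit independent of Pi_0
print(np.linalg.norm(riccati_rhs(P_a)))    # ARE residual
print(max(np.linalg.eigvals(F - P_a @ H.T @ H).real) < 0)  # True: stable
```

The same experiment with an indefinite Π₀ satisfying (50) but not (51) would illustrate the stronger convergence result, though verifying the hypotheses then requires the steady-state solution of the adjoint RDE (49).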
Hopefully, enough has been described of the scattering approach to show
that it can provide a powerful and physically intuitive framework for the study
of Riccati differential equations. More can be found in [46]-[48]. And we might
mention that we have exploited less than half of the results in Redheffer's basic
paper [42]; I understand that Redheffer is currently preparing a monograph
on his theory.

T. Kailath

5 The Chandrasekhar Equations and Displacement Structure


The discovery of Redheffer's work on scattering came about as a delayed
consequence of a visit from J.L. Casti mentioned earlier. Casti showed me some
results to appear in a paper [52a], noting a "new" way of solving equations of
Wiener-Hopf type (see (4))

h(t, s) + ∫₀ᵗ h(t, r)K(r, s)dr = K(t, s),    0 ≤ s ≤ t ≤ T    (52a)

where K(t, s) had the special form

K(t, s) = ∫₀¹ e^{−α|t−s|}w(α)dα    (52b)

Such kernels arise in radiative transfer theory, where w(α) is the intensity of light impinging on the atmosphere, say, from a direction α, and e^{−αt} represents the attenuation at depth t. Casti, Kalaba and Murthy [52a] noted that the solution of (52) could be reduced to the solution of a coupled set of nonlinear integro-differential equations

∂X(t, α)/∂t = −Y(t, α) ∫₀¹ Y(t, α′)w(α′)dα′    (53a)

∂Y(t, α)/∂t = −αY(t, α) − X(t, α) ∫₀¹ Y(t, α′)w(α′)dα′    (53b)

X(0, α) = 1 = Y(0, α),    0 ≤ α ≤ 1    (53c)

The fact that the solution depended upon nonlinear integro-differential


equations with given initial conditions made numerical evaluation feasible, just
as the nonlinear Riccati differential equation did for the (nonstationary)
state-space problem. This was an intriguing result, which it was natural to
explore further and to try to relate to the state-space Kalman filter formulas I
was now quite familiar with. Now the derivation of the equations (53) was not
given in the short note [52a], but was referred to the paper [52b]. There it
turned out to depend upon a somewhat long sequence of manipulations of
integral equations, following a pattern familiar from invariant imbedding theory
(see, e.g. the books [53]-[54]). These derivations all appeared to depend quite
crucially upon the fact that the kernel of the integral equation had the special
form (52b).
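The X-Y system (53) can itself be evaluated by a simple marching scheme once the α-integral is discretized. The sketch below (an illustration, not from the papers cited; the constant weight w(α) = 0.4 and the grid are arbitrary assumptions) Euler-integrates the two functions of two variables on a direction grid.

```python
import numpy as np

# Illustrative Euler integration of the X-Y system (53):
#   dX/dt = -Y(t,a) s(t),   dY/dt = -a Y(t,a) - X(t,a) s(t),
#   s(t)  = integral over (0,1] of Y(t,a') w(a') da',
#   X(0,a) = Y(0,a) = 1,
# with an assumed constant weight w(a) = 0.4 on a 20-point grid.

alphas = np.linspace(0.05, 1.0, 20)     # direction grid in (0, 1]
w = 0.4 * np.ones_like(alphas)          # assumed weight function
da = alphas[1] - alphas[0]

X = np.ones_like(alphas)                # X(0, a) = 1
Y = np.ones_like(alphas)                # Y(0, a) = 1

dt = 1e-3
for _ in range(int(1.0 / dt)):          # march up to t = 1
    s = np.sum(Y * w) * da              # Riemann sum for the a'-integral
    X, Y = X - dt * Y * s, Y - dt * (alphas * Y + X * s)

print(float(X.min()), float(X.max()))
print(float(Y.min()), float(Y.max()))
```

The computational point is that only two one-dimensional arrays are propagated per time step, instead of a full function of two angular arguments.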
Moreover, as I looked further into the history of these results, I found that
astrophysicists had found the Wiener-Hopf technique to be elegant, but
computationally difficult for the (infinite-dimensional) problems they faced. So
it was a big step when in 1943 a famous Soviet astronomer, V.A. Ambartzumian,
had shown that the classical Wiener-Hopf equation (1) could be solved by
reduction to an integro-differential equation of Riccati type (see, e.g. [55, Sect. 51, equation (30)]):

∂P(t, α, β)/∂t = Q + ∫₀¹ P(t, α, β′)w(β′)dβ′ + ∫₀¹ P(t, α′, β)w(α′)dα′ + ∫₀¹∫₀¹ P(t, α′, β)w(α′)w(β′)P(t, α, β′)dα′dβ′    (54)

with given initial conditions. Ambartzumian used an invariance principle long


familiar to electrical engineers in his derivation: the input impedance of a medium
(the atmosphere, in his case) of infinite depth is the same even after a small
layer has been removed. Later S. Chandrasekhar further exploited and extended
these invariance principles to solve the Wiener-Hopf-type equations (52) for a
finite atmosphere. He introduced the X(·) and Y(·) functions and
derived the differential equations (53) (see [55, Sect. 56]); the point was that
solving for two functions of two variables was simpler than solving for one
function, R(t, α, β), of three variables. In fact, the simplifications were enough
that Chandrasekhar and others were able to obtain important numerical results
using just the clumsy hand-computing machines of the 1940s. (See, e.g. [56a,
p. 214 and 56b, p. 169].) As an aside, let me note that Chandrasekhar, who
was to win a Nobel prize many years later, derived these results in Part XXII
[56a] of a long sequence of long papers; Part XXII had 480 equations. This
style resulted in an affectionate spoof "On the imperturbability of elevator
operators: LVII, by S. Candlestickmaker, Institute for Studied Advances, Old
Cardigan, Wales." It had the obligatory acknowledgement to the "computers"
of the day: "I wish to record my indebtedness to Miss Canna Helpit, who carried
out the laborious numerical work ..."
The results led me to wonder about their implications for state-space models.
It took a while, but I was able to obtain (see [58]) equations of Chandrasekhar
type for the nonstationary processes that arise from constant-parameter
state-space models. In retrospect, the key insight was the following.
It is a striking property of the Riccati-based Kalman-Bucy filter that it
holds equally for time-variant and time-invariant state-space models-there is
no difference in computation (see, e.g. (15)), though storage may be less in the
time-invariant case. On the other hand, there should certainly be some
simplifications possible with time-invariance. The only way has to be to avoid the Riccati equation and find some other way of computing the Kalman gain function K(·). And some exploration led to the following argument.
Consider the Riccati equation for a time-invariant model:
Ṗ(t) = FP(t) + P(t)F* + GQG* − K(t)K*(t),    t ≥ 0,    P(0) = Π₀    (55a)

K(t) = P(t)H*    (55b)

Because of time-invariance we can differentiate again to obtain

P̈(t) = FṖ(t) + Ṗ(t)F* − Ṗ(t)H*K*(t) − K(t)HṖ(t)
     = (F − K(t)H)Ṗ(t) + Ṗ(t)(F − K(t)H)*    (56)


Assuming temporarily that K(·) is known, this is a homogeneous linear differential equation for Ṗ(·), whose solution we can write as

Ṗ(t) = Ψ(t, 0)Ṗ(0)Ψ*(t, 0)    (57)

where Ψ(·,·) is the closed-loop state-transition matrix of the Kalman filter, defined by

(d/dt)Ψ(t, 0) = (F − K(t)H)Ψ(t, 0),    Ψ(0, 0) = I    (58)

Equation (57) shows the striking property that the rank, and in fact the inertia (i.e. the number of positive, negative and zero eigenvalues), of Ṗ(·) is constant with time: it depends only upon the rank (or inertia) of

Ṗ(0) = FΠ₀ + Π₀F* + GQG* − Π₀H*HΠ₀    (59)
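Identity (57) can be checked directly in a small experiment. The sketch below (an illustrative 2-state model, not from the text) propagates P(t) and the closed-loop transition matrix Ψ(t, 0) of (58) with matched Euler steps and compares Ṗ(t) against Ψ(t, 0)Ṗ(0)Ψ*(t, 0); the two agree up to discretization error.

```python
import numpy as np

# Illustrative check of (57): Pdot(t) = Psi(t,0) Pdot(0) Psi*(t,0), where
# Psi solves (58).  Model matrices are arbitrary assumptions for the demo.

F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

def pdot(P):
    return F @ P + P @ F.T + G @ G.T - P @ H.T @ H @ P

dt, T = 1e-4, 1.0
P = np.diag([1.0, 2.0])          # Pi_0, an arbitrary PSD choice
Psi = np.eye(2)                  # Psi(0, 0) = I
D0 = pdot(P)                     # Pdot(0)
for _ in range(int(T / dt)):
    K = P @ H.T                  # Kalman gain K(t) = P(t) H*
    Psi = Psi + dt * (F - K @ H) @ Psi
    P = P + dt * pdot(P)

# maximum deviation between Pdot(t) and Psi Pdot(0) Psi* -- O(dt) small
print(float(np.abs(pdot(P) - Psi @ D0 @ Psi.T).max()))
```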

Now it can well happen that Ṗ(0) has low rank. For example, if Π₀ = 0,

Ṗ(0) = GQG*  has rank ≤ min(n, m)    (60)

where m is the number of inputs.⁶ On the other hand, if F is stable and Π̄ is the unique nonnegative definite solution of (the steady-state variance equation)

FΠ̄ + Π̄F* + GQG* = 0    (61a)

then if Π₀ = Π̄,

Ṗ(0) = −Π₀H*HΠ₀  has rank ≤ min(n, p)    (61b)

where p is the number of outputs. When Π₀ = Π̄, the processes x(·) and z(·) are stationary.
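The two low-rank cases (60) and (61b) are purely algebraic and can be verified without any integration. In the sketch below (an illustrative 3-state, single-input, single-output model) the steady-state variance Π̄ is obtained by vectorizing the Lyapunov equation (61a), and the rank of Ṗ(0) is computed at both special initial conditions.

```python
import numpy as np

# Illustrative check of (60)-(61): Pdot(0) has rank <= m at Pi_0 = 0 and
# rank <= p at Pi_0 = Pi_bar.  Model matrices are assumptions for the demo.

n, m, p = 3, 1, 1
F = np.array([[-1.0, 1.0, 0.0],
              [0.0, -2.0, 1.0],
              [0.0, 0.0, -3.0]])     # stable
G = np.array([[1.0], [0.5], [0.25]])
H = np.array([[1.0, 1.0, 1.0]])
Q = np.eye(m)

def pdot0(Pi0):
    """Pdot(0) = F Pi0 + Pi0 F* + G Q G* - Pi0 H* H Pi0, as in (59)."""
    return F @ Pi0 + Pi0 @ F.T + G @ Q @ G.T - Pi0 @ H.T @ H @ Pi0

# Solve the Lyapunov equation F Pi + Pi F* = -GQG* by vectorization
# (row-major: vec(F Pi) = (F kron I) vec(Pi), vec(Pi F*) = (I kron F) vec(Pi)).
A = np.kron(F, np.eye(n)) + np.kron(np.eye(n), F)
Pi_bar = np.linalg.solve(A, -(G @ Q @ G.T).ravel()).reshape(n, n)

r0 = np.linalg.matrix_rank(pdot0(np.zeros((n, n))), tol=1e-8)  # rank GQG*
r1 = np.linalg.matrix_rank(pdot0(Pi_bar), tol=1e-8)  # rank Pi_bar H* H Pi_bar
print(r0, r1)   # both are 1, i.e. <= min(n, m) and <= min(n, p)
```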
This suggests that the n × n matrix Ṗ(·) can be propagated using lower rank matrices, with a saving in computation; then P(·) and K(·) could be computed when desired by a quadrature. In fact, the situation is even nicer: Let

α = rank Ṗ(0) = rank[FΠ₀ + Π₀F* + GQG* − Π₀H*HΠ₀]    (62a)

Λ = an α × α diagonal signature matrix    (62b)

with as many +1's (−1's) as Ṗ(0) has positive (negative) eigenvalues. Also let L₀ be any matrix such that

Ṗ(0) = L₀ΛL₀*    (62c)

[Note that L₀ is not unique; it can be modified by any Λ-unitary matrix Θ, ΘΛΘ* = Λ. We shall ignore this possibility here; it does have useful consequences.] Then (57) can be written as

Ṗ(t) = L(t)ΛL*(t)    (63)

6 Note that (60) is just a generalized Stokes identity (46), noted in our discussion of homogeneous scattering media (corresponding to time-invariant state-space models) in Sect. 4. The more general identity (57) can also be obtained in the scattering context; see [47b].


where

L(t) = Ψ(t, 0)L₀    (64)

Moreover we notice that (by constancy of H)

K̇(t) = Ṗ(t)H* = L(t)ΛL*(t)H*    (65a)

and that, from (64) and (58),

L̇(t) = (F − K(t)H)L(t)    (65b)

with the boundary conditions

K(0) = Π₀H*,    L(0) = L₀    (65c)

Equations (65) form a coupled set of n(p + α) nonlinear differential equations with given initial conditions. Their solution determines K(·) and hence the least-squares estimate ẑ(·) via (10), without the need to know P(·). However, if desired, P(·) can be found by quadrature as

P(t) = P(0) + ∫₀ᵗ L(τ)ΛL*(τ)dτ    (66)

The point of course is that whenever p ≪ n, α ≪ n, the new equations (65) can provide a considerable reduction in complexity over solving for the n × n coupled Riccati equations of the general (for time-variant as well as time-invariant systems) Kalman-Bucy filter. For example, in the special case of scalar stationary processes, we see from (61) that α = 1 = p, and the n(n + 1)/2 coupled Riccati equations reduce to 2n coupled equations, which with some goodwill⁷ the reader will recognize as a finite-dimensional version of the X and Y equations (53). Therefore what we have found in (65) is a generalization, to a special class of nonstationary processes, of the results of Chandrasekhar for stationary processes. These generalized Chandrasekhar equations can provide dramatic computational simplifications when n is large, as in image processing problems and in distributed parameter systems; we refer here only to the papers [59].
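The two propagations can be run side by side. The sketch below (an illustrative 4-state, single-input, single-output model with Π₀ = 0 and Q = I, so that by (60) Ṗ(0) = GG* = L₀ΛL₀* with L₀ = G and Λ = [1], i.e. α = 1) Euler-integrates the Riccati equation (55) and the Chandrasekhar-type equations (65) and checks that they produce the same gain K(t). Here n(p + α) = 8 equations replace the n(n + 1)/2 = 10 distinct Riccati entries; the savings grow when p, α ≪ n.

```python
import numpy as np

# Illustrative side-by-side propagation (model matrices are assumptions):
#   Riccati (55):        Pdot = F P + P F* + G G* - K K*,   K = P H*
#   Chandrasekhar (65):  Kdot = L Lam L* H*,  Ldot = (F - K H) L,
#                        K(0) = 0, L(0) = G, Lam = [1]   (Pi_0 = 0 case)

n = 4
F = np.diag([-1.0, -2.0, -3.0, -4.0]) + np.diag([1.0, 1.0, 1.0], k=1)
G = np.array([[1.0], [0.5], [0.25], [0.125]])
H = np.ones((1, n))
Lam = np.array([[1.0]])                 # signature matrix, alpha = 1

dt, T = 1e-4, 2.0

P = np.zeros((n, n))                    # Riccati: full n x n matrix
K, L = np.zeros((n, 1)), G.copy()       # Chandrasekhar: n x p and n x alpha

for _ in range(int(T / dt)):
    KP = P @ H.T
    P = P + dt * (F @ P + P @ F.T + G @ G.T - KP @ KP.T)
    K, L = K + dt * (L @ Lam @ L.T @ H.T), L + dt * (F - K @ H) @ L

# the gains from the two propagations agree up to discretization error
print(float(np.abs(P @ H.T - K).max()))
```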
The Chandrasekhar equations led to many further results of various sorts,
for both continuous and discrete-time systems, with and without state-space
models, which are too extensive to describe here. However because of their
important role in later work, I should mention here the papers [60] and [61],
dealing with certain "square-root" or "array" versions of the discrete-time
(Riccati and) Chandrasekhar equations.
It should also be mentioned that it was my reading in the radiative transfer
literature that led me to the work of Redheffer on transmission lines and
7 Actually it is worth noting that in radiative transfer theory, one started with given covariance functions and there is no state-space model. This would have made comparison with the Kalman-filter formulas difficult, except that a couple of years earlier, Roger Geesey and I had shown how to re-express the Kalman filter equations in terms of the coefficients of the covariance function [57]; this fortunate circumstance made the exploration of the radiative transfer literature much easier.


scattering [42], where Riccati equations again played a key role. However I put it aside until I had better understood the results (52)-(54) of Ambartzumian and Chandrasekhar. Then in 1974, Lennart Ljung came to Stanford as a postdoctoral scholar, strongly recommended by his adviser in Sweden, K.-J. Åström. I suggested to Lennart, and to a new Ph.D. student, B. Friedlander, that it would be worthwhile to relate Redheffer's work to Riccati equation results we knew from Kalman filtering theory. The result, after a couple of frustrating initial months till we found the right framework, was the paper [47b] and several others; later another student, G. Verghese, whose own Ph.D. work was on linear systems, pointed out the usefulness of starting with the Hamiltonian equations (28), which enabled us to also obtain results on the estimates themselves [47a].

Displacement Rank
However the key question that led beyond state-space models, at least initially,⁸ was the meaning of the parameter α, which seemed to arise almost out of nowhere in the arguments leading to the generalized Chandrasekhar equations. Moreover, the formula (62a) for α is not invariant, in that it seems to depend upon the particular state-space model {F, G, H, Q} used to model the signal process z = Hx. In fact, α is an invariant quantity completely determined by the covariance function of the process z(·). To explain this, let us remark that the original derivations of Ambartzumian, Chandrasekhar, Bellman, Kalaba, Casti and others were all based on the fact that the covariance function was stationary, or in their language, had a displacement or Toeplitz form,
K(t, s) = f(t − s)    (67a)

An alternative way of stating this is that

(∂/∂t + ∂/∂s)K(t, s) ≡ 0    (67b)

Now the covariance function of a process z(·) with a constant-parameter state-space model will not in general be stationary. In fact, as remarked in (61) above, this will only be true when F is stable and Π₀ has a special value. However even when this does not hold, so that

(∂/∂t + ∂/∂s)K(t, s) ≠ 0,    (68)

it turns out that K(t, s) arising from a time-invariant state-space model is not

8 Recent work with Lev-Ari has taken us back to state-space models, in a different, not really dynamic, sense: to what Livsic [62] has called nodes or colligations {F, G, H, J}.

completely arbitrary: it has finite rank in the sense that we can write

(∂/∂t + ∂/∂s)K(t, s) − K(t, 0)K(0, s) = Σᵢ₌₁^α εᵢφᵢ(t)φᵢ(s)    (69)

where εᵢ = +1 or −1 (as many times, in fact, as the inertia of Ṗ(0) in (61)). The RHS of (69) is what is called a "degenerate" kernel in the theory of integral equations, so that while K(t, s) is not of displacement form, it is "close" in some sense to a displacement kernel.

The number α is a measure of this closeness, and therefore we called it a displacement rank. The measure was shown to have operational significance in the sense that it takes α times as many computations to solve an integral equation with a kernel of displacement rank α as it takes for a Toeplitz kernel (see [63]). In [64], we relate these general results to the Chandrasekhar equations for state-space systems.
Further developments have stemmed largely from attempting to work out the above ideas for discrete-time processes, or even more simply, for finite matrices. This has turned out to be a very long story indeed, starting with the discrete-time versions of the generalized Chandrasekhar equations and their connections to the Levinson and Schur algorithms and matrix factorization; see, e.g. [65]-[67]. Here we shall only comment on the analog of (69).
We start by noting that the analog of a displacement or Toeplitz kernel is a Toeplitz matrix, i.e. one of the form

T = [c_{i−j}],    0 ≤ i, j ≤ N.

Many nice results are known for such matrices, especially the fact that linear equations with Toeplitz coefficient matrices can be solved with O(n²) flops (floating point operations) rather than the O(n³) flops required for a general matrix. However in applications we often need to work with closely related matrices, e.g. having the forms T₁T₂, or T₁T₂ − T₃T₄, or T₁T₂⁻¹T₃, where the {Tᵢ} are Toeplitz matrices. These composite matrices are not Toeplitz in general, and so the question arises as to whether it would take O(n³) flops to solve the corresponding linear equations. When pressed, one would have to say no, and in fact it turns out that suitable concepts of displacement structure can be introduced: one family of definitions has the form

∇R = R − F₁RF₂*    (70)

where {F₁, F₂} are lower-triangular matrices. The simplest case, and the one closest to the continuous-time definition (67), is perhaps to choose F₁ = F₂ = Z, the lower-shift matrix with zeros everywhere except for 1's on the first subdiagonal. The survey [68] gives a fairly recent account of some of the many properties and applications following from this definition. Further results, especially on matrices such as T₁T₂⁻¹T₃, can be obtained by using block-shift F-matrices of the form Z_{n₁} ⊕ Z_{n₂} ⊕ ··· (see, e.g. [69a, b, and c]).
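The choice F₁ = F₂ = Z is easy to experiment with. The sketch below (random data, purely illustrative) checks that the displacement T − ZTZ* of a Toeplitz matrix has rank at most 2, and that a product T₁T₂ of Toeplitz matrices, though not Toeplitz itself, still has small displacement rank, which is what the fast algorithms exploit.

```python
import numpy as np

# Illustrative displacement-rank computations for F1 = F2 = Z, the
# lower-shift matrix (1's on the first subdiagonal).

n = 8
Z = np.eye(n, k=-1)

def toeplitz(c):
    """Toeplitz matrix [c_{i-j}] built from entries c_{-(n-1)}..c_{n-1}."""
    return np.array([[c[i - j + n - 1] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
T1 = toeplitz(rng.standard_normal(2 * n - 1))
T2 = toeplitz(rng.standard_normal(2 * n - 1))

disp = lambda R: R - Z @ R @ Z.T        # the displacement of (70)
print(np.linalg.matrix_rank(disp(T1), tol=1e-6))       # at most 2
print(np.linalg.matrix_rank(disp(T1 @ T2), tol=1e-6))  # small compared to n
```

Only the first row and column of T − ZTZ* survive the shift, which is why the rank bound of 2 holds for every Toeplitz matrix.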
It is worth noting here that the paper [69] deals with certain now well-known matrix identities for the inverse of Toeplitz matrices, first introduced by

Gohberg and Semencul in 1972 in a paper published in the journal of the Moldavian Academy of Sciences. The formulas express the inverse of the Toeplitz matrix in the form

T⁻¹ = L₁L₁* − L₂L₂*    (71)

where L₁ and L₂ are lower triangular Toeplitz matrices. From this it can be seen that the displacement T⁻¹ − ZT⁻¹Z* has the same rank (and inertia) as the displacement T − ZTZ*, which was a key observation in the development of the theory of displacement structure; see the paper [70], written just as the beginning of displacement theory was taking shape.
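This rank-and-inertia observation is also easy to confirm numerically. The sketch below (an illustrative symmetric positive-definite Toeplitz matrix with entries c_k = 0.5^|k|, an arbitrary choice) compares the inertia of T − ZTZ* with that of T⁻¹ − ZT⁻¹Z*.

```python
import numpy as np

# Illustrative check: the displacement of T and of its inverse have the
# same rank and inertia (counts of positive and negative eigenvalues).

n = 8
Z = np.eye(n, k=-1)                     # lower-shift matrix
T = np.array([[0.5 ** abs(i - j) for j in range(n)] for i in range(n)])

def inertia(M, tol=1e-9):
    ev = np.linalg.eigvalsh(M)
    return (int(np.sum(ev > tol)), int(np.sum(ev < -tol)))

d_T = T - Z @ T @ Z.T
d_Tinv = np.linalg.inv(T) - Z @ np.linalg.inv(T) @ Z.T
print(inertia(d_T), inertia(d_Tinv))    # same (n_plus, n_minus) pair
```

Both displacements here have one positive and one negative eigenvalue, matching the signature suggested by the difference-of-squares form (71).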
I first saw these important formulas in a Russian book sent to me by Gohberg
in 1973 a few months before the appearance of the English version [71]. That
this fortunate circumstance was based on an even more fortunate earlier accident
is a story that is too long to tell here. One connection is the operator formulas
noted earlier in (23); a further indication may be gained from the following
somewhat unusual acknowledgement in the paper [72]: "The first steps to the
results of this paper were taken with R. Geesey. Of course, the work progressed
very rapidly with the discovery of the deep and beautiful studies of Gohberg
and Krein; for the inadvertent discovery in May 1968 of their books, T. Kailath
is indebted to G. Wallenstein and his insistence on browsing in a Leningrad
bookstore." It is indeed a pleasure to mention here the considerable stimulation
and several other serendipitous benefits I have gained from having the good
fortune through this work to fall into the "orbit" of Professor Israel Gohberg.
As to the further development of displacement structure, that is a story to be told some other time. However for reference to recent results I might mention the study of some very useful connections of displacement rank structure to inverse scattering problems, which led to a unified framework for dealing with the Schur algorithms, identification of discrete transmission lines, matrix factorizations, lattice filtering and partial realization of linear systems (see [73a]-[73d]) and the Ph.D. dissertations of J. Chun [74a] and D. Pal [74b], the results of which are in process of publication. These deal with both Toeplitz- and Hankel-related matrices and with Bezoutian matrices (which are inverses of Toeplitz and Hankel matrices). Now Bezoutians have displacement structure, and so in particular their triangular factors (and thereby their inertia) can be determined via fast O(n²) algorithms; see [75]. On the other hand, from the time of Hermite it has been known that the inertia of these matrices determines the root distribution of polynomials with respect to the imaginary axis and the unit circle. It may therefore not be surprising that the fast algorithms turn out to yield very naturally, and in a unified way, the famous Routh-Hurwitz and Schur-Cohn criteria, among several others. This is shown in [75] for the regular case; extensions to singular cases are discussed in [74]; not surprisingly, Kalman has also made some elegant contributions to this classical area of system theory; see [76]-[78].
The above is only a partial account of the influence Rudy Kalman's seminal
contributions have had on my work and directions of research. Topics such as

2 Kaiman Filtering-From Kaiman Filtering to Innovations

85

stability, partial realization, stochastic realization, the positive-real lemma, network theory, linear systems, and control have been omitted. There are indeed few areas of mathematical system theory that Rudy has not influenced, directly or indirectly.

6 Acknowledgments
The topics mentioned in my review are fairly extensive, so any reasonable
referencing scheme must be consciously incomplete and unconsciously subjective. Therefore I would like to say explicitly that any significant omissions
are inadvertent. This being said, it is a special pleasure for me to recollect the
many pleasant and important interactions I have had with a host of students
and colleagues in many areas of mathematical system theory. As with references,
they are really too many to comfortably list here. But because of the nature
and depth of certain associations, and at least as far as the topics in this review
are concerned, I should like to especially thank Paul Frost, Adrian Segall,
Martin Morf, Lennart Ljung, Patrick Dewilde, Freddy Bruckstein, and Hanoch
Lev-Ari for many an enjoyable discussion and insight.
Finally I am indebted to Thanos Antoulas for his invitation to add my
contribution to the many others in this volume.

References
[1] R.E. Kalman, "The theory of optimal control and the calculus of variations," in Mathematical Optimization Techniques, ed. R. Bellman, pp 309-331, Univ of Calif Press, 1963
[2] T. Kailath, "Adaptive matched filters," ibid., pp 109-140
[3] R.E. Kalman, "A new approach to linear filtering and prediction problems," J Basic Eng, Vol 82, pp 34-45, Mar 1960
[4] R.E. Kalman and R.S. Bucy, "New results in linear filtering and prediction theory," Trans ASME, Ser D, J Basic Eng, Vol 83, pp 95-107, Dec 1961
[5] J.P. Schalkwijk and T. Kailath, "Coding with wideband additive noise channels with feedback, part I: no bandwidth limitation," IEEE Trans on Inform Thy, Vol IT-12, pp 172-182, April 1966
[6] J.P. Schalkwijk, "Center of gravity information feedback," IEEE Trans Inform Thy, Vol IT-14, pp 324-441, Mar 1968
[7] F.C. Schweppe, "Evaluation of likelihood functions for Gaussian signals," IEEE Trans Inform Thy, Vol IT-11, pp 61-70, 1965
[8] W. Davenport and W.L. Root, Random signals and noise, McGraw-Hill, 1958
[9] H.W. Bode and C.E. Shannon, "A simplified derivation of linear least square smoothing and prediction theory," Proc IRE, Vol 38, pp 417-425, Apr 1950
[10] L.A. Zadeh and J.R. Ragazzini, "An extension of Wiener's theory of prediction," J Appl Phys, Vol 21, pp 645-655, July 1950
[11] R.L. Stratonovich, "Application of the theory of Markov processes for optimum filtration of signals," Radio Eng Electron Phys (USSR), Vol 1, pp 1-19, Nov 1960
[12] T. Kailath, ed., Benchmark papers in linear least-squares estimation, Dowden, Hutchinson & Ross, Stroudsburg, PA, 1977 (now distributed by Academic Press)
[13] E. Wong and M. Zakai, "On the relation between ordinary and stochastic differential equations and applications to stochastic problems in control theory," in Proc 3rd IFAC Congr, London: Butterworth, 1966
[14] R.L. Stratonovich and Yu.G. Sosulin, "Optimal detection of a Markov process in noise," Eng Cybern, Vol 6, pp 7-19, Oct 1964


[15] R.S. Bucy, "Nonlinear filtering theory," IEEE Trans Automat Contr, Vol AC-10, p 198, April 1965
[16] T. Kailath, "Likelihood ratios for Gaussian processes," IEEE Trans Inform Theory, Vol IT-16, pp 276-288, May 1970
[17] L.D. Collins, "Realizable whitening filters and state-variable realizations," Proc IEEE, Vol 56, pp 100-101, Jan 1968
[18] T. Kailath, "An innovations approach to least-squares estimation-part I: linear filtering in additive white noise," IEEE Trans Automat Contr, Vol AC-13, pp 646-655, Dec 1968
[18a] T. Kailath, "A note on least-squares estimation by the innovations method," SIAM Journal Contr, Vol 10, no 3, pp 477-486, Aug 1972
[19] T. Kailath, "The innovations approach to detection and estimation theory," Proc IEEE, Vol 58, pp 680-695, May 1970
[20] R.K. Mehra, "On the identification of variances and adaptive Kalman filtering," IEEE Trans Automat Contr, Vol AC-15, pp 175-184, 1970. See also Vol AC-16, pp 12-21, 1971
[21a] V.E. Benes, "On Kailath's innovations conjecture," Bell Syst Tech J, Vol 55, pp 981-1001, Sept 1976
[21b] D.F. Allinger and S.K. Mitter, "New results on the innovations problem for nonlinear filtering," Stochastics, Vol 4, pp 339-348, 1981
[22a] T. Kailath, "A general likelihood-ratio formula for random signals in Gaussian noise," IEEE Trans Inform Theory, Vol IT-15, pp 350-361, May 1969
[22b] T. Kailath, "A further note on a general likelihood formula for random signals in Gaussian noise," IEEE Trans on Inform Theory, Vol IT-16, pp 393-396, July 1970
[22c] T. Kailath, "The structure of Radon-Nikodym derivatives with respect to Wiener measure," Ann Math Stat, Vol 42, pp 1054-1067, 1971
[23] M.H.A. Davis and E. Andreadakis, "Exact and approximate filtering in signal detection," ibid., Vol IT-23, pp 768-772, 1977
[24] I.V. Girsanov, "On transforming a certain class of stochastic processes by absolutely continuous substitution of measures," Theor Probability Appl, Vol 5, pp 285-301, 1960
[25] M. Hitsuda, "Representation of Gaussian processes equivalent to Wiener processes," Osaka J Math, Vol 5, pp 299-312, 1968
[26] H. Kunita and S. Watanabe, "On square-integrable martingales," Nagoya Math J, Vol 30, pp 209-245, Aug 1967
[27] M. Fujisaki, G. Kallianpur, and H. Kunita, "Stochastic differential equations for the nonlinear filtering problem," Osaka J Math, Vol 9, pp 19-40, 1972
[28] M. Hazewinkel and J.C. Willems, eds., Stochastic systems: the mathematics of filtering and identification and applications, D. Reidel, 1981
[29] H. Sorenson, ed., Kalman filtering theory and applications, IEEE Press, New York, 1985
[30] T. Kailath, "Some extensions of the innovations theorem," Bell Syst Tech J, Vol 50, pp 1487-1494, Apr 1971
[31] R.E. Kalman, "Linear stochastic filtering theory: reappraisal and outlook," Proc Symp System Theory, pp 197-205, Polytechnic Inst, Brooklyn, 1965
[32] P.A. Meyer, "Sur un problème de filtration," Séminaire de Probabilités, part VII, lecture notes in mathematics, Vol 321, pp 223-247, Springer-Verlag, New York, 1973
[33] D.L. Snyder, Random point processes, J Wiley, New York, 1975
[34] A. Segall and T. Kailath, "The modeling of randomly modulated jump processes," IEEE Trans Inform Thy, Vol IT-21, pp 135-143, 1975. See also A. Segall, M.H.A. Davis and T. Kailath, "Nonlinear filtering with counting observations," ibid., pp 143-149
[35] A. Segall and T. Kailath, "Orthogonal functionals of independent-increment processes," IEEE Trans Inform Theory, Vol IT-22, pp 287-298, 1976
[36] P. Bremaud, Point processes and queues: martingale dynamics, Springer-Verlag, 1981
[37a] R.S. Liptser and A.N. Shiryaev, Statistics of random processes, Vols I and II, Springer-Verlag, 1977; original Russian edition, 1974
[37b] R.S. Liptser and A.N. Shiryaev, Theory of martingales, Kluwer, Amsterdam, 1989
[38] R. Ackner and T. Kailath, "Complementary models and smoothing," IEEE Trans Automat Contr, Vol AC-34, pp 963-969, Sept 1989
[39] P. Faurre, M. Clerget, F. Germain, Opérateurs rationnels positifs, Dunod, Paris, 1979
[40] T. Kailath and P. Frost, "An innovations approach to least-squares estimation, part II: linear smoothing in additive white noise," IEEE Trans Automat Contr, Vol AC-13, pp 655-660, Dec 1968


[41] R.E. Kalman, "Contributions to the theory of optimal control," Bol Soc Mat Mexicana, Second Ser, Vol 5, pp 102-119, 1960
[42] R. Redheffer, "On the relation of transmission-line theory to scattering and transfer," J Math Phys, Vol 41, p 141, 1962
[43] D.G. Lainiotis, "Partitioned estimation algorithms, II: linear estimation," Information Sciences, Vol 7, pp 317-340, 1974
[44a] D.Q. Mayne, "A solution of the smoothing problem for linear dynamic systems," Automatica, Vol 4, pp 73-92, 1966
[44b] D.C. Fraser and J.E. Potter, "The optimal linear smoother as a combination of two optimum linear filters," IEEE Trans Automat Contr, Vol AC-14, pp 387-390, 1969
[45a] L. Ljung and T. Kailath, "Backwards Markovian models for second-order stochastic processes," IEEE Trans Inform Thy, Vol IT-22, No 4, pp 488-491, July 1976
[45b] G. Verghese and T. Kailath, "A further note on backwards Markovian models," IEEE Trans Inform Thy, Vol IT-25, No 1, pp 121-124, January 1979; correction, Vol IT-25, p 501, July 1979
[46] L. Ljung and T. Kailath, "A unified approach to smoothing formulas," Automatica, Vol 12, No 2, pp 147-157, March 1976
[47a] G. Verghese, B. Friedlander, T. Kailath, "Scattering theory and linear least-squares estimation, part III: the estimates," IEEE Trans Auto Contr, 1980
[47b] L. Ljung, T. Kailath, B. Friedlander, "Scattering theory and linear least-squares estimation, part I: continuous-time problems," Proc IEEE, Vol 64, No 1, pp 131-139, January 1976
[48] B. Levy, D.A. Castanon, G.C. Verghese and A. Willsky, "A scattering framework for decentralized estimation problems," Automatica, Vol 19, pp 373-384, 1983
[49] B.D.O. Anderson and J.B. Moore, Optimal filtering, Prentice-Hall, 1979
[50] J.C. Willems, "Least-squares stationary optimal control and the algebraic Riccati equation," IEEE Trans Automat Contr, Vol AC-16, pp 621-634, 1971
[51] T. Kailath and L. Ljung, "The asymptotic behavior of constant-coefficient Riccati differential equations," IEEE Trans Automat Contr, Vol AC-21, pp 385-388, 1976
[52a] J.L. Casti, R.E. Kalaba and V.K. Murthy, "A new initial-value method for on-line filtering and estimation," IEEE Trans Inform Theory, Vol IT-18, pp 515-518, July 1972
[52b] J. Buell, J.L. Casti, R.E. Kalaba and S. Ueno, "Exact solution of a family of matrix integral equations for multiply-scattered partially polarized radiation," J Math Phys, Vol 11, pp 1673-1678, 1970
[53] R.E. Kalaba, H.H. Kagiwada, S. Ueno, Multiple scattering processes, inverse and direct, Addison-Wesley, MA, 1975
[54] R.E. Bellman and G.M. Wing, Introduction to invariant imbedding, J Wiley, New York, 1975
[55] S. Chandrasekhar, Radiative transfer, Oxford University Press, London, 1950. (Dover Publications, New York, 1960)
[56a] S. Chandrasekhar, "On the radiative equilibrium of a stellar atmosphere. XXII (concluded)," Astrophysical Journal, Vol 108, pp 188-215, 1948
[56b] V.V. Sobolev, A treatise on radiative transfer, Van Nostrand Co., Princeton, NJ, 1963; Russian original, 1956
[57] T. Kailath, R. Geesey, "An innovations approach to least squares estimation, part IV: recursive estimation given lumped covariance functions," IEEE Trans Automat Contr, Vol AC-16, No 6, pp 720-727, 1971
[58] T. Kailath, "Some new algorithms for recursive estimation in constant linear systems," IEEE Trans Inform Theory, Vol IT-19, No 6, pp 750-760, November 1973
[59a] J. Casti and O. Kirschner, "Numerical experiments in linear control theory using generalized X-Y equations," IEEE Trans Automat Contr, Vol AC-21, pp 792-795, 1976
[59b] M. Sorine, "Sur les équations de Chandrasekhar associées au problème de contrôle d'un système parabolique," C R Acad Sc, Paris, t 285, pp 863-865, 1977
[59c] J. Burns and R.K. Powers, "Factorization and reduction methods for optimal control of hereditary systems," Mat Aplic Comp, Vol 5, No 3, pp 203-248, 1986
[60] M. Morf and T. Kailath, "Square-root algorithms for linear least squares estimation," IEEE Trans on Autom Contr, Vol AC-20, No 4, pp 487-497, Aug 1975
[61] T. Kailath, A. Vieira and M. Morf, "Orthogonal transformation (square-root) implementations of the generalized Chandrasekhar and generalized Levinson algorithms," in Inter'l Symp on Syst Optimization & Analysis, ed. by A. Bensoussan and J.L. Lions, pp 81-91, Springer-Verlag, New York, 1979


[62a] M.S. Livsic, "Operators, oscillations, waves (Open systems)," Amer Math Soc Translations,
Vol 34, 1973; Russian original, Nauka, Moscow, 1966
[62b] M.S. Livsic and A.A. Yantsevich, "Operator colligations in Hilbert space," Nauka, Moscow,
1971; English translation, J. Wiley, New York, 1979
[63] T. Kailath, L. Ljung and M. Morf, "Generalized Krein-Levinson equations for efficient
calculation of Fredholm resolvents of nondisplacement kernels," Topics in Functional
Analysis, Vol 3, pp 169-184, ed. by I.C. Gohberg and M. Kac, Academic Press, New York,
1978
[64] T. Kailath, "Some new results and insights in linear least-squares estimation theory," First
Joint IEEE-USSR Workshop on Inform Thy, pp 97-104, Moscow, USSR, December 1975.
(Reprinted with corrections as Appendix A in T. Kailath, Lectures in Wiener and Kalman
Filtering, Springer-Verlag, 1981)
[65] T. Kailath, M. Morf and G. Sidhu, "Some new algorithms for recursive estimation in constant
linear discrete-time systems," Proc Seventh Princeton Conf on Inform Sci & Systs, pp 344-352,
Princeton, N.J., March 1973. See also IEEE Trans Automat Contr, Vol AC-19, pp 315-323,
Aug 1974
[66] M. Morf, "Fast algorithms for multivariable systems," Ph.D. dissertation, Dept of Elec Eng,
Stanford, CA, Aug 1974
[67] P. Dewilde, A. Vieira and T. Kailath, "On a generalized Szegö-Levinson realization algorithm
for optimal linear prediction based on a network synthesis approach," IEEE Trans Circuits
and Systems, Vol CAS-25, No 9, pp 663-675, Sept 1978
[68] T. Kailath, "Signal processing applications of some moment problems," Proc of Symposia
in Appl Math, Vol 37, pp 71-109; AMS Annual Meeting short course, San Antonio, TX,
January 1987; reprinted in Moments in Mathematics, ed. H. Landau
[69a] T. Kailath and J. Chun, "Generalized Gohberg-Semencul formulas for matrix inversion,"
pp 231-246 in The Gohberg Anniversary Collection, Vol I, ed. H. Dym et al., Birkhäuser,
Basel, 1989
[69b] J. Chun and T. Kailath, "Displacement structure for Hankel, Vandermonde and related
matrices," Linear Algebra and Its Appls, 1991
[69c] J. Chun and T. Kailath, "Divide-and-conquer solutions of least-squares problems for
matrices with displacement structure," SIAM Journal on Matrix Analysis, 1991
[70] T. Kailath, A. Vieira, and M. Morf, "Inverses of Toeplitz operators, innovations, and
orthogonal polynomials," SIAM Review, Vol 20, No 1, pp 106-119, Jan 1978
[71] I.C. Gohberg and I.A. Fel'dman, "Convolution equations and projection methods for their
solution," Amer Math Soc Translations, Vol 41, 1974; Russian original, Nauka, Moscow, 1971
[72] T. Kailath and D.L. Duttweiler, "An RKHS approach to detection and estimation theory,
part III: generalized innovations representations and a likelihood-ratio formula," IEEE
Trans on Inform Thy, Vol IT-18, No 6, pp 730-745, Nov 1972
[73a] T.K. Citron, A.M. Bruckstein and T. Kailath, "An inverse scattering approach to the partial
realization problem," Proc 23rd IEEE Conference on Decision & Contr, pp 1503-1506,
Las Vegas, NV, Dec 1984
[73b] T. Kailath, A. Bruckstein and D. Morgan, "Fast matrix factorization via discrete transmission
lines," Linear Algebra and Its Appls, Vol 75, pp 1-25, Mar 1986
[73c] A. Bruckstein and T. Kailath, "An inverse scattering framework for several problems in
signal processing," ASSP Magazine, Vol 4, No 1, pp 6-20, Jan 1987
[73d] A. Bruckstein and T. Kailath, "Inverse scattering for discrete transmission-line models,"
SIAM Review, Vol 29, No 3, pp 359-389, Sept 1987
[74a] J. Chun, Fast array algorithms for structured matrices, Ph.D. dissertation, Dept of Elec Eng,
Stanford University, CA, June 1989
[74b] D. Pal, Fast algorithms for structured matrices with arbitrary rank profile, Ph.D. dissertation,
Dept of Elec Eng, Stanford University, CA, June 1990
[75] H. Lev-Ari, Y. Bistritz and T. Kailath, "Generalized Bezoutians and families of efficient
root-location procedures," IEEE Trans Cir and Sys, Feb 1991
[76] R.E. Kalman, "On the Hermite-Fujiwara theorem in stability theory," Quart Appl Math,
Vol 23, pp 279-287, 1965
[77] R.E. Kalman, "Algebraic characterization of polynomials whose zeros lie in certain algebraic
domains," Proc Nat Acad Sci, U.S.A., Vol 64, pp 818-823, 1969
[78] R.E. Kalman, "On partial realizations, transfer functions and canonical forms," Acta
Polytechnica Scandinavica, MA 31, pp 9-32, 1979

Kalman Filtering and the Advancement of Navigation and Guidance

P. Faurre, SAGEM, 6, Avenue d'Iéna, F-75783 Paris, France
with the collaboration of L. Camberlein, B. Capit, P. Constancis, M. de Cremiers,
J. Dutilloy, F. Mazzanti, J.P. Paccard, M. Sorine

1 Basic Information Processing Required for Navigation and Guidance 1

1.1 Data Processing and Statistical Filters
One can safely say that the trajectory estimation problem for celestial bodies
has been a main incentive for the progress of instrumentation and processing
of astronomical data.
On one hand, numerical methods have been invented and developed for
such problems [1]. On the other hand, probabilistic considerations have been
introduced soon after the invention of probability theory to model instrumentation errors.
In this way, Legendre [2] and Gauss [3] made a great advance with their
least squares method, which can be considered as the real ancestor of statistical
filtering.
The basic idea is the following: for estimating unknown parameters x from
some noisy measurements

   y = Hx + w                                            (1.1)

where w is a "noise", one has to minimize a quadratic criterion such as

   inf J(x),   J(x) = (y − Hx)^T V (y − Hx)              (1.2)

to get the best estimate x̂ of x.
Here V is some kind of likelihood matrix, the inverse of a noise covariance
matrix R (R = V⁻¹).
The optimal data processing is then equivalent to the solution of the
so-called "normal equation":

   H^T V H x̂ = H^T V y                                   (1.3)

When dynamics and real time considerations enter the picture, one is led to
the filtering problem with two main methods:
1 Pierre Faurre.


(i) Wiener Filters


The approach, consistent with classical control theory and analog electronic
technology, was mainly developed from the method introduced by Wiener [4].
The random signals are described, as engineers say, as "white noise exciting a
black box". Mathematicians speak more precisely of spectral representations:
noise or signals are modeled by correlation functions or by their Fourier transforms, called spectral functions.
The main step in the solution of the optimal data processing problem, i.e. in
obtaining the transfer function of the filter, is a spectral factorization, which can
be done by algebraic methods, or by numerical methods equivalent to the
solution of a normal equation.
As early as 1947, Levinson [5] obtained a fast numerical algorithm for such
a solution. A basic limitation of this approach is that it can actually solve
only stationary problems (i.e. stationary signals and equally sampled
observations).
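Levinson's fast algorithm exploits the Toeplitz structure of the stationary normal equations. A minimal sketch of the recursion (with a small made-up autocorrelation sequence as input) is:

```python
import numpy as np

def levinson_durbin(r, order):
    # Levinson's recursion (1947): solves the Toeplitz (Yule-Walker)
    # normal equations in O(order^2) operations instead of O(order^3).
    # r[0..order] are autocorrelation lags; returns the prediction
    # polynomial a (with a[0] = 1) and the final prediction-error power.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for m in range(1, order + 1):
        acc = r[m] + a[1:m] @ r[m-1:0:-1]    # correlation of a with past lags
        k = -acc / err                        # reflection coefficient
        a[1:m+1] = a[1:m+1] + k * a[m-1::-1]  # order-update of the polynomial
        err *= 1.0 - k * k                    # error power decreases
    return a, err

# Verify on a small positive-definite Toeplitz system (illustrative lags):
r = np.array([4.0, 2.0, 1.0, 0.5])
a, err = levinson_durbin(r, 3)
T = np.array([[r[abs(i - j)] for j in range(4)] for i in range(4)])
print(T @ a)   # ≈ [err, 0, 0, 0]: a solves the Toeplitz normal equations
```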
(ii) Kaiman Filters
Kalman filtering methods do not have such limitations of stationarity and are
well adapted to digital processing, which makes them ideally suited to the
navigation problem, as will be seen later. The idea of state space or state vector
is used to model random signals and instrument noise. Mathematicians speak
of a Gauss-Markov representation which, although already considered by such
mathematicians as Doob [6], was introduced by Kalman in the early 1960's into
the filtering problem in one of the most famous contemporary papers in the
control literature [7]. See also [8].
The general framework is the following: estimate the state vector (Markov
process) x modeled as

   dx/dt = Fx + v                                        (1.4)

where v is a white noise,

   E[v(s)v^T(t)] = Qδ(s − t)                             (1.5)

through observations (continuous time or sampled)

   y = Hx + w                                            (1.6)

where w is another white noise of covariance R.


The optimal filter for the best estimate x̂ of x (the Kalman filter) can be
written down by inspection as the following numerical algorithm (continuous
time):

   dx̂/dt = Fx̂ + K(y − Hx̂)                               (1.7)

where the optimal (nonstationary) gain K is computed through a (continuous-time)
Riccati equation:

   K = PH^T R⁻¹
   dP/dt = FP + PF^T − PH^T R⁻¹ HP + Q.                  (1.8)

P is nothing else than the covariance of the estimation error x̃ = x − x̂, and
thus also gives the performance of the filtering process.
To apply the algorithm (1.7)-(1.8), called the Kalman filter, successfully, two more
considerations have to be resolved:

(a) to design the Markovian representation (1.4) for the signals or noises related
to the case under consideration.
We refer to Box-Jenkins [9], Faurre [10]-[11], and Young [12] for statistical
or more theoretical considerations on this identification and representation
problem.
(b) to implement the algorithm on a digital computer, in a stable way.
We refer to Bierman [13], who uses numerical factorization methods introduced
earlier by numerical analysts such as Golub [14] for updating least squares
normal equations when new data are added (in a certain way, Gauss [3]
already had a similar idea).
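A toy illustration of (1.7)-(1.8) in the scalar case (all numerical values F, H, Q, R are made up, not from the text): the Riccati equation is integrated numerically to its steady state, and the resulting gain is used in the filter equation.

```python
import numpy as np

# Scalar sketch of the Kalman filter (1.7)-(1.8); illustrative values only.
F, H, Q, R = -1.0, 1.0, 0.5, 0.1
dt, n = 1e-3, 20000

# Integrate the Riccati equation (1.8) by explicit Euler to steady state.
P = 1.0
for _ in range(n):
    P += dt * (F * P + P * F - P * H * (H / R) * P + Q)
K = P * H / R        # optimal gain K = P H^T R^{-1}

# Run the filter (1.7) on a simulated trajectory of (1.4)/(1.6).
rng = np.random.default_rng(1)
x = x_hat = 0.0
for _ in range(n):
    x += dt * F * x + np.sqrt(Q * dt) * rng.standard_normal()      # (1.4)
    y = H * x + np.sqrt(R / dt) * rng.standard_normal()            # (1.6)
    x_hat += dt * (F * x_hat + K * (y - H * x_hat))                # (1.7)

print(P, K)   # steady-state error covariance and gain
```

At steady state P satisfies 2FP − P²H²/R + Q = 0, so here P ≈ 0.145 and K ≈ 1.45; P is indeed the covariance of the estimation error, as noted above.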

1.2 Navigation
The Navigation Problem

The navigation problem (most usually in the vicinity of the earth) consists in
determining for a vehicle in real time:
- its position: 3 scalars (2 for boats),
- its velocity: 3 scalars (2 for boats),
- its attitude or orientation: 3 angles.

We are not going to describe all navigation methods or instruments, but we
shall make a basic distinction between
(i) inertial navigation, which consists of making measurements internal to
the vehicle (i.e. without any exchange with the external world) and so is
completely autonomous,
(ii) all other methods of navigation, which use exchanges of signals or measurements
with the external world: celestial navigation, radio navigation, etc.
We are now going to describe in more detail the principles of inertial navigation.


Inertial Navigation

Inertial navigation is based on the principle of inertia of mechanics (either
Newton's or Einstein's), which in some way is an observability principle in the
sense of Kalman. It states that, without any exchange with the external world,
one can determine inside a vehicle:
- the specific force T, the difference between all forces applied to the vehicle and
gravitation,
- the inertial rotation vector W of the vehicle relative to the "absolute" or
"inertial" space.
Today T is measured by instruments called accelerometers and W by
gyroscopes. For navigation close to the earth, i.e. close to the geoid (Fig. 1.1),
the basic physical constants are:

   Ω                  earth rotation rate (24 hours),
   g                  earth gravitation,
   R (or R1 and R2)   earth curvature radius.

More basically, two time parameters are really fundamental for the geoid:

   T  = 2π/Ω      = 24 hours,
   Ts = 2π√(R/g)  = 84 minutes, called the Schuler period.
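The Schuler period quoted above is easy to check numerically (the Earth radius and gravity values below are rough standard figures, assumed here rather than taken from the text):

```python
import math

# The two fundamental periods of the geoid, with rough Earth values.
R = 6.371e6                      # mean Earth radius, m (assumed)
g = 9.81                         # gravity, m/s^2 (assumed)
Omega = 2 * math.pi / 86400.0    # Earth rotation rate for a 24 h day, rad/s

T = 2 * math.pi / Omega          # = 24 hours = 86400 s
Ts = 2 * math.pi * math.sqrt(R / g)
print(Ts / 60)                   # ≈ 84.4 minutes: the Schuler period
```

Ts is the oscillation period of any Schuler-tuned (earth-radius pendulum) error dynamics, which is why it reappears below in the "damping of Schuler oscillations" of early hybrid systems.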

Foucault [15] was the first to implement those ideas, in 1851, with his famous
pendulum experiment and with the invention of the gyroscope. At the beginning
of this century, Sperry and Anschütz invented the gyrocompass. However, it was
during the Second World War that the complete design of an inertial navigation
system was made [16], determining all parameters from inertial instruments.
With gyroscopes and accelerometers set on a "cluster" (Fig. 1.2) one gets
enough information to measure absolute orientation and T.

Fig. 1.1. Inertial navigation close to the earth

Fig. 1.2. Inertial cluster

Fig. 1.3. Principles of an INS (Inertial Navigation System): a) strapdown, b) with platform

If we know a model for gravitation, i.e. the function g(r), we can mechanize
(integrate) the basic equation of mechanics

   d²r/dt²|abs = T + g(r)                                (1.9)

and get position, velocity and attitude of the vehicle.


Several time scales appear in this mechanization:
- attitude (orientation), depending on the angular vehicle dynamics, which can
require a few hundred Hz of bandwidth,
- "linear" velocity, with lower-frequency dynamics.
Therefore, in a first historical stage, hybrid computation was used
(Fig. 1.3a), with analog computation through a servoed platform for angular
integration, and digital computation for speed and position integration.
Today, digital computers are much faster, and so all the mechanization can
be done digitally (Fig. 1.3b): one then speaks of a "strapdown" inertial navigation
system, because the instruments are "strapped" to the vehicle.
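The mechanization of (1.9) can be sketched in one dimension as a double integration of the measured specific force plus a gravity model (all numbers below are illustrative; a real INS also integrates attitude from the gyros):

```python
# Toy one-dimensional mechanization of (1.9): d2r/dt2 = T + g(r).
dt = 0.01
g0 = -9.81          # flat-Earth gravity model g(r), an assumed constant
T_meas = 12.0       # constant accelerometer reading (specific force)
r = v = 0.0         # initial conditions, assumed known from alignment

for _ in range(1000):            # 10 seconds of "flight"
    a = T_meas + g0              # equation (1.9)
    v += a * dt                  # velocity integration
    r += v * dt                  # position integration

print(v, r)   # v = 21.9 m/s, r ≈ 109.6 m
```

Note that any accelerometer bias is integrated twice, which is exactly why the error models and the alignment/calibration problems discussed below matter so much.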

1.3 Navigation Considered Globally as a Filtering Problem


To get the best precision at an optimal cost, one is led today to build
hybrid systems using inertial instruments, which have the advantage of
being autonomous with wideband characteristics, together with external
information such as that given by radionavigation systems (Loran, Tacan, NavStar
GPS, etc.).
The first realizations of such hybrid systems were made using ad hoc classical
control methods ("damping of Schuler oscillations"). However, it was realized
early [17]-[18] that the Kalman filter was really a tremendous theoretical
and practical tool for designing hybrid navigation systems. We refer to [19] for a
detailed exposition. It can be said, in short, that the hybrid navigation problem
can be modeled as a nonstationary filtering problem as soon as one can model
the instrument errors by a Gauss-Markov model such as (1.4). The Kalman
filter then estimates the errors in real time, from which the best estimates of all the
navigation parameters can be computed.
It was a great intellectual advancement in the late 1960's to realize that the
navigation problem was really an optimal data processing problem and to put


it in a form suitable for Kalman filtering. See [19]. Many of the successful
applications of Kalman filters came about through a similar process.
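The hybrid scheme described above, estimating the INS error from the INS-minus-radio-aid difference, can be sketched with a one-state filter (random-walk error model; all variances are illustrative, not from any real system):

```python
import numpy as np

# Sketch of the hybrid-navigation idea: a scalar Kalman filter estimates
# the INS position *error* from the INS-minus-radio-aid difference.
rng = np.random.default_rng(2)
q, r_aid = 0.01, 4.0      # error-drift and radio-aid noise variances
x_hat, P = 0.0, 100.0     # estimated INS error and its covariance
err = 30.0                # true INS error (unknown to the filter)

for _ in range(200):
    err += np.sqrt(q) * rng.standard_normal()          # INS error drifts
    z = err + np.sqrt(r_aid) * rng.standard_normal()   # INS pos - aid pos
    P += q                                             # time update
    K = P / (P + r_aid)                                # Kalman gain
    x_hat += K * (z - x_hat)                           # measurement update
    P *= 1.0 - K

print(x_hat, P)   # x_hat tracks err; P settles near 0.195
```

The best navigation output is then the wideband INS solution minus the estimated error, combining the strengths of both sensor classes.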

1.4 Introduction to the Six Examples of the Chapter


The following parts have been written by L. Camberlein, B. Capit, P. Constancis,
M. de Cremiers, J. Dutilloy, F. Mazzanti, J.P. Paccard (all of SAGEM) and
M. Sorine of INRIA. They best illustrate the state of the art in implementing
navigation Kalman filters.
We start by showing in Sect. 2 how modern filtering techniques interact with
the design of new instruments: unbalanced instruments are used on purpose.
Next we show in Sect. 3 that Kalman filters can compensate for one of the
nightmares of mechanical engineers designing accurate systems: solid friction.
The most modern form of radionavigation is satellite navigation: Sect. 4
shows how Kalman filters are widely used in such systems. The related software
is actually very complex.
As in every mathematical integration process, the integration of the basic
equation (1.9) for an INS requires that the initial conditions be known: this is called
the alignment problem as far as the navigation parameters are concerned, and the
calibration problem as far as the parameters of all the error models are concerned.
A very efficient software package for a new ring laser gyro system is described in Sect. 5.
To conclude, we present two examples of hybrid INS systems:
- in Sect. 6, the Super-Etendard system, which has been operational for more
than 10 years,
- in Sect. 7, a new inertial-GPS multisensor system which has just flown and is a
very good example of "sensor fusion", as one would say today.

References
[1] H.H. Goldstine, A History of Numerical Analysis from the 16th through the 19th Century,
Springer-Verlag, 1977
[2] A.M. Legendre, Nouvelles Méthodes pour la Détermination des Orbites des Comètes, Courcier,
1806
[3] C.F. Gauss, Theoria Motus, Göttingen, 1809
[4] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, John Wiley,
1949
[5] N. Levinson, The Wiener Root Mean Square Error Criterion in Filter Design and Prediction,
Journal of Mathematics and Physics, XXV, No 4, pp 261-278, 1947
[6] J.L. Doob, The Elementary Gaussian Processes, Ann. Mathematical Statistics, 15, pp 229-282, 1944
[7] R.E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Journal of Basic
Engineering, pp 35-45, 1960
[8] R.E. Kalman, New Methods in Wiener Filtering Theory, Proceedings of the 1st Symposium
on Engineering Applications of Random Function Theory, pp 270-388, Wiley, 1963
[9] G. Box, G. Jenkins, Time Series Analysis, Holden-Day, 1969
[10] P. Faurre, Réalisations Markoviennes de Processus Stationnaires, Rapport LABORIA No 13,
IRIA, 1973
[11] P. Faurre, M. Clerget, F. Germain, Opérateurs Rationnels Positifs. Applications à
l'Hyperstabilité et aux Processus Aléatoires, Dunod, 1979
[12] P. Young, Recursive Estimation and Time Series Analysis, Springer-Verlag, 1984
[13] G.J. Bierman, Factorization Methods for Discrete Sequential Estimation, Academic Press, 1977
[14] G.H. Golub, C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 2nd
edition, 1989
[15] L. Foucault, Recueil de ses Travaux Scientifiques, Gauthier-Villars, 2 volumes, 1878
[16] C.S. Draper, W. Wrigley, J. Hovorka, Inertial Guidance, Pergamon, 1960
[17] K.J. Åström, Some Problems of Optimal Control in Inertial Guidance, IBM Research Paper
RJ-229, San Jose, 1962
[18] L.D. Brock, G.T. Schmidt, Statistical Estimation in Inertial Navigation Systems, pp 554-558,
AIAA/JACC Guidance and Control Conference, Seattle, August 15-17, 1966
[19] P. Faurre et al., Navigation Inertielle Optimale et Filtrage Statistique, Dunod, 1971

2 Wideband Control of the Gyro/Accelerometer Multisensors of a Strapdown Guidance System for Agile Missiles 1

2.1 Introduction

SIGAL,2 a family of strapdown inertial systems, has been developed to answer
the need for miniature wideband attitude and navigation systems for missile
guidance. For reasons of size, the only sensors used in these systems are three
miniature gyro/accelerometer multisensors, instead of the three accelerometers and
two gyroscopes generally used to provide the necessary angular rate and specific
force measurements.
In the following sections we first describe the gyro/accelerometer multisensor,
which is an unbalanced Dry-Tuned Gyro (DTG), and then the design
and test results of the digital regulator and estimator using Linear Quadratic
Gaussian (LQG) control.
Due to their small size and low price, high speed microprocessors are
increasingly used to implement regulator algorithms. This allows the implementation
of sophisticated digital control loops and wider bandwidths for
guidance systems. Earlier work [1] on the design of a digital servo-control
loop for a gyro led to problems of robustness because of parameter variations.
More recently, Ribeiro [2] proposed an LQG control for a similar gyroscope,
using a simplified model. Both studies supposed that the gyro angular rate is
equal to the controller torque, which is not the case during fast variations of
the angular rate. The LQG control described in this section is based on a more
sophisticated model inspired by Craig [3] and takes into account the multivariable
nature of the DTG.
In contrast to previous studies, a stochastic model of the angular rate and linear
acceleration of the system is used instead of taking these inputs proportional
to the controller torques [2], and the bandwidth of the estimator is separate from,
and higher than, the bandwidth of the closed loop.

1 P. Constancis (SAGEM), M. Sorine (INRIA).
2 SAGEM's trademark.


2.2 Equations of the Gyro/Accelerometer Motion

The strapdown gyroscope shown in Fig. 2.1 has a diameter of about 3 cm. Fig. 2.2
presents the gyro in greater detail.
The drive motor, a two-phase synchronous hysteresis type, spins the rotor
at a high angular velocity N relative to the casing (250 revolutions per second).

Fig. 2.1. SAGEM GSL 82 gyro/accelerometer

Fig. 2.2. Cross section of a dry-tuned gyro


The suspension is a Hooke's joint containing one gimbal connected to the
rotor on one side, and to the motor shaft by two orthogonal elastic hinges on
the other side. The rotor is then free to rotate about these two axes. The spring
rate of the flexure is dynamically tuned to nearly zero. Perfect adjustment is
not necessary, because the model takes into account the residual stiffness.
The rotor's center of mass is purposely not at the center of torsional support
(unbalanced gyro). This rotor axial pendulosity generates sensitivity to linear
acceleration inputs.
The inductive pickoffs measure the rotor angular position relative to the
case in two directions (sensing axes), making an orthogonal coordinate
reference frame with the shaft:
(2.1)

Two magnetic torquers, oriented along these gyro sensing axes, apply a
moment on the rotor, represented in case-fixed coordinates by H ω_c(t), where
H is a 2 x 2 constant matrix.
The gyro case motion is defined by its absolute angular rate ω(t) and its
specific force f(t), defined as the difference between the absolute linear
acceleration and the gravity acceleration. Both inputs are resolved along the
case-fixed coordinates.
Because of the gas contained inside the case, an aerodynamic moment as well
as a damping moment act on the rotor.
The same method as the one used by Craig [3], [7] for the case of an
unbalanced gyro can be used to obtain the following simplified equation of
motion [4]:
(2.2)

with

   ν = Iz/I − 1, νN     nutation frequency,
   N                    rotor speed,
   Ix, Iy, Iz           principal moments of inertia of the rotor,
   A, B, C              principal moments of inertia of the gimbal,
   I = (Ix + Iy + B)/2,
   D                    rotor windage damping coefficient,
   P = p + pg/2         pendulosity,
   p                    rotor pendulosity along the spin axis,
   pg                   pendulosity of the gimbal along the spin axis,
   Kq                   rotor-to-case lift torque coefficient,
   Kψ = (kx + ky − (A + B − C)N²)/2IzN   residual stiffness,
   kx, ky               torsional stiffnesses of the flexures.

2.3 Estimation and Control

Model (2.2) is selected for the control design. The design will first be explained
for a balanced gyro (P = 0).
An estimate ω̂(k) = E{ω(k) | y(k − 1), y(k − 2), ...} of the angular rate ω(k)
is required, assuming the past data y(k − 1), y(k − 2), ... to be known, where

   y(k) = θ(k) + w(k)                                    (2.3)

θ(k) being the pickoff measurement of the rotor angular position and w a
two-dimensional discrete-time Gaussian white noise.


The control ω_c which minimizes a quadratic loss function is also needed:
(2.4)

First, the deterministic problem, supposing knowledge of ω and of the complete
state, is solved. Then the control is

   u = ω_c − ω

instead of ω_c.
The corresponding discrete-time system is controllable, considering u as the
control, and observable, considering the pickoff angle as the observation. So the
discrete algebraic Riccati equation leading to the optimal gain L has one and only
one positive semidefinite solution. The control can be computed as

   u(k) = −L x_c(k)

In order to solve the estimation problem, a random drift is chosen as the
stochastic model of ω(k):

   ω(k + 1) = ω(k) + v(k)                                (2.5)

where v is a 2-D discrete Gaussian white noise, independent of w, and we introduce


an augmented state.
The new system is still observable (through the pickoff angle) and controllable
(through v). Therefore the discrete algebraic Riccati equation leading to the
optimal filter gain K has one and only one positive semidefinite solution, and
the steady-state Kalman filter is

   x̂(n + 1) = F x̂(n) + G ω_c(n) + K [y(n) − H x̂(n)]
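The steady-state filter gain can be obtained by simple fixed-point iteration of the discrete Riccati equation. The sketch below uses a small made-up 2-state system (a position/rate pair at 100 Hz), not the actual multisensor model:

```python
import numpy as np

# Fixed-point iteration of the discrete Riccati equation for the
# steady-state Kalman filter gain K; the system matrices are illustrative.
F = np.array([[1.0, 0.01],
              [0.0, 1.0]])      # position/rate chain at dt = 0.01 s
H = np.array([[1.0, 0.0]])      # position is observed
Q = np.diag([1e-5, 1e-4])       # process noise covariance
R = np.array([[1e-2]])          # measurement noise covariance

P = np.eye(2)
for _ in range(5000):
    Pp = F @ P @ F.T + Q                  # time update
    S = H @ Pp @ H.T + R                  # innovation covariance
    K = Pp @ H.T @ np.linalg.inv(S)       # filter gain
    P = (np.eye(2) - K @ H) @ Pp          # measurement update

print(K.ravel())   # converged steady-state gain
```

Once converged, K is frozen and the filter runs as the fixed recursion given above, which is what makes the real-time implementation cheap.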

Now the LQG control problem defined by (2.2) to (2.5) can be solved, using the
control and filter gains L and K. The system with state x, output y and input
ω_c is observable but not stabilizable, but the loss function is stabilizable and can
therefore be minimized [6]. It can also be shown that the optimal control is

   ω_c(n) = −[L, −I] x̂(n)

with integral action, because of the model (2.5) chosen for ω. A block diagram
of the system is shown in Fig. 2.3. A notch filter is added at the frequencies
N, 2N, 3N, ... because of a periodic noise on the pickoff outputs. The LQG
control is then generalized to the case of the three unbalanced gyroscopes of the
SIGAL inertial guidance system used in agile missiles.
Figure 2.4 shows the relations between the reference frame of each gyroscope
and the reference frame of the system. It can be noticed that: the balanced gyro
3 is only sensitive to ω_y and ω_z, the absolute angular rates of the vehicle along
the y and z axes; the unbalanced gyro 2 is sensitive to ω_x, ω_z, and to f_x, f_z, components
of the specific force; the inverted unbalanced gyro 1 is sensitive to ω_x, ω_z, f_x, f_y,
with a pendulosity coefficient opposite to that of the second gyro.
Therefore the whole system can detect ω and f along each reference axis.
Therefore the whole system can detect wand I along each reference axis.
Fig. 2.3. LQG control for one balanced gyro

Fig. 2.4. Definition of coordinate reference frames

Fig. 2.5. LQG control of three unbalanced gyroscopes

Figure 2.5 illustrates the complete estimation process for ω and f. It is possible
to show that the state feedback of the whole system can be split into three parts
(one per gyro).

2.4 Test Performance Results

A 2000 Hz sampling frequency was selected for the 6-input/6-output system.
Figure 2.6 shows the estimator frequency response; the responses of the other axes
are similar. The unity steady-state gain is verified. The gap between measurements
and simulation is mainly due to turntable resonances above 60 Hz. The


Fig. 2.6. Bode plot for the estimator transfer function ω̂x/ωx (measurement vs. simulation)

effect of the notch filters at 250 Hz can be seen. A 140 Hz bandwidth is shown.
This is twice the bandwidth of conventional implementations, where the
estimate ω̂ is taken to be ω_c, thus limiting the bandwidth of the estimator to
that of the control loop. In the proposed method, these two bandwidths are
distinct: the stabilization error is used to improve the estimation. On the Black plot
of the open-loop response (Fig. 2.7) a gain margin of 6 dB, a phase margin of
40 degrees and a closed-loop resonance of 3 dB can be measured. This Black plot
corresponds to the x output of one of the 3 gyros. It can be noticed that the
closed-loop resonance is greater than the estimation resonance. Compared to
classical control, the stiffness is three times higher and the sensor can stand angular
accelerations up to 60000/s². This is made possible because the control loop
can be tuned without detuning the estimator.
In conclusion, the LQG control method, together with an accurate model,
is, as expected, of great use for solving such a multivariable estimation and
control problem.


Fig. 2.7. Black plot for the x output of one of the 3 gyros

An LQG control for the non-stationary model (2.1) is currently under study.
The concept could result in a single instrument providing both measurements
of angular rate and specific force along two axes.

References
[1] G.K. Steel, S.N. Puri, Direct digital control of dry tuned rotor gyros, Automatic Control in
Space, Vol 2, pp 79-85, Pergamon Press, Oxford, 1980
[2] J.F. Ribeiro, A LQG regulator for the angular motion of the rotor of a tuned gyroscope, Instituto
de Pesquisas Espaciais, São José dos Campos (Brésil), INPE-4280, PRE/1152, Août 1987
[3] R.J.G. Craig, Theory of operation of an elastically supported tuned gyroscope, IEEE Transactions
on Aerospace and Electronic Systems, Vol AES-8, No 3, pp 280-288, May 1972
[4] P. Constancis, M. Sorine, Wideband linear quadratic gaussian control of strapdown dry tuned
gyro/accelerometers, AIAA Guidance, Navigation and Control Conference, Boston MA, Paper
No 89-3441, pp 141-145, August 1989
[5] H. Shingu, Study on non-interacting control of dynamically tuned dry gyro, Trans. Soc. Instrum.
and Control Eng. (Japan), Vol 20, No 6, pp 554-560, 1984
[6] M. Sorine, Sur l'équation de Riccati stationnaire associée au problème de contrôle d'un système
parabolique, Comptes Rendus de l'Académie des Sciences, t. 287, Série A, Septembre 1978, p 445
[7] R.J.G. Craig, Dynamically tuned gyros in strapdown systems, NATO-AGARD Conference
Proceedings No 116 on Inertial Navigation, Components and Systems, AD-758127, Paris,
February 1973

3 On-Line Adaptive Solid Friction Compensation for High Accuracy Stabilisation of Optronic Systems 1

3.1 Introduction

The line-of-sight stabilization error of a pointing and tracking system caused
by the response to gimbal bearing friction torque is often of sufficient magnitude
to be the object of an intense design effort [1]. This torque acts on the stabilized
member of the system's gimbal as a function of the relative angular motion between
that member and the gimbal's base. It is counteracted in conventional systems
by the torque motor of a stabilization feedback loop. A gyroscope mounted on
the stabilized member is used in the feedback loop. This loop produces a corrective
motor torque as a function of the error measured by the inertial sensor. Feedback
operation reduces friction-related errors. Nevertheless, precision is limited by
loop stability, which bounds the feedback gain. Stabilization errors are then often
unacceptably large.
When the friction torque can be accurately predicted in real time, it is possible
to improve precision by using a feedforward compensation of this torque before
its effect is measured by the feedback sensor. In that case the stabilization error is
no longer a function of the full friction torque but only of the mismatch between
the actual and predicted friction torques. In addition, it is possible that the coefficients
of the model vary with temperature, time and operating conditions.
This motivates making the friction compensation adaptive.
The detailed knowledge of friction behavior necessary to achieve accurate
real-time modeling has been improved recently [2], [1]. This is particularly
true concerning the transient behavior of friction caused by relative motion
reversals of the system's gimbal members. A characterization of damping in solid
friction oscillators is given by Dahl's model, which behaves as Coulomb's
model for large amplitudes and as viscous and structural damping for medium
and small amplitudes.
Adaptive friction compensation has been considered before. A feedforward
compensator adapted with model reference techniques and based on the
'Coulomb/stiction' model has been used [3]. Canudas, Åström and Braun [4]
have proposed an adaptive scheme with explicit identification of Coulomb's
1 M. Sorine (INRIA), P. Constancis (SAGEM).


model. This scheme uses the a priori information available, i.e. the structure of
the nonlinearity and the knowledge of some of the parameters.
Here we propose an adaptive friction compensator whose implementation
is based on Dahl's model. The friction models proposed in the literature are
briefly discussed in Sect. 3.2. In Sect. 3.3 we describe the indirect adaptive
friction compensator built around an Extended Kalman Filter.

3.2 Friction Modeling

In the classical Coulomb stiction model there is a constant friction torque Cc
opposing the motion when α̇ ≠ 0 (Fig. 3.1). For zero velocity the stiction Cs
will oppose all motions as long as the applied torques are smaller in magnitude than
the stiction torque. Coulomb's stiction model has been well established in
connection with large amplitudes.
A very different model, based on experimental studies of ball bearings, was
proposed by Dahl in [2]:

   dC/dt = σ (dα/dt) (1 − (C/Cmax) sgn(dα/dt))           (3.1)

with

   C             friction torque,
   Cmax          maximum friction torque,
   α             relative gimbal angle,
   sgn(dα/dt)    sign of the angular velocity,
   σ             model parameter.

This model is probably the most accurate for describing the friction transient behavior
during reversals, but it does not take stiction into account. A mathematical study
of Dahl's model has been done by Bliman [5]; in particular, the following result

Fig. 3.1. Coulomb's stiction model of friction

2 Kaiman Filtering-The Advancement of Navigation and Guidance

107

(j

is proven: Coulomb's model is the limit of Dahl's model as goes to infinity.


Extension of Dahl's model to cover the stiction case is under current research.
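Dahl's model (3.1) can be integrated numerically to see this behavior. The sketch below (illustrative parameter values only, not from the paper) drives the model at a constant angular rate: the friction torque settles at Cmax, i.e. Coulomb's behavior, and does so faster as σ grows, consistent with Bliman's limit result.

```python
import math

def simulate_dahl(sigma, c_max, alpha_dot, t_end=2.0, dt=1e-3):
    """Euler integration of Dahl's model (3.1) at a constant angular rate:

        dC/dt = sigma * (dalpha/dt) * (1 - (C/Cmax) * sgn(dalpha/dt))
    """
    c = 0.0                                   # friction torque, starting at rest
    for _ in range(int(t_end / dt)):
        sgn = math.copysign(1.0, alpha_dot)
        c += dt * sigma * alpha_dot * (1.0 - (c / c_max) * sgn)
    return c

# At a constant positive rate the torque settles at +Cmax, i.e. Coulomb's
# model; the larger sigma is, the faster the transient dies out.
print(simulate_dahl(sigma=50.0, c_max=1.0, alpha_dot=0.1))
print(simulate_dahl(sigma=500.0, c_max=1.0, alpha_dot=0.1))
```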

3.3 Friction Compensation


The equation of motion for the stabilized member, of inertia I and angular position β, is:

    I d²β/dt² = Cm + C    (3.2)

where β is stabilized to zero with a feedback control Cm. To reduce the effects of the friction terms by a nonlinear compensation, it is necessary to obtain an estimate Ĉ of the friction torque C.
Assume that y is a noisy measurement of dβ/dt (the output of a gyroscope):

    y = dβ/dt + v    (3.3)

where v is a Gaussian white-noise process with zero mean and covariance matrix:

    E(vv^T) = R

Extended Kalman Filtering can be used to identify the parameters:

    θ = [σ, 1/Cmax]^T    (3.4)

and to estimate dβ/dt and C.


Because the parameters are time-varying, it is necessary to eliminate the influence of old data. This can be done by using the stochastic model:

    dθ/dt = w    (3.5)

where w is a Gaussian white noise process with zero mean and covariance matrix:

    E(ww^T) = Q
Now, (3.1), (3.2) and (3.5) are state equations of a system observed through (3.3). When dβ/dt is small, dα/dt is close to the angular rate of the base of the pointing and tracking system and can be estimated separately.
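The role of the stochastic model (3.5) is to discount old data so the filter can follow drifting parameters. A minimal scalar illustration of this forgetting mechanism (hypothetical numbers, and a plain Kalman filter rather than the full extended filter described above):

```python
import random

random.seed(0)

def track_parameter(q, r=0.04, n=400):
    """Scalar Kalman filter for the model  dtheta/dt = w,  y = theta + v.

    q is the process-noise variance, playing the role of (3.5): q = 0 makes
    the filter average all past data and go stale; q > 0 discounts old data
    so the estimate can follow a drifting parameter.
    """
    theta_hat, p = 0.0, 1.0
    errors = []
    for k in range(n):
        theta_true = 1.0 + 0.005 * k               # slowly drifting parameter
        y = theta_true + random.gauss(0.0, r ** 0.5)
        p = p + q                                  # time update (random walk)
        gain = p / (p + r)                         # measurement update
        theta_hat = theta_hat + gain * (y - theta_hat)
        p = (1.0 - gain) * p
        errors.append(abs(theta_true - theta_hat))
    return sum(errors[-50:]) / 50.0                # mean error, last 50 steps

print(track_parameter(q=0.0))     # large: the frozen filter lags the drift
print(track_parameter(q=1e-3))    # small: forgetting keeps the estimate current
```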

3.4 Test Performance Results


Experimental results have been obtained, using a method close to the one presented above. For the test a stabilized member was mounted on a moving turntable with a gimbal which causes the friction. The turntable's angle α was measured to be 40 arc minutes peak-to-peak. Figures 3.2 and 3.3 show the stabilization error without and with compensation. The peak-to-peak error is shown to be divided by 3 using compensation [80 arc seconds peak-to-peak to 27 arc seconds peak-to-peak]. The friction torque estimate Ĉ is shown in Fig. 3.4.

Fig. 3.2. Stabilization angle error without compensation

Fig. 3.3. Stabilization angle error with compensation

Fig. 3.4. Friction torque estimate Ĉ
References
[1] C.D. Walrath, Adaptive Bearing Friction Compensation Based on Recent Knowledge of Dynamic Friction, Automatica, Vol 20, No 6, pp 717-727, 1984
[2] P.R. Dahl, Solid Friction Damping of Spacecraft Oscillations, Paper No 75-1104, AIAA Guidance and Control Conference, Boston, Mass., August 1975
[3] Gilbart, Winston, Adaptive Compensation for an Optical Tracking Telescope, Automatica, Vol 10, pp 125-131, 1974
[4] C. Canudas, K.J. Åström, K. Braun, Adaptive Friction Compensation in DC Motor Drives, IEEE Journal of Robotics and Automation, Vol RA-3, No 6, December 1987
[5] P.A. Bliman, Etude mathématique d'un modèle de frottement sec: le modèle de P.R. Dahl, Thèse de Docteur en Sciences, Université Paris IX, to be published

4 GPS Receiver Kalman Filter 1

4.1 NAVSTAR Global Positioning System


Overview of the System

The NAVSTAR Global Positioning System (GPS) is based on distance measurements to satellites whose positions are known (Fig. 4.1). The actual measurements are given by the transmission time of radio-frequency signals between the satellites and the user's receiver antenna.
The GPS system consists of a constellation of 21 satellites and a ground based control segment. The satellites travel on earth centered orbits of radius 26000 km inclined 55° with the equatorial plane. They carry atomic clocks used to synchronise the transmission of radio-frequency signals to GPS users. The signals are composed of a carrier sine-wave, a pseudo-random code and a navigation data message. The data message consists of various parameters allowing the user to compute the position of the transmitting satellite and to correct for atomic clock errors. Other components of the signal are used to measure the transmission time. The control segment tracks the position and clock errors of each satellite to periodically update the navigation data message.
Simplified Structure of a GPS Receiver

A GPS receiver is composed of an antenna, a preamplifier, a radio-frequency module and a digital module (Fig. 4.2). The digital section can carry from 1 to 12 channels, depending on receiver design. A channel is primarily composed of two phase lock loops tracking the carrier wave and the pseudo-random code of one satellite. Each loop gives a phase measurement related to the transmission time.
If the number of channels available is less than the number of satellites in view, channel sequencing is used to get measurements related to different satellites at different times on the same channel.

1. Dutilloy, B. Capit, L. Camberlein (SAGEM).


Fig. 4.1. Principle of GPS positioning

Fig. 4.2. GPS receiver architecture

GPS Measurements

The code phase, or pseudo-range ρ, allows measuring the whole time difference between the transmit time and the receive time. The measurement ρ is related to the user-to-satellite distance p by (Fig. 4.1):

    ρ = p + c(bs - br) + dion + dtrop + ds + Wp    (4.1)

where

    p = (p^T p)^(1/2)    is the user-to-satellite distance,
    u = (1/p) p          is the unit vector of the user-to-satellite direction,
    p = Rs - R           is the user-to-satellite vector,


with
    Rs = [Xs, Ys, Zs]^T    satellite's position in earth-fixed earth-centered coordinates, as given by the navigation data message,
    R = [X, Y, Z]^T        user's position in the same reference frame,
    c                      speed of light in vacuum,
    bs                     satellite clock offset from GPS time,
    br                     receiver clock offset from GPS time,
    dion and dtrop         ionospheric and tropospheric propagation errors,
    ds                     error in the transmitted satellite position,
    Wp                     code tracking error of the receiver.

The carrier phase, or pseudo-range rate, φ can only be measured relative to the time when the loop has locked onto the incoming signal. The measurement φ is related to the satellite-to-user distance p by:

    φ = p + φ0 + c(bs - br) - dion + dtrop + ds + Wφ    (4.2)

where
    φ0                               is the phase ambiguity at lock time, which is unknown,
    p, c, bs, br, dion, dtrop, ds    are the same as above,
    Wφ                               is the carrier tracking error of the receiver.

GPS related errors bs and ds are usually small, from 1 to 5 meters. Propagation residual errors dion and dtrop vary from 5 to 30 meters. Receiver clock errors are much greater and must be evaluated to compute a good position.
Code tracking error and carrier tracking error are of different magnitudes due to the corresponding wavelengths. Code resolution is of the order of 1 to 30 meters, depending on the type of code tracked and the receiver design. Carrier resolution is of the order of 1 to 3 centimeters. Consequently a good design must take advantage of the accurate but relative carrier phase measurement and of the absolute but less accurate code measurement.
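One classical way of combining the two measurement types is carrier-smoothed code, a Hatch-style filter; the text does not prescribe this particular scheme, so the sketch below (with made-up numbers) is only an illustration of the code/carrier trade-off:

```python
import random

random.seed(1)

def hatch_smooth(code, carrier, window=100):
    """Carrier-smoothed code: average the absolute but noisy code measurement
    while propagating it between epochs with the precise carrier-phase deltas
    (the unknown constant ambiguity cancels in the delta)."""
    smoothed = code[0]
    out = [smoothed]
    for k in range(1, len(code)):
        n = min(k + 1, window)
        predicted = smoothed + (carrier[k] - carrier[k - 1])
        smoothed = code[k] / n + predicted * (n - 1) / n
        out.append(smoothed)
    return out

# Simulated epochs: the true range drifts; the code measurement is noisy
# (metres) while the carrier is precise (centimetres) but offset by an
# unknown constant ambiguity. All numbers are illustrative.
truth = [20000e3 + 5.0 * k for k in range(200)]
code = [r + random.gauss(0.0, 3.0) for r in truth]
carrier = [r + 12345.0 + random.gauss(0.0, 0.02) for r in truth]

smoothed = hatch_smooth(code, carrier)
raw = sum(abs(c - r) for c, r in zip(code, truth)) / len(truth)
smo = sum(abs(s - r) for s, r in zip(smoothed, truth)) / len(truth)
print(raw, smo)   # the smoothed series is markedly less noisy
```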

4.2 Kalman Filter of a GPS Receiver

GPS receivers used on vehicles usually implement a Kalman filter to estimate position and velocity from GPS measurements and code and carrier phase observation models.

Code Phase Observation Model


Equation (4.1) is a measurement of the user's position in which Wp can be modeled as zero mean Gaussian white noise. The errors bs, ds, dion and dtrop are relatively small and non-observable and may be included in the measurement noise Wp.


The code measurement equation then reduces to:

    ρ = p - c br + Wp

which is a non-linear function of the vehicle's position and must be linearized around the position estimate. The code phase observation model is then:

    δyρ = - u δR - c δbr + Wp
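Iterating the linearized model recovers position and receiver clock offset from four pseudo-ranges. A Gauss-Newton sketch under simplifying assumptions (hypothetical satellite coordinates, noise-free measurements, satellite clock and propagation errors already corrected):

```python
import numpy as np

c = 299792458.0   # speed of light (m/s)

# Hypothetical satellite positions (earth-centered coordinates, metres) and a
# true user state; illustrative numbers only, not real ephemeris.
sats = np.array([[15600e3,  7540e3, 20140e3],
                 [18760e3,  2750e3, 18610e3],
                 [17610e3, 14630e3, 13480e3],
                 [19170e3,   610e3, 18390e3]])
R_true = np.array([1113e3, 6001e3, 1947e3])
br_true = 1e-4                                # receiver clock offset (s)

# Noise-free pseudo-ranges per (4.1), with satellite clock and propagation
# terms assumed already corrected:
rho = np.linalg.norm(sats - R_true, axis=1) - c * br_true

# Gauss-Newton iteration on the linearized model  d(rho) = -u.dR - c.dbr
R_hat, br_hat = np.zeros(3), 0.0
for _ in range(20):
    p = sats - R_hat                          # user-to-satellite vectors
    dist = np.linalg.norm(p, axis=1)
    u = p / dist[:, None]                     # unit vectors
    H = np.hstack([-u, -c * np.ones((4, 1))])
    dy = rho - (dist - c * br_hat)            # measurement residuals
    dx = np.linalg.lstsq(H, dy, rcond=None)[0]
    R_hat += dx[:3]
    br_hat += dx[3]

print(R_hat - R_true)    # position error after the iterations (metres)
print(br_hat - br_true)  # clock offset error (seconds)
```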

Carrier Phase Observation Model


Equation (4.2) is generally used as a measurement of the user's velocity by computing the difference of two measurements over a time interval Δt. Wφ can be modeled as zero mean Gaussian white noise. The variations of bs, dion, dtrop and ds over Δt are much smaller than Wφ and, consequently, can be neglected.
The carrier phase rate, or pseudo-range rate, observation model then reduces to:

    Δφ/Δt = Δp/Δt - c Δbr/Δt + Wφ = dp/dt - c fr + Wφ

where fr is the receiver clock offset rate.
This model is also non-linear. Linearized around the position estimate it gives the carrier phase observation model:

    δyφ = Δφ/Δt - Δφ̂/Δt = δ(dp/dt) - c δfr + Wφ

Model of the Vehicle's Dynamics


The vehicle's dynamics is modeled by a Markov process in earth coordinates. For high dynamics vehicles, the model includes position, velocity and acceleration:

    dR/dt = V
    dV/dt = A
    dA/dt = - (1/T) A + ν

with
    R = [X, Y, Z]^T       position in earth-fixed earth-centered coordinates,
    V = [Vx, Vy, Vz]^T    velocity in the same coordinates,
    A = [Ax, Ay, Az]^T    acceleration in the same coordinates,
    ν                     white noise.
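For a discrete-time filter this model is propagated per axis with the transition matrix exp(FΔt). A sketch of the deterministic part of the [X, V, A] state, with an assumed correlation time T (the text gives no value):

```python
import numpy as np

T = 5.0       # assumed acceleration correlation time (s); illustrative only
dt = 0.1      # propagation step (s)

# Per-axis continuous model [X, V, A]:  dX/dt = V, dV/dt = A, dA/dt = -A/T + nu
F = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, -1.0 / T]])

def expm_taylor(M, terms=20):
    """Matrix exponential by truncated Taylor series (fine for small M)."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

Phi = expm_taylor(F * dt)          # discrete transition matrix over one step

x = np.array([0.0, 10.0, 1.0])     # 10 m/s and 1 m/s^2 at t = 0
for _ in range(50):                # propagate 5 s of the deterministic part
    x = Phi @ x
print(x)   # after t = T the acceleration has decayed by a factor exp(-1)
```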

Implementation of the Navigation Kalman Filter

The complete model, including the vehicle dynamic states and the GPS states, linearized about the current estimate, is of the form:

    d(δX)/dt = F δX + v
    δY = H δX + w

with

    δX = [δX, δY, δZ, δVx, δVy, δVz, δAx, δAy, δAz, δbr, δfr]^T
    δY = [(ρ - ρ̂), (Δφ/Δt - Δφ̂/Δt)]^T

This non-stationary 11 state model is observable if measurements to 4 or more satellites can be made.
It can be seen that processing 4 or more simultaneous measurements of different satellites will give an estimate of position and velocity at measurement time independent of vehicle dynamics. Of course, the best result is reached when simultaneous measurements are taken from all visible satellites, which is sometimes called the "All-in-View" concept.
If less than 4 channels are used, the filter will rely on the vehicle dynamics' model to correlate measurements made at different times. Consequently the estimates will be affected not only by measurement noise but also by the propagation of state noise between measurements. In these conditions, time differences between satellite sampling times, level of vehicle dynamics, and receiver clock stability are critical issues.
For this reason, when dealing with high dynamics, four or more channels are generally required. For lower dynamics, one or two channel receivers can be used, depending on their channel sampling time and clock stability.
References
[1] D.E. Wells, et al., Guide to GPS Positioning, Canadian GPS Associates, Fredericton, N.B., Canada, 1986
[2] J.J. Spilker, GPS Signal Structure and Performance Characteristics, Journal of the Institute of Navigation, Vol I, pp 29-54, 1980
[3] R.P. Denaro, P.V.W. Loomis, GPS Navigation Processing and Kalman Filtering, AGARD No 161, pp 11.1-11.9, 1989
[4] L. Camberlein, B. Capit, Uliss G, a Fully Integrated "All-in-One" and "All-in-View" Inertia-GPS Unit, IEEE PLANS 1990 Symposium, to appear
[5] J. Ashjaee, On the Precision of the C/A Code, IEEE PLANS 86 Proceedings, pp 214-217

5 Calibration of a Ring Laser Gyro Inertial System 1

5.1 Introduction

To achieve the same navigation accuracy as gimballed inertial navigation systems (gimballed INS's), strapdown inertial navigation systems (strapdown INS's) require better gyros and accelerometers and, therefore, a better calibration accuracy.
A 1 Nm/h-1 m/s class strapdown INS for fighter aircraft typically requires gyro stabilities and calibration accuracies of .005 °/h for drifts, 5 ppm for scale factors and 15 microradians for misalignments. Accordingly, it requires for its accelerometers stabilities and calibration accuracies of 50 ppm for scale factors, 40 micro-g for bias and 15 microradians for misalignments.
For transport aircraft and commercial aviation, numbers 2 to 3 times larger are usually satisfactory due to lower flight dynamics and to less stringent velocity accuracy needs. Except for gyro drifts, most of these values are about or more than an order of magnitude lower than what is required for a gimballed INS for the same type of application.
Usually, the calibration of an inertial navigation system consists in comparing, while the system is under test, the gyroscopes' outputs with the Earth rate and the accelerometers' outputs with the local gravity. This is done for different angular positions with respect to the local geographic axes by using an accurate turntable on which the cluster is rigidly fastened.
However, in the particular case of a ring laser gyro strapdown inertial navigation system, this method cannot be used because the elastic isolation of the cluster (on which the laser gyros and accelerometers are mounted) must not be clamped. Therefore, a specific calibration method has to be used.

F. Mazzanti, L. Camberlein (SAGEM).


5.2 Overview of the SIGMA Calibration Method

The calibration method which was developed in 1985 for the SIGMA 2 ring laser gyro systems substitutes the analytical platform of a strapdown INS for the turntable reference axes. The analytical platform, which can be seen as the image of the platform cluster of a gimballed system, is computed in real time by integration of the attitude differential equations using the gyros' outputs. This integration provides the direction cosine attitude matrix between the cluster axes and the analytical platform axes. Therefore, this matrix takes into account the angular errors of the elastic isolation during multiposition calibrations.
The accelerometer outputs, in cluster axes, are transformed into analytical platform axes using the attitude matrix. These accelerometer outputs are used to define very accurately the attitude and rate of the analytical platform. For this attitude and rate measurement, the accelerometers have a much better resolution than the turntable angular encoders of classical calibration methods.
Before a calibration procedure, the inertial navigation system (which is mounted on a two-axis turntable) is aligned using its own gyrocompass alignment. Then the INS is switched to navigation mode and the calibration procedure is started. This procedure consists of using the velocity errors of the INS during each static period before and after a turntable rotation. It is obvious that the number of rotations needed depends on the number of parameters to be identified in the accelerometers' and gyros' error models. The concept of observability as defined by Kalman is very useful to design the procedure.
To process in real time the standard navigation outputs of the ring laser gyro INS, a high accuracy Kalman Filter has been developed, based on the fine modeling of the SIGMA strapdown navigation mode.

5.3 Strapdown Navigation Model

The velocity errors of a strapdown INS depend on its gyros' and accelerometers' calibration errors, as well as on its attitude and alignment errors. Their relationship is described by the error differential equations of strapdown inertial navigation. These equations can be written, in matrix form, as follows (see [3]):

    d(δV)/dt = Tp/m Ea - A(φ) Tp/m fa - Coriolis terms ...,    δV(0) = δV0    (5.1)
    dφ/dt = Tp/m Eg - A(ω) φ,    φ(0) = φ0    (5.2)
    dXa/dt = 0,    Xa(0) = Xa0    (5.3)
    dXg/dt = 0,    Xg(0) = Xg0    (5.4)

Trademark of a family of ring laser gyro inertial navigation systems developed by SAGEM.


with

    Ea = Fa Xa + va    (5.5)
    Eg = Fg Xg + vg    (5.6)

and where
    δV      is the velocity error vector in geographic axes,
    φ       is the small angle between the analytical platform and the geographic axes (it is the attitude alignment error),
    Xa      is the accelerometer error state vector,
    Xg      is the gyro error state vector,
    Ea      is the accelerometer total error in cluster axes, linear function of Xa,
    Eg      is the gyro total error in cluster axes, linear function of Xg,
    Tp/m    is the rotation matrix between the analytical platform axes [p] and the cluster (or measurement) axes [m] (it is computed in the INS by integration using the gyro outputs),
    fa      is the specific force as measured by the accelerometers.

The total error model is then linear and of the form:

    d(δX)/dt = F δX + v,    δX(0) = δX0    (5.7)

where
    δX = [δV, φ, Xa, Xg]^T    is the total state vector including velocity, attitude, gyro and accelerometer error states,
    F                         is the dynamic matrix of the model,
    v                         is the process noise vector.

During the calibration or alignment mode, when the system is at rest, it is known that the velocity relative to the earth is zero. Velocity measurements equal to zero are made between each rotation of the cluster. Therefore, the observation model is linear, stationary and of the form:

    δY = H δX + W    (5.8)

where
    δY    is the measurement vector,
    H     is the measurement matrix,
    W     is the measurement error, which depends mainly on the accelerometer quantization.

Written in the canonical form of equations (5.7) and (5.8), calibration becomes a filtering problem which can be solved by Kalman filtering techniques, assuming all the errors are observable. The observability depends on the gyro and accelerometer error models and on the sequence of rotations and angular positions used for the cluster. Therefore, a sufficient number of angular positions must be performed to estimate all the observable terms. This calibration technique is very flexible and allows any sequence and any number of positions. As with any other method, the accuracy of this technique depends on the accuracy of the gyro and accelerometer error models.
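The Kalman rank condition invoked here can be checked numerically: the pair (F, H) is observable iff [H; HF; ...; HF^(n-1)] has rank n. The sketch below uses a toy three-state analogue of the velocity/attitude/drift chain with zero-velocity measurements, not SIGMA's roughly 30-state model:

```python
import numpy as np

def observable(F, H):
    """Kalman rank condition: (F, H) is observable iff the observability
    matrix [H; HF; ...; HF^(n-1)] has full column rank n."""
    n = F.shape[0]
    blocks = [H]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ F)
    return np.linalg.matrix_rank(np.vstack(blocks)) == n

# Toy 3-state analogue (illustrative only): a velocity error driven by an
# attitude error, itself driven by a constant gyro drift, with only the
# velocity observed, as in the zero-velocity updates of Sect. 5.3.
F = np.array([[0.0, 9.81, 0.0],    # d(dV)/dt = g * phi
              [0.0, 0.0,  1.0],    # dphi/dt  = drift
              [0.0, 0.0,  0.0]])   # the drift state is constant
H = np.array([[1.0, 0.0, 0.0]])    # zero-velocity measurement observes dV

print(observable(F, H))   # the drift shows up in the velocity history
```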

5.4 Implementation of the Calibration Kalman Filter

The dimension of the state vector of the calibration Kalman filter depends on the number of error terms to be identified. For the calibration of a 0.5 Nm/h class ring laser gyro INS, such as the SIGMA system, a state of dimension in the vicinity of 30 is required.

Fig. 5.1. Calibration simulation: typical recovered errors and Kalman filter estimated errors versus time and cluster angular position sequence. Upper plot: typical calibration, identified values as functions of time and angular position; lower plot: typical calibration, standard deviations of calibration errors. Curves: (1) gyro scale factor error (ppm), (2) gyro misalignment (μrd), (3) accelerometer bias (μg).


Special care must be taken to avoid numerical instability by using special Kalman filter algorithms such as Bierman's factorization. The transition matrix must be computed very accurately to achieve the required precision.
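Bierman's U-D factorization propagates triangular factors of the covariance rather than the covariance itself. A milder, related precaution, shown here in place of the U-D algorithm itself, is the Joseph-form measurement update, which preserves symmetry and non-negativity of P in finite precision:

```python
import numpy as np

def joseph_update(P, H, R, K):
    """Joseph-form covariance update: P+ = (I-KH) P (I-KH)^T + K R K^T.

    Algebraically equal to the short form (I-KH)P for the optimal gain, but
    it keeps P symmetric and non-negative in finite precision. (Bierman's
    U-D factorization goes further by propagating factors of P.)
    """
    I_KH = np.eye(P.shape[0]) - K @ H
    return I_KH @ P @ I_KH.T + K @ R @ K.T

# One scalar-measurement update on an ill-conditioned 2x2 covariance
P = np.array([[1e8, 0.0], [0.0, 1e-8]])
H = np.array([[1.0, 1.0]])
R = np.array([[1e-2]])
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
P_new = joseph_update(P, H, R, K)
print(np.allclose(P_new, P_new.T))             # symmetric by construction
```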

Calibration Validation

An accurate simulation of a ring laser gyro INS has been used to extensively validate the accuracy of this calibration technique. This simulation takes into account all the errors modeled (accelerometer scale factor errors, misalignments and bias, and gyro scale factor errors, misalignments and drifts).
Simulation results are illustrated in Fig. 5.1 and Table 5.1. Typical evolution of recovered errors and Kalman estimated errors as a function of time and angular position is given in Fig. 5.1. Typical simulated and recovered errors as well as Kalman filter estimated and true errors can be compared in Table 5.1. The accuracy and the consistency of these results may be noticed.

Table 5.1 Typical calibration results. SF = scale factor, B = bias, M = misalignment, G = gyro, A = accelerometer

Error source         Simulated   Recovered   Kalman filter     Kalman filter
                     error       error       estimated error   true error

SFGX (ppm)           100         101          2                -1
SFGY (ppm)           200         201          2                -1
SFGZ (ppm)           300         299          2                 1
BGX (.001 °/h)       2.5         -2.9        10                 5.4
BGY (.001 °/h)       5.0          7.5        10                -2.5
BGZ (.001 °/h)       7.5         -2.1        10                 9.6
MGYX (μrd)           200         200          5                 0
MGZX (μrd)           300         299          5                 1
MGZY (μrd)           300         300          5                 0
SFAX (ppm)           100          97          5                 3
SFAY (ppm)           200         197          5                 3
SFAZ (ppm)           300         296          5                 4
BAX (μg)             102         100          2                 2
BAY (μg)             204         203          2                 1
BAZ (μg)             304         302          2                 2
MAYX (μrd)           200         198          5                 2
MAZX (μrd)           310         307          5                 3
MAZY (μrd)           320         319          5                 1


5.5 Conclusion

This calibration technique for ring laser gyro systems features several definite advantages compared to conventional methods:
-high calibration accuracy,
-use of a low-cost two-axis turntable,
-fast and fully automatic operation,
-use of INS standard navigation outputs, velocity and attitude,
-great flexibility, easy modification of the calibration procedure sequence without any reprogramming,
-real time implementation providing calibration data as well as an estimate of their accuracy.
References
[1] L. Camberlein, F. Mazzanti, Calibration Technique for Laser Gyro Strapdown Inertial Navigation Systems, Symposium Gyro Technology Proceedings, pp 5.1-5.12, Universität Stuttgart, September 1985
[2] J. Mark, D. Tazartes, T. Hildy, Fast Orthogonal Calibration of the Ring Laser Strapdown System, Symposium Gyro Technology Proceedings, pp 13.0-13.21, Universität Stuttgart, 1986
[3] P. Faurre, et al., Navigation Inertielle Optimale et Filtrage Statistique, Dunod, 1971

6 "Alidade", At-Sea Alignment of the Inertial System of the "Super Etendard" Carrier Based Aircraft 1

6.1 Specific Characteristics of At-Sea Alignment


This example is based on "Alidade", 2 an at-sea alignment technique developed for the "Super Etendard" carrier based aircraft of the French Navy. This development has been fully operational since 1978.
As with any alignment, the alignment of the inertial navigation system (INS) of a carrier based aircraft consists in aligning the aircraft inertial platform axes on the local geographic axes and in initializing its position and velocity. In the case of at-sea alignment unusual conditions are encountered: the carrier moves and rolls, and aircraft can be anywhere on the deck and can have any orientation in heading (Fig. 6.1).
An accurate reference system on the carrier is, of course, needed to provide position and velocity initialization. In addition, it can be shown that this position or velocity information, if its resolution is small enough, completed by the accelerometers and gyros of the aircraft inertial system, is sufficient to align its platform axes (the Kalman observability criterion is satisfied).
In a modern implementation such as "Alidade", the requirement is for a fully automatic alignment process taking into account all peculiarities, and accommodating any normal aircraft carrier manoeuvers and sea states. "Alidade" uses an infra-red transmission system between the carrier and the planes on the deck to be more discreet.
As is shown below, the alignment of the inertial navigation system of an aircraft on a carrier is a non-stationary problem. The carrier reference velocity is taken as an observation of the alignment errors of the aircraft inertial system; the estimation of these errors can be done by Kalman filtering and the system can then be aligned.

1 L. Camberlein, M. de Cremiers, J.P. Paccard (SAGEM).
2 SAGEM's Trademark.

Fig. 6.1. "ALIDADE", at-sea alignment of the carrier based Super-Etendard of the French Navy

6.2 Model of the Aircraft Inertial System During At-Sea Alignment


During at-sea alignment, the aircraft inertial system integrates the inertial navigation equations. The model of the true velocity at the inertial system's location satisfies the following differential equations in the local geographic frame (north, west, up) called [g]:

    dV/dt = Tg/p fp + gp - A(ρg + 2Ωg) V,    V(0) = V0    (6.1)
    dφ/dt = Tg/p wp - wg + vd,    φ(0) = φ0    (6.2)
    d(sin φz)/dt = vs,    sin φz(0) = sin φz0    (6.3)
    d(cos φz)/dt = vc,    cos φz(0) = cos φz0    (6.4)

with
    V = [Vx, Vy]^T    true horizontal velocity at the location of the aircraft inertial system,
    φ = [φx, φy]^T    inertial platform vertical angle errors (assumed to be small angles),
    Tg/p = Z(-φz)(I + A(φ))    projection on the x-y plane of the rotation matrix from the platform frame [p] to the [g] frame, where Z is a rotation matrix and A(φ) corresponds to a cross product operator,
    φz    inertial platform azimuth error (wide angle),
    fp = [fx, fy, fz]^T    specific force measured by the aircraft platform accelerometers,
    gp    gravity vector,
    ρg = [- Vy/R, Vx/R, - (Vy tan L)/R]^T    with R and L respectively earth radius and latitude,
    Ωg = [Ω cos L, 0, Ω sin L]^T    earth rotation rate in the [g] frame,
    wg = ρg + Ωg    geographic frame inertial rotation rate,
    wp    control angular rate applied to the inertial platform.


The gyro drifts are assumed to be small and almost negligible. They are taken into account by vd, vs, vc, uncorrelated zero mean Gaussian white noises.
The velocity output of the aircraft inertial system is V + δV, where δV is the velocity error.

6.3 Observation and Lever Arm Effect Modeling

The measurement, or observation, is the carrier reference navigation system velocity output VR. This velocity is a measurement of the carrier velocity at the carrier reference system location and not at the aircraft INS location. The lever arm l between the carrier reference system and the aircraft INS, combined with the carrier angular rate wb, develops a relative velocity which has to be accounted for in the observation model:

    VR = V + wb × l + W    (6.5)

and in the local geographic frame [g]:

    VR = V + Tg/b A(wb) Lb + W    (6.6)
    VR = V + B(wb) Lb + W    (6.7)

with

    dLx/dt = 0,    dLy/dt = 0,    Lx(0) = Lx0,    Ly(0) = Ly0    (6.8)

and where
    VR    is the carrier reference velocity output,
    V     is the true velocity at the aircraft inertial system location,
    Tg/b    is the coordinate transformation matrix from the carrier frame [b] to [g],
    wb    is the carrier angular rate in frame [b], provided by the carrier reference,
    Lb = [Lx, Ly, Lz]^T    is the lever arm vector l in frame [b]; Lz is constant and known, Lx and Ly are unknown, constant for a specific alignment but random from one alignment to another,
    W     represents the measurement error and the small movement perturbations, which are modeled as a zero mean Gaussian white noise.
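The lever arm term wb × l of (6.5) is easy to evaluate; a small numerical illustration with made-up values (not from the Alidade system):

```python
import numpy as np

def lever_arm_velocity(w_b, L_b):
    """Relative velocity w_b x l developed by the lever arm, as in (6.5)."""
    return np.cross(w_b, L_b)

# Illustrative numbers only: a carrier rolling at 2 deg/s with the aircraft
# parked 50 m aft of and 10 m above the carrier reference system.
w_b = np.array([np.deg2rad(2.0), 0.0, 0.0])   # roll rate about the ship x axis
L_b = np.array([-50.0, 0.0, 10.0])            # lever arm in carrier axes (m)
print(lever_arm_velocity(w_b, L_b))           # a few tenths of m/s sideways
```

Even this modest manoeuver produces a velocity of a few tenths of a metre per second at the aircraft, which is why the lever arm states are carried in the model.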

6.4 At-Sea Alignment Implementation

Fig. 6.2. At-sea alignment implementation with Kalman filter

Summarizing, the dynamical model of the at-sea alignment for the INS of a carrier based aircraft can be described by augmenting the INS model, equations (6.1) to (6.4), by the model of the unknown lever arm components, equation (6.8). This 8 state dynamical model is observable, non-linear and non-stationary when the observation is given by (6.7). The observation is linear and non-stationary. The complete alignment model is therefore of the form:

    dX/dt = f(X) + v    (6.9)
    Y = H X + W    (6.10)

where

    X = [Vx, Vy, φx, φy, sin φz, cos φz, Lx, Ly]^T    (6.11)
    Y = [VRx, VRy]^T    (6.12)

The alignment problem, i.e. the estimation of the errors in the state vector components of the dynamical model, is solved by an extended Kalman filter developed about the current estimate. Once these errors are estimated, the aircraft INS is updated by correcting its states (Fig. 6.2). This implementation is robust because of the very weak non-linearity of the model. The Alidade at-sea alignment takes between 6 and 10 minutes, depending on the required accuracy, and thus is two times faster than conventional methods. In addition to the alignment Kalman filter, two other filters were also developed for "Alidade", one for the hybrid navigation of the carrier reference system and one for estimating the carrier angular rate. All three have been fully operational for more than 10 years.
References
[1] L. Camberlein, J.P. Paccard, M. de Cremiers, "Alidade", Optimisation Coût-Performance de l'Alignement d'un Système Inertiel sur Porte-Avions, AGARD Symposium Proceedings, pp 63.1-63.18, May 1979
[2] L. Camberlein, Evolutions Techniques de la Navigation par Inertie pour Avion, Revue Navigation de l'Institut Français de Navigation, pp 287-310, July 1979
[3] P. Faurre, et al., Navigation Inertielle Optimale et Filtrage Statistique, Dunod, 1971
[4] C.T. Leondes, ed., Theory and Applications of Kalman Filtering, AGARDograph No 139, February 1970
[5] A.A. Sutherland Jr, The Kalman Filter in Transfer Alignment of Inertial Guidance Systems, Journal of Spacecraft, Vol 5, No 10, October 1968

7 Inertia-GPS-Multisensor Hybrid Navigation for Combat Aircraft 1

7.1 Introduction
Much has been said and written about the exceptional complementarity of GPS and INS, about the several possible levels of hardware and software integration, and about the corresponding performance/cost and synergy efficiency trade-offs. These subjects, therefore, will not be discussed in detail, since they are covered in many references, as in [3], [4], [7] and [10].
In brief, the exceptional complementarity comes from the very high long term accuracy of GPS x, y, z position and velocity, and from the very high short term accuracy and high pass-band of INS x, y, z position, velocity and acceleration. It also comes: 1) from their equally remarkable coverage all over the world 24 hours a day, 2) from INS being self-contained and insensitive to jamming, and GPS not, 3) from INS providing attitude and GPS time.
The several possible levels of hardware and software integration range from separate standalone INS and GPS units, to complete hardware and software integration in a single unit where a single adaptive Kalman filter would control the GPS channels as well as the hybrid navigation. Between these two extremes, synergy varies from its minimum to its maximum, although the latter extreme, as most ideals, may very likely turn out to be impractical to implement.
The performance/cost and synergy efficiency trade-offs usually bear on:
-aircraft installation constraints, i.e. number of units, size, weight, power, bus load, more or less difficult validation and testing,
-performance, i.e. accuracy, resistance to jamming, to attitude manoeuvers (masking), to dynamics (high g's) and to GPS shortages, rapid in-flight or at-sea inertial alignment capability,
-integrity monitoring, reliability, robustness, redundancy, stability of the hybrid navigation solution.
The subject of this section is to present the ULISS 2 Inertia-GPS Multisensor

1 L. Camberlein, B. Capit (SAGEM).
2 Trademark of a family of dry-tuned gyro inertial navigation systems manufactured by SAGEM.


Hybrid Navigation Unit, an interesting example of efficient synergy and an excellent performance/cost trade-off.

7.2 A Fully Integrated Inertia-GPS Unit


A New Option for the ULISS Family in an "AII-In-One" Unit
The ULISS inertia-GPS unit (Fig. 7.1) demonstrates a GPS option designed
for the ULISS family of inertial units. This inertial technology has been thought
of to suit requirements ranging from a simple navigation unit to an elaborate
hybrid navigation unit and/ or to a nav-attack unit. The same highly integrated
technology is used by the different versions of the family of which more than
1200 units are in operation on 14 types of aircraft of 12 Air Forces and Navies.
Whatever the version, it is the same small size (3/4 ATR short) and light weight

Fig.7.1. ULISS inertia-GPS-multisensor hybrid navigation unit

2 Kalman Filtering-The Advancement of Navigation and Guidance

unit (16 kg). The ULISS small-size and light-weight "all-in-one" unit demonstrates
the remarkable INS and GPS hardware synergy achieved, compared to
that of separate INS and GPS units.

An Embedded "All-In-View" GPS Receiver


The embedded GPS receiver (Fig. 7.2) consists of two different modules: a
Radio-Frequency module and a digital module. The RF module contains a
receiver section (RF) and a digital frequency synthesizer section. The key features
of the embedded receiver are:

"all-in-view" parallel tracking


The "all-in-view" concept designates the capability of a GPS receiver to track
all visible satellites. The tracking of all satellites in view is accomplished through
dedicated, physically independent channels in parallel. This "all-in-view" concept
maximizes the continuity, smoothness, precision, reliability and integrity of the
navigation outputs.

miniaturization and almost all-digital processing


The digital module, using ASICs and highly integrated circuits, implements 8
parallel channels to acquire and track up to 8 satellites at the same time. This

Fig. 7.2. The RF and digital modules of the 8-channel embedded GPS receiver, totaling only
0.5 l, 0.5 kg, 9 W


results for the embedded GPS receiver in small size (0.5 liters), weight (0.5 kilograms) and power dissipation (9 watts).
Inertial aiding of the embedded GPS receiver
The inertial aiding of the embedded receiver is twofold: fast acquisition at
receiver turn-on and re-acquisition after loss of lock in case of attitude manoeuvres
(masking) or jamming, and improvement of the receiver resistance to jamming
and to high dynamics.
a) Fast acquisition and re-acquisition are achieved by initializing the states of
the code and carrier phase tracking loops with values computed from the
vehicle's position and velocity given by pure inertia and from satellite
ephemeris and current time.
b) Improvement of the receiver jamming and high-g resistance is achieved by tight
inertial dynamic aiding of the code and carrier phase tracking loops, which allows
bandwidth reduction.
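The prepositioning of the tracking loops in a) can be sketched as follows. This is a minimal illustration, not SAGEM's implementation; the function name, the ECEF frame choice and the clock-term handling are assumptions:

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def predict_loop_states(r_ins, v_ins, r_sat, v_sat, clock_bias=0.0, clock_drift=0.0):
    """Predict code/carrier tracking-loop states from INS data and ephemeris.

    r_ins, v_ins : INS-indicated antenna position and velocity (ECEF, m, m/s)
    r_sat, v_sat : satellite position and velocity from the ephemeris (ECEF)
    Returns the predicted pseudo-range (m) and pseudo-range rate (m/s), which
    can be used to preposition the code phase and carrier Doppler at turn-on
    or after a loss of lock.
    """
    los = r_sat - r_ins
    rho = np.linalg.norm(los)          # geometric range
    u = los / rho                      # unit line-of-sight vector
    rho_dot = u @ (v_sat - v_ins)      # range-rate along the line of sight
    # add receiver clock terms so the loops start near the true phase/Doppler
    return rho + C * clock_bias, rho_dot + C * clock_drift
```

The predicted carrier Doppler then follows from the range-rate divided by the carrier wavelength; the closer these predictions are, the narrower the acquisition search can be.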

7.3 Inertia-GPS-Multisensor Hybrid Navigation Software


General Description
The general software functional block-diagram of the ULISS Inertia-GPS-Multisensor
hybrid navigation unit is given in Fig. 7.3, which shows its main
features. The embedded GPS receiver is aided by pure inertia. The hybrid
navigation Kalman filter receives inputs from several sources including the

[Block diagram: GPS, terrain mass memory (TERCOR), baro altitude and position updates feed a tests/selection/prefiltering function; together with pure inertia navigation, this feeds the hybrid navigation Kalman filter (ALIGN and NAV modes); outputs are pure and aided GPS position/velocity, time, pure inertia position/velocity/attitude, and hybrid nav position/velocity/attitude]

Fig. 7.3. Simplified software block-diagram of the ULISS inertia-GPS-multisensor hybrid navigation unit


embedded GPS receiver but also from an embedded terrain correlation
algorithm (TERCOR), a barometric altitude and possibly other position sources
(radar, infra-red systems, ...). The hybrid navigation inputs first go through a
test/selection and prefiltering function which tests for degradations or failures,
provides warnings, rejects irrelevant inputs, selects the best input set and
implements the necessary prefilterings. The hybrid navigation has two modes
of operation: the normal NAV mode and the ALIGN mode which allows fast
ground, in-flight or at-sea alignment.
There are three types of outputs: 1) hybrid navigation position, velocity and
attitude, the primary and most accurate outputs, 2) pure inertia position,
velocity and attitude, and 3) pure (aided) GPS position and velocity; 2) and 3) are
secondary outputs. In addition, very accurate GPS time and a one-pulse-per-second
signal are output.
The design of the hybrid navigation Kalman filter is based on accurate
modeling of the sensors: inertial navigation, embedded GPS receiver, barometric
altitude, and position updates provided by terrain correlation and other possible
systems.
Model of the Inertial System

Given the real accelerometer measurements f_a and the inertial platform
attitude angle error \varphi, it can be shown [10] that the equations of perfect inertial
navigation are, in the true frame [v]:

\dot Z = V_z

\dot T_{v/t} = -A(\rho_v) T_{v/t}

\dot V = [1 + A(\varphi)](f_a - E_a) + g_p(L, G, Z) - A(\rho_v + 2 T_{v/t}\Omega) V

\dot\varphi = [1 + A(\varphi)]\omega_p + E_g - \omega_v

where A(.) denotes the skew-symmetric matrix associated with a vector, and

[v], [t], [p] are respectively the true coordinate frame, the earth frame
and the platform frame,
Z is the true altitude,
T_{v/t} is the transformation matrix from [t] to [v], from which can
be computed the true latitude L and longitude G,
\rho_v is the angular rate of frame [v] with respect to the earth,
V is the true velocity,
\varphi is the platform attitude angle error, assumed to be a small
angle,
f_a is the measured specific force given by the accelerometers,
E_a is the total accelerometer error,
E_g is the total gyro error,
g_p(L, G, Z) is the gravity function of the true position,
\Omega is the earth angular rate,
\omega_p is the inertial platform angular rate control,
\omega_v is the true frame angular rate with respect to the inertial frame.


These equations of perfect inertial navigation contain implicitly all accelerometer
and gyro errors. In practice only the most significant and observable
accelerometer and gyro errors are modeled, i.e. typically:

E_a = b_z, the vertical accelerometer bias,
E_g = d, the 3 gyro drifts,

with, for instance, the following models:

\dot b_z = v_z

\dot d = v_d

where v_z and v_d are uncorrelated zero mean Gaussian white noises.
The resulting set of equations of perfect inertial navigation is non-stationary
and non-linear. It is linearised about the best estimate to provide the inertial
navigation error model used by the hybrid navigation extended Kalman filter.
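Discretized, the white-noise bias models above become random walks that the filter simply carries as extra states. The following sketch is illustrative (the function name and noise levels are assumptions, not the ULISS design) and shows one propagation step:

```python
import numpy as np

def propagate_bias_states(b, dt, q, rng):
    """One discrete step of random-walk bias models such as db_z/dt = v_z.

    b  : current bias states, e.g. [b_z, d1, d2, d3]
    q  : power spectral densities of the driving white noises (one per state)
    Over a step dt the continuous model integrates to
        b[k+1] = b[k] + w,   w ~ N(0, q*dt).
    """
    w = rng.normal(0.0, np.sqrt(np.asarray(q) * dt))
    return b + w

# In the extended Kalman filter these states are simply appended to the
# navigation error state; their block of the transition matrix is the
# identity and their process-noise covariance block is Q = diag(q) * dt.
```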
Model of the Embedded GPS Receiver
The embedded GPS receiver provides two kinds of very accurate measurements:
the code phase and the carrier phase rate to each satellite, also called
pseudo-range and pseudo-range rate. These measurements are used as observations by
the Kalman filter. The corresponding observation models, linearized about the
current estimate, are the same as those given in Sect. 4.2:

\delta y_\rho = \rho_m - \hat\rho = -u \cdot \delta R - c\,\delta b_r + w_\rho

\delta y_{\dot\rho} = \dot\rho_m - \hat{\dot\rho} = -u \cdot \delta V - c\,\delta f_r + w_{\dot\rho}

where

\rho, \dot\rho are respectively the pseudo-range and pseudo-range rate to a given
satellite,
u is the unit vector of the antenna-to-satellite direction,
\delta R, \delta V are respectively the position error vector and the velocity error vector,
c is the speed of light in vacuum,
\delta b_r, \delta f_r are respectively the receiver clock offset and offset rate,
w_\rho, w_{\dot\rho} are the measurement errors, assumed to be zero mean white noises.
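Under these models, each tracked satellite contributes one pair of rows to the linearized observation matrix. The sketch below builds them; the state ordering and index names are illustrative assumptions, not necessarily the ULISS ordering:

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def pseudorange_rows(u, n_states=18, i_pos=0, i_vel=3, i_clk=16, i_clkdot=17):
    """Linearized observation rows for one satellite.

    u : unit antenna-to-satellite vector
    Implements the measurement models of the text:
        dy_rho    = -u . dR - c*db_r + w_rho
        dy_rhodot = -u . dV - c*df_r + w_rhodot
    The index arguments (position, velocity, clock offset, clock offset rate)
    are assumed placements within the 18-state error vector.
    """
    h_rho = np.zeros(n_states)
    h_rho[i_pos:i_pos + 3] = -u       # -u . dR term
    h_rho[i_clk] = -C                 # -c * db_r term
    h_rhodot = np.zeros(n_states)
    h_rhodot[i_vel:i_vel + 3] = -u    # -u . dV term
    h_rhodot[i_clkdot] = -C           # -c * df_r term
    return h_rho, h_rhodot
```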

Model of the Barometric Altitude Measurement


The barometric altitude measurement is used as an observation. Its error is
mainly a function of the difference between the true atmosphere and the standard
atmosphere. Depending on the application it can be modeled by a first, second
or third order Markovian model of the form:

Z_a = h(b_i, Z) + w_a

\dot b_i = -\frac{1}{\tau_i} b_i + v_i


where

Z_a is the barometric altitude,
Z is the true altitude,
b_i are parameters of the model.

This non-linear model is linearized about the current estimate to provide the
observation model.
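A first-order Markovian error of this kind can be discretized exactly. The sketch below is illustrative, with the time constant and noise level as free parameters:

```python
import numpy as np

def gauss_markov_step(b, dt, tau, sigma, rng):
    """One step of the first-order Markov model db/dt = -b/tau + v.

    Exact discretization over a step dt:
        b[k+1] = phi*b[k] + w,   phi = exp(-dt/tau),
        Var(w) = sigma**2 * (1 - phi**2),
    where sigma is the steady-state standard deviation of the process.
    """
    phi = np.exp(-dt / tau)
    w = rng.normal(0.0, sigma * np.sqrt(1.0 - phi**2))
    return phi * b + w
```

Higher-order (second or third order) variants stack such states; the filter then estimates the b_i alongside the navigation errors.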
Model of Terrain Correlation and Other Position Updates

The TERCOR³ embedded terrain correlation algorithm, as well as other possible
aircraft systems (radar, ...), provides position updates. These position measurements
are used as observations. The corresponding observation model is:

\delta y_R = R_m - \hat R = \delta R + w_R

where

R_m is the position update, with two or three dimensions,
\hat R, \delta R are respectively the position current estimate and error,
w_R is the position measurement error, generally correlated between axes.
Kalman Filter Implementation
The ULISS unit implements an accurate and elaborate inertial-GPS-multisensor
tridimensional hybrid navigation by means of an 18-state extended Kalman
filter. The same Kalman filter is used for ground, in-flight or at-sea alignment,
providing fast reaction time as well as smooth transition to and accurate
initialization of the navigation mode. This Kalman filter is based on precise
dynamic modeling of kinematics and sensors. It includes the following residual
error components of the state:

-position: 3 scalars,
-velocity: 3 scalars,
-attitude: 3 scalars,
-accelerometer bias: 1 scalar,
-gyro drifts: 3 scalars,
-barometric altitude parameters: 3 scalars,
-GPS clock error and clock error-rate: 2 scalars.

The Kalman filter processes multiple observations coming from GPS
pseudo-range and carrier phase measurements, barometric altitude, terrain
correlation position and, as already mentioned, from other possible position
updates, and from zero velocity in ground alignment mode. The preprocessing
achieved by the test/selection/prefiltering function is based on the available

³ Trademark of the terrain correlation algorithm developed by SAGEM.


validity indicators, and on the innovations and covariances provided by the
filter. This function also provides fine integrity monitoring (i.e. soft failure
detection) of the inertia as well as of the GPS receiver and space segment.
The filter is used to calibrate the day-to-day and long term evolution of
gyro and accelerometer biases. It takes advantage of the centimetric resolution
of the carrier phase for fast convergence and alignment, as well as accurate
hybrid velocity.
Globally, the software integration (inertial aiding and hybrid navigation) is
designed to guarantee high stability of the hybrid navigation solution by a
numerically stable Kalman-Bierman algorithm and by a very low level of long
term correlation of the measurements, through direct pure inertia aiding and
open loop hybrid navigation.
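The Bierman algorithm factorizes the covariance as UDUᵀ and updates the factors directly. As a simpler illustration of the same goal (a measurement update that stays symmetric and positive definite in finite precision), the sketch below uses the Joseph stabilized form; it is a stand-in, not the UD factorization itself:

```python
import numpy as np

def joseph_update(x, P, z, h, r):
    """Scalar-measurement Kalman update in Joseph stabilized form.

    Measurement model: z = h @ x_true + v, with Var(v) = r.
    The Joseph form
        P+ = (I - K h) P (I - K h)^T + r K K^T
    is algebraically equal to the usual (I - K h) P but preserves symmetry
    and positive definiteness under round-off, the property the UD-factorized
    Bierman update also guarantees.
    """
    Ph = P @ h                        # P h^T (h is a 1-D array)
    s = h @ Ph + r                    # innovation variance
    K = Ph / s                        # Kalman gain
    x = x + K * (z - h @ x)           # state update
    ImKh = np.eye(len(x)) - np.outer(K, h)
    P = ImKh @ P @ ImKh.T + r * np.outer(K, K)
    return x, P
```

Processing the GPS, baro and position observations one scalar at a time with such an update avoids matrix inversions entirely.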
Hybrid Navigation Flight Test Results
This paragraph presents flight test results. Figures 7.4 and 7.5 show examples
of compared position errors of pure inertia, inertia-terrain correlation, inertia-GPS
and pure GPS with respect to a navigation reference system consisting of
a high accuracy ULISS INS and a radio beacon positioning system TRIDENT
III whose accuracy is given as a few meters (1σ). These plots demonstrate the
tremendous gain brought by the hybrid navigation over pure inertia and even
over pure GPS (losses of lock, periods of bad satellite constellation geometry, ...).
Hybrid navigation combines the advantages of all sensors, mainly: the very good
long-term precision of GPS (or terrain profile), and the continuity, autonomy
and short-term accuracy of the INS.

[Plot, vertical scale -2000 m to +1500 m: pure inertia longitude and latitude errors grow with time, while the inertia-terrain correlation longitude and latitude errors remain near zero]
Fig.7.4. Inertia-terrain correlation hybrid position errors (latitude and longitude)

[Plot, vertical scale -4000 m to +6000 m: longitude error versus time for pure inertia, GPS, and INS/GPS]
Fig. 7.5. Inertia-GPS position error (longitude)

Multisensor INS hybrid navigation reduces the standard 1 Nm/h, 1 m/s INS
errors to a few tens of meters in position and better than 0.1 m/s in velocity
when updates are constantly made. If, for any reason (antenna masking or jamming
for GPS, failure of the radio altimeter), updates can no longer be achieved,
hybrid navigation keeps the memory of the improvement and provides remarkable
persistence of the performance achieved.
References
[1] P. Lloret, Inertial + Total Station + GPS: A "Golden Tripod" for High Productivity Surveying, PLANS 90, to appear in March 1990
[2] L. Camberlein, P. Lloret, Medium to High Accuracy Navigation with Pure and Hybrid Inertial Systems and Related Mission Planning, GIFAS Conference, Delhi, Bangalore, February 15-22, 1989
[3] D.A. Tazartes, J.G. Mark, Integration of GPS Receivers into Existing Inertial Navigation Systems, Navigation, Vol 35, 1988-1989
[4] M.A. Sturza, C.C. Richards, Embedded GPS Solves the Installation Dilemma, PLANS 1988, Proceedings of the Symposium, pp 374-380, 1988
[5] J.A. Soltz, J.J. Donna, R.L. Greenspan, An Option for Mechanizing Integrated GPS/INS Solutions, Navigation, Vol 35, No 4, Winter 1988-89, pp 443-457
[6] C. Kervin, R. Cnossen, C. Kiel, M. Lynch, Development of a Tightly Integrated Ring Laser Gyro Based Navigation System, PLANS 1988, Proceedings of the Symposium, pp 545-552
[7] R.P. Denaro, G.J. Geir, GPS/Inertial Navigation System Integration for Enhanced Navigation Performance and Robustness, AGARD 1988, Proceedings of the Conference, LS-161
[8] J. Ashjaee, On the Precision of the C/A Code, PLANS 86, Proceedings of the Symposium, pp 214-217
[9] P. Lloret, B. Capit, Inertie GPS: un mariage de raison à l'essai, Revue Navigation, Institut Français de Navigation, Vol 35, No 139, July 1987, pp 295-325
[10] P. Faurre et al., Navigation Inertielle Optimale et Filtrage Statistique, Dunod, 1971

8 Further Ideas¹

Although great progress has been made over the last thirty years and Kalman
filtering is a tremendous advance in digital data processing, one should not be
discouraged about future progress. We should not imitate the very great engineer
Bode, so famous for his work related to feedback amplifier design, who, in April
1960, unaware of the Kalman paper which had appeared one month earlier, and
reviewing his own very important work related to feedback, was becoming
dubious about future research and wrote [1]: "one wonders whether there are
enough really good problems to go around among all workers now in the field, or
whether effort may not be lost by duplication or micro-engraving".

Progress will be permanent and will not end with us. A lot of effort is
being made to find good directions [2]. We can modestly point out two main
directions of research:
-one related to the design of new components or instruments which will
become simpler and simpler from a hardware point-of-view, but will require
more and more software for data processing,
-the other connected with the new possibilities of parallel processing and
new digital structures [3]-[4]: one could look at the transition from Wiener
filters to Kalman filters as associated with the transition from analog
electronics to standard digital computers. What will accompany the
transition from standard processors to massively parallel computers?
References
[1] H.W. Bode, Feedback, the History of an Idea, Symposium on Active Networks and Feedback Systems, Polytechnic Institute of Brooklyn, Polytechnic Press, April 19-21, 1960
[2] Challenges to Control, a Collective View, IEEE Transactions on Automatic Control, 32, No 4, pp 275-285, 1987
[3] J.M. Ortega, Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press, 1988
[4] S. Kim, D.P. Agrawal, Least-Squares Multiple Updating Algorithms on a Hypercube, Journal of Parallel and Distributed Computing, Vol 8, No 1, January 1990

¹ Pierre Faurre.

Quantum Kalman Filters

L. Accardi
Dipartimento di Matematica, Centro Matematico V. Volterra, Università di Roma II,
Via Orazio Raimondo, I-00173 Rome, Italy

The notion of state is crucial in all sciences, yet the amount of literature specifically
devoted to analyzing this notion is negligible compared to its importance
(cf. [1] for a detailed discussion and references). Particularly delicate is the
notion of quantum mechanical state, so it is not surprising that the filtering and
prediction problem corresponding to these states has not yet received a
completely satisfactory solution.
In the present paper, which anticipates some results obtained in collaboration
with M. Ohya [3] and S. Belavkin [2], I want to show that the quantum
filtering problem can be correctly stated and solved within the framework of
the theory of quantum Markov chains. An interesting fallout of this is an
explicit and easily computable formula for the solution of the filtering problem
relative to a class of classical stochastic processes which is considerably larger
than the class of processes for which an explicit solution of the filtering problem
was available.
The notion of state introduced below includes both the classical and the
quantum notions.
A set of experimental operations is said to prepare a system in a definite
state if, as a result of them, the values of each observable of the system have a
well defined probability distribution (the classical states correspond to the case
in which all these distributions are delta functions).
A change of state corresponds to a change of our information on the system
and here we have to distinguish two possibilities:
i) we acquire information on the system by interacting directly with it (hence
the gain of information on one observable might correspond to a loss on
some other one).
ii) we acquire information on a given system 1 by interacting with some other
system 2 which had previously interacted with system 1 but whose interaction
with system 1 at the moment of the measurement is negligible. In this case
the change of state of system 1 is due only to our change of information
about it and does not correspond to a real physical change of system 1.
The measurements of type (ii) are called nondemolition measurements.


1 The Filtering Problem


To fix the ideas we shall consider an atom interacting with its own radiation
field (output field) and the measurements on the field as a tool to obtain
information on the atom, considered as an open system, without disturbing it.
Of course the absence of disturbance is an idealization because we are supposing
that a modification of the field will not react back on the atom. But in several
situations this idealization is physically justified: for example, if we count the
photons radiated by the atom, we can surely assume that this counting does
not react back on the atom.
Let us fix a unit of time and consider only discrete time.
Denote A_0 the algebra of observables of the atom and B the algebra of the
observables of the radiation field (output field, or channel) of the atom emitted
per unit time. This is also called the output algebra, and the measurements of
observables of this algebra will act as preparations with respect to the atom in
the sense that they increase our information on the atom and therefore they
change its state in the sense described in Sect. 1 (but not in the sense that they
change the intrinsic physical properties of the atom).
Suppose that, at each time, we perform a measurement of some observable
quantity of the radiation field (e.g. number of photons, electric field, magnetic
field, ...). The choice of the observable might depend on time, but at each time
it is fixed. For each instant k we denote ω_k the result of our measurement at
time k and

p_k(ω_1, ..., ω_k) =: p_k(ω_k]) ∈ S(A_0)

the quantum state of the atom at time k given that the results of our measurements
up to time k are (ω_1, ..., ω_k) =: ω_k]. This state is usually called the posterior state of
the atom given the set of observations ω_k]. Its precise definition is given in the
Appendix.
We shall denote D_k the set of all the possible results of the measurements
of the given observable at time k and

D_k] := D_1 × ... × D_k    (1.1)

the set of all sequences of measurements up to time k (notice that D_k] depends
on which observables we decide to measure in the instants j = 1, ..., k).
The problem of nonlinear (discrete) filtering consists in the derivation of a
recursive relation of the form:

p_{k+1}(ω_k], ω_{k+1}) = F_{k+1}(p_k(ω_k]), ω_k], ω_{k+1})    (1.2)

where F_{k+1} is a, usually nonlinear, function from S(A_0) × D_k] × D_{k+1} to S(A_0).
In other terms, the basic idea of filtering is that at each instant k we update
our forecast of the state of the system taking into account the results of previous
measurements (i.e. ω_k]), the previous state of the system (i.e. p_k(ω_k])) and the
newly acquired information (i.e. ω_{k+1}).
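For a classical finite-state Markov chain observed through a noisy channel, the recursion (1.2) is the familiar Bayes filter. The following sketch is an illustrative classical analogue, not the quantum construction of this paper; it makes the map F_{k+1} explicit:

```python
import numpy as np

def bayes_filter_step(p, T, L, w):
    """One step of the classical nonlinear filtering recursion (1.2).

    p : current posterior over hidden states, p_k( . | w_1, ..., w_k)
    T : transition matrix, T[i, j] = P(x_{k+1} = j | x_k = i)
    L : likelihoods, L[j, w] = P(observe w | x_{k+1} = j)
    w : the new observation w_{k+1}
    Returns p_{k+1} = F(p_k, w_{k+1}); here F does not depend on the whole
    past w_k] because the hidden dynamics are Markovian.
    """
    pred = p @ T                 # one-step prediction
    post = pred * L[:, w]        # multiply by the likelihood of the new datum
    return post / post.sum()     # normalize
```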

2 Kalman Filtering-Quantum Kalman Filters

For the construction of our model, the assumption that the measurements
are nondemolition will be crucial.

2 Channels, Liftings and Transition Expectations


In this section we introduce some notions which will be used throughout
the following.
For a C*-algebra A, we denote S(A) the convex set of its states. In this
paper all C*- and W*-algebras are realized on some Hilbert space and the
tensor products are those induced by the tensor products of the corresponding
Hilbert spaces. If no further specification is added, the algebra A_j will act on
the Hilbert space H_j (j = 1, 2). If A is a von Neumann algebra, then S(A)
denotes the set of its normal states. If A and B are C*-algebras, a channel from
A to B is a map Λ*: S(A) → S(B). If Λ* is affine we speak of a linear channel.
A linear channel Λ* can be extended by linearity to a linear map (still denoted
Λ*) from A* to B*. Its dual Λ: B → A is a positive map. If it is completely
positive, we call it a Markovian operator. A standard lifting should be thought of
as a channel from a sub-system to a compound system. In many important
cases the algebra of a compound system is described by a tensor product. In
the present paper we shall confine our analysis to such a situation.
Definition 1.1. Let A_1, A_2 be C*-algebras and let A_1 ⊗ A_2 be a fixed C*-tensor
product of A_1 and A_2. A lifting from A_1 to A_1 ⊗ A_2 is a continuous map

E*: S(A_1) → S(A_1 ⊗ A_2)    (2.1)

If E* is affine, we call it a linear lifting; if it maps pure states into pure states,
we call it pure.
To every lifting from A_1 to A_1 ⊗ A_2 we can associate two channels: one
from A_1 to A_1, defined by

(Λ*_1 p_1)(a_1) := (E* p_1)(a_1 ⊗ 1);  for all a_1 ∈ A_1    (2.2)

another from A_1 to A_2, defined by

(Λ*_2 p_1)(a_2) := (E* p_1)(1 ⊗ a_2);  for all a_2 ∈ A_2    (2.3)

In general, a state φ ∈ S(A_1 ⊗ A_2) such that

φ | A_1 ⊗ 1 = p_1;  φ | 1 ⊗ A_2 = p_2    (2.4)

will be called a compound state of the states p_1 ∈ S(A_1) and p_2 ∈ S(A_2).
The following problem is important in several applications:
given a state p_1 ∈ S(A_1) and a channel Λ*: S(A_1) → S(A_2), find a standard
lifting E*: S(A_1) → S(A_1 ⊗ A_2) such that E* p_1 is a compound state of p_1 and
Λ* p_1. Several particular solutions of this problem have been proposed in [10],
[11], [12], [4], [5], [6], [7], ..., however an explicit description of all the
possible solutions to this problem is still missing. A large class of examples is
considered in [3].
Definition 1.2. A lifting from A_1 to A_1 ⊗ A_2 is called nondemolition for a state
p_1 ∈ S(A_1) if p_1 is invariant for Λ*_1, i.e. if for all a_1 ∈ A_1

(Λ*_1 p_1)(a_1) = p_1(a_1)    (2.5)

the idea of this definition being that the interaction with system 2 does not
alter the state of system 1.
Remark. The notions of "lifting" and of "nondemolition lifting" discussed here
are essentially (i.e. up to minor technicalities) included in the more abstract
notions of "state extension" and "canonical state extension" introduced by
Cecchini and Petz [6], [7] (cf. also Cecchini and Kümmerer [7]).
It is clear that a positive identity preserving linear map E: A_1 ⊗ A_2 → A_1
defines by duality a linear lifting from A_1 to A_1 ⊗ A_2. In some cases the converse
is also true. For example, if A_1, A_2 are W*-algebras and A_1 ⊗ A_2 denotes their
W*-tensor product, then any linear lifting E* from A_1 to A_1 ⊗ A_2 defines by
duality a positive linear map E: A_1 ⊗ A_2 → A_1 characterized by

p_1(E(a_1 ⊗ a_2)) := (E* p_1)(a_1 ⊗ a_2);  p_1 ∈ S(A_1), a_1 ∈ A_1, a_2 ∈ A_2    (2.6)

Since E* maps states into states, it follows that

E(1 ⊗ 1) = 1    (2.7)

Definition 1.3. Let A_1, A_2 be C*-algebras and let A_1 ⊗ A_2 be a fixed C*-tensor
product of A_1 and A_2. A transition expectation from A_2 to A_1 is a completely
positive linear map E: A_1 ⊗ A_2 → A_1 satisfying (2.7).
Suppose now that A = A_1 ⊗ A_2 and let D be an abelian sub-algebra of A_2. Then
D ≅ C(Ω) (the algebra of complex valued functions on some compact Hausdorff
space Ω) and, if F denotes the Baire σ-algebra on Ω, then with standard
arguments one can extend the transition expectation E to a map, still denoted
E, from A_1 ⊗ F to A_1.
Therefore for each F-measurable B ⊆ Ω we have a completely positive map

E(B): A_1 → A_1

or an operation. Moreover it is clear that the map

B ∈ F ↦ E(B) ∈ CP(A_1)

is a countably additive (in the weak operator topology) CP(A_1)-valued measure.
Such measures are called operation valued measures and they are used in
Ludwig's approach to the quantum theory of measurement.
Thus we see that the relationship between a transition expectation and an
operation valued measure is analogous to the relation between a quantum state
and a classical probability measure: a quantum state gives infinitely many
probability measures by restriction to abelian subalgebras. By the same mechanism,
a transition expectation gives infinitely many operation valued measures.

3 The Model
For each instant k we consider the algebra

B_k] = B ⊗ B ⊗ ... ⊗ B    (k times)

of all the observations up to time k (included). This algebra describes all the
possible results of all the possible measurements that one can perform in the
first k instants on the output field.
It is mathematically convenient to embed all the algebras B_k] into the single
algebra

B_N := ⊗_N B

which is the tensor product of countably many copies of B.
Denote, for k ∈ N,

j_k: B → B_N := ⊗_N B

the natural embedding into the k-th factor and let, for each I ⊆ N,

B_I := ⋁_{k ∈ I} j_k(B)

In particular we shall freely use the identification

B_[0,k] ≅ B_k]

Thus in our model the algebra of observables of the composite system
(atom + EM field) will be the algebra

A = (⊗_N B) ⊗ A_0 = B_N ⊗ A_0

and for each I ⊆ N we shall use the notations

A_I := B_I ⊗ A_0;  A_k] := B_k] ⊗ A_0 (k ≥ 1);  A_0] = 1 ⊗ A_0

The idea of nondemolition measurement is built into this model because the
observables of the EM field (elements of B_N) commute with those of the atom
(elements of A_0).
For any C*-algebra C we shall denote C* its dual and S(C) the set of states
of C.
For each I ⊆ N there is a natural embedding

i_I: b_I ∈ B_I → b_I ⊗ 1_{A_0} ∈ B_I ⊗ A_0 = A_I

and we shall identify B_I with its image i_I(B_I) ⊆ A_I.


Our basic model of a memoryless instrument at time k will be a transition
expectation

E_k: B_k ⊗ A_0 → A_0

("memoryless instrument" means that the instrument at time k does not depend
on the results of measurements strictly before k).
To a sequence of memoryless instruments (E_k) (k ∈ N) and a state φ_0 on A_0
(initial state of the system) we associate a sequence φ_k] of states on the algebras
A_k] = B_k] ⊗ A_0 with the following procedure: for each k ∈ N we define the
transition expectation

E_{k],k-1]}: B_k] ⊗ A_0 ≅ B_{k-1]} ⊗ B_k ⊗ A_0 → B_{k-1]} ⊗ A_0

by linear extension of the map

b_{k-1]} ⊗ b_k ⊗ a_0 ∈ B_{k-1]} ⊗ B_k ⊗ A_0 ↦ b_{k-1]} ⊗ E_k(b_k ⊗ a_0) ∈ B_{k-1]} ⊗ A_0

Now we define inductively

φ_0] = φ_0;  φ_k] = E*_{k],k-1]} φ_{k-1]};  k ≥ 1    (3.1)

From this we see that the explicit formula for φ_k] is:

φ_k](b_1 ⊗ ... ⊗ b_k ⊗ a_0) = φ_0[E_1(b_1 ⊗ (... E_{k-1}(b_{k-1} ⊗ E_k(b_k ⊗ a_0)) ...))]    (3.2)

(a_0 ∈ A_0, b_j ∈ B_j, j = 1, ..., k).
From this formula two facts are apparent:
(i) the sequence (φ_k]) is not necessarily projective, i.e. in general it is not true that

φ_k] | A_{k-1]} = φ_{k-1]}

(ii) denoting, for each k ∈ N,

φ^B_k = φ_k] | B_k]

the sequence (φ^B_k) is projective, i.e.

φ^B_k | B_{k-1]} = φ^B_{k-1]}

and therefore it defines a unique state φ^B on B_N, characterized by the property

φ^B | B_k] = φ^B_k;  for all k ∈ N.

The state φ^B will be called the a priori state of the output channel: it describes
the a priori statistics of the output field. This state is a quantum Markov chain.
Since a measurement at a fixed time can only be performed on
compatible observables, it follows that the choice of an observable of the output
system (i.e. the radiation field) to be measured at time k is equivalent to the
choice of an abelian algebra

C_k ⊆ B_k;  k ∈ N

Since C_k is an abelian C*-algebra, we can write it in the form

C_k ≅ C(Ω_k)

where Ω_k is a compact Hausdorff space representing the set of all possible
results of our measurements of the given observable at time k. Denoting

C_I := ⋁_{k ∈ I} C_k;  C_k] := C_[1,k]

we have

C_k] ≅ C(Ω_k]) ≅ C(Ω_1 × ... × Ω_k) ≅ ⊗_{j=1}^k C(Ω_j)

According to Lemma (A.1) and Definition (A.2) and given the above
identifications, for each k ∈ N and for each ω_k] ∈ Ω_k], the posterior state p_k](ω_k]) of
A_0, given the measurement of C_k], the initial state φ_k] and the result ω_k] of the
observations up to time k, is well defined for φ-almost all ω.
Moreover one has

φ_k](F_k]) = ∫_{Ω_k]} ⟨p_k](ω_k]), F_k](ω_k])⟩ φ(dω_k])    (3.3)

for all F_k] ∈ C_k] ⊗ A_0 ≅ C(Ω_k]; A_0). We write φ_k] also for the restriction of φ to
C_k] ≅ C(Ω_k]) and we use the same symbol for the Baire measure induced on
Ω_k]. The conditional probability of φ^B_{k+1]} given F_k] will be denoted

φ^B_{k+1]}(dω_{k+1} | ω_k])

Remark. Since for each natural integer k, C_k] ⊆ B_k], because of the identity (3.3)
we can interpret both p_k] and F_k] as F_k]-adapted functions on Ω and we can
write

φ_k](F_k]) = ∫_Ω ⟨p_k](ω), F_k](ω)⟩ φ^B(dω) = ∫_{Ω_k]} ⟨p_k](ω_k]), F_k](ω_k])⟩ φ(dω_k])    (3.4)

Notice moreover that for this conclusion we do not need the very special
structure of φ given by (3.2); the only important thing is that the conditions
of Lemma (A.1) are fulfilled.
With these notations the main result on discrete filtering theory for quantum
Markov chains can be formulated as follows:
Theorem. Let the sequences (φ_k]) and (p_k]) be defined respectively by (3.1) and
(3.4). Then for every k ∈ N and for φ^B_k]-almost every ω_k] ∈ Ω_k], the S(A_0)-valued
measure on Ω_{k+1} given by the restriction of E*_{k+1} p_k(ω_k]) to B_{k+1} ⊗ 1_{A_0} ≅ C(Ω_{k+1})
is absolutely continuous with respect to φ^B_{k+1]}( . | ω_k]) and its Radon-Nikodym derivative is
p_{k+1}(ω_k], . ).
In other terms:

E*_{k+1} p_k(ω_k])(dω_{k+1}) = p_{k+1}(ω_k], ω_{k+1}) φ^B_{k+1]}(dω_{k+1} | ω_k])


Proof. For any a ∈ A_0, f_k] ∈ C(Ω_k]) and f_{k+1} ∈ C(Ω_{k+1}), one has:

φ_{k+1]}(f_k] ⊗ f_{k+1} ⊗ a)
= ∫_{Ω_{k+1]}} ⟨p_{k+1}(ω_k], ω_{k+1}), a⟩ f_k](ω_k]) f_{k+1}(ω_{k+1}) φ^B_{k+1]}(dω_{k+1]})
= ∫_{Ω_k]} φ^B_k](dω_k]) f_k](ω_k]) ∫_{Ω_{k+1}} ⟨p_{k+1}(ω_k], ω_{k+1}), a⟩ f_{k+1}(ω_{k+1}) φ^B_{k+1]}(dω_{k+1} | ω_k])

and, on the other hand,

φ_{k+1]}(f_k] ⊗ f_{k+1} ⊗ a) = φ_k](f_k] ⊗ E_{k+1}(f_{k+1} ⊗ a))
= ∫_{Ω_k]} ⟨p_k(ω_k]), E_{k+1}(f_{k+1} ⊗ a)⟩ f_k](ω_k]) φ^B_k](dω_k])

From this we deduce that, for φ^B_k]-almost all ω_k], for all a and all f_{k+1}:

∫_{Ω_{k+1}} ⟨p_{k+1}(ω_k], ω_{k+1}), a⟩ f_{k+1}(ω_{k+1}) φ^B_{k+1]}(dω_{k+1} | ω_k])
= ⟨p_k(ω_k]), E_{k+1}(f_{k+1} ⊗ a)⟩ = ⟨E*_{k+1} p_k(ω_k]), f_{k+1} ⊗ a⟩

and the arbitrariness of f_{k+1} and a then implies

E*_{k+1} p_k(ω_k])(dω_{k+1}) = p_{k+1}(ω_k], ω_{k+1}) φ^B_{k+1]}(dω_{k+1} | ω_k])

or equivalently:

p_{k+1}(ω_k], ω_{k+1}) = E*_{k+1}[p_k(ω_k])](dω_{k+1}) / φ^B_{k+1]}(dω_{k+1} | ω_k])

and this ends the proof.
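In the simplest matrix setting, where the transition expectation is given by a family of Kraus operators V_ω (one per outcome ω), the Radon-Nikodym density of the theorem reduces to the usual posterior-state update. The toy example below uses an assumed two-outcome qubit instrument, not an example taken from the text:

```python
import numpy as np

def posterior_states(rho, kraus):
    """Posterior states and outcome probabilities for one measurement step.

    rho   : current state of the system, as a density matrix
    kraus : list of Kraus operators V_w, one per outcome w, satisfying
            sum_w V_w^dag V_w = I (so the probabilities sum to 1)
    For the transition expectation E(b ⊗ a) = sum_w b(w) V_w^dag a V_w,
    outcome w yields the posterior state
        p(w) = V_w rho V_w^dag / tr(V_w rho V_w^dag)
    with a priori probability tr(V_w rho V_w^dag), a matrix analogue of the
    Radon-Nikodym density appearing in the theorem.
    """
    out = []
    for V in kraus:
        M = V @ rho @ V.conj().T
        prob = float(np.real(np.trace(M)))
        post = M / prob if prob > 0 else M
        out.append((prob, post))
    return out

# toy qubit example: an unsharp measurement along the z axis
p_strength = 0.9  # illustrative sharpness parameter, not from the text
V0 = np.diag([np.sqrt(p_strength), np.sqrt(1 - p_strength)])
V1 = np.diag([np.sqrt(1 - p_strength), np.sqrt(p_strength)])
rho0 = 0.5 * np.eye(2)  # maximally mixed initial state
results = posterior_states(rho0, [V0, V1])
```

Iterating this update over the outcome sequence reproduces, in this finite-dimensional case, the recursion (1.2).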

Appendix: A posteriori states


Lemma (A.1). Let D ≅ L^∞(Ω, μ) be an abelian W*-algebra and let φ be a
normal state on D ⊗ A ≅ L^∞(Ω, μ; A). Then the φ-expectation E^φ from D ⊗ A
to D is a norm one projection and there exists a p ∈ L^∞(Ω, μ; S(A)) such that,
for all F ∈ D ⊗ A and for φ_D-almost all ω ∈ Ω,

E^φ(F)(ω) = ⟨p(ω), F(ω)⟩    (A.1)

where φ_D is the restriction of φ to D, identified with a Baire measure on Ω.

Proof. That E^φ is a norm one projection follows easily since D is in the center
of D ⊗ A. It is known that there exists a lifting from L^∞(Ω, φ_D) to L^∞(Ω) [16],
i.e. a homomorphism λ: L^∞(Ω, φ_D) → L^∞(Ω) such that, if π_φ: L^∞(Ω) → L^∞(Ω, φ_D)
is the canonical projection, then π_φ ∘ λ = id.


Define, for a ∈ A, E(a) := λ(E^φ(1 ⊗ a)). Then for each ω ∈ Ω the map

p(ω): a ∈ A → E(a)(ω) =: ⟨p(ω), a⟩

is a state on A. If I is a finite set and f_j: Ω → C are continuous functions (j ∈ I),
then, denoting

F := Σ_{j ∈ I} f_j ⊗ a_j ∈ D ⊗ A,

one has

⟨p(ω), F(ω)⟩ = Σ_{j ∈ I} f_j(ω) ⟨p(ω), a_j⟩ = Σ_{j ∈ I} f_j(ω) E^φ(1 ⊗ a_j)(ω)
= E^φ(Σ_{j ∈ I} f_j ⊗ a_j)(ω) = E^φ(F)(ω)

So by density arguments (A.1) holds.


Remark. From the Lemma above it follows that, if we denote by φ_ℬ the Baire
measure induced on Ω by the restriction of φ to ℬ, then we have

    φ(F) = ∫_Ω ⟨ρ(ω), F(ω)⟩ φ_ℬ(dω)

Definition (A.2). In the notations of Lemma (A.1), for φ_ℬ-almost each ω ∈ Ω,
the state ρ(ω) ∈ 𝒮(𝒜) is called the posterior state of 𝒜, given the measurement of
ℬ, the initial state φ and the result ω of the observation.
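In the simplest finite-dimensional situation (Ω a finite set, 𝒜 the 2×2 matrices) the objects of Lemma (A.1) become concrete: φ is given by a weight μ(ω) together with a density matrix for each ω, the φ-expectation evaluates F pointwise against that density, and the densities are exactly the posterior states. A numerical sketch (Python with NumPy; the particular state below is hypothetical):

```python
import numpy as np

# Omega has two points, A = 2x2 matrices.  The state phi on B (x) A is
# given by a weight mu(w) and a density matrix rho[w] for each w.
mu = np.array([0.6, 0.4])                       # phi restricted to B, a measure on Omega
rho = [np.array([[1.0, 0.0], [0.0, 0.0]]),      # posterior state at w = 0
       np.array([[0.5, 0.0], [0.0, 0.5]])]      # posterior state at w = 1

def phi(F):
    """phi(F) = sum_w mu(w) <rho(w), F(w)>, as in the Remark."""
    return sum(m * np.trace(r @ Fw).real for m, r, Fw in zip(mu, rho, F))

def E_phi(F):
    """phi-expectation onto B: (E^phi F)(w) = <rho(w), F(w)>."""
    return np.array([np.trace(r @ Fw).real for r, Fw in zip(rho, F)])

F = [np.array([[0.2, 0.0], [0.0, 0.8]]), np.eye(2)]
# Integrating E^phi(F) against mu reproduces phi(F), i.e. E^phi is a
# projection compatible with the state phi.
assert abs(phi(F) - (mu * E_phi(F)).sum()) < 1e-12
```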
References

[1] Accardi L.: Stato fisico. Enciclopedia Einaudi 13 (1981) 514-548
[2] Accardi L., Belavkin S.: The filtering problem for discrete quantum Markov chains. To appear
[3] Accardi L., Ohya M.: Compound channels, transition expectations and liftings. To appear
[4] Cecchini C.: Stochastic couplings for von Neumann algebras. Quantum Probability and Applications III
[5] Cecchini C., Petz D.: Classes of conditional expectations over von Neumann algebras. Preprint
[6] Cecchini C., Petz D.: State extensions and a Radon-Nikodym theorem for conditional expectations on von Neumann algebras. Pacific Journal of Mathematics 138 (1989) 9-24
[7] Cecchini C., Kümmerer B.: Stochastic transitions on preduals of von Neumann algebras. In: Quantum Probability and Applications V, Springer LNM 1442, pp 126-130
[8] Kalman R.E.: A new approach to linear filtering and prediction problems. ASME Transactions, vol. 82, Part D (Journal of Basic Engineering), (1960) 35-45
[9] Kalman R.E., Bucy R.S.: New results in linear filtering and prediction theory. Transactions of the ASME, Journal of Basic Engineering, March (1961) 95-108
[10] Ohya M.: A new formulation of mutual entropy for Gaussian channels. Preprint
[11] Ohya M.: Quantum ergodic channels in operator algebras. Journal of Mathematical Analysis and Applications 84 (1981) 318-327
[12] Ohya M.: Note on quantum probability. Lettere al Nuovo Cimento 38 (1983) 402-404
[13] Ohya M.: On compound state and mutual information in quantum information theory. IEEE Transactions on Information Theory 29 (1983) 770-774
[14] Petz D.: An invitation to the C*-algebra of the canonical commutation relation. Leuven University Press, 1989
[15] Petz D.: Characterization of sufficient observation channels. Preprint
[16] Tulcea I.T., Tulcea I.C.: Topics in the theory of lifting. Springer (1969)

Chapter 3

The LQG Problem

LQG as a Design Theory


H. Kimura
Department of Mechanical Engineering for Computer-Controlled Machinery,
Osaka University, 2-1, Yamada-oka, Suita, Osaka 565 Japan

1 Introduction
Control system design is the task of finding a controller which satisfies prescribed
specifications, under a given set of constraints. Desirable controller parameters
must satisfy a certain set of identities and inequalities that represent the plant
dynamics, constraints and specifications. Hence, control system design
amounts to solving a set of equations subject to inequality constraints that
contain the controller parameters as unknowns. Since the given set of equations
and inequalities is often very complicated and not precisely known, a solution
should be found by a cut-and-try method guided by intuition and experience,
which has been a major tool of practical control system design.
If control theory gives a basis for design, it must capture the essential features
of the design and formulate it as a mathematically tractable problem. A design
problem which theory formulates mathematically is bound to be a simplification
of real design problems and only a small part of specifications and constraints
is explicitly represented. Universal applicability and explicit solvability are vital
to a theory, but they are usually contradictory to the complexity and the
ambiguity of real problems. The key issue of design theory is to enhance its
universality (generality) and solvability without sacrificing its reality.
The LQG theory established by Kalman [1] was undoubtedly the first
comprehensive and successful design theory in the history of control engineering,
having a balance among universal applicability, explicit solvability and reality.
Though it was clear that Kalman's initial motivation when he established the
whole framework of LQG theory in the early 1960's was not brought forth by
practical needs, LQG theory had a great impact on control system design. The
purpose of this article is to discuss the impact of LQG theory on control system
design.
Before the LQG theory emerged, control theory was composed of a collection
of classical results bearing the names of great forerunners, such as the Routh-Hurwitz
stability test, the Nyquist stability test, Lur'e's theorem on absolute stability, the physical
realizability theorem of Wiener-Paley, Bode's integral formula representing the
gain-phase relation, and so on, and a collection of design tools like lead-lag
compensation, the root-locus method, dead-beat control of sampled-data systems,
and so on.

148

H. Kimura

In the late 1950's we had two remarkable attempts on a comprehensive


design theory. One was the analytic design method developed by Newton et al.
[2], and the other was the optimal control theory established by Pontryagin
and his school [3]. The former was based essentially on the old statistical
optimization theory of Wiener and lacked theoretical novelty that might have
attracted theory-oriented researchers. Also, its applicability was limited to stable
minimum-phase plants. Pontryagin's maximum principle had a great impact on
control theory at that time. Supported by the success of dynamic programming
by Bellman, as well as that of the bang-bang principle by LaSalle, optimal control
quickly built its solid footing. Many people thought that optimal control would
be the most important subject of control theory. However, its computational
unfeasibility calmed down the fever throughout the 1960's.
LQG theory owed quite a lot to these two predecessors-analytical design
theory (or statistical optimization theory of Wiener) and optimal control theory.
On the one hand, LQG theory can be regarded as a time-domain version of
Wiener theory. On the other hand, it can be regarded as a specialization of
optimal control theory to the linear quadratic case. Of course, LQG theory is
far beyond a simple intersection or amalgamation of these two theories. It has
created an essentially new framework of control theory which provides us with
deep understanding of the fundamental structure of control systems.
It is true that LQG theory has also created a long-standing issue of
arguments, the so-called "theory-practice gap". Even in the mid 60's, when
the LQG framework was still fresh, Kalman himself commented on this issue:
There are nagging doubts-particularly strongly felt by engineers-that
the theory is overidealized, hence impractical. It is argued that the choice
of the performance index to be optimized is arbitrary and subjective,
perhaps only a matter of taste. If so, then it is pointless to devote too
much effort to finding a control law which is the best in some narrow,
individualistic sense [4].
Kalman responded to the above criticism by posing and solving the inverse
problem of optimal control, which turned out to be one of the most important
achievements in LQG theory.
The background idea of the theory-practice gap argument has been changing
from time to time, but it has continuously focused upon the reality of design
theory, which tends to be oversacrificed for the sake of universality and solvability.
The tone of the argument was sometimes amplified by the fact that LQG theory
is something different from conventional design and does not include it. If LQG
theory had included conventional design principles like loop shaping in the
frequency domain as a special case (just like quantum physics that contains
classical physics as an extreme case), the argument on the theory-practice gap
might have been totally different.
Every theory is built on abstraction, idealization and simplification of the
real world. Therefore, the theory-practice gap always exists wherever theory
exists. It is even necessary as a driving force for creating a new theory or a new

3 The LQG Problem-LQG as a Design Theory

149

paradigm to fill the gap. Even before the LQG theory, there was a gap between
theory and practice. For instance, Routh-Hurwitz stability theory had not been
known to practical control engineers for more than 60 years. However, the
issue of theory-practice gap did not come into being until the LQG theory was
established. Paradoxically, the creation of the issue of theory-practice gap is
one of the major contributions of the LQG theory to control engineering. In
the subsequent discussions, we shall figure out the most essential feature of the
theory-practice gap that was brought in by LQG theory.

2 LQG as a Design Theory


Consider a plant described by

    ÿ = u                                               (1)

This can be regarded as the model of a vehicle moving on a line with normalized
mass, where y denotes its displacement and u the force applied to it.
The purpose of control is to move the vehicle to the origin as soon as possible,
starting from an initial condition (y₀, ẏ₀). A state-space form of (1) is given
by

    [ẋ₁]   [0 1] [x₁]   [0]
    [ẋ₂] = [0 0] [x₂] + [1] u                           (2a)

    y = [1 0] [x₁]
              [x₂]                                       (2b)

The state feedback control law is represented as

    u = -αx₁ - βx₂ = -αy - βẏ                            (3)

where α and β are gains to be determined. Substitution of (3) in (1) yields

    ÿ + βẏ + αy = 0                                      (4)

It is clear that (y, ẏ) = (x₁, x₂) tends to the origin as t → ∞ for any positive α
and β. From the engineering point of view, it is better to find the unique optimal
gains with respect to some performance criterion.
Since our main concern is the speed of response, we might use the so-called
squared transient response area

    J₁(α, β) = ∫₀^∞ y(t)² dt                             (5)

as a performance criterion. Obviously, the smaller J₁ is, the quicker is the
response. Solving the second-order differential equation (4) under the initial


condition (y(0), ẏ(0)) = (y₀, ẏ₀) and substituting the solution in (5) yield

    J₁(α, β) = (β/(2α) + 1/(2β)) y₀² + (1/α) y₀ẏ₀ + ẏ₀²/(2αβ)        (6)

It is clear that J₁(α, β) given in (6) can be made arbitrarily small by choosing α
and β sufficiently large. Since the input u cannot be arbitrarily large, the criterion
(5) is not appropriate for practical purposes. In order that a criterion be practical,
it must incorporate constraints on the magnitudes of α and β.
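Formula (6) can be checked numerically: J₁ equals x₀ᵀ P x₀, where P solves the Lyapunov equation AᵀP + PA = -diag(1, 0) for the closed-loop matrix of (4). A sketch using SciPy (the helper names are mine, not from the text):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def J1_closed_form(alpha, beta, y0, dy0):
    # formula (6)
    return ((beta / (2 * alpha) + 1 / (2 * beta)) * y0**2
            + y0 * dy0 / alpha + dy0**2 / (2 * alpha * beta))

def J1_lyapunov(alpha, beta, y0, dy0):
    # closed loop ydd + beta*yd + alpha*y = 0, state x = (y, yd);
    # J1 = x0' P x0 with A'P + P A = -diag(1, 0)
    A = np.array([[0.0, 1.0], [-alpha, -beta]])
    P = solve_continuous_lyapunov(A.T, -np.diag([1.0, 0.0]))
    x0 = np.array([y0, dy0])
    return x0 @ P @ x0

assert abs(J1_closed_form(2.0, 3.0, 1.0, -0.5)
           - J1_lyapunov(2.0, 3.0, 1.0, -0.5)) < 1e-9
```

The same check also confirms that J₁ shrinks without bound as α and β grow, which is the point being made here.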


It is possible to minimize J₁(α, β) under the constraints

    α ≤ M₁,   β ≤ M₂                                     (7)

where M₁ and M₂ denote the upper bounds of the gains. However, the optimum
values of α and β cannot then be represented in closed form and they depend on the
initial conditions (y₀, ẏ₀). The constraints (7) thus give a very complicated
non-linear control law. Therefore, the minimization of (6) under the constraints
(7) is again far from practical.
Instead of bounding the gains as in (7), we can penalize the increase of gains by
taking

    J(α, β) = ∫₀^∞ (q y(t)² + u(t)²) dt                  (8)

as a performance criterion, where q > 0 is a design parameter used for making
the trade-off between the speed of response and the input power to be used.
This is exactly the optimal regulator that was established by Kalman. Analogously
to J₁(α, β), the criterion (8) is calculated to be

    J(α, β) = (qβ/(2α) + (q + α²)/(2β)) y₀² + ((q + α²)/α) y₀ẏ₀
              + ((q + α²)/(2αβ) + β/2) ẏ₀²               (9)

After some calculations, we obtain the partial derivatives

    ∂J/∂α = (q/(αβ)) (1 - β²/(2α)) y₀²
            + (1/(2β)) (2α y₀² + 2β y₀ẏ₀ + ẏ₀²) (1 - q/α²)          (10a)

    ∂J/∂β = -(1/β²) (q y₀² + α ẏ₀²) (1 - β²/(2α))
            + (α/(2β²)) (ẏ₀² - α y₀²) (1 - q/α²)                     (10b)

Hence, both ∂J/∂α and ∂J/∂β vanish if 1 - β²/(2α) = 0 and 1 - q/α² = 0. Therefore,
the minimum of J(α, β) is attained for

    α* = √q,    β* = √(2α*)

which are independent of (y₀, ẏ₀). Taking q = ω⁴, we have the optimal
closed-loop system of Butterworth type

    ÿ + √2 ωẏ + ω²y = 0


Generalization of the above procedure leads to the optimal regulator, which
is the core of the LQG theory. All computations like (9) and (10) are reduced
to solving a Riccati equation.
The example described above is very simple, but it exhibits some essential
features of optimal regulators. It shows that a realistic control law is obtained
by penalizing the input power. This implies that the optimal regulator is
essentially based on the trade-off between the control performance and the
input power. In most cases of practical importance, except perhaps nonminimum-phase
systems, an increase of control power results in an improvement
of the control performance. Optimal control theory gives deep insight
into the relation of control performance to input power and established an effective
method of trade-off between them.
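As a concrete check, solving the algebraic Riccati equation for the double integrator (2a) with weights Q = diag(q, 0) and R = 1 should return the gains α* = √q and β* = √(2α*) obtained above. A sketch with SciPy's ARE solver (illustrative only, not from the original text):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Double integrator (2a): xd = A x + B u, with y = x1; criterion (8)
# puts weight q on y^2 and 1 on u^2.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])

def lqr_gains(q):
    """Optimal gains (alpha, beta) for u = -(alpha*y + beta*yd)."""
    P = solve_continuous_are(A, B, np.diag([q, 0.0]), np.array([[1.0]]))
    K = (B.T @ P).ravel()          # K = R^{-1} B' P with R = 1
    return K[0], K[1]

q = 16.0                           # q = omega^4 with omega = 2
alpha, beta = lqr_gains(q)
assert abs(alpha - np.sqrt(q)) < 1e-6              # alpha* = sqrt(q) = omega^2
assert abs(beta - np.sqrt(2 * np.sqrt(q))) < 1e-6  # beta* = sqrt(2*alpha*)
```

With q = ω⁴ the resulting closed-loop polynomial is s² + √2 ωs + ω², the Butterworth pattern noted above.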
In the above example, assume ẏ₀ = 0 for simplicity, and let

    U = ∫₀^∞ y(t)² dt,    V = ∫₀^∞ u(t)² dt

Taking γ = α/β², we have U³V = (γ⁻¹ + 3 + 3γ + γ²) y₀⁸/16. This yields the
inequality

    U³V ≥ (27/64) y₀⁸                                    (11)

for any input u(t), and the equality holds for the optimal case. Figure 2.1 shows
the inequality (11). The curve which represents the performance of the optimal
regulator forms the boundary dividing the unrealizable zone (unshaded) and
the realizable zone (shaded). Using the optimal regulator theory, we can
always depict a performance curve like Fig. 2.1 for more general cases.
As was shown in the above example, the optimal regulator gives deep
insight into the trade-off between the response speed and the applied control
power. It was a sort of common sense that the larger the injected control
power is, the better the control performance is. However, no design method had

Fig. 2.1. The performance curve of the optimal regulator in the (V, U) plane, dividing the realizable (shaded) and unrealizable (unshaded) zones


been available before the emergence of LQG theory to deal with the trade-off
between the control performance and the control power.
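The trade-off curve of Fig. 2.1 can be reproduced numerically from the closed-form costs of the loop ÿ + βẏ + αy = 0 with ẏ₀ = 0; the expressions for U and V below follow from the corresponding Lyapunov equations, and the sketch checks the bound U³V ≥ (27/64)y₀⁸ on a grid of gains (function names are mine):

```python
import numpy as np

# Closed-form costs for ydd + beta*yd + alpha*y = 0, yd0 = 0:
#   U = int y^2 dt = (beta/(2*alpha) + 1/(2*beta)) * y0^2
#   V = int u^2 dt = alpha^2/(2*beta) * y0^2
def costs(alpha, beta, y0=1.0):
    U = (beta / (2 * alpha) + 1 / (2 * beta)) * y0**2
    V = alpha**2 / (2 * beta) * y0**2
    return U, V

bound = 27.0 / 64.0                     # inequality (11) with y0 = 1
for alpha in np.linspace(0.2, 5.0, 25):
    for beta in np.linspace(0.2, 5.0, 25):
        U, V = costs(alpha, beta)
        assert U**3 * V >= bound - 1e-9

# Equality is attained on the optimal-regulator family beta^2 = 2*alpha.
U, V = costs(2.0, 2.0)
assert abs(U**3 * V - bound) < 1e-9
```

Sweeping q in the optimal regulator traces out exactly this boundary curve in the (V, U) plane.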
If β = 0 in (3), the closed-loop system is not stable. In other words, output
feedback alone cannot stabilize the closed-loop system; state feedback is essential
in this case. The optimal regulator fully exhibited the power of state feedback. In
the state-space paradigm, the state is regarded as the information concerning the
past behavior of a system which is necessary and sufficient for predicting its
future behavior. In other words, the state is an interface between the past
and the future, according to Akaike [5]. Therefore, state feedback uses the
complete information and represents the ideal feedback scheme. What is not
attainable by state feedback cannot be attained by any other feedback scheme.
The optimal regulator provides an effective way of using this complete information.

3 The Impact of LQG Theory


In the 1960's, when LQG theory was making rapid progress, its impact was mainly
confined to the academic world. Industry as a whole did not pay much attention
to the use of LQG theory for control system design, except perhaps the US space
industry, which was involved in serious competition with the USSR. From the
early 1970's, industry became gradually interested in state-space theory. This is
partly because LQG theory had reached a certain state of maturity, as was
shown in [6]. Successful extension to servo problems and the development of reliable
software for solving Riccati equations were the two major factors which
enhanced the reality of LQG theory. Also, the rapid progress of microprocessors
was of great help for LQG theory to penetrate into industry. Microprocessors
make it cheaper and easier to implement the complex control algorithms that
LQG theory generates. The increasing need for multivariable control in
industry was another important factor that pushed forward the use of LQG
theory in practical control system design.
Twenty years ago, industrial applications of LQG theory were very few, and case
studies were limited to some pioneering works. Ten years ago, we saw quite
a few serious industrial applications of LQG or related theory that reached the
level of test operation. Nowadays we can see widespread industrial applications
of LQG theory, some of which are already in full commercial operation.
LQG theory is now an indispensable body of knowledge for engineers working
in any area of research and development related to control.
LQG theory pushed forward the superiority of systematic and logical design
based on plant models over conventional trial-and-error methods of design. This
is perhaps the most important contribution of LQG theory to control system
design. In order that control system design has a logical ground, it must be
based on a mathematical model of plant. In other words, model-based control
is the only way to make control system design scientific. Let us take the weather
forecasting as an example. Thirty years ago, it was mainly based on the experience


of experts rather than meteorological reasoning. Now, supercomputers are used


to compute and predict the atmospheric phenomena quantitatively based on
rich information brought in by satellites, and the expert's experience is much
less weighted than it used to be. If this is the way of scientific progress, it is exactly
the road that LQG theory tried to pave for control system design.
Unfortunately, however, we must admit that in most control system design
of real plants, cut-and-try methods led by engineers' intuition and experience
are still in use. Most of the existing control algorithms implemented in real
plants are not based on theory but on ad hoc reasoning that relies on the
individual physical structure of the plant. There are many reasons which prevent
the use of LQG theory at the grass-roots level of design, though LQG theory
is already very popular among control engineers. The most fundamental and
serious reason lies in the idea of model-based control itself. This will be the
subject of the next section.
LQG theory extended the scope of control from the regulation of local
process variables to the control of the overall system. It is clear that a system
composed of many subsystems is effectively controlled by taking the interactions
among them properly into account, instead of controlling each subsystem
independently. This leads to the difficulty of multivariable control, in which
intuition no longer works. In the late 1950's, we had no method to deal with
multivariable control except very primitive ideas of decoupling, cascade control
and internal compensation. LQG theory provided the first effective multivariable
feedback synthesis, at least at the conceptual level. It provided a way of
systematic use of measured signals for the purpose of control, which opened a
door to the control of total systems integrating local controllers. This was indeed
great conceptual progress, to which LQG theory made significant contributions.

4 The Issue of Modeling


The underlying idea of LQG theory is the superiority of model-based control.
The model plays an extremely important role in the design based on LQG
theory. Without a model of the given plant, LQG theory does not work.
However, modeling is not an easy task for most plants to be controlled, especially
for complex and large-scale plants. This is the most fundamental reason that
limits the use of LQG or any other theory in the control of real plants.
On the one hand, there is control theory as a logical system which is
abstract, universal and free from any physical constraints. On the other hand,
there is a real plant to be controlled which is concrete, individual and under
many physical constraints. Any engineer who tries to design a control system
for a real plant basing himself on control theory must combine these two things
of opposite characteristics. He should not be overwhelmed by the distance
between them. A model of the plant, represented in an appropriate way, is the
only medium that combines the theory and the plant. In other words, the model


is an interface that fuses the two opposite worlds, the abstract theory and the
real plant.
However, to build a model of a given plant is in general a difficult task, due
to a multitude of reasons. We frequently use the term physical model, but it is
important to recognize the difference between a physical model and a model
based on physics. It is only an isolated, idealized and simple process that can be
described by means of physical laws. A physical process taking place in industrial
systems is always a compound of several subprocesses interacting with each other.
For instance, steel rolling is a relatively simple mechanical process based
on plastic deformation. However, an actual rolling process is far from simple
due to the use of a tandem mill system. It generates interstand tensions which
delicately affect the thickness. The traveling speed is crucially dependent on
the friction between the material and the mill. No physical law can identify all
these processes quantitatively. If we take the deformation profile of the strip
into account, the problem becomes three-dimensional, and no mechanical theory
of deformation is available to deal with such three-dimensional phenomena.
We must recognize that physics is far from sufficient to describe industrial
processes quantitatively. Physics has been built on the belief that Nature is
simple, or becomes simple after the removal of macroscopic diversity. Oppositely,
engineering is based on the belief that Nature is complex. Instead of fundamental
equations, phenomenological or empirical formulas are used extensively to describe
the physical processes of the plant under consideration. Engineering systems
belong to the artificial world. They cannot be fully described by the knowledge
of natural science, whose objective is to understand the natural world, though
they obey and utilize the natural laws. This is an important point for
understanding the difficulty of modeling.
Recently, some attempts have been made to extend the scope of physics to
describe the complexity and the diversity of the macroscopic physical world.
We expect that this new trend in physics will help to mitigate the difficulty of
modeling [7].
In many subjects of engineering, such as the theory of heat transfer, an effort is
made to fill the gap between the fundamental physical laws and the complexity
of the physical world. These subjects are essentially based on natural science. Though the
knowledge of such engineering is essential to the modeling of plants, something
more is definitely required in order to give some principle by which to decompose,
simplify and parameterize the model and to design experiments to identify the
plant parameters quantitatively. It would be some sort of "metaphysics" that
belongs to "the science of the artificial" [22]. This is exactly what we lack and
hope to have in the future to eliminate the theory-practice gap to some extent.

5 The Issue of Robustness


In the LQG design, a heavy burden is placed on modeling, which is usually a
difficult and time-consuming task. If the model used for the design can be a rough
and inaccurate one, the burden of modeling is lessened considerably. The


purpose of robust control is to introduce tolerance against as much uncertainty


as possible in the model used for design. The robustness issue of control system
design has been receiving increasing attention in the last decade, as control
theory has increasing contact with industrial applications. Now, robust control
has grown up to be the most active field of control theory and has marked
some great progress through the 1980's [8]. It was the LQG theory that created
this field in the mid-1960's. The following sentence, written in 1971,
foresaw the importance of robust control, though the word "robust" was not
used explicitly.
It appears that the most pressing need is related to the modeling issue;
namely, how accurate should the model of uncertainties be? and how
should the performance index be defined to consolidate design specifications
and model inaccuracy? Furthermore, how can one guarantee the
relative insensitivity of the final control system right from the start? Of
course, these questions are not new; nonetheless, the existence of systematic
computer-aided design techniques has returned the focus upon these
age-old questions. [17]
The argument on the robustness of the LQG method dates back to the celebrated
paper by Kalman [4] on the inverse regulator problem. Kalman derived the
so-called Kalman equation and established that the sensitivity of the optimal
regulator is never greater than one for SISO systems, irrespective of the
selection of performance indices. This also implies that the optimal regulator
has infinite gain margin and a phase margin of at least 60°, again irrespective of
performance indices. These robustness properties of the SISO LQG method
were extensively discussed in a beautiful book [9].
The implications of the Kalman equation for the general MIMO case were
investigated by many authors (e.g. [10] [11] [12]). It is worth noting that a
complete multivariable extension of the solution of the inverse regulator problem
was obtained quite recently by Fujii [13], who derived a novel design method
of robust control based on the inverse regulator theory [14]. A general
robustness issue for the MIMO case was addressed by Safonov et al. [15], who
derived a natural extension of the stability margin of scalar regulators to the
multivariable case. This work showed that the nice robustness properties of SISO
regulators carry over to the MIMO case, and it really gave the initial impetus
for the subsequent progress of robust control theory.
Unfortunately, however, it was pointed out that the nice robustness
properties of optimal regulators hold only when state feedback is available [16].
They no longer hold when the optimal regulator is implemented by
output feedback using Kalman filters or observers. As a remedy for this, the
so-called loop transfer recovery (LTR) method was proposed. However, since
it uses a kind of high-gain strategy, the practical applicability of the LTR method
is questionable.
The robustness issue concerning the LQG method exhibits the possibility of a
serious theoretical approach to dealing with model uncertainty. Though most
applications of the LQG method to real systems exhibit its robustness as proved

theoretically, some serious drawbacks have been pointed out [18]. However,
LQG theory will remain a milestone for robust control theory, as is seen
in the recent rapid progress of quadratic stabilization theory [19] [20].

6 Conclusion
As the founder of LQG theory, Kalman made an enormous impact
on the methodology of control system design. LQG theory was the first
comprehensive and systematic design theory we had in the long history of
control engineering. First, it pushed forward the superiority of model-based
control. It established a method of systematic and logical design of control
systems which is sharply different from conventional design methods based on
experience and intuition. It opened up the possibility of changing control
system design from an art to a science. Second, it extended the scope of control
system design from the control of local process variables to the integrated
control of the whole plant. Third, it created the field of CAD for control systems.
It converted control system design from paper-and-pencil jobs to large-scale
computation using interactive software packages.
It is also true that many criticisms have been raised against LQG theory
since its emergence. The underlying ideas of these criticisms have been changing
from time to time. While some of them have settled down, the fundamental
issue of modeling remains to be argued. The argument on the theory-practice gap
has been a continuous impetus for creating new paradigms. Now, we
have a new design method called H∞ control, which was established as an
alternative to LQG. Different from various design methods proposed in the
past, the H∞ method is actually not an alternative to LQG but rather includes
LQG as its extreme case [21]. This is analogous to the fact that quantum
physics includes classical physics as its extreme case. In this respect, the H∞ method
is really a successor of the LQG method at a higher logical level, of course giving
more versatile design strategies. The spirit of LQG survives in H∞ theory.

References
[1] R.E. Kalman, "Contributions to the theory of optimal control," Boletin de la Sociedad Matematica Mexicana, Vol 5, pp 102-119, 1960
[2] G.C. Newton, L.A. Gould and J.F. Kaiser, Analytical Design of Linear Feedback Controls, John Wiley and Sons, New York, 1957
[3] L.S. Pontryagin, et al., The Mathematical Theory of Optimal Processes, Interscience Publishers, New York, 1962
[4] R.E. Kalman, "When is a linear control system optimal?" Trans ASME, J of Basic Engineering, Vol 86, pp 1-10, 1964
[5] H. Akaike, "Markovian representation of stochastic processes by canonical variables," SIAM J Control, Vol 13, pp 162-173, 1975
[6] Special Issue on Linear-Quadratic-Gaussian Estimation and Control Problems, IEEE Trans Automat Contr, Vol AC-16, 1971
[7] I. Prigogine and I. Stengers, Order Out of Chaos: Man's New Dialogue with Nature, Bantam Books, New York, 1984


[8] P. Dorato (ed.), Robust Control, IEEE Press, New York, 1987
[9] B.D.O. Anderson and J.B. Moore, Linear Optimal Control, Prentice-Hall, New Jersey, 1971
[10] B.D.O. Anderson, "Sensitivity improvement using optimal design," Proc IEE, Vol 113, pp 1084-1086, 1966
[11] J.B. Cruz and W.R. Perkins, "A new approach to the sensitivity problem in multivariable feedback systems," IEEE Trans Automat Contr, Vol AC-9, pp 216-223, 1964
[12] E. Kreindler, "Closed-loop sensitivity reduction of linear optimal control systems," ibid, Vol AC-13, pp 254-262, 1968
[13] T. Fujii, "A complete optimality condition in the inverse problem of optimal control," SIAM J Control and Optimiz, Vol 22, pp 327-341, 1984
[14] T. Fujii, "A new approach to the LQ design from the viewpoint of the inverse regulator problem," IEEE Trans Automat Contr, Vol AC-32, pp 995-1004, 1987
[15] M.G. Safonov and M. Athans, "Gain and phase margin for multiloop LQG regulators," ibid, Vol AC-22, pp 173-179, 1977
[16] J.C. Doyle and G. Stein, "Robustness with observers," ibid, Vol AC-24, pp 607-611, 1979
[17] M. Athans, "Editorial: On the LQG problems," IEEE Trans Automat Contr, Vol AC-16, p 528, 1971
[18] H.H. Rosenbrock, "Good, bad or optimal," ibid, Vol AC-16, 1971
[19] I.R. Petersen, "A Riccati equation approach to the design of stabilizing controllers and observers for a class of uncertain linear systems," IEEE Trans Automat Contr, Vol AC-30, pp 904-907, 1985
[20] K. Zhou and P.P. Khargonekar, "An algebraic Riccati equation approach to H∞ optimization," Systems & Control Letters, Vol 11, pp 85-92, 1988
[21] J.C. Doyle, K. Glover, P.P. Khargonekar and B.A. Francis, "State-space solutions to standard H2 and H∞ control problems," IEEE Trans Automat Contr, Vol AC-34, pp 831-847, 1989
[22] H.A. Simon, The Sciences of the Artificial, MIT Press, 1969

State-Space H∞ Control Theory and the LQG Problem*


P. P. Khargonekar
Department of Electrical Engineering and Computer Science, The University of Michigan,
Ann Arbor, MI 48109-2122, USA

Some of the recent developments in H∞ control theory using a state-space approach are discussed.
The close connections to the LQG problem are highlighted. Moreover, a new proof of the necessary
conditions for the solvability of the standard problem of H∞ control theory is given.

1 Introduction
Among the many contributions of Kalman to mathematical system theory, his
work (Kalman [1960], Kalman and Bucy [1961]) on linear-quadratic-Gaussian
(LQG) optimal control and filtering problems has had a very strong influence
on the subject. In all of his work, he emphasized that the notion of state plays
a vital role in a deeper understanding of system theoretic problems. Indeed,
some of the appeal and beauty of Kalman filtering and linear-quadratic optimal
control lies in the simple and intuitive structure of the solutions.
Motivated by robustness considerations, Zames introduced the problem of
H∞ optimal control in his pioneering paper Zames [1981]. The essential idea
was to design a controller to optimize performance for the worst exogenous
input. Thus, while in Kalman filtering and the LQG problem the power
spectrum of the exogenous input (noise) is assumed known (usually white noise),
in the H∞ control problem the power spectrum is assumed unknown and the
controller is designed for the worst case.
Early research in H∞ control theory was conducted using frequency
domain methods. The key tools were the Youla-Jabr-Bongiorno-Kucera
parametrization of all stabilizing controllers, inner-outer factorizations of
transfer functions, Nevanlinna-Pick interpolation theory, the Nehari distance
theorem, the commutant lifting theorem, etc. Since the frequency domain
approach is not the main topic of this paper and since there is an extensive
amount of literature on this approach, I will not discuss it here. The survey
paper by Francis and Doyle [1987] and the expository book by Francis [1987]
are excellent sources for the frequency domain approach to the H∞ control

* Supported in part by NSF under grant no. ECS-9096109, AFOSR under contract no.
AFOSR-90-0053, and ARO under contract no. DAAL03-90-G-0008.


problem. These references also contain fairly exhaustive bibliographies. This
frequency-domain/operator-theoretic approach is currently a very active area
of research and a number of different avenues are being explored. See Doyle,
Glover, Khargonekar, and Francis [1989] for some of the more recent references
(especially those which have appeared since Francis [1987]) on this approach.
A major new deve10pment in H 00 control theory in the last two years has
been the introduction of state-space methods. This has led to a rather
transparent solution to the standard problem of H 00 control theory. This solution
is remarkably similar to the elassical LQG solution with appropriate differences
that reflect the differences in the H 00 and the LQG performance criteria. As a
result it is possible to give a simple interpretation to the so-called H 00 central
controller. Analogous to the LQG theory, the solutions are given in terms of
solutions to algebraic and differential matrix Riccati equations-a tool
introduced in system and control theory by KaIman. These new developments
are very elose in spirit and technique to Kalman's pioneering contributions to
the state-space optimal control and filtering problems.
In this contribution I will discuss some of these recent results and developments
in H∞ control theory, focusing exclusively on the state-space point of view.
This paper is not intended to be a complete survey of this area. Rather, it is my
intention to give a brief summary of some of the key recent results and discuss
some of the new insights obtained using the state-space approach.
For reasons of space and time, I have chosen to omit proofs here with one
exception. Recently, I have developed a new proof of the necessity part of the
main theorem on the solvability of the standard problem of H∞ control with
output feedback. There are two aspects of this proof that are particularly
appealing to me: (i) the proof is completely in the time-domain and (ii) it has very
interesting and natural connections to the classical linear-quadratic control
theory. (More specifically, there are some very concrete connections to certain
optimal control problems studied in Willems [1971].) This new derivation of
the necessary conditions can be combined with the proofs of sufficiency of these
conditions as given in Doyle, Glover, Khargonekar, and Francis [1989] or
Tadmor [1988] for an alternate derivation of the recent solution to the standard
problem of H∞ control theory. However, for the sake of completeness, a proof
of sufficiency is also included.

2 State-Space Approach to H∞ Control Theory

Consider the finite-dimensional linear time-invariant system Σ:

$$\frac{dx}{dt} = Fx + G_1w + G_2u, \quad z = H_1x + J_{11}w + J_{12}u, \quad y = H_2x + J_{21}w + J_{22}u \qquad (1)$$

3 The LQG Problem-State-Space H∞ Control Theory

Here x, w, u, z, y denote respectively the state, the exogenous input, the control
input, the regulated output, and the measured output. Kalman made pioneering
contributions to the problem of designing a controller for optimizing the
variance of z when w is a stochastic process. This classical solution, commonly
known as the LQG theory, has had a very significant influence on linear
multivariable control theory over the last 30 years.
The standard problem of H∞ control theory is: Given the linear system Σ
and a positive number γ, find a causal (dynamic) controller K such that the
closed loop system is well-posed, internally stable, the closed loop input-output
operator

$$T_{zw}: L_2 \to L_2 : w \mapsto z$$

is bounded, and the induced norm satisfies

$$\|T_{zw}\| := \sup\left\{\frac{\|z\|_2}{\|w\|_2} : \|w\|_2 \ne 0\right\} < \gamma$$
A controller K is called admissible iff it is causal, the closed loop system is
well-posed and internally stable. If K is also an FDLTI system

$$\frac{d\xi}{dt} = A\xi + By, \quad u = C\xi + Dy \qquad (2)$$

then the closed loop system is well-posed iff $(I - J_{22}D)$ is invertible, and the
closed loop system is internally stable iff the closed loop system matrix

$$\begin{bmatrix} F + G_2(I - DJ_{22})^{-1}DH_2 & G_2(I - DJ_{22})^{-1}C \\ B(I - J_{22}D)^{-1}H_2 & A + BJ_{22}(I - DJ_{22})^{-1}C \end{bmatrix}$$

has no eigenvalues in the closed right half complex plane. Moreover, in this
case, if we let $T_{zw}(s)$ denote the closed loop transfer function matrix, then

$$\|T_{zw}\| = \|T_{zw}\|_\infty := \sup\{\bar\sigma(T_{zw}(s)) : \mathrm{Re}(s) \ge 0\}$$
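The H∞ norm above can be computed numerically. A standard approach (not specific to this paper) tests, for a candidate level γ, whether an associated Hamiltonian matrix has eigenvalues on the imaginary axis, and bisects on γ. A minimal sketch in Python for a stable, strictly proper $G(s) = C(sI-A)^{-1}B$; the function name and tolerances are my own choices:

```python
import numpy as np
from scipy.linalg import eigvals

def hinf_norm(A, B, C, tol=1e-6):
    """H-infinity norm of a stable, strictly proper G(s) = C (sI - A)^{-1} B,
    by bisection on the imaginary-axis eigenvalue test for a Hamiltonian matrix."""
    def gamma_exceeds_norm(g):
        # gamma > ||G||_inf  iff  H(gamma) has no eigenvalues on the imaginary axis
        H = np.block([[A, (B @ B.T) / g],
                      [-(C.T @ C) / g, -A.T]])
        return not np.any(np.abs(eigvals(H).real) < 1e-9)

    n = A.shape[0]
    # lower bound from a coarse frequency sweep; grow the upper bound until the test passes
    ws = np.logspace(-3, 3, 60)
    lo = max(np.linalg.norm(C @ np.linalg.solve(1j * w * np.eye(n) - A, B), 2) for w in ws)
    hi = 2.0 * (lo + 1.0)
    while not gamma_exceeds_norm(hi):
        hi *= 2.0
    while hi - lo > tol * (1.0 + lo):
        mid = 0.5 * (lo + hi)
        if gamma_exceeds_norm(mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For $G(s) = 1/(s+1)$ the routine returns a value near 1, the peak gain at ω = 0.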
It is a well-known fact that under suitable assumptions, the LQG controller
is also the unique solution to the problem of minimizing the quadratic norm
of the closed loop transfer function matrix $T_{zw}$ over all internally stabilizing
controllers. Thus, the key difference between the LQG and the H∞ control
problems is in the choice of norm on transfer functions. This is intimately related
to the underlying assumptions on the exogenous signals as mentioned in the
introduction. Also, it turns out that the H∞ norm arises naturally in many
robust control problems.
In order to present the most important concepts clearly, I will make certain
simplifying assumptions. Most of these assumptions can be easily removed.

A.1 $J_{11} = J_{22} = 0$.
A.2 $J_{12}'[H_1 \;\; J_{12}] = [0 \;\; I]$.
A.3 $J_{21}[G_1' \;\; J_{21}'] = [0 \;\; I]$.
A.4 $(F, G_1, H_1)$ is controllable and observable.
A.5 $(F, G_2, H_2)$ is stabilizable and detectable.
A.6 The controller K is an FDLTI system.

Assumption A.1 implies that there is no direct transmission from the
exogenous input to the regulated output and from the control input to the measured
output. The latter assumption ensures that any proper rational matrix K(s)
leads to a well-posed feedback system. Assumption A.2 is quite common in the
LQG literature and amounts to assuming that there is no cross term in the
formula for $\|z\|^2$ and the penalty on the control input u is normalized, i.e.

$$\|z\|^2 = x'H_1'H_1x + u'u$$

Assumption A.3 is the dual of assumption A.2 and is analogous to the standard
assumption in the Kalman filtering problem that the process noise and the
measurement noise are uncorrelated and that the measurement noise is
nonsingular and normalized. Assumption A.4 is a technical assumption and
guarantees that certain solutions to certain algebraic Riccati equations are
nonsingular. Assumption A.5 is necessary and sufficient to guarantee the
existence of an internally stabilizing controller for the system Σ. The reader is
referred to Glover and Doyle [1988, 1989], Safonov and Limebeer [1988], and
Zhou and Khargonekar [1988] on techniques for removing the assumptions
A.1-A.5.
The assumption A.6 also causes no loss of generality as shown by
Khargonekar and Poolla [1986]. They showed that the infimum of the norm
of $T_{zw}$ over all causal internally stabilizing nonlinear controllers is no less than
that over all FDLTI internally stabilizing controllers. In fact a simple
time-domain proof of their result can be constructed using some of the techniques
introduced in this paper.

2.1 The State-Feedback H∞ Control Problem

We will first consider the state-feedback H∞ problem. In other words, in the
system Σ, let us assume that

$$H_2 = I, \quad J_{21} = 0, \qquad (3)$$

i.e., y = x. (It should be noted that the state-feedback problem does not satisfy the
assumption A.3 above.) The papers by Mageirou and Ho [1977] and Petersen
[1987] contain some of the first results on the state-feedback H∞ problem. (The
paper by Mageirou and Ho [1977] contains a result, in the context of robust
stabilization, which is essentially equivalent to the result of Petersen [1987].)
Under the assumption $J_{12} = 0$, Petersen [1987] considered the problem of

finding a real matrix L such that $F + G_2L$ is asymptotically stable and

$$\|H_1(sI - F - G_2L)^{-1}G_1\|_\infty < \gamma$$

He obtained a necessary and sufficient condition for the existence of L in terms
of an algebraic Riccati equation with an indefinite quadratic term. This was
quite an interesting result and led to a number of subsequent developments.
In view of the fact that frequency-domain H∞ control theory results usually
led to (apparently) high dimensional controllers, Khargonekar, Petersen, and
Rotea [1988] investigated whether static gains were actually H∞ optimal. This
problem was motivated by one of Kalman's key insights, that control is a
function of the state. Thus, we expected the following result to be true:

Theorem 2.1. Consider the system Σ and suppose (3) holds. Then

$$\gamma^* = \inf\{\|T_{zw}\|_\infty : K \text{ is an internally stabilizing dynamic state-feedback controller}\} = \inf\{\|T_{zw}\|_\infty : K \text{ is an internally stabilizing static state-feedback gain}\}$$

Theorem (2.1) was proved by Khargonekar, Petersen, and Rotea [1988]
under the condition that $J_{12} = 0$. It was generalized further by Khargonekar,
Petersen, and Zhou [1987], Zhou and Khargonekar [1988], Doyle, Glover,
Khargonekar, and Francis [1989]. Recently, Scherer [1990] has shown the
following

Theorem 2.2. Consider the system Σ and suppose (3) holds. Let γ* be as in Theorem
(2.1). Suppose there exists an admissible dynamic state-feedback controller K(s)
such that the norm of the closed loop transfer function satisfies

$$\|T_{zw}\|_\infty = \gamma^*$$

Then there also exists a stabilizing static feedback L such that the norm of the
closed loop transfer function satisfies

$$\|T_{zw}\|_\infty = \gamma^*$$
The following theorem shows how one can obtain state-feedback control
laws by solving certain algebraic Riccati equations. It is taken from Doyle,
Glover, Khargonekar, and Francis [1989]. Results analogous to Theorem (2.3)
were obtained earlier by Mageirou and Ho [1977], Petersen [1987],
Khargonekar, Petersen, and Zhou [1987], Zhou and Khargonekar [1988].

Theorem 2.3. Consider the system Σ. Suppose assumptions A.1, A.2, and (3) hold.
Then there exists an admissible controller K such that

$$\|T_{zw}\|_\infty < \gamma \qquad (4)$$

if and only if there exists a (unique) symmetric matrix P such that

$$F'P + PF + P\left(\frac{G_1G_1'}{\gamma^2} - G_2G_2'\right)P + H_1'H_1 = 0, \qquad (5)$$


$F + ((G_1G_1')/\gamma^2 - G_2G_2')P$ is asymptotically stable, and P > 0. In this case, the
control law

$$u = -G_2'Px \qquad (6)$$

internally stabilizes Σ and (4) holds.
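Equation (5) can be solved numerically even though its quadratic term is indefinite: the stabilizing P is recovered from the stable invariant subspace of the associated Hamiltonian matrix, just as for the standard LQR equation. A minimal sketch in Python (the function name is mine, not from the paper):

```python
import numpy as np
from scipy.linalg import schur

def hinf_state_feedback(F, G1, G2, H1, gamma):
    """Stabilizing solution P of the indefinite ARE (5) and the static gain of (6),
    from the stable invariant subspace of the associated Hamiltonian matrix."""
    n = F.shape[0]
    R = (G1 @ G1.T) / gamma**2 - G2 @ G2.T        # indefinite quadratic-term weight
    Ham = np.block([[F, R],
                    [-(H1.T @ H1), -F.T]])
    _, Z, sdim = schur(Ham, sort='lhp')            # order stable eigenvalues first
    assert sdim == n, "no n-dimensional stable invariant subspace (gamma too small?)"
    U1, U2 = Z[:n, :n], Z[n:, :n]
    P = U2 @ np.linalg.inv(U1)
    P = 0.5 * (P + P.T)                            # symmetrize against roundoff
    return P, -G2.T @ P                            # control law u = -G2' P x
```

As an illustrative check (scalar data of my own choosing): with F = 0, G1 = G2 = H1 = 1 and γ = 2, equation (5) reduces to $P^2(1/4 - 1) + 1 = 0$, so $P = 2/\sqrt{3} \approx 1.155$; letting γ → ∞ recovers the LQR solution P = 1, illustrating the limit discussed in the text.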

There are a number of interesting features to this result. Note that the
control law is obtained by solving an algebraic Riccati equation, which is
analogous to the classical results of Kalman on the linear-quadratic optimal
control problem. The algebraic Riccati equation (5) is similar to the
corresponding equation in the linear-quadratic optimal control problem except
that the quadratic term in (5) is indefinite. Indeed, (5) is identical to the ARE
that arises in linear quadratic optimal game problems. This is intuitively
appealing since in the H∞ control problem the inputs w and u act as opposing
players: the exogenous input w tries to maximize (the norm of) z while u is
designed to minimize it. This connection between linear-quadratic games and
H∞ control theory has been discussed and explored in a number of recent
papers. See, e.g. Khargonekar, Petersen, and Zhou [1987], Tadmor [1988,
1989], Limebeer, Anderson, Khargonekar, and Green [1989], Basar [1989a,
1989b], Uchida and Fujita [1989]. From the game theory perspective, the works
of Banker [1972], Mageirou [1976], and Mageirou and Ho [1977] are of special
interest with respect to the current developments in H∞ control theory.
Tadmor [1990, 1989] and Limebeer, Anderson, Khargonekar, and Green
[1989] have obtained analogues of Theorem (2.3) for linear time-varying systems.
Tadmor [1989] has also obtained results analogous to Theorem (2.3) for
infinite-dimensional systems.

Note that as γ goes to ∞, the controller (6) approaches the LQR solution.
In other words, as the H∞ norm constraint on the closed loop transfer function
is relaxed, the control law given by (6) approaches the corresponding LQR
control law.
It is interesting to note that the paper of Petersen [1987] was for the singular
case, i.e. when the rank of $J_{12}$ is less than the dimension of u. Indeed, in that
paper $J_{12} = 0$. The results of Khargonekar, Petersen, and Zhou [1987], Zhou
and Khargonekar [1988] also apply to the general singular case ($J_{12}$ is not
necessarily full rank). However, in these papers the singular case is taken care
of by introducing certain small perturbations to make the problem nonsingular.
Since we are dealing with a strict inequality in (4), the existence of an appropriate
small perturbation is not difficult to establish. In a different and more appealing
approach, Stoorvogel and Trentelman [1988] have obtained interesting results
on the singular case in terms of quadratic matrix inequalities. Their approach
is also related to the almost disturbance decoupling problem.
Theorem (2.3) naturally leads to the H∞ analog of the inverse problem of
optimal control formulated by Kalman [1964]. This inverse problem of H∞
control theory was considered by Fujii and Khargonekar [1988] who showed
that state-feedback H∞ controllers inherit the nice robustness properties
(Kalman [1964]) of the LQR controllers and in a certain sense are even more
robust.

2.2 The H∞ Filtering Problem

In the LQG theory, state-feedback linear-quadratic optimal control and Kalman
filtering problems turn out to have dual structures. The situation is somewhat
similar in the H∞ case. Filtering problems were first considered by Doyle,
Glover, Khargonekar, and Francis [1989] under the assumptions that (i) the
process is stable, and (ii) the initial state is known. A steady-state filtering
problem was also considered by Bernstein and Haddad [1989b] under the same
assumptions. Recently, Nagpal and Khargonekar [1989] have developed a fairly
complete H∞ estimation theory. Here I will give a brief summary of some of
their results.
Consider the FDLTI system

$$\frac{dx}{dt} = Fx + G_1w, \quad z = H_1x, \quad y = H_2x + J_{21}w \qquad (7)$$

In this section, we will assume that $(F, G_1)$ is reachable, $(F, H_2)$ is detectable,
and A.3 holds.
The problem is to estimate the output z using the measurements y. Nagpal
and Khargonekar [1989] have considered both the filtering (the estimator is
required to be causal) and the smoothing (the estimator is allowed to be noncausal)
problems. So, let $\mathcal{F}$ be a causal estimator and let

$$\hat z = \mathcal{F}(y)$$

The estimator $\mathcal{F}$ is called unbiased if

$$y(\tau) = 0 \;\; \forall\, \tau \le t \;\Rightarrow\; \hat z(\tau) = 0 \;\; \forall\, \tau \le t$$
Let us now define an H∞, i.e., a worst case, performance measure. There are
two cases to consider: (i) the initial state is known (and, without loss of
generality, $= 0$), and (ii) the initial state is unknown. In the latter case, it is
assumed that the best a priori estimate of x(0) is zero. In other words, the
estimator initial state is taken to be zero. Let $0 \le T \le \infty$. Define

$$J_1(\mathcal{F}, T) := \sup\left\{\frac{\|z - \hat z\|_2}{\|w\|_2} : w \in L_2[0, T],\; 0 \ne \|w\|_2,\; x(0) = 0\right\}, \qquad (8)$$

and

$$J_2(\mathcal{F}, R, T) := \sup\left\{\frac{\|z - \hat z\|_2}{[\|w\|_2^2 + x_0'Rx_0]^{1/2}} : w \in L_2[0, T],\; x(0) = x_0,\; \|w\|_2^2 + x_0'Rx_0 \ne 0\right\} \qquad (9)$$


In the definition of $J_2$, R is taken to be positive definite. The key difference
between $J_1$ and $J_2$ is that $J_2$ measures the worst case performance over all
possible w and x(0). The expression $[\|w\|_2^2 + x_0'Rx_0]^{1/2}$ is a total measure of
energy in the exogenous variables w, x(0).
The following results are taken from Nagpal and Khargonekar [1989].

Theorem 2.4. Consider the FDLTI system (7). Let γ > 0 be given and let T = ∞.
Then there exists an unbiased linear filter such that

$$J_2(\mathcal{F}, R, \infty) < \gamma \qquad (10)$$

if and only if there exists a bounded symmetric matrix function Q(t), $t \in [0, \infty)$,
such that

$$\dot Q(t) = FQ(t) + Q(t)F' - Q(t)H_2'H_2Q(t) + \frac{Q(t)H_1'H_1Q(t)}{\gamma^2} + G_1G_1', \quad Q(0) = R^{-1}, \qquad (11)$$

and the system

$$\dot e(t) = \left[F - Q(t)\left(H_2'H_2 - \frac{H_1'H_1}{\gamma^2}\right)\right]e(t)$$

is exponentially stable.
Moreover, if these conditions hold then one filter that satisfies the performance
bound (10) is given by

$$\dot{\hat x}(t) = F\hat x(t) + Q(t)H_2'[y(t) - H_2\hat x(t)], \quad \hat x(0) = 0, \qquad (12)$$

$$\hat z(t) = H_1\hat x(t) \qquad (13)$$

Theorem 2.5. Consider the FDLTI system (7). Let x(0) = 0. Let γ > 0 be given
and let T = ∞. Then there exists an unbiased linear filter such that

$$J_1(\mathcal{F}, \infty) < \gamma \qquad (14)$$

if and only if there exists a real symmetric matrix Q such that

$$FQ + QF' - QH_2'H_2Q + \frac{QH_1'H_1Q}{\gamma^2} + G_1G_1' = 0, \qquad (15)$$

the matrix $F - Q(H_2'H_2 - H_1'H_1/\gamma^2)$ has no eigenvalues in the closed right
half plane, and Q > 0.
Moreover, if these conditions hold then with Q(t) replaced by Q, the filter given
by (12) and (13) satisfies the performance bound (14).
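Numerically, the algebraic Riccati equation (15) can be handled by an invariant-subspace computation on a Hamiltonian matrix built from the transposed data. A minimal Python sketch (the function name and the scalar test values are mine, not from the paper):

```python
import numpy as np
from scipy.linalg import schur

def hinf_filter(F, G1, H1, H2, gamma):
    """Stabilizing solution Q of the filtering ARE (15) and the observer gain Q H2'
    of the filter (12), from the stable invariant subspace of a dual Hamiltonian."""
    n = F.shape[0]
    R = (H1.T @ H1) / gamma**2 - H2.T @ H2
    Ham = np.block([[F.T, R],
                    [-(G1 @ G1.T), -F]])
    _, Z, sdim = schur(Ham, sort='lhp')     # order stable eigenvalues first
    assert sdim == n, "no stabilizing solution (gamma too small?)"
    Q = Z[n:, :n] @ np.linalg.inv(Z[:n, :n])
    Q = 0.5 * (Q + Q.T)                      # symmetrize against roundoff
    return Q, Q @ H2.T                       # filter: dxh/dt = F xh + (Q H2')(y - H2 xh)
```

For the scalar data F = -1, G1 = H1 = H2 = 1, γ = 2 (illustrative only), equation (15) reduces to $0.75Q^2 + 2Q - 1 = 0$, whose positive root is $Q = (\sqrt{7}-2)/1.5 \approx 0.4305$.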
These results are the H∞ analog of the well-known Kalman filtering results.
The key differences are in the Riccati differential equation (11) and the algebraic
Riccati equation (15), which are very similar to the covariance equations for the
Kalman filtering problem with the exception of the term $QH_1'H_1Q/\gamma^2$. Thus, in
H∞ filtering, the states to be estimated influence the filter itself, unlike in Kalman
filtering where the optimal estimate of any state-functional is obtained from the
optimal state-estimator. Note that as γ → ∞, the H∞ filters approach the
standard Kalman filters.
There is another qualitative interpretation of the term $QH_1'H_1Q/\gamma^2$. It can
be regarded as additional process noise. As a result, the H∞ filter has robustness
(in the sense of satisfying (10) or (14)) to variation in the spectra of the exogenous
signals, at the expense of performance if the exogenous input w is actually a
zero mean Gaussian white stochastic process.
The paper by Nagpal and Khargonekar [1989] contains similar results on
H∞ filtering and smoothing for infinite as well as finite horizon cases, for linear
time-invariant as well as linear time-varying systems.

2.3 The Output-Feedback H∞ Control Problem

Let us now consider the output feedback problem. A key result from
linear-quadratic-Gaussian control theory is that in the output feedback case, the
optimal controller is the Kalman filter for the optimal state-feedback law. The
above mentioned results on the state-feedback H∞ control problem indicate
that something similar may be true in the output feedback H∞ problem. The
following result is taken from Glover and Doyle [1988], Doyle, Glover,
Khargonekar, and Francis [1989].
Theorem 2.6. Consider the linear system Σ. Suppose assumptions A.1-A.6 hold.
Then there exists an admissible output feedback controller K such that with the
control law u = Ky, the closed loop transfer function satisfies

$$\|T_{zw}\|_\infty < \gamma \qquad (16)$$

if and only if the following conditions hold:

1. there exists a (unique) real symmetric matrix P such that

$$F'P + PF + P\left(\frac{G_1G_1'}{\gamma^2} - G_2G_2'\right)P + H_1'H_1 = 0,$$

$F + ((G_1G_1')/\gamma^2 - G_2G_2')P$ is asymptotically stable, and P > 0;

2. there exists a (unique) real symmetric matrix Q such that

$$FQ + QF' + Q\left(\frac{H_1'H_1}{\gamma^2} - H_2'H_2\right)Q + G_1G_1' = 0, \qquad (17)$$

$F + Q((H_1'H_1)/\gamma^2 - H_2'H_2)$ is asymptotically stable, and Q > 0; and

3. $Q^{-1} > P/\gamma^2$.

If these conditions hold then one controller that satisfies the inequality (16) is
given by

$$\frac{d\hat x}{dt} = \left[F + \frac{G_1G_1'P}{\gamma^2}\right]\hat x + ZQH_2'(y - H_2\hat x) + G_2u, \quad u = -G_2'P\hat x, \qquad (18)$$

where

$$Z = \left(I - \frac{QP}{\gamma^2}\right)^{-1}$$

The notation in the above theorem is meant to be suggestive. Note that the
above controller (18) has many similarities to (and some differences from) the
classical LQG controller. It is called the H∞ central controller and has many
interesting interpretations and properties.
As γ goes to ∞, this controller approaches the LQG controller.
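Assembling the central controller (18) then amounts to solving the two algebraic Riccati equations of conditions 1 and 2, checking the coupling condition 3, and forming Z. The following Python sketch does this via stable invariant subspaces of the associated Hamiltonian matrices; it assumes A.1-A.3, and the function names and scalar test data are my own:

```python
import numpy as np
from scipy.linalg import schur

def stabilizing_are(A, R, Q0):
    """Stabilizing solution X of A'X + XA + XRX + Q0 = 0 (R may be indefinite)."""
    n = A.shape[0]
    Ham = np.block([[A, R], [-Q0, -A.T]])
    _, Z, sdim = schur(Ham, sort='lhp')
    assert sdim == n, "no stabilizing solution"
    X = Z[n:, :n] @ np.linalg.inv(Z[:n, :n])
    return 0.5 * (X + X.T)

def hinf_central_controller(F, G1, G2, H1, H2, gamma):
    """State-space matrices (A_K, B_K, C_K) of the central controller (18)."""
    g2 = gamma**2
    n = F.shape[0]
    P = stabilizing_are(F, (G1 @ G1.T) / g2 - G2 @ G2.T, H1.T @ H1)    # condition 1
    Q = stabilizing_are(F.T, (H1.T @ H1) / g2 - H2.T @ H2, G1 @ G1.T)  # condition 2
    # condition 3: Q^{-1} - P/gamma^2 > 0
    assert np.all(np.linalg.eigvalsh(np.linalg.inv(Q) - P / g2) > 0), "coupling condition fails"
    Zc = np.linalg.inv(np.eye(n) - Q @ P / g2)     # Z = (I - QP/gamma^2)^{-1}
    AK = F + (G1 @ G1.T @ P) / g2 - G2 @ (G2.T @ P) - Zc @ Q @ H2.T @ H2
    BK = Zc @ Q @ H2.T
    CK = -G2.T @ P
    return AK, BK, CK
```

On a scalar example with F = -1 and all other data equal to 1 at γ = 10, both Riccati solutions equal $(\sqrt{7.96}-2)/1.98 \approx 0.4148$ and the resulting controller is itself stable.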
It is shown in Doyle, Glover, Khargonekar, and Francis [1989] that the
above controller is an H∞ estimator for the state-feedback control law (6) in
the presence of the disturbance $w = (G_1'P/\gamma^2)x$. More specifically, let

$$r := w - \frac{G_1'P}{\gamma^2}x$$

Then the equations for the system Σ become

$$\frac{dx}{dt} = \left(F + \frac{G_1G_1'P}{\gamma^2}\right)x + G_1r + G_2u, \quad y = H_2x + J_{21}r \qquad (19)$$

Now if we consider the problem of estimating the state-feedback law (6)

$$u = -G_2'Px$$

and apply Theorem (2.5) to the above system, then the resulting filter is precisely
the output feedback controller in Theorem (2.6).
The above result is also closely connected to the separation principle in the
risk sensitive control problem obtained by Whittle [1986]. See Glover and Doyle
[1988], Doyle, Glover, Khargonekar, and Francis [1989] for further details
along these lines.
Recently, Stoorvogel [1989] has obtained interesting extensions of Theorem
(2.6) to the singular case, i.e. the case when $J_{12}$ and $J_{21}$ are not required to
have full rank. As mentioned previously, Tadmor [1990, 1989], Limebeer,
Anderson, Khargonekar, and Green [1989] have extended these results to linear
time-varying and infinite-dimensional systems. Sampei, Mita, and Nakamichi
[1990] have extended the approach taken in Zhou and Khargonekar [1988]
to the output feedback case.
Proof of Theorem (2.6). Without loss of generality, set γ = 1.
(Necessity) Suppose there exists an admissible FDLTI controller K such
that the norm of the closed loop transfer function matrix satisfies $\|T_{zw}\|_\infty < 1$. It follows
that for all $w \in L_2[-T_1, T_2]$ with $x(-T_1) = 0$ we have

$$\int_{-T_1}^{T_2} (\|w(t)\|^2 - \|z(t)\|^2)\,dt \ge 0 \qquad (20)$$

(Here $T_1$, $T_2$ can be ∞.) We will break up this proof into three parts.
Part 1: Necessity of Condition 2
Following the key idea from Nagpal and Khargonekar [1989], we will first
construct an exogenous signal w such that y ≡ 0. Let $J_{21}^{\perp}$ be such that the matrix

$$\begin{bmatrix} J_{21} \\ J_{21}^{\perp} \end{bmatrix}$$

is a square orthogonal matrix. Now set

$$w := J_{21}'v_2 + J_{21}^{\perp\prime}v_1 \qquad (21)$$

Then it is easy to verify that

$$\|w\|^2 = \|v_1\|^2 + \|v_2\|^2, \quad G_1w = G_1J_{21}^{\perp\prime}v_1, \quad J_{21}w = v_2 \qquad (22)$$

For any $v_1$ and $v_2$, let w be given by (21). Then the equations for the system Σ
become

$$\frac{dx}{dt} = Fx(t) + G_1J_{21}^{\perp\prime}v_1(t) + G_2u(t), \quad z = H_1x(t) + J_{12}u(t), \quad y = H_2x(t) + v_2(t)$$

If $v_2$ is chosen such that

$$v_2(t) = -H_2x(t), \qquad (23)$$

then y ≡ 0.
Now let T > 0 be arbitrary. For any $v_1 \in L_2[-T, 0]$ and $v_2$ and w as above,
it is easy to verify that $w \in L_2[-T, 0]$. Since K is a causal linear controller and
y ≡ 0, it follows that u ≡ 0. Now the equations for the system Σ become

$$\frac{dx}{dt} = Fx + G_1J_{21}^{\perp\prime}v_1, \quad z = H_1x \qquad (24)$$

Now if we set $v_1$, $v_2$ and w as above, and let x(−T) = 0, then

$$\int_{-T}^{0} (\|w(t)\|^2 - \|z(t)\|^2)\,dt = \int_{-T}^{0} (\|v_1(t)\|^2 + \|H_2x(t)\|^2 - \|H_1x(t)\|^2)\,dt \qquad (25)$$


Define the cost functional

$$J(v_1, T) := \int_{-T}^{0} (\|v_1(t)\|^2 + \|H_2x(t)\|^2 - \|H_1x(t)\|^2)\,dt \qquad (26)$$

We can now apply the classical results on linear-quadratic optimal control
to the system (24) and the cost functional J (as in, for example, Willems [1971]).
Indeed, consider the optimization problem

$$J_-(x_0) := \inf\{J(v_1, T) : x(-T) = 0,\; x(0) = x_0,\; T > 0\} \qquad (27)$$

Note that in $J_-$, we take the infimum over all T > 0 also. By the reachability of
$(F, G_1J_{21}^{\perp\prime})$, it follows that $J_-(x_0)$ is finite and from (25) that it is non-negative.

It follows from Willems [1971, Theorem 7] that there exists a real symmetric
matrix X such that

$$F'X + XF - XG_1G_1'X + H_2'H_2 - H_1'H_1 = 0 \qquad (28)$$

and the eigenvalues of $(F - G_1G_1'X)$ are in the closed right half plane. [Here we
have used the fact that $G_1G_1' = G_1J_{21}^{\perp\prime}J_{21}^{\perp}G_1'$.] Moreover,

$$0 \le J_-(x_0) = -x_0'Xx_0$$

Thus, $X \le 0$.
Next we show that X < 0 and all eigenvalues of $F - G_1G_1'X$ are in the open
right half plane. Since $\|T_{zw}\|_\infty < 1$, there exists a (sufficiently small) δ > 0 such
that if $\tilde z := (z' \;\; \delta x')'$, then $\|T_{\tilde zw}\|_\infty < 1$. Now arguing as above, we can conclude
that there exists a (unique) real symmetric matrix $\tilde X$ such that

$$F'\tilde X + \tilde XF - \tilde XG_1G_1'\tilde X + H_2'H_2 - H_1'H_1 - \delta^2 I = 0 \qquad (29)$$

and the eigenvalues of $F - G_1G_1'\tilde X$ are in the closed right half plane. Using the
monotonicity properties of solutions to algebraic Riccati equations, it follows
that $X \le \tilde X \le 0$. In fact strict inequality holds here. To see this, suppose v is
such that $v'(\tilde X - X)v = 0$. It follows that $\tilde Xv = Xv$. Then multiplying (29) and
(28) by v' on the left and v on the right, and subtracting, we get $\delta^2\|v\|^2 = 0$ and
so v = 0. Therefore,

$$X < \tilde X \le 0$$
Since $\|T_{\tilde zw}\|_\infty < 1$, we can repeat the above argument for $\tilde X$ and therefore
$\tilde X < 0$. Let Y and $\tilde Y$ denote the unique stabilizing solutions to the algebraic
Riccati equations (28) and (29) respectively. Then using the maximality and
minimality properties of stabilizing and antistabilizing solutions of algebraic
Riccati equations, we have

$$Y \ge \tilde Y \ge \tilde X > X$$

Thus we are in the strict inequality case of Theorem 5 of Willems [1971] and
all eigenvalues of $F - G_1G_1'X$ are in the open right half plane.


Setting $Q := -X^{-1}$, we can conclude that (15) holds. Furthermore,

$$F + Q(H_1'H_1 - H_2'H_2) = -X^{-1}(F - G_1G_1'X)'X$$

is asymptotically stable, and Q > 0. Thus condition 2 in Theorem (2.6) holds.


Part 2: Necessity of Condition 1
By taking transposes and using part 1 above, it follows that condition 1 in
Theorem (2.6) holds. Before proceeding to the last part of the proof, we will
first derive an easy consequence of this condition.
Consider the linear system Σ with $x(0) = x_0$. By differentiating x'Px along
the trajectories of Σ, and completing squares, we get

$$\frac{d(x'Px)}{dt} = \|u + G_2'Px\|^2 - \|w - G_1'Px\|^2 + \|w\|^2 - \|z\|^2 \qquad (30)$$
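The completion of squares behind (30) uses only the Riccati equation (5) with γ = 1 and assumptions A.1-A.2; writing out the intermediate step (my expansion, not displayed in the original):

```latex
\begin{aligned}
\frac{d}{dt}(x'Px) &= x'(F'P+PF)x + 2x'PG_1w + 2x'PG_2u\\
  &= -x'PG_1G_1'Px + x'PG_2G_2'Px - x'H_1'H_1x + 2x'PG_1w + 2x'PG_2u\\
  &= \|u+G_2'Px\|^2 - \|w-G_1'Px\|^2 + \|w\|^2 - \|z\|^2,
\end{aligned}
```

where the second line substitutes $F'P + PF = -P(G_1G_1' - G_2G_2')P - H_1'H_1$ from (5), and the last line uses $\|z\|^2 = x'H_1'H_1x + u'u$.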

Now set $\bar w(t) := G_1'Pe^{(F+(G_1G_1'-G_2G_2')P)t}x_0$ for t ≥ 0. Using the fact that
$F + (G_1G_1' - G_2G_2')P$ is asymptotically stable, it follows that $\bar w$ belongs to
$L_2[0, \infty)$. Now for any internally stabilizing controller K, we have

$$\int_0^\infty (\|\bar w(t)\|^2 - \|z(t)\|^2)\,dt \le \sup_u \int_0^\infty (\|\bar w(t)\|^2 - \|z(t)\|^2)\,dt$$

The supremum on the right hand side is taken over all $u \in L_2[0, \infty)$ for which
the resulting state trajectory of Σ is such that $x \in L_2[0, \infty)$. Since $\bar w$ is fixed, this
is a linear-quadratic optimal control problem. Now it is not difficult to verify
that $\bar u(t) = -G_2'Pe^{(F+(G_1G_1'-G_2G_2')P)t}x_0$ is the (unique) solution to this
linear-quadratic optimal control problem. (This can be done in many different ways.
For example, one can check that $\bar w$, $\bar u$ and the corresponding state trajectory
$\bar x(t) = e^{(F+(G_1G_1'-G_2G_2')P)t}x_0$ solve the associated two point boundary value problem.)
Moreover, with $\bar w$, $\bar u$, $\bar x$ as above, it follows that $\bar u(t) = -G_2'P\bar x(t)$ and
$\bar w(t) = G_1'P\bar x(t)$. Now integrating (30) from 0 to ∞, we get

$$\sup_u \int_0^\infty (\|\bar w(t)\|^2 - \|z(t)\|^2)\,dt = -x_0'Px_0$$

(This is why $w = G_1'Px$ is the 'worst disturbance'.)
Hence, for any stabilizing controller K with w set as above, we get

$$\int_0^\infty (\|\bar w(t)\|^2 - \|z(t)\|^2)\,dt \le -x_0'Px_0 \qquad (31)$$

Part 3: Necessity of Condition 3
This will be accomplished by using a particular choice for w and examining
the behavior of the closed loop system. The idea is to use the worst disturbance
w for t < 0 from part 1 to set up a desired state $x(0) = x_0$, keeping y ≡ 0 for
t < 0, and then to use the worst disturbance w for t ≥ 0 from part 2 with $x(0) = x_0$.
Let $x_0 \ne 0$ be fixed. Fix α > 0. Select $v_1(t) \in L_2[-T, 0]$ and T > 0 (sufficiently
large) such that with $x(-T) = 0$, we have $x(0) = x_0$, and

$$J(v_1, T) = x_0'Q^{-1}x_0 + \alpha\|x_0\|^2$$


Now set w(t) for t ≤ 0 by (21), where $v_2$ is given by (23). Then, as shown
in part (1) of the proof,

$$\int_{-\infty}^{0} (\|w(t)\|^2 - \|z(t)\|^2)\,dt = J(v_1, T) = x_0'[Q^{-1} + \alpha I]x_0 \qquad (32)$$

From part 2 above, set $w(t) := G_1'Pe^{(F+(G_1G_1'-G_2G_2')P)t}x_0$, t ≥ 0. For any admissible
controller K, we get from (31) above

$$\int_0^\infty (\|w(t)\|^2 - \|z(t)\|^2)\,dt \le -x_0'Px_0 \qquad (33)$$

Adding (32) and (33), and using (20), we get

$$x_0'[Q^{-1} + \alpha I]x_0 - x_0'Px_0 \ge 0$$

This holds for all α > 0. Therefore

$$Q^{-1} - P \ge 0$$

To prove the strict inequality, choose δ > 0 as in part (1) of the proof. Applying
the above argument to $T_{\tilde zw}$, and with the obvious notation, it follows that

$$\tilde Q^{-1} \ge \tilde P$$

Thus the following chain of inequalities holds:

$$Q^{-1} > \tilde Q^{-1} \ge \tilde P > P$$

This completes the proof of necessity of conditions 1, 2, and 3.


(Sufficiency) We will only give a sketch of the key steps of the proof since
this involves fairly routine calculations. Let the controller be given by (18). Let
$e := x - \hat x$. Then the closed loop system equations can be written as follows:

$$\frac{d\psi}{dt} = \Phi\psi + \Gamma w, \quad z = \Theta\psi, \qquad (34)$$

where $\psi := (x'\; e')'$, and

$$\Phi := \begin{bmatrix} F - G_2G_2'P & G_2G_2'P \\ -G_1G_1'P & F + G_1G_1'P - ZQH_2'H_2 \end{bmatrix}, \quad \Gamma := \begin{bmatrix} G_1 \\ G_1 - ZQH_2'J_{21} \end{bmatrix},$$

$$\Theta := [H_1 - J_{12}G_2'P \quad\;\; J_{12}G_2'P]$$

Using the stability of $(F + G_1G_1'P - G_2G_2'P)$ and the controllability of $(F, G_1)$, it is easy
to show that $(\Phi, \Gamma)$ is stabilizable. A routine calculation using conditions 1, 2,
and 3 verifies that

$$\Pi\Phi' + \Phi\Pi + \Gamma\Gamma' + \Pi\Theta'\Theta\Pi = 0, \qquad (35)$$


where

$$\Pi = \begin{bmatrix} P^{-1} & 0 \\ 0 & (Q^{-1} - P)^{-1} \end{bmatrix}$$

It follows that Φ has all eigenvalues in the left half plane. Thus, from (35) one can
conclude that the closed loop system is internally stable. Again using conditions
1, 2, and 3, it is not hard to show that $(\Phi + \Pi\Theta'\Theta) = -\Pi[\Phi + \Gamma\Gamma'\Pi^{-1}]'\Pi^{-1}$
has no eigenvalues on the imaginary axis. Now from (35) and Lemma 4 of
Doyle, Glover, Khargonekar, and Francis [1989], it follows that the closed loop
system satisfies $\|T_{zw}\|_\infty < 1$.
This completes the proof. □

2.4 Multiple Objective Problems

Control system design is most often a study of tradeoffs among competing
objectives. For example, it is often necessary to trade performance for robustness.
In a single objective optimal control problem such as the LQG or the H∞
problem, the competing objectives are combined into a single objective function,
such as the H₂ or the H∞ norm of a closed loop transfer function matrix, which
is then minimized. This procedure often requires adjustments in weighting
functions to reflect the specified performance and robustness requirements. For
this and many other reasons, it is important to study optimal control problems
with multiple objectives. Such multiple objective control problems have been
investigated by many authors. Here I will discuss a certain multiple objective
problem that arises in an attempt to combine the LQG and the H∞ control
theories.
Consider the FDLTI system

$$\frac{dx}{dt} = Fx + G_{11}w_1 + G_{12}w_2 + G_2u, \quad z_1 = H_{11}x + J_{112}u, \quad z_2 = H_{12}x + J_{122}u, \quad y = H_2x + J_{21}w + J_{22}u \qquad (36)$$

where $w_1$ and $w_2$ are two exogenous (vector) inputs and $z_1$ and $z_2$ are two
regulated (vector) outputs. The multiple objective optimal control problem of
interest is defined as follows:

Optimal LQG Performance Subject to an H∞ Constraint. Find an admissible
controller that minimizes $\|T_{z_1w_1}\|_2$ subject to the constraint $\|T_{z_2w_2}\|_\infty < \gamma$.

This problem arises in many situations. For example, it represents the problem
of designing a controller to optimize the nominal performance subject to a

constraint of robust stability. This problem was first considered by Bernstein
and Haddad [1989a]. However, since it is a very difficult problem, they replaced
the objective function $\|T_{z_1w_1}\|_2$ by an upper bound for it. Under the assumption
that $w_1 = w_2$, they gave some necessary conditions for the solvability of this
problem using controllers of a given order. The solution was given in terms of
3 coupled algebraic Riccati equations. Mustafa and Glover [1988] considered
the problem of minimizing the entropy of the closed loop transfer matrix $T_{z_2w_2}$
subject to the constraint $\|T_{z_2w_2}\|_\infty < 1$. They showed that the central controller
given in Theorem (2.6) is the solution to this entropy minimization problem.
Their work also showed that the entropy of the closed loop transfer function
is closely related to the upper bound for $\|T_{z_2w_2}\|_2$ used by Bernstein and Haddad
[1989a]. Doyle, Zhou, and Bodenheimer [1989] considered a related problem
and gave necessary conditions and sufficient conditions for the solvability of
their problem. Rotea and Khargonekar [1989] have also approached this
problem by considering a related problem. They have considered the
state-feedback case, i.e. y = x. Under this assumption they considered the
following so-called simultaneous H₂ and H∞ problem: Among all admissible
controllers that minimize $\|T_{z_1w_1}\|_2$, find a controller that also satisfies
$\|T_{z_2w_2}\|_\infty < \gamma$. Clearly, if this modified problem admits a solution, it also solves
the problem of optimal LQG performance subject to an H∞ constraint.
Rotea and Khargonekar [1989] have given a complete solution to the
simultaneous H₂ and H∞ problem. At this time it appears that the problem of
optimal LQG performance subject to an H∞ constraint remains a difficult open
problem.

3 Concluding Remarks

Many new insights into the H∞ control problem have been obtained by taking a state-space approach. Perhaps the most important is the insight into the structure of the solution. A major motivation for considering the H∞ norm comes from robust control problems. While the recent developments in H∞ control theory represent very significant progress in robust control, the problem of robust performance remains a challenging open problem.
Acknowledgements

I would like to thank K.M. Nagpal and M.A. Rotea for many helpful
conversations.
References

1. M. Banker [1972]. Linear stationary quadratic games, Proceedings of IEEE Conference on Decision and Control, pp 193-197
2. T. Basar [1989a]. Disturbance attenuation in LTI plants with finite horizon: Optimality of nonlinear controllers, Systems and Control Letters, Vol 13, pp 183-192

3 The LQG Problem-State-Space H∞ Control Theory

3. T. Basar [1989b]. A differential games approach to controller design: disturbance rejection, model matching, and tracking, Proc 1989 Conference on Decision and Control, pp 407-414
4. D.S. Bernstein and W.M. Haddad [1989a]. LQG control with an H∞ performance bound: A Riccati equation approach, IEEE Trans on Automat Contr, Vol AC-34, No 3, pp 293-305
5. D.S. Bernstein and W.M. Haddad [1989b]. Steady state Kalman filtering with an H∞ performance bound, Systems and Control Letters, Vol 12, pp 9-16
6. J.C. Doyle, K. Glover, P.P. Khargonekar, and B.A. Francis [1989]. State-space solutions to standard H2 and H∞ control problems, IEEE Trans on Automat Contr, Vol AC-34, No 8, pp 831-847
7. J.C. Doyle, K. Zhou, and B. Bodenheimer [1989]. Optimal control with mixed H2 and H∞ performance objectives, Proc of the 1989 American Control Conference, Pittsburgh, Pennsylvania, pp 2065-2070
8. B.A. Francis [1987]. A Course in H∞ Control Theory, Lecture Notes in Control and Information Sciences, No 88, Springer-Verlag, New York
9. B.A. Francis and J.C. Doyle [1987]. Linear control theory with an H∞ performance criterion, SIAM J Control and Optimization, pp 815-844
10. T. Fujii and P.P. Khargonekar [1988]. Inverse problems in H∞ control theory and linear-quadratic differential games, Proc 1988 Conference on Decision and Control, pp 26-31
11. K. Glover and J.C. Doyle [1988]. State-space formulae for all stabilizing controllers that satisfy an H∞ norm bound and relations to risk sensitivity, Syst Contr Lett, Vol 11, No 3, pp 167-172
12. K. Glover and J.C. Doyle [1989]. A state-space approach to H∞ optimal control, in Three Decades of Mathematical System Theory, eds. H. Nijmeijer and J.M. Schumacher, Springer-Verlag, Berlin, pp 179-218
13. R.E. Kalman [1960]. Contributions to the theory of optimal control, Bol Soc Mat Mexicana, Vol 5, pp 102-119
14. R.E. Kalman [1964]. When is a linear control system optimal?, Trans ASME J Basic Engr, Vol 86D, pp 344-347
15. R.E. Kalman and R.S. Bucy [1961]. New results in linear filtering and prediction theory, Trans ASME, Series D, Journal of Basic Engineering, pp 95-108
16. P.P. Khargonekar, I.R. Petersen, and M.A. Rotea [1988]. H∞-optimal control with state-feedback, IEEE Trans on Automat Control, Vol AC-33, No 8, pp 786-788
17. P.P. Khargonekar, I.R. Petersen, and K. Zhou [1987]. "Robust stabilization and H∞ optimal control", Technical Report 87-KPZ, Department of Electrical Engineering, University of Minnesota, Minneapolis, Minnesota. An abridged version in IEEE Transactions on Automatic Control, Vol 34, No 3, pp 356-361, 1990
18. P.P. Khargonekar and K. Poolla [1986]. Uniformly optimal control of linear time-invariant plants: nonlinear time-varying controllers, Systems and Control Letters, Vol 6, pp 303-308
19. D.J.N. Limebeer, B.D.O. Anderson, P.P. Khargonekar, and M. Green [1989]. A game theoretic approach to H∞ control for time varying systems, submitted for publication
20. E.F. Mageirou [1976]. Values and strategies in infinite duration linear quadratic games, IEEE Transactions on Automatic Control, Vol AC-21, pp 547-550
21. E.F. Mageirou and Y.C. Ho [1977]. Decentralized stabilization via game theoretic methods, Automatica, Vol 13, pp 393-399
22. D. Mustafa and K. Glover [1988]. Controllers which satisfy an H∞-norm bound and maximize an entropy integral, Proc 1988 Conference on Decision and Control, pp 959-964
23. K. Nagpal and P.P. Khargonekar [1989]. Filtering and smoothing in an H∞ setting, Control Group Report No. CGR-21, College of Engineering, The University of Michigan. A conference version is in Proc 28th Conference on Decision and Control, pp 415-420, 1989. To appear in IEEE Transactions on Automatic Control, 1991
24. I.R. Petersen [1987]. Disturbance attenuation and H∞ optimization: a design method based on the algebraic Riccati equation, IEEE Transactions on Automatic Control, Vol AC-32, pp 427-429
25. M.A. Rotea and P.P. Khargonekar [1989]. H2 optimal control with an H∞ constraint: the state-feedback case, Control Group Report No. CGR-22, College of Engineering, The University of Michigan. Proc 1990 American Control Conference, pp 2380-2384. To appear in Automatica
26. M.G. Safonov and D.J.N. Limebeer [1988]. Simplifying the H∞ theory via loop shifting, Proc 27th IEEE CDC, pp 1399-1404
27. M. Sampei, T. Mita, M. Nakamichi [1990]. An algebraic approach to H∞ output feedback control problems, Systems and Control Letters, Vol 14, pp 13-24
28. C. Scherer [1990]. H∞ control by state-feedback and fast algorithms for the computation of optimal H∞ norms, IEEE Transactions on Automatic Control, Vol 35, No 10, pp 1090-1099


29. A.A. Stoorvogel [1989]. The singular H∞ control problem with measurement feedback, to appear in SIAM J Control and Optimization
30. A.A. Stoorvogel and H. Trentelman [1988]. The quadratic matrix inequality in singular H∞ control with state feedback, to appear in SIAM J Control and Optimization
31. G. Tadmor [1988]. Worst-case design in the time domain: the maximum principle and the standard H∞ problem, Mathematics of Control, Signals, and Systems, Vol 3, pp 301-324
32. G. Tadmor [1989]. The standard H∞ problem and the maximum principle: The general linear case, Tech. Rep. 192, University of Texas at Dallas, May 1989
33. K. Uchida and M. Fujita [1989]. On the central controller: Characterization via differential games and LEQG problems, Systems and Control Letters, Vol 13, pp 9-14
34. P. Whittle [1986]. A risk-sensitive certainty equivalence principle, in Essays in Time Series and Allied Processes, Applied Probability Trust, London, pp 383-388
35. J.C. Willems [1971]. Least squares stationary optimal control and the algebraic Riccati equation, IEEE Trans on Automat Contr, Vol AC-16, No 6, pp 621-634
36. G. Zames [1981]. Feedback and optimal sensitivity: Model reference transformations, multiplicative seminorms, and approximate inverses, IEEE Trans on Automatic Control, Vol AC-26, pp 301-320
37. K. Zhou and P.P. Khargonekar [1988]. An algebraic Riccati equation approach to H∞ optimization, Systems and Control Letters, Vol 11, pp 85-92

Unified Continuous and Discrete Time LQG Theory

G. C. Goodwin¹ and M. E. Salgado²
¹ Centre for Industrial Control Science, Department of Electrical Engineering and Computer Science, University of Newcastle, New South Wales 2308, Australia
² Department of Electronic Engineering, Universidad Técnica Federico Santa María, Casilla 110-V, Valparaíso, Chile

One of the key contributions to arise in system science over the past 40 years has been LQG theory. The purpose of this paper is to review this theory, emphasising the connection between continuous and discrete time cases. We also briefly review certain robustness issues including the concept of loop transfer recovery.

1 Introduction

LQG theory, as first described in the 1960's [14], [15], [16], [18], represented a major breakthrough in system science since it provided a new viewpoint within which problems could be formulated. Prior to the appearance of this theory, the principal emphasis had been on frequency domain techniques for single input single output systems. LQG theory made possible the treatment of multivariable, nonstationary problems and therefore opened new horizons in control and estimation theory.

Since its appearance, LQG theory has proved to be enormously successful. The reasons for this success include the fact that the problem formulation is easy to understand and to relate to physical design objectives; the fact that the resultant optimization problem fits into a rich mathematical framework leading to elegant and mathematically tractable solutions; and the fact that the results have a clear and intuitive interpretation in practice.

Over the past 30 years there has been an enormous amount of work done on the LQG problem, including extensions of the theory and a large number of successful applications. It would be impossible to do justice to a survey of this work. Thus, we will give a tutorial introduction to the theory but with a novel twist: treating the continuous and discrete cases in a unified framework.

The essential components of the LQG problem are as follows [25]. The underlying system is described by a linear state space model having "white noise" disturbances in both the state evolution and output measurements. In continuous time, this model has the following incremental form [3]:
dx = Ac x dt + Bc u dt + dv;   x(0) = x0     (1.1)
dz = Cc x dt + dw     (1.2)

178

G. C. Goodwin and M. E. Salgado

where x ∈ Rⁿ, u ∈ Rᵐ, z ∈ Rʳ are the state, input and integral of the output respectively. Also, v and w are independent Wiener processes having incremental covariance Qc dt and Γc dt respectively. (A useful heuristic is to think of dv/dt and dw/dt as continuous time "white noise" processes, in which case Qc and Γc have the interpretation of Power Spectral Densities. In this context, the system output is yc = dz/dt.) The initial state x0 is taken to be a random variable, independent of v and w, and having covariance P0.

Given the model (1.1), (1.2), the design objective is to minimize the following quadratic criterion with the input expressed as a function of past data:

J = ½ E{ x(tf)ᵀ Σf x(tf) + ∫₀^{tf} [x(t)ᵀ Q x(t) + u(t)ᵀ R u(t)] dt }     (1.3)

where Σf, Q are positive semi-definite matrices and R is a positive definite matrix, and where E[·] denotes expectation with respect to the probability space jointly generated by v and w.

The solution to the above problem turns out to be very simple; namely a combination of a linear state estimator together with linear feedback of these estimates. In this paper, we will review this solution: first treating the control problem and then the combined problem of control and state estimation.

2 LQ Control Problem

Consider the special case when the noise is zero and the state is directly measured. The model (1.1), (1.2) then becomes

dx/dt = Ac x + Bc u;   x(0) = x0     (2.1)
yc = Cc x     (2.2)

With a zero order hold input and sampling period Δ, the corresponding discrete model is

x((k + 1)Δ) = Aq x(kΔ) + Bq u(kΔ);   x(0) = x0     (2.3)
y(kΔ) = Cq x(kΔ)     (2.4)

where

Aq = e^{Ac Δ}     (2.5)
Bq = ∫₀^Δ e^{Ac(Δ - τ)} Bc dτ     (2.6)
Cq = Cc     (2.7)
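Equations (2.5)-(2.6) can both be obtained from a single exponential of the augmented matrix [ Ac  Bc ; 0  0 ], a standard trick for zero order hold discretization. A sketch (the double-integrator example is an illustrative choice, not from the text):

```python
import numpy as np
from scipy.linalg import expm

def zoh_discretize(Ac, Bc, delta):
    """Return (Aq, Bq) of (2.5)-(2.6) from one augmented matrix exponential."""
    n, m = Bc.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = Ac
    M[:n, n:] = Bc
    E = expm(M * delta)          # exp of [[Ac, Bc], [0, 0]] * delta
    return E[:n, :n], E[:n, n:]  # Aq = e^{Ac delta}, Bq = integral of e^{Ac s} Bc ds

# Illustrative example: double integrator, delta = 0.1
Ac = np.array([[0.0, 1.0], [0.0, 0.0]])
Bc = np.array([[0.0], [1.0]])
Aq, Bq = zoh_discretize(Ac, Bc, 0.1)
```

For the double integrator the result is exact in closed form: Aq = [ 1  Δ ; 0  1 ] and Bq = [ Δ²/2 ; Δ ], which makes it a convenient sanity check.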

The subscript 'q' refers to the shift operator formulation which is standard for the discrete time case. However, a difficulty with this description is that it does not have a meaningful limit as the sampling period goes to zero. For example, we have

lim(Δ→0) Aq = I;   lim(Δ→0) Bq = 0     (2.8)

This difficulty arises since, whereas (2.1) is in incremental form (i.e. the equation describes the differential of the state), (2.3) is in absolute form (i.e. the equation describes the end point of the state transition). This motivates the following alternative form of (2.3):
[x((k + 1)Δ) - x(kΔ)] / Δ = Aδ x(kΔ) + Bδ u(kΔ)     (2.9)
yc(kΔ) = Cδ x(kΔ)     (2.10)

where Cδ = Cq = Cc and

Aδ = T Ac     (2.11)
Bδ = T Bc     (2.12)
T = I + (Δ/2!) Ac + (Δ²/3!) Ac² + ···     (2.13)

We thus see that Aδ, Bδ differ from Ac, Bc by terms of order Δ and

lim(Δ→0) Aδ = Ac;   lim(Δ→0) Bδ = Bc     (2.14)

For convenience, we introduce the following operator notation:

p = d/dt (in continuous time)
p = (q - 1)/Δ (in discrete time)     (2.15)

where q is the usual forward shift operator. In this notation the continuous model (2.1), (2.2) and the discrete model (2.9), (2.10) can be written in a unified way as

px = Ax + Bu;   x(0) = x0     (2.16)
yc = Cx     (2.17)

where A, B denote Ac, Bc or Aδ, Bδ depending on whether or not Δ = 0.
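The convergence of the delta-form matrices to their continuous counterparts can be illustrated numerically: Aδ = (Aq - I)/Δ and Bδ = Bq/Δ approach Ac and Bc at a rate of order Δ. A sketch (the example system is illustrative):

```python
import numpy as np
from scipy.linalg import expm

Ac = np.array([[0.0, 1.0], [-2.0, -3.0]])
Bc = np.array([[0.0], [1.0]])

def delta_form(Ac, Bc, delta):
    """A_delta = (Aq - I)/delta and B_delta = Bq/delta for a zero order hold."""
    n, m = Bc.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = Ac, Bc
    E = expm(M * delta)
    Aq, Bq = E[:n, :n], E[:n, n:]
    return (Aq - np.eye(n)) / delta, Bq / delta

# The gap to Ac is of order delta, so it shrinks roughly 10x per step below
errs = [np.linalg.norm(delta_form(Ac, Bc, d)[0] - Ac) for d in (0.1, 0.01, 0.001)]
```

The leading error term is (Δ/2) Ac², consistent with (2.13), which is why the error sequence decays linearly in Δ.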


We also introduce the following integral (summation) notation:

S₀ᵗ (·) dt = ∫₀ᵗ (·) dt (in continuous time)
S₀ᵗ (·) dt = Σ(k=0 to (t/Δ)-1) (·) Δ (in discrete time)     (2.18)

Using this notation, the deterministic form of (1.3) is

J = ½ [ x(tf)ᵀ Σf x(tf) + S₀^{tf} (x(t)ᵀ Q x(t) + u(t)ᵀ R u(t)) dt ]     (2.19)

Using standard variational techniques [4], [17] the necessary conditions for optimality of (2.19) are

px(t) = Ax(t) + Bu(t);   x(0) = x0     (2.20)
pλ(t) = -Qx(t) - Aᵀλ(t + Δ);   λ(tf) = Σf x(tf)     (2.21)
Ru(t) + Bᵀλ(t + Δ) = 0     (2.22)

where λ(t) is a Lagrange multiplier or co-state.

Noting that λ(t + Δ) = (I + Δp)λ(t) and substituting (2.22) into (2.20) gives

[ I  ΔBR⁻¹Bᵀ ; 0  I + AᵀΔ ] [ px(t) ; pλ(t) ] = [ A  -BR⁻¹Bᵀ ; -Q  -Aᵀ ] [ x(t) ; λ(t) ];   λ(tf) = Σf x(tf)     (2.23)

Since (I + AᵀΔ) is nonsingular for the model (2.1), (2.9), then (2.23) can be rewritten as

[ px(t) ; pλ(t) ] = M [ x(t) ; λ(t) ]     (2.24)

where the matrix M is given by

M = [ I  ΔBR⁻¹Bᵀ ; 0  I + AᵀΔ ]⁻¹ [ A  -BR⁻¹Bᵀ ; -Q  -Aᵀ ]     (2.25)

We will call M a generalized Hamiltonian matrix. Obviously as Δ → 0, we obtain the continuous time version:

Mc = [ A  -BR⁻¹Bᵀ ; -Q  -Aᵀ ]     (2.26)

We also define a 2n × 2n matrix J as:

J = [ 0  I ; -I  0 ]     (2.27)

It follows that Jᵀ = J⁻¹ = -J and

J⁻¹MᵀJ = -(I + MΔ)⁻¹M     (2.28)

This result includes the case of Hamiltonian matrices (continuous) and symplectic matrices (discrete). Also, the eigenvalues of M can be grouped into two disjoint sets T1 and T2 such that for every λa ∈ T1 there exists a λb ∈ T2 such that λa + λb + Δλaλb = 0. We can thus choose either T1 or T2 to contain only those eigenvalues which are inside or on the stability boundary.

Equation (2.23) represents a linear two point boundary value problem. In general we can write

λ(t) = Σ(t)x(t)     (2.29)

Substituting (2.29) into (2.23) and using the fact that p(Σx) = (pΣ)x + Σ(px) + Δ(pΣ)(px) leads to

[ -Σ(t)  I ] M [ I ; Σ(t) ] = -pΣ(t - Δ);   Σ(tf) = Σf     (2.30)

This equation is a matrix Riccati Differential (or Difference) equation (RDE) [6] and can be written in full as

-pΣ(t - Δ) = Q + AᵀΣ(t) + Σ(t)A + ΔAᵀΣ(t)A - L(t)ᵀ[R + ΔBᵀΣ(t)B]L(t);   Σ(tf) = Σf     (2.31)
L(t) = (R + ΔBᵀΣ(t)B)⁻¹ BᵀΣ(t)(I + AΔ)     (2.32)

Finally, from (2.22), the optimal control law is given by

u(t) = -R⁻¹Bᵀλ(t + Δ)     (2.33)

Noting from (2.29), (2.20) that

λ(t + Δ) = Σ(t + Δ)[(I + AΔ)x(t) + ΔBu(t)]     (2.34)

then (2.33) becomes

u(t) = -(R + ΔBᵀΣ(t + Δ)B)⁻¹ BᵀΣ(t + Δ)(I + AΔ)x(t) = -L(t + Δ)x(t)     (2.35)
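In discrete time (p = (q - 1)/Δ), equation (2.31) gives Σ(t - Δ) = Σ(t) + Δ·[right-hand side], so the RDE can be iterated backwards from Σ(tf) = Σf. A sketch of one such backward sweep (the system, weights, horizon and Δ are illustrative choices):

```python
import numpy as np

def rde_sweep(A, B, Q, R, Sf, delta, steps):
    """Iterate (2.31)-(2.32) backwards: Sigma(t - delta) = Sigma(t) + delta * RHS."""
    S = Sf
    for _ in range(steps):
        G = R + delta * B.T @ S @ B
        L = np.linalg.solve(G, B.T @ S @ (np.eye(A.shape[0]) + A * delta))  # (2.32)
        S = S + delta * (Q + A.T @ S + S @ A + delta * A.T @ S @ A - L.T @ G @ L)
    return S, L

# Illustrative example: double integrator, Q = I, R = 1, long horizon
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
S, L = rde_sweep(A, B, Q, R, np.zeros((2, 2)), 0.01, 5000)
```

With a small Δ and a long horizon, S approaches the continuous ARE solution for this plant, which is known in closed form: S = [ sqrt(3)  1 ; 1  sqrt(3) ] and L = [ 1  sqrt(3) ], up to an order-Δ offset.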

An important issue is under what circumstances the solution to (2.31) reaches a steady value and whether or not this leads to a stable feedback system. If a steady state solution Σs does exist, then it must satisfy (2.31) with the derivative (difference) of Σ set to zero, that is

[ -Σs  I ] M [ I ; Σs ] = 0     (2.36)

This nonlinear equation is called the Algebraic Riccati Equation (ARE) and has a family of solutions. One of these solutions is of particular interest and is obtained as follows:


Let X = [ X11ᵀ  X21ᵀ ]ᵀ ∈ R^{2n×n} be the matrix formed from the generalized eigenvectors corresponding to the eigenvalues of M inside or on the stability boundary; i.e.

MX = XΛ     (2.37)

where Λ is a diagonal matrix of eigenvalues (more generally a Jordan form matrix). From (2.37) we have (provided X11⁻¹ exists) that

M [ I ; X21 X11⁻¹ ] = [ I ; X21 X11⁻¹ ] X11 Λ X11⁻¹     (2.38)

It is then clear from (2.37), (2.36) that Σs = X21 X11⁻¹ satisfies the ARE. Also from the top equation in (2.37) we have

X11 Λ X11⁻¹ = A - BL     (2.39)

where L is obtained from (2.35) with Σ(t + Δ) replaced by Σs. Hence, the closed loop system corresponding to Σs has eigenvalues inside or on the stability boundary. This solution is the strong solution to the ARE [8]. If all of the closed loop poles lie inside the stability boundary, then the solution is called the stabilizing solution.

Next, we clarify the conditions under which the solutions of the RDE (2.31) converge to the strong solution of the ARE [7]. We factor Q as CᵀC. We then have the following result:
Theorem 2.1 [7], [8], [21].

(i) Existence and Uniqueness. The ARE has a unique strong solution if and only if (A, B) is stabilizable.
(ii) Stabilizability only. Subject to (Σf - Σs) ≥ 0, then lim(t→∞) Σ(t) = Σs if and only if (A, B) is stabilizable, where Σ(t) is the solution of the RDE with final condition Σf, and Σs is the unique strong solution of the ARE.
(iii) Stabilizability and no unobservable modes on stability boundary. Subject to Σf > 0, the stabilizability of (A, B) and the nonexistence of unobservable modes of (C, A) on the stability boundary are necessary and sufficient conditions for lim(t→∞) Σ(t) = Σs (exponentially fast), where Σ(t) is the solution of the RDE with final condition Σf, and Σs is the unique stabilizing solution of the ARE. □
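The eigenvector construction (2.37)-(2.38) can be checked numerically in the continuous case (Δ = 0, M = Mc): select the eigenvectors for the left half plane eigenvalues of Mc, form Σs = X21 X11⁻¹, and compare against a library ARE solver. A sketch (the plant and weights are illustrative):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Continuous-time Hamiltonian Mc of (2.26)
Mc = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
               [-Q, -A.T]])

# Stack the eigenvectors of the stable (left half plane) eigenvalues
vals, vecs = np.linalg.eig(Mc)
X = vecs[:, vals.real < 0]
X11, X21 = X[:2, :], X[2:, :]

# Sigma_s = X21 X11^{-1}; the real part drops complex rounding residue
Sigma_s = np.real(X21 @ np.linalg.inv(X11))
```

The eigenvectors come in complex conjugate pairs here, but the product X21 X11⁻¹ is real (up to rounding) and symmetric, and agrees with the stabilizing solution returned by `solve_continuous_are`.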

An important property of the LQ optimal control solution, as outlined above, is that it has good robustness properties [1], [19], [22]. In the continuous time single input single output case, the gain margin is infinite and the phase margin is at least 60°. The corresponding result for the continuous/discrete formulation given above is:


Theorem 2.2 [21]. For the steady state version of the control law (2.35) we have

(1) |σ|² ≥ 1 - a on the stability boundary (i.e. for all ω ∈ R), where

a = ΔBᵀΣsB / (R + ΔBᵀΣsB)

and σ(γ) is the return difference function given by

σ(γ) ≜ 1 + L(γI - A)⁻¹B

(2) The gain margin is at least 1 / (1 - √(1 - a))
(3) The phase margin is at least cos⁻¹(½(1 + a))

Note that as Δ → 0, the above results reduce to the continuous time results described earlier.
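In the continuous limit (a = 0) the claim in part (1) is Kalman's classical return difference inequality |1 + L(jωI - A)⁻¹B| ≥ 1, which can be verified numerically by sweeping ω. A sketch (the SISO plant below is an illustrative choice):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative SISO plant; continuous-time LQ gain L = R^{-1} B^T Sigma
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Sigma = solve_continuous_are(A, B, Q, R)
L = np.linalg.solve(R, B.T @ Sigma)

# Return difference sigma(jw) = 1 + L (jwI - A)^{-1} B on a frequency grid
w = np.logspace(-3, 3, 1000)
I = np.eye(2)
sigma = np.array([1 + (L @ np.linalg.solve(1j * wk * I - A, B)).item() for wk in w])
min_dist = np.abs(sigma).min()   # distance of the loop gain from the -1 point
```

Since the Nyquist plot of the loop gain stays at least distance 1 from the critical point, the classical margins follow: gain margin (½, ∞) and phase margin of at least 60°.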

3 Optimal State Estimation

We next return to the stochastic model (1.1), (1.2). Again, we are interested in the corresponding discrete time version. However, it makes no sense to sample (1.1), (1.2) directly since the resultant noise would have infinite variance. Instead, we follow the standard practical strategy of preceding the sampler by an anti-aliasing filter. When this is done, the sampled model corresponding to (1.1), (1.2) has the form [21]:

x((k + 1)Δ) = Aq x(kΔ) + Bq u(kΔ) + vq(kΔ)     (3.1)
y(kΔ) = Cq x(kΔ) + wq(kΔ)     (3.2)

where y(t) is the pre-filtered version of yc(t) and where vq(kΔ), wq(kΔ) are discrete gaussian white noise processes having covariance given by

E{ [ vq(kΔ) ; wq(kΔ) ] [ vq(lΔ)ᵀ  wq(lΔ)ᵀ ] } = [ Qq  Sq ; Sqᵀ  Rq ] δ(k - l)     (3.3)

As might be expected from comments made in Sect. 2, the model format given in (3.1), (3.2) will not have a sensible limit as Δ → 0. Indeed, in addition to (2.8), which also holds here, we have that Qq is of order Δ and Rq is of order 1/Δ


and hence

lim(Δ→0) Qq = 0,   lim(Δ→0) Rq = ∞     (3.4)

As in Sect. 2, these difficulties are removed by describing the system in difference form:

[x((k + 1)Δ) - x(kΔ)] / Δ = Aδ x(kΔ) + Bδ u(kΔ) + vδ(kΔ)     (3.5)
y(kΔ) = Cδ x(kΔ) + wδ(kΔ)     (3.6)

where the covariance of vδ, wδ is given by

E{ [ vδ(kΔ) ; wδ(kΔ) ] [ vδ(lΔ)ᵀ  wδ(lΔ)ᵀ ] } = Ψ δ(k - l)     (3.7)

The covariance given in (3.7) is of order 1/Δ and thus still does not converge to a sensible limit as Δ → 0. However, this final difficulty is removed by describing the properties of vδ, wδ in terms of ΔΨ. In fact, this can be interpreted as using a Power Spectral Density description, since Spectral Density = Variance/Bandwidth where Bandwidth = 1/Δ. This is then consistent with the continuous time case where, as noted previously, spectral densities are used in the model. For future use, we also define

[ Q  S ; Sᵀ  Γ ] ≜ ΔΨ     (3.8)

We can then write the models (1.1), (1.2) and (3.5), (3.6) in unified form as

px = Ax + Bu + v     (3.9)
y = Cx + w     (3.10)

where (vᵀ, wᵀ)ᵀ has Spectral Density [ Q  S ; Sᵀ  Γ ].

In continuous time, we need to interpret (3.9), (3.10) in the incremental form given in (1.1), (1.2).

We then seek a mapping x̂(t) from the data up to time (t - Δ) to Rⁿ such that E{(x - x̂)(x - x̂)ᵀ} is minimal. Standard orthogonal projection arguments [2] lead to the following result (commonly called the Kalman filter) [14], [15]:

px̂ = Ax̂ + Bu + H(y - Cx̂)     (3.11)

where

H = [(ΔA + I)PCᵀ + S][ΔCPCᵀ + Γ]⁻¹     (3.12)

and P satisfies the following matrix Riccati differential (difference) equation:

pP(t) = Q + AP(t) + P(t)Aᵀ + ΔAP(t)Aᵀ - H(t)[ΔCP(t)Cᵀ + Γ]H(t)ᵀ;   P(0) = P0     (3.13)

From the structure of the solution given in (3.12), (3.13) it follows that the properties of the Kalman filter are duals of the corresponding properties of an associated optimal control problem. For example, if we factor Q as DDᵀ, then we have the following dual to Theorem 2.1:

Theorem 3.1

(i) Existence and uniqueness. The ARE has a unique strong solution if and only if (C, A) is detectable.
(ii) Detectability only. Subject to P0 - Ps ≥ 0, then lim(t→∞) P(t) = Ps if and only if (C, A) is detectable, where P(t) is the solution of the RDE with initial condition P0, and Ps is the unique strong solution of the ARE.
(iii) Detectability and no unreachable modes on stability boundary. Subject to P0 > 0, the detectability of (C, A) and the nonexistence of unreachable modes of (A, D) on the stability boundary are necessary and sufficient conditions for lim(t→∞) P(t) = Ps (exponentially fast), where P(t) is the solution of the RDE with initial condition P0, and Ps is the unique stabilizing solution of the ARE. □

The above results cover a wide range of filtering problems. For example, part (ii) of the above theorem applies to sinewave estimation in noise. This allows the traditional methods of Discrete Fourier Transforms to be viewed as a special case of Kalman filtering [5].

As stated in Sect. 1, the solution of the general problem given in equations (1.1) to (1.3) is to combine the state feedback solution described in Sect. 2 with the optimal state estimator described above. This result is commonly known as the Separation Theorem or the use of Certainty Equivalence [1], [3].
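The filter (3.11)-(3.12) is easy to simulate in discrete time: one step of the state recursion plus one step of the corresponding Riccati difference recursion for P. A sketch with S = 0, u = 0 and a noise-free measurement so that the estimate can be seen converging to the true state (the model, Δ and noise densities are illustrative choices; the P-update follows the delta-form recursion stated above):

```python
import numpy as np

delta = 0.01
A = np.array([[0.0, 1.0], [-1.0, -0.3]])   # delta-form model matrix (illustrative)
C = np.array([[1.0, 0.0]])
Qn = np.eye(2)                              # process noise spectral density
Gam = np.array([[0.1]])                     # measurement noise spectral density

x = np.array([1.0, 0.0])   # true state
xh = np.zeros(2)           # filter estimate x-hat, deliberately wrong at start
P = 10.0 * np.eye(2)

for _ in range(4000):
    y = C @ x
    # Gain (3.12) with S = 0
    H = ((delta * A + np.eye(2)) @ P @ C.T) @ np.linalg.inv(delta * C @ P @ C.T + Gam)
    # State recursion (3.11) with u = 0:  x-hat <- x-hat + delta * p(x-hat)
    xh = xh + delta * (A @ xh + H @ (y - C @ xh))
    # Riccati difference recursion for P
    P = P + delta * (Qn + A @ P + P @ A.T + delta * A @ P @ A.T
                     - H @ (delta * C @ P @ C.T + Gam) @ H.T)
    # True system, p x = A x (no process noise in this demo)
    x = x + delta * (A @ x)
```

Since the model is exact and noise-free here, the estimation error obeys the stable dynamics p e = (A - HC)e and decays essentially to zero, while P settles to its steady value.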

4 Robustness Issues

Unfortunately the nice robustness properties of the LQ optimal controller described in Sect. 2 are often lost when x̂ is fed back rather than x [9]. To see why this is so, we place the Kalman filter in a more general setting. In the time invariant case, the Kalman estimator gives x̂ as a particular linear function of y and u. We generalize this to any function of the following form:

x̂ = T1(γ)u + T2(γ)y     (4.1)

where T1, T2 are rational, proper, stable linear time invariant operators. For example, any full order steady state linear observer (including the Kalman filter) leads to

T1(γ) = (γI - A + HC)⁻¹B,   T2(γ) = (γI - A + HC)⁻¹H     (4.2)

186

G. C. Goodwin and M. E. Salgado

Fig.4.1

For x̂ to be an "unbiased" estimator, the transfer function from u to x̂ must be the same as that from u to x [24], i.e.

T1(γ) + T2(γ)C(γI - A)⁻¹B = (γI - A)⁻¹B     (4.3)

Full state feedback and state estimate feedback lead to different sensitivity of the closed loop. We illustrate this idea in the case of an unmeasured input disturbance. Consider a MIMO system under linear feedback control based on estimates of the states as shown in Fig. 4.1.

From Fig. 4.1, we have

u' = u - d = -Lx̂     (4.4)

and using (4.1) we obtain

u = [I + LT1(γ) + LT2(γ)C(γI - A)⁻¹B]⁻¹[I + LT1(γ)]d     (4.5)

Introducing the constraint (4.3) yields

u = Sef d     (4.6)

where

Sef = [I + L(γI - A)⁻¹B]⁻¹[I + LT1(γ)]     (4.7)

If we use full state feedback, then

u' = u - d = -Lx = -L(γI - A)⁻¹Bu     (4.8)

from where we finally obtain

u = Ssf d     (4.9)

with

Ssf = [I + L(γI - A)⁻¹B]⁻¹     (4.10)

Comparison of (4.7) and (4.10) leads to

Sef = Ssf[I + LT1(γ)]     (4.11)

3 The LQG Problem-Continuous and Discrete Time LQG Theory

187

From (4.11) it is clear that the sensitivities to input disturbances are exactly equal if and only if LT1(γ) = 0. This can be achieved by setting T1(γ) = 0, but then it may not be possible to construct a proper, stable T2 which satisfies (4.3). A practical solution to this dilemma involves requiring LT1(γ) to be zero only in the frequency band of the disturbance. Then a stable T2 can be sought which ensures satisfaction of (4.3) in a bandwidth appropriate for the problem at hand. If the plant is minimum phase, such a practical solution is easy to achieve. In the single input single output case it suffices to put

T2(γ) = adj(γI - A)B / (N(γ)E(γ))     (4.12)

where N(γ) is the numerator of the system transfer function and where E(γ) is a stable polynomial introduced to ensure that (4.12) is proper. By choosing E(γ) to roll off outside the bandwidth of interest it is readily seen that (4.3) is satisfied over this bandwidth. In the non-minimum phase case, a compromise is required between equalizing the sensitivities to input disturbances and satisfying (4.3).
For more general problems, we can use the Kalman Filter framework to design a robust filter by introducing additional terms in the noise spectral densities as an artifice for dealing with different types of uncertainty. For example, returning to the input disturbance case, this can be captured in the Kalman Filter setting by choosing Q = BBᵀ and then letting Γ → 0 [14]. In the case of a minimum phase, relative degree one system, this leads to the solution given in (4.12). This latter approach is generally known as Loop Transfer Recovery [10], [11], [23], [24]. Using the same general approach, sensitivity to other forms of uncertainty can be addressed [20].

Work along the general directions outlined above has contributed to a better understanding of the robustness properties of the LQG solution. This has further extended the practical appeal of the method.

5 Conclusions

This paper has given a brief review of linear quadratic control and estimation theory from a unified perspective. This class of techniques provides a flexible framework within which feedback control problems can be precisely specified and solved. The method has found widespread acceptance and is frequently used, especially for complex multivariable systems.

Of course, the raison d'être for the use of feedback is to have a control system which has low sensitivity to uncertainty. Hence, a complete design will generally require a combination of feedback and feedforward strategies, with the feedback bandwidth being made as high as possible so as to reduce sensitivity. No matter what method is used to design the feedback, the ultimate achievable bandwidth is limited by non-minimum phase zeros, time delays, high relative

degree, plant uncertainty and output measurement noise. These constraints impose unavoidable trade-offs in sensitivity minimization [12]. Within this broader perspective, LQG control represents a powerful method for articulating the design issues.
References

[1] Anderson, B.D.O. and J.B. Moore (1971) Optimal Control. Prentice Hall, Englewood Cliffs, New Jersey
[2] Anderson, B.D.O. and J.B. Moore (1979) Optimal Filtering. Prentice Hall, Englewood Cliffs, New Jersey
[3] Åström, K.J. (1970) Introduction to Stochastic Control Theory. Academic Press, New York
[4] Athans, M. and P.L. Falb (1966) Optimal Control. McGraw Hill, New York
[5] Bitmead, R.R., A.C. Tsoi and P.J. Parker (1986) "A Kalman filtering approach to short-time Fourier analysis". IEEE Trans ASSP, Vol 34, No 6, pp 1493-1501
[6] Bucy, R.S. (1967) "Global theory of the Riccati equation". J Comput System Science, Vol 1, pp 349-361
[7] Chan, S.W., G.C. Goodwin and K.S. Sin (1984) "Convergence properties of the Riccati difference equation in optimal filtering of nonstabilizable systems". IEEE Trans in Auto Control, Vol AC-29, No 2, pp 110-118
[8] De Souza, C.E., M.R. Gevers and G.C. Goodwin (1986) "Riccati equations in optimal filtering of nonstabilizable systems having singular state transition matrices". IEEE Trans in Auto Control, Vol AC-31, No 9, pp 831-838
[9] Doyle, J.C. (1978) "Guaranteed margins for LQG regulators". IEEE Trans in Auto Control, Vol AC-23, No 4, pp 756-757
[10] Doyle, J.C. and G. Stein (1978) "Robustness with observers". Proceedings of the 17th CDC, San Diego, California, pp 1-6
[11] Doyle, J.C. and G. Stein (1981) "Multivariable feedback design: concepts for a classical/modern synthesis". IEEE Trans in Auto Control, Vol AC-26, No 1, pp 4-16
[12] Freudenberg, J.S. and D. Looze (1988) Frequency Domain Properties of Scalar and Multivariable Feedback Systems. Springer Verlag, Berlin
[13] Goodwin, G.C. and R.H. Middleton (1989) "The class of all stable unbiased state estimators". Systems and Control Letters, Vol 13, No 2, pp 161-163
[14] Kalman, R.E. (1960) "A new approach to linear filtering and prediction problems". Transactions ASME, Journal of Basic Eng, Vol 82, pp 34-45
[15] Kalman, R.E. and R.S. Bucy (1961) "New results in linear filtering and prediction theory". Transactions ASME, Journal of Basic Eng, Vol 83, pp 95-107
[16] Kalman, R.E. (1961) "Contributions to the theory of optimal control". Bol Soc Mat Mexicana, Vol 5, pp 102-119
[17] Kalman, R.E. (1961) "The theory of optimal control and the calculus of variations", in Mathematical Optimization Techniques, R. Bellman (ed.). University of California Press
[18] Kalman, R.E. (1964) "When is a linear control system optimal?". Transactions ASME, Journal of Basic Eng, Vol 86, pp 51-60
[19] Kwakernaak, H. and R. Sivan (1972) Linear Optimal Control Systems. Wiley Interscience, New York
[20] Lin, J.Y. and D.L. Mingori (1989) "Design of LQG controller with reduced parameter sensitivity". Proceedings of the 28th CDC, Tampa, Florida
[21] Middleton, R.H. and G.C. Goodwin (1990) Digital Control and Estimation: A Unified Approach. Prentice Hall
[22] Safonov, M.G. and M. Athans (1977) "Gain and phase margins for multiloop LQG regulators". IEEE Trans in Auto Control, Vol AC-22, No 2, pp 173-179
[23] Stein, G. and M. Athans (1987) "The LQG/LTR procedure for multivariable feedback control design". IEEE Trans in Auto Control, Vol AC-32, No 2, pp 105-114
[24] Zhang, Z. and J.S. Freudenberg (1987) "Loop transfer recovery with non-minimum phase zeros". Proceedings of the 26th CDC, Los Angeles, California, pp 956-957
[25] Special issue on the Linear Quadratic Gaussian Problem (1971) IEEE Trans in Auto Control, Vol AC-16, No 6

Chapter 4

The Realization Problem

Linear Deterministic Realization Theory*

A. C. Antoulas¹, T. Matsuo² and Y. Yamamoto³
¹ Mathematical System Theory, E.T.H. Zürich, CH-8092 Zürich, Switzerland and Department of Electrical and Computer Engineering, Rice University, Houston, Texas 77251, USA
² Department of Information Science, Faculty of Science, Toho University, 2-2-1 Miyama, Funabashi 274, Japan
³ Department of Applied Systems Science, Faculty of Engineering, Kyoto University, Kyoto 606, Japan

Introduction
Dynamical systems can be described in two different ways. The first is the external, input/output or black-box description and the second is the internal or state-space description. The former is solely in terms of external variables, the causes or inputs and the responses or outputs. The latter is in terms of additional variables, the so-called state variables. The state variables represent the internal dynamics of the system; they summarize its past history and are sometimes referred to as the memory of the system.

The state-space description has a long history going back to Newton. Famous examples are the Lagrangian and the Hamiltonian frameworks in mechanics. The input/output description is more recent. In an electric network, the impedance linking the input current and the resulting voltage constitutes an example of such a description.
The question of the relationship between the external and the internal descriptions of a given system arises. In principle, going from the internal to the external description is straightforward, as it involves mere elimination of the state variables. The converse step, i.e. deriving the internal from the complete, or from an incomplete, external description is highly non-trivial. This is the problem of modelling, i.e. the problem of constructing the state from measurements on the external variables. Realization is the special case where the systems to be modelled are assumed to be linear and time-invariant, and (complete or incomplete) measurements of the impulse response are provided.
The solution of the realization problem is mainly due to R.E. Kalman. It constituted the first systematic approach to the modelling question and a big step towards the understanding of the structure of linear systems.
Historically, the realization problem for linear, time-invariant, finite-dimensional systems from the transfer function matrix was solved in 1963, in a paper by Gilbert and in another by Kalman published back-to-back in the SIAM Journal on Control (see Gilbert [16] and Kalman [17]). Using a partial

*The work of the third author was partially supported by the Inamori Foundation.


fraction decomposition Gilbert showed how to construct a realization with a minimal number of state variables, provided that the transfer function has simple poles. Kalman generalized this result to the case of multiple poles, and at the same time proved the famous canonical decomposition theorem which brought the previously introduced concepts of reachability and observability into play. This result states that every linear system in state-space form can be decomposed into four subsystems: one which is reachable and observable, another which is not reachable but observable, a third which is reachable but not observable, and finally one which is neither reachable nor observable. The important property of this decomposition is that only the subsystem which is reachable and observable affects the input/output behavior of the system. This suggests that unless the realization constructed is reachable and observable, it will contain state variables which are arbitrary, i.e. which cannot be determined from input/output measurements. The uniqueness theorem of canonical (= reachable and observable) realizations which was proved around the same time (see Kalman, Falb, and Arbib [23, Appendix 10.C]) completed the picture. It said that a canonical realization of a given impulse response (or transfer function matrix) is essentially unique, the only freedom in the construction of the state being the choice of basis in the state-space (which, of course, can never be determined from input/output measurements). The connection between canonical (or irreducible) realizations and the so-called McMillan degree of a rational matrix was investigated in [20].
An interesting special case of the realization problem is concerned with passive (lumped) electric networks (these are linear, time-invariant and finite-dimensional systems). A characterization of the transfer functions of such systems was derived in the '30s. The result asserts that a rational function can be considered as the transfer function of a passive network if, and only if, it is positive real (i.e. it has real coefficients, and its real part is positive for positive values of the independent variable). This was later generalized to multi-input, multi-output networks by introducing the concept of a positive real matrix. In the early '60s, with the advent of realization for general linear systems, the prospect of a realization theory for passive networks arose. The first major step towards this goal was the translation of this positive realness condition into state-space terms. Yakubovich in 1962 was able to obtain such a translation for the scalar case. One year later Kalman [18] provided an alternative proof of this result and immediately conjectured [19] its matrix version, which was proved by Anderson four years later. This result is sometimes referred to as the positive-real lemma; it is also known as the Yakubovich-Kalman-Popov lemma. Subsequently, using this lemma, the realization problem (also known as the passive network synthesis problem) was tackled, mainly by Anderson. For a complete account of the area we refer to the book by Anderson and Vongpanitlerd [1].
Later, Kalman considered the more general situation where, instead of the transfer function, an infinite sequence of constant matrices, sometimes referred to as the Markov parameters, is provided. In this case the question of existence of solutions to the realization problem arises; roughly speaking, this is equivalent


to the question of compressibility of an infinite set of data to a finite one.


Furthermore, the construction of realizations has to be addressed anew. The main tool for addressing these issues is a (block) Hankel matrix constructed from this infinite sequence of Markov parameters, which we will call the behavior matrix. The existence question turned out to be equivalent to the finiteness of the rank of this (doubly infinite) matrix, while an explicit algorithm was given for the construction of canonical realizations in terms of this same matrix (see B.L. Ho and Kalman [21]). Further work of Kalman's in the area includes his 1971 and 1979 papers as well as his 1972 joint paper with Rouchaleau. The first two treat the realization problem with incomplete measurements of the impulse response, known as the partial realization problem; in the former [22], various results are obtained on the general matrix case, while in the latter [27] the complete solution in the scalar case is presented, with numerous connections to classical topics in mathematics. The third paper [25] is concerned with the extension of realization theory to the class of linear systems with coefficients in a commutative ring. At the same time Kalman and Hautus [12, in Part II] showed that the theory of realization for continuous-time linear systems necessitates the introduction of the more refined theory of distributions, and raises a new type of question on the existence and uniqueness of canonical realizations. This paper greatly stimulated the later developments in realization theory of infinite-dimensional systems.
In the sequel we will survey results on the realization problem for linear,
deterministic, finite-dimensional systems (Part I) and for infinite-dimensional
systems (Part II).

Part I: Realization of Finite-Dimensional Systems


1 Realization with Complete Data
In this part, we will briefly review the results on the realization problem of
linear, time-invariant, finite-dimensional systems with both complete and
incomplete (partial) data. Various recent related developments are also surveyed.
These are (a) an application of realization to the general feedback synthesis problem, (b) the theory of recursive realizations, and (c) the modelling problem where the input functions are exponentials instead of impulses; this is equivalent to the rational interpolation problem. For another survey, mainly on the realization problem with complete data, see Kalman [26].
Consider the following family of dynamical systems:

O"x(t):= Fx(t) + Gu(t),


y(t) = Hx(t),

x(O) = 0,

(1)


where the time t is either a continuous variable t ∈ R, or a discrete variable t ∈ Z. The operator σ is defined as follows:

    σx(t) := dx(t)/dt,  when t ∈ R,    and    σx(t) := x(t + 1),  when t ∈ Z;

u, x, y are elements of the input space U = R^m, the state space X := R^n, and the output space Y = R^p, respectively, while F, G, H are linear maps between the following spaces:

    G: U → X,    F: X → X,    H: X → Y.

Equation (1) defines a linear, time-invariant, finite-dimensional system in internal or state-space form. In short, it will be denoted by the triple of linear maps (or matrices)

    Σ := (H, F, G).        (2)

The dimension of Σ is defined as

    dim Σ := dim X = n,

while two systems Σ and Σ̃ of the same dimension are called equivalent if they differ by a state-space isomorphism, i.e. there exists a basis change T: X → X, det T ≠ 0, such that

    H̃T = H,    TF = F̃T,    TG = G̃.
All systems belonging to the same equivalence class as Σ will be denoted by [Σ].
The second family of systems we will consider consists of the so-called convolution systems, i.e. systems where the output y is obtained by convolving the input u with the impulse response A:

    y(t) = (A * u)(t).        (3)

More precisely, for the continuous-time case

    y(t) = ∫₀ᵗ A(t − τ)u(τ) dτ,    t ∈ R,        (4.1)

where the impulse response is assumed to be a p × m real analytic matrix. For discrete-time systems the function A is given by the sequence of p × m constant matrices (A_1, A_2, ...) and

    y(t) = Σ_{τ=0}^{t−1} A_{t−τ} u(τ),    t ∈ Z.        (4.2)

Equations (4.1), (4.2) define the external or input/output description of linear, time-invariant continuous- or discrete-time systems, respectively. For simplicity, the systems are assumed to be at rest for t = 0.


Since A(t) is real analytic, it is completely determined by the coefficients A_k of its Taylor series expansion at, say, t = 0:

    A(t) = A_1 + A_2 t + A_3 t²/2! + ⋯ + A_{k+1} t^k/k! + ⋯.

Thus, the input/output description of linear systems both in discrete and continuous time is given by the infinite sequence of p × m matrices

    S := (A_1, A_2, ..., A_k, ...).

The A_k are sometimes referred to as Markov parameters.
To formalize the set-up a little more, for fixed positive integers p, n, m we define the family of equivalence classes of state-space systems

    F^int := {[Σ]: Σ = (H, F, G) ∈ R^{p×n} × R^{n×n} × R^{n×m}},

and the family of input/output systems

    F^ext := {S: S = (A_1, A_2, ...), A_k ∈ R^{p×m}}.

The question arises: what is the relationship between these two families of systems?

In one direction, the relationship is easy to establish. Since the impulse response of (1) is

    A(t) = He^{Ft}G,  t ≥ 0,  for t ∈ R,

and

    A = (HG, HFG, HF²G, ...),  t > 0,  for t ∈ Z,

in both cases

    A_t = HF^{t−1}G,    t = 1, 2, ....

Recall that for both discrete- and continuous-time systems the transfer function (the transform of A) is

    Z(σ) = H(σI − F)^{−1}G = HGσ^{−1} + HFGσ^{−2} + ⋯ + HF^{k−1}Gσ^{−k} + ⋯.

This provides another way of seeing that S completely specifies the external behavior for systems of the form (3). We can therefore define a map

    Φ: F^int → F^ext        (5.1)

between the families of internal and external descriptions as follows:

    [Σ = (H, F, G)] ↦ Φ([Σ]) := S = (HG, HFG, HF²G, ...).        (5.2)

The next question is: is Φ invertible? This question is far from trivial. It is, in fact, the realization question: given complete data about the impulse response, can we associate to it a state-space triple Σ (Φ surjective), uniquely (Φ injective)? The answer is: both the domain and the range of Φ need to be restricted for it to become an invertible map.
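The map Φ can be illustrated numerically. The sketch below (the matrices H, F, G are an arbitrary illustration, not taken from the text) computes the Markov parameters A_t = HF^{t−1}G of a triple and checks that they coincide with the impulse response of the corresponding discrete-time system:

```python
import numpy as np

# A hypothetical 2-state, single-input, single-output triple (H, F, G).
H = np.array([[1.0, 0.0]])
F = np.array([[0.0, 1.0],
              [-0.5, 1.0]])
G = np.array([[0.0],
              [1.0]])

# Markov parameters A_t = H F^(t-1) G, t = 1, 2, ...
def markov_parameters(H, F, G, count):
    params = []
    P = np.eye(F.shape[0])
    for _ in range(count):
        params.append(H @ P @ G)
        P = P @ F
    return params

# Cross-check: for the discrete-time system sigma x = Fx + Gu, y = Hx,
# the response to a unit impulse at t = 0 reproduces A_1, A_2, ...
def impulse_response(H, F, G, count):
    x = G.copy()              # state just after the impulse has entered
    ys = []
    for _ in range(count):
        ys.append(H @ x)
        x = F @ x
    return ys

A = markov_parameters(H, F, G, 6)
Y = impulse_response(H, F, G, 6)
assert all(np.allclose(a, y) for a, y in zip(A, Y))
```

In continuous time the same parameters appear as the Taylor coefficients of A(t) = He^{Ft}G.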


Recall the following basic definitions. The reachable and unobservable subspaces of Σ are

    X_reach := im R(F, G),    X_unobs := ker O(H, F),

where

    R(F, G) := (G  FG  F²G  ⋯),    O(H, F) := R(F′, H′)′,

are the reachability and observability matrices of Σ. Thus, Σ is called reachable if X_reach = X, and observable if X_unobs = 0. By the canonical decomposition theorem [17] the external description of any state-space system Σ depends only on the reachable and observable subsystem. This motivates the definition of the following subfamily of F^int:

    F^int_n := {[Σ] ∈ F^int: Σ is reachable & observable}.

As a consequence, the restriction of Φ to F^int_n, denoted by Φ_n, becomes injective.


The next step is to compute the image of this map. Since Σ = (H, F, G) is a finite object, the image of Φ_n must satisfy some finiteness condition as well. This is as follows. Given S ∈ F^ext, we attach to it the behavior matrix

    B(S) :=  [ A_1  A_2  A_3  ⋯ ]
             [ A_2  A_3  A_4  ⋯ ]
             [ A_3  A_4  A_5  ⋯ ]
             [  ⋮    ⋮    ⋮      ]

This matrix has block Hankel structure and infinitely many rows and columns. It turns out that the image of Φ_n is

    im Φ_n = F^ext_n := {S ∈ F^ext: rank B(S) = n}.

Summarizing, we have the following

Main result: Realization with complete data. The map

    Φ_n: F^int_n → F^ext_n,

whose action is defined by (5.2), is a bijection.

In words, there is a bijective correspondence between the set of equivalence classes of reachable and observable systems Σ of dimension n, and the set of infinite sequences S whose behavior matrix has rank n.

Remarks. (a) The restriction from F^ext to F^ext_n provides the answer to the question of existence of realizations.
(b) The restriction from F^int to F^int_n provides the answer to the question of uniqueness of realizations. It says that if the rank condition is satisfied, there is


essentially a unique solution, i.e. a unique equivalence class of state-space systems Σ, which gives rise to S.
(c) The explicit construction of the inverse of the map Φ_n, which is equivalent to the construction of canonical realizations, will not be discussed. For details on this construction and on any of the missing proofs, we refer to Kalman [26]. □
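The flavor of the construction can nevertheless be conveyed by a small numerical sketch in the spirit of the Ho-Kalman algorithm [21]. The SVD-based factorization used below is a later numerical variant, not the original construction, and the system generating the data is an arbitrary illustration: a finite section of the behavior matrix is factored as B = O·R, and a realization is read off from the factors and the shifted matrix.

```python
import numpy as np

# A hypothetical canonical (reachable & observable) system generating the data.
H = np.array([[1.0, 0.0]])
F = np.array([[0.0, 1.0],
              [-0.3, 0.8]])
G = np.array([[0.0],
              [1.0]])
n = 2

# Markov parameters A_1, A_2, ... of the "black box" (A[k] holds A_{k+1}).
A = [H @ np.linalg.matrix_power(F, k) @ G for k in range(10)]

# Finite block-Hankel section of the behavior matrix B(S), plus its shift.
L = 4
B  = np.block([[A[i + j]     for j in range(L)] for i in range(L)])
Bs = np.block([[A[i + j + 1] for j in range(L)] for i in range(L)])

# The rank of the behavior matrix equals the dimension of a canonical realization.
assert np.linalg.matrix_rank(B) == n

# Factor B = O . R via the SVD and read off a realization.
U, s, Vt = np.linalg.svd(B)
U, s, Vt = U[:, :n], s[:n], Vt[:n, :]
O = U * np.sqrt(s)               # finite observability-type factor
R = np.sqrt(s)[:, None] * Vt     # finite reachability-type factor

Fr = np.linalg.pinv(O) @ Bs @ np.linalg.pinv(R)
Gr = R[:, :1]                    # first block column (here m = 1)
Hr = O[:1, :]                    # first block row (here p = 1)

# The recovered triple reproduces the Markov parameters.
for k in range(8):
    assert np.allclose(Hr @ np.linalg.matrix_power(Fr, k) @ Gr, A[k])
```

The recovered triple (Hr, Fr, Gr) agrees with (H, F, G) only up to the choice of basis in the state-space, exactly the freedom the uniqueness theorem allows.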

2 Realization with Partial Data


In this case the impulse response is only partly known. In particular, only the first N Markov parameters are available. Thus the family of input/output systems we need to consider is

    F^ext(N) := {S_N := (A_1, ..., A_N): A_k ∈ R^{p×m}}.

In analogy to the case of complete data, we need to define the partially determined behavior matrix

    B(S_N) :=  [ A_1  A_2  A_3  ⋯  A_N ]
               [ A_2  A_3   ⋯  A_N  ?  ]
               [ A_3   ⋯  A_N   ?   ?  ]
               [  ⋮                     ]
               [ A_N   ?    ⋯   ?   ?  ]

where the ?s stand for as yet unknown matrices which conserve the block Hankel structure of B(S_N). The following definitions will be needed in the sequel (see, e.g. Kalman [27], Bosgra [15]). We will say that the i-th column (row) of B(S_N) is linearly independent of the previous ones if there exists an i × i submatrix of the first i columns (rows) of B(S_N) which is nonsingular, independently of the choice of the unknown elements ?. Similarly, the i-th column (row) is linearly dependent on the previous columns (rows) of B(S_N) if the determinants of all i × i submatrices of the first i columns of this behavior matrix depend on some free parameters (this implies that they can be made zero for some choice of the free parameters). It follows that the rank of B(S_N) is r if there exists an r × r submatrix which is nonsingular independently of the free parameters ?. Consider the family

    F^ext_n(N) := {S_N ∈ F^ext(N): rank B(S_N) = n}.

In a similar way to (5.1), (5.2) we can define the following map between the family of state-space systems F^int_n and the family of input/output systems just defined:

    ψ_n: F^int_n → F^ext_n(N),        (6.1)

    [Σ] ↦ ψ_n([Σ]) = S_N = (HG, HFG, ..., HF^{N−1}G).        (6.2)


It is easy to check that this map is surjective but not injective. To see that it is not injective, consider the case N = 1 and S_1 = (1). All members of the family of one-dimensional systems

    Σ_1 = (1, α, 1),    α ∈ R,

partially realize S_1, for arbitrary α. Furthermore, every system belonging to a corresponding family of two-dimensional systems Σ_2, which is reachable and observable provided that −α is not an eigenvalue of its F matrix, realizes S_1 partially as well.
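The one-dimensional branch of this example is easy to verify numerically (a minimal sketch; the values of α are arbitrary):

```python
import numpy as np

# Each one-dimensional system Sigma_1 = (1, alpha, 1) has first Markov
# parameter A_1 = HG = 1, so it partially realizes S_1 = (1) for any alpha.
for alpha in (-2.0, 0.0, 0.7, 3.5):
    H, F, G = np.array([[1.0]]), np.array([[alpha]]), np.array([[1.0]])
    assert np.isclose((H @ G).item(), 1.0)          # matches S_1 = (1)
    # The realizations are genuinely non-equivalent: the second Markov
    # parameter A_2 = HFG = alpha already distinguishes them.
    assert np.isclose((H @ F @ G).item(), alpha)
```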
The above example shows that there are infinitely many (equivalence classes of) canonical solutions, of different dimensions. To sort these out and obtain an invertible map we need to impose restrictions both on the domain F^int_n and on the range F^ext_n(N) of ψ_n. To this end we need to introduce the so-called reachability and observability indices (see, e.g. [24])

    κ_j,  j ∈ m,    and    ν_i,  i ∈ p,

of Σ ∈ F^int_n, where m := {1, ..., m} and p := {1, ..., p}. These indices are invariants with respect to equivalence and satisfy

    Σ_{j∈m} κ_j = Σ_{i∈p} ν_i = n = dim Σ.

Using these indices we define the family

    F̄^int_n := {[Σ] ∈ F^int_n with indices κ_j, j ∈ m, and ν_i, i ∈ p}.

With linear dependence and independence of the columns (rows) of the partially defined behavior matrix as defined above, we attach to B(S_N) unique column indices c_j, j ∈ m, and unique row indices r_i, i ∈ p, as follows. If c_j = l (r_i = l), the j-th column (i-th row) of the (l + 1)-st block column (row) of B(S_N) is linearly dependent on the previous columns (rows), while the j-th column (i-th row) of the l-th block column (row) is linearly independent of the previous columns (rows). It follows that

    Σ_{j∈m} c_j = Σ_{i∈p} r_i = n = rank B(S_N).

Let us now define the family

    F̄^ext_n(N) := {S ∈ F^ext_n(N): c_j + r_i ≤ N, j ∈ m, i ∈ p}.

We are now ready to state this section's

Main result: Realization with partial data. The map

    ψ_n: F̄^int_n → F̄^ext_n(N),

whose action is defined by (6.2), is a bijection, provided that κ_j = c_j, j ∈ m, and ν_i = r_i, i ∈ p.


Thus, there is a bijective correspondence between the family of canonical systems with reachability indices κ_j, observability indices ν_i, and dimension n on the one hand, and the family of finite sequences of matrices with associated column indices κ_j, row indices ν_i, and rank of the behavior matrix n, provided that κ_j + ν_i ≤ N.

Remarks. (a) As in the previous section, the constructions and proofs are omitted. The interested reader is referred to Bosgra [15] and Antoulas [3] for details.
(b) Comparing the cases of realization with complete data (infinite sequence S) and realization with partial data (finite sequence S_N), we see that completeness of the data forces uniqueness. In the latter case there is no uniqueness; actually, there is an infinity of (non-equivalent) solutions. In order to classify this set of solutions additional conditions are needed. The above result shows that these additional constraints guarantee uniqueness; they are expressed in terms of inequalities among the indices κ_j, ν_i and the amount of information (i.e. the length of the sequence) available.
(c) Given S_N, suppose that the conditions for uniqueness are not satisfied. What we need to do is construct all continuations S_N' of S_N such that κ_j + ν_i ≤ N'. Each such continuation then corresponds to one of the solutions with indices κ_j, ν_i.

3 Further Developments
a. Synthesis Problems

A large class of feedback synthesis problems can be formulated as follows. Given is a linear dynamic system with two sets of inputs u_1, u_2 and two sets of outputs y_1, y_2, described by the following equations (in the transform variable, omitted for simplicity):

    ( y_1 )     ( Z_11  Z_12 ) ( u_1 )
    ( y_2 )  =  ( Z_21  Z_22 ) ( u_2 )

We are looking for all dynamic compensators given by the proper rational matrix Z_c:

    u_2 = −Z_c y_2,

such that the closed-loop system described by y_1 = Z_y u_1, where

    Z_y = Z_11 − Z_12 Z_c (I + Z_22 Z_c)^{−1} Z_21,

has certain properties, the most important being internal stability and the stability of Z_y itself. Using the so-called Youla-Kucera parametrization the


above equation can be rewritten as a linear equation in the Youla-Kucera parameter Z_x:

    Z_y = Z_1 − Z_2 Z_x Z_3,

where Z_1, Z_2, Z_3 are readily derived from the Z_ij (see, e.g. [28]). The starting point for this calculation is the expression of the Z_ij in terms of matrix fractions. In doing so two possibilities are available: the first is to use polynomial matrix fractions and the second is to use proper stable rational matrix fractions. The latter approach has the advantage that the properness of the compensator Z_c is guaranteed; this is not the case with the former approach. The advantage of the former approach, on the other hand, is that it allows us to keep track of the complexity (McMillan degree) of the compensator; this in turn is not the case with the latter approach.
The problem stated above was indeed investigated in Antoulas [2] making use of polynomial factorizations. It turns out that the properness constraint can be dealt with in terms of partial realizations. More precisely, Z_c is proper if and only if the parameter Z_x is a partial realization of a sequence derived from the Z_ij. This shows the relevance of the realization problem to synthesis problems.
b. Recursiveness Issues

Given an infinite sequence of constant p × m matrices S = (A_1, A_2, ..., A_k, ...), the following result, due to Antoulas [3], can be proved. There exist systems Σ_i, i = 1, 2, ..., with two sets of inputs and two sets of outputs, which are interconnected in a cascade as follows:

Fig. 3.1. Cascade interconnection


In addition, there exist positive integers n_i, with n_i < n_{i+1}, i = 1, 2, ..., such that there is a one-one correspondence between

    Σ_1 ↔ S_1 := (A_1, ..., A_{n_1}),
    Σ_{1,2} ↔ S_{1,2} := (A_1, ..., A_{n_1}, ..., A_{n_2}),
    ⋮

where Σ_{1,k} denotes the interconnection of the first k subsystems in the above cascade. The main property of this correspondence is that Σ_{1,k} is a minimal partial realization of S_{1,k} for k = 1, 2, ....
This implies that the above structure is compatible with recursiveness, since changing any Markov parameter A_l, l > n_k, does not affect the interconnection Σ_{1,k} of the first k subsystems. This fundamental result provides the basis for a theory of recursive realizations for multi-input, multi-output systems. The natural tool for achieving this is a certain polynomial unimodular matrix of size p + m, which can be associated to every system with m inputs and p outputs.
The result described above has numerous connections to other topics like (matrix) continued fraction expansions, the (matrix) Euclidean algorithm, (matrix) linear fractional transformations, Kronecker indices, etc. Furthermore, for n_i = i, an explicit construction of the systems in the above diagram, from the Markov parameters, has been worked out [3]. This construction constitutes the matrix generalization of the well-known Berlekamp-Massey recursive algorithm.
The results described above were triggered by Kalman's paper on the partial realization problem [27]. For details, the main source is Antoulas [3]. Further investigations along the same lines can be found in Antoulas [8], [11], and Antoulas and Bishop [5]. In [11] it is shown that the machinery introduced in [3] leads to a new test for minimality of (both scalar and matrix) realizations based on the degrees of the entries of an appropriate Bezout identity. In [5] various recursiveness results are translated to a state-space setting and contact with geometric control is established. Furthermore, in [8], it is shown that many seemingly diverse results in system theory can be explained in a unified way using the cascade interconnection of two-port systems mentioned above as a tool. In particular, this refers to passive network synthesis results like Darlington synthesis and inverse scattering, the underlying Nevanlinna-Pick algorithm, and of course the recursive realization results just mentioned. Finally, this recursiveness structure can be extended to very general classes of modelling problems (see Antoulas and Willems [14]).
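For a scalar sequence (p = m = 1), the recursive construction specializes to the Berlekamp-Massey algorithm mentioned above. A minimal sketch over the reals (exact arithmetic is assumed, so the zero test on the discrepancy d is taken literally; the Fibonacci-like test data are an arbitrary illustration):

```python
def berlekamp_massey(s):
    """Shortest linear recurrence (over the reals) generating the sequence s.

    The recurrence length equals the dimension of a minimal partial
    realization of the scalar sequence, and it is updated recursively as
    each new term arrives -- the scalar case of the recursive structure
    described above.
    """
    C = [1.0]          # current connection polynomial
    B = [1.0]          # connection polynomial before the last length change
    L, m, b = 0, 1, 1.0
    for k in range(len(s)):
        # discrepancy between s[k] and the value predicted by C
        d = s[k] + sum(C[i] * s[k - i] for i in range(1, L + 1))
        if d == 0:
            m += 1
        elif 2 * L <= k:
            T = C[:]
            C = C + [0.0] * max(0, len(B) + m - len(C))
            for i, bi in enumerate(B):
                C[i + m] -= (d / b) * bi
            L, B, b, m = k + 1 - L, T, d, 1
        else:
            C = C + [0.0] * max(0, len(B) + m - len(C))
            for i, bi in enumerate(B):
                C[i + m] -= (d / b) * bi
            m += 1
    return L, C

# The Fibonacci-like sequence A_t = A_{t-1} + A_{t-2} needs dimension 2.
L, C = berlekamp_massey([1, 1, 2, 3, 5, 8, 13])
print(L)   # minimal recurrence length, here 2
```

The returned connection polynomial encodes the recurrence; for the data above it is 1 − σ − σ², i.e. A_t = A_{t−1} + A_{t−2}.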
c. Rational Interpolation

Given the sequence of constant p × m matrices S_N = (A_1, ..., A_N), an equivalent formulation of the (partial) realization problem is the following. Find all p × m


rational matrices Z such that their formal power series expansion is

    Z(σ) = A_1 σ^{−1} + ⋯ + A_N σ^{−N} + ⋯.

This can easily be converted into a rational interpolation problem. Let

    Ẑ(σ) := Z(σ^{−1}) = A_1 σ + ⋯ + A_N σ^N + ⋯.

It follows that

    A_k = (1/k!) d^k Ẑ(σ)/dσ^k |_{σ=0}.
Hence, realization can be interpreted as interpolation at the origin. Moreover, the complexity of Ẑ(σ) is the same as that of Z(σ).
It is easy to see that black-box experiments (with linear systems) using exponentials instead of impulses result in information on the value (and the values of a certain number of derivatives) of the transfer function at the points in the complex plane which correspond to the frequencies of these exponentials. Thus experiments of this sort result in interpolation data.
The question arises as to whether realization and interpolation can be treated in a unifying framework. As shown in Antoulas and Anderson [4] this is indeed the case. Actually, the tool replacing the (partially defined) behavior matrix B(S_N), which has Hankel structure, is the so-called Löwner matrix. It can be shown that this Löwner matrix reduces to a Hankel matrix whenever we are dealing with single-point interpolation (which was shown above to include the realization problem). Therefore the Löwner matrix framework provides a generalization of the realization problem.
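The rank test carried by the Löwner matrix can be illustrated with a small numerical sketch (the rational function and the interpolation points are arbitrary choices, not from the text): for data sampled from a degree-2 rational function, the Löwner matrix of divided differences has rank 2.

```python
import numpy as np

# Samples of a hypothetical degree-2 rational function z(s) = 1/(s^2 + s + 1)
# at two disjoint sets of interpolation points.
z = lambda s: 1.0 / (s**2 + s + 1.0)
W = np.array([0.5, 1.5, 2.5, 3.5])   # "row" points w_i
X = np.array([1.0, 2.0, 3.0, 4.0])   # "column" points x_j

# Loewner matrix of divided differences: Lw[i, j] = (z(w_i) - z(x_j)) / (w_i - x_j)
Lw = (z(W)[:, None] - z(X)[None, :]) / (W[:, None] - X[None, :])

# Its rank reveals the degree of a minimal interpolant, generalizing
# the rank test on the (block Hankel) behavior matrix B(S_N).
print(np.linalg.matrix_rank(Lw, tol=1e-10))   # → 2
```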
Various other investigations on the rational interpolation problem have followed. In Antoulas [6] the (scalar) rational interpolation problem with a different measure of complexity (namely the sum of the numerator and denominator degrees, as opposed to the maximum of the numerator and denominator degrees) is considered. It is shown that the Euclidean algorithm, and the degrees of the successive quotients, provide the key to parametrizing all solutions, where the complexity is the parameter defined above. This clarifies many aspects of the Padé and Cauchy approximation problems not fully understood in the literature. In Antoulas and Anderson [7], various classical results in rational interpolation theory are revisited and reinterpreted using the Löwner matrix introduced above. An important result in this regard is the fact that the celebrated Nevanlinna-Pick algorithm consists of nothing more than unconstrained minimal interpolation of the original set of data together with a mirror-image set of data. In Antoulas and Anderson [9], further insight is provided on the use of the Löwner matrix in the study of the scalar rational interpolation problem. In particular, all solutions are constructed both in a state-space and a polynomial setting. In Anderson and Antoulas [10], the construction of solutions of the matrix rational interpolation problem, in state-space form, is derived using the (block) Löwner matrix. This theory generalizes the construction of realizations by means of the (block) Hankel

matrix. Finally, Antoulas and Willems [13] provide a novel approach which in [12] leads, for the first time, to the solution of a general matrix rational interpolation problem with the McMillan degree as complexity; the computation of minimal interpolants follows as a corollary. This is done by defining, directly from the data, a pair of matrices. All admissible interpolant degrees are then obtained from the reachability indices of this pair. Moreover, the corresponding linear dependencies of the columns of the reachability matrix of this pair provide a parametrization of all interpolants of appropriate complexity. These results are also extended in [12] to a very general bitangential interpolation problem by the appropriate definition of a second pair of matrices.

References
[1] B.D.O. Anderson and S. Vongpanitlerd, Network analysis and synthesis: a modern systems theory approach, Prentice Hall (1973)
[2] A.C. Antoulas, A new approach to synthesis problems in linear systems, IEEE Transactions on Automatic Control, AC-30, pp 465-474 (1985)
[3] A.C. Antoulas, On recursiveness and related topics in linear systems, IEEE Transactions on Automatic Control, AC-31, pp 1121-1135 (1986)
[4] A.C. Antoulas and B.D.O. Anderson, On the scalar rational interpolation problem, IMA J of Mathematical Control and Information, Special Issue on Parametrization problems, edited by D. Hinrichsen and J.C. Willems, 3, pp 61-88 (1986)
[5] A.C. Antoulas and R.H. Bishop, Continued fraction decomposition of linear systems in the state space, Systems and Control Letters, 9, pp 43-53 (1987)
[6] A.C. Antoulas, Rational interpolation and the Euclidean algorithm, Linear Algebra and Its Applications, 108, pp 157-171 (1988)
[7] A.C. Antoulas and B.D.O. Anderson, On the stable rational interpolation problem, Linear Algebra and Its Applications, Special Issue on Linear Control Theory, 122/123/124, pp 301-329 (1989)
[8] A.C. Antoulas, The cascade structure in system theory, in Three decades of mathematical system theory, edited by H. Nijmeijer and J.M. Schumacher, Springer Lecture Notes in Control and Information Sciences, 135, pp 1-18 (1989)
[9] A.C. Antoulas and B.D.O. Anderson, State space and polynomial approaches to rational interpolation, Progress in Systems and Control Theory: Realization and Modelling in System Theory, M.A. Kaashoek, J.H. van Schuppen, and A.C.M. Ran, Eds., vol. I, pp 73-82, Birkhäuser (1990)
[10] B.D.O. Anderson and A.C. Antoulas, Rational interpolation and state-variable realizations, Linear Algebra and Its Applications, Special Issue on Matrix Problems, 137/138: 479-509 (1990)
[11] A.C. Antoulas, On minimal realizations, Systems and Control Letters, 14: 319-324 (1990)
[12] A.C. Antoulas, J.A. Ball, J. Kang, and J.C. Willems, On the solution of the minimal rational interpolation problem, Linear Algebra and Its Applications, Special Issue on Matrix Problems, 137/138: 511-573 (1990)
[13] A.C. Antoulas and J.C. Willems, Rational interpolation and Prony's method, Analysis and Optimization of Systems, J.L. Lions and A. Bensoussan, Eds., Springer Verlag, Lecture Notes in Control and Information Sciences, 144: 297-306 (1990)
[14] A.C. Antoulas and J.C. Willems, Linear modeling and recursive modeling, ECE Rice University, Tech. Report 91-05 (1991)
[15] O.H. Bosgra, On parametrizations of the minimal partial realization problem, Systems & Control Letters, 3: 181-187 (1983)
[16] E.G. Gilbert, Controllability and observability in multivariable control systems, SIAM J Control, 1: 128-151 (1963)
[17] R.E. Kalman, Mathematical description of linear dynamical systems, SIAM J Control, 1: 152-192 (1963)


[18] R.E. Kalman, Lyapunov functions for the problem of Lur'e in automatic control, Proc National Academy of Sciences (USA), 49: 201-205 (1963)
[19] R.E. Kalman, On a new characterization of linear passive systems, Proc First Allerton Conference on Circuits and Systems, University of Illinois, pp 456-470 (1963)
[20] R.E. Kalman, Irreducible realizations and the degree of a rational matrix, SIAM J Control, 13: 520-544 (1965)
[21] B.L. Ho and R.E. Kalman, Effective construction of linear state-variable models from input/output data, Regelungstechnik, 14: 545-548 (1966)
[22] R.E. Kalman, On minimal partial realizations of a linear input/output map, in Aspects of network and system theory, R.E. Kalman and N. DeClaris, Eds., Holt, Rinehart, and Winston, pp 385-408 (1971)
[23] R.E. Kalman, P.L. Falb, and M.A. Arbib, Topics in mathematical system theory, McGraw-Hill (1969)
[24] R.E. Kalman, Kronecker invariants and feedback, in Proc 1971 NRL-MRC Conf on Ordinary Differential Equations, edited by L. Weiss, Academic Press (1972)
[25] R.E. Kalman and Y. Rouchaleau, Realization theory of linear systems over a commutative ring, in Automata theory, languages, and programming, edited by M. Nivat, North Holland, pp 61-65 (1972)
[26] R.E. Kalman, Realization theory of linear dynamical systems, in Control theory and functional analysis, Vol II, International Atomic Energy Agency, Vienna, pp 235-256 (1976)
[27] R.E. Kalman, On partial realizations, transfer functions, and canonical forms, Acta Polytechnica Scandinavica, Mathematics and Computer Science Series, 31: 9-32 (1979)
[28] J.B. Pearson, On the parametrization of input-output maps for stable linear systems, this volume, chapter V, pp 345-354

Part 11: Realization of Infinite-Dimensional Systems


1 Uniqueness of Canonical Realizations
In this part, we discuss Kalman's influence on realization theory of
infinite-dimensional systems. Our objective is, however, a rather modest one:
we try to exhibit how his basic contributions and philosophy affected realization
theory of such systems, mainly in relation to the authors' works.
In his various articles (e.g. [10]), Kalman repeatedly emphasized the
importance of the uniqueness principle of models. Certainly, if models were not
unique, what would be the scientific ground for the validity of the results deduced
from those models? What if we had two essentially different (non-equivalent)
models for the solar system which have equal validity?

This principle, however, requires a more advanced treatment, and should
be made mathematically precise. As we know today, this led to the discovery
of minimal and canonical realizations, and the precise mathematical statement
of the uniqueness of canonical (minimal) realizations up to system isomorphism
([11]).
The first marked difference in realization of infinite-dimensional systems is
that this uniqueness theorem does not necessarily hold, at least not in the
same sense as in the finite-dimensional case. First of all, one needs to modify
the notion of canonical realizations. For infinite-dimensional systems, it is

4 The Realization Problem-Linear Deterministic Systems

205

difficult to expect every state to be reachable. Instead, one usually requires that
the reachable states constitute a dense subspace. This notion is called approximate
reachability. A system that is approximately reachable and observable is
said to be weakly canonical.
This notion of canonicity appears to be a natural extension of the
finite-dimensional case, so one may expect that weakly canonical realizations may
be unique. However, Baras, Brockett, and Fuhrmann [2] have shown that
there exist two (in fact, infinitely many) systems Σ1 and Σ2 having the same
impulse response, both weakly canonical, yet not isomorphic. The
difficulty here is of a topological nature. They both have a Hilbert state space, and
there exists a continuous system morphism

T: Σ1 → Σ2,  (1)

yet T is not continuously invertible. In other words, under the weak notion of
canonicity, the topology of the system cannot be uniquely determined from its
external data.
The reason for this nonuniqueness is that approximate reachability or
observability imposes too little restriction on the topology of the state space.
This suggests the need of strengthening the notion of canonicity to obtain the
desired uniqueness. Kalman himself did not regard the notions of controllability
and observability as notions that cannot be changed. Rather, they should be
properly modified according to the context in which systems are considered.
Let us quote from his CIME Lecture Notes ([9]):
"The chief current problem in controllability theory is the extension to
more elaborate algebraic structures." (p. 141, Historical Comments)
Reading "topological" for "algebraic" precisely applies to the present context.
In fact, it has become a standard understanding that it is appropriate to consider
the notions of reachability and observability in the category in which systems
are considered.
There are a number of approaches toward the uniqueness principle along
this line. Helton [8] gave a uniqueness theorem under the requirement of exact
reachability. Brockett and Fuhrmann [3] derived a different uniqueness theorem
by restricting to systems with certain symmetry properties.
There is, however, another aspect that Kalman emphasized. In his
k[z]-module approach to realization, input functions are sequences of bounded
length, and every canonical realization can be obtained as a result of computing
the Nerode equivalence classes ([14]) of such input functions ([11]). He pursued
this line of approach in [12] for the study of continuous-time systems. However,
to make the module theory for this case parallel to the discrete-time systems,
a space of distributions was introduced as an input function space. (See also
Matsuo [15] and Kamen [13] for related treatments.) This introduction of a
space of distributions leads to a non-Hilbert space structure, and it is in marked
contrast with the works [1], [3], [4], [8], where the theory is essentially in the
realm of L²-theory, and the Nerode equivalence classes are not explicitly present.


Therefore, in this context, more advanced tools are needed to derive the
uniqueness theorem for canonical realizations. Matsuo [16] made use of Ptak's
open mapping/closed graph theorem ([17]), and proved the uniqueness theorem
under the requirements that
1. the input function space be a so-called Ptak space ([17]);
2. the state-space be a barreled space;
3. "canonical" means exactly reachable and observable.

Theorem II.1 (Existence and Uniqueness of Canonical Realizations). Suppose
that the input function space is a barreled Ptak space. Then a canonical (i.e.
exactly reachable and observable) realization with barreled state-space exists and
is unique up to isomorphism.

The above approaches (except [3]) all focus upon the notion of reachability.
An alternative approach is, however, also possible. This approach places more
emphasis upon observability, and introduces a stronger notion of observability;
on the other hand, exact reachability is not required. To motivate it, let us return
to the question of extending the notions of reachability and observability, and
focus on the notion of observability.
A state-space is a gadget which stores the past history of inputs to the system
that is enough to determine the future behavior of the system outputs ([11]).
States are not directly observed, but rather through observation of outputs
only. From the viewpoint of realization theory, therefore, states can be anything
that satisfies this requirement. In the finite-dimensional case, this led to the notion
of abstract vector spaces, and the coordinate system chosen there is of secondary
importance. It is mainly for the convenience of computation, conciseness of
expression, etc.
A similar argument can be made on topologies of the state-space in infinite
dimensions. Since states cannot be directly observed, the closeness of two states
(i.e. topology) is observed only through the corresponding outputs. Of course,
two close states should produce close outputs. But if we require observability,
we want to conclude that two states are close if the corresponding outputs are
close. This is precisely Kalman's basic idea on observability, adopted in the
topological context. In fact, for the finite-dimensional case, observability
guarantees this property. In other words, initial state determination is always
well posed for the finite-dimensional case. This is an implicit consequence of
observability in finite-dimensional systems.
However, due to the very infinite-dimensionality, this well-posedness does
not automatically hold for systems with infinite-dimensional state spaces, even
though they are observable. Observability merely requires that the correspondence

initial states ↦ outputs  (2)

be one-to-one, but it is not necessarily well posed. This means the following: If
(2) is not well posed, then even if we observe two outputs to be very close, the


corresponding states can be very far away. Since there is nothing else to rely
upon other than the observation of outputs, there would be no way to recover
the knowledge of closeness of states in such a case.
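The failure of well-posedness can be made concrete with a toy computation (our own illustration, not an example from the text): the diagonal map h(e_n) = e_n/n on finite truncations of ℓ² is injective for every n, yet the cost of inverting it grows without bound, so nearly identical outputs can come from far-apart states.

```python
import numpy as np

# Toy "observability map" h(e_n) = e_n / n, truncated to R^n.
# h is one-to-one for every n, but its condition number equals n,
# so reconstructing the state from the output becomes arbitrarily
# ill-posed as n grows: close outputs, far-away states.
for n in (10, 100, 1000):
    H = np.diag(1.0 / np.arange(1, n + 1))
    print(n, np.linalg.cond(H))
```

In infinite dimensions this growth has no bound, which is precisely the phenomenon described above: injectivity of (2) alone does not give continuous invertibility.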
This observation naturally leads to the notion of topological observability
([18]): A system is said to be topologically observable if the correspondence (2)
is well posed, i.e. it is continuously invertible. We say that a system is canonical
if it is both approximately reachable and topologically observable. This is a
strong restriction and there is a question as to whether it may be an obstruction
against the existence of a canonical realization.
It turns out that a canonical realization in this sense always exists and is
unique. We conclude this section with this theorem.
Theorem II.2 (Existence and Uniqueness of Canonical Realizations). Let f be
a linear input/output map. Then its canonical (i.e. approximately reachable
and topologically observable) realization always exists, and is unique up to
isomorphism.

Isomorphism here is topological, that is, it is continuous and is continuously
invertible. Another consequence is that this unique realization is given as the
so-called shift realization. A shift realization is constructed as follows: We
consider the set of all output functions, take its suitable closure and regard it
as the state-space. For its free state transition, we take the semigroup generated
by the left shifts (whence the name). The rest of the system equations follows
naturally from the input/output correspondence. For a more precise statement,
see [18].
To see the relationship between the two uniqueness theorems above, let Ω
and Γ be suitable input and output function spaces. Suppose that Ω and Γ
are dual to each other, and f: Ω → Γ is an input/output map. In the
single-input/single-output case, we can identify this f with its adjoint f′ using
this duality. The exactly reachable canonical realization is Ω/ker f, and the
topologically observable canonical realization corresponds to im f. By the
standard duality theory ([17]), Ω/ker f and im f are topologically dual to each
other. Furthermore, if X is a reflexive space, an observability map h: X → Γ is
continuously invertible if and only if its adjoint h′: Ω → X′ is surjective, i.e. exact
reachability and topological observability are dual to each other. However, the
two realizations may behave differently in actual computation of realizations.
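In symbols, and restating only the claims just made (the names X_r, X_o for the two state spaces are ours, and we write the closure of im f explicitly), the duality reads:

```latex
\begin{aligned}
  X_r &= \Omega/\ker f \quad \text{(exactly reachable realization)}, &
  X_o &= \overline{\operatorname{im} f} \quad \text{(topologically observable realization)},\\
  \bigl(\Omega/\ker f\bigr)' &\cong \overline{\operatorname{im} f}, &
  h\colon X \to \Gamma \ \text{cont.\ invertible}
  &\iff h'\colon \Omega \to X' \ \text{surjective} \quad (X \ \text{reflexive}).
\end{aligned}
```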

2 Computation of Canonical Realizations


The success of Kalman's k[z]-module theory lies also in laying the ground
for computation of canonical realizations. This is exhibited in [11], and later
extended to the case of polynomial matrix fraction representations by Fuhrmann
[6]. Adopting this methodology as a strong guideline, we show that for some
class of input/output maps, the shift realizations can be naturally computed.


The difficulty in the computation of such realizations arises apparently from
the lack of a finiteness condition. In the finite-dimensional case, the underlying
finiteness is the very finite-dimensionality of the state-space. What is there to
be regarded as "finite" in the infinite-dimensional case?
An important consequence of finite-dimensionality is that a finite-step
state reconstruction is possible. In other words, only bounded-time data are
needed for determination of initial states. In the context of topological
observability, this can be stated as follows: Let x be an initial state, y = h(x) its
corresponding output. Suppose that the truncated correspondence

x ↦ y|[0, T]  (3)

is continuously invertible for some T > 0. If this property holds, the system is
said to be topologically observable in bounded time. Interestingly enough, this
condition gives a necessary and sufficient condition for the shift realization
(described in the last section) to have a Banach state space ([18]).
Kamen [13] considered a related notion of finiteness. In the study of bounded-time
controllability, he is led to the study of impulse responses of the form

A = q⁻¹ * p,  (4)

where q and p are Schwartz distributions of compact support contained in
(−∞, 0]. This is a precise analog of Kalman's k[z]-module setting for discrete-time
systems. There, polynomials are regarded as signals of finite length given
on (−∞, 0] ∩ Z, and transfer functions are ratios of such objects. In (4), one takes
distributions of compact support in (−∞, 0] instead of polynomials in z. A
typical example is given by delay-differential equations. For example, let W(s)
be the transfer function

W(s) = 1/(s·e^s − 1).  (5)

The inverse Laplace transforms of s and e^s are δ′ (the derivative of Dirac's delta
distribution) and δ₋₁ (Dirac's delta at the point −1), respectively. Then the
corresponding impulse response becomes

A = (δ′₋₁ − δ)⁻¹ * δ,  (6)

which is of type (4).


If the denominator distribution satisfies a regularity condition (see [22]),
we call impulse responses of type (4) pseudorational ([22]). Impulse responses
of delay systems satisfy this property ([19]). It turns out that pseudorationality
is in very close connection with topological observability in bounded time. If
an impulse response is pseudorational, its canonical realization is topologically
observable in bounded time, and hence possesses a Banach state-space ([19]).
An important observation here is that the very "finiteness" which played
such a crucial role in finite-dimensional systems can be generalized to infinite-dimensional
systems through the introduction of pseudorationality. Here "finite"
means the finite length of the support of the denominator distribution of the


impulse response, which in turn yields the finite-time reconstruction of the states.
In what follows, we show that this finite-time property enables us to compute
the Nerode equivalence classes very naturally as a generalization of the works
[11] and [6].
Let A = q⁻¹ * p be pseudorational. Consider the following subspace X^q of
L²_loc[0, ∞):

X^q := { x ∈ L²_loc[0, ∞) : supp(q * x) ⊂ (−∞, 0] }.  (7)

The space X^q is the set of all output functions generated by the denominator
distribution q⁻¹. As can be imagined from the finite-dimensional theory
([11], [6]), this is a dual way of representing the Nerode equivalence classes of
the impulse response A = q⁻¹ * p. In view of the separate continuity of convolution,
this space is closed in L²_loc[0, ∞). It is also easy to check that (7) is closed under
the left shift operators:

(σ_t x)(τ) := x(t + τ).  (8)

Since {σ_t} comprise a strongly continuous semigroup in L²_loc[0, ∞), they also
form a semigroup in X^q. Let F be the infinitesimal generator of this semigroup.
Then the following differential equation description gives a realization of A:

State Space: X^q
System Equations:

(d/dt) x_t(·) = F x_t(·) + A(·) u(t),  (9)
y(t) = x_t(0),  (10)

where

F x(τ) := dx/dτ, with domain D(F) = W^{1,2}_loc[0, ∞) ∩ X^q.  (11)

This system is denoted Σq,p. The realization Σq,p is an infinite-dimensional
analog of the well-known Fuhrmann realization for finite-dimensional systems
([6]). (See also [5] for the same line of approach in a different setting.) Σq,p is
topologically observable in bounded time, but it need not be approximately
reachable, because the state-space is built on the information given by the
denominator q only. The essential questions that should be addressed here are:
1. How can Σq,p be computed?
2. When is the system Σq,p approximately reachable, in other words, canonical?

In the rest of the section, we will see the following properties of Σq,p:
1. When explicit forms of q and p are given, it is possible to exhibit Σq,p.
2. Σq,p often agrees with natural function space models, such as the M²-model for
delay systems.


3. Basic properties of Σq,p, such as reachability and spectrum location, can be
characterized directly in terms of the factorization q⁻¹ * p, without recourse
to explicit forms of Σq,p.

Let us again take the transfer function (5):

W(s) = 1/(s·e^s − 1),  (12)

and its inverse Laplace transform (6):

A = (δ′₋₁ − δ)⁻¹ * δ =: q⁻¹ * p.  (13)

A is pseudorational. Let us compute X^q. To this end, take a sufficiently smooth

x(t) ∈ L²_loc[0, ∞),  supp(q * x) ⊂ (−∞, 0].  (14)

Condition (14) is equivalent to

q * x = 0 on (0, ∞).  (15)

This yields

x′(t + 1) − x(t) = 0 for all t > 0.  (16)

Hence

x(t) = x(1) + ∫_1^t x(τ − 1) dτ  (17)

for 1 ≤ t < 2. Iteration of this formula implies that under (16), x(t) is completely
determined by the data x|[0,1) and x(1). Taking the closure of all such x in
L²_loc[0, ∞) to obtain X^q, we conclude

X^q ≅ L²[0, 1] × ℝ.  (18)

Write (x, z(0)) for an element of L²[0, 1] × ℝ. To derive the system equation (9)
in this space, we need only compute

lim_{ε→0} (1/ε)[σ_ε(x, z(0)) − (x, z(0))],  (19)

under the equivalence (18), using the fact that x and z(0) give the data of x|[0,1)
and x(1), respectively. An easy calculation ([22]) yields

(d/dt) (x_t(τ), z_t(0)) = (∂x_t(τ)/∂τ, x_t(0)) + (0, 1) u(t),
y(t) = x_t(0).  (20)


This is nothing but the well-known M²-model for the delay-differential equation:

x′(t) = x(t − 1) + u(t),
y(t) = x(t − 1).  (21)

Let us remark the following:

- The procedure above is very similar to the transfer function realization for
the finite-dimensional case; it is essentially the computation of the Nerode
equivalence classes exhibited in [11] and [6].
- For neutral systems, this procedure again yields an M²-model, different
from the more common W^{1,2}-model ([21]).
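The recursion (17) is exactly the classical method of steps, and it can be run numerically. The sketch below is our own illustration (the grid size, initial segment and input are arbitrary choices), marching the delay equation x′(t) = x(t − 1) + u(t) forward one Euler step at a time:

```python
import numpy as np

def simulate_delay(phi, u, T, h=1e-3):
    """Method of steps for x'(t) = x(t-1) + u(t), y(t) = x(t-1).

    phi : initial function prescribing x on [0, 1]
    u   : input function of time
    T   : final (integer) time, T >= 1
    """
    n = int(round(1.0 / h))                 # grid points per unit delay
    t = np.linspace(0.0, T, int(round(T / h)) + 1)
    x = np.empty_like(t)
    x[:n + 1] = phi(t[:n + 1])              # the data x|[0,1) and x(1)
    for k in range(n, len(t) - 1):          # forward Euler on (17)
        x[k + 1] = x[k] + h * (x[k - n] + u(t[k]))
    y = np.full_like(x, np.nan)
    y[n:] = x[:len(t) - n]                  # y(t) = x(t-1) for t >= 1
    return t, x, y
```

With x ≡ 1 on [0, 1] and u ≡ 0, the recursion reproduces x(t) = 1 + (t − 1) on [1, 2], exactly as iteration of (17) predicts.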
Now, is this realization canonical? The following theorem answers this.

Theorem II.3 ([22], [23]). The realization Σq,p of (9)-(11) is canonical if and only
if the pair (q, p) is approximately coprime, i.e. there exists a sequence (r_n, s_n) of
distributions having compact support in (−∞, 0] such that

q * r_n + p * s_n → δ  (22)

in the sense of distributions. Furthermore, this holds if and only if

1. there is no common zero between the Laplace transforms q̂(s) and p̂(s), and
2. sup{t : t ∈ supp q ∪ supp p} = 0.

Since in the present case the pair (δ′₋₁ − δ, δ) satisfies the Bezout identity

(δ′₋₁ − δ) * 0 + δ * δ = δ,  (23)

realization (20) is canonical.


It is desirable to deduce various system properties without recourse to a
concrete representation such as (20). As a precise analog of the finite-dimensional
case, we have the following result ([20]):

Theorem II.4. Let Σq,p be as above, and F the infinitesimal generator of the
transition semigroup. Then the spectrum σ(F) is given by

σ(F) = {λ ∈ ℂ : q̂(λ) = 0}.  (24)

Every λ ∈ σ(F) is an eigenvalue, and has a finite multiplicity which is equal to the
dimension of the corresponding generalized eigenspace.
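Theorem II.4 can be checked numerically on the running example: there q = δ′₋₁ − δ, so q̂(λ) = λe^λ − 1, and the eigenvalues of F are the branches W_k(1) of the Lambert W function. The Lambert-W identification is our elementary consequence of (24), not a statement from the text; a quick sketch using SciPy:

```python
import numpy as np
from scipy.special import lambertw

# sigma(F) = { lambda : q_hat(lambda) = 0 }, with q_hat(lambda) = lambda*e^lambda - 1,
# i.e. lambda*e^lambda = 1, whose solutions are lambda_k = W_k(1).
for k in range(-2, 3):
    lam = lambertw(1, k)                       # k-th branch of Lambert W at 1
    residual = abs(lam * np.exp(lam) - 1)      # |q_hat(lambda_k)|
    print(f"k={k:2d}  lambda={lam:.6f}  residual={residual:.2e}")
```

The principal eigenvalue W_0(1) ≈ 0.5671 is real and positive, so the free motion of the delay equation (21) grows.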

3 Some Remarks
Mainly due to the page limitation, we could not discuss the L 2 input/output
framework (and its transformed form H 2 theory) developed in [1], [2], [4],
[5], [8], etc. For more details, see [7] and references therein. The H 2 framework
recently received a renewed interest in connection with the H<X>-control theory,
of wh ich consequences are yet to be seen in the future developments. In all

212

A. C. Antoulas et al.

these works, Kalman's abstract realization framework and k[z]-module theory


had an influence in one way or another. We have only reviewed one aspect of
his influence mainly in relation to the authors' works. However, it is obvious
that his emphasis upon the uniqueness principle and the need for a canonical
construction served as a strong guideline even in this limited case.
References
[1] J.S. Baras and R.W. Brockett, "H²-functions and infinite-dimensional realization theory," SIAM J Contr, 13: 221-241, 1975
[2] J.S. Baras, R.W. Brockett, and P.A. Fuhrmann, "State-space models for infinite-dimensional systems," IEEE Trans Autom Contr, AC-19: 693-700, 1974
[3] R.W. Brockett and P.A. Fuhrmann, "Normal symmetric dynamical systems," SIAM J Contr & Optimiz, 14: 107-119, 1976
[4] P. DeWilde, "Input-output description of roomy systems," SIAM J Contr & Optimiz, 14: 712-736, 1976
[5] P.A. Fuhrmann, "On realization of linear systems and applications to some questions of stability," Math Syst Theory, 8: 132-141, 1974
[6] P.A. Fuhrmann, "Algebraic system theory: an analyst's point of view," J Franklin Inst, 301: 521-540, 1976
[7] P.A. Fuhrmann, Linear Systems and Operators in Hilbert Space, McGraw-Hill, 1981
[8] J.W. Helton, "Systems with infinite-dimensional state space: the Hilbert space approach," Proc IEEE, 64: 145-160, 1976
[9] R.E. Kalman, Lectures on Controllability and Observability, Lecture Notes at Centro Internazionale Matematico Estivo, Bologna, Italy, 1968
[10] R.E. Kalman, "First Kyoto Prize Commemorative Lecture," Inamori Foundation, 1990
[11] R.E. Kalman, P.L. Falb, and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, 1969
[12] R.E. Kalman and M.L.J. Hautus, "Realization of continuous-time linear dynamical systems: rigorous theory in the style of Schwartz," Ordinary Differential Equations, 1971 NRL-MRC Conference, edited by L. Weiss, Academic Press, 1972
[13] E.W. Kamen, "Module structure of infinite-dimensional systems with applications to controllability," SIAM J Contr & Optimiz, 14: 389-408, 1976
[14] A. Nerode, "Linear automaton transformations," Proc Amer Math Soc, 9: 541-544, 1958
[15] T. Matsuo, "Mathematical theory of linear continuous time systems," Research Rep Autom Contr Lab Nagoya Univ, 16: 11-17, 1969
[16] T. Matsuo, Realization Theory of Continuous-Time Dynamical Systems, Springer Lecture Notes in Control and Information Sci 32, 1981
[17] H.H. Schaefer, Topological Vector Spaces, Springer, 1971
[18] Y. Yamamoto, "Realization theory of infinite-dimensional linear systems, Parts I & II," Math Syst Theory, 15: 55-77, 169-190, 1981
[19] Y. Yamamoto, "A note on linear input/output maps of bounded type," IEEE Trans Autom Control, AC-29: 733-734, 1984
[20] Y. Yamamoto, "Realization of pseudo-rational input/output maps and its spectral properties," Mem Fac Eng Kyoto Univ, 47: 221-239, 1985
[21] Y. Yamamoto and S. Ueshima, "A new model for neutral delay-differential systems," Int J Control, 43: 465-472, 1986
[22] Y. Yamamoto, "Pseudo-rational input/output maps and their realizations: a fractional representation approach to infinite-dimensional systems," SIAM J Control & Optimiz, 26: 1415-1430, 1988
[23] Y. Yamamoto, "Reachability of a class of infinite-dimensional linear systems: an external approach with applications to general neutral systems," SIAM J Control & Optimiz, 27: 217-234, 1989

Stochastic Realization Theory

G. Picci
Dipartimento di Elettronica ed Informatica, Università di Padova, via Gradenigo 6/A,
35131 Padova, Italy

The use of state-space models for modelling and processing of random signals was introduced by
Kalman at the very beginning of the history of System Theory. Although spectacular successes
have emerged from the introduction of these models (Kalman filtering to name just one), until quite
recently there has not been any serious effort of putting together in a logically consistent way a
theory of modelling and model representation in the stochastic frame. Expanding applications to
diverse fields like Econometrics etc. and a multitude of nonstandard estimation problems arising
in engineering applications seem now to render the need for such a theory more urgent.
In this paper we discuss some ideas which are believed to be the central concepts needed for
understanding stochastic modelling. Stochastic realization is seen as the problem of transforming
models of "phenomenological" type (called external) into models possessing more structure, which
require the introduction of auxiliary variables (internal models).

1 Introduction

Mathematical models of dynamic phenomena can be classified in two broad
categories: external models, which are mathematical relations involving only the
external variables of the system [external variables are by definition those
directly accessible to observation (measurement variables) or control (decision
variables)] and internal models, which, besides the external, also involve auxiliary
variables. Auxiliary variables (also called internal or latent variables) need not
have any direct physical or economic meaning and are introduced for the purpose
of giving the model a special mathematical structure. In a sense they play
the role of additional dynamical parameters which, if necessary, can be eliminated
by returning to an external description.
Realization may be broadly defined as the problem of transforming external
models into internal models (to within a specified structure). According to this
definition, realization may then abstractly be viewed as a problem of parametrization,
falling into the same general category as, say, representing an algebraic
curve Γ = {(x, y) : F(x, y) = 0} in the plane (here x and y are the external variables
and F(x, y) = 0, F a polynomial, is the external model) in parametric form
Γ = {(x, y) : x = φ(t), y = ψ(t), t ∈ I}, the parameter t ∈ I being the "latent"
auxiliary variable and the above "parametric" description the wanted "internal"
model of Γ.
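As a concrete instance of this external/internal dichotomy (the curve Γ above is generic; the unit circle is our own choice), the implicit equation F(x, y) = x² + y² − 1 = 0 is an external model, while the parametrization by the latent variable t is an internal one:

```python
import numpy as np

# External model: a relation among the external variables only.
def F(x, y):
    return x**2 + y**2 - 1

# Internal model: parametric description via the latent variable t.
t = np.linspace(0.0, 2 * np.pi, 200)   # the auxiliary "dynamical parameter"
x, y = np.cos(t), np.sin(t)

# Eliminating t recovers the external description: every point of the
# parametric curve satisfies F(x, y) = 0 (up to rounding).
print(np.max(np.abs(F(x, y))))
```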


R.E. Kalman originated realization theory in the early sixties studying the
following setup:

- The class of external models consists of causal linear input-output maps
("input" here is used as a synonym of control).
- An internal auxiliary variable x is defined by the property of making past
inputs and future outputs conditionally independent¹ given the current value
of x. This is called the state property and x the state variable of the system.
Internal models with the auxiliary variable x possessing the state property
are called state-space systems.
It is a fact that the introduction of state-space systems has had a profound
influence on the development of modern engineering sciences. On one side, a
natural mathematical framework has emerged for stating and solving many
basic problems of control and communication engineering. The role of "sufficient
statistic" of the state variable alluded to above permits the translation of control
and filtering problems into control and filtering of the state, thus leading to
general prototype problems which are formulated and solved in a universal
format, leading for the first time to an effective "theory of communication and
control".
From the computational viewpoint, the concept of state-space system
appears very much as the right generalization of the concept of recursively
computable function to the (infinite cardinality) continuous case. The solution of
a control/observation problem stated using state-space models can most
naturally be given by producing a new state-space system which does not show
the solution in closed form but instead performs the signal processing required for
implementing it. Thus state-space dynamical systems play a dual role as models
and as computational schemes. The importance of this aspect has taken some time
to be fully appreciated, although recent emphasis on computational methods
has led to quite a change of perspective. Nowadays even Bode plots are
computed by state-space methods [15].
For the above reasons, in the last two decades there have been substantial
efforts to generalize modelling and realization theory beyond Kalman's original
input-output setup. In particular, motivated by areas like statistical signal
processing and econometrics, the question of understanding stochastic modelling
and building realization theory in the context of stochastic models has naturally
arisen. A first indication in this direction was given already in [7]. Also, one
would like to include in the theory the case of autonomous (deterministic)
systems where there are no control variables and input-output maps cannot be
taken as a primary external description. The originators of Stochastic Realization
were Akaike [1], Ruckebusch [22] and Picci [16], but the main body
of the theory is especially due to Lindquist-Picci and Ruckebusch and is
summarized in the survey papers [13], [22]; see also [23]. The "autonomous"
setting has been developed into a very articulate theoretical construction by

¹ This idea will be discussed in more detail in the sequel.

4 The Realization Problem-Stochastic Systems


J.C. Willems [25-28] and by the Dutch school. It is remarkable that these two
apparently different contexts turn out to involve in reality quite similar ideas
[20]. In this respect, we would like to present this paper much more as an
attempt to sort out general ideas on modelling and model structure rather than
a report on specific results on stochastic realization. We believe that many ideas,
although originated in a probabilistic context and described here in a probabilistic
language, have general significance. Some can be recognized in Willems'
theory even if there are no probability measures around.
We shall not attempt any survey of the literature on stochastic realization
theory. We should however point out that, at least for the linear-Gaussian case,
a distributional modelling theory, based on spectral factorization and the
so-called Positive-Real Lemma (the Yakubovich-Kalman-Popov Lemma), has
been available since the late sixties [2], [6]. This was not satisfactory, however,
as in virtually all of the applications the processing of "random" signals requires
processing of a specific time trajectory of the signal. Therefore a theory based
on consideration of "sample values" is needed, not just a distributional one.
Now, this brings in directly the question of defining stochastic models, the basic
objects of our study.
objects of our study.
Definition. A stochastic dynamical system is a stochastic process z := {z(t)}, t ∈ T,
defined on a parametrized family of probability spaces {Ω, 𝒜, μ_u}. The parameter
u is the control variable, a deterministic function of time t ∈ T, belonging to some
set 𝒰 of admissible control functions, and the dependence of the probability
measure μ_u upon u is causal, i.e. for every event A belonging to the past history
of y at time t, μ_u(A) depends only on values taken by u before and at time t.

The process z will in general take its values in a product space Y × X (of
external and internal signal alphabets) and the relative components y and x are
declared external and internal (or latent) variables. We shall usually write z as
an ordered pair z = (y, x). The system is called an external description if z ≡ y
and an internal description if internal variables (x) are present. Naturally the
probability space is only assigned up to stochastic equivalence (it can be fixed
in some canonical way). The σ-algebra² 𝒜 represents the events being modelled
by the system. Clearly 𝒜 contains at least the events relative to z, i.e. 𝒜 ⊃ 𝒵 :=
σ{z(t); t ∈ T}, but it may be bigger. Finally u, called the control (or decision)
variable, is a variable with no dynamical description (i.e. without a probabilistic
description). Assigning a dynamics to u, thus making it also into a stochastic
process, is actually what control theory is about and need not concern us here.
The causality of the map u ↦ μ_u is postulated because of the meaning of u as a
decision variable (no clairvoyance).
The notion of autonomous dynamical systems descends from the general
definition by deleting control variables (i.e. taking 𝒰 to be a singleton). This is

² All σ-algebras will be assumed μ-complete and the qualification "μ-almost surely" is tacitly
understood whenever appropriate.


actually nothing different from the ordinary notion of a stochastic process of
Probability Theory: just a space of time trajectories endowed with a probability
measure. If control variables are present, one has to take into account that the
trajectories of the system may vary either because of a random choice of the
elementary event in Ω or also because of different past histories of the control
variable. This additional dynamics is of the same "causal input-output" type
as in the original Kalman setup. In a linear-Gaussian framework, for example,
the dependence on u is typically concentrated in the expectation of the distribution,
which can in turn be interpreted as the conditional expectation of z(t) given
past u's. If this dependence is linear, it has the classical convolution integral
form. Thus nothing radically new is expected to emerge from consideration of
control variables. For this reason, and also in view of notational simplicity,
from now on we shall restrict to autonomous systems only.
It should be appreciated that there cannot be finite sets of differential or difference equations satisfied by the trajectories of a non-deterministic process. More precisely, all "behavioural equations" commonly used to describe random processes, like say ARMA models, state-space or Errors-In-Variables descriptions etc., always require the introduction of auxiliary variables and should therefore be regarded as internal models. By the external model of a stochastic system (or process) we shall then merely mean the family of compatible finite dimensional distributions

F{t₁, ..., tₙ; η₁, ..., ηₙ} := μ{z(t₁) ≤ η₁, ..., z(tₙ) ≤ ηₙ},  n = 1, 2, ...

which uniquely specifies the underlying probability measure μ.


We shall choose to work with continuous time (T = ℝ) and assume that the external process takes values in ℝᵐ. Often a more realistic type of external description in many practical situations may be a second-order description consisting of a covariance function Λ: ℝ × ℝ → ℝ^{m×m}, where

Λ(t, s) := E[y(t)y(s)ᵀ]

and a mean value μ(t) := E(y(t)), t∈ℝ. These data specify an equivalence class of processes (on a fixed probability space) which is often called a "wide sense" or "second order" process. A representative may be taken by assuming Gaussian distributions.
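The second-order description can be illustrated numerically. The sketch below is an illustrative assumption, not part of the text: it uses a scalar Ornstein-Uhlenbeck process, whose stationary covariance Λ(t, s) = σ²/(2θ)·e^{−θ|t−s|} is known in closed form, and estimates μ(t) and Λ(t, s) by averaging over simulated sample paths.

```python
import numpy as np

# Monte Carlo estimate of the second-order description (mean and covariance
# function) of a scalar Ornstein-Uhlenbeck process dy = -theta*y dt + sigma dw.
# Its stationary covariance is Lambda(t, s) = sigma^2/(2 theta) * exp(-theta |t-s|).
rng = np.random.default_rng(0)
theta, sigma, dt, n_steps, n_paths = 1.0, 1.0, 0.01, 500, 20000

y = rng.normal(0.0, sigma / np.sqrt(2 * theta), size=n_paths)  # stationary start
path = np.empty((n_steps, n_paths))
for k in range(n_steps):
    path[k] = y
    y = y + (-theta * y) * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)

t_idx, s_idx = 100, 300                      # two time points t, s
mu_t = path[t_idx].mean()                    # estimate of mu(t) = E y(t)
lam_ts = np.mean(path[t_idx] * path[s_idx])  # estimate of Lambda(t, s)
lam_theory = sigma**2 / (2 * theta) * np.exp(-theta * abs(t_idx - s_idx) * dt)
print(mu_t, lam_ts, lam_theory)
```

Averaging over paths rather than over time is what "fixing an equivalence class on a common probability space" amounts to in practice.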
The issue of what should be meant by a Stochastic Dynamical System or by a stochastic model has been the object of controversy, and several different definitions have been proposed in the literature which may not quite agree with the one we are proposing here. One general point which has sometimes been argued (and which is usually avoided in the literature) concerns the relevance of axiomatic probability as an appropriate mathematical framework for describing the type of "uncertain" or "unpredictable" temporal behaviour of the variables describing technological or economic systems. It is often argued that "physical" uncertainty or unpredictability is actually just a manifestation of incomplete (deterministic) modelling and that there are no "urns" visible in nature from which sample paths are extracted.

4 The Realization Problem-Stochastic Systems


Indeed, incomplete modelling may occur either due to actual ignorance of the physical laws governing the phenomenon or, as is often the case, because physical systems are never isolated and interact instead with a complicated environment which one cannot and may not want to model explicitly. Now in the first case, a totally "black box" situation, we can't venture to say anything more than that one must by necessity use a phenomenological description. The only sensible justification for using probability is that it allows one to capture the notion of "statistical regularity" in the data, which is a basic motivation for model building (a model should be useful to describe also data which were not used for its calibration). Also, probabilistic models do not lead to structural ill-posedness in identification on the basis of observed data, as is instead typically the case with deterministic models.
In the second situation we can recognize a sort of statistical-mechanical setup where our variable of interest is coupled to a "large" environment through many interaction variables. There are classes of stochastic processes which ideally model this situation. There is a highly inspiring formal mathematical construction in nonequilibrium statistical mechanics showing that a wide class of purely nondeterministic random processes modelling physical variables, like say the motion of a Brownian particle, can actually be described by deterministic dynamic systems in interaction with a "large", generally infinite-dimensional, deterministic environment (a "heat bath"), [10], [11]. The heat bath coupled to the physical variable is, to be sure, an "artificial" mathematical construction based on dilation theory. Nevertheless it has the general format of a Hamiltonian Dynamical System. Moreover the construction can be carried out in great generality.
Thus the notion of a (purely nondeterministic) stochastic process can be regarded as a mathematical caricature of the physical situation of a deterministic system in interaction with a large complicated (although still deterministic) environment. The family of possible temporal evolutions (trajectories) of the process can be imagined as corresponding to all possible different evolutions or initial conditions of the environment interacting with it. Modelling the environment, and thereby making the description of the signal completely deterministic, would introduce many (generically, infinitely many) non-observable internal variables. Here lies the point for the introduction of a probability measure, playing the role of an invariant measure on the phase space of the canonical heat bath of the signal. By this introduction an infinite dimensional environment can be described by a probability distribution, which in certain cases may be specified by only a few constant parameters.
In the following we shall take for granted some standard notions of Probability Theory, like stationarity, the notion of conditional probability, the definition of the Markov property and the basic theory of stationary wide sense processes as e.g. found in Rozanov's book [21]. The term random variable will generally mean a random function with values in some abstract space. Real random variable will be used for ℝ-valued random variables.


2 Splitting Variables
There are two fundamental types of auxiliary variables which enter in the construction of internal models of random phenomena. We shall call them splitting variables and noise variables. A splitting variable parametrizes the probabilistic dependence between external variables. In the common probability space {Ω, 𝒴, μ}, let 𝒴ᵢ = σ(yᵢ), i = 1, 2, be the σ-algebras induced by the random variables yᵢ. A random variable x is said to be splitting for (y₁, y₂) if 𝒳 = σ(x) makes 𝒴₁ and 𝒴₂ conditionally independent given 𝒳, i.e.

(i) μ(A₁ ∩ A₂ | 𝒳) = μ(A₁ | 𝒳) μ(A₂ | 𝒳), A₁∈𝒴₁, A₂∈𝒴₂, or equivalently,
(ii) μ(A₂ | 𝒴₁ ∨ 𝒳) = μ(A₂ | 𝒳), A₂∈𝒴₂, or also,
(iii) μ(A₁ | 𝒴₂ ∨ 𝒳) = μ(A₁ | 𝒳), A₁∈𝒴₁.

Notations: 𝒴₁ ⊥ 𝒴₂ | 𝒳 or y₁ ⊥ y₂ | x.
The definition is easily extendable to any number of external variables. Since σ-algebras have a natural ordering by refinement, there is a natural notion of minimal (or coarsest) splitting σ-algebra and of minimal splitting variable as one inducing a minimal splitting σ-algebra. Minimal splitting σ-algebras are non-unique, and a classical counterexample to uniqueness is offered by the two predictor algebras

E(𝒴₁ | 𝒴₂) := σ{μ(A₁ | 𝒴₂); A₁∈𝒴₁}
E(𝒴₂ | 𝒴₁) := σ{μ(A₂ | 𝒴₁); A₂∈𝒴₂}

each of which is minimal splitting. The two predictor algebras coincide only when 𝒴₁ and 𝒴₂ intersect perpendicularly, i.e. when 𝒴₁ ⊥ 𝒴₂ | 𝒴₁ ∩ 𝒴₂, which is a rather special condition.
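In the jointly Gaussian case, conditional independence given x reduces to the vanishing of a conditional cross-covariance, which can be checked numerically. The sketch below is an illustrative assumption (scalar variables and particular coefficients, not taken from the text): it builds y₁ and y₂ from a shared variable x plus independent noises, so that x is splitting by construction.

```python
import numpy as np

# For jointly Gaussian scalar variables, x is splitting for (y1, y2) exactly
# when the conditional cross-covariance of y1 and y2 given x vanishes:
#   Cov(y1, y2) - Cov(y1, x) Var(x)^{-1} Cov(x, y2) = 0.
# Sketch: y_i = c_i * x + v_i with independent noises v_i, so x is splitting.
rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y1 = 2.0 * x + rng.normal(size=n)
y2 = -1.0 * x + rng.normal(size=n)

cov = np.cov(np.vstack([y1, y2, x]))
cond_cov = cov[0, 1] - cov[0, 2] * cov[1, 2] / cov[2, 2]
print(cov[0, 1], cond_cov)
```

The unconditional covariance of y₁ and y₂ is far from zero, while the conditional one is (up to sampling error) exactly zero: all of the dependence between y₁ and y₂ is carried by x.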
Splitting variables are a generalization of the idea of a (Bayesian) sufficient statistic. In particular, a minimal 𝒴₁-measurable sufficient statistic is a "coarsest" function x = φ(y₁) which does exactly as well as the whole of y₁ for the purpose of predicting y₂. This property is generalized by property (ii) above. Note that a splitting variable need not necessarily be a function of (y₁, y₂). If this is the case the variable is called (y₁, y₂)-induced (the word "internal" is also used, but since it may cause confusion in the present context it will be avoided). In general x may even require a larger probability space than the one supporting (y₁, y₂).
A prototype abstract problem in stochastic realization is to construct splitting variables as functions of certain available random data: Given in a probability space {Ω, 𝒜, μ} random variables y₁, y₂ and perhaps also an exogenous independent random element w, find all minimal splitting variables x (or σ-algebras 𝒳) for (y₁, y₂) which are functions of (y₁, y₂, w) (resp. contained in 𝒴₁ ∨ 𝒴₂ ∨ σ(w)).
As we shall see later on, different classes of internal stochastic models can naturally be characterized in terms of a particular splitting property. Perhaps the most important is the characterization of state-space internal models, which we proceed to illustrate below. [The notations used are as follows: if y = {y(t)}_{t∈ℝ} is a stochastic process then 𝒴_t := σ{y(t)} is the present σ-algebra at time t, 𝒴_t⁻ = σ{y(s); s ≤ t} the past and 𝒴_t⁺ = σ{y(s); s ≥ t} the future at time t, while 𝒴 := 𝒴_t⁻ ∨ 𝒴_t⁺ denotes the complete time history of y.]

Definition. A stochastic dynamical system, z = (y, x), is in state-space form, or is a state-space system, if the internal variable x = {x(t)}_{t∈ℝ} has the following Markovian Splitting property,

𝒳_t⁻ ∨ 𝒴_t⁻ ⊥ 𝒳_t⁺ ∨ 𝒴_t⁺ | 𝒳_t,  ∀t∈ℝ  (MS)

The external process y is then called the output and x the state of the system. The system is stationary if (y, x) are jointly stationary and finite dimensional if x(t) takes values, for each t∈ℝ, in a finite dimensional space X.
From the definition of conditional independence reported at the beginning it is seen that 𝒴₁ ⊥ 𝒴₂ | 𝒳 implies 𝒜₁ ⊥ 𝒜₂ | 𝒳 for any sub-σ-algebras 𝒜₁ ⊂ 𝒴₁ and 𝒜₂ ⊂ 𝒴₂. It then follows from condition (MS) that

𝒴_t⁻ ⊥ 𝒴_t⁺ | 𝒳_t  (S)

and

𝒳_t⁻ ⊥ 𝒳_t⁺ | 𝒳_t  (M)

for all t∈ℝ. (S) is referred to as the splitting property of x while (M) is just the Markov property of the state process x. In the present continuous-time setting it can be shown that (S) and (M) imply, and hence are equivalent to, (MS), the Markovian Splitting property of the Definition, but in more general situations this may not be the case. The two properties (S) and (M) constitute the natural generalization to the stochastic framework of the deterministic properties of a state variable.
The following implication is an immediate consequence of (ii) or (iii),

𝒴₁ ⊥ 𝒴₂ | 𝒳 ⇒ 𝒴₁ ∩ 𝒴₂ ⊂ 𝒳

so that (S) implies 𝒴_t ⊂ 𝒳_t, and hence there is a Borel function h_t: X → ℝᵐ such that

y(t) = h_t(x(t))  (MR)
Thus the output of a state-space system is a "memoryless" function of the state. If the system is stationary, h can be chosen independent of t. Being completely pedantic about the terminology, one should say that (MR) is an internal model corresponding to the notion of a (continuous-time) state-space system. A model of the type (MR) is called a Markovian representation of the process y. A Markovian representation is called minimal if 𝒳_t is minimal splitting for each t∈ℝ and stationary if the state process is stationary.
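A discrete-time Gauss-Markov sketch (an illustrative analogue, not part of the text's continuous-time development) makes the splitting property tangible in its wide-sense form: for a state recursion x(k+1) = a·x(k) + b·e(k) with output y(k) = c·x(k), past and future outputs are conditionally uncorrelated given the present state.

```python
import numpy as np

# Wide-sense analogue of the splitting property (S): past output y(t-1) and
# future output y(t+1) of a Gauss-Markov state-space model are conditionally
# uncorrelated given the present state x(t):
#   Cov(y(t-1), y(t+1)) - Cov(y(t-1), x(t)) Var(x(t))^{-1} Cov(x(t), y(t+1)) = 0.
rng = np.random.default_rng(2)
a, b, c, n = 0.8, 1.0, 2.0, 300_000

x = rng.normal(0.0, b / np.sqrt(1 - a**2), size=n)   # stationary initial law
xs = []
for _ in range(3):                                    # collect x(t-1), x(t), x(t+1)
    xs.append(x)
    x = a * x + b * rng.normal(size=n)
y_past, x_now, y_fut = c * xs[0], xs[1], c * xs[2]

cov_pf = np.mean(y_past * y_fut)                      # unconditional: nonzero
cond = cov_pf - np.mean(y_past * x_now) * np.mean(x_now * y_fut) / np.mean(x_now**2)
print(cov_pf, cond)
```

The unconditional past-future covariance here is c²a²P ≈ 7.1 (with P = b²/(1−a²)), while the conditional one vanishes: the state "splits" past and future.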
A first abstract instance of the stochastic realization problem follows.

Problem. Given a process y = {y(t)}_{t∈ℝ}, describe the minimal Markovian representations of y and give a procedure for constructing the relative state processes. In particular, describe all minimal representations which are y-induced.

Of course whenever some structure is present in y, like stationarity, continuity, Gaussianness etc., it is natural to ask to construct realizations (i.e. Markovian representations of y) possessing the same type of structure. We shall not worry so much about these questions at this point. Our goal is rather to reveal the basic structure of splitting σ-algebras as a first step leading to the construction of the state.

Theorem. In the probability space {Ω, 𝒜, μ} where 𝒜 ⊃ 𝒴, let {S_t} and {S̄_t} be two flows of σ-algebras satisfying the following conditions:
1. {S_t} is increasing and {S̄_t} is decreasing with t.
2. S_t ∨ S̄_t = 𝒜 for all t.
3. S_t ⊃ 𝒴_t⁻ and S̄_t ⊃ 𝒴_t⁺ for all t.
4. S_t ⊥ S̄_t | S_t ∩ S̄_t for all t, i.e. {S_t} and {S̄_t} are perpendicularly intersecting.

Then 𝒳_t := S_t ∩ S̄_t is Markovian splitting and 𝒳_t⁻ = S_t, 𝒳_t⁺ = S̄_t. Viceversa, all Markovian splitting σ-algebras 𝒳_t are generated in this way, with S_t = 𝒴_t⁻ ∨ 𝒳_t, S̄_t = 𝒴_t⁺ ∨ 𝒳_t and 𝒜 = 𝒴 ∨ 𝒳.  □

This result is a generalization of the fundamental scattering representation of Markovian Splitting Subspaces first obtained by Lindquist-Picci in the wide-sense (linear) framework (the original reference being listed in [13]). A proof of this theorem is a result of joint work with J.H. van Schuppen and will appear in a separate publication.
In the stationary wide sense (or Gaussian) setting one can work on the Hilbert subspace of L²(Ω, 𝒜, μ) linearly generated by the components of the process under study, and the abstract procedure given before for constructing the state of a process y can be made quite explicit. Conditional independence reduces to conditional orthogonality of subspaces of a Hilbert space, written

H₁ ⊥ H₂ | X

with the equivalent meanings,

i'. ⟨λ₁ − E(λ₁ | X), λ₂ − E(λ₂ | X)⟩ = 0
ii'. E(λ₂ | H₁ ∨ X) = E(λ₂ | X)
iii'. E(λ₁ | H₂ ∨ X) = E(λ₁ | X)

where λᵢ∈Hᵢ, ⟨·,·⟩ denotes the inner product, E(·|X) is the orthogonal projection operator onto X and H ∨ X is the closed vector sum of subspaces. We shall use the standard notations H(y), H_t⁻(y) and H_t⁺(y) to denote the subspaces spanned by (the scalar components of) the process y, the past history of y up to time t and the future history of y after time t. There is a unitary group {U_t} of linear operators, called the shift group, which stationarily propagates the process y, U_t y_k(s) = y_k(t + s), k = 1, ..., m, and all time-dependent quantities in H(y) are assumed stationary, i.e. also propagated by the shift. The subspace X_t is a Markovian splitting subspace at time t if

H_t⁻(y) ∨ X_t⁻ ⊥ H_t⁺(y) ∨ X_t⁺ | X_t  (MS')

X_t is then both a splitting subspace and a Markovian subspace at time t. Actually everything can be looked at at a fixed time instant, say t = 0, and then propagated by {U_t}. For this reason the subscript t is usually omitted in the various subspaces. The analog of (MR) is, quite expectedly, the linear model

y(t) = Cx(t)  (MR')

where C is a fixed linear operator from the state space of the Markov process {x(t)} into ℝᵐ. The general result on construction of splitting σ-algebras given above specializes to
Proposition. Given a stationary Gaussian space H ⊃ H(y) with shift {U_t}, consider the intersection X = S ∩ S̄ of a pair (S, S̄) of subspaces of H satisfying the following conditions,
1'. U_t*S ⊂ S and U_t S̄ ⊂ S̄ for all t ≥ 0, i.e. S and S̄ are invariant subspaces for the left and right shift semigroups.
2'. S ∨ S̄ = H.
3'. S ⊃ H⁻(y), S̄ ⊃ H⁺(y).
4'. S and S̄ are perpendicularly intersecting.

Then X is a Markovian Splitting Subspace contained in H. Conversely, to any Markovian splitting subspace X associate S := H⁻(y) ∨ X and S̄ := H⁺(y) ∨ X. Then S and S̄ satisfy conditions 1' to 4' above with H := H(y) ∨ [∨_t X_t].  □
This characterization reduces the problem of constructing the state to finding scattering pairs, i.e. to finding invariant subspaces for the shift. The rest of the story is actually told in great detail in the survey paper [13] and we shall not repeat it here.
We shall rather return to the general idea of splitting variable and show how it may lead to quite diverse types of stochastic models. For concreteness we shall work completely in the (linear) wide-sense framework.
Let y₁, ..., y_N be jointly stationary wide sense processes taking values in ℝ^{m_k}, k = 1, ..., N. A (linear stationary) Factor-Analysis model is a set of equations of the following form

y₁ = A₁x + w₁
...
y_N = A_N x + w_N  (FA)

where x is an ℝⁿ-valued stationary process called the factor, w₁, ..., w_N are stationary processes mutually uncorrelated and uncorrelated with x, and the A_k are linear (deterministic) convolution operators. The model is said to be minimal if the number of factors (the scalar components of x), n, is as small as possible.
Models of this type have been around for a long time in the statistical literature (mostly in the static version [24]). The variable x is meant to capture or "explain" all of the cross-correlation between the y_j's, while the so-called noise terms w_k model the "purely endogenous" variability of the y_j's and have no bearing on the mutual correlations.
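In the static case this structure is easy to verify numerically. The sketch below uses illustrative assumptions (a single scalar factor and particular loading matrices, not taken from the text): it generates two observed blocks from one factor and checks that their cross-covariance is carried entirely by the factor, i.e. E[y₁y₂ᵀ] = A₁ Cov(x) A₂ᵀ, a matrix of rank at most n = 1.

```python
import numpy as np

# Static Factor-Analysis sketch: with y_k = A_k x + w_k and x, w_1, w_2 mutually
# uncorrelated, the cross-covariance between distinct blocks is
#   E[y1 y2^T] = A1 Cov(x) A2^T,
# of rank at most n (the number of factors); here n = 1 and Cov(x) = 1.
rng = np.random.default_rng(3)
n_samples = 500_000
x = rng.normal(size=n_samples)                        # single scalar factor
A1 = np.array([[1.0], [0.5]])                         # 2x1 loading matrices
A2 = np.array([[-2.0], [1.5]])
y1 = A1 @ x[None, :] + rng.normal(size=(2, n_samples))
y2 = A2 @ x[None, :] + rng.normal(size=(2, n_samples))

cross = (y1 @ y2.T) / n_samples                       # estimate of E[y1 y2^T]
print(cross)                                          # close to A1 @ A2.T
print(np.linalg.svd(cross, compute_uv=False))         # second singular value ~ 0
```

The noise covariances only affect the diagonal blocks E[y_k y_kᵀ]; the off-diagonal block is (numerically) rank one, as the factor model predicts.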
F.A. models, or Factor Systems (which we will not take the pain to define), can be viewed as internal models with the factor process x playing the role of internal variable.
Proposition. In the F.A. model,

E[yᵢ | H(y_{i₁}, ..., y_{i_N}, x); i_k ≠ i] = E[yᵢ | H(x)]

for all i = 1, ..., N, so that x is splitting for y₁, ..., y_N, i.e.

H(yᵢ) ⊥ H(y_j) | H(x)

for all i ≠ j.
Viceversa, let X be a splitting subspace for y₁, ..., y_N, i.e. let

H(yᵢ) ⊥ H(y_j) | X,  i, j = 1, ..., N,

then any set of generators, x = {x₁, ..., xₙ}, for X defines a F.A. model. If x is a minimal set, the operators A_k are uniquely determined. The model (FA) is minimal if and only if X = H(x) is a minimal splitting subspace.  □
The stochastic realization problem for Factor Analysis models is very difficult even in the seemingly simpler static case. The intrinsic non-uniqueness of the latent variable x has created a wave of mystery around these objects and their use has been discouraged. Obviously the non-uniqueness of the factor x, or better, of the minimal splitting subspaces X, is always present in parametrization problems of this nature. The problem here seems to be much more one of classification of the minimal factors than of achieving "identifiability". This viewpoint transpires also in Kalman's work [8, 9]. A fairly complete treatment of the "two blocks" case (N = 2) is given in [18, 19].

3 Noise Variables
In the engineering literature the term white noise is used to denote a stochastic process w = {w(t)} having the "canonical" property that its values {w(t)} are a maximally uncorrelated family of random variables. In discrete time "white noise" generally means a stationary zero mean process with independent or uncorrelated variables. In continuous time several choices are possible. We shall define it to mean a continuous process with stationary independent (or uncorrelated, in a wide-sense setting) increments. This is commonly called a Wiener (or wide-sense Wiener) process.
A basic chapter in modern Probability Theory studies the representability of processes y = {y(t)} as functionals of white noise, which we write as

y(t) = f_t(w)  (WNR)

w = {w(t)} being a white noise process. A representation of this type is very naturally interpreted as an internal model of y, w being the latent noise variable, or simply the noise variable of the model. White noise representations can be seen (if we ignore some technicalities arising in the nonlinear context) as input-output (deterministic) maps producing the trajectories of y as a "result" of the application of a particular white noise sample path at the input. There is a basic difference with the classical idea of an input-output map in system theory, however, in that w is not assigned arbitrarily from the outside (it is not a control variable) but is instead to be considered as a part of the model. This idea of a "hidden" white noise input plays an extremely important role in stochastic systems theory.
The idea and the study of white noise representations originates with the work of Wold [29] in 1938. The area has since then evolved greatly, the most recent research involving quite sophisticated martingale representation theory. In this section we shall mostly restrict to the case of wide-sense stationary ℝᵐ-valued processes. In this case f_t turns out to be linear and a large body of classical results applies. For a survey of the relevant literature one may consult the collection [5]. Before embarking on the analysis of the wide sense stationary case it is worth reviewing some general concepts. Let 𝒲_t⁺, 𝒲_t⁻ be the σ-algebras generated by the future and past increments of a Wiener process w and let 𝒲 = 𝒲_t⁻ ∨ 𝒲_t⁺. A functional of w is simply a 𝒲-measurable function with values in ℝᵐ. So a functional actually depends only on the equivalence class of Wiener processes defined on the underlying probability space and inducing the same 𝒲. A white noise representation of a process y is just a time-indexed family of functionals of a Wiener process, {f_t(w)}, satisfying (WNR) for all t∈ℝ. A representation is y-induced (or internal) if 𝒲 = 𝒴. In this case the noise process w can be constructed as the output of a "whitening filter" which transforms y into w. In general we have 𝒴 ⊂ 𝒲 properly, and one may need to enlarge the probability space to support the noise process of the representation. A representation (WNR) is called causal if for each t, f_t is 𝒲_t⁻-measurable, and anticausal if it is 𝒲_t⁺-measurable. Of course these are only two extreme possibilities and a continuum of intermediate measurability structures are in general possible. The Wiener process, as any white process, obeys the "0-1 law"

⋂_t 𝒲_t⁻ = ⋂_t 𝒲_t⁺ = 𝒩

where 𝒩 is the trivial σ-algebra {∅, Ω} mod μ. It follows that any process y admitting a causal or anticausal white noise representation must have trivial


"remote past" or "remote future" i.e.


(PND)
A process satisfying both 3 conditions (PND) will be called purely non-deterministic.
Pure non-determinism implies unpredictability both forward and back ward in
time since !lJJ; must then be increasing or, dually, !lJJt must be decreasing at
least at some instants of time. Precise conditions under whieh (PND) implies
white noise representation are known only in some particular cases. It is known
that nonstationary processes may require a white noise process with an infinite
number of independent components [3]. One says that the multiplicity of the
process is infinite in this case. Stationarity (or some variants of stationarity)
appears to be a natural structural conditions guaranteeing finite multiplicity.
This is evident in the wide-sense setting. The action of the shift group {U_t} on the Hilbert space H(y), generated by a wide-sense stationary process {y(t)}, induces a natural module structure on H(y) with ring of scalars 𝔹, the bounded Borel functions on ℝ. Let F be the infinitesimal generator of {U_t}, i.e. U_t = exp(iFt), and let φ∈𝔹; then φ·η, for an arbitrary η∈H(y), is first defined for trigonometric polynomials, say φ(λ) = Σ_k α_k e^{iλt_k}, as φ(F)η = Σ_k α_k U_{t_k} η, and then extended to the whole of 𝔹 (trigonometric polynomials are dense in 𝔹). The multiplicity of y is then defined as the multiplicity of H(y) as a 𝔹-module, i.e. as the cardinality of a minimal set of generators. Since H(y) is always finitely generated as, by definition, y_k(0), k = 1, ..., m, are generators, the multiplicity of a wide-sense stationary process is always finite and not larger than its dimension m.
Now let S, respectively S̄, be an invariant subspace for the left (resp. right) shift and let {S_t} and {S̄_t} be the increasing and, respectively, decreasing flows of subspaces obtained by shifting, i.e. S_t = U_t S and similarly for S̄_t. S and S̄ are full-range in H if S_∞ = H or S̄_{−∞} = H.
We shall call S and S̄ purely non-deterministic if

⋂_t S_t = {0},  ⋂_t S̄_t = {0}.  (PND')

The continuous-time version of the Wold representation theorem states that if S or S̄ are PND' then there exist p-dimensional Wiener processes, w(t) = [w₁(t), ..., w_p(t)]ᵀ and w̄(t) = [w̄₁(t), ..., w̄_p(t)]ᵀ, such that S = H⁻(dw) or, respectively, S̄ = H⁺(dw̄), i.e. S (resp. S̄) are the past (future) spaces of a white noise.
A wide-sense stationary process y will be called (linearly) purely non-deterministic if H⁻(y) and H⁺(y) are PND'. It follows from the Wold representation that there are white noise processes such that H(y) = H(dw); in other words, the module H(y) admits white noise generators. This means that

³ There are examples showing that one of these two conditions does not necessarily imply the other.


every element η∈H(y) can be represented as

η = ∫_{−∞}^{+∞} φ(s) dw(s),  (R)

where φ = [φ₁, ..., φ_p] is a row-vector function, belonging to L²(ℝ; ℝᵖ), of "coefficients" of the representation. The number p ≤ m is the multiplicity of y. The increment space H(dw) is precisely populated by random variables of the form (R). There are infinitely many ways of choosing white noise generators of y, and even more so if we are allowed to build w out of a larger space than H(y).
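The non-uniqueness of white noise generators is already visible in the simplest discrete-time analogue (an illustrative aside, not the text's continuous-time setting): a minimum-phase and a maximum-phase moving average, driven by different unit white noises, have identical covariance functions and hence describe the same wide-sense process.

```python
# Two different moving-average "white noise representations",
#   y(k) = 1.0*w(k) + 0.5*w(k-1)   and   y(k) = 0.5*w(k) + 1.0*w(k-1),
# (minimum- and maximum-phase factors of the same spectrum) yield the same
# covariance function, hence the same wide-sense process.
def ma_covariances(h):
    """Covariances E[y(k) y(k+j)] at lags 0, 1, >=2 for y = h[0] w(k) + h[1] w(k-1),
    with w a unit-variance white noise."""
    return [h[0]**2 + h[1]**2, h[0] * h[1], 0.0]

print(ma_covariances([1.0, 0.5]))   # [1.25, 0.5, 0.0]
print(ma_covariances([0.5, 1.0]))   # [1.25, 0.5, 0.0]
```

The two driving noises differ (one is the innovation of y, the other is not), yet externally the models are indistinguishable.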
Proposition 1. Let v be an r-dimensional exogenous purely non-deterministic process independent of y and jointly stationary, and let H := H(y) ∨ H(v) have multiplicity p. Then to each white noise generator w of H there corresponds an m × p matrix-valued function W ∈ L²(ℝ; ℝ^{m×p}) which represents y as

y(t) = ∫_{−∞}^{+∞} W(t − s) dw(s)  (WNR')

The Fourier transform Ŵ of W is then an m × p solution of the Spectral Factorization equation,

Ŵ(iω) Ŵ(iω)* = Φ(ω)  (SF)

where Φ is the m × m spectral density matrix of y.
Viceversa, let Ŵ be an m × p solution of (SF); then there exists a generating white noise w for H for which (WNR') is valid. The correspondence Ŵ ↔ w is one to one provided it is interpreted as a correspondence between equivalence classes defined modulo multiplication by an arbitrary p × p orthogonal matrix.  □
The construction of the white noise process generating H and corresponding to the spectral factor W in the proposition is a slight generalization of the well-known idea of the whitening filter of Bode and Shannon and is explained in [14]. In a sense the above result shows that for wide sense stationary processes the computation of white noise representations reduces to spectral factorization. We shall further discuss only causal and anticausal representations.
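For a scalar rational density the reduction to spectral factorization can be carried out explicitly by root selection. The sketch below uses the illustrative density Φ(ω) = (ω² + 1)/(ω² + 4) (a choice made here for illustration, not taken from the text): substituting ω² → −s² and keeping the left-half-plane roots yields the analytic (causal) factor W(s) = (s + 1)/(s + 2), and |W(iω)|² reproduces Φ(ω).

```python
import numpy as np

# Scalar spectral factorization by root selection, for the illustrative density
#   Phi(omega) = (omega^2 + 1) / (omega^2 + 4).
# Substituting omega^2 -> -s^2 gives numerator 1 - s^2 and denominator 4 - s^2;
# the left-half-plane roots give the analytic (causal) spectral factor.
roots_num = np.roots([-1.0, 0.0, 1.0])            # roots of 1 - s^2: +1 and -1
roots_den = np.roots([-1.0, 0.0, 4.0])            # roots of 4 - s^2: +2 and -2
num_s = np.poly(roots_num[roots_num.real < 0])    # monic (s + 1)
den_s = np.poly(roots_den[roots_den.real < 0])    # monic (s + 2)

def W(s):
    """Candidate analytic spectral factor W(s) = (s + 1)/(s + 2)."""
    return np.polyval(num_s, s) / np.polyval(den_s, s)

omega = np.linspace(-10.0, 10.0, 201)
phi = (omega**2 + 1.0) / (omega**2 + 4.0)
phi_from_factor = np.abs(W(1j * omega)) ** 2
print(np.max(np.abs(phi - phi_from_factor)))      # ~ 0 up to rounding
```

Picking right-half-plane roots instead would give an equally valid co-analytic (anticausal) factor with the same modulus on the imaginary axis.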
A matrix function W with entries in L²(ℝ) is called causal (resp. anticausal) if W(t) = 0 for t < 0 (resp. for t > 0). It is well known that causal (anticausal) L²-functions have Fourier transforms which can be extended as analytic functions to the right (resp. left) of the imaginary axis.
Proposition 2. Let H ⊃ H(y) be fixed as in the previous proposition. Then,

i. To any full-range purely non-deterministic left-invariant subspace S of H containing the past H⁻(y) there corresponds a causal white noise representation

y(t) = ∫_{−∞}^{+∞} W(t − s) dw(s)

where w is the unique causal generator of S and W a causal matrix function in L²(ℝ; ℝ^{m×p}). The Fourier transform Ŵ is an analytic solution of the spectral factorization problem (SF). Viceversa, to any analytic m × p solution of (SF) there corresponds a white noise w such that H⁻(dw) := S is a U_t-invariant PND' full-range subspace of H containing H⁻(y).

ii. Dually, to any full-range purely non-deterministic right-invariant subspace S̄ of H containing the future H⁺(y), there corresponds an anticausal white noise representation

y(t) = ∫_{−∞}^{+∞} W̄(t − s) dw̄(s)

where w̄ is the unique anticausal generator of S̄ and W̄ an anticausal matrix function in L²(ℝ; ℝ^{m×p}). The Fourier transform of W̄ is a co-analytic solution of the spectral factorization problem (SF). Viceversa, to any co-analytic m × p solution of (SF) there corresponds a white noise w̄ such that H⁺(dw̄) := S̄ is a U_t-invariant subspace of H with the properties listed above.  □
The two statements above don't claim great originality. Their purpose is to stress that the causal form of white noise representation, which is about the only one looked at in the literature, has nothing so special to deserve being considered the only one. There is no built-in causality nor privileged direction of time in the definition of a dynamical system, and this makes representations with any causality structure equally valid.
Proposition 2 is also a preparatory result for studying white noise representations of state-space systems. In order to do this we first need to guarantee representability of the state process. We recall that a Markovian splitting subspace X is purely non-deterministic (or proper in the terminology of [13]) if the past X_t⁻ and the future X_t⁺ satisfy condition (PND') above. We also name a pair of subspaces (S, S̄) satisfying conditions 1' to 4' listed in the Proposition of Sect. 2 and the (PND') condition a scattering pair.

Theorem 3 [13]. Any purely non-deterministic Markovian splitting subspace X has a representation as X = S ∩ S̄ where

S = H⁻(y) ∨ X,  S̄ = H⁺(y) ∨ X  (SR)

form a scattering pair spanning H := H⁻(y) ∨ X ∨ H⁺(y). Viceversa, given any H ⊃ H(y) with the properties of Propositions 1 and 2 above and given any scattering pair (S, S̄) in H, the intersection S ∩ S̄ is a Markovian splitting subspace and S, S̄ have the representation (SR).
The pair (S, S̄) is called the Scattering Representation of X.
We shall now restrict further to finite dimensional Markovian splitting subspaces and choose a basis x(0) = [x₁(0), ..., x_n(0)]ᵀ in each X. The n-dimensional process x(t) := U_t x(0), t∈ℝ, will be wide-sense stationary and Markov.


Let (S, S̄) be the scattering representation of X. Coordinate-wise, this means that there are two white noise processes w, w̄ such that x(t)∈H_t⁻(dw) and x(t)∈H_t⁺(dw̄) hold simultaneously. From general representation theorems for Markov processes [4], it follows that x(t) admits two simultaneous representations as a solution of a "stochastic differential equation",

dx(t) = Ax(t)dt + Bdw(t)  (FR)

dx(t) = Āx(t)dt + B̄dw̄(t)  (BR)

These are in fact differential versions of the causal and anticausal white noise representations of x corresponding (via Proposition 2) to the scattering pair (S, S̄). The matrix A is asymptotically stable, i.e. Re[λ(A)] < 0, while Ā is totally unstable, i.e. Re[λ(Ā)] > 0. The pair (A, B) must be a reachable pair (since x(0), being a basis, has a positive definite covariance) and similarly for (Ā, B̄). Note that (FR), resp. (BR), must then be integrated forward (resp. backward) in time in order to get a stationary process, i.e.

x(t) = ∫_{−∞}^{t} e^{A(t−s)} B dw(s),  x(t) = ∫_{t}^{+∞} e^{Ā(t−s)} B̄ dw̄(s)

For this reason (FR) is called a forward and (BR) a backward representation of x. Also, the denomination "differential equations" is a bit ambiguous, since there are no arbitrary initial conditions associated with the representations. They may actually be regarded as boundary value systems, (FR) being associated with the fixed boundary condition x(−∞) = 0 and (BR) with x(+∞) = 0. But there is more. Since (S, S̄) intersect perpendicularly, there is a very precise relation between w and w̄, and this in turn reflects into a relation between the parameters of the two representations (FR) and (BR).

Proposition 4 [12]. The forward and backward representations of the state process x are related by

Ā = −PAᵀP⁻¹,  B̄ = B,  AP + PAᵀ + BBᵀ = 0

where P is the stationary state covariance. Moreover, the noise process w̄ is obtained by filtering w with an all-pass transfer function.
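The relations of Proposition 4 are easy to check numerically. The sketch below uses an illustrative stable, reachable pair (A, B) (an assumption made for the example, not taken from the text): it solves the Lyapunov equation for the state covariance P via its Kronecker-product vectorization and verifies that Ā = −PAᵀP⁻¹ is totally unstable.

```python
import numpy as np

# Numerical check of Proposition 4: from a stable, reachable forward pair (A, B),
# solve A P + P A^T + B B^T = 0 for the state covariance P, then form the
# backward generator Abar = -P A^T P^{-1}.
A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])
B = np.array([[1.0],
              [1.0]])
n = A.shape[0]
I = np.eye(n)

# vec(A P + P A^T) = (kron(I, A) + kron(A, I)) vec(P), with column-major vec
vecP = np.linalg.solve(np.kron(I, A) + np.kron(A, I), -(B @ B.T).flatten(order="F"))
P = vecP.reshape(n, n, order="F")
Abar = -P @ A.T @ np.linalg.inv(P)

print(np.linalg.eigvals(A).real)                   # negative: A stable
print(np.linalg.eigvals(Abar).real)                # positive: Abar totally unstable
print(np.max(np.abs(A @ P + P @ A.T + B @ B.T)))   # ~ 0: Lyapunov residual
```

Since Ā is similar to −Aᵀ, its eigenvalues are the mirror images of those of A, which is exactly the "totally unstable" property; scipy's `linalg.solve_continuous_lyapunov` would replace the Kronecker solve in larger examples.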

By coupling (FR) or (BR) with the representation y(t) = Cx(t), implied by the Markovian splitting property, one obtains, quite expectedly, "state space" models describing y as the output of a linear dynamical system driven by white noise. The peculiar fact, however, is that the non-causal backward representation, describing the (same) process (y, x), is quite different from the trivial time-reversal transformation which would apply in the deterministic case.
This diversity is typical of stochastic models. In general stochastic evolutions are irreversible, i.e. the same variable is described by different dynamic equations forward and backwards in time.


Conclusion
Stochastic models should be useful to describe physical systems, but stochastic modelling done on the sole basis of "physical intuition" may lead to nonsense. A variety of linear models with random noise inputs (possibly modelled as filtered white noise) are used in identification, quite often introduced on the basis of "physical modelling", like measurement errors etc. It may happen that these models either give a poor fit or are not "identifiable" on the basis of the observed data. What has been exposed before should convince the reader that random inputs are just internal variables and have no more "physical" meaning than any other auxiliary variable (say the state in deterministic systems). What matters is the external (observable) process being modelled, and it may just happen that the "physical" model describes in reality too narrow a class of stochastic processes to fit the data reasonably. Also, quite likely, a physical model is not a "canonical" representation of the external process being described. In simple cases the standard practice of converting to the "innovation representation" works, but this may turn out to be highly non-trivial in more general situations.
References
[1] Akaike H. (1975) Markovian representation of stochastic processes by means of canonical variables, SIAM J Control, 13, pp 165-173
[2] Anderson B.D.O. (1969) The inverse problem of stationary covariance generation, J Stat Phys, 1, pp 133-147
[3] Cramér H. (1960) On some Classes of Non-Stationary Stochastic Processes, Proc IV Berkeley Symp Statistics and Appl Probability, II, pp 57-78
[4] Doob J.L. (1953) Stochastic Processes, Wiley
[5] Ephremides A., Thomas J.B. (1973) Random Processes, Multiplicity Theory and Canonical Decompositions, Dowden Hutchinson & Ross
[6] Faurre P. (1973) Réalisations markoviennes de processus aléatoires stationnaires, INRIA Report no 13, INRIA, Le Chesnay
[7] Kalman R.E. (1965) Linear stochastic filtering: reappraisal and outlook, Proc Symp System Theory, Polytechnic Inst of Brooklyn, pp 197-205
[8] Kalman R.E. (1982) Identification from real data, in Current Developments in the Interface: Economics, Econometrics and Mathematics (M. Hazewinkel and A.H.G. Rinnooy Kan eds), Reidel, Dordrecht, pp 161-196
[9] Kalman R.E. (1982) System Identification from Noisy Data, in Dynamical Systems II (A.R. Bednarek and L. Cesari eds), Academic Press, pp 135-164
[10] Lewis J.T., Thomas L.C. (1974) How to make a Heat Bath, in Functional Integration (A.M. Arthurs ed), Clarendon Press, Oxford, pp 97-123
[11] Lewis J.T., Maassen H. (1984) Hamiltonian models of classical and quantum stochastic processes, in Quantum Probability and Applications (A. Frigerio and V. Gorini eds), Springer Lecture Notes in Mathematics, 1055, Springer Verlag
[12] Lindquist A., Picci G. (1979) On the Stochastic Realization Problem, SIAM J Control Optimiz, 17, pp 365-389
[13] Lindquist A., Picci G. (1985) Realization Theory for Multivariate Stationary Gaussian Processes, SIAM J Control Optimiz, 23, pp 809-857
[14] Lindquist A., Picci G. (1990) A geometric approach to modeling and estimation of linear stochastic systems, to appear in Journal of Math Systems Estimation and Control
[15] Laub A. et al. (1988) MATLAB Control System Toolbox, The MathWorks Inc
[16] Picci G. (1976) Stochastic realization of Gaussian processes, Proc IEEE, 64, pp 112-122

4 The Realization Problem-Stochastic Systems


[17] Picci G. (1986) Applications of Stochastic Realization Theory to a Fundamental Problem of Statistical Physics, in Modelling, Identification and Robust Control (C.I. Byrnes and A. Lindquist eds), North Holland
[18] Picci G., Pinzoni S. (1986) Dynamic Factor-Analysis Models for Stationary Processes, IMA J Math Control Inform, 3, pp 185-210
[19] Picci G. (1989) Parametrization of Factor-Analysis Models, Journal of Econometrics, 41, pp 17-38
[20] Picci G. (1989) Aggregation of Linear Systems in a Completely Deterministic Framework, in Three Decades of Mathematical System Theory (H. Nijmeijer and J.M. Schumacher eds), Springer Verlag Lect Notes in Control and Information Sciences, 135, pp 358-381
[21] Rozanov Y.A. (1967) Stationary Random Processes, Holden-Day
[22] Ruckebusch G. (1976) Représentations markoviennes de processus gaussiens stationnaires, CR Acad Sci Paris, Série A, 282, pp 649-651
[23] Ruckebusch G. (1980) Théorie géométrique de la représentation markovienne, Ann Inst Henri Poincaré, XVI, 3, pp 225-297
[24] Taylor T.J.S., Pavon M. (1987) On the Nonlinear Stochastic Realization Problem, Stochastics, 26, pp 65-79
[25] Van Schuppen J.H. (1986) Stochastic Realization Problems Motivated by Econometric Modeling, in Modelling, Identification and Robust Control (C.I. Byrnes and A. Lindquist eds), North Holland, pp 259-276
[26] Willems J.C. (1979) System theoretic models for the analysis of physical systems, Ricerche di Automatica, 10, pp 71-104
[27] Willems J.C. (1983) Input-output and state-space representations of finite-dimensional linear time-invariant systems, Lin Alg Appl, 50, pp 581-608
[28] Willems J.C. (1986) From time series to linear systems. Part I: Finite Dimensional Linear Time Invariant Systems, Automatica, 22, pp 561-580
[29] Willems J.C. (1989) Models for Dynamics, Dynamics Reported, vol 2, pp 171-269
[30] Wold H. (1938) A Study in the Analysis of Stationary Time Series, Almqvist & Wiksells, Uppsala

Chapter 5

Linear System Theory: Module Theory

Algebraic Methods in System Theory


P. A. Fuhrmann¹

Department of Mathematics, Ben-Gurion University of the Negev, Beer Sheva, 84120 Israel

The role of algebraic methods in general and module theory in particular as a unifying framework
in the theory of linear systems will be surveyed. We will focus on polynomial algebra, realization
theory, stability and applications of continued fractions in this area.

1 Introduction
A Persian rug is valued for its beauty, the originality of its design, the harmony
of its colouring and the density of the knots. It turns out that a valuable rug
is also very practical, and connoisseurs value its usefulness. An area of science
is not that different. We are guided by a combination of different criteria. Even
if applications are our motivation, it is wise to be guided by aesthetic
considerations, to realize precisely the structural properties, and to evaluate
how any particular part of science is related to the main body. One of the
landmarks of a successful scientific domain is its rich connectivity to other areas.
Isolation implies low oxygen supply, which rapidly leads to sterility and decay.
It seems necessary to resort to some kind of imagery, as for the young
newcomers to our field, exposed very early to a bombardment of highly technical
and sophisticated mathematics, it is hard to realize how the field of systems and
control reached this point.
Clearly system theory is the result of a joint effort of many scientists, going
back at least to J.C. Maxwell in the preceding century. However, the time
things started to move from the vague to the precise, from the computational
to the conceptual, i.e. from problem solving to the creation of a theory, is clearly
the late fifties and early sixties, and in this process the dominant figure was, no
doubt, R.E. Kalman. His contributions were, from the technical point of view,
quickly superseded by people better trained in their respective fields. However, in
no case was the technical contribution the important part. The importance
lay in the tremendous insight into the heart of the problem. Thus it was with
optimal control and the fundamental new point of view in filtering. The same
is true of the conceptual foundation of the theory of finite dimensional linear
¹ Earl Katz Family Chair in Algebraic System Theory.


systems. Thus the focusing on controllability as a fundamental new concept,
its relation to realization theory, the isomorphism theorem and the grand
unification of the external and internal viewpoints in system theory through
the use of module theory.
I have already in the past, in Fuhrmann [1976], indicated my debt to
Kalman's work in system theory. I have profited mostly from the process of
algebraization that Kalman brought to the field, and it is to this area that I
would like to confine my contribution to this volume.
Algebra historically was almost synonymous with the theory of equations.
Linear algebra was concerned mostly with systems of linear equations and the
formal manipulation of matrices and determinants. In the twentieth century
algebra underwent a major revolution, and the essence of the revolution was
the change of focus from the special to the general, the change of emphasis
from problem solving to structure and conceptual understanding.
The influence of Hilbert and E. Noether in this process was all pervasive. The
change was announced and widely recognized with the appearance of van der
Waerden's [1931] treatise. Not by accident did module theory play an important
role in this development. In fact here, I believe for the first time, was a treatment
of the structure of linear transformations based on the naturally induced module
structure.
The module theoretic approach to linear algebra, though it has no competition
in terms of power and elegance, has still not found general acceptance in
undergraduate mathematical, not to say engineering, curricula. Thus the importation
by Kalman of module theory as a working tool in system theory was nothing
short of revolutionary. The naturality of the concept in this setup accounts, no
doubt, for the great success it had in providing a unifying point of view.
In a short contribution it is difficult to do justice to the many different ways
that module theory can be applied to problems of systems and control. So,
rather than give a systematic account (at this stage even a reasonably sized book
may not be up to the task), I will present a series of vignettes which, if they
arouse the reader's interest, can be used as starting points for further reading
and study. Each of the topics chosen is related in one way or another to previous
work by Kalman.

2 Modular Arithmetic
In Kalman [1969] the use of algebra as a computational tool, extremely well
suited for computers, is stressed. This point really raises the question of how
abstract one can make a theory while retaining the ability for easy computability.
Clearly this is related to what the most convenient representations of abstract
objects are. The situation is analogous to the case of linear transformations in finite
dimensional vector spaces and their representations as matrices. However, even
in this case, this well established representation is by no means the only one,


and, the point I will try to make, not even the most convenient, certainly not
the most illuminating from the structural point of view. The importance of
polynomial algebra in this connection was realized early by Kalman, not
only as a key to structure via the use of polynomial modules, but also as a
convenient computational tool, namely through the use of modular arithmetic.
This of course was possible only if one restricted oneself to the scalar case, i.e. to single
input/single output systems. Simultaneously the systematic use of polynomial
matrices in system theory was begun by Rosenbrock [1970]. While this turned
out to be an extremely powerful tool, it paid less attention to the fundamental
conceptual underpinnings of the subject.
I was in a lucky position, see Fuhrmann [1976], to be able to realize that
Kalman's emphasis on the abstract theory of modules, Rosenbrock's use of
coprime factorizations and polynomial matrices, the generally available state
space methods, as well as the theory of infinite dimensional systems beginning
its development at that time, were all different facets that could be unified in
one approach. Specifically one goes from the abstract theory of modules to
certain polynomial representations. This leads to a modular arithmetic on
polynomial vectors and matrices. Finally, looking at the new objects, namely
polynomial models, any choice of linear basis leads to matrix or state space
representations. From this point of view the use of polynomial models is a
balancing act between the abstract and the concrete.
From an external point of view regarding systems, an input/output map
was defined by Kalman as a module homomorphism, over the ring of
polynomials, between appropriately defined spaces of input and output functions.
Of course this implies a linear context, and the module property implies time
invariance. The causality property can be introduced by requiring the invariance
of an appropriate submodule under the input/output map. Alternatively one
can implicitly introduce causality by considering the restricted input/output map

f: Ω → Γ     (1)

where Ω is the space of past input functions and Γ the space of future output
functions. The assumptions on the various objects appearing have a profound
influence on the development of the theory. Kalman's choice was Ω = F^m[z]
and Γ = z^{-1}F^p[[z^{-1}]]. We shall study these and related objects in some detail
and return to Kalman's abstract realization in Sect. 5. It is convenient to
take a slightly more general setting.
Let F denote an arbitrary field. We will use the following notation. F[z]
will denote the ring of polynomials over F, F(z) the field of rational functions
and F[[z^{-1}]] the ring of formal power series in z^{-1} with coefficients in F, i.e.
if f ∈ F[[z^{-1}]] then f(z) = Σ_{j=0}^∞ f_j z^{-j}. By z^{-1}F[[z^{-1}]] we denote the subspace of
F[[z^{-1}]] consisting of all power series with vanishing constant term, and by
F((z^{-1})) we denote the field of truncated Laurent series, namely of series of the
form

h(z) = Σ_{j=-∞}^{n(h)} h_j z^j   with n(h) ∈ Z


An element h ∈ F((z^{-1})) is called rational if there exists a nonzero polynomial q
such that qh ∈ F[z]. Thus every rational function has a representation of the
form h = p/q with p and q polynomials. An invertible element of F[[z^{-1}]] will
be called a bicausal isomorphism.
In much the same way we can introduce the corresponding spaces
R((z^{-1})), R[z] and R[[z^{-1}]], where R is an arbitrary ring. Even more generally,
for a left R-module M, we define M((z^{-1})), M[z] and M[[z^{-1}]] as before. While
addition is defined in M((z^{-1})) as usual, multiplication is not. But the
multiplication of elements of R((z^{-1})) and M((z^{-1})) is defined. M((z^{-1})) becomes
a module over various rings including R, R[z], R((z^{-1})) and R[[z^{-1}]]. We will
consider here only the case of M = F^m. Various module structures can be
introduced. In particular F^m[z] is an F[z]-submodule of F^m((z^{-1})), and as F[z]-
modules we have the following short exact sequence of module homomorphisms

0 → F^m[z] →^j F^m((z^{-1})) →^π F^m((z^{-1}))/F^m[z] → 0

with j the embedding of F^m[z] into F^m((z^{-1})) and π the canonical projection
onto the quotient module.
Elements of F^m((z^{-1}))/F^m[z] are equivalence classes, and two elements are
in the same equivalence class if and only if they differ in their polynomial terms
only. Thus a natural choice of element in each equivalence class is the one
element whose polynomial terms are all zero. This leads to the identification
of F^m((z^{-1}))/F^m[z] with z^{-1}F^m[[z^{-1}]]. With this identification we denote by
π_− the canonical projection, i.e.

π_− Σ_{j=-∞}^{n} f_j z^j = Σ_{j=-∞}^{-1} f_j z^j
Since Ker π_− = F^m[z] and Im π_− = z^{-1}F^m[[z^{-1}]] we have a direct sum
decomposition, over F,

F^m((z^{-1})) = F^m[z] ⊕ z^{-1}F^m[[z^{-1}]]

We will denote by π_+ the complementary projection onto F^m[z], i.e.

π_+ = I − π_−

or equivalently

π_+ Σ_j f_j z^j = Σ_{j≥0} f_j z^j
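The decomposition above is easy to mimic computationally; a small sketch (plain Python, with a truncated Laurent series stored as a power-to-coefficient map, an encoding of our own choosing):

```python
# A truncated Laurent series, stored as {power of z: coefficient}
f = {3: 1.0, 1: 2.0, 0: -1.0, -1: 4.0, -2: 0.5}

pi_plus = {p: c for p, c in f.items() if p >= 0}   # polynomial part, in F[z]
pi_minus = {p: c for p, c in f.items() if p < 0}   # strictly proper part

# pi_+ + pi_- is the identity: the two pieces recover f and do not overlap,
# mirroring the direct sum F((z^{-1})) = F[z] (+) z^{-1}F[[z^{-1}]]
assert {**pi_plus, **pi_minus} == f
assert not (pi_plus.keys() & pi_minus.keys())
```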

At this point we will introduce a special operator, namely the shift operator
S defined by

(Sf)(z) = zf(z)   for f ∈ F^m((z^{-1}))

Clearly S is a linear map that is invertible and S^{-1}f = z^{-1}f. The name
derives from the representation of the map in terms of the sequence of coefficients.
Indeed if f(z) = Σ f_j z^j and we make the correspondence

f(z) ↔ (..., f_2, f_1, f_0, f_{-1}, ...)

where f_0 occupies the position of the coefficient of z^0, then

Sf(z) ↔ (..., f_1, f_0, f_{-1}, f_{-2}, ...)

with f_{-1} now in that position.

Since F^m[z] is a submodule, in particular SF^m[z] ⊂ F^m[z], i.e. it is an
S-invariant subspace. Therefore we can define the restriction of S to F^m[z] and
denote it by S_+, that is

S_+ = S|F^m[z]

Similarly S induces an F-linear map S_− in z^{-1}F^m[[z^{-1}]] defined by

S_− h = π_− zh   for h ∈ z^{-1}F^m[[z^{-1}]]

We note that S_+ is injective but not surjective, whereas S_− is surjective but not
injective.
The map S_− has many eigenfunctions. In fact, each α in F is an eigenvalue
of S_−, with the eigenfunctions given by

v(z) = (z − α)^{-1}ξ,   ξ ∈ F^m

In the same way one can show the existence of generalized eigenfunctions of
arbitrary order. Contrary to this richness of eigenfunctions, the shift S_+ does
not have any eigenfunctions. This indicates that the spectral
structure of the shift S_−, with finite multiplicity, is rich enough to model all
finite dimensional linear transformations, up to similarity transformations. This
is indeed the case, as the next theorem, proved originally by Rota [1960] in a
different setting, shows.
Theorem 2.1 (Rota). Let A be a linear transformation in a finite dimensional vector
space over F, which we take to be F^m. Then A is isomorphic to S_− restricted
to a finite dimensional S_−-invariant subspace of z^{-1}F^m[[z^{-1}]].

Proof. Define the set L by L = {(zI − A)^{-1}ξ | ξ ∈ F^m}. Since

(zI − A)^{-1}ξ = Σ_{j=0}^∞ A^j ξ z^{-(j+1)}

L is a subspace of z^{-1}F^m[[z^{-1}]]. Moreover, since

S_−(zI − A)^{-1}ξ = π_− z(zI − A)^{-1}ξ = π_−(zI − A + A)(zI − A)^{-1}ξ = (zI − A)^{-1}Aξ

is in L, it is clearly S_−-invariant. Finally, if φ: F^m → L is defined by
φξ = (zI − A)^{-1}ξ, then φ is injective and φA = S_−φ, so A is isomorphic to S_−|L
with L = Im φ.   □
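The intertwining φA = S_−φ in the proof is easy to check numerically by truncating the series; a small sketch (numpy assumed; the matrix A and vector ξ are illustrative):

```python
import numpy as np

# Truncation order for the formal series sum_j (A^j xi) z^{-(j+1)}
N = 6
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
xi = np.array([1.0, 1.0])

def phi(v):
    """First N coefficients (v, Av, A^2 v, ...) of (zI - A)^{-1} v."""
    return [np.linalg.matrix_power(A, j) @ v for j in range(N)]

# S_- acts on the coefficient sequence (h_1, h_2, ...) of h = sum h_j z^{-j}
# by dropping the leading coefficient
S_minus = lambda seq: seq[1:]

# phi A = S_- phi, checked on the truncated sequences
left = S_minus(phi(xi))
right = phi(A @ xi)[:N - 1]
assert all(np.allclose(a, b) for a, b in zip(left, right))
```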
Considerations of duality suggest that, rather than use the restriction of S_−
to submodules as models for finite dimensional linear transformations, we can
use compressions of S_+ to quotient modules of F^m[z]. A compression of S_+ is
the map induced by S_+ in a quotient module. This we proceed to show directly.
The result is dual to Rota's theorem.

Theorem 2.2. Let A be a linear transformation in the finite dimensional vector
space F^m over the field F. Then A is isomorphic to the map induced by S_+ in a
quotient module of F^m[z].

Proof. Define a map Φ: F^m[z] → F^m by

Φ Σ_{j=0}^n v_j z^j = Σ_{j=0}^n A^j v_j

Then clearly Φ is surjective and satisfies

ΦS_+ = AΦ

If an F[z]-module structure is defined on F^m by

p·v = p(A)v   for p ∈ F[z]

then Φ is actually a surjective module homomorphism. Thus we have the
F[z]-module isomorphism

F^m ≃ F^m[z]/Ker Φ

and we are done.   □

It is of interest to obtain an explicit description of Ker Φ. We will show that
Ker Φ = (zI − A)F^m[z]. To this end let v ∈ (zI − A)F^m[z]; then

v(z) = (zI − A)w(z) = (zI − A) Σ_{j=0}^n w_j z^j = Σ_{j=0}^n w_j z^{j+1} − Σ_{j=0}^n A w_j z^j

So

Φ(v) = Σ_{j=0}^n A^{j+1} w_j − Σ_{j=0}^n A^j A w_j = 0

Conversely, let v ∈ Ker Φ; then if v(z) = Σ_{j=0}^n v_j z^j, we have Σ_{j=0}^n A^j v_j = 0. This
implies that

v(z) = Σ_{j=0}^n v_j z^j − Σ_{j=0}^n A^j v_j = Σ_{j=0}^n (z^j I − A^j) v_j

But

z^j I − A^j = (zI − A)(z^{j-1} I + z^{j-2} A + ... + A^{j-1})

and so v(z) = (zI − A)w(z) for some polynomial vector w.
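That (zI − A)F^m[z] ⊂ Ker Φ can be verified directly on random data; a sketch (numpy assumed, with the telescoping computation above spelled out on coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# A random polynomial vector w(z) = w_0 + w_1 z + w_2 z^2 + w_3 z^3 in F^3[z]
w = [rng.standard_normal(3) for _ in range(4)]

# Coefficients of v(z) = (zI - A) w(z):
#   v_0 = -A w_0,  v_j = w_{j-1} - A w_j (1 <= j <= 3),  v_4 = w_3
v = [-A @ w[0]] + [w[j - 1] - A @ w[j] for j in range(1, 4)] + [w[3]]

# Phi(v) = sum_j A^j v_j telescopes to zero, so v lies in Ker Phi
Phi_v = sum(np.linalg.matrix_power(A, j) @ v[j] for j in range(len(v)))
assert np.allclose(Phi_v, 0)
```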


The moral of this is that in the quotient module Fm[z]jKer (/J the submodule
Ker (/J has a niee representation namely Ker (/J= (zl - A)Fm[z] and moreover
the eompression ofthe shift to the corresponding quotient module is isomorphie

5 Linear System Theory-Algebraic Methods

239

to A. This suggests that we study the lattice of submodules of Fm[z]. This we


do in the next section.

3 Coprimeness
In this section we focus on coprimeness. This is a classical topic, going back
to Greek mathematics, that culminated in the Euclidean algorithm. In the fabric
of system theory the notion of coprimeness, the Euclidean
algorithm, and the related Bezout equation are all pervasive. They relate to
the geometry of invariant subspaces, spectral problems and inversion of
operators, stability criteria, canonical forms, controllability criteria, continued
fraction expansions, recursive algorithms, and this is a partial list.
We will introduce coprimeness on the level of polynomial matrices, relate
it to geometric ideas as well as to spectral representations. Finally we will
apply it to the study of inversion of the module homomorphisms characterized
in Theorem 3.6.
All these ideas have their counterpart in functional analysis and operator
theory. In its most far reaching extension the idea of coprimeness is present in
the celebrated corona theorem, proved by Carleson [1962]. For the application
of this to spectral theory see Fuhrmann [1968], and for infinite dimensional
realization theory we refer to Fuhrmann [1981]. Recently a very nice application
to the robust stabilization problem has been provided by Georgiou and Smith
[1990]. For an effort to present a coherent connection between polynomial
coprimeness and H^∞ coprimeness see Fuhrmann [1991].
We begin now with the arithmetization of the lattice operations in the set
of submodules of F^m[z]. First we recall some definitions. Given elements D_i in
a ring R, then D is a common left divisor, or common left factor, of the D_i if there exist E_i
in R such that D_i = DE_i. E is a common right multiple of the D_i if there exist E_i
such that D_i E_i = E. D is a greatest common right divisor if it is a common
right divisor and is a left multiple of any other common right divisor. Elements D_i are
left coprime if their g.c.l.d. is a unit. The symmetric concepts are similarly defined.
The set of submodules of F^m[z] is partially ordered by inclusion. Inclusion
is related to factorization of the representing polynomial matrices.

Theorem 3.1. Let M = DF^m[z] and N = EF^m[z]. Then M ⊂ N if and only if D = EG
for some G in F^{m×m}[z].   □

Two natural operations on submodules are intersection and sum, and it is
of interest to relate these operations to the representatives of the corresponding
submodules.

Theorem 3.2. Given submodules M_i = D_i F^m[z], i = 1,...,s, let ∩_{i=1}^s M_i = DF^m[z]
and Σ_{i=1}^s M_i = EF^m[z]. Then D is a l.c.r.m. of the D_i and E is a g.c.l.d. of the
D_i.   □


Corollary 3.1. (i) Given D_i, i = 1,...,s, in F^{p×m}[z], the D_i have a g.c.l.d. D
which can be expressed as

D = Σ_{i=1}^s D_i E_i

(ii) Given D_i, i = 1,...,s, in F^{p×m}[z], the D_i have a g.c.r.d. D which can
be expressed as

D = Σ_{i=1}^s E_i D_i

Corollary 3.2. (i) Elements D_i, i = 1,...,s, in F^{p×m}[z] are left coprime if and only
if there exist E_i such that

Σ_{i=1}^s D_i E_i = I

(ii) Elements D_i, i = 1,...,s, in F^{p×m}[z] are right coprime if and only if there
exist E_i such that

Σ_{i=1}^s E_i D_i = I
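In the scalar case the Bezout equation of Corollary 3.2 is delivered by the extended Euclidean algorithm; a small sketch (sympy's `gcdex` assumed available; the example polynomials are illustrative):

```python
import sympy as sp

z = sp.symbols('z')
d1 = sp.expand((z - 1) * (z + 2))
d2 = sp.expand((z - 1) * (z + 3))

# Extended Euclidean algorithm: s*d1 + t*d2 = g with g a g.c.d. of d1 and d2;
# d1, d2 are coprime exactly when g is a unit (a nonzero constant)
s, t, g = sp.gcdex(d1, d2, z)
assert sp.simplify(s * d1 + t * d2 - g) == 0
assert sp.simplify(g - (z - 1)) == 0   # here the (monic) g.c.d. is z - 1
```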

Set operations on invariant subspaces of X_D are reflected in the factorizations.
This is summed up in the following.

Theorem 3.3. Let M_i, i = 1,...,s, be submodules of X_D, having the representations
M_i = E_i X_{F_i}, that correspond to the factorizations

D = E_i F_i

Then the following statements are true.
(i) M_1 ⊂ M_2 if and only if E_1 = E_2 R, i.e. if and only if E_2 is a left factor of E_1.
(ii) ∩_{i=1}^s M_i has the representation E_ν X_{F_ν}, with E_ν the l.c.r.m. of the E_i and F_ν
the g.c.r.d. of the F_i.
(iii) M_1 + ... + M_s has the representation E_μ X_{F_μ}, with E_μ the g.c.l.d. of the E_i and
F_μ the l.c.l.m. of all the F_i.   □

The previous theorem leads immediately to the problem of decomposing a
polynomial model into a direct sum of submodules.

Theorem 3.4. Let D = E_i F_i, for i = 1,...,s. Then
(i) We have

X_D = E_1 X_{F_1} + ... + E_s X_{F_s}

if and only if the E_i are left coprime.
(ii) We have ∩_{i=1}^s E_i X_{F_i} = 0 if and only if the F_i are right coprime.
(iii) The decomposition

X_D = E_1 X_{F_1} ⊕ ... ⊕ E_s X_{F_s}

is a direct sum if and only if D = E_i F_i for all i, the E_i are left coprime and the F_i
are right coprime.   □

We saw, in Theorem 2.2, how every linear transformation A is isomorphic
to the compression of the shift S_+ to a quotient module F^m[z]/DF^m[z] with
D(z) = zI − A. The computational inconvenience of working with equivalence
classes can be removed if we can find a good way of choosing a representative
in each equivalence class of F^m[z]/DF^m[z]. Let us analyze the scalar case. We
consider the quotient module F[z]/dF[z] with d a nonzero polynomial. Now
given any f in F[z] we can write f = ad + r, and this representation is unique
if deg r < deg d. To isolate the remainder r we proceed by dividing the previous
equality by d, i.e. fd^{-1} = a + rd^{-1}. Now a is in F[z] whereas rd^{-1}, since
deg r < deg d, is in z^{-1}F[[z^{-1}]], so applying the projection π_− we have
π_− d^{-1} f = rd^{-1} and r = dπ_− d^{-1} f. The advantage of this circuitous route to
the remainder r is the ease with which it generalizes to the multivariable case.
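The circuitous route just described can be traced computationally: expand f/d at infinity by long division, keep the strictly proper part, and multiply back by d. A minimal plain-Python sketch (the helper names are ours, not the text's; only the first deg d series coefficients are needed to recover r exactly):

```python
def euclid_rem(f, d):
    """Remainder of f mod d; polynomials as coefficient lists, lowest degree first."""
    r = list(map(float, f))
    while len(r) >= len(d):
        c = r[-1] / d[-1]
        for i in range(len(d)):
            r[len(r) - len(d) + i] -= c * d[i]
        r.pop()
    return r

def laurent_expansion(f, d, n):
    """Coefficients {power: coeff} of f/d expanded at infinity, down to z^{-n}."""
    cur = {i: float(c) for i, c in enumerate(f) if c}
    deg_d, lead = len(d) - 1, float(d[-1])
    out = {}
    while cur:
        p = max(cur)                 # strictly decreases each pass
        q = p - deg_d
        if q < -n:
            break
        c = cur[p] / lead
        out[q] = c
        for i, di in enumerate(d):   # cancel the leading term
            cur[q + i] = cur.get(q + i, 0.0) - c * float(di)
            if abs(cur[q + i]) < 1e-12:
                del cur[q + i]
    return out

# f = z^5 + 2 z^3 + 1,  d = z^2 + z + 1
f = [1.0, 0.0, 0.0, 2.0, 0.0, 1.0]
d = [1.0, 1.0, 1.0]

# r = d * pi_-(d^{-1} f): the product's nonnegative powers give the remainder
c = {p: v for p, v in laurent_expansion(f, d, len(d) - 1).items() if p < 0}
r = [sum(d[i] * c.get(j - i, 0.0) for i in range(len(d))) for j in range(len(d) - 1)]
assert r == euclid_rem(f, d) == [2.0, -1.0]   # i.e. r(z) = 2 - z
```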
Definition 3.1. For a nonsingular polynomial matrix D ∈ F^{m×m}[z] we define the
map π_D by

π_D f = Dπ_− D^{-1} f,   for f ∈ F^m[z]

Theorem 3.5. Let D be a nonsingular polynomial matrix in F^{m×m}[z]. Then π_D is
a projection in F^m[z] and Ker π_D = DF^m[z].   □

We introduce now an F[z]-module structure in X_D = Im π_D by letting

p·f = π_D(pf)

for all p in F[z] and all f in F^m[z]. With the previously defined module structure,
X_D is isomorphic to F^m[z]/DF^m[z].
In X_D we will focus on a special map S_D which corresponds to the action
of the identity polynomial z, i.e.

S_D f = π_D zf   for f ∈ X_D

Thus the module structure in X_D is identical to the module structure induced
by S_D through p·f = p(S_D)f. With this definition the study of S_D is identical to
the study of the module structure of X_D. In particular the invariant subspaces
of S_D are just the submodules of X_D.
The following theorem is a characterization of F[z]-module homomorphisms
between polynomial models. This theorem, proved in Fuhrmann [1976], is the
algebraic version of the celebrated commutant lifting theorem, due to Sarason
[1967] and Sz.-Nagy and Foias [1970]. In this connection Nikolskii [1985] is a
convenient reference.
Theorem 3.6. Let D ∈ F^{m×m}[z] and D_1 ∈ F^{m_1×m_1}[z] be nonsingular polynomial
matrices. Then Z: X_D → X_{D_1} is an F[z]-homomorphism, i.e. ZS_D = S_{D_1}Z, if and
only if there exist M and N in F^{m_1×m}[z] such that

MD = D_1 N     (2)

and

Zf = π_{D_1} Mf   for f ∈ X_D     (3)
We deal next with the invertibility properties of these homomorphisms.
We do this through the characterization of the kernel and image of the map Z
defined by equation (3) in terms of the polynomial data (2).
Theorem 3.7. Let Z: X_D → X_{D_1} be the module homomorphism defined in
Theorem 3.6. Then

(i) Ker Z = EX_G, where D = EG and G is a g.c.r.d. of D and N.
(ii) Im Z = E_1 X_{G_1}, where D_1 = E_1 G_1 and E_1 is a g.c.l.d. of D_1 and M.
(iii) Z is surjective if and only if M and D_1 are left coprime.
(iv) Z is injective if and only if D and N are right coprime.

4 Hankel Operators and Module Homomorphisms

Given a p × m matrix function G ∈ F^{p×m}((z^{-1})), it induces a Hankel map
H_G: F^m[z] → z^{-1}F^p[[z^{-1}]] defined by

H_G f = π_− Gf,   f ∈ F^m[z]     (4)

It is easy to check that a Hankel operator satisfies the Hankel functional equation,
namely

HS_+ = S_− H     (5)

Thus solutions of this equation are just module homomorphisms from F^m[z] to
z^{-1}F^p[[z^{-1}]].
The following is a characterization of all solutions of the Hankel functional
equation, as well as a characterization of rationality. The theorem can be viewed
as an algebraic version of Nehari's theorem, see Nehari [1957] and Nikolskii
[1985]. The characterization of finite rank Hankel operators is due to Kronecker
[1890].

Theorem 4.1. There exists a map H: F^m[z] → z^{-1}F^p[[z^{-1}]] that satisfies (5) if
and only if there exists a G ∈ F^{p×m}((z^{-1})) such that H = H_G. G is rational if and
only if Im H_G is finite dimensional.
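Kronecker's finite-rank characterization is easy to observe numerically: the Hankel matrix built from the Laurent coefficients of a rational G has rank at most the state dimension. A sketch (numpy assumed; the example system is illustrative):

```python
import numpy as np

# Example: G(z) = C (zI - A)^{-1} B with a 2-dimensional state space
A = np.array([[0.0, 1.0],
              [-0.5, -1.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

# The Laurent coefficients of G at infinity are the Markov parameters C A^j B
h = [float(C @ np.linalg.matrix_power(A, j) @ B) for j in range(10)]

# The Hankel matrix [h_{i+j}] has rank at most dim A = 2 (Kronecker)
H = np.array([[h[i + j] for j in range(5)] for i in range(5)])
assert np.linalg.matrix_rank(H) <= 2
```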


It is natural to inquire whether the two theorems, Theorem 3.6 and
Theorem 4.1, each a characterization of module homomorphisms, are related
in some way. This turns out to be the case, and in fact this is the basis for a very
general and powerful technique in operator theory. For more on this one can
consult Nikolskii [1985]. Notice the naturality of the introduction of matrix
fraction representations of rational matrices.

Theorem 4.2. Theorem 4.1 and Theorem 3.6 are equivalent.

Proof. Assume Theorem 4.1. Let ZS_D = S_{D_1}Z. Define a map H: F^m[z] →
z^{-1}F^p[[z^{-1}]] by

H = D_1^{-1} Zπ_D     (6)

We will show that H is a Hankel map.

HS_+ f = D_1^{-1} Zπ_D zf = D_1^{-1} Zπ_D zπ_D f
       = D_1^{-1} ZS_D π_D f = D_1^{-1} S_{D_1} Zπ_D f = D_1^{-1} D_1 π_− D_1^{-1} z Zπ_D f
       = π_− z D_1^{-1} Zπ_D f = S_− Hf     (7)

So H satisfies the Hankel functional equation (5). By Theorem 4.1 there exists a
G such that H = H_G. So

π_− Gf = D_1^{-1} Zπ_D f     (8)

or

D_1 π_− Gf = Zπ_D f

Now, from equation (8) it follows that Im H_G ⊂ D_1^{-1} X_{D_1}. Hence

π_− D_1 π_− Gf = π_− D_1 Gf = 0

Since this is true for all vector polynomials f we get D_1 G = N_1, or

G = D_1^{-1} N_1     (9)

So, for f ∈ X_D,

Zf = D_1 π_− D_1^{-1} N_1 f = π_{D_1} N_1 f

Now, from the definition of H, it follows that Ker H ⊃ DF^m[z]. So, for any
vector polynomial f,

π_− GDf = 0,

i.e. N = GD is a polynomial matrix. So

G = ND^{-1}     (10)

and we have

N_1 D = D_1 N     (11)

Notice that no coprimeness assumptions have been made. Also we note that
equation (11) is equivalent to the intertwining relation (2).
Conversely, assume Theorem 3.6 holds. Let H be a Hankel operator and
assume Im H is finite dimensional. So obviously H: F^m[z] → Im H is surjective.
Since Im H is finite dimensional, Ker H contains a submodule DF^m[z] with D
nonsingular, and a nonsingular D_1 can be chosen with D_1 Im H ⊂ X_{D_1}. Define
Z: X_D → X_{D_1} by Z = D_1 H. We claim ZS_D = S_{D_1}Z. Indeed, for f ∈ X_D,

ZS_D f = D_1 H Dπ_− D^{-1} zf = D_1 Hzf = D_1 HS_+ f
       = D_1 S_− Hf = D_1 π_− zHf = D_1 π_− D_1^{-1} z D_1 Hf = S_{D_1} Zf     (12)

By Theorem 3.6, there exist polynomial matrices N and N_1 such that N_1 D = D_1 N
and Zf = π_{D_1} N_1 f. So

Hf = D_1^{-1} π_{D_1} N_1 f = π_− D_1^{-1} N_1 f = π_− Gf   □

5 Realization Theory

We saw, in the proof of Theorem 3.6, the natural way in which matrix fractions
arise. We use now these matrix fractions to write down an extremely simple
realization procedure. While this is a basis free approach, it goes without saying
that special choices of basis lead to special matrix realizations, and in turn to
a variety of canonical forms. This is a very general method which we do not
explore in full in this paper. However we will give some examples in connection
with continued fractions. We begin by recalling some concepts.

Definition 5.1. A discrete time, constant linear system Σ is a triple of F-linear
spaces X, U and Y and a triple of maps (A, B, C) with A ∈ L(X, X), B ∈ L(U, X)
and C ∈ L(X, Y). The triple of maps represents the system of equations

x_{n+1} = Ax_n + Bu_n
y_n = Cx_n     (13)

The dimension of the system, dim Σ, is defined by

dim Σ = dim X

The spaces X, U and Y are referred to as the state-space, input space and
output space respectively.
Starting from the zero state, the solution of (13) is given by

x_n = Σ_{j=0}^∞ A^j B u_{n-j-1}

Here the summation is well defined, as the sequence of inputs u_k is finitely
nonzero for negative indices and hence there are only a finite number of nonzero
terms in the summation. The input/output relations induced by the system can
therefore be written in the form of a discrete convolution product

y_n = Σ_{j=0}^∞ C A^j B u_{n-j-1}

We note that the input/output relation depends on the triple (A, B, C) only
through the maps CA^jB, which are called the Markov parameters of the system.
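The equality of the recursive and convolution descriptions of the input/output map can be checked directly; a sketch with illustrative data (numpy assumed):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-0.25, -1.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 1.0]])

u = [1.0, -2.0, 0.5, 3.0, 0.0, 1.0]          # inputs u_0, ..., u_5

# simulate x_{n+1} = A x_n + B u_n, y_n = C x_n, starting from x_0 = 0
x = np.zeros((2, 1))
y_sim = []
for u_n in u:
    y_sim.append(float(C @ x))
    x = A @ x + B * u_n

# the same outputs by convolution with the Markov parameters C A^j B
markov = [float(C @ np.linalg.matrix_power(A, j) @ B) for j in range(len(u))]
y_conv = [sum(markov[j] * u[n - 1 - j] for j in range(n)) for n in range(len(u))]

assert np.allclose(y_sim, y_conv)
```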


Realization theory is an inverse problem. Starting from an i/o map f it is
required to construct a state-space system Σ = (A, B, C) for which f = f_Σ. In
that case we say Σ = (A, B, C) is a realization of f. Let us consider the
implications of this definition. In order to do this we recall now the important
notions of reachability and observability. These notions were introduced and
studied by Kalman. For a full historical discussion we refer to Kalman [1969a]
and Kalman, Falb, and Arbib [1969].

Definition 5.2. Given the system Σ = (A, B, C) with state-space X. A state
x ∈ X is called reachable if there exists a sequence of inputs driving the system
from the zero state to x. A state x ∈ X is called unobservable if in the absence
of inputs all outputs of the system with x as the initial state are zero. We say
the system Σ is reachable if every state is reachable and observable if the only
unobservable state is the zero state.
We state without proof the following simple criteria for reachability and
observability.

Theorem 5.1. The system Σ = (A, B, C) is reachable if and only if

Σ_{i=0}^∞ Im A^i B = X     (14)

and observable if and only if

∩_{i=0}^∞ Ker CA^i = {0}     (15)

If dim X = n then (14) and (15) can be replaced by

Σ_{i=0}^{n-1} Im A^i B = X     (16)

and

∩_{i=0}^{n-1} Ker CA^i = {0}     (17)

respectively.

Conditions (16) and (17) are equivalent to the Kalman reachability and
observability rank conditions. Define now the reachability map ℛ: F^m[z] → X
and the observability map 𝒪: X → z^{-1}F^p[[z^{-1}]] of the system Σ = (A, B, C) by

ℛ Σ_i u_i z^i = Σ_i A^i B u_i

and

𝒪x = Σ_{i=0}^∞ (CA^i x) z^{-(i+1)}
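The rank conditions can be tested on a small example (numpy assumed; the matrices are illustrative):

```python
import numpy as np

n = 2
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

# Kalman rank conditions: the matrix forms of (16) and (17)
R = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])   # [B, AB]
O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])   # [C; CA]
assert np.linalg.matrix_rank(R) == n     # every state is reachable
assert np.linalg.matrix_rank(O) == n     # only the zero state is unobservable
```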

The next lemma summarizes the basic properties of the reachability and
observability maps.

Lemma 5.1. With the module structure induced in X by A, the reachability map
ℛ and observability map 𝒪 of the system Σ are F[z]-module homomorphisms.
ℛ is surjective if and only if Σ is reachable and 𝒪 is injective if and only if Σ is
observable.
□

A system is called canonical or minimal if it is both reachable and observable.
In the same way we say that Σ = (A, B, C) is a canonical realization of an
input/output map f if it is a reachable and observable realization.
We observe now the following.

Theorem 5.2. Given a restricted i/o map f and a realization Σ = (A, B, C). Then
f admits a factorization

f = 𝒪ℛ

where ℛ and 𝒪 are the reachability and observability maps of Σ. In other words
the corresponding diagram commutes.

We note that, with the transfer function G of the system defined by

G(z) = C(zI − A)^{−1}B

the restricted input/output map f is equal to the Hankel map H_G.

Now X is taken to be an F[z]-module with the module structure induced
by A. This leads us, following Kalman, to the following definition.

Definition 5.3. Given an F[z]-module homomorphism f: U[z] → z^{−1}Y[[z^{−1}]],
an abstract realization of f is a factorization

f = hg

with g: U[z] → X and h: X → z^{−1}Y[[z^{−1}]] two F[z]-homomorphisms. The
realization is called reachable if g is surjective and observable if h is injective.
It is called canonical if it is both reachable and observable. The realization is
finite dimensional if X is finite dimensional as an F-linear space.

Theorem 5.3. Given a restricted i/o map f, canonical realizations of f always
exist.


Proof. We can always factor f canonically. One way to do it is to factor f
through the quotient module F^m[z]/Ker f, with g the canonical projection and
h the homomorphism induced by f on the factor module. An alternative
canonical factorization is

F^m[z] --g--> Im f --h--> z^{−1}F^p[[z^{−1}]]

where now g differs from f only in the range module and h is the canonical
injection of Im f into z^{−1}F^p[[z^{−1}]].
□
Our representation theorems, using the polynomial and rational models X_D
and X^D, enable us to pass from the abstract realization of an i/o map f to a
more concrete one. For this we use the matrix fraction representations of rational
matrices and their relation to the kernel and image of the induced Hankel map.
Let G = ND^{−1} be a matrix fraction representation of a rational G, no
coprimeness assumptions being made. Define the shift realization to be the
realization (A, B, C) in the state-space X_D defined by
Af = S_D f,   Bξ = π_D ξ,   Cf = (ND^{−1}f)_{−1}    (18)
In order to show that this is a realization of G we compute

CA^{i−1}Bξ = (ND^{−1}S_D^{i−1}π_D ξ)_{−1} = (ND^{−1}Dπ_−D^{−1}z^{i−1}π_D ξ)_{−1}
= (Nπ_−D^{−1}z^{i−1}ξ)_{−1} = (ND^{−1}z^{i−1}ξ)_{−1}
= (z^{i−1}Gξ)_{−1} = (π_− z^{i−1}Gξ)_{−1}
= G_i ξ,

i.e. we have a realization. We compute next the reachability and observability
maps of this realization.
maps of this realization.
ℛ Σ_{i=0}^n u_i z^i = Σ_{i=0}^n S_D^i π_D u_i = Σ_{i=0}^n π_D z^i π_D u_i = Σ_{i=0}^n π_D z^i u_i = π_D Σ_{i=0}^n z^i u_i,

i.e. ℛ: F^m[z] → X_D is given by

ℛu = π_D u    for all u∈F^m[z]


Obviously, since X_D = Im π_D, ℛ is surjective and hence the previous realization
is reachable. For the observability map 𝒪 we have

𝒪f = Σ_{i=1}^∞ (ND^{−1}π_D z^{i−1}f)_{−1} z^{−i} = Σ_{i=1}^∞ (ND^{−1}Dπ_−D^{−1}z^{i−1}f)_{−1} z^{−i}
= Σ_{i=1}^∞ (ND^{−1}z^{i−1}f)_{−1} z^{−i} = Σ_{i=1}^∞ (π_− z^{i−1}Gf)_{−1} z^{−i} = π_−Gf

We can summarize this analysis in the following.

Theorem 5.4. The realization associated with the matrix fraction G = ND^{−1} is
reachable. It is observable if and only if N and D are right coprime.
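In the scalar case the dichotomy of Theorem 5.4 can be seen concretely. The companion-form realization below is a standard stand-in for the shift realization (the polynomials are illustrative, not from the text): it always realizes N/D, but when N and D share a factor the resulting system fails the observability test.

```python
from fractions import Fraction

def companion_realization(num, den):
    """Controller companion realization of the strictly proper num/den
    (coefficient lists, lowest degree first; den monic of degree n)."""
    n = len(den) - 1
    num = [Fraction(x) for x in num] + [Fraction(0)] * (n - len(num))
    A = [[Fraction(int(j == i + 1)) for j in range(n)] for i in range(n - 1)]
    A.append([Fraction(-d) for d in den[:n]])    # last row: -d_0 ... -d_{n-1}
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    return A, b, num                             # c = numerator coefficients

def markov(A, b, c, count):
    out, v = [], list(b)
    for _ in range(count):
        out.append(sum(ci * vi for ci, vi in zip(c, v)))
        v = [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]
    return out

def expansion(num, den, count):
    """Coefficients g_1, g_2, ... of num/den in powers of z^{-1}."""
    n = len(den) - 1
    rem = [Fraction(x) for x in num] + [Fraction(0)] * (n - len(num))
    g = []
    for _ in range(count):
        lead = rem[-1]                       # coefficient of z^{n-1}
        g.append(lead)
        shifted = [Fraction(0)] + rem[:-1]   # multiply by z ...
        rem = [shifted[i] - lead * Fraction(den[i]) for i in range(n)]  # ... reduce mod den
    return g

# N = 1 + z, D = 2 + 3z + z^2 = (z+1)(z+2): common factor z + 1
num, den = [1, 1], [2, 3, 1]
A, b, c = companion_realization(num, den)
assert markov(A, b, c, 6) == expansion(num, den, 6)   # still a realization of N/D
# but [c; cA] is singular, so the realization is not observable:
cA = [sum(c[i] * A[i][j] for i in range(2)) for j in range(2)]
print(c[0] * cA[1] - c[1] * cA[0])   # 0
```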

Next we construct a shift realization associated with the matrix fraction
G = D_1^{−1}N_1. We choose X_{D_1} as the state-space and define (A_1, B_1, C_1) through

A_1 f = S_{D_1} f,   B_1 ξ = π_{D_1} N_1 ξ,   C_1 f = (D_1^{−1}f)_{−1}    (19)

Theorem 5.5. The realization associated with the matrix fraction G = D_1^{−1}N_1 is
observable. It is reachable if and only if N_1 and D_1 are left coprime.

We can rewrite this realization using the rational model rather than the
polynomial one. Thus the state-space is chosen as X^{D_1} and (A, B, C, D) is defined
through

(20)

It is this realization, in its scalar version, that we use as a basis for obtaining a
balanced realization.

The two realization procedures outlined above can be combined into a single
one by looking at more general representations of rational transfer functions.
Thus we will assume the transfer function of a system is given by

G = VT^{−1}U + W    (21)

Our approach to the analysis of these systems is to associate with each
representation of the form (21) a state-space realization in the following way.
We choose X_T as the state-space and define the triple (A, B, C), with A: X_T → X_T,

by

Af = S_T f,   Bξ = π_T Uξ,   Cf = (VT^{−1}f)_{−1}    (22)

We call this the realization associated with the polynomial matrix T.

Theorem 5.6. The system given by (22) is a realization of π_−G where
G = VT^{−1}U + W. This realization is reachable if and only if T and U are left
coprime and observable if and only if T and V are right coprime.

In particular this approach allows us to study realizations that are neither


reachable nor observable. For an isomorphism analysis of such realizations and
its relation to system equivalence see Fuhrmann [1977]. This more general
realization procedure is also more amenable to the study of realizations reflecting
some symmetry in the input/output behaviour. For this we refer to Fuhrmann
[1983, 1984].

6 Stability Theory
Stability theory is, in the context of linear, finite dimensional, time invariant
systems, concerned with the root location of polynomials. The origin of this problem
goes back at least to the middle of the previous century, with the work of Jacobi
and Borchardt. This work utilized the theory of quadratic forms. This powerful
tool reached its perfection at the hands of a master, C. Hermite [1856]. In a
completely different direction we have the pioneering work of Liapunov [1893].
This reduces in our case to the analysis of the celebrated Liapunov equation.
Now these two widely differing approaches to the study of stability are
manifestations of a constant phenomenon in system theory. The main
characterizing object of a time invariant system can be taken to be, in view of
realization theory, its transfer function. The transfer function of a finite
dimensional system is rational, and a rational function can be viewed in two
very different ways. On the one hand we can see it as an algebraic object and
apply to its study algebraic methods. On the other hand we can view it as a
complex valued function and study it in analytic terms. This Janus-like character
of system theory accounts in no small part for the richness of the field and the
depth of some of the results. In multifaceted situations like this, especially in
as dynamic a field of research as the theory of dynamical systems, it becomes
increasingly difficult to maintain a global encompassing point of view. But
whenever such a view can be obtained it is highly profitable and provides some
extra insight into the theory.
It is in this context that one can view Kalman's [1969] contribution to the
area of stability.


In this work Kalman set out to give a unified treatment of the classical
stability criteria of Hermite, Hurwitz, Schur-Cohn and Liapunov. In the process
the algebraic theory of quadratic forms and the Liapunov equation are unified.
While this is not the first result relating Liapunov's method to the method of
quadratic forms, apparently the first demonstration of this is due to Parks
[1962], it is a highly ingenious one. The approach is algebraic and uses modular
arithmetic in two variables. The result can be described in the following way.
We associate a matrix C(p) with any polynomial p∈C[x, y], where p(x, y) =
Σ_{i,j} c_{ij} x^i y^j. Given a polynomial φ∈C[x] we will denote by Ψ the ideal
in C[x, y] generated by (φ(x), ψ(y)) where ψ(z) = φ̄(z). The half plane result can
be stated in the following way.

Theorem 6.1. Let φ∈C[x] and let Ψ be the previously defined ideal. Then all zeroes
of φ are in the open right half plane if and only if C((x + y)^{−1} mod Ψ) is positive
definite.
□
This result can be transformed into other domains as follows. Let
r(x, y)∈C[x, y] be such that C(r) has rank 2 and signature 0. Then the following
is true.

Theorem 6.2. Let φ∈C[x] and let Φ be the previously defined ideal. Then all zeroes
of φ are in the domain Re r(λ, λ̄) > 0 if and only if C(r^{−1} mod Φ) is positive
definite.
□
It is worth mentioning that Kalman's method has been extended, see Djaferis
and Mitter [1977], to give a constructive algorithm for the solution of Liapunov
related equations.

From my point of view I find the Kalman [1969] approach especially
interesting because of the intensive use of modular arithmetic. This leads to a
very direct connection to Liapunov's theorem.

We proceed now to a brief review of the classical approach using quadratic
forms. This is the road taken originally by Hermite [1856]; an excellent
scholarly exposition of it is to be found in Krein and Naimark [1936].

The basic quadratic forms related to the analysis of root location problems
are the Hankel and Bezoutian forms. For this set of problems the Bezoutian
certainly proves itself the more powerful tool. The reason for this is its
linearity properties with respect to its defining polynomials. The Bezoutian
B(q, p) of the two polynomials q and p is defined as the quadratic form B(q, p) =
(b_{ij}) where

q(z)p(w) − p(z)q(w) = Σ_{i=1}^n Σ_{j=1}^n b_{ij} z^{i−1}(z − w)w^{j−1}    (23)
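The matrix (b_{ij}) in (23) can be produced mechanically by dividing q(z)p(w) − p(z)q(w) by z − w coefficient-wise; a small exact-arithmetic sketch (the sample polynomials are illustrative):

```python
from fractions import Fraction

def bezoutian(q, p):
    """Bezoutian B(q, p) of coefficient lists (lowest degree first):
    (q(z)p(w) - p(z)q(w)) / (z - w) = sum_{i,j=1}^{n} b_ij z^{i-1} w^{j-1}."""
    n = max(len(q), len(p)) - 1
    q = [Fraction(x) for x in q] + [Fraction(0)] * (n + 1 - len(q))
    p = [Fraction(x) for x in p] + [Fraction(0)] * (n + 1 - len(p))

    def N(u, v):   # coefficient of z^u w^v in q(z)p(w) - p(z)q(w)
        return q[u] * p[v] - p[u] * q[v] if u <= n and v <= n else Fraction(0)

    # Dividing by z - w gives the recursion b_{i,j} = -sum_k N(i-k, j+1+k)
    # (entries 0-indexed here)
    return [[-sum(N(i - k, j + 1 + k) for k in range(i + 1))
             for j in range(n)] for i in range(n)]

# q = z^2 - 1, p = q' = 2z:
print(bezoutian([-1, 0, 1], [0, 2]))   # [[2, 0], [0, 2]]
```

As (23) predicts, the result is symmetric, and swapping the arguments flips the sign: B(p, q) = −B(q, p).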

The Bezoutian is a symmetric matrix, linear in its polynomial variables
q and p, and alternating, i.e. B(p, q) = −B(q, p). For an extensive study of
Bezoutians see Helmke and Fuhrmann [1989]. It seems that the most powerful
way to study the Bezoutian is the following characterization derived in
Fuhrmann [1981a].

Theorem 6.3. Let p, q∈F[z], with deg p ≤ deg q. Then the Bezoutian B = B(q, p)
of q and p satisfies

(24)

Applying Bezoutians, the elementary results concerning root location are
given by the following theorem.

Theorem 6.4. Let q be a real polynomial. Then

(i) The number of distinct roots of q is equal to codim Ker B(q, q').
(ii) The number of distinct real roots of q is equal to σ(B(q, q')).
(iii) All roots of q are real if and only if B(q, q') ≥ 0.
□
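These counts are easy to check on small examples. For q(z) = z³ − z (three distinct real roots) B(q, q′) should have full rank and be positive semidefinite, while q(z) = z³ + z (one real root) should give an indefinite form. A self-contained sketch with illustrative data:

```python
from fractions import Fraction

def bezoutian(q, p):
    """B(q, p): (q(z)p(w) - p(z)q(w))/(z - w) as an n x n matrix."""
    n = max(len(q), len(p)) - 1
    q = [Fraction(x) for x in q] + [Fraction(0)] * (n + 1 - len(q))
    p = [Fraction(x) for x in p] + [Fraction(0)] * (n + 1 - len(p))
    def N(u, v):
        return q[u] * p[v] - p[u] * q[v] if u <= n and v <= n else Fraction(0)
    return [[-sum(N(i - k, j + 1 + k) for k in range(i + 1))
             for j in range(n)] for i in range(n)]

# q = z^3 - z, q' = 3z^2 - 1: all roots real and distinct
B1 = bezoutian([0, -1, 0, 1], [-1, 0, 3])
# q = z^3 + z, q' = 3z^2 + 1: only one real root
B2 = bezoutian([0, 1, 0, 1], [1, 0, 3])
print(B1)   # [[1, 0, -1], [0, 2, 0], [-1, 0, 3]]  -- positive definite, sigma = 3
print(B2)   # [[1, 0, 1], [0, -2, 0], [1, 0, 3]]   -- indefinite, sigma = 1
```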

In the root location problem several Bezoutian related quadratic forms turn
out to be useful. In particular let Q = (Q_{jk}), where

Q(z, w) = −i (q(z)q̄(w) − q̄(z)q(w))/(z − w) = Σ_{j=1}^n Σ_{k=1}^n Q_{jk} z^{j−1} w^{k−1}    (25)

where q̄(z) = Σ_{k=0}^n q̄_k z^k is the polynomial with conjugated coefficients. Then the following holds.

Theorem 6.5. Let q be a complex polynomial of degree n. Let

Q = Σ_{j=1}^n Σ_{k=1}^n Q_{jk} ξ_j ξ̄_k

be the Hermitian form with the Q_{jk} defined by (25), and let π, ν and δ denote
the numbers of positive, negative and zero eigenvalues of Q. Then the number of real
zeros of q together with the number of zeros of q arising from complex conjugate
pairs is equal to δ. There are π more zeros of q in the upper half plane and ν
more in the lower half plane. In particular all zeros of q are in the upper half
plane if and only if Q is positive definite.
□

Note that the Hermitian form −iB(q, q̄) can also be written as the Bezoutian
of two real polynomials, i.e. polynomials with real coefficients. Indeed, with q_r, q_i
denoting the real and imaginary parts of q, which are defined by

q_r(z) = (q(z) + q̄(z))/2,   q_i(z) = (q(z) − q̄(z))/2i

we have

−iB(q, q̄) = −i B(q_r + iq_i, q_r − iq_i)
= −i [B(q_r, q_r) + iB(q_i, q_r) − iB(q_r, q_i) + B(q_i, q_i)]
= 2B(q_i, q_r)

We can arrive at stability results by making a simple change of variable.
From the generating function

(q(z)q(w) − q(−z)q(−w)) / (z + w)    (26)

a quadratic form H = (h_{ij}) is induced via

(q(z)q(w) − q(−z)q(−w)) / (z + w) = Σ_{i=1}^n Σ_{j=1}^n h_{ij} z^{i−1} w^{j−1}    (27)

The quadratic form associated with the generating function (26) is called the
Hermite-Fujiwara form.

Let now q(z) be a real polynomial. We define its even and odd parts, which
we denote by q_+ and q_− respectively, by

q_+(z²) = (q(z) + q(−z))/2,   q_−(z²) = (q(z) − q(−z))/2z    (28)

Thus we have q(z) = q_+(z²) + zq_−(z²).
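On coefficient lists the splitting (28) is just a stride-two slice; a small sketch (the sample polynomial is illustrative):

```python
def even_odd_parts(q):
    """q: coefficients of q(z), lowest degree first.
    Returns (q_plus, q_minus) with q(z) = q_plus(z^2) + z * q_minus(z^2)."""
    return list(q[0::2]), list(q[1::2])

# q(z) = 6 + 11z + 6z^2 + z^3 = (z+1)(z+2)(z+3)
q_plus, q_minus = even_odd_parts([6, 11, 6, 1])
print(q_plus, q_minus)   # [6, 6] [11, 1]: q_+(u) = 6 + 6u, q_-(u) = 11 + u
```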

Definition 6.1. Let q(z) be a real monic polynomial of degree m with real simple
zeroes

α_1 < α_2 < ⋯ < α_m

and let p(z) be a real polynomial with positive leading coefficient and zeroes

β_1 < β_2 < ⋯ < β_{m−1}   if deg p = m − 1

and

β_1 < β_2 < ⋯ < β_m   if deg p = m

Then we say that q and p are a real pair if the zeroes satisfy

α_1 < β_1 < α_2 < β_2 < ⋯ < β_{m−1} < α_m   if deg p = m − 1

and

β_1 < α_1 < β_2 < α_2 < ⋯ < β_m < α_m   if deg p = m

We say that q and p form a positive pair if they form a real pair and α_m < 0.
We are ready to give a summary of the Bezoutian related stability criteria.
For a proof of the related Hurwitz determinantal conditions in the spirit of this
paper we refer to Helmke and Fuhrmann [1989].

Theorem 6.6. Let q(z) = z^n + q_{n−1}z^{n−1} + ⋯ + q_0 be monic of degree n. Then the
following statements are equivalent.

(i) q(z) is a stable, or Hurwitz, polynomial.
(ii) The Hermite-Fujiwara form is positive definite.
(iii) The two Bezoutians B(q_+, q_−) and B(zq_−, q_+) are positive definite.
(iv) The polynomials q_+ and q_− form a positive pair.
(v) The Bezoutian B(q_+, q_−) is positive definite and all the coefficients q_i are
positive.
□

The equivalence of (i) and (v) is referred to as the Lienard-Chipart theorem.
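Criterion (v) is the cheapest of the five to automate: one coefficient check plus one Bezoutian. A self-contained sketch (illustrative polynomials; positive definiteness tested via leading principal minors):

```python
from fractions import Fraction

def bezoutian(q, p):
    """B(q, p): (q(z)p(w) - p(z)q(w))/(z - w), coefficient lists lowest first."""
    n = max(len(q), len(p)) - 1
    q = [Fraction(x) for x in q] + [Fraction(0)] * (n + 1 - len(q))
    p = [Fraction(x) for x in p] + [Fraction(0)] * (n + 1 - len(p))
    def N(u, v):
        return q[u] * p[v] - p[u] * q[v] if u <= n and v <= n else Fraction(0)
    return [[-sum(N(i - k, j + 1 + k) for k in range(i + 1))
             for j in range(n)] for i in range(n)]

def det(M):
    """Determinant by fraction-free-safe Gaussian elimination."""
    M = [row[:] for row in M]
    d = Fraction(1)
    for c in range(len(M)):
        piv = next((i for i in range(c, len(M)) if M[i][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for i in range(c + 1, len(M)):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return d

def lienard_chipart(q):
    """Theorem 6.6(v) sketch: monic q is Hurwitz iff all coefficients are
    positive and B(q_+, q_-) is positive definite (Sylvester's criterion)."""
    if any(Fraction(c) <= 0 for c in q):
        return False
    q_plus, q_minus = list(q[0::2]), list(q[1::2])
    B = bezoutian(q_plus, q_minus)
    return all(det([row[:k + 1] for row in B[:k + 1]]) > 0 for k in range(len(B)))

print(lienard_chipart([6, 11, 6, 1]))  # (z+1)(z+2)(z+3): True
print(lienard_chipart([2, 1, 1, 1]))   # positive coefficients, not Hurwitz: False
```

The second example shows why the Bezoutian condition cannot be dropped: z³ + z² + z + 2 has all coefficients positive yet is unstable.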


The equivalence of (i) and (ii) is classical and goes back to Hermite, see Krein
and Naimark [1936] and Gantmacher [1959]. Also it is well known, see Krein
and Naimark [1936], Datta [1978a], Fuhrmann [1981a], that the Hermite-Fujiwara
form is isomorphic to the direct sum of the forms B(q_+, q_−) and
B(zq_−, q_+), so conditions (ii) and (iii) are equivalent. This representation of the
Hermite-Fujiwara form as a direct sum of two Bezoutians is an easy computation
and is worth repeating.

(q(z)q(w) − q(−z)q(−w))/(z + w)
= ((q_+(z²) + zq_−(z²))(q_+(w²) + wq_−(w²)) − (q_+(z²) − zq_−(z²))(q_+(w²) − wq_−(w²)))/(z + w)
= 2(zq_−(z²)q_+(w²) + wq_+(z²)q_−(w²))/(z + w)    (29)

One form contains only even terms, the other only odd ones. So the positive
definiteness of the Hermite-Fujiwara form is equivalent to the positive definiteness
of the two Bezoutians B(q_+, q_−) and B(zq_−, q_+).
Let g be a rational transfer function with real coefficients. The Cauchy index
of g, denoted by I_g, is defined as the number of jumps of g from −∞ to +∞
minus the number of jumps from +∞ to −∞. The central result concerning
the Cauchy index is the possibility of evaluating the Cauchy index of a rational
function as the signature of a quadratic form. This result is generally known
as the Hermite-Hurwitz theorem. For a proof and a connection to signature
symmetric realizations see Fuhrmann [1983].

Theorem 6.7 (Hermite-Hurwitz). Let g = p/q = Σ_{i=0}^∞ g_i z^{−(i+1)} be a strictly proper
rational function with p and q coprime and deg q = n. Then

I_g = σ(H_n) = σ(B(q, p))    (30)

where H_n denotes the Hankel matrix

H_n = [ g_0      g_1   ⋯   g_{n−1}
        g_1      g_2   ⋯   g_n
        ⋮                   ⋮
        g_{n−1}  g_n   ⋯   g_{2n−2} ]
□
As a consequence of Theorem 6.6(iii) and the Hermite-Hurwitz theorem
we can state the following.

Theorem 6.8. Let q be a real monic polynomial, and let q_+, q_− be defined as in
(28). Let the rational function g be defined by

g(z) = q_−(z)/q_+(z)

Then

1. g is proper for odd n and strictly proper for even n.
2. g can be expanded in a power series in z^{−1},

g(z) = g_0 + g_1 z^{−1} + ⋯

with g_0 = 0 in case n is even.
3. The polynomial q is stable if and only if the Hankel matrices

H_n = [ g_1      ⋯   g_{n−1}
        ⋮              ⋮
        g_{n−1}  ⋯   g_{2n−3} ]

and

(σH)_n = [ g_2   ⋯   g_n
           ⋮           ⋮
           g_n   ⋯   g_{2n−2} ]

are both positive definite and g_0 ≥ 0.


Going back to equation (29) we are led directly to the following result, see
Gantmacher [1959] pp. 177-178.

Theorem 6.9. Let q be a real monic polynomial of degree n and let

q(z) = q_+(z²) + zq_−(z²)

Then q is a Hurwitz polynomial if and only if for

g(z) = −zq_−(−z²)/q_+(−z²)

or

g(z) = (q_{n−1}z^{n−1} − q_{n−3}z^{n−3} + ⋯)/(z^n − q_{n−2}z^{n−2} + ⋯)

we have I_g = n.

Proof. It suffices to compute the Bezoutian of −zq_−(−z²) and q_+(−z²). This
we proceed to do:

(−q_+(−z²)wq_−(−w²) + zq_−(−z²)q_+(−w²))/(z − w)
= [−q_+(−z²)wq_−(−w²) + zq_−(−z²)q_+(−w²)](z + w)/(z² − w²)
= (z²q_−(−z²)q_+(−w²) − q_+(−z²)w²q_−(−w²))/(z² − w²)
  + zw(q_−(−z²)q_+(−w²) − q_+(−z²)q_−(−w²))/(z² − w²)

So the Bezoutian is isomorphic to B(zq_−, q_+) ⊕ B(q_+, q_−).
□

This result has an immediate interpretation in terms of continued fractions.
This will be discussed in the next section.

To complete our survey of stability we quote the following result which
makes the contact with Liapunov's theorem via the use of the polynomial
models.

Theorem 6.10. Assume q is a real polynomial of degree n, having no zeroes on
the imaginary axis, and let C_q be its companion matrix, the matrix of the shift S_q
in the standard basis, defined by

C_q = [ 0      1
               ⋱
                    1
       −q_0   ⋯   −q_{n−1} ]    (31)

Then the Hermite-Fujiwara matrix H is a solution of a Liapunov equation

C̃_q H + H C_q = −Q

with Q a nonnegative definite quadratic form.
7 Continued Fractions
Continued fractions have a long history in mathematics. Kalman's contribution
to this subject, Kalman [1979], is by no means the first use of continued fractions
in the area of system theory. A variety of ad-hoc methods related to continued
fractions and Pade approximation have been used in system theoretic problems.
However, Kalman's contribution seems to be the first coherent exposition of the
connection between the Euclidean algorithm, continued fractions, the partial
realization problem, canonical forms etc. This was a highly influential paper
that triggered a lot of other work in this area. The papers by Gragg and
Lindquist [1983], Antoulas [1986] and Helmke and Fuhrmann [1989] are just
a small sample.

The use of continued fractions as a parametrization for Rat(n) was
suggested in that paper. Some work on the topological aspects of this approach
was started by Fuhrmann and Krishnaprasad [1986] and continued by Helmke,
Hinrichsen and Manthey [1989]. Also, the idea of using continued fractions
for the construction of balanced realizations in Fuhrmann [1991] is along the
lines of Kalman [1979] and Ober [1987].
Let g be a strictly proper rational function and let g = p/q be an irreducible
representation of g with q monic of degree n. We define, using the division rule
for polynomials, a sequence of polynomials q_i, a sequence of nonzero
constants β_i and monic polynomials a_{i+1}(z), referred to as atoms, by

q_{−1} = q,  q_0 = p
q_{i+1}(z) = a_{i+1}(z)q_i(z) − β_i q_{i−1}(z)    (32)

with deg q_{i+1} < deg q_i. The procedure ends when q_r is the g.c.d. of p and q. Since
p and q are assumed coprime, q_r is a nonzero constant.

The atoms a_1, …, a_r are real monic polynomials of degrees n_1, …, n_r such
that

n_1 + ⋯ + n_r = n    (33)

and the β_i are nonzero real numbers with

(34)


In terms of the β_i and the a_i(z), g has the continued fraction representation

g(z) = β_0/(a_1(z) − β_1/(a_2(z) − β_2/(a_3(z) − ⋯ − β_{r−2}/(a_{r−1}(z) − β_{r−1}/a_r(z)))))    (35)
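The recursion (32) is the Euclidean algorithm with quotients normalized to be monic, and (35) can be checked by evaluating the resulting continued fraction. A sketch in exact arithmetic (the sample p, q are illustrative):

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Quotient and remainder of coefficient lists (lowest degree first)."""
    g = [Fraction(x) for x in g]
    while g[-1] == 0:
        g.pop()
    r = [Fraction(x) for x in f]
    quo = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while len(r) >= len(g):
        c = r[-1] / g[-1]
        k = len(r) - len(g)
        quo[k] = c
        for i in range(len(g)):
            r[i + k] -= c * g[i]
        r.pop()
        while r and r[-1] == 0:
            r.pop()
    return quo, r

def atoms(p, q):
    """Atoms (a_i monic, beta_{i-1}) of g = p/q from the recursion (32)."""
    prev = [Fraction(x) for x in q]   # q_{-1}
    cur = [Fraction(x) for x in p]    # q_0
    out = []
    while any(c != 0 for c in cur):
        quo, rem = poly_divmod(prev, cur)
        lead = quo[-1]
        out.append(([c / lead for c in quo], 1 / lead))   # (a_i, beta_{i-1})
        prev, cur = cur, [-ri / lead for ri in rem]
    return out

def eval_cf(atom_list, z):
    """Evaluate (35): g = beta_0/(a_1 - beta_1/(a_2 - ...))."""
    val = None
    for a, beta in reversed(atom_list):
        az = sum(c * z**i for i, c in enumerate(a))
        val = beta / az if val is None else beta / (az - val)
    return val

# g = (3 + z)/(2 + 3z + z^2)
at = atoms([3, 1], [2, 3, 1])
print([len(a) - 1 for a, _ in at])    # atom degrees [1, 1], summing to n = 2
print(eval_cf(at, Fraction(1)))       # 2/3 = g(1)
```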

Using the linearity properties of the Bezoutian in conjunction with the
Euclidean algorithm, or equivalently with the continued fraction expansion (35),
we are led to the following result. For a proof see Fuhrmann [1983].

Theorem 7.1 (Frobenius). Let g be rational having the continued fraction
representation (35). Then

σ(H_g) = Σ_{i=1}^r (1 + (−1)^{deg a_i − 1})/2 · sign ∏_{j=0}^{i−1} β_j    (36)
□
The continued fraction representation (35) can be translated immediately
into a canonical form realization which incorporates the atoms.

Theorem 7.2. Let the strictly proper transfer function g = p/q have the sequence
of atoms {a_{i+1}(z), β_i}, and assume that n_1, …, n_r are the degrees of the atoms
and

a_k(z) = Σ_{i=0}^{n_k−1} a_i^{(k)} z^i + z^{n_k}    (37)

Then g has a realization (A, b, c) in which A is block tridiagonal,

A = [ A_11  A_12
      A_21  A_22  ⋱
            ⋱     ⋱  ]    (38)

with diagonal blocks the companion matrices of the atoms,

A_ii = [ 0          1
                      ⋱
                         1
        −a_0^{(i)}  ⋯  −a_{n_i−1}^{(i)} ] ,   i = 1, …, r,    (39)

off-diagonal blocks A_{i,i+1}, A_{i+1,i} each having a single nonzero corner entry
formed from the constants β_i,    (40)

and with b, c as in

c = (0 ⋯ 0 β_0 0 ⋯ 0)    (41)

the β_0 being in the n_1 position.
Proof. Two sequences of polynomials P_k and Q_k are defined by the three term
recursion formulas

P_{−1} = −1,  P_0 = 0
P_{k+1}(z) = a_{k+1}(z)P_k(z) − β_k P_{k−1}(z)    (42)

and

Q_{−1} = 0,  Q_0 = 1
Q_{k+1}(z) = a_{k+1}(z)Q_k(z) − β_k Q_{k−1}(z)    (43)

The expansion of P_k/Q_k in powers of z^{−1} agrees with that of g up to order
2Σ_{i=1}^k deg a_i + deg a_{k+1}, see Gragg and Lindquist [1983]. The set

B_or = {1, z, …, z^{n_1−1}, Q_1, zQ_1, …, z^{n_2−1}Q_1, …, Q_{r−1}, …, z^{n_r−1}Q_{r−1}}

is clearly a basis for X_q as it contains one polynomial for each degree between
0 and n − 1. We call this basis the orthogonal basis because of its relation to
orthogonal polynomials. In this connection see Gragg [1974] and Fuhrmann
[1988].

To complete the proof we use the shift realization (18) of g, which is minimal
by the coprimeness of p and q. Its matrix representation with respect to the basis
B_or proves the theorem. In the computation of the matrix representation we
lean heavily on the recursion formulas (43).
□

The previous construction can also be applied to obtain an output-feedback
canonical form. Indeed, suppose ĝ = p̂/q̂ is output feedback equivalent to g = p/q.
Then p̂(z) = αp(z), α ≠ 0, and q̂(z) = q(z) − kp(z). Therefore

q_1(z) = a_1(z)p(z) − β_0 q(z)

implies

αq_1(z) = a_1(z)(αp(z)) − αβ_0 q(z)
= a_1(z)(αp(z)) − αβ_0 (q̂(z) + kp(z))
= (a_1(z) − β_0 k)(αp(z)) − (αβ_0)q̂(z)

and so

â_1(z) = a_1(z) − β_0 k
β̂_0 = αβ_0

However, as q_1/p = q̂_1/p̂, all other atoms of g and ĝ coincide. Thus, changing
the transfer function g(z) by output feedback amounts to rescaling β_0 by a
nonzero constant and to changing the constant part of a_1(z) by an arbitrary
real number, i.e. a_1(z) ↦ a_1(z) − a. In particular the degrees n_i of the atoms are
output feedback invariants, see Fuhrmann and Krishnaprasad [1986], Helmke
and Fuhrmann [1989]. It follows that any transfer function g(z) as in equation
(35) is output feedback equivalent to a unique transfer function ĝ given by

ĝ(z) = 1/(â_1(z) − β_1/(a_2(z) − β_2/(a_3(z) − ⋯ − β_{r−2}/(a_{r−1}(z) − β_{r−1}/a_r(z)))))    (44)

with â_1(z) = a_1(z) − a_1(0).
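The invariance claim is easy to confirm experimentally: replacing q by q − kp changes only the constant term of a_1 (and rescales β_0), exactly as derived above. A self-contained sketch with the same kind of illustrative Euclidean-division helpers one would use for (32):

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Quotient and remainder of coefficient lists (lowest degree first)."""
    g = [Fraction(x) for x in g]
    while g[-1] == 0:
        g.pop()
    r = [Fraction(x) for x in f]
    quo = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while len(r) >= len(g):
        c = r[-1] / g[-1]
        k = len(r) - len(g)
        quo[k] = c
        for i in range(len(g)):
            r[i + k] -= c * g[i]
        r.pop()
        while r and r[-1] == 0:
            r.pop()
    return quo, r

def atoms(p, q):
    """Atoms (a_i monic, beta_{i-1}) of g = p/q."""
    prev = [Fraction(x) for x in q]
    cur = [Fraction(x) for x in p]
    out = []
    while any(c != 0 for c in cur):
        quo, rem = poly_divmod(prev, cur)
        lead = quo[-1]
        out.append(([c / lead for c in quo], 1 / lead))
        prev, cur = cur, [-ri / lead for ri in rem]
    return out

p, q = [3, 1], [2, 3, 1]             # g = (3+z)/(2+3z+z^2), illustrative
k = Fraction(5)
q_fb = [Fraction(qc) - k * pc for qc, pc in
        zip(q, [Fraction(x) for x in p] + [Fraction(0)] * (len(q) - len(p)))]

before, after = atoms(p, q), atoms(p, q_fb)
print(before[0][0], after[0][0])   # a_1 = z  versus  a_1 - beta_0*k = z - 5
print(before[1:] == after[1:])     # True: every later atom coincides
```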


The map g ↦ ĝ defines a canonical form for output feedback, and by choosing
a state-space realization for ĝ as in Kalman [1979] or Gragg and Lindquist
[1983], the following theorem is proved.

Theorem 7.3. Let (A, b, c) be a minimal realization of a transfer function g of
McMillan degree n. Let a_i(z), β_i be the sequence of atoms of g and assume that
n_1, …, n_r are the degrees of the atoms and

a_k(z) = Σ_{i=0}^{n_k−1} a_i^{(k)} z^i + z^{n_k}    (45)

Then (A, b, c) is output feedback equivalent to a unique system (Ā, b̄, c̄) of the form

c̄ = (0 ⋯ 0 1 0 ⋯ 0)    (46)

the 1 being in the n_1 position,
with Ā block tridiagonal,

Ā = [ Ā_11  Ā_12
      Ā_21  Ā_22  ⋱
            ⋱     ⋱  ]    (47)

where

Ā_11 = [ 0       1
                   ⋱
                      1
         0  −a_1^{(1)}  ⋯  −a_{n_1−1}^{(1)} ] ,

Ā_ii = [ 0          1
                      ⋱
                         1
        −a_0^{(i)}  ⋯  −a_{n_i−1}^{(i)} ] ,   i = 2, …, r,    (48)

the first block being the companion matrix of â_1(z) = a_1(z) − a_1(0), so that its
constant term vanishes, and the off-diagonal blocks (49) coupling consecutive
blocks through the constants β_i.
□

We complete the paper with a short study of balanced realizations in a
special case. We construct such a realization for the case of an asymptotically
stable, all-pass transfer function.

To begin, we recall that a minimal realization (A, B, C, D) of an asymptotically
stable transfer function g is called balanced if there exists a diagonal
matrix Σ = diag(σ_1, …, σ_n) such that

AΣ + ΣÃ = −BB̃
ÃΣ + ΣA = −C̃C    (50)

The matrix Σ is called the gramian of the system (A, B, C, D) and its diagonal
entries are called the singular values of the system. They are equal to the singular
values of an induced Hankel operator H_g, where this Hankel operator acts
between the Hardy spaces of the right and left half-planes.
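For the simplest all-pass function the balancing equations (50) can be verified directly. With the illustrative choice d(z) = z + 1, the function g = d*/d = (1 − z)/(1 + z) = −1 + 2/(z + 1) is realized by the scalar system A = −1, B = C = √2, D = −1, with gramian Σ = 1:

```python
import math

A, D = -1.0, -1.0
B = C = math.sqrt(2.0)
Sigma = 1.0                      # claimed gramian: single singular value 1

# The balancing equations (50), here scalar:
assert abs(A * Sigma + Sigma * A + B * B) < 1e-12   # A S + S A~ = -B B~
assert abs(A * Sigma + Sigma * A + C * C) < 1e-12   # A~ S + S A = -C~ C

# All-pass check: |g(i w)| = 1 on the imaginary axis
def g(s):
    return D + C * B / (s - A)

for w in (0.0, 0.5, 3.0):
    assert abs(abs(g(1j * w)) - 1.0) < 1e-12
print("balanced, all-pass, sigma_1 = 1")
```

This matches the statement below that for g = d*/d all singular values equal 1.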
Thus let g = d*/d be an asymptotically stable, all-pass transfer function. This
means, see Glover [1984] or Fuhrmann [1991], that d is stable and the singular
values satisfy σ_1 = ⋯ = σ_n = 1. Write

d(z) = d_+(z²) + zd_−(z²)    (51)

Then

d*(z) = d_+(z²) − zd_−(z²)    (52)

Next we put

f(z) = −zd_−(z²)/d_+(z²),

i.e. f is constant output feedback equivalent to g. Hence, once we get a canonical
form for f, the one for g will easily follow. Next we introduce yet another
auxiliary strictly proper rational function by defining

h(z) = zd_−(−z²)/d_+(−z²)

By Theorem 6.9 the Cauchy index I_h of h is equal to n = deg d. This piece of
information implies two specific representations for h, one additive and the
other a continued fraction representation. For the additive representation we
must have n real simple poles for h with positive residues ρ_k. Thus

h(z) = Σ_{k=1}^n ρ_k/(z − α_k)

with α_1 < ⋯ < α_n. The other representation, and the more interesting one for
our purposes, is a continued fraction representation of the form given in
equation (35), i.e.

h(z) = β_0/(a_1(z) − β_1/(a_2(z) − β_2/(a_3(z) − ⋯ − β_{n−2}/(a_{n−1}(z) − β_{n−1}/a_n(z)))))    (53)

where a_i(z), i = 1, …, n are monic of degree one and all the β_i are positive.
Theorem 7.4. Let g be an asymptotically stable all-pass function. Then g has a
realization (A, b, c) of the form

(54)

in which A is tridiagonal with zero diagonal and off-diagonal entries ±α_i, and
c = (0 ⋯ α_0).
Proof. We will distinguish between two cases.

Case 1. Assume that n = deg d is even. Then the function h has an even
denominator and an odd numerator. Thus we can be more explicit as far as
the continued fraction is concerned. Indeed, put q_{−1}(z) = d_+(−z²) and q_0(z) =
−zd_−(−z²) and set up the Euclidean algorithm with the recursion

q_{i+1}(z) = a_{i+1}(z)q_i(z) − β_i q_{i−1}(z)    (55)


with the β_i chosen so that the a_i are monic and deg q_{i+1} < deg q_i < deg q_{i−1}.
The atoms a_i(z), i = 1, …, n, are monic of degree one. We will show by induction
that the sequence of polynomials is alternatingly even and odd. In fact the
initialization ensures q_{−1} to be even and q_0 to be odd. So assume parity
alternates for all indices up to i. From equation (55) it follows that

β_i q_{i−1}(z) = a_{i+1}(z)q_i(z) − q_{i+1}(z)    (56)

and hence

q_i(z)/q_{i−1}(z) = β_i/(a_{i+1}(z) − q_{i+1}(z)/q_i(z))    (57)

Now q_i(z)/q_{i−1}(z) is odd by the induction hypothesis. So a_{i+1}(z) − q_{i+1}(z)/q_i(z) is
also odd. This forces both a_{i+1}(z) and q_{i+1}(z)/q_i(z) to be odd. Thus a_{i+1}(z) = z
and q_{i+1} has parity opposite to that of q_i. Writing the positive constants as
β_i = α_i², h has the representation

h(z) = α_0²/(z − α_1²/(z − α_2²/(⋯ − α_{n−2}²/(z − α_{n−1}²/z))))    (58)
By Theorem 9.4 in Helmke and Fuhrmann [1989] h(z) has a realization of the
form

A_h = [ 0    α_1²
        1     0    α_2²
              ⋱    ⋱     ⋱
                   1      0 ]

which is similar, through a diagonal matrix diag(ρ_0, …, ρ_{n−1}) with ρ_0 = 1 and
consecutive ratios determined by the α_i, to the tridiagonal matrix

[ 0      α_1
  α_1     0     α_2
         ⋱      ⋱     ⋱
             α_{n−1}    0 ] ,   c = (0 ⋯ α_0)


To go back from h to f we note that

f(z) = −ih(−iz) = −i(−iz) d_−(z²)/d_+(z²) = −zd_−(z²)/d_+(z²)

So f has only simple imaginary axis zeroes.

From the continued fraction expansion (58) we get

f(z) = −iα_0²/(−iz − α_1²/(−iz − α_2²/(⋯ − α_{n−1}²/(−iz))))
     = α_0²/(z + α_1²/(z + α_2²/(⋯ + α_{n−1}²/z)))    (59)

Using again the continued fraction canonical form we get a realization for
f given by

A_f = [  0    α_1²
        −1     0    α_2²
               ⋱    ⋱     ⋱
                   −1      0 ]

and, by a diagonal similarity, by

Ā = [  0    −α_1
      α_1     0    −α_2
             ⋱     ⋱     ⋱
                 α_{n−1}    0 ] ,   c̄ = (0 ⋯ α_0)

Now g is obtained back from f by the inverse output-feedback transformation.
Thus g is realized by the system obtained from (Ā, b̄, c̄) by this output feedback,

(60)

with k = α_0².

Clearly the realization in (60) is balanced with Σ = I.


Case 2. This case can be treated similarly; however, the details will be
skipped.
□

We note in passing that the above realization is a special case of Darlington
synthesis, see Brockett [1970] and Krishnaprasad [1980].

References
[1965] N.I. Akhiezer, The Classical Moment Problem, Hafner, New York
[1986] A.C. Antoulas, "On recursiveness and related topics in system theory", IEEE Trans Aut
Control, AC-31, 1121-1135
[1970] R.W. Brockett, Finite Dimensional Linear Systems, Wiley, New York
[1962] L. Carleson, "Interpolation by bounded analytic functions and the corona problem", Ann
Math, 76, 547-559
[1977] T.E. Djaferis and S.K. Mitter, "Exact solution of some linear matrix equations using
algebraic methods", M.I.T. Report ESL-P-746
[1976] P.A. Fuhrmann, "Algebraic system theory: An analyst's point of view", J Franklin Inst,
301, 521-540
[1977] P.A. Fuhrmann, "On strict system equivalence and similarity", Int J Contr, 25, 5-10
[1981] P.A. Fuhrmann, Linear Systems and Operators in Hilbert Space, McGraw-Hill, New York
[1981a] P.A. Fuhrmann, "Polynomial models and algebraic stability criteria," Proceedings of the
Joint Workshop on Feedback and Synthesis of Linear and Nonlinear Systems, ZIF
Bielefeld, June 1981
[1981b] P.A. Fuhrmann, "Duality in polynomial models with some applications to geometric
control theory," IEEE Trans Aut Control, AC-26, 284-295
[1983] P.A. Fuhrmann, "On symmetric rational transfer functions", Linear Algebra and Appl,
50, 167-250
[1984] P.A. Fuhrmann, "On Hamiltonian transfer functions", Lin Alg Appl, 84, 1-93
[1988] P.A. Fuhrmann, "Orthogonal matrix polynomials and system theory", Rend Sem Mat
Univers Politecn Torino, Special issue Control Theory, 68-124
[1991] P.A. Fuhrmann, "A polynomial approach to Hankel norm approximations", Lin Alg
Appl, 146, 133-220
[1989] P.A. Fuhrmann and B.N. Datta, "On Bezoutians, van der Monde matrices and the
Lienard-Chipart stability criterion", Lin Alg Appl, 120, 23-38
[1986] P.A. Fuhrmann and P.S. Krishnaprasad, "Towards a cell decomposition for Rat(n)", IMA
J Math Contr Info, 3 (1986), 137-150
[1926] M. Fujiwara, "Über die algebraischen Gleichungen, deren Wurzeln in einem Kreise oder
in einer Halbebene liegen," Math Z, Vol 24, 161-169
[1959] F.R. Gantmacher, The Theory of Matrices, Chelsea, New York
[1990] T.T. Georgiou and M.C. Smith, "Optimal robustness in the gap metric", IEEE Trans,
AC-35: 673-686
[1984] K. Glover, "All optimal Hankel-norm approximations and their L∞-error bounds", Int J
Contr, 39, 1115-1193
[1983] W.B. Gragg and A. Lindquist, "On the partial realization problem", Linear Algebra and
Appl, 50, 277-319
[1989] U. Helmke and P.A. Fuhrmann, "Bezoutians", Lin Alg Appl, Vols 122-124, 1039-1097
[1989] U. Helmke, D. Hinrichsen, and W. Manthey, "A cell decomposition of the space of real
Hankels of rank ≤ n and some applications", Lin Alg Appl, Vols 122-124, 331-355
[1856] C. Hermite, "Sur le nombre des racines d'une équation algébrique comprise entre des
limites données," J Reine Angew Math, Vol 52, pp 39-51
[1965] R.E. Kalman, "On the Hermite-Fujiwara theorem in stability theory", Quarterly of Appl
Math, 32, 279-282
[1965] R.E. Kalman, "Algebraic structure of linear dynamical systems. I. The module of Σ", Proc
Nat Acad Sci (USA), 51, 1503-1508
[1969a] R.E. Kalman, "Lectures on controllability and observability", CIME Summer school
(Pontecchio Marconi, Italy 1968), Edizioni Cremonese, Roma


[1969b] R.E. Kalman, "Introduction to the algebraic theory of linear dynamical systems" in
Mathematical System Theory and Economics (edited by H.W. Kuhn and G.P. Szegö),
Springer Lecture Notes in Operations Research and Mathematical Economics, Vol 11,
41-65
[1969c] R.E. Kalman, "Algebraic characterization of polynomials whose zeros lie in algebraic
domains", Proc Nat Acad Sci, 64, 818-823
[1970] R.E. Kalman, "New algebraic methods in stability theory", Proc 5th International Congress
on Nonlinear Oscillations, Kiev 1969, Vol 2, 189-199
[1979] R.E. Kalman, "On partial realizations, transfer functions and canonical forms", Acta Polyt
Scand, 31, 9-32
[1969] R.E. Kalman, P.L. Falb and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill,
New York
[1936] M.G. Krein and M.A. Naimark, "The method of symmetric and Hermitian forms in the
theory of the separation of the roots of algebraic equations," English translation in Linear
and Multilinear Algebra, Vol 10 [1981], 265-308
[1980] P.S. Krishnaprasad, "On the geometry of linear passive systems", in Linear Systems Theory,
edited by C.I. Byrnes and C.F. Martin, in series Lectures in Applied Mathematics, Vol 18,
253-275, Amer Math Soc, Providence, R.I.

[1890] L. Kronecker, "Algebraische Reduction der Schaaren bilinearer Formen", S.-B. Akad Berlin,
763-776
[1893] A.M. Liapunov, "Problème général de la stabilité du mouvement", Ann Fac Sci Toulouse,
9 (1907), 203-474. (French translation of the Russian paper published in Comm Soc
Math Kharkow)
[1914] A. Lienard and M. Chipart, "Sur le signe de la partie réelle des racines d'une équation
algébrique," J de Math, Vol 10, pp 291-346
[1981] B.C. Moore, "Principal component analysis in linear systems: controllability, observability
and model reduction", IEEE Trans on Auto Contr, 26, 17-32
[1957] Nehari, "On bounded bilinear forms", Ann Math, 65, 153-162
[1985] N.K. Nikolskii, Treatise on the Shift Operator, Springer Verlag, Berlin
[1987] R. Ober, "Asymptotically stable all pass transfer functions: canonical form, parametrization
and realization", Proceedings IFAC World Congress, Munich 1987
[1970] H.H. Rosenbrock, State Space and Multivariable Theory, J. Wiley, New York
[1968] G.c. Rota, "On models for linear operators", Comm. Pure and Appl. Math., 13: 469-472
[1967] D. Sarason, "Generalized interpolation in H""", Trans Amer Math Soc, 127, 179-203
[1970] B. Sz.-Nagy and C. Foias, Harmonie analysis ojOperators on Hilbert Space, North Holland,
Amsterdam
[1931] B.L. van der Waerden, Moderne Algebra, Springer Verlag, Berlin

Module Theory and Linear System Theory


M. L. J. Hautus 1 and M. Heymann 2
1 Department of Mathematics and Computing Science, Eindhoven University of Technology,
NL-5600 MB Eindhoven, The Netherlands
2 Department of Computer Science, Technion-Israel Institute of Technology, Haifa, 32000
Israel

1 Introduction
In a series of seminal papers published between 1960 and 1965 [8-13] (see also
[14, 15]), R.E. Kalman laid the foundations of what has since become known
as Mathematical System Theory. The cornerstones of Kalman's theory were the
celebrated concepts of controllability, observability and (canonical) realization.
The first formal introduction of the concepts of controllability and observability
as fundamental structural properties of (linear) systems was made by Kalman
in [8, 9], and the canonical realization problem and its relation to controllability
and observability was first studied extensively in [11]. The relation between the
structure of canonical realizations and that of transfer matrices was investigated
by Kalman extensively in [12]. The crucial insights in Kalman's theory derived
from the discovery that the concepts of controllability and observability are
linked in an essential way to the system's structure and that many of the system's
structural features are encoded in its controllable and observable behaviors.
In the early stages, the mathematical tools for the study of the structure of
linear systems were only basic linear algebra (the theory of invariant subspaces)
and the theory of matrices (in the style of Gantmacher [7]). A crucial
contribution to the theory of linear systems was Kalman's discovery [13] that
the theory can be embedded naturally in classical module theory. Specifically,
module theory was shown to be a natural setting for an abstract theory of
realization in which the concepts of state, controllability and observability arise
in a completely natural way.
A further central result on linear system structure was obtained by Wonham
in [19], where he showed that (in multi-input systems) controllability is
equivalent to "pole-assignability". This result indicated (probably for the first
time) the deep interrelation between controllability and the (state-)feedback
capabilities of a given system. (The pole-assignability result had been known
much earlier for single-input systems.) The research on linear feedback received
a great boost by the pole assignment result, and proceeded along two basic parallel
avenues: the "geometric" approach, developed primarily by Wonham
and Morse (see, e.g. [18]), in which framework such important problems were
investigated as feedback decoupling, regulator design, design of model-following


systems and the formulation and characterization of feedback-invariants. The


second approach was algebraic and rested primarily on polynomial matrix
techniques. Here it was learned that with the aid of fraction representations of
transfer matrices many problems that relate to system realization as well as to
state feedback can be attacked in a fairly unified way (see, e.g. [6, 16, 17]).
An important consequence of the algebraic research on linear feedback was
the observation by several authors (see, e.g. [2,5]) that the module-theoretic
framework introduced by Kalman for formulating the realization problem could
be adapted to investigate state-feedback as well. Thus a unified view of
realization and feedback within a module-theoretic framework became possible.
The crucial insight in accommodating the state-feedback problem within the
module theoretic framework was the observation that an extended formulation
of the system's input/output map was required [5]. State-feedback could then
be expressed as a suitable (bicausal) linear operator and its effect could be
extensively studied.
Further progress in studying linear feedback was subsequently made in [4],
where it was observed that the extended input/output map introduced in [5]
induces not only a polynomial (or K[z]-) module structure on a system (which
was central in formulating state-related properties of the system) but also a
causal (or K[[z^{-1}]]-) module structure. The latter was shown to be
instrumental in expressing the causal characteristics of the system and hence
facilitated the investigation and formulation of questions related to output feedback.
The role of module theory in linear system theory did not end here, and
in recent years an extensive research program was undertaken by Wyman, Sain,
Conte and Perdon in exploring the pole and zero structures of linear systems
within the module-theoretic framework. The interested reader is referred to the
nice survey of this work [1]. Finally, it is noteworthy that recent research on
the structure of nonlinear systems is strongly influenced by the algebraic work
on linear systems, in which module theory, as originally introduced into system
theory by Kalman, played a fundamental role (see, e.g. [3] and the references
quoted therein).
The present paper is a survey of the polynomial-module theoretic framework
of linear system theory. It is shown that a comprehensive and consistent theory
can be developed starting from basic principles by properly formulating the
input/output behavior of a system. The framework provides a unified view of a
system that eliminates the need to make sharp distinctions between time-domain
and frequency-domain and enables the formulation and examination of a wide
range of system-theoretic questions. The paper is by no means exhaustive and
emphasizes chiefly some of the topics originally explored by Kalman as well
as some questions related to state feedback.


2 Input/Output Maps
A system is a device that accepts inputs and produces outputs (based on the
received inputs). We consider linear discrete-time systems, i.e. we assume that
the following is given:
1. A field K. Typically, K will be the field of real or complex numbers, or a
finite field (e.g. GF(2)).
2. K-linear spaces U and Y, called the input value space and the output value
space, respectively. These are the spaces in which the input and the output
take their values. We define the input and output (trajectory) spaces as U^∞
and Y^∞, where for any K-linear space S, the space S^∞ is defined as the
space consisting of all two-sided infinite sequences (s_t), t ∈ Z, such that s_t ∈ S for
all t ∈ Z, and such that there exists a t_0 ∈ Z for which we have s_t = 0 (t < t_0)
(note that t_0 will depend on s). The spaces U^∞ and Y^∞ can easily be endowed
with a K-linear structure.
3. A K-linear map f: U^∞ -> Y^∞, which is time-invariant and strictly causal. Here
we call a map f: U^∞ -> Y^∞ time-invariant if for all sequences u ∈ U^∞ we have
f(σu) = σf(u), where σ is the shift operator: (σu)(t) := u(t + 1) for all t ∈ Z. A
map f is called strictly causal if u(t) = 0 for t < t_0 implies that f(u)(t) = 0 for
t ≤ t_0.
A map as defined in 3 is called an (extended) i/o map; it is often pictured as a
box mapping an input trajectory u to an output trajectory y = f(u).

Time invariance can be very neatly accommodated by viewing elements of U^∞
and Y^∞ as formal power series. We denote by ΛK the set of formal Laurent
series Σ_{t∈Z} s_t z^{-t}, where (s_t) is in K^∞. The spaces ΛU and ΛY are introduced
similarly. Henceforth the spaces U^∞ and ΛU will be identified. They represent
two ways of looking at the same object: U^∞ emphasizes the time-domain point
of view and ΛU stresses the frequency-domain aspect. The principal convenience
of formal Laurent series is the fact that they can be given an algebraic structure
that is compatible with the time-invariance concept. Thus ΛK becomes a field
and ΛU and ΛY become ΛK-linear spaces. This algebraic structure is defined
in an obvious and well-known way: addition is performed elementwise
and multiplication is convolution. In particular, the shift operator equals
multiplication by z. The following result is fundamental:
(2.1) Theorem. An i/o map is ΛK-linear.

Let f: ΛU -> ΛY be an i/o map. We define the Markov parameters of f as the
K-linear maps T_t: U -> Y, T_t := p_t ∘ f ∘ i_U, where i_U: U -> ΛU denotes the canonical
injection: i_U(u) := Σ_t u_t z^{-t}, with u_0 = u and u_t = 0 for t ≠ 0. Furthermore,
p_t: ΛY -> Y denotes the projection: p_t(Σ_k y_k z^{-k}) := y_t. Strict causality of f
implies that T_t = 0 for t ≤ 0. Consequently, we can define the transfer function
T(z) := T_f(z) := Σ_t T_t z^{-t}. This is an element of ΛL, where L denotes the
K-linear space of K-linear maps U -> Y. In the sequel we will often identify the
i/o map with its transfer function. This is justified, because f(u) = T_f u, where
T_f u denotes the usual convolution product of two Laurent series.
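In coordinates (taking U = K^m and Y = K^p with K = R), the convolution product f(u) = T_f u can be made concrete. The sketch below is ours, not the paper's: the helper apply_io_map and the particular F, G, H are illustrative choices, with the Markov parameters T_k = H F^{k-1} G of a small state-space system (this formula is (3.4) below).

```python
import numpy as np

def apply_io_map(markov, u):
    """Apply a strictly causal i/o map, given by its Markov parameters
    T_1, T_2, ... (p x m matrices), to an input sequence u_0, ..., u_{N-1}.
    Output coefficients: y_t = sum_{k>=1} T_k u_{t-k} (Laurent-series convolution)."""
    p, m = markov[0].shape
    y = [np.zeros(p) for _ in range(len(u))]
    for t in range(len(u)):
        for k, Tk in enumerate(markov, start=1):
            if t - k >= 0:
                y[t] = y[t] + Tk @ u[t - k]
    return y

# Markov parameters of a state-space system: T_k = H F^{k-1} G
F = np.array([[0.0, 1.0], [-0.5, 1.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
markov = [H @ np.linalg.matrix_power(F, k - 1) @ G for k in range(1, 6)]

u = [np.array([1.0])] + [np.array([0.0])] * 4   # impulse at t = 0
y = apply_io_map(markov, u)                      # impulse response = Markov parameters
```

Applying the map to an impulse returns the Markov parameters themselves, which is one way to read the identification of an i/o map with its transfer function.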

3 State-Space Realizations
Suppose that in addition to U and Y, we are given a space X, which we will
refer to as the state space, and consider sequences (x_t), (u_t), (y_t) related by the
following equations:

x_{t+1} = F x_t + G u_t,    y_t = H x_t,    (3.1)

where F, G, and H are linear maps between the corresponding spaces. For each
u ∈ ΛU, one can construct a unique x ∈ ΛX satisfying the first equation of (3.1),
and subsequently y ∈ ΛY by the second equation. This defines an i/o map, which
can be written as

y = T(z)u,    (3.2)

where

T(z) := H(zI - F)^{-1} G.    (3.3)

The corresponding Markov parameters are:

T_t = H F^{t-1} G  if t > 0,  T_t = 0 otherwise.    (3.4)

We will call (3.1) a state-space representation of the i/o map defined by either
(3.3) or (3.4). Realization theory is concerned with the opposite direction of the
above computation: construct a state-space representation from a given i/o map.
Fundamental questions are the existence, the uniqueness and the actual
construction of realizations. As explained in the introduction, the foundation
for an elegant theory was provided by R.E. Kalman. The starting point of this
theory is a precise description of the concept of state. The state is a
time-dependent variable that satisfies two fundamental properties:

- The state depends only on past values of the input.
- Past values of the input affect the output only via the state.    (3.5)

We start with a heuristic discussion. Suppose we are given a state-space
representation in the sense of (3.5). Time invariance allows us to identify the
present time with t = 1. The first statement says that there is a map from the
space of all inputs with support on (-∞, 0] (i.e. u(t) = 0 for t > 0) to the state
space. In terms of our formal power-series notation, inputs with support on
(-∞, 0] can be identified with polynomials. We will use the notation ΩK or
K[z] for the set of polynomials in z, and similarly, ΩU for the set of polynomials
with coefficients in U. Hence we have a map g: ΩU -> X, where X denotes the
state space. The quantity x = g(u) represents the state at time t = 1 resulting
from the input u.
The second statement says that, if we are only interested in the influence of
past inputs (i.e. polynomial inputs) on future values of the output, then there
is a map h from the state space X to the set ΓY of all sequences of future
outputs. The quantity y = h(x) represents the output sequence for t > 0 resulting
from input zero and initial state x. We will specify the set ΓY and the map h
in a moment. Taking the composite of the two maps g and h, we have a
map f̄ := h ∘ g: ΩU -> ΓY, which represents the effect of past inputs on future
outputs.
In order to make this discussion more concrete, we have to define the space
ΓY more precisely. When talking about future outputs, we do not want to
imply that past outputs have been zero. Rather, we want to express the fact
that we are only interested in the future values. That is, we will identify two
output sequences when their values coincide for t > 0. This consideration gives
rise to the introduction of quotient spaces. Specifically: ΓY := ΛY/ΩY. Hence,
two outputs are identified when their difference is a polynomial. For any K-linear
space S, the spaces ΛS, ΩS and ΓS are connected via canonical maps, viz.
j = j_S: ΩS -> ΛS, the canonical embedding, i.e. the restriction of the identity
map on ΛS to ΩS, and π = π_S: ΛS -> ΓS, the canonical projection, mapping
each element of ΛS to its equivalence class. Now we can define the map f̄,
introduced in the previous paragraph, more precisely. In fact, this map is defined
by the following commutative diagram,
        j          f          π
  ΩU ------> ΛU ------> ΛY ------> ΓY    (3.6)

hence, f̄ := π ∘ f ∘ j. This map is sometimes called the Kalman i/o map, or the
restricted i/o map, or the Hankel map. The conceptual experiment, in which
only past inputs and future outputs are considered, is sometimes called the
Kalman experimental setup.
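In coordinates, the restricted map f̄ is represented by the block Hankel matrix (T_{i+j+1}) of Markov parameters, whence the name Hankel map. A small numerical sketch (the data and the truncation depth are our own illustrative choices):

```python
import numpy as np

def block_hankel(markov, rows, cols):
    """Block Hankel matrix of the Markov parameters T_1, T_2, ...:
    block (i, j) is T_{i+j+1}. It represents the restricted map, sending
    past inputs (polynomial coefficients) to future outputs."""
    p, m = markov[0].shape
    Hk = np.zeros((rows * p, cols * m))
    for i in range(rows):
        for j in range(cols):
            Hk[i*p:(i+1)*p, j*m:(j+1)*m] = markov[i + j]
    return Hk

F = np.array([[0.0, 1.0], [-0.5, 1.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
markov = [H @ np.linalg.matrix_power(F, k - 1) @ G for k in range(1, 8)]
Hk = block_hankel(markov, 3, 3)
# For this 2-dimensional system the Hankel matrix has rank 2.
```

The rank of the Hankel matrix equals the dimension of a canonical realization, anticipating the factorization result of Theorem (3.8).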
Obviously, it is of interest to find out how time invariance can be
incorporated into diagram (3.6) by the use of a suitable algebraic structure. The
space ΛU is a ΛK-linear space, but this is obviously not the case for ΩU. In
ΩU, multiplication by z is possible, but division by z can take an element
out of ΩU. Therefore we can say that ΩU is an ΩK-module. Here ΩK = K[z]
is made into a ring in the standard way. Similarly, ΩY is an ΩK-module and,
by a standard construction in algebra, ΓY = ΛY/ΩY is an ΩK-module.
Furthermore, the maps j, f and π are ΩK-homomorphisms, so that

f̄: ΩU -> ΓY is also an ΩK-homomorphism. Finally, it can easily be seen that
the map f̄ uniquely determines the original map f, due to the strict-causality
condition.
We now turn to the actual realization problem. As we have noticed before,
if we have a realization there is an intermediate space X, and there are maps
g: ΩU -> X and h: X -> ΓY such that f̄ = h ∘ g. Thus we can extend diagram (3.6):

        j          f          π
  ΩU ------> ΛU ------> ΛY ------> ΓY
   |                                ^
   '--------- g ----> X --- h ------'    (3.7)

We have not specified anything about X, g and h yet, but in view of the linear
framework we are working in, it is obvious to require that X be a K-linear
space and g and h be K-linear maps. (Note that ΩU and ΓY can be viewed
as K-linear spaces in a canonical way.) However, we also want to exploit the
ΩK-module structure of diagram (3.6) or, equivalently, the time invariance of
the system. For this purpose, we extend the ΩK-module structure to diagram
(3.7), i.e. we define a multiplication by z on X in such a way that g and h
commute with the multiplication by z. It is easily seen that this suffices for
determining an ΩK-module structure on X in such a way that the maps g and
h become ΩK-homomorphisms. Let us see what happens when we apply the
map g to zu instead of u. The input zu is obtained by shifting u one time unit
to the left. By time invariance, the corresponding output y' := f(zu) is obtained
from y := f(u) by shifting to the left. Since the input is zero for positive t, we
find that if x_0 is the initial state, y_i = H F^{i-1} x_0 and y'_i = H F^i x_0 = H F^{i-1}(F x_0).
Hence, y' is the sequence that corresponds to the initial state F x_0 instead of x_0.
So, replacing u by zu has the same effect on the output as replacing x_0 by F x_0.
Therefore, an obvious idea is to identify the action x -> zx with x -> Fx. It is
easily verified that with this definition, X is an ΩK-module, g and h are
ΩK-homomorphisms and (3.7) a commutative diagram of ΩK-homomorphisms.
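The identification zx = Fx can be checked on a concrete pair (F, G) (illustrative values, ours): applying the state map g to the shifted input zu indeed yields F g(u).

```python
import numpy as np

F = np.array([[0.0, 1.0], [-0.5, 1.0]])
G = np.array([[0.0], [1.0]])

def g(u):
    """State at time t = 1 reached from rest by a past (polynomial) input,
    given as the list u = [u_0, u_{-1}, u_{-2}, ...]:  g(u) = sum_k F^k G u_{-k}."""
    x = np.zeros((2, 1))
    for k, uk in enumerate(u):
        x = x + np.linalg.matrix_power(F, k) @ G * uk
    return x

u = [1.0, -2.0, 3.0]   # u_0 = 1, u_{-1} = -2, u_{-2} = 3
zu = [0.0] + u         # multiplication by z: the same values, applied one step earlier
# g(zu) equals F @ g(u), i.e. the module action of z on the state is F.
```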
The following theorem formulates the fundamental result of realization theory:

(3.8) Theorem. Let f: ΛU -> ΛY be an i/o map. To every realization (F, G, H)
there corresponds a unique ΩK-module factorization (g, h) of the restricted map f̄
corresponding to f. Conversely, every factorization of f̄ gives rise to a unique
realization (F, G, H). The two correspondences are inverses of each other.
In short: realization is equivalent to factorization. It will be seen that a number
of concepts and properties are much more easily expressed in terms of the
factorization than directly in terms of the realization.


The correspondences mentioned in the theorem are easily computed. Given
the realization (F, G, H), x = g(u) is the state at time t = 1 resulting from
the input u, and y = h(x) is the output trajectory resulting from initial state x
and input zero. Conversely, given the ΩK-module factorization (g, h) of f̄, the
realization (F, G, H) is defined by:

F: X -> X: x -> zx,
G: U -> X: u -> g(i_U(u)),    (3.9)
H: X -> Y: x -> p_1(h(x)).

Next we turn our attention to the uniqueness of realizations. It can readily be
verified that without any further restriction there is no hope that a realization
could be unique. For this reason, one usually imposes the conditions of
reachability and observability on the realization. These properties are most
easily expressed in terms of the maps g and h.
(3.10) Definition. A realization (F, G, H) (and the corresponding factorization
(g, h)) is called reachable if g is surjective. It is called observable if h is injective.
The realization is called canonical if it is reachable and observable.

It is straightforward to see that these definitions coincide with the familiar
definitions of reachability and observability. The existence of a canonical
factorization is easily demonstrated. In fact, given f̄: ΩU -> ΓY, we can take
X := ΩU/ker(f̄), g: ΩU -> X the canonical projection, and h: X -> ΓY defined
by h(g(u)) := f̄(u). This realization will be called the standard canonical
realization. Also, it is an easy exercise in standard algebraic calculations to
prove that any canonical realization is isomorphic to the standard realization.
Here we call two realizations (X_i, F_i, G_i, H_i) (where i = 1, 2) isomorphic if there
exists an ΩK-isomorphism α: X_1 -> X_2 which makes the following diagram
commutative:

        g_1          h_1
  ΩU -------> X_1 -------> ΓY
  ||           |α          ||
  ΩU -------> X_2 -------> ΓY
        g_2          h_2

This means g_2 = α ∘ g_1 and h_1 = h_2 ∘ α. In terms of the realizations this can be
written as:

α F_1 = F_2 α,   α G_1 = G_2,   H_1 = H_2 α,

which is the familiar concept of isomorphism. Hence, we have the result that
canonical realizations are essentially unique, i.e. unique up to isomorphism.
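Outside the paper's module-theoretic language, a canonical realization can be computed from finitely many Markov parameters by rank-factoring the block Hankel matrix, in the style of the Ho-Kalman algorithm; the sketch below assumes exact data from a low-order system, and all numerical values and names are our own.

```python
import numpy as np

def ho_kalman(markov, n_blocks, m, p):
    """Canonical (reachable and observable) realization from Markov
    parameters via a rank factorization of the block Hankel matrix.
    markov[k] is T_{k+1} (a p x m matrix)."""
    Hk = np.zeros((n_blocks * p, n_blocks * m))
    Hs = np.zeros_like(Hk)                     # shifted Hankel, blocks T_{i+j+2}
    for i in range(n_blocks):
        for j in range(n_blocks):
            Hk[i*p:(i+1)*p, j*m:(j+1)*m] = markov[i + j]
            Hs[i*p:(i+1)*p, j*m:(j+1)*m] = markov[i + j + 1]
    U_, s, Vt = np.linalg.svd(Hk)
    n = int(np.sum(s > 1e-9))                  # state dimension = rank of Hk
    Obs = U_[:, :n] * np.sqrt(s[:n])           # observability factor (the map h)
    Rch = np.sqrt(s[:n])[:, None] * Vt[:n]     # reachability factor (the map g)
    F = np.linalg.pinv(Obs) @ Hs @ np.linalg.pinv(Rch)
    G = Rch[:, :m]
    H = Obs[:p, :]
    return F, G, H

# illustrative data: Markov parameters T_{k+1} = H0 F0^k G0 of a 2-dim system
F0 = np.array([[0.0, 1.0], [-0.5, 1.0]])
G0 = np.array([[0.0], [1.0]])
H0 = np.array([[1.0, 0.0]])
markov = [H0 @ np.linalg.matrix_power(F0, k) @ G0 for k in range(0, 9)]
F, G, H = ho_kalman(markov, 4, 1, 1)
# The recovered (F, G, H) reproduces the original Markov parameters.
```

The factorization Hk = Obs · Rch is exactly a concrete instance of the factorization f̄ = h ∘ g of Theorem (3.8), and the recovered realization is unique up to isomorphism (here, a state-space change of basis).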


4 Finite-Dimensional Realizations
Up to now, we have not made any assumption about the dimension of the
spaces involved. In this section, we ass urne that tfIt and OJ! are finite dimensional.
Specifically, we let tfIt = lKm and OJ! = lK P We are interested in the question of
the existence of a finite-dimensional realization, i.e. a realization in which the
state space q; is finite dimensional. Abstract conditions for this to be the case
can be given in module-theoretic terms. To this extent, we use the following
result:
(4.1) Lemma. Let Δ be a submodule of ΩU. Then the following statements are
equivalent:
1. The ΛK-linear hull of Δ is ΛU.
2. Δ has rank m.
3. ΩU/Δ is a torsion module.
4. ΩU/Δ has finite dimension as a K-linear space.

Let us clarify some of the concepts used in this lemma:
1. The ΛK-linear hull of a module Δ is the smallest ΛK-linear space containing
Δ.
2. By definition, a free module has a basis. If the ring is an integral domain,
the number of basis elements is fixed, i.e. independent of the particular choice
of the basis. This number is called the rank of the module.
3. Because ΩK is a principal ideal domain, every submodule Δ of a free finitely
generated module M is free, and rank Δ ≤ rank M. Since ΩU is isomorphic
to (ΩK)^m, it is free. Hence condition 2 states that Δ has 'full rank'.
4. A module M over a ring R is called a torsion module if for every u ∈ M, there
exists a nonzero r ∈ R such that ru = 0.
We call a submodule Δ of ΩU full if it satisfies the above conditions. Then we
have the following consequence of the lemma:
(4.2) Theorem. A restricted i/o map f̄: ΩU -> ΓY has a finite-dimensional
realization if and only if ker(f̄) is a full submodule of ΩU. A given reachable
realization (g, h) is finite dimensional if and only if ker(g) is full.

The second statement follows from the fact that in a reachable realization, the
state space is isomorphic to ΩU/ker(g).
A more concrete (and well-known) condition for the existence of a
finite-dimensional realization is the rationality of the transfer function. The
above framework enables us to relate any reachable finite-dimensional
realization in an essentially unique way to a matrix-fraction representation of
the transfer function, which in particular implies that the transfer matrix is
rational. Specifically, assume that (g, h) is such a factorization and denote ker(g)
by Δ. Since Δ is full, it is generated by m vectors. Hence we can write Δ = DΩU
(i.e. the image of ΩU under the map induced by D) for some nonsingular
m × m polynomial matrix D. The matrix D not only induces an
ΩK-homomorphism of ΩU but also a ΛK-linear map of ΛU. As such, it can be
composed with the map f. We define N := f ∘ D. An easy calculation yields that
N: ΛU -> ΛY is a polynomial map, i.e. N(ΩU) ⊆ ΩY. Consequently,
f = N ∘ D^{-1} is a representation of f as a left matrix fraction of polynomials.
We formulate this as a theorem:

(4.3) Theorem. Let f: ΛU -> ΛY be an i/o map. Assume that U = K^m and
Y = K^p. For every reachable finite-dimensional realization (g, h), the i/o map can
be written as the matrix fraction f = N D^{-1}, where D is any basis matrix of ker(g)
and N := f ∘ D. This representation is unique up to right multiplication of N and
D by a unimodular matrix. Conversely, to every matrix fraction f = N D^{-1}, one
can construct a reachable realization (g, h) by defining X := ΩU/DΩU, and taking
for g the canonical map. The realization is unique up to isomorphism.
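For a single-input single-output example the fraction f = N D^{-1} can be computed numerically, using the determinant identity det(zI - F + GH) = det(zI - F)(1 + H(zI - F)^{-1}G); the particular F, G, H below are our own illustrative data, not taken from the paper.

```python
import numpy as np

# illustrative 2-dimensional reachable realization
F = np.array([[0.0, 1.0], [-0.5, 1.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

# For a SISO system, f = N / D with
D = np.poly(F)                 # denominator: characteristic polynomial of F
N = np.poly(F - G @ H) - D     # numerator:   H adj(zI - F) G, by the identity above

# check f(z0) = N(z0) / D(z0) = H (z0 I - F)^{-1} G at a sample point
z0 = 2.0
lhs = (H @ np.linalg.inv(z0 * np.eye(2) - F) @ G)[0, 0]
rhs = np.polyval(N, z0) / np.polyval(D, z0)
```

Here D plays the role of a basis "matrix" (a scalar polynomial, since m = 1) of ker(g), and the leading coefficient of N vanishes, reflecting strict causality.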

5 State Feedback
We want to find out to what degree it is possible to modify an existing system
using a state feedback transformation u = -Kx + Lv, where v is a new input
variable. Such a feedback transformation is denoted as (K, L).

[diagram (5.1): the system, with input u, state x and output y, in closed loop
with the feedback law u = -Kx + Lv]

In the diagram, the relation between u and y is y = f(u). In addition, there is
an i/o map from u to x, which we denote by f_s. Hence x = f_s(u). Combining
these relations with u = Lv - Kx, we obtain

u = T_{K,L} v,    (5.2)

where

T_{K,L} := (I + K f_s)^{-1} L.    (5.3)

We conclude that the effect of state feedback can also be achieved by cascading
the system Σ with a precompensator with transfer function T_{K,L}. Conversely,
we may ask which precompensators can alternatively be
implemented using state feedback. That is, for which i/o maps T does there exist
a state feedback transformation (K, L) such that T = T_{K,L}? It is obvious from
the definition that T_{K,L} is invertible and has a causal inverse in addition to
being causal itself. Such a map will be called a bicausal isomorphism. Consequently,


being a bicausal isomorphism is a necessary condition for T to be implementable
as state feedback. The arguments up to now do not use the fact that x is the
state variable. In order to exploit this special property of x, we have
to analyze by what property the map f_s is special amongst i/o maps in general.
We will express the fact that the output of f_s is a state variable by saying that
f_s is an i/s map (input/state). The following gives a characterization of i/s maps:
(5.4) Lemma. An i/o map f_s: ΛU -> ΛX is an i/s map if and only if
ker(f_s) = ker(p_1 ∘ f_s).

The interpretation of this statement is immediate: if for t > 0 the input is zero
and the output of f_s is zero at time t = 1, then the output will be zero for all
positive t.
If we start from an arbitrary i/o map f: ΛU -> ΛY, and we assume we have
a realization (g, h), we can define the i/s map f_s using e.g. the state-space equations
(3.1). The maps f and f_s have the obvious relationship f = H ∘ f_s. Note that we
have extended the map H: X -> Y (since y = Hx). In general, we will call an i/s
map f_s a semirealization of an i/o map f if there exists a static map H such that
f = H ∘ f_s. Here a static map is the obvious extension of a map H: X -> Y to a
map ΛX -> ΛY, obtained by applying the map H to each coefficient of the
power series.
Now we ask the question: which condition must be satisfied for a given i/s
map f_s to be a semirealization of a given i/o map f?
The following result answers this question and it also shows that this answer
is characteristic for i/s maps:
(5.5) Theorem. Let f_1 be an i/o map. Then f_1 is a reachable i/s map iff the
following holds: for every i/o map f satisfying ker(f_1) ⊆ ker(f), the map f_1 is a
semirealization of f.
Note that kernels are taken of the restricted maps. Using matrix fraction
representations, as discussed in Theorem 4.3, we can reformulate the previous
result as follows:
(5.6) Corollary. Let S D^{-1} be an i/o map. Then S D^{-1} is a reachable i/s map iff
the following holds: for every i/o map N D^{-1}, there exists a (unique) static map H
such that N = H S.
Next we turn to the state feedback problem. The basic question is: given a
bicausal map T, when does there exist a feedback compensator (K, L) such that
T = T_{K,L}?
It is easily seen that a bicausal map can be written uniquely as the sum of an
invertible static map and a strictly causal map. Also, it follows from (5.3) that
L is the static part of T_{K,L}. Therefore, upon multiplying T with the inverse of
its static part, we obtain the situation with static part I, corresponding to the


feedback compensator (K, I). The objective then is to find a map K such that
T = (I + K f_s)^{-1} or, equivalently, such that T^{-1} = I + K f_s. The map
f' := T^{-1} - I is strictly causal, and hence it can be seen as an i/o map. The
equation to be satisfied by K reduces to K f_s = f'. That is, our question is
equivalent to: is f_s a semirealization of f'? For this we have seen an answer in
Theorem (5.5): this holds iff ker(f_s) ⊆ ker(f').
Because of the definition of the restricted map this is equivalent to: if the
input u is polynomial and f_s(u) is polynomial, then f'(u) is polynomial, or
equivalently, T^{-1}(u) is polynomial. Thus we obtain the following result.
(5.7) Theorem. Let f_s: ΛU -> ΛX be a reachable i/s map and T: ΛU -> ΛU a
ΛK-linear map. There exists a feedback transformation (K, L) such that T = T_{K,L} iff
(i) T is a bicausal isomorphism,
(ii) for u ∈ ΩU with f_s(u) ∈ ΩX, we have T^{-1}(u) ∈ ΩU.
This result can be reformulated in terms of the matrix fraction representation
of Theorem 4.3. To this end, assume that f_s: ΛU -> ΛX is a reachable i/s map.
According to Theorem 4.3, f_s can be written as S D^{-1} for some polynomial
matrices S and D, where D ΩU = ker f_s (= ker p_1 ∘ f_s, see Lemma 5.4). Then we
have u ∈ ΩU, f_s(u) ∈ ΩX iff u ∈ im D, i.e. u = Dv for some v ∈ ΩU. Hence,
according to the theorem, T = T_{K,L} for some feedback transformation iff
T^{-1} Dv ∈ ΩU for all v ∈ ΩU, i.e. iff T^{-1} D is polynomial. In order to simplify
the formulas and without loss of generality, we assume that L = I or, correspondingly,
that T^{-1} is of the form T^{-1} = I + J, where J is strictly causal. Then the above
condition is: Q := J D is polynomial. Hence, we can write T as
T^{-1} = (D + Q) D^{-1}, i.e. T = D(D + Q)^{-1}. The final result is:

(5.8) Theorem. Let f_s = S D^{-1} be an i/s map. A bicausal precompensator T can
be implemented as state feedback with L matrix equal to I iff T is of the form
T = D(D + Q)^{-1}, where Q is a polynomial map such that Q D^{-1} is strictly causal.
An i/o map f_1 can be obtained from f_s by static state feedback iff it can be written
as f_1 = H S (D + Q)^{-1}.
We conclude that, in terms of transfer functions, a state feedback transformation
simply transforms the fraction N D^{-1} into N(D + Q)^{-1}, where Q is a
polynomial matrix such that Q D^{-1} is strictly proper, and conversely, every
transfer matrix of the form N(D + Q)^{-1} can be obtained by static state feedback.
This is completely analogous to the well-known single-variable case.
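This conclusion can be checked numerically on a small single-variable example (all data below are our own illustrative choices): the feedback u = -Kx + v leaves the numerator N unchanged and perturbs the denominator D by a polynomial Q with Q D^{-1} strictly proper.

```python
import numpy as np

F = np.array([[0.0, 1.0], [-0.5, 1.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
K = np.array([[1.0, -2.0]])              # state feedback u = -Kx + v  (L = I)

# open-loop fraction N D^{-1}, numerator via det(zI - F + GH) - det(zI - F)
D = np.poly(F)                           # open-loop denominator
N = np.poly(F - G @ H) - D               # open-loop numerator

# closed loop: replace F by F - GK and recompute the fraction
Dcl = np.poly(F - G @ K)                 # closed-loop denominator
Ncl = np.poly(F - G @ K - G @ H) - Dcl   # closed-loop numerator

Q = Dcl - D                              # the polynomial Q of Theorem (5.8)
# Ncl equals N, and the leading coefficient of Q is zero (deg Q < deg D),
# so Q D^{-1} is strictly proper, as the theorem requires.
```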
References
[1] G. Conte and A.M. Perdon, "Zeros, Poles and Modules in Linear System Theory," in Three
Decades of Mathematical System Theory, Springer-Verlag Lecture Notes in Control and
Information Sciences, No 135, pp 79-100, 1989
[2] A.E. Eckberg, Jr., "A Characterization of Linear Systems via Polynomial Matrices and Module
Theory," MIT Electronic Systems Laboratory Rep ESL-R-528, Cambridge, MA, 1974


[3] J. Hammer, "Assignment of Dynamics for Nonlinear Recursive Feedback Systems," Int J
Control, 48, pp 1183-1212, 1988
[4] J. Hammer and M. Heymann, "Causal Factorization and Linear Feedback," SIAM J on
Contr and Optim, 19, pp 445-468, 1981
[5] M.L.J. Hautus and M. Heymann, "Linear Feedback-an Algebraic Approach," SIAM J Contr
and Optim, 16, pp 83-105, 1978
[6] M. Heymann, Structure and Realization Problems in the Theory of Dynamical Systems,
Springer, New York, 1975
[7] F.R. Gantmacher, The Theory of Matrices, Chelsea, New York, 1959
[8] R.E. Kalman, "Contributions to the Theory of Optimal Control," Bol Soc Mat Mexicana,
5, pp 102-119, 1960
[9] R.E. Kalman, "On the General Theory of Control Systems," Proc 1st IFAC Congress, Moscow,
Butterworths, London, 1960
[10] R.E. Kalman, "Canonical Structure of Linear Dynamical Systems," Proc Nat Acad Sci (U.S.A.),
48, pp 595-600, 1962
[11] R.E. Kalman, "Mathematical Description of Linear Dynamical Systems," SIAM J Control,
1, pp 152-192, 1963
[12] R.E. Kalman, "Irreducible Realizations and the Degree of a Rational Matrix," SIAM J Control,
3, pp 520-544, 1965
[13] R.E. Kalman, "Algebraic Structure of Linear Dynamical Systems. I. The Module of Σ," Proc
Nat Acad Sci (U.S.A.), 54, pp 1503-1508, 1965
[14] R.E. Kalman, Lectures on Controllability and Observability, Lecture Notes, CIME, July 1968,
Cremonese, Rome, 1969
[15] R.E. Kalman, P.L. Falb, and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, 1969
[16] H.H. Rosenbrock, State Space and Multivariable Theory, John Wiley & Sons, New York, 1970
[17] W.A. Wolovich, Linear Multivariable Systems, Springer Verlag, New York, 1974
[18] W.M. Wonham, Linear Multivariable Control, 3rd ed., Springer Verlag, New York, 1985
[19] W.M. Wonham, "On Pole Assignment in Multi-input Controllable Linear Systems," IEEE
Trans Auto Contr, AC-12, pp 660-665, 1967

Models and Modules: Kalman's Approach


to Algebraic System Theory
B. F. Wyman
Mathematics Department, The Ohio State University, Columbus, Ohio 43210, USA

1 Introduction
In a brief expository paper presented in 1967 [K5], Kalman asked the following
questions:
What is a system? How can it be effectively described in mathematical
terms? Is there a deductive way of passing from experiments to mathematical models? How much can be said about the internal structure of
a system on the basis of experimental data? What is the minimal set of
components from which a system with given characteristics can be built?

In fact, in a series of papers [K1-3] Kalman had already supplied nice
answers to these questions in the specific context of finite-dimensional stationary
linear systems. The paper [K1] formulates these problems for general (possibly
time-varying) continuous-time systems, while [K2] studies the algebraic theory
of discrete-time systems.
The present survey will deal primarily with [K2], here abbreviated as [AS]
for "Algebraic Structure I," and the ideas and generalizations which grew from
it. It will be important for us to realize that [AS] has inspired two separate
lines of thought. The first, more general and fundamental, is the idea that
rigorous, technical definitions can be given for the ideas of an input/output
behavior, and the corresponding notions of "realization" and "minimal
realization." Furthermore, at least in the stationary linear case, each input/output
behavior leads to a minimal realization with suitable unicity properties. Thus,
there is "a deductive way of passing from experiments to mathematical models."
In the linear stationary case these new ideas can be implemented extremely
nicely both on a theoretical and on a concrete algorithmic level, but the
philosophy is quite general and has influenced research in the areas of time-varying
systems, non-linear systems, and much more abstract constructions. In
Sect. 3 we discuss briefly some of the "category-theoretic" work of this kind
which was directly influenced by [AS], and many additional aspects of this
philosophy will appear in other papers in this volume.
The second aspect of [AS], which is most evident from the title and for
which the paper itself is usually remembered, is the use of ideas from
commutative algebra to describe the structure of a finite-dimensional stationary linear
system. In [AS], the sets of input signals and output signals, and the state space,
are all given the structure of modules over the ring of polynomials over a field
of scalars. The main result is that an input/output map can be realized by a
unique minimal (or irreducible) system. A more detailed version of the algebraic
theory can be found in the well-known Chapter X of the 1969 book by Kalman,
Falb, and Arbib ([K8, KFA]). Section 2 of this paper gives a capsule summary
of Kalman's algebraic approach and some of its generalizations.
Finally, although Kalman's original work on the algebraic structure of the
state space emphasized the poles or natural modes of a system, the module-theoretic
ideas involved can also be applied to the theory of zeros of multivariable
systems. The interactions between Kalman's pole modules and the more
recently defined multivariable zero modules lead to clean formulations of
control-theoretic ideas which have been known for a long time, and suggest a
number of unsolved problems. The long-term project of applying Kalman's
algebraic methods to the theory of zeros is discussed in Sect. 4.

2 The Module Theoretic Structure of Linear Systems


Following [AS] we will use here discrete-time notation for stationary linear
systems. Denote by K an arbitrary field (typically the reals or complexes for
engineering applications, finite fields for coding theory, the rationals, p-adics and more
general fields for applications to systems over rings). Let X (the state space), U
(the input space), and Y (the output space) be finite-dimensional vector spaces
over K. Let F: X → X (the dynamics map), G: U → X (the input map), H: X → Y
(the output map) be K-linear maps. The structure equations for a system Σ will
then be given by

   x(t + 1) = Fx(t) + Gu(t),     y(t) = Hx(t)
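The structure equations are easy to step forward in code; the following numpy sketch uses hypothetical matrices F, G, H chosen purely for illustration, not taken from the text.

```python
import numpy as np

# Step the structure equations x(t+1) = F x(t) + G u(t), y(t) = H x(t)
# for a hypothetical 2-state, 1-input, 1-output example system.

F = np.array([[0.0, 1.0],
              [-0.5, 1.0]])    # dynamics map F : X -> X
G = np.array([[0.0],
              [1.0]])          # input map G : U -> X
H = np.array([[1.0, 0.0]])     # output map H : X -> Y

def simulate(u_seq, x0):
    """Run the system over an input sequence, returning the output sequence."""
    x = x0
    ys = []
    for u in u_seq:
        ys.append((H @ x).item())
        x = F @ x + G @ np.array([[u]])
    return ys

# A unit impulse from the zero state produces the Markov parameters H F^(t-1) G.
ys = simulate([1.0, 0.0, 0.0, 0.0], np.zeros((2, 1)))
print(ys)
```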

Following classical ideas from abstract algebra (as found, for example, in Van
der Waerden's text), Kalman defined a K[z]-module structure on the state space
X by p(z)·x = p(F)x for any state vector x. (If F is considered as a square matrix,
p(F) is also a square matrix for any polynomial p(z), and the right-hand side
is simply matrix-vector multiplication.) The study of this K[z]-module structure
is equivalent to the study of F up to similarity, which can be done using the
language of invariant factors or of canonical forms. In fact, from this point of
view the module-theoretic treatment unifies the two approaches and gives the
proofs of the canonical form theorems as corollaries of more general results in
commutative algebra.
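The module action p(z)·x = p(F)x can be computed directly; a small numpy sketch (the polynomial, matrix, and state vector are hypothetical examples) evaluates p(F) by Horner's rule and applies it to a state.

```python
import numpy as np

# The K[z]-module action on the state space: p(z) . x = p(F) x.

def module_action(p_coeffs, F, x):
    """Apply p(F) to x, where p_coeffs = [a_n, ..., a_1, a_0] (highest first).
    Horner's rule builds p(F) as a matrix, then multiplies by x."""
    n = F.shape[0]
    acc = np.zeros((n, n))
    for a in p_coeffs:
        acc = acc @ F + a * np.eye(n)
    return acc @ x

F = np.array([[0.0, 1.0],
              [0.0, 0.0]])     # nilpotent dynamics: F^2 = 0
x = np.array([1.0, 2.0])

# p(z) = z^2 + 3z + 1 acts as F^2 + 3F + I = 3F + I here,
# so p(z) . x = x + 3 F x = [1 + 3*2, 2] = [7, 2].
print(module_action([1.0, 3.0, 1.0], F, x))
```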
The observation that the dynamical structure on the state space of a linear
system corresponds to a polynomial module structure opens up new possibilities
for generalizations and connections between system theory and algebra, but
perhaps by itself is not so interesting. Kalman's major contribution in [AS]
was the discovery that not only the state space, but also the spaces of input
and output sequences admit natural polynomial module structures, and that
the three structures fit together perfectly. The input/output spaces are in fact
not finite dimensional, so the full power of module theory is needed to give a
good conceptual view of the situation.
In his study of input and output strings, Kalman began with a finite-dimensional
space V, say, and a time set T consisting of the integers with the
standard order. Trajectories 𝕋V are maps v: T → V such that there is an integer
N = N(v) with v(t) = 0 for all t < N. The value v(t) contains information
of some kind which occurs at time t. Past trajectories (up to and including the
present moment) are given by the set ΩV of all trajectories vanishing for t > 0,
while future trajectories ΓV vanish for t < 1. These spaces can be given
various algebraic structures using the Z-transform. If v is a trajectory, the
Z-transform is the formal Laurent series given by

   Z(v) = Σ_{t=N}^∞ v(t) z^{-t}

where z is a place-holding variable chosen so that z^{-t} indexes events at time
+t. We denote by K((z^{-1})) the field of formal Laurent series over the field K
of scalars, and the space of trajectories (vector Laurent series) by V((z^{-1})). The
ring K[z] of polynomials can be considered as a subring of K((z^{-1})). In
transform notation the set ΩV corresponds to polynomials in z (past time gives
positive exponents), and we can write ΩV = V[z], thinking of polynomials with
vector coefficients. The set ΓV consists of formal Laurent series Σ_{t=1}^∞ v(t) z^{-t}
starting at t = 1, where for each t, v(t) is a vector in V.
The polynomial ring K[z] acts on the trajectories V((z^{-1})) in the obvious
way:

   z · Σ_{t=N}^∞ v(t) z^{-t} = Σ_{t=N-1}^∞ v(t + 1) z^{-t},

so that multiplication by z has a dynamic meaning as a time shift one unit into
the past. In particular, the set ΩV of past trajectories is invariant under this
action. In other words, ΩV becomes a K[z]-module itself, and as such it is a
free module of rank equal to the vector-space dimension of V.
So far we have seen that the space of trajectories and the space of past
trajectories can be viewed as K[z]-modules, and that these module actions have
a natural dynamical meaning as time shifts. What about the space ΓV of future
trajectories? Kalman's module action of K[z] on ΓV, introduced in [AS] and
described in more detail in Chapter X and the CIME notes [K7, 8], is one of
the most important ideas of the algebraic approach. As a first attempt, try the
obvious action: z · Σ_{t=1}^∞ v(t) z^{-t} = Σ_{t=0}^∞ v(t + 1) z^{-t}. This result does not lie in ΓV,
since it begins with v(1), so Kalman elected to erase this initial term, obtaining


the new action z · Σ_{t=1}^∞ v(t) z^{-t} = Σ_{t=1}^∞ v(t + 1) z^{-t}. From a dynamical point of
view, this piece of algebra is extremely natural, corresponding to the intuitive
principle "multiplication by z is a time shift, ignoring data which falls into the
past." Furthermore, this action is equally natural from several algebraic points
of view, leading to the well-known Kalman realization diagram.
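The shift-and-erase action is a one-line operation on coefficient lists; a minimal Python sketch (scalar coefficients and hypothetical data, where the text works with vectors in V):

```python
# Kalman's K[z]-action on future trajectories Gamma V, on finitely
# supported coefficient lists [v(1), v(2), ..., v(N)].

def shift(traj):
    """Multiply by z: z . (v(1)/z + v(2)/z^2 + ...) = v(2)/z + v(3)/z^2 + ...
    The initial term v(1) falls into the past and is erased."""
    return traj[1:]

def act(p_coeffs, traj):
    """Act by p(z) = a_0 + a_1 z + ... (lowest degree first) on a trajectory."""
    out = [0.0] * len(traj)
    power = list(traj)
    for a in p_coeffs:
        for i, v in enumerate(power):
            out[i] += a * v
        power = shift(power) + [0.0]   # pad so lengths stay aligned
    return out

v = [1.0, 2.0, 3.0]          # v(1)=1, v(2)=2, v(3)=3
print(shift(v))               # [2.0, 3.0]
print(act([0.0, 1.0], v))     # action of p(z) = z, padded: [2.0, 3.0, 0.0]
```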
Suppose given a linear system in the form (X, U, Y, F, G, H). The idea that
every past input string should set up a state leads to a mapping 𝔾: ΩU → X
defined by setting 𝔾(u z^i) = F^i G u for all u in U, and extending linearly to all
polynomials. The mapping 𝔾 becomes a K[z]-module map if X is given the
standard dynamical structure and ΩU is given the shift module structure
discussed above. In fact, 𝔾 is the "correct" abstract version of the controllability
matrix, and the system (F, G, H) is controllable exactly when 𝔾 is surjective.
For studies of observability, Kalman introduced a map ℍ: X → ΓY defined
by ℍ(x) = Σ_{t=1}^∞ H F^{t-1}(x) z^{-t}. Then ℍ is a module homomorphism which serves
as the abstract version of the classical observability matrix, and is one-to-one
if the system is observable.
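In the finite-dimensional case, surjectivity of 𝔾 and injectivity of ℍ reduce to the familiar rank tests on the controllability and observability matrices; a numpy sketch with hypothetical matrices:

```python
import numpy as np

# Finite matrix shadows of the abstract maps: the controllability matrix
# [G, FG, ..., F^{n-1}G] and the observability matrix [H; HF; ...; HF^{n-1}].

def controllability_matrix(F, G):
    n = F.shape[0]
    blocks, B = [], G
    for _ in range(n):
        blocks.append(B)
        B = F @ B
    return np.hstack(blocks)

def observability_matrix(F, H):
    n = F.shape[0]
    blocks, C = [], H
    for _ in range(n):
        blocks.append(C)
        C = C @ F
    return np.vstack(blocks)

F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])

# Full rank <-> surjective controllability map / injective observability map.
print(np.linalg.matrix_rank(controllability_matrix(F, G)))  # 2: controllable
print(np.linalg.matrix_rank(observability_matrix(F, H)))    # 2: observable
```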
The controllability and observability morphisms fit perfectly into a
commutative diagram, the Kalman realization triangle, which has had an
enormous influence on system theory:

             T#
    ΩU -----------> ΓY
       \           ^
      𝔾 \         / ℍ
         v       /
            X

where, for the moment, T# = ℍ ∘ 𝔾 is a definition.


This diagram leads immediately to conceptual approaches to the study of
canonical (controllable and observable) systems, and to the careful separation
of the notions of canonical and minimal systems. The uniqueness problem for
a canonical realization of a transfer function is also clarified by examination
of this diagram, but to discuss this aspect of Kalman's work we have to work
a little harder.
The idea of a transfer function, which developed classically from Z- or
Laplace-transforms, can be formulated algebraically as follows. A transfer
function matrix, or better, an input/output morphism, is (in the present context)
a mapping T(z): 𝕋U → 𝕋Y between spaces of trajectories. Each trajectory space
is a vector space over the field K((z^{-1})) of Laurent series, and to say that T(z)
is linear and stationary is exactly the assertion that T(z) is K((z^{-1}))-linear.
Familiar calculations give T(z) = H(zI - F)^{-1}G. Our next goal is to describe
Kalman's connection between this K((z^{-1}))-linear transfer function and the
K[z]-module realization triangle.
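The formula T(z) = H(zI − F)^{-1}G can be checked symbolically; a sympy sketch with hypothetical matrices:

```python
import sympy as sp

# Compute the transfer function T(z) = H (zI - F)^{-1} G for a small
# hypothetical example system.

z = sp.symbols('z')
F = sp.Matrix([[0, 1], [-2, -3]])
G = sp.Matrix([[0], [1]])
H = sp.Matrix([[1, 0]])

T = sp.simplify((H * (z * sp.eye(2) - F).inv() * G)[0, 0])
print(T)   # 1/(z**2 + 3*z + 2)
```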


We saw above that the past trajectories ΩU appear as a subset, and even
a sub-K[z]-module, of the space 𝕋U of all input trajectories. The factor module
"all trajectories modulo past trajectories," 𝕋Y/ΩY, is then exactly the space
ΓY of future output trajectories, and the induced K[z]-action on the factor
module is exactly Kalman's dynamical action discussed above. That is, ΩU
injects K[z]-linearly into 𝕋U and 𝕋Y projects K[z]-linearly onto ΓY. The
resulting five-space diagram

            T(z)
    𝕋U -----------> 𝕋Y
     ^               |
     |               v
    ΩU -----T#----> ΓY
       \           ^
      𝔾 \         / ℍ
         v       /
            X

results from gluing a "transfer function trapezoid" to a "realization triangle".

Chasing through a lot of formulas suffices to prove that the diagram commutes,
but there is a better way to think about the commutativity. The middle mapping
T#(z): ΩU → ΓY, called the "Kalman input/output map," is a K[z]-module
mapping whose action can be described as follows: Set the state of the system
to zero, sometime in the distant past, and start inputting a string which ends
at time t = 0. Then (and only then) start observing the output (from time t = 1
on). The algebraic theorem that the bottom triangle commutes can be described
intuitively as follows: the output in the future arising from an input confined to
the past can be described completely as the output of the state at time t = 1
resulting from the input.
So far we have been working from a given system, and have defined the
"big Kalman realization diagram," which relates input/output behavior to the
internal behavior of the system. From this point it is easy to imagine reversing
the process, starting with a description of input/output behavior, and striving
to construct a system which produces that behavior. With this in mind, consider
given a transfer function T(z): 𝕋U → 𝕋Y, which is a K((z^{-1}))-linear mapping.
Consider the trapezoidal part of the Kalman diagram and define the Kalman
mapping T#(z): ΩU → ΓY by composing the three maps across the top. Now
define the state space X by the linear analog of the Nerode equivalence relation
in automata theory. That is, the state space X consists of the set of input
strings, modulo those strings whose future output vanishes. More formally:

   X(T) = ΩU/ker T#(z) = ΩU/(ΩU ∩ T^{-1}(ΩY))

With this definition, the state space (or the pole module of the transfer
function) is a K[z]-module, and the dynamics mapping F on X can be defined
as the action of the variable z. The K[z]-linear mappings 𝔾 and ℍ come free
from standard lemmas in commutative algebra, and 𝔾 is surjective while ℍ is
injective. Finally, the K-linear mappings G and H can be derived from 𝔾 and
ℍ by reversing the philosophy of earlier paragraphs. In other words, given a
transfer function, the big Kalman diagram provides a canonical realization.
An important part of Kalman's work in this area involves explicit and
algorithmic methods for computation, particularly the circle of ideas around
the B.L. Ho algorithm ([AS, Sect. 3; K8, p. 288ff.]). Also, since the pole module
X is a finitely generated module over K[z], the full machinery of invariant
factors and Smith canonical form is available and very useful in the study of
such systems ([K8, p. 276ff.]). The reader is referred to the original sources
for more about these important topics. Here, in anticipation of the next section,
we turn to uniqueness questions.
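The Ho-style realization computation just mentioned can be sketched in a few lines. This is an illustrative variant that uses an SVD for the Hankel-matrix factorization rather than Ho's original rank-revealing reduction, run on a hypothetical scalar impulse response:

```python
import numpy as np

# Recover a minimal (F, G, H) from scalar Markov parameters
# A_k = H F^(k-1) G by factoring a finite Hankel matrix.

def realize(markov, n):
    """markov = [A_1, A_2, ...]; n = the (known) minimal state dimension."""
    N = len(markov) // 2
    Hank  = np.array([[markov[i + j]     for j in range(N)] for i in range(N)])
    Hank1 = np.array([[markov[i + j + 1] for j in range(N)] for i in range(N)])
    U, s, Vt = np.linalg.svd(Hank)
    Obs = U[:, :n] * np.sqrt(s[:n])          # observability factor
    Ctr = np.sqrt(s[:n])[:, None] * Vt[:n]   # controllability factor
    F = np.linalg.pinv(Obs) @ Hank1 @ np.linalg.pinv(Ctr)  # shifted Hankel
    return F, Ctr[:, :1], Obs[:1, :]

# Impulse response 1, 1/2, 1/4, ... : a single pole at z = 1/2.
markov = [0.5 ** k for k in range(8)]
F, G, H = realize(markov, n=1)
print((H @ G).item(), (H @ F @ G).item())   # recovers A_1 = 1, A_2 = 1/2
```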
Suppose given a transfer function T(z). The pole module X(T), and in fact
the whole system (F, G, H), can be constructed by means of a big Kalman
realization diagram as described above. How do we know that the system so
derived depends only on the transfer function and not on the particular
construction used? What is needed is a uniqueness theorem, which was carefully
stated in [K8, p. 258] and proved as a corollary of "Zeiger's Lemma." Quoting
Kalman's summary: "All canonical realizations of any fixed [transfer function]
are essentially the same: the equivalence classes of canonical realizations are in
one-to-one correspondence with the class of input/output maps." Closely related
to this result is the fact that there is a system mapping from any reachable
system onto any canonical system, and also an injective mapping from any
observable system into any canonical system. This body of results gives a very
clear view of the fact that any minimal system (having a state space of smallest
possible dimension) is also a canonical system. (See [W3] for a quick discussion
of these issues.)
Some of Kalman's original thinking in this area was motivated by the
Krohn-Rhodes theory of finite semigroup machines. See Kalman's discussions
in [AS, Note 2 (p. 1508); K8, Sect. 10.12]. In that theory, one machine "divides"
another if the first is a homomorphic image of a submachine of the second. In
general, there may be no morphism between the two machines in either direction.
This formalism is closely related to the ideas of reachability and observability
previously introduced by Kalman, and these interrelationships seem to have
influenced the final shape of Kalman's realization triangle.

3 The Categorical Imperative


In February 1974 E.G. Manes, M.A. Arbib, and others organized a symposium
at the AAAS meeting in San Francisco entitled "The First International
Symposium: Category Theory Applied to Computation and Control." As far
as I know there was never a second. The proceedings volume [Ma], edited by
Manes, contains many papers on categorical generalizations of Kalman's
realization triangle.
Category theory supplies good generalizations of Kalman's Ω spaces of
"past input strings" and his Γ spaces of "future output strings." In great generality


we have a diagram of the form

   ΩU ---> X ---> ΓY

where the precise definitions of Ω, Γ, and X depend on the context. Several
long papers by Arbib and Manes in [Ma] give a good introduction to the
category-theoretic approach, together with definitions of Ω and Γ in great
generality. A much shorter note [W2], written independently of Arbib and
Manes' work, restricts attention to (generalized) linear systems, using examples
which are rather concrete, following closely Kalman's philosophy and using
commutative algebra and ideal theory to compute realizations in many cases.
The paper [W2] starts with two rings K and R, where K should be thought
of as a ring of scalars (generalizing the scalar field K discussed earlier), and R
is a ring of some sort of "generalized difference operators." In the classical
theory R is just the polynomial ring K[z]. We assume that K is a subring of R,
so that an R-module can be viewed as a K-module simply by forgetting its
R-module structure. A system in this context is a triple of spaces X, U, and Y
and a pair of maps G: U → X, H: X → Y, where U and Y are K-modules and X
is an R-module and also a K-module by forgetting. The two maps G and H
are K-module mappings, representing input and output mappings. The
dynamics map F: X → X has been replaced by the R-module action on X, just
as Kalman used the K[z]-action on X to describe its dynamical structure in
the classical case.
To produce an abstract analog of Kalman's realization triangle, we need to
examine the algebraic role of Ω and Γ, spaces originally defined by a strong
dynamical intuition. Kalman himself already had a good idea of what was
involved, pointing out in 1968 that ΩU is a projective module and ΓY is an
injective module ([K7, p. 48]).
Considering the input side more formally, what we require of a space of
the form ΩU is that it be an R-module, that U be a K-submodule of ΩU, and
that every K-module map G: U → X possess a unique R-module extension
𝔾: ΩU → X. That is, we need an input Kalman diagram

   ΩU
    ^  \
    |   \ 𝔾
    |    v
    U ----> X
        G


In other words, the correspondence G ↦ 𝔾 establishes an isomorphism of K-modules
Hom_K(U, X) ≅ Hom_R(ΩU, X). That is, the construction Ω must be a
left adjoint of the forgetful functor which takes an R-module to its underlying
K-module. In the Arbib and Manes theory, such an adjoint can be rather
elaborate, but in the present ring-theory context it is easy to construct: ΩU is
just given by the tensor product ΩU ≅ R ⊗_K U, which is a free R-module if U
is free over K (as for vector spaces). The map 𝔾 also has a standard construction.
In the classical case this is exactly Kalman's construction, so that we have a
wonderful blending of dynamical intuition and fancy algebra.
The output side is more difficult. Of course, an appropriate diagram can be
easily constructed:

        ℍ
    X ----> ΓY
     \       |
    H \      v
       ----> Y

That is, there should be a K-module mapping ΓY → Y, and for every
K-module map H: X → Y an R-module lifting ℍ: X → ΓY which commutes as
shown. Here we have Hom_K(X, Y) ≅ Hom_R(X, ΓY), via H ↦ ℍ, so that Γ is a
right adjoint to the forgetful functor. For most of us, the theory of right adjoints,
involving universally receptive spaces and the theory of injective modules, is
measurably less intuitive than left adjoints and free modules. Where did
Kalman's ΓY really come from (in an algebraic or categorical sense), and why
is it a right adjoint functor? Kalman's formulas, with their dynamical inspiration,
supply proofs which do not generalize well. My attempt to understand the
situation started with a remark of Kalman in the CIME notes ([K7, p. 53,
Remark 4.26A]), in which he points out that for a state space X, Hom_K(X, Y) ≅
Hom_{K[z]}(X, Y(z)/Y[z]). Now, the space Y(z)/Y[z] looks remarkably like ΓY
(in fact, it is the torsion submodule of ΓY). This isomorphism is close to the
required adjointness theorem, since a K-linear map H: X → Y would correspond
under the isomorphism to a K[z]-linear map ℍ: X → Y(z)/Y[z]. In the linear
categorical theory involving two rings K and R, the right adjoint to the forgetful
functor is exactly Hom_K(R, Y), so perhaps we are pretty close. On the other
hand, it is not quite true that

   Hom_K(K[z], Y) ≅ Hom_{K[z]}(K[z], Y(z)/Y[z])

The right-hand side is just Y(z)/Y[z], but the left-hand side is "too big" for
this attractive isomorphism to hold. But, in fact, the truth is better. We have

   Hom_K(K[z], Y) ≅ Y((z^{-1}))/Y[z] ≅ ΓY


via the correspondence sending f: K[z] → Y to Σ_{i=0}^∞ f(z^i) z^{-i-1} (mod Y[z]). That
is, Kalman's Γ construction is exactly the polynomial case of the general right-adjoint
functor approach, and again we have a perfect match between dynamical
intuition and algebra.
The idea that the polynomial ring K[z] can be replaced by a more or less
arbitrary ring R acting on the state space X ("the module action is the
dynamics!") has led to a lot of examples and nice results. The idea is to find a
ring whose action describes somehow a dynamics worthy of study, and then
to use Kalman's philosophy to compute suitable input and output string spaces.
The paper by Ed Kamen in this volume will address many of these issues. The
articles ([W3, W4]), partly inspired by Kamen's work on time-varying systems,
deal with general rings and some exotic specific rings. A somewhat different
direction in the application of module theory to system theory, namely the
theory of "families of systems," or "systems over rings," was also begun under
Kalman's influence [RW, RWK]. We say no more about this area here, since
it will be discussed in detail in Kamen's article. Here we discuss more concrete
examples closely related to Kalman's original work on stationary systems over
fields of scalars.
Although the polynomial ring K[z] is the most familiar subring of the field
K(z) of rational functions, many other subrings R of K(z) also have system-theoretic
significance. The algebraic study of the behavior of transfer functions
at infinity is based on the ring O∞ of all proper rational functions, those rational
functions with numerator degree no larger than the denominator degree. The
ring O∞ is a local ring with a unique maximal ideal corresponding to the point
at infinity. Classical transfer functions have no poles at infinity, so that the
study of O∞-modules becomes important only for the study of singular systems,
discussed immediately below, or for the study of modules of multivariable zeros
(which we discuss in the next section).
The discrete-time version of singular (improper, semistate) systems deals
with equations of the form

   Ex(t + 1) = Fx(t) + Gu(t),     y(t) = Hx(t),

where the matrix E may be singular. The study of improper systems began late
in the history of system theory, but recently there has been a great deal of new
activity in this area [LN, M]. In the continuous-time case, such systems may
involve differentiation of inputs or impulsive responses [VLK], and in the
discrete-time case improper systems simulate predictors. On the other hand,
they can arise abstractly from inversion problems of various kinds, and systems
in descriptor form (or, synonymously, systems with embedded statics) behave
formally like predictors in certain cases [L]. Furthermore, from an algebraic
or algebro-geometric point of view the class of (possibly) improper systems is
the natural object of study. The point at infinity must be included in any
comprehensive study of poles and the zeros of linear systems. Many common
problems such as system inversion or solving transfer function equations lead


naturally to improper systems even if the given data involve only proper or
strictly proper systems. If these systems are to be studied by means of the
Kalman linear realization theory philosophy, then a new ring must be sought
to replace the polynomials K[z]. In fact O∞ is the correct choice, at least for
the "purely improper" case in which the dynamics mapping F is just the identity
map. In this case, the transposed equation x(t) = Ex(t + 1) - Gu(t) shows that
the system can be viewed as evolving backwards in time. In the Z-transform
theory the study of backwards evolution involves multiplication by z^{-1}, which
lies in O∞ but not in the polynomial ring. New techniques, involving a global
state-space construction, must be used to study the most general singular system.
The approach to singular systems via the algebraic theory of systems at infinity
was begun by Conte and Perdon in [CP1, 2], and some more recent work on
global systems can be found in [WCP, WSCP].
Rings which are more general than O∞ but which are still subrings of K(z)
are also useful. If X is a classical state space viewed as a module over K[z]
and S is a multiplicative subset of K[z], then we can consider the localization
R = S^{-1}K[z] and the corresponding localized state space S^{-1}X. For example,
if S is the set of polynomials with all roots in the right half plane, then S^{-1}X
is the stable part of the state space. (Localization at a set of modes, which is
essentially just expanding the ring of operators to allow more denominators,
is a functorial algebraic technique which kills off the set of states corresponding
to those modes.) Furthermore, localization of K[z] can be combined with the
theory at infinity to develop a theory of proper stable or proper unstable transfer
functions and their corresponding state spaces. The idea here is that a very
small amount of category theory allows us to construct perfect analogs of
Kalman's input Ω spaces and output Γ spaces, and to mimic his realization
triangle exactly. The foundations of this theory have been worked out in some
detail in [WCP].
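The effect of localizing at a set of modes can be seen concretely by splitting a state space along eigenvalues; a numpy sketch with a hypothetical F, using the discrete-time stability convention |λ| < 1 rather than the half-plane convention of the text:

```python
import numpy as np

# Split the state space into F-invariant "stable" and "unstable" parts
# by grouping eigenvalues; the unstable modes are the ones a suitable
# localization would kill off.

F = np.array([[0.5, 1.0],
              [0.0, 2.0]])     # one stable mode (0.5), one unstable (2.0)

eigvals, eigvecs = np.linalg.eig(F)
stable = np.abs(eigvals) < 1
X_stable   = eigvecs[:, stable]    # spans the stable invariant subspace
X_unstable = eigvecs[:, ~stable]

print([float(v) for v in sorted(eigvals.real)])   # [0.5, 2.0] (up to rounding)
print(X_stable.shape[1], X_unstable.shape[1])     # one mode of each kind
```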
I will conclude this section with the admission that, as far as I know, the
category-theoretical approach has not yet succeeded in capturing the full power
of Kalman's original intuition. I view the power of the algebra in [AS] as
coming in two stages. First, the observation that the dynamics on a state space
can be conveniently captured by a module structure, together with the intuitive
module structure on the Ω and Γ spaces, led to the realization triangle. Then
the big Kalman diagram, with a trapezoid relating transfer functions and the
Kalman input/output map, led to a full-fledged realization theory with many
classical connections. Now the left-adjoint and right-adjoint functors available
in the various category-theory approaches give a powerful generalization of
the realization triangle. On the other hand, the top trapezoid, involving
Z-transforms of trajectories on the whole time line, must be created in an ad hoc
manner for each problem attacked. So far, for each reasonable dynamical
problem and each reasonable choice of rings, some sort of trajectory
construction has turned out to be available, but I look forward to a new, very
general construction which will tell us what trajectories ought to be in some
elegant generality.


4 Poles and Zeros


In our discussions so far of Kalman's module-theoretic approach to systems,
we have emphasized throughout the role of the poles of the system. In fact, the
main Kalman triangle (in its various contexts) relates the pole module, the
state-space of a system, to the input/output behavior of the system. Kalman has
emphasized throughout his work that this description, far from being intensely
abstract, gives a concrete and usable idiom for the study of linear system theory.
More than 15 years after Kalman's first use of module theory to study poles,
Mike Sain and I wondered if the same intuition and power could be applied
to the theory of zeros of a multivariable linear system. In the classical theory
of single-input, single-output systems a zero of a system was a root of the
numerator polynomial of the transfer function. As such it was viewed simply
as a complex number, perhaps with multiplicity. This level of discourse is more
or less equivalent to identifying a system with the eigenvalues of its dynamics
matrix (although in the case of zeros only the values, without any dynamics
matrix, were available). This approach was inadequate to handle the multivariable
theory, in which "multiplicity" is a subtle notion, and numerically coincident
zeros and poles may fail to cancel. The first sophisticated approach to these
problems can be found in [R]. Rosenbrock's idea was to consider the Smith-McMillan
form of a transfer function and to consider the numerators and
denominators of its diagonal entries. In fact, as Kalman already noted in [AS],
the denominators (which are the diagonal entries of an appropriate Smith form)
give the invariant factors of the pole module. The numerators, known as the
Rosenbrock zero polynomials of the transfer function, satisfy the basic divisibility
relations required of invariant factors, so surely they should correspond to some
naturally defined module. After a first try based directly on the polynomial
matrix theory, the "good" definition of the multivariable zero module of a transfer
function was presented in [WS1]. This module has a natural definition, and
incorporates the crucial structural properties of zeros. Just as in Kalman's view,
the module structure itself is the correct view of multiplicity. By now a substantial
literature has appeared [WS2, 6, FR, CP1, 2, WSCP].
We review quickly the definition of the zero module of a transfer function
T(z): U(z) → Y(z), originally given in [WS1]. This module, a finitely generated
torsion module over K[z], can be written

   Z(T) = (T^{-1}(Y[z]) + U[z]) / (ker T + U[z])

Thus, a "zero" is a rational input which gives a "trivial" or polynomial output,
viewed modulo polynomial inputs. This definition is motivated in some sense
by Kalman's original definition of the pole module

   X(T) = U[z] / (U[z] ∩ T^{-1}(Y[z]))


in which the state can be thought of as inputs modulo inputs with trivial outputs.
It turns out that zeros are more complicated than poles, since the rational null
space of the transfer function supplies a rather different sort of zero, so that
the vector-space kernel of T(z) must be factored out to ensure a finite-dimensional
answer for the zero module.
The module-theoretic approach to zeros has achieved some important
successes in unifying and explaining zero-related behavior. The intuitive idea
that "a zero of a system is a pole of the inverse system" has been refined to the
assertion that the zero module of a system is a factor module of the pole module
of any right inverse. (Left inverses use submodules.) Additional poles of an
inverse system, not forced by direct zeros, are now well understood [WS3]. The
relationship between zeros and geometric control theory has now been clarified
by an explicit exact sequence, and recent work untangles various structural
information for non-minimal systems contained in the Rosenbrock system
matrix [WS2, 6]. More recently [WCP2, SW] problems of pole/zero
cancellation for interconnected systems have been worked out. I would like to
emphasize here that Kalman's conviction that the module-theoretic view of
stationary linear system theory is the "correct" view, or at least a very powerful
one, has been borne out in the theory of multivariable zeros as well as the
theory of poles.
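In the scalar case the principle "a zero of a system is a pole of the inverse system" can be checked directly; a sympy sketch with a hypothetical transfer function:

```python
import sympy as sp

# For a scalar transfer function, the zeros are the numerator roots and
# the poles the denominator roots; inverting swaps the two.

z = sp.symbols('z')
T = (z - 1) / (z - 2)          # zero at z = 1, pole at z = 2

num, den = sp.fraction(sp.cancel(T))
zeros_T = sp.solve(num, z)
poles_T = sp.solve(den, z)

num_i, den_i = sp.fraction(sp.cancel(1 / T))
poles_inv = sp.solve(den_i, z)  # poles of the inverse system

print(zeros_T, poles_T, poles_inv)   # [1] [2] [1]
```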

5 The Fundamental Pole-Zero Exact Sequence


In this section I would like to discuss some current work, joint with M.K. Sain,
G. Conte, and A.M. Perdon, which we hope will refine Kalman's original vision
for the module theory of poles, together with the newer module theory of
multivariable zeros. In addition, these new results address some difficulties
involving infinite-dimensional zero spaces for transfer functions which fail to
be injective (or which fail to be surjective), which did not appear in Kalman's
theory of poles. In addition to its direct descent from [AS], this work has roots
in Wedderburn's work on rational vectors, strong connections with geometric
control theory, and even some ties with convolutional coding theory as described
in the paper of Forney in this volume.
The recent paper [WSCP] combines the theories of poles and zeros both
in the finite plane (studied by the theory of polynomial modules) and at infinity
(studied by the theory of O∞-modules) in a way which gives a structural
interpretation to the assertion that "the number of zeros of a transfer function
is equal to the number of poles." Of course, this assertion is wrong, at least as
commonly understood. It is known that in general the number of poles of a
transfer function G(z) (counting both the finite poles and the poles at infinity)
may be greater than the number of zeros (so counted). The difference has been
called the "defect" of the transfer function [Fo]. In fact, the defect has been
calculated in terms of "Kronecker indices" or "Wedderburn numbers" [Fo, We].

5 Linear System Theory-Models and Modules

291

We have proved that these Wedderburn numbers are in fact the dimensions of
new finite-dimensional vector spaces which measure the size of two sorts of
"generic zeros": those arising from the failure of G(z) to be injective and those
arising from the failure of G(z) to be surjective. Furthermore, the number of
ordinary "lumped" zeros is best viewed as the dimension of a zero-module
(either finite or at infinity), and of course the number of poles is the dimension
of the pole-module, or minimal realization state space (possibly at infinity),
as discussed in previous sections. With an expanded interpretation, the counting
principle is not only true numerically, but it becomes an exact sequence relating
the usual pole-modules, the lumped zero-modules, and the new generic
"Wedderburn zero spaces."
We devote a little space to some of the new ideas in [WSCP]. The crucial
step of describing generic zeros by means of a finite dimensional vector space
stems from a construction mentioned by Forney in notes written at Stanford
in 1972. Since it is closely associated with earlier work of Wedderburn, we call
it the Wedderburn-Forney construction. Let C be any subspace of V(z), where
V is finite dimensional over k. By choosing a basis of V over k and using it as
a basis of V(z) over k(z), we can identify V(z) as a space of column vectors with
coefficients in k(z). A vector in V(z) is polynomial (that is, in the Kalman space
V[z]) or proper (in Ω∞V, the Kalman input space at infinity) if its coefficients
lie in k[z] or 𝒪∞. With these conventions, we define coordinatewise maps π₊
(polynomial part of a rational function) and π₋ (strictly proper part). We define
the Wedderburn-Forney space associated to C by

    W(C) = π₋(C) / (C ∩ z⁻¹(Ω∞V))

That is, W(C) consists of the set of vectors which are strictly proper parts of
vectors in C, modulo the set of vectors of C which are themselves strictly proper.
It is clear that W(C) is a vector space over k, and the first technical result is
that for any subspace C, W(C) is finite dimensional over k. The fine structure
of the Wedderburn-Forney space W(C) is closely related to earlier work on
minimal indices and minimal polynomial bases, as found, for example, in
Forney's paper [Fo]. For example, its dimension is given by the sum of the
minimal indices of T(z). Some of the main ideas go back to Wedderburn and
Kronecker [We; R, pp 95-99], and an excellent summary can
be found in [Ka, Chap. 6].
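The maps π₊ and π₋ are easy to compute in practice. The following sketch is my own illustration, not code from the paper (the helper name split_rational is hypothetical): it splits a scalar rational function into its polynomial and strictly proper parts by polynomial division, using sympy.

```python
import sympy as sp

z = sp.symbols('z')

def split_rational(r):
    """Split a rational function r(z) into (pi_plus(r), pi_minus(r)),
    its polynomial part and strictly proper part, so that their sum is r."""
    num, den = sp.fraction(sp.together(r))
    quo, rem = sp.div(sp.Poly(num, z), sp.Poly(den, z))
    return quo.as_expr(), sp.simplify(rem.as_expr() / den)

# z^3 + 1 = z * (z^2 + 1) + (1 - z), so the polynomial part is z
# and the strictly proper part is (1 - z)/(z^2 + 1).
r = (z**3 + 1) / (z**2 + 1)
pi_plus, pi_minus = split_rational(r)
```

Applied coordinatewise to a vector in V(z), this division gives the decomposition into the V[z]-component and the z⁻¹(Ω∞V)-component that underlies the definition of W(C).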
Let T(z) be a transfer function. We define the global pole space 𝒳(T) by
𝒳(T) = X(T) ⊕ X∞(T), where X(T) is the usual Kalman pole module and X∞(T)
is its analog at infinity. Similarly, the global zero space is defined to be
𝒵(T) = Z(T) ⊕ Z∞(T). The principal result is given by the following exact sequence.

Fundamental Pole-Zero Exact Sequence: Let T(z) be a transfer function.
Then there is an exact sequence of finite dimensional vector spaces over k,

    0 → 𝒵(T) → 𝒳(T)/W(ker T(z)) → W(im T(z)) → 0

292

B. F. Wyman

As a numerical corollary, we have that for every transfer function T(z),

    dim 𝒳(T) = dim 𝒵(T) + dim W(ker T(z)) + dim W(im T(z))

This result, which follows immediately by counting dimensions, should be
interpreted as stating that "the number of poles of a matrix of rational functions
equals the number of zeros." That is, the right hand side of the equation is a
reasonable accounting of the total number of zeros of T(z). This numerical
corollary is well known (for example, see [Ka]), but the structural implications
of the sequence are new and far-reaching. In particular, the spaces and mappings
of this exact sequence have important interpretations in geometric control
theory.
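As a sanity check on this counting formula, consider a scalar example of my own (not taken from [WSCP]). A nonzero scalar transfer function is both injective and surjective over k(z), so both Wedderburn spaces vanish and the formula reduces to the classical balance of poles and zeros once infinity is counted.

```latex
% Worked example: T(z) = (z-1)/(z(z+2)).
% Finite poles at z = 0 and z = -2:            dim X(T) = 2.
% No pole at infinity (T is strictly proper):  dim X_oo(T) = 0.
% Finite zero at z = 1:                        dim Z(T) = 1.
% T(z) ~ 1/z as z -> oo, a zero of order 1 at infinity: dim Z_oo(T) = 1.
% Since T is scalar and nonzero, W(ker T(z)) = W(im T(z)) = 0, and indeed
\[
  \dim \mathcal{X}(T) = 2 + 0
    = 1 + 1 + 0 + 0
    = \dim \mathcal{Z}(T) + \dim W(\ker T(z)) + \dim W(\operatorname{im} T(z)).
\]
```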

6 An Appreciation
I hope I will be forgiven for some personal remarks on this occasion. In 1971,
when I was coming to the end of a temporary position at Stanford, I was an
algebraist without much direction. Rudy Kalman asked me a question, and
encouraged me to work with Yves Rouchaleau, who was close to finishing his
dissertation on systems over rings. Together they managed to teach me a little
system theory and get me going on a fascinating series of problems. I am still
trying to understand the algebraic foundations of linear system theory, and I
hope I have succeeded to some extent. In any case, I have enjoyed myself
thoroughly, and I'm still having a good time. Thanks, Rudy.
References
[AS] See [K2]
[BL] C. Byrnes, A. Lindquist, eds., Frequency Domain and State Space Methods for Linear
Systems, North-Holland: Amsterdam, 1986
[BMS] C. Byrnes, C. Martin, R. Saeks, eds., Linear Circuits, Systems, and Signal Processing:
Theory and Applications, North Holland, 1988
[CP1] G. Conte, A. Perdon, "On polynomial matrices and finitely generated torsion modules,"
in AMS-NASA-NATO Summer Seminar on Linear System Theory, 18, Lectures in Applied
Math, Amer Math Society: Providence, 1980
[CP2] G. Conte, A. Perdon, "Generalized State Space Realizations of Non-proper Transfer
Functions," Systems and Control Letters, 1, 1982
[CP3] G. Conte, A. Perdon, "Infinite Zero Module and Infinite Pole Modules," VII Int Conf
Analysis and Opt of Systems: Nice, Lec Notes Control and Inf Science, Springer, 62,
pp 302-315, 1984
[Fo] G.D. Forney, Jr., "Minimal Bases of Rational Vector Spaces with Applications to
Multivariable Linear Systems," SIAM J Control, 13, pp 493-520, 1975
[F] P. Fuhrmann, Linear Systems and Operators in Hilbert Space, Chap. I, McGraw-Hill
International, 1981
[FH] P. Fuhrmann, M. Hautus, "On the Zero Module of Rational Matrix Functions,"
Proc 19th IEEE Conf Decision and Control, pp 161-184, 1980
[K1] R.E. Kalman, "Mathematical Description of Linear Dynamical Systems," SIAM J
Control, 1, pp 152-192, 1963
[K2] R.E. Kalman, "Algebraic Structure of Linear Dynamical Systems. I. The Module of Σ,"
Proc Nat Acad Sci (USA), 54, pp 1503-1508, 1965


[K3] R.E. Kalman, "Irreducible Realizations and the Degree of a Rational Matrix," J Soc Indust
Appl Math, Vol 13, No 2, 1965
[K4] R.E. Kalman, "New Developments in System Theory Relevant to Biology," in Systems
Theory and Biology (M. Mesarovic, ed.), New York: Springer, 1968
[K5] R.E. Kalman, "On the Mathematics of Model Building," Proceedings of the School on
Neural Networks, Ravello, June 1967
[K6] R.E. Kalman, "Algebraic Aspects of the Theory of Dynamical Systems," in Differential
Equations and Dynamical Systems, J.K. Hale and J. LaSalle, eds., New York: Academic
Press, 1967
[K7] R.E. Kalman, Lectures on Controllability and Observability [The CIME Notes], Centro
Internazionale Matematico Estivo, Bologna, 1968
[K8] R.E. Kalman, "Algebraic Theory of Linear Systems," Chapter X in [KFA]
[KFA] R.E. Kalman, P. Falb, M. Arbib, Topics in Mathematical System Theory, McGraw-Hill:
New York, 1969
[Ka] T. Kailath, Linear Systems, Prentice Hall: Englewood Cliffs, 1980
[L] D. Luenberger, "Dynamic Equations in Descriptor Form," IEEE Trans Auto Control,
AC-22, pp 312-321, March 1977
[LN] F.S. Lewis and R.W. Newcomb, guest editors, "Special Issue: Semistate Systems," Circuits,
Systems, and Signal Processing, 5, No 1, 1986
[M] M. Malabre, "Generalized Linear Systems: Geometric and Structural Approaches,"
Special Issue on Control Systems, Linear Algebra and Applications, 122/123/124,
pp 591-620, 1989
[Ma] E.G. Manes, ed., Category Theory Applied to Computation and Control, Springer Lec
Notes Comp Sci, 25, 1975
[R] H. Rosenbrock, State Space and Multivariable Theory, Wiley: New York, 1970
[RW] Y. Rouchaleau, B. Wyman, "Linear Dynamical Systems over Integral Domains," J
Computer Sys Sci, 9, pp 129-142, 1974
[RWK] Y. Rouchaleau, B. Wyman, R. Kalman, "Algebraic Structure of Linear Dynamic Systems
III. Realization Theory over a Commutative Ring," Proc Nat Acad Sci, 69, pp 3404-3406,
1972
[VLK] G. Verghese, B. Levy, T. Kailath, "A Generalized State Space for Singular Systems,"
IEEE Trans Auto Control, AC-26, pp 811-831, August 1981
[W1] B. Wyman, "Linear Systems over Commutative Rings," Lecture Notes, Stanford
University, 1972
[W2] B. Wyman, "Linear Systems over Rings of Operators" in [Ma], pp 218-223
[W3] B. Wyman, "Time Varying Linear Discrete-time Systems: Realization Theory," Advances
in Math Supplementary Studies, Vol 1, pp 233-258, Acad Press, 1978
[W4] B. Wyman, "Estimation and Duality for Time-Varying Linear Systems," Pacific J of
Math (Hochschild volume), 86, pp 361-377, 1980
[WCP1] B. Wyman, G. Conte, A. Perdon, "Local and Global Linear System Theory" in [BL],
pp 165-181
[WCP2] B. Wyman, G. Conte, A. Perdon, "Fixed Poles in Transfer Function Equations," SIAM
J Cont Opt, 26, pp 356-368, March 1988
[We] J.H.M. Wedderburn, Lectures on Matrices, Chap. 4, A.M.S. Colloquium Publications,
17, 1934
[Wo] M. Wonham, Linear Multivariable Control: A Geometric Approach, 3rd Edition, Springer,
1985
[WS1] B. Wyman, M.K. Sain, "The Zero Module and Essential Inverse Systems," IEEE Trans
Circ Systems, CAS-28, pp 112-126, 1981
[WS2] B. Wyman, M.K. Sain, "On the Zeros of a Minimal Realization," Lin Alg Appl, 50,
pp 621-637, 1983
[WS3] B. Wyman, M.K. Sain, "On the Design of Pole Modules for Inverse Systems," IEEE
Trans Circ Systems, CAS-32, pp 977-988, 1985
[WS4] B. Wyman, M.K. Sain, "A Unified Pole-Zero Module for Linear Transfer Functions,"
Systems and Control Letters, 5, pp 117-120, 1984
[WS6] B. Wyman, M.K. Sain, "Module Theoretic Structures for System Matrices," SIAM J
Control and Opt, January 1987, pp 86-99
[WSCP] B. Wyman, M. Sain, G. Conte, A.-M. Perdon, "On the Zeros and Poles of a Transfer
Function," Special Issue on Systems and Control, Linear Algebra and Applications,
122/123/124, pp 123-144, 1989

Linear Realization Theory, Integer Invariants


and Feedback Control¹
J. Hammer
Center for Mathematical System Theory, Department of Electrical Engineering,
University of Florida, Gainesville, FL 32611, USA

The algebraic module theoretic stability framework for linear time-invariant systems is reviewed.
The main theme is that Kalman's algebraic realization theory has evolved much beyond its initial
objective of providing an abstract framework for the derivation of mathematical models of systems.
It has become a powerful tool for the extraction of structural invariants, permitting the exact
characterization of all options for dynamics assignment through internally stable linear dynamic
compensation. This characterization is provided by a set of integers: the stability indices of the
system.

1 Introduction
In a sense, realization theory is the basic mechanism of science through which
the conceptualization of observation is achieved. It formulates the mathematical
guiding principles that lead from measurement of behavior to laws of nature.
Duly stated, realization theory is the abstract theory of mathematical modeling.
It forms the bridge from experiment to theory, and, in a way, is a grand
mathematical scheme for data compression, facilitating compact mathematical
description of vast realms of experimental data. All this is, of course, well known.
The main point of the present note is to show that realization theory has
matured beyond its innate mission of providing guiding principles for modeling,
and has become a refined tool for scientific analysis, capable of singling out the
important aspects of experimental data and filtering away the clutter of
secondary details. In mathematical terms, realization theory has become a
sophisticated tool for the extraction of structural invariants of systems.

Historically and philosophically, realization theory may be conceived as the
driving force behind the scientific revolution that started in the eighteenth
century. Nevertheless, it seems that explicit mathematical treatments of basic
aspects of realization theory did not appear in the scientific literature until
around the middle of the present century. At that time, realization theory formed
¹ This research was supported in part by the National Science Foundation, USA, Grant numbers
8896182 and 8913424. Partial support was also provided by the Office of Naval Research, USA,
Grant number N00014-86-K0538, by US Air Force Grant AFOSR-87-0249, and by the US Army
Research Office, Grant number DAAL03-89-0018, through the Center for Mathematical System
Theory, University of Florida, Gainesville, FL 32611, USA.

296

J. Hammer

a basic component of the research in automata theory, which culminated in
the well known Nerode principle (Nerode [1958]). The basic principles of
realization theory the way it is known today were set out in the pioneering
contributions of R.E. Kalman (Kalman [1965, 1968], and Kalman, Falb,
and Arbib [1969]), which formed the mathematical foundation of modern
system theory and control. The broad implications of realization theory for a
variety of other scientific disciplines were also pointed out by R.E. Kalman (e.g.
Kalman [1980]).

Perhaps one of the most important contributions of R.E. Kalman in the
context of realization theory was the development of an explicit mathematical
framework within which the realization issue can be explored. In this way, the
realization problem was transformed from a vague philosophical entity into a
concrete set of mathematical problems, complete with tools and techniques for
exploration. Kalman's mathematization of realization theory formed the cradle
for the evolution of mathematical system theory, giving birth to an entirely new
branch of Mathematics, and forming a lead toward the mathematization of
Engineering.
The present note is concerned with some implications of a particular
contribution of R.E. Kalman to mathematical realization theory: the introduction
of algebraic module theory as a fundamental tool for the solution of
the realization problem for linear time-invariant systems (Kalman [1965]).
The main objective here is to show that basically the same abstract formalism
can be used to derive structural invariants of linear time-invariant systems,
invariants that determine and characterize the limitations on the performance
of a linear system within a control engineering environment. The material
covered in this note is a review and re-interpretation of results presented in
Hammer [1983a, b, and c].
Over the years, much thought has been given to the issue of what the
true nature of Engineering theory should be, and how it should relate to Engineering
practice. Ideally, one might say, Engineering theory should provide the formulas
for solving practical engineering problems. However, further reflection shows
this approach to be quite simplistic; while being scientifically founded,
Engineering practice comprises a substantial component of what one might call
'Engineering Art'. The wide spectrum of design constraints and refined performance
criteria facing the Engineer makes an individualistic approach to design
imperative. And the unavoidable interaction between engineering systems and
human operators adds to the performance evaluation an aspect of aesthetics
and values. Thus, Engineering practice consists of much more than the
application of pre-derived formulas.
It seems that the most important role of Engineering theory is to provide
the designer with a clear indication of the limitations on design performance.
These limitations should be extracted by the theory from the description of the
system, and presented in concise and clear form, so as to provide the designer
with a clear characterization of the entire spectrum of achievable design
specifications. Furthermore, in case critical design specifications cannot be met

5 Linear System Theory-Realization, Integer Invariants and Feedback

297

for the given system, the theory should provide a clear indication of the ways
in which the system has to be modified to facilitate the desired performance.
From a mathematical point of view, the limitations on design performance for
a given class of systems are usually presented in terms of system invariants. In
qualitative terms, the invariants characterize the fundamental underlying
structure of the system which cannot be altered by external intervention; they
provide the skeleton upon which the designer may build.
The module theoretic approach to the linear realization problem initiated
in Kalman [1965] has matured into a powerful tool for the derivation of
structural invariants for linear time-invariant systems. Specifically, it yields a
set of integer invariants that entirely characterize all possible dynamical
behaviors that can be assigned to a system through the use of external dynamic
compensation, subject to the requirement that the final closed loop configuration
be internally stable. This result is derived by developing an algebraic module
theoretic realization theory over certain rings of stable rational functions, which
replace the ring of polynomials used in the original Kalman realization theory.
In this way, algebraic realization theory becomes more than just a tool for
obtaining dynamical models of linear systems; it provides the means to extract
the inherent structure of the system from the input/output data on its behavior.

It is quite fascinating that the fundamental invariant structure of a linear
time-invariant system can be entirely characterized by a finite set of integers.
When investigating the possible dynamical properties with which a given system
can be endowed through an internally stable closed loop configuration, all one
needs to know are these integer invariants, despite the fact that the complete
description of the dynamical model of the system requires a much larger number
of real parameters. The question of whether certain dynamical properties can
or cannot be assigned to a system by internally stable compensation simply
reduces to the comparison of some integers. This fact provides a deep indication
of the fundamental simplicity of linear time-invariant control systems. It is, of
course, in line with the classical results on pole assignment (Wonham [1967])
and on the assignment of invariant factors (Rosenbrock [1970]).

2 Time Invariance, Linearity, and Stability Rings


Consider a causal linear time-invariant system Σ having an input space U of
finite dimension m and an output space Y of finite dimension p. For the sake
of intuitive clarity, assume that Σ is a discrete-time system (the framework
discussed herein applies to continuous-time systems as well; one only needs to
use the Laplace transform instead of the Z-transform employed here). Adopting
the input/output point of view, the system Σ is regarded as a map that transforms
input sequences of vectors of U into output sequences of vectors of Y. An input
sequence u = {u_{t₀}, u_{t₀+1}, u_{t₀+2}, ...} is commonly represented as a Laurent series


of the form

    u = ∑_{t=t₀}^∞ u_t z^{-t}    (2.1)

which is interpreted as the Z-transform of u. In this notation, the Laurent series
zu simply represents the input sequence obtained by shifting u by one step to
the left, and z represents the one-step shift operator. The set of all Laurent
sequences of the form (2.1), where the (finite) initial time t₀ may vary from
sequence to sequence, is denoted by ΛU. Clearly, the output sequence generated
by Σ from the input sequence u is then an element of the space ΛY. The time
invariance of the system Σ implies that it commutes with the shift operator z,
so that

    Σz = zΣ    (2.2)

In his [1972] lecture notes, Wyman noted that, through (2.2), time invariance
is related to the linearity of the system Σ over the field of scalar Laurent series,
and this idea was then further expounded in Hautus and Heymann [1978].
We review this point next.
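To make the shift formalism concrete, here is a small numerical illustration of (2.2); it is my own sketch, not from the text, and the helper names are hypothetical. A truncated sequence is stored as a map from time index to value, z acts by moving every index one step earlier, and a convolution system commutes with that shift:

```python
def shift(u):
    """The action of z: move every sample of the sequence one step earlier."""
    return {t - 1: v for t, v in u.items()}

def sigma(u, h=(1.0, 0.5, 0.25)):
    """A causal time-invariant system: convolution with a fixed
    finite impulse response h."""
    y = {}
    for t, v in u.items():
        for k, hk in enumerate(h):
            y[t + k] = y.get(t + k, 0.0) + hk * v
    return y

u = {0: 1.0, 1: -2.0, 2: 3.0}
assert sigma(shift(u)) == shift(sigma(u))  # the identity (2.2)
```

Any map built from fixed per-step coefficients commutes with the shift in this way; a time-varying gain would not.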
Let K be a field, let S be a K-linear space, and let ΛS denote the set of all
Laurent series of the form

    s = ∑_{t=t₀}^∞ s_t z^{-t},    (2.3)

where the initial integer t₀ may vary from one sequence to another, and where
s_t ∈ S for all integers t ≥ t₀. In particular, taking S = K, it is easy to see that the
set ΛK forms a field under the operations of coefficientwise addition as addition
and series convolution as multiplication. The set ΛS becomes then a ΛK-linear
space. Furthermore, it can readily be shown that if the dimension of S as a
K-linear space is n, then ΛS is a finite dimensional ΛK-linear space of dimension
n as well. The importance of the notion of ΛK-linearity is that it permits the
fusion of two seemingly disparate notions: the notion of linearity and the
notion of time invariance. In fact, every ΛK-linear map f: ΛU → ΛY clearly
satisfies (2.2), and hence represents a K-linear time-invariant system, which has
the K-linear space U as its input space, and the K-linear space Y as its output
space (Wyman [1972]). Moreover, it can be shown that every K-linear
time-invariant system Σ which is causal and has a finite dimensional state-space
represents a ΛK-linear map f: ΛU → ΛY, where U is the input space of Σ and
Y is its output space. Thus, for the systems we intend to consider in this note,
the notion of a ΛK-linear map is equivalent to the input/output description of
a system.
As any linear map between finite dimensional linear spaces, a ΛK-linear
map f: ΛU → ΛY may be represented by a matrix, relative to specified bases of
its domain ΛU and its codomain ΛY. Among the bases of the space ΛU, a
particularly significant role is played by bases of the original K-linear space U.


It is easy to verify that every basis u₁, u₂, ..., u_m of the K-linear space U is also
a basis of the ΛK-linear space ΛU. A matrix representation of the ΛK-linear
map f with respect to bases u₁, u₂, ..., u_m of U and y₁, y₂, ..., y_p of Y is called
a transfer matrix of f, and it coincides with the standard notion of a transfer
matrix used in linear control.
We consider next some objects in the infrastructure of the space ΛS. First,
let ΩS denote the set of all polynomial elements within ΛS, namely, the set of
all elements of the form ∑_{t=t₀}^0 s_t z^{-t}, where t₀ ≤ 0 and s_t ∈ S. Then, for S = K, the
set ΩK is simply the principal ideal domain of polynomials with coefficients in
the field K. More generally, the space ΩS is an ΩK-module of rank equal to
dim_K S. Also, let Ω⁻S denote the set of all elements in ΛS of the form ∑_{t=0}^∞ s_t z^{-t}.
The set Ω⁻K is then the principal ideal domain of all power series with coefficients in
K, and Ω⁻S is a module over Ω⁻K with rank given by dim_K S.

The space ΛS is itself an ΩK-module as well, and one may consider the
quotient module ΓS := ΛS/ΩS. We denote by

    j: ΩS → ΛS

the identity injection, which assigns to each polynomial vector the formal
Laurent series equal to it. We let

    π: ΛS → ΓS

be the canonical projection onto the quotient module.


Several classes of ΛK-linear maps are important in this context. A polynomial
map is a ΛK-linear map f: ΛU → ΛY which satisfies f[ΩU] ⊂ ΩY, and is simply
a map that has a polynomial transfer matrix. A ΛK-linear map f: ΛU → ΛY
is rational if there is a nonzero polynomial ψ ∈ ΩK for which ψf is a polynomial
map. A causal map is a ΛK-linear map f: ΛU → ΛY that satisfies
f[Ω⁻U] ⊂ Ω⁻Y, and it describes a causal linear time-invariant system. The
map f is strictly causal if f[Ω⁻U] ⊂ z⁻¹[Ω⁻Y]. A ΛK-linear map M: ΛU → ΛU
is bicausal if it is invertible, and if it and its inverse M⁻¹ are both causal maps.
Finally, a rational and strictly causal ΛK-linear map is called a linear input/
output map.
Perhaps one of the most intriguing properties of linear systems is the fact
that they permit treatment of the notion of stability in an algebraic setup,
without requiring direct reference to topological properties. This fact, which is
crucial to the algebraic theory of linear systems, has been observed by several
investigators during the seventies. One of the earliest references to this point is
Morse [1975]. The basic observation in this context is that stability theory
for linear time-invariant systems can be developed within the algebraic framework
of localizations of the ring of polynomials. To be specific, let θ ⊂ ΩK be
a subset of polynomials satisfying the following three properties: (i) the product
of every two elements of θ is still in θ; (ii) the zero polynomial is not in θ; and
(iii) θ contains a polynomial of degree one. A subset satisfying these properties
is called a stability set. The algebraic definition of stability is then given by the
following


2.4. Definition. Let θ be a stability set. A ΛK-linear map f: ΛU → ΛY is
input/output stable (in the sense of θ) if there is an element ψ ∈ θ such that ψf
is a polynomial map.

Note that this notion of stability conforms with the classical notion of
stability used in linear control. Indeed, let K be the field of real numbers. We
can take the stability set θ as the set of all polynomials having all their roots
within the unit disc in the complex plane. Then, Definition (2.4) becomes identical
with the classical definition of stability for discrete-time systems. Alternatively,
we can take θ to be the set of all polynomials having all their roots in the open
left half of the complex plane. Then, (2.4) reduces to the classical definition of
stability for continuous-time systems. Thus, the present definition of stability
generalizes the classical ones.
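For the discrete-time choice of θ, membership in the stability set is a numerical root test. The sketch below is my own illustration (numpy's root finder standing in for an exact criterion), checking two candidate polynomials and property (i) of a stability set:

```python
import numpy as np

def in_theta(coeffs, tol=1e-9):
    """True if the polynomial (leading coefficient first) has all of its
    roots strictly inside the unit disc, i.e. lies in this stability set."""
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) < 1 - tol))

stable = in_theta([1, -0.5])    # z - 0.5: root at 0.5, in theta
unstable = in_theta([1, -2])    # z - 2: root at 2, not in theta
# Property (i): the product of two members of theta is again in theta.
product = in_theta(np.convolve([1, -0.5], [1, 0.3]))
```

For the continuous-time choice of θ one would instead test that all roots have negative real part.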
In order to permit the development of a complete algebraic theory of
internally stable control, it is necessary to refine somewhat the notion of a
stability set in the following way (Hammer [1983a]). A strict stability set θ
is a stability set for which there exists a polynomial of degree one not in θ.
Thus, for a strict stability set θ, there is a polynomial of degree one in θ, and
there is (another) polynomial of degree one which is not in θ. The examples
provided above are, in fact, all strict stability sets. Throughout the present note,
all stability sets are assumed to be strict stability sets.

Let Ω_θK denote the set of all rational elements α ∈ ΛK which can be expressed
as a fraction of the form α = β/γ, where β and γ are polynomials and γ ∈ θ. The
set Ω_θK simply describes the set of all stable scalar transfer functions (in the
sense of θ). The following is then a direct consequence of the theory of localized
rings (e.g. Zariski and Samuel [1958]).
(2.5) Proposition. Ω_θK is a principal ideal domain.

The space ΛS is, of course, an Ω_θK-module as well. Let s₁, s₂, ..., s_n be a
basis of the K-linear space S, and let Ω_θS be the Ω_θK-submodule of ΛS generated
by this basis, namely,

    Ω_θS = { s = ∑_{i=1}^n α_i s_i : α₁, ..., α_n ∈ Ω_θK }

Then, it is easy to see that Ω_θS is the same for any basis s₁, s₂, ..., s_n of S, and
its rank as an Ω_θK-module is equal to the dimension of S as a K-linear space.
We denote by

    j_θ: Ω_θS → ΛS    (2.6)

the identity injection which maps each element of Ω_θS into the same element
in ΛS. By

    π_θ: ΛS → ΛS/Ω_θS    (2.7)


we denote the canonical projection which maps each element s ∈ ΛS into the
equivalence class s + Ω_θS in ΛS/Ω_θS. It follows then that a ΛK-linear
map f: ΛU → ΛY is input/output stable if and only if f[Ω_θU] ⊂ Ω_θY.

The final algebraic structure that we need to review deals with the
combination of the notions of causality and stability. Let Ω_θ⁻K := Ω_θK ∩ Ω⁻K,
which consists of all the stable and causal elements in ΛK. Then, the following
is true (Morse [1975]).

(2.8) Proposition. Ω_θ⁻K is a principal ideal domain.

Using this notation, a ΛK-linear map f: ΛU → ΛY is causal and input/output
stable if and only if f[Ω_θ⁻U] ⊂ Ω_θ⁻Y. It is convenient to employ the following
terminology. A ΛK-linear map M: ΛU → ΛU is ΩK- (respectively, Ω⁻K-,
Ω_θK-, Ω_θ⁻K-) unimodular if it has an inverse M⁻¹, and if both M and M⁻¹ are
polynomial (respectively, causal, stable, causal and stable) maps. Clearly, the
ΩK-unimodular maps are the usual polynomial unimodular maps, and the
Ω⁻K-unimodular maps are the bicausal maps. Every Ω_θ⁻K-unimodular map
must also be bicausal.

3 Realization, Strict Observability, and Stability Modules


The basic structure of Kalman's algebraic realization theory (Kalman [1965])
can be briefly described as follows. First, in order to provide an intuitive
background, note that every element u = ∑_{t=t₀}^∞ u_t z^{-t} with t₀ ≤ 0 in the
space ΛU can be decomposed into two (non-disjoint) parts: the past part
u_P := ∑_{t=t₀}^0 u_t z^{-t}, which is the polynomial part of u; and the future part
u_F := ∑_{t=0}^∞ u_t z^{-t}, which can be identified with the projection πu. The 'present'
component u₀ is contained in both parts. Now, with each ΛK-linear
map f: ΛU → ΛY, one associates a restricted map f̄ given by

    f̄ := πfj : ΩU → ΓY    (3.1)

In intuitive terms, the map f̄ associates with each input sequence terminating
at the present the future part of the output sequence generated by it. Of
particular importance is the set of all past input sequences that generate zero
future outputs, namely the set

    Δ_K := ker f̄    (3.2)

which, by definition, is a subset of ΩU. In fact, since f̄ is clearly an
ΩK-homomorphism, the set Δ_K is an ΩK-module, and it is usually referred to
as the Kalman realization module. In order to point out the significance of the
module Δ_K, we review the basic definition of an abstract realization, as conceived
in Kalman [1965]. Let f: ΛU → ΛY be a linear input/output map. An abstract
realization of f is a triple (X, g, h), where X is an ΩK-module and g: ΩU → X


and h: X → ΓY are ΩK-homomorphisms, such that the following diagram is
commutative.

      ΩU --- f̄ ---> ΓY
        \           ^
       g \         / h
          v       /
             X

This commutative diagram gives rise to a representation of the system
represented by the input/output map f in the standard form

    x_{k+1} = Fx_k + Gu_k,
    y_k = Hx_k,

in a way that we do not detail here (see Kalman, Falb, and Arbib [1969]
or Hautus and Heymann [1978]). The realization (X, g, h) is called
reachable whenever g is surjective; observable whenever h is injective; and
canonical whenever it is both reachable and observable. The pair (X, g) is called
a semirealization of f. It can be readily seen from the diagram that every
realization (X, g, h) of f must satisfy ker g ⊂ ker f̄; and a reachable realization
(X, g, h) is canonical if and only if ker g = ker f̄ (= Δ_K). A canonical
semirealization can be simply constructed by taking g as the projection
ΩU → ΩU/Δ_K, and the canonical state-space is simply given by the quotient
module X = ΩU/Δ_K. These facts indicate that the module Δ_K is the basic
quantity in realization theory for linear time-invariant systems.
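The canonical state-space ΩU/Δ_K has a familiar computational shadow: for a scalar rational f, its dimension equals the rank of the Hankel matrix of the impulse response. A small numerical sketch of my own (not from the text):

```python
import numpy as np

# Impulse response of f(z) = 1/(z - 0.5): samples 1, 0.5, 0.25, ...
h = [0.5 ** t for t in range(12)]

# Hankel matrix of the response; its rank equals the dimension of the
# canonical (reachable and observable) state-space Omega U / Delta_K.
H = np.array([[h[i + j] for j in range(5)] for i in range(5)])
rank = np.linalg.matrix_rank(H)  # 1: a first-order system realizes f
```

Adding a second mode to the response (say samples 0.5ᵗ + 0.25ᵗ) raises the Hankel rank, and hence the canonical state dimension, to 2.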
In stronger terms, the polynomial module Δ_K exactly contains the critical
information that is necessary in order to construct a dynamical mathematical
model of a given linear time-invariant system from its input/output behavior
f. However, as we shall show in the sequel, this information does not directly
provide the designer with a clear indication of the design options at his
disposal, when trying to control the given system using an internally stable
control configuration. The information provided by Δ_K, although complete,
contains far too many details within which the critical information
characterizing the design options is buried. Nevertheless, the module theoretic
stability framework which we have briefly reviewed earlier will extract the sought
after information from the deep underlying algebraic structure of the input/
output map f. It will provide an accurate characterization of all design options
in the most obvious and condensed form, in terms of a set of m integers, where
m is the dimension of the input space U.
Perhaps the most fundamental quantity in the analysis of the invariant
structure of linear time-invariant systems is the strict observability module
associated with an input/output map f: ΛU → ΛY, and given by ker πf
(Hammer and Heymann [1983]). The strict observability module ker πf
consists of all input sequences (not necessarily restricted to the past) which


produce zero future outputs from the system described by the input/output
map f. Since f and π are both ΩK-homomorphisms, it is an ΩK-module, namely,
a module over the polynomials. In contrast to the realization module Δ_K, which
contains only polynomial vectors as elements, the module ker πf may also
contain non-polynomial elements. As a direct consequence of the definition, we
have that

    Δ_K = ker πf ∩ ΩU,

so that Δ_K ⊂ ker πf. In view of the well known fact that the realization module
Δ_K is of full rank whenever f is a rational function, the last containment implies
that the module ker πf contains a basis of the ΛK-linear space ΛU.
The basic algebraic quantity of our present discussion is the stability module
Δ_θ of Hammer [1983a], given by

Δ_θ := ker πf ∩ Ω_θU.    (3.3)

It consists of all stable input sequences that produce zero future outputs for
the system described by the input/output map f. Since ker πf and Ω_θU are both
ΩK-modules (the latter being implied by the fact that ΩK ⊂ Ω_θK), it follows
that Δ_θ is an ΩK-module. The module Δ_θ will allow us to derive a complete
set of invariants that characterize the set of all dynamical properties that can
be assigned to the system described by the input/output map f by internally
stable control. These invariants are derived through a standard procedure for
the extraction of integer invariants from ΩK-modules, which is described in
the next section.
Another module which is critical to our discussion is the pole module Δ^θ,
also introduced in Hammer [1983a], and given by

Δ^θ := ker π_θ f ∩ ΩU,    (3.4)

where π_θ is defined in (2.7). This module consists of all polynomial (i.e. past)
input sequences which produce stable output sequences from the system
described by f: ΛU → ΛY. It is an ΩK-module, and is obviously contained in ΩU.

4 Module Indices and Stability Indices


The derivation of structural invariants for linear time-invariant systems seems
to be intimately linked to the notion of proper bases (or 'minimal bases'), which
have found various applications in algebra (e.g. Wedderburn [1934]), and
whose significance to the theory of linear control systems has been pointed out
by numerous authors (Rosenbrock [1970], Wolovich [1974], Forney [1975],
Hautus and Heymann [1978], Hammer and Heymann [1981]). We start this
section with a brief review of the notion of proper bases.
Let s = Σ_t s_t z^{−t} be an element of the ΛK-linear space ΛS. The order of
s is defined by ord s := min{t : s_t ≠ 0} if s ≠ 0 and ord s := ∞ if s = 0; sometimes,

304

J. Hammer

it is more common to use the notion of degree, given by deg s := −ord s. The
leading coefficient ŝ of s is an element of the K-linear space S given by ŝ := s_{ord s}
if s ≠ 0 and ŝ := 0 if s = 0. A set of elements u_1, u_2, ..., u_n in the space ΛS is said
to be properly independent if the leading coefficients û_1, û_2, ..., û_n are K-linearly
independent. A proper basis of the ΛK-linear space ΛS is a basis of ΛS that
consists of properly independent elements. An ordered proper basis of an
ΩK-module Δ ⊂ ΛS is a basis d_1, ..., d_n of Δ that consists of properly
independent elements, with deg d_i ≤ deg d_{i+1} for all i = 1, ..., n − 1. As it turns
out, the degrees of the elements of ordered proper bases are of deep significance in
linear time-invariant system theory. The origin of this fact stems from the notion
of causality, but we will not explore this connection here in great detail (see
Hammer and Heymann [1981, 1983] and Hammer [1983a]).
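These definitions are easy to mechanize. The following is a hedged Python sketch (the representation and the example series are mine, not from the text): a vector Laurent series s = Σ_t s_t z^{−t} is stored as a map t ↦ s_t, and proper independence is tested through the rank of the matrix of leading coefficients.

```python
from fractions import Fraction

# Hedged sketch (representation and examples are mine): a vector Laurent
# series s = sum_t s_t z^(-t) is stored as {t: tuple s_t}.

def order(s):
    # ord s = min{ t : s_t != 0 }
    return min(t for t, v in s.items() if any(v))

def leading_coeff(s):
    return s[order(s)]

def rank(rows):
    # Gaussian elimination over the rationals
    rows = [list(map(Fraction, r)) for r in rows]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def properly_independent(series):
    # the leading coefficients must be K-linearly independent
    lead = [leading_coeff(s) for s in series]
    return rank(lead) == len(lead)

u1 = {0: (1, 0), 1: (3, 3)}   # ord u1 = 0, leading coefficient (1, 0)
u2 = {0: (2, 0), 2: (0, 1)}   # leading coefficient (2, 0): dependent on u1's
u3 = {1: (0, 5)}              # ord u3 = 1, leading coefficient (0, 5)
```

Here u1 and u3 are properly independent, while u1 and u2 are not, even though u1 and u2 are linearly independent as series.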
It is quite interesting that the degrees of the elements of an ordered proper
basis of an ΩK-submodule Δ of the ΛK-linear space ΛS are uniquely determined
by Δ, and can be derived without the explicit construction of any proper bases.
This fact was probably first noticed (somewhat implicitly) in Rosenbrock
[1970], but the specific procedure used here to derive these degrees was
developed in Hautus and Heymann [1978], Hammer and Heymann [1981,
1983] and Hammer [1983a]. To describe the procedure, let Δ ⊂ ΛS be an
ΩK-module. For every integer k, let S_k be the K-linear subspace of S spanned
by the leading coefficients of all elements s ∈ Δ satisfying ord s ≥ k. Since Δ is
an ΩK-module (thus permitting shifts to the left in the discrete-time
interpretation), it follows that the sequence of subspaces {S_k} creates a chain
⋯ ⊇ S_{−1} ⊇ S_0 ⊇ S_1 ⊇ ⋯, which is called the order chain of Δ. The sequence of
the dimensions of the elements of this chain, namely, the sequence of integers
η_k := dim_K S_k, k = ..., −1, 0, 1, ..., is called the order list of Δ. An ΩK-module
Δ ⊂ ΛS is said to be rational if the intersection Δ ∩ ΩS is of rank equal to the
dimension of the K-linear space S. Also, the ΩK-module Δ is said to be bounded
if there is an integer α such that ord s ≥ α for all nonzero elements s ∈ Δ.
Now, let Δ be a rational ΩK-submodule of the ΛK-linear space ΛS, let
{η_k} be the order list of Δ, and let n be the dimension of the K-linear space S.
The degree indices μ_1 ≤ μ_2 ≤ ⋯ ≤ μ_n of Δ are then defined as follows. For every
integer j satisfying η_i ≤ j < η_{i−1}, the degree index μ_j := −i; if lim_{k→∞} η_k ≠ 0, set
μ_j := 0 for j = 1, ..., lim_{k→∞} η_k. It can then be shown that, for a rational and
bounded ΩK-module Δ, the degree indices describe the degrees of every ordered
proper basis of Δ, as follows (Hammer and Heymann [1983]).
(4.1) Theorem. Let Δ be a rational and bounded ΩK-submodule of the ΛK-linear
space ΛS, and let μ_1 ≤ μ_2 ≤ ⋯ ≤ μ_n be its degree indices. Then, Δ is of
rank n = dim_K S, and

(i) Δ has an ordered proper basis;

(ii) Every ordered proper basis d_1, ..., d_n of Δ satisfies deg d_i = μ_i, i = 1, ..., n.

In order to provide an indication of the profound significance of the degree
indices in the context of the theory of linear time-invariant systems, consider
the following fundamental result, which is due to Rosenbrock [1970], Wolovich

[1974], Forney [1975], and Hautus and Heymann [1978] (the present version
of the result is taken from the last reference).

(4.2) Theorem. Let f: ΛU → ΛY be a linear input/output map, and let Δ_K be the
Kalman realization module of f. Then, the degree indices of Δ_K are the reachability
indices of a canonical realization of the system represented by f.

As is well known, the reachability indices play an important role in the theory
of linear control, as evidenced by the Rosenbrock Theorem (Rosenbrock [1970]).
They are also referred to as the 'Kronecker invariants' of the system (Kalman
[1971]).
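For readers who want to compute reachability indices directly from a state pair, one standard method scans the columns of G, FG, F²G, ... from left to right, keeps each column that enlarges the span, and counts per input how many of its iterates survive. The sketch below is a hedged illustration; the example pair (F, G) is mine, not from the text.

```python
import numpy as np

# Hedged illustration (example matrices are mine): greedy left-to-right
# column selection in [G, FG, F^2 G, ...] yields the reachability indices.

def reachability_indices(F, G, tol=1e-9):
    n, m = G.shape
    basis = np.zeros((n, 0))
    kappa = [0] * m
    cols = G.copy()
    for _ in range(n):                      # powers F^0 ... F^(n-1) suffice
        for j in range(m):
            cand = np.hstack([basis, cols[:, [j]]])
            if np.linalg.matrix_rank(cand, tol) > basis.shape[1]:
                basis = cand                # column enlarges the span: keep it
                kappa[j] += 1
        cols = F @ cols
    return sorted(kappa, reverse=True)

F = np.array([[0., 1., 0.],
              [0., 0., 0.],
              [0., 0., 2.]])
G = np.array([[0., 0.],
              [1., 0.],
              [0., 1.]])
```

For this pair the first input contributes a chain of length two and the second a chain of length one.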
Of fundamental importance to the theory of internally stable linear control
are the degree indices of the stability module Δ_θ, which were introduced and
studied in Hammer [1983a, b, and c], and which form the main motto of this
note. We review from these references the following basic definition. (Recall
that a linear input/output map is simply a strictly causal and rational ΛK-linear
map.)

(4.3) Definition. Let f: ΛU → ΛY be a linear input/output map, let Δ_θ =
ker πf ∩ Ω_θU be its stability module, and let m = dim_K U. The stability indices
σ_1 ≤ σ_2 ≤ ⋯ ≤ σ_m of f (in the sense of the stability set θ) are the degree indices
of Δ_θ.
As it turns out, the stability indices exactly characterize the set of all possible
dynamical behaviors that can be assigned to the system represented by f through
internally stable closed loop control. Thus, the m integers σ_1, ..., σ_m represent
the entire information that a designer needs to know about the system
represented by f when weighing the options available for designing the dynamical
behavior of a closed loop control configuration that internally stabilizes the
system. In this way, the formalism of the algebraic theory of linear realization
provides a mechanism for the extraction of the underlying invariant structure
of a linear system in the context of dynamical design and stabilization. It is
also quite surprising to note how little data is needed about the system for this
purpose: only a set of m nonnegative integers. The verification of these facts
is the subject of the next section.
Before concluding the present section, we provide one more definition.

(4.4) Definition. Let f: ΛU → ΛY be a linear input/output map, let Δ^θ =
ker π_θ f ∩ ΩU be its pole module, and let m = dim_K U. The pole indices
p_1 ≤ p_2 ≤ ⋯ ≤ p_m of f (in the sense of the stability set θ) are the degree indices
of Δ^θ. The pole degree p of the system represented by f is defined as
p(f) := p_1 + p_2 + ⋯ + p_m.
It can be shown that the pole degree is equal to the number of unstable poles
of the system (see Hammer [1983a] for a detailed discussion of this and other
topics mentioned in this section).
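As a concrete illustration of the pole degree, take θ to be the open unit disk (the usual discrete-time stability set); p(f) then counts the denominator roots of modulus at least one. The following hedged sketch uses an example system of my own, not one from the text.

```python
import numpy as np

# Hedged sketch: with theta the open unit disk, the pole degree counts the
# denominator roots of f with |z| >= 1.  The example system is mine.

def unstable_pole_count(den_coeffs):
    """den_coeffs: denominator polynomial of f, highest power first."""
    roots = np.roots(den_coeffs)
    return int(np.sum(np.abs(roots) >= 1.0))

# f(z) = 1 / ((z - 2)(z - 0.5)) = 1 / (z^2 - 2.5 z + 1)
p = unstable_pole_count([1.0, -2.5, 1.0])
```

Only the pole at z = 2 lies outside the stability set, so the pole degree of this example is 1.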

5 Invariants, Dynamics Assignment, and Internal Stability


In order to discuss the implications of the stability indices on the theory of
internally stable linear control, consider the following classical control
configuration:

(5.1) [Figure: the closed loop configuration. The system f is preceded by an
in-loop precompensator V; a dynamic output feedback compensator Γ closes the
loop around fV; and an external precompensator W precedes the loop. The
transfer matrix of the loop alone is f_(V,Γ), and that of the entire composite
system is f_(W,V,Γ).]
Here, f is the transfer matrix of the system that needs to be controlled; V is
an in-loop dynamic precompensator; Γ is a dynamic output feedback
compensator; and W is an external dynamic precompensator. We denote by f_(V,Γ)
the input/output relation (transfer matrix) of the loop alone, and by f_(W,V,Γ) the
input/output relation of the entire composite system. To set up the notation,
we take f: ΛU → ΛY, in which case V: ΛU → ΛU; Γ: ΛY → ΛU; and W: ΛU →
ΛU. In order to preserve the degrees of freedom available for the input of the
system, we shall require the precompensators V and W to represent invertible
systems (i.e. systems with nonsingular transfer matrices). As is common practice,
we shall also assume that the given system f is strictly causal. Of course, the
compensators W, V, and Γ are all required to be causal.
When discussing composite systems, the notion of internal stability is of
utmost importance. A composite system is internally stable if all its modes,
including the unobservable and the unreachable ones, are stable, where stability
is in the sense of the stability set θ. Our basic objective here is to characterize
all possible dynamical behaviors that can be assigned to the transfer matrix
f_(W,V,Γ) of the composite system by appropriately choosing the compensators
W, V, and Γ, under the requirement that the system be internally stable. We
shall see that the set of all such possible dynamical behaviors is completely
characterized by the stability indices of the given system f (Hammer [1983a, b,
and c]).
We discuss next explicit conditions for the internal stability of the closed
loop configuration (5.1). First, we list some input/output relations which can
be directly derived through simple standard computations. Denoting

L := (I − ΓfV)^{−1},    (5.2)

it can be seen that

f_(V,Γ) = fV L,    (5.3)

and

f_(W,V,Γ) = f_(V,Γ) W.    (5.4)

Further, some terminology. The series combination fV is said to be
θ-detectable if the pole degrees satisfy

p(fV) = p(f) + p(V).    (5.5)

In intuitive terms, fV is θ-detectable if and only if no cancellations
of unstable poles against unstable zeros occur when the transfer matrices of f and V are
multiplied (Hammer [1983b]). We can now state a characterization of internal
stability for the configuration (5.1) (Hammer [1983b]).
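Both the loop algebra and the θ-detectability condition can be exercised on scalar examples. The sketch below is hedged: the loop convention f_(V,Γ) = fV(I − ΓfV)^{−1}, the choice of θ as the open unit disk, and the example systems are all my assumptions, not taken from the text.

```python
import sympy as sp

# Hedged sketch with scalar transfer functions; the sign convention and the
# example systems are assumptions of this illustration.
z = sp.symbols('z')

def unstable_poles(g):
    # pole degree contribution: denominator roots with |z| >= 1
    den = sp.denom(sp.cancel(g))
    return sum(m for r, m in sp.roots(sp.Poly(den, z)).items() if abs(r) >= 1)

f = 1 / (z - 2)                        # one unstable pole, at z = 2
V_ok = 1 / (z - sp.Rational(1, 2))     # stable precompensator
V_bad = (z - 2) / z                    # cancels the unstable pole of f

# theta-detectability: p(fV) = p(f) + p(V) must hold
det_ok = unstable_poles(f * V_ok) == unstable_poles(f) + unstable_poles(V_ok)
det_bad = unstable_poles(f * V_bad) == unstable_poles(f) + unstable_poles(V_bad)

# loop transfer function for a constant output feedback Gamma
Gamma = sp.Rational(5, 2)
f_loop = sp.cancel(f * V_ok / (1 - Gamma * f * V_ok))
```

In the first case the pole degrees add, so no unstable cancellation occurs; in the second the unstable pole of f disappears from fV, violating (5.5).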
(5.6) Proposition. The composite system f_(W,V,Γ) is internally stable (in the sense
of the stability set θ) if and only if the following conditions hold.
(i) fV is θ-detectable;
(ii) All of the maps W, f_(V,Γ), Γ, f_(V,Γ)Γ, and V are input/output stable (in the sense
of θ).

Consider now a state representation

x_{k+1} = Fx_k + Gu_k,
y_k = Hx_k,

of the input/output relation f_(W,V,Γ). As is well known (Rosenbrock [1970]),
every such canonical state representation corresponds to a left coprime
polynomial fraction representation f_(W,V,Γ) = G^{−1}H of the transfer matrix of f_(W,V,Γ)
(which we denote by the same symbol as the map for simplicity of notation).
Here, G and H are left coprime polynomial matrices with G being invertible.
Also, the (nontrivial) invariant factors of the polynomial matrix G are the same
as the invariant factors of the matrix F (where the invariant factors of the matrix
F over the field K are defined as the invariant factors of the polynomial matrix
(zI − F)). Furthermore, the dynamical properties of the input/output map f_(W,V,Γ)
are entirely determined by the invariant factors of the matrix F. In fact, the
invariant factors of F, which characterize the invariant structure of F under
similarity transformations (e.g. MacLane and Birkhoff [1979]), provide
the only significant data about F, since F can be replaced by any matrix similar
to it by inducing an isomorphic transformation of the state-space X.
Consequently, the canonical dynamical behavior of the closed loop system (5.1)
is entirely described by the invariant factors of the denominator matrix G in a
left coprime polynomial fraction representation f_(W,V,Γ) = G^{−1}H.
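For small cases the invariant factors can be computed from the classical gcd-of-minors formulas: for a 2×2 polynomial matrix, φ_1 is the gcd of the entries and φ_1φ_2 is the determinant. The following hedged sketch uses an example matrix of my own.

```python
import sympy as sp

# Hedged sketch (example matrix is mine): invariant factors of a 2x2
# polynomial matrix via the gcd-of-minors formulas.
z = sp.symbols('z')

def invariant_factors_2x2(G):
    d1 = sp.gcd(sp.gcd(G[0, 0], G[0, 1]), sp.gcd(G[1, 0], G[1, 1]))
    phi1 = sp.monic(d1, z)                              # gcd of 1x1 minors
    phi2 = sp.monic(sp.cancel(sp.factor(G.det()) / d1), z)  # det / phi1
    return phi1, phi2

G = sp.Matrix([[z - 1, 0],
               [0, (z - 1) * (z - 3)]])
phi1, phi2 = invariant_factors_2x2(G)
```

Here φ_1 divides φ_2, the classical ordering; the theorem below labels the factors in the reverse (divisibility-decreasing) order.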
Thus, in order to determine the entire set of canonical dynamical behaviors
that can be assigned to (5.1) by appropriately choosing the compensators, all
we need to know is the set of all possible invariant factors that may appear as

the invariant factors of the denominator matrix G in a left coprime polynomial
fraction representation f_(W,V,Γ) = G^{−1}H of the composite system. This set is
entirely characterized by the next result, which is reproduced here from Hammer
[1983c]. (We say that a polynomial φ is stable if 1/φ ∈ Ω_θK. Also, note the
reverse ordering of the stability indices here.)

(5.7) Theorem. Let f: ΛU → ΛY be a linear input/output map with stability indices
σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_m, and let k := rank_{ΛK} f. Let φ_1, ..., φ_k be a set of monic stable
polynomials, where φ_{i+1} divides φ_i for all i = 1, ..., k − 1. Then, the following
statements are equivalent.

(i) Σ_{i=1}^{j} deg φ_i ≥ Σ_{i=1}^{j} σ_i for all j = 1, ..., k.

(ii) There exist causal dynamic compensators W: ΛU → ΛU, V: ΛU → ΛU, and
Γ: ΛY → ΛU, where W and V are nonsingular, such that the closed loop system
f_(W,V,Γ) is internally stable and has a left coprime polynomial fraction
representation f_(W,V,Γ) = G^{−1}H, with G having φ_1, ..., φ_k as its (nontrivial) invariant
factors.
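Condition (i) is a simple partial-sum test. The sketch below is hedged: it assumes the reading that the stability indices and the candidate invariant-factor degrees are both listed in decreasing order, and that every partial sum of degrees must dominate the matching partial sum of indices; the numbers are illustrative, not from the text.

```python
# Hedged sketch of the partial-sum test in condition (i); the ordering
# convention and the numbers are assumptions of this illustration.

def assignable(phi_degrees, sigma):
    k = len(phi_degrees)
    return all(sum(phi_degrees[:j]) >= sum(sigma[:j]) for j in range(1, k + 1))

sigma = [3, 1]                     # stability indices, reverse ordered
ok = assignable([4, 1], sigma)     # partial sums: 4 >= 3 and 5 >= 4
bad = assignable([2, 3], sigma)    # fails already at j = 1: 2 < 3
```

The test touches nothing but m integers, which is exactly the point of the theorem: the entire design freedom is encoded in the stability indices.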

Whence, we have a complete characterization of all the possible dynamical
properties that can be assigned to the given system f by dynamic compensation,
within an internally stable closed loop control configuration. This
characterization is entirely determined by m integers, the stability indices of the given
system f. It follows then that the stability indices provide all the information
a designer needs to know in order to be able to evaluate all the available options
for the assignment of input/output dynamical properties through internally
stable control.
A detailed proof of Theorem (5.7), as well as explicit descriptions of the
construction of dynamic compensators that achieve desired invariant factors
for the closed loop system, are given in Hammer [1983b, c]. These references
also contain a variety of other results on dynamics assignment, including
dynamics assignment by pure dynamic output feedback and by unity output
feedback.
To conclude, we have seen that algebraic realization theory for linear time-invariant
systems has matured into more than just an abstract framework for
the derivation of dynamical models of systems. It has become a refined tool for
the extraction of structural invariants from the input/output behavior of the
system, and has elicited the simplicity of the fundamental structure of linear
time-invariant control systems.

References
(For a more complete list of references refer to Hammer [1983a, b, and c].)
G.D. Forney [1975] "Minimal bases of rational vector spaces with applications to multivariable linear systems," SIAM J Control, Vol 13, pp 643-658
J. Hammer [1983a] "Stability and nonsingular stable precompensation: an algebraic approach," Mathematical Systems Theory, Vol 16, pp 265-296
J. Hammer [1983b] "Feedback representation of precompensators," International J Control, Vol 37, pp 37-61
J. Hammer [1983c] "Pole assignment and minimal feedback design," International J Control, Vol 37, pp 63-88
J. Hammer and M. Heymann [1981] "Causal factorization and linear feedback," SIAM J Control, Vol 19, pp 445-468
J. Hammer and M. Heymann [1983] "Strictly observable rational linear systems," SIAM J Control, Vol 21, pp 1-16
M.L.J. Hautus and M. Heymann [1978] "Linear feedback-an algebraic approach," SIAM J Control, Vol 16, pp 83-105
R.E. Kalman [1965] "Algebraic structure of linear dynamical systems, I: the module of Σ," Proceedings of the National Academy of Sciences (USA), Vol 54, pp 1503-1508
R.E. Kalman [1968] Lectures on Controllability and Observability, CIME
R.E. Kalman [1971] "Kronecker invariants and feedback," in Ordinary Differential Equations, NRL-MRC Conference, L. Weiss ed, pp 459-471, Academic Press, NY
R.E. Kalman [1980] "Identifiability and problems of model selection in econometrics," Proceedings of the 4th World Congress of the Econometric Society, Aix-en-Provence, France, August 1980
R.E. Kalman, P.L. Falb, and M.A. Arbib [1969] Topics in Mathematical System Theory, McGraw-Hill, NY
A.S. Morse [1975] "System invariants under feedback and cascade control," in Lecture Notes in Economics and Mathematical Systems, Vol 131, G. Marchesini and S. Mitter eds, Springer Verlag, Berlin
A. Nerode [1958] "Linear automaton transformations," Proceedings of the American Mathematical Society, Vol 9, pp 541-544
H.H. Rosenbrock [1970] State Space and Multivariable Theory, Nelson, London
W.A. Wolovich [1974] Linear Multivariable Systems, Applied Mathematical Sciences series, No 11, Springer Verlag, NY
W.M. Wonham [1967] Linear Multivariable Control: a Geometric Approach, Lecture Notes in Economics and Mathematical Systems, No 101, Springer Verlag, NY
B.F. Wyman [1972] Linear Systems over Commutative Rings, Lecture notes, Stanford University
O. Zariski and P. Samuel [1958] Commutative Algebra, D. Van Nostrand Co, NY

Linear Systems Over Rings:


From R. E. Kalman to the Present
E. W. Kamen
Department of Electrical Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA

The paper begins with a brief history of the role played by R.E. Kalman in the establishment of
the field of linear systems over a commutative ring. The theory of systems over rings is motivated
by considering integer systems, systems with time delays, parameter-dependent systems, and
multidimensional systems including spatially-distributed systems. Then a brief survey is given of existing
work on the control of systems over rings along with some discussion of open research problems.

1 The Beginning
My first contact with Professor Kalman was during my third year as a graduate
student at Stanford University. In particular, during the Fall 1969 and Winter
1970 quarters, I took Professor Kalman's two-course sequence on mathematical
system theory. I remember very clearly Professor Kalman telling the class on
the first day that linear algebra and abstract algebra (rings, modules, etc.) were
required for the course. Most, if not all, of the electrical engineering students
in the class had had courses in linear algebra, but not many of us were experts
in modern algebra. So for me and some of the other students in the class the
race to learn abstract algebra was begun at full throttle.
Professor Kalman's two-course sequence had a profound impact on me; in
fact my going into mathematical system theory as an area of research was
primarily a result of my interest in the material he taught in the sequence. I
found Professor Kalman to be a very effective teacher: his lectures were always
well motivated, very clear, and very precise. But what impressed me the most
was the tremendous degree of intuition he displayed on the topics he was
covering.
The first quarter of Professor Kalman's course dealt with linear time-invariant discrete-time systems given by the dynamical equations
x(t + 1) = Fx(t) + Gu(t)    (1.1)
y(t) = Hx(t) + Ju(t)    (1.2)

Here t is the discrete-time index which takes on integer values only; F, G, H, J
are matrices over an arbitrary field K; and the input u(t), state x(t), and output
y(t) are column vectors with entries in K.
In the course, the study of systems of the form (1.1)-(1.2) was given in terms
of Kalman's K[z]-module approach which first appeared in [18] (see also

312

E. W. Kamen

[19, 20]). Here K[z] is the ring of polynomials in the symbol (or indeterminate)
z with coefficients in the field K. In Kalman's original work, the K[z]-module
structure was used to derive a number of results on realization, system structure,
and control. In more recent work, Bostwick Wyman and his colleagues have
extended the module-theoretic approach to the study of the zeros of
multivariable systems (e.g. see [50]). The K[z]-module approach to linear
time-invariant discrete-time systems is considered in the paper by Wyman in
this volume, so we shall not pursue the topic any further here.
Although the problem was not introduced in his course (as I recall), sometime
around 1970 Professor Kalman asked the question as to how much of the
theory of linear time-invariant discrete-time systems "goes through" if the field
K is generalized to a commutative ring R. In other words, in the representation
(1.1)-(1.2) now suppose that the matrices F, G, H, J are over a commutative ring
R and that the input u(t), state x(t), and output y(t) are column vectors with
entries in R. The representation (1.1)-(1.2) with F, G, H, J over R is an example
of a system over a ring, that is, a system whose coefficients belong to a ring.
Part of the original motivation for studying systems over a ring was due to
the interest in discrete-time systems over the ring Z of integers. A discrete-time
system of the form (1.1)-(1.2) with F, G, H, J over Z processes integer-valued
inputs u(k) using integer operations. Thus systems over Z can be implemented
"exactly" on a digital computer, assuming that magnitudes are less than 10^12
for 12-digit precision. Systems over Z arise in coding theory (Johnston [17])
for 12-digit precision. Systems over Z arise in co ding theory (Johnston [17])
and in digital signal processing (Kurshan and Gopinath [32]). In [32], the
authors consider the generation and detection of single tones using digital filters
given by the difference equation
y(k) = aly(k - 1) + a2 y(k - 2) + ... + any(k - n) + u(k)

where the ai are integers. Systems over Z also appear in CCD technology,
where new types of programmable CCD filters with integer coefficients have
been fabricated (see Hewes et al. [15]).
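Such a filter is straightforward to sketch; all arithmetic stays in Z, so a computer implementation is exact. The coefficients below are my own illustration (a_1 = 1, a_2 = −1 gives a periodic "tone" impulse response, in the spirit of the tone generators cited above).

```python
# Hedged sketch (coefficients are my illustration): a digital filter over the
# ring Z of integers, implementing the difference equation above exactly.

def integer_filter(a, u):
    """y(k) = a[0]*y(k-1) + ... + a[n-1]*y(k-n) + u(k), all over Z."""
    n, y = len(a), []
    for k, uk in enumerate(u):
        yk = uk + sum(a[i] * y[k - 1 - i] for i in range(n) if k - 1 - i >= 0)
        y.append(yk)
    return y

# y(k) = y(k-1) - y(k-2) + u(k): impulse response repeats with period 6
out = integer_filter([1, -1], [1, 0, 0, 0, 0, 0, 0])
```

The denominator z² − z + 1 has its roots at primitive sixth roots of unity, which is why the impulse response is a period-6 "tone".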
Professor Kalman was especially interested in the problem of realization
for systems over rings, which is defined as follows. Given a sequence
{A_1, A_2, A_3, ...} of p × m matrices over a commutative ring R, when do there
exist n × n, n × m, p × n matrices F, G, H over R such that

A_i = HF^{i−1}G,  i = 1, 2, 3, ...?    (1.3)

If we define the transfer matrix by the formal power series

W(z) = A_1 z^{−1} + A_2 z^{−2} + ⋯,

then there exist matrices F, G, H satisfying (1.3) if and only if

W(z) = H(zI − F)^{−1}G.    (1.4)

A sequence {A_1, A_2, ...} over R is said to be realizable over R if there exist
matrices F, G, H over R satisfying either (1.3) or (1.4).
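The realizability condition can be checked directly by generating Markov parameters. The following hedged sketch works over R = Z with exact integer arithmetic; the Fibonacci-style example is my illustration, not from the text.

```python
# Hedged sketch of condition (1.3): a triple (F, G, H) over a ring realizes
# the sequence A_i = H F^(i-1) G.  Example matrices over Z are mine.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def markov_parameters(F, G, H, count):
    out, P = [], G
    for _ in range(count):
        out.append(matmul(H, P))      # A_i = H F^(i-1) G
        P = matmul(F, P)              # advance to F^i G
    return out

F = [[1, 1], [1, 0]]   # a Fibonacci generator over Z
G = [[1], [0]]
H = [[1, 0]]
A = markov_parameters(F, G, H, 5)
```

Since F, G, H have integer entries, every A_i is automatically an integer matrix: the sequence is realizable over Z, not merely over Q.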

5 Linear System Theory-Systems Over Rings

313

A particular question posed by Professor Kalman was the following: Let R
be an integral domain (a commutative ring with no divisors of zero) and let Q
denote the quotient field of R. Suppose that the sequence {A_1, A_2, ...} over R
is realizable over Q; that is, there exist matrices F, G, H over Q satisfying (1.3)
or (1.4). Then the question is whether or not the sequence is realizable over R.
If realizability over Q always implied realizability over R, then the question of
the existence of realizations over R could be reduced to the question of the
existence of realizations over a field, for which there was a complete theory.
This provided much of the motivation for the particular problem posed by
Kalman.
Professor Kalman suggested the problem of realization over a commutative
ring to one of his students, Yves Rouchaleau, as a potential topic for a Ph.D.
thesis. Appearing in March 1972, Rouchaleau's thesis [37] was the first work
on systems over rings. Using a key proposition proved by Bostwick Wyman,
in his Ph.D. thesis Rouchaleau showed that if R is an integrally-closed
Noetherian domain, then realizability over Q always implies realizability over
R. Rouchaleau also gave a constructive procedure for computing minimal
realizations in the case when R is a principal ideal domain (e.g. when R = Z
or R = K[a] = ring of polynomials in a with coefficients in a field K).
The condition in Rouchaleau's thesis that R be integrally closed was soon
eliminated in the paper by Rouchaleau, Wyman, and Kalman [39], so in that
paper one can find the result that realizability over Q always implies realizability
over R if R is a Noetherian domain. The class of rings that are Noetherian
domains includes the important special case of the ring K[a_1, a_2, ..., a_q] of
polynomials in the symbols a_1, a_2, ..., a_q with coefficients in a field K. (We will
see later how this case arises in applications.) The main theorem in Rouchaleau,
Wyman, and Kalman [39] was generalized by Rouchaleau and Wyman
[38] to a class of non-Noetherian domains (namely, rings with rank-one
normalization).
It is worth noting that in his review of [38], Michel Fliess pointed out that
for single-input single-output systems, the question of realizability over an
integral domain given realizability over the quotient field is equivalent to a
problem raised by Fatou [12] in 1905, and for which definitive results were
derived by Cahen and Chabert [6]. It should also be mentioned that the question
of rationality (or realizability) of power series over commutative and
noncommutative rings has been studied by Fliess [13] using the Hankel matrix
approach.
A great deal of work has been carried out on the problem of realization for
systems over a ring. The most recent results have centered on the relationship
between realizations having various desirable properties and the existence of
particular types of matrix fraction representations for the transfer matrix (e.g.,
see Sontag [45], Khargonekar [30], Khargonekar and Sontag [31]).

2 The Continuous-Time Case


As noted in the previous section, the motivation for studying systems over rings
originated from the application to discrete-time systems. In this section we
consider the application of systems over rings to continuous-time systems. First,
we need to return to Professor Kalman's two-course sequence in mathematical
system theory.
In the Winter 1970 term of the sequence, Professor Kalman presented a
continuous-time version of his K[z]-module approach to discrete-time systems.
The continuous-time counterpart to the ring of polynomials K[z] was chosen
to be the ring E of distributions on the reals with compact support contained
in the interval (−∞, 0]. In the definition of the ring structure on E, addition
is the usual addition of distributions, multiplication is the convolution operation,
and the identity element of E is the Dirac distribution δ concentrated at the
origin. For linear time-invariant continuous-time systems with infinitely-differentiable
impulse response function matrix, Kalman constructed an
E-module framework which closely resembled his K[z]-module approach to
discrete-time systems (see [21]).
After taking Professor Kalman's two-course sequence, I was very interested
in determining if his E-module framework could be extended to a large class
of infinite-dimensional continuous-time systems including lumped-distributed
networks with LC and RC transmission lines. The generalization of Kalman's
E-module setup required that one remove the constraint that the impulse
response function matrix be infinitely differentiable. In my Ph.D. thesis [22]
(which was supervised by Robert Newcomb), I generalized the E-module
framework to a large class of infinite-dimensional systems by defining the space of
output functions in terms of the space of all distributions on the reals with
support bounded on the left. The framework was formulated so that (as in
Kalman's setup) the state set induced from a "Kalman-type" input/output map
always admitted the structure of a finitely-generated E-module, although it was
infinite dimensional as a linear space over the reals. The finiteness of the state
module was exploited in Kamen [25] to yield results on the control of
infinite-dimensional systems. We should note that a similar module-theoretic approach
was developed by Yamamoto [51].
The connection between the E-module framework in [22] and systems over
rings came about as follows. First, in order to focus on a particular class of
lumped-distributed systems, in [22] I consider the subring R[δ^(1), π_1, π_2, ..., π_q]
of E consisting of all polynomials in the elements δ^(1), π_1, π_2, ..., π_q with
coefficients in the reals R, where δ^(1) is the generalized derivative of the
Dirac distribution δ and π_1, π_2, ..., π_q are elements of the ring E. It is assumed
that each π_i has an inverse (under convolution) in the space of distributions
on R with support contained in the interval [0, ∞). As noted in [22],
R[δ^(1), π_1, π_2, ..., π_q] is an integral domain since it is a subring of E which has
no divisors of zero. If δ^(1), π_1, π_2, ..., π_q are algebraically independent (over R),

the ring R[δ^(1), π_1, π_2, ..., π_q] is isomorphic to the ring R[σ_1, σ_2, ..., σ_{q+1}] of
polynomials in q + 1 symbols with coefficients in R.
Now consider a system whose impulse response function matrix W is over
the quotient field R(δ^(1), π_1, π_2, ..., π_q). One can then ask if there exists a
realization given by a quadruple (F, G, H, J) of matrices over the quotient field
R(π_1, π_2, ..., π_q) such that

W = H*(δ^(1)I − F)^{−1}*G + J*δ.    (2.1)

Since convolution with δ^(1) corresponds to taking the generalized derivative,
the factorization (2.1) defines the following state model

dx(t)/dt = (F*x)(t) + (G*u)(t),    (2.2)
y(t) = (H*x)(t) + (J*u)(t),    (2.3)

where "*" denotes the convolution operation. The representation (2.2)-(2.3)


defines a continuous-time system over the field of (convolution) operators
R(7r 1, 7r 2, ... , 7r q ). Here the input u(t), the "state" x(t), and the output y(t) are
column vectors over some appropriate space of functions (or distributions).
In [22J the treatment of the problem of realization dealt only with the construction of realizations with J = 0 and with F, G, H defined over the quotient
field R(7r 1, 7r 2, ... , 7r q ). This work was motivated in part by the earlier work of
Youla [52] and Newcomb [36J on the synthesis of passive n-port networks
containing lumped and distributed elements. However, in the application to
systems, realizations over the quotient field are not meaningful in general since
elements in thisfield may correspond to noncausal operators. For example, if
7r 1 is equal to b(t + T) where T > 0, then 7r 1 corresponds to an ideal predictor
which of course is not physically realizable.
To insure that realizations given by (2.2)-(2.3) are physically realizable, it
is necessary to restrict the elements of F, G, H to the subring of R(7r 1, 7r 2, ... , 7r q )
consisting of all stable elements that are proper in the 7r i In other words, we
really need to consider systems over a ring of operators. The problem of constructing realizations over a ring of operators was first considered in [23J, with
the full paper appearing in [24]. The results in [23,24J center on the existence
of factorizations of the form (2.1) with J = 0 and with F, G, H defined over the
polynomial ring R[d 1 ,d 2 , ,dq ] where di is the inverse of 7r i . In particular, as
noted in [23, 24J, since R[d 1 ,d 2 , ... ,dq J is a Noetherian domain, the results of
Rouchaleau, Wyman, KaIman [39] can be directly applied to the realization
of continuous-time systems over this ring of operators. In addition, when q = 1
and the element d 1 = d is transcendental over R, the polynomial ring R[dJ is
a principal ideal domain, and in this case results [37,8J on the construction of
minimal realizations can be directly applied.
We should note that the representation (2.2)-(2.3) can be defined without the need to consider spaces of distributions (this was done in [34, 49]). In particular, given T > 0, let d denote the T-second delay operator acting on some appropriate space of functions, and let R[d] denote the ring of operators

316

E. W. Kamen

consisting of polynomials in d with coefficients in R. Then a quadruple (F, G, H, J) of matrices over R[d] defines a delay differential system with commensurate time delays given by the dynamical equations dx(t)/dt = (Fx)(t) + (Gu)(t), y(t) = (Hx)(t) + (Ju)(t), where the action of F on x, G on u, H on x, and J on u is defined in the obvious way. The noncommensurate delay case can be considered by taking F, G, H, J to be over the polynomial ring R[d_1, d_2, ..., d_q], where the d_i are delay operators. The extension of the operator-ring approach to a large class of functional differential equations is given in [27].
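As a toy illustration of this operator-ring viewpoint (not taken from the paper), an element of R[d], a polynomial in the T-second delay d, can be represented as a map from powers of d to real coefficients and made to act on a signal by shifted superposition. The value T = 1.5 and all names below are purely illustrative.

```python
# Sketch: elements of the operator ring R[d], with d the T-second delay,
# represented as {power_of_d: coefficient}; applying an operator p to a
# signal x gives (p*x)(t) = sum_k c_k x(t - k*T).
T = 1.5  # hypothetical delay length (seconds)

def apply_op(p, x):
    """Apply the delay-operator polynomial p to the signal x."""
    return lambda t: sum(c * x(t - k * T) for k, c in p.items())

x = lambda t: t * t          # a sample input signal
p = {0: 2.0, 1: -1.0}        # the operator 2 - d, an element of R[d]
y = apply_op(p, x)           # y(t) = 2*x(t) - x(t - T)
```

Only nonnegative powers of d appear here, which is precisely the causality restriction discussed above; the predictor d^{-1} has no representation in this ring.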

3 Additional Applications of Systems Over Rings


In this section we consider two additional classes of systems that can be viewed as systems over rings. We begin with two-dimensional systems and then we consider parameter-dependent systems.
Two-Dimensional Systems. Linear two-dimensional (2-D) discrete systems can be treated as one-dimensional (1-D) systems with coefficients in a commutative ring. In the quarter-plane case, this ring approach to 2-D systems originated in the work of Sontag [43, 44] and Eising [7]. Below we follow the development in Kamen [28] in deriving a ring approach to a large class of half-plane causal 2-D systems. Since 2-D systems are often specified by an input/output model, rather than a state model, we begin with the input/output convolution relationship for a 2-D system.
Consider the single-input single-output linear discrete 2-D system given by
the convolution expression
y(k, r) = Σ_{i=-∞}^{k} Σ_{j=-∞}^{∞} w(k − i, r − j)u(i, j),    k, r ∈ Z    (3.1)

Here u(k, r) is the input array, y(k, r) is the output array, and w(k, r) is the point-spread function. For any k, r ∈ Z, u(k, r), y(k, r), and w(k, r) are elements of the reals R. The summation in (3.1) ends at i = k since we are assuming that w(k, r) = 0 for all k < 0. In other words, we are assuming that the system is half-plane causal. If w(k, r) = 0 for all k < 0 and all r < 0, the system is said to be quarter-plane causal. We are also assuming that w(k, r) and/or u(k, r) is constrained so that the sum on j in (3.1) is well defined. We will make this precise shortly.
Let V denote the set of all real-valued functions defined on Z with support bounded on the left. With pointwise addition and with convolution, V is a commutative ring with identity. We will show that the 2-D system defined by (3.1) can be viewed as a 1-D system whose coefficients belong to the ring V. First, for each integer k, let u_k, y_k, and w_k denote the functions from Z into R defined by u_k(r) = u(k, r), y_k(r) = y(k, r), and w_k(r) = w(k, r). The functions u_k, y_k, and w_k are the kth rows of the corresponding arrays.

5 Linear System Theory-Systems Over Rings

317

Now suppose that for each k ∈ Z, u_k, y_k, and w_k are elements of the ring V; in other words, the rows of the input array, output array, and point-spread function have supports bounded on the left. Then we can write (3.1) as a convolution expression with entries in V given by

y_k = Σ_{i=-∞}^{k} w_{k−i} * u_i    (3.2)

where for each fixed k and i, w_{k−i} * u_i is the convolution of the ith row of the input array u(k, r) with the (k − i)th row of the point-spread function w(k, r).
It should be noted that the representation (3.2) is not unique; for example, we could represent the input and output arrays in terms of columns or rotated columns, rather than rows. Some results on 2-D systems are dependent on which representation is used, whereas some are not.
The class of 2-D systems given by (3.2) includes the class of nonsymmetric half-plane causal 2-D systems. We can also consider a large class of symmetric half-plane causal 2-D systems by assuming that the rows u_k, y_k, and w_k are elements of the convolution ring S of absolutely summable real-valued functions defined on Z.
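A minimal numerical sketch of this row-wise reading of (3.2) (illustrative, not from the paper): finitely supported rows are stored as {index: value} dictionaries, 1-D convolution plays the role of ring multiplication in V, and one output row is assembled as y_1 = w_1 * u_0 + w_0 * u_1.

```python
# Sketch: one output row of a 2-D system computed as a sum of 1-D
# convolutions of rows, following (3.2).  Rows are finitely supported
# sequences stored as {index: value}; the example data is arbitrary.
def conv(a, b):
    """Convolution of two finitely supported rows (multiplication in V)."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0) + ai * bj
    return out

def radd(a, b):
    """Pointwise addition of rows (addition in V)."""
    out = dict(a)
    for j, bj in b.items():
        out[j] = out.get(j, 0) + bj
    return out

w = {0: {0: 1.0, 1: 2.0}, 1: {0: -1.0}}   # rows w_0, w_1 of the point spread
u = {0: {0: 3.0}, 1: {0: 1.0, 1: 1.0}}    # input rows u_0, u_1
# y_1 = w_1 * u_0 + w_0 * u_1, per (3.2) with rows vanishing for i < 0:
y1 = radd(conv(w[1], u[0]), conv(w[0], u[1]))
```

The same two helper functions would serve for the ring S of absolutely summable rows, truncated to finite support.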
With R equal to either V or S, consider the system given by (3.2) with w_k ∈ R. The transfer matrix W(z) of the system is defined to be the formal power series in z^{-1} with coefficients in R given by

W(z) = Σ_{i=1}^{∞} w_i z^{-i}    (3.3)

We can generalize (3.3) to the m-input p-output case by considering transfer matrices of the form

W(z) = Σ_{i=1}^{∞} W_i z^{-i}    (3.4)

where the W_i are p × m matrices over the ring R.


A realization of the system given by (3.4) is a quadruple (F, G, H, J) of n × n, n × m, p × n, p × m matrices over R such that

W(z) = H(zI − F)^{-1}G + J    (3.5)

The factorization (3.5) defines a state model given by the equations

x_{t+1} = F*x_t + G*u_t    (3.6)

y_t = H*x_t + J*u_t    (3.7)

Evaluating both sides of (3.6)-(3.7) at r ∈ Z, we obtain the 2-D equations

x(t + 1, r) = Σ_{j=-∞}^{∞} F(r − j)x(t, j) + Σ_{j=-∞}^{∞} G(r − j)u(t, j)    (3.8)

y(t, r) = Σ_{j=-∞}^{∞} H(r − j)x(t, j) + Σ_{j=-∞}^{∞} J(r − j)u(t, j)    (3.9)


Thus the 2-D state representation (3.8)-(3.9) can be studied in terms of the state representation (3.6)-(3.7) defined over the ring R. We should note that there is also a continuous-time version of (3.6)-(3.7) which arises in the study of spatially-distributed continuous-time systems (see [29]). Also, it is obvious that (3.6)-(3.7) can be generalized to the N-dimensional case (for N > 1) by considering a ring of functions from the N-fold Cartesian product of Z into R.
Parameter-Dependent Systems. Linear systems whose coefficients are functions of one or more parameters can be studied in terms of systems over a ring. This ring approach to parameter-dependent systems originated in the work of Byrnes [3, 4, 5]. The definition of the ring setup is given below.
With N equal to a fixed positive integer, let W be a subset of N-dimensional Euclidean space R^N and let C denote the ring of real-valued continuous functions defined on W. Here addition and multiplication are defined pointwise in the usual way. Now a quadruple (F, G, H, J) of n × n, n × m, p × n, p × m matrices over C defines an m-input p-output n-dimensional linear continuous-time or discrete-time system whose coefficients depend continuously on the entries of a parameter vector w ∈ W. In the continuous-time case, the system is given by the state representation over the ring C given by

dx(t)/dt = Fx(t) + Gu(t)    (3.10)

y(t) = Hx(t) + Ju(t)    (3.11)

where for each fixed value of t, u(t), x(t), and y(t) are column vectors over C. Evaluating (3.10)-(3.11) at every w ∈ W, we obtain the collection of equations

dx(t, w)/dt = F(w)x(t, w) + G(w)u(t, w)    (3.12)

y(t, w) = H(w)x(t, w) + J(w)u(t, w),    w ∈ W    (3.13)

In the discrete-time case the dynamical equations are given by (1.1)-(1.2) with F, G, H, J defined over C.
Note that the input u(t, w) depends on the parameter vector w. Dependency of the input on w may result, for example, from taking u(t, w) to be a feedback signal generated from the state x(t, w) or output y(t, w) of the system.
One can also consider parameter-dependent systems defined over a subring of C such as the ring R[w] of polynomials in the entries of w with real coefficients, or the ring of rational functions a(w)/b(w) in the entries of w with b(w) nonzero for all w ∈ W. If the parameter set W is a closed bounded set, by the Weierstrass approximation theorem we can approximate any function in C by a polynomial in R[w], with the approximation as close as desired (in terms of the standard sup norm on C). Hence when W is closed and bounded, any system over C can be approximated by a system over R[w].
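The following sketch (purely illustrative; the coefficient functions are made up) shows the two readings of a system over C side by side: one set of coefficient functions on W = [0, 1], and one ordinary system per parameter value as in (3.12)-(3.13).

```python
# Sketch: a 1-state discrete-time system over the ring C of continuous
# functions on W = [0, 1].  Evaluating the coefficient functions at each
# parameter value w yields an ordinary system; F, G, H are illustrative.
def F(w): return 0.5 + 0.2 * w
def G(w): return 1.0
def H(w): return 1.0

def step_response(w, n):
    """Output samples y(0), ..., y(n-1) at parameter value w, unit-step input."""
    x, ys = 0.0, []
    for _ in range(n):
        ys.append(H(w) * x)
        x = F(w) * x + G(w) * 1.0
    return ys
```

Each fixed w gives one member of the family; a design carried out over the ring C itself works uniformly in w.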


4 Control of Systems Over Rings


For the systems defined in the previous sections, it is possible to make effective use of the ring framework in the study of the existence and construction of controllers for various types of control problems. There exists a large number of papers in the literature on the control of systems over rings. In this section we briefly survey a small portion of the existing work and we point out a couple of open areas for research.
Given a commutative ring R with 1, consider the n-dimensional system over R defined by the triple (F, G, H) of n × n, n × m, p × n matrices over R. (Note that we are taking J = 0.) The system (F, G, H) is said to be "free" since it is defined in terms of matrices, rather than R-module homomorphisms. In this paper we are restricting attention to free systems, although "nonfree" systems are of importance in the theory of systems over rings.
A system (F, G, H) over R may be interpreted as a discrete-time system over
R given by the dynamical equations (1.1)-(1.2), or for various appropriate choices
of R, it may be interpreted as a continuous-time system over R (such as a system
with time delays, a spatially-distributed system, or a parameter-dependent
system). All of the control-theoretic results discussed below apply to either the
discrete-time or continuous-time interpretation.
Given a system (F, G, H) over R and an m × n matrix L over R, consider the n × n matrix F − GL. In either the discrete-time or continuous-time interpretation of (F, G, H), the matrix F − GL arises in the obvious way by feeding back the state of the given system through the "feedback gain matrix" L (that is, we set u = −Lx). The system (F, G, H) is said to be coefficient assignable if the coefficients of the characteristic polynomial of F − GL can be assigned arbitrarily by selecting L. In other words, the system is coefficient assignable if for any b_i ∈ R, i = 0, 1, ..., n − 1, there exists an L over R such that

det(zI − F + GL) = z^n + Σ_{i=0}^{n−1} b_i z^i,
where "det" denotes the determinant. The system is pole assignable if for any
CiER, i = 1,2, ... , n, there exists an Lover R such that
det(zI - F

+ GL) =

(z - C1)(Z - C2) ... (z - cn)

When R is a field, it is well known that pole and coefficient assignability are equivalent to reachability, which in turn is equivalent to right invertibility (or full rank) of the reachability matrix U = [G FG ... F^{n−1}G]. When R is a ring, reachability is still equivalent to right invertibility of the matrix U, and in the single-input case (m = 1), it is still true that coefficient assignability, pole assignability, and reachability are all equivalent. Further, in the m = 1 case existing expressions (such as Ackermann's formula) for the feedback L which gives a desired characteristic polynomial in the field case are also valid in the ring case.
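To make the m = 1 remark concrete, here is a small sketch of Ackermann's formula, L = [0 ... 0 1]U^{-1}p(F), worked over the ring of integers for a two-state example; the matrices and the desired polynomial z^2 + 3z + 2 are chosen (by the present editor, not taken from the paper) so that the reachability matrix is invertible over Z.

```python
# Sketch: Ackermann's formula in the m = 1 case, carried out over the
# integers; matrices are lists of rows.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

F = [[0, 1], [0, 0]]
g = [[0], [1]]                      # single input column
# Reachability matrix U = [g  Fg] = [[0, 1], [1, 0]] is invertible over Z
# and happens to equal its own inverse.
Uinv = [[0, 1], [1, 0]]
# Desired characteristic polynomial p(z) = z^2 + 3z + 2, so p(F) = F^2 + 3F + 2I:
F2 = matmul(F, F)
pF = [[F2[i][j] + 3 * F[i][j] + (2 if i == j else 0) for j in range(2)]
      for i in range(2)]
L = matmul([[0, 1]], matmul(Uinv, pF))     # last row of U^{-1} p(F)
gL = matmul(g, L)
F_cl = [[F[i][j] - gL[i][j] for j in range(2)] for i in range(2)]
# F_cl = [[0, 1], [-2, -3]], whose characteristic polynomial is z^2 + 3z + 2.
```

Every step uses only ring operations plus the inverse of U, which is exactly why the formula survives the passage from fields to rings when U is invertible.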


The first major work on the control of systems over a ring in the multi-input case was Morse's paper [34], which contains a constructive proof that reachability implies pole assignability when R is the ring K[σ] of polynomials in a symbol σ with coefficients in a field K. (In fact, Morse's construction applies to systems over any principal ideal domain.) In his survey paper [43], Eduardo Sontag proved that reachability is necessary for pole assignability and he showed that reachability and coefficient assignability are equivalent when R is a semilocal ring. In the paper by Bumby et al. [2], a counterexample was given showing that reachability does not imply pole assignability in general when R = R[σ_1, σ_2]. Later Tannenbaum [48] showed that reachability does not always imply pole assignability when R is any polynomial ring in two or more symbols with coefficients in a field. In more recent work, an equivalence has been established between the pole assignability property for reachable systems (F, G, H) over a ring R and the property that the image of G contains a unimodular element (or a rank-one summand) for every reachable system over R (see Sharma [42], Naude and Naude [35], and Sontag [47]).
The problem of pole or coefficient assignability can be reduced to the single-input (m = 1) case by using the notion of feedback cyclizability: a system (F, G, H) over R is feedback cyclizable if there exist an m-element column vector b over R and an m × n matrix Q over R such that the pair (F − GQ, Gb) is reachable; that is, the n × n matrix U(b, Q) = [Gb (F − GQ)Gb ... (F − GQ)^{n−1}Gb] is invertible over R. If U(b, Q) is invertible, using results for the m = 1 case, we can compute a feedback row vector ℓ over R such that F − GQ − Gbℓ has the desired characteristic polynomial. Then with L = Q + bℓ, F − GL has the desired characteristic polynomial.
When R is a field, it was Heymann [16] who showed that reachability and feedback cyclizability are equivalent. When R is a ring, it is still true that feedback cyclizability implies reachability; however, the counterexamples of Sontag [43] show that in general reachability does not imply feedback cyclizability. The following necessary and sufficient condition for feedback cyclizability was given in [26]: (F, G, H) is feedback cyclizable if and only if there exist m-element column vectors u_i over R for i = 0, 1, ..., n − 1 such that the elements g_i, i = 0, 1, ..., n − 1, generate R^n, where g_0 = Gu_0 and g_i = Fg_{i−1} + Gu_i for i = 1, 2, ..., n − 1. Here R^n is the R-module of all n-element column vectors over R. For results on feedback cyclizability when R is a principal ideal domain, see Schmale [40, 41].
In the work of Emre and Khargonekar [11], it is shown that for any reachable system (F, G, H) over a ring R, coefficient assignability can be achieved by using a dynamic feedback system (A, B, C, D) defined over R and with the dimension q of the feedback system bounded above by n^2. In the discrete-time interpretation, the feedback control signal is given by

u(t) = Cg(t) + Dx(t),    (4.1)

where x(t) is the state of the given system and g(t) is the state of the feedback system with state equation g(t + 1) = Ag(t) + Bx(t). With the feedback control


(4.1), the system matrix of the resulting closed-loop system is the (n + q) × (n + q) matrix given by

F_cl = [ F + GD   GC ]
       [   B       A ]    (4.2)
Here coefficient assignability means that it is possible to assign the characteristic polynomial of F_cl by selecting A, B, C, D.
Now define the (m + q)-input, (p + q)-output, (n + q)-dimensional augmented system

F_a = [ F  0 ]    G_a = [ G  0 ]    H_a = [ H  0 ]
      [ 0  0 ]          [ 0  I ]          [ 0  I ]

From (4.2) we see that F_cl = F_a − G_a L_a, where

L_a = − [ D  C ]
        [ B  A ]

Hence dynamic state feedback of the given system (F, G, H) is equivalent to nondynamic (or static) state feedback (i.e., u = −Lx) of the augmented system (F_a, G_a, H_a). So reachability always implies that for some q ≤ n^2 the augmented system is coefficient assignable by a feedback gain matrix L_a over R. This augmentation approach to assignability has been pursued by Sontag [47] and Brewer et al. [1].
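The block identity F_cl = F_a − G_a L_a is easy to check numerically; the sketch below does so for scalar F and G and a one-dimensional feedback system (A, B, C, D), with arbitrary integer values supplied for illustration.

```python
# Sketch: verify F_cl = F_a - G_a * L_a for n = q = 1 with arbitrary scalars.
f, g = 2, 3
A, B, C, D = 5, 7, 11, 13

F_cl = [[f + g * D, g * C], [B, A]]       # closed-loop matrix, as in (4.2)
F_a = [[f, 0], [0, 0]]                    # augmented system matrices
G_a = [[g, 0], [0, 1]]
L_a = [[-D, -C], [-B, -A]]                # L_a = -[D C; B A]
GaLa = [[sum(G_a[i][k] * L_a[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
check = [[F_a[i][j] - GaLa[i][j] for j in range(2)] for i in range(2)]
# check now equals F_cl entrywise
```

Since only ring operations are involved, the same verification goes through verbatim over any commutative ring R.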
The determination of the minimal dimension of the feedback system
necessary to achieve assignability is a very interesting open problem (which
may be solvable for only very special classes of rings). A current problem of
interest is the determination of upper bounds on the dimension that are functions
of n and/or m.
It should be mentioned that pole or coefficient assignability results for the augmented system (F_a, G_a, H_a) can be dualized to yield results on observers with assignable "dynamics." The separation theorem can then be generalized to the augmented framework so that state feedback controllers and observers can be combined in the standard way to construct an input/output regulator which results in a closed-loop system having an assignable characteristic polynomial.
A limitation of control-theoretic results based on reachability is that this property may be too "strong" for a system over a ring R. This is particularly true in the single-input case where reachability requires that the determinant of the n × n reachability matrix [G FG ... F^{n−1}G] be invertible in R. Thus how strong the condition is in the m = 1 case depends on how "large" the subset of invertible elements is. In general, the more inputs a system has, the "weaker" the reachability condition is. In fact, as proved by Lee and Olbrot [33], when R = R[d_1, d_2, ..., d_q], reachability is a generic condition if the number of inputs exceeds q.
To eliminate the need for reachability, we can consider control schemes that yield a stable closed-loop system, rather than assigning the zeros or coefficients


of the closed-loop characteristic polynomial. This requires a notion of stability which (as done in [31]) can be introduced in a purely algebraic way as follows. Let S denote a multiplicative subset of monic polynomials belonging to the ring R[z] of polynomials in z with coefficients in the ring R. The elements of S are interpreted as the subset of stable polynomials. Then a system (F, G, H) over R is S-stabilizable if there exists an L over R such that det(zI − F + GL) ∈ S. It is known (see [9]) that a necessary condition for S-stabilizability is that there exist matrices X(z) and Y(z) over S^{-1}R[z] such that

(zI − F)X(z) + GY(z) = I    (4.3)

From the results in [10], the Bezout identity (4.3) implies that there is a dynamic state feedback control system such that the characteristic polynomial of the resulting closed-loop system belongs to S.
Existing results on stabilization based on the Bezout identity (4.3) are not constructive in general, due to the difficulty in determining the existence of matrices X(z) and Y(z) satisfying (4.3). When R is a C*-algebra, stabilizability can be checked using local criteria (based on the Gelfand transform) and stabilizing feedback gain matrices can be computed from Riccati equations (as in the field case). For example, the rings S and C defined in Sect. 3 admit the structure of a C*-algebra, and thus two-dimensional systems (such as spatially-distributed systems) and parameter-dependent systems can be stabilized using a Riccati-equation type approach. A key feature of the C*-framework is the existence of a star operation on the algebra, in terms of which one can define the notion of positivity. For results on the stabilization of systems defined over algebras, see Byrnes [4], Green and Kamen [14], and Kamen [29].
An interesting open area of research is the possible extension of the Riccati-equation framework to systems over rings which are not closed under an appropriate star operation. For example, consider the class of continuous-time systems with commensurate time delays defined over the polynomial operator ring R[d]. The appropriate star operation appears to be d* = d^{-1}, but as noted in Sect. 2, d^{-1} corresponds to the ideal predictor and thus is noncausal. An open question is whether or not the Riccati-equation setup can be modified to yield causal stabilizing solutions for systems over R[d].

5 Concluding Remarks
One of the objectives of this paper is to show the influence and role played by R.E. Kalman in the establishment of the field now called systems over rings. The paper is not intended to be a complete survey of systems over rings; in fact, there are numerous papers on the subject that are not mentioned here due to space limitations. Also omitted is the growing body of literature on the system-over-ring approach to linear time-varying systems.


The theory of systems over rings is sufficiently well developed (and I believe
of sufficient interest) to justify the generation of tutorial treatments in the form
of textbooks. A book emphasizing the mathematical aspects of the subject has
appeared [1]; however, as of yet there are no textbooks focusing on the
engineering side of the theory. This is a task that Yves Rouchaleau and I set
out to do several years ago, but we have not been able to finish the job. Perhaps
this paper will stimulate us (or someone else) to "fill the gap."
References
[1] J.W. Brewer, J.W. Bunce, and F.S. Van Vleck, Linear Systems over Commutative Rings, Marcel Dekker, New York, 1986
[2] R. Bumby, E.D. Sontag, H.J. Sussmann, and W. Vasconcelos, "Remarks on the pole-shifting problem over rings," Journal of Pure and Applied Algebra, Vol 20, pp 113-127, 1981
[3] C.I. Byrnes, "On the stabilizability of linear control systems depending on parameters," in Proc 18th IEEE Conf Decision and Control, Ft Lauderdale, pp 233-236, 1979
[4] C.I. Byrnes, "Realization theory and quadratic optimal controllers for systems defined over Banach and Frechet algebras," in Proc 19th Conf Decision and Control, Albuquerque, New Mexico, pp 247-251, 1980
[5] C.I. Byrnes, "Algebraic and geometric aspects of the analysis of feedback systems," in Geometrical Methods for the Theory of Linear Systems, C.I. Byrnes and C. Martin (Eds.), D. Reidel, Dordrecht, pp 85-124, 1980
[6] P.J. Cahen and J.L. Chabert, "Elements quasi-entiers et extensions de Fatou," J Algebra, Vol 36, pp 185-192, 1975
[7] R. Eising, "Realization and stabilization of 2-D systems," IEEE Trans Automatic Control, Vol AC-23, pp 793-799, 1978
[8] R. Eising and M.L.J. Hautus, "Realization algorithms for a system over a PID," Math Systems Theory, Vol 14, pp 353-366, 1981
[9] E. Emre, "On necessary and sufficient conditions for regulation of linear systems over rings," SIAM J Control and Optimization, Vol 20, pp 155-160, 1982
[10] E. Emre, "Regulation of linear systems over rings by dynamic output feedback," Systems and Control Letters, Vol 3, pp 57-62, 1983
[11] E. Emre and P.P. Khargonekar, "Regulation of split linear systems over rings: Coefficient assignment and observers," IEEE Trans Automatic Control, Vol AC-27, pp 104-113, 1982
[12] P. Fatou, "Séries trigonométriques et séries de Taylor," Acta Math, Vol 30, pp 335-400, 1905
[13] M. Fliess, "Matrices de Hankel," J Math Pures et Appl, Vol 53, pp 197-224, 1974
[14] W.L. Green and E.W. Kamen, "Stabilizability of linear systems over a commutative normed algebra with applications to spatially-distributed and parameter-dependent systems," SIAM J Control and Optimization, Vol 23, pp 1-18, 1985
[15] C.R. Hewes, R.W. Brodersen, and D.D. Buss, "Applications of CCD and switched capacitor filter technology," Proc IEEE, Vol 67, pp 1403-1415, 1979
[16] M. Heymann, "Comments on pole assignment in multi-input controllable linear systems," IEEE Trans Automatic Control, Vol AC-13, pp 748-749, 1968
[17] R. Johnston, "Linear systems over various rings," Ph.D. Dissertation, M.I.T., 1973
[18] R.E. Kalman, "Algebraic structure of linear dynamical systems. I. The module of Σ," Proc National Academy of Sciences (USA), Vol 54, pp 1503-1508, 1965
[19] R.E. Kalman, "Lectures on controllability and observability," Centro Internazionale Matematico Estivo, Bologna, 1968
[20] R.E. Kalman, P.L. Falb, and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, New York, 1969
[21] R.E. Kalman and M.L.J. Hautus, "Realization of continuous-time linear dynamical systems: Rigorous theory in the style of Schwartz," in Ordinary Differential Equations, 1971 NRL-MRC Conf, L. Weiss (Ed.), pp 151-164, Academic Press, New York, 1972
[22] E.W. Kamen, "A distributional-module theoretic representation of linear dynamical continuous-time systems," Ph.D. Dissertation, Stanford University, 1971
[23] E.W. Kamen, "An algebraic realization theory for linear continuous-time systems," in Proc Sixth Hawaii Int Conf on Systems Science, pp 32-34, 1973
[24] E.W. Kamen, "On an algebraic theory of systems defined by convolution operators," Math Systems Theory, Vol 9, pp 57-74, 1975
[25] E.W. Kamen, "Module structure of infinite-dimensional systems with applications to controllability," SIAM J Control and Optimization, Vol 14, pp 389-408, 1976
[26] E.W. Kamen, "Lectures on algebraic system theory: Linear systems over rings," NASA Contractor Report 3016, 1978
[27] E.W. Kamen, "An operator theory of linear functional differential equations," J Differential Equations, Vol 27, pp 274-297, 1978
[28] E.W. Kamen, "Asymptotic stability of linear shift-invariant two-dimensional digital filters," IEEE Trans Circuits and Systems, Vol CAS-27, pp 1234-1240, 1980
[29] E.W. Kamen, "Stabilization of linear spatially-distributed continuous-time and discrete-time systems," in Multidimensional Systems Theory, N.K. Bose (Ed.), pp 101-146, D. Reidel, Dordrecht, 1985
[30] P.P. Khargonekar, "On matrix fraction representations for linear systems over commutative rings," SIAM J Control and Optimization, Vol 20, pp 172-197, 1982
[31] P.P. Khargonekar and E.D. Sontag, "On the relation between stable matrix fraction factorizations and regulable realizations of linear systems over rings," IEEE Trans Automatic Control, Vol AC-27, pp 627-638, 1982
[32] R.P. Kurshan and B. Gopinath, "Digital single-tone generator-detectors," Bell System Technical Journal, Vol 55, pp 469-476, 1976
[33] E.B. Lee and A.W. Olbrot, "On reachability over polynomial rings and a related genericity problem," Int J Systems Science, Vol 13, pp 109-113, 1982
[34] A.S. Morse, "Ring models for delay-differential systems," Automatica, Vol 12, pp 529-531, 1976
[35] C.G. Naude and G. Naude, "Comments on pole assignability over rings," Systems and Control Letters, Vol 6, pp 113-115, 1985
[36] R.W. Newcomb, "On the realization of multivariable transfer functions," Report EERL 58, Cornell Univ., Ithaca, NY, 1966
[37] Y. Rouchaleau, "Linear, discrete-time, finite-dimensional dynamical systems over some classes of commutative rings," Ph.D. Dissertation, Stanford University, 1972
[38] Y. Rouchaleau and B.F. Wyman, "Linear dynamical systems over integral domains," J Computer and System Sciences, Vol 9, pp 129-142, 1974
[39] Y. Rouchaleau, B.F. Wyman, and R.E. Kalman, "Algebraic structure of linear dynamical systems. III. Realization theory over a commutative ring," in Proc National Academy of Sciences, Vol 69, pp 3404-3406, 1972
[40] W. Schmale, "Feedback cyclization over certain principal ideal domains," Int J Control, Vol 48, pp 89-96, 1988
[41] W. Schmale, "Three-dimensional feedback cyclization over C[y]," Systems and Control Letters, Vol 12, pp 327-330, 1989
[42] P.K. Sharma, "Some results on pole-placement and reachability," Systems and Control Letters, Vol 6, pp 325-328, 1986
[43] E.D. Sontag, "Linear systems over commutative rings: A survey," Ricerche di Automatica, Vol 7, pp 1-34, 1976
[44] E.D. Sontag, "On linear systems and noncommutative rings," Math Systems Theory, Vol 9, pp 327-344, 1976
[45] E.D. Sontag, "On split realizations of response maps over rings," Information and Control, Vol 37, pp 23-33, 1978
[46] E.D. Sontag, "On first-order equations for multidimensional filters," IEEE Trans Acoustics, Speech, and Signal Proc, Vol ASSP-26, pp 480-482, 1978
[47] E.D. Sontag, "Comments on 'Some results on pole-placement and reachability'," Systems and Control Letters, Vol 8, pp 79-83, 1986
[48] A. Tannenbaum, "Polynomial rings over arbitrary fields in two or more variables are not pole assignable," Systems and Control Letters, Vol 2, pp 222-224, 1982
[49] N.S. Williams and V. Zakian, "A ring of delay operators with applications to delay-differential systems," SIAM J Control and Optimization, Vol 15, pp 247-255, 1977
[50] B.F. Wyman, M.K. Sain, G. Conte, and A. Perdon, "On the zeros and poles of a transfer function," Linear Algebra and Its Applications, Vol 122, pp 123-144, 1989
[51] Y. Yamamoto, "Module structure of constant linear systems and its application to controllability," J Math Analysis and Appl, Vol 83, pp 411-437, 1981
[52] D.C. Youla, "The synthesis of networks containing lumped and distributed elements," in Proc Symp on Generalized Networks, Vol XVI, Polytechnic Inst of Brooklyn, pp 289-343, 1966

Chapter 5

Linear System Theory: Families of Systems

Invariant Theory and Families


of Dynamical Systems*
A. Tannenbaum
Department of Electrical Engineering, University of Minnesota, Minneapolis,
Minnesota 55455, USA and
Department of Electrical Engineering, Technion-Israel Institute of Technology,
Haifa, 32000 Israel

In this paper we discuss an invariant-theoretic construction of the Kalman space, which is a universal parameter space for linear time-invariant dynamical systems of fixed state space and input/output dimensions.

1 Introduction
Families of dynamical systems appear in all aspects of systems and control theory. Indeed, the essential need for feedback in control systems is the fact that the plant model is only an approximation, and so we must in reality design for a whole family of plants. Of course, the appropriate notion of family depends upon the type of problem in which we are interested. For example, in robust control, families are modelled by certain natural norm-bounded perturbations of a given nominal plant. This is a local analytic point of view.
In the early 1970's, Kalman [10] undertook a global algebraic approach to the problem of system parametrization when he constructed a universal parameter space of linear time-invariant systems of fixed state space and input/output dimensions. In doing this, he initiated a powerful algebro-geometric framework for studying families of linear time-invariant dynamical systems. This approach has had major ramifications in algebraic systems theory and basically opened up a new branch of study. Indeed, a whole conference at Harvard in 1979 was dedicated just to this research area. Even today, almost two decades later, a number of prominent researchers are continuing along this research stream.
Besides the introduction of algebraic geometry into control, Kalman's paper [10] (see also [13]) illuminated the deep connection between invariant theory and a number of control problems. Given the introduction of invariant theory and algebraic geometry into control, it was only a small step to bring geometric invariant theory into the picture. Indeed, geometric invariant theory may be

* This research was supported in part by grants from the NSF (ECS-8704047), NSF (DMS-8811084),
the Air Force Office of Scientific Research AFOSR-88-0020, AFOSR-90-0024, U.S. Army Research
Office, and the ONR through Honeywell Systems and Research Center, Minneapolis.

328

A. Tannenbaum

regarded as an algebro-geometric manifestation of classical invariant theory. It was devised by David Mumford [14] precisely in the context of universal families (or moduli spaces) of algebraic varieties.
The purpose of this note is to give a geometric invariant-theoretic construction of the Kalman space and, using this construction, to derive some of its key geometric properties. Moreover, we want to explain in a precise sense in what way it captures the notion of a universal family of systems. Because of the constraints on the size of papers for this volume, at certain points in our arguments we will not be able to give all the details. For these details we would like to refer the interested reader to the monograph [18] and the references therein.

2 On Algebraic Groups
We will assume throughout this paper that the reader has a basic knowledge of algebraic geometry. Good references for this are [4], [9], [18]. In this section, we will very briefly review some basic notions in the theory of algebraic groups following [3], [9], [18]. By a morphism of algebraic varieties we will mean an analytic map which is locally rational in the coordinates of the given varieties.
Definition 1. Let G be an algebraic variety which is endowed with a group structure. Let μ: G × G → G be defined by μ(x, y) := xy, and i: G → G be given by i(x) := x^{-1}. If μ and i are morphisms, then we say that G is an algebraic group.
Many of the most common Lie groups are in fact algebraic groups. Below
all of our varieties will be assumed to be complex.
Examples 1.
(1) This is the most important example of an algebraic group. Let

GL_n := {n × n complex invertible matrices}

Notice that GL_n is the complement in C^{n²} of the hypersurface given by the
vanishing of the determinant. As such it can be shown that GL_n is affine [4], [9],
that is, GL_n is isomorphic to some closed subvariety of C^N for some N > 0. It is
easy to see that the multiplication and inversion operations are morphisms,
so that GL_n is an algebraic group, the general linear group.
(2) Set

SL_n := {n × n complex matrices A such that det A = 1}


Then SL_n (the special linear group) is clearly an algebraic group which as a
variety is the hypersurface

det A − 1 = 0

5 Linear System Theory-Invariant Theory and Families of Systems

329

in C^{n²}. An algebraic group which is isomorphic to a closed algebraic
subgroup of GL_n is called a linear algebraic group. SL_n is an example of a
linear algebraic group.
Finally, we will be interested in how algebraic groups "act" on varieties.
We have
Definition 2. Let G be an algebraic group and X a variety. Then we say that
G acts on X if G acts on X (in the usual sense of a group acting on a set) and
if the group action G × X → X is a morphism of varieties.

Example 2. A relevant example of an algebraic group action is the following.
For positive integers k, j let

C^{k×j} := {k × j complex matrices}

Then we can define an action of GL_n on C^{n×n} by (g, M) ↦ gMg⁻¹, i.e. the
conjugation action.
We will see that the key to constructing the universal family of dynamical
systems is understanding such algebraic group actions.
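As a quick numerical illustration (not part of the original text; it assumes the numpy library, and the helper name `act` is our own), one can check the two group-action axioms for the conjugation action of Example 2 on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def act(g, M):
    # Conjugation action of GL_n on n x n matrices: (g, M) -> g M g^{-1}
    return g @ M @ np.linalg.inv(g)

n = 3
g, h = rng.normal(size=(n, n)), rng.normal(size=(n, n))  # generically invertible
M = rng.normal(size=(n, n))

# Axiom 1: the identity element acts trivially.
assert np.allclose(act(np.eye(n), M), M)
# Axiom 2: acting by h, then by g, equals acting by the product g h.
assert np.allclose(act(g, act(h, M)), act(g @ h, M))
```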

3 Remarks on Systems
In this paper, we will assume that all of our systems are defined over the complex
numbers C, even though most of the constructions go over to the more physically
realistic case of systems defined over the real numbers R.
As is standard, we will identify the linear time-invariant system given by
the differential equation

ẋ(t) = Fx(t) + Gu(t)

with the matrix pair (F, G). Let

C^{n×n} × C^{n×m} = {(F, G) : F is n × n and G is n × m}

This is precisely the set of linear time-invariant systems of state space dimension
n and input dimension m. For simplicity we have suppressed the output part
of the system. (See however Sect. 9.) We will also identify at certain times in
what follows the space C^{n×n} × C^{n×m} with C^{n²+nm}.
Now it is well known that the general linear group GL_n acts on C^{n²+nm} by
change of basis in the state space. Namely, for (F, G) ∈ C^{n²+nm} and g ∈ GL_n, we
define the action g·(F, G) := (gFg⁻¹, gG). From an input/output point of view,
the systems (F, G) and (gFg⁻¹, gG) are identical [11]. Accordingly, we define
the equivalence relation on C^{n²+nm} by (F, G) ∼ (F′, G′) if there exists g ∈ GL_n such
that

gFg⁻¹ = F′,  gG = G′
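The equivalence of (F, G) and (gFg⁻¹, gG) can be made concrete: the reachability matrix R(F, G) = [G FG ⋯ F^{n−1}G] (treated formally in Sect. 5) transforms by left multiplication by g, so all rank information is preserved. A hedged numerical sketch, assuming numpy (the helper name is ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2

def reach_matrix(F, G):
    # R(F, G) = [G  FG  F^2 G  ...  F^{n-1} G]
    n = F.shape[0]
    blocks, B = [], G
    for _ in range(n):
        blocks.append(B)
        B = F @ B
    return np.hstack(blocks)

F, G = rng.normal(size=(n, n)), rng.normal(size=(n, m))
g = rng.normal(size=(n, n))
Fp, Gp = g @ F @ np.linalg.inv(g), g @ G   # the equivalent pair (F', G')

# Change of basis multiplies the reachability matrix on the left by g,
# since (g F g^{-1})^k (g G) = g F^k G ...
assert np.allclose(reach_matrix(Fp, Gp), g @ reach_matrix(F, G))
# ... so the rank (hence reachability) is the same for both pairs.
assert np.linalg.matrix_rank(reach_matrix(Fp, Gp)) == np.linalg.matrix_rank(reach_matrix(F, G))
```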


Then in this sense we may identify the "orbit space"

C^{n²+nm}/GL_n := {equivalence classes of pairs (F, G) under (F, G) ∼ (gFg⁻¹, gG)}

with the set of input/output behaviors of linear dynamical systems of fixed state
and input dimensions.
A fundamental problem posed by Kalman was to understand the geometric
structure of such an orbit space. It was precisely this question that brought the
tools of algebraic geometry and geometric invariant theory into systems for the
first time.
Unfortunately, the space Q := C^{n²+nm}/GL_n turns out to have a very ugly,
complicated geometric structure. The problem here may be understood as
follows. Let π: C^{n²+nm} → Q denote the quotient map. Clearly, for Q to be smooth
and even separated (under any reasonable topology, e.g. the complex or Zariski),
the map π must separate orbits, i.e. π⁻¹(x) should consist of a unique GL_n orbit
in C^{n²+nm}. But the fiber π⁻¹(x) is closed, while an orbit may not be closed.
Hence the fiber may consist of several orbits "bound" together. Consequently
the space Q will be highly singular and not even separated (i.e. Hausdorff in
the complex topology). In fact, the structure of orbits of an algebraic group
may be summarized by the following result (the closed orbit lemma, a proof of
which may be found in [9]):

Theorem 1. Let G be an algebraic group acting on a variety X. Then every orbit
is a smooth locally closed subset of X whose boundary is a union of orbits of
strictly lower dimension.
Note that the closed orbit lemma implies in particular that orbits of minimal
dimension are closed. Clearly, in order to have any chance of constructing a
well-behaved orbit space, we will first have to identify the subset of C^{n²+nm} on
which GL_n acts with closed orbits. This will be done in Sect. 5. First, however,
in the next section we will carefully review some geometric invariant theory
concepts.

4 Geometric Invariant Theory


During the 1960's, in his seminal work on the problem of classification of
algebraic varieties, David Mumford noticed that in order to arrive at a precise
notion of moduli space (roughly, a universal parameter space of a given type of
variety), one would need a geometric interpretation of classical invariant theory.
Now it turns out that the space which Kalman constructed parametrizing linear
dynamical systems of a given state space dimension is exactly such a universal
family, i.e. a moduli space of dynamical systems [5-8]. Therefore we should
expect that one may construct the Kalman space using ideas from geometric
invariant theory. The point of this section will be to introduce precisely these
concepts. For more about the theory of moduli spaces see [14], [15], [18].


We will see below that the Kalman space is a quotient in some suitable
sense. Indeed we have the following:
Definition 3. Let G be an algebraic group acting on a variety X. A quotient of
X by G is a pair (Y, α) where Y is a variety and α: X → Y is a morphism such that
(i) α is constant on the orbits;
(ii) given a variety Y′ and a morphism α′: X → Y′ constant on the orbits, there
exists a unique morphism φ: Y → Y′ such that α′ = φ ∘ α.
Note that if a quotient exists, then it is unique up to isomorphism.
Example 3. Let C^{n×n} denote the space of n × n complex matrices. Consider the
polynomial map α: C^{n×n} → C^n defined by sending an n × n matrix to its
characteristic coefficients. One can prove (e.g. using Richardson's theorem [18])
that (C^n, α) is a quotient of C^{n×n} relative to the conjugation action of the group GL_n.
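A numerical sketch of Example 3 (assuming numpy; `alpha` is our ad hoc name): the characteristic coefficients are constant on conjugation orbits, yet distinct orbits can share a fiber, which is why (C^n, α) is only a quotient and not an orbit space:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

def alpha(M):
    # The quotient map of Example 3: a matrix goes to the coefficients of
    # its characteristic polynomial (a point of C^n; the leading 1 is dropped).
    return np.poly(M)[1:]

M = rng.normal(size=(n, n))
g = rng.normal(size=(n, n))

# alpha is constant on conjugation orbits ...
assert np.allclose(alpha(g @ M @ np.linalg.inv(g)), alpha(M))
# ... but does not separate them: a nilpotent Jordan block and the zero
# matrix have the same characteristic coefficients yet lie in different orbits.
J = np.diag(np.ones(n - 1), k=1)
assert np.allclose(alpha(J), alpha(np.zeros((n, n))))
```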
Remark 1. We will assume that the reader is familiar with the basic notions
concerning sheaf theory. Recall then that to any algebraic variety X we may
associate the structure sheaf 𝒪_X, which may be characterized as follows: for
U ⊂ X open, we let 𝒪_X(U) be the ring of regular functions defined on U. Note
that a function is regular if it is analytic and may be written as the ratio of
two polynomials on U.
If the variety X is affine, then 𝒪_X(X) is the coordinate ring R of X.
Then X (via the Hilbert Nullstellensatz) may be identified with the set of maximal
ideals of R.
In practice, the notion of quotient is a bit too weak: a quotient need
not be an orbit space. Indeed, referring to Example 3 above, for the quotient
α: C^{n×n} → C^n, each fiber consists of a unique closed orbit, consisting
of semi-simple matrices, and a unique relatively open orbit, consisting of matrices
with a cyclic vector; these coincide precisely when all the eigenvalues are distinct. Hence C^n
cannot in any reasonable sense be regarded as parametrizing the orbits of the
action of GL_n on C^{n×n}. We therefore need a definition which combines the
geometric notion of orbit space with the invariant-theoretic notion of quotient,
in short a "geometric quotient." As we alluded to above, the key property lies
in the closedness of the orbits. This leads to the following fundamental definition:
Definition 4. Let G be an algebraic group acting on a variety X. A geometric
quotient is a pair (Y, φ) consisting of a variety Y and a morphism φ: X → Y such
that
(i) for every y ∈ Y, φ⁻¹(y) is an orbit;
(ii) for each invariant open subset U ⊂ X, there exists an open subset U′ ⊂ Y
with φ⁻¹(U′) = U;
(iii) for every open subset U′ ⊂ Y, φ*: 𝒪(U′) → 𝒪(φ⁻¹(U′)) defines an isomorphism of 𝒪(U′) onto the ring of invariant functions 𝒪(φ⁻¹(U′))^G of φ⁻¹(U′).


It is the geometric quotient that is the "correct" geometric-invariant-theoretic
candidate for an orbit space. We now want to understand which algebraic group actions
lead to quotients and geometric quotients. To do this, we recall the following
key definition [14], [3]:
Definition 5. A linear algebraic group G is linearly reductive if each rational action
of G on any finite dimensional vector space V is completely reducible, i.e. if
W ⊂ V is an invariant subspace, then there is an invariant subspace W′ ⊂ V
such that V = W ⊕ W′.
It is important to note that the linear algebraic groups GL_n and SL_n are in
fact linearly reductive [3], [9], [14]. We conclude this section with the following
key result due to Mumford:
Theorem 2 (Mumford's Theorem). Let X be an affine variety and G a linearly
reductive group acting on X. Then an affine quotient (Y, φ) exists. Further, G acts
on X with closed orbits if and only if (Y, φ) is a geometric quotient.

Proof. Since the proof is so nice, we give a short sketch here; for full details
see [14], [3]. First, if A denotes the coordinate ring of X, then A is a finitely
generated C-algebra. Let A^G denote the ring of invariant functions. Then since
G is linearly reductive, A^G is also finitely generated; see [14], [3]. (This is really
the key point of the proof.) Hence A^G determines an affine variety Y, and the
inclusion A^G ⊂ A determines a morphism φ: X → Y. It is then elementary to
show that (Y, φ) is a quotient. ☐

5 Completely Reachable Systems


One of the key concepts introduced by Kalman in the early 1960's was that of
complete reachability. It turns out that this is also the key concept we will need
for the construction of the universal family of dynamical systems.
Using the notation of Sect. 3, recall that a system (F, G) ∈ C^{n²+nm} is
completely reachable if

rank [G FG F²G ⋯ F^{n−1}G] =: rank R(F, G) = n

Note that by the Cayley-Hamilton theorem, it is enough to go up to F^{n−1}
in the above characterization of complete reachability.
Set

V_{nm} := {(F, G) ∈ C^{n²+nm} : rank R(F, G) = n}

Obviously, V_{nm} is a Zariski open subset of C^{n²+nm}; that is, it is defined as the
complement of a closed algebraic variety. In fact, it is the complement of the
variety given by the common zeros of the determinants of the n × n minors of


R(F, G). Moreover, V_{nm} is clearly invariant under the action of GL_n. What we
will show now is that GL_n acts on the space of completely reachable systems
V_{nm} with closed orbits.
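A minimal computational version of the reachability test above (a sketch assuming numpy; names are ours):

```python
import numpy as np

def is_completely_reachable(F, G, tol=1e-9):
    # rank [G  FG ... F^{n-1}G] == n  (Cayley-Hamilton: higher powers add nothing)
    n = F.shape[0]
    blocks, B = [], G
    for _ in range(n):
        blocks.append(B)
        B = F @ B
    R = np.hstack(blocks)
    return np.linalg.matrix_rank(R, tol=tol) == n

# A reachable pair: a chain of integrators driven through the last state.
F = np.diag(np.ones(2), k=1)           # 3 x 3 nilpotent shift
G = np.array([[0.0], [0.0], [1.0]])
assert is_completely_reachable(F, G)

# An unreachable pair: the input never touches the second state.
F2 = np.zeros((2, 2))
G2 = np.array([[1.0], [0.0]])
assert not is_completely_reachable(F2, G2)
```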

First, in general, for an algebraic group G acting on a variety X, we set

stab(x) := {g ∈ G : g·x = x}

This is the stabilizer subgroup of x. Clearly

dim(G·x) + dim stab(x) = dim G

Thus orbits of maximal dimension correspond to points with stabilizer subgroup
of minimal dimension. We now have the following characterization of complete
reachability:
Theorem 3. (F, G) ∈ V_{nm} if and only if dim stab(F, G) = 0.

Proof. The proof we give here works over C or R. For a proof over an arbitrary
field see [18], [19]. First we may consider V := C^n as a finitely generated C[x]-module
via the action

x·v := F(v)

Let V′ be the C[x]-submodule of V generated by the columns of G. Then

stab(F, G) = {f: V → V : f is a C[x]-automorphism and f|V′ = identity}

Now stab(F, G) is a Lie group, whose associated Lie algebra is

𝒜 := {f′: V → V : f′ is a C[x]-endomorphism and f′|V′ = 0} ≅ Hom_{C[x]}(V/V′, V)

Certainly the dimension of the linear space 𝒜 is equal to the dimension of
stab(F, G), and clearly dim 𝒜 = 0 if and only if V = V′. But V = V′ if and only
if (F, G) is completely reachable. ☐
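The Lie algebra 𝒜 in the proof is concretely {X : XF = FX and XG = 0}, since a C[x]-endomorphism is exactly a matrix commuting with F, and vanishing on V′ then reduces to XG = 0. Its dimension, and hence dim stab(F, G), is the nullity of a linear system in the entries of X. The following sketch (assuming numpy; it uses the column-stacking vec identities) illustrates Theorem 3 numerically:

```python
import numpy as np

def stab_dim(F, G, tol=1e-9):
    # dim of A = { X : XF - FX = 0 and XG = 0 }, computed as the nullity of
    # the linear map vec(X) -> (vec(XF - FX), vec(XG)).
    n = F.shape[0]
    I = np.eye(n)
    # Column-stacking identities: vec(XF) = (F^T kron I) vec(X),
    # vec(FX) = (I kron F) vec(X),  vec(XG) = (G^T kron I) vec(X).
    top = np.kron(F.T, I) - np.kron(I, F)
    bot = np.kron(G.T, I)
    M = np.vstack([top, bot])
    return n * n - np.linalg.matrix_rank(M, tol=tol)

# Reachable pair: zero-dimensional stabilizer (Theorem 3).
F = np.array([[0.0, 1.0], [0.0, 0.0]])
G = np.array([[0.0], [1.0]])
assert stab_dim(F, G) == 0

# Unreachable pair: F = I commutes with everything, so A = {X : XG = 0}
# has dimension n*(n - rank G) = 2 here.
assert stab_dim(np.eye(2), np.array([[1.0], [0.0]])) == 2
```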
We can now prove the following key result:
Corollary 1. GL_n acts on V_{nm} with closed orbits.

Proof. Indeed, from Theorem 3 we have that all the orbits of GL_n in V_{nm} have
constant (maximal) dimension. Hence by the closed orbit lemma (Theorem 1),
these orbits must be closed. ☐

Remark 2. Actually one can prove even more than Corollary 1, namely that
V_{nm} is precisely the set of pre-stable points of C^{n²+nm} under the GL_n-action. For
details see [18].


6 On the Construction of the Kalman Space

We are now ready to put all the previous ideas together and give our construction
of the Kalman space. As above, let V_{nm} denote the Zariski open subset of
completely reachable pairs. Define the GL_n-equivariant morphism R: C^{n×n} ×
C^{n×m} → C^{n×nm} by

R(F, G) := [G FG ⋯ F^{n−1}G]

Number the columns of R(F, G) lexicographically by

01 02 ⋯ 0m 11 ⋯ 1m ⋯ (n−1)1 ⋯ (n−1)m

so that the column indexed (i, j) is F^i g_j, where g_j is the j-th column of G.
We then define a nice selection of this set of indices to be a subset I of size n
such that if (i, j) ∈ I, then (i′, j) ∈ I for all i′ ≤ i. A key observation of Kalman [10]
is that (F, G) is completely reachable if and only if there exists a nice selection
I such that

det R(F, G)_I ≠ 0

where R(F, G)_I denotes the n × n submatrix of R(F, G) formed by the columns indexed by I.
Now for I a nice selection, let

U_I := {(F, G) : det R(F, G)_I ≠ 0}

Clearly {U_I}_{I nice} forms an affine open covering of V_{nm}.
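For small n and m, Kalman's nice-selection criterion can be checked exhaustively: a nice selection is determined by integers k_1, …, k_m ≥ 0 with k_1 + ⋯ + k_m = n, the selection consisting of the columns (i, j) with i < k_j. A sketch assuming numpy (function names are ours):

```python
import numpy as np
from itertools import product

def reach_matrix(F, G):
    n = F.shape[0]
    cols, B = [], G
    for _ in range(n):
        cols.append(B)
        B = F @ B
    return np.hstack(cols)   # columns ordered 01..0m 11..1m ... (column (i,j) at i*m + j)

def nice_selections(n, m):
    # A nice selection is given by k_1,...,k_m >= 0 with sum k_j = n:
    # it contains the column (i, j) exactly when i < k_j.
    for ks in product(range(n + 1), repeat=m):
        if sum(ks) == n:
            yield [i * m + j for j in range(m) for i in range(ks[j])]

def reachable_via_nice_selection(F, G, tol=1e-9):
    n, m = G.shape
    R = reach_matrix(F, G)
    return any(abs(np.linalg.det(R[:, I])) > tol for I in nice_selections(n, m))

F = np.array([[0.0, 1.0], [0.0, 0.0]])
G = np.array([[0.0, 1.0], [1.0, 0.0]])       # reachable, m = 2
assert reachable_via_nice_selection(F, G)
assert not reachable_via_nice_selection(np.zeros((2, 2)), np.zeros((2, 2)))
```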


Since by Corollary 1 GL_n acts on V_{nm} with closed orbits, and U_I is invariant
under this group action, the group GL_n acts on the (affine) space
U_I with closed orbits. Thus by Mumford's theorem (Theorem 2), there exists a
geometric quotient (V_I, π_I) of U_I, and V_I is affine.
Next, given any two nice selections I and J, define an invariant function on
U_I by

f_{IJ}(F, G) := det R(F, G)_J / det R(F, G)_I

Then since f_{IJ} is invariant, it descends to a function on the quotient V_I. Define

V_{IJ} := V_I \ {(F, G) : f_{IJ}(F, G) = 0}

It is easy to see that

π_I⁻¹(V_{IJ}) = U_I ∩ U_J = π_J⁻¹(V_{JI})

Since V_I is a geometric quotient for U_I and similarly V_J is a geometric quotient
for U_J, it follows that V_{IJ} and V_{JI} are quotients for U_I ∩ U_J. Thus by uniqueness
we see that there exists a canonical isomorphism ψ_{IJ}: V_{IJ} → V_{JI} such that

ψ_{IJ} ∘ π_I = π_J

Thus the sets {V_I}_{I nice} patch together to form a variety 𝓜_{nm}. This is precisely
the Kalman space [10].
Now it is very easy to get a projective embedding of 𝓜_{nm}. Indeed, the set
of functions {f_{IJ}} forms a Čech 1-cocycle, and hence determines a line bundle


L on 𝓜_{nm}. It can be proven [18] that L is ample, i.e. for some N > 0 the
sections of L^N determine an embedding of 𝓜_{nm} into some projective space. This
means that 𝓜_{nm} is quasi-projective, i.e. isomorphic to an open subset of a
closed projective variety. The following result is an immediate consequence of
the above discussion and of the results in [14], [15]:

Theorem 4. 𝓜_{nm} is a smooth, irreducible quasi-projective variety of dimension
nm. π: V_{nm} → 𝓜_{nm} is a geometric quotient. In particular, V_{nm} is a principal algebraic
GL_n-bundle over 𝓜_{nm}.

We should note that in his original construction, Kalman constructed 𝓜_{nm}
as a subspace of a Grassmannian. This approach, while very natural and
elegant, leaves open a number of basic questions concerning the geometric
structure of 𝓜_{nm}. We will see in the next section that such questions are crucial
in understanding the existence of algebraic canonical forms.

7 On the Geometric Structure of 𝓜_{nm}


As we just noted, Theorem 4 still does not give a very complete picture of the
geometry of the space 𝓜_{nm}. Indeed, even though we have shown that it is
quasi-projective, it might conceivably even be quasi-affine, i.e. embedded as an open
subset of some affine variety. Actually, for m = 1 this is the case. More precisely,
using the control canonical form, it is immediate that 𝓜_{n1} ≅ C^n. We will
therefore have to work a bit more to elucidate the structure of 𝓜_{nm} in general.
The first thing to note is that in the standard way we have

GL_n = SL_n · C*

As above, we identify C^{n×n} × C^{n×m} with C^{n²+nm} =: X. Let (Y, π) be the quotient
of X by SL_n, which exists by Mumford's theorem. If X′ denotes the subset of X
fixed by the action of C*, then it is easy to see that X′ ≅ C^{n²}. Let Y′ := π(X′).
From Example 3 above, we have that Y′ ≅ C^n.
Now one can check that Y is a C*-variety. Let O_{C*}(y) denote the orbit of
y ∈ Y under C*. Then it is easy to show that the closure of O_{C*}(y) meets Y′
in a single point (the "vertex" of a cone). Recall that for two varieties V_1 and
V_2, a morphism ψ: V_1 → V_2 is projective if it may be factored as a closed
immersion of V_1 into P^N × V_2 followed by the natural projection of P^N × V_2
onto V_2.
We can now state the following lemma from [18]:

Lemma 1. Set Ỹ := Y \ Y′. Then a geometric quotient Ỹ/C* for Ỹ exists relative
to C*, and further there exists a natural projective morphism ψ: Ỹ/C* → Y′.


Proof. The proof of the lemma, while straightforward, is somewhat technical,
so we just give the main idea here, referring the interested reader to [18] for
the details. The basic thing to notice is that the required projective morphism
Ỹ/C* → Y′ is an immediate generalization of the map

(C^r \ {0})/C* → {0}

which is obviously projective since

(C^r \ {0})/C* ≅ P^{r−1}

One then obtains a commutative diagram [diagram not reproduced here] with
ψ projective. One can show [18] that in fact

Ỹ/C* = 𝓜_{nm}  ☐
The following theorem follows immediately from our above discussion:

Theorem 5. There exists a projective morphism ψ: 𝓜_{nm} → C^n which makes the
following diagram commutative:

    V_{nm} ⊂ C^{n²+nm} --p--> C^{n²}
       |                        |
       π                        α
       ↓                        ↓
     𝓜_{nm} -------ψ-------> C^n

where α: C^{n²} → C^n is the quotient morphism defined in Example 3, and
p: C^{n²+nm} → C^{n²} is the natural projection map.
Corollary 2. For m > 1, 𝓜_{nm} is not quasi-affine.

Proof. Since ψ has projective varieties of positive dimension as fibers, 𝓜_{nm}
cannot lie in any affine space. ☐
Remark 3. Corollary 2 has several important consequences in systems. One of
the key applications of the Kalman space was to prove the non-existence of
global algebraic, and even continuous, canonical forms for m > 1. (This is some
very nice work of Hazewinkel [5], [6], [7].) In geometric terms this condition
means that π: V_{nm} → 𝓜_{nm} has no sections, i.e. no morphisms σ: 𝓜_{nm} → V_{nm} with
π ∘ σ = identity. For m = 1, we know that algebraic canonical forms exist, namely
the control canonical form.
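For m = 1, the control canonical form just mentioned can be written down directly: a completely reachable pair (F, g) is equivalent to the companion pair built from the characteristic coefficients of F, which also exhibits the identification 𝓜_{n1} ≅ C^n. A hedged sketch assuming numpy (the companion-form convention chosen here is one of several in common use):

```python
import numpy as np

def control_canonical_form(F, g):
    # For a completely reachable single-input pair, the orbit is determined by
    # the characteristic polynomial chi(z) = z^n + a_1 z^{n-1} + ... + a_n,
    # and one canonical representative is the companion pair (F_c, e_n).
    n = F.shape[0]
    a = np.poly(F)[1:]                   # a_1, ..., a_n
    Fc = np.diag(np.ones(n - 1), k=1)    # superdiagonal of ones
    Fc[-1, :] = -a[::-1]                 # last row: -a_n, ..., -a_1
    gc = np.zeros((n, 1)); gc[-1, 0] = 1.0
    return Fc, gc

rng = np.random.default_rng(3)
n = 3
F, g = rng.normal(size=(n, n)), rng.normal(size=(n, 1))  # generically reachable
T = rng.normal(size=(n, n))
Fc1, gc1 = control_canonical_form(F, g)
Fc2, gc2 = control_canonical_form(T @ F @ np.linalg.inv(T), T @ g)

# The companion matrix reproduces the characteristic polynomial ...
assert np.allclose(np.poly(Fc1), np.poly(F))
# ... and equivalent pairs get the same canonical form, so the form is
# well-defined on orbits, matching M_{n1} ~ C^n.
assert np.allclose(Fc1, Fc2) and np.allclose(gc1, gc2)
```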

Corollary 3. For m > 1, there exist no global algebraic canonical forms.

Proof. From our above discussion, we must show that for m > 1, π: V_{nm} → 𝓜_{nm}
has no sections. But if such a section σ existed, it would define an immersion
of 𝓜_{nm} into the quasi-affine space V_{nm}, contradicting Corollary 2. ☐

Corollary 4. 𝓜_{nm} is not projective, i.e. is not isomorphic to a closed subvariety
of some projective space.

Proof. Since ψ: 𝓜_{nm} → C^n is a non-constant morphism, 𝓜_{nm} admits non-constant
global regular functions, and so cannot be projective [18]. ☐

8 Universal Parameter Space of Dynamical Systems

In this section, we indicate in what sense the space 𝓜_{nm} is a universal family
of systems. Let Var denote the category of all complex varieties, and let Sets
denote the category of all sets. First, given any variety X, we define a
contravariant functor h_X from Var to Sets by

h_X(S) := Hom(S, X)

where Hom(S, X) denotes the set of all morphisms from S to X.


Moreover, we define the contravariant functor 𝓕: Var → Sets by

𝓕(S) := {isomorphism classes of completely reachable families of pairs with m inputs and state space dimension n over S}

Note that by a completely reachable family over S, we mean an (m+2)-tuple
(𝒱, F, g_1, …, g_m), where 𝒱 is a rank n vector bundle over S, F is an endomorphism
of 𝒱, and g_1, …, g_m are global sections such that the F^i(s)g_j(s) generate the fiber
𝒱(s) for each s ∈ S, 1 ≤ j ≤ m, 0 ≤ i ≤ n−1.
The defining property of the universal parameter space, say Z, is that in fact

h_Z ≅ 𝓕

that is,

Hom(S, Z) ≅ 𝓕(S)   for all S ∈ Var

This means that 𝓕 and h_Z are equivalent functors, or that Z represents the
functor 𝓕. Such a Z has been called by Mumford a fine moduli space. The
point of the present section is that the Kalman space 𝓜_{nm} is this fine moduli
space Z.


We first note that one has a natural morphism of functors from 𝓕 to h_{𝓜_{nm}}.
More precisely, given S ∈ Var and (𝒱, F, g_1, …, g_m) ∈ 𝓕(S), for every
s ∈ S (after choosing a basis in 𝒱(s)) we get a completely reachable pair
(F(s), G(s)). This pair is unique modulo choice of basis, and thus determines a
point of 𝓜_{nm}. Hence (𝒱, F, g_1, …, g_m) determines a morphism S → 𝓜_{nm}, and so
we have the required morphism of functors 𝓕 → h_{𝓜_{nm}}.
We can now state:
Theorem 6. 𝓜_{nm} is a universal parameter space, i.e. 𝓕(S) ≅ Hom(S, 𝓜_{nm}) for
all S ∈ Var.
Proof. First, for fixed S ∈ Var, we have that for U ⊂ S open, the mapping
U ↦ 𝓕(U) defines a sheaf of sets. For each nice selection I, define a subfunctor
𝓕_I of 𝓕 by letting, for each S ∈ Var, 𝓕_I(S) be the subset of 𝓕(S) consisting of
(m+2)-tuples (𝒱, F, g_1, …, g_m) such that for each point s ∈ S, F(s): 𝒱(s) →
𝒱(s), g_1(s), …, g_m(s) ∈ 𝒱(s) have the property that for some (and hence every)
choice of basis in 𝒱(s), the corresponding completely reachable pair (F(s), G(s))
satisfies the condition det R(F(s), G(s))_I ≠ 0.

Now one can show easily that the 𝓕_I for each nice I are open subfunctors
of 𝓕; see [18]. Moreover, we claim that the functors 𝓕_I are representable.
Indeed, to verify this, we will prove that if (V_I, π_I) is the geometric quotient of
U_I, then 𝓕_I ≅ h_{V_I}. First note that

U_I ≅ GL_n × C^{nm}

equivariantly, so the quotient π_I: U_I → V_I admits a section σ_I.

Now we want to construct a universal family of reachable systems over V_I which
represents an element of 𝓕_I(V_I). More precisely, we want to construct a family
k_I of completely reachable pairs over V_I such that for any completely reachable
family k over a variety S, there exists a unique morphism f: S → V_I such that
k ≅ f*k_I. The universal family k_I corresponds to the identity morphism
V_I → V_I. Once we have this universal family, it follows from abstract nonsense
that V_I represents 𝓕_I. But V_I ≅ C^{nm}, and so invoking the Quillen-Suslin theorem
[16], we see that the only rank n vector bundle over V_I is the trivial bundle
V_I × C^n. Set

(F(x), G(x)) := σ_I(x)

We define an endomorphism

F_I: V_I × C^n → V_I × C^n

by

F_I(x, y) := (x, F(x)y),

and sections g_{Ii}: V_I → V_I × C^n by

x ↦ (x, i-th column of G(x))


for i = 1, …, m. It is easy to see then that (V_I × C^n, F_I, g_{I1}, …, g_{Im}) is a universal
family, and so V_I represents 𝓕_I as claimed.
Next we note that the {𝓕_I}_{I nice} form an open covering of 𝓕. We are now
almost done. Indeed, let

𝓕_{IJ} := 𝓕_I ×_𝓕 𝓕_J

Then, identifying the functor 𝓕_I with the variety V_I which represents it, we
have that the map 𝓕_{IJ} → 𝓕_I is an open immersion of varieties. From the fact that 𝓕 is covered by the
open representable subfunctors 𝓕_I, the 𝓕_I patch along the 𝓕_{IJ} to form a
variety, and since 𝓕 is a Zariski sheaf (i.e. U ↦ 𝓕(U) is a sheaf for U ⊂ S open),
this variety must represent 𝓕. Finally, since 𝓜_{nm} and 𝓕 are patched together
in the same way, we must have that 𝓜_{nm} represents 𝓕, i.e., 𝓜_{nm} is the universal
parameter space. ☐

Remark 4.
(i) In [7], Hazewinkel shows that 𝓜_{nm} may be constructed over any
commutative ring with identity (e.g. the ring of integers Z) using certain
local arguments. The above argument immediately implies this fact as well.
See [18].
(ii) From the proof of Theorem 6 we also deduce the existence of a universal
family of completely reachable pairs, namely the one corresponding to the identity
morphism 𝓜_{nm} → 𝓜_{nm} under the isomorphism Hom(𝓜_{nm}, 𝓜_{nm}) ≅ 𝓕(𝓜_{nm}).

9 Canonical Systems
After the initial work done on the Kalman space 𝓜_{nm}, a number of researchers
(see [5-7], [2], [18] and the references therein) considered the problem of the
quotient space of canonical systems. Since this makes strong contact with
another of Kalman's key contributions, namely realization theory, we would
like to discuss this work briefly here as well. In this case, we will take into
account the output part of the system, i.e. we consider systems of the form

ẋ(t) = Fx(t) + Gu(t)
y(t) = Hx(t)

where (F, G, H) ∈ C^{n×n} × C^{n×m} × C^{p×n}. Recall that such a system is
completely observable if

rank [H; HF; … ; HF^{n−1}] = n


Again, by the Cayley-Hamilton theorem, n − 1 suffices. A system (identified with
its matrix triple) (F, G, H) is called canonical if it is completely reachable and
completely observable.
Let

V^c_{n,m,p} := {(F, G, H) ∈ C^{n×n} × C^{n×m} × C^{p×n} : (F, G, H) is canonical}
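A direct computational check of the canonicity condition (a sketch assuming numpy; names are ours):

```python
import numpy as np

def is_canonical(F, G, H, tol=1e-9):
    # Canonical = completely reachable AND completely observable.
    n = F.shape[0]
    R_blocks, B = [], G
    O_blocks, C = [], H
    for _ in range(n):
        R_blocks.append(B); B = F @ B    # G, FG, ..., F^{n-1}G
        O_blocks.append(C); C = C @ F    # H, HF, ..., HF^{n-1}
    reachable = np.linalg.matrix_rank(np.hstack(R_blocks), tol=tol) == n
    observable = np.linalg.matrix_rank(np.vstack(O_blocks), tol=tol) == n
    return reachable and observable

F = np.diag(np.ones(1), k=1)            # 2 x 2 nilpotent shift
G = np.array([[0.0], [1.0]])            # reachable
H = np.array([[1.0, 0.0]])              # observable
assert is_canonical(F, G, H)
# Dropping observability: this H sees only the second state, which feeds nothing back.
assert not is_canonical(F, G, np.array([[0.0, 1.0]]))
```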

Notice that change of basis in the state space induces the following GL_n-action
on C^{n×n} × C^{n×m} × C^{p×n}:

g·(F, G, H) := (gFg⁻¹, gG, Hg⁻¹)

for g ∈ GL_n. Clearly then V^c_{n,m,p} ⊂ C^{n×n} × C^{n×m} × C^{p×n} is a Zariski open subset
which is invariant under this GL_n-action.
Now, based on our arguments from Sect. 7, we may show that there exists
a geometric quotient (𝓜^c_{n,m,p}, π_{n,m,p}) for V^c_{n,m,p} with respect to the above
GL_n-action. The orbit space 𝓜^c_{n,m,p} will be smooth and quasi-projective. As the
following result due to Hazewinkel [6] shows, it is a beautiful fact from
realization theory that 𝓜^c_{n,m,p} is even quasi-affine:

Theorem 7. 𝓜^c_{n,m,p} is quasi-affine. This means that 𝓜^c_{n,m,p} is isomorphic to an
open subset of an affine variety.

Proof. We will indicate the basic idea of the proof just to show where the realization
theory comes in. See [6], [2], [18] for complete details.
Define a morphism ψ: C^{n×n} × C^{n×m} × C^{p×n} → C^{(n+1)²mp} by

              [ HG       HFG        ⋯  HFⁿG      ]
ψ(F, G, H) := [ HFG      HF²G       ⋯  HF^{n+1}G ]
              [ ⋮                       ⋮         ]
              [ HFⁿG     HF^{n+1}G  ⋯  HF^{2n}G  ]

Note that for g ∈ GL_n,

ψ(gFg⁻¹, gG, Hg⁻¹) = ψ(F, G, H)

Thus if we restrict ψ to V^c_{n,m,p}, ψ descends to the quotient 𝓜^c_{n,m,p} and defines
a morphism ψ̄: 𝓜^c_{n,m,p} → C^{(n+1)²mp}. Now a key result from partial realization
theory [11], [12] is that in fact ψ̄ is injective.
Set

𝓗 := ψ̄(𝓜^c_{n,m,p})

Note that 𝓗 is precisely the space of block Hankel matrices of rank n, and so 𝓗 is
clearly quasi-affine. We have therefore established the existence of a bijective
morphism ψ̄: 𝓜^c_{n,m,p} → 𝓗. The fact that ψ̄ is bijective is not enough to imply
that it is an isomorphism of varieties; to prove this requires some more powerful
machinery of algebraic geometry [6], [2], [18]. One argument can be based
on the facts that 𝓗 is smooth and that ψ̄ can easily be shown to be an
isomorphism on (dense) open subsets of 𝓗 and 𝓜^c_{n,m,p}. It is then standard that
ψ̄ is an isomorphism. ☐
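The two computational facts about the Hankel map — its GL_n-invariance and the rank-n property for canonical triples — are easy to verify numerically. A sketch assuming numpy (for a random triple, reachability and observability hold generically):

```python
import numpy as np

def markov(F, G, H, k):
    # The k-th Markov parameter H F^k G of the system (F, G, H).
    return H @ np.linalg.matrix_power(F, k) @ G

def hankel(F, G, H):
    # The block Hankel matrix [ H F^{i+j} G ], 0 <= i, j <= n.
    n = F.shape[0]
    return np.block([[markov(F, G, H, i + j) for j in range(n + 1)]
                     for i in range(n + 1)])

rng = np.random.default_rng(4)
n, m, p = 3, 2, 2
F, G, H = rng.normal(size=(n, n)), rng.normal(size=(n, m)), rng.normal(size=(p, n))
g = rng.normal(size=(n, n))

# The Hankel map is invariant under the GL_n action ...
assert np.allclose(hankel(g @ F @ np.linalg.inv(g), g @ G, H @ np.linalg.inv(g)),
                   hankel(F, G, H))
# ... and for a (generically) canonical triple the Hankel matrix has rank n.
assert np.linalg.matrix_rank(hankel(F, G, H)) == n
```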

Remark 5.
(i) The above result of Hazewinkel is interesting from the purely realization-theoretic
point of view. Indeed, the above proof shows that realizations are
continuous in their parameters when restricted to canonical systems.
(ii) As we mentioned above, it is possible to work out most of the above
constructions over R, and therefore define a real version of 𝓜^c_{n,1,1} which
we will call 𝓜^c_{n,1,1}(R). The space 𝓜^c_{n,1,1}(R) has undergone a particularly
rigorous topological study [1], [17], [18] (and many others). Indeed,
𝓜^c_{n,1,1}(R) may be identified with the space (in Brockett's notation [1])

Rat(n) := {f/g : f, g ∈ R[z], g monic of degree n, deg f < n, and f and g have no common factors in R[z]}

It turns out that this space has n + 1 connected components, which can be
distinguished via the signature of the associated n × n Hankel matrix of the members of
the components [1]. There has even been work on the computation of the
homotopy and homology groups of these components. (See [1], [17], [18]
and the references therein.) However, even to this day there is still no complete
understanding of the exact topological structure of this space.

10 Conclusions
It has now been almost twenty years since Kalman initiated the geometric
approach to families of systems outlined above. Of course, even today the
concept of family remains fundamental in systems and control, and is really
the underlying object of study in both adaptive and robust control. Especially
relevant is adaptive control, with its emphasis on the notion of identification,
since much of the interest in the topology and geometry of the moduli spaces
of systems was precisely for identification-theoretic reasons. However, it is
important to note that despite the use of very high-powered techniques from
topology, complex analysis, Lie groups and algebras, and differential and algebraic
geometry, the precise global structure of these universal families and parameter
spaces remains an open problem to this day, and one of active research
interest.
In the past ten years, there has also been an extensive program of research
using a local analytic approach in both adaptive and robust control. It turned
out to be highly profitable, both from the theoretical and practical standpoints,
to consider families defined in weighted H^∞ balls using techniques from operator
and interpolation theory. There is also much work being carried out on the
melding of the robust and adaptive control approaches to system uncertainty
and families of systems. In short, the study of families of systems, whether from
the algebraic or analytic, local or global point of view, lies at the heart of feedback
control theory. Certainly, the Kalman construction of the moduli space of
dynamical systems is one of the major achievements in this area.

References
[1] R. Brockett, "Some geometric questions in the theory of linear systems," IEEE Trans Auto
Control AC-21 (1976), 449-464
[2] C. Byrnes, "On the control of certain deterministic infinite dimensional systems by
algebro-geometric techniques," Amer J Math 100 (1979), 1333-1381
[3] J. Fogarty, Invariant Theory, Benjamin, New York, 1965
[4] R. Hartshorne, Algebraic Geometry, Springer-Verlag, New York, 1977
[5] M. Hazewinkel, "Moduli and canonical forms for linear dynamical systems II: the topological
case," Math Systems Theory 10 (1977), 363-385
[6] M. Hazewinkel, "Moduli and canonical forms for linear dynamical systems III: the
algebro-geometric case," in Proc of the 1976 Ames Conference on Geometric Control Theory
(edited by R. Hermann and C. Martin), Math Sci Press, 1977
[7] M. Hazewinkel, "(Fine) moduli (spaces) for linear systems: What are they and what are they
good for," Lectures given at the NATO-AMS Study Inst. on Algebraic and Geometric Methods
in Linear Systems Theory, Harvard Univ, 1979
[8] M. Hazewinkel and R.E. Kalman, "On invariants, canonical forms, and moduli for linear,
constant, finite dimensional dynamical systems," in Proc CNR-CISM Symp on Algebraic
System Theory, Udine (1975), Lecture Notes in Economics and Math System Theory 131, pp 48-60,
Springer, New York (1976)
[9] J. Humphreys, Linear Algebraic Groups, Springer, New York, 1975
[10] R.E. Kalman, "Algebraic geometric description of the class of linear systems of constant
dimension," 8th Annual Princeton Conference on Information Sciences and Systems, Princeton,
NJ, 1974
[11] R.E. Kalman, P. Falb, and M. Arbib, Topics in Mathematical System Theory, McGraw-Hill,
New York, 1969
[12] R.E. Kalman, "On minimal partial realizations of an input/output map," in Aspects of Network
and System Theory (edited by R.E. Kalman and N. DeClaris), Holt, Rinehart, and Winston,
New York, 385-407 (1971)
[13] R.E. Kalman, "System theoretic aspects of the theory of invariants," unpublished manuscript,
1974
[14] D. Mumford, Geometric Invariant Theory, Springer, New York, 1965
[15] D. Mumford and K. Suominen, "Introduction to the theory of moduli," in Proc 5th Nordic
School in Math (edited by F. Oort), Oslo, 1970, Wolters-Noordhoff, Groningen, 171-222
(1972)
[16] D. Quillen, "Projective modules over polynomial rings," Inv Math 36 (1976), 167-171
[17] G. Segal, "The topology of spaces of rational functions," Acta Mathematica 143 (1979), 39-72
[18] A. Tannenbaum, Invariance and System Theory: Algebraic and Geometric Aspects, Springer,
New York, 1981
[19] A. Tannenbaum, "On the stabilizer subgroup of a pair of matrices," Linear Algebra and Its
Applications 50 (1983), 527-544

Chapter 5

Linear System Theory: Related Developments

On the Parametrization of Input-Output Maps
for Stable Linear Systems
J. B. Pearson
Department of Electrical and Computer Engineering, Rice University, Houston, TX 77251, USA

1 Introduction
I was introduced to the topic of linear multi-input-multi-output (MIMO)
systems in the spring of 1960 at Purdue University by the late Rangasami
Sridhar. At that time, reference material consisted of papers by Freeman [1,2]
and Kavanagh [3,4] and analysis and synthesis of MIMO systems was in its
infancy. Transfer function matrix representation was used and both analysis
and synthesis involved inversion of square rational matrices. There was no
understanding of synthesis procedures that would be guaranteed to lead to
proper stabilizing controllers nor was there any real understanding of how
closed-loop stability could be determined from an input-output representation.
Although it was quite clear that considerable work was needed in order to
develop effective tools to use on these problems, it was definitely not clear where
to start.
The proper start was on the representation of MIMO systems and was made
by R.E. Kalman [5] via introduction of the concepts of controllability and
observability and recognition of their importance on system structure.
This was one of the most fundamental contributions made to linear system
theory and the purpose of this essay is to trace the development of subsequent
results that now furnish a complete characterization of all input-output maps
for a stable linear system.
The fact that this development has taken the better part of the past thirty
years attests to the nontrivial nature of the problem and to the importance of
Kalman's contribution.
In the following, the emphasis is on the relationship between systems
described by transfer functions and by state equations. Kalman emphasized the
state space description and in this manner was able to characterize minimal
realizations of transfer functions. This was the major step in the following results
to be discussed.


2 The Canonical Structure Theorem


Controllability and observability were introduced by Kalman in his 1960 IFAC
paper, "On the General Theory of Control Systems," in which he notes the
relation between lack of controllability and pole cancellation in the system
transfer function.
This relation became more apparent when he published the canonical
structure theorem [6] and minimal realization of a transfer function matrix
became a topic of interest. This problem was addressed in a beautiful paper by
Elmer Gilbert [7] in which the minimal realization problem was solved for the
case of distinct poles in the transfer function matrix. In this paper Gilbert
discussed the unsatisfactory state of affairs in the MIMO system area and
pointed out errors that appeared in the literature concerning closed-loop system
stability due to underestimation of system order and lack of understanding of
MIMO system structure. He then presented a detailed study of the complexities
of pole cancellation induced by the specification of a desired transfer function
on a given plant, compensator pair. He concluded his paper with the remarks,
"Finally, it should be noted that the synthesis of a multivariable feedback system
is truly a formidable task. Unwieldy calculations, complex compensation
constraints, and difficulties in evaluating the effect of disturbance inputs and
parameter variations all complicate the search for satisfactory design
procedures."
Gilbert's results were immediately generalized by Kalman [8] and one of
the most important results from these studies was the ability to determine when
"unstable cancellation" takes place between two MIMO systems in cascade.
Clearly if a minimal realization of the product contains fewer unstable modes
than the sum of the unstable modes in each individual realization, then
cancellation has occurred. This was a major step forward since it was never
previously understood how to determine the number of poles in a transfer
function matrix. It was still not understood how to avoid this cancellation when
using a synthesis procedure in which a "desired" transfer function was specified,
but a giant step had been made in understanding stability in MIMO systems.
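This counting test is easy to mechanize. The sketch below is an illustration with a hypothetical SISO cascade (not an example from the text), using sympy to count closed-right-half-plane poles of each factor and of the cancelled product:

```python
import sympy as sp

s = sp.symbols('s')

def unstable_count(num, den):
    """Number of closed-right-half-plane poles of a minimal realization
    of num/den, i.e. the poles remaining after common factors cancel."""
    g = sp.cancel(num / den)
    den_min = sp.fraction(g)[1]
    return sum(m for p, m in sp.roots(sp.Poly(den_min, s)).items()
               if sp.re(p) >= 0)

# Hypothetical cascade: G1 = (s - 1)/(s + 2) followed by G2 = 1/(s - 1)
n1, d1 = s - 1, s + 2
n2, d2 = sp.Integer(1), s - 1

individual = unstable_count(n1, d1) + unstable_count(n2, d2)   # 1
cascade = unstable_count(n1 * n2, d1 * d2)                     # 0

# Fewer unstable poles in the (minimal) product than in the factors
# combined: an unstable cancellation has occurred in the cascade.
assert cascade < individual
```

The cancelled unstable factor (s - 1) is exactly the hidden mode that a transfer-function-only analysis of the product would miss.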
Consider the system shown in Fig. 1

Fig. 1


and suppose we want to know if the closed-loop is stable. We know that
necessary conditions are that G and K contain no hidden (i.e. uncontrollable
and/or unobservable) unstable modes and that there should be no right-half-plane
pole cancellation in the cascade connection of G and K. For the first
condition, since all we know is G and K, we must assume no unstable hidden
modes, but for the G, K cancellation, we can appeal to controllability and
observability of the closed-loop system, by introducing a new input and a new
output, as shown in Fig. 2.

Fig. 2

It is very easy to establish that the system with inputs (r_1, r_2) and outputs
(y_1, y_2) is controllable and observable if and only if G and K are controllable
and observable. Of course, this means that in any stability analysis of the
closed-loop system we will have all the system modes available. That is, if G
and K have minimal realizations of orders n_1 and n_2 respectively, then we will
have n_1 + n_2 closed-loop modes to investigate.
It should be clear from this discussion that the system in Fig. 2 is stable
if and only if each of the four transfer function matrices has left-half-plane poles
only. Controllability and observability allow us to conclude that these poles
represent all n_1 + n_2 system modes under our control.
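For a concrete illustration, here is a minimal sketch, with hypothetical first-order realizations of G and K (not data from the text), verifying that the interconnection of Fig. 2 exposes all n_1 + n_2 modes to the injected inputs:

```python
import sympy as sp

# Hypothetical minimal realizations: G = 1/(s - 1) with n1 = 1 state,
# K = 1/(s + 2) with n2 = 1 state, interconnected as in Fig. 2 so that
# the G-input is r1 + (K-output) and the K-input is r2 + (G-output).
Ag, Bg, Cg = sp.Matrix([[1]]), sp.Matrix([[1]]), sp.Matrix([[1]])
Ak, Bk, Ck = sp.Matrix([[-2]]), sp.Matrix([[1]]), sp.Matrix([[1]])

Acl = sp.Matrix.vstack(sp.Matrix.hstack(Ag, Bg * Ck),
                       sp.Matrix.hstack(Bk * Cg, Ak))
Bcl = sp.Matrix.vstack(sp.Matrix.hstack(Bg, sp.zeros(1, 1)),
                       sp.Matrix.hstack(sp.zeros(1, 1), Bk))

# All n1 + n2 = 2 closed-loop modes are present ...
assert Acl.shape == (2, 2)

# ... and reachable from the injected inputs (r1, r2): the
# controllability matrix [Bcl, Acl*Bcl] has full rank.
ctrb = Bcl.row_join(Acl * Bcl)
assert ctrb.rank() == 2
```

The dual observability check from (y_1, y_2) works the same way with the output matrices.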
Controllability and observability have now enabled us to analyze the closed-loop
stability of an interconnected system in terms of its transfer function matrix.
The next step is concerned with representation of rational transfer function
matrices as matrix fractions either over the ring of polynomials or the ring of
stable rational functions.

3 Matrix Fraction Description (MFD) of MIMO Systems


The state space description opened the way to understanding, but it did not
lead naturally to convenient design methods using output feedback rather than
state feedback. With the sole exception of LQG synthesis (another major
contribution due primarily to Kalman), it was never easy to formulate synthesis
problems using a state space description when the state of the system could
not be measured. It became generally accepted to use observers and estimates
of the state and some authors even maintained that when the state was not


measurable, then it was necessary to use an observer to control a system. This


was the only unfortunate legacy of the LQG problem, and it took a return to
transfer function descriptions, in particular to MFD's, in order to demonstrate
that synthesis problems could be formulated and solved without observers.
Leading roles in this return to the input-output representation were played
by Rosenbrock [9] and Wolovich [10]. This work is nicely summarized in
Chap. 6 of Kailath's book [11], where it is shown how simple and elegant
the representation of MIMO systems can be using matrix polynomials. The
tie-in to minimal realizations is through coprime factorizations. Polynomial
MFD's lead directly to a natural definition of MIMO system zeros and pole-zero
cancellation in MIMO systems becomes as transparent as it is in SISO
systems.
To see how this ties into our previous discussion, refer to Fig. 2. Suppose
G = A(s)^(-1)B(s) and K = M(s)^(-1)N(s), where A(s) and B(s) are left coprime
polynomial matrices, as are M(s) and N(s).
Then

    [  A(s)  -B(s) ] [ y_1 ]   [ B(s)   0   ] [ r_1 ]
    [ -N(s)   M(s) ] [ y_2 ] = [  0    N(s) ] [ r_2 ]

This representation is left coprime iff the matrix

    [  A(s)  -B(s)  B(s)   0   ]
    [ -N(s)   M(s)   0    N(s) ]

has full row rank for all s, but this is the same as

    [ A(s)  B(s)   0     0   ]
    [  0     0    M(s)  N(s) ]

having full row rank which is true since [A(s) B(s)] has full row rank as does
[M(s) N(s)] by left coprimeness.
Therefore the stability of the system is determined by the zeros of
    det [  A(s)  -B(s) ]
        [ -N(s)   M(s) ]  =  (constant) det[ M(s)A_1(s) - N(s)B_1(s) ]

where B_1 A_1^(-1) = A^(-1) B is a right coprime MFD of G.


This last statement is true, since by coprimeness, all of the zeros of this
determinant appear as poles of the transfer function. This is the input-output
consequence of controllability and observability.
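A minimal sketch of this determinant test, for a hypothetical SISO pair (so that A_1 = A and B_1 = B), using sympy's exact polynomial arithmetic:

```python
import sympy as sp

s = sp.symbols('s')

# Hypothetical SISO example: G = A^(-1)B with A = s - 1, B = 1 (one
# unstable open-loop pole), and controller K = M^(-1)N with M = 1, N = -3.
A, B = s - 1, sp.Integer(1)
M, N = sp.Integer(1), sp.Integer(-3)

# Closed-loop characteristic matrix from the text, det [A -B; -N M]:
char_poly = sp.expand(sp.Matrix([[A, -B], [-N, M]]).det())
print(char_poly)   # s + 2

# In the SISO case A_1 = A and B_1 = B, so this agrees with M*A_1 - N*B_1:
assert sp.expand(M * A - N * B) == char_poly

# All zeros in the open left half plane => the loop of Fig. 2 is stable.
poles = sp.roots(sp.Poly(char_poly, s))
assert all(sp.re(p) < 0 for p in poles)
```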
Although the above theory is complete and elegant, unfortunately it is
inconvenient computationally for two reasons. First, polynomial computation
is notoriously sensitive and is avoided by numerical analysts whenever possible.
Second, in synthesis problems, we have at least two objectives. In our example
above, given A_1(s) and B_1(s) we must find M(s) and N(s) so that
det(M A_1 - N B_1) has only left-half-plane zeros and M^(-1)N is a
proper rational function. This properness is not at all straightforward to achieve


(see [12] for an early synthesis problem using polynomial MFD's and [13] for
more recent results) and led to further work in MIMO system representation
designed to avoid this problem.
The work of Pernebo [14] involved what he called A-generalized polynomials,
which were rational functions that behaved like polynomials. This
particular system representation was somewhat awkward and quickly gave way
to the so-called "stable-rational function" MFD. Here the matrices have entries
that are stable proper rational functions rather than polynomials. This
representation was first suggested by Vidyasagar [15] and developed by Desoer
et al. [16]. In my opinion, it is a less satisfying representation than the
polynomial representation, since it is only the right-half-plane poles and zeros of a
system that are retained in all coprime factorizations, but these are apparently
the only system attributes that play major roles in system stability and
performance. Computation of "stable-rational" MFD's is quite straightforward
as shown by Nett et al. [17] and proper controllers result from all synthesis
procedures which use the YJBK parametrization [18-20] of all stabilizing
controllers. This was the final breakthrough that led to the parametrization
of all stable input-output maps of a linear system and the subject of this
essay.

4 Parametrization of All Stabilizing Controllers


This parametrization, which is often called the "Youla" parametrization in
deference to his many significant achievements, actually appeared earlier in an
obscure work by Kucera [18] and later in his book [19] after the paper by
Youla, Jabr, and Bongiorno [20] had appeared. I will refer to this as the
YJBK parametrization in order to include all parties who may deserve
recognition.
In the stability problem previously discussed, we said system stability is
determined by the zeros of
    det( M A_1 - N B_1 )

Let us now assume that the coprime factorizations are in terms of stable
rational functions; the above statement then becomes: the system is stable
if and only if

    ( M A_1 - N B_1 )

is a unimodular matrix, and hence its determinant is a unit. In this case a unit
is a proper stable rational function whose inverse is also proper, stable and
rational.
Define N_Q and D_Q as follows,

    [ N  M ] [ X  -B_1 ]
             [ Y   A_1 ]  =  [ N_Q  D_Q ]                (*)


where

    [  A     B  ] [ X  -B_1 ]   [ I  0 ]
    [ -Y_1  X_1 ] [ Y   A_1 ] = [ 0  I ]

is the double-Bezout identity [21]. Since (M, N) are left coprime and
    [ X  -B_1 ]
    [ Y   A_1 ]

is unimodular, (D_Q, N_Q) are left coprime and it is possible to represent a
stabilizing controller by solving (*) as follows:

    [ N  M ] = D_Q [ QA - Y_1   QB + X_1 ]              (**)

where Q = D_Q^(-1) N_Q is stable and (QB + X_1) and (QA - Y_1) are left coprime.
Therefore every stabilizing controller can be represented as in (**) where Q
is a stable proper rational matrix.
On the other hand, given any stable proper rational Q, define

    M = QB + X_1   and   N = QA - Y_1

Then (M, N) are left coprime and

    M A_1 - N B_1 = I

and hence is unimodular. Therefore every stabilizing controller K can be
parametrized by a stable proper rational matrix Q as given by (**), i.e.

    K = (QB + X_1)^(-1) (QA - Y_1)
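The two claims above (the Bezout identity, and unimodularity of M A_1 - N B_1 for every Q) can be checked symbolically. The sketch below uses a hypothetical plant G = 1/(s - 1) with hand-picked stable-rational factors; it is an illustration, not the general construction:

```python
import sympy as sp

s, Q = sp.symbols('s Q')

# Hypothetical plant G = 1/(s - 1) factored over stable proper rational
# functions; SISO, so the left and right factors coincide:
A = A1 = (s - 1)/(s + 1)
B = B1 = 1/(s + 1)

# A Bezout pair X1, Y1 with X1*A1 + Y1*B1 = 1 (both stable and proper):
X1, Y1 = sp.Integer(1), sp.Integer(2)
assert sp.simplify(X1 * A1 + Y1 * B1 - 1) == 0

# For ANY parameter Q, M = QB + X1 and N = QA - Y1 give
# M*A1 - N*B1 = 1, a unit, so K = M^(-1)N is stabilizing.
M = Q * B + X1
N = Q * A - Y1
assert sp.simplify(M * A1 - N * B1 - 1) == 0

# The particular choice Q = 0 yields the constant controller K = -2.
K = sp.simplify(N.subs(Q, 0) / M.subs(Q, 0))
assert K == -2
```

Note that Q cancels identically in M*A1 - N*B1 because B*A1 = A*B1, which is exactly the coprime-factorization identity of the plant.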

The utility of this parametrization appears when we describe the various
transfer functions of our system shown in Fig. 2.
The transfer function matrix in terms of K and G is
    [ G(I - KG)^(-1)       G(I - KG)^(-1) K ]
    [ (I - KG)^(-1) K G    (I - KG)^(-1) K  ]

which when written in terms of Q is

    [ B_1 (QB + X_1)    B_1 (QA - Y_1) ]
    [ (A_1 Q - Y) B     A_1 (QA - Y_1) ]

Notice that each entry is a stable rational function when Q is stable rational.
Also notice that every entry is linear-affine in Q.
The next thing to notice is the primary reason for using stable rational
MFD's. Assume G = A^(-1)B = B_1 A_1^(-1) is strictly proper. This means that

    lim_{s→∞} B_1(s) = 0   and   lim_{s→∞} B(s) = 0

Since G is proper, det A(∞) ≠ 0 and det A_1(∞) ≠ 0. Therefore

    K(∞) = [X_1(∞)]^(-1) [Q(∞) A(∞) - Y_1(∞)]

and since

    X_1(∞) A_1(∞) = I

it is clear that K is a proper rational function.
Even when B(∞) ≠ 0, it is straightforward to impose the constraint that
det( Q(∞) B(∞) + X_1(∞) ) ≠ 0.

5 The Current State of Affairs


Based on the previous work, we are now able to describe a general synthesis
problem in the following way.
Fig. 3

In Fig. 3, G is the (generalized) plant and K is the controller, u is the


control input, w is the exogenous input (e.g. reference and disturbance inputs),
z is the regulated output (e.g. the difference between desired and actual outputs)
and y_1 is the measured output. The inputs r_1 and r_2 and the output y_2 have
been added to facilitate the stability analysis. That is, if G and K are
controllable and observable, then the closed-loop system from (w, r_1, r_2) to
(z, y_1, y_2) is also controllable and observable and this will be adequate for
a stability analysis.
Our system can be represented as

    G = N D^(-1)

where (N, D) are right coprime.


Without loss of generality, we write

    D = [ D_1   0  ]
        [ D_2  D_3 ]

and define ξ_1 and ξ_2 as the corresponding components of the partial state,

    [ w ]       [ ξ_1 ]
    [ u ]  =  D [ ξ_2 ]

Then

    [ z   ]   [ N_1  N_2 ] [ ξ_1 ]
    [ y_1 ] = [ N_3  N_4 ] [ ξ_2 ]

Define a left coprime factorization of K as

    K = V^(-1) U

Then

    V y_2 = U (y_1 + r_2)

and we can represent our closed-loop system as follows

    [ D_1   0    0   0 ] [ ξ_1 ]   [ I  0  0 ]
    [ D_2  D_3   0   I ] [ ξ_2 ]   [ 0  I  0 ] [ w   ]
    [ N_3  N_4  -I   0 ] [ y_1 ] = [ 0  0  0 ] [ r_1 ]
    [  0    0   -U   V ] [ y_2 ]   [ 0  0  U ] [ r_2 ]

    [ z   ]   [ N_1  N_2  0  0 ] [ ξ_1 ]
    [ y_1 ] = [  0    0   I  0 ] [ ξ_2 ]
    [ y_2 ]   [  0    0   0  I ] [ y_1 ]
                                 [ y_2 ]

It is now quite easy to prove that the three matrices shown above constitute
a bicoprime factorization [21] of the system transfer function and hence that
system stability is determined by the zeros of

    det [ D_1   0    0   0 ]
        [ D_2  D_3   0   I ]
        [ N_3  N_4  -I   0 ]  =  det(D_1) det(W) det(V A_2 + U B_2)
        [  0    0   -U   V ]
where (B_2, A_2) is a right coprime factorization of G_22 and W is a greatest
common right divisor of (N_4, D_3).
Clearly the system is stabilizable if and only if det(D_1) and det(W) are units
and this condition can be shown to be equivalent to the statement that G and
G_22 have the same unstable poles. This, of course, is the same thing as saying
that all the unstable modes of G must be controllable from u and observable


at y_1. Any G satisfying this condition is said to be admissible. It is also possible
to conclude from the above analysis that K stabilizes G if and only if it stabilizes
G_22, so the YJBK parametrization of K can be included as follows.
Consider the fictitious inputs r_1 and r_2 to be removed now and determine
the system transfer function from w to z. It is

    Φ = G_11 - G_12 (I + K G_22)^(-1) K G_21

and in terms of the YJBK parameter Q

    Φ = (G_11 - G_12 Y A G_21) - (G_12 A_1) Q (A G_21)

Define

    H := G_11 - G_12 Y A G_21
    U := G_12 A_1
    V := A G_21

Then it can be shown that when G is admissible, H, U and V are stable rational
functions and all transfer functions that can be achieved with a stabilizing
controller are given by

    Φ = H - U Q V

where Q is any stable rational matrix and H, U and V are stable rational
matrices determined by G. Of course, this equation is well-known and widely
used today in synthesis problems involving minimization of the H_2, H_∞ and ℓ^1
norms [22-25], but when I recall the state of affairs 30 years ago, I claim that
this equation represents one of the most outstanding achievements in linear
system theory and that the door to developing this equation was opened for
us by R.E. Kalman.
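The affine dependence of Φ on Q is what makes these norm-minimization problems tractable, and it is easy to verify symbolically. The data below are illustrative stable transfer functions chosen for the sketch, not derived from any particular G:

```python
import sympy as sp

s, lam = sp.symbols('s lambda')
Q1, Q2 = sp.symbols('Q1 Q2')

# Illustrative stable data (hypothetical, not from a specific plant):
H = 1/(s + 1)
U = 1/(s + 2)
V = 1/(s + 3)

Phi = lambda Q: H - U*Q*V

# Affine dependence on Q: an affine combination of parameters yields
# the same combination of achievable closed-loop maps.
lhs = Phi(lam*Q1 + (1 - lam)*Q2)
rhs = lam*Phi(Q1) + (1 - lam)*Phi(Q2)
assert sp.simplify(lhs - rhs) == 0
```

This is why searching over all stabilizing controllers reduces to a convex search over the single stable parameter Q.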

References
[1] Freeman, H., "A synthesis method for multipole control systems," AIEE Trans Appl Ind, Vol 76, Part II, pp 28-31, March 1957
[2] Freeman, H., "Stability and physical realizability considerations in the synthesis of multipole control systems," AIEE Trans Appl Ind, Vol 77, Part II, pp 1-5, March 1958
[3] Kavanagh, R.J., "Noninteracting controls in linear multivariable systems," AIEE Trans Appl Ind, Vol 76, Part II, pp 95-99, May 1957
[4] Kavanagh, R.J., "Multivariable control system synthesis," AIEE Trans Appl Ind, Vol 77, Part II, pp 425-429, November 1958
[5] Kalman, R.E., "On the general theory of control systems," Proc 1st International Congress on Automatic Control, Moscow, 1960, Butterworths, 1961
[6] Kalman, R.E., "Canonical structure of linear dynamical systems," Proc Nat Acad Sci, Vol 48, No 4, pp 596-600, 1962
[7] Gilbert, E.G., "Controllability and observability in multivariable control systems," SIAM J Control, Vol 1, No 2, pp 128-151, 1963
[8] Kalman, R.E., "Mathematical description of linear dynamical systems," SIAM J Control, Vol 1, No 2, pp 152-192, 1963
[9] Rosenbrock, H.H., State-Space and Multivariable Theory, John Wiley and Sons, Inc, New York, 1970
[10] Wolovich, W.A., Linear Multivariable Systems, Springer-Verlag, New York, 1974
[11] Kailath, T., Linear Systems, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1980
[12] Cheng, L. and Pearson, J.B., "Frequency domain synthesis of linear multivariable regulators," IEEE Trans on Automatic Control, Vol AC-23, pp 3-15, 1978
[13] Antoulas, A.C., "A new approach to synthesis problems in linear systems," IEEE Trans on Automatic Control, Vol AC-30, pp 465-473, May 1985
[14] Pernebo, L., "An algebraic theory for the design of controllers for linear multivariable systems; Part I: Structure matrices and feedforward design; Part II: Feedback realizations and feedback design," IEEE Trans on Automatic Control, Vol AC-26, pp 171-194, Feb 1981
[15] Vidyasagar, M., "Input-output stability of a broad class of linear time-invariant multivariable feedback systems," SIAM J Control, Vol 10, pp 203-209, 1972
[16] Desoer, C.A., Liu, R.W., Murray, J., and Saeks, R., "Feedback system design: the fractional representation approach to analysis and synthesis," IEEE Trans on Automatic Control, Vol AC-25, pp 399-412, June 1980
[17] Nett, C.N., Jacobson, C.A., and Balas, M.J., "A connection between state-space and doubly coprime fractional representations," IEEE Trans A-C, Vol AC-29, No 9, pp 831-832, Sep 1984
[18] Kucera, V., "Algebraic theory of discrete optimal control for multivariable systems," Kybernetika (Prague), Vols 10-12, pp 1-240, 1974
[19] Kucera, V., Discrete Linear Control, John Wiley and Sons, New York, N.Y., 1979
[20] Youla, D.C., Bongiorno, J.J., and Jabr, H.A., "Modern Wiener-Hopf design of optimal controllers - Part II: The multivariable case," IEEE Trans on Automatic Control, Vol AC-21, pp 319-338, June 1976
[21] Vidyasagar, M., Control System Synthesis: A Factorization Approach, MIT Press, Cambridge, MA, 1985
[22] Doyle, J.C., "Advances in multivariable control," ONR/Honeywell Workshop, Minneapolis, MN, 1984
[23] Chang, B.-C. and Pearson, J.B., "Optimal disturbance reduction in linear multivariable systems," IEEE Trans Automatic Control, Vol AC-29, No 10, pp 880-887, October 1984
[24] Francis, B.A., Helton, J.W., and Zames, G., "H∞-optimal feedback controllers for linear multivariable systems," IEEE Trans Automatic Control, Vol AC-29, No 10, pp 888-900, October 1984
[25] Dahleh, M.A. and Pearson, J.B., "ℓ¹-optimal feedback controllers for MIMO discrete-time systems," IEEE Trans A-C, Vol AC-32, No 4, pp 314-322, April 1987

Algebraic System Theory, Computer Algebra
and Controller Synthesis*
J. S. Baras

Electrical Engineering Department and Systems Research Center,


University of Maryland, College Park, MD 20742, USA

Algebraic system theory as introduced by Kalman provided a unifying framework for the frequency
domain and state-space approaches to linear finite dimensional systems. More significantly it
allowed a rapprochement with automata theory which led to the development of extensions to
infinite dimensional systems and nonlinear systems. Another important consequence was the
popularization of algebraic methods for constructing and analyzing models of systems over arbitrary
rings and fields. An important obstacle to utilizing these powerful mathematical tools in practical
applications has been the unavailability of efficient and fast algorithms to carry through the
precise error-free computations required by these algebraic methods. Recently, with the advent of
computer algebra, this has become possible. In this paper we develop highly efficient, error-free
algorithms for most of the important computations needed in linear systems over fields or rings.
We show that the structure of the underlying rings and modules is critical in designing such
algorithms. We also discuss the importance of such algorithms for controller synthesis.

1 Introduction
My first exposure to algebraic system theory was through a graduate course
at Harvard in 1972, based on the text [17], which presented a more complete
version of ideas presented in Kalman's seminal papers [18, 32]. I was inspired
enough by the elegance of Kalman's methods and their implications to undertake
a research program to develop similar constructs and theories for infinite
dimensional systems [1-3]. What I found significant and highly promising in
these ideas was the algebraic methodology which, through a rapprochement
with automata theory, offered a way of thinking about infinite dimensional and
nonlinear systems free of the constraints of finite dimensionality and linearity.
I had many opportunities to interact with Kalman through the years on these
topics. The theories developed are by now well known and cover a diverse
variety of systems: linear systems over rings and fields [33, 34], nonlinear systems
[11, 21, 35], infinite dimensional systems [2]. Eventually these ideas influenced
the more traditional approaches to control systems description and synthesis
through the development of polynomial matrix methods [16, 24, 36, 37]. More

* The results reported here represent joint research with David MacEnany. Robert Munach
performed most of the computer experiments. This research was supported in part by NSF grant
NSF CDR-85-00108 under the Engineering Research Centers Program, and AFOSR University
Research Initiative grant 87-0073. David MacEnany was partially supported by an IBM Fellowship.


recently, linear systems over non-standard rings appeared as efficient models
for discrete event systems such as flexible manufacturing factories and
communication networks [9].
Despite the mathematical success of these ideas and theories, and their
apparent didactic value in unifying large bodies of specialized techniques in
systems, there appears to be a lack of appreciation by practical engineers and
control systems practitioners. In my opinion this is largely due to the fact that
these methods have not been supported by adequate development of efficient
and fast algorithms to perform the required computations. It has become by
now clear that the most successful way for mathematical theories to influence
practical control engineering is by the development of efficient algorithms and
related software that permit the practicing systems engineer to utilize powerful
mathematical methods easily and efficiently. Actually, one of the major
motivations that Kalman provided [19-21] for the development of algebraic
methods in systems modeling and control was the opportunity to automate
many of the sophisticated mathematical constructions required. The advent and
popularization of computer algebra systems has created a unique opportunity
to design such algorithms and therefore progress decidedly towards the
realization of his early vision.
We have recently initiated such a program, with our first target being linear
systems over fields and rings. The predominant algorithms in this area involve
error-free computations for various polynomial matrix algorithms. We found
that the structure of the underlying rings and modules plays a critical role in
the design of such algorithms. By carefully exploiting the algebraic properties
of these rings and modules we have been able to construct efficient error-free
algorithms, based on symbolic computation, with performance three orders of
magnitude faster than algorithms previously known to control engineers.
Polynomial matrices [12] play a key role in the design of multi-input
multi-output control and communication systems [16, 24, 36, 37]. Examples
include coprime factorizations of transfer function matrices, canonical
realizations obtained from matrix fraction descriptions, design of feedback
compensators and convolutional coders. In addition, the design of digital, finite
word length programmable controllers requires computations over finite fields
and rings. Typically, such problems abstract in a natural way to the need to
solve systems of Diophantine equations (e.g. the so-called Bezout equation).
These and other problems involving polynomial matrices require efficient
polynomial matrix triangularization procedures, a result which is not surprising
given the importance of matrix triangularization techniques in numerical linear
algebra. There, matrices with entries from a field can be triangularized using
some form of Gaussian elimination. However, polynomial matrices have entries
from a Euclidean ring, an algebraic object for which Gaussian elimination is
not defined. For such matrices triangularization is accomplished instead by
what is naturally referred to as Euclidean elimination. Unfortunately, the
numerical stability and sensitivity issues of Euclidean elimination are not well
understood and in practice floating-point arithmetic has yielded poor results.
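Euclidean elimination itself is straightforward to express in exact rational arithmetic. The routine below is an illustrative sketch (not the authors' implementation), using sympy's exact polynomial division to clear the first column of a polynomial matrix:

```python
import sympy as sp

s = sp.symbols('s')

def eliminate_first_column(M):
    """Zero the entries below the (1,1) pivot of a polynomial matrix by
    Euclidean elimination: swap a lowest-degree nonzero entry into the
    pivot position, then subtract polynomial-quotient multiples of the
    pivot row.  All arithmetic is exact (rational coefficients)."""
    M = sp.Matrix(M)
    while any(M[i, 0] != 0 for i in range(1, M.rows)):
        # pivot on a nonzero entry of least degree in column 1
        piv = min((i for i in range(M.rows) if M[i, 0] != 0),
                  key=lambda i: sp.degree(M[i, 0], s))
        M.row_swap(0, piv)
        for i in range(1, M.rows):
            if M[i, 0] != 0:
                q, _ = sp.div(M[i, 0], M[0, 0], s)   # polynomial quotient
                M[i, :] = (M[i, :] - q * M[0, :]).applyfunc(sp.expand)
        # remaining below-pivot entries now have strictly smaller degree,
        # so the loop terminates, exactly as in the scalar Euclidean
        # algorithm
    return M

T = eliminate_first_column([[s**2 - 1, s],
                            [s - 1,    1]])
print(T)   # Matrix([[s - 1, 1], [0, -1]])
```

Applying the same step recursively to the trailing submatrix yields the triangular form; the divisions are exact, so no floating-point sensitivity arises.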


At present, a reliable numerical algorithm for the triangularization of polynomial


matrices does not exist.
This paper presents new algorithms for polynomial matrix triangularization
which entirely circumvent the numerical sensitivity issues of floating-point
methods through the use of exact, symbolic methods in computer algebra. The
use of such error-free algorithms guarantees that all calculations are exact and
therefore accurate to within the precision of the model data - the best that can
be hoped for. The emphasis will be placed on efficient algorithms to compute
exact Hermite forms of polynomial matrices because this procedure is central to a
large variety of algorithms important in the design of control and communication
systems. Moreover, the triangular Hermite form is defined for any matrix with
entries from a principal ideal ring. Such matrices arise in many practical
problems in communications and control, for instance, the analysis of
quantization effects in linear systems and the design of convolutional coders. Due to
their symbolic nature, the algorithms apply equally well to such problems since
the particular ring involved in a particular problem can itself be considered as
program input data.
We have implemented algorithms to compute the exact Hermite forms of
polynomial matrices in the Macsyma and Mathematica computer algebra
languages. We have also written a suite of auxiliary programs which call on
these triangularization procedures in order to perform the more high-level tasks
arising in control system synthesis. A particular example of the latter is the
design of dynamic feedback compensators.

2 Facts and Terminology of Polynomials and Polynomial Matrices
Denote by Q[s] the ring of polynomials in the indeterminate s with coefficients
drawn from the field of rational numbers, Q. Z[s] is the subring of Q[s] formed
when the polynomial coefficients are restricted to lie in Z, the ring of integers.
The leading coefficient of a polynomial is the nonzero coefficient of its highest
degree term. By convention, the leading coefficient of the zero polynomial is
defined to be one, and its degree is taken to be -∞. If the leading coefficient
of a polynomial is one, then the polynomial is said to be monic. If a(s) ∈ Q[s] is
a polynomial of degree zero, then a(s) is said to be a unit of Q[s], i.e. 1/a(s) ∈ Q[s].
Obviously, the set of units of Q[s] is precisely Q\{0}. A polynomial a(s) ∈ Z[s]
is called primitive if its coefficients are relatively prime in Z, i.e. if the greatest
common divisor of its coefficients is a unit of Z, namely 1. For any a(s) ∈ Z[s],
there exists a non-zero scalar c_a ∈ Z, unique up to a unit in Z, and a primitive
polynomial p_a(s) ∈ Z[s], such that a(s) = c_a p_a(s); with slight imprecision c_a is called
the content of a(s) and p_a(s) its primitive (with respect to c_a). Two popular
conventions are to always choose the positive content or to choose the sign of
the content so that the leading coefficient of the primitive is positive. A collection


of polynomials in Z[s] having contents which are relatively prime is said to be
relatively primitive.
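sympy's exact integer-polynomial arithmetic exposes this factorization directly; a small sketch (with the positive-content convention):

```python
import sympy as sp

s = sp.symbols('s')

# a(s) = 6s^2 + 4s + 2 = 2 * (3s^2 + 2s + 1): content 2, primitive part
# 3s^2 + 2s + 1, choosing the positive-content convention.
a = sp.Poly(6*s**2 + 4*s + 2, s, domain='ZZ')
content, primitive = a.primitive()

assert content == 2
assert primitive.as_expr() == 3*s**2 + 2*s + 1
assert content * primitive == a
```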
A polynomial p(s) divides a polynomial q(s), written p(s) | q(s), if there exists c(s) such
that p(s)c(s) = q(s). A common divisor, CD, c(s) of {p_i(s) : i = 1, ..., n} is a
polynomial such that c(s) | p_i(s) for each i. A greatest common divisor, GCD, g(s) of {p_i(s)}
is a CD of {p_i(s)} such that c(s) | g(s) for any other CD, c(s), of {p_i(s)}. A common
multiple, CM, c(s) of {p_i(s)} is a polynomial such that p_i(s) | c(s) for each i. A least common
multiple, LCM, l(s) of {p_i(s)} is a CM of {p_i(s)} such that l(s) | c(s) for any other
CM, c(s), of {p_i(s)}.
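These divisibility notions can be exercised directly in a computer algebra system; a small sympy sketch:

```python
import sympy as sp

s = sp.symbols('s')

p1 = s**2 - 1           # (s - 1)(s + 1)
p2 = s**2 - 2*s + 1     # (s - 1)^2

g = sp.gcd(p1, p2)      # a GCD: divides both, divisible by every CD
l = sp.lcm(p1, p2)      # an LCM: multiple of both, divides every CM

assert g == s - 1
assert sp.expand(l) == sp.expand((s - 1)**2 * (s + 1))
# As in the integers, gcd * lcm = p1 * p2 (up to a unit):
assert sp.expand(g * l - p1 * p2) == 0
```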
Denote by M(Q[s]) the collection of m × n matrices with entries from Q[s].
Similarly, M(Z[s]) will denote the subset of M(Q[s]) when the entries are
restricted to lie in Z[s]. P(s) ∈ M(Q[s]) is called a polynomial matrix. Letting
R[s] denote either Z[s] or Q[s] or, in fact, any commutative polynomial ring,
P(s) ∈ M(R[s]) is said to be nonsingular if P(s) is square and det P(s) is not the
zero polynomial. A nonsingular polynomial matrix W(s) ∈ M(R[s]) is unimodular
if its determinant is a unit of R[s]. In this case, therefore, W(s)^(-1) = Adj W(s)/
det W(s) is itself a polynomial matrix and also a unit of M(R[s]).
P(s) ∈ M(Z[s]) is said to be row (left) primitive if the polynomial entries in
each row are relatively primitive. For any A(s) ∈ M(Z[s]), there exists a diagonal
matrix C_A ∈ M(Z) and a row primitive matrix P_A(s) ∈ M(Z[s]) such that
A(s) = C_A P_A(s). This we call a left content-primitive factorization of A(s). The
diagonal elements of C_A are called the row contents of the respective rows of
A(s). Similar statements can be made about column (right) primitive matrices.
In analogy with the polynomial case, a content-primitive factorization is
obviously unique only up to the choice of the signs of the row contents.
Two polynomial matrices in M(Q[s]) (or polynomial vectors if n = 1), say
A_1(s) and A_2(s), are said to be linearly dependent if there exists a coefficient of
proportionality, a(s)/b(s), in the field of rational functions such that A_1(s) =
(a(s)/b(s)) A_2(s). Hence A_1(s) and A_2(s) are linearly dependent if there exist a(s)
and b(s) in Q[s] such that b(s)A_1(s) - a(s)A_2(s) = 0. This leads to the conclusion
that polynomial matrices are linearly dependent (over the field of rational
functions) if and only if there exists a nontrivial linear combination of them
which equals the zero polynomial matrix and which employs only polynomial
coefficients. With this in mind, elementary row and column operations for
polynomial matrices in M(Q[s]) take the form:
1. multiplication of a row (column) by a unit of Q[s], i.e. any nonzero rational;
2. addition to any row (column) of a polynomial multiple of any other row
(column);
3. interchange of any two rows (columns).
Performing any of the above operations on an identity matrix results in an
elementary matrix E(s) over Q[s]. Clearly, each such E(s) is unimodular and
its inverse is also an elementary matrix. In fact, every unimodular matrix is a
product of such elementary matrices. Two polynomial matrices A(s) and B(s)
are said to be row equivalent if each can be obtained from the other using a

5 Linear System Theory-Computer Algebra and Controller Synthesis

finite sequence of elementary row operations. In other words, they are related
by left multiplication by a unimodular matrix U(s) such that B(s) = U(s)A(s) or
A(s) = U^{-1}(s)B(s).
Any m × n polynomial matrix A(s) ∈ M(Q[s]) of rank r ≥ 1 is row equivalent
to an upper triangular form (or upper trapezoidal form if m ≠ n). Therefore, it
can be reduced by a sequence of elementary row operations to an upper
triangular (trapezoidal) matrix T(s) ∈ M(Q[s]). Said another way, there exists a
unimodular matrix U(s) such that U(s)A(s) = T(s) with T(s) ∈ M(Q[s]) upper
triangular. If T(s) ∈ M(Q[s]) satisfies the following conditions, it is said to be in
column Hermite form.
1. If m > r then the last m - r rows are identically zero.
2. Each nonzero diagonal entry has degree greater than the entries above it.
3. Each diagonal entry is monic.

The pseudo-column Hermite form for A(s) ∈ M(Q[s]) can be defined in terms of
its Hermite form, H_A(s), by multiplying each row of H_A(s) with an integer of
smallest magnitude such that the matrix H'_A(s) so obtained satisfies
H'_A(s) ∈ M(Z[s]). Clearly, H'_A(s) so obtained is row primitive and row equivalent
to A(s). Conversely, suppose that one is given H'_A(s) row primitive satisfying
items (1) and (2) above and is told that H'_A(s) is row equivalent to A(s). Divide
each row of H'_A(s) by the leading coefficient of the polynomial on the diagonal
of the respective row and call the matrix so obtained H_A(s). Then there exists
U(s) unimodular such that A(s) = U(s)H_A(s). The notion of pseudo-column
Hermite form gives a canonic triangular form in M(Z[s]) for each matrix in
M(Q[s]).

3 Triangularizing Polynomial Matrices


The upper triangularization of matrices with entries from a field using a sequence
of non-singular elementary row operations plays a key role in the theory of
vector spaces. Likewise, the upper triangularization of matrices with entries from
a ring using a sequence of unimodular elementary row operations plays a key
role in the theory of matrix modules.
The transformation to an upper triangular (trapezoidal) form using
elementary operations can be performed on any matrix with entries from a
Euclidean ring, of which M(Q[s]) is one example. The key feature that Euclidean
rings enjoy is the Euclidean division property. Given two polynomials
a(s), b(s) ∈ Q[s], where deg b(s) ≥ deg a(s), there exist two polynomials, the
quotient q(s) and the remainder r(s), such that b(s) = q(s)a(s) + r(s) where r(s) ≡ 0
or deg r(s) < deg a(s). The fact that the inequality on the degrees is strict allows
one to introduce a zero into a polynomial matrix using elementary operations.
As an example, consider an attempt to introduce a zero into the (2, 1) position
of a 2 x 2 polynomial matrix. Suppose one has found q(s) and r(s) such that

J. S. Baras

b(s) = q(s)a(s) + r(s) and then computes,

    [ 1      0 ] [ a(s)  c(s) ]   [ a(s)   c(s)            ]
    [ -q(s)  1 ] [ b(s)  d(s) ] = [ r(s)   d(s) - q(s)c(s) ]

using an obviously unimodular pre-multiplication. By interchanging the rows,
the same procedure can now be repeated on the resulting matrix and iterated,
each step reducing the degree of the (2,1) entry in view of the strict degree
inequality. Clearly, the process cannot continue indefinitely without yielding a
constant. If the constant is not zero, one final row exchange and the proper
elementary row operation will introduce a zero into the (2, 1) position. It is not
difficult to see that the resulting polynomial in the (1,1) position will be the
GCD of the original polynomials a(s) and b(s). This process is called Euclidean
elimination by analogy with Gaussian elimination.
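The iterated division step can be sketched in plain Python with exact rational arithmetic (function names and the descending-coefficient-list representation are mine; this mirrors the row-reduction loop on the first column only, which is what determines the (1,1) entry):

```python
from fractions import Fraction

def deg(p):
    """Degree of a descending-coefficient list (-1 for the zero polynomial)."""
    for i, c in enumerate(p):
        if c:
            return len(p) - 1 - i
    return -1

def polydivmod(b, a):
    """Quotient and remainder of b(s) by a(s) over Q (descending coefficients)."""
    r = [Fraction(c) for c in b]
    a = [Fraction(c) for c in a]
    q = [Fraction(0)] * max(len(b) - len(a) + 1, 1)
    while deg(r) >= deg(a) >= 0:
        d = deg(r)
        lead = r[len(r) - 1 - d] / a[0]          # cancel the leading term
        q[len(q) - 1 - (d - deg(a))] += lead
        for j in range(len(a)):
            r[len(r) - 1 - d + j] -= lead * a[j]
    return q, r

def euclid_gcd(a, b):
    """The monic GCD, i.e. the polynomial left in the (1,1) position."""
    while deg(b) >= 0:
        _, r = polydivmod(a, b)
        a, b = b, r                              # swap rows, repeat
    a = [Fraction(c) for c in a]
    d = deg(a)
    lead = a[len(a) - 1 - d]
    return [c / lead for c in a[len(a) - 1 - d:]]

# GCD(s^2 - 1, s - 1) = s - 1
g = euclid_gcd([1, 0, -1], [1, -1])
```

The `a, b = b, r` line plays the role of the row exchange in the text; termination follows from the same strict degree inequality.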

4 Integer vs Rational Arithmetic


The above example also serves to illustrate that rational arithmetic is costly.
For instance, to calculate the coefficients of polynomials of the form

    d(s) - q(s)c(s),   c, d, q ∈ Q[s]

one encounters the generic computation α + βγ, with α, β, γ ∈ Q. If these are expressed
as ratios of integers α = N_α/D_α, β = N_β/D_β, γ = N_γ/D_γ, all reduced to lowest
terms, then

    α + βγ = (N_α D_β D_γ + N_β N_γ D_α) / (D_α D_β D_γ)
This computation requires six integer multiplications, one integer addition and
the calculation of a GCD. Although there are more efficient methods (see [26],
p. 313), it remains a fact that rational arithmetic is computationally expensive,
due in large part to the need for GCD calculations. On the other hand, if it
can be arranged so that α, β and γ are all integers, then the same computation
obviously requires only two integer multiplications, one integer addition and
no GCD calculation. Clearly, by multiplying each row of any A(s) ∈ M(Q[s])
by a large enough integer, the denominators of every coefficient of every entry
of A(s) can be cancelled, and such a diagonal operation is certainly unimodular
in M(Q[s]). Because this involves a fixed overhead, assume, for convenience,
that A(s) ∈ M(Z[s]). This still creates difficulties because Z[s] is not a Euclidean
ring. For instance, it is easy to see that the remainder of two polynomials in Q[s]
with integer coefficients has, in general, rational coefficients; consider the
remainder of 2s after division by 3s - 1. In other words, Euclidean division is
not defined for Z[s]. However, Z[s] is an instance of a unique factorization
domain (UFD) and there exists a procedure called pseudo-division which is
defined for polynomials with coefficients in a UFD and which avoids rational
arithmetic altogether.
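The 2s and 3s - 1 example can be checked directly with exact rational arithmetic; the sketch below (plain Python, using `fractions.Fraction` for the coefficients) also verifies that scaling by the leading coefficient 3 restores integer coefficients, which is precisely the idea behind pseudo-division:

```python
from fractions import Fraction

b = [Fraction(2), Fraction(0)]    # b(s) = 2s
a = [Fraction(3), Fraction(-1)]   # a(s) = 3s - 1
q = b[0] / a[0]                   # quotient is the constant 2/3
r = b[1] - q * a[1]               # remainder 2/3: rational, not integer
assert q == Fraction(2, 3) and r == Fraction(2, 3)

# scaled by l_a = 3 the relation stays integral: 3*(2s) = 2*(3s - 1) + 2
assert 3 * 2 == 2 * 3            # coefficient of s
assert 3 * 0 == 2 * (-1) + 2     # constant term
```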


Pseudo-Division Lemma

Given polynomials a(s), b(s) ∈ Z[s] with deg b(s) ≥ deg a(s), there exist polynomials
r(s), N(s) ∈ Z[s] and a positive integer D such that

    Db(s) = N(s)a(s) + r(s)

with either r(s) ≡ 0 or deg r(s) < deg a(s). Furthermore, D is the smallest such
integer, i.e. for any other triple (r'(s), N'(s), D') satisfying the above, it follows that
D ≤ D'. It also follows that D ≤ l_a^k, where k = deg b(s) - deg a(s) + 1, with l_a denoting
the leading coefficient of a(s).

Proof. In division of the polynomials b(s) and a(s) over Q, explicit division by l_a is
performed k times. Thus, if b(s) and a(s) start off with integer coefficients, then
the only denominators which appear in the coefficients of the quotient q(s) and
remainder r(s) are divisors of l_a^k. This suggests that it is possible to find
polynomials q(s), r(s) ∈ Z[s] such that l_a^k b(s) = q(s)a(s) + r(s) (see [26], p. 407).
Denote l_a^k by D' and q(s) by N'(s). Write

    r'(s) = D'b(s) - N'(s)a(s) ≡ g'r(s) = g'(Db(s) - N(s)a(s))

where g' = GCD{D', N'_0, ..., N'_{n-m}}. Clearly (r(s), N(s), D) ∈ Z[s] and D ≤ D'.
The following algorithm computes the D, N(s) and r(s) discussed in the
Pseudo-Division Lemma. It is a distinct improvement over the pseudo-division
algorithm given in Knuth [26] in that it computes with smaller numbers and
maintains primitivity of the input polynomials. As an example, suppose we wish
to pseudo-divide u(s) by v(s), where

    u(s) = s^8 + s^6 - 3s^4 - 3s^3 + 8s^2 + 2s - 5

and

    v(s) = 3s^6 + 5s^4 - 4s^2 - 9s + 21

This example is discussed in [26, p. 408]. Using Knuth's algorithm one obtains

    27u(s) = v(s)(9s^2 - 6) + (-15s^4 + 3s^2 - 9)

Algorithm D given below instead computes according to

    9u(s) = v(s)(3s^2 - 2) + (-5s^4 + s^2 - 3)
Of course in this example the difference is negligible. However, if the size of the
leading coefficient of v(s) is large, the difference in computational burden can
be quite substantial.
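A hedged sketch of minimal-multiplier pseudo-division in plain Python (this is not the chapter's Algorithm D, which computes its GCDs on the fly; here the l_a^k-scaled division is computed first and the common factor is removed afterwards, which reproduces the same D, N(s) and r(s)):

```python
from math import gcd
from functools import reduce

def pseudo_divide(b, a):
    """Smallest positive D with D*b = N*a + r and deg r < deg a.
    b, a: descending integer coefficient lists, a nonzero, deg b >= deg a."""
    n, m = len(b) - 1, len(a) - 1
    la = a[0]
    k = n - m + 1
    r = [la ** k * c for c in b]       # l_a^k * b has integer quotient and remainder
    q = []
    for i in range(k):
        qi = r[i] // la                # exact integer division at every step
        q.append(qi)
        for j in range(1, m + 1):
            r[i + j] -= qi * a[j]
        r[i] = 0
    rem = r[k:] or [0]
    while len(rem) > 1 and rem[0] == 0:
        rem = rem[1:]                  # strip leading zeros of the remainder
    g = reduce(gcd, (abs(c) for c in q + rem), abs(la ** k))
    if la ** k < 0:
        g = -g                         # keep D positive
    return la ** k // g, [c // g for c in q], [c // g for c in rem]

u = [1, 0, 1, 0, -3, -3, 8, 2, -5]
v = [3, 0, 5, 0, -4, -9, 21]
D, N, r = pseudo_divide(u, v)
```

Running it on Knuth's u(s), v(s) above returns D = 9 with quotient 3s^2 - 2 and remainder -5s^4 + s^2 - 3, as claimed.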
Algorithm D-Pseudo-Division of Polynomials

Given two polynomials b(s) = b_0 s^n + b_1 s^{n-1} + ... + b_n and a(s) = a_0 s^m + a_1 s^{m-1} + ...
+ a_m in Z[s] with n ≥ m and a(s) ≠ 0, this algorithm computes the smallest
multiplier D and the pseudo-quotient N(s) defined above. This algorithm
computes D and N(s) directly instead of computing D' and N'(s) and then finding
g'. Direct calculation of D and N(s) involves computing GCDs on the fly which
involve smaller numbers than those used to compute g'. Bigger numbers cost
more in GCD calculations and, given the size of the integers encountered in
polynomial matrix computations (easily greater than 1000 digits), this
algorithm can save a substantial amount of time.

mnm ← min(m, n - m)
g ← GCD(b_0, a_0)
d ← a_0/g
D ← d
b_0 ← b_0/g
For i = 1 thru n - m Do
    For j = i thru n - m Do
        b_j ← b_j * d
    EndDo
    For j = 1 thru min(mnm, n - m - i + 1) Do
    EndDo
    g ← GCD(b_i, a_0)
    d ← a_0/g
    D ← D * d
    b_i ← b_i/g
EndDo

The algorithm terminates with the first n - m + 1 coefficients of b(s) overwritten
according to {b_0, b_1, ..., b_{n-m}} ← {N_0, N_1, ..., N_{n-m}}.
Introduction of a zero in the (2,1) position of the matrix in the 2 × 2 example
can now be performed using the pseudo-division algorithm. This freedom from
rational arithmetic is not without drawbacks, however. Consider triangularizing
a 3 × 3 matrix with entries such as 1, s, 45s, -10s - 10, 7 - 5s and 6s^2 - 1.
Triangularization with pseudo-division yields a triangular matrix A(s) whose
nonzero entries include

    -89145,
    43605s^3 - 77103s^2 + 234189s + 35190,
    1351755s^3 - 1373940s^2 + 5102550s + 7152750,
    -3706425s^4 + 5202000s^3 - 18532125s^2 - 15671025s - 7152750


This illustrates the main disadvantage of triangularization over M(Z[s]): the
coefficient growth of the polynomials. As the number of rows and columns in
the matrix increases, this coefficient growth continues unabated and begins to
erode the advantage of using integer arithmetic. One approach to handle this
coefficient growth is to remove the content of the current row after a zero is
introduced. Factoring the above matrix into its left content-primitive form yields
a diagonal content matrix R with entries including 765, 9 and 65025, and a
row-primitive factor R_A(s) whose nonzero entries include -9905 and the
polynomials

    1767s^3 - 1796s^2 + 6670s + 9350,
    4845s^3 - 8567s^2 + 26021s + 3910,
    -57s^4 + 80s^3 - 285s^2 - 241s - 110

The superfluous left content of this matrix can then be discarded, since this is
equivalent to multiplying A(s) by the matrix R^{-1}, which is unimodular over
Q[s], thereby keeping the coefficient size to a minimum. Note that the above
polynomial matrix A(s) is in pseudo-column Hermite form and that the column
Hermite form of A(s) is:
    [ 1   0   -(1767/9905)s^3 + (1796/9905)s^2 - (1334/1981)s - 1870/1981 ]
    [ 0   1   -(969/1981)s^3 + (8567/9905)s^2 - (26021/9905)s - 782/1981  ]
    [ 0   0   s^4 - (80/57)s^3 + 5s^2 + (241/57)s + 110/57                ]

5 Algorithms for Triangularizing Polynomial Matrices


Algorithm T-Column Oriented Triangularization of Polynomial Matrices

Given an N × N nonsingular matrix A ∈ M(Z[s]), this algorithm overwrites A
with a triangular form obtained by a sequence of unimodular, elementary row
operations. It avoids rational arithmetic by using pseudo-Euclidean division.
In addition, it inhibits coefficient growth by dividing the current row (the row
affected by the preceding elementary row operation step) by its row content.
This algorithm operates in a column oriented fashion by successively zeroing
out the entries in each column below the diagonal. This is shown pictorially
below.

    [x x x x x]      [x x x x x]            [x x x x x]
    [x x x x x]      [0 x x x x]            [0 x x x x]
    [x x x x x]  →   [0 x x x x]  →  ... →  [0 0 x x x]
    [x x x x x]      [0 x x x x]            [0 0 0 x x]
    [x x x x x]      [0 x x x x]            [0 0 0 0 x]


Assume there exists a pre-defined function

    MinimumDegreeIndex(A, k) := arg min {deg A_{k,k}, ..., deg A_{N,k}}

which returns the index of the row of A whose k-th entry is a nonzero polynomial
of lowest degree among the rows {k, k + 1, ..., N}. If A_{k,k}(s) = A_{k+1,k}(s) = ... = A_{N,k}(s) ≡
0, then it returns -∞.
For k = 1 thru N - 1 Do
    index ← MinimumDegreeIndex(A, k)
    If index ≠ -∞ Then
        A_{k,•} ↔ A_{index,•}   (exchange rows k and index)
        For n = k + 1 thru N Do   (zero out all entries in column k below A_{k,k})
            EndlessLoop
                (D, N(s)) ← pseudo-division(A_{n,k}, A_{k,k})   (so that D·A_{n,k} = N(s)·A_{k,k} + r(s))
                A_{n,•} ← D * A_{n,•} - N(s) * A_{k,•}
                A_{n,•} ← A_{n,•} / GCD{content(A_{n,1}), ..., content(A_{n,N})}
                If A_{n,k} ≡ 0 then exit EndlessLoop
                A_{n,•} ↔ A_{k,•}
            End EndlessLoop
        EndDo
    EndIf
EndDo
It is emphasized that this is a 'paper' algorithm. In fact, the working code based
on the above is more efficient and can handle singular and non-square matrices
and the entries can initially belong to Q[s]. However, these considerations just
complicate matters and obscure the basic operation.
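A self-contained Python sketch of the column-oriented loop (the working Macsyma code is of course richer; the names `pseudo_step` and `triangularize` are mine). `pseudo_step` returns the minimal multiplier D and pseudo-quotient N(s), rows are combined as D·row_n - N·row_k, and each updated row is divided by its content:

```python
from math import gcd
from functools import reduce

def strip(p):
    """Drop leading zeros of a descending coefficient list."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def deg(p):
    p = strip(p)
    return len(p) - 1 if p != [0] else -1

def padd(p, q):
    p, q = p[::-1], q[::-1]                      # ascending, aligned at s^0
    n = max(len(p), len(q))
    s = [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]
    return strip(s[::-1])

def pmul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return strip(out)

def pseudo_step(b, a):
    """Smallest D and N with D*b = N*a + r, deg r < deg a (deg b >= deg a)."""
    b, a = strip(b), strip(a)
    n, m, la = len(b) - 1, len(a) - 1, a[0]
    k = n - m + 1
    r = [la ** k * c for c in b]
    q = []
    for i in range(k):
        qi = r[i] // la
        q.append(qi)
        for j in range(1, m + 1):
            r[i + j] -= qi * a[j]
        r[i] = 0
    g = reduce(gcd, (abs(c) for c in q + r), abs(la ** k))
    if la ** k < 0:
        g = -g
    return la ** k // g, strip([c // g for c in q])

def triangularize(A):
    """Column-oriented triangularization over Z[s] with content removal."""
    A = [[strip(p) for p in row] for row in A]
    N = len(A)
    for k in range(N - 1):
        cand = [i for i in range(k, N) if deg(A[i][k]) >= 0]
        if not cand:
            continue
        piv = min(cand, key=lambda i: deg(A[i][k]))   # MinimumDegreeIndex
        A[k], A[piv] = A[piv], A[k]
        for n in range(k + 1, N):
            while deg(A[n][k]) >= 0:
                if deg(A[n][k]) < deg(A[k][k]):       # keep lowest degree as pivot
                    A[k], A[n] = A[n], A[k]
                    continue
                D, Nq = pseudo_step(A[n][k], A[k][k])
                A[n] = [padd([D * c for c in A[n][j]],
                             pmul([-1], pmul(Nq, A[k][j]))) for j in range(N)]
                c = reduce(gcd, (abs(x) for p in A[n] for x in p), 0) or 1
                A[n] = [[x // c for x in p] for p in A[n]]  # remove row content
    return A

T = triangularize([[[1, 0], [1]], [[1, 1], [1, 0]]])   # rows: (s, 1), (s+1, s)
```

For the 2 × 2 matrix with rows (s, 1) and (s + 1, s) this yields the upper triangular matrix with rows (1, s - 1) and (0, -s^2 + s + 1), whose determinant equals the original determinant up to sign.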
Algorithm M-Minor Oriented Triangularization of Polynomial Matrices

This algorithm is similar to the one above except it performs the zeroing process
in a leading principal minor oriented fashion, so that the algorithm consists of
N - 1 stages where the k × k leading principal submatrix is in a triangular form
by the end of the k-th stage. Furthermore, the algorithm employs an additional
substage which reduces the degrees of the polynomial entries above the diagonal
on the fly using pseudo-Euclidean division. The order in which the degrees are
reduced is important and is based upon notions from the integer case contained
in Kannan and Bachem [23]. The order is shown pictorially below. The output
matrix is in its unique pseudo-Hermite form, not simply triangularized, i.e. it
is a triangular matrix with entries above the diagonal of degree less than the
diagonal entry. It is technically not in Hermite form since the diagonal entries
are not monic. But this form is easily obtained by left multiplication with the
appropriate diagonal matrix of rational numbers, a unimodular matrix with
respect to Q[s].
    [x x x x]        [x 4 5 6]
    [0 x x x]   →    [0 x 2 3]
    [0 0 x x]        [0 0 x 1]
    [0 0 0 x]        [0 0 0 x]

(the numbers indicate the order in which the above-diagonal entries are
degree-reduced)

For k = 2 thru N Do
    For n = 1 thru k - 1 Do   (triangularize the k × k-th leading principal minor)
        If deg A_{n,n} > deg A_{k,n} Then A_{k,•} ↔ A_{n,•}
        EndlessLoop
            (D, N(s)) ← pseudo-division(A_{k,n}, A_{n,n})
            A_{k,•} ← D * A_{k,•} - N(s) * A_{n,•}
            A_{k,•} ← A_{k,•} / GCD{content(A_{k,1}), ..., content(A_{k,N})}
            If A_{k,n} ≠ 0 then A_{k,•} ↔ A_{n,•} else Exit EndlessLoop
        End EndlessLoop
    EndDo
    For i = -1 thru -k + 1 step -1 Do   (reduce degrees of above-diagonal polynomials in the k × k minor)
        For j = i + 1 thru 0 Do
            If deg A_{k+i,k+j} ≥ deg A_{k+j,k+j} Then
                (D, N(s)) ← pseudo-division(A_{k+i,k+j}, A_{k+j,k+j})
                A_{k+i,•} ← D * A_{k+i,•} - N(s) * A_{k+j,•}
                A_{k+i,•} ← A_{k+i,•} / GCD{content(A_{k+i,1}), ..., content(A_{k+i,N})}
            EndIf
        EndDo
    EndDo
EndDo
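The degree-reduction substage can be sketched in isolation (plain Python; helper names are mine, not from the paper's code). One pseudo-division row operation reduces the above-diagonal entry s^2 + 1 modulo the diagonal entry s - 2:

```python
def pmul(p, q):
    """Product of two polynomials (descending integer coefficient lists)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def psub(p, q):
    """Difference p - q, right-aligned at the constant term."""
    n = max(len(p), len(q))
    p = [0] * (n - len(p)) + p
    q = [0] * (n - len(q)) + q
    return [x - y for x, y in zip(p, q)]

def pseudo_qr(b, a):
    """D, N, r with D*b = N*a + r and deg r < deg a (deg b >= deg a)."""
    k = len(b) - len(a) + 1
    la = a[0]
    r = [la ** k * c for c in b]
    N = []
    for i in range(k):
        qi = r[i] // la
        N.append(qi)
        for j in range(1, len(a)):
            r[i + j] -= qi * a[j]
        r[i] = 0
    return la ** k, N, r[k:]

# T = [[s, s^2 + 1], [0, s - 2]] as descending coefficient lists
T = [[[1, 0], [1, 0, 1]], [[0], [1, -2]]]
D, N, r = pseudo_qr(T[0][1], T[1][1])            # 1*(s^2+1) = (s+2)(s-2) + 5
T[0] = [psub([D * c for c in p], pmul(N, q)) for p, q in zip(T[0], T[1])]
```

The (1,2) entry drops from degree 2 to the constant 5 while the triangular structure is preserved.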

6 Simulation Results

Macsyma is a Lisp based system for performing formal, symbolic calculations
using both error-free and arbitrary-precision arithmetic. Since it is Lisp based,
Macsyma runs the fastest on Lisp machines-computers which have the Lisp
instructions hard-coded into their microprocessors, such as the Texas
Instruments Explorer II. Mathematica is a new computer algebra system which
has similar capabilities as Macsyma, but is written in 'C'. We have been
running Mathematica on the NeXT machine, but its relatively slow
microprocessor and small memory severely limit its performance.
Simulations were performed to determine the average time required to
triangularize square polynomial matrices and the maximum coefficient length
using both Algorithm T and Algorithm M (see attached graphs). Each matrix
had polynomial entries with randomly generated 1-2 digit coefficients. Runs
were parameterized by the degree of the polynomial, which ranged from 1 to
6, and the dimension of the matrix, which ranged from 2 to 16.
These simulations were run on a Texas Instruments Explorer II with 16 mb
of physical memory and 128 mb of virtual memory using the Macsyma
version of our algorithms. The graphs represent the results of the simulations
averaged over 7 runs. The results indicate that Algorithm T was moderately
faster than Algorithm M in triangularizing matrices up to 9 × 9. At that point
Algorithm T was still faster for triangularizing matrices with lower degree
polynomials, but slower for the higher degree polynomials. This can be attributed
to the fact that Algorithm M requires less memory during computations due
to its substage which reduces the degrees of the polynomials above the diagonal
on the fly. Therefore costly garbage collections, a technique of freeing
dynamically allocated memory, are reduced. This is further shown by the fact
that Algorithm T ran out of memory while attempting to triangularize a 13 × 13
matrix with degree 6 polynomial entries, while Algorithm M completed a 16 × 16
matrix with degree 6 entries.
It appears that both of these algorithms run close to exponential time. The
slopes of the semi-log plots of the timings increase slightly with increasing
polynomial degree. The maximum coefficient length was approximately the
same for each algorithm, and the coefficient growth appears to be
sub-exponential with increasing matrix dimension. A 16 × 16 matrix with degree
6 polynomials is the largest that has been attempted with Algorithm M. It
required 40 hours to triangularize, with the resulting matrix having a maximum
coefficient length of 2115 digits.
Although Algorithm T was faster than Algorithm M on the smaller matrices,
it did not have the overhead of putting the matrix in a canonic form in the
process. Algorithm M leaves the matrix in a unique pseudo-Hermite form as
described earlier. The output matrix of Algorithm T requires the application of
an additional algorithm to reduce the degrees of its above-diagonal polynomials
in order to put it in a pseudo-Hermite form.

7 Summary of Functions
The following is a summary of the high-level auxiliary programs which we have
to date implemented in Macsyma and Mathematica. They perform most of
the common, high-level tasks arising in the control system design process.


RightMatrixFraction(H(s))-Computes a right matrix fraction description
of the transfer function matrix H(s), i.e. computes the matrices N(s), D(s) such
that H(s) = N(s)D(s)^{-1}. The LeftMatrixFraction description is analogously
computed.
Bezout(N(s), D(s))-Finds the homogeneous and particular solutions to the
Bezout equation, i.e. finds polynomial matrices X_p(s), Y_p(s), X_h(s), Y_h(s) such
that X_p(s)D(s) + Y_p(s)N(s) = I and X_h(s)D(s) + Y_h(s)N(s) = 0. Used for
designing feedback compensators in the frequency domain.
ColumnReduce(D(s))-Column reduces the polynomial matrix D(s), i.e.
multiplies D(s) by an appropriate unimodular matrix such that the matrix of
leading coefficients of its entries is nonsingular. RowReduce is analogously
computed.
Controller(H(s))-Finds a controller form realization of the transfer function
matrix H(s). Controllability, Observer and Observability realizations are
analogously computed.
Hermite(N(s))-Finds the Hermite form of the polynomial matrix N(s).
RightCoprime(N(s), D(s))-Determines the greatest common right divisor of
the polynomial matrices N(s) and D(s). If it is not unimodular, it is pulled out
of both matrices, making them right coprime. Used for finding minimal
realizations. LeftCoprime is analogously computed.
Smith(N(s))-Finds the Smith form of the polynomial matrix N(s). This is
a canonic, diagonal form of a polynomial matrix.
SmithMcMillan(H(s))-Finds the Smith-McMillan form of the transfer
function matrix H(s). This is a canonic, rational, diagonal form of a matrix
whose entries are ratios of polynomials.

References
[1] Baras, J.S. and R.W. Brockett, "H² Functions and Infinite Dimensional Realization Theory,"
SIAM Journal of Control, Vol 13, No 1, pp 221-241, Jan 1975
[2] Baras, J.S., "Algebraic Structure of Infinite Dimensional Linear Systems in Hilbert Space,"
in Mathematical System Theory, G. Marchesini and S.K. Mitter (Eds.), Springer-Verlag Lecture
Notes in Economics and Mathematical Systems, Vol 131, pp 193-203, 1975
[3] Baras, J.S. and P. Dewilde, "Invariant Subspace Methods in Linear Multivariable Distributed
Systems and Lumped-Distributed Network Synthesis," IEEE Proceedings, Special Issue on
Recent Trends in Systems Theory, pp 160-178, Feb 1976
[4] Bareiss, E.H., "Computational Solutions of Matrix Problems over an Integral Domain,"
J. Inst. Maths. Applics., V10, 69-104, 1972
[5] Bareiss, E.H., "Sylvester's Identity and Multistep Integer-Preserving Gaussian Elimination,"
Math. Comp., V22, 565-578, 1968
[6] Brown, W.S., "On Euclid's Algorithm and the Computation of Polynomial Greatest
Common Divisors," J. ACM, V18 (4), 478-504, Oct 1971
[7] Brown, W.S. and J.F. Traub, "On Euclid's Algorithm and the Theory of Subresultants,"
J. ACM, V18 (4), 505-514, Oct 1971
[8] Chou, T.J. and G.E. Collins, "Algorithms for the Solution of Systems of Linear Diophantine
Equations," SIAM J. Comp., V11 (4), 687-708, Nov 1982
[9] Cohen, G., P. Moller, J.P. Quadrat and M. Viot, "Algebraic Tools for the Performance
Evaluation of Discrete Event Systems," IEEE Proceedings, Vol 77, pp 39-58, 1989
[10] Collins, G.E., "Subresultants and Reduced Polynomial Remainder Sequences," J. ACM, V14
(1), 128-142, Jan 1967


[11] Fliess, M., "Un Outil Algébrique: Les Séries Formelles Non Commutatives," in Mathematical
System Theory, G. Marchesini and S.K. Mitter (Eds.), Springer-Verlag Lecture Notes in
Economics and Mathematical Systems, Vol 131, pp 122-148, 1975
[12] Gantmakher, F.R., Theory of Matrices, New York: Chelsea, 1959
[13] Gregory, R.T. and E.V. Krishnamurthy, Methods and Applications of Error-Free Computation,
Berlin: Springer, 1984
[14] Hartley, B. and T.O. Hawkes, Rings, Modules and Linear Algebra, London: Chapman and
Hall, 1970
[15] Hungerford, T.W., Algebra, Berlin: Springer, 1974
[16] Kailath, T., Linear Systems, Englewood Cliffs: Prentice-Hall, 1980
[17] Kalman, R.E., P.L. Falb, and M.A. Arbib, Topics in Mathematical System Theory,
McGraw-Hill, 1969
[18] Kalman, R.E., "Algebraic Structure of Linear Dynamical Systems I. The Module of Σ," Proc.
Nat. Acad. Sci. (USA), 54, pp 1503-1508
[19] Kalman, R.E., "On Minimal Partial Realizations of a Linear Input/Output Map," in Aspects
of Network and System Theory, R.E. Kalman and N. DeClaris (Eds.), Holt, Rinehart and
Winston, 1971
[20] Kalman, R.E., "Introduction to the Algebraic Theory of Linear Systems," in Mathematical
Systems Theory and Economics I, Lecture Notes in Operations Research and Mathematical
Economics, Vol 11, pp 41-65, Springer-Verlag, 1969
[21] Kalman, R.E., "Pattern Recognition Properties of Multilinear Machines," in Proc. of IFAC
International Symposium on Technical and Biological Systems of Control, Yerevan, Armenian
SSR, 1968
[22] Kannan, R., "Solving Systems of Linear Equations over Polynomials," Report CMU-CS-83-165,
Dept. of Comp. Sci., Carnegie-Mellon University, Pittsburgh, 1983
[23] Kannan, R. and A. Bachem, "Polynomial Algorithms for Computing the Smith and Hermite
Normal Forms of an Integer Matrix," SIAM J. Comp., V8 (4), 499-507, Nov 1979
[24] Khargonekar, P. and E. Sontag, "On the Relation Between Stable Matrix Fraction Factorization
and Regulable Realizations of Linear Systems Over Rings," IEEE Trans. on Aut. Control,
Vol AC-27, No 3, pp 627-638, June 1982
[25] Keng, H.L., Introduction to Number Theory, Berlin: Springer, 1982
[26] Knuth, D.E., The Art of Computer Programming, Vol 2, Reading, Mass: Addison-Wesley, 1981
[27] Krishnamurthy, E.V., Error-Free Polynomial Matrix Computations, Berlin: Springer, 1985
[28] Lipson, J.D., Elements of Algebra and Algebraic Computing, Reading: Addison-Wesley, 1981
[29] MacDuffee, C.C., The Theory of Matrices, New York: Chelsea, 1950
[30] McClellan, M.T., "The Exact Solution of Systems of Linear Equations with Polynomial
Coefficients," J. ACM, V20 (4), 563-588, Oct 1973
[31] Newman, M., Integral Matrices, New York: Academic Press, 1972
[32] Rouchaleau, Y., B.F. Wyman, and R.E. Kalman, "Algebraic Structure of Linear Dynamical
Systems III. Realization Theory Over a Commutative Ring," Proc. of Nat. Acad. of Sciences,
Vol 69, No 11, pp 3404-3406, 1972
[33] Sontag, E., "Linear Systems Over Commutative Rings: A Survey," Ricerche di Automatica,
Vol 7, No 1, pp 1-34, July 1976
[34] Sontag, E., "Linear Systems Over Commutative Rings: A (Partial) Updated Survey," Proc. of
1981 IFAC, Kyoto, 1981
[35] Sontag, E., Polynomial Response Maps, Lecture Notes in Control and Information Sciences,
Vol 13, Springer-Verlag, 1979
[36] Vidyasagar, M., Control System Synthesis, Cambridge: MIT Press, 1985
[37] Wolovich, W.A., Linear Multivariable Systems, Berlin: Springer, 1974


Fig. 1. Time to triangularize (sec)-Column oriented algorithm, polynomial degrees 1 through 6

Fig. 2. Time to triangularize (sec)-Minor oriented algorithm, polynomial degrees 1 through 6

Fig. 3. Maximum coefficient length (number of digits)-Column oriented algorithm, polynomial degrees 1 through 6

Fig. 4. Maximum coefficient length (number of digits)-Minor oriented algorithm, polynomial degrees 1 through 6

On the Stability of Linear Discrete Systems
and Related Problems

M. Mansour¹ and E. I. Jury²

¹ Institute of Automatic Control, Swiss Federal Institute of Technology, ETH-Zürich,
CH-8092 Zürich, Switzerland
² Department of Electrical and Computer Engineering, University of Miami, P.O. Box 248294,
Coral Gables, FL 33124, USA

In this report some developments in the area of stability of discrete systems, originally motivated
by a paper by Kalman and Bertram, are overviewed. It is shown that an analog to the Schwarz form
was developed for discrete systems. This form was applied in determining the margin of stability,
identification, signal processing using lattice filters, and model reduction of one-dimensional and
multi-dimensional systems.

1 Introduction
Wall [1] has shown that a polynomial

    f(s) = s^n + a_1 s^{n-1} + ... + a_n    (1)

has all its roots with negative real parts if and only if c_1, c_2, ..., c_n in the following
continued fraction are all positive:

    g(s)/f(s) = 1 / (c_1 s + 1 + 1/(c_2 s + 1/(c_3 s + ... + 1/(c_n s))))    (2)

where

    g(s) = a_1 s^{n-1} + a_3 s^{n-3} + ...    (3)

Let

    b_i = 1/(c_{i-1} c_i),   i = 1, ..., n,   and   c_0 = 1

Then, rewriting the continued fraction in terms of b_1, ..., b_n,

    g(s)/f(s) = 1 / (s/b_1 + 1 + 1/((b_1/b_2)s + 1/(...)))    (4)

and f(s) is Hurwitz-stable if and only if b_1, ..., b_n are all positive.


It was proved also in [1] that if f(s) is Hurwitz-stable then there exist
uniquely determined polynomials f_0, f_1, ..., f_{n-1} of degrees 0, 1, ..., n - 1 having
the same property as f(s) and which are connected with f(s) by the recurrence
formula

    f_{m+1} = s f_m + b_{m+1} f_{m-1},   m = 0, 1, ..., n - 1    (5)

where f_0 = 1, f_{-1} = 1 and f_n = f(s).
Here f_1(s), f_2(s), ..., f_n(s) are the successive denominators of the continued
fraction (4).
It was also shown in [1] that

    f(s) = det(sI - A_1)    (6)

i.e. f(s) is the characteristic polynomial of the matrix

          [ -b_1  -1    0    ...    0 ]
          [  b_2   0   -1    ...    0 ]
    A_1 = [  0     b_3  0    ...    : ]    (7)
          [  :               .     -1 ]
          [  0    ...   0    b_n    0 ]

Other variations of the matrix (7) which have the same characteristic equation,
e.g. its transpose and the corresponding sign-reversed forms, are denoted
(8a, b, c).


It is noted here that this matrix or one of its variations has been called in the
literature the "Schwarz matrix."
Schwarz [2] developed a numerical method of elementary transformations
to transform a given matrix A to the above normal form (8a), so that the Hurwitz
stability of the matrix A is determined from b_1, b_2, ..., b_n, i.e. A is Hurwitz-stable
if and only if b_1, b_2, ..., b_n are all positive.
Kalman and Bertram [3] used the second method of Lyapunov to prove
the Hurwitz stability of a system matrix in Schwarz form, e.g. if we consider the
linear system

    ẋ = A_1 x    (9)

where A_1 is in Schwarz form (8a), then the Lyapunov function

    V = x^T P_1 x    (10)

where

    P_1 = diag( b_1, b_1/b_2, b_1/(b_2 b_3), ..., b_1/(b_2 b_3 ... b_n) )

can be used to prove Hurwitz stability:

    V̇ = -x^T Q_1 x    (11)

where

    Q_1 = -(A_1^T P_1 + P_1 A_1) = diag(2b_1^2, 0, ..., 0)    (12)

V̇ is negative semidefinite and vanishes identically only at the origin. Therefore
b_1, b_2, ..., b_n > 0 are necessary and sufficient conditions for the stability of the
system (9).
Parks [4] showed that the first column of the Routh array consists of the
elements

    b_1, b_2, b_1 b_3, b_2 b_4, ...    (13)

As the connection between the Routh and Hurwitz criteria is known (see Gantmacher
[5]), the elements of the first column of the Routh array are

    D_1, D_2/D_1, D_3/D_2, ..., D_n/D_{n-1}    (14)

where D_1, ..., D_n are the Hurwitz determinants.


From (13) and (14) we get

    b_1 = D_1,   b_2 = D_2/D_1,   b_3 = D_3/(D_1 D_2),   b_4 = D_1 D_4/(D_2 D_3), ...    (15)

Thus the link between Lyapunov, Routh and Hurwitz was obtained.


In [4] the inverse problem of stability was solved, i.e. the coefficients of the
characteristic equation are determined as a function of the stability conditions.
In [6] a recursion formula is derived which expresses the coefficients of the
characteristic polynomial (1) as a function of b_1, b_2, ..., b_n:

    a_{r,n} = a_{r,n-1} + a_{r-2,n-2} b_n,   a_{1,1} = b_1,   a_{0,k} = 1,   a_{j,k} = 0 for j < 0 and k < j    (16)

For n = 1:  a_{1,1} = b_1
For n = 2:  a_{1,2} = b_1,  a_{2,2} = b_2
For n = 3:  a_{1,3} = b_1,  a_{2,3} = b_2 + b_3,  a_{3,3} = b_1 b_3

The relation between a_1, ..., a_n and b_1, ..., b_n is given by

    [a_1]   [ 1   0     0        ...    0           ] [b_1]
    [a_2]   [ 0   1     1        ...    1           ] [b_2]
    [a_3] = [ 0   0   a_{1,1}    ...    a_{1,n-2}   ] [b_3]    (17)
    [ : ]   [ 0   0     0      a_{2,2}  a_{2,n-2}   ] [ : ]
    [a_n]   [ 0   0     0        0      a_{n-2,n-2} ] [b_n]
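The recurrence (5), equivalently the recursion (16), can be checked numerically with a short sketch (plain Python; the function name is mine):

```python
def coeffs_from_b(b):
    """Build f_n(s) from Schwarz parameters via f_{m+1} = s*f_m + b_{m+1}*f_{m-1}."""
    fm1, fm = [1], [1]                 # f_{-1} = 1, f_0 = 1 (descending coeffs)
    for bm in b:
        nxt = fm + [0]                 # s * f_m
        for i, c in enumerate(reversed(fm1)):
            nxt[len(nxt) - 1 - i] += bm * c   # + b_{m+1} * f_{m-1}
        fm1, fm = fm, nxt
    return fm

# n = 3, (b_1, b_2, b_3) = (2, 3, 4):
# f_3(s) = s^3 + 2s^2 + (3 + 4)s + 2*4 = s^3 + 2s^2 + 7s + 8
f3 = coeffs_from_b([2, 3, 4])
```

For (b_1, b_2, b_3) = (2, 3, 4) this returns s^3 + 2s^2 + 7s + 8, matching a_{1,3} = b_1, a_{2,3} = b_2 + b_3 and a_{3,3} = b_1 b_3 from (16).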

Also in [6] a transformation matrix T_1, which transforms a matrix A in
companion form to a matrix A_1 in Schwarz form, is derived. Here A_1 = T_1 A T_1^{-1},
where

        [  0      1        0   ...    0  ]
    A = [  :               .          :  ]    (18)
        [ -a_n  -a_{n-1}      ...  -a_1  ]

and

          [ a_{1,1}                                 ]
          [ a_{2,2}      a_{1,2}                    ]
    T_1 = [ a_{3,3}      a_{2,3}      a_{1,3}       ]
          [    :            :                 .     ]
          [ a_{n-1,n-1}  a_{n-2,n-1}  a_{n-3,n-1} ... ]
A similar transformation was also derived in [8].

Let P_1 = P_{11}^2 where P_1 is given by (10), i.e.

    P_{11} = diag( b_1^{1/2}, (b_1/b_2)^{1/2}, ..., (b_1/(b_2 b_3 ... b_n))^{1/2} )    (19)

Using the transformation matrix T_2 = P_{11} T_1 we get the system matrix

          [ -b_1       -b_2^{1/2}   0           ...        0 ]
    A_2 = [  b_2^{1/2}  0          -b_3^{1/2}   ...        0 ]    (20)
          [  0          b_3^{1/2}   0            .         : ]
          [  0          ...                     b_n^{1/2}  0 ]

In this case V = x^T x, i.e. P_2 = I = identity matrix, and

    Q_2 = -(A_2^T + A_2) = diag(2b_1, 0, ..., 0)    (21)

It is noted that the matrix (20) was first used in [9]. In [10] it was shown that
the Schwarz matrix of a complex polynomial can be related to that of a real
polynomial using the Lienard-Chipart-type simplification of the stability
properties of a real polynomial.
Motivated by the work of Kalman and Bertram [3] and by their statement:
"The analog of the canonic matrix (Schwarz-matrix) is so far not yet available
for discrete systems", and solving the inverse problem of stability similar to the
work of Parks [4], Mansour [7] obtained the analog of the Schwarz matrix for
discrete systems, which is called in the literature the discrete Schwarz form or
Mansour form. In the next section the development leading to this form is
presented, as well as some extensions. In Section 3 the applications of this form
in different areas are presented: determining the stability margin of discrete
systems, identification, lattice filters, model reduction of one-dimensional
systems and model reduction of multi-dimensional systems.

2 The Discrete Schwarz-Form or Mansour-Form


The Schur-Cohn stability criterion for a linear discrete system gives the conditions under which all the roots of the characteristic polynomial lie inside the unit circle.

Given the system

ξ(k + 1) = Aξ(k)  (22)

where A is in companion form.

M. Mansour and E. I. Jury

The characteristic equation of the system (22) is given by

$$
F(z) = F_n(z) = z^n + a_1 z^{n-1} + \cdots + a_n
\tag{23}
$$

The necessary and sufficient condition for the roots of (23) to lie inside the unit circle is that the zero-order terms of F_n(z) and of the n − 1 polynomials obtained successively through the transformation

$$
F_{r-1}(z) = \frac{F_r(z) - \Delta_r \hat F_r(z)}{(1-\Delta_r^2)\,z},
\qquad \hat F_r(z) = z^r F_r(z^{-1}),\quad \Delta_r = F_r(0)
\tag{24}
$$

are of magnitude smaller than unity [11], [12]. Δ_r is the zero-order term of the (monic) polynomial of degree r:

|Δ_r| < 1,  r = 1, 2, …, n  (25)

The Δ_r can be obtained from the Jury table [13].
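In computational terms, the reduction (24)-(25) yields a simple stability test. The following is a sketch, assuming the monic normalization used above; the Δ_r are returned in the order Δ_n, …, Δ_1, and the function names are illustrative:

```python
def schur_cohn_deltas(a):
    """Delta_r parameters of the monic polynomial
    F(z) = z^n + a[0] z^(n-1) + ... + a[n-1], computed by the
    reduction (24)-(25); returned as [Delta_n, ..., Delta_1].
    Stops early if some |Delta_r| = 1 (marginal case)."""
    f = [1.0] + [float(c) for c in a]   # descending coefficients, monic
    deltas = []
    while len(f) > 1:
        d = f[-1]                       # zero-order term Delta_r
        deltas.append(d)
        if abs(1.0 - d * d) < 1e-12:    # |Delta_r| = 1: stop, not stable
            break
        rev = f[::-1]                   # reciprocal polynomial z^r F(1/z)
        g = [(fi - d * ri) / (1.0 - d * d) for fi, ri in zip(f, rev)]
        f = g[:-1]                      # divide by z (constant term is 0)
    return deltas

def is_schur_stable(a):
    """All roots inside the unit circle iff all |Delta_r| < 1."""
    return all(abs(d) < 1.0 for d in deltas) if (deltas := schur_cohn_deltas(a)) else True
```

For example, F(z) = z² − 0.5z + 0.06 (roots 0.2 and 0.3) passes the test, while z² − 2.5z + 1 (roots 2 and 0.5) fails it.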


In [7] the inverse problem of stability for discrete systems was solved, giving the recursion formula

a_{r,n} = a_{r,n−1} + a_{n−r,n−1} Δ_n,  a_{j,k} = 0 for k < j,  a_{0,k} = 1  (26)

i.e.

a_{1,1} = Δ_1,  a_{1,2} = Δ_1 + Δ_1Δ_2,  a_{2,2} = Δ_2,  …
This can be expressed in matrix form as follows:

$$
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix}
=
\begin{bmatrix}
1 & a_{1,1} & a_{2,2} & a_{3,3} & \cdots & a_{n-1,n-1} \\
  & 1 & a_{1,2} & a_{2,3} & \cdots & a_{n-2,n-1} \\
  &   & 1 & a_{1,3} & \cdots & a_{n-3,n-1} \\
  &   &   & \ddots &  & \vdots \\
  &   &   &   &  & 1
\end{bmatrix}
\begin{bmatrix} \Delta_1 \\ \Delta_2 \\ \Delta_3 \\ \vdots \\ \Delta_n \end{bmatrix}
\tag{27}
$$
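The recursion (26) is straightforward to implement. A sketch, with the boundary convention a_{0,k} = 1 inferred from the worked examples above:

```python
def coeffs_from_deltas(deltas):
    """Inverse stability problem (26): build a_1, ..., a_n of
    F_n(z) = z^n + a_1 z^(n-1) + ... + a_n from Delta_1, ..., Delta_n
    via a_{r,k} = a_{r,k-1} + a_{k-r,k-1} * Delta_k."""
    a = {}
    n = len(deltas)

    def get(j, k):
        if j == 0:
            return 1.0          # boundary convention a_{0,k} = 1
        if k < j:
            return 0.0          # a_{j,k} = 0 for k < j
        return a[(j, k)]

    for k, d in enumerate(deltas, start=1):
        for r in range(1, k + 1):
            a[(r, k)] = get(r, k - 1) + get(k - r, k - 1) * d
    return [a[(r, n)] for r in range(1, n + 1)]
```

For Δ_1 = 0.5, Δ_2 = 0.2 this returns a_1 = Δ_1 + Δ_1Δ_2 = 0.6 and a_2 = Δ_2 = 0.2, as in the examples following (26).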

If we use the transformation matrix T_3 built from the a_{j,k} (its last column containing a_{n−1,n−1}, a_{n−2,n−2}, …) (28), we can transform the system (22) to the Mansour form


$$
M =
\begin{bmatrix}
-\Delta_1 & 1-\Delta_1^2 & 0 & \cdots & 0 \\
-\Delta_2 & -\Delta_1\Delta_2 & 1-\Delta_2^2 & \cdots & 0 \\
\vdots & \vdots & & \ddots & \\
-\Delta_{n-1} & -\Delta_1\Delta_{n-1} & \cdots & -\Delta_{n-2}\Delta_{n-1} & 1-\Delta_{n-1}^2 \\
-\Delta_n & -\Delta_1\Delta_n & \cdots & -\Delta_{n-2}\Delta_n & -\Delta_{n-1}\Delta_n
\end{bmatrix}
\tag{29}
$$


Choose a Lyapunov function V = ξᵀP_3ξ, where P_3 is a positive definite diagonal matrix

(31)

ΔV is negative semidefinite and does not vanish identically for any general sequence of vectors.

Thus the Schur-Cohn stability criterion is proved using the second method of Lyapunov, and a diagonal matrix is used here for the proof, similarly to [3].

Let 1 − Δ_i² = b_i² and P_3 = P_{33}²; then

(32)

Using the transformation matrix T_4 = P_{33}T_3 we then get the system matrix (33), whose entries are products of the Δ_i and the b_j, e.g. −Δ_{n−1}Δ_n, −Δ_{n−2}Δ_n b_{n−1}, …, −Δ_1Δ_n b_2⋯b_{n−1}, −Δ_n b_1b_2⋯b_{n−1} in one row and −Δ_1Δ_{n−1}b_2⋯b_{n−2}, −Δ_{n−1}b_1b_2⋯b_{n−2}, … in the next.

In this case V = ξᵀξ, i.e. P_4 = I, and

(34)

Another form can be obtained using the transformation matrix T [14], [15]


whose elements are given by

$$
T_{ij} = T_{i-1,j-1} + \sum_{k=2}^{i-j+1} \Delta_{n-i+1}\Delta_{n-i+k}\,T_{i-k,j-1},
\qquad T_{ii}=1,\quad T_{ij}=0\ \text{for}\ j>i,\quad T_{i0}=0\ \text{for}\ i>0
\tag{35}
$$

The resulting matrix A_s = TAT^{-1} has first row

$$
\bigl(-\Delta_{n-1}\Delta_n,\; -\Delta_{n-2}\Delta_n(1-\Delta_{n-1}^2),\; -\Delta_{n-3}\Delta_n(1-\Delta_{n-2}^2)(1-\Delta_{n-1}^2),\;\ldots,\; -\Delta_n(1-\Delta_1^2)\cdots(1-\Delta_{n-1}^2)\bigr)
\tag{36}
$$

and its subsequent rows follow the analogous pattern with Δ_{n−1}, Δ_{n−2}, … in place of Δ_n, e.g. −Δ_{n−2}Δ_{n−1}, −Δ_{n−3}Δ_{n−1}(1 − Δ²_{n−2}), …, −Δ_{n−1}(1 − Δ_1²)⋯(1 − Δ²_{n−2}) in the second row.

For the proof of the Schur-Cohn criterion we use here a diagonal Lyapunov matrix P_s (37) and get the corresponding matrix

Q_s = P_s − A_sᵀP_sA_s  (38)

In [10] the results were extended to the complex case, and simplifications analogous to those provided by the Liénard-Chipart criterion were obtained for the real case.

3 Applications
3.1 Estimation of the Margin of Stability
In [14] the margin of stability of a discrete system given by the nth order difference equation

y(k + n) + a_1 y(k + n − 1) + ⋯ + a_n y(k) = 0  (39)

is determined using the weighted square error. The state variable representation is given by

ξ(k + 1) = Aξ(k)  (40)


where A is in the companion form. A can be transformed to the form (36):

ζ(k + 1) = A_sζ(k)  (41)

Let the Lyapunov function be

V = ζᵀP_sζ,  ΔV = −ζᵀQ_sζ  (42)

P_s is given by (37) and Q_s by (38). The margin of stability is determined using the weighted sum of square errors J = Σ_{k=0}^∞ kʳy²(k). Since T defined in (35) is a lower triangular matrix with (1,1) entry equal to 1,

$$
J=\sum_{k=0}^{\infty}k^{r}y^{2}(k)=\sum_{k=0}^{\infty}k^{r}x_{1}^{2}(k)=\sum_{k=0}^{\infty}k^{r}\zeta_{1}^{2}(k)
\tag{43}
$$

For r = 0,

$$
J_0 = \zeta^{\mathsf T}(0)\Bigl[\sum_{k=0}^{\infty} (A_s^k)^{\mathsf T} Q_s A_s^k\Bigr]\zeta(0)
\tag{44}
$$

which can be computed directly for different initial conditions. For example, if

y(0) = y(1) = ⋯ = y(n − 2) = 0  and  y(n − 1) = 1,

then the margin of stability [14] satisfies

$$
\sigma = 1 - \max_i|\lambda_i| > \frac{1}{2^{2n-1}J_0}
\tag{45}
$$

where λ_i are the roots of (39). For the weighted sum of square errors (r ≥ 1), J_r can be obtained in terms of solutions of the Lyapunov equation for linear discrete systems [14].
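The quantity J_0 and the bound (45) can be checked numerically by simulating (39) from the stated initial condition. A sketch with illustrative coefficients (the polynomial below is not taken from the text):

```python
def margin_bound(a, n_steps=2000):
    """Simulate y(k+n) = -a_1 y(k+n-1) - ... - a_n y(k) from
    y(0) = ... = y(n-2) = 0, y(n-1) = 1, accumulate J0 = sum y(k)^2,
    and return (J0, lower_bound) with lower_bound = 1/(2**(2n-1) * J0),
    cf. (44)-(45) for r = 0."""
    n = len(a)
    y = [0.0] * (n - 1) + [1.0]
    J0 = sum(v * v for v in y)
    for _ in range(n_steps):
        # reversed(y[-n:]) pairs a_1 with y(k+n-1), ..., a_n with y(k)
        nxt = -sum(ai * yi for ai, yi in zip(a, reversed(y[-n:])))
        y.append(nxt)
        J0 += nxt * nxt
    return J0, 1.0 / (2 ** (2 * n - 1) * J0)
```

For F(z) = z² − 0.5z + 0.06 (roots 0.2 and 0.3) the true margin is σ = 1 − 0.3 = 0.7, and the returned lower bound is indeed smaller, as (45) requires.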

3.2 Identification and Approximation of Systems


Dourdoumas [16] transformed the system

ξ(k + 1) = Aξ(k) + b u(k),  y(k) = cᵀξ(k)  (46)

where A is in companion form and bᵀ = [0 ⋯ 0 1], to the form

ζ(k + 1) = Mζ(k) + b̂ u(k),  y(k) = ĉᵀζ(k)  (47)

where M is in Mansour form (29). The transformation matrix T is determined from AT = TM and b = T b̂.


In this case there are 2n parameters to be identified in the system (47). The advantage of the transformation to the form (47) is that M is in Hessenberg form, which has some numerical advantages. The performance criterion is chosen as

J = Σ_{k=1} |y*(k) − y(k)|  (48)

where y*(k) is the measured output.


Stability is assured if |Δ_i| < 1. To use optimization without constraints on the parameters, Δ_i is transformed as follows:

Δ_i = −(1 − ε) + 2(1 − ε) sin² Δ̂_i  (49)
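The reparameterization (49) keeps each Δ_i inside [−(1 − ε), 1 − ε] for any unconstrained Δ̂_i. A sketch; the function name and the choice ε = 10⁻³ are illustrative:

```python
import math

def delta_from_unconstrained(x, eps=1e-3):
    """Map an unconstrained parameter x to Delta in [-(1-eps), 1-eps],
    cf. (49): Delta = -(1-eps) + 2*(1-eps)*sin(x)**2."""
    return -(1.0 - eps) + 2.0 * (1.0 - eps) * math.sin(x) ** 2
```

Since sin² ranges over [0, 1], the image is exactly [−(1 − ε), 1 − ε], so any optimizer can search over x freely while the identified model stays stable.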

3.3 Representation of Lattice Filters


Takizawa et al. [17] use the Mansour form to represent lattice filters. The system (46) can be transformed to

ζ(k + 1) = Mζ(k) + b̂ u(k),  y(k) = ĉᵀζ(k)  (50)

where M is in Mansour form,

b̂ᵀ = [Δ_{n−1} Δ_{n−2} ⋯ Δ_1 1],  ĉᵀ = [ĉ_1 ĉ_2 ⋯ ĉ_n]

The system (50) can be represented by the cascade structure of Fig. 1, with a diagonal matrix transformation

T = diag[(1 + ε_nΔ_n), (1 + ε_nΔ_n)(1 + ε_{n−1}Δ_{n−1}), …, ∏_i(1 + ε_iΔ_i)]  (51)

where ε_i = ±1.

Fig. 1


Fig. 2

The above representation (50) can be transformed, via the diagonal transformation (51), to a form (52) with system matrix M′, input vector b̂′ and output vector ĉ′ whose entries are built from the Δ_i and the factors (1 + ε_jΔ_j), e.g. −Δ_{n−1}Δ_n, −Δ_{n−2}Δ_n(1 + ε_{n−1}Δ_{n−1}), −Δ_{n−2}Δ_{n−1}, …, −Δ_1Δ_2, −Δ_2(1 + ε_1Δ_1), and b̂′ with leading entry Δ_{n−1}(1 + ε_nΔ_n).

The system (52) can be represented by the lattice digital filter of Fig. 2. This realization uses the least number of multipliers. Normally, there is a direct coupling between u and y. The signs in Fig. 2 must be chosen in the same sequence and are determined so that the coefficient sensitivity and the output noise become small. The structure corresponds to a lossless cascade transmission line.

3.4 Model Reduction of Discrete Time Systems


In [18] the Badreddin-Mansour method of model reduction of SISO discrete systems was presented. The idea of model reduction is explained as follows. Let

ξ(k + 1) = Mξ(k) + b̂ u(k),  y(k) = ĉᵀξ(k)  (53)

where M is the Mansour matrix (29),

b̂ = [1 0 ⋯ 0]ᵀ,  ĉᵀ = [c_1 c_2 ⋯ c_n]


It is assumed that (53) is stable, i.e. |Δ_i| < 1. If |Δ_n/Δ_{n−1}| is sufficiently small, then ξ_n will reach its quasi-steady state much faster than ξ_{n−1}. From the last system equation in (53) we get, after putting ξ_n(k) instead of ξ_n(k + 1) on the left side of the equation,

$$
\xi_n(k) = -\frac{\Delta_n}{1+\Delta_{n-1}\Delta_n}\,\xi_1(k)
-\frac{\Delta_1\Delta_n}{1+\Delta_{n-1}\Delta_n}\,\xi_2(k)
-\cdots
-\frac{\Delta_{n-2}\Delta_n}{1+\Delta_{n-1}\Delta_n}\,\xi_{n-1}(k)
\tag{54}
$$

Substituting in equation (53) we get the new system matrix (55), which is again of Mansour type (with trailing entries 0, 1 − Δ²_{n−2}, −Δ_{n−2}Δ̂_{n−1}), where

$$
\hat\Delta_{n-1} = \frac{\Delta_{n-1}+\Delta_n}{1+\Delta_{n-1}\Delta_n}
\tag{55}
$$

The matrix M̂ is of the same form. One can easily prove that stability and the steady-state response are preserved after reduction. Further reduction can be continued in the same manner.
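The reduction step acts on the Schur-Cohn parameters alone. A minimal sketch, assuming the Δ̂_{n−1} formula above:

```python
def reduce_once(deltas):
    """One Badreddin-Mansour reduction step on the Schur-Cohn parameters:
    (Delta_1, ..., Delta_n) -> (Delta_1, ..., Delta_{n-2}, Delta_hat) with
    Delta_hat = (Delta_{n-1} + Delta_n) / (1 + Delta_{n-1} * Delta_n)."""
    *head, d_prev, d_last = deltas
    return head + [(d_prev + d_last) / (1.0 + d_prev * d_last)]
```

Since Δ̂_{n−1} is the same addition formula as for hyperbolic tangents, |Δ̂_{n−1}| < 1 whenever |Δ_{n−1}|, |Δ_n| < 1, which is one way to see that stability is preserved; repeated calls continue the reduction.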
In [19] and [20] two multivariable Mansour forms for model reduction are derived. The first one is obtained from the Luenberger first form using a similarity transformation.
Here A consists of companion-type diagonal blocks with coupling entries (denoted by ×), and B has a unit entry in the last row of each block (56). The similarity transformation then yields a block matrix

M = [M_ij],  i, j = 1, …, m  (57)


The diagonal blocks M_ii are in Mansour form. The extra degree of freedom in the coupling elements in (57) can be used to get a nice structure of M. For example, for 2 inputs (m = 2),

$$
M=\begin{bmatrix} A_{11} & 0 \\ \times & A_{22} \end{bmatrix}
\tag{58}
$$
The reduction of the first or the second subsystem can be done independently by the same method used in the single-input case.

The second multivariable normal form [20] is obtained by first transforming the Luenberger first form to the block-controllability form by means of elementary similarity transformations, and then applying a generalized matrix Schur-Cohn-Jury table. This second form is then constructed in the same manner as in the single-input case.

Let the nth order system with m inputs be described by the block-controllability form

$$
A=\begin{bmatrix}
0 & I_m & & \\
\vdots & & \ddots & \\
0 & & & I_m \\
-A_q & -A_{q-1} & \cdots & -A_1
\end{bmatrix}
\tag{59}
$$

The second multivariable Mansour form will be

$$
M=\begin{bmatrix}
-\theta_1 & I-\theta_1^2 & & \\
-\theta_2 & -\theta_1\theta_2 & \ddots & \\
\vdots & & & I-\theta_{q-1}^2 \\
-\theta_q & -\theta_1\theta_q & \cdots & -\theta_{q-1}\theta_q
\end{bmatrix}
\tag{60}
$$

The elements θ_1, θ_2, …, θ_q are m × m matrices obtained from the matrix Schur-Cohn-Jury table.
The reduction procedure is similar to the single-input case. (For details of the transformation and model reduction see [20].)

In [21] further justification of the Badreddin-Mansour method is presented, where the connections between root location properties, coefficient properties and Schur-Cohn coefficient properties are established. It is also shown that the lattice realization of the reduced system differs from the lattice realization of the original system by replacing the last delay by a unity-gain element.


In [22] it was shown that the elimination of row 1 and column 1, or the elimination of row i − 1 and column i for i = 2, 3, …, n, of the Mansour matrix (29) results in a Mansour matrix of dimension n − 1. This corresponds to the elimination of the first subsystem or of the ith subsystem, i = 2, 3, …, n, in the cascade realization.

3.5 Model Reduction of Two Dimensional Discrete Systems


In [23] the one-dimensional reduction method of Badreddin-Mansour is extended to two-dimensional (2-D) discrete systems. The Roesser model of a 2-D system utilizes two kinds of state variables: one which propagates horizontally and the other vertically. The model for a 2-D linear shift invariant SISO discrete system is given by

$$
\begin{bmatrix} x^h(i+1,j) \\ x^v(i,j+1) \end{bmatrix}
=
\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}
\begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix}
+
\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} u(i,j)
$$

$$
y(i,j) = \begin{bmatrix} C_1 & C_2 \end{bmatrix}
\begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + D\,u(i,j)
\tag{61}
$$

A similarity transformation transforms the companion forms A_1 and A_4 to Mansour matrices M_1 and M_4. The matrices M_1 and M_4 are the horizontally propagating and the vertically propagating sections of the 2-D system. The off-diagonal matrices represent the interconnection between the two sections. The model reduction of the horizontal or vertical section is done similarly to the 1-D case. For the special case of separable systems (M_2 = 0 or M_3 = 0) the reduced system will preserve the stability of the original system and also the steady-state unit response. The same is the case for a 2-D 1h-1v system. In the other cases stability is not, in general, guaranteed.

In [24] the above results are extended to 2-D multivariable discrete systems. Two new 2-D multivariable canonical forms are derived along the same lines as the two 1-D multivariable canonical forms obtained in [19] and [20].

Conclusion

The Schwarz canonical form of continuous systems and the discrete Schwarz canonical form, or Mansour form, of discrete systems are discussed. An overview is given of the characteristics and applications of the Mansour form in the areas of determination of the margin of stability, identification, signal processing using lattice filters, and model reduction of one-dimensional and two-dimensional systems. The work of Kalman and Bertram in 1960 on using Lyapunov theory to prove the stability of the Schwarz form inspired the subsequent work for the discrete case.


References
[1] Wall, H.S.: Polynomials whose zeros have negative real parts. Amer Math Monthly, 52 (1945), pp 308-322
[2] Schwarz, H.: Ein Verfahren zur Stabilitätsfrage bei Matrizen-Eigenwertproblemen. Zeit f angew Math u Physik, 1956, pp 473-500
[3] Kalman, R., Bertram, J.: Control system analysis and design via the second method of Lyapunov. J of Basic Engineering, June 1960, p 371
[4] Parks, P.: A new proof of the Hurwitz stability criterion by the second method of Lyapunov with applications to optimum transfer functions. Fourth Joint Automatic Control Conference, June 1963
[5] Gantmacher, F.R.: Applications of the theory of matrices. Interscience, New York, 1959
[6] Mansour, M.: Stability criteria of linear systems and the second method of Lyapunov. Scientia Electrica, Vol XI, 1965, pp 65-104
[7] Mansour, M.: Die Stabilität linearer Abtastsysteme und die zweite Methode von Lyapunov. Regelungstechnik, Heft 12, 1965, pp 592-596
[8] Chen, C.F., Chu, H.: A matrix for evaluating Schwarz form. IEEE Trans on Aut Control, 1966, pp 303-305
[9] Puri, N., Weygandt, C.: Second method of Lyapunov and Routh canonical form. J of the Franklin Institute, Nov 1963, p 365
[10] Anderson, B.D.O., Jury, E.I., Mansour, M.: Schwarz matrix properties for continuous and discrete time systems. Int J Control, Vol 23, 1976, pp 1-16
[11] Schur, I.: Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind. J Reine u Angewandte Math 147 (1917), pp 205-232
[12] Cohn, A.: Über die Anzahl der Wurzeln einer algebraischen Gleichung in einem Kreise. Math Zeit 14 (1922), pp 110-148
[13] Jury, E.I.: Theory and Application of the z-Transform Method. Krieger, 1982
[14] Mansour, M., Jury, E.I., Chaparro, L.F.: Estimation of the margin of stability for linear continuous and discrete systems. Int J Control, Vol 30, 1979, pp 49-69
[15] Mansour, M.: A note on the stability of linear discrete systems and Lyapunov method. IEEE Trans on Aut Control, Vol AC-27, No 3, June 1982, pp 707-708
[16] Dourdoumas, N.: Ein Beitrag zur Identifikation und Approximation von Systemen mit Hilfe linearer diskreter mathematischer Modelle. Archiv für Elektrotechnik 62 (1980), pp 1-4
[17] Takizawa, M., Kishi, H., Hamada, N.: Synthesis of lattice digital filters by the state variable method. Electron Commun Japan, 65-A, pp 27-36
[18] Badreddin, E., Mansour, M.: Model reduction of discrete time systems using the Schwarz canonical form. Electron Lett, Vol 16, No 20, Sep 25, 1980, pp 782-783
[19] Badreddin, E., Mansour, M.: A multivariable normal form for model reduction of discrete-time systems. Syst Control Lett 4 (1983), pp 271-285
[20] Badreddin, E., Mansour, M.: A second multivariable normal form for model reduction of discrete-time systems. Syst Control Lett 4 (1984), pp 109-117
[21] Anderson, B.D.O., Jury, E.I., Mansour, M.: On model reduction of discrete time systems. Automatica, Vol 22, No 6, 1986, pp 717-721
[22] Antoulas, A.C., Mansour, M.: On stability and the cascade structure. Technical Report, 1990
[23] Jury, E.I., Premaratne, K.: Model reduction of two-dimensional discrete systems. IEEE Trans on Circuits and Systems, Vol CAS-33, No 5, May 1986, pp 558-562
[24] Premaratne, K., Jury, E.I., Mansour, M.: Multivariable canonical forms for model reduction of 2-D discrete time systems. IEEE Trans on Circuits and Systems, Vol CAS-37, No 4, April 1990, pp 488-501

Chapter 6

Identification and Adaptive Control

Finite Dimensional Linear Stochastic System Identification

P. E. Caines
Canadian Institute for Advanced Research and Department of Electrical Engineering, McGill University, Montreal, Quebec, Canada H3A 2A7

Introduction

The use of state space systems to represent dynamical systems of various types has a long history in mathematics and physics. Two fundamental examples are the well-known methods for analysing high order ordinary differential equations via the corresponding first order equations, and the formulation of Hamiltonian mechanics, within which classical celestial mechanics forms a magnificent special case. Equally, in theoretical engineering, including the information and control sciences, state space methods have played and continue to play a fundamental role. From a mathematical viewpoint, the pure state evolution models of dynamical system theory undergo an enormous generalization in control theory to input-output systems, which then fall into equivalence classes according to their input-output behavior. Furthermore, from an engineering viewpoint, a vast array of problems are either initially posed in state space terms (aerospace engineering provides many examples) or are first presented in terms of input-output models which are then transformed into input-state-output form (process control being one source of examples). As a result, one of the basic subjects of study in mathematical system theory is the relationship between input-state-output systems and input-output systems. The generality of this relationship is conveyed by the Nerode Theorem, which states that any non-anticipative (set-valued) input-(set-valued) output system S possesses a state space realization Σ(S); that is to say, there exists an input-state-output system Σ(S) which generates the same input-output trajectories as S.

One of the fundamental contributions of Professor Kalman is the recognition of the great importance of time invariant finite dimensional linear input-state-output systems (henceforth simply called linear state space systems) in the mathematical study of dynamical control systems, and of the consequences of this for theoretical engineering, in particular circuit theory and control theory. Three major aspects of this theory, which are due in whole or in part to Professor Kalman, are, first, the solution of the prediction problem for stochastic processes generated by state-space systems; second, the solution of the realization problem for time invariant linear input-output systems of finite Smith-McMillan degree (henceforth called linear input-output systems); and, third, the initiation of the


classification of the topological structure of the moduli spaces of linear state space systems. The notions of observability and controllability play a fundamental role in each of these three topics.
In this section of the Festschrift we present two basic results from the theory of linear stochastic system identification, where by system identification we mean system modelling and parameter estimation taken together. The first is a precise statement and (to the best of the author's knowledge) a novel proof of a system identifiability theorem. This states that under certain experimental conditions only one system in a given general class of linear systems can generate a given set of input-output covariances. In this theorem the covariances of the input-output processes take the exact (deterministic) value given by the corresponding integrals with respect to the underlying probability measure. The second result is a major system identification theorem concerning the consistency and asymptotic normality of system parameter estimates: it states that a well known type of estimator (the maximum likelihood estimator) is such that, along the random sample paths of the observed process, the parameter estimate (which is a function of the observations and hence is itself a random process) converges with probability one to the value of the parameter for the system generating the observations. This is the strong consistency property. Furthermore, if the (asymptotically vanishing) parameter estimation error is blown up by √N, where N is the length of the block of observations, it is shown that the resulting random variable has a limiting normal distribution with zero mean and a covariance matrix of a particular form. This is the asymptotic normality property. It applies a microscope, so to speak, to the estimation error process and reveals its probabilistic structure as it shrinks to zero. Incidentally, this permits the application of statistical test procedures to detect whether certain parameters take any particular specified values (in a given co-ordinate system). It should perhaps be stressed that the first result mentioned above is a structural result concerning the system and its environment, and the second is a probabilistic result (which depends upon structural information).

It is hoped that the detailed presentation in this chapter of these two results will exhibit the effective interplay of each of the three major aspects of system theory in (i) the modelling of a significant class of stochastic systems, (ii) the formulation of the parameter estimation problem for this class (including the development of a theory of identifiability) and (iii) the analysis of a parameter estimation technique (including asymptotic statistical results).
Much of the discussion in this chapter is based upon the text Linear Stochastic Systems (Caines [1988]). In the sections of that book concerned with system identification we formulate the general theory of Approximate System Models (ASMs) and the corresponding set of Prediction Error Methods (PEMs) for the estimation of the parameters appearing in ASMs. (The origins of this theory lie in papers by L. Ljung, P.E. Caines and in papers they co-authored. The reader is referred to the aforementioned text for details.) In a nutshell, this theory posits the existence of a given observed joint stochastic process (y, u) on a probability space (Ω, ℬ, P), whose components are an ℝᵖ-valued process y
6 Identification and Adaptive Control-Linear Stochastic Identification

391

(labeled the output process) and an ℝᵐ-valued process u (labeled the input process). A parameterized class of predictors {ŷ_k(θ) = f(y_1^{k−1}, u_1^k, θ); k ∈ ℤ_1} is specified, where θ lies in a ν-dimensional manifold Θ. Here ℤ_1 denotes the natural numbers 1, 2, …. The resulting set of representations

y_k = ŷ_k(θ) + e_k(θ),  k ∈ ℤ_1  (1)

where e_k(θ) ≜ y_k − ŷ_k(θ), k ∈ ℤ_1, is called a family of approximate system models for (y, u). We emphasize that a parameter is an element of a parameter space of systems, which is a set of systems endowed with a manifold structure; the term parameter is not to be confused with any particular co-ordinate description of a parameter. In this connection the reader is referred to Kalman [1980].
Restricting the most general formulation somewhat for the sake of brevity, we now introduce a family of continuous loss functions l_k(·,·): ℝᵖ × Θ → ℝ¹, k ∈ ℤ_1, that yield the cost function L_N(·,·): ℝ^{Np} × Θ → ℝ¹, N ∈ ℤ_1, on a sample path of prediction errors e(θ), via

$$
L_N(e(\theta), \theta) \triangleq \frac{1}{N}\sum_{k=1}^{N} l_k(e_k(\theta), \theta),
\qquad N \in \mathbb{Z}_1
\tag{2}
$$

For each N ∈ ℤ_1, one may now define the sample path dependent minimum prediction error (MPE) estimator θ̂_N = θ̂_N(y_1^{N−1}, u_1^N) as a (y_1^{N−1}, u_1^N)-measurable selection of a parameter in the set arg min_{θ∈Θ} L_N(e(θ), θ) (whenever the minimum exists), and define the deterministic optimal (ASM) estimator Θ̄_N as the set of parameters arg min_{θ∈Θ} E L_N(e(θ), θ) (again whenever the minimum exists).
In this general framework, for a given family of ASMs and loss functions, we say that θ is (asymptotically) identifiable if there exists a unique limit θ̄ ∈ Θ in the manifold topology for all sequences chosen from the sequence of sets {Θ̄_N; N ∈ ℤ_1}, and we say that the prediction error estimator process {θ̂_N; N ∈ ℤ_1} is (strongly) consistent if θ̂_N → θ̄ a.s. as N → ∞, again in the manifold topology.

The benefits of the generality of this formulation include the following:

(1) A coherent set of strong consistency, asymptotic normality and asymptotic accuracy results is obtained within the ASM-PEM framework (see Chap. 8, Caines [1988]) for (roughly speaking) not necessarily linear predictor systems with geometrically decaying weighting patterns.
(2) The generality of the theory permits a systematic analysis of generic nonlinear stochastic system identification problems where the exact probability distributions of the observed processes are not necessarily Gaussian and may not possess higher moments. In such situations a theory of approximation is essential, since it is not possible even in principle to estimate the complete set of finite dimensional distributions of the input-output process (or equivalent data); the reason for this is the growth, in the general case, of the computational complexity of any estimation procedure as a function of the size of the data blocks. This stands in stark contrast to (i) the linear stochastic state-space and input-output system case, and (ii) the ASM case, where the complexity of the predictors is chosen a priori.

392

P. E. Caines

(3) In the most general version of the theory, (0.3) includes a term that measures the complexity of the model, and {Θ̄_N; N ∈ ℤ_1} then automatically takes account of the trade-off between the minimization of prediction errors along a given block of data and the corresponding increase in the complexity of the best fitting model. This trade-off is a fundamental issue in the philosophy of science, and the reader is referred to Chapter 5 of Caines [1988] for a discussion of this topic together with a bibliography. Our formulation permits one, for example, to express the Hannan-Quinn and Hannan-Deistler theories (see Hannan-Deistler [1988] and Chap. 8, Caines [1988]) of system parameter and structure estimation within a coherent overall framework. Furthermore, the ASM-PEM formulation is at least consistent with recent theories of parameter and structure estimation, such as those due to Willems [1986, 1987] and Rissanen [1989]. (The latter presents a generalized form of Shannon information theory as its framework for the study of system identification, and for the study of statistics in general.)
(4) The ASM-PEM formulation of system identification specializes smoothly to the standard central problems of system identification, such as the maximum likelihood identification of Gaussian state-space and autoregressive moving average systems with exogenous observed inputs (called ARMAX systems).

The important case introduced in (4) is the subject of this chapter; it involves restrictions of the general case in each of its crucial features: first, the system generating the observations, not the posited predictor set, is taken to be a set of time invariant linear state space or ARMAX systems (naturally, this yields linear predictors for the observations in the Gaussian case); second, all processes are taken to be Gaussian or deterministic; and, third, the maximum likelihood method of parameter estimation is adopted; in the Gaussian case this makes the maximum likelihood estimator θ̂_N identical to the MPE estimator obtained by taking the loss functions {l_k(·,·); k ∈ ℤ_1} to be the negative logarithms of the conditional distributions of the observations.

Although these restrictions may seem severe, we shall argue in Sect. 1 below that the Gaussian ARMAX and state-space case is important if one adopts certain standard system modelling hypotheses. Furthermore, as already stated, this case involves a fascinating interplay of ideas from finite dimensional linear system theory.
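As a minimal concrete instance of the MPE estimator, consider the scalar ASM with predictor ŷ_k(θ) = θ y_{k−1} and quadratic loss, for which the minimizer of L_N has a closed form. Everything here (the model, the noise, the sample size) is an illustrative assumption, not taken from the text:

```python
import random

def mpe_estimate_ar1(y):
    """Closed-form minimizer of the quadratic prediction-error cost
    L_N(theta) = (1/N) * sum_k (y_k - theta * y_{k-1})**2 for the toy
    scalar predictor y_hat_k(theta) = theta * y_{k-1}."""
    num = sum(y[k] * y[k - 1] for k in range(1, len(y)))
    den = sum(y[k - 1] ** 2 for k in range(1, len(y)))
    return num / den

# Generate observations from a true AR(1) system and estimate theta.
random.seed(0)
theta_true = 0.6
y = [0.0]
for _ in range(20000):
    y.append(theta_true * y[-1] + random.gauss(0.0, 1.0))
theta_hat = mpe_estimate_ar1(y)
```

As N grows, θ̂_N approaches θ_true along the sample path, which is the strong consistency phenomenon discussed above in its simplest setting.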

1 Gaussian ARMAX and State Space Models

In this section we show that the imposition of a sequence of standard system modelling hypotheses takes one from a general class of observed joint input-output processes to the important class of processes generated by an asymptotically stable finite dimensional linear time invariant system, where the system acts on an input process composed of an observed stochastic or deterministic process u and an unobserved stochastic process w. Moreover, this class of systems has representations both in the autoregressive moving average form (with exogenous inputs), known as ARMAX representations, and in input-state-output form (see equations (1.6) below).
We begin by considering a class 𝒴 of p-component stochastic processes defined on the probability space (Ω, ℬ, P), y ∈ 𝒴 being the sum of a process ζ, generated nonanticipatively from an observed input u in a given class 𝒰, and a process ξ in the class Ξ of processes generated from a class of unobserved inputs. So we write

y_k = ζ_k + ξ_k,  k ∈ ℤ  (1.1)

where ζ_k = ζ_k(u^k_{−∞}), u_k = u_k(ω), for all k ∈ ℤ.


Our treatment of the identification problem will only include (nonanticipative) linear systems that have finite dimensional realizations because, in a sense to be explained subsequently, they have finite dimensional parameterizations and, moreover, they map Gaussian processes to Gaussian processes. This permits exact system modelling, by which we mean the exact specification of the finite dimensional distributions of y or (y, u) as appropriate. Further, we are interested in longitudinal asymptotic properties of the estimators. These considerations lead us to restrict our attention to asymptotically stable linear time invariant systems, since they are definitely capable of some form of asymptotically stationary behavior, whether driven by deterministic or stochastic processes. Once this restriction is accepted, there is the prospect of an asymptotic analysis of the behavior of various estimators via properties of the strong law and ergodic type.
The process ζ is taken to be the output process of an asymptotically stable finite dimensional linear time invariant system Z described by the (p × m) transfer function matrix Z(z) = Σ_{i=0}^∞ Z_i z^i. The m-component input process u to Z will be taken to be directly observed, defined on (Ω, ℬ, P), and will initially be assumed to satisfy one of the conditions:

INP 1A: u is deterministic (i.e. {φ, Ω}-measurable) and bounded.
INP 1B: u is an asymptotically stationary stochastic process.

We shall make the hypotheses on u more restrictive as the discussion proceeds.

Turning to the class of processes Ξ, which are generated by the unobserved inputs, we again look for the properties of finite dimensional parameterizability (to facilitate exact system modelling) and for some form of stationarity (to get longitudinal asymptotic results).

A further modelling restriction to be placed on processes in Ξ is that they have no linearly deterministic part; that is, ξ_k cannot be decomposed as ξ_k′ + ξ_k″ with ξ_k″ = Ê(ξ_k″ | ℋ_{k−1}) a.s. and ξ″ non-zero.


This assumption is made purely for the sake of simplicity. In applications, it is often useful to employ models which involve nonstationary deterministic parts, such as trends, or stationary linearly deterministic parts.

The same holds for the mean value process of the observed process y; in order to keep the system modelling theory in this chapter simple we shall consider only those non-zero biases that can be represented by the signal ζ generated from u via Z.

A penultimate step in this discussion is to observe that although any p-component process ξ under consideration is assumed to have no linearly deterministic part, it still need not be of full rank. If ξ is not of full rank, then the same is true for y, under INP 1A, since ζ is predictable given the impulse response of the system Z and the input u. In this case, it is a perfectly reasonable simplification of the identification problem just to take a maximal subset of the components of y for which there are no non-zero linear functionals that are exactly predictable. If we assume this has already been done, we are free to assume that ξ is of full rank.
Obviously one of the most important types of processes ξ that satisfy the specifications above is the class of wide sense stationary processes. By a basic result of stochastic realization theory (see Theorem 1.7, Chap. 4 of Caines [1988]), ξ is a p-component full rank wide sense stationary process with a rational spectral density matrix Φ, where det Φ does not vanish on T = {z: |z| = 1, z ∈ ℂ}, if and only if there exists a representation of ξ of the form

$$
\xi_k = \sum_{i=0}^{\infty} W_i w_{k-i},
\qquad W_0 = I, \quad \text{a.s.}, \quad k \in \mathbb{Z}
\tag{1.2}
$$

where W(z) is a (p × p) asymptotically stable, inverse asymptotically stable matrix of rational functions, and w is a p-component wide sense stationary orthogonal process with a strictly positive covariance matrix Σ. In addition, ξ has zero mean if and only if w has zero mean.

So it is convenient at this point to isolate the following assumptions on the process w:

INP 2: The process w defined on (Ω, ℬ, P) is a zero-mean wide-sense stationary orthogonal process with E w_k w_jᵀ = Σ δ_{kj}, k, j ∈ ℤ, and Σ ∈ 𝒫, where 𝒫 denotes the set of (p × p) strictly positive symmetric matrices.

By the theorem quoted above, when the system W acts on a process w satisfying INP 2 it generates wide sense stationary processes satisfying the hypotheses we have declared to be desirable. For this reason, we settle on systems with the stated properties of W as the systems generating ξ. It should be noted, however, that we shall use this assumption from here on without assuming ξ to be wide sense stationary. (Equivalently: W will act on w for k ∈ ℤ₊, but the system will not, in general, be given the initial state covariance corresponding to wide sense stationary state and output processes.)
Following this discussion, which has led us from the properties ofthe process
dass l1Jj to an admissible set of system models for gene rating certain members
of 11Jj, we concentrate on the description of the systems themselves. Our starting

6 Identification and Adaptive Control-Linear Stochastic Identification

395

point is the formal (positive) power series equation

    y(z) = [Z(z), W(z)] [u(z); w(z)],        (1.3)

where it is assumed that (i) the matrix [Z(z), W(z)] of formal transfer functions
is the positive power series expansion of a (p × (m + p)) matrix [Z(z), W(z)] =
[Z_*(z⁻¹), W_*(z⁻¹)] of rational functions of z with Smith-McMillan degree δ
in z⁻¹, where (ii) [Z(z), W(z)] is non-anticipative as an operator on formal
power series and hence is proper, satisfying

    [Z(z), W(z)]|_{z=0} = [Z_*(z⁻¹), W_*(z⁻¹)]|_{z⁻¹=∞} < ∞,

and which is such that (iii) all entries have asymptotically stable poles (which
implies in particular [Z(0), W(0)] < ∞), (iv) det W(z) has asymptotically stable
zeros, and (v) W(0) = I.
In this case, we know from realization theory (see Kalman [1963, 1965],
Heymann [1975], and for recent discussions Bokor and Keviczky [1987],
Delchamps [1988], Hannan and Deistler [1988] and Deistler and Gevers
[1988]) that [Z(z), W(z)] above has the following representation.
The ARMAX Representation
The matrix of rational functions [Z(z), W(z)] satisfying [Z(0), W(0)] < ∞ has
a left matrix fraction description

    [Z(z), W(z)] = A_*⁻¹(z⁻¹)[B̃_*(z⁻¹), C̃_*(z⁻¹)] + [Z(0), W(0)]
                 = A_*⁻¹(z⁻¹)[B_*(z⁻¹), C_*(z⁻¹)],        (1.4)

where W(0) = I, and A_*, B_*, C_* are respectively (p × p), (p × m) and (p × p)
polynomial matrices such that (i) A_*⁻¹(z⁻¹) exists almost everywhere and
det A_*(z⁻¹) has asymptotically stable zeros, (ii) det[C_*(z⁻¹) + A_*(z⁻¹)] has
asymptotically stable zeros, and (iii) deg det A_*(z⁻¹) = δ and A_*(z⁻¹) and
[B_*(z⁻¹), C_*(z⁻¹)] have no common left matrix divisors other than unimodular
matrices (i.e. A_*(z⁻¹) and [B_*(z⁻¹), C_*(z⁻¹)] are left coprime), or, equivalently,
(iii') deg det A_*(z⁻¹) in (1.4) has the smallest (integer) value δ amongst all such
representations.
Consideration of the Smith-McMillan form of [Z(z), W(z)] subject to
[Z(0), W(0)] < ∞ shows there exists an ARMAX matrix fraction description
[Z(z), W(z)] = A⁻¹(z)[B(z), C(z)], A(0) = I, C(0) = I, which has the same
stability and coprimeness properties as (1.4). Such a description will be said to
satisfy ARMAX.
Equivalently, we know that the following holds.

The SSX Representation

The matrix of rational functions [Z(z), W(z)] satisfying [Z(0), W(0)] < ∞ has
the representation

    [Z(z), W(z)] = H(Iz⁻¹ − F)⁻¹[Gᵘ, Gʷ] + [Dᵘ, I],        (1.5)

396

P. E. Caines

where (H, F, [Gᵘ, Gʷ], [Dᵘ, I]) are respectively (p × δ), (δ × δ), δ × (m + p) and
p × (m + p) real matrices and (i) F is asymptotically stable, (ii) det(H(Iz⁻¹ − F)⁻¹Gʷ + I) has
asymptotically stable zeros, (iii) (H, F) is observable and (F, [Gᵘ, Gʷ]) is
controllable, or, equivalently, (iii') the state space dimension δ in (1.5) is the
smallest amongst all such state space realizations.
SSX stands for state space system with exogenous inputs, and a state space
system satisfying the conditions above will be said to satisfy SSX.
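The equivalence between the ARMAX and SSX representations can be illustrated concretely. The sketch below (a hypothetical first-order scalar example; the parameter values are illustrative, not from the text) realizes the transfer function W(z) = (1 + c z⁻¹)/(1 − a z⁻¹), i.e. the ARMA recursion y_k = a y_{k−1} + w_k + c w_{k−1}, in state-space form with F = a, G = a + c, H = 1, D = 1 (so that W(0) = 1), and checks that both descriptions produce the same impulse response:

```python
# ARMA form: y_k = a*y_{k-1} + w_k + c*w_{k-1}  (normalized so W(0) = 1)
def arma_impulse(a, c, n):
    h = [1.0, a + c]                   # h_0 = 1, h_1 = a + c
    for _ in range(n - 2):
        h.append(a * h[-1])            # h_i = a*h_{i-1} for i >= 2
    return h[:n]

# State-space form: x_{k+1} = F*x_k + G*w_k, y_k = H*x_k + D*w_k
def ss_impulse(F, G, H, D, n):
    h, x = [D], G                      # response to the unit impulse w_0 = 1
    for _ in range(n - 1):
        h.append(H * x)
        x = F * x
    return h

a, c = 0.7, 0.3
assert all(abs(p - q) < 1e-12
           for p, q in zip(arma_impulse(a, c, 10), ss_impulse(a, a + c, 1.0, 1.0, 10)))
```

Here |a| < 1 and |c| < 1, so F is asymptotically stable and det W has an asymptotically stable zero, in line with the conditions stated above.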
We now specify our admissible set of ARMAX and state space systems by
considering the set of linear systems having the properties ARMAX or SSX
above. To be specific, we adopt the following definition:

Definition 1.1. For given p, δ ∈ Z₁ and m ∈ Z₊, 𝒮(p, m, δ) shall denote the set of
time-invariant finite-dimensional ARMAX and state-space systems, each of
whose members possesses a (p × (m + p)) transfer function [Z(z), W(z)] such
that there exists:
(1) A matrix fraction description (A(z), [B(z), C(z)]) satisfying ARMAX, or,
equivalently,
(2) A state space description (H, F, [Gᵘ, Gʷ], [Dᵘ, I]) satisfying SSX. When m = 0
this will indicate that we are dealing with the case where u is absent. □
Now the set of (r × s)-block Hankel matrices of rank δ, equivalently (r × s)
matrices of rational functions of Smith-McMillan degree δ satisfying Z(0) = 0,
is an (analytic) manifold M^{v'} of dimension v' = δ(r + s); see Kalman [1974],
Clark [1976], Hazewinkel and Kalman [1976], Hazewinkel [1977], and
Rissanen and Ljung [1976], and, for related studies, Byrnes [1977], Byrnes and
Hurt [1978], Hannan [1979], Delchamps [1985], Deistler [1983], and Hannan
and Deistler [1988]. Further, since we must include Z(0) in our system
description, we set M^v = M^{v'} × R^{pm}.
By imposing the asymptotic stability and normalization constraints of
ARMAX or SSX we restrict our attention to a certain submanifold N^v of M^v
corresponding to 𝒮(p, m, δ), where v = δ(2p + m) + pm.

Definition 1.2. The parameter space Ψ(p, m, δ), for p, δ ∈ Z₁ and m ∈ Z₊, denotes the
submanifold N^v ⊂ M^v corresponding to the set of systems 𝒮(p, m, δ). □

Ψ(p, m, δ) shall be written Ψ whenever the context leaves no ambiguity
concerning p, m and δ.

Example 1.1. The fact that M^v is a manifold which is not homeomorphic to
Euclidean space is illustrated by the simplest possible case, where the
Smith-McMillan degree δ (in z⁻¹) of the rational functions is 1, or, equivalently,
those for which the formal transfer function (in z) has Hankel matrix of rank
δ = 1. Such rational functions W(z) correspond to the set Σ¹₁,₁ of single input
single output systems for which δ = 1 and W(0) < ∞. They are given by the
set of rational functions (α + βz⁻¹)/(γ + δz⁻¹), where there are no common
factors between numerator and denominator, α and β are not both zero, and
γ ≠ 0; that is, |α| + |β| > 0, αδ − βγ ≠ 0 and γ ≠ 0. This set may also be written as
bz/(a + z) + c, a ≠ 0, b ≠ 0. The manifold structure of this family of functions is
given by M^v = M^{v'} × R¹ ⊂ R³, where M^{v'} is the analytic manifold of dimension
v' = δ(p + m) = 2 which is equal to the disjoint union of the four chart
neighborhoods O_{±±} = {a ≷ 0, b ≷ 0; (a, b) ∈ R²}.
It is evident that M^{v'} × R¹ is in one-to-one correspondence with the formal
transfer functions of the set of systems Σ¹₁,₁ and that 𝒮(1, 0, 1) ⊂ Σ¹₁,₁ is
parameterized by Ψ(1, 0, 1) ≅ M^v ∩ {|a| > 1, |b + 1| < |a|} ∩ {c = 1}.
Observe that in the induced topology on {c = 1} ⊂ R³ the parameter set
Ψ(1, 0, 1) is the open set given by the disjoint union

    {O₊₊ ∪ O₊₋ ∪ O₋₊ ∪ O₋₋} ∩ {|a| > 1, |b + 1| < |a|}.

The fact that M^{v'} × R¹ is the disjoint union of four open connected sets is
also a consequence of the following general fact:

    Rat(n) ≜ { b(z)/a(z) = (b_n zⁿ + b_{n−1} z^{n−1} + ··· + b₀)/(a_n zⁿ + a_{n−1} z^{n−1} + ··· + a₀) :
               |a_n| + |b_n| > 0, a(z) and b(z) relatively prime },   n ∈ Z₊,

is an open submanifold of R^{2n+2} with n + 1 connected components; this was
proved by Brockett [1976]; see Segal [1979] for more information.
Example 1.2. In Example 1.1, the open co-ordinate neighborhoods O_{±±} × R¹
for Σ¹₁,₁ were disjoint and their union covered the parameter space M^{v'} × R¹.
However, if we start from the state space description of a linear system of given
state-space dimension, we rather naturally obtain overlapping coordinate
neighborhoods of the form found in the standard definition of a manifold.
Purely for definiteness, consider the set Σ²₂,₂ of two-input two-output linear
systems of state-space dimension δ = 2 whose transfer functions satisfy Z(0) < ∞.
By virtue of the controllability of any minimal realization, the rank of the
(2 × 4) matrix [G, FG] must be 2. In the case where the first two columns span
R² we can choose the basis for the state space so that G = I₂ (the first two
columns of [G, FG] become the standard basis), and this will yield a system in Σ²₂,₂ whenever
(H, F) is observable. In the case where the first and third columns span R², we may take a basis so that

    [H, F, G]_{Chart 1} = [ [h₁₁ h₁₂; h₂₁ h₂₂], [0 1; α₂ α₁], [0 g₁₂; 1 g₂₂] ],


in the case where the second and fourth columns span R², we may take

    [H, F, G]_{Chart 2} = [ [h₁₁ h₁₂; h₂₁ h₂₂], [0 1; α₂ α₁], [g₁₁ 0; g₂₁ 1] ],
and in the case where the first and fourth columns span R², we may take
[H, F, G]_{Chart 3} with

    H = [h₁₁ h₁₂; h₂₁ h₂₂],   F = [0 1; α₂ α₁],

and with G obtained through the inverse Δ⁻¹[a₂₂ −a₁₂; −a₂₁ a₁₁] of the
corresponding basis matrix, where Δ = (a₁₁a₂₂ − a₁₂a₂₁) ≠ 0, and where in all cases, of course, observability
must hold. And so on.
Let a⁽ⁱ⁾ and A⁽ⁱ⁾ denote the coordinate descriptions of a vector a and a matrix
A with respect to the basis given by the i-th chart. Then we see that the
transformation taking, for instance, the set of coordinates in Chart 1 to those
in Chart 2 (in the overlap of the two neighborhoods) is given by the state-space
transformation T = [g₂⁽¹⁾, F⁽¹⁾g₂⁽¹⁾]⁻¹.
We observe that the description of this covering set of coordinate
charts, together with the associated analytic transformation maps on the
overlaps, explicitly displays Σ²₂,₂ as a v = 2(2 + 2) = 8-dimensional analytical
manifold. □
A unique description of the elements of the parameter space Ψ(p, m, δ) of
ARMAX or SSX systems 𝒮(p, m, δ) can only be obtained by passing to the
equivalence classes of coordinate descriptions of points in 𝒮(p, m, δ). This is
precisely what is denoted by the manifold M⁸ of Σ²₂,₂ in the example above,
and by the parameter space manifold Ψ(p, m, δ) in the general case. Coordinate
charts (with transition functions which are in fact algebraic functions) of Ψ in
terms of the entries of [H, F, G] or (A(z), B(z)) may be given by a specification
of a "nice" basis for the row space of the Hankel matrix; see Hazewinkel and
Kalman [1976].
Canonical ARMAX or state space realizations (for given (p, m, δ)) are defined
to be matrix fraction descriptions or state space descriptions satisfying ARMAX
or SSX respectively which are in bijective correspondence with the elements of
𝒮(p, m, δ). Works on these topics include those of Glover and Willems [1974],
Denham [1974], Dickinson, Kailath, and Morf [1974], Rissanen [1974], Forney
[1975], Rissanen and Ljung [1976], Deistler [1983], Gevers and Wertz [1984],
Van Overbeek and Ljung [1982], Hanzon [1986], and the references contained
in those articles.
Finally, in the modelling of the system generating y in (1.1), we take the
(p × p) covariance matrix Σ of the orthogonal process w and let it vary over the
set 𝒫 of (p × p) symmetric positive definite matrices.
For given p, m, δ this results in Θ ≜ Ψ × 𝒫 as the parameter space for the
systems appearing in (1.3), satisfying ARMAX or SSX, and which are driven
by an orthogonal process with covariance Σ > 0.
We shall take the topology on Θ to be the product topology inherited from
the constituent space Ψ, endowed with its manifold topology, and the space 𝒫,
endowed with the topology of Euclidean space.


Let us assume u satisfies INP 1A or INP 1B and w satisfies INP 2; then, in
an obvious (positive) formal z-transform and parameter subscript notation, we
shall write the system equations on Z₊ as follows.
ARMAX:

    A_ψ(z) y(z) = B_ψ(z) u(z) + C_ψ(z) w(z) + I.C.(z),   ψ ∈ Ψ,        (1.6a(i))
    E w_k w_j^T = Σ δ_{kj},   ∀k, j ∈ Z,   Σ ∈ 𝒫,                      (1.6a(ii))

and SSX:

    x_{k+1} = F_ψ x_k + Gᵘ_ψ u_k + Gʷ_ψ w_k,   k ∈ Z₊,   ψ ∈ Ψ,       (1.6b(i))
    y_k = H_ψ x_k + Dᵘ_ψ u_k + I w_k,                                  (1.6b(ii))
    E w_k w_j^T = Σ δ_{kj},   ∀k, j ∈ Z,   Σ ∈ 𝒫.                      (1.6b(iii))

By the very construction of Θ there is a one-to-one relationship between
the elements of the set of systems and covariances 𝒮(p, m, δ) × 𝒫 and the analytic
parameter manifold Θ.
Equations (1.6a) and (1.6b) constitute our fundamental set of linear stochastic
systems. It is the parameters θ ∈ Θ of the members of this set which we wish to
estimate from observations on (y, u) using the most effective means we can find.
A fundamental question concerning the set of system models in (1.6) is
whether it satisfies the following definitions of identifiability.
Definition 1.3. Let {P^N(z^N₁; θ); N ∈ Z₁} denote the right continuous finite
dimensional distribution functions of the process z = y (z = (y, u), respectively)
generated by the ARMAX and state space systems (1.6) for a given θ ∈ Θ and a
given distribution on the corresponding initial conditions or initial state vector.
Then Θ is an identifiable parameterization for the family of systems (1.6) if,
for z = y (z = (y, u), respectively), and all θ, θ' ∈ Θ,

    P^N(z^N₁; θ) = P^N(z^N₁; θ'),   ∀N ∈ Z₁,   ∀z^N₁ ∈ R^{Np} (R^{N(m+p)}, respectively)   ⟹   θ = θ'.        (1.7)

Θ is an asymptotically identifiable parameterization for the family of systems
(1.6) if, for z = y (z = (y, u), respectively), and all θ, θ' ∈ Θ,

    lim_{k→∞} ( P^{N+k}_{1+k}(z^N₁; θ) − P^{N+k}_{1+k}(z^N₁; θ') ) = 0,   ∀N ∈ Z₁,   ∀z^N₁ ∈ R^{Np} (R^{N(m+p)}, respectively)   ⟹   θ = θ'.        (1.8)

In statistical parameter estimation it is the stronger second form of
identifiability that is often required, since, if one observes a single sample path
of an asymptotically stationary process, then it is only asymptotic properties
of the finite dimensional distributions that can be inferred.


2 Construction of the Likelihood Function


A basic class of parameter estimators in mathematical statistics is the set of
maximum likelihood estimators. In any given problem such an estimator is
defined to be the (set of) values of the parameter θ that maximize the joint
probability density f(z^N₁; θ), when the observations z^N₁ are taken to be given
and fixed. Although maximum likelihood estimators are known to exhibit poor
performance (by several standard criteria) for small sample sizes, it is well known
that in certain standard situations (and subject to identifiability hypotheses) the
maximum likelihood estimates θ̂_N converge a.s. (i.e. with probability one) to the
true parameter value as N → ∞ and, furthermore, display asymptotically
normal behavior around the true value. The dynamical system identification
problem is one of the most interesting applications of maximum likelihood
estimation and it is the one we study in this chapter in the case of finite
dimensional linear systems.
This section of the chapter is devoted to producing the likelihood function
displayed in formula (2.7) below. The vital feature which is immediately apparent
is that it is expressed in terms of the output prediction process (generated via
the state estimation process) of the observed system. Hence, linear recursive
filtering theory is seen to play a central conceptual role in the calculation of
maximum likelihood estimates and, equally important, the theory of the Kalman
filter plays a crucial role in the analysis of their convergence.
Observations on the inputs and outputs of systems such as ARMAX and
SSX ((1.6a) and (1.6b) of the previous section) over the interval [1, N] constitute
a set of random vector variables

    {z₁, z₂, ..., z_N} ≜ {(y₁, u₁), (y₂, u₂), ..., (y_N, u_N)},

which we may arrange into the vector variable

    (z^N₁)^T = (y_N^T, u_N^T; y_{N−1}^T, u_{N−1}^T; ...; y₁^T, u₁^T) ∈ R^{N(m+p)}.

Let us assume for the moment that the random variable z^N₁ is distributed
over R^{Nr} with the parameterized density f(z^N₁; θ), where θ takes a value in the
set Θ. Then the likelihood function of the observations z^N₁ is given by
f(z^N₁; ·): Θ → R¹, and the maximum likelihood estimate θ̂_N (of θ) is given by the
maximizing argument (when it exists) of f(z^N₁; θ) over Θ.
To begin with, we observe that if z is a full rank zero mean stationary
Gaussian stochastic process with (infinite) process covariance matrix
0 < Σᶻ = (Σᶻ_{i,j} = Σᶻ_{i−j}; i, j ∈ Z), where Σᶻ_{i,j} = E z_i z_j^T, i, j ∈ Z, then the likelihood
function f(·; ·) is parameterized by Σᶻ ∈ Θ when we identify Θ with the set of
(Nr × Nr) strictly positive submatrices on the diagonal of Σᶻ corresponding
to z^N₁. Further, f(·; ·) has the form

    f(z^N₁; θ) = (2π)^{−Nr/2} (det Σᶻ_N(θ))^{−1/2} exp{ −½ (z^N₁)^T [Σᶻ_N(θ)]⁻¹ (z^N₁) },   θ ∈ Θ.        (2.1)


Now, in order to construct the exact likelihood function of a given
nonstationary process y, let us assume the random variable {y^N₁} defined on
(Ω, ℬ, P) has a density that depends on the deterministic quantities (u^N₁, θ) for
all N. Then, by the definition of a conditional density, we obtain

    f(y^N₁; u, θ) = f(y_N | y^{N−1}₁; u, θ) f(y^{N−1}₁; u, θ) = ∏_{i=1}^N f(y_i | y^{i−1}₁; u, θ),        (2.2a)

where f(·; u, θ) is used generically to denote the indicated conditional densities
(which necessarily exist); the second equality above follows by induction and
we shall use the convention f(y₁ | y⁰₁; u, θ) = f(y₁; u, θ).
Alternatively, let (y^N₁, u^N₁) have a joint density that depends on the deterministic quantity θ. Further, assume the conditional densities f(u_k | u^{k−1}₁, y^{k−1}₁; θ)
are not functions of θ for each k ∈ Z₁. (This may be interpreted to mean that
any feedback from y to u is not θ-parameterized.)
Reasoning as before, we obtain

    f(y^N₁, u^N₁; θ) = f(y_N, u_N | y^{N−1}₁, u^{N−1}₁; θ) f(y^{N−1}₁, u^{N−1}₁; θ)
                    = f(y_N | y^{N−1}₁, u^N₁; θ) f(u_N | y^{N−1}₁, u^{N−1}₁; θ) f(y^{N−1}₁, u^{N−1}₁; θ)
                    = ∏_{i=1}^N f(y_i | y^{i−1}₁, u^i₁; θ) f(u_i | y^{i−1}₁, u^{i−1}₁).        (2.2b)

We reiterate that f(y^N₁; u, θ) in (2.2a) is the likelihood function of the data
which is parameterized by the deterministic quantities (u, θ), where u is
{φ, Ω}-measurable and θ is a "purely" deterministic element of Θ. In (2.2b),
f(y^N₁, u^N₁; θ) is the likelihood function of (y^N₁, u^N₁), and in that expression θ is the
only deterministic quantity parameterizing the function.
Now let

    ŷ_{i|i−1}(θ) ≜ ∫_{R^p} y_i f(y_i | y^{i−1}₁; u, θ) dy_i,   i ∈ Z₁,

that is, ŷ_{i|i−1} is the conditional expectation of y_i given y^{i−1}₁ in the case where
(u, θ) are deterministic. And let

    ŷ_{i|i−1}(θ) ≜ ∫_{R^p} y_i f(y_i | y^{i−1}₁, u^i₁; θ) dy_i,

that is, ŷ_{i|i−1} is the conditional expectation of y_i given (y^{i−1}₁, u^i₁) in the case
where (y, u) are jointly randomly distributed and θ is deterministic.
Making the change of variables

    y_i → y_i − ŷ_{i|i−1}(θ),   1 ≤ i ≤ N,

in (2.2a) and (2.2b) respectively, and using the change of variables formula, we
obtain

    f(y^N₁; u, θ) = ∏_{i=1}^N f(y_i − ŷ_{i|i−1}(θ) | y^{i−1}₁; u, θ)        (2.3a)


and

    f(y^N₁, u^N₁; θ) = ∏_{i=1}^N f(y_i − ŷ_{i|i−1}(θ) | y^{i−1}₁, u^i₁; θ) f(u_i | y^{i−1}₁, u^{i−1}₁),        (2.3b)

respectively, since the determinant of the Jacobian of the change of variables
is equal to one in each case. (Note that this manoeuvre works for any
change of variables of the form y_i → y_i − k_{i−1}(y^{i−1}₁, u^i₁), 1 ≤ i ≤ N, with k_{i−1}(·)
measurable with respect to the sigma field ℱ{y^{i−1}₁, u^i₁}.)
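The parenthetical remark can be made concrete: for any causal predictor of this form, the map from y to the prediction errors is lower triangular with ones on the diagonal, so its Jacobian determinant is one. A minimal sketch for the linear special case v_i = y_i − θ y_{i−1} (θ here is an illustrative scalar, not a quantity from the text):

```python
# Jacobian of the causal change of variables v_i = y_i - theta*y_{i-1}:
# lower bidiagonal with unit diagonal, hence determinant 1 regardless of theta.
def jacobian(theta, N):
    J = [[0.0] * N for _ in range(N)]
    for i in range(N):
        J[i][i] = 1.0                  # dv_i / dy_i = 1
        if i > 0:
            J[i][i - 1] = -theta       # dv_i / dy_{i-1} = -theta
    return J

def det_triangular(J):
    # determinant of a triangular matrix = product of its diagonal entries
    d = 1.0
    for i in range(len(J)):
        d *= J[i][i]
    return d

assert det_triangular(jacobian(0.8, 6)) == 1.0
```

The same unit-diagonal triangular structure holds for any nonlinear k_{i−1}(·), which is why the density transforms without a Jacobian factor.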
Write

    J(y^N₁, u^N₁) ≜ ∏_{i=1}^N f(u_i | y^{i−1}₁, u^{i−1}₁).        (2.4)

Then (2.3a) and (2.3b) yield a likelihood function on the observations of the form

    exp{ ∑_{i=1}^N log f₀(v_i(θ)) + log J(y^N₁, u^N₁) },        (2.5)

where f₀(·) denotes f(v_i(θ) | y^{i−1}₁; u, θ) or f(v_i(θ) | y^{i−1}₁, u^i₁; θ), v_i(θ) ≜ y_i − ŷ_{i|i−1}(θ), and
ŷ_{i|i−1}(θ) denotes the conditional expectation of y_i given the observations, this
being computed using the appropriate conditional distribution in each case.
We now postulate some further conditions on the processes appearing in
the system equations (1.6) so that the likelihood function (2.5) on the observations will be computable in a convenient recursive manner. These conditions are
typical of those imposed on ARMAX and SSX systems to yield a satisfactory
theory of filtering and stochastic optimal control. We shall let δ̄ denote the
maximum of the row degrees of A_ψ(z) in (1.6a(i)).

INP 3A: The random variables x ≜ (y⁰_{−δ̄+1}, w⁰_{−δ̄+1}) and x₁, in ARMAX (1.6a)
and SSX (1.6b) respectively, are jointly distributed with and orthogonal to w^N₁
for each N ∈ Z₁, where w is a full-rank orthogonal process and the joint
distribution is Gaussian with zero mean.

In case (A) the process u enters the recursions (1.6) in a deterministic manner
and the Gaussian density on the respective initial conditions propagates to a
Gaussian distribution on (y^N_{−δ̄+1}, w^N_{−δ̄+1}) parameterized by θ ≡ (ψ, Σ). This gives
(y^N₁) a (nonsingular) Gaussian distribution for each N ∈ Z₁.
When INP 1B and INP 2 are in force, we enunciate the following condition:

INP 3B: (x ≜ (y⁰_{−δ̄+1}, u⁰_{−δ̄+1}, w⁰_{−δ̄+1}), w₁, w₂, ...) and (x₁, w₁, w₂, ...) are full-rank
zero-mean Gaussian processes with w_{n+1} independent of {x, (u₁, w₁), ..., (u_n, w_n),
u_{n+1}} and (x₁, (u₁, w₁), ..., (u_n, w_n), u_{n+1}) respectively, for all n ∈ Z₊.

If the observed input process u is absent, INP 3A and 3B become the single
hypothesis that the initial conditions for (1.6) and the (full rank) orthogonal
process (w₀, w₁, ...) are jointly Gaussian, mutually orthogonal and have zero
mean.
Since INP 1–3 give a likelihood function of the form (2.5) for (1.6), and since
the process v is Gaussian with density N(0, Σ_{i|i−1}(θ)), Σ_{i|i−1}(θ) > 0, i ∈ Z₁, the


following expression gives the θ = (ψ, Σ)-dependent part of the logarithm of the
likelihood function (scaled by −2/N):

    −(2/N) ∑_{i=1}^N log { (2π)^{−p/2} (det Σ_{i|i−1}(θ))^{−1/2} exp( −½ ‖v_i(θ)‖²_{[Σ_{i|i−1}(θ)]⁻¹} ) }        (2.6a)

    = p log 2π + (1/N) ∑_{i=1}^N { log det Σ_{i|i−1}(θ) + Tr[ v_i(θ) v_i^T(θ) Σ⁻¹_{i|i−1}(θ) ] }.        (2.6b)

For convenience, we often refer to (2.6), less the quantity p log 2π, as the
log-likelihood function and denote it by L_N(y^N₁; u, θ) in case A and L_N(y^N₁, u^N₁; θ)
in case B. Both L_N(y^N₁; u, θ) and L_N(y^N₁, u^N₁; θ) are given by

    (1/N) ∑_{i=1}^N { log det Σ_{i|i−1}(θ) + Tr[ v_i(θ) v_i^T(θ) Σ⁻¹_{i|i−1}(θ) ] }.        (2.7)

The matrix inverse Σ⁻¹_{i|i−1}(θ) in (2.6) and (2.7) exists for all i ∈ Z₁ and for all
θ ∈ Θ since (1.6b) yields

    Σ_{i|i−1}(θ) = H_θ V_{i|i−1}(θ) H_θ^T + Σ_θ,   i ∈ Z₁,

where V_{i|i−1}(θ) denotes the state estimation error covariance and we have Σ_θ > 0
for all θ ∈ Θ by the definition of Θ.
Note that the initial condition covariances E x x^T and E x₁ x₁^T are
parametric quantities required to specify the joint distributions (x, y^N₁),
(x, y^N₁, u^N₁) and (x₁, y^N₁, u^N₁), and to initiate the recursions for the
sequence {V_{i|i−1}; i ∈ Z₁} in cases (A) and (B) respectively. As remarked in Sect. 1,
in none of the cases can this quantity be consistently estimated and it is not
included in the parameter θ.
The function (2.7) will be our main object of study in Sect. 3. As stated
earlier, the important point is that the ARMAX and SSX models, together with
the INP hypotheses in their final form as given in Sect. 3, permit the recursive
construction of the prediction errors {v_k(θ); 1 ≤ k ≤ N} via the techniques of
linear recursive filtering.
The following example is instructive.
Example 2.1. Consider the construction of the likelihood function for a
stationary system given by

    x_{k+1} = θ x_k + w_k,        (2.8a)
    y_k = x_k,                    (2.8b)

where x_k, y_k, w_k ∈ R¹ for k ∈ Z₊ and where w is an independent identically
distributed N(0, σ²) process. Evidently the system is in both ARMA and state
space form, and we assume INP 2 and INP 3B hold.
Let us assume (2.8a, b) is in steady state; in other words, x is a strictly
stationary process. In steady state, Π_∞ ≜ E x_k² satisfies the Lyapunov
equation Π_∞ = θ Π_∞ θ + σ², and for this to have a finite solution we must
naturally assume that the system is asymptotically stable, that is, |θ| < 1.
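The Lyapunov equation above can be solved by simple fixed-point iteration, which converges geometrically precisely when |θ| < 1. A quick numerical check with illustrative values:

```python
# Iterate Pi <- theta*Pi*theta + sigma2; for |theta| < 1 this converges to
# the stationary variance Pi_inf = sigma2 / (1 - theta^2).
def lyapunov_fixed_point(theta, sigma2, iters=200):
    Pi = 0.0
    for _ in range(iters):
        Pi = theta * Pi * theta + sigma2
    return Pi

theta, sigma2 = 0.8, 1.0
assert abs(lyapunov_fixed_point(theta, sigma2) - sigma2 / (1 - theta ** 2)) < 1e-9
```

For |θ| ≥ 1 the iteration diverges, mirroring the fact that no finite stationary variance exists.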


The covariance sequence for the observed process y is

    R_k = R_{−k} = E y_{k+τ} y_τ = θ^k σ² / (1 − θ²),   ∀k, τ ∈ Z.

We shall let Σ_N denote the covariance of (y_N, ..., y₁); hence

    (Σ_N)_{i,j} = θ^{|i−j|} σ² / (1 − θ²),   1 ≤ i, j ≤ N,

and the Gaussian assumption on w makes the entire y process Gaussian with
density

    f(y^N₁; θ) = (2π)^{−N/2} |Σ_N|^{−1/2} exp{ −½ [y_N, y_{N−1}, ..., y₂, y₁] Σ_N⁻¹ [y_N, y_{N−1}, ..., y₂, y₁]^T },

where Σ_N is the symmetric Toeplitz matrix with the entries (Σ_N)_{i,j} = θ^{|i−j|} σ²/(1 − θ²)
displayed above.
In this simple example we can carry out the matrix manipulations to obtain

    f(y^N₁; θ) = (2π)^{−N/2} ( σ^{2N}/(1 − θ²) )^{−1/2} exp{ −½ (B y)^T D⁻¹ (B y) },

where y = (y_N, y_{N−1}, ..., y₂, y₁)^T, B is the (N × N) bidiagonal matrix with ones
on the diagonal and −θ in each entry immediately to the right of the diagonal,
and D = diag(σ², ..., σ², σ²/(1 − θ²)); the quadratic form is thus
(1 − θ²) y₁²/σ² + ∑_{i=2}^N (y_i − θ y_{i−1})²/σ².
Hence

    −log f(y^N₁; θ) = (N/2) log 2π + ½ { log( σ^{2N}/(1 − θ²) ) + (1 − θ²) y₁²/σ² + ∑_{i=2}^N (y_i − θ y_{i−1})²/σ² }.        (2.9)
On the other hand, we can obtain (2.9) in the form (2.7) as follows. From (2.5)
we obtain

    −log f(y^N₁; θ) = (N/2) log 2π + ½ ∑_{i=1}^N log Σ_{i|i−1} + ½ ∑_{i=1}^N (y_i − ŷ_{i|i−1})² (Σ_{i|i−1})⁻¹,        (2.10)

where Σ_{i|i−1} = E(y_i − ŷ_{i|i−1})².


Since E w_k = 0 for all k ∈ Z, and since no observations are taken before i = 1,
we have

    ŷ_{1|0} = 0   and   Σ_{1|0} = Π_∞,

where Π_∞ satisfies the Lyapunov equation Π_∞ = θ Π_∞ θ + σ² with solution
Π_∞ = σ²/(1 − θ²). Subsequently Σ_{i|i−1} satisfies the corresponding Riccati equation.
Hence the solution sequence to the Riccati equation (with its initial condition
determined by the Lyapunov equation) is

    σ²/(1 − θ²), σ², σ², ...


and (2.10) is given explicitly by

    −log f(y^N₁; θ) = (N/2) log 2π + ½ { log( σ²/(1 − θ²) ) + (N − 1) log σ² }
                      + ½ { y₁² ( σ²/(1 − θ²) )⁻¹ + ∑_{i=2}^N (y_i − ŷ_{i|i−1})² σ⁻² },

which is the same as (2.9) since

    y_i − ŷ_{i|i−1} = y_i − θ y_{i−1}   for all i ≥ 2.

This simple example shows how the calculation of (y^N₁)^T (Σ_N)⁻¹ (y^N₁) in the
direct version of the likelihood function is transformed into a sum of quantities
that depend upon the solution of the appropriate Lyapunov and Riccati
equations, in the "filter" form of the likelihood function (2.7). This illustrates
how, at the cost of the complexity of the Riccati equation (equivalently, the
Cholesky factorization algorithms), the direct likelihood function of a Gaussian
process generated by the system (1.6a)-(1.6b) is transformed into the version
that depends on the process innovations.
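This equality of the direct and filter forms can be verified numerically. The sketch below (the data values are arbitrary illustrations; the code assumes the scalar system of Example 2.1) evaluates −log f(y^N₁; θ) once through the Toeplitz covariance matrix Σ_N, via Gaussian elimination, and once through the innovations recursion with Σ_{1|0} = σ²/(1 − θ²) and Σ_{i|i−1} = σ² thereafter:

```python
import math

def toeplitz_cov(theta, sigma2, N):
    # (Sigma_N)_{ij} = theta^{|i-j|} * sigma2 / (1 - theta^2)
    r = sigma2 / (1 - theta ** 2)
    return [[r * theta ** abs(i - j) for j in range(N)] for i in range(N)]

def logdet_and_solve(A, b):
    # LU elimination without pivoting (adequate for this well-conditioned SPD
    # matrix); returns log det A and the solution of A x = b.
    n = len(A)
    A = [row[:] for row in A]
    x = list(b)
    logdet = 0.0
    for k in range(n):
        logdet += math.log(A[k][k])
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            x[i] -= m * x[k]
    for i in reversed(range(n)):
        x[i] = (x[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return logdet, x

def direct_negloglik(y, theta, sigma2):
    # -log f via the N x N covariance matrix (direct Gaussian form)
    N = len(y)
    logdet, x = logdet_and_solve(toeplitz_cov(theta, sigma2, N), y)
    quad = sum(yi * xi for yi, xi in zip(y, x))
    return 0.5 * (N * math.log(2 * math.pi) + logdet + quad)

def filter_negloglik(y, theta, sigma2):
    # -log f via innovations: Sigma_{1|0} = sigma2/(1-theta^2), then sigma2
    S, yhat, acc = sigma2 / (1 - theta ** 2), 0.0, 0.0
    for yk in y:
        v = yk - yhat                      # innovation v_i = y_i - yhat_{i|i-1}
        acc += math.log(2 * math.pi * S) + v * v / S
        yhat, S = theta * yk, sigma2       # exact state observation here
    return 0.5 * acc

y = [0.3, -1.1, 0.7, 0.2]
assert abs(direct_negloglik(y, 0.6, 1.5) - filter_negloglik(y, 0.6, 1.5)) < 1e-9
```

The two values agree because det Σ_N factors into the product of the innovation variances and the quadratic form factors into the sum of normalized squared innovations.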

3 Consistency and Asymptotic Normality


of Maximum Likelihood Estimators
In this section we present our main results, namely, the asymptotic identifiability
of the parameterization Θ of 𝒮(p, m, δ) and the strong consistency and
asymptotic normality of maximum likelihood estimators of the parameters of
Gaussian ARMAX and state space systems when the parameters lie in a compact
set. The methods of this section are based on techniques which appeared in
Caines and Rissanen [1974], Rissanen and Caines [1979] and Caines [1988].
In particular, a complete proof of Theorem 3.2 is to be found in Chap. 7 of
Caines [1988]. An important parallel line of work on this problem is due to
Dunsmuir [1979] and Deistler and Hannan and is described in detail in the
text by Hannan and Deistler [1988].
The analysis will be carried out without making a distinction between the
autoregressive moving average models of (1.6a) and the state space models of
(1.6b), except that, in order to be precise, the input hypotheses below are stated
explicitly for the initial conditions in each case.
There are some significant differences between the analysis in the case where
the input u is deterministic, corresponding to our input hypotheses A, and that
where it is random, corresponding to our input hypotheses B.
We emphasise here that one of the main advantages in having available a
version of the theorem established under input hypotheses B is that it covers the
case of (linear constant) output-to-input feedback (plus exogenous disturbances)
when the feedback relation does not depend upon θ ∈ Θ. In particular, note that


the overall hypotheses INP B below include the analytic closed loop solvability
(analytic CLS) condition of Chap. 10, Sect. 1 of Caines [1988], which merely
states that if the set of equations for (y, u) is solved in terms of w and the
feedback disturbance v, then the resulting map (w, v) → (y, u) is given asymptotically by an asymptotically stable matrix transfer function (i.e. a matrix transfer
function analytic in a neighborhood of the closed unit disc D). Further, although
the coefficient matrices of the feedback are not needed to compute the
maximum likelihood estimate, they are needed to compute the asymptotic
covariance of the estimate.
In the theorem statement below, we gather together, for ease of reference, all
the relevant input hypotheses. The reader should note the addition of the
important "persistent excitation" condition in both sets of hypotheses: the
exogenous input, whether deterministic or random, is required to asymptotically
behave like a wide sense stationary process which is not linearly deterministic
(with respect to regressions of order less than twice an integer depending
upon the system structure). We also note that the maximum likelihood estimate
is defined as the minimizing argument of the function L_N in (2.7) (which in the
MPE framework is interpreted as a prediction error loss function) and that the
hypothesis of the existence of a conditional density for u_k given (y^{k−1}₁, u^{k−1}₁) is
not required in either of the theorems below.
Before stating the main theorem, we stress that it is not hypothesized that
the observed y process ((y, u) process, respectively) is stationary.
Let δ̄ denote the maximum row degree of A_ψ(z) and B_ψ(z) for all ψ ∈ Ψ for
some set of co-ordinate charts covering the compact subset C ⊂ Ψ cited in the
statement of Theorem 3.2. This co-ordinate system will remain fixed throughout
the rest of the discussion.
We now collect together the two sets of input hypotheses under the headings
INP A and INP B.
INP A:
(1) u is deterministic (i.e. {φ, Ω}-measurable) and bounded.
(2) w is a zero mean wide sense stationary Gaussian process with E w_k w_j^T =
Σ δ_{kj}, k, j ∈ Z, with Σ ∈ 𝒫.
(3) x ≜ (y⁰_{−δ̄+1}, w⁰_{−δ̄+1}) and x₁, in (1.6a) and (1.6b) respectively, are jointly
distributed with and orthogonal to w^N₁ for each N ∈ Z₊, and the joint distribution is Gaussian with zero mean.
(4) For the process u the following limits exist:

    lim_{N→∞} (1/(N+1)) ∑_{i=0}^N u_i u_{i−k}^T = M_k,   k ∈ Z,

and

    M^{2δ}_{2δ} ≜ [ M_{i−j}; 0 ≤ i, j ≤ 2δ ] > 0,        (3.1a)

where this property is referred to as (deterministic) persistent excitation of
order 2δ. □
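The deterministic persistent excitation condition can be checked empirically for a concrete input. The sketch below (with an illustrative scalar input u_k = cos(ω₀k), not an input from the text) estimates M₀ and M₁ and confirms that the 2 × 2 moment matrix is positive definite; a single sinusoid is persistently exciting of order 2, while the general condition (3.1a) requires order 2δ:

```python
import math

# Empirical moments M_k ~ (1/N) * sum_i u_i * u_{i-k} for the illustrative
# scalar input u_k = cos(w0*k).
def empirical_moment(u, k):
    return sum(u[i] * u[i - k] for i in range(k, len(u))) / len(u)

w0, N = 1.0, 50000
u = [math.cos(w0 * k) for k in range(N)]
M0, M1 = empirical_moment(u, 0), empirical_moment(u, 1)

# Limits: M0 -> 1/2 and M1 -> cos(w0)/2; the 2 x 2 Toeplitz moment matrix
# [[M0, M1], [M1, M0]] then has positive leading minors, i.e. it is > 0.
assert abs(M0 - 0.5) < 1e-3 and abs(M1 - 0.5 * math.cos(w0)) < 1e-3
assert M0 > 0 and M0 * M0 - M1 * M1 > 0
```

A linearly deterministic input such as a constant would instead make the higher-order moment matrices singular, violating the excitation requirement.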

INP B:
(1) u is a stochastic process generated by the feedback relation

    u_k = ∑_{j≥1} L^y_j y_{k−j} + ∑_{j≥1} L^u_j u_{k−j} + v_k,

where v is a zero-mean stationary Gaussian process such that
v_k ⊥ Sp{ x ≜ (y⁰_{−δ̄+1}, u⁰_{−δ̄+1}, w⁰_{−δ̄+1}), w^∞_{−∞} } or Sp{ x₁, w^∞_{−∞} }, k ∈ Z₁, where w
is specified below, and where the analytic CLS (w, v) → (y, u) condition holds.
(2) w is a zero mean orthogonal stationary Gaussian process with E w_k w_k^T =
Σ > 0, k ∈ Z.
(3) w_k ⊥ Sp{ y⁰_{−δ̄+1}, u⁰_{−δ̄+1}, v^∞_{−∞}, w^{k−1}_{−∞} } or w_k ⊥ Sp{ x₁, v^∞_{−∞}, w^{k−1}_{−∞} }, k ∈ Z₁, where
all random variables listed here have a joint Gaussian distribution with zero
mean.
(4) Let v̄ ≜ {u_k − Ê(u_k | ℋ_k^{y,w}); k ∈ Z₊} or v̄ ≜ {u_k − Ê(u_k | ℋ_k^{x₁,w}); k ∈ Z₊}; then by (1)-(3)
the limits lim_{N→∞} (1/N) ∑_{i=1}^N v̄_{i+k} v̄_i^T = M̄_k, k ∈ Z, exist a.s. It is assumed that
the asymptotic stationary Gaussian distribution for v̄ possesses a singular
spectral distribution measure and that the associated covariance matrices
satisfy

    M̄^{2δ}_{2δ} ≜ [ M̄_{i−j}; 0 ≤ i, j ≤ 2δ ] > 0.        (3.1b)

v̄ is called the exogenous part of the input process and is said to be (random)
persistently exciting of order 2δ.

Versions of part (1) of the next theorem exist in the literature (see, e.g., Ljung
[1987]). Here we give a precise statement and proof linking the class of systems
𝒮(p, m, δ), the notion of asymptotic identifiability and the hypotheses INP
(A1)-(A4), in particular the persistent excitation (order 2δ) hypothesis. Part (2)
should be compared with the feedback system identification results in Chap. 10
of Caines [1988], where stronger conditions are required as the dynamics of
the feedback loop are not assumed known in the most general case considered
there. We also remark that the singular spectral distribution hypothesis for v̄
in INP B above can be relaxed in both Theorem 3.1 and Theorem 3.2. For
Theorem 3.1 the proof becomes a little more elaborate than that given here
and for Theorem 3.2 the proof statement is identical.

Theorem 3.1. Subject to the hypotheses INP A or INP B, Θ is an asymptotically
identifiable parameterization for the family of systems (1.6).


Proof. Since systems satisfying ARMAX or SSX are asymptotically stable, and
since w (under INP A) and (w, v) (under INP B) are Gaussian processes, the
finite dimensional distributions {P^{N+k}_{1+k}(z^N₁; θ); N ≥ 1} converge pointwise to
Gaussian distributions as k → ∞ for all initial conditions in (1.6).
We treat each case in turn.
Part 1. It follows from the remark above that (1.8) holds under INP A if the
following mean value process and covariance matrix conditions hold on the
process z:

lim_{k→∞} {[Z_θ(z)u(z) + E(I.C._θ(z))]_k - [Z_{θ'}(z)u(z) + E(I.C._{θ'}(z))]_k} = 0,

lim_{k→∞} E z_{k+1}^{k+N}(θ) (z_{k+1}^{k+N}(θ))^T = lim_{k→∞} E z_{k+1}^{k+N}(θ') (z_{k+1}^{k+N}(θ'))^T

But the asymptotic stability of systems in S(p, m, δ) and INP A(1)-A(2) then
imply

Σ_{i=0}^∞ Z_{θ,i} M_{τ-i} = Σ_{i=0}^∞ Z_{θ',i} M_{τ-i},  ∀ τ ∈ Z,   (i)

and INP A(1)-A(4) give

W_θ(z) Σ_θ W_θ^T(z^{-1}) = W_{θ'}(z) Σ_{θ'} W_{θ'}^T(z^{-1})   (ii)

For the first equation of the pair above we shall invoke the (order 2δ)
persistent excitation (3.1a) and use an argument based on Hankel matrix
realization theory.
Put Y_τ = Σ_{i=0}^∞ Z_{θ,i} M_{τ-i}, τ ∈ Z_+. Then we have the semi-infinite matrix equation

                   | M_0    M_1    ...  M_{2δ}   |
[Z_0 Z_1 Z_2 ...]  | M_{-1} M_0    ...  M_{2δ-1} |  = [Y_0 Y_1 ... Y_{2δ}],
                   |  ...               ...      |
                   |  ×     ×      ...   ×       |

where the top left m(2δ+1) × m(2δ+1) block of the matrix is M_{2δ}^{2δ}, the entries
marked × are unspecified, and we have suppressed mention of θ for simplicity of notation.
410

P. E. Caines

By the persistent excitation condition INP A(4), M_{2δ}^{2δ} > 0; so there exists a
semi-infinite lower triangular matrix S with m(2δ+1) × m(2δ+1) identity
matrices on the diagonal and all other non-zero entries in the left-most m(2δ+1)
block column, such that

[Z_0 Z_1 Z_2 ...] S^{-1} S | M_{2δ}^{2δ} |  =  [Z_0 Z_1 Z_2 ...] S^{-1} | M_{2δ}^{2δ} |
                           |  ×          |                              |  0          |

Hence we obtain the p × (2δ+1)m matrix equation

[[Z_0 Z_1 Z_2 ...] S^{-1}]_{2δ} M_{2δ}^{2δ} = [Y_0 Y_1 ... Y_{2δ}],

which gives

[[Z_0 Z_1 Z_2 ...] S^{-1}]_{2δ} = [Y_0 Y_1 ... Y_{2δ}] [M_{2δ}^{2δ}]^{-1}

Let Q denote the semi-infinite matrix with (i) I_{m(2δ+1)} in the top left
position, (ii) below each block diagonal matrix I_m, a block column
[a_0^{-1}a_1 I_m, a_0^{-1}a_2 I_m, ..., a_0^{-1}a_q I_m]^T corresponding to the linear relation
Σ_{i=0}^q a_i Z_{j+i} = 0, q ≤ δ, a_0 ≠ 0 (which holds for all j ∈ Z_1 by virtue of the
hypothesis δ(Z(z)) = δ and the Cayley-Hamilton Theorem), and (iii) zeros in all other
entries.
Then, because [Z_0 Z_1 ... Z_{2δ} Z_{2δ+1} ...] Q = [Z_0 Z_1 ... Z_{2δ-1} Z_{2δ} 0 0 ...] and
Q^{-1} S^{-1} is lower triangular with top left entry I_{(2δ+1)m}, this yields

[Z_0 Z_1 Z_2 ... Z_{2δ}] = [([Z_0 Z_1 ... Z_{2δ} Z_{2δ+1} ...] Q) Q^{-1} S^{-1}]_{2δ} = [Y_0 Y_1 ... Y_{2δ}] [M_{2δ}^{2δ}]^{-1}

Now Z_i = H F^{i-1} G, i ∈ Z_1, for a realization (H, F, G, D) of the system Z(z) in
S(p, m, δ). Since {Z_1, Z_2, ..., Z_{2δ}} gives the segment of the Hankel matrix

           | Z_1      Z_2      ...  Z_δ     |
H_{δ+1} =  | Z_2      Z_3      ...  Z_{δ+1} |
           |  ...                   ...     |
           | Z_{δ+1}  Z_{δ+2}  ...  Z_{2δ}  |

this data determines a minimal realization (H, F, G) for Z(z) - D = Σ_{i=1}^∞ Z_i z^i.


Further, given {Z_i; i ∈ Z_1}, D is determined by the equation

Y_0 = D M_0 + Σ_{i=1}^∞ Z_i M_{-i},


since M_{2δ}^{2δ} > 0 implies M_0 > 0. It follows that (i) implies

Z_θ(z) = Z_{θ'}(z)

However (ii), together with the uniqueness of the admissible spectral factors
parameterized by Θ, yields

W_θ(z) = W_{θ'}(z),  Σ_θ = Σ_{θ'}

The definition of Θ then gives θ = θ', as required.
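The Hankel step above can be made concrete numerically. The sketch below is our illustration, not code from the chapter: the system matrices F0, G0, H0 are assumed numbers, and the factorization of the finite Hankel matrix is done by an SVD, in the spirit of Ho-Kalman realization theory, to recover a minimal realization (H, F, G) from the Markov parameters Z_i = H F^{i-1} G.

```python
import numpy as np

def ho_kalman(markov, n):
    """Recover (H, F, G) with Z_i = H F^(i-1) G from the Markov parameters
    Z_1..Z_{2*delta} (a list of p x m arrays), given the true order n."""
    delta = len(markov) // 2
    p, m = markov[0].shape
    # Block Hankel matrix Hank[i, j] = Z_{i+j+1} and its one-step shift.
    hank = np.block([[markov[i + j] for j in range(delta)] for i in range(delta)])
    shift = np.block([[markov[i + j + 1] for j in range(delta)] for i in range(delta)])
    U, s, Vt = np.linalg.svd(hank)
    U, s, Vt = U[:, :n], s[:n], Vt[:n, :]
    obs = U * np.sqrt(s)              # observability factor
    con = np.sqrt(s)[:, None] * Vt    # controllability factor
    F = np.linalg.pinv(obs) @ shift @ np.linalg.pinv(con)
    H = obs[:p, :]
    G = con[:, :m]
    return H, F, G

# Illustrative minimal system (hypothetical numbers).
F0 = np.array([[0.5, 0.2], [0.0, -0.3]])
G0 = np.array([[1.0], [0.5]])
H0 = np.array([[1.0, 0.0]])
markov = [H0 @ np.linalg.matrix_power(F0, i) @ G0 for i in range(1, 5)]  # Z_1..Z_4
H, F, G = ho_kalman(markov, n=2)
print(np.allclose(H @ F @ G, markov[1]))
```

Any realization recovered this way agrees with (H0, F0, G0) only up to a similarity transformation, which is exactly the indeterminacy that the canonical parameterizations discussed in this chapter remove.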

Part 2. Let us assume hypotheses INP B hold. Then the mean value of the
process (y, u) converges to zero as N → ∞ and the limiting Gaussian distribution
of the process is characterized by its spectral distribution matrix. We shall show
that under INP B this spectral distribution is in one-to-one relation with the
elements of Θ.
Now the limiting steady state behavior of (y, u) can be realized at any (and
hence all) finite time instants by taking a suitable (Gaussian) probability distribution
on the initial conditions. We shall assume this change of time origin from
infinity to the finite part of the time axis Z has been carried out. In this case
we may write the input-output equations of the system (1.6) in the form

y(z) = Z(z)u(z) + W(z)w(z)   (W(0) = I)
u(z) = L(z)w(z) + v(z)       (L(0) = 0)

where, as given in hypothesis INP B(4), v_k = u_k - (u_k | H_k^w) and where we write
(u_k | H_k^w) = Σ_{i=0}^∞ L_i w_{k-i}. (Note that L(0) = 0 because u_k is a function of y_1^{k-1}, u_1^{k-1}
and v_k only, and that Σ_{i=1}^∞ L_i z^{i-1} is an outer spectral factor of the spectral
density of the projected process.)
The spectral distribution function of the strictly stationary process (y, u) is
given by

dF_{(y,u)}(θ) = | I  Z(e^{iθ}) | | W(e^{iθ})  0 | | Σ  0 | | W^T(e^{-iθ})  L^T(e^{-iθ}) | | I             0 | dθ
                | 0  I         | | L(e^{iθ})  I | | 0  0 | | 0             I            | | Z^T(e^{-iθ})  I |

              + | Z(e^{iθ}) dF_v̄(θ) Z^T(e^{-iθ})   Z(e^{iθ}) dF_v̄(θ) |
                | dF_v̄(θ) Z^T(e^{-iθ})             dF_v̄(θ)           |
By INP B(4), v̄ is persistently exciting of order 2δ. Hence, since dF_v̄(θ) and
Z(e^{iθ}) dF_v̄(θ) are given by the singular part of dF_{(y,u)}(θ), one can solve for Z(e^{iθ})
by part (1) of the theorem above. It is then straightforward to show that the
absolutely continuous part of F_{(y,u)} yields W(e^{iθ}) and Σ. Consequently the
spectral distribution of the limiting joint (y, u) process generated by the system
with parameter θ ∈ Θ maps to the data ([Z_θ(z), W_θ(z)], Σ_θ). This concludes the
proof of part (2).                                                              □
Example 3.1. Perhaps the simplest non-trivial example of the necessity of the
persistent excitation hypothesis in part (1) is given by the three parameter system
γz/(1 - αz) + β ↔ (β, γz, γαz², γα²z³, ...), γ ≠ 0, 0 < |α| < 1, which has Smith-
McMillan degree 1, taken together with an input u for which M_τ = 1 for τ = 2n,


M_τ = 1/2 for τ = 2n + 1, n ∈ Z. In this case it is impossible to obtain (α, β, γ) using
this rank 2 = 2δ data. However, part (1) of Theorem 3.1 states that an input
process which is persistently exciting of order 2δ = 2, and hence a matrix M_{2δ}^{2δ}
possessing 3 × 3 diagonal submatrices of full rank, is sufficient for the
identifiability of this system.                                                 □
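The failure in Example 3.1 can be checked directly. The sketch below is illustrative and assumes the scalar case m = 1, with the covariance sequence as read here (M_τ = 1 for even τ, 1/2 for odd τ, corresponding to a singular spectral distribution with atoms at 0 and π of weights 3/4 and 1/4): the 3 × 3 Toeplitz moment matrix is rank deficient, so the input is not persistently exciting of order 2.

```python
import numpy as np

# Covariance sequence of the scalar input in Example 3.1 as read here.
def M(tau):
    return 1.0 if tau % 2 == 0 else 0.5

def pe_matrix(order):
    """Toeplitz moment matrix [M(i - j)] of size (order + 1); for m = 1 the
    input is persistently exciting of the given order iff this matrix is
    positive definite (full rank)."""
    n = order + 1
    return np.array([[M(i - j) for j in range(n)] for i in range(n)])

T = pe_matrix(2)                    # 3 x 3 matrix for 2*delta = 2
print(np.linalg.matrix_rank(T))     # rank deficient: rows 0 and 2 coincide
```

The matrix is positive semidefinite (it comes from a genuine measure) but singular, which is exactly the rank 2 = 2δ situation the example describes.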

Theorem 3.1 is of intrinsic interest for the system identification problem
discussed in this chapter. Although this theorem is not used directly in the
proof of the main result below, the persistency of excitation hypothesis plays a
crucial role.
Theorem 3.2. Given the dimensions p, δ ∈ Z_1, m ∈ Z_+ of the output, state and
exogenous input processes, respectively, consider the ARMAX systems (1.6a(i))
in S(p, m, δ) evolving on Z_1, or, equivalently, the state-space systems

x_{k+1} = F_ψ x_k + G_ψ^u u_k + G_ψ^w w_k   (1.6b(i))
y_k = H_ψ x_k + D_ψ u_k + w_k               (1.6b(ii))

in S(p, m, δ). As indicated, let members of S(p, m, δ) be parameterized by the
elements ψ of the (analytic) manifold Ψ; further, let 𝒫 ≜ {Σ : Σ = Σ^T > 0, Σ ∈ R^{p×p}}
parameterize the covariance matrices of the orthogonal input process w, and let
Θ ≜ Ψ × 𝒫 be of dimension η ∈ Z_1.

Part 1. Let the initial conditions for the system and the system inputs u and w
be defined on the probability space (Ω, ℬ, P) and let one of the two alternative
sets of assumptions INP A or INP B hold.
When the input hypotheses INP A are in force, the observations on a system
in S(p, m, δ) consist of the entire sample path on Z of the deterministic observed
process u and the sample path of y on Z_1.
When the input hypotheses INP B are in force, the observations on a system
in S(p, m, δ) consist of the sample paths of (y, u) on Z_1.
For a system in S(p, m, δ), parameterized by θ̄ ∈ Θ, generating the observed
process subject to the assumptions INP A or INP B, let the maximum likelihood
estimate θ̂_N of θ̄ be given by a (uniquely specified) maximizing argument of
L_N(y_1^N; u, θ), or L_N(y_1^N, u_1^N; θ), respectively, over θ ∈ C, where C is a compact subset
of Θ containing θ̄.
Then θ̂_N is strongly consistent, that is

θ̂_N → θ̄   a.s.

as N → ∞ in the topology of Θ.
Part 2. Let the hypotheses INP A or INP B hold. In addition, assume the
parameter θ̄ lies in the interior C° of C.
Then

√N (θ̂_N - θ̄) → N(0, H^{-1}(θ̄) [ P(θ̄) Q(θ̄) ; Q^T(θ̄) R(θ̄) ] H^{-1}(θ̄))  in distribution

as N → ∞.

In the above result the following notation is used.

(i) θ̂_N - θ̄ denotes the vector difference of θ̂_N and θ̄ in a local co-ordinate chart
of Θ around θ̄, this difference necessarily being defined for all suitably large
N, and
(ii) H_{ΣΣ}(θ̄) and H_{ψψ}(θ̄) denote the top left and lower right blocks of the matrices

[H(θ)]_{ij} = (∂²/∂θ_i ∂θ_j) {log(det Σ(θ)) + Tr[(Φ(θ) + Ω(θ)) Σ^{-1}(θ)]} |_{θ=θ̄},  1 ≤ i, j ≤ η,   (3.2a)

when INP A holds, and

[H(θ)]_{ij} = (∂²/∂θ_i ∂θ_j) {log(det Σ(θ)) + Tr[(Φ̄(θ) + Ω̄(θ)) Σ^{-1}(θ)]} |_{θ=θ̄},  1 ≤ i, j ≤ η,   (3.2b)

when INP B holds, where the indicated matrix inverse and second partial
differentials exist. Here H is block diagonal under both input hypotheses A and
B, with blocks corresponding to Σ and ψ, where

Φ(θ) ≜ (1/2π) ∫_0^{2π} W_θ^{-1}(e^{iγ}) W(e^{iγ}) Σ(θ̄) W^T(e^{-iγ}) W_θ^{-T}(e^{-iγ}) dγ,   (3.3a)

Φ̄(θ) ≜ (1/2π) ∫_0^{2π} W_θ^{-1}(e^{iγ}) [-Z̃_θ(e^{iγ}) L(e^{iγ}) + W(e^{iγ})] Σ(θ̄) [W^T(e^{-iγ})
        - L^T(e^{-iγ}) Z̃_θ^T(e^{-iγ})] W_θ^{-T}(e^{-iγ}) dγ   (3.3b)

and

Ω(θ) ≜ (1/2π) ∫_0^{2π} W_θ^{-1}(e^{iγ}) Z̃_θ(e^{iγ}) dF(e^{iγ}) Z̃_θ^T(e^{-iγ}) W_θ^{-T}(e^{-iγ})   (3.3c)

where

Z_θ(z) = A_θ^{-1}(z) B_θ(z),  W_θ(z) = A_θ^{-1}(z) C_θ(z),  and  Z̃_θ(z) ≜ Z_θ(z) - Z(z),   (3.3d)

and where F is the non-decreasing function on [0, 2π] given by F_u under INP A
and F_v̄ under INP B such that

M_k = (1/2π) ∫_0^{2π} e^{-ikθ} dF(θ),  k ∈ Z,


where this function necessarily exists in both cases.


Formulas for the blocks of

| P(θ̄)    Q(θ̄) |
| Q^T(θ̄)  R(θ̄) |

under input hypotheses A are given in (3.4i)-(3.4iii) below and under input
hypotheses B are given in (3.4i) and (3.4iv) below. This matrix is block diagonal
under input hypotheses A and B, that is to say Q = 0, with the blocks P and R
corresponding to Σ and ψ respectively. Furthermore H(θ̄) is block diagonal and,
as displayed earlier, the covariance matrix H^{-1}(θ̄)[P(θ̄) Q(θ̄); Q^T(θ̄) R(θ̄)]H^{-1}(θ̄)
of the limiting distribution of √N(θ̂_N - θ̄) equals 2H^{-1}(θ̄).
The entries in the matrix P(θ̄) under input hypotheses A and B are given by

P_{ij}(θ̄) = E{(Tr[Σ^{-1}(θ) ∂Σ(θ)/∂θ_i] + Tr[(w_0 w_0^T) ∂Σ^{-1}(θ)/∂θ_i])
          × (Tr[Σ^{-1}(θ) ∂Σ(θ)/∂θ_j] + Tr[(w_0 w_0^T) ∂Σ^{-1}(θ)/∂θ_j])} |_{θ=θ̄}

        = 2 Tr[Σ^{-1}(θ) (∂Σ(θ)/∂θ_i) Σ^{-1}(θ) (∂Σ(θ)/∂θ_j)] |_{θ=θ̄},  1 ≤ i, j ≤ p(p+1)/2.   (3.4i)
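The second equality in (3.4i) is a standard Gaussian quadratic-form identity and can be checked by simulation. In the sketch below the covariance Σ and the two derivative directions ∂Σ/∂θ_i, ∂Σ/∂θ_j are illustrative assumed values, and the rule ∂Σ^{-1}/∂θ = -Σ^{-1}(∂Σ/∂θ)Σ^{-1} is used for the inverse derivative.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 2
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative covariance
dSi = np.array([[1.0, 0.0], [0.0, 0.0]])     # assumed dSigma/dtheta_i
dSj = np.array([[0.0, 1.0], [1.0, 0.0]])     # assumed dSigma/dtheta_j
Si = np.linalg.inv(Sigma)
# d(Sigma^{-1})/dtheta = -Sigma^{-1} (dSigma/dtheta) Sigma^{-1}
dSi_inv, dSj_inv = -Si @ dSi @ Si, -Si @ dSj @ Si

w = rng.multivariate_normal(np.zeros(p), Sigma, size=400_000)

def term(dS, dS_inv):
    # Tr[Sigma^{-1} dSigma] + Tr[(w0 w0^T) d(Sigma^{-1})], per sample
    return np.trace(Si @ dS) + np.einsum('ni,ij,nj->n', w, dS_inv, w)

mc = np.mean(term(dSi, dSi_inv) * term(dSj, dSj_inv))   # left side of (3.4i)
exact = 2 * np.trace(Si @ dSi @ Si @ dSj)               # right side of (3.4i)
print(round(float(mc), 3), round(float(exact), 3))
```

With a few hundred thousand samples the Monte Carlo estimate agrees with the closed-form trace expression to within sampling error.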

The entries of the matrix R under input hypothesis A are given by

R_{ij} = 4 Tr[M̄_{ij}(θ̄) Σ^{-1}(θ̄)]

        + (4/2π) ∫_0^{2π} Tr[W_θ^{-1}(e^{iλ}) (∂/∂θ_i)W_θ(e^{iλ}) Σ(θ̄) (∂/∂θ_j)W_θ^T(e^{-iλ}) W_θ^{-T}(e^{-iλ}) Σ^{-1}(θ̄)] dλ,

p(p+1)/2 + 1 ≤ i, j ≤ η,   (3.4ii)

where

M̄_{ij}(θ̄) ≜ lim_{N→∞} (1/N) Σ_{k=1}^N ξ_k^{(i)}(θ̄) ξ_k^{(j)T}(θ̄),   (3.4iii)

where ξ_k^{(i)}(θ) denotes Σ_{j=0}^∞ (∂/∂θ_i)K_j(θ) u_{k-j}, where K_j(θ) is the coefficient matrix
of z^j appearing in -W_θ^{-1}(z)Z_θ(z). (The first summand in (3.4ii) also appears as
the first summand in (3.4iv) below, where it is given by its integral representation
with v̄ replacing u.) Under input hypotheses B the entries of the matrix R are
given by
R_{ij} = (4/2π) ∫_0^{2π} [ Tr{W_θ^{-1}(e^{iλ}) (∂/∂θ_i)Z_θ(e^{iλ}) dF_v̄(e^{iλ}) (∂/∂θ_j)Z_θ^T(e^{-iλ}) W_θ^{-T}(e^{-iλ}) Σ^{-1}(θ̄)}

        + Tr[W_θ^{-1}(e^{iλ}) (-(∂/∂θ_j)Z_θ(e^{iλ}) L(e^{iλ}) + (∂/∂θ_j)W_θ(e^{iλ})) Σ(θ̄)
        (-L^T(e^{-iλ}) (∂/∂θ_i)Z_θ^T(e^{-iλ}) + (∂/∂θ_i)W_θ^T(e^{-iλ})) W_θ^{-T}(e^{-iλ}) Σ^{-1}(θ̄)] ] dλ,

p(p+1)/2 + 1 ≤ i, j ≤ η.   (3.4iv)

Outline of Proof

We begin by defining the process ν^∞(θ), which is the prediction error process
generated by the predictor for y_{k+1} based upon y_1^k, u_1^k using the parameter
θ ∈ C. ν^∞(θ) is given by

ν^∞(θ)(z) = C_θ^{-1}(z) A_θ(z) [(A^{-1}(z) B(z) - A_θ^{-1}(z) B_θ(z)) u(z) + A^{-1}(z) C(z) w(z)] + I.C._θ(z)

for each θ ∈ Θ, where I.C._θ(z) is a θ-dependent choice of initial condition such
that

η_θ(z) ≜ [C_θ^{-1}(z) A_θ(z)] [A^{-1}(z) C(z)] w(z) + I.C._θ(z) = W_θ^{-1}(z) W(z) w(z) + I.C._θ(z)

is a zero-mean stationary Gaussian process with spectral density matrix

Φ_ν(z) = W_θ^{-1}(z) W(z) Σ W^T(z^{-1}) W_θ^{-T}(z^{-1})

Our first objective is to study the asymptotic properties of the steady state
version of L_N(y_1^N; u, θ), N ∈ Z_1, where we recall

and in order to do this we define the function

(3.5)
over Θ. Then

Lemma 3.1. Subject to the general hypotheses of the theorem and subject to
hypotheses INP A(1)-A(3),
(3.6)

as N → ∞ for all θ ∈ Θ; moreover this convergence is uniform over any compact
set D ⊂ Θ.
In (3.6) the function L̄(θ) is given by

L̄(θ) = log(det Σ(θ)) + Tr[[Φ(θ) + Ω(θ)] Σ^{-1}(θ)]   (3.7)(i)


where

Φ(θ) ≜ E η_0(θ) η_0^T(θ) = (1/2π) ∫_0^{2π} W_θ^{-1}(e^{iγ}) W(e^{iγ}) Σ(θ̄) W^T(e^{-iγ}) W_θ^{-T}(e^{-iγ}) dγ,   (3.7)(ii)

Ω(θ) ≜ (1/2π) ∫_0^{2π} W_θ^{-1}(e^{iγ}) Z̃_θ(e^{iγ}) dF_u(e^{iγ}) Z̃_θ^T(e^{-iγ}) W_θ^{-T}(e^{-iγ}),   (3.7)(iii)

and

(3.7)(iv)
and where F_u(·) is defined in the theorem statement.

(We remark that F_u exists since, for all N, the Nm × Nm matrix with i,j-th
block entry M_{i-j} is positive, and hence we may apply the theorem of Herglotz
to produce the required function.)
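The Herglotz representation invoked here can be illustrated numerically. For an absolutely continuous spectral distribution dF(θ) = f(θ) dθ with the first-order density below (the parameters a = 0.6 and σ² = 1 are illustrative assumptions), the moment formula M_k = (1/2π) ∫_0^{2π} e^{-ikθ} dF(θ) reproduces the closed-form covariances σ² a^{|k|}/(1 - a²).

```python
import numpy as np

# dF(theta) = f(theta) d(theta) with the AR(1)-type density
# f(theta) = sigma2 / |1 - a e^{i theta}|^2 (illustrative parameters).
a, sigma2 = 0.6, 1.0
n = 20000
theta = 2 * np.pi * np.arange(n) / n    # uniform grid on [0, 2*pi)
f = sigma2 / np.abs(1.0 - a * np.exp(1j * theta)) ** 2

def M(k):
    # Herglotz moment formula, evaluated as a Riemann sum over the grid;
    # for a smooth periodic integrand this is accurate to machine precision.
    return float(np.real(np.mean(np.exp(-1j * k * theta) * f)))

for k in range(4):
    closed_form = sigma2 * a ** k / (1 - a ** 2)
    print(k, round(M(k), 6), round(closed_form, 6))
```

Conversely, any such covariance sequence makes every finite Toeplitz matrix [M_{i-j}] positive, which is the property used in the remark above.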
The next fact is that the exact likelihood function converges as described in

Lemma 3.2. Subject to the general hypotheses of the theorem and subject to the
input hypotheses INP A(1)-A(3),

L_N(y_1^N; u, θ) - L̄_N(θ) → 0  a.s.  as N → ∞   (3.8)

for all θ ∈ Θ and uniformly over any compact set D ⊂ Θ.

The geometric rate of convergence of the solution of the Riccati equation
to its limiting value (see, e.g. Chap. 3, Caines [1988]) is used in the proof of
this lemma and in the proof of Lemma 3.6 below.
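The geometric convergence referred to here can be seen in a one-dimensional instance of the filter Riccati recursion; the model constants below are illustrative assumptions, not values from the chapter.

```python
import numpy as np

# Filter Riccati recursion for a scalar state-space model:
# P_{k+1} = F P F' + Q - F P H' (H P H' + R)^{-1} H P F'.
F, H, Q, R = 0.9, 1.0, 1.0, 1.0

def riccati_step(P):
    return F * P * F + Q - (F * P * H) ** 2 / (H * P * H + R)

# Approximate the fixed point P* (steady-state prediction error variance).
P = 1.0
for _ in range(200):
    P = riccati_step(P)
Pstar = P

# Geometric convergence: the error shrinks by a roughly constant factor,
# close to the squared closed-loop pole of the steady-state filter.
P, errs = 5.0, []
for _ in range(10):
    P = riccati_step(P)
    errs.append(abs(P - Pstar))
ratios = [errs[k + 1] / errs[k] for k in range(8)]
print([round(r, 3) for r in ratios])
```

The near-constant ratios are the numerical signature of the geometric rate exploited in the proofs of Lemmas 3.2 and 3.6.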
We are now in a position to prove the strong consistency part of the theorem
under the input hypotheses INP A.
In order to apply Lemmas 3.1 and 3.2 we identify the compact set D with
the set C containing θ̄. By definition of θ̂_N, we have

L_N(y_1^N; u, θ) ≥ L_N(y_1^N; u, θ̂_N)  a.s.   (3.9)

for each N ∈ Z_1 and all θ ∈ C. Now C is a compact set, so it is sequentially compact,
and the sequence {θ̂_N; N ∈ Z_1} has a convergent subsequence {θ̂_{N_M}; M ∈ Z_1} such
that θ̂_{N_M} → θ* as M → ∞ in the topology of Θ. Further, we observe that θ* is
a ℬ-measurable Θ-valued random variable.
The next step is to insert θ̄ in the left hand side of (3.9) and take limits
along the sequence {N_M; M ∈ Z_1}. Then by Lemmas 3.1 and 3.2 we obtain

L̄(θ̄) ≥ L̄(θ*)  a.s.

The final step is given in


Lemma 3.3. Under the general hypotheses of the theorem and the input hypotheses
INP A(1)-A(4) we have

L̄(θ) ≥ L̄(θ̄) = log(det Σ(θ̄)) + p  a.s.  ∀ θ ∈ Θ,   (3.10)

where, as in (3.7), we have

L̄(θ) = log det Σ(θ) + Tr[[E η_0(θ) η_0^T(θ) + Ω(θ)] Σ^{-1}(θ)]   (3.11)

Further, equality holds in (3.10) if and only if θ = θ̄.

Proof. We show that L̄(θ) has a unique global minimum at θ = θ̄.
Consider the function log det X + Tr[Q X^{-1}], where X and Q are (p × p)
symmetric strictly positive matrices. One may verify that the inequality

log(det X) + Tr Q X^{-1} ≥ log det Q + p   (3.12)

holds, with the lower bound on the right-hand side of (3.12) attained only at
X = Q.
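The inequality (3.12) is easy to check numerically; the positive definite matrices generated below are illustrative random instances.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3

def bound_gap(X, Q):
    """log det X + Tr[Q X^{-1}] - (log det Q + p): nonnegative, zero iff X = Q."""
    ldX = np.linalg.slogdet(X)[1]
    ldQ = np.linalg.slogdet(Q)[1]
    return ldX + np.trace(Q @ np.linalg.inv(X)) - ldQ - p

def random_spd():
    A = rng.normal(size=(p, p))
    return A @ A.T + p * np.eye(p)   # well-conditioned symmetric positive definite

Q = random_spd()
print(round(bound_gap(Q, Q), 9))                                  # gap vanishes at X = Q
print(all(bound_gap(random_spd(), Q) >= 0 for _ in range(200)))   # gap nonnegative elsewhere
```

This is the elementary fact that drives both uses of the lower bound in the proof of Lemma 3.3.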
Identifying terms between (3.11) and (3.12), we obtain the lower bound

log(det[Φ(θ) + Ω(θ)]) + p   (3.13)

for (3.11). Now the function log(·) is strictly monotone increasing on R_+ and
det A > det B for A ≠ B, A = A^T ≥ B = B^T > 0. (This follows since det(X + I) >
det I when X = X^T ≥ 0 and X ≠ 0.) Consequently we may obtain a unique global
minimum for (3.13) at ψ = ψ̄ if we can show that Φ(θ) + Ω(θ) has a unique
strictly positive global minimum (as a positive matrix) at ψ = ψ̄, that is,
Φ(θ) + Ω(θ) ≥ Φ(θ̄) > 0 with equality only if ψ = ψ̄. We do this by showing
Φ(θ) ≥ Φ(θ̄) = Σ(θ̄) > 0 and Ω(θ) ≥ Ω(θ̄) = 0 with equality in both only if
ψ = ψ̄.

Roughly speaking, η(θ) is the "residual sequence" obtained by applying the
θ-parameterized steady-state filter to the full-rank purely stochastic part of the
observed process y once the effect of the u process has been subtracted. Hence
it is intuitively clear that the variance

E η_0(θ) η_0^T(θ)

is minimized at W_θ(z) = W(z).

More rigorously, by the stability assumptions on systems parameterized by
Θ, the sums

W_θ^{-1}(z) = I + P(θ, z) = I + Σ_{j=1}^∞ P_j(θ) z^j  and  W_θ(z) = I + Q(θ, z) = I + Σ_{j=1}^∞ Q_j(θ) z^j

converge absolutely and uniformly to functions which are analytic inside
{z : |z| ≤ 1 + ε, z ∈ C} for some ε > 0.
Then, by the orthogonality of the exponential functions on the unit circle
(or by using Theorem 2.1 of Chap. 3, Caines [1988] on the prediction of


wide-sense stationary processes), we have

E η_0(θ) η_0^T(θ) = (1/2π) ∫_0^{2π} (I + P(θ, e^{iλ})) (I + Q(θ̄, e^{iλ})) Σ(θ̄)
                    × (I + Q(θ̄, e^{-iλ}))^T (I + P(θ, e^{-iλ}))^T dλ   (3.14)

                  ≥ Σ(θ̄)   (3.15)

Now, since Σ(θ) > 0 for all θ ∈ Θ, equality holds in (3.15) if and only if
W_θ^{-1}(z) W(z) = I a.e. on T. We conclude that Φ(θ) is minimized as a positive
matrix if and only if

W_θ(z) = W(z)   (3.16)

in which case Φ(θ) = Σ(θ̄).
It is clear from the definition of Ω in (3.7)(iii) and (3.7)(iv) that
Ω(θ) ≥ Ω(θ̄) = 0 for all θ ∈ Θ. We want to show that Ω(θ) = 0 only if
Z_θ(z) = Z(z). Let us write Z̃_θ(z) in the left coprime factorization (m.f.d.) form
F_θ^{-1}(z) G_θ(z), where F_θ(z) and G_θ(z) are polynomial matrices of order less than
or equal to 2δ.
Now since W_θ^{-1} F_θ^{-1} G_θ dF_u G_θ^* F_θ^{-*} W_θ^{-*} ≥ 0, Ω(θ) = 0 implies

0 = (1/2π) ∫_0^{2π} W_θ^{-1}(e^{iλ}) F_θ^{-1}(e^{iλ}) G_θ(e^{iλ}) dF_u(e^{iλ}) G_θ^T(e^{-iλ}) F_θ^{-T}(e^{-iλ}) W_θ^{-T}(e^{-iλ})


But, since W_θ^{-1} F_θ^{-1} has full rank everywhere on T, we have G_θ dF_u G_θ^* = 0.
Hence

0 = (1/2π) ∫_0^{2π} G_θ(e^{iλ}) dF_u(e^{iλ}) G_θ^T(e^{-iλ}) = Σ_{i,j=0}^{2δ} G_i(θ) M_{j-i} G_j^T(θ),

where

G_θ(z) ≜ Σ_{i=0}^{2δ} G_i(θ) z^i   (3.17)

But then the hypothesis that u is persistently exciting of order 2δ (INP A(4))
yields G_θ(z) = 0, and so

Z_θ(z) = Z(z)   (3.18)

But, by ARMAX, or SSX, (3.16) and (3.18) imply ψ = ψ̄.


At this point we have shown that

L̄(θ) ≥ log(det(Φ(θ) + Ω(θ))) + p ≥ log(det Σ(θ̄)) + p

with equality in the second inequality if and only if ψ = ψ̄.
Now at θ = θ̄, L̄(θ) certainly attains the lower bound log(det Σ(θ̄)) + p. On
the other hand, L̄(θ) = log(det Σ(θ̄)) + p has just been shown to imply ψ = ψ̄,
and hence, by (3.11), at such a value of θ = (ψ, Σ) = (ψ̄, Σ(θ)), we have

L̄(θ) = log(det Σ(θ)) + Tr[Σ(θ̄) Σ^{-1}(θ)] = log(det Σ(θ̄)) + p.


But again using the fact that log det X + Tr Q X^{-1} ≥ log det Q + p (for
X = X^T > 0 and Q = Q^T > 0), with equality only if X = Q, we obtain Σ(θ) = Σ(θ̄).
Hence θ = θ̄ is the unique globally minimizing parameter of L̄(θ) over Θ.   □

Observe that the persistent excitation (rank 2δ) hypothesis of INP A(4) has
played a key role in the proof of this lemma, but it was not necessary to explicitly
prove asymptotic identifiability of the parameterization of Θ.

Asymptotic Normality Under Input Hypotheses A

By the result just proven, θ̂_N lies a.s. within a coordinate neighborhood of θ̄ for
all N sufficiently large, that is, (θ̂_N - θ̄) ∈ N_ε(0), where N_ε(0) is an ε-neighborhood
of the origin in R^η.
Since we may pick ε such that N_ε(θ̄) ⊂ C, we obtain (∂L_N/∂θ_i)(y_1^N; u, θ̂_N) =
0, 1 ≤ i ≤ η, and so the Mean Value Theorem applied to (∂L_N/∂θ_i)(y_1^N; u, θ) at θ̂_N
yields

0 = (∂L_N/∂θ_i)(y_1^N; u, θ̄) + (∂²L_N/∂θ ∂θ_i)(y_1^N; u, θ*_{N,i}) (θ̂_N - θ̄),  1 ≤ i ≤ η,   (3.19)

for θ*_{N,i} an interior point of the line segment [θ̂_N, θ̄], for all N sufficiently large,
where the differential ∂²/∂θ ∂θ_i yields a row vector.
We observe that the definition of L̄(θ) in (3.11) and the analyticity of Ψ
imply that all derivatives required in this section exist and are continuous.
Now in analogy with Lemma 3.1 we analyze the behavior of the relation
(3.19). We begin with the statement of the following lemma:

Lemma 3.4. Subject to the general hypotheses of the theorem and hypotheses INP
A(1)-A(3), we have

(∂²/∂θ_i ∂θ_j) L̄_N(θ) → H_{ij}(θ) ≜ (∂²/∂θ_i ∂θ_j) {log(det Σ(θ)) + Tr[(Φ(θ) + Ω(θ)) Σ^{-1}(θ)]},
1 ≤ i, j ≤ η,  a.s.   (3.20)

as N → ∞ for all θ ∈ Θ, where

and where Φ(θ) and Ω(θ) are defined in the statement of the theorem.

A lengthy technical argument which uses the persistency of excitation (order
2δ) hypothesis yields


Lemma 3.5. Subject to the general hypotheses of the theorem and the hypotheses
INP A(1)-A(4), the (η × η) matrix H(θ̄) is non-singular and block diagonal
with (η' × η') and (η - η') × (η - η') blocks on the diagonal, where the entries in
these blocks are given in formulae (3.2) and (3.3) in the theorem statement.   □
Lemma 3.6. Subject to the general hypotheses of the theorem and hypotheses INP
A(1)-A(3), we have

(∂²L_N/∂θ_i ∂θ_j)(y_1^N; u, θ) → (∂²L̄_N/∂θ_i ∂θ_j)(θ),  1 ≤ i, j ≤ η,  a.s.

as N → ∞ for all θ ∈ Θ, uniformly over any compact set D ⊂ Θ.

The convergence of the matrix term on the right-hand side of (3.19) and an
asymptotic analysis of the explicit formulae

(3.21a)
and

(3.21b)
then permit one to establish the formulae (3.4(i)-(iii)) in the theorem statement.
The plan of the proof under the feedback hypotheses INP B(1)-B(4) is
similar.   □

Concluding Remarks
In this paper we have given a principal maximum likelihood estimation result
for finite dimensional linear systems driven by an orthogonal Gaussian
disturbance process and by deterministic or feedback control inputs. General
ASM-PEM methods include these results as particular cases, but the stronger
hypotheses of Theorem 3.2 above yield the more specific and explicit results
given in its statement.


The finite dimensionality and linearity hypotheses played explicit and
implicit roles in this analysis in the following ways:
(i) The restriction to finite dimensional linear systems permitted the use of
ARMAX and linear state space system descriptions.
(ii) The parameterization question for linear finite dimensional systems has an
explicit solution in terms of the manifold structure of the set of systems
S(p, m, δ) used in this chapter. Furthermore, realization theory, via Hankel
matrix analysis, together with the key persistency of excitation condition,
was exploited to yield the system identifiability result of Theorem 3.1, and
to obtain Lemma 3.2 in the proof of Theorem 3.2. These hypotheses are also
used in the proof of the analogue of Lemma 3.2 when hypotheses INP B
are in force, and in the proof of Lemma 3.5 above and its corresponding
lemma under hypotheses INP B (see Chap. 7, Caines [1988]).
(iii) The likelihood function was expressed in terms of innovation processes
which may be generated by the Kalman filter and the Riccati equation.
The asymptotic analysis of the maximum likelihood estimate throughout
the proof of Theorem 3.2 is then facilitated by the theory of the Kalman
filter.
Acknowledgements

The author gratefully acknowledges conversations with József Bokor, László
Gerencsér, Robert Hermann and Karim Nassiri-Toussi.

References
Bokor, J. and L. Keviczky, ARMA Canonical Forms Obtained from Constructibility Invariants,
Int J Control, 45 (3), 861-873 (1987)
Brockett, R.W., Some Geometric Questions in the Theory of Linear Systems, IEEE Trans Auto
Control, AC-21 (3), 449-455 (1976)
Byrnes, C., The Moduli Space for Linear Dynamical Systems, in the 1976 Ames Research Center
(NASA) Conf on Geometric Control Theory, eds. C. Martin and R. Hermann, Vol VII, Lie Groups:
History, Frontiers and Applications, Math Sci Press, Brookline, MA (1977)
Byrnes, C. and N.E. Hurt, On the Moduli of Linear Dynamical Systems, Stud Anal Adv Math
Suppl Stud, 4, 83-122 (1978)
Caines, P.E., Linear Stochastic Systems, John Wiley, NYC (1988)
Caines, P.E. and J. Rissanen, Maximum Likelihood Estimation of Parameters in Multivariate
Gaussian Stochastic Processes, IEEE Trans Inf Theory, IT-20 (1), 102-104 (1974)
Clark, J.M.C., The Consistent Selection of Local Coordinates in Linear System Identification, Joint
Automatic Control Conference, Purdue Univ, Lafayette, Indiana, July (1976)
Deistler, M., The Properties of the Parameterization of ARMAX Systems and their Relevance for
Structural Estimation, Econometrica, 51, 1187-1207 (1983)
Deistler, M. and M. Gevers, Properties of the Parameterization of Monic ARMA Systems, 8th
IFAC/IFORS Symposium on Identification and System Parameter Estimation, Preprints 2, 1341-1348
(1988)
Delchamps, D.F., Global structure of families of multivariable linear systems with an application
to identification, Math Systems Theory, 18, 329-380 (1985)
Delchamps, D.F., State Space and Input-Output Linear Systems, Springer-Verlag, NYC (1988)
Denham, M.J., Canonical forms for the identification of multivariable linear systems, IEEE Trans
Autom Control, AC-19, 646-656 (1974)
Dickinson, B.W., T. Kailath and M. Morf, Canonical matrix fraction and state-space descriptions
for deterministic and stochastic linear systems, IEEE Trans Autom Control, AC-19, 656-667
(1974)
Dunsmuir, W., A Central Limit Theorem for Parameter Estimation in Stationary Vector Time
Series and its Application to Models for a Signal Observed in Noise, Ann Stat, 7 (3), 490-506
(1979)
Forney, D., Minimal bases of rational vector spaces, with applications to multivariable linear
systems, SIAM J Control, 13, 493-520 (1975)
Gevers, M. and V. Wertz, Uniquely identifiable state-space and ARMA parameterizations for
multivariable linear systems, Automatica, 20 (3), 333-347 (1984)
Glover, K. and J.C. Willems, Parameterizations of linear dynamical systems: Canonical forms and
identifiability, IEEE Trans Automat Contr, AC-19, 640-646 (1974)
Hannan, E.J., The Statistical Theory of Linear Systems, Chapter 2 in Developments in Statistics,
edited by P. Krishnaiah, Academic Press, NY, 83-121 (1979)
Hannan, E.J. and M. Deistler, The Statistical Theory of Linear Systems, John Wiley, NYC (1988)
Hanzon, B., Identifiability, Recursive Identification and Spaces of Linear Dynamical Systems, Ph.D.
Thesis, University of Groningen, Groningen (1986)
Hazewinkel, M. and R.E. Kalman, On Invariants, Canonical Forms and Moduli for Linear,
Constant, Finite Dimensional Dynamical Systems, Mathematical Systems Theory, Udine 1975,
Springer-Verlag, NYC, 48-60 (1976)
Heymann, M., Structure and Realization Problems in the Theory of Dynamical Systems, Springer,
CISM Courses and Lectures No. 204, NY (1975)
Kalman, R.E., Mathematical Description of Linear Dynamical Systems, SIAM J Control, 1, 152-192
(1963)
Kalman, R.E., Irreducible Realizations and the Degree of a Rational Matrix, SIAM J Appl Math,
13, 520-544 (1965)
Kalman, R.E., Algebraic Geometric Description of the Class of Linear Systems of Constant
Dimension, 8th Annual Princeton Conference on Information Sciences and Systems, Princeton, NJ
(1974)
Kalman, R.E., Identifiability and Problems of Model Selection in Econometrics, 4th World Congress
of the Econometric Society, Aix-en-Provence, August (1980)
Ljung, L., System Identification: Theory for the User, Prentice Hall, New Jersey (1987)
Rissanen, J., Basis of Invariants and Canonical Forms for Linear Dynamic Systems, Automatica,
10, 175-182 (1974)
Rissanen, J., Stochastic Complexity in Statistical Inquiry, World Scientific, Singapore (1989)
Rissanen, J. and P.E. Caines, The Strong Consistency of Maximum Likelihood Estimators of ARMA
Processes, The Annals of Statistics, 7 (2), 297-315 (1979)
Rissanen, J. and L. Ljung, Estimation of Optimum Structures and Parameters for Linear Systems,
Mathematical Systems Theory, Udine, 92-110 (1975)
Segal, G., The Topology of Spaces of Rational Functions, Acta Mathematica, 143, 39-72 (1979)
Van Overbeek, A.J.M. and L. Ljung, On Line Structure Selection for Multivariable State Space
Models, Automatica, 18 (5), 529-543 (1982)
Willems, J.C., From Time Series to Linear Systems, Automatica, Part I, 22 (5), 561-580, Part II,
22 (6), 675-694 (1986); Part III, 23 (1), 87-115 (1987)

Identification of Dynamic Systems from Noisy Data:

The Case m* = 1*
M. Deistler¹ and B. D. O. Anderson²
¹ Institute of Econometrics, Operations Research and System Theory, Technical University
of Vienna, A-1040 Vienna, Austria
² Department of Systems Engineering, Research School of Physical Sciences and Engineering,
The Australian National University, GPO Box 4, Canberra, ACT 2601, Australia

Introduction
In identification of linear systems the "main stream" approach to noise modeling
is to add all noise to the outputs (assuming orthogonality), or to the equations
(which is the same for our analysis). In econometrics these models are named
errors-in-equations models. Here we are concerned with the case where in
principle all variables may be contaminated by noise. Such models are called
errors-in-variables (EV) or latent variables models, or, using a slightly different
but equivalent formulation, factor models. Whereas in the errors-in-equations
approach the deterministic system is embedded into its stochastic environment
in an asymmetric way, EV modeling is (in principle) more general and
corresponds to symmetric noise modeling. The asymmetry of errors-in-equations
modeling can be justified in many situations, in particular in prediction; however
there are a number of cases where this asymmetry is not appropriate and leads
to "prejudiced" results. The symmetric EV modeling is appropriate for instance:

(i) If we are interested in the true system generating the data (rather than in
prediction or in encoding the data by system parameters) and we cannot
be sure a priori that the observed inputs are not corrupted by noise.
(ii) If we want to approximate a high dimensional data vector by a relatively
small number of factors.
(iii) If we have no sufficient a priori information about the number of equations
in the system or about the classification of the variables into inputs and
outputs; then we have to perform a more symmetric system modeling, which
in turn demands a more symmetric noise model.
The statistical analysis of EV models has a long history in econometrics,
psychometrics and statistics (see, e.g. Frisch 1934; Koopmans 1937; Bekker and
de Leeuw 1987). Recently there has been a resurging interest in such models;
these models have also attracted attention in system and control theory. Partly this

* Support by the Austrian "Fonds zur Förderung der wissenschaftlichen Forschung", "Schwerpunkt
Angewandte Mathematik" (S32/02), is gratefully acknowledged.


development was triggered by a number of seminal papers authored by Kalman
(see, e.g. Kalman 1982, 1983). In these papers the theoretical and practical appeal
of this approach, as a general approach to the problem of identification of linear
systems, has been pointed out, as well as the technical complications and the
main open problems. The price to be paid for the generality of noise modeling
is a significantly increased complication in the statistical analysis. The main
problem is a basic "nonidentifiability" in the sense that in general the system
is not uniquely determined from the (population) second moments of the
observations, since the separation between the system and the noise part is not
unique.
Our work was definitely inspired by Kalman's ideas. Whereas Kalman
analysed the static case, our focus is on the case of dynamic systems; for the
dynamic case see also Anderson and Deistler (1984, 1987), Picci and Pinzoni
(1986), and Deistler and Anderson (1989).
The system considered is of the form

w(z) ẑ_t = 0   (1.1)

where ẑ_t is the n-dimensional vector of latent (i.e. in general unobserved)
variables, z is used for the backward shift on Z (i.e. z(ẑ_t | t ∈ Z) = (ẑ_{t-1} | t ∈ Z)) as
well as for a complex variable, and

w(z) = Σ_{j=-∞}^∞ w_j z^j;  w_j ∈ R^{m×n}   (1.2)

We will call w(z) the relation function of the exact relation (1.1) (compare Willems
1986). Clearly, systems of the form (1.1) are symmetric in the sense that we need
no a priori classification of the variables ẑ_t into inputs and outputs and no a
priori information about causality directions; without restriction of generality
we will assume that m ≤ n holds and that w(z) contains no linearly dependent
rows; also in general m is not known a priori.
The observed variables are of the form

z_t = ẑ_t + u_t   (1.3)

where u_t is the noise vector.
Throughout the paper we will assume:
(i) (ẑ_t), (z_t) and (u_t) respectively are [wide sense] stationary processes [with real
valued components] with spectral densities Σ̂, Σ and D. [In addition limits
of random variables are understood in the sense of mean square
convergence.]
(ii) E ẑ_t = E u_t = 0
(iii) E ẑ_t u_s* = 0 and finally
(iv) D is diagonal.
For a discussion of assumption (iv) see, e.g. Deistler and Anderson (1989)
and Deistler (1989). Without imposing any assumption besides (ii) and (iii), in


general every system would be compatible with the second moments of the observations; thus some additional assumptions have to be imposed. By assumption
(iv) the common effects are attributed to the system and the individual effect
to the noise. Clearly (iv) cannot be justified universally; in other words it will
be a "prejudice" in a number of applications. However it is a reasonable
assumption for a sufficiently large class of ca ses.
In Dur analysis the frequency .:l. will be kept fixed. In this sense I, E and D
are considered as (constant) Hermitian matrices rather than as spectral densities.
From (1.3) we have
(1.4)
Clearly (1.4) mayaiso be interpreted as coming from a (static) relation between
(Cn-valued random variables z, z and u

z=z+u;
I = Ezz*;

Wz=o,
E = EZf*; D = Euu*

(1.5)
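As a small numerical illustration (ours, not part of the original text): a feasible decomposition (1.4) can be constructed by choosing a singular nonnegative definite Σ̂ together with a diagonal noise covariance D and forming Σ = Σ̂ + D; any vector in the left kernel of Σ̂ is then a relation vector. All numbers below are hypothetical:

```python
import numpy as np

# Hypothetical static example with n = 3 and one exact relation x zhat = 0.
x = np.array([1.0, -2.0, 1.0])                 # relation vector, first component one

# Sigma_hat: nonnegative definite of rank 2, left kernel spanned by x.
V = np.linalg.svd(x.reshape(1, -1))[2][1:].T   # columns form a basis orthogonal to x
Sigma_hat = V @ np.diag([2.0, 1.0]) @ V.T

D = np.diag([0.3, 0.2, 0.5])                   # diagonal noise covariance, assumption (iv)
Sigma = Sigma_hat + D                          # second moments of the observations, (1.4)

assert np.allclose(x @ Sigma_hat, 0.0)         # x lies in the left kernel of Sigma_hat
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # Sigma > 0, assumption (v)
```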

In this paper we will analyse the relation between the second moments of the observations Σ and the system and noise characteristics w(z) and D. Such an analysis is a necessary first step for an analysis of the properties of estimation and inference procedures. The main questions are (compare Deistler and Anderson 1989):
(a) Find the maximum number, m* say, of (linearly independent) rows of w(z) among the set of all w(z) compatible with given Σ. Sometimes we also use the symbol m_c(Σ) for m* if we want to make the dependence on Σ explicit.
(b) Give a description of the set of all (w(z), D) compatible with given Σ; in addition describe the subsets corresponding to different numbers m of linear relations.
(c) Describe the set of all Σ corresponding to a given m*, n > m* ≥ 1.
Thus the problems we consider are (a) to find the (maximum) number of equations for given Σ, (b) to describe the set of all observationally equivalent (based on second moments only) signal and noise characteristics and (c) to describe the set of spectral densities corresponding to a given m*. There is no general solution available for these problems up to now.
In this paper, we focus on the case of general n and m* = 1 [for n general and m* = n - 1 see Anderson and Deistler (1990)]. For the static case, where ẑ_t and u_t are (real) white noise processes (and thus Σ(λ) and Σ̂(λ) are constant with real entries) and w(z) is constant with real entries, this problem has a long history, see, e.g. Frisch (1934), Koopmans (1937), and Kalman (1982), (1983). The dynamic case turns out to be significantly more complicated. The cases n = 2 and n = 3 have been treated in detail in Anderson and Deistler (1984), (1987).
The paper is organized as follows. In Sects. 2 and 3 we are concerned with question (b) and to a small extent also with question (a). The set of all rows of w (when suitably normalized) compatible with a given Σ is called the solution set. In Sect. 2 some topological properties of the solution set are shown. In Sect. 3 some additional results concerning the form of the solution set are derived. In Sect. 4 we investigate the set of all Σ with m_c(Σ) = 1 and the function attaching to every such Σ the corresponding solution set.

426   M. Deistler and B. D. O. Anderson

2 The Solution Set-Some General Properties


In a first step, let us assume temporarily that Σ̂ (rather than Σ) is known. Clearly relation (1.1) implies

w(e^{-iλ}) Σ̂(λ) = 0   (2.1)

Conversely, if we commence from Σ̂ and if we want to explain by the system as much as possible and if we have no additional a priori information, then by (2.1) the rows of w are defined as a basis of the left kernel of Σ̂, and w is unique up to basis change.
Clearly in general only Σ is known and thus equation (1.4) will be the starting point of our analysis. Remember that Σ, Σ̂ and D are nonnegative definite and that Σ̂ is singular and D is diagonal. In view of this, for given Σ, Σ̂ and D are called feasible if

Σ = Σ̂ + D   (2.2)

holds, where Σ - D = Σ̂ is singular and D is diagonal. As easily can be shown, for every Σ ≥ 0 a feasible decomposition (1.4) and thus a corresponding EV representation exists. To avoid having to consider a number of special cases we will assume throughout that
(v) Σ > 0
(vi) σ_ij ≠ 0, i, j = 1, ..., n and
(vii) s_ij ≠ 0, i, j = 1, ..., n hold.
Here S = Σ^{-1}, and as a general rule if e.g. Σ is a matrix, its (i, j) entry is denoted by the corresponding lower case symbol σ_ij.
For given Σ, a vector x ∈ ℂⁿ is called a solution if there exists a feasible Σ̂ satisfying

xΣ̂ = 0   (2.3)

The set of all solutions corresponding to a given Σ is called the solution set L (of Σ); sometimes we also use the notation L_Σ. Analogously we define 𝒟 as the set of all feasible matrices D corresponding to Σ. Since L is the union of linear spaces of dimension greater than zero, we may find a normalization useful. In most parts of the paper, the first component of x, x_1, is normalized to one.


Let us define the matrix S̄ = (s_ij s_i1^{-1}), i, j = 1, ..., n, and let s̄_j denote the j-th row of S̄. Now it is easily seen from

s̄_j Σ = (0, ..., 0, s_j1^{-1}, 0, ..., 0) = s̄_j D_j   (2.4)

(the nonzero entry being in the j-th position, and D_j = diag{0, ..., 0, d_jj^{(j)}, 0, ..., 0} with d_jj^{(j)} = s_jj^{-1}) that s̄_j is the solution (with first component normalized to one) corresponding to the j-th elementary regression, i.e. to the case where all components of z_t, except for the j-th, are assumed to be observed free of noise; s̄_j will be called the j-th elementary solution. Since the first elementary solution s̄_1 always exists, no matrix Σ̂ is excluded by the normalization x_1 = 1. However, the kernel of Σ̂ may be orthogonal to (1, 0, ..., 0) and in this sense the normalization may be a restriction of generality. However, as will be shown in the subsequent Lemma 3, this situation will not occur in the case m* = 1. Clearly elementary solutions can also be defined for singular matrices Σ. They correspond to the projection of the j-th component of z in (1.5) on the space spanned by all other components.
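Numerically, the elementary solutions and the associated noise variances d_jj^{(j)} = 1/s_jj can be read off the inverse S = Σ^{-1}. A small sketch (our own illustration, for a randomly generated real positive definite Σ):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)        # positive definite "spectral density" at a fixed frequency
S = np.linalg.inv(Sigma)

n = Sigma.shape[0]
for j in range(n):
    s_bar = S[j] / S[j, 0]             # j-th row of S, first component one (assumes s_j1 != 0, (vii))
    D_j = np.zeros((n, n))
    D_j[j, j] = 1.0 / S[j, j]          # noise variance of the j-th elementary regression
    # s_bar lies in the left kernel of the feasible Sigma_hat = Sigma - D_j:
    assert np.allclose(s_bar @ (Sigma - D_j), 0.0)
    # Sigma - D_j is nonnegative definite and singular of rank n - 1:
    assert np.linalg.eigvalsh(Sigma - D_j)[0] > -1e-10
    assert np.linalg.matrix_rank(Sigma - D_j, tol=1e-8) == n - 1
```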
Now, let us state some useful lemmas.

Lemma 1. Let Σ ≥ 0 (may be singular). If the n-th row of Σ, σ_n say, is linearly independent from the other rows σ_1, ..., σ_{n-1} of Σ, then the n-th elementary regression gives a noise covariance matrix D_n of the form

D_n = diag{0, ..., 0, d_nn^{(n)}}

where d_nn^{(n)} > 0 and where rk(Σ - D_n) = rk(Σ) - 1 holds (here rk(A) denotes the rank of A).

Proof. The proof is straightforward and can be seen from projecting the n-th component of z (see 1.5) on the linear space spanned by the other components of z.
Lemma 2. Let D = diag{d_ii} be feasible and let d_ii^{(i)} correspond to the i-th elementary regression. Then

0 ≤ d_ii ≤ d_ii^{(i)}   (2.5)

Proof. Without restriction of generality, take i = 1; let D = A + B, where A = diag{d_11, 0, ..., 0} and B = diag{0, d_22, ..., d_nn}.
First note that for d_11 > d_11^{(1)}, the matrix Σ - A would not be nonnegative definite. To see this consider

det(Σ - A) = (σ_11 - d_11) f_1(Σ) + f_2(Σ)   (2.6)

where f_1 and f_2 depend only on Σ and where f_1(Σ) > 0 holds. The expression is zero for d_11 = d_11^{(1)}, and hence is negative for d_11 > d_11^{(1)}.


Now Σ - D = Σ - A - B = C ≥ 0 would imply B + C = Σ - A ≥ 0, which is a contradiction for d_11 > d_11^{(1)}.
For fixed Σ, the relation between L and 𝒟 is given by

xΣ = xD,   x ∈ L,   D ∈ 𝒟   (2.7)

Let us consider the case m_c(Σ) = 1 in more detail now. We investigate the solution set and the set of all feasible matrices D.

Lemma 3. For m_c(Σ) = 1, x ∈ L and x ≠ 0 imply that every entry x_j, j = 1, ..., n, of x is unequal to zero.
Proof. We give a proof by contradiction. If e.g. x_1 = 0 holds for x ∈ L, x ≠ 0, then as seen from (2.7) we may put d_11 = 0 and D remains feasible. Also the last n - 1 rows of Σ - D are clearly linearly dependent. Then the first row of Σ̂ = (Σ - D) is linearly independent from the other rows of Σ̂, since otherwise m_c(Σ) > 1 would hold. Now performing the first elementary regression for Σ̂ (not Σ), using an evident notation, corresponds to determining s̃_1 and D̃ so that

s̃_1 Σ̂ = s̃_1 D̃,   where D̃ = diag{d̃_11, 0, ..., 0} and 0 ≤ D̃ ≤ Σ̂

By Lemma 1, d̃_11 > 0 holds. From

Σ ≥ Σ̂ ≥ Σ̂ - D̃ = Σ - (D + D̃) ≥ 0

we see that D + D̃ is feasible; Lemma 1 then implies rk(Σ - (D + D̃)) < rk Σ̂ and thus m_c(Σ) > 1.
Therefore in the case m_c(Σ) = 1 the normalization x_1 = 1 is no restriction of generality, and we can consider the (normalized) solution set

L̄ = {x̄ | (1, x̄) ∈ L}

The relation between L̄ and 𝒟 (remember that Σ is kept fixed) then is of the form

(1, x̄)Σ = (1, x̄)D   (2.8)

where

Σ = [σ_11, Σ_12; Σ_12*, Σ_22],   D = [d_11, 0; 0, D_22],   Σ_12 ∈ ℂ^{n-1};   Σ_22, D_22 ∈ ℂ^{(n-1)×(n-1)}

Note that for all feasible D, Σ̂_22 = Σ_22 - D_22 must have full rank n - 1, since otherwise m_c(Σ) > 1 would hold. Thus x̄ is uniquely determined for given D ∈ 𝒟 by:

x̄ = -Σ_12 (Σ_22 - D_22)^{-1}   (2.9)


Conversely, for given x̄, D is uniquely determined by

Σ_12 + x̄Σ_22 = x̄D_22   (2.10)

and

σ_11 + x̄Σ_12* = d_11   (2.11)

Thus we have defined a bijection, i say, between L̄ and 𝒟.

Theorem 2.1.
(a) m_c(Σ) = 1 if and only if no x ∈ L, x ≠ 0, has a zero entry.
(b) For m_c(Σ) = 1, the relation between L̄ and 𝒟 defined by (2.9)-(2.11) is a homeomorphism.
(c) For m_c(Σ) = 1, L̄ and 𝒟 are compact. If we consider L̄ as a subset of ℝ^{2(n-1)} ≅ ℂ^{n-1}, then L̄ is of real dimension n - 1.

Proof.
(a) One part is just Lemma 3. Conversely, if m_c(Σ) > 1 holds, then L contains at least one linear subspace of dimension greater than one, and in this subspace clearly there is an element x ≠ 0 with one zero component.
(b) As has been shown already, i is a bijection. The continuity in both directions is easily seen from (2.9)-(2.11).
(c) Clearly, 𝒟 is bounded. Let Σ̂_n = Σ - D_n, D_n ∈ 𝒟, be a convergent sequence with limit Σ̂. We then have Σ ≥ Σ̂ ≥ 0, and Σ̂ is singular since det Σ̂_n = 0 holds and the determinant is a continuous function of its entries; therefore Σ̂ is feasible. Thus every convergent sequence D_n ∈ 𝒟 has its limit D in 𝒟; in other words 𝒟 is closed. Since the image of a compact set under a continuous mapping is compact, L̄ is compact by (b). As is seen from (2.9) and (2.11), for given d_22, ..., d_nn (and given Σ of course) x̄ and thus d_11 are uniquely determined. Note that D = λ_min(Σ)·I, where λ_min(A) is used to denote the smallest eigenvalue of A ≥ 0, is always feasible. This follows from

Σ - λ_min(Σ)·I = U(Λ - λ_min(Σ)·I)U* ≥ 0   (2.12)

where Λ is the diagonal matrix of eigenvalues of Σ and where U is a unitary matrix. For m_c(Σ) = 1, all principal minors of (Σ_22 - λ_min I_{n-1}) are strictly positive, and since the determinant is a continuous function of the matrix entries, all principal minors of Σ_22 - D_22 are also positive in a suitably chosen neighborhood of λ_min I_{n-1}; moreover this neighborhood can be chosen such that 0 < d_11 < d_11^{(1)} holds. Thus this neighborhood is homeomorphic to the corresponding neighborhood of λ_min I in 𝒟. Replacing λ_min I by a general D ∈ 𝒟 and using an analogous argument, we can show that 𝒟 is locally homeomorphic to ℝ₊^{n-1} and thus a topological manifold of dimension n - 1.

Remark 1. Note that for the static case a significantly more far-reaching result is available, see Frisch (1934), Koopmans (1937) and Kalman (1982). In this case m* = 1 if and only if Σ^{-1} is "like" a positive matrix (i.e. Σ^{-1} can be made into a matrix with positive entries by possibly multiplying some of the rows and the corresponding columns by -1), and the solution set L̄ is the convex hull generated by the elementary solutions. The proof of this result relies on a Perron-Frobenius type argument which cannot be carried over to matrices with complex entries. Note in particular that whereas it is easy to check whether Σ^{-1} is like a positive matrix, condition (a) is not of much value for that purpose. Also note that, as can be seen from Anderson and Deistler (1987) for the case n = 3, in the dynamic situation we may have that Σ^{-1} is not like a positive matrix (with properly complex entries) and still m* = 1 holds. As we will see in Sect. 3, in general for the complex case the solution set L̄ will not be a polytope, since it has curved bounding hyper-surfaces.
Remark 2. Note that L̄ is compact if and only if m_c(Σ) = 1 holds: Assume m_c(Σ) > 1; let D̂ be feasible such that the dimension of the kernel of Σ - D̂, ker(Σ - D̂), is equal to m_c(Σ); now it is straightforward to show that the dimension of ker(Σ - D̂) ∩ {x ∈ ℂⁿ | x_1 = 1} is equal to m_c(Σ) - 1 and thus is greater than zero. Also note that for m_c(Σ) > 1 the relation between L̄ and 𝒟 is not a function in either direction.

3 Solutions on Complex Lines


In this section we further investigate the solution set. Again Σ is kept fixed. The main idea here is to connect two points, x and y say, from the solution set by the complex line

αx + (1 - α)y,   α ∈ ℂ   (3.1)

and to investigate for which α, αx + (1 - α)y ∈ L holds. Note that αx + (1 - α)y, α ∈ ℂ, is a plane in ℝ^{2n}. The results obtained in this section are valid for general m_c(Σ). We start from the equation

(αx + (1 - α)y)Σ = (αx + (1 - α)y)D = αxD_x + (1 - α)yD_y   (3.2)

where x, y ∈ L, x_1 = y_1 = 1 and D_x and D_y correspond to x and y respectively; D is diagonal and the unknown variable in (3.2). Clearly αx + (1 - α)y ∈ L if and only if there is a D satisfying (3.2) and D ≥ 0 and Σ - D ≥ 0 hold.
First consider the case x = s̄_1, y = s̄_j, j > 1, D_x = D_1 and D_y = D_j, i.e. we investigate the real plane given by the first and the j-th elementary solutions. Then the first equation in (3.2) is of the form

α d_11^{(1)} = d_11   (3.3)

and the j-th equation is of the form

(1 - α) s̄_jj d_jj^{(j)} = (α s̄_1j + (1 - α) s̄_jj) d_jj

which gives

d_jj = d_jj^{(j)} / (1 + (α/(1 - α)) (s̄_1j/s̄_jj))   (3.4)

By Lemma 2, d_jj^{(j)} ≥ d_jj ≥ 0 must hold for every feasible D. Also note that s̄_1j s̄_jj^{-1} > 0; thus (3.3) and (3.4) imply α ∈ [0, 1]. Put

d_ii = 0   for i ≠ 1, i ≠ j   (3.5)

then such a prescription for D satisfies (3.2) and D ≥ 0 for every α ∈ [0, 1]. In order to show that a D given by (3.3)-(3.5) is feasible for every α ∈ [0, 1], it remains to show that Σ - D ≥ 0 holds. Note that for the j-th elementary regression d_jj^{(j)} is the unique solution of the equation

det(Σ - diag(0, ..., 0, d_jj, 0, ..., 0)) = 0   (3.6)

in the variable d_jj ∈ ℝ. This is a direct consequence of the fact that (3.6) is a linear equation with a positive coefficient for (σ_jj - d_jj) (compare 2.6). Now performing the j-th elementary regression for Σ - αD_1, α ∈ (0, 1), we see that the corresponding noise covariance matrix is diag{0, ..., 0, d_jj, 0, ..., 0} with d_jj given by (3.4), and thus Σ - D = (Σ - αD_1) - diag{0, ..., 0, d_jj, 0, ..., 0} ≥ 0.
Let us consider

F = {x ∈ ℂⁿ | xσ_1 = 0}   (3.7)

where σ_1 denotes the first column of Σ. For every y ∈ F ∩ L we can choose a corresponding feasible D with d_11 = 0. Thus, for x = s̄_1 and y ∈ F ∩ L we have (3.3) and thus α ∈ [0, 1]. [Clearly, here, in general not every α ∈ [0, 1] gives a solution.]
In an analogous way as before we proceed in the case x = s̄_l, y = s̄_j, l, j ≠ 1; l ≠ j. Then from (3.2) we obtain

α s̄_ll d_ll^{(l)} = (α s̄_ll + (1 - α) s̄_jl) d_ll   (3.8)

for the l-th equation and

(1 - α) s̄_jj d_jj^{(j)} = (α s̄_lj + (1 - α) s̄_jj) d_jj   (3.9)

for the j-th equation. Again we put d_ii = 0, i ≠ l, i ≠ j. Equations (3.2) then are equivalent to

(1/α - 1)·(s̄_jl/s̄_ll) ≥ 0

which in turn is equivalent to

arg(1/α - 1) = arg s̄_ll - arg s̄_jl   (3.10)

Fig. 1. Two sets of all feasible α, shown in the complex α-plane (each an arc of a circle)

Using the same argument as above, we can show that Σ - D ≥ 0 holds. Now let us discuss condition (3.10). We may distinguish between three cases.
(i) If s̄_jl/s̄_ll > 0 holds, then (3.10) is equivalent to α ∈ (0, 1].
(ii) If s̄_jl/s̄_ll < 0 holds, then (3.10) is equivalent to α ∈ [1, ∞) ∪ (-∞, 0).
(iii) If s̄_jl/s̄_ll is not real, then the set of all feasible α is an arc of a circle as shown in Fig. 1.
For a more detailed discussion for the cases n = 3 and n = 4 see Scherrer et al. (1990).
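Cases (i) and (ii) can be checked numerically. The sketch below (our own illustration, for a randomly generated real Σ, so that the ratio s̄_jl/s̄_ll is real) builds the diagonal D from (3.8)-(3.9) and verifies feasibility over the predicted range of α:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)          # a real positive definite Sigma, n = 3
S = np.linalg.inv(Sigma)

l, j = 1, 2                               # indices of two elementary solutions with l, j != 1
s_l = S[l] / S[l, 0]                      # elementary solutions, first component one
s_j = S[j] / S[j, 0]

def D_of(alpha):
    """Diagonal D solving the l-th and j-th equations (3.8)-(3.9); d_ii = 0 otherwise."""
    D = np.zeros((3, 3))
    D[l, l] = alpha * s_l[l] / (alpha * s_l[l] + (1 - alpha) * s_j[l]) / S[l, l]
    D[j, j] = (1 - alpha) * s_j[j] / (alpha * s_l[j] + (1 - alpha) * s_j[j]) / S[j, j]
    return D

ratio = s_j[l] / s_l[l]
# Case (i): alpha in (0, 1]; case (ii): alpha in [1, inf) or (-inf, 0).
alphas = (np.linspace(0.05, 1.0, 20) if ratio > 0
          else np.r_[np.linspace(1.0, 5.0, 10), np.linspace(-5.0, -0.2, 10)])
for a in alphas:
    D = D_of(a)
    z = a * s_l + (1 - a) * s_j
    assert np.allclose(z @ (Sigma - D), 0.0)           # z satisfies (3.2)
    assert np.all(np.diag(D) >= -1e-12)                # D >= 0
    assert np.linalg.eigvalsh(Sigma - D)[0] >= -1e-9   # Sigma - D >= 0
```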

4 The Relation Between Σ and L̄


Let 𝒮_m denote the set of all Σ > 0, satisfying (vi) and (vii), such that m_c(Σ) = m. In a next step we now consider some properties of 𝒮_1 and of solutions for varying (rather than for fixed) Σ ∈ 𝒮_1.
We will consider the set 𝒮 of all Σ satisfying (v)-(vii) as an open subset of ℝ^{n²}. (Note that the set of all positive definite Hermitian matrices is open in ℝ^{n²}.)

Theorem 4.1. 𝒮_1 is an open subset of 𝒮.


Proof. We have to show that for every Σ ∈ 𝒮_1 there is a neighborhood contained in 𝒮_1. If this were not true, there would be a sequence Σ_n converging to Σ with m_c(Σ_n) > 1 for all n. Thus, there would exist Σ̂_n feasible for Σ_n with rk Σ̂_n < n - 1. By Σ̂_n ≤ Σ_n and Σ_n → Σ, (Σ̂_n) is a bounded sequence, and thus has at least one limit point, Σ̂ say. Since the determinant is a continuous function of the entries of a matrix, we then have rk Σ̂ < n - 1 and 0 ≤ Σ̂ ≤ Σ, and thus m_c(Σ) > 1, in contradiction to our assumption.
From the idea of the proof above, we immediately obtain:

Corollary. 𝒮_1 ∪ 𝒮_2 ∪ ... ∪ 𝒮_m, 1 ≤ m ≤ n - 1, is open in 𝒮.
Consider the function l, defined on 𝒮_1, which attaches to every Σ the corresponding (normalized) solution set L̄ = L̄_Σ. For any two compact subsets, L, M say, of ℂ^{n-1} a metric can be defined by

d(L, M) = sup(ρ(L, M), ρ(M, L))   (4.1)

where

ρ(L, M) = sup_{x∈L} inf_{y∈M} ‖y - x‖   (4.2)

d is called the Hausdorff distance (see, e.g. Dieudonné (1969), p. 61). By C we denote the set of all compact subsets of ℂ^{n-1} endowed with the Hausdorff distance.
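For finite point sets, (4.1)-(4.2) can be computed in a few lines; a small illustrative helper (ours, not from the paper):

```python
import numpy as np

def hausdorff(L, M):
    """Hausdorff distance (4.1)-(4.2) between finite point sets in C^(n-1),
    given as complex arrays of shape (k, n-1) and (m, n-1)."""
    pair = np.linalg.norm(L[:, None, :] - M[None, :, :], axis=2)  # all pairwise distances
    rho_LM = pair.min(axis=1).max()   # sup over x in L of inf over y in M
    rho_ML = pair.min(axis=0).max()
    return max(rho_LM, rho_ML)

L = np.array([[0.0 + 0j], [1.0 + 0j]])
M = np.array([[0.0 + 0j], [3.0 + 0j]])
print(hausdorff(L, M))   # 2.0: the point 3 is at distance 2 from its nearest point in L
```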
Theorem 4.2. The function l : 𝒮_1 → C : l(Σ) = L̄_Σ is continuous.

Proof. Consider a sequence Σ_n → Σ ∈ 𝒮_1. By Theorem 4.1 we may assume Σ_n ∈ 𝒮_1. We will show that

sup_{x_n ∈ L̄_{Σ_n}} inf_{x ∈ L̄_Σ} ‖x_n - x‖ → 0   (4.3)

As the norm is a continuous function and the (normalized) solution sets are compact, we may replace the inf by min and the sup by max.
As we are in the case m* = 1, every principal minor up to size n - 1 of (Σ - D), D ∈ 𝒟, is strictly positive. Since these minors are continuous functions of D and since 𝒟 is compact by Theorem 2.1, all these principal minors are bounded away from zero by a positive constant. Furthermore, by the same type of argument we see that there is a compact neighborhood V_δ = {A ≥ 0 | ∃D ∈ 𝒟 s.t. ‖A - (Σ - D)‖ ≤ δ} of the set {Σ - D | D ∈ 𝒟} where the same statement remains valid, i.e. there is a δ > 0 and a corresponding c > 0 such that all principal minors in the set V_δ are uniformly bounded away from zero by c. In the following we will assume that ε_1 > 0 is chosen sufficiently small so that we stay in V_δ in performing the subsequent steps.
Since Σ_n → Σ, for every ε_1 ∈ (0, 1) there is an n_0 such that

(1 - ε_1)Σ ≤ Σ_n ≤ (1 + ε_1)Σ   for all n ≥ n_0   (4.4)
(4.4)


Let Σ̂ and D be feasible for Σ and define

D_n = (1 - ε_1)D

Then D_n ≥ 0 and

2ε_1 Σ + (1 - ε_1)Σ̂ = (1 + ε_1)Σ - (1 - ε_1)D ≥ Σ_n - D_n ≥ (1 - ε_1)Σ̂ ≥ 0   (4.5)

Now let us perform the first elementary regression for Σ_n - D_n, which gives a feasible decomposition of the type (1.4)

Σ_n - D_n = Σ̂_n + D̃_n,   D̃_n = diag{d̃_11, 0, ..., 0}

Let D̄_n = D_n + D̃_n; then Σ̂_n and D̄_n are feasible for Σ_n.
Now, let us write, using an obvious notation [compare (2.8)]:

‖x̄_n - x̄‖ = ‖Σ_12(Σ_22 - D_22)^{-1} - Σ_12,n(Σ_22,n - D̄_22,n)^{-1}‖
= [det(Σ_22 - D_22)·det(Σ_22,n - D̄_22,n)]^{-1}·‖Σ_12·adj(Σ_22 - D_22)·det(Σ_22,n - D̄_22,n) - Σ_12,n·adj(Σ_22,n - D̄_22,n)·det(Σ_22 - D_22)‖
≤ c^{-2}·ε_2   (4.6)

where ε_2 > 0 is determined from ε_1 by (4.4) and (4.5); ε_2 can be chosen arbitrarily small by a suitable choice of ε_1, and can be chosen independently of the choice of D ∈ 𝒟. Thus we have shown (4.3). By the same argument we can show

sup_{x ∈ L̄_Σ} inf_{x_n ∈ L̄_{Σ_n}} ‖x_n - x‖ → 0   (4.7)

which proves the theorem.


Theorems 4.1 and 4.2 are important for a statistical analysis. In particular we see that a consistent estimator of Σ can directly be used to obtain a consistent estimator for L̄_Σ.
Next we consider the situation where the noise variance is converging to zero; in this case the solution sets converge to a singleton:

Theorem 4.3. Let Σ_n → Σ, where Σ_n ∈ 𝒮_1 and rk Σ = n - 1, L̄_Σ = ker Σ ∩ {x ∈ ℂⁿ | x_1 = 1} ≠ ∅. Then L̄_{Σ_n} → L̄_Σ.

Proof. Since Σ_n converges to a singular matrix, also the variances d_ii,n^{(i)} of the corresponding noise terms for the elementary regressions converge to zero. Thus, by Lemma 2, we have Σ̂_n → Σ for any Σ̂_n feasible for Σ_n. The rest is straightforward from

‖x̄_n - x̄‖ = ‖Σ_12 Σ_22^{-1} - Σ_12,n(Σ_22,n - D_22,n)^{-1}‖
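Theorem 4.3 can be illustrated numerically (our own sketch, with hypothetical numbers): as the noise term eps·D shrinks, all elementary solutions of Σ_n = Σ̂ + eps·D collapse to the normalized kernel vector of the singular limit:

```python
import numpy as np

x = np.array([1.0, -1.0, 2.0])                    # ker Sigma_lim, normalized to x_1 = 1
V = np.linalg.svd(x.reshape(1, -1))[2][1:].T      # basis orthogonal to x
Sigma_lim = V @ V.T                               # rank 2, x @ Sigma_lim = 0
D = np.diag([1.0, 2.0, 0.5])                      # diagonal noise covariance

for eps in [1e-1, 1e-3, 1e-5]:
    Sigma_n = Sigma_lim + eps * D                 # Sigma_n -> Sigma_lim as eps -> 0
    S = np.linalg.inv(Sigma_n)
    sols = [S[j] / S[j, 0] for j in range(3)]     # the three elementary solutions
    spread = max(np.linalg.norm(s - x) for s in sols)
    print(eps, spread)                            # spread shrinks with eps
```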

References
Anderson, B.D.O. and M. Deistler, 1984, Identifiability in dynamic errors-in-variables models, Journal of Time Series Analysis 5, 1-13
Anderson, B.D.O. and M. Deistler, 1987, Dynamic errors-in-variables systems with three variables, Automatica 23, 611-616
Anderson, B.D.O. and M. Deistler, 1990, Identification of dynamic systems from noisy data: The case m* = n - 1, Mimeo
Bekker, P. and J. de Leeuw, 1987, The rank of reduced dispersion matrices, Psychometrika 52, 125-135
Deistler, M., 1989, Symmetric modeling in system identification, in: H. Nijmeijer and J.M. Schumacher, eds., Three Decades of Mathematical System Theory, Springer Lecture Notes in Control and Information Sciences, no. 135, Springer-Verlag, Berlin
Deistler, M. and B.D.O. Anderson, 1989, Linear dynamic errors-in-variables models, some structure theory, Journal of Econometrics 41, 39-63
Dieudonné, J., 1969, Foundations of modern analysis, Academic Press, New York
Frisch, R., 1934, Statistical confluence analysis by means of complete regression systems, Publication no. 5, Economic Institute, University of Oslo, Oslo
Kalman, R.E., 1982, System identification from noisy data, in: A. Bednarek and L. Cesari, eds., Dynamic systems II, a University of Florida international symposium, Academic Press, New York, NY
Kalman, R.E., 1983, Identifiability and modeling in econometrics, in: P.R. Krishnaiah, ed., Developments in statistics, Vol. 4, Academic Press, New York, NY
Koopmans, T.C., 1937, Linear regression analysis of economic time series, Netherlands Economic Institute, Haarlem
Picci, G. and S. Pinzoni, 1986, Dynamic factor-analysis models for stationary processes, IMA Journal of Mathematical Control and Information 3, 185-210
Scherrer, W., M. Deistler, M. Kopel and W. Reitgruber, 1990, Solution sets for linear dynamic errors-in-variables models, Mimeo
Willems, J.C., 1986, From time series to linear systems, Part I, Automatica 22, 561-580

Adaptive Control*
K. J. Åström
Department of Automatic Control, Lund Institute of Technology, S-221 00 Lund, Sweden

This paper gives an overview of some ideas in adaptive control that originate from a paper published by Kalman in 1958. The key ideas are given using quotes from Kalman's paper. It is described how the ideas were developed to give practically useful adaptive controllers. The paper ends with a few personal remarks.

1 Introduction
Several schemes for adaptive control systems originate from the fifties. Stimulation came from two application areas: flight control systems and computerized process control. With the emergence of supersonic aircraft which operated over a wide range of speeds and heights, it was found that controllers having constant gain were inadequate. An active program in adaptive control was initiated to find controllers that could cope with the situation (see [Gregory 1959]). Ideas like self-oscillating adaptive systems and model reference adaptive systems were explored, but the practical solution turned out to be gain scheduling. The possibility to use digital computers for process control also generated a lot of interest in the late fifties [Åström and Wittenmark 1990]. Before starting his graduate studies, Kalman worked with these problems in the Engineering Research Laboratory at DuPont. In connection with this, he suggested the adaptive control scheme which is now known as the self-tuning regulator. To describe the idea we quote from the paper [Kalman 1958]:

(i) The dynamic characteristics of the process are to be represented in the form of Difference Equation [6]**, the coefficients of which are to be computed from measurements. The number n = q is assumed arbitrarily. In general, the higher n, the more accurate the representation of the process by the Difference Equation [6].
(ii) The coefficients a_j and b_j should be determined anew at each sampling instant so as to minimize the weighted mean-square error E(N).
(iii) The calculations necessary for determining the coefficients consist of modifications of the classical least-squares filtering procedure and are given in the Appendix.

* This paper was written while the author was Visiting Professor at the Department of Electrical and Computer Engineering, University of Texas at Austin, Austin TX 78712.
** Equation [6] is c_k + b_1 c_{k-1} + ... + b_n c_{k-n} = a_0 m_k + a_1 m_{k-1} + ... + a_n m_{k-n} + e_k, where m denotes the control variable and c the process output.


(iv) The choice of an optimal controller is largely arbitrary, depending on what aspect of system response is to be optimized. The determination of the coefficients in the describing equations of the controller is a routine matter if the coefficients of the pulse-transfer function are known.

Only second order systems were considered, and the least squares algorithm was simplified in order to implement the algorithm. Kalman's paper also describes a special purpose computer that was used to implement the algorithm. The reasons for using a special purpose computer give an interesting perspective on the development of computer technology that has taken place since 1958. We quote:

As soon as the operations discussed in the foregoing sections have been reduced to a set of numerical calculations (see Appendix) the machine has been synthesized in principle. This means that any general-purpose digital computer can be programmed to act as the self-optimizing machine. In practical applications, however, a general-purpose digital computer is an expensive, bulky, extremely complex, and somewhat awkward piece of equipment. Moreover, the computational capabilities (speed, storage capacity, accuracy) of even the smaller commercially available general-purpose digital computers are considerably in excess of what is demanded in performing the computations listed in the Appendix.
For these reasons, a small special-purpose computer was constructed ...

This paper presents some of the developments of the basic idea that have occurred after the publication of Kalman's paper. Section 2 gives more details about adaptive algorithms. In particular, the notions of direct and indirect adaptive control are introduced. Adaptive systems are analysed in Sect. 3. It is interesting to note that the algorithm for recursive least squares estimation has the same structure as the Kalman filter. This can be exploited in analysis. Some unexpected properties of direct adaptive algorithms are investigated in Sect. 4. Industrial uses of the algorithm are briefly mentioned in Sect. 5. In the Conclusions, we give some historical remarks and acknowledge Kalman's impact on my own research.

2 Algorithms
Kalman's suggestion for an adaptive controller is captured by the block diagram
given in Fig. 1. There are obviously many different choices of control and
estimation algorithms, as was pointed out in quote (iv) in Sect. 1.
To be specific, assume that the process to be controlled can, with sufficient accuracy, be described by

A(q^{-1})y(t) = B(q^{-1})u(t - d_0) + v(t)   (1)

where y is the output, u is the input and v is a disturbance. Specific assumptions about the disturbances are made later. A and B are polynomials in the delay operator q^{-1}. It is assumed that deg A = n and deg B = m. Without loss of generality, it can also be assumed that m + d_0 = n. The order of the system is then n.

6 Identification and Adaptive Control-Adaptive Control   439

Fig. 1. Block diagram of the self-tuning regulator: an estimation block determines the process parameters, a design block computes the controller parameters, and the controller acts on the process

Parameter Estimation
A recursive algorithm for estimating the parameters of (1) will first be given. If least squares estimation is used, it is straightforward to derive the following equations; see, e.g. [Åström and Wittenmark 1989]:

θ̂(t) = θ̂(t - 1) + K(t)e(t)
e(t) = y(t) - φᵀ(t - 1)θ̂(t - 1)
K(t) = P(t - 1)φ(t - 1)(R_2 + φᵀ(t - 1)P(t - 1)φ(t - 1))^{-1}
P(t) = (I - K(t)φᵀ(t - 1))P(t - 1) + R_1   (2)

where θ̂ is a vector of estimates of the parameters of the model (1), i.e.

θ̂ = [b_0 b_1 ... b_m a_1 ... a_n]ᵀ   (3)

φ is a vector of regressors

φ(t - 1) = [u(t - d_0), ..., u(t - d_0 - m), -y(t - 1), ..., -y(t - n)]ᵀ   (4)

R_1 is a nonnegative symmetric matrix and R_2 a positive number.
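Equations (2)-(4) translate directly into code. The following sketch (ours, not from the paper; the plant and its numbers are hypothetical) runs the recursion on data simulated from a first-order instance of model (1):

```python
import numpy as np

def rls_step(theta, P, phi, y, R1, R2):
    """One update of the recursive least-squares equations (2)."""
    e = y - phi @ theta
    K = P @ phi / (R2 + phi @ P @ phi)
    theta = theta + K * e
    P = (np.eye(len(theta)) - np.outer(K, phi)) @ P + R1
    return theta, P

# Simulate y(t) = b0*u(t-1) - a1*y(t-1) + noise, i.e. model (1) with
# A = 1 + a1 q^{-1}, B = b0, d0 = 1.
rng = np.random.default_rng(0)
a1, b0 = -0.8, 0.5                       # true parameters, theta = [b0, a1]
theta = np.zeros(2)
P = 100.0 * np.eye(2)
R1, R2 = np.zeros((2, 2)), 1.0

y_prev, u_prev = 0.0, 0.0
for t in range(500):
    u = rng.standard_normal()
    y = b0 * u_prev - a1 * y_prev + 0.01 * rng.standard_normal()
    phi = np.array([u_prev, -y_prev])    # regressor vector (4)
    theta, P = rls_step(theta, P, phi, y, R1, R2)
    y_prev, u_prev = y, u

print(theta)   # close to [0.5, -0.8]
```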

Control Design
There are many possible choices of control design techniques in an adaptive system. Many properties of the adaptive systems can be derived without considering the details of the control design. Many different approaches have been explored: minimum variance control, pole placement, LQG, etc. See [Goodwin and Sin 1984] and [Åström and Wittenmark 1989].


Indirect Adaptive Control


Assuming that a method to compute a controller is available, the following adaptive control algorithm can now be formulated.

Algorithm 1: Indirect Self-Tuning Regulator

Step 1. Estimate the coefficients of the polynomials A and B in equation (1) recursively using the least-squares method given by equation (2).
Step 2. Compute the parameters of the controller using the chosen method for control design, where the estimates obtained in Step 1 are used as the true model parameters.
Repeat Steps 1 and 2 at each sampling period. □

The simplification that the estimated parameters are used instead of the true parameters when computing the control law is called the certainty equivalence assumption. Notice that the least squares method also gives an estimate of the parameter uncertainties. It is, however, a significant complication to take the uncertainties into account.
The algorithm is called indirect because the controller parameters are obtained indirectly, by first estimating process parameters and then computing a control law. The control design can be described mathematically by a nonlinear mapping. A difficulty is that many control designs have mappings that are singular for certain process parameters, typically when reachability or observability is lost due to a pole-zero cancellation. See [Kalman et al. 1963].
By a simple reparameterization of the model it is possible to obtain algorithms where the controller parameters are updated directly. To do the reparameterization, however, it is necessary to introduce the control design method explicitly. A simple control design method will be introduced before the direct algorithm is described.
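The certainty-equivalence loop of Algorithm 1 can be sketched in a few lines. This is our own toy example (not from the paper): a hypothetical first-order plant y(t+1) = a·y(t) + b·u(t), least-squares estimation of (a, b) as Step 1, and a deadbeat design u = (r - â·y)/b̂ as Step 2:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 0.9, 0.5                       # true (unknown) plant parameters
theta = np.array([0.0, 1.0])          # initial estimates [a_hat, b_hat]
P = 100.0 * np.eye(2)
r = 1.0                               # set point

y = 0.0
for t in range(200):
    a_hat, b_hat = theta
    b_hat = b_hat if abs(b_hat) > 1e-3 else 1e-3   # guard the singular design case
    u = (r - a_hat * y) / b_hat + 0.1 * rng.standard_normal()  # probing noise for excitation
    y_next = a * y + b * u
    phi = np.array([y, u])
    K = P @ phi / (1.0 + phi @ P @ phi)            # least-squares update, cf. equation (2)
    theta = theta + K * (y_next - phi @ theta)
    P = (np.eye(2) - np.outer(K, phi)) @ P
    y = y_next

print(theta)   # approaches the true parameters [0.9, 0.5]
```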

Moving Average Control


A stochastic control algorithm will now be described. To do this it is necessary to characterize the disturbance v in equation (1). Here it will be assumed that v is a moving average of white noise. The model (1) then becomes

A(q^{-1})y(t) = B(q^{-1})u(t - d_0) + C(q^{-1})e(t)   (5)

where e(t) is white Gaussian noise. This is a general model for linear stochastic SISO systems (see [Åström 1970]). The polynomial C can be assumed stable and of degree n without losing generality. The model (5) is a good way to


describe steady state regulation of an industrial process. A natural criterion for


regulation of an industrial process is to have small fluctuations in the process
output and moderate control motions. This can be captured by a quadratic
loss function. If this is done, however, it is necessary to introduce a design
parameter in the form of areal number that gives the relative weight of control
error and control variables. An alternative is to use the so-called moving average
controller. This has the advantage that the design variable is a bounded integer.
The moving average controller is a controller such that the covariance function
has compact support. For a discrete time system, this means that the output
is a moving average of white noise. To derive the controllaw, first notice that
a general linear controllaw can be expressed by
R(q-l )u(t) = - S(q-l )y(t)

(6)

Combining (6) with (5) gives


RC
y(t) = AR + q-dO BS e(t)
where the arguments q-l have been dropped to simplify the notation. To make
y a moving average, RC/(AR + q-dOBS) must be a polynomial. This can be
achieved by choosing the feedback polynomials Rand S such that
AR + q-dOBS = C

To solve this equation for arbitrary C, it must be required that A and B are
relatively prime. Among the infinitely many polynomial solutions, it is desirable
to find a solution where R has the lowest degree. This will make the moving
average as short as possible. There is a unique solution such that

deg R = d0 + deg B - 1 = n - 1

There are, however, solutions having lower degree. They are obtained when R
and B have common factors. This means that the controller cancels some process
zeros. To describe these solutions, introduce the factorization B = B+ B- and
assume that B+ divides R, i.e. R = R1 B+. Hence

RC / (AR + q^-d0 BS) = R1 C / (A R1 + q^-d0 B- S)

If the control law is chosen as

A R1 + q^-d0 B- S = C   (7)

where

deg R1 = d0 + deg B- - 1 = n - 1 - deg B+

the degree of the moving average is smaller than n but larger than d0 - 1. The
design parameter is thus limited to a few integer values. Notice that unstable
factors of B cannot be canceled. This further limits the choice.
If the polynomial B is stable, there is a controller that makes the output a

442

K. J. Åström

moving average of order d0 - 1, which is a minimum variance controller. This
controller is characterized by polynomials R = B R0 and S0 where

A R0 + q^-d0 S0 = C   (8)

(8)

The polynomials are thus given by

R0(q^-1) = q^(-d0+1) C(q^-1) div A(q^-1)
S0(q^-1) = q^(-d0+1) C(q^-1) mod A(q^-1)

The moving average controller is thus a natural generalization of the minimum
variance controller. It is straightforward to show that

R1 = R0 + q^-d0 Q

where Q is a polynomial of degree deg B- (see [Åström 1970]).
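The div and mod operations above are ordinary polynomial long division. A minimal sketch of the computation (not from the text; the function name and the coefficient-list convention are assumptions):

```python
def min_var_split(A, C, d0):
    """Solve A*R0 + q^(-d0)*S0 = C (cf. equation (8)) by long division
    in powers of q^(-1).  A and C are coefficient lists with the constant
    term first; returns R0 (degree d0-1) and S0 (degree deg(A)-1)."""
    n = len(A) - 1
    r = list(map(float, C)) + [0.0] * (n + d0 + 1 - len(C))
    R0 = []
    for i in range(d0):              # peel off d0 quotient coefficients
        q = r[i] / A[0]
        R0.append(q)
        for j, aj in enumerate(A):   # subtract q * A, shifted by i
            r[i + j] -= q * aj
    return R0, r[d0:d0 + n]          # quotient and shifted remainder
```

For example, with A = 1 - 0.9 q^-1, C = 1 + 0.3 q^-1 and d0 = 1 this gives R0 = 1 and S0 = 1.2, so with B = 2 the law R u = -S y is the proportional controller u(t) = -0.6 y(t).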

Direct Adaptive Control


An attempt to derive an indirect adaptive algorithm for the model given by
equation (5) leads to considerable complications because estimation of the
parameters of the polynomial C leads to a nonlinear problem. The parameters
can be estimated using recursive maximum likelihood estimates, but the
algorithms are complicated. See [Åström 1980] and [Ljung and Söderström
1983]. Surprisingly, there are much simpler algorithms that will work for systems
described by (5). To derive them, a simpler problem is first treated. It will later
be demonstrated that the simplified algorithm also works for the model (5).
Consider a minimum variance controller which is obtained by solving (8)
with C = 1, i.e.

A R0 + q^-d0 S0 = 1

Consider this as an operator equation and let it operate on y. This gives

y(t) = A R0 y(t) + S0 y(t - d0) = B R0 u(t - d0) + S0 y(t - d0) + R0 e(t)

where the last equality follows from (5) with C = 1. Observing that B R0 = R
we get

y(t) = R u(t - d0) + S0 y(t - d0) + R0 e(t)   (9)

Notice that this equation may be considered a reparameterization of the model
(5) in the controller parameters. Hence, if the parameters of (9) are estimated,
the controller parameters are obtained directly. An adaptive controller based
on estimation of the parameters of this equation is therefore called a direct
adaptive algorithm. The adaptive algorithm can be formulated as follows:

Algorithm 2. Direct Self-Tuning Regulator

Data: Given the prediction horizon d, and the orders k and l of the polynomials
R and S.


Step 1. Estimate the coefficients of the polynomials R and S of the model given
by equation (9) by recursive least squares, i.e. equation (2), where the parameter
vector is

θ = [r0 ... rk  s0 ... sl]^T

and ε(t) and φ(t - 1) in the right hand side of equation (2) are replaced by

ε(t) = y(t) - R(q^-1)u(t - d) - S(q^-1)y(t - d) = y(t) - φ^T(t - d)θ(t - 1)

and

φ(t) = [u(t) ... u(t - k)  y(t) ... y(t - l)]^T

Step 2. Calculate the control signal from

R(q^-1)u(t) = -S(q^-1)y(t)

where R and S are the estimates obtained in Step 1.


Repeat Steps 1 and 2 at each sampling period.
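The two steps can be sketched in a few lines of code; the function names, and the restriction to the scalar case k = l = 0 with d = 1, are illustrative assumptions rather than part of the text:

```python
import numpy as np

def rls_update(theta, P, phi, y):
    """One recursive least-squares step, cf. equation (2): correct the
    estimate theta by the prediction error y - phi^T theta."""
    Pphi = P @ phi
    K = Pphi / (1.0 + phi @ Pphi)            # estimator gain
    theta = theta + K * (y - phi @ theta)
    P = P - np.outer(K, Pphi)                # covariance update
    return theta, P

def self_tuner_step(theta, P, y_now, y_old, u_old):
    """One sampling period of Algorithm 2 for k = l = 0 and d = 1,
    i.e. the model y(t) = r0*u(t-1) + s0*y(t-1) + eps(t)."""
    phi = np.array([u_old, y_old])           # phi(t - d)
    theta, P = rls_update(theta, P, phi, y_now)   # Step 1
    r0, s0 = theta
    u = -(s0 / r0) * y_now                   # Step 2: R u(t) = -S y(t)
    return theta, P, u
```

With a large initial covariance P the estimator behaves like ordinary least squares, and the loop simply alternates estimation and control at each sampling period.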

Although the direct adaptive algorithm was derived under the assumption
that C = 1, it has interesting and unexpected properties when applied to a system
with C ≠ 1. This will be discussed in Sect. 4.

3 Expected Properties
In this section it will be shown that an adaptive system has many properties
that may be expected intuitively. In particular, it will be seen that much
can be said about an adaptive system without going into the details of the
particular control design method used. Such an approach is very much along
the lines of quote (iv) in Sect. 1. A straightforward approach is to investigate
if the estimate converges. If it does, the limiting value will tell what the limiting
control law is.
Interpretation of the Estimator as a Kalman Filter
For the purpose of analysis it is useful to view the parameters of the system (1)
as random processes. Although this may look like a complication, it admits a
nice interpretation of the parameter estimator as a Kalman filter, and it also
makes it possible to use martingale theorems for convergence analysis.
To be specific, it is assumed that the parameters of (1) are stochastic processes
described by

θ(t + 1) = θ(t) + w(t)
y(t) = φ^T(t)θ(t) + v(t)   (10)


where {w(t)} and {v(t)} are sequences of independent gaussian random vectors
with zero mean and covariances R1 and R2. The initial condition is assumed
to be gaussian with mean θ0 and covariance P0. The parameters are thus viewed
as states, and equation (1) is viewed as the output equation. See [Åström 1970].
The vector w represents the drift in the parameters. The parameters are constant
but unknown if w = 0.
With these assumptions, equation (2) for recursive least squares estimation
can be formally interpreted as Kalman filtering [Kalman 1960] if the initial
conditions are chosen as θ(0) = θ0 and P(0) = P0.
Notice that to allow the Kalman filtering interpretation, it is necessary that
v in eq. (1) is white gaussian noise. The filtering problem is however not in the
standard form because the vector φ(t) is not known a priori.
The following result is proven in [Åström 1978].

Theorem 1. Kalman Filtering Property of Recursive Least Squares Estimates

Consider the system (1) where the parameters are governed by (10) with gaussian
initial conditions having mean θ0 and covariance P0. Assume that {v(t)} is a
sequence of independent gaussian variables with zero mean and covariance
R2 > 0, where the sequence {v(t)} is also independent of θ(0). Then the conditional
distribution of θ with respect to the sigma algebra generated by y(t), y(t - 1), ...
is gaussian with mean θ(t) and covariance P(t) given by equation (2) with initial
conditions θ(0) = θ0 and P(0) = P0.   □
It was convenient to introduce w in equation (10) to see the similarity to the
Kalman filtering problem. To capture the idea that the parameters are constant
but unknown, it will be assumed in the following that w in (10) and R1 in (2)
are zero.
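The correspondence is easy to check numerically: with w = 0, R1 = 0 and R2 = 1, one Kalman measurement update for the model (10) produces exactly the same estimate and covariance as one recursive least squares step. A sketch under those assumptions (function names are ad hoc):

```python
import numpy as np

def kalman_update(theta, P, phi, y, R2=1.0):
    """Measurement update for model (10) with w = 0:
    y = phi^T theta + v,  var(v) = R2."""
    Pphi = P @ phi
    S = phi @ Pphi + R2                  # innovation variance
    K = Pphi / S                         # Kalman gain
    return theta + K * (y - phi @ theta), P - np.outer(K, Pphi)

def rls_update(theta, P, phi, y):
    """Recursive least squares step, cf. equation (2)."""
    Pphi = P @ phi
    K = Pphi / (1.0 + phi @ Pphi)
    return theta + K * (y - phi @ theta), P - np.outer(K, Pphi)
```

For R2 = 1 the two updates are algebraically identical, which is the content of Theorem 1.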
Since a conditional mean is a martingale, it follows from the martingale
convergence theorem [Chow and Teicher 1988] that the estimate θ(t) and its
covariance P(t) converge a.e. as t → ∞. In this context a.e. means almost
everywhere with respect to the joint distribution of θ(0) and {v(t)}, i.e. for almost
every system and for almost every realization. This was proven in [Sternby
1977]. Since the gaussian measure is absolutely continuous with respect to the
Lebesgue measure, convergence is also a.e. with respect to the Lebesgue measure
on the parameter space. See [Rootzen and Sternby 1984].
It is of interest to characterize the set that the parameter estimate θ(t) converges
to. In [Sternby 1977], it was shown that this set is the null set of the matrix

R = lim_{t→∞} (1/t) Σ_{k=1}^{t} φ(k)φ^T(k)

Notice that the sum in this expression is closely related to the inverse of the
matrix P(t). If the matrix R is regular, it follows that the estimates converge to
the random variable θ(0), which is the initial condition of (10). In the Bayesian
setting this may be regarded as the true parameter.
The regularity of R is closely related to persistency of excitation or sufficient


richness. For a system described by equation (1), the white noise signal v provides
sufficient excitation. For open loop systems, this was established in [Sternby
1977]. For closed loop systems, there is an additional complication because the
feedback may create dependencies. In [Rootzen and Sternby 1984], it was shown
that feedback does not create particular difficulties in this case. This
is also shown in [Kumar 1990].
Notice that if R is regular, it implies that the matrix P(t) goes to zero as
t → ∞. In [Sternby 1977] and [Rootzen and Sternby 1984], it was shown that
this is in fact a necessary and sufficient condition for the least squares estimate
to converge.
Once it is established that the parameter estimates converge, it is straightforward
to prove that adaptive control laws based on the estimates also converge, and
that the input and output signals in adaptive systems are mean square bounded.
This is done in detail for many different control design methods in [Kumar
1990].
Adaptive algorithms based on the assumption that the process can be
described by equation (1) are thus well understood in the case when the
disturbance v is white noise. The theoretical results are satisfying from the point
of view that they rely heavily on the properties of the Kalman filtering
interpretation of least squares estimation and that they can be applied to a wide
range of control design methods.

4 Unexpected Properties
The assumption that {v(t)} in equation (1) is white noise is restrictive. It would be
highly desirable to have algorithms for systems described by equation (5) where
{e(t)} is white gaussian noise, since this is a general model for linear stochastic
SISO systems [Åström 1970]. An indirect adaptive algorithm for this process is
complicated since the parameters of the polynomial C have to be estimated. The
direct Algorithm 2 described in Sect. 2 has, however, interesting properties when
applied to the process (5). This is illustrated by a numerical experiment.
A Computational Experiment. A computational experiment is made to gain
insight into the problem. Consider a first order system described by

y(t + 1) + ay(t) = bu(t) + e(t + 1) + ce(t)   (12)

where u is the control signal, y the measured signal and e discrete time white
noise. The parameters a and b are arbitrary, but the parameter c is assumed to be
less than one in magnitude. If the parameters are known, it follows from equation
(8) that the minimum variance controller is given by

u(t) = ((a - c)/b) y(t)   (13)

i.e. a proportional controller.

Fig. 2. The estimate of s0 as a function of time

Fig. 3. The accumulated loss functions of the self-tuner and the optimal fixed gain controller

Let us investigate what happens when the system is controlled by
Algorithm 2. From the analysis in Sect. 3, it can be expected that the controller
converges to the minimum variance controller when the parameter c is zero.
Figure 2 shows the behavior of the estimate of s0 when a = -0.9, b = 2 and
c = 0.3. The parameter r0 is kept constant, equal to one. It follows from (13) that
the minimum variance controller has a gain of 0.6. Notice that it appears from
the simulation experiment that the controller converges to the minimum
variance controller.
Figure 3 shows the accumulated sum of squares of the process outputs with
the self-tuner and with the optimal fixed gain controller. Notice that there is
very little difference between the accumulated losses obtained with the self-tuner
and the fixed gain controller. The main difference occurs initially when the
estimate is poor. After the first 10 steps the curves are almost identical. It thus
appears that the simple direct self-tuner converges to the correct controller in
spite of mismatch in the model structure.
Analysis. The following result, which was given in [Åström and Wittenmark
1973], gives insight into the behavior of the direct adaptive controller.


Theorem 2. Let Algorithm 2 be used to control a system. The parameter r0 can
either be fixed or estimated. Assume that the regression vectors are mean square
bounded, and that the parameter estimates converge. The closed-loop system
obtained in the limit is then characterized by

y(t + τ)y(t) = 0,   τ = d, d + 1, ..., d + l
y(t + τ)u(t) = 0,   τ = d, d + 1, ..., d + k   (14)

where the products denote time averages.

Algorithm 2 can thus be interpreted as attempting to drive certain covariances
of the process output, and certain cross covariances between input and output,
to zero. The parameter d and the orders of the polynomials R and S
determine which covariances are driven to zero. Notice that the limiting property
does not depend on the character of the system that is controlled.
Stronger statements can be made if more is known about the system to be
controlled. The following result was also shown in [Åström and Wittenmark
1973].

Theorem 3. Let Algorithm 2 with d ≥ d0 be used to control a system described
by equation (5). Assume that

min(k, l) ≥ n - 1   (15)

If the estimates converge and the limiting controller is such that R and S are
relatively prime, then the equilibrium solution is such that

y(t + τ)y(t) = 0,   τ = d, d + 1, ...   (16)   □
Theorem 3 implies that a moving-average controller is a possible equilibrium
when Algorithm 2 is used to control a process given by equation (5). The
particular moving average controller obtained is governed by the choice of the
parameter d.
It remains to show that the estimates converge. The results from the previous
section do not apply since the signal v is not white noise. Analysis of the algorithm
therefore requires different tools, and the results are not as simple and complete
as in the case of white equation noise.
The special case where d = d0 and the polynomial B is stable has been
investigated by [Ljung 1977]. He derived convergence conditions using averaging
methods. In the case where an estimator based on stochastic approximation
was used instead of least squares, it was shown that the closed loop system is
asymptotically stable and that the estimates converge if the polynomial B is
stable and the function

G(z) = 1/C(z) - 1/2


is positive real. Local stability was investigated by [Holst 1979]. He showed
that the closed loop system is locally stable if the real part of the polynomial
C(z^-1) is positive when evaluated at all zeros of B(z^-1).
It has thus been demonstrated that the direct algorithm has some very
interesting properties; however, the understanding of the direct algorithm is not
complete.
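The positive realness condition is easy to probe numerically on a frequency grid; in the sketch below (function name and grid density are ad hoc choices) a nonnegative minimum of the real part over the unit circle indicates that the condition holds:

```python
import numpy as np

def min_real_part(c_coeffs, n_grid=4096):
    """Minimum over a grid on the unit circle of Re{1/C(e^{-iw}) - 1/2},
    for C(q^-1) = c0 + c1 q^-1 + ...; a nonnegative value indicates
    that G(z) = 1/C(z) - 1/2 is positive real."""
    w = np.linspace(0.0, np.pi, n_grid)
    z = np.exp(-1j * w)                      # q^-1 on the unit circle
    C = sum(ck * z**k for k, ck in enumerate(c_coeffs))
    return float(np.min((1.0 / C - 0.5).real))
```

For C = 1 + 0.3 q^-1, the polynomial of the computational experiment above, the minimum is 1/1.3 - 1/2 ≈ 0.27 > 0, so Ljung's condition is satisfied there.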

5 Industrial Use
Kalman expressed the following views on the application of his ideas to process
control in the Author's Closure of his 1958 paper:
The author may not be unduly optimistic in expressing his feeling that (disregarding economic
considerations) sufficient theoretical and technological know-how exists already to bring practical
process control close to the best performance achievable in the light of the limitations imposed by
physical measuring equipment.

This was a good conjecture, but additional theoretical work and the invention
of the microprocessor were required to make adaptive process control an
industrial reality.
The simple self-tuner based on recursive least squares and moving average
control has found significant industrial use. It was used in a commercial ship
steering autopilot as early as 1979. It is currently installed in more than 70
ships in regular traffic. SSPA in Sweden has recently developed a new ship
steering autopilot called RollNix. This system uses the rudder both for steering
and roll stabilization. The system also uses the self-tuning regulator based on
least squares estimation. Currently the system is used in 9 ships. A slight
modification of Algorithm 2 is used in a process control system made by ABB
(Asea Brown Boveri) for industrial process control. These controllers are
installed in about 3,000 loops worldwide. An indirect algorithm where the control
design is based on pole placement is used in another system for industrial
process control developed by First Control AB in Sweden. This system is
installed in about 450 loops. The algorithms have also been used in biomedical
systems. Gambro in Sweden is using self-tuning regulators in more than 4,000
dialysis machines. Many additional products are under development. The
adaptive controllers have performed significantly better than conventional
controllers. More information about applications of adaptive control is given
in [Åström and Wittenmark 1989].

6 Conclusions
As is clear from this paper, there is now an understanding of adaptive algorithms
of the type that was discussed in Kalman's 1958 paper. Several books have


recently appeared where many more details are given ([Anderson et al. 1986],
[Åström and Wittenmark 1989], [Goodwin and Sin 1984], [Narendra and
Annaswamy 1989], and [Sastry and Bodson 1989]). It is also interesting to see
that the ideas are, to an increasing extent, finding their way to industrial use.
I had the personal fortune to learn about Kalman's work early in my career.
The first time I heard about his work on adaptive control was from a colleague
from Sweden who spent some time in Prof. Ragazzini's group at Columbia
University. Kalman's ideas about adaptive control intrigued me and have inspired
a lot of my research. I had renewed and more intense contact with Kalman
when I joined IBM Research at San Jose. Control systems research at IBM
was led by Jack Bertram, a colleague of Kalman's from DuPont and later fellow
student at Columbia. Bertram did joint work with Kalman on dead-beat control
[Kalman and Bertram 1958], sampled data systems [Kalman and Bertram 1959]
and Lyapunov theory [Kalman and Bertram 1960]. Kalman was also a
consultant to IBM. He appeared for seminars and discussions, and I had the
fortune to hear about his ideas first hand. One member of the IBM control
systems research group, Dick Koepcke, had also implemented Kalman's
algorithms on a hybrid computer consisting of an analog computer and a large
IBM computer. This work was unfortunately interrupted when the research
group moved to San Jose. It took a while for me to understand adaptive control.
It was necessary to obtain a deeper understanding of system identification and
system theory. This has all been very interesting work that has occupied me
for many years. In closing this paper I would like to thank Rudy for giving the
initial inspiration and for the many discussions and communications we have
had since 1961.
References
Anderson, B.D.O., R.R. Bitmead, C.R. Johnson, Jr., P.V. Kokotovic, R.L. Kosut, I.M.Y. Mareels,
L. Praly, and B.D. Riedle (1986), Stability of Adaptive Systems: Passivity and Averaging Analysis,
MIT Press, Cambridge, MA
Åström, K.J. (1970), Introduction to Stochastic Control Theory, Academic Press, New York
Åström, K.J. (1978), "Stochastic control problems," in Coppel, W.A. (editor), Mathematical Control
Theory, Lecture Notes in Mathematics, Springer-Verlag, Berlin, F.R.G.
Åström, K.J. (1980), "Maximum likelihood and prediction error methods," Automatica, 16,
pp 551-574
Åström, K.J. and B. Wittenmark (1973), "On self-tuning regulators," Automatica, 9, pp 185-199
Åström, K.J. and B. Wittenmark (1989), Adaptive Control, Addison Wesley, Reading, MA
Åström, K.J. and B. Wittenmark (1990), Computer Controlled Systems, Second Edition, Prentice
Hall, Englewood Cliffs, NJ
Chow, Y. and H. Teicher (1988), Probability Theory: Independence, Interchangeability, Martingales,
Second Edition, Springer-Verlag, New York
Gregory, P.C. (editor) (1959), Proc Self Adaptive Flight Control Symposium, Wright-Patterson Air
Force Base, Wright Air Development Center, Ohio
Goodwin, G.C. and K.S. Sin (1984), Adaptive Filtering Prediction and Control, Information and
Systems Science Series, Prentice-Hall, Englewood Cliffs, NJ
Holst, J. (1979), "Local stability of some recursive stochastic algorithms," Proc 5th IFAC Symposium
on Identification and System Parameter Estimation, Pergamon Press, Oxford, pp 1139-1146
Kalman, R.E. (1958), "Design of a self-optimizing control system," Trans ASME, 80, pp 468-478
Kalman, R.E. (1960), "A new approach to linear filtering and prediction problems," Trans ASME
J Basic Eng, 82, pp 35-45
Kalman, R.E. and J.E. Bertram (1958), "General synthesis procedure for computer control of single
and multi-loop linear systems," Trans AIEE, 77:II, pp 602-609
Kalman, R.E. and J.E. Bertram (1959), "A unified approach to the theory of sampling systems," J
Franklin Inst, 267, pp 405-436
Kalman, R.E. and J.E. Bertram (1960), "The 'second method' of Lyapunov in the analysis and
optimization of control systems, I-Continuous-time systems," Trans ASME J Basic Engr, 82,
pp 371-393
Kalman, R.E. and J.E. Bertram (1960), "The 'second method' of Lyapunov in the analysis and
optimization of control systems, II-Discrete-time systems," Trans ASME J Basic Engr, 82,
pp 394-399
Kalman, R.E., Y.C. Ho, and K.S. Narendra (1963), "Controllability of linear dynamical systems,"
in Contributions to Differential Equations, 1, Interscience-Wiley, New York, pp 189-213
Kumar, P.R. (1990), "Convergence of adaptive control schemes using least-squares parameter
estimates," IEEE Trans Autom Contr, AC-35, to appear
Ljung, L. (1977), "On positive real transfer functions and the convergence of some recursive schemes,"
IEEE Trans Autom Contr, AC-22, pp 539-550
Ljung, L. and T. Söderström (1983), Theory and Practice of Recursive Identification, MIT Press,
Cambridge, MA
Narendra, K.S. and A.M. Annaswamy (1989), Stable Adaptive Systems, Prentice Hall, Englewood
Cliffs, NJ
Rootzen, H. and J. Sternby (1984), "Consistency in least squares: a Bayesian approach," Automatica,
20, pp 471-475
Sastry, S. and M. Bodson (1989), Adaptive Control: Stability, Convergence, and Robustness, Prentice
Hall, Englewood Cliffs, NJ
Sternby, J. (1977), "On consistency for the method of least squares using martingale theory," IEEE
Trans Autom Contr, AC-22, pp 346-352

Chapter 7

Generalizations to Nonlinear and Distributed Systems

Kalman's Controllability Rank Condition:

From Linear to Nonlinear*
E. D. Sontag
SYCON-Rutgers Center for Systems and Control, Department of Mathematics,
Rutgers University, New Brunswick, NJ 08903, USA

The notion of controllability was identified by Kalman as one of the central properties determining
system behavior. His simple rank condition is ubiquitous in linear systems analysis. This article
presents an elementary and expository overview of the generalizations of this test to a condition
for testing accessibility of discrete and continuous time nonlinear systems.

1 Introduction
The state-space approach to control systems analysis took center stage in the
late 50's. Right from the beginning, it was recognized that certain nondegeneracy
assumptions were needed in establishing results on optimal control. However,
it was not until Kalman's work ([9], [10]) that the property of controllability
was isolated as of interest in and of itself, as it characterizes the degrees of
freedom available when attempting to control a system.
The study of controllability for linear systems, first carried out in detail by
Kalman and his coworkers in [10], has spawned a great number of research
directions, and Kalman's citation for the IEEE Medal of Honor in 1974 attests
to this influence. Associated topics, such as testing degrees of controllability
and their numerical analysis aspects, are still the subject of much research (see,
e.g. [12], [19], and references there). This paper deals with the questions
associated with testing controllability of nonlinear systems, both those operating
in continuous time, that is, systems of the type
ẋ(t) = f(x(t), u(t))   (CT)

described by differential equations, and discrete time systems described by
difference equations

x^+(t) = f(x(t), u(t))   (DT)

where the superscript "+" is used to indicate time shift (x^+(t) = x(t + 1)). In
principle, one wishes to study controllability from the origin. This is the property
that for each state x ∈ ℝⁿ there be some control driving 0 to x in finite time.

* Research supported in part by US Air Force Grant AFOSR-88-0235

454

E. D. Sontag

(The terminology "reachability" is also used for this concept.) As shown below,
in order to obtain elegant general results one has to weaken the notion of
controllability.
To simplify matters, it will be assumed that the states x(t) belong to a
Euclidean space ℝⁿ, controls u(t) take values in Euclidean space ℝᵐ, and the
dynamics function f is (real-)analytic in (x, u). Many generalizations, such as
allowing x to evolve on a differentiable manifold, or letting f have less
smoothness, are of great interest; however, in order to keep the discussion as
elementary as possible the above assumptions are made here. (Analyticity allows
stating certain results in a necessary and sufficient, rather than merely sufficient,
manner.) The controls u(·) are allowed to be arbitrary measurable essentially
bounded functions. The origin is assumed to be an equilibrium state, that is

f(0, 0) = 0

For controllability questions from non-equilibria, related results hold, except
for some minor changes in definitions. An important last restriction is that
in discrete time the system (DT) will be assumed to be invertible, meaning that
the map

f(·, u)

is a diffeomorphism for each fixed u; in other words, this map is bijective and
has a nonsingular differential at each point. Imposing invertibility simplifies
matters considerably, and is a natural condition for equations that arise from
the sampling of continuous time systems, which is one of the main ways in
which discrete time systems appear in practice.
When the system is linear, that is,

f(x, u) = Ax + Bu

for suitable matrices A (of size n × n) and B (of size n × m), controllability from
the origin is equivalent to the property that the rank of the n × nm Kalman
block matrix

(B, AB, A²B, ..., A^(n-1) B)   (1)

must equal the dimension n of the state space. This is a useful and simple test,
and much effort has been spent on trying to generalize it to nonlinear systems
in various forms.
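In matrix form the test is only a few lines of code (a sketch; the function names are ad hoc):

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack the Kalman block matrix (B, AB, ..., A^(n-1)B) of (1)."""
    blocks, M = [], B
    for _ in range(A.shape[0]):
        blocks.append(M)
        M = A @ M           # next power of A applied to B
    return np.hstack(blocks)

def is_controllable(A, B):
    """Kalman rank test: rank of (1) equals the state dimension n."""
    return np.linalg.matrix_rank(controllability_matrix(A, B)) == A.shape[0]
```

For the double integrator A = [[0, 1], [0, 0]], B = [[0], [1]] the block matrix has rank 2, so the pair is controllable; replacing A by the identity drops the rank to 1.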
The systematic study of controllability questions for continuous time
nonlinear systems began in the early 70's. At that time, the papers [14],
[22], and [13], building on previous work ([2], [5]) on partial differential
equations, gave a nonlinear analogue of the above Kalman controllability rank
condition. This analogue provides only a necessary test, not a sufficient one. It
becomes necessary and sufficient if one is interested instead in the accessibility
property, a weaker form of controllability which will be discussed below and which
corresponds to being able to reach from the origin a set of full dimension (not
necessarily the entire space). Analogous results hold also in discrete time.


However, this work did not settle the question of characterizing controllability,
a question which remains open and which is the subject of a major current
research effort, at least insofar as characterizations of local analogues are
concerned. (One does know that local controllability can be checked in principle
in terms of linear relations between the Lie brackets of the vector fields defining
the system ([20]), and isolating the explicit form of these relations has been a
major focus of research. It is impossible to even attempt here to give a reasonably
complete list of references to this very active area of research. The reference
[21] can be used as a source of further bibliography.)
This brief overview article will discuss accessibility for discrete and
continuous time, as well as some results which exhibit examples where
accessibility and controllability coincide. Some ultimate limitations on the
possibility of effectively checking controllability will also be mentioned. For
more details on accessibility at an expository level, see for instance [6], [20],
or [7] in continuous time, and [8] in discrete time. These references should
also be consulted for justifications of all statements given here without proof.
The level of the presentation will be kept as elementary as possible, in order
to explain the main ideas in very simple terms.

2 Accessibility
Let Σ be either (CT) or (DT). The reachable set ℛ is by definition the set of
states reachable from the origin, that is, the set of states of the form

{φ(t, 0, 0, w) | t ≥ 0, w admissible control}

where φ(t, s, x, w) denotes the value x(t) at time t of the solution of (CT) or (DT),
respectively, with initial condition x at time s and control function w = w(·).
The function w is an arbitrary sequence in the discrete time case, and is required
to be measurable essentially bounded in the continuous case. If the solution of
(CT) is undefined for a certain w and x, then φ is also undefined.
The system Σ will be said to be accessible (from the origin) if the reachable
set ℛ has a nonempty interior in ℝⁿ.

Remark 1. Accessibility can be proved to be equivalent to the following
property: the set of states reachable from the origin using positive and negative
time motions is a neighborhood of the origin. This equivalence, valid under the
blanket assumption of analyticity that was made earlier, is often referred to as
the "positive form of Chow's lemma" and is due to Krener (see [13]) in more
generality for continuous time; the difference equation version is provided in
[8]. It should be pointed out that for accessibility from initial states which are
not equilibria, the continuous version of the equivalence is still valid, but in
discrete time this equivalence does not follow any more; see for instance the
example in [8].   □


Remark 2. One may also define accessibility from arbitrary initial states (rather
than just from the origin). When the initial state is not an equilibrium state,
however, one must distinguish between accessibility, as defined here, and "strong
accessibility", which corresponds to the requirement that there be a fixed time
T > 0 such that the reachable set in time T, that is

ℛ_T(x0) := {φ(T, 0, x0, w) | w admissible control}

has a nonempty interior. In the case treated here, starting at an equilibrium
state, both notions can be shown to coincide (see the next remark for discrete
time, where it is trivial to establish).   □

Remark 3. In discrete time, accessibility corresponds to the requirement that
the union of the images of the composed maps

f^k(0, ·): (ℝᵐ)^k → ℝⁿ,   k ≥ 1

cover an open subset, where we are denoting

f^k(x, (u1, ..., uk)) := f(f(... f(f(x, u1), u2), ..., u_(k-1)), uk)

for every state x and sequence of controls u1, ..., uk. By Sard's Theorem, for each
fixed k it is either the case that the map f^k(0, ·) has at least one point where its
Jacobian has rank n, or its image has measure zero. Since a countable union
of negligible sets again has measure zero, accessibility implies that there must
exist some k and some sequence of controls u1, ..., uk so that the Jacobian of
f^k(0, ·) evaluated at that input sequence,

f^k(0, ·)*[u1, ..., uk],

has rank n. Consequently, accessibility is equivalent to accessibility in time
exactly k (cf. the above Remark). Moreover, accessibility is equivalent to some
such rank being full (the converse follows from the Implicit Mapping Theorem).
By the chain rule for derivatives, this Jacobian condition can be restated as
follows: Consider the linearization of the system (DT) along the trajectory

x1 = 0,  x2 = f(x1, u1),  x3 = f(x2, u2), ...

that is, the linear time-varying system

x(t + 1) = A_t x(t) + B_t u(t)

with A_t and B_t the Jacobians of f with respect to x and u, evaluated along
the trajectory. Then accessibility is equivalent to the existence of some sequence
of controls u1, ..., uk for which this linearization is controllable as a linear
system. By analyticity, if this holds for some sequence of controls of length k
then it holds for almost every such sequence. In continuous time, the same
result holds too (see, for instance, [17] for a proof).   □
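The Jacobian rank of the composed map can be probed numerically. The sketch below (central differences; function names, step size, and rank tolerance are ad hoc assumptions) certifies accessibility whenever the computed rank is full:

```python
import numpy as np

def reach(f, x0, u_seq):
    """The composed map f^k(x0, (u1, ..., uk)) of Remark 3."""
    x = np.asarray(x0, dtype=float)
    for u in u_seq:
        x = f(x, u)
    return x

def reach_jacobian_rank(f, n, u_seq, eps=1e-6):
    """Numerical rank of the Jacobian of (u1, ..., uk) -> f^k(0, ...);
    full rank n certifies accessibility (the sufficiency direction)."""
    u = np.asarray(u_seq, dtype=float)
    flat = u.ravel()
    J = np.zeros((n, flat.size))
    for i in range(flat.size):          # one column per control entry
        d = np.zeros(flat.size); d[i] = eps
        xp = reach(f, np.zeros(n), (flat + d).reshape(u.shape))
        xm = reach(f, np.zeros(n), (flat - d).reshape(u.shape))
        J[:, i] = (xp - xm) / (2.0 * eps)
    return int(np.linalg.matrix_rank(J, tol=1e-4))
```

For a linear map f(x, u) = Ax + Bu and k = n the Jacobian columns are exactly the blocks of the Kalman matrix (1), so the computed rank reproduces the linear test.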


Under certain circumstances, accessibility is equivalent to controllability.
Certainly this is the case for linear systems, as is easy to see. For another example,
if the system (CT) is "symmetric", meaning that

f(x, -u) = -f(x, u)

for each x and u, then accessibility from zero is equivalent to the reachable set
from the origin being a neighborhood of zero, and accessibility from every point
is equivalent to the reachable set being the entire space. A weaker type of
symmetry is given in [3], a condition which includes linear systems and hence
elegantly generalizes the equivalence of accessibility and controllability for those.
Another set of sufficient conditions for the equivalence of controllability and
accessibility revolves around the concept of Poisson stability; see for instance
[15], [1].

3 Rank Condition-Continuous Time


For each control value u, $f_u$ denotes the function

$$f_u: \mathbb{R}^n \to \mathbb{R}^n : x \mapsto f(x,u)$$

(the "vector field" determined by the control u). Given any two such vector functions f and g, one can associate the new function

$$[f,g]$$

defined by the formula

$$[f,g](x) := g_*[x]\,f(x) - f_*[x]\,g(x),$$

where in general $h_*[x]$ is used to indicate the Jacobian of the vector function h evaluated at the point x. This is called the Lie bracket of f and g, and it represents the infinitesimal direction that results from following f and g in positive time, followed by f and g in negative time.
The accessibility Lie algebra $\mathcal{L}$ associated to the system (CT) is the linear span of the set of all vector functions that can be obtained starting with the $f_u$'s and taking any number of Lie brackets of them and the resulting functions. For instance, if $u_1, u_2, u_3, u_4$ are any four control values, an iterated bracket such as $[[f_{u_1},f_{u_2}],[f_{u_3},f_{u_4}]]$ is in $\mathcal{L}$.
For a linear system

$$\dot{x} = Ax + Bu,$$

the functions $f_u$ are all affine, and the Lie brackets are again of the same form. It is easy to show that all elements of $\mathcal{L}$ have the form

$$A^k B v$$

458

E. D. Sontag

where v is some vector in $\mathbb{R}^m$, or $Ax + Bu$. Moreover, every possible vector of this form does appear as some product.
The system (CT) satisfies the accessibility rank condition at the origin if the set of vectors

$$\mathcal{L}(0) := \{g(0),\ g \in \mathcal{L}\}$$

is a vector space of dimension n. In view of the preceding discussion, for linear systems this condition is the same as the Kalman controllability rank condition. The main result is then (see, for instance, [7]):

Theorem. The system (CT) is accessible if and only if the accessibility rank condition holds. □
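To make the linear case concrete, here is a small sympy sketch (our own construction, not from the text) verifying, for a double integrator, that the generators $f_u(x) = Ax + Bu$ and a single Lie bracket already span at the origin the same space as the Kalman controllability matrix $[B,\ AB]$.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])

def lie_bracket(f, g):
    # [f, g](x) := g_*[x] f(x) - f_*[x] g(x), as defined above
    return g.jacobian(x) * f - f.jacobian(x) * g

# Double integrator x' = A x + B u, a controllable linear system
A = sp.Matrix([[0, 1], [0, 0]])
B = sp.Matrix([0, 1])

# Vector fields f_u(x) = A x + B u for the control values u = 0 and u = 1
f0 = A * x
f1 = A * x + B

# Brackets of affine fields stay affine; here [f0, f1] = -A B, a constant field
br = lie_bracket(f0, f1)
assert br == -A * B

# Evaluating f1 and the bracket at 0 spans the same space as the
# Kalman controllability matrix [B, A B]
L0 = sp.Matrix.hstack(f1.subs({x1: 0, x2: 0}), br)
assert L0.rank() == 2
assert sp.Matrix.hstack(B, A * B).rank() == 2
```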

4 Rank Condition-Discrete Time


There is an analogue of the accessibility rank condition for discrete-time systems, and this is studied next. This work was started to a great extent by the papers [4], [16]; see [8] for details.
The notation $f_u$ is as above, and in particular $f_0$ is the map $f(\cdot,0)$. Recall that in the discrete case one assumes invertibility, so that the inverse maps $f_u^{-1}$ are well-defined and again analytic. For each $i = 1,\ldots,m$ and each $u \in \mathbb{R}^m$ let

$$X_{u,i}(x) := \frac{\partial}{\partial e}\bigg|_{e=0} f_u \circ f_{u+ee_i}^{-1}(x),$$

where $e_i$ denotes the ith coordinate vector, and more generally for all u, i and each integer $k \ge 0$ let

$$(\mathrm{Ad}_0^k X_{u,i})(x) := \frac{\partial}{\partial e}\bigg|_{e=0} f_0^k \circ f_u \circ f_{u+ee_i}^{-1} \circ f_0^{-k}(x).$$

The accessibility Lie algebra is now defined in terms of iterated Lie brackets of these vector functions, and the accessibility rank condition is defined in terms of this, analogously to the continuous-time case. The main fact is, then, as follows.

Theorem. The system (DT) is accessible if and only if the accessibility rank condition holds. □

Again, for linear (discrete-time) systems, the condition reduces to the Kalman controllability test. The vectors $\mathrm{Ad}_0^k X_{u,i}$ are in fact all of the type $A^k B u$, for vectors $u \in \mathbb{R}^m$.

Remark. If the systems were only assumed to be smooth, as opposed to analytic, the accessibility rank condition would be only sufficient but not necessary, both in discrete and continuous time. Consider for instance the system on $\mathbb{R}^2$, with $\mathbb{R}^2$ also as control space, and equations

$$\dot{x} = u_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + u_2 \begin{pmatrix} 0 \\ \alpha(x_1) \end{pmatrix},$$

where $\alpha$ is the function with

$$\alpha(x) = e^{-1/x^2}$$

for $x > 0$ and $\alpha(x) \equiv 0$ for $x \le 0$. This system is easily shown to be accessible (in fact, it is completely controllable: any state can be steered to any other state), but the accessibility rank condition does not hold. □

5 Controllability to the Origin


Often one is interested not in controllability from the origin but in controllability to zero. The corresponding accessibility property is that the set

$$\{x \mid \phi(t,0,x,\omega) = 0 \text{ for some } t \ge 0 \text{ and some admissible control } \omega\}$$

contain an open set. This might be called "accessibility to the origin." It corresponds to plain accessibility (from the origin) for the time-reversed system, that is,

$$\dot{x} = -f(x,u)$$

in continuous time, or

$$x^+ = f_u^{-1}(x)$$

in discrete time. Since the accessibility Lie algebra $\mathcal{L}$ is a vector space, the same Lie algebra results for the time-reversal of a continuous-time system, proving the equivalence of both notions in that case. The same result turns out to be true in the discrete case, though the proof is much less trivial. This is summarized then by:

Proposition. A system is accessible from 0 if and only if it is accessible to 0.

Remark. The proof in the discrete case relies roughly on the following argument. Introduce a superscript "−" in the notation for the vectors $\mathrm{Ad}_0^k X_{u,i}$ introduced above, and use $\mathcal{L}^-$ instead of $\mathcal{L}$ for the Lie algebra generated by these. Consider also the vectors

$$(\mathrm{Ad}_0^k X_{u,i}^+)(x) := \frac{\partial}{\partial e}\bigg|_{e=0} f_0^k \circ f_u^{-1} \circ f_{u+ee_i} \circ f_0^{-k}(x),$$

now with $k \ge 0$, and let $\mathcal{L}^+$ be the algebra generated by these vectors. This algebra is the same as the algebra $\mathcal{L}$ obtained for the time-reversed system.
One first proves that it is also possible to generate the same Lie algebra using negative k in the definition of the vectors $\mathrm{Ad}_0^k X_{u,i}^-$ (that is, with the middle term in the definition being $f_u \circ f_{u+ee_i}^{-1}$ rather than $f_u^{-1} \circ f_{u+ee_i}$). Thus the only obstruction is due to the use of negative instead of positive k. But since the operator

$$\mathrm{Ad}_0: X \mapsto \mathrm{Ad}_0(X), \qquad \mathrm{Ad}_0(X)(x) := (f_0^{-1})_*[f_0(x)]\,X(f_0(x)),$$

on the Lie algebra of all vector fields preserves the tangent space at 0, because the origin is an equilibrium state, this induces an isomorphism between the two linear subspaces $\mathcal{L}^+(0)$ and $\mathcal{L}^-(0)$, giving the desired equality of ranks. □

6 An Example
The following is a well-known ("folk") example from differential geometry illustrating the use of the accessibility rank condition in continuous time; because the resulting system is "symmetric" in the sense that $f(x,-u) = -f(x,u)$, and accessibility holds from every state, it can be shown that this example is completely controllable, but here we only concentrate on the local aspect about zero.
Assume that we model an automobile in the following way, as an object in the plane. The position of the center of the front axle has coordinates (x, y), its orientation is specified by the angle $\varphi$, and $\theta$ is the angle its wheels make relative to the orientation of the car.
We assume that the angle $\theta$ can take values on an interval $(-\theta_0, \theta_0)$, corresponding to the maximum allowed displacement of the steering wheel, and that $\varphi$ can take arbitrary values. As controls we take the steering wheel moves ($u_1$) and the engine speed ($u_2$). Using elementary trigonometry, the following model results:

$$\dot{z} = u_1 \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} + u_2 \begin{pmatrix} \cos(\varphi+\theta) \\ \sin(\varphi+\theta) \\ \sin\theta \\ 0 \end{pmatrix}, \qquad (2)$$

where $z = (x, y, \varphi, \theta)'$ can be thought of as belonging to the state space

$$\mathbb{R} \times \mathbb{R} \times \mathbb{R} \times (-\theta_0, \theta_0) \subseteq \mathbb{R}^4.$$

(In fact, it is more natural to identify $\varphi$ and $\varphi + 2\pi$ and take as state space the manifold $\mathbb{R} \times \mathbb{R} \times S^1 \times (-\theta_0, \theta_0)$; this leads to control systems on manifolds different from Euclidean spaces.) We take the controls as having values in $\mathbb{R}^2$; a more realistic model of course incorporates constraints on their magnitude. A control with $u_2 \equiv 0$ corresponds to a pure steering move, while one with $u_1 \equiv 0$ models a pure driving move in which the steering wheel is fixed in one position. We let $g_1 = $ steer be the vector field $(0,0,0,1)'$ and $g_2 = $ drive the vector field $(\cos(\varphi+\theta), \sin(\varphi+\theta), \sin\theta, 0)'$. It is intuitively clear that the system is


completely controllable, but one can check accessibility using the rank condition. Indeed, computing

$$\text{wriggle} := [\text{steer}, \text{drive}]$$

and

$$\text{slide} := [\text{wriggle}, \text{drive}],$$

it is easy to see that the determinant of the matrix consisting of the columns (steer, drive, wriggle, slide) is nonzero everywhere, and in particular at the origin. For $\varphi = \theta = 0$ and any (x, y), wriggle is the vector (0, 1, 1, 0)', a mix of sliding in the y direction and a rotation, and slide is the vector (0, 1, 0, 0)' corresponding to sliding in the y direction. This means that one can in principle implement infinitesimally both of the above motions. The "wriggling" motion corresponding to wriggle is, from the definition of Lie bracket, basically that corresponding to many fast iterations of the actions:
steer, drive, reverse steer, reverse drive, repeat,
which one often performs in order to get out of a tight parking space.
Interestingly enough, one could also approximate the pure sliding motion: wriggle, drive, reverse wriggle, reverse drive, repeat, corresponding to the last vector field.
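These bracket computations are easy to reproduce symbolically. The sketch below (sympy; the variable names are ours) recomputes wriggle and slide for the vector fields of (2) and checks the values quoted above at $\varphi = \theta = 0$, as well as the nonvanishing of the determinant there and at a generic numerical point.

```python
import sympy as sp

x, y, phi, theta = sp.symbols('x y phi theta')
z = sp.Matrix([x, y, phi, theta])

def lie_bracket(f, g):
    # [f, g](z) = g_*[z] f(z) - f_*[z] g(z), as defined in Sect. 3
    return g.jacobian(z) * f - f.jacobian(z) * g

steer = sp.Matrix([0, 0, 0, 1])
drive = sp.Matrix([sp.cos(phi + theta), sp.sin(phi + theta), sp.sin(theta), 0])

wriggle = lie_bracket(steer, drive)
slide = lie_bracket(wriggle, drive)

# At phi = theta = 0: wriggle = (0, 1, 1, 0)' and slide = (0, 1, 0, 0)'
origin = {phi: 0, theta: 0}
assert wriggle.subs(origin) == sp.Matrix([0, 1, 1, 0])
assert slide.subs(origin) == sp.Matrix([0, 1, 0, 0])

# The determinant of (steer, drive, wriggle, slide) is nonzero everywhere;
# it evaluates to 1 at the origin, and numerically at a generic point too
M = sp.Matrix.hstack(steer, drive, wriggle, slide)
assert M.subs(origin).det() == 1
assert abs(float(M.det().subs({phi: 0.3, theta: 0.2})) - 1) < 1e-12
```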

7 Remarks on Computational Complexity


It is worth looking also at Kalman's condition for linear systems from the viewpoint of a polynomial-time test, in the sense of Theoretical Computer Science. One can prove that, in general, for a large class of nonlinear ("polynomial") continuous-time systems, accessibility is decidable. For a restricted class which has often appeared in applications, that of bilinear systems, accessibility can even be checked in polynomial time, just as with Kalman's test for controllability of linear systems, but the problem of true controllability is NP-hard. This last result provides a rigorous statement of the fact that accessibility is easier to characterize than controllability, and it can be interpreted as an ultimate limitation on the possibility of even finding a characterizing condition for controllability (as opposed to accessibility) for nonlinear systems that will be as easy to check as Kalman's.
See [18] and references there for precise details on the setup as well as for proofs, as well as the more recent work [11] which extends the above.
8 References
1. Brockett, R., "Nonlinear systems and differential geometry," Proc IEEE 64 (1976): 61-72
2. Chow, W.L., "Über Systeme von linearen partiellen Differentialgleichungen erster Ordnung," Math Ann 117 (1939): 98-105
3. Brunovsky, P., "Local controllability of odd systems," Banach Center Publications 1 (1974): 39-45
4. Fliess, M., and D. Normand-Cyrot, "A group-theoretic approach to discrete-time nonlinear controllability," Proc IEEE Conf Dec Control, 1981
5. Hermann, R., "On the accessibility problem in control theory," in Int Symp Diff Eqs and Mechanics, Academic Press, NY, 1973
6. Hermann, R., and A.J. Krener, "Nonlinear controllability and observability," IEEE Trans Autom Ctr 22: 728-740
7. Isidori, A., Nonlinear Control Systems: An Introduction, Springer, Berlin, 1985
8. Jakubczyk, B., and E.D. Sontag, "Controllability of nonlinear discrete-time systems: A Lie-algebraic approach," Invited Expository Article, SIAM J Control and Opt 28 (1990): 1-33
9. Kalman, R.E., "Contributions to the theory of optimal control," Bol Soc Mat Mex 5 (1960): 102-119
10. Kalman, R.E., Y.C. Ho, and K.S. Narendra, "Controllability of linear dynamical systems," Contr Diff Eqs 1 (1963): 189-213
11. Kawski, M., "The complexity of deciding controllability," Systems and Control Letters 15 (1990): 9-14
12. Kenney, C., and A.J. Laub, "Controllability and Stability Radii for Companion Form Systems," Math Control Signals Systems 1 (1988): 239-256
13. Krener, A., "A generalization of Chow's theorem and the bang-bang theorem to nonlinear control systems," SIAM J Control 12 (1974): 43-52
14. Lobry, C., "Contrôlabilité des systèmes non linéaires," SIAM J Contr 8 (1970): 573-605
15. Lobry, C., "Bases mathématiques de la théorie de systèmes asservis non linéaires," Report #7505, Univ Bordeaux, 1976
16. Normand-Cyrot, Dorothée, Théorie et Pratique des Systèmes Non Linéaires en Temps Discret, Thèse de Docteur d'État, Université Paris-Sud, 1983
17. Sontag, E.D., "Finite dimensional open-loop control generators for non-linear systems," Int J Control 47 (1988): 537-556
18. Sontag, E.D., "Some complexity questions regarding controllability," Proc IEEE Conf Decision and Control, Austin, December 1988, pp 1326-1329
19. Sontag, E.D., Mathematical Control Theory, Springer-Verlag, New York, 1990
20. Sussmann, H.J., "Lie brackets, real analyticity, and geometric control," in Differential Geometric Control Theory (R.W. Brockett, R.S. Millman, and H.J. Sussmann, eds.), Birkhäuser, Boston, 1983
21. Sussmann, H.J., "A general theorem on local controllability," SIAM J Control and Opt 25 (1987): 158-194
22. Sussmann, H.J., and V. Jurdjevic, "Controllability of nonlinear systems," J Diff Eqs 12 (1972): 95-116

Controllability Revisited
M. Fliess
Laboratoire des Signaux et Systèmes, CNRS-ESE, Plateau de Moulon,
F-91192 Gif-sur-Yvette Cedex, France

Kalman's controllability is extended to nonlinear dynamics via elementary differential field techniques. This generalization, which was first suggested by J.-F. Pommaret, is related to the notion of strong accessibility introduced by H.J. Sussmann and V. Jurdjevic in 1972. In the case of constant or time-varying linear systems, it comes down to a purely module-theoretic description. As a bonus of our methods we give a straightforward characterization of hidden modes, and we also demonstrate that the controllability of a nonlinear dynamics is equivalent to the controllability of its tangent linearized dynamics.

1 Introduction
Since its introduction in the early sixties, Kalman's controllability [22-25] has played a prominent role in understanding and clarifying many structural properties of finite-dimensional constant linear systems. A good characterization of the importance of a theoretical concept is its ubiquity, i.e. its possibility of embodying far-reaching extensions to other situations. One of the most important generalizations was done a few years later on finite-dimensional nonlinear systems; see, e.g., the works of Hermann [18], Lobry [28], Sussmann and Jurdjevic [38], Hermes [19], Krener [27] and many others. This was the starting point of methods utilizing Lie brackets of vector fields and foliations, stemming from differential geometry, to attack various problems in nonlinear control theory (see, e.g., Sussmann [37] and Isidori [20]).
An alternative approach employing differential fields has recently been proposed by the author [7, 8] for the study of linear and nonlinear systems. It has the advantage of solving some long-standing problems, such as input-output inversion, and of throwing new light on state-variable realizations. A (nonlinear) dynamics is now a finitely generated differentially algebraic extension $K/k\langle u\rangle$, where $k\langle u\rangle$ is the differential field generated by the ground field k and by the control variables $u = (u_1,\ldots,u_m)$. Recall that a differentially algebraic extension is the natural differential analogue of the well-known notion of algebraic extensions for usual (non-differential) fields (see Kolchin [26]). It means that any element in K satisfies an algebraic differential equation with coefficients in $k\langle u\rangle$. Pommaret [30] has suggested in this context a most elegant interpretation of controllability by identifying it with the fact that the ground field k is
of controllability by identifying it with the fact that the ground field k is


differentially algebraically closed in K. In plain words this can be translated by saying that any element in K is indeed influenced by the control variables, as it does not satisfy any differential equation independent of u, i.e. with coefficients in k. We show here that Pommaret's controllability is related to the notion of strong accessibility due to Sussmann and Jurdjevic [38] (see also Haddak [17] and van der Schaft [34] for similar results). The benefits are manifold:
- It is now clear what controllability means for systems described by implicit differential equations and/or with time-varying coefficients, as these systems are easily taken into account by differential fields [7].
- The enlargement of the notion of controllability to systems determined by partial differential equations [30], to discrete-time systems [13] via difference algebra, and to systems with delays [9] via difference-differential algebra should not be difficult.
We have to remind that the notion of differentially algebraically closed fields was introduced for the first time by the famous logician Robinson [32] in the setting of model theory (see also Wood [40]). In the near future we hope to be able to relate controllability with logic to get a better understanding of this concept in much more general situations.
When specializing to constant or time-varying linear systems we come to an apparently new characterization of controllability in terms of free modules. However, such a language should not be surprising, as Kalman [24, 25] demonstrated more than twenty years ago the usefulness of module theory for linear systems. A straightforward but rewarding bonus of our approach is a simple and transparent definition of input-decoupling zeros. Recall that decoupling zeros or hidden modes, which are of utmost importance for the qualitative behaviour of linear systems (see, e.g. Rosenbrock [33]), were perhaps never given simple and universally accepted geometric definitions (cf. Schrader and Sain [35]). For the sake of completeness we also define the output-decoupling zeros. Those zeros are related to a lack of observability which corresponds to a set-theoretic inclusion.
In a final short appendix we indicate, via Kähler differentials, that the controllability of a nonlinear dynamics is equivalent to the controllability of its linearized tangent dynamics (see Sontag [36] for related results).

2 Differential Fields
2.1. Differential Algebra

Differential algebra was introduced between the two World Wars by the American mathematician Ritt [31] in order to develop for differential algebraic equations a theory which, to some extent, would be close to algebra, which was maturing at that time in Germany. Today the bible of differential algebra is Kolchin's book [26], to which we refer for more details and information.

2.2. Differential Fields


A differential field K is a commutative field which is equipped with a single derivation $d/dt = $ " $\dot{}$ ", which obeys the usual rules: for all $a, b \in K$,

$$\frac{d}{dt}(a+b) = \dot{a} + \dot{b}, \qquad \frac{d}{dt}(ab) = \dot{a}b + a\dot{b}.$$

A constant of K is an element $c \in K$ such that $\dot{c} = 0$. The set of constants of K is a subfield of K, which is called the field of constants.

2.3. Remarks
(i) We restrict ourselves to systems with lumped parameters, i.e. to ordinary differential equations and to ordinary differential fields with a single derivation. We do not treat partial differential fields with several pairwise commuting derivations.
(ii) For simplicity's sake we also limit ourselves to fields of characteristic zero, as differential equations over fields of strictly positive characteristic have not yet appeared in physics or engineering.

2.4. Examples
(i) The fields $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ of rational, real and complex numbers are trivial fields of constants.
(ii) The field of not necessarily proper rational transfer functions in a single variable s, over the field $\mathbb{R}$, is a differential field with respect to the derivation d/ds.

2.5. A differential field extension L/K is given by two differential fields K, L such that
(i) K is a subfield of L,
(ii) the derivation of K is the restriction to K of the derivation of L.
Most of their rather elementary properties mimic those of classic non-differential fields (see, e.g. Bourbaki [3]).

2.6. Two situations are possible for L/K:
(i) An element $\xi \in L$ is said to be differentially algebraic over K if, and only if, it satisfies an algebraic differential equation $P(\xi, \dot{\xi}, \ldots, \xi^{(\alpha)}) = 0$, where P is a polynomial over K in $\xi$ and its derivatives. The extension L/K is said to be differentially algebraic if, and only if, any element of L is differentially algebraic over K.
(ii) An element $\eta \in L$ is said to be differentially transcendental over K if, and only if, it is not differentially algebraic over K. This means that no algebraic differential equation over K is satisfied by $\eta$. The extension L/K is said to be differentially transcendental if, and only if, there exists at least one element of L which is differentially transcendental over K.
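As a toy illustration of (i) (ours, not from the paper): the exponential is differentially algebraic over $\mathbb{Q}$, since it satisfies the algebraic differential equation $\dot{\xi} - \xi = 0$. A quick symbolic check:

```python
import sympy as sp

t = sp.symbols('t')
xi = sp.exp(t)

# xi = exp(t) satisfies P(xi, xi') = xi' - xi = 0, a polynomial
# differential equation with integer coefficients, so xi is
# differentially algebraic over Q
assert sp.simplify(xi.diff(t) - xi) == 0

# By contrast, an unspecified function u satisfies no such fixed
# relation: the corresponding expression does not vanish identically
u = sp.Function('u')(t)
assert sp.simplify(u.diff(t) - u) != 0
```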

2.7. K is said to be differentially algebraically closed in L (cf. Robinson [32], Kolchin [26]) if, and only if, any element of L which is differentially algebraic over K necessarily belongs to K.
2.8. The next result for finitely generated differential extensions will be most useful:
Theorem. A finitely generated differential extension is differentially algebraic if, and only if, its (non-differential) transcendence degree is finite.
In a more down-to-earth, but less precise, language this transcendence degree is the number of initial conditions which are needed for computing the solutions of the algebraic differential equations represented by L/K.

3 Nonlinear Dynamics
3.1. Let k be a given differential ground field and $k\langle u\rangle$ be the differential field generated by k and the elements of a finite set $u = (u_1,\ldots,u_m)$ of differential quantities which play the role of control variables. A typical element of $k\langle u\rangle$, when $k = \mathbb{Q}$, is a ratio of polynomials, with rational coefficients, in the components of u and a finite number of their derivatives.
A (nonlinear) dynamics [12] is a finitely generated differentially algebraic extension $K/k\langle u\rangle$.
3.2. According to 2.8, the (non-differential) transcendence degree of $K/k\langle u\rangle$ is finite, say n. Take a transcendence basis $\xi = (\xi_1,\ldots,\xi_n)$ of $K/k\langle u\rangle$. Each of the derivatives $\dot{\xi}_1,\ldots,\dot{\xi}_n$ is $k\langle u\rangle$-algebraically dependent on $\xi$. This reads:

$$(I) \qquad A_1(\dot{\xi}_1, \xi, u, \dot{u}, \ldots, u^{(\alpha)}) = 0, \ldots, A_n(\dot{\xi}_n, \xi, u, \dot{u}, \ldots, u^{(\alpha)}) = 0,$$

where $A_1,\ldots,A_n$ are polynomials over k. We should pinpoint that the equations are implicit with respect to $\dot{\xi}_1,\ldots,\dot{\xi}_n$. When the Jacobian matrix $(\partial A_i/\partial \dot{\xi}_j)_{1 \le i,j \le n}$ has full rank n, the implicit function theorem yields¹

$$(E) \qquad \dot{\xi}_1 = a_1(\xi, u, \dot{u}, \ldots, u^{(\alpha)}), \ldots, \dot{\xi}_n = a_n(\xi, u, \dot{u}, \ldots, u^{(\alpha)}).$$

(E) is only "locally valid," i.e. in domains where the Jacobian matrix has full rank.
3.3. $\xi$ is a minimal (generalized) state. Take two such minimal states $\xi$ and $\bar{\xi}$, i.e. two transcendence bases of $K/k\langle u\rangle$. Any component of $\bar{\xi}$ is $k\langle u\rangle$-algebraically dependent on $\xi$. This amounts to saying that there exists a control-dependent implicit relation of the type:

$$P_i(\bar{\xi}_i, \xi, u, \dot{u}, \ldots, u^{(\beta)}) = 0, \qquad i = 1,\ldots,n,$$

which "locally" yields

$$\bar{\xi}_i = p_i(\xi, u, \dot{u}, \ldots, u^{(\beta)}).$$

Such control-dependent state transformations have already been utilized in some more practically orientated works (see, e.g. Williamson [39], Zeitz and his co-workers [41]).

4 Controllability
4.1. The dynamics $K/k\langle u\rangle$ is said to be controllable [30] if, and only if, the ground field k is differentially algebraically closed in K (see 2.7). Recall that this means that any element in K not belonging to the ground field k is indeed commanded by u, as it does not satisfy any differential equation over k.
4.2. Assume now that k is a field of constants, $\mathbb{R}$ for example, and that the input u is independent, i.e. that the components of u are differential k-indeterminates (cf. [26]). Now, consider a system

$$(\Sigma) \qquad \dot{x} = f_0(x) + \sum_{i=1}^m u_i f_i(x),$$

where the control variables appear linearly, as quite often in the differential geometric approach (cf. [20]). The $f_i$'s are formal polynomial vector fields over k. As the derivatives of the components of $x = (x_1,\ldots,x_n)$ appear linearly, it is rather routine work to verify that the differential ideal [26] corresponding to
¹ Recall that, in our abstract algebraic setting, this theorem has only a heuristic value.


$(\Sigma)$ is prime. To $(\Sigma)$ corresponds therefore a dynamics $k\langle u,x\rangle/k\langle u\rangle$, where, by a slight abuse of notation, the elements in the differential equation and in the differential fields are denoted with the same symbols.
Without a great loss of generality, we also assume that x is a (non-differential) transcendence basis of $k\langle u,x\rangle/k\langle u\rangle$.

4.3. Let $\mathcal{L}$ (resp. $\mathcal{A}$) be the Lie k-algebra spanned by $f_0, f_1,\ldots,f_m$ (resp. $f_1,\ldots,f_m$). Denote by $\mathcal{L}_0$ the Lie ideal of $\mathcal{A}$ in $\mathcal{L}$. The usual differential geometric notions possess a formal counterpart (see, e.g. Botelho [2], Nichols and Weisfeiler [29]). The involutive distribution L (resp. $L_0$) corresponding to $\mathcal{L}$ (resp. $\mathcal{L}_0$) is the k(x)-vector space spanned by $\mathcal{L}$ (resp. $\mathcal{L}_0$), where k(x) is the (non-differential) field generated by k and the components of x. The following well-known accessibility definitions (cf. [37]) now read:
$(\Sigma)$ is said to be weakly (resp. strongly) accessible if, and only if, the k(x)-dimension of L (resp. $L_0$) is n.
4.4. Our main result is:
Theorem. The dynamics $k\langle u,x\rangle/k\langle u\rangle$ is controllable if, and only if, $(\Sigma)$ is strongly accessible.
4.5. Remark. The elementary example $\dot{x} = x$, where n = 1 and with no input, i.e. m = 0, is weakly accessible. As x is differentially algebraic over k, it is not controllable in the sense of 4.1, which is therefore not equivalent to weak accessibility.

4.6. We now outline the proof of the theorem. Let $\bar{k}$ be the differential algebraic closure of k in $k\langle u,x\rangle$, i.e. the set of all elements in $k\langle u,x\rangle$ which are differentially algebraic over k. The next lemma, which is easy, will be most useful:
Lemma. k is not differentially algebraically closed in $k\langle u,x\rangle$ if, and only if, the transcendence degree of the intersection $\bar{k} \cap k(x)$ over k is strictly positive.
4.7. Set $\operatorname{tr\,d}(\bar{k} \cap k(x))/k = n - d$. Choose a transcendence basis $\xi = (\xi_1,\ldots,\xi_n)$ of k(x)/k, where $\xi_{d+1},\ldots,\xi_n$ is a transcendence basis of $(\bar{k} \cap k(x))/k$. It yields a decomposition of $(\Sigma)$ where $\xi_1,\ldots,\xi_d$ (resp. $\xi_{d+1},\ldots,\xi_n$) satisfy differential equations which are control-dependent (resp. independent).

4.8. Assume that the k(x)-dimension of $L_0$ is $\delta \le n$. From [2, 29], we know the existence of a basis $\partial/\partial\eta_1,\ldots,\partial/\partial\eta_\delta$ of pairwise commuting vector fields and therefore the existence of formal local coordinates $\eta_1,\ldots,\eta_\delta, \eta_{\delta+1},\ldots,\eta_n$ which yield (see, e.g. [20]) the same type of decomposition as in 4.7: $\eta_1,\ldots,\eta_\delta$ (resp. $\eta_{\delta+1},\ldots,\eta_n$) satisfy differential equations which are control-dependent (resp. independent).
4.9. The former decomposition is thus possible if, and only if, one of the two next conditions is satisfied:


(i) k is not differentially algebraically closed in $k\langle u,x\rangle$,
(ii) the k(x)-dimension of $L_0$ is strictly less than n.
This ends the sketch of the proof, which also shows that $d = \delta$. □

5 Linear Controllability²
5.1. We are defining linear dynamics via module theory, which is more familiar to control theorists than the differential vector spaces [26] which we employed in our previous publications [7, 8, 12]. This latter framework has perhaps the advantage of making the connection with general nonlinear cases clearer.
5.2. Let k be an arbitrary differential ground field. We denote by k[d/dt] the ring of linear differential operators over k of the form

$$\sum_{\text{finite}} a_\alpha \frac{d^\alpha}{dt^\alpha}, \qquad a_\alpha \in k.$$

k[d/dt] is a commutative ring if, and only if, k is a field of constants. It is known (cf. Cohn [6]) that k[d/dt] is a principal ideal ring.
5.3. We denote by [w] the left k[d/dt]-module spanned by a set w. A linear dynamics M is a finitely generated left k[d/dt]-module containing [u], such that the quotient module M/[u] is torsion. The input u is said to be independent if, and only if, the module [u] is free. The dynamics M is said to be constant (resp. time-varying) if, and only if, k is (resp. is not) a field of constants.
5.4. As M/[u] is torsion and finitely generated, it is necessarily finite-dimensional as a k-vector space. Take a finite set $\xi = (\xi_1,\ldots,\xi_n)$ of elements in M such that its canonical image in M/[u] is a basis of the latter vector space. This yields the linear counterpart of (I) and (E) in 3.2:

$$\dot{\xi} = A\xi + \sum_{\mu=0}^{\alpha} B_\mu u^{(\mu)},$$

where A and the $B_\mu$'s are matrices over k of appropriate sizes. As in 3.3, another (generalized) minimal state $\bar{\xi} = (\bar{\xi}_1,\ldots,\bar{\xi}_n)$ is related to $\xi$ by a control-dependent transformation

$$\xi = P\bar{\xi} + \sum_{\nu} Q_\nu u^{(\nu)} \quad \text{(a finite sum)}.$$

P is a square invertible matrix and the $Q_\nu$'s are matrices of appropriate sizes.

² The results of Sections 4 and 5 have already been announced in [10, 11].


5.5. Consider the dynamics $\dot{\xi} = \xi + u + \dot{u}$, where m = n = 1. The state transformation $x = \xi - u$ yields $\dot{x} = x + 2u$, which does not contain derivatives of the control variable. This elimination can easily be extended to the general case. The transformation

$$\bar{\xi} = \xi - B_\alpha u^{(\alpha-1)}$$

yields an equation where the derivative of u is of order $\alpha - 1$. By repeating this procedure we arrive at a Kalman representation:

$$\dot{x} = \mathcal{F}x + \mathcal{G}u,$$

where x is a (minimal) Kalman state. Note that two minimal Kalman states x and $\bar{x}$ are related by the usual control-independent transformation $x = \Phi\bar{x}$, where $\Phi$ is a square invertible matrix.
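The computation opening 5.5 can be checked mechanically. The following sympy sketch (variable names ours) verifies that $x = \xi - u$ turns $\dot{\xi} = \xi + u + \dot{u}$ into $\dot{x} = x + 2u$, with no derivative of u remaining.

```python
import sympy as sp

t = sp.symbols('t')
u = sp.Function('u')(t)
xi = sp.Function('xi')(t)

# The dynamics of 5.5: xi' = xi + u + u'
xi_dot = xi + u + u.diff(t)

# State transformation x = xi - u eliminates the derivative of u
x = xi - u
x_dot = x.diff(t).subs(xi.diff(t), xi_dot)
assert sp.simplify(x_dot - (x + 2 * u)) == 0
```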
5.6. Remark. We should remember that this elimination of the derivatives of the control variables is in general impossible for nonlinear dynamics (Freedman and Willems [14], Glad [16]).
5.7. From now on, we assume that the input u is independent. A linear dynamics M is said to be controllable if, and only if, its Kalman representation is controllable³.
5.8. Example. The Kalman representation of $\dot{\xi} = \dot{u}$, m = n = 1, is $\dot{x} = 0$, where $x = \xi - u$. It is of course not controllable. Note, however, that any value of $\xi$ is reachable by a suitable choice of u. This implies that in our approach controllability and reachability are not equivalent.
5.9. A well-known property (cf. Cohn [6]) of finitely generated left modules over principal ideal rings tells us that the dynamics M can be decomposed in the following way:

$$M = F \oplus T,$$

where F (resp. T) is a free (resp. torsion) module. This fact is of course equivalent to the Kalman decomposition of a linear dynamics into the controllable and uncontrollable parts

$$\frac{d}{dt}\begin{pmatrix} x^+ \\ x^\# \end{pmatrix} = \begin{pmatrix} \mathcal{F}_1 & 0 \\ \mathcal{F}_2 & \mathcal{F}_3 \end{pmatrix}\begin{pmatrix} x^+ \\ x^\# \end{pmatrix} + \begin{pmatrix} 0 \\ \mathcal{G} \end{pmatrix}u,$$

where the uncontrollable part, which satisfies $\dot{x}^+ = \mathcal{F}_1 x^+$, obviously generates a torsion module. Controllability thus implies that T is trivial, i.e. that there exists no element in M which satisfies a linear differential equation over k which is independent of u. This shows that what we are doing is indeed the linear counterpart of the general concept of controllability which we introduced in
³ In the time-varying case, we refer, for example, to Freund [15].


4.1. Recall that it is well known (see, e.g. [20]) that the strong accessibility of a constant linear system implies Kalman's controllability.
5.10. In the last section we have proved the following characterization of linear controllability, which seems to be new:

Theorem. A linear dynamics M is controllable if, and only if, it is a free left k[d/dt]-module.

6 Hidden Modes or Decoupling Zeros

6.1. In this chapter we are working with constant linear systems. The study of their various zeros attracted much attention, a good review of which was recently given by Schrader and Sain [35]. Among those zeros, hidden modes or decoupling zeros often appear in complex interconnections of systems. A precise definition in the multivariable case has only been given by matrix calculations in the context of Rosenbrock's polynomial methods (see, e.g. Rosenbrock [33], Callier and Desoer [4], Blomberg and Ylinen [1]). We hope to convince the reader that our approach offers a straightforward understanding of this important topic.⁴
6.2. Take a constant linear dynamics M, which is assumed to be uncontrollable. The torsion module T in 5.9, which is then non-trivial, is finite-dimensional as a k-vector space. The derivation d/dt induces a k-linear mapping $\tau: T \to T$. The input-decoupling zeros are the eigenvalues of $\tau$ over an algebraic closure $\bar{k}$ of k.
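In state-space terms the input-decoupling zeros can be computed as the uncontrollable eigenvalues. The sketch below (our own construction, using the standard Popov-Belevitch-Hautus rank test rather than the module formalism) finds them for a small uncontrollable pair.

```python
import sympy as sp

# An uncontrollable constant linear dynamics x' = A x + B u:
# the mode at -2 is not influenced by the input
A = sp.Matrix([[1, 1], [0, -2]])
B = sp.Matrix([1, 0])
n = A.shape[0]

# Kalman test: rank [B, A B] = 1 < 2, so the dynamics is uncontrollable
assert sp.Matrix.hstack(B, A * B).rank() == 1

# Hautus test: lambda is an input-decoupling zero (an eigenvalue of the
# induced map on the torsion part) iff rank [lambda I - A, B] < n
hidden = {lam for lam in A.eigenvals()
          if (lam * sp.eye(n) - A).row_join(B).rank() < n}
assert hidden == {-2}
```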
6.3. For the sake of completeness, we now briefly discuss output-decoupling zeros, which are related to a lack of observability with respect to an output. Observability is here characterized by an elementary set-theoretic inclusion, which is also valid in the time-varying case: The dynamics M is observable with respect to the output y if, and only if, M coincides with the left k[d/dt]-module [u, y]. The dynamics M is unobservable with respect to y if, and only if, M strictly contains [u, y].
The intuitive meaning is the following: Observability implies that any element in M can be expressed as a k-linear combination of the components of u and y and of a finite number of their derivatives.
6.4. If M is not observable with respect to y, the quotient module Q = M/[u, y] is nontrivial. As Q is finitely generated and torsion, it is a finite-dimensional k-vector space. The derivation d/dt induces a k-linear mapping $\sigma: Q \to Q$. The output-decoupling zeros are the eigenvalues of $\sigma$ over $\bar{k}$.

⁴ Transmission zeros are also defined in [11].

472

M. Fliess

6.5. The set of hidden modes or decoupling zeros is the union of the sets of
input- and output-decoupling zeros.
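In the classical state-space setting ẋ = Ax + Bu, y = Cx (the matrix counterpart of the module-theoretic definitions above, not part of the text itself), both kinds of decoupling zeros can be computed by PBH-type rank tests. A minimal sketch, with an invented two-dimensional example:

```python
import numpy as np

def input_decoupling_zeros(A, B, tol=1e-9):
    """Eigenvalues of A at which the PBH matrix [sI - A, B] loses rank."""
    n = A.shape[0]
    return [lam for lam in np.linalg.eigvals(A)
            if np.linalg.matrix_rank(np.hstack([lam * np.eye(n) - A, B]), tol) < n]

def output_decoupling_zeros(A, C, tol=1e-9):
    """Eigenvalues of A at which the PBH matrix [sI - A; C] loses rank."""
    n = A.shape[0]
    return [lam for lam in np.linalg.eigvals(A)
            if np.linalg.matrix_rank(np.vstack([lam * np.eye(n) - A, C]), tol) < n]

# The mode at -2 receives no input: it is an input-decoupling zero.
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]])
print(input_decoupling_zeros(A, B))   # the mode at -2 is hidden from the input
print(output_decoupling_zeros(A, C))  # empty: both modes reach the output
```

The set of hidden modes of 6.5 is then the union of the two lists.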

7 Appendix: On the Relationship Between Nonlinear


and Linear Controllability
7.1. Consider the following nonlinear dynamics, which we write down with
classical notations:

ẋ = F(x, u)

Of course, it is possible to linearize it with respect to a given input u and the
corresponding trajectory of x. This yields in general a time-varying linear
system:

Δẋ = FₓΔx + FᵤΔu
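This linearization can be carried out symbolically; a small sketch with sympy, using an invented dynamics F(x, u) (a controlled pendulum) purely for illustration:

```python
import sympy as sp

x1, x2, u = sp.symbols('x1 x2 u')
x = sp.Matrix([x1, x2])

# A hypothetical nonlinear dynamics x' = F(x, u), chosen only for illustration.
F = sp.Matrix([x2, -sp.sin(x1) + u])

Fx = F.jacobian(x)               # dF/dx, to be evaluated along (x(t), u(t))
Fu = F.jacobian(sp.Matrix([u]))  # dF/du

print(Fx)  # Matrix([[0, 1], [-cos(x1), 0]])
print(Fu)  # Matrix([[0], [1]])
```

Evaluated along a given input u(t) and the corresponding trajectory x(t), these Jacobians become time-varying matrices, yielding Δẋ = FₓΔx + FᵤΔu.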

7.2. In the language of fields, such a linearization procedure is formalized by
the notion of (Kähler) differentials (see, e.g. Bourbaki [3]), which possesses a
differential analogue (Johnson [21]), which we will not recall here due to a
lack of space. It is then straightforward to check that k is differentially
algebraically closed in K if, and only if, the left k[d/dt]-module Ω_{K/k} spanned
by the Kähler differentials corresponding to K/k is free. This module Ω_{K/k},
which plays the role of the linearized dynamics, is then controllable (see 5.10).
7.3. We have outlined the proof of the following important result, which seems
to be new:

Theorem. A nonlinear dynamics is controllable if, and only if, its linearized
dynamics is controllable.

7.4. Remark. One should note that for some peculiar choices of u and of the
corresponding trajectories of x, the controllability of the linearized dynamics
might fail. This can be made precise by computing the rank condition for
checking controllability. Interesting related results have been obtained by
Charlet, Lévine, and Marino [5] in the context of dynamic feedback linearization.
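A standard illustration of this remark (our example, not the paper's) is the kinematic unicycle ẋ = u₁cos θ, ẏ = u₁sin θ, θ̇ = u₂: around a rest trajectory (u ≡ 0) the linearization is time-invariant with Fₓ = 0, and the Kalman rank test fails even though the nonlinear system is controllable:

```python
import numpy as np

theta0 = 0.3  # rest trajectory: x, y, theta constant, u = 0

# Linearization of the unicycle at this trajectory: F_x vanishes,
# only F_u = [g1(x0), g2(x0)] survives.
Fx = np.zeros((3, 3))
Fu = np.array([[np.cos(theta0), 0.0],
               [np.sin(theta0), 0.0],
               [0.0,            1.0]])

# Kalman controllability matrix [B, AB, A^2 B] of the linearization.
ctrb = np.hstack([Fu, Fx @ Fu, Fx @ Fx @ Fu])
print(np.linalg.matrix_rank(ctrb))  # 2 < 3: this particular linearization fails
```

Along a trajectory with nonconstant input the linearization becomes genuinely time-varying and the loss of rank can disappear.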
References
[1] H. Blomberg and R. Ylinen, Algebraic Theory of Multivariable Linear Systems, Academic Press, London, 1983
[2] J.N. Botelho, Le théorème de Frobenius formel, J. Diff. Geom., 12 (1977), pp. 319-325
[3] N. Bourbaki, Algèbre (Chap. 4-7), Masson, Paris, 1981
[4] F.M. Callier and C.A. Desoer, Multivariable Feedback Systems, Springer-Verlag, New York, 1983
[5] B. Charlet, J. Lévine and R. Marino, On Dynamic Feedback Linearization, Systems Control Lett., 13 (1989), pp. 143-151


[6] P.M. Cohn, Free Rings and their Relations, Academic Press, London, 1971
[7] M. Fliess, Automatique et corps différentiels, Forum Math., 1 (1989), pp. 227-238
[8] M. Fliess, Generalized Linear Systems with Lumped or Distributed Parameters and Differential Vector Spaces, Int. J. Control, 49 (1989), pp. 1989-1999
[9] M. Fliess, Some Remarks on Nonlinear Input-Output Systems with Delays, in "New Trends in Nonlinear Control Theory" (Proc. Conf. Nantes 1988), J. Descusse, M. Fliess, A. Isidori and D. Leborgne eds, Lect. Notes Control Inform. Sci., 122 (1989), pp. 172-181, Springer-Verlag, Berlin
[10] M. Fliess, Commandabilité, matrices de transfert et modes cachés, C.R. Acad. Sci. Paris, I-309 (1989), pp. 847-851
[11] M. Fliess, Geometric Interpretation of the Zeros and of the Hidden Modes of a Constant Linear System via a Renewed Realization Theory, Proc. IFAC Workshop "System Structure and Control: State-space and Polynomial Methods" (1989), pp. 209-213, Prague
[12] M. Fliess, Generalized Controller Canonical Forms for Linear and Nonlinear Dynamics, IEEE Trans. Automat. Control, 35 (1990), pp. 994-1001
[13] M. Fliess, Automatique en temps discret et algèbre aux différences, Forum Math., 2 (1990), pp. 213-232
[14] M.I. Freedman and J.C. Willems, Smooth Representation of Systems with Differentiated Inputs, IEEE Trans. Automat. Control, 23 (1978), pp. 16-21
[15] E. Freund, Zeitvariable Mehrgrößensysteme, Lect. Notes Operat. Res. Math. Systems, 57 (1971), Springer-Verlag, Berlin
[16] S.T. Glad, Nonlinear State Space and Input Output Descriptions Using Differential Polynomials, in "New Trends in Nonlinear Control Theory" (Proc. Conf. Nantes 1988), J. Descusse, M. Fliess, A. Isidori and D. Leborgne eds, Lect. Notes Control Inform. Sci., 122 (1989), pp. 182-189, Springer-Verlag, Berlin
[17] A. Haddak, Differential Algebra and Controllability, in "Nonlinear Control Systems Design" (Proc. IFAC Symp. Capri, 1989), A. Isidori ed, Pergamon Press, Oxford
[18] R. Hermann, Differential Geometry and the Calculus of Variations, Academic Press, New York, 1968
[19] H. Hermes, On Local and Global Controllability, SIAM J. Control, 12 (1974), pp. 252-261
[20] A. Isidori, Nonlinear Control Systems (2nd edition), Springer-Verlag, Berlin, 1989
[21] J. Johnson, Kähler Differentials and Differential Algebra, Annals Math., 89 (1969), pp. 92-98
[22] R.E. Kalman, On the General Theory of Control Systems, in "Automatic and Remote Control", Proc. 1st IFAC World Congress Moscow 1960, Vol. 1 (1961), pp. 481-492, Butterworth, London
[23] R.E. Kalman, Mathematical Description of Linear Systems, SIAM J. Control, 1 (1963), pp. 152-192
[24] R.E. Kalman, Lectures on Controllability and Observability, CIME Summer Course, Cremonese, Roma, 1968
[25] R.E. Kalman, P.L. Falb and M.A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, New York, 1969
[26] E.R. Kolchin, Differential Algebra and Algebraic Groups, Academic Press, New York, 1973
[27] A.J. Krener, A Generalization of Chow's Theorem and the Bang-Bang Theorem to Nonlinear Control Systems, SIAM J. Control, 12 (1974), pp. 43-52
[28] C. Lobry, Contrôlabilité des systèmes non linéaires, SIAM J. Control, 8 (1970), pp. 573-605
[29] W. Nichols and B. Weisfeiler, Differential Formal Groups of J.F. Ritt, Amer. J. Math., 104 (1982), pp. 943-1003
[30] J.-F. Pommaret, Lie Groups and Mechanics, Gordon and Breach, New York, 1988
[31] J.F. Ritt, Differential Algebra, Amer. Math. Soc., New York, 1950
[32] A. Robinson, Introduction to Model Theory and to the Metamathematics of Algebra, North-Holland, Amsterdam, 1974
[33] H.H. Rosenbrock, State-space and Multivariable Theory, Nelson, London, 1970
[34] A.J. van der Schaft, Structural Properties of Realizations of External Differential Systems, in "Nonlinear Control Systems Design" (Proc. IFAC Symp. Capri, 1989), A. Isidori ed, Pergamon Press, Oxford
[35] C.B. Schrader and M.K. Sain, Research on System Zeros: A Survey, Proc. 27th IEEE Conf. Decision Control, 1988, pp. 890-901, Austin, TX
[36] E.D. Sontag, Finite-dimensional open-loop control generators for nonlinear systems, Int. J. Control, 47 (1988), pp. 537-556


[37] H.J. Sussmann, Lie Brackets, Real Analyticity and Geometric Control, in "Differential Geometric Control Theory" (Proc. Conf. Michigan Tech. Univ. 1982), R.W. Brockett, R.S. Millman and H.J. Sussmann eds, Birkhäuser, Boston, 1983, pp. 1-116
[38] H.J. Sussmann and V. Jurdjevic, Controllability of Nonlinear Systems, J. Diff. Equations, 12 (1972), pp. 95-116
[39] D. Williamson, Observation of Bilinear Systems with Application to Biological Control, Automatica, 13 (1977), pp. 243-254
[40] C. Wood, The Model Theory of Differential Fields Revisited, Israel J. Math., 25 (1976), pp. 331-352
[41] M. Zeitz, Canonical Forms for Nonlinear Systems, in "Nonlinear Control Systems Design" (Proc. IFAC Symp. Capri, 1989), A. Isidori ed, Pergamon Press, Oxford

On the Extensions of Kalman's Canonical Structure Theorem

A. Ruberti and A. Isidori
Istituto di Automatica e Sistemistica, Università di Roma "La Sapienza",
Via Eudossiana 18, I-00184 Rome, Italy

1 Introduction
One of Kalman's most relevant contributions to system and control theory
has been the development of a rigorous conceptual framework in which the
relations between external (input-output) and internal (state) variables
associated with the mathematical descriptions of a dynamical system can be fully
understood. A cornerstone of this theory was the so-called "canonical structure
theorem", first published in [1], "which describes abstractly the coupling
between the external and internal variables of any linear dynamical system"
([2], p. 153). This fundamental result played a crucial role in a number of
major problems of analysis and design, like the construction of irreducible
realizations of an impulse response matrix (a problem completely solved, for
the first time, in [2]), the assignment of the eigenvalues in a feedback system,
and the disturbance decoupling and noninteracting control problems, to mention
a few.
Since the appearance of this result in 1962, many authors have been interested
in finding extensions to more general classes of systems than those considered
in the paper [1]. A first set of contributions in this direction, addressed to the
extension of the "canonical structure theorem" to time-varying linear systems,
can be found in the works of Kalman and Weiss [3], Youla [4], Silverman
and Meadows [5], Weiss [6], d'Alessandro, Isidori and Ruberti [7]. A second
stage of development included the theory of bilinear systems, whose canonical
decomposition was studied by Brockett [8] and d'Alessandro, Isidori and
Ruberti [9]. Finally, a canonical structure theorem, along with methods for
the construction of irreducible realizations, became available also for nonlinear
systems, as an outcome of the works of Sussmann and Jurdjevic [10], Brockett
[11], Sussmann [12], Hermann and Krener [13], Isidori, Krener, Gori Giorgi
and Monaco [14], Fliess [15].
In this paper, which is prepared on the occasion of the 60th birthday of
Rudolf E. Kalman, we intend to review some of our own research work, done
in collaboration with the coauthors of [7] and [14], on the subject of the
canonical structure of control systems, with particular emphasis on the problem
of decomposing a given system into reachable/unreachable and, respectively,

476

A. Ruberti and A. Isidori

observable/unobservable parts. In the second section, we deal with time-varying
linear systems and explain how a decomposition into constant dimensional
subsystems can be obtained. In the third and fourth sections we discuss the
corresponding problem for a time-invariant nonlinear system. In both cases,
the goal is to identify and construct a decomposition in which the dimension
of the subsystem which is affected by the input is the smallest possible,
and, respectively, the dimension of the subsystem which does not affect the
output is the largest possible. The characterization of the minimal (maximal)
subsystems thus obtained in terms of structure properties is not as simple as
in the case of linear systems, where minimality corresponds exactly to the
standard properties of reachability and observability, but it is still possible and
rather meaningful. For example, in the case of time-varying linear systems, the
minimal subsystem which is affected by the input is such that every state is the
sum of a state which is reachable, from 0, at time t and a state which is
controllable, to 0, at time t. In the case of a nonlinear system, the minimal
subsystem which is affected by the input is such that at least an open subset
of states is reachable at any (small) time t, from the initial state x⁰ about which
the (local) decomposition is performed. Mutatis mutandis, similar interpretations
are possible about the decomposition which maximizes the dimension of the
subsystem which does not affect the output. The paper is also completed by an
example of application of the canonical structure theorem to the construction
of an irreducible realization of a prescribed (nonlinear) input-output map.

2 Canonical Decompositions of Time-Varying Linear Systems


In this section we consider time-varying linear systems described, in state space
form, by equations of the following type:

ẋ = F(t)x + G(t)u   (2.1a)
y = H(t)x   (2.1b)

with x ∈ Rⁿ, u ∈ Rᵐ, y ∈ Rᵖ, and F(·), G(·), H(·) matrices of continuous functions.
We suppose that the reader is familiar with the concepts of reachability,
controllability, observability and constructibility (at a specific time t), whose
definitions can be found, e.g., in [3]. Our goal is to find a (time-varying)
coordinate transformation:

z(t) = T(t)x(t)

(where T(·) is an invertible matrix of continuous functions) such that, in the
new coordinates, the equations describing the system become:

ż₁ = F₁₁(t)z₁ + F₁₂(t)z₂ + G₁(t)u   (2.2a)
ż₂ = F₂₂(t)z₂   (2.2b)
y = H₁(t)z₁ + H₂(t)z₂

7 Nonlinear and Distributed Systems-Canonical Structure Theorem

477

or, respectively:

ż₁ = F₁₁(t)z₁ + F₁₂(t)z₂ + G₁(t)u
ż₂ = F₂₂(t)z₂ + G₂(t)u
y = H₂(t)z₂

with minimal (constant) dimension of z₁ in the first case and minimal (constant)
dimension of z₂ in the second case. For reasons of space we discuss only the
first situation and we refer to the literature [7] for the other one, which can be
dealt with in a completely dual manner.
It is well known that, in the case of a time-invariant system, a decomposition
of the form (2.2) can be obtained by choosing a new basis in the state space
with the property that the first, say n₁, vectors of this basis span the set of
reachable states. In the case of time-varying systems this simple argument cannot
be further pursued because, as is also well known, the set of reachable states may
depend on the time and, what is worse from the point of view of performing a
"decomposition" into subsystems, its dimension also may vary with time (as a
matter of fact, it is easy to check that the latter is a nondecreasing function of
t). A similar difficulty arises also if one looks at the property of controllability,
because the set of controllable states may as well depend on time and so may
its dimension (which, by the way, is a nonincreasing function of t). Fortunately
enough, however, a suitable blend of these two properties does the job, in the
sense that it enables us to identify a subspace whose dimension is constant, which
depends continuously on the time, and which asymptotically coincides either
with the set of controllable states (as time tends to −∞) or with the set of
reachable states (as time tends to +∞). The following statement expresses these
properties in a precise manner.

Lemma (see [7]). Let R(t) denote the set of states reachable (from 0) at time t
and let C(t) denote the set of states controllable (to 0) at time t. The subspace

P(t) = R(t) + C(t)

has constant dimension, say n₁. Moreover,

P(t′) = Φ(t′, t)P(t)   (2.3)

for all t′ and t, where Φ(t′, t) is the state transition matrix of the system (2.1).
In particular, there exist times T₁ ≤ T₂, with the properties that:

P(t) = C(t) for all t < T₁
P(t) = R(t) for all t > T₂

The results of this Lemma can be clearly taken as the point of departure
for the construction of a canonical decomposition of the form (2.2). From (2.3),
one deduces the existence of a nonsingular matrix of continuous functions
Θ(t) = [θ₁(t) ⋯ θₙ(t)] with the property that:

P(t) = span{θ₁(t), …, θ_{n₁}(t)}

Then, choosing this as a coordinates transformation, it is immediate to obtain
for the system a decomposition of the form (2.2). In particular, the invariance
of P(·) under Φ(·,·), expressed by (2.3), accounts for the block-triangular
structure of F(·) in the new coordinates. The minimality of this decomposition,
on the other hand, is a straightforward consequence of the fact that,
asymptotically, P(t) coincides either with C(t) or with R(t).
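In the time-invariant specialization, the subspace of the Lemma reduces to the constant reachable subspace Im[G, FG, …, F^(n−1)G], and the resulting decomposition (2.2) can be sketched numerically (a minimal illustration; the genuinely time-varying construction is the one in [7]):

```python
import numpy as np

def reachable_decomposition(F, G):
    """Orthogonal change of basis exhibiting the form (2.2) for a
    time-invariant pair (F, G): the first n1 basis vectors span the
    reachable subspace R = Im [G, FG, ..., F^(n-1) G]."""
    n = F.shape[0]
    ctrb = np.hstack([np.linalg.matrix_power(F, k) @ G for k in range(n)])
    # Left singular vectors: the first n1 columns span R, the rest span R-perp.
    U, s, _ = np.linalg.svd(ctrb)
    n1 = int(np.sum(s > 1e-9 * s.max()))
    Fbar = U.T @ F @ U   # block triangular: lower-left (n-n1) x n1 block is 0
    Gbar = U.T @ G       # last n - n1 rows are 0
    return n1, Fbar, Gbar

F = np.array([[1.0, 1.0], [0.0, 2.0]])
G = np.array([[1.0], [0.0]])
n1, Fbar, Gbar = reachable_decomposition(F, G)
print(n1)  # only the first coordinate is reachable
```

The zero lower-left block of Fbar and the zero last rows of Gbar are exactly the structure of (2.2a)-(2.2b).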

3 Canonical Decompositions of Nonlinear Systems:
Mathematical Background

In this section we consider nonlinear systems described, in state space form, by
equations of the following type:

ẋ = f(x) + Σᵢ₌₁ᵐ gᵢ(x)uᵢ   (3.1a)
yⱼ = hⱼ(x),  j = 1, …, p   (3.1b)

with state x evolving on an open set V of Rⁿ; f, g₁, …, g_m are smooth Rⁿ-valued
mappings (vector fields) and h₁, …, h_p are smooth R-valued mappings, all
defined on the open set V.
Our purpose is to describe how Kalman's canonical structure theorem can
be extended to this class of systems. We proceed in two steps: first, we show
how to find a coordinate system in which the state variables are separated into
two groups, ζ₁ and ζ₂, such that the second group ζ₂ is not affected by the first
group ζ₁ and by the input to the system, or, respectively, the first group ζ₁
does not affect the second group ζ₂ and the output of the system. Then, we
interpret these decompositions in terms of reachability and observability
properties.
Formally, the first step of the program amounts to the solution of the
following two problems.

Problem 1. Consider a system of the form (3.1) and let x⁰ be a fixed point of
V. Find, if possible, a neighborhood V⁰ of x⁰ and a local coordinate
transformation z = Φ(x) defined on V⁰ such that, in the new coordinates, the
system (3.1a) is represented by equations of the form:

ζ̇₁ = f₁(ζ₁, ζ₂) + Σᵢ₌₁ᵐ g₁ᵢ(ζ₁, ζ₂)uᵢ   (3.2a)
ζ̇₂ = f₂(ζ₂)   (3.2b)


Problem 2. Consider a system of the form (3.1) and let x⁰ be a fixed point of
V. Find, if possible, a neighborhood V⁰ of x⁰ and a local coordinates
transformation z = Φ(x) defined on V⁰ such that, in the new coordinates, the
control system (3.1) is represented by equations of the form:

ζ̇₁ = f₁(ζ₁, ζ₂) + Σᵢ₌₁ᵐ g₁ᵢ(ζ₁, ζ₂)uᵢ   (3.3a)
ζ̇₂ = f₂(ζ₂) + Σᵢ₌₁ᵐ g₂ᵢ(ζ₂)uᵢ   (3.3b)
yⱼ = hⱼ(ζ₂),  j = 1, …, p   (3.3c)

where ζ₁ = (z₁, …, z_d) and ζ₂ = (z_{d+1}, …, z_n).
In order to understand how Problems 1 and 2 can be solved, we need to
recall a certain number of powerful differential-geometric concepts, which proved
themselves instrumental in achieving the canonical decompositions indicated
by (3.2) and (3.3). The point of departure is the notion of a smooth distribution,
a nonlinear "version" of the notion of subspace (of a vector space), which can
be illustrated in the following way. Suppose f₁, f₂, …, f_q are smooth vector
fields, all defined on the same open set V of Rⁿ, and let Δ(x) denote the vector
space spanned by the vectors f₁(x), f₂(x), …, f_q(x) (i.e. by the values of f₁,
f₂, …, f_q at the point x of V). A smooth distribution is the mapping Δ which
assigns to each point x of V the vector space Δ(x). Whenever it is necessary
to indicate explicitly the set of vector fields f₁, f₂, …, f_q which were used in
order to define the mapping Δ, the notation:

Δ = span{f₁, f₂, …, f_q}

is used.
Pointwise, a distribution identifies a vector space, a subspace of Rⁿ. On the
basis of this fact, it is possible to extend to the notion of distribution a number
of elementary concepts related to the notion of vector space. Thus, if Δ₁ and
Δ₂ are distributions, their sum Δ₁ + Δ₂ and intersection Δ₁ ∩ Δ₂ are defined
pointwise as the sum and, respectively, intersection of the subspaces Δ₁(x) and
Δ₂(x). A distribution Δ₁ contains a distribution Δ₂, written Δ₁ ⊃ Δ₂, if
Δ₁(x) ⊃ Δ₂(x) for all x. A vector field f belongs to a distribution Δ, written
f ∈ Δ, if f(x) ∈ Δ(x) for all x. The dimension of a distribution at a point
x of V is the dimension of the subspace Δ(x).
The manner in which Δ(x) depends on x leads to a number of additional
characterizations. A distribution Δ, defined on an open set V, is nonsingular if
there exists an integer d such that

dim Δ(x) = d

for all x in V. A point x⁰ of V is a regular point of a distribution Δ if there
exists a neighborhood V⁰ of x⁰ with the property that Δ is nonsingular on V⁰.
A distribution Δ is involutive if the Lie bracket [τ₁, τ₂] of any pair of vector
fields τ₁ and τ₂ belonging to Δ is a vector field which belongs to Δ, i.e. if:

τ₁ ∈ Δ, τ₂ ∈ Δ ⇒ [τ₁, τ₂] ∈ Δ

We recall that the Lie bracket [τ₁, τ₂] of the vector fields τ₁ and τ₂ is a new
vector field defined by:

[τ₁, τ₂](x) = (∂τ₂/∂x)τ₁(x) − (∂τ₁/∂x)τ₂(x)
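In coordinates the Lie bracket [f, g] = (∂g/∂x)f − (∂f/∂x)g is just two Jacobian-vector products; a minimal sympy sketch, with vector fields invented for illustration:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
x = sp.Matrix([x1, x2, x3])

def lie_bracket(f, g, x):
    """[f, g] = (dg/dx) f - (df/dx) g."""
    return g.jacobian(x) * f - f.jacobian(x) * g

# Two vector fields on R^3, chosen only for the example.
f = sp.Matrix([x2, 0, 0])
g = sp.Matrix([0, x1, 0])
print(lie_bracket(f, g, x).T)  # Matrix([[-x1, x2, 0]])
```

Note that the bracket is generally a new direction: here it is not a pointwise combination of f and g.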

In many instances, calculations are easier if, instead of distributions, one
considers dual objects, called codistributions, that are defined in the following
way. Recall that a smooth covector field ω, defined on an open set V of Rⁿ, is
a smooth assignment, to each point x of V, of an element of the dual space
(Rⁿ)*. Suppose ω₁, ω₂, …, ω_q are smooth covector fields, and let Ω(x) denote
the vector space spanned, at a point x of V, by the covectors ω₁(x),
ω₂(x), …, ω_q(x). The mapping which assigns to each point x of V the vector
space Ω(x) is a smooth codistribution. Coherently with the notations introduced
for distributions, the notation

Ω = span{ω₁, ω₂, …, ω_q}

is often used. Since, pointwise, codistributions are vector spaces (subspaces of
(Rⁿ)*), one can easily extend the notions of addition, intersection, inclusion.
Similarly, one can define the dimension of a codistribution at each point x of
V, and distinguish between regular points and points of singularity.
Sometimes, it is possible to construct codistributions starting from given
distributions, and conversely. The natural way to do this is the following one:
given a distribution Δ, for each x in V consider the annihilator of Δ(x), that is
the set of all covectors which annihilate all vectors in Δ(x):

Δ⊥(x) = {w* ∈ (Rⁿ)*: ⟨w*, v⟩ = 0 for all v ∈ Δ(x)}

Since Δ⊥(x) is a subspace of (Rⁿ)*, this construction identifies exactly a
codistribution, in that it assigns, to each x of V, a subspace of (Rⁿ)*. This
codistribution, denoted Δ⊥, is called the annihilator of Δ. Conversely, given a
codistribution Ω, one can construct a distribution, denoted Ω⊥ and called the
annihilator of Ω, setting at each x in V:

Ω⊥(x) = {v ∈ Rⁿ: ⟨w*, v⟩ = 0 for all w* ∈ Ω(x)}

A property of major importance, for a distribution, is that of being completely
integrable. Recall that, if λ is a smooth R-valued mapping defined on an open
set V of Rⁿ, its differential (or gradient), denoted dλ, is the smooth covector
field defined by:

dλ(x) = [∂λ/∂x₁ ⋯ ∂λ/∂xₙ]

A nonsingular d-dimensional distribution Δ, defined on an open set V of Rⁿ,
is completely integrable if, for each point x⁰ of V there exist a neighborhood
V⁰ of x⁰ and (n − d) R-valued smooth functions λ₁, …, λ_{n-d}, all defined on V⁰,
such that:

span{dλ₁, …, dλ_{n-d}} = Δ⊥

The following classical result establishes that, for a nonsingular smooth
distribution, the properties of being completely integrable and of being involutive
are equivalent.

Theorem (Frobenius). A nonsingular smooth distribution is completely integrable
if and only if it is involutive.

One of the most useful consequences of the notion of complete integrability
is related to the possibility of using the functions λ₁, …, λ_{n-d} in order to define
a local coordinates transformation which entails a particularly simple
representation for the vector fields of Δ. For, observe that, by construction, the
(n − d) differentials:

dλ₁, …, dλ_{n-d}   (3.4)

are linearly independent at the point x⁰. Then, it is always possible to choose,
in the set of functions:

x̄₁(x) = x₁, x̄₂(x) = x₂, …, x̄ₙ(x) = xₙ

a subset of d functions whose differentials at x⁰, together with those of the set
(3.4), form a set of exactly n linearly independent row vectors. Let φ₁, φ₂, …, φ_d
denote the functions thus chosen and set:

φ_{d+1}(x) = λ₁(x), …, φₙ(x) = λ_{n-d}(x)

By construction, the Jacobian matrix of the mapping:

z = Φ(x) = col(φ₁(x), …, φ_d(x), φ_{d+1}(x), …, φₙ(x))

has rank n at x⁰ and, therefore, the mapping Φ qualifies as a local diffeomorphism
(i.e. a local smooth coordinates transformation) around the point x⁰. Now,
suppose τ is a vector field of Δ. In the new coordinates, this vector field is
represented in the form:

τ̄(z) = [(∂Φ/∂x)τ(x)]_{x = Φ⁻¹(z)}

Since, by construction, the last n − d rows of the Jacobian matrix of Φ span
Δ⊥, it follows that the last n − d entries of the vector on the right-hand side
are zero, for all x in the set where the coordinates transformation is defined.
Thus any vector field of Δ, in the new coordinates, has a representation of the
form:

τ̄(z) = col(τ̄₁(z), …, τ̄_d(z), 0, …, 0)   (3.5)


Next, we recall the notion of invariance (of a distribution/codistribution) under
a given vector field. This notion is of paramount importance in the derivation
of canonical decompositions because it plays, in the theory of nonlinear control
systems, a role similar to the one played in the theory of linear systems by the
notion of a subspace invariant under a linear mapping. A distribution Δ is
invariant under a vector field f if the Lie bracket [f, τ] of f with every vector
field τ of Δ is again a vector field of Δ, i.e. if:

τ ∈ Δ ⇒ [f, τ] ∈ Δ

In more condensed form, this property is usually written as [f, Δ] ⊂ Δ,
where [f, Δ] denotes the distribution spanned by all the vector fields of the
form [f, τ], i.e. [f, Δ] = span{[f, τ]: τ ∈ Δ}.
A codistribution Ω is said to be invariant under the vector field f if the
derivative L_f ω of any covector field ω of Ω along the vector field f is again
a covector field of Ω; the quantity L_f ω is the covector field defined by:

L_f ω = fᵀ(∂ωᵀ/∂x)ᵀ + ω(∂f/∂x)

Coherently with the notation introduced before for an invariant distribution,
the property in question is usually written as L_f Ω ⊂ Ω, where
L_f Ω = span{L_f ω: ω ∈ Ω}. It is easy to show that the notion thus introduced
is the dual version of the notion of invariance of a distribution. More precisely,
it can be proven that if a smooth distribution Δ is invariant under the vector
field f, then the codistribution Ω = Δ⊥ is also invariant under f and, conversely,
that if a smooth codistribution Ω is invariant under the vector field f, then the
distribution Δ = Ω⊥ is also invariant under f.
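The derivative of a row covector ω along f can be computed in coordinates as L_f ω = fᵀ(∂ωᵀ/∂x)ᵀ + ω(∂f/∂x). A sympy sanity check, with an invented field f and the exact differential ω = dλ (for which L_f(dλ) = d(L_f λ), here zero because f is tangent to the level sets of λ):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])

def covector_derivative(omega, f, x):
    """L_f omega = f^T (d omega^T/dx)^T + omega (df/dx), omega a 1 x n row."""
    return f.T * omega.T.jacobian(x).T + omega * f.jacobian(x)

f = sp.Matrix([x2, -x1])           # a rotation field, invented for the example
lam = x1**2 + x2**2                # f is tangent to the circles lambda = const
omega = sp.Matrix([[2*x1, 2*x2]])  # omega = d(lambda)

print(sp.simplify(covector_derivative(omega, f, x)))  # Matrix([[0, 0]])
```

Here span{ω} is a codistribution invariant under f, in agreement with the duality statement above.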
The key role of the notion of invariant distribution is explained by the
following statement, in which we consider an involutive distribution Δ, invariant
under a vector field f.

Lemma (see, e.g. [15]). Let Δ be a nonsingular involutive distribution of dimension
d and suppose that Δ is invariant under the vector field f. Then at each point x⁰
there exist a neighborhood U⁰ of x⁰ and a coordinates transformation z = Φ(x)
defined on U⁰, in which the vector field f is represented by a vector of the form:

f̄(z) = col(f̄₁(z₁, …, z_d, z_{d+1}, …, z_n), …, f̄_d(z₁, …, z_d, z_{d+1}, …, z_n),
           f̄_{d+1}(z_{d+1}, …, z_n), …, f̄ₙ(z_{d+1}, …, z_n))   (3.6)

The representation (3.6) is particularly useful in interpreting the notion of
invariance of a distribution from a system-theoretic point of view. For, suppose
a dynamical system of the form:

ẋ = f(x)   (3.7)

is given and let Δ be a nonsingular and involutive distribution, invariant under
the vector field f. Choose the coordinates as described in the Lemma and
set:

ζ₁ = (z₁, …, z_d)
ζ₂ = (z_{d+1}, …, z_n)

Then, the system in question is represented by equations of the form:

ζ̇₁ = f₁(ζ₁, ζ₂)   (3.8a)
ζ̇₂ = f₂(ζ₂)   (3.8b)

that is, it exhibits, in the new coordinates, an internal triangular decomposition.


Geometrically, the decomposition described by (3.8) can be interpreted in
the following way. Suppose, without loss of generality, that Φ(x⁰) = 0 and that
the neighborhood U⁰ on which the transformation is defined is a neighborhood
of the form

U⁰ = {x ∈ Rⁿ: |zᵢ(x)| < ε, i = 1, …, n}

where ε is a suitable small number. Such a neighborhood U⁰ is called a cubic
neighborhood centered at x⁰. Let x be a point of U⁰, and consider the subset
of U⁰ consisting of all points whose last n − d coordinates (namely the ζ₂
coordinates) coincide with those of x, i.e. the set:

{x̄ ∈ U⁰: zᵢ(x̄) = zᵢ(x), i = d + 1, …, n}   (3.9)

This set is called a slice of the neighborhood U⁰.
Suppose now that xᵃ and xᵇ are two points of U⁰ satisfying the condition:

ζ₂(xᵃ) = ζ₂(xᵇ)   (3.10)

i.e. having the same ζ₂ coordinates but possibly different ζ₁ coordinates. Let
xᵃ(t) and xᵇ(t) denote the integral curves of the equation (3.7) starting respectively
from xᵃ and xᵇ at time t = 0. Recalling that in the new coordinates the equation
(3.7) exhibits the decomposition (3.8), it is easy to conclude that, so long as xᵃ(t)
and xᵇ(t) are contained in the domain U⁰ of the coordinates transformation
z = Φ(x),

ζ₂(xᵃ(t)) = ζ₂(xᵇ(t))   (3.11)

at any time t.
Two initial conditions xᵃ and xᵇ satisfying (3.10) belong, by definition, to a
slice of the form (3.9). As we have just seen, the two corresponding trajectories
xᵃ(t) and xᵇ(t) of (3.7) necessarily satisfy (3.11), i.e. at any time t they necessarily
belong to a slice of the form (3.9). Thus, we can conclude that the flow of (3.7)
carries slices (of the form (3.9)) into slices.


We have at this point recalled all the notions needed to understand how
Problems 1 and 2 stated at the beginning of the section can be solved. The
following two statements illustrate their solutions.

Proposition 3.1. Let Δ be a nonsingular involutive distribution of dimension d and
assume that Δ is invariant under the vector fields f, g₁, …, g_m. Moreover, suppose
that the distribution span{g₁, …, g_m} is contained in Δ. Then, for each point
x⁰ it is possible to find a neighborhood V⁰ of x⁰ and a local coordinates
transformation z = Φ(x) defined on V⁰ such that, in the new coordinates, the
control system (3.1a) is decomposed as in the equation (3.2).

Proof. From the Lemma it is known that there exists, around each x⁰, a local
coordinates transformation yielding a representation of the form (3.6) for the
vector fields f, g₁, …, g_m. In the new coordinates the vector fields g₁, …, g_m,
which by assumption belong to Δ, are represented by vectors whose last (n − d)
components are vanishing (see (3.5)). This proves the Proposition. □
Proposition 3.2. Let Δ be a nonsingular involutive distribution of dimension d and
assume that Δ is invariant under the vector fields f, g₁, …, g_m. Moreover, suppose
that the codistribution span{dh₁, …, dh_p} is contained in the codistribution Δ⊥.
Then, for each point x⁰ it is possible to find a neighborhood V⁰ of x⁰ and a local
coordinates transformation z = Φ(x) defined on V⁰ such that, in the new
coordinates, the control system (3.1) is decomposed as in the equations (3.3).

Proof. As before, we know that there exists, around each x⁰, a coordinates
transformation yielding a representation of the form (3.6) for the vector fields
f, g₁, …, g_m. In the new coordinates, the covector fields dh₁, …, dh_p, which by
assumption belong to Δ⊥, must be represented by row vectors whose first d
components are vanishing, and this completes the proof. □

4 Canonical Decompositions of Nonlinear Systems:
Reachability and Observability

The two local decompositions illustrated in the previous section are very useful
in understanding the input-state and state-output behavior of the control system
(3.1). Suppose that the inputs uᵢ are piecewise constant functions of time, so that
on time intervals of the form [T_k, T_{k+1}) the state of the system evolves along
the integral curve of a vector field of the form:

f(x) + g₁(x)ū₁ + ⋯ + g_m(x)ū_m

(where ūᵢ denotes the constant value of uᵢ on the interval) passing through the
point x(T_k). Suppose that the assumptions of Proposition 3.1 are satisfied,
choose a point x⁰, and set x(0) = x⁰. For small values of t the state evolves on
V⁰ and we may use the equations (3.2) to interpret the behavior

of the system. From these, it is seen that the ζ₂ coordinates of x(t) are not
affected by the input. In particular, the set of points reachable at time t is
necessarily a subset of a slice of the form (3.9), the one passing through the point

x⁰(T) = Φ_T^f(x⁰)

(where Φ_t^f(x) denotes the flow of the vector field f).


Proposition 3.2 is useful in studying state-output interactions. Choose a point x^0 and take two initial states x^a and x^b belonging to the same slice of V^0. Let x^a(t) and x^b(t) denote the values of the states reached at time t, starting from x^a and x^b, under the action of the same input u. From the equation (3.3b) we see immediately that the corresponding outputs coincide,

y^a(t) = y^b(t)

for every input u. Since the two states x^a and x^b produce the same output under any input, they are indistinguishable.
From the previous discussion we could obtain only "negative" results, in the sense that we identified-in the case of decomposition (3.2)-a smooth hypersurface of dimension d < n in which all states reachable at time t = T are necessarily included and-in the case of decomposition (3.3)-a smooth hypersurface of dimension d < n whose states are necessarily indistinguishable. If more "positive" information is sought, about the actual "thickness" of the set of states reachable at some fixed time, or about the actual thickness of the sets of states indistinguishable at some fixed time, one has obviously to look first at decompositions of the form (3.2) in which the dimension d of the first group of coordinates is the smallest possible, or at decompositions of the form (3.3) in which the dimension d of the first group of coordinates is the largest possible.
In view of the results summarized in the previous section, the first problem amounts to finding, if possible, the smallest distribution invariant under the vector fields f, g_1, ..., g_m which contains the vector fields g_1, ..., g_m. On the other hand, the second problem amounts to finding, if possible, the smallest codistribution invariant under the vector fields f, g_1, ..., g_m which contains the covector fields dh_1, ..., dh_p.
The two objects in question can be calculated by means of appropriate (and simple) algorithms. Consider the sequence of distributions recursively defined as:

Δ_0 = span{g_1, ..., g_m}    (4.1a)

Δ_k = Δ_{k-1} + [f, Δ_{k-1}] + Σ_{i=1}^{m} [g_i, Δ_{k-1}]    (4.1b)

This sequence identifies the distribution needed in order to perform the minimal decomposition of the form (3.2). In fact, the following is true.


A. Ruberti and A. Isidori

Lemma. Suppose Δ_{n-1} is nonsingular. Then Δ_{n-1} is also involutive and is the smallest distribution invariant under the vector fields f, g_1, ..., g_m which contains the vector fields g_1, ..., g_m.
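The sequence (4.1) can be computed symbolically when the vector fields are given in closed form. The sketch below is our illustration (not from the text), using sympy; with the linear drift f = Ax and a constant input field g = b it recovers Kalman's controllability columns b, Ab, since [f, b] = -Ab.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])

def lie_bracket(f, g, x):
    # [f, g] = (dg/dx) f - (df/dx) g
    return g.jacobian(x) * f - f.jacobian(x) * g

# Linear drift f = A x with constant input field g = b (illustrative data)
A = sp.Matrix([[0, 1], [0, 0]])
f = A * x
g = sp.Matrix([0, 1])

# Sequence (4.1): Delta_0 = span{g}; each step adds brackets with f and g
generators = [g]
for _ in range(len(x) - 1):            # n - 1 steps suffice (cf. the Lemma)
    new = [lie_bracket(f, v, x) for v in generators]
    new += [lie_bracket(g, v, x) for v in generators]
    generators += new

D = sp.Matrix.hstack(*generators)
print(D.rank())   # 2: Delta_1 has full rank, matching rank [b, Ab]
```

At a point where the stacked columns have full rank, the minimal decomposition (3.2) degenerates (d = n) and no unreachable directions remain.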
Consider now the sequence of codistributions recursively defined as:

Ω_0 = span{dh_1, ..., dh_p}    (4.2a)

Ω_k = Ω_{k-1} + L_f Ω_{k-1} + Σ_{i=1}^{m} L_{g_i} Ω_{k-1}    (4.2b)

This sequence identifies the codistribution needed in order to perform the maximal decomposition of the form (3.3). In fact, the following is true.

Lemma. Suppose Ω_{n-1} is nonsingular. Then Ω_{n-1}^⊥ is involutive and Ω_{n-1} is the smallest codistribution invariant under the vector fields f, g_1, ..., g_m which contains the covector fields dh_1, ..., dh_p.
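The dual computation (4.2) can be sketched in the same way (our illustration, not from the text). For a one-form w, written as a row vector, the Lie derivative along f is L_f w = ((dw/dx) f)^T + w (df/dx); with f = Ax and a constant differential dh = c only the drift term contributes, and the algorithm reproduces Kalman's observability rows c, cA.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])

def lie_deriv_covector(f, w, x):
    # Lie derivative of the row covector field w along f
    return (w.T.jacobian(x) * f).T + w * f.jacobian(x)

A = sp.Matrix([[0, 1], [0, 0]])
f = A * x
dh = sp.Matrix([[1, 0]])                 # differential of the output h(x) = x1

# Omega_0 = span{dh}; Omega_1 adds L_f dh (the L_g terms vanish here)
Omega = sp.Matrix.vstack(dh, lie_deriv_covector(f, dh, x))
print(Omega.rank())   # 2: the codistribution has full rank
```

Full rank of Ω_{n-1} means the maximal decomposition (3.3) has no indistinguishable directions (d = 0).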
Having in this way identified the "minimal" decomposition of the form (3.2) and the "maximal" decomposition of the form (3.3), it is possible to analyze in more detail the reachability properties of the subsystem corresponding to the first group of coordinates-in the case of (3.2)-and the observability properties of the subsystem corresponding to the second group of coordinates-in the case of (3.3). The following statements, which summarize results originally proven in [10], [13] and [14], characterize the properties in question.

Theorem 4.1. Let P denote the smallest distribution invariant under f, g_1, ..., g_m which contains g_1, ..., g_m. Suppose P and P + span{f} are both nonsingular. Let p denote the dimension of P. Then, for each x^0 ∈ V it is possible to find a neighborhood V^0 of x^0 and a coordinate transformation z = Φ(x) defined on V^0 with the following properties:
(a) the set R(x^0, T) of states reachable at time t = T, starting from x^0 at t = 0, along trajectories entirely contained in V^0 and under the action of piecewise constant input functions, is a subset of the slice

S_{x^0,T} = {x ∈ V^0 : φ_{p+1}(x) = φ_{p+1}(Φ_T^f(x^0)), ..., φ_n(x) = φ_n(Φ_T^f(x^0))}

where Φ_T^f(x^0) denotes the state reached at time t = T when u(t) = 0 for all t ∈ [0, T];
(b) the set R(x^0, T) has a nonempty interior in the topology of S_{x^0,T}.

Theorem 4.2. Let Q denote the annihilator of the smallest codistribution invariant under f, g_1, ..., g_m which contains dh_1, ..., dh_p. Suppose Q is nonsingular and let q denote its dimension. Then, for each x^0 ∈ V it is possible to find a neighborhood V^0 of x^0 and a coordinate transformation z = Φ(x) defined on V^0 with the following properties:
(a) Any two initial states x^a and x^b of V^0 such that

φ_i(x^a) = φ_i(x^b),  i = q + 1, ..., n

produce identical output functions under any input which keeps the state trajectories evolving on V^0.
(b) Any initial state x of V^0 which cannot be distinguished from x^0 under piecewise constant input functions belongs to the slice

S_{x^0} = {x ∈ V^0 : φ_{q+1}(x) = φ_{q+1}(x^0), ..., φ_n(x) = φ_n(x^0)}

5 Irreducible Realizations

As shown for the first time by Kalman for linear systems in [2], the "canonical structure theorem" is an important ingredient in the process of constructing irreducible realizations of an input-output map. Starting from the observation that a linear input-output map

y(t) = ∫_0^t W(t, τ) u(τ) dτ

is realizable by a finite dimensional linear system if and only if the kernel W(t, τ) is separable, i.e. there exist differentiable functions Q(t) and P(τ) such that

W(t, τ) = Q(t)P(τ)  for all t, τ,

one can in fact use the canonical structure theorem in order to extract from the "trivial" realization:

ẋ = P(t)u
y = Q(t)x

an irreducible one.
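The separability criterion is easy to exercise numerically. In the toy check below (our example, not from the text) the kernel W(t, τ) = e^{-(t-τ)} factors as Q(t)P(τ) with Q(t) = e^{-t} and P(τ) = e^{τ}, and the trivial realization reproduces the input-output integral for a step input:

```python
import numpy as np

# Separable kernel W(t, tau) = Q(t) P(tau), with Q(t) = exp(-t), P(tau) = exp(tau)
Q = lambda t: np.exp(-t)
P = lambda tau: np.exp(tau)

t = np.linspace(0.0, 2.0, 20001)
u = np.ones_like(t)                      # step input

# Trivial realization: xdot = P(t) u(t), x(0) = 0, y = Q(t) x
integrand = P(t) * u
x = np.concatenate(([0.0],
    np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))))
y_state = Q(t) * x

# Direct kernel integral y(t) = int_0^t W(t, tau) u(tau) dtau (closed form here)
y_direct = 1.0 - np.exp(-t)

print(np.max(np.abs(y_state - y_direct)))   # near zero: the two descriptions agree
```

This trivial realization is time-varying and one-dimensional here; in general it is reducible, and the canonical structure theorem is what extracts the irreducible part.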
A similar procedure can be followed also in the problem of constructing irreducible realizations of a nonlinear input-output map, starting from an appropriate version of the separability condition which also holds in this more general setting. For, recall (see [17] and [18]) that under suitable assumptions on the input function, the output response of a (single-input) nonlinear system of the form (3.1) can be expanded in the form of a Volterra series:

y(t) = w_0(t) + ∫_0^t w_1(t, τ_1) u(τ_1) dτ_1 + ...
     + ∫_0^t ∫_0^{τ_1} ... ∫_0^{τ_{k-1}} w_k(t, τ_1, ..., τ_k) u(τ_1) ... u(τ_k) dτ_1 ... dτ_k + ...

with kernels recursively defined by:

where:

and x^0 = x(0). A realization of a prescribed sequence of kernels w_0(t), w_1(t, τ_1), ..., w_k(t, τ_1, ..., τ_k), ... is a quadruplet {f, g, h, x^0} which satisfies the previous recursive relations.
It is immediate to see that a realization exists if and only if the sequence of kernels which characterize the expansion of the input-output map satisfies an appropriate separability condition. In fact, the following result can easily be proven.

Theorem (see [19]). A family of kernels w_0(t), w_1(t, τ_1), ..., w_k(t, τ_1, ..., τ_k), ... is realizable by a finite dimensional nonlinear system (3.1) if and only if there exist two functions P(t, x) and Q(t, x) such that:

w_0(t) = Q(t, x^0)

w_k(t, τ_1, ..., τ_k) = [ ∂/∂x [ ... [ ∂/∂x [ (∂Q(t,x)/∂x) P(τ_1, x) ] P(τ_2, x) ] ... ] P(τ_k, x) ]_{x = x^0}
As in the case of a linear system, from the two functions P(t, x) and Q(t, x) which separate the kernels one can define the "trivial" realization, which has the form:

ẋ = P(t, x)u(t),  x(0) = x^0
y = Q(t, x)

and from this an irreducible realization can be extracted by means of the canonical structure theorem.
References
[1] R.E. Kalman, Canonical structure of linear dynamical systems, Proc Nat Acad Sci U.S.A. 48, pp 596-600 (1962)
[2] R.E. Kalman, Mathematical description of linear dynamical systems, SIAM J Contr 1, pp 152-192 (1963)
[3] R.E. Kalman, L. Weiss, Contributions to linear system theory, Int J Engng Sci 3, pp 141-171 (1965)
[4] D.C. Youla, The synthesis of linear dynamical systems from prescribed weighting pattern, SIAM J Appl Math 14, pp 527-549 (1966)
[5] L.M. Silverman, H.E. Meadows, Controllability and observability of time-variable linear systems, SIAM J Contr 5, pp 64-73 (1967)
[6] L. Weiss, On the structure theory of linear differential systems, SIAM J Contr 6, pp 659-680 (1968)
[7] P. d'Alessandro, A. Isidori, A. Ruberti, A new approach to the theory of canonical decomposition of linear dynamical systems, SIAM J Contr 10, pp 148-158 (1972)
[8] R.W. Brockett, On the algebraic structure of bilinear systems, Theory and Applications of Variable Structure Systems, R. Mohler and A. Ruberti eds, Academic Press, pp 153-168 (1972)
[9] P. d'Alessandro, A. Isidori, A. Ruberti, Realization and structure theory of bilinear dynamical systems, SIAM J Contr 12, pp 517-535 (1974)
[10] H. Sussmann, V. Jurdjevic, Controllability of nonlinear systems, J Diff Eqs 12, pp 95-116 (1972)
[11] R.W. Brockett, System theory on group manifolds and coset spaces, SIAM J Contr 10, pp 265-284 (1972)
[12] H. Sussmann, Existence and uniqueness of minimal realizations of nonlinear systems, Math Syst Theory 10, pp 263-284 (1977)
[13] R. Hermann, A.J. Krener, Nonlinear controllability and observability, IEEE Trans Aut Contr AC-22, pp 728-740 (1977)
[14] A. Isidori, A.J. Krener, C. Gori Giorgi, S. Monaco, Nonlinear decoupling via feedback: a differential geometric approach, IEEE Trans Aut Contr AC-26, pp 331-345 (1981)
[15] M. Fliess, Réalisation locale des systèmes non linéaires, algèbres de Lie filtrées transitives et séries génératrices non commutatives, Invent Math 71, pp 521-537 (1983)
[16] A. Isidori, Nonlinear control systems, 2nd ed, Springer Verlag, pp 1-480 (1989)
[17] R.W. Brockett, Volterra series and geometric control theory, Automatica 12, pp 167-176 (1976)
[18] C. Lesjak, A.J. Krener, The existence and uniqueness of Volterra series for nonlinear systems, IEEE Trans Aut Contr AC-23, pp 1091-1095 (1978)
[19] A. Isidori, A. Ruberti, A separation property of realizable Volterra kernels, Syst Contr Lett 1, pp 309-311 (1982)

Some Remarks on the Control of Distributed Systems


J. L. Lions
College de France and C.N.E.S.
2, Place Maurice Quentin, F-75039 Paris, France

1 Introduction to HUM
Let us consider a Distributed Parameter System, i.e. a system with a state equation given by a P.D.E. (Partial Differential Equation) of evolution type. Let us write, in a formal fashion for the time being,

∂y/∂t + 𝒜(y) = ℬv    (1.1)

where y denotes the state, v denotes the control; in (1.1) 𝒜 is a P.D.O. (Partial Differential Operator) which is linear or non linear. The operator ℬ maps the space of controls into the space of "acceptable" right hand sides.
This way of writing things is of course formal.
In "real" situations, v can appear only on the boundary of the spatial domain Ω where (1.1) is considered, or it can appear inside Ω (boundary-resp. distributed-control in the first-resp. second-situation).
We confine ourselves here to deterministic P.D.E.'s. In (1.1), y can be a scalar function or a system y = {y_1, ..., y_N}.
In the above notations, the wave equation

∂²y/∂t² − Δy = ℬv    (1.2)

is written in a system form (i.e. y becomes in fact the couple {y, ∂y/∂t}).
Of course to (1.1) one should add boundary conditions and initial conditions.
We shall assume that the boundary conditions are implicitly taken care of in (1.1). But of course this has to be made precise in each specific situation.
The initial condition is

y(0) = y^0    (1.3)

where y^0 is given in a suitable function space and where y(0) denotes the function x → y(x, 0), if y(x, t) denotes "the" solution of (1.1), (1.3) subject to appropriate boundary conditions.

Remark 1.1. It may happen that existence of a solution is known without uniqueness being known. A classical situation where such a problem arises is the system of Navier-Stokes equations. It may also happen, in particular for stationary problems (where of course (1.3) becomes irrelevant), that (1.1) (without ∂y/∂t) admits an infinite number of solutions. □

In general we shall denote by

y(x, t; v) = y(v)

any solution of (1.1) and (1.3) (subject to appropriate boundary conditions). Adaptations of what follows to cases where y(v) is not unique are given (at least in some cases) in J.L. Lions [2], A.V. Fursikov [1] and [2].
Exactly as in the finite dimensional case, one considers next a cost function defined by

J(v) = φ(y(v), v)    (1.4)

where φ(y, v) is a function from Y × 𝒰 → ℝ, where Y (resp. 𝒰) denotes the space (resp. set) of states (resp. admissible controls).
One looks then for

inf J(v),    (1.5)

where y(v) can also be subject to further restrictions (the state constraints).

In the (very many) problems and still open questions which present themselves along these lines, R.E. Kalman's work has always been a "guiding line". What part of Kalman's theory can be extended to the infinite dimensional situations? □

The first question to consider is of course the Kalman filter. One can indeed obtain an infinite dimensional analogue of the Riccati equation of Kalman's filter, based on the general theory of non homogeneous boundary value problems (as studied in J.L. Lions and E. Magenes [1]) and on the use of L. Schwartz's kernel theorem (L. Schwartz [1]); cf. J.L. Lions [1] and [3]. □
Another question is to study the controllability problem.
For linear systems, a general method has been introduced in J.L. Lions [4] and [5]. Let us describe it in a formal fashion and in a very particular situation.
Let us consider the wave equation

∂²y/∂t² − Δy = vχ_𝒪 in Ω × (0, T)    (1.6)

where 𝒪 denotes an open subset of the bounded open set Ω of ℝⁿ and where χ_𝒪 denotes the characteristic function of 𝒪.
The boundary condition is chosen to be (just in order to fix ideas; what follows is completely general)

y = 0 on Σ = Γ × (0, T), where Γ = ∂Ω = boundary of Ω    (1.7)

7 Nonlinear and Distributed Systems-Remarks on Distributed Systems


The initial conditions are

y(0) = 0,  ∂y/∂t(0) = 0.    (1.8)

Let T > 0 be given. Let z^0, z^1 be two functions given in "appropriate" function spaces.
We shall say that the problem is exactly controllable if for any couple {z^0, z^1} there exists a function v (in a suitable function space) such that the corresponding solution y(v) satisfies

y(T; v) = z^0,  ∂y/∂t(T; v) = z^1    (1.9)
A few remarks are in order.
Remark 1.2. Of course the above problem is ambiguous. We have to make precise what are the function spaces which are considered. This is a highly non trivial question, as we shall indicate below. In order to "start" the machinery, we shall assume that

v spans L²(𝒪 × (0, T))    (1.10)

But of course this is by no means the only possible choice.

Remark 1.3. Due to the finite speed of propagation of, say, singularities, it is clear that finding v for "any" couple {z^0, z^1} is impossible unless T is "large enough", a condition which can be made precise, as in J.L. Lions [4] and [5]. □

Remark 1.4. In the linear case, having zero initial conditions as in (1.8) is not a restriction. □

Remark 1.5. Assume there is exact controllability. Let ε > 0 be chosen so small that there is still exact controllability between ε and T. Then let us choose v_1 an arbitrary L² function in 𝒪 × (0, ε). This choice will drive the system to the state

y(ε; v_1) = ξ^0,  ∂y/∂t(ε; v_1) = ξ^1

We choose next v_2 in L²(𝒪 × (ε, T)) such that the solution is driven from {ξ^0, ξ^1} to {z^0, z^1}. This is indeed possible according to Remark 1.4 and the exact controllability between ε and T. Then the control v given by

v = v_1 in 𝒪 × (0, ε),  v = v_2 in 𝒪 × (ε, T)    (1.11)

drives the system from {0, 0} at time 0 to {z^0, z^1} at time T. Therefore there are infinitely many controls v such that (1.9) holds true. It becomes then very natural

to look for

inf (1/2) ∫∫_{𝒪×(0,T)} v² dx dt    (1.12)

among all the v's (if they exist) such that (1.9) holds true (state constraints).

Let us proceed in a formal fashion, in order to explain some of the main ideas without going into technical details, which can be found in the already quoted papers.
We use duality theory, in the sense of Fenchel [1] and Rockafellar [1]; cf. also I. Ekeland and R. Temam [1].
Let us define the linear operator L by

Lv = {y(T; v), ∂y/∂t(T; v)}    (1.13)

We define next G(q) = G(q^0, q^1) by

G(q) = 0 if q^0 = z^0, q^1 = z^1;  G(q) = +∞ otherwise.    (1.14)

This is a "proper" convex function (on a suitable function space).


If we set

F(v) = (1/2) ∫∫_{𝒪×(0,T)} v² dx dt    (1.15)

defined on L²(𝒪 × (0, T)), problem (1.12) is equivalent to finding

inf [F(v) + G(Lv)] = J_0    (1.16)

Using duality theory (of course with the necessity to prove that we can indeed use it here!), one has

J_0 = −inf [F*(L*q) + G*(−q)]    (1.17)

In (1.17), H*(f) denotes in general the conjugate of H, defined by

H*(f) = sup_g [(f, g) − H(g)]

and L* denotes the adjoint of L. We easily find that

F*(v) = F(v)    (1.18)

and that

G*(q) = (q^0, z^0) + (q^1, z^1)    (1.19)

(where parentheses express duality between suitable function spaces).
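For the quadratic functional (1.15) the identity (1.18) is a one-line Fenchel computation; we spell it out as a check (this display is ours, not in the original):

```latex
F^{*}(f) = \sup_{v}\Big[(f,v) - \tfrac{1}{2}\,\|v\|^{2}_{L^{2}(\mathcal{O}\times(0,T))}\Big]
         = \tfrac{1}{2}\,\|f\|^{2}_{L^{2}(\mathcal{O}\times(0,T))} = F(f),
```

the supremum being attained at v = f, which gives (1.18).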


We verify next that L* is given as follows. Let us introduce φ as the solution of

∂²φ/∂t² − Δφ = 0,
φ(x, T) = φ^0,  ∂φ/∂t(x, T) = φ^1 in Ω,    (1.20)
φ = 0 on Σ = Γ × (0, T)

Then, multiplying (1.20) by y = y(v) and integrating by parts, one finds that

(Lv, {−φ^1, φ^0}) = ∫∫_{𝒪×(0,T)} φ v dx dt    (1.21)

Therefore L*q = φχ_𝒪 if q^0 = −φ^1, q^1 = φ^0.


Therefore the dual problem appearing on the right hand side of (1.17) becomes

−inf_{φ^0, φ^1} J(φ^0, φ^1)    (1.22)

where

J(φ^0, φ^1) = (1/2) ∫∫_{𝒪×(0,T)} φ² dx dt + (φ^1, z^0) − (φ^0, z^1)    (1.23)

with φ = solution of (1.20).
We now observe that

( ∫∫_{𝒪×(0,T)} φ² dx dt )^{1/2}    (1.24)

defines a norm on the space of (smooth) functions {φ^0, φ^1}, by virtue of Holmgren's uniqueness theorem (cf. L. Hörmander [1], Th. 5.3.3), provided T is large enough.
Let us set

‖{φ^0, φ^1}‖_F = ( ∫∫_{𝒪×(0,T)} φ² dx dt )^{1/2}    (1.25)

We define a (possibly) new Hilbert space by taking the completion of smooth functions {φ^0, φ^1} for the norm (1.25). Let us denote by F this Hilbert space.
Then the solution of inf J(φ^0, φ^1) is straightforward. It admits a unique solution if {z^1, −z^0} ∈ F′ = dual space of F.
This is the basis of the (general) HUM (Hilbert Uniqueness Method) introduced in the already quoted papers.
Further remarks are now in order.

Remark 1.6. Of course in order to obtain more precise results, one has to possibly characterize the space F (and in any case to obtain inclusion relations with "classical" spaces). We refer here to J.L. Lions [4] and [5] and to many other works, in particular by A. Haraux, V. Komornik, and E. Zuazua.


The space F can depend on 𝒪. Formally F = F(𝒪) becomes "very large" as 𝒪 becomes "very small", so that F′ = F′(𝒪) gets smaller as 𝒪 decreases. This is coherent with common sense: the smaller the set 𝒪, the smaller is the set of z^0, z^1 that we can reach using v ∈ L²(𝒪 × (0, T)). This remark leads to a natural question: one observes first that if 𝒪 = Ω (and if T is large enough), then F = H¹₀(Ω) × L²(Ω), where H¹₀(Ω) denotes the Sobolev space of functions φ ∈ L²(Ω) such that ∂φ/∂x_i ∈ L²(Ω), 1 ≤ i ≤ n, and such that φ = 0 on Γ = ∂Ω. What are the conditions on 𝒪 which imply that

F(𝒪) = H¹₀(Ω) × L²(Ω)?

These necessary and sufficient conditions have been found by C. Bardos, G. Lebeau and J. Rauch [1]. □
Remark 1.7. The above method is an extension of the well known duality between controllability and observability in finite dimensional theory.
Further analogies between HUM and "finite dimensional methods" have been indicated to the author by S. Mitter [1] and by D. Russell [1]. One should also consult the survey D. Russell [2]. □
Remark 1.8. The above method leads in a natural way to numerical algorithms. We refer to R. Glowinski, C. Li and J.L. Lions [1]. □
Remark 1.9. The "Optimality System", i.e. the system of P.D.E.'s giving the necessary and sufficient conditions for {φ^0, φ^1} to solve (1.22), is given as follows: one solves (1.20) first; one introduces next y as the solution of

∂²y/∂t² − Δy = φχ_𝒪 in Ω × (0, T)
y = 0 on Σ    (1.26)
y(x, 0) = 0,  ∂y/∂t(x, 0) = 0 in Ω,

and one defines Λ by

Λ{φ^0, φ^1} = {∂y/∂t(x, T), −y(x, T)}    (1.27)

Then {φ^0, φ^1} is the solution of

Λ{φ^0, φ^1} = {z^1, −z^0}    (1.28)

Remark 1.10. HUM is general for linear P.D.O.'s. We have introduced (formally) the method in an hyperbolic situation. But the method applies as well to well set problems in the sense of Petrowski. Cf. [4] and [5] of the author. In these cases the speed of propagation of, say, singularities, is infinite, so that no condition of the type "T large enough" is needed.


HUM also applies for Schroedinger equations, with various techniques to obtain the estimates; cf. G. Lebeau [1], E. Machtyngier [1], and a general approach based on semigroup theory, in A. Bensoussan [1]. □

Remark 1.11. The situation is slightly different for parabolic systems (i.e. time irreversible systems). We shall not enter into this topic here, referring to J.L. Lions [5] Vol. 2 and [6]. □

Remark 1.12. There are several applications which also motivate the above considerations. The main one is related to the control and stabilization of flexible structures. Cf. [6] loc. cit. for other questions. Of course all this directly leads to non linear P.D.E.'s. A few remarks concerning these cases are presented in the next section.

2 Non Linear Systems

The above method HUM has been introduced using linear tools. Let us assume now that the state equation (1.6), with boundary condition (1.7) and initial conditions (1.8), is replaced by the non linear P.D.E.

∂²y/∂t² − Δy + yg(y) = vχ_𝒪    (2.1)

conditions (1.7) and (1.8) being unchanged.


The following general method has been used by E. Zuazua [1J and [2]. One
starts with an arbitrary function ~ and one considers the linear problem:

a2 y

""t2 -

Ay + yg(~) = VXI!!'

(2.2)

with (1.7) and (1.8) unchanged.


Using HUM one defines
v=

v(~)

such that

y(T; v) = zo,

ay
at (T; v) = Zl

(2.3)

and

11 V IIp(1!! x (O,T)) = min

(2.4)

This uniquely defines y = y(x, t; v(m = M(~). One will have obtained a
control v driving (2.1) from state {O,O} to state {ZO,Zl} at time T if onefinds a
fixed point of M
M(~) =~.
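The loop "freeze the nonlinearity, steer the frozen linear system by the minimum-norm control, resolve, repeat" can be made concrete in a toy scalar analogue (our construction, not from the text): dynamics y' = a(y)y + v, with an assumed mild nonlinearity a, discretized so that each frozen problem is steered exactly.

```python
import numpy as np

# Toy scalar analogue of the iteration M(xi) = xi: steer y' = a(y)*y + v
# from 0 to z at time T. Freezing y = xi makes the problem linear, and we
# use the discrete minimum-norm steering control of the frozen system.
T, N, z = 1.0, 2000, 1.0
dt = T / N
a = lambda w: -0.3 * np.tanh(w)          # mild nonlinearity (assumed for the demo)

def M(xi):
    c = 1.0 + dt * a(xi)                 # one-step factors of the frozen system
    S = np.append(np.cumprod(c[::-1])[::-1], 1.0)   # S[k] = prod_{j >= k} c[j]
    Phi = S[1:]                          # discrete transition factors to time T
    v = (z / (dt * np.sum(Phi ** 2))) * Phi         # minimum-norm control, cf. (2.4)
    y = np.zeros(N + 1)
    for k in range(N):                   # solve the frozen linear problem, cf. (2.2)
        y[k + 1] = c[k] * y[k] + dt * v[k]
    return y[:-1], y[-1]

xi = np.zeros(N)
for _ in range(100):
    xi_new, yT = M(xi)
    if np.max(np.abs(xi_new - xi)) < 1e-12:
        break
    xi = xi_new

print(yT)   # endpoint condition, cf. (2.3): yT equals z
```

At the fixed point the frozen coefficient a(ξ) coincides with a(y), so the computed control steers the nonlinear dynamics as well; in the P.D.E. setting this is precisely where the Leray-Schauder machinery quoted below is needed.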


Using various fixed point theorems based on classical Leray-Schauder theory, E. Zuazua (loc. cit.) has obtained sufficient conditions on g and on 𝒪 such that there exists v such that (2.3) holds true; cf. also Y. Jinhai [1]. □
Remark 2.1. Conditions on 𝒪 should (?) be such that the space F = F(𝒪) introduced in Sect. 1 for the linear case equals H¹₀(Ω) × L²(Ω). Otherwise one introduces spaces F which are so large that solving non linear problems with initial data in F looks hopeless. □
Remark 2.2. Of course in non linear situations, one does restrict the generality by assuming that y(0) = 0, (∂y/∂t)(0) = 0. One should consider the general situation where

y(0) = y^0,  ∂y/∂t(0) = y^1    (2.5)

y^0, y^1 given (not necessarily 0).
Actually it has been proven in Ball, Marsden, and Slemrod [1] that the answer may depend on the properties of y^0 and y^1 (in bilinear situations). □
Remark 2.3. It is useful to have in mind the existence of counter examples in quite simple non linear situations. Let us consider the non linear P.D.E.

∂²y/∂t² − Δy + α |∂y/∂t|^{p−2} ∂y/∂t = vχ_𝒪    (2.6)

with conditions (1.7) and (1.8) unchanged. In (2.6) we assume that

α > 0    (2.7)

and

p > 2    (2.8)

Existence and uniqueness of a solution of (2.6), (1.7), and (1.8) is well known.
The counter example runs as follows. One introduces a function ψ = ψ(x) such that

ψ ≥ 0 in Ω, ψ = 0 in 𝒪 and ψ = 0 on Γ = ∂Ω, and such that

∫_Ω ψ^{−(1/2 + 1/p)q} |∇ψ|^q dx < ∞, where 1/p + 1/q = 1/2    (2.9)

(Such functions ψ exist.)


Let T> 0 be arbitrarily given. Then there exists a constant c = c(l>:, T) such
that

a T)
S t/J ( -(x,
{l

8t

)2 dx ~ C

VVEL2 ((!)

X(0, T))

(2.10)

Of course (2.10) implies that the set spanned by {y(T), (∂y/∂t)(T)} is not dense in L²(Ω) × L²(Ω) as v spans L²(𝒪 × (0, T)).
For the proof of (2.10) one multiplies (2.6) by ψ(∂y/∂t). Suitable integrations by parts and Gronwall's lemma lead to (2.10).

Remark 2.4. For parabolic problems, estimates of the type

∫_Ω ψ y(x, T)² dx ≤ c

(analogous to (2.10)) have been noticed by A. Bamberger [1].

The conclusion imposes itself: many open questions remain for the controllability of non linear distributed systems. We refer also to the papers [6] and [7] of the Author.

Note added in proof. Many refined results going much beyond the content of Remark 2.3 have been obtained by I. Diaz. A complete report of these results will be given in the El Escorial Seminar, August 1991.


References
J.M. Ball, J.E. Marsden and M. Slemrod [1] Controllability for distributed bilinear systems. SIAM J Control and Optimization, Vol 20, No 4, 1982, p 575-597
C. Bardos, G. Lebeau and J. Rauch [1] Appendix of J.L. Lions [5], Vol 1
A. Bensoussan [1] Controllability. In a book to appear
I. Ekeland and R. Temam [1] Analyse convexe et problèmes variationnels. Dunod, Gauthier-Villars, 1974
W. Fenchel [1] On conjugate convex functions. Canad J Math 1 (1949), p 73-77
A.V. Fursikov [1] Control problems... for the 3-dimensional Navier-Stokes and Euler equations. Mat USSR Sbornik 43 (1982), 9, p 251-273. [2] Lagrange principle for problems of optimal control of ill-posed or singular distributed systems. JMPA, to appear
R. Glowinski, C. Li and J.L. Lions [1] Numerical approximation of exact controllability of the wave equation. Boundary control of Dirichlet type. Jap J of Applied Math, 1990
L. Hörmander [1] Linear partial differential operators. Springer Verlag, 1976
Y. Jinhai [1] Thesis. Fudan University, May 1991
R.E. Kalman [1] Contributions to the theory of optimal control. Bol Soc Mat Mexicana (1960), p 102-119. [2] On the general theory of control systems. Proc IFAC Moscow 1960. Butterworths, 1961
R.E. Kalman and R.S. Bucy [1] New results in linear filtering and prediction theory. J of Basic Engineering, 1961, p 95-107
G. Lebeau [1] Contrôle de l'équation de Schrödinger. JMPA, to appear
J. Lagnese and J.L. Lions [1] Modelling, Analysis and Control of Thin Plates. R.M.A. Masson, 1988
J.L. Lions [1] Contrôle optimal de systèmes gouvernés par des équations aux dérivées partielles. Dunod, Paris, 1968. [2] Contrôle des systèmes distribués singuliers. Gauthier-Villars, Collection MMI t.13 (1983). [3] Some aspects of the optimal control of distributed parameter systems. SIAM Pub Vol 6, 1972. [4] Exact controllability, stabilization and perturbations for distributed systems. SIAM Rev, 1988. [5] Contrôlabilité exacte, perturbations et stabilisation des systèmes distribués. t.1 Contrôlabilité exacte, t.2 Perturbations. Collection R.M.A. Masson, 1988. [6] Exact controllability for distributed systems. Some trends and some problems. Venice 1, Symposium on Applied and Industrial Mathematics, 1989. [7] Trends and problems for controllability of non linear distributed systems. IEEE CDC, 1990
J.L. Lions et E. Magenes [1] Problèmes aux limites non homogènes et applications. Vol 1 et 2, Dunod, Paris, 1968
E. Machtyngier [1] Contrôlabilité exacte et stabilisation frontière de l'équation de Schrödinger. CRAS, Paris, 1990
S. Mitter [1] Private communication
R.T. Rockafellar [1] Duality and stability in extremum problems involving convex functionals. Pac J Math 21 (1967), p 167-187
D. Russell [1] Private communication. [2] Controllability and stabilization theory for linear partial differential equations. Recent progress and open questions. SIAM Rev 20 (1978), p 639-739
L. Schwartz [1] Théorie des noyaux. Proc Int Congress of Mathematicians (1950), 1, p 220-230
E. Zuazua [1] Contrôlabilité exacte des systèmes d'évolution non linéaires. CRAS, Paris (306), 1988, p 129-132. [2] Exact controllability for semilinear evolution equations. JMPA (1991)

Chapter 8

Influence in Mathematics

The State Space Method in the Study of Interpolation by Rational Matrix Functions

J.A. Ball¹, I. Gohberg² and L. Rodman³
1 Department of Mathematics, Virginia Polytechnic Institute and State University,
Blacksburg, Virginia 24061, USA
2 School of Mathematical Sciences, The Raymond and Beverly Sackler Faculty of
Exact Sciences, Tel-Aviv University, Tel-Aviv, Ramat Aviv 69978, Israel
3 Department of Mathematics, College of William and Mary, Williamsburg,
Virginia 23187-8795, USA

In 1960 Kalman introduced the state space method as a systematic tool for the understanding of the structure of a linear dynamical system. A model for a linear dynamical system is a system of vector differential equations

ẋ = Ax + Bu,  x(0) = x_0
y = Cx + Du

When one assumes that the state vector x(t) is subject to zero initial condition (x_0 = 0), then after Laplace transformation the relation between the input function u(s) and the output function y(s) is given by

y(s) = W(s)u(s)

where

W(s) = D + C(sI − A)^{−1}B    (1)

is the transfer function of the system. The converse problem is to realize a given proper rational matrix function as the transfer function of a linear system, i.e. given W(s) find matrices A, B, C, D so that (1) holds. Using the notions of controllability, observability and minimal realization, Kalman clarified the precise connection between an input-output or classical frequency domain description of a linear dynamical system on the one hand and a state space model on the other. Kalman also introduced a primitive axiomatic framework for linear dynamical systems and pursued applications of the state space method to specific engineering problems, such as the linear quadratic regulator problem and the Kalman filter.
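Formula (1) is easy to exercise symbolically; the sketch below computes the transfer function of a two-state system with sympy (the numerical data A, B, C, D are our illustrative choice, not from the text):

```python
import sympy as sp

# Transfer function (1): W(s) = D + C (sI - A)^{-1} B
s = sp.symbols('s')
A = sp.Matrix([[0, 1], [-2, -3]])
B = sp.Matrix([[0], [1]])
C = sp.Matrix([[1, 0]])
D = sp.Matrix([[0]])

W = sp.simplify(D + C * (s * sp.eye(2) - A).inv() * B)
print(W[0, 0])   # the proper rational function 1/(s**2 + 3*s + 2)
```

The converse direction, passing from a given W(s) back to a (minimal) quadruple A, B, C, D, is exactly the realization problem discussed in the text.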
Some of these same ideas appeared in a different form earlier in the operator model theory of Livsic and Brodskii [7] and also concurrently but independently in the model theory of Sz.-Nagy and Foias [13] and de Branges and Rovnyak [6]. The characteristic function attached to an operator on a Hilbert space in this theory is in fact the transfer function for a system having extra symmetries. For a recent overview of the connections between system theory and operator model theory, see [2]. Starting in the middle 1970's, researchers such as Baras, Brockett, Dewilde, Fuhrmann, and Helton analyzed the connections between operator model theory (in particular the Beurling-Lax theorem and shift invariant subspaces in H²) and system theory. Some of this work led in turn to the polynomial module approach to systems theory.

¹ Partially supported by the NSF Grant DMS-8701615-02.
³ Partially supported by the NSF Grant DMS-8802836.
It has now become c1ear that the state space method introduced by KaIman
is relevant to the understanding of rational matrix functions from the purely
mathematieal point of view as a generalization of c1assieal function theory.
Issues to be understood (also basie to a multitude of applications) are matricial
analogues of all sorts of factorization and interpolation problems. The factorization problems inc1ude: polynomial, minimal (no pole-zero cancellations),
spectral and Wiener-Hopf (splitting poles and zeros of t~y factors to specific
regions of the complex plane), inner-outer and coprime. Inteq;clation problems
inc1ude: finding a polynomial with given zeros, finding a rational function with
given poles and zeros, finding a rational inner (all-pass) function with given
zeros in the right half plane, Lagrange interpolation for polynomials, Pade and
Cauchy interpolation for rational functions, and the c1assieal interpolation
problems of Schur, Nevanlinna-Piek, Caratheodory-Toeplitz, Schur-Takagi
and Nehari. (These last topies are particularly relevant to the active area of H OO
control which has developed in the past decade.) Work up to 1970 on rational
matrix functions generally dealt directly with polynomial coefficients of matrix
entries, expressed many ideas in terms of determinants and minors, developed
solution algorithms using row and column operations, or took the linear
dynamieal system as the main object of study rather than the rational matrix
function itself. Examples of this flavor of work are the spectral factorization
algorithm of Youla [17] and the well-known book of Rosenbrock [15].
Somewhat later there began a more systematic exploitation of state space methods in the study of rational matrix functions. The relevance of the state space method to understanding rational matrix functions lies in the idea of using four matrices A, B, C, D to express a rational matrix function W(z) in the realized form (1) as the transfer function of a linear system. The next step is to formulate the problem under study in terms of the matrices A, B, C, D and then to reduce the solution of the problem to standard linear algebra manipulations on finite matrices. This has the additional payoff that one is able to get explicit formulas for the final solution rather than merely existence theorems. An early state space algorithm for spectral factorization was obtained by Anderson [1]. Later Bart, Gohberg and Kaashoek [5] applied a state space approach for general minimal factorizations to spectral factorization. A major more recent breakthrough was Glover's state space solution (including a linear fractional parametrization of the set of all solutions) of the Nehari problem [8]. Here we discuss a particular line of development pursued in recent years by us together with a number of our colleagues. Inspiration for the work comes both from system theory and from operator model theory; a good broad sample of the work can be found in the books [3, 5, 9, 10, 11, 12]. In this approach, zero and pole data (including
geometric directional information) are encoded in a pair of matrices (C, A) or (A, B) which forms a piece of a realization of the associated function (see [11, 3]).

The connection between factorization and invariant subspaces, a central theme in operator model theory, is given a more affine form to classify all minimal factorizations of a rational matrix function (see [5, 12]). In more recent work [3], intricate bitangential interpolation conditions are encoded in a collection of seven matrices from which one eventually is able to produce a realization of the desired solution. In the end, fairly complete state space solutions of nontrivial matricial versions of all the factorization and interpolation problems listed above have now been obtained. In this work the fundamental notions of controllability and observability introduced by Kalman occur in all sorts of purely mathematical contexts. Recently, some preliminary extensions to infinite-dimensional settings have been obtained (see [14]).
In this short note we illustrate the state space method for the simplest matrix version of boundary Nevanlinna-Pick interpolation. The classical boundary Nevanlinna-Pick (sometimes also called Loewner) interpolation problem is as follows. We are given m points α₁, …, α_m on the imaginary line and m complex numbers w₁, …, w_m, and seek a rational function f which is analytic on the right half plane Π₊ with

sup { |f(z)| : z ∈ Π₊ } ≤ 1   (2)

such that

f(α_i) = w_i  for  i = 1, …, m.   (3)

If any w_i has |w_i| > 1, then (2) and (3) are incompatible and the problem is not solvable. On the other hand, it can be shown that if |w_i| ≤ 1 for all i, then the problem always has a solution (see Chap. 21 in [3]). If |w_i| < 1 for all i, one can always solve the problem even when additional derivative constraints are imposed on f at α_i. The simplest boundary Nevanlinna-Pick interpolation problem is to assume also that |w_i| = 1 for i = 1, …, m and to impose interpolation constraints on the derivative of f as well as at the points α_i:

f′(α_i) = γ_i,  i = 1, …, m.   (4)

It turns out that for any f satisfying (2) and (3), necessarily f′(α_i) has the form f′(α_i) = -w_i ρ_i where ρ_i ≥ 0. This fact has to do with classical properties of angular derivatives for Schur-class functions; we refer to [16] for a recent elegant Hilbert space derivation of the basic properties of angular derivatives for general (nonrational) Schur-class functions and to [3] for a more elementary account of the rational case.
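Although the boundary problem needs derivative data, the interior version of the problem (nodes z_i ∈ Π₊, |w_i| < 1) has a classical matrix test: a solution exists if and only if the associated Pick matrix is positive semidefinite. The following sketch is our own numerical illustration of that criterion (the function names are not from the text), using the right-half-plane kernel (1 - w_i w̄_j)/(z_i + z̄_j):

```python
import numpy as np

def pick_matrix(z, w):
    """Pick matrix for the right half plane: P[i, j] = (1 - w_i conj(w_j)) / (z_i + conj(z_j))."""
    z = np.asarray(z, dtype=complex)
    w = np.asarray(w, dtype=complex)
    return (1 - np.outer(w, w.conj())) / (z[:, None] + z.conj()[None, :])

def np_solvable(z, w, tol=1e-10):
    """Interior Nevanlinna-Pick data is solvable iff the Pick matrix is positive semidefinite."""
    P = pick_matrix(z, w)
    return bool(np.min(np.linalg.eigvalsh((P + P.conj().T) / 2)) >= -tol)

print(np_solvable([1.0], [0.5]))             # one condition, |w| < 1: always solvable -> True
print(np_solvable([1.0, 2.0], [0.9, -0.9]))  # values too far apart for nearby nodes -> False
```

The block matrix T of Theorem 1 below plays exactly this role for the combined interior and boundary data.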
We now turn to the simplest matrix analogue of the interpolation problem (2)-(4) (where γ_i = -w_i ρ_i with ρ_i ≥ 0). We also permit interpolation conditions in the interior of Π₊ as in the usual Nevanlinna-Pick problem. We are given n points z₁, …, z_n in Π₊, n nonzero 1 × N row vectors x₁, …, x_n, n 1 × N row vectors y₁, …, y_n, together with m points α₁, …, α_m on the imaginary line, 2m N × 1 column vectors ξ₁, …, ξ_m and η₁, …, η_m, all of unit length, and m positive real numbers ρ₁, …, ρ_m. The problem is then to find all rational N × N matrix functions F(z) for which

sup { ‖F(z)‖ : z ∈ Π₊ } ≤ 1,   (5)

x_i F(z_i) = y_i  for  i = 1, …, n,   (6)

ξ_j* F(α_j) = η_j*  for  j = 1, …, m,   (7)

-ξ_j* F′(α_j) η_j = ρ_j  for  j = 1, …, m.   (8)

The solution to the boundary Nevanlinna-Pick interpolation problem is as follows. Note that the function Θ(z) which generates a linear fractional description for the set of all solutions is given explicitly in terms of the data in state space form.

Theorem 1. Given the interpolation conditions (5)-(8), introduce the matrices

A_ζ = diag[z₁, …, z_n],   A_π = -A_ζ* = diag[-z̄₁, …, -z̄_n],

C_π = [ -x₁* ⋯ -x_n* ; -y₁* ⋯ -y_n* ]  (a 2N × n matrix whose columns are built from the interior data),

and the matrices Λ = [Λ_kl] and X = [X_ij] with entries

Λ_kl = (ξ_k*ξ_l - η_k*η_l)/(ᾱ_k + α_l)  if k ≠ l,   Λ_kk = ρ_k,

X_ij = (x_i ξ_j - y_i η_j)/(z_i - α_j),  1 ≤ i ≤ n, 1 ≤ j ≤ m,

and finally the associated (n+m) × (n+m) Pick matrix T of the problem (for its explicit block formula, together with those of the matrices A₀, B_π, B₀, C₀ appearing below, see Chap. 21 of [3]). Assume that T is invertible and define a rational 2N × 2N matrix function

Θ(z) = [ Θ₁₁(z)  Θ₁₂(z) ; Θ₂₁(z)  Θ₂₂(z) ]

by

Θ(z) = I + C̃(zI - Ã)⁻¹ T⁻¹ B̃,   (9)

where

Ã = [ A_π  0 ; 0  A₀ ],   B̃ = [ B_π ; B₀ ],   C̃ = [ C_π  C₀ ].

Then there exist rational N × N matrix functions F(z) satisfying the interpolation conditions (5)-(8) if and only if T is positive definite. In this case the set of all such functions F(z) is given by

F(z) = [Θ₁₁(z)G(z) + Θ₁₂(z)] [Θ₂₁(z)G(z) + Θ₂₂(z)]⁻¹,

where G is an arbitrary rational matrix function analytic on Π₊ for which

(i) sup { ‖G(z)‖ : z ∈ Π₊ } ≤ 1, and
(ii) Θ₂₁G + Θ₂₂ has a simple pole at α_j for j = 1, …, m.

The idea behind the proof of Theorem 1 is as follows. One can show by a direct winding number argument that a J-lossless rational matrix function Θ(z), i.e. Θ(z) such that, for J = I_N ⊕ (-I_N),

Θ(-z̄)* J Θ(z) = J   if ℜz = 0,   (10)

and

Θ(-z̄)* J Θ(z) ≤ J   if ℜz > 0,   (11)

and which has the appropriate pole and zero structure at the interpolation nodes z₁, …, z_n, α₁, …, α_m, provides a linear fractional map which gives a description of the set of all solutions. To find a rational matrix function Θ having these properties, one assumes that Θ(z) is in the realized form Θ(z) = I + C(zI - A)⁻¹B of the transfer function of a linear system and derives the corresponding conditions on the matrices A, B and C, which in turn are solvable.
The method generalizes easily to the more complicated situation where the interpolation conditions involve higher order derivatives of the unknown function F. In this case the interpolation conditions themselves are expressed more conveniently in terms of a collection of matrices reminiscent of state space ideas. Complete details on this general case can be found in Chap. 21 of the monograph [3]. A direct elementary proof of this special case will appear in [4].

References
[1] B.D.O. Anderson, An algebraic solution to the spectral factorization problem, IEEE Trans Auto Control AC-12 (1967), 410-414
[2] J.A. Ball and N. Cohen, De Branges-Rovnyak operator models and systems theory: a survey, in Proc Workshop on Matrix and Operator Theory, Rotterdam (June 1989), to appear
[3] J.A. Ball, I. Gohberg and L. Rodman, Interpolation of Rational Matrix Functions, Birkhäuser-Verlag OT45 (Basel), 1990
[4] J.A. Ball, I. Gohberg and L. Rodman, Boundary Nevanlinna-Pick interpolation for rational matrix functions, J Math Systems, Estimation and Control, to appear
[5] H. Bart, I. Gohberg and M.A. Kaashoek, Minimal Factorization of Matrix and Operator Functions, Birkhäuser-Verlag (Basel), 1979
[6] L. de Branges and J. Rovnyak, Appendix on square summable power series, Canonical models in quantum scattering theory, in Perturbation Theory and its Applications in Quantum Mechanics (ed. C.H. Wilcox), Wiley (New York), 1966
[7] M.S. Brodskii, Triangular and Jordan Representations of Linear Operators, Transl Math Monographs Vol 32, Amer Math Soc (Providence), 1971
[8] K. Glover, All optimal Hankel-norm approximations of linear multivariable systems and their L∞-error bounds, Int J Control 39 (1984), 1115-1193
[9] I. Gohberg (ed.), Topics in Interpolation Theory of Rational Matrix-valued Functions, Birkhäuser-Verlag OT33 (Basel), 1988
[10] I. Gohberg and M.A. Kaashoek (eds.), Constructive Methods of Wiener-Hopf Factorization, Birkhäuser-Verlag OT21 (Basel), 1986
[11] I. Gohberg, P. Lancaster and L. Rodman, Matrix Polynomials, Academic Press (New York), 1982
[12] I. Gohberg, P. Lancaster and L. Rodman, Invariant Subspaces of Matrices with Applications, J. Wiley and Sons (New York), 1986
[13] B. Sz.-Nagy and C. Foias, Harmonic Analysis of Operators on Hilbert Space, American Elsevier (New York), 1970
[14] L. Rodman, An Introduction to Operator Polynomials, Birkhäuser-Verlag OT38 (Basel), 1989
[15] H.H. Rosenbrock, State Space and Multivariable Theory, John Wiley (New York), 1970
[16] D. Sarason, Angular derivatives via Hilbert space, Complex Variables: Theory and Applications 10 (1988), 1-10
[17] D.C. Youla, On the factorization of rational matrices, IRE Trans on Information Theory (1961), 172-189

The State Space Method for Solving Singular Integral Equations

I. Gohberg¹ and M. A. Kaashoek²

¹ School of Mathematical Sciences, The Raymond and Beverly Sackler Faculty of Exact Sciences, Tel-Aviv University, Tel-Aviv, Ramat Aviv 69978, Israel
² Faculteit Wiskunde en Informatica, Vrije Universiteit, NL-1007 MC Amsterdam, The Netherlands

It is a pleasure to dedicate this paper to R.E. Kalman, who played an outstanding role in inventing and developing the state space theory in system theory and control. The first named author remembers with gratitude how R.E. Kalman introduced him to this method during his visit to the ETH Zürich in the beginning of 1978.

In this paper explicit formulas are given for the solutions of singular integral
equations with a rational symbol. One of the main ideas is to use the state
space method from linear systems theory to reduce the problem to a problem
for input/output systems. Also, the Fredholm characteristics are described and
a review is given of the factorization method.

Introduction

The state space method is based on the fact that a proper rational matrix function W(λ) can be written in the form

W(λ) = D + C(λ - A)⁻¹B,   (1)

where A is a square matrix whose order may be much larger than the size of W(λ), and B, C, and D are matrices of appropriate sizes. The representation (1) allows one to reduce problems about rational matrix functions in a successful way to problems about constant matrices, and it often suggests explicit and easily computable formulas for the solutions. In the last decade the state space approach has also proved to be effective in solving various problems of mathematical analysis (see [3] and the references therein).

In the present paper we apply the state space method to solve singular integral equations. Such equations serve as a tool to solve problems in many fields of application (statistics, mathematical physics, mechanics, etc.). For the general theory and examples of applications we refer to the books [8], [10], [11], and [14]. The equations we deal with will have a rational matrix symbol and are of the form:

a(λ)φ(λ) + b(λ) · (1/πi) ∫_Γ φ(μ)/(μ - λ) dμ = f(λ),  λ ∈ Γ.   (2)


Here the contour Γ consists of a finite number of disjoint smooth simple Jordan curves, a(·) and b(·) are given m × m rational matrix functions, which have no poles on Γ, and f is a given function from L²_m(Γ), the space of all ℂᵐ-valued functions that are square integrable on Γ. The matrix function W(λ),

W(λ) := [a(λ) - b(λ)]⁻¹[a(λ) + b(λ)],

which plays a decisive role in the analysis of (2), is, in general, not proper, and therefore we use in this paper a modification of the representation (1), namely we write

W(λ) = I + C(λG - A)⁻¹B.   (3)

Here A, B, and C are as in (1), G is a square matrix of the same order as A, and I stands for the m × m identity matrix. In terms of the matrices A, G, B, and C we shall give necessary and sufficient conditions for the inversion of the equation (2) and an explicit formula for its solution. Also the Fredholm characteristics of equation (2) will be described in terms of these four matrices.

Section 1 is of a preliminary character. Here we list a number of facts about the representation (3) and we review the spectral separation theorem for a pencil λG - A. Section 2 contains the main theorems. The proofs are given in Sect. 3. In Sect. 4 the state space method is also used to derive an explicit presentation of the factorization approach. This last section may be viewed as a continuation of the theory developed in [2], [6], which concerns Wiener-Hopf integral operators, infinite block Toeplitz matrices and singular integral operators with proper rational symbols.

1 Preliminaries

Throughout this paper Γ is a contour of the type appearing in (2). Thus Γ consists of a finite number of disjoint smooth simple Jordan curves. The inner domain of Γ will be denoted by Δ₊ and its outer domain by Δ₋. In what follows we assume that ∞ ∈ Δ₋.

Whenever convenient we identify a p × q matrix with the linear transformation from ℂ^q into ℂ^p defined by the canonical action of the matrix relative to the standard bases in ℂ^q and ℂ^p. The symbol I denotes an identity operator or a square identity matrix.

1a. Matrix pencils. Let A and G be n × n complex matrices. The expression λG - A, where λ is a complex parameter, is called a (linear matrix) pencil. We say that the pencil λG - A is Γ-regular if det(ζG - A) ≠ 0 for each ζ on the contour Γ. In that case one can define the following matrices:

P = (1/2πi) ∫_Γ G(ζG - A)⁻¹ dζ,   Q = (1/2πi) ∫_Γ (ζG - A)⁻¹ G dζ.   (1.1)

We shall need the following spectral decomposition result. For its proof we refer to [12], see also [6], Sect. 2.

Proposition 1.1. Let λG - A be Γ-regular, and let the matrices P and Q be defined by (1.1). Then P and Q are projections which have the following properties:

(1) PG = GQ and PA = AQ;
(2) (λG - A)⁻¹P = Q(λG - A)⁻¹ on Γ, and this function has an analytic continuation on Δ₋ which vanishes at ∞;
(3) (λG - A)⁻¹(I - P) = (I - Q)(λG - A)⁻¹ on Γ, and this function has an analytic continuation on Δ₊.
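The projections of (1.1) are easy to approximate numerically. The sketch below is our own illustration for Γ the unit circle, using the trapezoidal rule on the parametrized integral (ζ = e^{iθ}, dζ = iζ dθ); taking G = I reduces P to the classical Riesz projection of A onto its eigenvalues inside Γ:

```python
import numpy as np

def pencil_projections(G, A, num=400):
    """Approximate P = (1/2πi)∮ G(ζG - A)^(-1) dζ and Q = (1/2πi)∮ (ζG - A)^(-1) G dζ
    over the unit circle by the trapezoidal rule."""
    P = np.zeros(A.shape, dtype=complex)
    Q = np.zeros(A.shape, dtype=complex)
    for zeta in np.exp(2j * np.pi * np.arange(num) / num):
        R = np.linalg.inv(zeta * G - A)
        P += zeta * (G @ R) / num   # (1/2πi)∮ f(ζ) dζ ≈ (1/N) Σ f(ζ_k) ζ_k
        Q += zeta * (R @ G) / num
    return P, Q

G = np.eye(2)
A = np.diag([0.5, 2.0])              # one eigenvalue inside Γ, one outside
P, Q = pencil_projections(G, A)
print(np.allclose(P @ P, P), np.allclose(P @ G, G @ Q))  # projection property and PG = GQ -> True True
```

For this diagonal example P comes out (up to quadrature error) as diag(1, 0), the spectral projection onto the eigenvalue inside the circle.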

1b. Realization. This subsection concerns the special representation (3). We need the following version of the classical realization theorem (see [9]).

Proposition 1.2. A rational m × m matrix function W without poles on the contour Γ admits the following representation:

W(λ) = I + C(λG - A)⁻¹B,  λ ∈ Γ.   (1.2)

Here G and A are square matrices of the same size, n × n say, the pencil λG - A is Γ-regular, and B and C are matrices of sizes n × m and m × n, respectively.

Proof. Since W has no poles on the contour Γ, we can decompose W as W = W₊ + W₋, where W₊ and W₋ are rational matrix functions, W₊ has no poles on Δ₊ ∪ Γ, and W₋ is strictly proper and has no poles on Δ₋ ∪ Γ. By the classical realization theorem W₋ admits a representation of the form

W₋(λ) = C₁(λ - A₁)⁻¹B₁,

where A₁ is a square matrix that has all its eigenvalues in Δ₊, and B₁ and C₁ are matrices of appropriate sizes. Next, choose a fixed λ₀ ∈ Δ₊, and put

Ψ(λ) = (1/λ)[I - W₊(λ₀ + 1/λ)].   (1.3)

The matrix function Ψ is strictly proper and Ψ has no poles in the set

{λ ∈ ℂ | λ₀ + 1/λ ∈ Δ₊ ∪ Γ}.   (1.4)

Again apply the classical realization theorem. So we can find a representation of Ψ of the form

Ψ(λ) = C₂(λ - A₂)⁻¹B₂   (1.5)

such that A₂ has no eigenvalues in the set (1.4). From (1.3) and (1.5) it follows that

W₊(z) = I - (1/(z - λ₀)) Ψ(1/(z - λ₀)).

Since A₂ has no eigenvalues in the set (1.4), we have

det[zA₂ - (I₂ + λ₀A₂)] ≠ 0,  z ∈ Δ₊ ∪ Γ,

where I₂ is the identity matrix of the same order as A₂. Now put

λG - A = [ λ - A₁   0 ; 0   λA₂ - (I₂ + λ₀A₂) ],   C = (C₁  C₂),   B = [ B₁ ; B₂ ].

Then the pencil λG - A is Γ-regular and (1.2) holds true.  □

If W is as in (1.2), then we shall say that W is in realized form, and we shall call the right hand side of (1.2) a realization of W. The following proposition will be used in Sect. 3; its proof can be found in [6], Sect. 4.

Proposition 1.2. Let W(λ) = I + C(λG - A)⁻¹B, λ ∈ Γ, be a given realization. Put A^× = A - BC. Then det W(λ) ≠ 0 for each λ ∈ Γ if and only if the pencil λG - A^× is Γ-regular, and in that case we have the following identities:

W(λ)⁻¹ = I - C(λG - A^×)⁻¹B,  λ ∈ Γ,   (1.6)

(λG - A^×)⁻¹ = (λG - A)⁻¹ - (λG - A)⁻¹BW(λ)⁻¹C(λG - A)⁻¹,  λ ∈ Γ.   (1.7)
2 Inversion and Fredholm Properties

Consider the integral equation

a(λ)φ(λ) + b(λ) · (1/πi) ∫_Γ φ(μ)/(μ - λ) dμ = f(λ),  λ ∈ Γ.   (2.1)

As before, the contour Γ consists of a finite number of disjoint smooth simple Jordan curves and the coefficients a(·) and b(·) are m × m rational matrix functions, which have no poles on Γ.

For φ a rational function without poles on Γ, we put

(Sφ)(λ) = (1/πi) ∫_Γ φ(μ)/(μ - λ) dμ,  λ ∈ Γ,   (2.2)

where the integral is taken in the sense of the Cauchy principal value. The operator S defined in this way can be extended by continuity to a bounded linear operator, again denoted by S, on all of L²_m(Γ) (see, e.g. [4], Sect. 1.3).
The operator S enjoys the property that S² = I. Hence P_Γ = ½(I + S) and Q_Γ = ½(I - S) are complementary projections. The image of P_Γ consists of all functions in L²_m(Γ) that admit an analytic continuation into Δ₊. Similarly, the image of Q_Γ is the subspace of all functions in L²_m(Γ) that admit an analytic continuation into Δ₋ and vanish at ∞.
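When Γ is the unit circle, P_Γ and Q_Γ act on Fourier coefficients in the simplest possible way: P_Γ keeps the coefficients c_k with k ≥ 0 (the part of φ extending analytically into Δ₊) and Q_Γ keeps those with k < 0 (analytic in Δ₋, vanishing at ∞). A small FFT-based sketch of this special case, our own illustration:

```python
import numpy as np

def split(samples):
    """For samples of φ on the unit circle, return φ₊ = P_Γφ and φ₋ = Q_Γφ
    by splitting the Fourier coefficients at k = 0 (unit-circle case only)."""
    c = np.fft.fft(samples)
    k = np.fft.fftfreq(len(samples), d=1.0 / len(samples)).astype(int)
    return np.fft.ifft(np.where(k >= 0, c, 0)), np.fft.ifft(np.where(k < 0, c, 0))

lam = np.exp(2j * np.pi * np.arange(64) / 64)
phi = 3 * lam**2 + 1 + 2 / lam                 # φ(λ) = 3λ² + 1 + 2λ⁻¹
plus, minus = split(phi)
print(np.allclose(plus, 3 * lam**2 + 1), np.allclose(minus, 2 / lam))  # True True
```

In this picture S = P_Γ - Q_Γ multiplies each coefficient by ±1, so S² = I is immediate.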
Assume now that det(a(λ) - b(λ)) ≠ 0 for λ ∈ Γ. Then equation (2.1) may be rewritten in the form

(M_W P_Γ + Q_Γ)φ = g,   (2.3)

where M_W is the operator of multiplication by the m × m matrix function

W(λ) = [a(λ) - b(λ)]⁻¹[a(λ) + b(λ)],  λ ∈ Γ,   (2.4)

and the right hand side g is given by

g(λ) = [a(λ) - b(λ)]⁻¹ f(λ),  λ ∈ Γ.   (2.5)

We shall refer to M_W P_Γ + Q_Γ as the singular integral operator with symbol W. (Strictly speaking the symbol is the pair of functions W(·), I_Γ, where I_Γ stands for the function which is identically equal on Γ to the m × m identity matrix; in what follows we shall omit this second function.)

Equation (2.1) has a unique solution φ ∈ L²_m(Γ) for each choice of f ∈ L²_m(Γ) if and only if the singular integral operator M_W P_Γ + Q_Γ is invertible, and in that case the solution φ is given by

φ = [M_W P_Γ + Q_Γ]⁻¹ g,

where g is defined by (2.5). In this section we give a necessary and sufficient condition for the invertibility of M_W P_Γ + Q_Γ and an explicit formula for its inverse. Also we shall describe the Fredholm properties of the operator M_W P_Γ + Q_Γ.

Since the coefficients a(·) and b(·) in (2.1) are rational and have no poles on Γ, we see from (2.4) that the same is true for W. It follows (see Sect. 1b) that W admits a realization of the following form

W(λ) = I + C(λG - A)⁻¹B,  λ ∈ Γ,   (2.6)

where λG - A is a Γ-regular matrix pencil. We have the following theorems.


Theorem 2.1. Let M_W P_Γ + Q_Γ be the singular integral operator on L²_m(Γ) with symbol (2.6). Put A^× = A - BC. Then M_W P_Γ + Q_Γ is invertible if and only if the following two conditions are satisfied:

(α) the pencil λG - A^× is Γ-regular,
(β) ℂⁿ = Im P ⊕ Ker P^×,

where n is the order of the matrices G and A, and

P = (1/2πi) ∫_Γ G(ζG - A)⁻¹ dζ,   P^× = (1/2πi) ∫_Γ G(ζG - A^×)⁻¹ dζ.   (2.7)

In that case

(M_W P_Γ + Q_Γ)⁻¹g(λ) = g(λ) - C(λG - A^×)⁻¹B(P_Γ g)(λ)
+ {C(λG - A^×)⁻¹ - C(λG - A)⁻¹}(I - Π) ( (1/2πi) ∫_Γ P^×G(ζG - A^×)⁻¹Bg(ζ) dζ ),  λ ∈ Γ,

where Π is the projection of ℂⁿ along Im P onto Ker P^×.
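Conditions (α) and (β) are finite-dimensional and therefore directly checkable. A sketch for a scalar symbol on the unit circle (our own example): W(λ) = (λ - a)/(λ - b) has the one-dimensional realization 1 + C(λ - b)⁻¹B with CB = b - a, so A = [b] and A^× = [a]; Theorem 2.1 then predicts invertibility exactly when a and b lie on the same side of Γ, i.e. when W has winding number zero:

```python
import numpy as np

def riesz(G, A, num=400):
    """P = (1/2πi) ∮ G(ζG - A)^(-1) dζ over the unit circle, by the trapezoidal rule."""
    P = np.zeros(A.shape, dtype=complex)
    for zeta in np.exp(2j * np.pi * np.arange(num) / num):
        P += zeta * (G @ np.linalg.inv(zeta * G - A)) / num
    return P

def col_space(M, tol=1e-8):
    U, s, _ = np.linalg.svd(M)
    return U[:, s > tol]

def null_space(M, tol=1e-8):
    _, s, Vh = np.linalg.svd(M)
    return Vh[np.sum(s > tol):].conj().T

def invertible(G, A, B, C):
    """Test condition (β) of Theorem 2.1: ℂⁿ = Im P ⊕ Ker P^×
    (Γ-regularity of λG - A^× is taken for granted in this sketch)."""
    S = np.hstack([col_space(riesz(G, A)), null_space(riesz(G, A - B @ C))])
    n = A.shape[0]
    return bool(S.shape[1] == n and np.linalg.matrix_rank(S) == n)

# Scalar symbol W(λ) = (λ - a)/(λ - b): G = [1], A = [b], B = [1], C = [b - a]
def data(a, b):
    return np.eye(1), np.array([[b]]), np.eye(1), np.array([[b - a]])

print(invertible(*data(0.5, 0.3)))   # a, b both inside Γ: invertible -> True
print(invertible(*data(2.0, 0.3)))   # a outside, b inside: winding number 1 -> False
```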


Recall (see [13]) that an operator T on L²_m(Γ) is Fredholm if Im T is closed and

dim Ker T < ∞,   codim Im T := dim(L²_m(Γ)/Im T) < ∞.

Here Ker T denotes the null space and Im T the range of T. If T is Fredholm, then its index is the integer

ind T := dim Ker T - codim Im T.

We say that T⁺ is a generalized inverse of T if TT⁺T = T.

Theorem 2.2. Let T = M_W P_Γ + Q_Γ be the singular integral operator on L²_m(Γ) with symbol (2.6). Put A^× = A - BC. Then T is a Fredholm operator if and only if the pencil λG - A^× is Γ-regular, and in that case the following equalities hold:

Ker T = { φ | φ(λ) = -C(λG - A)⁻¹y + C(λG - A^×)⁻¹y,  y ∈ Im P ∩ Ker P^× },   (2.8)

Im T = { g ∈ L²_m(Γ) | ∫_Γ P^×G(ζG - A^×)⁻¹Bg(ζ) dζ ∈ Im P + Ker P^× },   (2.9)

dim Ker T = dim(Im P ∩ Ker P^×),   codim Im T = dim ℂⁿ/(Im P + Ker P^×),   (2.10)

ind T = rank P - rank P^×.   (2.11)

Here P and P^× are as in (2.7), and n is the order of the matrices A and G. Furthermore, a generalized inverse T⁺ of T is given by

(T⁺g)(λ) = g(λ) - C(λG - A^×)⁻¹B(P_Γ g)(λ)
+ {C(λG - A^×)⁻¹ - C(λG - A)⁻¹} J⁺ ( (1/2πi) ∫_Γ P^×G(ζG - A^×)⁻¹Bg(ζ) dζ ),  λ ∈ Γ,   (2.12)

where J⁺ : Im P^× → Im P is a generalized inverse of the linear transformation

J : Im P → Im P^×,   Jz = P^×z.   (2.13)
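Formula (2.11) connects to the classical index theorem for singular integral operators: ind T equals minus the winding number of det W along Γ. A numerical sketch (our own example) for the scalar symbol W(λ) = (λ - a)/(λ - b) on the unit circle, with the zero a inside and the pole b outside, so that the winding number is 1 and the index is -1:

```python
import numpy as np

def riesz_rank(G, A, num=400):
    """Rank of P = (1/2πi) ∮ G(ζG - A)^(-1) dζ over the unit circle."""
    P = np.zeros(A.shape, dtype=complex)
    for zeta in np.exp(2j * np.pi * np.arange(num) / num):
        P += zeta * (G @ np.linalg.inv(zeta * G - A)) / num
    return np.linalg.matrix_rank(P, tol=1e-6)

def winding(f, num=4096):
    """Winding number of f about 0 along the unit circle (sum of argument increments)."""
    vals = f(np.exp(2j * np.pi * np.arange(num + 1) / num))
    return int(round(np.sum(np.angle(vals[1:] / vals[:-1])) / (2 * np.pi)))

a, b = 0.5, 2.0
G = np.eye(1); A = np.array([[b]]); B = np.eye(1); C = np.array([[b - a]])
ind = riesz_rank(G, A) - riesz_rank(G, A - B @ C)     # (2.11): rank P - rank P^x
print(ind, winding(lambda z: (z - a) / (z - b)))      # -1 1
```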

3 Proofs of the Main Theorems

In this section we give the proofs of Theorem 2.1 and Theorem 2.2. The following proposition summarizes one of the main steps in the proofs. Its central idea is the reduction of the inversion problem to a problem for input/output systems.

Proposition 3.1. Let M_W P_Γ + Q_Γ be the singular integral operator on L²_m(Γ) with symbol (2.6), and let g ∈ L²_m(Γ). Put A^× = A - BC. Assume that λG - A^× is Γ-regular, and let P and P^× be the projections defined by (2.7). Then the equation

(M_W P_Γ + Q_Γ)φ = g   (3.1)

has a solution φ ∈ L²_m(Γ) if and only if

∫_Γ P^×G(ζG - A^×)⁻¹Bg(ζ) dζ ∈ P^×[Im P],   (3.2)

and in that case the general solution of (3.1) is given by

φ(λ) = g(λ) - C(λG - A)⁻¹y + C(λG - A^×)⁻¹y - C(λG - A^×)⁻¹B(P_Γ g)(λ),  λ ∈ Γ,   (3.3)

where y is an arbitrary vector in Im P such that

P^×y = (1/2πi) ∫_Γ P^×G(ζG - A^×)⁻¹Bg(ζ) dζ.   (3.4)

Proof. For φ ∈ L²_m(Γ), put

φ₊(λ) := (P_Γφ)(λ) = ½φ(λ) + (1/2πi) ∫_Γ φ(ζ)/(ζ - λ) dζ,  λ ∈ Γ,   (3.5a)

φ₋(λ) := (Q_Γφ)(λ) = ½φ(λ) - (1/2πi) ∫_Γ φ(ζ)/(ζ - λ) dζ,  λ ∈ Γ.   (3.5b)

Assume now that φ ∈ L²_m(Γ) is a solution of (3.1). We shall show that in that case g satisfies (3.2) and that φ is given by (3.3). Introduce the auxiliary function ρ(λ) = (λG - A)⁻¹Bφ₊(λ), λ ∈ Γ. From the representation (2.6) for W it follows that the connection between φ and g in (3.1) is described by the following input/output system:

λGρ(λ) = Aρ(λ) + Bφ₊(λ),  λ ∈ Γ,
g(λ) = Cρ(λ) + φ₊(λ) + φ₋(λ).   (3.6)

Note that ρ ∈ L²_n(Γ). The first identity in (3.6) implies that the function (λG - A)ρ(λ) is in Im P_Γ (where P_Γ is now considered on L²_n(Γ)), and hence by (3.5a)

½(λG - A)ρ(λ) = (1/2πi) ∫_Γ (1/(ζ - λ)) (ζG - A)ρ(ζ) dζ
= (1/2πi) ∫_Γ (1/(ζ - λ)) [(ζ - λ)G + (λG - A)]ρ(ζ) dζ
= Gx + (λG - A) ( (1/2πi) ∫_Γ ρ(ζ)/(ζ - λ) dζ ),  λ ∈ Γ,

where

x = (1/2πi) ∫_Γ ρ(ζ) dζ ∈ ℂⁿ.

But then we see (use (3.5b)) that

ρ₋(λ) = (λG - A)⁻¹Gx,  λ ∈ Γ.   (3.7)

Since PGx ∈ Im P, we may apply Proposition 1.1(2) to show that (λG - A)⁻¹PGx extends to an analytic function on Δ₋ which vanishes at ∞. The function ρ₋ has the same properties. Thus, by (3.7), also the function (λG - A)⁻¹(I - P)Gx may be extended to an analytic function on Δ₋ which vanishes at ∞. On the other hand, by Proposition 1.1(3), the function (λG - A)⁻¹(I - P)Gx is analytic on Δ₊ ∪ Γ. Thus this function is an entire function which is zero at infinity. Therefore, by Liouville's theorem, (λG - A)⁻¹(I - P)Gx is identically zero, which implies that Gx = PGx ∈ Im P.

From (3.7) it follows that the first identity in (3.6) can be written as:

λGρ₊(λ) = Aρ₊(λ) - Gx + Bφ₊(λ),  λ ∈ Γ.   (3.8)

By applying P_Γ to the second identity in (3.6) we get

g₊(λ) = Cρ₊(λ) + φ₊(λ),  λ ∈ Γ.   (3.9)

Now multiply (3.9) from the left by B and subtract the resulting identity from (3.8). This yields

λGρ₊(λ) = A^×ρ₊(λ) - Gx + Bg₊(λ),  λ ∈ Γ,   (3.10)

and thus

ρ₊(λ) = (λG - A^×)⁻¹Bg₊(λ) - (λG - A^×)⁻¹Gx,  λ ∈ Γ.   (3.11)
From Proposition 1.1 (with λG - A^× in place of λG - A) we know that the function (λG - A^×)⁻¹(I - P^×)Gx extends to a function which is analytic at each point of Δ₊ ∪ Γ, and thus the function (λG - A^×)⁻¹(I - P^×)Gx belongs to Im P_Γ = Ker Q_Γ. Also, ρ₊ ∈ Ker Q_Γ. Therefore, Q_Γ applied to (3.11) gives:

(λG - A^×)⁻¹P^×Gx = ½(λG - A^×)⁻¹Bg₊(λ) - (1/2πi) ∫_Γ (1/(ζ - λ)) (ζG - A^×)⁻¹Bg₊(ζ) dζ,   (3.12)

and so

P^×Gx = ½Bg₊(λ) - (1/2πi) ∫_Γ (1/(ζ - λ)) [(λ - ζ)G + (ζG - A^×)] (ζG - A^×)⁻¹Bg₊(ζ) dζ
= ½Bg₊(λ) - (1/2πi) ∫_Γ (1/(ζ - λ)) Bg₊(ζ) dζ + (1/2πi) ∫_Γ G(ζG - A^×)⁻¹Bg₊(ζ) dζ
= (1/2πi) ∫_Γ G(ζG - A^×)⁻¹Bg₊(ζ) dζ.   (3.13)

Proposition 1.1(1) and 1.1(2) imply that the last integral does not change if in the integrand G is replaced by P^×G. But P^×G(ζG - A^×)⁻¹B is analytic on Δ₋ and vanishes at ∞. Therefore

(1/2πi) ∫_Γ P^×G(ζG - A^×)⁻¹Bg₋(ζ) dζ = 0,

and thus

P^×Gx = (1/2πi) ∫_Γ P^×G(ζG - A^×)⁻¹Bg(ζ) dζ,   (3.14)

which shows that (3.2) is fulfilled.

Put y = Gx. Then (3.4) holds true. Furthermore, by the second identity in (3.6) and formulas (3.7) and (3.11), we have for λ ∈ Γ

φ(λ) = g(λ) - Cρ₋(λ) - Cρ₊(λ)
= g(λ) - C(λG - A)⁻¹y + C(λG - A^×)⁻¹y - C(λG - A^×)⁻¹Bg₊(λ),

which proves (3.3).


Next we prove the converse statement. So we assume that φ is given by (3.3), with y a vector in Im P satisfying (3.4). Put

ρ₁(λ) = (λG - A)⁻¹y,   ρ₂(λ) = -(λG - A^×)⁻¹y,   ρ₃(λ) = (λG - A^×)⁻¹Bg₊(λ),  λ ∈ Γ.

From y ∈ Im P and the spectral results in Sect. 1a it follows that ρ₁ ∈ Im Q_Γ. Since y satisfies (3.4),

(Q_Γρ₂)(λ) = -(λG - A^×)⁻¹P^×y,  λ ∈ Γ.

Furthermore,

(λG - A^×)(Q_Γρ₃)(λ) = (Q_ΓBg₊)(λ) + (1/2πi) ∫_Γ G(ζG - A^×)⁻¹Bg₊(ζ) dζ
= ½Bg₊(λ) - (λG - A^×) ( (1/2πi) ∫_Γ (1/(ζ - λ)) (ζG - A^×)⁻¹Bg₊(ζ) dζ ).

To prove the last equality one uses the same type of reasoning as in (3.13). From the above calculation it follows that (3.12) holds with y in place of Gx, which shows that

(Q_Γρ₃)(λ) = (λG - A^×)⁻¹P^×y,  λ ∈ Γ.

Thus ρ₂ + ρ₃ ∈ Ker Q_Γ = Im P_Γ. As φ = g - Cρ₁ - C(ρ₂ + ρ₃), we conclude that

φ₋ = g₋ - Cρ₁,   φ₊ = g₊ - C(ρ₂ + ρ₃).

From the definitions of ρ₂ and ρ₃ we have

(λG - A)(ρ₂(λ) + ρ₃(λ)) = (λG - A^×)(ρ₂(λ) + ρ₃(λ)) - BC(ρ₂(λ) + ρ₃(λ))
= -y + Bg₊(λ) - BC(ρ₂(λ) + ρ₃(λ))
= -y + Bφ₊(λ)
= -(λG - A)ρ₁(λ) + Bφ₊(λ),  λ ∈ Γ.

It follows that with ρ = ρ₁ + ρ₂ + ρ₃ the identities in (3.6) hold true. But this implies that

W(λ)φ₊(λ) + φ₋(λ) = φ₊(λ) + C(λG - A)⁻¹Bφ₊(λ) + φ₋(λ)
= φ₊(λ) + Cρ(λ) + φ₋(λ)
= g(λ),  λ ∈ Γ,

and thus φ is a solution of (3.1).  □

Proof of Theorem 2.2. From the general theory of singular integral equations (see [5], also [4]) it is known that T = M_W P_Γ + Q_Γ is Fredholm if and only if det W(λ) ≠ 0, λ ∈ Γ. But then, by Proposition 1.2, T is Fredholm if and only if det(λG - A^×) ≠ 0 for each λ ∈ Γ.

Assume that the latter condition holds true. A straightforward application of Proposition 3.1 (with g = 0) yields (2.8). Also (2.9) follows directly from Proposition 3.1; one only has to note that for x ∈ Im P^×

x ∈ P^×[Im P]  ⟺  x ∈ Im P + Ker P^×.

To prove the first identity in (2.10) it suffices to show that for y ∈ Im P ∩ Ker P^× the identity

C(λG - A)⁻¹y = C(λG - A^×)⁻¹y,  λ ∈ Γ,   (3.15)

implies y = 0. Since y ∈ Im P, the left hand side of (3.15) extends to an analytic function on Δ₋ which vanishes at ∞. From y ∈ Ker P^× it follows that the right hand side of (3.15) has an analytic continuation on Δ₊. So, by Liouville's theorem, both functions are identically zero on Γ. But then we can apply the identity (1.7) to show that

(λG - A^×)⁻¹y = (λG - A)⁻¹y,  λ ∈ Γ.   (3.16)

Apply G to both sides of (3.16) and integrate over the contour Γ. One sees that y = Py = P^×y = 0, and thus the first identity in (2.10) is proved. In an analogous way (or using a duality argument) one proves the second identity in (2.10).

From (2.10) it follows that

ind T = dim(Im P ∩ Ker P^×) - dim ℂⁿ/(Im P + Ker P^×)
= [ dim Im P - dim (Im P + Ker P^×)/Ker P^× ] - [ dim ℂⁿ - dim(Im P + Ker P^×) ]
= dim Im P + dim Ker P^× - dim ℂⁿ
= dim Im P - dim Im P^×,

which proves (2.11).

Finally, let us show that the operator T⁺ defined by (2.12) is a generalized inverse of T. Take an arbitrary φ ∈ L²_m(Γ), and put g = Tφ. Then (3.2) holds true, that is,

z := (1/2πi) ∫_Γ P^×G(ζG - A^×)⁻¹Bg(ζ) dζ ∈ Im J,   (3.17)

where J is defined by (2.13). Put y = J⁺z. Since J⁺ is a generalized inverse of J, the map JJ⁺ acts as the identity operator on Im J, and therefore P^×y = JJ⁺z = z. It follows that (3.4) holds true. Also y ∈ Im P. Thus Proposition 3.1 implies that T⁺g is a solution of (3.1). But then

Tφ = g = T(T⁺g) = TT⁺Tφ.

Since φ is arbitrary, we have proved that T⁺ is a generalized inverse of T.  □

Proof of Theorem 2.1. Assume that T := M_W P_Γ + Q_Γ is invertible. Then T is Fredholm, and thus, by Theorem 2.2, condition (α) is fulfilled. Furthermore, since

dim Ker T = 0,   codim Im T = 0,   (3.18)

formula (2.10) shows that condition (β) is fulfilled.

Conversely, assume that (α) and (β) hold true. Then, by Theorem 2.2, the operator T is Fredholm and (3.18) holds true. But this means that T is invertible.

To compute T⁻¹, let Π be the projection of ℂⁿ along Im P onto Ker P^×, and define J⁺ : Im P^× → Im P by setting

J⁺x = (I - Π)x,  x ∈ Im P^×.   (3.19)

Let J be the map defined by (2.13). From

JJ⁺Jz = P^×(I - Π)Jz = P^×Jz = Jz,  z ∈ Im P,

it follows that J⁺ is a generalized inverse of J. (In fact, J⁺ is the inverse of J, but we don't need this here.) Now let T⁺ be the operator defined by (2.12) with J⁺ given by (3.19). Then T⁺ is a generalized inverse of T. But T is invertible, and thus T⁺ = T⁻¹, which proves the formula for T⁻¹.  □

4 The Factorization Method

The classical way to invert the singular integral operator M_W P_Γ + Q_Γ is based on the idea of factorization. First one looks for a so-called right canonical factorization of the symbol W relative to the contour Γ, that is, a factorization of the form

W(λ) = W₋(λ)W₊(λ),  λ ∈ Γ,   (4.1)

where, for ν = +, -, the matrix function W_ν is continuous on Δ_ν ∪ Γ and analytic on Δ_ν, and det W_ν(λ) ≠ 0 for each λ ∈ Δ_ν ∪ Γ. In particular, the factor W₋ is analytic at ∞ and det W₋(∞) ≠ 0.

As in the previous sections, let us assume that the symbol W is rational. Then it is well known (see, e.g. [4], Theorem I.3.1) that the singular integral operator M_W P_Γ + Q_Γ is invertible if and only if its symbol admits a right canonical factorization, and in that case

(M_W P_Γ + Q_Γ)⁻¹g(λ) = W₊(λ)⁻¹(P_Γ(W₋⁻¹g))(λ) + W₋(λ)(Q_Γ(W₋⁻¹g))(λ),  λ ∈ Γ,   (4.2)

where W₋ and W₊ are the factors in a right canonical factorization of W relative to Γ. By definition, W₋⁻¹g is the function W₋(·)⁻¹g(·). To apply this method in an effective way one needs necessary and sufficient conditions that guarantee the existence of the canonical factorization, and one needs explicit formulas for the factors in the factorization (and also for their inverses). The representation (2.6) of the symbol allows one to find such conditions in terms of finite dimensional spaces and to derive the factors and their inverses explicitly. The following theorem holds; its proof may be found in [6], Sect. 5.

Theorem 4.1. Let W be a rational m × m matrix function without poles on the contour Γ, and let W be given in realized form:

W(λ) = I + C(λG - A)⁻¹B,  λ ∈ Γ.

Put A^× = A - BC. Then W admits a right canonical factorization relative to Γ if and only if the following two conditions hold true:

(i) the pencil λG - A^× is Γ-regular,
(ii) ℂⁿ = Im P ⊕ Ker P^× and ℂⁿ = Im Q ⊕ Ker Q^×.

Here n is the order of the matrices G and A, the projections P and P^× are as in (2.7), and

Q = (1/2πi) ∫_Γ (ζG - A)⁻¹G dζ,   Q^× = (1/2πi) ∫_Γ (ζG - A^×)⁻¹G dζ.

In that case a canonical factorization W(λ) = W₋(λ)W₊(λ) of W relative to Γ is obtained by taking

W₋(λ) = I + C(λG - A)⁻¹(I - Π)B,  λ ∈ Γ,

W₊(λ) = I + CΩ(λG - A)⁻¹B,  λ ∈ Γ,

W₋(λ)⁻¹ = I - C(I - Ω)(λG - A^×)⁻¹B,  λ ∈ Γ,

W₊(λ)⁻¹ = I - C(λG - A^×)⁻¹ΠB,  λ ∈ Γ.

Here Π is the projection of ℂⁿ along Im P onto Ker P^× and Ω is the projection along Im Q onto Ker Q^×. Furthermore, the two equalities in (ii) are not independent; in fact, the first equality in (ii) implies the second and conversely.

Let M_W P_Γ + Q_Γ be the singular integral operator with symbol W. Assume
that W is rational and given in the realized form (2.6). Theorem 4.1 and the
general theory of singular integral operators reviewed in the first two paragraphs
of this section imply that M_W P_Γ + Q_Γ is invertible if and only if conditions (i) and
(ii) in Theorem 4.1 are fulfilled. Since the two conditions in Theorem 4.1 (ii) are
equivalent, we reprove in this way the first part of Theorem 2.1.
The formula for (M_W P_Γ + Q_Γ)⁻¹ appearing in Theorem 2.1 may also be
obtained from Theorem 4.1 and the general theory referred to above. For this
purpose we use formula (4.2), and we insert in this expression the explicit
formulas for the factors W₋, W₋⁻¹ and W₊⁻¹ appearing in Theorem 4.1. To see
that, indeed, this leads to the desired formula for (M_W P_Γ + Q_Γ)⁻¹, we first
rewrite (4.2) in the following form:

    (M_W P_Γ + Q_Γ)⁻¹g(λ) = g(λ) + {W₊(λ)⁻¹ − W₋(λ)}(P_Γ(W₋⁻¹g))(λ),   λ ∈ Γ.   (4.3)
Next, observe that, by Theorem 4.1,

    {W₊(λ)⁻¹ − W₋(λ)} W₋(ζ)⁻¹
      = −C(λG − A^×)⁻¹ΠB − C(λG − A)⁻¹(I − Π)B
        + C(λG − A^×)⁻¹ΠBC(I − Λ)(ζG − A^×)⁻¹B
        + C(λG − A)⁻¹(I − Π)BC(I − Λ)(ζG − A^×)⁻¹B.                    (4.4)


The latter formulas can be simplified further. Indeed, note that

    ΠA(I − Λ) = 0,   (I − Π)A^×Λ = 0,   ΠG = GΛ.

Since BC = A − A^×, it follows that

    ΠBC(I − Λ) = A^×Λ − ΠA^×
               = (A^× − λG)Λ − Π(A^× − ζG) − (ζ − λ)ΠG,

and

    (I − Π)BC(I − Λ) = A(I − Λ) − (I − Π)A^×
                     = (A − λG)(I − Λ) − (I − Π)(A^× − ζG) − (ζ − λ)(I − Π)G.

Inserting these expressions in (4.4) yields

    {W₊(λ)⁻¹ − W₋(λ)} W₋(ζ)⁻¹
      = −C(ζG − A^×)⁻¹B − (ζ − λ)C(λG − A^×)⁻¹ΠG(ζG − A^×)⁻¹B
        − (ζ − λ)C(λG − A)⁻¹(I − Π)G(ζG − A^×)⁻¹B.

Next, use that

    (ζ − λ)C(λG − A^×)⁻¹G(ζG − A^×)⁻¹B
      = C(λG − A^×)⁻¹((ζG − A^×) − (λG − A^×))(ζG − A^×)⁻¹B
      = C(λG − A^×)⁻¹B − C(ζG − A^×)⁻¹B,

and thus

    {W₊(λ)⁻¹ − W₋(λ)} W₋(ζ)⁻¹
      = −C(λG − A^×)⁻¹B
        + (ζ − λ){C(λG − A^×)⁻¹ − C(λG − A)⁻¹}(I − Π)G(ζG − A^×)⁻¹B.   (4.5)
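The resolvent identity invoked in this step holds for arbitrary matrices wherever the pencils are invertible, so it can be checked numerically. The following sketch uses randomly generated matrices G, A, B, C standing in for a realization (illustrative data only, not from the text):

```python
import numpy as np

# Numerical check of the resolvent identity
#   (zeta - lam) C (lam G - Ax)^{-1} G (zeta G - Ax)^{-1} B
#     = C (lam G - Ax)^{-1} B - C (zeta G - Ax)^{-1} B
# for an arbitrary realization (random data for illustration).
rng = np.random.default_rng(0)
n, m = 5, 2
G = rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
Ax = A - B @ C                    # the associate matrix A^x = A - BC
lam, zeta = 0.7, -1.3             # two generic points where the pencil is invertible

R = lambda z: np.linalg.inv(z * G - Ax)   # resolvent of the pencil zG - A^x
lhs = (zeta - lam) * C @ R(lam) @ G @ R(zeta) @ B
rhs = C @ R(lam) @ B - C @ R(zeta) @ B
max_err = np.abs(lhs - rhs).max()
print(max_err < 1e-10)            # the two sides agree to machine precision
```

The identity is purely algebraic (add and subtract A^× inside the middle factor), which is why it holds for any choice of data.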
By inserting (4.5) and (1.6) in (4.3) we obtain

    (M_W P_Γ + Q_Γ)⁻¹g(λ)
      = g(λ) − ½C(λG − A^×)⁻¹Bg(λ)
        − C(λG − A^×)⁻¹B ( (1/2πi) ∫_Γ g(ζ)/(ζ − λ) dζ )
        + {C(λG − A^×)⁻¹ − C(λG − A)⁻¹}(I − Π) ( (1/2πi) ∫_Γ G(ζG − A^×)⁻¹B g(ζ) dζ ),   λ ∈ Γ.

Since (I − Π)P^× = I − Π and P^× is given by (3.5a), we have found the expression
for (M_W P_Γ + Q_Γ)⁻¹ appearing in Theorem 2.1.
The Fredholm properties of M_W P_Γ + Q_Γ may also be derived via the
factorization method. This requires constructing a non-canonical factorization,
which one can do via the state space method (see [2], [7]). However, the formulas
are much more involved than those in Theorem 4.1, and hence for the Fredholm
case the approach via input/output systems employed in Section 3 is more direct.


References
[1] H. Bart, I. Gohberg, M.A. Kaashoek: Minimal factorization of matrix and operator functions.
Operator Theory: Advances and Applications, Vol 1, Birkhäuser Verlag, Basel, 1979
[2] H. Bart, I. Gohberg, M.A. Kaashoek: Explicit Wiener-Hopf factorization and realisation. In:
Operator Theory: Advances and Applications, Vol 21, Birkhäuser Verlag, Basel, 1986,
pp 235-316
[3] H. Bart, I. Gohberg, M.A. Kaashoek: The state space method in problems of analysis. In:
Proceedings of the first international conference on industrial and applied mathematics,
Contributions from the Netherlands, Centre for Mathematics and Computer Science,
Amsterdam, 1987, pp 1-16
[4] K. Clancey, I. Gohberg: Factorization of matrix functions and singular integral operators.
Operator Theory: Advances and Applications, Vol 3, Birkhäuser Verlag, Basel, 1981
[5] I. Gohberg: The factorization problem in normed rings, functions of isometric and symmetric
operators, and singular integral equations. Russian Math Surveys 19 (1) (1964), 63-114
[6] I. Gohberg, M.A. Kaashoek: Block Toeplitz operators with rational symbols. In: Operator
Theory: Advances and Applications, Vol 35, Birkhäuser Verlag, Basel, 1988, pp 385-440
[7] I. Gohberg, M.A. Kaashoek, A.C.M. Ran: Interpolation problems for rational matrix functions
with incomplete data and Wiener-Hopf factorization. In: Operator Theory: Advances and
Applications, Vol 33, Birkhäuser Verlag, Basel, 1988, pp 73-108
[8] I. Gohberg, N.Ya. Krupnik: Introduction to the theory of one-dimensional singular integral
operators. Kishinev: Shtiintsa, 1973 (Russian); German transl: Birkhäuser Verlag, Basel, 1979
[9] R.E. Kalman, P.L. Falb, M.A. Arbib: Topics in mathematical system theory. McGraw-Hill,
New York, 1969
[10] E. Meister: Randwertaufgaben in der Funktionentheorie. Teubner, 1983
[11] N.I. Muskhelishvili: Singular integral equations. Boundary problems of function theory and
their applications to mathematical physics, 2nd ed, Fizmatgiz, Moscow, 1962 (Russian); English
transl of 1st ed, Noordhoff, Groningen, 1953
[12] F. Stummel: Diskrete Konvergenz linearer Operatoren. II. Math Z 120 (1971), 231-264
[13] A.E. Taylor, D.C. Lay: Introduction to functional analysis (2nd ed). Wiley, New York, 1980
[14] N.P. Vekua: Systems of singular integral equations and some boundary problems. GITTL,
Moscow, 1950 (Russian); English transl, Noordhoff, Groningen, 1967

Chapter 9

Applications

Algebraic Structure of Convolutional Codes,
and Algebraic System Theory
G. D. Forney, Jr.
Codex Corporation, 20 Cabot Boulevard, Mansfield, MA 02048 USA

This paper is a review of the algebraic theory of convolutional codes through the focus of the
author's 1970 paper [F1], including its origins in the work of Kalman et al. and of Massey and
Sain. This paper grew out of and in turn influenced the development of algebraic system theory,
primarily through the author's 1975 paper [F3], as is indicated by a synopsis of relevant parts of
Kailath's text and by a citation index of [F3]. The paper includes a summary exposition of the
contents of [F1], [F3], and a less-known intermediate paper [F2], as well as a brief introduction
to subsequent work in convolutional coding theory citing [F1], particularly that of Piret.

1 Introduction
There are two principal classes of codes for digital communications: block codes
and convolutional codes.
There exists a well-developed algebraic theory of block codes, documented
in books such as Peterson [1961], Berlekamp [1968], Peterson and Weldon
[1971], MacWilliams and Sloane [1977], McEliece [1977], van Lint [1982],
and Blake and Mullin [1985]. The principal classes of block codes (e.g.
Reed-Solomon and BCH codes) are constructed from the algebraic theory; their
principal properties, such as their minimum distances, are determined by
algebraic arguments; and their decoding algorithms are typically algebraic.
'Algebraic coding theory,' which effectively means 'the theory of algebraic block
codes,' remains an active research area [IT Special Issue, 1988].
By contrast, the algebraic theory of convolutional codes is relatively meager.
Convolutional codes are typically found by more or less exhaustively searching
for codes with good distance properties, and their decoding algorithms are
typically also search procedures (e.g. the Viterbi algorithm, sequential
decoding). However, for most applications, convolutional codes have proved
to have a better performance-complexity tradeoff than block codes, and therefore
they have generally been preferred for implementation.
It was with the hope of understanding this superiority and laying the basis
for a general algebraic theory of convolutional codes that I wrote a paper [F1]¹
1 [F1] received the 1970 Browder J. Thompson Prize, for the best paper in any IEEE publication
by an author under the age of 30.


some twenty years ago on the algebraic structure of convolutional codes. A


second paper [F2] showed how certain questions concerning convolutional
codes could be best answered by investigating dual codes.
These papers were a success in providing a linear-system-theoretic structure
theory for convolutional codes, in settling certain questions concerning
invertibility, and in defining canonical classes of convolutional encoders. They
were generally failures, however, with regard to leading to any new classes of
codes or decoding algorithms, or stimulating very much further work.
In consequence, the first paper seems to be still the main reference in this
area. The material has not found its way into any general coding textbooks
(e.g. Lin and Costello [1983], Blahut [1983]) of which I am aware, although
it is sometimes taught in advanced graduate courses. The recent monograph
by Piret [1988] is probably the most substantial descendant of this work.
A third paper [F3] was written for a linear-system-theory audience. This
paper seems to have been widely noted and cited, and its concepts and results
rather completely assimilated into the literature of the field by the end of the
decade; see, e.g. the introductory text of Kailath [1980].
The purposes of this paper are (a) to pay tribute to the origins of these
papers in the world of multidimensional (or multivariable) system theory,
particularly in the work of Kalman; (b) to summarize their content, particularly
for readers who are experts in algebraic system theory but not in coding; and
(c) to trace briefly subsequent work in the algebraic theory of convolutional
codes, and back in the domain of algebraic system theory.

References
[F1] G.D. Forney, Jr., "Convolutional codes I: Algebraic structure," IEEE Transactions on
Information Theory, Vol IT-16, pp 720-738, 1970; correction appears in Vol IT-17, p 360, 1971
[F2] G.D. Forney, Jr., "Structural analysis of convolutional codes via dual codes," IEEE
Transactions on Information Theory, Vol IT-19, pp 512-518, 1973
[F3] G.D. Forney, Jr., "Minimal bases of rational vector spaces, with applications to multivariable
linear systems," SIAM J Control, Vol 13, pp 493-520, 1975
[1] W.W. Peterson, Error-Correcting Codes. New York: Wiley, 1961
[2] E.R. Berlekamp, Algebraic Coding Theory. New York: McGraw-Hill, 1968
[3] W.W. Peterson and E.J. Weldon, Error-Correcting Codes, 2nd edition. Cambridge, Mass.: MIT
Press, 1971
[4] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes. 1977
[5] R.J. McEliece, The Theory of Information and Coding. Reading, Mass.: Addison-Wesley, 1977
[6] J.H. van Lint, Introduction to Coding Theory. New York: Springer-Verlag, 1982
[7] I.F. Blake and R.C. Mullin, The Mathematical Theory of Coding. New York: Academic Press,
1985
[8] D.J. Costello, Jr. and J.H. van Lint, eds, Special issue on coding techniques and coding theory,
IEEE Transactions on Information Theory, Sept 1988
[9] S. Lin and D.J. Costello, Jr., Error Control Coding: Fundamentals and Applications. Englewood
Cliffs, N.J.: Prentice-Hall, 1983
[10] R.E. Blahut, Theory and Practice of Error Control Codes. Reading, Mass.: Addison-Wesley,
1983
[11] P. Piret, Convolutional Codes: An Algebraic Approach. Cambridge, Mass.: MIT Press, 1988
[12] T. Kailath, Linear Systems. Englewood Cliffs, N.J.: Prentice-Hall, 1980

9 Applications-Convolutional Codes and Algebraic System Theory

529

2 Origins
In this section we give a brief introduction to convolutional codes in the context
of multidimensional linear system theory, describe the pioneering work of
Massey and Sain in this field, and finally describe the main algebraic concept
upon which [F1] was based (the invariant factor theorem and its extensions),
whose usefulness had been shown by the work of Kalman et al. [KFA].

2.1 Introduction to Convolutional Codes


Convolutional codes were invented in 1954 by Elias [1954], and became popular
after the invention of attractive decoding algorithms such as sequential decoding
by Wozencraft [1961], threshold decoding by Massey [1963], and the Viterbi
algorithm [1967].
A general rate-k/n convolutional encoder is a k-input, n-output, linear,
time-invariant, causal, invertible, and realizable sequential circuit over some finite
field F = GF(q), typically the binary field GF(2). Thus the natural setting for
an algebraic theory of convolutional codes is the algebraic theory of
multidimensional linear systems.
But the coding theory point of view can often be somewhat different from
that of linear system theory. For instance:
The field F is always finite, unlike the real or complex field. This has the
important consequence that if the state space of the encoder has finite
dimension ν, then the state space is in fact finite, with q^ν distinct states; i.e.
convolutional encoders are finite-state machines. Also, problems of analysis,
such as stability, do not occur in coding theory; coding problems all lie in
the field of algebra (or of geometry).
Coding theory is concerned only with discrete-time systems, not
continuous-time. Many algebraic-system-theoretic concepts and results are more
transparent and intuitive in the discrete-time domain.
In linear system theory, the input/output relation is usually of central interest.
In coding theory, what is most important is the set of all output sequences
that can be produced by the encoder, i.e. the code. One does not care much
about which input sequence produces which output (code) sequence, as long
as certain elementary conditions are satisfied, to be discussed later.
These factors led naturally toward an algebraic approach, and indeed to
the formulation of questions which might not have been so natural in algebraic
system theory, but whose answers seem nonetheless to have proved illuminating
there. It was of course a fortunate coincidence that the foundation work of
Kalman and others in multidimensional systems theory had recently become
developed to the point of accessibility to a researcher outside of that field.
In this theory, a k-input, n-output discrete-time linear system is characterized
by a k x n matrix G whose terms represent its responses to unit inputs at time


zero. In coding theory G is called a generator matrix, and its elements are
represented by formal power series g_ij(D) in the indeterminate D (the delay
operator) over F. When the system is finite-dimensional, every g_ij(D) is a rational
function, i.e. a quotient of polynomials in D. The set of all rational functions
in D over F is denoted by F(D), and the set of all polynomials by F[D]. Indeed,
every g_ij(D) is a realizable function, i.e. a rational function that represents a
causal sequence when expanded as a formal Laurent series in D (sometimes
called a proper rational function). We denote the set of all realizable functions
in D over F as Frz[D].
If x is a k-tuple of sequences of elements of F, called an input sequence, then
the encoder generates an n-tuple of output sequences, called the code sequence
y = xG. The code C is the set of all code sequences, C = {y = xG}, as x ranges
through all possible input sequences. An input sequence may continue
indefinitely, but for technical reasons we require, if it is nonzero, that it 'start'
at some time; i.e. x ∈ [F((D))]^k, where F((D)) denotes the set of all formal Laurent
series f(D) = f_d D^d + f_{d+1} D^{d+1} + ···, where the initial index d, called the delay,
may be negative; it is denoted by del f(D) = d. (The delay of the zero sequence
0(D) is defined as del 0(D) = ∞.) The set of all formal power series is the set of all
formal Laurent series with nonnegative delay, and is denoted by F[[D]]; all
realizable functions are thus in F[[D]]. We say that a formal Laurent series
f(D) is finite if all of its coefficients are zero after some time, and then define
its degree deg f(D) as the index of the last nonzero coordinate. The set F[D]
of polynomials is the subset of F((D)) with del f(D) ≥ 0 and finite degree, plus
the zero sequence 0(D), for which we define deg 0(D) = −∞.
For example, Fig. 1 illustrates a simple rate-1/2 convolutional encoder
over the binary field F = GF(2). The input bit sequence enters a two-stage shift
register, and two output sequences are generated from the current input bit
and the two stored bits by linear additions over GF(2) (mod-2 addition). The
generator matrix of this encoder is G = [1 + D², 1 + D + D²]; the code is the
set of code sequences C = {y = xG : x ∈ F((D))}; and the state space has dimension
ν = 2, with q^ν = 2² = 4 distinct states. The dimension of the state space is the
maximum degree of the two generator polynomials.
The most important parameter for code performance is the minimum
Hamming distance dH(C) between code sequences, sometimes called the free
distance, where the Hamming distance between two sequences is the number
of symbols (elements of F) in which they differ. The main object of convolutional
code design is to maximize dH(C) for a given rate k/n and state space dimension
ν. Since a convolutional code C is linear, this is the same as minimizing the
Fig. 1. Four-state rate-1/2 binary convolutional encoder


minimum Hamming weight (number of nonzero terms) of any nonzero code
sequence. For example, for the code generated by the encoder of Fig. 1,
dH(C) = 5, with the code sequence [1 + D², 1 + D + D²] being the only weight-5
code sequence.
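This free distance can be confirmed by brute force: since the code is linear, it suffices to minimize the weight of the output over nonzero inputs, and for this (noncatastrophic) encoder short polynomial inputs suffice. A Python sketch (the 8-bit search bound is an assumption that happens to be enough here):

```python
from itertools import product

def conv_gf2(x, g):
    """Product of two GF(2) polynomials given as coefficient lists."""
    out = [0] * (len(x) + len(g) - 1)
    for i, xi in enumerate(x):
        if xi:
            for j, gj in enumerate(g):
                out[i + j] ^= gj
    return out

g1, g2 = [1, 0, 1], [1, 1, 1]   # generators 1 + D^2 and 1 + D + D^2
d_free = min(
    sum(conv_gf2(x, g1)) + sum(conv_gf2(x, g2))   # total Hamming weight
    for x in product([0, 1], repeat=8) if any(x)  # all nonzero short inputs
)
print(d_free)  # 5, achieved by the input x = 1
```

The minimizing input x = 1 simply reads out the generator row itself, whose weight is 2 + 3 = 5.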

2.2 Results of Massey and Sain


The principal prior work in this field was that of Massey and Sain [1967, 1968],
who made the basic connections between convolutional coding theory and
algebraic system theory.
In [1967], Massey and Sain recognize that two encoders can be considered
equivalent for coding purposes if they generate the same set of output sequences
(the same code), and that there is a polynomial encoder equivalent to any
rational encoder (multiply G through by the least common multiple of the
denominators of its elements).
In [1968], they consider the case of a rate-k/n convolutional encoder of the
type of Fig. 1; i.e. with a polynomial generator matrix G = {g_ij(D) ∈ F[D],
1 ≤ i ≤ k, 1 ≤ j ≤ n}. They define an "inverse with delay L" for an encoder G as
a realizable linear sequential circuit defined by an n x k generator matrix G⁻¹
such that GG⁻¹ = D^L I_k, where I_k is the k x k identity matrix. Such a matrix
G⁻¹ is called a (right) inverse of G if L = 0, or more generally a right
'pseudo-inverse' for any L. (Note: this definition differs from the usual definition of a
pseudo-inverse of a matrix of less than full rank.) They note that it is desirable
for coding purposes that any such encoder have a right pseudo-inverse G⁻¹
that is polynomial; for then an estimate ŷ of the transmitted code sequence y
can be converted to a delayed estimate x̂ of the input sequence x by the relation
ŷG⁻¹ = D^L x̂, without any danger that a finite number of errors in ŷ will lead
to an infinite number of errors in x̂. They show that if there is any infinite
(nonpolynomial) input sequence x such that y = xG is finite (polynomial), then
G cannot have a polynomial right pseudo-inverse G⁻¹; for if it did, then yG⁻¹
would have to be polynomial, a contradiction. (They imply but do not actually
prove the converse.)
The main result of Massey and Sain [1968] is that a polynomial encoder
G has a polynomial right pseudo-inverse G⁻¹ if and only if the greatest common
divisor of the determinants of the k x k submatrices of G (the k x k minors of
G) is D^L, for some L. They also show that the minimum delay L_min of any
pseudo-inverse is bounded by L_min ≤ L, and that the upper bound holds with
equality when k = 1.
Subsequently Massey's student Olson [1970] was able to prove that the
minimum delay L_min of any realizable (proper rational) encoder is equal to the
difference between the greatest common delay of the k x k minors and the
greatest common delay of the (k − 1) x (k − 1) minors of G (see Sect. 3.1).


can lead to an infinite number of errors in x̂, catastrophic error propagation.
This terminology does not appear in [Massey and Sain, 1968], although this
paper is often cited for this concept. According to Massey, he first used this
term in its now-accepted sense in a seminar at U.C.L.A. in 1969.
It is important to avoid catastrophic encoders not only in practice, but also
in searching for good codes. This avoids wasted effort; more importantly, it
may avoid wrong conclusions, if the search considers only polynomial inputs.
For example, the encoder G = [1 + D, 1 + D] might appear to generate a code
C with Hamming distance dH(C) = 4; however, the nonpolynomial input
sequence 1/(1 + D) gives the polynomial output sequence [1, 1], so the true
Hamming distance of C is only dH(C) = 2.
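The Massey-Sain criterion makes this check mechanical. For k = 1 the k x k minors are just the generator polynomials themselves, so one only has to test whether their gcd over GF(2) is a power of D; a sketch using sympy (the helper function is illustrative, not from the papers discussed):

```python
from sympy import symbols, Poly, GF

D = symbols('D')

def is_catastrophic(gens):
    """Massey-Sain test for a rate-1/n polynomial encoder over GF(2):
    the encoder is catastrophic iff the gcd of its generator polynomials
    (its 1 x 1 minors) is not a power of D."""
    polys = [Poly(g, D, domain=GF(2)) for g in gens]
    d = polys[0]
    for p in polys[1:]:
        d = d.gcd(p)
    return not d.is_monomial      # over GF(2), a power of D is exactly a monomial

print(is_catastrophic([1 + D**2, 1 + D + D**2]))  # False: gcd = 1
print(is_catastrophic([1 + D, 1 + D]))            # True: gcd = 1 + D
```

The encoder of Fig. 1 passes the test, while [1 + D, 1 + D] fails it, matching the discussion above.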
A systematic encoder G is one that has the k x k identity matrix I_k as its first
k columns (or, more generally, in some subset of k of its n columns). Massey's
student Costello [1969] was apparently the first to notice that every
convolutional encoder is equivalent to a systematic encoder, in general nonpolynomial.
A systematic encoder has a trivial polynomial inverse G⁻¹ = [I_k; 0]^T
with delay zero, and therefore any systematic encoder is noncatastrophic.

2.3 Mathematical System Theory and Algebra


In attempting to extend the work of Massey and Sain and Massey's students,
it became apparent that the key was finding the right language for the problem.
Ultimately, this appeared to be that of mathematical system theory, and of the
algebra upon which it was based. To quote from the Acknowledgment of [F1]:
The principal results were at first obtained by tedious constructive
arguments; subsequently Prof. R.E. Kalman was good enough to send
along some of his work, which pointed out the usefulness of the invariant
factor theorem in the guise of the Smith-McMillan canonical form, and
which consequently was of great value in simplifying and clarifying the
development.
In particular, the paper refers to [Kalman, 1965] and Chap. 10 of Kalman,
Falb and Arbib [1969] (by Kalman), which are concerned with algebraic
realization theory.
In turn, these papers led to the mathematical literature, particularly to the
textbooks of Curtis and Reiner [1962] and Herstein [1964]. (These remain
excellent references, but today one might start with Hartley and Hawkes [1970].)
Here one finds the basic concepts of rings and modules. One learns that a
principal ideal domain (p.i.d.) is a particularly well-behaved form of ring, of
which the ring of integers and the ring F[D] of polynomials over a field F are
archetypes.
In any p.i.d. R, there is unique factorization of any nonzero element r of R
into a product of primes, up to units, where the units of R are its invertible
elements, and the primes are the elements of R that have no factors other than
themselves, up to units. Let P be the set of all primes in R (with the ambiguity
due to units resolved by some convention); then any nonzero element of R can
be written uniquely in the form

    r = u ∏_{p∈P} p^{e_p(r)},

where u is a unit in R, and e_p(r) ≥ 0 is the order of the prime p in r. The order
e_p(0) is defined conventionally as ∞ for all p ∈ P.
Thus such useful concepts as 'greatest common divisor' and 'least common
multiple' are well defined. Specifically, if r = {r_j} is a set of elements of R, their
greatest common divisor is defined as

    gcd r = ∏_p p^{e_p(r)},

where the order e_p(r) ≥ 0 of a prime p in r = {r_j} is defined as

    e_p(r) = min_j {e_p(r_j)}.

Similarly, the least common multiple is defined by replacing 'min' by 'max.'
It was amusing (and informative) to find the invariant factor theorem (IFT)
described, in paraphrase, as "the main (and perhaps only) theorem about
modules over principal ideal domains." In the engineering literature, for the
case where R is a ring of polynomials F[D], it is often called the Smith canonical
form [KFA]. The main part of the version of the IFT used in [F1] was this:
Invariant Factor Theorem. Let R be a principal ideal domain, and let G be
a k x n R-matrix. Then G has an invariant-factor decomposition

    G = A Γ B,

where A is a square k x k R-matrix with unit determinant, hence with an
R-matrix inverse A⁻¹; B is a square n x n R-matrix with R-matrix inverse B⁻¹
(i.e. A and B are 'unimodular'); and Γ is a k x n diagonal matrix, whose diagonal
elements γ_i, 1 ≤ i ≤ k, are called the invariant factors of G with respect to R.
The invariant factors are unique, and are computable as follows: let Δ_i be the
greatest common divisor of the i x i minors of G, with Δ_0 = 1 by convention;
then γ_i = Δ_i/Δ_{i−1}. Each invariant factor γ_i divides γ_{i+1} if γ_{i+1} ≠ 0, 1 ≤ i ≤ k − 1;
i.e. the orders e_p(γ_i) are nondecreasing with i for all p ∈ P.
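The computation rule in the theorem can be exercised directly. The sketch below takes R = Z and a small integer matrix (an arbitrary textbook-style example, not drawn from the text), computes Δ_i as the gcd of the i x i minors, and reads off γ_i = Δ_i/Δ_{i−1}:

```python
from functools import reduce
from itertools import combinations
from math import gcd

from sympy import Matrix

def invariant_factors(rows):
    """Invariant factors over R = Z via Delta_i = gcd of the i x i minors,
    gamma_i = Delta_i / Delta_{i-1} (Delta_0 = 1 by convention)."""
    A = Matrix(rows)
    k = min(A.rows, A.cols)
    deltas = [1]
    for i in range(1, k + 1):
        minors = [int(A[list(r), list(c)].det())
                  for r in combinations(range(A.rows), i)
                  for c in combinations(range(A.cols), i)]
        deltas.append(reduce(gcd, (abs(m) for m in minors)))
    return [deltas[i] // deltas[i - 1] for i in range(1, k + 1)]

print(invariant_factors([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]))  # [2, 6, 12]
```

As the theorem predicts, each invariant factor divides the next (2 | 6 | 12).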
For any p.i.d. R, we can create a ring of fractions of R by declaring that a
certain subset P' of the primes have inverses. If the set S is then the multiplicative
subset of R consisting of all products of elements of P', then the ring of fractions
Q = S⁻¹R is the set of all elements {q = r/s: r ∈ R, s ∈ S}. Then Q is a p.i.d. that
contains R; its primes are the primes of R that do not lie in S. If P' is the set
P of all primes in R, then every nonzero element of R is given an inverse, and
Q = S⁻¹R is actually a field, called the field of quotients of R.
For example, the set of rational functions F(D) is the field of quotients of
the ring of polynomials F[D]. The set of realizable functions Frz[D] is the ring
of fractions of F[D] in which every prime polynomial has an inverse except the
polynomial D, so that D is the only prime in Frz[D]. The greatest common


divisor of a set of elements of Frz[D] must therefore be a power of D, say D^L,
and we then call L the greatest common delay of such a set. Another interesting
ring of fractions of F[D] is the set of all finite formal Laurent series, which is
obtained if only the prime D is given an inverse.
In a ring of fractions Q of R, there is still unique factorization

    q = u ∏_{p∈P} p^{e_p(q)},

but now e_p(q) can be negative for p ∈ P'. The numerator and denominator of an
element q of Q may be defined as

    num q = u ∏_{p∈P_n} p^{e_p(q)},
    den q = ∏_{p∈P_d} p^{−e_p(q)},

where P_n is the set of all p ∈ P for which e_p(q) > 0, and P_d is the set of all p ∈ P'
for which e_p(q) < 0. Similarly, the notion of greatest common divisor generalizes,

    gcd q = ∏_p p^{e_p(q)},

where the order e_p(q) of a prime p in q = {q_j} is defined as e_p(q) = min_j {e_p(q_j)},
which can now be negative for p ∈ P'. In words, gcd q is the greatest common
divisor of the numerators of q divided by the least common multiple of the
denominators.
The invariant factor theorem can then be extended in the following way,
using the idea that a Q-matrix {q_ij} becomes an R-matrix if every element q_ij
is multiplied by the least common multiple of their denominators. This is
sometimes called the Smith-McMillan canonical form [KFA]:
Invariant Factor Theorem (Extension). Let R be a principal ideal domain,
let Q be a ring of fractions of R, and let G be a k x n Q-matrix. Then G has an
invariant-factor decomposition with respect to R

    G = A Γ B,

where A is a k x k R-matrix with an R-matrix inverse A⁻¹; B is an n x n R-matrix
with R-matrix inverse B⁻¹; and Γ is a k x n diagonal Q-matrix, whose diagonal
elements γ_i = α_i/β_i, 1 ≤ i ≤ k, are called the invariant factors of G with respect
to R. The invariant factors are unique, and are computable as follows: let Δ_i
be the greatest common divisor of the i x i minors of G, with Δ_0 = 1 by convention;
then γ_i = Δ_i/Δ_{i−1}. Each invariant factor γ_i divides the next, in the sense that
the orders e_p(γ_i) are nondecreasing with i for all p ∈ P.
If Q is the field of quotients of R, then G generates a vector space over Q
of dimension equal to the number of nonzero invariant factors of G with respect
to R, which is called the rank of G. The rank of G is thus the greatest value of
i for which the i x i minors of G are not all zero. G has an n x k Q-matrix right
inverse G⁻¹ = B⁻¹Γ⁻¹A⁻¹ if and only if G has full rank k.


References
[1] P. Elias, "Error-free coding," IRE Transactions on Information Theory, Vol PGIT-4, pp 29-37,
1954
[2] J.M. Wozencraft and B. Reiffen, Sequential Decoding. Cambridge, Mass.: MIT Press, 1961
[3] J.L. Massey, Threshold Decoding. Cambridge, Mass.: MIT Press, 1963
[4] A.J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding
algorithm," IEEE Transactions on Information Theory, Vol IT-13, pp 260-269, 1967
[5] J.L. Massey and M.K. Sain, "Codes, automata, and continuous systems: Explicit
interconnections," IEEE Transactions on Automatic Control, Vol AC-12, pp 644-650, 1967
[6] J.L. Massey and M.K. Sain, "Inverses of linear sequential circuits," IEEE Transactions on
Computers, Vol C-17, pp 330-337, 1968
[7] R.R. Olson, "Note on feedforward inverses for linear sequential circuits," IEEE Transactions
on Computers, Vol C-19, pp 1216-1221, 1970
[8] D.J. Costello, "Construction of convolutional codes for sequential decoding," Tech Rpt
EE-692, U. Notre Dame, August 1969
[9] R.E. Kalman, "Irreducible realizations and the degree of a rational matrix," J
SIAM Control, Vol 13, pp 520-544, 1965
[10] R.E. Kalman, P.L. Falb and M.A. Arbib, Topics in Mathematical System Theory. New York:
McGraw-Hill, 1969; Chap. 10
[11] C.W. Curtis and I. Reiner, Representation Theory of Finite Groups and Associative Algebras.
New York: Interscience, 1962; pp 94-96
[12] I.N. Herstein, Topics in Algebra. New York: Ginn-Blaisdell, 1964
[13] B. Hartley and T.O. Hawkes, Rings, Modules and Linear Algebra. London: Chapman and
Hall, 1970

536

G. D. Forney, Jr

3 Main Results
We now summarize the main results of [F1], [F2], and [F3], with a few
embellishments.

3.1 Inverses
Questions about whether convolutional encoders (or more general linear
systems) characterized by a transfer-function matrix G have inverses of various
kinds are often easily answered by appeal to the invariant factor theorem or
its extension.
Generally, if G is a k x n matrix over some ring of fractions Q of a p.i.d. R,
where k ≤ n, then G has an R-matrix inverse G⁻¹ if and only if the numerator
α_k of the last invariant factor γ_k of G with respect to R is 1; i.e. if the orders
e_p(γ_k) are nonpositive for all p ∈ P (which implies that e_p(γ_i) ≤ 0 for 1 ≤ i ≤ k for
all p ∈ P). In this case, the inverse is given by G⁻¹ = B⁻¹Γ⁻¹A⁻¹.
If G has full rank, then it has an R-matrix pseudo-inverse G⁻¹ given by
G⁻¹ = α_k B⁻¹Γ⁻¹A⁻¹ such that GG⁻¹ = α_k I_k, and if G' is any other R-matrix
such that GG' = α I_k for some α ∈ R, then α_k divides α; i.e. G has a pseudo-inverse
with factor α if and only if

    e_p(α) ≥ e_p(γ_k),   all p ∈ P.

In particular, taking R as the ring of polynomials F[D] and Q as the ring
of realizable functions Frz[D], a realizable encoder G has a polynomial right
inverse G⁻¹ if and only if the greatest common divisor of the numerators of its
k x k minors is 1, and a polynomial right pseudo-inverse G⁻¹ if and only if this
greatest common divisor is a power of D, say D^L. The minimum delay of any
such pseudo-inverse is the degree of the numerator α_k of the last invariant factor
γ_k of G with respect to F[D], i.e. the order e_D(γ_k) of the prime D in γ_k, which
is equal to the difference of the greatest common delays of the numerators of
the k x k and (k − 1) x (k − 1) minors of G.
Similarly, taking R as Frz[D] in the ordinary invariant factor theorem, a
realizable G has a realizable pseudo-inverse G⁻¹ with delay d if and only if
d ≥ L, where the last invariant factor γ_k of G with respect to Frz[D] is equal to
D^L. (Since D is the only prime in Frz[D], γ_k must be a power of D.)
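For the rate-1/2 encoder of Fig. 1 these conditions are easy to verify concretely: the 1 x 1 minors 1 + D² and 1 + D + D² are coprime over GF(2), so a delay-0 polynomial right inverse exists, and the extended Euclidean algorithm produces it. A sketch, with GF(2) polynomials packed into integer bitmasks (bit i holds the coefficient of D^i):

```python
def pmul(a, b):
    """Product in GF(2)[D]; polynomials are bitmasks (bit i = coeff of D^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):
    """Quotient and remainder in GF(2)[D]."""
    q = 0
    while a and a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def egcd(a, b):
    """Extended Euclid: returns (g, s, t) with s*a + t*b = g in GF(2)[D]."""
    if b == 0:
        return a, 1, 0
    q, r = pdivmod(a, b)
    g, s, t = egcd(b, r)
    return g, t, s ^ pmul(q, t)

g1, g2 = 0b101, 0b111             # 1 + D^2 and 1 + D + D^2
g, s, t = egcd(g1, g2)
print(g == 1)                     # True: the minors are coprime
print(pmul(s, g1) ^ pmul(t, g2))  # 1: so [s, t]^T is a delay-0 right inverse
```

Here the algorithm returns s = 1 + D and t = D, so one delay-0 right inverse of G = [1 + D², 1 + D + D²] is G⁻¹ = [1 + D, D]^T.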

3.2 Minimal Encoders


The centerpiece of [F1] was the proof that every convolutional code C could
be generated by a canonical encoder G, called a minimal encoder, with various
desirable properties. We present here a development of this result along the
lines of [F3], which does not use the invariant factor theorem.

9 Applications-Convolutional Codes and Algebraic System Theory


Since G is a rational matrix, it generates a vector space V of dimension k (assuming full rank) over the field F(D). If C is the code generated by G, V is the set of rational code sequences in C, which are the output sequences that are generated by rational input sequences. Two rational generator matrices generate the same code C if and only if they are bases for the same rational vector space V.
Any set of k linearly independent n-tuples g_i = {g_ij(D), 1 ≤ j ≤ n}, 1 ≤ i ≤ k, in V may be taken as a basis G. In this section, we consider only polynomial bases G.
The degree of a polynomial n-tuple g = {g_j(D), 1 ≤ j ≤ n} is defined as

deg g = max {deg g_j(D), 1 ≤ j ≤ n}

The constraint lengths of a polynomial encoder G are then defined as the degrees of its k generators, ν_i = deg g_i, 1 ≤ i ≤ k. The overall constraint length is ν = Σ_i ν_i. A polynomial G has an obvious realization (controller canonical form) in which the k input sequences enter k shift registers of lengths ν_i, 1 ≤ i ≤ k, and the outputs are formed by linear combinations of the current inputs and the ν shift register outputs, as in Fig. 1. The total number of memory elements in the obvious realization, which is the dimension of its state space, is the overall constraint length of G.
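In a bit-mask representation of GF(2) polynomials (Python ints, bit i = coefficient of D^i), the constraint lengths and the overall constraint length of a polynomial encoder are immediate to compute; the rate-2/3 matrix below is an illustrative example, not one taken from the text:

```python
def deg(p):                           # degree of a nonzero GF(2) polynomial (int)
    return p.bit_length() - 1

# illustrative rate-2/3 polynomial encoder: rows are the generators g_i
G = [[0b11, 0b01, 0b10],              # g_1 = (1 + D, 1, D)
     [0b100, 0b11, 0b01]]             # g_2 = (D^2, 1 + D, 1)

nu = [max(deg(p) for p in row if p) for row in G]   # constraint lengths nu_i
print(nu, sum(nu))                    # [1, 2] 3  (overall constraint length)
```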
A minimal encoder for V is defined as a polynomial basis G of V that has least overall constraint length ν among all bases for V. The main theorem of [F3] says that the following statements are all equivalent:
(a) G is a minimal encoder for V.
(b) The greatest common divisor of the k × k minors of G is 1, and their greatest degree is equal to the overall constraint length ν of G.
(c) If y = xG is a polynomial code sequence, then (i) x is polynomial, and (ii) deg y = max {deg x_i + ν_i, 1 ≤ i ≤ k}.
(d) The dimension of the set of all polynomial code sequences y with deg y ≤ d as a vector space over F is equal to Σ_{i: ν_i < d} (d − ν_i), where the ν_i are the constraint lengths of G.
The first part of statement (b) is recognizable as a statement that all invariant factors γ_i of G with respect to F[D] are 1, which implies that G has a polynomial inverse G⁻¹. Such encoders are called 'basic' in [F1]. However, the second part is also necessary for minimality. The two parts can be expressed in a single unified setting, as we shall discuss further in Sect. 3.5.
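Condition (b) can be checked mechanically: form the k × k minors, take their gcd, and compare their greatest degree with ν. The sketch below does this over GF(2) for a hypothetical rate-2/3 encoder (bit-mask polynomial representation; not an example from the text), and finds that it is minimal:

```python
def deg(p):                           # degree of a nonzero GF(2) polynomial (int)
    return p.bit_length() - 1

def pmul(a, b):                       # GF(2) polynomial product
    r = 0
    while b:
        if b & 1: r ^= a
        a <<= 1; b >>= 1
    return r

def pmod(a, b):                       # remainder over GF(2)
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def pgcd(a, b):                       # Euclidean algorithm over GF(2)
    while b: a, b = b, pmod(a, b)
    return a

G = [[0b11, 0b01, 0b10],              # g_1 = (1 + D, 1, D),   nu_1 = 1
     [0b100, 0b11, 0b01]]             # g_2 = (D^2, 1 + D, 1), nu_2 = 2

# the three 2x2 minors of G
minors = [pmul(G[0][i], G[1][j]) ^ pmul(G[0][j], G[1][i])
          for i, j in [(0, 1), (0, 2), (1, 2)]]

g = 0
for m in minors:
    g = pgcd(g, m)
# gcd of minors is 1 and their greatest degree equals nu = 3: G is minimal
print(g == 1, max(deg(m) for m in minors) == 3)   # True True
```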
The first part of statement (c) follows from the first part of statement (b), and assures that a minimal encoder is not catastrophic. The second part of (c) follows from the second part of (b), and is called the predictable degree property, for the following reason. We have deg x_i g_i = deg x_i + deg g_i; however, in general, because of possible cancellation of high-order coefficients, we can say only that

deg y ≤ max {deg x_i + deg g_i, 1 ≤ i ≤ k}


G. D. Forney, Jr

The predictable degree property ensures that inequality never occurs in this expression, so that we can predict the degree of a code sequence y if we know the degrees deg x_i of the k components of the input sequence x. This means that the set of all code sequences y of a given degree can be easily enumerated, which is the content of statement (d).
Conversely, if we can easily tabulate the short polynomial code sequences in C (e.g. by searching a trellis diagram for C), then we can find a minimal encoder. Any polynomial code sequence of least degree may be taken as the first generator g_1. Then any polynomial code sequence of least degree that is not linearly dependent on g_1 may be taken as the second generator g_2, and so forth. When k linearly independent code sequences g_i have been found in this way, they must form a minimal encoder G. All minimal encoders thus have the same constraint lengths ν_i.
Given one encoder G for a code C, an equivalent minimal encoder can be
found by various algebraic manipulations of G in F[D] or F(D), as described
in [F1] and [F3]. However, as a practical matter, the method of tabulation of
the shortest linearly independent code sequences may be easier.
A minimal encoder does not just have the least overall constraint length of any polynomial encoder in the obvious realization; there is no realization of any encoder G for the code C that has fewer states. This is proved by observing that the generators g_i are code sequences that must be generated by any encoder for C, and the time shifts D^{-j} g_i of these sequences, 1 ≤ i ≤ k, 0 ≤ j ≤ ν_i − 1, truncated to time index zero and higher, are all nonzero sequences which must arise from different physical states at time zero in any realization of any encoder, since, from the main theorem about minimal encoders, no two of these truncated sequences differ by a code sequence in C. These sequences represent a state space of dimension ν = Σ_i ν_i over F, so the dimension of the state space of any encoder must be at least ν.
A rate-1/n encoder G comprises only a single generator n-tuple g. A minimal encoder is obtained by multiplying through by the least common multiple of the denominators of the rational functions g_j(D), and then dividing through by the greatest common divisor of the numerators; minimal encoders are thus unique, up to units of F. Thus the full algebraic apparatus is not needed for rate-1/n codes.
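For rate-1/n the reduction is just gcd arithmetic. A sketch over GF(2) (bit-mask polynomials as before), starting from the illustrative non-minimal generator g = ((1+D)², (1+D)(1+D+D²)): dividing out the common factor 1 + D leaves the generator (1+D, 1+D+D²). The example is made up to show the mechanics:

```python
def pdivmod(a, b):                    # quotient and remainder over GF(2)
    q = 0
    while a.bit_length() >= b.bit_length():
        s = a.bit_length() - b.bit_length()
        q |= 1 << s
        a ^= b << s
    return q, a

def pgcd(a, b):                       # Euclidean algorithm over GF(2)
    while b:
        a, b = b, pdivmod(a, b)[1]
    return a

g = (0b101, 0b1001)                   # ((1+D)^2, (1+D)(1+D+D^2)) = (1+D^2, 1+D^3)
c = pgcd(*g)                          # common factor: 0b11 = 1 + D
g_min = tuple(pdivmod(p, c)[0] for p in g)
print(g_min)                          # (3, 7), i.e. (1+D, 1+D+D^2)
```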
In summary, minimal encoders are polynomial encoders with polynomial inverses (and thus noncatastrophic) that can be realized in the obvious realization with as few memory elements as any encoder for the same code, and that also have the predictable degree property.

3.3 Dual Codes and Encoders


The inner product (y, z) of two row n-tuples y and z is defined as yzᵀ, the product of y with the transpose of z. Two n-tuples are orthogonal if their inner product is zero. The set of all n-tuples z that are orthogonal to a vector space V of dimension k is a dual vector space V⊥ of dimension n − k.


Applying these general principles to a convolutional code C, which is a set of code sequences y that forms a vector space of dimension k over the field F((D)) of formal Laurent series, we can define the dual convolutional code C⊥ as the set of all sequences z ∈ [F((D))]^n such that yzᵀ is the all-zero sequence for all y ∈ C. Then C⊥ is a vector space of dimension n − k over F((D)), and has an (n − k) × n generator matrix H. In fact, if G is any k × n generator matrix for C, then any (n − k) × n matrix H of rank n − k such that GHᵀ = 0 is a generator matrix for C⊥.
It is shown in [F1] and [F3] (with different proofs) that the set of k × k minors of a minimal encoder G for C is the same as the set of (n − k) × (n − k) minors of a minimal encoder H for the dual code C⊥. This has the interesting consequence that their overall constraint lengths are the same, ν = ν⊥, although of course {ν_i} ≠ {ν_i⊥} in general, since in general k ≠ n − k.
The transpose Hᵀ of a minimal encoder H for C⊥ is the generator matrix of an n-input, (n − k)-output linear sequential circuit, called a syndrome-former for C. If r is any sequence of n-tuples, then s = rHᵀ is a sequence of (n − k)-tuples, called the syndrome of r. A sequence r is a code sequence in C if and only if its syndrome is 0. More generally, each coset of C is associated with a unique syndrome sequence s, and consists of all r such that rHᵀ = s.
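Over GF(2), applying a syndrome-former is again just polynomial arithmetic. The sketch below uses the illustrative rate-1/2 generator G = (1+D+D², 1+D²) with dual generator H = (1+D², 1+D+D²), so that GHᵀ = 0 over GF(2); a code sequence gets syndrome 0, while a corrupted one does not:

```python
def pmul(a, b):                       # GF(2) polynomial product (bit-mask ints)
    r = 0
    while b:
        if b & 1: r ^= a
        a <<= 1; b >>= 1
    return r

G = (0b111, 0b101)                    # G = (1+D+D^2, 1+D^2)
H = (0b101, 0b111)                    # H = (1+D^2, 1+D+D^2); G H^T = 0 over GF(2)

def syndrome(r):                      # s = r H^T for a 2-tuple of sequences
    return pmul(r[0], H[0]) ^ pmul(r[1], H[1])

x = 0b11                              # input 1 + D
y = (pmul(x, G[0]), pmul(x, G[1]))    # code sequence y = xG
print(syndrome(y))                    # 0: code sequences have zero syndrome
print(syndrome((y[0] ^ 1, y[1])))     # nonzero: a single flipped bit is detected
```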
A code C may thus be characterized equally well by a syndrome-former Hᵀ as by a generator matrix G. This is particularly desirable for rate-(n − 1)/n codes, since their dual codes are rate-1/n codes, which are simple to analyze and enumerate. This observation has been exploited by Paaske [1974] and by Ungerboeck [1982] in their searches for high-rate convolutional and trellis codes.
By taking the dual of the obvious realization of H, we arrive at a realization of a syndrome-former Hᵀ in a standard form (the observer canonical form) which has the same number ν of memory elements as a minimal encoder G for C.
We show in [F2] that if the n outputs of G are connected directly to the n inputs of such a syndrome-former Hᵀ, then not only are the syndrome-former outputs always equal to zero, but the syndrome-former states always track the encoder states. This yields an isomorphism of the encoder and syndrome-former state spaces. Using this isomorphism, certain questions about C can be answered, such as: how many code sequences are all-zero over a given time span? Interestingly, the answers involve the constraint lengths ν_i⊥ of minimal encoders H for the dual code C⊥.
From this isomorphism, it also follows that, with the addition only of some combinational logic, we can develop k more outputs from a syndrome-former which are always equal to the encoder inputs. In other words, there exists an n-input, n-output circuit with polynomial generator matrix [G⁻¹; Hᵀ] that is realizable with ν memory elements such that if y = xG, then

y[G⁻¹; Hᵀ] = [x; 0],  or  G[G⁻¹; Hᵀ] = [I_k; 0_{k×(n−k)}],

so that the first k columns are a polynomial n × k right inverse G⁻¹ for G.


A polynomial syndrome-former Hᵀ can be used for resynchronization. If a code sequence y starts coming into Hᵀ at any time, with Hᵀ in any initial state, then after a finite time no greater than the highest degree of any element of Hᵀ, the state of the syndrome-former will be in the 'correct' state (the state corresponding to the current encoder state under this isomorphism). In this sense, a syndrome-former can be used as an 'observer' of a discrete-time system. Similarly, the dual code C⊥ has a syndrome-former Gᵀ, and its generator matrix H has a polynomial right inverse H⁻¹ such that H[Gᵀ; H⁻¹] = [0_{(n−k)×k}; I_{n−k}]. Thus the two n × n polynomial matrices [G⁻¹; Hᵀ] and [Gᵀ; H⁻¹]ᵀ satisfy

[Gᵀ; H⁻¹]ᵀ [G⁻¹; Hᵀ] = I_n;

i.e. they are each other's inverses. Consequently they are both unimodular polynomial matrices ('scramblers,' in the terminology of [F1]).
These matrices may be used in an interesting decomposition. Let r be any element of [F((D))]^n; then

r[G⁻¹; Hᵀ] = [x(r); s(r)],

where x(r) is a k-tuple of sequences that may be regarded as a noisy estimate (of the inputs), and s(r) is the syndrome of r, which identifies the coset of C to which r belongs. Let y(r) = x(r)G and t(r) = s(r)(H⁻¹)ᵀ be the code sequence corresponding to the noisy estimate and a coset representative sequence corresponding to the syndrome, which are both n-tuples of sequences. Then, as illustrated in Fig. 2,

r = r[G⁻¹; Hᵀ][Gᵀ; H⁻¹]ᵀ = [x(r); s(r)][Gᵀ; H⁻¹]ᵀ = y(r) + t(r)

Thus every r ∈ [F((D))]^n can be uniquely decomposed into a sum of a code sequence y(r) ∈ C and a coset representative sequence t(r) in a linear (n − k)-dimensional space T generated by the (n − k) × n generator matrix (H⁻¹)ᵀ; i.e. there is a direct sum coset decomposition

[F((D))]^n = C ⊕ T,

where the representatives t of the cosets of C themselves form a linear space T. The matrix G⁻¹G projects [F((D))]^n onto the k-dimensional subspace C,

Fig. 2. Decomposition of a sequence r of n-tuples with an encoder inverse G⁻¹, a syndrome-former Hᵀ, an encoder G, and a coset representative generator (H⁻¹)ᵀ


while Hᵀ(H⁻¹)ᵀ projects [F((D))]^n onto an orthogonal (n − k)-dimensional subspace T. This decomposition, briefly discussed in Appendix I of [F1], and the coset representative generator (H⁻¹)ᵀ have proved useful in recent work on trellis shaping [Forney 1989].

3.4 Systematic Encoders


Let A be any k × k submatrix of a minimal encoder G for C whose determinant |A| does not have D as a factor. There must be such a submatrix, else D would be a divisor of all k × k minors and G would not be minimal. Then the inverse A⁻¹ of A is a realizable (rational, causal) matrix, and G' = A⁻¹G is a realizable encoder for C with the k × k identity matrix as the corresponding k × k submatrix. As noted in Sect. 2, such a matrix is called a systematic encoder; it has the property that the input sequence x appears unchanged as part of the code sequence y = xG'.
Thus every code C may be generated by a systematic encoder G, which in general is not polynomial. It is shown in [F1], using the fundamental realizability results of Kalman et al. [KFA], that any systematic encoder for C may also be realized with ν memory elements, not necessarily in any particular form. As already noted, every systematic encoder also has a trivial polynomial inverse, effectively the k × k identity matrix I_k. Thus systematic encoders may also be regarded as a canonical class.
The general form of a systematic encoder, with the identity matrix I_k fixed in the first k columns, is G = [I_k; P], where P is some k × (n − k) 'parity-check matrix,' in general involving feedback. It is apparent by inspection that the (n − k) × n matrix H = [−Pᵀ; I_{n−k}] is an encoder for the dual code, also systematic. If G is a minimal encoder, one way of finding a minimal encoder for the dual code is to transform G to systematic form, write down the corresponding systematic encoder for the dual code, and then determine a minimal encoder equivalent to the systematic dual encoder.
If G is a minimal encoder for a rate-1/n code, comprising a single generator g, then a systematic encoder is obtained by dividing through by its first component g_1, assuming that it is not divisible by D. The resulting encoder G' = [1; P] has a simple realization in controller canonical form involving a single shift register of length ν = deg g, with the polynomial g_1 determining the feedback. Correspondingly, there is a simple realization of a rate-(n − 1)/n encoder H = [−Pᵀ; I_{n−1}] as the dual (observer canonical form) of a controller canonical realization for the dual rate-1/n code, involving a single shift register of length ν; this is the form preferred by Ungerboeck [1982] in his original paper on trellis codes.
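The division by g_1 can be simulated directly as a feedback computation producing the power-series quotient. The sketch below (illustrative, for the rate-1/2 generator g = (1+D+D², 1+D²)) computes w = u/g_1 term by term over GF(2) and then the parity p = w·g_2; feeding in u = g_1 yields parity g_2, confirming that the systematic encoder G' = (1, g_2/g_1) generates the same code:

```python
G1, G2 = 0b111, 0b101                 # g = (1+D+D^2, 1+D^2); g1 sets the feedback

def systematic_parity(u, n_steps=8):
    # w = u / g1 as a power series over GF(2): w_t = u_t + w_{t-1} + w_{t-2}
    w = 0
    for t in range(n_steps):
        wt = (u >> t) & 1
        for j in range(1, G1.bit_length()):
            if (G1 >> j) & 1 and t >= j:
                wt ^= (w >> (t - j)) & 1
        w |= wt << t
    # parity p = w * g2, truncated to n_steps terms
    p = 0
    for j in range(G2.bit_length()):
        if (G2 >> j) & 1:
            p ^= w << j
    return p & ((1 << n_steps) - 1)

print(systematic_parity(G1))          # 5, i.e. g2 = 1 + D^2: same code sequence
```

By linearity, the input u = (1+D)g_1 likewise produces the parity (1+D)g_2.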
There is a discussion in [F1] about the relative virtues of minimal and systematic encoders. Theoretically, the main virtue of minimal encoders is that they are a canonical basis for the set of all polynomial code sequences in C. The main virtue of systematic encoders is that there is a unique systematic encoder generating each code C, if the k × k identity matrix is fixed in the first k columns (and if we exclude codes whose k × k minor in these columns is a multiple of D, as we may always do by a permutation of columns). Therefore the systematic form is sometimes preferred for code searches, although the minimal form is usually preferred for analysis.

3.5 Generalized Minimal Bases and Poles at Infinity


There is a lengthy appendix in [F3], concerned with finding a symmetric statement of the conditions for minimality using the language of valuation theory [Monna, 1970]. The only substantial suggestion that came out of the reviews of [F3] was that the appendix might better be reserved for another paper; but after my assurances that there was no prospect of my ever writing another paper in this field, the editor relented and published it. With apologies to that editor, here is another presentation of the gist of this appendix, with some extensions.
Define a polynomial of formal degree μ over a field F as f(D) = f_0 + f_1 D + ⋯ + f_μ D^μ, where all coefficients including f_μ may take on any values in F, including 0. Denote the set of all such polynomials by F{D}.
There is unique factorization of the nonzero elements of F{D}, as follows. A polynomial of formal degree μ can be uniquely written as the product of an ordinary polynomial of degree μ' ≤ μ with the polynomial

1 + 0D + ⋯ + 0D^{μ−μ'} = (1 + 0D)^{μ−μ'},

a polynomial of formal degree μ − μ'. The set P* of primes in F{D} therefore consists of the set P of ordinary primes in F[D] (the irreducible polynomials), plus one more, the polynomial 1 + 0D, whose formal degree is 1. Every nonzero element of F{D} can then be written uniquely as

f = u Π_{P*} p^{e_p(f)},

where u is a unit (nonzero element of F), and e_p(f) ≥ 0 is the order of the prime p ∈ P* in f. (The order of any prime p ∈ P* in the zero polynomial is defined conventionally as e_p(0) = ∞.)
Now every nonzero ordinary rational function in F(D) can be written uniquely as the ratio of two elements of F{D} of the same formal degree, after reduction to lowest terms. Specifically, if g = f_1/f_2 is the ratio of two ordinary polynomials of degrees μ_1 and μ_2, respectively, then the numerator is multiplied by (1 + 0D)^{μ_2−μ_1} if μ_1 ≤ μ_2, or the denominator by (1 + 0D)^{μ_1−μ_2} otherwise. Every nonzero rational function can then be written uniquely as

g = u Π_{P*} p^{e_p(g)},


where now the order e_p(g) can be negative. For example, the delay of a rational function g is given by

del g = e_D(g)

Because the formal degrees of the numerator and denominator are equal, we have the product formula (a sum here, because we are dealing with exponents):

Σ_{P*} e_p(g)[deg p] = 0

In the case where g is an ordinary polynomial of degree μ in D, we therefore have

deg g = Σ_P e_p(g)[deg p] = −e_{1+0D}(g)
It is easy to see that for any g_1, g_2 ∈ F(D) and any p ∈ P*,

e_p(g_1) ≤ ∞ for all p ∈ P*, with equality iff g_1 = 0;
e_p(g_1 g_2) = e_p(g_1) + e_p(g_2);
e_p(g_1 + g_2) ≥ min {e_p(g_1), e_p(g_2)},

where inequality may occur because of cancellation. For example,

del(g_1 g_2) = del g_1 + del g_2;
del(g_1 + g_2) ≥ min {del g_1, del g_2};
deg(g_1 g_2) = deg g_1 + deg g_2;
deg(g_1 + g_2) ≤ max {deg g_1, deg g_2},

where inequality may occur because of cancellation of low-order coefficients in the expression for delay, or of high-order coefficients in the expression for degree.
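For a rational function given as a ratio of polynomials, the orders at the primes D and D⁻¹ (the delay and minus the degree) can be computed directly; a sketch, again with GF(2) bit-mask polynomials, for the illustrative g = D²/(1+D):

```python
def pmul(a, b):                            # GF(2) polynomial product (bit-mask ints)
    r = 0
    while b:
        if b & 1: r ^= a
        a <<= 1; b >>= 1
    return r

def e_D(p):                                # order of the prime D = trailing zeros
    return (p & -p).bit_length() - 1

def delay(num, den):                       # del g = e_D(g) = e_D(num) - e_D(den)
    return e_D(num) - e_D(den)

def degree(num, den):                      # deg g = -e_{D^-1}(g) = deg num - deg den
    return (num.bit_length() - 1) - (den.bit_length() - 1)

num, den = 0b100, 0b11                     # g = D^2 / (1 + D)
print(delay(num, den), degree(num, den))   # 2 1

# order rule at the prime D: del(g1 g2) = del g1 + del g2
g1, g2 = (0b10, 0b1), (0b100, 0b11)        # g1 = D, g2 = D^2/(1+D)
prod = (pmul(g1[0], g2[0]), pmul(g1[1], g2[1]))
print(delay(*prod) == delay(*g1) + delay(*g2))   # True
```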
A nonzero rational function can also be written uniquely as the ratio of two elements of F{D⁻¹} of the same formal degree. If p is a prime in F{D} of formal degree μ, then D^{−μ}p is a corresponding prime of degree μ in F{D⁻¹}; in particular, the prime 0 + D corresponds to 1 + 0D⁻¹, and the prime 1 + 0D to the prime 0 + D⁻¹. Under this correspondence, we have the same factorization and the same product formula whether we use elements of F{D} or F{D⁻¹}. This allows a symmetric treatment of D-transforms or z-transforms, for instance, where z = D⁻¹. We shall refer to the prime 0 + D ↔ 1 + 0D⁻¹ simply as D, and to 1 + 0D ↔ 0 + D⁻¹ simply as D⁻¹.
We may define deg g as −e_{D⁻¹}(g) for any rational function g; then deg g is the negative of the delay of g as a formal Laurent series in D⁻¹, which we denote as del_{D⁻¹} g:

deg g = −del_{D⁻¹} g = −e_{D⁻¹}(g)

Similarly, we may make the definition

deg_{D⁻¹} g = −del g = −e_D(g),

which coincides with the definition of the degree of a polynomial in D⁻¹.


The prime factors of positive order are factors of the numerator, and may be called the zeros of a nonzero rational function g, whereas its prime factors of negative order are factors of the denominator and may be called its poles. The product formula says that the number of zeros, weighted by their degrees, is equal to the number of poles, similarly weighted. If the prime D has positive (resp. negative) order, then g is said to have a zero (resp. pole) at zero of order e_D(g) (resp. −e_D(g)); similarly, the prime D⁻¹ is regarded as a zero or pole at infinity. (If we are working in F(D⁻¹) = F(z), the definitions of 'zero' and 'infinity' are interchanged.)
We may obtain subsets R of F(D) that are rings, in fact p.i.d.s, by excluding certain poles or zeros. The ring of realizable (proper) functions F_rz[D] is the set of elements of F(D) that have no poles at zero (or, equivalently, the elements of F(D⁻¹) that have no poles at infinity); i.e. in which the order of the prime D is nonnegative. The ring of polynomials F[D] is the set of elements that have no poles anywhere except at infinity; i.e. the order of every prime but D⁻¹ is nonnegative; the ring of polynomials F[D⁻¹] is the set of elements that have no poles anywhere except at zero; i.e. the order of every prime but D is nonnegative. The ring of finite sequences is the set of elements that have no poles except at zero or infinity. We shall denote the ring of elements of F(D) that have no poles at a particular prime p ∈ P* as F_p[D]; for example, F_rz[D] = F_D[D].
Since F(D) is the field of quotients of any of these subrings R, the extended invariant factor theorem may be used to decompose any k × n rational matrix G into a product AΓB, where A and B are invertible (unimodular) k × k and n × n R-matrices, while Γ is a diagonal F(D)-matrix whose diagonal elements are the invariant factors of G with respect to R. When R = F_p[D], each invariant factor γ_i is a power of p, γ_i = p^{e_p(γ_i)}, since p is the only prime in F_p[D].
The invariant factors may be calculated as follows. Let e_p(Δ_i) be the minimum order of p in any of the i × i minors of G. Then

e_p(γ_i) = e_p(Δ_i) − e_p(Δ_{i−1}),

where by convention e_p(Δ_0) = 0. The extended IFT shows that the orders e_p(γ_i) are nondecreasing with i.
If S is some set of primes p ∈ P*, and F_S[D] is the ring of all elements of F(D) that have no poles in S, then F_S[D] is the intersection of the rings {F_p[D], p ∈ S}, and the invariant factors of G with respect to F_S[D] are the products of the invariant factors of G with respect to each of the rings F_p[D] individually, since the orders of the i × i minors of G are unchanged for each p ∈ S. (This assumes that F(D) is the field of quotients of F_S[D], which occurs if and only if the complement of S contains at least one element of P* of degree 1; for example, in view of the product formula, F_{P*}[D] = F, so F(D) is not the field of quotients of F_{P*}[D].)
If q = {q_j} is a set of rational functions, then the order e_p(q) of a prime p in q is defined as

e_p(q) = min_j e_p(q_j)


We still have

e_p(q) ≤ ∞, with equality iff q = 0;
e_p(kq) = e_p(k) + e_p(q), k ∈ F(D);
e_p(q_1 + q_2) ≥ min {e_p(q_1), e_p(q_2)}

However, the product formula is now an inequality, since if q ≠ 0,

Σ_{P*} e_p(q)[deg p] ≤ Σ_{P*} e_p(q_1)[deg p] = 0

We define the defect of q as

def q = −Σ_{P*} e_p(q)[deg p] ≥ 0,  q ≠ 0

(If q = 0, then def q = −∞.) For example, if q is a nonzero set of polynomials with no common factor, so that e_p(q) = 0 for p ∈ P, then

def q = −e_{D⁻¹}(q) = deg q
We now have

def q = −∞ if q = 0; def q ≥ 0 otherwise;
def kq = def k + def q = def q, k ∈ F(D);
def(q_1 + q_2) ≤ max {def q_1, def q_2}
If G = {g_i} is a set of k vectors of rational functions, i.e. an F(D)-matrix, then we may define the defect of G as

def G = def {k × k minors of G} = −Σ_{P*} e_p(G)[deg p],

where e_p(G) is the minimum order of p in any of the k × k minors of G. We then have

def G = −∞ if rank(G) < k;  def G ≥ 0 if rank(G) = k

Further, if T is any k × k rational matrix with det T ≠ 0, then

def TG = def(det T) + def G = def G,

so def G is an invariant over any basis TG for the row space V of G. Thus we may say that def V = def G, or more generally def C = def G if C is the code generated by an encoder G and V is its rational subset.
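For a full-rank polynomial G the defect formula specializes neatly: only the finite primes dividing the gcd of the k × k minors and the prime D⁻¹ have nonzero order in the set of minors, so def G = (greatest degree of a k × k minor) − deg(gcd of the minors). A sketch for the illustrative rate-2/3 encoder used earlier, whose 2 × 2 minors over GF(2) are {1, 1+D+D³, 1+D+D²}:

```python
def deg(p):                           # degree of a nonzero GF(2) polynomial (int)
    return p.bit_length() - 1

def pmul(a, b):                       # GF(2) polynomial product
    r = 0
    while b:
        if b & 1: r ^= a
        a <<= 1; b >>= 1
    return r

def pmod(a, b):                       # remainder over GF(2)
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def pgcd(a, b):                       # Euclidean algorithm over GF(2)
    while b: a, b = b, pmod(a, b)
    return a

G = [[0b11, 0b01, 0b10],              # the illustrative rate-2/3 encoder again
     [0b100, 0b11, 0b01]]
minors = [pmul(G[0][i], G[1][j]) ^ pmul(G[0][j], G[1][i])
          for i, j in [(0, 1), (0, 2), (1, 2)]]

g = 0
for m in minors:
    g = pgcd(g, m)
defect = max(deg(m) for m in minors) - deg(g)
print(defect)                         # 3 = the overall constraint length nu
```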
We now wish to show that def C is the minimum number of memory elements in any realization of any encoder G for C, and that there exist encoders that can be realized with def C memory elements. Any such encoder may be regarded as a generalized minimal encoder, or a generalized minimal basis for C.
For this result, we need to refer to the main theorem of realization theory, which can be expressed as follows. If q = {q_j} is a set of elements of F(D), and the orders e_p(q) of the primes p ∈ P* are defined as above as e_p(q) = min_j {e_p(q_j)},

then define the zero degree and pole degree of the set q as follows:

zdg q = Σ_{P_n} e_p(q)[deg p];  pdg q = −Σ_{P_d} e_p(q)[deg p],

where P_n is the set of p ∈ P* for which e_p(q) > 0, and P_d is the set of p ∈ P* for which e_p(q) < 0. It follows that zdg q ≥ 0, pdg q ≥ 0, and

def q = pdg q − zdg q

Thus def q ≤ pdg q, with equality if and only if zdg q = 0; i.e. iff e_p(q) ≥ 0 for all p ∈ P*.
If the elements q_j of q are expressed as ratios of elements of F{D⁻¹}, then pdg q is the degree of the least common multiple of the denominators. Note that if q is a set of realizable functions, then e_D(q) ≥ 0, since e_D(q_j) ≥ 0 for any realizable function q_j, so pdg q is also the degree of the least common multiple of the denominators if the elements q_j are expressed as ratios of elements of the ring of polynomials F[D⁻¹].
Main theorem of realization theory. Let G be a k × n matrix of realizable functions, let e_p(Δ_i) be the orders of the primes p ∈ P* in the set of i × i minors of G, 1 ≤ i ≤ k, and let e_p(γ_i) = e_p(Δ_i) − e_p(Δ_{i−1}), 1 ≤ i ≤ k, with e_p(Δ_0) = 0 for all p ∈ P* by convention. Let

μ_i = pdg γ_i = −Σ_{P_d(γ_i)} e_p(γ_i)[deg p],

where P_d(γ_i) is the set of p ∈ P* for which e_p(γ_i) < 0, and let μ = Σ_i μ_i. Then there exists a realization of G with μ memory elements, and there exists no realization with fewer than μ memory elements.
Since e_p(γ_i) is nondecreasing with i, μ_i = pdg γ_i is nonincreasing with i, and the sets P_d(γ_i) are nested, with P_d(γ_k) the smallest. If we define def γ_i = −Σ_{P*} e_p(γ_i)[deg p] and zdg γ_i = pdg γ_i − def γ_i, then def γ_i is nonincreasing with i, zdg γ_i is nondecreasing with i, and

zdg γ_i = Σ_{P_n(γ_i)} e_p(γ_i)[deg p] ≥ 0,

where P_n(γ_i) is the set of p ∈ P* for which e_p(γ_i) > 0. Thus pdg γ_i ≥ def γ_i, with equality if and only if e_p(γ_i) ≤ 0 for all p ∈ P*. Finally, e_p(G) = e_p(Δ_k) = Σ_i e_p(γ_i) for all p ∈ P*. Thus

def C = def G = −Σ_{P*} e_p(G)[deg p] = −Σ_{P*} Σ_i e_p(γ_i)[deg p]
= Σ_i def γ_i ≤ Σ_i pdg γ_i = Σ_i μ_i = μ,

with equality if and only if e_p(γ_i) ≤ 0 for all p ∈ P*, 1 ≤ i ≤ k. Since e_p(γ_i) is nondecreasing with i, this occurs if and only if e_p(γ_k) ≤ 0 for all p ∈ P*.


In summary, an encoder G for C can be realized with def C memory elements if and only if e_p(γ_k) ≤ 0 for all p ∈ P*. We recognize that this is precisely the condition that assures that G has an F_p[D]-matrix inverse for all p ∈ P*, or an F_S[D]-matrix inverse for any subset S of P* such that F(D) is the field of quotients of F_S[D]. Such an encoder may therefore be called globally invertible.
A 1 × n row vector g_i is realizable with pdg g_i ≥ def g_i memory elements. A k × n matrix G = {g_i} is realizable in controller canonical form with Σ_i pdg g_i memory elements by linearly combining the outputs of the k circuits whose transfer functions are the row vectors g_i. Thus an encoder G can be realized with def G memory elements in controller canonical form if and only if def g_i = μ_i, 1 ≤ i ≤ k, and furthermore e_p(g_i) ≤ 0, 1 ≤ i ≤ k, for all p ∈ P*. (Note that this requires that the rows g_i be ordered so that def g_i is nonincreasing with i.) Such an encoder may be regarded as a generalized minimal encoder.
In general, for any k × n rational matrix G, y = xG satisfies

e_p(y) ≥ min_i {e_p(x_i g_i)} = min_i {e_p(x_i) + e_p(g_i)},

for all p ∈ P*. If inequality never occurs for some p ∈ P*, then the set G is said to be p-orthogonal.
If G is a generalized minimal encoder, then the matrix G' with rows g_i' = p^{−e_p(g_i)} g_i has e_p(g_i') = 0, 1 ≤ i ≤ k, and thus has an F_p[D]-matrix inverse (G')⁻¹. Then

y = xG = Σ_i x_i g_i = Σ_i x_i p^{e_p(g_i)} g_i' = Σ_i x_i' g_i' = x'G',

where x_i' = x_i p^{e_p(g_i)}, 1 ≤ i ≤ k. Thus

e_p(x') = min_i {e_p(x_i')} = min_i {e_p(x_i) + e_p(g_i)}

But now x' = y(G')⁻¹, and since (G')⁻¹ is an F_p[D]-matrix,

e_p(x') ≥ e_p(y)

Consequently e_p(y) = e_p(x') = min_i {e_p(x_i) + e_p(g_i)}; i.e. the set G is p-orthogonal, for all p ∈ P*. (Conversely, if there exists an x' such that e_p(y) < e_p(x'), then G' can have no F_p[D]-matrix inverse.) An encoder G that is p-orthogonal for all p ∈ P* is said to be globally orthogonal.
Putting all of this together, the following conditions are equivalent:
(a) G is a generalized minimal encoder.
(b) For all p ∈ P* and all g_i, e_p(g_i) ≤ 0, and e_p(G) = Σ_i e_p(g_i).
(c) If y = xG, then e_p(y) = min_i {e_p(x_i) + e_p(g_i)}, for all p ∈ P*.
(d) G is realizable in controller canonical form with def G memory elements.
This is a generalization of the main theorem of [F3], expressed in a symmetric form relative to all p ∈ P*.
An ordinary (polynomial) minimal encoder for C has e_p(G) = 0 for all p ≠ D⁻¹ and e_{D⁻¹}(G) = −def G. Since its rows g_i are polynomial, e_p(G) = 0 implies e_p(g_i) = 0 for p ≠ D⁻¹, and e_{D⁻¹}(G) = −def G implies e_{D⁻¹}(G) = Σ_i e_{D⁻¹}(g_i) = −Σ_i deg g_i = −Σ_i def g_i. A generalized minimal encoder for C is thus a generalization of a minimal encoder for C; the defects def g_i are generalized constraint lengths, and def G is a generalization of the overall constraint length of G.
Finally, G has a pseudo-inverse with factor p^L if and only if L ≥ e_p(γ_k). This completes the generalization of the main results of [F1] and [F3].
References
[1] E. Paaske, "Short binary convolutional codes with maximal free distance for rates 2/3 and 3/4," IEEE Transactions on Information Theory, Vol IT-20, pp 683-689, 1974
[2] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Transactions on Information Theory, Vol IT-28, pp 55-67, 1982
[3] G.D. Forney, Jr., "Trellis shaping," 1989
[4] A.F. Monna, Analyse Non-Archimédienne. Berlin: Springer, 1970
[5] R.E. Kalman, P.L. Falb and M.A. Arbib, Topics in Mathematical System Theory. New York: McGraw-Hill, 1969; Chap. 10


4 Subsequent Work
It is not possible in a paper of this length to give a full account of further developments in the algebraic theory of convolutional codes, although, in brief summary, there have not been that many. We give a short synopsis of the citations of [F1] (and [F2]), and an overview of the recent monograph of Piret [1988]. The impact on linear system theory seems to have been relatively greater; here we rely on the text of Kailath [1980], as well as a synopsis of citations of [F3].

4.1 Citation Index for [F1]
A citation index for [F1] was obtained by a search in the Science Citation Index data base covering roughly 1974-1988, augmented by a manual scan of the IEEE Transactions on Information Theory for 1971-75 and 1987-89, and by one or two other fortuitously discovered references. It contains 80 papers, not counting [F2] or [F3]. Nine of these papers also cite [F2] and nine also cite [F3].
Of these 80 references, 49 are about convolutional codes, 6 are about trellis codes, 6 are about miscellaneous codes, and the remaining 19 are in the field of algebraic system theory. Of the 61 coding papers, the great majority (42) are in the IEEE Transactions on Information Theory.
Most of these references are simply to establish terminology, such as:
an encoder is characterized by a generator matrix G with elements in F(D);
a convolutional code is a vector space over F(D) of dimension k;
two encoders are equivalent if they generate the same set of code sequences C;
the overall constraint length of a rate-k/n polynomial encoder G is the number of memory elements in an obvious (controller canonical) realization;
the constraint length of an arbitrary code C is the number of memory elements in a minimal encoder G for C (important for Viterbi algorithm decoders);
catastrophic error propagation (with prior reference to Massey and Sain) may occur if a nonpolynomial encoder input can result in a polynomial output;
a decoder can be decomposed into a code sequence estimator and an inverse map.
The main efforts toward algebraic constructions of convolutional codes seem to have been some early papers by Justesen et al. [Massey et al., 1973; Justesen, 1973 and 1975] and a series of papers by Piret (some with Delsarte), which work is summarized in Piret's 1988 monograph; see Sect. 4.2. The algebraic theory is also used to aid searches for good convolutional codes in such papers as Paaske [1974] and Shusta [1977], and is used to aid searches for good trellis codes in Ungerboeck [1982], as well as to derive a desirable form for a systematic rate-(n − 1)/n encoder. Finally, it naturally appears in studies of syndrome decoders, such as in the papers of Schalkwijk et al. [1976], [1978] and those of Reed and Truong [1983], [1984].
References
[1] J.L. Massey, D.J. Costello, Jr. and J. Justesen, "Polynomial weights and code constructions," IEEE Transactions on Information Theory, Vol IT-19, pp 101-110, 1973
[2] J. Justesen, "New convolutional code constructions and a class of asymptotically good time-varying codes," IEEE Transactions on Information Theory, Vol IT-19, pp 220-235, 1973
[3] J. Justesen, "Algebraic construction of rate-1/v convolutional codes," IEEE Transactions on Information Theory, Vol IT-21, pp 577-580, 1975
[4] E. Paaske, "Short binary convolutional codes with maximal free distance for rates 2/3 and 3/4," IEEE Transactions on Information Theory, Vol IT-20, pp 683-689, 1974
[5] T.J. Shusta, "Enumeration of minimal convolutional encoders," IEEE Transactions on Information Theory, Vol IT-23, pp 127-132, 1977
[6] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Transactions on Information Theory, Vol IT-28, pp 55-67, 1982
[7] J.P.M. Schalkwijk and A.J. Vinck, "Syndrome decoding of binary rate-1/2 convolutional codes," IEEE Transactions on Communications, Vol COM-24, pp 977-985, 1976
[8] J.P.M. Schalkwijk, A.J. Vinck and K.A. Post, "Syndrome decoding of rate-k/n convolutional codes," IEEE Transactions on Information Theory, Vol IT-24, pp 553-562, 1978
[9] I.S. Reed and T.K. Truong, "New syndrome decoder for (n, 1) convolutional codes," Electronics Letters, Vol 19, pp 344-346, 1983
[10] I.S. Reed and T.K. Truong, "New syndrome decoding techniques for the (n, k) convolutional codes," IEE Proceedings F - Communications, Radar, and Signal Processing, Vol 131, pp 412-416, 1984

4.2 The Work of Piret


Since his doctoral thesis in 1977, Piret has made a sustained effort to discover
and investigate classes of convolutional codes whose properties could be
effectively analyzed by algebraic methods. This work is summarized in his 1988
monograph [Piret, 1988].
To quote from his introduction:
The main characteristics of [algebraic block codes, such as the
Reed-Solomon and BCH codes] include:
1. Flexibility in the choice of the parameters of the codes.
2. Ease of enumerating the codes of this class.
3. High efficiency when used on a noisy channel.
4. An a priori bound (the BCH bound) on their error-correcting
capability.
5. A simple implementation, compared to that needed for a general block
code.
Unfortunately, no similar achievement is known to date for
convolutional codes: most convolutional codes have been constructed

9 Applications-Convolutional Codes and Algebraic System Theory

551

by computer search, and algebraic structures that guarantee good
efficiency have proved difficult to find. The main part of this book is
devoted to an attempt to fill part of this gap between the construction
methods for block codes and for convolutional codes. More precisely,
we shall analyze in some detail a construction scheme for convolutional
codes that fulfills the first three of the five characteristics
above and partially fulfills the fourth one. Although we obtain no
equivalent of the BCH bound, the structure of the codes considered will
indeed make their analysis much easier than that of arbitrary codes
having the same parameters.
Piret first develops the theory of minimal encoders, and uses it as a basic
tool in further developments. He considers only polynomial encoders G. He
defines the complexity of an encoder as its overall constraint length in an obvious
realization, and defines an encoder as minimal if it achieves the least complexity
among all polynomial encoders generating a given rate-k/n code C, as in [F1].
His first construction of a minimal encoder is by choosing the k shortest linearly
independent code sequences in C, using an algebraic procedure that he calls
"not very practical."
In addition to the predictable degree property of [F1], Piret defines an
encoder G as having the predictable delay property if del(xG) = del x for any
set x of input sequences, and the predictable span property if it has both
predictable delay and predictable degree. In his definition, the predictable degree
property includes the condition that no nonpolynomial input can produce a
polynomial output; i.e. the encoder must not be catastrophic. He then proves
that an encoder is minimal if and only if it has the predictable span property.
This is equivalent to one of the results of [F1], but the proof is different and
is simple to follow.
Piret then gives a more efficient method for determining a minimal
encoder from an arbitrary encoder G for C. The algorithm first finds a
basic encoder by a procedure that resembles the algorithm that produces
the invariant factor decomposition G = AΓB for G as an F[D]-matrix, but
recognizes and exploits the fact that the desired basic encoder is simply the first
k rows of B, so that reduction to a Hermite canonical form by elementary row
operations is sufficient. The resulting basic encoder has the predictable degree
property if and only if its matrix of high-order coefficients has rank k; if it fails
to have this property, then a series of reductions as in [F1] will produce the
desired minimal encoder.
The main body of Piret's work is concerned with finding classes of convolutional codes that are invariant under certain automorphisms, defined as follows.
Let K be a subgroup of the multiplicative group of all nonsingular n × n matrices
over F. Let k be a sequence {k_i} of elements k_i ∈ K, where i is the 'time index.' If y = {y_i}
is any code sequence in C, then its transform k(y) is the sequence {y_i k_i}. The
sequence k is an automorphism of C if k(C) = C, and the set of all such
automorphisms is the automorphism group Aut_K(C) of the code C.


For example, if K is the group of cyclic permutations of n elements, and C
is invariant under some subset of the set of all sequences k = {k_i}, k_i ∈ K, then
C is a convolutional code that can be regarded as a generalization of a cyclic
block code. Piret shows that there are nontrivial examples of such codes with
good distance properties.
To describe Piret's work further would go well beyond the scope of this
review or of the author's understanding. Essentially, by restricting consideration
to codes that satisfy constraints of this type, Piret is able to show that all codes
with certain parameters are equivalent to one of a small set of codes that can
be easily searched, and that good codes can be found by this procedure. He is
also able to construct good codes for figures of merit other than Hamming
distance. The interested reader should consult Piret's monograph.
Reference
[1] P. Piret, Convolutional Codes: An Algebraic Approach. Cambridge, Mass.: MIT Press, 1988

4.3 Algebraic System Theory-Kailath's Text


The concepts and results of [F1] and [F3] seem to have had greater impact
and use in algebraic system theory than in convolutional coding theory. This
statement is not made on personal authority, since I have not followed this
field; however, it is supported by the assimilation of these concepts and results
in the well-known text of Kailath [1980], as briefly recapitulated in this section,
and by the length and breadth of the citation index for [F3], described in
Sect. 4.4.
Chapter 6 of Kailath's book is a long and comprehensive treatise, entitled
"State-space and matrix-fraction descriptions of multivariable systems," that in
many respects is the centerpiece of the book. It would be much better for the
reader to consult the original than any summary that could be provided here.
Our purpose here is merely to trace the visible effects of [F1] and [F3].
In Sect. 6.3, Kailath launches into a discussion of properties of polynomial
matrices with the following comments:
The basic mathematical theory of polynomial matrices is drawn from
MacDuffee [1933], Gantmakher [1959], Wedderburn [1934], and
Newman [1962]. The application to systems problems seems to have first
been made by Belevitch [1968], Popov [1969], Rosenbrock [1970], Forney
[Fl], and Wolovich [1974]; in so doing, some new mathematical results
have been added by these authors and others, see especially Forney [F3].
Subsequently, in discussing the properness of a matrix-fraction description
H(s) = N(s)D⁻¹(s), Kailath introduces column (or row) reduced matrices and poles
and zeros at infinity. He defines the square matrix D(s) as column reduced if
the degree of its determinant is equal to the sum of the degrees of its columns,
and comments:

[The] name [column reduced] was suggested by Heymann [1975]. The
concept was first introduced into system theory by Wolovich [1971], who
used the ... name column proper. An early mathematical treatment can
be found in Wedderburn [1934], Chap. 4. (The physical significance of
the concept is that column-reduced matrices have no zeros at infinity ....)
After showing that this property depends on the nonsingularity of the
high-order coefficient matrix, he extends the concept (after Wedderburn) to
rectangular matrices of full rank, and shows the relation to proper solutions
H(s) of equations of the type H(s)D(s) = N(s). He then introduces the predictable
degree property of [F3] (referring also to Vekua [1967]), and uses it to show
that all equivalent (under unimodular transformations) column reduced matrices
have the same column degrees. A further application yields a division theorem
for polynomial matrices, to obtain a strictly proper remainder.
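To make the column-reducedness test concrete, here is a small sketch of my own (the matrix D(s) is a made-up example, not one of Kailath's): it checks that deg det D(s) equals the sum of the column degrees and, equivalently, that the high-order coefficient matrix is nonsingular.

```python
# Polynomials in s are coefficient lists, lowest degree first; the matrix
# D(s) = [[s^2+1, s], [s, s+1]] is a hypothetical example.

def pdeg(p):
    """Degree of a coefficient-list polynomial (-1 for the zero polynomial)."""
    return max((i for i, c in enumerate(p) if c != 0), default=-1)

def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def psub(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)
            for i in range(n)]

def det2(M):
    """Determinant of a 2 x 2 polynomial matrix."""
    return psub(pmul(M[0][0], M[1][1]), pmul(M[0][1], M[1][0]))

D = [[[1, 0, 1], [0, 1]],
     [[0, 1],    [1, 1]]]          # [[s^2+1, s], [s, s+1]]

col_degs = [max(pdeg(D[i][j]) for i in range(2)) for j in range(2)]

# High-order coefficient matrix: coefficient of s^(column degree) per entry
hoc = [[(D[i][j][col_degs[j]] if len(D[i][j]) > col_degs[j] else 0)
        for j in range(2)] for i in range(2)]
hoc_det = hoc[0][0] * hoc[1][1] - hoc[0][1] * hoc[1][0]

column_reduced = pdeg(det2(D)) == sum(col_degs)   # degree test
assert column_reduced == (hoc_det != 0)           # equivalent rank test
```

For this D(s), det D(s) = s³ + s + 1 has degree 3 = 2 + 1, and the high-order coefficient matrix [[1, 1], [0, 1]] is nonsingular, so both tests agree that D(s) is column reduced.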
The concept next appears in constructing a controller-form realization of a
matrix-fraction transfer function H(s) = N(s)D⁻¹(s) with state space dimension
equal to deg[det D(s)], assuming that D(s) is column reduced, attributed to
Wang [1971] and Wolovich [1971]. The column degrees are called the
controllability indices, and identified with the right Kronecker indices of a certain
matrix pencil. Similar results are obtained for an observer-form realization.
Section 6.5 is concerned with rational matrices. The development of the
Smith-McMillan canonical form leads to a definition of the poles and zeros
of a multivariable transfer function as the poles and zeros of the invariant factors.
This in turn leads to a discussion of poles and zeros at infinity, which are not
determined by the invariant factors with respect to the polynomials F[s]. Kailath
comments:
There are several problems in which it is important to keep track of the
behavior at s = ∞. Poles at s = ∞ correspond to nonproper systems (or
systems with differentiators), as may arise in constructing inverse systems,
while the zeros at ∞ are important in studying the asymptotic behavior
of multivariable root loci ...
One method of obtaining the pole-zero structure at infinity is to determine
the Smith-McMillan form (invariant factors) of H(s) with respect to F(s⁻¹).
However, Kailath goes on to give a full account of "Valuations and a direct
characterization of the Smith-McMillan form," following the appendix of [F3]
and the dissertation of Levy [1979].
There follows a section on "Nullspace structure; Minimal polynomial bases
and Kronecker indices," which begins:
A rational p × m matrix H(s) is singular if it is not square and invertible.
We shall see that the nature of this singularity can be characterized by
certain so-called left and right minimal (or Kronecker) indices.


We note first that the set of all rational m × 1 vectors {f(s)} such that
H(s)f(s) = 0 is a vector space (over the field of scalar rational functions)
called the right null-space of H(s)... If we had a vector space over the real
or complex numbers, its dimension would essentially characterize it. But
for our more general situation of a rational vector space, it turns out that
there is a richer structure, first noted by Kronecker (in connection with
linear pencils) and exposed in detail by Wedderburn and later,
independently, by Forney [F3]. Here we follow the slightly different
development in [Verghese, 1978], [Verghese and Kailath, 1979] (see also
[Vekua, 1967], Sect. 5).
This structure is captured by the notion of a minimal polynomial basis
for the (right null-)space...
[Kailath then develops the structure and properties of minimal bases,
and shows their relation to Kronecker indices.]
I.C. Gohberg first pointed out to the author that the significance of
minimal bases was perhaps first realized by J. Plemelj in 1908 and then
substantially developed in 1943 by N.I. Muskhelishvili and N.P. Vekua
(see the discussion in [Vekua, 1967], Sect. 5). These authors were studying
the so-called Riemann-Hilbert problem, which was later shown to be
closely related to the theory of Wiener-Hopf integral equations, as
described for example in the definitive paper of Gohberg and Krein [1960].
Certain so-called 'factorization indices' play an important role in this
theory, and it is therefore not surprising that these are closely related to
the Kronecker indices [as described more fully later] ...
(Although [F2] is not cited here, the right null-space is what would be called
the dual code in coding theory, and the right minimal (Kronecker) indices are
the constraint lengths of a minimal dual encoder, which play a central role
in [F2].)
The next section defines the defect of a transfer function H(s) as in [F3],
and shows that the defect of H(s) is equal to the sum of the left and right
Kronecker indices of H(s).
The following section discusses equations of the form H(s)A(s) = B(s), and
shows that the question of whether they can be solved with no poles p in some
subset S of P* can be answered by arguments in the same vein.
A series of 25 exercises develops a wide variety of additional specific results.
Finally, in the last part of the chapter, there is a discussion of "Popov or
polynomial-echelon matrix-fraction descriptions, ... [which] are quite
interesting [and] were first introduced by Popov [1969], and later studied by
Morf [1974], Forney [F3], Eckberg [1973], and Kung and Kailath [1977]..." This
development yields the Popov parameters {α_ij} as system invariants.
Connections are made to the problems of finding a left matrix-fraction description
A⁻¹B equal to ND⁻¹; to finding inverse systems; and to the minimal partial
realization problem.
In summary, the mathematics exposed in [F1]-[F3] seems to connect
broadly to wide areas of the algebraic theory of multivariable linear systems,
and to be a basic tool in this field. Of course, it is evident from Kailath's
scholarship that much of this was a rediscovery or recasting of concepts and
results that had been at least partially known previously, perhaps in other
contexts, and that a great many authors contributed to the development and
application of these ideas. Nonetheless, it seems fair to conclude that the
presentation of these ideas in 1970-1975 did substantially impact this field, and
that these ideas are now basic.
References
[1] T. Kailath, Linear Systems. Englewood Cliffs, N.J.: Prentice-Hall, 1980
[2] C.C. MacDuffee, The Theory of Matrices. Berlin: Springer, 1933
[3] F.R. Gantmakher, Theory of Matrices. New York: Chelsea, 1959
[4] J.H.M. Wedderburn, Lectures on Matrices. Providence, R.I.: Am. Math. Society, 1934; New
York: Dover, 1964
[5] M. Newman, Integral Matrices. New York: Academic Press, 1962
[6] V. Belevitch, Classical Network Theory. San Francisco: Holden-Day, 1968
[7] V.M. Popov, "Some properties of control systems with irreducible matrix transfer functions,"
Lecture Notes in Mathematics, Springer, Vol 144, pp 169-180, 1969
[8] H.H. Rosenbrock, State-Space and Multivariable Theory. New York: Wiley, 1970
[9] W.A. Wolovich, Linear Multivariable Systems. New York: Springer-Verlag, 1974
[10] M. Heymann, "Structure and realization problems in the theory of dynamical systems," CISM
Courses and Lectures, Springer, No. 204, 1975
[11] W.A. Wolovich, "The determination of state-space representations for linear multivariable
systems," Proc. Second IFAC Symp. on Multivariable Technical Control Systems, Düsseldorf,
1971
[12] N.P. Vekua, Systems of Singular Integral Equations. Noordhoff, The Netherlands, 1967.
(Russian original, 1950.)
[13] S.-H. Wang, "Design of linear multivariable systems," Memo No. ERL-M309, Electronics
Research Laboratory, Univ. Calif., Berkeley, 1971
[14] B. Levy, "Algebraic structure of 2-D systems," Ph.D. dissertation, Dept. of Electrical
Engineering, Stanford U., Stanford, Calif., 1979
[15] G. Verghese, "Infinite-frequency behavior in generalized dynamical systems," Ph.D.
dissertation, Dept. of Electrical Engineering, Stanford U., Stanford, Calif., 1978
[16] G. Verghese and T. Kailath, "Rational matrix structure," Proc. 1979 IEEE Conf. on Decision
and Control, 1979
[17] I.C. Gohberg and M.G. Krein, "Systems of integral equations on a half line with kernel
depending on the difference of arguments," Amer. Math. Soc. Transl., Vol 14, pp 217-287, 1960
[18] M. Morf, "Fast algorithms for multivariable systems," Ph.D. dissertation, Dept. of Electrical
Engineering, Stanford U., Stanford, Calif., 1974
[19] A.E. Eckberg, "Algebraic system theory with application to decentralized control," Ph.D.
dissertation, Dept. of Electrical Engineering, M.I.T., Cambridge, Mass., 1973
[20] S. Kung, T. Kailath and M. Morf, "Fast and stable algorithms for multivariable design
problems," Proc. Fourth IFAC Int. Symp. on Multivariable Technological Systems, pp 97-104, 1977

4.4 Citation Index for [F3]


A search of the Science Citation Index data base yielded 129 papers that have
cited [F3] through late 1988 (enough to qualify [F3] as a 'Citation classic').
These are almost all in control theory or linear system theory: e.g. 44 in the
International Journal of Control (11 in 1979 alone), 31 in the IEEE Transactions
on Automatic Control (9 in 1981), 11 in the SIAM Journal on Control and
Optimization, and more recently 8 in Linear Algebra and its Applications. (Only
two are in the IEEE Transactions on Information Theory.) The complete index
is available from the author.

5 Conclusion
This snapshot of one scientific development is an interesting vignette that
illuminates how science works.
In reviewing this history, one is struck by how the accidents of time and
place sometimes combine. The fact that Massey, the coding theorist, and Sain,
the linear system theorist, were both at Notre Dame was of course a large factor
in the appearance of their pioneering work. But Massey was also a consultant
to Codex, where we were daily being presented with evidence of the practical
superiority of convolutional codes, and where we learned of his work with great
interest. Once the question of minimal encoders for a convolutional code was
clearly framed in 1968-69, it was natural to consult the literature of realization
theory; the fact that Kalman, Falb, and Arbib [KFA] appeared in 1969 was an
extremely happy coincidence. The confluence of all these circumstances accounts
in large part for [F1].
The fact that the coding framework (finite field, discrete time) leads to a
purely algebraic approach was certainly helpful. Also, questions that arise
naturally in coding (inverses, equivalence of encoders) might not have arisen
so prominently in another context. So it was probably fortunate that the work
was initially motivated by coding problems.
Another piece of great good fortune was my spending the 1971-72 academic
year at Stanford, where the algebraic system theory ideas of Kalman and his
intellectual descendants were "in the air." Although [F2] was written at Stanford,
it was not greatly influenced by these more general ideas. However, enough
understanding of the state of multivariable system theory was obtained so that
the idea of writing up the results of [F1] for a system-theory audience took
shape, ultimately taking the form of [F3].
The timing of [F3] seems to have been just about right for multivariable
linear system theory. Enough related results were already known so that the
rather purely mathematical approach of [F3] could be readily related to known
and potential applications. Therefore it seems to have been rapidly assimilated
into the field. The fact that Kailath and his group were intimately familiar with
the work must also have greatly helped its propagation and acceptance.
(As a sidelight, it may interest the reader to know that I was so far out of
the field of linear system theory that I never even saw a copy of the published
paper [F3] until recently. "I shot an arrow in the air; it fell to earth, I know


not where." In a telephone call in about 1987, B.F. Wyman told me that [F3]
was "famous" - to my complete surprise.)
It is somewhat ironic to find that while these ideas have borne relatively
little fruit in the field of coding theory for which they were developed, they have
been so successful elsewhere. Nonetheless, this must happen often in the
development of science. The history as a whole can be viewed as a contribution
to the development of algebraic system theory that was due in part to the
accidents of time and place, and in part to the benefits of a fresh point of view.

System-Theoretic Trends in Econometrics


J. M. Schumacher
Centre for Mathematics and Computer Science, P.O. Box 4079, 1009 AB Amsterdam,
The Netherlands, and
Department of Economics, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands

The dream of econometrics has metamorphosed
into a technical problem in system theory.
(R. E. Kalman)

This is a brief survey of some recent research trends in econometrics which make extensive use of
techniques developed in system theory. In particular, we pay attention to the following subjects:
cointegration, error correction, and the representation of systems; path controllability, system
inversion, and trackability; inputs, outputs, and errors-in-variables.

1 Introduction
System theory interacts with the theory of economics and econometrics in rather
diverse ways, and the past few decades have seen the arrival and sometimes
also the departure of a rich variety of research trends in the interface. The story
might begin with The Mechanism of Economic Systems [55], a book that was
published in 1953 although it was based on notes that the author, Arnold
Tustin, had written immediately after World War II. In this book, Tustin
proposed to model the workings of a national economy by analog simulation
using clever mechanical and electrical devices which he described in some detail.
Apparently his hope, as an electrical engineer, was to use such nonlinear models
to explain and remedy business cycles much in the same way as unwanted
oscillatory motions in servomechanisms can be suppressed by appropriate
controller design. As noted by Aoki [3], this approach doesn't seem to have
had widespread influence among economists.
There have been other trends, however, which did acquire a status of
permanence in the economic and econometric literature. Optimal control theory,
in the style that emerged in the fifties, has found its way into the economic
realm and is alive and well there. This is evidenced in recent textbooks such as
[16] and [53]. Optimal stochastic control theory has found application in
financial management; a recent survey is provided in [31]. There are other areas
that are more or less allied to system theory and that are extensively used in
economics, such as the theory of differential games, but we will leave these out
of our discussion.
An example of a standard and full-fledged subject in system theory that has
had an undeniable influence in econometrics is, of course, the Kalman filter.

560

J. M. Schumacher

Its importance was recognized in the standard reference [22], and the Kalman
filter can now be considered as one of the standard tools in the study of time
series and dynamic economic models (cf. [14,48]). Further interaction between
system theory and econometrics takes place in the field of identification. The
fundamental problems that are involved here were stirred up by R.E. Kalman
[30]. A recent detailed elaboration of some of the points raised by Kalman can
be found in [36,37]. At a more technical level, the recent book by Hannan and
Deistler [23] provides an excellent reference for the way that system theory
and statistics interact to solve identification problems.
In this paper, we shall attempt to highlight some of the newer research
trends in econometrics which make extensive use of ideas and techniques from
system theory. First, we shall discuss the issue of 'cointegration' which has been
heavily debated in econometric circles during the past decade. One of the central
points in the discussion is a result known as the Granger representation theorem;
this is basically a theorem about alternative representations for linear dynamic
systems, which in system-theoretic terms would fall under the heading of
realization theory (or as some would perhaps prefer to say: the theory of system
representations and transformations). There is also an aspect of control in the
cointegration debate; in particular, the tracking of targets is involved. The ability
of a system to track a given target is a classical subject in system theory, and
recently there have been some efforts to extend this older work and to apply
it in specific economic contexts. We shall briefly discuss the results in this area
in Sect. 3. Our final topic will concern the selection of 'inputs' and 'outputs'
('endogenous' and 'exogenous' variables, in econometric terminology). This
subject allows a four-fold decomposition brought about by the two divisions
static/dynamic and deterministic/stochastic; we shall discuss all four cases, to
bring out some interesting analogies. The final Sect. 5 contains concluding
remarks.
In this paper we will not cover all of the impulses to the application of
system-theoretic ideas in economics that are due to Aoki and his co-workers, such as the
ideas concerning aggregation and reduction by balancing; instead we refer to
Aoki's recent book [4]. For additional material, we also refer to the special
issue of the Journal of Economic Dynamics and Control on Economic Time Series
with Random Walk and Other Nonstationary Components (Vol. 12-2/3 (1988),
edited by M. Aoki), the special issues of Computers & Mathematics with
Applications on System-Theoretic Methods in Economic Modeling (Vols. 17-8/9
(1989) and 18-6/7 (1989), edited by S. Mittnik), and the survey paper by E.I.
Moore [38].

2 Cointegration, Error Correction, and the Representation of Systems

Many economic time-series show an apparent random drift, which may be
explained by a lack of forces which tend to drive the variable under study to
some preferred level. Since the traditional econometric methods of dealing with

9 Applications-System-Theoretic Trends in Econometrics

561

time-series are based on stationarity assumptions, it is standard practice
(recommended for instance in [5]) to pre-filter the data by taking differences.
Differencing once will reduce a 'random walk'-like behavior to stationarity. If
necessary, a time-series may be differenced several times in order to achieve
stationarity. A scalar time series is said to be integrated of order d if it becomes
stationary after differencing d times. Since there is loss of information involved
in taking differences (a differenced model can only describe relations between
changes of variables, not relations between the absolute levels), over-differencing
should be avoided.
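The differencing operation is easy to demonstrate; the following sketch (my illustration, not the author's) simulates a random walk, which is integrated of order 1, and recovers the stationary innovations by differencing once.

```python
import random

random.seed(1)
e = [random.gauss(0.0, 1.0) for _ in range(500)]   # stationary white noise

# Random walk x_t = x_{t-1} + e_t: 'integrated of order 1'
x, level = [], 0.0
for e_t in e:
    level += e_t
    x.append(level)

# First differences: Delta x_t = x_t - x_{t-1} (taking x_0 as the first value)
dx = [x[0]] + [x[t] - x[t - 1] for t in range(1, len(x))]

# Differencing once recovers the stationary innovation series
recovered = all(abs(a - b) < 1e-9 for a, b in zip(dx, e))
```

Differencing a second time would leave a stationary but over-differenced (non-invertible moving-average) series, which is the loss of information the text warns about.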
In the context of vector time series, clearly there may be different orders of
integration between the components of the vector; more generally, it can happen
that certain linear combinations of the components have lower order of
integration than the components themselves. This may be seen as strong evidence
for the presence of economic forces which tend to keep a certain balance between
the components, and the discovery of such relations is therefore of considerable
interest. Examples are the relations between consumption and income and
between short-term and long-term interest rates [9,13]. Generally speaking,
cointegration is found in so-called error correction models. Suppose that we
have two (vector) variables y_t and z_t which tend to satisfy a static 'target' relation

Ay_t + Bz_t = 0

The presence of this target relation can be reconciled with the presence of
(first-order) nonstationary dynamics by specifying an 'error correction' model:

A_1(L)Δy_t + B_1(L)Δz_t + D(L)[Ay_{t-1} + Bz_{t-1}] = C(L)e_t

(The notation here is the econometric one: L is the lag operator that maps (x_t)_t
to (x_{t-1})_t; Δ = I - L is the difference operator, which maps (x_t)_t to (x_t - x_{t-1})_t;
A_1(z), B_1(z), D(z), and C(z) are polynomial matrices; (e_t)_t is white noise.) This
way of incorporating long-term dynamics into short-term dynamic models
originates in [9,47].
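A minimal simulation of the simplest scalar case may help fix ideas. Everything below is my own illustrative choice (target relation y_t - z_t = 0, correction speed 0.5, first-order lags, unit-variance noise): each level series wanders like a random walk, while the 'error' y_t - z_t follows a stationary AR(1) and stays bounded.

```python
import random

random.seed(2)
gamma = 0.5          # error-correction speed (assumed for illustration)
y, z = 0.0, 0.0
errors = []
for _ in range(2000):
    e1 = random.gauss(0.0, 1.0)
    e2 = random.gauss(0.0, 1.0)
    z_new = z + e1                      # z_t is a pure random walk
    y_new = y + gamma * (z - y) + e2    # Delta y_t corrects toward the target
    y, z = y_new, z_new
    errors.append(y - z)                # the 'error' y_t - z_t

# The levels drift without bound, but y_t - z_t follows a stationary AR(1)
# with coefficient 1 - gamma = 0.5 and so remains bounded in practice.
spread = max(abs(u) for u in errors)
```

In the vocabulary introduced below, (y_t, z_t) is cointegrated of order 1,1 with cointegrating vector (1, -1).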
A precise formulation of the connection between cointegrated models and
error correction models has been proposed by C.W.J. Granger in an unpublished
manuscript [20] and in the paper [13]. Specifically, Granger calls a process
(x_t)_t cointegrated of order d, b if all components are integrated of order d, and
if some nontrivial linear combination z_t = α'x_t is integrated of order d - b where
b > 0. A process x_t in Rⁿ that is cointegrated of order 1,1 is said to have
cointegrating rank r if α'x_t is stationary for some r × n matrix α' of full row
rank, and if ᾱ'x_t is nonstationary for any matrix ᾱ' whose rank exceeds r. The
Granger representation theorem gives the connection between representations of
'autoregressive' and 'moving-average' type for time series that are cointegrated
of order 1,1. The following version uses a formulation proposed by Johansen
[26].

The Granger Representation Theorem. Assume that the Rⁿ-valued process (x_t)_t
satisfies

Δx_t = C(L)a_t    (1)

where (a_t)_t is zero-mean white noise of unit variance, and C(z) is an n × n matrix-valued
function that is holomorphic on the disk |z| < 1 + ρ and that is nonsingular
on the same disk except at 1, where C(1) has rank n - r. Let α and β be n × r
matrices of full column rank such that α'C(1) = 0 and C(1)β = 0. If the r × r matrix
α'(dC/dz(1))β is nonsingular, then the process (x_t)_t is cointegrated of order 1,1
with cointegrating rank r and satisfies the equation

Π₀x_t + Π₁(L)Δx_t = a_t    (2)

where

Π₀ = β(α'(dC/dz(1))β)⁻¹α'

The processes (Δx_t)_t and (α'x_t)_t are stationary, so that the representation (2) may
be seen as an error correction representation.

Conversely, suppose the process (x_t)_t satisfies an equation (2) where (a_t)_t is
white noise and where the matrix function Π(z) = Π₀ + (1 - z)Π₁(z) is
holomorphic and nonsingular on the disk |z| < 1 + ρ except at z = 1, where
Π₀ = Π(1) has rank r. Let α and β be n × (n - r) matrices of full column rank
such that α'Π₀ = 0 and Π₀β = 0. If the (n - r) × (n - r) matrix α'Π₁(1)β is invertible,
then the process (x_t)_t is cointegrated of order 1,1 with cointegrating rank r and
satisfies an equation (1) in which C(z) is holomorphic and nonsingular in the disk
|z| < 1 + ρ except at the point z = 1, where

C(1) = β(α'Π₁(1)β)⁻¹α'    □
The proof of the Granger representation theorem in [13] is somewhat hard to
follow. Engle sketches a different proof, due to B.S. Yoo, in [12]. This proof is
based on what Engle calls the Smith-McMillan-Yoo form; it is actually a Smith
form with respect to the ring of causal stable rational functions. In [26], Johansen
uses the context of functions that are holomorphic on an open disk containing
the unit circle (which is more general than the rational context used by Yoo),
and he provides a third proof. Apparently it hasn't been noticed in this literature
that essentially a matrix generalization is involved here of the following simple
rule from complex function theory: if f(z) is holomorphic in a neighborhood
of z₀ and vanishes there, then f⁻¹(z) has a simple pole at z₀ if and only if
df/dz(z₀) is nonzero, and in that case the residue of f⁻¹(z) at z₀ (i.e. the
coefficient of (z - z₀)⁻¹ in the Laurent series development of f⁻¹(z) around z₀)
is given by (df/dz(z₀))⁻¹.
In the matrix case, one has to take directions into account, and the resulting
residue formula is given below. We shall say that a matrix function G(z) has a
simple pole at a point z₀ of the complex plane if G(z) has a pole at z₀ but
(z - z₀)G(z) doesn't have a pole there.
Residue Formula. Let F(z) be an n x n matrix function that is holomorphic in a
neighborhood of z_0, and suppose that F(z) is nonsingular in a neighborhood
of z_0 except at z_0 itself. Let the rank of F(z_0) be n − r; let α and β be n x r-matrices
of full column rank such that α'F(z_0) = 0 and F(z_0)β = 0. Under these conditions,
the matrix function F^{-1}(z) has a simple pole at z_0 if and only if the constant
matrix α'(dF/dz(z_0))β is invertible, and in that case the residue of F^{-1}(z) at z_0
has rank r and is given by

Res(F^{-1}(z); z_0) = β(α'(dF/dz(z_0))β)^{-1} α'.


The formula is given by Lancaster for the case in which F(z) is a polynomial
matrix [35, pp. 60-65]; the holomorphic version is formula (4.18) in [50]. The
proof is based on a suitable ('local') version of the Smith form. To see how the
residue formula applies to the Granger representation theorem, we note that
Π(L) and C(L) should be related by

Π(L)C(L) = Δ.
This means that

C^{-1}(z) = Π_0 (1 − z)^{-1} + Π_1(z),

so that Π_0 is the residue of C^{-1}(z) at 1, and of course we also have

Π^{-1}(z) = C(1)(1 − z)^{-1} + (C(z) − C(1))/(1 − z),

so that C(1) is the residue of Π^{-1}(z) at 1.
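The residue formula lends itself to a quick numerical check. The sketch below is an illustration only: the 2 x 2 matrix F, the point z_0, and the vectors α, β are contrived for the purpose, and numpy is assumed. It compares a residue obtained by numerical contour integration with the value β(α'(dF/dz(z_0))β)^{-1}α'.

```python
import numpy as np

# Contrived example: F(0) has rank 1; alpha and beta span the left and
# right kernels of F(0).
def F(z):
    return np.array([[z, z], [0.0, 1.0]], dtype=complex)

def dF(z):
    # Derivative of F with respect to z (constant here).
    return np.array([[1.0, 1.0], [0.0, 0.0]], dtype=complex)

z0 = 0.0
alpha = np.array([[1.0], [0.0]])
beta = np.array([[1.0], [0.0]])

# Residue via (1 / 2 pi i) * contour integral of F^{-1}(z) around z0.
N, radius = 2000, 0.1
res_contour = np.zeros((2, 2), dtype=complex)
for t in 2 * np.pi * np.arange(N) / N:
    z = z0 + radius * np.exp(1j * t)
    dz = 1j * radius * np.exp(1j * t) * (2 * np.pi / N)
    res_contour += np.linalg.inv(F(z)) * dz / (2j * np.pi)

# Residue via the formula Res = beta (alpha' F'(z0) beta)^{-1} alpha'.
pivot = alpha.T @ dF(z0) @ beta
res_formula = beta @ np.linalg.inv(pivot) @ alpha.T

print(np.allclose(res_contour, res_formula, atol=1e-8))   # -> True
```

Here F^{-1}(z) has the single Laurent coefficient [[1, 0], [0, 0]] at (z − z_0)^{-1}, which both computations recover.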


Aside from the technicalities, a more fundamental point that might be
brought up in connection with the Granger representation theorem is the
following. The theorem purports to be a statement about different representations
of the same thing, but it is actually not too clear what it is that is being
represented. Statements about equivalence of representations are traditionally
formulated in situations in which there is a unique stationary solution associated
with each representation, and in this case there is of course no problem: what is
represented is that stationary solution. If one leaves the domain of stationary
series, however (as one is forced to do in order to discuss phenomena such as
cointegration), then this obvious answer is no longer applicable. The difficulty
is noted by Davidson, who writes: "In fact, because of missing constants of
integration a process such as [one given by a vector autoregressive equation
Π(L)x_t = ε_t, with Π(1) singular] cannot give a complete description of the
generation process of the variables; it must be understood as representing a
stationary process in the differences" [8, p. 8/9]. A more satisfactory approach,
however, should address the problem of nonunique solutions directly. The idea
of considering sets of solutions rather than individual solutions is a key point
in the work of J.C. Willems [57, 58], which has already given rise to an extensive
theory of equivalent representations for linear deterministic systems (cf. the
survey [51]). It would seem that a similar theory will have to be developed for
the stochastic case in order to allow for an exact and complete formulation of
results such as the Granger representation theorem.
Now, let us consider briefly the general situation of higher-order cointegration.
If (x_t)_t is a process that is integrated of order d > 0, then the Wold
decomposition implies a representation of the form

Δ^d x_t = C(L)ε_t.


J. M. Schumacher

We shall continue to assume that the matrix function C(z) is holomorphic on
an open disk containing the unit circle and that C(z) is nonsingular on the same
disk except at z = 1. It is natural to define the cointegration space of order k as
the set of all vectors α such that Δ^k α'x_t is stationary. If we denote the dimension
of this space by n_k, then we may call the indices (n_0, n_1, ..., n_d) the cointegration
indices of the process (x_t)_t. In this terminology, an R^n-valued process is
cointegrated of order 1, 1 with cointegrating rank r if and only if its cointegration
indices are (r, n). The cointegration indices can be easily expressed in terms of the
coefficients of the power series development of C(z) around z = 1: writing

C(z) = Σ_{j=0}^∞ C_j (1 − z)^j,

we have

n_{d−j} = dim ker [C_0 C_1 ... C_{j−1}]'.
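The index formula can be turned into a small computation. The following sketch (the coefficient matrices are invented toy data, and numpy is assumed) stacks the Taylor coefficients of C(z) around z = 1 and reads off the cointegration indices as kernel dimensions.

```python
import numpy as np

def cointegration_indices(C, d):
    # C = [C_0, C_1, ...]: n x n Taylor coefficients of C(z) around z = 1,
    # C(z) = sum_j C[j] (1 - z)^j.  Returns (n_0, ..., n_d), using
    # n_{d-j} = dim ker [C_0 C_1 ... C_{j-1}]'.
    n = C[0].shape[0]
    idx = [0] * (d + 1)
    for j in range(d + 1):
        if j == 0:
            rank = 0          # empty stack: the kernel is the whole space
        else:
            rank = np.linalg.matrix_rank(np.vstack([Cj.T for Cj in C[:j]]))
        idx[d - j] = n - rank
    return tuple(idx)

# Toy example with d = 1: C_0 = C(1) has rank 1, so the cointegration
# space of order 0 is one-dimensional (cointegrating rank r = 1).
C0 = np.array([[1.0, 1.0], [1.0, 1.0]])
C1 = np.eye(2)
print(cointegration_indices([C0, C1], d=1))   # -> (1, 2)
```

The result (1, 2) = (r, n) agrees with the characterization of first-order cointegration given above.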

The important point to note is that the cointegration indices are not in any
one-one relation with the orders of the zeros at 1 of the matrix function C(z).
(We recall that a nonsingular meromorphic matrix function F(z) allows, with
respect to a given z_0 ∈ C, a 'local' Smith form

F(z) = U(z) diag((z − z_0)^{k_1}, ..., (z − z_0)^{k_n}) V(z),   (3)

where U(z) and V(z) are holomorphic in a neighborhood of z_0 and invertible
at z_0. The integers k_1, ..., k_n are called the orders of the zeros of F(z) at the
point z_0.) This is seen most clearly by comparing the formula

n_{d−j} = dim {α | (1 − z)^{-j} C'(z)α ∈ H(1)},

in which we use the notation H(1) for the space of vector functions that are
holomorphic in a neighborhood of 1, with the following formula (adapted from
[41]) for the number ν_j of zeros at 1 of C(z) of order ≥ j:

ν_j = dim {α(1) | α(z) ∈ H(1), (1 − z)^{-j} C'(z)α(z) ∈ H(1)}.   (4)

Clearly we have

n_{d−j} ≤ ν_j,   (5)

but equality does not hold in general, as can be seen from simple examples.
The most important exception to this is, of course, the case of first-order
integration.
It can easily be seen that the vector functions α(z) which appear in (4) may
be restricted to be vector polynomials, without impairing the validity of the
statement. Therefore, if we allow cointegrating vectors to be polynomial rather
than constant and change the definition of 'cointegration indices' accordingly,
we do obtain a one-one relation between cointegration indices and orders of
zeros at 1. The importance of polynomial cointegrating vectors (PCIV's) has
been emphasized by Yoo (cf. [12]). A slightly different approach is taken by
Johansen [24]. He introduces what we have called the 'cointegration indices',

9 Applications-System-Theoretic Trends in Econometrics


and notes that their sum can at most be equal to the order r of the zero of
det C(z) at 1. The case in which equality holds is referred to by Johansen as the
'balanced' case; since it is easily verified that

Σ_{j≥1} ν_j = k_1 + ... + k_n = r,

we can see that this case is the one in which equality holds in (5) for each
j = 1, ..., d and, moreover, ν_j = 0 for j > d. Johansen proceeds to show that,
after constant row transformations which are summarized in a nonsingular
matrix T, we can write

TC(z) = [ (1 − z)^{k_1} C̃_1(z) ; ... ; (1 − z)^{k_n} C̃_n(z) ]

(rows stacked), where, in the balanced case, the matrix C̃(z) = [C̃_1'(z) ... C̃_n'(z)]'
is nonsingular at 1. We may also write this in a slightly different way:

C(z) = T^{-1} diag((1 − z)^{k_1}, ..., (1 − z)^{k_n}) C̃(z).

Comparing this with (3), we see that the balanced case is characterized by the
fact that the local Smith form around z = 1 can be obtained using only a constant
transforming matrix on the left side. In general, one will have to use a
non-constant transformation; although the local Smith form in principle calls for
holomorphic transformations, Johansen proves by a direct argument that a
polynomial transformation on the left hand side will suffice. (In the rational case,
one might appeal to the Smith-McMillan form to prove this; in fact, this is
what Yoo does.) The polynomial transformation can then be interpreted as a
transformation of the variables in which linear combinations are taken of
contemporaneous and lagged components.
So, either by introduction of polynomial cointegrating vectors or by polynomial
transformations of the variables, the structure of cointegrated systems
can be studied through the zero structure of an associated matrix function at
z = 1. This may help to solve remaining problems, such as the formulation of
analogs of the Granger representation theorem for higher-order cointegrated
series (partial results on this can be found in [24] and [8]). Another important
question is to what extent polynomial cointegrating vectors (or polynomial
transformations of the variables) are unique; the answer to this is of course
critical to the discovery of 'target relations'.
In the above, we have emphasized what might be called the 'structural'
aspect of cointegration. There is of course also a 'statistical' side to the matter,
which is concerned with the testing of hypotheses about the cointegration
structure and with the estimation of cointegrating vectors, and most of the
journal literature in fact concentrates on this aspect (see for instance [13, 25, 42]).
Virtually all of this work is concerned with first-order cointegrated systems. It


seems, however, that even in this context there are some basic questions that
remain to be answered, in particular in connection with hypothesis testing.
Engle notes: "The null hypothesis of cointegration would be far more useful in
empirical research than the natural null of non-cointegration. The selection of
a 5% test of the non-cointegration null is very arbitrary and many researchers
are assuming cointegration when these tests are only rejected at larger
significance levels" [12, p. 26/27]. One may argue about what is natural; in a
sense, the hypothesis of cointegration is the more highly structured one, and is
therefore simpler and more natural. From a certain point of view, the
cointegrated situation is also the more singular one, which may explain the
difficulties that classical statistical methods have with adopting cointegration
as the null hypothesis. Possibly the theory of zeros of matrix functions may
also be of help here to unravel the singularities.

3 The Tracking of Targets


Although cointegration can be caused by the presence of 'common trends',
another explanation that is sometimes plausible is the presence of steering action.
Davidson and Hendry [10] even use the word 'servo-mechanism' to describe
the economic forces that keep certain variables together; Arnold Tustin would
have appreciated this terminology. Error correction models are placed explicitly
in a context of target following by Kloek [32]. It may then be expected that
the extensive theory of tracking which has been developed in mathematical
control theory should have some relevance.
There is a sizable economic literature with a clear system-theoretic motivation
on the problem of exactly following a prescribed path, so-called 'path
controllability'. The problem is customarily posed in a deterministic setting and
bears a mathematical-economic flavor rather than an econometric one. Path
controllability can be seen as an extension of Tinbergen's concept of achievability
of targets in static models [54]. When the targets are solved in terms of the
instruments in a static linear model, so that we have

y = Gu,

where y is a vector of targets, G is a constant matrix, and u is a vector of
instruments, then the obvious criterion for achievability of each given vector y
by a suitable choice of instruments u is that the matrix G should have full row
rank. A necessary condition for this to hold is of course that the number of
targets should not exceed the number of instruments; this is sometimes called
the 'Tinbergen policy condition'. The dynamic version of target achievability
was introduced in economics by Preston [44] and Aoki [2], after essentially
the same idea had been introduced into system theory (under the name of
'functional reproducibility') by Brockett and Mesarovic [6]. In the discrete-time
case, path controllability is defined to mean that, after a certain 'adjustment


time' or 'policy lead', any given path of the target variables can be tracked
exactly by proper choice of the instrument variables. The definition in the
continuous-time case is slightly different, but the criterion (at least in the linear
constant-parameter case) is the same: path controllability holds if and only if
the transfer matrix G(z) from instruments to targets has full row rank as a
rational matrix [6, p. 559]. This is a rather attractive generalization of the static
rule of Tinbergen.
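Both the static rule and its dynamic generalization reduce to rank tests that are easy to carry out numerically. The sketch below uses invented matrices, and relies on the fact that the rank of a rational matrix equals its rank at a randomly chosen (hence generic) complex point.

```python
import numpy as np

def full_row_rank(M, tol=1e-9):
    return np.linalg.matrix_rank(M, tol=tol) == M.shape[0]

# Static Tinbergen check: two targets, three instruments.
G_static = np.array([[1.0, 0.0, 2.0],
                     [0.0, 1.0, 1.0]])
print(full_row_rank(G_static))          # True: every target vector is achievable

# Dynamic check: evaluate G(z) = C (zI - A)^{-1} B + D at a generic point.
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))

z = rng.normal() + 1j * rng.normal()    # generic point, avoids eigenvalues of A
Gz = C @ np.linalg.inv(z * np.eye(2) - A) @ B + D
print(full_row_rank(Gz))                # True: G(z) = 1/z^2 is right invertible
```

Evaluating at a single generic point is only a probabilistic substitute for symbolic rank computation, but it matches the rational-matrix criterion with probability one.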
Further work within the system theory community on this subject has
concentrated on finding simple conditions for right invertibility in terms of the
state space representation

x(k + 1) = Ax(k) + Bu(k),   x(k) ∈ X, u(k) ∈ U,
y(k) = Cx(k) + Du(k),   y(k) ∈ Y.

A condition for right invertibility in terms of the parameters A, B, C, and D was
already given by Brockett and Mesarovic, but this involved a rather big matrix
formed from the parameter matrices. The following compact method for
determining whether or not a system is right invertible is essentially due to
Morse and Wonham [39]. Define recursively a sequence of subspaces of the
state space X by

T^0 = {0},
T^{k+1} = {x ∈ X | x = Ax̄ + Bu for some x̄ ∈ T^k and u such that Cx̄ + Du = 0}.

It is easily seen that the sequence (T^k)_k is nondecreasing, and so the sequence
must have a limit, which is denoted by T*. The system given by the parameters
(A, B, C, D) is right invertible (in the sense that the transfer matrix G(z) =
C(zI − A)^{-1}B + D is right invertible as a rational matrix) if and only if

CT* + im D = Y.
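The recursion and the final test can be sketched numerically as follows. Subspaces are represented by orthonormal column bases; the helper functions and the example systems are our own invention, not taken from [39].

```python
import numpy as np

def null_basis(M, tol=1e-9):
    # Orthonormal basis of ker M, via the SVD.
    U, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

def col_basis(M, tol=1e-9):
    # Orthonormal basis of im M.
    if M.shape[1] == 0:
        return M
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    rank = int(np.sum(s > tol))
    return U[:, :rank]

def right_invertible(A, B, C, D):
    # T^0 = {0};  T^{k+1} = { A x + B u : x in T^k, C x + D u = 0 }.
    n = A.shape[0]
    p = C.shape[0]
    T = np.zeros((n, 0))
    while True:
        # Kernel of [C T, D]: pairs (a, u) with C T a + D u = 0.
        K = null_basis(np.hstack([C @ T, D]))
        T_next = col_basis(np.hstack([A @ T, B]) @ K)
        if T_next.shape[1] == T.shape[1]:
            break
        T = T_next
    # Right-invertibility test: C T* + im D = Y.
    return np.linalg.matrix_rank(np.hstack([C @ T, D])) == p

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))
print(right_invertible(A, B, C, D))                          # G(z) = 1/z^2: True
print(right_invertible(A, B, np.eye(2), np.zeros((2, 1))))   # two targets, one instrument: False
```

The second call fails the test, in agreement with the Tinbergen policy condition: two targets cannot be right invertible from a single instrument.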

The state space framework suggests extensions to the non-constant-parameter
case and the nonlinear case. A characterization of path controllability for linear
systems with time-varying parameters has been given by Engwerda [15];
necessarily the condition is more involved than in the constant-parameter case,
but an analogy with the Morse-Wonham result can still be drawn. Necessary
and sufficient conditions for (local) path controllability of discrete-time
nonlinear systems have been given by Nijmeijer [40], who also establishes
the close relation that exists between path controllability and decouplability
(the possibility of introducing a control policy in which each target is influenced
by only one instrument). Recently, state space algorithms have become available
to decide on the right invertibility of systems that are given in implicit form,
rather than in solved form [33]. This is a return to the original formulation by
Tinbergen, who starts in [54] with implicit equations rather than with a 'final
form'.
One may reasonably argue that the invertibility of dynamic systems should
play an important role in dynamic economic theory, simply because invertibility
is such a basic concept; so the study of system invertibility is well-motivated.
However, it is also clear that exact path following is not a realistic goal in a
world full of disturbances. Alternative formulations of tracking problems can
be obtained by introducing assumptions on the variables to be tracked, and a
stochastic setting can be accommodated by relaxing the condition of exact
tracking. For instance, Kloek in [32] assumes that the target is integrated of
order 2 and requires that the tracking error should be zero-mean and weakly
stationary. Situations in which some information is available about the dynamics
of the signal to be followed have been studied extensively in system and control
theory; in fact, this branch of control theory has its roots in the design of certain
servomechanisms that were used in World War II and that took the notion of
'target' quite literally. We refer to [34, Ch. 5], [60, Ch. 6-8], [49], and [18] for
a sample of the modern literature on the subject. Basically, the conclusion of
these studies is that, for trackability of constants and linear trends in
discrete-time systems, the transfer matrix from instruments to targets should have full
row rank at the point 1 of the complex plane. Moreover, if the action of the
controlling mechanism is based purely on the tracking error, then it can be
shown that the controller must contain what is called an 'internal model' of
the signal that is to be followed.
A few remarks can be made here. Firstly, we see that again the zero structure
at 1 is of importance. Secondly, a somewhat surprising conclusion is that the
trackability condition is stronger than the condition for path controllability;
indeed, if the transfer matrix has full row rank at some given point of the
complex plane then it will certainly have full row rank as a rational matrix.
This may be explained by the fact that path controllability is achieved by
open-loop control, whereas in the case of the trackability problem the solution is
sought in the form of a closed-loop controller, which automatically adjusts the
control action to changes in the signal to be followed. Thirdly, the presence of
an 'internal model' might be an interesting hypothesis in situations in which
control action is suspected, such as when time series are cointegrated. Structural
constraints such as the one implied by the internal model principle may also
be used in model specification. We note that the internal model principle has
been mentioned recently by Salmon [46], who however seems to use the term
to indicate compatibility between models; this is certainly a subject of interest,
but not one that is related directly to the tracking problem.
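A contrived scalar example, G(z) = (z − 1)/z, illustrates the gap between the two conditions numerically (numpy assumed): the transfer matrix has full row rank as a rational matrix, but not at the point 1.

```python
import numpy as np

def G(z):
    # Scalar transfer matrix with a zero at z = 1.
    return np.array([[(z - 1.0) / z]])

rank_generic = np.linalg.matrix_rank(G(0.3 + 0.7j))  # full row rank as a rational matrix
rank_at_one = np.linalg.matrix_rank(G(1.0))          # rank drops at z = 1

print(rank_generic, rank_at_one)   # -> 1 0
```

Such a system is path controllable, yet a constant target cannot be tracked with asymptotically vanishing error.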
The role of ideas from control theory in mathematical economics can now
almost be called classical; this is true for optimal control, but also for a number
of other ideas in which optimization is not necessarily involved, such as path
controllability. Developments that may be expected here include further
elaboration of the relation between path controllability and decoupling, and study of
the structure of control policies when the instruments are in the hands of various
different agents. The application of control ideas in econometric modeling is
more recent and, to a considerable extent, this subject still has to take shape.
In many situations in which several variables of interest are studied there is a
great need for structural information to be incorporated in the specification of
models, and the results of control theory may help to provide such information
in the form of constraints that must be satisfied for control action to be effective.

4 Inputs, Outputs, and Errors-In-Variables


It is a generally recognized fact among econometricians that the distinction
between endogenous and exogenous variables is often debatable. For this
reason (and for other reasons as well) it has been argued by J.C. Willems [56]
that in a general theory of systems one should start with a notion of 'observables'
or 'external variables' without imposing a priori a division between inputs and
outputs. This implies that one should describe the relations between the variables
in a nondiscriminating way. Having done this, one may ask which choices of
inputs and outputs would be reasonable; of course, exactly what is 'reasonable'
in a given situation may depend on the availability of extra information which
is not expressed in the system description. We shall discuss the problem of
selecting inputs and outputs in four cases, corresponding to the divisions static/
dynamic and deterministic/stochastic. The discussion will be limited to linear
systems, however.
The deterministic static case would perhaps be considered trivial, but let us
discuss it anyway for purposes of comparison. Suppose a linear relation between
external variables w_i is given by

Rw = 0,   (6)

where we may assume that the matrix R has full row rank. If we believe that
it is reasonable to require that the inputs are not restricted by the equations
and that the outputs are completely determined by the inputs and by the
equations, then the standard procedure applies: select output variables by finding
a maximal set of independent columns among the columns of R, name the
associated components y, name the remaining components u, rewrite (6) as
R_1 y + R_2 u = 0 and, noting that R_1 is invertible by construction, obtain

y = −R_1^{-1} R_2 u,

which clearly has the desired characteristics. In general, the choice of inputs is
not unique; however, the number of inputs is determined by the data (6). Any
selection of this number of variables will 'generically' be valid as a choice of
inputs.
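The selection procedure can be sketched in a few lines (the matrix R is invented for illustration): choose a maximal independent set of columns greedily, call the corresponding variables outputs, and solve for them.

```python
import numpy as np

R = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])        # full row rank: 2 outputs, 1 input

# Greedy choice of a maximal set of independent columns (the outputs).
out_cols = []
for j in range(R.shape[1]):
    trial = out_cols + [j]
    if np.linalg.matrix_rank(R[:, trial]) == len(trial):
        out_cols = trial
    if len(out_cols) == R.shape[0]:
        break
in_cols = [j for j in range(R.shape[1]) if j not in out_cols]

# Rewrite R1 y + R2 u = 0 as y = -R1^{-1} R2 u.
R1, R2 = R[:, out_cols], R[:, in_cols]
H = -np.linalg.inv(R1) @ R2

print(out_cols, in_cols)   # -> [0, 1] [2]
print(H.ravel())           # -> [-1. -1.]
```

Any other selection of two independent columns would do as well, which is the non-uniqueness noted above; only the number of inputs (here one) is fixed by the data.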
There is a certain asymmetry in the selection procedure based on (6), since
we first select the outputs and then simply let the inputs be what is left. However,
if we had represented the subspace ker R, which effectively appears in (6), as
the image rather than as the kernel of some matrix, then we would have
selected the inputs first by taking a maximal set of independent rows of the
representing matrix. So the seeming priority of outputs over inputs in the
selection procedure above is just a consequence of the chosen representation.


In the linear deterministic dynamic case, the problem of selecting inputs
and outputs has been considered by Willems in [57]. In this case, the condition
for an admissible selection of inputs and outputs might be that the transfer
matrix from inputs to outputs should exist and should be proper rational. (This
can be formulated more intrinsically: see [59].) The solution given in [57, 58]
may be described as follows. Let a set of difference equations with constant
coefficients in the variable w(k) ∈ R^q be given by

R(σ)w = 0,

where σ denotes the (forward) shift and R(z) is a polynomial matrix which we
may assume to have full row rank. The basic technique is to write R(z) in the
form T(z)B(z), where T(z) is an invertible rational matrix and B(z) is 'right
bicausal', i.e. B(z) is proper rational and has full row rank at infinity. This
factorization may be achieved by the reduction of R(z) to row reduced form
[27, p. 386]; indeed, note that this procedure factorizes R(z) as U(z)Δ(z)B(z),
where U(z) is unimodular, Δ(z) is diagonal with diagonal elements of the form
z^{k_i}, and B(z) is right bicausal. A proposed selection of inputs and outputs will
induce a partitioning of R(z) as [R_1(z) R_2(z)] (after possible reordering of the
columns), and a corresponding partitioning of B(z) as [B_1(z) B_2(z)]. Now, R_1(z)
will be invertible if and only if B_1(z) is invertible, and R_1^{-1}(z)R_2(z) = B_1^{-1}(z)B_2(z)
will be proper rational if and only if B_1(z) doesn't have a zero at infinity. (The
'only if' holds because B_1(z) and B_2(z) are coprime as matrices over the ring
of proper rational functions, so there can't be a pole-zero cancellation.) The
result is that the proposed selection of inputs and outputs is admissible if and
only if the matrix B_1(∞) is nonsingular. In other words, what we have to do
is to select a maximal number of independent columns from the full row rank
matrix B(∞); we might say that the problem is reduced to the static case.
Of course, this solution is hardly surprising to the econometrician, who is
used to representing transfer matrices as quotients of matrices of polynomials
in z^{-1} (the backward shift). In models of the form

B(σ^{-1})y = A(σ^{-1})u,

where A(z) and B(z) are polynomial matrices, the condition that B(0) should
be invertible is known as the 'causality condition'; in fact, such models are often
specified with the condition B(0) = I (see for instance [22, p. 13]).
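A minimal sketch of why the causality condition matters (coefficients invented for illustration): when B(0) is invertible, the model B(σ^{-1})y = A(σ^{-1})u can be solved recursively forward in time.

```python
import numpy as np

B = [np.array([[1.0, 0.0], [0.2, 1.0]]),   # B_0 = B(0), invertible
     np.array([[0.5, 0.1], [0.0, 0.3]])]   # B_1
A = [np.array([[1.0], [0.0]]),             # A_0
     np.array([[0.0], [1.0]])]             # A_1

assert np.linalg.matrix_rank(B[0]) == 2    # the causality condition

def simulate(u, B, A):
    # Solve B_0 y(k) = -B_1 y(k-1) + A_0 u(k) + A_1 u(k-1) recursively.
    n = B[0].shape[0]
    y = [np.zeros(n)]                      # y(-1) = 0 as initial condition
    for k in range(len(u)):
        rhs = -B[1] @ y[-1] + A[0] @ u[k]
        if k > 0:
            rhs += A[1] @ u[k - 1]
        y.append(np.linalg.solve(B[0], rhs))
    return np.array(y[1:])

u = [np.array([1.0]), np.array([0.0]), np.array([0.0])]
y = simulate(u, B, A)
print(y.shape)   # -> (3, 2)
```

If B(0) were singular, the solve in each step would fail for generic right-hand sides, and y(k) would not be determined by current and past data alone.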
In order to make a comparison with the stochastic situation that will be
discussed below, let us see how much more difficult the problem becomes when
we require that the transfer matrix from inputs to outputs should not only
be proper, but also stable. In principle, the same technique as above applies: if
we can write R(z) in the form T(z)B(z), where T(z) is an invertible rational matrix
and B(z) is now a proper stable rational matrix having full row rank for all z
with |z| ≥ 1, then a selection of inputs and outputs will be admissible if and
only if the corresponding matrix B_1(z) is nonsingular for all z with |z| ≥ 1. The
desired factorization of R(z) can be obtained by a Wiener-Hopf factorization
with respect to the unit circle [7] (cf. the interpretation of the reduction to row


reduced form as a Wiener-Hopf factorization with respect to the point at infinity
in [19]). So in this case, the input-output selection problem is essentially the
following: given a matrix that is 'right unimodular' over the ring of proper
stable functions, find a square submatrix that is unimodular. Obviously, this is
not always possible. The simplest example would be that of a system with two
variables in which neither the transfer matrix from the first to the second variable
nor its inverse is stable.
Next, let us consider the stochastic case. If we suppose that both the
observations and the additive noise are generated by mechanisms that can be
modeled as zero-mean normally distributed variables, then the general linear
model can be written as

w = Nx + e,   (7)

where x generates the observations and e is noise. The observed vector w will
be normally distributed with zero mean and covariance matrix Q, and so all
observational data are summarized in Q. In the model (7), we could select
independent rows from the matrix N (which may be assumed to be of full
column rank) and we might convert the model to an input-output form just as
in the deterministic case. However, without further assumptions on the noise,
the model (7) is hopelessly non-unique. Not even the number of inputs is
well-defined; it may vary from rk Q (no noise) to 0 (all noise).
One possible constraint on the noise covariance matrix Γ, which is well-motivated
when the observation space R^q is considered as the Cartesian product
of q different one-dimensional spaces, is to require that Γ should be diagonal.
This, of course, leads to the factor analysis model, which has experienced
renewed interest following Kalman's critique of the concept of identifiability in
econometrics [28, 29]. What we called 'the number of inputs' becomes 'the
number of common factors' in the context of factor analysis, and it is natural
to define this number as the minimal length of the vector x for which a
representation of the form (7) (with E[ee'] diagonal) is possible. In contrast to the
unconstrained case, this number is now well-determined, but unfortunately its
determination is an open problem.
From the point of view of selecting inputs and outputs, it may be more
natural to think of R^q not as the product of q one-dimensional spaces, but as
the product of an input space and an output space (yet to be determined). A
possible constraint to impose would be that the noise covariance matrix should
be block diagonal corresponding to this decomposition. This leads to an
alternative interpretation of the vector x, since it can be shown that the model

y = H_1 x + e_1,   u = H_2 x + e_2   (8)

(with x, e_1, and e_2 independent) holds if and only if y and u are conditionally
independent given x. The conditional independence property is also used to
define the notion of 'state' in stochastic systems (see for instance [52]), and so


the problem of constructing a model of the form (8) for a given decomposition
of w into components u and y is sometimes called a realization problem [17, 45].
Let us say that a decomposition of w into inputs u and outputs y is
'admissible' if there is a model of the form (8) in which x has minimal length
among all models of the same type corresponding to the same decomposition,
and in which the matrix H_2 is invertible. The invertibility of H_2 will allow the
model to be rewritten in an input-output form:

ŷ = H_1 H_2^{-1} û,   y = ŷ + e_1,   u = û + e_2,

where û = H_2 x. This is the errors-in-variables form (see for instance [11]). The
decomposition of w into y and u leads to a partitioning of the covariance matrix Q_ww:

Q_ww = ( Q_yy  Q_yu
         Q_uy  Q_uu ).

We claim: the decomposition of w into inputs u and outputs y is admissible if
and only if the matrix Q_yu has full column rank. To see this, assume first that
we do have an admissible decomposition and let (8) be a corresponding model.
Because x has minimum length, the covariance matrix Q_xx of x must be
nonsingular. We have H_1 = Q_yx Q_xx^{-1}, since obviously H_1 x is the least-squares
estimate of y given x, and likewise H_2 = Q_ux Q_xx^{-1}. Because of the mutual
independence of x, e_1, and e_2, one has

Q_yu = H_1 E[xx'] H_2' = Q_yx Q_xx^{-1} Q_xu.

Now, it is shown in [17] that the length of x in a minimal representation of
the form (8) is equal to the rank of Q_yu. From the formula above, we see that
this implies that Q_xu is surjective (and hence invertible) and that Q_yx is injective.
But then Q_yu is injective too, by the same formula. Conversely, if it is given that
Q_yu has full column rank, then the construction of [17] immediately leads to a
representation of the desired form.
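The key identity Q_yu = Q_yx Q_xx^{-1} Q_xu, and the resulting full-column-rank test, can be checked on population covariances directly. The matrices H_1, H_2 and the noise variances below are invented for illustration.

```python
import numpy as np

# Invented example: y = H1 x + e1, u = H2 x + e2 with x, e1, e2 independent.
H1 = np.array([[1.0, 2.0], [0.0, 1.0]])
H2 = np.array([[1.0, 0.0], [1.0, 1.0]])   # invertible, as admissibility requires

Qxx = np.eye(2)                # covariance of x (nonsingular)
Qe1 = 0.1 * np.eye(2)          # covariance of e1
Qe2 = 0.1 * np.eye(2)          # covariance of e2

# Population covariances; the noise terms drop out of Q_yu by independence.
Qyy = H1 @ Qxx @ H1.T + Qe1
Quu = H2 @ Qxx @ H2.T + Qe2
Qyu = H1 @ Qxx @ H2.T
Qyx = H1 @ Qxx
Qxu = Qxx @ H2.T

# The identity from the text, and the admissibility test.
print(np.allclose(Qyu, Qyx @ np.linalg.inv(Qxx) @ Qxu))   # -> True
print(np.linalg.matrix_rank(Qyu) == Qyu.shape[1])         # -> True
```

Note that Q_yu is untouched by the noise covariances: this is exactly why the rank test cannot distinguish how much of the observed variation in Q_yy and Q_uu is attributed to noise.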
The conclusion must be that imposing that the error covariance matrix
should be block diagonal doesn't help very much in the selection of inputs and
outputs. In particular, it doesn't rule out the possibility of attributing all observed
variation to noise.
Before turning to the dynamic case, let us note that the errors-in-variables
model is not uniquely determined even if we fix the choice of inputs and outputs.
It is easy to see that all possible solutions can be parametrized in terms of the
'true' input covariance matrix Q, and that all symmetric positive definite matrices
Q will qualify that satisfy the two inequalities

Q ≤ Q_uu   and   Q_yu Q^{-1} Q_uy ≤ Q_yy.

Using the singular value decomposition, one can easily show that the latter
inequality can be rewritten as a lower bound on Q, of the form Q ≥ Q_min. The
corresponding (non-unique) 'true' linear relation between the latent variables y
and u is given by Q_yu Q^{-1}.
In the static case, several proposals have been formulated to reduce the
non-uniqueness of the errors-in-variables model by bringing in some extra
information; see for instance [1]. Let us see what the dynamic case has to offer.
We follow the development in [21] and [43].
Our goal will be to verify the admissibility of a given decomposition of w(t)
into inputs u(t) and outputs y(t). The observational data are supposed to be
summarized in a spectral density matrix Q_ww(z) for w, which is partitioned
according to the proposed decomposition as

Q_ww(z) = ( Q_yy(z)  Q_yu(z)
            Q_uy(z)  Q_uu(z) ).

We are looking for a 'true' transfer matrix G(z) and a 'true' input spectral density
Q(z) which should satisfy

G(z)Q(z) = Q_yu(z),
Q(z) ≤ Q_uu(z),   |z| = 1,
G(z)Q(z)G'(z^{-1}) ≤ Q_yy(z),   |z| = 1.

Under suitable assumptions, the development in the static case can be followed
(replace the field R by the field R(z), the partial order '≤' by the partial order
'≤ pointwise for |z| = 1', and the involution M ↦ M' by the involution M(z) ↦
M'(z^{-1})). As in the static case, the set of all minimal solutions will be
parametrized by the spectral density matrices Q(z) that fall between an upper and
a lower bound determined by the data, and the corresponding transfer matrices
are then given by

G(z) = Q_yu(z) Q^{-1}(z).

However, we want to impose both causality and stationarity, and so we require
G(z) to have all of its poles inside the unit disk. The problem is to find the
restrictions on Q(z) that will guarantee this property for G(z).
Again, the key tool to use is the Wiener-Hopf factorization. To avoid some
technical intricacies, we shall assume that both Q(z) and Q_yu(z) have constant
rank on the unit circle. Then we can write

Q_yu(z) = F_−(z) D(z) F_+(z),

where

D(z) = ( Δ(z)
          0  ),   Δ(z) = diag(z^{κ_1}, ..., z^{κ_m}),

and where F_−(z) (F_+(z)) is unimodular as a matrix over the ring R_−(z) (R_+(z))
of rational functions having all their poles inside (outside) the unit circle. (We


also used the fact that Qyu(z) must have full column rank, as in the static case.)
For any rational matrix M(z), write M*(z) = MT(z-l); note that M* will be
R_(z)-unimodular if M is R+(z)-unimodular, and vice versa. Now, write
G = F _DF +Q-l

= F _DQ(F!)-l

where
Q=F+Q-1F!

Do a spectral factorization to write Q= H +H!, where H + is R+ (z)-unimodular.


We then have
G = [F _JDH + [H!(F!)-lJ

Because the factors between square brackets are R_-(z)-unimodular, it follows
that G(z) will be causal and stable if and only if D H_+ is causal and stable,
which is the same as requiring that Δ H_+ should be causal and stable. Because
Δ(z) is diagonal, this requirement entails that all entries h_{ij}(z) of the i-th row
of H_+(z), which are rational functions having all their poles outside the unit
circle, should be such that the functions z^{κ_i} h_{ij}(z) have all their poles inside the
unit circle. Since multiplication by a power of z can only move poles between
zero and infinity, we see that the κ_i's should be nonpositive and that the h_{ij}(z)'s
should be polynomials of degree no higher than −κ_i. This means that there
will be no solution if one of the Wiener-Hopf indices κ_i is positive, and that
otherwise the solution set is parametrized by at most κ parameters, where

κ = − Σ_{i=1}^{m} κ_i

(Of course, we also have the requirement that H_+ should be unimodular, so
the parametrization is nontrivial.)
We see that imposing the requirements of causality and stationarity may
well cause a certain proposal for the selection of inputs and outputs to be
rejected; if the proposed selection turns out to be admissible, then it causes the
set of all possible models to be finitely parametrized. However, there is no
indication how to select inputs and outputs in such a way that the associated
Wiener-Hopf indices will be nonpositive; this problem was raised in [21] but
apparently the question is still open. Also, as in the scalar case, the number of
inputs is still undetermined and the possibility of attributing all variance to
noise is not ruled out. It may be worthwhile to try out the effect of other possible
constraints on the error spectral density, such as size constraints (proposed for
the static case in [1]).
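The admissibility test implied by the Wiener-Hopf indices can be summarized in a few lines. The following Python sketch is an editorial illustration, not part of the original analysis; the function name and the representation of the indices as a plain list are assumptions made for the example. It rejects a proposed input/output selection when some index κ_i is positive, and otherwise reports the bound κ = −Σκ_i on the number of free parameters.

```python
def admissibility(kappas):
    """Given the Wiener-Hopf indices kappa_1, ..., kappa_m of a proposed
    input/output selection, return (admissible, parameter_bound)."""
    if any(k > 0 for k in kappas):
        # A positive index rules out a causal, stable G(z).
        return False, 0
    # All indices nonpositive: the solution set carries at most
    # kappa = -sum(kappas) free parameters.
    return True, -sum(kappas)

print(admissibility([0, -1, -2]))  # (True, 3)
print(admissibility([1, -1]))      # (False, 0)
```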

5 Conclusions
An econometrician once told me that he was amazed that system theory is still
an active field, since he couldn't imagine that the analysis of the Kalman filter
would not be completed by now. Apparently, the full variety of system-theoretic

9 Applications-System-Theoretic Trends in Econometrics

575

methods has as yet failed to disclose itself to the field of econometrics. System
theory provides a rich set of examples which illustrate the pitfalls of modeling,
and how to avoid these; Kalman has used such examples in his contributions
to the ongoing debate on the fundamentals of mathematical modeling and
identification. System theory also provides a large body of knowledge about
state space techniques, and the applicability of such techniques to econometric
problems has been shown in the work of Aoki and others. But the collection
of mathematical techniques that are familiar to and developed by system
theorists allows an even more intensive contact. As shown in this paper, matrix
factorizations and pole-zero considerations play an important role in
econometric problems, and system theorists have applied these for a long time.
There is an econometric interest in representation problems, which is something
about which system theory has a lot to say. The invertibility of systems is a
natural concept in dynamic economic analysis, destined to play a role similar
to the invertibility of matrices in static analysis; and again, system theory
provides the necessary tools. While some of the questions here are no doubt
more modest than the fundamental issues with which R.E. Kalman has confronted
the econometric profession, they may still be a worthwhile subject for
research and lead to results that will satisfy system theorists and econometricians
alike.

Acknowledgement
It is a pleasure to thank Manfred Deistler, Theo Nijman, and Henk Nijmeijer
for the useful information they kindly provided to me.
References
1. D.J. Aigner, C. Hsiao, A. Kapteyn, T. Wansbeek (1984). Latent variable models in econometrics.
Z. Griliches, M.D. Intriligator (eds). Handbook of Econometrics, North-Holland, Amsterdam,
1321-1393
2. M. Aoki (1975). On a generalization of Tinbergen's condition in the theory of policy to dynamic
models. Review of Economic Studies 42, 293-296
3. M. Aoki (1976). Optimal Control and System Theory in Dynamic Economic Analysis, North-
Holland, New York
4. M. Aoki (1987). State Space Modeling of Time Series, Springer, Berlin
5. G.E.P. Box, G.M. Jenkins (1970). Time Series Analysis. Forecasting and Control, Holden-Day,
San Francisco
6. R.W. Brockett, M.D. Mesarovic (1965). The reproducibility of multivariable systems. J Math
Anal Appl 11, 548-563
7. K. Clancey, I.C. Gohberg (1981). Factorization of Matrix Functions and Singular Integral Operators,
Birkhäuser, Basel
8. J.E. Davidson (1987). Cointegration in linear dynamic systems. (Paper presented at the
Econometric Society European Meeting, Copenhagen, August 1987)
9. J.E. Davidson, D.F. Hendry, F. Srba, S. Yeo (1978). Econometric modelling of the aggregate
time-series relationship between consumers' expenditure and income in the United Kingdom.
The Economic Journal 88, 661-692
10. J.E.H. Davidson, D.F. Hendry (1981). Interpreting economic evidence: The behavior of
consumers' expenditure in the U.K. European Economic Review 16, 177-192
11. M. Deistler (1989). Symmetric modeling in system identification. H. Nijmeijer, J.M. Schumacher
(eds). Three Decades of Mathematical System Theory. A Collection of Surveys at the Occasion
of the 50th Birthday of Jan C. Willems, Lect Notes Contr Inform Sci 135, Springer, Berlin,
129-147


12. R.F. Engle (1987). On the theory of cointegrated economic time series. (Paper presented at the
Econometric Society European Meeting, Copenhagen, August 1987)
13. R.F. Engle, C.W.J. Granger (1987). Co-integration and error correction: Representation,
estimation, and testing. Econometrica 55, 251-276
14. R.F. Engle, M. Watson (1985). Applications of Kalman filtering in econometrics. (Paper presented
at the 5th World Congress of the Econometric Society, Boston, Mass., 1985)
15. J.C. Engwerda (1988). Control aspects of linear discrete time-varying systems. Int J Contr 48,
1631-1658
16. G. Feichtinger, R.F. Hartl (1986). Optimale Kontrolle Oekonomischer Prozesse, De Gruyter,
Berlin
17. L. Finesso, G. Picci (1984). Linear statistical models and stochastic realization theory. A.
Bensoussan, J.L. Lions (eds). Analysis and Optimization of Systems (Proc 6th Intern Conf Anal
Optimiz Syst, Nice, June 1984), Lect Notes Contr Inform Sci 62, Springer, Berlin, 445-470
18. B.A. Francis, M. Vidyasagar (1983). Algebraic and topological aspects of the regulator problem
for lumped linear systems. Automatica 19, 87-90
19. P.A. Fuhrmann, J.C. Willems (1979). Factorization indices at infinity for rational matrix
functions. Integr Eq Oper Th 2, 287-301
20. C.W.J. Granger (1983). Co-integrated variables and error-correcting models, UCSD Discussion
Paper 83-13
21. M. Green, B.D.O. Anderson (1986). Identification of multivariable errors-in-variables models
with dynamics. IEEE Trans Automat Contr AC-31, 467-471
22. E.J. Hannan (1970). Multiple Time Series, Wiley, New York
23. E.J. Hannan, M. Deistler (1988). The Statistical Theory of Linear Systems, Wiley, New York
24. S. Johansen (1988). The mathematical structure of error correction models. N.U. Prabhu (ed).
Statistical Inference from Stochastic Processes (Proc AMS/IMS/SIAM Joint Summer Research
Conf, August 1987), Contemporary Mathematics, Vol 80, Amer Math Soc, Providence, RI,
359-386
25. S. Johansen (1988). Statistical analysis of cointegration vectors. J Econ Dyn Contr 12, 231-254
26. S. Johansen (1989). Estimation and hypothesis testing of cointegration vectors in Gaussian vector
autoregressive models, Preprint 3, Institute of Mathematical Statistics, Univ of Copenhagen
27. T. Kailath (1980). Linear Systems, Prentice-Hall, Englewood Cliffs, NJ
28. R.E. Kalman (1982). System identification from noisy data. A.R. Bednarek, L. Cesari (eds).
Dynamical Systems II (Univ Florida Intern Symp), Ac Press, New York
29. R.E. Kalman (1982). System identification from real data. M. Hazewinkel, A.H.G. Rinnooy Kan
(eds). Current Developments in the Interface: Economics, Econometrics, Mathematics, Reidel,
Dordrecht, 161-196
30. R.E. Kalman (1983). Identifiability and modeling in econometrics. P.R. Krishnaiah (ed).
Developments in Statistics 4, Ac Press, New York, 97-136
31. I. Karatzas (1989). Optimization problems in the theory of continuous trading. SIAM J Contr
Optimiz 27, 1221-1259
32. T. Kloek (1984). Dynamic adjustment when the target is nonstationary. International Economic
Review 25, 315-326
33. M. Kuijper, J.M. Schumacher (1990). State space formulas for transfer poles at infinity.
(Manuscript in preparation)
34. H. Kwakernaak, R. Sivan (1972). Linear Optimal Control Systems, Wiley, New York
35. P. Lancaster (1966). Lambda-matrices and Vibrating Systems, Pergamon Press, Oxford
36. C.A. Los (1989). The prejudices of least squares, principal components and common factors
schemes. Computers Math Appl 17, 1269-1283
37. C.A. Los (1989). Identification of a linear system from inexact data: A three-variable example.
Computers Math Appl 17, 1285-1304
38. E.J. Moore (1985). On system-theoretic methods and econometric modeling. Intern Econ Rev
26, 87-110
39. A.S. Morse, W.M. Wonham (1971). Status of noninteracting control. IEEE Trans Automat
Contr AC-16, 568-581
40. H. Nijmeijer (1989). On dynamic path decoupling and dynamic path controllability in economic
systems. J Econ Dyn Contr 13, 21-39
41. H. Nijmeijer, J.M. Schumacher (1985). On the inherent integration structure of nonlinear
systems. IMA J Math Contr Inf 2, 87-107
42. P.C.B. Phillips, S.N. Durlauf (1986). Multiple time series regression with integrated processes.
Review of Economic Studies 53, 473-495


43. G. Picci, S. Pinzoni (1986). A new class of dynamic models for stationary time series. S. Bittanti
(ed). Time Series and Linear Systems, Lect Notes Contr Inf Sci 86, Springer, Berlin, 69-114
44. A.J. Preston (1974). A dynamic generalization of Tinbergen's theory of policy. Review of Economic
Studies 41, 65-74
45. C. van Putten, J.H. van Schuppen (1983). The weak and strong Gaussian probabilistic realization
problem. J Multivar Anal 13, 118-137
46. M. Salmon (1988). Error correction models, cointegration and the internal model principle. J
Econ Dyn Contr 12, 523-549
47. J.D. Sargan (1964). Wages and prices in the United Kingdom: A study in econometric
methodology. P.E. Hart, G. Mills, J.N. Whittaker (eds). Econometric Analysis for National
Economic Planning, Butterworths, London
48. W. Schneider (1986). Der Kalmanfilter als Instrument zur Diagnose und Schätzung variabler
Parameter in ökonometrischen Modellen, Physica-Verlag, Heidelberg
49. J.M. Schumacher (1983). The algebraic regulator problem from the state-space point of view.
Lin Alg Appl 50, 487-520
50. J.M. Schumacher (1986). Residue formulas for meromorphic matrices. C.I. Byrnes, A. Lindquist
(eds). Computational and Combinatorial Methods in System Theory, North-Holland, Amsterdam,
97-111
51. J.M. Schumacher (1989). Linear system representations. H. Nijmeijer, J.M. Schumacher (eds).
Three Decades of Mathematical System Theory. A Collection of Surveys at the Occasion of the
50th Birthday of Jan C. Willems, Lect Notes Contr Inform Sci 135, Springer, Berlin, 382-408
52. J.H. van Schuppen (1989). Stochastic realization problems. H. Nijmeijer, J.M. Schumacher (eds).
Three Decades of Mathematical System Theory. A Collection of Surveys at the Occasion of the
50th Birthday of Jan C. Willems, Lect Notes Contr Inform Sci 135, Springer, Berlin, 480-523
53. A. Seierstad, K. Sydsaeter (1987). Optimal Control Theory with Economic Applications, Advanced
Textbooks in Economics, Vol 24, North-Holland, Amsterdam
54. J. Tinbergen (1952). On the Theory of Economic Policy (2nd ed), North-Holland, Amsterdam
55. A. Tustin (1953). The Mechanism of Economic Systems. An Approach to the Problem of Economic
Stabilisation from the Point of View of Control-System Engineering, Heinemann, London
56. J.C. Willems (1979). System theoretic models for the analysis of physical systems. Ricerche di
Automatica 10, 71-106
57. J.C. Willems (1983). Input-output and state-space representations of finite-dimensional linear
time-invariant systems. Lin Alg Appl 50, 581-608
58. J.C. Willems (1986). From time series to linear system. Part I: Finite dimensional linear time
invariant systems. Automatica 22, 561-580
59. J.C. Willems (1988). Models for dynamics. U. Kirchgraber, H.O. Walther (eds). Dynamics
Reported (Vol 2), Wiley/Teubner, 171-269
60. W.M. Wonham (1979). Linear Multivariable Control: a Geometric Approach (2nd ed), Springer,
New York

Dynamical Systems that Learn Subspaces*


R. W. Brockett
Division of Engineering and Applied Physics,
Harvard University,
Cambridge, MA 02138, USA

In this paper we define and study a class of nonlinear filters capable of solving a range of problems
which arise in the design of adaptive analog systems. Even though the definitions used are compelling
in terms of the phenomena and natural in terms of mathematics, the filters require careful analysis
because they can exhibit discontinuous behavior. Using some recent results on a type of gradient
flow on the orthogonal group, we are able to construct a differential equation realization of a
smooth approximation to these filters.

1 Introduction
During the summer of 1970 I spent a pleasant period at Stanford at R.E.
Kalman's invitation. Although his published work stemming from this period
is, for the most part, devoted to problems in algebraic system theory, on that
occasion he had invited people working in a variety of areas, to participate in
teaching a summer school. The visitors included Stephen Grossberg who
delivered, with considerable enthusiasm, a set of lectures on learning. I must
confess that even though the neural models were presented with conviction, the
complexity of the nonlinear differential-difference equation models being
advanced left me feeling that I wasn't ready for work on learning. Of course,
in the meantime cognitive science has become one of the most exciting fields
of inquiry with important contributions coming not only from neuroscientists
but also from computer scientists, electrical engineers, physicists, and statisticians. I can't be sure that the Stanford experience played a significant role in
the decision, but nonetheless, twenty years later, I find myself writing about
dynamical models for learning.
The topics we are concerned with here are standard fare in the neural
network literature. Even so, clear statements about underlying mathematics are
often missing. Our work is most specifically related to the properties of what
are often called linear feedforward networks. We will develop an alternative to
the usual algorithms but most importantly we insist that there be no distinctions

* This work was supported in part by the U.S. Army Research Office under grant
DAAL03-86-K-0171, the National Science Foundation under grant CDR-85-00108, and DARPA
grant AFOSR-89-0506.

580

R. W. Brockett

between the learning phase and the operational phase of the system. Singular
value decomposition applied to operators on finite dimensional spaces, be this
expressed in the language of statistics (principal component analysis), stochastic
processes (Karhunen-Loève expansion) or ordinary linear regression, plays an
important role. In terms of applications, there are points of contact with the
work of Kohonen on autoassociation (see especially Chap. 4 of [1]), work on
adaptive arrays (see the last few Chaps. of [2]) and other topics in neural
networks ([3] is a timely reference). The recent papers of Bourland and Karp
[4], and Baldi and Hornik [5] have done a great deal to clarify the role of
singular value decomposition in this context. In particular, an important aspect
of the use of principal component analysis vis-à-vis the learning phase of
the weight selection process is investigated in [5]. In the spirit of Kalman's
pioneering work on the realization problem [6], one of the goals
of this paper is to capture in one state space model all aspects of the phenomena
being investigated. Thus in Sect. 5 we present a complete, input-output,
learning-phase/operational-phase model for learning subspaces; the operations
needed to generate the flow are just addition, subtraction and multiplication.
Most of the literature in this area deals with difference equation models, but
one can make the case that differential equations are at least as plausible in
terms of biological applicability and we will work with them exclusively. There
would be no intrinsic difficulty in recasting the ideas in difference equation
form.

2 Adaptive Subspace Filters


Our starting point is a standard one in electrical engineering. We consider
input-output systems acting to transform signals; we assume that the signals belong
to a vector space and that this space has an inner product. The signal processing
which is of interest is that of amplifying or suppressing particular components
of signals, depending on the subspace in which these components reside. What
sets the present discussion apart from basic linear system theory is that we
allow the choice of subspaces to be determined adaptively, which is to say that
it can depend on the past values of the signal being processed. Typical problems
in this domain are that of building and analyzing systems which allow all signals
in the most frequently used subspace of a certain dimension to pass through
unchanged while suppressing completely signals which lie in the orthogonal
complement of this subspace. Other possibilities include shaping the response
by amplifying the projections onto different subspaces by different amounts,
including the possibility of completely suppressing signals in the subspace of a
certain dimension defined by the components which have the most energy.
Because they adapt to the signals they process, often these filters will have
associated with them a parameter which measures how much signal activity is
required to move a subspace by a fixed amount.

9 Applications-Dynamical Systems that Learn Subspaces

581

In Sect. 5 we will discuss approximate implementations in terms of
differential equations, but before proceeding to that we consider a more tractable,
idealized, situation. Let E^m[0, t] denote the set of all square integrable functions
from [0, t] into euclidean m-space, E^m. Suppose that u ∈ E^m[0, t] and suppose that
there is given an ordered set of real numbers λ_1, λ_2, ..., λ_m. We find an
orthonormal basis {e_i(t)}^m_{i=1} for E^m which is adapted to u in the sense that for
any other orthonormal basis {f_i(t)}^m_{i=1} and any k = 1, 2, ..., m,

Σ_{i=1}^{k} ∫_0^t ⟨e_i(t), u(σ)⟩² dσ  ≥  Σ_{i=1}^{k} ∫_0^t ⟨f_i(t), u(σ)⟩² dσ

That is, the e-basis is ordered by the integral squared value of ⟨u, e_i(t)⟩ on
[0, t]. Such a basis always exists (we will show how to construct it later) and
it is unique up to the choice of sign unless it happens that two or more of the
e's span spaces of equal "energy". We use this definition and the set {λ_i}^m_{i=1} to
define a mapping ℱ_λ: E^m[0, t] → E^m[0, t] whose action is given by

ℱ_λ(u)(t) = Σ_{i=1}^{m} λ_i ⟨u(t), e_i(t)⟩ e_i(t)

Because the choice of sign for the e_i(t) cancels out, this definition is unambiguous
unless there are two or more e_i's which define subspaces with equal energy. In
that case we resolve the ambiguity by multiplying all the components of u(t)
which lie in an equal energy subspace by the average of the corresponding λ's.
We call ℱ_λ the ideal, nonfading memory, adaptive subspace filter with weight
vector λ.
If all the λ's are the same, λ_1 = λ_2 = ... = λ_m = a, then ℱ_λ(u(t)) = a u(t);
otherwise ℱ_λ shapes u(t) in a way that depends nonlinearly on the past of u.
Even so the definition implies that ℱ_λ is homogeneous of degree 1: ℱ_λ(a u(t)) =
a ℱ_λ(u(t)). Also, ℱ_λ is additive with respect to the weights:

ℱ_λ(u(t)) + ℱ_μ(u(t)) = ℱ_{λ+μ}(u(t))

The relationship between weight multiplication and the composition law is
more interesting.
Lemma. If λ is monotone decreasing in the sense that λ_1 > λ_2 > ... > λ_m, then
ℱ_ν = ℱ_μ(ℱ_λ) with ν = (λ_1 μ_1, λ_2 μ_2, ..., λ_m μ_m). If λ and μ are both monotone
decreasing, then ℱ_ν = ℱ_λ(ℱ_μ) = ℱ_μ(ℱ_λ).
Proof. Consider ℱ_μ(ℱ_λ). If {e_i(t)}^m_{i=1} is adapted to u and if λ is monotone
decreasing, then {e_i(t)}^m_{i=1} is also adapted to ℱ_λ(u) and so the result follows.
The same reasoning applies to ℱ_λ(ℱ_μ).
If λ is not monotone, there will be choices for u such that the e-basis for u
is not an e-basis for ℱ_λ(u). In general the composition of two subspace filters
is not a subspace filter.
Even though ℱ_λ(u)(t) depends on the behavior of u on the whole interval
[0, t], the only aspects of the past of u which matter are those which are needed


to compute the adapted orthonormal basis {e_i(t)}^m_{i=1}. If we pick an arbitrary
time independent orthonormal basis for E^m, say {f_i}^m_{i=1}, and generate the
m(m + 1)/2 independent entries of the symmetric matrix Q(t) defined by

q_{ij}(t) = ∫_0^t ⟨u(σ), f_i⟩ ⟨u(σ), f_j⟩ dσ

then Q(t) provides a summary of the past of u which is adequate to permit one
to compute the e_i(t). In fact, because Q(t) is symmetric and positive semidefinite,
there necessarily exists an orthogonal matrix Θ(t) such that Q̂ = Θ(t)Q(t)Θ^T(t)
is diagonal with the eigenvalues of Q arranged in decreasing order along the
diagonal of Q̂. As above, if the eigenvalues of Q are unrepeated, then Θ(t) is
unique to within a premultiplication by a diagonal matrix D whose diagonal
entries are ±1. Regardless of how this choice of signs is made, the rows of Θ(t)
are suitable basis vectors for the definition of ℱ_λ. This is the construction of
{e_i(t)}^m_{i=1} that was promised. In view of this we can assert the following theorem.
Theorem 1. Let Λ be the m by m matrix diag(λ_1, λ_2, ..., λ_m). Except for those
values of t for which Q(t) has repeated eigenvalues, the input-output system ℱ_λ
is realized by

Q̇(t) = u(t)u^T(t);   Q(0) = 0
y(t) = Θ^T(t) Λ Θ(t) u(t)

with Θ(t) being any orthogonal matrix such that Q̂(t) = Θ(t)Q(t)Θ^T(t) is diagonal
with the diagonal entries of Q̂ arranged in decreasing order along the diagonal.
Moreover, given any Q̄ = Q̄^T > 0 and any t > 0 there exists u ∈ E^m[0, t] which
drives Q to Q̄ at time t, and if Λ is not a multiple of the identity, for any Q_1 ≠ Q_2
there is a u which evokes a different response from Q(0) = Q_1 than it does from
Q(0) = Q_2.
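As a numerical illustration of the realization in Theorem 1, one can accumulate Q(t), diagonalize it with eigenvalues in decreasing order, and form y = Θ^T Λ Θ u. The Python sketch below is an editorial addition, not from the paper; the Euler discretization, step size, and function name are assumptions made for the example. With equal weights the filter reduces to a constant gain, which gives an easy sanity check.

```python
import numpy as np

def subspace_filter(us, lams, dt=1e-2):
    """Discretized realization of the ideal filter: accumulate Q, then
    output y = Theta^T Lam Theta u with Theta ordering eigenvalues."""
    m = len(lams)
    Lam = np.diag(lams)
    Q = np.zeros((m, m))
    ys = []
    for u in us:
        Q += np.outer(u, u) * dt            # Euler step of Qdot = u u^T, Q(0) = 0
        _, V = np.linalg.eigh(Q)            # eigenvectors, ascending eigenvalues
        Theta = V[:, ::-1].T                # rows = eigenvectors, decreasing order
        ys.append(Theta.T @ Lam @ Theta @ u)
    return ys

rng = np.random.default_rng(0)
us = rng.standard_normal((50, 3))
ys = subspace_filter(us, [2.0, 2.0, 2.0])   # equal weights: filter is y = 2u
print(np.allclose(ys[-1], 2.0 * us[-1]))    # True
```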

If we regard E^m[0, t] as a normed space with ||u|| = (∫_0^t u^T(σ)u(σ) dσ)^{1/2} and
think of ℱ_λ as mapping this normed space into itself, then ℱ_λ is not only
nonlinear but actually discontinuous. That is to say there exist points u_0 in the
space of inputs and sequences {u_i}^∞_{i=1} converging to u_0 such that the output
sequence {ℱ_λ(u_i)}^∞_{i=1} does not converge to ℱ_λ(u_0). These points of discontinuity
correspond to inputs having the property that the energies associated with two
or more different subspaces with unequal λ's are equal over an interval of
positive length. That is, Q(t) has repeated eigenvalues, not just at one moment
in time but over an interval, and the corresponding values of λ are distinct.
Even though the map ℱ_λ of E^m[0, t] into E^m[0, t] is not continuous everywhere,
it is nicely behaved in a neighborhood of typical inputs and it is of
interest to compute the derivative at those points where it does exist. In this
theorem statement and elsewhere, the notation [A, B] = AB − BA and A∘B =
(a_{ij}b_{ij}) will be used for the Lie bracket and Schur-Hadamard product,
respectively.


Theorem 2. Suppose that u_0 ∈ E^m[0, t] and suppose that

Q_0(t) = ∫_0^t u_0(σ)u_0^T(σ) dσ

has unrepeated eigenvalues q_1 > q_2 > ... > q_m. Let Θ_0(t) be such that
Θ_0(t)Q_0(t)Θ_0^T(t) = diag(q_1, q_2, ..., q_m) and let Q^# denote the skew-symmetric
matrix whose ij-th entry, for i ≠ j, is (q_i − q_j)^{-1}. Assume that ℱ_λ: E^m[0, t] → E^m[0, t]
maps u_0 into y_0. If ℱ_λ(u_0 + ε δu) = y_0 + ε δy, then to first order in ε

δy(t) = Θ_0^T(t) Λ Θ_0(t) δu(t)
        + [Q^# ∘ (Θ_0(t) ∫_0^t (u(σ)δu^T(σ) + δu(σ)u^T(σ)) dσ Θ_0^T(t)), Θ_0^T(t) Λ Θ_0(t)] u(t)

Proof. Replacing u by u + ε δu gives

Q_0(t) + ε δQ(t) = ∫_0^t (u(σ) + ε δu(σ))(u(σ) + ε δu(σ))^T dσ

so that to first order in ε

δQ(t) = ∫_0^t (u(σ)δu^T(σ) + δu(σ)u^T(σ)) dσ

Since Θ is to be orthogonal and since (I + εΩ) is orthogonal to first order in
ε if and only if Ω = −Ω^T, we express the variation in Θ required to keep Q̂
diagonal to first order as Θ_0(I + εΩ) with Ω = −Ω^T. Thus we have

Θ_0(I + εΩ)(Q_0 + ε δQ)(I − εΩ)Θ_0^T = Q̂ + ε δQ̂

This yields the relations

Θ_0 Ω Q_0 Θ_0^T − Θ_0 Q_0 Ω Θ_0^T + Θ_0 δQ Θ_0^T = δQ̂

or

Θ_0ΩΘ_0^T Q̂ − Q̂ Θ_0ΩΘ_0^T = −Θ_0 δQ Θ_0^T + δQ̂

Since Q̂ is diagonal, the effect of the two Q̂ terms on the left side of this equation
is to multiply the ij-th element of Θ_0ΩΘ_0^T by −(q_i − q_j). Since the matrix
Θ_0ΩΘ_0^T Q̂ − Q̂ Θ_0ΩΘ_0^T which appears on the left is clearly zero on the diagonal
we see that δQ̂ = diag Θ_0 δQ Θ_0^T and that

Θ_0ΩΘ_0^T = −Q^# ∘ (Θ_0 δQ Θ_0^T − diag Θ_0 δQ Θ_0^T)

But since Q^# is also zero on the diagonal, Q^# ∘ diag Θ_0 δQ Θ_0^T is zero and

Θ_0ΩΘ_0^T = −Q^# ∘ (Θ_0 δQ Θ_0^T)


Returning now to the definition y(t) = Θ^T(t)ΛΘ(t)u(t), using linearization we
see that

δy(t) = Θ_0^T(t)ΛΘ_0(t)δu(t) + Θ_0^T(t)Ω(t)ΛΘ_0(t)u(t) − Θ_0^T(t)ΛΩ(t)Θ_0(t)u(t)
      = Θ_0^T(t)ΛΘ_0(t)δu(t) + Θ_0^T(t)[Ω(t), Λ]Θ_0(t)u(t)
      = Θ_0^T(t)ΛΘ_0(t)δu(t) + [Θ_0^T(t)ΩΘ_0(t), Θ_0^T(t)ΛΘ_0(t)]u(t)

The formula of the theorem is obtained by substituting the above expression
for Θ_0^TΩΘ_0 into this expression.   □
This calculation shows that the output of ℱ_λ depends smoothly on the
input as long as the eigenvalues of Q are unrepeated but that the norm of the
derivative of ℱ_λ goes to infinity as two eigenvalues of Q(t) approach each other.
Notice that, in particular, ℱ_λ is not differentiable at u_0 = 0, as is consistent with
the general fact about nonlinear maps which are homogeneous of degree 1.

3 Autoassociation and Regression


There are three related minimization problems which play a role in the theory
of linear feedforward networks. We review them quickly because they set the
stage for our subsequent discussion.
Problem 1. Given n vectors in E^m, {y_i}^n_{i=1}, and given an m by r matrix K, find

η_1 = min_{x_i} Σ_{i=1}^{n} ||y_i − Kx_i||²;   x_i ∈ E^r

Problem 2. Given two sets of n vectors in E^m, {y_i}^n_{i=1}, {x_i}^n_{i=1}, and given an
integer r ≤ m, find

η_2 = min_K Σ_{i=1}^{n} ||y_i − Kx_i||²;   rank K = r

Problem 3. Given n vectors in E^m, {y_i}^n_{i=1}, find

η_3 = min_{K, x_i} Σ_{i=1}^{n} ||y_i − Kx_i||²;   K an m by r matrix of rank r, x_i ∈ E^r

The first of these is solved by projecting y_i onto the range space of K. If K
is of rank r there is a simple formula

x_i = (K^TK)^{-1}K^Ty_i

Using this, problem 3 can be recast by substituting for x_i its optimal value as
a function of K. This gives

η_3 = min_K Σ_{i=1}^{n} ||y_i − K(K^TK)^{-1}K^Ty_i||²;   rank K = r
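The formula x_i = (K^TK)^{-1}K^Ty_i is the familiar least squares projection. As a quick check (an added sketch with arbitrary random data, not an example from the text), the residual it leaves is orthogonal to the range of K, which is exactly the normal-equations condition:

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.standard_normal((5, 2))           # m = 5, r = 2, full column rank
y = rng.standard_normal(5)

x = np.linalg.solve(K.T @ K, K.T @ y)     # x = (K^T K)^{-1} K^T y
residual = y - K @ x
print(np.allclose(K.T @ residual, 0.0))   # True: y - Kx is orthogonal to range K
```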


But since P = K(K^TK)^{-1}K^T = P^T = P², we see that P is a rank r orthogonal
projection; it can be written as ΘΛΘ^T with Λ = diag(1, 1, ..., 1, 0, 0, ..., 0) and
rank Λ = r. Thus on one hand we can think of problem 3 as that of finding the
best r-plane in E^m, or we can think of it as finding the best orthogonal matrix
acting on E^m. Since the manifold of all r-planes in E^m is of dimension mr − r²
and the manifold of all m by m orthogonal matrices is m(m − 1)/2 dimensional,
formulating the optimization in terms of the Grassmann manifold of r-planes
in m-space involves fewer parameters. Nonetheless, the descent algorithm of
Sect. 5 is formulated in the space of orthogonal matrices, generating the
r-plane in m-space via ΘΛΘ^T.
Problem 2 is slightly more involved. As pointed out by Baldi and Hornik
[5], it is useful to rewrite K as K = AB with A m by r and B r by m.
Introduce the sample covariance matrices

Σ_yy = Σ_{i=1}^{n} y_i y_i^T;   Σ_yx = Σ_{i=1}^{n} y_i x_i^T
Σ_xy = Σ_{i=1}^{n} x_i y_i^T;   Σ_xx = Σ_{i=1}^{n} x_i x_i^T

We assume, for simplicity, that Σ_xx is invertible and change variables accordingly
to B̄ = B Σ_xx^{1/2}. Using the identity ⟨X_1, X_2⟩ = tr X_1X_2^T we can write

η_2 = min_{A,B̄} tr(Σ_yy − 2AB̄Σ_xx^{-1/2}Σ_xy + AB̄B̄^TA^T);   A m by r;   B̄ r by m

For a fixed B̄ the optimization of A is straightforward quadratic minimization
which gives

A = Σ_yx Σ_xx^{-1/2} B̄^T(B̄B̄^T)^{-1}

Substituting this into the expression for η_2 and taking advantage of a
cancellation of terms, gives

η_2 = min_{B̄} tr(Σ_yy − Σ_yx Σ_xx^{-1/2} B̄^T(B̄B̄^T)^{-1} B̄ Σ_xx^{-1/2} Σ_xy)

Again, P = B̄^T(B̄B̄^T)^{-1}B̄ is an orthogonal projection and thus problem 2, after
a preliminary transformation which only involves linear algebra, is reduced to
the problem of finding an optimal r-plane in E^m. This, in turn, can be replaced
by finding an optimal m by m orthogonal matrix as above.
The ℱ_λ filters have been defined so as to provide solutions to function space
versions of problems of this type. We consider two examples.


Example 1. The autoassociative problem. (See [1], [4], [5].) The autoassociation
problem is that of finding a system which is capable of categorizing a set
of examples and then using what it has learned to assign subsequent inputs to
the correct category, even if they are somewhat inaccurate or incomplete. More
precisely, suppose one is given an input u ∈ E^m[0, t] and that, for the most part,
the interval [0, t] consists of segments during which u(t) is drawn from a finite
set of constant vectors with only an occasional transition between these
segments. Say

u(σ) = u_{i_1};   0 ≤ σ ≤ t_1
u(σ) = u_{i_2};   t_1 + δ ≤ σ ≤ t_2
   ...
u(σ) = u_{i_r};   t_{r-1} + δ ≤ σ ≤ t_r

The values of u during the transitions are unimportant because δ is small
compared with t_{i+1} − t_i. Assume that the vector space spanned by the set {u_i}^n_{i=1}
is r < n dimensional and that we want to find an r by n matrix G and an n by
r matrix H so that

η = ∫_0^t min_i ||HGu(σ) − u_i||² dσ

is as small as possible. This problem is solved by the ℱ_λ filter which maps
E^m[0, t] into itself and has λ = (1, 1, ..., 1, 0, 0, ..., 0) with r ones and m − r zeros.
To see this, form the sample covariance matrix Q(t) = ∫_0^t u(σ)u^T(σ)dσ and define
y by

y(t) = Θ^T(t)ΛΘ(t)u(t)

with Θ(t)Q(t)Θ^T(t) = Q̂ = diag(q_11, q_22, ..., q_mm) as in Theorem 1. This rule will
associate to u(t) an element in the space spanned by {u_i}^n_{i=1}, approximating u
as well as possible in the least squares sense. In view of the form of Λ we should
let H and G be the first r columns of Θ^T and the first r rows of Θ(t), respectively.
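The construction in Example 1 can be tried out numerically. The sketch below is an editorial illustration (the data generation, seed, and variable names are assumptions): draw training samples from a fixed r-dimensional subspace, form the sample covariance, and read H and G off the energy-ordered eigenbasis; a fresh input from the learned subspace then passes through HG unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
m, r = 4, 2
basis = np.linalg.qr(rng.standard_normal((m, r)))[0]  # span of the stored vectors
samples = basis @ rng.standard_normal((r, 200))       # training inputs in that span
Q = samples @ samples.T                               # sample covariance Q(t)

_, V = np.linalg.eigh(Q)                  # eigenvectors, ascending eigenvalues
Theta = V[:, ::-1].T                      # rows ordered by decreasing energy
H = Theta.T[:, :r]                        # first r columns of Theta^T
G = Theta[:r, :]                          # first r rows of Theta

u_new = basis @ rng.standard_normal(r)    # new input from the learned subspace
print(np.allclose(H @ G @ u_new, u_new))  # True: reproduced by the filter
```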
Example 2. Total least squares approximation. This involves generating an
approximating linear transformation associated with empirical data. The
approximation is to be optimal in a total least squares sense. Suppose that
a ∈ E^n and b ∈ E^n are, to within certain error, related by b = Ta and that over a
period of time a number of instantiations of this relationship have been made
available. We write these as (a(t), b(t)). Consider the 2n-dimensional input and
2n-dimensional output adaptive subspace filter ℱ_λ with λ being n ones followed
by n zeros, i.e. λ = (1, 1, ..., 1, 0, 0, ..., 0). Form a 2n-dimensional vector

u(t) = [ a(t) ]
       [ b(t) ]

If the relation b(t) = Ta(t) holds, then the Q of Theorem 1 is given by the 2n
by 2n symmetric matrix (writing S(t) = ∫_0^t a(σ)a^T(σ) dσ)

Q(t) = [ S(t)      S(t)T^T  ]
       [ TS(t)   TS(t)T^T ]

A short calculation shows that in this ideal case, unless a is confined to a proper
subspace of E^n on all of [0, t], the Θ which diagonalizes Q(t) is such that
(Λ = diag(1, 1, ..., 1, 0, 0, ..., 0))

Θ^TΛΘ = [ (I + T^TT)^{-1}       (I + T^TT)^{-1}T^T  ]
        [ T(I + T^TT)^{-1}    T(I + T^TT)^{-1}T^T ]

Using the relationship Ta = b we conclude that Θ^TΛΘu is given by

Θ^TΛΘ [ a(t) ]  =  [ a(t) ]
      [ b(t) ]     [ b(t) ]

Of course if the pairs a(t), b(t) are distorted by noise, then Θ will track some
nearby appropriate average and Θ^TΛΘu will only approximate the desired
output.
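The block matrix appearing in Example 2 is the orthogonal projection onto the graph of T, so for ideal pairs (a, Ta) it acts as the identity. A short numerical check (an added illustration with an arbitrary random T, not from the paper; np.block assembles the four blocks):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
T = rng.standard_normal((n, n))
M = np.linalg.inv(np.eye(n) + T.T @ T)        # (I + T^T T)^{-1}
P = np.block([[M, M @ T.T],
              [T @ M, T @ M @ T.T]])          # projection onto the graph of T

a = rng.standard_normal(n)
u = np.concatenate([a, T @ a])                # ideal pair (a, b) with b = T a
print(np.allclose(P @ u, u))                  # True: u passes through unchanged
print(np.allclose(P, P.T) and np.allclose(P @ P, P))  # True: orthogonal projection
```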

4 Fading Memory; Bounded Q


There are both performance and implementation considerations which argue
for making a slight modification in the adaptive subspace filter. With respect
to performance, the difficulty is that a strong signal, confined to a low
dimensional subspace, can destroy the structure in the complementary
subspaces, effectively washing away the subspace structure associated with those
experiences which are not repeated with sufficient frequency or intensity. This
problem has attracted some attention in the literature [7-9]. On the other
hand, implementation is made difficult by the fact that the solution of the Q
equation of the previous section grows without bound and thus cannot be
realized exactly.
A simple way to address these problems is to add a normalization term to
the equation for Q so as to prevent it from growing beyond a certain point.
Perhaps the most elementary way to do this is to introduce proportional
discounting, reducing elements of Q by an amount proportional to their size.
This leads to an equation of the form
    Q̇(t) = u(t)u^T(t) - u^T(t)Q(t)u(t)Q(t)

588

R. W. Brockett

If we define ‖Q‖ to be (Σ q_ij^2)^{1/2} = (tr Q^TQ)^{1/2}, the so-called Frobenius norm, then
along solutions of Q̇ = uu^T - u^TQuQ we have

    d/dt ‖Q‖^2 = d/dt tr Q^TQ
               = 2 tr(Q^T(t)u(t)u^T(t)) - 2u^T(t)Q(t)u(t) tr Q^T(t)Q(t)
               = 2u^T(t)Q(t)u(t)(1 - tr Q^T(t)Q(t))

Thus if ‖Q‖ is initially less than one, it will grow and approach one but not
grow beyond it; if it is initially greater than one, it will decrease to one but not
go below it. If ‖Q‖ is initially one, it will remain so.
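A small Euler simulation (numpy; the input signal, step size, and tolerance are invented for illustration) exhibits this behavior of the Frobenius norm:

```python
import numpy as np

dt, steps = 0.01, 20000
Q = 0.1 * np.eye(2)                       # ||Q(0)|| = 0.1*sqrt(2) < 1
for k in range(steps):
    t = k * dt
    u = np.array([np.cos(t), np.sin(t)])  # a persistently exciting input
    Q = Q + dt * (np.outer(u, u) - (u @ Q @ u) * Q)

frob = np.sqrt(np.trace(Q.T @ Q))
assert abs(frob - 1.0) < 0.05             # the norm has been driven toward one
```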
With this normalization the equation for Q can be rewritten as

    Q(t) = ∫_0^t u(σ)u^T(σ) exp(-∫_σ^t u^T(ξ)Q(ξ)u(ξ) dξ) dσ
which, although still implicit, clearly places in evidence the fact that past values
of u are discounted more heavily than recent values. This change in the definition
of Q suggests that we introduce a corresponding version of the adaptive subspace
filter.
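The discounted-integral form can be checked against the differential equation directly. The sketch below (numpy; the signal, horizon, and discretization are invented for illustration) integrates the Q equation by Euler steps from Q(0) = 0, so that no initial-condition term enters, and compares the result with the matching left-endpoint discretization of the integral:

```python
import numpy as np

dt, steps = 0.001, 10000
us, gs = [], []                            # samples of u and of u^T Q u
Q = np.zeros((2, 2))                       # Q(0) = 0: no initial-condition term
for k in range(steps):
    t = k * dt
    u = np.array([np.cos(t), np.sin(t)])
    g = u @ Q @ u                          # instantaneous discount rate u^T Q u
    us.append(u)
    gs.append(g)
    Q = Q + dt * (np.outer(u, u) - g * Q)  # Euler step of the Q equation

# Discounted integral with the matching discretization:
# Q(T) ~ sum_k u_k u_k^T exp(-sum_{j>k} g_j dt) dt
G = np.cumsum(np.array(gs)[::-1])[::-1] * dt       # tail sums of g
Q_int = sum(np.exp(-(G[k] - gs[k] * dt)) * dt * np.outer(us[k], us[k])
            for k in range(steps))

assert np.linalg.norm(Q - Q_int) < 0.01 * np.linalg.norm(Q)
```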

Definition. If u ∈ E^m[0, t] and if y = ℱ_λ(u) is given by

    Q̇(t) = u(t)u^T(t) - u^T(t)Q(t)u(t)Q(t);   Q(0) = I/√m

    y(t) = Θ^T(t)ΛΘ(t)u(t)

with Λ = diag(λ_1, λ_2, ..., λ_m) and Θ(t)QΘ^T(t) = diag(q_11, q_22, ..., q_mm) with
q_11 ≥ q_22 ≥ ... ≥ q_mm, then we say that ℱ_λ is the ideal, fading memory, adaptive
subspace filter with weight vector λ.
This fading memory scheme does not help with the problem of some low
dimensional subspace driving the rest of the subspace structure into the noise
level. As an aside we mention a generalization of this discounting scheme which
addresses this problem. Let (μ_1, μ_2, ..., μ_m) be a set of real numbers and let
M = diag(μ_1, μ_2, ..., μ_m). Let Θ be such that ΘQΘ^T is diagonal with the diagonal
entries ordered as above. The equation

specializes to Q̇ = uu^T - u^TQuQ when M is the identity but otherwise the effect
of M is as follows. The orthogonal matrix Θ projects the incoming u-vector
onto the eigenspaces of Q and multiplies the i-th projection by μ_i. If
μ_1 ≥ μ_2 ≥ ... ≥ μ_m, this will have the effect of offsetting the tendency of the larger
and more frequently occurring values of u to wash out the fine structure which
goes along with the less frequently used subspaces.


5 A Continuous Descent Equation


The difficulty with the differential equation realization of Theorem 1 is that it
requires the computation of Θ and no algorithm for computing Θ is given. As
discussed in the Introduction, our goal in this paper is to find a set of ordinary
differential equations whose right-hand sides do not involve anything other
than addition, subtraction and multiplication and which will implement the
adaptive subspace filter. Although stated in different terms, Sanger [10] achieves
something similar. The key result of this section comes from our papers
[11-12].
Let SO(n) denote the set of all n by n orthogonal matrices with positive
determinant and let Q be any real symmetric matrix. If N is a second real
symmetric matrix, then tr(ΘQΘ^TN) is real valued for every Θ ∈ SO(n). We
can define an inner product on the space of symmetric matrices by
⟨H, N⟩ = tr H^TN. It is intuitive that the largest value of tr(ΘQΘ^TN) for Q and
N fixed and Θ orthogonal will be achieved when ΘQΘ^T is, in some sense, as
closely aligned with N as possible. Taking the derivative of tr(ΘQΘ^TN) and
setting it to zero gives (see [10]) ΘQΘ^TN = NΘQΘ^T. If it happens that N is
diagonal with unrepeated entries on the diagonal, then ΘQΘ^TN = NΘQΘ^T
implies that ΘQΘ^T is diagonal. This does not define Θ uniquely but only
defines it to within a permutation and 2^n choices of sign. The maximum of the
trace occurs when (q_11, q_22, ..., q_nn) and (n_11, n_22, ..., n_nn) are similarly ordered,
e.g. if n_11 > n_22 > ... > n_nn, then we should have q_11 > q_22 > ... > q_nn.
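The similar-ordering claim for the maximum can be checked by brute force over permutation matrices (numpy and itertools; the diagonal values are invented for illustration):

```python
import itertools
import numpy as np

Q = np.diag([3.0, 1.0, 2.0])             # eigenvalues of Q, deliberately unsorted
N = np.diag([5.0, 4.0, 1.0])             # n11 > n22 > n33

best, best_perm = -np.inf, None
for perm in itertools.permutations(range(3)):
    P = np.eye(3)[list(perm)]            # a permutation matrix playing Theta
    val = np.trace(P @ Q @ P.T @ N)
    if val > best:
        best, best_perm = val, perm

D = np.eye(3)[list(best_perm)] @ Q @ np.eye(3)[list(best_perm)].T
assert np.allclose(np.diag(D), [3.0, 2.0, 1.0])  # maximizer is similarly ordered with N
assert abs(best - 24.0) < 1e-9                   # 3*5 + 2*4 + 1*1
```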
The idea of associating a gradient to a function can be made coordinate
free if there is a Riemannian metric on the space. Otherwise no intrinsic meaning
can be given to this idea. Thus steepest descent is only defined in a natural way
if one has such a Riemannian structure. In [11] we show that relative to the
natural metric on SO(n)

    Θ̇ = ΘQΘ^TNΘ - NΘQ

is the steepest descent equation for the function tr ΘQΘ^TN and that tr ΘQΘ^TN
has 2^n · n! stationary points, exactly 2^n of which are stable. Thus if N is diagonal
we may assert that, for almost all initial conditions on Θ, the solution of this
equation approaches a value which makes ΘQΘ^T diagonal and ordered
similarly with N.
Definition. If u(t) ∈ E^m, Λ = diag(λ_1, λ_2, ..., λ_m) and N = diag(1, 2, ..., m), we will
refer to

    Q̇(t) = u(t)u^T(t) - u^T(t)Q(t)u(t)Q(t);   Q(0) = I/√m

    Θ̇(t) = Θ(t)Q(t)Θ^T(t)NΘ(t) - NΘ(t)Q(t)

    y(t) = Θ^T(t)ΛΘ(t)u(t)

as the descent approximation to ℱ_λ.
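For a fixed symmetric Q the Θ equation is easy to exhibit numerically. The sketch below (numpy; the step size, duration, and the QR re-orthonormalization are invented implementation details, not part of the equations above) integrates the Θ equation and checks that ΘQΘ^T becomes diagonal with its entries in the decreasing order q_11 ≥ q_22 ≥ ... used in the definitions:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 3
A = rng.standard_normal((m, m))
Q = A + A.T                                # a fixed symmetric matrix to diagonalize
N = np.diag([1.0, 2.0, 3.0])               # N = diag(1, 2, ..., m)
Th = np.eye(m)                             # Theta(0)

dt = 1e-3
for _ in range(100000):
    H = Th @ Q @ Th.T
    Th = Th + dt * (H @ N @ Th - N @ Th @ Q)   # Euler step of the Theta equation
    # Euler steps drift off the orthogonal group; project back with a QR step
    Qf, Rf = np.linalg.qr(Th)
    Th = Qf * np.sign(np.diag(Rf))

H = Th @ Q @ Th.T
off = H - np.diag(np.diag(H))
assert np.linalg.norm(off) < 1e-2          # Theta Q Theta^T is (nearly) diagonal
assert np.all(np.diff(np.diag(H)) < 0)     # entries come out ordered q11 > q22 > q33
```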


Of course this system assigns to y(t) exactly the same value that ℱ_λ does
if the Θ equation has reached the equilibrium corresponding to one of its
minima. Roughly speaking, the quality of the approximation improves as the
rate of change of u is reduced. However, this system defines a continuous map
of E^m[0, t] into E^m[0, t] and cannot realize ℱ_λ exactly.
Evaluation of the right-hand side of this system of equations only involves
multiplication, addition and subtraction. The Θ equation plays the role here
that back propagation plays in that it can be thought of as adjusting weights.
The absence of local minima reported in [5] can be interpreted as a special
case of the assertion in [11] about the stationary values of tr(ΘQΘ^TN).

6 A Master Equation
The point of the previous set of equations is that although it is common to
discuss feedforward networks as operating in two stages, a learning or training
stage followed by an operational stage, in many cases this division is artificial
and can be done away with. Here we want to offer a slightly different form of
these equations, one which generates the sample covariance data and its
normalized eigenvectors at the same time.
Consider P = ΘQ with Θ and Q as above. Then

    Ṗ = Θ̇Q + ΘQ̇
      = ΘQΘ^TNΘQ - NΘQ·Q + Θuu^T - u^TQu·ΘQ
      = PΘ^TNP - NΘQ^2 + Θuu^T - u^TQuP

From the definitions we see that ΘQ is the left polar decomposition of P and
clearly P^TP = Q^TQ = Q^2. Thus we may write Q = √(P^TP) and Θ = P(√(P^TP))^{-1}.
This gives

    Ṗ = P(√(P^TP))^{-1}P^TNP - NP√(P^TP) + P(√(P^TP))^{-1}uu^T - u^T√(P^TP)u·P

This simplifies to

    Ṗ = √(PP^T)NP - NP√(P^TP) + P(√(P^TP))^{-1}uu^T - u^T√(P^TP)u·P

    y(t) = P(√(P^TP))^{-1}Λ(√(P^TP))^{-1}P^Tu(t)
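The polar-decomposition bookkeeping behind the master equation can be confirmed numerically (numpy; the matrices are invented for illustration): for P = ΘQ with Θ orthogonal and Q symmetric positive definite, the symmetric square root of P^TP recovers Q, and P(√(P^TP))^{-1} recovers Θ.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4

# An orthogonal Theta (QR of a random matrix) and a symmetric positive definite Q
Th, _ = np.linalg.qr(rng.standard_normal((m, m)))
B = rng.standard_normal((m, m))
Q = B @ B.T + m * np.eye(m)
P = Th @ Q

# Symmetric square root of P^T P via its eigendecomposition; here P^T P = Q^2
w, V = np.linalg.eigh(P.T @ P)
sqrtPTP = V @ np.diag(np.sqrt(w)) @ V.T

assert np.allclose(sqrtPTP, Q)                      # Q = sqrt(P^T P)
assert np.allclose(P @ np.linalg.inv(sqrtPTP), Th)  # Theta = P (sqrt(P^T P))^(-1)
```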

7 Conclusion
It seems that learning problems often have both a calculus (or continuous
mathematical) aspect and a combinatorial (or discrete mathematical) aspect.
For example, in the case of problems which reduce to principal component
analysis, we can think of the continuous part as that of finding the m eigenspaces


of a symmetric matrix, whereas the combinatorial part is that of selecting which
of the (m choose r) choices of r one-dimensional eigenspaces is best. In 1938 Shannon
[13] showed that Boolean algebra could be an effective tool in analyzing
switching (or combinatorial) networks. Twenty years later considerable effort
was devoted to bridging the gap between combinatorial logic networks and
resistive networks containing diodes and voltage sources. The monograph of
Dennis [14] is typical. This work brought linear inequalities and mathematical
programming into the picture in a central way. Following this were two
developments of importance which occurred more or less in parallel. Starting
about twenty years ago there was a renewed effort to find appropriate dynamical
models for "neurons" and networks of neurons [15]. One might characterize
some of the simpler of these as being the result of adding dynamical elements
to the nonlinear resistive models that were developed to synthesize logical
relations. Although starting somewhat later, there was an overlapping
development growing out of Karmarkar's work [16] on descent methods for
linear programming. This effort has shown, convincingly, that at least some
combinatorial problems can be solved by "analog", i.e. steepest descent,
methods. One might argue that the most remarkable aspect of this work is that
linear inequalities are replaced by completely smooth relationships, not just in
some approximate sense but in an exact sense.
What we see today might then be described as a beginning of a new synthesis
which promises fulfillment of a number of the goals which have led researchers
down this path. That is, the realization of an effective union of continuous and
combinatorial analysis in a form which can be used to solve interesting problems.
In the present paper this synthesis involves the replacement of combinatorial
logic by the minimization of a certain type of function defined on the orthogonal
group. In the future we may expect to see other smooth structures playing a
similar role.
References
[1] T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1984
[2] B. Widrow and S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985
[3] D.E. Rumelhart and J.L. McClelland, Parallel and Distributed Processing, MIT Press, Cambridge, MA, 1986
[4] H. Bourlard and Y. Kamp, "Auto-Association by Multilayer Perceptrons and Singular Value Decomposition," Biological Cybernetics, Vol 59 (1988) pp 291-294
[5] P. Baldi and K. Hornik, "Neural Networks and Principal Component Analysis: Learning from Examples Without Local Minima," Neural Networks, Vol 2 (1989) pp 53-58
[6] R.E. Kalman, "Canonical Structure of Linear Dynamical Systems," Proc. Nat. Acad. Sciences, Vol 48 (1962) pp 596-600
[7] E. Oja, "A Simplified Neuron Model as a Principal Component Analyzer," J. of Math. Biology, Vol 15 (1982) pp 267-273
[8] S. Shinomoto, "Memory Maintenance in Neural Networks," Journal of Physics, A20, 1987
[9] R.J.T. Morris and W.S. Wong, "A Short-Term Neural Network Memory," SIAM Journal on Computing, Vol 17, No 6 (1988) pp 1103-1118
[10] Terence D. Sanger, "Optimal Unsupervised Learning in a Single-Layer Linear Feed-Forward Network," Neural Networks, Vol 2 (1989) pp 459-473


[11] R.W. Brockett, "Least Squares Matching Problems," Linear Algebra and Its Applications, Vols 122/123/124 (1989) pp 761-777
[12] R.W. Brockett, "Dynamical Systems That Sort Lists, Diagonalize Matrices and Solve Linear Programming Problems," Proceedings of the 1988 IEEE Conference on Decision and Control, IEEE, New York (1988) pp 799-803
[13] C.E. Shannon, "Symbolic Analysis of Relay and Switching Circuits," Trans. AIEE, Vol 57 (1938) pp 713-723
[14] J.B. Dennis, Mathematical Programming and Electrical Networks, J. Wiley, New York, 1959
[15] S. Grossberg, "Nonlinear Difference-Differential Equations in Prediction and Learning Theory," Proc. Nat. Acad. Sciences, Vol 58 (1967) pp 1329-1334
[16] N. Karmarkar, "A New Polynomial Time Algorithm for Linear Programming," Combinatorica, Vol 4 (1984) pp 373-395

Subject Index

A
accessibility of nonlinear systems, 6, 454 ff.
to the origin, 459
computational complexity, 461
Lie algebra, 457 ff.
rank condition, continuous, discrete
time, 457 ff.
strong, and controllability, 468
Ackermann's formula, 319
adaptive control and identification, 387-450
adaptive control, 2, 6, 19, 437-450
algorithms, 438 ff.
direct, indirect, 438 ff.
indirect, unexpected properties, 445 ff.
industrial use, 448
LS estimator as Kalman filter, 443 ff.
microprocessor implementation, 448
model reference, 437
self-oscillating, 437
adaptive filters for learning systems, 580 ff.
descent equation, 589 ff.
realization, 582
algebraic bundle, 335
algebraic coding, see: coding
algebraic canonical form, 335
algebraic geometry, 6, 327 ff.
algebraic groups, 322 ff.
algebraic system theory, see: module theory
and linear system theory; linear
systems; polynomial matrices;
realization problem
algebraic varieties and morphisms, 328 ff.
affine varieties, 331
classification of varieties, 330
quasi-projective varieties, 335
algebraization of system theory, 234
algebras, C*, W*, von Neumann, 137
predictor, splitting, 218
algorithms,
B. L. Ho, 284
Berlekamp-Massey recursive, 201
Nevanlinna-Pick, 201, 202
for triangularizing polynomial
matrices, 363-365
analytic closed loop solvability (analytic
CLS), 407
applications of system theory, 525-592
approximation, Padé, Cauchy, 202, 504
approximative system models (ASM), 390 ff.
AR models, 21 ff.
ARMA models, 24 ff., 216
observability and continuity, 33
ARMAX models, 395 ff.
arithmetic,
integer vs. rational, 360-363
modular, 5, 234-239, 250
assignability, pole and coefficient, 319
assignment of dynamics and invariants,
306-308
automata theory and Nerode principle, 296
auxiliary variables, 5, 216
axiomatic framework, 3, 15-37, 503
B
balanced realizations, 248
for all-pass transfer functions, 260
and continued fractions, 256
Banach state space, 208
bandwidth, 102, 184
bang-bang principle, 148
barreled space, 206
bases, minimal or proper, 7, 303 ff., 542 ff.
and poles at infinity, 542
Bayesian sufficient statistic, 218
behavior (Hankel) matrix, 193 ff., 340, 396
partially determined, 197
behavior of model, 19 ff.
behavioral approach to system theory, 3, 17-37
behavioral equations, 19, 21, 36, 216
Bellman dynamic programming, 18, 19, 148
Berlekamp-Massey recursive algorithm, 201
Beurling-Lax theorem, 504
Bezout identity (equation), 239, 322, 356, 367
double, 350
Bezoutian matrix, 84, 250 ff.
Bezoutian quadratic form, 250 ff.
bicausal maps, 236, 275-277
black box modeling (description), 1 ff., 191 ff.
see also: realization problem
Bode and Shannon whitening filter, 225
Bode integral formula, 147


Brownian motion (or Wiener process), 64
Bryson-Frazier formula, 70

C
calculus of variations, 70
canonical embedding and projection, 271, 299
canonical forms, 5, 23, 36, 239, 280
and continued fractions, 256, 257
canonical forms, global algebraic,
non-existence of, 337
canonical structure (decomposition)
theorem, 192, 196
extensions, 6, 475-488
nonlinear systems, 478 ff.
time-varying systems, 476 ff.
canonical (controllable, observable)
systems, 26, 284, 339-341
see also: minimal systems, minimal
realizations
canonical system, weakly, 205
cascade interconnection, 201
category theory and system theory, 279,
284-288
Cauchy approximation, 202
Cauchy index, 253
causality and stability, 301
Cayley-Hamilton theorem, 332, 340, 410
CCD filters as systems over rings, 312
chaotic behavior, 2
characteristic and transfer functions, 503
Cholesky factorization, 406
closed orbit lemma, 330
co-state or Lagrange multiplier, 180
coding, algebraic and system theory, 527-557
block codes, 527
catastrophic encoder, 531
convolutional codes, 7, 290, 527
algebraic methods, 7, 290, 550 ff.
encoder, 356, 529, 536 ff.
dual codes and encoders, 539 ff.
generalized minimal encoder, 545 ff.
minimal bases and poles at infinity, 542 ff.
parity-check matrix, 541
realization of encoders, 546
syndrome, 539
systematic encoders, 541 ff.
code sequence, code, 530
generator matrix, 530
Hamming (free) distance, 530
performance vs. complexity, 527
systems over the integers, 312
system theory and algebra, 532 ff.
trellis codes, 539 ff.
cointegration, 7, 560 ff.
and zero structure, 565
commutant lifting theorem, algebraic
version, 241
commutative algebra, 280, 283, 285
commutative diagrams, 246 ff., 271 ff., 282 ff.,
302, 336
compensation, dynamic, see: control theory,
controllers, feedback, H∞ problem,
LQG problem
complexity, high, 1
McMillan (or Smith-McMillan)
degree, 192 ff., 396
computations, in Wiener vs Kalman filtering, 46
symbolic and error-free, 356
modern integrated circuit technology, 67
complexity of controllability-accessibility, 461
computer algebra and controller
synthesis, 355-367
Macsyma and Mathematica, 357
computer technology, perspective on, 438
consistency and asymptotic normality of ML
estimators, 406-420
consistent (strongly) parametrization, 412
consistent estimators, 434
continued fractions, 5, 239, 244, 256-264, 371
and realization, 201
continuity and observability, 33
control theory,
multiple objective problems, 173
by polynomial matrices, 356 ff.
distributed systems, 491-499
assign dynamics to control variable, 215
geometric, 290
adaptive, 2, 6, 19, 437-450
cut-and-try methods, 153
H∞ optimal, 4, 159
and LQG, 156
frequency-domain/operator-theoretic
methods, 160
state-space methods, 160-174
model-based, 152, 153
systems over rings, 319-322
worst case design, 159
control variable, no dynamical description, 215
controllability, 2, 3, 6, 17-19, 36, 46, 60, 100,
234, 239, 267, 282, 345-347, 390, 503, 505
and Kronecker indices, 553
and pole cancellation, 346
and strong accessibility, 468
distributed systems, 6, 492 ff.
generalization of Hautus test, 27
linear and nonlinear, 472
linear systems using modules, 469 ff.
loss of and common factors, 30
matrix (reachability matrix), 196, 332, 454
nonlinear systems, 6, 453-472
(accessibility) to the origin, 459
(accessibility), computational
complexity, 461
and Lie brackets, 455
using differential fields, 463-472
observability and minimality, 26
path-, and tracking of targets, 566 ff.

pole-assignability and feedback, 267
controllable systems and transfer functions, 30
controllers,
H∞ central and LQG classical, 168
PID, 18
stabilizing, YJBK parametrization of, 6,
159, 199, 349-353
synthesis and computer algebra, 355-367
convergence, of polynomial matrices, 32
of systems, 32 ff.
convolution systems, 194, 216, 244, 315
convolutional codes, see: coding
coprimeness,
of polynomial matrices, 239-242, 307, 348
and corona theorem, 239
approximate, 211
H∞, 239
observability, reachability, 248
corona theorem and coprimeness, 239
covariance, 45, 57, 90, 216, 227, 407
cross correlation, 222
cybernetics, 17
D
Darlington synthesis, 201, 264
data compression, realization problem, 193, 295
dead-beat control, 147
description, external, input/output, black
box, 4, 191, 213
internal, state space, 4, 191
see also: state space models
see also: modeling, realization problem,
behavioral approach
design theory, 147 ff.
detectability, 47, 48
detector, optimal, 66
differential algebra and fields, 6, 464 ff.
differentially algebraic elements,
extensions, 465 ff.
differentially transcendental elements,
extensions, 466 ff.
dimension of state space systems, 194
see also: canonical systems
Diophantine (Bezout) equations, 356
discrete event systems, 356
displacement, structure, 78 ff.
rank, 82 ff.
distributed and nonlinear systems, 451-500
distributed systems, control of, 491-499
distributions, Schwartz, 193, 205, 208
E-module framework, 314
smooth, involutive, integrable, 479 ff.
disturbance decoupling problem, almost, 164
division, Euclidean, 3, 5, 201, 202, 239, 256,
261, 359
pseudo-, 360 ff.
divisors, common, greatest,
left-right, 239-240, 358, 533


Dry-Tuned-Gyro (DTG), 96
duality, exact reachability and topological
observability, 207
duality, Kalman filtering and LQG
problem, 47, 60
dynamic programming, 18, 19
dynamics assignment and invariants, 306-308
dynamics, topological, 37
E
E-module framework for systems over rings, 314
econometrics,
error correction models, 561
errors-in-variables models, 569 ff., 572
factor analysis models, 571
inputs and outputs, 569 ff.
system-theoretic trends, 7, 559-577
tracking of targets, 566 ff.
economic time series, random drift, 560
electronic devices, solid state, 17
engineering,
theory, practice, and art, 296
vs. physics, 154
mathematization of, 296
theoretical, 18, 296
entropy minimization problem, 174
equations,
behavioral, 19, 21, 36, 216
Chandrasekhar, 69, 78, 81, 83
classical Wiener-Hopf, 78
delay differential, 211
forward and backward evolution, 76
gyro/accelerometer motion, 97
integral and Riccati equations, 64
integral, Fredholm-type, 63
Lyapunov differential, 74
Wiener-Hopf-type, 63, 78, 79
stochastic differential, 65, 227
equivalence, stochastic, 215
equivalent (linear) systems, 194, 330 ff.
errors-in-equations models, 423
errors-in-variables (EV) models, 216, 423,
569 ff., 572
estimates,
conditional mean, 42
least-squares, 6, 56, 65, 89, 438 ff.
filtered, smoothed, 56, 69
estimators,
asymptotic properties of, 393
causal, on-line, optimal, 41
consistent, 434
gyro/accelerometer, 99
maximum likelihood (ML), 400
consistency and asymptotic
normality, 406-420
minimum prediction error (MPE), 391 ff.
optimal state, 183
optimal (ASM), 391 ff.


optimal (Kalman filter), 45
unbiased, 165, 186
Euclidean algorithm (division), 3, 5, 201, 202,
239, 256, 261, 359
Euclidean and Gaussian elimination, 356, 360
expectation, conditional, 64
F
factor analysis models, 221 ff., 423, 571
factorization and realization, 272
factorization, coprime, 5, 31
see also: coprimeness
factorization, spectral, 43, 215, 225
factors, common and loss of controllability, 30
families of linear systems, 3, 325-342
adaptive and robust control, 341
input/output and state space systems, 195
reachable systems, 332-333
fast algorithms, 69, 356
feedback, 2 ff.
algebraic approach, 268, 276
and feedforward strategies, 187
and low sensitivity to uncertainty, 187
and realization, module-theoretic
framework, 268
bandwidth and low sensitivity, 187
bandwidth and non-minimum phase
zeros, 188
controllability, pole-assignability, 267
cyclizability, 320
geometric approach, 267
output feedback, 167 ff., 258 ff., 308, 347
realization and invariants, 295-308
state, 162
synthesis, general problem, 351
and computer algebra, 355-367
controllability and observability, 351
fast algorithms, 356
norm minimization, 353
YJBK parametrization, 349-353
filtering problem, 4, 136, 141
see also: Kalman filtering
adaptive filters for learning, 580 ff.
anti-aliasing filters, 183
discrete nonlinear, 136
for point processes, 67
H∞ filtering, 166
lattice filters, 84, 380 ff.
matched filters, 65
nonlinear Gaussian, 68
notch filters, 100
recursive nonlinear, 67
whitening filters, 57, 60, 223 ff.
Fokker-Planck equation for Markov density
functions, 59
formal power series, 235, 530
see also: Laurent series
Fourier transform, 56, 185, 225
fractional transformations, linear, 201
fractions, continued, see: continued fractions
fractions,
polynomial matrix, 200, 243, 275-277,
347 ff., 367, 395
proper stable rational, 200, 347 ff.
Fredholm operator, 514
Fredholm-type integral equations, 63
frequency response, 29, 30, 36
frequency- vs. time-domain in module
theoretic framework, 268
friction,
solid, 94, 105 ff.
compensation, 94, 107
Coulomb, Dahl model, 105
gimbal bearing, 105
Frobenius theorem
on signature of Hankel matrix, 257
on integrable and involutive
distributions, 481
Fuhrmann realization, infinite dimensional, 209
fundamental result of realization theory, 272
G
gain scheduling, 437
gain, Kalman, 44
optimal, 99
static, 163
strategy of high, 155
gain-phase relation, 147
games, linear-quadratic and H∞ control, 164
gap, theory-practice, 4, 148
Gauss, calculations of orbits of asteroids, 59
Gauss-Markov model, 44, 90, 93
Gaussian and Euclidean elimination, 356, 360
Gaussian process, 66
gcld-gcrd (greatest common left-right
divisor), 239-240, 358, 533
geometric quotient, 332 ff., 340
geometric structure of orbit spaces, 330
gimbal bearing friction, 105
Gohberg-Semencul formulas, 84
GPS receiver Kalman filter, 110
Gram-Schmidt orthogonalization, 60
gramian of linear system, 260
Granger representation theorem, 561 ff.
Grassmannian and Kalman space, 335
guidance and navigation, 4, 89-134
gyro/accelerometer, multisensor, 96
equations, estimation and control, 99
gyroscope, gyrocompass, 92
H
H∞ problem,
standard control problem, 161
output-feedback control problem, 167
state-feedback control problem, 162
H∞ coprimeness, 239
H∞ filtering and Kalman filtering, 166
H∞ filtering and smoothing problem, 165

Hamiltonian, equations, 60, 82
framework, 191
matrix, 70, 180
Hamming (free) distance, 530
Hankel,
matrices (behavior matrices), 193, 340, 396
signature of, 341
partially determined, 197
functional equation, 242
map or Kalman input/output map, 271
operators and module
homomorphisms, 242-244
operators, singular values of, 260
quadratic form, 250
Hautus controllability test generalization, 27
heat bath, 217
Hermite form of polynomial matrices, 357,
359, 367
Hermite-Fujiwara, matrix, 256
quadratic form, 252-256
Hermite-Hurwitz stability theorem, 254
Hermitian quadratic form, 251
Hessenberg form, 380
hidden modes (decoupling zeros), 471 ff.
Hilbert, Nullstellensatz, 331
state space, 205
uniqueness method (HUM), 491, 495
Hurwitz stability, 372 ff.
I
identification and adaptive control, 6, 387-450
identification from noisy data, 3, 423-435
consistent estimators, 433-434
in econometrics, 423, 560
prejudice in identification, 425
problem formulation, 425
static, dynamic cases, 425
static case and Perron-Frobenius theory, 427
the solution set, 426 ff.
identification, linear stochastic, 389-421
ASM-PEM framework, 391 ff.
approximative system models (ASM), 390 ff.
basic nonidentifiability, 424
consistency and normality result, 390 ff.
construction of likelihood function, 400-406
error magnitude vs model complexity, 392
identifiability, of parametrizations, 399
of stochastic models, 228
asymptotic, 391, 408
input hypotheses, asymptotically
stationary process, 393
deterministic input, 393
wide-sense stationary process, 394
modeling and parameter estimation, 390
persistent excitation, 407 ff.
prediction error methods (PEM), 390 ff.
system identifiability result, 390 ff.
IEEE Medal of Honor, 453
implicit function theorem, 456, 467


impulse response, pseudorational, 208
indices,
of polynomial bases, 291, 554
reachability or Kronecker, 3, 201, 305
stability and module, 303-305
industrial processes, 154
inequalities, quadratic matrix, 164
inertia of a matrix, 80, 84
inertial navigation systems (INS),
Alidade alignment, 121
calibration of a ring laser gyro, 115
gimballed, 115
implementation of calibration Kalman
filter, 118
SIGAL systems, 96
SIGMA calibration method, 116
strapdown, 115, 116
Inertia-GPS, ULISS systems, 125
Kalman filter implementation, 131
multisensor hybrid navigation, 125
software, 128
infinity, poles and zeros at, 287
information theory, 18
see also: coding
innovations, 4, 50, 228, 406
and spectral factorization, 49
martingales, scattering, 55-88
input/output spaces, 194, 235 ff., 269 ff., 298
input/output, maps (descriptions), 191, 194,
269, 299
see also: description, external
stable maps, 300
parametrization of, 345-353
inputs, control and exogenous, 161, 351
inputs, random, as internal variables, 5, 228
integral equations, singular, state space
solution, 7, 509-523
integral operators, inversion by factorization
methods, 520 ff.
inversion by input/output methods, 512 ff.
integral, stochastic, Itô, Stratonovich, 60, 64, 66
internal or state space models (description),
see: state space models
internal stability, 302, 349 ff.
see also: YJBK parametrization
interpolation and realization, 7, 202
interpolation by state space methods, 7,
503-507
intertwining relation, 244
invariant factor theorem, 7, 529 ff., 534
invariant factors, 280, 284, 307, 367
and pole assignment, 297
invariant imbedding theory, 78
invariant subspaces, geometry of, 239
invariant theory and families of systems, 6,
327-342
invariants, realization and feedback, 295-308
system performance, 5, 296
inverse scattering, 84, 201
irreversibility of stochastic evolutions, 227


Itô differential rule, 62
Itô stochastic integral, 60
K
Kalman experimental setup, 271
Kalman filtering, 2-4, 6, 7, 18, 19, 41-143,
185, 400, 438, 492, 559
Bierman factorization algorithms, 119
and H∞ filtering, 166
adaptive filtering, 53
advance in digital data processing, 134
asymptotic stability and time invariance, 47
Chandrasekhar type algorithms, 53
compensation of solid friction, 94
computational issues, 46, 53
continuous time (Kalman-Bucy filtering), 44
controller design, 53
discrete time, 48
dual of LQG problem, 47
extended or nonlinear, 52, 67, 106, 124
historical comment, 59
implementation on digital computer, 91
infinite time interval, 46
key differences with Wiener filtering, 46
LS estimator in adaptive control, 443 ff.
navigation and guidance, 89-134
quantum, 4, 135-143
separation of dynamics and
measurements, 48, 49
square root filtering, 53
stability, 60
state-space formulas, 78
Kalman-Bucy or continuous-time Kalman
filtering, 44, 56-60, 79, 81
Kalman-Bucy formulas, 63, 64, 69
Kalman,
gain, 44, 79
controllability (reachability) and
observability matrices, 196, 332, 454
input/output or Hankel map, 271, 282 ff.
realization diagram, 282
space of families of systems, 328 ff.
and the Grassmannian, 335
construction of, 334-335
geometric structure, 335-337
Karmarkar descent algorithm, 591
Kepler's laws, 20
Krohn-Rhodes theory, 284
Kronecker indices, 3, 201, 305
Kyoto Prize, 41
L
Lagrange multiplier or co-state, 180
Lagrangian framework, 191
Laplace transforms, 36
LaSalle's bang-bang principle, 148
latent variables, 19, 36, 424
lattice filters, 84, 380 ff.
Laurent series, 282, 297 ff., 530
and time-invariance, 269
formal, and z-transform, 281
rational, 236
truncated, 235
lclm-lcrm (least common left-right multiple), 240
learning systems, 7, 579 ff.
adaptive subspace filters, 580 ff.
realization, 582
autoassociation and regression, 584 ff.
continuous and combinatorial aspects, 590
descent equation implementation, 589 ff.
fading memory filters, 587 ff.
singular value decomposition, 580
total least squares approximation, 586
least squares estimation, 56, 65, 89, 438 ff.
Levinson algorithm, 83
Lie brackets, 455, 457, 463, 479, 482, 582
Liénard-Chipart stability criterion, 253, 375,
378
lifting, 137, 138
likelihood function construction, 400-406
likelihood matrix, 89
linear systems,
algebraic and analytic object, 5
algebraic-geometric framework, 327
canonical or minimal, 246, 282
see also: canonical realizations
discrete, stability of, 371-384
equivalent systems, 194, 330 ff.
families of, 5, 325-342
gramian and singular values of, 260
general feedback synthesis problem, 351
infinite dimensional, canonical, 207
local analytic vs. global algebraic
viewpoint, 327
over rings, 311-322
parametrization of input/output
maps, 345-353
poles and zeros, 280, 289-292
state space, definition, 194, 244, 270
stochastic, 399
universal parameter space, 327
see also: AR, MA, ARMA, ARMAX, SSX
models
linearity and time invariance, fusion of, 298
localizations of polynomials, 299 ff.
Löwner interpolation problem, 505
Löwner matrix, 202
log-likelihood function, 403
LQG (linear quadratic Gaussian) problem, 3,
4, 18, 70, 96, 145-188, 347, 439
and state-space H∞ theory, 159-176
impact of, 152
industrial applications, 152
intuitive interpretation, 177
microprocessor implementation, 152
robustness of, 155
time-domain version of Wiener theory, 148
unified continuous and discrete time
theory, 177-188

LTR (loop transfer recovery), 155, 187
Luenberger form, 382
Lure's theorem, 147
Lyapunov,
equation, 74, 256, 406
function, 379
stability theorem, 255
second method, 377
Lyapunov-Routh-Hurwitz stability, 374
M
MA models, 27 Cf.
Macsyma and Mathematica, 357, 366
manifest variables, 20, 36
Mansour or discrete Schwarz form, 6,
375 Cf.
map, continuously invertible, 205, 207
margin, gain and phase, 102, 183
Markov,
chains, quantum, 135, 141
density functions, Fokker-Planck equation
for, 59
diCfusions, 60
parameters, 192, 195, 244, 269, 270
process,90
property, 74, 217
splitting property, 219 Cf.
martingales, 4, 64, 66, 223
match between dynamical intuition and
algebra, 287
mathematical models,
behavior and behavioral equations, 19 Cf.
extern al (phenomenological), internal, see:
description
see also: modeling, realization problem
mathematics and system theory, 501-523
mathematization of engineering, 296
matrices,
behavior or Hankel, see: Hankel
matrices
Lwner, 202
polynomial, see: polynomial matrices
scattering, 71
matrix factorization, 83, 504
matrix fractions for rational matrix
functions, 200, 243 Cf., 275, 313, 347 Cf.
coprime fractions, see: coprimeness
matrix pencils, 510
matrix rational interpolation
problems, 201 ff., 503 ff.
maximum likelihood estimators, 390, 400
consistency and asymptotic
normality, 406-420
maximum principle, 18, 19, 148
Maxwell, J. C., 17, 233
Mayne-Frazer smoothing formula, 73
McMillan degree, 192
see also: invariant factors
memory span, 22, 32

memoryless instrument (quantum probability), 140
minimal (controllable, observable) systems
see: canonical systems
minimum phase plant, 187
minimum variance control, 439, 442
model reduction of discrete time systems, 381 ff.
model-based control, 152, 153
modeling, see also: realization problem
state construction, 191 ff.
issue of modeling, 153 ff.
stochastic exact, 393
stochastic, distributional, 214 ff.
models,
see also: realization problem
see also: AR, MA, ARMA, ARMAX, SSX
models
black box, 1 ff., 191 ff.
dynamical, for learning, 579
errors-in-variables, 6, 216
from first principles, 36, 154
Gauss-Markov, 44
identifiability of stochastic, 228
modules and algebraic system
theory, 279-292
most powerful unfalsified (MPUM), 37
sampled, 183
scattering, 71
stochastic, 183, 216
modes, hidden, unstable, 347
modular arithmetic, 234-239, 250
module structure, stochastic context, 224
module theory and linear system theory, 3,
5, 233-327, 469 ff.
dynamics, 288
stability framework, 302 ff.
module and stability indices, 303-305
modules, strict observability, 302
modules over the polynomials, see:
polynomial modules and matrices
modules, pole and zero, 287, 289
modules, torsion, and finite dimensional
realizations, 274
moduli space of dynamical systems, 330
classification, 390
fine, 337
moving average control, 440 ff.
MPUM (most powerful unfalsified model), 37
multiple, common, least, left-right, 239-240,
358, 533
Mumford's theorem, 332, 334, 335
N
Navier-Stokes equations, 492
navigation and guidance, 4, 89-134
navigation Kalman filters, implementation,
94-134
hybrid systems, 93
inertial navigation, "strapdown" system, 91 ff.

inertia-GPS multisensor system, 94
ring laser gyrosystem, 94
SIGAL systems, 96
Super-Etendard system, 94
radio navigation, 91
radio satellite navigation, 94
radionav systems, 93
NAVSTAR GPS (global positioning
system), 110
Nehari problem,
state space solution, 504
algebraic version, 242
Nerode equivalence, 205, 209, 211, 283
Nerode theorem, 389
networks, analysis, synthesis, 17, 192
neural models, 579
Nevanlinna-Pick,
algorithm, 201, 202
interpolation problem, state space
solution, 7, 504 ff.
nice selection, 334
Noetherian domain, 313 ff.
noise, variables, 218, 222-227
additive, 68
covariance matrix, 89, 427
measurement, 45
modeling, 423
multiplicative, 68
process, 41, 167
white, 44, 65, 90
non-Gaussian processes, 67
nonlinear and distributed systems, 451-500
accessibility, 454 ff.
controllability, 453 ff.
local controllability and Lie brackets, 455
nonlinear dynamics using differential fields,
466 ff.
real-analytic dynamics, 454
normal equation, 89, 91
numerical sensitivity of floating-point
methods, 357
Nyquist, 17
Nyquist stability test, 147

observability, 2 ff., 17 ff., 28 ff., 46, 60, 92, 100,
116, 192, 196, 245, 267, 282, 284, 339,
345 ff., 390, 476, 486, 503, 505
and continuity, 33
and injectivity, 273, 302
and reachability indices, 198
and right coprimeness, 248
matrix, 196
controllability and minimality, 26
gramian, 72
topological, 207, 208

observability/reachability decomposition,
192, 196
see also: canonical structure theorem
observables, algebra of, 136
observer, 19, 44, 348
operation valued measures, 138
operator model theory, 503
operator, Markovian, 137
operator, Volterra, 63 ff.
optimal control, 145-188
inverse problem of, 148
and H∞, 164
control performance vs. input power, 151
optronic systems, 105
orbit space, 340
order chain and list, 304
orthogonality condition, 57, 62, 69
orthogonality of subspaces, conditional, 220
output and input spaces, see: input/output
spaces
output-feedback canonical form and
invariants, 258-259
outputs, regulated and measured, 161, 351
P

Padé approximation, 202, 256, 504


parameter identifiability, 399
parameter space, 396 ff.
parameter space, universal, of dynamical
systems, 337-339
parametrization,
asymptotically identifiable, 408
and continuity, 29 ff.
of stable input-output maps (YJBK
parametrization), 199, 345-353
strongly consistent, 412
partial realizations, 5, 84, 193, 197 ff., 256
and properness of compensator, 200
passive network synthesis, 192
PDEs, evolution type, 491
boundary and initial conditions, 491
performance vs. robustness, 173
Perron-Frobenius theory of positive
matrices, 430
persistency of excitation, 407 ff., 444
physics vs. engineering, 154
physics, classical and quantum, 156
point processes, filtering for, 67
Poisson processes, 67
pole assignment,
controllability and feedback, 267
control, 439
and invariant factor assignment, 297
pole module, 283 fT.
indices, degree, 305
pole cancellation and controllability, 346
pole-zero modules, 289 ff.
pole-zero cancellation,
interconnected systems, 290

Subject Index
reachability, observability, 440
unstable, 346, 348
pole-zero exact sequence, 290
poles and zeros of linear systems, 268, 280,
289-292
poles at infinity and minimal bases, 542 ff.
polynomial (module) action and time-shift, 281
polynomial bases and indices, minimal, 291, 554
polynomial matrices, 5, 21
terminology of, 357-359
column, row reduced, 552
convergence of, 32
coprimeness, 239-242, 307, 348
elementary row and column operations, 358
fractions of, 200, 243, 275-277, 347 ff., 367, 395
poles, zeros at infinity, 552
Hermite forms, 357
predictable degree property, 538
triangularization of, 356, 359 fT.
unimodular, 29, 358
polynomial models, 5, 235, 247
polynomial modules, 235, 268, 280
and dynamical structure, 280
and input/output spaces, 281
polynomials, 3, 235, 281
computation, 348
error-free computation, 6, 357 fT.
generalized, 349
localizations of, 299 ff.
Pontryagin maximum principle, 18, 19, 148
positive definiteness and stability, 253
positive real lemma, 3, 192, 215
power series, formal, 235
rationality of, 313
see also: Laurent series
prediction, 46, 389
prediction error methods (PEM), 390 ff.
loss and cost functions, 391
recursive construction, 403
predictor algebras, 218
prejudice in identification, 425
principal ideal domain, 300, 315, 532
probability,
spaces, measures, distributions, 215 ff.
axiomatic framework for modeling, 216
conditional, 217
processes,
random (stochastic), 41, 45, 215 ff.
Gaussian stationary, 400
non-Gaussian, 67
Poisson, 67, 68
purely non-deterministic, 224
state-space description of nonstationary, 57
wide sense or second order, 216, 217, 394
properness of compensator and partial
realization, 200
pseudo-division lemma, 361
pseudorationality, 208
Pták space, 206

601

Q
quadratic forms: Bezoutian, Hankel,
Hermite, Hermite-Fujiwara, 250-256
quadratic stabilization theory, 156
quadrature, 80
R

radiative transfer theory, 76, 78


Radon-Nikodym derivative, 65
random drift in economic time series, 560
random inputs as internal variables, 228
random processes and variables, 41, 45, 215 ff.
see also: processes
rational interpolation, 201-203, 503-507
rational functions and matrices,
field of rational functions, 235
proper stable fractions of, 200
factorization problems, 504
interpolation problems, 201 ff., 503 ff.
Smith-McMillan form, 7, 289, 367, 395,
532, 534, 553
state space methods, 504
see also: transfer functions, power series
rational matrix symbol, 509
rational model, 248
rational vector spaces, minimal (proper)
bases, 7, 303, 542-548
and poles at infinity, 542
rationality, and finite dimensional
realizations, 274
reachability, 192, 196, 245, 284, 332, 475, 486
and feedback cyclizability, 320
and left coprimeness, 248
and observability indices, 198, 305
and surjectivity, 273, 302
approximate, 205, 207
input/state map, 276, 277
matrix (controllability matrix), 196, 332, 454
pairs, 227, 338
reachable family of systems, 337
set, 455
reachability/observability canonical
decomposition, 192 ff., 475 ff.
see also: canonical structure theorem
realizability and separability, 487
realization (modeling) problem, 3 ff., 19, 37,
189-229, 244-249, 272, 339, 389, 511
abstract realization, 301
and factorization, 246, 272
and continued fractions, 201, 256
and feedback, module-theoretic
framework, 268
and interpolation, 7, 193, 201-203
and stability modules, 301-303
and synthesis problems, 199-200
canonical (minimal) realizations, 193, 267,
273, 282, 284, 302, 345
compressibility of infinite data set, 193, 295

602

Subject Index

realization (modeling) problem (cant.)


continuity in the parameters, 341
continuous-time linear systems, 193
diagram of Kalman, 282
feedback synthesis, 193
finite-dimensional, 274
finiteness = finite rank of behavior matrix, 196
finiteness = finite support of denominator
distribution, 208
infinite dimensional environment, 217
infinite dimensional Fuhrmann realization,
209
invariants and feedback, 295-308
linear deterministic, 191-212
complete data, 193-197
finite-dimensional systems, 193-204
infinite-dimensional systems, 193,204-212
partial data (partial realization), 5, 84,
197-199
minimal (canonical) realizations, 193, 267,
273, 282, 284, 302, 345
modeling from impulse response and
transfer function, 191
module approach, 205
noise variables, 218
of matrix fractions, 247-248
of rational matrices, 503, 546
over a commutative ring, 313 ff.
recursiveness issues, 193, 200-201
shift, with polynomial models, 247
with rational models, 248
splitting variables, 218-222
state space, 270-273
state-space system and recursive
computability, 214
stochastic, 5, 213-229, 394
factor analysis models, 222
axiomatic probability framework, 216
prototype problem, 218
symmetry, 249
torsion modules and rationality, 274
transfer function with simple and multiple
poles, 192
uniqueness of canonical realizations, 192,
196, 204-207
recursive algorithms, 239
Berlekamp-Massey, 201
Redheffer, star product, 75
reflection coefficients, left and right, 71
regression, 427, 584 ff.
regulation of industrial process, 441
regulator,
inverse problem, 155
self-tuning, 437 ff.
direct, 442
indirect, 440
relation function, 424
representations, ARMAX, SSX, 395 ff.
residue of matrix function, 562
Riccati equation (RE), 2, 4, 18, 46, 47, 49, 62,
72, 81, 91, 151, 406, 492

RE and spectral factorization, 49


RE framework for systems over rings, 322
ARE, algebraic RE, 76, 99, 160, 181
similarity between LQG and H∞, 164
RE asymptotic behavior, 76
RE differential, difference, 77, 160, 181, 184
time-varying RE and time-varying spectral
factorization, 50
square-root or array versions, 81
Riemann-Hilbert problem, 554
robustness issues, 154, 159, 185
robustness vs. performance, 173 ff.
RollNix, ship steering autopilot, 448
root location of polynomials, 249-256
root-locus method, 147
Rosenbrock theorem, 305
Routh array, 373
Routh-Hurwitz criterion, 84, 147, 373
S
sampling period in LQG problem, 4, 178 ff.
scattering,
approach, 77
matrix, 71, 72
model, 69, 71, 75
pairs, 221
representation, 226
inverse, 84, 201
Schrödinger equations, 497
Schuler period, 92
Schur algorithm, 83, 84
Schur-Cohn stability criterion, 84, 375 ff.
Schur-Cohn-Jury table, 383
Schur-Hadamard product, 582
Schwarz matrix, 373 ff.
Schwarz, discrete, or Mansour form, 6, 375 ff.
sciences,
descriptive, 1, 7
of the artificial, 7, 154
prescriptive, 1, 7, 17, 19
scientific activities and methodology, 1
self-tuning regulator, 2, 6, 437 ff.
direct, 442
indirect, 440
sensitivity minimization trade-offs, 188
sensor fusion, 94
separability and realizability, 487
separation theorem and certainty
equivalence, 185
separation, state observers-state feedback
controllers, 19
sheaf theory, 331
shift,
group, 220
operator, 178, 236
compression of, 237
eigenvalues and eigenfunctions, 237
realization, 258
with polynomial models, 247
with rational models, 248

Subject Index
short exact sequence, 236
fundamental pole-zero, 291
SIGAL strapdown inertial systems, 96
signal,
estimation and detection, 68
processing and systems over the integers, 312
processing, statistical, 214
space, 20
singular values of linear systems, 260
singular value decomposition, 580
Smith canonical form and invariant factors,
284, 367
Smith-McMillan, form, 7, 289, 367, 395, 532,
534, 553
degree, 396
Smith-McMillan-Yoo form, 562
smoothing, 46, 50 ff., 70 ff.
Sobolev space, 496
solid friction, adaptive compensation, 105
spaces,
input, output, see: input/output spaces
state, see: state, state space realizations
probability, 215
Ptak, barreled, 206
spectral densities, 178, 184, 394, 424
spectral factorization, 4, 43, 90, 215, 225
of Wiener and Hopf, 56, 57
innovations, Riccati equation, 49
time-varying, 50
spectral matrix, 50
spectral problems, 239
splitting algebra, variables, 218-222
Sputnik, 59
SSX models, 395 ff.
stability,
margin, 378-379
internal, 297
of input/output map, 300
stability, module theoretic framework, 297 ff.
stability and causality, 301
stability module, 303
stability and module indices, 303-305
stability rings, 297
stability criteria,
Hermite-Hurwitz, 254
Lienard-Chipart, 253, 375, 378
Lyapunov, 255, 373 ff.
Routh-Hurwitz, 372 ff.
Schur-Cohn, 375 ff.
unified treatment, 250
stability theory,
linear systems, 249-256
discrete linear systems, 371-384
stabilizability, 47, 48
stabilizer subgroup, 333
stabilizing controllers, YJBK
parametrization, 349-353
state, 3, 135, 159, 214, 219, 270
see also: modeling or realization
construction, 4, 191 ff.
interface between past and future, 152

minimal generalized, 467
quantum mechanical, 135 ff.
state feedback, 275-277
state space and pole module, 283
state space models (description), 19, 79, 191,
194, 214, 222, 226, 345
state space realizations, see: realization problem
state variable, sufficient statistic of, 214
state-transition matrix, 70, 72, 77
stationary problems, 90
stationary wide sense (or Gaussian) setting, 220
statistic, sufficient of state variable, 214
Bayesian sufficient, 218
statistical,
filters, 89
optimization theory of Wiener, 148
regularity in modelling, 217
-mechanical setup, 217
steel rolling, 154
stochastic,
control algorithm, 440
control theory, 66
equivalence, 215
exact modeling, 393
identification, 389-421
linear systems, ARMAX, SSX, 399
models, 214 ff.
and physical systems, 228
identifiability of, 228
process, 215, 217
stationary Gaussian, 400
Stokes identities, 76
Stratonovich stochastic integral, 60
superposition principle, 20
symbolic methods in computer algebra, 357
symplectic matrices, 180
synthesis,
classical and model-based, 4
of feedback systems, 346
of passive networks, 192, 315
problems without observers, 348
problems with generalized polynomials,
349
problems, properness of controller, 348
system component, 1
system description, externa!, internal, see:
description
system identification,
from noisy data, 6, 423-435
linear stochastic, 6, 389-421
system invariants and performance, 296
system stability and performance, 349
system theoretic,
activity, 1
trends in econometrics, 559-577
system theory,
and convolutional codes, 7, 527-557
algebraization of, 234
applications, 525-592
influence in mathematics, 501-523
general, 17


systems, see also: linear systems; AR, MA,
ARMA, ARMAX, SSX models
and controllable subsystems, 30
approximatively reachable, 207
at infinity, theory of, 288
autonomous, 27, 215
canonical, see: canonical systems
controllable, see: controllability
convergent, 32
convolution, 194
delay, 5, 209, 316
deterministic and stochastic (heat bath), 217
dynamical, 17, 20, 36, 41
families of, 5, 6, 287
free, 319
latent variable, 20, 28
learning, 7
linear, see: linear systems
man-made, 17
neutral, 211
nonlinear and distributed, 451-500
over a ring of operators, 315
over rings, 3, 287 ff.
continuous-time case, 314-316
E-module framework, 314
pole and coefficient assignability, 319 ff.
Riccati equation framework, 322
parameter-dependent, 318
singular, 287
state space, see: state space models
stochastic dynamical, 215 ff., 389 ff.
topologically observable, 207
two-dimensional (2-D), 5, 316
weakly canonical, 205
T
theorem,
canonical decomposition (structure), 6,
192, 196, 475-488
commutant lifting, algebraic version, 241
Nehari, algebraic version, 242
open mapping/closed graph, 206
uniqueness of canonical realizations, 192, 196
theory of distributions, 193
theory of the artificial, 17
theory-practice gap, 4, 148
time invariance and Laurent series, 269
time series, integrated and cointegrated, 561
time- vs. frequency-domain in module
theoretic framework, 268
Toeplitz matrices, 510
topological aspects of continued fractions, 256
topological observability in bounded time, 208
topology of pointwise convergence, 32
torsion modules and rationality, 274 ff.
tracking of targets and path controllability,
566 ff.
transfer functions (matrices), 5, 18, 29, 36,
161 ff., 195 ff., 208 ff., 241 ff., 270, 282,
299 ff., 312, 315, 317, 345, 393, 397, 503 ff.
and state feedback, 277
as rational and complex-valued functions, 249
behavior at infinity, 287
coprime factorizations, 356
entropy of closed loop, 174
number of poles and zeros of, 290
Smith-McMillan form of, 289
stable and localization, 288
transition expectation, 138
transmission coefficients, forward and
backward, 71
trapezoid diagrams, 283
triangular form, 359
triangularization of polynomial matrices,
359, 363 ff.
two-filter formulas, 73-75
U
ULISS navigation systems, 125
uncertainty, different types of, 187
unimodular,
group, 23, 32
map, 301
matrices, 29, 358
unique factorization domain (UFD), 360
unobservable subspace, 196
V
valuation theory, 542
variables,
auxiliary, 5, 216
external (measurement), inputs and
outputs, 3, 191, 213
internal and random inputs, 228
internal, latent or state, 36, 191, 213
manifest, 36
noise, 218
random, 217
splitting, 218-222
variance, 184
Viterbi algorithm, 527
Volterra operator, 63 ff.
Volterra series, 487
W
wave equation, 491
wave, forward and backward, 71
weather forecasting, 152
Wedderburn-Forney construction, 291,
542-548
white noise, 50, 222
disturbances, 177
functionals of, 223
representation, 223
whitening filter, 60, 223
wide sense stationary processes, 424

Wiener, N., 17
Wiener filtering, 3, 4, 41, 43, 49, 90
calculation burden, 46
and Kalman filtering, key differences, 46
signal model for, 42, 44
Wiener process (or Brownian motion), 64,
66, 68, 178, 223
causal, anticausal, 223, 225
Wiener-Hopf factorization, 56, 57, 570
Wiener-Hopf integral equations, 63, 510, 554
Wiener-Paley physical realizability theorem,
147
Wiener-Volterra expansion, 67
Wold representation, 224

Y
Yakubovic-Kalman-Popov lemma, 192, 215


YJBK parametrization of stabilizing
controllers, 349-353
Z
Zariski open subset, 332 ff.
Zariski sheaf, 339
Zeiger's lemma, 284
zero and pole modules, 287, 289
zero order hold input, 178
zeros and poles, of linear systems, 5, 289-292
at infinity, 544
zeros, input-output decoupling, 464, 471 ff.
