min_u ∫_{t_0}^∞ [x'(t)Qx(t) + u'(t)Ru(t) + 2x'(t)Su(t)] dt   s.t.   ẋ = Ax + Bu, (1)

with Q = Q' ≥ 0 and R = R' > 0. A manipulation of the matrix equation B'Y + YA, where Y = Y' ≥ 0,
leads to the so-called return difference condition. Its single-input formulation is due to Kalman [1] and was later extended by Anderson and Moore [6]. Specifically, K is optimal for some Q = Q' ≥ 0, R = R' > 0 if and only if, for all ω:

[I + B'(-jωI - A')^{-1}K'] R [I + K(jωI - A)^{-1}B] ≥ R. (2)
B. Cost estimation problem
In this paper we address a problem related to inverse optimal control. Specifically, we consider the problem of estimating the cost function that best approximates a given set of state trajectories. The underlying hypothesis is that the given trajectories result from the optimal control of a known linear system at different initial conditions. We aim at estimating the (unknown) cost function that has generated this set. Once the correct cost has been learned, we can simulate the state trajectories of the system at new initial conditions, not necessarily contained in the original training set.
II. COST FUNCTION ESTIMATION: INFINITE TIME
OPTIMAL CONTROL
Consider the following optimal control problem:
min_u (1/2) ∫_{t_0}^∞ J(x(t), u(t)) dt   s.t.   ẋ = Ax + Bu,  x(t_0) = x_0,
where (A, B) is assumed stabilizable and the matrix B full column rank. Let the cost function contain an input-state cross term:

J(x, u) = x'Qx + u'Ru + 2x'Su. (3)
From now on, we will indicate by J the set of such cost functions under the standard hypotheses on the matrices Q, R and S, i.e.:

J = { J of type (3) :  Q = Q' ≥ 0,  R = R' > 0,
      [ Q   S ]
      [ S'  R ] ≥ 0,  (Ā, Q̄^{1/2}) detectable };   (4)

the matrices Ā, Q̄ are defined as follows:

Ā = A - BR^{-1}S',  Q̄ = Q - SR^{-1}S'. (5)
Under these hypotheses the solution of the optimal control problem is known to be unique and corresponds to an asymptotically stabilizing static state feedback K, i.e. u*(t) = Kx(t) with A + BK stable. Let x*(t; x_0) be the optimal trajectory, i.e. the state trajectory corresponding to the optimizing input u*(t; x_0).
Problem 1: Assume that we are given the matrices A and B and a set of optimal trajectories for different initial conditions:

{x_i(t) = x*(t; x_{0,i}) : i = 1, . . . , N}.

Estimate the cost function J ∈ J (or equivalently the matrices Q, R and S) that has generated the given trajectories.
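As an illustration, a data set of this kind can be generated synthetically. The system and the matrices Q, R, S below are arbitrary choices satisfying the hypotheses in (4), not taken from the paper; note that SciPy's ARE solver accepts the cross term directly through its `s` argument:

```python
import numpy as np
from scipy.linalg import solve_continuous_are
from scipy.integrate import solve_ivp

# Illustrative system and cost matrices satisfying the hypotheses in (4)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([2.0, 1.0])
R = np.array([[0.5]])
S = np.array([[0.1], [0.0]])

# Stabilizing ARE solution with cross term; optimal gain gives u*(t) = K x(t)
P = solve_continuous_are(A, B, Q, R, s=S)
K = -np.linalg.inv(R) @ (S.T + B.T @ P)

def optimal_trajectory(x0, t_grid):
    # Closed-loop simulation: xdot = (A + BK) x, x(t_0) = x0
    sol = solve_ivp(lambda t, x: (A + B @ K) @ x,
                    (t_grid[0], t_grid[-1]), x0, t_eval=t_grid, rtol=1e-8)
    return sol.y

t_grid = np.linspace(0.0, 10.0, 200)
trajectories = [optimal_trajectory(x0, t_grid)
                for x0 in ([1.0, 0.0], [0.0, 1.0], [1.0, -1.0])]
```

Each element of `trajectories` is one x_i(t) sampled on the grid; such a set is the input data of the estimation problem.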
The problem does not have a unique solution, because optimal trajectories do not uniquely determine a cost function in J. Specifically, previous works have shown that for any cost function in the set J there exists an infinite number of different (not simply scaled) cost functions leading to the same optimal gain K. Therefore, given a set of optimal trajectories with different initial conditions, we cannot uniquely determine the matrices Q, R, S that generated those trajectories. The following definition will help us in formalizing this concept.
Definition 1: (J ∼ J̃, equivalence relation on J). Consider J, J̃ ∈ J. We say that J is equivalent to J̃, and we write J ∼ J̃, if and only if K = K̃, where K and K̃ are the optimal gains associated to J and J̃, respectively.
It can be shown that the relation ∼ defined on the set of cost functions J is an equivalence, being reflexive, symmetric and transitive. Consequently, we can divide J into equivalence classes.
Coming back to our problem, a given set of trajectories generated by J could have been generated by any other J̃ in the equivalence class associated to J. Therefore, in order to define the estimation problem correctly it is necessary to identify a canonical representative in each class; we can then reduce the problem to estimating the canonical form of J. The problem can be reformulated as follows.
Problem 2: Assume that we are given the matrices A and B
and a set of optimal trajectories for different initial conditions:
{x_i(t) = x*(t; x_{0,i}) : i = 1, . . . , N}.

Estimate the canonical form of the equivalence class associated to J ∈ J, where J is the cost function that has generated the given trajectories.
Let us then build a set of canonical forms, i.e. a set that contains exactly one element in each equivalence class. Consider the following set of cost functions:

J_1(x, u) = x'K'Kx + u'u - 2x'K'u, (6)

J_1 = { J_1 of type (6) with λ(A + BK) < 0 }. (7)
Proposition 1: J_1 is a proper subset of the set J, i.e. J_1 ⊂ J.
Proof: Consider J_1 ∈ J_1; we want to prove that J_1 ∈ J. Equivalently, we have to show that Q ≜ K'K, R ≜ I_m and S ≜ -K' satisfy the properties in (4). Obviously, Q = K'K ≥ 0 and I_m > 0. Moreover:

[ Q   S ]   [ K'K  -K' ]
[ S'  R ] = [ -K   I_m ] ≥ 0,

as follows from the positive semidefiniteness of the Schur complement Q - SR^{-1}S' = 0. Moreover, by (5):

Ā = A + BK,  Q̄ = 0.

Using (7) we conclude λ(Ā) < 0, and the detectability of the couple (Ā, Q̄^{1/2}) easily follows. It remains to prove that there exists J ∈ J such that J ∉ J_1; this trivially follows from the definitions.
Proposition 2: Consider J_1 ∈ J_1. If:

J_1(x, u) = x'K'Kx + u'u - 2x'K'u,

then the optimal gain associated to J_1 is exactly K.
Proof: Using Proposition 1 we have that J_1 ∈ J, so that the associated direct optimal control problem can be solved in the standard manner. Specifically, define Q ≜ K'K, R ≜ I_m and S ≜ -K'; the optimal gain is then K_1 = -R^{-1}(B'P + S'), where P is the unique positive semidefinite solution of the following Algebraic Riccati Equation:

PĀ + Ā'P - PBR^{-1}B'P + Q̄ = 0. (8)

Easy substitutions show that Q̄ = 0, so that P = 0. Substituting those results in the expression of the optimal gain we obtain K_1 = -R^{-1}S' = K.
Proposition 3: The set J_1 contains exactly one element in each equivalence class of J.
Proof: Consider J ∈ J and let K be the associated optimal gain. Then:

J_1(x, u) = x'K'Kx + u'u - 2x'K'u
is a cost function in J_1. Moreover, such a J_1 is in the equivalence class of J, i.e. J ∼ J_1. In fact, using Proposition 2, the optimal gain K_1 associated to J_1 equals K, and this concludes the first part of the proof. Let us now prove that there are no two elements of J_1 in the same equivalence class, i.e. that J_1 does not contain two elements equivalent to one another. Suppose by contradiction that there exist J_1, J̃_1 ∈ J_1 such that J_1 ∼ J̃_1 and J_1 ≠ J̃_1. Therefore:

J_1(x, u) = x'K'Kx + u'u - 2x'K'u,
J̃_1(x, u) = x'K̃'K̃x + u'u - 2x'K̃'u.

If J_1 ≠ J̃_1, then K ≠ K̃. But K and K̃ are the optimal gains associated to J_1 and J̃_1 respectively, and consequently the optimal gains associated to J_1 and J̃_1 are different. This contradicts the assumption J_1 ∼ J̃_1 and concludes the proof.
Proposition 4: Consider the system ẋ = Ax + Bu and let x(t) be a state trajectory. Then, there exists a cost function J ∈ J that has x(t) as minimizing trajectory if and only if there exists a stabilizing gain K such that:

ẋ(t) = (A + BK)x(t)  ∀t. (9)

Proof: (⇒) It easily follows from the theory of optimal control. (⇐) Take:

J(x, u) = x'K'Kx + u'u - 2x'K'u,

and use Proposition 2 to show that the optimal trajectory satisfies exactly (9).
At this point Problem 2 can be formulated as a constrained parametric identification problem; the model will be the following:

ẋ(t) = (A + BK)x(t) + v(t),
y(t) = x(t) + w(t),

where the measurements are the optimal trajectories x_i(t) = x*(t; x_{0,i}), truncated after a sufficiently long time interval. The main obstacle in solving this identification problem consists in forcing the matrix K to be stabilizing.
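A naive unconstrained version of this identification step can be sketched as follows. The ground-truth gain and the use of exact state derivatives are simplifying assumptions for illustration only, and the least-squares solve does not enforce the stabilizing constraint discussed above:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical ground truth, used only to generate "measured" trajectories
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K_true = np.array([[-1.0, -1.5]])

t_grid = np.linspace(0.0, 8.0, 400)
X, Xdot = [], []
for x0 in ([1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]):
    sol = solve_ivp(lambda t, x: (A + B @ K_true) @ x,
                    (t_grid[0], t_grid[-1]), x0, t_eval=t_grid, rtol=1e-9)
    X.append(sol.y)
    Xdot.append((A + B @ K_true) @ sol.y)   # exact derivatives for simplicity
X, Xdot = np.hstack(X), np.hstack(Xdot)

# Model xdot = (A + BK) x  =>  B K X = Xdot - A X; B is full column rank
K_hat = np.linalg.pinv(B) @ (Xdot - A @ X) @ np.linalg.pinv(X)
```

With noisy data and numerically differentiated states, the same least-squares step would need regularization and an explicit stability constraint on A + BK.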
III. COST FUNCTION ESTIMATION: FINITE TIME OPTIMAL
CONTROL WITH FIXED FINAL STATE
In this section we consider a finite time optimal control problem with fixed final state. To our knowledge, the associated inverse optimal control problem has never been considered in the literature:

min_u (1/2) ∫_{t_0}^{t_1} J(x(t), u(t)) dt   s.t.   ẋ = Ax + Bu,  x(t_0) = x_0,  x(t_1) = x_1.

Let us make the standard assumptions on the cost function J, i.e. let us assume J ∈ J; moreover, let the couple (A, B) be controllable and the matrix B be full column rank. Under these assumptions the direct problem solution is well known.
However, in this paper we do not refer to the classical expression of the optimal control in terms of the exponential of the Hamiltonian matrix; following the ARE-based approach of [7], the optimal control can be written as:

u*(t) = K_+ e^{A_+t} p_1 + K_- e^{A_-(t - t_f)} p_2,

where (setting t_0 = 0 and t_f ≜ t_1) the quantities K_-, K_+, A_-, A_+, p_1 and p_2 can be determined as follows:

A_+ = A + BK_+,  A_- = A + BK_-,
K_+ = -R^{-1}(S' + B'P),  K_- = K_+ + R^{-1}B'Σ,

[ p_1 ]   [ I_n          e^{-A_-t_f} ]^{-1} [ x_0 ]
[ p_2 ] = [ e^{A_+t_f}   I_n         ]      [ x_1 ].
The matrix P is the unique stabilizing and positive semidefinite solution of the Algebraic Riccati Equation (8); Σ is the solution of the following Lyapunov equation:

A_+Σ^{-1} + Σ^{-1}A_+' + BR^{-1}B' = 0.
Using the well known properties of the Lyapunov equation and the stability of the matrix A_+, we know that Σ is uniquely determined as:

Σ = [ ∫_0^∞ e^{A_+t} BR^{-1}B' e^{A_+'t} dt ]^{-1}.
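A numerical sketch of this construction follows; the double-integrator data, endpoints and horizon are illustrative choices. The Lyapunov equation is solved for Σ^{-1}, and the boundary conditions determine p_1, p_2:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_are, solve_continuous_lyapunov

# Illustrative fixed-final-state problem (t_0 = 0)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.eye(1); S = np.zeros((2, 1))
x0 = np.array([1.0, 0.0]); x1 = np.array([0.0, 0.0]); tf = 2.0

Rinv = np.linalg.inv(R)
P = solve_continuous_are(A, B, Q, R, s=S)     # stabilizing solution of (8)
Kp = -Rinv @ (S.T + B.T @ P)                  # K_+
Ap = A + B @ Kp                               # A_+
# Sigma^{-1} solves  A_+ X + X A_+' + B R^{-1} B' = 0
Sig = np.linalg.inv(solve_continuous_lyapunov(Ap, -B @ Rinv @ B.T))
Km = Kp + Rinv @ B.T @ Sig                    # K_-
Am = A + B @ Km                               # A_-

# Boundary conditions fix p1, p2
n = A.shape[0]
M = np.block([[np.eye(n), expm(-Am * tf)],
              [expm(Ap * tf), np.eye(n)]])
p = np.linalg.solve(M, np.concatenate([x0, x1]))
p1, p2 = p[:n], p[n:]

def x_opt(t):
    """Optimal state x*(t) = e^{A_+ t} p1 + e^{A_-(t - tf)} p2."""
    return expm(Ap * t) @ p1 + expm(Am * (t - tf)) @ p2
```

By construction A_+ collects the stable and A_- the antistable closed-loop dynamics, and x_opt interpolates exactly between the prescribed endpoints.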
At this point, a problem similar to the one presented in the previous section can be formulated. In this case, trajectories are assumed to be optimal with respect to an LQ fixed final state problem with different initial and final conditions.
Problem 3: Assume we are given the matrices A and B and a set of optimal trajectories for different initial and final conditions:

{x_i(t) = x*(t; x_{0,i}, x_{1,i}) : i = 1, . . . , N}.

Estimate the cost function J ∈ J (or equivalently the matrices Q, R and S) that has generated the given trajectories.
Also in this case the problem does not have a unique solution. Specifically, one can easily find different (not simply scaled) cost functions, J and J̃, that lead to the same K_+ and K_-. Therefore the trajectories that follow from the minimization of J will be equal to the ones that follow from the minimization of J̃, for any initial and final condition. Consequently, the following definition of equivalence is the natural modification of the definition seen in Section II.
Definition 2: (J ∼ J̃, equivalence relation on J). Consider J, J̃ ∈ J. We say that J is equivalent to J̃, and we write J ∼ J̃, if and only if K_+ = K̃_+ and K_- = K̃_-.
Once again, it can be shown that the relation defined on the set of cost functions J is an equivalence, and consequently J can be divided into equivalence classes. Notice that a given set of optimal trajectories generated by J could have been generated by any other J̃ in the equivalence class associated to J. Ideally, one would like to associate a canonical form to each equivalence class. The set of canonical forms should contain exactly one element for each equivalence class. However, up to this moment we haven't found such a set; instead, in the following we will introduce a set J_2 ⊂ J that contains at least one element in each equivalence class:

J_2(x, u) = x'K'RKx + u'Ru - 2x'K'Ru, (10)

J_2 = { J_2 of type (10) with λ(A + BK) < 0, R = R' > 0, det(R) = 1 }. (11)
Proposition 5: J_2 is a proper subset of the set J, i.e. J_2 ⊂ J.
Proof: Consider J_2 ∈ J_2; we want to show that J_2 ∈ J. We have to show that Q ≜ K'RK, R and S ≜ -K'R satisfy the properties in (4). Obviously, R = R' > 0 by (11), and therefore Q = K'RK ≥ 0. Moreover:

[ Q   S ]   [ K'RK  -K'R ]
[ S'  R ] = [ -RK   R    ] ≥ 0,

as follows from the positive semidefiniteness of the Schur complement Q - SR^{-1}S' = 0. Moreover, by (5):

Ā = A + BK,  Q̄ = 0,

so that the detectability follows from the fact that λ(A + BK) < 0 as a consequence of J_2 ∈ J_2. It remains to prove that there exists J ∈ J such that J ∉ J_2; this trivially follows from the definitions.
Proposition 6: The set J_2 contains at least one element in each equivalence class of J.
Proof: We have to prove that for every J ∈ J there is at least one element J̃ ∈ J_2 such that J ∼ J̃. Let:

J(x, u) = x'Qx + u'Ru + 2x'Su,

and let K_+ and K_- be the associated optimal gains. Define:

J̃(x, u) = x'K_+'R̃K_+x + u'R̃u - 2x'K_+'R̃u,

where we have defined R̃ ≜ R / det(R)^{1/m}. Let us show that J̃ is effectively in J_2, i.e. let us show that λ(A + BK_+) < 0 and that R̃ is positive definite with det(R̃) = 1. λ(A + BK_+) < 0 follows from the fact that A + BK_+ = A_+ is asymptotically stable; the second property is evident from the definition of R̃ and from the positivity of R. To prove that J ∼ J̃ we have to show that K_+ and K_- equal respectively K̃_+ and K̃_-. Starting from K̃_+ and using J_2 ⊂ J we have:

P̃Ã + Ã'P̃ - P̃BR̃^{-1}B'P̃ + Q̃̄ = 0, (12)

K̃_+ = -R̃^{-1}(B'P̃ + S̃'), (13)

where P̃ is the unique stabilizing and positive semidefinite solution of (12) and:

Q̃ ≜ K_+'R̃K_+,  S̃ ≜ -K_+'R̃,  Ã ≜ A - BR̃^{-1}S̃' = A + BK_+,  Q̃̄ ≜ Q̃ - S̃R̃^{-1}S̃' = 0.

Since Q̃̄ = 0, the stabilizing solution of (12) is P̃ = 0, and from (13) we obtain K̃_+ = -R̃^{-1}S̃' = K_+. We are left with proving that K̃_- = K_-. Indeed:

K_- = K_+ + R^{-1}B' [ ∫_0^∞ e^{A_+t} BR^{-1}B' e^{A_+'t} dt ]^{-1},

K̃_- = K̃_+ + R̃^{-1}B' [ ∫_0^∞ e^{A_+t} BR̃^{-1}B' e^{A_+'t} dt ]^{-1}.

Observing that R and R̃ differ only by a positive multiplicative constant, we conclude K̃_- = K_-.
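The invariance claimed here lends itself to a direct numerical check: compute (K_+, K_-) for an arbitrary cost in J, build the candidate equivalent cost with R̃ normalized to unit determinant, and recompute the gains. The system and weights below are illustrative, and the `gains` helper mirrors the formulas of this section:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

def gains(A, B, Q, R, S):
    """K_+ from the stabilizing ARE solution; K_- via the Lyapunov-based Sigma."""
    Rinv = np.linalg.inv(R)
    P = solve_continuous_are(A, B, Q, R, s=S)
    Kp = -Rinv @ (S.T + B.T @ P)
    Ap = A + B @ Kp
    Sig = np.linalg.inv(solve_continuous_lyapunov(Ap, -B @ Rinv @ B.T))
    return Kp, Kp + Rinv @ B.T @ Sig

# Illustrative cost in J
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([2.0, 1.0]); R = np.array([[4.0]]); S = np.zeros((2, 1))
Kp, Km = gains(A, B, Q, R, S)

# Candidate element of J_2: Qt = K_+' Rt K_+, St = -K_+' Rt, det(Rt) = 1
m = R.shape[0]
Rt = R / np.linalg.det(R) ** (1.0 / m)
Kp2, Km2 = gains(A, B, Kp.T @ Rt @ Kp, Rt, -Kp.T @ Rt)
```

Both gain pairs coincide, so the two costs are equivalent in the sense of Definition 2 even though their (Q, R, S) triples differ.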
In view of our problem, any set of optimal trajectories generated by J ∈ J can be generated by a J̃ ∈ J_2 equivalent to J; such a J̃ always exists according to Proposition 6. In the following result we formulate necessary and sufficient conditions for a given x(t) to be optimal.
Proposition 7: Consider the system ẋ = Ax + Bu and let x(t) be a state trajectory. Then, there exists a cost function J ∈ J that has x(t) as the minimizing trajectory if and only if there exist:
- a stabilizing gain K,
- a matrix R = R' > 0,
such that, for some function λ(t):

[ ẋ(t) ]   [ A + BK   BR^{-1}B'  ] [ x(t) ]
[ λ̇(t) ] = [ O_n      -(A + BK)' ] [ λ(t) ]. (14)
Proof: To prove the claim we start from a well known result in linear quadratic convex optimal control (see [8] for details). It essentially states that, if J ∈ J, then a trajectory x(t) is optimal for the associated fixed final state problem, with initial and final conditions x(t_0) and x(t_1) respectively, if and only if there exists λ(t) such that:

[ ẋ(t) ]   [ Ā   BR^{-1}B' ] [ x(t) ]
[ λ̇(t) ] = [ Q̄   -Ā'       ] [ λ(t) ], (15)

where we defined as usual:

Ā ≜ A - BR^{-1}S',  Q̄ ≜ Q - SR^{-1}S'.
Let us now prove Proposition 7. (Only if) Consider x(t), optimal for J ∈ J. Then, according to Proposition 6, it must be optimal for some J̃ ∈ J_2:

J̃(x, u) = x'K'RKx + u'Ru - 2x'K'Ru. (16)

Since J_2 ⊂ J, condition (15) must hold with Q = K'RK, S = -K'R and R = R; substituting these matrices into (15) yields exactly (14). (If) Conversely, given K, R and λ(t) satisfying (14), take the cost function (16): condition (14) coincides with (15) for Q = K'RK and S = -K'R, so x(t) is optimal for (16).
Problem 3 becomes a mixed identification and estimation problem. Specifically, we have:

[ ẋ(t) ]   [ A + BK   BR^{-1}B'  ] [ x(t) ]
[ λ̇(t) ] = [ O_n      -(A + BK)' ] [ λ(t) ] + v(t),

y(t) = x(t) + w(t),

where the measurements are the optimal trajectories x_i(t) = x*(t; x_{0,i}, x_{1,i}) for i = 1, . . . , N. The variables to be estimated are the matrices K and R together with the state variable λ(t). The estimation procedure should constrain the unknown matrices K and R to be respectively stabilizing for the couple (A, B) and positive definite.
IV. CONCLUSIONS
This paper was about the problem of estimating a cost
function from a given set of state trajectories; matrices A and
B of the underlying linear system were assumed known. Two
classes of optimal control problems were considered: innite
time optimal control and nite time optimal control with xed
nal state; in both cases, cost functions were assumed to be
quadratic. The estimation problem turned out to be ill-posed by
the non-uniqueness of the matrices Q, R and S that correspond
to the same optimal control. This complication was handled by
dening a proper equivalence relation on the set of admissible
cost functions and searching for a set of canonical forms. The
problem of estimating the cost function was then reduced to
the constrained identication of a linear gray-box state space
model.
REFERENCES
[1] R. Kalman, "When is a linear control system optimal?" ASME Transactions, Journal of Basic Engineering, vol. 86, pp. 51–60, 1964.
[2] J. Willems, "Least squares stationary optimal control and the algebraic Riccati equation," IEEE Transactions on Automatic Control, vol. AC-16, pp. 621–633, December 1971.
[3] E. Kreindler and A. Jameson, "Optimality of linear control systems," IEEE Transactions on Automatic Control, pp. 349–351, June 1972.
[4] A. Jameson and E. Kreindler, "Inverse problem of linear optimal control," SIAM Journal on Control, vol. 11, pp. 1–19, February 1973.
[5] B. P. Molinari, "The stable regulator problem and its inverse," IEEE Transactions on Automatic Control, vol. AC-18, pp. 454–459, October 1973.
[6] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods. Prentice-Hall International Inc., 1989.
[7] A. Ferrante, G. Marro, and L. Ntogramatzidis, "Employing the algebraic Riccati equation for the solution of the finite-horizon LQ problem," in Proceedings of the 42nd IEEE Conference on Decision and Control, 2003, pp. 210–214.
[8] F. L. Lewis and V. L. Syrmos, Optimal Control. John Wiley and Sons, Inc., 1995.