
Linear Optimal Control Problems and Quadratic Cost Functions Estimation

Francesco Nori
Department of Information Engineering
Università degli Studi di Padova
Padova, Italia
Email: iron@dei.unipd.it

Ruggero Frezza
Department of Information Engineering
Università degli Studi di Padova
Padova, Italia
Email: frezza@dei.unipd.it
Abstract: Inverse optimal control is a classical problem of control theory. It was first posed by Kalman in the early sixties. The problem, as addressed in the literature, answers the following two questions: (a) Given system matrices A, B and a gain matrix K, find necessary and sufficient conditions for K to be the optimal gain of an infinite time LQ problem. (b) Determine all weight matrices Q, R and S which yield the given gain matrix K. In this paper, we tackle a related, but different problem: starting from the state trajectories of an LTI system, identify the matrices Q, R and S that have generated those trajectories. Both infinite and finite time optimal control problems are considered. The motivation lies in the characterization of the trajectories of LTI systems in terms of the control task.
I. INTRODUCTION

Inverse optimal control theory dates back to the early sixties. The theory was inspired by a problem first posed by Kalman [1]. Consider the classical infinite time LQ direct optimal control problem:

min_u ∫_{t_0}^∞ [x^T(t) Q x(t) + u^T(t) R u(t) + 2 x^T(t) S u(t)] dt
s.t.   ẋ(t) = A x(t) + B u(t),   x(t_0) = x_0,   x ∈ R^n,   u ∈ R^m.   (1)
Under general hypotheses (Q = Q^T, R = R^T > 0 and (A, B) stabilizable), it can be proven [2] that if an optimal control u* exists, then it corresponds to a static state feedback K, i.e. u*(t) = K x(t). The inverse problem considered by Kalman consists of two questions:
(a) Find necessary and sufficient conditions on the matrices A, B and K such that the control law u(t) = K x(t) minimizes the cost (1) for some Q, R and S.
(b) Determine all such costs, i.e. all Q, R and S corresponding to the same K.
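The direct problem (1) is readily solved numerically. As a minimal sketch, assuming a double-integrator system and illustrative weights (none of these values appear in the paper), the gain in the convention u*(t) = K x(t) used here follows from the stabilizing Riccati solution:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative data (not from the paper): double integrator, unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)          # Q = Q^T >= 0
R = np.array([[1.0]])  # R = R^T > 0
S = np.zeros((2, 1))   # input-state cross term (zero here)

# Stabilizing solution of the ARE; scipy handles the cross term via s=.
P = solve_continuous_are(A, B, Q, R, s=S)

# Gain in the paper's convention u*(t) = K x(t): K = -R^{-1}(B^T P + S^T).
K = -np.linalg.solve(R, B.T @ P + S.T)

# A + BK must be asymptotically stable.
print(np.linalg.eigvals(A + B @ K).real.max() < 0)  # True
```

Question (b) then asks which other triples (Q, R, S) reproduce this same gain K.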
A. Previous works on inverse optimal control

In this section we report previous results on the solution of (a) and (b). The underlying hypotheses are the same as for the direct problem, plus a condition on B: rank(B) = m. Moreover, (1) is slightly modified by adding a stabilizing constraint:

lim_{t→∞} x(t) = 0.
Kreindler [3] solved (a), showing that K is optimal for some Q = Q^T, R = R^T > 0 and S if and only if A + BK is asymptotically stable. Moreover, he showed that (b) can be solved by arbitrarily fixing Q = Q^T and R = R^T > 0, and then computing a suitable S.
Jameson [4] restricted the set of cost functions, imposing S = 0. Under this restriction, K is optimal if and only if A + BK is asymptotically stable and KB has m linearly independent real eigenvectors. All Q and R that lead to the same K are determined by solving, in P = P^T and R = R^T, the matrix equation B^T P = -RK; Q is then computed using the standard Algebraic Riccati Equation, with Q as the unknown. The obtained results are easily generalized to the time-varying finite-horizon LQ optimal control problem.
Molinari [5] considered an even more restrictive situation, fixing R = I_m and S = 0. Under those restraints, K is optimal if and only if A + BK is asymptotically stable and KB is symmetric. The set of all Q = Q^T leading to a given K can be obtained solving:

Q = Q̄ + A^T Y + Y A,

where Y = Y^T is any solution of Y B = 0 and Q̄ is any weighting matrix leading to the given K.
The solution of (a) changes considerably if Q is required to be positive semidefinite. Specifically, the requirement Q = Q^T ≥ 0 leads to the so called return difference condition. Its single input formulation is due to Kalman [1] and was later extended by Anderson and Moore [6]. Specifically, K is optimal for some Q = Q^T ≥ 0, R = R^T > 0, S = 0 if and only if A + BK is stable and there exists an R = R^T > 0 that satisfies, for all ω, the return difference inequality:

[I + B^T(-jωI - A^T)^{-1} K^T] R [I + K(jωI - A)^{-1} B] ≥ R.   (2)
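Condition (2) can be checked numerically on a frequency grid. The sketch below uses an assumed stabilizing gain for a double integrator (illustrative values, not from the paper) and evaluates the smallest eigenvalue of the return-difference expression minus R:

```python
import numpy as np

# Illustrative data (assumed): double integrator with an LQ-optimal gain.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-1.0, -np.sqrt(3.0)]])  # paper convention: u = Kx, A + BK stable
R = np.array([[1.0]])

def rdi_margin(w):
    """Smallest eigenvalue of F(jw)^H R F(jw) - R with F(s) = I + K(sI - A)^{-1} B."""
    n, m = A.shape[0], B.shape[1]
    F = np.eye(m) + K @ np.linalg.solve(1j * w * np.eye(n) - A, B)
    M = F.conj().T @ R @ F - R
    return np.min(np.linalg.eigvalsh(0.5 * (M + M.conj().T)))

margins = [rdi_margin(w) for w in np.logspace(-2, 2, 50)]
print(all(m >= -1e-9 for m in margins))  # True for an LQ-optimal gain
```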
B. Cost estimation problem

In this paper we address a problem related to inverse optimal control. Specifically, we consider the problem of estimating the cost function that best approximates a given set of state trajectories. The underlying hypothesis is that the given trajectories result from the optimal control of a known linear system from different initial conditions. We aim at estimating the (unknown) cost function that has generated this set. Once the correct cost has been learned, we can think of simulating the state trajectories of the system from new initial conditions, not necessarily contained in the original training set.
II. COST FUNCTION ESTIMATION: INFINITE TIME OPTIMAL CONTROL

Consider the following optimal control problem:

min_u (1/2) ∫_{t_0}^∞ J(x(t), u(t)) dt   s.t.   ẋ = A x + B u,   x(t_0) = x_0,

where (A, B) is supposed stabilizable and the matrix B full column rank. Let the cost function contain an input-state cross term:

J(x, u) = x^T Q x + u^T R u + 2 x^T S u.   (3)
From now on, we will denote by 𝒥 the set of such cost functions with the standard hypotheses on the matrices Q, R and S, i.e.:

𝒥 = { J : J of type (3) with Q = Q^T ≥ 0, R = R^T > 0, [Q, S; S^T, R] ≥ 0, (Ã, Q̃^{1/2}) detectable };   (4)

the matrices Ã, Q̃ are defined as follows:

Ã = A - B R^{-1} S^T,   Q̃ = Q - S R^{-1} S^T.   (5)
Under these hypotheses the solution of the optimal control problem is known to be unique, and it corresponds to an asymptotically stabilizing static state feedback K, i.e. u*(t) = K x(t) with A + BK stable. Let x*(t; x_0) be the optimal trajectory, i.e. the state trajectory corresponding to the optimizing input u*(t) when the initial condition is x(t_0) = x_0. Then we formulate the following problem.
Problem 1: Assume that we are given the matrices A and B and a set of optimal trajectories for different initial conditions:

{x_i(t) = x*(t; x_{0,i}) : i = 1, ..., N}.

Estimate the cost function J ∈ 𝒥 (or equivalently the matrices Q, R and S) that has generated the given trajectories.
The problem does not have a unique solution, because optimal trajectories do not univocally determine a cost function in 𝒥. Specifically, previous works have shown that for any cost function in the set 𝒥 there exists an infinite number of different (not simply scaled) cost functions leading to the same optimal gain K. Therefore, given a set of optimal trajectories with different initial conditions, we cannot univocally determine the matrices Q, R, S that generated those trajectories. The following definition will help us formalize this concept.
Definition 1: (J ~ J̃, equivalence relation on 𝒥). Consider J, J̃ ∈ 𝒥. We say that J is equivalent to J̃, and we write J ~ J̃, if and only if K = K̃, K and K̃ being the optimal gains associated to J and J̃, respectively.

It can be shown that the relation ~ defined on the set of cost functions 𝒥 is an equivalence, being reflexive, symmetric and transitive. Consequently, we can divide 𝒥 into equivalence classes.
Coming back to our problem, a given set of trajectories generated by J could have been generated by any other J̃ in the equivalence class associated to J. Therefore, in order to define the estimation problem correctly it is necessary to identify a canonical representative in each class; we can then reduce the problem to estimating the canonical form of J. The problem can be reformulated as follows.
Problem 2: Assume that we are given the matrices A and B and a set of optimal trajectories for different initial conditions:

{x_i(t) = x*(t; x_{0,i}) : i = 1, ..., N}.

Estimate the canonical form of the equivalence class associated to J ∈ 𝒥, J being the cost function that has generated the given trajectories.
Let us then build a set of canonical forms, i.e. a set that contains exactly one element of each equivalence class. Consider the following set of cost functions:

J_1(x, u) = x^T K^T K x + u^T u - 2 x^T K^T u,   (6)

𝒥_1 = { J_1 of type (6) with A + BK asymptotically stable }.   (7)
Proposition 1: 𝒥_1 is a proper subset of the set 𝒥, i.e. 𝒥_1 ⊂ 𝒥.

Proof: Consider J_1 ∈ 𝒥_1; we want to prove that J_1 ∈ 𝒥. Equivalently, we have to show that Q ≜ K^T K, R ≜ I_m and S ≜ -K^T satisfy the properties in (4). Obviously K^T K ≥ 0 and I_m > 0. Moreover:

[Q, S; S^T, R] = [K^T K, -K^T; -K, I_m] ≥ 0,

as follows from the positive semidefiniteness of the Schur complement Q - S R^{-1} S^T = 0 ≥ 0 and from the positiveness of the matrix R. It remains to prove that (Ã, Q̃^{1/2}) is detectable. Easy substitution in (5) leads to:

Ã = A + BK,   Q̃ = 0.

Using (7) we conclude that Ã is asymptotically stable, and the detectability of the couple (Ã, Q̃^{1/2}) easily follows. It remains to prove that there exists J ∈ 𝒥 such that J ∉ 𝒥_1; this trivially follows from the definitions.
Proposition 2: Consider J_1 ∈ 𝒥_1. If:

J_1(x, u) = x^T K^T K x + u^T u - 2 x^T K^T u,

then the optimal gain associated to J_1 is exactly K.

Proof: Using Proposition 1 we have that J_1 ∈ 𝒥, so that the associated direct optimal control problem can be solved in the standard manner. Specifically, define Q ≜ K^T K, R ≜ I_m and S ≜ -K^T. The optimal gain associated to J_1 is given by:

K_1 = -R^{-1}(B^T P + S^T),

where P is the unique positive semidefinite solution of the following Algebraic Riccati Equation:

P Ã + Ã^T P - P B R^{-1} B^T P + Q̃ = 0.   (8)

Easy substitutions show that Q̃ = 0, so that P = 0. Substituting these results into the expression of the optimal gain we obtain K_1 = -R^{-1} S^T = K, and this concludes the proof.
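The canonical cost (6) also has a transparent interpretation: completing the square gives J_1(x, u) = ||u - Kx||^2, so u = Kx annihilates the running cost pointwise, consistent with Proposition 2. A quick numerical check, with an assumed stabilizing gain (illustrative values, not from the paper):

```python
import numpy as np

# Illustrative stabilizing gain for a double integrator (assumed values).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-1.0, -2.0]])
assert np.linalg.eigvals(A + B @ K).real.max() < 0  # K is stabilizing

# J1(x, u) = x^T K^T K x + u^T u - 2 x^T K^T u should equal ||u - K x||^2.
rng = np.random.default_rng(0)
for _ in range(100):
    x, u = rng.standard_normal(2), rng.standard_normal(1)
    J1 = x @ K.T @ K @ x + u @ u - 2 * x @ K.T @ u
    assert np.isclose(J1, np.sum((u - K @ x) ** 2))
print("J1(x, u) = ||u - K x||^2 verified")
```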


Proposition 3: The set 𝒥_1 is a set of canonical forms for the equivalence relation defined on 𝒥.

Proof: Since we have already proven that 𝒥_1 ⊂ 𝒥, we are left with proving that the set 𝒥_1 contains exactly one element in each equivalence class. We proceed by proving that for every J ∈ 𝒥 there is at least one element J_1 ∈ 𝒥_1 such that J ~ J_1; we then prove that this element is unique. Consider J ∈ 𝒥 and let K be the optimal gain associated to J; using the hypotheses on J we know that A + BK is asymptotically stable (see [6] for details). Consequently, the following cost function:

J_1(x, u) = x^T K^T K x + u^T u - 2 x^T K^T u

is a cost function in 𝒥_1. Moreover, such a J_1 is in the equivalence class of J, i.e. J ~ J_1. In fact, using Proposition 2, the optimal gain K_1 associated to J_1 equals K, and this concludes the first part of the proof. Let us now prove that there are no two elements of 𝒥_1 in the same equivalence class, i.e. that 𝒥_1 does not contain two elements equivalent to one another. Suppose by contradiction that there exist J_1, J̃_1 ∈ 𝒥_1 such that J_1 ~ J̃_1 and J_1 ≠ J̃_1. Therefore:

J_1(x, u) = x^T K^T K x + u^T u - 2 x^T K^T u,
J̃_1(x, u) = x^T K̃^T K̃ x + u^T u - 2 x^T K̃^T u.

If J_1 ≠ J̃_1, then K ≠ K̃. But K and K̃ are the optimal gains associated to J_1 and J̃_1 respectively, and consequently the optimal gains associated to J_1 and J̃_1 are different. This contradicts the assumption J_1 ~ J̃_1 and concludes the proof.
Proposition 4: Consider the system ẋ = A x + B u and let x(t) be a state trajectory. Then, there exists a cost function J ∈ 𝒥 that has x(t) as minimizing trajectory if and only if there exists a stabilizing gain K such that:

ẋ(t) = (A + BK) x(t)   ∀t.   (9)

Proof: (⇒) It easily follows from the theory of optimal control. (⇐) Take:

J(x, u) = x^T K^T K x + u^T u - 2 x^T K^T u,

and use Proposition 2 to show that the optimal trajectory is exactly (9).
At this point Problem 2 can be formulated as a constrained parametric identification problem; the model will be the following:

ẋ(t) = (A + BK) x(t) + v(t),
y(t) = x(t) + w(t),

where everything is known except the noise variances Σ_v, Σ_w and the matrix K, which is constrained to be stabilizing for (A, B). The measurements are the trajectories x*(t; x_{0,i}) truncated after a sufficiently long time interval. The main obstacle in solving this identification problem consists in forcing the matrix K to be stabilizing.
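In the noise-free case the identification step reduces to ordinary least squares followed by a stability check; the constrained noisy case is what makes the problem hard. A sketch under assumed illustrative values (double integrator with a known true gain; nothing here is from the paper):

```python
import numpy as np

# Sketch of Problem 2 as identification, noise-free case: given A, B and sampled
# optimal trajectories, fit K by least squares on xdot = (A + BK) x, then check
# stability afterwards. All numerical values are assumed for illustration.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K_true = np.array([[-1.0, -np.sqrt(3.0)]])
Acl = A + B @ K_true

# Simulate a few trajectories of xdot = (A + BK) x with simple Euler steps.
dt, T = 1e-3, 5.0
X, Xdot = [], []
rng = np.random.default_rng(1)
for _ in range(3):
    x = rng.standard_normal(2)
    for _ in range(int(T / dt)):
        xdot = Acl @ x
        X.append(x)
        Xdot.append(xdot)
        x = x + dt * xdot
X, Xdot = np.array(X), np.array(Xdot)

# Least squares: Xdot - X A^T = X (B K)^T; B full column rank lets us recover K.
M, *_ = np.linalg.lstsq(X, Xdot - X @ A.T, rcond=None)   # M = (B K)^T
K_hat = np.linalg.pinv(B) @ M.T

print(np.allclose(K_hat, K_true))                        # True
print(np.linalg.eigvals(A + B @ K_hat).real.max() < 0)   # True
```

A practical estimator would enforce stability during the fit rather than after it, which is exactly the obstacle noted above.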
III. COST FUNCTION ESTIMATION: FINITE TIME OPTIMAL CONTROL WITH FIXED FINAL STATE

In this section we consider a finite time optimal control problem with fixed final state. To our knowledge, the associated inverse optimal control problem has never been considered in the literature.

min_u (1/2) ∫_{t_0}^{t_1} J(x(t), u(t)) dt   s.t.   ẋ = A x + B u,   x(t_0) = x_0,   x(t_1) = x_1.
Let us make the standard assumptions on the cost function J, i.e. let us assume J ∈ 𝒥; moreover, let the couple (A, B) be controllable and the matrix B be full column rank. Under these assumptions the direct problem solution is well known. However, in this paper we do not refer to the classical expression for the optimal control u*(t), but to a reformulation recently proposed by A. Ferrante et al. (see [7] for details). Specifically, it can be proven that:

u*(t) = K_+ e^{A_+ t} p_1 + K_- e^{A_-(t - t_f)} p_2,

where the quantities K_-, K_+, A_-, A_+, p_1 and p_2 can be determined as follows:

A_+ = A + B K_+,   A_- = A + B K_-,
K_+ = -R^{-1}(S^T + B^T P),   K_- = K_+ + R^{-1} B^T Σ,

[p_1; p_2] = [I_n, e^{-A_- t_f}; e^{A_+ t_f}, I_n]^{-1} [x_0; x_1].
The matrix P is the unique stabilizing and positive semidefinite solution of the Algebraic Riccati Equation (8); Σ is the solution of the following Lyapunov equation:

A_+ Σ^{-1} + Σ^{-1} A_+^T + B R^{-1} B^T = 0.

Using the well known properties of the Lyapunov equation and the stability of the matrix A_+, we know that Σ is uniquely determined as:

Σ = [ ∫_0^∞ e^{A_+ t} B R^{-1} B^T e^{A_+^T t} dt ]^{-1}.
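Σ is cheap to compute: X = Σ^{-1} is the controllability Gramian of the stable matrix A_+ driven by B R^{-1} B^T, so a standard Lyapunov solver applies. The sketch below (illustrative A_+, B, R, assumed stable and controllable, not from the paper) cross-checks the solver against the defining integral:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Illustrative data (assumed): stable A_+ with (A_+, B) controllable.
A_plus = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
W = B @ np.linalg.solve(R, B.T)            # B R^{-1} B^T

# (i) Lyapunov route: solve A_+ X + X A_+^T = -W for X = Sigma^{-1}.
X = solve_continuous_lyapunov(A_plus, -W)
Sigma = np.linalg.inv(X)

# (ii) Gramian integral route, by crude trapezoidal quadrature.
ts = np.linspace(0.0, 30.0, 3001)
vals = [expm(A_plus * t) @ W @ expm(A_plus.T * t) for t in ts]
X_int = sum(0.5 * (vals[i] + vals[i + 1]) * (ts[i + 1] - ts[i])
            for i in range(len(ts) - 1))

print(np.allclose(X, X_int, atol=1e-3))  # True: both routes agree
```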
At this point, a problem similar to the one presented in the previous section can be formulated. In this case, the trajectories are assumed to be optimal with respect to an LQ fixed final state problem with different initial and final conditions.

Problem 3: Assume we are given the matrices A and B and a set of optimal trajectories for different initial and final conditions:

{x_i(t) = x*(t; x_{0,i}, x_{1,i}) : i = 1, ..., N}.

Estimate the cost function J ∈ 𝒥 (or equivalently the matrices Q, R and S) that has generated the given trajectories.
Also in this case the problem does not have a unique solution. Specifically, one can easily find different (not simply scaled) cost functions, J and J̃, that lead to the same K_+ and K_-. Therefore the trajectories that follow from the minimization of J will be equal to the ones that follow from the minimization of J̃, for any initial and final condition. Consequently, the following definition of equivalence is the natural modification of the definition given in Section II.

Definition 2: (J ~ J̃, equivalence relation on 𝒥). Consider J, J̃ ∈ 𝒥. We say that J is equivalent to J̃, and we write J ~ J̃, if and only if K_+ = K̃_+ and K_- = K̃_-.
Once again, it can be shown that the relation ~ defined on the set of cost functions 𝒥 is an equivalence, and consequently 𝒥 can be divided into equivalence classes. Notice that a given set of optimal trajectories generated by J could have been generated by any other J̃ in the equivalence class associated to J. Ideally, one would like to associate a canonical form to each equivalence class. The set of canonical forms should contain exactly one element for each equivalence class. However, up to this moment, we have not found such a set; instead, in the following we introduce a set 𝒥_2 ⊂ 𝒥 that contains at least one element in each equivalence class:

J_2(x, u) = x^T K^T R K x + u^T R u - 2 x^T K^T R u,   (10)

𝒥_2 = { J_2 of type (10) with A + BK asymptotically stable, R = R^T > 0, det(R) = 1 }.   (11)
Proposition 5: 𝒥_2 is a proper subset of the set 𝒥, i.e. 𝒥_2 ⊂ 𝒥.

Proof: Consider J_2 ∈ 𝒥_2; we want to show that J_2 ∈ 𝒥. We have to show that Q ≜ K^T R K, R and S ≜ -K^T R satisfy the properties in (4). Obviously, R = R^T > 0 by (11), and therefore Q = K^T R K ≥ 0. Moreover:

[Q, S; S^T, R] = [K^T R K, -K^T R; -R K, R] ≥ 0,

as follows from the positive semidefiniteness of the Schur complement Q - S R^{-1} S^T = 0 ≥ 0 and from the positiveness of the matrix R. It remains to prove that (Ã, Q̃^{1/2}) is detectable. Easy substitution leads to:

Ã = A + BK,   Q̃ = 0,

so that detectability follows from the asymptotic stability of A + BK, a consequence of J_2 ∈ 𝒥_2. It remains to prove that there exists J ∈ 𝒥 such that J ∉ 𝒥_2; this trivially follows from the definitions.
Proposition 6: The set 𝒥_2 contains at least one element in each equivalence class of 𝒥.

Proof: We have to prove that for every J ∈ 𝒥 there is at least one element J̃ ∈ 𝒥_2 such that J ~ J̃. Let:

J(x, u) = x^T Q x + u^T R u + 2 x^T S u,

and let K_+ and K_- be the matrices associated to J through the solution of the associated optimal control problem. An element J̃ ∈ 𝒥_2 equivalent to J is the following:

J̃(x, u) = x^T K_+^T R̃ K_+ x + u^T R̃ u - 2 x^T K_+^T R̃ u,

where we have defined R̃ ≜ R / det(R)^{1/m}. Let us show that J̃ is effectively in 𝒥_2, i.e. that A + B K_+ is asymptotically stable and that R̃ is positive definite with det(R̃) = 1. The first property follows from the fact that A + B K_+ = A_+ is asymptotically stable; the second is evident from the definition of R̃ and from the positivity of R. To prove that J ~ J̃ we have to show that K_+ and K_- equal K̃_+ and K̃_-, respectively. Starting from K̃_+ and using 𝒥_2 ⊂ 𝒥 we have:

P Ā + Ā^T P - P B R̃^{-1} B^T P + Q̂ = 0,   (12)
K̃_+ = -R̃^{-1}(B^T P + S̄^T),   (13)

where P is the unique stabilizing and positive semidefinite solution of (12) and:

Q̄ ≜ K_+^T R̃ K_+,   S̄ ≜ -K_+^T R̃,   Ā ≜ A - B R̃^{-1} S̄^T,   Q̂ ≜ Q̄ - S̄ R̃^{-1} S̄^T.

Once again we have Q̂ = 0, so that P = 0 is the unique positive semidefinite solution of (12); consequently K̃_+ = -R̃^{-1} S̄^T = K_+. We are left with proving that K̃_- = K_-. First notice that, since K̃_+ = K_+, we have Ã_+ = A + B K̃_+ = A + B K_+ = A_+. Therefore:

K_- = K_+ + R^{-1} B^T [ ∫_0^∞ e^{A_+ t} B R^{-1} B^T e^{A_+^T t} dt ]^{-1},
K̃_- = K_+ + R̃^{-1} B^T [ ∫_0^∞ e^{A_+ t} B R̃^{-1} B^T e^{A_+^T t} dt ]^{-1}.

Observing that R and R̃ differ only by the multiplicative constant det(R)^{1/m}, we conclude K̃_- = K_-.
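The scale-invariance argument closing the proof can be checked numerically: the increment K_- - K_+ = R^{-1} B^T Σ is unchanged when R is multiplied by any positive scalar. A sketch with assumed illustrative matrices (not from the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative data (assumed): stable A_+ with (A_+, B) controllable.
A_plus = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])

def K_minus_increment(R):
    """Compute R^{-1} B^T Sigma, where Sigma^{-1} solves the Lyapunov equation."""
    W = B @ np.linalg.solve(R, B.T)                # B R^{-1} B^T
    X = solve_continuous_lyapunov(A_plus, -W)      # X = Sigma^{-1}
    return np.linalg.solve(R, B.T @ np.linalg.inv(X))

R = np.array([[2.0]])
for c in (0.5, 3.0, 10.0):
    assert np.allclose(K_minus_increment(R), K_minus_increment(c * R))
print("R^{-1} B^T Sigma is invariant under R -> c R")
```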
In view of our problem, any set of optimal trajectories generated by J ∈ 𝒥 can be generated by a J̃ ∈ 𝒥_2 equivalent to J; such a J̃ always exists according to Proposition 6. In the following result we formulate necessary and sufficient conditions for a given x(t) to be optimal.
Proposition 7: Consider the system ẋ = A x + B u and let x(t) be a state trajectory. Then, there exists a cost function J ∈ 𝒥 that has x(t) as the minimizing trajectory if and only if there exist:
- a stabilizing gain K,
- a matrix R = R^T > 0 with det(R) = 1,
- a vector λ(t) ∈ R^n,
such that, for all t ∈ [t_0, t_1]:

[ẋ(t); λ̇(t)] = [A + BK, -B R^{-1} B^T; O_n, -(A + BK)^T] [x(t); λ(t)].   (14)
Proof: To prove the claim we start from a well known result in linear quadratic convex optimal control (see [8] for details). It essentially states that if J ∈ 𝒥, then a trajectory x(t) is optimal for the associated fixed final state problem, with initial and final conditions x(t_0) and x(t_1) respectively, if and only if the following condition holds:

∃ λ(t) :   [ẋ(t); λ̇(t)] = [Ã, -B R^{-1} B^T; -Q̃, -Ã^T] [x(t); λ(t)],   (15)

where we defined as usual:

Ã ≜ A - B R^{-1} S^T,   Q̃ ≜ Q - S R^{-1} S^T.
Let us now prove Proposition 7. (Only if) Consider x(t), optimal for J ∈ 𝒥. Then, according to Proposition 6, it must be optimal for some J̃ ∈ 𝒥_2:

J̃(x, u) = x^T K^T R K x + u^T R u - 2 x^T K^T R u.   (16)

Since 𝒥_2 ⊂ 𝒥, condition (15) must hold with Q = K^T R K, S = -K^T R and R = R^T > 0 with det(R) = 1; then, substitution in (15) leads to (14) and the result is proven. (If) Suppose that (14) holds for some K (stabilizing), R (positive definite, with det(R) = 1) and λ(t); we prove that x(t) is optimal for J̃ defined as in (16) with the given K and R. The result easily follows observing that (15) corresponds to (14) upon defining Q = K^T R K and S = -K^T R.
Problem 3 thus becomes a mixed identification-estimation problem. Specifically we have:

[ẋ(t); λ̇(t)] = [A + BK, -B R^{-1} B^T; O_n, -(A + BK)^T] [x(t); λ(t)] + v(t),
y(t) = x(t) + w(t),

where the measurements are the optimal trajectories x_i(t) = x*(t; x_{0,i}, x_{1,i}) for i = 1, ..., N. The variables to be estimated are the matrices K and R together with the state variable λ(t). The estimation procedure should constrain the unknown matrices K and R to be, respectively, stabilizing for the couple (A, B) and positive definite.
IV. CONCLUSIONS

This paper addressed the problem of estimating a cost function from a given set of state trajectories; the matrices A and B of the underlying linear system were assumed known. Two classes of optimal control problems were considered: infinite time optimal control and finite time optimal control with fixed final state; in both cases, cost functions were assumed to be quadratic. The estimation problem turned out to be ill-posed, because of the non-uniqueness of the matrices Q, R and S that correspond to the same optimal control. This complication was handled by defining a proper equivalence relation on the set of admissible cost functions and searching for a set of canonical forms. The problem of estimating the cost function was then reduced to the constrained identification of a linear gray-box state space model.
REFERENCES
[1] R. Kalman, "When is a linear control system optimal?" ASME Transactions, Journal of Basic Engineering, vol. 86, pp. 51-60, 1964.
[2] J. Willems, "Least squares stationary optimal control and the algebraic Riccati equation," IEEE Transactions on Automatic Control, vol. AC-16, pp. 621-633, December 1971.
[3] E. Kreindler and A. Jameson, "Optimality of linear control systems," IEEE Transactions on Automatic Control, pp. 349-351, June 1972.
[4] A. Jameson and E. Kreindler, "Inverse problem of linear optimal control," SIAM Journal on Control, vol. 11, pp. 1-19, February 1973.
[5] B. P. Molinari, "The stable regulator problem and its inverse," IEEE Transactions on Automatic Control, vol. AC-18, pp. 454-459, October 1973.
[6] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods. Prentice-Hall International Inc., 1989.
[7] A. Ferrante, G. Marro, and L. Ntogramatzidis, "Employing the algebraic Riccati equation for the solution of the finite-horizon LQ problem," in Proceedings of the 42nd IEEE Conference on Decision and Control, 2003, pp. 210-214.
[8] F. L. Lewis and V. L. Syrmos, Optimal Control. John Wiley and Sons, Inc., 1995.
