3.1
Dynamic programming
Dynamic programming is a robust approach to solving optimal control problems. The method originated with R. Bellman in the early 1950s. Its basic idea is to consider a family of optimal control problems with different initial times and states, and to establish relationships among these problems via the so-called Hamilton-Jacobi-Bellman equation (HJB, for short). If the HJB equation is solvable (either analytically or numerically), then one can obtain an optimal feedback control by taking the maximizer/minimizer involved in the HJB equation (the so-called verification technique).
For illustration, let us first look at the deterministic case.
3.1.1
Deterministic control
Consider the controlled ordinary differential equation
$$ \dot x(t) = b(t, x(t), u(t)), \quad t \in [0, T], \qquad x(0) = x_0, \tag{3.1} $$
where the control $u(\cdot)$ belongs to $\mathcal{V}[0,T] = \{u(\cdot) : u(\cdot) \text{ is measurable in } [0,T] \text{ and takes values in a given control set } U\}$, and the cost functional
$$ J(u(\cdot)) = \int_0^T f(t, x(t), u(t))\,dt + h(x(T)), \tag{3.2} $$
for some given maps b, f and h. Given certain regularity conditions, the state equation (3.1) admits a unique solution x(·) ∈ C([0, T]; R) and (3.2) is well-defined. The optimal control problem is stated as follows:
Minimize (3.2) over V[0, T ].
Let (s, y) ∈ [0, T) × R, and consider the following control system over [s, T]:
$$ \dot x(t) = b(t, x(t), u(t)), \quad t \in [s, T], \qquad x(s) = y. \tag{3.3} $$
Here, the control u(·) ∈ V[s, T] = {u(·) : u(·) is measurable in [s, T]}. The cost functional is the following:
$$ J(s, y; u(\cdot)) = \int_s^T f(t, x(t), u(t))\,dt + h(x(T)). $$
Define the value function by
$$ V(s, y) = \inf_{u(\cdot) \in \mathcal{V}[s,T]} J(s, y; u(\cdot)), \quad \text{for any } (s, y) \in [0, T) \times R, \qquad V(T, y) = h(y). $$
For any (s, y) ∈ [0, T) × R and any ŝ ∈ [s, T],
$$ V(s, y) = \inf_{u(\cdot) \in \mathcal{V}[s,T]} \left\{ \int_s^{\hat s} f(t, x(t), u(t))\,dt + V(\hat s, x(\hat s)) \right\}. \tag{3.4} $$
Equation (3.4) is referred to as the dynamic programming equation. The result is known as Bellman's principle of optimality.
Proof: Let us denote the right-hand side of the above equation by $\bar V(s, y)$. By definition, for any u(·) ∈ V[s, T],
$$ \bar V(s, y) \le \int_s^{\hat s} f(t, x(t), u(t))\,dt + V(\hat s, x(\hat s)) \le \int_s^{\hat s} f(t, x(t), u(t))\,dt + J(\hat s, x(\hat s); u(\cdot)) = J(s, y; u(\cdot)). $$
Thus, taking the infimum over V[s, T] we get $\bar V(s, y) \le V(s, y)$. Conversely, for any ε > 0, there exists a $u_\varepsilon(\cdot) \in \mathcal{V}[s, T]$ such that
$$ V(s, y) + \varepsilon \ge J(s, y; u_\varepsilon(\cdot)) = \int_s^{\hat s} f(t, x_\varepsilon(t), u_\varepsilon(t))\,dt + J(\hat s, x_\varepsilon(\hat s); u_\varepsilon(\cdot)) \ge \int_s^{\hat s} f(t, x_\varepsilon(t), u_\varepsilon(t))\,dt + V(\hat s, x_\varepsilon(\hat s)) \ge \bar V(s, y), $$
which implies $V(s, y) \ge \bar V(s, y)$ since ε > 0 is arbitrary. We then obtain the desired result.
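A discrete-time analogue makes Bellman's principle easy to test numerically. The sketch below is a toy of our own (hypothetical grid dynamics and costs, not the continuous-time setting above): it computes the value function by backward induction and then checks the principle over a two-step window, i.e. optimizing the first two controls and behaving optimally afterwards yields the same value as being optimal from the start.

```python
import itertools

# Toy deterministic control on a finite grid (all choices hypothetical):
# states x in {-5,...,5}, controls u in {-1,0,1}, dynamics x_{k+1} = x_k + u
# (clipped to the grid), running cost f(x,u) = x^2 + u^2, terminal h(x) = x^2.
X = range(-5, 6)
U = (-1, 0, 1)
T = 4  # horizon, in steps

def step(x, u):
    return max(-5, min(5, x + u))

def f(x, u):
    return x * x + u * u

def h(x):
    return x * x

# Backward induction: V[k][x] = min_u { f(x,u) + V[k+1][step(x,u)] }.
V = [dict() for _ in range(T + 1)]
V[T] = {x: h(x) for x in X}
for k in range(T - 1, -1, -1):
    for x in X:
        V[k][x] = min(f(x, u) + V[k + 1][step(x, u)] for u in U)

# Bellman's principle over two steps: brute-force the first two controls,
# then continue optimally; the result must equal V[k][x].
def two_step_value(k, x):
    best = float("inf")
    for u0, u1 in itertools.product(U, repeat=2):
        x1 = step(x, u0)
        best = min(best, f(x, u0) + f(x1, u1) + V[k + 2][step(x1, u1)])
    return best

assert all(V[k][x] == two_step_value(k, x) for k in range(T - 1) for x in X)
```

The identity holds by exactly the two-sided argument in the proof above: the two-step value is the same infimum, refined one stage at a time.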
Let us make an observation on Bellman's principle. Suppose (x̄(·), ū(·)) is an optimal pair for the initial pair (s, y). Then, for any ŝ ∈ [s, T],
$$ V(s, y) = J(s, y; \bar u(\cdot)) = \int_s^{\hat s} f(t, \bar x(t), \bar u(t))\,dt + J(\hat s, \bar x(\hat s); \bar u(\cdot)) \ge \int_s^{\hat s} f(t, \bar x(t), \bar u(t))\,dt + V(\hat s, \bar x(\hat s)) \ge V(s, y), $$
where the last inequality follows from (3.4). Hence equality holds throughout; in particular, the restriction of (x̄(·), ū(·)) to [ŝ, T] is optimal for the initial pair (ŝ, x̄(ŝ)).
Next, suppose V is smooth. Fix u ∈ U and apply (3.4) with the constant control u(t) ≡ u on [s, ŝ]:
$$ V(s, y) \le \int_s^{\hat s} f(t, x(t), u)\,dt + V(\hat s, x(\hat s)), $$
or
$$ \frac{V(\hat s, x(\hat s)) - V(s, y)}{\hat s - s} + \frac{1}{\hat s - s} \int_s^{\hat s} f(t, x(t), u)\,dt \ge 0, \quad \text{for any } u \in U. $$
Letting ŝ ↓ s, it follows that
$$ V_t + b(t, x, u) V_x + f(t, x, u) \ge 0, \quad \text{for any } u \in U, $$
which results in
$$ V_t + \inf_{u \in U} \{ b(t, x, u) V_x + f(t, x, u) \} \ge 0. $$
On the other hand, for any ε > 0 and 0 ≤ s < ŝ ≤ T with ŝ − s > 0 small enough, there exists a u(·) ≡ u^{ε,ŝ}(·) ∈ V[s, T] such that
$$ V(s, y) + \varepsilon(\hat s - s) \ge \int_s^{\hat s} f(t, x(t), u(t))\,dt + V(\hat s, x(\hat s)). $$
Consequently,
$$ \varepsilon \ge \frac{V(\hat s, x(\hat s)) - V(s, y)}{\hat s - s} + \frac{1}{\hat s - s} \int_s^{\hat s} f(t, x(t), u(t))\,dt = \frac{1}{\hat s - s} \int_s^{\hat s} \big[ V_t(t, x(t)) + b(t, x(t), u(t)) V_x(t, x(t)) + f(t, x(t), u(t)) \big]\,dt $$
$$ \ge \frac{1}{\hat s - s} \int_s^{\hat s} \Big[ V_t(t, x(t)) + \inf_{u \in U} \{ b(t, x(t), u) V_x(t, x(t)) + f(t, x(t), u) \} \Big]\,dt \to V_t(s, y) + \inf_{u \in U} \{ b(s, y, u) V_x(s, y) + f(s, y, u) \}, \quad \text{as } \hat s \downarrow s. $$
In the last limit above, we have used the uniform continuity of the functions b and f, as assumed. Since ε > 0 is arbitrary, this yields $V_t + \inf_{u \in U} \{ b(t, x, u) V_x + f(t, x, u) \} \le 0$, which is the desired reverse inequality. Combining the two inequalities, we arrive at the (first-order) HJB equation
$$ V_t + \inf_{u \in U} \{ b(t, x, u) V_x + f(t, x, u) \} = 0, \qquad V(T, x) = h(x). $$
If the infimum in the HJB equation is achieved at u = ū(t, x), we can substitute this feedback into (3.3) to get x̄(t), and set ū(t) = ū(t, x̄(t)). Then (x̄(·), ū(·)) is an optimal pair; this is the verification technique.
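The verification technique can be seen at work in a concrete linear-quadratic example of our own (not taken from the text): minimize $\int_0^T (x^2 + u^2)\,dt + x(T)^2$ subject to $\dot x = u$. The HJB equation $V_t + \min_u \{ u V_x + x^2 + u^2 \} = 0$ is minimized at $u^* = -V_x/2$, and the ansatz $V(t,x) = p(t)x^2$ reduces it to the Riccati equation $p' = p^2 - 1$, $p(T) = 1$, whose solution is $p \equiv 1$; thus $V(t,x) = x^2$ and the optimal feedback is $u^*(t,x) = -x$. A minimal numerical sketch:

```python
# Illustrative LQ problem: minimize ∫₀ᵀ (x² + u²) dt + x(T)², with ẋ = u.
# HJB minimizer: u* = -V_x/2; ansatz V(t,x) = p(t)x² gives the Riccati ODE
# p' = p² - 1, p(T) = 1, whose solution is p ≡ 1, so V = x² and u*(t,x) = -x.

T, n = 1.0, 1000
dt = T / n

# Integrate the Riccati ODE backward in time from p(T) = 1 (explicit Euler).
p = 1.0
for _ in range(n):
    p -= dt * (p * p - 1.0)      # p(t - dt) ≈ p(t) - dt * p'(t)

# Verification: simulate the closed loop ẋ = -x from x₀ and accumulate the
# cost; it should reproduce V(0, x₀) = x₀².
x0 = 0.7
x, cost = x0, 0.0
for _ in range(n):
    u = -x                       # feedback from the HJB minimizer
    cost += (x * x + u * u) * dt
    x += u * dt
cost += x * x                    # terminal cost h(x(T)) = x(T)²
```

Up to the O(dt) Euler error, `cost` agrees with $x_0^2 = V(0, x_0)$, exactly as the verification argument predicts.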
3.1.2
Stochastic control
We now turn to the stochastic case. On a filtered probability space carrying a standard Brownian motion W(·), consider the controlled stochastic differential equation
$$ dx(t) = b(t, x(t), u(t))\,dt + \sigma(t, x(t), u(t))\,dW(t), \quad t \in [0, T], \qquad x(0) = x_0, \tag{3.5} $$
with the cost functional
$$ J(u(\cdot)) = E\left[ \int_0^T f(t, x(t), u(t))\,dt + h(x(T)) \right]. \tag{3.6} $$
Define
U[0, T] = {u(·) : u(·) is measurable in [0, T] and {F_t}_{t≥0}-adapted}.
The optimal stochastic control problem is stated as follows:
Minimize (3.6) over U[0, T ].
Let (s, y) ∈ [0, T) × R, and consider the following control system over [s, T]:
$$ dx(t) = b(t, x(t), u(t))\,dt + \sigma(t, x(t), u(t))\,dW(t), \quad t \in [s, T], \qquad x(s) = y, \tag{3.7} $$
with the cost functional
$$ J(s, y; u(\cdot)) = E\left[ \int_s^T f(t, x(t), u(t))\,dt + h(x(T)) \right]. $$
Define the value function by
$$ V(s, y) = \inf_{u(\cdot) \in \mathcal{U}[s,T]} J(s, y; u(\cdot)), \quad \text{for any } (s, y) \in [0, T) \times R, \qquad V(T, y) = h(y). $$
As in the deterministic case, Bellman's principle of optimality holds: for any 0 ≤ s ≤ ŝ ≤ T,
$$ V(s, y) = \inf_{u(\cdot) \in \mathcal{U}[s,T]} E\left[ \int_s^{\hat s} f(t, x(t), u(t))\,dt + V(\hat s, x(\hat s)) \right], \tag{3.8} $$
and, when V is smooth, the associated HJB equation is now of second order:
$$ V_t + \inf_{u \in U} \left\{ \frac{1}{2} \sigma^2(t, x, u) V_{xx} + b(t, x, u) V_x + f(t, x, u) \right\} = 0, \qquad V(T, x) = h(x). \tag{3.9} $$
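The second-order HJB equation (3.9) can be solved numerically by marching backward in time with finite differences. The sketch below is again an example of our own: $dx = u\,dt + \sigma\,dW$ with cost $E[\int_0^T (x^2 + u^2)\,dt + x(T)^2]$. Here the infimum in $u$ is explicit, $\min_u \{ u V_x + u^2 \} = -V_x^2/4$ at $u^* = -V_x/2$, and the exact value function is $V(t,x) = x^2 + \sigma^2 (T - t)$, which the scheme should reproduce.

```python
import numpy as np

# Explicit finite differences for the HJB equation (illustrative problem)
#   V_t + ½σ²V_xx - V_x²/4 + x² = 0,  V(T, x) = x²,
# whose exact solution is V(t, x) = x² + σ²(T - t).
sigma, T, L = 0.5, 1.0, 2.0
nx, nt = 81, 500
x = np.linspace(-L, L, nx)
dx, dt = x[1] - x[0], T / nt        # dt < dx²/σ² keeps the scheme stable

V = x ** 2                           # terminal condition V(T, x) = h(x)
for _ in range(nt):                  # march backward from t = T to t = 0
    Vx = (V[2:] - V[:-2]) / (2 * dx)
    Vxx = (V[2:] - 2 * V[1:-1] + V[:-2]) / dx ** 2
    # V(t - dt) = V(t) + dt * inf_u { ½σ²V_xx + uV_x + x² + u² }
    V[1:-1] += dt * (0.5 * sigma ** 2 * Vxx - Vx ** 2 / 4 + x[1:-1] ** 2)
    # quadratic extrapolation at the boundary (exact for quadratic V)
    V[0] = 3 * V[1] - 3 * V[2] + V[3]
    V[-1] = 3 * V[-2] - 3 * V[-3] + V[-4]

err = np.max(np.abs(V - (x ** 2 + sigma ** 2 * T)))   # error at t = 0
```

Because the exact solution is quadratic in x, the central differences and the boundary extrapolation are exact here and `err` is at round-off level. For problems whose value function is genuinely nonsmooth, monotone schemes are needed, and this is precisely where the notion of viscosity solutions below enters.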
3.2
Viscosity solutions
The value function is often not smooth. Thus, one needs to introduce the
notion of viscosity solutions to characterize the value function.
Definition 1 A function v ∈ C([0, T] × R) is called a viscosity subsolution (supersolution) of (3.9) if
$$ v(T, x) \le h(x), \quad \text{for any } x \in R \qquad (\text{resp. } v(T, x) \ge h(x)), $$
and for any φ ∈ C^{1,2}([0, T] × R), whenever v − φ attains a local maximum (minimum) at a point (t̄, x̄) ∈ [0, T) × R, we have
$$ \varphi_t(\bar t, \bar x) + \inf_{u \in U} \left\{ \frac{1}{2} \sigma^2(\bar t, \bar x, u) \varphi_{xx}(\bar t, \bar x) + b(\bar t, \bar x, u) \varphi_x(\bar t, \bar x) + f(\bar t, \bar x, u) \right\} \ge 0 \qquad (\text{resp. } \le 0). $$
A function v is called a viscosity solution of (3.9) if it is both a viscosity subsolution and a viscosity supersolution.
Let us now verify that the value function V is a viscosity solution of (3.9).
First, V is a viscosity supersolution. Let φ ∈ C^{1,2}([0, T] × R) and let V − φ attain a local minimum at (s, y) ∈ [0, T) × R; without loss of generality, V(s, y) = φ(s, y), so that V ≥ φ near (s, y). For any ε > 0 and ŝ > s with ŝ − s small, Bellman's principle (3.8) provides a u(·) ∈ U[s, T] such that
$$ \varepsilon(\hat s - s) + \varphi(s, y) = \varepsilon(\hat s - s) + V(s, y) \ge E\left[ \int_s^{\hat s} f(t, x(t), u(t))\,dt + V(\hat s, x(\hat s)) \right] \ge E\left[ \int_s^{\hat s} f(t, x(t), u(t))\,dt + \varphi(\hat s, x(\hat s)) \right]. $$
By Itô's formula,
$$ E[\varphi(\hat s, x(\hat s))] = \varphi(s, y) + E \int_s^{\hat s} \left[ \varphi_t(t, x(t)) + \frac{1}{2} \sigma^2(t, x(t), u(t)) \varphi_{xx}(t, x(t)) + b(t, x(t), u(t)) \varphi_x(t, x(t)) \right] dt, $$
so that
$$ \varepsilon \ge E\left[ \frac{1}{\hat s - s} \int_s^{\hat s} \left[ \varphi_t + \frac{1}{2} \sigma^2 \varphi_{xx} + b \varphi_x + f \right](t, x(t), u(t))\,dt \right] \ge E\left[ \frac{1}{\hat s - s} \int_s^{\hat s} \left[ \varphi_t(t, x(t)) + \inf_{u \in U} \left\{ \frac{1}{2} \sigma^2(t, x(t), u) \varphi_{xx}(t, x(t)) + b(t, x(t), u) \varphi_x(t, x(t)) + f(t, x(t), u) \right\} \right] dt \right]. $$
Letting ŝ ↓ s and then ε ↓ 0 (using the continuity of the coefficients), we obtain
$$ \varphi_t(s, y) + \inf_{u \in U} \left\{ \frac{1}{2} \sigma^2(s, y, u) \varphi_{xx}(s, y) + b(s, y, u) \varphi_x(s, y) + f(s, y, u) \right\} \le 0, $$
which is the supersolution inequality.
Next, V is a viscosity subsolution. Let V − φ attain a local maximum at (s, y), with V(s, y) = φ(s, y), so that V ≤ φ near (s, y). Fix u ∈ U and take the constant control u(t) ≡ u on [s, ŝ]. By (3.8),
$$ \varphi(s, y) = V(s, y) \le E\left[ \int_s^{\hat s} f(t, x(t), u)\,dt + V(\hat s, x(\hat s)) \right] \le E\left[ \int_s^{\hat s} f(t, x(t), u)\,dt + \varphi(\hat s, x(\hat s)) \right]. $$
Applying Itô's formula as above and dividing by ŝ − s,
$$ 0 \le E\left[ \frac{1}{\hat s - s} \int_s^{\hat s} \left[ \varphi_t(t, x(t)) + \frac{1}{2} \sigma^2(t, x(t), u) \varphi_{xx}(t, x(t)) + b(t, x(t), u) \varphi_x(t, x(t)) + f(t, x(t), u) \right] dt \right] \to \varphi_t(s, y) + \frac{1}{2} \sigma^2(s, y, u) \varphi_{xx}(s, y) + b(s, y, u) \varphi_x(s, y) + f(s, y, u), \quad \text{as } \hat s \downarrow s. $$
Since u ∈ U is arbitrary, taking the infimum over u yields
$$ \varphi_t(s, y) + \inf_{u \in U} \left\{ \frac{1}{2} \sigma^2(s, y, u) \varphi_{xx}(s, y) + b(s, y, u) \varphi_x(s, y) + f(s, y, u) \right\} \ge 0, $$
which is the subsolution inequality. Together with V(T, y) = h(y), this shows that V is a viscosity solution of (3.9).