http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
16.323 Lecture 4
HJB Equation
DP in continuous time
HJB Equation
Continuous LQR
$$\frac{\partial}{\partial u}\left(u^T R u\right) = 2u^T R, \qquad \frac{\partial}{\partial u}\left(R u\right) = R$$
Spr 2008
DP in Continuous Time
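subject to
$$\begin{aligned}
\dot{x} &= a(x, u, t) \\
x(t_0) &= \text{given} \\
m(x(t_f), t_f) &= 0 \quad \text{(set of terminal conditions)} \\
u(t) &\in \mathcal{U} \quad \text{(set of possible constraints)}
\end{aligned}$$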
First step: consider the cost over the interval $[t, t_f]$, where $t \le t_f$, of any control sequence $u(\tau)$, $t \le \tau \le t_f$:
$$J(x(t), t, u(\tau)) = h(x(t_f), t_f) + \int_t^{t_f} g(x(\tau), u(\tau), \tau)\, d\tau$$
Approach:
Split the time interval $[t, t_f]$ into $[t, t+\Delta t]$ and $[t+\Delta t, t_f]$; we are specifically interested in the case where $\Delta t \to 0$.
Identify the optimal cost-to-go $J^\star(x(t+\Delta t), t+\Delta t)$.
Determine the stage cost in time $[t, t+\Delta t]$.
Combine the above to find the best strategy from time $t$.
Manipulate the result into the HJB equation.
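Split:
$$\begin{aligned}
J^\star(x(t),t) &= \min_{\substack{u(\tau)\in\mathcal{U} \\ t\le\tau\le t_f}} \left\{ h(x(t_f),t_f) + \int_t^{t_f} g(x(\tau),u(\tau),\tau)\,d\tau \right\} \\
&= \min_{\substack{u(\tau)\in\mathcal{U} \\ t\le\tau\le t_f}} \left\{ h(x(t_f),t_f) + \int_t^{t+\Delta t} g(x,u,\tau)\,d\tau + \int_{t+\Delta t}^{t_f} g(x,u,\tau)\,d\tau \right\}
\end{aligned}$$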
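Implicit here is that at time $t+\Delta t$ the system will be at state $x(t+\Delta t)$.
But from the principle of optimality, we can write that the optimal cost-to-go from this state is:
$$J^\star(x(t+\Delta t), t+\Delta t)$$
Thus the cost calculation can be rewritten as:
$$J^\star(x(t),t) = \min_{\substack{u(\tau)\in\mathcal{U} \\ t\le\tau\le t+\Delta t}} \left\{ \int_t^{t+\Delta t} g(x,u,\tau)\,d\tau + J^\star(x(t+\Delta t), t+\Delta t) \right\}$$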
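The recursion above can be tried numerically before taking the limit $\Delta t \to 0$. The sketch below is only an illustration under stated assumptions: it uses the scalar linear-quadratic problem treated later in these notes ($\dot{x} = ax + bu$, stage cost $\tfrac12(R_{xx}x^2 + R_{uu}u^2)$), assumes a quadratic cost-to-go $J^\star = \tfrac12 P(t)x^2$, and borrows the numbers from the scalar Matlab listing. The function name `dp_backward_sweep` is hypothetical.

```python
import math

def dp_backward_sweep(a, b, rxx, ruu, p_tf, t_f, dt):
    """Return P(0) from J*(x,t) = min_u [ g*dt + J*(x + xdot*dt, t+dt) ].

    Assumes J*(x,t) = 0.5*P(t)*x**2, so each backward stage only updates P.
    """
    p = p_tf
    for _ in range(int(round(t_f / dt))):
        # Stage problem: minimise over u
        #   0.5*(rxx*x^2 + ruu*u^2)*dt + 0.5*p*(x + (a*x + b*u)*dt)^2
        # Setting the derivative with respect to u to zero gives u = -k*x with
        k = p * b * (1.0 + a * dt) / (ruu + p * b * b * dt)
        growth = 1.0 + (a - b * k) * dt          # x(t+dt) = growth * x(t)
        # Cost-to-go at t = stage cost + optimal cost-to-go at t+dt
        p = (rxx + ruu * k * k) * dt + p * growth * growth
    return p

# Values borrowed from the scalar Matlab listing in these notes
a, b, rxx, ruu, p_tf, t_f = 3.0, 11.0, 7.0, 4.0, 13.0, 2.0
p0_coarse = dp_backward_sweep(a, b, rxx, ruu, p_tf, t_f, dt=1e-2)
p0_fine = dp_backward_sweep(a, b, rxx, ruu, p_tf, t_f, dt=1e-4)
beta = math.sqrt(a * a + b * b * rxx / ruu)
p_ss = rxx / (beta - a)   # steady-state value derived later in the notes
print(p0_coarse, p0_fine, p_ss)
```

As the step shrinks, the value at $t = 0$ should approach the steady-state Riccati solution, which is the behaviour the continuous-time limit is meant to capture.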
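Expanding in a Taylor series about $(x(t), t)$:
$$J^\star(x(t+\Delta t), t+\Delta t) \approx J^\star(x(t),t) + \frac{\partial J^\star}{\partial t}(x(t),t)\,\Delta t + \frac{\partial J^\star}{\partial x}(x(t),t)\,\big(x(t+\Delta t) - x(t)\big)$$
Which for small $\Delta t$ can be compactly written as:
$$J^\star(x(t+\Delta t), t+\Delta t) \approx J^\star(x(t),t) + J^\star_t(x(t),t)\,\Delta t + J^\star_x(x(t),t)\,a(x(t),u(t),t)\,\Delta t$$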
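The last two steps of the approach (combine, then manipulate) can be sketched as follows. This is a standard outline rather than a transcription of the notes: it assumes $J^\star$ is differentiable and approximates the stage cost by $\int_t^{t+\Delta t} g\,d\tau \approx g\,\Delta t$.

```latex
\begin{aligned}
J^\star(x(t),t) &\approx \min_{u(t)\in\mathcal{U}} \Big\{ g(x,u,t)\,\Delta t + J^\star(x(t),t)
    + J^\star_t\,\Delta t + J^\star_x\,a(x,u,t)\,\Delta t \Big\} \\
0 &= J^\star_t(x(t),t) + \min_{u(t)\in\mathcal{U}} \Big\{ g(x,u,t) + J^\star_x(x(t),t)\,a(x,u,t) \Big\}
    \qquad (\text{cancel } J^\star,\ \text{divide by } \Delta t,\ \Delta t \to 0) \\
-J^\star_t(x(t),t) &= \min_{u(t)\in\mathcal{U}} H\big(x(t),u(t),J^\star_x(x(t),t),t\big),
    \qquad H = g + J^\star_x\,a \\
J^\star(x(t_f),t_f) &= h(x(t_f),t_f)
\end{aligned}
```

The third line is the Hamilton-Jacobi-Bellman equation and the fourth is its boundary condition.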
HJB Equation
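$$J = \int_0^{t_f} dt = t_f$$
$$J^\star_x = \frac{x^T}{\|x\|}$$
and $J^\star_t = 0$, which gives:
$$0 = 1 + \frac{x^T}{\|x\|}(Ax) - \frac{\|x\|}{\|x\|} = \frac{1}{\|x\|}\left(x^T A x\right) = \frac{1}{\|x\|}\,\frac{1}{2}\,x^T\left(A + A^T\right)x = 0$$
so that the HJB is satisfied and the optimal control is:
$$u^\star = -\frac{x}{\|x\|}$$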
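A quick numerical check of this example is sketched below. The setup is inferred, not stated in the surviving text: it assumes dynamics $\dot{x} = Ax + u$ with $\|u\| \le 1$, a skew-symmetric $A$ (so that $x^T(A + A^T)x = 0$), and the candidate cost-to-go $J^\star = \|x\|$. The matrix `A_SKEW` and the function name are hypothetical.

```python
import math

def hjb_residual(A, x):
    """Return 1 + J_x (A x + u) for J = ||x|| and u = -x/||x||."""
    n = len(x)
    norm = math.sqrt(sum(xi * xi for xi in x))
    jx = [xi / norm for xi in x]      # gradient of ||x||
    u = [-xi / norm for xi in x]      # candidate optimal control, ||u|| = 1
    ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return 1.0 + sum(jx[i] * (ax[i] + u[i]) for i in range(n))

# Hypothetical skew-symmetric matrix: A + A^T = 0
A_SKEW = [[0.0, 2.0], [-2.0, 0.0]]
test_states = ([1.0, 0.0], [0.3, -4.0], [-2.5, 1.5])
residuals = [hjb_residual(A_SKEW, x) for x in test_states]
print(residuals)
```

Each residual should vanish up to rounding, which is the statement that the HJB equation is satisfied at those states.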
June 18, 2008
Continuous LQR
$$J = \frac{1}{2}\,x(t_f)^T H x(t_f) + \frac{1}{2}\int_{t_0}^{t_f} \left[ x(t)^T R_{xx}(t)\,x(t) + u(t)^T R_{uu}(t)\,u(t) \right] dt$$
Assume that $t_f$ is fixed and there are no bounds on $u$.
Assume $H, R_{xx}(t) \ge 0$ and $R_{uu}(t) > 0$, then
$$H(x, u, J^\star_x, t) = \frac{1}{2}\left[ x(t)^T R_{xx}(t)\,x(t) + u(t)^T R_{uu}(t)\,u(t) \right] + J^\star_x(x(t),t)\left[ A(t)x(t) + B(t)u(t) \right]$$
Since
$$\frac{\partial^2 H}{\partial u^2} = R_{uu}(t) > 0$$
then this defines a global minimum.
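The stationary point that this second-derivative test refers to can be worked out from the Hamiltonian. A short sketch, assuming $R_{uu}(t)$ is symmetric so that $\partial(u^T R_{uu} u)/\partial u = 2u^T R_{uu}$:

```latex
\frac{\partial H}{\partial u} = u(t)^T R_{uu}(t) + J^\star_x(x(t),t)\,B(t) = 0
\quad\Longrightarrow\quad
u^\star(t) = -R_{uu}^{-1}(t)\,B(t)^T\,J^\star_x(x(t),t)^T
```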
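With the minimizing control $u^\star(t) = -R_{uu}^{-1}(t)B(t)^T J^\star_x(x(t),t)^T$, the Hamiltonian becomes
$$\begin{aligned}
H(x, u^\star, J^\star_x, t) = {} & \frac{1}{2}\left[ x(t)^T R_{xx}(t)\,x(t) + J^\star_x(x(t),t)\,B(t)R_{uu}^{-1}(t)R_{uu}(t)R_{uu}^{-1}(t)B(t)^T J^\star_x(x(t),t)^T \right] \\
& + J^\star_x(x(t),t)\left[ A(t)x(t) - B(t)R_{uu}^{-1}(t)B(t)^T J^\star_x(x(t),t)^T \right]
\end{aligned}$$
Assume a quadratic form for the cost-to-go,
$$J^\star(x(t),t) = \frac{1}{2}\,x^T(t)P(t)x(t), \qquad P(t) = P^T(t)$$
so that
$$\frac{\partial J^\star}{\partial x} = x^T(t)P(t), \qquad \frac{\partial J^\star}{\partial t} = \frac{1}{2}\,x^T(t)\dot{P}(t)x(t)$$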
6 See
7 Partial derivatives are taken with respect to one variable assuming the other is fixed. Note that there are 2 independent variables in this problem, x and t. x is time-varying, but it is not a function of t.
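Substituting the quadratic candidate into the HJB equation leads to a matrix differential equation for $P(t)$. The steps below are a sketch of that substitution, assuming the HJB form $-J^\star_t = \min_u H$ with boundary condition $J^\star(x(t_f),t_f) = \tfrac12 x^T(t_f) H x(t_f)$, and using $x^T P A x = \tfrac12 x^T (PA + A^T P) x$.

```latex
\begin{aligned}
H(x,u^\star,J^\star_x,t) &= \tfrac12 x^T R_{xx} x + J^\star_x A x
    - \tfrac12 J^\star_x B R_{uu}^{-1} B^T (J^\star_x)^T \\
&= \tfrac12 x^T \left[ R_{xx} + P A + A^T P - P B R_{uu}^{-1} B^T P \right] x
    \qquad (J^\star_x = x^T P) \\
-\tfrac12 x^T \dot{P}\, x &= \tfrac12 x^T \left[ P A + A^T P + R_{xx} - P B R_{uu}^{-1} B^T P \right] x
    \quad \text{for all } x \\
-\dot{P}(t) &= P(t)A(t) + A(t)^T P(t) + R_{xx}(t) - P(t)B(t)R_{uu}^{-1}(t)B(t)^T P(t),
    \qquad P(t_f) = H
\end{aligned}
```

The last line is the differential Riccati equation (DRE); its right-hand side is the quantity the Matlab listings compute as `dotP`.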
Key thing about this $J^\star$ solution is that, since $J^\star_x = x^T(t)P(t)$, then
$$u^\star(t) = -R_{uu}^{-1}(t)B(t)^T J^\star_x(x(t),t)^T = -R_{uu}^{-1}(t)B(t)^T P(t)x(t)$$
As before, can evaluate the performance of some arbitrary time-varying feedback gain $u = -G(t)x(t)$, and the result is that
$$J_G = \frac{1}{2}\,x^T S(t)\,x$$
where $S(t)$ satisfies a matrix differential equation with terminal condition $S(t_f) = H$.
Since this must be true for arbitrary $G$, we would expect that this reduces to the Riccati equation if $G(t) \equiv R_{uu}^{-1}(t)B^T(t)S(t)$.
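The comparison can be illustrated on the scalar example from these notes. The sketch assumes the standard closed-loop cost equation $-\dot{S} = (A - BG)^T S + S(A - BG) + R_{xx} + G^T R_{uu} G$, which is not shown in the surviving text, restricts $G$ to constant gains, and uses a hand-written RK4 integrator in $\tau = t_f - t$. All function names are hypothetical.

```python
import math

def rk4(f, y0, t_end, steps):
    """Fixed-step classical RK4 for a scalar ODE dy/dtau = f(y) on [0, t_end]."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y

# Scalar values borrowed from the Matlab listing; h_term is the terminal weight
a, b, rxx, ruu, h_term, t_f = 3.0, 11.0, 7.0, 4.0, 13.0, 2.0
steps = 20000

# Optimal cost matrix at t = 0: Riccati equation integrated in tau
p0 = rk4(lambda p: 2 * a * p + rxx - p * p * b * b / ruu, h_term, t_f, steps)

def cost_matrix_at_t0(g):
    """S(0) for the constant gain u = -g*x (assumed closed-loop cost equation)."""
    return rk4(lambda s: 2 * (a - b * g) * s + rxx + ruu * g * g, h_term, t_f, steps)

k_ss = (a + math.sqrt(a * a + b * b * rxx / ruu)) / b
s0 = {scale: cost_matrix_at_t0(scale * k_ss) for scale in (0.5, 1.0, 2.0)}
print(p0, s0)
```

Under these assumptions no constant gain does better than the Riccati solution, and the steady-state gain essentially matches it on this long horizon.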
LQR Observations
If a steady-state solution $P_{ss}$ to the DRE exists, then consider the closed-loop system using the static form of the feedback
$$u(t) = -R_{uu}^{-1}B^T P_{ss}\,x(t) = -F_{ss}\,x(t)$$
8 16.31
Notes on Controllability
9 16.31
Notes on Observability
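This simple system represents one of the few cases for which the differential Riccati equation can be solved analytically:
$$P(\tau) = \frac{(aP_{t_f} + R_{xx})\sinh(\beta\tau) + \beta P_{t_f}\cosh(\beta\tau)}{(b^2 P_{t_f}/R_{uu} - a)\sinh(\beta\tau) + \beta\cosh(\beta\tau)}$$
where $\tau = t_f - t$, $\beta = \sqrt{a^2 + b^2(R_{xx}/R_{uu})}$.
Note that for given $a$ and $b$, the ratio $R_{xx}/R_{uu}$ determines the time constant of the transient in $P(t)$ (determined by $\beta$).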
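The steady-state $P$ solves the CARE:
$$2aP_{ss} + R_{xx} - P_{ss}^2\,b^2/R_{uu} = 0$$
$$P_{ss} = \frac{a + \sqrt{a^2 + b^2R_{xx}/R_{uu}}}{b^2/R_{uu}} = \frac{a + \beta}{b^2/R_{uu}} = \frac{a + \beta}{b^2/R_{uu}}\cdot\frac{-a + \beta}{-a + \beta} = \frac{R_{xx}}{-a + \beta} > 0$$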
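With $P_{t_f} = 0$, the solution of the differential equation reduces to:
$$P(\tau) = \frac{R_{xx}\sinh(\beta\tau)}{(-a)\sinh(\beta\tau) + \beta\cosh(\beta\tau)}$$
and for large $\tau$
$$P(\tau) = \frac{R_{xx}\sinh(\beta\tau)}{(-a)\sinh(\beta\tau) + \beta\cosh(\beta\tau)} \to \frac{R_{xx}}{(-a) + \beta} = P_{ss}$$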
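The analytic solution is easy to evaluate directly. The sketch below mirrors the `Pan` expression in the Matlab listing; the function name `p_analytic` and the sample values of $\tau$ are illustrative choices, and the numbers are the ones used in that listing.

```python
import math

def p_analytic(tau, a, b, rxx, ruu, p_tf):
    """Closed-form scalar Riccati solution as a function of tau = t_f - t."""
    beta = math.sqrt(a * a + b * b * rxx / ruu)
    s, c = math.sinh(beta * tau), math.cosh(beta * tau)
    numerator = (a * p_tf + rxx) * s + beta * p_tf * c
    denominator = (b * b * p_tf / ruu - a) * s + beta * c
    return numerator / denominator

a, b, rxx, ruu, p_tf = 3.0, 11.0, 7.0, 4.0, 13.0
beta = math.sqrt(a * a + b * b * rxx / ruu)
p_ss = rxx / (beta - a)

for tau in (0.0, 0.01, 0.1, 0.5, 2.0):
    print(tau, p_analytic(tau, a, b, rxx, ruu, p_tf))
print("P_ss =", p_ss)
```

At $\tau = 0$ the function returns the terminal value $P_{t_f}$, and for $\tau$ of a few multiples of $1/\beta$ it settles at $P_{ss}$.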
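$$K_{ss} = R_{uu}^{-1}\,b\,P_{ss} = \frac{a + \sqrt{a^2 + b^2R_{xx}/R_{uu}}}{b}$$
The closed-loop dynamics are
$$\begin{aligned}
\dot{x} = (a - bK_{ss})\,x &= A_{cl}\,x(t) \\
&= \left( a - \frac{b}{b}\left( a + \sqrt{a^2 + b^2R_{xx}/R_{uu}} \right) \right) x \\
&= -\sqrt{a^2 + b^2R_{xx}/R_{uu}}\;x
\end{aligned}$$
which are clearly stable.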
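These steady-state expressions can be checked numerically. This is a minimal sketch using the scalar values from the Matlab listing; the function name and the returned tuple layout are assumptions made for illustration.

```python
import math

def scalar_lqr_steady_state(a, b, rxx, ruu):
    """Return (P_ss, K_ss, A_cl, CARE residual) for the scalar problem."""
    beta = math.sqrt(a * a + b * b * rxx / ruu)
    p_ss = (a + beta) / (b * b / ruu)     # positive root of the CARE
    k_ss = b * p_ss / ruu                 # K_ss = R_uu^{-1} b P_ss
    a_cl = a - b * k_ss                   # closed-loop pole
    care_residual = 2 * a * p_ss + rxx - p_ss * p_ss * b * b / ruu
    return p_ss, k_ss, a_cl, care_residual

a, b, rxx, ruu = 3.0, 11.0, 7.0, 4.0
p_ss, k_ss, a_cl, care_residual = scalar_lqr_steady_state(a, b, rxx, ruu)
beta = math.sqrt(a * a + b * b * rxx / ruu)
print(p_ss, k_ss, a_cl, care_residual)
```

The open-loop pole is at $a = 3$ (unstable), while the closed-loop pole should land at $-\beta$ regardless of the sign of $a$.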
Numerical P Integration
$$\mathrm{vec}(P) = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1n} & P_{22} & P_{23} & \cdots & P_{nn} \end{bmatrix}^T$$
The unvec(y) operation is the straightforward
Can now write the DRE as a differential equation in the variable $y$.
Note that with $\tau = t_f - t$, then $d\tau = -dt$; $t = t_f$ corresponds to $\tau = 0$, and $t = 0$ corresponds to $\tau = t_f$.
Can do the integration forward in the time variable $\tau$: $0 \to t_f$.
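Then define a Matlab function as
function doy = doty(tau,y)
global A B Rxx Ruu
P=unvec(y);                            % rebuild symmetric P from the stacked vector
dotP=P*A+A'*P+Rxx-P*B*Ruu^(-1)*B'*P;   % derivative with respect to tau, so no leading minus sign
doy=vec(dotP);                         % stack the result back into a vector
return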
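The change of variable can be sketched for the scalar case. The block below is not the notes' Matlab code: it is a hand-written fixed-step RK4 in Python, integrating forward in $\tau$ and then reversing the samples so they are indexed by $t$. Parameter values come from the scalar listing; the function name is hypothetical.

```python
import math

def integrate_dre_in_tau(a, b, rxx, ruu, p_tf, t_f, steps):
    """Integrate dP/dtau = 2aP + Rxx - P^2 b^2/Ruu from tau = 0 to t_f.

    Returns (times, p_of_t), both ordered by increasing t = t_f - tau.
    """
    def dp_dtau(p):
        return 2.0 * a * p + rxx - p * p * b * b / ruu

    h = t_f / steps
    taus, ps = [0.0], [p_tf]          # tau = 0 corresponds to t = t_f
    p = p_tf
    for i in range(steps):
        k1 = dp_dtau(p)
        k2 = dp_dtau(p + 0.5 * h * k1)
        k3 = dp_dtau(p + 0.5 * h * k2)
        k4 = dp_dtau(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        taus.append((i + 1) * h)
        ps.append(p)
    times = [t_f - tau for tau in reversed(taus)]   # map back to t
    return times, list(reversed(ps))

times, p_of_t = integrate_dre_in_tau(3.0, 11.0, 7.0, 4.0, 13.0, 2.0, 20000)
print(times[0], p_of_t[0], times[-1], p_of_t[-1])
```

The first sample is $P(t = 0)$, which should sit at the steady-state value, and the last sample is the terminal condition $P(t_f) = P_{t_f}$.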
Figure 4.2: Comparison showing response with much larger Rxx /Ruu
Figure 4.3: State response with high and low Ruu. State response with time-varying gain almost indistinguishable: highly dynamic part of x response ends before significant variation in P.
Numerical Calculation of P
global A B Rxx Ruu    % shared with doty
A=3;B=11;Rxx=7;Ptf=13;tf=2;dt=.0001;
Ruu=20^2;
Ruu=2^2;              % overrides the line above; comment out to use Ruu=20^2

% integrate the DRE backward in time from P(tf)=Ptf with a fixed step dt
time=[0:dt:tf];
P=zeros(1,length(time));K=zeros(1,length(time));Pcurr=Ptf;
for kk=0:length(time)-1
  P(length(time)-kk)=Pcurr;
  K(length(time)-kk)=inv(Ruu)*B'*Pcurr;
  Pdot=-Pcurr*A-A'*Pcurr-Rxx+Pcurr*B*inv(Ruu)*B'*Pcurr;
  Pcurr=Pcurr-dt*Pdot;
end
% same integration using ode45 in tau = tf - t
options=odeset('RelTol',1e-6,'AbsTol',1e-6)
[tau,y]=ode45(@doty,[0 tf],vec(Ptf),options);
Tnum=[];Pnum=[];Fnum=[];
for i=1:length(tau)
  Tnum(length(tau)-i+1)=tf-tau(i);
  temp=unvec(y(i,:));
  Pnum(length(tau)-i+1,:,:)=temp;
  Fnum(length(tau)-i+1,:)=-inv(Ruu)*B'*temp;
end
[klqr,Plqr]=lqr(A,B,Rxx,Ruu);

% Analytical prediction
beta=sqrt(A^2+Rxx/Ruu*B^2);
t=tf-time;
Pan=((A*Ptf+Rxx)*sinh(beta*t)+beta*Ptf*cosh(beta*t))./...
    ((B^2*Ptf/Ruu-A)*sinh(beta*t)+beta*cosh(beta*t));
Pan2=((A*Ptf+Rxx)*sinh(beta*(tf-Tnum))+beta*Ptf*cosh(beta*(tf-Tnum)))./...
    ((B^2*Ptf/Ruu-A)*sinh(beta*(tf-Tnum))+beta*cosh(beta*(tf-Tnum)));
figure(1);clf
else
end
figure(3);clf
else
end
% analytic P as a function handle; argument order matches the call below
Pan2=inline('((A*Ptf+Rxx)*sinh(beta*t)+beta*Ptf*cosh(beta*t))/((B^2*Ptf/Ruu-A)*sinh(beta*t)+beta*cosh(beta*t))',...
    'A','B','Ptf','Ruu','Rxx','beta','t');
x1=zeros(1,length(time));x2=zeros(1,length(time));
xcurr1=[1];xcurr2=[1];
for kk=1:length(time)-1
  x1(:,kk)=xcurr1; x2(:,kk)=xcurr2;
  % x1 uses the time-varying gain, x2 the steady-state gain
  xdot1=(A-B*Ruu^(-1)*B'*Pan2(A,B,Ptf,Ruu,Rxx,beta,tf-(kk-1)*dt))*x1(:,kk);
  xdot2=(A-B*klqr)*x2(:,kk);
  xcurr1=xcurr1+xdot1*dt;
  xcurr2=xcurr2+xdot2*dt;
end
x1(:,end)=xcurr1; x2(:,end)=xcurr2;   % store the final sample
figure(2);clf
plot(time,x2,'bs',time,x1,'r.');xlabel('time');ylabel('x')
title(['A = ',num2str(A),' B = ',num2str(B),' R_{xx} = ',num2str(Rxx),...
    ' R_{uu} = ',num2str(Ruu),' P_{tf} = ',num2str(Ptf)])
legend('K_{ss}','K_{analytic}','Location','NorthEast')
if Ruu > 10
  print -r300 -dpng reg2_11.png;
else
  print -r300 -dpng reg2_22.png;
end
function [doy]=doty(t,y);
global A B Rxx Ruu;
P=unvec(y);
% derivative with respect to tau = tf - t, so no leading minus sign
dotP=P*A+A'*P+Rxx-P*B*Ruu^(-1)*B'*P;
doy=vec(dotP);
return
function y=vec(P);
% stack the upper-triangular rows of symmetric P into a column vector
y=[];
for ii=1:length(P);
  y=[y;P(ii,ii:end)'];
end
return
function P=unvec(y);
% rebuild symmetric P from vec(P); N solves N*(N+1)/2 = length(y)
N=round(max(roots([1 1 -2*length(y)])));
P=[];kk=N;kk0=1;
for ii=1:N;
  P(ii,ii:N)=[y(kk0+[0:kk-1])];
  kk0=kk0+kk;
  kk=kk-1;
end
P=(P+P')-diag(diag(P));
return
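The packing convention used by `vec` and `unvec` can be mirrored in a few lines. The sketch below is a Python restatement for illustration, not part of the notes' Matlab code; it stores the matrix as nested lists and recovers $n$ from $n(n+1)/2 = \text{len}(y)$.

```python
import math

def vec(P):
    """Stack the upper-triangular rows of a symmetric matrix into a flat list."""
    n = len(P)
    return [P[i][j] for i in range(n) for j in range(i, n)]

def unvec(y):
    """Rebuild the symmetric matrix whose upper triangle was packed by vec."""
    n = (math.isqrt(8 * len(y) + 1) - 1) // 2     # solves n*(n+1)/2 = len(y)
    if n * (n + 1) // 2 != len(y):
        raise ValueError("length is not a triangular number")
    P = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            P[i][j] = P[j][i] = y[k]              # fill both triangles
            k += 1
    return P

P_example = [[1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]]
y_example = vec(P_example)
print(y_example, unvec(y_example))
```

Packing only the upper triangle cuts the number of integrated states from $n^2$ to $n(n+1)/2$ and keeps $P$ exactly symmetric.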
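$$\dot{x} = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u$$
$$2J = x^T(10) \begin{bmatrix} 0 & 0 \\ 0 & h \end{bmatrix} x(10) + \int_0^{10} \left\{ x^T(t) \begin{bmatrix} q & 0 \\ 0 & 0 \end{bmatrix} x(t) + r\,u^2(t) \right\} dt$$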
Compute gains using both time-varying P (t) and steady-state value.
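One way to carry out this computation without toolboxes is sketched below. It is an illustration, not the notes' code: it integrates the 2x2 DRE in $\tau$ with a fixed-step RK4, using $h = 4$, $q = 1$, $r = 3$ from the Matlab listing. The closed-form steady-state entries were derived by hand from the CARE for this particular $A$ and $B$, so treat them as an assumption to be checked against the integration.

```python
import math

A = [[0.0, 1.0], [0.0, 1.0]]               # system matrix from the example
B = [0.0, 1.0]                             # single input, stored as a column
H_TERM, Q, R, T_F = 4.0, 1.0, 3.0, 10.0    # h, q, r from the listing; t_f = 10 s

def riccati_rhs(P):
    """dP/dtau = P A + A^T P + Rxx - P B R^{-1} B^T P for symmetric 2x2 P."""
    pa = [[sum(P[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    pb = [sum(P[i][k] * B[k] for k in range(2)) for i in range(2)]
    rxx = [[Q, 0.0], [0.0, 0.0]]
    # A^T P equals (P A)^T because P is symmetric
    return [[pa[i][j] + pa[j][i] + rxx[i][j] - pb[i] * pb[j] / R
             for j in range(2)] for i in range(2)]

def axpy(P, s, K):
    """Return P + s*K elementwise."""
    return [[P[i][j] + s * K[i][j] for j in range(2)] for i in range(2)]

def gain(P):
    """K = R^{-1} B^T P, a 1x2 row."""
    return [sum(B[k] * P[k][j] for k in range(2)) / R for j in range(2)]

def solve_dre(steps=10000):
    """Return P(t=0), K(t=0) and K(t=t_f)."""
    h = T_F / steps
    P = [[0.0, 0.0], [0.0, H_TERM]]         # P(t_f), i.e. tau = 0
    k_final = gain(P)
    for _ in range(steps):
        k1 = riccati_rhs(P)
        k2 = riccati_rhs(axpy(P, 0.5 * h, k1))
        k3 = riccati_rhs(axpy(P, 0.5 * h, k2))
        k4 = riccati_rhs(axpy(P, h, k3))
        P = [[P[i][j] + h * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j]) / 6.0
              for j in range(2)] for i in range(2)]
    return P, gain(P), k_final

P0, K0, K_TF = solve_dre()

# Hand-derived steady-state solution of the CARE for this A, B (assumption)
p12 = math.sqrt(Q * R)
p22 = R + math.sqrt(R * R + 2.0 * R * p12)
p11 = p12 * p22 / R - p12
print(P0, [[p11, p12], [p12, p22]], K0, K_TF)
```

With a 10 s horizon the value at $t = 0$ should agree closely with the steady-state solution, while the gain at $t_f$ is set entirely by the terminal weight $h$.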
Figure 4.7: State response - Constant gain and time-varying gain almost indistinguishable because the transient dies out before the time at which the gains start to change; effectively a steady state problem.
For most applications, the static gains are more than adequate - it is only when the terminal conditions are important in a short-time horizon problem that the time-varying gains should be used.
Significant savings in implementation complexity & computation.
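The point about horizon length can be illustrated with the scalar example from earlier in these notes rather than the two-state system of the figure. The sketch below uses forward Euler, as in the Matlab listings, and the analytic $P(\tau)$ for the time-varying gain; the function names and the short horizon of 0.1 s are choices made here for illustration.

```python
import math

def p_analytic(tau, a, b, rxx, ruu, p_tf):
    """Closed-form scalar Riccati solution as a function of tau = t_f - t."""
    beta = math.sqrt(a * a + b * b * rxx / ruu)
    s, c = math.sinh(beta * tau), math.cosh(beta * tau)
    return ((a * p_tf + rxx) * s + beta * p_tf * c) / ((b * b * p_tf / ruu - a) * s + beta * c)

def max_response_gap(a, b, rxx, ruu, p_tf, t_f, dt=1e-4, x0=1.0):
    """Largest |x_timevarying - x_static| over the horizon (forward Euler)."""
    beta = math.sqrt(a * a + b * b * rxx / ruu)
    k_ss = (a + beta) / b                      # steady-state gain
    x_tv = x_static = x0
    gap = 0.0
    for i in range(int(round(t_f / dt))):
        k_t = b * p_analytic(t_f - i * dt, a, b, rxx, ruu, p_tf) / ruu
        x_tv += (a - b * k_t) * x_tv * dt
        x_static += (a - b * k_ss) * x_static * dt
        gap = max(gap, abs(x_tv - x_static))
    return gap

params = dict(a=3.0, b=11.0, rxx=7.0, ruu=4.0, p_tf=13.0)
gap_long = max_response_gap(t_f=2.0, **params)    # horizon much longer than the transient
gap_short = max_response_gap(t_f=0.1, **params)   # horizon comparable to the transient
print(gap_long, gap_short)
```

On the long horizon the two responses coincide to within rounding, while on the short horizon the terminal penalty shapes the gain throughout and the responses separate visibly.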
h=4;q=1;r=3;
Psi11=V(1:2,1:2);
Psi21=V(3:4,1:2);
Ptest=Psi21*inv(Psi11);
if 0
  % fixed-step backward integration of the DRE from P(tf)=Ptf
  time=[0:dt:tf];
  P=zeros(2,2,length(time));
  K=zeros(1,2,length(time));
  Pcurr=Ptf;
  for kk=0:length(time)-1
    P(:,:,length(time)-kk)=Pcurr;
    K(:,:,length(time)-kk)=inv(Ruu)*B'*Pcurr;
    Pdot=-Pcurr*A-A'*Pcurr-Rxx+Pcurr*B*inv(Ruu)*B'*Pcurr;
    Pcurr=Pcurr-dt*Pdot;
  end
else
  % ode45 integration in tau = tf - t, then reverse the samples into t
  options=odeset('RelTol',1e-6,'AbsTol',1e-6)
  [tau,y]=ode45(@doty,[0 tf],vec(Ptf),options);
  Tnum=[];Pnum=[];Fnum=[];
  for i=1:length(tau)
    time(length(tau)-i+1)=tf-tau(i);
    temp=unvec(y(i,:));
    P(:,:,length(tau)-i+1)=temp;
    K(:,:,length(tau)-i+1)=inv(Ruu)*B'*temp;
  end
end % if 0
[klqr,Plqr]=lqr(A,B,Rxx,Ruu);

% simulate with the time-varying gain (x1) and the static gain (x2)
x1=zeros(2,1,length(time));
x2=zeros(2,1,length(time));
xcurr1=[1 1]';
xcurr2=[1 1]';
for kk=1:length(time)-1
  dt=time(kk+1)-time(kk);
  x1(:,:,kk)=xcurr1;
  x2(:,:,kk)=xcurr2;
  xdot1=(A-B*K(:,:,kk))*x1(:,:,kk);
  xdot2=(A-B*klqr)*x2(:,:,kk);
  xcurr1=xcurr1+xdot1*dt;
  xcurr2=xcurr2+xdot2*dt;
end
x1(:,:,length(time))=xcurr1;
x2(:,:,length(time))=xcurr2;
figure(5);clf
subplot(221)
legend('K_1(t)','K_1')
xlabel('Time (sec)');ylabel('Gains')
subplot(222)
legend('K_2(t)','K_2')
xlabel('Time (sec)');ylabel('Gains')
subplot(223)
plot(time,squeeze(x1(1,1,:)),time,squeeze(x1(2,1,:)),'m--','LineWidth',2),
legend('x_1','x_2')
subplot(224)
plot(time,squeeze(x2(1,1,:)),time,squeeze(x2(2,1,:)),'m--','LineWidth',2),
legend('x_1','x_2')
figure(6);clf
subplot(221)
legend('P(t)(1,1)','P_{lqr}(1,1)','Location','SouthWest')
xlabel('Time (sec)');ylabel('P')
subplot(222)
legend('P(t)(1,2)','P_{lqr}(1,2)','Location','SouthWest')
xlabel('Time (sec)');ylabel('P')
subplot(223)
plot(time,squeeze(P(2,1,:)),[0 10],[1 1]*squeeze(Plqr(2,1)),'m--','LineWidth',2),
legend('P(t)(2,1)','P_{lqr}(2,1)','Location','SouthWest')
xlabel('Time (sec)');ylabel('P')
subplot(224)
plot(time,squeeze(P(2,2,:)),[0 10],[1 1]*squeeze(Plqr(2,2)),'m--','LineWidth',2),
legend('P(t)(2,2)','P_{lqr}(2,2)','Location','SouthWest')
xlabel('Time (sec)');ylabel('P')
axis([0 10 0 8])
if jprint;
  print -dpng -r300 reg1_6.png
end
figure(1);clf
plot(time,squeeze(K(1,1,:)),[0 10],[1 1]*klqr(1),'r--','LineWidth',3)
legend('K_1(t)(1,1)','K_1(1,1)','Location','SouthWest')
xlabel('Time (sec)');ylabel('Gains')
title(['q = ',num2str(q),' r = ',num2str(r),' h = ',num2str(h)])
print -dpng -r300 reg1_1.png
figure(2);clf
plot(time,squeeze(K(1,2,:)),[0 10],[1 1]*klqr(2),'r--','LineWidth',3)
legend('K_2(t)(1,2)','K_2(1,2)','Location','SouthWest')
xlabel('Time (sec)');ylabel('Gains')
if jprint;
  print -dpng -r300 reg1_2.png
end
figure(3);clf
plot(time,squeeze(x1(1,1,:)),time,squeeze(x1(2,1,:)),'r--','LineWidth',3),
legend('x_1','x_2')
xlabel('Time (sec)');ylabel('States');title('Dynamic Gains')
if jprint;
  print -dpng -r300 reg1_3.png
end
figure(4);clf
plot(time,squeeze(x2(1,1,:)),time,squeeze(x2(2,1,:)),'r--','LineWidth',3),
legend('x_1','x_2')
xlabel('Time (sec)');ylabel('States');title('Static Gains');
if jprint;
  print -dpng -r300 reg1_4.png
end
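A good rule of thumb when selecting the weighting matrices $R_{xx}$ and $R_{uu}$ is to normalize the signals:
$$R_{xx} = \begin{bmatrix} \dfrac{\alpha_1^2}{(x_1)^2_{\max}} & & & \\ & \dfrac{\alpha_2^2}{(x_2)^2_{\max}} & & \\ & & \ddots & \\ & & & \dfrac{\alpha_n^2}{(x_n)^2_{\max}} \end{bmatrix}$$
$$R_{uu} = \begin{bmatrix} \dfrac{\beta_1^2}{(u_1)^2_{\max}} & & & \\ & \dfrac{\beta_2^2}{(u_2)^2_{\max}} & & \\ & & \ddots & \\ & & & \dfrac{\beta_m^2}{(u_m)^2_{\max}} \end{bmatrix}$$
The $\sum_i \alpha_i^2 = 1$ and $\sum_i \beta_i^2 = 1$ are used to add an additional relative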