Fig. 4. Decomposition of a multistage problem (information sets η_L(t) and η_F(t): each player observes the state history x(0), ..., x(t) together with his own past controls up to time t−1).

i.e., each player has perfect memory of the state (x) and his own control history. The point here is that given x(t), u_L(t−1), and x(t−1), we can in general calculate u_F(t−1), or vice versa [4]. As shown in Fig. 4, the leader at time t can choose his decision u_L(t) based on u_F(t−1) (or on the entire past decisions of the follower); thus L essentially imposes a kind of reversed information structure on F. Note that whoever gets to declare his strategy first becomes the leader. This approach requires separate treatments at t = T−1 and t = 0, by solving for u_L(T−1) first, which has a permanently optimal solution, and by considering u_F(−1) to be fixed at zero, as is evident in the works of Basar and Tolwinski. Also, bear in mind that the distinction between closed-loop Stackelberg controls and Stackelberg feedback strategies [1] still exists. With this understanding, the closed-loop Stackelberg strategy for the linear quadratic deterministic problem can be solved using the basic idea discussed in Section III. It is clear that many strategies γ_L are possible due to the enormous flexibility here. For example, L's strategy may punish a nonrational behavior of F for one stage only (as in [4]), for two stages, etc., or for the rest of the game (as in [2]). It is thus possible that different γ_L may enjoy various advantages.

VIII. CONCLUSION

In this paper we have identified two reasons why L may be interested in the decision of F. First, knowing F's strategy and his decision may enable L to infer the states of nature [R1)]. Second, F's decision may directly affect L's payoff [R2)]. We then discussed mechanisms by which L can induce F to behave cooperatively. In case R1) the mechanism is to transform F's payoff function so that it looks like L's own. In case R2) it is to make any choice of F's strategy other than the cooperative one directly unpalatable. In either case the crucial requirement is that we have the reversed information structure as defined in Section III. It serves as a unifying ingredient in diverse applications.

ACKNOWLEDGMENT

The authors would like to thank Dr. G. J. Olsder for many insightful comments on this subject.
REFERENCES

[1] M. Simaan and J. B. Cruz, Jr., "Additional aspects of the Stackelberg strategy in nonzero-sum games," J. Optimiz. Theory Appl., vol. 11, pp. 613-626, June 1973.
[2] T. Basar and H. Selbuz, "Closed-loop Stackelberg strategies with applications in the optimal control of multilevel systems," IEEE Trans. Automat. Contr., vol. AC-24, pp. 166-179, Apr. 1979.
[3] G. P. Papavassilopoulos and J. B. Cruz, Jr., "Nonclassical control problems and Stackelberg games," IEEE Trans. Automat. Contr., vol. AC-24, pp. 155-166, Apr. 1979.
[4] B. Tolwinski, "Closed-loop Stackelberg solution to multi-stage linear-quadratic game," J. Optimiz. Theory Appl., to be published.
[5] M. Spence, "Nonlinear pricing and welfare," J. Public Econ., vol. 8, pp. 1-18, Aug. 1977.
[6] T. Groves and M. Loeb, "Incentives in a divisionalized firm," Management Sci., vol. 25, pp. 221-230, Mar. 1979.
[7] H. von Stackelberg, The Theory of the Market Economy. Oxford, England: Oxford Univ. Press, 1952.
[8] C.-I. Chen and J. B. Cruz, Jr., "Stackelberg solution for two-person games with biased information patterns," IEEE Trans. Automat. Contr., vol. AC-17, pp. 791-798, Dec. 1972.
[9] M. Simaan and J. B. Cruz, Jr., "On the Stackelberg strategy in nonzero-sum games," J. Optimiz. Theory Appl., vol. 11, pp. 533-555, May 1973.
[10] T. Basar, "Information structures and equilibria in dynamic games," in New Trends in Dynamic System Theory and Economics, M. Aoki and A. Marzollo, Eds. New York: Academic, 1978.
[11] T. Basar and G. J. Olsder, "Team-optimal closed-loop Stackelberg strategies in hierarchical control problems," Memo. NR 242, Dep. Appl. Math., Twente Univ. of Technol., The Netherlands, Feb. 1979, pp. 1-25.
[12] Y.-C. Ho and K.-C. Chu, "Team decision theory and information structures in optimal control problems, Part I," IEEE Trans. Automat. Contr., vol. AC-17, pp. 15-22, Feb. 1972.
[13] Y.-C. Ho, D. M. Chiu, and R. Muralidharan, "A model for optimal peak load pricing by electric utilities," in Proc. Lawrence Symp. on Syst. and Decision Sci., Berkeley, CA, Oct. 1977, pp. 16-23.
[14] I. Pressman, "A mathematical formulation of the peak-load pricing problem," Bell J. Econ. Management Sci., vol. 1, pp. 304-326, Autumn 1970.
[15] P. Dasgupta, P. Hammond, and E. Maskin, "The implementation of social choice rules: Some general results on incentive compatibility," Rev. Econ. Studies, vol. 46, pp. 185-216, Apr. 1979.
[16] T. Basar, "Hierarchical decisionmaking under uncertainty," in Dynamic Optimization and Mathematical Economics, P. T. Liu, Ed. New York: Plenum, 1980.
[17] G. P. Papavassilopoulos and J. B. Cruz, Jr., "Sufficient conditions for Stackelberg and Nash strategies with memory," J. Optimiz. Theory Appl., Sept. 1980.
[18] Y.-C. Ho, P. B. Luh, and G. J. Olsder, "A control theoretic view on incentives," in Proc. 4th Int. Conf. on Analysis and Optimiz. of Syst., INRIA, Versailles, France, Dec. 1980; also in Springer-Verlag Lecture Notes Series, to be published.


A New Computational Method for Stackelberg and Min-Max Problems by Use of a Penalty Method

KIYOTAKA SHIMIZU AND EITARO AIYOSHI

Manuscript received September 4, 1979; revised February 18, 1980 and June 30, 1980. Paper recommended by A. J. Laub, Chairman of the Computational Methods and Discrete Systems Committee. The authors are with the School of Engineering, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan.

Abstract—This paper is concerned with the Stackelberg problem and the min-max problem in competitive systems. The Stackelberg approach is applied to the optimization of two-level systems where the higher level determines the optimal value of its decision variables (parameters for the lower level) so as to minimize its objective, while the lower level minimizes its own objective with respect to the lower level decision variables under the given parameters. Meanwhile, the min-max problem is to determine a min-max solution such that a function maximized with respect to the maximizer's variables is minimized with respect to the minimizer's variables. This problem is also characterized by a parametric approach in a two-level scheme. New computational methods are proposed here; that is, a series of nonlinear programming problems approximating the original two-level problem, obtained by application of a penalty method to a constrained parametric problem in the lower level, are solved iteratively. It is proved that a sequence of approximated solutions converges to the correct Stackelberg solution, or the min-max solution. Some numerical examples are presented to illustrate the algorithms.

I. INTRODUCTION

This paper is concerned with the Stackelberg problem and the min-max problem in competitive systems.

The Stackelberg solution [14], [12], [13], [8] is the most rational one to answer the question: what is the best strategy for Player 1, who knows Player 2's objective function and has to choose his strategy first, while Player 2 chooses his strategy after announcement of Player 1's strategy? A problem in the field of competitive economics is one such problem.

The min-max problem [3], [2], [4], [9], [6] is formulated so that a function, maximized with respect to the maximizer's variables, is minimized with respect to the minimizer's variables. The min-max solution is optimal for the minimizer against the worst possible case that might be taken by the opponent (the maximizer). Thus, the min-max concept plays an important role in game theory.

Many articles on equilibrium solutions, such as the Nash solution and the saddle-point solution, have been published. However, the Stackelberg and the min-max solutions differ from them in that Player 2 makes his decision after Player 1 does, i.e., a precedence in decision-making order exists. Therefore, both the Stackelberg and the min-max problems can be represented by hierarchical optimization problems, in which part of the constraints of the upper level problem consists of a parameterized optimization problem in the lower level. Problems of this type can be solved in principle by a parametric approach; however, it is difficult to derive a practical computational method from that approach directly. Some computational methods for the min-max problem were proposed in [4], [6], [9], [11]: a gradient-type algorithm was discussed in [4], [6] and a relaxation method in [9], [11]. In [15], a penalty method was applied to calculate a saddle-point solution. Papers on the Stackelberg and the min-max solutions remain comparatively few, and most of them are limited to the separate-constraints case, in which the constraints for each player do not depend on the other player's strategies.

In this paper we consider the most general problems with unseparate constraints, and propose new computational methods based on the theory of a barrier method (an interior penalty method [1], [5]). Our method rests on solving iteratively a series of nonlinear programming problems that approximate the original hierarchical problem, obtained by applying the barrier method to the lower level problem. It is proved that a sequence of approximate solutions converges to the true solution of the original problem.


II. FORMULATION OF THE STACKELBERG PROBLEM

In a two-player game, Player 1 has all information about Player 2's objective function and constraints, while Player 2 knows nothing about Player 1 but the strategy announced by Player 1. Then, until Player 1 announces his own optimal strategy, Player 2 cannot solve his optimizing problem responding to Player 1's strategy. In such a situation, Player 1 has the leadership in playing the game and can decide the best strategy in anticipation of Player 2's act. The optimal solution for Player 1 with precedence in decision is called a Stackelberg solution with Player 1 as leader. The Stackelberg solution is the optimal strategy for the leader when the follower reacts by playing optimally.

Let x ∈ R^{n_1}, f_1 ∈ R, and g_1 ∈ R^{m_1} be the decision variable vector, the objective function, and the vector constraint function for Player 1, respectively, and let y ∈ R^{n_2}, f_2 ∈ R, and g_2 ∈ R^{m_2} be those for Player 2. The two functions f_1(x, y) and f_2(x, y) map R^{n_1} × R^{n_2} into the real line, and Player 1 wishes to minimize f_1 while Player 2 wishes to minimize f_2. Then Player 2's minimal solution ŷ(x) responding to a Player 1 strategy x satisfies the following relation:

    f_2(x, ŷ(x)) ≤ f_2(x, y)   for all y ∈ Y satisfying g_2(x, y) ≤ 0.   (1)

For such ŷ(x), if there exists x^{s1} such that

    f_1(x^{s1}, ŷ(x^{s1})) ≤ f_1(x, ŷ(x))   for all x ∈ X satisfying g_1(x, ŷ(x)) ≤ 0,   (2)

then x^{s1} is called the Stackelberg solution for Player 1 as a leader, and y^{s1} ≡ ŷ(x^{s1}) is an optimal solution for Player 2 in response to x^{s1}. In other words, the Stackelberg solution x^{s1} can be defined as a solution to the following two-level decision problem:

    min_x f_1(x, ŷ(x))   (3a)
    subject to x ∈ X   (3b)
    g_1(x, ŷ(x)) ≤ 0   (3c)
    f_2(x, ŷ(x)) = min_y f_2(x, y)   (3d)
    subject to y ∈ Y   (3e)
    g_2(x, y) ≤ 0   (3f)

where ŷ(x) denotes a parametric minimal solution to the lower level problem (3d)-(3f). Here we assume the existence of (x, ŷ(x)) satisfying (3b)-(3f). For simplicity of notation, we shall denote x^{s1} and y^{s1} by x^s and y^s, respectively, in the following sections.

III. APPROXIMATE TECHNIQUE USING A BARRIER METHOD FOR THE STACKELBERG SOLUTION

Our approach to the problem (3) begins with replacing the lower level problem by an unconstrained problem based on the theory of a barrier method. The parametric minimization problem (3d)-(3f) is transformed into an unconstrained parametric minimization problem with an augmented objective function P^r that combines the objective with the constraint functions. The parametric solution ŷ(x), which is regarded as a function of x, can then be approximated by an implicit function satisfying a stationarity condition for the resulting unconstrained lower level problem. A sequence of approximated problems is solved by use of appropriate nonlinear programming techniques.

Let the feasible region Y be given, with the vector function h_2 ∈ R^{q_2}, as Y = {y | h_2(y) ≤ 0}, and let

    S(x) = {y | h_2(y) ≤ 0, g_2(x, y) ≤ 0}.

We begin by imposing the following assumption.
a) int S(x), the interior of S(x), which is given by

    int S(x) = {y | h_2(y) < 0, g_2(x, y) < 0},

is not empty for any fixed x, and its closure is S(x).

Let us define the augmented objective function on X × int S(x), where x is any point in R^{n_1}, as

    P^r(x, y) = f_2(x, y) + r φ(g_2(x, y), h_2(y))

where r > 0 and φ is a continuous function defined on the negative domain such that

    φ(g_2(x, y), h_2(y)) ≥ 0   as y ∈ int S(x)
    φ(g_2(x, y), h_2(y)) → +∞   as y → ∂S(x).   (4)

Here ∂S(x) denotes the boundary of S(x).

We consider the unconstrained minimization problem with the augmented objective function

    min_y P^r(x, y)   (5)

in place of the constrained minimization problem

    min_y f_2(x, y)
    subject to y ∈ S(x).   (6)
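As a concrete illustration of replacing (6) by (5), the following sketch (not the authors' code) uses a classical SUMT-type inverse barrier as the function φ and, as assumed test data, the lower-level problem of Example 1 from Section V; the reaction ŷ^r(x) computed this way stays in int S(x) and approaches the constrained minimizer as r decreases.

```python
# A sketch (not the authors' code) of the lower-level barrier transformation (5),
# with an inverse (SUMT-type) barrier for phi and, as assumed test data, the
# lower-level problem of Example 1 in Section V:
#     min_y (x + 2y - 30)^2   s.t.  x + y - 20 <= 0,  0 <= y <= 20.
import numpy as np
from scipy.optimize import minimize_scalar

def f2(x, y):
    return (x + 2.0 * y - 30.0) ** 2

def phi(x, y):
    c = np.array([x + y - 20.0, -y, y - 20.0])     # constraint values, all < 0 inside int S(x)
    if np.any(c >= 0.0):
        return np.inf                               # phi blows up on the boundary of S(x)
    return float(np.sum(-1.0 / c))                  # inverse barrier, >= 0 in the interior

def P(x, y, r):
    return f2(x, y) + r * phi(x, y)                 # augmented objective P^r(x, y)

def y_r(x, r):
    """Solve the unconstrained problem (5) for fixed x (search restricted to int S(x))."""
    ub = min(20.0, 20.0 - x) - 1e-9
    res = minimize_scalar(lambda y: P(x, y, r), bounds=(1e-9, ub), method="bounded")
    return res.x

x = 10.0
for r in [10.0, 1.0, 0.1, 0.01]:
    print(f"r = {r:6.2f}   y^r(x) = {y_r(x, r):.4f}")    # approaches the constrained minimizer y = 10
```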

In order to apply the theory of a barrier method, we impose the following assumptions.
b) The functions f_2(x, y) and g_2(x, y) are continuous at any (x, y) ∈ R^{n_1} × R^{n_2}, and h_2(y) is continuous at any y ∈ R^{n_2}.
c) The set Y is compact. This implies that S(x) is also compact.
Then it is assured that the problems (5) and (6) have optimal solutions ŷ^r(x) ∈ int S(x) and ŷ(x) ∈ S(x), respectively, for any fixed x.

Let us consider a sequence of optimal solutions {ŷ^{r_k}(x)} for the problem (5) in response to a positive parameter sequence {r_k} strictly decreasing to zero. When the parameter x is fixed, it follows directly from the theory of a barrier method (an interior penalty function method [1], [5]) that any accumulation point of the sequence {ŷ^{r_k}(x)} is optimal for the problem (6). On the other hand, when the parameter x varies as a sequence {x^k}, the convergence of {ŷ^{r_k}(x^k)} is left unsettled. So we prepare the following lemma.

Lemma 1: Let {ŷ^{r_k}(x^k)} be a sequence of optimal solutions to the problem (5) in response to a sequence {x^k} ⊂ X converging to x̄ and a positive sequence {r_k} strictly decreasing to zero. If assumptions a)-c)

become differentiable, whose gradient and Jacobian at (x, y) can be given as follows, respectively:

    ∇f̂_1(x) = ∇_x f_1(x, y) + ∇_y f_1(x, y) ∇ŷ^r(x)
             = ∇_x f_1(x, y) − ∇_y f_1(x, y) [∇²_{yy} P^r(x, y)]^{-1} ∇²_{yx} P^r(x, y)   (25)

    ∇ĝ_1(x) = ∇_x g_1(x, y) − ∇_y g_1(x, y) [∇²_{yy} P^r(x, y)]^{-1} ∇²_{yx} P^r(x, y).   (26)

Then the problem (22) is equivalent to the following problem under the relation (23):

    min_x f_1(x, ŷ(x))   (27a)
    subject to h_1(x) ≤ 0   (27b)
    g_1(x, ŷ(x)) ≤ 0.   (27c)

From now on, let us try to solve the problem (27) instead of the problem (12). Note, however, that it is impossible to represent the function ŷ(x) in an explicit form. This makes it difficult to apply nonlinear programming directly to the problem (27). In most iterative methods of nonlinear programming, however, the data needed for the computation are the values of the gradients of the objective and constraint functions, together with their function values, at the current point. Let x^l be the current point. Then, in our case, the value y^l ≡ ŷ(x^l) coincides with the solution ŷ^r(x^l) to the problem (5) with x = x^l, which can be solved easily. By use of this value y^l, we can evaluate ∇f̂_1(x^l) and ∇ĝ_1(x^l) by (25) and (26), along with

    f̂_1(x^l) = f_1(x^l, y^l),   ĝ_1(x^l) = g_1(x^l, y^l).

Only after the preparation of these data can we apply an existing nonlinear programming method, such as Zoutendijk's feasible direction method [16], to the problem (27).
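The following numerical sketch (hypothetical data, not from the paper) illustrates formula (25) in the scalar case: the stationarity condition ∂P^r/∂y (x, ŷ^r(x)) = 0 is differentiated implicitly to obtain dŷ^r/dx = −[∂²P^r/∂y²]^{-1} ∂²P^r/∂y∂x, and the resulting total derivative of f_1(x, ŷ^r(x)) is checked against direct finite differencing of the composed map.

```python
# Numerical illustration (hypothetical data, not from the paper) of formula (25)
# in the scalar case: differentiate the stationarity condition of P^r in y to get
# dy^r/dx, then assemble the total derivative of f1(x, y^r(x)) and check it.
from scipy.optimize import minimize_scalar

r = 0.1

def f1(x, y):
    return x ** 2 + (y - 10.0) ** 2

def P(x, y):
    # lower-level augmented objective: f2 + r * (inverse barrier of g2 = x + y - 20 <= 0)
    return (x + 2.0 * y - 30.0) ** 2 + r * (-1.0 / (x + y - 20.0))

def y_r(x):
    ub = 20.0 - x - 1e-9
    res = minimize_scalar(lambda y: P(x, y), bounds=(1e-9, ub),
                          method="bounded", options={"xatol": 1e-10})
    return res.x

x = 8.0
y = y_r(x)
h = 1e-4

# second derivatives of P^r appearing in (25), by central differences
P_yy = (P(x, y + h) - 2.0 * P(x, y) + P(x, y - h)) / h ** 2
P_yx = (P(x + h, y + h) - P(x + h, y - h) - P(x - h, y + h) + P(x - h, y - h)) / (4.0 * h ** 2)

dy_dx = -P_yx / P_yy                                   # implicit function theorem, scalar form of (25)/(26)

f1_x = (f1(x + h, y) - f1(x - h, y)) / (2.0 * h)       # partial derivatives of f1
f1_y = (f1(x, y + h) - f1(x, y - h)) / (2.0 * h)
grad_formula = f1_x + f1_y * dy_dx                     # right-hand side of (25)

hh = 1e-3                                              # direct differencing of x -> f1(x, y^r(x))
grad_direct = (f1(x + hh, y_r(x + hh)) - f1(x - hh, y_r(x - hh))) / (2.0 * hh)

print("gradient from (25):", grad_formula, "   direct finite difference:", grad_direct)
```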

IV. AN APPLICATION TO THE MIN-MAX PROBLEM

We consider the min-max problem as a special case of the Stackelberg problem (3). Setting f_1 = −f_2 = f in the problem (3), we have the following min-max problem:

    min_x f(x, ŷ(x))   (28a)
    subject to x ∈ X   (28b)
    g_1(x, ŷ(x)) ≤ 0   (28c)
    f(x, ŷ(x)) = max_y f(x, y)   (28d)
    subject to y ∈ Y   (28e)
    g_2(x, y) ≤ 0   (28f)

where Y = {y | h_2(y) ≤ 0} and ŷ(x) is a parametric maximal solution to the lower level problem (28d)-(28f). The solution to the problem (28), which we call a min-max solution, is the best for Player 1 against the worst possible case that might be taken by Player 2 when Player 1 knows nothing about Player 2.

In the special case when the constraints on x and y are independent (separate constraints), the min-max problem is formulated simply as

    min_{x∈X} max_{y∈Y} f(x, y)

where X = {x | h_1(x) ≤ 0} and Y = {y | h_2(y) ≤ 0}.

Most studies on min-max problems have been limited to the separate type, not including the unseparate constraints (28c) and (28f). Danskin [3] presented the directional derivative of the maximal value function max_{y∈Y} f(x, y) = f(x, ŷ(x)). Using this, Bram [2] and Schmitendorf [10] presented optimality conditions for the min-max problem of the separate type in a form like the optimality conditions for usual nonlinear programming problems. As to computational methods, however, only a few methods have been proposed, based on a gradient method [4], [6] and a relaxation method [9], [11], and they are not applicable to min-max problems subject to unseparate constraints. So we apply the solution method presented for the Stackelberg problem to the min-max problem (28).

Let us define the augmented objective function for the lower level problem as

    P^r(x, y) = f(x, y) − r φ(g_2(x, y), h_2(y)),

and replace the constrained maximization problem in (28d)-(28f) with the unconstrained maximization problem

    max_y P^r(x, y).   (29)

Furthermore, since the objective function is common to both the upper and lower levels, this peculiarity makes it possible to transform the min-max problem (28) into the following problem:

    min_x P^r(x, ŷ^r(x))   (30a)
    subject to x ∈ X   (30b)
    g_1(x, ŷ^r(x)) ≤ 0   (30c)
    P^r(x, ŷ^r(x)) = max_y P^r(x, y).   (30d)

Notice that f(x, ŷ(x)) in (28a) is replaced with the maximal value function P^r(x, ŷ^r(x)) in (30a). This point distinguishes the Stackelberg problem from the min-max problem.

Here we impose the following assumptions, corresponding to b) and d) in Section III.
b') The functions f(x, y) and g_2(x, y) are continuous in (x, y) ∈ R^{n_1} × R^{n_2}, and h_2(y) is continuous in y ∈ R^{n_2}.
d') The function f(x, y) is strictly concave in y, and g_2(x, y) and h_2(y) are convex in y. Furthermore, φ(g_2, h_2) is an increasing convex function in (g_2, h_2).
Then the following lemma can be obtained.

Lemma 2: Let {ŷ^{r_k}(x^k)} be a sequence of optimal solutions to the problem (29) in response to a sequence {x^k} ⊂ X converging to x̄ and a positive sequence {r_k} strictly decreasing to zero. If assumptions a), b'), c), and d') are satisfied, then it holds that
i) lim_{k→∞} ŷ^{r_k}(x^k) = ŷ(x̄)
ii) lim_{k→∞} P^{r_k}(x^k, ŷ^{r_k}(x^k)) = f(x̄, ŷ(x̄)).

Proof:
i) Setting f_2 = −f in Corollary 1 in Section III, we have this result immediately.
ii) Suppose that P^{r_k}(x^k, ŷ^{r_k}(x^k)) does not converge to f(x̄, ŷ(x̄)); then there exist a positive number ε and a positive integer K_1 such that

    |P^{r_k}(x^k, ŷ^{r_k}(x^k)) − f(x̄, ŷ(x̄))| ≥ 2ε   for all k ≥ K_1.   (31)

Here, consider an open ball B(ŷ(x̄); δ) ⊂ R^{n_2} around ŷ(x̄); then the continuity of f at ŷ(x̄) implies the existence of a positive δ such that

    |f(x̄, y) − f(x̄, ŷ(x̄))| < ε   for all y ∈ B(ŷ(x̄); δ).

Assumption a) enables us to choose another point y'' such that y'' ∈ int S(x̄) ∩ B(ŷ(x̄); δ). For such y'', it holds that

    |f(x̄, y'') − f(x̄, ŷ(x̄))| < ε.   (32)

For the same reason as in the proof of Lemma 1, we have the existence of a sequence {y^k} ⊂ Y and a positive integer K_2 such that

    y^k ∈ S(x^k)   for all k ≥ K_2

and y^k → y''. Since y'' ∈ int S(x̄), we have the existence of a positive integer K_3 such that

    |P^{r_k}(x^k, y^k) − f(x̄, y'')| < ε   for all k ≥ K_3   (33)

in a manner similar to the proof of Lemma 1. Set K = max(K_1, K_2, K_3). Then, using (33), (32), and (31) in turn, we have the following relations for all k ≥ K:

    P^{r_k}(x^k, y^k) > f(x̄, y'') − ε > f(x̄, ŷ(x̄)) − 2ε ≥ P^{r_k}(x^k, ŷ^{r_k}(x^k)).

This contradicts the fact that ŷ^{r_k}(x^k) is a maximal solution to the problem (29) in response to x^k and r_k. Thus we can conclude that P^{r_k}(x^k, ŷ^{r_k}(x^k)) converges to f(x̄, ŷ(x̄)).

Let us impose the following assumption.
e') The functions f(x, y) and g_1(x, y) are continuous in (x, y).
Then the following theorem can be obtained.

Theorem 2: Let {x^{r_k}} be the sequence of optimal solutions generated from the problem (30) in response to a positive sequence {r_k} strictly decreasing to zero. If the assumptions a), b'), c), d'), e'), f), and g) are satisfied, then the sequence {x^{r_k}} has an accumulation point, any of which is optimal for the min-max problem (28).

Proof (Feasibility of the Accumulation Point): The feasibility can be proved in a way similar to Theorem 1.

(Optimality of the Accumulation Point): Suppose that x̄ does not solve the problem (28); then, in the same manner as in Theorem 1, we have the existence of x'' ∈ X such that

    g_1(x'', ŷ(x'')) ≤ 0

and

    f(x'', ŷ(x'')) < f(x̄, ŷ(x̄)).

Here, let

    f(x̄, ŷ(x̄)) − f(x'', ŷ(x'')) = 2ε   (35)

where ε > 0.

As x is fixed at x'', the theory of a barrier method yields that, as k → ∞,

    P^{r_k}(x'', ŷ^{r_k}(x'')) → f(x'', ŷ(x'')).

That is, there exists a positive integer K'' such that

    |P^{r_k}(x'', ŷ^{r_k}(x'')) − f(x'', ŷ(x''))| < ε   for all k ≥ K''.   (36)

On the other hand, ii) in Lemma 2 implies the existence of a positive integer K̄ such that

    |P^{r_k}(x^{r_k}, ŷ^{r_k}(x^{r_k})) − f(x̄, ŷ(x̄))| < ε   for all k ≥ K̄.   (37)

Set K = max(K'', K̄). Then, using (36), (35), and (37) in turn, we have the following relations for all k ≥ K:

    P^{r_k}(x'', ŷ^{r_k}(x'')) < f(x'', ŷ(x'')) + ε = f(x̄, ŷ(x̄)) − ε < P^{r_k}(x^{r_k}, ŷ^{r_k}(x^{r_k})).

Since (x'', ŷ^{r_k}(x'')) satisfies (30c) in a manner similar to the proof of Theorem 1, this relation contradicts the fact that {x^{r_k}} solves the problem (30).

The computational aspect of the problem (30) is similar to that of the problem (12) in Section III, so we omit the computational consideration for the min-max problem.
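To illustrate the transformation of (28) into (30), the following toy sketch (assumed data, not the authors' code) solves the inner maximization of P^r = f − rφ for each x and then minimizes the maximal value function P^r(x, ŷ^r(x)) over x; the lower-level constraint y ≤ x is unseparate, and the computed pair approaches the min-max solution x = 1 with reaction y = 1 as r decreases.

```python
# A toy sketch (assumed data, not the authors' code) of the transformation of the
# min-max problem (28) into (30): f(x,y) = (x-1)^2 - (y-x)^2 is strictly concave
# in y, the lower level has the unseparate constraint g2(x,y) = y - x <= 0 plus
# 0 <= y <= 2, and the upper level searches x in [0.05, 2] (x is kept away from 0
# so that int S(x) stays nonempty, as required by assumption a)).
import numpy as np
from scipy.optimize import minimize_scalar

def f(x, y):
    return (x - 1.0) ** 2 - (y - x) ** 2

def phi(x, y):
    c = np.array([y - x, -y, y - 2.0])
    return float(np.sum(-1.0 / c)) if np.all(c < 0.0) else np.inf

def P(x, y, r):
    return f(x, y) - r * phi(x, y)          # the barrier is subtracted for a maximization

def y_r(x, r):
    """Inner problem (29): maximize P^r(x, .) over int S(x)."""
    ub = min(x, 2.0) - 1e-9
    res = minimize_scalar(lambda y: -P(x, y, r), bounds=(1e-9, ub), method="bounded")
    return res.x

def outer(x, r):
    return P(x, y_r(x, r), r)                # maximal value function, objective (30a)

for r in [1.0, 0.1, 0.01, 0.001]:
    res = minimize_scalar(lambda x: outer(x, r), bounds=(0.05, 2.0), method="bounded")
    x_r = res.x
    print(f"r = {r:7.3f}   x^r = {x_r:.4f}   y^r(x^r) = {y_r(x_r, r):.4f}")
```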

V. SOME NUMERICAL EXAMPLES

In order to illustrate the convergence properties, we shall give some examples for the Stackelberg problem.

Example 1: As a Stackelberg problem of the unseparate type, we present the following simple problem:

    min_x x² + (ŷ(x) − 10)²
    subject to −x + ŷ(x) ≤ 0
    0 ≤ x ≤ 15   (34)
    (x + 2ŷ(x) − 30)² = min_y (x + 2y − 30)²
    subject to x + y − 20 ≤ 0
    0 ≤ y ≤ 20.

The Stackelberg solution of this problem is x^s = 10 and ŷ(x^s) = 10, which can be found by geometrical consideration. Applying the algorithm presented in Section III, with the penalty function φ of SUMT [5], we obtain the computational results shown in Table I. It is confirmed that (x^{r_k}, ŷ^{r_k}(x^{r_k})) gradually approaches the Stackelberg solution of the original problem as r_k decreases.

TABLE I

    r_k     x^{r_k}   ŷ^{r_k}(x^{r_k})   f_1(x^{r_k}, ŷ^{r_k}(x^{r_k}))   P^{r_k}(x^{r_k}, ŷ^{r_k}(x^{r_k}))
    200     8.430     8.425              73.55                            126.9
    50      8.992     8.988              81.88                            44.05
    10      9.414     9.413              88.96                            13.63
    1       9.730     9.729              94.74                            2.107
    0.1     9.873     9.872              97.49                            0.5589
    0.01    9.944     9.943              98.88                            0.1194

Example 2: As a Stackelberg problem of the separate type, we present the following problem with two-dimensional variables:

    min_x (x_1 − 30)² + (x_2 − 20)² − 20ŷ_1(x) + 20ŷ_2(x)
    subject to x_1 + 2x_2 ≥ 30
    x_1 + x_2 ≤ 25
    x_2 ≤ 15
    (x_1 − ŷ_1(x))² + (x_2 − ŷ_2(x))² = min_y (x_1 − y_1)² + (x_2 − y_2)²
    subject to 0 ≤ y_1 ≤ 10
    0 ≤ y_2 ≤ 10.

Our method exhibits its power here, since there exists no other appropriate way to calculate the Stackelberg solution even for the separate type. Table II illustrates the convergence property in this case too.

TABLE II

    r_k     x_1^{r_k}   x_2^{r_k}   ŷ_1^{r_k}(x^{r_k})   ŷ_2^{r_k}(x^{r_k})   f_1(x^{r_k}, ŷ^{r_k}(x^{r_k}))   P^{r_k}(x^{r_k}, ŷ^{r_k}(x^{r_k}))
    500     18.44       6.563       6.339                5.173                291.0                            564.0
    200     19.03       5.968       7.288                5.228                276.0                            319.8
    50      20.00       5.000       8.553                4.976                253.4                            191.4
    10      20.00       5.000       9.326                4.821                234.9                            133.9
    1       20.00       5.000       9.775                4.959                228.7                            109.5

    True values:  x_1^s = 20.00,  x_2^s = 5.000,  ŷ_1(x^s) = 10.000,  ŷ_2(x^s) = 5.000,
                  f_1(x^s, ŷ(x^s)) = 225.0,  f_2(x^s, ŷ(x^s)) = 100.0
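Example 1 can be reproduced approximately with the following end-to-end sketch (not the authors' implementation: the upper level is handled by a crude feasibility-filtered grid search rather than a feasible-direction method, and the inverse-barrier form of φ is an assumed SUMT choice); its output should roughly follow the trend of Table I.

```python
# End-to-end sketch of Example 1 (not the authors' implementation): the lower level
# uses an assumed SUMT-type inverse barrier, while the upper level is handled by a
# crude feasibility-filtered grid search instead of a feasible-direction method.
import numpy as np
from scipy.optimize import minimize_scalar

def f1(x, y):                                   # upper-level objective
    return x ** 2 + (y - 10.0) ** 2

def P(x, y, r):                                 # lower-level augmented objective (5)
    f2 = (x + 2.0 * y - 30.0) ** 2
    phi = 1.0 / (20.0 - x - y) + 1.0 / y + 1.0 / (20.0 - y)   # barrier of x+y<=20, 0<=y<=20
    return f2 + r * phi

def y_r(x, r):                                  # barrier approximation of the reaction
    ub = 20.0 - x - 1e-9
    res = minimize_scalar(lambda y: P(x, y, r), bounds=(1e-9, ub),
                          method="bounded", options={"xatol": 1e-10})
    return res.x

xs = np.linspace(0.01, 15.0, 3000)              # upper-level box constraint 0 <= x <= 15
for r in [200.0, 50.0, 10.0, 1.0, 0.1, 0.01]:
    best = None
    for x in xs:
        y = y_r(x, r)
        if y - x <= 0.0:                        # upper-level constraint g1 = -x + y^r(x) <= 0
            v = f1(x, y)
            if best is None or v < best[0]:
                best = (v, x, y)
    v, x, y = best
    print(f"r = {r:7.2f}   x = {x:6.3f}   y^r(x) = {y:6.3f}   f1 = {v:7.2f}")
```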

Some computational experiments for the min-max problem also confirmed the convergence property and the usefulness of our method.

VI. CONCLUSION

In this paper we have presented two solution methods for the most general types of the Stackelberg and the min-max problems, both of which are formulated as two-level optimization problems to be solved in principle by a parametric approach. Our methods apply the barrier method to transform the two-level problems into ordinary nonlinear programming problems. Some numerical examples were given to illustrate the proposed algorithms. In view of the fact that scarcely any solution methods have been available so far, we are convinced of the significance of our methods. Nevertheless, there are also difficulties in practical computations. For instance, when r is set sufficiently small, the rapid increase or decrease of the function P^r(x, y) in the vicinity of the boundary of the feasible region makes it difficult to obtain an accurate solution to the problem (5) or (29), and the increased sensitivity of P^r(x, y) introduces roundoff errors into the calculation of the gradient of the implicit function. These difficulties seriously influence the computational results. It is hoped, however, that the above troubles, which originate in disadvantages peculiar to the barrier method, can be overcome with various existing techniques for improving the barrier method (SUMT).
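One standard way of easing these troubles, sketched below with assumed data from Example 1 (the paper itself does not prescribe this), is the usual SUMT continuation: decrease r gradually and warm-start each lower-level solve from the solution obtained for the previous, larger r, so that the increasingly ill-conditioned problems are entered from a good interior point.

```python
# A sketch of the usual SUMT continuation remedy (standard practice, not something
# the paper prescribes): decrease r step by step and warm-start each lower-level
# solve from the previous solution.  The inner problem below is the (assumed)
# barrier form of Example 1's lower level with x fixed at 10.
import numpy as np
from scipy.optimize import minimize

def P(yvec, x, r):
    y = float(yvec[0])
    if y <= 0.0 or y >= 20.0 - x:               # guard: outside int S(x) the barrier formula is invalid
        return np.inf
    return (x + 2.0 * y - 30.0) ** 2 + r * (1.0 / (20.0 - x - y) + 1.0 / y + 1.0 / (20.0 - y))

x = 10.0
y_start = 5.0                                   # strictly interior starting point
for r in [200.0, 20.0, 2.0, 0.2, 0.02, 0.002]:
    res = minimize(P, x0=[y_start], args=(x, r), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12})
    y_start = float(res.x[0])                   # warm start for the next, smaller r
    print(f"r = {r:8.3f}   y^r(x) = {y_start:.6f}")
```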

ACKNOWLEDGMENT

The authors wish to thank the reviewers for their many constructive comments.
REFERENCES

[1] M. Avriel, Nonlinear Programming. Englewood Cliffs, NJ: Prentice-Hall, 1976.
[2] J. Bram, "The Lagrange multiplier theorem for max-min with several constraints," SIAM J. Appl. Math., vol. 14, no. 4, pp. 665-667, 1966.
[3] J. M. Danskin, The Theory of Max-Min and Its Application to Weapons Allocation Problems. Berlin: Springer-Verlag, 1967.
[4] V. F. Demjanov, "Algorithms for some minimax problems," J. Comput. Syst. Sci., vol. 2, no. 4, pp. 342-380, 1968.
[5] A. V. Fiacco and G. P. McCormick, "The sequential unconstrained minimization technique for nonlinear programming," Management Sci., vol. 10, pp. 601-617, 1964.
[6] J. E. Heller and J. B. Cruz, Jr., "An algorithm for minmax parameter optimization," Automatica, vol. 8, no. 3, pp. 325-335, 1972.
[7] W. W. Hogan, "Point-to-set maps in mathematical programming," SIAM Rev., vol. 15, no. 3, pp. 591-603, 1973.
[8] G. P. Papavassilopoulos and J. B. Cruz, Jr., "Nonclassical control problems and Stackelberg games," IEEE Trans. Automat. Contr., vol. AC-24, pp. 155-166, 1979.
[9] D. M. Salmon, "Minimax controller design," IEEE Trans. Automat. Contr., vol. AC-13, no. 4, pp. 369-376, 1968.
[10] W. E. Schmitendorf, "Necessary conditions and sufficient conditions for static min-max problems," J. Math. Anal. Appl., vol. 57, no. 3, pp. 683-693, 1977.
[11] K. Shimizu and E. Aiyoshi, "Necessary conditions for min-max problems and algorithms by a relaxation procedure," IEEE Trans. Automat. Contr., vol. AC-25, no. 1, 1980.
[12] M. Simaan and J. B. Cruz, Jr., "On the Stackelberg strategy in nonzero-sum games," J. Optimiz. Theory Appl., vol. 11, no. 5, pp. 533-555, 1973.
[13] M. Simaan, "Stackelberg optimization of two-level systems," IEEE Trans. Syst., Man, Cybern., vol. SMC-7, no. 4, pp. 554-556, 1977.
[14] H. von Stackelberg, The Theory of the Market Economy. Oxford, England: Oxford Univ. Press, 1952.
[15] J. J. Strodiot and V. H. Nguyen, "An exponential penalty method for nondifferentiable minimax problems with general constraints," J. Optimiz. Theory Appl., vol. 21, no. 2, pp. 205-219, 1979.
[16] G. Zoutendijk, Methods of Feasible Directions. Amsterdam, The Netherlands: Elsevier, 1960.

Finite Chain Approximation for a Continuous Stochastic Control Problem

Abstract—A class of feedback policies called finitely switched (FS) policies is introduced. When a one-dimensional nonlinear stochastic system is controlled by FS policies there is a finite state Markov system which is equivalent to the original problem. The finite state problem can be solved by known Markov programming methods.

Manuscript received February 11, 1980; revised August 1980. Paper recommended by S. I. Marcus, Chairman of the Stochastic Control Committee. This work was supported by the National Science Foundation under Grant 79-03879 and by the Joint Services Electronics Program under Contract F44620-76-C-0100. The authors are with the Department of Electrical Engineering and Computer Sciences and the Electronics Research Laboratory, University of California, Berkeley, CA 94720.

I. INTRODUCTION

We consider the control of the one-dimensional system described by the Itô equation

    dx_t = m(x_t, u_t) dt + σ(x_t, u_t) dw_t,   t ≥ 0   (1.1)

in which x_t is the state, u_t is the control, and (w_t) is standard Brownian motion. The control actions are to be selected in feedback form u_t = φ(x_t), where φ belongs to a specified family of feedback policies Φ.

A standard approach to this situation is to treat it as an optimal control problem. One assumes first that Φ is the family of all feedback policies φ taking values in a specified control constraint set U. One then seeks φ* in Φ which is optimal in the sense that it minimizes the expected value of a given cost function. φ* can be obtained by solving the Bellman-Hamilton-Jacobi equation. This partial differential equation is usually difficult to solve, and one is forced to resort to some approximation.
A different kind of approximation is proposed here. A class Φ of feedback policies called finitely switched (FS) policies is introduced. It has the property that the behavior of the state process (x_t), when it is controlled by φ in Φ, can be summarized by an equivalent finite Markov chain. The optimal φ* in an FS family Φ can then be obtained by Markov programming techniques.

The basic idea of this approximation is readily traced to the work of Forestier and Varaiya [1]. The only difference is that there (x_t) itself is a discrete-time finite Markov chain, whereas here it is an Itô process. This difference, of course, leads to changes in the technical argument. Nevertheless, following the discussion in [1], it is reasonable to view the equivalent chain obtained here as an aggregation of (x_t), and to regard an FS policy as being implemented in a two-layer hierarchy. However, this approach does not generalize easily to vector-valued processes. For some related work, see [6].

FS policies are described in Section II and the equivalent Markov chain is exhibited in Section III. The calculation of the parameters of this chain occupies Section IV. An example is treated in Section V.
II. FINITELY SWITCHED POLICIES

An FS family Φ is defined through a pair (S, Ψ), where S = {x_1 < x_2 < ... < x_n} is a finite set of states called the switching set (the boundary set in [1]), and Ψ is any specified family of functions ψ: x → ψ(x). A particular policy φ in Φ = Φ(S, Ψ) is obtained by assigning to each x_i ∈ S a function ψ_i ∈ Ψ. Then Φ is simply the family of all assignments ψ = (ψ_1, ..., ψ_n). To simplify notation, such a φ is usually denoted by (ψ_1, ..., ψ_n).

In the following we suppose that a family Φ = Φ(S, Ψ) is given in advance.

Fix a ψ = (ψ_1, ..., ψ_n) in Φ. The control action (u_t, t ≥ 0) indicated by φ and the resulting state (x_t, t ≥ 0) can now be described. Note that (u_t) and (x_t) are stochastic processes which depend on ψ.

For simplicity, suppose that the initial state is a switching state, say x_0 = x_i a.s. Then the control action is given by the feedback policy u_t = ψ_i(x_t) for 0 ≤ t < T_1, where T_1 is the first time that x_t reaches another switching state, say x_j ≠ x_i. The control action is now given by u_t = ψ_j(x_t) for T_1 ≤ t < T_2, where T_2 is the first time after T_1 that x_t reaches a new switching state x_l ≠ x_j, at which point the feedback policy is switched to u_t = ψ_l(x_t), and so on. In brief, as soon as x_t encounters a switching state, the control is switched to the corresponding feedback policy. More precisely, define the random variables

    T_0 = 0   a.s.
    T_n = inf{t > T_{n−1} | x_t ∈ S, x_t ≠ x_{T_{n−1}}},   n ≥ 1.   (2.1)
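The switching mechanism (2.1) can be visualized with a small simulation sketch (all dynamics, switching states, and policies below are hypothetical choices, and the Euler-Maruyama discretization with a crossing test is only a crude stand-in for the continuous hitting times):

```python
# Illustrative simulation (hypothetical drift, diffusion, switching set, and
# policies; not from the paper) of an FS policy: an Euler-Maruyama discretization
# of (1.1) with a crude crossing test standing in for the hitting times in (2.1).
import numpy as np

rng = np.random.default_rng(0)

def m(x, u):          # drift of (1.1)
    return -x + u

def sigma(x, u):      # diffusion coefficient of (1.1)
    return 0.5

S = np.array([-0.5, 0.0, 0.5])                            # switching set x_1 < x_2 < x_3
psi = [lambda x: 1.0, lambda x: 0.0, lambda x: -1.0]      # one feedback law per switching state

dt, n_steps = 1e-3, 20000
x = float(S[1])                                           # start at a switching state
active = 1                                                # index of the currently active policy
switch_times = []

for k in range(n_steps):
    u = psi[active](x)
    x_new = x + m(x, u) * dt + sigma(x, u) * np.sqrt(dt) * rng.standard_normal()
    # rule (2.1): when the path crosses a switching state other than the current one,
    # hand control to the policy attached to that state
    for i, s in enumerate(S):
        if i != active and (x - s) * (x_new - s) <= 0.0:
            active = i
            switch_times.append((k + 1) * dt)
            break
    x = x_new

print("number of switches:", len(switch_times), "  first few:", [round(t, 3) for t in switch_times[:5]])
```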
