We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories:
1. TRANSFORMATION METHODS: Here the constrained problem is transformed into a sequence of unconstrained (or more easily solved) problems and solved using one (or some variant) of the algorithms we have already seen.
2. PRIMAL METHODS: These are methods that work directly on the original problem.
We shall look at each of these in turn; we first start with transformation methods.
EXTERIOR PENALTY METHODS

Here the constrained problem is replaced by a sequence of unconstrained problems whose solutions converge to the solution to the original problem. The original constrained problem is transformed into

Minimize f(x) + μ·p(x), st x ∈ Rⁿ,

where μ > 0 is a penalty parameter and p is a penalty function, typically p(x) = ∑j [Max(0, gj(x))]², so that p(x) = 0 when x is feasible and p(x) > 0 otherwise.
EXAMPLE: Minimize f(x) = x, st x ≥ 2 (i.e., g(x) = 2 − x ≤ 0), which has optimum solution x* = 2.

Here p(x) = (2 − x)² if x < 2 (x infeasible), and p(x) = 0 if x ≥ 2;

i.e., p(x) = [Max(0, 2 − x)]².

Minimize x + μ(2 − x)² has optimum solution x(μ) = 2 − 1/(2μ).

As μ → ∞, x(μ) = 2 − 1/(2μ) → 2 = x*.
[Figure: plots of f, f + μ1·p and f + μ2·p for μ1 = 0.5 and μ2 = 1.5; as μ increases, the unconstrained minimizer moves toward x* = 2.]
In general:
1. p(x) must be continuous and nonnegative, with p(x) = 0 if, and only if, x is feasible; such a p is called a penalty function.
2. The solution to the unconstrained problem can be made arbitrarily close to that of the original constrained problem by choosing μ sufficiently large.

However, simply picking one very large value of μ at the outset is a poor idea: most algorithms for unconstrained optimization would stop prematurely because the step sizes become tiny and the auxiliary problem becomes badly conditioned. In practice we therefore solve a sequence of unconstrained problems with increasing μ values; the optimal point from one iteration becomes the starting point for the next problem.
The procedure requires:
1. a tolerance ε,
2. an “increase” factor β > 1,
3. a starting point x1, and
4. an initial μ1.

At Iteration i
1. Solve the problem Minimize f(x) + μi·p(x); st x ∈ X (usually Rⁿ). Use xi as the starting point and call the minimizer xi+1.
2. If μi·p(xi+1) < ε then STOP; else let μi+1 = β·μi and start iteration (i+1).
EXAMPLE: Minimize f(x) = x1² + 2x2², st x1 + x2 ≥ 1 (i.e., g(x) = 1 − x1 − x2 ≤ 0).

Here p(x) = (1 − x1 − x2)² if x1 + x2 < 1, and p(x) = 0 if x1 + x2 ≥ 1;

i.e., p(x) = [Max(0, 1 − x1 − x2)]², f(x) + μ·p(x) = x1² + 2x2² + μ[Max(0, 1 − x1 − x2)]², and the necessary conditions ∇[f(x) + μp(x)] = 0 yield (at infeasible points)

2x1 − 2μ(1 − x1 − x2) = 0,   4x2 − 2μ(1 − x1 − x2) = 0.

Thus x1(μ) = 2μ/(2 + 3μ) and x2(μ) = μ/(2 + 3μ).

Starting with μ1 = 0.1, β = 10 and x1 = (0,0) and using a tolerance of 0.005 (say), we have the following iterates:
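A minimal Python sketch of these iterations (scipy's BFGS is assumed as the inner unconstrained solver; any of the methods seen earlier would serve):

    import numpy as np
    from scipy.optimize import minimize

    def f(x):                              # objective
        return x[0]**2 + 2*x[1]**2

    def p(x):                              # penalty for g(x) = 1 - x1 - x2 <= 0
        return max(0.0, 1.0 - x[0] - x[1])**2

    mu, beta, eps = 0.1, 10.0, 0.005       # mu1, increase factor, tolerance
    x = np.zeros(2)                        # starting point x1 = (0, 0)
    while True:
        # Step 1: minimize the auxiliary function, warm-starting from the previous x
        x = minimize(lambda y: f(y) + mu * p(y), x, method='BFGS').x
        print(mu, x, mu * p(x))            # one row of the iteration table
        if mu * p(x) < eps:                # Step 2: stop when the weighted penalty is small
            break
        mu *= beta

Each pass prints μi, the minimizer x(μi) = (2μi/(2+3μi), μi/(2+3μi)) and the stopping quantity μi·p(x(μi)); the iterates approach (2/3, 1/3).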
BARRIER (INTERIOR POINT) METHODS

These methods also transform the original problem into an unconstrained one; however, the barrier functions prevent the current solution from ever leaving the feasible region. They require that the interior of the feasible set be nonempty, which is impossible if equality constraints are present. Therefore, they are used with problems having inequality constraints only.

A barrier function B is one that is continuous and nonnegative over the interior of {x | g(x) ≤ 0}, i.e., over the set {x | g(x) < 0}, and approaches ∞ as the boundary is approached from the interior. The original problem is then transformed into

Minimize f(x) + μ·B(x), st x ∈ Rⁿ,

where now μ > 0 is decreased toward zero. Usually,

B(x) = −∑j 1/gj(x)   or   B(x) = −∑j log(−gj(x)).
Ideally we would like B(x) = 0 if gj(x) < 0 and B(x) = ∞ if gj(x) = 0, so that we never leave the region {x | g(x) ≤ 0}. However, such a B(x) is discontinuous, and this causes serious computational difficulties. Similar to exterior penalty functions, we don't just choose one small value for μ; rather, we solve a sequence of unconstrained problems with decreasing μ values, where the optimum from one iteration starts the next.
At Iteration i
1. Solve the problem Minimize f(x) + μiB(x), st x ∈ X (usually Rⁿ). Use xi as the starting point and call the minimizer xi+1.
2. If μiB(xi+1) < ε then STOP; else let μi+1 = β·μi (with β < 1) and start iteration (i+1).
EXAMPLE: Consider the same problem as before (Minimize x1² + 2x2², st x1 + x2 ≥ 1), now with the logarithmic barrier B(x) = −log(x1 + x2 − 1).

The necessary conditions for the optimal solution (∇[f(x) + μB(x)] = 0) yield the following:

2x1 − μ/(x1 + x2 − 1) = 0,   4x2 − μ/(x1 + x2 − 1) = 0.

Solving, we get

x1 = [1 ± √(1 + 3μ)]/3 and x2 = [1 ± √(1 + 3μ)]/6.

Since the negative signs lead to infeasibility, we take x1(μ) = [1 + √(1 + 3μ)]/3 and x2(μ) = [1 + √(1 + 3μ)]/6.
Starting with μ1 = 1, β = 0.1 and x1 = (0,0) and using a tolerance of 0.005 (say), we have the following iterates:
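Because the minimizer is available in closed form here, the barrier trajectory can be traced in a few lines of Python (a sketch; the loop mirrors the iteration described above):

    import math

    mu, beta, eps = 1.0, 0.1, 0.005        # mu1, decrease factor, tolerance
    while True:
        r = math.sqrt(1.0 + 3.0 * mu)
        x1, x2 = (1 + r) / 3, (1 + r) / 6  # closed-form minimizer of f + mu*B from above
        B = -math.log(x1 + x2 - 1.0)       # logarithmic barrier value
        print(mu, x1, x2, mu * B)
        if abs(mu * B) < eps:              # Step 2: stopping test
            break
        mu *= beta                         # shrink mu and re-solve

The iterates approach (2/3, 1/3) from inside the feasible region as μ shrinks.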
PENALTY METHODS AND LAGRANGE MULTIPLIERS

Consider

Minimize f(x)
st gj(x) ≤ 0; j = 1,2,…,m   (1)

Suppose we use the usual exterior penalty function described earlier, with exponent 2, i.e., we minimize f(x) + μ∑j [Max(0, gj(x))]². Suppose that the solution for a fixed μ (say μk > 0) is given by xk. The necessary conditions at xk yield

∇f(xk) + ∑j 2μk·Max(0, gj(xk))·∇gj(xk) = 0.   (2)

If we define

λj(μk) = 2μk·Max(0, gj(xk)),   (3)

then (2) may be written as

∇f(xk) + ∑j λj(μk)·∇gj(xk) = 0.   (4)

On the other hand, the K-K-T conditions for problem (1) require

∇f(x*) + ∑j λj*·∇gj(x*) = 0,   (5)

plus the original constraints, complementary slackness for the inequality constraints, and nonnegativity of the multipliers. Comparing (4) and (5) we can see that when we minimize the auxiliary function using μ = μk, the λj(μk) values given by (2) and (3) estimate the Lagrange multipliers in (5). In fact it may be shown that as the penalty function method proceeds, μk → ∞ and xk → x*, the optimum solution, and the values of λj(μk) → λj*, the optimum multipliers.
Consider our earlier example once again. For this problem the Lagrangian is

L(x, λ) = x1² + 2x2² + λ(1 − x1 − x2),

and the K-K-T conditions are 2x1 − λ = 0, 4x2 − λ = 0, λ(1 − x1 − x2) = 0, λ ≥ 0. Solving these results in x1* = 2/3, x2* = 1/3, λ* = 4/3 (λ = 0 yields an infeasible solution).

Recall that for fixed μk the optimum value of xk was [2μk/(2+3μk), μk/(2+3μk)]ᵀ. As μk → ∞, this approaches (2/3, 1/3)ᵀ = x*.
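Indeed, from (3), λ(μk) = 2μk·Max(0, 1 − x1(μk) − x2(μk)) = 2μk·[2/(2 + 3μk)] = 4μk/(2 + 3μk), which tends to λ* = 4/3 as μk → ∞, exactly as the general result predicts.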
Similar statements can also be made for the barrier (interior penalty) function method. With B(x) = −∑j log(−gj(x)), the necessary conditions at the minimizer xk are

∇f(xk) + ∑j [−μk/gj(xk)]·∇gj(xk) = 0,   (6)

so that if (like before) for a fixed μk we denote the solution by xk and define −μk/gj(xk) = λj(μk), then from (6) and (5) we see that λj(μk) approximates λj*. Furthermore, as μk → 0 and xk → x*, we have λj(μk) → λj*.
These sequential unconstrained minimization techniques (SUMT) are treated in detail by Fiacco and McCormick. While they are attractive in the simplicity of the principle on which they are based, they also possess several undesirable properties. When the parameters μ are very large in value, the penalty functions tend to be ill-behaved near the boundary of the constraint set, where the optimum points usually lie. Another problem is the choice of appropriate μ1 and β values. The rate at which the μi change (i.e., the β values) can seriously affect the computational effort to find a solution. Also, as μ increases, the Hessian of the unconstrained function becomes ill-conditioned.
To see this ill-conditioning, consider the Hessian of the auxiliary function x1² + 2x2² + μ(1 − x1 − x2)² from our earlier example:

H = [ 2+2μ    2μ  ]
    [  2μ    4+2μ ]

Suppose we want to find its eigenvalues by solving

det(H − λI) = λ² − (6 + 4μ)λ + (8 + 12μ) = 0

⟹ λ = (3 + 2μ) ± √(4μ² + 1).

Taking the ratio of the largest and the smallest eigenvalue yields

[(3 + 2μ) + √(4μ² + 1)] / [(3 + 2μ) − √(4μ² + 1)].

It should be clear that as μ → ∞, the limit of the preceding ratio also goes to ∞. This indicates that as the iterations proceed and we start to increase the value of μ, the Hessian of the unconstrained function that we are minimizing becomes increasingly ill-conditioned. This is especially problematic if we are using a method for the unconstrained optimization that requires the Hessian or an approximation to it.
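A quick numerical check of this growth, assuming numpy is available:

    import numpy as np

    for mu in [1.0, 10.0, 100.0, 1000.0]:
        H = np.array([[2 + 2*mu, 2*mu],
                      [2*mu, 4 + 2*mu]])   # Hessian of the auxiliary function
        lam = np.linalg.eigvalsh(H)        # eigenvalues in ascending order
        print(mu, lam[1] / lam[0])         # ratio of largest to smallest eigenvalue

The printed ratio behaves like (3 + 4μ)/3 for large μ, confirming the limit above.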
MULTIPLIER METHODS

With multiplier methods there is no need for μ to go to infinity, and the unconstrained auxiliary function stays better conditioned. The auxiliary function now has the form

P(x) = f(x) + ∑j μj{[Max(0, gj(x) + θj)]² − θj²},

where μj > 0 and θj ≥ 0 are parameters associated with the jth constraint. (This is also known as the augmented Lagrangian approach.) Note that if θj = 0 and μj = μ for all j, this reduces to the usual exterior penalty function.

Let xi be the minimum of the auxiliary function Pi(x) at some iteration i. From the necessary conditions,

∇P(xi) = ∇f(xi) + ∑j 2μj·Max(0, gj(xi) + θj)·∇gj(xi) = 0.

Let us assume for a moment that the θj are chosen at each iteration i so that they satisfy the following:

Max(0, gj(xi) + θj) = θj, ∀j   (1)

which is guaranteed whenever

θj ≥ 0, gj(xi) ≤ 0 and θj·gj(xi) = 0, ∀j.   (2)
Therefore, as long as (2) is satisfied, we have for the point xi that minimizes Pi(x):

∇P(xi) = ∇f(xi) + ∑j 2μjθj·∇gj(xi) = 0.   (3)

If we let λj = 2μjθj and use the fact that μj > 0, then (2) reduces to

gj(xi) ≤ 0, λj ≥ 0 and λj·gj(xi) = 0, ∀j   (A)

while (3) becomes

∇f(xi) + ∑j λj·∇gj(xi) = 0.   (B)

It is readily seen that (A) and (B) are merely the KARUSH-KUHN-TUCKER conditions

gj(x*) ≤ 0, λj* ≥ 0, λj*·gj(x*) = 0, ∀j, and ∇f(x*) + ∑j λj*·∇gj(x*) = 0,

where x* is the solution to the problem, and λj* is the optimum Lagrange multiplier for constraint j. From the previous discussion it should be clear that at each iteration the θj should be revised so that the products 2μjθj approach the optimum multipliers; a simple update with this property is θj ← Max{0, θj + gj(xi)}.
ALGORITHM: A general algorithmic approach for the multiplier method may now be stated as follows:

STEP 1: Choose initial values for the θj (e.g., θj = 0), penalty parameters μj > 0, and a starting point x1; set i = 1.
STEP 2: Minimize Pi(x), starting from xi, and call the minimizer xi+1.
STEP 3: Check convergence criteria and go to Step 4 only if the criteria are not satisfied.
STEP 4: Update θj ← Max{0, θj + gj(xi+1)} for each j (increasing μj if convergence is too slow), set i = i + 1, and return to Step 2.
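A minimal sketch of the algorithm on our earlier example (a single constraint; scipy's BFGS is assumed as the Step 2 solver, and Step 4 uses the θ update suggested above):

    import numpy as np
    from scipy.optimize import minimize

    def g(x):                              # single constraint g(x) = 1 - x1 - x2 <= 0
        return 1.0 - x[0] - x[1]

    def P(x, mu, theta):                   # auxiliary function of the multiplier method
        return x[0]**2 + 2*x[1]**2 + mu * (max(0.0, g(x) + theta)**2 - theta**2)

    mu, theta = 1.0, 0.0                   # Step 1: initial parameters
    x = np.zeros(2)
    for i in range(25):
        x = minimize(P, x, args=(mu, theta), method='BFGS').x   # Step 2
        if abs(g(x)) < 1e-8:               # Step 3: constraint satisfied and tight
            break
        theta = max(0.0, theta + g(x))     # Step 4: multiplier update
    print(x, 2 * mu * theta)               # x -> (2/3, 1/3); lambda = 2*mu*theta -> 4/3

With μ held fixed at 1, θ converges to 2/3, so the multiplier estimate 2μθ converges to λ* = 4/3 without μ ever being driven to infinity.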
PRIMAL METHODS
By a primal method we mean one that works directly on the original constrained
problem:
Min f(x)
st gj(x) ≤ 0, j = 1,2,...,m.
Let xi be a design at iteration i. A new design xi+1 is found by the expression xi+1 = xi + λi·di, where λi is a step size and di is a search direction computed from ∇f and the ∇gj for j ∈ J, where ∇f and ∇gj are gradient vectors of the objective and constraint functions and J is the index set of constraints that are active (tight) at xi. Unlike with transformation methods, it is evident that here the gradient vectors of the constraints enter the computation of the search direction directly; this is the distinguishing feature of primal methods.
The first class of primal methods we look at are the feasible directions methods, where each xi is within the feasible region. An advantage is that if the process is terminated before reaching the true optimum solution (as is usually necessary with large-scale problems), the terminating point is feasible and hence acceptable.

Given x' ∈ X, a vector d is a feasible direction at x' if there exists λ̄ > 0 such that x' + λd ∈ X for all λ ∈ [0, λ̄). Further, if we also have f(x' + λd) < f(x') for all λ ∈ [0, λ'), λ' ≤ λ̄, then d is called an improving feasible direction. At x', d is improving if

∇fᵀ(x')d < 0,

and d is feasible if ∇gjᵀ(x')d < 0 for every constraint j that is tight at x'; d is an improving feasible direction when both conditions are met.
NOTE: If gj(x) is a linear function, then the strict inequality (<) used to define the set of feasible directions may be relaxed to ∇gjᵀ(x')d ≤ 0, since points along such a d remain feasible with respect to a linear constraint.
Geometric Interpretation
Minimize f(x), st gj(x) ≤ 0, j = 1,2,…,m

[Figure: two cases at a boundary point x* of the region gj(x) ≤ 0. In (i), dᵀ∇gj(x*) < 0 and d is a feasible direction pointing into the region; in (ii), dᵀ∇gj(x*) = 0 and d is tangent to the constraint boundary gj(x) = 0.]

In case (ii), any positive step along d is infeasible when gj is nonlinear (however, we will often still admit such tangent directions; in particular, if gj is linear, points along d stay on the boundary and remain feasible).
FARKAS’ LEMMA: Given A ∈ Rᵐˣⁿ and b ∈ Rⁿ, exactly one of the following two statements holds:

(1) There exists d ∈ Rⁿ such that Ad ≤ 0 and bᵀd > 0.
(2) There exists y ∈ Rᵐ such that Aᵀy = b and y ≥ 0.

Application to NLP

Let b = −∇f(x*) and let the rows of A be the ∇gjᵀ(x*) for the constraints that are tight at x*. Then statement (1) says that there is an improving feasible direction at x*, while statement (2) is precisely the K-K-T conditions. Since exactly one of the two can hold, x* satisfies the K-K-T conditions if, and only if, there is no improving feasible direction at x*.
[Figure: a feasible region bounded by g1(x) = 0, g2(x) = 0 and g3(x) = 0, with −∇f(x*) lying inside the cone generated by ∇g1(x*) and ∇g2(x*). K-K-T implies that the steepest descent direction is in the cone generated by the gradients of the tight constraints.]
THE METHOD OF FEASIBLE DIRECTIONS

Minimize f(x)
st gj(x) ≤ 0, j = 1,2,…,m
   hk(x) = 0, k = 1,2,…,p
====================================================
To generate an improving feasible direction from x', we need to find d such that ∇fᵀ(x')d < 0 and ∇gjᵀ(x')d < 0 for every j in the active set Jx'. One way is to minimize ∇fᵀ(x')d over all such d (suitably normalized) and use the minimizer as the search direction di.
In practice, the actual subproblem we solve is slightly different, since strict inequalities cannot be enforced directly in a mathematical program:

Minimize z
st ∇fᵀ(x')d ≤ z
   ∇gjᵀ(x')d ≤ z for j ∈ Jx'
   ∇hkᵀ(x')d = 0 for k = 1,2,...,p
   −1 ≤ di ≤ 1 for i = 1,2,...,n

If (z*, d*) is the optimum solution for this LP, then: if z* < 0, d* is an improving feasible direction; if z* = 0, no improving feasible direction exists and we stop. Given di, the step size comes from the line search

Min f(xi + λdi)
st 0 ≤ λ ≤ λmax

where λmax = supremum {λ | gj(xi + λdi) ≤ 0, j = 1,2,...,m}. Let the optimum solution to this be at λ* = λi; then xi+1 = xi + λidi.

NOTE: (1) In practice the set Jx' is usually defined as Jx' = {j | gj(xi) + ε ≥ 0}, where the scalar ε > 0 is a small tolerance; this guards against directions that immediately run into constraints that are nearly tight.
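The subproblem is an ordinary LP and can be solved with any LP code; a minimal sketch using scipy's linprog (the data in the call correspond to Iteration 0 of the example that follows):

    import numpy as np
    from scipy.optimize import linprog

    def direction_lp(grad_f, active_grads):
        # Variables are (d1,...,dn,z); minimize z subject to
        # grad_f.d - z <= 0, grad_gj.d - z <= 0 (j active), -1 <= di <= 1.
        n = len(grad_f)
        rows = [np.append(grad_f, -1.0)] + [np.append(a, -1.0) for a in active_grads]
        c = np.zeros(n + 1)
        c[-1] = 1.0
        res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                      bounds=[(-1, 1)] * n + [(None, None)])
        return res.x[:n], res.x[-1]        # (d*, z*)

    # grad f = (-4, -6) with active constraints g3, g4 (gradients (-1,0), (0,-1))
    d, z = direction_lp(np.array([-4.0, -6.0]),
                        [np.array([-1.0, 0.0]), np.array([0.0, -1.0])])
    print(d, z)                            # -> d* = (1, 1), z* = -1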
EXAMPLE

Minimize f(x) = 2x1² + x2² − 2x1x2 − 4x1 − 6x2   (a QP…)
st x1 + x2 ≤ 8
   −x1 + 2x2 ≤ 10
   −x1 ≤ 0
   −x2 ≤ 0

Here

∇g1 = (1, 1)ᵀ, ∇g2 = (−1, 2)ᵀ, ∇g3 = (−1, 0)ᵀ, ∇g4 = (0, −1)ᵀ,

and ∇f(x) = (4x1 − 2x2 − 4, −2x1 + 2x2 − 6)ᵀ.

Let us begin with x0 = (0, 0)ᵀ, with f(x0) = 0.
Iteration 0: the tight constraints at x0 are g3 and g4, and ∇f(x0) = (−4, −6)ᵀ, so we solve

Min z
st ∇fᵀ(x0)d ≤ z, ∇g3ᵀ(x0)d ≤ z, ∇g4ᵀ(x0)d ≤ z, −1 ≤ d1, d2 ≤ 1

i.e., Min z
st −4d1 − 6d2 ≤ z, −d1 ≤ z, −d2 ≤ z, −1 ≤ d1, d2 ≤ 1

yielding z* = −1 (< 0), with d* = d0 = (1, 1)ᵀ.

The line search Min f(x0 + λd0), st 0 ≤ λ ≤ λmax = 4, gives λ* = λ0 = 4, so that

x1 = x0 + λ0d0 = (4, 4)ᵀ, with f(x1) = −24.
Iteration 1: at x1 = (4, 4)ᵀ only g1 is tight, so we solve

Min z
st ∇fᵀ(x1)d ≤ z, d1 + d2 ≤ z, −1 ≤ d1, d2 ≤ 1

yielding z* = −4 (< 0), with d* = d1 = (−1, 0)ᵀ.

The line search gives λ* = λ1 = 1, so that

x2 = x1 + λ1d1 = (4, 4)ᵀ + 1·(−1, 0)ᵀ = (3, 4)ᵀ, with f(x2) = −26.
Iteration 2: at x2 = (3, 4)ᵀ no constraint is tight, and ∇f(x2) = (0, −4)ᵀ, so we solve

Min z
st −4d2 ≤ z, −1 ≤ d1, d2 ≤ 1

yielding z* = −4 (< 0), with d* = d2 = (0, 1)ᵀ.

The line search gives λmax = 1 (where g1 becomes tight again) and λ* = λ2 = 1, so that

x3 = x2 + λ2d2 = (3, 4)ᵀ + 1·(0, 1)ᵀ = (3, 5)ᵀ, with f(x3) = −29.
Iteration 3: at x3 = (3, 5)ᵀ constraint g1 is tight, and ∇f(x3) = (−2, −2)ᵀ, so we solve

Min z
st −2d1 − 2d2 ≤ z, d1 + d2 ≤ z, −1 ≤ d1, d2 ≤ 1

yielding z* = 0, with d* = (0, 0)ᵀ. STOP.

Therefore the optimum solution is given by x* = (3, 5)ᵀ, with f(x*) = −29.
[Figure: the feasible region bounded by x1 + x2 = 8, −x1 + 2x2 = 10 and the coordinate axes, with the path of iterates x0 = (0,0) → x1 = (4,4) → x2 = (3,4) → x3 = (3,5) = x* marked.]
GRADIENT PROJECTION METHOD (ROSEN)

Recall that the steepest descent direction is −∇f(x). However, for constrained problems, moving along −∇f(x) may destroy feasibility. Rosen's method works by projecting −∇f(x) onto the subspace tangent to the active constraints, so that the resulting direction both improves f and retains feasibility.

A square matrix P is a projection matrix if Pᵀ = P and PP = P. Some useful properties:
1) The eigenvalues of a projection matrix are all 0 or 1.
2) P is positive semidefinite.
3) P is a projection matrix if, and only if, I−P is also a projection matrix.
4) Let P be a projection matrix. Then
(a) L = {Px | x ∈ Rⁿ} and L⊥ = {(I−P)x | x ∈ Rⁿ} are orthogonal linear subspaces, and
(b) Any x ∈ Rⁿ can be uniquely expressed as x = p + q, where p = Px, q = (I−P)x.
[Figure: in R², a given vector x is decomposed as x = p + q, with p = Px and q = (I−P)x orthogonal to each other.]
Consider the simpler case where all constraints are linear. Assume that there are 2 linearly independent constraints g1(x) ≤ 0 and g2(x) ≤ 0 that are tight at the current point x.

[Figure: the gradients ∇g1 and ∇g2 at a point on the boundary g1(x) = g2(x) = 0, the steepest descent direction −∇f, and its decomposition into P(−∇f) and (I−P)(−∇f).]

Obviously, moving along −∇f would take us outside the feasible region. Say we instead project −∇f to obtain −P∇f. If −P∇f ≠ 0, then ∇fᵀ(−P∇f) = −‖P∇f‖² < 0 (using P = PᵀP), so it is an improving direction. For −P∇f to also be feasible we must have ∇g1ᵀ(−P∇f) ≤ 0 and ∇g2ᵀ(−P∇f) ≤ 0 (by definition of a feasible direction).
Thus if we denote M as the (2×n) matrix where Row 1 is ∇g1ᵀ and Row 2 is ∇g2ᵀ, we will choose P so that M(−P∇f) = 0; the projected direction then lies in both hyperplanes and is automatically feasible.

To find the form of the matrix P, we make use of Property 4(b) to write the vector −∇f as

−∇f = P(−∇f) + [I−P](−∇f).

Now, ∇g1 and ∇g2 are both orthogonal to −P∇f, and so is [I−P](−∇f); hence [I−P](−∇f) must lie in the subspace spanned by ∇g1 and ∇g2, i.e.,

[I−P](−∇f) = Mᵀλ for some vector λ.

Premultiplying both sides by M,

−M[I−P]∇f = MMᵀλ

⟹ λ = −(MMᵀ)⁻¹M[I−P]∇f = −{(MMᵀ)⁻¹M∇f + (MMᵀ)⁻¹M(−P∇f)} = −(MMᵀ)⁻¹M∇f,

since M(−P∇f) = 0. Hence we have

−P∇f = −∇f − Mᵀλ = −[I − Mᵀ(MMᵀ)⁻¹M]∇f,

i.e., P = [I − Mᵀ(MMᵀ)⁻¹M].
If d = −P∇f(x) = 0, we compute w = −(MMᵀ)⁻¹M∇f(x); if w ≥ 0, the K-K-T conditions hold and we may stop; if not, a new projection matrix could be identified such that d = −P̂∇f(x) is a nonzero improving feasible direction: simply pick any component of w that is negative, and drop the corresponding row from the matrix M before recomputing the projection.
====================
Minimize f(x)
st gj(x) ≤ 0, j = 1,2,…,m; hk(x) = 0, k = 1,2,…,p

When the constraints are nonlinear, the method proceeds by projecting the gradient on to the tangent hyperplane at x (rather than on to the surface of the feasible region itself), as follows.

STEP 1: Let xi be a feasible point and let J be the "active set", i.e., J = {j | gj(xi) = 0}, or more practically, J = {j | gj(xi) + ε ≥ 0}. Each gj, j ∈ J, is approximated using the Taylor series expansion by

gj(x) ≈ gj(xi) + ∇gjᵀ(xi)(x − xi).

a) Let M = the matrix whose rows are ∇gjᵀ(xi) for j ∈ J (active constraints) and ∇hkᵀ(xi) for k = 1,…,p, and let P = [I − Mᵀ(MMᵀ)⁻¹M] (projection matrix). If M does not exist (no active constraints), let P = I. Let di = −P∇f(xi).
b) If di = 0, compute w = −(MMᵀ)⁻¹M∇f(xi). If w ≥ 0, STOP (xi satisfies the K-K-T conditions); otherwise delete the row of M corresponding to some wj < 0 and repeat (a).
c) If di ≠ 0, go to Step 2.
STEP 2: Solve Minimize f(xi + λdi), st 0 ≤ λ ≤ λmax. Call the optimal solution λ* and find xi+1 = xi + λ*di. If all constraints are not linear, make a "correction move" to return xi+1 into the feasible region; the need for this is shown below. Generally, it could be done by solving g(x) = 0 (the active constraints) starting at xi+1, e.g., using the Newton-Raphson method. Then set i = i + 1 and return to Step 1.
[Figure: objective function contours and the boundaries g1(x) = 0, g2(x) = 0. With a nonlinear g1, the projected direction P(−∇f) is tangent to the boundary at xi, so the new point xi+1 drifts outside the feasible region and a correction move back to g1(x) = 0 is required.]
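A minimal numpy sketch of one direction computation of Step 1 (the data in the call at the bottom are hypothetical, purely for illustration):

    import numpy as np

    def projection_direction(grad_f, M):
        # M: rows are gradients of the active constraints (assumed linearly independent)
        if M is None or M.shape[0] == 0:
            return -grad_f, None           # P = I when no constraints are active
        MMT_inv = np.linalg.inv(M @ M.T)
        P = np.eye(len(grad_f)) - M.T @ MMT_inv @ M
        d = -P @ grad_f                    # projected steepest descent direction
        w = -MMT_inv @ (M @ grad_f)        # Lagrange multiplier estimates
        return d, w

    # Hypothetical data: one active constraint with gradient (1, 1)
    d, w = projection_direction(np.array([2.0, 4.0]), np.array([[1.0, 1.0]]))
    # If d ~ 0: stop when w >= 0 (K-K-T); otherwise drop the row of M with w_j < 0.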
LINEARIZATION METHODS

The general idea of linearization methods is to replace the solution of the NLP by the solution of a sequence of LPs. First order Taylor approximations at the current point xi turn the NLP

Min f(x), st gj(x) ≤ 0, j = 1,2,…,m

into the LP

Min f(xi) + ∇fᵀ(xi)(x − xi)
st gj(xi) + ∇gjᵀ(xi)(x − xi) ≤ 0, j = 1,2,…,m.

Thus we start with some initial xi where the objective and constraints (usually only the tight ones) are linearized, and then solve the LP to obtain a new xi+1, and continue...

Note that f(xi) is a constant and that setting x − xi = d makes the problem equivalent to

Min dᵀ∇f(xi)
st ∇gjᵀ(xi)d ≤ −gj(xi), j = 1,2,…,m;

this is a direction finding LP problem, and xi+1 = xi + d implies that the step size is 1.0!
[A worked example followed here; its figure showed the objective contours f = 0, f = −10, f = −20, f = −30 together with the constraint boundaries g1 = 0 and g2 = 0, the successive LP solutions moving among vertices of the linearized feasible region toward x*.]
The method, however, has limited application since it does not converge if the local minimum occurs at a point that is not a vertex of the feasible region; in such cases it oscillates between adjacent vertices. To avoid this one may limit the step size (for instance, the Frank-Wolfe method minimizes f between xi and xi + d to get xi+1).
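A minimal sketch of this Frank-Wolfe variant (scipy assumed; the polytope {Ax ≤ b, x ≥ 0} is assumed bounded, where x ≥ 0 is linprog's default bound):

    import numpy as np
    from scipy.optimize import linprog, minimize_scalar

    def frank_wolfe(f, grad_f, A, b, x, iters=100):
        # Min f over {x | Ax <= b, x >= 0}
        for _ in range(iters):
            v = linprog(grad_f(x), A_ub=A, b_ub=b).x     # vertex minimizing the linearization
            d = v - x                                    # direction toward that vertex
            lam = minimize_scalar(lambda t: f(x + t*d),
                                  bounds=(0.0, 1.0), method='bounded').x
            x = x + lam * d                              # step size limited to [0, 1]
        return x

The line search over [0, 1] is exactly the "minimize f between xi and xi + d" safeguard described above.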
RECURSIVE QUADRATIC PROGRAMMING (RQP)

Here the NLP is replaced by a sequence of quadratic, rather than linear, programming problems. RQP is a better approach because the second order terms capture curvature information that a pure linearization ignores. At the point xi we solve

Min ∇fᵀ(xi)d + ½dᵀBd
st gj(xi) + ∇gjᵀ(xi)d ≤ 0, j = 1,2,…,m,

where B approximates the Hessian of the Lagrangian function of the objective function f at the point xi. Notice that this is a QP in d with linear constraints. Once the above QP is solved to get di, we obtain xi+1 by finding an optimum step size along xi + λdi and continue. The matrix B is never explicitly computed --- it is usually updated by schemes such as the BFGS approach using information from ∇f(xi) and ∇f(xi+1) only. (Recall the quasi-Newton methods for unconstrained optimization.)
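In practice RQP is almost always used through a library routine; scipy's SLSQP method, for instance, is an implementation of this sequential-QP idea. A sketch on our earlier example:

    import numpy as np
    from scipy.optimize import minimize

    res = minimize(lambda x: x[0]**2 + 2*x[1]**2,        # min x1^2 + 2*x2^2
                   np.zeros(2),
                   method='SLSQP',                        # sequential QP: a QP in d each iterate
                   constraints=[{'type': 'ineq',          # scipy convention: fun(x) >= 0
                                 'fun': lambda x: x[0] + x[1] - 1.0}])
    print(res.x)                                          # -> approximately (2/3, 1/3)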
CUTTING PLANE METHODS

One of the better linearization methods is the cutting plane algorithm of J.E. Kelley, which assumes the constraint functions gj are convex. Suppose we have a set of nonlinear constraints gj(x) ≤ 0 for j = 1,2,...,m that determine the feasible region G. Suppose further that we can find a polyhedral set H, described by linear constraints hk(x) ≤ 0, that contains G (G ⊆ H).

[Figure: the region G determined by the nonlinear gj(x) ≤ 0, enclosed in the polyhedral set H determined by the linear hk(x) ≤ 0.]
The method proceeds as follows:
1. Solve the problem Minimize f(x), st x ∈ H; call the solution x̄.
2. If x̄ satisfies gj(x̄) ≤ 0 for all j (i.e., x̄ ∈ G), STOP: x̄ is optimal for the original problem.
3. Otherwise add an extra linear constraint (i.e., an extra line/hyperplane) so that the current optimal point becomes infeasible for the new problem with the additional constraint, and return to Step 1.
JUSTIFICATION

Step 1: It is in general easier to solve problems with linear constraints than ones with nonlinear constraints.

Step 2: Since G ⊆ H, every point that satisfies gj(x) ≤ 0 automatically satisfies hk(x) ≤ 0, but not vice-versa. Thus G places "extra" constraints on the design variables, and therefore the optimal solution with G can never be any better than the optimal solution with H. Hence, if the optimal solution with H happens to lie inside G, it MUST be optimal for G also. (NOTE: In general, the optimal solution for H at some iteration need not be feasible for G.)
Step 3: Let ḡ(x) = Max{gj(x) | j ∈ J}, where J = {j | gj(xi) > 0}; i.e., ḡ(xi) is the value of the most violated constraint at xi. By the Taylor series expansion,

ḡ(x) ≈ ḡ(xi) + ∇ḡᵀ(xi)(x − xi) + …

If we retain only the terms up to first order, then requiring ḡ(x) ≤ 0 yields the new linear constraint

hnew(x) = ḡ(xi) + ∇ḡᵀ(xi)(x − xi) ≤ 0.   (*)

The current point xi is obviously infeasible in this constraint --- since hnew(xi) = ḡ(xi) > 0. So, if we add this extra linear constraint, then the optimal solution would change, because the current optimal solution would no longer be feasible for the new LP with the added constraint. Hence (*) defines a suitable cutting plane.
NOTE: If the objective for the original NLP was linear, then at each iteration we solve an LP that differs from the previous one only in the single added cut, so each iteration can typically be warm-started from the preceding solution.
[Figure: four snapshots of the algorithm on an example. Cuts h4 = 0, h5 = 0 and h6 = 0 are added one at a time; each removes the current LP optimum from H, and the LP optima converge to the true optimum x* of G.]
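A minimal sketch of the algorithm for a single convex constraint (scipy assumed; the instance at the bottom is hypothetical):

    import numpy as np
    from scipy.optimize import linprog

    def kelley(c, A, b, g, grad_g, tol=1e-6, iters=100):
        # Min c.x st g(x) <= 0 (g convex), starting from a polyhedron Ax <= b containing G
        A, b = [list(row) for row in A], list(b)
        for _ in range(iters):
            x = linprog(c, A_ub=A, b_ub=b,
                        bounds=[(None, None)] * len(c)).x   # Step 1: solve the current LP
            if g(x) <= tol:                                 # Step 2: feasible, hence optimal
                return x
            a = grad_g(x)                                   # Step 3: add the cut (*)
            A.append(list(a))
            b.append(float(a @ x - g(x)))                   # g(xi) + grad.(x - xi) <= 0
        return x

    # Hypothetical instance: min -x1 - x2 st x1^2 + x2^2 <= 1, starting box |xi| <= 2
    x = kelley(np.array([-1.0, -1.0]),
               [[1, 0], [-1, 0], [0, 1], [0, -1]], [2, 2, 2, 2],
               g=lambda x: x[0]**2 + x[1]**2 - 1.0,
               grad_g=lambda x: np.array([2*x[0], 2*x[1]]))  # approaches (1/√2, 1/√2)

Note that the objective here is linear, so (per the NOTE above) every iteration is an LP that grows by exactly one row.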