

CONSTRAINED NONLINEAR PROGRAMMING

We now turn to methods for general constrained nonlinear programming. These may

be broadly classified into two categories:

1. TRANSFORMATION METHODS: In this approach the constrained nonlinear

program is transformed into an unconstrained problem (or a series of unconstrained

problems) and solved using one (or some variant) of the algorithms we have already

seen.

Transformation methods may further be classified as

(1) Exterior Penalty Function Methods

(2) Interior Penalty (or Barrier) Function Methods

(3) Augmented Lagrangian (or Multiplier) Methods

2. PRIMAL METHODS: These are methods that work directly on the original

problem.

We shall look at each of these in turn; we first start with transformation methods.

Exterior Penalty Function Methods

These methods generate a sequence of infeasible points whose limit is an optimal

solution to the original problem. The original constrained problem is transformed into

an unconstrained one (or a series of unconstrained problems). The constraints are

incorporated into the objective by means of a "penalty parameter" which penalizes

any constraint violation. Consider the problem

Min f(x)

st $g_j(x) \le 0,\ j=1,2,\dots,m$; $h_j(x) = 0,\ j=1,2,\dots,p$; $x \in R^n$

Define a penalty function p as follows:

$p(x) = \sum_{j=1}^{m} [\max\{0, g_j(x)\}]^{\alpha} + \sum_{j=1}^{p} |h_j(x)|^{\alpha}$,

where $\alpha$ is some positive integer. It is easy to see that

If gj(x) ≤ 0 then Max{0, gj(x)} = 0

If gj(x) > 0 then Max{0, gj(x)} = gj(x) (violation)

Now define the auxiliary function as

$P(x) = f(x) + \mu\,p(x)$, where $\mu \gg 0$.

Thus if a constraint is violated, $p(x) > 0$ and a "penalty" of $\mu\,p(x)$ is incurred.



The (UNCONSTRAINED) problem now is:

Minimize $P(x) = f(x) + \mu\,p(x)$

st $x \in R^n$

EXAMPLE:

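Minimize $f(x) = x$, st $g(x) = 2 - x \le 0$, $x \in R$ (note that the minimum is at $x^* = 2$).

Let $p(x) = [\max\{0, g(x)\}]^2$, i.e.,

$p(x) = (2-x)^2$ if $x < 2$, and $p(x) = 0$ if $x \ge 2$.

If $p(x) = 0$, then the optimal solution to Minimize $f(x) + \mu\,p(x)$ is at $x^* = -\infty$, and this is infeasible; so $p(x) = (2-x)^2$, and

Minimize $P(x) = x + \mu(2-x)^2$ has the optimum solution $x^*(\mu) = 2 - \frac{1}{2\mu}$.

Note: As $\mu \to \infty$, $x^*(\mu) = 2 - \frac{1}{2\mu} \to 2$.

[Figure: left panel, "Penalty Function", shows $\mu_1 p$ and $\mu_2 p$ for $\mu_1 = 0.5$ and $\mu_2 = 1.5$; right panel, "Auxiliary Function", shows $f + \mu_1 p$ and $f + \mu_2 p$.]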



In general:

1. We convert the constrained problem to an unconstrained one by using the penalty function.

2. The solution to the unconstrained problem can be made arbitrarily close to that of the original one by choosing a sufficiently large $\mu$.

In practice, if $\mu$ is very large, too much emphasis is placed on feasibility. Often, algorithms for unconstrained optimization stop prematurely because the step sizes required for improvement become very small.

Usually, we solve a sequence of problems with successively increasing values of $\mu$; the optimal point from one iteration becomes the starting point for the next problem.

PROCEDURE: Choose the following initially:

1. a tolerance $\varepsilon$,
2. an "increase" factor $\beta > 1$,
3. a starting point $x^1$, and
4. an initial $\mu_1$.

At Iteration i

1. Solve the problem Minimize $f(x) + \mu_i\,p(x)$; st $x \in X$ (usually $R^n$). Use $x^i$ as the starting point and let the optimal solution be $x^{i+1}$.

2. If $\mu_i\,p(x^{i+1}) < \varepsilon$ then STOP; else let $\mu_{i+1} = \beta\mu_i$ and start iteration (i+1).
153

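EXAMPLE: Minimize $f(x) = x_1^2 + 2x_2^2$

st $g(x) = 1 - x_1 - x_2 \le 0$; $x \in R^2$

Define the penalty function

$p(x) = (1 - x_1 - x_2)^2$ if $g(x) > 0$, and $p(x) = 0$ if $g(x) \le 0$.

The unconstrained problem is

Minimize $x_1^2 + 2x_2^2 + \mu\,p(x)$

If $p(x) = 0$, then the optimal solution is $x^* = (0,0)$, which is INFEASIBLE!

$\therefore\ p(x) = (1 - x_1 - x_2)^2$, $P(x) = x_1^2 + 2x_2^2 + \mu(1 - x_1 - x_2)^2$, and the necessary conditions for the optimal solution ($\nabla P(x) = 0$) yield the following:

$2x_1 + 2\mu(1 - x_1 - x_2)(-1) = 0$, $4x_2 + 2\mu(1 - x_1 - x_2)(-1) = 0$

Thus $x_1^* = \frac{2\mu}{2+3\mu}$ and $x_2^* = \frac{\mu}{2+3\mu}$.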

Starting with $\mu_1 = 0.1$, $\beta = 10$ and $x^1 = (0,0)$ and using a tolerance of 0.005 (say), we have the following:

Iter. (i)   $\mu_i$   $x^{i+1}$           $g(x^{i+1})$   $\mu_i\,p(x^{i+1})$
1           0.1       (0.087, 0.043)      0.87           0.0757
2           1.0       (0.4, 0.2)          0.40           0.16
3           10        (0.625, 0.3125)     0.0625         0.039
4           100       (0.6622, 0.3311)    0.0067         0.00449
5           1000      (0.666, 0.333)      0.001          0.001

Thus the optimal solution is $x^* = (2/3, 1/3)$.
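The procedure above can be sketched in a few lines of Python for this example. This is a minimal illustration, not a general solver: the inner minimization uses the closed-form stationary point derived above, and the function names (`penalty_minimizer`, `exterior_penalty`) are hypothetical.

```python
def penalty_minimizer(mu):
    """Minimize x1^2 + 2*x2^2 + mu*(1 - x1 - x2)^2 by solving grad = 0.

    Assumes the constraint is violated at the minimizer (true for every
    mu > 0 in this example), so the penalty term is active. The conditions
        (2 + 2mu) x1 +       2mu x2 = 2mu
              2mu x1 + (4 + 2mu) x2 = 2mu
    are linear and are solved with Cramer's rule.
    """
    a, b, c, d = 2 + 2 * mu, 2 * mu, 2 * mu, 4 + 2 * mu
    rhs = 2 * mu
    det = a * d - b * c
    return (rhs * d - b * rhs) / det, (a * rhs - c * rhs) / det


def exterior_penalty(mu=0.1, beta=10.0, tol=0.005, max_iter=20):
    """Return a list of (mu_i, x_{i+1}, g(x_{i+1}), mu_i * p(x_{i+1}))."""
    history = []
    for _ in range(max_iter):
        x = penalty_minimizer(mu)
        g = 1 - x[0] - x[1]
        penalty = mu * max(0.0, g) ** 2
        history.append((mu, x, g, penalty))
        if penalty < tol:      # stopping rule from the procedure
            break
        mu *= beta             # increase the penalty parameter
    return history


for mu, x, g, penalty in exterior_penalty():
    print(f"mu={mu:g}  x=({x[0]:.4f}, {x[1]:.4f})  g={g:.4f}  mu*p={penalty:.5f}")
```

Every iterate is infeasible ($g > 0$) and approaches $(2/3, 1/3)$ from outside the feasible region; the loop stops as soon as $\mu_i\,p(x^{i+1})$ falls below the tolerance.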



INTERIOR PENALTY (BARRIER) FUNCTION METHODS

These methods also transform the original problem into an unconstrained one; however, the barrier functions prevent the current solution from ever leaving the feasible region. These require that the interior of the feasible set be nonempty, which is impossible if equality constraints are present. Therefore, they are used with problems having only inequality constraints.

A barrier function B is one that is continuous and nonnegative over the interior of $\{x \mid g(x) \le 0\}$, i.e., over the set $\{x \mid g(x) < 0\}$, and approaches $\infty$ as the boundary is approached from the interior.

Let $\phi(y) \ge 0$ if $y < 0$ and $\lim_{y \to 0^-} \phi(y) = \infty$.

Then $B(x) = \sum_{j=1}^{m} \phi[g_j(x)]$.

Usually,

$B(x) = -\sum_{j=1}^{m} \frac{1}{g_j(x)}$ or $B(x) = -\sum_{j=1}^{m} \log(-g_j(x))$

In both cases, note that $\lim_{g_j(x) \to 0^-} B(x) = \infty$.

The auxiliary function is now

$f(x) + \mu B(x)$

where $\mu$ is a SMALL positive number.



Q. Why should $\mu$ be SMALL?

A. Ideally we would like $B(x) = 0$ if $g_j(x) < 0$ and $B(x) = \infty$ if $g_j(x) = 0$, so that we never leave the region $\{x \mid g(x) \le 0\}$. However, such a B(x) is discontinuous, which causes serious computational problems during the unconstrained optimization. A continuous barrier multiplied by a small $\mu$ approximates this ideal while remaining continuous over the interior.

As with exterior penalty functions, we do not choose just one small value for $\mu$; rather, we start with some $\mu_1$ and generate a sequence of points.

PROCEDURE: Initially choose a tolerance $\varepsilon$, a "decrease" factor $\beta$ (with $0 < \beta < 1$), an interior starting point $x^1$, and an initial $\mu_1$.

At Iteration i

1. Solve the problem Minimize $f(x) + \mu_i B(x)$, st $x \in X$ (usually $R^n$). Use $x^i$ as the starting point and let the optimal solution be $x^{i+1}$.

2. If $\mu_i B(x^{i+1}) < \varepsilon$ then STOP; else let $\mu_{i+1} = \beta\mu_i$ and start iteration (i+1).

Consider the previous example once again…


156

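EXAMPLE: Minimize $f(x) = x_1^2 + 2x_2^2$

st $g(x) = 1 - x_1 - x_2 \le 0$; $x \in R^2$

Define the barrier function

$B(x) = -\log(-g(x)) = -\log(x_1 + x_2 - 1)$

The unconstrained problem is

Minimize $x_1^2 + 2x_2^2 + \mu B(x) = x_1^2 + 2x_2^2 - \mu\log(x_1 + x_2 - 1)$

The necessary conditions for the optimal solution (gradient of the auxiliary function equal to 0) yield the following:

$2x_1 - \mu/(x_1 + x_2 - 1) = 0$, $4x_2 - \mu/(x_1 + x_2 - 1) = 0$

Solving, we get $x_1 = \frac{1 \pm \sqrt{1+3\mu}}{3}$ and $x_2 = \frac{1 \pm \sqrt{1+3\mu}}{6}$.

Since the negative signs lead to infeasibility, we have $x_1^* = \frac{1 + \sqrt{1+3\mu}}{3}$ and $x_2^* = \frac{1 + \sqrt{1+3\mu}}{6}$.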

Starting with $\mu_1 = 1$, $\beta = 0.1$ and $x^1 = (0,0)$ and using a tolerance of 0.005 (say), we have the following:

Iter. (i)   $\mu_i$   $x^{i+1}$           $g(x^{i+1})$   $\mu_i B(x^{i+1})$
1           1         (1.0, 0.5)          -0.5           0.693
2           0.1       (0.714, 0.357)      -0.071         0.265
3           0.01      (0.672, 0.336)      -0.008         0.048
4           0.001     (0.6672, 0.3336)    -0.0008        0.0070
5           0.0001    (0.6667, 0.3334)    -0.0001        0.0009

Thus the optimal solution is $x^* = (2/3, 1/3)$.
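The barrier iterations for this example can be sketched the same way. The snippet below is only an illustration under the assumptions of the example: it uses the closed-form minimizer with the positive root and the natural logarithm, and the function names are hypothetical.

```python
import math


def barrier_minimizer(mu):
    """Interior minimizer of x1^2 + 2*x2^2 - mu*log(x1 + x2 - 1)."""
    s = math.sqrt(1 + 3 * mu)
    return (1 + s) / 3, (1 + s) / 6   # positive root keeps x strictly feasible


def log_barrier(mu=1.0, beta=0.1, tol=0.005, max_iter=20):
    """Return a list of (mu_i, x_{i+1}, g(x_{i+1}), mu_i * B(x_{i+1}))."""
    history = []
    for _ in range(max_iter):
        x = barrier_minimizer(mu)
        g = 1 - x[0] - x[1]               # strictly negative in the interior
        barrier_term = mu * -math.log(-g)
        history.append((mu, x, g, barrier_term))
        if barrier_term < tol:            # stopping rule from the procedure
            break
        mu *= beta                        # decrease the barrier parameter
    return history


for mu, x, g, term in log_barrier():
    print(f"mu={mu:g}  x=({x[0]:.4f}, {x[1]:.4f})  g={g:.5f}  mu*B={term:.4f}")
```

In contrast to the exterior penalty method, every iterate here satisfies $g(x) < 0$, so the sequence approaches $(2/3, 1/3)$ from inside the feasible region.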



Penalty Function Methods and Lagrange Multipliers

Consider the penalty function approach to the problem below

Minimize $f(x)$

st $g_j(x) \le 0$; $j = 1,2,\dots,m$

$h_j(x) = 0$; $j = m+1, m+2, \dots, l$; $x \in R^n$.

Suppose we use the usual exterior penalty function described earlier, with the exponent equal to 2.

The auxiliary function that we minimize is then given by

$P(x) = f(x) + \mu\,p(x) = f(x) + \mu\{\sum_j [\max(0, g_j(x))]^2 + \sum_j [h_j(x)]^2\}$

$= f(x) + \sum_j \mu[\max(0, g_j(x))]^2 + \sum_j \mu[h_j(x)]^2$

The necessary condition for this to have a minimum is that

$\nabla P(x) = \nabla f(x) + \mu\nabla p(x) = 0$, i.e.,

$\nabla f(x) + \sum_j 2\mu[\max(0, g_j(x))]\nabla g_j(x) + \sum_j 2\mu[h_j(x)]\nabla h_j(x) = 0$    (1)

Suppose that the solution to (1) for a fixed $\mu$ (say $\mu_k > 0$) is given by $x^k$.

Let us also designate

$2\mu_k[\max\{0, g_j(x^k)\}] = \lambda_j(\mu_k)$; $j = 1,2,\dots,m$    (2)

$2\mu_k[h_j(x^k)] = \lambda_j(\mu_k)$; $j = m+1,\dots,l$    (3)

so that for $\mu = \mu_k$ we may rewrite (1) as


$\nabla f(x) + \sum_j \lambda_j(\mu_k)\nabla g_j(x) + \sum_j \lambda_j(\mu_k)\nabla h_j(x) = 0$    (4)

Now consider the Lagrangian for the original problem:

$L(x,\lambda) = f(x) + \sum_j \lambda_j g_j(x) + \sum_j \lambda_j h_j(x)$

The usual K-K-T necessary conditions yield

$\nabla f(x) + \sum_{j=1}^{m} \lambda_j\nabla g_j(x) + \sum_{j=m+1}^{l} \lambda_j\nabla h_j(x) = 0$    (5)

plus the original constraints, complementary slackness for the inequality constraints and $\lambda_j \ge 0$ for $j = 1,\dots,m$.

Comparing (4) and (5) we can see that when we minimize the auxiliary function using $\mu = \mu_k$, the $\lambda_j(\mu_k)$ values given by (2) and (3) estimate the Lagrange multipliers in (5). In fact it may be shown that as the penalty function method proceeds and $\mu_k \to \infty$ and $x^k \to x^*$, the optimum solution, the values of $\lambda_j(\mu_k) \to \lambda_j^*$, the optimum Lagrange multiplier value for constraint j.

Consider our example on page 153 once again. For this problem the Lagrangian is given by $L(x,\lambda) = x_1^2 + 2x_2^2 + \lambda(1 - x_1 - x_2)$.

The K-K-T conditions yield

$\partial L/\partial x_1 = 2x_1 - \lambda = 0$; $\partial L/\partial x_2 = 4x_2 - \lambda = 0$; $\lambda(1 - x_1 - x_2) = 0$

Solving these results in $x_1^* = 2/3$; $x_2^* = 1/3$; $\lambda^* = 4/3$ ($\lambda = 0$ yields an infeasible solution).

Recall that for fixed $\mu_k$ the optimum value of $x^k$ was $[2\mu_k/(2+3\mu_k),\ \mu_k/(2+3\mu_k)]^T$. As we saw, when $\mu_k \to \infty$, these converge to the optimum solution $x^* = (2/3, 1/3)$.

Now, suppose we use (2) to define $\lambda(\mu) = 2\mu[\max(0, g(x^k))]$

$= 2\mu[1 - \{2\mu/(2+3\mu)\} - \{\mu/(2+3\mu)\}]$ (since $g(x^k) > 0$ if $\mu > 0$)

$= 2\mu[1 - \{3\mu/(2+3\mu)\}] = 4\mu/(2+3\mu)$

Then it is readily seen that $\lim_{\mu\to\infty} \lambda(\mu) = \lim_{\mu\to\infty} 4\mu/(2+3\mu) = 4/3 = \lambda^*$.

Similar statements can also be made for the barrier (interior penalty) function approach; e.g., if we use the log-barrier function we have

$P(x) = f(x) + \mu B(x) = f(x) + \mu\sum_j -\log(-g_j(x))$,

so that

$\nabla P(x) = \nabla f(x) - \sum_j \{\mu/g_j(x)\}\nabla g_j(x) = 0$    (6)

If (like before) for a fixed $\mu_k$ we denote the solution by $x^k$ and define $-\mu_k/g_j(x^k) = \lambda_j(\mu_k)$, then from (6) and (5) we see that $\lambda_j(\mu_k)$ approximates $\lambda_j$. Furthermore, as $\mu_k \to 0$ and $x^k \to x^*$, it can be shown that $\lambda_j(\mu_k) \to \lambda_j^*$.

For our example (page 156), $-\mu_k/g(x^k) = \mu_k/(x_1^k + x_2^k - 1) = -2\mu_k/(1 - \sqrt{1+3\mu_k}) \to 4/3$, as $\mu_k \to 0$.
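Both multiplier estimates can be checked numerically for the running example. The sketch below simply evaluates the two closed-form expressions; the function names are hypothetical, and the parameter values are chosen only for illustration.

```python
import math


def penalty_multiplier(mu):
    """Estimate 2*mu*max(0, g(x(mu))) from the exterior penalty solution."""
    x1, x2 = 2 * mu / (2 + 3 * mu), mu / (2 + 3 * mu)
    return 2 * mu * max(0.0, 1 - x1 - x2)


def barrier_multiplier(mu):
    """Estimate -mu / g(x(mu)) from the log-barrier solution."""
    s = math.sqrt(1 + 3 * mu)
    x1, x2 = (1 + s) / 3, (1 + s) / 6
    return -mu / (1 - x1 - x2)


for k in range(0, 7, 2):
    big, small = 10.0 ** k, 10.0 ** (-k)
    print(f"penalty  mu={big:g}: {penalty_multiplier(big):.6f}   "
          f"barrier  mu={small:g}: {barrier_multiplier(small):.6f}")
```

The penalty estimate approaches 4/3 from below as $\mu$ grows, while the barrier estimate approaches 4/3 from above as $\mu$ shrinks.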



Penalty and Barrier function methods have been referred to as "SEQUENTIAL UNCONSTRAINED MINIMIZATION TECHNIQUES" (SUMT) and studied in detail by Fiacco and McCormick. While they are attractive in the simplicity of the principle on which they are based, they also possess several undesirable properties.

When the penalty parameters $\mu$ are very large in value, the penalty functions tend to be ill-behaved near the boundary of the constraint set, where the optimum points usually lie. Another problem is the choice of appropriate $\mu_1$ and $\beta$ values. The rate at which the $\mu_i$ change (i.e., the $\beta$ value) can seriously affect the computational effort required to find a solution. Also, as $\mu$ increases, the Hessian of the unconstrained function becomes ill-conditioned.

Some of these problems are addressed by the so-called "multiplier" methods. Here there is no need for $\mu$ to go to infinity, and the unconstrained function is better conditioned with no singularities. Furthermore, they also have faster rates of convergence than SUMT.



Ill-Conditioning of the Hessian Matrix

Consider the Hessian matrix of the auxiliary function $P(x) = x_1^2 + 2x_2^2 + \mu(1 - x_1 - x_2)^2$:

$H = \begin{bmatrix} 2+2\mu & 2\mu \\ 2\mu & 4+2\mu \end{bmatrix}$. Suppose we want to find its eigenvalues by solving

$|H - \lambda I| = (2 + 2\mu - \lambda)(4 + 2\mu - \lambda) - 4\mu^2$

$= \lambda^2 - (6 + 4\mu)\lambda + (8 + 12\mu) = 0$

This quadratic equation yields

$\lambda = (3 + 2\mu) \pm \sqrt{4\mu^2 + 1}$

Taking the ratio of the largest and the smallest eigenvalue yields

$\frac{(3 + 2\mu) + \sqrt{4\mu^2+1}}{(3 + 2\mu) - \sqrt{4\mu^2+1}}$.

It should be clear that as $\mu \to \infty$, the preceding ratio also goes to $\infty$ (the denominator tends to 3 while the numerator grows like $4\mu$). This indicates that as the iterations proceed and we start to increase the value of $\mu$, the Hessian of the unconstrained function that we are minimizing becomes increasingly ill-conditioned. This is a common situation and is especially problematic if we are using a method for the unconstrained optimization that requires the use of the Hessian.
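The growth of the condition number is easy to tabulate from the eigenvalue formula above. The following is a small sketch for this particular 2×2 Hessian only; the function names are hypothetical.

```python
import math


def hessian_eigenvalues(mu):
    """Eigenvalues of [[2+2mu, 2mu], [2mu, 4+2mu]], smallest first."""
    root = math.sqrt(4 * mu ** 2 + 1)
    return (3 + 2 * mu) - root, (3 + 2 * mu) + root


def condition_number(mu):
    """Ratio of the largest to the smallest eigenvalue."""
    smallest, largest = hessian_eigenvalues(mu)
    return largest / smallest


for mu in (0, 1, 10, 100, 1000):
    print(f"mu={mu:5d}  condition number = {condition_number(mu):.1f}")
```

For $\mu = 0$ the ratio is 2 (the Hessian of f alone), and for large $\mu$ it grows roughly linearly in $\mu$.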



MULTIPLIER METHODS

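Consider the problem: Min $f(x)$, st $g_j(x) \le 0$, $j = 1,2,\dots,m$.

In multiplier methods the auxiliary function is given by

$P(x) = f(x) + \frac{1}{2}\sum_{j=1}^{m} \mu_j[\max\{0, g_j(x) + \theta_j\}]^2$

where $\mu_j > 0$ and $\theta_j \ge 0$ are parameters associated with the jth constraint. This is also often referred to as the AUGMENTED LAGRANGIAN function.

Note that if $\theta_j = 0$ and $\mu_j = \mu$, this reduces to the usual exterior penalty function.

The basic idea behind multiplier methods is as follows:

Let $x^i$ be the minimum of the auxiliary function $P_i(x)$ at some iteration i. From the optimality conditions we have

$\nabla P_i(x^i) = \nabla f(x^i) + \sum_{j=1}^{m} \mu_j\max\{0, g_j(x^i) + \theta_j^i\}\nabla g_j(x^i) = 0$

Now, $\max\{0, g_j(x^i) + \theta_j^i\} = \theta_j^i + \max\{-\theta_j^i, g_j(x^i)\}$. Therefore $\nabla P_i(x^i) = 0$ implies

$\nabla f(x^i) + \sum_j \mu_j\theta_j^i\nabla g_j(x^i) + \sum_j \mu_j\max\{-\theta_j^i, g_j(x^i)\}\nabla g_j(x^i) = 0$

Let us assume for a moment that the $\theta_j^i$ are chosen at each iteration i so that they satisfy the following:

$\max\{-\theta_j^i, g_j(x^i)\} = 0$    (1)

It is easily verified that (1) is equivalent to requiring that

$\theta_j^i \ge 0$, $g_j(x^i) \le 0$ and $\theta_j^i g_j(x^i) = 0$    (2)

Therefore as long as (2) is satisfied, we have for the point $x^i$ that minimizes $P_i(x)$

$\nabla P_i(x^i) = \nabla f(x^i) + \sum_j \mu_j\theta_j^i\nabla g_j(x^i) = 0$    (3)

If we let $\lambda_j = \mu_j\theta_j^i$ and use the fact that $\mu_j > 0$, then (2) reduces to

$\lambda_j \ge 0$, $g_j(x^i) \le 0$ and $\lambda_j \cdot g_j(x^i) = 0$, $\forall j$    (A)

and (3) reduces to

$\nabla f(x^i) + \sum_j \lambda_j\nabla g_j(x^i) = 0$.    (B)

It is readily seen that (A) and (B) are merely the KARUSH-KUHN-TUCKER necessary conditions for $x^i$ to be a solution to the original problem below!

Min $f(x)$, st $g_j(x) \le 0$, $j = 1,2,\dots,m$.

We may therefore conclude that

$x^i = x^*$ and $\mu_j\theta_j^i = \lambda_j^*$

Here $x^*$ is the solution to the problem, and $\lambda_j^*$ is the optimum Lagrange multiplier associated with constraint j.

From the previous discussion it should be clear that at each iteration $\theta_j$ should be chosen in such a way that $\max\{-\theta_j^i, g_j(x^i)\} \to 0$, which results in $x^i \to x^*$. Note that $x^i$ is obtained here by minimizing the auxiliary function with respect to x.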



ALGORITHM: A general algorithmic approach for the multiplier method may now be stated as follows:

STEP 0: Set i=0, choose vectors $x^i$, $\mu$ and $\theta^i$.

STEP 1: Set i=i+1.

STEP 2: Starting at $x^{i-1}$, Minimize $P_i(x)$ to find $x^i$.

STEP 3: Check convergence criteria and go to Step 4 only if the criteria are not satisfied.

STEP 4: Modify $\theta^i$ based on satisfying (1). Also modify $\mu$ if necessary and go to Step 2.

(Usually $\theta^0$ is set to 0…)

One example of a formula for changing $\theta^i$ is the following:

$\theta_j^{i+1} = \theta_j^i + \max\{g_j(x^i), -\theta_j^i\}$
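As an illustration, the steps above can be tried on the earlier example (minimize $x_1^2 + 2x_2^2$ subject to $1 - x_1 - x_2 \le 0$). The sketch below rests on two labeled assumptions: the auxiliary function is taken in the shifted form $f(x) + \frac{1}{2}\mu[\max\{0, g(x) + \theta\}]^2$, and $\theta$ is updated by $\theta \leftarrow \theta + \max\{g(x), -\theta\}$. The inner minimization is done in closed form, and all function names are hypothetical.

```python
def augmented_minimizer(mu, theta):
    """Minimizer of x1^2 + 2*x2^2 + 0.5*mu*max(0, g + theta)^2, g = 1 - x1 - x2.

    Assumes g + theta > 0 at the minimizer (this holds here whenever
    mu > 0 and theta >= 0). Stationarity gives x1 = 2*x2 and
    4*x2 = mu*(1 + theta - 3*x2).
    """
    x2 = mu * (1 + theta) / (4 + 3 * mu)
    return 2 * x2, x2


def multiplier_method(mu=10.0, theta=0.0, tol=1e-8, max_iter=50):
    """Return (x, multiplier estimate mu*theta, iterations used)."""
    for i in range(1, max_iter + 1):
        x = augmented_minimizer(mu, theta)   # Step 2: minimize P_i
        g = 1 - x[0] - x[1]
        shift = max(g, -theta)               # quantity driven to zero
        if abs(shift) < tol:                 # Step 3: convergence check
            break
        theta += shift                       # Step 4: update theta (assumed rule)
    return x, mu * theta, i


x, lam, iters = multiplier_method()
print(f"x = ({x[0]:.6f}, {x[1]:.6f}), multiplier = {lam:.6f}, iterations = {iters}")
```

Note that $\mu$ stays fixed at a moderate value throughout; only $\theta$ changes, which is the practical appeal of the method compared with letting $\mu \to \infty$.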

PRIMAL METHODS

By a primal method we mean one that works directly on the original constrained

problem:

Min f(x)

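st $g_j(x) \le 0$, $j = 1,2,\dots,m$.

Almost all methods are based on the following general strategy:

Let $x^i$ be a design at iteration i. A new design $x^{i+1}$ is found by the expression $x^{i+1} = x^i + \alpha_i d^i$, where $\alpha_i$ is a step size parameter and $d^i$ is a search direction (vector). The direction vector $d^i$ is typically determined by an expression of the form:

$d^i = c_0\nabla f(x^i) + \sum_{j \in J_\varepsilon} c_j\nabla g_j(x^i)$, for method-dependent scalars $c_0$ and $c_j$,

where $\nabla f$ and $\nabla g_j$ are gradient vectors of the objective and constraint functions and $J_\varepsilon$ is an "active set" given by $J_\varepsilon = \{j \mid g_j(x^i) + \varepsilon \ge 0\}$, $\varepsilon > 0$.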

Unlike with transformation methods, it is evident that here the gradient vectors of

individual constraint functions need to be evaluated. This is a characteristic of all

primal methods.

METHOD OF FEASIBLE DIRECTIONS

The first class of primal methods we look at are the feasible directions methods, where each $x^i$ is within the feasible region. An advantage is that if the process is terminated before reaching the true optimum solution (as is usually necessary with large-scale problems) the terminating point is feasible and hence acceptable. The general strategy at iteration i is:

(1) Let $x^i$ be a feasible point.

(2) Choose a direction $d^i$ so that

a) $x^i + \alpha d^i$ is feasible at least for some "sufficiently" small $\alpha > 0$, and

b) $f(x^i + \alpha d^i) < f(x^i)$.

(3) Do an unconstrained optimization

Min over $\alpha$ of $f(x^i + \alpha d^i)$ to obtain $\alpha^* = \alpha_i$ and hence $x^{i+1} = x^i + \alpha_i d^i$

DEFINITION: For the problem Minimize $f(x)$, st $x \in X$,

$d\ (\ne 0)$ is called a feasible direction at $x' \in X$ if $\exists\ \delta > 0 \ni (x' + \alpha d) \in X$ for $\alpha \in [0, \delta)$.

[Figure: two sketches of a set X with a point x' and a direction d, one labeled "feasible directions" and the other "not feasible directions".]


Further, if we also have $f(x' + \alpha d) < f(x')$ for all $\alpha \in (0, \delta')$, $\delta' < \delta$, then d is called an improving feasible direction.

Under differentiability, recall that this implies

$\nabla f^T(x')d < 0$

Let $F_{x'} = \{d \mid \nabla f^T(x')d < 0\}$, and

$D_{x'} = \{d \mid \exists\ \delta > 0 \ni (x' + \alpha d) \in X$ for all $\alpha \in [0, \delta)\}$.

Thus $F_{x'}$ is the set of improving directions at x', and

$D_{x'}$ is the set of feasible directions at x'.

If x* is a local optimum then $F_{x^*} \cap D_{x^*} = \emptyset$ as long as a constraint qualification is met.

In general, for the problem

Minimize $f(x)$, st $g_j(x) \le 0$, $j = 1,2,\dots,m$.

let $J_{x'} = \{j \mid g_j(x') = 0\}$ (index set of active constraints)

We can show that the set of feasible directions at x' is

$D_{x'} = \{d \mid \nabla g_j^T(x')d < 0$ for all $j \in J_{x'}\}$

NOTE: If $g_j(x)$ is a linear function, then the strict inequality (<) used to define the set $D_{x'}$ above can be relaxed to an inequality ($\le$).



Geometric Interpretation

Minimize $f(x)$, st $g_j(x) \le 0$, $j = 1,2,\dots,m$

[Figure: two panels showing a point x* on the boundary $g_j(x) = 0$ together with the gradient $\nabla g_j(x^*)$. (i) $d^T\nabla g_j(x^*) < 0$: d is a feasible direction. (ii) $d^T\nabla g_j(x^*) = 0$: d is tangent to the constraint boundary.]

In (ii), any positive step in the direction d is infeasible (however we will often consider this a "feasible" direction…)

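FARKAS' LEMMA: Given $A \in R^{m \times n}$, $b \in R^m$, $z \in R^n$, $y \in R^m$, the following statements are equivalent to each other:

1. $y^TA \le 0 \Rightarrow y^Tb \le 0$

2. $\exists\ z$ such that $Az = b$, $z \ge 0$

Suppose $H_j$ is the hyperplane through the origin that is orthogonal to $A_j$ (the jth column of A). Then $y^TA \le 0 \Rightarrow y \in \cap_j H_j^-$, where $H_j^-$ is the closed half-space on the side of $H_j$ that does not contain $A_j$.

So $y^TA \le 0 \Rightarrow y^Tb \le 0$ simply implies that $(\cap_j H_j^-) \subseteq H_b^-$.

[Figure: vectors $A_1$, $A_2$, $A_3$ and b, the hyperplanes $H_1$, $H_2$, $H_3$ and $H_b$, and the region $\cap_j H_j^-$.]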

Application to NLP

Let $b \equiv -\nabla f(x^*)$, $A_j \equiv \nabla g_j(x^*)$,

$y \equiv d$ (direction vector), $z_j \equiv \lambda_j$ for $j \in J_{x^*} \equiv \{j \mid g_j(x^*) = 0\}$

Farkas' Lemma then implies that

(1) $d^T\nabla g_j(x^*) \le 0\ \forall j \in J_{x^*} \Rightarrow -d^T\nabla f(x^*) \le 0$, i.e., $d^T\nabla f(x^*) \ge 0$

(2) $\exists\ \lambda_j \ge 0$ such that $\sum_{j \in J_{x^*}} \lambda_j\nabla g_j(x^*) = -\nabla f(x^*)$

are equivalent! Note that (1) indicates that

• directions satisfying $d^T\nabla g_j(x^*) \le 0\ \forall j \in J_{x^*}$ are feasible directions

• directions satisfying $d^T\nabla f(x^*) \ge 0$ are ascent (non-improving) directions

On the other hand (2) indicates the K-K-T conditions. They are equivalent!

So at the optimum, there is no feasible direction that is improving…

[Figure: feasible region bounded by $g_1(x) = 0$, $g_2(x) = 0$ and $g_3(x) = 0$, with the gradients $\nabla g_1(x^*)$, $\nabla g_2(x^*)$, $\nabla g_3(x^*)$ and $-\nabla f(x^*)$ drawn at x*, and the cone generated by the gradients of the tight constraints. K-K-T implies that the steepest descent direction is in the cone generated by gradients of tight constraints.]

Similarly for the general NLP problem



Minimize $f(x)$

st $g_j(x) \le 0$, $j = 1,2,\dots,m$

$h_k(x) = 0$, $k = 1,2,\dots,p$

the set of feasible directions is

$D_{x'} = \{d \mid \nabla g_j^T(x')d < 0\ \forall j \in J_{x'}$, and $\nabla h_k^T(x')d = 0\ \forall k\}$

====================================================

In a feasible directions algorithm we have the following:

STEP 1 (direction finding)

To generate an improving feasible direction from x', we need to find $d \in F_{x'} \cap D_{x'}$ (which requires $F_{x'} \cap D_{x'} \ne \emptyset$), i.e.,

$\nabla f^T(x')d < 0$ and $\nabla g_j^T(x')d < 0$ for $j \in J_{x'}$, $\nabla h_k^T(x')d = 0\ \forall k$

We therefore solve the subproblem

Minimize $\nabla f^T(x')d$

st $\nabla g_j^T(x')d < 0$ for $j \in J_{x'}$

($\le 0$ for $j \ni g_j$ linear)

$\nabla h_k^T(x')d = 0$ for all k,

along with some "normalizing" constraints such as

$-1 \le d_i \le 1$ for $i = 1,2,\dots,n$ or $\|d\|^2 = d^Td \le 1$.

If the objective of this subproblem is negative, we have an improving feasible direction $d^i$.

In practice, the actual subproblem we solve is slightly different since strict inequality (<) constraints are difficult to handle.

Let $z = \max[\nabla f^T(x')d,\ \nabla g_j^T(x')d$ for $j \in J_{x'}]$. Then we solve:

Minimize z

st $\nabla f^T(x')d \le z$

$\nabla g_j^T(x')d \le z$ for $j \in J_{x'}$

$\nabla h_k^T(x')d = 0$ for $k = 1,2,\dots,p$,

$-1 \le d_i \le 1$ for $i = 1,2,\dots,n$

If (z*, d*) is the optimum solution for this then

a) $z^* < 0 \Rightarrow d = d^*$ is an improving feasible direction

b) $z^* = 0 \Rightarrow x'$ is a Fritz John point (or if a CQ is met, a K-K-T point)

(Note that z* can never be greater than 0, since d = 0, z = 0 is always feasible.)

STEP 2: (line search for step size)

Assuming that z* from the first step is negative, we now solve

Min $f(x^i + \alpha d^i)$

st $g_j(x^i + \alpha d^i) \le 0$, $\alpha \ge 0$

This can be rewritten as

Min $f(x^i + \alpha d^i)$

st $0 \le \alpha \le \alpha_{max}$

where $\alpha_{max}$ = supremum $\{\alpha \mid g_j(x^i + \alpha d^i) \le 0,\ j = 1,2,\dots,m\}$. Let the optimum solution to this be at $\alpha^* = \alpha_i$.

Then we set $x^{i+1} = x^i + \alpha_i d^i$ and return to Step 1.



NOTE: (1) In practice the set $J_{x'}$ is usually defined as $J_{x'} = \{j \mid g_j(x^i) + \varepsilon \ge 0\}$, where the tolerance $\varepsilon$ is referred to as the "constraint thickness," which may be reduced as the algorithm proceeds.

(2) This procedure is usually attributed to ZOUTENDIJK

EXAMPLE

Minimize $2x_1^2 + x_2^2 - 2x_1x_2 - 4x_1 - 6x_2$

st $x_1 + x_2 \le 8$

$-x_1 + 2x_2 \le 10$

$-x_1 \le 0$

$-x_2 \le 0$ (A QP…)

For this, we have

$\nabla g_1 = (1, 1)^T$, $\nabla g_2 = (-1, 2)^T$, $\nabla g_3 = (-1, 0)^T$, $\nabla g_4 = (0, -1)^T$, and $\nabla f(x) = (4x_1 - 2x_2 - 4,\ 2x_2 - 2x_1 - 6)^T$.

Let us begin with $x^0 = (0, 0)^T$, with $f(x^0) = 0$.

ITERATION 1: J = {3,4}. STEP 1 yields

Min z

st $\nabla f^T(x^0)d \le z$, i.e., $d_1(4x_1 - 2x_2 - 4) + d_2(2x_2 - 2x_1 - 6) \le z$

$\nabla g_3^T(x^0)d \le z$, i.e., $-d_1 \le z$

$\nabla g_4^T(x^0)d \le z$, i.e., $-d_2 \le z$

$-1 \le d_1, d_2 \le 1$


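i.e., Min z

st $-4d_1 - 6d_2 \le z$, $-d_1 \le z$, $-d_2 \le z$, $d_1, d_2 \in [-1, 1]$

yielding $z^* = -1$ (< 0), with $d^* = d^0 = (1, 1)^T$.

STEP 2 (line search):

$x^0 + \alpha d^0 = (0, 0)^T + \alpha(1, 1)^T = (\alpha, \alpha)^T$

Thus $\alpha_{max} = \max\{\alpha \mid 2\alpha \le 8,\ \alpha \le 10,\ -\alpha \le 0,\ -\alpha \le 0\} = 4$

We therefore solve: Minimize $f(x^0 + \alpha d^0)$, st $0 \le \alpha \le 4$

$\Rightarrow \alpha^* = \alpha_0 = 4 \Rightarrow x^1 = (4, 4)^T$, with $f(x^1) = -24$.

ITERATION 2: J = {1}. STEP 1 yields

Min z

st $d_1(4x_1 - 2x_2 - 4) + d_2(2x_2 - 2x_1 - 6) \le z$

$d_1 + d_2 \le z$

$-1 \le d_1, d_2 \le 1$

i.e., Min z

st $4d_1 - 6d_2 \le z$, $d_1 + d_2 \le z$, $d_1, d_2 \in [-1, 1]$

The direction $d^1 = (-1, 0)^T$ gives $z = \max\{-4, -1\} = -1$ (< 0) in this LP, so it is an improving feasible direction and we proceed with it. (It is not the LP optimum, which is $z^* = -10/7$ at $d = (-1, -3/7)^T$.)

STEP 2 (line search):

$x^1 + \alpha d^1 = (4, 4)^T + \alpha(-1, 0)^T = (4 - \alpha, 4)^T$

Thus $\alpha_{max} = \max\{\alpha \mid 8 - \alpha \le 8,\ -4 + \alpha + 8 \le 10,\ \alpha - 4 \le 0,\ -4 \le 0\} = 4$

We therefore solve: Minimize $2(4-\alpha)^2 + 4^2 - 2(4-\alpha)4 - 4(4-\alpha) - 6(4)$, st $0 \le \alpha \le 4$.

$\Rightarrow \alpha^* = \alpha_1 = 1 \Rightarrow x^2 = (3, 4)^T$, with $f(x^2) = -26$.

ITERATION 3: J = {}. STEP 1 yields

Min z

st $d_1(4x_1 - 2x_2 - 4) + d_2(2x_2 - 2x_1 - 6) \le z$

$-1 \le d_1, d_2 \le 1$

i.e., Min z

st $-4d_2 \le z$, $d_1, d_2 \in [-1, 1]$

yielding $z^* = -4$ (< 0), with $d^* = d^2 = (0, 1)^T$.

STEP 2 (line search):

$x^2 + \alpha d^2 = (3, 4)^T + \alpha(0, 1)^T = (3, 4 + \alpha)^T$

Thus $\alpha_{max} = \max\{\alpha \mid 7 + \alpha \le 8,\ 5 + 2\alpha \le 10,\ -3 \le 0,\ -4 - \alpha \le 0\} = 1$

We therefore solve: Minimize $f(x^2 + \alpha d^2)$, st $0 \le \alpha \le 1$

i.e., Minimize $2(3)^2 + (4+\alpha)^2 - 2(3)(4+\alpha) - 4(3) - 6(4+\alpha)$, st $0 \le \alpha \le 1$

$\Rightarrow \alpha^* = \alpha_2 = 1 \Rightarrow x^3 = (3, 5)^T$, with $f(x^3) = -29$.

ITERATION 4: J = {1}. STEP 1 yields

Min z

st $d_1(4x_1 - 2x_2 - 4) + d_2(2x_2 - 2x_1 - 6) \le z$

$d_1 + d_2 \le z$

$-1 \le d_1, d_2 \le 1$

i.e., Min z

st $-2d_1 - 2d_2 \le z$, $d_1 + d_2 \le z$, $d_1, d_2 \in [-1, 1]$

yielding $z^* = 0$, with $d^* = (0, 0)^T$. STOP

Therefore the optimum solution is given by $x^* = (3, 5)^T$, with $f(x^*) = -29$.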
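The line-search step (Step 2) of this example can be reproduced with a short Python sketch. It takes the three search directions used in the text as given and does not solve the direction-finding LP; it exploits the facts that the constraints are linear and the objective is a convex quadratic, so both $\alpha_{max}$ and the minimizing step have closed forms. The helper names (`max_step`, `line_search`) are hypothetical.

```python
def f(x):
    x1, x2 = x
    return 2 * x1 ** 2 + x2 ** 2 - 2 * x1 * x2 - 4 * x1 - 6 * x2


def grad_f(x):
    x1, x2 = x
    return (4 * x1 - 2 * x2 - 4, 2 * x2 - 2 * x1 - 6)


# Linear constraints written as a . x <= b.
A = [(1, 1), (-1, 2), (-1, 0), (0, -1)]
b = [8, 10, 0, 0]
Q = ((4, -2), (-2, 2))   # constant Hessian of f (positive definite)


def max_step(x, d):
    """Largest alpha keeping x + alpha*d feasible (ratio test)."""
    alpha_max = float("inf")
    for a, bj in zip(A, b):
        slope = a[0] * d[0] + a[1] * d[1]
        if slope > 0:    # only constraints we move toward can block the step
            slack = bj - (a[0] * x[0] + a[1] * x[1])
            alpha_max = min(alpha_max, slack / slope)
    return alpha_max


def line_search(x, d):
    """Minimize f(x + alpha*d) over 0 <= alpha <= alpha_max."""
    g = grad_f(x)
    slope = g[0] * d[0] + g[1] * d[1]
    curvature = (d[0] * (Q[0][0] * d[0] + Q[0][1] * d[1])
                 + d[1] * (Q[1][0] * d[0] + Q[1][1] * d[1]))
    alpha = min(-slope / curvature, max_step(x, d))
    return (x[0] + alpha * d[0], x[1] + alpha * d[1])


x = (0.0, 0.0)
path = [x]
for d in [(1, 1), (-1, 0), (0, 1)]:   # directions taken from the text
    x = line_search(x, d)
    path.append(x)
print(path, f(x), grad_f(x))
```

At the final point the gradient of f is a negative multiple of $\nabla g_1 = (1, 1)^T$, which is the K-K-T condition with only constraint 1 active.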

[Figure: path followed by the algorithm, $x^0 = (0,0) \to x^1 = (4,4) \to x^2 = (3,4) \to x^3 = (3,5)$, inside the feasible region bounded by $x_1 + x_2 = 8$, $-x_1 + 2x_2 = 10$ and the coordinate axes.]

GRADIENT PROJECTION METHOD (ROSEN)

Recall that the steepest descent direction is $-\nabla f(x)$. However, for constrained problems, moving along $-\nabla f(x)$ may destroy feasibility. Rosen's method works by projecting $-\nabla f(x)$ on to the hyperplane tangent to the set of active constraints. By doing so it tries to improve the objective while simultaneously maintaining feasibility. It uses $d = -P\nabla f(x)$ as the search direction, where P is a projection matrix.

Properties of P ($n \times n$ matrix)

1) $P^T = P$ and $P^TP = P$ (i.e., P is symmetric and idempotent)

2) P is positive semidefinite

3) P is a projection matrix if, and only if, I-P is also a projection matrix.

4) Let Q = I-P and $p = Px_1$, $q = Qx_2$ where $x_1, x_2 \in R^n$. Then

(a) $p^Tq = q^Tp = 0$ (p and q are orthogonal)

(b) Any $x \in R^n$ can be uniquely expressed as x = p+q, where p = Px, q = (I-P)x

[Figure: a vector x in the plane decomposed into its orthogonal components p and q; given x, we can find p and q.]

Consider the simpler case where all constraints are linear. Assume that there are 2 linear constraints intersecting along a line as shown below.

[Figure: constraints $g_1(x) = 0$ and $g_2(x) = 0$ with gradients $\nabla g_1$ and $\nabla g_2$; the vector $-\nabla f$ is split into $P(-\nabla f)$, along the line of intersection, and $(I-P)(-\nabla f)$.]

Obviously, moving along $-\nabla f$ would take us outside the feasible region. Say we project $-\nabla f$ on to the feasible region using the matrix P.

THEOREM: As long as P is a projection matrix and $P\nabla f \ne 0$, $d = -P\nabla f$ will be an improving direction.

PROOF: $\nabla f^Td = \nabla f^T(-P\nabla f) = -\nabla f^TP^TP\nabla f = -\|P\nabla f\|^2 < 0$. (QED)

For $-P\nabla f$ to also be feasible we must have $\nabla g_1^T(-P\nabla f) \le 0$ and $\nabla g_2^T(-P\nabla f) \le 0$ (by definition…).

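Say we pick P such that $\nabla g_1^T(-P\nabla f) = \nabla g_2^T(-P\nabla f) = 0$, so that $-P\nabla f$ is a feasible direction. Thus if we denote M as the ($2 \times n$) matrix where Row 1 is $\nabla g_1^T$ and Row 2 is $\nabla g_2^T$, then we have $M(-P\nabla f) = 0$    (*)

To find the form of the matrix P, we make use of Property 4(b) to write the vector $-\nabla f$ as $-\nabla f = P(-\nabla f) + [I-P](-\nabla f) = -P\nabla f - [I-P]\nabla f$

Now, $\nabla g_1$ and $\nabla g_2$ are both orthogonal to $-P\nabla f$, and so is $[I-P](-\nabla f)$. Hence we must have

$[I-P](-\nabla f) = -[I-P]\nabla f = \lambda_1\nabla g_1 + \lambda_2\nabla g_2$

$\Rightarrow -[I-P]\nabla f = M^T\lambda$, where $\lambda = (\lambda_1, \lambda_2)^T$

$\Rightarrow -M[I-P]\nabla f = MM^T\lambda$

$\Rightarrow -(MM^T)^{-1}M[I-P]\nabla f = (MM^T)^{-1}MM^T\lambda = \lambda$

$\Rightarrow \lambda = -\{(MM^T)^{-1}M\nabla f - (MM^T)^{-1}MP\nabla f\}$

$\Rightarrow \lambda = -\{(MM^T)^{-1}M\nabla f + (MM^T)^{-1}M(-P\nabla f)\}$

$\Rightarrow \lambda = -(MM^T)^{-1}M\nabla f$ (from (*) above, $M(-P\nabla f) = 0$…)

Hence we have

$-\nabla f = -P\nabla f - [I-P]\nabla f = -P\nabla f + M^T\lambda = -P\nabla f - M^T(MM^T)^{-1}M\nabla f$

i.e., $P\nabla f = \nabla f - M^T(MM^T)^{-1}M\nabla f = [I - M^T(MM^T)^{-1}M]\nabla f$

i.e., $P = [I - M^T(MM^T)^{-1}M]$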

Question: What if $P\nabla f(x) = 0$?

Answer: Then $0 = P\nabla f = [I - M^T(MM^T)^{-1}M]\nabla f = \nabla f + M^Tw$, where $w = -(MM^T)^{-1}M\nabla f$. If $w \ge 0$, then x satisfies the Karush-Kuhn-Tucker conditions and we may stop; if not, a new projection matrix $\hat{P}$ could be identified such that $d = -\hat{P}\nabla f$ is an improving feasible direction. In order to identify this matrix we merely pick any component of w that is negative, and drop the corresponding row from the matrix M. Then we use the usual formula to obtain $\hat{P}$.
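The projection formula and the multiplier vector w can be evaluated with a few lines of plain Python. The sketch below is illustrative only: it assumes M has full row rank, uses a small Gauss-Jordan solver instead of a linear algebra library, and borrows the gradient $(4x_1 - 2x_2 - 4,\ 2x_2 - 2x_1 - 6)$ and the active constraint $x_1 + x_2 \le 8$ from the earlier QP example as test data. All helper names are hypothetical.

```python
def solve(A, B):
    """Solve A X = B (A square, nonsingular) by Gauss-Jordan elimination."""
    n = len(A)
    aug = [list(map(float, A[r])) + list(map(float, B[r])) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def projection_matrix(M):
    """P = I - M^T (M M^T)^{-1} M, with M given as a list of rows."""
    n = len(M[0])
    Mt = transpose(M)
    Y = solve(matmul(M, Mt), M)            # Y = (M M^T)^{-1} M
    MtY = matmul(Mt, Y)
    return [[(1.0 if i == j else 0.0) - MtY[i][j] for j in range(n)]
            for i in range(n)]


def projected_direction(M, grad):
    """d = -P grad."""
    P = projection_matrix(M)
    return [-sum(p * g for p, g in zip(row, grad)) for row in P]


def multipliers(M, grad):
    """w = -(M M^T)^{-1} M grad."""
    Mg = matmul(M, [[g] for g in grad])
    return [-row[0] for row in solve(matmul(M, transpose(M)), Mg)]


M = [[1, 1]]                                 # gradient of x1 + x2 <= 8
print(projected_direction(M, (4, -6)))       # at x = (4, 4)
print(projected_direction(M, (-2, -2)), multipliers(M, (-2, -2)))  # at x = (3, 5)
```

At (4, 4) the projected direction is nonzero and tangent to the active constraint; at (3, 5) it vanishes and w is nonnegative, so the stopping test in the answer above is met.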

====================

Now consider the general problem

Minimize $f(x)$

st $g_j(x) \le 0$, $j = 1,2,\dots,m$

$h_k(x) = 0$, $k = 1,2,\dots,p$

The gradient projection algorithm can be generalized to nonlinear constraints by projecting the gradient on to the tangent hyperplane at x (rather than on to the surface of the feasible region itself).

STEP 1: SEARCH DIRECTION

Let $x^i$ be a feasible point and let J be the "active set", i.e., $J = \{j \mid g_j(x^i) = 0\}$, or more practically, $J = \{j \mid g_j(x^i) + \varepsilon \ge 0\}$.

Suppose each $g_j$, $j \in J$, is approximated using the Taylor series expansion by:

$g_j(x) \approx g_j(x^i) + (x - x^i)^T\nabla g_j(x^i)$.

Notice that this approximation is a linear function of x with slope $\nabla g_j(x^i)$.

• Let M be the matrix whose rows are $\nabla g_j^T(x^i)$ for $j \in J$ (active constraints) and $\nabla h_k^T(x^i)$ for $k = 1,2,\dots,p$ (equality constraints)

• Let $P = [I - M^T(MM^T)^{-1}M]$ (projection matrix). If M does not exist (i.e., there are no active or equality constraints), let P = I.

• Let $d^i = -P\nabla f(x^i)$.

a) If $d^i = 0$ and M does not exist STOP

b) If $d^i = 0$ and M exists find

$w = (u, v)^T = -(MM^T)^{-1}M\nabla f(x^i)$

where u corresponds to g and v to h. If $u \ge 0$ STOP, else delete the row of M corresponding to some $u_j < 0$ and return to Step 1.

c) If $d^i \ne 0$ go to Step 2.

STEP 2. STEP SIZE (Line Search)

Solve Minimize $f(x^i + \alpha d^i)$, st $0 \le \alpha \le \alpha_{max}$

where $\alpha_{max}$ = Supremum $\{\alpha \mid g_j(x^i + \alpha d^i) \le 0,\ h_j(x^i + \alpha d^i) = 0\}$.

Call the optimal solution $\alpha^*$ and find $x^{i+1} = x^i + \alpha^*d^i$. If not all constraints are linear, make a "correction move" to return x into the feasible region. The need for this is shown below. Generally, it could be done by solving g(x) = 0 starting at $x^{i+1}$ using the Newton-Raphson approach, or by moving orthogonally from $x^{i+1}$, etc.

[Figure: two panels showing objective function (O.F.) contours, $-\nabla f$ and $-P\nabla f$. Left: $g_1$, $g_2$ are linear constraints; NO CORRECTION REQD. Right: $g_1$ is a nonlinear constraint, the point $x^{i+1}$ leaves the boundary $g_1 = 0$ and a correction move returns it; REQUIRES CORRECTION.]

LINEARIZATION METHODS

The general idea of linearization methods is to replace the solution of the NLP by the

solution of a series of linear programs which approximate the original NLP.

The simplest method is RECURSIVE LP:

Transform Min $f(x)$, st $g_j(x) \le 0$, $j = 1,2,\dots,m$,

into the LP

Min $\{f(x^i) + (x - x^i)^T\nabla f(x^i)\}$

st $g_j(x^i) + (x - x^i)^T\nabla g_j(x^i) \le 0$, for all j.

Thus we start with some initial $x^i$ where the objective and constraints (usually only the tight ones) are linearized, and then solve the LP to obtain a new $x^{i+1}$ and continue...

Note that $f(x^i)$ is a constant and $x - x^i = d$ implies the problem is equivalent to

Min $d^T\nabla f(x^i)$

st $d^T\nabla g_j(x^i) \le -g_j(x^i)$, for all j.

and this is a direction finding LP problem, and $x^{i+1} = x^i + d$ implies that the step size = 1.0!

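EXAMPLE: Min $f(x) = 4x_1 - x_2^2 - 12$,

st $g_1(x) = x_1^2 + x_2^2 - 25 \le 0$,

$g_2(x) = -x_1^2 - x_2^2 + 10x_1 + 10x_2 - 34 \ge 0$

If this is linearized at the point $x^i = (2, 4)$, then since

$\nabla f(x^i) = (4, -2x_2)^T = (4, -8)^T$; $\nabla g_1(x^i) = (2x_1, 2x_2)^T = (4, 8)^T$; $\nabla g_2(x^i) = (-2x_1 + 10, -2x_2 + 10)^T = (6, 2)^T$

it yields the following linear program (VERIFY…)

Min $\tilde{f}(x) = 4x_1 - 8x_2 + 4$,

st $\tilde{g}_1(x) = 4x_1 + 8x_2 - 45 \le 0$,

$\tilde{g}_2(x) = 6x_1 + 2x_2 - 14 \ge 0$

[Figure: contours of f (f = 0, -10, -20) and of the linearized objective $\tilde{f}$ ($\tilde{f}$ = 0, -30), together with the constraint boundaries $g_1 = 0$, $g_2 = 0$ and their linearizations $\tilde{g}_1 = 0$, $\tilde{g}_2 = 0$; the LP optimum is marked with *.]

The method however has limited application since it does not converge if the local minimum occurs at a point that is not a vertex of the feasible region; in such cases it oscillates between adjacent vertices. To avoid this one may limit the step size (for instance, the Frank-Wolfe method minimizes f between $x^i$ and $x^i + d$ to get $x^{i+1}$).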
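The "VERIFY" step for the recursive LP example (linearization at the point (2, 4)) can be done mechanically. The sketch below computes the first-order Taylor expansion of each function at that point from hand-coded gradients; `linearize` is a hypothetical helper, and only the coefficients are checked, not the LP solution.

```python
def linearize(func, grad, xi):
    """Return (coeffs, const) such that func(x) ~ coeffs . x + const near xi."""
    g = grad(xi)
    const = func(xi) - sum(gk * xk for gk, xk in zip(g, xi))
    return list(g), const


def f(x):
    return 4 * x[0] - x[1] ** 2 - 12


def grad_f(x):
    return (4, -2 * x[1])


def g1(x):
    return x[0] ** 2 + x[1] ** 2 - 25


def grad_g1(x):
    return (2 * x[0], 2 * x[1])


def g2(x):
    return -x[0] ** 2 - x[1] ** 2 + 10 * x[0] + 10 * x[1] - 34


def grad_g2(x):
    return (-2 * x[0] + 10, -2 * x[1] + 10)


xi = (2, 4)
for name, func, grad in (("f", f, grad_f), ("g1", g1, grad_g1), ("g2", g2, grad_g2)):
    coeffs, const = linearize(func, grad, xi)
    print(f"{name}: {coeffs[0]}*x1 + {coeffs[1]}*x2 + {const}")
```

The printed coefficients can be compared directly with the linear program stated in the example.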

RECURSIVE QUADRATIC PROGRAMMING (RQP)

RQP is similar in logic to RLP, except that it solves a sequence of quadratic programming problems. RQP is a better approach because the second order terms help capture curvature information.

Min $d^T\nabla L(x^i) + \frac{1}{2}d^TBd$

st $d^T\nabla g_j(x^i) \le -g_j(x^i)\ \forall j$

where B is the Hessian (or an approximation of the Hessian) of the Lagrangian function at the point $x^i$. Notice that this is a QP in d with linear constraints. Once the above QP is solved to get $d^i$, we obtain $x^{i+1}$ by finding an optimum step size along $x^i + \alpha d^i$ and continue. The matrix B is never explicitly computed; it is usually updated by schemes such as the BFGS approach using information from $\nabla f(x^i)$ and $\nabla f(x^{i+1})$ only. (Recall the quasi-Newton methods for unconstrained optimization.)

KELLEY'S CUTTING PLANE METHOD

One of the better linearization methods is the cutting plane algorithm of J.E. Kelley, developed for convex programming problems.

Suppose we have a set of nonlinear constraints $g_j(x) \le 0$ for $j = 1,2,\dots,m$ that determine the feasible region G. Suppose further that we can find a polyhedral set H that entirely contains the region determined by these constraints, i.e., $G \subseteq H$.

[Figure: the region G, bounded by the nonlinear constraints $g_1 = 0$, $g_2 = 0$, $g_3 = 0$, enclosed in the set H, bounded by the linear constraints $h_1 = 0$, $h_2 = 0$, $h_3 = 0$, $h_4 = 0$. G: $g_j(x)$ nonlinear. H: $h_j(x)$ linear; hence H is polyhedral.]

In general, a cutting plane algorithm would operate as follows:

1. Solve the problem with linear constraints.

2. If the optimal solution to this problem is feasible in the original (nonlinear)

constraints then STOP; it is also optimal for the original problem.

3. Otherwise add an extra linear constraint (i.e., an extra line/hyperplane) so that the

current optimal point becomes infeasible for the new problem with the additional

constraint (we thus "cut out" part of the infeasibility).

Now return to Step 1.



JUSTIFICATION

Step 1: It is in general easier to solve problems with linear constraints than ones with

nonlinear constraints.

Step 2: G is determined by $g_j(x) \le 0$, $j = 1,\dots,m$, where each $g_j$ is convex, while H is determined by $h_k(x) \le 0\ \forall k$, where each $h_k$ is linear. G is wholly contained inside the polyhedral set H.

Thus every point that satisfies $g_j(x) \le 0$ automatically satisfies $h_k(x) \le 0$, but not vice-versa. Thus G places "extra" constraints on the design variables and therefore the optimal solution with G can never be any better than the optimal solution with H. Therefore, if for some region H, the optimal solution x* is also feasible in G, it MUST be optimal for G also. (NOTE: In general, the optimal solution for H at some iteration will not be feasible in G).

Step 3: Generating a "CUT"

In Step 2, let $\bar{g}(x^i) = \max_{j \in J} g_j(x^i)$, where J is the set of violated constraints, $J = \{j \mid g_j(x^i) > 0\}$, i.e., $\bar{g}(x^i)$ is the value of the most violated constraint at $x^i$. By the Taylor series approximation around $x^i$,

$\bar{g}(x) = \bar{g}(x^i) + (x - x^i)^T\nabla\bar{g}(x^i) + \dots$

$\Rightarrow \bar{g}(x) \ge \bar{g}(x^i) + (x - x^i)^T\nabla\bar{g}(x^i)$, for every $x \in R^n$ (since $\bar{g}$ is convex).

If $\nabla\bar{g}(x^i) = 0$ then $\bar{g}(x) \ge \bar{g}(x^i)$.

But since $\bar{g}$ is violated at $x^i$, $\bar{g}(x^i) > 0$. Therefore $\bar{g}(x) > 0\ \forall x \in R^n$.

$\Rightarrow \bar{g}$ is violated for all x, i.e., the original problem is infeasible.

If $\nabla\bar{g}(x^i) \ne 0$, then consider the following linear constraint:

$h_k(x) = \bar{g}(x^i) + (x - x^i)^T\nabla\bar{g}(x^i) \le 0$    (C)

(Note that $x^i$, $\nabla\bar{g}(x^i)$ and $\bar{g}(x^i)$ are all known…)

The current point $x^i$ is obviously infeasible in this constraint, since $h_k(x^i) \le 0$ implies that $\bar{g}(x^i) + (x^i - x^i)^T\nabla\bar{g}(x^i) = \bar{g}(x^i) \le 0$, which contradicts the fact that $\bar{g}$ is violated at $x^i$ (i.e., $\bar{g}(x^i) > 0$).

So, if we add this extra linear constraint, then the optimal solution would change because the current optimal solution would no longer be feasible for the new LP with the added constraint. Moreover, by the convexity inequality above, every point of G (where $\bar{g}(x) \le 0$) satisfies (C), so no feasible point is cut off. Hence (C) defines a suitable cutting plane.
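A single cut of this kind can be generated in a few lines. The sketch below is an illustration under assumed data: it uses the convex function $g(x) = x_1^2 + x_2^2 - 25$ and the infeasible point (4, 4), both chosen only for the example, and the helper name `make_cut` is hypothetical.

```python
def make_cut(g, grad_g, xi):
    """Return (a, b) for the cut a . x <= b, i.e. g(xi) + (x - xi) . grad_g(xi) <= 0."""
    a = grad_g(xi)
    b = sum(ak * xk for ak, xk in zip(a, xi)) - g(xi)
    return a, b


def satisfies(cut, x):
    a, b = cut
    return sum(ak * xk for ak, xk in zip(a, x)) <= b


def g(x):                     # convex constraint function, feasible if g(x) <= 0
    return x[0] ** 2 + x[1] ** 2 - 25


def grad_g(x):
    return (2 * x[0], 2 * x[1])


xi = (4, 4)                   # assumed LP optimum that violates g (g = 7 > 0)
cut = make_cut(g, grad_g, xi)
print(cut)                    # coefficients and right-hand side of the new constraint
print(satisfies(cut, xi))     # the current point is cut off
print(all(satisfies(cut, x) for x in [(3, 4), (5, 0), (0, 0), (0, -5)]))  # points of G remain
```

The cut removes the current point while keeping every point at which g is nonpositive, which is exactly the property argued in Step 3.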

NOTE: If the objective for the original NLP was linear, then at each iteration we

merely solve a linear program!



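An example (in $R^2$) of how a cutting plane algorithm might proceed (dashed lines denote cutting planes):

[Figure: four panels showing G inside successively tighter polyhedral sets H, with the successive LP optima numbered 1 to 6. Panel 1: nonnegativity + 3 linear constraints. Panel 2: nonnegativity + 4 linear constraints (cut $h_4$ added). Panel 3: nonnegativity + 5 linear constraints (cut $h_5$ added). Panel 4: nonnegativity + 6 linear constraints (cut $h_6$ added); point 6 coincides with the true optimum x* for G.]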
