Contents
Chapter I
Introductory Example
Development Stages
Degrees of Freedom
1 Introductory Example
Problem
Schedule your activities for next week-end, given you only have
$100 to spend.
Question
A systematic approach might be :
Step 3 : Specify other limitations.
tavailable =
$available =
2 General Formulation
Subject to
3 Development Stages
Stage 1 : Problem Definition
Objectives
goal of optimization
performance measurement
example : max profit, min energy cost, etc
max(P(x)) = −min(−P(x)).
Process models
assumptions & constants
equations to represent relationships
- material / energy balances
- thermodynamics
- kinetics, etc
constraints and bounds to represent limitations
- operating restrictions,
- bounds / limits
Standard form
Scaling of variables
units of measure
scaling factors
Stage 3 : Problem Solution
Technique selection
matched to problem type
exploit problem structure
knowledge of algorithms strengths & weaknesses
Starting points
usually several and compare
Algorithm tuning
termination criteria
convergence tolerances
step length
Solution uniqueness
Perturbation of optimization problem
effects of assumptions
variation (uncertainty) in problem parameters
variation in prices/costs
4 Degrees of Freedom
To determine the degrees of freedom (the number of variables whose values may be independently specified) in our model we could simply count the number of independent variables (the number of variables which remain on the right-hand side) in our modified equations.
Consider the following three equations relating three variables x1, x2 and x3 :
x1 − x2 = 0
x1 − 2x3 = 0
x2 − 2x3 = 0
This seems to indicate that there are no degrees of freedom. However, if we subtract the third equation from the second :
(x1 − 2x3) − (x2 − 2x3) = x1 − x2 = 0
the result is the first equation. The three equations are not linearly independent.
h(v) = 0
in the set of variables v (n + m elements). We would like to
determine whether the set of equations can be used to solve for
some of the variables in terms of the others.
u = g(x)
We know that :
rank[∇v h] ≤ m.
What does it mean if :
rank[∇v h] < m ?
Let's investigate a simple set of equations : two nonlinear equations h(v) = 0 in the three variables v1, v2 and v3, involving an exponential term and the constant a = 1/e. (This could be the material balances for a reactor.) A three-dimensional plot of the equations looks like :
Examine the neighbourhood of the point :
v1 = 1, v2 = 0, v3 = 1
What is happening here ?
Then :
This example was chosen because it was very easy to see the
occurrence of linear dependence within the equation set. Of course
the situation can be much more complex for real problems with
many more equations and variables. So, clearly we cannot
determine degrees of freedom by counting the number of equations
in our problem.
plant design,
plant flow sheeting,
model fitting,
mass :         FA − FR = 0
component A :  FA − k XA ρV − XA FR = 0
component B :  k XA ρV − XB FR = 0
1 − XA − XB = 0
If we know the feed-rate (FA = 5 kg/min), reactor volume (V = 2 litres), density of the reaction mixture (ρ = 1 kg/l), and reaction rate constant (k = 7 min⁻¹), then the material balances can be re-written :
5 − FR = 0
5 − 14XA − XA FR = 0
14XA − XB FR = 0
1 − XA − XB = 0
Is this set of equations linear or nonlinear ?
In our problem, there are :
3 variables (XA , XB , FR )
4 equations (1 mass balance, 2 component balances, 1 other
equation).
Adding the two component balances and substituting the mass balance (FA = FR) gives :
FR − XA FR − XB FR = 0
or :
FR (1 − XA − XB) = 0
and since FR ≠ 0, then
1 − XA − XB = 0
which is our fourth equation. Thus our four equations are not linearly independent.
Caution :
Care should be taken when developing a model to avoid this
situation of specifying too many linearly dependent equations, since
it often leads to difficulty in finding a feasible solution.
1 − XA − XB = 0
This equation could then provide an easy check on any solution we
find using the other equations.
Therefore :
degrees of freedom = 3 - 3 = 0.
This says there are no degrees of freedom for optimization and the
reactor problem is fully specified.
FR = 5 kg/min.
XA = 5/19 wt. fraction
XB = 14/19 wt. fraction
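The counting argument above can be checked numerically. The sketch below (plain Python, with the values from the text) verifies that the stated solution satisfies all four balances, and that the fourth balance is an exact linear combination of the other three at any point, not just at the solution :

```python
# Check of the reactor example: the stated solution satisfies all four
# balances, and the fourth balance is implied by the other three.
FA = 5.0            # feed rate, kg/min
kVrho = 14.0        # k * V * rho = 7 * 2 * 1

def residuals(XA, XB, FR):
    r1 = FA - FR                      # mass balance
    r2 = FA - kVrho*XA - XA*FR        # component A balance
    r3 = kVrho*XA - XB*FR             # component B balance
    r4 = 1.0 - XA - XB                # mass-fraction closure
    return r1, r2, r3, r4

# the stated solution:
r1, r2, r3, r4 = residuals(5.0/19.0, 14.0/19.0, 5.0)
assert all(abs(r) < 1e-12 for r in (r1, r2, r3, r4))

# the linear dependence holds everywhere: r2 + r3 - r1 = FR * r4 identically,
# so only three of the four equations are independent.
r1, r2, r3, r4 = residuals(0.3, 0.5, 4.0)     # an arbitrary point
assert abs((r2 + r3 - r1) - 4.0*r4) < 1e-12
```

The identity in the last assertion is exactly the manipulation in the text : adding the two component balances and substituting the mass balance reproduces FR(1 − XA − XB) = 0.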
Chapter II
Linear Programming
1. Introduction
2. Simplex Method
3. Duality Theory
4. Optimality Conditions
5. Applications
6. Sensitivity Analysis
1 Introduction
Linear programming became important during World War II :
schedule production,
choose feedstocks,
determine new product feasibility,
handle constraints for process controllers, . . .
min c1 x1 + c2 x2 + c3 x3 + . . . + cn xn
xi
subject to :
a11 x1 + a12 x2 + a13 x3 + . . . + a1n xn ≤ b1
a21 x1 + a22 x2 + a23 x3 + . . . + a2n xn ≤ b2
. . .
am1 x1 + am2 x2 + am3 x3 + . . . + amn xn ≤ bm
0 ≤ xi ≤ xl,i
bi ≥ 0
Note that all of the functions in this optimization problem are linear in the variables xi .
min cT x
x
subject to :
Ax ≤ b
0 ≤ x ≤ xl
b ≥ 0
Note that :
Example
c t P(x)
0 0 0
2 0 1800
0 1 1500
6/5 3/5 1980
So our market gardener should plant 1.2 acres in cabbages and 0.6
acres in tomatoes. She will then make $1980 in profit.
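Her schedule can be double-checked with a small brute-force script. Since a two-variable LP attains its optimum at a vertex of the feasible region, the sketch below (plain Python ; not the simplex method itself) intersects every pair of constraint boundaries and keeps the best feasible point :

```python
# Vertex enumeration for the market gardener LP:
#   max 900c + 1500t  s.t.  1.5c + 2t <= 3,  20c + 60t <= 60,  c, t >= 0
from itertools import combinations

# each constraint as a*c + b*t <= rhs (last two encode c >= 0, t >= 0)
cons = [(1.5, 2.0, 3.0), (20.0, 60.0, 60.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]

def intersect(l1, l2):
    """Intersection of two constraint boundary lines, or None if parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1*b2 - a2*b1
    if abs(det) < 1e-12:
        return None
    return ((c1*b2 - c2*b1) / det, (a1*c2 - a2*c1) / det)

def feasible(p):
    return all(a*p[0] + b*p[1] <= rhs + 1e-9 for a, b, rhs in cons)

vertices = [p for l1, l2 in combinations(cons, 2)
            if (p := intersect(l1, l2)) and feasible(p)]
best = max(vertices, key=lambda p: 900*p[0] + 1500*p[1])
profit = 900*best[0] + 1500*best[1]    # best = (1.2, 0.6), profit = 1980
```

The winning vertex is the intersection of the two resource constraints, reproducing the 1.2 acres of cabbages, 0.6 acres of tomatoes and $1980 profit from the table.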
1. unbounded solution,
2. no feasible region,
3. non-unique solution,
1. Unbounded Solution :
Non-Unique Solution :
2 Simplex Method
Our market gardener example had the form :
max 900c + 1500t
c,t
subject to :
1.5c + 2t ≤ 3
20c + 60t ≤ 60
where : x = [c t]^T .
2. Introduce slack variables (or convert all inequality constraints
to equality constraints) :
min −900c − 1500t
c,t
subject to :
1.5c + 2t + s1 = 3
20c + 60t + s2 = 60
c, t, s1 , s2 ≥ 0
or in matrix form :
min [−900 −1500 0 0] x
x
subject to :
[ 1.5   2   1   0 ]       [  3 ]
[  20  60   0   1 ]  x  = [ 60 ]
where : x = [c t s1 s2]^T ≥ 0.
3. Define an initial basis and set up tableau :
choose the slacks as initial basic variables,
this is equivalent to starting at the origin.
the initial tableau is :
5. Transform the constraint equations :
The final tableau is :
2. Mixed constraints :
Big M Method
min x1 + 2x2
x
subject to :
3x1 + 4x2 ≥ 5
x1 + 2x3 = 3
x1 + x2 ≤ 4
3. Set up the tableau as usual. The initial basis will include all of
the artificial variables and the required slack variables :
4. Phase 1 : modify the objective row in the tableau to reduce the
coefficients of the artificial variables to zero :
5. Phase 2 : Solve the problem as before starting from the final
tableau from Phase 1.
min c^T x
x
subject to :
Ax ≤ b
x ≥ 0
we introduce the slack variables and partition x, A and c as follows :
x = [xB ; xN] ,   A = [B | N]
c = [cB ; cN]
xB = B⁻¹ [b − N xN]
P(x) = cB^T B⁻¹ [b − N xN] + cN^T xN
or
P(x) = cB^T B⁻¹ b + [cN^T − cB^T B⁻¹ N] xN
Then, the tableau we used to solve these problems can be represented as :
       xB^T     xN^T       b
xB      I      B⁻¹N     B⁻¹b
3. calculate B⁻¹N and B⁻¹b.
4. find the pivot element by performing the ratio test using the column corresponding to the most negative shadow price.
5. the pivot column corresponds to the new basic variable and the pivot row corresponds to the new non-basic variable. Modify the B and N matrices accordingly. Calculate B⁻¹.
Chvatal
Edgar & Himmelblau (references in 7.7)
Fletcher
Gill, Murray and Wright
3 Duality & Linear Programming
Consider the general Linear Programming problem :
min c^T x
x
subject to :
Ax ≤ b
x ≥ 0
and its Lagrangian :
L(x, λ) = c^T x − λ^T [Ax − b]
Notice that the Lagrangian is a scalar function of two sets of variables : the Primal variables x, and the Dual variables (or Lagrange multipliers) λ. Up until now we have been calling the Lagrange multipliers the shadow prices.
We can develop the Dual problem by first re-writing the Lagrangian as :
L^T(x, λ) = x^T c − [x^T A^T − b^T] λ
which can be re-arranged to yield :
L^T(λ, x) = b^T λ − x^T [A^T λ − c]
This Lagrangian looks very similar to the previous one except that the Lagrange multipliers λ and problem variables x have switched places. In fact this re-arranged form is the Lagrangian for the maximization problem :
max b^T λ
λ
subject to :
A^T λ ≤ c
λ ≥ 0
This formulation is often called the Dual problem to indicate that it is defined in terms of the Dual variables λ.
min c^T x
x
subject to :
Ax ≥ b
x ≥ 0
As we saw earlier, problems with inequality constraints of this form
often present some difficulty in determining an initial feasible
starting point. They usually require a Phase 1 starting procedure
such as the Big M method.
max b^T λ
λ
subject to :
A^T λ ≤ c
λ ≥ 0
Primal Dual
Equality constraint Free variable
Inequality constraint Nonnegative variable
Free variable Equality constraint
Nonnegative variable Inequality constraint
4 Optimality Conditions
Consider the linear programming problem :
min c^T x
x
subject to :
M x ≤ bM    (active)
N x ≤ bN    (inactive)
In our market gardener example, the problem looked like :
L(x, λ) = c^T x − λM^T [M x − bM] − λN^T [N x − bN]
At the optimum, we know that the Lagrange multipliers (shadow prices) for the inactive inequality constraints are zero (i.e. λN* = 0). Also, since at the optimum the active inequality constraints are exactly satisfied :
M x* − bM = 0
L(x*, λ*) = P(x*) = c^T x*
At the optimum, we have seen that the shadow prices for the active inequality constraints must all be non-negative (i.e. λM* ≥ 0). These multiplier values told us how much the optimum value of the objective function would increase if the associated constraint was moved into the feasible region, for a minimization problem.
∇x L(x*, λ*) = ∇x P(x*) − (λM*)^T ∇x gM(x*) = c^T − (λM*)^T M = 0
5 Applications
Quadratic Programming Using LP
min (1/2) x^T H x + c^T x
x
subject to :
Ax ≤ b
x ≥ 0
where H is a symmetric, positive definite matrix.
L(x, λ, m) = (1/2) x^T H x + c^T x − λ^T (Ax − b) − m^T x
where λ and m are the Lagrange multipliers for the constraints.
subject to :
Hx + c − A^T λ − m = 0
λ^T (Ax − b) − m^T x = 0
Ax − b ≤ 0
x, λ, m ≥ 0
This LP problem may prove easier to solve than the original QP
problem.
Successive Linear Programming
min P (x)
x
subject to :
h(x) = 0
g(x) ≤ 0
If we linearize the objective function and all of the constraints
about some point xk , the original nonlinear problem may be
approximated (locally) by :
6 Sensitivity Analysis
Usually there is some uncertainty associated with the values used
in an optimization problem. After we have solved an optimization
problem, we are often interested in knowing how the optimal
solution will change as specific problem parameters change. This is
called by a variety of names, including :
sensitivity analysis,
post-optimality analysis,
parametric programming.
In our Linear Programming problems we have made use of pricing
information, but we know such prices can fluctuate. Similarly the
demand / availability information we have used is also subject to
some fluctuation. Consider that in our market gardener problem,
we might be interested in determining how the optimal solution
changes when :
min c^T x
x
subject to :
Ax ≤ b
L(x, λ) = c^T x − λM^T [M x − bM] − λN^T [N x − bN]
Recall that at the optimum :
∇x L(x*, λ*) = c^T − (λM*)^T M = 0
∇λ L(x*, λ*) = [M x* − bM] = 0
(λN*)^T = 0
Notice that the first two sets of equations are linear in the variables of interest (x*, λM*) and they can be solved to yield :
λM* = (M^T)⁻¹ c   and   x* = M⁻¹ bM
Then, as we discussed previously, it follows that :
P(x*) = c^T x* = c^T M⁻¹ bM = (λM*)^T bM = bM^T λM*
We now have expressions for the complete solution of our Linear
Programming problem in terms of the problem parameters.
∇b P(x*) = ∇b [(λ*)^T b] = (λ*)^T
∇b x* = ∇b [M⁻¹ bM ; 0] = [M⁻¹ ; 0]
∇b λ* = ∇b [(M^T)⁻¹ c ; 0] = 0
∇c P(x*) = ∇c [(x*)^T c] = (x*)^T
∇c x* = ∇c [M⁻¹ bM ; 0] = 0
∇c λ* = ∇c [(M^T)⁻¹ c ; 0] = [(M^T)⁻¹ ; 0]
Note that x is not an explicit function of the vector c and will not
change so long as the active set remains fixed. If a change in c
causes an element of to become negative, then x will jump to a
new vertex.
min c1 x1 + c2 x2
x1 ,x2
subject to
g1(x) = a11 x1 + a12 x2 ≤ b1
g2(x) = a21 x1 + a22 x2 ≤ b2      ⟺   Ax ≤ b
. . .
Note that the set of inequality constraints can be expressed :
Ax = [M ; N] x = [g1(x) ; g2(x) ; . . .] ≤ b
This two-variable optimization problem has an optimum at the intersection of the g1 and g2 constraints, and can be depicted as :
Summary
7 Interior Point Methods
Interior point methods move through the feasible region toward the
optimum. This is accomplished by :
Chapter III
Introduction
Interval Elimination Methods
Polynomial Approximation Methods
Newton's Method
Quasi-Newton Methods
Summary
1 Introduction
Univariate optimization means optimization of a scalar function of
a single variable :
y = P (x)
These optimization methods are important for a variety of reasons :
1. there are many instances in engineering when we want to find
the optimum of functions such as these (e.g. optimum reactor
temperature, etc.),
2. almost all multivariable optimization methods in commercial
use today contain a line search step in their algorithm,
3. they are easy to illustrate and many of the fundamental ideas
are directly carried over to multivariable optimization.
Of course these discussions will be limited to nonlinear functions,
but before discussing optimization methods we need some
mathematical background.
Continuity :
manpower costs,
measurements, . . .
Convexity of Functions
The problem with these ideas is that we can only calculate the derivatives at a point. So looking for an optimum using these fundamental ideas would be infeasible since a very large number of points (theoretically all of the points) in the region of interest must be checked.
dP/dx |x* = 0      (stationarity)
and
d²P/dx² |x* > 0  ⟹  a minimum
d²P/dx² |x* < 0  ⟹  a maximum
Consider the function :
y = x4
which we know to have a minimum at x = 0.
at the point x = 0 :
dy/dx |0 = 4x³ = 0 ,   d²y/dx² |0 = 12x² = 0 ,   d³y/dx³ |0 = 24x = 0 ,   d⁴y/dx⁴ |0 = 24 > 0
If the first non-zero higher-order derivative is odd, i.e. :
dⁿP/dxⁿ |x* ≠ 0 where n ∈ {3, 5, 7, . . .}
then x* is an inflection point. Otherwise, if the first higher-order, non-zero derivative is even :
dⁿP/dxⁿ |x* > 0  ⟹  a minimum
dⁿP/dxⁿ |x* < 0  ⟹  a maximum
Some Examples
1.
min ax² + bx + c
x
stationarity :
dP/dx = 2ax + b = 0   ⟹   x* = −b/(2a)
curvature :
d²P/dx² = 2a :   a > 0 ⟹ minimum ,   a < 0 ⟹ maximum
2.
min x³ − 2x² − 5x + 6
x
stationarity :
dP/dx = 3x² − 4x − 5 = 0   ⟹   x* = (2 ± √19)/3
curvature :
d²P/dx² |x* = 6x* − 4 :   +2√19 > 0 ⟹ minimum ,   −2√19 < 0 ⟹ maximum
3. min (x − 1)ⁿ
   x
4. A more realistic example. Consider the situation where we have
two identical settling ponds in a waste water treatment plant :
1) well mixed,
2) constant density,
3) constant flowrate.
contaminant balance :
d(V co)/dt = F c1 − F co   (second pond)
d(V c1)/dt = F ci − F c1   (first pond)
Let the residence time be τ = V/F. Then the balances become :
τ (dco/dt) = c1 − co   (second pond)
τ (dc1/dt) = ci − c1   (first pond)
We wish to know how changes in ci will affect the outlet concentration co. Define perturbation variables around the nominal operating point and take Laplace Transforms :
Ci(s) = k
Then, the transfer function of the outlet concentration becomes :
Co(s) = k / (τs + 1)²
stationarity :
dco/dt = k e^(−t/τ) (1 − t/τ) = 0   ⟹   t* = τ
curvature :
d²co/dt² |t*=τ = (k/τ) e^(−t/τ) (t/τ − 2) |t=τ = −k/(τe) < 0   ⟹   maximum (k > 0)
Finally, the maximum outlet concentration is :
co(τ) = k τ e⁻¹ ≈ 0.368 kV/F
This example serves to highlight some difficulties with the
analytical approach. These include :
2 Interval Elimination Methods
physical limits,
scanning,
choosing a starting point,
checking the function initially decreases,
finding another point of higher value.
Golden Section Search
where : l1/l2 = l2/L = r
Since L = l1 + l2 , l2 = rL and l1 = r²L, then :
L = rL + r²L
or
r² + r − 1 = 0
There are two solutions for this quadratic equation. We are only
interested in the positive root :
r = (−1 + √5)/2 ≈ 0.61803398 . . .      (the Golden ratio)
which has the interesting property :
r = 1/(r + 1)
There are only three possible situations. For the minimization case,
consider :
Graphically the algorithm is :
Lk = r^k L0 ≈ (0.618034)^k L0
3. the new point for the current iteration is :
x1^k = a^k + b^k − x2^k
or :
x2^k = a^k + b^k − x1^k
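The steps above can be collected into a short routine. The sketch below implements the interval-reduction loop with the new-point rule x1 = a + b − x2, and applies it to the earlier wastewater objective −t e^(−t) (τ = 1), whose maximum is at t = 1 :

```python
import math

def golden_section(P, a, b, tol=1e-8):
    """Minimize a unimodal P on [a, b] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0           # ~ 0.618034
    x1, x2 = b - r*(b - a), a + r*(b - a)      # two interior points
    P1, P2 = P(x1), P(x2)
    while b - a > tol:
        if P1 < P2:                  # minimum lies in [a, x2]
            b, x2, P2 = x2, x1, P1
            x1 = a + b - x2          # new-point rule: x1 = a + b - x2
            P1 = P(x1)
        else:                        # minimum lies in [x1, b]
            a, x1, P1 = x1, x2, P2
            x2 = a + b - x1
            P2 = P(x2)
    return 0.5 * (a + b)

t_star = golden_section(lambda t: -t * math.exp(-t), 0.0, 5.0)
```

Each pass costs only one new function evaluation, and the bracket shrinks by the factor r per iteration, as in the L_k = r^k L_0 expression above.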
Example
(k + 2) = 8 evaluations of P (xi ),
(k + 2) = 8 determinations of a new point xi .
3 Polynomial Approximation Methods
Graphically, the situation is :
The procedure is :
1. choose three points that bracket the optimum,
2. determine a quadratic approximation for the objective function based on these three points,
3. calculate the optimum of the quadratic approximation (the predicted optimum x*),
4. repeat steps 1 to 3 until the desired accuracy is reached.
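A minimal sketch of this procedure in plain Python. The re-bracketing rule here simply keeps the three lowest points, which is adequate for a well-behaved unimodal function but less robust than, e.g., Brent's method :

```python
def parabolic_search(P, x1, x2, x3, n_iter=40):
    """Successive quadratic approximation for a bracketed minimum."""
    for _ in range(n_iter):
        p1, p2, p3 = P(x1), P(x2), P(x3)
        # minimum of the parabola through (x1,p1), (x2,p2), (x3,p3)
        num = (x2 - x1)**2 * (p2 - p3) - (x2 - x3)**2 * (p2 - p1)
        den = (x2 - x1) * (p2 - p3) - (x2 - x3) * (p2 - p1)
        if abs(den) < 1e-15:
            break                      # degenerate parabola: done
        xs = x2 - 0.5 * num / den      # predicted optimum x*
        # keep the three best points as the new bracket
        x1, x2, x3 = sorted(sorted([x1, x2, x3, xs], key=P)[:3])
    return min((x1, x2, x3), key=P)

x_opt = parabolic_search(lambda x: (x - 2.0)**2, 0.0, 1.0, 5.0)
```

For a quadratic objective the fitted parabola is exact, so the predicted optimum lands on the true minimum in a single step.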
4 Newton's Method
This is a derivative-based method that uses first and second
derivatives to approximate the objective function. The optimum of
the objective function is then estimated using this approximation.
P(x) ≈ P(xk) + dP/dx |xk (x − xk) + (1/2) d²P/dx² |xk (x − xk)²
dP/dx ≈ dP/dx |xk + d²P/dx² |xk (x − xk)
dP/dx ≈ dP/dx |xk + d²P/dx² |xk (x − xk) = 0
Solving for x gives the iteration :
xk+1 = xk − (dP/dx |xk) / (d²P/dx² |xk)
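As a sketch, this iteration applied to the earlier cubic example min x³ − 2x² − 5x + 6, whose minimum is at x* = (2 + √19)/3 ≈ 2.12 :

```python
import math

def newton_1d(dP, d2P, x, n_iter=50, tol=1e-12):
    """Newton's method: x_{k+1} = x_k - P'(x_k) / P''(x_k)."""
    for _ in range(n_iter):
        step = dP(x) / d2P(x)
        x -= step
        if abs(step) < tol:
            break
    return x

dP  = lambda x: 3*x**2 - 4*x - 5       # P'(x)
d2P = lambda x: 6*x - 4                # P''(x)
x_min = newton_1d(dP, d2P, 3.0)        # start near the minimum
```

Started near the minimum (where P'' > 0) convergence is quadratic ; started where P'' < 0 the same iteration would head for the maximum, since Newton's method only seeks a stationary point.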
Graphically, this is :
5 Quasi-Newton Methods
A major drawback of Newton's method is that it requires us to have analytically determined both the first and second derivatives of our objective function. Often this is considered onerous, particularly in the case of the second derivative. The large family of optimization algorithms that use finite difference approximations of the derivatives are called Quasi-Newton methods.
P′(x) ≈ [P(x + Δx) − P(x − Δx)] / (2Δx)
P′(x) ≈ [P(x + Δx) − P(x)] / Δx
P′(x) ≈ [P(x) − P(x − Δx)] / Δx
Approximations for the second derivatives can be determined in terms of either the first derivatives or the original objective function :
P″(x) ≈ [P′(x + Δx) − P′(x − Δx)] / (2Δx)
P″(x) ≈ [P′(x + Δx) − P′(x)] / Δx
P″(x) ≈ [P′(x) − P′(x − Δx)] / Δx
P″(x) ≈ [P(x + Δx) − 2P(x) + P(x − Δx)] / Δx²
P″(x) ≈ [P(x + 2Δx) − 2P(x + Δx) + P(x)] / Δx²
P″(x) ≈ [P(x) − 2P(x − Δx) + P(x − 2Δx)] / Δx²
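These approximations drop straight into the Newton iteration. The sketch below uses the central-difference formulas for P′ and P″, revisiting the wastewater example max t e^(−t) as min −t e^(−t) (τ = 1) :

```python
import math

def fd_newton(P, x, dx=1e-4, n_iter=60, tol=1e-8):
    """Newton iteration with central-difference derivative estimates."""
    for _ in range(n_iter):
        d1 = (P(x + dx) - P(x - dx)) / (2.0*dx)           # ~ P'(x)
        d2 = (P(x + dx) - 2.0*P(x) + P(x - dx)) / dx**2   # ~ P''(x)
        step = d1 / d2
        x -= step
        if abs(step) < tol:
            break
    return x

t_star = fd_newton(lambda t: -t * math.exp(-t), 0.3)
```

The perturbation dx must be chosen with care : too large and the difference quotients are biased, too small and round-off in P dominates the second-difference formula.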
Regula Falsi
d²P/dx² ≈ [P′(xq) − P′(xp)] / (xq − xp)
where xp and xq are chosen so that P′(xq) and P′(xp) have different signs (i.e. the two points bracket the stationary point).
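A sketch of this scheme in plain Python : the secant estimate of P″ turns the stationarity condition P′(x) = 0 into a false-position root search on the first derivative. For the wastewater objective P(t) = −t e^(−t) we have P′(t) = (t − 1) e^(−t), which changes sign across the maximum at t = 1 :

```python
import math

def regula_falsi(dP, xp, xq, n_iter=200, tol=1e-10):
    """Locate P'(x) = 0 given a bracket with dP(xp) * dP(xq) < 0."""
    gp, gq = dP(xp), dP(xq)
    assert gp * gq < 0, "xp and xq must bracket the stationary point"
    x = xp
    for _ in range(n_iter):
        x = xq - gq * (xq - xp) / (gq - gp)   # false-position step
        g = dP(x)
        if abs(g) < tol:
            break
        if g * gp > 0:          # keep the bracket around the root
            xp, gp = x, g
        else:
            xq, gq = x, g
    return x

t_star = regula_falsi(lambda t: (t - 1.0) * math.exp(-t), 0.2, 3.0)
```

Unlike the plain secant method, the bracket-keeping step guarantees the iterates never leave [xp, xq].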
The new point x* replaces either xp or xq ,
depending upon :
Graphically :
Example : revisit the wastewater treatment problem with = 1.
max t e^(−t)  =  −min −t e^(−t)
t                 t
6 Summary
There are only two basic ideas in unconstrained univariate
optimization :
1. ease of implementation,
2. convergence speed,
3. robustness,
4. computational load (number of function evaluations, derivative
evaluations / approximations, etc.).
Chapter IV
Introduction
Sequential Univariate Searches
Gradient-Based Methods
Newton's Method
Quasi-Newton Methods
1 Introduction
Multivariate optimization means optimization of a scalar function
of several variables :
y = P (x)
and has the general form :
min P (x)
x
Background
Consider the 2nd order Taylor series expansion about the point x0 :
P(x) ≈ P(x0) + ∇x P|x0 (x − x0) + (1/2)(x − x0)^T ∇x² P|x0 (x − x0)
If we let :
a = P(x0) − ∇x P|x0 x0 + (1/2) x0^T ∇x² P|x0 x0
b^T = ∇x P|x0 − x0^T ∇x² P|x0
H = ∇x² P|x0
Then we can re-write the Taylor series expansion as a quadratic
approximation for P (x) :
P(x) = a + b^T x + (1/2) x^T H x
and the derivatives are :
∇x P(x) = b^T + x^T H
∇x² P(x) = H
The geometry of this quadratic is determined by the eigenvalues of H :
|λI − H| = 0
The possible geometries are :
∇x P|x* = 0
and :
Consider :
P(x) = x^T A x e^(x^T A x)
Then :
∇x P = 2(x^T A) e^(x^T A x) + x^T A x · 2(x^T A) e^(x^T A x)
     = 2(x^T A) e^(x^T A x) (1 + x^T A x)
Example
With A = [1 2 ; 2 1], the Hessian at the stationary point is :
∇x² P = 2A = [2 4 ; 4 2]
The eigenvalues of the Hessian are :
|λI − ∇x² P| = | λ−2   −4 ; −4   λ−2 | = (λ − 2)² − 16 = 0
λ1 = 6 and λ2 = −2
Thus the Hessian is indefinite and the stationary point is a saddle point. This is a trivial example, but it highlights the general procedure for direct use of the optimality conditions.
‖xk+1 − xk‖ ≤ ε
‖P(xk+1) − P(xk)‖ ≤ ε
‖∇x P|xk‖ ≤ ε
2 Sequential Univariate Searches
min P(xk + λ s)
λ
Consider the previous two dimensional problem, but using the independent search directions ([1 1]^T , [1 −1]^T).
Nelder-Mead Simplex
Step 1 :
Step 2 :
The reflected point is better than the second worst, but not
better than the best, then obtain a new simplex by replacing
the worst point xn+1 with the reflected point xr , and go to
step (a).
If P (xr ) < P (x1 )
The reflected point is the best point so far, then go to step (d).
xe = xo + γ (xo − xn+1)
If P (xe ) < P (xr )
Else
Else
Else
go to step (f).
(f) Reduction : for all but the best point, replace the point with
xi = x1 + σ (xi − x1) ,   i ∈ {2, . . . , n + 1}
go to step (a).
For the reflection, since xn+1 is the vertex with the highest associated value among the vertices, we can expect to find a lower value at the reflection of xn+1 through the opposite face formed by all vertices xi except xn+1 .
Example
An iteration trace records x1 , x2 , x3 , P1 , P2 , P3 , xr and Pr at each step ; in this example the simplex goes through two expansion steps, then a contraction, and further contractions as it closes in on the optimum.
3 Gradient-Based Methods
This family of optimization methods uses first-order derivatives to determine a descent direction. (Recall that the gradient gives the direction of the quickest increase for the objective function. Thus, the negative of the gradient gives the quickest decrease for the objective function.)
Steepest Descent
λk ≤ ε
‖xk+1 − xk‖ ≤ ε
‖∇x P|xk‖ ≤ ε
‖P(xk+1) − P(xk)‖ ≤ ε
Example 1
(iteration trace : xk , ∇x P|xk^T and P(xk) at each step)
3 GRADIENT-BASED METHODS 34
' $
start at x0 = [1 2]T
xk x P|Tx k P (xk )
k
1 0
0 1
2 2
1 0
1
1 0
min (1/2) x^T [101 −99 ; −99 101] x
x
Then
∇x P = x^T [101 −99 ; −99 101] = [101x1 − 99x2    −99x1 + 101x2]
With H = [101 −99 ; −99 101] and sk = −∇x P|xk^T = −H xk , the objective along the search direction is :
P(xk + λk sk) = (1/2) [xk − λk H xk]^T H [xk − λk H xk]
which yields :
P(xk + λk sk) = (1/2) [xk^T H xk − 2 λk xk^T H² xk + λk² xk^T H³ xk]
This expression is quadratic in λk , thus making it easy to perform an exact line search :
dP/dλk = −xk^T H² xk + λk xk^T H³ xk = 0
Solving for the step length yields :
λk = (xk^T H² xk) / (xk^T H³ xk)
   = [xk^T (20,002 −19,998 ; −19,998 20,002) xk] / [xk^T (4,000,004 −3,999,996 ; −3,999,996 4,000,004) xk]
Starting at x0 = [2 1]^T :
(iteration trace : xk , P(xk) , ∇x P|xk^T and λk at each step ; P(x0) = 54.50 with ∇x P|x0 = [103 −97], and after seven iterations the iterates have only crept to roughly [8×10⁻⁴ −8×10⁻⁴]^T — the characteristic slow zig-zag of Steepest Descent on an ill-conditioned problem)
3 GRADIENT-BASED METHODS 39
' $
Conjugate Gradients
Steepest Descent was based on the idea that the minimum will be
found provided that we always move in the downhill direction. The
second example showed that the gradient does not always point in
the direction of the optimum, due to the geometry of the problem.
Fletcher and Reeves (1964) developed a method which attempts to
account for local curvature in determining the next search direction.
The algorithm is :
Example
min (1/2) x^T [101 −99 ; −99 101] x
x
with the exact line search :
λk = −(xk^T H sk) / (sk^T H sk) ,   H = [101 −99 ; −99 101]
start at x0 = [2 1]^T
(iteration trace : xk , P(xk) , ∇x P|xk^T , sk and λk ; the method reaches x2 = [0 0]^T — convergence in two iterations, as expected for a two-dimensional quadratic)
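On a quadratic, conjugate gradients terminates in at most n steps, and this is easy to verify directly. A plain-Python sketch of the Fletcher-Reeves iteration on the same problem (first direction = steepest descent, then β from the gradient-norm ratio) :

```python
H = [[101.0, -99.0], [-99.0, 101.0]]
mv  = lambda M, v: [M[0][0]*v[0] + M[0][1]*v[1],
                    M[1][0]*v[0] + M[1][1]*v[1]]
dot = lambda a, b: a[0]*b[0] + a[1]*b[1]

x = [2.0, 1.0]
g = mv(H, x)                        # gradient H x
s = [-g[0], -g[1]]                  # first search direction
for _ in range(2):                  # n = 2 steps suffice for a 2-D quadratic
    lam = -dot(g, s) / dot(s, mv(H, s))        # exact line search
    x = [x[0] + lam*s[0], x[1] + lam*s[1]]
    g_new = mv(H, x)
    beta = dot(g_new, g_new) / dot(g, g)       # Fletcher-Reeves factor
    s = [-g_new[0] + beta*s[0], -g_new[1] + beta*s[1]]
    g = g_new
```

The contrast with the thousand-odd Steepest Descent iterations on the same objective is the whole point of incorporating curvature into the search direction.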
4 Newton's Method
The Conjugate Gradients method showed that there was a considerable performance gain to be realized by incorporating some problem geometry into the optimization algorithm. However, the method did this in an approximate sense. Newton's method does this by using second-derivative information.
P(x) ≈ P(xk) + ∇x P|xk (x − xk) + (1/2)(x − xk)^T ∇x² P|xk (x − xk)
& %
4 NEWTONS METHOD 46
' $
xk+1 = xk − [∇x² P|xk]⁻¹ ∇x P|xk
Notice that in this case we get both the direction and the step
length for the minimization. Further, if the objective function is
quadratic, the optimum is found in a single step.
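The single-step property is easy to confirm on the running quadratic example ; a plain-Python sketch with the 2×2 Hessian inverted in closed form :

```python
# One Newton step on P(x) = 1/2 x^T H x: x1 = x0 - H^{-1} (H x0) = 0.
H = [[101.0, -99.0], [-99.0, 101.0]]

x0 = [2.0, 1.0]
g  = [H[0][0]*x0[0] + H[0][1]*x0[1],        # gradient H x0 = [103, -97]
      H[1][0]*x0[0] + H[1][1]*x0[1]]

det  = H[0][0]*H[1][1] - H[0][1]*H[1][0]    # closed-form 2x2 inverse
Hinv = [[ H[1][1]/det, -H[0][1]/det],
        [-H[1][0]/det,  H[0][0]/det]]

x1 = [x0[0] - (Hinv[0][0]*g[0] + Hinv[0][1]*g[1]),
      x0[1] - (Hinv[1][0]*g[0] + Hinv[1][1]*g[1])]
```

For a non-quadratic objective the same step is only as good as the local quadratic approximation, which is why a step length λk is usually added.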
Example Consider again the optimization problem :
min (1/2) x^T [101 −99 ; −99 101] x
x
The gradient and Hessian for this example are :
∇x P = x^T [101 −99 ; −99 101] ,   ∇x² P = [101 −99 ; −99 101]
start at x0 = [2 1]^T : with P(x0) = 54.50 and ∇x P|x0 = [103 −97], a single Newton step lands exactly on the optimum x1 = [0 0]^T, as it must for a quadratic objective.
Steepest Descent was very good for the first two iterations, when we were quite far from the optimum. But it slowed rapidly as :
xk → x* ,   ∇x P|xk → 0
xk+1 = xk + λk Δxk
5 Approximate Newton Methods
∂P/∂xi |xk ≈ [P(xk + δi) − P(xk)] / ‖δi‖
where δi is a perturbation in the direction of xi . This approach
To approximate the Hessian using only finite differences in the objective function, the elements could have the form :
∂²P/∂xi ∂xj |xk ≈ [P(xk + δi + δj) − P(xk + δi) − P(xk + δj) + P(xk)] / (‖δi‖ ‖δj‖)
H̃ = [h̃ij]
The problem is that often this approximate Hessian is non-symmetric. A symmetric approximation can be formed as :
H = (1/2) [H̃ + H̃^T]
Whichever approach is used to approximate the derivatives, the Newton step is calculated from them. In general, Newton's methods based on finite differences can produce results similar to analytical results, providing care is taken in choosing the perturbations δi. (Recall that as the algorithm proceeds : xk → x* and ∇x P|xk → 0. Then small approximation errors will affect convergence and accuracy.) Some alternatives which can require fewer computations, yet build curvature information as the algorithm proceeds, are the Quasi-Newton family.
Thus, any algorithm which satisfies this condition can build up curvature information as it proceeds from xk to xk+1 in the direction Δxk ; however, the curvature information is gained in only one direction. Then, as these algorithms proceed from iteration to iteration, the approximate Hessian can differ by only a rank-one matrix :
Hk+1 = Hk + v u^T
Combining the Quasi-Newton Condition and the update formula yields :
v = [1/(u^T Δxk)] [Δgk − Hk Δxk]
Hk+1 = Hk + [1/(u^T Δxk)] [Δgk − Hk Δxk] u^T
Hk+1 = Hk + (Δgk Δgk^T)/(Δgk^T Δxk) − (Hk Δxk Δxk^T Hk)/(Δxk^T Hk Δxk)
The BFGS update has several advantages including that it is symmetric and it can be further simplified if the step is calculated according to :
Hk Δxk = −λk ∇x P|xk
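A plain-Python sketch of the full BFGS loop on the running 2-D quadratic, starting from H0 = I as in the worked example that follows (exact line searches ; the 2×2 inverse used for the step is written out directly) :

```python
# BFGS on P(x) = 1/2 x^T A x, with the update
# H_{k+1} = H_k + dg dg^T/(dg^T dx) - H_k dx dx^T H_k/(dx^T H_k dx).
A = [[101.0, -99.0], [-99.0, 101.0]]       # true Hessian
mv  = lambda M, v: [M[0][0]*v[0] + M[0][1]*v[1],
                    M[1][0]*v[0] + M[1][1]*v[1]]
dot = lambda a, b: a[0]*b[0] + a[1]*b[1]

x = [2.0, 1.0]
Hk = [[1.0, 0.0], [0.0, 1.0]]              # H0 = I
g = mv(A, x)                               # gradient A x
for k in range(20):
    if dot(g, g) < 1e-20:
        break
    det = Hk[0][0]*Hk[1][1] - Hk[0][1]*Hk[1][0]
    s = [-( Hk[1][1]*g[0] - Hk[0][1]*g[1]) / det,   # s = -Hk^{-1} g
         -(-Hk[1][0]*g[0] + Hk[0][0]*g[1]) / det]
    lam = -dot(g, s) / dot(s, mv(A, s))             # exact line search
    dx = [lam*s[0], lam*s[1]]
    x = [x[0] + dx[0], x[1] + dx[1]]
    g_new = mv(A, x)
    dg = [g_new[0] - g[0], g_new[1] - g[1]]
    Hdx = mv(Hk, dx)
    c1, c2 = dot(dg, dx), dot(dx, Hdx)
    for i in range(2):
        for j in range(2):
            Hk[i][j] += dg[i]*dg[j]/c1 - Hdx[i]*Hdx[j]/c2
    g = g_new
```

With H0 = I the first step reproduces Steepest Descent ; the update then pulls Hk toward the true Hessian, and the iterates reach the optimum quickly.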
General Quasi-Newton Algorithm
1. Initialization :
2. At iteration k :
sk = −(Hk)⁻¹ ∇x P|xk
- calculate the next iterate :
xk+1 = xk + k sk
- compute ∇x P|xk+1 , Δgk and Δxk .
Example
start at x0 = [2 1]^T
(iteration trace : xk , P(xk) , ∇x P|xk , Hk , sk and λk ; with H0 = I the first iteration reproduces the Steepest Descent step, the updated H1 ≈ [100.52 −99.5 ; −99.5 100.46] already approximates the true Hessian, and the next step reaches [0 0]^T)
The BFGS algorithm was :
- not quite as efficient as Newton's method,
- approximately as efficient as the Conjugate Gradients approach,
- much more efficient than Steepest Descent.
This is what should be expected for quadratic functions of low dimension. As the dimension of the problem increases, the Newton and Quasi-Newton methods will usually outperform the other methods.
A cautionary note :
In theory, the BFGS algorithm guarantees that the approximate Hessian remains positive definite (and invertible) provided that the initial matrix is chosen as positive definite. However, round-off errors and sometimes the search history can cause the approximate Hessian to become badly conditioned. When this occurs, the approximate Hessian should be reset to some value (often the identity matrix I).
5.2 Convergence
During our discussions we have mentioned convergence properties of various algorithms. There are a number of ways in which the convergence properties of an algorithm can be characterized. The most conventional approach is to determine the asymptotic behaviour of the sequence of distances between the iterates (xk) and the optimum (x*), as the iterations of a particular algorithm proceed. Typically, the distance between an iterate and the optimum is defined as :
‖xk − x*‖
lim k→∞  ‖xk+1 − x*‖ / ‖xk − x*‖^p = β
- super-linear (β = 0, p = 1),
- ease of implementation,
- computational limitations.
Chapter V
Introduction
Optimality Conditions
Examples
1 Introduction
Until now we have only been discussing NLPs that do not have constraints. Most problems that we wish to solve will have some form of constraints.
General Form
Subject to
2 Optimality Conditions for Local Optimum
The Karush-Kuhn-Tucker (KKT) conditions for a local optimum are :
1. Stationarity
∇L(x*, u, v) = ∇P(x*) + ∇g(x*) u + ∇f(x*) v = 0
2. Feasibility
g(x*) ≤ 0 ,   f(x*) = 0
3. Complementarity
u^T g(x*) = 0
4. Non-negativity
u ≥ 0
5. Curvature
w^T ∇²L w ≥ 0   for all feasible w
Example
min (x1 − 5)² + (x2 − 4)²
x
subject to :
2x1 + 3x2 ≤ 12
6x1 + 3x2 ≤ 18
Assuming both constraints are active, the Lagrangian is :
L = (x1 − 5)² + (x2 − 4)² + λ1 (2x1 + 3x2 − 12) + λ2 (6x1 + 3x2 − 18)
∇x L = [2(x1 − 5) + 2λ1 + 6λ2    2(x2 − 4) + 3λ1 + 3λ2]
Setting ∇x L = 0 and ∇λ L = 0 yields :
x = [3/2  3]^T ,   λ = [−3/4  17/12]^T   ⟹   P = 13.25
Since λ1 < 0, the non-negativity condition is violated : the first constraint cannot be active at the optimum.
Let us assume that only the second constraint is active at the optimum :
2x1 + 3x2 ≤ 12
6x1 + 3x2 ≤ 18   (active)
Then we could modify the Lagrangian and check the KKT conditions :
∇x L = [2(x1 − 5) + 6λ    2(x2 − 4) + 3λ] = 0
Now perturb the active constraint by a small amount e :
2x1 + 3x2 ≤ 12
6x1 + 3x2 ≤ 18 + e   (active)
∇x L = [2(x1 − 5) + 6λ    2(x2 − 4) + 3λ] = 0
Notice that x and λ change only very slightly from the values of the original solution.
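The active-set reasoning above can be verified numerically. The sketch below (plain Python ; the numbers are derived from the stated equations rather than copied from the slides) first shows that assuming both constraints active gives λ1 = −3/4 < 0, and then solves the KKT system with only the second constraint active :

```python
# Case 1: both constraints active -> x = (3/2, 3)
x1, x2 = 1.5, 3.0
# stationarity: 2(x1-5) + 2*l1 + 6*l2 = 0 ;  2(x2-4) + 3*l1 + 3*l2 = 0
# i.e. 2*l1 + 6*l2 = c and 3*l1 + 3*l2 = f; solve by Cramer's rule
c, f = -2.0*(x1 - 5.0), -2.0*(x2 - 4.0)
det = 2.0*3.0 - 6.0*3.0                       # = -12
l1 = (c*3.0 - 6.0*f) / det
l2 = (2.0*f - 3.0*c) / det
assert l1 < 0.0    # l1 = -3/4: non-negativity fails, constraint 1 not active

# Case 2: only 6x1 + 3x2 <= 18 active
# stationarity: 2(x1-5) + 6*l = 0 and 2(x2-4) + 3*l = 0
# -> x1 = 5 - 3*l, x2 = 4 - 1.5*l; substitute into 6x1 + 3x2 = 18
l = (42.0 - 18.0) / 22.5                      # = 16/15 >= 0
x1, x2 = 5.0 - 3.0*l, 4.0 - 1.5*l
P = (x1 - 5.0)**2 + (x2 - 4.0)**2
assert 2.0*x1 + 3.0*x2 <= 12.0 + 1e-9         # first constraint satisfied
```

With the correct active set all KKT conditions hold : the optimum is x ≈ (1.8, 2.4) with P = 12.8, multiplier λ = 16/15 ≥ 0, and the inactive constraint strictly satisfied.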
3 Algorithms for Constrained NLP
3. Gradient-Based Methods
Steepest, feasible descent direction,
Solve the resulting univariate problem,
Not currently popular.
4 Sequential Quadratic Programming
min ∇P(xk)^T Δx + (1/2) Δx^T Bk Δx
Δx
subject to :
f(xk) + ∇f(xk)^T Δx = 0
g(xk) + ∇g(xk)^T Δx ≤ 0
where Bk is the current approximation of the Hessian of the Lagrangian and Δx is the search direction.
3. Semidefinite Optimization.
6. Semi-Infinite Optimization.
7. Dynamic Optimization.
& %