
1

Chapter 5: Unconstrained Optimization Methods


2
Outline
Iterative descent direction methods
Gradient methods
Conjugate Gradient method
Newton's method and variations
Quasi-Newton methods
Line search techniques for step length
Polynomial interpolation methods
Golden Section method
3
Key Words
Descent direction
A direction guaranteed to produce a decrease in the objective function
Step size
The distance moved along the descent direction
Initial guess
A user-supplied initial (feasible) point
Convergence criteria
A criterion for deciding when to stop the optimization iterations
Rate of convergence
A key measure of the performance of an algorithm; the convergence rate
may be linear, superlinear, or quadratic
4
Analytical vs. Numerical Optimization Methods
Analytical methods:
Use the optimality conditions directly
Suitable for small-scale problems
Allow a straightforward theoretical analysis of optimization performance
Numerical methods:
Suitable for large-scale problems
Use iteration to deal with the nonlinearity
Offer more choices for the descent direction
Allow flexible convergence criteria
5
Iterative Descent
$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, n$
$x_0$: initial guess
$\{x_k\}$: a set of iterative points
$\{d_k\}$: a set of descent directions
$\{\alpha_k\}$: a set of step sizes
Convergence criteria, based on the objective value and on the iterates:
$f(x_{k+1}) < f(x_k), \qquad |f(x_{k+1}) - f(x_k)| \le \epsilon_1, \qquad \|x_{k+1} - x_k\| \le \epsilon_2$
As in the case of optimality conditions, the main ideas of unconstrained optimization
methods have simple geometrical explanations.
Bertsekas, p.23
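To make the scheme concrete, here is a minimal Python sketch of this generic descent loop (an illustration, not taken from the slides); descent_direction and step_size are hypothetical placeholders for the specific rules developed in the rest of the chapter, and eps1, eps2 implement the two convergence criteria above.

import numpy as np

def iterative_descent(f, grad, x0, descent_direction, step_size,
                      eps1=1e-8, eps2=1e-8, max_iter=1000):
    # Generic loop x_{k+1} = x_k + alpha_k * d_k with the two stopping tests.
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for k in range(max_iter):
        d = descent_direction(x, grad)      # must satisfy grad(x)^T d < 0
        alpha = step_size(f, grad, x, d)    # any step size rule
        x_new = x + alpha * d
        f_new = f(x_new)
        if abs(f_new - fx) <= eps1 and np.linalg.norm(x_new - x) <= eps2:
            return x_new, f_new, k + 1
        x, fx = x_new, f_new
    return x, fx, max_iter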
6
Gradient Methods (I)
Given a vector $x \in \mathbb{R}^n$ with $\nabla f(x) \neq 0$, consider the half line of vectors
$x_\alpha = x - \alpha \nabla f(x), \quad \alpha \ge 0$
For $\alpha$ small enough, $f(x_\alpha) < f(x)$: moving opposite to the gradient decreases $f$.
Bertsekas, p.23
7
Gradient Methods (II)
Carrying this idea further, we consider the half line of vectors
$x_\alpha = x + \alpha d, \quad \alpha \ge 0$
where the direction vector $d \in \mathbb{R}^n$ makes an angle of strictly more than $\pi/2$
with $\nabla f(x)$, i.e. $\nabla f(x)^T d < 0$, so $d$ is a descent direction.
Bertsekas, p.24
8
Selecting the Descent Direction
Many gradient methods are specified in the form
$x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$
where $D_k$ is a positive definite symmetric matrix and $\alpha_k$ is a positive step size.
9
The Steepest Descent Method
The steepest descent direction is the direction along which the objective function
decreases most rapidly.
Proof:
For any unit search direction $P$ and step size $\alpha_k$, Taylor's theorem gives
$f(x_k + \alpha_k P) = f(x_k) + \alpha_k P^T \nabla f_k + O(\alpha_k^2)$
Hence the most rapid decrease is the solution of the problem
$\min_P \; P^T \nabla f_k = \|P\| \, \|\nabla f_k\| \cos\theta$
where $\theta$ is the angle between the vector $P$ and $\nabla f_k$.
So the objective is minimized when $\cos\theta = -1$ (i.e. $\theta = \pi$), giving
$P = -\dfrac{\nabla f_k}{\|\nabla f_k\|}$
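A minimal Python sketch of steepest descent (my own illustration, not from the slides): it moves along $d_k = -\nabla f(x_k)$ with a small fixed step size for simplicity, whereas in practice $\alpha_k$ would come from one of the line search rules discussed later. The quadratic test function is a hypothetical example.

import numpy as np

def steepest_descent(f, grad, x0, alpha=0.1, tol=1e-6, max_iter=5000):
    # Move along d_k = -grad f(x_k); a fixed step stands in for a line search here.
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:      # gradient small enough: stop
            break
        x = x - alpha * g
    return x, f(x), k

# Hypothetical quadratic test problem
f = lambda x: 2.0 * x[0]**2 + x[1]**2
grad = lambda x: np.array([4.0 * x[0], 2.0 * x[1]])
print(steepest_descent(f, grad, x0=[3.0, -2.0]))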
10
Slow Convergence of Steepest Descent
Bertsekas, p.26
11
Conjugate Gradient Method
Accelerates the convergence of the steepest descent method by adding a
portion of the previous direction to the current negative gradient:
$d_0 = -\nabla f(x_0)$
$d_k = -\nabla f(x_k) + \beta_k d_{k-1}, \quad k = 1, \ldots, n$
The scalar multiplier $\beta_k$ determines the portion of the previous direction to be
added when forming the new direction.
Fletcher-Reeves formula:
$\beta_k = \dfrac{\nabla f(x_k)^T \nabla f(x_k)}{\nabla f(x_{k-1})^T \nabla f(x_{k-1})}$
Polak-Ribiere formula:
$\beta_k = \dfrac{\nabla f(x_k)^T \left[\nabla f(x_k) - \nabla f(x_{k-1})\right]}{\nabla f(x_{k-1})^T \nabla f(x_{k-1})}$
To help guarantee a descent direction, the Polak-Ribiere multiplier is reset by
$\beta_k = \max\{\beta_k, 0\}$
Furthermore, for a quadratic objective the method terminates with an optimal solution
after at most n steps.
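A Python sketch of the nonlinear conjugate gradient recursion above, supporting both the Fletcher-Reeves and the safeguarded Polak-Ribiere multipliers. The Armijo-style backtracking helper is my own choice for supplying the step size; the slides defer the line search to a later section.

import numpy as np

def backtracking(f, grad, x, d, s=1.0, beta=0.5, sigma=1e-4):
    # Simple Armijo backtracking, used here only to supply a step size.
    alpha, slope = s, grad(x) @ d
    while f(x + alpha * d) > f(x) + sigma * alpha * slope:
        alpha *= beta
    return alpha

def conjugate_gradient(f, grad, x0, variant="PR", tol=1e-6, max_iter=500):
    # d_0 = -grad f(x_0);  d_k = -grad f(x_k) + beta_k * d_{k-1}
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha = backtracking(f, grad, x, d)
        x_new = x + alpha * d
        g_new = grad(x_new)
        if variant == "FR":                                   # Fletcher-Reeves
            beta = (g_new @ g_new) / (g @ g)
        else:                                                 # Polak-Ribiere with max{., 0}
            beta = max((g_new @ (g_new - g)) / (g @ g), 0.0)
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x, f(x)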
12
Newton's Method
The idea in Newton's method is to minimize at each iteration the quadratic
approximation of f around the current point $x_k$:
$f(x_k + d_k) \approx f(x_k) + \nabla f(x_k)^T d_k + \tfrac{1}{2} d_k^T H(x_k) d_k$
Using the necessary condition for optimality of this quadratic model,
$\nabla f(x_k) + H(x_k) d_k = 0$
$d_k = -\left[H(x_k)\right]^{-1} \nabla f(x_k)$
$x_{k+1} = x_k - \left[H(x_k)\right]^{-1} \nabla f(x_k)$
provided $H(x_k) = \nabla^2 f(x_k)$ is positive definite (if not, a modification is needed).
Note that Newton's method finds the global minimum of a positive definite quadratic
function in a single iteration (assuming $\alpha_k = 1$).
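A minimal Python sketch of the pure Newton iteration ($\alpha_k = 1$), solving the linear system $H(x_k) d_k = -\nabla f(x_k)$ rather than forming the inverse explicitly. The quadratic example below is hypothetical and illustrates the single-iteration property noted above.

import numpy as np

def newton(f, grad, hess, x0, tol=1e-10, max_iter=50):
    # Pure Newton: solve H(x_k) d_k = -grad f(x_k), then x_{k+1} = x_k + d_k.
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = np.linalg.solve(hess(x), -g)   # Newton direction; assumes H positive definite
        x = x + d                          # alpha_k = 1
    return x, f(x)

# On a positive definite quadratic, one iteration reaches the minimizer exactly.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
hess = lambda x: A
print(newton(f, grad, hess, x0=[10.0, 10.0]))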
13
Fast Convergence of Newton's Method
Bertsekas, p.27
14
Modified Newton's Method
The original Newton method has no step size, which can cause divergence
when the initial guess is far from the optimum.
The modified Newton's method computes a step size $\alpha_k$ and makes the method more stable:
$x_{k+1} = x_k - \alpha_k \left[H(x_k)\right]^{-1} \nabla f(x_k)$
Question: how can the inverse of the Hessian matrix be obtained efficiently?
Other choices of the scaling matrix in place of $\left[H(x_k)\right]^{-1}$ yield the class of
quasi-Newton methods.
15
Quasi-Newton Methods (I)
Quasi-Newton methods are gradient methods of the form
$x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$
where $D_k$ is a positive definite matrix approximating the inverse Hessian.
16
Quasi-Newton Methods (II)
The Hessian matrix is not required explicitly. At each iteration:
1. Determine the direction: $d_k = -D_k \nabla f(x_k)$
2. Find the step length and update the current point: $x_{k+1} = x_k + \alpha_k d_k$
3. Update the matrix D for the next iteration: $D_{k+1} = ?$
Update the matrix D with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method:
$D_{k+1} = D_k + \left(1 + \dfrac{q_k^T D_k q_k}{q_k^T s_k}\right) \dfrac{s_k s_k^T}{s_k^T q_k} - \dfrac{s_k q_k^T D_k + D_k q_k s_k^T}{q_k^T s_k}$
where $s_k = x_{k+1} - x_k$ and $q_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.
The initial guess for the matrix D can be any positive definite matrix, e.g. the identity matrix.
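A Python sketch of the three-step quasi-Newton iteration with the BFGS update above. The Armijo-style backtracking line search for step 2 is my own choice, and the update is skipped when the curvature quantity $q_k^T s_k$ is not positive, a common safeguard.

import numpy as np

def backtracking(f, grad, x, d, s=1.0, beta=0.5, sigma=1e-4):
    # Armijo-style backtracking used for step 2 (find the step length).
    alpha, slope = s, grad(x) @ d
    while f(x + alpha * d) > f(x) + sigma * alpha * slope:
        alpha *= beta
    return alpha

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    x = np.asarray(x0, dtype=float)
    D = np.eye(x.size)                        # initial D: any positive definite matrix
    g = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        d = -D @ g                            # 1. direction
        alpha = backtracking(f, grad, x, d)   # 2. step length and point update
        x_new = x + alpha * d
        g_new = grad(x_new)
        s_k = x_new - x                       # s_k = x_{k+1} - x_k
        q_k = g_new - g                       # q_k = grad f(x_{k+1}) - grad f(x_k)
        qs = q_k @ s_k
        if qs > 1e-12:                        # 3. BFGS update of D
            Dq = D @ q_k
            D = (D + (1.0 + q_k @ Dq / qs) * np.outer(s_k, s_k) / qs
                   - (np.outer(s_k, Dq) + np.outer(Dq, s_k)) / qs)
        x, g = x_new, g_new
    return x, f(x)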
17
Rate of Convergence
Search path and convergence history (objective value versus iteration) on a
two-variable test function f(x, y), comparing the Steepest Descent, Conjugate
Gradient, Quasi-Newton, and Modified Newton methods.
18
Comparison
The steepest descent method produces successive directions
that are perpendicular to each other.
Near the optimum, the convergence of the steepest descent
method is very slow because of this zigzag phenomenon.
Quasi-Newton methods avoid solving for the Hessian matrix and
need only modest effort to update the approximation of the
inverse Hessian.
Figure: search paths of the steepest descent method and the quasi-Newton method.
19
Step Size Selection (I)
At each iteration of the optimization, one needs to determine a descent direction and an
appropriate step size. The success of a method depends on efficient choices of both.
At each iteration, with a known descent direction, the minimization problem
reduces to a one-variable problem.
Since the line search methods are based on a given descent direction, we assume that
the step size is a positive value.
20
Step Size Selection (II)
There are a number of rules for choosing the step size $\alpha_k$ in a gradient method.
The minimization and limited minimization rules must typically be implemented with the aid
of one-dimensional line search algorithms.
21
Line Search Methods for Step Size
We consider minimization of the function $g(\alpha) = f(x + \alpha d)$, where f is
continuously differentiable. Common methods:
Cubic interpolation (requires derivatives)
Quadratic interpolation (requires only function values; one interpolation step is sketched below)
Golden section (requires only function values); searches a bracketed interval in which
unimodality is assumed.
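As a sketch of the quadratic interpolation idea (my own illustration, not a formula from the slides): given three step sizes a < b < c that bracket the minimum of g, the minimizer of the parabola through the three points has a closed form.

def quadratic_interpolation_step(g, a, b, c):
    # One parabolic interpolation step for min g(alpha).
    # Requires a < b < c with g(b) < g(a) and g(b) < g(c) (bracketed minimum).
    ga, gb, gc = g(a), g(b), g(c)
    num = (b - a) ** 2 * (gb - gc) - (b - c) ** 2 * (gb - ga)
    den = (b - a) * (gb - gc) - (b - c) * (gb - ga)
    return b - 0.5 * num / den

# Hypothetical example: g(alpha) = (alpha - 1.2)**2 is quadratic, so one step is exact
g = lambda alpha: (alpha - 1.2) ** 2
print(quadratic_interpolation_step(g, 0.0, 1.0, 2.0))   # -> 1.2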
22
Golden Section Method (I)
Here we assume that $g(\alpha)$ is strictly unimodal in the interval $[0, s]$.
The golden section method minimizes g over $[0, s]$ by determining at the kth
iteration an interval $[\underline{\alpha}_k, \bar{\alpha}_k]$ containing the minimizer $\alpha^*$.
These intervals are obtained using the golden section number $\tau = (\sqrt{5} - 1)/2 \approx 0.618$,
which satisfies $\tau^2 = 1 - \tau$; the interior trial points are placed at the fractions
$1 - \tau \approx 0.382$ and $\tau \approx 0.618$ of the current interval.
Bertsekas, p.744
23
Golden Section Method (II)
Bertsekas, p.745
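The relation $\tau^2 = 1 - \tau$ is what allows the method to reuse one interior point per iteration; a short derivation (my own, under the convention above):
With the current interval of length $L$ and interior points at distances $(1-\tau)L$ and $\tau L$
from its left end, suppose g is larger at the lower point, so the interval is reduced to its
right part of length $L' = \tau L$. The lower interior point of the new interval lies at distance
$(1-\tau)L + (1-\tau)\tau L = (1-\tau)(1+\tau)L = (1-\tau^2)L = \tau L$
from the old left end, i.e. exactly at the old upper interior point. Hence each golden section
iteration needs only one new evaluation of g and shrinks the interval by the factor $\tau \approx 0.618$.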
24
Equal Interval Search
Bounds on the minimum of the objective function are found first.
By successively refining these bounds, the minimum is bracketed
to any desired degree of precision (a Python version follows the pseudocode below).
Initialize:
  x1 = 0; delta = delta0
  x2 = x1 + delta
  f1 = f(x1); f2 = f(x2)
  x_low = x1; x_upper = x2 + delta
Loop while (x_upper - x_low) > epsilon
  if (f2 <= f1)
    Continuing to decrease: x1 = x2; f1 = f2; x2 = x1 + delta; f2 = f(x2)
  else
    Went past the minimum: x_low = x1 - delta; x_upper = x2
    delta = delta/Factor (Factor = 5~10)
    x1 = x_low; f1 = f(x1); x2 = x1 + delta; f2 = f(x2)
  end if
end while
x_min = (x_low + x_upper)/2
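A Python sketch of the equal interval search above, under the stated assumption that g(alpha) is unimodal for alpha >= 0; the default starting point, step, reduction factor, and tolerance follow the example on the next slide, and the usage line is hypothetical.

def equal_interval_search(g, delta0=0.5, factor=5.0, eps=0.01):
    # March forward in steps of delta until g increases, then shrink the bracket.
    x1, delta = 0.0, delta0
    x2 = x1 + delta
    f1, f2 = g(x1), g(x2)
    x_low, x_upper = x1, float("inf")        # bracket not yet established
    while (x_upper - x_low) > eps:
        if f2 <= f1:                          # still decreasing: keep marching
            x1, f1 = x2, f2
            x2 = x1 + delta
            f2 = g(x2)
        else:                                 # went past the minimum: refine the bracket
            x_low, x_upper = max(x1 - delta, 0.0), x2
            delta = delta / factor
            x1, f1 = x_low, g(x_low)
            x2 = x1 + delta
            f2 = g(x2)
    return 0.5 * (x_low + x_upper)

# Hypothetical usage: the minimizer of g(alpha) = (alpha - 1.3)**2 is 1.3
print(equal_interval_search(lambda a: (a - 1.3) ** 2))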
25
Example
Equal interval search applied to a one-variable test function $g(\alpha)$,
with initial parameters $\alpha_0 = 0$, $\delta_0 = 0.5$, $F = 5$, $\epsilon = 0.01$.
26
Golden Section Search
Given a bracket [a, b] with a <= x_low <= x_upper <= b (a Python version follows below):
Initialize:
  x_low = a + (b-a)*0.382; f1 = f(x_low)
  x_upper = a + (b-a)*0.618; f2 = f(x_upper)
Loop while (x_upper - x_low) > epsilon
  if (f1 > f2)
    a = x_low; x_low = x_upper; f1 = f2
    x_upper = a + (b-a)*0.618
    f2 = f(x_upper)
  else
    b = x_upper; x_upper = x_low; f2 = f1
    x_low = a + (b-a)*0.382
    f1 = f(x_low)
  end if
end while
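A Python sketch of the pseudocode above; the bracket [a, b] and the tolerance are supplied by the caller, and the usage line is a hypothetical example.

def golden_section(g, a, b, eps=0.01):
    # Golden section search for a strictly unimodal g on [a, b].
    x_low = a + 0.382 * (b - a)
    x_up = a + 0.618 * (b - a)
    f1, f2 = g(x_low), g(x_up)
    while (x_up - x_low) > eps:
        if f1 > f2:                       # minimum lies in [x_low, b]
            a = x_low
            x_low, f1 = x_up, f2          # reuse the old upper interior point
            x_up = a + 0.618 * (b - a)
            f2 = g(x_up)
        else:                             # minimum lies in [a, x_up]
            b = x_up
            x_up, f2 = x_low, f1          # reuse the old lower interior point
            x_low = a + 0.382 * (b - a)
            f1 = g(x_low)
    return 0.5 * (x_low + x_up)

# Hypothetical usage: minimize g(alpha) = (alpha - 2)**2 on [0, 5]
print(golden_section(lambda a: (a - 2.0) ** 2, 0.0, 5.0))   # ~2.0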
27
Example
Golden section search applied to the same one-variable test function $g(\alpha)$,
with the same initial parameters: $\alpha_0 = 0$, $\delta_0 = 0.5$, $F = 5$, $\epsilon = 0.01$.
28
Successive Step Size Reduction
To avoid the often considerable computation associated with the line minimization
rules, it is natural to consider rules based on successive step size reduction.
In the simplest rule of this type, an initial step size s is chosen, and if the
corresponding vector $x_k + s d_k$ does not yield an improved value of f, that is, if
$f(x_k + s d_k) \ge f(x_k)$, the step size is reduced, perhaps repeatedly, by a certain
factor, until the value of f is improved.
Caution: the cost improvement at each iteration may not be substantial enough to
guarantee convergence to a minimum!
Bertsekas, p.745
29
Line Search by the Armijo Rule
The Armijo rule is essentially the successive reduction rule, suitably modified to eliminate
the theoretical convergence difficulties: the reduced step $\alpha$ is accepted only once it
produces a sufficient decrease, $f(x_k + \alpha d_k) \le f(x_k) + \sigma \alpha \nabla f(x_k)^T d_k$,
for a fixed $\sigma \in (0, 1)$.
Bertsekas, p.31
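A Python sketch of the Armijo rule: starting from an initial trial step s, the step is repeatedly reduced by a factor beta until the sufficient-decrease test above is met (the parameters s, beta, sigma follow the usual convention; the cap on reductions is my own safeguard).

import numpy as np

def armijo_step(f, grad, x, d, s=1.0, beta=0.5, sigma=1e-4, max_reductions=50):
    # Armijo rule: alpha = beta^m * s for the first m giving a sufficient decrease.
    slope = grad(x) @ d                       # must be negative for a descent direction
    alpha = s
    for _ in range(max_reductions):
        if f(x + alpha * d) <= f(x) + sigma * alpha * slope:
            return alpha                      # sufficient decrease achieved
        alpha *= beta                         # otherwise reduce the step and retry
    return alpha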
30
Homework
Explain the significance of the number $\tau = (\sqrt{5} - 1)/2 \approx 0.618$ for the golden
section search.
Prove that $\tau$ satisfies $\tau^2 = 1 - \tau$ with $\tau > 0$.