
Multivariate Optimization Overview

Problem definition
  The unconstrained optimization problem is a generalization of the line search problem

Algorithms
  Cyclic coordinate method
  Steepest descent
  Conjugate gradient algorithms
  PARTAN
  Newton's method
  Levenberg-Marquardt

Concise, subjective summary
  Akin to a blind person trying to find their way to the bottom of a valley in a multidimensional landscape
  We want to reach the bottom with the minimum number of cane taps
  Also vaguely similar to taking core samples for oil prospecting

Problem Definition

Find a vector a* such that

    a* = argmin_a f(a)

Note that there are no constraints on a.

Example: Find the vector of coefficients (w ∈ R^(p×1)) that minimizes the average absolute error of a linear model (a sketch follows).
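As a concrete instance of the example above, here is a minimal MATLAB sketch of that objective; the data matrix X, the targets t, and the sample size are hypothetical:

% Sketch: average absolute error of a linear model (hypothetical data).
% The unconstrained problem is w* = argmin_w mean(|X*w - t|).
X = randn(100,3);                  % 100 examples, p = 3 coefficients
t = X*[1; -2; 0.5] + 0.1*randn(100,1);
f = @(w) mean(abs(X*w - t));       % objective to minimize over w
f([0; 0; 0])                       % evaluate at a starting vector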

Example 1: Optimization Problem

[Figures: contour, quiver, and surface plots of the example objective f(a) for a_1, a_2 ∈ [-5, 5]]

Example 1: MATLAB Code


function [] = OptimizationProblem();
% ==============================================================================
% User-Specified Parameters
% ==============================================================================
x = -5:0.05:5;
y = -5:0.05:5;
% ==============================================================================
% Evaluate the Function
% ==============================================================================
[X,Y] = meshgrid(x,y);
[Z,G] = OptFn(X,Y);
functionName   = 'OptimizationProblem';
fileIdentifier = fopen([functionName '.tex'],'w');
% ==============================================================================
% Contour Map
% ==============================================================================
figure;
FigureSet(2,'Slides');
contour(x,y,Z,50);
xlabel('a_1');
ylabel('a_2');
zoom on;
AxisSet(8);
fileName = sprintf('%s-%s',functionName,'Contour');
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\stepcounter{exc}\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');
% ==============================================================================
% Quiver Map
% ==============================================================================
figure;
FigureSet(1,'Slides');
axis([-5 5 -5 5]);
contour(x,y,Z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
hold on;
xCoarse = -5:0.5:5;
yCoarse = -5:0.5:5;
[X,Y] = meshgrid(xCoarse,yCoarse);
[ZCoarse,GCoarse] = OptFn(X,Y);
nr  = length(xCoarse); % Number of coarse grid points
dzx = GCoarse(1:nr,1:nr);
dzy = GCoarse(nr+(1:nr),1:nr);
quiver(xCoarse,yCoarse,dzx,dzy);
hold off;
xlabel('a_1');
ylabel('a_2');
zoom on;
AxisSet(8);
fileName = sprintf('%s-%s',functionName,'Quiver');
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');
% ==============================================================================
% 3D Maps
% ==============================================================================
figure;
set(gcf,'Renderer','zbuffer');
FigureSet(1,'Slides');
h = surf(x,y,Z);
set(h,'LineStyle','None');
xlabel('a_1');
ylabel('a_2');
shading interp;
grid on;
AxisSet(8);
hl = light('Position',[0,0,30]);
set(hl,'Style','Local');
set(h,'BackFaceLighting','unlit');
material dull
for c1 = 1:3
    switch c1
        case 1,
            view(45,10);
        case 2,
            view(-55,22);
        case 3,
            view(-131,10);
        otherwise,
            error('Not implemented.');
    end
    fileName = sprintf('%s-%s%d',functionName,'Surface',c1);
    print(fileName,'-depsc');
    fprintf(fileIdentifier,'%%==============================================================================\n');
    fprintf(fileIdentifier,'\\newslide\n');
    fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
    fprintf(fileIdentifier,'%%==============================================================================\n');
    fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
    fprintf(fileIdentifier,'\n');
end
% ==============================================================================
% List the MATLAB Code
% ==============================================================================
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: MATLAB Code}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\t\\matlabcode{Matlab/%s.m}\n',functionName);
fclose(fileIdentifier);

Global Optimization?

In general, all optimization algorithms find a local minimum in as few steps as possible
There are also global optimization algorithms based on ideas such as
  Evolutionary computing
  Genetic algorithms
  Simulated annealing
None of these guarantee convergence in a finite number of iterations
All require a lot of computation

Optimization Comments

Ideally, when we construct models we should favor those which can be optimized with few shallow local minima and reasonable computation
Graphically, you can think of the function to be minimized as the elevation in a complicated high-dimensional landscape
The problem is to find the lowest point
The most common approach is to go downhill
The gradient points in the most uphill direction
The steepest downhill direction is the opposite of the gradient
Most optimization algorithms use a line search algorithm
The methods mostly differ only in the way that the direction of descent is generated
Most of the theory of these algorithms is based on quadratic surfaces
Near local minima, this is a good approximation
Note that the functions should (must) have continuous gradients (almost) everywhere

Optimization Algorithm Outline

The basic steps of these algorithms are as follows (a sketch of the loop follows the list)
1. Pick a starting vector a
2. Find the direction of descent, d
3. Move in that direction until a minimum is found:
       α* := argmin_α f(a + αd)
       a := a + α*d
4. Loop to 2 until convergence
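A minimal MATLAB sketch of this outline, assuming the OptFn and LineSearch conventions used in the example code later in these notes (OptFn returns the function value and gradient; LineSearch returns the step size along d); the starting point and parameters are illustrative:

% Sketch: generic descent loop (OptFn/LineSearch conventions assumed from
% the example code below; starting point and parameters are illustrative).
x = -3; y = 1;                        % 1. pick a starting vector
for cnt = 1:25                        % 4. loop until convergence
    [z,g] = OptFn(x,y);
    d = -g/norm(g);                   % 2. direction of descent (steepest)
    b = LineSearch([x y]',d,0.01,30); % 3. move to the minimum along d
    x = x + b*d(1);
    y = y + b*d(2);
end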

Cyclic Coordinate Method

1. For i = 1 to p,
       a_i := argmin_α f([a_1, a_2, ..., a_{i-1}, α, a_{i+1}, ..., a_p])
2. Loop to 1 until convergence

+ Each line search can be performed semi-globally to avoid shallow local minima
+ Simple to implement
+ Can be used with nominal variables
+ f(a) can be discontinuous
+ No gradient required
- Very slow compared to gradient-based optimization algorithms
- Usually only practical when the number of parameters, p, is small

There are modified versions with faster convergence (a sketch of one cycle follows)
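A minimal sketch of one cycle for p = 2, under the same OptFn/LineSearch assumptions as the example code below; b0 = -1 and ls = 30 mirror that code:

% Sketch: one cycle of the cyclic coordinate method for p = 2 (assumes the
% LineSearch convention of the example code below).
ab = [-3; 1];                    % current parameter vector (column)
for i = 1:2
    d = zeros(2,1);
    d(i) = 1;                    % search along coordinate i only
    b = LineSearch(ab,d,-1,30);  % semi-global line search (b0 = -1, ls = 30)
    ab = ab + b*d;
end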

Example 2: Cyclic Coordinate Method

[Figure: contour plot of the cyclic coordinate search path over X, Y ∈ [-5, 5]]

Example 2: Cyclic Coordinate Method

[Figure: close-up contour plot of the search path near the minimum]

Example 2: Cyclic Coordinate Method

[Figure: function value versus iteration]

Example 2: Cyclic Coordinate Method

[Figure: Euclidean position error versus iteration]

Example 2: Relevant MATLAB Code

function [] = CyclicCoordinate();
% clear all;
close all;

ns = 26;
x  = -3;
y  =  1;
b0 = -1;
ls = 30;

a = zeros(ns,2);
f = zeros(ns,1);

[z,dzx,dzy] = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;
for cnt = 2:ns,
    if rem(cnt,2)==1,
        d = [1 0]'; % Along x direction
    else
        d = [0 1]'; % Along y direction
    end;
    [b,fmin] = LineSearch([x y]',d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    a(cnt,:) = [x y];
    f(cnt)   = fmin;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc CyclicCoordinateContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-1.5+(-2:0.05:2),-1.5+(-2:0.05:2));
[z,dzx,dzy] = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc CyclicCoordinateContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a - ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc CyclicCoordinatePositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc CyclicCoordinateErrorLinear;

Steepest Descent

The gradient of the function f(a) is defined as the vector of partial derivatives:

    ∇_a f(a) ≜ [ ∂f(a)/∂a_1   ∂f(a)/∂a_2   ...   ∂f(a)/∂a_p ]^T

It can be shown that the gradient, ∇_a f(a), points in the direction of maximum ascent
The negative of the gradient, -∇_a f(a), points in the direction of maximum descent
A vector d is a direction of descent if there exists a δ such that f(a + αd) < f(a) for all 0 < α < δ
It can also be shown that d is a direction of descent iff (∇_a f(a))^T d < 0 (a numerical check follows below)
The algorithm of steepest descent uses d = -∇_a f(a)
The most fundamental of all algorithms for minimizing a continuously differentiable function

Steepest Descent

+ Very stable algorithm
- Can converge very slowly once near the local minima where the surface is approximately quadratic
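The iff condition above is easy to check numerically; a minimal sketch for a hypothetical quadratic f(a) = 0.5 a^T Q a, whose gradient is Qa:

% Sketch: verify that (grad f)'*d < 0 for the steepest descent direction,
% using a hypothetical quadratic f(a) = 0.5*a'*Q*a with gradient Q*a.
Q = [3 1; 1 2];
gradf = @(a) Q*a;        % exact gradient of the quadratic
a = [2; -1];
d = -gradf(a);           % steepest descent direction
gradf(a)'*d              % negative, so d is a direction of descent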

Example 3: Steepest Descent

[Figure: contour plot of the search path over X, Y ∈ [-5, 5]]

Example 3: Steepest Descent

[Figure: close-up contour plot of the zig-zag path near the local minimum]

Example 3: Steepest Descent

[Figure: Euclidean position error versus iteration]

Example 3: Steepest Descent

[Figure: function value versus iteration]

Example 3: Relevant MATLAB Code

function [] = SteepestDescent();
% clear all;
close all;

ns = 26;
x  = -3;
y  =  1;
b0 = 0.01;
ls = 30;

a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;
d      = -g/norm(g);
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y]',d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    [z,g] = OptFn(x,y);
    d = -g;
    d = d/norm(d);
    a(cnt,:) = [x y];
    f(cnt)   = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
[zopt zopt2]
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc SteepestDescentContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-1.6+(-0.5:0.01:0.5),-1.7+(-0.5:0.01:0.5));
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc SteepestDescentContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a - ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc SteepestDescentPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc SteepestDescentErrorLinear;

Conjugate Gradient Algorithms

1. Take a steepest descent step
2. For i = 2 to p
       α := argmin_α f(a + αd)
       a := a + αd
       g_i := ∇f(a)
       β := (g_i^T g_i) / (g_{i-1}^T g_{i-1})
       d := -g_i + β d_{i-1}
3. Loop to 1 until convergence

Based on quadratic approximations of f
Called the Fletcher-Reeves method (a sketch of the update follows)
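A minimal sketch of the Fletcher-Reeves update, under the same OptFn/LineSearch assumptions as the example code below; the starting point and parameters are illustrative:

% Sketch: Fletcher-Reeves conjugate gradient (OptFn/LineSearch conventions
% assumed from the example code below; parameters are illustrative).
x = -3; y = 1; b0 = 0.01; ls = 30;
[z,g] = OptFn(x,y);
d = -g/norm(g);                 % first step: steepest descent
for i = 2:10
    b = LineSearch([x y]',d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    go = g;                     % previous gradient
    [z,g] = OptFn(x,y);
    beta = (g'*g)/(go'*go);     % Fletcher-Reeves beta
    d = -g + beta*d;            % new conjugate direction
end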

Example 4: Fletcher-Reeves Conjugate Gradient

[Figure: contour plot of the search path over X, Y ∈ [-5, 5]]

Example 4: Fletcher-Reeves Conjugate Gradient

[Figure: close-up contour plot of the path near the minimum at X ≈ 1.9, Y ≈ -3.0]

Example 4: Fletcher-Reeves Conjugate Gradient

[Figure: Euclidean position error versus iteration]

Example 4: Fletcher-Reeves Conjugate Gradient

[Figure: function value versus iteration]

Example 4: Relevant MATLAB Code

function [] = FletcherReeves();
% clear all;
close all;

ns = 26;
x  = -3;
y  =  1;
b0 = 0.01;
ls = 30;

a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;
d = -g/norm(g); % First direction
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y]',d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    go = g; % Old gradient
    [z,g] = OptFn(x,y);
    beta = (g'*g)/(go'*go);
    d = -g + beta*d;
    a(cnt,:) = [x y];
    f(cnt)   = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc FletcherReevesContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc FletcherReevesContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a - ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc FletcherReevesPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc FletcherReevesErrorLinear;

Conjugate Gradient Algorithms Continued

There is also a variant called Polak-Ribiere where

    β := ((g_i - g_{i-1})^T g_i) / (g_{i-1}^T g_{i-1})

+ Only requires the gradient
+ Converges in a finite number of steps when f(a) is quadratic and perfect line searches are used
- Less stable numerically than steepest descent
- Sensitive to inexact line searches
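In the example code below this changes only the beta line of the Fletcher-Reeves loop; a one-line sketch under the same assumptions:

beta = ((g - go)'*g)/(go'*go); % Polak-Ribiere (Fletcher-Reeves uses (g'*g)/(go'*go))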

Example 5: Polak-Ribiere Conjugate Gradient

[Figure: contour plot of the search path over X, Y ∈ [-5, 5]]

Example 5: Polak-Ribiere Conjugate Gradient

[Figure: close-up contour plot of the path near the minimum at X ≈ 1.9, Y ≈ -3.0]

Example 5: Polak-Ribiere Conjugate Gradient

[Figure: Euclidean position error versus iteration]

Example 5: Polak-Ribiere Conjugate Gradient

[Figure: function value versus iteration]

Example 5: MATLAB Code

function [] = PolakRibiere();
% clear all;
close all;

ns = 26;
x  = -3;
y  =  1;
b0 = 0.01;
ls = 30;

a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;
d = -g/norm(g); % First direction
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y]',d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    go = g; % Old gradient
    [z,g] = OptFn(x,y);
    beta = ((g - go)'*g)/(go'*go);
    d = -g + beta*d;
    a(cnt,:) = [x y];
    f(cnt)   = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc PolakRibiereContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc PolakRibiereContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a - ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PolakRibierePositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PolakRibiereErrorLinear;

Parallel Tangents (PARTAN)

1. First gradient step
       d := -∇f(a)
       α := argmin_α f(a + αd)
       s_p := αd
       a := a + s_p
2. Gradient step
       d_g := -∇f(a)
       α := argmin_α f(a + αd_g)
       s_g := αd_g
       a := a + s_g
3. Conjugate step
       d_p := s_p + s_g
       α := argmin_α f(a + αd_p)
       s_p := αd_p
       a := a + s_p
4. Loop to 2 until convergence

PARTAN Concept

[Diagram: PARTAN search path through points a_0, a_1, ..., a_7]

First two steps are steepest descent
Thereafter, each iteration consists of two steps (a sketch follows the example figure)
1. Search along the direction
       d_i = a_i - a_{i-2}
   where a_i is the current point and a_{i-2} is the point from two steps ago
2. Search in the direction of the negative gradient
       d_i = -∇f(a_i)

Example 6: PARTAN

[Figure: contour plot of the PARTAN search path over X, Y ∈ [-5, 5]]
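A minimal sketch of one PARTAN iteration (gradient step, then conjugate step), under the same OptFn/LineSearch assumptions as the example code below; xa, ya hold the point from two steps ago, and all values are illustrative:

% Sketch: one PARTAN iteration (OptFn/LineSearch conventions assumed from
% the example code below; xa, ya hold the point from two steps ago).
x = -3; y = 1; xa = x; ya = y; b0 = 0.01; ls = 30; % illustrative values
[z,g] = OptFn(x,y);
d = -g/norm(g);                    % gradient step
bg = LineSearch([x y]',d,b0,ls);
xg = x + bg*d(1);
yg = y + bg*d(2);
d = [xg - xa; yg - ya];            % conjugate step along a_i - a_{i-2}
d = d/norm(d);
bp = LineSearch([xg yg]',d,b0,ls);
x = xg + bp*d(1);
y = yg + bp*d(2);
xa = xg; ya = yg;                  % anchor point for the next iteration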

Example 6: PARTAN

[Figure: close-up contour plot of the path near the minimum at X ≈ 1.9, Y ≈ -3.0]

Example 6: PARTAN

[Figure: function value versus iteration]

Example 6: PARTAN

[Figure: Euclidean position error versus iteration]

Example 6: MATLAB Code

function [] = Partan();
% clear all;
close all;

ns = 26;
x  = -3;
y  =  1;
b0 = 0.01;
ls = 30;

a = zeros(ns,2);
f = zeros(ns,1);

[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;
xa     = x;
ya     = y;

% First step - substitute for a conjugate step
d = -g/norm(g); % First direction
[bp,fmin] = LineSearch([x y]',d,b0,100);
x = x + bp*d(1); % Stand-in for a conjugate step
y = y + bp*d(2);
a(2,:) = [x y];
f(2)   = fmin;

cnt = 2;
while cnt < ns,
    % Gradient step
    [z,g] = OptFn(x,y);
    d = -g/norm(g); % Direction
    [bg,fmin] = LineSearch([x y]',d,b0,ls);
    xg = x + bg*d(1);
    yg = y + bg*d(2);
    cnt = cnt + 1;
    a(cnt,:) = [xg yg];
    f(cnt)   = OptFn(xg,yg);
    fprintf('G : %d %5.3f\n',cnt,f(cnt));
    if cnt == ns,
        break;
    end;
    % Conjugate step
    d = [xg - xa yg - ya]';
    if norm(d) ~= 0,
        d = d/norm(d);
        [bp,fmin] = LineSearch([xg yg]',d,b0,ls);
    else
        bp = 0;
    end;
    if bp > 0, % Line search in conjugate direction was successful
        fprintf('P : ');
        x = xg + bp*d(1);
        y = yg + bp*d(2);
    else
        % Could not move - do another gradient update
        cnt = cnt + 1;
        a(cnt,:) = a(cnt-1,:);
        f(cnt)   = f(cnt-1);
        if cnt == ns,
            break;
        end;
        fprintf('G2: ');
        [z,g] = OptFn(xg,yg);
        d = -g/norm(g); % Direction
        [bp,fmin] = LineSearch([xg yg]',d,b0,ls);
        x = xg + bp*d(1);
        y = yg + bp*d(2);
    end;
    % Update anchor point
    xa = xg;
    ya = yg;
    cnt = cnt + 1;
    a(cnt,:) = [x y];
    f(cnt)   = OptFn(x,y);
    fprintf('%d %5.3f\n',cnt,f(cnt));
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc PartanContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc PartanContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a - ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PartanPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PartanErrorLinear;

PARTAN Pros and Cons

+ For quadratic functions, converges in a finite number of steps
+ Easier to implement than 2nd order methods
+ Can be used with a large number of parameters
+ Each (composite) step is at least as good as steepest descent
+ Tolerant of inexact line searches
- Each (composite) step requires two line searches

Newton's Method

    a_{k+1} = a_k - H(a_k)^{-1} ∇f(a_k)

where ∇f(a_k) is the gradient and H(a_k) is the Hessian of f(a),

    H(a_k) ≜ [ ∂²f(a)/∂a_1²      ∂²f(a)/∂a_1∂a_2   ...   ∂²f(a)/∂a_1∂a_p
               ∂²f(a)/∂a_2∂a_1   ∂²f(a)/∂a_2²      ...   ∂²f(a)/∂a_2∂a_p
               ...               ...               ...   ...
               ∂²f(a)/∂a_p∂a_1   ∂²f(a)/∂a_p∂a_2   ...   ∂²f(a)/∂a_p²    ]

Based on a quadratic approximation of the function f(a)
If f(a) is quadratic, converges in one step
If H(a) is positive-definite, the problem is well defined near local minima where f(a) is nearly quadratic
(A sketch of a safeguarded Newton step follows the example figure)

Example 7: Newton's with Steepest Descent Safeguard

[Figure: contour plot of the search path over X, Y ∈ [-5, 5]]
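A minimal sketch of the safeguarded step for a hypothetical objective with hand-coded derivatives; the function f(a) = a_1^4 + a_1 a_2 + a_2^2 is purely illustrative:

% Sketch: one Newton step with a steepest-descent safeguard for the
% hypothetical objective f(a) = a1^4 + a1*a2 + a2^2.
a = [2; -1];
g = [4*a(1)^3 + a(2); a(1) + 2*a(2)];  % gradient
H = [12*a(1)^2, 1; 1, 2];              % Hessian
d = -H\g;                              % Newton direction
if d'*g > 0                            % not a direction of descent?
    d = -g;                            % revert to steepest descent
end
a = a + d;                             % take the step (unit step size)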

Example 7: Newton's with Steepest Descent Safeguard

[Figure: close-up contour plot of the search path]

Example 7: Newton's with Steepest Descent Safeguard

[Figure: function value versus iteration]

Example 7: Newton's with Steepest Descent Safeguard

[Figure: Euclidean position error versus iteration]

Example 7: Relevant MATLAB Code

function [] = Newtons();
% clear all;
close all;

ns = 100;
x  = -3; % Starting x
y  =  1; % Starting y
b0 =  1;

a = zeros(ns,2);
f = zeros(ns,1);

[z,g,H] = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;

for cnt = 2:ns,
    d = -inv(H)*g;
    if d'*g > 0, % Revert to steepest descent if not a direction of descent
        % fprintf('(%2d of %2d) Min. Eig:%5.3f Reverting...\n',cnt,ns,min(eig(H)));
        d = -g;
    end;
    d = d/norm(d);
    [b,fmin] = LineSearch([x y]',d,b0,100);
    % a(cnt,:) = (a(cnt-1,:)' - inv(H)*g)'; % Pure Newton's method
    x = x + b*d(1);
    y = y + b*d(2);
    [z,g,H] = OptFn(x,y);
    a(cnt,:) = [x y];
    f(cnt)   = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc NewtonsContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.0+(-1:0.02:1),-2.4+(-1:0.02:1));
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc NewtonsContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a - ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc NewtonsPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc NewtonsErrorLinear;

Newton's Method Pros and Cons

    a_{k+1} = a_k - H(a_k)^{-1} ∇f(a_k)

+ Very fast convergence near local minima
- Not guaranteed to converge (may actually diverge)
- Requires the p × p Hessian
- Requires a p × p matrix inverse that uses O(p³) operations

Levenberg-Marquardt

1. Determine if μ_k I + H(a_k) is positive definite. If not, μ_k := 4μ_k and repeat.
2. Solve the following equation for a_{k+1} (a sketch follows):
       [μ_k I + H(a_k)] (a_{k+1} - a_k) = -∇f(a_k)
3. Compute
       r_k := (f(a_k) - f(a_{k+1})) / (q(a_k) - q(a_{k+1}))
   where q(a) is the quadratic approximation of f(a) based on f(a_k), ∇f(a_k), and H(a_k)
4. If r_k < 0.25, then μ_{k+1} := 4μ_k
   If r_k > 0.75, then μ_{k+1} := μ_k/2
   If r_k ≤ 0, then a_{k+1} := a_k
5. If not converged, k := k + 1 and loop to 1.
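A minimal sketch of steps 1-2, mirroring the example code below (eta plays the role of μ; g and H are assumed already evaluated at the current point a, a 2-vector as in the example):

% Sketch: Levenberg-Marquardt update, steps 1-2 (names mirror the example
% code below; g and H are assumed evaluated at the current point a).
eta = 1e-4;
while min(eig(eta*eye(2) + H)) <= 0  % 1. force positive definiteness
    eta = 4*eta;
end
aNew = a - (eta*eye(2) + H)\g;       % 2. solve [eta*I + H](aNew - a) = -g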

Levenberg-Marquardt Comments

Similar to Newton's method
Has safety provisions for regions where the quadratic approximation is inappropriate
Compare
    Newton's:  a_{k+1} = a_k - H(a_k)^{-1} ∇f(a_k)
    LM:        [μ_k I + H(a_k)] (a_{k+1} - a_k) = -∇f(a_k)
If μ = 0, these are equivalent
If μ → ∞, a_{k+1} → a_k
μ is chosen to ensure that the smallest eigenvalue of μI + H(a_k) is positive and sufficiently large

Example 8: Levenberg-Marquardt Conjugate Gradient

[Figure: contour plot of the search path over X, Y ∈ [-5, 5]]

Example 8: Levenberg-Marquardt Conjugate Gradient

[Figure: close-up contour plot of the path near the minimum at X ≈ 1.9, Y ≈ -3.0]

Example 8: Levenberg-Marquardt Conjugate Gradient

[Figure: function value versus iteration]

Example 8: Levenberg-Marquardt Conjugate Gradient

[Figure: Euclidean position error versus iteration]

Example 8: Relevant MATLAB Code

function [] = LevenbergMarquardt();
% clear all;
close all;

ns  = 26;
x   = -3; % Starting x
y   =  1; % Starting y
eta = 0.0001;

a = zeros(ns,2);
f = zeros(ns,1);

[zn,g,H] = OptFn(x,y);
a(1,:) = [x y];
f(1)   = zn;
ap = [x y]'; % Previous point

for cnt = 2:ns,
    [zn,g,H] = OptFn(x,y);
    while min(eig(eta*eye(2)+H)) < 0,
        eta = eta*4;
    end;
    a(cnt,:) = (ap - inv(eta*eye(2)+H)*g)';
    x = a(cnt,1);
    y = a(cnt,2);
    zo = zn; % Old function value
    zn = OptFn(x,y);
    xd = (a(cnt,:)' - ap);
    qo = zo;
    qn = zn + g'*xd + 0.5*xd'*H*xd;
    if qo == qn, % Test for convergence
        x = a(cnt,1);
        y = a(cnt,2);
        a(cnt:ns,:) = ones(ns-cnt+1,1)*[x y];
        f(cnt:ns,:) = OptFn(x,y);
        break;
    end;
    r = (zo - zn)/(qo - qn);
    if r < 0.25,
        eta = eta*4;
    elseif r > 0.50, % 0.75 is recommended, but much slower
        eta = eta/2;
    end;
    if zn > zo, % Back up
        a(cnt,:) = a(cnt-1,:);
    else
        ap = a(cnt,:)';
    end;
    x = a(cnt,1);
    y = a(cnt,2);
    a(cnt,:) = [x y];
    f(cnt)   = OptFn(x,y);
    % disp([cnt a(cnt,:) f(cnt) r eta])
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc LevenbergMarquardtContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc LevenbergMarquardtContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a - ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc LevenbergMarquardtPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc LevenbergMarquardtErrorLinear;

Levenberg-Marquardt Pros and Cons

    [μ_k I + H(a_k)] (a_{k+1} - a_k) = -∇f(a_k)

Many equivalent formulations
+ No line search required
+ Can be used with approximations to the Hessian
+ Extremely fast convergence (2nd order)
- Requires gradient and Hessian (or approximate Hessian)
- Requires O(p³) operations for each solution to the key equation

Optimization Algorithm Summary

Algorithm             Convergence   Stable   ∇f(a)   H(a)   LS
Cyclic Coordinate     Slow          Y        N       N      Y
Steepest Descent      Slow          Y        Y       N      Y
Conjugate Gradient    Fast          N        Y       N      Y
PARTAN                Fast          Y        Y       N      Y
Newton's Method       Very Fast     N        Y       Y      N
Levenberg-Marquardt   Very Fast     Y        Y       Y      N
