Optimization
Michaelmas 2014
Newton's method
Line search
Quasi-Newton methods
Least-Squares and Gauss-Newton methods
Downhill simplex (amoeba) algorithm
A. Zisserman
Rosenbrock's function
f(x, y) = 100 (y - x^2)^2 + (1 - x)^2
[Figure: plot of the Rosenbrock function]
Minimum is at [1, 1]
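As a quick sanity check, the function can be evaluated directly in Matlab (a minimal sketch; the handle name rosenbrock is just illustrative):

% Rosenbrock function as an anonymous function; its minimum value is 0 at [1, 1].
rosenbrock = @(x, y) 100*(y - x.^2).^2 + (1 - x).^2;
rosenbrock(1, 1)     % 0, the minimum
rosenbrock(-1, 1)    % 4, a point away from the minimum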
Steepest descent
At each iteration, move in the direction of the negative gradient,

x_{n+1} = x_n - \alpha_n \nabla f(x_n)

where the step length \alpha_n is found by a 1D line minimization. The 1D line minimization must be performed using one of the earlier methods (usually cubic polynomial interpolation).
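A minimal sketch of steepest descent on the Rosenbrock function, with a simple backtracking (Armijo) line search standing in for cubic interpolation; the tolerance, Armijo constant and iteration cap are illustrative choices, not from the slides:

% Steepest descent on the Rosenbrock function with a backtracking line search.
f    = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
grad = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];

x = [-1.9; 2];                        % start point
for n = 1:20000                       % steepest descent needs many iterations here
    g = grad(x);
    if norm(g) < 1e-3, break; end     % stop when the gradient is small
    d = -g;                           % steepest descent direction
    alpha = 1;                        % backtracking line search (Armijo condition)
    while f(x + alpha*d) > f(x) + 1e-4*alpha*(g'*d)
        alpha = alpha/2;
    end
    x = x + alpha*d;
end
x, n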
Steepest Descent
[Figure: steepest descent iterations on the Rosenbrock function, with a zoomed detail of the path]
Criteria for comparing methods:
3. Memory footprint
4. Region of convergence
Newton's method (1D)
Update x according to

x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}
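A minimal 1D illustration (the test function x^4 - 3x^3 + 2 is not from the slides, chosen only because its derivatives are simple):

% 1D Newton's method: iterate x <- x - f'(x)/f''(x) to minimize f(x) = x^4 - 3*x^3 + 2.
fp  = @(x) 4*x^3 - 9*x^2;     % f'(x)
fpp = @(x) 12*x^2 - 18*x;     % f''(x)
x = 3;                        % start point
for n = 1:20
    x = x - fp(x)/fpp(x);
end
x                             % converges to the minimizer x = 9/4 = 2.25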
In 2D, expanding about the point (x, y):

f(x + \delta x, y + \delta y) = f(x, y)
  + \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) \begin{pmatrix} \delta x \\ \delta y \end{pmatrix}
  + \frac{1}{2} \left( \delta x, \delta y \right) \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} \begin{pmatrix} \delta x \\ \delta y \end{pmatrix}
  + \text{h.o.t.}
Newton's method in N-D

Expand f(x) by its Taylor series about the point x_n:

f(x_n + \delta x) \approx f(x_n) + g_n^\top \delta x + \frac{1}{2} \delta x^\top H_n \delta x

where the gradient is the vector

g_n = \nabla f(x_n) = \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_N} \right]^\top

and the Hessian is the symmetric matrix

H_n = H(x_n) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_N} & \cdots & \frac{\partial^2 f}{\partial x_N^2} \end{pmatrix}

Minimizing the quadratic approximation over \delta x gives H_n \delta x = -g_n, with solution \delta x = -H_n^{-1} g_n. This gives the iterative update

x_{n+1} = x_n - H_n^{-1} g_n
Newton's method update:

x_{n+1} = x_n + \delta x = x_n - H_n^{-1} g_n
If f (x) is quadratic, then the solution is found in one step.
The method has quadratic convergence (as in the 1D case).
The solution \delta x = -H_n^{-1} g_n is guaranteed to be a downhill direction (provided that H is positive definite).
For numerical reasons the inverse is not actually computed; instead \delta x is computed as the solution of H_n \delta x = -g_n.
Rather than jump straight to x_n - H_n^{-1} g_n, it is better to perform a line search which ensures global convergence:

x_{n+1} = x_n - \alpha_n H_n^{-1} g_n
If H = I then this reduces to steepest descent.
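A minimal sketch of damped Newton iterations on the Rosenbrock function; the step-halving line search and iteration caps are illustrative simplifications, not the cubic line search of the slides:

% Newton's method on the Rosenbrock function with a simple line search.
f    = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
grad = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];
hess = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1); -400*x(1), 200];

x = [-1.9; 2];
for n = 1:100
    g = grad(x);
    if norm(g) < 1e-3, break; end
    dx = -hess(x) \ g;                % solve H*dx = -g rather than forming inv(H)
    alpha = 1;                        % step-halving line search along dx
    for k = 1:30
        if f(x + alpha*dx) < f(x), break; end
        alpha = alpha/2;
    end
    x = x + alpha*dx;
end
x, n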
[Figure: Newton's method with line search on the Rosenbrock function; gradient < 1e-3 after 15 iterations]
Quasi-Newton methods
If the problem size is large and the Hessian matrix is dense then it
may be infeasible/inconvenient to compute it directly.
e.g. in 1D, with function values f_{-1}, f_0, f_1 at the points x_{-1} = x_0 - h, x_0, x_{+1} = x_0 + h:

First derivatives:

f'(x_0 + h/2) \approx \frac{f_1 - f_0}{h}, \qquad f'(x_0 - h/2) \approx \frac{f_0 - f_{-1}}{h}

Second derivative:

f''(x_0) \approx \frac{f'(x_0 + h/2) - f'(x_0 - h/2)}{h} = \frac{f_1 - 2 f_0 + f_{-1}}{h^2}

[Figure: f(x) sampled at x_{-1}, x_0, x_{+1} with spacing h]
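A quick numerical check of the central-difference formula (the test function sin(x) and the step size are illustrative choices):

% Central-difference second derivative vs. the exact value, for f(x) = sin(x).
f  = @(x) sin(x);
x0 = 1;  h = 1e-3;
fd = (f(x0 + h) - 2*f(x0) + f(x0 - h)) / h^2;   % (f1 - 2*f0 + f-1)/h^2
exact = -sin(x0);                               % exact second derivative of sin
[fd, exact]                                     % the two values agree to about 1e-7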
Quasi-Newton: BFGS

Set H_0 = I.
Update according to

H_{n+1} = H_n + \frac{q_n q_n^\top}{q_n^\top s_n} - \frac{(H_n s_n)(H_n s_n)^\top}{s_n^\top H_n s_n}

where

s_n = x_{n+1} - x_n
q_n = g_{n+1} - g_n

The matrix itself is not stored, but rather represented compactly by a few stored vectors.
The estimate H_{n+1} is used to form a local quadratic approximation as before.
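A minimal BFGS sketch on the Rosenbrock function using the update above; the curvature safeguard, step-halving line search and iteration caps are illustrative additions (practical implementations use a Wolfe line search and store the approximation implicitly):

% Quasi-Newton (BFGS) on the Rosenbrock function.
f    = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
grad = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];

x = [-1.9; 2];  g = grad(x);  H = eye(2);            % H_0 = I
for n = 1:200
    if norm(g) < 1e-3, break; end
    dx = -H \ g;                                     % quasi-Newton direction
    alpha = 1;                                       % step-halving line search
    for k = 1:30
        if f(x + alpha*dx) < f(x), break; end
        alpha = alpha/2;
    end
    s = alpha*dx;  xnew = x + s;  gnew = grad(xnew);
    q = gnew - g;
    if q'*s > 1e-10                                  % only update if the curvature condition holds
        H = H + (q*q')/(q'*s) - (H*s)*(H*s)'/(s'*H*s);
    end
    x = xnew;  g = gnew;
end
x, n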
Example
[Figure: Rosenbrock function contours]
Matlab fminunc
>> f='100*(x(2)-x(1)^2)^2+(1-x(1))^2';
>> GRAD='[100*(4*x(1)^3-4*x(1)*x(2))+2*x(1)-2; 100*(2*x(2)-2*x(1)^2) ]';
Choose options for BFGS quasi-Newton
>> OPTIONS=optimset('LargeScale','off', 'HessUpdate','bfgs' );
>> OPTIONS = optimset(OPTIONS,'gradobj','on');
Start point
>> x = [-1.9; 2];
>> [x,fval] = fminunc({f,GRAD},x,OPTIONS);
This produces
x = 0.9998, 0.9996
fval = 3.4306e-008
Least-squares problems

Cost function:

f(x) = \sum_{i=1}^{M} r_i^2 = \sum_{i=1}^{M} \left( y(s_i, x) - t_i \right)^2

where t_i is the target value.

Example: fitting a 3D model to an image.
Transformation parameters: 3D rotation matrix R and translation T.
Image generation: rotate and translate the 3D model by R and T, then project to generate the image I_{R,T}(x, y).

Writing the residuals as a vector r = (r_1, \ldots, r_M)^\top,

f(x) = \sum_{i=1}^{M} r_i^2 = \| r \|^2
The Jacobian of the residual vector is the M \times N matrix

J(x) = \begin{pmatrix} \frac{\partial r_1}{\partial x_1} & \cdots & \frac{\partial r_1}{\partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial r_M}{\partial x_1} & \cdots & \frac{\partial r_M}{\partial x_N} \end{pmatrix}

(assume M > N).
Consider the derivatives of the cost:

\frac{\partial}{\partial x_k} \sum_i r_i^2 = \sum_i 2 r_i \frac{\partial r_i}{\partial x_k}

Hence

\nabla f(x) = 2 J^\top r

Differentiating again,

\frac{\partial^2}{\partial x_k \partial x_l} \sum_i r_i^2 = 2 \sum_i \frac{\partial r_i}{\partial x_k} \frac{\partial r_i}{\partial x_l} + 2 \sum_i r_i \frac{\partial^2 r_i}{\partial x_k \partial x_l}

Hence

H(x) = 2 J^\top J + 2 \sum_{i=1}^{M} r_i R_i

where R_i is the matrix of second derivatives of the residual r_i, i.e. (R_i)_{kl} = \partial^2 r_i / \partial x_k \partial x_l.

Near the solution the residuals r_i are small, and the second derivatives R_i are expensive to compute. For these reasons, the second-order term is often ignored, giving the Gauss-Newton approximation to the Hessian:

H(x) = 2 J^\top J

Hence, explicit computation of the full Hessian can again be avoided.
Example: Gauss-Newton

The minimization of the Rosenbrock function

f(x, y) = 100 (y - x^2)^2 + (1 - x)^2

can be written as a least-squares problem with residual vector

r = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = \begin{pmatrix} 10 (y - x^2) \\ 1 - x \end{pmatrix}

with Jacobian

J(x) = \begin{pmatrix} \frac{\partial r_1}{\partial x} & \frac{\partial r_1}{\partial y} \\ \frac{\partial r_2}{\partial x} & \frac{\partial r_2}{\partial y} \end{pmatrix} = \begin{pmatrix} -20x & 10 \\ -1 & 0 \end{pmatrix}

The Hessian

H(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix}

is approximated by

2 J^\top J = 2 \begin{pmatrix} -20x & -1 \\ 10 & 0 \end{pmatrix} \begin{pmatrix} -20x & 10 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 800 x^2 + 2 & -400 x \\ -400 x & 200 \end{pmatrix}
For the least-squares cost f(x) = \sum_{i=1}^{M} r_i^2, the Gauss-Newton approximation to the Hessian is

H(x) = 2 J^\top J

and the gradient is given by

\nabla f(x) = 2 J^\top r

So, the Newton update step

x_{n+1} = x_n + \delta x = x_n - H_n^{-1} g_n

computed as H \delta x = -g_n, becomes

J^\top J \, \delta x = -J^\top r

These are called the normal equations.
The Gauss-Newton method with line search is then

x_{n+1} = x_n - \alpha_n H_n^{-1} g_n \qquad \text{with} \qquad H(x_n) = 2 J_n^\top J_n
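A minimal Gauss-Newton sketch on the Rosenbrock residuals, solving the normal equations at each step; the step-halving line search and iteration caps are illustrative simplifications:

% Gauss-Newton on the Rosenbrock function written as a least-squares problem.
r = @(x) [10*(x(2) - x(1)^2); 1 - x(1)];           % residual vector
J = @(x) [-20*x(1), 10; -1, 0];                    % Jacobian of r
f = @(x) sum(r(x).^2);                             % cost ||r||^2

x = [-1.9; 2];
for n = 1:100
    g = 2*J(x)'*r(x);                              % gradient 2*J'*r
    if norm(g) < 1e-3, break; end
    dx = -(J(x)'*J(x)) \ (J(x)'*r(x));             % normal equations J'*J*dx = -J'*r
    alpha = 1;                                     % step-halving line search
    for k = 1:30
        if f(x + alpha*dx) < f(x), break; end
        alpha = alpha/2;
    end
    x = x + alpha*dx;
end
x, n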
Gauss-Newton method with line search
[Figure: Gauss-Newton iterations on the Rosenbrock function; gradient < 1e-3 after 14 iterations]
Comparison
[Figure: Newton method with line search vs. Gauss-Newton on the Rosenbrock function. Gauss-Newton approximates the Hessian by the Jacobian product 2 J^\top J.]
Downhill simplex (amoeba) algorithm

[Figure: simplex moves: start, reflect, expand, contract]

The simplex has N + 1 vertices x_1, \ldots, x_{N+1}, ordered so that x_1 is the best (lowest cost) point and x_{N+1} the worst. At each iteration the worst point is reflected:

x_r = \bar{x} + \alpha (\bar{x} - x_{N+1})

where \bar{x} = \sum_i x_i / (N + 1) is the centroid and \alpha > 0. Compute f(x_r), and there are then 3 possibilities:

1. f(x_1) < f(x_r) < f(x_N) (i.e. x_r is neither the new best nor the worst point): replace x_{N+1} by x_r.

2. f(x_r) < f(x_1) (i.e. x_r is the new best point): assume the direction of reflection is good and generate a new point by expansion,

   x_e = x_r + \beta (x_r - \bar{x})

   where \beta > 0. If f(x_e) < f(x_r) then replace x_{N+1} by x_e; otherwise the expansion has failed, so replace x_{N+1} by x_r.

3. f(x_r) > f(x_N): assume the polytope is too large and generate a new point by contraction,

   x_c = \bar{x} + \gamma (x_{N+1} - \bar{x})

   where \gamma (0 < \gamma < 1) is the contraction coefficient. If f(x_c) < f(x_{N+1}) then the contraction has succeeded, so replace x_{N+1} by x_c; otherwise contract again.

Standard values are \alpha = 1, \beta = 1, \gamma = 0.5.
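A minimal Matlab sketch of one such iteration (the function name simplex_step is just illustrative, and the repeated-contraction and shrink steps of a full implementation are omitted):

function X = simplex_step(f, X)
% One reflect/expand/contract step on a simplex X (an N-by-(N+1) matrix
% whose columns are the vertices), for a cost function handle f.
alpha = 1; beta = 1; gamma = 0.5;                  % standard coefficients
vals = zeros(1, size(X, 2));
for i = 1:size(X, 2), vals(i) = f(X(:, i)); end
[vals, order] = sort(vals);                        % column 1 best, last column worst
X = X(:, order);
xbar = mean(X, 2);                                 % centroid of all N+1 vertices
xw   = X(:, end);                                  % worst vertex x_{N+1}
xr   = xbar + alpha*(xbar - xw);                   % reflection
if f(xr) < vals(1)                                 % new best point: try expansion
    xe = xr + beta*(xr - xbar);
    if f(xe) < f(xr), X(:, end) = xe; else, X(:, end) = xr; end
elseif f(xr) < vals(end-1)                         % neither best nor worst: accept reflection
    X(:, end) = xr;
else                                               % reflection failed: contract towards the worst vertex
    xc = xbar + gamma*(xw - xbar);
    if f(xc) < vals(end), X(:, end) = xc; end
end
end

Called repeatedly on a cost handle such as the Rosenbrock function, with a small starting simplex of N + 1 vertices, the simplex crawls towards the minimum.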
Example
[Figure: Matlab fminsearch (downhill simplex) on the Rosenbrock function with 200 iterations, shown with a zoomed detail; snapshots of the simplex show contraction and reflection steps]
Summary
no derivatives required
deals well with noise in the cost function
is able to crawl out of some local minima (though, of course, can still get stuck)
Matlab fminsearch
Nelder-Mead simplex direct search
>> banana = @(x)100*(x(2)-x(1)^2)^2+(1-x(1))^2;
Pass the function handle to fminsearch:
>> [x,fval] = fminsearch(banana,[-1.9, 2])
This produces
x = 1.0000 1.0000
fval = 4.0686e-010
What is next?
Move from general and quadratic optimization problems
to linear programming
Constrained optimization