Philippe Toint
Department of Mathematics, University of Namur, Belgium
( philippe.toint@fundp.ac.be )
Outline
1. Conclusions
April 2009
2 / 323
Acknowledgements
This course would not have been possible without
the Francqui Foundation and the Katholieke Universiteit Leuven,
Moritz Diehl, Dirk Roose and Stefan Vandewalle (the gentle
organizers),
Fabian Bastin, Stefania Bellavia, Cinzia Cirillo, Coralia Cartis, Andy
Conn, Nick Gould, Serge Gratton, Sven Leyffer, Vincent Malmedy,
Benedetta Morini, Melodie Mouffe, Annick Sartenaer, Katya
Scheinberg, Dimitri Tomanos, Melissa Weber-Mendonca (my patient
co-authors).
Ke Chen, Patrick Laloyaux (who supplied pictures)
What is optimization?
Making the best choice, subject to constraints.
More formally
variables: x = (x1, x2, . . . , xn)
objective function: minimize/maximize f(x)
constraints: such that c(x) = 0 and/or c(x) ≥ 0
(the general nonlinear optimization problem)
(+ conditions on x, f and c)
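A minimal concrete instance of this general problem can be sketched in a few lines of Python; the objective, the two constraints and the solution point below are illustrative choices, not taken from the course:

```python
# A tiny instance of the general problem:
#   minimize f(x) subject to c_eq(x) = 0 and c_ineq(x) >= 0.
# Here f(x) = (x1 - 1)^2 + (x2 - 2)^2, one equality and one bound.

def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def c_eq(x):            # must equal zero at a feasible point
    return x[0] + x[1] - 2.0

def c_ineq(x):          # must be nonnegative at a feasible point
    return x[0]

def is_feasible(x, tol=1e-9):
    return abs(c_eq(x)) <= tol and c_ineq(x) >= -tol

# Minimizer of f on the line x1 + x2 = 2 (by Lagrange conditions): (0.5, 1.5).
x_star = (0.5, 1.5)
print(is_feasible(x_star), f(x_star))
```

At the constrained solution both constraints hold and f takes the value 0.5, strictly above the unconstrained minimum 0 at (1, 2).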
Nature optimizes
[Figure: (a) power and (b) astigmatism maps; uncorrected vision vs a progressive addition lens (PAL) at long and short distance]
N = N(x, y) = [ (1 + z_y²) z_xx − 2 z_x z_y z_xy + (1 + z_x²) z_yy ] / ( 2 [ 1 + z_x² + z_y² ]^{3/2} )
(mean curvature of the lens surface z(x, y))
Sachs (2003)
k(θ) = K1 e^{−K2/θ}   (Arrhenius equation)
Evolution of temperature:
c(θ) ∂θ/∂t = ∇·[k(θ) ∇θ]
Sansom (2001)
[Figure: observations and model trajectories for weather data assimilation (slide shown incrementally)]

min_{x0} ½ ‖x0 − xb‖²_{B⁻¹} + ½ Σ_{i=0} ‖H M(ti, x0) − bi‖²_{Ri⁻¹}
CERFACS (2009)
Illustration:
U_bus = distance + 1.2 · (price of ticket) + 2.1 · (delay w.r.t. car travel) + · · ·
[Figure: fitted distributions — Normal, Lognormal and Spline]

Σ_{ij} ( u_{ij} − z_{ij} log(u_{ij}) ) + a regularization term in ‖u‖
dq/dt (t) = velocities
Applications: finance
1 risk management
2 portfolio analysis
3 FX markets
[Figure: investment distribution for the BoJ 1991-2004 — Normal vs Spline fit, Standard vs Optimized]
4 . . .
Everybody loves an optimizer!
History
Al-Khwarizmi (783-850)
J. de Lagrange (1735-1813)
Michael Powell and Roger Fletcher
Basic concepts

min_x f(x) such that c(x) ≥ 0

Difficulties:
the objective function f(x) is typically complicated (nonlinear)
it is also often costly to compute
there may be many variables
the constraints c(x) may define a complicated geometry
Basic concepts
Trust-region methods
iterative algorithms
find local solutions only
Algorithm 1.1: The trust-region framework
Until an (approximate) solution is found:
Step 1: use a model of the nonlinear function(s) within a region where it can be trusted
Step 2: compute a step that gives sufficient decrease of the model
Step 3: measure achieved and predicted reductions
Step 4: decrease the region radius if unsuccessful
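The four steps above can be sketched in a few lines of Python. Everything in this toy version — the test function, the choice of the Cauchy point as the sufficient-decrease step, and the constants 0.25, 0.75, 2.0, 0.5 — is an illustrative choice, not the course's reference implementation:

```python
import math

# Minimal trust-region loop on f(x) = x1^2 + 10*x2^2 (model = exact quadratic).

def f(x): return x[0] ** 2 + 10.0 * x[1] ** 2
def grad(x): return [2.0 * x[0], 20.0 * x[1]]
def hess_vec(v): return [2.0 * v[0], 20.0 * v[1]]   # H = diag(2, 20)

def cauchy_step(g, delta):
    gnorm = math.hypot(g[0], g[1])
    Hg = hess_vec(g)
    curv = g[0] * Hg[0] + g[1] * Hg[1]               # g^T H g > 0 here
    t = min(gnorm ** 2 / curv, delta / gnorm)        # model minimizer along -g
    return [-t * g[0], -t * g[1]]

def model_decrease(x, s):
    g, Hs = grad(x), hess_vec(s)
    return -(g[0] * s[0] + g[1] * s[1]) - 0.5 * (s[0] * Hs[0] + s[1] * Hs[1])

x, delta = [1.0, 1.0], 1.0
for k in range(60):
    g = grad(x)
    if math.hypot(g[0], g[1]) < 1e-10:               # (approximate) solution found
        break
    s = cauchy_step(g, delta)                        # Steps 1-2: model + decrease
    rho = (f(x) - f([x[0] + s[0], x[1] + s[1]])) / model_decrease(x, s)
    if rho >= 0.25:                                  # Step 3: achieved vs predicted
        x = [x[0] + s[0], x[1] + s[1]]
        if rho >= 0.75:
            delta *= 2.0                             # very successful: enlarge
    else:
        delta *= 0.5                                 # Step 4: shrink if unsuccessful
print(f(x))  # near the minimum value 0
```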
Illustration
x0 = (0.71, 3.27) and f(x0) = 97.630
[Figure: contours of f, and contours of the quadratic model m0 around x0]
Illustration
Iterates of the trust-region method (the slide builds this table one iteration at a time; ρk = δf/δmk):

k   Δk   sk             f(xk + sk)   ρk      xk+1
0   1    (0.05, 0.93)   43.742       0.998   x0 + s0
1   2    (0.62, 1.78)   2.306        1.354   x1 + s1
2   4    (3.21, 0.00)   6.295        0.004   x2
3   2    (1.90, 0.08)   29.392       0.649   x2 + s2
4   2    (0.32, 0.15)   31.131       0.857   x3 + s3
5   4    (0.03, 0.02)   31.176       1.009   x4 + s4
6   8    (0.02, 0.00)   31.179       1.013   x5 + s5
Illustration
[Figure: path of iterates; the same from another starting point x0]
And then. . .

Lesson 2:
Trust-region methods
for unconstrained problems
Background material

F(x + s) = F(x) + ∫₀¹ ∇x F(x + αs) s dα
Background material
Newton's method
Solve
F (x) = 0
Idea: solve linear approximation
F (x) + J(x)s = 0
quadratic local convergence
. . . but not globally convergent
Yet the basis of everything that follows
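A one-dimensional sketch of this idea (the function F(x) = x² − 2 is an illustrative choice):

```python
# Newton's method as on this slide: solve F(x) = 0 by repeatedly solving the
# linearization F(x) + J(x) s = 0 for the step s.

def F(x): return x * x - 2.0
def J(x): return 2.0 * x

x = 1.0
for _ in range(6):
    x = x - F(x) / J(x)   # s = -F(x)/J(x), then x <- x + s
print(x)  # ~ sqrt(2): quadratic local convergence from a nearby start
```

Starting far from a root, the same iteration can diverge — hence the need for the globalization machinery that follows.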
Background material
min f(x) such that ci(x) = 0 for i ∈ E, ci(x) ≥ 0 for i ∈ I
Active set: A(x1) = {1, 2}, A(x2) = ∅, A(x3) = {3}
[Figure: the feasible set C, its faces F{1}, F{2}, F{3}, F{1,2}, F{1,3}, F{2,3}, and the boundaries c1(x) = 0, c2(x) = 0, c3(x) = 0]

First-order (KKT) conditions at x∗ with multipliers y∗:
ci(x∗) = 0 for all i ∈ E,
ci(x∗) ≥ 0 and [y∗]i ≥ 0 for all i ∈ I,
ci(x∗)[y∗]i = 0 for all i ∈ I,
∇x f(x∗) = Σ_{i ∈ E∪I} [y∗]i ∇x ci(x∗)
(weakly active constraints: {j ∈ I | [y∗]j = 0})

normal cone of C at x ∈ C:
N(x) = {y ∈ IRⁿ | ⟨y, u − x⟩ ≤ 0 for all u ∈ C}
tangent cone of C at x ∈ C:
T(x) := N(x)⁰ = cl{α(u − x) | α ≥ 0 and u ∈ C}
Background material
Conjugate gradients
Idea: minimize a convex quadratic on successive nested Krylov subspaces
Algorithm 2.1: Conjugate-gradients (CG)
Given x0, set g0 = Hx0 + c and let p0 = −g0.
For k = 0, 1, . . . , until convergence, perform the iteration
αk = ‖gk‖² / ⟨pk, Hpk⟩
xk+1 = xk + αk pk
gk+1 = gk + αk Hpk
βk = ‖gk+1‖² / ‖gk‖²
pk+1 = −gk+1 + βk pk
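A direct pure-Python transcription of Algorithm 2.1 on a small symmetric positive definite example (the matrix and vector below are illustrative):

```python
# Conjugate gradients for min q(x) = 1/2 x^T H x + c^T x, i.e. solving H x = -c.

def matvec(H, v):
    return [sum(Hij * vj for Hij, vj in zip(row, v)) for row in H]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg(H, c, x0, tol=1e-12, maxit=100):
    x = list(x0)
    g = [gi + ci for gi, ci in zip(matvec(H, x), c)]  # g0 = H x0 + c
    p = [-gi for gi in g]                             # p0 = -g0
    for _ in range(maxit):
        if dot(g, g) < tol:
            break
        Hp = matvec(H, p)
        alpha = dot(g, g) / dot(p, Hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        g_new = [gi + alpha * Hpi for gi, Hpi in zip(g, Hp)]
        beta = dot(g_new, g_new) / dot(g, g)
        g = g_new
        p = [-gi + beta * pi for gi, pi in zip(g, p)]
    return x

H = [[4.0, 1.0], [1.0, 3.0]]   # SPD
c = [-1.0, -2.0]               # so we solve H x = (1, 2)
x = cg(H, c, [0.0, 0.0])
print(x)  # the exact solution of H x = (1, 2) is (1/11, 7/11)
```

On an n × n SPD system, CG terminates (in exact arithmetic) in at most n iterations — here, two.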
Background material
Preconditioning
Idea: change the variables x̄ = Rx and define M = RᵀR.
Algorithm 2.2: Preconditioned CG
Given x0, set g0 = Hx0 + c, and let v0 = M⁻¹g0 and p0 = −v0.
For k = 0, 1, . . . , until convergence, perform the iteration
αk = ⟨gk, vk⟩ / ⟨pk, Hpk⟩
xk+1 = xk + αk pk
gk+1 = gk + αk Hpk
vk+1 = M⁻¹gk+1
βk = ⟨gk+1, vk+1⟩ / ⟨gk, vk⟩
pk+1 = −vk+1 + βk pk
Background material
Lanczos method
Idea: compute an orthonormal basis of the successive nested Krylov
subspaces
makes QkT HQk tridiagonal
Algorithm 2.3: Lanczos
Given r0, set y0 = r0, q−1 = 0.
For k = 0, 1, . . ., perform the iteration
γk = ‖yk‖₂
qk = yk / γk
δk = ⟨qk, Hqk⟩

Background material
(the resulting tridiagonal QkᵀHQk is then factored by Cholesky)
ρk = [ f(xk) − f(xk + sk) ] / [ mk(xk) − mk(xk + sk) ]

Δk+1 ∈ [Δk, ∞) if ρk ≥ η2,
Δk+1 ∈ [γ2 Δk, Δk] if ρk ∈ [η1, η2),
Δk+1 ∈ [γ1 Δk, γ2 Δk] if ρk < η1.
Assumptions
f ∈ C², f(x) ≥ κ_lbf, ‖∇xx f(x)‖ ≤ κ_ufh
mk ∈ C²(Bk), mk(xk) = f(xk)
gk := ∇x mk(xk) = ∇x f(xk)
‖∇xx mk(x)‖ ≤ κ_umh − 1 for all x ∈ Bk

[Figure: the model mk and its trust region]
[Figures: the Cauchy point on the model along −gk]

mk(xk) − mk(xk^C) ≥ ½ ‖gk‖ min[ ‖gk‖/βk , Δk ]

mk(xk) − mk(xk^AC) ≥ κ_dcp ‖gk‖ min[ ‖gk‖/βk , Δk ]

Immediate consequence:
Suppose that ∇x f(xk) ≠ 0. Then mk(xk + sk) < mk(xk) and sk ≠ 0.
⇒ ρk is well defined!
[Figures: iterates on a contour plot; sketch of the convergence argument with subsequences {ti} and {ℓi} of iterations]

But. . .
[Figure: counterexample]

lim_{k→∞} ‖gk‖ = 0
The subproblem
Assume
Euclidean norm
quadratic model (possibly non-convex)
(drop the index k)

min_{s ∈ IRⁿ} ⟨g, s⟩ + ½ ⟨s, Hs⟩ subject to ‖s‖₂ ≤ Δ

Possible approaches
exact minimization
truncated conjugate-gradients
CG + Lanczos (GLTR)
doglegs
eigenvalue based methods
(projection methods)
‖s(λ)‖₂² = Σ_{i=1}^{n} γᵢ² / (λᵢ + λ)²
[Figure: ‖s(λ)‖₂ as a function of λ, with the solution curve ‖s(λ)‖₂ = Δ]

A nonconvex case
[Figure: ‖s(λ)‖₂ as a function of λ when H is indefinite]

[Figures: ‖s(λ)‖₂ as a function of λ, including cases ε = 1, 0.1, 0.01, 0.00001]
The secular equation: instead of ‖s(λ)‖₂ = Δ, solve

φ(λ) := 1/Δ − 1/‖s(λ)‖₂ = 0

[Figure: 1/‖s(λ)‖₂ is nearly linear in λ]

The derivatives of φ(λ)
Suppose g ≠ 0. Then φ(λ) is strictly increasing (λ > −λ₁), and concave.
φ′(λ) = ⟨s(λ), H(λ)⁻¹ s(λ)⟩ / ‖s(λ)‖₂³
Note: if H(λ) = LLᵀ and Lw = s(λ), then
⟨s(λ), H(λ)⁻¹ s(λ)⟩ = ⟨s(λ), L⁻ᵀL⁻¹ s(λ)⟩ = ‖w‖₂²
Solve LLᵀ s = −g.
Solve Lw = s.
Replace λ by λ + [ (‖s‖₂ − Δ)/Δ ] · ( ‖s‖₂² / ‖w‖₂² ).
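These three steps can be exercised on a diagonal H, where both triangular solves collapse to componentwise divisions; the data and the simple start λ = 0 below are illustrative, and a production solver would add the usual safeguards:

```python
import math

# Newton's iteration on the secular equation, specialized to H = diag(h):
# s_i = -g_i/(h_i + lam) replaces the solve L L^T s = -g, and
# ||w||^2 = sum s_i^2/(h_i + lam) replaces the solve L w = s.

def tr_subproblem_diag(h, g, delta, lam=0.0, iters=20):
    s = [-gi / (hi + lam) for gi, hi in zip(g, h)]
    for _ in range(iters):
        s = [-gi / (hi + lam) for gi, hi in zip(g, h)]       # (H + lam I)s = -g
        snorm = math.sqrt(sum(si * si for si in s))
        if abs(snorm - delta) < 1e-12 * delta:
            break
        wsq = sum(si * si / (hi + lam) for si, hi in zip(s, h))  # = ||w||^2
        lam += ((snorm - delta) / delta) * (snorm * snorm / wsq)  # slide's update
    return lam, s

h = [1.0, 3.0]     # eigenvalues of H
g = [1.0, 1.0]
delta = 0.5        # the unconstrained step has norm > delta here
lam, s = tr_subproblem_diag(h, g, delta)
print(lam, math.sqrt(sum(si * si for si in s)))  # ||s(lam)|| ~ delta
```

The iteration converges very fast here because, as the previous slide notes, 1/‖s(λ)‖₂ is nearly linear in λ.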
[Figures: truncated conjugate-gradient paths inside the trust region]
May be preconditioned
Steihaug (1983), T. (1981)
(smooth transition)
Gould-Lucidi-Roma-T. (1999)
Doglegs
Idea: use steepest descent and the full Newton step (requires convexity?)
[Figure: the dogleg curve through sC and sN, the double-dogleg curve through sD and sDD, and the trust-region boundary]
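A sketch of the single-dogleg step for a diagonal positive definite H (the data are illustrative; the boundary crossing of the second leg is found from a quadratic in τ):

```python
import math

# Dogleg step: Newton step if it fits, truncated steepest descent if even the
# Cauchy point is outside, and otherwise the segment Cauchy -> Newton cut at
# the trust-region boundary.

def norm(v): return math.sqrt(sum(vi * vi for vi in v))

def dogleg(h, g, delta):
    sN = [-gi / hi for gi, hi in zip(g, h)]           # Newton step (H convex)
    if norm(sN) <= delta:
        return sN
    gHg = sum(gi * gi * hi for gi, hi in zip(g, h))
    t = sum(gi * gi for gi in g) / gHg
    sC = [-t * gi for gi in g]                        # Cauchy point
    if norm(sC) >= delta:
        scale = delta / norm(sC)
        return [scale * si for si in sC]              # truncated steepest descent
    # tau in [0,1] with ||sC + tau (sN - sC)|| = delta (quadratic in tau)
    d = [ni - ci for ni, ci in zip(sN, sC)]
    a = sum(di * di for di in d)
    b = 2.0 * sum(ci * di for ci, di in zip(sC, d))
    c = sum(ci * ci for ci in sC) - delta * delta
    tau = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return [ci + tau * di for ci, di in zip(sC, d)]

s = dogleg([1.0, 3.0], [1.0, 1.0], 0.8)
print(norm(s))  # on the trust-region boundary: ~0.8
```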
An eigenvalue approach
Rewrite (H + λM)s = −g as

( H   g ) ( s )          ( M  0 ) ( s )
( gᵀ  θ ) ( 1 )  = μ(θ)  ( 0  1 ) ( 1 )

Bibliography
Bibliography
D. D. Morrison,
Methods for nonlinear least squares problems and convergence proofs,
Proceedings of the Seminar on Tracking Programs and Orbit Determination, (J. Lorell and F. Yagi, eds.), Jet Propulsion
Laboratory, Pasadena, USA, pp. 1-9, 1960.
M. J. D. Powell,
A New Algorithm for Unconstrained Optimization,
in Nonlinear Programming (J. B. Rosen, O. L. Mangasarian and K. Ritter, eds.), Academic Press, London, pp. 31-65,
1970.
F. Rendl and H. Wolkowicz,
A Semidefinite Framework for Trust Region Subproblems with Applications to Large Scale Minimization,
Mathematical Programming, 77(2):273-299, 1997.
M. Rojas, S. A. Santos and D. C. Sorensen,
A new matrix-free algorithm for the large-scale trust-region subproblem,
CAAM, Rice University, TR99-19, 1999.
D. Winfield,
Function and functional optimization by interpolation in data tables,
Ph.D. Thesis, Harvard University, Cambridge, USA, 1969.
Lesson 3:
Derivative-free optimization,
infinite dimensions and filters
m(yᵢ) = f(yᵢ) for all yᵢ ∈ Y;
[Figure: interpolation models of f built on the sample set (slide shown incrementally)]
To be considered:
poisedness of the interpolation set Y
choice of models (linear, quadratic, in between, beyond)
convergence theory
numerical performance
Poisedness
[Figures: would this interpolation set do? . . . or this? (poised and non-poised configurations)]
Σ_{i=1}^{p} αᵢ φᵢ(yⱼ) = f(yⱼ),  j = 1, . . . , p

δ(Y) = det ( φ₁(y₁) · · · φ_p(y₁) ; . . . ; φ₁(y_p) · · · φ_p(y_p) )
Lagrange polynomials
Remarkable: replace ȳ by y₊ in Y:
δ(Y₊)/δ(Y) = L(y₊, ȳ) is independent of the basis {φᵢ(·)}_{i=1}^{p}
where, for y ∈ Y,
L(y, ȳ) = 1 if y = ȳ, and 0 if y ≠ ȳ
mk(xk + s) = Σ_{y ∈ Yk} f(y) Lk(xk + s, y)
And then. . .
ρk = [ f(xk) − f(xk + sk) ] / [ mk(xk) − mk(xk + sk) ]

Step 4b: Replace far point. If ρk < η1 (+ other technical condition) and Fk ≠ ∅, reject xk + sk, keep Δk and exchange xk + sk with
y = arg max_{y ∈ Fk} ‖y − (xk + sk)‖² |Lk(xk + sk, y)|
Step 4c: Replace close point. If ρk < η1 (+ other technical condition) and Ck ≠ ∅, reject xk + sk, keep Δk and exchange xk + sk with
y = arg max_{y ∈ Ck} ‖y − (xk + sk)‖² |Lk(xk + sk, y)|
Step 4d: Decrease the radius. Otherwise, reject xk + sk, keep Yk, and reduce Δk.
[Figure: the derivative-free algorithm at work on an example — model and interpolation set over the iterations (slide shown incrementally)]
Main motivation:

λmin[H] = inf_{d ∈ V, d ≠ 0} ⟨d, Hd⟩ / ⟨d, d⟩
Filter algorithms
Monotonicity (1)
Filter algorithms
Monotonicity (2)
But, unfortunately,
classical safeguards limit efficiency!
Of interest: design less obstructive safeguards while
ensuring better numerical performance
(the Newton Liberation Front!)
continuing to guarantee global convergence properties
Is this possible?
Typically:
abandon strict monotonicity of usual measures
but insist on average behaviour instead
Filter algorithms
Non-monotone trust-regions
Idea: f(xk+1) < f(xk) replaced by f(xk+1) < f_{r(k)}, where r(k) is a reference iteration
Further issues:
suitably define the reference iteration r(k)
adapt the trust-region algorithm: also compare achieved and predicted reductions since the reference iteration
T. (1997)
Filter algorithms
Non-monotone TR algorithm
Algorithm 3.3: Non-monotone TR algorithm (NMTR)
Step 0: Initialization. Given: x0, Δ0, η1, η2, γ1, γ2. Compute f(x0), set k = 0.
Step 1: Model definition. Choose ‖·‖k and define mk in Bk.
Step 2: Step calculation. Compute sk that sufficiently reduces mk with xk + sk ∈ Bk.
Step 3: Acceptance of the trial point. Compute f(xk + sk) and the reduction ratio accumulated since the reference iteration r(k).
Step 4: Radius update, in the usual three cases (ρk ≥ η2, ρk ∈ [η1, η2), ρk < η1).
Filter algorithms
[Figure: a non-monotone sequence of values f(xk); the accumulated sufficient decrease over successful iterations is Σ_{t=0, t∈S}^{k} ‖gt‖ min( ‖gt‖/βt , Δt )]
Filter algorithms

ρk = max( [fr − f(xk + sk)] / [σr + mk(xk) − mk(xk + sk)] , [f(xk) − f(xk + sk)] / [mk(xk) − mk(xk + sk)] )

If ρk < η1, then xk+1 = xk and go to Step 4; otherwise xk+1 = xk + sk and
σc = σc + mk(xk) − mk(xk+1) and σr = σr + mk(xk) − mk(xk+1)
Step 3b: update the best value. If f(xk+1) < fmin then set fc = fmin = f(xk+1), σc = 0 and ℓ = 0 and go to Step 4; otherwise, ℓ ← ℓ + 1.
Step 3c: update the reference candidate. If f(xk+1) > fc, set fc = f(xk+1) and σc = 0.
Step 3d: possibly reset the reference value. If ℓ = m, set fr = fc and σr = σc.
Filter algorithms
[Figure: history of f(xk), marking the reference iterations, the new best values, and the points where the reference iteration is redefined (ℓ = m)]
Filter algorithms
An unconstrained example
[Figure: (log) objective decrease over 70 iterations]
Filter algorithms
[Figure: a filter in the (θ(x), f(x)) plane — stored entries and the region they dominate]
Filter algorithms
[Figure: the filter margin — a new entry excludes points with θ(x) ≥ (1 − γ)θk and f(x) ≥ f(xk) − γθk]
filter area (non)-monotonically decreasing
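A minimal sketch of this acceptance test (the margin γ and the sample entries are illustrative):

```python
# Filter acceptance: a trial point with constraint violation theta and
# objective f is acceptable iff, for every stored entry (theta_j, f_j), it
# improves the violation or the objective by the margin.

def acceptable(theta, f, filter_entries, gamma=0.1):
    return all(theta < (1.0 - gamma) * tj or f < fj - gamma * tj
               for tj, fj in filter_entries)

F = [(1.0, 5.0), (0.5, 7.0)]
print(acceptable(0.3, 6.0, F))   # True: improves violation over both entries
print(acceptable(0.9, 8.0, F))   # False: dominated by both entries
```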
Filter algorithms
[Figure: filter entries in the (θ²(x), f(x)) plane]

Additionally
possibly consider unsigned filter entries
use a trust-region algorithm when
trial point unacceptable
convergence to non-zero solution
(→ internal restoration)
Filter algorithms
Fortran 95 package
large scale problems (CUTEr interface)
includes several variants of the method
signed/unsigned filters
Gauss-Newton, Newton or adaptive models
pure trust-region option
uses preconditioned conjugate-gradients
+ Lanczos for subproblem solution
Filter algorithms
[Figures: performance profiles — Default vs Pure trust region; Filter vs LANCELOT; Filter vs Unfettered]
[Figure: filter entries in the (g²(x), f(x)) plane]
Filter algorithms
A few complications. . .
But . . .
[Figure: performance profile p(α) — Default vs Pure trust region vs LANB]
Filter algorithms
[Figures: log of residual versus iterations for three example runs]
Filter algorithms
Conclusions

Bibliography
Bibliography
10 M. J. D. Powell,
Developments of NEWUOA for minimization without derivatives,
IMA Journal of Numerical Analysis, 28(4):649-664, 2008.
11 K. Scheinberg and Ph. L. Toint,
Self-correcting geometry in model-based algorithms for derivative-free unconstrained optimization,
Report TR09/06, FUNDP, Namur, 2009.
12 Ph. L. Toint,
A non-monotone trust-region algorithm for nonlinear optimization subject to convex constraints,
Mathematical Programming, 77(1):69-94, 1997.
13 D. Winfield,
Function Minimization by Interpolation in a Data Table,
Journal of the IMA, 12:339-347, 1973.
Lesson 4:
Optimization with
convex constraints
[Figure: projections PC(y) onto a convex set C; y = PC(y) when y ∈ C]

[PC(y)]ᵢ = [xℓ]ᵢ if [y]ᵢ ≤ [xℓ]ᵢ,
[PC(y)]ᵢ = [y]ᵢ if [xℓ]ᵢ < [y]ᵢ < [xu]ᵢ,
[PC(y)]ᵢ = [xu]ᵢ if [xu]ᵢ ≤ [y]ᵢ
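For bound constraints this projection is just a componentwise clip; a one-line Python version:

```python
# Componentwise projection onto the box C = {x : xl <= x <= xu}:
# each coordinate is clipped to its bounds, exactly as in the formula above.

def project_box(y, xl, xu):
    return [min(max(yi, li), ui) for yi, li, ui in zip(y, xl, xu)]

print(project_box([-0.5, 0.3, 2.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
# [0.0, 0.3, 1.0]
```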
[Figure: the projected-gradient path p(t, x), constant for t ≥ tm]
Two projections
PT(x)[−∇x f(x)] and PC[x − ∇x f(x)] − x

Measuring criticality

χ(x) := χ(x, 1) = | min_{x + d ∈ F, ‖d‖ ≤ 1} ⟨∇x f(x), d⟩ |
[Figure: the projected path from xk inside the trust region, ending at xk + dk]

Find xk^GC (with tk^GC > 0) such that
mk(xk^GC) ≤ f(xk) + κ_ubs ⟨gk, sk^GC⟩
and either
mk(xk^GC) ≥ f(xk) + κ_lbs ⟨gk, sk^GC⟩
or ‖PT(xk^GC)[−gk]‖ ≤ κ_epp |⟨gk, sk^GC⟩|
or ‖sk^GC‖ ≥ κ_frd Δk (close to TR boundary)
Useful properties
Piecewise search for xk^GC is well-defined and finite
π(x, t) is non-decreasing in t
χ(x, Δ) is non-decreasing in Δ
χ(x, Δ)/Δ is non-increasing in Δ
|χ(x) − χ(y)| ≤ L‖x − y‖ if ∇x f(x) is continuous on a bounded level set
mk(xk) − mk(xk + sk) ≥ κ_CR χk min[ χk / (1 + ‖Hk‖) , Δk , 1 ]

Idea: the linesearch conditions imply
mk(xk) − mk(xk^GC) ≥ κ_ubs |⟨gk, sk^GC⟩|

As above. . .
All limit points are first-order critical, i.e. lim_{k→∞} χk = 0
But . . . does the active set settle?
(needed for 2nd-order convergence or rate)
A(xk) = A(x∗) for all sufficiently large k
A simple case
Consider C = {x ∈ IRⁿ | x ≥ 0} and build the logarithmic barrier

b(x, μ) = f(x) − μ Σ_{i=1}^{n} log(xᵢ)

when μ ↘ 0.
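The mechanism in its simplest form (the one-dimensional objective below is an illustrative choice):

```python
import math

# Log barrier for min f(x) s.t. x >= 0 with f(x) = x: the barrier
# b(x, mu) = x - mu*log(x) has minimizer x(mu) = mu (from b'(x) = 1 - mu/x = 0),
# which approaches the constrained solution x* = 0 as mu decreases.

def barrier(x, mu):
    return x - mu * math.log(x)

def barrier_min(mu):
    return mu   # closed-form stationary point of the barrier

for mu in [1.0, 0.1, 0.01]:
    x = barrier_min(mu)
    print(mu, x, barrier(x, mu))
```

Each barrier minimizer is strictly feasible, and the sequence of minimizers traces the central path toward the boundary solution.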
How it works. . .
Example: min_{x1, x2 ≥ 0} 120 [ x1²(x1 − 1) x2 + 1 ]² + 10(4 + x1)² − 150
[Figure: barrier iterates for decreasing μ (slide shown incrementally)]
[Figures: contours of the logarithmic barrier for decreasing μ, and of the reciprocal barrier Σ_{i=1}^{n} 1/[x]ᵢ]

lim_{p→∞} dist(yp, C) = 0.
Inner iteration: the radius Δk,j is updated in the usual three cases (ρk,j ≥ η2, ρk,j ∈ [η1, η2), ρk,j < η1).

[Assumptions: the barrier model Hessians ∇xx m^b_{k,j}(x, μk) are bounded, in terms of κ_bbmh(ε, μk), for all x ∈ Bk,j]

[Figure: the step from xk,j to xk,j + sk,j stays in ri{C}, with dist(xk,j + sk,j, C) ≥ ζk dist(xk,j, C)]

Generalized Cauchy condition along −gk,j, restricted to t‖gk,j‖ ≤ (1 − ζk)dk,j and x ∈ Bk:
mk,j(xk,j) − mk,j(xk,j^CC) ≥ κ ‖gk,j‖ min[ ‖gk,j‖/βk,j , Δk,j , (1 − ζk)dk,j ]

Define mk,j(xk,j + s) = m^f_{k,j}(xk,j + s) + m^b_{k,j}(xk,j + s)
Step 2: Step calculation. Define dk,j = dist(xk,j, C). Compute sk,j such that
xk,j + sk,j ∈ Bk,j ∩ C and dist(xk,j + sk,j, C) ≥ ζk dk,j
and such that it sufficiently reduces mk,j.
Compute β(xk,j + sk,j, μk) and
ρk,j = [ β(xk,j, μk) − β(xk,j + sk,j, μk) ] / [ mk,j(xk,j) − mk,j(xk,j + sk,j) ].
If ρk,j ≥ η1, define xk,j+1 = xk,j + sk,j; otherwise define xk,j+1 = xk,j.
∇x b(x, μ) = −μ X⁻¹ e and ∇xx b(x, μ) = μ X⁻²

m^b_{k,j}(xk,j + s) = −μk ⟨e, log(xk,j)⟩ − μk ⟨X⁻¹_{k,j} e, s⟩ + ½ μk ⟨s, X⁻²_{k,j} s⟩

Stop this algorithm as soon as an iterate xk,j = xk+1 is found for which
‖∇x f(xk+1) − μk X⁻¹_{k+1} e‖ ≤ εD(μk),
λmin[ ∇xx f(xk+1) + μk X⁻²_{k+1} ] ≥ −εE(μk)
(i = 1, . . . , n), (i ∉ A(xkℓ)) and lim inf_{ℓ→∞} ⟨u, [∇xx f(xkℓ)]u⟩ ≥ 0

Optimality conditions:
∇x m(x) − z = 0,  XZ = 0,  x ≥ 0,  z ≥ 0.
Perturb:
∇x m(x) − z = 0,  XZ = μe,  x > 0,  z > 0.
Hence
[ ∇xx mk,j(xk,j) + X⁻¹_{k,j} Zk,j ] Δxk,j = −∇x β^log(xk,j, μk)

Sufficient decrease in ‖gk,j‖ min[ ‖gk,j‖/βk,j , Δk,j , (1 − ζk)dk,j ], with a second-order analogue involving min[ Δ²k,j , (1 − ζk)² d²k,j ]; the radius Δk,j is updated in the usual three cases on ρk,j.
Step 5: Update the dual variables. Set zk,j+1 > 0. Increment j by one, go to Step 1.

λmin[ ∇xx f(xk+1) + X⁻¹_{k+1} Zk+1 ] ≥ −εE(μk)

Safeguarded dual update: zk,j+1 = PI[z̄k,j+1], the projection of the trial duals onto a box I built from e, zk,j and μk X⁻¹_{k,j+1} e; then zk,j − μk X⁻¹_{k,j} e → 0 whenever ‖gk,j‖ → 0.

[Figure: primal-dual trust region in the norm ⟨h, [Hk,j + X⁻¹_{k,j} Zk,j] h⟩^{1/2}; the step from xk,j to xk,j + sk,j stays in ri{C}]
⇒ all usual convergence properties for fixed μk
Scaled tests:
‖∇x f(xk+1) − zk+1‖_{[k+1]} ≤ εD(μk),
‖Xk+1 Zk+1 e − μk e‖ ≤ εC(μk),
λmin[ M^{−1/2}_{k+1} (∇xx f(xk+1) + X⁻¹_{k+1} Zk+1) M^{−1/2}_{k+1} ] ≥ −εE(μk),
with Mk+1 := Hk+1 + X⁻¹_{k+1} Zk+1

lim_{k→∞} εC(μk) μk / min_i [xk+1]ᵢ = 0.
Moreover, if exact derivatives are used, the ε(μk) can be chosen to ensure componentwise near-quadratic rate of convergence.
This is quite remarkable!

For general convex constraints, m^b_{k,j}(xk,j + sk,j) involves the curvature terms
Σ_{i=1}^{m} [yk,j]ᵢ ⟨sk,j, ∇xx ci(xk,j) sk,j⟩ and Gk,j := Σ_{i=1}^{m} [yk,j]ᵢ ∇xx ci(xk,j)
Bibliography
J. V. Burke,
On the identification of active constraints,
SIAM Journal on Numerical Analysis, 25(5):1197-1211, 1988.
J. V. Burke,
On the identification of active constraints II: the nonconvex case,
SIAM Journal on Numerical Analysis, 27(4):1081-1102, 1990.
J. V. Burke and J. J. Moré,
Exposing Constraints,
SIAM Journal on Optimization, 4(3):573-595, 1994.
J. V. Burke, J. J. Moré and G. Toraldo,
Convergence properties of trust region methods for linear and convex constraints,
Mathematical Programming A, 47(3):305-336, 1990.
A. R. Conn, N. I. M. Gould, D. Orban and Ph. L. Toint,
A primal-dual trust-region algorithm for minimizing a non-convex function subject to bound and linear equality
constraints,
Mathematical Programming, 87(2):215-249, 2000.
N. I. M. Gould, D. Orban, A. Sartenaer and Ph. L. Toint,
Superlinear Convergence of Primal-Dual Interior Point Algorithms for Nonlinear Programming,
SIAM Journal on Optimization, 11(4):974-1002, 2001.
A. R. Conn, N. I. M. Gould and Ph. L. Toint,
Global convergence of a class of trust region algorithms for optimization with simple bounds,
SIAM Journal on Numerical Analysis, 25(182):433-460, 1988.
A. R. Conn, N. I. M. Gould and Ph. L. Toint,
LANCELOT: a Fortran package for large-scale nonlinear optimization (Release A),
Springer Verlag, Heidelberg, 1992.
A. R. Conn, N. I. M. Gould, A. Sartenaer and Ph. L. Toint,
Global convergence of a class of trust region algorithms for optimization using inexact projections on convex
constraints,
SIAM Journal on Optimization, 3(1):164-221, 1993.
Lesson 5:
Sparsity, partial separability
and multilevel methods:
exploiting problem structure
Outline
Multilevel problems
Bibliography
Sparsity
Sparsity
Sparsity
A matrix is sparse when the proportion and/or distribution of
its zero entries allows its efficient numerical usage
An (oriented) graph is associated with every sparse (non)symmetric matrix
Sparsity
[Figure: a sparse Jacobian Je estimated column-group by column-group (slide shown incrementally)]
Answer: 3 finite-differences!
Sparsity
Build a difference vector h = Σ_{i ∈ group} hᵢ eᵢ
Compute v = c(x + h) − c(x)
Recover the Jacobian entries in the group's columns as vᵢ / hᵢ
April 2009
215 / 323
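The grouping trick above (one difference per group of structurally independent columns, in the spirit of Curtis-Powell-Reid) can be sketched as follows; the helper name `grouped_fd_jacobian` and the tridiagonal example are illustrative. Note how three groups suffice for the 5x5 tridiagonal pattern, matching the "3 finite-differences" answer:

```python
import numpy as np

def grouped_fd_jacobian(c, x, sparsity, groups, h=1e-6):
    """Estimate a sparse Jacobian with one difference per column group.

    c        : function IR^n -> IR^m
    x        : current point
    sparsity : boolean (m, n) array, True where dc_j/dx_i may be nonzero
    groups   : lists of column indices; columns in a group must not
               share a nonzero row (structural orthogonality)
    """
    m, n = sparsity.shape
    J = np.zeros((m, n))
    c0 = c(x)
    for group in groups:
        step = np.zeros(n)
        step[group] = h                 # one perturbation for the whole group
        v = c(x + step) - c0
        for i in group:
            rows = sparsity[:, i]       # the rows "owned" by column i
            J[rows, i] = v[rows] / h
    return J

# Tridiagonal example: columns {0, 3}, {1, 4}, {2} are structurally
# orthogonal, so 3 differences estimate the whole 5x5 Jacobian.
A = np.diag(2.0 * np.ones(5)) + np.diag(-np.ones(4), 1) + np.diag(-np.ones(4), -1)
c = lambda x: A @ x
J = grouped_fd_jacobian(c, np.zeros(5), A != 0, [[0, 3], [1, 4], [2]])
```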
[Figure: grouped finite differences applied to a sparse symmetric Hessian]

But what about the conflicts with the upper triangular part?

[Figure: exploiting symmetry to resolve the conflicting entries]

Powell and T (1979), Coleman and Moré (1984) for a graph interpretation
Partial separability

A more geometric concept:

$f(x) = \sum_{i=1}^{p} f_i(x_{S_i})$

Vocabulary:
element functions, element variables, internal variables $u_i = U_i x$
If

$f(x) = \sum_{i=1}^{p} f_i(U_i x) = \sum_{i=1}^{p} f_i(u_i)$, e.g. $f(x) = \sum_{i=1}^{n} f_i(x_i) + f_{n+1}(x_1 + \cdots + x_n)$,

then

$\nabla_x f(x) = \sum_{i=1}^{p} U_i^T \nabla f_i(u_i)$

and

$\nabla_{xx} f(x) = \sum_{i=1}^{p} U_i^T \nabla^2 f_i(u_i)\, U_i$
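The two assembly formulas can be sketched directly; `psep_derivatives` and the small separable example below are invented names for illustration, not code from the course:

```python
import numpy as np

def psep_derivatives(elements, x):
    """Assemble gradient and Hessian of f(x) = sum_i f_i(U_i x) from the
    element derivatives: grad f = sum U_i^T grad f_i(u_i) and
    hess f = sum U_i^T hess f_i(u_i) U_i."""
    n = x.size
    g, H = np.zeros(n), np.zeros((n, n))
    for U, grad_i, hess_i in elements:
        u = U @ x                       # internal variables of this element
        g += U.T @ grad_i(u)
        H += U.T @ hess_i(u) @ U
    return g, H

# Example: f(x) = sum_{i} (x_i + x_{i+1})^2; each element involves only
# two variables, so the assembled Hessian comes out tridiagonal.
n = 4
elements = []
for i in range(n - 1):
    U = np.zeros((2, n)); U[0, i] = U[1, i + 1] = 1.0
    elements.append((U,
                     lambda u: 2.0 * (u[0] + u[1]) * np.ones(2),
                     lambda u: 2.0 * np.ones((2, 2))))
g, H = psep_derivatives(elements, np.array([1.0, 2.0, 3.0, 4.0]))
```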
[Worked example: a Hessian assembled as a sum of small element Hessians]
Idea: use the computational tree for f(x) for solving Newton's equations

use the chain rule at the top of the computational tree
multiplicative decompositions (and partially separable ones)
often available from the problem modelling itself

Substantial computational gains
unpublished (?) by T. Coleman (2008)
Multilevel problems
$\min_{x \in \mathbb{R}^n} f(x)$
[Diagram: a hierarchy of levels linked by restriction R and prolongation P operators]
Restriction: $R_i : \mathbb{R}^{n_i} \to \mathbb{R}^{n_{i-1}}$
Prolongation: $P_i : \mathbb{R}^{n_{i-1}} \to \mathbb{R}^{n_i}$
with $R_i = P_i^T$
Multilevel problems
Parameter estimation in
discretized ODEs
discretized PDEs
Optimal control problems
Optimal surface design (shape optimization)
Data assimilation in weather forecast (different levels of physics in the
models)
$\min_v \int_0^1 \int_0^1 \left[ 1 + (\partial_x v)^2 + (\partial_y v)^2 \right]^{1/2} dx\, dy$

where the boundary values of $v$ are

$f(x)$ on $y = 0$, $0 \leq x \leq 1$
$0$ on $x = 0$, $0 \leq y \leq 1$
$f(x)$ on $y = 1$, $0 \leq x \leq 1$
$0$ on $x = 1$, $0 \leq y \leq 1$

with $f(x) = x(1 - x)$
[Figure: computed solutions on successively finer grids, n = 3^2 = 9, 7^2 = 49, 15^2 = 225, 31^2 = 961, 63^2 = 3969 and 127^2 = 16129; solution at level 6]
globalization technique
Efficiency & Robustness
(Unconstrained case)
Multilevel problems
Trust-region
Gratton-Sartenaer-T (2006, 2008)
Gratton-Mouffe-T-Weber Mendonca (2009)
Gratton-Mouffe-Sartenaer-T-Tomanos (2009)
T-Tomanos-Weber Mendonca (2009)
Gross-Krause (2008)
The residual equation:

$A e = r$

where $e = x_* - \tilde{x}$ (error = exact solution minus approximation) and $r = b - A\tilde{x}$ (residual).

[Figure: error modes of frequencies k = 3 and k = 24, and their superposition]
Relaxation methods

Basic iterative methods: correct the i-th component of $x_k$, in the order $1, 2, \ldots, n$, to annihilate the i-th component of $r_k$.

Jacobi:

$[x_{k+1}]_i = \frac{1}{a_{ii}} \Big( b_i - \sum_{j \neq i} a_{ij} [x_k]_j \Big)$

Gauss-Seidel:

$[x_{k+1}]_i = \frac{1}{a_{ii}} \Big( b_i - \sum_{j=1}^{i-1} a_{ij} [x_{k+1}]_j - \sum_{j=i+1}^{n} a_{ij} [x_k]_j \Big)$
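The Gauss-Seidel update can be sketched in a few lines; the example below (a 1-D Laplacian with a high-frequency initial error) is illustrative and anticipates the smoothing effect discussed next:

```python
import numpy as np

def gauss_seidel(A, b, x, sweeps=1):
    """Gauss-Seidel sweeps: component i is corrected in order 1..n using
    the components already updated in the current sweep."""
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

# 1-D Laplacian with b = 0: the exact solution is x = 0, so the iterate
# itself is the error; a few sweeps wipe out its oscillatory part.
n = 32
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x0 = np.sin(24 * np.pi * np.arange(n) / n)     # high-frequency error
x = gauss_seidel(A, np.zeros(n), x0.copy(), sweeps=10)
```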
Smoothing effect

Very effective methods at smoothing, i.e., eliminating the high-frequency (oscillatory) components of the error:

[Figure: error of the initial guess vs. error after 10 GS iterations]
The fine-level residual equation $A^f e^f = r^f$ is approximated on the coarse level by

$A^c e^c = r^c$, with $A^c = R A^f P$, $e^c = R e^f$, $r^c = R r^f$
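The coarse-grid correction idea fits in a short sketch: pre-smooth, solve the restricted residual equation with the Galerkin operator $RAP$, prolongate the correction, post-smooth. Names, the damped-Jacobi smoother and the small 1-D example are illustrative assumptions:

```python
import numpy as np

def smooth(A, b, x, sweeps):
    # damped Jacobi smoother (weight 2/3): damps the oscillatory error
    D = np.diag(A)
    for _ in range(sweeps):
        x = x + (2.0 / 3.0) * (b - A @ x) / D
    return x

def two_grid(A, b, x, R, P, sweeps=2):
    """One two-grid cycle on A x = b."""
    x = smooth(A, b, x, sweeps)              # pre-smoothing
    r = b - A @ x                            # fine-level residual
    ec = np.linalg.solve(R @ A @ P, R @ r)   # coarse residual equation
    x = x + P @ ec                           # coarse-grid correction
    return smooth(A, b, x, sweeps)           # post-smoothing

# 1-D Laplacian: 7 fine unknowns, 3 coarse; linear interpolation P and
# restriction R = 0.5 * P^T.
nc, nf = 3, 7
P = np.zeros((nf, nc))
for j in range(nc):
    P[2 * j, j], P[2 * j + 1, j], P[2 * j + 2, j] = 0.5, 1.0, 0.5
R = 0.5 * P.T
A = 2.0 * np.eye(nf) - np.eye(nf, k=1) - np.eye(nf, k=-1)
b = np.ones(nf)
x = np.zeros(nf)
for _ in range(10):
    x = two_grid(A, b, x, R, P)
```

Recursing on the coarse solve instead of using `np.linalg.solve` gives the V-cycle.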
[Diagram: smooth on the fine level, restrict (R) the remaining smooth error to the coarser level where it looks oscillatory again, recur, then prolongate (P) the correction and smooth once more]
Does it work?

Smoothing on fine grid only:

[Figure: error surfaces after k = 0, k = 10 and k = 100 smoothing iterations]
V-cycle

[Diagram: V-cycle, with smoothing at each level]

W-cycle

[Diagram: W-cycle, with smoothing at each level]
Multilevel problems
Mesh Refinement
Return to optimization

Hierarchy of problem descriptions
Trust-region technique
Efficiency & Robustness
The framework

Assume that we have:

A hierarchy of problem descriptions of $f$: $\{f_i\}_{i=0}^{r}$ with $f_r(x) = f(x)$
$R_i : \mathbb{R}^{n_i} \to \mathbb{R}^{n_{i-1}}$ (the restriction)
$P_i : \mathbb{R}^{n_{i-1}} \to \mathbb{R}^{n_i}$ (the prolongation)
The idea

For $\min_{x \in \mathbb{R}^n} f(x)$, at $x_k$: compute a trial step $s_k$ either at the current level, or on the restricted (R) problem at the level below, prolongating (P) the resulting correction back.

If more than two levels are available (r > 1), do this recursively
[Diagram: the recursion through levels 4 down to 0 via the restrictions R_i and prolongations P_i]

Notation: i: level index (0 ≤ i ≤ r)
If $f_i = 0$ for $i = 0, \ldots, r-1$, with $H_{\rm low} = R H_{\rm up} P$:

Galerkin model: the restricted version of the quadratic model at the upper level
At the lower level, iterates must stay inside the trust region inherited from the upper level:

$\Delta_{\rm low} = \min\left( \Delta_{\rm low}^{+},\; \Delta_{\rm up} - \|x_{{\rm low},k} - x_{{\rm low},0}\| \right)$
In infinity norm:

$\Delta = \min\left( \Delta_{\rm low},\; \Delta_{\rm up} - \|x_{{\rm low},k} - x_{{\rm low},0}\|_\infty \right)$

Gratton, Mouffe, T, Weber Mendonca (2008)
Recur to the lower level only if the restricted gradient is large enough:

$\|R\, g_{\rm up}\| \geq \kappa \|g_{\rm up}\|$  (Coarsening condition)
Combine smoothing and coarsening: be as flexible as possible
Global convergence

Based on the trust-region technology
Uses the sufficient decrease argument (imposed in Taylor's iterations)
Plus the coarsening condition ($\|R g_{\rm up}\| \geq \kappa \|g_{\rm up}\|$)

Main result:

$\lim_{k \to \infty} \|g_{r,k}\| = 0$
In more details

Intermediate results

At iteration (i, k) we associate the set $\mathcal{R}(i, k)$:

[Diagram: the iterations generated below iteration (i, k) in the level recursion]
Let $\mathcal{V}(i, k) \stackrel{\rm def}{=} \mathcal{R}(i, k)$
Premature termination

For a recursive iteration (i, k): a minimization sequence at level i − 1 initiated at iteration (i, k) denotes all successive iterations at level i − 1 until a return is made to level i.

[Diagram: a minimization sequence within the level recursion]
In more details
Properties of RMTR
Each minimization sequence contains at least one successful iteration
Premature termination in that case does not affect the convergence
results at the upper level
Which allows
Stop a minimization sequence after a preset number of successful
iterations
Use fixed lower-iterations patterns like the V or W cycles in multigrid
methods
SCM Smoothing

Three issues

Weak minimizers
Gratton, Sartenaer, T (2006)
Numerical results
nr         r   Type             Description
511        8   1-D, quadratic   Dirichlet-to-Neumann transfer problem
1.046.529  9   2-D, quadratic   Poisson model problem
250.047    5   3-D, quadratic   Poisson model problem
1.046.529  9   2-D, quadratic   Elastic-plastic torsion problem
1.046.529  9   2-D, quadratic   Journal bearing problem
65.025     7   2-D, convex      Optimal design problem
1.046.529  9   2-D, convex      Minimum surface problem
65.025     7   2-D, convex      Minimum surface problem
65.025     7   2-D, convex      Minimum surface problem
65.025     7   2-D, convex      Combustion problem
1.046.529  9   2-D, convex      Combustion problem
1.046.529  9   2-D, convex      Combustion problem
65.025     7   2-D, convex      Minimum surface problem
393.984    9   2-D, convex      Membrane problem
103.050    7   2-D, nonconvex   Optimal control problem
103.050    7   2-D, nonconvex   Optimal control problem
1.046.529  9   2-D, nonconvex   Boundary value problem
[Figures: performance profiles of the FM, MR, MF and AF variants]
CPU times

[Table: CPU times in seconds of the AF, MF, MR and FM variants on the 17 test problems (DNT, P2D, P3D, DEPT, DPJB, DODC, MINS-SB, MINS-OB, MINS-DMSA, IGNISC, DSSC, BRATU, MINS-BC, MEMBR, NCCS, NCCO, MOREBV), with best and second-best times highlighted]
Bibliography
Lesson 6:
Cubic and quadratic
regularization methods:
a path towards
nonlinear step control
Outline

Cubic regularization
Quadratic regularization
Conclusions
Bibliography
Regularization techniques for unconstrained problems
The problem

$\min_{x \in \mathbb{R}^n} f(x)$, with $f \in C^1$ (maybe $C^2$)
Trust-region methods

compute a step $s_k$ from $x_k$ to improve a model $m_k$ of $f$ within the trust region $\|s_k\| \leq \Delta_k$
set $x_{k+1} = x_k + s_k$ if $m_k$ and $f$ agree at $x_k + s_k$
otherwise set $x_{k+1} = x_k$ and reduce the radius $\Delta_k$
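The loop just described can be sketched as follows, using the simple Cauchy-point step; the function name, the acceptance thresholds (0.01 and 0.9) and the radius factors are illustrative choices, not the course's specific parameters:

```python
import numpy as np

def trust_region(f, grad, hess, x, delta=1.0, tol=1e-8, max_iter=200):
    """Basic trust-region loop with a safeguarded Cauchy-point step."""
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) <= tol:
            break
        # Cauchy point: minimize the quadratic model along -g in the region
        gHg = g @ H @ g
        t_max = delta / np.linalg.norm(g)
        t = min(t_max, (g @ g) / gHg) if gHg > 0 else t_max
        s = -t * g
        pred = -(g @ s + 0.5 * s @ H @ s)      # model decrease
        rho = (f(x) - f(x + s)) / pred          # agreement of m_k and f
        if rho > 0.01:
            x = x + s                           # accept the step
        if rho > 0.9:
            delta *= 2.0                        # very successful: expand
        elif rho <= 0.01:
            delta *= 0.5                        # unsuccessful: shrink
    return x

# Convex quadratic example: the minimizer solves A x = b, i.e. x* = [1, 0.2].
A = np.diag([1.0, 5.0]); b = np.ones(2)
x_star = trust_region(lambda x: 0.5 * x @ A @ x - b @ x,
                      lambda x: A @ x - b,
                      lambda x: A,
                      np.array([2.0, 2.0]))
```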
Sufficient (Cauchy) model decrease:

$m_k(x_k) - m_k(x_k + s_k) \geq \kappa_{TR} \|g_k\| \min\left( \frac{\|g_k\|}{1 + \|H_k\|},\; \Delta_k \right)$
Cubic overestimation

Assume
$f \in C^2$
$f$, $g$ and $H$ at $x_k$ are $f_k$, $g_k$ and $H_k$
symmetric approximation $B_k$ to $H_k$
$B_k$ and $H_k$ bounded at points of interest

Use the cubic overestimating model at $x_k$:

$m_k(s) = f_k + s^T g_k + \tfrac{1}{2} s^T B_k s + \tfrac{1}{3} \sigma_k \|s\|_2^3$

$\sigma_k$ is the iteration-dependent regularisation weight

easily generalized for regularisation in the $M_k$-norm $\|s\|_{M_k} = \sqrt{s^T M_k s}$, where $M_k$ is uniformly positive definite
Step 2: Step acceptance: compute

$\rho_k = \frac{f(x_k) - f(x_k + s_k)}{f(x_k) - m_k(s_k)}$

and set $x_{k+1} = x_k + s_k$ if $\rho_k > 0.1$, and $x_{k+1} = x_k$ otherwise.

Step 3: Update the regularization parameter:

$\sigma_{k+1} \in (0, \sigma_k]$, e.g. $\tfrac{1}{2}\sigma_k$, if $\rho_k > 0.9$ [very successful]
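The cubic model, the ρ-test at 0.1 and the halving of σ on very successful steps can be sketched as below. The subproblem is solved through a one-dimensional root-find in λ (using s = −(B + λI)⁻¹g with λ = σ‖s‖, anticipating the secular-equation viewpoint); all names are invented, the σ-doubling rule for non-very-successful steps is a simplification, and the "hard case" of the subproblem is ignored:

```python
import numpy as np

def cubic_subproblem(g, B, sigma):
    """Minimizer of g^T s + 1/2 s^T B s + sigma/3 ||s||^3 via bisection
    on lam, where s(lam) = -(B + lam I)^{-1} g and lam = sigma ||s||."""
    lo = max(0.0, -np.linalg.eigvalsh(B)[0]) + 1e-12
    norm_s = lambda lam: np.linalg.norm(
        np.linalg.solve(B + lam * np.eye(len(g)), -g))
    hi = lo + 1.0
    while sigma * norm_s(hi) > hi:          # bracket the root
        hi *= 2.0
    for _ in range(100):                    # bisection on lam
        mid = 0.5 * (lo + hi)
        if sigma * norm_s(mid) > mid:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return np.linalg.solve(B + lam * np.eye(len(g)), -g)

def arc(f, grad, hess, x, sigma=1.0, tol=1e-8, max_iter=100):
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        B = hess(x)
        s = cubic_subproblem(g, B, sigma)
        dec = -(g @ s + 0.5 * s @ B @ s + sigma / 3 * np.linalg.norm(s) ** 3)
        rho = (f(x) - f(x + s)) / dec       # Step 2: acceptance test
        if rho > 0.1:
            x = x + s
        sigma = 0.5 * sigma if rho > 0.9 else 2.0 * sigma   # Step 3
    return x

# Convex quadratic example: the minimizer solves A x = b, i.e. x* = [1, 0.2].
A = np.diag([1.0, 5.0]); b = np.ones(2)
x_star = arc(lambda x: 0.5 * x @ A @ x - b @ x,
             lambda x: A @ x - b,
             lambda x: A,
             np.array([2.0, 2.0]))
```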
$m_k(x_k) - m_k(x_k + s_k) \geq \kappa_{CR} \|g_k\| \min\left( \frac{\|g_k\|}{1 + \|H_k\|},\; \sqrt{\frac{\|g_k\|}{\sigma_k}} \right)$

$\|s_k\| \leq 3 \max\left( \frac{\|H_k\|}{\sigma_k},\; \sqrt{\frac{\|g_k\|}{\sigma_k}} \right)$

(Cartis/Gould/T)
Fast convergence

For fast asymptotic convergence, one needs to improve on the Cauchy point: minimize over Krylov subspaces.

g stopping-rule: $\|\nabla_s m_k(s_k)\| \leq \min(1, \|g_k\|^{1/2}) \|g_k\|$
s stopping-rule: $\|\nabla_s m_k(s_k)\| \leq \min(1, \|s_k\|) \|g_k\|$

If $B_k$ satisfies the Dennis-Moré condition $\|(B_k - H_k)s_k\|/\|s_k\| \to 0$ whenever $\|g_k\| \to 0$, and $x_k \to x_*$ with positive definite $H(x_*)$,
$\Rightarrow$ Q-superlinear convergence of $x_k$ under the g- and s-rules.

If additionally $H(x)$ is locally Lipschitz around $x_*$ and $\|(B_k - H_k)s_k\| = O(\|s_k\|^2)$,
$\Rightarrow$ Q-quadratic convergence.
Function-evaluation complexity

How many function evaluations (iterations) are needed to ensure that $\|g_k\| \leq \epsilon$?

As long as, for very successful iterations, $\sigma_{k+1} \geq \gamma_3 \sigma_k$ with $\gamma_3 < 1$,
$\Rightarrow$ the basic ARC algorithm requires at most $\lceil C/\epsilon^2 \rceil$ function evaluations, for some $C$ independent of $\epsilon$ (c.f. steepest descent).

If $H$ is globally Lipschitz, the s-rule is applied and additionally $s_k$ is the global (line) minimizer of $m_k(\alpha s_k)$ as a function of $\alpha$,
$\Rightarrow$ the ARC algorithm requires at most $\lceil S/\epsilon^{3/2} \rceil$ function evaluations, for some $S$ independent of $\epsilon$.
Derivatives of the cubic model, with $\lambda = \sigma \|s\|_2$:

$\nabla_s m(s) = g + Bs + \lambda s$

$\nabla_{ss} m(s) = B + \lambda I + \lambda \frac{s}{\|s\|} \frac{s^T}{\|s\|}$
With $\lambda = \sigma \|s\|_2$, define $s(\lambda)$ by

$(B + \lambda I)\, s(\lambda) = -g$

and find the scalar $\lambda$ as the root of a secular equation, e.g.

$\frac{\lambda}{\sigma} - \|s(\lambda)\|_2 = 0$ or $\frac{1}{\|s(\lambda)\|_2} - \frac{\sigma}{\lambda} = 0$
[Figure: performance profile comparing ACO with the g stopping rule (3 failures), ACO with the s stopping rule (3 failures), and trust-region (8 failures)]
Objectives:
recover a unified global convergence theory
possibly open the door for new algorithms
Idea:
cast all three methods into a generalized TR framework
allow this TR to be updated nonlinearly
$\min_s m_k(x + s)$ subject to $x + s \in \mathcal{F}$
Generalized Cauchy point $x_k^{GC} = x_k + s_k^{GC}$ (with $t_k^{GC} > 0$) such that

$m_k(x_k^{GC}) \leq f(x_k) + \kappa_{ubs} \langle g_k, s_k^{GC} \rangle$

and either

$m_k(x_k^{GC}) \geq f(x_k) + \kappa_{lbs} \langle g_k, s_k^{GC} \rangle$

or

$\|P_{T(x_k^{GC})}[g_k]\| \leq \kappa_{epp} |\langle g_k, s_k^{GC} \rangle|$

no trust-region condition!
$\sigma_{k+1} \in (0, \sigma_k]$ if $\rho_k \geq \eta_2$,
$\sigma_{k+1} \in [\sigma_k, \gamma_1 \sigma_k]$ if $\rho_k \in [\eta_1, \eta_2)$,
$\sigma_{k+1} \in [\gamma_1 \sigma_k, \gamma_2 \sigma_k]$ if $\rho_k < \eta_1$.
A Cauchy-type model decrease and a bound on $\|s_k\|$, analogous to the trust-region and cubic-regularization cases, hold in this framework.

And therefore. . .

$\lim_{k \to \infty} \chi_k = 0$

(Cartis/Gould/T)
where

$\chi^m_k(x) \stackrel{\rm def}{=} \left| \min_{x + d \in \mathcal{F},\, \|d\| \leq 1} \langle \nabla_x m_k(x), d \rangle \right|$

c.f. the s-rule for the unconstrained case
[Figure: the generalized Cauchy point search along $x - t\, g_k$ within the feasible region, for the model $y^3 + \tfrac{1}{3}[x^2 + y^2]^{3/2}$]
Conclusions
non-smooth techniques
specifically convex problems
penalty functions
augmented Lagrangians
affine scaling methods
general sequential quadratic programming (SQP)
systems of nonlinear equations
...