Mirror descent
Yuxin Chen
Princeton University, Spring 2018
Outline
• Mirror descent
• Bregman divergence
• Convergence analysis
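A proximal view of gradient descent
$$x^{t+1} = \arg\min_{x \in C} \Big\{ \underbrace{f(x^t) + \langle \nabla f(x^t), x - x^t \rangle}_{\text{linear approximation}} + \underbrace{\frac{1}{2\eta_t}\|x - x^t\|_2^2}_{\text{proximity term}} \Big\}$$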
Proximal gradient methods 5-2
(Least squares) $\text{minimize}_{\beta \in \mathbb{R}^p} \;\; f(\beta) := \frac{1}{2}\|X\beta - y\|_2^2$

(Lasso) $\text{minimize}_{\beta \in \mathbb{R}^p} \;\; \underbrace{\tfrac{1}{2}\|X\beta - y\|_2^2}_{:= f(\beta)} + \underbrace{\lambda\|\beta\|_1}_{:= g(\beta)}$

For the Lasso, the same proximal view yields the proximal gradient update
$$\beta^{t+1} = \arg\min_{\beta} \Big\{ f(\beta^t) + \langle \nabla f(\beta^t), \beta - \beta^t \rangle + \frac{1}{2\mu_t}\|\beta - \beta^t\|_2^2 + g(\beta) \Big\} = \arg\min_{\beta} \Big\{ \frac{1}{2\mu_t}\big\|\beta - \big(\beta^t - \mu_t \nabla f(\beta^t)\big)\big\|_2^2 + g(\beta) \Big\}$$
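A minimal Python/NumPy sketch of the Lasso update above (the proximal gradient method, often called ISTA). It assumes a constant step $\mu_t = 1/\|X\|_2^2$, i.e. the reciprocal of the Lipschitz constant of $\nabla f$; the data, `lam`, and `num_iters` are hypothetical. For $g(\beta) = \lambda\|\beta\|_1$ the minimization in the second form is solved in closed form by entrywise soft-thresholding.

```python
import numpy as np


def soft_threshold(v, tau):
    """Prox of tau * ||.||_1: shrink each entry toward 0 by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_objective(X, y, beta, lam):
    """f(beta) + g(beta) = 0.5 * ||X beta - y||_2^2 + lam * ||beta||_1."""
    return 0.5 * np.sum((X @ beta - y) ** 2) + lam * np.sum(np.abs(beta))


def proximal_gradient_lasso(X, y, lam, num_iters=500):
    """Proximal gradient method with constant step mu = 1 / ||X||_2^2."""
    mu = 1.0 / np.linalg.norm(X, 2) ** 2  # ||X||_2 = largest singular value
    beta = np.zeros(X.shape[1])
    history = [lasso_objective(X, y, beta, lam)]
    for _ in range(num_iters):
        grad = X.T @ (X @ beta - y)  # gradient of f at beta^t
        # argmin_beta 1/(2 mu) ||beta - (beta^t - mu grad)||^2 + lam ||beta||_1
        beta = soft_threshold(beta - mu * grad, mu * lam)
        history.append(lasso_objective(X, y, beta, lam))
    return beta, history


# Hypothetical sparse regression instance.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + 0.01 * rng.standard_normal(50)
beta_hat, history = proximal_gradient_lasso(X, y, lam=1.0)
print(history[0], history[-1])
```

With this step size the objective values recorded in `history` should be non-increasing, which makes the sketch easy to sanity-check.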
• the quadratic proximal term is used by GD to monitor the discrepancy between $f(\cdot)$ and its first-order approximation
• by the optimality condition, $x^{t+1}$ is the point where $f(x^t) + \langle \nabla f(x^t), x - x^t \rangle$ and $-\frac{1}{2\eta_t}\|x - x^t\|_2^2$ have the same slope
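To make the optimality condition concrete, here is a small sketch for the unconstrained case $C = \mathbb{R}^n$ (an assumption; for a general $C$ there is no closed form). The minimizer of the surrogate is then $x^{t+1} = x^t - \eta_t \nabla f(x^t)$, and the check confirms that the slope of the linear part, $\nabla f(x^t)$, matches the slope of $-\frac{1}{2\eta_t}\|x - x^t\|_2^2$ at that point. The least-squares test function and all names are hypothetical.

```python
import numpy as np


def surrogate_grad(x, x_t, grad_f_xt, eta):
    """Gradient in x of f(x_t) + <grad f(x_t), x - x_t> + ||x - x_t||^2 / (2 eta)."""
    return grad_f_xt + (x - x_t) / eta


def proximal_gd_step(x_t, grad_f_xt, eta):
    """Closed-form minimizer of the surrogate when C = R^n."""
    return x_t - eta * grad_f_xt


# Hypothetical smooth test function f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)


def grad_f(x):
    return A.T @ (A @ x - b)


x_t = rng.standard_normal(3)
eta = 0.1
x_next = proximal_gd_step(x_t, grad_f(x_t), eta)

# Same slope: grad f(x_t) equals the gradient of -(1/(2 eta))||x - x_t||^2 at x_next.
print(np.allclose(grad_f(x_t), -(x_next - x_t) / eta))
```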
$$\le \|x^t - x^*\|_2^2 - \eta^2\|\nabla f(x^t) - \nabla f(x^*)\|_2^2 \le \|x^t - x^*\|_2^2$$
$$\text{minimize}_{x \in \mathbb{R}^n} \;\; f(x) = \frac{1}{2}(x - x^*)^\top Q (x - x^*)$$
where $Q \succ 0$ is a diagonal matrix with large $\kappa = \frac{\max_i Q_{i,i}}{\min_i Q_{i,i}} \gg 1$
Gradient methods
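A sketch of why a large $\kappa$ hurts gradient descent on this quadratic, assuming the step $\eta = 1/\max_i Q_{i,i}$ and hypothetical values $Q = \mathrm{diag}(100, 1)$, $x^* = 0$, $x^0 = \mathbf{1}$. Since $\nabla f(x) = Q(x - x^*)$, coordinate $i$ of the error is multiplied by $1 - \eta Q_{i,i}$ at each step, so the flattest direction only shrinks by $1 - 1/\kappa$ per iteration.

```python
import numpy as np


def gd_diagonal_quadratic(q, x_star, x0, num_iters):
    """GD with eta = 1 / max_i Q_ii on f(x) = 0.5 (x - x*)^T diag(q) (x - x*)."""
    eta = 1.0 / q.max()
    x = x0.astype(float)
    for _ in range(num_iters):
        x = x - eta * q * (x - x_star)  # grad f(x) = Q (x - x*)
    return x


q = np.array([100.0, 1.0])  # hypothetical diagonal of Q, so kappa = 100
kappa = q.max() / q.min()
x_star = np.zeros(2)
x0 = np.ones(2)
x_T = gd_diagonal_quadratic(q, x_star, x0, num_iters=100)
print(x_T)  # first entry is 0; second is (1 - 1/kappa)**100, about 0.366
```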
total-variation distance
$$\text{minimize}_{x \in \Delta} \;\; f(x)$$
where $\Delta := \{x \in \mathbb{R}^n_+ \mid \mathbf{1}^\top x = 1\}$ is the probability simplex
Mirror descent
Bregman divergence
with $g^t \in \partial f(x^t)$
$$\varphi(x) = \frac{1}{2}x^\top Q x$$
Mirror descent 5-15
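A quick numerical check, with a hypothetical diagonal $Q \succ 0$, that the Bregman divergence generated by $\varphi(x) = \frac{1}{2}x^\top Q x$ is $D_\varphi(x, y) = \frac{1}{2}(x - y)^\top Q (x - y)$, i.e. a squared distance measured in the geometry of $Q$. The helper `bregman` simply implements the definition $D_\varphi(x, y) = \varphi(x) - \varphi(y) - \langle \nabla\varphi(y), x - y\rangle$.

```python
import numpy as np


def bregman(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)


Q = np.diag([4.0, 1.0, 0.25])  # hypothetical positive definite Q


def phi(x):
    return 0.5 * x @ Q @ x


def grad_phi(x):
    return Q @ x


rng = np.random.default_rng(1)
x, y = rng.standard_normal(3), rng.standard_normal(3)
d = x - y
print(bregman(phi, grad_phi, x, y), 0.5 * d @ Q @ d)  # the two values agree
```

With this $\varphi$ and no constraint, the mirror step $\nabla\varphi(x^{t+1}) = \nabla\varphi(x^t) - \eta_t g^t$ reads $x^{t+1} = x^t - \eta_t Q^{-1} g^t$, a preconditioned gradient step; on the diagonal quadratic example with the same $Q$ and $\eta_t = 1$ it reaches $x^*$ in one step.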
Example: KL divergence
Mirror descent
• for Euclidean case with $\varphi(x) = \|x\|_2^2$, this is the law of cosines
$$\|x - z\|_2^2 = \|x - y\|_2^2 + \|y - z\|_2^2 - 2\underbrace{\langle z - y, x - y \rangle}_{\|z - y\|_2\,\|x - y\|_2 \cos\angle zyx}$$
Mirror descent 5-21
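A numerical sanity check of the law of cosines above on hypothetical random points in $\mathbb{R}^4$: with $\varphi(x) = \|x\|_2^2$ one has $\nabla\varphi(x) = 2x$ and $D_\varphi(x, z) = \|x - z\|_2^2$, so evaluating the Bregman three-point identity $D_\varphi(x,z) = D_\varphi(x,y) + D_\varphi(y,z) - \langle \nabla\varphi(z) - \nabla\varphi(y), x - y\rangle$ reproduces it.

```python
import numpy as np


def phi(v):
    return v @ v  # phi(x) = ||x||_2^2


def grad_phi(v):
    return 2.0 * v


def D(a, b):
    """Bregman divergence of phi; equals ||a - b||_2^2 here."""
    return phi(a) - phi(b) - grad_phi(b) @ (a - b)


rng = np.random.default_rng(2)
x, y, z = rng.standard_normal((3, 4))  # three arbitrary points

three_point_rhs = D(x, y) + D(y, z) - (grad_phi(z) - grad_phi(y)) @ (x - y)
law_of_cosines_rhs = np.sum((x - y) ** 2) + np.sum((y - z) ** 2) - 2 * (z - y) @ (x - y)
cos_zyx = (z - y) @ (x - y) / (np.linalg.norm(z - y) * np.linalg.norm(x - y))
print(D(x, z), three_point_rhs, law_of_cosines_rhs, cos_zyx)
```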
Proof of three-point lemma
Here, (i) follows since (a) in exponential families, one has $\mu = \nabla\varphi(\theta)$ and $\nabla\varphi^*(\mu) = \theta$, and (b) $\langle \mu, \theta \rangle = \varphi(\theta) + \varphi^*(\mu)$ (homework)
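A sketch checking facts (a) and (b) for one concrete exponential family, the Bernoulli family, chosen here for illustration: $\varphi(\theta) = \log(1 + e^\theta)$, so $\mu = \nabla\varphi(\theta)$ is the sigmoid, $\varphi^*(\mu) = \mu\log\mu + (1-\mu)\log(1-\mu)$, and $\nabla\varphi^*(\mu)$ is the logit.

```python
import numpy as np


def phi(theta):
    """Log-partition function of the Bernoulli family: log(1 + e^theta)."""
    return np.logaddexp(0.0, theta)


def grad_phi(theta):
    """Mean parameter mu = sigmoid(theta)."""
    return 1.0 / (1.0 + np.exp(-theta))


def phi_conj(mu):
    """Convex conjugate of phi: negative binary entropy."""
    return mu * np.log(mu) + (1.0 - mu) * np.log(1.0 - mu)


def grad_phi_conj(mu):
    """Logit, the inverse map of grad_phi."""
    return np.log(mu) - np.log(1.0 - mu)


theta = np.linspace(-3.0, 3.0, 7)
mu = grad_phi(theta)
print(np.allclose(grad_phi_conj(mu), theta))               # fact (a)
print(np.allclose(mu * theta, phi(theta) + phi_conj(mu)))  # fact (b)
```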
Three-point lemma

Fact 5.1
For every three points $x, y, z$,
$$D_\varphi(x, z) = D_\varphi(x, y) + D_\varphi(y, z) - \langle \nabla\varphi(z) - \nabla\varphi(y),\, x - y \rangle$$

Generalized Pythagorean Theorem

Fact 5.2
If $x_{C,\varphi} = P_{C,\varphi}(x)$, where $P_{C,\varphi}(x) := \arg\min_{z \in C} D_\varphi(z, x)$ is the Bregman projection, then
$$D_\varphi(z, x) \ge D_\varphi(z, x_{C,\varphi}) + D_\varphi(x_{C,\varphi}, x), \qquad \forall z \in C$$
• in the squared Euclidean case, it means the angle $\angle z\, x_{C,\varphi}\, x$ is obtuse
• if $C$ is an affine plane, then
$$D_\varphi(z, x) = D_\varphi(z, x_{C,\varphi}) + D_\varphi(x_{C,\varphi}, x), \qquad \forall z \in C$$
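Proof of Fact 5.2: Let
$$g = \nabla_z D_\varphi(z, x)\Big|_{z = x_{C,\varphi}} = \nabla\varphi(x_{C,\varphi}) - \nabla\varphi(x)$$
By the optimality condition of the projection,
$$\langle g, z - x_{C,\varphi} \rangle \ge 0 \qquad \forall z \in C$$
Therefore, for all $z \in C$,
$$0 \le \langle \nabla\varphi(x_{C,\varphi}) - \nabla\varphi(x),\, z - x_{C,\varphi} \rangle = D_\varphi(z, x) - D_\varphi(z, x_{C,\varphi}) - D_\varphi(x_{C,\varphi}, x)$$
as claimed, where the last line comes from Fact 5.1.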
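A numerical sketch of Facts 5.1 and 5.2 for a non-Euclidean choice, assuming the negative entropy $\varphi(x) = \sum_i x_i \log x_i$ on the positive orthant, whose Bregman divergence is the generalized KL divergence $\sum_i \big(x_i\log(x_i/y_i) - x_i + y_i\big)$. For $C$ the probability simplex, the KL projection of a positive vector is its normalization; since $\nabla\varphi(x_{C,\varphi}) - \nabla\varphi(x)$ is then a constant vector and $\mathbf{1}^\top(z - x_{C,\varphi}) = 0$, the inequality of Fact 5.2 holds with equality here, as in the affine-plane case. The points are random and hypothetical.

```python
import numpy as np


def phi(x):
    """Negative entropy sum_i x_i log x_i (x > 0)."""
    return np.sum(x * np.log(x))


def grad_phi(x):
    return np.log(x) + 1.0


def bregman_kl(x, y):
    """Bregman divergence of phi: generalized KL divergence."""
    return np.sum(x * np.log(x / y) - x + y)


def kl_projection_simplex(x):
    """argmin over the simplex of D_phi(z, x) for x > 0: normalize x."""
    return x / x.sum()


rng = np.random.default_rng(3)
n = 5
x, y, z = rng.uniform(0.1, 2.0, size=(3, n))

# Definition check: D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
definition_ok = np.isclose(bregman_kl(x, y), phi(x) - phi(y) - grad_phi(y) @ (x - y))

# Fact 5.1 (three-point lemma): this gap should vanish
three_point_gap = (bregman_kl(x, z) - bregman_kl(x, y) - bregman_kl(y, z)
                   + (grad_phi(z) - grad_phi(y)) @ (x - y))

# Fact 5.2 with C = simplex: equality case
w = rng.uniform(0.1, 2.0, size=n)  # point to project
w_C = kl_projection_simplex(w)
u = rng.dirichlet(np.ones(n))      # arbitrary point of C
pythagorean_gap = bregman_kl(u, w) - bregman_kl(u, w_C) - bregman_kl(w_C, w)
print(definition_ok, three_point_gap, pythagorean_gap)
```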
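Alternative forms of mirror descent
$$\nabla\varphi(y^{t+1}) = \nabla\varphi(x^t) - \eta_t g^t \quad \text{with } g^t \in \partial f(x^t) \tag{5.3a}$$
$$x^{t+1} \in P_{C,\varphi}(y^{t+1}) = \arg\min_{z \in C} D_\varphi(z, y^{t+1}) \tag{5.3b}$$
When no projection is needed (e.g. $C = \mathbb{R}^n$), this reduces to
$$x^{t+1} = \nabla\varphi^*\big(\nabla\varphi(x^t) - \eta_t g^t\big) \tag{5.4}$$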
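A minimal sketch of (5.3a)-(5.3b) for one common choice, assumed here for illustration: $C$ is the probability simplex and $\varphi(x) = \sum_i x_i\log x_i$. Then $\nabla\varphi(x) = \log x + \mathbf{1}$, so (5.3a) becomes a multiplicative update, and the KL projection in (5.3b) is a normalization (this variant is often called exponentiated gradient). The function name and its arguments are hypothetical.

```python
import numpy as np


def mirror_descent_simplex(grad, x0, step_sizes):
    """Mirror descent (5.3a)-(5.3b) with phi(x) = sum_i x_i log x_i on the simplex.

    grad(x) returns a (sub)gradient g^t of f at x; step_sizes yields eta_t.
    """
    x = np.asarray(x0, dtype=float)
    for eta in step_sizes:
        g = grad(x)
        # (5.3a): log(y) + 1 = log(x) + 1 - eta * g, i.e. y = x * exp(-eta * g).
        # Subtracting g.min() rescales y by a constant, which (5.3b) removes anyway,
        # and keeps the exponent <= 0 to avoid overflow.
        y = x * np.exp(-eta * (g - g.min()))
        # (5.3b): the KL (Bregman) projection onto the simplex is normalization.
        x = y / y.sum()
    return x


# Usage: linear objective f(x) = <c, x>, minimized at the vertex argmin_i c_i.
c = np.array([0.3, 0.1, 0.5])
x_final = mirror_descent_simplex(lambda x: c, np.full(3, 1.0 / 3), [0.5] * 200)
print(x_final)
```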
$$\text{minimize}_{x} \;\; f(x) \quad \text{subject to} \quad x \in C$$
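• If $\eta_t = \frac{\sqrt{2\rho R}}{L_f}\,\frac{1}{\sqrt{t}}$ with $R := \sup_{x \in C} D_\varphi(x, x^0)$, then
$$f^{\mathrm{best},t} - f^{\mathrm{opt}} \le O\left( \frac{L_f \sqrt{R}\, \log t}{\sqrt{\rho}\, \sqrt{t}} \right)$$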
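With $\varphi(x) = \frac{1}{2}\|x\|_2^2$ and $x^0 = n^{-1}\mathbf{1}$,
$$\sup_{x \in \Delta} D_\varphi(x, x^0) = \sup_{x \in \Delta} \frac{1}{2}\big\|x - n^{-1}\mathbf{1}\big\|_2^2 = \sup_{x \in \Delta} \frac{1}{2}\|x\|_2^2 - \frac{1}{2n} \le \frac{1}{2}$$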
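A numerical check of the bound above, under the same $\varphi(x) = \frac{1}{2}\|x\|_2^2$ and $x^0 = n^{-1}\mathbf{1}$: the supremum $\frac{1}{2} - \frac{1}{2n}$ is attained at the vertices of $\Delta$, and random points of the simplex (hypothetical Dirichlet samples) never exceed it.

```python
import numpy as np

n = 10
x0 = np.full(n, 1.0 / n)  # x^0 = n^{-1} 1, the center of the simplex


def D(x):
    """D_phi(x, x0) for phi(x) = 0.5 * ||x||_2^2."""
    return 0.5 * np.sum((x - x0) ** 2)


vertices = np.eye(n)
rng = np.random.default_rng(5)
samples = rng.dirichlet(np.ones(n), size=1000)  # random points of the simplex

vertex_values = np.array([D(v) for v in vertices])
sample_values = np.array([D(s) for s in samples])
print(vertex_values.max(), 0.5 - 1 / (2 * n))
```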
Since $\|g\|_\infty \le \|g\|_2 \le \sqrt{n}\,\|g\|_\infty$, one has
$$\frac{1}{\sqrt{n}} \le \frac{L_{f,\infty}}{L_{f,2}} \le 1$$
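A quick check of the norm comparison, reading $L_{f,p}$ as the largest $\|g\|_p$ over subgradients (our interpretation): the per-vector ratio $\|g\|_\infty/\|g\|_2$ always lies in $[1/\sqrt{n}, 1]$, with the endpoints attained by $g = \mathbf{1}$ and $g = e_1$, so taking suprema gives the stated range.

```python
import numpy as np


def linf_over_l2(g):
    """Ratio ||g||_inf / ||g||_2 for a nonzero vector g."""
    return np.max(np.abs(g)) / np.linalg.norm(g)


n = 100
rng = np.random.default_rng(6)
ratios = np.array([linf_over_l2(rng.standard_normal(n)) for _ in range(1000)])
extremes = (linf_over_l2(np.ones(n)), linf_over_l2(np.eye(n)[0]))  # 1/sqrt(n) and 1
print(ratios.min(), ratios.max(), extremes)
```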
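$$\text{minimize}_x \;\; f(x) = \sum_{i=1}^{m} \big|a_i^\top x - b_i\big| \quad \text{subject to} \quad x \in \Delta = \{x \in \mathbb{R}^n_+ \mid \mathbf{1}^\top x = 1\}$$
with $a_i \sim \mathcal{N}(0, I_{n \times n})$ and $b_i = \frac{a_{i,1} + a_{i,2}}{2} + \mathcal{N}(0, 10^{-2})$, $m = 20$, $n = 3000$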
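A sketch of the robust regression experiment with entropic mirror descent. The data follow the recipe above, reading $\mathcal{N}(0, 10^{-2})$ as variance $10^{-2}$. The remaining choices are assumptions, not taken from the source: $\varphi(x) = \sum_i x_i\log x_i$ (1-strongly convex w.r.t. $\|\cdot\|_1$ on $\Delta$, so $\rho = 1$ and $\|\cdot\|_* = \|\cdot\|_\infty$), $x^0 = n^{-1}\mathbf{1}$ so that $R = \log n$, $L_f = \sum_i \|a_i\|_\infty$, and the step size of Theorem 5.3 with $\sqrt{t+1}$ in place of $\sqrt{t}$ to avoid dividing by zero at $t = 0$.

```python
import numpy as np


def make_robust_regression(m=20, n=3000, seed=0):
    """a_i ~ N(0, I_n), b_i = (a_{i,1} + a_{i,2}) / 2 + noise with variance 1e-2."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    b = (A[:, 0] + A[:, 1]) / 2 + rng.normal(0.0, 0.1, size=m)
    return A, b


def objective(A, b, x):
    return np.sum(np.abs(A @ x - b))


def subgradient(A, b, x):
    return A.T @ np.sign(A @ x - b)  # sum_i sign(a_i^T x - b_i) a_i


def entropic_mirror_descent(A, b, num_iters=1000):
    m, n = A.shape
    x = np.full(n, 1.0 / n)                   # x^0: center of the simplex
    R = np.log(n)                             # sup over simplex of KL(x || x^0)
    L_f = np.sum(np.max(np.abs(A), axis=1))   # >= ||g||_inf for every subgradient
    rho = 1.0                                 # entropy is 1-strongly convex w.r.t. l1
    f_best = objective(A, b, x)
    for t in range(num_iters):
        g = subgradient(A, b, x)
        eta = np.sqrt(2 * rho * R) / (L_f * np.sqrt(t + 1))
        y = x * np.exp(-eta * (g - g.min()))  # (5.3a), up to a constant factor
        x = y / y.sum()                       # (5.3b): KL projection onto simplex
        f_best = min(f_best, objective(A, b, x))
    return x, f_best


A, b = make_robust_regression()
x_md, f_best = entropic_mirror_descent(A, b)
print(objective(A, b, np.full(A.shape[1], 1.0 / A.shape[1])), f_best)
```

The sketch tracks the best objective value seen so far, matching the $f^{\mathrm{best},t}$ criterion used in the convergence bounds.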
Theorem 5.3
Suppose $f$ is convex and Lipschitz continuous (i.e. $\|g^t\|_* \le L_f$) on $C$, and suppose $\varphi$ is $\rho$-strongly convex w.r.t. $\|\cdot\|$. Then
$$f^{\mathrm{best},t} - f^{\mathrm{opt}} \le \frac{\sup_{x \in C} D_\varphi(x, x^0) + \frac{L_f^2}{2\rho}\sum_{k=0}^{t} \eta_k^2}{\sum_{k=0}^{t} \eta_k}$$

Numerical example: robust regression
• stepsizes chosen according to best bounds (but still sensitive to stepsize choice)
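A sketch that evaluates the right-hand side of Theorem 5.3 for the step-size rule of the theorem, with $\sup_{x\in C} D_\varphi(x, x^0)$ replaced by $R$ and, as an assumption, the schedule indexed as $\eta_k = \frac{\sqrt{2\rho R}}{L_f\sqrt{k+1}}$ so that $\eta_0$ is finite. Plugging in gives $\frac{L_f\sqrt{R}\,(1 + H_{t+1})}{\sqrt{2\rho}\,S_{t+1}}$ with $H_{t+1} = \sum_{j \le t+1} 1/j \approx \log t$ and $S_{t+1} = \sum_{j \le t+1} 1/\sqrt{j} \approx 2\sqrt{t}$, which is the $O\big(L_f\sqrt{R}\log t/(\sqrt{\rho}\sqrt{t})\big)$ rate. The function names are hypothetical.

```python
import numpy as np


def theorem_bound(t, L_f, R, rho):
    """(R + L_f^2/(2 rho) * sum eta_k^2) / sum eta_k
    with eta_k = sqrt(2 rho R) / (L_f sqrt(k + 1)), k = 0, ..., t."""
    k = np.arange(t + 1)
    eta = np.sqrt(2 * rho * R) / (L_f * np.sqrt(k + 1))
    return (R + L_f**2 / (2 * rho) * np.sum(eta**2)) / np.sum(eta)


def closed_form(t, L_f, R, rho):
    """Same quantity simplified: L_f sqrt(R) (1 + H_{t+1}) / (sqrt(2 rho) S_{t+1})."""
    j = np.arange(1, t + 2)
    harmonic = np.sum(1.0 / j)            # H_{t+1}, grows like log t
    root_sum = np.sum(1.0 / np.sqrt(j))   # S_{t+1}, grows like 2 sqrt(t)
    return L_f * np.sqrt(R) * (1 + harmonic) / (np.sqrt(2 * rho) * root_sum)


for t in (10, 100, 1000, 10000):
    print(t, theorem_bound(t, L_f=1.0, R=np.log(3000), rho=1.0))
```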
Fundamental inequality for mirror descent

Lemma 5.4
$$\eta_t\big(f(x^t) - f^{\mathrm{opt}}\big) \le D_\varphi(x^*, x^t) - D_\varphi(x^*, x^{t+1}) + \frac{\eta_t^2 L_f^2}{2\rho}$$
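A numerical check of Lemma 5.4 on a toy instance, all of which is assumed for illustration: a linear objective $f(x) = \langle c, x\rangle$ on the simplex (so $g^t = c$ and $x^*$ is the vertex at $\arg\min_i c_i$), $\varphi$ the negative entropy with $\rho = 1$ w.r.t. $\|\cdot\|_1$, and $L_f = \|c\|_\infty$. The recorded gaps (right side minus left side) should all be nonnegative.

```python
import numpy as np


def kl(p, q):
    """D_phi(p, q) for the negative entropy, p and q on the simplex (0 log 0 = 0)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def md_step_entropy(x, g, eta):
    """Mirror descent step with the negative entropy on the simplex."""
    y = x * np.exp(-eta * (g - g.min()))  # shifting by g.min() only rescales y
    return y / y.sum()


rng = np.random.default_rng(4)
n = 8
c = rng.uniform(-1.0, 1.0, size=n)  # hypothetical linear objective f(x) = <c, x>
x_star = np.eye(n)[np.argmin(c)]    # a minimizer over the simplex
f_opt = c.min()
L_f = np.max(np.abs(c))             # dual norm (l_inf) of the gradient
rho = 1.0                           # negative entropy: 1-strongly convex w.r.t. l1 on simplex

x = np.full(n, 1.0 / n)
gaps = []
for t in range(200):
    eta = 0.5 / np.sqrt(t + 1)
    x_next = md_step_entropy(x, c, eta)
    lhs = eta * (c @ x - f_opt)
    rhs = kl(x_star, x) - kl(x_star, x_next) + eta**2 * L_f**2 / (2 * rho)
    gaps.append(rhs - lhs)          # Lemma 5.4 says this is >= 0
    x = x_next
print(min(gaps))
```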
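$$\begin{aligned}
f(x^t) - f(x^*) &\le \langle g^t, x^t - x^* \rangle && \text{(property of subgradient)} \\
&= \frac{1}{\eta_t}\langle \nabla\varphi(x^t) - \nabla\varphi(y^{t+1}),\, x^t - x^* \rangle && \text{(MD update rule)} \\
&= \frac{1}{\eta_t}\Big[ D_\varphi(x^*, x^t) + D_\varphi(x^t, y^{t+1}) - D_\varphi(x^*, y^{t+1}) \Big] && \text{(three-point lemma)} \\
&\le \frac{1}{\eta_t}\Big[ D_\varphi(x^*, x^t) + D_\varphi(x^t, y^{t+1}) - D_\varphi(x^*, x^{t+1}) - D_\varphi(x^{t+1}, y^{t+1}) \Big] && \text{(Pythagorean)} \\
&= \frac{1}{\eta_t}\Big[ D_\varphi(x^*, x^t) - D_\varphi(x^*, x^{t+1}) \Big] + \frac{1}{\eta_t}\Big[ D_\varphi(x^t, y^{t+1}) - D_\varphi(x^{t+1}, y^{t+1}) \Big]
\end{aligned}$$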
so it remains to bound the second term of the last line
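We claim that
$$D_\varphi(x^t, y^{t+1}) - D_\varphi(x^{t+1}, y^{t+1}) \le \frac{(\eta_t L_f)^2}{2\rho} \tag{5.6}$$
This gives
$$\eta_t\big(f(x^t) - f(x^*)\big) \le D_\varphi(x^*, x^t) - D_\varphi(x^*, x^{t+1}) + \frac{(\eta_t L_f)^2}{2\rho}$$
as claimed.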
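$$\begin{aligned}
&D_\varphi(x^t, y^{t+1}) - D_\varphi(x^{t+1}, y^{t+1}) \\
&\quad= \varphi(x^t) - \varphi(x^{t+1}) - \langle \nabla\varphi(y^{t+1}),\, x^t - x^{t+1} \rangle \\
&\quad\le \langle \nabla\varphi(x^t),\, x^t - x^{t+1} \rangle - \frac{\rho}{2}\|x^t - x^{t+1}\|^2 - \langle \nabla\varphi(y^{t+1}),\, x^t - x^{t+1} \rangle && \text{(strong convexity of } \varphi\text{)} \\
&\quad= \langle \nabla\varphi(x^t) - \nabla\varphi(y^{t+1}),\, x^t - x^{t+1} \rangle - \frac{\rho}{2}\|x^t - x^{t+1}\|^2 \\
&\quad= \eta_t \langle g^t,\, x^t - x^{t+1} \rangle - \frac{\rho}{2}\|x^t - x^{t+1}\|^2 && \text{(MD update rule)} \\
&\quad\le \eta_t L_f \|x^t - x^{t+1}\| - \frac{\rho}{2}\|x^t - x^{t+1}\|^2 && \text{(Cauchy-Schwarz: } |\langle g^t, v\rangle| \le \|g^t\|_*\|v\|\text{)} \\
&\quad\le \frac{(\eta_t L_f)^2}{2\rho} && \text{(optimize quadratic function in } \|x^t - x^{t+1}\|\text{)}
\end{aligned}$$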
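The final step maximizes a concave quadratic in $s = \|x^t - x^{t+1}\|$; completing the square makes the bound explicit:

```latex
\eta_t L_f\, s - \frac{\rho}{2} s^2
  = \frac{(\eta_t L_f)^2}{2\rho} - \frac{\rho}{2}\Big(s - \frac{\eta_t L_f}{\rho}\Big)^2
  \le \frac{(\eta_t L_f)^2}{2\rho},
\qquad \text{with equality at } s = \frac{\eta_t L_f}{\rho}.
```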