Coordinate Descent Algorithms
Stephen J. Wright
Introduction
Optimization - minimization
Convex
Smooth
Regularization Functions
Outline of CD Algorithms
Applications
Powell's Example
Randomized Algorithms
Conclusion
Coordinate Descent
Algorithm
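As a concrete illustration of the basic coordinate descent iteration, here is a minimal sketch of the cyclic variant applied to a convex quadratic; the quadratic objective, the exact coordinate minimization, and all names below are my own illustrative choices, not the paper's formulation.

```python
import numpy as np

# Cyclic coordinate descent on the convex quadratic
# f(x) = 0.5 x^T A x - b^T x (illustrative example).
# Each step minimizes f exactly along one coordinate direction e_i.

def cyclic_cd(A, b, x0, n_epochs=50):
    x = x0.astype(float).copy()
    n = len(x)
    for _ in range(n_epochs):
        for i in range(n):              # cycle through coordinates 1, ..., n
            g_i = A[i] @ x - b[i]       # i-th component of the gradient
            x[i] -= g_i / A[i, i]       # exact minimization along e_i
    return x

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])              # symmetric positive definite
b = np.array([1.0, 1.0])
x = cyclic_cd(A, b, np.zeros(2))        # approaches the minimizer A^{-1} b
```

For a quadratic, exact coordinate minimization recovers the classical Gauss-Seidel iteration for the linear system Ax = b.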
Introduction

Optimization - minimization:
$$\min_{x \in \mathbb{R}^n} f(x),$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuous.
Types of functions

Convex Function:
f (with convex domain) is said to be convex if
$$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y)$$
for all x, y in the domain and all $\alpha \in [0, 1]$.
Smooth Function:
f is said to be smooth if it is continuously differentiable and its gradient is Lipschitz continuous, that is, there is an $L > 0$ such that
$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\| \quad \text{for all } x, y.$$
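The two definitions above can be checked numerically on a concrete function; the sketch below is my own illustration using $f(x) = \|x\|^2/2$, which is convex and smooth with gradient Lipschitz constant L = 1.

```python
import numpy as np

# Sample random points and verify the convexity inequality and the
# gradient Lipschitz condition for f(x) = 0.5 ||x||^2 (grad f(x) = x, L = 1).

rng = np.random.default_rng(0)
f = lambda x: 0.5 * np.dot(x, x)
grad = lambda x: x
L = 1.0

for _ in range(100):
    x, y = rng.normal(size=3), rng.normal(size=3)
    a = rng.uniform()
    # convexity: f(a x + (1 - a) y) <= a f(x) + (1 - a) f(y)
    assert f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-12
    # smoothness: ||grad f(x) - grad f(y)|| <= L ||x - y||
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-12
```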
Regularization Functions:
Many applications minimize a composite objective $h(x) := f(x) + \lambda \Omega(x)$, where f is smooth, the regularization function $\Omega$ is possibly nonsmooth (and usually separable), and $\lambda > 0$ is a regularization parameter.
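A standard concrete example of coordinate descent with a nonsmooth regularization function is l1-regularized least squares (the lasso), where each coordinate subproblem has a closed-form soft-thresholding solution. The sketch below is my own minimal illustration; the problem data and helper names are assumptions, not from the paper.

```python
import numpy as np

# Cyclic CD for the lasso: minimize 0.5 ||A x - b||^2 + lam * ||x||_1.
# Each coordinate subproblem is solved exactly by soft thresholding.

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_cd(A, b, lam, n_epochs=500):
    m, n = A.shape
    x = np.zeros(n)
    col_sq = (A * A).sum(axis=0)        # ||A_j||^2 for each column j
    r = b - A @ x                       # running residual
    for _ in range(n_epochs):
        for j in range(n):
            r = r + A[:, j] * x[j]      # remove coordinate j's contribution
            rho = A[:, j] @ r
            x[j] = soft_threshold(rho, lam) / col_sq[j]
            r = r - A[:, j] * x[j]
    return x

rng = np.random.default_rng(3)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
x = lasso_cd(A, b, lam=0.5)
```

At a minimizer, the optimality conditions require $A_j^T(b - Ax) = \lambda\,\mathrm{sign}(x_j)$ for nonzero coordinates and $|A_j^T(b - Ax)| \le \lambda$ for zero ones.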
Outline of Coordinate Descent Algorithms
As we said in the Introduction, the function we use in this paper is the one given above. Applying Algorithm 1 to the Lagrangian dual with $\alpha_k \equiv 1$, each step has the form of an update to a single dual coordinate; from the Lagrangian dual we recover the primal variable, and after each update on $x^k$ we obtain the corresponding primal update.
Chang, Hsieh, and Lin use cyclic and stochastic CD to solve a squared-loss formulation of the support vector machine (SVM) problem in machine learning, that is,
$$\min_{w} \; \frac{1}{2} \sum_{i=1}^{m} \max(1 - y_i w^T x_i, 0)^2 + \frac{\lambda}{2} \|w\|^2,$$
where $(x_i, y_i) \in \mathbb{R}^n \times \{-1, +1\}$ are feature vector / label pairs and $\lambda$ is a regularization parameter.
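For illustration only, here is a sketch of the squared-loss SVM objective and one cyclic CD pass over the primal weights; this is my own toy version with an assumed conservative step $1/L_j$, not the dual method actually used by Chang, Hsieh, and Lin.

```python
import numpy as np

# Squared hinge loss plus ridge penalty (toy primal version; the cited
# authors work with the dual problem).

def svm_objective(w, X, y, lam):
    margins = np.maximum(1.0 - y * (X @ w), 0.0)
    return 0.5 * np.dot(margins, margins) + 0.5 * lam * np.dot(w, w)

def cd_pass(w, X, y, lam):
    for j in range(len(w)):
        margins = np.maximum(1.0 - y * (X @ w), 0.0)
        g_j = -np.dot(margins * y, X[:, j]) + lam * w[j]   # j-th partial derivative
        L_j = np.dot(X[:, j], X[:, j]) + lam               # conservative Lipschitz bound
        w[j] -= g_j / L_j                                  # descent step 1/L_j
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))   # labels in {-1, +1}
lam = 0.1
w = np.zeros(3)
before = svm_objective(w, X, y, lam)
for _ in range(20):
    w = cd_pass(w, X, y, lam)
after = svm_objective(w, X, y, lam)
```

Because the squared hinge loss is continuously differentiable, each coordinate step with step length $1/L_j$ is guaranteed not to increase the objective.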
Algorithms, Convergence, Implementations
Powell's Example:
Powell constructed a continuously differentiable, nonconvex function of three variables on which cyclic coordinate descent with exact line searches fails to converge: the iterates cycle indefinitely near six points, none of which is a stationary point.
We define Lipschitz constants that are tied to the component directions, and are the key to the algorithms and their analysis. The component Lipschitz constants are positive quantities $L_i$ such that for all $x \in \mathbb{R}^n$ and all $t \in \mathbb{R}$ we have
$$|\nabla_i f(x + t e_i) - \nabla_i f(x)| \le L_i |t|,$$
and we write $L_{\max} := \max_i L_i$.
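For the quadratic $f(x) = \frac{1}{2} x^T A x - b^T x$ (my illustrative example, not from the paper), the i-th partial derivative is $[Ax - b]_i$, so $L_i = A_{ii}$ and $L_{\max} = \max_i A_{ii}$; the check below verifies the component Lipschitz inequality at random points.

```python
import numpy as np

# Verify |grad_i f(x + t e_i) - grad_i f(x)| <= L_i |t| with L_i = A_ii.

rng = np.random.default_rng(1)
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
b = np.zeros(3)
grad = lambda x: A @ x - b
L = np.diag(A)                          # component Lipschitz constants L_i

for _ in range(200):
    x = rng.normal(size=3)
    t = rng.normal()
    for i in range(3):
        e = np.zeros(3)
        e[i] = 1.0
        lhs = abs(grad(x + t * e)[i] - grad(x)[i])
        assert lhs <= L[i] * abs(t) + 1e-9
```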
Algorithm 3
Theorem 1: Suppose that Assumption 1 holds, and that $\alpha_k \equiv 1/L_{\max}$ in Algorithm 3. Then for all $k > 0$ we have
$$E[f(x^k)] - f^* \le \frac{2 n L_{\max} R_0^2}{k}.$$
In the case that f is strongly convex with modulus $\sigma > 0$, we have, by taking the minimum of both sides with respect to y in (20) and setting $x = x^k$, that
$$f(x^k) - f^* \le \frac{1}{2\sigma} \|\nabla f(x^k)\|^2.$$
By using this expression to bound $\|\nabla f(x^k)\|^2$ in (32), we obtain
$$E[f(x^k)] - f^* \le \left(1 - \frac{\sigma}{n L_{\max}}\right)^k \left(f(x^0) - f^*\right).$$
Note that the same convergence expressions can be obtained for more refined choices of the step length $\alpha_k$, by making minor adjustments to the logic. For example, the choice $\alpha_k \equiv 1/L_{i_k}$ leads to the same bounds; the same bounds also hold when $\alpha_k$ is the exact minimizer of f along the coordinate search direction.
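The behaviour described above can be illustrated numerically. The run below applies randomized CD with the constant step $1/L_{\max}$ to a small strongly convex quadratic of my own choosing and records the objective values; for this method the standard guarantee is $E[f(x^k)] - f^* \le 2 n L_{\max} \|x^0 - x^*\|^2 / k$, and each step is guaranteed not to increase f.

```python
import numpy as np

# Randomized CD with step 1/L_max on f(x) = 0.5 x^T A x - b^T x.

rng = np.random.default_rng(2)
A = np.array([[4.0, 1.0, 0.5, 0.0, 0.0],
              [1.0, 3.0, 0.2, 0.0, 0.0],
              [0.5, 0.2, 2.5, 0.3, 0.0],
              [0.0, 0.0, 0.3, 3.5, 0.4],
              [0.0, 0.0, 0.0, 0.4, 2.0]])   # symmetric positive definite
b = np.ones(5)
n = 5
L_max = np.diag(A).max()                    # L_i = A_ii for a quadratic
x_star = np.linalg.solve(A, b)
f = lambda x: 0.5 * x @ A @ x - b @ x
f_star = f(x_star)

x = np.zeros(n)
R0_sq = np.dot(x - x_star, x - x_star)
values = [f(x)]
for k in range(1, 201):
    i = rng.integers(n)                     # coordinate chosen uniformly at random
    g_i = A[i] @ x - b[i]
    x[i] -= g_i / L_max                     # constant step 1/L_max
    values.append(f(x))
```

In practice the observed gap $f(x^k) - f^*$ sits far below the sublinear bound, consistent with the linear rate available under strong convexity.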
We compare (27) with the corresponding result for full-gradient descent with constant step length $\alpha_k \equiv 1/L$, that is, the iteration
$$x^{k+1} = x^k - \frac{1}{L} \nabla f(x^k).$$
Algorithm 4, proposed by Nesterov, assumes that an estimate $\sigma \ge 0$ of the modulus of strong convexity from (20) is available, as well as estimates of the component-wise Lipschitz constants $L_i$ from (21). It is closely related to accelerated full-gradient methods.
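As a point of reference for that relationship, here is a minimal sketch of Nesterov-style acceleration in the full-gradient setting (FISTA-style momentum parameters; this is my own illustration, not the paper's Algorithm 4):

```python
import numpy as np

# Accelerated full-gradient method for a convex f with L-Lipschitz gradient.

def nesterov_agd(grad, x0, L, n_iters=500):
    x = x0.astype(float).copy()
    y = x.copy()                         # extrapolated point
    t = 1.0
    for _ in range(n_iters):
        x_new = y - grad(y) / L          # gradient step from the extrapolated point
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
L = np.linalg.eigvalsh(A).max()          # gradient Lipschitz constant of the quadratic
x = nesterov_agd(lambda z: A @ z - b, np.zeros(2), L)
```

The momentum extrapolation improves the $O(1/k)$ rate of plain gradient descent to $O(1/k^2)$ on convex problems; the accelerated randomized CD of Algorithm 4 plays the analogous role in the coordinate setting.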
Theorem 2: Suppose that Assumption 1 holds, and define