
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2015.2401430, IEEE Transactions on Image Processing.
On the Convergence of Nonconvex Minimization Methods for Image Recovery

Jin Xiao, Michael K. Ng, and Yu-Fei Yang

Abstract—Nonconvex nonsmooth regularization methods have been shown to be effective for restoring images with neat edges. Fast alternating minimization schemes have also been proposed and developed to solve the nonconvex nonsmooth minimization problem. The main contribution of this paper is to show the convergence of these alternating minimization schemes, based on the Kurdyka-Łojasiewicz property. In particular, we show that the iterates generated by the alternating minimization scheme converge to a critical point of this nonconvex nonsmooth objective function. We also extend the analysis to the nonconvex nonsmooth regularization model with box constraints, and obtain similar convergence results for the related minimization algorithm. Numerical examples are given to illustrate our convergence analysis.

Index Terms—Image restoration, nonconvex and nonsmooth, box constraints, alternating minimization methods, Kurdyka-Łojasiewicz inequality.

I. INTRODUCTION

In many image processing applications in medicine and engineering, image restoration and reconstruction plays an important role for processing and analysis; see for instance [21], [22]. In this paper, we consider a basic forward image restoration model

g = Hf + n,  (1)

where g ∈ R^q is the observed image, f ∈ R^p is the original image (both are rearranged into vectors), n ∈ R^q accounts for the noise, and H is a q × p matrix which can represent a blurring matrix in optics or a transform matrix in medical tomography. We would like to restore an image f̂ such that Hf̂ is close to g in the forward image restoration model (1) and f̂ satisfies some prior requirements. Regularization is a useful method to determine such an f̂; see for instance [4], [7], [18], [20].
Manuscript received June 5, 2013; revised January 3, 2014 and November 10, 2014; accepted February 1, 2015. The work of J. Xiao was supported in part by the School Youth Project (No. QL1203) of Zhanjiang Normal University, the Open Fund Project of the Key Research Institute of Philosophies and Social Sciences in Hunan Universities, the NSF of Guangdong Province (No. 2013040014926), and the NSF of Hunan Province (No. 2015JJ2059). The work of M. K. Ng was supported in part by RGC grants 202013 and 12301214 and by HKBU FRG grants. The work of Y. Yang was supported in part by the NNSF of China (No. 60872129, 60835004 and 71431008) and the Science and Technology Project of Hunan Province (No. 2014SK3235).
Corresponding author: J. Xiao, who is with the College of Mathematics and Econometrics, Hunan University, Changsha, Hunan 410082, China, and the School of Mathematics and Computation Science, Lingnan Normal University, Zhanjiang, Guangdong 524048, China. E-mail: xiaojin1113@126.com.
M. K. Ng is with the Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong. E-mail: mng@math.hkbu.edu.hk.
Y. Yang is with the Department of Information and Computing Science, Changsha University, Changsha, Hunan 410003, China.

In particular, we solve for a minimizer f̂ of a cost function J(f) of the form

J(f) = ‖Hf − g‖₂² + λΦ(f).  (2)

In the above cost function, the least squares term forces the data fitting according to (1), Φ describes the priors, and λ > 0 is a positive regularization parameter that controls the trade-off between the data fitting term and the regularization term. In many image processing applications, the regularization term Φ(f) can be expressed as follows:

Φ(f) = Σ_{i∈I} φ(‖D_i f‖₂),  (3)

where I denotes the set of all pixels of an image, i.e., I = {1, 2, …, p}, and D_i : R^p → R^s, for s ≥ 1, is a linear operator yielding a vector containing the differences between pixel i and its neighbors; each D_i can be seen as an s × p matrix. For instance, the family {D_i} = {D_i : i ∈ I} can represent the discrete approximation of the gradient or the Laplacian operator on f, or finite differences of various orders, or the combination of any of these with the synthesis operator of a frame transformation.
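As a concrete illustration of (2)-(3), the following minimal sketch is our own and is not taken from the paper: the only ingredients from the text are the forward model (1), the cost (2)-(3) with first-order differences (s = 1), and the potential φ(t) = αt/(1 + αt) of Table I; all function names and parameter values are illustrative.

import numpy as np

def phi(t, alpha=0.5):
    # Nonconvex potential from Table I: phi(t) = alpha*t/(1 + alpha*t), t >= 0.
    return alpha * t / (1.0 + alpha * t)

def difference_operators(p):
    # Stack of D_i: first-order differences of a 1-D signal (s = 1).
    D = np.zeros((p - 1, p))
    for i in range(p - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    return D

def cost_J(f, g, H, D, lam=0.015):
    # J(f) = ||Hf - g||_2^2 + lam * sum_i phi(||D_i f||_2), cf. (2)-(3).
    fidelity = np.sum((H @ f - g) ** 2)
    regular = np.sum(phi(np.abs(D @ f)))   # s = 1, so ||D_i f||_2 = |D_i f|
    return fidelity + lam * regular

p = 64
H = np.eye(p)                                      # denoising case: H = I
rng = np.random.default_rng(0)
f_true = np.repeat([0.0, 1.0, 0.3], [20, 24, 20])  # piecewise constant signal
g = H @ f_true + 0.05 * rng.standard_normal(p)     # forward model (1)
D = difference_operators(p)
print(cost_J(f_true, g, H, D))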
In the literature, several potential functions (PFs) φ have been proposed and used. For instance, the smooth and convex function φ(t) = t² was originally used in [41] for Tikhonov regularization, and the nonsmooth and convex function φ(t) = |t| gives the well-known total variation regularization [40]. Nonsmooth and nonconvex functions are also studied in [14], [15], [20], [31]-[35], and have been shown to be effective for restoring images with neat edges. Two examples of nonconvex and nonsmooth regularization functions are given in Table I.
In [34], alternating minimization algorithms have been proposed and developed for solving the objective function

J̃(f, u) = ‖Hf − g‖₂² + λΘ(f) + β‖Df − u‖₂² + λα Σ_{i∈I} ‖u_i‖₂,  (4)

where

Θ(f) = Σ_{i∈I} θ(‖D_i f‖₂)  with  φ(t) = θ(t) + α|t|,  α = φ′(0⁺),

φ(·) being the nonconvex function from (3). Here u = [u₁, u₂, …, u_p] is used to transfer the nonsmooth term out of J̃, β is a positive penalty parameter, and D is the matrix in which all the D_i are vertically concatenated, i.e.,

D = [D₁ᵀ, …, D_pᵀ]ᵀ,


TABLE I
REGULARIZATION FUNCTIONS (t ≥ 0, α = φ′(0⁺)).

φ(t)         | φ′(t)        | θ(t) = φ(t) − αt  | θ′(t)                   | θ″(t)
αt/(1 + αt)  | α/(1 + αt)²  | −α²t²/(1 + αt)    | −α²t(2 + αt)/(1 + αt)²  | −2α²/(1 + αt)³
log(αt + 1)  | α/(1 + αt)   | log(αt + 1) − αt  | −α²t/(1 + αt)           | −α²/(1 + αt)²

and D_iᵀ is the transpose of D_i. In (4), the penalty term β‖Df − u‖₂² is added to (2) so that the two variables f and u can be updated separately and efficiently. For reference, we list in Table I two examples of θ. Numerical examples have also been given in [34] to demonstrate the effectiveness and efficiency of these algorithms. The main aim of this paper is to show the convergence of this alternating minimization algorithm. In particular, we show that if the smallest eigenvalue of HᵀH + β Σ_{i∈I} D_iᵀD_i is greater than a positive constant, then the sequence generated by this algorithm converges to a critical point of (4). The approach of this theoretical result is based on the Kurdyka-Łojasiewicz (KL) inequality studied in [2], [3]. On the other hand, we extend the analysis to (4) with box constraints and obtain similar convergence results for the related minimization algorithm.
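As a quick numerical sanity check of the Table I entries (our own test, reusing phi and theta from the snippets above), the closed-form θ′ and θ″ can be compared against central finite differences:

alpha, t, h = 0.5, 0.7, 1e-5
# Closed-form Table I entries for phi(t) = alpha*t/(1 + alpha*t):
dtheta  = -alpha**2 * t * (2 + alpha * t) / (1 + alpha * t) ** 2
ddtheta = -2 * alpha**2 / (1 + alpha * t) ** 3
# Central finite differences of theta:
dtheta_fd  = (theta(t + h) - theta(t - h)) / (2 * h)
ddtheta_fd = (theta(t + h) - 2 * theta(t) + theta(t - h)) / h ** 2
print(abs(dtheta - dtheta_fd), abs(ddtheta - ddtheta_fd))  # both near zero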
The approach in [2], [3] relies on assuming that the objective function to be minimized satisfies the so-called Kurdyka-Łojasiewicz (KL) property [25], [28], which was developed for nonsmooth functions by Bolte et al. [8], [9]. In both of these works, the suggested approach gains its strength from the fact that the class of functions satisfying the KL property is considerably large and covers a wealth of nonconvex nonsmooth functions arising in many fundamental applications; examples include the proximal point algorithm for minimizing a proper and lower semicontinuous (lsc) function and the forward-backward scheme for minimizing the sum of a proper lsc function and a differentiable (nonconvex) function, see [1], [11], [12], [16], [36], both of which have been proved to possess good convergence properties in the nonconvex case. In [26], Le Thi et al. considered and studied the difference of two proper lsc convex functions (called a DC function) as an objective function. By using the KL property, they presented the convergence analysis of the DC optimization algorithm when one of the lsc functions is strongly convex.
The paper is organized as follows. In Section 2, we present the alternating minimization algorithm, fix some notation, and collect a few preliminary basic facts on nonsmooth analysis. In Section 3, we study the convergence of the algorithms in [34]. In Section 4, we consider the model with box constraints and analyze the related algorithm. In Section 5, numerical examples are given to demonstrate the theoretical results. Finally, some concluding remarks are given in Section 6.
II. ALGORITHM AND SOME PRELIMINARIES

A. The Alternating Minimization Algorithm

To minimize J̃ of the form given by (4), a nonsmooth graduated nonconvexity (GNC) scheme is employed in [34]. The main idea is to consider a sequence

0 = ε₀ < ε₁ < ⋯ < ε_k < ⋯ < ε_n = 1,  (5)


and to approach φ (or θ) by a sequence of functions φ_ε : R → R₊ (or θ_ε) such that φ₀ (or θ₀) is convex and φ_ε (or θ_ε) monotonically approaches φ (or θ) as ε goes from 0 to 1 in (5). Here n is a fixed positive integer, φ₁ = φ (or θ₁ = θ), and φ_ε is nonsmooth at 0 for any ε ∈ [0, 1]. To simplify the notation, we write ε for ε_k in the following discussion. Correspondingly, the objective function J̃ is approximated by a sequence J_ε as given below:

J_ε(f, u) = ‖Hf − g‖₂² + λΘ_ε(f) + β‖Df − u‖₂² + λα_ε Σ_{i∈I} ‖u_i‖₂,  (6)

where Θ_ε(f) = Σ_{i∈I} θ_ε(‖D_i f‖₂), φ_ε(t) = θ_ε(t) + α_ε|t|, and α_ε = φ_ε′(0⁺). From the properties of φ_ε, θ_ε introduced in Appendix A, we have:
• J₀ is convex and nonsmooth;
• J_ε can be seen as a DC function and gradually goes to J̃ when ε increases, and J₁ = J̃;
• J_ε(·, u) is C² and nonconvex for ε ∈ (0, 1];
• J_ε(f, ·) is nonsmooth and convex.
In Table I, we list two examples of φ together with the associated φ′, θ, θ′ and θ″.

Now, we give three assumptions as follows:
H1. ker H ∩ ker D = {0};
H2. J_ε is bounded below for ε ∈ [0, 1].
Additionally, since Θ_ε(f) ∈ C², J_ε(f, u) is proper and lsc. According to the properties of θ_ε, we can also assume:
H3. Θ_ε has a Lipschitz continuous gradient with constant L_ε for ε ∈ (0, 1].

In (6), an auxiliary variable u = [u₁, u₂, …, u_p] is used so that the TV-like denoising step can be done by a shrinkage operation. Therefore, when we fix f, u can be calculated efficiently, and when we fix u, f can also be computed easily. The iterative algorithm proposed in [34] is summarized in Algorithm 1, where ε is a fixed number and f^(j) ≡ f^(j,k), u^(j) ≡ u^(j,k).

Algorithm 1
Step 1: Initialize f^(0) and j = 1;


Step 2: Update f and u until convergence:
u-Step:

u^(j) = arg min_{u∈R^{sp}} J_ε(f^(j−1), u),

i.e., for all i ∈ I,

u_i^(j) = max( ‖D_i f^(j−1)‖₂ − λα_ε/(2β), 0 ) · D_i f^(j−1) / ‖D_i f^(j−1)‖₂;

f-Step: f^(j) = f^(j−1) + τ Δf^(j−1), where Δf^(j−1) solves

A Δf = −∇_f J_ε(f^(j−1), u^(j)),  with  A ≡ 2HᵀH + 2β Σ_{i∈I} D_iᵀD_i.  (7)

(Here the iteration index is the superscript j.)

In Algorithm 1, we note that the positive definite part A = 2HᵀH + 2β Σ_{i∈I} D_iᵀD_i of the Hessian matrix of J_ε(f, u^(j−1)) is used in the optimization procedure. This procedure ensures a descent direction and exploits the coerciveness of J_ε(f, u). Numerical examples in [34] have shown that Algorithm 1 with step size τ = 1 can provide good restored images with neat edges.
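The following is a minimal runnable sketch of Algorithm 1 as we read it (1-D case, s = 1, a fixed surrogate θ in place of θ_ε): the shrinkage threshold λα_ε/(2β) and the linear system (7) come from the text, while the function names, parameter values and the dense solve are our own simplifications; phi, theta and the data g, H, D are from the earlier snippets.

def grad_Theta(f, D, alpha=0.5):
    # Gradient of Theta(f) = sum_i theta(|D_i f|) by the chain rule.
    t = D @ f
    dtheta = alpha / (1 + alpha * np.abs(t)) ** 2 - alpha   # theta'(|t|)
    return D.T @ (dtheta * np.sign(t))

def alg1(g, H, D, lam=0.015, alpha=0.5, beta=1.1, tau=1.0, iters=200, f0=None):
    p = H.shape[1]
    f = np.zeros(p) if f0 is None else f0.copy()
    A = 2 * H.T @ H + 2 * beta * D.T @ D        # the matrix A of (7)
    for _ in range(iters):
        # u-Step: componentwise shrinkage (s = 1).
        Df = D @ f
        u = np.sign(Df) * np.maximum(np.abs(Df) - lam * alpha / (2 * beta), 0.0)
        # f-Step: solve A df = -grad_f J(f, u), then f <- f + tau * df.
        grad = (2 * H.T @ (H @ f - g) + lam * grad_Theta(f, D, alpha)
                + 2 * beta * D.T @ (Df - u))
        f = f + tau * np.linalg.solve(A, -grad)
    return f, u

f_rest, u_rest = alg1(g, H, D)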
B. Preliminaries of nonsmooth analysis

Let us recall a few definitions concerning subdifferential calculus.

Definition 1. Let f : Rⁿ → R ∪ {+∞} be a proper lower semicontinuous function.
(i) The domain of f is defined and denoted by dom f ≡ {x ∈ Rⁿ | f(x) < +∞}.
(ii) For each x ∈ dom f, the Fréchet subdifferential of f at x, written ∂̂f(x), is the set of vectors x* ∈ Rⁿ which satisfy

lim inf_{y→x, y≠x} [f(y) − f(x) − ⟨x*, y − x⟩] / ‖x − y‖ ≥ 0.

If x ∉ dom f, then ∂̂f(x) = ∅.
(iii) The limiting subdifferential, or simply the subdifferential for short, of f at x ∈ dom f, written ∂f, is defined as ∂f(x) ≡ {x* ∈ Rⁿ | ∃ xₙ → x, f(xₙ) → f(x), xₙ* ∈ ∂̂f(xₙ), xₙ* → x*}.

Definition 2 (Sublevel sets). Given real numbers α and β, we set [α ≤ f ≤ β] ≡ {x ∈ Rⁿ | α ≤ f(x) ≤ β}. We define similarly [α < f < β].

Definition 3. A function f : Rⁿ × Rᵐ → R ∪ {+∞} with values f(x, u) is level-bounded in x locally uniformly in u if for each ū ∈ Rᵐ and α ∈ R there is a neighborhood V ∈ N(ū) such that the set {(x, u) | u ∈ V, f(x, u) ≤ α} is bounded in Rⁿ × Rᵐ.

Let us recall a useful result from [39, Proposition 10.5 and Exercise 8.8].

Proposition 1. Let T(x, y) ≡ f(x) + g(y) + H(x, y), where f : Rⁿ → (−∞, +∞] is a proper lsc convex function, g : Rᵐ → (−∞, +∞] is a proper continuously differentiable function, and the coupling function H(x, y) is continuously differentiable. Then for all (x, y) ∈ Rⁿ × Rᵐ we have

∂T(x, y) = ( ∇ₓH(x, y) + ∂f(x), ∇_yH(x, y) + ∇g(y) ) = ( ∂ₓT(x, y), ∂_yT(x, y) ).

Definition 4. A necessary (but not sufficient) condition for z ∈ Rⁿ to be a minimizer of a function f : Rⁿ → R is 0 ∈ ∂f(z). A point that satisfies this requirement is called a critical point.

The Kurdyka-Łojasiewicz property plays a central role in our analysis. Below, we recall the essential elements. First, we introduce some notation. For any subset S ⊂ Rⁿ and any point x ∈ Rⁿ, the distance from x to S is defined and denoted by dist(x, S) ≡ inf{‖y − x‖, y ∈ S}; when S = ∅, we have dist(x, S) = ∞ for all x.

Definition 5. Let f : Rⁿ → R ∪ {+∞} be a proper lower semicontinuous function. The function f is said to have the Kurdyka-Łojasiewicz property at z̄ ∈ dom(∂f) (≡ {z ∈ Rⁿ | ∂f(z) ≠ ∅}) if there exist η ∈ (0, +∞], a neighborhood U of z̄, and a continuous concave function ϕ : [0, η) → R₊ such that:
• ϕ(0) = 0;
• ϕ is continuously differentiable on (0, η);
• for all s ∈ (0, η), ϕ′(s) > 0;
• for all z ∈ U ∩ [f(z̄) < f < f(z̄) + η], the Kurdyka-Łojasiewicz inequality

ϕ′(f(z) − f(z̄)) dist(0, ∂f(z)) ≥ 1  (8)

holds. Moreover, proper lower semicontinuous functions which satisfy the Kurdyka-Łojasiewicz inequality at each point of dom(∂f) are called KL functions.

Among real extended-valued lower semicontinuous functions, typical KL functions are semi-algebraic functions, or more generally functions definable in an o-minimal structure, see [8], [9]. References on functions definable in an o-minimal structure are given in [17], [19]. It is clear that ‖·‖₂ and ‖·‖₂², whose corresponding concave functions ϕ(s) in Definition 5 are equal to cs and c√s, c > 0, respectively, are KL functions. For instance, we note that the φ(t) in Table I (i.e., αt/(1 + αt) and log(αt + 1)) are semi-convex (pseudo-convex) functions and also KL functions, see [10]. According to the definition of a semi-convex function, h(t) = t² + φ(t) is a Morse function, i.e., for each critical point t̄ of h, the Hessian ∇²h(t̄) of h at t̄ is a nondegenerate endomorphism of Rⁿ, and there exist c₁, c₂ ≥ 0 such that

|h(t) − h(t̄)| ≤ c₁‖t − t̄‖²,   ‖∇h(t)‖ ≥ c₂‖t − t̄‖.

Therefore, we can determine that the corresponding concave function (in Definition 5) of h(t) is given by ϕ(s) = c√s, where c is a positive constant. On the other hand, ‖·‖₂ and ‖·‖₂² are definitely semi-algebraic, and thus are KL functions. As a summary, J_ε(f, u) is a KL function, and its corresponding concave function is given by ϕ(s) = cs^γ with γ ∈ [1/2, 1], c > 0.

Additionally, some related notation is introduced as follows:
• The indicator function of a nonempty and closed set S:

δ_S(x) ≡ 0 if x ∈ S;  +∞ otherwise.

• The proximal operator is defined through the formula

prox_f(x) ≡ arg min_{y∈Rⁿ} { f(y) + (1/2)‖y − x‖² }.
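For instance (a standard computation added here for illustration), for the function f = γ‖·‖₂ on Rˢ the proximal operator is the vector shrinkage

prox_{γ‖·‖₂}(x) = max(‖x‖₂ − γ, 0) · x/‖x‖₂,

and, applied blockwise with γ = λα_ε/(2β) to the blocks D_i f^(j−1), this is exactly the u-Step of Algorithm 1 above.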
III. CONVERGENCE ANALYSIS

The main aim of this section is first to show that Algorithm 1 converges to a critical point of (6), and then to prove that the algorithms in [34] converge to a critical point of (4) as ε goes from 0 to 1. We now establish the properties of {f^(j)} and {u^(j)} and demonstrate that they are useful for the convergence analysis.

Theorem 1. Suppose that H1 and H3 hold and λ_min(A) > C_ε/2, where C_ε is a Lipschitz constant of ∇_f J_ε(·, u). Then:
(i) the sequences {f^(j)} and {u^(j)} generated by Algorithm 1 satisfy the following inequalities:

β‖Df^(j−1) − u^(j)‖₂² + λα_ε Σ_{i∈I} ‖u_i^(j)‖₂ + ρ₁‖u^(j) − u^(j−1)‖₂² ≤ β‖Df^(j−1) − u^(j−1)‖₂² + λα_ε Σ_{i∈I} ‖u_i^(j−1)‖₂  (9)

and

‖Hf^(j) − g‖₂² + λΘ_ε(f^(j)) + β‖Df^(j) − u^(j)‖₂² + ρ₂‖f^(j) − f^(j−1)‖₂² ≤ ‖Hf^(j−1) − g‖₂² + λΘ_ε(f^(j−1)) + β‖Df^(j−1) − u^(j)‖₂²,  (10)

where ρ₁ and ρ₂ are two positive numbers;
(ii) for all j ≥ 1, there exists a ρ > 0 such that

J_ε(f^(j), u^(j)) + ρ( ‖f^(j) − f^(j−1)‖₂² + ‖u^(j) − u^(j−1)‖₂² ) ≤ J_ε(f^(j−1), u^(j−1)),  (11)

and thus J_ε(f^(j), u^(j)) does not increase as j → +∞;
(iii) Σ_{j=1}^∞ ( ‖f^(j) − f^(j−1)‖₂² + ‖u^(j) − u^(j−1)‖₂² ) < +∞, and lim_{j→∞} ( ‖f^(j) − f^(j−1)‖₂² + ‖u^(j) − u^(j−1)‖₂² ) = 0;
(iv) define

Δf^(j) = (1 − 1/τ) A (f^(j) − f^(j−1)) + λ( ∇Θ_ε(f^(j)) − ∇Θ_ε(f^(j−1)) ),  j ≥ 1,  (12)

and

Δu^(j) = 2β( Df^(j−1) − Df^(j) ),  j ≥ 1;  (13)

then we have (Δf^(j), Δu^(j)) ∈ ∂J_ε(f^(j), u^(j)). Moreover, for every bounded subsequence {(f^(j′), u^(j′))} of {(f^(j), u^(j))}, we obtain (Δf^(j′), Δu^(j′)) → (0, 0) as j′ → +∞, and therefore dist(0, ∂J_ε(f^(j′), u^(j′))) tends to zero as j′ tends to +∞;
(v) all limit points of (f^(j), u^(j)) belong to the set of critical points of J_ε(f, u), which is nonempty, compact and connected.

The proof of Theorem 1 is given in Appendix B. By using the results in Theorem 1, we can verify that the three conditions in [3, p. 99] required for the convergence of {z^(j)} ≡ {(f^(j), u^(j))} generated by Algorithm 1 hold. In the light of [1], [3], [11], [36], the following three conditions form a general methodology which describes the main steps to achieve this goal; in particular, they put in evidence how and when the KL property enters into action.

Theorem 2. Suppose that the Assumptions H1, H2, H3 hold. Let {z^(j)} be a sequence generated by Algorithm 1. Then there are two positive numbers a and b such that:
(C1) (Sufficient decrease condition) for each j ∈ N,

J_ε(z^(j+1)) + a‖z^(j+1) − z^(j)‖² ≤ J_ε(z^(j));

(C2) (Relative error condition) for each j ∈ N, there exists ω^(j) ∈ ∂J_ε(z^(j)) such that

‖ω^(j)‖ ≤ b‖z^(j) − z^(j−1)‖;

(C3) (Continuity condition) there exist a subsequence {z^(j_i)}_{i∈N} and z̄ such that

z^(j_i) → z̄  and  J_ε(z^(j_i)) → J_ε(z̄)  as i → ∞.
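A quick empirical check of the sufficient decrease property (11)/(C1) on our Algorithm 1 sketch (test code of ours; cost_Jtilde, grad_Theta and the data g, H, D are the illustrative objects from the earlier snippets):

def alg1_trace(g, H, D, lam=0.015, alpha=0.5, beta=1.1, tau=1.0, iters=50):
    # Same loop as alg1, but records J(f^(j), u^(j)) after each iteration.
    f = np.zeros(H.shape[1])
    A = 2 * H.T @ H + 2 * beta * D.T @ D
    vals = []
    for _ in range(iters):
        Df = D @ f
        u = np.sign(Df) * np.maximum(np.abs(Df) - lam * alpha / (2 * beta), 0.0)
        grad = (2 * H.T @ (H @ f - g) + lam * grad_Theta(f, D, alpha)
                + 2 * beta * D.T @ (Df - u))
        f = f + tau * np.linalg.solve(A, -grad)
        vals.append(cost_Jtilde(f, u, g, H, D, lam, alpha, beta))
    return np.array(vals)

vals = alg1_trace(g, H, D)
print(bool(np.all(np.diff(vals) <= 1e-12)))   # expect True: J never increases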

The proof of Theorem 2 is given in Appendix C. Our objective is to show that the sequence which is generated by Algorithm 1 converges to a critical point of (6). For that purpose we use the KL property to analyze the convergence.

Theorem 3. Let {z^(j)} be a sequence generated by Algorithm 1 with an initial guess z^(0), satisfying (C1), (C2) and (C3), and let the assumptions H1, H2, H3 hold. If J_ε has the Kurdyka-Łojasiewicz property at z̄ = (f̄, ū) with

J_ε(z̄) ≤ J_ε(z^(j)) < J_ε(z̄) + η,  (14)

and

‖z^(0) − z̄‖ + 2√( (J_ε(z^(0)) − J_ε(z̄)) / a ) + (b/a) ϕ( J_ε(z^(0)) − J_ε(z̄) ) < ρ  (15)

(a and b are the two positive numbers stated in Theorem 2, and ϕ refers to the concave function required in Definition 5), then: (i) z^(j) ∈ B(z̄, ρ); (ii) Σ_{j=0}^{+∞} ‖z^(j+1) − z^(j)‖ < +∞; and (iii) J_ε(z^(j)) → J_ε(z̄) as j → ∞, and {z^(j)} converges to a critical point z̄.
The proof of Theorem 3 is given in Appendix D.

By using [2, Theorem 3.3], we can further obtain the local convergence of {z^(j)} to a global minimum.

Theorem 4. Let {z^(j)} be a sequence generated by Algorithm 1 with an initial guess z^(0), satisfying (C1), (C2) and (C3), and let the assumptions H1, H2, H3 hold. Suppose J_ε has the Kurdyka-Łojasiewicz property at z̄ (a global minimum point of J_ε). Then there exist γ > 0 and δ > 0 such that if ‖z^(0) − z̄‖ < γ and min J_ε < J_ε(z^(0)) < min J_ε + δ, where γ < ρ/3 and 2√(δ/a) + (b/a)ϕ(δ) < 2ρ/3 (ρ and ϕ as stated in Theorem 3), we have (i) z^(j) converges to some z̃ with Σ_{j=1}^∞ ‖z^(j+1) − z^(j)‖ < +∞; and (ii) J_ε(z̃) = min J_ε.

Moreover, under the assumptions H1, H2, we know that J_ε is bounded below and coercive, and due to the non-increase of J_ε along the sequence {z^(j)}, as well as [2, Theorem 3.2], we get the following convergence theorem with fewer required conditions.

Theorem 5. Under the assumptions H1, H2, H3, if J_ε is a KL function, then any bounded sequence {z^(j)} generated by Algorithm 1 converges to some critical point of J_ε. Moreover, the sequence {z^(j)} has a finite length, i.e., Σ_{j=1}^∞ ‖z^(j+1) − z^(j)‖ < +∞.

Remark 1. As ε in the nonconvex and nonsmooth cost (6) is a fixed nonzero value, we obtain:
(i) Any sequence generated by Algorithm 1 converges to a local minimizer of the nonconvex and nonsmooth cost (6); and if the objective function in (6) satisfies the conditions of Theorem 4, the global minimum can be obtained.
(ii) The nonconvex and nonsmooth function in (6) has the Kurdyka-Łojasiewicz property at the critical point (f̄, ū) with ϕ(s) = cs^γ, γ ∈ [1/2, 1), c > 0. The convergence rate is then given by [2, Theorem 3.4]:

‖(f^(j), u^(j)) − (f̄, ū)‖ ≤ d q^j,  q ∈ [0, 1), d > 0.
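The linear factor q in Remark 1(ii) can be estimated from the iterates themselves; here is a small sketch of ours (the last stored iterate is used as a proxy for the limit, which is only a rough device):

def estimate_rate(F):
    # F: array of stacked iterates f^(0), ..., f^(m) (one per row).
    err = np.linalg.norm(F[:-1] - F[-1], axis=1)   # ||f^(j) - f^(m)||
    return err[1:] / err[:-1]                      # successive ratios ~ q

The first ratios approximate q; the last few are dominated by the proxy error and should be discarded.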

Based on the above convergence analysis of the nonconvex energy function in (6), we now discuss the convergence analysis of the algorithms in [34] as ε goes from ε₀ = 0 to ε_n = 1. We first note that when ε₀ = 0, the energy function J₀ is just the convex model solved in [42]. It is clear that J₀(f, u) satisfies the Kurdyka-Łojasiewicz property with the concave function ϕ(s) = cs^γ, γ = 1/2, c > 0. Assuming that z̄₀ ≡ (f̄^(0), ū^(0)) is its optimal solution, the convergence rate can be written as

‖(f^(j,0), u^(j,0)) − (f̄^(0), ū^(0))‖ ≤ d q₀^j,  q₀ ∈ [0, 1), d > 0.

Suppose ε_k ≠ 0 and z̄_k is the critical point of J_{ε_k} generated by Algorithm 1 (cf. Theorem 3). It is used as the initial point for the optimization of J_{ε_{k+1}}, and Algorithm 1 then converges to a critical point z̄_{k+1} of J_{ε_{k+1}}. Repeating this process, the sequence {z̄_k} is generated as ε_k → 1. This is exactly Algorithm II stated in [34]. The convergence result can be summarized in the following theorem. In particular, we consider

v(ε) ≡ inf_z J_ε(z),   V(ε) ≡ arg min_z J_ε(z).

Theorem 6. Let J_ε(·) : R^p × R^{sp} → R be proper and lsc, let J_{(·)}(z) : [0, 1] → R be continuous, and let J_ε(z) be level-bounded in z locally uniformly in ε; assume also that H1, H2, H3 hold. If z̄_k ∈ V(ε_k), then {z̄_k} is bounded and all its cluster points lie in V(1).

The proof of Theorem 6 is given in Appendix E.

Remark 2. There is another alternating minimization algorithm in [34] for solving the nonconvex model in (6). The idea is to use the auxiliary variable u = [u₁, u₂, …, u_p] to transfer the nonsmooth term out of J̃ in such a way that the TV denoising step can be solved by the Chambolle algorithm. More specifically, the problem is reformulated as follows:

J_ε(f, u) = ‖Hf − g‖₂² + λΘ_ε(f) + β‖f − u‖₂² + λα_ε TV(u).  (16)

Obviously, TV(u) is still a Kurdyka-Łojasiewicz function. We can make use of the above-mentioned results to show that this algorithm in [34] also converges to a critical point of J₁.
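The continuation strategy (Algorithm II of [34], as reconstructed here) can be sketched as a warm-started outer loop. In the sketch below (ours) the schedule ε_k = k/n is the one used later in Experiment 1; a faithful implementation would pass ε-dependent θ_ε and α_ε into the inner solver, which our fixed-surrogate alg1 does not.

def algorithm_II(g, H, D, n=10, inner_iters=100):
    # Warm-started continuation over eps_k = k/n, k = 0, ..., n, cf. (5).
    f = None
    for k in range(n + 1):
        eps = k / n   # would select theta_eps, alpha_eps in a faithful version
        f, u = alg1(g, H, D, iters=inner_iters, f0=f)   # warm start with f
    return f, u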
IV. BOX CONSTRAINED NONCONVEX MINIMIZATION MODELS

Recently, image restoration with box constraints, e.g., [5], [13], [14], [30], has been considered and studied. It has been shown in these papers that it is very useful to restore an image within a range of pixel values. In [5] and [24], the peak signal-to-noise ratio improvement can be as high as 2 dB when the bound constraints contain a lot of extreme pixels.

Here we combine the nonconvex nonsmooth optimization problem in (2) and (3) with box constraints:

min_{f∈R^p} ‖Hf − g‖₂² + λ Σ_{i∈I} φ(‖D_i f‖₂)   s.t. l ≤ f ≤ r,  (17)

where l and r are fixed vectors and the inequalities are taken componentwise. According to the analysis in Section 2, we can consider the following optimization problem for image restoration:

min_{f∈R^p} ‖Hf − g‖₂² + λΘ_ε(f) + λα_ε Σ_{i∈I} ‖D_i f‖₂   s.t. l ≤ f ≤ r,  (18)

where Θ_ε(f) = Σ_{i∈I} θ_ε(‖D_i f‖₂). Similar to Algorithm 1, we also have the following algorithm to solve the above optimization problem.

Algorithm 2
Step 1: Initialize f^(0) and j = 1;
Step 2: Update f and u until convergence:
u-Step:

u^(j) = arg min_{u∈R^{sp}} J_ε(f^(j−1), u),

i.e., for all i ∈ I,

u_i^(j) = max( ‖D_i f^(j−1)‖₂ − λα_ε/(2β), 0 ) · D_i f^(j−1) / ‖D_i f^(j−1)‖₂;

f-Step:

f^(j) = arg min_{f∈S} { ‖Hf − g‖₂² + λΘ_ε(f) + β‖Df − u^(j)‖₂² },  (19)

where S = {f | l ≤ f ≤ r}.
(Here the iteration index is the superscript j.)


Next, we handle the f-Step with box constraints by the reduced projected-Newton (PN) method in [6] and [23]. In the PN method, the iteration index is s for f^(s). To simplify the notation and without loss of generality, we let

grad(f, u^(j)) ≡ ∇_f J_ε(f, u^(j)).

The projection operator P_S(·) onto S is defined componentwise by

(P_S(f))_i = (l)_i if (f)_i < (l)_i;  (f)_i if (l)_i ≤ (f)_i ≤ (r)_i;  (r)_i if (f)_i > (r)_i,

and the active set at f^(s) is

A_s ≡ {i : f_i^(s) = l_i} ∪ {i : f_i^(s) = r_i}.

At each iteration, we partition the variables into two groups: free and restricted. Restricted variables are defined as a particular subset of the variables close to their bounds, based on the sign of the corresponding components of the gradient. Formally, the set of restricted variables is

B_s ≡ {i : f_i^(s) ≤ l_i + ζ_s, grad_i(f^(s), u^(j)) > 0} ∪ {i : f_i^(s) ≥ r_i − ζ_s, grad_i(f^(s), u^(j)) < 0},

where

ζ_s = min{ζ, w_s},   w_s = ‖f^(s) − P_S( f^(s) − grad(f^(s), u^(j)) )‖,

ζ is some positive number and grad_i(f^(s), u^(j)) denotes the i-th component of the gradient grad(f^(s), u^(j)). The set B_s collects variables that are near their bounds and for which the objective J_ε(·, u^(j)) can be decreased by moving the variables towards (or past) their bounds. The set of free variables is the complementary set of the restricted variables.

We model the reduced Hessian R_s with the symmetric positive definite matrix A ≡ 2HᵀH + 2β Σ_{i∈I} D_iᵀD_i as follows:

R_s = E_{B_s} + (I − E_{B_s}) A (I − E_{B_s}),  i.e., (R_s)_{i,j} = δ_{i,j} if i ∈ B_s or j ∈ B_s, and (R_s)_{i,j} = (A)_{i,j} otherwise,  (20)

where

(E_Ω)_{i,i} = 1 if i ∈ Ω, and 0 if i ∉ Ω,

and δ_{i,j} is the Kronecker delta. The reduced direction d^(s) of J_ε(f, u^(j)) at f^(s) ∈ S is given by

d_i^(s) = −(f_i^(s) − l_i) if f_i^(s) − l_i ≤ ζ_s, and d_i^(s) = −(f_i^(s) − r_i) if r_i − f_i^(s) ≤ ζ_s, for i ∈ B_s;
d_i^(s) = −grad_i(f^(s), u^(j)), for i ∈ {1, …, p} \ B_s.  (21)

The stepsize in the f-Step is computed by

μ_s = arg min_{μ>0} J_ε( P_S(f^(s) + μ R_s^{−1} d^(s)), u^(j) ).  (22)

In practice, subproblem (22) is solved inexactly using a projected backtracking (Armijo) line search, and we take the initial step length parameter to be the so-called Cauchy point

μ_s^0 = ‖d^(s)‖² / ⟨A d^(s), d^(s)⟩.

Step length reduction is accomplished by taking μ_s^m = σ^m μ_s^0, m = 1, 2, …, for some σ ∈ (0, 1). We stop at the first m for which the sufficient decrease condition

J_ε( f^(s)(μ_s^m), u^(j) ) − J_ε( f^(s), u^(j) ) ≤ −(κ/μ_s^m) ‖f^(s) − f^(s)(μ_s^m)‖²  (23)

holds, where κ ∈ (0, 1) and f^(s)(μ) = P_S( f^(s) + μ R_s^{−1} d^(s) ).

The stopping criteria are given as follows:

A_s = A_{s+1},  (24)

or, according to [29], we stop when

J_ε(f^(s−1), u^(j)) − J_ε(f^(s), u^(j)) ≤ ν max{ J_ε(f^(i−1), u^(j)) − J_ε(f^(i), u^(j)), i = 1, …, s − 1 },  (25)

where ν ∈ (0, 1).

Algorithm PN (PN method for the f-Step)
Initialize f^(0) = P_S( f^(j−1) );
While s ≤ s_max and (24) and (25) do not hold, Do
1. Search direction: compute d^(s) by (21);
2. Reduced Newton Hessian: compute R_s by (20);
3. Armijo line search: compute the stepsize μ_s^m by (23);
4. Update: f^(s+1) = P_S( f^(s) + μ_s^m R_s^{−1} d^(s) ), s := s + 1;
End Do
f^(j) = f^(s).
(Here the iteration index is the superscript s.)
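A minimal sketch of one inner PN iteration under the conventions above (ours; np.clip plays the role of P_S, a plain backtracking loop stands in for (22)-(23), and the objective J and its gradient are passed as callables):

def pn_step(f, J, grad_J, A, l, r, zeta=1e-3, sigma=0.5, kappa=1e-2):
    # One reduced projected-Newton step for min J(f) s.t. l <= f <= r.
    g = grad_J(f)
    zs = min(zeta, np.linalg.norm(f - np.clip(f - g, l, r)))   # zeta_s
    B = ((f <= l + zs) & (g > 0)) | ((f >= r - zs) & (g < 0))  # restricted set
    # Reduced Hessian (20): identity on B, the matrix A on the free block.
    R = A.copy()
    R[B, :], R[:, B] = 0.0, 0.0
    R[B, B] = 1.0
    # Reduced direction (21): bound gap on B, negative gradient on free set.
    d = -g
    lo, hi = B & (g > 0), B & (g < 0)
    d[lo] = l[lo] - f[lo]                    # push toward the lower bound
    d[hi] = r[hi] - f[hi]                    # push toward the upper bound
    step = np.linalg.solve(R, d)
    mu = float(d @ d) / float(d @ (A @ d))   # Cauchy initial step length
    while mu > 1e-12:                        # Armijo backtracking, cf. (23)
        f_new = np.clip(f + mu * step, l, r)
        if J(f_new) - J(f) <= -(kappa / mu) * np.sum((f - f_new) ** 2):
            return f_new
        mu *= sigma
    return f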

Now, we analyze the convergence of the sequence {z^(j)} ≡ {(f^(j), u^(j))} generated by Algorithm 2 combined with Algorithm PN. First, we confirm that the sequence {f^(s)} generated by Algorithm PN is convergent.

Theorem 7. Any bounded sequence {f^(s)} generated by Algorithm PN is convergent, and

lim_{s→∞} ‖f^(s) − P_S( f^(s) + d^(s) )‖ = 0.  (26)

The proof of Theorem 7 is given in Appendix F. Next, we turn to analyzing the convergence of Algorithm 2, where the f-Step is handled by Algorithm PN.

Theorem 8. Assume that H1, H2 and H3 hold. Then any sequence {z^(j)} = {(f^(j), u^(j))} generated by Algorithm 2, where ε is some fixed nonzero number, is bounded and converges to some critical point z̄ of the optimization problem min_{(f,u)} J_ε(f, u) s.t. l ≤ f ≤ r. Moreover, the sequence {z^(j)} has a finite length, i.e., Σ_{j=1}^{+∞} ‖z^(j+1) − z^(j)‖ < +∞.

The proof of Theorem 8 is given in Appendix G.


Remark 3.
(i) According to Theorem 6, and since the constraints are irrelevant to ε_k, the sequence {z̄_k} ≡ {(f̄_k, ū_k)} is bounded and all its cluster points lie in arg min_{(f,u)} { J̃(f, u) s.t. l ≤ f ≤ r } as ε_k goes from 0 to 1, where z̄_k is the critical point of the optimization problem min_{(f,u)} J_{ε_k}(f, u) s.t. l ≤ f ≤ r;
(ii) According to the above analysis, the optimum value of the box constrained nonconvex minimization model (17), solved by any algorithm in [34] combined with the reduced projected-Newton method, is obtained after finitely many outer iterations.

Fig. 1. Original images: (a) TwoCircles; (b) Modified Shepp-Logan; (c) Text; (d) Liftingbody.

V. EXPERIMENTAL RESULTS

In this section, we test the performance of Algorithms 1 and 2 and observe their convergence behavior. All the numerical examples are run under Windows 7 and MATLAB R2010a on a DELL laptop with an Intel Core i5-2430M CPU at 1.8 GHz and 2.92 GB of memory. The peak signal-to-noise ratio

PSNR = 20 log₁₀( √q / ‖g − f‖_F ),

where g is the restored image, f is the original image, q is the size of the image and ‖·‖_F is the Frobenius norm, is used to measure the restoration results. CPU time is used to measure the efficiency of the restoration method. The stopping criterion is that the relative change of the successive iterates must be less than 10⁻⁴. The initial value of β is set to 1.1, and its value is updated at each iteration by the formula β ← 1.8β, as suggested in [34]. The value of λ is set to 0.015. The potential function φ(t) = αt/(1 + αt) from Table I is employed in the following experiments, and the value of α is set to 0.5.

The four test images are shown in Figure 1: (a) TwoCircles.tiff of size 64 × 64; (b) Modified Shepp-Logan.tif of size 256 × 256; (c) Text.png of size 256 × 256; and (d) Liftingbody.png of size 512 × 512. To generate the observed images, the two-dimensional truncated Gaussian blurring function exp( −(s² + t²)/(2σ²) ) for −3 ≤ s, t ≤ 3 with σ = 1.5 is used; the support of the blurring function is 7 × 7.

A. Experiment 1

We first test the continuation approach by using ε_k = k/n in (5), so that ε₀ = 0 and ε_n = 1. We would like to test the performance of Algorithm 1 for different values of n.
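For reproducibility, a small sketch (ours) of the PSNR measure and the truncated Gaussian blur described above:

def psnr(f_hat, f):
    # PSNR = 20*log10( sqrt(q) / ||f_hat - f||_F ), q = number of pixels.
    return 20 * np.log10(np.sqrt(f.size) / np.linalg.norm(f_hat - f))

def gaussian_kernel(sigma=1.5, half=3):
    # Truncated Gaussian exp(-(s^2 + t^2)/(2*sigma^2)), -3 <= s, t <= 3 (7x7).
    s, t = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    k = np.exp(-(s ** 2 + t ** 2) / (2 * sigma ** 2))
    return k / k.sum()   # the normalization is our assumption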

We generate four blurred and noisy images by adding Gaussian noise with standard deviations of 0.05, 0.055, 0.06 and 0.07 to the blurred TwoCircles, Modified Shepp-Logan, Text and Liftingbody images. Their restoration results, in PSNR, with respect to different values of n are reported in Figures 2-5 respectively. According to the figures, we find that the restoration results are not stable when n is larger than about 70, and the PSNR values vary significantly. One possible reason is that the algorithm is sensitive to ε: when the increment of ε is small (i.e., n is large), the algorithm may be trapped in local minimum points. We therefore suggest that the value of n need not be chosen too large. In our examples, when the values of n are below about 70, the restoration results are more stable. In Table II, we also summarize the maximum PSNR of the restored image for the different test images and noise levels. In most of the cases, the maximum PSNR values are obtained when n is less than about 70.

In Figure 6, we display the computational time required for convergence for different values of n when Gaussian noise with standard deviation 0.05 is added to the different images. We see from Figure 6 that Algorithm 1 is quite efficient. For instance, it takes less than 17 seconds to obtain the maximum PSNR for the restored 512 × 512 Liftingbody image.
Fig. 2. TwoCircles image: Gaussian noise with standard deviation of (a) 0.05; (b) 0.055; (c) 0.06; (d) 0.07. The maximum value of PSNR is given by max(psnr) and attained at the optimum index n: (a) 19.3779 at n = 5; (b) 18.8481 at n = 11; (c) 18.5200 at n = 20; (d) 18.3381 at n = 104.

B. Experiment 2

In this experiment, we test the restoration model with box constraints. Here we consider l_i = 0 and r_i = 1 for all pixels i in the test images. In view of the results in Experiment 1, we simply adopt the optimum index n from Table II, which is attained along with the maximum PSNR of each restored image, in the run


TABLE II
THE MAXIMUM VALUE OF PSNR (max(psnr)) AND THE OPTIMUM n (index) FOR DIFFERENT IMAGES AND NOISE STANDARD DEVIATIONS, AS n GOES FROM 1 TO 70.

Image | 0.05: max(psnr), index | 0.055: max(psnr), index | 0.06: max(psnr), index | 0.07: max(psnr), index
(a)   | 19.38, 5               | 18.85, 11               | 18.52, 20              | 18.05, 65
(b)   | 26.60, 7               | 25.72, 10               | 24.97, 67              | 23.47, 68
(c)   | 19.33, 12              | 19.12, 14               | 19.01, 13              | 18.25, 69
(d)   | 29.93, 68              | 28.89, 68               | 27.80, 69              | 25.74, 66

Fig. 3. Modified Shepp-Logan image: Gaussian noise with standard deviation of (a) 0.05; (b) 0.055; (c) 0.06; (d) 0.07. The maximum value of PSNR is given by max(psnr) and attained at the optimum index n: (a) 26.6032 at n = 7; (b) 25.7157 at n = 10; (c) 24.9671 at n = 121; (d) 23.5462 at n = 123.

of Algorithm 2. In Table III, the PSNRs (dB) of the restored images produced by Algorithm 2 are shown and compared with the corresponding maximum PSNRs from Experiment 1. We see that the restoration results for the TwoCircles, Modified Shepp-Logan and Text images by Algorithm 2 are at least about 0.5 dB higher than those by Algorithm 1. For the Liftingbody image, the difference is only about 0.05 dB. These observations may be explained by considering the number of extreme pixels: as shown in Table IV, for the TwoCircles, Modified Shepp-Logan and Text images there are more than 60% extreme pixels, whereas there are fewer than 0.1% extreme pixels for the Liftingbody image.
Fig. 4. Text image: Gaussian noise with standard deviation of (a) 0.05; (b) 0.055; (c) 0.06; (d) 0.07. The maximum value of PSNR is given by max(psnr) and attained at the optimum index n: (a) 19.3337 at n = 12; (b) 19.1161 at n = 14; (c) 19.0050 at n = 13; (d) 18.3336 at n = 119.
TABLE III
NUMERICAL COMPARISON OF ALGORITHMS 1 AND 2 FOR IMAGES (a)-(d) IN FIGURE 1: PSNR (dB) OF THE RESTORED IMAGE FOR EACH NOISE STANDARD DEVIATION.

Image | Algorithm | 0.05  | 0.055 | 0.06  | 0.07
(a)   | 1         | 19.38 | 18.85 | 18.52 | 18.05
(a)   | 2         | 19.84 | 19.49 | 19.48 | 18.65
(b)   | 1         | 26.60 | 25.72 | 24.97 | 23.47
(b)   | 2         | 27.14 | 26.60 | 26.02 | 24.69
(c)   | 1         | 19.33 | 19.12 | 19.01 | 18.25
(c)   | 2         | 19.87 | 19.76 | 19.64 | 19.23
(d)   | 1         | 29.93 | 28.89 | 27.80 | 25.74
(d)   | 2         | 29.96 | 28.93 | 27.93 | 25.89

VI. CONCLUSION

In this paper, we have analyzed the convergence of nonconvex nonsmooth regularization methods for image restoration. We have employed the Kurdyka-Łojasiewicz inequality to show that the iterates generated by the alternating minimization scheme converge to a critical point of the nonconvex nonsmooth objective function. The analysis has also been extended to the image restoration model with box constraints. Numerical examples are presented to demonstrate the usefulness of the nonconvex nonsmooth regularization model. In future research, we may study the convergence analysis of other nonconvex nonsmooth image restoration problems, for instance min_f { ‖f − g‖₁ + λ‖Df‖_q^q }, where 0 < q < 1. The variables can be separated as follows: min_{f,u} { ‖f − g‖₁ + λ‖Du‖_q^q + (β/2)‖u − f‖₂² }. The problem can still be solved by alternating minimization; however, the main difficulty is that ‖·‖_q^q is non-Lipschitz and is not semi-algebraic when q is an irrational number.
TABLE IV
PERCENTAGE OF PIXEL VALUES OF 0 OR 1 FOR IMAGES (a)-(d) IN FIGURE 1.

Image | min/max pixel | percentage
(a)   | 0             | 73.90%
(a)   | 255           | 16.11%
(b)   | 0             | 58.18%
(b)   | 255           | 4.34%
(c)   | 0             | 92.72%
(c)   | 1             | 7.28%
(d)   | 0             | 0.0092%
(d)   | 255           | 0.0259%

Fig. 5. Liftingbody image: Gaussian noise with standard deviation of (a) 0.05; (b) 0.055; (c) 0.06; (d) 0.07. The maximum value of PSNR is given by max(psnr) and attained at the optimum index n: (a) 29.9555 at n = 126; (b) 28.8913 at n = 125; (c) 27.8870 at n = 125; (d) 25.8535 at n = 127.

Fig. 6. Computational time (seconds) required for convergence for different values of n; "Time" indicates the time spent when the maximum PSNR is attained at the optimum index n: (a) TwoCircles (0.9341 s at n = 5); (b) Modified Shepp-Logan (7.8915 s at n = 7); (c) Text (9.3146 s at n = 12); (d) Liftingbody (16.81 s at n = 68).

APPENDIX A
THE PROPERTIES OF φ, θ, φ_ε, θ_ε FROM [34]



The properties of φ:
(a) φ is continuous and symmetric on R, increasing on R₊, with φ(0) = 0;
(b) φ is C² on R₊₊ (≡ {t ∈ R | t > 0}), φ″(0⁺) ≤ 0, inf_{t∈R₊₊} φ″(t) < 0 and lim_{t→+∞} φ″(t) = 0;
(c) if φ′(0⁺) ≥ 0, then lim_{t↓0} φ″(t) < 0 is well defined and φ(t) ≥ 0 is increasing on (0, +∞).

The properties of φ_ε:
(a) φ_ε(0) = 0; φ_ε is symmetric and continuous on R, and C² smooth on R₊₊, for all ε ∈ [0, 1];
(b) for ε = 0, φ₀(t) = φ′(0⁺)|t|, while for ε = 1, φ₁ = φ;
(c) α_ε = φ_ε′(0⁺) > 0 is finite for all ε ∈ [0, 1];
(d) φ_ε″(t) < 0 for all t ∈ R₊₊, and φ_ε″(0⁺) is finite and less than zero, for all ε ∈ (0, 1].

The properties of θ_ε:
(a) θ_ε(·) is strictly concave and θ_ε(0) = max_{t∈R} θ_ε(t);
(b) for all t ∈ R, θ₀(t) = 0 and θ₁(t) = θ(t) = φ(t) − φ′(0⁺)|t|;
(c) θ_ε(·) is C¹ on R with θ_ε′(0) = 0, for all ε ∈ [0, 1];
(d) θ_ε(·) is C², θ_ε″(·) is increasing on (0, +∞), θ_ε″(t) = φ_ε″(t), and |θ_ε″(t)| ≤ 2α_ε² (or α_ε²) when φ(t) = αt/(1 + αt) (or log(1 + αt)), for all t ∈ R₊₊, ε ∈ (0, 1].

APPENDIX B
PROOF OF THEOREM 1


(i) The inequality (9) can be obtained by direct computation; see for instance [27]. The proof of the inequality (10) is as follows. Under the assumptions of the theorem and [3, Lemma 3.1], the following inequality can be established:

J_ε(f^(j), u^(j)) ≤ J_ε(f^(j−1), u^(j)) + ⟨∇_f J_ε(f^(j−1), u^(j)), f^(j) − f^(j−1)⟩ + (C_ε/2)‖f^(j) − f^(j−1)‖²,

where ⟨·, ·⟩ is the inner product and C_ε is the Lipschitz constant of ∇_f J_ε(·, u). According to the f-Step in Algorithm 1, the above inequality becomes

J_ε(f^(j), u^(j)) ≤ J_ε(f^(j−1), u^(j)) − (1/τ)⟨A(f^(j) − f^(j−1)), f^(j) − f^(j−1)⟩ + (C_ε/2)‖f^(j) − f^(j−1)‖².

Because A is positive definite, we obtain

J_ε(f^(j), u^(j)) + ( λ_min(A)/τ − C_ε/2 )‖f^(j) − f^(j−1)‖² ≤ J_ε(f^(j−1), u^(j)).

In accordance with the assumptions of the theorem, i.e., λ_min(A) > C_ε/2, the inequality (10) is satisfied.

For (ii) and (iii), we can establish the results by using (i) and [2, Lemma 3.1].

For (iv), according to Algorithm 1, we first obtain

0 ∈ ∂_u J_ε(f^(j−1), u^(j)) = Σ_{i∈I} ( λα_ε ∂(‖u_i^(j)‖₂) + 2β(u_i^(j) − D_i f^(j−1)) )  (27)

and


∇_f J_ε(f^(j−1), u^(j)) = 2Hᵀ(Hf^(j−1) − g) + λ∇Θ_ε(f^(j−1)) + 2βDᵀ(Df^(j−1) − u^(j)).  (28)

Because of the structure of J_ε, we have

∂_u J_ε(f^(j), u^(j)) = Σ_{i∈I} ( λα_ε ∂(‖u_i^(j)‖₂) + 2β(u_i^(j) − D_i f^(j)) )

and

∇_f J_ε(f^(j), u^(j)) = 2Hᵀ(Hf^(j) − g) + λ∇Θ_ε(f^(j)) + 2βDᵀ(Df^(j) − u^(j)).

With (27) and (28), we obtain

2β(Df^(j−1) − Df^(j)) ∈ ∂_u J_ε(f^(j), u^(j))

and

(1 − 1/τ)A(f^(j) − f^(j−1)) + λ( ∇Θ_ε(f^(j)) − ∇Θ_ε(f^(j−1)) ) = ∇_f J_ε(f^(j), u^(j)).

According to Proposition 1, (13) is yielded. Moreover, suppose {(f^(j′), u^(j′))} is a bounded subsequence; then {(f^(j′−1), u^(j′))} is also a bounded sequence. By using (ii), (f^(j′), u^(j′)) − (f^(j′−1), u^(j′)) vanishes as j′ → ∞. According to the properties of θ_ε(·), ∇Θ_ε(·) is uniformly continuous on bounded subsets, and thus the last point of (iv) is yielded.

For (v), we obtain the results by using [2, Proposition 3.1].

APPENDIX C
PROOF OF THEOREM 2

Because ‖z^(j) − z^(j−1)‖₂² = ‖f^(j) − f^(j−1)‖₂² + ‖u^(j) − u^(j−1)‖₂² and by the use of (11), we deduce that J_ε satisfies (C1) with a = ρ.

According to Theorem 1(iv), we can set ω^(j) = (Δf^(j), Δu^(j)) = ( (1 − 1/τ)A(f^(j) − f^(j−1)) + λ(∇Θ_ε(f^(j)) − ∇Θ_ε(f^(j−1))), 2β(Df^(j−1) − Df^(j)) ), which is an element of ∂J_ε(f^(j), u^(j)). As ∇Θ_ε is an L_ε-Lipschitz function, we have

‖(1 − 1/τ)A(f^(j) − f^(j−1)) + λ( ∇Θ_ε(f^(j)) − ∇Θ_ε(f^(j−1)) )‖ ≤ ( |1 − 1/τ| λ_max(A) + λL_ε ) ‖f^(j) − f^(j−1)‖

and

‖2β(Df^(j−1) − Df^(j))‖ ≤ 2β λ_max(D) ‖f^(j) − f^(j−1)‖.

Therefore, for j ∈ N, we obtain

‖ω^(j)‖ ≤ ( |1 − 1/τ| λ_max(A) + λL_ε + 2β λ_max(D) ) ‖z^(j) − z^(j−1)‖,

which entails that ω^(j) = (Δf^(j), Δu^(j)) satisfies (C2) with b = |1 − 1/τ| λ_max(A) + λL_ε + 2β λ_max(D).

Because J_ε(f, u) is proper and lsc and H1 and H2 hold, we have that {z^(j)} is contained in the level set {z ∈ R^p × R^{sp} | J_ε* ≤ J_ε(z) ≤ J_ε(z^(0))}, where J_ε* ≡ inf_z J_ε(z) > −∞. Using the Bolzano-Weierstrass theorem, we deduce the existence of a subsequence {z^(j_i)} which converges to some z̄ = (f̄, ū) as i → +∞. By using (C1), we also see that the sequence {J_ε(z^(j_i))} is decreasing and lim_{i→∞} ‖z^(j_i+1) − z^(j_i)‖ = 0. Since (C2) holds, we have ω^(j_i) → 0 as i → ∞. According to the structure of J_ε, let J_ε(f, u) ≡ G_ε(f, u) + g_ε(f), where G_ε(f, u) = ‖Hf − g‖₂² + β‖Df − u‖₂² + λα_ε Σ_{i∈I} ‖u_i‖₂ and g_ε(f) = λΘ_ε(f). The lower semicontinuity of G_ε(f, u) gives

G_ε(z̄) ≤ lim inf_{i→+∞} G_ε(z^(j_i)),

and the convexity of G_ε(z) implies

lim sup_{i→+∞} ( G_ε(z^(j_i)) + ⟨ω^(j_i), z̄ − z^(j_i)⟩ ) ≤ G_ε(z̄).

Additionally, lim_{i→∞} ⟨ω^(j_i), z̄ − z^(j_i)⟩ = 0. Thus lim_{i→+∞} G_ε(z^(j_i)) = G_ε(z̄). Moreover, g_ε(f) is continuous, i.e., lim_{i→∞} g_ε(f^(j_i)) = g_ε(f̄). It implies that J_ε(z̄) = lim_{i→∞} J_ε(z^(j_i)), i.e., (C3) is correct.

APPENDIX D
PROOF OF THEOREM 3

By using [3, Lemma 2.6] and the assumption that J_ε satisfies (C1) and (C2), we obtain (i) and (ii). For (iii), by using (C3), (i) and (ii), we know that z^(j) → z̄ and J_ε(z^(j)) → J_ε(z̄) as j → ∞. It remains to show that z̄ is a critical point of J_ε. The sequence (z^(j), ω^(j)) belongs to Graph(∂J_ε) ≡ {(z, ω) | ω ∈ ∂J_ε(z)}. Due to lim_{j→∞} ‖z^(j+1) − z^(j)‖ = 0, we have lim_{j→∞} ω^(j) = 0. By the closure property of the subdifferential ∂J_ε, it follows that (z̄, 0) ∈ Graph(∂J_ε), which means that z̄ is a critical point of J_ε.

APPENDIX E
PROOF OF THEOREM 6

Due to the assumptions H1, H2, J_ε(·) is coercive, lsc and bounded below, so J_ε(z) is level-bounded in z locally uniformly in ε. And because J_ε(z) is continuous in ε (by the continuity of θ_ε(f) and φ_ε′(0⁺) with respect to ε), we have lim_{ε→1} J_ε(z) = J₁(z) for every z, and consequently lim_{ε→1} inf_z J_ε(z) = inf_z J₁(z), i.e., lim_{ε→1} v(ε) = v(1). According to [39, Theorem 1.17], the theorem is true.

APPENDIX F
PROOF OF THEOREM 7

Because the matrix A is positive definite, there exist d_l, d_r such that 0 < d_l < d_r < +∞ and

d_l‖f‖² ≤ fᵀR_s f ≤ d_r‖f‖².  (29)

Moreover, ∇_f J_ε(·, u) is a Lipschitz function on S. Referring to [37, Theorem 2] and [38, Lemma 2.3], we obtain the conclusion of the theorem.


APPENDIX G
PROOF OF THEOREM 8

We know that the J_ε(f, u) are KL functions as discussed in Section 2, and S is a nonempty closed semi-algebraic (convex) set, so the functions J_ε(f, u) + δ_S(f) are KL functions, where δ_S(·) is the indicator function. Without loss of generality, we can set f^(j) = P_S( f^(j−1) + μ R_{j−1}^{−1} d^(j−1) ) in Algorithm PN. Because prox_{δ_S}(f) = P_S(f), we obtain

f^(j) = prox_{δ_S}( f^(j−1) + μ R_{j−1}^{−1} d^(j−1) ),  μ > 0.

By the definition of the proximal operator we have

δ_S(f^(j)) + (1/(2μ)) ‖f^(j) − f^(j−1) − μ R_{j−1}^{−1} d^(j−1)‖₂² ≤ δ_S(f^(j−1)) + (1/(2μ)) ‖μ R_{j−1}^{−1} d^(j−1)‖₂².

After simplification, we obtain

δ_S(f^(j)) + ⟨f^(j) − f^(j−1), −R_{j−1}^{−1} d^(j−1)⟩ + (1/(2μ)) ‖f^(j) − f^(j−1)‖₂² ≤ δ_S(f^(j−1)).

By using (7), (20), (21) and (29), we get

δ_S(f^(j)) + ( λ_min(A)/d_r + 1/(2μ) ) ‖f^(j) − f^(j−1)‖₂² ≤ δ_S(f^(j−1)).

Substituting J_ε(f^(j), u^(j)) ≤ J_ε(f^(j−1), u^(j)) into the above inequality, we have

J_ε(f^(j), u^(j)) + δ_S(f^(j)) + ( λ_min(A)/d_r + 1/(2μ) ) ‖f^(j) − f^(j−1)‖₂² ≤ J_ε(f^(j−1), u^(j)) + δ_S(f^(j−1)),

i.e.,

‖Hf^(j) − g‖₂² + λΘ_ε(f^(j)) + β‖Df^(j) − u^(j)‖₂² + δ_S(f^(j)) + ( λ_min(A)/d_r + 1/(2μ) ) ‖f^(j) − f^(j−1)‖₂² ≤ ‖Hf^(j−1) − g‖₂² + λΘ_ε(f^(j−1)) + β‖Df^(j−1) − u^(j)‖₂² + δ_S(f^(j−1)).  (30)

Combined with (9) and Theorem 1(i) and (ii), we know that J_ε(f, u) + δ_S(f) satisfies (C1) defined in Theorem 2.

Because of the compactness and convexity of S, the shrinkage properties of f and u in [42], Theorem 5 as well as Theorem 1(iii), we confirm that the sequence {z^(j)} = {(f^(j), u^(j))} generated by Algorithm 2 is bounded, and that ω^(j) = (Δf^(j), Δu^(j)) ∈ ∂J_ε(f^(j), u^(j)) satisfies (12) and (13) for f ∈ S, i.e., (C2) in Theorem 2 is met. Similar to the proof of Theorem 2, we confirm that condition (C3) is also satisfied. According to Theorem 3, the conclusion of this theorem follows.

ACKNOWLEDGMENT

The authors would like to thank Dr. M. Nikolova and Dr. Liwei Zhang for their constructive suggestions.

REFERENCES

[1] H. Attouch and J. Bolte, "On the convergence of the proximal algorithm for nonsmooth functions involving analytic features," Mathematical Programming, vol. 116, no. 1, pp. 5-16, 2008.
[2] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, "Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality," Math. Oper. Res., vol. 35, no. 2, pp. 438-457, 2010.
[3] H. Attouch, J. Bolte, and B. F. Svaiter, "Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods," Math. Program. Ser. A, vol. 137, no. 1-2, pp. 91-129, 2013.
[4] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing, 2nd ed. Berlin, Germany: Springer-Verlag, 2006.
[5] A. Beck and M. Teboulle, "Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems," IEEE Trans. Image Processing, vol. 18, no. 11, pp. 2419-2434, 2009.
[6] D. P. Bertsekas, "Projected Newton methods for optimization problems with simple constraints," SIAM J. Control Optim., vol. 20, no. 2, pp. 221-246, 1982.
[7] J. E. Besag, "Digital image processing: Towards Bayesian image analysis," J. Appl. Stat., vol. 16, no. 3, pp. 395-407, 1989.
[8] J. Bolte, A. Daniilidis, and A. Lewis, "The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems," SIAM J. Optim., vol. 17, no. 4, pp. 1205-1223, 2006.
[9] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota, "Clarke subgradients of stratifiable functions," SIAM J. Optim., vol. 18, no. 2, pp. 556-572, 2007.
[10] J. Bolte, A. Daniilidis, O. Ley, and L. Mazet, "Characterizations of Łojasiewicz inequalities and applications: subgradient flows, talweg, convexity," Trans. Amer. Math. Soc., vol. 362, no. 6, pp. 3319-3363, 2010.
[11] J. Bolte, S. Sabach, and M. Teboulle, "Proximal alternating linearized minimization for nonconvex and nonsmooth problems," Mathematical Programming, vol. 146, pp. 459-494, 2014.
[12] R. I. Bot and E. R. Csetnek, "An inertial Tseng's type proximal algorithm for nonsmooth and nonconvex optimization problems," ArXiv e-prints, June 2014.
[13] R. Chan and J. Ma, "A multiplicative iterative algorithm for box-constrained penalized likelihood image restoration," IEEE Trans. Image Processing, vol. 21, no. 7, pp. 3168-3181, 2012.
[14] X. Chen, M. Ng, and C. Zhang, "Non-Lipschitz lp-regularization and box constrained model for image restoration," IEEE Trans. Image Processing, vol. 21, no. 12, pp. 4709-4721, 2012.
[15] X. Chen and W. Zhou, "Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization," SIAM J. Imaging Sci., vol. 3, no. 4, pp. 765-790, 2010.
[16] E. Chouzenoux, J.-C. Pesquet, and A. Repetti, "Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function," Journal of Optimization Theory and Applications, vol. 162, pp. 107-132, 2014.
[17] M. Coste, "An introduction to o-minimal geometry," RAAG Notes, Institut de Recherche Mathématiques de Rennes, Nov. 1999.
[18] G. Demoment, "Image reconstruction and restoration: Overview of common estimation structure and problems," IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-37, no. 12, pp. 2024-2036, 1989.
[19] L. van den Dries, Tame Topology and o-minimal Structures, London Mathematical Society Lecture Note Series, vol. 248. Cambridge, U.K.: Cambridge University Press, 1998.
[20] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 6, pp. 721-741, 1984.
[21] A. Jain, Fundamentals of Digital Image Processing. Upper Saddle River, NJ: Prentice-Hall, 1989.
[22] R. Gonzalez and R. Woods, Digital Image Processing, 3rd ed. Pearson International Edition, 2008.
[23] C. T. Kelley, Iterative Methods for Optimization, Society for Industrial and Applied Mathematics, vol. 18, 1987.
[24] D. Kim, S. Sra, and I. Dhillon, "Tackling box-constrained optimization via a new projected quasi-Newton approach," SIAM J. Sci. Comput., vol. 32, no. 6, pp. 3548-3563, 2010.
[25] K. Kurdyka, "On gradients of functions definable in o-minimal structures," Annales de l'institut Fourier, vol. 48, no. 3, pp. 769-783, 1998.
[26] H. A. Le Thi, V. N. Huynh, and T. Pham Dinh, "Convergence analysis of DC algorithm for DC programming with subanalytic data," Technical report, Ann. Oper. Res., INSA-Rouen, 2009.
[27] A. S. Lewis and S. J. Wright, "A proximal method for composite minimization," arXiv preprint arXiv:0812.0423, 2008.


[28] S. Łojasiewicz, "Une propriété topologique des sous-ensembles analytiques réels," Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique, Paris, pp. 87-89, 1963.
[29] J. J. Moré and G. Toraldo, "On the solution of large quadratic programming problems with bound constraints," SIAM J. Optim., vol. 1, no. 1, pp. 93-113, 1991.
[30] B. Morini, M. Porcelli, and R. Chan, "A reduced Newton method for constrained linear least-squares problems," J. Comput. Appl. Math., vol. 233, no. 9, pp. 2200-2212, 2010.
[31] M. Nikolova, "Minimizers of cost-functions involving non-smooth data-fidelity terms: application to the processing of outliers," SIAM J. Numer. Anal., vol. 40, no. 3, pp. 965-994, 2002.
[32] M. Nikolova, "A variational approach to remove outliers and impulse noise," J. Math. Imaging Vision, vol. 20, no. 1-2, pp. 99-120, 2004.
[33] M. Nikolova, M. Ng, S. Zhang, and W. Ching, "Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization," SIAM J. Imaging Sci., vol. 1, no. 1, pp. 2-25, 2008.
[34] M. Nikolova, M. Ng, and C. Tam, "Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction," IEEE Trans. Image Processing, vol. 19, no. 12, pp. 3073-3088, 2010.
[35] M. Nikolova, M. Ng, and C. Tam, "On l1 data fitting and concave regularization for image recovery," SIAM J. Sci. Comput., vol. 35, no. 1, pp. 397-430, 2013.
[36] P. Ochs, Y. Chen, T. Brox, and T. Pock, "iPiano: Inertial proximal algorithm for nonconvex optimization," SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 1388-1419, 2014.
[37] R. Pytlak, "An efficient algorithm for large-scale nonlinear programming problems with simple bounds on the variables," SIAM J. Optim., vol. 8, no. 2, pp. 532-560, 1998.
[38] R. Pytlak and T. Tarnawski, "Preconditioned conjugate gradient algorithms for nonconvex problems with box constraints," Numerische Mathematik, vol. 116, no. 1, pp. 149-175, 2010.
[39] R. T. Rockafellar and R. Wets, Variational Analysis, Grundlehren der Mathematischen Wissenschaften, vol. 317. Berlin: Springer, 1998.
[40] L. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, no. 1, pp. 259-268, 1992.
[41] A. Tikhonov and V. Arsenin, Solutions of Ill-Posed Problems. Washington, DC: Winston, 1977.
[42] Y. Wang, J. Yang, W. Yin, and Y. Zhang, "A new alternating minimization algorithm for total variation image reconstruction," SIAM J. Imaging Sci., vol. 1, no. 3, pp. 248-272, 2008.


Jin Xiao received the B.Sc. degree from Hunan University of Science and Technology and the M.Sc. degree from Dalian University of Technology, in 2004 and 2006, respectively. He is currently pursuing the Ph.D. degree with the College of Mathematics and Econometrics, Hunan University, China. Since 2006, he has been an assistant and lecturer with the School of Mathematics and Computation Science, Lingnan Normal University, Guangdong, China. His main research interests include image processing and numerical optimization.

Michael K. Ng is currently a Professor with the Department of Mathematics, Hong Kong Baptist University, Hong Kong. He received the B.Sc. and M.Phil. degrees from the University of Hong Kong in 1990 and 1992, respectively, and the Ph.D. degree from the Chinese University of Hong Kong in 1995. He was a Research Fellow with the Computer Sciences Laboratory, Australian National University, Canberra, ACT, Australia, from 1995 to 1997, and an Assistant/Associate Professor with the University of Hong Kong from 1997 to 2005 before joining Hong Kong Baptist University. His research interests include bioinformatics, data mining, image processing, and scientific computing. He serves on the editorial boards of international journals and is the Principal Editor of the Journal of Computational and Applied Mathematics.

Yu-Fei Yang received the B.Sc., M.Sc. and Ph.D. degrees in mathematics from Hunan University in 1987, 1994 and 1999, respectively. From 1999 to 2001, he was a visiting fellow at the University of New South Wales, Australia. From 2002 to 2005, he held research associate and postdoctoral fellowship positions at the Hong Kong Polytechnic University. He was an assistant, lecturer, associate professor and professor with Hunan University from 1987 to 2012. He is currently a Professor with the Department of Information and Computing Science, Changsha University, China. His research interests include optimization theory and methods, and partial differential equations with applications to image analysis.
