
A Sequential Quadratic Programming Algorithm with an

Additional Equality Constrained Phase


Jose Luis Morales

Jorge Nocedal

Yuchen Wu

July 2, 2010

Abstract
A sequential quadratic programming (SQP) method is presented that aims to overcome some of the drawbacks of contemporary SQP methods. It avoids the difficulties
associated with indefinite quadratic programming subproblems by defining this subproblem to be always convex. The novel feature of the approach is the addition of an
equality constrained phase that promotes fast convergence and improves performance in
the presence of ill conditioning. This equality constrained phase uses exact second order
information and can be implemented using either a direct solve or an iterative method.
The paper studies the global and local convergence properties of the new algorithm and
presents a set of numerical experiments to illustrate its practical performance.

1 Introduction

Sequential quadratic programming (SQP) methods are very effective techniques for solving small, medium-size and certain classes of large-scale nonlinear programming problems.
They are often preferable to interior-point methods when a sequence of related problems
must be solved (as in branch and bound methods) and more generally, when a good estimate
of the solution is available. Some SQP methods employ convex quadratic programming subproblems for the step computation (typically using quasi-Newton Hessian approximations)
while other variants define the Hessian of the SQP model using second derivative information, which can lead to nonconvex quadratic subproblems; see [28, 1] for surveys on SQP
methods.

Departamento de Matemáticas, Instituto Tecnológico Autónomo de México, Mexico. This author was supported by Asociación Mexicana de Cultura AC and CONACyT-NSF grant J110.388/2006.

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL,
60208-3118, USA. These authors were supported by Department of Energy grant DE-FG02-87ER25047 and
National Science Foundation grant DMS-0810213.


All these approaches have drawbacks. SQP methods that employ convex quasi-Newton
approximations [37, 23, 16] can be slow when solving large or badly scaled problems, whereas
methods that employ the exact Hessian of the Lagrangian [19] are confronted with the
difficult task of computing solutions of indefinite quadratic programs [28]. In this paper,
we propose an algorithm that aims to overcome some of these drawbacks by formulating
the SQP iteration in two phases. The algorithm first solves a convex quadratic program
to estimate the optimal active set, and then employs second-derivative information in an
additional equality constrained (EQP) phase that promotes rapid convergence. The EQP
subproblem is normally less expensive to solve than a general quadratic program and allows
for the application of iterative methods that scale up well in the number of variables. We
call our algorithm SQP+ because of the incorporation of the additional EQP phase.
Concurrently with this work, Gould and Robinson [30, 29] have developed a class of
two-phase SQP methods for large-scale optimization. Their main focus is on trust region
algorithms that solve two inequality constrained quadratic programs at every iteration. In
their approach, a convex quadratic program is first solved to yield a so-called predictor step,
which is then used to guide the solution of a second quadratic program (constructed using
second derivative information). Gould and Robinson have also proposed a method similar
to SQP+ in which the second phase of the algorithm consists of an EQP step. Other SQP
methods that perform the step computation in two phases are discussed in [17, 21].
The nonlinear programming problem under consideration is stated as
\[ \min_x \; f(x) \quad \text{subject to} \quad h(x) = 0, \quad g(x) \ge 0, \tag{1.1} \]
where $f : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^n \to \mathbb{R}^t$ are smooth functions. We write the Lagrangian of this problem as
\[ \mathcal{L}(x, \lambda, \mu) = f(x) - \lambda^T h(x) - \mu^T g(x), \tag{1.2} \]
where $\lambda$ and $\mu$ are vectors of multipliers. The KKT conditions for the nonlinear program (1.1) are given by
\[
\begin{aligned}
\nabla f(x) - H(x)^T \lambda - G(x)^T \mu &= 0, && \text{(1.3a)} \\
h(x) &= 0, && \text{(1.3b)} \\
g(x) &\ge 0, && \text{(1.3c)} \\
\mu &\ge 0, && \text{(1.3d)} \\
\mu^T g(x) &= 0, && \text{(1.3e)}
\end{aligned}
\]
where $H$, $G$ are the Jacobian matrices of $h$ and $g$, respectively.
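For illustration, a minimal Python sketch of the corresponding residual test, assuming user-supplied callables grad_f, h, g, H, G that evaluate the problem functions and Jacobians (the names and interface are hypothetical, not from the paper):

```python
import numpy as np

def kkt_residuals(x, lam, mu, grad_f, h, g, H, G):
    """Evaluate the KKT residuals (1.3) at the primal-dual point (x, lam, mu)."""
    stat = grad_f(x) - H(x).T @ lam - G(x).T @ mu   # stationarity (1.3a)
    feas_eq = h(x)                                  # equality feasibility (1.3b)
    feas_in = np.minimum(g(x), 0.0)                 # violation of g(x) >= 0 (1.3c)
    comp = mu * g(x)                                # complementarity (1.3e); mu >= 0 is (1.3d)
    return stat, feas_eq, feas_in, comp
```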


The SQP approach presented in this paper can be implemented in either a line search or a trust region framework. In either case, global convergence can be promoted by imposing decrease in the $\ell_1$ merit function
\[ \phi_\pi(x) = f(x) + \pi \|h(x)\|_1 + \pi \|g(x)^-\|_1, \tag{1.4} \]

where $g(x)^- \triangleq \max\{0, -g(x)\}$ (componentwise) and $\pi > 0$ is a penalty parameter.


The paper is organized in 6 sections. In Section 2 we describe the proposed SQP method
and outline various implementation options. In Sections 3 and 4 we discuss the global and
local convergence properties of the new algorithm, and in Section 5 we report the results of
some numerical experiments. Some concluding remarks are made in Section 6.

2 The SQP+ Method

The SQP+ approach can be implemented in a line search or trust region framework. For concreteness, we consider only a line search implementation in this paper. Given a primal-dual iterate $(x_k, \lambda_k, \mu_k)$, the algorithm first computes a step $d^{IQ}_k$ in the primal variables $x$ by solving the standard quadratic programming subproblem
\[
\begin{aligned}
\min_d \quad & \nabla f(x_k)^T d + \tfrac{1}{2} d^T B_k d && \text{(2.1a)} \\
\text{subject to} \quad & h(x_k) + H(x_k) d = 0, && \text{(2.1b)} \\
& g(x_k) + G(x_k) d \ge 0, && \text{(2.1c)}
\end{aligned}
\]
where $B_k$ is a positive definite approximation to the Hessian of the Lagrangian $\nabla^2_{xx} \mathcal{L}(x_k, \lambda_k, \mu_k)$. The multipliers associated with the solution of (2.1) are denoted by $\lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1}$ and will be referred to as the IQP multipliers [25]. We assume that the constraints (2.1b)-(2.1c) are feasible, and will later discuss how to handle the case when they are not. One of the goals of this IQP phase is to provide a good estimate, say $\mathcal{W}_k$, of the optimal active set. To this end, we solve problem (2.1) accurately and define $\mathcal{W}_k$ as the indices of the constraints (2.1b)-(2.1c) that are satisfied as equalities at the solution $d^{IQ}_k$, i.e.,
\[ \mathcal{W}_k = \{i \in \mathcal{E}\} \cup \{i \in \mathcal{I} \mid g_i(x_k) + \nabla g_i(x_k)^T d^{IQ}_k = 0\}, \tag{2.2} \]

where $\mathcal{E}$, $\mathcal{I}$ are the index sets of the equality and inequality constraints, respectively. To compensate for the limitations of the IQP phase and to promote a fast rate of convergence, the algorithm computes an additional step by solving an equality constrained quadratic problem (EQP) in which the constraints in $\mathcal{W}_k$ are imposed as equalities and all other constraints are (temporarily) ignored. The EQP subproblem is given by
\[
\begin{aligned}
\min_d \quad & (\nabla f(x_k) + W_k d^{IQ}_k)^T d + \tfrac{1}{2} d^T W_k d && \text{(2.3a)} \\
\text{subject to} \quad & \nabla h_i(x_k)^T d = 0, \quad i \in \mathcal{E}, && \text{(2.3b)} \\
& \nabla g_i(x_k)^T d = 0, \quad i \in \mathcal{W}_k \cap \mathcal{I}, && \text{(2.3c)}
\end{aligned}
\]
where
\[ W_k = W(x_k, \lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1}) \quad \text{with} \quad W(x, \lambda, \mu) \triangleq \nabla^2_{xx} \mathcal{L}(x, \lambda, \mu), \tag{2.4} \]
and $\nabla h_i(x_k)^T$ and $\nabla g_i(x_k)^T$ denote the rows of the matrices $H(x_k)$ and $G(x_k)$, respectively.
Problem (2.3) could be unbounded because, away from the solution, the Hessian of the Lagrangian may not be positive definite on the tangent space of the constraints defined by $\mathcal{W}_k$. In this case, we can add a regularization term to the objective (2.3a) or include a trust region constraint in problem (2.3) to ensure that the EQP subproblem is well defined.
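A minimal sketch of forming the working set (2.2) and assembling the (possibly regularized) EQP data, assuming dense NumPy arrays and a hypothetical activity tolerance tol; the parameter delta corresponds to the regularization term just mentioned:

```python
import numpy as np

def working_set(g_val, G_mat, d_iq, tol=1e-8):
    """Working set (2.2): indices of linearized inequalities active at d_iq.
    tol is an illustrative tolerance, not a constant from the paper."""
    return np.where(np.abs(g_val + G_mat @ d_iq) <= tol)[0]

def eqp_kkt(W_hess, grad_f, d_iq, H_mat, G_mat, active, delta=0.0):
    """Assemble the symmetric KKT system of the EQP (2.3);
    delta > 0 adds a multiple of the identity to W_hess as regularization."""
    A = np.vstack([H_mat, G_mat[active]])   # constraints imposed as equalities
    n, m_w = W_hess.shape[0], A.shape[0]
    K = np.block([[W_hess + delta * np.eye(n), A.T],
                  [A,                          np.zeros((m_w, m_w))]])
    rhs = -np.concatenate([grad_f + W_hess @ d_iq, np.zeros(m_w)])
    return K, rhs
```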
We denote an approximate solution to (2.3) by $d^{EQ}_k$ and the corresponding Lagrange multipliers by $\lambda^{EQ}_{k+1}, \mu^{EQ}_{k+1}$. In Section 3 we show that there is much freedom in the selection of the EQP step. There are two effective procedures for computing $d^{EQ}_k$, namely a projected conjugate gradient method [31, 12, 26] and the direct solution of the (possibly modified) KKT system of (2.3) [38, 2].
Next, the algorithm computes a contraction parameter $\beta \in [0, 1]$ such that the step $d = d^{IQ}_k + \beta d^{EQ}_k$ satisfies all linearized constraints that are not in the working set $\mathcal{W}_k$, i.e.,
\[ g_i(x_k) + \nabla g_i(x_k)^T d \ge 0, \quad i \in \mathcal{W}_k^c, \tag{2.5} \]
where $\mathcal{W}_k^c$ denotes the complement of $\mathcal{W}_k$. Thus, all the steps of the algorithm will satisfy the linearized constraints (2.1b)-(2.1c).
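A minimal ratio-test sketch of this computation (hypothetical NumPy layout; it relies on $d^{IQ}_k$ satisfying the linearized constraints, so each base value below is nonnegative):

```python
import numpy as np

def contraction(g_val, G_mat, d_iq, d_eq, inactive):
    """Largest beta in [0, 1] such that d = d_iq + beta * d_eq satisfies
    g_i(x_k) + grad g_i(x_k)^T d >= 0 for every i outside the working set (2.5)."""
    beta = 1.0
    for i in inactive:
        base = g_val[i] + G_mat[i] @ d_iq    # >= 0, since d_iq satisfies (2.1c)
        slope = G_mat[i] @ d_eq
        if slope < 0.0:                      # constraint value decreases along d_eq
            beta = min(beta, base / (-slope)) if base > 0.0 else 0.0
    return beta
```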
Before the line search is performed, the algorithm updates the penalty parameter $\pi$ in (1.4). For this purpose, we define a model of the merit function around the point $x_k$ as
\[ q_\pi(d) = f_k + \nabla f_k^T d + \tfrac{1}{2} d^T B_k d + \pi \|h_k + H_k d\|_1 + \pi \|[g_k + G_k d]^-\|_1, \tag{2.6} \]
where $f_k$ is shorthand for $f(x_k)$ and similarly for other functions. We define the model reduction from $x_k$ to $x_k + d$ by
\[ \mathrm{qred}(d) = q_\pi(0) - q_\pi(d). \tag{2.7} \]
For a step that satisfies the linearized constraints (2.1b)-(2.1c), we have
\[ \mathrm{qred}(d) = -\nabla f_k^T d - \tfrac{1}{2} d^T B_k d + \pi \left( \|h_k\|_1 + \|g_k^-\|_1 \right). \tag{2.8} \]
The update of the penalty parameter is based on the IQP step. As is now common [3, 39, 6], we require that the new penalty parameter $\pi_k$ be large enough that
\[ \mathrm{qred}(d^{IQ}_k) \ge \rho\, \pi_k \left( \|h_k\|_1 + \|g_k^-\|_1 \right), \tag{2.9} \]
for some prescribed constant $\rho \in (0, 1)$. From (2.8) we see that condition (2.9) is equivalent to the requirement
\[ \pi_k \ge \frac{\nabla f_k^T d^{IQ}_k + \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k}{(1 - \rho)\left( \|h_k\|_1 + \|g_k^-\|_1 \right)} \triangleq \pi^{\mathrm{trial}}. \tag{2.10} \]
We can enforce this condition by updating the penalty parameter at every iteration $k$ by means of the following rule:
\[ \pi_k = \begin{cases} \pi_{k-1} & \text{if (2.9) is satisfied,} \\ \pi^{\mathrm{trial}} + \pi_b & \text{otherwise,} \end{cases} \tag{2.11} \]
where $\pi_b > 0$ is a given constant and $\pi^{\mathrm{trial}}$ is defined in (2.10).
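The rule (2.9)-(2.11) can be sketched as follows (NumPy, with hypothetical argument conventions; h_val and g_val are the constraint values at $x_k$):

```python
import numpy as np

def qred(d, grad_f, B, h_val, g_val, pi):
    """Model reduction (2.8) for a step d satisfying the linearized constraints."""
    viol = np.linalg.norm(h_val, 1) + np.linalg.norm(np.minimum(g_val, 0.0), 1)
    return -grad_f @ d - 0.5 * d @ B @ d + pi * viol

def update_penalty(pi_prev, d_iq, grad_f, B, h_val, g_val, rho=0.1, pi_b=1.0):
    """Rule (2.11): keep pi if (2.9) already holds, otherwise move just above
    the trial value (2.10).  pi_b here is an illustrative default."""
    viol = np.linalg.norm(h_val, 1) + np.linalg.norm(np.minimum(g_val, 0.0), 1)
    if viol == 0.0:
        return pi_prev                          # (2.9) holds for any pi
    if qred(d_iq, grad_f, B, h_val, g_val, pi_prev) >= rho * pi_prev * viol:
        return pi_prev
    trial = (grad_f @ d_iq + 0.5 * d_iq @ B @ d_iq) / ((1.0 - rho) * viol)
    return trial + pi_b
```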


The algorithm then performs a backtracking line search along the piecewise linear segment that starts at $x_k$, goes through $x_k + d^{IQ}_k$, and ends at $x_k + d^{IQ}_k + \beta d^{EQ}_k$. The line search first attempts to find a steplength $\alpha_k \in [\alpha_{\min}, 1]$ that satisfies the sufficient decrease condition
\[ \phi_{\pi_k}(x_k + d^{IQ}_k + \alpha_k \beta d^{EQ}_k) \le \phi_{\pi_k}(x_k) - \sigma\, \mathrm{qred}(d^{IQ}_k), \tag{2.12} \]
for some given constant $\sigma \in (0, 1)$, and where $\alpha_{\min}$ is a threshold such as 0.25. If this line search is successful, we define the total step as
\[ d_k = d^{IQ}_k + \alpha_k \beta d^{EQ}_k. \tag{2.13} \]
Otherwise, we choose a constant $\tau \in (0, 1)$ and let $\alpha_k \in (0, 1]$ be the first member of the sequence $\{1, \tau, \tau^2, \ldots\}$ such that
\[ \phi_{\pi_k}(x_k + \alpha_k d^{IQ}_k) \le \phi_{\pi_k}(x_k) - \sigma \alpha_k\, \mathrm{qred}(d^{IQ}_k), \tag{2.14} \]
where $\sigma$ is the same constant as in (2.12). We then set
\[ d_k = \alpha_k d^{IQ}_k. \tag{2.15} \]
Regardless of whether $d_k$ is defined by (2.13) or (2.15), the new primal iterate is defined as
\[ x_{k+1} = x_k + d_k. \tag{2.16} \]
Note that, unlike (2.14), the step length $\alpha_k$ does not appear in the sufficient decrease condition given by the last term in (2.12). This is because we want global convergence to be driven by the IQP phase, and we therefore accept the EQP step only if it gives as much reduction as that predicted by the full IQP step.
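For concreteness, the merit function (1.4) and the two-stage search can be sketched as follows (hypothetical Python interfaces; qred_iq stands for $\mathrm{qred}(d^{IQ}_k)$, assumed positive, and d_eq_scaled for $\beta d^{EQ}_k$):

```python
import numpy as np

def merit(x, pi, f, h, g):
    """The l1 merit function (1.4)."""
    return f(x) + pi * (np.linalg.norm(h(x), 1)
                        + np.linalg.norm(np.minimum(g(x), 0.0), 1))

def line_search(x, d_iq, d_eq_scaled, pi, f, h, g, qred_iq,
                sigma=1e-4, alpha_min=0.25, tau=0.5):
    """Two-stage search of Section 2: try the EQP segment with the fixed
    decrease sigma*qred (2.12); otherwise backtrack along d_iq as in (2.14)."""
    phi0 = merit(x, pi, f, h, g)
    alpha = 1.0
    while alpha >= alpha_min:                   # search along the EQP segment
        if merit(x + d_iq + alpha * d_eq_scaled, pi, f, h, g) <= phi0 - sigma * qred_iq:
            return d_iq + alpha * d_eq_scaled, True
        alpha *= tau
    alpha = 1.0
    while alpha > 1e-12:                        # fall back on the IQP direction
        if merit(x + alpha * d_iq, pi, f, h, g) <= phi0 - sigma * alpha * qred_iq:
            return alpha * d_iq, False
        alpha *= tau
    return alpha * d_iq, False                  # safeguard; not reached when qred_iq > 0
```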
New Lagrange multipliers $(\lambda_{k+1}, \mu_{k+1})$ are defined at the EQP point, as follows. First, we set the multipliers corresponding to the inactive linearized constraints to zero. The rest of the multipliers are set to the EQP multipliers $\lambda^{EQ}_{k+1}, \mu^{EQ}_{k+1}$, except that if any components of $\mu^{EQ}_{k+1}$ are negative, they are set to zero. The IQP multipliers are undeniably another reasonable candidate for the new Lagrange multipliers. But we believe the EQP multipliers are better because the EQP subproblem (2.3) uses the exact Hessian of the Lagrangian $W_k$, yielding accurate curvature information. In addition, as we will show in Section 4, under suitable conditions the EQP multipliers converge rapidly to the optimal multipliers.
Before describing the algorithm in more detail, we introduce some notation to denote subvectors of inequality constraints and their multipliers. Given a working set $\mathcal{W}_k$, and multipliers $\mu$ corresponding to the inequality constraints $g$ in (1.3c) or (2.1c), we denote by $[g]_{\mathcal{W}_k \cap \mathcal{I}}$ the subvector of constraints $g$ restricted to the set $\mathcal{W}_k \cap \mathcal{I}$, and by $[\mu]_{\mathcal{W}_k \cap \mathcal{I}}$ the corresponding multipliers. Similarly, we denote by $[g]_{\mathcal{W}_k^c}$ the subvector of constraints $g$ restricted to the complement $\mathcal{W}_k^c$, and by $[\mu]_{\mathcal{W}_k^c}$ the corresponding multipliers. To further simplify notation, we omit the square brackets $[\,\cdot\,]$ and the subscript $\cap\,\mathcal{I}$ when it is unambiguous to do so. Using this convention, we have
\[ g_{\mathcal{W}_k} \equiv [g]_{\mathcal{W}_k} \equiv [g]_{\mathcal{W}_k \cap \mathcal{I}}, \quad \text{and} \quad \mu_{\mathcal{W}_k} \equiv [\mu]_{\mathcal{W}_k} \equiv [\mu]_{\mathcal{W}_k \cap \mathcal{I}}. \]

The new SQP algorithm is specified as follows.


Algorithm I
Initial data: $x_0$, $\pi_0 > 0$, $\pi_b > 0$, $\alpha_{\min} > 0$, $\sigma \in (0, 1)$ and $\tau \in (0, 1)$.
For $k = 1, 2, \ldots$, until the KKT conditions (1.3) for the nonlinear program (1.1) are satisfied:
1. Define a positive definite matrix $B_k$ and compute the step $d^{IQ}_k$ and multipliers $\lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1}$ by solving the subproblem (2.1);
2. Determine the working set $\mathcal{W}_k$;
3. Compute the EQP step $d^{EQ}_k$ and multipliers $\lambda^{EQ}_{k+1}, \mu^{EQ}_{k+1}$ by finding an approximate solution of problem (2.3);
4. Compute the largest number $\beta \in [0, 1]$ that ensures that the step $d = d^{IQ}_k + \beta d^{EQ}_k$ satisfies the constraints (2.5);
5. Compute the penalty parameter $\pi_k$ by (2.11);
6. Compute the steplength $\alpha_k$, define $d_k$ by (2.13) or (2.15), and set $x_{k+1} = x_k + d_k$;
7. Set
\[ \lambda_{k+1} = \lambda^{EQ}_{k+1}, \qquad [\mu_{k+1}]_{\mathcal{W}_k} = \max\left\{ 0, \mu^{EQ}_{k+1} \right\}, \qquad [\mu_{k+1}]_{\mathcal{W}_k^c} = 0. \]

We have defined the model $q_\pi$ in terms of the positive definite Hessian approximation $B_k$ so that the IQP step drives the global convergence of the algorithm while the EQP phase yields superlinear convergence.
Many enhancements and safeguards are needed to transform this general (and somewhat
idealized) algorithm into a practical approach. For example, the constraints in (2.1) are not
always feasible. This issue can be resolved by modifying the constraints using relaxations
[33, 3] or by following a penalty approach [18, 10, 8]. The algorithm should also include
regularization strategies for controlling singularity and ill conditioning [25, 13], as well as
second-order corrections [18] or watchdog steps [9] to improve the efficiency of the iteration.
Step 4 is quite restrictive (for simplicity) and in a practical implementation we might allow
the EQP step to violate some of the linearized constraints that are not in the working set.
In Section 6 we comment on a variant that employs a projection in the EQP phase instead
of backtracking to satisfy the constraints (2.5).
A fully developed implementation of the proposed approach is outside the scope of this
paper. We focus, instead, on some of the key aspects of the algorithm so that its potential
can be assessed in a simple setting. One of the main questions we wish to investigate is
whether the EQP phase does, in fact, add significant benefits to the standard SQP approach.
One motivation for the SQP+ algorithm stems from the difficulties that have been encountered in trying to introduce second-derivative information in SQP codes that were originally designed to solve convex quadratic subproblems, such as snopt. Another motivation arose from the numerical experience with the sequential linear-quadratic programming (SLQP) method described in [4, 5], which reveals the limitations of using only first order information in the active set prediction phase. For example, it has proved to be difficult to warm start the linear programming phase in SLQP methods because the trust region constraint, which is defined as a box constraint [20, 4, 11], is typically active at every iteration and it is difficult to predict which sides of the box will be active. The IQP phase of the SQP+ algorithm is easier to warm start and provides a better working set estimation than the SLQP method, often at the price of a marginally more expensive step computation.

3 Global Convergence Properties

In this section we show that Algorithm I enjoys favorable global convergence properties.
We make the following assumptions about problem (1.1) and the iterates generated by the
algorithm.
Assumptions 3.1
(a) The sequence $\{x_k\}$ generated by Algorithm I is contained in a convex set where the functions $f, h, g$ are twice differentiable; the sequence $\{f_k\}$ is bounded below and the sequences $\{\nabla f_k\}$, $\{h_k\}$, $\{g_k\}$, $\{H_k\}$ and $\{G_k\}$ are bounded.
(b) The quadratic program (2.1) is feasible for all $k$;
(c) The sequence of IQP multipliers is bounded, i.e., there exists a positive constant $\gamma$ such that $\|(\lambda^{IQ}_k, \mu^{IQ}_k)\|_\infty \le \gamma$ for all $k$;
(d) The sequence of Hessian approximations $\{B_k\}$ is bounded above, and there exists a constant $M > 0$ such that for all $k$,
\[ \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k \ge M \|d^{IQ}_k\|_2^2. \tag{3.1} \]

These assumptions are fairly restrictive, in particular (b) and (c). Although they have
often been used in the literature (see e.g. Powell [35]), recent studies of SQP methods
establish global convergence properties under weaker assumptions. This requires, however,
that the SQP method be modified significantly by incorporating constraint relaxations,
Hessian modifications, trust regions or other devices [13, 8, 10]. The resulting algorithms
are complex and the analysis is long and laborious. By making the above assumptions, we
can develop a fairly compact convergence analysis that highlights the key properties of the
SQP+ approach.
We begin the analysis by noting that the primal-dual solution $(d^{IQ}_k, \lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1})$ of the IQP subproblem (2.1) satisfies the following first-order conditions:
\[
\begin{aligned}
\nabla f_k + B_k d^{IQ}_k - H_k^T \lambda^{IQ}_{k+1} - G_k^T \mu^{IQ}_{k+1} &= 0, && \text{(3.2a)} \\
h_k + H_k d^{IQ}_k &= 0, && \text{(3.2b)} \\
g_k + G_k d^{IQ}_k &\ge 0, && \text{(3.2c)} \\
\mu^{IQ}_{k+1} &\ge 0, && \text{(3.2d)} \\
(\mu^{IQ}_{k+1})^T (g_k + G_k d^{IQ}_k) &= 0. && \text{(3.2e)}
\end{aligned}
\]


It is not difficult to show (cf. [32, page 628]) that the directional derivative of the merit function $\phi_{\pi_k}$ at a point $x_k$ along the direction $d^{IQ}_k$ satisfies
\[ D\phi_{\pi_k}(x_k; d^{IQ}_k) \le \nabla f_k^T d^{IQ}_k - \pi_k \|h(x_k)\|_1 - \pi_k \|g(x_k)^-\|_1. \]
By comparing the right hand side of this expression with (2.8), we obtain
\[ D\phi_{\pi_k}(x_k; d^{IQ}_k) \le -\mathrm{qred}(d^{IQ}_k) - \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k. \tag{3.3} \]
Recalling from (2.9) that $\mathrm{qred}(d^{IQ}_k) \ge 0$, and since $B_k$ is positive definite, we conclude that $d^{IQ}_k$ is a descent direction for $\phi_{\pi_k}$. Relation (3.3) also shows that, by using $\mathrm{qred}(d^{IQ}_k)$ in the right hand sides of (2.12) and (2.14) (instead of using the directional derivative $D\phi_{\pi_k}(x_k; d^{IQ}_k)$ as in a standard Armijo condition), the sufficient decrease condition is less demanding and accounts for the nondifferentiability of the penalty function. This allows the algorithm to take longer steps.
An immediate consequence of Assumption 3.1-(c) is that the penalty parameters $\pi_k$ generated by the update rule (2.11) are bounded.

Lemma 3.2 There is an index $\bar{k}$ and a positive constant $\bar{\pi}$ such that $\pi_k = \bar{\pi}$ for all $k \ge \bar{k}$.
Proof. From (3.2b) and (3.2e) we have that
\[ (H_k d^{IQ}_k)^T \lambda^{IQ}_{k+1} = -h_k^T \lambda^{IQ}_{k+1}, \qquad (G_k d^{IQ}_k)^T \mu^{IQ}_{k+1} = -g_k^T \mu^{IQ}_{k+1}. \tag{3.4} \]
Next, since $\mu^{IQ}_{k+1} \ge 0$, we have that
\[ -(\mu^{IQ}_{k+1})^T g_k \le (\mu^{IQ}_{k+1})^T \max\{0, -g_k\} = (\mu^{IQ}_{k+1})^T g_k^- \le \|\mu^{IQ}_{k+1}\|_\infty \|g_k^-\|_1, \]
and we also have
\[ -(\lambda^{IQ}_{k+1})^T h_k \le \|\lambda^{IQ}_{k+1}\|_\infty \|h_k\|_1. \]
By combining these two relations with (3.2a) and (3.4), and by Assumption 3.1-(c), we obtain
\[
\begin{aligned}
\nabla f_k^T d^{IQ}_k + \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k
&\le \nabla f_k^T d^{IQ}_k + (d^{IQ}_k)^T B_k d^{IQ}_k
= -h_k^T \lambda^{IQ}_{k+1} - g_k^T \mu^{IQ}_{k+1} \\
&\le \|(\lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1})\|_\infty \left( \|h_k\|_1 + \|g_k^-\|_1 \right)
\le \gamma \left( \|h_k\|_1 + \|g_k^-\|_1 \right).
\end{aligned}
\]
From this relation and (2.10) we have that $\pi^{\mathrm{trial}} \le \gamma/(1 - \rho)$. Consequently, we have from (2.11) that for all $k$,
\[ \pi_k \le \max\left\{ \pi_0,\; \frac{\gamma}{1 - \rho} + \pi_b \right\}, \]
showing that the sequence of penalty parameters is bounded from above. Finally, note that when the penalty parameter is increased, it is increased by at least the finite amount $\pi_b$. This implies that the sequence of penalty parameters is eventually constant, which completes the proof.

The next result is crucial.


Lemma 3.3 Under Assumptions 3.1, Algorithm I yields the limit
\[ \lim_{k \to \infty} \mathrm{qred}(d^{IQ}_k) = 0. \tag{3.5} \]
Proof. By Lemma 3.2, there exists an integer $\bar{k}$ such that for all $k \ge \bar{k}$, $\pi_k = \bar{\pi}$. Since, for the purpose of proving convergence, the initial finite number of iterations is not relevant, we consider only those iterations with $k \ge \bar{k} + 1$.
Let $T_e$ denote the set of iterates for which the line search along the EQP direction was successful, and hence (2.12) is satisfied. In this case we have
\[ \phi_{\bar{\pi}}(x_k) - \phi_{\bar{\pi}}(x_{k+1}) \ge \sigma\, \mathrm{qred}(d^{IQ}_k) \quad \text{for } k \in T_e. \tag{3.6} \]
Similarly, let $T_i$ denote the set of iterates for which a backtracking line search along the direction $d^{IQ}$ is performed. In this case we have from (2.14) that
\[ \phi_{\bar{\pi}}(x_k) - \phi_{\bar{\pi}}(x_{k+1}) \ge \sigma \alpha_k\, \mathrm{qred}(d^{IQ}_k) \quad \text{for } k \in T_i. \tag{3.7} \]
By (2.9) we have that $\mathrm{qred}(d^{IQ}_k) \ge 0$ for all $k$, and therefore the sequence $\{\phi_{\bar{\pi}}(x_{k+1})\}$ is monotonically decreasing. Since $f(x_k)$ is bounded from below by Assumption 3.1-(a), we have that $\{\phi_{\bar{\pi}}(x_k)\}$ is also bounded from below. Thus, $\{\phi_{\bar{\pi}}(x_k)\}$ must converge, i.e.,
\[ \lim_{k \to \infty} \phi_{\bar{\pi}}(x_{k+1}) - \phi_{\bar{\pi}}(x_k) = 0, \]
which implies
\[ \lim_{k \in T_e,\, k \to \infty} \mathrm{qred}(d^{IQ}_k) = 0 \quad \text{and} \quad \lim_{k \in T_i,\, k \to \infty} \alpha_k\, \mathrm{qred}(d^{IQ}_k) = 0. \tag{3.8} \]
If the set $T_i$ is finite, the limit (3.5) holds since $k \in T_e$ for all large $k$. Therefore, we assume that $T_i$ is infinite and show that Assumptions 3.1, the backtracking line search procedure and the second limit in (3.8) imply that $\lim_{k \in T_i, k \to \infty} \mathrm{qred}(d^{IQ}_k) = 0$; this will prove the lemma.
Recalling the definitions (1.4) and (2.6) and Assumptions 3.1-(a), we have that
\[
\begin{aligned}
\phi_{\bar{\pi}}(x_k + \alpha_k d^{IQ}_k) - q_{\bar{\pi}}(\alpha_k d^{IQ}_k)
&\le f(x_k + \alpha_k d^{IQ}_k) - (f_k + \alpha_k \nabla f_k^T d^{IQ}_k) \\
&\quad + \bar{\pi} \left( \|h(x_k + \alpha_k d^{IQ}_k)\|_1 - \|h_k + \alpha_k H_k d^{IQ}_k\|_1 \right) \\
&\quad + \bar{\pi} \left( \|g(x_k + \alpha_k d^{IQ}_k)^-\|_1 - \|[g_k + \alpha_k G_k d^{IQ}_k]^-\|_1 \right) \\
&\le L_1 \alpha_k^2 \|d^{IQ}_k\|_2^2,
\end{aligned} \tag{3.9}
\]
for some constant $L_1 > 0$ independent of $k$. From (3.2a) we have
\[ d^{IQ}_k = B_k^{-1} \left[ -\nabla f_k + H_k^T \lambda^{IQ}_{k+1} + G_k^T \mu^{IQ}_{k+1} \right]. \]
By Assumptions 3.1-(a), all the terms inside the square brackets are bounded and the matrices $B_k$ are uniformly positive definite. Therefore, for all $k$, there is a constant $L_2$ such that
\[ \|d^{IQ}_k\|_2^2 \le L_2, \]


and (3.9) yields
\[ \phi_{\bar{\pi}}(x_k + \alpha_k d^{IQ}_k) - q_{\bar{\pi}}(\alpha_k d^{IQ}_k) \le L_1 L_2 \alpha_k^2. \tag{3.10} \]

The positive definiteness of $B_k$ also implies that $q_{\bar{\pi}}$ is a convex function, and therefore
\[ q_{\bar{\pi}}(\alpha_k d^{IQ}_k) \le (1 - \alpha_k)\, q_{\bar{\pi}}(0) + \alpha_k\, q_{\bar{\pi}}(d^{IQ}_k). \]
Recalling (2.7), this inequality gives
\[ q_{\bar{\pi}}(\alpha_k d^{IQ}_k) \le q_{\bar{\pi}}(0) - \alpha_k\, \mathrm{qred}(d^{IQ}_k). \]
Combining this expression with (3.10), and noting that $\phi_{\bar{\pi}}(x_k) = q_{\bar{\pi}}(0)$, we obtain
\[
\begin{aligned}
\phi_{\bar{\pi}}(x_k + \alpha_k d^{IQ}_k) - \phi_{\bar{\pi}}(x_k)
&= \left( \phi_{\bar{\pi}}(x_k + \alpha_k d^{IQ}_k) - q_{\bar{\pi}}(\alpha_k d^{IQ}_k) \right)
 + \left( q_{\bar{\pi}}(\alpha_k d^{IQ}_k) - \phi_{\bar{\pi}}(x_k) \right) \\
&\le -\alpha_k\, \mathrm{qred}(d^{IQ}_k) + L_1 L_2 \alpha_k^2.
\end{aligned} \tag{3.11}
\]

Now, if
\[ \alpha_k \le \frac{(1 - \sigma)}{L_1 L_2}\, \mathrm{qred}(d^{IQ}_k), \tag{3.12} \]
we have
\[ \phi_{\bar{\pi}}(x_k + \alpha_k d^{IQ}_k) - \phi_{\bar{\pi}}(x_k) \le -\sigma \alpha_k\, \mathrm{qred}(d^{IQ}_k), \]
and therefore the sufficient decrease condition (2.14) is satisfied. Since for $k \in T_i$, $\alpha_k$ is chosen by a backtracking line search with a contraction factor $\tau$, we conclude that
\[ \alpha_k \ge \frac{\tau (1 - \sigma)}{L_1 L_2}\, \mathrm{qred}(d^{IQ}_k). \]
Substituting this inequality in the second limit in (3.8) gives
\[ 0 = \lim_{k \in T_i,\, k \to \infty} \alpha_k\, \mathrm{qred}(d^{IQ}_k)
\ge \lim_{k \in T_i,\, k \to \infty} \frac{\tau (1 - \sigma)}{L_1 L_2}\, \mathrm{qred}^2(d^{IQ}_k) \ge 0, \]
which implies that $\lim_{k \in T_i, k \to \infty} \mathrm{qred}(d^{IQ}_k) = 0$.

We can now establish some limiting properties of the iterates.


Lemma 3.4 The sequence $\{x_k\}$ generated by Algorithm I is asymptotically feasible, i.e.,
\[ \lim_{k \to \infty} \left( \|h_k\|_1 + \|g_k^-\|_1 \right) = 0. \tag{3.13} \]
Furthermore,
\[ \lim_{k \to \infty} g_k^T \mu^{IQ}_{k+1} = 0 \tag{3.14} \]
and
\[ \lim_{k \to \infty} d^{IQ}_k = 0. \tag{3.15} \]


Proof. From Lemma 3.3 and condition (2.9) we get
\[ 0 = \lim_{k \to \infty} \mathrm{qred}(d^{IQ}_k) \ge \lim_{k \to \infty} \rho\, \pi_k \left( \|h_k\|_1 + \|g_k^-\|_1 \right) \ge 0. \]
Hence, we have
\[ \lim_{k \to \infty} \pi_k \left( \|h_k\|_1 + \|g_k^-\|_1 \right) = 0. \tag{3.16} \]
Because $\{\pi_k\}$ is bounded away from zero, this implies the limit (3.13).
As to (3.14) and (3.15), we first observe from (3.2a) and (3.4) that
\[
\begin{aligned}
0 &= \nabla f_k^T d^{IQ}_k + (d^{IQ}_k)^T B_k d^{IQ}_k - (H_k d^{IQ}_k)^T \lambda^{IQ}_{k+1} - (G_k d^{IQ}_k)^T \mu^{IQ}_{k+1} \\
&= \nabla f_k^T d^{IQ}_k + (d^{IQ}_k)^T B_k d^{IQ}_k + h_k^T \lambda^{IQ}_{k+1} + g_k^T \mu^{IQ}_{k+1} \\
&= \left\{ \nabla f_k^T d^{IQ}_k + \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k \right\}
 + \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k + h_k^T \lambda^{IQ}_{k+1} + g_k^T \mu^{IQ}_{k+1}.
\end{aligned} \tag{3.17}
\]

Using the definition (2.8) of qred and the limits (3.5) and (3.16), we obtain
\[ \lim_{k \to \infty} \nabla f_k^T d^{IQ}_k + \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k = 0, \tag{3.18} \]
and consequently, by (3.17),
\[ \lim_{k \to \infty} \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k + h_k^T \lambda^{IQ}_{k+1} + g_k^T \mu^{IQ}_{k+1} = 0. \tag{3.19} \]
By Assumption 3.1-(c), the IQP multipliers are bounded, and thus from (3.13) we have
\[ \lim_{k \to \infty} h_k^T \lambda^{IQ}_{k+1} = 0. \tag{3.20} \]
Therefore (3.19) simplifies to
\[ \lim_{k \to \infty} \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k + g_k^T \mu^{IQ}_{k+1} = 0. \tag{3.21} \]

We also have from (3.13) that $\|g_k^-\|_1 \to 0$, and thus by the boundedness of the IQP multipliers we obtain
\[ \lim_{k \to \infty} \sum_{i \in J_k} g_i(x_k)\, [\mu^{IQ}_{k+1}]_i = 0, \tag{3.22} \]
where $J_k = \{i : g_i(x_k) < 0\}$. Therefore,
\[
\lim_{k \to \infty} g_k^T \mu^{IQ}_{k+1}
= \lim_{k \to \infty} \left( \sum_{i \in J_k} g_i(x_k)\, [\mu^{IQ}_{k+1}]_i + \sum_{i \in J_k^+} g_i(x_k)\, [\mu^{IQ}_{k+1}]_i \right)
= \lim_{k \to \infty} \sum_{i \in J_k^+} g_i(x_k)\, [\mu^{IQ}_{k+1}]_i \ge 0, \tag{3.23}
\]


where $J_k^+ = \{i : g_i(x_k) > 0\}$. By (3.1) and (3.23) the limits of both sequences in (3.21) are nonnegative, so we conclude that
\[ \lim_{k \to \infty} g_k^T \mu^{IQ}_{k+1} = 0 \quad \text{and} \quad \lim_{k \to \infty} \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k = 0. \]
Hence, by recalling (3.1), we have
\[ 0 = \lim_{k \to \infty} \tfrac{1}{2} (d^{IQ}_k)^T B_k d^{IQ}_k \ge \lim_{k \to \infty} M \|d^{IQ}_k\|_2^2 \ge 0, \]
which implies that $d^{IQ}_k \to 0$.

We can now prove the main result of this section.


Theorem 3.5 Any limit point of the sequence $\{x_k, \lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1}\}$ generated by Algorithm I is a KKT point of the nonlinear program (1.1).

Proof. Consider a limit point $(x_*, \lambda_*, \mu_*)$ of the sequence $\{x_k, \lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1}\}$. Then there is an index set $K$ such that $(x_k, \lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1})_{k \in K} \to (x_*, \lambda_*, \mu_*)$. Taking limits in (3.2a) we obtain
\[ \lim_{k \to \infty,\, k \in K} \nabla f_k + B_k d^{IQ}_k - H_k^T \lambda^{IQ}_{k+1} - G_k^T \mu^{IQ}_{k+1} = 0. \tag{3.24} \]
We have shown in Lemma 3.4 that the sequence $\{d^{IQ}_k\}$ converges to zero, and Assumption 3.1-(d) states that the matrices $B_k$ are bounded; thus
\[ \lim_{k \to \infty,\, k \in K} B_k d^{IQ}_k = 0. \]
By Assumption 3.1-(a), the functions $\nabla f$, $H$, $G$ are continuous. Hence from (3.24),
\[ \nabla f(x_*) - H(x_*)^T \lambda_* - G(x_*)^T \mu_* = 0, \]
so that (1.3a) is satisfied. Lemma 3.4 guarantees that $x_*$ is feasible, and hence (1.3b) and (1.3c) hold. Condition (1.3e) follows from (3.14), and the nonnegativity condition (1.3d) holds by (3.2d). Consequently, the limit point $(x_*, \lambda_*, \mu_*)$ satisfies the KKT conditions (1.3).

This theorem shows that the primal-dual variables $(x_k, \lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1})$ can be used to terminate the algorithm. An alternative (primal) stop test would be to terminate the iteration when $\|d^{IQ}_k\|$ is smaller than a given tolerance. One can also base the convergence test on the new iterate $(x_{k+1}, \lambda_{k+1}, \mu_{k+1})$.

4 Local Convergence Properties

We now show that Algorithm I identifies the optimal active set once an iterate $x_k$ is close enough to a solution satisfying standard conditions. Furthermore, since for large $k$ the EQP phase computes Newton steps along the optimal active set, we show that the rate of convergence of the iteration is quadratic.
We begin by introducing some notation. The working set $\mathcal{W}_k$ is given by (2.2), and the (optimal) active set corresponding to a solution $x_*$ of the nonlinear program (1.1) is defined as
\[ \mathcal{W}_* = \{i \in \mathcal{E}\} \cup \{i \in \mathcal{I} \mid g_i(x_*) = 0\}. \tag{4.1} \]
Recall that $g_{\mathcal{W}_*} \equiv [g]_{\mathcal{W}_* \cap \mathcal{I}}$ is the subvector of inequality constraints in the working set $\mathcal{W}_*$. We denote by $c_{\mathcal{W}_*}$ the vector of constraints in the working set, i.e.,
\[ c_{\mathcal{W}_*}(x_k) = \begin{bmatrix} h(x_k) \\ g_{\mathcal{W}_*}(x_k) \end{bmatrix}, \tag{4.2} \]
and denote its Jacobian by $A_{\mathcal{W}_*}$, specifically
\[ A_{\mathcal{W}_*}(x_k) = \begin{bmatrix} H(x_k) \\ G_{\mathcal{W}_*}(x_k) \end{bmatrix}, \quad \text{with} \quad G_{\mathcal{W}_*}(x_k) = [\nabla g_i(x_k)^T]_{i \in \mathcal{W}_* \cap \mathcal{I}}. \tag{4.3} \]

Our local convergence results are established under the following assumptions.
Assumptions 4.1 Let $(x_*, \lambda_*, \mu_*)$ be a primal-dual solution to (1.1) and let $\mathcal{W}_*$ be the corresponding optimal active set.
(a) (Smoothness) In a neighborhood of $x_*$, the functions $f$, $g$ and $h$ are twice continuously differentiable with Lipschitz continuous second derivatives;
(b) (Regularity) The Jacobian of the active constraints at $x_*$, denoted by $A_{\mathcal{W}_*}(x_*)$, has full row rank;
(c) (Strict Complementarity) $\mu_* + g(x_*) > 0$;
(d) (Regularity of IQP Model) For all $k$, the subproblem (2.1) is feasible, and the matrices $B_k$ are positive definite with smallest eigenvalue bounded away from zero;
(e) (Second Order Sufficiency) The Hessian $W$ defined in (2.4) satisfies $y^T W(x_*, \lambda_*, \mu_*)\, y > 0$ for all $y \ne 0$ such that $A_{\mathcal{W}_*}(x_*)\, y = 0$.
Throughout the analysis, $\|\cdot\|$ denotes the 2-norm, unless indicated otherwise. The following lemma, which was first proved by Robinson [36], states that near a solution $x_*$, the IQP step identifies the optimal active set $\mathcal{W}_*$. We give a proof of this result because some of the arguments are used again in the proof of Theorem 4.3.
Lemma 4.2 Suppose that Assumptions 4.1 hold. If $x_k$ is sufficiently close to $x_*$, the working set (2.2) identified by the solution of the IQP subproblem (2.1) is the optimal active set, i.e., $\mathcal{W}_k = \mathcal{W}_*$.


Proof. By Assumption 4.1-(d), the IQP subproblem (2.1) has a unique primal optimal solution $d^{IQ}_k$, and hence the working set $\mathcal{W}_k$ is uniquely determined by (2.2). Let us recall definitions (4.2) and (4.3), and consider the system
\[ \begin{bmatrix} B_k & -A_{\mathcal{W}_*}^T(x_k) \\ A_{\mathcal{W}_*}(x_k) & 0 \end{bmatrix} \begin{bmatrix} d \\ \nu \end{bmatrix} = -\begin{bmatrix} \nabla f_k \\ c_{\mathcal{W}_*}(x_k) \end{bmatrix}, \tag{4.4} \]
which is defined in terms of the optimal active set $\mathcal{W}_*$. We now show that when $x_k$ is close to $x_*$, the IQP step can be computed via the solution of the system (4.4).
Let us define the neighborhood
\[ N(\epsilon) \triangleq \{x \mid \|x - x_*\| \le \epsilon\}, \quad \text{with } \epsilon > 0. \tag{4.5} \]
By Assumptions 4.1-(a), (b), (d), for $\epsilon$ sufficiently small and $x_k \in N(\epsilon)$, the linear system (4.4) is nonsingular and the inverse of its coefficient matrix is bounded above in norm by some constant $\kappa_1 > 0$. Let us define
\[ \nu_* = \begin{bmatrix} \lambda_* \\ [\mu_*]_{\mathcal{W}_*} \end{bmatrix}. \tag{4.6} \]
From (4.4) we have that
\[ \begin{bmatrix} d \\ \nu - \nu_* \end{bmatrix} = -\begin{bmatrix} B_k & -A_{\mathcal{W}_*}^T(x_k) \\ A_{\mathcal{W}_*}(x_k) & 0 \end{bmatrix}^{-1} \begin{bmatrix} \nabla f_k - A_{\mathcal{W}_*}^T(x_k)\, \nu_* \\ c_{\mathcal{W}_*}(x_k) \end{bmatrix}, \tag{4.7} \]
and hence
\[ \left\| \begin{bmatrix} d \\ \nu - \nu_* \end{bmatrix} \right\| \le \kappa_1 \left\| \begin{bmatrix} \nabla f_k - A_{\mathcal{W}_*}^T(x_k)\, \nu_* \\ c_{\mathcal{W}_*}(x_k) \end{bmatrix} \right\|. \tag{4.8} \]

Furthermore, Assumptions 4.1-(a) imply that $\nabla f$, $c_{\mathcal{W}_*}$, $A_{\mathcal{W}_*}$ are Lipschitz continuous, and therefore for all $x_k \in N(\epsilon)$ there exist constants $\delta_f$, $\delta_c$ and $\delta_a$ such that
\[
\begin{aligned}
\|\nabla f_k - \nabla f(x_*)\| &\le \delta_f \|x_k - x_*\| \le \delta_f \epsilon, \\
\|c_{\mathcal{W}_*}(x_k) - c_{\mathcal{W}_*}(x_*)\| &\le \delta_c \|x_k - x_*\| \le \delta_c \epsilon, \\
\|A_{\mathcal{W}_*}(x_k) - A_{\mathcal{W}_*}(x_*)\| &\le \delta_a \|x_k - x_*\| \le \delta_a \epsilon.
\end{aligned} \tag{4.9}
\]


By (4.6), we can express the KKT conditions (1.3) of the nonlinear program (1.1) as
"
#
f (x ) ATW (x )
= 0,
(4.10)
cW (x )
together with
[ ]W > 0,

[ ]W c = 0,

gW c (x ) > 0,

(4.11)


where condition (4.11) follows from Assumption 4.1-(c). Therefore, we have from (4.10) and (4.9) that
\[
\begin{aligned}
\left\| \begin{bmatrix} \nabla f_k - A_{\mathcal{W}_*}^T(x_k)\, \nu_* \\ c_{\mathcal{W}_*}(x_k) \end{bmatrix} \right\|
&= \left\| \begin{bmatrix} (\nabla f_k - \nabla f(x_*)) - (A_{\mathcal{W}_*}(x_k) - A_{\mathcal{W}_*}(x_*))^T \nu_* \\ c_{\mathcal{W}_*}(x_k) - c_{\mathcal{W}_*}(x_*) \end{bmatrix} \right\| \\
&\le \|\nabla f_k - \nabla f(x_*)\| + \|c_{\mathcal{W}_*}(x_k) - c_{\mathcal{W}_*}(x_*)\| + \|A_{\mathcal{W}_*}(x_k) - A_{\mathcal{W}_*}(x_*)\| \|\nu_*\| \\
&\le \left( \delta_f + \delta_c + \delta_a \|\nu_*\| \right) \|x_k - x_*\|.
\end{aligned} \tag{4.12}
\]
If we define $\delta_o = \delta_f + \delta_c + \delta_a \|\nu_*\|$, we obtain from (4.8) that
\[ \left\| \begin{bmatrix} d \\ \nu - \nu_* \end{bmatrix} \right\| \le \delta_o \kappa_1 \|x_k - x_*\| \le \delta_o \kappa_1 \epsilon. \tag{4.13} \]
Let us now write $\nu$ in terms of components whose dimensions are compatible with those of $\nu_*$ in (4.6), i.e.,
\[ \nu = \begin{bmatrix} \hat{\lambda} \\ \hat{\mu} \end{bmatrix}. \tag{4.14} \]
Then, from (4.13),
\[ \|d\| \le \delta_o \kappa_1 \epsilon \quad \text{and} \quad \|\hat{\mu} - [\mu_*]_{\mathcal{W}_*}\| \le \delta_o \kappa_1 \epsilon. \]
Hence,
\[ \hat{\mu} = [\mu_*]_{\mathcal{W}_*} - \left( [\mu_*]_{\mathcal{W}_*} - \hat{\mu} \right) \ge [\mu_*]_{\mathcal{W}_*} - \delta_o \kappa_1 \epsilon\, \mathbf{1}, \tag{4.15} \]
and
\[
\begin{aligned}
g_{\mathcal{W}_*^c}(x_k) + G_{\mathcal{W}_*^c}(x_k)\, d
&= g_{\mathcal{W}_*^c}(x_*) - \left( g_{\mathcal{W}_*^c}(x_*) - g_{\mathcal{W}_*^c}(x_k) \right)
 - \left( G_{\mathcal{W}_*^c}(x_*) - G_{\mathcal{W}_*^c}(x_k) \right) d + G_{\mathcal{W}_*^c}(x_*)\, d \\
&\ge g_{\mathcal{W}_*^c}(x_*) - \left( \delta_g \epsilon + \delta_G \epsilon\, \delta_o \kappa_1 \epsilon + \|G_{\mathcal{W}_*^c}(x_*)\|\, \delta_o \kappa_1 \epsilon \right) \mathbf{1},
\end{aligned} \tag{4.16}
\]
where $\delta_g$ and $\delta_G$ are, respectively, Lipschitz constants for $g_{\mathcal{W}_*^c}(x)$ and $G_{\mathcal{W}_*^c}(x)$ over $N(\epsilon)$, and $\mathbf{1}$ is a vector of all ones of appropriate dimension. Therefore, if $\epsilon$ is small enough, we have from (4.11), (4.15), (4.16) that
\[ \hat{\mu} > 0, \qquad g_{\mathcal{W}_*^c}(x_k) + G_{\mathcal{W}_*^c}(x_k)\, d > 0. \tag{4.17} \]

We can now construct the solution of the IQP subproblem as follows:
\[ d^{IQ}_k = d, \qquad \begin{bmatrix} \lambda^{IQ}_{k+1} \\ [\mu^{IQ}_{k+1}]_{\mathcal{W}_*} \end{bmatrix} = \nu, \qquad [\mu^{IQ}_{k+1}]_{\mathcal{W}_*^c} = 0. \tag{4.18} \]
By (4.4) and (4.17), it is easy to verify that this primal-dual solution satisfies the KKT conditions (3.2) of the IQP subproblem. Therefore, the vector $d^{IQ}_k$ constructed in this manner is indeed the unique primal optimal solution of the IQP subproblem, and its working set satisfies $\mathcal{W}_k = \mathcal{W}_*$.

We now present the main result of this section. It shows that, if the algorithm takes the full step for all $x_k$ sufficiently close to $x_*$, the iteration is quadratically convergent.


Theorem 4.3 Suppose that: (i) Assumptions 4.1 hold; (ii) for $x_k$ sufficiently close to $x_*$, Algorithm I computes the step $d_k$ by (2.13) with $\alpha_k = 1$. Then there exists a neighborhood $N(\epsilon)$ such that if $x_k \in N(\epsilon)$, the sequence $\{(x_l, \lambda_l, \mu_l)\}_{l=k}^{\infty}$ converges quadratically to $(x_*, \lambda_*, \mu_*)$.
Proof. By Lemma 4.2, if $x_k$ is sufficiently close to $x_*$ we have $\mathcal{W}_k = \mathcal{W}_*$. The KKT conditions of the EQP subproblem (2.3) are therefore given by
\[ \begin{bmatrix} W_k & -A_{\mathcal{W}_*}^T(x_k) \\ A_{\mathcal{W}_*}(x_k) & 0 \end{bmatrix} \begin{bmatrix} d + d^{IQ}_k \\ \nu \end{bmatrix} = -\begin{bmatrix} \nabla f_k \\ c_{\mathcal{W}_*}(x_k) \end{bmatrix}. \tag{4.19} \]
Since, by Assumptions 4.1-(b), (e), the matrix
\[ \begin{bmatrix} W(x, \lambda, \mu) & -A_{\mathcal{W}_*}^T(x) \\ A_{\mathcal{W}_*}(x) & 0 \end{bmatrix} \]
is nonsingular at $(x_*, \lambda_*, \mu_*)$, we have by (2.4), (4.13), (4.18) and Assumption 4.1-(a) that, if $x_k$ is sufficiently close to $x_*$, the coefficient matrix of (4.19) is also nonsingular and its inverse is bounded above in norm. Hence, the solution to (4.19) is given by
\[ (d, \nu) = (d^{EQ}_k, \nu^{EQ}_{k+1}), \quad \text{with} \quad \nu^{EQ}_{k+1} \triangleq \begin{bmatrix} \lambda^{EQ}_{k+1} \\ \mu^{EQ}_{k+1} \end{bmatrix}. \tag{4.20} \]
Notice that (4.19) has the same form as (4.4), except that $B_k$ is replaced by $W_k$. Thus we can follow the same reasoning as in the proof of Lemma 4.2 and deduce that $(d^{IQ}_k + d^{EQ}_k, \nu^{EQ}_{k+1})$ also satisfies (4.17), i.e.,
\[ \mu^{EQ}_{k+1} > 0 \quad \text{and} \quad g_{\mathcal{W}_*^c}(x_k) + G_{\mathcal{W}_*^c}(x_k)\left( d^{IQ}_k + d^{EQ}_k \right) > 0. \tag{4.21} \]

As a result, for $x_k$ sufficiently close to $x_*$, step 4 of Algorithm I will choose $\beta = 1$, step 6 will set
\[ d_k = d^{IQ}_k + d^{EQ}_k \quad \text{and} \quad x_{k+1} = x_k + d^{IQ}_k + d^{EQ}_k, \tag{4.22, 4.23} \]
and step 7 will set
\[ \lambda_{k+1} = \lambda^{EQ}_{k+1}, \qquad [\mu_{k+1}]_{\mathcal{W}_*} = \mu^{EQ}_{k+1}, \qquad [\mu_{k+1}]_{\mathcal{W}_*^c} = 0. \tag{4.24} \]

Let us define a matrix function as
\[ \overline{W}(x, \nu) \triangleq \nabla^2 f(x) - \sum_{i \in \mathcal{E}} \nu_i\, \nabla^2 h_i(x) - \sum_{i \in \mathcal{I} \cap \mathcal{W}_*} \nu_i\, \nabla^2 g_i(x). \]

Then, by recalling (2.4) and plugging (4.20), (4.22) into (4.19), we further deduce that the EQP iterate $(x_{k+1}, \nu^{EQ}_{k+1})$ satisfies
\[ \begin{bmatrix} \overline{W}(x_k, \nu^{IQ}_{k+1}) & -A_{\mathcal{W}_*}^T(x_k) \\ A_{\mathcal{W}_*}(x_k) & 0 \end{bmatrix} \begin{bmatrix} x_{k+1} - x_k \\ \nu^{EQ}_{k+1} - \nu^{IQ}_{k+1} \end{bmatrix} = -\begin{bmatrix} \nabla f(x_k) - A_{\mathcal{W}_*}^T(x_k)\, \nu^{IQ}_{k+1} \\ c_{\mathcal{W}_*}(x_k) \end{bmatrix}, \tag{4.25} \]

where
\[ \nu^{IQ}_{k+1} \triangleq \begin{bmatrix} \lambda^{IQ}_{k+1} \\ [\mu^{IQ}_{k+1}]_{\mathcal{W}_*} \end{bmatrix} \tag{4.26} \]

is the vector of IQP multipliers that has already been computed. Note that the system (4.25) is the Newton iteration applied to the nonlinear system
\[
\begin{aligned}
\nabla f(x) - A_{\mathcal{W}_*}^T(x)\, \nu &= 0, && \text{(4.27a)} \\
c_{\mathcal{W}_*}(x) &= 0, && \text{(4.27b)}
\end{aligned}
\]
at the point $(x_k, \nu^{IQ}_{k+1})$, for which the solution of the step is $(x_{k+1}, \nu^{EQ}_{k+1})$. Thus, we can perform a standard Newton analysis to establish quadratic convergence. We first observe that, by Assumptions 4.1-(a), the matrix
\[ F'(x, \nu) \triangleq \begin{bmatrix} \overline{W}(x, \nu) & -A_{\mathcal{W}_*}^T(x) \\ A_{\mathcal{W}_*}(x) & 0 \end{bmatrix} \]
is Lipschitz continuous with Lipschitz constant $\bar{\kappa}_1$. If we define
\[ \nu_* = \begin{bmatrix} \lambda_* \\ [\mu_*]_{\mathcal{W}_*} \end{bmatrix}, \tag{4.28} \]
then, as we have noted for the linear system (4.19), in a neighborhood of $(x_*, \nu_*)$, $F'$ is invertible and $\|F'^{-1}\|$ is bounded above by a constant $\bar{\kappa}_2$. Thus, by Theorem 5.2.1 of [14], if $(x_k, \nu^{IQ}_{k+1})$ satisfies
\[ \left\| \begin{bmatrix} x_k - x_* \\ \nu^{IQ}_{k+1} - \nu_* \end{bmatrix} \right\| \le \frac{1}{2 \bar{\kappa}_1 \bar{\kappa}_2}, \tag{4.29} \]
then
\[ \left\| \begin{bmatrix} x_{k+1} - x_* \\ \nu^{EQ}_{k+1} - \nu_* \end{bmatrix} \right\| \le \bar{\kappa}_1 \bar{\kappa}_2 \left\| \begin{bmatrix} x_k - x_* \\ \nu^{IQ}_{k+1} - \nu_* \end{bmatrix} \right\|^2. \tag{4.30} \]

Let us define the neighborhood
\[ N(\epsilon) = \{x \mid \|x - x_*\| \le \epsilon\}, \]
where $\epsilon > 0$ is small enough that all the properties derived so far in this proof and in the proof of Lemma 4.2 hold for $x_k \in N(\epsilon)$. Then, from (4.13) and (4.18), there exists a constant $\bar{\kappa}_3 > 1$ such that $x_k \in N(\epsilon)$ implies $\|\nu^{IQ}_{k+1} - \nu_*\| < \bar{\kappa}_3 \|x_k - x_*\|$, which further implies that
\[ \left\| \begin{bmatrix} x_k - x_* \\ \nu^{IQ}_{k+1} - \nu_* \end{bmatrix} \right\| \le \sqrt{1 + \bar{\kappa}_3^2}\; \|x_k - x_*\|, \tag{4.31} \]
and that (4.29) and (4.30) are satisfied for sufficiently small $\epsilon$, as desired. Hence, it follows from (4.23), (4.30), (4.31) and the definitions in (4.20), (4.28) that if $x_k \in N(\epsilon)$,
\[ \left\| \begin{bmatrix} x_{k+1} - x_* \\ \nu^{EQ}_{k+1} - \nu_* \end{bmatrix} \right\| \le \bar{\kappa}_1 \bar{\kappa}_2 (1 + \bar{\kappa}_3^2) \|x_k - x_*\|^2 \le \bar{\kappa}_1 \bar{\kappa}_2 (1 + \bar{\kappa}_3^2)\, \epsilon\, \|x_k - x_*\|. \tag{4.32} \]


Let $\epsilon$ be sufficiently small that $\epsilon \le \min\{1,\, 1/(\bar{\kappa}_1 \bar{\kappa}_2 (1 + \bar{\kappa}_3^2))\}$; then $x_{k+1} \in N(\epsilon)$. Therefore, we have shown that if $x_k \in N(\epsilon)$, all subsequent iterates remain in $N(\epsilon)$ and converge quadratically to $(x_*, \lambda_*, \mu_*)$.

This quadratic convergence result is more satisfying than those established in the literature for SQP methods that use exact Hessians. This is because, even in a neighborhood of a solution satisfying Assumptions 4.1, the quadratic program (2.1) may have several local minimizers if $B_k$ is replaced by the Hessian $W_k$. Thus, for classical SQP methods it is necessary to assume that the quadratic programming solver selects a local minimizer with certain properties, for example, the one closest to the current iterate. Our quadratic convergence result relies only on the second-order sufficiency conditions of the problem.
To establish Theorem 4.3, we have assumed that near the solution the line search condition (2.12) is satisfied for $\alpha_k = 1$. As is well known, however, the nonsmooth penalty function may reject unit steplengths (the Maratos effect), and some additional features must be incorporated into the algorithm to prevent this from happening. These include second-order corrections or watchdog steps; see e.g. [32].

5 Implementation and Numerical Results

The SQP+ algorithm can be implemented as an extension of any SQP method that solves
convex quadratic programs for the step computation. Our implementation is based on
sqplab, a Matlab package developed by Gilbert [22] that offers a classical line search SQP
method using an $\ell_1$ merit function and BFGS quasi-Newton updating. We chose sqplab
as a platform for our prototype implementation of the SQP+ method because it allows us
to easily evaluate various components of the algorithm in a controlled setting.
We made two modifications to sqplab in order to improve its performance. Instead of
using The Mathworks routine quadprog to solve the quadratic program (2.1), we employed
knitro/active [7] for this task. Unlike quadprog, the knitro/active code had no difficulties solving the quadratic subproblems generated during the IQP phase, and provided
reliable multiplier estimates. The second modification concerns the choice of the penalty parameter $\pi$, which in sqplab is based on the size of Lagrange multiplier estimates. We replaced this technique by the rule (2.10)-(2.11), which is known to perform better in practice.
The BFGS quasi-Newton approximation $B_k$ used in the IQP phase of the SQP+ method is updated using information from the full step. We define the correction pairs as
\[ s_k = x_{k+1} - x_k \quad \text{and} \quad y_k = \nabla_x \mathcal{L}(x_{k+1}, \lambda_{k+1}, \mu_{k+1}) - \nabla_x \mathcal{L}(x_k, \lambda_{k+1}, \mu_{k+1}), \]
where $\lambda_{k+1}$ is given in Step 7 of Algorithm I, and modify $y_k$, if necessary, by means of Powell's damping procedure [34] to ensure that all Hessian approximations $B_k$ are positive definite.
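A sketch of Powell's damping [34] combined with the BFGS formula (NumPy; the threshold 0.2 is the value commonly used with this procedure, stated here as an assumption rather than a constant from this paper):

```python
import numpy as np

def damped_bfgs_update(B, s, y, theta_min=0.2):
    """BFGS update with Powell's damping: y is pushed toward B s just enough
    that the curvature condition holds, so B stays positive definite."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    if sy < theta_min * sBs:                       # damp the correction vector
        theta = (1.0 - theta_min) * sBs / (sBs - sy)
        y = theta * y + (1.0 - theta) * Bs
        sy = s @ y
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / sy
```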
As mentioned in Section 2, we can compute an approximate solution of the EQP problem
(2.3) by either applying the projected conjugate gradient method [32] (and adding a trust
region constraint to (2.3)) or using a direct method. Here we follow the latter approach.

When the EQP problem (2.3) is convex, its solution solves the KKT system
\[ \begin{bmatrix} W_k & H^T(x_k) & G_{\mathcal{W}_k}^T(x_k) \\ H(x_k) & 0 & 0 \\ G_{\mathcal{W}_k}(x_k) & 0 & 0 \end{bmatrix} \begin{bmatrix} d \\ -\lambda^{EQ} \\ -\mu^{EQ} \end{bmatrix} = -\begin{bmatrix} \nabla f(x_k) + W_k d^{IQ}_k \\ 0 \\ 0 \end{bmatrix}, \tag{5.1} \]

where, as in the previous section, $G_{\mathcal{W}_k}$ is the matrix obtained by selecting the rows of $G$ corresponding to the elements of $\mathcal{W}_k$. If the inertia of the KKT matrix in (5.1) is given by
\[ (n, |\mathcal{W}_k|, 0), \tag{5.2} \]
then we know that problem (2.3) has a unique minimizer [32]. If this is not the case (which can occur also if the gradients of the constraints in the working set $\mathcal{W}_k$ are not linearly independent), we simply skip the EQP step. (We also tested the option of adding a sufficiently large multiple of the identity matrix to $W_k$, as discussed in [38], so that the inertia of the modified system is given by (5.2), but such an EQP step did not yield an overall improvement in performance.)
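The inertia can be read off a symmetric indefinite factorization; the following sketch uses SciPy's $LDL^T$ routine for illustration (this is not the factorization code used in the experiments, only one way to realize the test):

```python
import numpy as np
from scipy.linalg import ldl

def kkt_inertia(K, tol=1e-10):
    """Inertia (n_plus, n_minus, n_zero) of the symmetric KKT matrix in (5.1),
    from the eigenvalues of the block-diagonal factor D of K = L D L^T."""
    _, D, _ = ldl(K)                    # D has 1x1 and 2x2 diagonal blocks
    ev = np.linalg.eigvalsh(D)
    return int((ev > tol).sum()), int((ev < -tol).sum()), int((np.abs(ev) <= tol).sum())

# The EQP step is used only if kkt_inertia(K) == (n, len(W_k), 0), cf. (5.2).
```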
In the line search procedure, we test the unit steplength along the full step $d^{IQ} + \beta d^{EQ}$. If $\alpha_k = 1$ does not yield the sufficient decrease condition (2.12), we fall back on the IQP step and commence the backtracking line search from there. The algorithm terminates when the following conditions are satisfied:
\[
\begin{aligned}
\|\nabla f(x_k) - H(x_k)^T \lambda_k - G(x_k)^T \mu_k\|_2 &\le \epsilon_1, \\
\|\left( h(x_k),\, \max[0, -g(x_k)] \right)\|_2 &\le \epsilon_2, \\
|\mu_k^T g(x_k)| &\le \epsilon_3,
\end{aligned} \tag{5.3}
\]
for some constants $\epsilon_1, \epsilon_2, \epsilon_3$. Following is a detailed description of the algorithm used in our tests.
Algorithm SQP+
Initial data: $x_0 \in \mathbb{R}^n$; $\pi_0, \pi_b, \epsilon_1, \epsilon_2, \epsilon_3 > 0$; $\rho, \sigma, \tau \in (0, 1)$; $B_1 = I$.
For $k = 1, 2, \ldots$, until $(x_k, \lambda_k, \mu_k)$ satisfies the termination test (5.3):
1. Compute the step $d^{IQ}_k$ and multipliers $\lambda^{IQ}_{k+1}, \mu^{IQ}_{k+1}$ by solving (2.1).
2. Determine the working set $\mathcal{W}_k$. If $|\mathcal{W}_k| \ge n$, go to step 7.
3. Factor the system (5.1); if its inertia is not given by (5.2), then go to step 7. Else, compute the EQP step $d^{EQ}_k$ and multipliers $\lambda^{EQ}_{k+1}, \mu^{EQ}_{k+1}$ by solving the linear system (5.1).
4. Compute $\beta$ to be the largest number in $[0, 1]$ such that $d_k = d^{IQ}_k + \beta d^{EQ}_k$ satisfies (2.1b) and (2.1c).
5. Update the penalty parameter $\pi_k$ by the rule (2.10)-(2.11).
6. If the sufficient decrease condition (2.12) is satisfied for $\alpha_k = 1$, then set
\[ x_{k+1} = x_k + d^{IQ}_k + \beta d^{EQ}_k, \qquad \lambda_{k+1} = \lambda^{EQ}_{k+1}, \qquad [\mu_{k+1}]_{\mathcal{W}_k} = \max\left\{ 0, \mu^{EQ}_{k+1} \right\}, \qquad [\mu_{k+1}]_{\mathcal{W}_k^c} = 0, \]
and go to step 8.
7. Let $\alpha_k \in (0, 1]$ be the first member of the sequence $\{1, \tau, \tau^2, \ldots\}$ such that
\[ \phi_{\pi_k}(x_k + \alpha_k d^{IQ}_k) \le \phi_{\pi_k}(x_k) - \sigma \alpha_k\, \mathrm{qred}(d^{IQ}_k), \tag{5.4} \]
and set
\[ x_{k+1} = x_k + \alpha_k d^{IQ}_k; \qquad (\lambda_{k+1}, \mu_{k+1}) = \left( (1 - \alpha_k)\lambda_k + \alpha_k \lambda^{IQ},\; (1 - \alpha_k)\mu_k + \alpha_k \mu^{IQ} \right). \tag{5.5} \]
8. Update $B_{k+1}$ from $B_k$ using the BFGS method with Powell damping.
The constants in the algorithm are chosen as follows: $\epsilon_1 = \epsilon_2 = \epsilon_3 = 10^{-5}$, $\sigma = 10^{-4}$, $\tau = 0.5$ and $\rho = 0.1$. Initially, $\pi_b = 0$, and at the end of the first iteration it is reset to
\[ \pi_b = \max\left( \epsilon_m,\; \|(\lambda_1, \mu_1)\| / 100 \right), \]
where $\epsilon_m$ denotes machine precision.

In the next subsection, we compare Algorithm SQP+ with a classical SQP method that
is obtained by omitting steps 2-4 and 6 in Algorithm SQP+. We will refer to this classical
SQP method as C-SQP.
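To summarize the control flow, the following skeleton arranges steps 1-8, reusing the helpers sketched in Sections 2 and 5 (working_set, contraction, qred, update_penalty, line_search, damped_bfgs_update). The callables solve_iqp and solve_eqp stand in for the QP solver (knitro/active in our experiments) and the KKT solve of (5.1); their interfaces are assumptions of this sketch, not a specification of our code:

```python
import numpy as np

def sqp_plus_driver(x, f, h, g, grad_f, G, solve_iqp, solve_eqp,
                    pi=1.0, pi_b=1.0, tol=1e-5, max_iter=200):
    """Skeleton of Algorithm SQP+.  solve_iqp(x, B) -> (d_iq, lam, mu);
    solve_eqp(x, d_iq, active, lam, mu) -> (d_eq, ok), where ok = False
    (and d_eq = 0) signals that the inertia test of (5.2) failed."""
    B = np.eye(x.size)                                      # B_1 = I
    for _ in range(max_iter):
        d_iq, lam, mu = solve_iqp(x, B)                     # step 1
        if np.linalg.norm(d_iq) <= tol:                     # primal stop test
            break
        gx, Gx, hx = g(x), G(x), h(x)
        active = working_set(gx, Gx, d_iq)                  # step 2
        d_eq, ok = solve_eqp(x, d_iq, active, lam, mu)      # step 3
        inactive = np.setdiff1d(np.arange(gx.size), active)
        beta = contraction(gx, Gx, d_iq, d_eq, inactive) if ok else 0.0  # step 4
        pi = update_penalty(pi, d_iq, grad_f(x), B, hx, gx, pi_b=pi_b)   # step 5
        qr = qred(d_iq, grad_f(x), B, hx, gx, pi)
        d, _ = line_search(x, d_iq, beta * d_eq, pi, f, h, g, qr)        # steps 6-7
        x_new = x + d
        # step 8: damped BFGS; the paper forms y from Lagrangian gradients
        # with the step-7 multipliers, simplified here to plain gradients
        y = grad_f(x_new) - grad_f(x)
        B = damped_bfgs_update(B, x_new - x, y)
        x = x_new
    return x
```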

5.1 Numerical Experiments

Our test set consists of 167 small and medium-size problems from the CUTEr [27] collection.
The names of problems and their characteristics are given in Tables 1-4 of the Appendix.
We did not test large problems because our MATLAB code is not suitable for large-scale
applications. We believe, however, that the merits of the SQP+ approach can be evaluated
in the context of small and medium-size problems because, as we discuss below, the EQP
phase does not add a significant cost to the iteration.
Our implementation does not include a feature to ensure that the constraints (2.1b)-(2.1c) in the SQP subproblem are always feasible. Therefore, those problems for which SQP+ or C-SQP encountered an infeasible subproblem were removed from the test set.
The performance of SQP+ and the classical SQP method (both implemented in our modification of sqplab) is reported in Figure 5.1. This figure plots the logarithmic performance profiles of Dolan and Moré [15], based on the number of iterations required for convergence. (A comparison based on the number of function evaluations would give very similar performance profiles.) Detailed results are given in Tables 1-4. (In only 7 of the test problems was the EQP phase skipped because the inertia was not given by (5.2).)
The benefit of adding the EQP phase can be observed by comparing SQP+ and C-SQP,
since the other aspects of these two algorithms are identical. The results presented here
suggest that this benefit can be substantial.


[Figure: logarithmic performance profiles; curves for SQP+ and C-SQP]
Figure 5.1: Comparison of C-SQP and SQP+ in terms of iterations.

To obtain another measure of the performance of the SQP+ algorithm, we also compare
it with snopt [24, 23] version 7.2-1, in terms of major iterations. We used the same test
set as in the previous experiment except that we removed all linear and quadratic programs
because snopt recognizes their structure and solves them in one major iteration (e.g. using
the exact Hessian for a quadratic program), whereas the SQP+ method treats linear and
quadratic programs as general nonlinear programs. The results are reported in Figure 5.2.
Although the test set used for this experiment is limited, we believe that the results are encouraging both in terms of robustness and number of iterations.
As already mentioned, our Matlab implementation does not permit timing experiments
on large-scale problems, and therefore an evaluation of the speed of the SQP+ approach
must await the development of a sparse large-scale implementation. We mention, nonetheless, that the additional computational cost incurred by the EQP phase is not likely to
be significant. This view is based on the numerical experiments reported in [4] using a
sequential linear-quadratic programming method that contains an EQP phase, just like Algorithm SQP+. That paper reports that the cost of the EQP phase is dominated by the
cost of the linear programming phase. We can expect the same in the SQP+ algorithm since
solving a quadratic program is generally more expensive than solving a linear program.
We conclude this section by describing an enhancement to the EQP phase that we
developed and tested. Rather than using the contraction parameter in step 4 of Algorithm SQP+ to ensure that all linearized constraints are satisfied, we experimented with the
option of computing the minimum norm projection of the step dIQ + dEQ onto the feasible
region defined by (2.1b), (2.1c). We found that the improvements in performance obtained
with the projection approach were not substantial, except on bound constrained problems.


[Figure: logarithmic performance profiles; curves for SQP+ and SNOPT]
Figure 5.2: Comparison of the number of major iterations in snopt and SQP+.

Since computing the projection is expensive, this option does not seem viable at this point
and further investigation is needed to determine the most effective implementation of the
EQP phase.

6 Final Remarks

We have presented a sequential quadratic programming method that can employ exact
second derivative information but never solves indefinite inequality-constrained quadratic
programs. It is a two-stage method in which global convergence and active-set identification
are driven by an IQP phase (which solves convex quadratic programs), while fast asymptotic
convergence is achieved by an EQP phase (which solves equality constrained subproblems).
The numerical results presented in this paper suggest that the addition of the EQP phase
leads to an important decrease in the total number of iterations and function evaluations.
Simultaneously with this work, Gould and Robinson [30] have developed a trust region
method that has many similarities with the algorithm proposed here. The main difference
between their approach and ours lies in the way the IQP and EQP steps are combined, in
their use of trust regions, in the procedure for computing the EQP step, in the definition
of Cauchy decrease, and in various details of implementation.
The approach presented here does not overcome one of the main limitations of SQP
methods, namely the major computational cost of solving large quadratic programs, particularly when the reduced space is large. Although the IQP matrix Bk can be chosen to
be a simple matrix such as a limited memory BFGS matrix, or even a diagonal matrix, the
cost of the IQP phase can still be significant. Therefore we view SQP+ as a method that


is applicable to the class of problems that is currently solved by SQP methods.

Acknowledgement. The authors would like to thank Richard Byrd and one of the referees for many useful comments.

References
[1] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, 4:1–51, 1995.
[2] J. F. Bonnans and G. Launay. Sequential quadratic programming with penalization of the displacement. SIAM Journal on Optimization, 5(4):792–812, 1995.
[3] J. V. Burke and S. P. Han. A robust sequential quadratic programming method. Mathematical Programming, 43(3):277–303, 1989.
[4] R. H. Byrd, N. I. M. Gould, J. Nocedal, and R. A. Waltz. An algorithm for nonlinear optimization using linear programming and equality constrained subproblems. Mathematical Programming, Series B, 100(1):27–48, 2004.
[5] R. H. Byrd, N. I. M. Gould, J. Nocedal, and R. A. Waltz. On the convergence of successive linear-quadratic programming algorithms. SIAM Journal on Optimization, 16(2):471–489, 2006.
[6] R. H. Byrd, J. Nocedal, and R. A. Waltz. Steering exact penalty methods. Optimization Methods and Software, 23(2), 2008.
[7] R. H. Byrd, J. Nocedal, and R. A. Waltz. KNITRO: An integrated package for nonlinear optimization. In G. di Pillo and M. Roma, editors, Large-Scale Nonlinear Optimization, pages 35–59. Springer, 2006.
[8] R. H. Byrd, J. Nocedal, and G. Lopez-Calva. A line search exact penalty method using steering rules. Technical report, Optimization Center, Northwestern University, 2009.
[9] R. M. Chamberlain, M. J. D. Powell, C. Lemaréchal, and H. C. Pedersen. The watchdog technique for forcing convergence in algorithms for constrained optimization. Mathematical Programming Studies, 16:1–17, 1982.
[10] L. Chen and D. Goldfarb. Interior-point $\ell_2$ penalty methods for nonlinear programming with strong global convergence properties. Mathematical Programming, 108(1):1–36, 2006.
[11] C. M. Chin and R. Fletcher. On the global convergence of an SLP-filter algorithm that takes EQP steps. Mathematical Programming, Series A, 96(1):161–177, 2003.
[12] T. F. Coleman and A. Verma. A preconditioned conjugate gradient approach to linear equality constrained minimization. Computational Optimization and Applications, 20(1):61–72, 2001.
[13] A. R. Conn, N. I. M. Gould, and Ph. L. Toint. Trust-Region Methods. MPS-SIAM Series on Optimization. SIAM Publications, Philadelphia, Pennsylvania, USA, 2000.
[14] J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, New Jersey, USA, 1983. Reprinted as Classics in Applied Mathematics 16, SIAM, Philadelphia, Pennsylvania, USA, 1996.
[15] E. D. Dolan and J. J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, Series A, 91:201–213, 2002.
[16] A. Drud. CONOPT: A GRG code for large sparse dynamic nonlinear optimization problems. Mathematical Programming, 31:153–191, 1985.
[17] F. Facchinei and S. Lucidi. Quadratically and superlinearly convergent algorithms for the solution of inequality constrained minimization problems. Journal of Optimization Theory and Applications, 85:265–289, 1994.
[18] R. Fletcher. Practical Methods of Optimization. J. Wiley and Sons, Chichester, England, second edition, 1987.
[19] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Mathematical Programming, 91:239–269, 2002.
[20] R. Fletcher and E. Sainz de la Maza. Nonlinear programming and nonsmooth optimization by successive linear programming. Mathematical Programming, 43(3):235–256, 1989.
[21] M. P. Friedlander, N. I. M. Gould, S. Leyffer, and T. S. Munson. A filter active-set trust-region method. Technical Report Preprint ANL/MCS-P1456-097, Argonne National Laboratory, 2007.
[22] J. C. Gilbert. SQPlab, a MATLAB software package for solving nonlinear optimization problems and optimal control problems. Technical Report Version 0.4.1, INRIA, 2007.
[23] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Journal on Optimization, 12:979–1006, 2002.
[24] P. E. Gill, W. Murray, and M. A. Saunders. User's guide for SNOPT 7.1: a Fortran package for large-scale nonlinear programming. Technical Report NA 05-2, Department of Mathematics, University of California, San Diego, 2005.
[25] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, London, 1981.
[26] N. I. M. Gould, M. E. Hribar, and J. Nocedal. On the solution of equality constrained quadratic problems arising in optimization. SIAM Journal on Scientific Computing, 23(4):1375–1394, 2001.
[27] N. I. M. Gould, D. Orban, and Ph. L. Toint. CUTEr and SifDec: A Constrained and Unconstrained Testing Environment, revisited. ACM Transactions on Mathematical Software, 29(4):373–394, 2003.
[28] N. I. M. Gould, D. Orban, and Ph. L. Toint. Numerical methods for large-scale nonlinear optimization. Acta Numerica, pages 299–361, 2005.
[29] N. I. M. Gould and D. P. Robinson. A second derivative SQP method: Global convergence. SIAM Journal on Optimization, 20(4):2023–2048, 2010.
[30] N. I. M. Gould and D. P. Robinson. A second derivative SQP method: Local convergence and practical issues. SIAM Journal on Optimization, 20(4):2049–2079, 2010.
[31] C. Keller, N. I. M. Gould, and A. J. Wathen. Constraint preconditioning for indefinite linear systems. SIAM Journal on Matrix Analysis and Applications, 21(4):1300–1317, 2000.
[32] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer, second edition, 2006.
[33] E. O. Omojokun. Trust region algorithms for optimization with nonlinear equality and inequality constraints. PhD thesis, University of Colorado, Boulder, Colorado, USA, 1989.
[34] M. J. D. Powell. A fast algorithm for nonlinearly constrained optimization calculations. In G. A. Watson, editor, Numerical Analysis, Dundee 1977, number 630 in Lecture Notes in Mathematics, pages 144–157, Heidelberg, Berlin, New York, 1978. Springer Verlag.
[35] M. J. D. Powell. Variable metric methods for constrained optimization. In A. Bachem, M. Grötschel, and B. Korte, editors, Mathematical Programming: The State of the Art, Bonn 1982. Springer-Verlag, 1983.
[36] S. M. Robinson. Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear programming algorithms. Mathematical Programming, 7(1):1–16, 1974.
[37] K. Schittkowski. The nonlinear programming method of Wilson, Han and Powell with an augmented Lagrangian type line search function. Numerische Mathematik, 38:83–114, 1981.
[38] R. J. Vanderbei and D. F. Shanno. An interior point algorithm for nonconvex nonlinear programming. Computational Optimization and Applications, 13:231–252, 1999.
[39] R. A. Waltz, J. L. Morales, J. Nocedal, and D. Orban. An interior algorithm for nonlinear optimization that combines line search and trust region steps. Mathematical Programming, Series A, 107:391–408, 2006.


Appendix: Test Problems and Numerical Results


name        n    bounds  ineq  equal  SQP+  C-SQP
airport     84    168     42      0     11     89
biggsb1    144    286      0      0     72    188
bqpgasim    50    100      0      0      3     24
chenhark   200    200      0      0      4    413
clnlbeam   200    198      0      0      3      3
combustion 144    288      0      0      1     15
dixchlnv   100    100      0     50     28    -24
dnieper     57    112      0     24     23    -49
dual3      111    222      0      1     13     56
eg3        101    200    199      1     18     18
eigmaxa    101    200      0    101      6      6
eigmaxb    101    202      0    101      8      8
eigmina    101    202      0    101      2      2
eigminb    101    202      0    101      8      8
expfita      5      0     21      0     13     39
explin     144    288      0      0     97    112
explin2    144    288      0      0     72     95
grouping   100    200      0    125      1      1
haifas       7      0      9      0     10     10
haldmads     6      0     42      0      9     10
hanging    288      0    180      0     46    162
harkerp2   144    144      0      0     37     92
himmelbi   100    100     12      0     35     36
hs001        2      1      0      0     26     21
hs002        2      1      0      0     10     15
hs003        2      1      0      0      2      8
hs004        2      2      0      0      2      2
hs005        2      4      0      0      8      5
hs006        2      0      0      1      7      7
hs007        2      0      0      1      9     10
hs008        2      0      0      2      5      5
hs009        2      0      0      1      5      5
hs010        2      0      1      0      9     12
hs011        2      0      1      0      6     11
hs012        2      0      1      0      8     10
hs014        2      0      1      1      6      7
hs015        2      1      2      0      3     -7
hs018        2      4      2      0      6      8

* A negative number indicates failure to converge.

Table 1: Number of iterations for SQP+ and C-SQP.


name     n   bounds  ineq  equal  SQP+  C-SQP
hs019    2      4      2      0      6      6
hs020    2      2      3      0      8    -22
hs021    2      4      1      0      1      3
hs022    2      0      2      0      4      4
hs023    2      4      5      0      6      6
hs024    2      2      2      0      4      4
hs025    3      6      0      0      0      0
hs026    3      0      0      1     21     21
hs027    3      0      0      1     24     32
hs028    3      0      0      1      4      4
hs029    3      0      1      0      7     10
hs030    3      6      1      0      1      2
hs031    3      6      1      0      6    -10
hs032    3      3      1      1      3      4
hs033    3      4      2      0      4      4
hs034    3      6      2      0      7      7
hs035    3      3      1      0      3      9
hs036    3      6      1      0      2     -3
hs037    3      6      1      0      5      8
hs038    4      8      0      0     60    100
hs039    4      0      0      2     12     15
hs040    4      0      0      3      6      6
hs041    4      8      0      1      5      6
hs042    3      3      0      1      5      9
hs043    4      0      3      0      7      9
hs044    4      4      6      0      6      6
hs045    5     10      0      0      0      0
hs046    5      0      0      2     51     51
hs047    5      0      0      3     28     29
hs048    5      0      0      2      7      7
hs049    5      0      0      2     29     29
hs050    5      0      0      3     16     16
hs051    5      0      0      3      3      3
hs052    5      0      0      3      8     12
hs053    5     10      0      3      1     16
hs054    6     12      0      1      2      5
hs055    6      8      0      6      2      2
hs056    7      7      0      4      5     13
hs057    2      2      1      0      6      2
hs059    2      4      3      0     10     15
hs060    3      6      0      1      6      9
hs062    3      6      0      1      7      9
hs064    3      3      1      0     35    110
hs065    3      6      1      0      8     25
hs066    3      6      2      0      4      6

* A negative number indicates failure to converge.

Table 2: Number of iterations for SQP+ and C-SQP.


name       n   bounds  ineq  equal  SQP+  C-SQP
hs070      4      8      1      0     14     31
hs071      4      8      1      1      5      6
hs072      4      8      2      0     18     35
hs073      4      4      2      1      4      4
hs075      4      8      1      3      6     18
hs076      4      4      3      0      1      5
hs077      5      0      0      2     17     17
hs078      5      0      0      3      7      7
hs079      5      0      0      3     13     13
hs080      5     10      0      3      6      7
hs081      5     10      0      3     15     13
hs083      5     10      3      0      4     -6
hs085      5     10     36      0      0      0
hs086      5      5      6      0      5      9
hs087      9     18      0      4     67     88
hs089      3      0      1      0     23     60
hs090      4      0      1      0     24     25
hs091      5      0      1      0     18     39
hs092      6      0      1      0     26     38
hs093      6      6      2      0     76     31
hs095      6     12      4      0      2      2
hs096      6     12      4      0      2      2
hs097      6     12      4      0     13     13
hs098      6     12      4      0     13     13
hs100      7      0      4      0      8     16
hs100lnp   7      0      0      2     14     16
hs101      7     14      6      0     49    -77
hs102      7     14      6      0     32    -91
hs103      7     14      6      0     39    245
hs104      8     16      6      0     11     34
hs105      8     16      0      0     75    -94
hs106      8     16      6      0     24     46
hs108      9      1     13      0     11     12
hs110     10     20      0      0      6      7
hs111     10     20      0      3     10     47
hs111lnp  10      0      0      3     48     47
hs113     10      0      8      0      5     17
hs116     13     26     15      0    348     95
hs117     15     15      5      0     17    -17
hs118     15     30     17      0      7     13
hs119     16     32      0      8      4    -19
hs268      5      0      5      0      2     32
hs3mod     2      1      0      0      5      7
hs44new    4      4      5      0      5      5

* A negative number indicates failure to converge.

Table 3: Number of iterations for SQP+ and C-SQP.


name       n    bounds  ineq  equal  SQP+  C-SQP
jnlbrng1   144    144      0      0      7     36
jnlbrng2   144    144      0      0      4     45
jnlbrnga   144    144      0      0      3     31
jnlbrngb   144    144      0      0      1     61
loadbal     31     42     20     11     17     79
makela3     21      0     20      0     22    -19
mccormck   144    288      0      0      4     17
minsurfo   144    144      0      0      5     43
ncvxbqp1   144    288      0      0      2     -3
ncvxbqp2   144    288      0      0      4     -6
ncvxbqp3   144    288      0      0    124    142
nobndtor   144    144      0      0      5     37
nonscomp   144    288      0      0      9     59
obstclae   169    338      0      0     11     28
obstclbm   169    338      0      0      6     26
optctrl3   118      0      1     79      7   -206
optctrl6   118      0      1     79      7   -206
optprloc    30     60     29      0      6     12
pentdi     144    144      0      0      1      2
probpenl   144    288      0      0      2      2
prodpl0     60     60      9     20      8      8
prodpl1     60     60      9     20      6      6
qrtquad    144     20      0      0     36     57
qudlin     144    288      0      0      2     -3
rk23        17      6      0     11      9      9
s368       144    288      0      0      0      0
synthes1     6     12      6      0      5      8
torsion1   144    288      0      0      4     17
torsion2   144    288      0      0      5     20
torsion3   144    288      0      0      2      9
torsion4   144    288      0      0      5     13
torsion5   144    288      0      0      1      3
torsion6   144    288      0      0      3     11
torsiona   144    288      0      0      4     17
torsionb   144    288      0      0      8     18
torsionc   144    288      0      0      2      8
torsiond   144    288      0      0      4     17
torsione   144    288      0      0      1      3
torsionf   144    288      0      0      3      8
trimloss   142    264     52     20    183    114

* A negative number indicates failure to converge.

Table 4: Number of iterations for SQP+ and C-SQP.
