Abstract
Recently, the primal–dual simplex method has been used to solve linear programs with a large number of columns. We
present a parallel primal–dual simplex algorithm that is capable of solving linear programs with at least an order of magnitude
more columns than the previous work. The algorithm repeatedly solves several linear programs in parallel and combines the
dual solutions to obtain a new dual feasible solution. The primal part of the algorithm involves a new randomized pricing
strategy. We tested the algorithm on instances with thousands of rows and tens of millions of columns. For example, an
instance with 1700 rows and 45 million columns was solved in about 2 h on 12 processors.
speedups since pricing is done over all columns after every simplex iteration. Bixby and Martin cite other literature on parallel simplex algorithms, none of which is related to our work.

In Section 2, we review the primal–dual algorithm. Section 3 presents the parallel primal–dual algorithm. Section 4 gives the computational results.
2. Primal–dual algorithm

We consider the LP

min{cx : Ax = b, x ≥ 0},    (P)

where A ∈ Q^{m×n}, b ∈ Q^m, x ∈ Q^n, and Q^k is the set of k-dimensional rational vectors. The dual is

max{πb : πA ≤ c}.    (D)

The primal–dual algorithm is efficient only if the number of columns is much larger than the number of rows, so we assume that n ≫ m.

A primal subproblem is a problem with only a subset of columns from the constraint matrix A and the corresponding objective coefficients. The reduced cost of column j with respect to a dual vector π is rc_j^π = c_j − πA_j, where A_j is the jth column of A.
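Pricing, i.e. computing rc_j^π for every column and keeping the most promising ones, is the dominant per-iteration cost when n runs into the millions. The following minimal sketch shows this computation for a dense NumPy column matrix; the function and array names are our illustration, not code from the paper.

```python
import numpy as np

def lowest_reduced_cost_columns(c, A, pi, k):
    """Indices of the k columns with the smallest reduced cost
    rc_j = c_j - pi . A_j, computed for all columns at once."""
    rc = c - pi @ A                    # reduced-cost vector, shape (n,)
    idx = np.argpartition(rc, k)[:k]   # k smallest, without a full sort
    return idx[np.argsort(rc[idx])]    # ordered by reduced cost
```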
We start with a statement of the primal–dual algorithm, Algorithm 1, from Hu [11].

The algorithm maintains a dual feasible solution π. At each iteration the algorithm produces a dual vector μ which is not necessarily dual feasible. The two dual vectors are then combined into a new dual feasible vector that gives the largest improvement in the dual objective. The new dual vector is used to select columns for the next primal subproblem. The procedure is then iterated.

Algorithm 1. The primal–dual algorithm
1. A dual feasible solution π and a primal feasible subproblem are given.
2. Solve the subproblem and let μ be a dual optimal solution and x a primal optimal solution.
3. If πb = cx, then (x, π) are optimal solutions. If μ is dual feasible, then (x, μ) are optimal solutions.
4. Find an α ∈ Q, 0 < α < 1, such that (1 − α)π + αμ is dual feasible and α is maximum. Set π = (1 − α)π + αμ.
5. Remove all the columns from the subproblem except the basic ones. Add a set of columns with the lowest reduced costs rc_j^π to form a new subproblem.
6. Go to step 2.
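Step 4 has a closed form: dual feasibility of (1 − α)π + αμ means (1 − α)rc_j^π + α rc_j^μ ≥ 0 for every column j, and since rc_j^π ≥ 0, only columns with rc_j^μ < 0 constrain α. A minimal sketch under the paper's definitions (array and function names are ours):

```python
import numpy as np

def max_step(rc_pi, rc_mu):
    """Largest alpha with (1 - alpha)*rc_pi + alpha*rc_mu >= 0 componentwise,
    i.e. with (1 - alpha)*pi + alpha*mu dual feasible (step 4, sketch)."""
    bad = rc_mu < 0                    # only these columns can bind
    if not bad.any():
        return 1.0                     # mu itself is dual feasible
    # a binding column j requires alpha <= rc_pi[j] / (rc_pi[j] - rc_mu[j])
    return float((rc_pi[bad] / (rc_pi[bad] - rc_mu[bad])).min())
```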
3. Parallel primal–dual algorithm

3.1. Algorithm

The key idea of parallelization is to form several primal subproblems that are solved in parallel. The new dual feasible vector is a convex combination of the optimal dual vectors arising from the subproblems and the current dual feasible solution. This can be seen as a natural way of parallelizing the primal–dual algorithm since each successive π is a convex combination of the initial π and all previous μ's.

Suppose we have p processors and each column of A is assigned with equal probability to a processor. A detailed description of the parallel primal–dual algorithm, Algorithm 2, follows.

Algorithm 2. The parallel primal–dual algorithm
1. A dual feasible solution π and p primal feasible subproblems P_i, i = 1,...,p, are given.
2. for i = 1 to p in parallel do
3. Solve the subproblem P_i and let μ^i be a dual optimal solution and x^i a primal optimal solution.
4. If πb = cx^i, then (x^i, π) are optimal solutions.
5. end for
6. Find α ∈ Q_+^p, Σ_{i=1}^p α_i ≤ 1, such that

   π̃ = (1 − Σ_{i=1}^p α_i)π + Σ_{i=1}^p α_i μ^i    (1)

   is dual feasible and π̃b is maximum. Set π = π̃.
7. If πb = cx^i for some i, then (x^i, π) are optimal solutions.
8. for i = 1 to p in parallel do
9. Remove all the columns from the subproblem P_i except the basic columns. Using π and controlled randomization append new columns to P_i.
10. end for
11. Go to step 3.

The details for steps 1, 6 and 9 are given below.
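Steps 2–5 and 8–10 map directly onto a process pool. The skeleton below sketches one major iteration, assuming hypothetical helpers solve_subproblem and combine_duals (step 6); none of these names come from the paper.

```python
from concurrent.futures import ProcessPoolExecutor

def major_iteration(subproblems, pi, solve_subproblem, combine_duals):
    """One pass of steps 2-6 of Algorithm 2 (sketch). solve_subproblem maps
    a column set to an optimal pair (x_i, mu_i); combine_duals is step 6."""
    with ProcessPoolExecutor(max_workers=len(subproblems)) as pool:
        results = list(pool.map(solve_subproblem, subproblems))  # steps 2-5
    mus = [mu for _, mu in results]
    pi_new = combine_duals(pi, mus)                              # step 6
    return pi_new, results
```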
Forming initial subproblems (Step 1): Since it is expensive to compute the reduced costs of a large number of columns, unless the initial dual solution π is thought to be ‘good’, we choose columns for the initial subproblems randomly. We add artificial variables with big costs to each subproblem to make them primal feasible. Once an artificial variable becomes nonbasic, we remove it permanently from the subproblem (in step 3, or in step 9 if the internal functionalities of the linear programming solver are not available).

On the other hand, if the initial π is thought to be ‘good’, then the controlled randomization procedure (step 9) described below is applied.
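One way to realize the artificial variables just described is to append a signed identity block at a large cost, so that any sampled column set yields a primal feasible start. The construction below is our assumption of the details; the paper specifies only ‘artificial variables with big costs’.

```python
import numpy as np

def with_artificials(A_sub, c_sub, b, big_M=1e7):
    """Append one artificial variable per row, signed so that a nonnegative
    variable can satisfy the row; big_M prices them out of the basis."""
    m = A_sub.shape[0]
    signs = np.where(b >= 0, 1.0, -1.0)
    A_ext = np.hstack([A_sub, np.diag(signs)])
    c_ext = np.concatenate([c_sub, np.full(m, big_M)])
    return A_ext, c_ext
```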
Combining dual solutions (Step 6): In step 6 we find a convex combination of the vectors π, μ^1,...,μ^p that yields a dual feasible vector and gives the largest increase in the dual objective value. Let v = πb and v_i = μ^i b, i = 1,...,p. Note that v_i ≥ v for all i by weak duality. The dual feasibility constraints

(1 − Σ_{i=1}^p α_i)πA + Σ_{i=1}^p α_i μ^i A ≤ c

can be rewritten as

Σ_{i=1}^p α_i (rc^π − rc^{μ^i}) ≤ rc^π.

Hence α can be obtained by solving the LP

max Σ_{i=1}^p α_i (v_i − v)
s.t. Σ_{i=1}^p α_i (rc^π − rc^{μ^i}) ≤ rc^π,
     Σ_{i=1}^p α_i ≤ 1,                         (2)
     α ≥ 0.

This is an LP with p variables and n + 1 constraints, where p is very small and n is very large.
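For illustration, (2) can also be handed to an off-the-shelf LP solver directly; the sketch below lays out its data with scipy.optimize.linprog and returns the combined vector π̃ of (1). The paper instead solves the dual (3) by SPRINT, as described next, and all names here are ours.

```python
import numpy as np
from scipy.optimize import linprog

def solve_combining_lp(pi, mus, b, rc_pi, rc_mus):
    """Solve (2): max sum_i alpha_i*(v_i - v) subject to
    sum_i alpha_i*(rc_pi - rc_mu_i) <= rc_pi and sum_i alpha_i <= 1."""
    p, v = len(mus), pi @ b
    obj = -(np.array([mu @ b for mu in mus]) - v)   # linprog minimizes
    # n reduced-cost rows plus the single convexity row
    A_ub = np.vstack([np.column_stack([rc_pi - rc_mu for rc_mu in rc_mus]),
                      np.ones((1, p))])
    b_ub = np.concatenate([rc_pi, [1.0]])
    alpha = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=(0, None)).x
    s = alpha.sum()                                  # build pi~ of (1)
    return (1 - s) * pi + sum(a * mu for a, mu in zip(alpha, mus))
```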
Its dual is

min z + Σ_{j=1}^n rc_j^π y_j
s.t. z + Σ_{j=1}^n (rc_j^π − rc_j^{μ^i}) y_j ≥ v_i − v,   i = 1,...,p,    (3)
     z ≥ 0,  y_1,...,y_n ≥ 0.

We solve (3) with a SPRINT approach due to Forrest, see [2,6]. We start with a subproblem consisting of the z column and some randomly chosen y columns. Then we iterate the following steps. Let α be an optimal dual vector. Delete all the columns from the subproblem except the basic columns and the z column. Add to the current subproblem a subset of columns of (3) with the smallest reduced cost based on α, and then solve it. If all the columns have nonnegative reduced cost, then the solution is optimal.
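A compact sketch of this SPRINT loop follows, assuming a hypothetical solve_restricted that optimizes the small LP over the current working set (plus the z column) and returns the basic column indices together with the p-dimensional dual vector α; the pricing line uses the identity derived in the next paragraph.

```python
import numpy as np

def sprint_solve_3(rc_pi, rc_mus, solve_restricted, start, batch=10000):
    """SPRINT loop for (3): keep a small working set of y columns,
    reprice all n columns, stop when none price out (sketch)."""
    work = set(start)
    while True:
        basis, alpha = solve_restricted(work)         # small LP over `work`
        # reduced costs of all y columns of (3); equals pricing w.r.t. pi~
        rc = rc_pi - sum(a * (rc_pi - rc_mu)
                         for a, rc_mu in zip(alpha, rc_mus))
        if (rc >= -1e-9).all():
            return alpha                               # (3) solved optimally
        entering = np.argpartition(rc, batch)[:batch]  # most negative columns
        work = set(basis) | set(entering.tolist())
```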
If we compute reduced costs for (3) directly from the constraints in (2), then for a single column we would have to compute p reduced costs for a column in P, one for each μ^i, and then compute their sum weighted with the dual vector of (3). Instead, we can compute the reduced cost rc_j more efficiently by rewriting

rc_j = rc_j^π − Σ_{i=1}^p α_i (rc_j^π − rc_j^{μ^i})
     = c_j − Σ_{k=1}^m [(1 − Σ_{i=1}^p α_i)π_k + Σ_{i=1}^p α_i μ_k^i] a_{kj}
     = c_j − π̃A_j = rc_j^{π̃},

where π̃ ∈ Q^m with

π̃_k = (1 − Σ_{i=1}^p α_i)π_k + Σ_{i=1}^p α_i μ_k^i,   k = 1,...,m.

Hence at each iteration of SPRINT, before forming the new subproblem, we first compute π̃, and then the pricing for (3) is equivalent to the pricing for P with respect to π̃.
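In code, the identity says: fold α into the single vector π̃ once, at O(pm) cost, and then price every column with one dot product instead of p of them. A minimal sketch (names ours):

```python
import numpy as np

def price_via_pi_tilde(c, A, pi, mus, alpha):
    """Reduced costs of all columns of P with respect to pi~ of (1):
    O(p*m) to build pi~, then a single pass over the n columns."""
    pi_tilde = (1.0 - alpha.sum()) * pi
    for a, mu in zip(alpha, mus):
        pi_tilde += a * mu
    return c - pi_tilde @ A
```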
We now show that the stopping criterion in step 7 is also implied by the dual feasibility of the μ's. Note that the dual feasibility check could also be done after step 5, but that would be very costly to perform because of the huge number of columns.

Proposition 1. If an optimal solution α to (2) satisfies Σ_{j=1}^p α_j = 1, then there is an i such that π̃b = cx^i. If some μ^i is dual feasible, then π̃b = cx^i.

Proof. Let v = π̃b and v_j = cx^j. Assume that Σ_{j=1}^p α_j = 1. Then by duality and (1), v = Σ_{j=1}^p α_j v_j and hence
Σ_{j=1}^p α_j (v_j − v) = 0. Since v_j − v ≥ 0 by weak duality and α_j ≥ 0, it follows that α_j (v_j − v) = 0 for all j. Since there is an i such that α_i > 0, v = v_i.

If μ^i is dual feasible, then α_i = 1 and α_j = 0 for all j ≠ i is feasible to (2) and optimal by weak duality. Hence v = v_i.
Choosing columns for the subproblems (Step 9): In addition to needing columns with low reduced cost, it is important that subproblems receive a representative sample of columns from the original problem. This is achieved by a controlled randomization process based on the current dual solution π.

The following parameters are used in assigning columns.

nSub: the expected number of columns given to a subproblem (this is determined empirically);
nSubLow: the number of common low reduced cost columns given to each subproblem in Case 1 below (to be discussed later in the section);
nSub0: the number of columns j with rc_j = 0;
minRC: if nSub0 < nSubLow, then minRC > 0 is a number such that the number of columns with reduced cost less than minRC is nSubLow;
maxRC: if rc_j > maxRC, then column j is not considered as a candidate for a subproblem;
nSubHigh: the number of columns j with rc_j ≤ maxRC.

To compute maxRC, first note that we need p · nSub columns. Also we know that there are nSubLow columns with reduced cost below minRC. Thus, assuming at the first iteration that the columns have been uniformly distributed among the processors, we have

maxRC = (p · minRC · nSub) / nSubLow.

We compute nSubHigh only at the first iteration. In subsequent iterations we retain the value of nSubHigh and adjust maxRC accordingly.

We consider 3 cases.

Case 1: nSub0 < nSubLow. All columns j with rc_j < minRC are given to each subproblem. Every column j with minRC ≤ rc_j ≤ maxRC is a candidate for a subproblem. We select columns using the idea that the lower reduced cost columns should have a higher probability of being selected.

Let p_j be the probability that column j from the initial constraint matrix A is added to subproblem i. Since there is no reason to distinguish between subproblems, the probability does not depend on the subproblem index i. Clearly p_j should be a nonincreasing function of the reduced cost rc_j, and

Σ_{j: minRC ≤ rc_j ≤ maxRC} p_j = nSub − nSubLow.

In order to emphasize the reduced cost we choose the rapidly decreasing function exp(−λx²). The value λ is determined by

f(λ) = Σ_{j: minRC ≤ rc_j ≤ maxRC} exp(−λ (rc_j)²) = nSub − nSubLow.   (4)

Because n is so large, it is too expensive to evaluate f(λ) or its derivative. Instead, we approximate the value of λ using a ‘bucket’ approach.

Let N be the number of buckets (we use N = 100). For j = 0,...,N − 1 define the jth bucket to contain the columns

S_j = {i : j · a ≤ rc_i − minRC ≤ (j + 1) · a},

where a = (maxRC − minRC)/N. Let b_j = |S_j| and let R_j = (Σ_{i ∈ S_j} rc_i)/b_j be the average reduced cost of the jth bucket. Let f̃(λ) = Σ_{j=0}^{N−1} b_j exp(−λ R_j²) and let λ* be the solution to the equation f̃(λ*) = nSub − nSubLow. It is easy to derive the bounds

ln(Σ_{j=0}^{N−1} b_j / (nSub − nSubLow)) / maxRC² ≤ λ* ≤ ln(Σ_{j=0}^{N−1} b_j / (nSub − nSubLow)) / minRC².   (5)

The function f̃ is continuously differentiable and decreasing in the interval given by (5). If we use Newton's method with a starting point in the interval given by (5), the method converges to the solution of the equation f̃(λ*) = nSub − nSubLow (see e.g. [5]). Newton's method is fast for small values of N, and the optimal value λ* is a good approximation to the solution of (4). (A sketch of this computation is given after Case 3 below.)

Case 2: nSub0 ≥ nSubLow and nSub0 ≤ k · nSub, where k < 1 is a parameter (we use k = 1/3). Replace nSubLow by nSub0 and apply the procedure of Case 1.
Case 3: nSub0 > max(nSubLow, k · nSub). Since there are so many columns with rc_j = 0, we assign them randomly to the subproblems. Empirically, we obtained good results if the expected number of columns given to each subproblem is nSub/r, where

r = max{1, nSubHigh/nSub0, nSub0²/(nSubLow · nSub)}.

The remaining columns are assigned by controlled randomization as in Case 1.
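As referenced under Case 1, the bucket approach reduces to a one-dimensional Newton solve for λ* followed by independent coin flips with probability exp(−λ* rc_j²). A sketch following formulas (4) and (5); everything here, including the fixed iteration count, is our illustration:

```python
import numpy as np

def solve_lambda(rc, minRC, maxRC, target, N=100, iters=20):
    """Newton's method on the bucket approximation f~(lam) = target,
    started at the lower endpoint of the interval (5).
    Assumes minRC <= rc_j <= maxRC for every entry of rc."""
    a = (maxRC - minRC) / N
    j = np.minimum(((rc - minRC) // a).astype(int), N - 1)   # bucket index
    b = np.bincount(j, minlength=N).astype(float)            # sizes b_j
    R = np.zeros(N)
    np.add.at(R, j, rc)
    R[b > 0] /= b[b > 0]                                     # averages R_j
    lam = np.log(b.sum() / target) / maxRC**2                # bound (5)
    for _ in range(iters):
        e = b * np.exp(-lam * R**2)                          # terms of f~
        lam -= (e.sum() - target) / -(e * R**2).sum()        # Newton step
    return lam

def sample_columns(rng, rc, lam):
    """Keep column j with probability p_j = exp(-lam * rc_j**2)."""
    return np.nonzero(rng.random(rc.size) < np.exp(-lam * rc**2))[0]
```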
Pricing heuristic: It remains to be shown how to efficiently find a reduced cost value such that the number of columns with reduced cost smaller than that value is a given number k. This is required in both step 9 and the SPRINT algorithm used to solve (3). Using sorting terminology, we want to find a rank k element among the reduced cost values. Since for our applications we have only an estimate of k, we want to find a reduced cost value that yields approximately k columns.
Other approaches are considered in [2,11,6]. Bader and JaJa [3] describe an algorithm for finding a median in parallel, but it is likely to be too slow for our needs since it is targeted at finding exactly the rank k element.

Our approach relies on distributing the columns randomly. Assume that S_i is the set of reduced cost values at processor i. For simplicity of notation let d_j be the reduced cost of column j (instead of rc_j^{(·)}), i.e. S_i = {d_1,...,d_{n_i}}. Our goal is to find the kth smallest element in ∪_{i=1}^p S_i. For our applications k is always much smaller than n_i.
The following intuitive observation plays a key role. Let d_{m_i} be the element with rank r = ⌊k/p⌋ in the sequence S_i and let d = min_{i=1,...,p} d_{m_i}. Let m be the number of elements in ∪_{i=1}^p S_i that are smaller than d. Since the numbers d_i are randomly distributed, m should be approximately p · r ≈ k. It is clear that m ≤ p · r. Experiments have shown the validity of the claim. Klabjan [13] gives some theoretical support by proving that r(p − 1) ≤ m as n_i → ∞ for all i and k → ∞.
So the task reduces to finding an element with rank r in S_i. Even here we do not sort the array S_i, due to the possibly large value of n_i. Any exact sequential median-finding algorithm is likely to be too slow and too ‘exact’, since we are looking only for an approximation of a rank r element. Since k is typically a small number, so is r.

For simplicity we write n, S instead of n_i, S_i. Let s and r̃, r̃ ≤ s, be two integers. Suppose that we choose s elements uniformly at random from S, and denote them by Ŝ. Let d̃ be the element with rank r̃ in Ŝ. The following theorem forms the basis for our heuristic.

Theorem 1 (Klabjan [13]). If all elements in S are different, then

E = E(|{d_i : d_i ≤ d̃}|) = r̃(n + 1)/(s + 1).

Since we want the sample size s to be as small as possible, we choose r̃ = 1. Hence s = ⌈n/r⌉. For our instances n ranged in the millions and r was always bigger than 500. For example, if r = 500 and n = 2 × 10⁶, the sampling size is 4000.

Note that in step 9 of the parallel primal–dual algorithm, we have to find elements of rank nSubLow and nSubHigh. We need to obtain the samples just once since we can use them for both computations.
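The resulting heuristic is only a few lines: draw s = ⌈n/r⌉ values at random and use the sample minimum as the cutoff; by Theorem 1 the expected number of elements below it is roughly r. A sketch (names ours):

```python
import math
import numpy as np

def approx_rank_r_cutoff(values, r, rng):
    """Cutoff below which roughly r of the n values lie: Theorem 1 with
    sample rank r~ = 1 and sample size s = ceil(n / r)."""
    s = math.ceil(len(values) / r)
    sample = rng.choice(values, size=s, replace=False)
    return sample.min()

# e.g. rng = np.random.default_rng(); cutoff = approx_rank_r_cutoff(rc, 500, rng)
```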
3.2. Finite convergence

We prove the convergence of the algorithm under the assumption of primal nondegeneracy.

Theorem 2. If problem P is primal nondegenerate, and at each iteration all the 0 reduced cost columns based on π are added to each subproblem, then the parallel primal–dual algorithm terminates finitely.

Proof. Suppose the algorithm is in the kth major iteration after step 7. Denote by v_i^k the optimal value of subproblem i. Consider the LP (2) and its dual, and let (α, z, y) be optimal solutions. Since the stopping criteria in steps 4 and 7 are not satisfied, by Proposition 1, Σ_{i=1}^p α_i < 1 and v = πb < v_i^k for all i = 1,...,p.

Suppose that α = 0. Since v < v_i^k, this is the only feasible solution to (2). Therefore conv{π, μ^1,...,μ^p} is the singleton π, implying that π = μ^i for all i = 1,...,p. Since μ^1 = π is dual feasible, by Proposition 1 the stopping criterion in step 7 is fulfilled, a contradiction. Thus α ≠ 0 and the optimal value of (3) is positive.

Since Σ_{i=1}^p α_i < 1, by complementary slackness z = 0. The optimal value of (3) is positive and hence Σ_{j=1}^n rc_j y_j > 0. Since rc_j y_j ≥ 0 for all j = 1,...,n, there is an index j_0, 1 ≤ j_0 ≤ n, such that
Table 3. Effect of parallel pricing

Table 5. The breakdown of execution times on 12 processors
The largest problem we have solved so far has 30 million columns and 25 000 rows [13]. The execution time on 12 processors was 30 h.

There are several open questions regarding an efficient implementation of a parallel primal–dual simplex algorithm. Subproblem size is a key question, and the development of an adaptive strategy could lead to substantial improvements. To make subproblems even more different, columns with negative reduced cost based on μ^i can be added to the subproblems. We made an initial attempt in this direction but more experimentation needs to be done.
Acknowledgements

This work was supported by NSF grant DMI-9700285 and United Airlines, who also provided data for the computational experiments. Intel Corporation funded the parallel computing environment and ILOG provided the linear programming solver used in the computational experiments.

References

[1] R. Ahuja, T. Magnanti, J. Orlin, Network Flows, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[2] R. Anbil, E. Johnson, R. Tanga, A global approach to crew pairing optimization, IBM Systems J. 31 (1992) 71–78.
[3] D. Bader, J. JaJa, Practical parallel algorithms for dynamic data redistribution, median finding, and selection, in: Proceedings of the 10th International Parallel Processing Symposium, 1996.
[4] C. Barnhart, E. Johnson, G. Nemhauser, M. Savelsbergh, P. Vance, Branch-and-price: column generation for solving huge integer programs, Oper. Res. 46 (1998) 316–329.
[5] D. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1995, pp. 79–90.
[6] R. Bixby, J. Gregory, I. Lustig, R. Marsten, D. Shanno, Very large-scale linear programming: a case study in combining interior point and simplex methods, Oper. Res. 40 (1992) 885–897.
[7] R. Bixby, A. Martin, Parallelizing the dual simplex method, Technical Report CRPC-TR95706, Rice University, 1995.
[8] CPLEX Optimization, Using the CPLEX Callable Library, 5.0 Edition, ILOG Inc., 1997.
[9] G. Dantzig, L. Ford, D. Fulkerson, A primal–dual algorithm for linear programs, in: H. Kuhn, A. Tucker (Eds.), Linear Inequalities and Related Systems, Princeton University Press, Princeton, NJ, 1956, pp. 171–181.
[10] J. Edmonds, Maximum matching and a polyhedron with 0-1 vertices, J. Res. Nat. Bur. Standards 69B (1965) 125–130.
[11] J. Hu, Solving linear programs using primal–dual subproblem simplex method and quasi-explicit matrices, Ph.D. Dissertation, Georgia Institute of Technology, 1996.
[12] J. Hu, E. Johnson, Computational results with a primal–dual subproblem simplex method, Oper. Res. Lett. 25 (1999) 149–158.
[13] D. Klabjan, Topics in airline crew scheduling and large scale optimization, Ph.D. Dissertation, Georgia Institute of Technology, 1999.