
Operations Research Letters 27 (2000) 47–55

www.elsevier.com/locate/dsw

A parallel primal–dual simplex algorithm


Diego Klabjan ∗ , Ellis L. Johnson, George L. Nemhauser
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, USA

Received 1 June 1999; received in revised form 1 December 1999

Abstract
Recently, the primal–dual simplex method has been used to solve linear programs with a large number of columns. We
present a parallel primal–dual simplex algorithm that is capable of solving linear programs with at least an order of magnitude
more columns than the previous work. The algorithm repeatedly solves several linear programs in parallel and combines the
dual solutions to obtain a new dual feasible solution. The primal part of the algorithm involves a new randomized pricing
strategy. We tested the algorithm on instances with thousands of rows and tens of millions of columns. For example, an
instance with 1700 rows and 45 million columns was solved in about 2 h on 12 processors. © 2000 Elsevier Science B.V.
All rights reserved.

Keywords: Programming/linear/algorithms; Large-scale systems

∗ Correspondence address: Department of Mechanical and Industrial Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
E-mail addresses: diego@isye.gatech.edu (D. Klabjan), ellis.johnson@isye.gatech.edu (E.L. Johnson), nemhauser@isye.gatech.edu (G.L. Nemhauser).

1. Introduction

This paper presents a parallel primal–dual simplex algorithm that is capable of solving linear programs with thousands of rows and millions of columns. For example, an instance with 1700 rows and 45 million columns was solved in about 2 h on 12 processors. The largest instance we solved has 25 000 rows and 30 million columns. Such linear programs arise, for example, in airline crew scheduling and several other applications as relaxations of set partitioning problems [4].

Primal–dual algorithms appear to have originated with Dantzig et al. [9] and have been applied to many combinatorial problems such as network flows, Ahuja et al. [1], and matching, Edmonds [10]. Recently, Hu [11] and Hu and Johnson [12] have developed a primal–dual simplex algorithm that is designed to solve LPs with a very large number of columns. The primal–dual simplex iterates between solving primal subproblems over a restricted number of columns and dual steps. In this paper, we parallelize the dual step of the Hu–Johnson algorithm and obtain significant speedups. Since the number of dual steps is small, information is passed infrequently, so it is efficient to partition the columns among several computers, i.e. to use a distributed memory system.

Bixby and Martin [7] present a dual simplex algorithm with parallel pricing. However, their approach requires a shared memory system to achieve good speedups since pricing is done over all columns after every simplex iteration. Bixby and Martin cite other literature on parallel simplex algorithms, none of which is related to our work.

In Section 2, we review the primal–dual algorithm. Section 3 presents the parallel primal–dual algorithm. Section 4 gives the computational results.


2. Primal–dual algorithm

We consider the LP

min{cx: Ax = b, x ≥ 0},   (P)

where A ∈ Q^{m×n}, b ∈ Q^m, x ∈ Q^n, and Q^k is the set of k-dimensional rational vectors. The dual is

max{πb: πA ≤ c}.   (D)

The primal–dual algorithm is efficient only if the number of columns is much larger than the number of rows, so we assume that n ≫ m.

A primal subproblem is a problem with only a subset of columns from the constraint matrix A and corresponding objective coefficients. The reduced cost of column j with respect to a dual vector π is rc^π_j = c_j − πA_j, where A_j is the jth column of A.

We start with a statement of the primal–dual algorithm (Algorithm 1) from Hu [11]. The algorithm maintains a dual feasible solution π. At each iteration the algorithm produces a dual vector μ which is not necessarily dual feasible. The two dual vectors are then combined into a new dual feasible vector that gives the largest improvement to the dual objective. The dual vector is used to select columns for the next primal subproblem. The procedure is then iterated.

Algorithm 1. The primal–dual algorithm
1. A dual feasible solution π and a primal feasible subproblem are given.
2. Solve the subproblem and let μ be a dual optimal solution and x a primal optimal solution.
3. If πb = cx, then (x, π) are optimal solutions. If μ is dual feasible, then (x, μ) are optimal solutions.
4. Find an α ∈ Q, 0 < α < 1, such that (1 − α)π + αμ is dual feasible and α is maximum. Set π = (1 − α)π + αμ.
5. Remove all the columns from the subproblem except the basic ones. Add a set of columns with the lowest reduced costs rc^π_j to form a new subproblem.
6. Go to step 2.
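Step 4 is a ratio test: reduced costs are affine in the dual vector, so (1 − α)π + αμ is dual feasible exactly when (1 − α)rc^π_j + α rc^μ_j ≥ 0 for every column j, and the largest admissible α is min over the columns with rc^μ_j < 0 of rc^π_j/(rc^π_j − rc^μ_j). The following minimal numpy sketch of that test is ours, not code from the paper; it assumes the two reduced cost vectors are available as dense arrays.

    import numpy as np

    def max_dual_step(rc_pi, rc_mu):
        """Largest alpha in (0, 1] keeping (1 - alpha)*pi + alpha*mu dual feasible.

        Reduced costs are affine in the dual vector, so the combination is dual
        feasible iff (1 - alpha)*rc_pi + alpha*rc_mu >= 0 componentwise.
        rc_pi must be componentwise nonnegative (pi is dual feasible).
        """
        rc_pi = np.asarray(rc_pi, dtype=float)
        rc_mu = np.asarray(rc_mu, dtype=float)
        blocking = rc_mu < 0                      # only columns with rc_mu_j < 0 limit the step
        if not blocking.any():
            return 1.0                            # mu itself is dual feasible
        ratios = rc_pi[blocking] / (rc_pi[blocking] - rc_mu[blocking])
        return float(min(1.0, ratios.min()))

    # toy usage
    rc_pi = np.array([0.0, 2.0, 5.0])
    rc_mu = np.array([1.0, -1.0, -5.0])
    alpha = max_dual_step(rc_pi, rc_mu)           # 0.5 here: the third column is binding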
3. Parallel primal–dual algorithm

3.1. Algorithm

The key idea of parallelization is to form several primal subproblems that are solved in parallel. The new dual feasible vector is a convex combination of optimal dual vectors arising from the subproblems and the current dual feasible solution. This can be seen as a natural way of parallelizing the primal–dual algorithm since each successive π is a convex combination of the initial π and all previous μ's.

Suppose we have p processors and each column of A is assigned with equal probability to a processor. A detailed description of the parallel primal–dual algorithm (Algorithm 2) follows.

Algorithm 2. The parallel primal–dual algorithm
1. A dual feasible solution π and p primal feasible subproblems P_i, i = 1, ..., p, are given.
2. for i = 1 to p in parallel do
3. Solve the subproblem P_i and let μ^i be a dual optimal solution and x^i a primal optimal solution.
4. If πb = cx^i, then (x^i, π) are optimal solutions.
5. end for
6. Find α ∈ Q^p_+ with Σ_{i=1}^p α_i ≤ 1 such that

   π̃ = (1 − Σ_{i=1}^p α_i) π + Σ_{i=1}^p α_i μ^i   (1)

   is dual feasible and π̃b is maximum. Set π = π̃.
7. If πb = cx^i for some i, then (x^i, π) are optimal solutions.
8. for i = 1 to p in parallel do
9. Remove all the columns from the subproblem P_i except the basic columns. Using π and controlled randomization append new columns to P_i.
10. end for
11. Go to step 3.

The details for steps 1, 6 and 9 are given below.
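To fix ideas for step 6: for any given weight vector α, the candidate π̃ of (1) and its objective π̃b are cheap to evaluate, and dual feasibility amounts to pricing every column. The sketch below is only an illustration with made-up toy data; the weights here are an arbitrary guess, not the maximizer computed in step 6 (that LP is formulated as (2) below).

    import numpy as np

    def combine_duals(pi, mus, alpha):
        """Return pi_tilde = (1 - sum(alpha)) * pi + sum_i alpha_i * mu^i, as in (1)."""
        alpha = np.asarray(alpha, dtype=float)
        mus = np.asarray(mus, dtype=float)          # shape (p, m)
        return (1.0 - alpha.sum()) * np.asarray(pi) + alpha @ mus

    def is_dual_feasible(pi_tilde, A, c, tol=1e-9):
        """Check pi_tilde * A <= c componentwise, i.e. all reduced costs nonnegative."""
        return bool(np.all(c - pi_tilde @ A >= -tol))

    # toy data: m = 2 rows, n = 4 columns, p = 2 subproblem duals
    A = np.array([[1.0, 0.0, 1.0, 2.0],
                  [0.0, 1.0, 1.0, 1.0]])
    c = np.array([3.0, 2.0, 4.0, 7.0])
    b = np.array([1.0, 1.0])
    pi = np.zeros(2)                                # dual feasible since c >= 0
    mus = np.array([[3.0, 0.0], [0.0, 2.0]])
    alpha = np.array([0.3, 0.3])                    # candidate weights, not optimal

    pi_t = combine_duals(pi, mus, alpha)
    print(pi_t, pi_t @ b, is_dual_feasible(pi_t, A, c))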

Forming initial subproblems (Step 1): Since it is expensive to compute the reduced costs of a large number of columns, unless the initial dual solution π is thought to be 'good', we choose columns for the initial subproblems randomly. We add artificial variables with big costs to each subproblem to make them primal feasible. Once an artificial variable becomes nonbasic, we remove it permanently from the subproblem (step 3, or step 9 if the internal functionalities of the linear programming solver are not available).

On the other hand, if the initial π is thought to be 'good', then the controlled randomization procedure (step 9) described below is applied.

Combining dual solutions (Step 6): In step 6 we find a convex combination of the vectors π, μ^1, ..., μ^p that yields a dual feasible vector and gives the largest increase in the dual objective value. Let v = πb and v_i = μ^i b, i = 1, ..., p. Note that v_i ≥ v for all i by weak duality. The dual feasibility constraints

(1 − Σ_{i=1}^p α_i) πA + Σ_{i=1}^p α_i μ^i A ≤ c

can be rewritten as

Σ_{i=1}^p α_i (rc^π − rc^{μ^i}) ≤ rc^π.

Hence α can be obtained by solving the LP

max  Σ_{i=1}^p α_i (v_i − v)
s.t. Σ_{i=1}^p α_i (rc^π − rc^{μ^i}) ≤ rc^π,   (2)
     Σ_{i=1}^p α_i ≤ 1,
     α ≥ 0.

This is an LP with p variables and n + 1 constraints, where p is very small and n is very large. Its dual is

min  z + Σ_{j=1}^n rc^π_j y_j
s.t. z + Σ_{j=1}^n (rc^π_j − rc^{μ^i}_j) y_j ≥ v_i − v,   i = 1, ..., p,   (3)
     z ≥ 0,  y_1, ..., y_n ≥ 0.
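When n is small enough to enumerate the constraints, (2) can be handed directly to an off-the-shelf LP solver. The sketch below does so with scipy.optimize.linprog on toy data; this is our illustration only (the paper uses CPLEX, and for the actual instances n is far too large for this, which is why the dual (3) is attacked with SPRINT as described next).

    import numpy as np
    from scipy.optimize import linprog

    def combination_weights(rc_pi, rc_mu, v, v_sub):
        """Solve (2): max sum_i alpha_i (v_i - v) subject to dual feasibility and the budget.

        rc_pi : (n,)   reduced costs with respect to pi
        rc_mu : (p, n) reduced costs with respect to each subproblem dual mu^i
        v     : float, pi*b
        v_sub : (p,)   subproblem optimal values mu^i * b
        """
        p, n = rc_mu.shape
        c_obj = -(np.asarray(v_sub, dtype=float) - v)   # linprog minimizes, so negate
        A_cols = (rc_pi[None, :] - rc_mu).T             # n column constraints, shape (n, p)
        A_ub = np.vstack([A_cols, np.ones((1, p))])     # plus the budget sum_i alpha_i <= 1
        b_ub = np.concatenate([rc_pi, [1.0]])
        res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * p, method="highs")
        return res.x

    # toy example with p = 2 subproblems and n = 3 columns
    rc_pi = np.array([1.0, 2.0, 0.5])
    rc_mu = np.array([[0.5, -1.0, 0.5],
                      [1.0, 2.0, -0.5]])
    alpha = combination_weights(rc_pi, rc_mu, v=10.0, v_sub=np.array([12.0, 11.0]))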
We solve (3) with a SPRINT approach due to Forrest, see [2,6]. We start with a subproblem consisting of the z column and some randomly chosen y columns. Then we iterate the following steps. Let σ be an optimal dual vector. Delete all the columns from the subproblem except the basic columns and the z column. Add to the current subproblem a subset of columns of (3) with the smallest reduced cost based on σ, and then solve it. If all the columns have nonnegative reduced cost, then the solution is optimal.

If we compute reduced costs for (3) directly from the constraints in (2), then for a single column we would have to compute p reduced costs for a column in P, one for each μ^i, and then compute their sum weighted with the dual vector of (3). Instead, we can compute the reduced cost rc_j more efficiently by rewriting

rc_j = rc^π_j − Σ_{i=1}^p σ_i (rc^π_j − rc^{μ^i}_j)
     = c_j − Σ_{k=1}^m [(1 − Σ_{i=1}^p σ_i) π_k + Σ_{i=1}^p σ_i μ^i_k] a_kj
     = c_j − π̃ A_j = rc^{π̃}_j,

where π̃ ∈ Q^m with

π̃_k = (1 − Σ_{i=1}^p σ_i) π_k + Σ_{i=1}^p σ_i μ^i_k,   k = 1, ..., m.

Hence at each iteration of SPRINT, before forming the new subproblem, we first compute π̃, and then the pricing for (3) is equivalent to the pricing for P with respect to π̃.
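The pricing shortcut through π̃ vectorizes naturally. A minimal sketch, assuming the columns of P are stored as a scipy sparse matrix (our illustration, not the paper's implementation):

    import numpy as np
    import scipy.sparse as sp

    def pi_tilde(pi, mus, sigma):
        """Aggregate dual vector: (1 - sum(sigma)) * pi + sum_i sigma_i * mu^i."""
        sigma = np.asarray(sigma, dtype=float)
        return (1.0 - sigma.sum()) * np.asarray(pi) + sigma @ np.asarray(mus)

    def best_columns(A, c, pi_t, how_many):
        """Price all columns of P with respect to pi_tilde and return the indices
        of the 'how_many' columns with the smallest reduced costs."""
        rc = np.asarray(c) - A.T.dot(pi_t)          # rc_j = c_j - pi_tilde * A_j
        return np.argsort(rc)[:how_many]

    # toy usage: 3 rows, 6 columns, p = 2
    A = sp.random(3, 6, density=0.5, format="csc", random_state=0)
    c = np.ones(6)
    pi = np.zeros(3)
    mus = np.random.default_rng(0).normal(size=(2, 3))
    idx = best_columns(A, c, pi_tilde(pi, mus, sigma=[0.4, 0.2]), how_many=3)

For instances of the size reported in Section 4 one would price in chunks and replace the full sort by a partial selection such as np.argpartition, but the arithmetic is the same.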
We now show that the stopping criterion in step 7 is also implied by the dual feasibility of the μ's. Note that the dual feasibility check could also be done after step 5, but that would be very costly to perform because of the huge number of columns.

Proposition 1. If an optimal solution α to (2) satisfies Σ_{j=1}^p α_j = 1, then there is an i such that πb = cx^i. If some μ^i is dual feasible, then πb = cx^i.

Proof. Let v = πb and v_j = cx^j. Assume that Σ_{j=1}^p α_j = 1. Then by duality and (1), v = Σ_{j=1}^p α_j v_j and hence Σ_{j=1}^p α_j (v_j − v) = 0.

Since v_j − v ≥ 0 by weak duality and α_j ≥ 0, it follows that α_j (v_j − v) = 0 for all j. Since there is an i such that α_i > 0, v = v_i.

If μ^i is dual feasible, then α_i = 1 and α_j = 0 for all j ≠ i is feasible to (2) and optimal by weak duality. Hence v = v_i.

Choosing columns for the subproblems (Step 9): In addition to needing columns with low reduced cost, it is important that subproblems receive a representative sample of columns from the original problem. This is achieved by a controlled randomization process based on the current dual solution.

The following parameters are used in assigning columns.

nSub: the expected number of columns given to a subproblem (this is determined empirically);
nSubLow: the number of common low reduced cost columns given to each subproblem in Case 1 below (to be discussed later in the section);
nSub0: the number of columns j with rc^π_j = 0;
minRC: if nSub0 < nSubLow, then minRC > 0 is a number such that the number of columns with reduced cost less than minRC is nSubLow;
maxRC: if rc^π_j > maxRC, then column j is not considered as a candidate for a subproblem;
nSubHigh: the number of columns j with rc^π_j ≤ maxRC.

To compute maxRC, first note that we need p · nSub columns. Also we know that there are nSubLow columns with reduced cost below minRC. Thus, assuming at the first iteration that the columns have been uniformly distributed among the processors, we have

maxRC = (p · minRC · nSub) / nSubLow.

We compute nSubHigh only at the first iteration. In subsequent iterations we retain the value of nSubHigh and adjust maxRC accordingly.

We consider 3 cases.

Case 1: nSub0 < nSubLow. All columns j with rc^π_j < minRC are given to each subproblem. Every column j with minRC ≤ rc^π_j ≤ maxRC is a candidate for a subproblem. We select columns using the idea that the lower reduced cost columns should have a higher probability of being selected.

Let p_j be the probability that a column j from the initial constraint matrix A is added to subproblem i. Since there is no reason to distinguish between subproblems, the probability does not depend on the subproblem index i. Clearly p_j should be a nonincreasing function of the reduced cost rc^π_j and

Σ_{j: minRC ≤ rc^π_j ≤ maxRC} p_j = nSub − nSubLow.

In order to emphasize the reduced cost we choose the rapidly decreasing function exp(−λx²). The value λ is determined by

f(λ) = Σ_{j: minRC ≤ rc^π_j ≤ maxRC} exp(−λ (rc^π_j)²) = nSub − nSubLow.   (4)

Because n is so large, it is too expensive to evaluate f(λ) or its derivative. Instead, we approximate the value of λ using a 'bucket' approach.

Let N be the number of buckets (we use N = 100). For j = 0, ..., N − 1 define the jth bucket to contain the columns

S_j = {i : j · a ≤ rc^π_i − minRC ≤ (j + 1) · a},

where a = (maxRC − minRC)/N. Let b_j = |S_j| and R_j = Σ_{i ∈ S_j} rc^π_i / b_j, the average reduced cost of the jth bucket. Let f̃(λ) = Σ_{j=0}^{N−1} b_j exp(−λ R_j²) and let λ* be the solution to the equation f̃(λ*) = nSub − nSubLow. It is easy to derive the bounds

ln(Σ_{j=0}^{N−1} b_j / (nSub − nSubLow)) / maxRC²  ≤  λ*  ≤  ln(Σ_{j=0}^{N−1} b_j / (nSub − nSubLow)) / minRC².   (5)

The function f̃ is continuously differentiable and decreasing in the interval given by (5). If we use Newton's method with a starting point in the interval given by (5), the method converges to the solution of the equation f̃(λ*) = nSub − nSubLow (see e.g. [5]). Newton's method is fast for small values of N and the optimal value λ* is a good approximation to the solution of (4).

Case 2: nSub0 ≥ nSubLow and nSub0 ≤ k · nSub, where k < 1 is a parameter (we use k = 1/3). Replace nSubLow by nSub0 and apply the procedure of Case 1.

Case 3: nSub0 > max(nSubLow, k · nSub). Since there are so many columns with rc^π_j = 0, we assign them randomly to the subproblems. Empirically, we obtained good results if the expected number of columns given to each subproblem is nSub/r, where

r = max{1, nSubHigh/nSub0, nSub0² / (nSubLow · nSub)}.

The remaining columns are assigned by controlled randomization as in Case 1.
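The Case 1 machinery can be summarized in a few lines: bucket the candidate reduced costs, solve f̃(λ) = nSub − nSubLow by Newton's method started at the lower end of the interval (5), and then admit each candidate column independently with probability exp(−λ(rc^π_j)²). The sketch below is our illustration under those assumptions; the names and data are made up.

    import numpy as np

    def fit_lambda(rc, min_rc, max_rc, target, n_buckets=100, iters=50):
        """Solve sum_j b_j * exp(-lam * R_j^2) = target on bucketed reduced costs."""
        width = (max_rc - min_rc) / n_buckets
        idx = np.clip(((rc - min_rc) // width).astype(int), 0, n_buckets - 1)
        b = np.bincount(idx, minlength=n_buckets).astype(float)
        sums = np.bincount(idx, weights=rc, minlength=n_buckets)
        R = np.divide(sums, b, out=np.zeros_like(sums), where=b > 0)   # bucket averages
        lam = np.log(b.sum() / target) / max_rc**2                     # lower end of interval (5)
        for _ in range(iters):
            f = np.sum(b * np.exp(-lam * R**2)) - target
            df = np.sum(-b * R**2 * np.exp(-lam * R**2))
            lam -= f / df                                              # Newton step
        return lam

    def pick_columns(rc, lam, rng):
        """Bernoulli selection: column j is taken with probability exp(-lam * rc_j^2)."""
        return np.flatnonzero(rng.random(rc.shape) < np.exp(-lam * rc**2))

    rng = np.random.default_rng(0)
    rc = rng.uniform(0.5, 10.0, size=200_000)      # candidate reduced costs in [minRC, maxRC]
    lam = fit_lambda(rc, 0.5, 10.0, target=7_500)
    cols = pick_columns(rc, lam, rng)              # about 7 500 columns in expectation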
Pricing heuristic: It remains to be shown how to efficiently find a reduced cost value such that the number of columns with reduced cost smaller than that value is a given number k. This is required in both step 9 and the SPRINT algorithm used to solve (3). Using sorting terminology, we want to find a rank k element among the reduced cost values. Since for our applications we have only an estimate of k, we want to find a reduced cost value that yields approximately k columns.

Other approaches are considered in [2,11,6]. Bader and JaJa [3] describe an algorithm for finding a median in parallel, but it is likely to be too slow for our needs since it is targeted to find exactly the rank k element.

Our approach relies on distributing the columns randomly. Assume that S_i is the set of reduced cost values at processor i. For simplicity of notation let d_j be the reduced cost of column j (instead of rc^(·)_j), i.e. S_i = {d_1, ..., d_{n_i}}. Our goal is to find the kth smallest element in ∪_{i=1}^p S_i. For our applications k is always much smaller than n_i.

The following intuitive observation plays a key role. Let d_{m_i} be an element with rank r = ⌊k/p⌋ in the sequence S_i and let d = min_{i=1,...,p} {d_{m_i}}. Let m be the number of elements in ∪_{i=1}^p S_i that are smaller than d. Since the numbers d_i are randomly distributed, m should be approximately p · r ≈ k. It is clear that m ≤ p · r. Experiments have shown the validity of the claim. Klabjan [13] gives some theoretical support by proving that r(p − 1) ≤ m as n_i → ∞ for all i and k → ∞.

So the task reduces to finding an element with rank r in S_i. Even here we do not sort the array S_i due to the possibly large value of n_i. Any exact sequential median finding algorithm is likely to be too slow and too 'exact' since we are looking only for an approximation of a rank r element. Since k is typically a small number, so is r.

For simplicity we write n, S instead of n_i, S_i. Let s and r̃, r̃ ≤ s, be two integers. Suppose that we choose s elements uniformly at random from S and denote them by Ŝ. Let d̃ be the element with rank r̃ in Ŝ. The following theorem forms the basis for our heuristic.

Theorem 1 (Klabjan [13]). If all elements in S are different, then

E(|{d_i : d_i ≤ d̃}|) = r̃(n + 1) / (s + 1).

Since we want the sample size s to be as small as possible, we choose r̃ = 1. Hence s = ⌈n/r⌉. For our instances n ranged in millions and r was always bigger than 500. For example, if r = 500 and n = 2 × 10^6, the sampling size is 40 000.

Note that in step 9 of the parallel primal–dual algorithm, we have to find elements of rank nSubLow and nSubHigh. We need to obtain samples just once since we can use them for both computations.
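A sketch of this sampling heuristic with the choice r̃ = 1: on each processor draw s = ⌈n/r⌉ values uniformly at random and take their minimum as an approximate rank-r threshold, then take the minimum over processors. This is our illustration of the idea, not the paper's code.

    import numpy as np

    def approx_rank_r_threshold(d, r, rng):
        """Approximate the value of rank r in d via Theorem 1 with r_tilde = 1:
        sample ceil(n / r) elements uniformly and return their minimum."""
        n = len(d)
        s = -(-n // r)                                   # ceil(n / r)
        sample = rng.choice(d, size=s, replace=False)
        return sample.min()

    def approx_rank_k_threshold(per_processor, k, rng):
        """Combine the per-processor estimates: the minimum of the rank-floor(k/p)
        thresholds leaves roughly k elements below it overall."""
        p = len(per_processor)
        r = max(1, k // p)
        return min(approx_rank_r_threshold(d, r, rng) for d in per_processor)

    rng = np.random.default_rng(1)
    per_proc = [rng.random(1_000_000) for _ in range(4)]   # reduced costs on p = 4 processors
    t = approx_rank_k_threshold(per_proc, k=8_000, rng=rng)
    count = sum(int((d < t).sum()) for d in per_proc)      # m <= p*r, typically close to k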
3.2. Finite convergence

We prove the convergence of the algorithm under the assumption of primal nondegeneracy.

Theorem 2. If problem P is primal nondegenerate and at each iteration all the 0 reduced cost columns based on π are added to each subproblem, then the parallel primal–dual algorithm terminates finitely.

Proof. Suppose the algorithm is in the kth major iteration after step 7. Denote by v_i^k the optimal value of subproblem i. Consider the LP (2) and its dual, and let (α, z, y) be optimal solutions. Since the stopping criteria in steps 4 and 7 are not satisfied, by Proposition 1, Σ_{i=1}^p α_i < 1 and v = πb < v_i^k for all i = 1, ..., p.

Suppose that α = 0. Since v < v_i^k, this is the only feasible solution to (2). Therefore conv{π, μ^1, ..., μ^p} is the singleton π, implying that π = μ^i for all i = 1, ..., p. Since μ^1 = π is dual feasible, by Proposition 1 the stopping criterion in step 7 is fulfilled, a contradiction. Thus α ≠ 0 and the optimal value of (3) is positive.

Since Σ_{i=1}^p α_i < 1, by complementary slackness z = 0. The optimal value of (3) is positive and hence Σ_{j=1}^n rc^π_j y_j > 0. Since rc^π_j y_j ≥ 0 for all j = 1, ..., n, there is an index j_0, 1 ≤ j_0 ≤ n, such that y_{j_0} > 0 and rc^π_{j_0} > 0.

By complementary slackness Σ_{i=1}^p α_i (rc^π_{j_0} − rc^{μ^i}_{j_0}) = rc^π_{j_0}. Since Σ_{i=1}^p α_i < 1 and rc^π_{j_0} > 0, it easily follows that there is an index i_0, 1 ≤ i_0 ≤ p, such that rc^{μ^{i_0}}_{j_0} < 0.

Consider now the column j_0 of P. Since in (2) the row j_0 is at equality, it follows that rc^{π̃}_{j_0} = 0. By assumption the column is appended to each subproblem, hence to the subproblem i_0. By the nondegeneracy assumption it follows that v_{i_0}^{k+1} < v_{i_0}^k.

Consider o^k = Σ_{i=1}^p v_i^k. Since v_i^k ≥ v_i^{k+1} for all i = 1, ..., p and v_{i_0}^k > v_{i_0}^{k+1}, it follows that o^k > o^{k+1} for all k. Since there are only finitely many subproblems, the number of different values of v_i^k for all i = 1, ..., p and k is finite. The claim now follows due to the finite number of different values in the o sequence and the monotonicity property o^k > o^{k+1}.

Note that the dual objective value increases at each iteration regardless of degeneracy; however, the finiteness argument does not follow without the nondegeneracy assumption.

4. Computational experiments

4.1. Computing environment

All computational experiments were performed on a cluster of machines comprised of 48 300 MHz Intel dual Pentium IIs, resulting in 96 processors available for parallel program execution. All machines are linked via 100 MB point-to-point Ethernet switched via a Cisco 5500 network switch. Each node has 512 MBytes of main memory.

The operating system used was Sun Solaris x86, version 2.5.1, which offers facilities for parallel computing like remote shell (rsh commands), global file system support via NFS, and parallel computing libraries like MPI or PVM. The cluster is representative of typical machines of this type, in its relatively slow internode communications and its good cost/performance ratio vs. specialized parallel machines like the CM-5, the Intel Paragon, or the IBM SP-2 machines.

The parallel implementation uses the MPI message passing interface, MPICH implementation version 1.0, developed at Argonne National Labs. The MPI message passing standard is widely used in the parallel computing community. It offers facilities for creating parallel programs to run across cluster machines and for exchanging information between processes using message passing procedures like broadcast, send, receive and others.

The linear programming solver used was CPLEX, CPLEX Optimization [8], version 5.0.

4.2. Problem instances

The instances are listed in Table 1. The set partitioning instances sp1, sp2, and sp3 are airline crew scheduling problems (see e.g. [13]). All of these instances may contain some duplicate columns but, because of the method of generation, no more than approximately 5% of the columns are duplicates. For this reason we do not remove duplicate columns.

Table 1
Problem statistics

Problem name    Number of rows    Number of columns
pr1             10 677            17 045 897
pr2             13 048             9 234 109
sp1                449            42 134 546
sp2              1 742            45 952 785
sp3              3 143            46 546 240

The remaining two instances are also from Klabjan [13]. The problems have a substantial set partitioning portion. They occurred in a new approach to solving the airline crew scheduling problem. The pr2 problem is particularly hard due to the high number of rows and primal degeneracy.

All instances have on average 10 nonzeros per column.

4.3. Implementation and parameters

Because some problems are highly primal degenerate, the primal LP solver struggles to make progress. Therefore, we globally perturb the right-hand side and then gradually decrease the perturbation. Precisely, we perturb a row A_i x = b_i of P to a ranged row b_i − ε_1 ≤ A_i x ≤ b_i + ε_2, where ε_1, ε_2 are small random numbers.

There is no reason to find an optimal solution to the perturbed problem, so we apply the parallel primal–dual method only as long as

(min_{i=1,...,p} v_i − v) / min_{i=1,...,p} v_i > ε_gap,

where ε_gap is a constant. The gap is checked at step 7 of the algorithm. If the gap is below ε_gap, then the perturbation is reduced by a factor of 2, i.e. each ε_1, ε_2 becomes ε_1/2, ε_2/2. Once all of the epsilons drop below 10^{−6}, the perturbation is removed entirely. For our experiments we set ε_gap = 0.03.
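A minimal sketch of this perturbation schedule, with made-up names (the interaction with the LP solver is omitted):

    import numpy as np

    rng = np.random.default_rng(0)
    eps_gap = 0.03

    def perturb(b, scale=1e-3):
        """Replace each row A_i x = b_i by a ranged row b_i - eps1_i <= A_i x <= b_i + eps2_i."""
        eps1 = rng.uniform(0.0, scale, size=b.shape)
        eps2 = rng.uniform(0.0, scale, size=b.shape)
        return b - eps1, b + eps2, eps1, eps2

    def tighten(b, eps1, eps2):
        """Halve the perturbation; drop it entirely once every epsilon is below 1e-6."""
        eps1, eps2 = eps1 / 2.0, eps2 / 2.0
        if max(eps1.max(), eps2.max()) < 1e-6:
            return b.copy(), b.copy(), np.zeros_like(b), np.zeros_like(b)
        return b - eps1, b + eps2, eps1, eps2

    b = np.ones(5)
    lo, hi, e1, e2 = perturb(b)
    # after step 7, when (min_i v_i - v) / min_i v_i drops below eps_gap:
    lo, hi, e1, e2 = tighten(b, e1, e2)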
For set partitioning problems a starting dual feasible vector is π = 0 if all the cost coefficients are nonnegative. However, there are many columns with 0 cost, resulting in many columns with 0 reduced cost. Hence, we perturb π by componentwise subtracting a small random number. This decreases the initial dual objective but the new π is not 'jammed' in a corner. For instances pr1 and pr2 we use the same idea, however π needs to be changed to accommodate the extra rows and variables (see [13]).

At the first iteration we do not usually have a warm start and we found that the dual simplex is much faster than the primal simplex. However, a warm start is available at each successive iteration since we keep the optimal basis from the previous iteration. Hence the primal simplex is applied.

Next we discuss the parameters nSub and nSubLow. Empirically, we found that nSubLow = nSub/2.5 works best. We use this relationship in all of our experiments. Table 2 gives nSub for the parallel primal–dual algorithm and the number of columns used in the sequential version of the code. In each case the parameters have been determined empirically to be the subproblem sizes that have given the best result. They will be discussed further in Section 4.4.

Table 2
Subproblem sizes

Problem name    Subproblem size (nSub)    Subproblem size, seq.
pr1             17 500                    30 000
pr2             12 500                    30 000
sp1             10 000                    10 000
sp2             10 000                    35 000
sp3              5 000                    10 000

Finally, we observed empirically that for some instances (sp1 and sp2) and at certain iterations the algorithm spends too much time solving subproblems. So we impose an upper bound of 30 min on the execution time of subproblems. This improves the overall execution time. Note that quitting the optimization before reaching the optimality of subproblems does not affect the correctness of the algorithm.

We start SPRINT for solving (3) with 40 000 random columns and in each successive iteration we append the 50 000 best reduced cost columns. Typically three iterations of SPRINT are required, the third one just confirming optimality. The execution time never exceeded 1 min.

4.4. Results

Due to the large number of columns and a main memory of 512 MBytes, we were not able to solve any problem on a single machine. We implemented a variant of the sequential primal–dual algorithm in which the columns are distributed across the machines. Only the pricing is carried out in parallel. We call it the primal–dual algorithm with parallel pricing. In essence, a true sequential primal–dual algorithm would differ only in pricing all the columns sequentially. Based on the assumption that pricing is a linear time task (which is the case for our pricing procedure), we estimated the execution times of a true sequential primal–dual implementation.

The gain of the parallelization of the primal–dual algorithm is twofold; one is the parallel pricing strategy, and the second is having multiple subproblems. The first one is addressed by the primal–dual algorithm with parallel pricing. We give the speedups in Table 3. As we can see, the parallel pricing heuristic has a big impact when the overall execution time is small and the number of columns is big, i.e. the sp1 problem. For the remaining problems the parallel pricing does not have a dominant effect.

Computational results using the parameters from Table 2 are presented in Table 4. The parallelization gain of having multiple subproblems is significant for problems with a large number of rows and relatively small number of columns, i.e. the pr1 and pr2 problems. The speedup is relatively high for a small number of processors, up to 8, and vanishes at 20 processors.

Table 3
Effect of parallel pricing

Problem name    Number of processors    Execution time (s)
pr1              1                      19 200
                 8                      17 140
pr2              1                      81 000
                 8                      76 860
sp1              1                       3 000
                 8                       1 100
sp2              1                      24 000
                 8                      10 320
sp3              1                      62 460
                 8                      50 160

Table 4
Computational results

Problem name    Number of processors    Speedup    Execution time (s)    Number of iterations
pr1              1                      1.00       19 200                44
                 6                      1.77       10 800                37
                 9                      2.09        9 180                30
                12                      2.05        9 360                29
pr2              1                      1.00       81 000                89
                 4                      1.74       46 500                87
                 8                      2.05       39 420                73
                12                      2.13       37 860                72
                16                      2.23       36 300                68
                20                      2.17       37 260                67
sp1              1                      1.00        3 000                11
                 8                      3.33          900                 8
                12                      4.16          720                 7
                16                      4.19          715                 7
sp2              1                      1.00       24 000                70
                12                      3.22        7 440                25
                16                      4.00        6 000                23
                20                      3.41        7 020                21
sp3              1                      1.00       62 460                62
                12                      2.30       27 120                57
                16                      2.50       24 960                53
                20                      2.51       24 840                50

So our main conclusion is that the speedups are significant when the number of processors is between 4 and 12. The execution times can be improved even further by using a pure shared memory machine. On average the communication time was 45 s per iteration. A shared memory computing model would definitely lead to improvements for the sp1 problem. Additional improvement in the execution time of the sp1 problem might result by allowing π to be 'slightly' infeasible. When solving LP (3), we could perform just two SPRINT iterations and then quit without proving optimality.

Table 5
The breakdown of execution times on 12 processors

Problem name    Total time (s)    Step 3 time (s)    Steps 6,7 time (s)    Step 9 time (s)
pr1              9 360             5 183              2 727                 1 150
pr2             37 860            32 689              3 166                 1 005
sp1                720               127                190                   251
sp2              7 440             3 823                689                 1 700
sp3             27 120            19 425              1 887                 3 420

A breakdown of execution times is shown in Table 5. For the harder problems (pr1, pr2 and sp3) more than 70% of the time is spent in solving the subproblems, however for the sp1 problem only 17% of the time is spent on subproblem solving. A better communication network would improve the times for step 9, but for the harder problems it would not have a significant impact.

Although we use the same subproblem size regardless of the number of processors in our experiments reported in Table 4, the subproblem size should depend on the number of processors. The smaller the number of processors, the bigger the subproblem size should be. For a large number of processors, solving many small subproblems is better than spending time on solving larger subproblems. For the pr1 problem the optimal subproblem sizes are 30 000, 25 000, and 17 500 for p = 6, 9, 12, respectively.

We would like to point out that there are no major synchronization requirements for subproblem solutions. The execution times of subproblems differ by a small amount, the average being 15 s. When the execution time for a subproblem reached the upper time limit, all the subproblems achieved the time limit. This fact is not surprising since the structure of the subproblems is the same.

The largest problem we have solved so far has 30 million columns and 25 000 rows [13]. The execution time on 12 processors was 30 h.

There are several open questions regarding an efficient implementation of a parallel primal–dual simplex algorithm. Subproblem size is a key question and the development of an adaptive strategy could lead to substantial improvements. To make subproblems even more different, columns with negative reduced cost based on μ^i can be added to the subproblems. We made an initial attempt in this direction but more experimentation needs to be done.

Acknowledgements

This work was supported by NSF grant DMI-9700285 and United Airlines, who also provided data for the computational experiments. Intel Corporation funded the parallel computing environment and ILOG provided the linear programming solver used in the computational experiments.

References

[1] R. Ahuja, T. Magnanti, J. Orlin, Network Flows, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[2] R. Anbil, E. Johnson, R. Tanga, A global approach to crew pairing optimization, IBM Systems J. 31 (1992) 71–78.
[3] D. Bader, J. JaJa, Practical parallel algorithms for dynamic data redistribution, median finding, and selection, Proceedings of the 10th International Parallel Processing Symposium, 1996.
[4] C. Barnhart, E. Johnson, G. Nemhauser, M. Savelsbergh, P. Vance, Branch-and-price: column generation for solving huge integer programs, Oper. Res. 46 (1998) 316–329.
[5] D. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1995, pp. 79–90.
[6] R. Bixby, J. Gregory, I. Lustig, R. Marsten, D. Shanno, Very large-scale linear programming: a case study in combining interior point and simplex methods, Oper. Res. 40 (1992) 885–897.
[7] R. Bixby, A. Martin, Parallelizing the dual simplex method, Technical Report CRPC-TR95706, Rice University, 1995.
[8] CPLEX Optimization, Using the CPLEX Callable Library, 5.0 Edition, ILOG Inc., 1997.
[9] G. Dantzig, L. Ford, D. Fulkerson, A primal–dual algorithm for linear programs, in: H. Kuhn, A. Tucker (Eds.), Linear Inequalities and Related Systems, Princeton University Press, Princeton, NJ, 1956, pp. 171–181.
[10] J. Edmonds, Maximum matching and a polyhedron with 0–1 vertices, J. Res. Nat. Bur. Standards 69B (1965) 125–130.
[11] J. Hu, Solving linear programs using primal–dual subproblem simplex method and quasi-explicit matrices, Ph.D. Dissertation, Georgia Institute of Technology, 1996.
[12] J. Hu, E. Johnson, Computational results with a primal–dual subproblem simplex method, Oper. Res. Lett. 25 (1999) 149–158.
[13] D. Klabjan, Topics in airline crew scheduling and large scale optimization, Ph.D. Dissertation, Georgia Institute of Technology, 1999.
