
Permutation Generation Methods*

ROBERT SEDGEWICK

Program in Computer Science and Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912

This paper surveys the numerous methods that have been proposed for permutation enumeration by computer. The various algorithms which have been developed over the years are described in detail, and implemented in a modern ALGOL-like language. All of the algorithms are derived from one simple control structure.
The problems involved with implementing the best of the algorithms on real computers are treated in detail. Assembly-language programs are derived and analyzed fully.
The paper is intended not only as a survey of permutation generation methods, but also as a tutorial on how to compare a number of different algorithms for the same task.

Key Words and Phrases: permutations, combinatorial algorithms, code optimization, analysis of algorithms, lexicographic ordering, random permutations, recursion, cyclic rotation.

CR Categories: 3.15, 4.6, 5.25, 5.30.
INTRODUCTION

Over thirty algorithms have been published during the past twenty years for generating by computer all N! permutations of N elements. This problem is a nontrivial example of the use of computers in combinatorial mathematics, and it is interesting to study because a number of different approaches can be compared. Surveys of the field have been published previously in 1960 by D. H. Lehmer [26] and in 1970-71 by R. J. Ord-Smith [29, 30]. A new look at the problem is appropriate at this time because several new algorithms have been proposed in the intervening years.

Permutation generation has a long and distinguished history. It was actually one of the first nontrivial nonnumeric problems to be attacked by computer. In 1956, C. Tompkins wrote a paper [44] describing a number of practical areas where permutation generation was being used to solve problems. Most of the problems that he described are now handled with more sophisticated techniques, but the paper stimulated interest in permutation generation by computer per se. The problem is simply stated, but not easily solved, and is often used as an example in programming and correctness. (See, for example, [6]).

The study of the various methods that have been proposed for permutation generation is still very instructive today because together they illustrate nicely the relationship between counting, recursion, and iteration. These are fundamental concepts in computer science, and it is useful to have a rather simple example which illustrates so well the relationships between them. We shall see that algorithms which seem to differ markedly have essentially the same structure when expressed in a modern language and subjected to simple program transformations. Many readers may find it surprising to discover that "top-down" (recursive) and "bottom-up" (iterative) design approaches can lead to the same program.

* This work was supported by the National Science Foundation Grant No. MCS75-23738.

Copyright 1977, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

Computing Surveys, Vol. 9, No. 2, June 1977



CONTENTS

INTRODUCTION
1. METHODS BASED ON EXCHANGES
   Recursive Methods
   Adjacent Exchanges
   Factorial Counting
   "Loopless" Algorithms
   Another Iterative Method
2. OTHER TYPES OF ALGORITHMS
   Nested Cycling
   Lexicographic Algorithms
   Random Permutations
3. IMPLEMENTATION AND ANALYSIS
   A Recursive Method (Heap)
   An Iterative Method (Ives)
   A Cyclic Method (Langdon)
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES

Permutation generation methods not only illustrate programming issues in high-level (procedural) languages; they also illustrate implementation issues in low-level (assembly) languages. In this paper, we shall try to find the fastest possible way to generate permutations by computer. To do so, we will need to consider some program "optimization" methods (to get good implementations) and some mathematical analyses (to determine which implementation is best). It turns out that on most computers we can generate each permutation at only slightly more than the cost of two store instructions.

In dealing with such a problem, we must be aware of the inherent limitations. Without computers, few individuals had the patience to record all 5040 permutations of 7 elements, let alone all 40320 permutations of 8 elements, or all 362880 permutations of 9 elements. Computers help, but not as much as one might think. Table 1 shows the values of N! for N ≤ 17 along with the time that would be taken by a permutation generation program that produces a new permutation each microsecond. For N > 25, the time required is far greater than the age of the earth!

For many practical applications, the sheer magnitude of N! has led to the development of "combinatorial search" procedures which are far more efficient than permutation enumeration. Techniques such as mathematical programming and backtracking are used regularly to solve optimization problems in industrial situations, and have led to the resolution of several hard problems in combinatorial mathematics (notably the four-color problem). Full treatment of these methods would be beyond the scope of this paper; they are mentioned here to emphasize that, in practice, there are usually alternatives to the "brute-force" method of generating permutations. We will see one example of how permutation generation can sometimes be greatly improved with a backtracking technique.

In the few applications that remain where permutation generation is really required, it usually doesn't matter much which generation method is used, since the cost of processing the permutations far exceeds the cost of generating them.

TABLE 1. APPROXIMATE TIME NEEDED TO GENERATE ALL PERMUTATIONS OF N (1 μsec per permutation)

     N                N!   Time
     1                 1
     2                 2
     3                 6
     4                24
     5               120
     6               720
     7              5040
     8             40320
     9            362880
    10           3628800   3 seconds
    11          39916800   40 seconds
    12         479001600   8 minutes
    13        6227020800   2 hours
    14       87178291200   1 day
    15     1307674368000   2 weeks
    16    20922789888000   8 months
    17   355687428096000   10 years

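The entries in Table 1 are easy to recompute. The following Python sketch (my illustration, not part of the paper) derives N! and the corresponding generation time at one permutation per microsecond:

```python
import math

def generation_times(max_n=17, usec_per_perm=1.0):
    """Recompute Table 1: for each N, the count N! and the time in seconds
    needed to generate all permutations at the given rate."""
    rows = []
    for n in range(1, max_n + 1):
        count = math.factorial(n)               # number of permutations of N elements
        seconds = count * usec_per_perm / 1e6   # 1 microsecond per permutation
        rows.append((n, count, seconds))
    return rows
```

For N = 17 this gives 17! = 355687428096000 permutations, roughly a decade of computing time at this rate.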


For example, to evaluate the performance of an operating system, we might want to try all different permutations of a fixed set of tasks for processing, but most of our time would be spent simulating the processing, not generating the permutations. The same is usually true in the study of combinatorial properties of permutations, or in the analysis of sorting methods. In such applications, it can sometimes be worthwhile to generate "random" permutations to get results for a typical case. We shall examine a few methods for doing so in this paper.

In short, the fastest possible permutation method is of limited importance in practice. There is nearly always a better way to proceed, and if there is not, the problem becomes really hopeless when N is increased only a little.

Nevertheless, permutation generation provides a very instructive exercise in the implementation and analysis of algorithms. The problem has received a great deal of attention in the literature, and the techniques that we learn in the process of carefully comparing these interesting algorithms can later be applied to the perhaps more mundane problems that we face from day to day.

We shall begin with simple algorithms that generate permutations of an array by successively exchanging elements; these algorithms all have a common control structure described in Section 1. We then will study a few older algorithms, including some based on elementary operations other than exchanges, in the framework of this same control structure (Section 2). Finally, we shall treat the issues involved in the implementation, analysis, and "optimization" of the best of the algorithms (Section 3).

1. METHODS BASED ON EXCHANGES

A natural way to permute an array of elements on a computer is to exchange two of its elements. The fastest permutation algorithms operate in this way: All N! permutations of N elements are produced by a sequence of N!-1 exchanges. We shall use the notation

    P[1]:=:P[2]

to mean "exchange the contents of array elements P[1] and P[2]". This instruction gives both arrangements of the elements P[1], P[2] (i.e., the arrangement before the exchange and the one after). For N = 3, several different sequences of five exchanges can be used to generate all six permutations, for example

    P[1]:=:P[2]
    P[2]:=:P[3]
    P[1]:=:P[2]
    P[2]:=:P[3]
    P[1]:=:P[2].

If the initial contents of P[1] P[2] P[3] are A B C, then these five exchanges will produce the permutations B A C, B C A, C B A, C A B, and A C B.

It will be convenient to work with a more compact representation describing these exchange sequences. We can think of the elements as passing through "permutation networks" which produce all the permutations. The networks are comprised of "exchange modules" such as that shown in Diagram 1, which is itself the permutation network for N = 2.

[Diagram 1: a single exchange module, the permutation network for N = 2]

The network of Diagram 2 implements the exchange sequence given above for N = 3. The elements pass from right to left, and a new permutation is available after each exchange.

[Diagram 2: the five-module permutation network for N = 3]

Of course, we must be sure that the internal permutations generated are distinct. For N = 3 there are 3^5 = 243 possible networks with five exchange modules, but only the twelve shown in Fig. 1 are "legal" (produce sequences of distinct permutations). We shall most often represent networks as in Fig. 1, namely drawn vertically, with elements passing from top to bottom, and with the permutation sequences that are generated explicitly written out on the right.



[Figure 1: the twelve legal permutation networks for three elements, each shown with the sequence of six permutations it generates]

FIGURE 1. Legal permutation networks for three elements.

It is easy to see that for larger N there will be large numbers of legal networks. The methods that we shall now examine will show how to systematically construct networks for arbitrary N. Of course, we are most interested in networks with a sufficiently simple structure that their exchange sequences can be conveniently implemented on a computer.

Recursive Methods

We begin by studying a class of permutation generation methods that are very simple when expressed as recursive programs. To generate all permutations of P[1],...,P[N], we repeat N times the step: "first generate all permutations of P[1],...,P[N-1], then exchange P[N] with one of the elements P[1],...,P[N-1]". As this is repeated, a new value is put into P[N] each time. The various methods differ in their approaches to filling P[N] with the N original elements.

The first and seventh networks in Fig. 1 operate according to this discipline. Recursively, we can build up networks for four elements from one of these. For example, using four copies of the first network in Fig. 1, we can build a network for N = 4, as shown in Diagram 3. This network fills P[4] with the values D, C, B, A in decreasing alphabetic order (and we could clearly build many similar networks which fill P[4] with the values in other orders).

[Diagram 3: a network for N = 4 built from four copies of the first network of Fig. 1]

The corresponding network for five elements, shown in Diagram 4, is more complicated. (The empty boxes denote the network of Diagram 3 for four elements.) To get the desired decreasing sequence in P[5], we must exchange it successively with P[3], P[1], P[3], P[1] in-between generating all permutations of P[1],...,P[4].

[Diagram 4: a network for N = 5 built from copies of the network of Diagram 3]

In general, we can generate all permutations of N elements with the following recursive procedure:

Algorithm 1.

    procedure permutations (N);
    begin c:=1;
    loop:
        if N>2 then permutations(N-1) endif;
    while c<N:
        P[B[N,c]]:=:P[N];
        c:=c+1
    repeat
    end;

This program uses the looping control construct loop ... while ... repeat which is described by D. E. Knuth [23]. Statements between loop and repeat are iterated: when the while condition fails, the loop is exited. If the while were placed immediately following the loop, then the statement would be like a normal ALGOL while. In fact, Algorithm 1 might be implemented with a simpler construct like for c:=1 until N do ... were it not for the need to test the control counter c within the loop.
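Algorithm 1 transcribes almost line for line into a modern language. The Python sketch below is my illustration, not the paper's code; it assumes a small index table B for N ≤ 4 of the kind derived in the next section, and records each arrangement in place of the process macro:

```python
def permutations_alg1(p, n, B, process):
    """Recursive transcription of Algorithm 1: generate all permutations of
    p by recursively permuting the first n-1 elements, then exchanging
    P[B[n,c]] with P[n] (indices 1-based, as in the paper)."""
    c = 1
    while True:
        if n > 2:
            permutations_alg1(p, n - 1, B, process)
        if c >= n:
            break
        k = B[n][c - 1] - 1              # B[n,c], converted to a 0-based index
        p[k], p[n - 1] = p[n - 1], p[k]  # P[B[n,c]] :=: P[n]
        process(list(p))
        c += 1

# Index table rows for N <= 4 (from Table 2): fill P[N] in decreasing order.
B = {2: [1], 3: [1, 1], 4: [1, 2, 3]}

p = list("ABCD")
results = [''.join(p)]                   # "process" the initial arrangement
permutations_alg1(p, 4, B, lambda q: results.append(''.join(q)))
```

All 24 arrangements of A B C D then appear exactly once in `results`.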




The array B[N,c] is an index table which tells where the desired value of P[N] is after P[1],...,P[N-1] have been run through all permutations for the cth time.

We still need to specify how to compute B[N,c]. For each value of N we could specify any one of (N-1)! sequences in which to fill P[N], so there are a total of (N-1)!(N-2)!(N-3)!...3!2!1! different tables B[N,c] which will cause Algorithm 1 to properly generate all N! permutations of P[1],...,P[N].

One possibility is to precompute B[N,c] by hand (since we know that N is small), continuing as in the example above. If we adopt the rule that P[N] should be filled with elements in decreasing order of their original index, then the network in Diagram 4 tells us that B[5,c] should be 3,1,3,1 for c = 1,2,3,4. For N = 6 we proceed in the same way: if we start with A B C D E F, then the first N = 5 subnetwork leaves the elements in the order C D E B A F, so that B[6,1] must be 3 to get the E into P[6], leaving C D F B A E. The second N = 5 subnetwork then leaves F B A D C E, so that B[6,2] must be 4 to get the D into P[6], etc. Table 2 is the full table for N ≤ 12 generated this way; we could generate permutations with Algorithm 1 by storing these N(N-1)/2 indices.

TABLE 2. INDEX TABLE B[N,c] FOR ALGORITHM 1

    N =  2    1
         3    1 1
         4    1 2 3
         5    3 1 3 1
         6    3 4 3 2 3
         7    5 3 1 5 3 1
         8    5 2 7 2 1 2 3
         9    7 1 5 5 3 3 7 1
        10    7 8 1 6 5 4 9 2 3
        11    9 7 5 3 1 9 7 5 3 1
        12    9 6 3 10 9 4 3 8 9 2 3

There is no reason to insist that P[N] should be filled with elements in decreasing order. We could proceed as above to build a table which fills P[N] in any order we choose. One reason for doing so would be to try to avoid having to store the table: there are at least two known versions of this method in which the indices can be easily computed and it is not necessary to precompute the index table.

The first of these methods was one of the earliest permutation generation algorithms to be published, by M. B. Wells in 1960 [47]. As modified by J. Boothroyd in 1965 [1, 2], Wells' algorithm amounts to using

    B[N,c] = N-c   if N is even and c > 2
             N-1   otherwise,

or, in Algorithm 1, replacing P[B[N,c]]:=:P[N] by

    if (N even) and (c>2)
        then P[N]:=:P[N-c]
        else P[N]:=:P[N-1] endif

It is rather remarkable that such a simple method should work properly. Wells gives a complete formal proof in his paper, but many readers may be content to check the method for all practical values of N by constructing the networks as shown in the example above. The complete networks for N = 2,3,4 are shown in Fig. 2.

In a short paper that has gone virtually unnoticed, B. R. Heap [16] pointed out several of the ideas above and described a method even simpler than Wells'. (It is not clear whether Heap was influenced by Wells or Boothroyd, since he gives no references.) Heap's method is to use

    B[N,c] = 1   if N is odd
             c   if N is even,

or, in Algorithm 1, to replace P[B[N,c]]:=:P[N] by

    if N odd then P[N]:=:P[1] else P[N]:=:P[c] endif

Heap gave no formal proof that his method works, but a proof similar to Wells' will show that the method works for all N. (The reader may find it instructive to verify that the method works for practical values of N (as Heap did) by proceeding as we did when constructing the index table above.) Figure 3 shows that the networks for N = 2,3,4 are the same as for Algorithm 1 with the precomputed index table, but that the network for N = 5, shown in Diagram 5, differs. (The empty boxes denote the network for N = 4 from Fig. 3.)
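Heap's replacement rule drops directly into the recursive control structure of Algorithm 1. Here is a hedged Python sketch of mine (not Heap's own program) of the resulting generator:

```python
def heap_permutations(p, n, process):
    """Algorithm 1 with Heap's rule: exchange P[n] with P[1] when n is odd
    and with P[c] when n is even (1-based indices, as in the paper)."""
    c = 1
    while True:
        if n > 2:
            heap_permutations(p, n - 1, process)
        if c >= n:
            break
        k = 0 if n % 2 == 1 else c - 1   # Heap's B[N,c]: 1 if N odd, c if N even
        p[k], p[n - 1] = p[n - 1], p[k]
        process(list(p))
        c += 1

p = list(range(1, 6))
results = [tuple(p)]
heap_permutations(p, 5, lambda q: results.append(tuple(q)))
```

Each of the 120 permutations of five elements appears exactly once in `results`, produced by 119 exchanges.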



[Figure 2: Wells' networks for N = 2, 3, 4, each shown with the permutation sequence it generates]

FIGURE 2. Wells' algorithm for N = 2,3,4.

Neither Wells nor Heap gave recursive formulations of their methods, although Boothroyd [1] later gave a recursive implementation of his version of Wells' method. Although Wells and Heap undoubtedly arrived at their nonrecursive programs directly, it is instructive here to derive a nonrecursive version of Algorithm 1 by systematically removing the recursion.

The standard method for implementing a recursive procedure is to maintain a stack with the parameters and local variables for each invocation of the procedure. The simple structure of Algorithm 1 makes it more convenient to maintain an array c[1],...,c[N], where c[i] is the value of c for the invocation permutations(i). Then by decrementing i on a call and incrementing i on return, we ensure that c[i] always refers to the proper value of c. Since there is only one recursive call, transfer of control is implemented by jumping to the beginning on call and jumping to the place following the call on return. The following program results directly when we remove the recursion and the loops from Algorithm 1:

    i:=N;
    begin: c[i]:=1;
    loop: if i>2 then i:=i-1; go to begin endif;
    return: if c[i]>=i then go to exit endif;
        P[B[i,c[i]]]:=:P[i];
        c[i]:=c[i]+1;
        go to loop;
    exit: if i<N then i:=i+1; go to return endif;

This program can be simplified by combining the instructions at begin and loop into a single loop, and by replacing the single go to exit with the code at exit:

    i:=N+1;
    loop: loop while i>2: i:=i-1; c[i]:=1 repeat;
    return: if c[i]>=i
            then if i<N then i:=i+1;
                    go to return endif;
            else P[B[i,c[i]]]:=:P[i];
                 c[i]:=c[i]+1;
                 go to loop;
            endif;

The program can be transformed further if we observe that, after c[N],...,c[2] are all set to 1 (this is the first thing that the program does), we can do the assignment c[i]:=1 before i:=i+1 rather than after i:=i-1 without affecting the rest of the program. But this means that the loop does nothing but set i to 2 and it can be eliminated (except for the initialization), as in this version:

    i:=N; loop: c[i]:=1 while i>2: i:=i-1 repeat;
    return: if c[i]>=i
            then if i<N then c[i]:=1; i:=i+1;
                    go to return endif;
            else P[B[i,c[i]]]:=:P[i];
                 c[i]:=c[i]+1;
                 i:=2;
                 go to return;
            endif;

Finally, since the two go to's in this program refer to the same label, they can be replaced with a single loop ... repeat. The formulation

    i:=N; loop: c[i]:=1 while i>2: i:=i-1 repeat;
    loop:
        if c[i]<i then P[B[i,c[i]]]:=:P[i];
                       c[i]:=c[i]+1; i:=2;
                  else c[i]:=1; i:=i+1;
        endif;
    while i<=N repeat;

is attractive because it is symmetric: each time through the loop, either c[i] is initialized and i incremented, or c[i] is incremented and i initialized.



[Figure 3: Heap's networks for N = 2, 3, 4, each shown with the permutation sequence it generates]

FIGURE 3. Heap's algorithm for N = 2,3,4.

(Note: in a sense, we have removed too many go to's, since now the program makes a redundant test i <= N after setting i:=2 in the then clause. This can be avoided in assembly language, as shown in Section 3, or it could be handled with an "event variable" as described in [24].) We shall examine the structure of this program in detail later.

The programs above merely generate all permutations of P[1],...,P[N]; in order to do anything useful, we need to process each permutation in some way. The processing might involve anything from simple counting to a complex simulation. Normally, this is done by turning the permutation generation program into a procedure which returns a new permutation each time it is called. A main program is then written to call this procedure N! times, and process each permutation. (In this form, the permutation generator can be kept as a library subprogram.) A more efficient way to proceed is to recognize that the permutation generation procedure is really the "main program" and that each permutation should be processed as it is generated. To indicate this clearly in our programs, we shall assume a macro called process which is to be invoked each time a new permutation is ready. In the nonrecursive version of Algorithm 1 above, if we put a call to process at the beginning and another call to process after the exchange statement, then process will be executed N! times, once for each permutation. From now on, we will explicitly include such calls to process in all of our programs.

The same transformations that we applied to Algorithm 1 yield this nonrecursive version of Heap's method for generating and processing all permutations of P[1],...,P[N]:

Algorithm 2 (Heap)

    i:=N; loop: c[i]:=1 while i>2: i:=i-1 repeat;
    process;
    loop:
        if c[i]<i
            then if i odd then k:=1 else k:=c[i] endif;
                 P[i]:=:P[k];
                 c[i]:=c[i]+1; i:=2;
                 process;
            else c[i]:=1; i:=i+1
        endif;
    while i<=N repeat;

This can be a most efficient algorithm when implemented properly. In Section 3 we examine further improvements to this algorithm and its implementation.

Adjacent Exchanges

Perhaps the most prominent permutation enumeration algorithm was formulated in 1962 by S. M. Johnson [20] and H. F. Trotter [45], apparently independently. They discovered that it was possible to generate all N! permutations of N elements with N!-1 exchanges of adjacent elements.

[Diagram 5: Heap's network for N = 5, built from copies of the N = 4 network of Fig. 3]
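Algorithm 2 can be transcribed into Python essentially line for line; the following sketch (mine, with process recording each arrangement) illustrates it:

```python
def heap_alg2(p, n, process):
    """Nonrecursive Heap's method (Algorithm 2): the exchange partner k is
    P[1] when i is odd and P[c[i]] when i is even (1-based, as in the paper)."""
    c = [1] * (n + 1)
    process(tuple(p))
    i = 2
    while i <= n:
        if c[i] < i:
            k = 1 if i % 2 == 1 else c[i]
            p[i - 1], p[k - 1] = p[k - 1], p[i - 1]   # P[i] :=: P[k]
            c[i] += 1
            i = 2
            process(tuple(p))
        else:
            c[i] = 1
            i += 1

perms = []
heap_alg2(list(range(1, 6)), 5, perms.append)
```

Each of the 120 permutations of 1..5 appears exactly once in `perms`.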



[Diagram 6: the first four exchange modules of the network for N = 5, sweeping A from P[1] toward P[5]]

The method is based on the natural idea that for every permutation of N-1 elements we can generate N permutations of N elements by inserting the new element into all possible positions. For example, for five elements, the first four exchange modules in the permutation network are as shown in Diagram 6. The next exchange is P[1]:=:P[2], which produces a new permutation of the elements originally in P[2], P[3], P[4], P[5] (and which are now in P[1], P[2], P[3], P[4]). Following this exchange, we bring A back in the other direction, as illustrated in Diagram 7. Now we exchange P[3]:=:P[4] to produce the next permutation of the last four elements, and continue in this manner until all 4! permutations of the elements originally in P[2], P[3], P[4], P[5] have been generated. The network makes five new permutations of the five elements for each of these (by putting the element originally in P[1] in all possible positions), so that it generates a total of 5! permutations.

[Diagram 7: the next exchanges, bringing A back in the other direction]

Generalizing the description in the last paragraph, we can inductively build the network for N elements by taking the network for N-1 elements and inserting chains of N-1 exchange modules (to sweep the first element back and forth) in each space between exchange modules. The main complication is that the subnetwork for N-1 elements has to shift back and forth between the first N-1 lines and the last N-1 lines in between sweeps. Figure 4 shows the networks for N = 2,3,4. The modules in boxes identify the subnetwork: if, in the network for N, we connect the output lines of one box to the input lines of the next, we get the network for N-1.

[Figure 4: the Johnson-Trotter networks for N = 2, 3, 4, with boxes marking the subnetwork for N-1]

FIGURE 4. Johnson-Trotter algorithm for N = 2, 3, 4.

Continuing the example above, we get the full network for N = 5 shown in Figure 5. By connecting the boxes in this network, we get the network for N = 4.

To develop a program to exchange according to these networks, we could work down from a recursive formulation as in the preceding section, but instead we shall take a bottom-up approach. To begin, imagine that each exchange module is labelled with the number of the network in which it first appears. Thus, for N = 2 the module would be numbered 2; for N = 3 the five modules would be labelled 3 3 2 3 3; for N = 4 the 23 modules are numbered

    4 4 4 3 4 4 4 3 4 4 4 2 4 4 4 3 4 4 4 3 4 4 4;

for N = 5 we insert 5 5 5 5 between the numbers above, etc. To write a program to generate this sequence, we keep a set of incrementing counters c[i], 2 <= i <= N, which are all initially 1 and which satisfy 1 <= c[i] <= i.
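The labelling sequence just described can be generated with such counters; the following Python sketch (my illustration, anticipating the counter program given next) outputs the level numbers:

```python
def module_levels(n):
    """Produce the sequence of exchange-module labels: repeatedly find the
    highest index i whose counter c[i] (1 <= c[i] <= i) is not yet i,
    output i, increment c[i], and reset the exhausted counters above it."""
    c = [1] * (n + 1)
    levels = []
    while True:
        i = n
        while i > 1 and c[i] == i:   # exhausted: reset and move up a level
            c[i] = 1
            i -= 1
        if i <= 1:
            break
        levels.append(i)
        c[i] += 1
    return levels
```

`module_levels(3)` gives 3 3 2 3 3, and `module_levels(4)` the 23-number sequence shown above.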



[Figure 5: the full Johnson-Trotter network for N = 5]

FIGURE 5. Johnson-Trotter algorithm for N = 5.

We find the highest index i whose counter is not exhausted (not yet equal to i), output it, increment its counter, and reset the counters of the larger indices:

    i:=1; loop while i<N: i:=i+1; c[i]:=1 repeat;
    c[1]:=0;
    loop:
        i:=N;
        loop while c[i]=i: c[i]:=1; i:=i-1 repeat;
    while i>1:
        comment exchange module is on level i;
        c[i]:=c[i]+1
    repeat;

When i becomes 1, the process is completed; the statement c[1]:=0 terminates the inner loop in this case. (Again, there are simple alternatives to this with "event variables" [24], or in assembly language.)

Now, suppose that we have a Boolean variable d[N] which is true if the original P[1] is travelling from P[1] down to P[N] and false if it is travelling from P[N] up to P[1]. Then, when i = N we can replace the comment in the above program by

    if d[N] then k:=c[N] else k:=N-c[N] endif;
    P[k]:=:P[k+1];

This will take care of all of the exchanges on level N. Similarly, we can proceed by introducing a Boolean d[N-1] for level N-1, etc., but we must cope with the fact that the elements originally in P[2],...,P[N] switch between those locations and P[1],...,P[N-1]. This is handled by including an offset x which is incremented by 1 each time a d[i] switches from false to true. This leads to:

Algorithm 3 (Johnson-Trotter)

    i:=1;
    loop while i<N: i:=i+1; c[i]:=1;
        d[i]:=true; repeat;
    c[1]:=0;
    process;
    loop:
        i:=N; x:=0;
        loop while c[i]=i:
            if not d[i] then x:=x+1 endif;
            d[i]:=not d[i]; c[i]:=1; i:=i-1;
        repeat;
    while i>1:
        if d[i] then k:=c[i]+x
                else k:=i-c[i]+x endif;
        P[k]:=:P[k+1];
        process;
        c[i]:=c[i]+1;
    repeat;

Although Johnson did not present the algorithm in a programming language, he did give a very precise formulation from which the above program can be derived. Trotter gave an ALGOL formulation which is similar to Algorithm 3. We shall examine alternative methods of implementing this algorithm later.

An argument which is often advanced in favor of the Johnson-Trotter algorithm is that, since it always exchanges adjacent elements, the process procedure might be simpler for some applications. It might be possible to calculate the incremental effect of exchanging two elements rather than reprocessing the entire permutation. (This observation could also apply to Algorithms 1 and 2, but the cases when they exchange nonadjacent elements would have to be handled differently.)

The Johnson-Trotter algorithm is often inefficiently formulated [5, 10, 12] because it can be easily described in terms of the values of elements being permuted, rather than their positions. If P[1],...,P[N] are originally the integers 1,...,N, then we might try to avoid maintaining the offset x by noting that each exchange simply involves the smallest integer whose count is not yet exhausted. Inefficient implementations involve actually searching for this smallest integer [5] or maintaining the inverse permutation in order to find it [10]. Both of these are far less efficient than the simple offset method of maintaining the indices of the elements to be exchanged given by Johnson and Trotter, as in Algorithm 3.




Factorial Counting

A careful reader may have become suspicious about similarities between Algorithms 2 and 3. The similarities become striking when we consider an alternate implementation of the Johnson-Trotter method:

Algorithm 3a (Alternate Johnson-Trotter)

    i:=N; x:=0;
    loop: c[i]:=1; d[i]:=true; while i>1: i:=i-1 repeat;
    process;
    loop:
        if c[i] < N+1-i
            then if d[i] then k:=c[i]+x
                         else k:=N+1-i-c[i]+x endif;
                 P[k]:=:P[k+1];
                 process;
                 c[i]:=c[i]+1; i:=1; x:=0;
            else if not d[i] then x:=x+1 endif;
                 d[i]:=not d[i]; c[i]:=1; i:=i+1;
        endif;
    while i<=N repeat;

This program is the result of two simple transformations on Algorithm 3. First, change i to N+1-i everywhere and redefine the c and d arrays so that c[N+1-i], d[N+1-i] in Algorithm 3 are the same as c[i], d[i] in Algorithm 3a. (Thus a reference to c[i] in Algorithm 3 becomes c[N+1-i] when i is changed to N+1-i, which becomes c[i] in Algorithm 3a.) Second, rearrange the control structure around a single outer loop. The condition c[i] < N+1-i in Algorithm 3a is equivalent to the condition c[i] < i in Algorithm 3, and both programs perform the exchange and process the permutation in this case. When the counter is exhausted (c[i] = N+1-i in Algorithm 3a; c[i] = i in Algorithm 3), both programs fix the offset, reset the counter, switch the direction, and move up a level.

If we ignore statements involving P, k and d, we find that this version of the Johnson-Trotter algorithm is identical to Heap's method, except that Algorithm 3a compares c[i] with N+1-i and Algorithm 2 compares it with i. (Notice that Algorithm 2 still works properly if in both its occurrences 2 is replaced by 1.)

To appreciate this similarity more fully, let us consider the problem of writing a program to generate all N-digit decimal numbers: to "count" from 0 to 99...9 = 10^N - 1. The algorithm that we learn in grade school is to increment the right-most digit which is not 9 and change all the nines to its right to zeros. If the digits are stored in reverse order in the array c[N],c[N-1],...,c[2],c[1] (according to the way in which we customarily write numbers) we get the program

    i:=N; loop c[i]:=0 while i>1: i:=i-1 repeat;
    loop:
        if c[i]<9 then c[i]:=c[i]+1; i:=1
                  else c[i]:=0; i:=i+1
        endif;
    while i<=N repeat;

From this program, we see that our permutation generation algorithms are controlled by this simple counting process, but in a mixed-radix number system. Where in ordinary counting the digits satisfy 0 <= c[i] <= 9, in Algorithm 2 they satisfy 1 <= c[i] <= i and in Algorithm 3a they satisfy 1 <= c[i] <= N-i+1. Figure 6 shows the values of c[1],...,c[N] when process is encountered in Algorithms 2 and 3a for N = 2,3,4.

Virtually all of the permutation generation algorithms that have been proposed are based on such "factorial counting" schemes. Although they appear in the literature in a variety of disguises, they all have the same control structure as the elementary counting program above. We have called methods like Algorithm 2 recursive because they generate all sequences of c[1],...,c[i-1] in-between increments of c[i] for all i; we shall call methods like Algorithm 3 iterative because they iterate c[i] through all its values in-between increments of c[i+1],...,c[N].

Loopless Algorithms

An idea that has attracted a good deal of attention recently is that the Johnson-Trotter algorithm might be improved by removing the inner loop from Algorithm 3. This idea was introduced by G. Ehrlich [10, 11], and the implementation was refined by N. Dershowitz [5]. The method is also described in some detail by S. Even [12].

Ehrlich's original implementation was complex, but it is based on a few standard programming techniques. The inner loop in Algorithm 3 has three main purposes: to find the highest index whose counter is not exhausted, to reset the counters at the larger indices, and to compute the offset x.
Computing Surveys, Vol 9, No 2, June 1977


Permutation Generation Methods 147
FIGURE 6. Factorial counting: c[N], ..., c[1]. (a) Using Algorithm 2 (recursive). (b) Using Algorithm 3a (iterative).
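The "factorial counting" control structure tabulated in Figure 6 is easy to experiment with. The following sketch (an illustration in Python added here, not code from the paper) runs one odometer loop that covers both ordinary decimal counting and factorial counting, depending on the radix chosen for each counter position:

```python
def count_sequences(radix, n):
    """Odometer loop: yield every array c[1..n] with 1 <= c[i] <= radix(i).

    The control structure mirrors the iterative permutation algorithms:
    increment the lowest-index counter that is not exhausted, after
    resetting every exhausted counter below it.
    """
    c = [1] * (n + 1)              # c[0] is unused; all counters start at 1
    yield tuple(c[1:])
    i = 1
    while i <= n:
        if c[i] < radix(i):        # counter i not exhausted: increment it
            c[i] += 1
            i = 1
            yield tuple(c[1:])
        else:                      # exhausted: reset it and carry leftward
            c[i] = 1
            i += 1

# ordinary decimal counting: every position has radix 10
decimal = list(count_sequences(lambda i: 10, 2))

# "factorial counting" in the style of Algorithm 3a: c[i] ranges over 1..N-i+1
N = 4
fact = list(count_sequences(lambda i: N - i + 1, N))
```

With these radices the loop yields 10^2 = 100 and 4·3·2·1 = 24 sequences respectively; it is in this sense that the generation algorithms above are all driven by a counting program.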

in Algorithm 3 has three main purposes: to find the highest index whose counter is not exhausted, to reset the counters at the larger indices, and to compute the offset x. The first purpose can be served by maintaining an array s[i] which tells what the next value of i should be when c[i] is exhausted: normally s[i] = i-1, but when c[i] reaches its limit we set s[i+1] to s[i]. To reset the other counters, we proceed as we did when removing the recursion from Algorithm 1 and reset them just after they are incremented, rather than waiting until they are needed. Finally, rather than computing a "global" offset x, we can maintain an array x[i] giving the current offset at level i: when d[i] switches from false to true, we increment x[s[i]]. These changes allow us to replace the inner "loop...repeat" in Algorithm 3 by an "if...endif":

Algorithm 3b (Loopless Johnson-Trotter)

i:=0;
loop while i<N: i:=i+1; c[i]:=1; d[i]:=true;
    s[i]:=i-1; x[i]:=0 repeat;
process;
loop:
    s[N+1]:=N; x[i]:=0;
    if c[i]=i then
        if not d[i]
            then x[s[i]]:=x[s[i]]+1 endif;
        d[i]:=not d[i]; c[i]:=1;
        s[i+1]:=s[i]; s[i]:=i-1;
    endif;
    i:=s[N+1];
while i>1:
    if d[i] then k:=c[i]+x[i]
        else k:=i-c[i]+x[i] endif;
    P[k]:=:P[k+1];
    process;
    c[i]:=c[i]+1;
repeat;

This algorithm differs from those described in [10, 11, 12], which are based on

Computing Surveys, Vol. 9, No 2, J u n e 1977


148 · R. Sedgewick
the less efficient implementations of the Johnson-Trotter algorithm mentioned above. The loopless formulation is purported to be an improvement because each iteration of the main loop is guaranteed to produce a new permutation in a fixed number of steps.

However, when organized in this form, the unfortunate fact becomes apparent that the loopless Algorithm 3b is slower than the normal Algorithm 3. Loopfree implementation is not an improvement at all! This can be shown with very little analysis because of the similar structure of the algorithms. If, for example, we were to count the total number of times the statement c[i]:=1 is executed when each algorithm generates all N! permutations, we would find that the answer would be exactly the same for the two algorithms. The loopless algorithm does not eliminate any such assignments; it just rearranges their order of execution. But this simple fact means that Algorithm 3b must be slower than Algorithm 3, because it not only has to execute all of the same instructions the same number of times, but it also suffers the overhead of maintaining the x and s arrays.

We have become accustomed to the idea that it is undesirable to have programs with loops that could iterate N times, but this is simply not the case with the Johnson-Trotter method. In fact, the loop iterates N times only once out of the N! times that it is executed. Most often (N-1 out of every N times) it iterates only once. If N were very large it would be conceivable that the very few occasions that the loop iterates many times might be inconvenient, but since we know that N is small, there seems to be no advantage whatsoever to the loopless algorithm.

Ehrlich [10] found his algorithm to run "twice as fast" as competing algorithms, but this is apparently due entirely to a simple coding technique (described in Section 3) which he applied to his algorithm and not to the others.

Another Iterative Method

In 1976, F. M. Ives [19] published an exchange-based method like the Johnson-Trotter method which does represent an improvement. For this method, we build up the network for N elements from the network for N-2 elements. We begin in the same way as in the Johnson-Trotter method. For N = 5, the first four exchanges are as shown in Diagram 6. But now the next exchange is P[1]:=:P[5], which not only produces a new permutation of P[1],...,P[4], but also puts P[5] back into its original position. We can perform exactly these five exchanges four more times, until, as shown in Diagram 8, we get back to the original configuration. At this point, P[1],...,P[4] have been rotated through four permutations, so that we have taken care of the case N = 4. If we (inductively) permute three of these elements (Ives suggests the middle three) then the 20 exchanges above will give us 20 new permutations, and so forth. (We shall later see a method which makes exclusive use of this idea that all permutations of N elements can be generated by rotating and then generating all permutations of N-1 elements.) Figure 7 shows the networks for N = 2,3,4; the full network for N = 5 is shown in Fig. 8. As before, if we connect the boxes in the network for N, we get the network for N-2. Note that the exchanges immediately preceding the boxes are redundant in that they do not produce new permutations. (The redundant permutations are identified by parentheses in Fig. 7.) However, there are relatively few of these and they are a small price to pay for the lower overhead incurred by this method.

DIAGRAM 8.

In the example above, we knew that it was time to drop down a level and permute the middle three elements when all of the original elements (but specifically P[1] and P[5]) were back in position. If the elements being permuted are all distinct, we can test for this condition by initially saving the values of P[1],...,P[N] in another array Q[1],...,Q[N]:

Algorithm 4 (Ives)

i:=N;
loop: c[i]:=i; Q[i]:=P[i]; while i>1: i:=i-1 repeat;
process;

loop:
    if c[i]<N+1-i
    then P[c[i]]:=:P[c[i]+1];
        c[i]:=c[i]+1; i:=1;
        process;
    else P[i]:=:P[N+1-i];
        c[i]:=i;
        if P[N+1-i]=Q[N+1-i] then i:=i+1
        else i:=1;
            process
        endif;
    endif;
while i<N+1-i repeat;

This program is very similar to Algorithms 2 and 3a, but it doesn't fall immediately into the "factorial counting" schemes of these programs, because only half of the counters are used. However, we can use counters rather than the test P[N+1-i] = Q[N+1-i], since this test always succeeds after exactly i-1 sweeps. We are immediately led to the implementation:

Algorithm 4a (Alternate Ives)

i:=N; loop: c[i]:=1; while i>1: i:=i-1 repeat;
process;
loop:
    if c[i]<N+1-i
    then if i odd then P[c[i]]:=:P[c[i]+1]
        else P[i]:=:P[N+1-i] endif;
        c[i]:=c[i]+1; i:=1;
        process;
    else c[i]:=1; i:=i+1
    endif;
while i<=N repeat;
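A compact way to check the behavior just described is to transliterate Ives' method into a modern language. The following Python rendering of Algorithm 4 is an illustration added here, not the paper's code: 0-based lists stand in for the 1-based arrays, and an output list stands in for the process macro.

```python
def ives(n):
    """Generate all n! permutations of 1..n by Ives' exchange method."""
    P = list(range(1, n + 1))
    Q = P[:]                     # saved copy, used to detect full rotations
    c = list(range(n + 1))       # c[i] = i initially; index 0 is unused
    out = [tuple(P)]
    i = 1
    while True:
        if c[i] < n + 1 - i:
            k = c[i]             # P[c[i]] :=: P[c[i]+1]  (1-based positions)
            P[k - 1], P[k] = P[k], P[k - 1]
            c[i] += 1
            i = 1
            out.append(tuple(P))
        else:
            P[i - 1], P[n - i] = P[n - i], P[i - 1]   # P[i] :=: P[N+1-i]
            c[i] = i
            if P[n - i] == Q[n - i]:
                i += 1           # level i has come full circle: move up
            else:
                i = 1
                out.append(tuple(P))
        if not i < n + 1 - i:    # the paper's "while i < N+1-i repeat"
            break
    return out
```

For N = 4 this produces 24 distinct permutations, with the redundant (parenthesized) configurations of Figure 7 suppressed.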

This method does not require that the elements being permuted be distinct, but it is slightly less efficient than Algorithm 4 because more counting has to be done.

Ives' algorithm is more efficient than the Johnson-Trotter method (compare Algorithm 4a with Algorithm 3a) since it does not have to maintain the array d or offset x. The alternate implementation bears a remarkable resemblance to Heap's method (Algorithm 2). Both of these algorithms do little more than factorial counting. We shall compare them in Section 3.

FIGURE 7. Ives' algorithm for N = 2,3,4.

2. OTHER TYPES OF ALGORITHMS

In this section we consider a variety of algorithms which are not based on simple exchanges between elements. These algorithms generally take longer to produce all permutations than the best of the methods already described, but they are worthy of study for several reasons. For example, in some situations it may not be necessary to generate all permutations, but only some "random" ones. Other algorithms may be of practical interest because they are based on elementary operations which could be as efficient as exchanges on some computers. Also, we consider algorithms that generate the permutations in a particular order which is of interest. All of the algorithms can be cast in terms of the basic
FIGURE 8. Ives' algorithm for N = 5.

"factorial counting" control structure described above.

Nested Cycling

As we saw when we examined Ives' algorithm, N different permutations of P[1],...,P[N] can be obtained by rotating the array. In this section, we examine permutation generation methods which are based solely on this operation. We assume that we have a primitive operation rotate(i) which does a cyclic left-rotation of the elements P[1],...,P[i]. In other words, rotate(i) is equivalent to

t:=P[1]; k:=2;
loop while k<=i: P[k-1]:=P[k]; k:=k+1 repeat;
P[i]:=t;

The network notation for rotate(5) is given by Diagram 9. This type of operation can be performed very efficiently on some computers. Of course, it will generally not be more efficient than an exchange, since rotate(2) is equivalent to P[1]:=:P[2].

DIAGRAM 9.

The most straightforward way to make use of such an operation is a direct recursive implementation like Algorithm 1:

procedure permutations(N);
begin c:=1;
loop:
    if N>2 then permutations(N-1) endif;
    rotate(N);
while c<N:
    process;
    c:=c+1
repeat;
end;

When the recursion is removed from this program in the way that we removed the recursion from Algorithm 1, we get an old algorithm which was discovered by C. Tompkins and L. J. Paige in 1956 [44]:

Algorithm 5 (Tompkins-Paige)

i:=N; loop: c[i]:=1 while i>2: i:=i-1 repeat;
process;
loop:
    rotate(i);
    if c[i]<i then c[i]:=c[i]+1; i:=2;
        process;
    else c[i]:=1; i:=i+1
    endif;
while i<=N repeat;

This is nothing more than a simple counting program with rotation added. The rotation networks and permutation sequences generated by this algorithm are given in Fig. 9. As in Fig. 7, the parenthesized permutations are redundant permutations produced in the course of the computation, which are not passed through the process macro. The algorithm is not as inefficient as it may seem, because most of the rotations are short, but it clearly will not compete with the algorithms in Section 1. This method apparently represents the earliest attempt to get a real computer to generate permutations as quickly as possible. An ALGOL implementation was given by Peck and Schrack [33].

An interesting feature of the Tompkins-Paige method, and the reason it generates so many redundant sequences, is that the recursive procedure restores the permutation to the order it had upon entry. Programs that work this way are called backtrack programs [26, 27, 46, 48]. We can easily apply the same idea to exchange methods like Algorithm 1. For example:

procedure permutations(N);
begin c:=1;
loop:
    P[N]:=:P[c];
    if N>2 then permutations(N-1)
    else process endif;
    P[c]:=:P[N];
while c<N:
    c:=c+1
repeat;
end;

A procedure like this was given by C. T. Fike [13], who also gave a nonrecursive version [37] which is similar to a program developed independently by S. Pleszczyński [35]. These programs are clearly less efficient than the methods of Section 1, which have the same control structure but require many fewer exchanges.

Tompkins was careful to point out that it is often possible to achieve great savings easily with this type of procedure.
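The Tompkins-Paige scheme is short enough to transliterate directly. The following Python sketch is an illustration added here, not the paper's code: an output list stands in for the process macro, and the redundant rotations are simply not recorded.

```python
def rotate(P, i):
    """Cyclic left-rotation of the first i elements (the paper's rotate(i))."""
    P[:i] = P[1:i] + P[0:1]

def tompkins_paige(n):
    """Generate all n! permutations of 1..n by repeated prefix rotations."""
    P = list(range(1, n + 1))
    c = [1] * (n + 1)            # factorial counters; indices 0 and 1 unused
    out = [tuple(P)]
    i = 2
    while i <= n:
        rotate(P, i)
        if c[i] < i:             # rotation produced a new permutation
            c[i] += 1
            i = 2
            out.append(tuple(P))
        else:                    # full cycle: P[1..i] is back in order
            c[i] = 1
            i += 1
    return out
```

For N = 3 this yields the 3! = 6 permutations, skipping the redundant configurations that the full rotations restore.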

Often, process involves selecting a small set of permutations satisfying some simple criteria. In certain cases, once a permutation is found not to satisfy the criteria, then the permutations generated by permuting its initial elements will not satisfy the criteria either, so the exchanges and the call permutations(N-1) can be skipped. Thus large numbers of permutations never need be generated. A good backtracking program will eliminate nearly all of the processing (see [44] for a good example), and such methods are of great importance in many practical applications.

FIGURE 9. Tompkins-Paige algorithm for N = 2,3,4.

Cycling is a powerful operation, and we should expect to find some other methods which use it to advantage. In fact, Tompkins' original paper [44] gives a general proof from which several methods can be constructed. Remarkably, we can take Algorithm 5 and switch to the counter system upon which the iterative algorithms in Section 1 were based:

i:=N; loop: c[i]:=1 while i>1: i:=i-1 repeat;
process;
loop:
    rotate(N+1-i);
    if c[i]<N+1-i then c[i]:=c[i]+1; i:=1;
        process;
    else c[i]:=1; i:=i+1
    endif;
while i<=N repeat;

Although fewer redundant permutations are generated, longer rotations are involved, and this method is less efficient than Algorithm 5. However, this method does lend itself to a significant simplification, similar to the one Ives used. The condition c[i] = N+1-i merely indicates that the elements in P[1],P[2],...,P[N+1-i] have undergone a full rotation; that is, P[N+1-i] is back in its original position in the array. This means that if we initially set Q[i] = P[i] for 1 <= i <= N, then c[i] = N+1-i is equivalent to P[N+1-i] = Q[N+1-i]. But we have now removed the only test on c[i], and now it is not necessary to maintain the counter array at all! Making this simplification, and changing i to N+1-i, we have an algorithm proposed by G. Langdon in 1967 [25], shown in Fig. 10.
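Langdon's rotation method (Algorithm 6) is compact enough to test directly. Here is an illustrative Python transliteration, added here and not taken from the paper (0-based lists, with an output list in place of the process macro); rotations that put an element back in its home position produce the redundant configurations and are not recorded.

```python
def langdon(n):
    """Generate all n! permutations of 1..n by Langdon's rotation method."""
    P = list(range(1, n + 1))
    Q = P[:]                       # original contents, for the P[i] = Q[i] test
    out = [tuple(P)]
    i = n
    while i > 1:
        P[:i] = P[1:i] + P[0:1]    # rotate(i): left-rotate P[1..i]
        if P[i - 1] == Q[i - 1]:   # element i is home again: redundant state
            i -= 1
        else:
            i = n                  # new permutation: record it, restart at n
            out.append(tuple(P))
    return out
```

For N = 4 this begins 1 2 3 4, 2 3 4 1, 3 4 1 2, 4 1 2 3, 2 3 1 4, ..., matching the unparenthesized entries of Fig. 10.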

Algorithm 6 (Langdon)

i:=1; loop: Q[i]:=P[i] while i<N: i:=i+1 repeat;
process;
loop:
    rotate(i);
    if P[i]=Q[i] then i:=i-1
    else i:=N;
        process
    endif;
while i>1 repeat;

This is definitely the most simply expressed of our permutation generation algorithms. If P[1],...,P[N] are initially 1,...,N, then we can eliminate the initialization loop and replace Q[i] by i in the main loop. Unfortunately, this algorithm runs slowly on most computers; only when very fast rotation is available is it the method of choice. We shall examine the implementation of this algorithm in detail in Section 3.

FIGURE 10. Langdon's algorithm for N = 2,3,4.

Lexicographic Algorithms

A particular ordering of the N! permutations of N elements which is of interest is "lexicographic", or alphabetical, ordering. The lexicographic ordering of the 24 permutations of A B C D is shown in Fig. 11a. "Reverse lexicographic" ordering, the result of reading the lexicographic sequence backwards and the permutations from right to left, is also of some interest. Fig. 11b shows the reverse lexicographic ordering of the 24 permutations of A B C D.

The natural definition of these orderings has meant that many algorithms have been proposed for lexicographic permutation generation. Such algorithms are inherently less efficient than the algorithms of Section 1, because they must often use more than one exchange to pass from one permutation to the next in the sequence (e.g., to pass from A C D B to A D B C in Fig. 11a). The main practical reason that has been advanced in favor of lexicographic generation is that, in reverse order, all permutations of P[1],...,P[N-1] are generated before P[N] is moved. As with backtracking, a processing program could, for example, skip (N-1)! permutations in some instances. However, this property is shared by the recursive algorithms of Section 1; in fact, the general structure of Algorithm 1 allows P[N] to be filled in any arbitrary order, which could be even more of an advantage than lexicographic ordering in some instances.

Nevertheless, lexicographic generation is an interesting problem for which we are by now well prepared. We shall begin by assuming that P[1],...,P[N] are distinct. Otherwise, the problem is quite different, since it is usual to produce a lexicographic listing with no duplicates (see [3, 9, 39]). We shall find it convenient to make use of another primitive operation, reverse(i). This operation inverts the order of the elements in P[1],...,P[i]; thus, reverse(i) is equivalent to

i:=1;
loop while i<N+1-i:
    P[i]:=:P[N+1-i]; i:=i+1 repeat;

This operation will not be particularly efficient on most real computers, unless special hardware is available. However, it seems to be inherent in lexicographic generation.

The first algorithm that we shall consider is based on the idea of producing each permutation from its lexicographic predecessor. Hall and Knuth [15] found that the method has been rediscovered many times since being published in 1812 by Fischer and Krause [14]. The ideas involved first appear in the modern literature in a rudimentary form in an algorithm by G. Schrack and M. Shimrat [40]. A full formulation was given by M. Shen in 1962 [41, 42], and Phillips [34] gives an "optimized" implementation. (Dijkstra [6] cites the problem as an example to illustrate a "dramatic improvement in the state of the art" of computer programming since the algorithm is easily expressed in modern languages; the fact that the method is over 160 years old is perhaps a more sobering comment on the state of the art of computer science.)

The direct lexicographic successor of the permutation

B A C F H G D E

is clearly

B A C F H G E D,

but what is the successor of this permutation? After some study, we see that H G E D are in their lexicographically highest position, so the next permutation must begin as B A C G. The answer is the lexicographically lowest permutation that begins in this way, or

B A C G D E F H.

Similarly, the direct successor of

H F E D G C A B

in reverse lexicographic order is

D E G H F C A B,

and its successor is

E D G H F C A B.

FIGURE 11. Lexicographic ordering of A B C D. (a) In natural order. (b) In reverse order.

The algorithm to generate permutations in reverse lexicographic order can be clearly understood from this example. We first scan from left to right to find the first i such that P[i] > P[i-1]. If there is no such i, then the elements are in reverse order and we terminate the algorithm since there is no successor. (For efficiency, it is best to make P[N+1] larger than all the other elements, written P[N+1] = ∞, and terminate when i = N+1.) Otherwise, we exchange P[i] with the next-lowest element among P[1],...,P[i-1] and then reverse P[1],...,P[i-1]. We have

Algorithm 7 (Fischer-Krause)

P[N+1]:=∞;
process;
loop:
    i:=2; loop while P[i]<P[i-1]: i:=i+1 repeat;
while i<=N:
    j:=1; loop while P[j]>P[i]: j:=j+1 repeat;
    P[i]:=:P[j];

    reverse(i-1);
    process;
repeat;

Like the Tompkins-Paige algorithm, this algorithm is not as inefficient as it seems, since it is most often scanning and reversing short strings.

This seems to be the first example we have seen of an algorithm which does not rely on "factorial counting" for its control structure. However, the control structure here is overly complex; indeed, factorial counters are precisely what is needed to eliminate the inner loops in Algorithm 7. A more efficient algorithm can be derived, as we have done several times before, from a recursive formulation. A simple recursive procedure to generate the permutations of P[1],...,P[N] in reverse lexicographic order can be quickly devised:

procedure lexperms(N);
begin c:=1;
loop:
    if N>2 then lexperms(N-1)
    else process endif;
while c<N:
    P[N]:=:P[c];
    reverse(N-1);
    c:=c+1
repeat;
end;

Removing the recursion precisely as we did for Algorithm 1, we get

Algorithm 8 (Ord-Smith)

i:=N; loop: c[i]:=1; while i>2: i:=i-1 repeat;
process;
loop:
    if c[i]<i then P[i]:=:P[c[i]]; reverse(i-1);
        c[i]:=c[i]+1; i:=2;
        process;
    else c[i]:=1; i:=i+1
    endif;
while i<=N repeat;

This algorithm was first presented by R. J. Ord-Smith in 1967 [32]. We would not expect a priori to have a lexicographic algorithm so similar to the normal algorithms, but the recursive formulation makes it obvious.

Ord-Smith also developed [31] a "pseudo-lexicographic" algorithm which consists of replacing P[i]:=:P[c[i]]; reverse(i-1); by reverse(i) in Algorithm 8. There seem to be no advantages to this method over methods like Algorithm 1. Howell [17, 18] gives a lexicographic method based on treating P[1],...,P[N] as a base-N number, counting in base N, and rejecting numbers whose digits are not distinct. This method is clearly very slow.

Random Permutations

If N is so large that we could never hope to generate all permutations of N elements, it is of interest to study methods for generating "random" permutations of N elements. This is normally done by establishing some one-to-one correspondence between a permutation and a random number between 0 and N!-1. (A full treatment of pseudorandom number generation by computer may be found in [22].)

First, we notice that each number between 0 and N!-1 can be represented in a mixed radix system to correspond to an array c[N],c[N-1],...,c[2] with 0 <= c[i] <= i-1 for 2 <= i <= N. For example, 1000 corresponds to 1 2 1 2 2 0 since 1000 = 6! + 2·5! + 4! + 2·3! + 2·2!. For 0 <= n < N!, we have n = c[2]·1! + c[3]·2! + ... + c[N]·(N-1)!. This correspondence is easily established through standard radix conversion algorithms [22, 27, 47]. Alternatively, we could fill the array by putting a "random" number between 0 and i-1 in c[i] for 2 <= i <= N.

Such arrays c[N],c[N-1],...,c[2] can clearly be generated by the factorial counting procedure discussed in Section 1, so that there is an implicit correspondence between such arrays and permutations of 1 2 ... N. The algorithms that we shall examine now are based on more explicit correspondences.

The first correspondence has been attributed to M. Hall, Jr., in 1956 [15], although it may be even older. In this correspondence, c[i] is defined to be the number of elements to the left of i which are smaller than it. Given an array, the following example shows how to construct the corresponding permutation. To find the permutation of 1 2 ... 7 corresponding to

1 2 1 2 2 0


we begin by writing down the first element, 1. Since c[2] = 0, 2 must precede 1, or

2 1.

Similarly, c[3] = 2 means that 3 must be preceded by both 2 and 1:

2 1 3

Proceeding in this manner, we build up the entire permutation as follows:

2 1 4 3
2 5 1 4 3
2 5 6 1 4 3
2 7 5 6 1 4 3

In general, if we assume that c[1] = 0, we can construct the permutation P[1],...,P[N] with the program

i:=1;
loop:
    j:=i;
    loop while j>c[i]+1: P[j]:=P[j-1]; j:=j-1 repeat;
    P[c[i]+1]:=i;
    i:=i+1;
while i<=N repeat;

This program is not particularly efficient, but the correspondence is of theoretic interest. In a permutation P[1],...,P[N] of 1,...,N, a pair (i,j) such that i < j and P[i] > P[j] is called an inversion. The counter array in Hall's correspondence counts the number of inversions in the corresponding permutation and is called an inversion table. Inversion tables are helpful combinatorial devices in the study of several sorting algorithms (see [23]). Another example of the relationship between inversion tables and permutation generation can be found in [7], where Dijkstra reinvents the Johnson-Trotter method using inversion tables.

D. H. Lehmer [26, 27] describes another correspondence that was defined by D. N. Lehmer as long ago as 1906. To find the permutation corresponding to 1 2 1 2 2 0, we first increment each element by 1 to get

2 3 2 3 3 1.

Now, we write down 1 as before, and for i = 2,...,N, we increment all numbers which are >= c[i] by 1 and write c[i] to the left. In this way we generate

1
1 2
3 1 2
3 4 1 2
2 4 5 1 3
3 2 5 6 1 4
2 4 3 6 7 1 5

so that the permutation 2 4 3 6 7 1 5 corresponds to 1000. In fact, 2 4 3 6 7 1 5 is the 1000th permutation of 1,...,7 in lexicographic order: there are 6! permutations before it which begin with 1, then 2·5! which begin 2 1 or 2 3, then 4! which begin 2 4 1, then 2·3! which begin 2 4 3 1 or 2 4 3 5, and then 2·2! which begin 2 4 3 6 1 or 2 4 3 6 5.

A much simpler method than the above two was apparently first published by R. Durstenfeld [8]. (See also [22], p. 125.) We notice that P[i] has i-1 elements preceding it, so we can use the c array as follows:

i:=N;
loop while i>=2: P[i]:=:P[c[i]+1]; i:=i-1 repeat;

If we take P[1],...,P[N] to be initially 1,...,N, then the array 1 2 1 2 2 0 corresponds to the permutation 5 1 4 6 7 3 2. This method involves only one scan through the array and is clearly more efficient than the above two methods.

We could easily construct a program to generate all permutations of 1 2 ... N by embedding one of the methods described above in a factorial counting control structure as defined in Section 1. Such a program would clearly be much slower than the exchange methods described above, because it must build the entire array P[1],...,P[N] where they do only a simple exchange. Coveyou and Sullivan [4] give an algorithm that works this way. Another method is given by Robinson [37].

3. IMPLEMENTATION AND ANALYSIS

The analysis involved in comparing a number of computer algorithms to perform the same task can be very complex. It is often necessary not only to look carefully at how the algorithms will be implemented on real computers, but also to carry out some complex mathematical analysis. Fortunately, these factors present less difficulty than usual in the case of
permutation generation. First, since all of For example, A D D 1,1 means "increment
the algorithms have the same control Register I by r'; A D D l,J means "add the
structure, comparisons between many of contents of Register J to Register r'; and
them are immediate, and we need only A D D I,C(J) means "add to Register I the
examine a few in detail. Second, the anal- contents of the m e m o r y location whose ad-
ysis involved in determining the total run- dress is found by adding the contents of
ning time of the algorithms on real com- Register J to C". In addition, we shall use
puters (by counting the total number of control transfer instructions of the form
times each instruction is executed) is not OPCODE LABEL
difficult, because of the simple counting
algorithms upon which the programs are namely JMP (unconditional transfer); JL,
based. JLE, JE, JGE, JG (conditional transfer ac-
If we imagine that we have an impor- cording as whether the firstoperand in the
tant application where all N! permuta- last C M P instruction was <, -<, =, ->, >
tions must be generated as fast as possible, than the second); and C A L L (subroutine
it is easy to see that the programs must be call). Other conditional jump instructions
carefully implemented. For example, if we are of the form
are generating, say, every permutation of OPCODE REGISTER,LABEL
12 elements, then every extraneous in-
struction in the inner loop of the program namely JN, JZ, JP (transfer if the specified
will make it run at least 8 minutes longer on most computers (see Table 1).

Evidently, from the discussion in Section 1, Heap's method (Algorithm 2) is the fastest of the recursive exchange algorithms examined, and Ives' method (Algorithm 4) is the fastest of the iterative exchange algorithms. All of the algorithms in Section 2 are clearly slower than these two, except possibly for Langdon's method (Algorithm 6), which may be competitive on machines offering a fast rotation capability. In order to draw conclusions comparing these three algorithms, we shall consider in detail how they can be implemented in assembly language on real computers, and we shall analyze exactly how long they can be expected to run.

As we have done with the high-level language, we shall use a mythical assembly language from which programs on real computers can be easily implemented. (Readers unfamiliar with assembly language should consult [21].) We shall use load (LD), store (ST), add (ADD), subtract (SUB), and compare (CMP) instructions which have the general form

    LABEL      OPCODE   REGISTER,OPERAND
    (optional)

The first operand will always be a symbolic register name, and the second operand may be a value, a symbolic register name, or a symbolic memory location (conditional jump instructions, such as the JN, JZ, and JP used below, test whether the value of a register is negative, zero, positive). Most machines have capabilities similar to these, and readers should have no difficulty translating the programs given here to particular assembly languages.

Much of our effort will be directed towards what is commonly called code optimization: developing assembly language implementations which are as efficient as possible. This is, of course, a misnomer: while we can usually improve programs, we can rarely "optimize" them. A disadvantage of optimization is that it tends to greatly complicate a program. Although significant savings may be involved, it is dangerous to apply optimization techniques at too early a stage in the development of a program. In particular, we shall not consider optimizing until we have a good assembly language implementation which we have fully analyzed, so that we can tell where the improvements will do the most good. Knuth [24] presents a fuller discussion of these issues.

Many misleading conclusions have been drawn and reported in the literature based on empirical performance statistics comparing particular implementations of particular algorithms. Empirical testing can be valuable in some situations, but, as we have seen, the structures of permutation generation algorithms are so similar that the empirical tests which have been per-

Computing Surveys, Vol. 9, No. 2, June 1977


formed have really been comparisons of compilers, programmers, and computers, not of algorithms. We shall see that the differences between the best algorithms are very subtle, and they will become most apparent as we analyze the assembly language programs. Fortunately, the assembly language implementations aren't much longer than the high-level descriptions. (This turns out to be the case with many algorithms.)

A Recursive Method (Heap)

We shall begin by looking at the implementation of Heap's method. A direct "hand compilation" of Algorithm 2 leads immediately to Program 1. The right-hand side of the program listing simply repeats the text of Algorithm 2; each statement is attached to its assembly-language equivalent.

This direct translation of Algorithm 2 is more efficient than most automatic translators would produce; it can be further improved in at least three ways. First, as we have already noted in Section 1, the test while i <= N need not be performed after we have set i to 2 (if we assume N > 1), so we can replace JMP WHILE in Program 1 by JMP LOOP. But this unconditional jump can be removed from the inner loop by moving the three instructions at LOOP down in its place (this is called rotating the loop; see [24]). Second, the test for whether i is even or odd can be made more efficient by maintaining a separate Register X which is defined to be 1 if i is even and -1 if i is odd. (This improvement applies to most computers, since few have a jump if even instruction.) Third, the variable k can be eliminated, and some time saved, if c[i] is updated before the ex-

PROGRAM 1. DIRECT IMPLEMENTATION OF HEAP'S METHOD

          LD   Z,1
          LD   I,N                i := N;
INIT      ST   Z,C(I)             loop: c[i] := 1;
          CMP  I,2
          JLE  CALL               while i >= 2:
          SUB  I,1                i := i - 1;
          JMP  INIT               repeat;
CALL      CALL PROCESS            process;
LOOP      LD   J,C(I)             loop:
          CMP  J,I
          JE   ELSE               if c[i] <> i
THEN      LD   T,I                then
          AND  T,1
          JZ   T,EVEN             if i odd
          LD   K,1                then k := 1
          JMP  EXCH               else k := c[i]
EVEN      LD   K,J                endif;
EXCH      LD   T,P(I)
          LD   T1,P(K)
          ST   T1,P(I)
          ST   T,P(K)             P[i] :=: P[k];
          ADD  J,1
          ST   J,C(I)             c[i] := c[i] + 1;
          LD   I,2                i := 2;
          CALL PROCESS            process;
          JMP  WHILE
ELSE      ST   Z,C(I)             else c[i] := 1;
          ADD  I,1                i := i + 1 endif;
WHILE     CMP  I,N                while i <= N
          JLE  LOOP               repeat;




change is made, freeing Register J. These improvements lead to Program 2. (The program uses the instruction LDN X, which simply complements Register X.) Notice that even if we were to precompute the index table, as in Algorithm 1, we probably would not have a program as efficient as Program 2. On computers with a memory-to-memory move instruction, we might gain further efficiency by implementing the exchange in three instructions rather than in four.

Each instruction in Program 2 is labelled with the number of times it is executed when the program is run to completion. These labels are arrived at through a flow analysis which is not difficult for this program. For example, CALL PROCESS (and the two instructions preceding it) must be executed exactly N! times, since the program generates all N! permutations. The instruction at CALL can be reached via JLE CALL (which happens exactly once) or by falling through from the preceding instruction (which therefore must happen N! - 1 times). Some of the instruction frequencies are more complicated (we shall analyze the quantities A_N and B_N in detail below), but all of the instructions can be labelled in this manner (see [31]). From these frequencies, we can calculate the total running time of the program, if we know the time taken by the individual instructions. We shall assume that instructions which reference data in memory take two time units, while jump instructions and other instructions which do not reference data in memory take one time unit. Under this model, the total running time of Program 2 is

    19N! + A_N + 10B_N + 6N - 20

time units. These coefficients are typical, and a similar exact expression can easily be derived for any particular implementation on any particular real machine.

The improvements by which we derived Program 2 from Program 1 are applicable to most computers, but they are intended
only as examples of the types of simple transformations which can lead to substantial improvements when programs are implemented. Each of the improvements results in one less instruction which is executed N! times, so its effect is significant. Such coding tricks should be applied only when an algorithm is well understood, and then only to the parts of the program which are executed most frequently. For example, the initialization loop of Program 2 could be rotated for a savings of N - 2 time units, but we have not bothered with this because the savings is so insignificant compared with the other improvements. On the other hand, further improvements within the inner loop will be available on many computers. For example, most computers have a richer set of loop control instructions than we have used: on many machines the last three instructions in Program 2 can be implemented with a single command. In addition, we shall examine another, more advanced improvement below.

To properly determine the effectiveness of these improvements, we shall first complete the analysis of Program 2. In order to do so, we need to analyze the quantities A_N

PROGRAM 2. IMPROVED IMPLEMENTATION OF HEAP'S METHOD

          LD   Z,1             1
          LD   I,N             1
INIT      ST   Z,C(I)          N-1
          CMP  I,2             N-1
          JLE  CALL            N-1
          SUB  I,1             N-1
          JMP  INIT            N-1
THEN      ADD  J,1             N!-1
          ST   J,C(I)          N!-1
          JP   X,EXCH          N!-1
          LD   J,2             A_N
EXCH      LD   T,P(I)          N!-1
          LD   T1,P-1(J)       N!-1
          ST   T1,P(I)         N!-1
          ST   T,P-1(J)        N!-1
CALL      LD   I,2             N!
          LD   X,1             N!
          CALL PROCESS         N!
LOOP      LD   J,C(I)          N!+B_N-1
          CMP  J,I             N!+B_N-1
          JL   THEN            N!+B_N-1
ELSE      ST   Z,C(I)          B_N
          LDN  X,X             B_N
          ADD  I,1             B_N
          CMP  I,N             B_N
          JLE  LOOP            B_N



and B_N. In the algorithm, A_N is the number of times the test i odd succeeds, and B_N is the number of times the test c[i] = i succeeds. By considering the recursive structure of the algorithm, we quickly find that the recurrence relations

    A_N = N A_{N-1} + (0 if N even; N-1 if N odd)

and

    B_N = N B_{N-1} + 1

hold for N > 1, with A_1 = B_1 = 0. These recurrences are not difficult to solve. For example, dividing both sides of the equation for B_N by N! we get

    B_N/N! = B_{N-1}/(N-1)! + 1/N!
           = B_{N-2}/(N-2)! + 1/(N-1)! + 1/N!
           = ... = 1/2! + 1/3! + ... + 1/N!

or

    B_N = N! SUM_{2<=k<=N} 1/k!

This can be more simply expressed in terms of the base of the natural logarithms, e, which has the series expansion SUM_{k>=0} 1/k!: it is easily verified that

    B_N = [N!(e-2)].

That is, B_N is the integer part of the real number N!(e-2) (or B_N = N!(e-2) - eps with 0 <= eps < 1). The recurrences for A_N can be solved in a similar manner to yield the result

    A_N = [N!/e].

Substituting these into the expression above, we find that the total running time of Program 2 is

    (19 + (1/e) + 10(e-2))N! + 6N + O(1),

or about 26.55N! time units. Table 3 shows the values of the various quantities in this analysis.

We now have a carefully implemented program whose performance we understand, and it is appropriate to consider how the program can be further "optimized." A standard technique is to identify situations that occur frequently and handle them separately in as efficient a manner as possible. For example, every other exchange performed by Program 2 is simply P[1] :=: P[2]. Rather than have the program go all the way through the main loop to discover this, incrementing and then testing c[2], etc., we can gain efficiency by simply replacing i := 2 by i := 3; process; P[1] :=: P[2] in Algorithm 2. (For the purpose of this discussion assume that there is a statement i := 2 following the initialization loop in Algorithm 2.) In general, for any n > 1, we can replace i := 2 by i := n+1; process all permutations of P[1], ..., P[n]. This idea was first applied to the permutation enumeration problem by Boothroyd [2]. For small n, we can quite compactly write in-line code to generate all permutations of P[1], ..., P[n]. For example, taking n = 3 we may simply replace

CALL      LD   I,2
          LD   X,1
          CALL PROCESS

in Program 2 by the code in Program 3,

TABLE 3. ANALYSIS OF PROGRAM 2 (T_N = 19N! + A_N + 10B_N + 6N - 20)

  N          N!         A_N          B_N           T_N        26.55N!
  1           1           0            0
  2           2           0            1            40            53+
  3           6           2            4           154           159+
  4          24           8           17           638           637+
  5         120          44           86          3194           3186
  6         720         264          517         19130          19116
  7        5040        1854         3620        133836         133812
  8       40320       14832        28961       1070550        1070496
  9      362880      133496       260650       9634750        9634464
 10     3628800     1334960      2606501      96347210       96344640
 11    39916800    14684570     28671512    1059818936     1059791040
 12   479001600   176214840    344058145   12717826742    12717492480



which efficiently permutes P[1], P[2], P[3]. (While only the code that differs from Program 2 is given here, "Program 3" refers to the entire improved program.)

The analysis of Program 3 differs only slightly from that of Program 2. This is fortunate, for it is often difficult to determine the exact effect of such major improvements. First, each of the new instructions is clearly executed N!/6 times, and each occurrence of N! in Program 2's frequencies becomes N!/6 for Program 3; thus, the total running time is

    (50/6)N! + A'_N + 10B'_N + 6N - 20.

Next, the analysis for A_N and B_N given above still holds, except that the initial conditions are different. We find that

    A'_N = N! SUM_{4<=k<=N} (-1)^k/k!   and   B'_N = N! SUM_{4<=k<=N} 1/k!,

and the total running time of Program 3 is then about 8.88N!.

By taking larger values of n we can get further improvements, but at the cost of n + 3n! lines of code. This is an example of a space-time tradeoff where the time saved is substantial when n is small, but the space consumed becomes substantial when n is large. For n = 4, the total running time goes down to about 5.88N! and it is probably not worthwhile to go further, since the best that we could hope for would be 5N! (the cost of two stores and a call). On most computers, if Program 2 is "optimized" in the manner of Program 3 with n = 4, Heap's method will run faster than any other known method.

An Iterative Method (Ives)

The structures of Algorithm 2 and Algorithm 4 are very similar, so that a direct "hand compilation" of Ives' method looks very much like Program 1. By rotating the loop and maintaining the value N+1-i in a separate register we get Program 4, an improved implementation of Ives' method which corresponds to the improved implementation of Heap's method in Program 2. The total running time of this program is

    18N! + 21D_N + 10N - 25,

where D_N is the number of times i := i + 1 is executed in Algorithm 4. Another quantity, C_N, the number of times the test P[N+1-i] = Q[N+1-i] fails, happens to cancel out when the total running time is computed. These quantities can be analyzed in much the same way that A_N and B_N were analyzed for Program 2: they satisfy the recurrences

    C_N = C_{N-2} + (N-1)! - (N-2)!
    D_N = D_{N-2} + (N-2)!

so that

    C_N = (N-1)! - (N-2)! + (N-3)! - (N-4)! + ...
    D_N = (N-2)! + (N-4)! + (N-6)! + ...

and the total running time of Program 4 is

    18N! + 21(N-2)! + O((N-4)!),

or about

    (18 + 21/(N(N-1)))N!.

Thus Program 4 is faster than Program 2: the improved implementation of Ives' method uses less overhead per permuta-

PROGRAM 3. OPTIMIZED INNER LOOP FOR PROGRAM 2

CALL      LD   I,4
          LD   X,1
          CALL PROCESS
          LD   T1,P(1)
          LD   T2,P(2)
          LD   T3,P(3)
          ST   T1,P(2)
          ST   T2,P(1)
          CALL PROCESS
          ST   T3,P(1)
          ST   T2,P(3)
          CALL PROCESS
          ST   T1,P(1)
          ST   T3,P(2)
          CALL PROCESS
          ST   T2,P(1)
          ST   T1,P(3)
          CALL PROCESS
          ST   T3,P(1)
          ST   T2,P(2)
          CALL PROCESS




PROGRAM 4. IMPROVED IMPLEMENTATION OF IVES' METHOD

          LD   I,N             1
INIT      ST   I,C(I)          N-1
          LD   V,P(I)          N-1
          ST   V,Q(I)          N-1
          CMP  I,1             N-1
          JLE  CALL            N-1
          SUB  I,1             N-1
          JMP  INIT            N-1
THEN      LD   T,P(J)          N!-C_N-1
          LD   T1,P+1(J)       N!-C_N-1
          ST   T1,P(J)         N!-C_N-1
          ST   T,P+1(J)        N!-C_N-1
          ADD  J,1             N!-C_N-1
          ST   J,C(I)          N!-C_N-1
CALL      LD   I,1             N!-C_N
          LD   H,N             N!-C_N
          CALL PROCESS         N!
LOOP      LD   J,C(I)          N!+D_N-1
          CMP  J,H             N!+D_N-1
          JL   THEN            N!+D_N-1
ELSE      LD   T,P(I)          C_N+D_N
          LD   T1,P(H)         C_N+D_N
          ST   T1,P(I)         C_N+D_N
          ST   T,P(H)          C_N+D_N
          ST   I,C(I)          C_N+D_N
          CMP  T,Q(H)          C_N+D_N
          JNE  CALL            C_N+D_N
          ADD  I,1             D_N
          SUB  H,1             D_N
          CMP  I,H             D_N
          JL   LOOP            D_N

tion than the improved implementation of Heap's method, mainly because it does less counter manipulation. Other iterative methods, like the Johnson-Trotter algorithm (or the version of Ives' method, Algorithm 4a, which does not require the elements to be distinct), are only slightly faster than Heap's method.

However, the iterative methods cannot be optimized quite as completely as we were able to improve Heap's method. In Algorithm 4 and Program 4, the most frequent operation is P[c[N]] :=: P[c[N]+1]; c[N] := c[N]+1; all but 1/N of the exchanges are of this type. Therefore, we should program this operation separately. (This idea was used by Ehrlich [10, 11].) Program 4 can be improved by inserting the code given in Program 5 directly after CALL PROCESS. (As before, we shall write down only the new code, but make reference to the entire optimized program as "Program 5".) In this program, Pointer J is kept negative so that we can test it against zero, which can be done efficiently on many computers. Alternatively, we could sweep in the other direction, and have J range from N - 1 to 0. Neither of these tricks may be necessary on computers with advanced loop control instructions.

To find the total running time of Program 5, it turns out that we need only replace N! by (N-2)! everywhere in the frequencies in Program 4, and then add the frequencies of the new instructions. The result is

    9N! + 2(N-1)! + 18(N-2)! + O((N-4)!),

not quite as fast as the "optimized" version of Heap's algorithm (Program 3). For a fixed value of N, we could improve the program further by completely unrolling the inner loop of Program 5. The second through eighth instructions of Program 5 could be replaced by

          LD   T,P+1
          ST   T,P
          ST   V,P+1
          CALL PROCESS
          LD   T,P+2
          ST   T,P+1
          ST   V,P+2
          CALL PROCESS
          LD   T,P+3
          ST   T,P+2
          ST   V,P+3
          CALL PROCESS

(This could be done, for example, by a macro generator.) This reduces the total running time to

    7N! + (N-1)! + 18(N-2)! + O((N-4)!)

which is not as fast as the comparable highly optimized version of Heap's method (with n = 4).

It is interesting to note that the optimization technique which is appropriate for the recursive programs (handling small cases separately) is much more effective than the optimization technique which is



PROGRAM 5. OPTIMIZED INNER LOOP FOR PROGRAM 4

EXLP      CALL PROCESS         (N-1)!
          LD   J,1-N           (N-1)!
INLP      LD   T,P+N+1(J)      N!-(N-1)!
          ST   T,P+N(J)        N!-(N-1)!
          ST   V,P+N+1(J)      N!-(N-1)!
          CALL PROCESS         N!-(N-1)!
          ADD  J,1             N!-(N-1)!
          JN   J,INLP          N!-(N-1)!
          LD   T,P+1           (N-1)!
          ST   T,P+N           (N-1)!
          ST   V,P+1           (N-1)!
          CMP  T,Q+N           (N-1)!
          JNE  EXLP            (N-1)!

appropriate for the iterative programs (loop unrolling).

A Cyclic Method (Langdon)

It is interesting to study Langdon's cyclic method (Algorithm 6) in more detail, because it can be implemented with only a few instructions on many computers. In addition, it can be made to run very fast on computers with hardware rotation capabilities.

To implement Algorithm 6, we shall use a new instruction

    MOVE TO,FROM(I)

which, if Register I contains the number i, moves i words starting at Location FROM to Location TO. That is, the above instruction is equivalent to

          LD   J,0
LOOP      LD   T,FROM(J)
          ST   T,TO(J)
          ADD  J,1
          CMP  J,I
          JL   LOOP

We shall assume that memory references are overlapped, so that the instruction takes 2i time units. Many computers have "block transfer" instructions similar to this, although the details of implementation vary widely.

For simplicity, let us further suppose that P[1], ..., P[N] are initially the integers 0, 1, ..., N-1, so that we don't have to bother with the Q array of Algorithm 6.

With these assumptions, Langdon's method is particularly simple to implement, as shown in Program 6. Only eight assembly language instructions will suffice on many computers to generate all permutations of {0, 1, ..., N-1}.

As we have already noted, however, the MOVEs tend to be long, and the method is not particularly efficient if N is not small. Proceeding as we have before, we see that E_N and F_N satisfy

    E_N = SUM_{1<=k<=N-1} k!
    F_N = SUM_{1<=k<=N} k*k! = (N+1)! - 1

(Here F_N is not the frequency of execution of the MOVE instruction, but the total number of words moved by it.) The total running time of Program 6 turns out to be

    N!(2N + 10) + 9(N-1)! + O((N-2)!).

It is faster than Program 2 for N < 8 and faster than Program 4 for N < 4, but it is much slower for larger N.

By almost any measure, Program 6 is the simplest of the programs and algorithms that we have seen so far. Furthermore, on most computer systems it will run faster than any of the algorithms implemented in a high-level language. The algorithm fueled a controversy of sorts (see other references in [25]) when it was first introduced, based on just this issue.

Furthermore, if hardware rotation is available, Program 6 may be the method of choice. Since (N-1)/N of the rotations are of length N, the program may be optimized in the manner of Program 5 around a four-instruction inner loop (call, rotate, compare, conditional jump). On some ma-

PROGRAM 6. IMPLEMENTATION OF LANGDON'S METHOD

THEN      LD   I,N-1           N!
          CALL PROCESS         N!
LOOP      LD   T,P+1           N!+E_N
          MOVE P,P+1(I)        F_N
          ST   T,P+1(I)        N!+E_N
          CMP  T,I             N!+E_N
          JNE  THEN            N!+E_N
          SUB  I,1             E_N
          JNZ  LOOP            E_N



chines, the rotate might be performed in, say, two time units (for example, if parallelism were available, or if P were maintained in registers), which would lead to a total time of 5N! + O((N-1)!). We have only sketched details here because the issues are so machine-dependent: the obvious point is that exotic hardware features can have drastic effects upon the choice of algorithm.

CONCLUSION

The remarkable similarity of the many permutation enumeration algorithms which have been published has made it possible for us to draw some very definite conclusions regarding their performance. In Section 1, we saw that the method given by Heap is slightly simpler (and therefore slightly more efficient) than the methods of Wells and Boothroyd, and that the method given by Ives is simpler and more efficient than the methods of Johnson and Trotter (and Ehrlich). In Section 2, we found that the cyclic and lexicographic algorithms will not compete with these, except possibly for Langdon's method, which avoids some of the overhead in the control structure inherent in the methods. By carefully implementing these algorithms in Section 3 and applying standard code optimization techniques, we found that Heap's method will run fastest on most computers, since it can be coded so that most permutations are generated with only two store instructions.

However, as discussed in the Introduction, our accomplishments must be kept in perspective. An assembly-language implementation such as Program 3 may run 50 to 100 times faster than the best previously published algorithms (in high-level languages) on most computer systems, but this means merely that we can now generate all permutations of 12 elements in one hour of computer time, where before we could not get to N = 11. On the other hand, if we happen to be interested only in all permutations of 10 elements, we can now get them in only 15 seconds, rather than 15 minutes.

The problem of comparing different algorithms for the same task arises again and again in computer science, because new algorithms (and new methods of expressing algorithms) are constantly being developed. Normally, the kind of detailed analysis and careful implementation done in this paper is reserved for the most important algorithms. But permutation generation nicely illustrates the important issues. An appropriate choice between algorithms for other problems can be made by studying their structure, implementation, analysis, and "optimization" as we have done for permutation generation.

ACKNOWLEDGMENTS

Thanks are due to P. Flanagan, who implemented and checked many of the algorithms and programs on a real computer. Also, I must thank the many authors listed below for providing me with such a wealth of material, and I must apologize to those whose work I may have misunderstood or misrepresented. Finally, the editor, P. J. Denning, must be thanked for his many comments and suggestions for improving the readability of the manuscript.

REFERENCES

[1] BOOTHROYD, J. "PERM (Algorithm 6)," Computer Bulletin 9, 3 (Dec. 1965), 104.
[2] BOOTHROYD, J. "Permutation of the elements of a vector (Algorithm 29)"; and "Fast permutation of the elements of a vector (Algorithm 30)," Computer J. 10 (1967), 310-312.
[3] BRATLEY, P. "Permutations with repetitions (Algorithm 306)," Comm. ACM 10, 7 (July 1967), 450.
[4] COVEYOU, R. R.; AND SULLIVAN, J. G. "Permutation (Algorithm 71)," Comm. ACM 4, 11 (Nov. 1961), 497.
[5] DERSHOWITZ, N. "A simplified loop-free algorithm for generating permutations," BIT 15 (1975), 158-164.
[6] DIJKSTRA, E. W. A discipline of programming, Prentice-Hall, Englewood Cliffs, N.J., 1976.
[7] DIJKSTRA, E. W. "On a gauntlet thrown by David Gries," Acta Informatica 6, 4 (1976), 357.
[8] DURSTENFELD, R. "Random permutation (Algorithm 235)," Comm. ACM 7, 7 (July 1964), 420.
[9] EAVES, B. C. "Permute (Algorithm 130)," Comm. ACM 5, 11 (Nov. 1962), 551. (See also: remarks by R. J. Ord-Smith in Comm. ACM 10, 7 (July 1967), 452-3.)
[10] EHRLICH, G. "Loopless algorithms for generating permutations, combinations and other combinatorial configurations," JACM 20, 3 (July 1973), 500-513.
[11] EHRLICH, G. "Four combinatorial algorithms (Algorithm 466)," Comm. ACM 16, 11 (Nov. 1973), 690-691.



[12] EVEN, S. Algorithmic combinatorics, Macmillan, Inc., N.Y., 1973.
[13] FIKE, C. T. "A permutation generation method," Computer J. 18, 1 (Feb. 1975), 21-22.
[14] FISCHER, L. L.; AND KRAUSE, K. C. Lehrbuch der Combinationslehre und der Arithmetik, Dresden, 1812.
[15] HALL, M.; AND KNUTH, D. E. "Combinatorial analysis and computers," American Math. Monthly 72, 2 (Feb. 1965, Part II), 21-28.
[16] HEAP, B. R. "Permutations by interchanges," Computer J. 6 (1963), 293-4.
[17] HOWELL, J. R. "Generation of permutations by addition," Math. Comp. 16 (1962), 243-44.
[18] HOWELL, J. R. "Permutation generator (Algorithm 87)," Comm. ACM 5, 4 (April 1962), 209. (See also: remarks by R. J. Ord-Smith in Comm. ACM 10, 7 (July 1967), 452-3.)
[19] IVES, F. M. "Permutation enumeration: four new permutation algorithms," Comm. ACM 19, 2 (Feb. 1976), 68-72.
[20] JOHNSON, S. M. "Generation of permutations by adjacent transposition," Math. Comp. 17 (1963), 282-285.
[21] KNUTH, D. E. "Fundamental algorithms," in The art of computer programming 1, Addison-Wesley, Inc., Reading, Mass., 1968.
[22] KNUTH, D. E. "Seminumerical algorithms," in The art of computer programming 2, Addison-Wesley, Inc., Reading, Mass., 1969.
[23] KNUTH, D. E. "Sorting and searching," in The art of computer programming 3, Addison-Wesley, Inc., Reading, Mass., 1972.
[24] KNUTH, D. E. "Structured programming with go to statements," Computing Surveys 6, 4 (Dec. 1974), 261-301.
[25] LANGDON, G. G., JR. "An algorithm for generating permutations," Comm. ACM 10, 5 (May 1967), 298-9. (See also: letters by R. J. Ord-Smith in Comm. ACM 10, 11 (Nov. 1967), 684; by B. E. Rodden in Comm. ACM 11, 3 (March 1968), 150; CR Rev. 13,891, Computing Reviews 9, 3 (March 1968); and letter by Langdon in Comm. ACM 11, 6 (June 1968), 392.)
[26] LEHMER, D. H. "Teaching combinatorial tricks to a computer," in Proc. of Symposium Appl. Math., Combinatorial Analysis, Vol. 10, American Mathematical Society, Providence, R.I., 1960, 179-193.
[27] LEHMER, D. H. "The machine tools of combinatorics," in Applied combinatorial mathematics (E. F. Beckenbach, [Ed.]), John Wiley & Sons, Inc., N.Y., 1964.
[28] LEHMER, D. H. "Permutation by adjacent interchanges," American Math. Monthly 72, 2 (Feb. 1965, Part II), 36-46.
[29] ORD-SMITH, R. J. "Generation of permutation sequences: Part 1," Computer J. 13, 3 (March 1970), 152-155.
[30] ORD-SMITH, R. J. "Generation of permutation sequences: Part 2," Computer J. 14, 2 (May 1971), 136-139.
[31] ORD-SMITH, R. J. "Generation of permutations in pseudo-lexicographic order (Algorithm 308)," Comm. ACM 10, 7 (July 1967), 452. (See also: remarks in Comm. ACM 12, 11 (Nov. 1969), 638.)
[32] ORD-SMITH, R. J. "Generation of permutations in lexicographic order (Algorithm 323)," Comm. ACM 11, 2 (Feb. 1968), 117. (See also: certification by I. M. Leitch in Comm. ACM 12, 9 (Sept. 1969), 512.)
[33] PECK, J. E. L.; AND SCHRACK, G. F. "Permute (Algorithm 86)," Comm. ACM 5, 4 (April 1962), 208.
[34] PHILLIPS, J. P. N. "Permutation of the elements of a vector in lexicographic order (Algorithm 28)," Computer J. 10 (1967), 310-311.
[35] PLESZCZYNSKI, S. "On the generation of permutations," Information Processing Letters 3, 6 (July 1975), 180-183.
[36] RIORDAN, J. An introduction to combinatorial analysis, John Wiley & Sons, Inc., N.Y., 1958.
[37] ROHL, J. S. "Programming improvements to Fike's algorithm for generating permutations," Computer J. 19, 2 (May 1976), 156.
[38] ROBINSON, C. L. "Permutation (Algorithm 317)," Comm. ACM 10, 11 (Nov. 1967), 729.
[39] SAG, T. W. "Permutations of a set with repetitions (Algorithm 242)," Comm. ACM 7, 10 (Oct. 1964), 585.
[40] SCHRACK, G. F.; AND SHIMRAT, M. "Permutation in lexicographic order (Algorithm 102)," Comm. ACM 5, 6 (June 1962), 346. (See also: remarks by R. J. Ord-Smith in Comm. ACM 10, 7 (July 1967), 452-3.)
[41] SHEN, M.-K. "On the generation of permutations and combinations," BIT 2 (1962), 228-231.
[42] SHEN, M.-K. "Generation of permutations in lexicographic order (Algorithm 202)," Comm. ACM 6, 9 (Sept. 1963), 517. (See also: remarks by R. J. Ord-Smith in Comm. ACM 10, 7 (July 1967), 452-3.)
[43] SLOANE, N. J. A. A handbook of integer sequences, Academic Press, Inc., N.Y., 1973.
[44] TOMPKINS, C. "Machine attacks on problems whose variables are permutations," in Proc. Symposium in Appl. Math., Numerical Analysis, Vol. 6, McGraw-Hill, Inc., N.Y., 1956, 195-211.
[45] TROTTER, H. F. "Perm (Algorithm 115)," Comm. ACM 5, 8 (August 1962), 434-435.
[46] WALKER, R. J. "An enumerative technique for a class of combinatorial problems," in Proc. Symposium in Appl. Math., Combinatorial Analysis, Vol. 10, American Mathematical Society, Providence, R.I., 1960, 91.
[47] WELLS, M. B. "Generation of permutations by transposition," Math. Comp. 15 (1961), 192-195.
[48] WELLS, M. B. Elements of combinatorial computing, Pergamon Press, Elmsford, N.Y., 1971.
[49] WHITEHEAD, E. G. Combinatorial algorithms. (Notes by C. Frankfeldt and A. E. Kaplan.) Courant Institute, New York Univ., 1973, 11-17.

