
Algorithm Design and Analysis

Andrew Ensor
School of Engineering, Computer and Mathematical Sciences

Spring 2019
Course Information

Aim To study techniques to support the design and analysis of algorithms.

Content Concurrency algorithms, design patterns, algorithmic analysis, design techniques, advanced data structures, graph algorithms, numerical algorithms.

Classes Wednesdays 4-6pm in room WB327.

Computer Labs Thursdays 10-12noon in WZ518, Thursdays 4-6pm in WZ518, Fridays 2-4pm in WZ518. Please note that only lab material should be worked on during this time (i.e. no working on assignments, surfing the web, e-mail, or games).

Instructors
Assoc. Professor Andrew Ensor          Dr Maryam Doborjeh
andrew.ensor@aut.ac.nz                 maryam.gholami.doborjeh@aut.ac.nz
WT609                                  WT701
921-9999 ext 8485

Office Hours Students are very welcome to discuss problems regarding the course or other matters with their instructor. Office hours set aside each
week specifically for student questions will be announced in class.

AUT online Students are encouraged to regularly check the course web site on
AUT online at https://blackboard.aut.ac.nz/. This web site contains
class announcements, discussion forums, assignment information, class resources, as well as updated class marks.

Assessment The assessment will be measured through course work, a mid-semester test, and a comprehensive final examination.

Course Work The course work grade will be based equally on four practical
computer assignments and is worth a total of 40% of the final grade.
These assignments are to be completed in the student’s own time and
assignments should be submitted in the correct assignment folder on AUT
online by 5pm on the due date.

Late Assignments The policy with late assignments is that for each day an
assignment is late it has one fifth of its marks deducted. However, each
student is entitled to a total of three grace days throughout the semester
before they are penalized.


Mid-Semester Test This is a two hour test worth 10% of the final grade.
Final Examination This is a three hour test scheduled during the examina-
tion period. It is worth 50% of the final grade.
Textbook The textbook for this course is Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.
Collaboration Students are welcome to discuss their assignment work with
their instructor and with other students. However, any group work must
clearly state the contribution of each member of the group. Hence no student may receive any part of an assignment from another person, whether in printed or electronic form without the proper acknowledgement. Failure to abide by this will result in the assignment not being accepted. Any
form of cheating during the mid-semester test or the final examination is
not acceptable.
Timetable for Spring 2019
Week 1 (Wednesday 17 July): Class 1 Mutex Algorithms; Class 2 Semaphores and Monitors
Week 2 (Wednesday 24 July): Class 3 Client-Server Framework; Class 4 Creational Patterns
Week 3 (Wednesday 31 July): Class 5 Structural Patterns; Class 6 Behavioral Patterns
Week 4 (Wednesday 7 August): Class 7 Basic Analysis; Class 8 Recurrence Analysis
Week 5 (Wednesday 14 August) [1]: Class 9 Divide-and-Conquer Technique; Class 10 Dynamic Programming
Week 6 (Wednesday 21 August): Class 11 Elements of Dynamic Programming; Class 12 Greedy Technique
Week 7 (Wednesday 28 August) [2]: Class 13 Red-Black Trees; Class 14 Augmenting Data Structures
Mid-Semester Break (2 September to 13 September)
Week 8 (Wednesday 18 September) [3]: Class 15 B-Trees; Class 16 Disjoint Sets
Week 9 (Wednesday 25 September): Class 17 Elementary Graph Algorithms; Class 18 Minimal Spanning Trees
Week 10 (Wednesday 2 October): Class 19 Single-Source Shortest Paths; Class 20 All-Pairs Shortest Paths
Week 11 (Wednesday 9 October) [4]: Class 21 Maximum Flow; Class 22 Network Routing
Week 12 (Wednesday 16 October): Class 23 Matrix Operations; Class 24 Fast Fourier Transforms
Examination Period (21 October to 8 November) [5]

[1] Assignment one due: 12 August
[2] Mid-Semester Test: 26 August (to be confirmed)
[3] Assignment two due: 16 September
[4] Assignment three due: 7 October
[5] Assignment four due: 21 October

Contents

1 Concurrency Algorithms
1.1 Mutex Algorithms
1.2 Semaphores and Monitors
1.3 Client-Server Framework

2 Design Patterns
2.1 Creational Patterns
2.2 Structural Patterns
2.3 Behavioral Patterns

3 Algorithmic Analysis
3.1 Basic Analysis
3.2 Recurrence Analysis

4 Design Techniques
4.1 Divide-and-Conquer Technique
4.2 Dynamic Programming
4.3 Elements of Dynamic Programming
4.4 Greedy Technique

5 Advanced Data Structures
5.1 Red-Black Trees
5.2 Augmenting Data Structures
5.3 B-Trees
5.4 Disjoint Sets

6 Graph Algorithms
6.1 Elementary Graph Algorithms
6.2 Minimal Spanning Trees
6.3 Single-Source Shortest Paths
6.4 All-Pairs Shortest Paths
6.5 Maximum Flow
6.6 Network Routing

7 Numerical Algorithms
7.1 Matrix Operations
7.2 Fast Fourier Transforms

A Advanced Analysis Techniques
A.1 Probability Theory
A.2 Probabilistic Analysis
A.3 Amortized Analysis
Chapter 1

Concurrency Algorithms

1.1 Mutex Algorithms


Reading: none
A (heavyweight) process is an execution environment on a processor with its
own address space (memory) allocated by the operating system, and which runs
independently of other processes. This enables a processor to execute multiple programs concurrently without conflict. Each heavyweight process holds its own code
(machine instructions), data (global variables and heap space), and execution
stack (local variables and activation records of method calls) in memory, only
competing with other processes for the processor time and access to external
resources such as files.
Execution environments are expensive to create and manage so usually each
heavyweight process is designed to be shared by multiple threads. A thread or
lightweight process is a process that executes inside an execution environment
sharing the code and data, but not execution stack, with other threads in the
same environment. It is much more efficient for an operating system to swap
between threads within the same execution environment than to swap between
heavyweight processes. However, since threads share data, their access to the
data must be carefully synchronized to avoid race conditions, where the time periods in which several threads access some data overlap and the final result
might depend on how the operating system schedules the threads.
An operation is called atomic if it is guaranteed to be executed entirely by
a thread without intermediate states being visible to other threads. It can be
difficult to be certain whether or not a statement is atomic in a programming
language. For example, consider the increment operation:
x++;
on an int variable x. This statement might get compiled into a sequence of
machine instructions such as:
LOAD R,x ; load register R with the value of x
INC R ; increment register R
STORE R,x ; store the value of register R in x
Note that each thread can use the same registers since the operating system
ensures that the values of registers are saved and reloaded when it swaps between threads. If one thread created by a program contains the increment operation and another thread concurrently accesses the value of x then whether the second
thread is given the original or the incremented value of x depends on the thread
scheduling, which might vary each time the program is run. If the second thread
loads the value of x before the first thread has executed its STORE instruction
then the original value of x is used:

Thread A: LOAD R,x
Thread A: INC R
Thread B: LOAD R,x ; original value loaded
Thread A: STORE R,x

If instead the STORE instruction does get executed first then the second thread
loads the incremented value of x:
Thread A: LOAD R,x
Thread A: INC R
Thread A: STORE R,x
Thread B: LOAD R,x ; incremented value loaded

Worse still, if both threads attempt to modify the value of x then the results
are unpredictable. To illustrate this suppose two threads concurrently attempt
to increment the value of x. Most testing would show that x is incremented by
two:
Thread A: LOAD R,x
Thread A: INC R
Thread A: STORE R,x
Thread B: LOAD R,x ; incremented value loaded
Thread B: INC R
Thread B: STORE R,x ; incremented by two

However, if the operating system happens to swap between the two threads part
way through the three instructions then both threads increment the original
value of x.
Thread A: LOAD R,x
Thread A: INC R
Thread B: LOAD R,x ; original value loaded
Thread B: INC R
Thread A: STORE R,x
Thread B: STORE R,x ; incremented by one

This is known as a lost update since it appears as though one of the increment
statements has been ignored. Such problems can be difficult to detect, making debugging multithreaded code particularly hard, so it is important to have a very clear understanding of concurrency issues. The class LostUpdate demonstrates how serious the problem can become: due to compiler optimizations of the loop (probably holding the value of x in a register for efficiency, just loading its value

/**
A class that demonstrates the lost update problem in concurrency
by creating two threads that concurrently try to increment x
each a total of ITERATIONS times.
Sometimes the final value of x is not 2*ITERATIONS
@author Andrew Ensor
*/

public class LostUpdate implements Runnable


{
private int x;
private static final int ITERATIONS = 20000000; // multiple of 10

public LostUpdate()
{ x = 0;
}

// repeatedly increment the value of x


public void run()
{ System.out.println("Thread " + Thread.currentThread()
+ " started with x = " + x);
int loopIterations = ITERATIONS/10;
for (int i=0; i<loopIterations; i++)
{ x++; x++; x++; x++; x++; x++; x++; x++; x++; x++;
}
System.out.println("Thread " + Thread.currentThread()
+ " finishing with x = " + x);
}

public static void main(String[] args)


{ // create two concurrent threads
LostUpdate lostUpdate = new LostUpdate();
Thread threadA = new Thread(lostUpdate);
Thread threadB = new Thread(lostUpdate);
threadA.start();
threadB.start();
try
{ // wait for both threads to finish
threadA.join();
threadB.join();
}
catch (InterruptedException e)
{ System.out.println("Interrupted " + e);
}
System.out.println("The final value of x is " + lostUpdate.x);
}
}

at the start of the loop and storing it at the end), the net result of one entire
thread might be lost.
A critical section is a segment of code that should only be executed by one
thread at a time, and mutual exclusion refers to the synchronization of thread
access to critical sections so that only one thread can be inside the code at a time.
If data get modified by multiple threads then any code that modifies the data is
considered a critical section and so some algorithm is required to ensure mutual
exclusion. Mutual exclusion can be obtained for a critical section by insisting
that each thread t must acquire a unique lock before it can enter the critical
section and only release the lock once it has completed the critical section.
In practice, a multithreaded system calls a method such as Acquire-Lock
whenever a thread is about to enter a critical section of code, which blocks
that thread until the lock is available. When the lock becomes available the
Acquire-Lock method then unblocks one blocked thread so that it completes
the method and starts executing code in the critical section. When the thread
completes the critical section the system calls a method such as Release-Lock
to release the lock so that it becomes available for other threads.
An algorithm for implementing mutual exclusion via the Acquire-Lock
and Release-Lock methods is called a mutex algorithm. A first attempt at
implementing a mutex algorithm might be to include a boolean flag exclude
that is initially false but is set to true when a thread t enters the critical sec-
tion, as given by the following pseudocode (a compact and language-independent
way of specifying algorithms via a mixture of natural language and high-level
programming constructs):

Incorrect-Acquire-Lock(t)
1 while exclude do
2 delay
3 exclude ← true ▷ lock has been acquired by thread t

Incorrect-Release-Lock(t)
1 exclude ← false ▷ lock released by thread t

(where an arrow ← denotes an assignment statement, and a triangle ▷ denotes a comment). However, this simple approach does not work as the testing of the flag exclude and setting it to true is not done atomically. It is possible that just after one thread completes the while loop but before it sets the flag to true another thread enters Acquire-Lock and successfully completes the method, resulting in both threads concurrently being allowed in the critical section.
The first correct algorithm for handling mutual exclusion between two threads
was Dekker’s algorithm. It allows either thread to obtain the lock if it is avail-
able, and when there is a conflict for the lock one thread is given priority over
the other, and the priority reverses when a thread releases the lock. It main-
tains a variable priority indicating which of the two threads has priority and a
boolean[] array requested with a flag for whether each thread has requested
the lock.

Dekker-Acquire-Lock(t)
1 s ← the other thread than t
2 requested [t] ← true
3 while requested [s] do
4 if priority = s then
5 requested [t] ← false
6 while priority = s do
7 delay
8 requested [t] ← true

Dekker-Release-Lock(t)
1 s ← the other thread than t
2 priority ← s
3 requested [t] ← false
Dekker’s algorithm can be simplified to give Peterson’s algorithm, by taking
advantage of a race condition on a variable turn in the while loop. If one thread
is caught in the while loop because the other thread has also just requested the
lock, then when the other thread changes turn the first thread is released from
the loop and the second thread gets caught by it until the first releases the lock.
Peterson-Acquire-Lock(t)
1 s ← the other thread than t
2 requested [t] ← true
3 turn ← t
4 while turn = t and requested [s] do
5 delay ▷ thread s has not released lock

Peterson-Release-Lock(t)
1 requested [t] ← false
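
One possible Java rendering of Peterson's pseudocode for two threads (with ids 0 and 1) is sketched below; the class name PetersonLock, the separate volatile fields, and the use of Thread.yield() for the delay are illustrative choices rather than part of the notes. As with Exercise 1.1, the volatile declarations are needed so that each thread sees the other's writes.

public class PetersonLock
{
   // the requested flags and turn are volatile so that writes by one
   // thread are visible to the other (required for the algorithm to work)
   private volatile boolean requested0 = false, requested1 = false;
   private volatile int turn = 0;

   public void acquireLock(int t) // t is 0 or 1
   {  if (t == 0)
      {  requested0 = true;
         turn = 0;
         while (turn == 0 && requested1)
            Thread.yield(); // delay: thread 1 has not released the lock
      }
      else
      {  requested1 = true;
         turn = 1;
         while (turn == 1 && requested0)
            Thread.yield(); // delay: thread 0 has not released the lock
      }
   }

   public void releaseLock(int t)
   {  if (t == 0)
         requested0 = false;
      else
         requested1 = false;
   }
}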

Peterson’s algorithm can be extended to work for more than two threads
but the algorithm grows in complexity. As an alternative, Lamport’s bakery
algorithm works with any number of concurrent threads much like customers
being assigned order numbers in a bakery. Each thread t that requests the lock
is assigned a number number [t], and threads are allowed access to the lock in
increasing order of the number. In the case where two threads are assigned the same number, their distinct thread ids are used to break the tie.
The algorithm ensures that the lock is not provided to a thread while there is a
thread s with a lower number or that is part way through choosing a number.
Lamport-Acquire-Lock(t)
1 choosing[t] ← true
2 assign a number to thread t one larger than currently assigned maximum
3 choosing[t] ← false
4 for each thread s do
5 while choosing[s] do
6 delay ▷ thread s has not chosen a number
7 while (number [s] ≠ 0 and number [s] < number [t])
8 or (number [s] = number [t] and Id [s] < Id [t]) do
9 delay ▷ thread s has not released the lock

Lamport-Release-Lock(t)
1 number [t] ← 0 ▷ unassigned

An alternative to using a software algorithm for mutual exclusion is to have the processor disable hardware interrupts while a thread is in any critical section. Although this works in a single processor environment, it can potentially lead to deadlock, interfere with other processes, and does not work in systems with multiple processors.
where a given memory location is set to a new value and its former value returned
all in one atomic operation. Using an atomic test and set instruction a simple
algorithm for mutual exclusion is possible with a single boolean flag exclude.

TestAndSet-Acquire-Lock(t)
1 while TestAndSet(exclude, true) do
2 delay

TestAndSet-Release-Lock(t)
1 TestAndSet(exclude, false)
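
In Java an atomic test and set operation is available through the class java.util.concurrent.atomic.AtomicBoolean, whose getAndSet method atomically stores a new value and returns the previous one. A minimal spinning lock sketch built on it (the class name SpinLock is illustrative) is:

import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock
{
   private final AtomicBoolean exclude = new AtomicBoolean(false);

   public void acquireLock()
   {  // getAndSet atomically sets exclude to true and returns its old value
      while (exclude.getAndSet(true))
         Thread.yield(); // busy wait until the lock was free
   }

   public void releaseLock()
   {  exclude.set(false);
   }
}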

The simplest approach to have a thread delay in a mutex algorithm is to use a busy wait: while one thread is inside the critical section any other thread trying to acquire the lock is made to loop continuously (spin) checking the condition
for continuing. However, this wastes processor time if the critical section takes a
long period to complete. A better alternative is to have the thread block, where
it is not allowed any processor time until the lock again becomes available, after
which it again checks the condition and either continues or blocks once again.

Exercise 1.1 (Implementing Lamport’s Bakery Algorithm) Implement and test Lamport’s Bakery Algorithm (note that some fields might possibly need
to be declared as volatile so that the compiler does not try to optimize perfor-
mance using separate copies of variables in each thread).

1.2 Semaphores and Monitors


Reading: none
With the assistance of mutex algorithms more elaborate concurrency con-
trol is possible, such as semaphores and monitors. A counting semaphore has
an int variable s which represents the number of threads that are currently
permitted access to a resource (a special case is a binary semaphore which has
only two possible values 0 and 1 and so is really just a mutex lock as in Section
1.1). A semaphore uses a mutex algorithm to ensure that its statements within
Acquire-Resource and Release-Resource are executed atomically.

Semaphore-Acquire-Resource(t)
1 ▷ following lines must all be one atomic operation
2 while s ≤ 0 do
3 delay
4 s ← s − 1

Semaphore-Release-Resource(t)
1 ▷ following line must be an atomic operation
2 s ← s + 1

Counting semaphores can be used by an operating system to limit access to a resource to a finite number of concurrent threads (or heavyweight processes) by
starting with the semaphore counter s at the upper bound. Each time another
thread acquires the lock the semaphore counter is decreased by one, until it
reaches zero, in which case further threads are caught in the while loop until a
thread releases the lock.
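
Using the Java monitor mechanism described in the remainder of this section, a counting semaphore could be sketched as follows (the class name CountingSemaphore and its method names are illustrative):

public class CountingSemaphore
{
   private int s; // number of threads currently permitted access

   public CountingSemaphore(int permits)
   {  s = permits;
   }

   public synchronized void acquireResource()
   {  while (s <= 0) // wait until another thread releases the resource
      {  try
         {  wait();
         }
         catch (InterruptedException e)
         {  // ignore and re-check the condition
         }
      }
      s--;
   }

   public synchronized void releaseResource()
   {  s++;
      notify(); // wake one waiting thread
   }
}
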
A monitor uses a blocking mutex algorithm to control thread access to criti-
cal sections of code that are declared synchronized, and to allow multiple threads
within critical sections to coordinate their progress through the code using wait
and notify operations. While a thread is executing code in a synchronized
block it is said to hold the monitor, and all other threads are blocked from
executing any code synchronized on the monitor until that thread relinquishes
the monitor. There are two ways that the thread can relinquish the monitor,
either by exiting the synchronized block or else by calling the monitor’s wait
operation. If a thread that holds the monitor calls the monitor’s wait operation
then it relinquishes the monitor even though it is still inside the synchronized
block and gets put in the monitor’s wait set. When another thread holds the
monitor and calls the monitor’s notify operation, one thread on the wait set is
chosen (typically at random) to be removed from the wait set and made eligible
to run. Since only one thread is allowed to hold the monitor at a time there are
two alternative possibilities for what happens next:

• the thread that has been made eligible to run continues execution with the
monitor whereas the thread that called notify is made to block until the
monitor is available (called a signal-and-exit monitor or a Hoare monitor),

• the thread that has been made eligible to run now blocks until the thread
that called notify relinquishes the monitor (called a signal-and-continue
monitor).

Note that the wait and notify operations allow for multiple threads to be
inside code synchronized on the same monitor at the same time, although only
one of the threads can hold the monitor at a time and so be executing code.
Each thread can use the notify operation to pass control to another thread to
resume executing synchronized code.
In Java any object can act as a signal-and-continue monitor (rather than a
signal-and-exit monitor), and it inherits the Object methods:

wait to place the current thread in the wait set,

notify to remove one (arbitrarily chosen) thread from the wait set,

notifyAll to remove all the threads from the wait set.

A critical section of Java code or an entire method is declared as synchronized to have thread access to it controlled by a monitor object. The monitor is
typically chosen to be the object that holds the data manipulated by the code
(often this instance of the class containing the code, which is the default for a

synchronized method). A thread can be made to relinquish its monitor part way
through the code and be put in the monitor’s wait set by calling the monitor’s
wait method. This is typically done by a thread when it holds the monitor inside
synchronized code but can not continue with its task until another thread has
done something. If another thread which currently holds the monitor calls the
monitor’s notify method then one arbitrary thread is notified and removed
from the wait set. More commonly, the thread holding the monitor calls the
monitor’s notifyAll method to remove all the waiting threads from the wait
set, giving them each a chance to proceed, and all but one are then made to wait
again (hence in Java the wait method usually appears within a while loop with
a condition that the notifying thread has changed before calling notifyAll).
Once removed from the wait set a thread remains blocked until the monitor
becomes available, as Java uses signal-and-continue monitors so the notifying
thread still holds the monitor (hence often a notifying thread calls notify or
notifyAll just before it completes the synchronized code). When the monitor
becomes available that thread then competes with other threads that might also
be trying to obtain the monitor.
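
The following small sketch illustrates this wait-in-a-while-loop idiom with notifyAll, using a one-slot mailbox shared between threads (the Mailbox class and its method names are illustrative, not from the notes):

public class Mailbox<E>
{
   private E contents; // null when the mailbox is empty

   public synchronized void put(E item)
   {  while (contents != null) // wait until the slot becomes free
      {  try
         {  wait();
         }
         catch (InterruptedException e)
         {  // ignore and re-check the condition
         }
      }
      contents = item;
      notifyAll(); // wake any threads waiting to take
   }

   public synchronized E take()
   {  while (contents == null) // wait until an item is available
      {  try
         {  wait();
         }
         catch (InterruptedException e)
         {  // ignore and re-check the condition
         }
      }
      E item = contents;
      contents = null;
      notifyAll(); // wake any threads waiting to put
      return item;
   }
}
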
Threads can be managed by the Thread methods:
setPriority changes the thread priority between limits Thread.MIN_PRIORITY and Thread.MAX_PRIORITY,
start puts the thread in the runnable state so that it begins executing the run
method of a Runnable object (that was specified to the Thread construc-
tor),
sleep (static method) indicates that the currently executing thread should
not get any further processor time for a specified number of milliseconds
(the thread does not relinquish any monitors it holds),
yield (static method) indicates that the currently executing thread should
give up its current slice of processor time (the thread does not relinquish
any monitors it holds),
join blocks the currently executing thread until the thread whose join method
was called is dead,
interrupt interrupts the thread from its sleep, wait, join, or blocking on an
I/O operation.
Threads can also be assigned to a ThreadGroup object when they are created,
allowing them to be worked with together as a group. The example class
BlockingLock demonstrates how the monitor wait and notify methods can
be used to implement a blocking lock in Java.
One common application of synchronization arises in coordinating access
to a shared database. The reader-writer problem requires synchronization so
that if a thread is writing to the database then no other thread can read from
nor write to it, but multiple threads should be able to read concurrently when
no thread is writing. The reader-writer problem can be solved using a lock
writeLock to ensure that other threads cannot read nor write while one thread
is writing, a count numReaders of the number of threads currently reading, and
another lock readLock to ensure that reading threads synchronize their access
to numReaders.

/**
A class that represents a blocking (rather than spinning) lock
where the BlockingLock instance is used as a monitor to control
access to its acquireLock and releaseLock methods
@author Andrew Ensor
*/

public class BlockingLock


{
private boolean s; // access to s is synchronized

public BlockingLock()
{ s = true; // initially lock is available
}

public synchronized void acquireLock()


{ while (!s) // wait for the lock available notification
{ try
{ wait();
}
catch (InterruptedException e)
{ // ignore
}
}
s = false; // lock is now unavailable for other threads
}

public synchronized void releaseLock()


{ s = true; // lock is now available for other threads
notify(); // notify one waiting thread
}
}

Read(t)
1 Acquire-Lock(readLock , t)
2 if numReaders = 0 then ▷ block any writing threads
3 Acquire-Lock(writeLock , t)
4 numReaders ← numReaders +1
5 Release-Lock(readLock , t)
6 Read from the database
7 Acquire-Lock(readLock , t)
8 numReaders ← numReaders −1
9 if numReaders = 0 then ▷ unblock any writing threads
10 Release-Lock(writeLock , t)
11 Release-Lock(readLock , t)

Write(t)
1 Acquire-Lock(writeLock , t)
2 Write to the database
3 Release-Lock(writeLock , t)
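
A direct Java rendering of this pseudocode could reuse the BlockingLock class given earlier in this section; the class name ReadWriteCoordinator and the placeholder comments standing in for the actual database access are illustrative:

public class ReadWriteCoordinator
{
   private BlockingLock readLock = new BlockingLock();
   private BlockingLock writeLock = new BlockingLock();
   private int numReaders = 0;

   public void read()
   {  readLock.acquireLock();
      if (numReaders == 0) // block any writing threads
         writeLock.acquireLock();
      numReaders++;
      readLock.releaseLock();
      // ... read from the database here ...
      readLock.acquireLock();
      numReaders--;
      if (numReaders == 0) // unblock any writing threads
         writeLock.releaseLock();
      readLock.releaseLock();
   }

   public void write()
   {  writeLock.acquireLock();
      // ... write to the database here ...
      writeLock.releaseLock();
   }
}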

This algorithm can suffer from a flaw that is common in concurrent programming
known as starvation, where a thread is forever blocked from obtaining a lock. In
this case if threads frequently read then any thread that wants to write might
never be able to acquire writeLock .
The dining philosopher problem is another classic synchronization problem
that illustrates the need for coordinated access to shared resources. It involves
five philosophers sitting around a table with five chopsticks (the shared re-
source), where a single chopstick is between each adjacent pair of philosophers.
The philosophers sit and think, but from time to time they get hungry. In order to eat a philosopher needs to pick up the chopsticks to their left and right, which prevents the philosophers on either side from eating.
stops eating the chopsticks are made available for the philosophers on either
side to use.
A simple attempt to solve the dining philosopher problem might be to use a
lock for controlling access to each chopstick, ensuring that only one philosopher
can hold each chopstick at a time. When a philosopher is hungry they could
try to acquire the lock for the chopstick on their left and on their right. The
classes Philosopher and DiningPhilosophersProblem demonstrate how this
strategy could be implemented, where each philosopher repeatedly changes from
thinking, to hungry (where the philosopher tries to acquire both chopsticks),
to eating. Although this approach might appear to work it does have a very
significant defect which is eventually apparent if the calls to sleep are removed
from Philosopher. The DiningPhilosophersProblem class risks resulting in
deadlock. A deadlock is a situation where no thread can progress because every
thread is waiting on resources held by another thread which in turn is also
waiting. In this example, since each philosopher first picks up the chopstick on the left and holds it until the chopstick on the right is available, eventually a scenario can arise in which each philosopher holds the chopstick on their left and waits forever for the chopstick on the right.
The class DiningPhilosophersSolution demonstrates how the deadlock in
the dining philosophers problem can be fixed. A DiningPhilosophersSolution
uses itself as a monitor to ensure that chopsticks on either side of a philosopher
are available before either is picked up and that only one philosopher at a time
is in the process of acquiring the chopsticks. Although this approach does avoid
deadlock it is not guaranteed to be free from starvation.

Exercise 1.2 (Avoiding Deadlocks) The class SwappableElement can cause a deadlock if one thread calls a.swap(b) concurrently as another thread calls
b.swap(a). This is because the synchronized swap method requires a thread to
acquire this monitor to enter the method, but also calls the getElement and
setElement methods of another instance which are synchronized on the other
instance (so requiring two monitors to be acquired).
Demonstrate the deadlock with a simple test class and then modify the swap
method so that it cannot cause deadlocks (hint: one possibility is to use the hash
code of the element to determine which SwappableElement actually does the
swapping).

/**
A class that represents a philosopher for the Dining Philosophers
problem (remove sleep and try/catch to make problem more evident)
@see DiningPhilosophersSolution.java
*/
import java.util.Random;

public class Philosopher implements Runnable


{
private int idNumber;
private Resource chopsticks;
private boolean stopRequested;
private Random generator;

public Philosopher(int idNumber, Resource chopsticks)


{ this.idNumber = idNumber;
this.chopsticks = chopsticks;
stopRequested = false;
generator = new Random();
}

public void run()


{ while (!stopRequested)
{ try
{ System.out.println("Philosopher "+idNumber+" is thinking");
Thread.sleep(generator.nextInt(1000));
System.out.println("Philosopher "+idNumber+" is hungry");
chopsticks.acquire(idNumber);
System.out.println("Philosopher "+idNumber+" is eating");
Thread.sleep(generator.nextInt(1000));
chopsticks.release(idNumber);
}
catch (InterruptedException e)
{ System.out.println("Interruption: " + e);
}
}
}

public void requestStop()


{ stopRequested = true;
}
}

/**
An implementation of the dining philosophers problem using blocking
locks. Note that this implementation might cause a deadlock
@see DiningPhilosophersSolution.java
*/

public class DiningPhilosophersProblem implements Resource


{
private static final int NUM = 5; //num philosophers and chopsticks
private BlockingLock[] chopsticks; // lock used for each chopstick

public DiningPhilosophersProblem()
{ chopsticks = new BlockingLock[NUM];
for (int i=0; i<NUM; i++)
chopsticks[i] = new BlockingLock();
}

public void acquire(int id)


{ // first acquire chopstick on left
chopsticks[id].acquireLock();
// then acquire chopstick on right
chopsticks[(id+1)%NUM].acquireLock();
}

public void release(int id)


{ // release both chopsticks
chopsticks[id].releaseLock();
chopsticks[(id+1)%NUM].releaseLock();
}

public static void main(String[] args)


{ DiningPhilosophersProblem resource
= new DiningPhilosophersProblem();
for (int i=0; i<NUM; i++)
{ // create and start the philosophers
Philosopher phil = new Philosopher(i, resource);
Thread thread = new Thread(phil);
thread.start();
}
}
}

/**
An implementation of the dining philosophers problem using a
monitor. Note that this implementation does not cause deadlock
but may cause starvation of a thread
@author Andrew Ensor
*/

public class DiningPhilosophersSolution implements Resource


{
private static final int NUM = 5; //num philosophers and chopsticks
private enum State{THINKING, HUNGRY, EATING};
private State[] philState;

public DiningPhilosophersSolution()
{ philState = new State[NUM];
for (int i=0; i<NUM; i++)
philState[i] = State.THINKING;
}

// helper method that returns the id of the philosopher to the left


private int left(int id)
{ return (id-1+NUM)%NUM;
}

// helper method that returns the id of the philosopher to the right


private int right(int id)
{ return (id+1)%NUM;
}

// return whether philosopher with specified id can eat


private boolean canEat(int id)
{ return (philState[left(id)] != State.EATING)
&& (philState[id] == State.HUNGRY)
&& (philState[right(id)] != State.EATING);
}

public synchronized void acquire(int id)


{ philState[id] = State.HUNGRY;
while (!canEat(id))
{ try
{ wait(); // make thread release monitor and wait for notify
}
catch (InterruptedException e)
{ // do nothing
}
}
philState[id] = State.EATING;
}


public synchronized void release(int id)


{ philState[id] = State.THINKING;
notifyAll(); // notify all waiting threads
}

public static void main(String[] args)


{ DiningPhilosophersSolution resource
= new DiningPhilosophersSolution();
for (int i=0; i<NUM; i++)
{ // create and start the philosophers
Philosopher phil = new Philosopher(i, resource);
Thread thread = new Thread(phil);
thread.start();
}
}
}

/**
A class that represents an element of generic type E that includes
a method for swapping two references which can cause deadlock
@author Andrew Ensor
*/

public class SwappableElement<E>


{
private E element;

public SwappableElement(E element)


{ this.element = element;
}

public synchronized E getElement()


{ return element;
}

public synchronized void setElement(E element)


{ this.element = element;
}

public synchronized void swap(SwappableElement<E> other)


{ E temp = element;
element = other.getElement();
other.setElement(temp);
// put Thread.sleep(1) here to make deadlock obvious
}
}

1.3 Client-Server Framework


Reading: none
Network programming involves a server, which is software that offers some
service on a host machine connected to a network, and one or more clients, that
are also on the network and connect to the machine to request that service.
A single machine might have multiple servers running concurrently so that it
can offer different services on the network, with each server bound to its own
port on the machine. A port is not a physical device but rather an int value
between 0 and 65535 that determines which server should handle each request.
When a client tries to establish a connection to a machine it specifies the address
of the machine on the network and the port corresponding to the service it is
requesting. Some of the ports below about 5000 are pre-allocated for standard
services (for example, port 7 is assigned to an ECHO server, port 13 to a
DAYTIME server, ports 20 and 21 to FTP, and port 80 to HTTP), so a server
must choose an unallocated port to listen on for connection requests from clients.
When the server accepts a connection with a client a different port is allocated
on the host machine for that connection so that the server is able to listen for
further requests on the original port while the connection is still open. Once
the connection is established the client and server can communicate back and
forth using some agreed protocol for the exchange.
Each host machine in a network is identified by a unique Internet Protocol
(IP) address, a sequence of bytes that uniquely identifies the host. In Java the
InetAddress class (in the package java.net) has some helpful static methods
for obtaining the IP address from a host name. For example:
InetAddress.getByName("cache.aut.ac.nz");
returns an InetAddress object with the IP address 156.62.1.12. If a host
might have multiple IP addresses (such as a host name that has a lot of traffic),
they can all be obtained by the method getAllByName. Note that:
InetAddress.getByName("localhost");
always returns the fixed IP address 127.0.0.1, called the local loopback address.
Instead, the actual IP address of the local host on a network can be obtained
from the getLocalHost method:
InetAddress.getLocalHost();
Note that these methods can throw an UnknownHostException exception and
so need to be placed in a try-catch block.
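
For instance, a minimal program combining these calls inside the required try-catch block might look like the following (the host name is just the example used above):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddressLookupExample
{
   public static void main(String[] args)
   {  try
      {  InetAddress remote = InetAddress.getByName("cache.aut.ac.nz");
         InetAddress local = InetAddress.getLocalHost();
         System.out.println("Remote host: " + remote);
         System.out.println("Local host: " + local);
      }
      catch (UnknownHostException e)
      {  System.out.println("Could not resolve host: " + e);
      }
   }
}
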
Network communication is divided into different layers such as physical, data link, network, transport, and application, where each layer has its own protocols. For example, the network layer of the Web uses the Internet Protocol (IP) to describe its packets of information, and its transport layer uses the Transmission Control Protocol (TCP) as well as the alternative User Datagram Protocol (UDP) for the transmission of that information.

Application layer (HTTP, FTP, SMTP, VOIP, ...)
Transport layer (TCP, UDP, ...)
Network layer (IP, ATM, ...)
Data link layer (Ethernet MAC, PPP, ...)
Physical layer (raw bits, cables, ...)

TCP is used for point-to-point communication between a client and a server that requires a reliable connection, ensuring all the data actually gets transmitted and is received in the same order as it was sent. Protocols at the application layer, such as the Hypertext Transfer Protocol (HTTP) and the File Transfer Protocol (FTP), use TCP at the transport layer to ensure that an HTML file is
not jumbled when read and that a downloaded file is not missing parts. UDP
has much less overhead than TCP, and is used for the communication of inde-
pendent data packets called datagrams that are not guaranteed to be delivered
nor in order. UDP would typically be used at the application layer for infor-
mation such as streaming video, voice (VOIP), or the current time, where it
is more important to keep the images coming or obtain the most recent time
rather than maintaining perfect integrity of the information.
The java.net package in Java SE provides very convenient classes for network programming at or above the application layer. The Socket and ServerSocket classes use TCP for communication back and forth between a client and a server using a reliable connection. The client and server each bind a socket to their end of the connection, and then read from an input stream and write to an output stream of their respective socket. The optional Java ME package javax.microedition.io likewise provides interfaces SocketConnection and ServerSocketConnection (since this is an optional package it is not required to be implemented by all Java ME vendors).
The java.net package also has the class DatagramSocket for communica-
tion via the User Datagram Protocol (UDP), without maintaining a connection.
Since UDP does not guarantee the arrival, arrival time, and content of data-
grams at the recipient, the recipient must piece the packets together in the
correct order and request retransmission of missing packets. In Java ME this is optionally provided by the javax.microedition.io interface UDPDatagramConnection.
High-level connections to the Web (such as HTTP or HTTPS connections to
web sites) can be conveniently handled in Java SE by the URL class, which itself
uses sockets to handle a TCP connection. The MIDP1.0 specification for Java
ME also provides (required) support for HTTP connections, and the MIDP2.0
specification added support for HTTPS connections.

[UML diagram: a Client uses a Socket to request a connection from the Server's ServerSocket (-port:int, +accept():void); the server accepts and creates Socket objects (0..*) at its end, and the client and server sockets then send information back and forth]

The Client-Server Framework is a pattern that describes a framework for an application that involves a connection between a client and a server. It is often
termed an architectural pattern since it describes a standard way of organizing
the modules in a software project, in this case the modules of a project using
networking. A socket is one end point of a two-way connection, which is used
to handle the communication via input and output streams. The server uses a
single server socket to listen for any requests on a particular port of the host

machine. When a client requests a connection to that port (via its own socket)
the server socket creates a new socket to handle the server end of the connection.
A client can be implemented in Java by the following steps:

Initiate the connection A client requests a connection by creating a Socket object specifying the host name (or IP address) and the port:

socket = new Socket(HOST_NAME, HOST_PORT);

Obtain streams Input and/or output streams are obtained from the socket
and are layered with appropriate filtering streams:

out = new PrintWriter(socket.getOutputStream(), true);
in = new BufferedReader(new InputStreamReader(
    socket.getInputStream()));

Communicate The client uses the streams to communicate with the server
using an agreed protocol (either a standard protocol such as HTTP or
SMTP or a custom protocol for the application):

out.println(request);
String serverResponse = in.readLine();

Close connection When completed the client cleans up by closing the streams
and then the socket:

out.close();
in.close();
socket.close();

A server must be running on the host machine before any client requests
a connection. Implementing a server is similar to a client but just requires a
few extra steps since a typical server should be able to handle multiple clients
concurrently:

Create a server socket The server states its intention to listen on a specific
port by creating a ServerSocket object for that port. It might be configured with a timeout:

ServerSocket serverSocket = new ServerSocket(PORT);
serverSocket.setSoTimeout(30000); // 30 second timeout

Listen for connections The ServerSocket method accept is called to block the main server thread until a request is received. Typically the resulting Socket is passed to a separate thread for handling the communication so that the main server thread can continue listening for other clients:

Socket socket = serverSocket.accept();
Thread thread = new Thread(...);
thread.start();

Obtain streams Input and/or output streams are obtained from the socket
and are layered with appropriate filtering streams:

out = new PrintWriter(socket.getOutputStream(), true);
in = new BufferedReader(new InputStreamReader(
    socket.getInputStream()));

Communicate The server thread uses the streams to communicate with the
client using an agreed protocol:
String clientRequest = in.readLine();
out.println(response);

Close connection When the communication has completed with that client
the server cleans up by closing the streams and then the socket (if the
server itself is completed then it also closes the server socket):
out.close();
in.close();
socket.close();
For example, the classes GuessClient and GuessServer demonstrate a sim-
ple networked application between one or more clients that try to guess random
numbers which are determined by a server. Note that these two classes have no
reference to each other, each client just knows the host name, correct port for
the server, and the protocol to use for the communication. The protocol used
in this example is as follows:
• the server initiates the communication by sending the Unicode message
Guess the number between m and n inclusive,
• the client then responds with a number,

• the server responds with a suitable message,


• the previous two steps continue until the server responds with Correct
guess!, which indicates to the client that the connection should be closed.

Exercise 1.3 (Chat Room) Prepare a client/server application that can be used by clients to broadcast string messages to each other. The server should
accept connections and send any message it receives from a client to all the
clients that are currently connected. Note that each client will need two threads,
one thread that accepts client input and sends the input to the server, and an-
other thread that listens for messages received from the server and displays them
on the client. Connections are closed when the client sends the message QUIT.

/**
A class that represents a client in a number guessing game
@see GuessServer.java
*/
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.Socket;
import java.util.Scanner; // Java 1.5 equivalent of cs1.Keyboard

public class GuessClient


{
public static final String HOST_NAME = "localhost";
public static final int HOST_PORT = 7777; // host port number

public GuessClient()
{
}

public void startClient()


{ Socket socket = null;
Scanner keyboardInput = new Scanner(System.in);
try
{ socket = new Socket(HOST_NAME, HOST_PORT);
}
catch (IOException e)
{ System.err.println("Client could not make connection: " + e);
System.exit(-1);
}
PrintWriter pw; // output stream to server
BufferedReader br; // input stream from server
try
{ // create an autoflush output stream for the socket
pw = new PrintWriter(socket.getOutputStream(), true);
// create a buffered input stream for this socket
br = new BufferedReader(new InputStreamReader(
socket.getInputStream()));


// play the game until value is correctly guessed


boolean finished = false;
do
{ String serverResponse = br.readLine();
System.out.println(serverResponse);
if (serverResponse.toLowerCase().indexOf("correct")==0)
finished = true;
else
{ // get user input and send it to server
pw.println(keyboardInput.nextLine());
}
}
while (!finished);
pw.close();
br.close();
socket.close();
}
catch (IOException e)
{ System.err.println("Client error with game: " + e);
}
}

public static void main(String[] args)


{ GuessClient client = new GuessClient();
client.startClient();
}
}

/**
A class that represents a server in a number guessing game where
GuessClient objects connect to this GuessServer and try to guess
a random integer value between min (incl) and max (excl)
The game initiates with a response from the server and ends when
the server responds with "Correct guess!"
@author Andrew Ensor
*/
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Random;

public class GuessServer


{
private int min, max; // minimum (incl) and maximum (excl) range
private boolean stopRequested;
private Random generator;
public static final int PORT = 7777; // some unused port number

public GuessServer(int min, int max)


{ this.min = min;
this.max = max;
stopRequested = false;
generator = new Random();
}

// start the server if not already started and repeatedly listen


// for client connections until stop requested
public void startServer()
{ stopRequested = false;
ServerSocket serverSocket = null;
try
{ serverSocket = new ServerSocket(PORT);
System.out.println("Server started at "
+ InetAddress.getLocalHost() + " on port " + PORT);
}
catch (IOException e)
{ System.err.println("Server can’t listen on port: " + e);
System.exit(-1);
}


try
{ while (!stopRequested)
{ // block until the next client requests a connection
// note that the server socket could set an accept timeout
Socket socket = serverSocket.accept();
System.out.println("Connection made with "
+ socket.getInetAddress());
// start a game with this connection, note that a server
// might typically keep a reference to each game
GuessGame game = new GuessGame(socket,
generator.nextInt(max-min)+min);
Thread thread = new Thread(game);
thread.start();
}
serverSocket.close();
}
catch (IOException e)
{ System.err.println("Can’t accept client connection: " + e);
}
System.out.println("Server finishing");
}

// stops server AFTER the next client connection has been made
// (since this server socket doesn’t timeout on client connections)
public void requestStop()
{ stopRequested = true;
}

// driver main method to test the class


public static void main(String[] args)
{ GuessServer server = new GuessServer(1, 100);
server.startServer();
}

// inner class that represents a single game played across a socket


private class GuessGame implements Runnable
{
private int value; // value to guess
private Socket socket; // socket for client/server communication

// constructor for a guess game to guess value across a socket


// for client/server communication
public GuessGame(Socket socket, int value)
{ this.value = value;
this.socket = socket;
}


public void run()


{ PrintWriter pw; // output stream to client
BufferedReader br; // input stream from client
try
{ // create an autoflush output stream for the socket
pw = new PrintWriter(socket.getOutputStream(), true);
// create a buffered input stream for this socket
br = new BufferedReader(new InputStreamReader(
socket.getInputStream()));
// play the game until value is correctly guessed
pw.println("Guess the number between " + min + " and "
+ (max-1) + " inclusive");
int guess = min-1;
do
{ String clientGuess = br.readLine();
String response;
if (clientGuess == null)
response = "Nothing entered, try again";
else
{ try
{ guess = Integer.parseInt(clientGuess);
if (guess < value)
response = "Guess too low, try again";
else if (guess > value)
response = "Guess too high, try again";
else
response = "Correct guess!";
}
catch (NumberFormatException e)
{ response = "Not an int value, try again";
}
}
pw.println(response);
}
while (guess!=value);
pw.close();
br.close();
System.out.println("Closing connection with "
+ socket.getInetAddress());
socket.close();
}
catch (IOException e)
{ System.err.println("Server error with game: " + e);
}
}
}
}
Chapter 2

Design Patterns

2.1 Creational Patterns


Reading: none
A design pattern is a recurring solution to a software design problem. It
describes the required objects and the necessary communication between those
objects in order to accomplish some common programming task. The foundations for design patterns were laid by Gamma, Helm, Johnson, and Vlissides (the so-called Gang of Four) in their book Design Patterns in 1994, which described
23 design patterns. They classified design patterns into three types:

creational patterns which describe ways of creating objects, allowing more control over which objects get instantiated,

structural patterns which describe how classes and objects can be combined
to form larger structures,

behavioral patterns which are concerned with the communication between objects in a system.

Creational patterns deal with the best way of creating instances of objects.
Objects are always created in a language such as Java via a constructor:

Whatever whatever = new Whatever();

Placing such a statement in a program requires that the most appropriate class
and constructor be known when the program is coded. However, sometimes the
exact nature of the object might vary, so it might be preferable to delegate the
responsibility of calling an appropriate constructor to some class that is better
suited to determining the appropriate object to instantiate.
The Factory Pattern is a creational pattern which uses a class commonly
known as a factory class to create instances of objects. The factory class has
methods for instantiating and returning an object instance. The constructor
and class used to create the appropriate object are chosen by the factory from
one or more subclasses of some abstract class (or classes that all implement
the same interface). The client program is generally unaware of which actual
subclass was chosen by the factory to create the object.


[UML diagram: the Client uses a Product and gets it from a Factory, whose +getProduct():Product method produces a ConcreteProduct (a subclass of Product)]

Factory classes are very common in object oriented languages. Typically a factory is used when a client program might not be able to anticipate which
class it should use to create an object, or when the knowledge of the actual
class used to create an object should be localized. For example, Java has a class
called DocumentBuilderFactory whose newDocumentBuilder method returns
a DocumentBuilder object for producing DOM object trees from XML docu-
ments. The actual DocumentBuilder object that is returned depends on the
configuration of the factory when the method is called.
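
A minimal sketch of the pattern is given below, using the Product, Factory, and ConcreteProduct names from the diagram above (with a second concrete product added so the factory has a choice to make); the String argument used to select the subclass is purely an illustrative detail:

interface Product
{  String describe();
}

class ConcreteProductA implements Product
{  public String describe() { return "product A"; }
}

class ConcreteProductB implements Product
{  public String describe() { return "product B"; }
}

public class Factory
{
   // the client calls this method and never names a concrete subclass
   public Product getProduct(String kind)
   {  if (kind.equalsIgnoreCase("A"))
         return new ConcreteProductA();
      else
         return new ConcreteProductB();
   }
}
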
The Abstract Factory Pattern is another creational pattern that is one level of abstraction higher than the Factory Pattern. It has an abstract factory class that organizes concrete classes into groups obtainable by alternate factory classes. The abstract factory is used to obtain the appropriate factory class depending on the situation.

[UML diagram: the Client sets a factory via the AbstractFactory and gets products from it; FactoryA and FactoryB are concrete factories producing ConcreteProductA and ConcreteProductB respectively, both subclasses of the Product used by the Client]
A typical application is the development of something such as the user
interface for a program that must operate on several platforms. Ideally, the
gui components for a program should have a distinct look and feel on each
platform. But rather than rewriting the user interface for each platform the
appearance of the components for each platform is held in a separate factory. Then the program can choose the appropriate factory at runtime depending
on the platform it is on. In Java the UIManager class plays the role of such
an abstract factory class, and its setLookAndFeel method is used to load a
pluggable look and feel factory which handles the appearance of the components
on the platform transparently to the client program:
try
{ UIManager.setLookAndFeel(
UIManager.getSystemLookAndFeelClassName());
}
catch (Exception e)
{ System.out.println("Exception: " + e.getMessage());
}
Each of the Swing gui components can then query the look and feel factory
for how it should be displayed on that platform, avoiding the need for separate
components on each platform.
An abstract factory is chosen when a client application needs to be config-
urable with one of several families of classes. One benefit of using the Abstract
Factory Pattern is that the abstract factory class hides the concrete classes that
are generated. Because of this, the actual family of classes used to produce the
objects can be freely changed (plugged in) without affecting the client program.
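
A compact sketch in the spirit of the look and feel example is given below; all of the class names here (Button, MetalButton, OceanButton, ButtonFactory, and so on) are illustrative and are not part of any real Swing API:

interface Button
{  void paint();
}

class MetalButton implements Button
{  public void paint() { System.out.println("metal-style button"); }
}

class OceanButton implements Button
{  public void paint() { System.out.println("ocean-style button"); }
}

abstract class ButtonFactory
{  public abstract Button createButton();

   // the client asks the abstract factory for the appropriate family
   public static ButtonFactory getFactory(String lookAndFeel)
   {  if (lookAndFeel.equalsIgnoreCase("metal"))
         return new MetalButtonFactory();
      else
         return new OceanButtonFactory();
   }
}

class MetalButtonFactory extends ButtonFactory
{  public Button createButton() { return new MetalButton(); }
}

class OceanButtonFactory extends ButtonFactory
{  public Button createButton() { return new OceanButton(); }
}

The client only ever refers to ButtonFactory and Button, so the family of classes actually used can be changed without affecting the client code.
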
Sometimes it is desirable to ensure that a class cannot have multiple instances
created from it. A class can prohibit any instances being created from it by
declaring its (empty) default constructor as private so that it can not be used
outside the class, and just having static methods and fields (this is done in the Math class in Java). An alternative is to use the Singleton Pattern, a
creational pattern that ensures that at most one instance is created from the
class. Firstly, all the constructors of the class are made private, and a private
static (class) field is used to reference the single instance of the class. Then
a static method is used that checks whether the instance has been created,
creating it if necessary, and returning the instance.
Singleton
-instance : Singleton
-Singleton()
+getInstance() : Singleton
The Singleton Pattern is typically used if one instance of a class can be
shared, especially if there is a lot of overhead in creating an object from the
class.
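
A minimal Java sketch follows the class diagram above; making getInstance synchronized is one simple way to keep the lazy creation thread-safe (an illustrative choice, not the only possibility):

public class Singleton
{
   private static Singleton instance; // the single shared instance

   private Singleton() // private so it cannot be instantiated elsewhere
   {
   }

   public static synchronized Singleton getInstance()
   {  if (instance == null)
         instance = new Singleton();
      return instance;
   }
}
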
Exercise 2.1 (Handling Different Locales) Suppose you are designing an
application for managing address and telephone information as described by the
following two UML diagrams:
Address: +setStreet(street:String):void, +setCity(city:String):void, +setPostalCode(code:int):void, +setRegion(region:String):void, +getCountry():String, +getFullAddress():String

PhoneNumber: +setPhoneNumber(phone:int):void, +getFullPhoneNumber():String

Typical New Zealand and French addresses and phone numbers appear as:
31 Symonds Street, Auckland 1020, New Zealand, +64 9 921 9999
Musee du Louvre, F-75058 Paris Cedex, FRANCE, +33 1 40 20 50 50
Prepare a suitable software design that has the flexibility for handling such in-
formation.

2.2 Structural Patterns


Reading: none
Structural patterns deal with the best way of combining classes (by using
inheritance) or objects (by using object composition) to form larger structures.
The Adapter Pattern is used to convert the programming interface of a
client class so that it is suitable for some requirement. This allows classes with
incompatible interfaces to work together. One way the Adapter Pattern can be
used to create an object that does implement the required interface is to use
inheritance to extend the client class to a class that does implement it. An
alternative is to use composition to include an instance of the client class inside
another class which implements the interface.

[UML: two adapter variants: an Adapter that extends the Client class and
implements the Requirement interface, and an Adapter that implements
Requirement and holds a Client instance]

Another type of adapter is provided in Java by classes such as MouseAdapter,


which implements the MouseListener interface. Such adapter classes have
empty implementations of the methods of the required interface. A class that
needs to implement the interface can just extend the adapter class and override
the methods of interest to it.
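As an illustration, the following minimal sketch uses the composition approach;
the Thermometer and CelsiusSensor types are hypothetical and not part of these notes:

// the interface required by the client code
interface CelsiusSensor
{  double getCelsius();
}

// an existing class with an incompatible interface
class Thermometer
{  public double getFahrenheit()
   {  return 98.6; // placeholder reading
   }
}

// the adapter satisfies the required interface by wrapping a Thermometer
public class ThermometerAdapter implements CelsiusSensor
{
   private Thermometer thermometer;

   public ThermometerAdapter(Thermometer thermometer)
   {  this.thermometer = thermometer;
   }

   public double getCelsius()
   {  return (thermometer.getFahrenheit()-32.0)*5.0/9.0;
   }
}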
The Composite Pattern describes how to build a hierarchy of objects. This
pattern uses a single node interface that describes any node in the tree, which
has methods for obtaining the children of the node (typically by an iterator or
one at a time). This node interface is implemented by a class that represents a
composite (a node that can have zero or more children) and possibly also by a
class that represents a leaf (a node that is not allowed children).
This pattern is typically followed when building tree data structures. For
example the TreeNode interface is used for building Swing tree models, and
the Node interface is used for an XML document tree (whose getChildNodes
method returns a NodeList object rather than an iterator of the child nodes).

≪interface≫ Node
  +children() : Iterator<Node>
  +getChildAt(index : int) : Node
  +getChildCount() : int
  +add(child : Node) : void
  +remove(child : Node) : void
(implemented by both Leaf and Composite, where a Composite aggregates 0..∗ child Nodes)
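A minimal Java sketch loosely based on the Node interface above (simplified and
hypothetical, with a leaf that simply has no children) might be:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

interface Node
{  Iterator<Node> children();
   int getChildCount();
}

// a leaf is a node that is not allowed children
class Leaf implements Node
{  public Iterator<Node> children()
   {  return Collections.<Node>emptyList().iterator();
   }
   public int getChildCount()
   {  return 0;
   }
}

// a composite is a node that can hold zero or more child nodes
public class Composite implements Node
{
   private List<Node> childNodes = new ArrayList<Node>();

   public void add(Node child)
   {  childNodes.add(child);
   }
   public void remove(Node child)
   {  childNodes.remove(child);
   }
   public Iterator<Node> children()
   {  return childNodes.iterator();
   }
   public int getChildCount()
   {  return childNodes.size();
   }
}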

The Decorator Pattern provides a way of extending the functionality of (decorating)
a component class without having to create derived classes for each subclass of
the component class. This is achieved by having the decorator class extend the
component class and also hold an instance of the component, which is usually
provided to its constructor. It then overrides the methods of the component class,
providing its own decorated version of them.
[UML: Decorator extends Component and holds the Component passed to its
constructor +Decorator(c:Component); ConcreteDecoratorA and ConcreteDecoratorB
extend Decorator]

Decorators are widely used when performing input/output with streams. Fil-
tering streams such as BufferedInputStream, DataInputStream, and Input-
StreamReader all follow the Decorator Pattern, since they accept an existing
stream as a parameter to their constructor and manipulate the data in the
stream to give a new (decorated) stream.
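For instance, a typical chaining of these decorators might look as follows (the
file name is hypothetical):

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamDecoratorDemo
{
   public static void main(String[] args) throws IOException
   {  // each filtering stream decorates the stream passed to its constructor
      InputStream raw = new FileInputStream("data.bin");
      DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
      int value = in.readInt(); // buffered, typed reads over the raw stream
      System.out.println("first int in file: " + value);
      in.close();
   }
}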

This pattern is also convenient for decorating gui components, as illus-


trated by the example GUIDecorator and its subclass PassOverDecorator. The
PassOverDecorator class can be used to decorate any JComponent gui compo-
nent, such as a JLabel, JButton, or JTextField. Once a gui component has
been created it can be passed to the constructor of PassOverDecorator which
alters the foreground colour of the component whenever the mouse passes over
it.

/**
An example of a decorator for gui components
Adapted from The Design Patterns Java Companion by Cooper
@author Andrew Ensor
*/
import java.awt.BorderLayout;
import javax.swing.JComponent;

public class GUIDecorator extends JComponent


{
protected JComponent component; // component to decorate

public GUIDecorator(JComponent component)


{ super();
this.component = component;
setLayout(new BorderLayout());
add(component, BorderLayout.CENTER);
}
}

[UML: a Façade class uses SubsystemA, SubsystemB, and SubsystemC]

The Façade Pattern provides a way of hiding a complex system inside a


simpler interface. As a software application evolves it usually grows in com-
plexity, resulting in a myriad of classes with a lot of functionality that only an
advanced user would be interested in using. To shield the typical user from
the complexities of the system the Façade Pattern uses a façade class which
provides a simplified programming interface to the system, incorporating only
the features that a typical user might need. A more advanced user is permitted
to bypass the interface when necessary to access the full functionality of the
system.
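A minimal sketch of a façade, with hypothetical subsystem classes for placing an
order, might be:

// simplified subsystem classes with placeholder behaviour
class Inventory
{  public boolean reserve(String item)
   {  return true;
   }
}
class Billing
{  public void charge(String customer, double amount)
   {}
}
class Shipping
{  public void dispatch(String item, String customer)
   {}
}

// the façade offers one simple method that coordinates the subsystems
public class OrderFacade
{
   private Inventory inventory = new Inventory();
   private Billing billing = new Billing();
   private Shipping shipping = new Shipping();

   public boolean placeOrder(String customer, String item, double amount)
   {  if (!inventory.reserve(item))
         return false;
      billing.charge(customer, amount);
      shipping.dispatch(item, customer);
      return true;
   }
}

A typical user calls placeOrder, while an advanced user can still work with the
subsystem classes directly.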
The Proxy Pattern is used to represent a complex subject by a simpler one
(called a proxy). A proxy is sometimes used in a situation when creating the
real subject might take considerable time or resources, and a virtual proxy is
used in its place until the real subject is needed. Other types of proxies include
a remote proxy which provides a local representative of the real subject that
is located in a different address space, and a protection proxy which controls
access to the real subject or performs additional actions when it is accessed.
The example class RemoteImage is an abstract class that represents an image
obtained across a network connection from a specified uniform resource locator
(URL). The class ConcreteImage provides a concrete subclass of it which uses
a Toolkit to obtain the image.

/**
Decorator for any JComponent to change the foreground colour
whenever the mouse passes over the component
@see GUIDecorator.java
*/
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Toolkit;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import javax.swing.JButton;
import javax.swing.JComponent;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JTextField;

public class PassOverDecorator extends GUIDecorator


{
private Color defaultForeground, mouseOverForeground;

public PassOverDecorator(final JComponent component)


{ super(component);
defaultForeground = component.getForeground();
mouseOverForeground = new Color(255-defaultForeground.getRed(),
defaultForeground.getGreen(), defaultForeground.getBlue());
component.addMouseListener(new MouseAdapter()
{ public void mouseEntered(MouseEvent e)
{ component.setForeground(mouseOverForeground);
}

public void mouseExited(MouseEvent e)


{ component.setForeground(defaultForeground);
}
});
}
}

[UML: a Client uses a Subject; both RealSubject and Proxy are Subjects, and the
Proxy holds a 0..1 reference to the RealSubject]

/**
An abstract class that represents an image which is
obtained via a URL, as a demonstration of the Proxy Pattern
@author Andrew Ensor
*/
import java.awt.Image;
import java.net.URL;

public abstract class RemoteImage


{
private URL url;

public RemoteImage(URL url)


{ this.url = url;
}

public URL getURL()


{ return url;
}

public abstract Image getImage();


}

Since the Toolkit method getImage returns immediately (loading the image in a
separate thread), a MediaTracker object is used to wait for the image to be fully
loaded before returning it. However, if the
network connection is slow the ConcreteImage class takes time to return the
image, holding up the application. As an alternative, the class ProxyImage is a
virtual proxy which immediately returns a temporary image, and itself creates
a ConcreteImage object in a separate thread. When the ConcreteImage has
finished loading the image from the URL the proxy replaces the temporary
image by it. Hence the application is not held up while the image is being
obtained but still displays something while it is loading.

Exercise 2.2 (Parity Check for Streams) Prepare a class called Parity-
OutputStream that extends the class FilterOutputStream and which uses the
Decorator Pattern to add an even parity bit to every seven bits that are written
to the stream. Take care with the last byte written and test your class with a
simple driver program.

/**
A class that represents a ConcreteImage which is obtained via a URL,
and whose getImage method blocks until the image is loaded
@see RemoteImage.java
*/
import java.awt.Component;
import java.awt.Image;
import java.awt.MediaTracker;
import java.awt.Toolkit;
import java.net.URL;
import java.util.Properties;

public class ConcreteImage extends RemoteImage


{
private Image image;
private Component component;

// constructor for creating an image located at the specified


// URL that will appear on specified component
public ConcreteImage(URL url, Component component)
{ super(url);
image = null; // image is not yet loaded
this.component = component;
// set the proxy host and port number for AUT’s firewall
// note this is not needed if no firewall present
Properties props = new Properties(System.getProperties());
props.put("http.proxySet", "true"); // true if using proxy
props.put("http.proxyHost", "cache.aut.ac.nz"); // AUT specific
props.put("http.proxyPort", "3128"); // AUT specific
System.setProperties(props);
}

// returns the image, note that this implementation blocks


// until the image is loaded
public Image getImage()
{ if (image == null)
{ image = Toolkit.getDefaultToolkit().getImage(getURL());
// wait until the image has been loaded
MediaTracker tracker = new MediaTracker(component);
tracker.addImage(image, 0); // id 0 assigned to the image
try
{ tracker.waitForID(0);
}
catch (InterruptedException e)
{}
}
return image;
}
}

/**
A class that represents a proxy image which is obtained via a URL,
with a temporary image displayed until the actual image is loaded
@see RemoteImage.java
*/
import java.awt.Component;
import java.awt.Image;
import java.net.URL;
import javax.swing.ImageIcon;

public class ProxyImage extends RemoteImage


{
private Component component;
private ConcreteImage ci;
private boolean imageAvailable;
private ImageIcon tempIcon;

// constructor for creating an image located at the specified


// URL that will appear on specified component
public ProxyImage(URL url, Component component)
{ super(url);
this.component = component;
ci = null; // no concrete image yet loaded
imageAvailable = false;
// load the temporary image
tempIcon = new ImageIcon("downloading.png"); // local image
// create a separate thread to get the desired image from URL
Thread thread = new Thread(new ConcreteImageLoader());
thread.start();
}

// returns the image, note that this implementation returns


// immediately
public Image getImage()
{ if (imageAvailable)
return ci.getImage();
else
return tempIcon.getImage();
}

// inner class that loads an image using ConcreteImage


private class ConcreteImageLoader implements Runnable
{
public void run()
{ ci = new ConcreteImage(getURL(), component);
ci.getImage();
imageAvailable = true;
component.repaint(); // repaint component with loaded image
}
}
}

2.3 Behavioral Patterns


Reading: none
The Iterator Pattern is a very common pattern used to obtain the elements
in a collection without exposing the details of the data structure that has been
used to implement the collection. Typically an iterator is described by an inter-
face which has methods such as hasNext to determine whether there are more
elements in the collection to return, and next to return the next element. To
move back to the start of the iteration one creates a new instance of the iterator
(alternatively an iterator might have a first method). Some iterators might
provide a remove method to remove the element from the collection that was
most recently returned by the next method.

[UML: a Client uses the ≪interface≫ Collection<E> and obtains elements from the
≪interface≫ Iterator<E>, which declares +next() : E, +hasNext() : boolean, and
+remove() : void; ConcreteCollection implements Collection<E> and
ConcreteIterator implements Iterator<E>]
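A minimal sketch of the pattern in Java (hypothetical names), where a collection
exposes its elements without revealing that an array is used internally, might be:

import java.util.Iterator;
import java.util.NoSuchElementException;

public class NameCollection implements Iterable<String>
{
   private String[] names = {"ada", "grace", "alan"};

   public Iterator<String> iterator()
   {  return new Iterator<String>()
      {  private int cursor = 0; // index of the next element to return

         public boolean hasNext()
         {  return cursor < names.length;
         }

         public String next()
         {  if (!hasNext())
               throw new NoSuchElementException();
            return names[cursor++];
         }

         public void remove()
         {  throw new UnsupportedOperationException();
         }
      };
   }
}

A client can then write for (String name : new NameCollection()) without knowing
how the elements are stored.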

The Iterator Pattern does not specify the behaviour of an iterator if the
collection is modified somehow as it is iterating through its elements, such as if
another thread adds an element. A fail-fast iterator is an iterator that throws
an exception (such as a ConcurrentModificationException) if the collection
is modified in some way other than by using the iterator’s remove method.
Some iterators such as Java’s ListIterator interface provide further meth-
ods supporting bi-directional iteration through the elements in a collection. Usu-
ally such iterators are only used for collections that can somehow be efficiently
traversed in both forward and backward directions (such as a doubly-linked list).
The cursor position (index) of a bi-directional iterator is considered to always
lie between elements, immediately after the (previous) element most recently
returned by next.
For example, the JDBC API libraries use a database connection to cre-
ate a statement of a specified type, either the default TYPE_FORWARD_ONLY (re-
sult set is not scrollable, meaning that the iterator moves only in one direc-
tion), TYPE_SCROLL_INSENSITIVE (result set is scrollable but not sensitive to
database changes, meaning that the iterator can be jumped to any index),
or TYPE_SCROLL_SENSITIVE (result set is scrollable and sensitive to database
changes), and of a specified concurrency, either CONCUR_READ_ONLY (result set
cannot be used to update the database) or CONCUR_UPDATABLE (result set can
be used to update the database). One peculiarity with a ResultSet iterator is

that it requires an initial call to next to position the cursor at the first record:
Statement stmt = con.createStatement(type, concurrency );
String command = "SELECT ...";
ResultSet rs = stmt.executeQuery(command);
while (rs.next())
{ ...= rs.getXxx (...);
...
}
Another type of iterator is a filtered iterator which iterates through only
those elements of the collection that satisfy some condition (this portion of the
collection is called a view ). Others might allow some control over the order
in which the iterator returns the elements of the collection. For example, a
Java ME record store has a RecordEnumeration iterator for iterating through
its byte[] records, which allows an optional filter to filter out only certain
records, an optional comparator to iterate the elements in a certain order, and a
boolean to indicate whether the iterator should be kept updated with (sensitive
to) changes in the record store:
RecordEnumeration re = rs.enumerateRecords(filter,
comparator, sensitive );
while (re.hasNextElement())
{ byte[] bytes = re.nextRecord();
...
}

Observable
  -listeners : Collection<Observer>
  +addListener(o:Observer) : void
  +removeListener(o:Observer) : void
  +notifyAll() : void

≪interface≫ Observer
  +process(e:Event) : void

(a Subject extends Observable and notifies the 0..∗ registered ConcreteObserver instances)

The Observer Pattern provides a way to have one or more classes (called ob-
servers or listeners) notified of particular events. The subject that will generate
the events has methods such as addListener and removeListener for manipu-
lating a collection it holds of observers registered with it that should be notified
of the events. It also often has a method called notifyAll or fireXxx which
calls some process method for each registered observer, passing it details of
the event as a parameter. This process method would typically perform some
action as a consequence of the event.
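A minimal sketch of such a subject and its observers in Java (hypothetical names)
might be:

import java.util.ArrayList;
import java.util.Collection;

// the observer interface that listeners implement
interface TemperatureListener
{  void temperatureChanged(double celsius);
}

// the subject keeps a collection of registered listeners and notifies them
public class Thermostat
{
   private Collection<TemperatureListener> listeners =
      new ArrayList<TemperatureListener>();

   public void addListener(TemperatureListener listener)
   {  listeners.add(listener);
   }

   public void removeListener(TemperatureListener listener)
   {  listeners.remove(listener);
   }

   // called when an event occurs, passing details to every registered observer
   public void fireTemperatureChanged(double celsius)
   {  for (TemperatureListener listener : listeners)
         listener.temperatureChanged(celsius);
   }
}

A client registers an observer with addListener (for example an anonymous class
whose temperatureChanged method updates a display), and the thermostat calls
fireTemperatureChanged whenever a new reading arrives.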

A common example of the Observer Pattern is the Model-View-Controller


architecture for GUI components, which actually applies the pattern in two
ways. A Swing GUI component such as a JList, JTable, or a JTree provides a
view (visual component) for data that is held in a separate data structure called
the model (such as a ListModel, a TableModel, or a TreeModel). The controller
provides the communication between the model and the view, and is either a
separate class or else is incorporated as part of the model or the view. The
controller(s) is notified when an event happens on the view by being registered
as a listener on the component that generates events. Likewise, the controller(s)
is notified whenever a change is made to the model by being registered as a
listener to changes on the model.
A form of the Observer Pattern is used in multithreaded applications as a way
to control concurrent thread access to synchronized blocks of code. Each block
of synchronized code is controlled by a monitor, which acts as an observable to
the threads that are in its wait set. Its wait method adds a thread as a listener,
whereas the notifyAll method notifies all the listening threads and results in
them being removed as listeners.
One variation of the Producer/Consumer example demonstrates a com-
mon pattern where two threads communicate with each other using wait and
notifyAll. The example class Transmitter is used to create one thread that
sends information to an intermediary Messenger object. Another thread cre-
ated from the class Receiver obtains the information one item at a time from the
Messenger. If the Messenger already holds an item of information when the
Transmitter tries to send more, the Transmitter thread is made to wait.
When the Receiver thread obtains the information, it notifies the Transmitter
thread so that it can proceed with sending more information. Likewise, when
the Messenger holds no item the Receiver thread is made to wait for a no-
tification from the Transmitter before it can obtain the information. In this
example the Messenger object (the monitor) maintains a wait set of observing
threads and its notifyAll method informs the threads that they can continue
processing.
[UML: a Context holds -current : Strategy; the Strategy declares +perform() : void,
which is implemented by ConcreteStrategyA and ConcreteStrategyB]
The Strategy Pattern is used to select between several alternative algorithms
or strategies. A context object holds a reference to an instance of the chosen
strategy which might be determined by the client or instead selected by the
context itself according to the situation. The pattern encourages the algorithm
for each strategy to be implemented in its own class rather than coded as meth-
ods in the context itself. All the strategies inherit from a common parent class
(or implement a common interface), so new strategies can be easily added with
only minor additions to the context. This pattern is typically used when there
are several alternative ways of performing some task.

/**
A class that represents a Transmitter of objects of type E that
are sent to the Receiver (operating in a separate thread) one
item at a time via a Messenger
@author Andrew Ensor
*/
import java.util.Collection;
import java.util.Random;

public class Transmitter<E> implements Runnable


{
private Messenger<E> messenger;
private Collection<E> information;
private Random generator;

public Transmitter(Messenger<E> messenger,


Collection<E> information)
{ this.messenger = messenger;
this.information = information;
generator = new Random();
}

public void run()


{ // transmit information to the messenger with a random delay
// between each transmission
for (E item : information)
{ System.out.println("Transmitter about to pass " + item
+ " to the messenger");
messenger.put(item); // waits until item has been put
try // sleep for up to 500ms
{ Thread.sleep(generator.nextInt(500));
}
catch (InterruptedException e)
{}
}
messenger.stopAccepting();
System.out.println("Transmitter stopping");
}
}

/**
A class that represents a Receiver of objects of type E that
are sent from the Transmitter (operating in a separate thread) one
item at a time via a Messenger
@author Andrew Ensor
*/
import java.util.Collection;
import java.util.Random;

public class Receiver<E> implements Runnable


{
private Messenger<E> messenger;

public Receiver(Messenger<E> messenger)


{ this.messenger = messenger;
}

public void run()


{ // receive information from the messenger
while (messenger.isAccepting() || messenger.hasItem())
{ E item = messenger.get(); // waits until item has been gotten
if (item != null) // skip the null item returned when the messenger stops
System.out.println("Receiver has gotten " + item
+ " from the messenger");
}
System.out.println("Receiver stopping");
}
}

/**
A class that represents a Messenger which is passed an object of
type E (by a Transmitter) and holds it until it is requested (by a
Receiver). Note that further attempts by a thread to give the
messenger an object while it holds one are made to wait, and
attempts by a thread to obtain the object while the messenger does
not hold one are made to wait.
@author Andrew Ensor
*/

public class Messenger<E>


{
private E item;
private boolean empty; // whether messenger holds an item
private boolean accepting; // whether messenger allows objects

public Messenger()
{ item = null;
empty = true;
accepting = true;
}

public synchronized void put(E item)


{ if (!accepting)
throw new IllegalStateException("Messenger not accepting");
// make the current thread (ie Transmitter) wait until the
// previous item has been requested
while (!empty)
{ try
{ wait();
}
catch (InterruptedException e)
{}
}
this.item = item;
empty = false;
// notify all other threads (ie Receiver) of the change in the
// status of the messenger
notifyAll();
}

public synchronized E get()


{ // make the current thread (ie Receiver) wait until the
// messenger holds an item
while (empty && accepting)
{ try
{ wait();
}
catch (InterruptedException e)
{}
}
if (empty)
return null; // no item to return
empty = true;
// notify all other threads (ie Transmitter) of the change in the
// status of the messenger
notifyAll();
return item;
}

public synchronized boolean hasItem()


{ return !empty;
}

public synchronized boolean isAccepting()


{ return accepting;
}

public synchronized void stopAccepting()


{ accepting = false;
System.out.println("Messenger requested to stop accepting");
notifyAll();
}
}

/**
A class which demonstrates communication between two threads,
one created from a Transmitter and the other from a Receiver
which communicate via a Messenger object by observing its
thread notifications
@author Andrew Ensor
*/
import java.util.ArrayList;
import java.util.Collection;

public class ThreadCommunication


{
public static void main(String[] args)
{ Messenger<String> messenger = new Messenger<String>();
String[] items = {"Proceed", "Solinus", "to", "procure",
"my", "fall", "And", "by", "the", "doom", "of", "death",
"end", "woes", "and", "all"};
Collection<String> information = new ArrayList<String>();
for (String item : items)
information.add(item);
Transmitter<String> transmitter = new Transmitter<String>
(messenger, information);
Receiver<String> receiver = new Receiver<String>(messenger);
Thread transmitterThread = new Thread(transmitter);
Thread receiverThread = new Thread(receiver);
transmitterThread.start();
receiverThread.start();
}
}

The Strategy Pattern allows the choice of strategy to be made dynamically without
requiring any changes to the code in a client program.
The layout of gui components in a panel provides a good illustration of
the strategy pattern. A gui Container (the context) holds a reference to
a LayoutManager instance which can be changed via its setLayout method.
Each class that implements LayoutManager, such as BorderLayout, BoxLayout,
CardLayout, FlowLayout, GridLayout provides a different algorithmic strategy
for positioning the components in the container. A client just needs to select the
desired strategy and the manager provides the necessary algorithm to correctly
position each component.
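A minimal sketch of the pattern in Java (hypothetical names, with placeholder
algorithm bodies) might be:

// the common interface that all strategies implement
interface CompressionStrategy
{  byte[] compress(byte[] data);
}

class NullCompression implements CompressionStrategy
{  public byte[] compress(byte[] data)
   {  return data; // no compression performed
   }
}

class RunLengthCompression implements CompressionStrategy
{  public byte[] compress(byte[] data)
   {  return data; // a real run-length encoding is omitted here
   }
}

// the context delegates to whichever strategy is currently selected
public class Compressor
{
   private CompressionStrategy current = new NullCompression();

   public void setStrategy(CompressionStrategy strategy)
   {  current = strategy;
   }

   public byte[] perform(byte[] data)
   {  return current.compress(data);
   }
}

A new strategy is added by writing another class that implements
CompressionStrategy, with no change to the Compressor or its clients.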
The Template Pattern is the pattern for describing inheritance in object-oriented
programming languages. It is used to define an algorithm in a class (the template
for the algorithm) but leave some of the implementation details to one or more
subclasses. This pattern is very often employed in object-oriented programming
where some parts of an algorithm can be implemented in the superclass, but other
parts might have several implementations which are left to subclasses.
[UML: a ConcreteClass extends the AbstractClass that defines the template]
The Template Pattern classifies methods into four types, illustrated in the sketch after this list:

concrete methods that are implemented in the superclass and which the sub-
classes would use without being overridden (such as final methods),
abstract methods that are not implemented in the superclass and so must be
implemented by subclasses,

hook methods that have a default implementation in the superclass, but


which some subclasses might want to override,
template methods that are implemented in the superclass and not intended
to be overridden, describing some algorithm using (abstract or hook)
methods that might themselves be overridden, changing the behaviour
of the template method.
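A minimal sketch illustrating the four kinds of methods (hypothetical names) might be:

public abstract class ReportGenerator
{
   // template method: a fixed algorithm, not intended to be overridden
   public final String generate()
   {  return header() + body() + footer();
   }

   // concrete method: implemented here and used as-is by subclasses
   private String footer()
   {  return "-- end of report --";
   }

   // hook method: a default implementation that subclasses may override
   protected String header()
   {  return "REPORT\n";
   }

   // abstract method: every subclass must supply its own implementation
   protected abstract String body();
}

class SalesReport extends ReportGenerator
{  protected String body()
   {  return "sales figures...\n";
   }
}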

Exercise 2.3 (Strategy for Layout Managers) Write a simple gui panel
that demonstrates the Strategy Pattern for determining the layout of the compo-
nents of the panel. You might like to consider using a custom layout manager
by preparing a class that implements the LayoutManager interface (such as the
example CircleLayout that is available on AUT online).
Chapter 3

Algorithmic Analysis

3.1 Basic Analysis


Reading: pp5-13,15-37
An algorithm is a step-by-step procedure for solving a problem that takes
some values as input and produces some other values as output. Two impor-
tant characteristics of an algorithm are its efficiency and its correctness. The
efficiency of an algorithm can be measured based on various factors, such as
storage or network resource requirements, but usually it is the running time
and to a lesser extent the memory usage that are typically of greatest interest.
The correctness refers to whether the algorithm eventually halts and produces
the correct output.
For example, one very common computing problem is the sorting problem.
This problem takes as input a sequence (list) of n elements ⟨a0, a1, . . . , an−1⟩
(numbers or objects that are comparable) and should have as output a permutation
⟨a_{j0}, a_{j1}, . . . , a_{j(n−1)}⟩ of the sequence that is ordered
a_{j0} ≤ a_{j1} ≤ · · · ≤ a_{j(n−1)}.
Insertion sort is one example of an algorithm that correctly solves the sorting
problem. This algorithm works by incrementally enlarging a sorted subset and
inserting each element one at a time into the correct position in the subset,
moving larger elements over to accommodate it. This algorithm takes as input
an array (or list) A and gives the same array as output, referred to as in-place
sorting since the array itself is sorted rather than a separate copy of its elements.
It can be described using the pseudocode shown below.
The analysis of this algorithm in order to predict the resources it will re-
quire when translated into a programming language and executed on a computer
requires some assumptions about the computing device itself. The Random Ac-
cess Model (RAM) views a computer as a single processor connected to memory
which it can access each part of in constant time (no caches or virtual memory).
The RAM model presumes that instructions are executed sequentially, with no
actual concurrency (as would be possible with processors working in parallel).
Allowing for parallel processors or a memory hierarchy changes the analysis of
the algorithm.
If the Insertion-Sort algorithm is to be executed on a RAM machine then
its time efficiency can be analyzed as follows.


Insertion-Sort(A)
1 for i ← 1 to length[A] − 1 do
2 key ← A[i]
3 ▷ Insert A[i] into the sorted sequence A[0 . . i − 1]
4 insertIndex ← i
5 while insertIndex > 0 and A[insertIndex −1] > key do
6 ▷ shift element at insertIndex −1 along one to make space
7 A[insertIndex ] ← A[insertIndex −1]
8 insertIndex ← insertIndex −1
9 A[insertIndex ] ← key

Let c1 denote the time required each time around the for loop to increment the
variable i and check whether it is greater than length[A] − 1, and let c2, c4, c7,
c8, c9 denote the times required
to perform the assignment statements in lines 2, 4, 7, 8, 9 respectively. Likewise
suppose that c5 represents the time required by the RAM machine to check the
two boolean expressions in line 5 (actually if the first expression is false the
second might not even get evaluated). If the array A has n elements then the
for loop gets executed n − 1 times (once each for inserting the second, third,
. . . , n-th element). For each of these i = 1, 2, . . . , n − 1 iterations let mi denote
the number of times the while loop gets tested, so the code on lines 6, 7, 8 gets
executed mi − 1 times for each iteration, and 1 ≤ mi ≤ i (mi = 1 if A[i] gets
inserted at index i, and mi = i if A[i] gets inserted at index 0 so the while loop
gets checked all i times). Now, presuming that the comments in lines 3 and 6
take no processor time, the total time T(n) required for the algorithm would
be:
T(n) = c1 n + c2(n − 1) + 0(n − 1) + c4(n − 1) + c5 Σ_{i=1}^{n−1} mi + 0 Σ_{i=1}^{n−1} (mi − 1)
        + c7 Σ_{i=1}^{n−1} (mi − 1) + c8 Σ_{i=1}^{n−1} (mi − 1) + c9(n − 1).

The formula for the time T (n) depends not only on the number n of elements
to be sorted but also on the initial ordering of the elements, which determines
the known values m1 , m2 , . . . , mn−1 . The best case (i.e. least time) is when
m1 = 1, m2 = 1, . . . , mn−1 = 1 which corresponds to the elements all being
initially in order. In this case:
T (n) = c1 n + c2 (n − 1) + c4 (n − 1) + c5 (n − 1) + c9 (n − 1)
= (c1 + c2 + c4 + c5 + c9 ) n − (c2 + c4 + c5 + c9 ) ,
so the running time is a linear function T (n) = an + b of n. The worst case (i.e.
greatest time) is when m1 = 1, m2 = 2, . . . , mn−1 = n − 1, which does occur if
the elements are exactly in reverse order. In this case:
T(n) = c1 n + c2(n − 1) + c4(n − 1) + c5 Σ_{i=1}^{n−1} i
        + c7 Σ_{i=1}^{n−1} (i − 1) + c8 Σ_{i=1}^{n−1} (i − 1) + c9(n − 1).

Using the formulas Σ_{i=1}^{n−1} i = n(n − 1)/2 and Σ_{i=1}^{n−1} (i − 1) = (n − 1)(n − 2)/2 gives:

T(n) = c1 n + c2(n − 1) + c4(n − 1) + c5 · n(n − 1)/2 + c7 · (n − 1)(n − 2)/2
        + c8 · (n − 1)(n − 2)/2 + c9(n − 1)
     = ((c5 + c7 + c8)/2) n² + (c1 + c2 + c4 − (1/2)c5 − (3/2)c7 − (3/2)c8 + c9) n
        + (−c2 − c4 + c7 + c8 − c9),

so the running time is a quadratic function of n. For this reason the Insertion-
Sort algorithm is considered to have O(n²) time complexity, since for some
inputs its running time can be at worst a quadratic function of the input size n.
One technique that is useful in determining the correctness of algorithms
containing loops is to use loop invariants. A loop invariant is a boolean state-
ment that is correct for each iteration of the loop. For example, a loop invariant
for the for loop in the Insertion-Sort algorithm is:

at the start of iteration i of the for loop the subarray A[0 . . i − 1]


is a permutation of the elements that were originally in A[0 . . i − 1]
but in sorted order.

To show a loop invariant is true an inductive proof is used. First the statement
is shown to be true at the start of the first iteration of the loop. Next it is
shown that if it is true for iteration i then it must also be true for the following
iteration i + 1. Upon termination of the loop the loop invariant should provide
useful information for the analysis of the correctness of the algorithm.
The given loop invariant for the Insertion-Sort algorithm can be verified
as follows. First, the loop starts with i = 1 and so the subarray A[0 . . i − 1]
consists of only one element A[0], which is the original element at A[0] (since
the loop has not yet swapped any elements) and is trivially sorted. Next, if the
loop invariant is true for iteration i then at the start of iteration i the subarray
A[0 . . i − 1] is a sorted permutation of the original elements A[0 . . i − 1]. To
see that the loop invariant holds for the start of the next iteration i + 1 the
statements inside the for loop executed during iteration i are studied. Note
that the inner while loop just inserts the next element A[i] in its correct sorted
position (formally this would be justified with another loop invariant for the
while loop). Hence at the completion of iteration i the subarray A[0 . . i] holds
the elements that were originally at A[0 . . i − 1] and A[i]. Since A[0 . . i − 1] were
in sorted order at the start of the iteration and A[i] has been inserted at the
correct sorted position, the loop invariant will be valid for the start of the next
iteration i + 1. Note that the loop terminates at the start of iteration i = n, and
at this stage the loop invariant states that A[0 . . n − 1] is a permutation of the
elements that were originally in A[0 . . n − 1] but in sorted order. This proves
that the Insertion-Sort algorithm does correctly solve the sorting problem.
Another algorithm that correctly solves the sorting problem is the Merge-
Sort algorithm. This algorithm uses recursion to repeatedly divide the array
A into two smaller arrays L and R, sort each, and then merge the two sorted
halves back together to one sorted array. To achieve this it typically uses an

algorithm Merge-Sort-Segment to perform the merge sort on the portion of


the array A between index p (inclusive) and r (exclusive).
An analysis of the time requirements T (n) can be determined as follows. If
c1 denotes the time required to call Merge-Sort and check whether p + 1 < r
then T (1) = c1 . For n > 1 if the time required to divide the array into two is
denoted by D(n) and the time to combine the two halves back together by C(n)
then T (n) = 2T (n/2) + D(n) + C(n). It can be shown that D(n) and C(n)
are both linear functions so can together be written as c2 n + c3 where c2 ≠ 0.
Hence T(n) can be given recursively by:

    T(n) = c1                       if n = 1
    T(n) = 2T(n/2) + c2 n + c3      if n > 1.

Solving this recurrence (using the Master Theorem of Section 3.2) gives that
the dominant term of T (n) is n log2 n in all cases (regardless of the original
order of the elements). Hence the Merge-Sort algorithm is considered to
have O (n log n) complexity.

Exercise 3.1 (Analyzing Selection Sort) Find an expression for the time
T (n) for the following Selection-Sort algorithm, and a suitable loop invariant
to verify that the algorithm correctly solves the sorting problem.

Selection-Sort(A)
1 n ← length[A]
2 for i ← 0 to n − 2 do
3 ▷ Find the least element in A[i . . n − 1]
4 indexMin ← i
5 for j ← i + 1 to n − 1 do
6 if A[j] < A[indexMin] then
7 indexMin ← j
8 swap elements at indexMin and i

3.2 Recurrence Analysis


Reading: pp41-57,62-75
The precise formula for the time T (n) (the growth function) that an algo-
rithm takes to solve a problem can be difficult to find and many of the constants
involved depend on compiler and platform-specific properties (such as processor
speed). Usually, the precise function is not required, but instead it is sufficient
to just know its dominant term, called the asymptotic complexity or order of
the algorithm. For example, in the best case the Insertion-Sort algorithm
has linear asymptotic complexity T(n) = an + b (with a ≠ 0) whereas in the
worst case it has quadratic complexity T(n) = an² + bn + c (with a ≠ 0).
The notation O (g(n)) for a function g is used to denote a set of functions:

O (g(n)) = {f (n): ∃c > 0 and ∃n0 for which 0 ≤ f (n) ≤ cg(n) for every n ≥ n0 } .

Merge-Sort(A)
1 n ← length[A]
2 Merge-Sort-Segment(A, 0, n)

Merge-Sort-Segment(A, p, r)
1 if p + 1 < r then ▷ there are several elements to be sorted
2 q ← (p + r)/2 ▷ q is the middle index between p and r
3 Merge-Sort-Segment(A, p, q)
4 Merge-Sort-Segment(A, q, r)
5 n1 ← q − p
6 n2 ← r − q
7 create arrays L[0 . . n1 − 1] and R[0 . . n2 − 1]
8 for i ← 0 to n1 − 1 do
9 L[i] ← A[p + i]
10 for j ← 0 to n2 − 1 do
11 R[j] ← A[q + j]
12 i←0
13 j←0
14 for k ← p to r − 1 do
15 ▷ determine which element to next put in A
16 if i < n1 then ▷ there are more elements in L
17 if j < n2 then ▷ there are more elements in R
18 if L[i] ≤ R[j] then
19 A[k] ← L[i]
20 i←i+1
21 else A[k] ← R[j]
22 j ←j+1
23 else A[k] ← L[i]
24 i←i+1
25 else A[k] ← R[j]
26 j ←j+1

This means that the set O(g(n)) consists of all those (non-negative) functions f(n)
that for n sufficiently large are bounded above by some (possibly large) multiple of
the function g(n). One often says f(n) is O(g(n)) if f(n) ∈ O(g(n)). The O-notation
is used to give an asymptotic upper bound on a function f(n) to within a constant factor.
[Figure: f(n) lies below cg(n) for all n ≥ n0]
For example, 7n+18 is O(n) since it is bounded above by 8n for n ≥ 18. Also
5n² + 100n log2 n − 1 is O(n²) since it is bounded above by 6n² for n ≥ 996.
But 5n² + 100n log2 n − 1 is not O(n) as no constant multiple of any linear
function can be an upper bound for 5n² + 100n log2 n − 1, which eventually
grows above any linear function. However, 7n + 18 is also considered O(n²)
since it is bounded above by n² for n ≥ 9. For this reason one often says that
the time complexity of the Insertion-Sort algorithm is O(n²) even though for
some inputs it has linear complexity.
The Ω-notation is used to give an asymptotic lower bound on a function
f (n) to within a constant factor. The set Ω (g(n)) is given by:

Ω(g(n)) = {f(n) : ∃c′ > 0 and ∃n0 for which 0 ≤ c′g(n) ≤ f(n) for every n ≥ n0}.
For example, 7n + 18 is Ω(n) since it is bounded below by 7n for n ≥ 1. It is
also Ω(1) since it is bounded below by the constant function 25 for n ≥ 1. Also
5n² + 100n log2 n − 1 is Ω(n²) since it is bounded below by 4n² for n ≥ 1. Note
that the Insertion-Sort algorithm is Ω(n), since for some inputs it has linear
time complexity, whereas for others it is quadratic.
[Figure: f(n) lies above c′g(n) for all n ≥ n0]
The Θ-notation is used to asymptotically bound a function f (n) from above
and from below by two multiples of the same function g(n), where Θ (g(n)) is
the set:

Θ(g(n)) = {f(n) : ∃c, c′ > 0 and ∃n0 for which 0 ≤ c′g(n) ≤ f(n) ≤ cg(n) for every n ≥ n0}.

Hence saying a function f(n) is Θ(g(n)) means that for large enough values of n
the function f(n) is bounded above by one multiple cg(n) of g(n) and below by
another multiple c′g(n), in which case g(n) is called an asymptotically tight bound
for f(n). For example, 7n + 18 is Θ(n) and 5n² + 100n log2 n − 1 is Θ(n²).
[Figure: f(n) lies between c′g(n) and cg(n) for all n ≥ n0]
Note that f (n) is Θ (g(n)) if and only if f (n) is both Ω (g(n)) and O (g(n)).
Hence the Θ-notation cannot be applied to the Insertion-Sort algorithm since
for some inputs it has linear complexity, whereas for others it has quadratic
complexity. On the other hand, the Merge-Sort algorithm is Θ (n log n) since
regardless of the input its time requirement T (n) always has dominant term
n log n.
Two cautions however with using the O-, Ω-, and Θ-notations to compare
algorithms for a problem. Firstly, these notations hide the constant factors ci
in an algorithm, which can be quite large (and significant when comparing two
algorithms of the same complexity or algorithms for small values of n). Secondly,
they also hide the non-dominant terms which might also be significant for small

values of n. For example, although the Insertion-Sort algorithm is O(n²)
and the Merge-Sort algorithm is Θ(n log n), for small values of n Insertion-
Sort is quite quick, and for some special inputs its time complexity is linear
even for large values of n (since it is Ω(n)).
The analysis of the time requirements of an algorithm often involves solving
recurrences, which are equations that express the values of a function T (n) for
n > 1 recursively in terms of its values for smaller values of n. Simple recurrences
of the form T (n) = c1 T (n − 1) + c2 T (n − 2) (such as the Fibonacci numbers)
can be solved using a technique that makes use of the roots of the quadratic
x2 − c1 x − c2 . However many of the recurrences encountered in algorithmic
analysis have the more complicated form:

T (n) = aT (n/b) + f (n)

where a ≥ 1 and b > 1 are constants, and f (n) is some known function. Techni-
cally n/b might mean ⌊n/b⌋ or ⌈n/b⌉, where the floor function ⌊x⌋ denotes the
largest integer that is not more than the number x and the ceiling function ⌈x⌉
denotes the smallest integer that is not less than x. However, usually the choice
does not affect the analysis and for convenience often only certain values of n,
such as powers of 2, are analyzed.
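As an added illustration (a standard worked example, not taken from the textbook),
consider the Fibonacci recurrence F(n) = F(n − 1) + F(n − 2) with F(0) = 0 and
F(1) = 1. The associated quadratic is x² − x − 1 = 0, whose roots are
φ = (1 + √5)/2 and ψ = (1 − √5)/2, so every solution of the recurrence has the
form F(n) = αφ^n + βψ^n. Matching the initial values gives α = 1/√5 and
β = −1/√5, hence F(n) = (φ^n − ψ^n)/√5. Since |ψ| < 1 the second term vanishes
as n grows, so F(n) is Θ(φ^n) and the Fibonacci numbers grow exponentially.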
Sometimes a pattern can be found in the values given by a recurrence
and an explicit form obtained for T (n). For example, consider the recurrence
T (n) = 2T (n/2) + cn where c is a constant. Calculating some values of T (n)
for successive values of n ≥ 2 gives:

T (2) = 2T (1) + 2c
T (4) = 2T (2) + 4c
= 4T (1) + 8c
T (8) = 2T (4) + 8c
= 8T (1) + 24c
T (16) = 2T (8) + 16c
= 16T (1) + 64c
T (32) = 2T (16) + 32c
= 32T (1) + 160c.

The explicit formula T(n) = nT(1) + cn log2 n for n = 2^k (k ≥ 1) can be verified
by mathematical induction, so if c ≠ 0 then T(n) is Θ(n log2 n).
An alternative way to solve a recurrence (to obtain an explicit formula for
T (n)) uses a tree to represent the times involved in the equation. A recursion
tree is a tree where each node represents one substitution of the recurrence
equation and holds a time so that the time T (n) is found by summing all the
node values in the tree.
For example, consider again the recurrence T(n) = 2T(n/2) + cn where c is a
constant. A tree is drawn whose root holds the time cn, with two children each
holding the time T(n/2). Note that the sum of all the times in the tree is T(n).
[Recursion tree: root cn with two children T(n/2) and T(n/2)]

Next, the recursion tree is grown by expanding each of the T(n/2) leaf nodes using
the recurrence, where T(n/2) = 2T(n/4) + cn/2.
[Recursion tree after one expansion: root cn; children cn/2 and cn/2; leaves
T(n/4), T(n/4), T(n/4), T(n/4)]
This process is repeated until all the leaf nodes in the tree hold the value T (1),
which cannot be expanded further using the recurrence. For the recurrence
T (n) = 2T (n/2) + cn this results in a tree with height log2 n where there are
n leaf nodes on the lowest level, each holding the term T (1), and each of the
other levels has total cn. Summing all the node values for the recursion tree
then gives that T(n) = nT(1) + cn log2 n. Hence in this example if c ≠ 0 then
T (n) is again seen to be Θ (n log2 n).
[Recursion tree for T(n) = 2T(n/2) + cn: each of the log2 n internal levels totals
cn, and the n leaves on the lowest level together total nT(1)]

The recursion tree for a general recurrence of the form T(n) = aT(n/b) + f(n)
will have height log_b n and have a^{log_b n} (which is the same as n^{log_b a}) leaf nodes
on the lowest level, each holding the term T(1).
[Recursion tree for T(n) = aT(n/b) + f(n): level i totals a^i f(n/b^i) for
i = 0, 1, 2, . . . , log_b n − 1, and the n^{log_b a} leaves on the lowest level together total
n^{log_b a} T(1)]
Then the times are summed across each level of the tree and the following
explicit formula for T (n) is obtained:
T(n) = n^{log_b a} T(1) + Σ_{i=0}^{log_b n − 1} a^i f(n/b^i).

If the first term dominates then T(n) is Θ(n^{log_b a}). If instead the terms a^i f(n/b^i)
in the summation are all roughly proportional to each other then T(n) is
Θ(n^{log_b a} log_b n). If however the first f(n) term in the summation dominates then
T(n) is Θ(f(n)).
There are further possibilities but these three useful facts are summarized in the
following important result which avoids the need to explicitly construct the re-
cursion tree.
The Master Theorem: Suppose T (n) is given for n > 1 by the recurrence

T (n) = aT (n/b) + f (n) where a ≥ 1 and b > 1 are constants, and f (n) is some
known function.
 
• If f (n) is O nlogb a− for some constant  > 0 then T (n) is Θ nlogb a .
 
• If f (n) is Θ nlogb a then T (n) is Θ nlogb a logb n .

• If f (n) is Ω nlogb a+ for some constant  > 0 and if af (n/b) ≤ rf (n) for
some constant r < 1 and all large enough n then T (n) is Θ (f (n)).

The key to applying the Master Theorem is to compare n^{log_b a} with the
complexity of f(n). For example, suppose T(n) is given by the recurrence
T(n) = 4T(n/2) + n^{3/2}. In this case n^{log_2 4} = n² and f(n) = n^{3/2} is O(n^{2 − ε})
(taking ε = 1/2), so the first case of the Master Theorem gives that T(n) is Θ(n²).
In the earlier example T(n) was given by the recurrence T(n) = 2T(n/2) + cn.
In this case n^{log_b a} = n and f(n) = cn is Θ(n), so the second case of the Master
Theorem gives that T(n) is Θ(n log2 n).
As a final example, suppose T(n) is given by the recurrence T(n) = T(n/3) + n.
In this case n^{log_3 1} = n^0 = 1 and f(n) = n is Ω(n^{0 + ε}) (taking ε = 1) with
af(n/b) = n/3 = (1/3)f(n), so the third case of the Master Theorem gives that
T(n) is Θ(n).

Exercise 3.2 (Finding the Time Complexity) The multiplication of two n-


bit big integers (integers that are too big to be handled directly by a processor) can
be performed by an algorithm that operates in time T (n) given by the recurrence
T (n) = 3T (n/2) + cn, where c > 0 is a constant that depends on the proces-
sor. Draw a recursion tree for this recurrence and use it to find the asymptotic
complexity of T (n). Apply the Master Theorem to check your answer.
Strassen’s Algorithm for multiplying two n×n matrices (discussed in Section
7.1) has running time given by T(n) = 7T(n/2) + cn² where c > 0 is a constant.
Suppose a competing algorithm has running time T(n) = aT(n/4) + cn². What is
the largest integer value for a for which the competing algorithm is asymptotically
faster than Strassen’s algorithm?
Chapter 4

Design Techniques

4.1 Divide-and-Conquer Technique


Reading: pp28-33,957-961
A brute-force approach for solving a problem simply checks every potential
solution to the problem. There are many problems for which a brute-force
approach is infeasible as the number of potential solutions could be enormous,
or even infinite. Checking just 2^n candidates quickly becomes infeasible on any
computing device as n grows large. Hence techniques for designing algorithms
are important, such as divide-and-conquer, dynamic programming, and greedy
strategies.
The divide-and-conquer technique solves a computational problem by divid-
ing it into one or more subproblems of smaller size, conquering each of them
by solving them recursively, and then combining their solutions into a solution
for the original problem. The Divide-and-Conquer algorithm shows the pat-
tern for an algorithm that uses this technique to solve a problem of size n by
dividing it into a subproblems, each of the smaller size n/b, where a problem below
a threshold size of n0 gets solved directly:

Divide-and-Conquer(n)
1 if n ≤ n0 then
2 directly solve problem without further dividing
3 else
4 divide problem into a subproblems each of size n/b
5 for i ← 0 to a − 1 do
6 Divide-and-Conquer(n/b) ▷ subproblem i
7 combine the a solutions into solution of the problem of size n

Let f (n) denote the time required to divide the problem of size n into a sub-
problems and later combine their solutions back together into a solution for the
original problem. Then the total time T (n) to solve a problem of size n > n0 is
given by the recurrence:
T (n) = aT (n/b) + f (n).
The Merge-Sort algorithm given earlier is one example of a divide-and-
conquer algorithm. It starts with a list or array of n elements to be sorted


and divides the sorting problem into two smaller sorting problems, simply by
splitting the array in half to give two arrays each of size n/2 (so a = 2 and
b = 2). It then recursively sorts each, and combines the two sorted arrays
together into a solution of the sorting problem by merging them into one sorted
array. The threshold is commonly taken to be n0 = 1, an array that contains
just one element and so is already sorted. With the Merge-Sort algorithm
T (n) = 2T (n/2) + c2 n + c3 for n > 1, so the second case of the Master Theorem
gives that the algorithm is Θ (n log2 n).
Quick sort is another sorting algorithm that uses the divide-and-conquer
technique. It also divides an array into two smaller arrays, using one chosen
element as the pivot to partition the array into two parts, however with quick
sort these arrays are not guaranteed to have the same approximate size. Once
each is recursively sorted, they are easily combined by appending the second
sorted side of the partition after the first to give the complete sorted array.
A binary search also uses the divide-and-conquer technique. To search a
list or array that is sorted in order the Binary-Search algorithm starts by
comparing the target with the element in the middle. This effectively divides
the search problem in half, as if the target has not been found only one half
of the elements need to be searched further, which can be done by recursively
calling the algorithm again. For this algorithm a = 1 since only one of the halves
is searched further, and b = 2 since the problem size is divided by two each time.
So the recurrence is T (n) ≤ T (n/2) + c for n ≥ 2 (an inequality is used since the
target might be found at any stage), where c is the constant time to perform the
recursive call, calculate the midpoint, and perform each comparison. The second
case of the Master Theorem applied to T (n) = T (n/2) + c gives Θ (log2 n), so
the Binary-Search algorithm must be O (log2 n).

Binary-Search(A, target)
1 return Binary-Search-Segment(A, target, 0, length[A])

Binary-Search-Segment(A, target, p, q)
1 if p ≥ q then
2 return -p-1 ▷ not found so return -(insertion point)-1
3 else
4 mid ← (p + q)/2
5 if target = A[mid ] then
6 return mid ▷ target has been found
7 else if target < A[mid ] then
8 return Binary-Search-Segment(A, target, p, mid )
9 else
10 return Binary-Search-Segment(A, target, mid +1, q)

As an application suppose we need to multiply together big integers that


are too large to be handled directly by the arithmetic unit of the processor.
Although regular int values can be as large as 2^31 − 1 and some languages
allow long values as large as 2^63 − 1 ≈ 10^19, some areas such as encryption
require much bigger integers.
Big integers with n digits could be represented as strings of length n. They
can easily be added and subtracted by a Θ(n) loop which starts at the least-
significant digit and works its way to the most-significant digit, using a carry

as necessary. However, the common algorithm for multiplying two integers


multiplies each digit of one of the numbers by every digit of the other number,
and so is Θ(n²).
As an alternative to the common multiplication algorithm one can develop a
divide-and-conquer technique. Suppose p and q are two positive integer numbers
each with n digits (either integer could be padded with 0 digits if it had fewer),
and suppose that the processor can directly handle integers with up to at most
a certain threshold number of digits. The idea is to reduce the problem down
to a size so that it can be handled directly by the processor. Denote the digits
of p and q by p0 , p1 , . . . , pn−1 and q0 , q1 , . . . , qn−1 respectively, ordered from
most-significant digit to least significant, so that:

p = p0 · 10^{n−1} + p1 · 10^{n−2} + · · · + pn−2 · 10 + pn−1
q = q0 · 10^{n−1} + q1 · 10^{n−2} + · · · + qn−2 · 10 + qn−1.

Let pH be the integer with digits p0, p1, . . . , p⌊n/2⌋−1 and let pL have digits
p⌊n/2⌋, p⌊n/2⌋+1, . . . , pn−1. Likewise, take qH and qL to be the integers formed
from the first and second halves of the digits for q. Then p = pH · 10^{⌈n/2⌉} + pL
and q = qH · 10^{⌈n/2⌉} + qL. For example, if p = 123456 and q = 987654 then
pH = 123, pL = 456, qH = 987, and qL = 654. Next, note that

p · q = (pH · 10^{⌈n/2⌉} + pL) · (qH · 10^{⌈n/2⌉} + qL)
      = pH · qH · 10^{2⌈n/2⌉} + (pH · qL + pL · qH) · 10^{⌈n/2⌉} + pL · qL.

This equation gives a divide-and-conquer technique for solving the multiplica-


tion problem p · q by first calculating the four smaller products pH · qH , pH · qL ,
pL ·qH , pL ·qL and then using some Θ(n) arithmetic and multiplications by pow-
ers of 10. When the two integers being multiplied drop below some threshold
number of combined digits the multiplication is handled directly in O(1) time.
The time required T (n) for this algorithm for n above the threshold is given
by T (n) = 4T (n/2) + f (n) where f (n) is a Θ(n) function. So by the first case
of the Master Theorem with a = 4 and b = 2 one obtains the unfortunate re-
sult that T (n) is Θ n2 , asymptotically the same as the common multiplication
algorithm.
This problem is due to the factor a = 4 in the recurrence, and some thought
gives a way of reducing this to a = 3 by using the fact that:

pH · qL + pL · qH = (pH + pL ) · (qH + qL ) − pH · qH − pL · qL .

Hence the multiplication p·q can be performed by calculating the three products
pH · qH , pL · qL , (pH + pL ) · (qH + qL ), giving the recurrence T (n) = 3T (n/2) +
f(n), so that T(n) is Θ(n^{log_2 3}).
The example IntegerMultiplication shows how this Θ(n^{1.585}) algorithm can be
implemented in Java (it also uses a java.math class called BigInteger to check
that the product is actually correct).
There are many other common algorithms that use the divide-and-conquer
technique. As a further example, consider a set of points P0 , P1 , . . . , Pn−1 where
Pi = (xi , yi ) and suppose x0 ≤ x1 ≤ · · · ≤ xn−1 (otherwise an O (n log n)
algorithm could first be used to sort them in order of increasing x-coordinates).
Finding the two points that are closest to each other has applications to areas

/**
A class that demonstrates how the divide-and-conquer technique can
be used to provide an O(n^(log_2 3)) algorithm for multiplying
(positive) integer values of arbitrary length n
@author Andrew Ensor
*/
public class IntegerMultiplication
{
...
// multiply the two positive integer values held in the strings
public String multiply(String p, String q)
{ int pLength = p.length();
int qLength = q.length();
String product;
if (pLength+qLength<=THRESHOLD)
{ // directly evaluate the multiplication as int values
product = Integer.toString
(Integer.parseInt(p)*Integer.parseInt(q));
}
else
{ // ensure that p and q have the same length
if (pLength>qLength)
{ q = padWithZeros(q, pLength);
qLength = q.length();
}
else if (qLength>pLength)
{ p = padWithZeros(p, qLength);
pLength = p.length();
}
// divide the integer strings p and q into two parts
int middle = pLength/2;
String pHigh = p.substring(0, middle);
String pLow = p.substring(middle);
String qHigh = q.substring(0, middle);
String qLow = q.substring(middle);
// perform recursive conquer with three multiplications
String highPartProduct = multiply(pHigh, qHigh);
String lowPartProduct = multiply(pLow, qLow);
String mixedPart = multiply(add(pHigh,pLow),add(qHigh,qLow));
// combine three multiplications together to get product pq
String highPartShifted = appendZeros(highPartProduct,
2*(pLength-middle));
String midPartShifted = appendZeros(subtract(subtract(
mixedPart,highPartProduct),lowPartProduct),pLength-middle);
product = add(add(highPartShifted, midPartShifted),
lowPartProduct);
}
return product;
}
}

such as traffic-control systems. A brute-force approach would be to calculate


the distances between all n(n − 1)/2 pairs of the n points, requiring a Θ(n²) nested
loop. The divide-and-conquer technique can be used to improve on the brute-
force approach to obtain an O (n log n) algorithm.
In order to divide the problem into subproblems consider dividing the points into
two groups by a vertical line, with ⌊n/2⌋ points on the left and ⌈n/2⌉ points on
the right. Note that if Pi and Pj are the closest two points then either both points
are on the left, or both are on the right, or else one is on the left and the other
is on the right.
[Figure: points in the plane split by a vertical line, with closest-pair distance δl
among the left points and δr among the right points]
Hence the problem can be divided into two problems, first finding the distance
δl between the closest two points on the left, and the distance δr between the
closest two points on the right (where a suitable threshold is n ≤ 3). Then
to combine these solutions it remains to check whether there is a point Pi on
the left and a point Pj on the right that are within δ = min (δl , δr ) of each
other. To check this, an array is made of all those points that are within δ
of the dividing line, and they are sorted in order of increasing y-coordinates.
Then each point Pk in this array is checked with the following points one by one
(using a nested loop): if the next point’s y-coordinate is within δ then its distance
from Pk is calculated and compared with δ; if not then no further points are
checked for Pk and the next point is taken.

Exercise 4.1 (Closest Pair of Points) Write a program that accepts a col-
lection of points and which implements the divide-and-conquer technique to find
the distance between the closest two points in the collection.

4.2 Dynamic Programming


Reading: pp323-338
The dynamic programming technique is an algorithm design technique that
is typically applied to optimization problems, where the best way of solving a
problem must be found. It is similar to the divide-and-conquer technique in that
it uses solutions to subproblems combined together to solve a problem. However,
in dynamic programming the subproblems can be interdependent, with common
sub-subproblems which the technique exploits to improve efficiency. It involves
the following steps:
1. characterize the structure of an optimal solution to the problem,
2. recursively define the value of an optimal solution as a composition of
optimal solutions to subproblems,
3. compute optimal solutions to the simplest subproblems and combine these
to find optimal solutions to more complex subproblems, repeating until the
original problem is solved.
One of the remarkable features of dynamic programming is that it can some-
times produce polynomial time algorithms to solve some seemingly intractable
problems for which a brute-force approach would require exponential time. Part
of the reason for this is that it stores the solution to the parts of a problem as it

progresses so that those parts do not get solved again and again as the solution
to the original problem is constructed. By comparison, the divide-and-conquer
technique treats subproblems as being independent, and so might do more work
than necessary if the same subproblem is encountered several times.
For example, suppose a car factory produces cars in n steps along two as-
sembly lines A and B that work in parallel. Let ai and bi be the times required
for step i of the production (for 0 ≤ i < n) along each line, eA and eB be the
time required for a chassis to enter the line, and xA and xB the time for the
completed car to exit the line. Hence the total time taken to produce a car on
assembly line A is eA + a0 + a1 + · · · + an−1 + xA , and the time taken on assembly
line B is eB + b0 + b1 + · · · + bn−1 + xB . Suppose that for rush jobs a partially completed car can be
shifted from one assembly line to the other at a time cost of si (from line A to
line B after step i of production) and ti (from line B to line A).
[Figure: the two assembly lines, with entry times eA and eB , step times a0 , . . . , an−1 and b0 , . . . , bn−1 , transfer times si (line A to B) and ti (line B to A) between consecutive steps, and exit times xA and xB .]
The problem is to efficiently find the minimal time required to assemble a
car. Ignoring paths that loop back and forth between the same step of assembly
(which just waste time), there are 2^n distinct possible paths that could be
taken through the assembly process, so a brute-force approach of checking every
possible path is totally infeasible except for very small values of n.
A dynamic programming approach starts by characterizing the structure of
an optimal solution. Clearly an optimal solution would not transfer from one
line to the other and then straight back so it must consist of a path that first
starts with step 0 of one of the lines (requiring either time eA + a0 or eB + b0 ),
and then either continue on the same line (either a1 or b1 ) or else swap to
the other line and perform step 1 there (either s0 + b1 or t0 + a1 ). It repeats
this until finishing step n − 1 and exiting (which adds a further time of either
an−1 + xA or bn−1 + xB ). Note that if an optimal solution performs step i on
a particular line then that optimal path must also be an optimal solution up
to that point of assembly. This must be the case since if there were a quicker
way of getting to that point of assembly, then replacing the first portion of the
optimal solution with the quicker way would give a solution even quicker than
the optimal solution. Put another way, each step of an optimal path for the
entire problem gives the fastest way to get to each point in the production line.
For example, if an optimal solution to the assembly problem enters on line A,
performs step 0 on that line and then switches to line B for step 1, then the
fastest way of getting to the start of step 2 on line B must also be by performing
step 0 on line A and then switching to line B for step 1. The identification that
the optimal solution to some problem must have optimal substructure is the key
to applying dynamic programming to solve the problem.
Next, the optimal solution must be defined recursively in terms of optimal

solutions to subproblems. The subproblems in this example consist of finding


the quickest way to get a partially assembled car to the start of step i on either
of the lines. Let fA (i) and fB (i) denote the quickest time to get to the start of
step i on line A and B respectively (these are referred to as f0 [i + 1] and f1 [i + 1]
in the textbook). Then fA (0) = eA and fB (0) = eB (since there is no transfer
possible before step 0), and for 1 ≤ i < n:

fA (i) = min (fA (i − 1) + ai−1 , fB (i − 1) + bi−1 + ti−1 )


fB (i) = min (fB (i − 1) + bi−1 , fA (i − 1) + ai−1 + si−1 ) .

Then the quickest time f ∗ to fully assemble a car is:

f ∗ = min (fA (n − 1) + an−1 + xA , fB (n − 1) + bn−1 + xB ) .

At this stage a recursive solution could be implemented, but it would give an
Ω(2^n) solution as it would wind up finding fA (n − 2) and fB (n − 2) twice,
fA (n − 3) and fB (n − 3) four times, and each of fA (1) and fB (1) 2^(n−2) times.
This is due to the fact that fA (i) and fB (i) are both dependent on fA (i−1) and
fB (i−1), that is, the subproblems of finding fA (i) and fB (i) are not independent
of each other.
Instead of resorting to recursion, dynamic programming works from the bottom
up, building a table of the optimal solutions it has found so far. In the
assembly line problem, it starts by calculating fA (1) and fB (1) and using them
to find fA (2) and fB (2), etc, until fA (n − 1) and fB (n − 1) are both found at
the end of a Θ(n) loop. The algorithm Fastest-Way shows how an optimal
solution to the assembly line problem is found, using arrays to store the times
fA (i) and fB (i) as they are found, and arrays lA and lB that indicate which line
the optimal solution to each subproblem used at its previous step.
As another example of dynamic programming consider the problem of try-
ing to minimize the work in multiplying together n matrices A0 , A1 , . . . , An−1 ,
where each Ai is a pi × pi+1 matrix. Matrix multiplication is not commuta-
tive, but it is associative, so the product A0 · A1 · . . . · An−1 can be found in
many different ways depending on the order in which the n − 1 multiplications
are performed. For example, the product A0 A1 can only be evaluated in one
way, but the product A0 A1 A2 can be evaluated in two ways, and the product
A0 A1 A2 A3 in five ways (the number of ways of multiplying n matrices actually
grows exponentially as Ω(4^n / n^(3/2))).
However, not all these products involve the same amount of work. If Ai is
a pi × pi+1 matrix and Ai+1 is a pi+1 × pi+2 matrix then each entry of Ai Ai+1
involves pi+1 multiplications and pi+1 − 1 additions, so the total number of
multiplications in the product Ai Ai+1 is pi pi+1 pi+2 . To illustrate this consider
the case when A0 is a 3 × 5 matrix, A1 is a 5 × 2 matrix, and A2 is a 2 × 4
matrix. Then A0 A1 is a 3 × 2 matrix that requires 3 · 5 · 2 = 30 multiplications,
and so (A0 A1 ) A2 requires a further 3 · 2 · 4 = 24 multiplications, resulting in
54 multiplications in total. Compare this with A1 A2 , which is a 5 × 4 matrix
requiring 5 · 2 · 4 = 40 multiplications, and A0 (A1 A2 ) which requires a further
3 · 5 · 4 = 60 multiplications, resulting in 100 multiplications in total. Although
(A0 A1 ) A2 = A0 (A1 A2 ) the amount of work required is different by almost
a factor of two. For different sizes and for larger values of n the number of
required multiplications can vary enormously depending on the order in which
the matrices are multiplied together.

Fastest-Way(a, b, s, t, e, x)
1 fA (0) ← eA
2 fB (0) ← eB
3 lA (0) ← A  only path to get time fA (0) is along line A
4 lB (0) ← B  only path to get time fB (0) is along line B
5 for i ← 1 to n − 1 do
6  find quickest time fA (i) to start of step i on line A
7 if fA (i − 1) + ai−1 < fB (i − 1) + bi−1 + ti−1 then
8 fA (i) ← fA (i − 1) + ai−1
9 lA (i) ← A  quickest path used step i − 1 of line A
10 else
11 fA (i) ← fB (i − 1) + bi−1 + ti−1
12 lA (i) ← B  quickest path used step i − 1 of line B
13  find quickest time fB (i) to start of step i on line B
14 if fB (i − 1) + bi−1 < fA (i − 1) + ai−1 + si−1 then
15 fB (i) ← fB (i − 1) + bi−1
16 lB (i) ← B  quickest path used step i − 1 of line B
17 else
18 fB (i) ← fA (i − 1) + ai−1 + si−1
19 lB (i) ← A  quickest path used step i − 1 of line A
20 if fA (n − 1) + an−1 + xA < fB (n − 1) + bn−1 + xB then
21 f ∗ = fA (n − 1) + an−1 + xA
22 l∗ = A
23 else
24 f ∗ = fB (n − 1) + bn−1 + xB
25 l∗ = B
26 return f ∗ and l∗
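When only the quickest time f ∗ is wanted, the recurrence can be evaluated with a constant amount of extra storage. The Java sketch below simply mirrors the recurrence above, omitting the lA and lB bookkeeping arrays; the parameter names are illustrative, with a and b assumed to have length n and s and t length n − 1.

// returns the quickest time f* to assemble a car, where a[i] and b[i]
// are the step times on lines A and B, s[i] and t[i] the transfer times
// after step i, eA and eB the entry times, and xA and xB the exit times
static double fastestWay(double[] a, double[] b, double[] s, double[] t,
    double eA, double eB, double xA, double xB)
{ int n = a.length;
  double fA = eA; // quickest time to reach the start of step 0 on line A
  double fB = eB; // quickest time to reach the start of step 0 on line B
  for (int i=1; i<n; i++)
  { double nextFA = Math.min(fA + a[i-1], fB + b[i-1] + t[i-1]);
    double nextFB = Math.min(fB + b[i-1], fA + a[i-1] + s[i-1]);
    fA = nextFA;
    fB = nextFB;
  }
  return Math.min(fA + a[n-1] + xA, fB + b[n-1] + xB);
}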

Since the number of ways of forming A0 · A1 · . . . · An−1 grows exponentially,


a brute-force approach soon becomes infeasible. A dynamic programming ap-
proach works in this example because the optimal way of forming the product
A0 · A1 · . . . · An−1 happens to contain optimal ways of solving smaller subprob-
lems, and the solution to the problem can be constructed from their solutions.
To see this note that the optimal way of forming the product must split at some
value of k with 0 < k ≤ n − 1:
A0 · A1 · . . . · An−1 = (A0 · A1 · . . . · Ak−1 ) · (Ak · Ak+1 · . . . · An−1 ) .
The way that the product A0 · A1 · . . . · Ak−1 was formed in the optimal solution
must also be an optimal way of forming this product, otherwise if there were
a more efficient way of forming A0 · A1 · . . . · Ak−1 it would give a way of
forming the entire product with fewer multiplications than the optimal solution
(using a proof by contradiction). Hence once the optimal solutions for forming
A0 ·A1 ·. . .·Ak−1 and Ak ·Ak+1 ·. . .·An−1 have been found for each 0 < k < n they
can be used to find the optimal solution for forming the product A0 ·A1 ·. . .·An−1 .
Recursively, it will be required to know the optimal way of forming the product
Ai Ai+1 . . . Aj−1 for every value of i and j with 0 ≤ i < j ≤ n.
For 0 ≤ i < j ≤ n let mij denote the number of multiplications required

to optimally form the product Ai Ai+1 . . . Aj−1 , and for j > i + 1 let sij denote
that value of k with i < k < j where the optimal product splits as:

Ai · Ai+1 · . . . · Aj−1 = (Ai · Ai+1 · . . . · Ak−1 ) · (Ak · Ak+1 · . . . · Aj−1 ) .

If j = i + 1 then mij = 0, whereas for j > i + 1 the split point sij is the value of k
in the range i < k < j that minimises mik + mkj + pi pk pj , and then

mij = mi,sij + msij,j + pi psij pj

since the first product is a pi × psij matrix and the second product is a psij × pj
matrix.
Continuing the dynamic programming approach the desired value of m0n is
found from the recurrence not by a top-down recursive approach, but instead
with a bottom-up approach, starting with m01 , m12 , . . . , mn−1n and then using
them to find m02 , m13 , . . . , mn−2 n , and then m03 , m14 , . . . , mn−3 n etcetera until
m0n is found.
Note as in the assembly line problem that there is overlap in the subproblems.
Here a term such as m02 cannot be found until m01 and m12 are known, but
m13 also needs the value of m12 (as well as m23 ). The Matrix-Chain-Order
algorithm shows how the optimal solution is found in O(n³) using dynamic
programming, and the program MatrixChainOrder is a Java implementation of
this algorithm using ragged arrays for m and s, where mij is the entry m[j][i]
in the array (so that each m[j] is a one-dimensional array of length j).

Matrix-Chain-Order(p)
1 n ← length[p] − 1
2 for j ← 1 to n do
3 i←j−1
4 mij ← 0
5 for l ← 2 to n do  number of matrices multiplied together
6 for j ← l to n do
7 i←j−l
8 sij ← the value of k with i < k < j minimising mik + mkj + pi pk pj  requires loop
9 mij ← misij + msij j + pi psij pj
10 return m and s

Exercise 4.2 (Assembly Line Problem) Write a program that implements


the assembly line algorithm, which outputs the quickest time to produce a car in
the factory and for each step of production which line the car passed through.

4.3 Elements of Dynamic Programming


Reading: pp339-362
One of the requirements in order to apply dynamic programming to an
optimization problem is that the problem have optimal substructure, meaning
that an optimal solution to the problem contains within it optimal solutions to

/**
A class that demonstrates how the dynamic programming technique can
be used to find an optimum way of multiplying a chain of n
matrices where each matrix has order p_i times p_{i+1}
@author Andrew Ensor
*/

public class MatrixChainOrder


{
private int[] p; // orders of the matrices in the product chain
private int[][] s; //values of k where product A_i...A_{j-1} splits
private int[][] m; //minimum no multiplications to evaluate product
private int n; // total number of matrices in the product

public MatrixChainOrder(int[] p)
{ this.p = p;
n = p.length-1;
m = new int[n+1][];
m[0] = null; // m[j][i] not used for j=0
s = new int[n+1][];
s[0] = null; s[1] = null; // s[j][i] not used for j=0 nor j=1
for (int j=1; j<=n; j++)
{ int i = j-1;
m[j] = new int[j]; // create m[j][0], ..., m[j][j-1]
m[j][i] = 0;
s[j] = new int[j-1]; // create s[j][0], ..., s[j][j-2]
}
for (int l=2; l<=n; l++) // l is number of matrices in product
{ for (int j=l; j<=n; j++)
{ int i = j-l;
// find k for which m[k][i]+m[j][k]+p[i]p[k]p[j] minimized
int indexK = i+1;
int minM = m[indexK][i]+m[j][indexK]+p[i]*p[indexK]*p[j];
for (int k=i+2; k<j; k++)
{ int anotherM = m[k][i]+m[j][k]+p[i]*p[k]*p[j];
if (anotherM<minM)
{ indexK = k;
minM = anotherM;
}
}
// update arrays m and s
s[j][i] = indexK;
m[j][i] = minM;
}
}
}


public String toString()


{ return m[n][0] + " multiplications for " + getProduct(0, n);
}

// recursive helper method that returns a string representation of


// the portion of the optimal product A_i x ... x A_{j-1} which
// uses parentheses to show the order of multiplication
private String getProduct(int i, int j)
{ if (j == i+1)
return "A["+i+"]";
else
{ int k = s[j][i];
// show the product split at k
return "(" + getProduct(i, k) + "." + getProduct(k, j) + ")";
}
}

// driver main method to test the class


public static void main(String[] args)
{ int[] orders = {17, 5, 26, 30, 19, 7, 15, 15}; // 7 matrices
MatrixChainOrder mco = new MatrixChainOrder(orders);
System.out.println(mco);
}
}

subproblems. For example, an optimal solution to the assembly line problem


contained within it optimal solutions for reaching various points of assembly on
an assembly line. An optimal solution to the matrix multiplication problem of
multiplying n matrices contained within it an optimal solution for multiplying
the first k matrices and for the remaining n − k matrices for some k < n.
One must be certain that an optimal solution to a problem does ensure
optimal solutions to its subproblems, otherwise dynamic programming cannot
be applied. For example, suppose G is a finite undirected (not weighted) graph
and consider paths in G between a vertex u and a vertex v (where a path is a
sequence of adjacent edges that can include an edge at most once but vertices
any number of times). If the length of a path is taken to be the number of its
edges then two problems could be stated:

• what is the length of a shortest path between u and v?

• what is the length of a longest path between u and v?


Consider a shortest path between u and v. If w is a vertex along this path then
the portion of the path between u and w must be a shortest path between those
two vertices, and likewise the portion between w and v must also be a shortest
path between them.
[Figure: a graph with vertices u and v and an intermediate vertex w lying on a shortest path between them.]

If this were not the case and, say, a shorter path could be found between u and
w, then combine it with the portion between w and v. If the result were a path
between u and v then it would be shorter than the shortest path, a contradiction.
If it were not a path then it would use some edge twice; removing the repeated
edges would lead to an even shorter path, again a contradiction. Hence
the shortest path problem has optimal substructure.
Now instead consider a longest path between u and v. If w is a vertex along this
path one might be tempted to claim that the portion of the path between u and
w is a longest path between these two vertices, but this might be incorrect. There
might exist longer paths between u and w which include some of the edges used
in the portion of the longest path between w and v.
[Figure: a graph in which every longest path from u to w reuses edges needed by the portion of the longest u–v path between w and v.]


The difficulty with the two portions of the longest path is that they are not
independent: a solution for the portion of the path between u and w affects
the possible solutions for the portion between w and v, since edges cannot be
used twice in a path. Hence the longest path problem does not have optimal
substructure and so is not suitable for solving via dynamic programming.
The other requirement for applying dynamic programming is overlapping
subproblems, meaning that optimal solutions to subproblems use some solutions
in common to simpler subproblems. By working bottom up, dynamic program-
ming builds optimal solutions to the simplest subproblems first and uses their
solutions several times over to solve more complicated subproblems. This reuse
of solutions, only solving each subproblem once, helps improve the efficiency
of dynamic programming. If each subproblem involved solving only new sub-
problems distinct from those for other subproblems then dynamic programming
would not be suitable.
A common text processing problem is to find the similarity between two se-
quences (strings) of characters. One way of measuring the similarities is through
the length of the longest common subsequence. If X = ⟨x0 , x1 , . . . , xm−1 ⟩ is a se-
quence of characters then a subsequence of X is any sequence ⟨xi0 , xi1 , . . . , xik−1 ⟩
where 0 ≤ i0 < i1 < · · · < ik−1 ≤ m − 1 (essentially just removing zero or more of the
characters from the sequence X but maintaining the same order of the remain-
ing characters). Note that the i-th prefix of X, given by Xi = ⟨x0 , x1 , . . . , xi−1 ⟩
for 0 < i ≤ m, is always a subsequence of X. A sequence Z is called a common
subsequence of two sequences X and Y if Z is a subsequence of X and also of
Y . For example, if X = ⟨A, G, G, T, C, A, C, T⟩ and Y = ⟨G, C, A, G, A, T, C⟩
(two DNA sequences) then Z = ⟨G, C, A, T⟩ is a common subsequence, and another
is ⟨A, G, T, C⟩.
The longest common subsequence problem for two sequences X and Y in-
volves finding a maximum length common subsequence Z. A brute-force ap-
proach could start by finding all the subsequences of X and check whether each
was a valid subsequence of Y , but since a sequence of length n has 2^n distinct
subsequences this quickly becomes infeasible.
To apply dynamic programming to solve the longest common subsequence
problem requires finding the optimal substructure in the problem. Suppose Z =
hz0 , z1 , . . . , zk−1 i is a longest common subsequence of X = hx0 , x1 , . . . , xm−1 i
and Y = hy0 , y1 , . . . , yn−1 i. Considering the last character in X and Y gives
one of the following two cases:

• xm−1 = yn−1 , and so too zk−1 = xm−1 since Z is a longest common subse-

quence (otherwise xm−1 could be appended to Z giving a longer common


subsequence). Hence Zk−1 must also be a longest common subsequence
of Xm−1 and of Yn−1 .

• xm−1 ≠ yn−1 , and so zk−1 cannot equal both of them. Hence Z must be
a longest common subsequence of either Xm−1 and Y (if zk−1 6= xm−1 )
or of X and Yn−1 (if zk−1 6= yn−1 ).

This shows that an optimal solution to the longest common subsequence problem
is composed of optimal solutions to subproblems of finding the longest common
subsequence of Xi and Yj for i < m and j < n. A recursive solution to the
problem would first check whether xm−1 = yn−1 , and if so would recursively find
the longest common subsequence of Xm−1 and Yn−1 and append xm−1 to that
subsequence. If not it would recursively find the longest common subsequence
of Xm−1 and Y , and of X and Yn−1 and take the longer of the two. Note that
the second possibility will probably require finding the longest common subse-
quence of Xm−1 and Yn−1 in the next recursion, so there are many overlapping
subproblems.
For i ≤ m and j ≤ n let cij be the length of the longest common subse-
quence of Xi and Yj , and let bij give information about how the subsequence
is constructed from the recurrence, with bij = INCLUDE if xi−1 = yj−1 and
so xi−1 is used in the subsequence, bij = NOTX if xi−1 is not included in the
subsequence, or instead bij = NOTY if yj−1 is not included in the subsequence.
Then every ci0 = 0 since Y0 = ⟨⟩ and every c0j = 0 since X0 = ⟨⟩. For i > 0
and j > 0 the value of cij is given by the recurrence:

cij = c(i−1)(j−1) + 1 if xi−1 = yj−1
cij = max(c(i−1)j , ci(j−1)) if xi−1 ≠ yj−1 .

The LCS-Length algorithm shows how the length of the longest common
subsequence is found in O(mn), and the class LCSLength is a Java implementa-
tion of this algorithm which uses the values of bij to output one of the longest
common subsequences, starting with bmn .

Exercise 4.3 (Knapsack Problem) The 0-1 knapsack problem involves a set
of n items for which item i has positive benefit bi and positive integer weight wi .
Suppose a person with a knapsack can only carry up to a maximum total weight
W . The problem is to decide which items to choose so as to maximize the total
benefit.
Let Bkw be the maximum benefit that can be obtained from selecting some of
items 0 to k − 1 using a knapsack that has capacity w. Note that B0w = 0, and
for k > 0 that Bkw is given by the recurrence:

Bkw = B(k−1)w if w < wk−1
Bkw = max(B(k−1)w , B(k−1)(w−wk−1) + bk−1) if w ≥ wk−1 .

Design an O(nW ) algorithm for solving the 0-1 knapsack problem and implement
the algorithm in a program using an int[n+1][W+1] array of benefits. Then
use a boolean[n+1][W+1] to keep track of which items are used in the optimal
solution.
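As a rough sketch of how the benefit table in this exercise could be filled (leaving out the boolean tracking array), the following method evaluates the recurrence bottom up; the variable names are illustrative assumptions, not a prescribed solution.

// returns the maximum benefit B(n, W) from the recurrence above, where
// benefit[i] and weight[i] describe item i and capacity is the limit W
static int knapsackBenefit(int[] benefit, int[] weight, int capacity)
{ int n = benefit.length;
  int[][] best = new int[n+1][capacity+1]; // best[k][w] holds B(k, w)
  for (int k=1; k<=n; k++)
  { for (int w=0; w<=capacity; w++)
    { if (w < weight[k-1])
        best[k][w] = best[k-1][w]; // item k-1 does not fit
      else
        best[k][w] = Math.max(best[k-1][w],
            best[k-1][w-weight[k-1]] + benefit[k-1]);
    }
  }
  return best[n][capacity];
}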

LCS-Length(X, Y )
1 m ← length[X]
2 n ← length[Y ]
3 for i ← 1 to m do
4 ci0 ← 0
5 for j ← 0 to n do
6 c0j ← 0
7 for i ← 1 to m do
8 for j ← 1 to n do
9 if xi−1 = yj−1 then
10 cij ← c(i−1)(j−1) + 1
11 bij ← INCLUDE
12 else if c(i−1)j > ci(j−1) then
13 cij ← c(i−1)j
14 bij ← NOTX
15 else
16 cij ← ci(j−1)
17 bij ← NOTY
18 return c and b

4.4 Greedy Technique


Reading: pp370-391
The greedy technique is a simplification of the dynamic programming tech-
nique for the case when making locally optimum choices would result in a glob-
ally optimum solution. One of the requirements in order to apply the greedy
technique to an optimization problem is that the problem have optimal sub-
structure, just as for dynamic programming, but the other is that the problem
satisfy the greedy-choice property. The greedy-choice property states that an
optimal solution to the problem can be found by making any locally greedy
choice, meaning that whenever a choice of candidates is possible the one that
currently looks best is chosen.
Note that many problems do not satisfy the greedy-choice property. For
example, consider the 0-1 knapsack problem (from Exercise 4.3) with three
items, one with weight 10kg and benefit 60 (a benefit per kg of 6), another with
weight 20kg and benefit 100 (a benefit per kg of 5), and the third with weight
30kg and benefit 120 (a benefit per kg of 4), and a knapsack that can hold up to
50kg, insufficient to hold all 60kg of items. One greedy strategy is to choose the
items with greatest benefit per unit weight first, which means that the 10kg item
would be the item chosen first, leaving 40kg available in the knapsack for further
items. The next greedy choice would be to choose the 20kg item, using now a
total of 30kg of the 50kg available. This greedy solution would result in a total
benefit of 160, but note that this is not an optimum solution, since choosing the
20kg item and the 30kg item would result in a larger total benefit of 220. Hence
the 0-1 knapsack problem does not satisfy the greedy-choice property and so is
not solvable using the greedy technique (it requires the more general dynamic
programming).

/**
A class that demonstrates how the dynamic programming technique can
be used to find a longest common subsequence of strings x and y
@author Andrew Ensor
*/

public class LCSLength


{
private enum Construction{INCLUDE, NOTX, NOTY};

public LCSLength()
{
}

// returns one longest common subsequence of x and y


public String findLCS(String x, String y)
{ int m = x.length();
int n = y.length();
int[][] c = new int[m+1][n+1];
Construction[][] b = new Construction[m+1][n+1];
for (int i=1; i<=m; i++)
c[i][0] = 0;
for (int j=0; j<=n; j++)
c[0][j] = 0;
for (int i=1; i<=m; i++)
{ for (int j=1; j<=n; j++)
{ if (x.charAt(i-1)==y.charAt(j-1))
{ c[i][j] = c[i-1][j-1]+1;
b[i][j] = Construction.INCLUDE;
}
else if (c[i-1][j]>c[i][j-1])
{ c[i][j] = c[i-1][j];
b[i][j] = Construction.NOTX;
}
else
{ c[i][j] = c[i][j-1];
b[i][j] = Construction.NOTY;
}
}
}
// use the array b to determine one longest common subsequence
// starting at index m,n of b
return getLCSString(x, b, m, n);
}


// recursive helper method which returns the subsequence of X


// starting at entry i,j in the array b
private String getLCSString(String x, Construction[][] b,
int i, int j)
{ if (i>0 && j>0)
{ switch (b[i][j])
{ case INCLUDE :
{ return getLCSString(x, b, i-1, j-1) + x.charAt(i-1);
}
case NOTX :
{ return getLCSString(x, b, i-1, j);
}
case NOTY :
{ return getLCSString(x, b, i, j-1);
}
}
}
return "";
}

public static void main(String[] args)


{ // test example given in Introduction to Algorithms textbook
String x = "amputation";
String y = "spanking";
LCSLength finder = new LCSLength();
String lcs = finder.findLCS(x, y);
System.out.println("One longest common subsequence of " + x
+ " and " + y + " is " + lcs + ".");
}
}

A variation on the knapsack problem is the fractional knapsack problem,


where fractional quantities of each item can be chosen, so that different portions
of each item can be combined in the knapsack. In this case the problem does
satisfy the greedy-choice property, where items are again chosen based on the
order of maximum benefit per unit weight. To see why this is the case suppose
instead that some fraction of item j has been chosen for the knapsack instead of
some fraction of item i, where the benefit per unit weight of item j is less than
that of item i. Replacing that fraction of item j by an equal weight of item i
would result in a different knapsack with greater benefit, so the first knapsack
could not have been optimal. Applying this strategy to the knapsack with three
items would result in all of the 10kg item being chosen, and then all of the
20kg item, leaving space for only 2/3 of the 30kg item, providing an optimal total
benefit of 60 + 100 + (2/3) · 120 = 240. Note that no other combination of portions
of each item can give a greater total benefit.
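A minimal sketch of this greedy strategy is given below: it sorts the item indices by decreasing benefit per unit weight and then takes as much of each item as still fits. The names are illustrative only.

// returns the maximum total benefit for the fractional knapsack, taking
// items greedily in decreasing order of benefit per unit weight
static double fractionalKnapsack(double[] benefits, double[] weights,
    double capacity)
{ int n = benefits.length;
  Integer[] order = new Integer[n];
  for (int i=0; i<n; i++)
    order[i] = i;
  // sort the item indices by decreasing benefit per unit weight
  java.util.Arrays.sort(order, (i, j) ->
      Double.compare(benefits[j]/weights[j], benefits[i]/weights[i]));
  double remaining = capacity;
  double total = 0;
  for (int k=0; k<n && remaining>0; k++)
  { int i = order[k];
    double taken = Math.min(weights[i], remaining); // greedy choice
    total += benefits[i]*(taken/weights[i]);
    remaining -= taken;
  }
  return total;
}

Called with the three items above (weights 10, 20 and 30, benefits 60, 100 and 120) and capacity 50 this returns the optimal total benefit 240.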
The design of a greedy algorithm for an optimization problem that satisfies
both the optimal substructure property and the greedy choice property follows
the following steps:

1. characterize the optimal solution to the problem as one where a choice is


made and a simpler subproblem remains to be solved,

2. prove that there must be at least one optimal solution to the problem
which makes the greedy choice,

3. recursively compute an optimal solution to the subproblem and combine


it with the greedy choice to obtain an optimal solution to the original
problem.

The greedy technique is really just a simplification of dynamic programming in


the case where it is known that at least one optimal solution must exist which
makes a greedy choice at each step. It can lead to simpler and more efficient
algorithms in such cases since at each step only one further subproblem needs to
be solved, whereas dynamic programming might require several to be solved at
each step. Greedy algorithms can often be implemented in a top-down approach
using recursion rather than the bottom-up approach of dynamic programming.
As an example, suppose a set S = {a0 , a1 , . . . , an−1 } of n proposed activities
use a common resource which is only available for one activity at a time (such
as class times for a lecture theatre). Each activity ai has a start time si ≥ 0 and
finish time fi > si . Two activities ai and aj are called compatible if fi ≤ sj or
fj ≤ si , so that one of the activities has completed before the other starts, making
it possible for them to use the same resource. The problem is to determine the
maximum number of activities that can be selected (a largest possible subset of
mutually compatible activities).
First, suppose T ⊆ S is an optimal solution to the activity selection problem,
and consider the activity ai in T that finishes first, so that all other activities in
T must start after that activity finishes (otherwise they would be incompatible
with this activity). Note that this activity can be replaced by any other activity in
S that finishes no later than fi and still the activities in T will be compatible. Hence
if T is an optimal solution then there must also be an optimal solution that
uses the activity in S which finishes first (a greedy choice). Hence the activity
selection problem possesses the greedy-choice property. Note that this is not

claiming that every optimal solution contains a greedy choice, but rather that
at least one optimal solution contains a greedy choice (this is the one that a
greedy algorithm would find).
Choosing the activity ai in S that finishes first for the solution still leaves the
subproblem of finding the maximum number of activities that can be selected
which start after time fi . Thus the optimal solution can be found by first
making a greedy choice and then combining it with an optimal solution to
the subproblem. The simpler subproblem can be solved in the same way by
first finding the activity aj that finishes first from those that start after time
fi , and then finding the maximum number of activities that can be selected
starting after time fj . Hence the activity selection problem possesses the optimal
substructure property (and note that there is only one further subproblem that
needs to be solved at each step).
Rather than searching for the activity that finishes first each time, the ac-
tivities in S can be sorted in order of finishing time before the algorithm com-
mences. Once this sorting has been done (say by an O (n log n) algorithm) the
Activity-Selector algorithm shows how an optimal solution can be found
in Θ(n). In this example, the recursive algorithm could easily be replaced by a
non-recursive algorithm which simply iterates through the activities in order.

Activity-Selector(s, f )
1  find maximum number of compatible activities from s that finish by f
2 return {a0 } ∪ Recursive-Activity-Selector(s, f, 0)

Recursive-Activity-Selector(s, f, i)
1  find the next activity in sorted order that starts after time fi
2 m←i+1
3 while m < length[s] and sm < fi do
4 m←m+1
5 if m < length[s] then  use am as the greedy choice
6 return {am } ∪ Recursive-Activity-Selector(s, f, m)
7 else
8 return ∅
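For instance, a non-recursive version might look like the sketch below, which assumes the arrays s and f of start and finish times are already sorted by finish time; the names used are illustrative.

// returns the indices of a maximum-size set of mutually compatible
// activities, assuming the activities are indexed in increasing order
// of their finish times f
static java.util.List<Integer> selectActivities(double[] s, double[] f)
{ java.util.List<Integer> selected = new java.util.ArrayList<Integer>();
  if (s.length == 0)
    return selected;
  selected.add(0); // greedy choice: the activity that finishes first
  double lastFinish = f[0];
  for (int m=1; m<s.length; m++)
  { if (s[m] >= lastFinish) // compatible with the last selected activity
    { selected.add(m);
      lastFinish = f[m];
    }
  }
  return selected;
}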

The Huffman coding algorithm is an important text compression algorithm


that uses the greedy technique to assign codes to characters. Coding schemes
such as Unicode use a fixed number of bits to represent each character in a doc-
ument, whereas a Huffman code uses fewer bits for those characters that appear
more frequently in the document and more bits for the infrequent characters,
reducing the number of bits overall that are required for representing a string.
This can be valuable when the string needs to be sent along a slow network
connection, or when there is limited storage. It is called a prefix code since the
code for any character is not the start of the code for any other characters,
which simplifies the decoding to reconstruct the string later.
The Huffman algorithm uses the greedy technique to produce an optimal
prefix code for the string (as shown on pp388-391 of the textbook). It starts
with a list of allowable characters C and the frequency fc with which each char-

acter occurs (either found by iterating through the string or else estimated from
previous typical strings). The algorithm works by building a binary tree whose
leaf nodes are the characters, built so that characters with a higher frequency
appear further up the tree. Tracing a path from the root to any character gives
the bits in the encoding for that character: each left edge followed corresponds
to a 0 in the code and each right edge corresponds to a 1, so that the number
of bits in a character’s code is given by its depth in the tree. The construction
is started by making a forest of binary trees, one per allowable character and
each having height 0. Then progressively two of the binary trees are chosen at a
time and joined together as the two subtrees of a new root node, until only one binary
tree remains which holds the allowable characters at its leaf nodes. The choice
of which two binary trees to combine at each step is a greedy choice: the two
trees whose combined characters occur with the least frequency are chosen to
combine each time.
For example, suppose a Huffman code is to be made for the string:
“peter piper picked a peck of pickled peppers”.
First a table is made of each character and its frequency in the string:
character a c d e f i k l o p r s t space
frequency 1 3 2 8 1 3 3 1 1 9 3 1 1 7
Next, a separate binary tree is made for each character, and the trees are merged
together two at a time, until only one tree remains. The leaf nodes of each tree
hold the characters and for convenience each node holds the total frequency for
all its descendant character leaf nodes. First the leaf nodes with least frequency
are joined together two at a time, such as leaf l with leaf t, leaf s with leaf f ,
and leaf o with leaf a.
[Figure: the forest after the first three merges, with three parent nodes of frequency 2 above the pairs l:1 t:1, s:1 f:1 and o:1 a:1, alongside the remaining leaves d:2, r:3, i:3, c:3, k:3, space:7, e:8 and p:9.]
Then the two smallest trees with frequency 2 are joined together and the process
continues until only one tree remains.
[Figure: the completed Huffman tree with total frequency 44 at its root; p:9 is at depth 2, e:8 and the space character (frequency 7) at depth 3, d:2, r:3, i:3, c:3 and k:3 at depth 4, and l, t, s, f, o and a (each frequency 1) at depth 5.]


Once completed the code for each character can be obtained from the tree. For
instance, in this example p gets assigned the two bit code 01 (since it occurred
frequently in the string), whereas s gets the longer code 00010, and the substring
“peter” would be coded as 01111000011111000. Note that a fixed-length code

for the string would have required at least 4 bits (since there are 14 distinct
characters including the space), and the entire 44 character string would be
encoded using a total of 176 bits. Using the variable-length Huffman code
found in this example would result in a total of 149 bits (a savings of just over
15%).
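Reading the codes off the finished tree amounts to a short recursive walk. The sketch below is purely illustrative and uses its own minimal node type, since the HuffmanNode interface listed later does not expose the character stored at a leaf.

public class HuffmanCodes
{ // minimal tree node for illustration: a leaf stores a character,
  // an internal node only stores its two children
  private static class Node
  { Character character;
    Node left, right;
    Node(char c) { character = c; }
    Node(Node left, Node right) { this.left = left; this.right = right; }
  }

  // walks the tree appending 0 for each left edge and 1 for each right
  // edge, recording the code of every character found at a leaf
  private static void collectCodes(Node node, String prefix,
      java.util.Map<Character,String> codes)
  { if (node.character != null)
      codes.put(node.character, prefix);
    else
    { collectCodes(node.left, prefix + "0", codes);
      collectCodes(node.right, prefix + "1", codes);
    }
  }

  public static void main(String[] args)
  { // tiny tree with leaf p on the left and leaves e and t on the right
    Node root = new Node(new Node('p'),
        new Node(new Node('e'), new Node('t')));
    java.util.Map<Character,String> codes
        = new java.util.HashMap<Character,String>();
    collectCodes(root, "", codes);
    System.out.println(codes); // prints p=0, e=10, t=11 in some order
  }
}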
The algorithm Huffman shows how the encoding algorithm is performed
in O (n log2 n), and the program Huffman (and interface HuffmanNode) is a
Java implementation of the algorithm. For efficiency the program stores the
root of each binary tree in a priority queue (implemented by a heap to keep the
operations O (log n)) ordered by frequencies.

Huffman(C)
1 create a priority queue Q to hold root of each binary tree
2 put each potential character and its frequency in the queue
3 while length[Q] > 1 do
4 x ← Extract-Min(Q)
5 y ← Extract-Min(Q)
6 allocate new node z with x and y as its children
7 Insert(Q, z)  add z to queue Q
8 return Extract-Min(Q)  return root node of last remaining tree

Exercise 4.4 (Huffman Compression) Use the Huffman coding algorithm


to write compression and decompression methods for transmitting a long string
across a network socket. Remember that the Huffman encoding will need to be
sent along with the encoded information and some code will be needed to denote
the end of transmission.

/**
An interface that represents a node in a binary tree that is used
for the Huffman encoding algorithm
@see Huffman.java
*/

public interface HuffmanNode


{
// returns the left child of this node or null if none
public HuffmanNode getLeftChild();

// returns the right child of this node or null if none


public HuffmanNode getRightChild();

// returns whether this node is a leaf (which holds a character)


public boolean isLeaf();

// returns total frequency of all characters that are descendants


public int getFrequency();
}

/**
A class that performs the Huffman encoding algorithm for a string
@author Andrew Ensor
*/
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.PriorityQueue;

public class Huffman


{
private Map<Character, Integer> charFrequencyMap;
private static final Integer ONE = new Integer(1);

// constructor accepts a string to use for determining frequency


// of each letter and the characters that are valid for encoding
public Huffman(String characters)
{ charFrequencyMap = new HashMap<Character, Integer>();
for (int i=0; i<characters.length(); i++)
{ Character character = new Character(characters.charAt(i));
Integer frequency = charFrequencyMap.get(character);
if (frequency == null)
frequency = ONE;
else
frequency = new Integer(frequency.intValue()+1);
charFrequencyMap.put(character, frequency);
}
}

// builds a binary tree using the specified string which


// should contain only valid characters
protected HuffmanNode buildTree(String data)
{ // create a priority queue to hold root of each binary tree
Comparator<HuffmanNode> comparator=new Comparator<HuffmanNode>()
{ public int compare(HuffmanNode nodeX, HuffmanNode nodeY)
{ return nodeX.getFrequency()-nodeY.getFrequency();
}
};
PriorityQueue<HuffmanNode> queue
= new PriorityQueue<HuffmanNode>(charFrequencyMap.size(),
comparator);


// create a leaf node for each character key in the map


Iterator<Character> iterator
= charFrequencyMap.keySet().iterator();
while (iterator.hasNext())
{ Character character = iterator.next();
HuffmanNode leaf = new LeafNode(character,
charFrequencyMap.get(character));
queue.add(leaf);
}
while (queue.size()>1)
{ HuffmanNode nodeX = queue.poll();
HuffmanNode nodeY = queue.poll();
HuffmanNode nodeZ = new ParentNode(nodeX, nodeY);
queue.add(nodeZ);
}
return queue.poll();
}

// returns a string representation of the character frequency map


public String toString()
{ StringBuffer output = new StringBuffer();
Iterator<Character> iterator
= charFrequencyMap.keySet().iterator();
while (iterator.hasNext())
{ Character character = iterator.next();
int frequency = charFrequencyMap.get(character).intValue();
output.append(character.charValue() + " has frequency "
+ frequency + "\n");
}
return output.toString();
}

// driver main method to test the class


public static void main(String[] args)
{ String original ="peter piper picked a peck of pickled peppers";
Huffman huff = new Huffman(original);
System.out.println("Characters");
System.out.println(huff);
System.out.println("Resulting Huffman binary tree");
System.out.println(huff.buildTree(original));
}


// inner class that represents a leaf in a Huffman binary tree


private class LeafNode implements HuffmanNode
{
private Character character;
private int frequency;

public LeafNode(Character character, int frequency)


{ this.character = character;
this.frequency = frequency;
}

public HuffmanNode getLeftChild()


{ return null;
}

public HuffmanNode getRightChild()


{ return null;
}

public boolean isLeaf()


{ return true;
}

public int getFrequency()


{ return frequency;
}

public Character getCharacter()


{ return character;
}

public String toString()


{ return character.charValue() + ":" + frequency;
}
}


// inner class represents a parent node in a Huffman binary tree


private class ParentNode implements HuffmanNode
{
private HuffmanNode leftChild, rightChild;
private int frequency;

// create a parent node with specified child nodes


public ParentNode(HuffmanNode leftChild, HuffmanNode rightChild)
{ this.leftChild = leftChild;
this.rightChild = rightChild;
frequency=leftChild.getFrequency()+rightChild.getFrequency();
}

public HuffmanNode getLeftChild()


{ return leftChild;
}

public HuffmanNode getRightChild()


{ return rightChild;
}

public boolean isLeaf()


{ return false;
}

public int getFrequency()


{ return frequency;
}

public String toString()


{ return "[" + leftChild.toString() + ","
+ rightChild.toString() + "]";
}
}
}
Chapter 5

Advanced Data Structures

5.1 Red-Black Trees

Reading: pp273-293
A binary search tree (or sort tree) is a binary tree (a tree where each node
has at most two child nodes) for which the element held in each node is greater
than or equal to that held in its left child but less than or equal to that in its
right child. The abstract data type for a binary search tree has add, contains,
and remove methods, but might also include other methods such as iterator,
maximum, minimum, predecessor, and successor.
A tree is said to be balanced if the left and right subtrees of any node always
have height within one of each other. As a consequence a balanced binary search
tree must have all its leaf nodes within one level of each other, and no binary
tree with the same number of nodes could have fewer levels.
Part of the appeal of using a binary search tree to store elements that can
be compared is that the add, contains, and remove methods all take time
proportional to the height of the tree, which for a balanced binary search tree is
log2 n. This is preferable to using a linear data structure such as a sorted array
list, which has O (log2 n) contains, but O (n) add and remove methods (since
elements need to be shifted along to maintain the ordering).
However, each call to add and remove affects the shape of the tree, so the
performance of the add, contains, and remove methods might drop if there are
more levels in the tree than necessary.
For example, if the elements “cow”, “fly”, “dog”, “bat”, “fox”, “cat”, “eel”, “ant”
are added in this order then the illustrated binary tree will be built with root
holding the element “cow”. Searching for an element in this balanced tree would
require at most 4 ≈ log2 n comparisons.
[Figure: the resulting balanced tree with root cow, children bat and fly, grandchildren ant, cat, dog and fox, and eel as the right child of dog.]


If instead the elements were added in an order such as “ant”, “fox”, “bat”, “cat”,
“cow”, “fly”, “eel”, “dog”, then the binary tree is built with “ant” as its root.
In this case the tree is called degenerate in the sense that each parent has only
one child, with no branching, and it resembles a linked list more than a tree.
To search this tree for an element requires up to 8 = n comparisons. It thus
becomes important when adding or removing nodes that the tree be kept
balanced, or nearly balanced, with as few levels as practically possible, in
order to avoid the tree’s performance dropping to O(n).
[Figure: the degenerate tree ant–fox–bat–cat–cow–fly–eel–dog, where each node has a single child.]
If after an addition or removal of a node it is determined that the tree
needs to be balanced then there are two rotations of the tree that can be used
individually or in conjunction to make the tree more balanced, but maintain
the ordering required in a binary search tree.
A left rotation is used to promote the right child y of a node x up a level
and demote the left child a (if present) down one level. It does this in two steps.
First it breaks the link between the node to be rotated left and its right child,
and between that child and its own left child b (if present). Then it makes the
right child y the parent of the original node x and reattaches its detached child
b as the right child of the original node x. Following these steps ensures that
the required ordering of a binary tree is maintained.

[Figure: a left rotation about node x with right child y; the link between x and y and the link between y and its left child b are severed, then y is made the parent of x and b is reattached as the right child of x, preserving the left-to-right order of the subtrees a, b, c.]
A right rotation reverses this process by promoting the left child x up a level
and demoting the right child c (if present) down a level (the mirror image of a
left rotation), first severing two links and then making two other links.

[Figure: a right rotation about a node y with left child x, the mirror image of the left rotation; x is promoted to be the parent of y and the right child b of x is reattached as the left child of y.]
The Θ(1) algorithm Left-Rotate(T, x) demonstrates how a left rotation
could be performed around a node x in a tree T where each node x has operations
for obtaining its parent parent[x], left child left[x], and right child right[x].

Left-Rotate(T, x)
1 y ← right[x]  y is child to promote up one level
2  transfer left child of y to become right child of x
3 right[x] ← left[y]
4 if left[y] ≠ null then
5 parent[left[y]] ← x
6  make y the parent
7 parent[y] ← parent[x]  reassign the parent of y
8 if parent[x] = null then  x was the root node of T
9 root[T ] ← y
10 else  determine whether x was left child or right child of its parent
11 if x = left[parent[x]] then
12 left[parent[x]] ← y
13 else
14 right[parent[x]] ← y
15 left[y] ← x
16 parent[x] ← y

Note that this algorithm updates the links to parent nodes as well as the links
to the left and right child nodes. Links to parents are typically required since
rebalancing algorithms need to efficiently move up the tree. An algorithm to
perform a right rotation is quite similar.
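A direct Java translation of the left rotation might look like the sketch below, which uses a bare Node class with just the links needed here (the colour field of a red-black node and any sentinel conventions are omitted); the names are illustrative only.

public class RotationSketch
{ private static class Node
  { Node parent, left, right;
    int key;
  }

  private Node root; // root of the tree

  // promotes the right child y of x up one level, mirroring Left-Rotate(T, x)
  private void leftRotate(Node x)
  { Node y = x.right;
    x.right = y.left; // transfer the left child of y to become the right child of x
    if (y.left != null)
      y.left.parent = x;
    y.parent = x.parent; // y takes the place of x under its parent
    if (x.parent == null)
      root = y; // x was the root of the tree
    else if (x == x.parent.left)
      x.parent.left = y;
    else
      x.parent.right = y;
    y.left = x; // finally make x the left child of y
    x.parent = y;
  }
}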
There are several alternative algorithms that are widely used to determine
when a tree needs rebalancing. An AVL tree is a binary tree where each node is
assigned a balance factor, which is the height of its left subtree minus the height
of its right subtree. After the addition or removal of a node the balance factor
of its ancestor nodes are recalculated (so it is convenient for each node in an
AVL tree to keep a link to its parent). If the balance factor is not −1, 0, or 1
for a node, then checking the balance factor for its two child nodes determines
which type of rotation is required to return the tree to be balanced (see the
discussion in the textbook p296).
A common alternative to an AVL tree is to use a red-black tree. In a red-
black tree each node is assigned one of two colours subject to the following
restrictions:

• the root is black,

• any child of a red node must be black,

• there must be the same number of black nodes along any path from the
root to a null reference.
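These restrictions can be verified mechanically. The sketch below (with a hypothetical Node type having isRed, left and right fields) checks the second and third properties for a subtree and returns its black count, with the caller separately confirming that the root is black; it is illustrative only.

public class RedBlackCheck
{ private static class Node
  { boolean isRed;
    Node left, right;
  }

  // returns the number of black nodes (counting null references as black)
  // on every path from node down to a null reference, throwing an
  // exception if a colouring property is violated within the subtree
  private static int checkColouring(Node node, boolean parentIsRed)
  { if (node == null)
      return 1; // a null reference is considered black
    if (parentIsRed && node.isRed)
      throw new IllegalStateException("a red node has a red child");
    int leftCount = checkColouring(node.left, node.isRed);
    int rightCount = checkColouring(node.right, node.isRed);
    if (leftCount != rightCount)
      throw new IllegalStateException("unequal black counts");
    return leftCount + (node.isRed ? 0 : 1);
  }
}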

Take the black height bh(x) of a node x to be one more than the number of black
nodes down any path from either child of x to a null reference (one is added
since the null node is itself considered black). Induction on the height of a
node x in the tree can be used to show that the subtree rooted at x has at least
2^bh(x) − 1 nodes. Indeed, if the height of x is 0 then x is a leaf, and so bh(x) = 1
and the subtree rooted at x consists of just x (2^1 − 1 = 1 node). Furthermore,
if every subtree rooted at a node y of height at most k has at least 2^bh(y) − 1
nodes, then consider a node x with height k + 1. If y is one of the (up to two)
child nodes of x then its height is at most k, and if y is black then bh(y) = bh(x) − 1
and so its subtree has at least 2^(bh(x)−1) − 1 nodes (by the inductive hypothesis),
whereas if y is red then bh(y) = bh(x), and so its subtree has at least 2^bh(x) − 1
nodes (also by the inductive hypothesis). Hence the subtree rooted at x must have
at least 2 · (2^(bh(x)−1) − 1) + 1 = 2^bh(x) − 1 nodes, which completes the induction
argument. Now suppose a red-black tree with n nodes has height h. Since the
parent of any red node must be black the black height of the root must be at
least ⌊h/2⌋ + 1 ≥ h/2, and so by the previous discussion the tree must have at
least 2^(h/2) − 1 nodes, so n ≥ 2^(h/2) − 1. Hence the following important fact is obtained:

the height of any red-black tree with n nodes is h ≤ 2 log2 (n + 1).

Thus the red-black properties ensure that a red-black tree is kept reasonably
balanced, with height at most about double that of a perfectly balanced binary
tree.
After any node has been added or removed from the tree the red-black
properties must be reestablished. The algorithm RB-Insert uses a while loop
to traverse down the tree starting from the root to find the parent node y where
the new node z should be added as a leaf. It then adds z and assigns it the
colour red, so that the third red-black property is not violated. However, if the
parent parent[z] of z is also red then the RB-Insert-Fixup(T, z) algorithm
makes adjustments to reestablish the second red-black property. It does this by
handling six possible cases, which can be divided into two groups depending on
whether parent[z] is a left child or a right child of the grandparent of z. Note
that parent[z] cannot be the root since it is red, and that the grandparent must
currently be black.
Suppose parent[z] is the left child of the grandparent. Case one supposes
that z has an uncle y that is also red. In this case parent[z] and y are both
made black but the grandparent is made red to preserve the black height in
the tree.
[Figure: case one, where the red parent b and red uncle d of z are recoloured black and the grandparent c is recoloured red, after which the while loop repeats with z taken to be the grandparent c.]
This works regardless of whether z is the left or the right child of parent[z], but
might result in the grandparent and its parent node both being red. Hence the
while loop is repeated with z taken as the grandparent node.
Case two supposes that z has been added as the right child of its parent and
either it does not have an uncle or else its uncle is black. In this case a simple
recolouring is not sufficient so a left rotation is performed about its parent,
and then case three is used with z taken to be the former parent. Case three
supposes that z has been added as the left child of its parent and again either
it does not have an uncle or else its uncle is black. In this case a combination
of recolouring and a rotation is applied, parent[z] is labelled black and the
grandparent red, then a right rotation is performed about the grandparent.
Both case two and case three reestablish the red-black properties, so the while
loop will terminate.

RB-Insert(T, z)
1  find node y that will be the parent of added node z
2 y ← null
3 x ← root[T ]
4 while x ≠ null do
5 y←x
6 if key[z] < key[x] then
7 x ← left[x]
8 else
9 x ← right[x]
10  add node z as a child of y
11 parent[z] ← y
12 if y = null then
13 root[T ] ← z
14 else if key[z] < key[y] then
15 left[y] ← z
16 else
17 right[y] ← z
18 left[z] ← null
19 right[z] ← null
20 colour [z] ← red  added node assigned red label
21 RB-Insert-Fixup(T, z)  fix any violation of red-black labelling

RB-Insert-Fixup(T, z)
1 while parent[z] ≠ null and colour [parent[z]] = red do
2  since parent[z] is red it is not root[T ] so z has grandparent
3 if parent[z] = left[parent[parent[z]]] then  parent[z] is left child
4 y ← right[parent[parent[z]]]  y is uncle of z or null
5 if y ≠ null and colour [y] = red then  case one
6 colour [parent[z]] ← black
7 colour [y] ← black
8 colour [parent[parent[z]]] ← red
9 z ← parent[parent[z]]  repeat loop again
10 else
11 if z = right[parent[z]] then  case two
12 z ← parent[z]
13 Left-Rotate(T, z)
14 colour [parent[z]] ← black  case three
15 colour [parent[parent[z]]] ← red
16 Right-Rotate(T, parent[parent[z]])
17 else  parent[z] is right child
18 same as above but interchange left and right
19 colour [root[T ]] ← black

[Figure: cases two and three, where a left rotation about the parent a converts case two into case three, and then recolouring followed by a right rotation about the grandparent c leaves b as the black root of the subtree with a and c as its red children.]
The other three cases presume parent[z] is a right child of the grandparent
and are the same as the first three cases but with the roles of left and right
interchanged.
Removing a node from a red-black tree is only slightly more complicated.
The RB-Delete algorithm first checks whether z has only one child; if so
then it replaces z with its child, otherwise a loop is used to find the successor
node of z and replace z with it, adopting the original colour of the node z.
If z is replaced by its successor this could cause a colouring violation for the
successor’s right child if the successor is black. If instead z is replaced by one
of its children or by null, a colouring problem would arise if z were black. In
these cases the RB-Delete-Fixup algorithm is used to reestablish the colour
properties starting at the node x that has the colouring problem. The fix up
checks eight possible cases, divided into two groups depending on whether x is
a left child or a right child of its parent. The cases depend on the colouring
of the sibling w of x, and of the colouring of the child nodes of w, performing
suitable recolouring and rotations to reestablish the colour properties. In the
first four cases x is the left child, and case one ensures that its sibling will be
black for later cases. Case two handles when w has no red children, and just
performs a simple recolouring and repeats the loop. Instead, case three handles
when left[w] is red and performs a rotation to instead make right[w] red for
case four, which performs recolouring and a single left rotation to reestablish
the colouring properties.

Exercise 5.1 (Implementation of a Red-Black Tree) Working in a team


implement the algorithms for Left-Rotate, Right-Rotate, RB-Insert, and
RB-Insert-Fixup. Prepare a suitable Node class to represent a node in a red-
black tree (including links to the parent, left and right child nodes, and colour),
and a main method to test the add method.

5.2 Augmenting Data Structures


Reading: pp302-316
It would be unusual to have to design and implement a completely new data
structure from scratch; in many situations a standard data structure for a stack,
queue, list, binary search tree, heap, hash table, or graph would be suitable.
However, often a standard data structure does not provide all the necessary
operations, and needs to be augmented with extra information (new fields)
for a particular application. Care is needed as the existing operations of the
underlying data structure might affect the values of those fields, and keeping
the fields accurate might affect the performance of the data structure.
The process of augmenting a data structure with new fields and operations
generally involves the following steps:

RB-Delete(T, z)
1  find node r that will take the place of z
2 checkN ode ← null  node whose colour needs checking
3 if left[z] ≠ null and right[z] ≠ null then  z has two children
4  find successor node (left-most descendant of right subtree of z)
5 r ← right[z]
6 repeat
7 r ← left[r]
8 until left[r] = null
9  make the right child of r the left child of parent of r
10 left[parent[r]] ← right[r]
11 if right[r] ≠ null then
12 parent[right[r]] ← parent[r]
13 if colour [r] = black then
14 checkN ode ← right[r]
15  have r adopt both child nodes of z
16 left[r] ← left[z]
17 parent[left[z]] ← r
18 right[r] ← right[z]
19 parent[right[z]] ← r
20 colour [r] ← colour [z]  keep same colour
21 else if left[z] ≠ null or right[z] ≠ null then  z has only one child
22 if left[z] ≠ null then
23 r ← left[z]  replace z by its left child
24 else
25 r ← right[z]  replace z by its right child
26 if colour [z] = black then
27 checkN ode ← r
28 else  z had no children
29 r ← null
30 if colour [z] = black then
31 checkN ode ← parent[z]
32  update the link with parent of z
33 if r ≠ null then
34 parent[r] ← parent[z]
35 if parent[z] = null then
36 root[T ] ← r
37 else if z = left[parent[z]] then
38 left[parent[z]] ← r
39 else
40 right[parent[z]] ← r
41 if checkNode ≠ null then
42 RB-Delete-Fixup(T, checkNode)

RB-Delete-Fixup(T, x)
1 while x 6= root[T ] and colour [x] = black do
2 if x = left[parent[x]] then  x is left child
3 w ← right[parent[x]]  w is sibling of x
4 if w ≠ null and colour [w] = red then
5  case one
6 colour [w] ← black
7 colour [parent[x]] ← red
8 Left-Rotate(T, parent[x])
9 w ← right[parent[x]]
10  note the sibling w is now black
11 if w = null or
12 ((left[w] = null or colour [left[w]] = black) and
13 (right[w] = null or colour [right[w]] = black)) then
14  case two
15 if w ≠ null then
16 colour [w] ← red
17 x ← parent[x]
18 else
19 if right[w] = null or
20 colour [right[w]] = black then
21  case three
22 colour [left[w]] ← black
23 colour [w] ← red
24 Right-Rotate(T, w)
25 w ← right[parent[x]]
26  note right[w] is now red
27  case four
28 colour [w] ← colour [parent[x]]
29 colour [parent[x]] ← black
30 colour [right[w]] ← black
31 Left-Rotate(T, parent[x])
32 x ← root[T ]  terminate loop
33 else  x is right child
34 same as above but interchange left and right
35 colour [x] ← black

1. an underlying data structure is chosen upon which to build the augmented


data structure,
2. new fields are chosen to be stored in the data structure to enable the new
operations,
3. the existing operations of the data structure are checked whether they
need additions to maintain the new fields,
4. new operations are developed using the new fields giving additional func-
tionality to the data structure.
For example, suppose a collection is to be used to store (comparable) elements
that get added and removed frequently, and which must also provide a method
select that returns the (i + 1)-th smallest element in the collection (the (i + 1)-th
order statistic). An array list could be used to store the elements in order, so
that the select method would be O(1), but its add and remove methods would
be O(n). Alternatively, a linked list could be used, but then its select method
would be O(n).
Instead, a binary search tree would be more suitable for maintaining the
elements in order, and if a red-black tree were used then the add and remove
methods would be O(log₂ n). The only difficulty with using a tree is that
counting through the nodes to find the (i + 1)-th smallest element would require
O(n) unless the nodes of the tree were equipped to simplify the counting
process. This could be achieved by augmenting the tree so that each node has
a size field giving the number of nodes in the subtree rooted at that node.

[Figure: an order-statistic tree holding ant, bat, cat, cow, dog, eel, fly, fox, with each node annotated by the size of its subtree: the root cow has size 8, its children bat and fly have sizes 3 and 4, dog has size 2, and ant, cat, fox and eel each have size 1.]
For example, the class OrderStatisticTree (taken from another edition of
the textbook) has an inner class called Node to represent a node in a red-black
tree that is augmented with a size field.
/**
* Inner class for an order-statistic tree node, extending a
* red-black tree node with an additional size field.
*/
protected class Node extends RedBlackTree.Node
{
/** Number of nodes in the subtree rooted at this node */
protected int size;

/**
* Initializes a new node in an order-statistic tree
* @param data Data to save in the node.
*/
public Node(Comparable data)
{ super(data);

size = 1;
}

/** Returns a string rep of this node’s data and size */


public String toString()
{ return super.toString() + ", size(" + size + ")";
}
}
The class OrderStatisticTree actually extends another textbook class called
RedBlackTree, which provides an implementation of a red-black tree that for
convenience uses a sentinel nil black node to represent a null link.
Recomputing all the sizes whenever an element is added or removed would
be O(n), so instead the methods that add and remove are extended. Before
performing the actual insertion or removal of the node, these methods adjust
the size of every node on the path between the root and the affected node
(adding one along the downward path to the insertion point, or subtracting one
along the upward path from the removed node to the root), so the extra work
is O(log₂ n).
/**
* Inserts a node, updating the <code>size</code> fields of
* ancestors before superclass’s <code>insertNode</code> is
* called.
* @param z The node to insert.
*/
protected void treeInsert(Node z)
{ // Update the size fields of the path down to where
// handle will be inserted in the tree.
for (Node i=(Node)root; i!=nil;
i=(Node)((i.compareTo(z)>=0)?i.left:i.right))
i.size++;
// Insert the handle’s node into the tree.
super.treeInsert(z);
}

/**
* Deletes a node from the tree.
* @param handle Handle to the node being deleted.
* @throws ClassCastException if <code>handle</code> is not
* <code>Node</code> object.
*/
public void delete(Object handle)
{ // Walk up the tree by following parent pointers while
// updating the size of each node along the path.
Node x = (Node) handle;
for (Node i=(Node)x.parent; i!=nil; i=(Node)i.parent)
i.size--;
// Now actually remove the node.
super.delete(handle);
}
One complication with using a red-black tree is that the size field also needs
to be updated every time a left or right rotation is performed:

/**
* Calls {@link RedBlackTree#leftRotate} and then fixes the
* <code>size</code> fields.
* @param x Handle to the node being left rotated.
*/
protected void leftRotate(RedBlackTree.Node x)
{ Node y = (Node) x.right;
super.leftRotate(x);
y.size = ((Node) x).size;
((Node)x).size=((Node)x.left).size+((Node)x.right).size+1;
}

/**
* Calls {@link RedBlackTree#rightRotate} and then fixes the
* <code>size</code> fields.
* @param x Handle to the node being right rotated.
*/
protected void rightRotate(RedBlackTree.Node x)
{ Node y = (Node) x.left;
super.rightRotate(x);
y.size = ((Node) x).size;
((Node)x).size=((Node)x.left).size+((Node)x.right).size+1;
}
Once the existing operations of the red-black tree have been extended so
that they maintain the new size field, new operations that use the field can be
added, such as select which obtains the i-th order statistic in O (log2 n):
/**
* Finds node in a subtree that is at given ordinal position
* in an inorder walk of the subtree.
* @param x Root of the subtree.
* @param i The ordinal position.
* @return node that is ith in an inorder walk of the subtree
*/
protected Object select(BinarySearchTree.Node x, int i)
{ int r = 1 + ((Node)x.left).size;
if (i == r)
return x;
else if (i < r)
return select(x.left, i);
else
return select(x.right, i - r);
}

In general, a red-black tree (or any other type of balanced binary search tree)
can be augmented with a new field f without affecting O (log2 n) performance of
the tree’s add, contains, remove methods provided that f (x) can be calculated
for a node x using only the nodes x, left[x], right[x], and the values of f (left[x]),
f (right[x]). This ensures the operations of the tree can maintain the field f
without affecting the O (log2 n) performance of these operations. This helps

explain why the number of nodes in the subtree was chosen as an appropriate
field, rather than the rank of each node, since the rank of a node cannot be
determined solely from the rank of its left child and right child.
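As a small illustration of this rule, the size field used above can be recomputed
for any node from the node itself and the values stored in its two children alone.
The following is only a sketch: it assumes the node fields left and right and the
sentinel field nil inherited from the RedBlackTree class, and is not part of the
textbook classes.

/**
 * Recomputes the augmented size field of a single node locally, using only
 * the node itself and the values already stored in its two children.
 */
protected int recomputeSize(Node x)
{  int leftSize = (x.left == nil) ? 0 : ((Node)x.left).size;
   int rightSize = (x.right == nil) ? 0 : ((Node)x.right).size;
   return leftSize + rightSize + 1; // the +1 counts x itself
}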

As another example, an interval tree is a tree whose nodes hold intervals


(closed intervals [a, b] for numbers a ≤ b), which has operations for inserting
new intervals into the collection, deleting an interval from the collection, and
searching for an interval [a, b] that overlaps with a specified interval i.

Usually interval trees are implemented using a red-black tree as the underly-
ing data structure, with intervals ordered by the starting value of each interval.
The data structure is augmented by an additional max field which gives the
maximum ending value for any interval in the subtree. The value of max for
a node x which holds an interval [a, b] is simply the largest of b, max [left[x]],
and max [right[x]], thus the new field can be maintained by the add, remove,
left rotate, and right rotate operations of the tree without affecting their per-
formance. The max field ensures that the search operation can be performed in
O (log2 n) as given by the following algorithm:

Interval-Search(T, i)
1  find an interval in interval tree T that overlaps with interval i
2 x ← root[T ]
3 while x ≠ null and i ∩ interval [x] = ∅ do
4 if left[x] ≠ null and max [left[x]] ≥ start[i] then
5 x ← left[x]
6 else
7 x ← right[x]
8 return x

This algorithm works its way down the tree starting at the root. If the interval
in a node x does not overlap with the interval i the algorithm moves to the left
or right child of x depending on whether the left subtree of x contains an interval
that ends after the start of i, given by the comparison max [left[x]] ≥ start[i].
The first interval found that overlaps with i is returned, whereas if there is no
such interval in the tree, null is returned.
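The emptiness test i ∩ interval[x] = ∅ used by the algorithm reduces to two
comparisons on the interval endpoints. The following is a minimal sketch of this
test (the explicit start/end parameters are an assumption made for the
illustration, not the representation used by the textbook classes):

// Two closed intervals [start1,end1] and [start2,end2] overlap exactly when
// each interval starts no later than the other one ends.
protected static boolean overlaps(double start1, double end1,
                                  double start2, double end2)
{  return start1 <= end2 && start2 <= end1;
}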
[Figure: intervals in an interval tree. A collection of closed intervals is drawn on a number line from 0 to 20, together with the corresponding interval tree: the root holds [7,11] with max 20; its children hold [2,3] (max 5) and [9,12] (max 20); the remaining nodes hold [0,4], [3,5], [8,9], [11,13], [1,3], [10,12], [17,20], [16,17] and [18,19], each annotated with the max value for its subtree.]

Note that the Interval-Search algorithm will find an interval that overlaps
with i if one exists in the tree. If max [left[x]] < start[i] then there is no interval
in the left subtree that overlaps with i, so if there is one it must be in the right
subtree. On the other hand, if max [left[x]] ≥ start[i] then some interval i′ in
the left subtree has end [i′] ≥ start[i], and if i does not overlap with i′ then
start[i′] > end [i]. Then the ordering property of T gives that no interval in the
right subtree of x can start before start[i′], so no interval in the right subtree
can overlap with i (so it is the left subtree that should be searched).

Exercise 5.2 (Interval Trees) Design a class called IntervalTree that im-
plements an interval tree using a red-black tree augmented with a max field, and
which has a search method to find an interval which overlaps with a specified
interval.

5.3 B-Trees
Reading: pp434-452
The design and analysis of most data structures presumes that the elements
are held in main memory so that each element is accessed in constant time. If a
collection is very large then its elements might need to be held on a secondary
storage device and only parts of it read in and out from main memory as needed.
The time required to obtain data from secondary storage consists of the seek
time (time required for the device to physically locate the data), the read/write
time (time to read or write the data), and the transfer time (time to transfer
the data to/from main memory). Usually the seek time is the most significant,
so typically entire pages (blocks) of consecutive data are read/written with each
seek, and the page manipulated in main memory to minimize the time delays
associated with the device.
A B-tree is a generalization of a balanced binary search tree that is designed
for use with secondary storage devices where the number of accesses to data
must be kept as low as possible. Each parent node in a B-tree holds multiple
elements and has many child nodes. This keeps the height of the tree low so that

the number of accesses to secondary storage is small. In a B-tree of minimum


order t every node x has n[x] keys (elements) kept in order, where n[x] ≥ t − 1
(unless x is the root) and n[x] ≤ 2t − 1, and n[x] + 1 links to child nodes (unless
x is a leaf). Keys are ordered in the tree so that the keys in the child referenced
by a link ci [x] are all between key i−1 [x] and key i [x].
For example, in a B-tree with minimum order t = 2 each parent node must have
at least t = 2 children, so at least one key, and at most 2t = 4 children, so at
most three keys. Such a B-tree is called a 2-3-4 tree.

[Figure: a 2-3-4 tree with root Q, whose children are the nodes F K M and T, with leaves C, H, L, N P, R S and V W.]
In a non-empty B-tree the root contains at least one key and all other nodes
contain at least t − 1 keys (with t children). Thus at depth 1 in the tree there
are at least 2 nodes, at depth 2 there are at least 2t nodes, at depth 3 at least
2t² nodes, etcetera. So if the tree has height h there are

n ≥ 1 + 2(t − 1) + 2t(t − 1) + 2t²(t − 1) + · · · + 2tʰ⁻¹(t − 1) = 2tʰ − 1

keys, giving that h ≤ logₜ((n + 1)/2). In practice the minimum order t of a B-tree
is chosen to be quite large so that each node occupies one page of the storage
device (typically between 2¹¹ and 2¹⁴ bytes on a hard drive). For instance, if
t = 1000 then from the root node any one of two billion keys could be found with
just three further reads from the storage device (as log₁₀₀₀(2×10⁹ + 1) ≈ 3). Using
a balanced binary search tree would require accessing up to log₂(2×10⁹ + 1) ≈ 31
nodes on the device to find a key.
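A quick back-of-the-envelope check of these numbers, purely illustrative and
using the bound derived above:

// Approximate number of levels needed for about two billion keys.
long n = 2000000000L; // number of keys
int t = 1000;         // minimum order of the B-tree
double bTreeLevels = Math.log((n + 1) / 2.0) / Math.log(t); // roughly 3
double bstLevels = Math.log(n + 1.0) / Math.log(2);         // roughly 31
System.out.printf("B-tree: %.1f levels, BST: %.1f levels%n",
                  bTreeLevels, bstLevels);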
Searching for a key k in a B-tree is performed in a similar way to
searching a binary search tree. Starting at the root, a node x is searched for the key
(either using a linear search or a more efficient binary search if t were large). If
k falls between key i−1 [x] and key i [x] then the search is repeated with the child
node at ci [x], first reading it into memory from the storage device and then
recursively searching it. Since every search commences at the root root[T ], the
root node is always held in memory so it need not be read from storage each
time a search is performed.
B-Tree-Search(x, k)
1  find the key k in the B-tree rooted at x
2 i←0
3  find the correct child in node to follow using a linear search
4 while i < n[x] and k > key i [x] do
5 i←i+1
6  if 0 < i < n[x] then i now satisfies key i−1 [x] < k ≤ key i [x]
7 if i < n[x] and k = key i [x] then  key found
8 return (x, i)
9 if leaf [x] then  key not found
10 return null
11 else  search child ci [x]
12 Disk-Read(ci [x], k)  read child into main memory
13 return B-Tree-Search(ci [x], k)
Searching a B-tree with n keys for a key using a linear search through each
node is O(t logₜ n) (or O(log₂ t · logₜ n) if a binary search is used in each node),
but only requires up to logₜ((n + 1)/2) disk reads (once the root is initially read into
memory).
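As a small illustration of the two within-node search options mentioned above,
the following sketches assume the keys of a node are held in a sorted array
keys[0..n−1] of Comparable elements (an assumed layout for this example only):

// Linear search within one node: O(t) comparisons.
static int linearSearch(Comparable[] keys, int n, Comparable k)
{  int i = 0;
   while (i < n && k.compareTo(keys[i]) > 0)
      i++;
   return i; // first position whose key is >= k (or n if there is none)
}

// Binary search within one node: O(log2 t) comparisons.
static int binarySearch(Comparable[] keys, int n, Comparable k)
{  int low = 0, high = n; // search the half-open range [low, high)
   while (low < high)
   {  int mid = (low + high) / 2;
      if (k.compareTo(keys[mid]) > 0)
         low = mid + 1;
      else
         high = mid;
   }
   return low; // first position whose key is >= k (or n if there is none)
}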
Inserting a key into a B-tree requires some effort to keep the B-tree balanced.
New keys are always inserted into existing leaf nodes, but since there is an upper
limit 2t − 1 on the number of keys in a node, a full node first needs to be split
(around its median key) into two nodes. The B-Tree-Split-Child algorithm
splits a full child ci (with 2t − 1 keys) of a parent node x (presumed to not itself
be full) into two half-full nodes, y with t − 1 keys and a new node z with t − 1
keys, with the median key key t−1 [y] moved up to be the key key i [x] in x.

B-Tree-Split-Child(x, i)
1  split child ci of parent x into existing child y and new child z
2 y ← ci
3 z ← Allocate-Node()
4 leaf [z] ← leaf [y]  z is made a leaf if y is a leaf
5 n[z] ← t − 1
6  move the second half of the keys in y to z
7 for j ← 0 to t − 2 do
8 key j [z] ← key j+t [y]
9 key j+t [y] ← null
10  move the second half of the links (if y not a leaf) in y to z
11 if not leaf [y] then
12 for j ← 0 to t − 1 do
13 cj [z] ← cj+t [y]
14 cj+t [y] ← null
15 n[y] ← t − 1
16  insert link to z in parent node x
17 for j ← n[x] downto i + 1 do
18 cj+1 [x] ← cj [x]
19 ci+1 [x] ← z
20  insert key (median in y) to z in parent node x
21 for j ← n[x] − 1 downto i do
22 key j+1 [x] ← key j [x]
23 key i [x] ← key t−1 [y]
24 key t−1 [y] ← null
25 n[x] ← n[x] + 1
26  update the secondary storage
27 Disk-Write(y)
28 Disk-Write(z)
29 Disk-Write(x)

To avoid changes propagating up the tree (as happens in red-black trees), requir-
ing the retrieval of ancestor nodes again from the storage, the B-Tree-Insert
algorithm anticipates the problem as it traverses down the tree, and splits any
full nodes it encounters. If the root is full, then firstly it gets split, which results
in an additional level being added to the tree. The B-Tree-Insert-Nonfull
algorithm is then used to insert the key k starting at the node x (presumed
not full), recursively splitting any full child nodes it encounters, until a leaf is
reached, in which case k is inserted (in the non-full leaf node).
For example, a B-tree with minimum order t = 2 to which the characters

B-Tree-Insert(T, k)
1 r ← root[T ]
2 if n[r] = 2t − 1 then  split the root, increasing height of T by 1
3 s ← Allocate-Node()
4 root[T ] ← s
5 leaf [s] ← false
6 n[s] ← 0
7 c0 [s] ← r
8 B-Tree-Split-Child(s, 0)
9 B-Tree-Insert-Nonfull(s, k)
10 else
11 B-Tree-Insert-Nonfull(r, k)

B-Tree-Insert-Nonfull(x, k)
1 i ← n[x] − 1
2 if leaf [x] then
3  insert k in the leaf at correct position
4 while i ≥ 0 and k < key i [x] do
5 key i+1 [x] ← key i [x]
6 i←i−1
7 key i+1 [x] ← k
8 n[x] ← n[x] + 1
9 Disk-Write(x)
10 else
11  find which child ci [x] of x to traverse to
12 while i ≥ 0 and k < key i [x] do
13 i←i−1
14 i←i+1
15 Disk-Read(ci [x])
16 if n[ci [x]] = 2t − 1 then  child ci [x] is full so split it
17 B-Tree-Split-Child(x, i)
18 if k > key i [x] then
19 i←i+1
20 B-Tree-Insert-Nonfull(ci [x], k)

‘F’, ‘S’, ‘Q’, ‘K’, ‘C’, ‘L’, ‘H’, ‘T’, ‘V’, ‘W’, ‘M’, ‘R’, ‘N’, ‘P’ are added is built
as follows.
[Figure: the B-tree shown after each of the fourteen insertions, starting from a single node holding F and finishing with the tree whose root holds Q, whose children hold F K M and T, and whose leaves hold C, H, L, N P, R S and V W.]
The B-Tree-Delete algorithm searches for the key k to be removed by
traversing down the B-tree starting at the root root[T ]. If the key k is not
found in some node x then its child node ci [x] that will be searched next is
ensured to have at least t keys, one more than the minimum number t − 1 (the
exception is the root which must have at least 1 key if it is not itself a leaf).
This is done by checking the adjacent siblings of the child ci [x]: if either has
at least t keys then its end key is moved up to the parent to replace the key
key i [x], which is moved down to ci [x], so that the child now has t keys (this is
essentially a rotation). If neither of the adjacent siblings has t keys, then they
must both have the minimum t − 1 keys, so one is chosen to be merged with
ci [x] together with the parent’s key for ci [x], giving a total of 2t − 1 keys in ci [x].
Note that since x has already been ensured to have t keys (or 2 for the root),
this second case does not reduce the number of keys in x below the minimum
t − 1.
Since the node x to be searched is ensured to have at least t keys, if k is
found in x (either by a linear search or a binary search) then two cases are
possible. If x is a leaf node then the key k is simply removed. However, if x is
a parent node then the removal of k = key i [x] would have repercussions for the
child nodes ci [x] and ci+1 [x]. If ci [x] has at least t keys then its greatest key k 0
(the predecessor of k) can be moved to take the place of k. Likewise, if ci+1 [x]
has at least t keys then its least key (the successor of k) can be moved. The only
remaining possibility is if both ci [x] and ci+1 [x] have t − 1 keys. In this case the
two siblings are merged together into one node with the key k placed between

their respective keys, and recursively k is deleted from the merged node.
For example, removing the key ‘P’ from the previous B-tree would be achieved
by starting at the root and ensuring that the node with keys ‘F’, ‘K’, ‘M’ had at
least t = 2 keys, and then that the node with keys ‘N’ and ‘P’ had at least t = 2
keys, which they do, so the key ‘P’ is simply removed from the leaf.

[Figure: the resulting tree has root Q, children F K M and T, and leaves C, H, L, N, R S and V W.]
However, to now remove the key ‘S’ more work is required, since the node with
key ‘T’ does not have at least 2 keys, which must be remedied before moving to
this node. Since its sibling has at least t keys, the key ‘M’ is moved up to replace
‘Q’, which is moved down to join ‘T’ (much like a right rotation in a red-black
tree). Then the node has ‘Q’ and ‘T’ and so is suitable to be searched, as is the
node with ‘R’ and ‘S’, so the key ‘S’ gets removed.

[Figure: the resulting tree has root M, children F K and Q T, and leaves C, H, L, N, R and V W.]
Next, consider removing the key ‘K’. Since ‘K’ lies in a parent node, the children
on either side of the key are checked to see whether either has at least 2 keys
(and if so then ‘K’ would be replaced by either its predecessor or successor key).
In this case both have only the minimum t − 1 = 1 key, so as an intermediate
step ‘H’, ‘K’, ‘L’ are merged into one node with 2t − 1 = 3 keys, and ‘K’ is
recursively removed from it.

[Figure: the resulting tree has root M, children F and Q T, and leaves C, H L, N, R and V W.]

Exercise 5.3 (Modifying a B-Tree) Perform some further add and remove
operations on the above B-tree, and use the classes BTree and BTreeTest to
check your answers.

5.4 Disjoint Sets


Reading: pp498-509
A disjoint set collection (or partition) is a collection S₀, S₁, . . . , Sₙ₋₁ of
disjoint sets (sets for which Sᵢ ∩ Sⱼ = ∅ if i ≠ j). Each set Sᵢ in the collection is
identified by a unique representative element zᵢ ∈ Sᵢ, so that every two elements
x, y ∈ Sᵢ have the same representative zᵢ. The representative zᵢ for a set is
allowed to change when the collection gets modified. The abstract data type for
a disjoint set collection has the operations:

Make-Set(x) creates a new set containing just the element x (which must not
be in any other set in the collection),

Union(x, y) forms a new set that is the union of the set containing the element
x with the set containing the element y, and removes those two sets from
the collection,

Find-Set(x) returns a reference to the current representative of the set that
contains the element x.

B-Tree-Delete(T, k)
1 r ← root[T ]
2 B-Tree-Delete(r, k)
3 if not leaf [r] and n[r] = 0 then  root is now parent with no keys
4 root[T ] ← c0 [r]  reduce height of B-tree by 1

B-Tree-Delete(x, k)
1  search for k in node x using linear search
2 i←0
3 while i < n[x] and k > key i [x] do
4 i←i+1
5 if i < n[x] and k = key i [x] then  k found in x at position i
6 if leaf [x] then
7  delete k from leaf x by moving keys on right left one
8 for j ← i + 1 to n[x] − 1 do
9 key j−1 [x] ← key j [x]
10 key[n[x] − 1] ← null
11 n[x] ← n[x] − 1
12 Disk-Write(x)
13 else  delete k from the parent x
14  try to replace k by predecessor key
15 y ← ci [x]
16 Disk-Read(y)
17 if n[y] ≥ t then  move key k′ from y
18 k′ ← Delete-Greatest-In-Subtree(y)
19 key i [x] ← k′
20 else  try to replace k by successor key
21 z ← ci+1 [x]
22 Disk-Read(z)
23 if n[z] ≥ t then  move key k′ from z
24 k′ ← Delete-Least-In-Subtree(z)
25 key i [x] ← k′
26 else  both y and z have t-1 keys so merge
27 key n[y] [y] ← k  move k to node y
28 remove k from x
29 move all keys and links of node z to y
30 n[y] ← n[y] + n[z] + 1
31 Disk-Write(x)
32 Disk-Write(y)
33 Disk-Free(z)
34 B-Tree-Delete(y, k)  recursively delete k from the merged node
35 else  k not found in x so search child node ci [x]
36 Disk-Read(ci [x])
37 B-Tree-Ensure-Full-Enough(ci [x])
38 B-Tree-Delete(ci [x], k)


Disjoint-set collections have various applications, such as finding the connected


components of a graph, or finding suitable web pages in a search.
For example, starting with the elements a, b, c, d, e, f , g, suppose Make-Set
is called with each of the 7 elements in turn. This would result in seven sets {a},
{b}, {c}, {d}, {e}, {f }, {g}. Then calling Union(a, e), Union(b, d), Union(f, g)
would reduce the collection to four sets {a, e}, {b, d}, {c}, {f, g}. At this point
Find-Set(a) would give the representative of the set {a, e} (which must be
either a or e), the same as that obtained by Find-Set(e) but different from
the representative given by Find-Set(b) since b lies in another set. Calling
Union(d, f ) would then reduce the collection to three sets {a, e}, {b, d, f, g},
{c}.
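The same sequence of operations can be written against the DisjointSetsADT
interface shown below; this is only a usage sketch, assuming the
LinkedDisjointSets implementation discussed later in this section.

// Builds the collection {a,e}, {b,d}, {c}, {f,g} from the example above and
// then merges the sets containing d and f.
DisjointSetsADT<String> sets = new LinkedDisjointSets<String>();
for (String s : new String[]{"a","b","c","d","e","f","g"})
   sets.makeSet(s);
sets.union("a", "e");
sets.union("b", "d");
sets.union("f", "g");
System.out.println(sets.findSet("a").equals(sets.findSet("e"))); // true
System.out.println(sets.findSet("a").equals(sets.findSet("b"))); // false
sets.union("d", "f"); // leaves the three sets {a,e}, {b,d,f,g}, {c}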
One obvious implementation strategy for disjoint sets is to represent each
set by a linked list. Rather than making a doubly-linked list, it is more efficient
to link each node to the next node in the list and use another link back to the
head of the list where the representative for the set is held. This ensures that
Find-Set is always O(1).

[Figure: a linked list representing one set; each node holds an element, a next link to the following node, and a rep link back to the head node, which holds the representative.]

The Union(x, y) operation can be implemented to add the elements of the set
containing y to the set containing x, by iterating through each node in that set
(setting each representative link), changing two further links to link it into the
set containing x, and then removing the set that originally contained y from the
collection.
An aggregate analysis of n Make-Set operations followed by n − 1 Union
operations can be undertaken using the number of links changed as a measure
of the time cost. In the worst case Union always adds the largest set in the
collection to one of the smallest sets.
Operation Cost Total
Make-Set(x0 ) 1 1
Make-Set(x1 ) 1 2
Make-Set(x2 ) 1 3
... 1 ...
Make-Set(xn−1 ) 1 n
Union(x1 , x0 ) 3 n+3
Union(x2 , x0 ) 4 n+7
Union(x3 , x0 ) 5 n + 12
... ... ...
Union(xₙ₋₁, x₀) n + 1 n + (n+4)(n−1)/2

/**
An interface that defines the abstract data type for a disjoint
set collection whose sets hold elements with type E
*/

public interface DisjointSetsADT<E>


{
/**
Creates a new set containing just the element x where x is
presumed not to be in any set in the collection
@param x The element to place in the set
@return A representative of the set (must be x)
*/
public E makeSet(E x);

/**
Forms the union of the sets which currently contain the
elements x and y
@param x, y Elements in each set to union (merge) together
@return A representative of the set
*/
public E union(E x, E y);

/**
Returns a representative of the set which currently contains x
@param x The element in the set
@return A representative of the set
*/
public E findSet(E x);
}

As the total number of links changed is n + (n+4)(n−1)/2 (the Union costs sum
to 3 + 4 + · · · + (n + 1) = (n+4)(n−1)/2), the average cost for each of the 2n − 1
operations is O(n). To avoid the linear average time, each set in the collection
can maintain a count of its current size and the Union operation can be modified
to ensure it always adds the smaller set to the larger set. The textbook
(page 504) shows that in such a case the average cost would drop to at worst
O(log₂ n), since the representative link of an element only changes when its set
is merged into a set at least as large, which can happen at most log₂ n times for
any one element.
The class LinkedDisjointSets implements the DisjointSetsADT interface
and demonstrates how disjoint sets can be implemented using linked lists, with a
Union operation that on average is O (log2 n). The class LinkedDisjointSets
actually maintains a map from each element to the corresponding node, so that
the node for each element can be conveniently located. An alternative to using
a map would be to use locators (instead of elements of type E) as parameters
and return types throughout the class, where a suitable locator for an element
x would be a Node object holding x. When a client wants to pass an element x
to the disjoint set collection, it would first wrap the element in a locator Node.
The Locator Pattern is a design pattern for using locators as a mechanism
to maintain the association between elements and their current positions in a
container.
A faster implementation of disjoint set collections is possible if trees are used
instead of linked lists. Each set in the collection is represented by a tree whose

/**
A class that implements a disjoint set collection using a linked
list for each set, where each node has a link to the next node in
the list and a link back to the representative at the head
@author Andrew Ensor
*/
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LinkedDisjointSets<E> implements DisjointSetsADT<E>


{
private List<Node<E>> repNodes; // heads for each linked list
private Map<Node<E>,Integer> setSizes;//each repNode gives set size
private Map<E, Node<E>> elementMap; // map of elements to locators

cont-

root holds the representative for the set and for which each node only holds a
link to its parent node.

For example, if the disjoint set collection {a, e}, {b, d}, {c}, {f, g} had
representatives a, b, c, f, then the forest (tree collection) would consist of four
trees rooted on these representatives. Then calling Union(d, f) would result in
three trees. The representative for the set containing an element x is found by
following the parent links until the root node is found (whose parent link is
itself), so Find-Set might be worse than O(1).

[Figure: the forest before and after Union(d, f); initially there are four trees rooted at a, b, c and f, and afterwards there are three trees rooted at a, b and c, with the tree rooted at f (containing g) attached beneath b.]

To improve the performance of the tree implementation two heuristics (features) are included.
The union by rank heuristic is used to ensure the smaller tree is added to the
larger tree (by adjusting the representative’s parent link in the smaller tree).
Rather than keeping track of the exact number of nodes in each tree it is simpler
in practice to just keep an upper bound called rank on the height of each tree.
The path compression heuristic is an optimization used whenever a path is
followed from a node to the root. Each traversal from a node to the root actually
updates the parent link of each node along the path so that it points directly
to the root node, making future Find-Set calls O(1) for those elements. For
example, calling Find-Set(g) on the previous forest would change the parent
link of the node holding g.

[Figure: the forest after Find-Set(g), in which g now points directly at the root b.]
The class ForestDisjointSets demonstrates how a forest can be used to
efficiently implement a disjoint set collection. An amortized analysis of the
algorithm using a potential function (see the textbook pp509-517) shows that
any sequence of m Make-Set, Union, Find-Set operations performed on a

-cont

public LinkedDisjointSets()
{ repNodes = new ArrayList<Node<E>>();
setSizes = new HashMap<Node<E>, Integer>();
elementMap = new HashMap<E, Node<E>>();
}

public E makeSet(E x)
{ if (elementMap.containsKey(x))
throw new IllegalArgumentException("element already used");
Node<E> node = new Node<E>(x);
node.repNode = node; // rep of the new set is x itself
repNodes.add(node); // add the head of the new set to the list
setSizes.put(node, new Integer(1));
elementMap.put(x, node); // add the new element to the map
return x;
}

public E union(E x, E y)
{ Node<E> nodeX = elementMap.get(x);
Node<E> nodeY = elementMap.get(y);
if (nodeX == null && nodeY == null)
return null; // neither element is in any set
if (nodeX == null)
return nodeY.repNode.x; // x was not in any set
else if (nodeY == null)
return nodeX.repNode.x; // y was not in any set
Node<E> repX = nodeX.repNode;
Node<E> repY = nodeY.repNode;
if (repX == repY)
return repX.x; // same set
else // add the smaller set to the larger set for efficiency
{ int sizeX = setSizes.get(repX).intValue();
int sizeY = setSizes.get(repY).intValue();
if (sizeX < sizeY)
return link(repY, repX); // add set with x to set with y
else
return link(repX, repY); // add set with y to set with x
}
}

cont-

-cont

// helper method adds second non-empty set to first where each set
// is specified by the node of its representative
private E link(Node<E> repX, Node<E> repY)
{ // insert nodes of second set into first immediately after repX
Node<E> nodeY = repY;
Node<E> previousY = null;
do
{ nodeY.repNode = repX;
previousY = nodeY;
nodeY = nodeY.next;
}
while (nodeY != null);
// link second set into the first set
previousY.next = repX.next;
repX.next = repY;
// update the map of set sizes and list of repNodes
int sizeX = setSizes.get(repX).intValue();
int sizeY = setSizes.get(repY).intValue();
setSizes.put(repX, new Integer(sizeX+sizeY));
setSizes.remove(repY);
repNodes.remove(repY); // setY no longer exists
return repX.x;
}

public E findSet(E x)
{ Node<E> node = elementMap.get(x);
if (node == null)
return null; // element not in any set
else // element is in a set
return node.repNode.x; // return representative of the set
}

// inner class that represents a node in one of the linked lists


private class Node<E>
{
public E x; // element held in the node
public Node<E> next; // link to next node in list
public Node<E> repNode; // link to head of the list

public Node(E x)
{ this.x = x;
next = null;
repNode = null;
}
}
}

disjoint set forest (using union by rank and path compression) has worst case
time O(mα(n)), where α(n) is an extremely slow growing function (in fact
α(n) ≤ 4 for n ≤ 10⁸⁰), so the operations are on average virtually O(1).

Exercise 5.4 (Modifying a Disjoint Set Collection) Perform some more


Make-Set, Union, and Find-Set operations on the above forest and use the
class ForestDisjointSets to check your answers.

/**
A class that implements a disjoint set collection using a tree for
each set, where each node has a link to the parent node in
the tree and the representative is at the root
@author Andrew Ensor
*/
...
public class ForestDisjointSets<E> implements DisjointSetsADT<E>
{
private List<Node<E>> repNodes; // root for each tree in forest
private Map<Node<E>,Integer> setRanks; //each repNode gives set rank
private Map<E, Node<E>> elementMap; // map of elements to locators

public ForestDisjointSets()
{ repNodes = new ArrayList<Node<E>>();
setRanks = new HashMap<Node<E>, Integer>();
elementMap = new HashMap<E, Node<E>>();
}

public E makeSet(E x)
{ if (elementMap.containsKey(x))
throw new IllegalArgumentException("element already used");
Node<E> node = new Node<E>(x);
node.parentNode = node; // parent of the new node is itself
repNodes.add(node); // add the root of the new tree to the list
setRanks.put(node, new Integer(0)); // initial rank is zero
elementMap.put(x, node); // add the new element to the map
return x;
}

public E union(E x, E y)
{ Node<E> nodeX = elementMap.get(x);
Node<E> nodeY = elementMap.get(y);
if (nodeX == null && nodeY == null)
return null; // neither element is in any set
if (nodeX == null)
return getRootNode(nodeY).x; // x was not in any set
else if (nodeY == null)
return getRootNode(nodeX).x; // y was not in any set

cont-

-cont

Node<E> repX = getRootNode(nodeX);


Node<E> repY = getRootNode(nodeY);
if (repX == repY)
return repX.x; // same set
else // add set with smaller rank to larger set for efficiency
{ int rankX = setRanks.get(repX).intValue();
int rankY = setRanks.get(repY).intValue();
if (rankX < rankY)
return link(repY, repX); // add set with x to set with y
else
return link(repX, repY); // add set with y to set with x
}
}

// helper method that returns root node of tree with specified node
private Node<E> getRootNode(Node<E> node)
{ while (node.parentNode != node)
node = node.parentNode;
return node;
}

// helper method adds second non-empty set to first where each set
// is specified by the node of its representative
private E link(Node<E> repX, Node<E> repY)
{ // add the tree rooted at repY as a child of tree rooted at repX
repY.parentNode = repX;
// update the map of set ranks and list of repNodes
int rankX = setRanks.get(repX).intValue();
int rankY = setRanks.get(repY).intValue();
if (rankX == rankY)
setRanks.put(repX, new Integer(++rankX));//add 1 to setX rank
setRanks.remove(repY);
repNodes.remove(repY); // setY no longer exists
return repX.x;
}

cont-

-cont

public E findSet(E x)
{ Node<E> node = elementMap.get(x);
if (node == null)
return null; // element not in any set
else // element is in a set
return pathCompress(node).x; // return representative of set
}
// recursive helper method that path compresses path up from node
// to root so all nodes along path are now children of root
// returns the eventual parent node (root node) of node
private Node<E> pathCompress(Node<E> node)
{ if (node.parentNode == node)
return node; // node is the root node
Node<E> rootNode = pathCompress(node.parentNode);
node.parentNode = rootNode;
return rootNode;
}

// inner class that represents a node in one of the trees


private class Node<E>
{
public E x; // element held in the node
public Node<E> parentNode; // link to parent node in tree

public Node(E x)
{ this.x = x;
parentNode = null;
}
}
}
Chapter 6

Graph Algorithms

6.1 Elementary Graph Algorithms

Reading: pp527-557

A directed graph G is a pair (V, E) where the vertex set V is a set whose
elements are called vertices and the edge set E is a binary relation on V whose
elements are called edges. For any edge (u, v) in E the edge is said to be incident
from the vertex u (or leaves u) and incident to the vertex v (or enters v), and
the vertices u and v are said to be adjacent. The degree of a vertex is the number
of edges leaving it plus the number of edges entering it.

For example, the state of a thread can be described by a directed graph,


whose vertices are the various states and whose edges represent transitions be-
tween the states.

[Figure: thread state diagram. The vertices are the states new, runnable, waiting, sleeping, blocked and dead; the edges represent transitions such as start, wait, notified, sleep, time expired, monitor unavailable, monitor obtained (synchronization), and run finished.]

An undirected graph G is a pair (V, E) where the edge set E consists of


unordered pairs of vertices u, v, which for convenience are also denoted by
(u, v) (with the understanding that (u, v) = (v, u)). The degree of a vertex
is the number of edges incident on it. Sometimes undirected graphs are not
allowed to have loops, which are edges from a vertex to itself.


For example, an undirected graph can be used to describe flights, where the
vertices represent airports and the edges represent routes between the airports.
Probably there would not be any routes from an airport back to itself (no
loops). If some of the routes were not two-way then a directed graph would be
used instead.

[Figure: an undirected graph of flight routes between the airports Auc, Wel, Chr, Cha, Mel, Syd, Bri, Fij, Sam and Tah.]
A (simple) path of length k ≥ 0 from a vertex v0 to a vertex vk in a graph is
an ordered sequence of distinct edges (v0 , v1 ), (v1 , v2 ), . . . , (vk−1 , vk ) that join
adjacent vertices. A path for which v0 = vk is called closed (or a (simple) cycle),
and if there are no closed paths in a graph then the graph is called acyclic.
If there is a path from a vertex u to a vertex v then v is said to be reachable
from u. If the graph is undirected then reachable gives an equivalence relation
on the vertices, and the equivalence classes are called
the connected components. If instead the graph is directed then an equivalence
relation is obtained by saying u and v are equivalent if both v is reachable from
u and u is reachable from v, in which case the equivalence classes are called the
strongly connected components.
For example, the directed graph illustrated below has three strongly connected
components: one consisting of the single vertex a, another of the four vertices
b, c, e, f, and the third consisting of the vertex d.

[Figure: a directed graph with vertices a, b, c, d, e, f, also used in the search examples below.]
There are two common alternative implementations for graphs:
adjacency list representation where besides the vertex set and edge set (pos-
sibly not included) each vertex holds a list of the edges that are incident
from it,
adjacency matrix representation where besides the vertices v1 , v2 , . . . , vn
there is an n × n matrix used to store the edges, where the i, j entry gives
the edges incident from vi to vj (or instead a 1 to indicate that there is
such an edge).
The adjacency list representation is usually preferred, particularly for sparse
graphs, where most vertices are not adjacent to each other (so the number of
edges |E| is much smaller than |V|²). The adjacency matrix representation is
typically used to implement dense graphs, where most pairs of vertices in the
graph are adjacent (so |E| is close to |V|²). The class AdjacencyListGraph
which implements the GraphADT interface uses an adjacency list representation
for either directed or undirected graphs.
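As a rough illustration of the adjacency list idea (this sketch is not the
AdjacencyListGraph class itself, whose Vertex and Edge interfaces are richer), a
directed graph can be stored as a map from each vertex label to the set of labels
it has edges to:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SimpleAdjacencyListDigraph<V>
{
   // each vertex label maps to the set of vertex labels it has edges to
   private Map<V, Set<V>> adjacent = new HashMap<V, Set<V>>();

   public void addVertex(V v)
   {  if (!adjacent.containsKey(v))
         adjacent.put(v, new HashSet<V>());
   }

   public void addEdge(V from, V to) // adds the directed edge (from, to)
   {  addVertex(from);
      addVertex(to);
      adjacent.get(from).add(to);
   }

   public Set<V> adjacentTo(V v) // vertices reachable by one edge from v
   {  return adjacent.get(v);
   }
}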
Two of the simplest algorithms for searching a graph are breadth-first search
and depth-first search, and many of the more advanced algorithms are based on
these two algorithms. Both algorithms start at a given vertex of a graph and
discover all the vertices that are reachable from that vertex, but they do so in
a different order.

[Figure: the flights graph represented two ways. In the adjacency list representation the edge set and vertex set are stored, and each vertex (auc, wel, chr, cha, fij, sam, tah, bri, syd, mel) holds a list of its incident edges; for example auc holds auc-wel, auc-chr, auc-fij, auc-sam, auc-tah, auc-bri, and so on. In the adjacency matrix representation the vertices are numbered 0 to 9 and a 10 × 10 matrix holds in entry i, j the edge between vertex i and vertex j, if any.]

A breadth-first search first visits each of its adjacent vertices, then it visits
each of their adjacent vertices that have not already been visited, and continues
in this way until no further vertices can be reached. Typically a breadth-first
search uses a processing queue of vertices that have been visited but not yet
processed (had all their adjacent vertices visited). Also it needs to keep track
of the vertices it has already visited, either storing them in a collection or
else decorating the vertices (such as by applying the Decorator Pattern to the
vertex node) by colouring the visited vertices (such as white for unvisited, grey
for visited but not yet processed, and black for processed vertices).
The edges that are followed (called tree edges) when visiting a vertex for the
first time form a tree called the predecessor subgraph. For example, performing
a breadth-first search of the directed graph shown earlier starting at vertex d
results in the vertices e, b, a being visited first and then the vertices f and c.

[Figure: the directed graph with vertices a to f, together with the contents of the processing queue at each step of the breadth-first search from d.]
One application of breadth-first search is to find paths of least length between
a vertex and other vertices in the graph. Since a breadth-first search starting
at a vertex v builds paths from v incrementing in length by one each time a
vertex is processed it can be shown (see the textbook pp535-537) that the path
obtained when a vertex is first visited actually has minimal length.
A depth-first search first visits one of the adjacent vertices of the starting
vertex and then visits one of its adjacent vertices, continuing until a vertex
is reached that has no unvisited adjacent vertices. Then it backtracks to the
previously visited vertex and visits another of its adjacent vertices, repeating the
process until no further vertices can be reached. Typically a depth-first search
uses either a stack of vertices that have been visited but not yet processed
(the grey vertices), or else uses recursion to perform the search. Similarly to
breadth-first search, the depth-first search algorithm also needs to keep track
of the vertices it has already visited, either by building a collection of visited
vertices or by decorating the vertex nodes with a colour.
A depth-first search of the directed graph shown earlier starting at vertex d
might result in the vertices e, b, c being visited first, then vertex a, followed by f.

[Figure: the directed graph with vertices a to f, together with the contents of the stack at each step of the depth-first search from d.]
It is interesting to note that if each vertex v is given a time stamp d[v] when it
is first visited (changed from white to grey) and another f [v] when it has been
fully processed (changed from grey to black) then for any two reachable vertices
u and v the time interval from d[u] to f [u] and the time interval from d[v] to
f [v] must satisfy that either one is contained entirely within the other or else
they are disjoint (do not overlap at all).
The classes BreadthFirstSearch and DepthFirstSearch demonstrate how
each search algorithm can be implemented for both undirected and directed
graphs. The constructor of each class starts by colouring each vertex white (un-
visited), using a map to store the colours (as an alternative the Vertex interface
could be decorated with a colour property using the Decorator Pattern). The

search method can be called various times for one graph, since each search
only searches one strongly connected component of the graph. Note that both
these classes use the Template Pattern, with the search algorithm in a tem-
plate method (search) and hook methods vertexDiscovered (called whenever
the colour of a vertex is changed from white to grey), vertexFinished (called
whenever the colour is changed from grey to black), and edgeTraversed (called
when an edge is being followed to a previously unvisited vertex). These methods
can be overridden by subclasses to incorporate the search algorithm as a part
of a more sophisticated algorithm.
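For instance, a subclass only needs to override the hook methods it is interested
in. The following minimal sketch (not one of the textbook classes) prints each
vertex in the order it is discovered during a breadth-first search:

public class PrintingBreadthFirstSearch<E> extends BreadthFirstSearch<E>
{
   public PrintingBreadthFirstSearch(GraphADT<E> graph)
   {  super(graph);
   }

   // overridden hook method called when a vertex changes from white to grey
   protected void vertexDiscovered(Vertex<E> vertex)
   {  System.out.println("discovered " + vertex.getUserObject());
   }
}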
To illustrate the flexibility of using hook methods, the class StronglyCon-
nectedComponents uses two subclasses of DepthFirstSearch to determine the
strongly connected components of any directed graph. The CompleteDepth-
FirstSearch subclass repeatedly performs a depth-first search of a graph until
all vertices have been visited. The RecordDepthFirstSearch subclass records
the vertices that have been visited when a search is performed.
Finding strongly connected components is an important tool as many graph
algorithms can only be used on strongly connected graphs, so the StronglyCon-
nectedComponents class can be used to break a graph into its strongly connected
subgraphs, and the algorithms applied to each separately. The components of
a graph G are found in three steps:
• repeatedly perform depth-first searches of G until all vertices have been
visited, and store each vertex once a depth-first search has finished with
it on a stack (the order is essential here),
• form the transpose of G which is the graph formed with the same (equiv-
alent) vertices as G but has all edges reversed,
• use each vertex on the stack in order as a starting vertex for a depth-first
search of the transpose graph, and note the vertices that are visited in
each search (which give the components of the original graph).

Exercise 6.1 (Topological Sort) The vertices in a directed acyclic graph G


can be linearly ordered so that for any edge (u, v) in the graph, the vertex u
appears before the vertex v in the ordering. Such a linear ordering is known as
a topological sort and can be viewed as an ordering of the vertices of the acyclic
graph along a horizontal line so that all directed edges go from left to right.
Prepare a program that implements the following algorithm for performing a
topological sort of a directed acyclic graph G:

Topological-Sort(G)
1 repeatedly call Depth-First-Search(G, v) until all vertices visited
2 add each vertex to the front of a list as it is finished in the search
3 return the list of vertices

6.2 Minimal Spanning Trees


Reading: pp561-573
A weighted graph is a graph together with a function w that assigns to each
edge of the graph a non-negative number, called the weight of the edge.

/**
A class which contains the breadth first search algorithm for any
graph that implements the GraphADT interface and whose vertices
hold elements of generic type E that has suitable hashing function
@author Andrew Ensor
*/
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BreadthFirstSearch<E>


{
// colours for each vertex where WHITE is unvisited, GREY is
// currently being processed, and BLACK is completely processed
protected static enum Colour{WHITE, GREY, BLACK};
protected Map<Vertex<E>, Colour> vertexColours;
protected GraphADT<E> graph;

// creates new searcher which has performed no breadth-first search


public BreadthFirstSearch(GraphADT<E> graph)
{ this.graph = graph;
Set<Vertex<E>> vertices = graph.vertexSet();
vertexColours = new HashMap<Vertex<E>, Colour>(vertices.size());
for (Vertex<E> vertex : vertices)
vertexColours.put(vertex, Colour.WHITE);
}

// performs a breadth-first search of graph starting at given


// vertex which is presumed to be WHITE
public void search(Vertex<E> startVertex)
{ if (!graph.containsVertex(startVertex))
throw new IllegalArgumentException("vertex not in graph");
//create queue to hold vertices not yet fully processed
QueueADT<Vertex<E>> processingQueue
= new LinkedQueue<Vertex<E>>();
// handle the starting vertex
vertexColours.put(startVertex, Colour.GREY);
vertexDiscovered(startVertex);
processingQueue.enqueue(startVertex);
// repeatedly find adjacent vertices and visit them
while (!processingQueue.isEmpty())
{ Vertex<E> frontVertex = processingQueue.dequeue();
// find all the adjacent vertices that have not been visited
// and enqueue them

cont-

-cont

for (Edge<E> incidentEdge : frontVertex.incidentEdges())


{ Vertex<E> adjacentVertex
= incidentEdge.oppositeVertex(frontVertex);
if (vertexColours.get(adjacentVertex) == Colour.WHITE)
{ edgeTraversed(incidentEdge);
vertexColours.put(adjacentVertex, Colour.GREY);
vertexDiscovered(adjacentVertex);
processingQueue.enqueue(adjacentVertex);
}
}
vertexColours.put(frontVertex, Colour.BLACK);
vertexFinished(frontVertex);
}
}

// hook method that is called whenever a vertex has been discovered


protected void vertexDiscovered(Vertex<E> vertex)
{ // default implementation does nothing
}

// hook method that is called whenever a vertex has been finished


protected void vertexFinished(Vertex<E> vertex)
{ // default implementation does nothing
}

// hook method that is called whenever a tree edge is traversed


protected void edgeTraversed(Edge<E> edge)
{ // default implementation does nothing
}
}

/**
A class which contains the depth first search algorithm for any
graph that implements the GraphADT interface and whose vertices
hold elements of generic type E that has suitable hashing function
@author Andrew Ensor
*/
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DepthFirstSearch<E>


{
// colours for each vertex where WHITE is unvisited, GREY is
// currently being processed, and BLACK is completely processed
protected static enum Colour{WHITE, GREY, BLACK};
protected Map<Vertex<E>, Colour> vertexColours;
protected GraphADT<E> graph;

// creates new searcher which has performed no depth-first search


public DepthFirstSearch(GraphADT<E> graph)
{ this.graph = graph;
Set<Vertex<E>> vertices = graph.vertexSet();
vertexColours = new HashMap<Vertex<E>, Colour>(vertices.size());
for (Vertex<E> vertex : vertices)
vertexColours.put(vertex, Colour.WHITE);
}

// performs a recursive depth-first search of graph starting at


// given vertex which is presumed to be WHITE
public void search(Vertex<E> startVertex)
{ if (!graph.containsVertex(startVertex))
throw new IllegalArgumentException("vertex not in graph");
// handle the starting vertex
vertexColours.put(startVertex, Colour.GREY);
vertexDiscovered(startVertex);
// visit each adjacent vertex
for (Edge<E> incidentEdge : startVertex.incidentEdges())
{ Vertex<E> adjacentVertex
= incidentEdge.oppositeVertex(startVertex);
if (vertexColours.get(adjacentVertex) == Colour.WHITE)
{ edgeTraversed(incidentEdge);
search(adjacentVertex);
}
}
vertexColours.put(startVertex, Colour.BLACK);
vertexFinished(startVertex);
}

cont-

-cont

// hook method that is called whenever a vertex has been discovered


protected void vertexDiscovered(Vertex<E> vertex)
{ // default implementation does nothing
}

// hook method that is called whenever a vertex has been finished


protected void vertexFinished(Vertex<E> vertex)
{ // default implementation does nothing
}

// hook method that is called whenever a tree edge is traversed


protected void edgeTraversed(Edge<E> edge)
{ // default implementation does nothing
}
}

/**
A class which contains the strongly connected components algorithm
for a graph that implements GraphADT interface and whose vertices
hold elements of generic type E that has suitable hashing function
@author Andrew Ensor
*/
import java.util.HashSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.Stack;

public class StronglyConnectedComponents<E>


{
protected GraphADT<E> graph;

// creates new searcher which has performed no depth-first search


public StronglyConnectedComponents(GraphADT<E> graph)
{ this.graph = graph;
}

public Set<Set<Vertex<E>>> getComponentVertices()


{ Set<Set<Vertex<E>>> components = new HashSet<Set<Vertex<E>>>();
// first perform depth-first searches until all vertices visited
// and record vertices in reverse order of when they finish
CompleteDepthFirstSearch<E> cdfsSearcher
= new CompleteDepthFirstSearch<E>(graph);
Stack<Vertex<E>> vertexStack = cdfsSearcher.completeSearch();

cont-

-cont

// create a new graph of same type that is the transpose of


// graph so has same vertices but all edges reversed
GraphADT<E> graphTransposed
= new AdjacencyListGraph<E>(graph.getType());
Map<Vertex<E>,Vertex<E>> correspOtoT
= new HashMap<Vertex<E>,Vertex<E>>();
Map<Vertex<E>,Vertex<E>> correspTtoO
= new HashMap<Vertex<E>,Vertex<E>>();
for (Vertex<E> originalVertex : graph.vertexSet())
{ Vertex<E> newVertex =
graphTransposed.addVertex(originalVertex.getUserObject());
// keep track of the correspondence between vertices in
// graph and the new vertices added to graphTransposed
correspOtoT.put(originalVertex, newVertex);
correspTtoO.put(newVertex, originalVertex);
}
for (Edge<E> originalEdge : graph.edgeSet())
{ Vertex<E>[] endVertices = originalEdge.endVertices();
Vertex<E> startVertex = correspOtoT.get(endVertices[1]);
Vertex<E> endVertex = correspOtoT.get(endVertices[0]);
graphTransposed.addEdge(startVertex, endVertex);
}
// next perform a depth-first search of the transposed graph
// using start vertices in order of those in the stack
RecordDepthFirstSearch<E> rdfsSearcher
= new RecordDepthFirstSearch<E>(graphTransposed);
while (!vertexStack.empty())
{ Vertex<E> vertex = vertexStack.pop();
Set<Vertex<E>> visitedVertices
=rdfsSearcher.getVisitedVertices(correspOtoT.get(vertex));
if (visitedVertices!=null && visitedVertices.size()>0)
{ // obtain a corresponding list of the original vertices
Set<Vertex<E>> originalVertices =new HashSet<Vertex<E>>();
for (Vertex<E> newVertex : visitedVertices)
originalVertices.add(correspTtoO.get(newVertex));
components.add(originalVertices);
}
}
return components;
}

// inner class that repeatedly performs depth-first search until


// all vertices have been visited
private class CompleteDepthFirstSearch<E>
extends DepthFirstSearch<E>
{
private Stack<Vertex<E>> vertexStack;

cont-

-cont

public CompleteDepthFirstSearch(GraphADT<E> graph)


{ super(graph);
vertexStack = new Stack<Vertex<E>>();
}

// performs a complete depth-first search of graph and returns


// a stack of the vertices of graph with most recently finished
// on top
public Stack<Vertex<E>> completeSearch()
{ for (Vertex<E> vertex : graph.vertexSet())
{ if (vertexColours.get(vertex) == Colour.WHITE)
search(vertex);
}
return vertexStack;
}

// overridden DepthFirstSearch method called when vertex


// has finished
protected void vertexFinished(Vertex<E> vertex)
{ vertexStack.add(vertex);
}
}

// inner class that returns a set of vertices of graph when


// vertices are searched using depth-first search
private class RecordDepthFirstSearch<E> extends DepthFirstSearch<E>
{
private Set<Vertex<E>> visitedVertices; //holds visited vertices

public RecordDepthFirstSearch(GraphADT<E> graph)


{ super(graph);
// prepare the empty set of visited vertices
visitedVertices = new HashSet<Vertex<E>>();
}

// performs depth-first search of graph starting at specified


// vertex and returns a set of the visited vertices
public Set<Vertex<E>> getVisitedVertices(Vertex<E> vertex)
{ if (vertexColours.get(vertex)!=Colour.WHITE)
return null; // vertex already visited
else // perform a new search
{ visitedVertices.clear();
super.search(vertex); // will update visitedVertices
return visitedVertices;
}
}

// overridden DepthFirstSearch method called when vertex has


// been discovered
protected void vertexDiscovered(Vertex<E> vertex)
{ visitedVertices.add(vertex);
}
}
}

[Figure: two weighted graphs. The flights graph with a dollar cost assigned to each route, and a small graph with six vertices a to f and ten weighted edges (weights 2, 3, 5, 5, 6, 6, 9, 10, 12 and 15) that is used in the examples below.]
A spanning tree for an undirected connected graph G = (V, E) is a tree
which is a subgraph of G that includes all the vertices V . If a connected graph
has only a finite number of edges then a spanning tree can be obtained from the
graph by progressively removing edges from closed paths until no closed paths
remain. Part of the importance of having a spanning tree for a connected graph
is that every two vertices in the graph are then connected by a unique path in
the spanning tree. For example, the following are two spanning trees for the
previous graphs.
[Figure: a spanning tree for each of the two previous weighted graphs, formed by keeping a subset of the edges that connects all the vertices without any closed paths.]
Many spanning trees are possible for a weighted graph, and they can have
different total weights. A minimal spanning tree for a connected weighted graph
is a spanning tree for the graph that has the smallest possible weight, in the
sense that the sum of the weights for all its edges is as small as possible.
Rather than starting with a connected graph and removing heaviest edges
one by one until a minimal spanning tree is found, it is more practical (and
efficient) to gradually build up a minimal spanning tree starting from any chosen
vertex. The following terminology assists in building a minimal spanning tree.
A cut of a graph G = (V, E) is a partition of the vertices V into two disjoint
(and non-empty) sets S and V − S. An edge (u, v) is said to cross the cut if
either u ∈ S and v ∈ V − S or else u ∈ V − S and v ∈ S. The cut is said to
respect a set A of edges of the graph if no edge in A crosses the cut.

[Figure: the six-vertex weighted graph from above, used to illustrate a cut.]
Suppose A is contained in some minimal spanning tree T for an undirected
and connected weighted graph, and suppose S, V − S is a cut of the graph that
respects A. Then A can be extended as follows. Since the graph is connected
there must be edges that cross the cut, and adding any of them to A would not
create a closed path in A (as A respects the cut). So one of the edges (u, v) that
crosses the cut which has least weight is chosen to be added to A. Note that A
will still be a subset of some minimal spanning tree as if T itself did not contain
the edge (u, v) then it must contain a path from u to v that includes some other

MST-Kruskal(G, w)
1 A←∅  set A holds edges in minimal spanning tree
2 for each vertex v ∈ V [G] do
3 Make-Set(v)
4 sort the edges E of G into increasing order of weight w
5 for each edge (u, v) ∈ E, taken in increasing order do
6 if Find-Set(u) ≠ Find-Set(v) then
7 add edge (u, v) to A
8 Union(u, v)
9 return A

edge (x, y) that crosses the cut (because T spans the graph). This edge (x, y)
in T could be replaced by the edge (u, v) without increasing the weight of T
(since (u, v) was chosen to have least weight crossing the cut) and still be a tree
(without closed paths) that spans the graph. Hence A is still contained in some
minimal spanning tree.
The fact that a subset of a minimal spanning tree can be grown across a
cut of the graph by adding an edge that crosses the cut with least weight shows
that the problem of finding a minimal spanning tree satisfies the greedy-choice
property. It also has optimal substructure as if T is a minimal spanning tree
and a cut is made of the graph for which exactly one edge (u, v) of T crosses
the cut then the two subtrees formed by removing edge (u, v) from T must
also both be minimal spanning trees on each side of the cut (otherwise a tree
with smaller weight than T could be built). Thus a greedy technique would be
suitable for solving the minimal spanning tree problem if appropriate cuts of
the graph could be found.
One greedy algorithm for finding a minimal spanning tree is Kruskal’s algo-
rithm. It starts with a forest A of disjoint trees, each initially a single vertex,
and gradually joins them together until only a single tree remains. At each step
the algorithm takes the least weight edge (u, v) not already considered, if u and
v belong to the same tree in A then the edge is discarded, otherwise a cut of the
graph can be made which respects A and which (u, v) crosses, so (u, v) can be
added to A. A convenient way of efficiently checking whether the end points u
and v already belong to the same tree in A (and so would cause a closed path)
is to also hold the vertices of A in a disjoint set data structure, with one set of
vertices for each tree. If both end points u and v belong to the same set then
the edge should not be used to grow A.
For example, Kruskal’s algorithm can be applied to the earlier weighted graph
with six vertices and ten edges. Each edge is considered in turn, in order of
increasing weight. Starting with six disjoint trees, each a single vertex, the
least weight edge (a, b) is added to A, joining together two of the trees
in A, then the next-smallest weight edge (c, d) is added, joining together two
more trees in A. When the edge (c, e) is eventually considered it is discarded as
both endpoints c and e are already in the same tree (so also in the same set in
the disjoint set data structure). Once all the edges have been considered there
remains just one tree in A which is a minimal spanning tree.
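
As a rough illustration of the technique (and not the GraphADT-based program asked for in Exercise 6.2 below), the following self-contained Java sketch follows the MST-Kruskal pseudocode, using a simple map of parent pointers in place of the disjoint set data structure; the class KruskalSketch and its WeightedEdge type are names introduced only for this example.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class KruskalSketch
{
   // an edge between two labelled vertices with an integer weight
   public static class WeightedEdge
   { public final String u, v;
     public final int weight;
     public WeightedEdge(String u, String v, int weight)
     { this.u = u; this.v = v; this.weight = weight;
     }
   }

   // naive Find-Set: follow parent pointers up to the representative
   // (no union by rank or path compression, so asymptotically slower
   // than the disjoint set data structure covered in class)
   private static String findSet(Map<String,String> parent, String x)
   { while (!parent.get(x).equals(x))
       x = parent.get(x);
     return x;
   }

   public static List<WeightedEdge> minimalSpanningTree
      (Set<String> vertices, List<WeightedEdge> edges)
   { Map<String,String> parent = new HashMap<String,String>();
     for (String vertex : vertices)
       parent.put(vertex, vertex); // Make-Set: each vertex its own tree
     // sort the edges into increasing order of weight
     List<WeightedEdge> sorted = new ArrayList<WeightedEdge>(edges);
     Collections.sort(sorted, new Comparator<WeightedEdge>()
       { public int compare(WeightedEdge e1, WeightedEdge e2)
         { return e1.weight - e2.weight;
         }
       });
     List<WeightedEdge> mst = new ArrayList<WeightedEdge>();
     for (WeightedEdge edge : sorted)
     { String ru = findSet(parent, edge.u);
       String rv = findSet(parent, edge.v);
       if (!ru.equals(rv)) // endpoints in different trees, no closed path
       { mst.add(edge);
         parent.put(ru, rv); // Union the two trees
       }
     }
     return mst;
   }
}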

MST-Prim(G, w, r)
1 for each vertex u ∈ V [G] do
2 if u 6= r then
3 leastEdge[u] ← null
4 Enqueue(Q, u)
5 A←∅  set A holds edges in minimal spanning tree
6 addedVertex ← r  keeps track of most-recent vertex added to A
7 while size[Q] > 0 do
8  update vertices on Q that are adjacent to addedVertex
9 for each edge e ∈ incident[addedVertex ] do
10 v ← opposite[e, addedVertex ]
11 if v ∈ Q then
12 if leastEdge[v] = null or weight[leastEdge[v]] > weight[e] then
13 leastEdge[v] ← e
14  priority queue Q now has vertex with least cross edge at head
15 v ← dequeue[Q]
16 add [A, leastEdge[v]]
17 addedVertex ← v
18 return A

[Figure: successive stages of building a minimal spanning tree for the example graph, one edge added at a time]
Another greedy algorithm for finding a minimal spanning tree is Prim’s
algorithm. Prim’s algorithm works by enlarging a single tree A, starting from
any specified vertex r. At each stage a cut is made to separate A from the rest
of the graph and an edge (u, v) with least weight that crosses the cut is added
to A. In order to efficiently find the least weight edge (u, v) a priority queue Q
is used to hold each vertex that has not yet been added to A, together with the
least weight edge found so far between that vertex and some vertex in A. Each
time another edge and vertex is added to A the vertices still in the queue that
are adjacent to the new vertex are checked to see whether their least weight
edge should be updated, as the adjacent vertices might now be closer to A.
For example, Prim’s algorithm can be used on the previous weighted graph
that has six vertices and ten edges. Starting with the vertex a, the smallest
weight edge that joins this vertex to another vertex has weight 2 (and ap-
pears at the head of the queue). Once this edge is added to the tree, the tree

has four edges joining its two vertices to other vertices, of weights 15, 9, 10,
5. Adding the edge with smallest weight (weight 5) and continuing the algorithm,
the tree grows until all the vertices are included in the set A. The
class MinimalSpanningTreePrim demonstrates how Prim’s algorithm can be
implemented using a queue for the vertices that are still to be processed.
[Figure: successive stages of Prim's algorithm on the example graph, starting from vertex a]
If the graph has m edges and n vertices then either algorithm is actually
O (m log2 n), although with a judicious choice of data structure (such as a Fi-
bonacci heap) Prim’s algorithm can be made O (m + n log2 n) (an improvement
for dense graphs).

Exercise 6.2 (Implementing Kruskal’s Algorithm) Write a program that
implements Kruskal’s algorithm for finding a minimal spanning tree in an undi-
rected and connected weighted graph.

6.3 Single-Source Shortest Paths


Reading: pp580-591,595-599
The single-source shortest path problem for a weighted graph G = (V, E, w)
takes as input the graph G and a vertex s ∈ V (called the source vertex ) and
should have as output for each vertex v ∈ V a path from s to v of minimal
weight. Algorithms for solving this problem make use of the fact that the short-
est path problem has optimal substructure (as discussed on Page 65). Note
that this problem includes the supposedly simpler problem of finding the short-
est path between s and just one other specified vertex (the single-pair shortest
path problem). Actually, no algorithm is currently known that solves the single-
pair shortest path problem with order less than the algorithms that solve the
single-source problem, so it is no more time consuming (asymptotically) to find
all the shortest paths from s than to find a single path from s to some particular
vertex v.
Some algorithms are only designed for graphs whose edges have non-negative
weight, whereas others can handle negative weights. Negative weight edges can
cause complications for some algorithms as if a closed path can be found in the
graph whose total weight is negative then an algorithm might forever traverse
that path to continually reduce the total weight. Several algorithms for solving
the single-source shortest path problem keep track of a shortest-path estimate
d[v] for each vertex v, which gives the least known weight found so far for a

/**
A class which contains Prim’s algorithm for finding a minimal
spanning tree in an undirected and connected weighted graph that
implements GraphADT interface and whose vertices hold elements of
generic type E that has suitable hashing function
@author Andrew Ensor
*/
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class MinimalSpanningTreePrim<E>


{
private GraphADT<E> graph;
private Map<Edge<E>,Integer> weights;

public MinimalSpanningTreePrim(GraphADT<E> graph,


Map<Edge<E>,Integer> weights)
{ if (graph.getType()!=GraphADT.GraphType.UNDIRECTED)
throw new IllegalArgumentException("Graph not undirected");
this.graph = graph;
this.weights = weights;
}

public Set<Edge<E>> getMinimalSpanningTree(Vertex<E> root)


{ // create a min priority queue to hold each vertex
Comparator<LeastCrossEdge<E>> comparator
= new Comparator<LeastCrossEdge<E>>()
{ public int compare(LeastCrossEdge<E> u,
LeastCrossEdge<E> v)
{ return u.getWeight()-v.getWeight();
}
};
PriorityQueue<LeastCrossEdge<E>> queue
= new PriorityQueue<LeastCrossEdge<E>>
(graph.vertexSet().size(), comparator);
for (Vertex<E> vertex : graph.vertexSet())
{ if (vertex!=root)
queue.add(new LeastCrossEdge<E>(vertex));
}
// start building the minimal spanning tree
Set<Edge<E>> mst = new HashSet<Edge<E>>();
Vertex<E> addedVertex = root;


// process each element of the queue


while (queue.size()>0)
{ // update the least cross edges for all the adjacent vertices
for (Edge<E> edge : addedVertex.incidentEdges())
{ Vertex<E> adjacentVertex=edge.oppositeVertex(addedVertex);
// find whether the adjacent vertex is in queue
Iterator<LeastCrossEdge<E>> iterator = queue.iterator();
LeastCrossEdge<E> lce = null;
boolean found = false;
while (!found && iterator.hasNext())
{ lce = iterator.next();
found = (lce.vertex == adjacentVertex);
}
if (found && lce.getWeight()>weights.get(edge))
{ // remove lce from queue so gets resorted after change
iterator.remove();
lce.edge = edge;
queue.add(lce);
}
}
// add the smallest least cross edge to minimal spanning tree
LeastCrossEdge<E> lce = queue.poll();
if (lce.edge==null)
throw new NullPointerException("No edge in cross edge");
mst.add(lce.edge);
addedVertex = lce.vertex;
}
return mst;
}

// inner class that holds information about the least weight
// cross edge (an edge that joins A with a vertex not in A)
// to the specified vertex
private class LeastCrossEdge<E>
{
public Vertex<E> vertex;
public Edge<E> edge; // null if not a cross edge

public LeastCrossEdge(Vertex<E> vertex)


{ this.vertex = vertex;
this.edge = null;
}

public int getWeight()


{ if (edge!=null)
return weights.get(edge);
else
return Integer.MAX_VALUE;
}
}
}

path from s to v. The process of relaxing an edge e = (u, v) consists of checking


whether d[u] + w[e] < d[v]. In such a case a least path from s to u augmented
by the edge e would give a shorter path than the previously found path from s
to v, and so d[v] should be replaced by d[u] + w[e].
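
In Java the relaxation step might be sketched as follows, using the Vertex, Edge and weight map types from the classes in this chapter and representing ∞ by Integer.MAX_VALUE (as the later implementations do); the helper method relax is illustrative only and assumes java.util.Map is imported.

// relax edge e = (u, v): if the best known path to u extended by e is
// shorter than the best known path to v, update the estimate for v and
// remember e as the last edge on that improved path
static <E> void relax(Edge<E> e, Vertex<E> u, Vertex<E> v,
   Map<Edge<E>,Integer> weights, Map<Vertex<E>,Integer> d,
   Map<Vertex<E>,Edge<E>> leastEdge)
{ int du = d.get(u);
  if (du != Integer.MAX_VALUE && du + weights.get(e) < d.get(v))
  { d.put(v, du + weights.get(e));
    leastEdge.put(v, e);
  }
}
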
The Bellman-Ford algorithm solves the single-source shortest path problem
for a directed or undirected weighted graph with m edges and n vertices, or
else indicates if the graph has a closed path with negative weight. It works
by relaxing each edge in the graph n − 1 times, so that during iteration i it
decreases each value of d[v] if a path consisting of i edges has weight less than
the previous value of d[v]. Besides keeping track of the current value of d[v] for

Bellman-Ford(G, w, s)
1 for each vertex v ∈ V [G] do
2 d[v] ← ∞
3 leastEdge[v] ← null  last edge on shortest path to v
4 d[s] ← 0
5 n ← |V [G]|  n is the number of vertices in G
6 for i ← 1 to n − 1 do
7 for each edge e ∈ E[G] do  e can be directed or undirected
8  relax edge e = (u, v)
9 u ← start[e]
10 v ← end [e]
11 if d[u] + w[e] < d[v] then
12 d[v] ← d[u] + w[e]
13 leastEdge[v] ← e
14  check whether any edge can still be relaxed
15 for each edge e ∈ E[G] do
16 u ← start[e]
17 v ← end [e]
18 if d[u] + w[e] < d[v] then
19 return false  G has a negative weight closed path
20 return true and every leastEdge[v]

each vertex v the Bellman-Ford algorithm also holds the edge leastEdge[v]
that is the last edge in a path from s to v with weight d[v] (so that the shortest
path with weight d[v] can be reconstructed once the algorithm has finished).
Since each of the n − 1 iterations checks whether every edge can be relaxed the
algorithm is Θ(mn).
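
As a rough array-based sketch of the algorithm (illustrative only, and not the GraphADT-based program asked for in Exercise 6.3 later in this section), the following Java method takes the n vertices as the integers 0, . . . , n − 1 and each edge as a triple {start, end, weight}, with an undirected edge simply listed in both directions; Integer.MAX_VALUE stands in for ∞.

// returns the shortest-path weights d[v] from source s, or null if the
// graph contains a closed path with negative weight
static int[] bellmanFord(int n, int[][] edges, int s)
{ int[] d = new int[n];
  java.util.Arrays.fill(d, Integer.MAX_VALUE);
  d[s] = 0;
  for (int i = 1; i <= n - 1; i++) // relax every edge n-1 times
  { for (int[] e : edges)
    { int u = e[0], v = e[1], w = e[2];
      if (d[u] != Integer.MAX_VALUE && d[u] + w < d[v])
        d[v] = d[u] + w; // relax edge (u, v)
    }
  }
  // check whether any edge can still be relaxed
  for (int[] e : edges)
  { int u = e[0], v = e[1], w = e[2];
    if (d[u] != Integer.MAX_VALUE && d[u] + w < d[v])
      return null; // G has a negative weight closed path
  }
  return d;
}
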
To verify that the algorithm does find for every vertex v a shortest path
from s to v consider the following loop invariant:

at the start of iteration i the value d[v] is the weight of the shortest
path from s to v which uses at most i − 1 edges (or possibly i
depending on the order in which edges are processed).

At the start of iteration i = 1 the vertex s has d[s] = 0 and all other vertices
v have d[v] = ∞, so the loop invariant holds for i = 1 (since the path using 0
edges from s to s has total weight 0). Suppose the loop invariant holds for some

value of i ≥ 1, and consider a vertex v during iteration i of the loop. Any path
e1 , e2 , . . . , ei of i edges from s to v must pass through an adjacent vertex u of
v, and so e1 , e2 , . . . , ei−1 must form a path from s to u. By the loop invariant,
d[u] is at most the weight of the path e1 , e2 , . . . , ei−1 , so the total weight of
the i edges is at least d[u] + w(ei ). But iteration i of the loop checks whether
d[u] + w(ei ) < d[v] and updates the value of d[v] if this is the case. Hence the
loop invariant holds at the end of the iteration (and start of iteration i + 1). At
the start of iteration n (when the loop terminates) the loop invariant gives that
d[v] is the weight of the shortest path from s to v that uses at most n − 1 edges.
Note that if there were a shorter path using n (or more) edges, then such a path
would have to include some vertex twice, so contain a closed path of negative
weight. The algorithm finishes by checking for the possibility of a closed path
of negative weight by seeing whether any edge can still be relaxed; if not, then
a shortest path to each vertex v has been found.
For example, suppose the Bellman-Ford algorithm is to be applied to the
illustrated directed graph using the source vertex a, finding the shortest path
from a to any other vertex v in the (connected) graph.

[Figure: a directed weighted graph on the vertices a, b, c, d, e, f, including two negative edge weights]

Initially, d[a] = 0 and d[v] = ∞ for all other vertices. After one iteration the
edges incident to the source vertex get relaxed, so d[b] = 2 and d[c] = 15. After
the following iteration further edges get relaxed (the number of relaxations
depends on the order in which edges are considered), resulting in d[c] being
decreased to 11, and d[d] = 13, d[e] = 7. After three further iterations each d[v]
holds the weight of the shortest path from s to v and the leastEdge[v] edges can
be followed in reverse from v to s to find the edges in the shortest paths.
[Figure: six snapshots of the Bellman-Ford algorithm on the example graph, showing the shortest-path estimates d[v] after successive relaxations until all shortest-path weights from a are found]
If the edge weights in a directed or undirected graph are all non-negative then
a more efficient alternative to the Bellman-Ford algorithm can be used to solve
the single-source shortest path problem. Dijkstra’s algorithm is an algorithm
that grows a set S of vertices whose final shortest-path weights from the source
s have already been found, starting with S = {s}. It is a greedy algorithm as
it always selects the vertex v ∈ V − S not already in S that has the smallest
shortest-path estimate d[v] as the next element to add to S. For efficiency the

Dijkstra(G, w, s)
1 for each vertex v ∈ V [G] do
2 leastEdge[v] ← null  last edge on shortest path to v
3 if v ≠ s then
4 d[v] ← ∞
5 enqueue[Q, v]
6 else
7 d[s] ← 0
8 S ← {s}  set S holds vertices whose final shortest path are known
9 A←∅  set A holds edges in shortest paths tree
10 addedVertex ← s
11 while size[Q] > 0 do
12  relax edges incident to addedVertex
13 for each edge e ∈ incident[addedVertex ] do
14  relax edge e = (addedVertex , v)
15 v ← opposite[e, addedVertex ]
16 if d[addedVertex ] + w[e] < d[v] then
17 d[v] ← d[addedVertex ] + w[e]
18 leastEdge[v] ← e
19  priority queue Q now has vertex with smallest d[v] at head
20 addedVertex ← dequeue[Q]
21 add [S, addedVertex ]
22 add [A, leastEdge[addedVertex ]]
23 return A

Dijkstra algorithm holds the vertices v not currently in S in a priority queue


Q ordered by the values d[v], and whenever a vertex gets added to S the edges
incident to it are relaxed, updating their shortest-path estimates.

To see why a greedy choice works in the single-source shortest path problem
consider any iteration of the while loop where the set S already holds vertices
whose final shortest-path weights are known and a vertex u is at the head of
the priority queue, so that d[u] ≤ d[y] for all other vertices y ∈ V − S. For the
greedy-choice property to hold it must be shown that d[u] is the weight of the
shortest path from s to u. Suppose instead that there is a shorter path that is
a shortest path from s to u (proof by contradiction), and let (x, y) denote the
first edge in this shorter path for which x ∈ S but y ∉ S. As x ∈ S the edge
(x, y) must have been relaxed in the earlier iteration when x was added to S
so d[y] is at most the weight of the portion of this path to y. Furthermore, as
this is a shorter path to u it follows that d[y] < d[u] (so long as all edges have
non-negative weights), which contradicts the fact that u is at the head of the
queue.

To illustrate Dijkstra’s algorithm consider the directed weighted graph
illustrated alongside, and note that no edge has negative weight (otherwise the
less efficient Bellman-Ford algorithm would be used).

[Figure: a directed weighted graph on the vertices a, b, c, d, e, f with only non-negative edge weights]

The algorithm commences with S = {a}, d[s] = 0 (since the shortest path from
s to s has weight 0) and d[v] = ∞ for every other vertex.
The first iteration of the while loop relaxes the edges incident to the source
vertex a, namely edges (a, b) and (a, c), and picks the vertex b with the smallest
shortest-path estimate. Then the edges incident to b are relaxed, resulting in a
change to d[c], d[d], and d[e], and the vertex e is chosen. Repeating again, the
edges incident to e are relaxed and the process continues until no further ver-
tices remain in the priority queue Q. The class ShortestPathDijkstra demon-
strates how Dijkstra’s algorithm can be implemented for a directed or undirected
weighted graph whose edges all have non-negative weight. This implementation
has order O (m log2 n), which could be improved to O (m + n log2 n) with a more
efficient implementation of the queue.
[Figure: six snapshots of Dijkstra's algorithm on the example graph, showing the shortest-path estimates d[v] as each vertex is added to the set S]

Exercise 6.3 (Implementing the Bellman-Ford Algorithm) Write a pro-
gram that implements the Bellman-Ford Algorithm for solving the single-source
shortest path problem in a directed or undirected weighted graph.

6.4 All-Pairs Shortest Paths


Reading: pp620-634
Algorithms such as the Bellman-Ford algorithm and Dijkstra’s algorithm for
solving the single-source shortest path problem can be used repeatedly for each
vertex to find all the shortest paths between all pairs of vertices in a weighted
graph. However, for a graph with m edges and n vertices this would be quite
inefficient: if there are no negative weight edges then Dijkstra’s algorithm would
be applied n times, giving an O(mn log2 n) or O(mn + n^2 log2 n) solution
(depending on the implementation of the queue), otherwise if there could be
negative weight edges then the slower Bellman-Ford algorithm would be required,
resulting in a Θ(mn^2) solution.

/**
A class which contains Dijkstra’s algorithm for solving the
single-source shortest path problem in a directed or undirected
weighted graph that implements GraphADT interface and whose
vertices hold elements of generic type E that has suitable hashing
function (note all edges presumed to have non-negative weight)
@author Andrew Ensor
*/
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class ShortestPathDijkstra<E>


{
private GraphADT<E> graph;
private Map<Edge<E>,Integer> weights;

public ShortestPathDijkstra(GraphADT<E> graph,


Map<Edge<E>,Integer> weights)
{ this.graph = graph;
this.weights = weights;
}

public Set<Edge<E>> getShortestPathsTree(Vertex<E> source)


{ final Map<Vertex<E>,Integer> shortestPathEstimates
= new HashMap<Vertex<E>,Integer>();
Map<Vertex<E>,Edge<E>> leastEdges
= new HashMap<Vertex<E>,Edge<E>>();
// create a min priority queue to hold each vertex
Comparator<Vertex<E>> comparator = new Comparator<Vertex<E>>()
{ public int compare(Vertex<E> u, Vertex<E> v)
{ return shortestPathEstimates.get(u)-
shortestPathEstimates.get(v);
}
};
PriorityQueue<Vertex<E>> queue = new PriorityQueue<Vertex<E>>
(graph.vertexSet().size(), comparator);
for (Vertex<E> vertex : graph.vertexSet())
{ leastEdges.put(vertex, null);
if (vertex!=source)
{ shortestPathEstimates.put(vertex, Integer.MAX_VALUE);
queue.add(vertex);
}
else
shortestPathEstimates.put(source, new Integer(0));
}


// create set to hold vertices whose final shortest paths known


Set<Vertex<E>> knownSPVertices = new HashSet<Vertex<E>>();
knownSPVertices.add(source);
// start building the shortest paths tree
Set<Edge<E>> spt = new HashSet<Edge<E>>();
Vertex<E> addedVertex = source;
// process each element of the queue
while (queue.size()>0)
{ // relax edges incident to addedVertex
for (Edge<E> edge : addedVertex.incidentEdges())
{ Vertex<E> v = edge.oppositeVertex(addedVertex);
int newEstimate = shortestPathEstimates.get(addedVertex)
+ weights.get(edge);
if (!knownSPVertices.contains(v) &&
newEstimate < shortestPathEstimates.get(v))
{ // find the adjacent vertex in the queue
// note an iterator is used so queue can be modified
Iterator<Vertex<E>> iterator = queue.iterator();
boolean found = false;
while (!found && iterator.hasNext())
found = (iterator.next() == v);
if (found) // should always be found
{ // remove v from queue so gets resorted after change
iterator.remove();
shortestPathEstimates.put(v, newEstimate);
queue.add(v);
leastEdges.put(v, edge);
}
}
}
// priority queue now has vertex with smallest pe at head
addedVertex = queue.poll();
knownSPVertices.add(addedVertex);
spt.add(leastEdges.get(addedVertex));
}
return spt;
}
}

Instead, the optimal substructure of shortest paths can be utilized to build
up a solution to the all-pairs shortest path problem, by progressively finding all
shortest paths of length k for k = 1, 2, . . . , n − 1. For vertices u and v let dk [u, v]
denote the least weight of all the paths from u to v that have length at most k,
or ∞ if there is no such path, and suppose that the graph does not have any
closed paths with negative weight. Then any shortest path in the graph can be
made to have length less than n (otherwise the path would include some vertex
twice and the loop could be removed), so that dn−1 [u, v] must give the weight
of the shortest path (of any length) between u and v.
Since a path of length 0 contains no edges, d0 [u, v] = 0 if u = v and d0 [u, v] =
∞ if u ≠ v. For k ≥ 1, if there are no paths from u to v of length at most k then
dk [u, v] = ∞, otherwise consider a path from u to v of length at most k that
has least weight, and suppose the last edge e in the path connects a vertex x
with the vertex v. Then the portion of the path from u to x must have weight
dk−1 [u, x] (by optimal substructure). Hence the weight of the path must be
dk−1 [u, x] + w[e] for some edge e = (x, v) and vertex x adjacent to v, giving
that:
dk [u, v] = min { dk−1 [u, x] + w[e] : e = (x, v) ∈ incident[v] }.

A simplistic approach to implementing this in an algorithm would be to build
d1 [u, v] for all vertices u and v, and then d2 [u, v], d3 [u, v], . . . , dn−1 [u, v]. Such
an algorithm would be O(mn^2) (if each vertex had m/n incident edges), which
is not an improvement over a repeated application of the algorithms for solving
the single-source shortest path problem. However, this asymptotic complexity
can be improved by noting that dk is not needed for every value of k, but only
for enough values so that dn−1 (or some dn′ for any n′ ≥ n − 1) is found. Once
some dk and dl have been found, the optimal substructure of shortest paths
gives that:

dk+l [u, v] = min { dk [u, x] + dl [x, v] : x ∈ V }.

Hence, once d1 is found, then d2 , d4 , d8 , d16 , . . . , dk are each found in turn until
k ≥ n − 1. This approach gives a Θ(n^3 log2 n) algorithm for solving the all-pairs
shortest path problem.
Note that each dk can be viewed as an n × n matrix, and the process of
finding dk+l from dk and dl is often viewed as a matrix multiplication where
instead of adding and multiplying together the entries of the matrices (weights)
the operations used are minimum and addition.
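
A minimal Java sketch of this min-plus "multiplication" is shown below, with ∞ represented by Integer.MAX_VALUE and guarded so that the addition cannot overflow; the method name extend is illustrative only.

// combines a matrix of least weights for paths of length at most k with
// one for paths of length at most l, giving the matrix for length at most k+l
static int[][] extend(int[][] dk, int[][] dl)
{ final int INFINITY = Integer.MAX_VALUE;
  int n = dk.length;
  int[][] result = new int[n][n];
  for (int i = 0; i < n; i++)
  { for (int j = 0; j < n; j++)
    { int best = INFINITY;
      for (int x = 0; x < n; x++)
      { if (dk[i][x] != INFINITY && dl[x][j] != INFINITY)
          best = Math.min(best, dk[i][x] + dl[x][j]);
      }
      result[i][j] = best;
    }
  }
  return result;
}

Repeated squaring, d2 = extend(d1, d1), d4 = extend(d2, d2), and so on until the subscript is at least n − 1, then gives the Θ(n^3 log2 n) approach described above.
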
For example, suppose the weight of the shortest paths between every pair of
vertices is to be found for the directed weighted graph used in the earlier
Bellman-Ford example. Taking the vertices in the order a, b, c, d, e, f and
forming 6 × 6 matrices dk , the values of d0 , d1 , d2 = d1 · d1 , d4 = d2 · d2 , and
d8 = d4 · d4 can be found.
d0 =
  0   ∞   ∞   ∞   ∞   ∞
  ∞   0   ∞   ∞   ∞   ∞
  ∞   ∞   0   ∞   ∞   ∞
  ∞   ∞   ∞   0   ∞   ∞
  ∞   ∞   ∞   ∞   0   ∞
  ∞   ∞   ∞   ∞   ∞   0

d1 =
  0   2  15   ∞   ∞   ∞
  ∞   0   9  11   5   ∞
  ∞  −1   0   3   6   ∞
  ∞   ∞   ∞   0   5   2
  ∞   ∞  −2   ∞   0   7
  ∞   ∞   ∞   1   ∞   0

d2 = d1 · d1 =
  0   2  11  13   7   ∞
  ∞   0   3  11   5  12
  ∞  −1   0   3   4   5
  ∞   ∞   3   0   5   2
  ∞  −3  −2   1   0   7
  ∞   ∞   ∞   1   6   0

d4 = d2 · d2 =
  0   2   5   8   7  14
  ∞   0   3   6   5   8
  ∞  −1   0   3   4   5
  ∞   2   3   0   5   2
  ∞  −3  −2   1   0   3
  ∞   3   4   1   6   0

d8 = d4 · d4 =
  0   2   5   8   7  10
  ∞   0   3   6   5   8
  ∞  −1   0   3   4   5
  ∞   2   3   0   5   2
  ∞  −3  −2   1   0   3
  ∞   3   4   1   6   0

The entries of the matrix d8 then give the weights of the shortest paths between
any pair of vertices. For example, since the 2, 6-entry is 8 the shortest path from
vertex b to vertex f must have weight 8.
The all-pairs shortest path problem can actually be solved in Θ(n^3) using
another characterization of shortest path, where the choice of intermediate (non-
endpoint) vertices that are in the path is considered rather than the number
of edges in the path. Let v0 , v1 , . . . , vn−1 denote the n vertices of the graph,
which is presumed again not to have any closed paths with negative weight.
Suppose u and v are vertices and that some shortest path from u to v only uses
intermediate vertices from v0 , v1 , . . . , vk−1 (so apart from the end points u and
v no other vertex in the path includes any of vk , vk+1 , . . . vn−1 ). Since the graph
does not contain any closed paths with negative weights the shortest path can
be presumed to not use any vertex twice. There are two possibilities to consider,
either the path includes vertex vk−1 once or else it doesn’t include vk−1 as an
intermediate vertex at all. In the case where vk−1 does appear in the path, the
portion of the path up to vk−1 must be a shortest path from u to vk−1 and
the portion after vk−1 must be a shortest path from vk−1 to v. Furthermore,
neither of these portions contain vk−1 as an intermediate vertex (since vk−1 only
appeared once in the shortest path from u to v). This shows that the all-pairs
shortest path problem has another type of optimal substructure.
The Floyd-Warshall algorithm is a Θ(n^3) dynamic programming solution
for the all-pairs shortest path problem. It iteratively finds the smallest possible

weights for paths between pairs of vertices that only use intermediate vertices
from v0 , v1 , . . . , vk−1 , starting with k = 0, then for k = 1, 2, . . . , n. When k = n
there is no restriction made on the intermediate vertices so the problem is solved.
For vertices u and v let dk [u, v] denote the smallest weight for paths from
u to v that only use intermediate vertices from v0 , v1 , . . . , vk−1 , and let pk [u, v]
denote the vertex x (the last intermediate vertex) before v on that smallest
path from u to v. For k = 0, if u and v are adjacent with least weight edge
e incident to them then d0 [u, v] = w[e] and p0 [u, v] = u, whereas if u and
v are not adjacent then d0 [u, v] = ∞ and p0 [u, v] is taken to be null. For
k > 0, if there is no path from u to v that uses intermediate vertices from
v0 , v1 , . . . , vk−1 then dk [u, v] = ∞ and pk [u, v] = null (the same as dk−1 [u, v]
and pk−1 [u, v]). If instead there is a path from u to v using only intermediate
vertices from v0 , v1 , . . . , vk−1 then the optimal substructure gives that either
dk [u, v] = dk−1 [u, v] and pk [u, v] = pk−1 [u, v] (if the least weight such path
doesn’t include vk−1 ) or else dk [u, v] = dk−1 [u, vk−1 ] + dk−1 [vk−1 , v] (the sum of
weights of two shortest paths, neither of which includes vk−1 as an intermediate
vertex) and pk [u, v] = pk−1 [vk−1 , v]. Thus for k > 0:

dk [u, v] = min (dk−1 [u, v], dk−1 [u, vk−1 ] + dk−1 [vk−1 , v]) .

Floyd-Warshall(w)
1  solves all-pairs shortest paths problem for graph with weights w
2 n ← rows[w]  graph has vertices v0 , v1 , . . . , vn−1
3 for i ← 0 to n − 1 do
4 for j ← 0 to n − 1 do
5 d0 [i, j] ← w[i, j]  weight of edge from vi to vj
6 if there is an edge from vi to vj then
7 p0 [i, j] ← i
8 else
9 p0 [i, j] ← null
10 for k ← 1 to n do  consider paths with intermed. from v0 , v1 , . . . , vn−1
11 for i ← 0 to n − 1 do  consider paths from vi
12 for j ← 0 to n − 1 do  consider paths to vj
13  find least weight such path
14 s ← dk−1 [i, k − 1] + dk−1 [k − 1, j]  via vk−1
15 if dk−1 [i, j] ≤ s then
16 dk [i, j] ← dk−1 [i, j]  least not via vk−1
17 pk [i, j] ← pk−1 [i, j]
18 else  least is via vk−1
19 dk [i, j] ← s
20 pk [i, j] ← pk−1 [k − 1, j]
21 return dn and pn

The class AllPairsFloydWarshall demonstrates the Floyd-Warshall algorithm
applied to the graph from the previous example. It results in the matrices:
d6 =
  0   2   5   8   7  10
  ∞   0   3   6   5   8
  ∞  −1   0   3   4   5
  ∞   2   3   0   5   2
  ∞  −3  −2   1   0   3
  ∞   3   4   1   6   0

p6 =
  0     0     4     2     1     3
  null  1     4     2     1     3
  null  2     2     2     1     3
  null  2     4     3     3     3
  null  2     4     2     4     3
  null  2     4     5     3     5

Checking an entry such as the 2, 6-entry of p6 gives that the vertex preceding
f on the shortest path from b to f is v3 = d. Then checking the 2, 4-entry
gives the vertex preceding d is v2 = c, checking the 2, 3-entry gives the vertex
preceding c is v4 = e, and checking the 2, 5-entry gives the vertex preceding e
is v1 = b. Hence the shortest path from b to f is via e, c, and then d.
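
This tracing of the p matrix can also be done in code. The following illustrative fragment (not a method of the AllPairsFloydWarshall class as listed) reconstructs the vertices on a shortest path from the penultimate-vertex matrix, assuming that −1 (NO_VERTEX) marks "no path" and that p[i][i] = i:

// returns the vertex indices on a shortest path from i to j, or null
// if there is no such path, by repeatedly stepping back to the
// penultimate vertex recorded in p
static java.util.List<Integer> getPath(int[][] p, int i, int j)
{ final int NO_VERTEX = -1;
  if (p[i][j] == NO_VERTEX)
    return null;
  java.util.LinkedList<Integer> path = new java.util.LinkedList<Integer>();
  int current = j;
  while (current != i)
  { path.addFirst(current);
    current = p[i][current]; // vertex preceding current on the path
  }
  path.addFirst(i);
  return path;
}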

Exercise 6.4 (Transitive Closure of a Graph) The transitive closure of a
graph G = (V, E) is the graph G∗ = (V, E ∗ ) with the same vertices as G, but
which has an edge from a vertex u to a vertex v if and only if there is a path in
G from u to v.
One approach to finding the transitive closure of a graph is to apply the
Floyd-Warshall algorithm to the graph (taking weights 1 if the graph is not
weighted) to determine which vertices in G are connected by a path. A more
memory-efficient alternative is to adapt the Floyd-Warshall algorithm to find
boolean values tk [u, v] which are true if there is a path from u to v that uses
intermediate vertices from v0 , v1 , . . . , vk−1 . Then tn [u, v] gives whether or not
there is a path from vertex u to vertex v.
Implement the following dynamic programming algorithm that finds the tran-
sitive closure of a graph:

Transitive-Closure(G)
1  determines the transitive closure of the graph G
2 n ← |V [G]|
3 for i ← 0 to n − 1 do
4 for j ← 0 to n − 1 do
5 if i = j or there is an edge from vi to vj then
6 t0 [i, j] ← true
7 else
8 t0 [i, j] ← false
9 for k ← 1 to n do  consider paths with intermed. from v0 , v1 , . . . , vn−1
10 for i ← 0 to n − 1 do  consider paths from vi
11 for j ← 0 to n − 1 do  consider paths to vj
12 tk [i][j] ← tk−1 [i][j] | (tk−1 [i][k−1] & tk−1 [k−1][j])
13 return tn
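
One possible shape of such an implementation is sketched below; this is a minimal version that works directly on a boolean adjacency matrix rather than the GraphADT classes, and keeps only the previous layer tk−1 at each step.

// returns t[i][j] = true exactly when there is a path from vertex i to
// vertex j (or i = j), given the adjacency matrix of the graph
static boolean[][] transitiveClosure(boolean[][] adjacent)
{ int n = adjacent.length;
  boolean[][] t = new boolean[n][n]; // current layer, initially t_0
  for (int i = 0; i < n; i++)
  { for (int j = 0; j < n; j++)
      t[i][j] = (i == j) || adjacent[i][j];
  }
  for (int k = 1; k <= n; k++) // allow v_0, ..., v_{k-1} as intermediates
  { boolean[][] next = new boolean[n][n];
    for (int i = 0; i < n; i++)
    { for (int j = 0; j < n; j++)
        next[i][j] = t[i][j] || (t[i][k-1] && t[k-1][j]);
    }
    t = next;
  }
  return t;
}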

6.5 Maximum Flow


Reading: pp643-663
A flow network is a directed graph G for which every edge e has a capacity
(weight) c(e) ≥ 0, together with two distinguished vertices, the source vertex s
and the sink vertex t, for which every vertex v in the graph lies on some path
from s to t.
Flow networks are often used to model the flow of a fluid through pipes, parts
moving through interconnected assembly lines, current through an electric
circuit, deliveries in a transportation network, or the transmission of
information across a computer network. The directed weighted graph used in
the Dijkstra example could be considered a flow network (with the edge weights
interpreted as capacities), with source a and sink f.

/**
A class that demonstrates the Floyd-Warshall algorithm for solving
the all-pairs shortest paths problem in O(n^3)
*/
public class AllPairsFloydWarshall
{
private static final int INFINITY = Integer.MAX_VALUE;
private static final int NO_VERTEX = -1;
private int n; // number of vertices in the graph
private int[][][] d; //d[k][i][j] is weight of path from v_i to v_j
private int[][][] p; //p[k][i][j] is penultimate vertex in path

public AllPairsFloydWarshall(int[][] weights)


{ n = weights.length;
d = new int[n+1][][];
d[0] = weights;
// create p[0]
p = new int[n+1][][];
p[0] = new int[n][n];
for (int i=0; i<n; i++)
{ for (int j=0; j<n; j++)
{ if (weights[i][j]<INFINITY)
p[0][i][j] = i;
else
p[0][i][j] = NO_VERTEX;
}
}
// build d[1],...,d[n] and p[1],...,p[n] dynamically
for (int k=1; k<=n; k++)
{ d[k] = new int[n][n];
p[k] = new int[n][n];
for (int i=0; i<n; i++)
{ for (int j=0; j<n; j++)
{ int s;
if (d[k-1][i][k-1]!=INFINITY&&d[k-1][k-1][j]!=INFINITY)
s = d[k-1][i][k-1] + d[k-1][k-1][j];
else
s = INFINITY;
if (d[k-1][i][j] <= s)
{ d[k][i][j] = d[k-1][i][j];
p[k][i][j] = p[k-1][i][j];
}
else
{ d[k][i][j] = s;
p[k][i][j] = p[k-1][k-1][j];
}
}
}
}
}
}

A flow f in a network G is a function that assigns a value f (e) to each edge
in the network for which:
• 0 ≤ f (e) ≤ c(e) (the capacity rule), so that the flow through an edge
cannot exceed the capacity of the edge,
• for every vertex v ∈ V − {s, t}, the sum of the flows for edges incident
to v is the same as the sum of the flows for edges incident from v (the
conservation rule), so that apart from the source and sink, there is the
same amount of flow entering a vertex as leaves it.
The flow in to a vertex v is the sum of the flows for edges incident to v, and the
flow out from v is the sum of the flows for edges incident from v. The value of
the flow |f | is the total flow out from the source. Since all other vertices apart
from the sink conserve the amount of flow (the flow in is the same as the flow
out), |f | must also be the total flow in to the sink. A flow might represent the
flux (volume per unit time) of a fluid moving through pipes, the rate at which
parts move through an assembly line, the amount of current flowing through
the wires in a circuit, the number of deliveries along a route each day, or the
transmission rate of data in a computer network.
Usually the various edges from a vertex u to a vertex v are not distinguished
in a flow network, and c(u, v) is used to denote the total capacity from u to v
(the sum of the capacities for each edge from u to v), with c(u, v) = 0 if u and
v are not adjacent. The net flow f (u, v) is the sum of the flows for each edge
from u to v minus the flows for edges from v to u (the flows in the opposite
direction), so that f (v, u) = −f (u, v) for a flow f .
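
As a small illustration of these definitions, the following sketch checks the capacity and conservation rules and computes the value |f| for a flow given as per-edge values in square int arrays indexed by vertex (the same array representation the MaxFlowEdmondsKarp class later in this section uses for capacities); the method names are illustrative only.

// checks the capacity rule and the conservation rule for a flow given
// as per-edge values flow[u][v] against the capacities capacity[u][v]
static boolean isValidFlow(int[][] capacity, int[][] flow,
   int source, int sink)
{ int n = capacity.length;
  for (int u = 0; u < n; u++)
  { for (int v = 0; v < n; v++)
    { if (flow[u][v] < 0 || flow[u][v] > capacity[u][v])
        return false; // capacity rule violated on edge (u, v)
    }
  }
  for (int v = 0; v < n; v++)
  { if (v == source || v == sink)
      continue; // source and sink need not conserve flow
    int flowIn = 0, flowOut = 0;
    for (int u = 0; u < n; u++)
    { flowIn += flow[u][v];
      flowOut += flow[v][u];
    }
    if (flowIn != flowOut)
      return false; // conservation rule violated at vertex v
  }
  return true;
}

// the value |f| of a flow is the total flow out from the source
static int flowValue(int[][] flow, int source)
{ int value = 0;
  for (int v = 0; v < flow.length; v++)
    value += flow[source][v];
  return value;
}
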
For example, consider the flow f in the earlier network with f (a, b) = 2,
f (a, c) = 5, f (b, d) = 2, f (c, d) = 3, f (c, e) = 2, f (d, e) = 4, f (d, f ) = 1,
f (e, f ) = 6 (in the figure the notation 5/15 is used for the edge (a, c) to denote
the flow f (a, c) = 5 and capacity c(a, c) = 15, and only positive flows are
shown). The value of this flow is |f | = 7.

[Figure: the example flow network labelled with flow/capacity on each edge for this flow of value 7]
A different flow would be f (a, b) = 2, f (a, c) = 7, f (b, d) = 1, f (b, e) = 2,
f (c, b) = 1, f (c, d) = 3, f (c, e) = 3, f (d, e) = 2, f (d, f ) = 2, f (e, f ) = 7.
Note that the value of this flow is |f | = 9, which is larger than the previous
flow in the same network. Moreover, since the value of this flow is the same as
the total capacity of edges (d, f ) and (e, f ), the flow must be maximum for the
network.

[Figure: the example flow network labelled with flow/capacity on each edge for this maximum flow of value 9]
The maximum-flow problem is the problem of finding a flow of maximum
value in a network, so that the network is used most effectively. In the case
where a network might have several sources and/or sinks the problem can be
converted to a problem with just a single source and single sink by adding a
supersource vertex and a supersink vertex, and edges with infinite capacity from
the supersource to each source and from each sink to the supersink.
Given a flow f the residual capacity cf from a vertex u to a vertex v is
given by cf (u, v) = c(u, v) − f (u, v), the amount of additional flow that can be
accommodated by the edge (u, v). Note that cf (u, v) < c(u, v) if f (u, v) > 0,
whereas cf (u, v) > c(u, v) if the flow is in the opposite direction. This gives

a residual network whose vertices are the same as those of the original flow
network, but whose edges are the pairs (u, v) for which cf (u, v) > 0.
For example, the first flow above (with value 7) has the illustrated residual
network. Note that the edge (a, c) has residual capacity cf (a, c) = 10 since the
flow used f (a, c) = 5 units of the original capacity c(a, c) = 15, and the edge
(c, a) has residual capacity cf (c, a) = 5, since although the original network did
not contain an edge (c, a) up to 5 units of existing flow from a to c can be
cancelled.

[Figure: the residual network for the flow of value 7, labelled with the residual capacities cf]
Residual capacities are useful for augmenting the value of a flow f , since if a
flow g can be found in the residual network then the combined flow f + g gives a
flow with value |f | + |g|. An augmenting path is a path in the residual network
from the source to the sink. Such a path can be used to obtain a flow g in the
residual network by taking the minimum of the capacities along each edge of
the path.
For example, one augmenting path in the previous residual network starts at
the source and passes through c and e, ending at the sink. The capacities of
the edges are all at least 1 unit, so the residual network has sufficient capacity
for a flow g where g(a, c) = 1, g(c, e) = 1, g(e, f ) = 1. This results in a flow
f + g with larger value 7 + 1 = 8.
Since the value of a flow increases whenever it is augmented the residual
network for the flow must eventually not contain any further augmenting paths
(so long as it has only integer flow values). The Ford-Fulkerson method uses this
fact to solve the maximum-flow problem by repeatedly combining flows obtained
from any augmenting path until no further augmenting paths are possible. To
understand why this method results in a flow with maximum value it is useful
to consider a cut of the network into two disjoint sets of vertices S and V − S,
where the source is in S and the sink is in V − S on the opposite side of the
cut. The capacity of the cut is taken to be the sum of all c(u, v) where u ∈ S
and v ∈ V − S. It is not difficult to show that the value of a flow |f | cannot
exceed the capacity of any cut of the network (see for example, Page 656 of the
textbook). The following result ensures that when the Ford-Fulkerson method
terminates the flow will be maximum, and its value will be the same as the
capacity of some minimum cut (this is justified on Page 657 of the textbook).
Max-Flow Min-Cut Theorem: Suppose f is a flow in a flow network. Then
the following are equivalent:

• f is a maximum flow,

• there are no augmenting paths in the residual network for f ,

• |f | is equal to the capacity of some cut in the flow network.

For a network with m edges, n vertices, and maximum flow value |f ∗ |, if
augmenting paths are chosen in any order to augment the flow then the Ford-
Fulkerson method is O (m |f ∗ |). This complexity can be improved in practice

by using a greedy technique, always choosing an augmenting path that uses the
least number of edges in the residual network. When augmenting paths are
chosen in this order the Ford-Fulkerson method is known as the Edmonds-Karp
algorithm, and is O(m^2 n). One simple strategy for finding an augmenting path
with the least number of edges is to use a breadth-first search in the residual
network for the sink starting at the source.
The class MaxFlowEdmondsKarp uses the Edmonds-Karp algorithm to solve
the maximum flow problem for the example network above. It starts with a
flow with value 0, and augments it by a flow with value 2 using an augmenting
path through the vertices a, b, d, f . Next, it augments by a flow with value
6 through the vertices a, c, e, f . Then it augments by a flow with value 1
through the vertices a, c, b, e, f (note the longer path). Since there are no
further augmenting paths available in the residual network the flow must now
have maximum value 2 + 6 + 1 = 9.
[Figure: the flow network and the successive residual networks as the Edmonds-Karp algorithm augments the flow]

Exercise 6.5 (Finding the Maximum Flow) Apply the Edmonds-Karp al-
gorithm to find the maximum flow for some sample flow networks and use the
class MaxFlowEdmondsKarp to check your answers.

6.6 Network Routing


Reading: none
A routing algorithm is an algorithm that specifies how information packets
are moved around between various computers in a computer network. A good
routing algorithm should ensure that the packets are routed through the network
by routers to their destinations quickly and reliably, while still being fair to
other packets in the network. Routing algorithms are graph algorithms where
the routers are the vertices of the graph and the physical connections between
adjacent routers are the edges. Routing algorithms can be divided into three
types:

• broadcast routing where a packet is sent to every router on the network,

• unicast routing where a packet is sent to a specific router on the network,

• multicast routing where a packet is sent to each router in a specified group.



/**
A class that solves the maximum-flow problem by the Edmonds-Karp
algorithm for a network whose capacities are specified by a square
array (presumed to hold non-negative values)
@author Andrew Ensor
*/
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxFlowEdmondsKarp


{
private int[][] capacities;
private int n; // number of vertices in network
private int source, sink;
private int[][] flow; // holds the maximum flow

public MaxFlowEdmondsKarp(int[][] capacities, int source, int sink)


{ this.capacities = capacities;
this.n = capacities.length;
this.source = source;
this.sink = sink;
// create an initial flow with value 0
flow = new int[n][n];
// create the residual network
int[][] residualNetwork = new int[n][n];
for (int i=0; i<n; i++)
{ if (capacities[i].length < n)
throw new IllegalArgumentException
("Capacity must not be a ragged array");
for (int j=0; j<n; j++)
residualNetwork[i][j] = capacities[i][j];
}
boolean maxFlow = false; // maximum flow not yet found
while (!maxFlow)
{ // use a breadth first search to find next augmenting path
int[] pathVertices = getShortestPath(residualNetwork,
source, sink);
if (pathVertices == null) // no more augmenting paths
maxFlow = true;
else
{
// find greatest possible flow for this augmenting path
int augmentCapacity
= residualNetwork[source][pathVertices[1]];
for (int i=2; i<pathVertices.length; i++)
{ augmentCapacity = Math.min(augmentCapacity,
residualNetwork[pathVertices[i-1]][pathVertices[i]]);
}

cont-
6.6. NETWORK ROUTING 141

-cont

// update the flow and the residual network


for (int i=1; i<pathVertices.length; i++)
{ int start = pathVertices[i-1];
int end = pathVertices[i];
flow[start][end] += augmentCapacity;
flow[end][start] -= augmentCapacity;
residualNetwork[start][end] -= augmentCapacity;
residualNetwork[end][start] += augmentCapacity;
}
}
}
}

// helper method that uses a breadth-first search to find shortest
// length path in residual network from source to sink
private int[] getShortestPath(int[][] network, int source, int sink)
{ int n = network.length;
Map<Integer, Integer> parentVertices
= new HashMap<Integer, Integer>(); //holds parents for search
List<Integer> visitedVertices = new ArrayList<Integer>();
QueueADT<Integer> processingQueue = new LinkedQueue<Integer>();
// handle the source vertex
parentVertices.put(source, null); // root has no parent
visitedVertices.add(source);
processingQueue.enqueue(source);
boolean sinkFound = false;
while (!sinkFound && !processingQueue.isEmpty())
{ int frontVertex = processingQueue.dequeue();
// find all adjacent vertices that have not been visited and
// enqueue them
for (int i=0; i<n; i++)
{ if (network[frontVertex][i]>0 &&
!visitedVertices.contains(i))
{ // visit the vertex i
parentVertices.put(i, frontVertex);
visitedVertices.add(i);
processingQueue.enqueue(i);
if (i == sink)
sinkFound = true;
}
}
}


if (sinkFound)
{ // determine the path that was found from parentVertices
List<Integer> reversePath = new ArrayList<Integer>();
int currentVertex = sink;
while (currentVertex != source)
{ reversePath.add(currentVertex);
currentVertex = parentVertices.get(currentVertex);
}
reversePath.add(source);
// transfer the vertices in path to an int[] array
int numVertices = reversePath.size();
int[] path = new int[numVertices];
for (int i=0; i<numVertices; i++)
{ path[i] = reversePath.get(numVertices-1-i);
}
return path;
}
else
return null; // no path found
}

// returns a string representation of flow and its value


public String toString()
{ String output = "Maximum Flow\n";
for (int i=0; i<n; i++)
{ for (int j=0; j<n; j++)
output += "\t" + flow[i][j];
output += "\n";
}
// calculate the value of the flow
int value = 0;
for (int i=0; i<n; i++)
value += flow[source][i];
output += "Value is " + value + "\n";
return output;
}

public static void main(String[] args)


{ // test the flow network given in the manual
int[][] capacities = {
{0, 2, 15, 0, 0, 0}, // capacity from node a
{0, 0, 9, 11, 5, 0}, // capacity from node b
{0, 1, 0, 3, 6, 0}, // capacity from node c
{0, 0, 0, 0, 5, 2}, // capacity from node d
{0, 0, 2, 0, 0, 7}, // capacity from node e
{0, 0, 0, 1, 0, 0}}; // capacity from node f
MaxFlowEdmondsKarp maxFlow
= new MaxFlowEdmondsKarp(capacities, 0, 5);
System.out.println(maxFlow);
}
}

The simplest algorithm for broadcast routing is the flooding algorithm. When
a router wants to send a packet to all the other routers on the network it simply
sends it to all its adjacent routers which in turn send the packet to all their
other adjacent routers. Of course most networks contain closed paths so the
flooding algorithm must be modified to avoid infinite loops where a packet is
forever transmitted around the path.
One strategy for this is to include a positive hop counter with each packet.
Every time a router receives a packet it decrements the hop counter and only
forwards the packet if the counter is still positive. If the diameter of the network
(the largest number of edges in a shortest path between any two vertices in the
graph) is known then starting the hop counter equal to the diameter ensures
that the packet will reach every router in the network.
An alternative strategy for the flooding algorithm is for the source router of
the packet to assign its own unique sequence number to the packet, and each
router to maintain a hash table of sequence numbers it has already received
from all the other routers in the network. When a router receives a packet it
checks the packet’s source and sequence number with its hash table and only
forwards the packet if it has not previously received the packet. To avoid the
potentially large space requirements for each hash table, often each router only
stores the most recent sequence number for each other router in the network
(presuming that if it receives a packet with a certain sequence number from the
source then it has probably already processed all the previous packets from that
source).
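
A rough sketch of this bookkeeping, combining the hop counter and sequence number strategies just described, is shown below; the FloodingFilter class and its method are purely illustrative and are not part of the UnicastRouter classes later in this section.

import java.util.HashMap;
import java.util.Map;

// decides whether a flooded packet should be forwarded: it must still have
// hops remaining and carry a sequence number newer than the most recent
// one seen from its source router
public class FloodingFilter
{
   private Map<String,Integer> lastSequenceSeen
      = new HashMap<String,Integer>();

   public boolean shouldForward(String sourceId, int sequenceNumber,
      int hopsRemaining)
   { if (hopsRemaining <= 0)
       return false; // hop counter exhausted
     Integer last = lastSequenceSeen.get(sourceId);
     if (last != null && sequenceNumber <= last)
       return false; // already seen this packet (or a later one)
     lastSequenceSeen.put(sourceId, sequenceNumber);
     return true; // forward to all other adjacent routers with the
                  // hop counter decremented
   }
}
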
One advantage of the flooding algorithm is that it requires no setup costs
for the network: each router need only be aware of its adjacent routers, and so
is suitable when the topology of the network (the configuration of routers and
connections) changes frequently. The flooding algorithm ensures that a packet
is sent to all the other routers in the shortest possible time but it does so at the
expense of the load on the network (involving many redundant packets being
transmitted).
For unicast routing the flooding algorithm results in too high a network load
so instead unicast routing algorithms make use of the topology of the network,
treating it as a weighted graph. Each weight represents the cost of sending a
packet through that edge, such as time delay in the transmission. A router
can estimate the time delay for an edge by sending an ECHO packet along the
connection to the adjacent router and timing the round-trip delay for the packet
to return.
The distance vector algorithm is an adaptation of the Bellman-Ford algorithm
from Section 6.3 for a distributed network. Each router stores a distance
vector (routing table), giving a shortest-path estimate d[v] for each router in
the network (identified by its IP address), and the adjacent edge leastEdge[v]
for that shortest-path estimate. Whenever a router receives a packet intended
for some destination router it checks the destination v with its current distance
vector and forwards the packet along the edge leastEdge[v] (the flooding algo-
rithm would instead forward the packet along every adjacent edge). Each router
x periodically sends its current distance vector to its adjacent routers. Each of
these routers compares that distance vector dx with its own and relaxes any of
its own shortest-path estimates d[v] if a path to v via the adjacent router would
give a shorter estimate than its own. Over a period of time each router builds
a more accurate distance vector for routing packets through the network,
progressively finding shorter paths to destinations.

Initialize-Distance-Vector()
1  determine the initial shortest path estimates from this vertex (router) u
2 for each known vertex (router) v do
3 if v ∈ adjacent[u] then
4 find estimated time w for packet along edge (u, v)
5 d[v] ← w
6 leastEdge[v] ← (u, v)
7 else  no shortest path estimate yet for path to v
8 d[v] ← ∞
9 leastEdge[v] ← null
10 d[u] ← 0
11 leastEdge[u] ← null

Relax-Distance-Vector(dx )
1  relax distance vector d of this vertex u given distance vector dx of
2  adjacent vertex x
3 e ← (u, x)
4 for each known vertex (router) v do
5 if w[e] + dx [v] < d[v] then
6 d[v] ← w[e] + dx [v]
7 leastEdge[v] ← e

When there is a change in the topology of the network, that information is
gradually propagated through the network.
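
A simplified sketch of the Relax-Distance-Vector step is shown below, with the distance vector held as a map from router addresses to estimated costs (similar in spirit to, but not the same as, the RouterDistanceVector class listed later in this section); the names here are illustrative only.

import java.net.InetAddress;
import java.util.Map;

// relax this router's distance vector d (and next-hop table nextHop) using
// the distance vector dx received from the adjacent router x, where
// edgeWeight is the estimated cost of the edge to that adjacent router
static void relaxDistanceVector(InetAddress x, double edgeWeight,
   Map<InetAddress,Double> dx, Map<InetAddress,Double> d,
   Map<InetAddress,InetAddress> nextHop)
{ for (Map.Entry<InetAddress,Double> entry : dx.entrySet())
  { InetAddress v = entry.getKey();
    double viaX = edgeWeight + entry.getValue();
    Double current = d.get(v);
    if (current == null || viaX < current)
    { d.put(v, viaX);    // shorter path to v found via x
      nextHop.put(v, x); // so packets for v are forwarded to x
    }
  }
}
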
The class UnicastRouter demonstrates an implementation of a router that
uses a RouterDistanceVector to hold its current routing table and a Router-
Connection to manage each TCP socket connection with an adjacent router in
the network. When the router receives a UnicastPacket intended for another
router in the network it uses its current distance vector to determine the most
appropriate adjacent router in the network to which to forward the packet. A
packet is either a UnicastMessagePacket string message, a UnicastConnect-
Packet with connection information passed when a new connection is made
in the network, a UnicastUpdatePacket holding a distance vector from an-
other router which is used to relax the router’s own distance vector, or a
UnicastDisconnectPacket disconnection notification.
A widely-used alternative to the distance vector algorithm is the link state
algorithm for unicast routing. This algorithm starts with each router estimating
the weight of each of its adjacent edges and using a flooding algorithm to pass
that information to all the other routers in the network. After this broadcast
phase each router knows the entire network and so can each use Dijkstra’s
algorithm from Section 6.3 to determine the shortest path from it to every other
router in the network. Although this requires more communication across the
network initially, once the broadcast phase is completed the network is optimized
until its topology changes.
If the network is very large then routing tables become unwieldy, too big for

/**
A class that represents a router for performing unicast routing
across a network built from unicast routers
@author Andrew Ensor
*/
...
public class UnicastRouter implements Runnable
{
private int routerPort; // port on local machine used by server
private InetAddress routerAddress; // IP address on local machine
private ServerSocket serverSocket;
private boolean stopRequested;
private RouterDistanceVector distanceVector;
private QueueADT<UnicastPacket> packetQueue; // packets to process
private Thread processingThread; // handles the queue processing
private Map<InetAddress, RouterConnection> adjacentRouters; //synch
private RouterDisplay routerDisplay; // display for router output

public UnicastRouter(int routerPort)


{ this.routerPort = routerPort;
// create the server socket
serverSocket = null;
try
{ routerAddress = InetAddress.getLocalHost();
serverSocket = new ServerSocket(routerPort);
serverSocket.setSoTimeout(200); // timeout for socket accept
}
catch (UnknownHostException e)
{ displayError("Can’t get local host: " + e);
System.exit(-1);
}
catch (IOException e)
{ displayError("Server can’t listen on port: " + e);
System.exit(-1);
}
stopRequested = false;
// create the distance vector, packet queue, and adjacent routers
distanceVector = new RouterDistanceVector(routerAddress);
packetQueue = new LinkedQueue<UnicastPacket>();
processingThread = new Thread(new PacketProcessor());
adjacentRouters
= new ConcurrentHashMap<InetAddress, RouterConnection>();
}

// continually listens for connection requests from new adjacent
// routers
public void run()
{ displayText("Router started at " + routerAddress +
" on port " + routerPort);
displayAdjacentRouters();
displayNetwork();
processingThread.start();


while (!stopRequested)
{ try
{ // block until the next client requests a connection
// or the server timeout is reached
Socket socket = serverSocket.accept();
InetAddress newRouter = socket.getInetAddress();
displayText("Connection made with " + newRouter);
RouterConnection connection
= new RouterConnection(this, socket);
// add the connection to the adjacentRouter map
adjacentRouters.put(newRouter, connection);
displayText("Connection made with " + newRouter);
displayAdjacentRouters();
// note that newRouter is not added to distanceVector
// until UnicastConnectPacket received from newRouter
// since we don’t yet know the distance from newRouter
Thread.sleep(50); // give other threads a chance
}
catch (SocketTimeoutException e)
{} // ignore the timeout and pass around while loop again
catch (InterruptedException e)
{} // ignore the interruption
catch (IOException e)
{ displayError("Can’t accept client connection: "+e);
stopRequested = true;
}
}
displayText("Router finishing");
try
{ serverSocket.close();
}
catch (IOException e)
{ displayError("Can’t close server: " + e);
}
}

// inner class that processes the queue of packets


private class PacketProcessor implements Runnable
{
// repeatedly tries to process the elements in the queue
// note that this method continues even once stopRequested
// until queue is empty
public void run()
{ while (!(stopRequested && packetQueue.isEmpty()))
processQueue();
}


// wait until there is a packet at front of queue


public void processQueue()
{ UnicastPacket packet = null;
synchronized(packetQueue) // code block synchronized on queue
{ while (!stopRequested && packetQueue.isEmpty())
{ try
{ //wait for notification that queue not empty
packetQueue.wait();
Thread.sleep(50); // give other threads a chance
}
catch (InterruptedException e)
{} // ignore interruption
}
if (!packetQueue.isEmpty())
packet = packetQueue.dequeue();
}
if (packet != null)
process(packet);
}

private void process(UnicastPacket packet)


{ InetAddress source = packet.getSource();
InetAddress destination = packet.getDestination();
// check whether packet should just be forwarded
if (!destination.equals(routerAddress))
{ // forward packet via shortest distance router
InetAddress nextRouter
= distanceVector.getNextRouterToDestination(destination);
RouterConnection connection = null;
if (nextRouter != null)
connection = adjacentRouters.get(nextRouter);
if (connection == null)
{ // send back an undeliverable message
UnicastMessagePacket returnPacket
= new UnicastMessagePacket(routerAddress, source,
"Unable to deliver packet to " + destination);
processPacket(returnPacket);
}
else
{ connection.forward(packet);
displayText("Forwarding packet from " + source + " to "
+ destination);
}
}
// else this router is the intended destination of the packet
else if (packet instanceof UnicastUpdatePacket)
{ distanceVector.relaxDistanceVector(
((UnicastUpdatePacket)packet).getDistanceVector());
displayText("Update received from " + source);
displayNetwork();
}
...
}
...
}

/**
A class that represents a distance vector held by a unicast router
Note that the maps used in this class have been synchronized
and so are suitable for concurrent thread usage
@see UnicastRouter.java
*/
...
public class RouterDistanceVector implements Serializable
{
private InetAddress source;
private Map<InetAddress,Double> distanceVector; //shortest distance
private Map<InetAddress,InetAddress> pathVector;//shortest direction

public RouterDistanceVector(InetAddress source)


{ this.source = source;
distanceVector = new LinkedHashMap<InetAddress, Double>();
pathVector = new LinkedHashMap<InetAddress, InetAddress>();
distanceVector.put(source, 0.0);
pathVector.put(source, source); // path of length 0
}

// adds or updates a router as an adjacent router in the network


// unless it already has a shorter path
public synchronized void addAdjacentRouter(InetAddress router,
double distance)
{ Double shortestDistance = distanceVector.get(router);
if (shortestDistance == null ||
shortestDistance.doubleValue() > distance)
{ distanceVector.put(router, distance);
pathVector.put(router, router);
}
}

// handles updating distance vector when an adjacent router


// has been disconnected
public synchronized void removeRouter(InetAddress router)
{ distanceVector.remove(router);
pathVector.remove(router);
// collect any network paths that are via this router first, since
// removing entries while iterating over keySet() would throw a
// ConcurrentModificationException (requires java.util.ArrayList/List)
List<InetAddress> unreachable = new ArrayList<InetAddress>();
for (InetAddress destinationRouter : distanceVector.keySet())
{ InetAddress shortestRouter=pathVector.get(destinationRouter);
if (shortestRouter.equals(router))
unreachable.add(destinationRouter);
}
for (InetAddress destinationRouter : unreachable)
{ distanceVector.remove(destinationRouter);
pathVector.remove(destinationRouter);
}
}


// returns the distance to the given destination or


// Double.POSITIVE_INFINITY if not known
public synchronized double getDistanceToDestination
(InetAddress destination)
{ Double distance = distanceVector.get(destination);
if (distance == null) // no known path to the destination
return Double.POSITIVE_INFINITY;
else
return distance.doubleValue();
}

// returns next router in shortest path to destination or null if


// the destination is unknown
public synchronized InetAddress getNextRouterToDestination
(InetAddress destination)
{ return pathVector.get(destination);
}

// relaxes the edges in this distance vector based on those in


// the parameter
public synchronized void relaxDistanceVector
(RouterDistanceVector rdv)
{ // find distance of shortest path to the other router
Double distanceToRouter = distanceVector.get(rdv.source);
if (distanceToRouter != null)
{ // check every path of the other router
for (InetAddress destinationRouter :
rdv.distanceVector.keySet())
{ // check whether to relax distance to router
double distanceViaRouter = distanceToRouter.doubleValue()
+ rdv.distanceVector.get(destinationRouter);
Double originalDistance
= distanceVector.get(destinationRouter);
if (originalDistance == null ||
distanceViaRouter < originalDistance.doubleValue())
{ // relax distance to router
distanceVector.put(destinationRouter,
distanceViaRouter);
pathVector.put(destinationRouter,
pathVector.get(rdv.source));
}
}
}
// else no path known from source to rdv.source
}

// returns an unmodifiable set of all the known routers on network


public synchronized Set<InetAddress> getAllRouters()
{ return Collections.unmodifiableSet(distanceVector.keySet());
}
}

each individual router to store. Instead, the routers are arranged into domain
hierarchies, and routers only use routing tables for destinations in the same
domain. Packets destined for another domain are sent to some known router
that handles that domain. This compromise means that packets might not take
a shortest path across the network, although they still do within each domain.
The reverse path forwarding algorithm is an adaptation of the flooding al-
gorithm for multicast routing, that takes advantage of the routing tables built
for unicast routing. The router that is the source of a packet for multicasting
to a group of routers starts by sending the packet to all its adjacent routers.
When a router u receives a packet from a source via its adjacent router x it
checks its routing table to see whether x is on the shortest path from u to the
source. If this is the case then u forwards the packet on to the other adjacent
routers (except x). If however x is not on the shortest path then the packet is
not forwarded and instead u sends x a prune message telling it to stop sending
u multicast packets that originate from that source. In this way the multicast
packet floods the entire network along edges in the shortest path tree that is
rooted at the source.
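As an illustration only, the reverse path check could be expressed using the RouterDistanceVector class from the listings above; handleMulticast, forwardToAllExcept and sendPrune are hypothetical helper names introduced here for the sketch, not methods of the classes shown.

// Sketch of the reverse path forwarding decision for a router, assuming the
// distanceVector field of UnicastRouter shown earlier; handleMulticast,
// forwardToAllExcept and sendPrune are hypothetical helpers for illustration
private void handleMulticast(UnicastPacket packet, InetAddress arrivedVia)
{  InetAddress source = packet.getSource();
   // arrivedVia is on the reverse path if it is the next hop on the
   // shortest path from this router back to the source of the packet
   InetAddress nextToSource
      = distanceVector.getNextRouterToDestination(source);
   if (arrivedVia.equals(nextToSource))
      forwardToAllExcept(packet, arrivedVia); // flood along the tree edges
   else
      sendPrune(arrivedVia, source); // ask not to be sent such packets again
}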
The reverse path forwarding algorithm leads to some waste, particularly if
only a few routers have client machines that are part of the group. To reduce
this the algorithm includes a type of group prune message, specifically for each
multicast group and source of packets for that group. If a router has pruned all
but one of its adjacent routers for a given source (so it is a leaf in the shortest
path tree rooted at the source), and if it is not itself part of the group then it
can tell the remaining adjacent router to prune it for any multicast packets from
the source for that group. This might result in the adjacent router becoming
a leaf instead, so it too might in turn have itself pruned for such multicast
packets. Eventually, only those routers required along the shortest path to the
group are included in the multicasting. One complication is that some client
might eventually want to join the group, but its router is already pruned. To
accommodate this possibility, any group pruning is made to expire after a certain
time period, so any router that wants to remain pruned must periodically resend
the group prune message to its adjacent routers.
A more efficient algorithm for multicast routing is the centre-based trees
algorithm. In this algorithm each group is assigned one router z on the network
that acts as a rendezvous centre for the group. Any multicast packet destined
for the group is forwarded to the centre router, which then uses a shortest path
tree to forward the packet to all the members of the group in the network. This
requires that any router that is part of the shortest path tree (and which might
itself not even be part of the group) be aware that the packet has come from its
parent in the tree and so should be forwarded to the child routers in the tree,
rather than be forwarded again to the centre (otherwise the packet would be
forwarded in a loop indefinitely).
A Steiner tree for a collection of vertices in a weighted graph G = (V, E, w)
is a tree with minimum total weight that includes those vertices (possibly along
with some other vertices of the graph). As special cases, the Steiner tree for
two vertices is the tree that gives the shortest path between the two vertices,
and the Steiner tree for all the vertices V in the graph is a minimal spanning
tree. Unfortunately, apart from these special cases there is no known efficient
algorithm for finding the Steiner tree for a collection of vertices in a graph
(this problem belongs to a class of problems known as NP-hard, see pp966-1017

of the textbook). As a consequence, the centre-based trees algorithm cannot
guarantee to obtain the best configuration of vertices for each group. Instead,
the shortest path tree that is found for a group is usually only an approximation
to a Steiner tree.

Exercise 6.6 (Unicast Routing) Working in a team use the class Unicast-
Router to prepare a network of routers, and investigate the effect on routing
when a router is relaxed or when the topology of the network changes.
Chapter 7

Numerical Algorithms

7.1 Matrix Operations


Reading: pp735-754
Operations with matrices are an essential part of scientific and engineering
computing, and many numerical algorithms rely on the efficient manipulation
of matrices, particularly square n × n matrices.
If A = (aij) and B = (bij) are square n × n matrices then their product AB
is the n × n matrix whose i, j-entry is the sum a_{i0}b_{0j} + a_{i1}b_{1j} + · · · + a_{i,n−1}b_{n−1,j},
which can be calculated in Θ(n). Thus all n × n entries of the product can
be found in Θ(n³). Note that matrix multiplication is inherently more complex
than either matrix addition or subtraction, which are each Θ(n²).
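For concreteness, the following is a minimal sketch of this straightforward Θ(n³) multiplication for matrices stored as double[][] arrays (the method name is illustrative only).

// Straightforward matrix multiplication: entry i,j of the product is the
// sum a[i][0]*b[0][j] + a[i][1]*b[1][j] + ... + a[i][n-1]*b[n-1][j]
public static double[][] multiply(double[][] a, double[][] b)
{  int n = a.length;
   double[][] c = new double[n][n];
   for (int i = 0; i < n; i++)
   {  for (int j = 0; j < n; j++)
      {  double sum = 0.0;
         for (int k = 0; k < n; k++)
            sum += a[i][k]*b[k][j];
         c[i][j] = sum;
      }
   }
   return c;
}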
As an alternative to the straightforward matrix multiplication algorithm, a
divide-and-conquer technique can be applied by noting that a product of two
n × n matrices can be broken down into various multiplications of matrices of
size n/2 × n/2 (presuming n is even):

$$\begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}\begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix} = \begin{pmatrix} A_1B_1 + A_2B_3 & A_1B_2 + A_2B_4 \\ A_3B_1 + A_4B_3 & A_3B_2 + A_4B_4 \end{pmatrix}.$$

For example, to find the product

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}\begin{pmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ b_{31} & b_{32} & b_{33} & b_{34} \\ b_{41} & b_{42} & b_{43} & b_{44} \end{pmatrix}$$

the 1,1-entry, 1,2-entry, 2,1-entry, and 2,2-entry of the resulting 4 × 4 matrix
could be found by summing the two 2 × 2 products:

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} + \begin{pmatrix} a_{13} & a_{14} \\ a_{23} & a_{24} \end{pmatrix}\begin{pmatrix} b_{31} & b_{32} \\ b_{41} & b_{42} \end{pmatrix}.$$

Thus the product of two n × n matrices can be broken down into four sums
of eight n/2 × n/2 products. The time required T(n) for this divide-and-conquer
technique is given by the recurrence T(n) = 8T(n/2) + f(n) where f(n) is
Θ((n/2)²) = Θ(n²) (the time required to perform the four n/2 × n/2 sums). So by
the first case of the Master Theorem with a = 8 and b = 2 one obtains that
T(n) is Θ(n³), asymptotically not an improvement over the straightforward
approach to matrix multiplication.
Strassen’s algorithm uses an (obscure) combination of seven products of the
matrices A1 , A2 , A3 , A4 , B1 , B2 , B3 , B4 to give a divide-and-conquer technique
that has lower complexity. First one uses seven n/2 × n/2 multiplications to form
the following matrices:

P1 = A1 (B2 − B4 )
P2 = (A1 + A2 ) B4
P3 = (A3 + A4 ) B1
P4 = A4 (B3 − B1 )
P5 = (A1 + A4 ) (B1 + B4 )
P6 = (A2 − A4 ) (B3 + B4 )
P7 = (A1 − A3 ) (B1 + B2 ) .

Then the final n × n product can be given by five further additions and three
subtractions:

$$\begin{pmatrix} P_5 + P_4 - P_2 + P_6 & P_1 + P_2 \\ P_3 + P_4 & P_5 + P_1 - P_3 - P_7 \end{pmatrix}.$$

This gives the recurrence T(n) = 7T(n/2) + f(n) where f(n) is again Θ(n²). So
the first case of the Master Theorem gives that T(n) is Θ(n^(log_2 7)) ≈ Θ(n^2.808).
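As a check of the Master Theorem case used here (a sketch of the standard comparison, not part of the original derivation):

$$n^{\log_b a} = n^{\log_2 7} \approx n^{2.807},\qquad f(n) = \Theta(n^2) = O\!\left(n^{\log_2 7 - \epsilon}\right)\ \text{for any } 0 < \epsilon \le \log_2 7 - 2,$$
$$\text{so case 1 applies and } T(n) = \Theta\!\left(n^{\log_2 7}\right).$$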
In practice, if the matrices are sparse (with many zero entries) then there
are alternative algorithms that are preferable. If n is not large (below about 20)
then the extra overhead involved with Strassen’s algorithm that is hidden in the
asymptotic complexity makes it slower than the straightforward multiplication
algorithm. For very large matrices Strassen’s algorithm might be used to recur-
sively reduce the problem to evaluating matrix multiplications that are smaller
than about 20 × 20, and then the straightforward algorithm used to directly
calculate the result. Actually, there are more sophisticated algorithms for ma-
trix multiplication which have even lower complexity; the current best bound is
about O(n^2.376). It is currently unknown whether there is an algorithm closer
to the lower bound Ω(n²).
If A is an invertible n × n matrix then its LUP-decomposition consists of
three n × n matrices L, U , P for which P A = LU , where P is a permutation
matrix obtained from the n×n identity matrix In by interchanging various rows
(so all but n entries of P are zero and it has a 1 in each row and each column), L
is a lower triangular matrix where all entries above the diagonal are zero (each
entry lij = 0 for i < j), and U is an upper triangular matrix where all entries
below the diagonal are zero (each entry uij = 0 for i > j).
An LUP-decomposition for an invertible matrix A can be found by using
Gaussian elimination to row reduce A to row echelon form (which is the matrix
U ). Operations that interchange two rows are kept track of by applying them
to the identity matrix In , which will become the matrix P . The matrix L can
be found by keeping track of the inverse of each operation, starting with the
zero matrix, and when the process is completed In is added to give the matrix

L. For example, suppose an LUP-decomposition is to be found for the matrix:

$$A = \begin{pmatrix} 4 & -5 & 6 \\ 8 & -6 & 7 \\ 12 & -7 & 12 \end{pmatrix}.$$

One starts with the augmented matrix (0_n | A | I_n), and applies Gaussian elimi-
nation to the matrix A, noting the inverse of every operation in the matrix on
the left and just the row interchanges in the matrix on the right:

$$\left(\begin{array}{ccc|ccc|ccc} 0 & 0 & 0 & 4 & -5 & 6 & 1 & 0 & 0 \\ 0 & 0 & 0 & 8 & -6 & 7 & 0 & 1 & 0 \\ 0 & 0 & 0 & 12 & -7 & 12 & 0 & 0 & 1 \end{array}\right)$$

In practice, to minimize numerical rounding inaccuracies rows are usually in-
terchanged so that the largest possible entry is used as the leading entry for
the operations. Hence row 1 is interchanged with row 3 so that the entry 12 is
used:

$$\left(\begin{array}{ccc|ccc|ccc} 0 & 0 & 0 & 12 & -7 & 12 & 0 & 0 & 1 \\ 0 & 0 & 0 & 8 & -6 & 7 & 0 & 1 & 0 \\ 0 & 0 & 0 & 4 & -5 & 6 & 1 & 0 & 0 \end{array}\right)$$

Then the entry 12 is used to put zeros in the rows below it by adding −2/3 times
row 1 to row 2, and adding −1/3 times row 1 to row 3:

$$\left(\begin{array}{ccc|ccc|ccc} 0 & 0 & 0 & 12 & -7 & 12 & 0 & 0 & 1 \\ 2/3 & 0 & 0 & 0 & -4/3 & -1 & 0 & 1 & 0 \\ 1/3 & 0 & 0 & 0 & -8/3 & 2 & 1 & 0 & 0 \end{array}\right)$$

Next, since |−8/3| > |−4/3|, row 2 is interchanged with row 3 to minimize
numerical inaccuracies:

$$\left(\begin{array}{ccc|ccc|ccc} 0 & 0 & 0 & 12 & -7 & 12 & 0 & 0 & 1 \\ 1/3 & 0 & 0 & 0 & -8/3 & 2 & 1 & 0 & 0 \\ 2/3 & 0 & 0 & 0 & -4/3 & -1 & 0 & 1 & 0 \end{array}\right)$$

Then the entry −8/3 is used to put a zero in the row below it by adding −1/2
times row 2 to row 3:

$$\left(\begin{array}{ccc|ccc|ccc} 0 & 0 & 0 & 12 & -7 & 12 & 0 & 0 & 1 \\ 1/3 & 0 & 0 & 0 & -8/3 & 2 & 1 & 0 & 0 \\ 2/3 & 1/2 & 0 & 0 & 0 & -2 & 0 & 1 & 0 \end{array}\right)$$

Now that the matrix A has been reduced to an upper triangular matrix Gaussian elimination can
be stopped and the identity I_n added to the matrix on the left:

$$\left(\begin{array}{ccc|ccc|ccc} 1 & 0 & 0 & 12 & -7 & 12 & 0 & 0 & 1 \\ 1/3 & 1 & 0 & 0 & -8/3 & 2 & 1 & 0 & 0 \\ 2/3 & 1/2 & 1 & 0 & 0 & -2 & 0 & 1 & 0 \end{array}\right)$$

This gives the three matrices for the LUP-decomposition:

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 1/3 & 1 & 0 \\ 2/3 & 1/2 & 1 \end{pmatrix} \qquad U = \begin{pmatrix} 12 & -7 & 12 \\ 0 & -8/3 & 2 \\ 0 & 0 & -2 \end{pmatrix} \qquad P = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

Checking, one sees that indeed PA = LU.

The Θ(n³) algorithm LUP-Decomposition performs an LUP-decomposi-
tion on a matrix A, and the class LUPDecomposition provides an implementa-
tion of this algorithm. Since the algorithm always results in a lower triangular
matrix L with 1 down the diagonal it places the below-diagonal entries of L
in the original matrix A itself, along with the entries of the upper triangular
matrix U . Furthermore, since P is a permutation matrix and so only has one
non-zero entry in each row, a one-dimensional array π is used to represent P ,
where π[i] gives the column of row i in P that has the entry 1. For example,
the previous permutation matrix would be represented by π = (2, 0, 1).

LUP-Decomposition(A)
1 n ← rows[A]  A is matrix with rows 0, . . . , n − 1 to decompose
2 for i ← 0 to n − 1 do
3 π[i] ← i  permutation matrix π is initially identity matrix
4  find row k of the decomposition π, lower, upper triangular matrices
5 for k ← 0 to n − 1 do
6  find the largest entry from a_kk, . . . , a_(n−1)k
7 p ← |a_kk|
8 k′ ← k
9 for i ← k + 1 to n − 1 do
10 if |a_ik| > p then
11 p ← |a_ik|
12 k′ ← i
13 if p = 0 then  only zero entries down the column
14 throw exception as matrix is not invertible
15  swap row k with row k′
16 swap π[k] with π[k′]  modify permutation matrix for row swap
17 for j ← 0 to n − 1 do
18 swap a_kj with a_k′j
19 for i ← k + 1 to n − 1 do
20  subtract a_ik/a_kk times row k from row i
21 a_ik ← a_ik/a_kk  apply inverse operation for lower
22 for j ← k + 1 to n − 1 do
23 a_ij ← a_ij − a_ik·a_kj  apply operation to upper
24 return π, lower triangle of A with 1 on diagonal, upper triangle of A

LUP-decompositions are very important for various numerical calculations.


For instance, any system of linear equations written in matrix form as AX = B
in the case when A is invertible is typically solved by using LUP-decomposition.
This is achieved by multiplying both sides of the system on the left by the
matrix P , and using the fact that P A = LU . Hence the system to be solved
for X is LU X = P B. Putting Y = U X results in the system LY = P B, which
since L is a triangular matrix is quickly solved for Y . Then U X = Y can also
be quickly solved for X, giving a solution to the system in Θ(n²). Instead the
system could have been solved by using just Gaussian elimination in O(n³),
but LUP-decomposition gives fewer numerical inaccuracies and is preferred in
practice.
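The following is a minimal sketch (not the LUPDecomposition class mentioned above) of the two substitution steps, assuming L and U are held as double[][] arrays (L with 1 on its diagonal) and the permutation is held as the array π described earlier, so that entry i of PB is B[π[i]].

// Sketch of solving AX = B once an LUP-decomposition is known:
// first LY = PB by forward substitution, then UX = Y by back substitution.
// l and u are n-by-n double[][] arrays (l has 1 on its diagonal) and
// pi[i] gives the column of row i of P holding its 1
public static double[] solve(double[][] l, double[][] u, int[] pi, double[] b)
{  int n = b.length;
   // forward substitution for LY = PB
   double[] y = new double[n];
   for (int i = 0; i < n; i++)
   {  double sum = b[pi[i]]; // entry i of PB
      for (int j = 0; j < i; j++)
         sum -= l[i][j]*y[j];
      y[i] = sum; // no division needed since l[i][i] = 1
   }
   // back substitution for UX = Y
   double[] x = new double[n];
   for (int i = n-1; i >= 0; i--)
   {  double sum = y[i];
      for (int j = i+1; j < n; j++)
         sum -= u[i][j]*x[j];
      x[i] = sum/u[i][i];
   }
   return x;
}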

For example, consider solving the system:

$$\begin{pmatrix} 4 & -5 & 6 \\ 8 & -6 & 7 \\ 12 & -7 & 12 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ 5 \end{pmatrix}$$

Finding the LUP-decomposition for the matrix and multiplying both sides by
the permutation matrix P gives:

$$\begin{pmatrix} 1 & 0 & 0 \\ 1/3 & 1 & 0 \\ 2/3 & 1/2 & 1 \end{pmatrix}\begin{pmatrix} 12 & -7 & 12 \\ 0 & -8/3 & 2 \\ 0 & 0 & -2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \\ 1 \end{pmatrix}$$

The system of equations:

$$\begin{pmatrix} 1 & 0 & 0 \\ 1/3 & 1 & 0 \\ 2/3 & 1/2 & 1 \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \\ 1 \end{pmatrix}$$

is quickly solved by using forward substitution. Clearly y1 = 5, so (1/3)y1 + y2 = 2
gives that y2 = 1/3, and then (2/3)y1 + (1/2)y2 + y3 = 1 gives that y3 = −5/2. Then the
system of equations:

$$\begin{pmatrix} 12 & -7 & 12 \\ 0 & -8/3 & 2 \\ 0 & 0 & -2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 5 \\ 1/3 \\ -5/2 \end{pmatrix}$$

is also quickly solved using back substitution, as −2x3 = −5/2 gives that x3 =
5/4, then −(8/3)x2 + 2x3 = 1/3 gives that x2 = 13/16, and so 12x1 − 7x2 + 12x3 =
5 gives that x1 = −23/64. Thus once the LUP-decomposition is known for an
invertible matrix A a system AX = B can be solved in Θ(n²).

Exercise 7.1 (Systems of Linear Equations) Write a class whose construc-


tor accepts an invertible n × n matrix A as a double[][] array and an n × 1
matrix B, and solves the system of linear equations given by AX = B (by first
finding the LUP-decomposition of A, computing P B, then solving LY = P B for
Y followed by U X = Y to find the solution X).

7.2 Fast Fourier Transforms


Reading: pp822-843
A complex number z = a + ib (where i² = −1) is said to be a complex n-th
root of unity if zⁿ = 1. For example, the square roots of unity are z = 1, z = −1,
and the fourth roots of unity are z = 1, z = i, z = −1, z = −i.
In general there are exactly n complex n-th roots of
unity, given for k = 0, 1, 2, . . . , n − 1 by:

$$z = \cos\frac{2\pi k}{n} + i\sin\frac{2\pi k}{n}.$$

These n complex numbers appear equally spaced around the unit circle
about the origin when drawn in the complex (Argand) plane.

The principal n-th root of unity is the complex number cos(2π/n) + i sin(2π/n) = e^(2πi/n).
Throughout this section ω denotes its complex conjugate,
ω = cos(2π/n) − i sin(2π/n) = e^(−2πi/n), which is also an n-th root of unity.
De Moivre’s Theorem gives that

$$(\cos\theta + i\sin\theta)^k = \cos k\theta + i\sin k\theta$$

and so the n complex n-th roots of unity are precisely the powers 1, ω, ω^2, ω^3, . . . , ω^(n−1).
The Discrete Fourier Transform (DFT) of an n-tuple (a0, a1, a2, . . . , an−1)
is the n-tuple (y0, y1, y2, . . . , yn−1) where each yk is given by:

$$y_k = a_0 + a_1\omega^k + a_2\omega^{2k} + \cdots + a_{n-1}\omega^{(n-1)k}$$

for 0 ≤ k < n, where ω = cos(2π/n) − i sin(2π/n) is an n-th root of unity. This can
alternatively be written as:

$$y_k = p\!\left(\omega^k\right),$$

where p is the polynomial p(x) = a0 + a1x + a2x^2 + · · · + an−1x^(n−1). Discrete
Fourier Transforms have many widespread applications, such as in cryptography
where they are used to efficiently multiply big polynomials and big integers, and
particularly in signal processing. If a signal is sampled at regular times to have
amplitudes a0 , a1 , a2 , . . . , an−1 (in the time domain), then a DFT can be used
to find the amplitudes y0 , y1 , y2 , . . . , yn−1 of each sinusoidal component (in the
frequency domain) that comprise the signal.
The Discrete Fourier Transform can be conveniently expressed in matrix
form as:

$$\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \cdots & \omega^{n-1} \\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(n-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \cdots & \omega^{(n-1)(n-1)} \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{n-1} \end{pmatrix}.$$

For example, the Discrete Fourier Transform of the 4-tuple (7, −3, 4, −1) is
calculated using ω = cos(2π/4) − i sin(2π/4) = −i by:

$$\begin{pmatrix} 7 \\ 3+2i \\ 15 \\ 3-2i \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix}\begin{pmatrix} 7 \\ -3 \\ 4 \\ -1 \end{pmatrix}.$$
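As an illustration of the definition (and not of the Fast Fourier Transform developed below), a minimal Java sketch of this Θ(n²) matrix-style evaluation, representing complex values by separate real and imaginary arrays:

// A naive Θ(n²) DFT sketch: input a[] is real, output is returned as a
// double[2][n] holding the real and imaginary parts of y_0,...,y_{n-1}
public static double[][] naiveDFT(double[] a)
{  int n = a.length;
   double[][] y = new double[2][n];
   for (int k = 0; k < n; k++)
   {  double re = 0.0, im = 0.0;
      for (int j = 0; j < n; j++)
      {  // omega^(jk) where omega = cos(2*pi/n) - i*sin(2*pi/n)
         double angle = -2.0*Math.PI*j*k/n;
         re += a[j]*Math.cos(angle);
         im += a[j]*Math.sin(angle);
      }
      y[0][k] = re;
      y[1][k] = im;
   }
   return y;
}

For the 4-tuple above this returns (7, 3+2i, 15, 3−2i), matching the matrix calculation.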
The n × n matrix for the Discrete Fourier Transform is actually invertible,
with inverse:

$$\frac{1}{n}\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \bar\omega & \bar\omega^2 & \cdots & \bar\omega^{n-1} \\ 1 & \bar\omega^2 & \bar\omega^4 & \cdots & \bar\omega^{2(n-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \bar\omega^{n-1} & \bar\omega^{2(n-1)} & \cdots & \bar\omega^{(n-1)(n-1)} \end{pmatrix}.$$
The inverse Discrete Fourier Transform of an n-tuple (y0, y1, y2, . . . , yn−1) is
the n-tuple (a0, a1, a2, . . . , an−1) where each ak is given by:

$$a_k = \frac{1}{n}\left(y_0 + y_1\bar\omega^{k} + y_2\bar\omega^{2k} + \cdots + y_{n-1}\bar\omega^{(n-1)k}\right)$$

for 0 ≤ k < n, where ω̄ = cos(2π/n) + i sin(2π/n) is the principal n-th root of unity
(the complex conjugate of ω).

Calculating the Discrete Fourier Transform of an n-tuple by using matrix
multiplication is Θ(n²). Since Fourier Transforms are often needed for real-
time processing of signals it is essential that they be calculated as efficiently as
possible. The Fast Fourier Transform (FFT) is a divide-and-conquer technique
for evaluating a Discrete Fourier Transform in Θ (n log2 n) that exploits patterns
in the n × n DFT matrix. It uses the fact that if p(x) = a0 + a1 x + a2 x2 + a3 x3 +
· · · + an−1 xn−1 is a polynomial of degree at most n − 1 where n is presumed for
convenience to be even then:

$$\begin{aligned}
p(x) &= \left(a_0 + a_2x^2 + \cdots + a_{n-2}x^{n-2}\right) + \left(a_1x + a_3x^3 + \cdots + a_{n-1}x^{n-1}\right)\\
&= \left(a_0 + a_2x^2 + \cdots + a_{n-2}x^{2(n/2-1)}\right) + x\cdot\left(a_1 + a_3x^2 + \cdots + a_{n-1}x^{2(n/2-1)}\right)\\
&= p_{\text{even}}\!\left(x^2\right) + x\cdot p_{\text{odd}}\!\left(x^2\right),
\end{aligned}$$

where:

$$\begin{aligned}
p_{\text{even}}(x) &= a_0 + a_2x + a_4x^2 + \cdots + a_{n-2}x^{n/2-1}\\
p_{\text{odd}}(x) &= a_1 + a_3x + a_5x^2 + \cdots + a_{n-1}x^{n/2-1}.
\end{aligned}$$

So if ω = cos(2π/n) − i sin(2π/n) then the Discrete Fourier Transform of the n-tuple
(a0, a1, a2, . . . , an−1) is (y0, y1, y2, . . . , yn−1) where:

$$y_k = p\!\left(\omega^k\right) = p_{\text{even}}\!\left(\omega^{2k}\right) + \omega^k\cdot p_{\text{odd}}\!\left(\omega^{2k}\right)$$

for 0 ≤ k < n. But note that p_even and p_odd are both polynomials of degree
at most n/2 − 1 and ω² is actually an n/2-th root of unity, so for 0 ≤ k <
n/2 the values p_even(ω^2k) and p_odd(ω^2k) are just entry k of the DFT of
(a0, a2, . . . , an−2) and of (a1, a3, . . . , an−1) respectively. For n/2 ≤ k < n note
that ω^2k = ω^2(k−n/2), since ω^n = 1, and ω^k = ω^((k−n/2)+n/2) = −ω^(k−n/2), since
ω^(n/2) = −1. Thus all n entries of the DFT of (a0, a1, a2, . . . , an−1) can be found
in a loop from k = 0 to k = n/2 − 1 where:

$$\begin{aligned}
y_k &= p_{\text{even}}\!\left(\omega^{2k}\right) + \omega^k\cdot p_{\text{odd}}\!\left(\omega^{2k}\right)\\
y_{k+n/2} &= p_{\text{even}}\!\left(\omega^{2k}\right) - \omega^k\cdot p_{\text{odd}}\!\left(\omega^{2k}\right).
\end{aligned}$$

Thus the DFT of (a0 , a1 , a2 , . . . , an−1 ) can be found recursively by combining


the two simpler DFTs of (a0 , a2 , . . . , an−2 ) and of (a1 , a3 , . . . , an−1 ), provided
that n is a power of two. In the base case when n = 1, p(x) = a0 , and so the
DFT of (a0 ) is (p(1)) = (a0 ).

Recursive-FFT(a)
1 n ← length[a]  a = (a0 , a1 , a2 , a3 , . . . , an−1 ) and n is a power of 2
2 if n = 1 then
3 return a  base case a = (a0 )
4 ω ← cos(2π/n) − i sin(2π/n)  n-th root of unity
5 aeven ← (a0 , a2 , . . . , an−2 )
6 aodd ← (a1 , a3 , . . . , an−1 )
7 yeven ← Recursive-FFT(aeven )
8 yodd ← Recursive-FFT(aodd )
9 x←1  point x = ω k is a complex root of unity
10 for k ← 0 to n/2 − 1 do
11 y[k] ← yeven [k] + x · yodd [k]
12 y[k + n/2] ← yeven [k] − x · yodd [k]
13 x←x·ω
14 return y  y = (y[0], y[1], y[2], . . . , y[n − 1])

The time required T (n) by the Recursive-FFT algorithm for an n-tuple is


given by the recurrence relation T (n) = 2T (n/2) + f (n) where f (n) is O(n) due
to the for loop. So by the second case of the Master Theorem with a = 2 and
b = 2 one obtains that T (n) is Θ (n log2 n). The inverse DFT can be calculated
in a similar way but replacing ω by ω̄ = cos(2π/n) + i sin(2π/n) and including the
factor 1/n.
In practical applications of the DFT the recursive version of the Fast Fourier
Transform algorithm is usually replaced by an iterative version, which is still
Θ (n log2 n) but has a smaller constant factor in T (n). Firstly note that the
Recursive-FFT algorithm uses the expression x · yodd [k] twice inside the
for loop (called a common subexpression), so the efficiency of the loop can be
improved by using a temporary variable t that gets both added to and subtracted
from u = yeven [k]. This is known as a butterfly operation and is characteristic
of the Fast Fourier Transform algorithm.
(In the butterfly diagram, yeven[k] and yodd[k] are combined using the multiplier x = ω^k to produce y[k] and y[k + n/2].)
1 for k ← 0 to n/2 − 1 do
2 t ← x · yodd[k]
3 u ← yeven[k]
4 y[k] ← u + t
5 y[k + n/2] ← u − t
6 x ← x · ω
Next, note that the allocation of the arrays aeven and aodd can be avoided
if the elements of the initial array (n-tuple) a can be suitably rearranged so
that the algorithm can operate with portions of a single array. For example,
if n = 2^3 = 8 and a = (a0, a1, a2, a3, a4, a5, a6, a7) then it would be more
convenient to have a arranged as (a0 , a4 , a2 , a6 , a1 , a5 , a3 , a7 ) so that the DFT
is applied bottom-up.

(a0 , a1 , a2 , a3 , a4 , a5 , a6 , a7 )

(a0 , a2 , a4 , a6 ) (a1 , a3 , a5 , a7 )

(a0 , a4 ) (a2 , a6 ) (a1 , a5 ) (a3 , a7 )

(a0 ) (a4 ) (a2 ) (a6 ) (a1 ) (a5 ) (a3 ) (a7 )


The algorithm Bit-Reverse-Copy performs the appropriate rearrangement of
the array a in Θ (n log2 n). A proof by induction can be used to show that
an element ak should be placed at index rev (k) where rev (k) is obtained by
reversing the order of bits when k is written in binary. So for example, k = 6
in binary is 110, so rev (k) = 011 which is the index 3. If the value of n is fixed
then the bit reversals could instead be permanently held in a table, so that the
rearrangement would be Θ(n).
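A small Java sketch of this bit reversal for a single index (assuming n is a power of two, with logN = log₂ n):

// Reverse the lowest logN bits of index k, as used by Bit-Reverse-Copy
static int reverseBits(int k, int logN)
{  int r = 0;
   for (int j = 0; j < logN; j++)
   {  r = (r << 1) | (k & 1); // shift result left, append lowest bit of k
      k >>>= 1;               // drop the bit just consumed
   }
   return r;
}
// e.g. reverseBits(6, 3) returns 3, since 110 reversed is 011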
Once the elements of a = (a0 , a1 , a2 , . . . , an−1 ) are rearranged to give an
array y = (y0 , y1 , y2 , . . . , yn−1 ) a single butterfly operation is applied to each of
the pairs (y0 , y1 ), (y2 , y3 ), . . . , (yn−2 , yn−1 ) in-place with x = 1 (corresponding
to the recursive case with n = 2). Then the resulting n/2 pairs are combined into
n/4 4-tuples using two butterfly operations with x = 1 and x = i (corresponding
to the recursive case with n = 4). Iteratively, these 4-tuples are combined pair
at a time using further butterfly operations into 8-tuples, repeating until a single
n-tuple remains in the array y.
The development of the Fast Fourier Transform algorithm is credited to
Cooley and Tukey, and its introduction was a very significant step forward
for numerical calculations, making possible calculations such as efficient signal
processing.

Exercise 7.2 (Implementing the Iterative FFT) Working in a team write


a program that implements the Iterative-FFT and Bit-Reverse-Copy al-
gorithms for an array of (complex) values. Note that this will require the de-
velopment of a class Complex or the usage of two-dimensional arrays for a and
y to represent complex numbers with addition, subtraction, and multiplication
operations.

Bit-Reverse-Copy(a)
1  rearrange array a to give array y by reversing bits of each index
2 n ← length[a]  n is presumed to be a power of 2
3 for k ← 0 to n − 1 do
4 k′ ← k
5 r ← 0  r is the bit reversal of the index k
6 for j ← 0 to log2 n − 1 do
7 b ← k′ & 1  find bit j from the right of k′
8 r ← (r << 1) + b  shift bits left and add bit b
9 k′ ← k′ >>> 1  shift bits right
10 y[r] ← a[k]
11 return y

Iterative-FFT(a)
1 n ← length[a]  a = (a0 , a1 , a2 , a3 , . . . , an−1 ) and n is a power of 2
2 y ← Bit-Reverse-Copy(a)
3  use butterfly operations on y to find the DFT of a
4 for s ← 1 to log2 n do
5 m ← 2^s  apply butterfly operations to m-tuples
6 ω ← cos(2π/m) − i sin(2π/m)  m-th root of unity
7 for k ← 0 to n − 1 by m do  increment k in steps of m
8 x←1  point x = ω k is a complex root of unity
9 for j ← 0 to m/2 − 1 do
10  perform butterfly operation in-place at k + j
11 t ← x · y[k + j + m/2]
12 u ← y[k + j]
13 y[k + j] ← u + t
14 y[k + j + m/2] ← u − t
15 x←x·ω
16 return y  y = (y[0], y[1], y[2], . . . , y[n − 1])
Appendix A

Advanced Analysis
Techniques

A.1 Probability Theory


Reading: none
Probability theory is based around ideas of outcomes to experiments. A
sample space S is the set of all the possible outcomes from some experiment,
whose elements are called elementary events. Subsets of S are called events,
and two events A ⊆ S, B ⊆ S are said to be mutually exclusive if A ∩ B = ∅.
For example, the experiment of tossing a coin three times has a sample space
S with 2^3 = 8 elementary events, and might be denoted by:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
The event of obtaining at least two heads from the experiment is the subset
{HHH, HHT, HTH, THH}, which is mutually exclusive from the event of ob-
taining exactly one head, which is the subset {HTT, THT, TTH}.
A probability distribution Pr on a sample space S is a function from events
of S (its domain is the power set of S, all the subsets of S) to real numbers for
which:
1. Pr(∅) = 0.
2. Pr(S) = 1.
3. 0 ≤ Pr(A) ≤ 1 for every event A.
4. If A and B are mutually exclusive then Pr(A ∪ B) = Pr(A) + Pr(B).
For any event A ⊆ S, Pr(A) is called the probability of the event A.
If the coin used in the coin tossing experiment is considered unbiased then
the probability distribution Pr(A) = |A| /8 would be used, where |A| denotes the
cardinality of the event A (this is an example of a uniform probability distribution
where each elementary event is assigned the same probability).
The conditional probability that an event A occurs, given that an event B
does occur (meaning that Pr(B) > 0), is denoted by Pr(A|B), and defined by:
Pr(A|B) = Pr(A ∩ B) / Pr(B).


Using the probability distribution for the coin tossing example, the condi-
tional probability that at least two heads are obtained given that at least one
head is known to be obtained can be calculated as follows. Let A denote the
event that at least two heads are obtained, and B denote the (non-exclusive)
event that at least one head is obtained. Then:
Pr(B) = 1 − Pr(no heads are obtained) = 1 − Pr({TTT}) = 7/8,
and Pr(A ∩ B) = Pr(A) = 4/8 since A ⊆ B. So Pr(A|B) = (4/8)/(7/8) = 4/7.
Two events A and B are said to be independent if Pr(A ∩ B) = Pr(A) Pr(B),
or equivalently, if Pr(A|B) = Pr(A) (so long as Pr(B) > 0). For example, if A
denotes the event that the first toss is heads and if B denotes the event that
the third toss is tails, then A ∩ B = {HHT, HTT} so Pr(A ∩ B) = 2/8. Since
Pr(A) = 4/8 and Pr(B) = 4/8, and Pr(A ∩ B) = Pr(A) Pr(B), the events A and B
are independent.
A random variable X is a function from a sample space S to real numbers,
associating a real number with each possible outcome of an experiment. If a
sample space S is discrete then the expected value E(X) of a random variable
is defined by:

$$E(X) = \sum_{x} x\Pr(X = x).$$

For example, the experiment of rolling two dice has a sample space with 36
elementary events, and if the dice are unbiased then the uniform probability
distribution would be used where each elementary event is assigned equal prob-
ability 1/36. If X denotes the random variable whose value is the number showing
on the first die then its expected value is:

$$E(X) = 1\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + 3\cdot\tfrac{1}{6} + 4\cdot\tfrac{1}{6} + 5\cdot\tfrac{1}{6} + 6\cdot\tfrac{1}{6} = 3.5$$
An indicator random variable is a random variable that only has values 0
and 1. In particular, if A is some event and X is the indicator random variable
defined by X(a) = 1 if a ∈ A and X(a) = 0 if a ∉ A then E(X) = Pr(A).
Indicator random variables provide a convenient way of handling probabilities.
It is not difficult to show that the expected value of the sum of two random
variables X and Y is the sum of the expected value of each, that is E(X + Y ) =
E(X) + E(Y ). If in the dice experiment Y denotes the random variable whose
value is the number showing on the second die then E(Y ) = 3.5, and X + Y is
the random variable representing the sum of the dice. Hence the expected value
of the sum is E(X + Y ) = E(X) + E(Y ) = 7.
Two random variables X and Y are said to be independent if for all real
values x and y the event X = x and the event Y = y are independent, that is,
Pr(X = x and Y = y) = Pr(X = x) Pr(Y = y). If X and Y are independent
then one can verify that E(XY ) = E(X)E(Y ). Hence in the dice experiment,
the expected value of the product of the values showing on the two dice is
E(XY ) = 3.5 · 3.5 = 12.25.

A.2 Probabilistic Analysis


Reading: pp91-104, 185-189

The use of probability in the analysis of problems is known as probabilistic


analysis. In order to use probabilistic analysis to estimate the average-case
behaviour of an algorithm something must be known about the distribution of
the possible inputs to the algorithm.
As an application of probabilistic analysis, consider the Find-Index-Max-
imum algorithm that finds the index of a maximum value in an array A:

Find-Index-Maximum(A)
1 maxIndex ← 0
2 for i ← 1 to length[A] − 1 do
3 if A[i] > A[maxIndex ] then
4 maxIndex ← i
5 return maxIndex

This algorithm is easily seen to be Θ(n), but probabilistic analysis is helpful


in determining the expected number of times the assignment maxIndex ← i
is performed. This number depends not just on n but also on the order of
particular elements in the input array. The assignment would be performed
only once if the actual maximum were the first element in the array, but in
the worst case it would be performed n times, corresponding to the elements
being in ascending order. Consider the sample space S whose elementary events
are the possible input arrays A = ⟨a0, a1, . . . , an−1⟩ with n elements. Take
X0, X1, . . . , Xn−1 to be random variables taking the values 0 or 1, where X0 = 1 and each
Xi = 1 if and only if ai is greater than all of a0 , a1 , . . . , ai−1 (so each Xi is
an indicator random variable that states whether the maxIndex assignment is
performed for i). Define the random variable X by:
$$X(A) = \sum_{i=0}^{n-1} X_i,$$

and note that for input A the assignment statement is performed X(A) times.
If one assumes that the elements in the input array are in random order then
E(Xi) = 1/(i + 1), since there would be a probability of 1/(i + 1) that ai is greater than
all of a0, a1, . . . , ai−1. The expected number of assignment statements E(X)
can then be found by:

$$E(X) = E\!\left(\sum_{i=0}^{n-1} X_i\right) = \sum_{i=0}^{n-1} E(X_i) = \sum_{i=0}^{n-1}\frac{1}{i+1},$$

which is a quantity bounded below by loge(n + 1) and above by (loge n) + 1. Hence
E(X) ≈ loge n.
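As an informal check of this estimate (not from the textbook), one could count the assignments over many randomly filled arrays; the class name here is illustrative only.

import java.util.Random;

public class MaxAssignmentExperiment
{
   public static void main(String[] args)
   {  int n = 1000, trials = 10000;
      Random random = new Random();
      long totalAssignments = 0;
      for (int t = 0; t < trials; t++)
      {  double[] a = new double[n];
         for (int i = 0; i < n; i++)
            a[i] = random.nextDouble(); // random order of distinct values
         int maxIndex = 0;
         int assignments = 1; // X_0 = 1 for the initial assignment
         for (int i = 1; i < n; i++)
         {  if (a[i] > a[maxIndex])
            {  maxIndex = i;
               assignments++;
            }
         }
         totalAssignments += assignments;
      }
      // the average should be close to loge(1000), roughly 6.9
      System.out.println((double)totalAssignments/trials);
   }
}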
A randomized algorithm is an algorithm whose behaviour is determined not
only by its inputs but also by the values produced by a random number gener-
ator. Randomness might be used to alter the behaviour of an algorithm each
time it is run, or to ensure the expected case behaviour of an algorithm by
randomizing the order of its inputs (some algorithms such as Insertion-Sort
perform very poorly if their inputs have a certain order, such behaviour can
be avoided if the order is randomized). For example, consider the simple al-
gorithm Randomize-In-Place which can be used to randomize the order in
which elements appear in the input array A.

Randomize-In-Place(A)
1 n ← length[A]
2 for i ← 0 to n − 1 do
3 j ← Random(n − i) + i  random number between i and n-1
4 swap A[i] with A[j]
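A direct Java rendering of this pseudocode for a double[] array (a sketch using java.util.Random in place of the Random(n − i) call):

import java.util.Random;

// Randomize the order of the elements of a in place
public static void randomizeInPlace(double[] a)
{  Random random = new Random();
   int n = a.length;
   for (int i = 0; i < n; i++)
   {  int j = i + random.nextInt(n - i); // random index between i and n-1
      double temp = a[i];
      a[i] = a[j];
      a[j] = temp;
   }
}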

The claim that this Θ(n) algorithm actually does randomize the order of the
elements in an array A of n elements means that every permutation of the n
elements is equally likely. Since there are n! = 1 · 2 · 3 · . . . · n permutations of
the elements, each permutation should have probability 1/n! if the algorithm truly
randomizes the order. To see that this is the case consider the following loop
invariant:

at the start of iteration i of the for loop, each of the n!/(n − i)!
possible permutations of i elements of A is equally likely to appear
as the subarray A[0 . . i − 1], each with probability (n − i)!/n!.

First, the loop starts with i = 0 and so the subarray A[0 . . i − 1] consisting of
no elements of A is a trivial permutation with probability 1 (since the trivial
permutation is the only permutation of no elements). Next, suppose at the
start of iteration i every possible permutation of i elements has equal probability
(n−i)!/n! of appearing in A[0 . . i−1]. Iteration i of the for loop swaps A[i] with
one element randomly chosen from A[i], A[i + 1], . . . , A[n − 1], each having equal
probability 1/(n − i) of being chosen. Consider any permutation ⟨a0, a1, . . . , ai⟩ of
i + 1 elements of A, and let E1 denote the event that iterations 0, 1, . . . , i − 1 have
created the permutation ⟨a0, a1, . . . , ai−1⟩, which is supposed to have probability
(n − i)!/n!. Let E2 denote the event that iteration i swaps ai into position i. The
probability that iteration i results in the permutation ⟨a0, a1, . . . , ai⟩ is given
by:

$$\Pr(E_1\cap E_2) = \Pr(E_1)\Pr(E_2\mid E_1) = \frac{(n-i)!}{n!}\cdot\frac{1}{n-i} = \frac{(n-(i+1))!}{n!}.$$

Hence at the start of iteration i+1 the loop invariant is valid. Upon termination
of the loop i = n and so the loop invariant gives that each of the n! permutations
of the n elements of A has equal probability 1/n! of appearing in A[0 . . n − 1].
As a final example of probabilistic analysis consider the selection problem.
This problem takes as input an array or list of n elements and an integer i with
0 ≤ i < n, and has as output the element that is the i + 1-th smallest element
(the element that is larger than i other elements). The Randomized-Select
algorithm for solving the selection problem makes use of a Θ(n) Randomized-
Partition algorithm which picks an element at random as the partition element
and partitions the elements (in-place). All the elements smaller than the parti-
tion are placed to the left and all elements greater are placed to the right, with
the partition element between, and the eventual index of the partition element
is returned. The Randomized-Partition is a randomized algorithm to ensure
that the left and right sides of the partition can be expected to be of equal sizes
regardless of the initial ordering of A.

Randomized-Partition(A, p, r)
1  choose an element at random as partition and place it at index p
2 swap A[p] with A[Random(r − p) + p]
3  swap elements so elements on left are smaller than partition
4  and elements on right are larger than partition
5 leftIndex ← p + 1
6 rightIndex ← r − 1
7 while leftIndex < rightIndex do
8  find element starting from left that is greater than partition
9 while A[leftIndex ] ≤ A[p] and leftIndex < rightIndex do
10 leftIndex ← leftIndex +1
11  find element starting from right that is less than partition
12 while A[rightIndex ] > A[p] do
13 rightIndex ← rightIndex −1
14 if leftIndex < rightIndex then
15 swap A[leftIndex ] with A[rightIndex ]
16  place partition element between the left and right sides of partition
17 swap A[p] with A[rightIndex]
18 return rightIndex

Randomized-Select(A, p, r, i)
1 if p + 1 ≥ r then  only one element
2 return A[p]
3 else
4  pick an element at random and partition A[p . . r − 1] using it
5 indexPartition ← Randomized-Partition(A, p, r)
6 if indexPartition = i then  use partition element
7 return A[indexPartition]
8 else if i < indexPartition then  check left side of partition
9 return Randomized-Select(A, p, indexPartition, i)
10 else  check right side of partition
11 return Randomized-Select(A, indexPartition + 1, r, i)

The time required T(n) by the Randomized-Select algorithm to select the
i + 1-th smallest element from an array A of length n is actually O(n²), since
in the worst (very unlikely) case each partition is empty on one side and the
i + 1-th smallest element is not found until after n partitions are formed. In
the best case it is Ω(n) since there is a 1/n probability that the first partition
actually uses the i + 1-th smallest element. However, of more interest is the
expected (average) time E(T (n)) for the algorithm. A careful analysis using
indicator random variables is shown on pages 187-189 of the textbook but a
simplified analysis is possible as follows. Call a partition of A[p . . r − 1] good if
neither side of the partition has more than (3/4)(r − p) elements. Since the partition
element is chosen at random the probability that a partition is good is 1/2 (as
the element has 50% chance of lying between the lower quarter of elements and
the upper quarter). Define the random variable X so that X(A) is the number
of consecutive partitions that are formed for A until a good partition is formed.

Note that:

$$E(X) = 1\cdot\tfrac{1}{2} + 2\cdot\tfrac{1}{4} + 3\cdot\tfrac{1}{8} + 4\cdot\tfrac{1}{16} + \cdots = 2.$$

Each call to Randomized-Partition takes time bn where b > 0 is some con-
stant. Suppose when a good partition is finally made that the index i falls on
the larger side of the partition, which in the worst case would have up to (3/4)n
elements. Then T(n) is at worst the time required to make X partitions until a
good partition is made, plus the time required to perform the algorithm on the
side with (3/4)n elements, so T(n) = bnX + T((3/4)n). Hence:

$$E(T(n)) = bn\cdot E(X) + E\!\left(T\!\left(\tfrac{3}{4}n\right)\right).$$

This is a recurrence relation for E(T(n)), where a = 1, b = 4/3, f(n) = 2bn,
and n^(log_b a) = 1, so by the third case of the Master Theorem (with r = 3/4)
one has E(T (n)) is Θ(n). Hence the average case of the Randomized-Select
algorithm has the same order as its best possible case.

A.3 Amortized Analysis


Reading: pp405-424
An amortized analysis of a sequence of operations uses the scenario of a
worst-case input and averages the total required time for the sequence with
that input over all the operations. It does not focus on each operation sepa-
rately since taking the worst case per operation can give an overly pessimistic
bound on the performance of the operations as a whole. Rather than using
probabilistic analysis to obtain an estimated (average) case running time, it
makes no assumptions about randomness in the input, instead calculating the
average performance of each operation in the worst case.
Amortized analysis can be used to analyze the performance of a data struc-
ture which has various operations that can be performed on it, particularly
if the required time of some operations can vary and any expensive operation
is always preceded by many cheap operations. For example, an ArrayList has
add, contains, and remove operations. If the ArrayList has sufficient capacity
the add method works in constant time, but if the current capacity has already
been reached a larger array must be allocated and the elements must be copied
to it, requiring linear time. Presuming the worst case scenario each time, the
add method would be considered to require linear time. However, this is not a
fair analysis since if the capacity is expanded in a smart way most calls to add
work in constant time, compensating for the occasional case when the capacity
must be expanded. It is this idea of compensation that amortized analysis takes
into consideration, commonly using either the aggregate method, the account-
ing method, or the potential method, which are each useful tools for optimizing
the design of an algorithm.
Consider the problem of a k-bit binary counter that counts upward from 0,
which uses an array A[0 . . k − 1] as its data structure to store the bits, with the
lowest-order bit held by A[0] and highest-order bit by A[k − 1]. The Incre-
ment algorithm increments the counter by one using a while loop that iterates
between 0 and k times, flipping up to k bits.

Increment(A)
1 i←0
2 while i < length[A] and A[i] = 1 do
3 A[i] ← 0
4 i←i+1
5 if i < length[A] then
6 A[i] ← 1
7 else
8 overflow exception
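A direct Java sketch of this operation on a boolean[] counter, where A[0] is the lowest-order bit:

// Increment the k-bit binary counter held in a, flipping 1 bits back to 0
// until a 0 bit is found, as in the pseudocode above
public static void increment(boolean[] a)
{  int i = 0;
   while (i < a.length && a[i])
   {  a[i] = false; // flip 1 bits back to 0
      i++;
   }
   if (i < a.length)
      a[i] = true;  // flip the first 0 bit to 1
   else
      throw new ArithmeticException("counter overflow");
}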
A simple amortized analysis can be used to determine the average time for the
increment operation on the array data structure by averaging, over the n operations,
the total time required to perform the operation n consecutive times.
The aggregate method finds the worst-case running time T (n) for a sequence
of n operations, and the amortized cost per operation is then taken to be T (n)/n,
regardless of whether there were several different types of operations in the
sequence. When the sequence of operations is not specified it is usually presumed
to be a worst-case sequence of operations on a newly created data structure.
An aggregate method analysis of the increment operation considers the total
time to perform the operation n consecutive times, which can be measured by
the total number of bits that get flipped. Note that bit 0 flips every time the
increment operation is performed, whereas bit 1 flips every second time. In
general, bit i flips every 2^i times the increment operation is performed, so the
total number of flips for n consecutive increment operations is

$$\sum_{i=0}^{k-1}\left\lfloor\frac{n}{2^i}\right\rfloor < 2n.$$

Counter A[7] A[6] A[5] A[4] A[3] A[2] A[1] A[0] Cost Total
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 1 1 1
2 0 0 0 0 0 0 1 0 2 3
3 0 0 0 0 0 0 1 1 1 4
4 0 0 0 0 0 1 0 0 3 7
5 0 0 0 0 0 1 0 1 1 8
6 0 0 0 0 0 1 1 0 2 10
7 0 0 0 0 0 1 1 1 1 11
8 0 0 0 0 1 0 0 0 4 15
9 0 0 0 0 1 0 0 1 1 16
10 0 0 0 0 1 0 1 0 2 18
11 0 0 0 0 1 0 1 1 1 19
12 0 0 0 0 1 1 0 0 3 22
13 0 0 0 0 1 1 0 1 1 23
14 0 0 0 0 1 1 1 0 2 25
15 0 0 0 0 1 1 1 1 1 26
16 0 0 0 1 0 0 0 0 5 31

Hence the average cost of each increment is at most 2n/n = 2, so the increment operation
requires constant time on average (and each increment flips on average fewer than 2 bits).
The accounting method uses a scheme of credits and debits to keep track of
the running time of a sequence of operations, assigning an amortized charge to
each operation. Interestingly, these charges are permitted to differ from their
real time cost, to allow for the fact that one type of operation might make more

work later for another operation and so should be charged more. So long as
the total real cost of the sequence of operations is bounded by the total of the
amortized costs, the amortized costs can be used instead of the real costs to
analyze an algorithm, removing the hassle of variability in the real costs from
the analysis. Conceptually the data structure functions like a bank holding a
balance of credit. Whenever a change is made its amortized cost is added to the
data structure’s credit (which should never drop into debt) and the real cost of
the operation is deducted from that balance.
Consider again the increment operation of the binary counter. The real cost
of the increment operation is variable due to the while loop that can iterate
a different number of times depending on the state of the data structure, and
is given by the total number of bits flipped. An amortization scheme for this
example could consider flipping a bit as a real cost of one dollar (a dollar in the
sense of some time cost unit). Whenever the increment operation sets a bit to 1
extra work needs to be done by a later increment operation to put it back to 0,
since the Increment algorithm only has extra work to do when it encounters
1 bits. To compensate for this each increment operation could be charged a
fixed amortized cost of two dollars. This is done to ensure that there will be
sufficient credit accumulated in the data structure to pay for future increment
operations that might have a higher real cost than this amortized cost. One

Counter A[7] A[6] A[5] A[4] A[3] A[2] A[1] A[0] Cost Balance
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 1 1 1
2 0 0 0 0 0 0 1 0 2 1
3 0 0 0 0 0 0 1 1 1 2
4 0 0 0 0 0 1 0 0 3 1
5 0 0 0 0 0 1 0 1 1 2
6 0 0 0 0 0 1 1 0 2 2
7 0 0 0 0 0 1 1 1 1 3
8 0 0 0 0 1 0 0 0 4 1
9 0 0 0 0 1 0 0 1 1 2
10 0 0 0 0 1 0 1 0 2 2
11 0 0 0 0 1 0 1 1 1 3
12 0 0 0 0 1 1 0 0 3 2
13 0 0 0 0 1 1 0 1 1 3
14 0 0 0 0 1 1 1 0 2 3
15 0 0 0 0 1 1 1 1 1 4
16 0 0 0 1 0 0 0 0 5 1

way to see that this is the case (and that the data structure won’t become
bankrupt) is to consider that for the two dollars in total paid by each increment
operation, one dollar is paid for actually flipping the least significant bit. If the
least-significant bit was 0 then the other dollar is credited to the data structure
(and imagined placed with the new 1 bit). If instead the least-significant bit
was 1 then the data structure already has a one dollar credit, which combined
with the other dollar gives the data structure two dollars to pay for flipping
the next bit to the left. This can be visualized by considering each 1 bit in the
data structure as corresponding to a credit of one dollar that the data structure
currently has. Thus a sequence of n consecutive increment operations would
have an amortized cost of 2n (the real cost might be slightly less), so each takes

on average a constant amortized time 2.


The accounting method is more involved than the aggregate method (which
effectively assigns the same cost to each type of operation), but is more adapt-
able when different operations are combined in a sequence. It requires a wise
choice of amortized costs that pay for each operation before the cost is incurred.
The potential method is closely related to the accounting method, where
instead of using credits and debits the data structure is considered to hold a
certain potential (energy) depending on its state, which can be released to pay
for future changes. Each state D of the data structure is assigned a potential
Φ (D), and the amortized cost of a change is taken to be the real time cost of
the change plus the change in potential Φ(D_after) − Φ(D_before) of the data
structure (analogously to conservation of energy). For convenience, the initial
potential of the data structure is usually taken to be 0 and other potentials
non-negative, so that after a sequence of operations the data structure ends up
with at least as much potential as when it started.
A suitable choice of potential function for the array data structure in the
increment example is to take its potential to be the number of 1 bits it holds
(note that this would correspond to the current credit of the data structure
via the accounting method analysis). Considering a sequence of n increment
operations, if t denotes the number of iterations of the while loop for one
particular increment operation in the sequence then t + 1 bits are changed by
the operation (unless there is an overflow), since t bits will be set to 0 and one
bit to 1. So the real time cost can be considered as t + 1. However, the change
in potential of the data structure is 1 − t, since the one bit set to 1 increases
the potential but the t bits set to 0 decrease the potential. Hence the suitable
amortized charge for the increment operation is (t + 1) + (1 − t) = 2. Since
the total amortized charge for the sequence of n increment operation is then 2n
and this equals the total real charge plus the change in potential (which is not
negative), an upper bound on the real charge is 2n, showing again that each
increment operation takes constant time on average.
As another example, consider the array implementation of a stack with push
and pop operations. When the capacity of the array is exceeded by a push
operation a new array of double the capacity is allocated and all the elements in
the previous array must be copied to it. The real cost of the pop operation is 1
(constant time to remove an element from the array), but of the push operation
is variable. If the capacity L has not been reached then the cost of a push
operation is 1 (a basic push), otherwise it is L + 1 (representing the time cost
of copying the elements in the stack to the new array and then performing a
basic push). Starting from an empty stack, suppose we want a bound on the
time cost of performing some sequence of n operations on the stack.
A worst-case analysis might consider that after n operations the stack could
hold up to n elements, and a push operation could cost up to n if the capacity
must be expanded. Hence the worst-case analysis would give a bound of n2 , but
this is clearly not a fair analysis since not all of the n operations could be O(n).
Since the cost of the push operation can vary an amortized analysis can give a
more accurate bound by considering the worst-case sequence of n consecutive
push operations. Using the aggregate method for this sequence and presuming
that initially the capacity is a poor L = 1 gives a total running time of:
$$T(n) = n + \left(1 + 2 + 4 + 8 + \cdots + \frac{n}{2} + n\right) < n + 2n = 3n,$$

Operation Elements on stack Capacity of stack Cost Total


initial 0 1 0 0
push(x0 ) 1 1 1 1
push(x1 ) 2 2 1+1 3
push(x2 ) 3 4 2+1 6
push(x3 ) 4 4 1 7
push(x4 ) 5 8 4+1 12
push(x5 ) 6 8 1 13
push(x6 ) 7 8 1 14
push(x7 ) 8 8 1 15
push(x8 ) 9 16 8+1 24
... ... ... ... ...
push(xn−2 ) n−1 n−1 1 ...
push(xn−1 ) n 2n − 2 (n − 1) + 1 T (n)

where the first term (n) is due to the storing of the pushed element in the array
and the other terms account for the (approximately log2 n) times the capacity is
expanded. The aggregate method thus shows that the total cost of any sequence
of n operations is at worst 3n, so the amortized cost of any single operation is
3 (constant time on average).
Alternatively, the accounting method could be used where the amortized cost
of the pop operation is taken to be 1 dollar (and real cost is 1), and the amortized
cost of the push operation is taken as 3 dollars (which has variable real cost).
To justify this choice and that the stack won’t fall into debt, note that for each
push operation 1 is spent in the basic push, 1 is placed as credit for the pushed
element to cover the cost of copying it to a larger array when the capacity is
next exceeded, and the remaining 1 is placed as credit on an element in the first
half of the array to cover the cost of copying it (since the credit allocated
to it when it was first added was used up in an earlier expansion). Using the
amortized costs the occasional expansion of capacity can now be ignored in the
analysis since its real cost is pre-paid by the push method. Hence the worst case
amortized cost for a sequence of n push operation is 3n, so each has average 3
(constant time). Another valid scheme for the accounting method would be to
assign 4 dollars to the push operation and 0 to the pop operation, so that the
push operation also pays in advance for any eventual pop of that element. One
suitable potential function for applying the potential method would be to take
the potential of a stack holding n elements and capacity L to be Φ = 0 if n < L/2
and Φ = 2n − L if n ≥ L/2.
It is instructive to compare this analysis with the analysis of an array im-
plementation of a stack that increases the capacity by a fixed amount K each
time (rather than doubling). Repeating the analysis with the aggregate method
gives a total running time of:
$$T(n) = n + (K + 2K + 3K + 4K + \cdots + (n - K) + n) = n + \frac{n(n/K + 1)}{2},$$
which is quadratic in n, so each push method would require linear time on
average. For this reason it is much smarter to double the capacity each time
rather than increase it by a fixed amount.
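For reference, a minimal sketch of the doubling strategy analyzed above (the class and method names are illustrative, and no underflow checking is included):

// An array-based stack whose push doubles the capacity when full
public class ArrayStack<E>
{  private Object[] data = new Object[1]; // initial capacity L = 1
   private int size = 0;

   public void push(E element)
   {  if (size == data.length)
      {  // capacity reached: copy all elements to an array of twice the size
         Object[] larger = new Object[2*data.length];
         System.arraycopy(data, 0, larger, 0, size);
         data = larger;
      }
      data[size++] = element; // the basic push, constant time
   }

   @SuppressWarnings("unchecked")
   public E pop()
   {  return (E)data[--size]; // constant time (no underflow check in sketch)
   }
}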
