Analysis of Algorithms
An algorithm maps an Input to an Output.
[Figure: running time vs. input size, with best case, average case and worst case curves]
We focus on the worst-case running time: it is easier to analyze, and it is crucial to applications such as games, finance and robotics.
[Lect01] Analysis of Algorithms
Experimental Studies
Write a program
implementing the
algorithm
Run the program with
inputs of varying size and
composition
Use a method like
System.currentTimeMillis() to
get an accurate measure
of the actual running time
Plot the results
[Figure: measured running time (ms) plotted against input size]
Limitations of Experiments
It is necessary to implement the
algorithm, which may be difficult
Results may not be indicative of the
running time on other inputs not included
in the experiment.
In order to compare two algorithms, the
same hardware and software
environments must be used
Theoretical Analysis
Uses a high-level description of the algorithm
(pseudocode) instead of an implementation
Characterizes running time as a function of
the input size, n.
Takes into account all possible inputs
Allows us to evaluate the speed of an
algorithm independent of the
hardware/software environment
Pseudocode (1.1)
High-level description of an algorithm
More structured than English prose
Less detailed than a program
Preferred notation for describing algorithms
Hides program design issues

Example: find the max element of an array

Algorithm arrayMax(A, n)
    Input: array A of n integers
    Output: maximum element of A
    currentMax ← A[0]
    for i ← 1 to n − 1 do
        if A[i] > currentMax then
            currentMax ← A[i]
    return currentMax
Primitive Operations
Basic computations performed by an algorithm
Identifiable in pseudocode
Largely independent from the programming language
Exact definition not important (we will see why later)
Assumed to take a constant amount of time
Examples:
performing an arithmetic operation
assigning a value to a variable
indexing into an array
calling a method
returning from a method
comparing two numbers
Counting Primitive Operations (1.1)
By inspecting the pseudocode, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size.

Algorithm arrayMax(A, n)                 # operations
    currentMax ← A[0]                    2
    for i ← 1 to n − 1 do                1 + n
        if A[i] > currentMax then        2(n − 1)
            currentMax ← A[i]            2(n − 1)
        { increment counter i }          2(n − 1)
    return currentMax                    1
                              Total      7n − 2
Growth Rates
Constant: 1
Logarithmic: log n
Linear: n
N-Log-N: n log n
Quadratic: n^2
Cubic: n^3
Exponential: 2^n
[Figure: log-log plot of T(n) against n for cubic, quadratic and linear growth rates]
In a log-log chart, the slope of the line corresponds to the growth rate of the function.
Constant Factors
The growth rate is not affected by constant factors or lower-order terms.
[Figure: log-log plot of T(n) showing two linear and two quadratic functions; functions differing only in constant factors or lower-order terms have the same slope]
Big-Oh Example
The function 2n + 10 is O(n):
2n + 10 ≤ cn
(c − 2)n ≥ 10
n ≥ 10/(c − 2)
Pick c = 3 and n0 = 10.
[Figure: log-log plot of 3n, 2n + 10 and n]
Big-Oh Example
The function n^2 is not O(n):
n^2 ≤ cn
n ≤ c
The above inequality cannot be satisfied, since c must be a constant.
[Figure: log-log plot of n^2, 100n, 10n and n]
Another example: 3n^3 + 20n^2 + 5 is O(n^3)
We need c > 0 and n0 ≥ 1 such that 3n^3 + 20n^2 + 5 ≤ c·n^3 for n ≥ n0.
This is true for c = 4 and n0 = 21.
Big-Oh Rules
If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e.,
1. drop the lower-order terms
2. drop the constant factors
Running Time
                1 second    1 minute    1 hour
O(n)            2,500       150,000     9,000,000
O(n log n)      4,096       166,666     7,826,087
O(n^2)          707         5,477       42,426
O(n^4)          31          88          244
O(2^n)          19          25          31
[Figure: input array X and prefix-average array A for the prefix averages example]
Arithmetic Progression
The running time of prefixAverages1 is O(1 + 2 + … + n).
The sum of the first n integers is n(n + 1)/2.
Thus, algorithm prefixAverages1 runs in O(n^2) time.
[Figure: bar chart of the arithmetic progression 1 + 2 + … + n]
Properties of logarithms:
log_b(xy) = log_b x + log_b y
log_b(x/y) = log_b x − log_b y
log_b(x^a) = a·log_b x
log_b a = (log_x a)/(log_x b)
Properties of exponentials:
a^(b+c) = a^b · a^c
a^(bc) = (a^b)^c
a^b / a^c = a^(b−c)
b = a^(log_a b)
b^c = a^(c·log_a b)
Example: 5n^2 is Θ(n^2).
Time Complexity
Time complexity refers to the use of asymptotic notation (O, Ω, Θ, o, ω) in denoting running time.
If two algorithms accomplishing the same task belong to two different time complexities:
Constant: 1 (fastest)
Logarithmic: log n
Linear: n
N-Log-N: n log n
Quadratic: n^2
Cubic: n^3
Exponential: 2^n (slowest)
The speed here refers to the speed in solving the problem, not the growth rate of time as mentioned earlier. A fast algorithm has a lower growth rate than a slow algorithm.
Lecture 02a
Review of
Basic Data Structures
TCP2101 ADA
This lecture reviews the following C++ STL containers: vector, set, multiset, map, multimap, stack and queue.
Output:
4 2 7 6
4 2 7 6
Output1:
-999
18
321
Enter an integer:
5
5 is NOT in set.
Use an iterator to iterate over the set.
Output2:
-999
18
321
Enter an integer:
321
321 is IN set.
Output:
-999
-999
18
321
multiset allows
duplicate keys
map<char,string>::iterator it;
char key;
cout << "Enter a char: ";
cin >> key;
it = m.find (key);
if (it == m.end())
    cout << key << " is NOT in map.";
else
    cout << key << " is IN map.";
}
Output 1:
Apple Boy Cat
A Apple
B Boy
C Cat
Enter a char: Z
Z is NOT in map
Output 2:
Apple Boy Cat
A Apple
B Boy
C Cat
Enter a char: C
C is IN map
map <char,string> m;
m.insert (pair<char,string>('A',"Apple"));
m.insert (pair<char,string>('A',"Angel")); // ignored: map keys must be unique
multimap<char,string>::iterator it;
char key;
cout << "Enter a char: ";
cin >> key;
it = mm.find (key);
if (it == mm.end())
    cout << key << " is NOT in map.";
else
    cout << key << " is IN map.";
}
Output 1:
A Apple
A Angel
B Boy
C Cat
Enter a char: Z
Z is NOT in map
Output 2:
A Apple
A Angel
B Boy
C Cat
Enter a char: C
C is IN map
Output:
Push result: 0 1 2 3 4
Pop result : 4 3 2 1 0
Output:
Push result:
front,back
0,0
0,1
0,2
0,3
0,4
Pop result:
front,back
0,4
1,4
2,4
3,4
4,4
Time complexities of the STL containers:

Container     | []                          | Insert                        | Remove                        | Find
vector        | Θ(1): can go to any valid   | O(n): insert at the beginning | O(n): remove at the beginning | O(n) if the target is
              | position directly           | requires shifting all         | requires shifting all         | the last item
              |                             | elements right by one         | elements left by one          |
set/multiset  | n/a                         | O(lg n)                       | O(lg n)                       | O(lg n)
map           | O(lg n)                     | O(lg n)                       | O(lg n)                       | O(lg n)
multimap      | n/a                         | O(lg n)                       | O(lg n)                       | O(lg n)
stack         | n/a                         | Θ(1): happens at the top      | Θ(1): happens at the top      | n/a
queue         | n/a                         | Θ(1): happens at the back     | Θ(1): happens at the front    | n/a
Lecture 02b
Hash Tables
A Node Struct
Template
template <typename T>
struct Node {
T info;
Node<T> *next;
};
The next pointer stores
the address of a Node
of the same type! This
means that each node
can point to another
node.
Linked List Advantages
Linked lists have two main advantages over arrays.
1. Linked lists waste less memory for a large number of elements.
In arrays, the wasted memory is the part of the array not being utilized.
In linked lists, the wasted memory is the pointer in each node.
Linked List Advantages (cont.)
2. Linked lists are faster than arrays on the following two operations:
inserting a new element at the start or middle of a linked list
removing an existing element from the start or middle of a linked list
Inserting at the Front of a Linked List
Node<T> *newNode = new Node<T>;   // create a new node
newNode->info = element;          // store element into the node
newNode->next = start;            // link the new node to the old front
start = newNode;                  // the new node becomes the front
Time Complexities for Linked List
insertFront — we insert at the head of the linked list: Θ(1).
Find/Delete — in the worst case, all nodes in the linked list are checked, so it is Θ(n) for an unordered list. E.g. finding the max/min must search the whole list.
isEmpty is Θ(1), because we just test the linked list to see if it is empty.
makeEmpty is Θ(n), because we need to delete all nodes.
Fast Search
A hash table uses a function of the key value of an element to identify its location in an array.
A search for an element can be done in Θ(1) time.
The function of the key value is called a hash function.
Hash Functions
The input into a hash function is a key
value
The output from a hash function is an
index of an array (hash table) where the
object containing the key is located
Example of a hash function:
h( k ) = k % 100
Example Using a
Hash Function
Suppose our hash function is:
h( k ) = k % 100
Inserting an Element
An element is inserted into a hash table
using the same hash function
h( k ) = k % 100
Collisions
Consider the hash function
h( k ) = k % 100
Birthday Paradox
n(p, N) ≈ sqrt(2N · ln(1/(1 − p)))
p = probability of collision
N = max number of entries in the hash table
n is the minimum number of entries to cause a collision (with probability p)
Let N = 500,000 and p = 0.5:
n = (2 × 500,000 × 0.693)^0.5
  ≈ 833 entries
Example: a hash table with 7 slots and hash function h(k) = k % 7.
INSERT object with key 31: 31 % 7 is 3 → slot 3.
INSERT object with key 9: 9 % 7 is 2 → slot 2.
INSERT object with key 36: 36 % 7 is 1 → slot 1.
INSERT object with key 42: 42 % 7 is 0 → slot 0.
INSERT object with key 46: 46 % 7 is 4 → slot 4.
INSERT object with key 20: 20 % 7 is 6 → slot 6.
INSERT object with key 2: 2 % 7 is 2 → COLLISION occurs (slot 2 already holds the object with key 9); the new object is chained at slot 2.
INSERT object with key 24: 24 % 7 is 3 → chained at slot 3 (with key 31).
FIND the object with key 9: 9 % 7 is 2 → search the chain at slot 2 until the object with key 9 is found.
Uniform Hashing
When the elements are spread evenly (or nearly evenly) among the indexes of a hash table, it is called uniform hashing.
If elements are spread evenly, such that the number of elements at an index is less than some small constant, uniform hashing allows a search to be done in Θ(1) time.
The hash function largely determines whether or not we will have uniform hashing.
Chaining Problem
Clustering phenomena occur in nature: some places are crowded while most places are empty.
Likewise, most slots may hold no entry while some slots have long chains.
The worst case is O(n), where n is the maximum length of a chain.
Additional memory is required during run time (for the chain nodes).
Speed vs. Memory Conservation
Speed comes from reducing the number of collisions.
In a search, if there are no collisions, the first element in the linked list is the one we want to find (fast).
Therefore, the greatest speed comes about by making the hash table much larger than the number of keys (but there will still be an occasional collision).
Speed vs. Memory Conservation (cont.)
Each empty LinkedList object in a hash table wastes 4 bytes of memory (4 bytes for the start pointer, on a 32-bit system).
The best memory conservation comes from trying to reduce the number of empty LinkedList objects.
The hash table size would be made much smaller than the number of keys (there would still be an occasional empty linked list).
Time Complexities for Hash Table
insert — we insert at the head of the linked list: Θ(1).
retrieve — the element is found by hashing, so it is Θ(1) for uniform hashing (the hash function and hash table are designed so that the length of the collision list is bounded by some small constant).
Reference
Childs, J. S. (2008). Methods for Making
Data Structures. C++ Classes and Data
Structures. Prentice Hall.
Lecture 03a
Binary Search Tree
Definition of Tree
A tree is a set of linked nodes, such that
there is one and only one path from a
unique node (called the root node) to
every other node in the tree.
A path exists from node A to node B if one
can follow a chain of pointers to travel
from node A to node B.
Paths
[Figure: a set of linked nodes A, B, D, E, F, illustrating paths between nodes]
Cycles
There is no cycle (circle of pointers) in a tree.
Any linked structure that has a cycle would have more than one path from the root node to another node.
Example of a Cycle
[Figure: nodes A, B, C, D, E with the cycle C → D → B → E → C]
Two paths exist from A to C:
1. A → C
2. A → C → D → B → E → C
Example of a Tree
[Figure: a tree with labeled nodes and a root pointer]
In a tree, every pair of linked nodes has a parent-child relationship (the parent is closer to the root).
For example, C is a parent of G; E and F are children of D.
Binary Trees
[Figure: examples of binary trees]
Levels
[Figure: a binary tree with levels 0 through 3 marked, starting from the root at level 0]
In a full binary tree, each node has two children except for the nodes on the last level, which are leaf nodes.
[Figure: a tree missing non-rightmost nodes on the last level]
[Figure: a tree missing rightmost nodes on the last level]
Properties of
Binary Search Trees
A binary search tree does not have to be a
complete binary tree.
For any particular node,
the key in its left child (if any) is less than its
key.
the key in its right child (if any) is greater than
or equal to its key.
The implementation
of a binary search
tree usually just
maintains a single
pointer in the private
section called root,
to point to the root
node.
Inserting Nodes Into a BST
The BST starts off empty (root = NULL).
Keys are inserted in this order: 37, 2, 45, 48, 41, 29, 20, 30, 49, 7.
37 becomes the root.
2 < 37, so 2 becomes the left child of 37.
45 > 37, so look to the right: 45 becomes the right child of 37.
48 > 37, 48 > 45: 48 becomes the right child of 45.
41 > 37, 41 < 45: 41 becomes the left child of 45.
29 < 37, 29 > 2: 29 becomes the right child of 2.
20 < 37, 20 > 2, 20 < 29: 20 becomes the left child of 29.
30 < 37, 30 > 2, 30 > 29: 30 becomes the right child of 29.
49 > 37, 49 > 45, 49 > 48: 49 becomes the right child of 48.
7 < 37, 7 > 2, 7 < 29, 7 < 20: 7 becomes the left child of 20.
Searching for a Key in a BST
Searching for a key in a BST uses the same logic as insertion.
Key to search for: 29 — 29 < 37 (go left), 29 > 2 (go right), 29 == 29: FOUND IT!
Key to search for: 3 — 3 < 37, 3 > 2, 3 < 29, 3 < 20, 3 < 7. When the child pointer you want to follow is set to NULL, the key you are looking for is not in the BST.
Time Complexities
If the binary search tree happens to be a complete binary tree:
the time for insertion is Θ(lg n)
the time for the search is O(lg n)
Search in an array is O(n).
Bad Luck
Exactly the same keys were inserted into this BST, but they were inserted in a different order: 2, 7, 20, 29, 30, 37, 41, 45, 48, 49.
Each key is greater than the one before it, so every new node becomes a right child and the tree degenerates into a chain.
This is some bad luck, but a BST can be formed this way.
Using the tightest possible big-oh notation, the insertion and search time is O(n).
Deletion Case 1: No Children
Node 49 has no children; to delete it, we just remove it.
Deletion Case 2: One Child
Node 48 has one child (49); to delete 48, we splice it out: its parent is linked directly to its child.
Another example: node 2 has one child; to delete it, we also splice it out.
Deletion Case 3: Two Children
To delete node 37, which has two children, we first find the greatest node in its left subtree: go to the left once, then follow the right pointers as far as we can.
30 is the greatest node in the left subtree of node 37; copy the object at node 30 into node 37.
Finally, we delete the lower node using case 1 or case 2 deletion.
Let's delete node 30 now: 29 is the greatest node in the left subtree of node 30.
Copy the object at node 29 into node 30.
This time, the lower node has a child, so to delete it we use case 2 deletion.
Traversing a BST
There are 3 ways to traverse a BST (visit every node in the BST):
1. Preorder (parent, left, right) — the root is output first
2. Inorder (left, parent, right) — the keys are output in sorted order
3. Postorder (left, right, parent) — the root is output last
Unbalanced BST:
Insertion is O(n)
Search is O(n)
Deletion is O(n)
References
Childs, J. S. (2008). Trees. C++ Classes
and Data Structures. Prentice Hall.
Lecture 03b
Priority Queues and Heaps
Priority Queue Implementation
To implement a priority queue, an array sorted in descending order comes to mind.
Dequeuing from a sorted array is easy: just get the value at the current front and increment a front index; this is Θ(1) time.
However, enqueuing into a sorted array would take some time: the element would have to be inserted into its proper position in the array.
Enqueuing an Element
Example: enqueue item 71 into a sorted array containing … 79 70 69 67 ….
Elements smaller than 71 are shifted one position toward the back, one at a time; this process continues, and the index i eventually reaches the slot just after 79, where 71 is stored.
Enqueuing an Element (cont.)
If we assume that, on average, half the elements in an array need to be shifted to insert an element, then the enqueue for an array is a Θ(n) algorithm.
In summary, when using an array for a priority queue:
dequeue is Θ(1)
enqueue (on average) is Θ(n)
Comparing Operations
So which is better, the heap or the array?
We often eliminate a data structure that has a high time complexity in a commonly used operation, even if the other operations have very low time complexities.
In the array, on average, an enqueue-dequeue pair of operations takes Θ(n) + Θ(1) time, but Θ(1) is absorbed into Θ(n), leaving us with an overall time complexity of Θ(n) per pair of operations.
Comparing Operations (cont.)
In the heap, each enqueue-dequeue pair of operations takes O(lg n) + O(lg n) time, giving us an overall time complexity of O(lg n) per pair of operations.
The heap is usually better, although the array can be good in situations where a group of initial elements are supplied and sorted, then only dequeue operations are performed (no enqueue operations).
Heaps
A heap is a complete binary tree in which the value of each node is greater than or equal to the values of its children (if any): Parent >= Children.
Example of a Heap
[Figure: a heap with root 46 and nodes 39, 28, 16, 14, 32, 29, 24]
Dequeue
Dequeuing the object with the greatest value appears to be a Θ(1) operation.
However, after removing the object, we must turn the resultant structure into a heap again, for the next dequeue.
Fortunately, it only takes O(lg n) time to turn the structure back into a heap again (which is why dequeue in a heap is an O(lg n) operation).
Dequeue (cont.)
remElement: 46 — the root holds the greatest value, so it is removed first.
The last node (holding 5) is removed, and its value is placed at the root.
Then we swap downwards, always with the greatest child: 39 > 5, so swap; at the next level, 32 > 5, so swap; then 29 > 5, so swap.
Sometimes, it is not necessary to swap all the way down through the heap.
Heapify
The process of swapping downwards to form a new heap is called heapifying.
When we heapify, it is important that the rest of the structure is a heap, except for the root node that we are starting off with; otherwise, a new heap won't be formed.
A loop is used for heapifying; the number of times through the loop is always lg n or less, which gives the O(lg n) complexity.
Each time we swap downwards, the number of nodes we can travel to is reduced by approximately half.
Enqueue
Value to enqueue: 37.
Create a new node in the last position and place the value to enqueue there.
If 37 is larger than its parent, swap: 37 > 24, so swap; then 37 > 28, so swap.
Notice that after each swap, 37 is guaranteed to be greater than or equal to its children, so it must also be greater than the other node at that level.
Finally, 37 < 39, so don't swap; the heap property is restored.
Implementing a Heap
Although it is helpful to think of a heap as a linked structure when visualizing the enqueue and dequeue operations, it is often implemented with an array.
Don't get mixed up between implementing a priority queue (PQ) and implementing a heap.
We discussed earlier that to implement a PQ, a heap is better than an array overall.
We are discussing implementing a heap here.
Let's number the nodes of a heap, starting with 0, going top to bottom and left to right.
[Figure: a heap with nodes numbered 0 through 16 in level order]
Heap Properties
[Figure: the heap with nodes numbered 0 through 16 in level order]
For any node, the index of its parent can be computed from its own index:
(parent #) = (child # - 1) / 2   (using integer division)
Array-Based Dequeue
remElement: 46; heapsize decreases from 17 to 16.
Child indexes are computed from the parent index:
(left child #) = 2*(parent #) + 1
(right child #) = 2*(parent #) + 2
For example, node 0 has children 2*0+1 = 1 and 2*0+2 = 2; node 1 has children 3 and 4; node 4 has children 9 and 10.
For node 9, the left child index 2*9+1 = 19 exceeds heapsize, so node 9 must be a leaf node.
Enqueue
Value to enqueue: 37.
Create a new node in the last position; we just pretend that 37 is placed there.
37 > 24, so we pretend to swap: we just copy 24 down (a one-assignment swap).
We now pretend 37 has been placed in 24's old position, and compare 37 to 28 to see if we should swap again.
37 > 28, so we do a one-assignment swap again.
Then we compare 37 to 39; this time we shouldn't swap, so 37 is finally assigned to its position.
Parent-Child Formulas
Parent-child formulas can also be sped up with a shift:
(left child #) = ((parent #) << 1) + 1
(right child #) = (left child #) + 1
When finding the greatest child, the left and right children are always found together.
[Figure sequence: building a heap in place from the array 7 12 39 38 5 32 1 34 27 16; subtrees that are already a heap are left unchanged, and one-assignment swaps move each value (held in temp) down to its final position]
Linked Heap
If we have a large element size and we don't know the max size, we might want to consider conserving memory and avoiding resizing the array by using a linked heap.
It should have the same time complexities as the array-based heap.
Reference
Childs, J. S. (2008). Methods for Making
Data Structures. C++ Classes and Data
Structures. Prentice Hall.
133
Lecture 04a
Graphs
Graphs
Graphs are a very general data structure
A graph consists of a set of nodes called
vertices
Any vertex can point to any number of other
vertices
It is possible that no vertex in the graph points to
any other vertex
Each vertex may point to every other vertex,
even if there are thousands of vertices
2
A Directed Graph (Digraph)
(diagram: vertices A, D, E, F)

An Undirected Graph
(diagram: vertices A, D, E, F)
Each edge points in both directions. Ex: A points to D and D points to A.
Another Digraph
(diagram: vertices A, B, C, E, F)
Nodes A and F point to each other; it is considered improper to draw a single undirected edge between them.
(example graph: vertices Node Town, Vertex City, Builders Paradise, Pointerburgh, Binary Tree Valley, with edge weights 8, 4, 4, 10, 2, 6)
Graph Implementation
A vertex A is said to be adjacent to a
vertex B if there is an edge pointing from A
to B
Graphs can be implemented in a couple of
popular ways:
adjacency matrix
adjacency list
Adjacency Matrix
Nodes are given a key from 0 to n − 1
If weight is not important, the adjacency matrix is a 2-dimensional array of bool (true or false) type variables
If weight is important, the adjacency matrix is a 2-dimensional array of int type variables
(example: undirected weighted graph with vertices Node 1 through Node 5 and edge weights 8, 4, 10, 2, 9)
The row numbers give the vertex number of the vertex that an edge is pointing from.
Example: Node 1 points to Node 2 (set to T); Node 2 also points to Node 1 for an undirected graph.
(matrix for the weighted graph: entries for unconnected pairs are left empty)
For a directed graph: Node 0 points to Node 1 (set to T), but Node 1 doesn't point to Node 0 (set to F).
(matrix for the directed weighted graph, with unconnected entries left empty)
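A minimal sketch of the weighted adjacency matrix described above; the struct name and the choice of 0 as the "no edge" marker are assumptions, not from the lecture:

```cpp
#include <vector>
#include <cassert>

// Adjacency matrix for a weighted undirected graph.
// w[u][v] holds the edge weight; 0 means "no edge" (an assumption --
// a real implementation might need a separate sentinel).
struct MatrixGraph {
    std::vector<std::vector<int>> w;
    explicit MatrixGraph(int n) : w(n, std::vector<int>(n, 0)) {}
    void addEdge(int u, int v, int weight) {
        w[u][v] = weight;   // edge u -> v
        w[v][u] = weight;   // undirected: also v -> u
    }
    bool adjacent(int u, int v) const { return w[u][v] != 0; }
};
```

Checking adjacency is a constant-time lookup, at the cost of n² space.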
Adjacency List
An adjacency list is an array of linked lists
The vertices are numbered 0 through n − 1
Each index in the array corresponds to the number of a vertex
The vertex (with an index number) is adjacent to every node in the linked list at that index
Directed weighted graph
(diagrams: the array of linked lists for vertices 0 through 5; each list node stores a vertex number and an edge weight)
Vertex 3 has an empty linked list; it has no links to any other vertices.
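A sketch of the adjacency-list representation for a directed weighted graph; dynamic arrays stand in for the linked lists of the slides, and the names are illustrative:

```cpp
#include <vector>
#include <utility>
#include <cassert>

// Adjacency list for a directed weighted graph.
// adj[u] holds (neighbor, weight) pairs for every edge leaving u;
// a vertex with no outgoing edges simply has an empty list.
struct ListGraph {
    std::vector<std::vector<std::pair<int,int>>> adj;
    explicit ListGraph(int n) : adj(n) {}
    void addEdge(int u, int v, int weight) {
        adj[u].push_back({v, weight});
    }
};
```

Space is proportional to n + m rather than n², at the cost of slower adjacency tests.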
Depth-first search
Breadth-First Search
Topological Sorting

Graph
A graph is a pair (V, E), where
V is a set of nodes, called vertices
E is a collection of pairs of vertices, called edges
Example: the vertices are airports (SFO, ORD, LAX, DFW, PVD, LGA, HNL, MIA) and the edges are flight routes
Edge Types
Directed edge: ordered pair of vertices (u, v); e.g., flight AA 1206 from ORD to PVD
Undirected edge: unordered pair of vertices (u, v); e.g., 849 miles between ORD and PVD
Directed graph: all the edges are directed
Undirected graph: all the edges are undirected
Applications
Electronic circuits
Transportation networks: highway network, flight network
Computer networks (e.g., cslab1a, cslab1b, math.brown.edu, cs.brown.edu, brown.edu, qwest.net, att.net, cox.net)
Databases: entity-relationship diagram (e.g., John, Paul, David)
Terminology
End vertices (or endpoints) of an edge: the two vertices joined by the edge
Adjacent vertices: vertices connected by an edge
Degree of a vertex: number of incident edges (in the figure, X has degree 5)
Parallel edges: edges sharing the same endpoints
Self-loop: an edge whose two endpoints coincide (in the figure, j is a self-loop)
Terminology (cont.)
Path: sequence of alternating vertices and edges; begins with a vertex; ends with a vertex; each edge is preceded and followed by its endpoints
Simple path: a path such that all its vertices and edges are distinct
Examples (see figure): P1 is a simple path; P2 is a path that is not simple
Terminology (cont.)
Cycle: circular sequence of alternating vertices and edges; each edge is preceded and followed by its endpoints
Simple cycle: a cycle such that all its vertices and edges are distinct
Examples:
C1 = (V, b, X, g, Y, f, W, c, U, a) is a simple cycle
C2 = (U, c, W, e, X, g, Y, f, W, d, V, a) is a cycle that is not simple
Properties
Notation:
n: number of vertices
m: number of edges
deg(v): degree of vertex v
Property 1: Σv deg(v) = 2m
Proof: each edge is counted twice
Property 2: in an undirected graph with no self-loops and no multiple edges, m ≤ n(n − 1)/2
Proof: each vertex has degree at most (n − 1)
Example: n = 4, m = 6, deg(v) = 3 for every vertex v
Asymptotic Performance
n vertices, m edges; no parallel edges; no self-loops. Bounds are big-Oh.

Operation           | Adjacency List      | Adjacency Matrix
Space               | n + m               | n²
incidentEdges(v)    | deg(v)              | n
areAdjacent(v, w)   | min(deg(v), deg(w)) | 1
insertVertex(o)     | 1                   | n²
insertEdge(v, w, o) | 1                   | 1
removeVertex(v)     | deg(v)              | n²
removeEdge(e)       | 1                   | 1
Depth-First Search: Outline and Reading
Depth-first search (6.3.1)
Algorithm
Example
Properties
Analysis
Subgraphs
A subgraph S of a graph G is a graph such that the vertices of S are a subset of the vertices of G and the edges of S are a subset of the edges of G
A spanning subgraph of G is a subgraph that contains all the vertices of G
Connectivity
A graph is connected if there is a path between every pair of vertices
A connected component of a graph G is a maximal connected subgraph of G

Trees and Forests
A (free) tree is an undirected graph T such that T is connected and T has no cycles
A forest is an undirected graph without cycles
The connected components of a forest are trees

Spanning Trees and Forests
A spanning tree of a connected graph is a spanning subgraph that is a tree
Depth-First Search
Depth-first search (DFS) is a general technique for traversing a graph
A DFS traversal of a graph G visits all the vertices and edges of G, determines whether G is connected, computes the connected components of G, and computes a spanning forest of G
Depth-first search is to graphs what the Euler tour is to binary trees
DFS Algorithm
The algorithm uses a mechanism for setting and getting labels of vertices and edges

Algorithm DFS(G)
   Input graph G
   Output labeling of the edges of G as discovery edges and back edges
   for all u ∈ G.vertices()
      setLabel(u, UNEXPLORED)
   for all e ∈ G.edges()
      setLabel(e, UNEXPLORED)
   for all v ∈ G.vertices()
      if getLabel(v) = UNEXPLORED
         DFS(G, v)

Algorithm DFS(G, v)
   Input graph G and a start vertex v of G
   Output labeling of the edges of G in the connected component of v as discovery edges and back edges
   setLabel(v, VISITED)
   for all e ∈ G.incidentEdges(v)
      if getLabel(e) = UNEXPLORED
         w ← opposite(v, e)
         if getLabel(w) = UNEXPLORED
            setLabel(e, DISCOVERY)
            DFS(G, w)
         else
            setLabel(e, BACK)
Graphs
16
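As a concrete sketch of the recursive DFS(G, v) routine, here is a minimal version over an adjacency list; a bool array plays the role of the UNEXPLORED/VISITED labels, and the recorded visit order is an illustrative extra:

```cpp
#include <vector>
#include <cassert>

// Recursive depth-first search over an unweighted adjacency list.
// visited[v] mirrors the pseudocode's vertex labels; order records
// the sequence in which vertices are first visited.
void dfs(const std::vector<std::vector<int>>& adj, int v,
         std::vector<bool>& visited, std::vector<int>& order) {
    visited[v] = true;           // setLabel(v, VISITED)
    order.push_back(v);
    for (int w : adj[v])         // for all incident edges
        if (!visited[w])         // w unexplored: discovery edge
            dfs(adj, w, visited, order);
}
```

Edge labels (DISCOVERY vs. BACK) are omitted here to keep the sketch short.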
Example
Legend: unexplored vertex; visited vertex; unexplored edge; discovery edge; back edge
(sequence of diagrams: DFS from A over vertices A, B, C, D)
We mark each
intersection, corner
and dead end (vertex)
visited
We mark each corridor
(edge) traversed
We keep track of the
path back to the
entrance (start vertex)
by means of a rope
(recursion stack)
Graphs
19
Properties of DFS
Property 1
DFS(G, v) visits all the
vertices and edges in
the connected
component of v
Property 2
The discovery edges
labeled by DFS(G, v)
form a spanning tree of
the connected
component of v
Graphs
20
Analysis of DFS
Setting/getting a vertex/edge label takes O(1) time
Each vertex is labeled twice: once as UNEXPLORED, once as VISITED
Each edge is labeled twice: once as UNEXPLORED, once as DISCOVERY or BACK
DFS runs in O(n + m) time provided the graph is represented by the adjacency list structure
Recall that Σv deg(v) = 2m
21
Breadth-First Search
L0
L1
L2
Graphs
C
E
D
F
22
Breadth-First Search
Breadth-first search (BFS) is a general technique for traversing a graph
A BFS traversal of a graph G visits all the vertices and edges of G, determines whether G is connected, computes the connected components of G, and computes a spanning forest of G
BFS Algorithm
The algorithm uses a mechanism for setting and getting labels of vertices and edges

Algorithm BFS(G)
   Input graph G
   Output labeling of the edges and partition of the vertices of G
   for all u ∈ G.vertices()
      setLabel(u, UNEXPLORED)
   for all e ∈ G.edges()
      setLabel(e, UNEXPLORED)
   for all v ∈ G.vertices()
      if getLabel(v) = UNEXPLORED
         BFS(G, v)

Algorithm BFS(G, s)
   L0 ← new empty sequence
   L0.insertLast(s)
   setLabel(s, VISITED)
   i ← 0
   while ¬Li.isEmpty()
      Li+1 ← new empty sequence
      for all v ∈ Li.elements()
         for all e ∈ G.incidentEdges(v)
            if getLabel(e) = UNEXPLORED
               w ← opposite(v, e)
               if getLabel(w) = UNEXPLORED
                  setLabel(e, DISCOVERY)
                  setLabel(w, VISITED)
                  Li+1.insertLast(w)
               else
                  setLabel(e, CROSS)
      i ← i + 1
Graphs
24
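The level-by-level pseudocode above can be sketched with a single queue instead of explicit sequences Li; the level array below records, for each vertex, the index i of the layer Li that contains it (an assumption of this sketch, matching the distance in edges from the start vertex):

```cpp
#include <vector>
#include <queue>
#include <cassert>

// Queue-based BFS over an adjacency list.
// level[v] = -1 marks v as unexplored; otherwise it is the
// index of the BFS layer containing v.
std::vector<int> bfsLevels(const std::vector<std::vector<int>>& adj, int s) {
    std::vector<int> level(adj.size(), -1);
    std::queue<int> q;
    level[s] = 0;
    q.push(s);
    while (!q.empty()) {
        int v = q.front(); q.pop();
        for (int w : adj[v])
            if (level[w] == -1) {        // unexplored: discovery edge
                level[w] = level[v] + 1; // w belongs to the next layer
                q.push(w);
            }
    }
    return level;
}
```

Edges to already-labeled vertices correspond to the CROSS edges of the pseudocode.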
Example
Legend: unexplored vertex; visited vertex; unexplored edge; discovery edge; cross edge
(sequence of diagrams: BFS from A building levels L0, L1, L2 over vertices A, B, C, D, E, F)
Properties
Notation: Gs is the connected component of s
Property 1: BFS(G, s) visits all the vertices and edges of Gs
Property 2: the discovery edges labeled by BFS(G, s) form a spanning tree Ts of Gs
Property 3: for each vertex v in Li, the path of Ts from s to v has i edges, and every path from s to v in Gs has at least i edges
Analysis
Setting/getting a vertex/edge label takes O(1) time
Each vertex is labeled twice: once as UNEXPLORED, once as VISITED
Each edge is labeled twice: once as UNEXPLORED, once as DISCOVERY or CROSS
BFS runs in O(n + m) time provided the graph is represented by the adjacency list structure
Recall that Σv deg(v) = 2m
29
Applications
Using the template method pattern, we can specialize the BFS traversal of a graph G to solve the following problems in O(n + m) time:
compute the connected components of G
compute a spanning forest of G
find a simple cycle in G, or report that G has no cycles
given two vertices of G, find a path between them with the minimum number of edges, or report that no such path exists
DFS vs. BFS
Applications: spanning forest, connected components, paths, and cycles can be computed with either DFS or BFS; shortest paths with BFS; biconnected components with DFS
(diagrams: the DFS tree and the BFS levels L0, L1, L2 for the same graph on vertices A, B, C, D, E, F)
Back edge (v, w): w is an ancestor of v in the tree of discovery edges
Cross edge (v, w): w is in the same level as v or in the next level
Directed Acyclic Graphs (DAGs)
A directed acyclic graph (DAG) is a digraph that has no directed cycles
A topological ordering of a digraph is a numbering v1, ..., vn of the vertices such that for every edge (vi, vj), we have i < j
(diagrams: a DAG G and a topological ordering of G)
Topological Sorting
Number vertices, so that (u, v) in E implies u < v
(example DAG of daily tasks, numbered 1 through 11: wake up, eat, study computer sci., more c.s., work out, play, write c.s. program, make cookies for professors, nap, sleep, dream about graphs)
Topological Sorting Algorithm using DFS
Simulate the algorithm by using depth-first search

Algorithm topologicalDFS(G)
   Input dag G
   Output topological ordering of G
   n ← G.numVertices()
   for all u ∈ G.vertices()
      setLabel(u, UNEXPLORED)
   for all e ∈ G.edges()
      setLabel(e, UNEXPLORED)
   for all v ∈ G.vertices()
      if getLabel(v) = UNEXPLORED
         topologicalDFS(G, v)
O(n + m) time.

Algorithm topologicalDFS(G, v)
   Input graph G and a start vertex v of G
   Output labeling of the vertices of G in the connected component of v
   setLabel(v, VISITED)
   for all e ∈ G.incidentEdges(v)
      if getLabel(e) = UNEXPLORED
         w ← opposite(v, e)
         if getLabel(w) = UNEXPLORED
            setLabel(e, DISCOVERY)
            topologicalDFS(G, w)
         else
            {e is a forward or cross edge}
   Label v with topological number n
   n ← n − 1
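The DFS-based numbering can be sketched as follows; a vertex receives its number only after all of its out-neighbors, so every edge (u, v) satisfies num[u] < num[v] (function names are illustrative):

```cpp
#include <vector>
#include <cassert>

// Assign topological numbers in 1..n to the vertices of a DAG.
// Each vertex is numbered after all of its out-neighbors, using
// a shared counter that counts down from n.
void topoDFS(const std::vector<std::vector<int>>& adj, int v,
             std::vector<bool>& visited, int& counter,
             std::vector<int>& num) {
    visited[v] = true;
    for (int w : adj[v])
        if (!visited[w])
            topoDFS(adj, w, visited, counter, num);
    num[v] = counter--;   // label v with the current topological number
}

std::vector<int> topologicalSort(const std::vector<std::vector<int>>& adj) {
    int n = adj.size();
    std::vector<bool> visited(n, false);
    std::vector<int> num(n, 0);
    int counter = n;
    for (int v = 0; v < n; ++v)
        if (!visited[v])
            topoDFS(adj, v, visited, counter, num);
    return num;
}
```

The sketch assumes the input really is a DAG; cycle detection is not shown.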
Lecture 05a
Selectionsort, Heapsort &
Quicksort
Sorting
Sorting is the process of placing elements in
order
If elements are objects, they are ordered by a
particular data member
There are many algorithms for sorting, each
having its own advantages
No sorting algorithm is better than every other
sorting algorithm all the time
Choosing the right sorting algorithm for a
problem requires careful consideration
2
Selectionsort
Idea: Find the largest number from the unsorted
numbers, place it at the correct location.
Array: 3 6 5 2 4
n=5
i = 4: 3 4 5 2 6
i = 3: 3 4 2 5 6
i = 2: 3 2 4 5 6
i = 1: 2 3 4 5 6 (sorted)
3
Selectionsort (cont.)
void selectionSort (int a[], int n) {
for (int i = n-1; i > 0; i--) {
// find the max element in the unsorted a[0 .. i]
int maxIndex = i; // assume the max is the last element
// test against elements before i to find the largest
for (int j = 0; j < i; j++) {
// if this element is larger, then it is the new max
if (a[j] > a[maxIndex])
// found new max; remember its index
maxIndex = j;
}
// maxIndex is the index of the max element,
// swap it with the current position
if (maxIndex != i)
swap (a[i], a[maxIndex]);
}
}
4
Selectionsort Time Complexity
By the same analysis as the prefixAverages1 algorithm in Lecture 1 (the "for" loops of the two algorithms are similar), Selectionsort runs in Θ(n²) time
Heapsort
Algorithm Heapsort(S, n)
   Input sequence S of n elements
   Output sorted sequence S
   pq ← priority queue(S)
   for i ← n − 1 downto 0
      S[i] ← pq.dequeue()
Heapsort (cont.)
S = 3 6 5 2 4: pq = 6 4 5 2 3
i = 4: pq = 5 4 3 2: S = 3 6 5 2 6
i = 3: pq = 4 2 3: S = 3 6 5 5 6
i = 2: pq = 3 2: S = 3 6 4 5 6
i = 1: pq = 2: S = 3 3 4 5 6
i = 0: pq = : S = 2 3 4 5 6 (sorted)
11
Heapsort (cont.)
The previous Heapsort algorithm requires
2 arrays to work. The second array is used
to create a heap (priority queue)
Instead of create a heap separately, we
can actually create a heap on the original
array
Lec03b, section Forming an Initial Heap
12
Heapsort (cont.)
S = 3 6 5 2 4
Forming an Initial Heap:
S/pq = 6 4 5 2 3 (S and pq share the same array)
i = 4: S = 5 4 3 2 6 (pq = 5 4 3 2)
i = 3: S = 4 2 3 5 6 (pq = 4 2 3)
i = 2: S = 3 2 4 5 6 (pq = 3 2)
i = 1: S = 2 3 4 5 6 (pq = 2)
i = 0: S = 2 3 4 5 6 (sorted)
13
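The in-place variant traced above can be sketched with a sift-down routine: first heapify the array bottom-up, then repeatedly swap the maximum to the end and shrink the heap (names are illustrative):

```cpp
#include <vector>
#include <utility>
#include <cassert>

// Restore the max-heap property for the subtree rooted at i
// within a[0 .. n-1].
void siftDown(std::vector<int>& a, int i, int n) {
    while (true) {
        int largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i) break;
        std::swap(a[i], a[largest]);
        i = largest;
    }
}

// In-place heapsort: the heap and the sorted output share the array.
void heapsort(std::vector<int>& a) {
    int n = a.size();
    for (int i = n / 2 - 1; i >= 0; --i)   // form the initial heap
        siftDown(a, i, n);
    for (int i = n - 1; i > 0; --i) {      // repeatedly dequeue the max
        std::swap(a[0], a[i]);             // max goes to position i
        siftDown(a, 0, i);                 // re-heapify a[0 .. i-1]
    }
}
```

This avoids the second array used by the two-array formulation.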
Heapsort Time Complexity
Heapsort runs in Θ(n lg n) time on average
It has a best-case time of Θ(n) if all elements have the same value
This is an unusual case, but we may want to consider it if many of the elements have the same value
14
Quicksort
Time complexities of Quicksort:
best: Θ(n lg n)
average: Θ(n lg n)
worst: Θ(n²)
Functions of Quicksort
A recursive function, called quicksort
A nonrecursive function, usually called
partition
Partition
pivot: 28 (the last element)
Before: 10 14 28 10 35 46 47 38 11 11 28
After:  10 11 14 28 10 11 28 38 47 35 46
All elements less than or equal to the pivot are on its left; all elements greater than the pivot are on its right side.
Partition (cont.)
(sequence of diagrams: quicksort recursively partitions the section to the left of the pivot and the section to the right of the pivot; partition is not called for sections of size 0 or 1, which are already in place)
Final array: 10 10 11 11 14 28 28 35 38 46 47
Partition (cont.)
(sequence of diagrams: the array is divided into a partitioned section and an unpartitioned section; i marks the end of the small-side part, and j scans the unpartitioned section)
When arr[ j ] <= arr[ r ], i is incremented and arr[ i ] is swapped with arr[ j ]; then j is incremented. When arr[ j ] > arr[ r ], only j is incremented. j isn't incremented past r − 1.
Partition (cont.)
partition( arr, p, r )
   i = p − 1
   for all j from p to r − 1
      if arr[ j ] <= arr[ r ]
         i++
         swap( arr[ i ], arr[ j ] )
   swap( arr[ i + 1 ], arr[ r ] )
   return i + 1

Partition is passed the array arr, the beginning index p of the array section it is working with, and the ending index r of that section. i is used to mark the end of the small-side part of the partition; initially there isn't one, so i is set to p − 1 (−1 if p is 0). The loop iterates through the elements, comparing them to the pivot arr[ r ].
71
Partition (cont.)
p
10
14
28
10
46
47
35
38
11
1
2
partition( arr, p, r )
i=p1
4
5
6
7
8
11
28
j is currently 7,
with partition
having already
iterated through
elements 0
through 6
72
Partition (cont.)
p
10
14
28
10
46
47
35
38
11
1
2
3
4
5
6
7
8
11
28
partition( arr, p, r )
i=p1
for all j from p to r 1
73
Partition (cont.)
p
10
14
28
10
46
47
35
38
11
1
2
3
4
5
6
7
8
11
28
partition( arr, p, r )
i=p1
for all j from p to r 1
if arr[ j ] <= arr[ r ]
i++
swap( arr[ i ], arr[ j ] )
swap ( arr[ i + 1 ], arr[ r ] )
return i + 1
74
Partition (cont.)
p
10
14
28
10
47
35
46
38
11
1
2
3
4
5
6
7
8
11
28
partition( arr, p, r )
i=p1
for all j from p to r 1
if arr[ j ] <= arr[ r ]
i++
75
Partition (cont.)
p
10
14
28
10
47
35
46
38
11
1
2
partition( arr, p, r )
i=p1
4
5
6
7
8
11
28
76
Partition (cont.)
p
10
14
28
10
47
35
46
38
11
1
2
3
4
5
6
7
8
11
28
partition( arr, p, r )
i=p1
for all j from p to r 1
77
Partition (cont.)
p
10
14
28
10
47
35
46
38
11
1
2
partition( arr, p, r )
i=p1
4
5
6
7
8
11
28
78
Partition (cont.)
p
10
14
28
10
47
35
46
38
11
1
2
3
4
5
6
7
8
11
28
partition( arr, p, r )
i=p1
for all j from p to r 1
79
Partition (cont.)
p
10
14
28
10
47
35
46
38
11
1
2
3
4
5
6
7
8
11
28
partition( arr, p, r )
i=p1
for all j from p to r 1
if arr[ j ] <= arr[ r ]
i++
swap( arr[ i ], arr[ j ] )
swap ( arr[ i + 1 ], arr[ r ] )
return i + 1
80
Partition (cont.)
p
10
14
28
10
11
35
46
38
47
1
2
3
4
5
6
7
8
11
28
partition( arr, p, r )
i=p1
for all j from p to r 1
if arr[ j ] <= arr[ r ]
i++
81
Partition (cont.)
p
r
11
10
14
28
10
11
35
46
38
47
1
2
partition( arr, p, r )
i=p1
4
5
6
7
8
28
82
Partition (cont.)
p
r
11
10
14
28
10
11
35
46
38
47
1
2
3
4
5
6
7
8
28
partition( arr, p, r )
i=p1
for all j from p to r 1
83
Partition (cont.)
p
r
11
10
14
28
10
11
35
46
38
47
1
2
3
4
5
6
7
8
28
partition( arr, p, r )
i=p1
for all j from p to r 1
if arr[ j ] <= arr[ r ]
i++
swap( arr[ i ], arr[ j ] )
swap ( arr[ i + 1 ], arr[ r ] )
return i + 1
84
Partition (cont.)
p
10
11
14
28
10
11
46
38
47
35
28
1
2
3
4
5
6
7
8
partition( arr, p, r )
i=p1
for all j from p to r 1
if arr[ j ] <= arr[ r ]
i++
85
Partition (cont.)
p
10
11
14
28
10
11
46
38
47
35
28
1
2
partition( arr, p, r )
i=p1
4
5
6
7
8
j isnt incremented
past r - 1
86
Partition (cont.)
p
10
11
14
28
10
11
28
38
47
35
46
1
2
3
4
5
6
partition( arr, p, r )
i=p1
for all j from p to r 1
if arr[ j ] <= arr[ r ]
i++
swap( arr[ i ], arr[ j ] )
return i + 1
87
Partition (cont.)
p
10
11
14
28
10
11
28
38
47
35
46
1
2
3
4
5
6
7
partition( arr, p, r )
i=p1
for all j from p to r 1
if arr[ j ] <= arr[ r ]
i++
swap( arr[ i ], arr[ j ] )
swap ( arr[ i + 1 ], arr[ r ] )
return i + 1
The Quicksort Function
1 quicksort( arr, p, r )
2    if p < r
3       pi = partition( arr, p, r )
4       quicksort( arr, p, pi − 1 )
5       quicksort( arr, pi + 1, r )
Since quicksort is called recursively, it also passes in the array arr, the beginning index p of the section it is working with, and the ending index r of the section it is working with.
89
The Quicksort
Function (cont.)
1 quicksort( arr, p, r )
2 if p < r
3 pi = partition( arr, p, r )
4 quicksort( arr, p, pi 1 )
5 quicksort( arr, pi + 1, r )
90
The Quicksort
Function (cont.)
1 quicksort( arr, p, r )
2 if p < r
3 pi = partition( arr, p, r )
4 quicksort( arr, p, pi 1 )
5 quicksort( arr, pi + 1, r )
pi
r
91
The Quicksort
Function (cont.)
pi-1
pi
1 quicksort( arr, p, r )
2 if p < r
3 pi = partition( arr, p, r )
4 quicksort( arr, p, pi 1 )
5 quicksort( arr, pi + 1, r )
quicksort is called
recursively here,
working with the
section on the left of
the pivot
92
The Quicksort
Function (cont.)
pi+1
pi-1
pi
1 quicksort( arr, p, r )
2 if p < r
3 pi = partition( arr, p, r )
4 quicksort( arr, p, pi 1 )
5 quicksort( arr, pi + 1, r )
The Quicksort
Function (cont.)
1 quicksort( arr, p, r )
2 if p < r
3 pi = partition( arr, p, r )
4 quicksort( arr, p, pi 1 )
5 quicksort( arr, pi + 1, r )
94
The Quicksort
Function (cont.)
1 quicksort( arr, p, r )
2 if p < r
3 pi = partition( arr, p, r )
4 quicksort( arr, p, pi 1 )
5 quicksort( arr, pi + 1, r )
95
The Quicksort
Function (cont.)
1 quicksort( arr, p, r )
2 if p < r
3 pi = partition( arr, p, r )
4 quicksort( arr, p, pi 1 )
5 quicksort( arr, pi + 1, r )
96
The Quicksort
Function (cont.)
1 quicksort( arr, p, r )
2 if p < r
3 pi = partition( arr, p, r )
4 quicksort( arr, p, pi 1 )
5 quicksort( arr, pi + 1, r )
97
The Quicksort
Function (cont.)
1 quicksort( arr, p, r )
2 if p < r
3 pi = partition( arr, p, r )
4 quicksort( arr, p, pi 1 )
5 quicksort( arr, pi + 1, r )
98
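The partition and quicksort pseudocode above translate directly to C++; this sketch keeps the same names and the last-element pivot (no randomization):

```cpp
#include <vector>
#include <utility>
#include <cassert>

// Partition as in the pseudocode: pivot is arr[r]; i marks the end
// of the <= pivot section; the pivot ends up at index i + 1.
int partition(std::vector<int>& arr, int p, int r) {
    int i = p - 1;
    for (int j = p; j <= r - 1; ++j)
        if (arr[j] <= arr[r]) {
            ++i;
            std::swap(arr[i], arr[j]);
        }
    std::swap(arr[i + 1], arr[r]);
    return i + 1;
}

// Recursive quicksort on the section arr[p .. r].
void quicksort(std::vector<int>& arr, int p, int r) {
    if (p < r) {
        int pi = partition(arr, p, r);
        quicksort(arr, p, pi - 1);   // left of the pivot
        quicksort(arr, pi + 1, r);   // right of the pivot
    }
}
```

A production version would pick the pivot randomly to avoid the Θ(n²) worst case on sorted input.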
Learning Outcome
Merge-sort (4.1.1)
Summary of Sorting Algorithms
Radix-Sort (4.5.2)
Quick-Select ( 4.7)
Merge-Sort
Merge-sort is a sorting algorithm based on the divide-and-conquer paradigm
Like heap-sort: it uses a comparator and runs in O(n log n) time
Unlike heap-sort: it does not use an auxiliary priority queue, and it accesses data in a sequential manner

Merge-Sort (cont.)
Merge-sort on an input sequence S with n elements consists of three steps:
Divide: partition S into two sequences S1 and S2 of about n/2 elements each
Recur: recursively sort S1 and S2
Conquer: merge S1 and S2 into a unique sorted sequence
Algorithm Merge-Sort
Input Parameters: array a, start index p, end index r.
Output Parameter: array a.
Mergesort (a, p, r) {
// if only one element, just return.
if (p == r)
return
// Divide: divide a into two nearly equal parts.
m = (p + r) / 2
// Recur: sort each half.
Mergesort (a, p, m)
Mergesort (a, m + 1, r)
// Conquer: merge the two sorted halves.
Merge (a, p, m, r)
}
5
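The pseudocode above can be sketched in C++; the Merge step, which the pseudocode leaves abstract, is filled in here with a temporary buffer (an assumption of this sketch):

```cpp
#include <vector>
#include <cassert>

// Merge the sorted halves a[p .. m] and a[m+1 .. r] back into a.
void merge(std::vector<int>& a, int p, int m, int r) {
    std::vector<int> tmp;
    int i = p, j = m + 1;
    while (i <= m && j <= r)
        tmp.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i <= m) tmp.push_back(a[i++]);
    while (j <= r) tmp.push_back(a[j++]);
    for (int k = 0; k < (int)tmp.size(); ++k)
        a[p + k] = tmp[k];
}

void mergesort(std::vector<int>& a, int p, int r) {
    if (p >= r) return;          // zero or one element: already sorted
    int m = (p + r) / 2;         // Divide into two nearly equal parts
    mergesort(a, p, m);          // Recur: sort each half
    mergesort(a, m + 1, r);
    merge(a, p, m, r);           // Conquer: merge the sorted halves
}
```

Taking a[i] when the keys are equal keeps the sort stable.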
Merge-Sort Tree
An execution of merge-sort is depicted by a binary tree
(example tree: 7 2 9 4 → 2 4 7 9; children 7 2 → 2 7 and 9 4 → 4 9; leaves 7, 2, 9, 4)
7
Execution Example
(sequence of merge-sort tree diagrams: 7 2 9 4 3 8 6 1 is split into 7 2 9 4 and 3 8 6 1, each half is split and sorted recursively into 2 4 7 9 and 1 3 6 8, and the sorted halves are merged into 1 2 3 4 6 7 8 9)
Analysis of Merge-Sort
The height h of the merge-sort tree is O(log n): at each recursive call we divide the sequence in half
The overall amount of work done at the nodes of depth i is O(n): we partition and merge 2^i sequences of size n/2^i
Thus, the total running time of merge-sort is O(n log n)
Summary of Sorting Algorithms

Algorithm      | Time                 | Notes
selection-sort | O(n²)                | in-place; slow (good for small inputs)
quick-sort     | O(n log n) expected  | in-place, randomized; fastest (good for large inputs); O(n²) worst case if not randomized
heap-sort      | O(n log n)           | in-place; fast (good for large inputs)
merge-sort     | O(n log n)           | sequential data access; fast (good for large inputs)
Bucket-Sort (4.5.1)
Let S be a sequence of n (key, element) items with keys in the range [0, N − 1]
Bucket-sort uses the keys as indices into an auxiliary array B of sequences (buckets)
Phase 1: Empty sequence S by moving each item (k, o) into its bucket B[k]
Phase 2: For i = 0, ..., N − 1, move the items of bucket B[i] to the end of sequence S
Analysis:
Phase 1 takes O(n) time
Phase 2 takes O(n + N) time
Bucket-sort takes O(n + N) time

Algorithm bucketSort(S, N)
   Input sequence S of (key, element) items with keys in the range [0, N − 1]
   Output sequence S sorted by increasing keys
   B ← array of N empty sequences
   while ¬S.isEmpty()
      f ← S.first()
      (k, o) ← S.remove(f)
      B[k].insertLast((k, o))
   for i ← 0 to N − 1
      while ¬B[i].isEmpty()
         f ← B[i].first()
         (k, o) ← B[i].remove(f)
         S.insertLast((k, o))
21
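The two phases can be sketched directly; vectors stand in for the sequences, and string payloads are an illustrative choice:

```cpp
#include <vector>
#include <utility>
#include <string>
#include <cassert>

// Bucket-sort of (key, element) items with keys in [0, N-1].
// Stable, because each bucket preserves the items' original order.
std::vector<std::pair<int, std::string>>
bucketSort(const std::vector<std::pair<int, std::string>>& S, int N) {
    std::vector<std::vector<std::pair<int, std::string>>> B(N);
    for (const auto& item : S)          // Phase 1: scatter into buckets
        B[item.first].push_back(item);
    std::vector<std::pair<int, std::string>> out;
    for (int i = 0; i < N; ++i)         // Phase 2: gather in key order
        for (const auto& item : B[i])
            out.push_back(item);
    return out;
}
```

Stability is what makes bucket-sort usable as the per-dimension pass inside radix-sort.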
Example
Key range [0, 9]
7, d
1, c
3, a
7, g
3, b
7, e
Phase 1
1, c
B
3, a
3, b
7, d
7, g
7, e
Phase 2
1, c
3, a
3, b
7, d
7, g
7, e
22
Radix-Sort (4.5.2)
Radix-sort is a specialization of lexicographic-sort that uses bucket-sort as the stable sorting algorithm in each dimension
Radix-sort is applicable to tuples where the keys in each dimension i are integers in the range [0, N − 1]
Radix-sort runs in O(d(n + N)) time

Algorithm radixSort(S, N)
   Input sequence S of d-tuples such that (0, ..., 0) ≤ (x1, ..., xd) and (x1, ..., xd) ≤ (N − 1, ..., N − 1) for each tuple (x1, ..., xd) in S
   Output sequence S sorted in lexicographic order
   for i ← d downto 1
      bucketSort(S, N)   {stable sort on the i-th dimension}
23
Radix-Sort for Binary Numbers
Consider a sequence of n b-bit integers x = x_{b−1} ... x_1 x_0
We represent each element as a b-tuple of integers in the range [0, 1] and apply radix-sort with N = 2
This application of the radix-sort algorithm runs in O(bn) time
For example, we can sort a sequence of 32-bit integers in linear time

Algorithm binaryRadixSort(S)
   Input sequence S of b-bit integers
   Output sequence S sorted
   replace each element x of S with the item (0, x)
   for i ← 0 to b − 1
      replace the key k of each item (k, x) of S with bit x_i of x
      bucketSort(S, 2)
24
Example
Sorting a sequence of 4-bit integers: 1001, 0010, 1101, 0001, 1110
(columns show the sequence after each stable bucket-sort pass on bits 0 through 3)
Final sorted sequence: 0001, 0010, 1001, 1101, 1110
Quick-Select (4.7)
Quick-select is a randomized selection algorithm based on the prune-and-search paradigm:
Prune: pick a random element x (the pivot) and partition S into L (elements less than x), E (elements equal to x), and G (elements greater than x)
Search: if k ≤ |L|, recurse on L; if |L| < k ≤ |L| + |E|, the answer is x; otherwise recurse on G with k ← k − |L| − |E|
Algorithm Quick-Select
Input Parameters: array a, start index p, end index r,
target index k.
Output Parameter: a[k] at the correct position.
QuickSelect (a, p, r, k) {
if (p < r) {
pi = Partition (a, p, r) // pivot index.
if (k == pi)
return
if (k < pi)
QuickSelect (a, p, pi - 1, k)
else
QuickSelect (a, pi + 1, r, k)
}
}
28
Partition
The partition step of Quick-Select is the same partition used in Quick-Sort, which takes O(n) time.
Based on probabilistic facts, Quick-Select runs in O(n) expected time.
29
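The QuickSelect pseudocode above can be sketched with the same array-based Lomuto partition used for quicksort (the function names here are illustrative); it places the k-th smallest element, 0-indexed, at position k:

```cpp
#include <vector>
#include <utility>
#include <cassert>

// Lomuto partition on a[p .. r] with pivot a[r]; returns the
// pivot's final index.
int partitionQS(std::vector<int>& a, int p, int r) {
    int i = p - 1;
    for (int j = p; j < r; ++j)
        if (a[j] <= a[r]) std::swap(a[++i], a[j]);
    std::swap(a[i + 1], a[r]);
    return i + 1;
}

// After the call, a[k] holds the k-th smallest element (0-indexed).
void quickSelect(std::vector<int>& a, int p, int r, int k) {
    if (p < r) {
        int pi = partitionQS(a, p, r);
        if (k == pi) return;          // pivot landed at the target index
        if (k < pi) quickSelect(a, p, pi - 1, k);
        else        quickSelect(a, pi + 1, r, k);
    }
}
```

Only one side of each partition is recursed into, which is what makes the expected time linear.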
Quick-Select Visualization
An execution of quick-select can be visualized by a
recursion path
k=5, S=(7 4 9 3 2 6 5 1 8)
k=2, S=(7 4 9 6 5 8)
k=2, S=(7 4 6 5)
k=1, S=(7 6 5)
5
30
Iterative substitution
Recursion trees
The master method

Divide-and-Conquer
Divide-and-conquer is a general algorithm design paradigm:
Divide: divide the input data S into disjoint subsets S1, S2, ...
Recur: solve the subproblems recursively
Conquer: combine the solutions for S1, S2, ... into a solution for S

Merge-Sort Review
Merge-sort on an input sequence S with n elements consists of three steps:
Divide: partition S into two sequences S1 and S2 of about n/2 elements each
Recur: recursively sort S1 and S2
Conquer: merge S1 and S2 into a unique sorted sequence
Merge-Sort Review
Input Parameters: array a, start index p, end index r.
Output Parameter: array a.
Mergesort (a, p, r) {
// if only one element, just return.
if (p == r)
return
// Divide: divide a into two nearly equal parts.
m = (p + r) / 2
// Recur: sort each half.
Mergesort (a, p, m)
Mergesort (a, m + 1, r)
// Conquer: merge the two sorted halves.
Merge (a, p, m, r)
}
4
Recurrence Equation Analysis
The conquer step of merge-sort consists of merging two sorted sequences, each with n/2 elements; implemented by means of a doubly linked list, it takes at most bn steps, for some constant b. Likewise, the basis case (n < 2) will take at most b steps. Therefore, if we let T(n) denote the running time of merge-sort:

T(n) = b               if n < 2
T(n) = 2T(n/2) + bn    if n ≥ 2

We need a closed-form solution: that is, a solution that has T(n) only on the left-hand side.
Iterative Substitution
In the iterative substitution, or "plug-and-chug," technique, we iteratively apply the recurrence equation to itself and see if we can find a pattern:

T(n) = 2T(n/2) + bn
     = 2(2T(n/2²) + b(n/2)) + bn
     = 2²T(n/2²) + 2bn
     = 2³T(n/2³) + 3bn
     = 2⁴T(n/2⁴) + 4bn
     = ...
     = 2ⁱT(n/2ⁱ) + ibn

Note that the base case, T(n) = b, occurs when 2ⁱ = n; that is, i = log n. So,

T(n) = bn + bn log n

Thus, T(n) is O(n log n).
6
The Recursion Tree
T(n) = b if n < 2; T(n) = 2T(n/2) + bn if n ≥ 2

depth | Ts  | size  | time
0     | 1   | n     | bn
1     | 2   | n/2   | bn
i     | 2ⁱ  | n/2ⁱ  | bn
...
Each level contributes bn time, and there are O(log n) levels, so T(n) is O(n log n).
Master Method
Many divide-and-conquer recurrence equations have the form:
T(n) = c                if n < d
T(n) = aT(n/b) + f(n)   if n ≥ d

The Master Theorem:
1. If f(n) is O(n^(log_b a − ε)), then T(n) is Θ(n^(log_b a))
2. If f(n) is Θ(n^(log_b a) logᵏ n), then T(n) is Θ(n^(log_b a) log^(k+1) n)
3. If f(n) is Ω(n^(log_b a + ε)) and a f(n/b) ≤ δ f(n) for some δ < 1, then T(n) is Θ(f(n))

Examples:
T(n) = 4T(n/2) + n: log_b a = 2, so T(n) is Θ(n²) (case 1)
T(n) = 2T(n/2) + n log n: log_b a = 1, so T(n) is Θ(n log² n) (case 2)
T(n) = T(n/3) + n log n: log_b a = 0, so T(n) is Θ(n log n) (case 3)
T(n) = 8T(n/2) + n²: log_b a = 3, so T(n) is Θ(n³) (case 1)
T(n) = 9T(n/3) + n³: log_b a = 2, so T(n) is Θ(n³) (case 3)
T(n) = T(n/2) + 1 (binary search): log_b a = 0, so T(n) is Θ(log n) (case 2)
T(n) = 2T(n/2) + log n (heap construction): log_b a = 1, so T(n) is Θ(n) (case 1)
Integer Multiplication
Algorithm: Multiply two n-bit integers I and J.
Divide step: split I and J into high-order and low-order halves:
I = I_h 2^(n/2) + I_l
J = J_h 2^(n/2) + J_l
Then:
I * J = (I_h 2^(n/2) + I_l) * (J_h 2^(n/2) + J_l)
      = I_h J_h 2^n + I_h J_l 2^(n/2) + I_l J_h 2^(n/2) + I_l J_l
This gives four subproblems of size n/2.

An Improved Integer Multiplication Algorithm
Algorithm: Multiply two n-bit integers I and J.
Split I and J as before, but define the subproblems differently:
I * J = I_h J_h 2^n + [(I_h − I_l)(J_l − J_h) + I_h J_h + I_l J_l] 2^(n/2) + I_l J_l
      = I_h J_h 2^n + [(I_h J_l − I_l J_l − I_h J_h + I_l J_h) + I_h J_h + I_l J_l] 2^(n/2) + I_l J_l
      = I_h J_h 2^n + (I_h J_l + I_l J_h) 2^(n/2) + I_l J_l
This gives only three subproblems of size n/2.

Integer multiplication example
1234 × 5678 = (12×100 + 34)(56×100 + 78)
            = 10,000(12×56) + 100[(12×78) + (34×56)] + (34×78)
4 multiplications needed: 12×56, 12×78, 34×56, 34×78
18
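The three-multiplication identity can be checked on machine integers, splitting by a power of 10 as in the 1234 × 5678 example; the base and the function shape are illustrative assumptions (a real implementation would recurse on arbitrary-precision digits):

```cpp
#include <cstdint>
#include <cassert>

// One Karatsuba step: multiply I and J after splitting each at
// "half" (e.g. half = 100 for 4-digit operands), using only three
// half-size multiplications.
std::int64_t karatsubaStep(std::int64_t I, std::int64_t J,
                           std::int64_t half) {
    std::int64_t Ih = I / half, Il = I % half;
    std::int64_t Jh = J / half, Jl = J % half;
    std::int64_t hh = Ih * Jh;   // high * high
    std::int64_t ll = Il * Jl;   // low * low
    // (Ih - Il)(Jl - Jh) + hh + ll == Ih*Jl + Il*Jh
    std::int64_t mid = (Ih - Il) * (Jl - Jh) + hh + ll;
    return hh * half * half + mid * half + ll;
}
```

The middle coefficient is recovered from the other two products, which is exactly the trick that drops the fourth multiplication.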
19
Quick-Sort
(quick-sort tree example: 7 4 9 6 2 → 2 4 6 7 9)
Quick-sort is a sorting algorithm based on the divide-and-conquer paradigm:
Divide: pick a random element x (called the pivot) and partition S into L (elements less than x), E (elements equal to x), and G (elements greater than x)
Recur: sort L and G
Conquer: join L, E and G

Algorithm QuickSort
1 quicksort( arr, p, r )
2    if p < r
3       pi = partition( arr, p, r )
4       quicksort( arr, p, pi − 1 )
5       quicksort( arr, pi + 1, r )
22
Partition
We partition an input sequence as follows:

Algorithm partition(S, p)
   Input sequence S, position p of pivot
   Output subsequences L, E, G of the elements of S less than, equal to, or greater than the pivot, resp.
   L, E, G ← empty sequences
   x ← S.remove(p)
   while ¬S.isEmpty()
      y ← S.remove(S.first())
      if y < x
         L.insertLast(y)
      else if y = x
         E.insertLast(y)
      else { y > x }
         G.insertLast(y)
   return L, E, G
23
Quick-Sort Tree
An execution of quick-sort is depicted by a binary tree
(example tree: 7 4 9 6 2 → 2 4 6 7 9, with subtrees 4 2 → 2 4 and 7 9 → 7 9)

Execution Example
(sequence of quick-sort tree diagrams showing pivot selection, partition, and the recursive calls on L and G)

Worst-Case Running Time
The worst case for quick-sort occurs when the pivot is the unique minimum or maximum element; then one of L and G has size n − 1 and the other has size 0. The running time is proportional to n + (n − 1) + ... + 2 + 1, so the worst-case running time of quick-sort is O(n²).
Lecture 07
Greedy Algorithms
Making Change
Problem: A dollar amount to reach and a collection of coin denominations to use to get there.
Configuration: A dollar amount yet to return to a customer, plus the coins already returned.
Objective function: Minimize the number of coins returned.
Greedy solution: Always return the largest coin you can.
Example 1: Coins are valued $.32, $.08, $.01.
The Fractional Knapsack Problem
Given: a set S of n items, with each item i having
  bi - a positive benefit
  wi - a positive weight
Objective: maximize the total benefit of the amounts taken,
  Σ_{i∈S} b_i (x_i / w_i)
Constraint: the total amount taken is at most W,
  Σ_{i∈S} x_i ≤ W
Example
Given: A set S of n items, with each item i having bi (a positive benefit) and wi (a positive weight); knapsack capacity 10 ml.

Item:            1      2      3      4      5
Weight:        4 ml   8 ml   2 ml   6 ml   1 ml
Benefit:        $12    $32    $40    $30    $50
Value ($/ml):     3      4     20      5     50

Solution: 1 ml of item 5, 2 ml of item 3, 6 ml of item 4, 1 ml of item 2.
Since Σ_{i∈S} b_i (x_i / w_i) = Σ_{i∈S} (b_i / w_i) x_i
Run time: O(n log n). Why? (Use a heap-based priority queue keyed on the values v_i; each removal of the highest-value item takes O(log n) time.)

Algorithm fractionalKnapsack(S, W)
  Input: set S of items w/ benefit bi and weight wi; max. weight W
  Output: amount xi of each item i to maximize benefit w/ weight at most W
  for each item i in S
    xi ← 0
    vi ← bi / wi    {value}
  w ← 0    {total weight}
  while w < W
    remove item i w/ highest vi
    xi ← min{wi, W - w}
    w ← w + min{wi, W - w}
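The greedy loop above can be sketched in Python. This is a minimal version that returns only the total benefit (the function name and (benefit, weight) pair representation are mine):

```python
import heapq

def fractional_knapsack(items, capacity):
    """items: list of (benefit, weight) pairs; returns the maximum
    total benefit, taking items greedily in decreasing value
    (benefit per unit weight), possibly fractionally."""
    # Max-heap on value, simulated by negating the key
    heap = [(-b / w, b, w) for b, w in items]
    heapq.heapify(heap)
    total, remaining = 0.0, capacity
    while heap and remaining > 0:
        neg_v, b, w = heapq.heappop(heap)   # item with highest value
        take = min(w, remaining)
        total += (b / w) * take
        remaining -= take
    return total
```

On the example above (benefits $12, $32, $40, $30, $50; weights 4, 8, 2, 6, 1 ml; capacity 10 ml) this takes 1 ml of item 5, 2 ml of item 3, 6 ml of item 4, and 1 ml of item 2, for a benefit of $124.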
Shortest Paths
[Figure: weighted example graph on vertices A, B, C, D, E with edge weights 2, 3, 4, 5, 7, 8 and shortest-path distances from A]
Outline:
Algorithm
Edge relaxation
Weighted Graphs
In a weighted graph, each edge has an associated numerical value, called the weight of the edge.
Edge weights may represent distances, costs, etc.
Example: [Figure: graph of US airports (SFO, PVD, ORD, LGA, HNL, LAX, DFW, MIA) with flight distances as edge weights]
Applications: [Figure: the same airport graph; e.g., finding the route of least total distance between two airports]
Property 2:
There is a tree of shortest paths from a start vertex to all the other vertices.
Example: [Figure: tree of shortest paths from Providence (PVD) in the airport graph]
Dijkstra's Algorithm
The distance of a vertex v from a vertex s is the length of a shortest path between s and v.
Dijkstra's algorithm computes the distances of all the vertices from a given start vertex s.
Assumptions: the graph is connected and the edge weights are nonnegative.

Edge Relaxation
Consider an edge e = (u,z) such that u is the vertex most recently added to the cloud and z is not in the cloud.
The relaxation of edge e updates the distance of z as follows:
d(z) ← min{d(z), d(u) + weight(e)}
[Figure: with d(u) = 50, relaxing e improves d(z) from 75 to 60]
Example
[Figures: successive snapshots of Dijkstra's algorithm on a graph with vertices A-F and edge weights 2, 3, 4, 5, 7, 8; the cloud grows one vertex at a time and the distance labels shrink as incident edges are relaxed]
Dijkstra's Algorithm
A priority queue stores the vertices outside the cloud
  Key: distance
  Element: vertex
Locator-based methods:
  insert(k,e) returns a locator
  replaceKey(l,k) changes the key of an item

Algorithm DijkstraDistances(G, s)
  Q ← new heap-based priority queue
  for all v ∈ G.vertices()
    if v = s
      setDistance(v, 0)
    else
      setDistance(v, ∞)
    l ← Q.insert(getDistance(v), v)
    setLocator(v, l)
  while ¬Q.isEmpty()
    u ← Q.removeMin()
    for all e ∈ G.incidentEdges(u)
      { relax edge e }
      z ← G.opposite(u, e)
      r ← getDistance(u) + weight(e)
      if r < getDistance(z)
        setDistance(z, r)
        Q.replaceKey(getLocator(z), r)
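The same algorithm can be sketched in Python with the standard-library heap. Since heapq has no replaceKey, this sketch uses lazy deletion (push a new entry and skip stale ones) instead of the slide's locator-based priority queue; the adjacency-dict representation is my own choice:

```python
import heapq

def dijkstra(graph, s):
    """graph: dict mapping vertex -> list of (neighbor, weight).
    Returns a dict of shortest-path distances from s."""
    dist = {v: float('inf') for v in graph}
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:          # stale queue entry; skip it
            continue
        for z, w in graph[u]:
            r = d + w            # relax edge (u, z)
            if r < dist[z]:
                dist[z] = r
                heapq.heappush(pq, (r, z))
    return dist
```

Lazy deletion keeps the same O((n + m) log n) bound up to the extra stale entries, at the cost of a slightly larger heap.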
Analysis
Graph operations: incidentEdges is called once for each vertex.
Label operations: setting or getting a distance/locator label takes O(1) time.
Each vertex is inserted once into and removed once from the priority queue, where each insertion or removal takes O(log n) time.
The key of a vertex w in the priority queue is modified at most deg(w) times, where each key change takes O(log n) time.
Recall that Σ_v deg(v) = 2m.
Dijkstra's algorithm therefore runs in O((n + m) log n) time; the running time can also be expressed as O(m log n) since the graph is connected.
[Figure: final distance labels computed by Dijkstra's algorithm on the example graph]
[Figure: a variant of the graph containing an edge of weight -8; Dijkstra's greedy strategy can fail with negative edge weights, since a vertex may be added to the cloud before a cheaper path through a negative-weight edge is discovered]
Minimum Spanning Trees
[Figure: weighted graph of US airports (BOS, PVD, ORD, JFK, SFO, LAX, DFW, BWI, MIA) with flight distances as edge weights]
Outline:
Definitions
A crucial fact

Definitions
Spanning subgraph: subgraph of a graph G containing all the vertices of G
Spanning tree: spanning subgraph that is itself a (free) tree
Minimum spanning tree (MST): spanning tree of a weighted graph with minimum total edge weight
[Figure: example graph on ORD, PIT, DEN, STL, DCA, DFW, ATL with edge weights 3, 4, 7, 8, 9, 10 and its MST highlighted]
Applications:
Communications networks
Transportation networks
Cycle Property
Cycle Property:
  Let T be a minimum spanning tree of a weighted graph G
  Let e be an edge of G that is not in T and let C be the cycle formed by e with T
  For every edge f of C, weight(f) ≤ weight(e)
Proof:
  By contradiction
  If weight(f) > weight(e) we can get a spanning tree of smaller weight by replacing f with e
[Figure: the cycle C formed by e with T, edge weights 2, 3, 4, 7, 8, 9; replacing f with e yields a better spanning tree]
Partition Property
Partition Property:
  Consider a partition of the vertices of G into subsets U and V
  Let e be an edge of minimum weight across the partition
  There is a minimum spanning tree of G containing edge e
Proof:
  Let T be an MST of G
  If T does not contain e, consider the cycle C formed by e with T and let f be an edge of C across the partition
  By the cycle property, weight(f) ≤ weight(e)
  Thus, weight(f) = weight(e)
  We obtain another MST by replacing f with e
[Figure: partition into U and V with crossing edges of weights 4, 7, 8, 8, 9; swapping f for the minimum crossing edge e gives another MST]
Prim's Algorithm
Input: A non-empty connected weighted graph with
vertices V and edges E (the weights can be negative).
Initialize: Vnew = {x}, where x is an arbitrary node (starting
point) from V, Enew = {}
Repeat until Vnew = V:
Choose an edge {u, v} with minimal weight such that u is
in Vnew and v is not (if there are multiple edges with the
same weight, any of them may be picked)
Add v to Vnew, and {u, v} to Enew
26
Prim-Jarnik's Algorithm
Similar to Dijkstra's algorithm (for a connected graph).
We pick an arbitrary vertex s and we grow the MST as a cloud of vertices, starting from s.
We store with each vertex v a label d(v) = the smallest weight of an edge connecting v to a vertex in the cloud.
At each step:
  We add to the cloud the vertex u outside the cloud with the smallest distance label
  We update the labels of the vertices adjacent to u

A priority queue stores the vertices outside the cloud
  Key: distance
  Element: vertex
Locator-based methods:
  insert(k,e) returns a locator
  replaceKey(l,k) changes the key of an item
Three labels per vertex: distance, parent edge in MST, locator in priority queue

Algorithm PrimJarnikMST(G)
  Q ← new heap-based priority queue
  s ← a vertex of G
  for all v ∈ G.vertices()
    if v = s
      setDistance(v, 0)
    else
      setDistance(v, ∞)
    setParent(v, ∅)
    l ← Q.insert(getDistance(v), v)
    setLocator(v, l)
  while ¬Q.isEmpty()
    u ← Q.removeMin()
    for all e ∈ G.incidentEdges(u)
      z ← G.opposite(u, e)
      r ← weight(e)
      if r < getDistance(z)
        setDistance(z, r)
        setParent(z, e)
        Q.replaceKey(getLocator(z), r)
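A compact Python sketch of the same cloud-growing idea, again using lazy deletion in place of replaceKey; it returns only the total MST weight, and assumes an undirected graph given as an adjacency dict with each edge listed in both directions:

```python
import heapq

def prim_mst_weight(graph):
    """graph: dict vertex -> list of (neighbor, weight), undirected.
    Returns the total weight of a minimum spanning tree."""
    start = next(iter(graph))
    visited = {start}                       # the "cloud"
    pq = [(w, start, z) for z, w in graph[start]]
    heapq.heapify(pq)
    total = 0
    while pq and len(visited) < len(graph):
        w, u, z = heapq.heappop(pq)         # lightest edge out of the cloud
        if z in visited:
            continue                        # stale entry; skip
        visited.add(z)                      # add z to the cloud via (u, z)
        total += w
        for y, wy in graph[z]:
            if y not in visited:
                heapq.heappush(pq, (wy, z, y))
    return total
```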
Example
[Figures: successive snapshots of Prim-Jarnik's algorithm on a graph with vertices A-F and edge weights 2, 3, 4, 5, 7, 8; the MST cloud grows one vertex at a time as the distance labels are updated]
Dijkstra vs Prim
Algorithm DijkstraDistances(G, s)
  Q ← new heap-based priority queue
  for all v ∈ G.vertices()
    if v = s
      setDistance(v, 0)
    else
      setDistance(v, ∞)
    l ← Q.insert(getDistance(v), v)
    setLocator(v, l)
  while ¬Q.isEmpty()
    u ← Q.removeMin()
    for all e ∈ G.incidentEdges(u)
      { relax edge e }
      z ← G.opposite(u, e)
      r ← getDistance(u) + weight(e)    {total path cost}
      if r < getDistance(z)
        setDistance(z, r)
        Q.replaceKey(getLocator(z), r)

Algorithm PrimJarnikMST(G)
  Q ← new heap-based priority queue
  s ← a vertex of G
  for all v ∈ G.vertices()
    if v = s
      setDistance(v, 0)
    else
      setDistance(v, ∞)
    setParent(v, ∅)
    l ← Q.insert(getDistance(v), v)
    setLocator(v, l)
  while ¬Q.isEmpty()
    u ← Q.removeMin()
    for all e ∈ G.incidentEdges(u)
      z ← G.opposite(u, e)
      r ← weight(e)    {single edge weight only}
      if r < getDistance(z)
        setDistance(z, r)
        setParent(z, e)
        Q.replaceKey(getLocator(z), r)

The essential difference: Dijkstra's key is the total path cost getDistance(u) + weight(e), while Prim-Jarnik's key is the single edge weight weight(e).
Dijkstra vs Prim
[Figure: the same small graph under both algorithms; the shortest-path tree from node a (edges of weight 5 and 9) differs from the minimum spanning tree]
Min Spanning Tree: how to connect all nodes with the least total cost. The graph must be connected.
  E.g., minimize TNB power-line transmission cost, or railway line cost: save material/construction cost.
  You can start at any node; the minimum spanning tree is the same.
Dijkstra: shortest path to all nodes from an individual node.
  Specific to each node: the tree is different for node A and node C, i.e., it differs depending on which node is the root (starting point).
  How to reach all nodes with the least cost from that node: save time and fuel.
Analysis
Graph operations: incidentEdges is called once for each vertex.
Label operations: setting or getting a label takes O(1) time.
Each vertex is inserted once into and removed once from the priority queue, where each insertion or removal takes O(log n) time.
The key of a vertex w in the priority queue is modified at most deg(w) times, where each key change takes O(log n) time.
Recall that Σ_v deg(v) = 2m, so Prim-Jarnik's algorithm runs in O((n + m) log n) time with an adjacency-list representation.
Kruskal's Algorithm
A priority queue stores the edges outside the cloud
  Key: weight
  Element: edge

Algorithm KruskalMST(G)
  for each vertex v in G do
    define a Cloud(v) of {v}
  let Q be a priority queue; insert all edges into Q using their weights as the key
  T ← ∅
  while T has fewer than n - 1 edges do
    edge e ← Q.removeMin()
    let u, v be the endpoints of e
    if Cloud(v) ≠ Cloud(u) then
      add edge e to T
      merge Cloud(v) and Cloud(u)
  return T
Kruskal vs Prim
Algorithm PrimJarnikMST(G)
  Q ← new heap-based priority queue
  s ← a vertex of G
  for all v ∈ G.vertices()
    if v = s
      setDistance(v, 0)
    else
      setDistance(v, ∞)
    setParent(v, ∅)
    l ← Q.insert(getDistance(v), v)
    setLocator(v, l)
  while ¬Q.isEmpty()
    u ← Q.removeMin()
    { remove vertex by vertex and examine the incident edges }
    for all e ∈ G.incidentEdges(u)
      z ← G.opposite(u, e)
      r ← weight(e)
      if r < getDistance(z)
        setDistance(z, r)
        setParent(z, e)
        Q.replaceKey(getLocator(z), r)

Algorithm KruskalMST(G)
  for each vertex v in G do
    define a Cloud(v) of {v}
  let Q be a priority queue; insert all edges into Q using their weights as the key
  T ← ∅
  while T has fewer than n - 1 edges do
    { remove edge by edge }
    edge e ← Q.removeMin()
    let u, v be the endpoints of e
    if Cloud(v) ≠ Cloud(u) then
      add edge e to T
      merge Cloud(v) and Cloud(u)
  return T
Kruskal vs Prim
If the algorithm is stopped before completion:
  Prim: always one connected tree
  Kruskal: one connected tree, or a forest with multiple trees
Representation of a Partition
Each set is stored in a sequence
Each element has a reference back to the set
Partition-Based Implementation
A partition-based version of Kruskal's Algorithm performs cloud merges as unions and tests as finds.

Algorithm Kruskal(G):
  Input: A weighted graph G.
  Output: An MST T for G.
  Let P be a partition of the vertices of G, where each vertex forms a separate set.
  Let Q be a priority queue storing the edges of G, sorted by their weights.
  Let T be an initially-empty tree.
  while Q is not empty do
    (u,v) ← Q.removeMinElement()
    if P.find(u) ≠ P.find(v) then
      Add (u,v) to T
      P.union(u,v)
  return T

Running time: O((n + m) log n)
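The union-as-merge, test-as-find idea can be sketched in Python with an array-based union-find; sorting the edge list stands in for the priority queue, and the 0..n-1 vertex labelling is my own convention:

```python
def kruskal(n, edges):
    """n vertices labelled 0..n-1; edges: list of (weight, u, v).
    Returns (total MST weight, list of accepted edges)."""
    parent = list(range(n))

    def find(x):                      # find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0, []
    for w, u, v in sorted(edges):     # edges in increasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                  # endpoints in different clouds
            parent[ru] = rv           # union: merge the two clouds
            total += w
            tree.append((u, v))
    return total, tree
```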
Kruskal Example
[Figures: step-by-step run of Kruskal's algorithm on the US-airport graph (BOS, PVD, ORD, JFK, SFO, LAX, DFW, BWI, MIA) with flight distances as edge weights (144, 184, 187, 337, 621, 740, 802, 849, 867, 946, 1090, 1121, 1235, 1258, 1391, 1464, 1846, 2342, 2704); edges are accepted in increasing weight order whenever their endpoints lie in different clouds]
[Slide: running-time comparison of Kruskal and Dijkstra, including an O(|V|^2) implementation]
Lecture 08
Dynamic Programming
Learning Outcome
Matrix Chain-Product (5.3.1)
The General Technique (5.3.2)
0-1 Knapsack Problem (5.3.3)
Transitive closure (6.4.2)
Matrix Chain-Products
Dynamic Programming is a general algorithm design paradigm.
Review: matrix multiplication
  C = A*B
  A is d × e and B is e × f
  C[i,j] = Σ_{k=0}^{e-1} A[i,k] · B[k,j]
  Computing C takes O(d·e·f) time
[Figure: the (i,j) entry of C is the dot product of row i of A and column j of B]
Matrix Chain-Products
Matrix Chain-Product:
  Compute A = A0*A1*...*An-1
  Ai is di × di+1
  Problem: How to parenthesize?
Example
  B is 3 × 100
  C is 100 × 5
  D is 5 × 5
  (B*C)*D takes (3 × 100 × 5) + (3 × 5 × 5) = 1500 + 75 = 1575 ops
  B*(C*D) takes (3 × 100 × 5) + (100 × 5 × 5) = 1500 + 2500 = 4000 ops
An Enumeration Approach
Matrix Chain-Product Alg.: try all possible ways to parenthesize A = A0*A1*...*An-1, calculate the number of ops for each one, and pick the best.
Running time: the number of parenthesizations equals the number of binary trees with n external nodes (a Catalan number), which is exponential, almost 4^n.
A Greedy Approach
Idea #1: repeatedly select the product that uses (up) the most operations.
Counter-example:
  A is 10 × 5
  B is 5 × 10
  C is 10 × 5
  D is 5 × 10
  Greedy idea #1 gives (A*B)*(C*D), which takes (10x5x10)+(10x10x10)+(10x5x10) = 500+1000+500 = 2000 ops
  A*((B*C)*D) takes (10x5x10)+(5x10x5)+(5x5x10) = 500+250+250 = 1000 ops

Another Greedy Approach
Idea #2: repeatedly select the product that uses the fewest operations.
Counter-example:
  A is 101 × 11
  B is 11 × 9
  C is 9 × 100
  D is 100 × 99
  Greedy idea #2 gives A*((B*C)*D), which takes 109989+9900+108900 = 228789 ops
  (A*B)*(C*D) takes 9999+89991+89100 = 189090 ops
A Recursive Approach
Define subproblems:
  Find the best parenthesization of Ai*Ai+1*...*Aj.
  Let Ni,j denote the minimum number of operations done by this subproblem.
A Dynamic Programming Algorithm
Since subproblems overlap, we don't use recursion. Instead, we construct optimal subproblems bottom-up.
The Ni,i's are easy, so start with them; then do subproblems of length 2, 3, ..., and so on.
Running time: O(n^3)

Algorithm matrixChain(S):
  Input: sequence S of n matrices to be multiplied
  Output: number of operations in an optimal parenthesization of S
  for i ← 1 to n - 1 do
    Ni,i ← 0
  for b ← 1 to n - 1 do
    for i ← 0 to n - b - 1 do
      j ← i + b
      Ni,j ← +∞
      for k ← i to j - 1 do
        Ni,j ← min{Ni,j , Ni,k + Nk+1,j + di dk+1 dj+1}
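The bottom-up loop can be sketched directly in Python; the dimension-list convention (matrix A_i is d[i] × d[i+1]) matches the slide, while the function name is mine:

```python
def matrix_chain(d):
    """d: list of dimensions; matrix A_i is d[i] x d[i+1].
    Returns the minimum number of scalar multiplications."""
    n = len(d) - 1                       # number of matrices
    N = [[0] * n for _ in range(n)]      # N[i][i] = 0 to start
    for b in range(1, n):                # b = j - i, done by diagonals
        for i in range(n - b):
            j = i + b
            N[i][j] = min(N[i][k] + N[k + 1][j] + d[i] * d[k + 1] * d[j + 1]
                          for k in range(i, j))
    return N[0][n - 1]
```

On the B, C, D example above (d = [3, 100, 5, 5]) this returns 1575, and on the greedy counter-example (d = [10, 5, 10, 5, 10]) it returns 1000.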
A Dynamic Programming Algorithm Visualization
Ni,j = min_{i ≤ k < j} {Ni,k + Nk+1,j + di dk+1 dj+1}
The bottom-up construction fills in the N array by diagonals.
Ni,j gets values from previous entries in the i-th row and j-th column.
Filling in each entry in the N table takes O(n) time; total run time: O(n^3).
The actual parenthesization can be obtained by remembering the best k for each N entry.
[Figure: the N table filled diagonal by diagonal; the answer is the top-right entry]

Example (with d1, ..., d5 = 35, 15, 5, 10, 20, as read off the products below):
N1,4 = min{
  N1,1 + N2,4 + d1·d2·d5 = 0 + 2500 + 35·15·20 = 13000,
  N1,2 + N3,4 + d1·d3·d5 = 2625 + 1000 + 35·5·20 = 7125,
  N1,3 + N4,4 + d1·d4·d5 = 4375 + 0 + 35·10·20 = 11375
} = 7125
The 0/1 Knapsack Problem
Given: a set S of n items, with each item i having
  bi - a positive benefit
  wi - a positive weight
Objective: choose a subset T ⊆ S that maximizes
  Σ_{i∈T} b_i
Constraint:
  Σ_{i∈T} w_i ≤ W
Example
Given: A set S of n items, with each item i having bi (a positive benefit) and wi (a positive weight); "knapsack" of capacity W = 9 in.

Item:      1      2      3      4      5
Weight:  4 in   2 in   2 in   6 in   2 in
Benefit:  $20    $3     $6    $25    $80

Solution: items 5 (2 in), 3 (2 in), and 1 (4 in); total weight w = 8 in, benefit B = $106.
A 0/1 Knapsack Algorithm
Define B[k,w] to be the best benefit achievable using the first k items with total weight at most w:

B[k, w] = B[k-1, w]                                    if wk > w
B[k, w] = max{ B[k-1, w], B[k-1, w - wk] + bk }        otherwise

Since B[k,w] is defined in terms of B[k-1,*], we can reuse the same array:

Algorithm 01Knapsack(S, W):
  Input: set S of items w/ benefit bi and weight wi; max. weight W
  Output: benefit of best subset with weight at most W
  for w ← 0 to W do
    B[w] ← 0
  for k ← 1 to n do
    for w ← W downto wk do
      if B[w - wk] + bk > B[w] then
        B[w] ← B[w - wk] + bk
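The array-reuse trick translates almost line for line into Python; the (benefit, weight) pair representation and function name are mine:

```python
def knapsack_01(items, W):
    """items: list of (benefit, weight); W: max total weight.
    One-dimensional form of the B[k,w] recurrence: w is scanned
    downward so that each item is used at most once."""
    B = [0] * (W + 1)
    for b, wk in items:
        for w in range(W, wk - 1, -1):
            B[w] = max(B[w], B[w - wk] + b)
    return B[W]
```

Scanning w upward instead would allow an item to be reused, which solves the unbounded knapsack problem rather than 0/1.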
Example run (W = 11):
[Table: rows of B[k, w] for the item sets {1}, {1,2}, {1,2,3}, {1,2,3,4}, {1,2,3,4,5} and w = 0..11; the last two rows end
  {1,2,3,4}:   0 ... 18 22 24 28 29 29 40
  {1,2,3,4,5}: 0 ... 18 22 28 29 34 35 40]
Transitive Closure
[Figure: a digraph G on vertices A, B, C, D and its transitive closure G*, which has an edge (u,v) whenever G has a directed path from u to v]

Computing the Transitive Closure
We can perform DFS starting at each vertex: O(n(n+m)) time.
Floyd-Warshall Transitive Closure
Idea #1: Number the vertices 1, 2, ..., n.
Idea #2: Consider paths that use only vertices numbered 1, 2, ..., k, as intermediate vertices.
[Figure: a path from i to j whose intermediate vertices are numbered at most k]
Floyd-Warshall's Algorithm
Floyd-Warshall's algorithm numbers the vertices of G as v1, ..., vn and computes a series of digraphs G0, ..., Gn, where G0 = G.
Gk has a directed edge (vi, vj) if G has a directed path from vi to vj with intermediate vertices in the set {v1, ..., vk}.
We have that Gn = G*.
In phase k, digraph Gk is computed from Gk-1.
Running time: O(n^3), assuming areAdjacent is O(1) (e.g., adjacency matrix).

Algorithm FloydWarshall(G)
  Input: digraph G
  Output: transitive closure G* of G
  i ← 1
  for all v ∈ G.vertices()
    denote v as vi
    i ← i + 1
  G0 ← G
  for k ← 1 to n do
    Gk ← Gk-1
    for i ← 1 to n (i ≠ k) do
      for j ← 1 to n (j ≠ i, k) do
        if Gk-1.areAdjacent(vi, vk) ∧ Gk-1.areAdjacent(vk, vj)
          if ¬Gk.areAdjacent(vi, vj)
            Gk.insertDirectedEdge(vi, vj, k)
  return Gn
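With a boolean adjacency matrix the phases collapse into three nested loops; this minimal sketch updates one matrix in place rather than keeping all the Gk digraphs:

```python
def transitive_closure(adj):
    """adj: boolean adjacency matrix (list of lists).
    Returns the reachability matrix: at phase k, paths are allowed
    to pass through intermediate vertices 0..k."""
    n = len(adj)
    G = [row[:] for row in adj]          # copy: G_0 = G
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if G[i][k] and G[k][j]:  # path i -> k -> j exists
                    G[i][j] = True
    return G
```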
Floyd-Warshall Example
[Figures: iterations 1-6 of Floyd-Warshall on a digraph of US airports (BOS, ORD, JFK, SFO, DFW, LAX, MIA) numbered v1, ..., v7; each iteration k adds the edges (vi, vj) for which there is a path through vk, ending with the full transitive closure]
Lecture 09
Text Processing
[Figure: a trie over the characters a, b, c]
Outline: tries; Huffman encoding

Strings
A string is a sequence of characters.
Examples of strings: Java program, HTML document, DNA sequence, digitized image
Example alphabets: ASCII, Unicode, {0, 1}, {A, C, G, T}
Applications: text editors, search engines, biological research
Brute-Force Algorithm
The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T, until either a match is found, or all placements of the pattern have been tried.
Brute-force pattern matching runs in time O(nm).
Example of worst case: T = aaa...ah, P = aaah. This may occur in images and DNA sequences, but is unlikely in English text.

Algorithm BruteForceMatch(T, P)
  Input: text T of size n and pattern P of size m
  Output: starting index of a substring of T equal to P, or -1 if no such substring exists
  for i ← 0 to n - m
    { test shift i of the pattern }
    j ← 0
    while j < m ∧ T[i + j] = P[j]
      j ← j + 1
    if j = m
      return i    {match at i}
  return -1    {no match anywhere}
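The pseudocode maps directly to Python:

```python
def brute_force_match(T, P):
    """Return the starting index of the first occurrence of P in T,
    or -1, by testing every shift i of the pattern."""
    n, m = len(T), len(P)
    for i in range(n - m + 1):
        j = 0
        while j < m and T[i + j] == P[j]:
            j += 1
        if j == m:
            return i          # match at shift i
    return -1                 # no match anywhere
```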
Brute-Force Algorithm
Matching example (comparing from left to right): find the pattern "ABABC" in the source text "ABABABCCA".
Shift 0: ABABC vs ABABA... mismatch (error) at the fifth character.
Shift 1: mismatch at the first character.
Shift 2: ABABC matches T[2..6]. Successful match!
Running time (M = pattern length, N = text length): worst case O(M·N); best case O(M); average case O(M+N).
Boyer-Moore Heuristics
The Boyer-Moore pattern matching algorithm is based on two heuristics:
Looking-glass heuristic: compare P with a subsequence of T moving backwards.
Character-jump heuristic: when a mismatch occurs at T[i] = c, if P contains c, shift P to align the last occurrence of c in P with T[i]; otherwise, shift P to align P[0] with T[i + 1].
Example
[Figure: matching the pattern "rithm" against the text "a pattern matching algorithm"; comparisons 1-11 are numbered, with character jumps skipping over text characters that do not occur in the pattern]
Last-Occurrence Function
Boyer-Moore's algorithm preprocesses the pattern P and the alphabet Σ to build the last-occurrence function L mapping Σ to integers, where L(c) is defined as the largest index i such that P[i] = c, or -1 if no such index exists.
Example:
  Σ = {a, b, c, d}
  P = abacab
  L(a) = 4, L(b) = 5, L(c) = 3, L(d) = -1
[Figure: the two character-jump cases, where l = L(c) for the mismatched text character c = T[i]:
  Case 1 (j ≤ 1 + l): shift the pattern by m - j
  Case 2 (1 + l < j): shift the pattern by m - (1 + l)]
Example
[Figure: a Boyer-Moore execution with the order of the 13 character comparisons numbered and the character jumps shown]
Analysis
Boyer-Moore's algorithm runs in time O(nm + s), where s is the size of the alphabet.
Example of worst case: T = aaa...a, P = baaa.
The worst case may occur in images and DNA sequences, but is unlikely in English text.
[Figure: the worst-case run, with the 24 comparisons numbered right to left in groups]
The KMP Algorithm
Knuth-Morris-Pratt's algorithm compares the pattern to the text left to right, but shifts the pattern more intelligently than the brute-force algorithm: when a mismatch occurs, there is no need to repeat the comparisons against the already-matched prefix; we resume comparing at the position given by the failure function.
[Figure: after mismatching "abaaba" against the text "abaabx..." at position j, the pattern is shifted so that comparison resumes at F(j - 1)]

KMP Failure Function
KMP preprocesses the pattern to find matches of prefixes of the pattern with the pattern itself. The failure function F(j) is defined as the size of the largest prefix of P[0..j] that is also a suffix of P[1..j].
In each iteration of the KMP loop, either i increases by one, or the shift amount i - j increases by at least one (observe that F(j - 1) < j); hence KMP runs in optimal O(m + n) time.
Algorithm KMPMatch(T, P)
  F ← failureFunction(P)
  i ← 0
  j ← 0
  while i < n
    if T[i] = P[j]
      if j = m - 1
        return i - j    { match }
      else
        i ← i + 1
        j ← j + 1
    else
      if j > 0
        j ← F[j - 1]
      else
        i ← i + 1
  return -1    { no match }
17
i increases by one, or
the shift amount i - j
increases by at least one
(observe that F(j - 1) < j)
F[i] j + 1
ii+1
jj+1
else if j > 0 then
{use failure function to shift P}
j F[j - 1]
else
F[i] 0 { no match }
ii+1
18
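Both routines can be sketched compactly in Python (function names are mine):

```python
def failure_function(P):
    """F[j] = size of the largest prefix of P[0..j] that is also a
    suffix of P[1..j]."""
    m = len(P)
    F = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and P[i] != P[j]:
            j = F[j - 1]          # fall back along the failure links
        if P[i] == P[j]:
            j += 1                # matched one more character
        F[i] = j
    return F

def kmp_match(T, P):
    """Return the index of the first occurrence of P in T, or -1."""
    F = failure_function(P)
    j = 0
    for i, c in enumerate(T):
        while j > 0 and c != P[j]:
            j = F[j - 1]          # shift the pattern using F
        if c == P[j]:
            j += 1
        if j == len(P):
            return i - len(P) + 1
    return -1
```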
Example
[Figure: KMP run matching P = abacab against T = abacaabaccabacabaabb, 19 comparisons in total; after each mismatch the pattern is shifted using the failure function F = (0, 0, 1, 0, 1, 2)]
Trie
Preprocessing the pattern speeds up pattern matching queries.

Trie (2)
Given a string X, efficiently encode X into a smaller string Y:
Each uncompressed character in X is 7-bit for ASCII and 16-bit for Unicode.
A compressed character in Y has fewer bits.
This saves memory and/or bandwidth.

Why Trie?
N = number of strings; L = length of a string.
[Table: search performance of Red-Black BST and Hashing for string keys]
Can we do better than the above for strings? Yes.
Trie Implementation
Trie: Search
A search miss ends at a null pointer (e.g., searching for the key "shelter").
Trie: Insertion
put("shore", 7)
Trie: Deletion
delete("shells", 3)
The characters l, l, s and their pointers have to be removed. The removal stops when a character holds a non-null value, or when there is another sub-tree.
delete("shore", 7)
Removal stops at (h).
Trie: Cost
R-way trie: good when R is small; when R is large, too much memory is required.
  E.g., R-way trie for English words, R = 26, N = 10: space = (26 + 1) × 10 = 270 (still OK)
  E.g., R-way trie for UTF-16, R = 65536, N = 10: space = (65536 + 1) × 10 = 655370 (too big)
Trie Conclusion
Trade memory for speed.
When you do a string search, you don't have to compare the whole word.
A quick search hit costs at most the length of the key; a search miss is quicker still: check the first few characters, and if they miss, the key is not there.
Note: just one implementation is given in this lecture; other implementations may vary.
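A minimal Python sketch of the put/get operations described above, using a dict of children per node instead of a fixed R-way array (a common space-saving variant; class and method names are mine):

```python
class Trie:
    """Trie mapping string keys to values; each node keeps a dict
    of child nodes, one per next character."""
    def __init__(self):
        self.children = {}       # char -> child Trie node
        self.value = None        # value stored at this node, if any

    def put(self, key, value):
        node = self
        for ch in key:           # walk/create the path for the key
            node = node.children.setdefault(ch, Trie())
        node.value = value

    def get(self, key):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:     # null pointer: search miss
                return None
        return node.value
```

A search miss often stops after the first few characters, exactly as the conclusion slide notes.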
[Figure: a trie over words sharing the prefix MACA (MACABRE, MACACO, MACADAM, MACADAMI, MACAQUE, MACAW, MACCBOY, MACAROON, MACARONI, MACARONIC), branching on the first differing character]
26-character implementation
[Figure: the same trie drawn with 26-way child arrays]
Encoding Tries
[Figure: encoding tries assigning codes such as 00, 010, 011, 10, 11 to the characters a, b, c, d, e]
Example
X = abracadabra
T1 encodes X into 29 bits; T2 encodes X into 24 bits.
[Figure: the tries T1 and T2 over the characters a, b, c, d, r]
Huffman's Algorithm
Given a string X, Huffman's algorithm constructs a prefix code that minimizes the size of the encoding of X.
It runs in time O(n + d log d), where n is the size of X and d is the number of distinct characters of X.
A heap-based priority queue is used as an auxiliary structure.

Algorithm HuffmanEncoding(X)
  Input: string X of size n
  Output: optimal encoding trie for X
  C ← distinctCharacters(X)
  computeFrequencies(C, X)
  Q ← new empty heap
  for all c ∈ C
    T ← new single-node tree storing c
    Q.insert(getFrequency(c), T)
  while Q.size() > 1
    f1 ← Q.minKey()
    T1 ← Q.removeMin()
    f2 ← Q.minKey()
    T2 ← Q.removeMin()
    T ← join(T1, T2)
    Q.insert(f1 + f2, T)
  return Q.removeMin()
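A minimal Python sketch of the merge loop; instead of building an explicit trie it carries a dict of partial codewords per subtree (the tie-breaking counter and function name are mine):

```python
import heapq
from collections import Counter

def huffman_codes(X):
    """Build a Huffman prefix code for string X.
    Returns a dict mapping each character to its bit string."""
    freq = Counter(X)
    if len(freq) == 1:                   # degenerate single-char case
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreaker, {char: partial code})
    heap = [(f, i, {c: ""}) for i, (c, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # two lowest-frequency trees
        f2, _, t2 = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in t1.items()}
        merged.update({c: "1" + code for c, code in t2.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
        i += 1
    return heap[0][2]
```

For X = abracadabra (frequencies a=5, b=2, r=2, c=1, d=1) the most frequent character a gets a 1-bit code and the whole string encodes into 23 bits.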
Example
X = abracadabra (11 characters)
Frequencies: a = 5, b = 2, c = 1, d = 1, r = 2
[Figure: the Huffman trie is built bottom-up; c and d are merged first (total 2), and the merges continue until a single trie of weight 11 remains, with the most frequent character a nearest the root]

Huffman Coding
String frequencies: A = 20, B = 18, C = 15, D = 12, E = 8, F = 6, G = 4, H = 2, I = 1, J = 1
NP-Completeness
[Figure: graph gadget on the variables x1, x2, x3, x4, used later in the vertex-cover reduction]
Learning Outcome
P and NP (13.1)
Definition of P
Definition of NP
Alternate definition of NP
NP-completeness (13.2)
Polynomial-Time Decision Problems
To simplify the notion of hardness, we will focus on the following:
Polynomial time as the cutoff for efficiency.
Decision problems: the output is simply "yes" or "no".
Examples:
  Does a given set of items have a subset of weight at most W and benefit at least K?
  Does a graph G have an MST with weight at most K?
NP example
Problem: Decide if a graph has an MST of weight at most K.
Algorithm:
1. Non-deterministically choose a set T of edges
2. Test that T forms a spanning tree
3. Test that the total weight of T is at most K
This non-deterministic algorithm runs in polynomial time, so the problem is in NP.

NP example (2)
Problem: Decide if a graph has an MST of weight at most K.
Verification Algorithm:
1. Use a set T of edges as the certificate
2. Verify that T forms a spanning tree
3. Verify that the total weight of T is at most K
The certificate can be verified in polynomial time, so the problem is in NP.
Equivalence of the Two Definitions
Suppose A is a non-deterministic algorithm.
Let y be a certificate consisting of all the outcomes of the non-deterministic choices made by A; a deterministic verifier can then check in polynomial time that A accepts its input x when it makes the choices recorded in y. Conversely, given a deterministic verification algorithm, a non-deterministic algorithm can guess the certificate y and run the verifier on (x, y).
An Interesting Problem
A Boolean circuit is a circuit of AND, OR, and NOT gates; the CIRCUIT-SAT problem is to determine if there is an assignment of 0s and 1s to a circuit's inputs so that the circuit outputs 1.
[Figure: an example circuit with 0/1 inputs feeding NOT, OR, and AND gates, producing output 1]
CIRCUIT-SAT is in NP
Non-deterministically choose a set of inputs and the outcome of every gate, then test each gate's I/O.
[Figure: the same circuit with the guessed gate outcomes checked gate by gate]
NP-Completeness
A problem (language) L is NP-hard if every problem in NP can be reduced to L in polynomial time.
That is, for each language M in NP, we can take an input x for M and transform it in polynomial time to an input x' for L such that x is in M if and only if x' is in L.
L is NP-complete if it's in NP and is NP-hard.
[Figure: every problem in NP reduces in polynomial time to L]
Cook-Levin Theorem
CIRCUIT-SAT is NP-complete.
[Figure: every problem in NP reduces in polynomial time to CIRCUIT-SAT]

Some Thoughts about P and NP
[Figure: the landscape of P and NP; the NP-complete problems, including CIRCUIT-SAT, live at the "hardest" edge of NP]
NP-Completeness (2)
[Figure: gadget graph on the variable pairs x1/x1', x2/x2', x3/x3', x4/x4', used in the vertex-cover reduction below]
Problem Reduction
Outline: SAT (and CNF-SAT and 3SAT); Vertex Cover; Hamiltonian Cycle

Problem Reduction
A language M is polynomial-time reducible to a language L if an instance x for M can be transformed in polynomial time to an instance x' for L such that x is in M if and only if x' is in L.
The Cook-Levin Theorem rests on two facts:
  CIRCUIT-SAT is in NP
  For every M in NP, M ≤ CIRCUIT-SAT
[Figure: circuit instance with output 1]

Transitivity of Reducibility
If A ≤ B and B ≤ C, then A ≤ C.
Types of reductions:
  Local replacement: convert components of an instance of A into components of an instance of B
  Component design: build components for an instance of B that enforce properties needed by A, such as choice or evaluation
  Restriction: note that A is a special case of B
SAT
A Boolean formula is a formula where the variables and operations are Boolean (0/1):
  (a+b+d+e)(a+c)(b+c+d+e)(a+c+e)
OR: +, AND: · (times), NOT: overbar
SAT: determine whether a given Boolean formula is satisfiable, i.e., whether some assignment makes it evaluate to 1.
SAT is NP-complete
Reduce CIRCUIT-SAT to SAT.
Given a Boolean circuit, make a variable for every input and gate.
Create a sub-formula for each gate, characterizing its effect (equating the gate's output variable with the combination of its inputs). Form the formula as the output variable AND-ed with all these sub-formulas.
Example:
  m · ((a+b) ↔ e) · (c ↔ f) · (d ↔ g) · (e ↔ h) · ((e·f) ↔ i) · ...
[Figure: the circuit with inputs a, b, c, d and the internal gate variables]
3SAT
The SAT problem is still NP-complete even if the formula is a conjunction of disjuncts, that is, it is in conjunctive normal form (CNF).
The SAT problem is still NP-complete even if it is in CNF and every clause has just 3 literals (a variable or its negation):
  (a+b+d)(a+c+e)(b+d+e)(a+c+e)
Reduction from SAT.
DNF (disjunctive normal form): an OR of AND-terms of literals, e.g. B = l1·l2·l3 + l4·l5·l6 + ...
CNF (conjunctive normal form): an AND of OR-clauses of literals, e.g. B = (l1+l2+l3)·(l4+l5+l6)·...
Illustration of Reductions
Every problem in NP reduces to CIRCUIT-SAT (Cook-Levin Theorem: CIRCUIT-SAT is NP-complete); then:
CIRCUIT-SAT → CNF-SAT (local replacement)
CNF-SAT → 3SAT (local replacement)
3SAT → VERTEX-COVER (component design)
VERTEX-COVER → CLIQUE (local replacement)
VERTEX-COVER → SET-COVER (local replacement)
3SAT → SUBSET-SUM (component design)
SUBSET-SUM → KNAPSACK (restriction)
VERTEX-COVER → HAMILTONIAN-CYCLE (component design)
HAMILTONIAN-CYCLE → TSP (restriction)
Vertex Cover
A vertex cover of a graph G=(V,E) is a subset W of V such that, for every edge (a,b) in E, a is in W or b is in W.
VERTEX-COVER: Given a graph G and an integer K, does G have a vertex cover of size at most K?

Vertex Cover Example
Suppose we are given a graph G representing a computer network where the vertices represent routers and the edges represent physical connections. Suppose further that we wish to upgrade some of the routers in our network with special new, but expensive, routers that can perform sophisticated monitoring operations for incident connections. If we would like to determine whether k new routers are sufficient to monitor every connection in our network, then we have an instance of the VERTEX-COVER problem.
Vertex-Cover is NP-complete
Reduce 3SAT to VERTEX-COVER.
Let S be a Boolean formula in CNF with each clause having 3 literals.
For each variable x, create a node for x and a node for its negation, and connect these two with an edge.
[Figure: the variable pair joined by a single edge]
Completing the construction: for each clause, create a triangle of three nodes, one per literal, and connect each literal in a clause triangle to its copy in a variable pair. E.g., a clause (x+y+z) yields a triangle whose corners connect to the x, y, and z variable-pair nodes.
Example: the three-clause formula (a+b+c)(a+b+c)(b+c+d) over the variables a, b, c, d.
The graph has a vertex cover of size K = 4 + 6 = 10 if and only if the formula is satisfiable (one node per variable pair, plus two per clause triangle).
[Figure: the full reduction graph with four variable pairs and three clause triangles]
Some Other NP-Complete Problems
SUBSET-SUM: Given a set of integers and a distinguished integer K, is there a subset of the integers that sums to K?

SUBSET-SUM
SUBSET-SUM: Given a set S of n integers and a distinguished integer k, is there a subset of the integers in S that sums to k?
Example: Suppose we have an internet web server, and we are presented with a collection of download requests. For each download request we can easily determine the size of the requested file. Thus, we can abstract each web request simply as an integer: the size of the requested file. Given this set of integers, we might be interested in determining a subset of them that exactly sums to the bandwidth our server can accommodate in one minute. Unfortunately, this problem is an instance of SUBSET-SUM. Moreover, because it is NP-complete, this problem will actually get harder to solve as our web server's bandwidth and request-handling ability improves.

Some Other NP-Complete Problems
0/1 Knapsack: Given a collection of items with weights and benefits, is there a subset of weight at most W and benefit at least K?