Data Structure and Algorithms Notes

(Author: Neil Brian. Adapted from Stanley Tay.
Last Updated: 5/12/17)

(To be used together with CS2010 Online Quiz 1 & 2 Cheatsheets)
A Data Structure stores and organises data
Has efficient: insert, search, delete, query, update
Priority Queue ADT

Operation          Circular Array    Circular Array    Binary Heap
                   (Strategy 1)      (Strategy 2)
Void Enqueue(x)    O(N)              O(1)              O(logN) (Insert)
Obj Dequeue()      O(1)              O(N)              O(logN) (ExtractMax)
Best to implement via Binary Heap
Binary Heap Property

In all vertices except for root,
A[parent(i)] >= A[i] (max heap)
A[parent(i)] <= A[i] (min heap)
Put in compact 1-based array.
Navigation:
parent(i) = floor (i/2), except for i = 1
left(i) = 2 * i; if left(i) > heapsize then the left child does not exist
right(i) = 2 * i + 1; if right(i) > heapsize then the right child does not exist
Insert(v) - O(logN)
    Extend the heapsize - O(1)
    Insert the element at the back of the array - O(1)
    ShiftUp that element to fix the Binary Heap property - O(logN)
ShiftUp(v) - O(logN)
    While it is not the root and the max heap property is violated,
    swap the element with its parent
Obj ExtractMax() - O(logN)
    Get the max value, which is at index 1
    Replace that value with the last item in the array
    Decrease heapsize
    ShiftDown the new value at index 1 to fix the Binary Heap property
    Return the max value
ShiftDown(v) - O(logN)
    While the element has not gone past the heapsize,
    if it needs to be swapped down,
    swap it with the larger of its two children
CreateHeap(arr) - O(NlogN) version (*arr is unsorted)
    Fill in index 0 with a dummy entry (since the array is 1-based)
    Insert each of the N elements into the initially empty heap, O(logN) per insert
CreateHeap(arr) - O(N) version
    Fill in index 0 with a dummy entry
    Starting from the parent of the last leaf and going up to the root,
    perform ShiftDown on each element
Heapsort(arr) - O(NlogN)
    CreateHeap out of the unsorted array in O(N)
    Then perform ExtractMax, O(logN) each, N times
Graph Theory/ Math Notes:

- A binary heap is a complete binary tree at all times.
- Height of a complete binary tree of size N: floor(log2(N))
- Height of a perfect binary tree of size N: log2(N+1) - 1
- Number of nodes in a perfect binary tree of height h: 2^(h+1) - 1

- Stirling's approximation:
- Harmonic Series
- Geometric Series
- Logarithmic Conversion
Table ADT
Operation       Unsorted Array   Sorted Array   BBST
search(v)       O(N)             O(logN)        O(logN)
insert(v)       O(1)             O(N)           O(logN)
findMax()       O(N)             O(1)           O(logN)
listSorted()    O(NlogN)         O(N)           O(N)
successor(v)    O(N)             O(logN)        O(logN)
remove(v)       O(N)             O(N)           O(logN)
getMedian()     O(NlogN)*        O(1)           O(logN)
rank(v)         O(NlogN)*        O(logN)        O(logN)
*Using QuickSelect in Tutorial 1, GetMedian() = Select(n/2). Expected O(n)
Best to implement via BBST
BST Property
For every vertex x and y,
y.key < x.key if y is in left subtree of x
y.key > x.key if y is in right subtree of x
*O(h) = O(logN) for AVL Tree

search(v) - O(h)
    Start from the root.
    If the search value is less than the current vertex, travel left.
    If it is greater than the current vertex, travel right.
    Repeat the traversal until you find the value, and return it.
    If the traversal leads to a dead end, return null.
insert(v) - O(h)
    Similar traversal to search.
    When the insertion point is found, create a new vertex.
    (If AVL Tree) Update the height and size, especially the height of
    vertices along the insertion path. Check the balance factor as you walk up from
    the insertion point to the root and use rotations to rebalance.
findMin()/findMax() - O(h)
    Traverse until the leftmost/rightmost child
listSorted() - O(N)
    In-order traversal
predecessor(v) - O(h)
    If node v has a left subtree/child
        Return the max of the left subtree
    Else if the node is a right child of a parent
        The parent is the predecessor
    Else
        Keep traversing up the BST until you traverse into a parent node from
        a right child. That parent is the predecessor
successor(v) - O(h)
    If node v has a right subtree/child
        Return the min of the right subtree
    Else if the node is a left child of a parent
        The parent is the successor
    Else
        Keep traversing up the BST until you traverse into a parent node from
        a left child. That parent is the successor
remove(v) - O(h)
    Search for v
    If v has no children, just remove the node
    Else if it has only one child
        Connect the child to the parent of the deleted node and vice versa
    Else (it has both children)
        Replace the vertex with its successor and remove the successor**
    (If AVL Tree) Remember to update the height and size and rebalance accordingly
getMedian() - O(h)
rank(node, v) - O(h) (assuming each vertex has a size attribute)
    If the current node holds v, return the size of the left subtree of the node + 1
    Else if the value v you are looking for is less than the current node
        Traverse left with rank(node.left, v)
    Else
        Return size of left subtree of the node + 1 + rank(node.right, v)
** Why take the successor of v?

Since vertex v has two children, therefore has right child
Successor of v is the minimum of right subtree
This minimum has no left child, so removing it falls under the no-child or one-child case.
O(N) traversal of all elements
preorder()                  inorder()           postorder()
    print                       inorder(t.left)     postorder(t.left)
    preorder(t.left)            print               postorder(t.right)
    preorder(t.right)           inorder(t.right)    print
Useful to duplicate trees   listSorted()        Used in Reverse Polish Notation
Height: number of edges from node to deepest leaf

Size: total number of vertices in subtree where the node is root
Empty Tree     Normal Tree
Height = -1    Height = max(left subtree height, right subtree height) + 1
Size = 0       Size = left subtree size + right subtree size + 1
BBST/AVL Tree Property

Vertex height balanced if |x.left.height - x.right.height| <= 1
BST is height balanced if every vertex is height balanced
balancefactor = x.left.height - x.right.height

if balancefactor = +2 or -2, need to balance
Union-Find Disjoint Sets (UFDS) Data Structure
- Can union 2 disjoint sets
- Can find what set an item belongs in
- Can check if two items are same set
Implementation
int array p
p[i] records the parent of i
if p[i] == i, then i is a root
Initialise array p with p[i] = i (every item starts as the root of its own set),
and an ArrayList of rank, with all initialised to 0
findSet(i) - O(α(N))
    Recursively visit p[i] until p[i] == i.
    As you return, everything visited is connected directly to the
    representative item
    (Path Compression Heuristic)
isSameSet(i,j)
    Check whether findSet(i) and findSet(j) give the same representative item
unionSet(i,j)
    Check if the items are from two disjoint sets i.e. !isSameSet(i,j)
    (Since isSameSet also uses findSet, Path Compression Heuristic is involved)
    Make the rep item of the taller tree the parent of the rep item of the
    shorter tree. (Union by Rank Heuristic)
    If both trees have the same rank, then nest the rep item of i under the rep item of j
    and increase the rank of the rep item of j
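The UFDS operations above can be sketched in Java roughly as follows. This is a minimal sketch, not the course's reference code: the class name `UnionFind` is hypothetical, and a plain `int[]` is assumed for the ranks instead of the ArrayList mentioned in the notes.

```java
// Union-Find Disjoint Sets with path compression and union by rank.
// The class name and the int[] rank storage are assumptions for this sketch.
class UnionFind {
    private final int[] p;     // p[i] is the parent of i; p[i] == i means i is a root
    private final int[] rank;  // upper bound on the height of the tree rooted at i

    UnionFind(int n) {
        p = new int[n];
        rank = new int[n];                     // all ranks start at 0
        for (int i = 0; i < n; i++) p[i] = i;  // every item starts as its own set
    }

    int findSet(int i) {
        if (p[i] == i) return i;
        p[i] = findSet(p[i]);                  // path compression on the way back
        return p[i];
    }

    boolean isSameSet(int i, int j) {
        return findSet(i) == findSet(j);
    }

    void unionSet(int i, int j) {
        int x = findSet(i), y = findSet(j);
        if (x == y) return;                    // already in the same set
        if (rank[x] > rank[y]) {
            p[y] = x;                          // shorter tree goes under the taller one
        } else {
            p[x] = y;
            if (rank[x] == rank[y]) rank[y]++; // equal ranks: the new root grows by one
        }
    }

    public static void main(String[] args) {
        UnionFind u = new UnionFind(5);
        u.unionSet(0, 1);
        u.unionSet(3, 4);
        System.out.println(u.isSameSet(0, 1));
        System.out.println(u.isSameSet(1, 3));
    }
}
```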
Graph Terminologies
1. Directed Edge: ( A )-->( B ); goes in only 1 direction.
2. Undirected Edge: ( A )--( B ); goes in both directions.
3. Sparse Graph vs Dense Graph: Not a lot of edges vs Many Edges.
4. In/Out Degree: The number of inward/outward edges at a vertex.
5. Path Length/Cost: (Unweighted Graphs) The number of edges in a path.
(Weighted Graphs) The total sum of edge weights in a path.
6. Simple (Strict) Graph: An unweighted and undirected graph with no self-loops or multiple edges
between two vertices.
7. Simple Path: Sequence of non-repeated vertices connected by a sequence of edges.
8. Shortest Path Weight from vertex x to y: The lowest path weight between x to y.
9. Simple Cycle: Path that starts and ends with the same vertex; the start/end vertex is the only
repeated vertex.
10. Acyclic: There is no cycle.
11. Complete Graph: Simple graph of N vertices with [N(N − 1)]/2 edges.
12. Component: A group of vertices in an undirected graph that can visit each other via some path.
13. Connected Graph: Undirected graph with 1 component.
14. Directed Acyclic Graph (DAG): Directed graph that has no cycle.
a. A tree is a valid DAG
15. Tree: Connected graph where there is only 1 unique path between any pair of vertices. (E = V - 1)
a. This path is both the longest path and shortest path.
16. Spanning Tree: A tree that spans every vertex in the graph.
17. Bipartite Graph: Undirected graph where vertices can be partitioned into two sets, such that there
are no edges between members of the same set.
18. Subgraph: A subset of vertices (and their connecting edges) of the original graph.
Graph Theory
- Handshake Lemma: In an undirected graph, there must be an even number of odd-degree vertices.
- In a complete graph, E = V(V-1)/2
o Thus, for a complete graph, O(E) = O(V^2)
- In a tree, E = V - 1
o Thus, in a tree, O(E) = O(V)
Graph Data Structures

Graph Traversal Algorithms
Idea: If v is reachable from s then all neighbours of v will also be reachable from s.
Both Breadth First Search and Depth First Search run in O(V+E).
In a complete graph, O(V+E) = O(V+V^2) = O(V^2)

Breadth First Search
Each vertex is in the queue once - O(V)
(Assuming use AdjList) For each dequeue of a vertex, all k neighbour vertices are checked
Therefore, all E edges are examined - O(E)
Therefore overall O(V+E)
BFS(source) {
    // initialisation
    for (i = 0; i < V; i++) {
        isVisited[i] = false;
        predecessor[i] = -1;
    }
    queue.enqueue(source);
    isVisited[source] = true;
    // main loop
    while (!queue.isEmpty()) {
        u = queue.dequeue();
        neighboursOfU = AdjList.get(u);
        for (v : neighboursOfU) {
            if (isVisited[v] == false) {
                isVisited[v] = true;
                predecessor[v] = u;
                queue.enqueue(v);
            }
        }
    }
}

Depth First Search
Each vertex is visited once, and flagged as visited to avoid cycling
(Assuming using AdjList) For each visited vertex, check all its k neighbours and visit them if possible
Therefore, all E edges are examined.
Therefore overall O(V+E).
DFS(source) {
    // initialisation
    for (i = 0; i < V; i++) {
        isVisited[i] = false;
        predecessor[i] = -1;
    }
    // recursive call
    recursiveDFS(source);
}
// main recursion
recursiveDFS(vertex) {
    isVisited[vertex] = true;
    neighbours = AdjList.get(vertex);
    for (v : neighbours) {
        if (isVisited[v] == false) {
            predecessor[v] = vertex;
            recursiveDFS(v);
        }
    }
}
Complementary/Modified Algorithms
Reversed Path Reconstruction (Iterative)
    Keep backtracking via the predecessor array until you reach the source,
    printing each vertex as you visit it.
Path Reconstruction (Recursive)
    Recursively backtrack from the vertex into its predecessor until you hit the root (the source).
    As the recursion unwinds, print the vertices, which will come out in path order.
Check if u is reachable from v
    Run BFS(v)/DFS(v) and check if visited[u] is true
Counting Components - O(V+E)
    Initialise the component count to 0; all vertices are unvisited.
    Iterate through all the vertices.
    If a vertex is unvisited, increment the count by 1 and start DFS on that vertex.
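As an illustration of the component-counting idea, here is a minimal Java sketch for an undirected graph stored as an adjacency list. The class name `ComponentCounter` and the helper `undirected` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Counts connected components of an undirected graph with repeated DFS.
// ComponentCounter and undirected(...) are illustrative names, not from the notes.
class ComponentCounter {
    static int countComponents(List<List<Integer>> adjList) {
        int V = adjList.size();
        boolean[] visited = new boolean[V];
        int count = 0;
        for (int s = 0; s < V; s++) {
            if (!visited[s]) {      // an unvisited vertex starts a new component
                count++;
                dfs(s, adjList, visited);
            }
        }
        return count;
    }

    private static void dfs(int u, List<List<Integer>> adjList, boolean[] visited) {
        visited[u] = true;
        for (int v : adjList.get(u)) {
            if (!visited[v]) dfs(v, adjList, visited);
        }
    }

    // Builds an adjacency list from undirected edges given as {u, v} pairs.
    static List<List<Integer>> undirected(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < V; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {3, 4}};
        System.out.println(countComponents(undirected(5, edges)));
    }
}
```

Each vertex is marked once and each adjacency entry is scanned once, which matches the O(V+E) bound stated above.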
Uses of BFS: (Finals AY2014/15 S1)
1. Graph traversal
2. Reachability Test
3. Checking if the graph is connected.
4. Solving SSSP on an unweighted graph/weighted tree
5. Checking if the graph is Bipartite.
6. Checking if a graph is a tree
Uses of DFS: (Finals AY2012/13 S1)
1. Reachability test
2. Find/Label/Count Components
3. Topological Sort of DAG.
4. Check if an undirected graph is a Bipartite Graph
5. Flood fill
6. Checking if graph is cyclic or acyclic
7. Finding articulation points/bridges
8. Finding strongly connected components in a directed graph
Topological Ordering:
A linear ordering of the vertices in a DAG such that for every directed edge U->V, vertex U comes before
V in the ordering.
Alternatively, each vertex comes before all vertices to which it has outbound edges.
Graph Theory
- A topological ordering is possible if and only if the graph is a DAG.
- A graph which has a cycle is not a DAG and does not have a topological ordering.
o Every DAG has one or more topological sorts.
Topological Sort (Toposort) Algorithm on a DAG

Uses: DP, One-Pass Bellman Ford
General Idea: Modify DFS to post-order process vertex u after visiting all its neighbours
Implementation:
Use an ArrayList toposort to record the vertices
DAG Topological Sorting Algorithm - O(V+E)
DFS() {
    for (i = 0; i < V; i++) {
        isVisited[i] = false;
        predecessor[i] = -1;
    }
    toposort.clear();
    for (i = 0; i < V; i++) {
        if (isVisited[i] == false) {
            modifiedRecursiveDFSForToposort(i);
        }
    }
    outputInReversedOrder(toposort);
}
modifiedRecursiveDFSForToposort(vertex) {
    isVisited[vertex] = true;
    neighbours = AdjList.get(vertex);
    for (v : neighbours) {
        if (isVisited[v] == false) {
            predecessor[v] = vertex;
            modifiedRecursiveDFSForToposort(v);
        }
    }
    toposort.add(vertex); // post-order
}
Minimum Spanning Tree (MST) Problem
Finding the spanning tree with the least possible weight, of a connected undirected** weighted graph
Brute force solution:

Find all cycles in the graph and remove the largest edge of each cycle.
But there can be up to O(2^N) different cycles.
MST Algorithms
Prim’s Algorithm - O(ElogV)
    PriorityQueue stores an edge (in the form of IntegerTriple of edge weight and the two incident
    vertices), and sorts by increasing edge weight.
    Pick a source vertex, and process* the source.
    [While the PQ is not empty]
        Dequeue the least weight edge; if the outgoing vertex is not visited, add this edge to the MST,
        and process* the outgoing vertex.
    *Process: Marking the vertex as visited and enqueuing all of its edges which are not yet in the
    MST.
    OPTIMIZATION
        Stop when all vertices have been added to the MST.
        Or you have added V-1 edges to the MST.
Kruskal’s Algorithm - O(ElogV)
    EdgeList which is sorted by increasing edge weight.
    UFDS to test if adding an edge will cause a cycle.
    T is an empty graph.
    [While there are unprocessed edges]
        Get the least weight unprocessed edge.
        If adding this edge to T does not cause a cycle, add the edge.
    T is an MST.
    OPTIMIZATION
        Stop when there is only one disjoint set left
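Kruskal's steps can be sketched in Java along the following lines. This is only a sketch under stated assumptions: the class name `KruskalSketch`, the `{weight, u, v}` edge layout, and the stripped-down union-find (path halving, no union by rank) are choices made here for brevity, not the course's implementation.

```java
import java.util.Arrays;

// Kruskal's algorithm on a connected undirected weighted graph.
// Edge layout {weight, u, v} and the simplified union-find are assumptions of this sketch.
class KruskalSketch {
    static long mstWeight(int V, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0])); // increasing weight
        int[] p = new int[V];
        for (int i = 0; i < V; i++) p[i] = i;   // each vertex is its own set
        long total = 0;
        int taken = 0;
        for (int[] e : sorted) {
            int x = find(p, e[1]), y = find(p, e[2]);
            if (x != y) {                        // different sets: no cycle is formed
                p[x] = y;
                total += e[0];
                taken++;
                if (taken == V - 1) break;       // optimisation: the MST is complete
            }
        }
        return total;
    }

    // Simplified findSet using path halving.
    private static int find(int[] p, int i) {
        while (p[i] != i) {
            p[i] = p[p[i]];
            i = p[i];
        }
        return i;
    }

    public static void main(String[] args) {
        int[][] edges = {{1, 0, 1}, {2, 1, 2}, {3, 0, 2}, {4, 2, 3}, {5, 1, 3}};
        System.out.println(mstWeight(4, edges));
    }
}
```

Sorting dominates the running time at O(ElogE) = O(ElogV), in line with the bound quoted above.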
Uses of MST Algorithms

- (Minimum Spanning Tree Problem) Finding the spanning tree of the graph with least possible weight.
- (Minimax Problem) Finding, among all paths from one vertex to the other, the path whose maximum edge
weight is the minimum.
- Finding the Second Best Minimum Spanning Tree
- (Kruskal’s Only) Finding the Minimum Spanning Forest of k trees which have minimum weight not
exceeding a given value X.
o (As done in Tutorial)
Normal Kruskal’s, but stop the algorithm once adding an edge would exceed the given value X.
Report the number of disjoint sets currently in the UFDS
(i.e. count the number of p[i] == i in an O(V) pass)
Still O(ElogV).
Maximum Spanning Tree Problem
Using existing MST Algorithm: Negate all weights in the graph and solve MST as per normal
Modified Kruskal’s Algorithm: Sort edges in non-increasing order
**Is there such a thing as an MST on a Directed Graph?

The equivalent of a minimum spanning tree in a directed graph is called an optimum branching or a
minimum-cost arborescence. The classical algorithm for solving this problem is the Chu-Liu/Edmonds
algorithm. There have been several optimized implementations of this algorithm over the years using
better data structures; the best one that I know of uses a Fibonacci heap and runs in time O(m + n log n)
and is due to Galil et al.
Single-Source Shortest Paths (SSSP) Problem
- In a weighted graph (not necessarily connected), finding the least weight path from a source vertex s
to every other vertex.
- In an unweighted/same-weight edge graph (not necessarily connected), finding the path with the least
number of edges from a source vertex s to every other vertex in the graph.
- In a weighted tree, finding the best (least weight, and only) path from a source vertex s to
every other vertex.
- In a weighted graph with a negative weight cycle, SSSP is ill-defined because the total weight to a
vertex can decrease indefinitely by traversing the negative weight cycle.
SSSP Algorithms
Data Structures:
Predecessor array and distance array both of size V.
initialiseSSSP(source)
    Set the distance of all vertices to Infinity/Integer.MAX_VALUE, except for the source, which
    is set to 0.
    Set all predecessors to -1.
boolean relax(u, v, weight(u,v))
    If distance[v] > distance[u] + weight(u,v),
    relax the edge i.e. update distance[v] to the lower value,
    and set the predecessor of v to be u. Return true.
    Else return false if cannot relax.
Modified BFS
    O(V+E) if Unweighted-graph
    O(V) if Weighted-Tree
    Change the visited array into a distance array; false corresponds to Infinity/Integer.MAX_VALUE.
    Change
        visited[v] = true
    into
        distance[v] = distance[u] + 1;
    If weighted tree, then remember to change 1 into weight(u,v).
    Simpler version: relax the neighbouring edge by unit weight 1. If an edge is relaxed, enqueue the
    vertex it leads to.
Bellman Ford’s
    O(VE)
    O(V+E) if relaxing in topological order / one-pass (can use on DAG)
    For V-1 times, relax all the edges.
    Negative Weight-Cycle Check*: reported if, after the algo, there is an edge that can still be
    relaxed.
    OPTIMIZATION: Add a flag. If within an outer loop iteration there is no edge relaxation, stop the
    algorithm, because there is no edge left that needs to be relaxed.
Original Dijkstra’s
    O((V+E)logV)
    Solved only contains the source.
    Enqueue the source as (0, source) and all the other vertices as (distance[v], v) into a
    PriorityQueue which sorts by increasing distance.
    [While PQ is not empty]
        Dequeue the front most vertex and add the vertex to solved. Relax all neighbouring
        outgoing edges.
        If an edge is relaxed, update the distance of the outgoing vertex in the PQ.
Modified Dijkstra’s
    O((V+E)logV)
    Enqueue only the source with (0, source).
    [While PQ is not empty]
        Poll the front most pair.
        If the distance matches the distance array (meaning it is not outdated), then relax
        all its neighbouring outgoing edges.
        If an edge is relaxed, enqueue the new (distance[v], v).
        The old (distance[v], v) will be ignored because it does not match the distance array.
Killer inputs (figures not reproduced)
    Causes max iterations even for the Optimised version
    Dijkstra’s Killer
For high V and E, use O((V+E)logV) Dijkstra’s Algo instead of Bellman Ford’s O(VE) Algo.
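To make the Modified Dijkstra's description concrete, here is a minimal Java sketch using lazy deletion of outdated pairs. The names `ModifiedDijkstra`, `directed`, and the `{v, w}` adjacency layout are assumptions of this example, and `long` distances with a large `INF` are used instead of Integer.MAX_VALUE to avoid overflow when adding weights.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Modified Dijkstra's: only the source is enqueued at first, and outdated
// (distance, vertex) pairs are skipped when polled. All names here are illustrative.
class ModifiedDijkstra {
    static final long INF = Long.MAX_VALUE / 4;

    // adj.get(u) holds {v, w} for every outgoing edge u -> v with weight w.
    static long[] shortestPaths(List<List<int[]>> adj, int source) {
        long[] dist = new long[adj.size()];
        Arrays.fill(dist, INF);
        dist[source] = 0;
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        pq.add(new long[]{0, source});
        while (!pq.isEmpty()) {
            long[] top = pq.poll();
            long d = top[0];
            int u = (int) top[1];
            if (d != dist[u]) continue;            // outdated pair, ignore it
            for (int[] e : adj.get(u)) {
                int v = e[0];
                long w = e[1];
                if (dist[v] > dist[u] + w) {       // relax(u, v, w)
                    dist[v] = dist[u] + w;
                    pq.add(new long[]{dist[v], v}); // enqueue the new pair
                }
            }
        }
        return dist;
    }

    // Builds a directed adjacency list from {u, v, w} triples.
    static List<List<int[]>> directed(int V, int[][] edges) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < V; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(new int[]{e[1], e[2]});
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1, 4}, {0, 2, 1}, {2, 1, 2}, {1, 3, 5}};
        System.out.println(Arrays.toString(shortestPaths(directed(4, edges), 0)));
    }
}
```

Unreachable vertices keep the value `INF` in the returned array.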
*Positive Weight Cycle Check

- Use Bellman Ford’s but ‘stretch’ instead of ‘relax’ the edges. If after V-1 passes, we can still
stretch an edge, it means there is a positive weight cycle.
- Or negate all edge weights and perform the negative weight cycle check. (BF works with negative edges!)
Single Source Longest Path with all non +ve weight edges: Dijkstra’s with Max PQ
Unweighted (All same weight) Graph or Weighted Tree: Modified BFS

-ve weight edge
    Bellman Ford / Optimised Bellman Ford: Can solve SSSP.
    Original Dijkstra’s: *Will terminate, cannot solve SSSP.
    Modified Dijkstra’s: Can solve SSSP.
-ve weight cycle
    Bellman Ford / Optimised Bellman Ford: Can detect (by checking whether edges can still be relaxed),
    will terminate, cannot solve SSSP.
    Original Dijkstra’s / Modified Dijkstra’s: *Cannot detect, will run indefinitely, cannot solve SSSP.
* Assuming you can reach it from the source vertex.
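The Bellman Ford behaviour summarised above (V-1 passes, the early-exit flag, and the negative weight cycle check) can be sketched as follows. The class name `BellmanFordSketch`, the `{u, v, w}` edge layout, and returning `null` to signal a reachable negative cycle are all conventions assumed for this example.

```java
import java.util.Arrays;

// Optimised Bellman Ford with a negative weight cycle check.
// Returning null for "negative cycle reachable from the source" is an assumption of this sketch.
class BellmanFordSketch {
    static final long INF = Long.MAX_VALUE / 4;

    // edges[k] = {u, v, w} for a directed edge u -> v with weight w.
    static long[] shortestPaths(int V, int[][] edges, int source) {
        long[] dist = new long[V];
        Arrays.fill(dist, INF);
        dist[source] = 0;
        for (int pass = 0; pass < V - 1; pass++) {
            boolean relaxed = false;
            for (int[] e : edges) {
                if (relax(dist, e)) relaxed = true;
            }
            if (!relaxed) break;                 // optimisation: nothing left to relax
        }
        for (int[] e : edges) {
            if (relax(dist, e)) return null;     // still relaxable: negative weight cycle
        }
        return dist;
    }

    private static boolean relax(long[] dist, int[] e) {
        if (dist[e[0]] == INF) return false;     // u is not reachable from the source yet
        if (dist[e[1]] > dist[e[0]] + e[2]) {
            dist[e[1]] = dist[e[0]] + e[2];
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1, 4}, {0, 2, 5}, {2, 1, -3}, {1, 3, 2}};
        System.out.println(Arrays.toString(shortestPaths(4, edges, 0)));
    }
}
```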
Dynamic Programming (DP)

Ingredients:
- Problem exhibits Optimal Sub-structure
o Optimal solution to problem can be constructed efficiently from optimal solutions of its
sub-problems
- Problem exhibits Overlapping Sub-problems
o Problem can be broken down into sub-problems which are reused several times
- No repeated computation of an overlapping sub-problem
o Solve sub-problem once, save the solution and re-use it! (Memoisation)
vertex -> state, edge -> transition
Bottom-up DP:
From known base case, compute the solution to the larger problems using topological ordering of the DAG.
Top-down DP: Compute the solution to the largest sub problem by building a recurrence relation to it from
the solution to a smaller problem, making use of recursion and memoisation of answers to smaller sub
problems to avoid recomputations. In general, define the problem and define the recurrence, use
memoisation.
SSSP on a DAG
Bottom-up DP: O(V+E)
Topological sort the graph [O(V+E)]. Perform a one-pass Bellman Ford on a graph, which relaxes the
outgoing edges in topological order [O(V+E)]. A relaxation on the outgoing edges of a vertex makes use of
the fact that the incoming edges prior to it have been relaxed as well.
SSLP* on an Explicit DAG

Bottom up DP: O(V+E)
Option 1: Negate all the edge weights and perform SSSP on a DAG like above. Negate the distance array back.

Option 2: Perform Modified SSSP that does ‘stretching’ instead of ‘relaxing’. Note distances now initialise
to -ve Infinity but the distance to the source is still 0.
*Single-Source Longest Path on a general (cyclic) graph is NP-Hard and not in the scope of CS2010
Longest Increasing Subsequence
Bottom up DP: O(N^2)
Model it as an (implicit) DAG which also includes a vertex of value infinity at the end of the sequence.
(An edge exists from a low to a high number. And since there are N-1 edges in between N numbers, add a
dummy number of infinity to make the number of edges equal the number of elements in the sequence.)
Distance array is thus size N+1.
distance[i] is the length (in edges) of the LIS ending at the vertex A[i]; all are initialised to 0.
The array itself is already in topological order.
For each index i from left to right
    For each index j to the right of it
        If it’s an increase in value (an implicit edge!)
            ‘Stretch’ the distance of the destination vertex: distance[j] = max(distance[j], distance[i] + 1).
Top down DP: O(N^2)

Let LIS(i) be the length of the LIS starting from index i till the end of the array sequence.
LIS(lastIndex) = 1 since it’s the end of the sequence until itself.
LIS(i) = Possible maximum LIS on its right side + 1
(The 1 counts A[i] itself, but a LIS on the right can only be extended if it goes from low to high)
To find the possible maximum LIS on the right, you have to consider every index from i+1 to the end of
the sequence whose value satisfies increasing order.
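The bottom-up modelling described in this section, with a dummy infinity vertex at index N, could be written in Java roughly like this. It is a minimal sketch; the class name `LisSketch` is hypothetical, and index N is treated specially rather than storing an actual infinity value.

```java
// Bottom-up LIS as longest path on the implicit DAG, O(N^2).
// Index n plays the role of the dummy "infinity" vertex described in the notes.
class LisSketch {
    static int lisLength(int[] a) {
        int n = a.length;
        int[] distance = new int[n + 1];            // all initialised to 0
        for (int i = 0; i < n; i++) {               // the array is already in topological order
            for (int j = i + 1; j <= n; j++) {
                if (j == n || a[i] < a[j]) {        // implicit edge i -> j
                    distance[j] = Math.max(distance[j], distance[i] + 1); // stretch
                }
            }
        }
        return distance[n];                         // longest path into the dummy vertex
    }

    public static void main(String[] args) {
        System.out.println(lisLength(new int[]{-7, 10, 9, 2, 3, 8, 8, 1}));
    }
}
```

Because every real element has an edge into the dummy vertex, the distance at index N equals the number of elements on the longest increasing chain.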
Counting Paths on (explicit) DAG

(Problem definition: Number of paths from source to destination)
In topological order, remove a vertex and increase the count of the vertices connected to its outgoing
edges. This propagation goes all the way to the destination vertex.
Top down DP: O(V+E)

(*slight change to problem definition: the count is now defined from each vertex towards the destination)
Let numPathsToV(i) be the number of paths starting from vertex i to destination d
numPathsToV(d) = 1 since it’s the destination.
numPathsToV(i) = The sum of all numPathsToV(j) where j are the vertices that i has an outgoing edge to.
(Narrating the lecture notes example)

There is one way from 8 to 8. The number of ways from 6 to 8 is the number of ways from 8 to 8. The number of ways 4
to 8 is the number of ways from 6 to 8 plus ways from 5 to 8, which is 2. The number of ways from 1 to 8 is the number
of ways from 4 to 8 plus ways from 2 to 8, which is 4 + 2.
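A minimal Java sketch of the top-down recurrence follows. The names `PathCounter`, `countPaths`, and `dag` are hypothetical, and the small diamond-shaped graph in `main` is a made-up example, not the one from the lecture notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Top-down DP for counting paths in a DAG, O(V+E) thanks to memoisation.
// All names and the example graph are assumptions for illustration.
class PathCounter {
    static long countPaths(List<List<Integer>> adj, int source, int dest) {
        long[] memo = new long[adj.size()];
        Arrays.fill(memo, -1);                      // -1 means "not computed yet"
        return numPathsTo(source, dest, adj, memo);
    }

    private static long numPathsTo(int i, int d, List<List<Integer>> adj, long[] memo) {
        if (i == d) return 1;                       // base case: the destination itself
        if (memo[i] != -1) return memo[i];          // reuse the saved answer
        long total = 0;
        for (int j : adj.get(i)) {                  // sum over outgoing neighbours
            total += numPathsTo(j, d, adj, memo);
        }
        memo[i] = total;
        return total;
    }

    // Builds a directed adjacency list from {u, v} pairs.
    static List<List<Integer>> dag(int V, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < V; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {0, 2}, {1, 3}, {2, 3}, {0, 3}};
        System.out.println(countPaths(dag(4, edges), 0, 3));
    }
}
```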
Travelling Salesman
C cities, start at city S, can end at any other city E.

What is the max profit he can get?
Answer: Infinity.
Single Source Longest Non Simple Path Problem on a General Graph with Cycles allows re-visitations, thus
causing cycles.
Single Source Longest Simple Path is a possible problem.
C cities, start at city S, can end at any other city E. But can only visit T number of cities.
What is the max profit he can get?
Top-down DP
You can model this as an SSLP problem on a DAG.
Change the graph into a DAG of vertex (City, NumCitiesLeft).
But an explicit graph modelling will have C*T number of vertices.
Let get_profit(u,t) be the maximum profit that salesman can get when he is at city u with t number of
cities left to visit.
if t = 0,
    get_profit(u,t) = 0 if the salesman can end at city u,
    get_profit(u,t) = -INF if he cannot end at city u.
else if t > 0,
    get_profit(u,t) = max(profit[u][v] + get_profit(v,t-1)) for all v in cities except for v = u.
Memoise using a 2D array memo table corresponding to city and NumCitiesLeft. O(C*T) space
Time Complexity
# of vertices in DAG - O(C*T)
Time to compute one distinct state (# of edges of each cities) - O(C)
Overall: O(C^2*T)
Given C cities, completely connected to each other. Find the shortest tour, ending at the starting city,
which visits every other city exactly once.
Brute Force (Naïve) Solution

Try all C! permutations, computing each cost. Pick the minimum cost tour. - O(C!*C)
To generate all permutations, modify DFS
private static void DFSrec(int u) {
    visited[u] = true;
    for (int v = 0; v < N; v++)
        if (!visited[v])
            DFSrec(v);
    // if all visited are true, then can compute the cost here.
    visited[u] = false;
}
DP taught in CS4243
Longest Common Subsequence

Bottom Up DP
Initialise all base cases to 0 (D[0][j] = D[i][0] = 0). Fill the table from left-to-right, top-to-bottom
Let D[i][j] be the length of the LCS of the first i letters of x and the first j letters of y
D[i][j] = D[i-1][j-1] + 1 if x[i] == y[j] i.e. same letter.
Else D[i][j] = max(D[i-1][j], D[i][j-1]);
# of distinct states – O(length(x)*length(y))

Time to compute one distinct state – O(1)
Overall: O(length(x)*length(y))
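The recurrence above translates into a short Java table-filling routine. This is a sketch with a hypothetical class name `LcsSketch`; since Java strings are 0-based, letter i of x in the recurrence corresponds to `x.charAt(i - 1)` here.

```java
// Bottom-up LCS length, O(length(x) * length(y)) time and space.
// D[i][j] is the LCS length of the first i letters of x and the first j letters of y.
class LcsSketch {
    static int lcsLength(String x, String y) {
        int[][] D = new int[x.length() + 1][y.length() + 1]; // row 0 and column 0 stay 0
        for (int i = 1; i <= x.length(); i++) {
            for (int j = 1; j <= y.length(); j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    D[i][j] = D[i - 1][j - 1] + 1;            // same letter
                } else {
                    D[i][j] = Math.max(D[i - 1][j], D[i][j - 1]);
                }
            }
        }
        return D[x.length()][y.length()];
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("ABCBDAB", "BDCABA"));
    }
}
```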
All-Pair Shortest Paths (APSP) Problem
Find the shortest paths between any pair of vertices in the given directed weighted graph.
Using Existing SSSP Algorithms

Apply an SSSP Algorithm V times, once from each vertex.
On unweighted graph:
BFS - O(V*(V+E)) = O(V^3) if E = O(V^2)
On weighted graph (without -ve weight):

Bellman Ford - O(V*(VE)) = O(V^4) if E = O(V^2)
Dijkstra’s - O(V*(V+E)*logV) = O(V^3logV) if E = O(V^2)
Floyd Warshall’s Algorithm

Data Structure:
2D distance matrix D[i][j] where D[i][j] contains the shortest path from i to j.
Initialisation:
D[i][i] = 0
D[i][j] = edgeExist(i,j) ? weightOfEdge(i,j) : INF
At the end of algorithm, D[i][j] contains the shortest path from i to j.
for (k = 0; k < V; k++) {
    for (i = 0; i < V; i++)
        for (j = 0; j < V; j++)
            D[i][j] = Math.min(D[i][j], D[i][k]+D[k][j]);
}
Time complexity: O(V^3) with future queries at O(1)
Can handle –ve edge. Can detect –ve weight cycle.
Variants
Printing the Actual SP
Additional Data Structure:
2D predecessor matrix p[i][j] where p[i][j] is the predecessor of j on a shortest path from i to j.
Initialisation:
p[i][j] = i for all.
Modification:
if (D[i][k] + D[k][j] < D[i][j]) {
    D[i][j] = D[i][k] + D[k][j];
    p[i][j] = p[k][j];
}
reconstructReversedPath(endVertex, source){
    i = source; j = endVertex;
    while (j != source) {
        print(j);
        j = p[i][j];
    }
    print(source);
}
(Trace)
reconstructReversedPath(4, 3)
i = 3
j = 4
j = 4 != 3; enter while loop
print(j) = print(4)
j = p[3][4] = 2
j = 2 != 3; remain in while loop
print(j) = print(2)
j = p[3][2] = 0
j = 0 != 3; remain in while loop
print(j) = print(0)
j = p[3][0] = 3
j = 3 == 3; break out of while loop
print(source) = print(3);
Transitive Closure Problem

Given a graph, determine if vertex i is connected to vertex j either directly (via an edge) or indirectly
(via a path)
Initialisation:
D[i][i] = 0
D[i][j] = 1 if edge i-j exists
D[i][j] = 0 otherwise
Modification:
D[i][j] = D[i][j] | (D[i][k] & D[k][j]);
Minimax/Maximin Problem
Finding the minimum of maximum edge weight along all possible paths from vertex i to vertex j (Minimax)
Finding the maximum of minimum edge weight along all possible paths from vertex i to vertex j (Maximin)
Initialisation:
D[i][i] = 0
D[i][j] = weight of edge i-j if it exists
D[i][j] = INF otherwise (for Maximin, use -INF so that missing edges never win the max)
Modification:
D[i][j] = Math.min(D[i][j], Math.max(D[i][k], D[k][j])); (Minimax)
D[i][j] = Math.max(D[i][j], Math.min(D[i][k], D[k][j])); (Maximin)
Detecting +ve/-ve Cycle
Modification:
Set the main diagonal of D to INF
After running Floyd Warshall’s, recheck the main diagonal
D[i][i] < INF but >= 0 -> +ve cycle
D[i][i] < 0 -> -ve cycle for vertex i
Miscellaneous Algorithms for General Use
1) Merge routine on 2 lists/arrays already sorted. (which can also be used to find common element between
two lists/arrays)
Example:
list1: 1 2 3 5 7
list2: 0 4 6 7 10
---> 0 1 2 3 4 5 6 7 7 10
O(m+n) where m , n is size of the lists
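A two-pointer merge of the kind described here might look as follows in Java. The class and method names are hypothetical; the `main` method reuses the two lists from the example above.

```java
import java.util.Arrays;

// Merges two already-sorted arrays in O(m+n) with one pointer per array.
// MergeSketch is an illustrative name, not from the notes.
class MergeSketch {
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++]; // take the smaller front item
        }
        while (i < a.length) out[k++] = a[i++];          // copy any leftovers from a
        while (j < b.length) out[k++] = b[j++];          // copy any leftovers from b
        return out;
    }

    public static void main(String[] args) {
        int[] list1 = {1, 2, 3, 5, 7};
        int[] list2 = {0, 4, 6, 7, 10};
        System.out.println(Arrays.toString(merge(list1, list2)));
        // prints [0, 1, 2, 3, 4, 5, 6, 7, 7, 10]
    }
}
```

The same two-pointer walk finds common elements if, instead of copying, it records the positions where `a[i] == b[j]`.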
2) Quickselect to find kth smallest element in an unordered list, with random pivot
Worst     Best    Average
O(n^2)    O(n)    O(n)
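For reference, a random-pivot Quickselect can be sketched in Java as below. The class name `QuickSelectSketch`, the 1-based `k`, and the Lomuto-style partition are choices made for this example; the input array is copied so the caller's data is left untouched.

```java
import java.util.Random;

// Quickselect with a random pivot: expected O(n), worst case O(n^2).
// The Lomuto partition and the 1-based k are assumptions of this sketch.
class QuickSelectSketch {
    private static final Random RNG = new Random();

    // Returns the k-th smallest element (k = 1 is the minimum).
    static int select(int[] input, int k) {
        int[] a = input.clone();
        int lo = 0, hi = a.length - 1, target = k - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi, lo + RNG.nextInt(hi - lo + 1));
            if (target == p) return a[p];
            if (target < p) hi = p - 1;   // answer lies in the left part
            else lo = p + 1;              // answer lies in the right part
        }
        return a[lo];
    }

    // Moves the pivot to its final sorted position and returns that position.
    private static int partition(int[] a, int lo, int hi, int pivotIndex) {
        int pivot = a[pivotIndex];
        swap(a, pivotIndex, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        System.out.println(select(new int[]{7, 2, 9, 4, 1}, 3));
    }
}
```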
3) Counting Sort
O(n+k) where n is the number of elements in input array and k is the range of input.
- Counting sort is efficient if the range of input data is not significantly greater than the number of
objects to be sorted. Consider the situation where the input sequence is between range 1 to 10K and
the data is 10, 5, 10K, 5K.
- It is not a comparison based sort. Its running time complexity is O(n+k) with space proportional to
the range of data.
- It is often used as a sub-routine to another sorting algorithm like radix sort.
- Counting sort uses a partial hashing to count the occurrence of the data object in O(1).
- Counting sort can be extended to work for negative inputs also
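A simple counting sort that also handles negative inputs by offsetting with the minimum value could be sketched like this. The class name `CountingSortSketch` is hypothetical, and this version sorts plain integers only (it does not carry satellite data, so stability is not a concern here).

```java
import java.util.Arrays;

// Counting sort in O(n + k), where k = max - min + 1 is the range of the input.
// Offsetting by min is the extension that makes negative inputs work.
class CountingSortSketch {
    static int[] sort(int[] a) {
        if (a.length == 0) return new int[0];
        int min = a[0], max = a[0];
        for (int x : a) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        int[] count = new int[max - min + 1];
        for (int x : a) count[x - min]++;            // O(1) tally per element
        int[] out = new int[a.length];
        int idx = 0;
        for (int v = 0; v < count.length; v++) {
            for (int c = 0; c < count[v]; c++) {
                out[idx++] = v + min;                // write each value count[v] times
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[]{10, 5, -3, 5, 0})));
    }
}
```

As the notes warn, the `count` array grows with the range of the data, so an input like 10, 5, 10K, 5K allocates roughly ten thousand counters for four elements.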
4) Sorting Algorithms
             Worst         Best          Average       Notes
Insertion    O(n^2)        O(n)          O(n^2)
Bubble       O(n^2)        O(n)          O(n^2)
Selection    O(n^2)        O(n^2)        O(n^2)        Unstable
Quick        O(n^2)        O(nlog(n))    O(nlog(n))    Unstable
Merge        O(nlog(n))    O(nlog(n))    O(nlog(n))    Not in-place
Heap         O(nlog(n))    O(nlog(n))    O(nlog(n))
5) Exploit Spanning Trees (Multisource, or Treating the Destination as a Source)
There are times when you would like to know a shortest distance from the end vertex to a particular
vertex. Or you would like to see a spanning tree from multiple vertices. Remember to change the graph to
exploit the algorithms you know.
Detailed Code Implementations
Binary Heap
// O(logN)
Insert(v)
    heapsize++;                  // extend - O(1)
    A[heapsize] = v;             // insert at the back - O(1)
    ShiftUp(heapsize);           // fixes Binary Heap property - O(logN)

// O(logN)
void ShiftUp(i)
    while i > 1 && A[parent(i)] < A[i]   // while not root and max heap property violated
        swap(A[i], A[parent(i)])
        i = parent(i)

// O(logN)
Obj ExtractMax()
    maxV = A[1]                  // get the max value
    A[1] = A[heapsize]           // replace the max with the last item
    heapsize--                   // decrease the heapsize
    ShiftDown(1)                 // fix Binary Heap property - O(logN)
    return maxV

// O(logN)
void ShiftDown(int i)
    while i <= heapsize
        maxV = A[i];
        maxIndex = i;
        // find the larger of the two children
        if left(i) <= heapsize && maxV < A[left(i)]     // compare with left child, if it exists
            maxV = A[left(i)];
            maxIndex = left(i);
        if right(i) <= heapsize && maxV < A[right(i)]   // compare with right child, if it exists
            maxV = A[right(i)];
            maxIndex = right(i);
        if maxIndex != i
            swap(A[i], A[maxIndex])
            i = maxIndex;
        else
            break;

// O(NlogN)
CreateHeap(arr)
    N = size(arr)
    A[0] = 0                     // dummy entry because of 1-based indexing
    heapsize = 0                 // start from an empty heap
    for (i = 1; i <= N; i++)
        Insert(arr[i-1])         // O(logN)

// O(N)
CreateHeap(arr)
    N = size(arr)
    A[0] = 0                     // dummy entry
    for (i = 1; i <= N; i++)
        A[i] = arr[i-1]
    heapsize = N
    for (i = parent(N); i >= 1; i--)   // from parent of last leaf up to the root
        ShiftDown(i)

// O(NlogN)
Heapsort(arr)
    CreateHeap(arr)
    N = size(arr)
    for (i = 1; i <= N; i++)
        A[N-i+1] = ExtractMax()  // the slot just freed at the back receives the current max
    return A
BST/AVLTree
// O(h)
search(v)
    if (this.value == null) return null
    else if this.value == v
        return this.value
    else if this.value < v
        search right
    else
        search left

// O(h)
insert(v)
    if insertion point is found
        create new vertex
    if v < vertex.value
        go left
    else
        go right
    for AVL tree, update height and size
    -- vertices along the insertion path may have their height attribute updated.

// O(h)
findMin/findMax()
    traverse until leftmost/rightmost child (AVL tree)

// O(h)
predecessor(v)
    if this.left != null                      // if this vertex has a left subtree
        return findMax(this.left)             // return the max of this left subtree
    else
        p = this.parent, t = this
        while (p != null && t == p.left)      // while a parent exists and t is its left child
            t = p, p = t.parent               // keep climbing up
        if p is null return -1                // no predecessor
        else return p                         // return the predecessor

// O(h)
successor(v)
    if this.right != null                     // if this vertex has a right subtree
        return findMin(this.right)            // return the min of this right subtree
    else
        p = this.parent, t = this
        while (p != null && t == p.right)     // while a parent exists and t is its right child
            t = p, p = t.parent               // keep climbing up
        if p is null return -1                // no successor
        else return p                         // return the successor
// O(N) traversal of all elements
preorder()
    print
    preorder(t.left)
    preorder(t.right)
inorder()
    inorder(t.left)
    print
    inorder(t.right)
postorder()
    postorder(t.left)
    postorder(t.right)
    print
// O(h)
delete(t, v)
    find v in O(h) time
    by comparing t.key to v
    when v is found
        if (t.left == null && t.right == null)        // has no children, this is a leaf
            t = null;                                 // simply erase this node
        // has 1 child, either left or right
        else if (t.left == null && t.right != null)   // only one child at right
            t.right.parent = t.parent;
            t = t.right;                              // connect child to parent and vice versa
        else if (t.left != null && t.right == null)   // only one child at left
            t.left.parent = t.parent;
            t = t.left;                               // connect child to parent and vice versa
        else                                          // has two children, find successor in O(h)**
            int successorV = successor(v);
            t.key = successorV;                       // replace this key with the successor's key O(1)
            t.right = delete(t.right, successorV);    // delete the old successorV O(h)
    update height and size
    return t
// O(h)
int rank(node, v) { // assumes v exists in the bst and every vertex stores its subtree size
if (node.key == v)
return getsize(node.left) + 1; // everything in the left subtree is smaller, plus the node itself
else if (node.key > v)
return rank(node.left, v); // v must be in the left subtree
else
// v is larger than the whole left subtree and this node: count them, then add v's rank in the right subtree
return getsize(node.left) + 1 + rank(node.right, v);
}
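A small Java sketch of rank on a size-augmented BST, under the same assumption that v is present. The names `RankNode` and `BstRankSketch` are hypothetical, and the tree here is a plain unbalanced BST rather than an AVL tree.

```java
// Hypothetical BST vertex that stores the size of its subtree.
class RankNode {
    int key;
    int size = 1;            // number of vertices in the subtree rooted here
    RankNode left, right;
    RankNode(int key) { this.key = key; }
}

class BstRankSketch {
    // Null-safe size, so a missing child counts as 0.
    static int size(RankNode t) { return t == null ? 0 : t.size; }

    static RankNode insert(RankNode t, int key) {
        if (t == null) return new RankNode(key);
        if (key < t.key) t.left = insert(t.left, key);
        else t.right = insert(t.right, key);
        t.size = 1 + size(t.left) + size(t.right); // refresh size on the way back up
        return t;
    }

    // 1-based rank of v among all keys; assumes v exists in the tree.
    static int rank(RankNode t, int v) {
        if (t.key == v) return size(t.left) + 1;
        if (v < t.key) return rank(t.left, v);
        return size(t.left) + 1 + rank(t.right, v);
    }
}
```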
leftrotate(t) {
bstvertex w = t.right; // get the right child of t
w.parent = t.parent; // swap their parents by pointing w to t's parents
t.parent = w; // *** w becomes t's parent
t.right = w.left; // take the child on w's left and give it to t's right
if (w.left != null) // does this child even exist?
w.left.parent = t; // make the child know t is now its parent
w.left = t; // *** t becomes w's left child
// update the lower vertex (t) first, then the higher one (w)
w.size = t.size; // w now roots the subtree that t used to root
t.size = 1 + getsize(t.left) + getsize(t.right);
t.height = 1 + math.max(getheight(t.left), getheight(t.right));
w.height = 1 + math.max(getheight(w.left), getheight(w.right));
return w;
}
rightrotate(t) {
bstvertex w = t.left; // get the left child of t
w.parent = t.parent; // swap their parents by pointing w to t's parents
t.parent = w; // *** w becomes t's parent
t.left = w.right; // take the child on w's right and give it to t's left
if (w.right != null) // does this child even exist?
w.right.parent = t; // make the child know t is now its parent
w.right = t; // *** t becomes w's right child
// update the lower vertex (t) first, then the higher one (w)
w.size = t.size; // w now roots the subtree that t used to root
t.size = 1 + getsize(t.left) + getsize(t.right);
t.height = 1 + math.max(getheight(t.left), getheight(t.right));
w.height = 1 + math.max(getheight(w.left), getheight(w.right));
return w;
}
4 cases of balancing
recall bf(x) = x.left.height - x.right.height
left left case: bf(x) = +2 && 0 <= bf(x.left) <= 1
    rightrotate(x)
left right case: bf(x) = +2 && bf(x.left) = -1
    leftrotate(x.left)
    rightrotate(x)
right right case: bf(x) = -2 && -1 <= bf(x.right) <= 0
    leftrotate(x)
right left case: bf(x) = -2 && bf(x.right) = 1
    rightrotate(x.right)
    leftrotate(x)
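The four cases can be combined into a single rebalance step that runs on every vertex along the insertion path. The sketch below is a simplified illustration: `AvlNode` and `AvlSketch` are hypothetical names, parent pointers and the size attribute are left out, and an empty subtree is assumed to have height -1.

```java
// Hypothetical AVL vertex; a leaf has height 0, an empty subtree counts as -1.
class AvlNode {
    int key;
    int height = 0;
    AvlNode left, right;
    AvlNode(int key) { this.key = key; }
}

class AvlSketch {
    static int h(AvlNode t) { return t == null ? -1 : t.height; }
    static int bf(AvlNode t) { return h(t.left) - h(t.right); }
    static void update(AvlNode t) { t.height = 1 + Math.max(h(t.left), h(t.right)); }

    static AvlNode rotateLeft(AvlNode t) {
        AvlNode w = t.right;
        t.right = w.left;
        w.left = t;
        update(t);  // t is now the lower vertex, so update it first
        update(w);
        return w;
    }

    static AvlNode rotateRight(AvlNode t) {
        AvlNode w = t.left;
        t.left = w.right;
        w.right = t;
        update(t);
        update(w);
        return w;
    }

    // Applies one of the four cases if x is out of balance.
    static AvlNode rebalance(AvlNode x) {
        update(x);
        if (bf(x) == 2) {
            if (bf(x.left) == -1) x.left = rotateLeft(x.left);    // left right case
            return rotateRight(x);                                 // left left case
        }
        if (bf(x) == -2) {
            if (bf(x.right) == 1) x.right = rotateRight(x.right); // right left case
            return rotateLeft(x);                                  // right right case
        }
        return x;
    }

    static AvlNode insert(AvlNode t, int key) {
        if (t == null) return new AvlNode(key);
        if (key < t.key) t.left = insert(t.left, key);
        else t.right = insert(t.right, key);
        return rebalance(t); // fix balance on the way back up
    }
}
```

Inserting 1 to 7 in ascending order, which would degenerate into a chain in a plain BST, gives a perfect tree rooted at 4 here.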
UFDS (Union-Find Disjoint Sets)
findSet(i)
recursively visit p[i] until p[i] = i
every item visited on the way gets connected directly to the representative item (path compression)
isSameSet(i,j)
check whether findSet(i) and findSet(j) return the same representative item
unionSet(i, j)
general idea:
make the rep item of the taller tree the parent of the rep item of the shorter tree
if both trees have the same rank, nest the rep item of i under the rep item of j and increase the rank of the latter
// version 1
first check whether i and j are already in the same set; if so, do nothing.
otherwise find the rep items of i and j, referred to as x and y respectively
(aka x = findSet(i) and y = findSet(j))
if the rank of x is greater than the rank of y
then nest y under x
else if the rank of y is greater than the rank of x
then nest x under y
else (the ranks are the same)
then nest x under y
and increase the rank of y
// alternative refactored version
first check whether i and j are already in the same set; if so, do nothing.
otherwise find the rep items of i and j, referred to as x and y
if the rank of x is greater than the rank of y
then nest y under x
else
nest x under y
if the ranks are the same
also increase the rank of y
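The refactored version translates almost line for line into Java. The class name `UnionFindSketch` is hypothetical; the arrays `p` and `rank` follow the naming used in the notes.

```java
// A minimal union-find sketch with path compression and union by rank.
class UnionFindSketch {
    private final int[] p;    // p[i] = parent of i; a representative item has p[i] == i
    private final int[] rank; // upper bound on the height of the tree rooted at i

    UnionFindSketch(int n) {
        p = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) p[i] = i; // every item starts in its own set
    }

    int findSet(int i) {
        if (p[i] == i) return i;
        p[i] = findSet(p[i]); // path compression: link i straight to the representative
        return p[i];
    }

    boolean isSameSet(int i, int j) {
        return findSet(i) == findSet(j);
    }

    void unionSet(int i, int j) {
        if (isSameSet(i, j)) return;
        int x = findSet(i), y = findSet(j);
        if (rank[x] > rank[y]) {
            p[y] = x;                          // nest the shorter tree under the taller one
        } else {
            p[x] = y;
            if (rank[x] == rank[y]) rank[y]++; // equal ranks: the merged tree grows by one
        }
    }
}
```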
Graph Traversal
// O(V+E)
// Each vertex enters the queue once - O(V)
// (Assuming AdjList) For each dequeued vertex, all k of its neighbours are checked
// Therefore, all E edges are examined - O(E)
// Therefore overall O(V+E)
BFS(source) {
    // initialisation
    for (i = 0; i < V; i++) {
        isVisited[i] = false;
        predecessor[i] = -1;
    }
    queue.enqueue(source);
    isVisited[source] = true;
    // main loop
    while (!queue.isEmpty()) {
        u = queue.dequeue();
        neighboursOfU = AdjList.get(u);
        for (v : neighboursOfU) {
            if (!isVisited[v]) {
                isVisited[v] = true;
                predecessor[v] = u;
                queue.enqueue(v);
            }
        }
    }
}
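As a runnable illustration of BFS, the sketch below returns the predecessor array of the BFS spanning tree and adds a small helper that walks it back into a path. `BfsSketch`, `bfs` and `path` are hypothetical names, the graph is assumed to be an adjacency list over vertices 0..V-1, and -1 marks the source as well as unreachable vertices.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

class BfsSketch {
    // Returns predecessor[] of the BFS spanning tree rooted at source.
    static int[] bfs(List<List<Integer>> adjList, int source) {
        int V = adjList.size();
        boolean[] isVisited = new boolean[V];
        int[] predecessor = new int[V];
        Arrays.fill(predecessor, -1);
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        isVisited[source] = true;
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adjList.get(u)) {
                if (!isVisited[v]) {       // each vertex is enqueued at most once
                    isVisited[v] = true;
                    predecessor[v] = u;
                    queue.add(v);
                }
            }
        }
        return predecessor;
    }

    // Walks predecessor[] back from endVertex; the result runs from the source to endVertex.
    static List<Integer> path(int[] predecessor, int endVertex) {
        LinkedList<Integer> path = new LinkedList<>();
        for (int i = endVertex; i != -1; i = predecessor[i]) path.addFirst(i);
        return path;
    }
}
```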
// O(V+E)
// Each vertex is visited once and flagged as visited to avoid cycles
// (Assuming AdjList) For each visited vertex, check all k of its neighbours and visit them if possible
// Therefore, all E edges are examined.
// Therefore overall O(V+E)
DFS(source) {
    for (i = 0; i < V; i++) {
        isVisited[i] = false;
        predecessor[i] = -1;
    }
    recursiveDFS(source);
}
recursiveDFS(u) {
    isVisited[u] = true;
    for (v : AdjList.get(u)) {
        if (!isVisited[v]) {
            predecessor[v] = u;
            recursiveDFS(v);
        }
    }
}
Graph Traversal Algorithms (Continued)
// iterative method
reconstructReversedPath(endVertex, source){
i = endVertex;
while (i != source) {
print(i); // Do whatever you want with i here, print, or grab the weight or whatever.
i = predecessor[i];
}
print(source);
}
// recursive method
reconstructPath(endVertex){
backtrack(endVertex);
}
void backtrack(vertex) {
if (vertex == -1) { // predecessor[source] is -1, so this stops after the source
return;
}
backtrack(predecessor[vertex]);
print(vertex); // printing after the recursive call outputs the path from source to endVertex
}
// check if u is reachable from v
boolean checkReachability(v, u) {
BFS(v); // or DFS(v)
return isVisited[u];
}
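// O(V+E) Every vertex is still visited exactly once across all the DFS calls.
int countComponents() {
    componentCount = 0;
    for (i = 0; i < V; i++) {
        isVisited[i] = false;
    }
    for (i = 0; i < V; i++) {
        if (!isVisited[i]) { // i starts a new component
            componentCount++;
            recursiveDFS(i);
        }
    }
    return componentCount;
}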
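DFS() { // toposort driver: start the modified DFS from every unvisited vertex
    for (i = 0; i < V; i++) {
        isVisited[i] = false;
        predecessor[i] = -1;
    }
    toposort.clear();
    for (i = 0; i < V; i++) {
        if (!isVisited[i]) {
            modifiedRecursiveDFSForToposort(i);
        }
    }
    outputInReversedOrder(toposort);
}
modifiedRecursiveDFSForToposort(u) {
    isVisited[u] = true;
    for (v : AdjList.get(u)) {
        if (!isVisited[v]) {
            predecessor[v] = u;
            modifiedRecursiveDFSForToposort(v);
        }
    }
    toposort.add(u); // post-order
}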
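A compact Java version of the DFS-based toposort, assuming the input is a DAG given as an adjacency list over vertices 0..V-1. `ToposortSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ToposortSketch {
    // Returns the vertices of a DAG in a valid topological order.
    static List<Integer> toposort(List<List<Integer>> adjList) {
        int V = adjList.size();
        boolean[] isVisited = new boolean[V];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < V; i++)
            if (!isVisited[i]) dfs(i, adjList, isVisited, order);
        Collections.reverse(order); // reversed post-order is a topological order
        return order;
    }

    private static void dfs(int u, List<List<Integer>> adjList,
                            boolean[] isVisited, List<Integer> order) {
        isVisited[u] = true;
        for (int v : adjList.get(u))
            if (!isVisited[v]) dfs(v, adjList, isVisited, order);
        order.add(u); // post-order: u is added only after everything reachable from it
    }
}
```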
MST Algorithms
// O(ElogV)
primsAlgo(source) {
addToMST(source);
enqueueAllEdgesConnectedToThisVertexIfNotInMST(source);
while (!pq.isEmpty()) {
edge = pq.dequeue(); //least weight edge in the PQ
v = getVertexLinkedToThisEdge(edge);
if (isPartOfMST(v) == false) {
addToMST(edge);
addToMST(v);
enqueueAllEdgesConnectedToThisVertexIfNotInMST(v);
}
}
}
enqueueAllEdgesConnectedToThisVertexIfNotInMST(vertex) {
for (neighbour : neighbours) {
if (isPartOfMST(neighbour) == false) {
edge = getEdge(vertex, neighbour);
pq.enqueue(edge);
}
}
}
void addToMST(vertex) {
taken[vertex] = true;
}
void addToMST(edge) {
AdjList.add(edge);
}
boolean isPartOfMST(vertex) {
return taken[vertex];
}
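A condensed Java sketch of Prim's algorithm that only returns the total MST weight. It assumes a connected, undirected graph where `adjList.get(u)` holds `{neighbour, weight}` pairs; `PrimSketch` and `mstWeight` are hypothetical names.

```java
import java.util.List;
import java.util.PriorityQueue;

class PrimSketch {
    // adjList.get(u) holds {v, w} pairs: an edge from u to v with weight w.
    static int mstWeight(List<List<int[]>> adjList, int source) {
        boolean[] taken = new boolean[adjList.size()];
        // Min-PQ ordered by edge weight.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        taken[source] = true;
        pq.addAll(adjList.get(source));
        int total = 0;
        while (!pq.isEmpty()) {
            int[] edge = pq.poll();  // least-weight edge leaving the current tree
            int v = edge[0];
            if (taken[v]) continue;  // both endpoints already in the MST, skip
            taken[v] = true;
            total += edge[1];
            for (int[] next : adjList.get(v))
                if (!taken[next[0]]) pq.add(next);
        }
        return total;
    }
}
```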
// O(ElogV)
kruskalsAlgo() {
while (hasUnprocessedEdgesLeft() == true) {
edge = getMinimumUnprocessedEdgeFromSortedEdgeList();
if (createsCycleIfAddThisEdgeToMST(edge) == false) {
addToMST(edge);
} else {
continue;
}
}
}
boolean createsCycleIfAddThisEdgeToMST(edge) {
u , v = incident vertices of the edge
if (isSameSet(u,v)) return true;
else return false;
}
void addToMST(edge) {
u , v = incident vertices of the edge
unionSet(u,v)
}
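Kruskal's algorithm in a few lines of Java, again returning only the total weight. The union-find part is reduced to a parent array with path compression (union by rank is left out to keep the sketch short); `KruskalSketch`, `mstWeight` and the `{u, v, w}` edge layout are assumptions made for this example.

```java
import java.util.Arrays;

class KruskalSketch {
    static int findSet(int[] p, int i) {
        if (p[i] != i) p[i] = findSet(p, p[i]); // path compression
        return p[i];
    }

    // edges[k] = {u, v, w}; returns the MST weight of a connected graph on V vertices.
    static int mstWeight(int V, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2])); // non-decreasing weight
        int[] p = new int[V];
        for (int i = 0; i < V; i++) p[i] = i;
        int total = 0;
        for (int[] e : sorted) {
            int x = findSet(p, e[0]), y = findSet(p, e[1]);
            if (x == y) continue; // same set: adding this edge would create a cycle
            p[x] = y;             // unionSet(u, v)
            total += e[2];
        }
        return total;
    }
}
```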
SSSP Algorithms
initaliseSSSP(source) {
for (vertex : vertices) {
distance[vertex] = Integer.MAX_VALUE;
predecessor[vertex] = -1;
}
distance[source] = 0;
}
boolean relax(u, v, weight(u,v)) {
// skip unreached u: Integer.MAX_VALUE + weight would overflow
if (distance[u] != Integer.MAX_VALUE && distance[v] > distance[u] + weight(u,v)) {
distance[v] = distance[u] + weight(u,v);
predecessor[v] = u;
return true;
}
return false;
}
// For an unweighted graph, or one where all edges have the same weight
// Or on a weighted tree (remember to replace the 1 with weight(u,v)!)
// O(V+E); O(V) on a weighted tree, since E = V-1
// version 1
modifiedBFSForSSSP(source) {
    // initialisation
    initaliseSSSP(source);
    queue.enqueue(source);
    // main loop
    while (!queue.isEmpty()) {
        u = queue.dequeue();
        for (v : AdjList.get(u)) {
            if (distance[v] == Integer.MAX_VALUE) { // v has not been reached yet
                distance[v] = distance[u] + 1;
                predecessor[v] = u;
                queue.enqueue(v);
            }
        }
    }
}
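// Version 2 - simpler
modifiedBFSForSSSP(source) {
    // initialisation
    initaliseSSSP(source);
    queue.enqueue(source);
    // main loop
    while (!queue.isEmpty()) {
        u = queue.dequeue();
        for (v : AdjList.get(u)) {
            canRelax = relax(u, v, 1);
            if (canRelax) {
                queue.enqueue(v);
            }
        }
    }
}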
Bellman-Ford’s Killer / Dijkstra’s Killer

// For general graph; can handle negative edges; can detect a negative cycle but cannot give an answer when one exists.
// O(VE)
// O(V+E) if the edges are relaxed in topological order (needs a DAG); one pass is then enough
bellmanFordSSSP(source) {
    initaliseSSSP(source);
    for (i = 1; i <= V - 1; i++) { // V-1 passes are enough when there is no negative cycle
        for (edge(u,v) : edges) {
            relax(u, v, weight(u,v));
        }
    }
    // check for negative-weight cycle: any edge that can still be relaxed proves one exists
    for (edge(u,v) : edges) {
        if (distance[u] != Integer.MAX_VALUE && distance[v] > distance[u] + weight(u,v)) {
            reportExistenceOfNegativeWeightCycle();
        }
    }
}
// Optimised Bellman Ford
// Add a flag: if a whole outer-loop iteration performs no edge relaxation, stop the algo
bellmanFordSSSPOptimised(source) {
    initaliseSSSP(source);
    for (i = 1; i <= V - 1; i++) { // this is the outer loop being referred to
        hasRelaxationInThisIteration = false; // reset at the start of every pass
        for (edge(u,v) : edges) {
            if (relax(u, v, weight(u,v))) hasRelaxationInThisIteration = true;
        }
        if (!hasRelaxationInThisIteration) { // no more relaxations are possible
            break;
        }
    }
    // check for negative-weight cycle
    for (edge(u,v) : edges) {
        if (distance[u] != Integer.MAX_VALUE && distance[v] > distance[u] + weight(u,v)) {
            reportExistenceOfNegativeWeightCycle();
        }
    }
}
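The optimised variant as a self-contained Java sketch. It assumes an edge list of `{u, v, w}` triples and signals a reachable negative-weight cycle by returning `null`; both conventions, as well as the names `BellmanFordSketch` and `sssp`, are choices made for this example only.

```java
import java.util.Arrays;

class BellmanFordSketch {
    static final int INF = Integer.MAX_VALUE;

    // edges[k] = {u, v, w}. Returns distance[] from source, or null if a
    // negative-weight cycle is reachable from source.
    static int[] sssp(int V, int[][] edges, int source) {
        int[] distance = new int[V];
        Arrays.fill(distance, INF);
        distance[source] = 0;
        for (int i = 1; i <= V - 1; i++) {
            boolean relaxedThisPass = false;
            for (int[] e : edges) {
                // Guard against INF + w overflow before comparing.
                if (distance[e[0]] != INF && distance[e[1]] > distance[e[0]] + e[2]) {
                    distance[e[1]] = distance[e[0]] + e[2];
                    relaxedThisPass = true;
                }
            }
            if (!relaxedThisPass) break; // nothing changed, so later passes cannot change anything
        }
        for (int[] e : edges) // one more pass: a possible relaxation means a negative cycle
            if (distance[e[0]] != INF && distance[e[1]] > distance[e[0]] + e[2]) return null;
        return distance;
    }
}
```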
// For general graph; cannot handle negative edges (if reachable from source)
// Cannot handle negative weight cycles (if reachable from source)
// O((V+E)logV)
originalDijkstras(source) {
    initaliseSSSP(source);
    for (vertex : vertices) {
        if (vertex == source) pq.enqueue(pair(0, vertex));
        else pq.enqueue(pair(Integer.MAX_VALUE, vertex));
    }
    while (!pq.isEmpty()) {
        pair(d, u) = pq.poll();
        for (v : neighboursOfU) {
            isRelaxed = relax(u, v, weight(u,v));
            if (isRelaxed) updateDistanceVInPQ(); // which is decreaseKey, not available in Java API
        }
    }
}
// For general graph; can handle negative edges
// Cannot detect nor handle a negative weight cycle (if reachable from source)
// O((V+E)logV) if there are no negative edges; can be slower otherwise
modifiedDijkstras(source) {
    initaliseSSSP(source);
    pq.enqueue(pair(0,source));
    while (!pq.isEmpty()) {
        pair(d, u) = pq.poll();
        if (d == distance[u]) { // otherwise this pair is outdated, skip it (lazy deletion)
            for (v : neighboursOfU) {
                isRelaxed = relax(u, v, weight(u,v));
                if (isRelaxed) pq.enqueue(pair(distance[v], v));
            }
        }
    }
}
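A Java sketch of the modified (lazy deletion) Dijkstra's using `java.util.PriorityQueue`, which has no decreaseKey. The adjacency list is assumed to hold `{neighbour, weight}` pairs; `DijkstraSketch` and `sssp` are hypothetical names, and unreachable vertices keep `Integer.MAX_VALUE`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class DijkstraSketch {
    // adjList.get(u) holds {v, w} pairs: a directed edge u -> v with weight w.
    static int[] sssp(List<List<int[]>> adjList, int source) {
        int[] distance = new int[adjList.size()];
        Arrays.fill(distance, Integer.MAX_VALUE);
        distance[source] = 0;
        // PQ entries are {d, u}, ordered by d.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        pq.add(new int[]{0, source});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int d = top[0], u = top[1];
            if (d != distance[u]) continue; // outdated entry left behind by an earlier relaxation
            for (int[] edge : adjList.get(u)) {
                int v = edge[0], w = edge[1];
                if (distance[v] > d + w) {  // relax(u, v, w)
                    distance[v] = d + w;
                    pq.add(new int[]{distance[v], v}); // re-insert instead of decreaseKey
                }
            }
        }
        return distance;
    }
}
```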
DP
private static int LIS(int i) {
if (i == N-1) return 1;
if (memo.get(i) != -1) return memo.get(i);
int ans = 1; // at least A[i] itself

for (int j = i+1; j < N; j++)
if (A[i] < A[j]) // if can be extended
ans = Math.max(ans, LIS(j)+1);
memo.set(i, ans);
return ans;
}
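LIS(i) above gives the length of the longest increasing subsequence that starts at index i, so the overall answer is the maximum over all i. A self-contained sketch of that wrapper follows; `LisSketch`, `lisFrom` and `lis` are hypothetical names, and a plain `int[]` replaces the memo list.

```java
import java.util.Arrays;

class LisSketch {
    private static int[] A;
    private static int[] memo;

    // Length of the longest strictly increasing subsequence starting at index i.
    private static int lisFrom(int i) {
        if (memo[i] != -1) return memo[i];
        int ans = 1; // at least A[i] itself
        for (int j = i + 1; j < A.length; j++)
            if (A[i] < A[j]) ans = Math.max(ans, lisFrom(j) + 1);
        memo[i] = ans;
        return ans;
    }

    // Overall LIS length: the best starting index wins. Returns 0 for an empty array.
    static int lis(int[] input) {
        A = input;
        memo = new int[A.length];
        Arrays.fill(memo, -1);
        int best = 0;
        for (int i = 0; i < A.length; i++) best = Math.max(best, lisFrom(i));
        return best;
    }
}
```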
private int numPathsToV(int i){
if (i == V - 1)
return 1;
if (memo.get(i) != -1) return memo.get(i);
int ans = 0;
for (int j = 0; j < AdjList.get(i).size(); j++){
int U = AdjList.get(i).get(j);
ans += numPathsToV(U);
}
memo.set(i, ans);
return ans;
}
private static int get_profit(int u, int t) {
if (t == 0) return candEnd[u] ? 0 : -INF;
if (memo[u][t] != -1) return memo[u][t];
memo[u][t] = -INF;
for (int v = 0; v < C; v++) {
if (v == u) continue;
memo[u][t] = Math.max(memo[u][t], profit[u][v] + get_profit(v, t-1));
}
return memo[u][t];
}
LCS() {
for(i = 0; i <= n; i++)
D[i][0] = 0;
for(j = 0; j <= m; j++)
D[0][j] = 0;
for(i = 1; i <= n; i++) {
for(j = 1; j <= m; j++) {
if (x[i] == y[j]) // x and y are 1-based here
D[i][j] = D[i-1][j-1] + 1;
else
D[i][j] = max(D[i-1][j], D[i][j-1]);
}
}
}
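The same bottom-up table in runnable Java. Strings are 0-based in Java, so `x.charAt(i - 1)` plays the role of `x[i]`; `LcsSketch` and `lcs` are hypothetical names.

```java
class LcsSketch {
    // Returns the length of the longest common subsequence of x and y.
    static int lcs(String x, String y) {
        int n = x.length(), m = y.length();
        int[][] D = new int[n + 1][m + 1]; // row 0 and column 0 stay 0 (empty prefix)
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1))
                    D[i][j] = D[i - 1][j - 1] + 1;                // extend the common subsequence
                else
                    D[i][j] = Math.max(D[i - 1][j], D[i][j - 1]); // drop one character
            }
        }
        return D[n][m];
    }
}
```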
