
Kareem Sabri January 22, 2010

ALGORITHMS AND DATA STRUCTURES
Sorting

I. Quicksort
O(n^2) worst-case, O(n log n) average (assuming distinct elements)
Algorithm: Runs on array a[p..r]
1. Choose pivot, a[r].
2. Partition a around the pivot, obtaining an index q such that a[p..q-1] <= a[q] and a[q+1..r] >= a[q].
3. Call Quicksort on a[p..q-1] and a[q+1..r].
The key is the Partition algorithm that rearranges the subarray:

Partition(p,r)
    x = a[r]
    i = p-1
    for j = p to r-1
        if a[j] <= x
            i = i+1
            exchange a[i] with a[j]
    exchange a[i+1] with a[r]
    return i+1

This algorithm works as follows:
1. Chooses the pivot to be the last element in the array.
2. Examines each element of the array, and if it is less than or equal to the pivot, swaps it with the element next to the last swapped element (which must be greater than the pivot, or it would have been swapped).
3. Swaps the pivot with the element next to the last swapped element (the pivot swaps with itself if every element was less than or equal to it).
4. Returns the new index of the pivot.
Termination occurs when p >= r; the array is then sorted. An improvement can be made by choosing a random pivot rather than the last element: select a random index as the pivot, swap it with the last element in the array, then proceed as normal.

II. Mergesort
O(n log n), all cases
Algorithm: Runs on array a[p..r]
1. Divide the array into 2 sub-arrays, each of size n/2.
2. Call Mergesort on the 2 sub-arrays.
3. Merge the sorted sub-arrays.
The key is the Merge algorithm to merge the sorted sub-arrays:

Merge(p,q,r)
    create an array b of size r-p+1 to hold a[p..r]
    k = p, i = 0, j = q-p+1
    while k <= r
        if b[i] <= b[j]
            a[k] = b[i]
            i++
        else
            a[k] = b[j]
            j++
        k++

If either i or j reaches the end of its sub-array, simply copy the remaining elements of the other sub-array into a.
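Both sorts above can be sketched in Python. This is a minimal illustration, not the notes' exact pseudocode: the function names are mine, quicksort uses the Lomuto partition described above, and this mergesort returns a new list rather than merging in place.

```python
def partition(a, p, r):
    x = a[r]                          # pivot: last element of a[p..r]
    i = p - 1                         # boundary of the "<= pivot" region
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # place pivot between the two regions
    return i + 1

def quicksort(a, p, r):
    """Sort a[p..r] in place; terminates when p >= r."""
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)

def mergesort(a):
    """Return a sorted copy of a; O(n log n) in every case."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    # one side is exhausted: append the remainder of the other
    return merged + left[i:] + right[j:]
```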

Data Structures

I. Heaps
An array structure that represents a nearly complete binary tree. Each array element corresponds to a node in the tree. It is nearly complete because it is filled on all levels except possibly the lowest, where it is filled from the left up to a point. The array has 2 attributes, length and heap_size, which can differ. With 1-based indexing the root is a[1]; for node a[i], a[i/2] is its parent, a[2i] its left child, and a[2i+1] its right child. (With 0-based indexing the root is a[0], the parent is a[(i-1)/2], and the children are a[2i+1] and a[2i+2].)
There are 2 types of heaps:
Max Heap: a[parent(i)] >= a[i]
Min Heap: a[parent(i)] <= a[i]
Heapsort uses a max heap to sort an unordered array, using:

MaxHeapify(i)
    l = Left(i), r = Right(i)
    if l <= a.heap_size and a[l] > a[i]
        largest = l
    else
        largest = i
    if r <= a.heap_size and a[r] > a[largest]
        largest = r
    if largest != i
        exchange a[i] and a[largest]
        MaxHeapify(largest)

MaxHeapify works by examining a node and swapping it with its largest child (if a child larger than it exists). This causes the node to float down until the max heap property is satisfied. We use MaxHeapify to build a max heap out of an unsorted array by calling it on all the lowest nodes that are not leaves, working upward (leaves take their appropriate positions by being swapped with their parents if necessary). The lowest node that is not a leaf is at index (length/2)-1 with 0-based indexing. To perform Heapsort we use the property that the maximum element of a max heap is always at the root: we swap it with the last element in the heap (and then decrement heap_size), then use MaxHeapify to restore the heap to max heap status.

Heapsort()
    BuildMaxHeap()
    for i = a.length-1 downto 1
        exchange a[i] with a[0]
        a.heap_size--
        MaxHeapify(0)
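A minimal Python sketch of the above, using 0-based indices (so the non-leaf nodes are len(a)//2 - 1 down to 0); function names are mine:

```python
def max_heapify(a, heap_size, i):
    """Float a[i] down until the subtree rooted at i satisfies the max heap property."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, heap_size, largest)

def heapsort(a):
    # build a max heap: heapify every non-leaf node, bottom-up
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, len(a), i)
    # repeatedly move the max (root) to the end and shrink the heap
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        max_heapify(a, end, 0)
```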

II. Priority Queues
Priority Queues can be efficiently implemented using the Heap data structure. Like Heaps, there are both Max and Min Priority Queues. To implement a Max Priority Queue, where the element with the greatest key is served first:

Maximum()
    return a[0]

ExtractMax()
    max = a[0]
    a[0] = a[a.heap_size - 1]
    a.heap_size--
    MaxHeapify(0)
    return max

To serve the max we call ExtractMax, which replaces the root with the lowest node on the tree, decrements the heap size, and then uses MaxHeapify to restore the max heap property to the tree.
We may want to increase the priority of a given node:

IncreaseKey(i, key)
    a[i] = key
    while i > 0 and a[Parent(i)] < a[i]
        exchange a[i] with a[Parent(i)]
        i = Parent(i)

IncreaseKey causes the increased node to float up the Heap until it is in its appropriate place. To Enqueue a new node we require an Insert function:

Insert(key)
    a.heap_size++
    a[a.heap_size - 1] = -infinity  // must be less than the minimum possible priority of any node
    IncreaseKey(a.heap_size - 1, key)

The implementation of a Min Priority Queue is analogous, except it uses a Min Heap and the functions Minimum, ExtractMin, and DecreaseKey.
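As a sketch, the max priority queue operations above can be packaged into a small Python class (the class and method names are mine; insert appends the key and floats it up directly, which is the IncreaseKey loop):

```python
class MaxPriorityQueue:
    """Array-backed max heap with 0-based indices, as described above."""
    def __init__(self):
        self.a = []

    def _parent(self, i):
        return (i - 1) // 2

    def _sift_down(self, i):            # iterative MaxHeapify
        n = len(self.a)
        while True:
            l, r, largest = 2 * i + 1, 2 * i + 2, i
            if l < n and self.a[l] > self.a[largest]:
                largest = l
            if r < n and self.a[r] > self.a[largest]:
                largest = r
            if largest == i:
                return
            self.a[i], self.a[largest] = self.a[largest], self.a[i]
            i = largest

    def maximum(self):
        return self.a[0]

    def extract_max(self):
        top = self.a[0]
        self.a[0] = self.a[-1]          # move the lowest node to the root
        self.a.pop()                    # shrink the heap
        if self.a:
            self._sift_down(0)
        return top

    def insert(self, key):              # append, then float up to its place
        self.a.append(key)
        i = len(self.a) - 1
        while i > 0 and self.a[self._parent(i)] < self.a[i]:
            p = self._parent(i)
            self.a[i], self.a[p] = self.a[p], self.a[i]
            i = p
```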

III. Hash Tables
Provides a dynamic set that supports Insert, Search, and Delete.
Direct Address Hash Tables
Useful when the universe of keys, U, is small. Store an array T of size |U|. If an element with key k exists then T[k] will contain (or point to) that element; otherwise it will be empty. This is impractical (or impossible) for large U, and inefficient for sparse Hash Tables.
Real Hash Tables
Uses a table of size m < |U|. An element with key k will hash to index h(k), where h is a Hash Function that calculates the index. Since m < |U|, two keys can collide, or hash to the same slot. To handle this we must have some type of collision resolution in place.
Collision Resolution
Chaining refers to the practice of placing all elements that hash to the same slot in a linked list. T[h(k)] stores a pointer to the head of the list, or null if no elements hashing to that slot are present in the Hash Table. Performance of Chained Hash Tables depends on the load factor, alpha = n/m, the average number of elements stored in a chain. Search should take O(1 + alpha) time assuming uniform hashing.
Probing refers to the practice of finding an alternate spot in the hash table for an element with key k if T[h(k)] is occupied. With h'(k) the base hash function and i the probe number:
Linear Probing: h(k,i) = (h'(k) + i) mod m
Quadratic Probing: h(k,i) = (h'(k) + a*i^2 + b*i) mod m. One way to do this: j = h'(k), i = 0; then repeatedly i++, j = (i+j) mod m (with m a power of 2).
Double Hashing: h(k,i) = (h1(k) + i*h2(k)) mod m
Choosing Hash Functions (table size = m)
Integers: if m is prime: h(k) = k mod m; otherwise use the multiplication method, h(k) = floor(m * (k*A mod 1)) where A = (sqrt(5)-1)/2.
Floating point numbers in range (s,t): h(k) = round(((k-s)/(t-s)) * m)
Strings: h = 0; foreach char: h = (127*h + char) mod m
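A minimal Python sketch of a chained hash table, using the string hash from the notes (the class name and the choice of m = 13, an arbitrary prime, are mine):

```python
class ChainedHashTable:
    """Collision resolution by chaining: each slot holds a list of (key, value)."""
    def __init__(self, m=13):           # assumed table size: a small prime
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _hash(self, key):               # h = (127*h + char) mod m, per the notes
        h = 0
        for ch in str(key):
            h = (127 * h + ord(ch)) % self.m
        return h

    def insert(self, key, value):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                # overwrite an existing key
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[self._hash(key)]:
            if k == key:
                return v
        return None                     # key not present

    def delete(self, key):
        idx = self._hash(key)
        self.slots[idx] = [(k, v) for k, v in self.slots[idx] if k != key]
```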

IV. Binary Trees
Data Structure, typically represented using linked nodes, where each Node of the tree contains a left child and a right child. The left child is less than or equal to its parent, and the right child is greater than or equal to its parent.
3 Orders of Tree traversal:
Pre-Order: Parent, Left Child, Right Child
In-Order: Left Child, Parent, Right Child
Post-Order: Left Child, Right Child, Parent
2 Orders of search:
Breadth-first search: All nodes at the current level are examined before proceeding to their children. Typically accomplished using a FIFO Queue.
Depth-first search: All children of a node are examined before proceeding to its sibling. Typically accomplished using a Stack.
Red Black Tree
A balanced binary tree. Every node has an additional attribute, Color, which can be either Red or Black. Maintains the following properties:
1. Every node is either red or black.
2. The root is black.
3. Every leaf (NIL) is black.
4. If a node is red, both of its children are black.
5. For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.
When a new node is inserted into the tree we always make it red. Node insertions may violate properties 2 or 4. To restore these properties we call RBInsertFixup on our newly inserted node, z. RBInsertFixup works as follows: While z's parent is red: If z's uncle is also red, this is the first case, in which we make z's grandparent red, and z's parent and uncle black. This restores property 4 for all subtrees below z's grandparent. We then continue checking at z's grandparent. If z's uncle is black, then we must perform rotations. If z is a right child, this is the second case, and we set z to z's parent and perform a Left Rotation. This makes z a left child with a red parent and black uncle. This is the third case, in which we make z's parent black and z's grandparent red, then perform a Right Rotation on z's grandparent. This makes z's black parent the root of the subtree, with red z and the red former grandparent as children, restoring the properties.

RBInsertFixup(z)
    while z.p.color == red
        if z.p == z.p.p.left
            y = z.p.p.right
            if y.color == red
                z.p.color = black
                y.color = black
                z.p.p.color = red
                z = z.p.p
            else
                if z == z.p.right
                    z = z.p
                    LeftRotate(z)
                z.p.color = black
                z.p.p.color = red
                RightRotate(z.p.p)
        else
            same with right and left interchanged
    root.color = black

Deleting a node can also cause violations of the red-black properties, and must be corrected using RBDeleteFixup. RBDelete proceeds as follows to delete a node z: Node y keeps track of a node (and its color) that was either removed from the tree or moved within the tree. If z has fewer than two children, z is replaced with its (possibly NIL) other child; y holds the color of z, the node that was removed. If z has two children we need to find a node to take the place of z and adopt its children. To do this we find the tree minimum, starting from z's right child; it must have no left child. Since the tree minimum will be moved, it becomes our node y. We replace y with its right child, which will not violate the integrity of the tree, and then we replace z with y, and y takes z's color. We also maintain a node x that may have a color violation. In the first case x points to z's replacement (its right or left child). In the second case x points to y's right child, the node that replaced y, as it may cause a color violation, while replacing z with y will not, since the same color relationship is maintained. If y's original color was black, we call RBDeleteFixup on x, the node that replaced y.
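Before moving on to graphs, the plain binary-search-tree operations and traversal orders from the start of this section can be sketched in Python (unbalanced tree, no red-black fixup; names are mine):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    """Insert into a plain (unbalanced) binary search tree; returns the root."""
    if root is None:
        return Node(key)
    if key <= root.key:                 # left child <= parent
        root.left = bst_insert(root.left, key)
    else:                               # right child >= parent
        root.right = bst_insert(root.right, key)
    return root

def in_order(root):
    """Left child, parent, right child: yields the keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

def pre_order(root):
    """Parent, left child, right child."""
    if root is None:
        return []
    return [root.key] + pre_order(root.left) + pre_order(root.right)
```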

V. Graphs
A data structure representing vertices connected by edges; Graphs can be either directed or undirected. There are two ways to represent a Graph: as an Adjacency List or an Adjacency Matrix. The Adjacency List representation provides an efficient way to represent sparse graphs, where |E| is much less than |V|^2. Adjacency Matrix representations require an entry for every possible edge, so |V|^2 entries; however, they support O(1) operations to determine if an edge exists between two vertices, and can also store this data in binary form, for efficient space utilization. An Adjacency List consists of an array of |V| lists, with each list holding the edges from a vertex to its adjacent vertices. The memory required for an Adjacency List is O(V+E), compared with O(V^2) for an Adjacency Matrix, hence Adjacency Lists are ideally suited for sparse graphs.
Breadth-First Search
Systematically explores G from the source s outwards, discovering all vertices at a distance d from s, then all vertices at a distance d+1 from s, and so on. This is accomplished using a FIFO Queue. The Breadth-First Search algorithm is as follows:

BreadthFirstSearch(s)
    foreach vertex u in G
        u.visited = false
        u.d = infinity
        u.prev = NULL
    s.visited = true
    s.d = 0
    s.prev = NULL
    Q.Enqueue(s)
    while Q is not empty
        u = Q.Dequeue()
        foreach v in Adj[u]
            if v.visited == false
                v.visited = true
                v.d = u.d + 1
                v.prev = u
                Q.Enqueue(v)

Breadth-First Search can be used to compute the shortest path from a node s to a node t. After running BreadthFirstSearch on s, we will have a breadth-first tree that contains all vertices reachable from s. If t.d is infinity then t is not present in the tree, and there is no path between s and t. Otherwise, we simply work backwards from t using the prev attribute of each vertex until we reach s. Pushing these vertices onto a LIFO stack will give us the shortest path from s to t.
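The BFS and path-recovery steps above can be sketched in Python on an adjacency-list graph (a dict of neighbor lists; function names are mine):

```python
from collections import deque

def bfs(adj, s):
    """Breadth-first search; returns (dist, prev) dictionaries."""
    dist = {u: float("inf") for u in adj}
    prev = {u: None for u in adj}
    dist[s] = 0
    q = deque([s])                      # FIFO queue
    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] == float("inf"):  # not yet discovered
                dist[v] = dist[u] + 1
                prev[v] = u
                q.append(v)
    return dist, prev

def shortest_path(adj, s, t):
    """Walk prev pointers back from t; empty list if t is unreachable."""
    dist, prev = bfs(adj, s)
    if dist[t] == float("inf"):
        return []
    stack = []
    while t is not None:
        stack.append(t)                 # vertices from t back to s
        t = prev[t]
    return stack[::-1]                  # pop them off in s-to-t order
```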

Depth-First Search
Systematically explores G by continually going deeper into the graph, always taking an edge leaving a vertex if one is available. Once a vertex has no more edges available, it backtracks to the previous vertex and explores its edges. The Depth-First Search algorithm appears below:

DepthFirstSearch()
    foreach vertex u in G
        u.visited = false
        u.finished = false
    time = 0
    foreach vertex u in G
        if u.visited == false
            DFSVisit(u)

DFSVisit(u)
    time = time + 1
    u.d = time
    u.visited = true
    foreach v in Adj[u]
        if v.visited == false
            v.prev = u
            DFSVisit(v)
    u.finished = true
    time = time + 1
    u.f = time
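A Python sketch of the above, recording the discovery/finish times and also building a topological order by prepending each vertex as it finishes (names are mine):

```python
def dfs(adj):
    """Depth-first search over every vertex; returns (d, f, order)."""
    d, f = {}, {}                    # discovery and finish times
    order = []                       # vertices prepended as they finish
    time = [0]                       # boxed so the nested function can update it

    def visit(u):
        time[0] += 1
        d[u] = time[0]
        for v in adj[u]:
            if v not in d:           # v not yet discovered
                visit(v)
        time[0] += 1
        f[u] = time[0]
        order.insert(0, u)           # front of the list => topological order

    for u in adj:
        if u not in d:
            visit(u)
    return d, f, order
```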

The attributes d and f represent the time a vertex was discovered and finished (every edge leaving it has been explored), respectively. These attributes provide useful information about the Graph. For any two vertices u and v:
1. If [u.d, u.f] and [v.d, v.f] are disjoint, then neither v nor u is a descendant of the other.
2. If [u.d, u.f] is contained in [v.d, v.f], then u is a descendant of v.
These are the only two possible relationships between two [d,f] ranges.
Depth-First Search is also used to perform a Topological Sort, which is useful for a graph that indicates precedence among events (i.e. an edge from u to v implies u occurs prior to v). To perform a Topological Sort (and get the order in which events must occur), call DFS on the Graph and put each vertex on the front of a Linked List as it is finished.
Shortest Path Algorithms
Breadth-First Search computes the shortest path between two vertices in the case of an unweighted graph, where the weight of each edge is the same. In the case of a weighted Graph, where each edge has an associated weight or cost, a variation of Breadth-First Search is used to find the lowest weight path. Dijkstra's Algorithm is one such algorithm. It makes use of a Minimum Priority Queue (which can be implemented using a Min Heap), which initially contains all vertices, to extract the lowest weight vertex, u, then relax all vertices v in Adj[u], after which u is added to the set S of finished vertices.

Dijkstra(s)
    InitializeSingleSource(s)
    S = empty set
    Q = G.V
    while Q is not empty
        u = ExtractMin(Q)
        S = S U {u}
        foreach v in Adj[u]
            Relax(u,v)

InitializeSingleSource(s)
    foreach vertex v in G.V
        v.d = infinity
        v.prev = NULL
    s.d = 0

Relax(u,v)
    if v.d > u.d + w(u,v)
        v.d = u.d + w(u,v)
        v.prev = u
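A Python sketch of Dijkstra's algorithm, using the standard-library binary min heap; instead of a true DecreaseKey it pushes a new entry on relaxation and skips stale ones, a common substitution (function name and graph format are mine):

```python
import heapq

def dijkstra(adj, s):
    """Dijkstra on an adjacency list of (neighbor, weight) pairs.

    Uses lazy deletion in place of DecreaseKey: a vertex may appear in the
    heap several times; only its first (smallest) extraction is processed.
    """
    dist = {u: float("inf") for u in adj}
    prev = {u: None for u in adj}
    dist[s] = 0
    pq = [(0, s)]
    done = set()                       # the set S of finished vertices
    while pq:
        du, u = heapq.heappop(pq)      # ExtractMin
        if u in done:
            continue                   # stale entry: u already finished
        done.add(u)
        for v, w in adj[u]:
            if dist[v] > du + w:       # Relax(u, v)
                dist[v] = du + w
                prev[v] = u
                heapq.heappush(pq, (dist[v], v))
    return dist, prev
```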

NP-Complete Problems
No polynomial-time algorithm has yet been discovered for any NP-Complete problem, but no one has been able to prove that no polynomial-time algorithm can exist for any one of them.
I. Traveling Salesman
Given a list of cities and their pairwise distances, find the shortest possible tour that visits each of them once. This problem is the same as finding the shortest Hamiltonian Cycle in an undirected, weighted graph. A Hamiltonian Cycle is a simple cycle that contains every vertex in V; the traveling salesman problem asks for the shortest one. Even determining whether a graph, undirected or directed, contains a Hamiltonian Cycle is NP-Complete.
II. Knapsack Problem
Given a set of items that each have a weight and a value, determine the number of each item to include in a set so that the total weight is less than a given limit and the total value is maximized.
III. Finding a longest path from a source.
IV. Determining whether a graph contains a path with at least a given number of edges.