Directions/Notes:
Write your ID on every page of this exam. Write your name just on this cover page.
Be sure to sign the honor code statement when you are finished.
All questions on this exam are implicitly prefaced with "As taught in CMPSC 465 lectures this term."
Always justify your work and present solutions using the conventions taught in class.
The problem that is a take-off on a challenge problem is on the last sheet of the exam. You must solve that
problem first and remove and turn in that page of your exam within the first 30 minutes of the exam period.
Due to the logistics of timing the challenge problem, if you arrive late without having made prior
arrangements with the instructor, you will either forfeit missed time or be permitted to earn only up to half
credit on the problem.
Solve exactly 2 of 3 problems from among Problems 3, 4, and 5. Read directions within #3 about this very
carefully. In particular, if you solve all 3 and don't cross one out, only 1 score, not 2, will count.
Use pencil to complete this exam. Use of pen will result in an automatic 10-point deduction and we
reserve the right not to read any problem with cross-outs.
Score Breakdown:
#       Value   Score
1       25
2       31
3       12      (Solve 2 of these 3)
4       12      (Solve 2 of these 3)
5       12      (Solve 2 of these 3)
6       20
A.P.    --
Total   100
1.
Tracing Execution.
[25 pts.]
a.
Illustrate the action of QUICKSORT on the array 465, 122, 221, 311, 121, 360. To save time, only show the first 3
calls (the initial call and first two recursive calls) to QUICKSORT.
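[7]
Initial call: QUICKSORT(A, 1, 6), pivot A[r] = 360
[Figure: PARTITION trace with markers p, i, j, r. Array after each exchange:]
465 122 221 311 121 360
122 465 221 311 121 360
122 221 465 311 121 360
122 221 311 465 121 360
122 221 311 121 465 360
122 221 311 121 360 465    q = 5
First recursive call: QUICKSORT(A, 1, 4), pivot A[r] = 121
[Figure: PARTITION trace with markers p, i, j, r. No key is <= 121, so the only exchange is the final one:]
122 221 311 121
121 221 311 122    q = 1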
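The trace above can be replayed with a short Python sketch. It assumes the last-element-pivot PARTITION from CLRS, uses 0-based lists internally, and reports calls with 1-based indices; the names `partition`, `quicksort`, and `calls` are illustrative and do not come from the exam.

```python
def partition(a, p, r):
    """Partition a[p..r] (0-based, inclusive) around the pivot a[r]; return the pivot's final index."""
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def quicksort(a, p, r, calls):
    """Sort a[p..r] in place and record every call as a 1-based (p, r) pair."""
    calls.append((p + 1, r + 1))
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1, calls)
        quicksort(a, q + 1, r, calls)


data = [465, 122, 221, 311, 121, 360]
calls = []
quicksort(data, 0, len(data) - 1, calls)
print(calls[:3])  # the initial call and the first two recursive calls
print(data)
```

The third recorded call is on an empty range, which is why it does no partitioning work.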
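b.
Draw a heap representation of a priority queue from the keys 1582, 1106, 971, 869, 809, 794, 791, 788. (You don't
have to show any work for building the heap.) Illustrate and explain the steps done when EXTRACT-MAX is called on
this priority queue.
[4]
Heap before EXTRACT-MAX (level by level): 1582 | 1106 971 | 869 809 794 791 | 788
Steps:
(1) Store 1582 in max.
(2) Copy 788 (the last node) into the root.
(3) Decrease the heap size by 1 and HEAPIFY from the root: 788 is exchanged with 1106, then with 869.
(4) Return max.
Heap after EXTRACT-MAX (level by level): 1106 | 869 971 | 788 809 794 791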
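As a check on the EXTRACT-MAX steps, here is a minimal Python sketch. It assumes the heap is stored in a 0-based list, so the children of index i are 2i+1 and 2i+2; `max_heapify` and `extract_max` are illustrative names for the HEAPIFY and EXTRACT-MAX routines.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def extract_max(a):
    """Remove and return the largest key of the max-heap a."""
    if not a:
        raise IndexError("heap underflow")
    maximum = a[0]          # (1) store the root
    a[0] = a[-1]            # (2) copy the last node into the root
    a.pop()                 # (3) shrink the heap ...
    max_heapify(a, 0, len(a))  # ... and restore the heap property
    return maximum          # (4) return the stored maximum


heap = [1582, 1106, 971, 869, 809, 794, 791, 788]
print(extract_max(heap))
print(heap)
```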
c.
Illustrate the action of HEAPSORT on the array 465, 122, 221, 311, 121, 360. To save time, stop after there are 2
elements in your output array.
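First build heap (HEAPIFY from the last parent back to the root):
Start:                        465 122 221 311 121 360
HEAPIFY at 221 (last parent): 465 122 360 311 121 221
HEAPIFY at 122:               465 311 360 122 121 221
HEAPIFY at 465:               no change
Heap: 465 311 360 122 121 221
1) Exchange root 465 with last node 221, shrink the heap, then HEAPIFY:
360 311 221 122 121 | 465
2) Exchange root 360 with last node 121, shrink the heap, then HEAPIFY:
311 122 221 121 | 360 465
End Heap: 311 122 221 121
Output so far: 360 465
[7]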
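The partial HEAPSORT run can be reproduced with the sketch below. It assumes a 0-based list and an in-place HEAPSORT that grows the sorted output at the right end of the same array; `heapsort_steps` and its `steps` parameter are hypothetical names introduced only to stop after a chosen number of extractions.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down within a[0..heap_size-1]."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Call max_heapify from the last parent back to the root."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))


def heapsort_steps(a, steps):
    """Build a max-heap, then perform `steps` extractions; return the remaining heap size."""
    build_max_heap(a)
    heap_size = len(a)
    for _ in range(min(steps, len(a))):
        a[0], a[heap_size - 1] = a[heap_size - 1], a[0]
        heap_size -= 1
        max_heapify(a, 0, heap_size)
    return heap_size


data = [465, 122, 221, 311, 121, 360]
heap_size = heapsort_steps(data, 2)
print(data[:heap_size], data[heap_size:])  # remaining heap, then sorted output
```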
d.
Illustrate the action of BUCKET-SORT on 130, 141, 140, 104, 103, 129, assuming they've been drawn from data
uniformly distributed in [100, 142).
1) Distribute the keys into 6 buckets of width 7:
[100, 107): 104, 103
[107, 114):
[114, 121):
[121, 128):
[128, 135): 130, 129
[135, 142): 141, 140
2) Sort each bucket:
[100, 107): 103, 104
[128, 135): 129, 130
[135, 142): 140, 141
3) Concatenate lists:
103 104 129 130 140 141
[7]
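A minimal Python sketch of the three steps follows, assuming integer keys in [low, high) and one bucket per input element. The function name `bucket_sort` and the integer formula for the bucket index are choices made here for illustration, not taken from the exam.

```python
def bucket_sort(values, low, high):
    """Sort integer keys assumed to be uniformly distributed in [low, high)."""
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    # 1) Distribute: bucket b covers an equal-width slice of [low, high).
    for v in values:
        buckets[(v - low) * n // (high - low)].append(v)
    # 2) Sort each bucket with insertion sort.
    for bucket in buckets:
        for j in range(1, len(bucket)):
            key = bucket[j]
            i = j - 1
            while i >= 0 and bucket[i] > key:
                bucket[i + 1] = bucket[i]
                i -= 1
            bucket[i + 1] = key
    # 3) Concatenate the buckets in order.
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result


print(bucket_sort([130, 141, 140, 104, 103, 129], 100, 142))
```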
2.
[31 pts.]
a.
BUILD-HEAP calls HEAPIFY from ⌊n/2⌋ down to 1. Explain why these particular calls are made and why in this
order.
[3]
By definition, ⌊i/2⌋ is the index of the parent of node i in a heap.
Since there are n nodes total, ⌊n/2⌋ is the index of the last location that has children.
We start there and move backwards to 1, so that we guarantee we meet HEAPIFY's precondition that both children of
its input node are roots of heaps.
b.
Suppose you have a 5-ary heap with n nodes total. What is its exact height? What is the valid range of heights of
nodes? Illustrate where you would find nodes of the largest height and of heights 0 and 1.
[3]
c.
HEAPSORT, as we studied it, puts its output in an array and sorts in ascending order. Explain how to modify
HEAPSORT so that it gives its output in a linked list and in descending order. Your modifications should be as
efficient as possible and not use any memory that is not necessary.
[4]
We want the first element extracted to be the smallest, so we use a min-heap instead of a max-heap.
At each pass, insert the node that was the heap's root at the head of the linked list. (Insertion is thus always Θ(1)
since it's just pointer assignments.) Each later extraction is larger and goes in front of the earlier ones, so the finished list is in descending order.
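The modification can be sketched in Python as follows. This is one possible reading of the answer: the `Node` class and the helper names are hypothetical, the heap lives in the input list (0-based), and each extracted minimum is pushed onto the head of the list.

```python
class Node:
    """Singly linked list node (hypothetical helper, not from the exam)."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


def min_heapify(a, i, heap_size):
    """Sift a[i] down within the min-heap a[0..heap_size-1]."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < heap_size and a[left] < a[smallest]:
            smallest = left
        if right < heap_size and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest


def heapsort_to_descending_list(a):
    """Consume the list a as a min-heap and return the head of a descending linked list."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        min_heapify(a, i, n)
    head = None
    heap_size = n
    while heap_size > 0:
        smallest = a[0]
        heap_size -= 1
        a[0] = a[heap_size]
        min_heapify(a, 0, heap_size)
        head = Node(smallest, head)  # constant-time insertion at the head
    return head


def to_python_list(head):
    """Collect the keys of a linked list, front to back."""
    keys = []
    while head is not None:
        keys.append(head.key)
        head = head.next
    return keys


print(to_python_list(heapsort_to_descending_list([465, 122, 221, 311, 121, 360])))
```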
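d.
[Question text not recovered. Surviving answer fragment: a running time written as a sum whose first term is [BUILD-HEAP time], with the terms bounded as O(n) + O(n lg n).]
[4]
e.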
We studied a priority queue operation INCREASE-KEY(A, i, key). Describe an analogous DECREASE-KEY(A, i, key)
operation, assuming that key will always be strictly less than A[i]'s key. Explain why it works.
[3]
Change A[i]'s key to key.
Now A[i]'s key could be smaller than the keys of its children (and, in fact, of any of its descendants). Everything above
i still satisfies the heap property, because the key only got smaller. So we fix the subtree rooted at i by calling
HEAPIFY(A, i).
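A small Python sketch of this DECREASE-KEY follows, assuming a max-heap in a 0-based list. The names `decrease_key`, `max_heapify`, and `is_max_heap` are illustrative, and the example key 800 is made up for the demonstration.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def decrease_key(a, i, key):
    """Lower a[i] to key (0-based i), then repair the subtree below i."""
    if key >= a[i]:
        raise ValueError("new key must be strictly smaller than the current key")
    a[i] = key
    max_heapify(a, i, len(a))


def is_max_heap(a):
    """Check that every non-root key is at most its parent's key."""
    return all(a[(c - 1) // 2] >= a[c] for c in range(1, len(a)))


heap = [1582, 1106, 971, 869, 809, 794, 791, 788]
decrease_key(heap, 0, 800)
print(heap, is_max_heap(heap))
```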
f.
Give an array for which HEAPSORT will perform better than QUICKSORT. Explain why, using asymptotic running
times.
[4]
HEAPSORT runs in O(n lg n) worst-case time.
QUICKSORT's average case is Θ(n lg n), but its worst case of Θ(n^2) occurs when its input is already sorted (in either
order), because it produces splits where one of the conquer calls is on an empty array each time.
So, (100, 99, 98, 97, ..., 1) is one such array.
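To see the quadratic behavior concretely, the sketch below counts key comparisons made by a last-element-pivot QUICKSORT on the reverse-sorted array. The pivot rule is an assumption matching the CLRS PARTITION, and `quicksort_comparisons` is an illustrative name.

```python
def quicksort_comparisons(values):
    """Run last-element-pivot quicksort on a copy of values; return the number of key comparisons."""
    a = list(values)
    comparisons = 0

    def sort(p, r):
        nonlocal comparisons
        if p >= r:
            return
        pivot = a[r]
        i = p - 1
        for j in range(p, r):
            comparisons += 1
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        q = i + 1
        sort(p, q - 1)
        sort(q + 1, r)

    sort(0, len(a) - 1)
    return comparisons


n = 100
worst = quicksort_comparisons(range(n, 0, -1))
print(worst, n * (n - 1) // 2)  # prints "4950 4950"
```

Every partition here leaves one side empty, so the comparison count is (n-1) + (n-2) + ... + 1 = n(n-1)/2, which is quadratic, while HEAPSORT stays within O(n lg n) on the same input.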
g.
While studying for your Algorithms exam, you decide to consult Wikipedia for help and end up clicking through
pages about many different kinds of sorts, when you come across a poorly-written article on MAGICSORT. The
article doesn't tell you much about MAGICSORT, other than that it compares keys as part of its logic, sorts in place,
and runs in linear time. Should you trust this article? Explain.
[5]
The lower bound on comparison-based sorts is Ω(n lg n), so linear-time comparison sorts are impossible. Lies!
h.
Suppose I have a spreadsheet with columns with all of your names in one column, your section numbers in another
column, and an attendance rating of "good" or "poor" or "mixed" in a third column. I first sort this spreadsheet by
name, then I sort that result by section, then I sort that result by attendance. I want an output where those with good
attendance in Section 1 are listed alphabetically by name first, then those with good attendance in Section 2
alphabetically, etc., followed by those with mixed attendance in Section 1 alphabetically, those with mixed
attendance in Section 2, etc. What property or properties must the sorting algorithm my spreadsheet uses have in
order to get correct output? Explain.
[2]
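We need a stable sort, i.e. one where the relative order of records with equal keys is preserved by the sort.
Otherwise, a later sort could scramble the order established by an earlier one; e.g., the sort by section could destroy
the alphabetical order of names within a section.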
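The three-pass procedure can be tried with Python's built-in `sorted`, which is documented to be stable. The rows below are made-up sample data, and `ATTENDANCE_ORDER` is a hypothetical mapping that ranks "good" before "mixed" before "poor".

```python
# Made-up sample rows: (name, section, attendance).
rows = [
    ("Dana", 2, "good"),
    ("Alex", 1, "mixed"),
    ("Casey", 1, "good"),
    ("Blake", 2, "good"),
    ("Erin", 1, "good"),
    ("Frank", 2, "mixed"),
]
ATTENDANCE_ORDER = {"good": 0, "mixed": 1, "poor": 2}

# Sort by the least significant key first; each later stable sort keeps earlier order among ties.
by_name = sorted(rows, key=lambda row: row[0])
by_section = sorted(by_name, key=lambda row: row[1])
by_attendance = sorted(by_section, key=lambda row: ATTENDANCE_ORDER[row[2]])

for row in by_attendance:
    print(row)
```

Because each pass is stable, the final order agrees with a single sort on the composite key (attendance, section, name).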
i.
Suppose we ran bucket sort as specified in class, except expanded to accept data from 0 to 100, and fed it the ages of
several people surveyed in downtown State College over the course of the week. Describe the performance of the
sort.
[2]
Bucket sort assumes its data comes from a uniform distribution (recall the analysis with indicator random variables
to help establish our expectation of 1 element per bucket) to get its linear time. The distribution of ages in State
College is certainly not uniform over [0, 100). We'd expect disproportionately full buckets in the 18-22 region, and
sorting those buckets with insertion sort will require quadratic time.
j.
[1]
IMPORTANT: From among Problems 3, 4, and 5, solve EXACTLY TWO of the problems. If you start a problem but
decide later you don't want that problem to be graded, draw a giant X over the entire page. We will not look for
whether or not you've attempted more problems than you should and it is important for the processing of exams that you
follow these directions, so if you attempt all three, we will count one of the three scores and it will be the lowest one.
3.
Bucket Sort with a Twist. Bucket sort assumes its inputs are uniformly distributed and we can easily adapt it for any
uniform distribution and get pretty good performance. But what if we have a different kind of distribution? Suppose you
have an input data set of integers whose keys follow a distribution such that
All values fall in [0, 100).
There's a 64% chance values will fall in [70, 100). Among those values that do, 50% are expected to be in [80,
90) and be uniformly distributed between 80 and 90. All values in [70, 80) and [90, 100) are equally likely to
occur.
Values in [40, 50) are expected 6% of the time and values in this range are uniformly distributed.
All other values are uniformly distributed.
For the sake of convenience, you may assume that the size of input to be sorted is a multiple of 100. Explain how to
adapt bucket sort for this situation so that performance does not suffer. Justify your decisions and note running times.
[12 pts.]
Say this is a visualization of the number line from 0 to 100:
[Figure: number line from 0 to 99 with ticks every 10. [0, 40) and [50, 70) are shaded and together 30% likely; [40, 50) is 6% likely; [70, 100) is 64% likely, split as [70, 80) 16%, [80, 90) 32%, [90, 100) 16%.]
Since the known probabilities account for 70% of the probability, the probability a key falls in [0, 40) ∪ [50, 70) is the
probability of the complement, 30%. This region has total width 60, so each integer in it has a 0.5% chance of coming up.
Suppose input size is n.
The key factor in bucket sort's Θ(n) performance is having each bucket be equally likely to come up.
So, we must allocate the buckets in such a way that this happens:
Allocate 0.2n buckets evenly spaced in [0, 40)
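The allocation list above breaks off after its first entry. The sketch below is only one way to apply the stated principle (each bucket equally likely) to the remaining ranges: it gives every range a share of the n buckets equal to its probability. The `REGIONS` table, the function names, and the use of Python's `sorted` in place of insertion sort are all assumptions made here, and n is assumed to be a multiple of 100.

```python
# (low, high, percent of keys expected in [low, high)), read off the problem statement and figure.
REGIONS = [
    (0, 40, 20),
    (40, 50, 6),
    (50, 70, 10),
    (70, 80, 16),
    (80, 90, 32),
    (90, 100, 16),
]


def allocate_buckets(n):
    """Number of buckets given to each region, proportional to its probability."""
    return [n * percent // 100 for _, _, percent in REGIONS]


def bucket_index(value, n):
    """Index of the bucket for value, with buckets evenly spaced inside each region."""
    offset = 0
    for (low, high, _), count in zip(REGIONS, allocate_buckets(n)):
        if low <= value < high:
            return offset + (value - low) * count // (high - low)
        offset += count
    raise ValueError("value outside [0, 100)")


def adapted_bucket_sort(values):
    """Bucket sort using the probability-weighted bucket layout."""
    n = len(values)
    buckets = [[] for _ in range(n)]
    for v in values:
        buckets[bucket_index(v, n)].append(v)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # stand-in for the per-bucket insertion sort
    return result


print(allocate_buckets(100))
```

Bucket indices never decrease as the key grows, so concatenating the sorted buckets yields a sorted result.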
4.
Divide-and-Conquer. Consider the problem of counting the number of leaves in a subtree of a k-ary tree, given the
root of that subtree.
[12 pts.]
a.
Devise a divide-and-conquer algorithm for solving this problem. Specify the precondition(s) and postcondition(s) of
the algorithm and write pseudocode in a style similar to that used in CLRS and the lecture notes.
Divide work is very natural: divide for each of k children
So, conquer work is k subproblems of size n/k
Combine work is to add results of previous recursive calls.
The base of the recursion is a leaf; we return 1 to count that leaf.
So
COUNTLEAVES(T, r, k)
// PRE:  k > 0 and T is a k-ary tree. r is not NIL and points to some node in T.
// POST: FCTVAL == number of leaves in subtree of T rooted at r (given maximum k children per node)
{
    if r is a leaf                      // base: count leaf
    {
        return 1
    }
    else                                // recursion: divide and conquer!
    {
        count = 0
        for i = 1 to k
        {
            if ith leftmost child of r is not NIL
                count = count + COUNTLEAVES(T, ith leftmost child of r, k)
        }
        return count
    }
}
Another equally valid strategy for dealing with NIL nodes was to remove the "r is not NIL" part of the precondition
and add another base case for if r is NIL. In this case, the return value would be 0.
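For comparison, here is a runnable Python sketch of the same idea. It follows the alternative strategy just mentioned, where a NIL (here `None`) root returns 0. The `TreeNode` class and its `children` list are a hypothetical representation chosen for the example.

```python
class TreeNode:
    """Node of a k-ary tree; children holds up to k entries, with None for a missing child."""

    def __init__(self, children=None):
        self.children = children or []


def count_leaves(r):
    """Return the number of leaves in the subtree rooted at r (0 if r is None)."""
    if r is None:
        return 0                                    # base: empty subtree
    present = [c for c in r.children if c is not None]
    if not present:
        return 1                                    # base: r is a leaf
    return sum(count_leaves(c) for c in present)    # conquer each child, combine by adding


# A 3-ary example: the root has two children; one is a leaf, the other has three leaf children.
tree = TreeNode([
    TreeNode(),
    None,
    TreeNode([TreeNode(), TreeNode(), TreeNode()]),
])
print(count_leaves(tree))
```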
b.
Derive a recurrence for the running time of your algorithm. In doing so, make it clear what work in your algorithm
corresponds to each of the general parts of a divide-and-conquer algorithm.
Divide work is very natural: divide for each of k children -> already done -> 0 time
So, conquer work is k subproblems of size n/k -> k T(n/k)
Combine work is to add results of previous recursive calls -> add k values -> Θ(k)
So, recurrence is
T(n) = k T(n/k) + Θ(k)
or, more properly, as this is a multivariable function,
T(n, k) = k T(n/k, k) + Θ(k)
5.
Correctness and Heaps. Write a loop invariant and use it to prove the following algorithm correct:
[12 pts.]
LIST-ANCESTORS(A, k)
// PRE: k > 1 and A[1..k] defines the first k keys in a binary heap
// POST: FCTVAL is a pair (L[1..n], n) where L[1..n] are the ancestors of k in A, in order from A[k]'s parent to the
//       root of A
{
    ancs = new empty array
    anc_index = k
    for i = 1 to ⌊lg k⌋
        anc_index = ⌊anc_index / 2⌋
        ancs[i] = A[anc_index]
    return (ancs, i - 1)
}
Invariant:
At start of iteration i,
1. ancs[1..i-1] contains, in order from k's parent to i-1 generations up, k's ancestors.
2. A[anc_index] = k's ancestor i-1 levels up
Initialization:
Initially, i = 1.
Then ancs[1..i-1] = ancs[1..0], an empty array, so #1 is vacuously true.
Also, anc_index = k and i-1 = 0. k is indeed 0 levels up, and is, in some sense, its own ancestor 0 generations up,
so #2 is true.
Maintenance:
Suppose, at the start of iteration i, the invariant is true, i.e.
1. ancs[1..i-1] contains, in order from k's parent to i-1 generations up, k's ancestors.
2. A[anc_index] = k's ancestor i-1 levels up
[inductive hypothesis]
We set anc_index = ⌊anc_index / 2⌋, which by heap definitions is the index of the parent of that node. So, ancs[i] becomes this
node, and since anc_index was k's (i-1)st ancestor, ancs[i] will be k's ith ancestor.
The invariant got us ancs[1..i-1] and this gets us ancs[i], so now ancs[1..i] contains, in order from k's parent to i
generations up, k's ancestors.
Incrementing i preserves the invariant at the start of the next iteration.
Termination:
When the loop is done, i = ⌊lg k⌋ + 1.
So, i - 1 = ⌊lg k⌋.
This is the distance from k to the root, or the total number of ancestors. So, ancs[1..⌊lg k⌋] contains all ancestors
and is returned, so we get the postcondition.
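The algorithm and its invariant can be exercised with the Python sketch below. It assumes the heap A[1..n] is stored in a 0-based list, so A[j] is `heap[j - 1]`; the `assert` inside the loop checks both parts of the invariant at the start of each iteration. The function name `list_ancestors` is illustrative.

```python
def list_ancestors(heap, k):
    """Return (ancs, count): the keys of the ancestors of node k (1-based), from parent to root."""
    ancs = []
    anc_index = k
    levels = k.bit_length() - 1  # floor(lg k) for k >= 1
    for i in range(1, levels + 1):
        # Invariant: ancs holds the first i-1 ancestors, and anc_index is k's ancestor i-1 levels up.
        assert len(ancs) == i - 1 and anc_index == k >> (i - 1)
        anc_index //= 2                     # move to the parent
        ancs.append(heap[anc_index - 1])    # A[anc_index] with 1-based indexing
    return ancs, len(ancs)


heap = [1582, 1106, 971, 869, 809, 794, 791, 788]
print(list_ancestors(heap, 8))
print(list_ancestors(heap, 5))
```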
6.
[20 pts.]
a.
Suppose you are given a sequence of n elements to sort. The input sequence consists of subsequences of length k,
where it is given that k | n. The elements in any given subsequence are smaller than the elements in the succeeding
subsequence and larger than the elements in the preceding subsequence. Thus, all that is needed to sort the whole
sequence of length n is to sort the k elements of each of the subsequences. Rigorously derive a lower bound on the
leaves in the decision tree for solving this sorting problem.
[6]
Say this is a visualization of the sequence s:
[Figure: s drawn as n/k consecutive blocks, each of length k, covering positions 1..k, k+1..2k, 2k+1..3k, ..., (n/k-1)k+1..n.]
Step n/k: Sort the (n/k)th subsequence. Independently of how the previous ones were sorted, there are k! possible orderings.
By the multiplication rule, there are (k!)(k!)...(k!) [n/k factors] = (k!)^(n/k) possible permutations.
We need at least one leaf per permutation, so the lower bound on the number of leaves is (k!)^(n/k).
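The count (k!)^(n/k) can be sanity-checked by brute force for tiny n and k. The sketch below enumerates all orderings of 0..n-1 and keeps those where every block of length k holds only smaller values than the next block; `count_valid_inputs` is an illustrative name, and the enumeration is only feasible for very small n.

```python
from itertools import permutations
from math import factorial


def count_valid_inputs(n, k):
    """Count orderings of 0..n-1 whose consecutive length-k blocks are increasing block to block."""
    count = 0
    for perm in permutations(range(n)):
        blocks = [perm[s:s + k] for s in range(0, n, k)]
        if all(max(blocks[b]) < min(blocks[b + 1]) for b in range(len(blocks) - 1)):
            count += 1
    return count


for n, k in [(4, 2), (6, 2), (6, 3)]:
    print(n, k, count_valid_inputs(n, k), factorial(k) ** (n // k))
```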
b.
In the same setup as (a), assuming the n elements are in an array A[1..n], how would we refer to the jth
subsequence? In other words, what indices define it?
A[ ((j-1)k + 1) .. jk ]
(See figure to understand this.)
[2]
c.
Suppose that you have a set R of red water jugs and a set B of blue water jugs, where |R| = |B| = n, and jugs are of different
shapes and sizes. All red jugs hold different amounts of water, as do the blue ones. Moreover, for every red jug,
there is a blue jug that holds the same amount of water, and vice-versa. Your task is to find a grouping of red jugs
and blue jugs that hold the same amount of water. You may compare these jugs only in the way specified in the
homework problem and using the assumptions of the homework problem only.
i.
[2]
We want to match reds and blues, so we need a function that matches each element of R with exactly one
element of B a bijection from R to B (or vice-versa). Since there are n elements, there are n! different
bijections possible.
ii.
Describe an arbitrarily chosen node in the decision tree for solving this problem.
[3]
[Figure: a decision-tree node labeled b_i : r_j with three outgoing branches:]
cap(b_i) < cap(r_j)
cap(b_i) = cap(r_j)
cap(b_i) > cap(r_j)
iii. Define any necessary variables for and recall the inequality you derived for your decision tree here (no
derivation necessary, but you may opt to do scratch work on scratch paper as time permits).
[2]
Let h = height of decision tree
l = # of leaves of decision tree
As l ≥ n! and the branching factor is 3, 3^h ≥ l ≥ n!
iv. Starting from the result of (iii), prove the lower bound for the number of comparisons an algorithm solving
this problem must make.
[5]
3^h ≥ n!
log_3 3^h ≥ log_3 n!
h ≥ log_3 (n/e)^n             by Stirling's approximation
h ≥ n[log_3 n - log_3 e]      by laws of logs
h ≥ n log_3 n - n log_3 e     by multiplication
h = Ω(n lg n)                 as log_3 e is a constant, all logs are Θ(lg n), and n lg n dominates n
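As a numeric illustration of the bound, the sketch below computes the smallest h with 3^h ≥ n! using exact integers and compares it with n log_3 n - n log_3 e. The function name `min_height` and the sample values of n are choices made here for illustration.

```python
from math import e, factorial, log


def min_height(n):
    """Smallest h with 3**h >= n!, computed with exact integer arithmetic."""
    h, power, target = 0, 1, factorial(n)
    while power < target:
        power *= 3
        h += 1
    return h


for n in (4, 16, 64, 256):
    stirling_bound = n * log(n, 3) - n * log(e, 3)
    print(n, min_height(n), round(stirling_bound, 1))
```

In each row the exact minimum height is at least the Stirling-based lower bound, as the derivation requires.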
IMPORTANT NOTES:
This last problem must be solved in the first 30 minutes of the exam period.
Remove this sheet. Listen for time to be called to pass it in.
Make sure the last 4 digits of your ID number on this sheet match the rest of your exam or you
will not get credit for this problem.
Write the first letter (only) of your last name in the box to the right to assist us with sorting exams
while relatively maintaining anonymity during grading.
First Letter of
Last Name: