
CMPSC 465

Data Structures and Algorithms


Spring 2013

Exam II: Heaps and Sorting


February 21, 2013
Name: __________________________________________________

Last 4 digits of ID: ____ ____ ____ ____

Honor Code Statement


I certify that I have not discussed the contents of the exam with any other student and I will not discuss the contents
of this exam with any student in this class before it is returned to me with a grade. I have not cheated on this exam in
any way, nor do I know of anyone else who has cheated.
Signature: _________________________________________________________

Directions/Notes:

Write your ID on every page of this exam. Write your name just on this cover page.

No outside notes, books, or calculators are permitted.

Be sure to sign the honor code statement when you are finished.

All questions on this exam are implicitly prefaced with "As taught in CMPSC 465 lectures this term."

Always justify your work and present solutions using the conventions taught in class.

The problem that is a take-off on a challenge problem is on the last sheet of the exam. You must solve that
problem first and remove and turn in that page of your exam within the first 30 minutes of the exam period.
Due to the logistics of timing the challenge problem, if you arrive late without having made prior
arrangements with the instructor, you will either forfeit missed time or be permitted to earn only up to half
credit on the problem.

Solve exactly 2 of 3 problems from among Problems 3, 4, and 5. Read directions within #3 about this very
carefully. In particular, if you solve all 3 and don't cross one out, only 1 score, not 2, will count.

Use pencil to complete this exam. Use of pen will result in an automatic 10-point deduction and we
reserve the right not to read any problem with cross-outs.

Score Breakdown:

#        1     2     3     4     5     6     A.P.   Total
Value    25    31    12    12    12    20    --     100
Score

(Problems 3, 4, and 5: solve 2 of these 3)

CMPSC 465 Exam 2 Spring 2013 Page 2

Last 4 Digits of ID: ___ ___ ___ ___

1.  Tracing Execution.  [25 pts.]

a.  Illustrate the action of QUICKSORT on the array 465, 122, 221, 311, 121, 360. To save time, only show the first 3
calls (the initial call and first two recursive calls) to QUICKSORT.  [7]
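Initial call: QUICKSORT (A, 1, 6), with p = 1, r = 6, and pivot A[r] = 360
(Each row shows i and the array after A[j] has been examined.)

  i   j    A[1]  A[2]  A[3]  A[4]  A[5]  A[6]
  0   1    465   122   221   311   121   360     465 > 360: no swap
  1   2    122   465   221   311   121   360     122 <= 360: swap A[1] and A[2]
  2   3    122   221   465   311   121   360     221 <= 360: swap A[2] and A[3]
  3   4    122   221   311   465   121   360     311 <= 360: swap A[3] and A[4]
  4   5    122   221   311   121   465   360     121 <= 360: swap A[4] and A[5]
  finish   122   221   311   121   360   465     swap A[5] and A[6]; q = 5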
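First recursive call: QUICKSORT (A, 1, 4), with p = 1, r = 4, and pivot A[r] = 121

  i   j    A[1]  A[2]  A[3]  A[4]
  0   1    122   221   311   121     122 > 121: no swap
  0   2    122   221   311   121     221 > 121: no swap
  0   3    122   221   311   121     311 > 121: no swap
  finish   121   221   311   122     swap A[1] and A[4]; q = 1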

Second recursive call: QUICKSORT (A, 1, 0)


Since p = 1, r = 0, p > r, QUICKSORT will return immediately
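
The trace can be checked with a short Python sketch. It assumes the last-element (Lomuto) partition scheme, which is consistent with the q values shown in the trace; the function names and the 0-based indexing are choices made for this illustration only, and each call is recorded in 1-based form to match the notation used here.

```python
def partition(a, p, r):
    """Partition a[p..r] (0-based, inclusive) around the pivot a[r]."""
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def quicksort(a, p, r, calls):
    """Sort a[p..r] in place, logging every call as a 1-based (p, r) pair."""
    calls.append((p + 1, r + 1))
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1, calls)
        quicksort(a, q + 1, r, calls)


data = [465, 122, 221, 311, 121, 360]
calls = []
quicksort(data, 0, len(data) - 1, calls)
print(calls[:3])  # the initial call and the first two recursive calls
print(data)
```

Only the first three logged calls correspond to the work shown in the trace; the rest finish the sort.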
b.  Draw a heap representation of a priority queue from the keys 1582, 1106, 971, 869, 809, 794, 791, 788. (You don't
have to show any work for building the heap.) Illustrate and explain the steps done when EXTRACT-MAX is called on
this priority queue.  [4]

Heap before EXTRACT-MAX:

                1582
          1106        971
       869    809  794   791
     788

Steps:
(1) Store 1582 in max.
(2) Copy 788 (the last node) into the root and shrink the heap by one.
(3) HEAPIFY at root: 788 swaps with 1106, then with 869.
(4) Return max = 1582.

Heap after EXTRACT-MAX:

                1106
           869        971
       788    809  794   791

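c.  Illustrate the action of HEAPSORT on the array 465, 122, 221, 311, 121, 360. To save time, stop after there are 2
elements in your output array.  [7]

First build heap:

  Input array:                       465  122  221  311  121  360
  HEAPIFY at last parent (index 3):  465  122  360  311  121  221     221 swaps with 360
  HEAPIFY at index 2:                465  311  360  122  121  221     122 swaps with 311
  HEAPIFY at index 1:                465  311  360  122  121  221     no change

Heap:

          465
       311   360
     122 121 221

1) Swap root 465 with the last heap element 221, then HEAPIFY at the root:

  221  311  360  122  121 | 465
  360  311  221  122  121 | 465     221 swaps with 360

2) Swap root 360 with the last heap element 121, then HEAPIFY at the root:

  121  311  221  122 | 360  465
  311  122  221  121 | 360  465     121 swaps with 311, then with 122

End Heap:

          311
       122   221
     121

Output so far: 360, 465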
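
As a cross-check on the hand trace, here is a minimal Python sketch of heapsort that stops after a given number of extractions. It assumes the usual array-based max-heap; the names sift_down and heapsort_steps, and the 0-based indexing, are illustrative choices rather than the notation used in lecture.

```python
def sift_down(a, i, size):
    """Restore the max-heap property below index i within a[0..size-1]."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heapsort_steps(values, stop_after):
    """Build a max-heap, then move stop_after maxima to the end of the array."""
    a = list(values)
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # last parent down to the root
        sift_down(a, i, n)
    built = list(a)
    size = n
    for _ in range(stop_after):
        a[0], a[size - 1] = a[size - 1], a[0]
        size -= 1
        sift_down(a, 0, size)
    return built, a, size


built, state, heap_size = heapsort_steps([465, 122, 221, 311, 121, 360], 2)
print(built)               # heap right after the build phase
print(state[:heap_size])   # remaining heap
print(state[heap_size:])   # sorted output so far
```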

d.  Illustrate the action of BUCKET-SORT on 130, 141, 140, 104, 103, 129, assuming they've been drawn from data
uniformly-distributed in [100, 142).  [7]

1) Build / fill buckets (6 buckets, each of width 7)

  [100, 107):  103, 104
  [107, 114):  (empty)
  [114, 121):  (empty)
  [121, 128):  (empty)
  [128, 135):  129, 130
  [135, 142):  141, 140

2) Insertion sort each list

  [100, 107):  103, 104
  [128, 135):  129, 130
  [135, 142):  140, 141

3) Concatenate lists

  103  104  129  130  140  141

[7]
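
The three phases can be sketched in Python as follows. This is only an illustration: it assumes integer keys, one bucket per input element, and ordinary Python lists standing in for the linked lists; the function names are made up for this example.

```python
def insertion_sort(items):
    """Sort a short list in place."""
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key


def bucket_sort(values, low, high):
    """Sort integer keys drawn from [low, high) using len(values) equal-width buckets."""
    n = len(values)
    buckets = [[] for _ in range(n)]
    for v in values:
        index = (v - low) * n // (high - low)  # which equal-width bucket v falls in
        buckets[index].append(v)
    for bucket in buckets:
        insertion_sort(bucket)
    return [v for bucket in buckets for v in bucket], buckets


result, buckets = bucket_sort([130, 141, 140, 104, 103, 129], 100, 142)
print(buckets)
print(result)
```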

2.  Short Answer Questions.  [31 pts.]

a.  BUILD-HEAP calls HEAPIFY from ⌊n / 2⌋ down to 1. Explain why these particular calls are made and why in this
order.  [3]

By definition, ⌊i / 2⌋ is the index of the parent of node i in a heap.
Since there are n nodes total, ⌊n / 2⌋ is the index of the last location that has children.
We start there and move backwards to 1, so that we guarantee we meet HEAPIFY's precondition that both children of
its input node are heaps.

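b.  Suppose you have a 5-ary heap with n nodes total. What is its exact height? What is the valid range of heights of
nodes? Illustrate where you would find nodes of the largest height and of heights 0 and 1.  [3]

Height of heap = ⌈log_5 (4n + 1)⌉ − 1, which is Θ(log_5 n). (A 5-ary heap of height h holds more than (5^h − 1)/4
and at most (5^(h+1) − 1)/4 nodes.)
Node heights range from 0 up to the height of the heap.
Largest height node is the root.
Height 1 nodes are those whose children are all leaves.
Height 0 nodes are leaves.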
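
A quick way to sanity-check any closed form for the height of a d-ary heap is to count levels directly. The sketch below does that by filling the tree level by level; the function name and the default d = 5 are choices made for this illustration.

```python
def dary_heap_height(n, d=5):
    """Height of a d-ary heap with n >= 1 nodes, found by filling whole levels."""
    height = 0
    level_size = 1   # number of nodes a full level at this depth would hold
    capacity = 1     # nodes that fit in levels 0..height
    while capacity < n:
        level_size *= d
        capacity += level_size
        height += 1
    return height


for n in (1, 2, 6, 7, 31, 32):
    print(n, dary_heap_height(n))
```

With d = 5, the root plus one full level of children holds 6 nodes, so a 7th node already starts a third level.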

c.  HEAPSORT, as we studied it, puts its output in an array and sorts in ascending order. Explain how to modify
HEAPSORT so that it gives its output in a linked list and in descending order. Your modifications should be as
efficient as possible and not use any memory that is not necessary.  [4]

We want the first element extracted to be the smallest, so we use a min heap instead of a max heap.
At each pass, insert the node that was the heap's root at the head of the linked list. (Insertion is thus always Θ(1)
since it's just pointer assignments.) Each later, larger key lands in front of the earlier ones, so the finished list is
in descending order.
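
The idea can be sketched in Python with the standard library's heapq module, which provides a binary min-heap. The ListNode class and the function names are hypothetical helpers for this illustration, and the copy of the input is a convenience of the sketch rather than part of the answer.

```python
import heapq


class ListNode:
    """Singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


def heapsort_descending_list(keys):
    """Return the head of a linked list holding keys in descending order."""
    heap = list(keys)        # copy so the caller's data is left untouched
    heapq.heapify(heap)      # build a min-heap in O(n)
    head = None
    while heap:
        smallest = heapq.heappop(heap)    # O(lg n) extraction
        head = ListNode(smallest, head)   # O(1) insertion at the head
    return head


def to_python_list(head):
    """Walk the linked list and collect its keys, for display only."""
    out = []
    while head is not None:
        out.append(head.key)
        head = head.next
    return out


print(to_python_list(heapsort_descending_list([465, 122, 221, 311, 121, 360])))
```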
d.  What is HEAPSORT's running time for an input of size n? Why?  [4]

T(n)  =  [BUILD-HEAP time]  +  [# loop iterations] * [swap time + HEAPIFY time]
      =  O(n)               +  (n − 1)             * [Θ(1)      + O(lg n)]
      =  O(n lg n)

e.  We studied a priority queue operation INCREASE-KEY(A, i, key). Describe an analogous DECREASE-KEY(A, i, key)
operation, assuming that key will always be strictly less than A[i]'s key. Explain why it works.  [3]

Change A[i]'s key to key.
Now, A[i]'s key could be smaller than its children's keys and, in fact, those of any descendant. Need to fix this, so
call HEAPIFY at i. Ancestors of i are unaffected, since their keys were already at least A[i]'s old, larger key.
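
A minimal Python sketch of this operation on an array-based max-heap is shown below. The names heap_decrease_key and max_heapify are illustrative, indices are 0-based, and the error check simply enforces the "strictly less" assumption from the question.

```python
def max_heapify(a, i):
    """Sift a[i] down until the max-heap property holds below it."""
    size = len(a)
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heap_decrease_key(a, i, key):
    """Lower a[i] to key, then repair the heap below position i."""
    if key >= a[i]:
        raise ValueError("new key must be strictly smaller than the current key")
    a[i] = key
    max_heapify(a, i)


heap = [1582, 1106, 971, 869, 809, 794, 791, 788]
heap_decrease_key(heap, 0, 800)   # lower the root's key from 1582 to 800
print(heap)
```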

f.  Give an array for which HEAPSORT will perform better than QUICKSORT. Explain why, using asymptotic running
times.  [4]

HEAPSORT runs in O(n lg n) worst case time.
QUICKSORT's average case is Θ(n lg n), but its worst case of O(n²) occurs when its input is already sorted (in
ascending or descending order), because it produces splits where one of the conquer calls is on an empty array each time.
So, (100, 99, 98, 97, …, 1) is one such array.

g.  While studying for your Algorithms exam, you decide to consult Wikipedia for help and end up clicking through
pages about many different kinds of sorts, when you come across a poorly-written article on MAGICSORT. The
article doesn't tell you much about MAGICSORT, other than that it compares keys as part of its logic, sorts in place,
and runs in linear time. Should you trust this article? Explain.  [5]

The lower bound on comparison-based sorts is Ω(n lg n), so linear time comparison sorts are impossible. Lies!

h.  Suppose I have a spreadsheet with columns with all of your names in one column, your section numbers in another
column, and an attendance rating of "good" or "poor" or "mixed" in a third column. I first sort this spreadsheet by
name, then I sort that result by section, then I sort that result by attendance. I want an output where those with good
attendance in Section 1 are listed alphabetically by name first, then those with good attendance in Section 2
alphabetically, etc., followed by those with mixed attendance in Section 1 alphabetically, those with mixed
attendance in Section 2, etc. What property or properties must the sorting algorithm my spreadsheet uses have in
order to get correct output? Explain.  [2]

We need a stable sort, i.e. one where the relative order of records with equal keys is preserved by the sort.
Otherwise, each later sort could scramble the order established by the earlier ones: the section sort could break up
the alphabetical order within a section, and the attendance sort could break up the section order within a rating.
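
The three-pass approach can be tried out in Python, whose built-in sorted is guaranteed to be stable. The roster below is made-up sample data, and the rank table encoding the order good, mixed, poor is an assumption of this sketch.

```python
# Hypothetical roster rows: (name, section, attendance rating).
rows = [
    ("Emery", 1, "good"),
    ("Finley", 2, "mixed"),
    ("Avery", 2, "good"),
    ("Casey", 1, "good"),
    ("Blake", 1, "mixed"),
    ("Devon", 2, "good"),
]
rank = {"good": 0, "mixed": 1, "poor": 2}  # desired order of the ratings

by_name = sorted(rows, key=lambda row: row[0])              # least significant key first
by_section = sorted(by_name, key=lambda row: row[1])        # stable: names stay ordered
result = sorted(by_section, key=lambda row: rank[row[2]])   # stable: sections stay ordered

for row in result:
    print(row)
```

Sorting once on the combined key (rating, section, name) gives the same order, which is a handy way to confirm the three stable passes did their job.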

i.  Suppose we ran bucket sort as specified in class, except expanded to accept data from 0 to 100, and fed it the ages of
several people surveyed in downtown State College over the course of the week. Describe the performance of the
sort.  [2]

Bucket sort assumes its data comes from a uniform distribution (recall the analysis with indicator random variables
to help establish our expectation of 1 element per bucket) to get its linear time. The distribution of ages in State
College is certainly not uniform over [0, 100). We'd expect disproportionately full buckets in the 18-22 region and
sorting those buckets with insertion sort will require quadratic time.

j.  What are the different components of a divide-and-conquer algorithm?  [1]

divide, conquer, combine

IMPORTANT: From among Problems 3, 4, and 5, solve EXACTLY TWO of the problems. If you start a problem but
decide later you don't want that problem to be graded, draw a giant X over the entire page. We will not look for
whether or not you've attempted more problems than you should and it is important for the processing of exams that you
follow these directions, so if you attempt all three, we will count one of the three scores and it will be the lowest one.

3.  Bucket Sort with a Twist. Bucket sort assumes its inputs are uniformly distributed and we can easily adapt it for any
uniform distribution and get pretty good performance. But what if we have a different kind of distribution? Suppose you
have an input data set of integers whose keys follow a distribution such that
  - All values fall in [0, 100).
  - There's a 64% chance values will fall in [70, 100). Among those values that do, 50% are expected to be in [80,
    90) and be uniformly distributed between 80 and 90. All values in [70, 80) and [90, 100) are equally likely to
    occur.
  - Values in [40, 50) are expected 6% of the time and values in this range are uniformly distributed.
  - All other values are uniformly distributed.
For the sake of convenience, you may assume that the size of input to be sorted is a multiple of 100. Explain how to
adapt bucket sort for this situation so that performance does not suffer. Justify your decisions and note running times.
[12 pts.]
Say this is a visualization of the number line from 0 to 100:

  Range        Likelihood
  [0, 40)      shaded (all shaded area together: 30% likely)
  [40, 50)     6% likely
  [50, 70)     shaded (all shaded area together: 30% likely)
  [70, 80)     16% likely
  [80, 90)     32% likely
  [90, 100)    16% likely      ([70, 100) together: 64% likely)

Since the known probabilities account for 70% of the probability, the probability a key falls in [0, 40) ∪ [50, 70) is the
probability of the complement, 30%. This total area is 60, so each integer has a 0.5% chance of coming up. That
makes [0, 40) 20% likely and [50, 70) 10% likely.
Suppose input size is n.
The key factor in bucket sort's Θ(n) performance is having each bucket be equally likely to come up.
So, we must allocate the buckets in such a way that this happens:
  Allocate .2n buckets evenly spaced in [0, 40)
  Allocate .06n buckets evenly spaced in [40, 50)
  Allocate .1n buckets evenly spaced in [50, 70)
  Allocate .16n buckets evenly spaced in [70, 80)
  Allocate .32n buckets evenly spaced in [80, 90)
  Allocate .16n buckets evenly spaced in [90, 100)
Then the expected length of each linked list will be 1! As in the standard bucket sort analysis, filling the buckets,
sorting each short list, and concatenating then take expected Θ(n) time overall.
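
One possible way to realize this allocation is to map each key to a bucket through the cumulative percentages of the six ranges. The Python sketch below assumes integer keys; the SEGMENTS table restates the percentages derived above, the function names are made up for this illustration, and the built-in list sort stands in for insertion sort on each short list.

```python
# (low, high, percent of keys expected in [low, high)), per the solution above.
SEGMENTS = [(0, 40, 20), (40, 50, 6), (50, 70, 10),
            (70, 80, 16), (80, 90, 32), (90, 100, 16)]


def bucket_index(x, n):
    """Bucket for integer key x when n buckets are spread according to SEGMENTS."""
    cumulative = 0  # percent of keys expected below the current segment
    for low, high, percent in SEGMENTS:
        if low <= x < high:
            width = high - low
            # n * (cumulative + percent * (x - low) / width) / 100, in integer arithmetic
            return n * (cumulative * width + percent * (x - low)) // (100 * width)
        cumulative += percent
    raise ValueError("key outside [0, 100)")


def adapted_bucket_sort(keys):
    """Bucket sort with bucket boundaries matched to the skewed distribution."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for x in keys:
        buckets[bucket_index(x, n)].append(x)
    out = []
    for bucket in buckets:
        bucket.sort()  # stand-in for insertion sort on a short list
        out.extend(bucket)
    return out


keys = [(37 * i) % 100 for i in range(100)]  # a scrambled 0..99, just for demonstration
print([bucket_index(x, 100) for x in (0, 40, 50, 70, 80, 90, 99)])
print(adapted_bucket_sort(keys) == sorted(keys))
```

Because the bucket index never decreases as the key grows, concatenating the sorted buckets yields a sorted result for any input, and the skew-aware boundaries only affect how evenly the buckets fill.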

4.  Divide-and-Conquer. Consider the problem of counting the number of leaves in a subtree of a k-ary tree, given the
root of that subtree.  [12 pts.]

a.  Devise a divide-and-conquer algorithm for solving this problem. Specify the precondition(s) and postcondition(s) of
the algorithm and write pseudocode in a style similar to that used in CLRS and the lecture notes.

Divide work is very natural: divide for each of k children.
So, conquer work is k subproblems of size n/k.
Combine work is to add results of previous recursive calls.
The base of the recursion is a leaf; we return 1 to count that leaf.
So
COUNTLEAVES(T, r, k)
// PRE:  k > 0 and T is a k-ary tree. r is not NIL and points to some node in T.
// POST: FCTVAL == number of leaves in subtree of T rooted at r (given maximum k children per node)
{
    if r is a leaf                      // base: count leaf
    {
        return 1
    }
    else                                // recursion: divide and conquer!
    {
        count = 0
        for i = 1 to k
        {
            if ith leftmost child of r is not NIL
                count = count + COUNTLEAVES(T, ith leftmost child of r, k)
        }
        return count
    }
}
Another equally valid strategy for dealing with NIL nodes was to remove the "r is not NIL" part of the precondition
and add another base case for if r is NIL. In this case, the return value would be 0.
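
For comparison, the same recursion can be written as a short Python sketch. The TreeNode class is a hypothetical representation chosen for this example: each node keeps a list of up to k child slots, with None marking a missing child, so k itself does not need to be passed.

```python
class TreeNode:
    """Node of a k-ary tree; children may contain None for missing children."""

    def __init__(self, children=None):
        self.children = children or []


def count_leaves(r):
    """Number of leaves in the subtree rooted at r (r must not be None)."""
    present = [child for child in r.children if child is not None]
    if not present:                                       # base: r is a leaf
        return 1
    return sum(count_leaves(child) for child in present)  # conquer, then combine


# A 3-ary example: the root has a leaf child, an internal child with two leaves,
# and one missing child.
tree = TreeNode([TreeNode(), TreeNode([TreeNode(), None, TreeNode()]), None])
print(count_leaves(tree))
```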

b.  Derive a recurrence for the running time of your algorithm. In doing so, make it clear what work in your algorithm
corresponds to each of the general parts of a divide-and-conquer algorithm.

Divide work is very natural: divide for each of k children → already done → 0 time
So, conquer work is k subproblems of size n/k → k T(n/k)
Combine work is to add results of previous recursive calls → add k values → Θ(k)
So, recurrence is
    T(n) = k T(n/k) + Θ(k)
or, more properly, as this is a multivariable function,
    T(n, k) = k T(n/k, k) + Θ(k)

5.  Correctness and Heaps. Write a loop invariant and use it to prove the following algorithm correct:  [12 pts.]

LIST-ANCESTORS(A, k)
// PRE:  k > 1 and A[1..k] defines the first k keys in a binary heap
// POST: FCTVAL is a pair (L[1..n], n) where L[1..n] are the ancestors of k in A, in order from A[k]'s parent to the
//       root of A
{
    ancs = new empty array
    anc_index = k
    for i = 1 to ⌊lg k⌋
        anc_index = ⌊anc_index / 2⌋
        ancs[i] = A[anc_index]
    return (ancs, i − 1)
}
Invariant:
At start of iteration i,
1. ancs[1..i-1] contains, in order from k's parent to i-1 generations up, k's ancestors.
2. A[anc_index] = k's ancestor i-1 levels up

Initialization:
Initially, i = 1.
Then ancs[1..i-1] = ancs[1..0], an empty array, so #1 is vacuously true.
Also, anc_index = k and i-1 = 0. k is indeed 0 levels up, and is, in some sense, its ancestor 0 generations up from
itself, so #2 is true.

Maintenance:
Suppose, at the start of iteration i, invariant is true, i.e.   [inductive hypothesis]
1. ancs[1..i-1] contains, in order from k's parent to i-1 generations up, k's ancestors.
2. A[anc_index] = k's ancestor i-1 levels up

We set anc_index = ⌊anc_index / 2⌋, which by heap definitions is the parent of that node. So, ancs[i] becomes this
node, and since anc_index was k's (i-1)st ancestor, ancs[i] will be k's ith ancestor.
Invariant got us ancs[1..i-1] and this gets us ancs[i], so now ancs[1..i] contains, in order from k's parent to i
generations up, k's ancestors, and A[anc_index] is k's ancestor i levels up.
Incrementing i preserves the invariant at the start of the next iteration.

Termination:
When loop is done, i = ⌊lg k⌋ + 1.
So, i − 1 = ⌊lg k⌋.
This is the distance from k to the root, or the total number of ancestors. So, ancs[1..⌊lg k⌋] contains all ancestors
and is returned, so we get the postcondition.
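
For readers who want to run the algorithm, here is a minimal Python sketch of it. It assumes the heap is stored in a list whose slot 0 is unused, so that indices match the 1-based pseudocode; the function name and the sample heap are illustrative only.

```python
def list_ancestors(a, k):
    """Return (ancestor keys of node k from parent up to root, how many there are).

    a[1..] holds a binary heap; a[0] is an unused placeholder.
    """
    ancs = []
    anc_index = k
    for _ in range(k.bit_length() - 1):   # floor(lg k) iterations
        anc_index //= 2                   # move to the parent
        ancs.append(a[anc_index])
    return ancs, len(ancs)


heap = [None, 1582, 1106, 971, 869, 809, 794, 791, 788]
print(list_ancestors(heap, 8))
print(list_ancestors(heap, 5))
```

For a positive integer k, k.bit_length() - 1 equals ⌊lg k⌋, which avoids floating-point logarithms.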

6.  Challenge Problem Follow-Up.  [20 pts.]

a.  Suppose you are given a sequence of n elements to sort. The input sequence consists of subsequences of length k,
where it is given that k | n. The elements in any given subsequence are smaller than the elements in the succeeding
subsequence and larger than the elements in the preceding subsequence. Thus, all that is needed to sort the whole
sequence of length n is to sort the k elements of each of the subsequences. Rigorously derive a lower bound on the
leaves in the decision tree for solving this sorting problem.  [6]

Say this is a visualization of the sequence s:

  | 1 .. k   | k+1 .. 2k | 2k+1 .. 3k | ... | (n/k−1)k+1 .. n |
    length k   length k    length k           length k

There are n/k total subsequences.
Sorting a subsequence is equivalent to generating a permutation of k elements → k! possible permutations.
Sorting all is a process with n/k steps:
  Step 1: Sort the 1st subsequence → k! possible permutations
  Step 2: Sort the 2nd subsequence → independently of how the 1st was sorted, k! possible permutations
  …
  Step n/k: Sort the (n/k)th subsequence → independently of how the previous ones were sorted, k! possible permutations
By the multiplication rule, there are (k!)(k!)…(k!) [n/k factors] = (k!)^(n/k) possible permutations.
We need at least one leaf per permutation, so the lower bound on the number of leaves is (k!)^(n/k).
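
To get a feel for the size of this bound, the short Python sketch below evaluates it for small inputs. The second function goes one step beyond what the question asks: it assumes ordinary two-way comparisons, so that a decision tree with L leaves has height at least ⌈lg L⌉. Both function names are made up for this illustration.

```python
from math import factorial


def min_leaves(n, k):
    """Lower bound (k!)^(n/k) on the leaves of the decision tree; requires k | n."""
    if n % k != 0:
        raise ValueError("k must divide n")
    return factorial(k) ** (n // k)


def min_comparisons(n, k):
    """ceil(lg(min_leaves)), assuming each comparison has two outcomes."""
    leaves = min_leaves(n, k)
    return (leaves - 1).bit_length()   # exact integer ceil(log2(leaves)) for leaves >= 1


print(min_leaves(6, 3), min_comparisons(6, 3))
print(min_leaves(4, 2), min_comparisons(4, 2))
```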

b.  In the same setup as (a), assuming the n elements are in an array A[1..n], how would we refer to the jth
subsequence? In other words, what indices define it?  [2]

A[ ((j-1)k + 1) .. jk ]
(See figure to understand this.)

c.  Suppose that you have a set R of red water jugs and a set B of blue water jugs, where |R| = |B| = n, and jugs are of
different shapes and sizes. All red jugs hold different amounts of water, as do the blue ones. Moreover, for every red jug,
there is a blue jug that holds the same amount of water, and vice-versa. Your task is to find a grouping of red jugs
and blue jugs that hold the same amount of water. You may compare these jugs only in the way specified in the
homework problem and using the assumptions of the homework problem only.

i.  How do bijections fit this problem?  [2]

We want to match reds and blues, so we need a function that matches each element of R with exactly one
element of B: a bijection from R to B (or vice-versa). Since there are n elements in each set, there are n! different
bijections possible.
ii.  Describe an arbitrarily chosen node in the decision tree for solving this problem.  [3]

We compare a blue jug b_i with a red jug r_j. The node is labeled b_i : r_j and has three outgoing branches:

                         b_i : r_j
          /                  |                  \
  cap(b_i) < cap(r_j)  cap(b_i) = cap(r_j)  cap(b_i) > cap(r_j)

iii.  Define any necessary variables for and recall the inequality you derived for your decision tree here (no
derivation necessary, but you may opt to do scratch work on scratch paper as time permits).  [2]

Let h = height of decision tree
    l = # of leaves of decision tree
As l ≥ n! and a tree with branching factor 3 and height h has at most 3^h leaves, 3^h ≥ n!
iv.  Starting from the result of (iii), prove the lower bound for the number of comparisons an algorithm solving
this problem must make.  [5]

  3^h ≥ n!
  log_3 3^h ≥ log_3 n!            by taking logs of both sides
  h ≥ log_3 (n/e)^n               by Stirling's approximation
  h ≥ n [log_3 n − log_3 e]       by laws of logs
  h ≥ n log_3 n − n log_3 e       by multiplication
  h = Ω(n lg n)                   as log_3 e is a constant, all logs are Θ(lg n), and
                                  n lg n dominates n

IMPORTANT NOTES:
This last problem must be solved in the first 30 minutes of the exam period.
Remove this sheet. Listen for time to be called to pass it in.
Make sure the last 4 digits of your ID number on this sheet match the rest of your exam or you
will not get credit for this problem.
Write the first letter (only) of your last name in the box to the right to assist us with sorting exams
while relatively maintaining anonymity during grading.

First Letter of
Last Name:
