CSCI-GA.1170-001/Summer 2016
Solution to Homework 5
Problem 1 (CLRS 11.2-1). (1 point) Suppose we use a hash function h to hash n distinct keys
into an array T of length m. Assuming simple uniform hashing, what is the expected number of
collisions? More precisely, what is the expected cardinality of $\{\{k, l\} : k \ne l \text{ and } h(k) = h(l)\}$?
Solution: Let $X_{ij}$ be an indicator random variable for the event that the $i$-th and $j$-th keys (by insertion order) hash to the same location:
\[ X_{ij} = I\{h(k_i) = h(k_j)\}. \]
Then the number of collisions $N_c$ can be expressed as the sum of $X_{ij}$ over all pairs of distinct keys:
\[ N_c = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}. \]
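Under simple uniform hashing, $E[X_{ij}] = \Pr\{h(k_i) = h(k_j)\} = 1/m$, so by linearity of expectation:
\[
\begin{aligned}
E[N_c] &= E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right]
        = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}]
        = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{1}{m} \\
       &= \frac{1}{m} \sum_{i=1}^{n-1} (n - i)
        = \frac{1}{m} \left( \sum_{i=1}^{n-1} n - \sum_{i=1}^{n-1} i \right)
        = \frac{1}{m} \left( n(n-1) - \frac{(n-1)n}{2} \right) \\
       &= \frac{n(n-1)}{2m}.
\end{aligned}
\]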
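Alternatively, counting the $\binom{n}{2}$ pairs of keys directly:
\[ E[N_c] = \binom{n}{2} \frac{1}{m} = \frac{n(n-1)}{2} \cdot \frac{1}{m} = \frac{n(n-1)}{2m}. \]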
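The closed form can be sanity-checked by brute force for small inputs. The sketch below is only an illustration: the function names are hypothetical, and it assumes simple uniform hashing can be modeled by treating all $m^n$ assignments of keys to slots as equally likely.

```python
from fractions import Fraction
from itertools import product


def expected_collisions_exact(n: int, m: int) -> Fraction:
    """Average number of colliding pairs over all m**n equally likely assignments."""
    total = 0
    for slots in product(range(m), repeat=n):  # slots[i] is the slot of key i
        total += sum(
            1
            for i in range(n)
            for j in range(i + 1, n)
            if slots[i] == slots[j]
        )
    return Fraction(total, m ** n)


def expected_collisions_formula(n: int, m: int) -> Fraction:
    """Closed form n(n-1)/(2m) from the derivation above."""
    return Fraction(n * (n - 1), 2 * m)
```

Exact rational arithmetic is used so that the two values can be compared with `==`; the enumeration is exponential in `n` and is meant for tiny cases only.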
Problem 2 (CLRS 11.3-3). (3 points) Consider a version of the division method in which
$h(k) = k \bmod m$, where $m = 2^p - 1$ and $k$ is a character string interpreted in radix $2^p$. Show
that if we can derive string x from string y by permuting its characters, then x and y hash to
the same value. Give an example of an application in which this property would be undesirable
in a hash function.
Solution: Any permutation of a string can be obtained by repeated exchanges of pairs of
characters. Thus, it suffices to show that a string x and a string y derived from x by
exchanging a single pair of characters hash to the same value.
Let us define $x$ and $y$ as identical strings of $n$ characters with a single pair of characters, at positions $a$ and $b$, interchanged:
\[ x_a = y_b, \qquad y_a = x_b. \]
$x$ and $y$ have the following representations in radix $2^p$:
\[ x = \sum_{i=0}^{n-1} x_i 2^{ip}, \qquad y = \sum_{i=0}^{n-1} y_i 2^{ip}. \]
Their hash values are:
\[ h(x) = \left( \sum_{i=0}^{n-1} x_i 2^{ip} \right) \bmod (2^p - 1), \qquad
   h(y) = \left( \sum_{i=0}^{n-1} y_i 2^{ip} \right) \bmod (2^p - 1). \]
We know that:
\[ 0 \le h(x) < 2^p - 1, \]
\[ 0 \le h(y) < 2^p - 1, \]
\[ -(2^p - 1) < h(x) - h(y) < 2^p - 1. \]
To show that $h(x) = h(y)$ it is therefore sufficient to show that:
\[ (h(x) - h(y)) \bmod (2^p - 1) = 0. \]
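The characters in $x$ and $y$ are the same except for $x_a$, $x_b$, $y_a$, and $y_b$. Thus, the sums in the radix
$2^p$ representation will also be the same (and will cancel out on subtraction) except for $x_a 2^{ap}$,
$x_b 2^{bp}$, $y_a 2^{ap}$, and $y_b 2^{bp}$. We also recall that $x_a = y_b$ and $y_a = x_b$, and, taking $a > b$ without loss of generality, obtain:
\[
\begin{aligned}
(h(x) - h(y)) \bmod (2^p - 1)
  &= \left( (x_a 2^{ap} + x_b 2^{bp}) - (y_a 2^{ap} + y_b 2^{bp}) \right) \bmod (2^p - 1) \\
  &= \left( (x_a 2^{ap} + x_b 2^{bp}) - (x_b 2^{ap} + x_a 2^{bp}) \right) \bmod (2^p - 1) \\
  &= \left( (x_a - x_b) 2^{ap} - (x_a - x_b) 2^{bp} \right) \bmod (2^p - 1) \\
  &= \left( (x_a - x_b)(2^{ap} - 2^{bp}) \right) \bmod (2^p - 1) \\
  &= \left( (x_a - x_b) 2^{bp} (2^{(a-b)p} - 1) \right) \bmod (2^p - 1).
\end{aligned}
\]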
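By the formula for a geometric series:
\[ \sum_{i=0}^{a-b-1} 2^{pi} = \frac{2^{(a-b)p} - 1}{2^p - 1},
   \qquad \text{so} \qquad
   2^{(a-b)p} - 1 = \left( \sum_{i=0}^{a-b-1} 2^{pi} \right) (2^p - 1). \]
Substituting:
\[ (h(x) - h(y)) \bmod (2^p - 1) = \left( (x_a - x_b) 2^{bp} \left( \sum_{i=0}^{a-b-1} 2^{pi} \right) (2^p - 1) \right) \bmod (2^p - 1). \]
The expression being reduced is an integer multiple of $2^p - 1$, so the remainder is 0 and $h(x) = h(y)$.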
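The property is easy to observe on concrete strings. The following is a minimal sketch under stated assumptions: the names `radix_value` and `h` are hypothetical, characters are mapped to digits with `ord`, and $p = 8$ is chosen so that each ASCII character fits in one radix-$2^p$ digit. The first character is treated as the most significant digit, which differs from the indexing above but does not affect the result.

```python
def radix_value(s: str, p: int = 8) -> int:
    """Interpret s as a number in radix 2**p, one character per digit."""
    value = 0
    for ch in s:
        digit = ord(ch)
        if digit >= 2 ** p:
            raise ValueError("character does not fit in one radix-2**p digit")
        value = value * 2 ** p + digit
    return value


def h(s: str, p: int = 8) -> int:
    """Division-method hash with m = 2**p - 1."""
    return radix_value(s, p) % (2 ** p - 1)
```

With these definitions, `h("stop")`, `h("pots")`, and `h("tops")` all agree, because $2^p \equiv 1 \pmod{2^p - 1}$ makes the hash depend only on the sum of the digits.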
i = i + 1
until i == m
error "Hash table overflow"
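The pseudocode fragment above has the shape of the closing lines of an open-addressing insertion loop. As a rough illustration only, a loop of that shape could look as follows in Python. Everything beyond the three lines shown is an assumption: the function name `hash_insert`, the use of `None` for empty slots, and in particular the probe sequence (linear probing starting at `key % m`) are not taken from the original.

```python
from typing import List, Optional


def hash_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key by open addressing and return the slot index used."""
    m = len(table)
    i = 0
    while i < m:  # plays the role of "repeat ... until i == m"
        j = (key + i) % m  # assumed probe sequence: linear probing
        if table[j] is None:
            table[j] = key
            return j
        i = i + 1
    raise OverflowError("Hash table overflow")
```

After `m` unsuccessful probes every slot has been examined, so the table is full and the overflow error is raised.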
Problem 4 (CLRS 11.2). (5 points) Suppose that we have a hash table with n slots, with
collisions resolved by chaining, and suppose that n keys are inserted into the table. Each key
is equally likely to be hashed to each slot. Let M be the maximum number of keys in any slot
after all the keys have been inserted. Your mission is to prove an O(lg n/ lg lg n) upper bound
on E[M ], the expected value of M .
Solution:
(a) Argue that the probability $Q_k$ that exactly $k$ keys hash to a particular slot is given by
\[ Q_k = \left( \frac{1}{n} \right)^k \left( 1 - \frac{1}{n} \right)^{n-k} \binom{n}{k}. \]
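Given that each key is equally likely to be hashed to each of the $n$ slots, for a fixed slot and a fixed set of $k$ keys:
\[ \Pr\{\text{the } k \text{ keys hash to that slot}\} = \left( \frac{1}{n} \right)^k, \]
\[ \Pr\{\text{the other } n - k \text{ keys hash to other slots}\} = \left( 1 - \frac{1}{n} \right)^{n-k}. \]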
Observing that there are $\binom{n}{k}$ ways to pick $k$ out of $n$ keys gives:
\[ Q_k = \left( \frac{1}{n} \right)^k \left( 1 - \frac{1}{n} \right)^{n-k} \binom{n}{k}. \]
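For very small $n$, the formula for $Q_k$ can be compared against a direct count. This is a sketch with hypothetical function names; it assumes the $n^n$ assignments of keys to slots are equally likely and uses `math.comb`, which requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product
from math import comb


def q_formula(n: int, k: int) -> Fraction:
    """Q_k = C(n, k) (1/n)^k (1 - 1/n)^(n - k)."""
    return comb(n, k) * Fraction(1, n) ** k * Fraction(n - 1, n) ** (n - k)


def q_enumerated(n: int, k: int, slot: int = 0) -> Fraction:
    """Fraction of the n**n assignments that put exactly k keys in `slot`."""
    hits = sum(
        1 for slots in product(range(n), repeat=n) if slots.count(slot) == k
    )
    return Fraction(hits, n ** n)
```

The two functions should agree for every `k` from 0 to `n`, and the values of `q_formula(n, k)` over that range should sum to 1.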
(b) Let $P_k$ be the probability that $M = k$, that is, the probability that the slot containing the
most keys contains $k$ keys. Show that $P_k \le n Q_k$.
Let us define $X_i$ as a random variable denoting the number of keys in slot $i$. Then:
\[ M = \max_{1 \le i \le n} X_i, \]
\[ P_k = \Pr\{M = k\} = \Pr\left\{ \max_{1 \le i \le n} X_i = k \right\}. \]
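If the maximum equals $k$, then at least one slot holds exactly $k$ keys, so by the union bound:
\[ P_k \le \Pr\left\{ \bigcup_{i=1}^{n} \{X_i = k\} \right\} \le \sum_{i=1}^{n} \Pr\{X_i = k\}. \]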
By (a), the probability that exactly $k$ keys hash to a particular slot is $Q_k$. Thus:
\[ P_k \le \sum_{i=1}^{n} \Pr\{X_i = k\} = \sum_{i=1}^{n} Q_k = n Q_k. \]
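The inequality $P_k \le n Q_k$ can be checked exactly for small tables by enumerating every assignment. The sketch below is illustrative only: the function names are hypothetical, it assumes $n \ge 1$, and its running time grows as $n^n$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def max_load_distribution(n: int) -> dict:
    """Exact P_k = Pr{M = k} for n keys hashed uniformly into n slots."""
    counts = Counter()
    for slots in product(range(n), repeat=n):
        counts[max(Counter(slots).values())] += 1
    return {k: Fraction(c, n ** n) for k, c in counts.items()}


def q(n: int, k: int) -> Fraction:
    """Q_k = C(n, k) (1/n)^k (1 - 1/n)^(n - k)."""
    return comb(n, k) * Fraction(1, n) ** k * Fraction(n - 1, n) ** (n - k)
```

For a given `n`, comparing `max_load_distribution(n).get(k, 0)` with `n * q(n, k)` for each `k` shows how loose the union bound is in practice.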
(c) Use Stirling's approximation, equation (3.18) in CLRS, to show that $Q_k < e^k / k^k$.
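Observing that $\left( 1 - \frac{1}{n} \right)^{n-k} < 1$, we have:
\[ Q_k = \left( \frac{1}{n} \right)^k \left( 1 - \frac{1}{n} \right)^{n-k} \binom{n}{k}
       < \left( \frac{1}{n} \right)^k \binom{n}{k}. \]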
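We now note that:
\[
\begin{aligned}
\binom{n}{k} &= \frac{n!}{k!\,(n-k)!} \\
  &= \frac{1}{k!} \cdot \frac{n(n-1)\cdots(n-k+1)(n-k)\cdots 1}{(n-k)!} \\
  &= \frac{1}{k!}\, n(n-1)\cdots(n-k+1) \\
  &< \frac{1}{k!}\, n^k.
\end{aligned}
\]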
And thus:
\[ Q_k < \left( \frac{1}{n} \right)^k \frac{1}{k!}\, n^k = \frac{1}{k!}. \]
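Finally, we use Stirling's approximation:
\[ n! = \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n \left( 1 + \Theta\left( \frac{1}{n} \right) \right), \]
to note that:
\[ n! > \left( \frac{n}{e} \right)^n, \]
and therefore, applying this with $k$ in place of $n$:
\[ Q_k < \frac{1}{k!} < \frac{1}{(k/e)^k} = \frac{e^k}{k^k}. \]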
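A quick numeric check of the bound from (c) is sketched below. The function names are hypothetical, and floating-point arithmetic is assumed to be accurate enough because the two sides differ by a wide margin. For large `n` both sides can underflow to 0.0, so the check is intended for modest table sizes only.

```python
from math import comb, e


def q(n: int, k: int) -> float:
    """Q_k = C(n, k) (1/n)^k (1 - 1/n)^(n - k), in floating point."""
    return comb(n, k) * (1 / n) ** k * (1 - 1 / n) ** (n - k)


def stirling_bound(k: int) -> float:
    """Right-hand side e^k / k^k of part (c)."""
    return (e / k) ** k


def bound_holds(n: int) -> bool:
    """Check Q_k < e^k / k^k for every k in 1..n."""
    return all(q(n, k) < stirling_bound(k) for k in range(1, n + 1))
```

Calling `bound_holds(n)` for a few small values of `n` exercises every `k` from 1 to `n` at once.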
(d) Show that there exists a constant $c > 1$ such that $Q_{k_0} < 1/n^3$ for $k_0 = c \lg n / \lg \lg n$.
Conclude that $P_k < 1/n^2$ for $k \ge k_0 = c \lg n / \lg \lg n$.
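By (c):
\[ Q_{k_0} < \frac{e^{k_0}}{k_0^{k_0}}. \]
It therefore suffices to find $c$ such that:
\[ \frac{e^{k_0}}{k_0^{k_0}} \le \frac{1}{n^3}, \qquad \text{that is,} \qquad n^3 \le \frac{k_0^{k_0}}{e^{k_0}}. \]
Taking logarithms of both sides and substituting $k_0 = c \lg n / \lg \lg n$:
\[
\begin{aligned}
3 \lg n &\le k_0 (\lg k_0 - \lg e) \\
  &= \frac{c \lg n}{\lg \lg n} \left( \lg \frac{c \lg n}{\lg \lg n} - \lg e \right) \\
  &= \frac{c \lg n}{\lg \lg n} \left( \lg(c \lg n) - \lg(\lg \lg n) - \lg e \right) \\
  &= \frac{c \lg n}{\lg \lg n} \left( \lg c + \lg \lg n - \lg \lg \lg n - \lg e \right),
\end{aligned}
\]
and dividing both sides by $\lg n$:
\[
\begin{aligned}
3 &\le \frac{c}{\lg \lg n} \left( \lg c + \lg \lg n - \lg \lg \lg n - \lg e \right) \\
  &= c \left( 1 + \frac{\lg c - \lg e}{\lg \lg n} - \frac{\lg \lg \lg n}{\lg \lg n} \right).
\end{aligned}
\]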
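We note that the right-hand side is only defined for $n > 2$, and that as $n \to \infty$, the
logarithm ratios go to zero, sending the parenthesized expression to 1. Defining $n_0 > 2$
such that:
\[ 1 + \frac{\lg c - \lg e}{\lg \lg n} - \frac{\lg \lg \lg n}{\lg \lg n} \ge \frac{1}{2} \quad \text{for all } n \ge n_0, \]
we obtain:
\[ 3 \le c \left( 1 + \frac{\lg c - \lg e}{\lg \lg n} - \frac{\lg \lg \lg n}{\lg \lg n} \right) \quad \text{for any } c \ge 6 \text{ and all } n \ge n_0. \]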
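Thus, for $k_0 = c \lg n / \lg \lg n$ with $c \ge 6$ and $n \ge n_0$:
\[ Q_{k_0} < \frac{1}{n^3}, \]
and by (b):
\[ P_{k_0} \le n Q_{k_0} < \frac{1}{n^2}. \]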
This shows that the inequality holds for $k = k_0$. We will now show that it holds for $k > k_0$
by showing that $Q_k < 1/n^3$ for all $k \ge k_0$.
By picking $c$ inside $k_0$ large enough that $k_0 > e$, we have $(e/k) < 1$ and $(e/k)^{m+1} < (e/k)^m$
for $k \ge k_0$ and any $m$. Using $Q_k < e^k / k^k$ from (c):
\[ Q_k < \frac{e^k}{k^k} \le \frac{e^{k_0}}{k^{k_0}} \le \frac{e^{k_0}}{k_0^{k_0}}, \]
\[ Q_k < \frac{e^{k_0}}{k_0^{k_0}}. \]
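Combining with the bound $e^{k_0} / k_0^{k_0} \le 1/n^3$, which the first part of (d) used to obtain $Q_{k_0} < 1/n^3$, and keeping in mind that $k \ge k_0$:
\[ Q_k < \frac{e^{k_0}}{k_0^{k_0}} \le \frac{1}{n^3}, \]
\[ Q_k < \frac{1}{n^3}. \]
By (b), it follows that $P_k \le n Q_k < 1/n^2$ for all $k \ge k_0$.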
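The key inequality behind (d), $3 \lg n \le k_0 (\lg k_0 - \lg e)$, can be probed numerically. The sketch below uses hypothetical function names and assumes $c = 6$ as the default; it works in the logarithmic form so that very large $n$ does not overflow. It is a spot check on sampled values of $n$, not a substitute for the argument above.

```python
from math import e, log2


def k0(n: int, c: float = 6.0) -> float:
    """Threshold k0 = c lg n / lg lg n (requires n > 2)."""
    return c * log2(n) / log2(log2(n))


def first_part_holds(n: int, c: float = 6.0) -> bool:
    """Check 3 lg n <= k0 (lg k0 - lg e), i.e. e^k0 / k0^k0 <= 1 / n^3."""
    k = k0(n, c)
    return 3 * log2(n) <= k * (log2(k) - log2(e))
```

For example, `first_part_holds(2 ** j)` can be evaluated over a range of exponents `j` to see the inequality hold across many orders of magnitude.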
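(e) Argue that
\[ E[M] \le \Pr\left\{ M > \frac{c \lg n}{\lg \lg n} \right\} \cdot n + \Pr\left\{ M \le \frac{c \lg n}{\lg \lg n} \right\} \cdot \frac{c \lg n}{\lg \lg n}. \]
Conclude that $E[M] = O(\lg n / \lg \lg n)$.
By the definition of expectation:
\[
\begin{aligned}
E[M] &= \sum_{k=0}^{n} k \Pr\{M = k\} \\
     &= \sum_{k=0}^{k_0} k \Pr\{M = k\} + \sum_{k=k_0+1}^{n} k \Pr\{M = k\}.
\end{aligned}
\]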
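In the first sum $k \le k_0$, and in the second sum $k \le n$, since the number of keys in a slot cannot exceed the total number of keys. Thus:
\[
\begin{aligned}
E[M] &\le \sum_{k=0}^{k_0} k_0 \Pr\{M = k\} + \sum_{k=k_0+1}^{n} n \Pr\{M = k\} \\
     &= k_0 \sum_{k=0}^{k_0} \Pr\{M = k\} + n \sum_{k=k_0+1}^{n} \Pr\{M = k\} \\
     &= k_0 \Pr\{M \le k_0\} + n \Pr\{M > k_0\} \\
     &= \frac{c \lg n}{\lg \lg n} \Pr\left\{ M \le \frac{c \lg n}{\lg \lg n} \right\} + n \Pr\left\{ M > \frac{c \lg n}{\lg \lg n} \right\}.
\end{aligned}
\]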
To show that $E[M] = O(\lg n / \lg \lg n)$, we first rewrite $\Pr\{M > k_0\}$ in terms of $P_k$ and
apply $P_k < 1/n^2$ from (d):
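\[
\begin{aligned}
\Pr\{M > k_0\} &= \sum_{k=k_0+1}^{n} \Pr\{M = k\} \\
  &= \sum_{k=k_0+1}^{n} P_k \\
  &< \sum_{k=k_0+1}^{n} \frac{1}{n^2} \\
  &< n \cdot \frac{1}{n^2} \\
  &= \frac{1}{n}.
\end{aligned}
\]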
Finally, using the trivial upper bound $\Pr\{M \le k_0\} \le 1$:
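\[
\begin{aligned}
E[M] &\le k_0 \Pr\{M \le k_0\} + n \Pr\{M > k_0\} \\
     &< k_0 \cdot 1 + n \cdot \frac{1}{n} \\
     &= k_0 + 1 \\
     &= \frac{c \lg n}{\lg \lg n} + 1 \\
     &= O\left( \frac{\lg n}{\lg \lg n} \right).
\end{aligned}
\]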
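As an informal illustration of the bound, the maximum slot load can be simulated and compared with $k_0 + 1$. This is a sketch under stated assumptions: the function names are hypothetical, Python's `random` module stands in for uniform hashing, $c = 6$ is used as the default, and a sample average only estimates $E[M]$.

```python
import random
from collections import Counter
from math import log2


def max_load(n: int, rng: random.Random) -> int:
    """Throw n keys into n slots uniformly at random; return the fullest slot's size."""
    loads = Counter(rng.randrange(n) for _ in range(n))
    return max(loads.values())


def average_max_load(n: int, trials: int = 200, seed: int = 0) -> float:
    """Sample mean of the maximum load over independent trials."""
    rng = random.Random(seed)
    return sum(max_load(n, rng) for _ in range(trials)) / trials


def bound(n: int, c: float = 6.0) -> float:
    """k0 + 1 = c lg n / lg lg n + 1, the bound on E[M] derived above."""
    return c * log2(n) / log2(log2(n)) + 1
```

Comparing `average_max_load(n)` with `bound(n)` for a few values of `n` gives a feel for how much slack the constant $c = 6$ leaves.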