
Anshul Suryan

EE16B130
electrical engineering
anshulsuryan97@gmail.com

Assignment Based on Quiz 1


CS 6720 : Jan – May, 2020 : John Augustine
Due : Tuesday, February 25, 2020 at 11.55 PM
(Submission via Moodle, no extensions)

Instructions

1. Read all the sub parts and get a global sense of the question before
answering the local parts. One important learning outcome that I wish
for you is that you are able to break down a complex problem into its
parts and solve the parts. I have broken the problems into parts for you
so that you can see how the problems are typically broken down.

2. Answer to the point. If I haven’t asked for an explanation, then it is
unnecessary.

3. In this assignment, whenever I ask you to find a small value (usually,
the number of iterations to solve a problem), I want you to use the best
(i.e., one that gets the best asymptotic bound) and most convenient tail
bound (i.e., easy to use, but perhaps may not yield the best constant).

4. Include figures whenever you think it will help in expressing your idea.
(I will be happy to suggest ways to easily include figures in LaTeX.)

Anshul Suryan CS 6720: Assignment Based on Quiz 1

Contents
Problem 1: (a), (b), (c), (d), (e)
Problem 2: (a), (b), (c), (d)
Problem 3: (a), (b), (c), (d), (e)
Problem 4: (a), (b), (c), (d)


Problem 1
Consider a stream where each item $i$, $1 \le i \le n$, is a pair

$(a_i, s_i) \in \{1, 2, \dots, m\} \times \mathbb{Z}^+$.

The value of $n$ is not known to the algorithm. The $s_i$'s denote the “significance” of the
pair. Our goal is to sample one item such that each $b \in \{1, 2, \dots, m\}$ is sampled with
probability
$$p(b) = \frac{\sum_{i : a_i = b} s_i}{\sum_{i=1}^{n} s_i}.$$
For example, if the stream is (2, 3), (1, 4), (1, 2), (2, 2), then
the sample should be 1 with probability 6/11 and 2 with probability 5/11. Given just
two variables $p$ and $q$ (or cells, as we called it in class), we wish to design and analyze an
algorithm to perform the required sampling. (Hint: you can think of $p$ as the reservoir
and $q$ as a counter. Suggestion: Read through all the parts before answering Parts (a)
and (b).)

(a)
How would you initialize $p$ and $q$ when the first pair $(a_1, s_1)$ arrives?

Answer
$p = a_1$ and $q = s_1$

(b)
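Subsequently, how would you modify $p$ and $q$ when a pair $(a_i, s_i)$, $1 < i \le n$, arrives?

Answer
$q$ stores the sum of the significances ($s_i$'s) up to and including the $i$th pair:
when $(a_i, s_i)$ arrives, set $q \leftarrow q + s_i$, so that
$$q = \sum_{j=1}^{i} s_j.$$
Now a biased coin is tossed with the probability of heads given by
$$p(\text{heads}) = \frac{s_i}{\sum_{j=1}^{i} s_j} = \frac{s_i}{q}.$$
If the coin toss turns out to be heads, $a_i$ replaces the current content of cell $p$;
otherwise $p$ is left unchanged.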
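
The update rule in Parts (a) and (b) can be sketched in Python as follows. This is only an illustrative sketch: the function name `weighted_reservoir_sample` and the `rng` parameter are assumptions made for the example, and the stream is taken to be an iterable of `(a, s)` pairs with positive integer `s`.

```python
import random
from collections import Counter


def weighted_reservoir_sample(stream, rng=random):
    """Return one a_i from a stream of (a_i, s_i) pairs using only two cells."""
    p = None  # reservoir cell
    q = 0     # counter cell: running sum of the significances seen so far
    for a, s in stream:
        q += s
        # Heads with probability s / q. For the first pair s == q, so it
        # always enters the reservoir, which matches the initialization.
        if rng.random() * q < s:
            p = a
    return p


if __name__ == "__main__":
    stream = [(2, 3), (1, 4), (1, 2), (2, 2)]
    rng = random.Random(0)
    trials = 100_000
    counts = Counter(weighted_reservoir_sample(stream, rng) for _ in range(trials))
    for b in sorted(counts):
        print(b, counts[b] / trials)
```

On the example stream the empirical frequencies should come out close to 6/11 for the value 1 and 5/11 for the value 2.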

(c)
What is the probability that a particular $a_i$ will enter the “reservoir” and stay there till
the end of the stream? In our example, $a_1 = 2$, $a_2 = 1$, $a_3 = 1$, and $a_4 = 2$, so the
question is asking what is the probability that an $a_i$ (say $a_2$) enters the reservoir and
stays there. (Hint: Even though $a_2$ and $a_3$ are the same value, the answer for $i = 2$ is
only 4/11 and not 6/11. Suggestion: work out Part (d) first.)


Answer
The probability that a particular item $a_i$ enters the “reservoir” and stays there till
the end of the stream is the probability that it enters the reservoir and none of the
subsequent items replaces it:

$$p(a_i \text{ enters the reservoir and stays till the end}) = \frac{s_i}{\sum_{j=1}^{n} s_j}$$

(d)
Work out how you arrived at the probability in Part (c).

Answer
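$$p(a_i \text{ enters the reservoir}) = \frac{s_i}{\sum_{j=1}^{i} s_j}$$

$$p(\text{subsequent items do not enter the reservoir}) = \prod_{m=i+1}^{n} \left(1 - \frac{s_m}{\sum_{j=1}^{m} s_j}\right)$$

$$\begin{aligned}
p(a_i \text{ enters the reservoir and stays till the end})
&= \frac{s_i}{\sum_{j=1}^{i} s_j} \cdot \prod_{m=i+1}^{n} \left(1 - \frac{s_m}{\sum_{j=1}^{m} s_j}\right) \\
&= \frac{s_i}{\sum_{j=1}^{i} s_j} \cdot \left(1 - \frac{s_{i+1}}{\sum_{j=1}^{i+1} s_j}\right) \cdot \left(1 - \frac{s_{i+2}}{\sum_{j=1}^{i+2} s_j}\right) \cdots \left(1 - \frac{s_n}{\sum_{j=1}^{n} s_j}\right) \\
&= \frac{s_i}{\sum_{j=1}^{i} s_j} \cdot \frac{\sum_{j=1}^{i} s_j}{\sum_{j=1}^{i+1} s_j} \cdot \frac{\sum_{j=1}^{i+1} s_j}{\sum_{j=1}^{i+2} s_j} \cdots \frac{\sum_{j=1}^{n-1} s_j}{\sum_{j=1}^{n} s_j} \\
&= \frac{s_i}{\sum_{j=1}^{n} s_j}
\end{aligned}$$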
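
As a sanity check, the product can be evaluated for $i = 2$ on the example stream (2, 3), (1, 4), (1, 2), (2, 2), where the running sums of the significances are 3, 7, 9 and 11:

```latex
\[
\frac{s_2}{s_1 + s_2}
\cdot \left(1 - \frac{s_3}{s_1 + s_2 + s_3}\right)
\cdot \left(1 - \frac{s_4}{s_1 + s_2 + s_3 + s_4}\right)
= \frac{4}{7} \cdot \frac{7}{9} \cdot \frac{9}{11}
= \frac{4}{11}
\]
```

This agrees with the value 4/11 given in the hint of Part (c).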

(e)
Prove that each $b \in \{1, 2, \dots, m\}$ is sampled with probability
$$p(b) = \frac{\sum_{i : a_i = b} s_i}{\sum_{i=1}^{n} s_i}.$$
(Hint: if two events $A$ and $B$ are mutually exclusive, then $\Pr[A \cup B] = \Pr[A] + \Pr[B]$.)

Answer
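The probability that $b$ is sampled equals the probability of the union, over all $i$ with
$a_i = b$, of the events that $a_i$ is the final sample. Writing $p(a_i)$ for the probability
that $a_i$ enters the reservoir and stays till the end:

$$\begin{aligned}
p(b) &= p\Big(\bigcup_{i : a_i = b} a_i\Big) \\
&= \sum_{i : a_i = b} p(a_i) \quad \text{(since the events of sampling } a_i \text{ and } a_j,\ i \ne j, \text{ are mutually exclusive)} \\
&= \sum_{i : a_i = b} \frac{s_i}{\sum_{j=1}^{n} s_j} \\
&= \frac{\sum_{i : a_i = b} s_i}{\sum_{j=1}^{n} s_j}
\end{aligned}$$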
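
The claim can also be checked exactly on a small stream by multiplying out the coin-toss probabilities with rational arithmetic. The sketch below is illustrative only; `exact_sampling_distribution` is a hypothetical helper name and not part of the assignment.

```python
from collections import defaultdict
from fractions import Fraction


def exact_sampling_distribution(stream):
    """Exact probability that each value b is the final content of the reservoir."""
    # prefix[i] is the sum of the significances of items 0..i (0-based).
    prefix = []
    total = 0
    for _, s in stream:
        total += s
        prefix.append(total)

    dist = defaultdict(Fraction)
    n = len(stream)
    for i, (a, s) in enumerate(stream):
        prob = Fraction(s, prefix[i])      # item i enters the reservoir
        for m in range(i + 1, n):          # every later item is rejected
            prob *= 1 - Fraction(stream[m][1], prefix[m])
        dist[a] += prob                    # mutually exclusive events add up
    return dict(dist)
```

For the example stream `[(2, 3), (1, 4), (1, 2), (2, 2)]` this returns 5/11 for the value 2 and 6/11 for the value 1.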

Problem 2

You have a large array $A$ of $n$ items out of which an arbitrarily chosen $\sqrt{n}$ items are
somehow interesting to you. (Perhaps, they hold the key to some secret treasure.) Your
goal is to find one such interesting item. Clarification: you can assume that check-
ing if an item is interesting is an easy O(1) time operation that is already
implemented for you.

(a)
Consider sequential search where you run a loop over the array items and check whether
the current item is interesting or not. How much time will this take in the worst case?

Answer

$O(n - \sqrt{n})$

(b)
Consider the following approach.

while (interesting item not found)
    Pick a random number r from 1 to n uniformly
    and independent from all previous picks.
    Check if A[r] is interesting and report it if interesting.

Our goal is to find the number of iterations the above randomized algorithm will take.
Towards this, what is the probability that the first iteration will fail to find an interesting
item?

Answer

$\frac{n - \sqrt{n}}{n} = 1 - \frac{1}{\sqrt{n}}$

(c)
Now, let us extend this a bit. What is the probability that the first k iterations will all
fail to find an interesting item?


Answer
$\left(1 - \frac{1}{\sqrt{n}}\right)^k$

(d)
Derive a small value of $k$ for which the probability of failing to find an interesting item
comes down to $1/n^c$ for any given constant $c > 0$? (Hint: use the fact that $(1 - x) \le e^{-x}$.)

Answer
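The fact that if $A \le B$ and $B \le C$ then $A \le C$ can be used here to arrive at the
solution.

$$1 - \frac{1}{\sqrt{n}} \le e^{-1/\sqrt{n}} \quad \text{(using the given inequality)}$$

$$\left(1 - \frac{1}{\sqrt{n}}\right)^k \le e^{-k/\sqrt{n}}$$

To get the desired range of $k$, require

$$e^{-k/\sqrt{n}} \le \frac{1}{n^c}.$$

Taking logarithms gives $-k/\sqrt{n} \le -\ln n^c$, and multiplying by $-1$ flips the inequality:

$$\frac{k}{\sqrt{n}} \ge \ln n^c$$

$$k \ge \sqrt{n} \cdot \ln n^c = c \sqrt{n} \ln n$$

So, for $k \ge \sqrt{n} \cdot \ln n^c$, we have $e^{-k/\sqrt{n}} \le \frac{1}{n^c}$, and we know that
$\left(1 - \frac{1}{\sqrt{n}}\right)^k \le e^{-k/\sqrt{n}}$; hence we can argue that for
$k \ge \sqrt{n} \cdot \ln n^c$ the probability of failing to find an interesting item
goes below $1/n^c$. A small value of $k$ that suffices is therefore $k = \lceil c \sqrt{n} \ln n \rceil$.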
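
A quick numerical check of the bound, assuming the threshold $k = \lceil c \sqrt{n} \ln n \rceil$ that follows from requiring $e^{-k/\sqrt{n}} \le 1/n^c$. The function names `iterations_needed` and `failure_probability` are hypothetical and only serve this sketch.

```python
import math


def iterations_needed(n, c):
    """Smallest integer k with k >= c * sqrt(n) * ln(n), up to float rounding."""
    return math.ceil(c * math.sqrt(n) * math.log(n))


def failure_probability(n, k):
    """Probability that k independent uniform picks all miss the sqrt(n) interesting items."""
    return (1 - 1 / math.sqrt(n)) ** k


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        for c in (1, 2):
            k = iterations_needed(n, c)
            print(n, c, k, failure_probability(n, k) <= n ** (-c))
```

For each pair `(n, c)` tried, the failure probability after `k` iterations should be at most `n ** (-c)`, while `k` itself stays far below the `n` of sequential search when `n` is large.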

Problem 3
Consider the following code:

i=n;
while(i>0)
Pick a random number r uniformly and independently at random from 0 to i;
i=r;
EndWhile

Our goal is to argue that this while loop terminates in O(log n) iterations with high
probability (whp), i.e., with probability at least 1 − 1/n.
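
To get a feel for the claim, the loop can be simulated directly. The sketch below assumes that `r` is an integer drawn uniformly from {0, 1, ..., i} with both ends included, which is one reading of "from 0 to i"; the function name `loop_iterations` is hypothetical.

```python
import random


def loop_iterations(n, rng=random):
    """Run the loop once and return the number of iterations until i reaches 0."""
    i = n
    iterations = 0
    while i > 0:
        # Assumption: r is uniform over the integers 0..i, both ends included.
        i = rng.randint(0, i)
        iterations += 1
    return iterations


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1 << 20
    runs = [loop_iterations(n, rng) for _ in range(10_000)]
    print("mean:", sum(runs) / len(runs), "max:", max(runs))
```

Comparing the observed maximum with $\log_2 n$ for a few values of `n` gives an empirical view of the $O(\log n)$ behaviour; it is not a substitute for the tail-bound argument asked for in the parts below.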


(a)
Call an iteration “good” if the random number r ≤ i/2. How many good iterations can
we have at maximum? Why?

Answer
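At most $\lfloor \log_2 n \rfloor + 1$. Each good iteration at least halves $i$, so after
$\lfloor \log_2 n \rfloor + 1$ good iterations we have $i < 1$, i.e., $i = 0$, and the loop stops.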

(b)
Our goal is to argue that the loop would have terminated whp within c log n iterations
for some c (whose value will be worked out later). What would be the “bad event” that
would foil the claim? (Hint: use the fact that some iterations are “good” and others
“bad”.)

Answer
Enter your answer here.

(c)
Let $X_j$, $1 \le j \le c \log n$, be 1 with probability 1/2 and 0 otherwise. As usual, let
$X = \sum_j X_j$. What is the probability that $X \le \log n$? You may assume $c \gg 2$, but a constant.

Answer
Enter your answer here.

(d)
Now argue carefully that when c is sufficiently large, the loop will terminate whp after
at most c log n iterations.

Answer
Enter your answer here.

(e)
Derive a small value of c for which you can claim that the loop will terminate whp after
at most c log n iterations?


Answer
Enter your answer here.

Problem 4
Consider a set $P$ of $n$ points in $[0, 1]^2$. You need to find its $k$-centre clustering. Suppose
we have an exact algorithm $A$ that takes $T(n)$ time for $n$ points, but that is way more
time than we can afford. (Think of $T(n) = \Theta(n^5)$, say.) Our goal is to design a faster
algorithm $A_\epsilon$ using $A$ and an additional small but constant parameter $\epsilon = 1/\ell$ (for some
integer $\ell$). The catch is that $A_\epsilon$ may not be exact. The algorithm $A_\epsilon$ divides up the
region $[0, 1]^2$ into $\ell \times \ell$ grid cells. One can refer to the cell in the $i$th row and $j$th column
as cell $(i, j)$, $1 \le i, j \le \ell$. Now $A_\epsilon$ creates a new set of points $P'$ such that each $p \in P'$ is
in the middle of some grid cell such that for each cell $(i, j)$, the cell contains a point in
$P'$ iff there is at least one point in $P$ within that cell. This way, $P'$ contains at most $\ell^2$
points. Finally, instead of running the algorithm $A$ on $P$, $A_\epsilon$ runs the algorithm $A$ on $P'$
and reports the centres that $A$ returns.
Clarification: I had previously abused the notation $k$ to mean two different
things: (i) the number of centres and (ii) the dimension of the grid. Now I
am using $\ell$ for the dimensions.

(a)
What will be the running time of $A_\epsilon$? (Tip: if you claim a running time that is not a
function of both $n$ and $\ell$, then you missed something.) Briefly justify your claim.

Answer
$O(n) + T(\ell^2)$

First, to get $P'$ from $P$, algorithm $A_\epsilon$ iterates over all $n$ points and, for each
cell of the $\ell \times \ell$ grid that contains at least one point of $P$, places one point at the
centre of that cell; this takes $O(n)$ time.
Then $A$ runs on the set $P'$, which has at most $\ell^2$ points (the maximum is reached when at
least one point of $P$ is present in every grid cell); this takes $T(\ell^2)$ time.
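
The construction of the reduced point set can be sketched as a single pass over the input. This is a minimal sketch under stated assumptions: points are `(x, y)` tuples in the unit square, a Python `set` (expected constant-time insertion) records the occupied cells, and `snap_to_cell_centres` is a hypothetical name.

```python
def snap_to_cell_centres(points, ell):
    """Return the centres of all cells of the ell x ell grid that hold a point."""
    eps = 1.0 / ell
    occupied = set()
    for x, y in points:
        # min() keeps a coordinate equal to 1.0 inside the last row or column.
        col = min(int(x * ell), ell - 1)
        row = min(int(y * ell), ell - 1)
        occupied.add((row, col))
    # One representative point per occupied cell, so at most ell * ell points.
    return [((col + 0.5) * eps, (row + 0.5) * eps) for row, col in occupied]
```

The exact algorithm would then be run on the returned list instead of on the original points; every input point lies within half a cell diagonal of its representative.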

(b)
Recall that we discussed an approximation for the $k$-centre clustering problem in class.
Write out an approximation claim for $A_\epsilon$ similar to that claim but using $\epsilon$. (Tip: you
may want to work out the analysis in the next question before writing the claim.)


Answer
$r$: exact radius of the cluster, obtained using $A$
$r'$: approximated radius of the cluster, obtained using $A_\epsilon$
We only consider the radius of the cluster, as it is the maximum distance that a
point inside the cluster can have from the centre, and the objective of $k$-centre
clustering is to minimize the maximum radius.

Claim: $r' - \frac{\epsilon}{\sqrt{2}} \le r \le r' + \frac{\epsilon}{\sqrt{2}}$

(c)
Prove the claim above.

Answer
We only need to consider the grid cells that lie on the periphery of the cluster,
because the farthest point from the centre of the cluster can only be in those cells. The
boundary of the approximated cluster always passes through the centre of such a grid cell, as the
centre is where the data points of that cell are approximated to. But the exact
radius of the cluster circle can be different, as the original data points in that cell can
be
a. closer to the centre of the cluster than the centre of the grid cell
b. farther from the centre of the cluster than the centre of the grid cell

Figure a: data point in the cell closest (at the nearest corner of the cell) to the cluster
centre

Figure b: data point in the cell farthest (at the opposite corner of the cell) from
the cluster centre


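As the height and width of each cell is $\epsilon$, the distance of a corner from the centre of
the cell is $\frac{\epsilon}{\sqrt{2}}$.
Hence the exact radius $r$ lies between $r' - \frac{\epsilon}{\sqrt{2}}$ and $r' + \frac{\epsilon}{\sqrt{2}}$, i.e.

$$r' - \frac{\epsilon}{\sqrt{2}} \le r \le r' + \frac{\epsilon}{\sqrt{2}}$$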
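
The corner-to-centre distance used in this bound follows from Pythagoras applied to half the cell width and half the cell height, taking the cell side to be $\epsilon = 1/\ell$ as in the problem statement:

```latex
\[
\sqrt{\left(\frac{\epsilon}{2}\right)^2 + \left(\frac{\epsilon}{2}\right)^2}
= \sqrt{\frac{\epsilon^2}{2}}
= \frac{\epsilon}{\sqrt{2}}
\]
```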

(d)
Will this approach work for k-median clustering? Explain your answer giving a simple
concrete example.

Answer
No, this approach won't work for $k$-median clustering.
Reason: for $k$-median clustering, the objective is to minimize the sum of the distances
of all points from their centres. Hence, clusters will be formed in such a way that the grid
cells with a higher number of data points are closer to the centre of the cluster. Example:
