
Lecture: Randomized Algorithms TU/e 5MD20

5MD20 Design Automation
Randomized Algorithms and Probabilistic
Analysis of Algorithms
1
Phillip Stanley-Marbell
Lecture: Randomized Algorithms TU/e 5MD20


Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
2
Lecture: Randomized Algorithms TU/e 5MD20


What are Randomized Algorithms and Analyses?
Randomized algorithms
Algorithms that make random decisions during their execution
Example: Quicksort with a random pivot
Probabilistic analysis of algorithms
Using probability theory to analyze the behavior of (randomized or deterministic) algorithms
Example: determining the probability of a collision of a hash function
3
[Diagram: Probability and Computation branches into Randomized Algorithms and Probabilistic Analysis of Algorithms; Randomized Algorithms further split into Monte Carlo algorithms (may fail or return an incorrect answer) and Las Vegas algorithms (always return the right answer)]
Lecture: Randomized Algorithms TU/e 5MD20


Why Randomized Algorithms and Analyses?
Why randomized algorithms?
Many NP-hard problems may be easy to solve for typical inputs
One approach is to use heuristics to deal with pathological inputs
Another approach is to use randomization (of inputs, or of algorithm) to reduce the chance of worst-case behavior
4
Lecture: Randomized Algorithms TU/e 5MD20


Why Randomized Algorithms and Analyses?
Why probabilistic analysis of algorithms?
Naturally, if an algorithm makes random decisions, its performance is not deterministic
Also, deterministic algorithm behavior may vary with inputs
Probabilistic analysis also lets us estimate bounds on behavior; we'll talk about such bounds today
5
Lecture: Randomized Algorithms TU/e 5MD20


Theoretical Foundations
Probability theory (things you covered in 2S610, 2nd year)
Probability spaces
Events
Random variables
Characteristics of random variables
Combinatorics & number theory (some things you might have seen in 2DE*)
Many relations come in handy in simplifying analysis
Algorithm analysis
We will review relevant material in the next half hour
6
Lecture: Randomized Algorithms TU/e 5MD20


Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
7
Lecture: Randomized Algorithms TU/e 5MD20


Probability Theory Refresher
Probability space, (Ω, F, Pr), defines
The possible occurrences (simple events), sets of occurrences (subsets of Ω), and likelihood of occurrences
Sample space, Ω
Composed of all the basic events we are concerned with
Example: for a coin toss, Ω = {H, T}
Sigma algebra, F
Possible occurrences we can build out of Ω
Example: for a coin toss, F = {∅, Ω, {H}, {T}}
Events are members of F
Probability measure, Pr
A mapping from F to [0, 1]
Assigns a probability (a real number p ∈ [0, 1]) to events
One example of a probability measure is a probability mass function
8
Lecture: Randomized Algorithms TU/e 5MD20


Notation
Event sets
Will start today by representing events with sets, using letters early in the alphabet, e.g., A, B, ...
Events may be unitary elements or subsets of Ω
Probability
Probability of event A will be written as Pr{A}
9
[Diagram: a sample space containing simple events e1, e2, ..., e8]

Lecture: Randomized Algorithms TU/e 5MD20


Independence, Disjoint Events, and Unions
Two events A and B are said to be independent iff
Occurrence of A does not influence the outcome of B
Pr{A ∩ B} = Pr{A}·Pr{B}
Note that this is different from the events being mutually exclusive
If two events A and B are mutually exclusive, then Pr{A ∩ B} = 0
For any two events E1 and E2:
Pr{E1 ∪ E2} = Pr{E1} + Pr{E2} − Pr{E1 ∩ E2}
Union bound (often comes in handy in probabilistic analysis):
Pr{∪_{i≥1} E_i} ≤ Σ_{i≥1} Pr{E_i}
10
Lecture: Randomized Algorithms TU/e 5MD20


Conditional Probability
Probability of event B occurring, given A has occurred, is Pr{B | A}
Pr{B | A} = Pr{B ∩ A} / Pr{A}
If events A and B are independent:
Pr{B ∩ A} = Pr{B}·Pr{A}
Pr{B | A} = Pr{B ∩ A} / Pr{A} = Pr{B}·Pr{A} / Pr{A} = Pr{B}
11
Lecture: Randomized Algorithms TU/e 5MD20


Events and Random Variables
So far, we have talked about probability and independence of events
Rather than work with sets, we can map events to real values
Random Variables

A random variable is a function on the elements of the sample space, Ω, used to assign real values to elements of Ω.
Definition: A random variable X on a sample space Ω is a real-valued function on Ω; i.e., X : Ω → ℝ.
We will only deal with discrete random variables, which take on a finite or countably infinite number of values
Random variables define events
The occurrence of a random variable taking on a specific value defines an event
Example: Coin toss. Let X be a random variable defining the number of heads resulting from a coin toss

Sample space Ω = {H, T}; sigma algebra of subsets of Ω: F = {∅, Ω, {H}, {T}}
X : {H, T} → {0, 1}
Events: {X = 0}, {X = 1}
In general, an event defined on a random variable X is of the form {s ∈ Ω | X(s) = x}
12
Lecture: Randomized Algorithms TU/e 5MD20


Notation
Will represent random variables with uppercase letters, late in alphabet
Example: X, Y, Z
Will use the abbreviation rvar for random variable
Events
Events correspond to a random variable, say, X, (uppercase) taking on a specific value, say, x (lowercase)
Probability of rvar X taking on the specific value x is written as Pr{X = x} or f_X(x)
Example: Coin toss. Let X be an rvar representing the number of heads; Pr{X = 0} = f_X(0) = 1/2 (for a fair coin)

13
Lecture: Randomized Algorithms TU/e 5MD20


Random Variables Intuition
So far, we've presented a lot of notation; can we gain more intuition?
Imagine a phenomenon that can be represented with real values
Example: the result of rolling a die
Let X and Y be functions mapping the result of rolling the die to a number
e.g., X = die result ∈ {1, 2, 3, 4, 5, 6}, or Y = 2·(die result) + 1 ∈ {3, 5, 7, 9, 11, 13}
X and Y are two different functions (random variables) defined on the same set of events
X taking on a specific value is an event
For the above die-rolling example, with rvars X and Y:
Pr{X = 1} = Pr{Y = 3}, Pr{X = 4} = Pr{Y = 9}, and so on
14
Lecture: Randomized Algorithms TU/e 5MD20


Characteristics of Random Variables
Random variables and events
1. We first talked about random phenomenon events in terms of sets
2. We then introduced rvars, to let us represent events with real numbers
3. When representing events with rvars, we can then look at some measures or characteristics of event phenomena
Link to randomized algorithms and analyses; will reason about:
Randomized algorithms in terms of rvars characterizing actions of the algorithm
Probabilistic analysis of algorithms in terms of rvars characterizing properties of the alg. behavior given inputs
15
Lecture: Randomized Algorithms TU/e 5MD20


Characteristics of Random Variables
Expectation or Expected Value, E[X], of an rvar X
Properties of E[X]
Linearity
Constant multiplier
Question
What is E[X − E[X]]?
E[X] = Σ_x x·f_X(x)   or   E[X] = Σ_i i·Pr{X = i}
Linearity: E[Σ_{i=1}^n X_i] = Σ_{i=1}^n E[X_i]
Constant multiplier: E[cX] = c·E[X]
16
Lecture: Randomized Algorithms TU/e 5MD20


Common Discrete Distributions
Uniform discrete
All values in a range equally likely
Ω = {a, ..., b}, F = 2^Ω, Pr: Pr{X = x} = 1/|Ω|
Bernoulli or indicator random variable
Success or failure in a single trial
Ω = {0, 1}, F = 2^{0, 1} = {∅, {0}, {1}, {0, 1}}, Pr: Pr{X = 1} = p, Pr{X = 0} = 1 − p
E[X] = p, Var[X] = p(1 − p)
Binomial
Number of successes in n trials
Ω = {0, 1, 2, ..., n} (|Ω| = n + 1), F = 2^Ω, Pr: f_X(k) = (n choose k) p^k (1 − p)^(n−k)
E[X] = np, Var[X] = np(1 − p)
Geometric
Number of trials until the first failure
Ω = ℕ, F = 2^Ω, Pr: f_X(k) = p(1 − p)^(k−1)
E[X] = 1/p, Var[X] = (1 − p)/p²
17
Lecture: Randomized Algorithms TU/e 5MD20


Useful Mathematical Results
Some useful results from number theory and combinatorics we'll use later
18

Σ_{i≥0} r^i = 1/(1 − r)   (for |r| < 1)
Σ_{i≥1} r^i = r/(1 − r)   (for |r| < 1)
Σ_{i=0}^{m} r^i = (1 − r^(m+1))/(1 − r)
1 − k/n ≈ e^(−k/n), when k is small compared to n
For any y, 1 + y ≤ e^y
Σ_{i=1}^{n} 1/i = ln(n) + O(1)
Lecture: Randomized Algorithms TU/e 5MD20


Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
19
Lecture: Randomized Algorithms TU/e 5MD20


Quicksort
Input: A list S = {x1, ..., xn} of n distinct elements over a totally ordered universe
Output: The elements of S in sorted order
1. If S has one or zero elements, return S. Otherwise continue.
2. Choose an element of S as a pivot; call it x
3. Compare every other element of S to x in order to divide the other elements into two sublists
a. S1 has all the elements of S that are less than x;
b. S2 has all those that are greater than x.
4. Apply Quicksort to S1 and S2
5. Return the list S1, x, S2
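To make the steps concrete, here is a minimal Python sketch of the procedure above, instrumented to count comparisons; the function name, the random_pivot flag, and the counting convention (one comparison per element examined in step 3) are illustrative assumptions, not part of the lecture material:

import random

def quicksort(S, random_pivot=False):
    # Step 1: lists of length 0 or 1 are already sorted
    if len(S) <= 1:
        return list(S), 0
    # Step 2: choose a pivot x (first element, or uniformly at random for Randomized Quicksort)
    x = random.choice(S) if random_pivot else S[0]
    rest = [e for e in S if e != x]
    # Step 3: one comparison per remaining element to form S1 and S2
    comparisons = len(rest)
    S1 = [e for e in rest if e < x]
    S2 = [e for e in rest if e > x]
    # Step 4: recurse on the two sublists
    sorted1, c1 = quicksort(S1, random_pivot)
    sorted2, c2 = quicksort(S2, random_pivot)
    # Step 5: return the list S1, x, S2 (and the total comparison count)
    return sorted1 + [x] + sorted2, comparisons + c1 + c2

# Worst case for the "pick first element" rule: a decreasing input costs n(n-1)/2 comparisons
print(quicksort(list(range(100, 0, -1)))[1])    # 4950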
Probabilistic Analysis of Quicksort
Worst-case performance is Θ(n²)
E.g., if the input list is in decreasing order and the pivot choice rule is "pick the first element"
On the other hand, if pivot always splits S into lists of approximately equal size, performance is O(n log n)
Question:
Assuming we use the pick first element pivot choice, and input elements are chosen from a uniform discrete
distribution on a range of values, what is the expected number of comparisons?
i.e., let X be an rvar denoting number of comparisons; what is E[X] ?
20
Lecture: Randomized Algorithms TU/e 5MD20


Probabilistic Analysis of Quicksort
21
Theorem.
If the first list element is always chosen as pivot, and input is chosen uniformly
at random from all possible permutations of values in input support set, then
the expected number of comparisons made by Quicksort is 2n ln n + O(n)
Proof.
Given an input set x1, x2, ..., xn chosen uniformly at random from possible permutations, let y1, y2, ..., yn be the
same values sorted in increasing order
Let Xij be an indicator rvar that takes on value 1 if yi and yj are compared at any point in the algorithm, 0 otherwise,
for some i < j. The total number of comparisons is the total number of pairs (i, j) for which Xij = 1
Let X be an rvar denoting the total number of comparisons made by Quicksort. Then,
X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} X_ij
and, where we've used the linearity property introduced on slide 16,
E[X] = E[Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} X_ij] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[X_ij]
Lecture: Randomized Algorithms TU/e 5MD20


Probabilistic Analysis of Quicksort
22
Theorem.
If the first list element is always chosen as pivot, and input is chosen uniformly
at random from all possible permutations of values in input support set, then
the expected number of comparisons made by Quicksort is 2n ln n + O(n)
Proof. (contd)
Since Xij is an indicator rvar, E[Xij] is the probability that Xij = 1 (from slide 17). But recall that Xij = 1 is the event that
the two elements yi and yj are compared.
Two elements yi and yj are compared iff either of them is the first pivot selected by Quicksort from the set Y_ij = {yi,
yi+1, ..., yj}. This is because if any other item in Y_ij were chosen as a pivot first, since that item would lie between yi and
yj, it would place yi and yj in different sublists (and they would never be compared to each other).
Now, the order in the sublists is the same as in the original list (we are in the process of sorting). From the theorem, we always
choose the first element as pivot; since the input is chosen uniformly at random from all possible permutations, any element
of the ordering Y_ij is equally likely to be first in the (randomly ordered) input sublist.
Thus the probability that yi or yj is selected as pivot, which is the probability that yi and yj are compared, which is the
probability that Xij = 1, which is E[Xij], is (from the definition of the discrete uniform distribution on slide 17) 2/(j − i + 1).
Lecture: Randomized Algorithms TU/e 5MD20


Probabilistic Analysis of Quicksort
23
Theorem.
If the first list element is always chosen as pivot, and input is chosen uniformly
at random from all possible permutations of values in input support set, then
the expected number of comparisons made by Quicksort is 2n ln n + O(n)
Proof. (contd)
Substituting E[Xij] = 2/(j − i + 1) into the expression for E[X] from slide 21:
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
 = Σ_{i=1}^{n−1} Σ_{k=2}^{n−i+1} 2/k
 = Σ_{k=2}^{n} Σ_{i=1}^{n+1−k} 2/k
 = Σ_{k=2}^{n} (n + 1 − k)·(2/k)
 = (n + 1) Σ_{k=2}^{n} 2/k − 2(n − 1)
 = 2(n + 1) Σ_{k=1}^{n} 1/k − 4n
 = 2n ln n + O(n)   (from slide 18)
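As a sanity check on the algebra above, a short Python sketch (the sample sizes are arbitrary) can compare the exact double sum of E[Xij] = 2/(j − i + 1) with the closed form 2(n + 1)·H_n − 4n and with the 2n ln n approximation:

from math import log

for n in (10, 100, 1000):
    exact = sum(2.0 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))
    harmonic = sum(1.0 / k for k in range(1, n + 1))        # H_n = ln(n) + O(1), slide 18
    closed_form = 2 * (n + 1) * harmonic - 4 * n
    print(n, exact, closed_form, 2 * n * log(n))

The first two columns agree exactly; the last differs by the O(n) term.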
Lecture: Randomized Algorithms TU/e 5MD20


Randomized Quicksort
What if inputs are not the uniformly random selections of permutations?
How to avoid pathological inputs? pick a random pivot!
Analysis of number of comparisons is similar to foregoing analysis
24
Theorem.
Suppose that, whenever a pivot is chosen for Randomized Quicksort, it is
chosen independently and uniformly at random over all possible choices.
Then, for any input, the expected number of comparisons made by
Randomized Quicksort is 2n ln n + O(n).
Proof.
Almost identical to proof of expected number of comparisons for deterministic Quicksort with randomized inputs.
Try doing this proof yourself as an exercise.
Lecture: Randomized Algorithms TU/e 5MD20


Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
25
Lecture: Randomized Algorithms TU/e 5MD20


Tail Distribution Bounds
We've seen one example of measures for characterizing a distribution
Expectation, E[X], gives us an idea of the average value taken on by an rvar
Another important characteristic is the tail distribution
The tail distribution is the probability that an rvar takes on values far from its expectation
Useful in estimating the probability of failure of randomized algorithms
Intuitively, one may think of it as Pr{|X − k| ≥ a}
We will now look at a few different bounds on tail distribution
Loose bounds don't tell us much; they are however often easier to calculate
Tight(er) bounds give us a narrower range on values, but often require more information
26
[Plot: probability mass Pr{X = x} versus x; the shaded tail beyond a has probability Pr{X ≥ a}]
Lecture: Randomized Algorithms TU/e 5MD20


Markov's Inequality
A loose bound that is easy to calculate is Markov's inequality
We can easily bound Pr{X ≥ a} knowing only the expectation of X
This however often doesn't tell us much!
We will use a similar argument in the Probabilistic Method later today
27
Theorem [Markov's Inequality].
Let X be a random variable that assumes only nonnegative values.
Then, for all a > 0, Pr{X ≥ a} ≤ E[X]/a
Proof.
For a > 0, let I be a Bernoulli/indicator random variable, with I = 1 if X ≥ a, 0 otherwise. Since
X is nonnegative, I ≤ X/a. From slide 17, E[I] = Pr{I = 1} = Pr{X ≥ a}, thus
Pr{X ≥ a} ≤ E[X/a] = E[X]/a (from slide 16).
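A small simulation sketch of the bound (the exponential test distribution, with E[X] = 1, and the sample size are arbitrary choices made here for illustration):

import random

random.seed(1)
samples = [random.expovariate(1.0) for _ in range(100_000)]   # nonnegative rvar with E[X] = 1
mean = sum(samples) / len(samples)
for a in (2, 5, 10):
    empirical = sum(1 for x in samples if x >= a) / len(samples)
    print(a, empirical, mean / a)    # Markov: Pr{X >= a} <= E[X]/a, loose but valid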
Lecture: Randomized Algorithms TU/e 5MD20


Moments
To derive tighter bounds, we will need the idea of moments of an rvar
Definition: kth moment
The kth moment of an rvar X is E[X^k]; k = 1 is termed the first moment, and so on
Definition: variance
The variance of an rvar X is defined as Var[X] = E[(X − E[X])²]
Exercise: Show that Var[X] = E[X²] − (E[X])²
Definition: standard deviation
The standard deviation of an rvar X is σ[X] = √Var[X]
28
Lecture: Randomized Algorithms TU/e 5MD20


Chebyshev's Inequality
Now that we know about Var[X], we can introduce a tighter bound on the tail
29
Theorem [Chebyshev's Inequality].
For any a > 0, Pr{|X − E[X]| ≥ a} ≤ Var[X]/a²
Proof.
Pr{|X − E[X]| ≥ a} = Pr{(X − E[X])² ≥ a²}. Since (X − E[X])² is a nonnegative rvar, we can
apply Markov's inequality to yield:
Pr{(X − E[X])² ≥ a²} ≤ E[(X − E[X])²]/a² = Var[X]/a².
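A companion sketch (again with arbitrary parameters) compares Chebyshev's bound with the empirical two-sided tail of a Binomial(100, 1/2) rvar; knowing Var[X] buys a noticeably tighter bound than the expectation alone:

import random

random.seed(2)
n, p, trials = 100, 0.5, 20_000
mean, var = n * p, n * p * (1 - p)          # E[X] = np, Var[X] = np(1 - p), slide 17
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
for a in (10, 15, 20):                       # deviation from the mean
    empirical = sum(1 for x in samples if abs(x - mean) >= a) / trials
    print(a, empirical, var / a**2)          # Chebyshev: Pr{|X - E[X]| >= a} <= Var[X]/a^2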
Lecture: Randomized Algorithms TU/e 5MD20


Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
30
Lecture: Randomized Algorithms TU/e 5MD20


Randomized Algorithm for Median, RM
Idea
Find two nearby elements d and u, spanning a small set C, by sampling S
Since |C| is o(n/log n), it can be sorted in o(n) time using an algorithm that is O(k log k) for k elements
The check in step 7 is to validate that the set C is indeed small so that above assumption holds
31
Randomized Median Algorithm
Input: A set S of n elements over a totally ordered universe
Output: The median element of S, denoted m.
1. Pick a (multi-)set R of n^(3/4) elements in S, chosen independently and uniformly at
random, with replacement.
2. Sort the set R.
3. Let d be the (½n^(3/4) − √n)th smallest element in the sorted set R.
4. Let u be the (½n^(3/4) + √n)th smallest element in the sorted set R.
5. By comparing every element in S to d and u, compute the set C = {x ∈ S : d ≤ x ≤ u}
and the numbers ld = |{x ∈ S : x < d}| and lu = |{x ∈ S : x > u}|
6. If ld > n/2 or lu > n/2 then FAIL
7. If |C| ≤ 4n^(3/4) then sort the set C, otherwise FAIL
8. Output the (⌊n/2⌋ − ld + 1)th element in the sorted order of C
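A direct Python transcription of the eight steps (a sketch only: variable names follow the slide, and the rounding of the ranks in steps 3, 4, and 8 is an assumption where the slide leaves it implicit):

import math, random

def randomized_median(S):
    n = len(S)
    # Step 1: sample n^(3/4) elements uniformly at random, with replacement
    r_size = int(n ** 0.75)
    R = [random.choice(S) for _ in range(r_size)]
    R.sort()                                      # Step 2
    # Steps 3-4: d and u bracket the middle of the sorted sample (ranks are 1-indexed)
    lo = int(0.5 * n ** 0.75 - math.sqrt(n))
    hi = int(0.5 * n ** 0.75 + math.sqrt(n))
    d, u = R[max(lo - 1, 0)], R[min(hi - 1, r_size - 1)]
    # Step 5: one pass over S comparing every element to d and u
    C = [x for x in S if d <= x <= u]
    ld = sum(1 for x in S if x < d)
    lu = sum(1 for x in S if x > u)
    # Steps 6-7: FAIL if the sample was unrepresentative or C is too large to sort cheaply
    if ld > n / 2 or lu > n / 2 or len(C) > 4 * n ** 0.75:
        return None
    C.sort()
    return C[n // 2 - ld]                         # Step 8: the (n/2 - ld + 1)th element, 0-indexed

print(randomized_median(list(range(1, 10002))))   # usually 5001; None on the (rare) FAIL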
Lecture: Randomized Algorithms TU/e 5MD20


What is the probability that RM Fails?
What can go wrong?
Sample might not be representative in terms of median:
e1: Y1 = |{r ∈ R | r ≤ m}| < ½n^(3/4) − √n (too few elements in the sample smaller than or equal to m)
e2: Y2 = |{r ∈ R | r ≥ m}| < ½n^(3/4) − √n (too few elements in the sample larger than or equal to m)
e3: |C| > 4n^(3/4) (the sample picked from S has d and u too far apart)
Pr{RM fails} = Pr{e1 ∪ e2 ∪ e3} ≤ Pr{e1} + Pr{e2} + Pr{e3}, by the union bound (slide 10)
Let's look at determining the probability of event e1
32
Lecture: Randomized Algorithms TU/e 5MD20


Reminder Bernoulli/Indicator and Binomial
Bernoulli or indicator rvar
Success or failure in a single trial
Example: Coin toss, with rvar X = 1 when heads, X = 0 when tails
Ω = {0, 1}
Pr{X = 1} = p, Pr{X = 0} = 1 − p
E[X] = p
Var[X] = p(1 − p)
Binomial rvar
Number of successes in n Bernoulli trials of parameter p
Sum of n Bernoulli(p) rvars is a Binomial(n, p) rvar
Ω = {0, 1, 2, ..., n}, |Ω| = n + 1
f_X(k) = (n choose k) p^k (1 − p)^(n−k)
E[X] = np
Var[X] = np(1 − p)
33
Lecture: Randomized Algorithms TU/e 5MD20


Determining Pr{e
1
}
Let's define an indicator random variable X_i:
X_i = 1 if the ith sample is ≤ m, 0 otherwise
The X_i are independent since, from the definition of RM, sampling is with replacement
By definition, (n − 1)/2 + 1 elements in the input set S to RM are less than or equal to the median
So, the probability that a random sample is less than or equal to the median is
Pr{X_i = 1} = ((n − 1)/2 + 1)/n = 1/2 + 1/(2n)
Y1 is an rvar representing the number of items (in the sample R, of size n^(3/4)) less than or equal to the median m
We can therefore write Y1 in terms of the X_i as
Y1 = Σ_{i=1}^{n^(3/4)} X_i   (by definition of the RM algorithm)
34
Lecture: Randomized Algorithms TU/e 5MD20


Determining distribution of Y1
Recall (slide 33) that the sum of n Bernoulli(p) rvars is Binomial(n, p), so, with Y1 = Σ_{i=1}^{n^(3/4)} X_i,
f_{Y1}(y) = (n^(3/4) choose y) (1/2 + 1/(2n))^y (1/2 − 1/(2n))^(n^(3/4) − y)
and
E[Y1] = n^(3/4) (1/2 + 1/(2n))
Var[Y1] = n^(3/4) (1/2 + 1/(2n)) (1/2 − 1/(2n))
35
Lecture: Randomized Algorithms TU/e 5MD20


Determining Pr{e
1
}
Back to determining Pr{e1} (recall: it's one of the events in which RM fails)...
Pr{e1} = Pr{Y1 < ½n^(3/4) − √n}
Even though we can determine the distribution of the rvar Y1, determining Pr{Y1 < y} is not easy
(If we instead wanted Pr{Y1 ≤ y}, that is just the cumulative distribution function of Y1)
We could determine the appropriate limit of the above sum to give us Pr{Y1 < y}...
We can however easily get a bound on Pr{e1}
We can apply Chebyshev's inequality to get a bound on Pr{e1}:
36
Pr{e1} = Pr{Y1 < ½n^(3/4) − √n}
 ≤ Pr{|Y1 − E[Y1]| ≥ √n}
 ≤ Var[Y1]/(√n)²
 = n^(−1/4) (1/2 + 1/(2n)) (1/2 − 1/(2n)) < ¼ n^(−1/4)
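A sketch that evaluates the bound just derived and compares it with a direct simulation of the event e1 (the input size and trial count are arbitrary; the input here is simply {1, ..., n}):

import random
from math import sqrt

n, trials = 10_001, 2_000
m = (n + 1) // 2                                                  # the median of {1, ..., n}
bound = n ** -0.25 * (0.5 + 1 / (2 * n)) * (0.5 - 1 / (2 * n))    # Var[Y1]/n from above
failures = 0
for _ in range(trials):
    R = [random.randint(1, n) for _ in range(int(n ** 0.75))]     # sample with replacement
    Y1 = sum(1 for r in R if r <= m)
    if Y1 < 0.5 * n ** 0.75 - sqrt(n):                            # event e1
        failures += 1
print("Chebyshev bound:", bound, " empirical Pr{e1}:", failures / trials)

The empirical frequency is typically far below the roughly ¼·n^(−1/4) ≈ 0.025 bound, which is expected: Chebyshev's inequality is valid but loose.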
Lecture: Randomized Algorithms TU/e 5MD20


Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
37
Lecture: Randomized Algorithms TU/e 5MD20


Bounds are useful!
We saw in previous example how knowing about the Chebyshev inequality helped us to quickly
answer questions about probability of failure of a randomized algorithm
But, how tight are the bounds?
Not all bounds tell us something useful
Example: Pr{X = x} ≤ 1 is always true for any rvar X and value x, but it tells you nothing new
Chernoff bounds give us tighter bounds on Pr{|X-E[X]| > a}
Chernoff Bounds
38
[Plot: probability mass Pr{X = x} versus x, comparing a loose bound (up to 1.0) with a tighter bound on the tail beyond the point a, for a discrete rvar X]
Lecture: Randomized Algorithms TU/e 5MD20


Chernoff Bounds
Unlike Markov and Chebyshev inequalities, these are a class of bounds
There are Chernoff bounds for different specific distributions
Chernoff bounds are however all formulated in terms of moment generating functions
Moment generating function for an rvar X: M_X(t) = E[e^(tX)]
M_X(t) uniquely characterizes the distribution
We will be most interested in the property that E[X^n] = M_X^(n)(0)
i.e., the nth derivative of M_X(t) at t = 0 yields E[X^n]
Example: Moment generating function for a Bernoulli rvar
(Recall: coin toss, heads or 1 with probability p, tails or 0 with probability 1 − p):
M_X(t) = E[e^(tX)] = p·e^(t·1) + (1 − p)·e^(t·0) = p·e^t + 1 − p
39
Lecture: Randomized Algorithms TU/e 5MD20


Chernoff Bounds
Chernoff bounds generally make use of the following (from Markov's inequality, slide 27):
For t > 0: Pr{X ≥ a} = Pr{e^(tX) ≥ e^(ta)} ≤ E[e^(tX)]/e^(ta), so Pr{X ≥ a} ≤ min_{t>0} E[e^(tX)]/e^(ta)
For t < 0: Pr{X ≤ a} = Pr{e^(tX) ≥ e^(ta)} ≤ E[e^(tX)]/e^(ta), so Pr{X ≤ a} ≤ min_{t<0} E[e^(tX)]/e^(ta)
For X a sum of independent (but not necessarily i.i.d.) indicator rvars, with μ = E[X],
the following Chernoff bounds (which can be derived from the above) exist:
For 0 < δ ≤ 1: Pr{X ≥ (1 + δ)μ} ≤ e^(−μδ²/3)
For 0 < δ < 1: Pr{X ≤ (1 − δ)μ} ≤ e^(−μδ²/2)
40
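A quick numerical comparison sketch for a Binomial(1000, 1/2) rvar, setting the exact upper tail against the Chernoff bound above and the Chebyshev bound from slide 29 (the thresholds are arbitrary choices):

from math import comb, exp

n, p = 1000, 0.5
mu, var = n * p, n * p * (1 - p)
for k0 in (550, 600, 650):                       # thresholds of the form (1 + delta)*mu
    delta = k0 / mu - 1
    exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))
    chernoff = exp(-mu * delta**2 / 3)           # Pr{X >= (1 + delta)mu} <= e^(-mu delta^2 / 3)
    chebyshev = var / (k0 - mu)**2               # Pr{|X - E[X]| >= k0 - mu} <= Var[X]/(k0 - mu)^2
    print(k0, exact, chernoff, chebyshev)

The Chernoff bound decays exponentially in the deviation, so it overtakes Chebyshev's polynomial decay as the threshold moves further from the mean.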
Lecture: Randomized Algorithms TU/e 5MD20


Chernoff Bounds Estimating a Parameter
Problem:
You have been asked to create a model of errors on a real communication interconnect
At the high communication speeds, transmitted data may be subject to bit errors
You want to estimate the probability of a bit error by measurement (e.g., eye diagrams):
How many measurement samples do you need?
Can you state a precise tradeoff between the accuracy of estimate and # of samples?
41
[Figures: superposed bit streams yield an "eye diagram" whose jitter and noise separate the "1" and "0" levels; measurement setup on a 53 mm × 102 mm PCB with an MSP430F2274 processing element, ground and power-plane keep-out areas to reduce RF signal loss, a sensor node attached to the module, and the majority of the interconnect routed on the bottom layer of the PCB]
Lecture: Randomized Algorithms TU/e 5MD20


Chernoff Bounds Estimating a Parameter
Estimating the probability of bit error from n measurements
Let p be the probability we are trying to estimate, taking n measurements
Let X = p̃·n be the number of measurements in which we observe bit errors, so p̃ = X/n is our estimate
If n is sufficiently large, we expect p̃ to be close to p
Confidence interval
A 1 − δ confidence interval for a parameter p is an interval [p̃ − ε, p̃ + ε] such that
Pr{p ∈ [p̃ − ε, p̃ + ε]} ≥ 1 − δ, i.e., Pr{np ∈ [n(p̃ − ε), n(p̃ + ε)]} ≥ 1 − δ
If the actual p does not lie in the interval, i.e., p ∉ [p̃ − ε, p̃ + ε]:
If p < p̃ − ε, then X > n(p + ε) (since X = np̃)
If p > p̃ + ε, then X < n(p − ε)
We can apply the Chernoff bounds for the Binomial we showed earlier
X = np̃, the number of observed errors in n measurements, is Binomial(n, p) distributed!
42
Lecture: Randomized Algorithms TU/e 5MD20


Chernoff Bounds Estimating a Parameter
Applying Chernoff bounds:
Pr{p ∉ [p̃ − ε, p̃ + ε]} = Pr{X < np(1 − ε/p)} + Pr{X > np(1 + ε/p)}
 ≤ e^(−nε²/(2p)) + e^(−nε²/(3p))   (applying the Binomial Chernoff bounds)
 ≤ e^(−nε²/2) + e^(−nε²/3)   (since p ≤ 1 by definition of probabilities)
So, the probability that the real p is less than ε away from the estimated p̃
can be set by performing an appropriate minimum number of measurements, n
Example: 1 − δ = 0.95, ε = 0.01 ⟹ n ≈ 95,430 measurements
43
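A sketch that inverts the bound numerically: given an assumed accuracy ε and failure probability δ, it searches for the smallest n with e^(−nε²/2) + e^(−nε²/3) ≤ δ (the function and variable names are illustrative):

from math import exp

def min_samples(epsilon, delta):
    def bound(n):
        return exp(-n * epsilon**2 / 2) + exp(-n * epsilon**2 / 3)
    lo, hi = 1, 1
    while bound(hi) > delta:          # grow an upper bracket
        hi *= 2
    while lo < hi:                    # binary search for the smallest sufficient n
        mid = (lo + hi) // 2
        if bound(mid) <= delta:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_samples(0.01, 0.05))        # close to the ~95,430 measurements in the example above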
Lecture: Randomized Algorithms TU/e 5MD20


Other Applications of Parameter Estimation
Derive Chernoff bounds for the distribution at hand
You can't always assume the underlying distribution is Gaussian/normal
Semiconductor process / device models
An important part of the modern IC design flow
Diminishing device feature sizes (~100s atoms per transistor at 45nm); statistical models needed
Semiconductor fabrication companies (fab houses) use test chips to characterize process
How many test structures does one need to get a certain confidence in parameter estimates?
More applications
Characterizing probability of device failures: how many measurements do you need?
44
Lecture: Randomized Algorithms TU/e 5MD20


Characterizing Probability of Device Failures
45
[Figure: possible interaction paths leading to circuit state disturbance. Radioactive decay of ²³⁸U and ²³²Th from device packaging mold resin, and of ²¹⁰Po from PbSn solder (and Al wire), produces α-particles and γ-rays; cosmic rays and thermal neutrons (high-energy neutrons can penetrate up to 5 ft of concrete) undergo neutron capture within Si and B in integrated circuits, yielding ¹²C and unstable isotopes such as Lithium or Magnesium. Secondary ions and energetic particles may generate electron-hole pairs in silicon; these may migrate through the device and aggregate, creating current pulses that lead to changes of logic state, compounded by electrical noise and temperature fluctuations in a microprocessor running a program]
Lecture: Randomized Algorithms TU/e 5MD20


More Applications of Randomized Algs.
Hashing: can use the basic tools introduced in the last two lectures to
Determine the expected number of items in a bin
Bound on the maximum number of items in a bin
Probability of false positives when using hash functions with fingerprints
Applicable to many areas of design automation (you will see example later in this course)
Approximate set membership: Bloom filters
Use probabilistic analysis to determine the tradeoff between space and false positive probability
Hamiltonian cycles
Monte Carlo algorithms (will return a Hamiltonian cycle or failure)
46
Lecture: Randomized Algorithms TU/e 5MD20


Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
47
Lecture: Randomized Algorithms TU/e 5MD20


The Probabilistic Method
A method for proving the existence of objects
Why is it relevant ?
The proofs are of a form that enables them to guide the creation of a randomized algorithm for finding the desired
object
Basic idea:
Construct a sample space such that the probability of selecting the desired object is > 0.
(if the probability of picking the desired element is > 0, then the element must exist.)
Alternatively: an rvar X must take on at least one value ≥ E[X], and at least one value ≤ E[X]
Other approaches: second moment method, Lovász local lemma
48
Lecture: Randomized Algorithms TU/e 5MD20


The Probabilistic Method: Example
A multiprocessor module (left) and its logical topology (right)
We want a grouping of the hardware into two sets, with a maximum number of connecting links
49
[Figure: photograph of the multiprocessor module (MSP430F2274 processing elements on a 53 mm × 102 mm PCB, with ground and power-plane keep-out areas to reduce RF signal loss, a sensor node attached to the module, and the majority of the interconnect routed on the bottom layer of the PCB) and its logical topology: nodes cpu0, cpu1, ..., connected via hardware SPI ports (master and slave) and via hardware-driven and software-driven SPI communication links]
Lecture: Randomized Algorithms TU/e 5MD20


There may also be restrictions on valid topologies due to layout constraints
We can reformulate this as finding the Maxcut of the topology graph
Maxcut: a cut of a graph of maximum weight; finding it is an NP-hard problem
We'll use the probabilistic method to prove that a cut with certain properties exists
We'll then turn the proof into a randomized algorithm for finding the desired topology
The Probabilistic Method: Example
50
Partition A
Partition B
This partitioning does not
yield the largest number of
links for a cut of the topology
Lecture: Randomized Algorithms TU/e 5MD20


The Probabilistic Method: Example
How we will approach this problem:
1. Problem: topology partitioning for fault-tolerance
2. Restate as a Maxcut problem
3. Existence proof for a maxcut of value at least m/2
4. Conversion of proof into a simple randomized algorithm
51
Lecture: Randomized Algorithms TU/e 5MD20


Probabilistic Method: Problem Proof
52
Theorem [Maxcut].
Given any undirected graph G = (V, E ), with n vertices and m edges, there is a
partition of V into two disjoint sets A and B, such that at least m/2 edges
connect a vertex in A to a vertex in B, i.e., there is a cut with value at least m/2.
Proof.
Construct sets A and B by randomly and independently assigning each vertex to one of the two sets.
Let e1, ..., em be an arbitrary enumeration of the edges in G. For i = 1, ..., m, define
X_i = 1 if edge e_i connects a vertex in A to a vertex in B, and 0 otherwise.
Pr{edge ei connects a vertex in A to a vertex in B} = 1/2 (since we assign the vertices to the two sets randomly and
independently). Xi is therefore a Bernoulli/indicator rvar with p = 1/2 and E[Xi] = p = 1/2.
Let C(A, B) be an rvar denoting the value of the cut between A and B. Then,
E[C(A, B)] = E[Σ_{i=1}^{m} X_i] = Σ_{i=1}^{m} E[X_i] = m/2
Since E[C(A, B)] = m/2, there must be at least one partition with C(A, B) ≥ m/2.
Lecture: Randomized Algorithms TU/e 5MD20


Probabilistic Method: Proof Algorithm
Basic procedure → Monte Carlo or Las Vegas algorithm
Repeat the basic procedure a fixed number of times; return a cut of value ≥ m/2 if one is found, or FAIL (MC)
Or, repeat the procedure until we find a cut of value ≥ m/2 (LV)
What is the expected number of tries before we find a cut with value ≥ m/2?
We can use this as a guide for the number of times to repeat the basic steps until we find a Maxcut
or FAIL (i.e., to direct a Monte Carlo algorithm)
53
Randomized Maxcut
Input: A graph G with n vertices and m edges
Output: A partition of G into two sets A and B such that at least m/2 edges connect a vertex in A to a vertex in B.
1. Randomly choose a partition. This can be done in linear time by scanning through the vertices
and flipping a fair coin to pick the destination set as A or B.
2. Check whether the selected cut has value at least m/2, by counting edges crossing the cut
(polynomial time).
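A Python sketch of the two steps, wrapped in a Las Vegas loop that repeats until a cut of value ≥ m/2 appears (the edge-list graph representation and the names are assumptions made for illustration):

import random

def random_cut(n_vertices, edges):
    # Step 1: flip a fair coin per vertex to choose its side
    side = [random.random() < 0.5 for _ in range(n_vertices)]
    # Step 2: count the edges that cross the cut
    value = sum(1 for (u, v) in edges if side[u] != side[v])
    A = [v for v in range(n_vertices) if side[v]]
    B = [v for v in range(n_vertices) if not side[v]]
    return A, B, value

def randomized_maxcut(n_vertices, edges):
    m = len(edges)
    tries = 0
    while True:                       # Las Vegas: repeat until the cut is large enough
        tries += 1
        A, B, value = random_cut(n_vertices, edges)
        if 2 * value >= m:
            return A, B, value, tries

edges = [(i, (i + 1) % 8) for i in range(8)]     # a cycle on 8 vertices, m = 8
print(randomized_maxcut(8, edges))               # a cut of value >= 4, usually after a few tries

Per slide 54, the expected number of iterations of the loop is at most m/2 + 1; a Monte Carlo variant would instead cap the iteration count and return FAIL.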
Lecture: Randomized Algorithms TU/e 5MD20


Probabilistic Method: Algorithm Performance
54
Expected number of tries before we find a cut with value ≥ m/2
Let p = Pr{C(A, B) ≥ m/2}
The value of a cut cannot be more than the number of edges, i.e., C(A, B) ≤ m
The previous proof showed that E[C(A, B)] = m/2, so,
m/2 = E[C(A, B)] = Σ_{i ≤ m/2 − 1} i·Pr{C(A, B) = i} + Σ_{i ≥ m/2} i·Pr{C(A, B) = i} ≤ (1 − p)(m/2 − 1) + p·m
which gives p ≥ 1/(m/2 + 1)
Recall the geometric probability distribution
Number of trials until the first failure, or, number of trials until the first success
Ω = ℕ, f_X(k) = p(1 − p)^(k−1), E[X] = 1/p
The expected number of tries before we find such a cut is 1/p, i.e., at most m/2 + 1
Lecture: Randomized Algorithms TU/e 5MD20


The Probabilistic Method Example Recap
A method for proving the existence of objects
Why is it relevant
The proofs can be used to guide construction of a randomized algorithm
There are also techniques to turn such proofs into deterministic algorithms (derandomization)
What we just saw
1. A problem: topology partitioning for fault-tolerance
2. Restated as a Maxcut problem
3. Existence proof for a Maxcut of value at least m/2
4. Constructed a simple randomized algorithm based on proof
5. Analysis of the expected running time of the randomized algorithm

Question: was the algorithm Monte Carlo or Las Vegas?


55
Lecture: Randomized Algorithms TU/e 5MD20


Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
56
Lecture: Randomized Algorithms TU/e 5MD20


Hashing
Hash tables
Data structure that enables, on average, O(1) insertion and lookups
Useful when one would like to maintain a set of items, with fast lookup
Notation
Top-level table/array, T[]
Element for insertion in hash table, x, from a set U of possible elements
Key, k, is an identifier for x; assume we can easily map elements to integer keys
Hash function h(key[x]) specifies index in T[] where element x should be stored
Assumptions
Simple uniform hashing: any element is equally likely to hash to any slot
That is, h(key[x]) distributes the x elements uniformly at random over slots in T[]
57
Lecture: Randomized Algorithms TU/e 5MD20


Populating the Hash Table
Simplest approach: direct addressing
One element in T[] for each hash key when we can afford the space cost
May make sense when number of keys to be stored is approx. number of possible keys, |U |
Collisions
Want T[] to have about as many elements as well insert, n (not as many as exist, |U |)
Want h() to map larger set with |U | elements, to m slots
Since m < |U |, it is possible to have multiple elements hash to same slot
Can resolve collisions with two different approaches: chain hashing or open addressing
Chain Hashing
Keep items that hash to the same slot in a linked list or chain
Will now need to search through chain for insert/delete/lookup
The ratio α = n/m is called the load
58
[Diagram: chain hashing of elements x ∈ U = {0, ..., 9}; the elements x1, ..., x6 = {2, 0, 3, 1, 9, 5} are hashed into bins (slots) 0-9 of T[], with colliding elements kept in a chain]
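A minimal chain-hashing sketch in Python (Python's built-in hash reduced mod m stands in for the idealized simple uniform hashing; class and method names are illustrative):

class ChainHashTable:
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]            # one chain per slot of T[]

    def _slot(self, key):
        return hash(key) % self.m                      # stand-in for h(key[x])

    def insert(self, key, value):
        self.slots[self._slot(key)].insert(0, (key, value))   # new elements join the head

    def lookup(self, key):
        for k, v in self.slots[self._slot(key)]:       # search only this slot's chain
            if k == key:
                return v
        return None

t = ChainHashTable(m=10)
for x in (2, 0, 3, 1, 9, 5):
    t.insert(x, str(x))
print(t.lookup(9), t.lookup(7))                        # '9' None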
Lecture: Randomized Algorithms TU/e 5MD20


Expected Search Time in Chain Hashing
Expected # of comparisons (assume new elements added to head of chain, simple uniform hashing)
If the element is not already in the hash table (compare to all elements in bin h(key(x))): Θ(1 + α)
If the element is in the hash table (stop when we find the element in bin h(key(x))): Θ(1 + α):
59
Proof
Assume the element we seek is equally likely to be any of the n elements in the table. The number of elements
examined in a lookup for element x is L_x = 1 + (number of elements in bin h(key(x)) ahead of x); all
elements seen in the chain before x were added after x was.
Now, we can find the average L_x by calculating the expected value over the n possible elements in the table...
Let x_i denote the ith element inserted into the table, i = 1, ..., n, and let k_i = key(x_i). Define an indicator rvar X_ij:
X_ij = 1 if h(k_i) = h(k_j) (which happens with probability 1/m), and 0 otherwise, so E[X_ij] = 1/m. Thus
E[(1/n) Σ_{i=1}^{n} L_{x_i}] = E[(1/n) Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} X_ij)]
 = (1/n) Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} E[X_ij])
 = 1 + (1/(nm)) Σ_{i=1}^{n} (n − i)
 = 1 + (1/(nm)) (n² − n(n + 1)/2)
 = 1 + (n − 1)/(2m)
 = 1 + α/2 − α/(2n)
Not a constant: the expected search time grows with the load α
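The derivation can be checked empirically; the sketch below (with an arbitrary load α = 5, and Python's modulus of random integer keys standing in for simple uniform hashing) measures the average number of chain elements examined in successful lookups:

import random

n, m = 5_000, 1_000                        # n elements, m slots, load alpha = n/m = 5
alpha = n / m
slots = [[] for _ in range(m)]
keys = random.sample(range(10**9), n)
for k in keys:
    slots[k % m].insert(0, k)              # insert at the head of the chain, as the proof assumes

avg = sum(slots[k % m].index(k) + 1 for k in keys) / n   # elements examined per successful lookup
print(avg, 1 + alpha / 2 - alpha / (2 * n))              # empirical vs. 1 + alpha/2 - alpha/(2n)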
Lecture: Randomized Algorithms TU/e 5MD20


Hash functions and Universal Hashing
Universal hashing
At runtime, pick the hash function that will be used at random...
... from a family of universal hash functions
Universal hashing gives good average case behavior
If key k is in the table, the expected length of the chain containing k is at most 1 + α
60
Definition [Universal Hash Function].
A finite collection, H, of hash functions that map a given universe U of keys into
the range {0, 1, ..., m − 1} is said to be universal if, for each pair of distinct keys k, l ∈ U,
the number of hash functions h ∈ H for which h(k) = h(l) is at most |H|/m
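One standard universal family is the Carter-Wegman construction h_{a,b}(k) = ((a·k + b) mod p) mod m, with a prime p larger than any key; the sketch below picks a member of the family at random at runtime (this particular family is an illustration and is not necessarily the one intended by the lecture):

import random

def make_universal_hash(p, m):
    a = random.randrange(1, p)             # pick h_{a,b} uniformly from the family
    b = random.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m

p, m = 2_147_483_647, 100                  # 2^31 - 1 is prime; table with m slots
h = make_universal_hash(p, m)
print(h(42), h(43))                        # slot indices in {0, ..., m - 1}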
Lecture: Randomized Algorithms TU/e 5MD20


Other Forms: Perfect Hashing, Bloom Filters
Perfect hashing
Uses two levels of hashing w/ universal hash functions: second level hashing upon collision
Can guarantee no collisions at second level
Unlike other forms of hashing, worst-case performance is O(1)
Bloom filters
Tradeoff between space and false positive probability
61
Insertion: for each element x_i to be inserted, calculate k hashes and set T[h1(x_i)] ← 1, ..., T[hk(x_i)] ← 1
Checking: calculate the k hashes of element y; if T[h1(y)] = 1, and ..., and T[hk(y)] = 1, then y is (assumed to be) in the table
[Diagram: bit array T: 0 1 1 0 1 0 0 1 ...]
After inserting m elements with k hashes each into the n-bit array T[], the probability of a given bit of T[] being zero is (1 − 1/n)^(km)
If we assume some bits are still zero, the probability of a false positive is then (1 − (1 − 1/n)^(km))^k
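A compact Bloom filter sketch (salted applications of Python's hash stand in for the k independent hash functions h1, ..., hk; the parameters are arbitrary):

class BloomFilter:
    def __init__(self, n_bits, k):
        self.n, self.k = n_bits, k
        self.T = [0] * n_bits                          # the n-bit array T[]

    def _indices(self, item):
        return [hash((i, item)) % self.n for i in range(self.k)]   # h1(item), ..., hk(item)

    def insert(self, item):
        for idx in self._indices(item):
            self.T[idx] = 1

    def maybe_contains(self, item):
        return all(self.T[idx] for idx in self._indices(item))     # may yield false positives

bf = BloomFilter(n_bits=1000, k=5)
for word in ("quicksort", "maxcut", "median"):
    bf.insert(word)
print(bf.maybe_contains("maxcut"), bf.maybe_contains("hashing"))   # True, (almost surely) False

Inserting m items into n bits with k hashes, the measured false-positive rate should track the (1 − (1 − 1/n)^(km))^k expression above.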
Lecture: Randomized Algorithms TU/e 5MD20


Other Forms: Open Addressing
All elements stored in the top-level table T[] itself
No chaining
The load α ≤ 1, since the hash table can get full once its m slots are taken by elements
Upon a collision, hash function defines next slot to probe until an empty slot is found
Advantages
No need for pointers used in chaining: may have more slots for same memory usage
Disadvantages
Entry deletion is complicated: can't simply remove an entry, as doing so would affect the probe sequence
Probe sequence strategies
Linear probing, quadratic probing, double hashing
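A minimal open-addressing sketch using linear probing (deletion is deliberately omitted, since handling it correctly is exactly the complication noted above; names are illustrative):

class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.T = [None] * m                    # all entries live in T[] itself, no chains

    def insert(self, key, value):
        i = hash(key) % self.m
        for _ in range(self.m):                # at most m probes; the table may fill up
            if self.T[i] is None or self.T[i][0] == key:
                self.T[i] = (key, value)
                return
            i = (i + 1) % self.m               # linear probing: try the next slot
        raise RuntimeError("hash table is full")

    def lookup(self, key):
        i = hash(key) % self.m
        for _ in range(self.m):
            if self.T[i] is None:
                return None                    # reached an empty slot: key is absent
            if self.T[i][0] == key:
                return self.T[i][1]
            i = (i + 1) % self.m
        return None

t = LinearProbingTable(8)
t.insert("a", 1); t.insert("b", 2)
print(t.lookup("a"), t.lookup("z"))            # 1 None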
62
Lecture: Randomized Algorithms TU/e 5MD20


Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
63
Lecture: Randomized Algorithms TU/e 5MD20


Summary
Why randomized algorithms and analyses ?
Analysis of algorithms that make use of randomness
Analysis of algorithms in the presence of random input
Designing algorithms that avoid pathological behavior using random decisions
Probability review
Probability space, events, random variables
Characteristics of random variables: expectation, moments
Randomized algorithms and Probabilistic analysis
Tail distribution bounds
Markov inequality, Chebyshev inequality, Chernoff bounds
The Probabilistic Method
Proofs → algorithms
Hashing example and analysis
64
Lecture: Randomized Algorithms TU/e 5MD20


Probing Further...
Books
Kleinberg & Tardos chapter 13
Randomized Algorithms (Motwani and Raghavan)
Probability and Computing (Mitzenmacher and Upfal)
65
