Jingrui He
09/11/2007
Coin Flips
You flip a coin
Heads with probability 0.5
$P(X = x_i \cap X = x_j) = 0$ if $i \neq j$
$P(X = x_i \cup X = x_j) = P(X = x_i) + P(X = x_j)$ if $i \neq j$
$P(X = x_1 \cup X = x_2 \cup \cdots \cup X = x_k) = 1$
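As a quick sanity check, here is a minimal Python sketch (illustrative, not from the slides) that estimates these probabilities for a fair coin by simulation:

```python
import random

# Simulate n flips of a fair coin; outcomes are "H" or "T".
n = 100_000
flips = [random.choice("HT") for _ in range(n)]

p_heads = flips.count("H") / n
p_tails = flips.count("T") / n

# The outcomes are disjoint, so their probabilities add up to 1.
print(f"P(H) ~ {p_heads:.3f}, P(T) ~ {p_tails:.3f}")
print(f"P(H) + P(T) = {p_heads + p_tails:.3f}")  # exactly 1 by construction
```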
Common Distributions
Uniform $X \sim U[1, \ldots, N]$
X takes values 1, 2, …, N
$P(X = i) = 1/N$
E.g. picking balls of different colors from a box
Binomial $X \sim \mathrm{Bin}(n, p)$
X takes values 0, 1, …, n
$P(X = i) = \binom{n}{i} p^i (1 - p)^{n - i}$
E.g. coin flips
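A small sketch of the binomial pmf, using Python's math.comb (the example numbers are illustrative):

```python
from math import comb

def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Bin(n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

# 50 flips of a fair coin: pmf values sum to 1 over i = 0..50.
n, p = 50, 0.5
print(binom_pmf(25, n, p))                            # most likely head count
print(sum(binom_pmf(i, n, p) for i in range(n + 1)))  # ~ 1.0
```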
Coin Flips of Two Persons
Your friend and you both flip coins
Heads with probability 0.5
You flip 50 times; your friend flips 100 times
How many heads will each of you get?
Joint Distribution
Given two discrete RVs X and Y, their joint
distribution is the distribution of X and Y
together
E.g. P(you get 21 heads AND your friend gets 70 heads)
$\sum_x \sum_y P(X = x \cap Y = y) = 1$
E.g. $\sum_{i=0}^{50} \sum_{j=0}^{100} P(\text{you get } i \text{ heads AND your friend gets } j \text{ heads}) = 1$
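Assuming the two coin-flippers are independent, the joint pmf factorizes; a short sketch checking that the double sum is 1:

```python
from math import comb

def binom_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Joint pmf of two independent coin-flippers: you (50 flips), friend (100 flips).
def joint(i, j):
    return binom_pmf(i, 50, 0.5) * binom_pmf(j, 100, 0.5)

total = sum(joint(i, j) for i in range(51) for j in range(101))
print(total)          # ~ 1.0, as the joint distribution must sum to 1
print(joint(21, 70))  # P(you get 21 heads AND your friend gets 70 heads)
```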
Conditional Probability
$P(X = x \mid Y = y)$ is the probability of $X = x$, given the occurrence of $Y = y$
E.g. you get 0 heads, given that your friend gets
61 heads
$P(X = x \mid Y = y) = \dfrac{P(X = x \cap Y = y)}{P(Y = y)}$
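A simulation sketch of this definition, estimating a conditional probability as a ratio of counts (the two-dice setup is a hypothetical example, not from the slides):

```python
import random

# Estimate P(X = x | Y = y) as (# trials where both hold) / (# trials where Y = y).
# Here X = first die, Y = sum of two dice.
n = 200_000
joint = marginal = 0
for _ in range(n):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    if d1 + d2 == 7:          # event Y = 7
        marginal += 1
        if d1 == 3:           # event X = 3
            joint += 1

print(joint / marginal)  # ~ 1/6 = P(X = 3 | Y = 7)
```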
Law of Total Probability
Given two discrete RVs X and Y, which take values in $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_n\}$, we have
$P(X = x_i) = \sum_j P(X = x_i \cap Y = y_j) = \sum_j P(X = x_i \mid Y = y_j)\, P(Y = y_j)$
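A quick numerical check of the law of total probability, marginalizing the (assumed independent) joint pmf from the earlier coin example:

```python
from math import comb

def binom_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Joint pmf of your heads (X, 50 flips) and your friend's heads (Y, 100 flips).
def joint(i, j):
    return binom_pmf(i, 50, 0.5) * binom_pmf(j, 100, 0.5)

i = 25
marginal = sum(joint(i, j) for j in range(101))  # sum over all values of Y
print(marginal, binom_pmf(i, 50, 0.5))           # the two agree
```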
Marginalization
$P(X = x_i) = \sum_j P(X = x_i \cap Y = y_j) = \sum_j P(X = x_i \mid Y = y_j)\, P(Y = y_j)$
Bayes Rule
$P(X = x_i \mid Y = y_j) = \dfrac{P(Y = y_j \mid X = x_i)\, P(X = x_i)}{\sum_k P(Y = y_j \mid X = x_k)\, P(X = x_k)}$
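A minimal worked example of Bayes rule with the denominator expanded by total probability (all the rates below are made-up numbers for illustration):

```python
# Bayes rule for a hypothetical test/disease example:
# X = disease status, Y = test result.
p_disease = 0.01
p_pos_given_disease = 0.95      # sensitivity (assumed)
p_pos_given_healthy = 0.05      # false-positive rate (assumed)

# Denominator: marginal P(Y = positive) via the law of total probability.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# P(X = disease | Y = positive)
print(p_pos_given_disease * p_disease / p_pos)  # ~ 0.161
```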
Independent RVs
Intuition: X and Y are independent means that knowing $X = x$ makes it neither more nor less probable that $Y = y$
Definition: X and Y are independent iff
$P(X = x \cap Y = y) = P(X = x)\, P(Y = y)$
More on Independence
$P(X = x \cap Y = y) = P(X = x)\, P(Y = y)$
$P(X = x \mid Y = y) = P(X = x)$; $P(Y = y \mid X = x) = P(Y = y)$
Conditional independence given Z: $P(X = x \cap Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$
More on Conditional Independence
$P(X = x \cap Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$
$P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z)$
$P(Y = y \mid X = x, Z = z) = P(Y = y \mid Z = z)$
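A simulation sketch checking the product rule for two coins assumed independent:

```python
import random

# Empirical independence check for two fair coins (assumed independent).
n = 200_000
both = x_only = y_only = 0
for _ in range(n):
    x, y = random.random() < 0.5, random.random() < 0.5
    both += x and y
    x_only += x
    y_only += y

# P(X and Y) should match P(X) * P(Y) for independent events.
print(both / n, (x_only / n) * (y_only / n))  # both ~ 0.25
```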
Monty Hall Problem
You're given the choice of three doors: Behind one
door is a car; behind the others, goats.
You pick a door, say No. 1
The host, who knows what's behind the doors, opens
another door, say No. 3, which has a goat.
Do you want to pick door No. 2 instead?
[Diagram: if you picked the car, the host reveals Goat A or Goat B; if you picked Goat A, the host must reveal Goat B; if you picked Goat B, the host must reveal Goat A]
Monty Hall Problem: Bayes Rule
$C_i$: the car is behind door $i$, $i = 1, 2, 3$
$P(C_i) = 1/3$
$H_{ij}$: the host opens door $j$ after you pick door $i$
$P(H_{ij} \mid C_k) = \begin{cases} 0 & i = j \\ 0 & j = k \\ 1/2 & i = k \\ 1 & i \neq k,\ j \neq k \end{cases}$
Monty Hall Problem: Bayes Rule cont.
WLOG, let $i = 1$, $j = 3$
$P(C_1 \mid H_{13}) = \dfrac{P(H_{13} \mid C_1)\, P(C_1)}{P(H_{13})}$
$P(H_{13} \mid C_1)\, P(C_1) = \dfrac{1}{2} \cdot \dfrac{1}{3} = \dfrac{1}{6}$
Monty Hall Problem: Bayes Rule cont.
$P(H_{13}) = P(H_{13}, C_1) + P(H_{13}, C_2) + P(H_{13}, C_3)$
$= P(H_{13} \mid C_1)\, P(C_1) + P(H_{13} \mid C_2)\, P(C_2)$ (the $C_3$ term vanishes: the host never opens the door hiding the car, so $P(H_{13} \mid C_3) = 0$)
$= \dfrac{1}{6} + 1 \cdot \dfrac{1}{3} = \dfrac{1}{2}$
Therefore $P(C_1 \mid H_{13}) = \dfrac{1/6}{1/2} = \dfrac{1}{3}$
Monty Hall Problem: Bayes Rule cont.
$P(C_1 \mid H_{13}) = \dfrac{1/6}{1/2} = \dfrac{1}{3}$
$P(C_2 \mid H_{13}) = 1 - \dfrac{1}{3} = \dfrac{2}{3} > P(C_1 \mid H_{13})$ (since $P(C_3 \mid H_{13}) = 0$: the host opened door 3)
You should switch!
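A Monte Carlo sketch of the problem, confirming the 1/3 vs. 2/3 split derived above (illustrative code, not from the slides):

```python
import random

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Host opens a door that is neither your pick nor the car.
    host = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != host)
    return pick == car

n = 100_000
print("stay:  ", sum(play(False) for _ in range(n)) / n)  # ~ 1/3
print("switch:", sum(play(True) for _ in range(n)) / n)   # ~ 2/3
```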
Continuous Random Variables
What if X is continuous?
Probability density function (pdf) instead of
probability mass function (pmf)
A pdf is a function $f(x)$ that describes the relative likelihood of $X$ taking values near $x$; probabilities are obtained by integrating it.
PDF
Properties of pdf
$f(x) \geq 0, \ \forall x$
$\int_{-\infty}^{+\infty} f(x)\, dx = 1$
Is $f(x) \leq 1$ required? No: a pdf can exceed 1 on part of its support, since only its integral, not its height, must equal 1.
[Figure: a bell-shaped pdf $f(x)$ plotted over $x \in [-5, 5]$]
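A concrete sketch of the point above: a density can exceed 1 pointwise as long as it integrates to 1 (Uniform[0, 0.1] is the hypothetical example here):

```python
# Uniform[0, 0.1] has f(x) = 10 on its support -- a density well above 1.
def f(x: float) -> float:
    return 10.0 if 0.0 <= x <= 0.1 else 0.0

# Riemann-sum check that the density still integrates to 1.
dx = 1e-5
total = sum(f(i * dx) * dx for i in range(int(0.2 / dx)))
print(total)  # ~ 1.0 even though f(x) = 10 > 1
```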
Common Distributions cont.
Beta $X \sim \mathrm{Beta}(\alpha, \beta)$
$f(x; \alpha, \beta) = \dfrac{1}{B(\alpha, \beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad x \in [0, 1]$
$\alpha = \beta = 1$: the uniform distribution between 0 and 1
E.g. the conjugate prior for the parameter $p$ in the Binomial distribution
[Figure: Beta pdfs $f(x)$ plotted over $x \in [0, 1]$]
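A sketch of the Beta density and the conjugacy claim: with a Beta(a, b) prior on the coin bias p, observing h heads and t tails gives a Beta(a + h, b + t) posterior (the counts below are made up):

```python
from math import gamma

def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density; B(a, b) written via gamma functions."""
    B = gamma(a) * gamma(b) / gamma(a + b)
    return x**(a - 1) * (1 - x)**(b - 1) / B

a, b = 1.0, 1.0          # uniform prior on p
h, t = 21, 29            # hypothetical: 21 heads in 50 flips
print(beta_pdf(0.42, a + h, b + t))  # posterior density at p = 0.42
```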
Joint Distribution
Given two continuous RVs X and Y, the joint pdf can be written as $f_{X,Y}(x, y)$
$\int_x \int_y f_{X,Y}(x, y)\, dx\, dy = 1$
Multivariate Normal
Generalization to higher dimensions of the
one-dimensional normal
$f_{\vec{X}}(x_1, \ldots, x_d) = \dfrac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left( -\dfrac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu}) \right)$
$\vec{\mu}$: mean vector; $\Sigma$: covariance matrix
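A direct transcription of the density formula in Python with NumPy (the covariance matrix below is a made-up example):

```python
import numpy as np

def mvn_pdf(x: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """Multivariate normal density at x, from the formula above."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm)

mu = np.zeros(2)
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])       # hypothetical covariance
print(mvn_pdf(np.array([0.0, 0.0]), mu, sigma))  # peak density at the mean
```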
Moments
Mean (Expectation): $\mu = E(X)$
Discrete RVs: $E(X) = \sum_{v_i} v_i\, P(X = v_i)$
Continuous RVs: $E(X) = \int_{-\infty}^{+\infty} x f(x)\, dx$
Variance: $V(X) = E\left[ (X - \mu)^2 \right]$
Discrete RVs: $V(X) = \sum_{v_i} (v_i - \mu)^2\, P(X = v_i)$
Continuous RVs: $V(X) = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, dx$
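A minimal sketch of the discrete formulas, using a fair die as the example distribution:

```python
# Mean and variance of a discrete RV from its pmf (a fair die).
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6

mean = sum(v * p for v, p in zip(values, pmf))
var = sum((v - mean) ** 2 * p for v, p in zip(values, pmf))
print(mean, var)  # 3.5 and ~2.917 (= 35/12)
```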
Properties of Moments
Mean
$E(X + Y) = E(X) + E(Y)$
$E(aX) = a\, E(X)$
If X and Y are independent, $E(XY) = E(X) \cdot E(Y)$
Variance
$V(aX + b) = a^2\, V(X)$
If X and Y are independent, $V(X + Y) = V(X) + V(Y)$
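A simulation sketch of the additivity properties, using two independent dice:

```python
import random

# Check E(X + Y) = E(X) + E(Y) and, for independent X and Y,
# V(X + Y) = V(X) + V(Y).
n = 200_000
xs = [random.randint(1, 6) for _ in range(n)]
ys = [random.randint(1, 6) for _ in range(n)]

def mean(a): return sum(a) / len(a)
def var(a):
    m = mean(a)
    return sum((v - m) ** 2 for v in a) / len(a)

print(mean([x + y for x, y in zip(xs, ys)]), mean(xs) + mean(ys))  # both ~ 7
print(var([x + y for x, y in zip(xs, ys)]), var(xs) + var(ys))     # both ~ 5.83
```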
Moments of Common Distributions
Uniform $X \sim U[1, \ldots, N]$
Mean $(1 + N)/2$; variance $(N^2 - 1)/12$
Binomial $X \sim \mathrm{Bin}(n, p)$
Mean $np$; variance $np(1 - p)$
Normal $X \sim N(\mu, \sigma^2)$
Mean $\mu$; variance $\sigma^2$
Beta $X \sim \mathrm{Beta}(\alpha, \beta)$
Mean $\alpha/(\alpha + \beta)$; variance $\dfrac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$
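A quick check of the binomial entries against the pmf itself (the same kind of check works for the other rows):

```python
from math import comb

# Compare the binomial mean/variance formulas with direct pmf sums.
n, p = 50, 0.5
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

mean = sum(i * q for i, q in enumerate(pmf))
var = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))
print(mean, n * p)           # both 25.0
print(var, n * p * (1 - p))  # both 12.5
```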
Probability of Events
X denotes an event that could possibly happen
E.g. X=“you will fail in this course”
P(X) denotes the probability that X happens, i.e. that X = true
What’s the probability that you will fail in this
course?
Ω denotes the entire event set
$\Omega = \{X, \bar{X}\}$
The Axioms of Probabilities
$0 \leq P(X) \leq 1$
$P(\Omega) = 1$
$P(X_1 \cup X_2 \cup \cdots) = \sum_i P(X_i)$, where the $X_i$ are disjoint events
Useful rules
$P(X_1 \cup X_2) = P(X_1) + P(X_2) - P(X_1 \cap X_2)$
$P(\bar{X}) = 1 - P(X)$
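A simulation sketch of the inclusion-exclusion rule, with two hypothetical overlapping events on a die roll:

```python
import random

# Check P(X1 ∪ X2) = P(X1) + P(X2) - P(X1 ∩ X2),
# with X1 = "die is even" and X2 = "die is at least 4".
n = 200_000
x1 = x2 = both = either = 0
for _ in range(n):
    d = random.randint(1, 6)
    e1, e2 = d % 2 == 0, d >= 4
    x1 += e1; x2 += e2; both += e1 and e2; either += e1 or e2

print(either / n, x1 / n + x2 / n - both / n)  # the two agree (~2/3)
```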
Interpreting the Axioms
[Venn diagram: events $X_1$ and $X_2$ inside the sample space $\Omega$]