Sciences
MTH2222
Mathematics of Uncertainty
Semester 2, 2015
MTH2222 – Mathematics of Uncertainty
Prepared by:
Kais Hamza
© Copyright 2010
NOT FOR RESALE. All materials produced for this unit are protected by copyright.
Monash students are permitted to use these materials for personal study and
research only, as permitted under the Copyright Act. Use of these materials for any
other purposes, including copying or resale may infringe copyright unless written
permission has been obtained from the copyright owners. Enquiries should be made
to the publisher.
MTH2222 – Mathematics of Uncertainty
Lecture Notes
Contents
5 Limit Theorems
1 Markov and Chebyshev's Inequalities
2 The Weak Law of Large Numbers
3 The Central Limit Theorem
4 The Strong Law of Large Numbers
Chapter 1
1 Sets
• S ∪ T = {x; x ∈ S or x ∈ T } and S ∩ T = {x; x ∈ S and x ∈ T }
– Let S be the set of (strictly) positive even integers and T be the set of integers
less than or equal to 9. Then S ∪ T = {. . . , −2, −1, . . . , 8, 9, 10, 12, 14, 16, . . .} and
S ∩ T = {2, 4, 6, 8}.
– Let S be the set of polynomials of degree less than or equal to 2 and T be the set of
differentiable functions f with f (0) = f ′ (0) = 0. Describe S ∩ T .
– Within the set of positive integers, what is the complement of the set of even integers?
• S \ T = {x; x ∈ S and x ̸∈ T } = S ∩ T c
– Shade S \ T and T \ S.
• S∆T = (S \ T ) ∪ (T \ S)
– What is (S∆T ) ∪ (S ∩ T )?
• S ∪ (T ∩ U ) = (S ∪ T ) ∩ (S ∪ U ) and S ∩ (T ∪ U ) = (S ∩ T ) ∪ (S ∩ U )
• ∪_{n=1}^∞ Sn = S1 ∪ S2 ∪ . . . = {x; x ∈ Sn for some n}
– What is ∪_{n=1}^∞ [0, 1 − 1/n]?
– What is ∪_{n=1}^∞ [0, 1 − 1/n)?
• ∩_{n=1}^∞ Sn = S1 ∩ S2 ∩ . . . = {x; x ∈ Sn for all n}
– What is ∩_{n=1}^∞ [0, 1 + 1/n]?
– What is ∩_{n=1}^∞ [0, 1 + 1/n)?
• De Morgan's Laws: (∪_{n=1}^∞ Sn)^c = ∩_{n=1}^∞ Sn^c and (∩_{n=1}^∞ Sn)^c = ∪_{n=1}^∞ Sn^c
2 Probabilistic Models
• Probability Axioms:
– P(A) ≥ 0
– P(Ω) = 1
– For disjoint events (Am ∩ An = ∅ for m ≠ n), P(∪_{n=1}^∞ An) = Σ_{n=1}^∞ P(An)
– P(Ac ) = 1 − P(A)
– P(∅) = 0
3 Conditional Probability
• P(A|B) = P(A ∩ B)/P(B), for P(B) ≠ 0
– If B ⊂ A then P(A|B) = 1.
∗ What does this mean?
– If A ⊂ B c then P(A|B) = 0.
∗ What does this mean?
– If A ⊂ B then P(A|B) = P(A)/P(B).
– Assuming equally likely outcomes (shown as dots in a diagram, with events A and B marked), compute
∗ P(A|B)?
∗ P(B|A)?
∗ P(Ac|B)?
– P(A|B) ≥ 0
– P(Ω|B) = P(B|B) = 1
– For disjoint events, P(∪_{n=1}^∞ An | B) = Σ_{n=1}^∞ P(An|B)
∗ Check this.
• Multiplication Rule: If P(∩_{k=1}^{n−1} Ak) ≠ 0, then
P(∩_{k=1}^n Ak) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) . . . P(An | ∩_{k=1}^{n−1} Ak)
5 Independence
• Assume equally likely outcomes (shown as dots in two diagrams, with events A and B marked).
– Toss two fair coins. Let A = {HH, HT }, B = {HH, T H} and C = {HT, T H}. Then A and B are independent, A and C are independent and B and C are independent. However, A, B and C are not (mutually) independent.
• A1, A2, . . . , An are independent if
P(∩_{i∈I} Ai) = Π_{i∈I} P(Ai), for every subset I of {1, 2, . . . , n}
• If P(B ∩C) ̸= 0, A and B are conditionally independent given C iff P(A|B ∩C) = P(A|C).
– A supplier sends boxes of items to a factory: 90% of the boxes contain 1% defective items, 9% contain 10% defective, and 1% contain 100% defective (e.g. wrong size). What percentage of screws supplied are defective?
Two screws are chosen from a randomly selected box. What is the probability that
both are defective? Given that both are defective, what is the probability that the
box is 100% defective?
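As a numerical cross-check of this example (a sketch; it assumes the defectiveness of the two chosen screws is conditionally independent given the box type):

```python
# Total probability and Bayes' rule for the defective-screws example.
# Box types: (probability of box, proportion defective in that box).
boxes = [(0.90, 0.01), (0.09, 0.10), (0.01, 1.00)]

# P(defective) by the law of total probability.
p_defective = sum(pb * pd for pb, pd in boxes)          # 0.028, i.e. 2.8%

# P(both of two screws defective), assuming conditional independence
# given the box type.
p_both = sum(pb * pd**2 for pb, pd in boxes)

# Bayes' rule: P(box is 100% defective | both screws defective).
p_bad_box = (0.01 * 1.00**2) / p_both
```

The posterior p_bad_box comes out close to 0.91: two defectives make the "all defective" box by far the most plausible explanation.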
Chapter 2
1 Basic Concepts
• A random variable is a real-valued function of the outcome of the experiment.
– The number of heads out of two tosses of a coin defines a function (mapping) from
the sample space Ω = {HH, HT, T H, T T } into R:
– The number of heads until the first tail in a sequence of tosses of a coin defines a
function from the sample space Ω = {T, HT, HHT, HHHT, . . .} into R:
– Let X be the number of heads in the two tosses of a fair coin. X may take the values
0, 1 or 2:
P[X = 0] = 0.25, P[X = 1] = 0.50 and P[X = 2] = 0.25.
– Let X be the number of heads until the first tail in a sequence of tosses of a coin:
Starting with $y, you double your wealth after each head. Let Y be the amount of
money you hold after the first tail: Y = 2X y,
P[Y = y] = 0.5, P[Y = 2y] = 0.25, P[Y = 4y] = 0.125, P[Y = 8y] = 0.0625 . . .
• The Bernoulli Random Variable: A Bernoulli trial has only two possible outcomes
usually referred to as success and failure.
The Bernoulli random variable takes two values, 1 with probability say p, and 0 with probability 1 − p:
p(x) = 1 − p if x = 0, and p(x) = p if x = 1
– A random variable with the above probability mass function is said to have a
Bernoulli distribution with parameter p.
• The Binomial Random Variable: The Binomial random variable is used to model
the number of successes in a sequence of say n Bernoulli trials with probability say p of
success (and probability q = 1 − p of failure):
p(x) = C(n, x) p^x q^{n−x} = (n!/(x!(n − x)!)) p^x q^{n−x}, x = 0, 1, . . . , n
– A random variable with the above probability mass function is said to have a bino-
mial distribution with parameters n and p.
– Counting: The number of ways one can select k objects out of n objects is C(n, k) = n!/(k!(n − k)!). It is the number of subsets of size k taken from a set of size n.
– The Binomial Formula: (a + b)^n = Σ_{k=0}^n C(n, k) a^k b^{n−k} = Σ_{k=0}^n (n!/(k!(n − k)!)) a^k b^{n−k}.
• The Discrete Uniform Random Variable: A random variable X with the following
probability mass function
p(x) = 1/(n − m + 1), x = m, . . . , n
is said to have a Discrete Uniform distribution (or that it is a Discrete Uniform random
variable) over the interval [m, n].
• The Geometric Random Variable: The Geometric random variable is used to model
the number of trials in a sequence of Bernoulli trials with probability say p of success
(and probability q = 1 − p of failure), until the first success:
p(x) = pq x−1 , x = 1, 2, . . .
– A random variable with the above probability mass function is said to have a geo-
metric distribution with parameter p.
– The Geometric Series: Σ_{n=k}^∞ a^n = a^k/(1 − a), for |a| < 1.
• The Poisson Random Variable: The Poisson random variable with parameter λ has probability mass function:
p(x) = e^{−λ} λ^x/x!, x = 0, 1, 2, . . .
– A random variable with the above probability mass function is said to have a Poisson
distribution with parameter λ.
– Let X be a binomial random variable with parameters n and p. What is the distri-
bution of n − X?
– Let X be a Discrete Uniform random variable over the interval [−n, n], n ≥ 1. What
is the distribution of |X|?
– E[aX + b] = aE[X] + b
– The Geometric Random Variable with probability of success p: E[X] = 1/p.
– Σ_{k=0}^∞ z^k/k! = e^z
– var(aX + b) = a2 var(X)
– The Discrete Uniform Random Variable over [m, n]: var(X) = (n − m)(n − m + 2)/12
– The Geometric Random Variable with probability of success p: var(X) = (1 − p)/p^2.
– Roll two fair dice. Let S be the sum of the outcomes of the two dice and M be their
maximum. Compute P[S = 9] and P[M = 5]. What is P[S = 9, M = 5]?
– Roll three fair dice. Let X be the sum of the outcomes of dice 1 and 2, and Y be
the sum of the outcomes of dice 2 and 3. Compute P[X = 9] and P[Y = 9]. What
is P[X = 9, Y = 9]?
– Roll two fair dice. Let X be the minimum of the outcomes of the two dice and Y be
their maximum. Find the joint PMF of X and Y .
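Questions like these can be checked by brute-force enumeration of the 36 equally likely outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # the 36 equally likely rolls

def prob(event):
    # Probability of an event under equally likely outcomes.
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

p_s9 = prob(lambda w: sum(w) == 9)                      # P[S = 9]
p_m5 = prob(lambda w: max(w) == 5)                      # P[M = 5]
p_s9_m5 = prob(lambda w: sum(w) == 9 and max(w) == 5)   # P[S = 9, M = 5]

# Joint PMF of X = min and Y = max of the two dice.
joint = {(x, y): prob(lambda w: (min(w), max(w)) == (x, y))
         for x in range(1, 7) for y in range(1, 7)}
```

For instance P[S = 9] = 4/36, P[M = 5] = 9/36, and P[S = 9, M = 5] = 2/36 (the outcomes (4, 5) and (5, 4)).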
• The marginal PMF of X and Y can be obtained from the joint PMF
pX(x) = Σ_y pX,Y(x, y) and pY(y) = Σ_x pX,Y(x, y)
– Consider the grid {0, 1, . . . , n} × {0, 1, . . . , n}. Assume that points on the grid are equally likely to be selected and denote by (X, Y) the coordinates of a randomly selected point. What is the joint PMF of X and Y? What is the marginal distribution of X?
6 Conditioning
• The conditional PMF of X given an event A with P(A) > 0 is pX|A(x) = P[X = x|A] = P({X = x} ∩ A)/P(A).
– Roll two fair dice. Let S be the sum of the outcomes of the two dice and M be their
maximum:
P[S = x|M = 3] = 0.4 for x = 4, 0.4 for x = 5, and 0.2 for x = 6
• Conditional PMFs are true PMFs; i.e. pX|A(x) ≥ 0 and Σ_x pX|A(x) = 1.
pX|Y(x|y) = P[X = x|Y = y] = pX,Y(x, y)/pY(y)
and we have
E[X] = Σ_y pY(y) E[X|Y = y]
– Roll two fair dice. Let S be the sum of the outcomes of the two dice and M be their
maximum. What is E[S|M = 3]?
Find E[S|M = y], y = 1, . . . , 6 and check that E[S] = Σ_{y=1}^6 pM(y) E[S|M = y].
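The suggested check can be carried out by enumeration; a sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls

def E_S_given_M(y):
    # E[S | M = y]: average of S over the outcomes whose maximum equals y.
    sums = [a + b for a, b in outcomes if max(a, b) == y]
    return Fraction(sum(sums), len(sums))

def p_M(y):
    # P[M = y] = (2y - 1)/36 for two fair dice.
    return Fraction(sum(1 for a, b in outcomes if max(a, b) == y), 36)

# Law of total expectation: E[S] = sum over y of P[M = y] E[S | M = y].
E_S = sum(p_M(y) * E_S_given_M(y) for y in range(1, 7))
```

E[S|M = 3] = 24/5 = 4.8, matching the conditional PMF above, and the total comes out to E[S] = 7.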
7 Independence
• A random variable X is independent of the event A if for all x, {X = x} and A are
independent, i.e.
pX|A (x) = pX (x), for all x
– Roll two fair dice. Let S be the sum of the outcomes of the two dice and M be their
maximum. Is S independent of {M = 3}?
• X and Y are independent if for all pairs (x, y), {X = x} and {Y = y} are independent,
i.e.
pX,Y (x, y) = pX (x)pY (y), for all x, y
– Roll two fair dice. Let X be the minimum of the outcomes of the two dice and Y be
their maximum. Are X and Y independent?
• If X and Y are independent then, for any functions g and h, g(X) and h(Y ) are inde-
pendent.
• The owner of a small drugstore is to order copies of a news magazine for the n potential
readers among his customers. Customers act independently and each one of them will
actually express interest in buying the news magazine with probability p. Suppose that
the store owner actually pays $B for each copy of the news magazine, and the price to
customers is $C. If magazines left at the end of the week have no salvage value, what is
the optimum number of copies to order?
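A sketch of how the optimum could be found numerically; the numbers n, p, B and C below are assumptions for illustration, not values from the notes:

```python
from math import comb

def expected_profit(k, n, p, B, C):
    # Order k copies; demand D ~ binomial(n, p); each sold copy brings C,
    # each ordered copy costs B; unsold copies are worthless.
    exp_sales = sum(min(d, k) * comb(n, d) * p**d * (1 - p)**(n - d)
                    for d in range(n + 1))
    return C * exp_sales - B * k

# Illustrative (assumed) numbers: n = 10 potential readers, p = 0.4,
# cost B = $1 per copy, price C = $3 per copy.
n, p, B, C = 10, 0.4, 1.0, 3.0
best_k = max(range(n + 1), key=lambda k: expected_profit(k, n, p, B, C))
```

For these numbers the maximiser is k = 5, the smallest k with P[D ≤ k] ≥ (C − B)/C (the classical "critical fractile" rule for this kind of problem).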
Chapter 3
– For any x, P[X = x] = 0, and for any a and b, P[a ≤ X ≤ b] = P[a < X ≤ b] =
P[a ≤ X < b] = P[a < X < b].
– ∫_{−∞}^∞ fX(x) dx = 1.
– If δ is very small P[x < X < x + δ] ≈ fX (x)δ.
– Consider a continuous random variable whose PDF is given by
fX(x) = cx^2 for 0 ≤ x ≤ 1, and 0 otherwise
∗ Find c.
∗ Compute P[X < 1/2], P[X ≤ 2] and more generally P[X ≤ x].
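These starred questions have closed-form answers (c = 3, since ∫_0^1 x^2 dx = 1/3, and P[X ≤ x] = x^3 on [0, 1], so P[X < 1/2] = 1/8); a numerical sketch confirming them:

```python
def integrate(f, a, b, n=100_000):
    # Midpoint rule; accurate enough for these smooth integrands.
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Normalisation: c * integral of x^2 over [0, 1] must equal 1, so c = 3.
c = 1.0 / integrate(lambda x: x**2, 0.0, 1.0)

# P[X < 1/2] = integral of 3x^2 over [0, 1/2] = (1/2)^3 = 1/8.
p_half = integrate(lambda x: c * x**2, 0.0, 0.5)
```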
– E[X] = ∫_{−∞}^∞ x fX(x) dx
– E[g(X)] = ∫_{−∞}^∞ g(x) fX(x) dx
– var(X) = E[X^2] − E[X]^2 = ∫_{−∞}^∞ x^2 fX(x) dx − (∫_{−∞}^∞ x fX(x) dx)^2
– The PDF is fX(x) = α/(π(x^2 + α^2)).
– The mean and variance do not exist.
FX (x) = P[X ≤ x]
– Draw the CDF of a binomial random variable with parameters n = 3 and p = .5.
– Draw the CDF of an arbitrary discrete random variable (taking an arbitrary but finite number of values).
– Draw the CDF of a uniform random variable over the interval [0, 1].
– FX is continuous,
– FX(x) = ∫_{−∞}^x fX(y) dy,
– fX(x) = F′X(x).
• The Gamma Random Variable with parameters α and λ has PDF
fX(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx}, x ≥ 0
– The mean is E[X] = α/λ.
– The variance is var(X) = α/λ^2.
• If X is a normal random variable with mean µ and variance σ 2 , then E[X] = µ and
var(X) = σ 2 .
• A normal random variable with mean 0 and variance 1 is called a standard normal random
variable.
5 Conditioning on an Event
• The conditional PDF fX|A of a continuous random variable X given an event A with
P(A) > 0 is defined as satisfying
P[a < X < b|A] = ∫_a^b fX|A(x) dx.
More generally, E[g(X)|A] = ∫_{−∞}^∞ g(x) fX|A(x) dx, and the conditional variance of X given A is var(X|A) = E[X^2|A] − E[X|A]^2.
– In particular, for A = {X ∈ B}, E[X|X ∈ B] = ∫_B x fX|X∈B(x) dx and E[g(X)|X ∈ B] = ∫_B g(x) fX|X∈B(x) dx.
∗ Let X be an exponential random variable with parameter λ. Compute the
conditional expectation of X given the event {X > a} for a > 0.
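By memorylessness the answer is E[X|X > a] = a + 1/λ; a Monte Carlo sketch of this computation:

```python
import random

random.seed(0)
lam, a = 1.0, 1.0

# Sample X ~ exponential(lam) and condition on {X > a} by keeping
# only the samples that land in the event.
samples = [random.expovariate(lam) for _ in range(200_000)]
tail = [x for x in samples if x > a]
cond_mean = sum(tail) / len(tail)

# Memorylessness gives E[X | X > a] = a + 1/lam exactly.
```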
– fX(x) = Σ_{i=1}^n fX|Ai(x) P(Ai)
– E[X] = Σ_{i=1}^n E[X|Ai] P(Ai)
– E[g(X)] = Σ_{i=1}^n E[g(X)|Ai] P(Ai)
∗ Example 3.11 The metro train arrives at the station near your home every
quarter hour starting at 6:00am. You walk into the station every morning be-
tween 7:10am and 7:30am, with the time in this interval being a uniform random
variable. What is the PDF of the time you have to wait for the first train to
arrive?
P[X ∈ A, Y ∈ B] = ∫_A ∫_B fX,Y(x, y) dx dy, for all A, B.
– We select a point “at random” from the unit square [0, 1] × [0, 1] and denote by (X, Y) its coordinates. Then
f(X,Y)(x, y) = 1 for 0 ≤ x, y ≤ 1, and 0 elsewhere
– f(x, y) = (1/(2π√(1 − r^2))) exp(−(x^2 − 2rxy + y^2)/(2(1 − r^2))) is a joint PDF.
• If X and Y are jointly continuous with joint density fX,Y(x, y), then the marginal PDFs are fX(x) = ∫_{−∞}^∞ fX,Y(x, y) dy and fY(y) = ∫_{−∞}^∞ fX,Y(x, y) dx.
∗ f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d
∗ f(x, y) = (1/(2π√(1 − r^2))) exp(−(x^2 − 2rxy + y^2)/(2(1 − r^2)))
• Let X and Y be two jointly continuous random variables with joint PDF fX,Y (x, y).
E[g(X, Y)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x, y) fX,Y(x, y) dx dy.
In particular
E[XY] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} xy fX,Y(x, y) dx dy.
– f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d
– f(x, y) = (1/(2π√(1 − r^2))) exp(−(x^2 − 2rxy + y^2)/(2(1 − r^2)))
• Let X and Y be two jointly continuous random variables with joint PDF fX,Y (x, y). The
conditional PDF fX|Y (x|y) of X given Y is defined, whenever fY (y) > 0, as
fX|Y(x|y) = fX,Y(x, y)/fY(y).
– f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d
– f(x, y) = (1/(2π√(1 − r^2))) exp(−(x^2 − 2rxy + y^2)/(2(1 − r^2)))
– fX,Y(x, y) = fX|Y(x|y) fY(y), fX(x) = ∫_{−∞}^∞ fX|Y(x|y) fY(y) dy and P[X ∈ A|Y = y] = ∫_A fX|Y(x|y) dx.
Furthermore, Bayes’ rule for continuous random variables holds: fY|X(y|x) = fX|Y(x|y) fY(y)/fX(x).
• Let X and Y be two jointly continuous random variables with joint PDF fX,Y (x, y). The
conditional expectations of X, g(X) and h(X, Y ) given Y = y are
– E[X|Y = y] = ∫_{−∞}^∞ x fX|Y(x|y) dx,
∗ f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d
∗ f(x, y) = (1/(2π√(1 − r^2))) exp(−(x^2 − 2rxy + y^2)/(2(1 − r^2)))
– E[g(X)|Y = y] = ∫_{−∞}^∞ g(x) fX|Y(x|y) dx,
– E[h(X, Y)|Y = y] = ∫_{−∞}^∞ h(x, y) fX|Y(x|y) dx.
• Let X and Y be two jointly continuous random variables with joint PDF fX,Y(x, y). Then
– E[X] = ∫_{−∞}^∞ E[X|Y = y] fY(y) dy,
– E[g(X)] = ∫_{−∞}^∞ E[g(X)|Y = y] fY(y) dy.
7 Derived Distributions
• Let X be a continuous random variable. To find the distribution of Y = g(X), one must
obtain the CDF of Y :
FY(y) = P[g(X) ≤ y] = ∫_{{x: g(x) ≤ y}} fX(x) dx
8 Simulations
The Inverse Transform Method
• Let X be a continuous random variable. Assume that FX is strictly increasing on the range of X. Then FX(X) is a uniform random variable over [0, 1]. Conversely, if U is a uniform random variable over [0, 1], then FX^{−1}(U) is distributed like X.
– For an exponential random variable with parameter λ, FX(x) = 1 − e^{−λx}, so
X = −(1/λ) ln(1 − U).
Since 1 − U is also uniform over [0, 1],
X = −(1/λ) ln U
also has an exponential distribution with parameter λ.
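A sketch of the inverse transform method in the exponential case:

```python
import math
import random

random.seed(1)
lam = 2.0

# F(x) = 1 - exp(-lam x), so F^{-1}(u) = -ln(1 - u)/lam.
xs = [-math.log(1.0 - random.random()) / lam for _ in range(200_000)]

sample_mean = sum(xs) / len(xs)   # exponential(lam) has mean 1/lam
```

The sample mean should be close to 1/λ = 0.5 here.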
• Suppose X has a density f and that there exists a density g and a constant C such that f(x) ≤ Cg(x) for all x. Then, given a sequence of independent random variables having density g, the following algorithm produces a (single) simulation of X.
1. Simulate a number from the random variable with density g. Call the outcome y.
2. Simulate a number from a uniform over [0, Cg(y)]. Call the outcome u.
3. If u ≤ f (y), then take X = y. Otherwise return to step 1.
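A sketch of the algorithm with an assumed target f(x) = 2x on [0, 1] and the uniform density as proposal g (so C = 2 works); these choices are illustrations, not part of the notes:

```python
import random

random.seed(2)

def f(x):
    # Assumed target density: f(x) = 2x on [0, 1].
    return 2.0 * x

C = 2.0  # f(x) <= C g(x), where g = 1 is the uniform density on [0, 1]

def sample_once():
    while True:
        y = random.random()                # step 1: y from density g
        u = random.uniform(0.0, C * 1.0)   # step 2: u uniform over [0, C g(y)]
        if u <= f(y):                      # step 3: accept, otherwise retry
            return y

xs = [sample_once() for _ in range(100_000)]
sample_mean = sum(xs) / len(xs)            # E[X] = integral of x * 2x = 2/3
```

Each attempt is accepted with probability 1/C, so on average C proposals are needed per accepted sample.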
1 Transforms
• The moment generating function (MGF) of a random variable X is
MX (t) = E[etX ].
In the discrete case,
MX(t) = Σ_x e^{tx} pX(x).
In the continuous case,
MX(t) = ∫_{−∞}^∞ e^{tx} fX(x) dx.
Note that in general MX(t) may not be defined for all values of t; however, MX(0) is always defined and equals 1.
• Moment generating functions for some common random variables
– Binomial (n, p): pX(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, . . . , n
MX(t) = (1 − p + pe^t)^n
– Discrete Uniform over [m, n]: pX(x) = 1/(n − m + 1), x = m, . . . , n
MX(t) = (e^{(n+1)t} − e^{mt}) / ((n − m + 1)(e^t − 1))
– Geometric (p): pX(x) = p(1 − p)^{x−1}, x = 1, 2, . . .
MX(t) = pe^t / (1 − (1 − p)e^t), t < −ln(1 − p)
– Poisson (λ): pX(x) = e^{−λ} λ^x/x!, x = 0, 1, . . .
MX(t) = exp(λ(e^t − 1))
– Uniform over [a, b]: fX(x) = 1/(b − a), a ≤ x ≤ b
MX(t) = (e^{bt} − e^{at}) / ((b − a)t)
– Gamma (α, λ): fX(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx}, x > 0
MX(t) = (λ/(λ − t))^α, t < λ
– Normal (µ, σ^2): fX(x) = (1/(√(2π) σ)) exp(−(x − µ)^2/(2σ^2)), x ∈ R
MX(t) = exp(µt + σ^2 t^2/2)
• If MX (t) = MY (t) < +∞ for all values of t in an open interval containing 0, then X and
Y have the same CDF (distribution). In other words, if MX (t) is finite for all values of t
in an open interval containing 0, then MX (t) determines uniquely the CDF (distribution)
of X.
∗ MX(t) = 1/(1 − 2t)
∗ MY(t) = (sinh t)/t
∗ MZ(t) = ((2/3)e^t + (1/3)e^{2t})^4
• If X is a discrete random variable with P[X = xk] = pk, k = 1, . . . , n, then MX(t) = Σ_{k=1}^n pk e^{t xk}.
– MX(t) = ((2/3) + (1/3)e^t)^2
• If MX (t) is finite for all values of t in an open interval containing 0, then it admits
derivatives at 0 of all orders and
MX^{(n)}(0) = E[X^n].
Moreover, writing µn = E[X^n],
MX(t) = Σ_{n=0}^{+∞} µn t^n/n!.
– The sum of n independent Bernoulli random variables with the same parameter is
binomial.
– The sum of two independent exponential random variables with the same parameter
is gamma.
– The sum of two independent Bernoulli p random variables is binomial (2, p).
– The function γ(z) = Σ_x ϕ(x)ψ(z − x) = Σ_y ϕ(z − y)ψ(y) is called the convolution of ϕ and ψ.
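A sketch of the discrete convolution, checking the Bernoulli-to-binomial fact above:

```python
def convolve(p, q):
    # Convolution of two PMFs given as {value: probability} dictionaries.
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

p = 0.3
bern = {0: 1 - p, 1: p}
two = convolve(bern, bern)   # PMF of the sum of two independent Bernoulli(p)
# two should match the binomial (2, p) PMF: {0: 0.49, 1: 0.42, 2: 0.09}
```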
• The continuous case: if X and Y are independent continuous random variables, then
fX+Y(z) = ∫_{−∞}^{+∞} fX(x) fY(z − x) dx = ∫_{−∞}^{+∞} fX(z − y) fY(y) dy.
– The sum of two independent exponential λ random variables is gamma (2, λ).
– The function γ(z) = ∫_{−∞}^{+∞} ϕ(x)ψ(z − x) dx = ∫_{−∞}^{+∞} ϕ(z − y)ψ(y) dy is also called the convolution of ϕ and ψ.
– If (X, Y) has pdf f(x, y) = λ^2 e^{−λy}, 0 < x < y, then E[Y|X = x] = x + 1/λ and E[Y|X] = X + 1/λ.
– If (X, Y) has pdf f(x, y) = e^{−x}/(e^x − 1), 0 < ln y < x, then E[Y|X = x] = (1 + e^x)/2 and E[Y|X] = (1 + e^X)/2.
– If (X, Y) has pdf f(X,Y)(x, y) = 1/((b − a)(d − c)), a ≤ x ≤ b and c ≤ y ≤ d, then E[Y|X = x] = (c + d)/2 and E[Y|X] = (c + d)/2.
• The law of total expectation (also known as the law of iterated expectations)
E[E[Y |X]] = E[Y ].
∗ If (X, Y) has pdf f(x, y) = λ^2 e^{−λy}, 0 < x < y, then E[Y] = 2/λ.
– var(Y|X = x) = 1/λ^2 and var(Y) = 2/λ^2.
– Conditional on Jane finding the book in the first bookstore, Y is exponential λ, and therefore has conditional mean equal to 1/λ, conditional variance equal to 1/λ^2 and conditional PDF equal to λe^{−λx}, x > 0.
– Let N be the number of bookstores needed to find the book. Then
E[Y|N = n] = n/λ, var(Y|N = n) = n/λ^2,
fY|N=n(y) = (λ^n/(n − 1)!) y^{n−1} e^{−λy}, y > 0.
– N is geometric p. Therefore
E[Y] = 1/(pλ), var(Y) = 1/(p^2 λ^2), fY(y) = pλ e^{−pλy}, y > 0.
– E[Y] = µE[N].
– MY(t) = Σ_{n=0}^{+∞} MX(t)^n pN(n).
– If X and Y are independent, then they are uncorrelated. The converse is not always
true.
– var(X + Y ) = var(X) + var(Y ) + 2cov(X, Y ).
– −1 ≤ corr(X, Y) ≤ 1, and |corr(X, Y)| = 1 if and only if Y = aX + b for some constants a ≠ 0 and b.
• Out of all estimators (random variables) g(Y ) based on Y , the mean squared estimation
error E[(X − g(Y ))2 ] is minimum for g(Y ) = E[X|Y ]:
E[(X − g(Y ))2 ] ≥ E[(X − E[X|Y ])2 ], for all functions g(Y ).
• Out of all linear estimators aY + b based on Y, the mean squared estimation error E[(X − (aY + b))^2] is minimum for
a = cov(X, Y)/var(Y) and b = E[X] − (cov(X, Y)/var(Y)) E[Y].
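A sketch computing the optimal linear coefficients a and b exactly for a small, assumed joint PMF (an illustrative example, not one from the notes):

```python
# Best linear estimator aY + b of X, for (X, Y) uniform over three points.
pts = [(0, 0), (1, 1), (2, 1)]            # equally likely values of (X, Y)
n = len(pts)
EX = sum(x for x, _ in pts) / n           # E[X] = 1
EY = sum(y for _, y in pts) / n           # E[Y] = 2/3
cov = sum(x * y for x, y in pts) / n - EX * EY    # cov(X, Y) = 1/3
varY = sum(y * y for _, y in pts) / n - EY**2     # var(Y) = 2/9
a = cov / varY                            # = 1.5
b = EX - a * EY                           # = 0
```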
• The pair (U, V ) is said to have a standard bivariate normal distribution if their joint PDF
is
fU,V(u, v) = (1/(2π√(1 − ρ^2))) exp{−(u^2 + v^2 − 2ρuv)/(2(1 − ρ^2))}
– If U and Z are two independent standard normal random variables, and V = ρU + √(1 − ρ^2) Z (|ρ| < 1), then (U, V) has a standard bivariate normal distribution with correlation ρ.
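A simulation sketch of this construction, checking that the sample correlation is close to ρ:

```python
import math
import random

random.seed(3)
rho = 0.6
pairs = []
for _ in range(200_000):
    u = random.gauss(0.0, 1.0)
    z = random.gauss(0.0, 1.0)                   # independent of u
    v = rho * u + math.sqrt(1.0 - rho**2) * z    # V = rho U + sqrt(1 - rho^2) Z
    pairs.append((u, v))

n = len(pairs)
mu = sum(u for u, _ in pairs) / n
mv = sum(v for _, v in pairs) / n
cov = sum((u - mu) * (v - mv) for u, v in pairs) / n
su = math.sqrt(sum((u - mu) ** 2 for u, _ in pairs) / n)
sv = math.sqrt(sum((v - mv) ** 2 for _, v in pairs) / n)
corr = cov / (su * sv)    # should be close to rho
```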
• The pair (X, Y ) is said to have a bivariate normal distribution if their joint PDF is
fX,Y(x, y) = (1/(2πστ√(1 − ρ^2))) exp{−(1/(2(1 − ρ^2)))[(x − µ)^2/σ^2 + (y − ν)^2/τ^2 − 2ρ(x − µ)(y − ν)/(στ)]}
– Conditionally on Y = y, X is normal with mean µ + ρ(σ/τ)(y − ν) and variance σ^2(1 − ρ^2).
• If we write Z for the column vector (X, Y)′, m for (µ, ν)′ and Σ for the matrix with rows (σ^2, ρστ) and (ρστ, τ^2), then the determinant of Σ is |Σ| = σ^2 τ^2 (1 − ρ^2), its inverse is
Σ^{−1} = (1/(σ^2 τ^2 (1 − ρ^2))) × the matrix with rows (τ^2, −ρστ) and (−ρστ, σ^2)
and, with z = (x, y)′,
fX,Y(x, y) = fZ(z) = (1/(2π|Σ|^{1/2})) exp(−(1/2)(z − m)′ Σ^{−1} (z − m)).
Here z′ denotes the transpose of z.
Chapter 5
Limit Theorems
• Let X1, X2, . . . be independent, identically distributed random variables with mean µ and variance σ^2, and let Mn = (X1 + . . . + Xn)/n. Then
E[Mn] = µ and var(Mn) = σ^2/n.
• As n ↑ +∞, var(Mn) ↓ 0 and “Mn approaches µ” (in some sense). This is a “first order approximation” of Mn.
• (Mn − µ)/(σ/√n) converges (in some sense) to a standard normal random variable.
• Markov’s Inequality: if X ≥ 0, then
P[X ≥ a] ≤ E[X]/a, a > 0.
• Chebyshev’s Inequality: if X has mean µ and variance σ^2, then
P[|X − µ| ≥ c] ≤ σ^2/c^2, c > 0.
In particular
P[|X − µ| ≥ kσ] ≤ 1/k^2, k > 0.
• The Weak Law of Large Numbers: for every ε > 0, P[|Mn − µ| > ε] −→ 0, as n → ∞.
• Let Sn = X1 + . . . + Xn . If n is large,
P[Sn ≤ c] ≈ Φ((c − nµ)/(σ√n)).
• A machine processes parts one at a time. The processing times of different parts are independent random variables uniformly distributed over [1, 5]. Approximate the probability that the number of parts processed within 320 time units, denoted by N, is at least 100.
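One way to carry out this approximation (a sketch): N ≥ 100 exactly when the total processing time of the first 100 parts is at most 320, and each time has mean 3 and variance (5 − 1)^2/12 = 4/3.

```python
import math

def Phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 100
mu = n * 3.0                                  # E[S_100] = 300
sigma = math.sqrt(n * (5 - 1) ** 2 / 12)      # sd(S_100) = sqrt(400/3)

# P[N >= 100] = P[S_100 <= 320], approximated by the CLT.
p = Phi((320 - mu) / sigma)
```

The z-value is 20/11.55 ≈ 1.73, giving a probability of roughly 0.96.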
– A binomial random variable Sn with parameters n and p can be viewed as the sum
of n independent Bernoulli random variables X1 , . . . , Xn with common parameter p:
Sn = X1 + . . . + Xn .
Therefore,
P[a ≤ Sn ≤ b] ≈ Φ((b − np)/√(np(1 − p))) − Φ((a − np)/√(np(1 − p))).
P[Sn ≤ 21] = Σ_{k=0}^{21} C(36, k) (0.5)^{36} = 0.8785.
Note that
P[Sn ≤ 21.5] ≈ 0.879.
This “continuity correction” is often used as a refinement to the CLT approximation.
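A sketch comparing the exact binomial CDF with the normal approximation, with and without the continuity correction:

```python
import math
from math import comb

def Phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 36, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))        # 18 and 3

exact = sum(comb(n, k) for k in range(22)) * 0.5**n  # P[Sn <= 21] = 0.8785...
plain = Phi((21 - mu) / sigma)                       # without correction
corrected = Phi((21.5 - mu) / sigma)                 # with continuity correction
```

The corrected value agrees with the exact probability to about three decimal places, while the uncorrected one is off by several percent.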
• Convergence with probability 1 (also called almost sure convergence) is much stronger
than convergence in probability, and the SLLN is much stronger than the WLLN.
• Consider a discrete-time arrival process. The set of times is partitioned into consecutive intervals of the form Ik = {2^k, 2^k + 1, . . . , 2^{k+1} − 1}. During each interval Ik, there is exactly one arrival, and all times are equally likely. The arrival times within different intervals are assumed to be independent. Let Yn = 1 if there is an arrival at time n, and Yn = 0 if there is no arrival. Then Yn converges to 0 in probability but does not converge with probability 1.
Problem Sets
2. Let A and B be two sets with a finite number of elements. Show that the number of
elements in A ∩ B plus the number of elements in A ∪ B is equal to the number of
elements in A plus the number of elements in B.
3. We are given that P(Ac ) = 0.6, P(B) = 0.3, and P(A ∩ B) = 0.2. Determine P(A ∪ B).
4. We roll a four-sided die once and then we roll it as many times as is necessary to obtain a
different face than the one obtained in the first roll. Let the outcome of the experiment be
(r1 , r2 ) where r1 and r2 are the results of the first and the last rolls, respectively. Assume
that all possible outcomes have equal probability. Find the probability that:
(a) r1 is even.
(b) Both r1 and r2 are even.
(c) r1 + r2 < 5.
5. Alice and Bob each choose at random a number between zero and two. We assume a
uniform probability law under which the probability of an event is proportional to its
area. Consider the following events:
A: The magnitude of the difference of the two numbers is greater than 1/3.
B: At least one of the numbers is greater than 1/3.
C: The two numbers are equal.
D: Alice’s number is greater than 1/3.
Find the probabilities P(A), P(B), P(A ∩ B), P(C), P(D), P(A ∩ D).
7. Suppose that P(E) = 0.6. What can you say about P(E | F ) when
8. We roll two fair 6-sided dice. Each one of the 36 possible outcomes is assumed to be
equally likely.
9. A new test has been developed to determine whether a given student is overstressed.
This test is 95% accurate if the student is not overstressed, but only 85% accurate if the
student is in fact overstressed. It is known that 99.5% of all students are overstressed.
Given that a particular student tests negative for stress, what is the probability that the
test results are correct, and that this student is not overstressed?
Revision problems
1. We are given that P(A) = 0.55, P(B c ) = 0.35, and P(A ∪ B) = 0.75. Determine P(B)
and P(A ∩ B).
2. Let A and B be two sets. Under what conditions is the set A ∩ (A ∪ B)c empty?
3. A magical four-sided die is rolled twice. Let S be the sum of the results of the two rolls.
We are told that the probability that S = k is proportional to k, for k = 2, 3, . . . , 8,
and that all possible ways that a given sum k can arise are equally likely. Construct an
appropriate probabilistic model and find the probability of getting doubles.
4. Show the formula
P((A ∩ B c ) ∪ (Ac ∩ B)) = P(A) + P(B) − 2P(A ∩ B),
which gives the probability that exactly one of the events A and B will occur. [Compare
with the formula P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which gives the probability that
at least one of the events A and B will occur.]
5. (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the
coins at random, and when he flips it, it shows heads. What is the probability that
it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. What
is the probability that it is the fair coin?
(c) Suppose that he flips the same coin a third time and it shows heads. What is now
the probability that it is the fair coin?
6. Alice and Bob have 2n + 1 coins, each with probability of a head equal to 1/2. Bob tosses
n + 1 coins, while Alice tosses the remaining n coins. Show that the probability that after
all the coins have been tossed, Bob will have gotten more heads than Alice is 1/2.
Tutorial 02
2. Give (a precise and complete definition of) the probability mass function of a binomial
random variable with k trials and probability of success u. [2]
1. A magnetic tape storing information in binary form has been corrupted, so it can only
be read with bit errors. The probability that you correctly detect a 0 is 0.9, while
the probability that you correctly detect a 1 is 0.85. Each digit is a 1 or a 0 with equal
probability. Given that you read a 1, what is the probability that this is a correct reading?
2. Bonferroni’s inequality.
Prove that for any two events A and B, we have
5. Suppose that A, B, and C are independent. Use the definition of independence to show
that A and B ∪ C are independent.
6. A parking lot consists of a single row containing n parking spaces (n ≥ 2). Mary arrives
when all spaces are free. Tom is the next person to arrive. Each person makes an equally
likely choice among all available spaces at the time of arrival. Describe the sample space.
Obtain P(A), the probability the parking spaces selected by Mary and Tom are at most
2 spaces apart.
(a) What is the probability that at least one of the events A1 , A2 , . . . , An occurs?
(b) What is the probability that none of the events A1 , A2 , . . . , An occurs?
9. Let X be the outcome of the roll of a fair die. Write the PMF of X.
10. The annual premium of a special kind of insurance starts at $1000 and is reduced by
10% after each year where no claim has been filed. The probability that a claim is filed
in a given year is 0.05, independently of preceding years. What is the PMF of the total
premium paid up to and including the year when the first claim is filed?
11. Let X be a discrete random variable that is uniformly distributed over the set of integers
in the range [a, b], where a and b are integers with a < 0 < b. Find the PMF of the
random variable max(0, X).
14. You are visiting the rainforest, but unfortunately your insect repellent has run out. As a
result, at each second, a mosquito lands on your neck with probability 0.5. If a mosquito
lands, it will bite you with probability 0.2, and it will never bother you with probability
0.8, independently of other mosquitoes. What is the expected time between successive
bites?
Revision problems
1. In general, what is P(A ∪ B ∪ C)?
2. Write the law of total probability for the event B and the partition {A ∩ B, A ∩ B c , Ac }.
3. Bonferroni’s inequality.
Generalize Problem 2 of the previous section to the case of n events A1 , A2 , ..., An , by
showing that
5. We are told that events A and B are independent. In addition, events A and C are
independent. Is it true that A is independent of B∪C? Provide a proof or counterexample
to support your answer.
6. Suppose that A, B, and C are independent. Use the definition of independence to show
that A and B ∩ C are independent.
Deduce that P(∪_{n=1}^∞ An) = lim_{n→∞} P(An). Prove the corresponding result for a decreasing sequence of events.
8. Give (a precise and complete definition of) the probability mass function of a Poisson
random variable with parameter p.
9. Let X be a random variable that takes integer values and is symmetric, that is, P(X =
k) = P(X = −k) for all integers k. What is the expected value of Y = X cos(Xπ) and
Z = sin(Xπ)?
1. Fischer and Spassky play a sudden-death chess match whereby the first player to win a
game wins the match. Each game is won by Fischer with probability p, by Spassky with
probability q, and is a draw with probability 1 − p − q.
2. Let X be a Poisson random variable with parameter λ. Compute E[X], E[X(X − 1)] and
deduce var(X).
3. Imagine a TV game show where each contestant i spins an infinitely calibrated wheel of fortune, which assigns him/her a real number between 1 and 100.
All values are equally likely and the value obtained by each contestant is independent of
the value obtained by any other contestant.
(c) Let N be the integer-valued random variable whose value is the index of the first
contestant who is assigned a smaller number than contestant 1. As an illustration,
if contestant 1 obtains a smaller value than contestants 2 and 3 but contestant 4
has a smaller value than contestant 1 (X4 < X1 ), then N = 4. Find P(N > n) as a
function of n.
(d) Find E[N ], assuming an infinite number of contestants.
5. A city’s temperature is modelled as a random variable with mean and standard deviation
both equal to 10 degrees Celsius. A day is described as “normal” if the temperature
during that day ranges within one standard deviation from the mean. What would be the
temperature range for a normal day if temperature were expressed in degrees Fahrenheit?
6. Let a and b be positive integers with a ≤ b, and let X be a random variable that takes as values, with equal probability, the powers of 2 in the interval [2^a, 2^b]. Find the expected value and the variance of X.
7. As an advertising campaign, a chocolate factory places golden tickets in some of its candy
bars, with the promise that a golden ticket is worth a trip through the chocolate factory,
and all the chocolate you can eat for life. If the probability of finding a golden ticket is
p, find the mean and the variance of the number of candy bars you need to eat to find a
ticket.
8. The MIT football team wins any one game with probability p, and loses it with probability
1 − p. Its performance in each game is independent of its performance in other games.
Let L1 be the number of losses before its first win, and let L2 be the number of losses
after its first win and before its second win. Find the joint PMF of L1 and L2 .
9. Your probability class has 250 undergraduate students and 50 graduate students. The
probability of an undergraduate (or graduate) student getting an A is 1/3 (or 1/2, re-
spectively). Let X be the number of students that get an A in your class.
10. A scalper is considering buying tickets for a particular game. The price of the tickets is
$75, and the scalper will sell them at $150. However, if she can’t sell them at $150, she
won’t sell them at all. Given that the demand for tickets is a binomial random variable
with parameters n = 10 and p = 1/2, how many tickets should she buy in order to
maximize her expected profit?
11. Suppose that X and Y are independent discrete random variables with the same geometric
PMF: pX (k) = pY (k) = p(1 − p)k−1 , k = 1, 2, . . ., where p is a scalar with 0 < p < 1.
Show that for any integer n ≥ 2, the conditional PMF P(X = k|X + Y = n) is uniform.
Revision problems
1. Let X be a discrete random variable that is uniformly distributed over the set of integers
in the range [a, b], where a and b are integers with a < 0 < b. Find the PMF of the
random variable min(0, X).
3. A particular binary data transmission and reception device is prone to some error when
receiving data. Suppose that each bit is read correctly with probability p. Find a value
of p such that when 10,000 bits are received, the expected number of errors is at most 10.
5. St. Petersburg paradox. You toss independently a fair coin and you count the number
of tosses until the first tail appears. If the number is n, you receive 2^n. What is the
expected amount that you receive? How much would you be willing to pay to play this
game?
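Each possible outcome contributes 2^n · 2^(−n) = 1 to the expectation, so the expected payout diverges. A simulation sketch makes the divergence visible: the sample mean never settles down as the number of plays grows.

```python
import random

def play(rng):
    """Toss until the first tail; pay 2**n where n is the total number of tosses."""
    n = 1
    while rng.random() < 0.5:  # heads: keep tossing
        n += 1
    return 2 ** n

rng = random.Random(1)
for trials in (10**2, 10**4, 10**6):
    mean = sum(play(rng) for _ in range(trials)) / trials
    print(trials, mean)        # the sample mean keeps drifting upward
```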
6. A fair coin is tossed successively until two consecutive heads or two consecutive tails
appear. Find the PMF, the expected value, and the variance of the number of tosses.
Tutorial 04
Find P[X < 1], P[|X| ≤ 1], the CDF of X, E[X] and var(X). [6]
1. Alvin shops for probability books for K hours, where K is a random variable that is
equally likely to be 1, 2, 3, or 4. The number of books N that he buys is random and
depends on how long he shops according to the conditional PMF pN|K (n|k) = 1/k, for n = 1, . . . , k.
2. At his workplace, the first thing Oscar does every morning is to go to the supply room
and pick up one, two, or three pens with equal probability 1/3. If he picks up three pens,
he does not return to the supply room again that day. If he picks up one or two pens, he
will make one additional trip to the supply room, where he again will pick up one, two,
or three pens with equal probability 1/3. (The number of pens taken in one trip will not
affect the number of pens taken in any other trip.) Calculate the following:
(a) The probability that Oscar gets a total of three pens on any particular day.
(b) The conditional probability that he visited the supply room twice on a given day,
given that it is a day in which he got a total of three pens.
(c) E[N ] and E[N |C], where E[N ] is the unconditional expectation of N , the total num-
ber of pens Oscar gets on any given day, and E[N |C] is the conditional expectation
of N given the event C = {N > 3}.
(d) σN |C , the conditional standard deviation of the total number of pens Oscar gets on
a particular day, where N and C are as in part (c).
(e) The probability that he gets more than three pens on each of the next 16 days.
(f) The conditional standard deviation of the total number of pens he gets in the next
16 days given that he gets more than three pens on each of those days.
3. Your computer has been acting very strangely lately, and you suspect that it might have
a virus on it. Unfortunately, all 12 of the different virus detection programs you own
are outdated. You know that if your computer does have a virus, each of the programs,
independently of the others, has a 0.8 chance of believing that your computer is infected,
and a 0.2 chance of thinking your computer is fine. On the other hand, if your computer
does not have a virus, each program has a 0.9 chance of believing that your computer is
fine, and a 0.1 chance of wrongly thinking your computer is infected. Given that your
computer has a 0.65 chance of being infected with some virus, and given that you will
believe your virus protection programs only if 9 or more of them agree, find the probability
that your detection programs will lead you to the right answer.
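Under the reading that you act only when at least 9 of the 12 programs agree on a verdict, being led to the right answer splits into two branches by total probability: the computer is infected and at least 9 programs flag it, or it is clean and at least 9 say it is fine. A sketch of that computation:

```python
from math import comb

def binom_tail(n, p, k):
    """P(Binomial(n, p) >= k)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p_virus = 0.65
p_right = (p_virus * binom_tail(12, 0.8, 9)            # infected, and >= 9 flag it
           + (1 - p_virus) * binom_tail(12, 0.9, 9))   # clean, and >= 9 say fine
print(round(p_right, 4))
```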
4. The runner-up in a road race is given a reward that depends on the difference between
his time and the winner’s time. He is given 10 dollars for being one minute behind, 6
dollars for being one to three minutes behind, 2 dollars for being 3 to 6 minutes behind,
and nothing otherwise. Given that the difference between his time and the winner’s time
is uniformly distributed between 0 and 12 minutes, find the mean and variance of the
reward of the runner-up.
5. Let X be a random variable with PDF fX (x) = 2x/3, 1 < x ≤ 2, and let Y = X^2.
Calculate E[Y ] and var(Y ).
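Here E[Y] = E[X^2] and E[Y^2] = E[X^4] can be read off the density of X directly; a hand calculation gives E[Y] = 5/2 and var(Y) = 3/4. A midpoint-rule check of those integrals:

```python
# Midpoint Riemann sums for E[Y] = E[X^2] and E[Y^2] = E[X^4],
# with f_X(x) = 2x/3 on (1, 2].
N = 100_000
h = 1 / N
ey = ey2 = 0.0
for i in range(N):
    x = 1 + (i + 0.5) * h
    fx = 2 * x / 3
    ey += x**2 * fx * h
    ey2 += x**4 * fx * h
print(ey, ey2 - ey**2)   # hand calculation gives 5/2 and 3/4
```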
6. Find the PDF, the mean, and the variance of the random variable X with CDF FX (x) = 1 − a^3/x^3, x ≥ a, where a is a positive constant.
7. The median of a random variable X is a number µ that satisfies FX (µ) = 1/2. Find the
median of the exponential random variable with parameter λ.
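Setting 1 − e^(−λµ) = 1/2 gives µ = (ln 2)/λ. A one-line check (the rate λ = 2 is an arbitrary illustrative choice):

```python
from math import exp, log

lam = 2.0                    # arbitrary rate, for illustration
mu = log(2) / lam            # claimed median (ln 2) / lambda
print(1 - exp(-lam * mu))    # exponential CDF at mu, ~0.5
```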
Revision problems
1. A class of n students takes a test in which each student gets an A with probability p,
a B with probability q, and a grade below B with probability 1 − p − q, independently
of any other student. If X and Y are the numbers of students that get an A and a B,
respectively, calculate the joint PMF pX,Y .
2. Let X, Y , and Z be independent geometric random variables with the same PMF: pX (k) =
pY (k) = pZ (k) = p(1 − p)^(k−1), k = 1, 2, . . ., where p is a scalar with 0 < p < 1. Find
P(X = k|X + Y + Z = n). Hint: Try thinking in terms of coin tosses.
3. Joe Lucky plays the lottery on any given week with probability p, independently of
whether he played on any other week. Each time he plays, he has a probability q of
winning, again independently of everything else. During a fixed time period of n weeks,
let X be the number of weeks that he played the lottery and Y be the number of weeks
that he won.
(a) What is the probability that he played the lottery on any particular week, given that
he did not win on that week?
(b) Find the conditional PMF pY |X (y|x).
(c) Find the joint PMF pX,Y (x, y).
(d) Find the marginal PMF pY (y). Hint: One possibility is to start with the answer
to part (c), but the algebra can be messy. But if you think intuitively about the
procedure that generates Y , you may be able to guess the answer.
(e) Find the conditional PMF pX|Y (x|y). Do this algebraically using the preceding
answers.
(f) Re-derive the answer to part (e) by thinking as follows: for each one of the n − Y
weeks that he did not win, the answer to part (a) should tell you something.
4. Give (a precise and complete definition of) the probability density function of an expo-
nential random variable with parameter α.
4. What is the distribution of Y = −2X − 3 if X is normal with mean 13 and variance 6? [2]
1. Compute the nth moment, n ≥ 1, of a uniform random variable over the interval [0, 1].
3. The maintenance manager at a chemical facility knows that the times between repairs,
X, for a specific chemical reactor are well modelled by this distribution:
4. Engineers often use the uniform distribution to model the arrival time of some event
given that the event did occur within some interval. For example, production knows that
a particular pump failed at some time between 1.00 and 3.00 pm. Given that we know it
failed at some time during this period, the pdf for the specific time within the period is
f (x) = 1/(b − a) for a ≤ x ≤ b, and f (x) = 0 otherwise,
where a = 1 and b = 3. This pdf essentially says that all the times within this interval
are equally likely to occur.
(b) Derive the variance and the standard deviation for this distribution.
(c) Find the probability that the pump failed after 1.30 pm.
5. A radar tends to overestimate the distance of an aircraft, and the error is a normal
random variable with a mean of 50 meters and a standard deviation of 100 meters. What
is the probability that the measured distance will be smaller than the true distance?
6. Let X be normal with mean 1 and variance 4. Let Y = 2X + 3. Calculate the PDF of Y
and find P(Y ≥ 0).
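Since Y = 2X + 3 is again normal, with mean 2 · 1 + 3 = 5 and standard deviation 2 · 2 = 4, P(Y ≥ 0) = Φ(5/4). A quick check using only the standard library error function:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu_y, sd_y = 2 * 1 + 3, 2 * 2        # Y = 2X + 3 ~ N(5, 16)
p = 1 - phi((0 - mu_y) / sd_y)       # P(Y >= 0) = Phi(5/4)
print(round(p, 4))
```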
Revision problems
1. Compute the nth moment, n ≥ 1, of a uniform random variable over the interval [a, b].
F (x) = 1 − exp[−(λx)^β], x > 0.
Find the PDF of X, and express its mean and variance in terms of the Gamma function:
Γ(x) = ∫_0^∞ z^(x−1) e^(−z) dz, x > 0.
Find c. [3]
1. Oscar uses his high-speed modem to connect to the internet. The modem transmits zeros
and ones by sending signals −1 and +1, respectively. We assume that any given bit has
probability p of being a zero. The telephone line introduces additive zero-mean Gaussian
(normal) noise with variance σ 2 (so, the receiver at the other end receives a signal which
is the sum of the transmitted signal and the channel noise). The value of the noise is
assumed to be independent of the encoded signal value.
(a) Let a be a constant between −1 and +1. The receiver at the other end decides
that the signal −1 (respectively, +1) was transmitted if the value it receives is less
(respectively, more) than a. Find a formula for the probability of making an error.
(b) Find a numerical answer for the question of part (a) assuming that p = 2/5, a = 1/2
and σ 2 = 1/4.
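Conditioning on which signal was sent, an error occurs when −1 was transmitted but −1 + noise exceeds a, or +1 was transmitted but 1 + noise falls below a. With the part (b) numbers this becomes a quick numerical check:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p, a, sigma = 2 / 5, 1 / 2, sqrt(1 / 4)
p_err = (p * (1 - phi((a + 1) / sigma))      # sent -1, received above a
         + (1 - p) * phi((a - 1) / sigma))   # sent +1, received below a
print(round(p_err, 4))
```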
4. An old modem can take anywhere from 0 to 30 seconds to establish a connection, with
all times between 0 and 30 being equally likely.
Tutorial 06
(a) What is the probability that if you use this modem you will have to wait more than
15 seconds to connect?
(b) Given that you have already waited 10 seconds, what is the probability of having to
wait at least 10 more seconds?
5. Consider a random variable X with PDF fX (x) = 2x/3, 1 < x ≤ 2, and let A be the
event {X ≥ 1.5}. Calculate E[X], P(A), and E[X|A].
6. Dino, the cook, has good days and bad days with equal frequency. On a good day, the
time (in hours) it takes Dino to cook a souffle is described by the PDF fG (g) = 2, if
1/2 < g ≤ 1, but on a bad day, the time it takes is described by the PDF fB (b) = 1, if
1/2 < b ≤ 3/2. Find the conditional probability that today was a bad day, given that it took Dino less than three quarters of an hour to cook a souffle.
7. One of two wheels of fortune, A and B, is selected by the toss of a fair coin, and the
wheel chosen is spun once to determine the value of a random variable X. If wheel A is
selected, the PDF of X is fX|A (x|A) = 1 if 0 < x ≤ 1. If wheel B is selected, the PDF of
X is fX|B (x|B) = 3 if 0 < x ≤ 1/3. If we are told that the value of X was less than 1/4,
what is the conditional probability that wheel A was the one selected?
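Bayes' rule needs only P(X < 1/4 | A) = 1/4 and P(X < 1/4 | B) = 3/4, the conditional densities integrated over (0, 1/4). A sketch:

```python
prior = 0.5          # a fair coin picks the wheel
like_A = 1 * 0.25    # P(X < 1/4 | A): density 1 on (0, 1]
like_B = 3 * 0.25    # P(X < 1/4 | B): density 3 on (0, 1/3]
post_A = prior * like_A / (prior * like_A + prior * like_B)
print(post_A)        # 0.25
```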
8. Alexei is vacationing in Monte Carlo. The amount X (in dollars) he takes to the casino
each evening is a random variable with a PDF of the form fX (x) = ax, if 0 ≤ x ≤ 40.
At the end of each night, the amount Y that he has when leaving the casino is uniformly
distributed between zero and twice the amount that he came with.
10. Let X have a uniform distribution in the unit interval [0, 1], and let Y have an exponential
distribution with parameter ν = 2. Assume that X and Y are independent. Let Z =
X +Y.
11. Let X and Y be independent random variables, with each one uniformly distributed in
the interval [0, 1]. Find the probability of each of the following events.
Revision problems
1. Let X be a normal random variable with mean µ and variance σ^2; compute E[e^(tX)] for a
scalar t.
(a) Y = aX, a ̸= 0;
(b) Z = X^2;
(c) R = 1 − e^(−λX).
3. Let X be a standard normal random variable. Find the PDF of X^2, and identify its
distribution.
5. Let X and Y be independent random variables, with each one uniformly distributed in
the interval [0, 1]. Find the probability of {XY ≤ 1/4}.
6. Let P a random variable which is uniformly distributed between 0 and 1. On any given
day, a particular machine is functional with probability P . Furthermore, given the value
of P , the status of the machine on different days is independent.
(a) Find the probability that the machine is functional on a particular day.
(b) We are told that the machine was functional on m out of the last n days. Find the
conditional PDF of P . You may use the identity
∫_0^1 p^k (1 − p)^(n−k) dp = k!(n − k)!/(n + 1)!.
(c) Find the conditional probability that the machine is functional today given that it
was functional on m out of the last n days.
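The given identity makes the posterior of P a Beta(m + 1, n − m + 1) density, and the predictive probability in part (c) comes out as (m + 1)/(n + 2), Laplace's rule of succession. A numerical sketch of that claim, with m = 7 and n = 10 as assumed illustrative data:

```python
def predictive(m, n, grid=100_000):
    """E[P | m functional days out of n], by a midpoint sum over the posterior."""
    num = den = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        w = p**m * (1 - p)**(n - m)   # the posterior is proportional to this
        num += p * w
        den += w
    return num / den

m, n = 7, 10
print(predictive(m, n), (m + 1) / (n + 2))   # both close to 2/3
```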
Tutorial 07
2. For a positive continuous random variable X, write down the PDF of Y = X^2 in terms
of the PDF of X. [2]
2. The random variables X and Y have the joint PDF fX,Y (x, y) = λ^3 (y − x) e^(−λy), 0 < x < y.
Obtain fX (x), fY (y) and fX|Y (x|y).
3. Let Y be a gamma random variable with parameters (2, λ), and let X, conditionally on
Y = y, be continuous uniform over [0, y].
4. Your driving time to work is between 30 and 45 minutes if the day is sunny, and between
40 and 60 minutes if the day is rainy, with all times being equally likely in each case.
Assume that a day is sunny with probability 2/3 and rainy with probability 1/3.
(a) Find the PDF, the mean, and the variance of your driving time.
(b) Your distance to work is 20 miles. What is the PDF, the mean, and the variance of
your average speed (driving distance over driving time)?
5. The random variables X and Y have the joint PDF fX,Y (x, y) = 2, x > 0, y > 0 and
x + y ≤ 1. Let A be the event {Y ≤ 0.5} and let B be the event {Y > X}.
(b) Calculate fX|Y (x|0.5). Calculate also the conditional expectation and the conditional
variance of X, given that Y = 0.5.
(c) Calculate fX|B (x).
(d) Calculate E[XY ].
(e) Calculate the PDF of Y /X.
6. Let X be an exponential random variable with parameter λ. Obtain the PDF of Y = √X, its mean and variance.
7. Let X be a random variable with PDF fX . Find the PDF of the random variable |X| in
the following three cases.
Revision problems
1. The random variables X and Y have the joint PDF fX,Y (x, y) = λ^(n+1)/(n − 1)! (y − x)^(n−1) e^(−λy), 0 < x < y, n integer, n ≥ 2. Obtain fX (x), fY (y) and fX|Y (x|y).
2. The lifetime of a light bulb is supposed to be exponentially distributed with mean inversely
proportional to the length of its filament. That is, if x is the filament length, then it is
assumed that the mean lifetime of the light bulb is K/x.
(a) For a filament of length x, write down the PDF of the lifetime Zx .
(b) Production of filaments is such that their lengths can be anywhere between two
limits l and L. In fact, a randomly selected light bulb can be assumed to have a filament length, herein denoted by X, that is uniformly distributed over [l, L]. Let Y be the lifetime of a randomly selected light bulb.
i. What is the conditional PDF of Y given X = x?
ii. What is the joint PDF of (X, Y )?
iii. What is E[Y ]?
iv. What is the PDF of Y ?
Tutorial 08
[2]
Find a, pY (41), pY (11), the third largest possible value of Y , and its corresponding prob-
ability.
Find
5. Suppose that MX (t) = (6 − 3t)/(2(1 − t)(3 − t)). Find the PDF of the associated random variable.
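A hand partial-fraction decomposition suggests MX (t) = (3/4) · 1/(1 − t) + (1/4) · 3/(3 − t), i.e. a mixture with weights 3/4 and 1/4 of the exponential(1) and exponential(3) MGFs, which would give the PDF fX (x) = (3/4)e^(−x) + (3/4)e^(−3x) for x > 0. The decomposition itself is easy to verify numerically:

```python
def M(t):
    """The given MGF."""
    return (6 - 3 * t) / (2 * (1 - t) * (3 - t))

def M_mix(t):
    """Mixture of exponential(1) and exponential(3) MGFs, weights 3/4 and 1/4."""
    return 0.75 * 1 / (1 - t) + 0.25 * 3 / (3 - t)

for t in (-1.0, -0.3, 0.0, 0.5, 0.9):
    assert abs(M(t) - M_mix(t)) < 1e-9   # identical on a sample of t values
print("partial fractions check out")
```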
(a) W = X1 + X2 + X3 + X4 .
(b) V = 0.25(X1 + X2 + X3 + X4 ).
(c) U = X1 + X2 + X3 + X4 + Y .
(d) MQ (t) = [MX (t)]5 .
(e) MH (t) = [MX (t)]2 [MY (t)]3 .
8. Use the formula for the MGF of a Poisson random variable X to calculate E[X] and
E[X 2 ].
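The MGF exp(λ(e^t − 1)) gives M′(0) = λ and M″(0) = λ + λ^2, so E[X] = λ and E[X^2] = λ + λ^2. A direct check against the Poisson PMF (λ = 2.5 is an arbitrary illustrative value):

```python
from math import exp

lam = 2.5                    # arbitrary rate, for illustration
pmf = exp(-lam)              # P(X = 0)
ex = ex2 = 0.0
for k in range(1, 100):      # the tail beyond k = 100 is negligible here
    pmf = pmf * lam / k      # P(X = k) built up from P(X = k - 1)
    ex += k * pmf
    ex2 += k**2 * pmf
print(ex, ex2)               # lam = 2.5 and lam + lam**2 = 8.75
```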
Revision problems
1. Let X1 , X2 , X3 , X4 be independent random variables with common mean, variance, and
MGF denoted by E[X], var(X), and MX (t), respectively. Let Y be a random variable
that is independent of X1 , X2 , X3 , X4 , and has MGF MY (t). Each part of this problem
introduces a new random variable either as a function of X1 , X2 , X3 , X4 and Y , or as an
MGF defined in terms of MX (t) and MY (t). For each part, determine the mean and
variance of the new random variable.
(a) R = 4X1 − Y .
(b) MG (t) = e6t MX (t).
(c) MD (t) = MX (7t).
2. Let X1 and X2 be independent random variables. Use the properties of MGFs to verify
that var(X1 + X2 ) = var(X1 ) + var(X2 ).
5. Let X be a geometric random variable with parameter P , where P is itself random and
uniformly distributed from 1/n to 1. Let Z = E[X|P ]. Find E[Z] and limn→∞ E[Z].
6. The random variables X and Y are described by a joint PDF which is constant within
the unit area quadrilateral with vertices (0, 0), (0, 1), (1, 2) and (1, 1). Use the law of
total variance to find the variance of X + Y .
7. (a) You roll a fair six-sided die, and then you flip a fair coin the number of times shown by
the die. Find the expected value and the variance of the number of heads obtained.
(b) Repeat part (a) for the case where you roll two dice, instead of one.
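For part (a), the law of total expectation gives E[H] = E[N]/2 = 7/4, and the law of total variance gives var(H) = E[N]/4 + var(N)/4 = 77/48. A simulation sketch of part (a):

```python
import random

rng = random.Random(0)

def heads(rng):
    n = rng.randint(1, 6)                              # roll a fair die
    return sum(rng.random() < 0.5 for _ in range(n))   # flip that many coins

samples = [heads(rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((h - mean) ** 2 for h in samples) / len(samples)
print(mean, var)   # close to 7/4 = 1.75 and 77/48 ~ 1.604
```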
8. A fair coin is flipped independently until the first head is obtained. For each tail observed
before the first head, the value of a continuous random variable with uniform PDF over
[0, 3] is generated. Let the random variable X be defined as the sum of all the values
obtained before the first head. Find the mean and variance of X.
Revision problems
1. Consider two independent and identically distributed discrete random variables X and
Y . Assume that their common PMF, denoted by p(x), is symmetric around zero, i.e.
p(x) = p(−x), for all x. Show that the PMF of X + Y is also symmetric around zero and is largest at zero. Hint: Use the Schwarz inequality: ∑_k a_k b_k ≤ (∑_k a_k^2)^(1/2) (∑_k b_k^2)^(1/2).
2. Let X, Y and Z be discrete random variables. Show the following generalisations of the
law of iterated expectations.
3. The random variables X1 , . . . , Xn have common mean µ, common variance σ 2 and, fur-
thermore, E[Xi Xj ] = c for every pair of distinct i and j. Derive a formula for the variance
of X1 + . . . + Xn in terms of µ, σ 2 , c and n.
4. Let X1 , . . . , Xn be some random variables and let cij = cov(Xi , Xj ). Show that for any
numbers a1 , . . . , an , we have ∑_{i=1}^n ∑_{j=1}^n ai aj cij ≥ 0.
6. Consider two random variables X and Y . Assume for simplicity that they both have zero
mean.
Find the means, variances, and the correlation coefficient of X and Y . Also, find the
value of the constant c. [4]
1. A police radar always overestimates the speed of incoming cars by an amount that is
uniformly distributed between 0 and 5 miles/hour. Assume that car speeds are uniformly
distributed from 55 to 75 miles/hour. What is the least squares estimate of the car speed
based on the radar’s measurement?
(a) Find the least squares estimate of Y given that X = x, for all possible values x.
(b) Let g∗(x) be the estimate from part (a), as a function of x. Find E[g∗(X)] and var(g∗(X)).
(c) Find the mean square error E[(Y − g∗(X))^2]. Is it the same as E[var(Y |X)]?
(d) Find var(Y ).
3. We are given that E[X] = 1, E[Y ] = 2, E[X 2 ] = 5, E[Y 2 ] = 8, and E[XY ] = 1. Find the
linear least squares estimator of Y given X.
4. In a communication system, the value of a random variable X is transmitted, but what
is received (denoted by Y ) is the value of X corrupted by some additive noise; that is
Y = X + W . We know the distribution of X and W , and let us assume that these two
random variables are independent and have the same PDF. Calculate the least squares
estimate of X given Y . What happens if X and W are dependent?
5. Consider three zero-mean random variables X, Y and Z with known variances and co-
variances. Give a formula for the linear least squares estimate of X based on Y and Z,
that is, find a and b that minimize E[(X − aY − bZ)^2]. For simplicity, assume that Y and
Z are uncorrelated.
X = U + V, Y = U − 2V.
Revision problems
1. Linear least squares estimate based on several measurements. Let X be a
random variable with mean µ and variance v, and let Y1 , . . . , Yn be measurements of the
form Yi = X +Wi , where the Wi are random variables with mean 0 and variance vi , which
represent measurement errors. We assume that the random variables X, W1 , . . . , Wn are
independent. Show that the linear least squares estimator of X based on Y1 , . . . , Yn is
[(µ/v) + ∑_{i=1}^n (Yi /vi )] / [(1/v) + ∑_{i=1}^n (1/vi )].
2. Suppose that X and Y are independent normal random variables with the same variance.
Show that X − Y and X + Y are independent.
Tutorial 11
2. Chernoff bound for a Poisson random variable. Let X be a Poisson random variable
with parameter λ.
Show that P[X ≥ k] ≤ e^(−λ) (eλ)^k / k^k.
3. Let X1 , X2 , . . . be a sequence of independent random variables that are uniformly dis-
tributed between 0 and 1. For every n, we let Yn be the median of the values of
X1 , X2 , . . . , X2n+1 . [That is, we order X1 , . . . , X2n+1 in increasing order and let Yn be
the (n+1)st element in this ordered sequence.] Apply the Weak Law of Large Numbers
to the sequence of Bernoulli random variables which equal 1 when Xi ≥ 0.5 + c to show
that the sequence Yn converges to 1/2, in probability.
4. Uncle Henry has been having trouble keeping his weight constant. In fact, at the end
of each week, he notices that his weight has changed by a random amount, uniformly
distributed between -0.5 and 0.5 pounds. Assuming that the weight change during any
given week is independent of the weight change of any other week, find the probability
that Uncle Henry will gain or lose more than 3 pounds in the next 50 weeks.
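By the central limit theorem, the 50-week change is approximately normal with mean 0 and variance 50 · (1/12), so the required probability is about 2(1 − Φ(3/√(50/12))):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

sd = sqrt(50 / 12)          # each weekly change is Uniform(-0.5, 0.5): variance 1/12
p = 2 * (1 - phi(3 / sd))   # CLT approximation of P(|total change| > 3)
print(round(p, 4))
```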
5. Let Sn be the number of successes in n independent Bernoulli trials, where the probability
of success in each trial is p = 1/2. Provide a numerical value for the limit as n tends to
∞ for each of the following three expressions.
(a) P[n/2 − 10 ≤ Sn ≤ n/2 + 10].
(b) P[n/2 − n/10 ≤ Sn ≤ n/2 + n/10].
(c) P[n/2 − √n/2 ≤ Sn ≤ n/2 + √n/2].
Revision problems
1. Bo assumes that X, the height in meters of any Canadian selected by an equally likely
choice among all Canadians, is a random variable with E[X] = h. Because Bo is sure that
no Canadian is taller than 3 meters, he decides to use 1.5 meters as a conservative value
for the standard deviation of X. To estimate h, Bo uses the average H of the heights of
n Canadians he selects at random.
(a) In terms of h and Bo's 1.5 meter bound for the standard deviation of X, determine
the expectation and standard deviation of H.
(b) Find as small a value of n as possible such that the standard deviation of Bo's
estimator is guaranteed to be less than 0.01 meters.
(c) Bo would like to be 99% sure that his estimate is within 5 centimeters of the true
average height of Canadians. Using the Chebyshev inequality, calculate the minimum
value of n that will achieve this objective.
(d) If we agree that no Canadians are taller than three meters, why is it correct to use
1.5 meters as an upper bound on the standard deviation for X, the height of any
Canadian selected at random?
2. On any given flight, an airline's goal is to fill the plane as much as possible, without
overbooking. If, on average, 10% of customers cancel their tickets, all independently
of each other, what is the probability that a particular flight will be overbooked if the
airline sells 320 tickets, for a plane that has maximum capacity 300 people? What is the
probability that a plane with maximum capacity 150 people will be overbooked if the
airline sells 160 tickets?
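With each ticket holder independently showing up with probability 0.9, the number of passengers is binomial in the number of tickets sold, and a CLT approximation (with a 0.5 continuity correction) answers both questions. A sketch using the flight numbers from the problem:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def p_overbooked(sold, capacity, p_show=0.9):
    """CLT approximation of P(shows > capacity), shows ~ Binomial(sold, p_show)."""
    mu = sold * p_show
    sd = sqrt(sold * p_show * (1 - p_show))
    return 1 - phi((capacity + 0.5 - mu) / sd)

print(round(p_overbooked(320, 300), 4), round(p_overbooked(160, 150), 4))
```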
Assessment Summary
Unit Schedule
0 No formal assessment