
Chapter 2: Axioms of Probability

①Proof of 𝑃(∅) = 0:
Consider a sequence of events E1, E2, … where E1 = S and Ei = ∅ for i > 1. Because the events are mutually exclusive and S = ⋃_{i=1}^∞ Ei, we have, from Axiom 3:

P(S) = ∑_{i=1}^∞ P(Ei) = P(S) + ∑_{i=2}^∞ P(∅)

Which means:

𝑃(∅) = 0

②Generally, for any finite collection of mutually exclusive events E1, …, En:

P(⋃_{i=1}^n Ei) = ∑_{i=1}^n P(Ei)

③the Strong Law of Large Numbers:


With probability 1, the fraction of trials in which a specific event E occurs converges to P(E) when the experiment is repeated over and over again. This will be proved later.

④regarding Eʹ:
1 = P(S) = P(E ∪ Eʹ) = P(E) + P(Eʹ), and therefore P(Eʹ) = 1 − P(E)

⑤summation of probabilities (proof of 𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹))


𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸 ∪ (𝐸ʹ ∩ 𝐹)) = 𝑃(𝐸) + 𝑃(𝐸ʹ ∩ 𝐹)

Since F = (F ∩ E) ∪ (F ∩ Eʹ), P(F) = P(E ∩ F) + P(Eʹ ∩ F), meaning P(Eʹ ∩ F) = P(F) − P(E ∩ F). Substituting this into the first equation gives P(E ∪ F) = P(E) + P(F) − P(E ∩ F).

⑥Proof of P(E) = N(E)/N(S) for equally likely outcomes:
For S = {1, 2, 3, …, N}, P({1}) = P({2}) = ⋯ = P({N}) = 1/N, and since {1}, {2}, …, {N} are all mutually exclusive, then, for example, for E = {1, 2, 4}:

P({1} ∪ {2} ∪ {4}) = P({1}) + P({2}) + P({4}) = 1/N + 1/N + 1/N = 3/N = N(E)/N(S)
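
A quick numerical illustration of the counting rule (a minimal sketch in Python; the two-dice experiment and the event "the sum is 7" are chosen for illustration, not taken from the notes):

from itertools import product

# sample space of two fair dice: 36 equally likely outcomes (illustrative example)
S = list(product(range(1, 7), repeat=2))
# event E: the two dice sum to 7
E = [outcome for outcome in S if sum(outcome) == 7]

print(len(E) / len(S))   # N(E)/N(S) = 6/36 ≈ 0.167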

⑦regarding urns and balls:


Suppose we have an urn containing n balls, p of which are special. The probability that, when k balls are withdrawn, exactly q of them are special is given by (the hypergeometric distribution):

C(p, q) · C(n−p, k−q) / C(n, k)
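
A minimal numerical sketch of the urn formula above, using Python's math.comb; the values n = 20, p = 6, k = 5, q = 2 are illustrative assumptions, not from the notes:

from math import comb

n, p = 20, 6   # n balls in the urn, p of which are special (illustrative values)
k, q = 5, 2    # withdraw k balls; probability that exactly q of them are special

prob = comb(p, q) * comb(n - p, k - q) / comb(n, k)
print(prob)    # ≈ 0.352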

Chapter 4: Random Variables


①a general formula:
𝐸[𝑎𝑋 + 𝑏] = 𝑎𝐸[𝑋] + 𝑏

②Variance definition:
Var(X) = E[(X − μ)^2] = ∑_x (x − μ)^2 p(x)

= ∑_x (x^2 − 2xμ + μ^2) p(x) = ∑_x x^2 p(x) − 2μ ∑_x x p(x) + μ^2 ∑_x p(x)

= E[X^2] − 2μ·E[X] + μ^2·1 = E[X^2] − 2μ^2 + μ^2 = E[X^2] − μ^2


∴ Var(X) = E[X^2] − (E[X])^2
The conclusion is a more convenient way of calculating Var(X).

Also note that:

Var(aX + b) = a^2 Var(X)
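
A short sketch checking both variance identities on a small, arbitrarily chosen pmf (the probabilities and the constants a = 2, b = 5 are illustrative assumptions):

# arbitrary pmf: values of X and their probabilities (illustrative)
pmf = {0: 0.2, 1: 0.5, 3: 0.3}

mean = sum(x * p for x, p in pmf.items())
ex2 = sum(x**2 * p for x, p in pmf.items())
var = ex2 - mean**2                                   # Var(X) = E[X^2] - (E[X])^2

a, b = 2, 5                                           # check Var(aX + b) = a^2 Var(X)
var_ab = sum((a*x + b)**2 * p for x, p in pmf.items()) - (a*mean + b)**2
print(var, var_ab, a**2 * var)                        # ≈ 1.24, 4.96, 4.96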

③Standard Deviation:
𝑠𝑑(𝑋) = √𝑉𝑎𝑟(𝑋)

④the Bernoulli random variable:


𝑝(0) = 1 − 𝑝
𝑝(1) = 𝑝
The expected value is equal to:

𝐸(𝑋) = 𝑝

⑤Binomial random variable:


p(i) = C(n, i) p^i (1 − p)^{n−i},   i = 0, 1, …, n
The parameters are shown as (n,p)

The expected value is equal to:

E[X] = ∑_{i=1}^n i · C(n, i) p^i (1 − p)^{n−i}

Note that: i · C(n, i) = n · C(n−1, i−1), so

E[X] = ∑_{i=1}^n n · C(n−1, i−1) p · p^{i−1} (1 − p)^{(n−1)−(i−1)} = np ∑_{i=1}^n C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)} = np

For the variance, we need E[X^2]:

E[X^2] = ∑_{i=1}^n i^2 · C(n, i) p^i (1 − p)^{n−i} = np ∑_{i=1}^n i · C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)}

= np ∑_{i=1}^n [1 + (i − 1)] C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)}

= np [ ∑_{i=1}^n C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)} + ∑_{i=1}^n (i − 1) C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)} ]

The first bracketed sum equals 1, and the second is the mean of a binomial (n−1, p) random variable, i.e. (n−1)p.

∴ E[X^2] = np(1 + (n − 1)p)


Using the obtained value of E[X^2], we can now calculate Var(X):

Var(X) = E[X^2] − (E[X])^2 = np(1 + (n − 1)p) − (np)^2 = np(1 − p)
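
A brief numerical check of E[X] = np and Var(X) = np(1 − p) by summing over the binomial pmf; n = 10 and p = 0.3 are illustrative values:

from math import comb

n, p = 10, 0.3                       # illustrative parameters
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

mean = sum(i * pmf[i] for i in range(n + 1))
var = sum(i**2 * pmf[i] for i in range(n + 1)) - mean**2
print(mean, n * p)                   # both ≈ 3.0
print(var, n * p * (1 - p))          # both ≈ 2.1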

⑥The Poisson random variable:


It may be used as an approximation for a binomial random variable with parameters (n,p) when n is large and
p is small enough so that np is of moderate size. It is defined as:

p(i) = e^{−λ} λ^i / i!,   i = 0, 1, 2, …
It is indeed a probability mass function, since:

∑_{i=0}^∞ p(i) = e^{−λ} ∑_{i=0}^∞ λ^i / i! = e^{−λ} e^λ = 1

It is derived as follows, taking np=λ:

p(i) = [n! / ((n − i)! i!)] p^i (1 − p)^{n−i} = [n! / ((n − i)! i!)] (λ/n)^i (1 − λ/n)^{n−i}

= [n(n − 1)⋯(n − i + 1) / n^i] · (λ^i / i!) · (1 − λ/n)^n / (1 − λ/n)^i

For n large and λ moderate:

(1 − λ/n)^n ≈ e^{−λ},   n(n − 1)⋯(n − i + 1) / n^i ≈ 1,   (1 − λ/n)^i ≈ 1

Hence,

p(i) ≈ e^{−λ} λ^i / i!
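
A short sketch comparing the exact binomial pmf with its Poisson approximation for large n and small p; the values n = 1000 and p = 0.003 (so λ = 3) are chosen only for illustration:

from math import comb, exp, factorial

n, p = 1000, 0.003                   # illustrative: n large, p small
lam = n * p

for i in range(6):
    binom = comb(n, i) * p**i * (1 - p)**(n - i)
    poisson = exp(-lam) * lam**i / factorial(i)
    print(i, round(binom, 5), round(poisson, 5))   # the two columns agree closely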

⑦where the Poisson distribution is used:

Example: approximating the number of people who reach age 100. Among n persons, each of whom reaches age 100 with probability p, the number who do so is approximately Poisson distributed with λ = np, i.e. P(i of them reach age 100) ≈ e^{−λ} λ^i / i!.

⑧E[X] & Var[X]:


First we calculate E[X]:
E[X] = ∑_{i=0}^∞ i e^{−λ} λ^i / i! = e^{−λ} λ ∑_{i=1}^∞ λ^{i−1} / (i − 1)! = e^{−λ} λ e^λ = λ

For the variance:


E[X^2] = ∑_{i=0}^∞ i^2 e^{−λ} λ^i / i! = λ ∑_{i=1}^∞ i e^{−λ} λ^{i−1} / (i − 1)!

= λ [ ∑_{i=1}^∞ (i − 1) e^{−λ} λ^{i−1} / (i − 1)! + ∑_{i=1}^∞ e^{−λ} λ^{i−1} / (i − 1)! ] = λ[λ + 1]

Var[X] is equal to:

Var(X) = λ(λ + 1) − λ^2 = λ

⑨the Negative Binomial Random Variable

Independent trials, each with success probability p, are performed until a total of r successes accumulate. If X denotes the number of trials required, then X is a negative binomial random variable with parameters (r, p):

P(X = n) = C(n−1, r−1) p^r (1 − p)^{n−r},   n = r, r + 1, …

It follows because, in order for the rth success to occur on the nth trial, r − 1 successes must have occurred in the n − 1 previous trials and the nth trial must itself be a success.

Example: what is the probability of achieving r successes before m failures?

In other words, we need the probability that the rth success occurs no later than trial r + m − 1, which is:

∑_{n=r}^{r+m−1} C(n−1, r−1) p^r (1 − p)^{n−r}
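
A minimal sketch evaluating the sum above; the values r = 3 successes before m = 4 failures with p = 0.5 are illustrative:

from math import comb

r, m, p = 3, 4, 0.5                  # illustrative values

prob = sum(comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)
           for n in range(r, r + m))               # n = r, ..., r + m - 1
print(prob)                                        # ≈ 0.656 for these values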
⑩the expected value and variance of the NB random variable:
First we calculate E[X]:
E[X] = ∑_{n=r}^∞ n · C(n−1, r−1) p^r (1 − p)^{n−r} = (r/p) ∑_{n=r}^∞ C(n, r) p^{r+1} (1 − p)^{n−r}

(using n · C(n−1, r−1) = r · C(n, r))

= (r/p) ∑_{n=r}^∞ C((n+1)−1, (r+1)−1) p^{r+1} (1 − p)^{(n+1)−(r+1)} = r/p

The summand is the probability mass function of a negative binomial random variable with parameters (r + 1, p), evaluated at n + 1, so the sum equals 1 and

E[X] = r/p

For the variance, the same manipulation with an extra factor of n gives E[X^2] = (r/p) E[Y − 1], where Y is negative binomial with parameters (r + 1, p), so

E[X^2] = (r/p) ((r + 1)/p − 1)

Var(X) = (r/p) ((r + 1)/p − 1) − (r/p)^2 = r(1 − p)/p^2
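
A quick numerical check of E[X] = r/p and Var(X) = r(1 − p)/p^2, truncating the infinite sum at a large n; r = 3 and p = 0.4 are illustrative values:

from math import comb

r, p = 3, 0.4                        # illustrative parameters
pmf = {n: comb(n - 1, r - 1) * p**r * (1 - p)**(n - r) for n in range(r, 400)}

mean = sum(n * q for n, q in pmf.items())
var = sum(n**2 * q for n, q in pmf.items()) - mean**2
print(mean, r / p)                   # both ≈ 7.5
print(var, r * (1 - p) / p**2)       # both ≈ 11.25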

⑪the Hypergeometric random variable:


Suppose that a sample of size n is to be chosen randomly (without replacement) from an urn containing N
balls, of which m are white and N − m are black. If we let X denote the number of white balls selected,
P(X = i) = C(m, i) · C(N−m, n−i) / C(N, n),   i = 0, 1, …, n

X is then called a hypergeometric random variable with parameters (n, N, m).

⑫the expected value and variance of the hypergeometric variable:

E[X] = ∑_{i=1}^n i · C(m, i) C(N−m, n−i) / C(N, n)

Using i · C(m, i) = m · C(m−1, i−1) and C(N, n) = (N/n) C(N−1, n−1):

E[X] = (nm/N) ∑_{i=1}^n C(m−1, i−1) C((N−1)−(m−1), (n−1)−(i−1)) / C(N−1, n−1) = (nm/N) · 1 = nm/N

The sum equals 1 because the summand is the probability mass function of a hypergeometric random variable with parameters (n−1, N−1, m−1).

For E[X^2]:

E[X^2] = ∑_{i=1}^n i^2 · C(m, i) C(N−m, n−i) / C(N, n) = (nm/N) ∑_{i=1}^n [(i − 1) + 1] · C(m−1, i−1) C((N−1)−(m−1), (n−1)−(i−1)) / C(N−1, n−1)

= (nm/N) [E(Y) + 1] = (nm/N) [ (n − 1)(m − 1)/(N − 1) + 1 ]

where Y is hypergeometric with parameters (n−1, N−1, m−1), whose mean is (n − 1)(m − 1)/(N − 1) by the result above.

For the variance:


Var(X) = (nm/N) [ (n − 1)(m − 1)/(N − 1) + 1 ] − (nm/N)^2

Using (m − 1)/(N − 1) = m/N − (1 − m/N)/(N − 1):

∴ Var(X) = (nm/N) [ (n − 1)(m/N − (1 − m/N)/(N − 1)) + 1 − nm/N ] = (nm/N)(1 − m/N)(1 − (n − 1)/(N − 1))
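
A numerical sketch verifying E[X] = nm/N and the variance formula by summing over the hypergeometric pmf; N = 20, m = 8, n = 5 are illustrative values:

from math import comb

N, m, n = 20, 8, 5                   # illustrative parameters
pmf = {i: comb(m, i) * comb(N - m, n - i) / comb(N, n) for i in range(n + 1)}

mean = sum(i * q for i, q in pmf.items())
var = sum(i**2 * q for i, q in pmf.items()) - mean**2
print(mean, n * m / N)                                            # both 2.0
print(var, (n * m / N) * (1 - m / N) * (1 - (n - 1) / (N - 1)))   # both ≈ 0.947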

Chapter 5: Continuous Random Variables


①Represented by:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Note that f(a) itself is NOT P(X = a), because:

P(X = a) = ∫_a^a f(x) dx = 0

Therefore:

P(X < a) = P(X ≤ a) = F(a) = ∫_{−∞}^a f(x) dx

②The expectation of continuous random variables:


E[X] = ∫_{−∞}^∞ x f(x) dx

E[g(X)] = ∫_{−∞}^∞ g(x) f(x) dx

For example:
f(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise.   E(e^X) = ?

E(e^X) = ∫_{−∞}^∞ e^x f(x) dx = ∫_0^1 e^x dx = e^x |_0^1 = e^1 − e^0 = e − 1

③The variance of continuous random variables is defined just as for discrete random variables:

Var(X) = E[X^2] − (E[X])^2   and also   Var(aX + b) = a^2 Var(X)

④the uniform random variable:


f(x) = 1 for 0 < x < 1, and 0 otherwise.

For any 0 < a < b < 1:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx = x |_a^b = b − a

In its general form:


f(x) = 1/(β − α) for α < x < β, and 0 otherwise.

Therefore, for any value a:

F(a) = 0 for a ≤ α,   F(a) = (a − α)/(β − α) for α < a < β,   F(a) = 1 for a ≥ β

The expectation is equal to (α + β)/2, and the variance is (β − α)^2/12.
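
A short check of the expectation (α + β)/2 and variance (β − α)^2/12 by a midpoint Riemann sum; α = 2 and β = 5 are illustrative values:

alpha, beta = 2.0, 5.0               # illustrative parameters
steps = 100_000
dx = (beta - alpha) / steps

# midpoint Riemann sums of x·f(x) and x^2·f(x) with f(x) = 1/(beta - alpha)
xs = [alpha + (k + 0.5) * dx for k in range(steps)]
mean = sum(x / (beta - alpha) for x in xs) * dx
ex2 = sum(x**2 / (beta - alpha) for x in xs) * dx

print(mean, (alpha + beta) / 2)                  # both 3.5
print(ex2 - mean**2, (beta - alpha)**2 / 12)     # both ≈ 0.75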

⑤The normal random variable:


f(x) = (1/(√(2π) σ)) exp[−(x − μ)^2 / (2σ^2)]
For any normally distributed X, w/ parameters σ^2 and µ, Y = aX + b is normally distributed w/ parameters aµ + b and a^2σ^2. The cumulative distribution function of X can be expressed through the standard normal cdf Φ:
F(x) = (1/(√(2π) σ)) ∫_{−∞}^x exp[−((t − μ)/σ)^2 / 2] dt = Φ((x − μ)/σ),   where Φ(z) = (1/√(2π)) ∫_{−∞}^z exp[−u^2/2] du

∴ P(a < X < b) = P(z_a < Z < z_b) = Φ(z_b) − Φ(z_a),   where z_a = (a − μ)/σ and z_b = (b − μ)/σ
note that: Φ(∞) = 1

⑥The normal approximation to the Binomial Distribution:
The probability that, in n trials w/ a success probability of p each, the number of successes X falls in a given (standardized) range can be approximated as:

P(a ≤ (X − np)/√(np(1 − p)) ≤ b) ≈ Φ(b) − Φ(a)

The approximation is quite good when np(1-p)>10.

When using the normal approximation for a binomial probability P(X = k), use the normal probability P(k − .5 < X < k + .5) (standardize first); this is called the continuity correction. Likewise, for a binomial P(X > k), use the normal probability P(X ≥ k + .5).
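
A minimal sketch of the continuity correction, with Φ computed from math.erf (standard library only); n = 100, p = 0.4 and k = 45 are illustrative values:

from math import comb, erf, sqrt

def phi(x):
    # standard normal cdf: Φ(x) = (1 + erf(x/√2)) / 2
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p, k = 100, 0.4, 45               # illustrative values
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = comb(n, k) * p**k * (1 - p)**(n - k)
approx = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)
print(exact, approx)                 # ≈ 0.0478 (exact) vs ≈ 0.0484 (approximation)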

⑦The exponential Random Variable:


For λ > 0, w/ parameter λ:

f(x) = λe^{−λx} for x ≥ 0, and 0 for x < 0

⑧ Integration by Parts:
For calculating ∫ f(x)g(x) dx:

d/dx [f(x)g(x)] = f′(x)g(x) + f(x)g′(x)

Integrating both sides gives:

∫ d/dx [f(x)g(x)] dx = ∫ f′(x)g(x) dx + ∫ f(x)g′(x) dx

Rearranging gives the integration by parts formula:

∫ f(x)g′(x) dx = f(x)g(x) − ∫ f′(x)g(x) dx

Or, representing f(x) w/ u and g(x) w/ v gives:

∫ 𝑢𝑑𝑣 = 𝑢𝑣 − ∫ 𝑣𝑑𝑢

⑨The expectation and Variance of the Exponential Random Variable


The expected value is equal to 1/λ, and the variance is equal to 1/λ^2.
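
A brief numerical check of E[X] = 1/λ and Var(X) = 1/λ^2 by Riemann summation of the exponential density; λ = 2 is an illustrative value:

from math import exp

lam = 2.0                            # illustrative parameter
dx = 1e-4
xs = [k * dx for k in range(150_000)]   # integrate from 0 out to x = 15, far into the tail

mean = sum(x * lam * exp(-lam * x) for x in xs) * dx
ex2 = sum(x**2 * lam * exp(-lam * x) for x in xs) * dx
print(mean, 1 / lam)                 # both ≈ 0.5
print(ex2 - mean**2, 1 / lam**2)     # both ≈ 0.25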

⑪Memoryless Random Variables


Refer to Sheldon page 210
