
Chapter 2: Axioms of Probability

①Proof of 𝑃(∅) = 0:
Consider a sequence of events E1, E2, … where E1 = S and Ei = ∅ for i > 1. Because the events are mutually exclusive and S = ⋃_{i=1}^∞ Ei, we have, from Axiom 3:

P(S) = ∑_{i=1}^∞ P(Ei) = P(S) + ∑_{i=2}^∞ P(∅)

Which means:

𝑃(∅) = 0

②Generally, for any finite collection of mutually exclusive events E1, …, En:

P(⋃_{i=1}^n Ei) = ∑_{i=1}^n P(Ei)

③the Strong Law of Large Numbers:


With probability 1, the fraction of trials in which a specific event E occurs converges to P(E) when the experiment is repeated over and over again. This will be proved later.

④regarding Eʹ:
1 = P(S) = P(E ∪ Eʹ) = P(E) + P(Eʹ), and therefore P(Eʹ) = 1 − P(E)

⑤summation of probabilities (proof of 𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹))


𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸 ∪ (𝐸ʹ ∩ 𝐹)) = 𝑃(𝐸) + 𝑃(𝐸ʹ ∩ 𝐹)

Since F = (F ∩ E) ∪ (F ∩ Eʹ), P(F) = P(E ∩ F) + P(Eʹ ∩ F), meaning P(Eʹ ∩ F) = P(F) − P(E ∩ F). Substituting this into the first equation gives P(E ∪ F) = P(E) + P(F) − P(E ∩ F).

⑥Proof of P(E) = N(E)/N(S) for equally likely outcomes:
For S = {1, 2, 3, …, N}, P({1}) = P({2}) = ⋯ = P({N}) = 1/N, and since {1}, {2}, …, {N} are all mutually exclusive, then, for example, for E = {1, 2, 4}:

P({1} ∪ {2} ∪ {4}) = P({1}) + P({2}) + P({4}) = 1/N + 1/N + 1/N = 3/N = N(E)/N(S)
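
A quick numerical illustration of the counting rule (a minimal sketch in Python; the two-dice experiment and the event "the sum is 7" are chosen for illustration, not taken from the notes):

from itertools import product

# sample space of two fair dice: 36 equally likely outcomes (illustrative example)
S = list(product(range(1, 7), repeat=2))
# event E: the two dice sum to 7
E = [outcome for outcome in S if sum(outcome) == 7]

print(len(E) / len(S))   # N(E)/N(S) = 6/36 ≈ 0.167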

⑦regarding urns and balls:


Suppose we have an urn containing n balls, p of which are special. The probability that, when k balls are withdrawn, exactly q of them are special is given by (the hypergeometric distribution):

C(p, q) · C(n−p, k−q) / C(n, k)
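
A minimal numerical sketch of the urn formula above, using Python's math.comb; the values n = 20, p = 6, k = 5, q = 2 are illustrative assumptions, not from the notes:

from math import comb

n, p = 20, 6   # n balls in the urn, p of which are special (illustrative values)
k, q = 5, 2    # withdraw k balls; probability that exactly q of them are special

prob = comb(p, q) * comb(n - p, k - q) / comb(n, k)
print(prob)    # ≈ 0.352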

Chapter 4: Random Variables


①a general formula:
𝐸[𝑎𝑋 + 𝑏] = 𝑎𝐸[𝑋] + 𝑏

②Variance definition:
Var(X) = E[(X − μ)^2] = ∑_x (x − μ)^2 p(x)

= ∑_x (x^2 − 2xμ + μ^2) p(x) = ∑_x x^2 p(x) − 2μ ∑_x x p(x) + μ^2 ∑_x p(x)

= E[X^2] − 2μ·E[X] + μ^2·1 = E[X^2] − 2μ^2 + μ^2 = E[X^2] − μ^2


∴ Var(X) = E[X^2] − (E[X])^2
The conclusion is a more convenient way of calculating Var(X).

Also note that:

Var(aX + b) = a^2 Var(X)
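
A short sketch checking both variance identities on a small, arbitrarily chosen pmf (the probabilities and the constants a = 2, b = 5 are illustrative assumptions):

# arbitrary pmf: values of X and their probabilities (illustrative)
pmf = {0: 0.2, 1: 0.5, 3: 0.3}

mean = sum(x * p for x, p in pmf.items())
ex2 = sum(x**2 * p for x, p in pmf.items())
var = ex2 - mean**2                                   # Var(X) = E[X^2] - (E[X])^2

a, b = 2, 5                                           # check Var(aX + b) = a^2 Var(X)
var_ab = sum((a*x + b)**2 * p for x, p in pmf.items()) - (a*mean + b)**2
print(var, var_ab, a**2 * var)                        # ≈ 1.24, 4.96, 4.96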

③Standard Deviation:
𝑠𝑑(𝑋) = √𝑉𝑎𝑟(𝑋)

④the Bernoulli random variable:


𝑝(0) = 1 − 𝑝
𝑝(1) = 𝑝
The expected value is equal to:

𝐸(𝑋) = 𝑝

⑤Binomial random variable:


p(i) = C(n, i) p^i (1 − p)^{n−i},   i = 0, 1, …, n
The parameters are shown as (n,p)

The expected value is equal to:

E[X] = ∑_{i=1}^n i · C(n, i) p^i (1 − p)^{n−i}

Note that: i · C(n, i) = n · C(n−1, i−1), so

E[X] = ∑_{i=1}^n n · C(n−1, i−1) p · p^{i−1} (1 − p)^{(n−1)−(i−1)} = np ∑_{i=1}^n C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)} = np

For the variance, we need E[X^2]:

E[X^2] = ∑_{i=1}^n i^2 · C(n, i) p^i (1 − p)^{n−i} = np ∑_{i=1}^n i · C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)}

= np ∑_{i=1}^n [1 + (i − 1)] C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)}

= np [ ∑_{i=1}^n C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)} + ∑_{i=1}^n (i − 1) C(n−1, i−1) p^{i−1} (1 − p)^{(n−1)−(i−1)} ]

The first bracketed sum equals 1, and the second is the mean of a binomial (n−1, p) random variable, i.e. (n−1)p.

∴ E[X^2] = np(1 + (n − 1)p)


Using the obtained value of E[X^2], we can now calculate Var(X):

Var(X) = E[X^2] − (E[X])^2 = np(1 + (n − 1)p) − (np)^2 = np(1 − p)
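
A brief numerical check of E[X] = np and Var(X) = np(1 − p) by summing over the binomial pmf; n = 10 and p = 0.3 are illustrative values:

from math import comb

n, p = 10, 0.3                       # illustrative parameters
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

mean = sum(i * pmf[i] for i in range(n + 1))
var = sum(i**2 * pmf[i] for i in range(n + 1)) - mean**2
print(mean, n * p)                   # both ≈ 3.0
print(var, n * p * (1 - p))          # both ≈ 2.1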

⑥The Poisson random variable:


It may be used as an approximation for a binomial random variable with parameters (n,p) when n is large and
p is small enough so that np is of moderate size. It is defined as:

p(i) = e^{−λ} λ^i / i!,   i = 0, 1, 2, …
It is indeed a probability mass function, since:

∑_{i=0}^∞ p(i) = e^{−λ} ∑_{i=0}^∞ λ^i / i! = e^{−λ} e^λ = 1

It is derived as follows, taking np=λ:

p(i) = [n! / ((n − i)! i!)] p^i (1 − p)^{n−i} = [n! / ((n − i)! i!)] (λ/n)^i (1 − λ/n)^{n−i}

= [n(n − 1)⋯(n − i + 1) / n^i] · (λ^i / i!) · (1 − λ/n)^n / (1 − λ/n)^i

For n large and λ moderate:

(1 − λ/n)^n ≈ e^{−λ},   n(n − 1)⋯(n − i + 1) / n^i ≈ 1,   (1 − λ/n)^i ≈ 1

Hence,

p(i) ≈ e^{−λ} λ^i / i!
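
A short sketch comparing the exact binomial pmf with its Poisson approximation for large n and small p; the values n = 1000 and p = 0.003 (so λ = 3) are chosen only for illustration:

from math import comb, exp, factorial

n, p = 1000, 0.003                   # illustrative: n large, p small
lam = n * p

for i in range(6):
    binom = comb(n, i) * p**i * (1 - p)**(n - i)
    poisson = exp(-lam) * lam**i / factorial(i)
    print(i, round(binom, 5), round(poisson, 5))   # the two columns agree closely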

⑦where the Poisson distribution is used:

Example: approximating the number of people who reach age 100. Among n persons, each of whom reaches age 100 with probability p, the number who do so is approximately Poisson distributed with λ = np, i.e. P(i of them reach age 100) ≈ e^{−λ} λ^i / i!.

⑧E[X] & Var[X]:


First we calculate E[X]:
E[X] = ∑_{i=0}^∞ i e^{−λ} λ^i / i! = e^{−λ} λ ∑_{i=1}^∞ λ^{i−1} / (i − 1)! = e^{−λ} λ e^λ = λ

For the variance:


E[X^2] = ∑_{i=0}^∞ i^2 e^{−λ} λ^i / i! = λ ∑_{i=1}^∞ i e^{−λ} λ^{i−1} / (i − 1)!

= λ [ ∑_{i=1}^∞ (i − 1) e^{−λ} λ^{i−1} / (i − 1)! + ∑_{i=1}^∞ e^{−λ} λ^{i−1} / (i − 1)! ] = λ[λ + 1]

Var[X] is equal to:

Var(X) = λ(λ + 1) − λ^2 = λ

⑨the Negative Binomial Random Variable

Independent trials, each with success probability p, are performed until a total of r successes accumulate. If X denotes the number of trials required, then X is a negative binomial random variable with parameters (r, p):

P(X = n) = C(n−1, r−1) p^r (1 − p)^{n−r},   n = r, r + 1, …

It follows because, in order for the rth success to occur on the nth trial, r − 1 successes must have occurred in the n − 1 previous trials and the nth trial must itself be a success.

Example: what is the probability of achieving r successes before m failures?

In other words, we need the probability that the rth success occurs no later than trial r + m − 1, which is:

∑_{n=r}^{r+m−1} C(n−1, r−1) p^r (1 − p)^{n−r}
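
A minimal sketch evaluating the sum above; the values r = 3 successes before m = 4 failures with p = 0.5 are illustrative:

from math import comb

r, m, p = 3, 4, 0.5                  # illustrative values

prob = sum(comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)
           for n in range(r, r + m))               # n = r, ..., r + m - 1
print(prob)                                        # ≈ 0.656 for these values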
⑩the expected value and variance of the NB random variable:
First we calculate E[X]:
E[X] = ∑_{n=r}^∞ n · C(n−1, r−1) p^r (1 − p)^{n−r} = (r/p) ∑_{n=r}^∞ C(n, r) p^{r+1} (1 − p)^{n−r}

(using n · C(n−1, r−1) = r · C(n, r))

= (r/p) ∑_{n=r}^∞ C((n+1)−1, (r+1)−1) p^{r+1} (1 − p)^{(n+1)−(r+1)} = r/p

The summand is the probability mass function of a negative binomial random variable with parameters (r + 1, p), evaluated at n + 1, so the sum equals 1 and

E[X] = r/p

For the variance, the same manipulation with an extra factor of n gives E[X^2] = (r/p) E[Y − 1], where Y is negative binomial with parameters (r + 1, p), so

E[X^2] = (r/p) ((r + 1)/p − 1)

Var(X) = (r/p) ((r + 1)/p − 1) − (r/p)^2 = r(1 − p)/p^2
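
A quick numerical check of E[X] = r/p and Var(X) = r(1 − p)/p^2, truncating the infinite sum at a large n; r = 3 and p = 0.4 are illustrative values:

from math import comb

r, p = 3, 0.4                        # illustrative parameters
pmf = {n: comb(n - 1, r - 1) * p**r * (1 - p)**(n - r) for n in range(r, 400)}

mean = sum(n * q for n, q in pmf.items())
var = sum(n**2 * q for n, q in pmf.items()) - mean**2
print(mean, r / p)                   # both ≈ 7.5
print(var, r * (1 - p) / p**2)       # both ≈ 11.25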

⑪the Hypergeometric random variable:


Suppose that a sample of size n is to be chosen randomly (without replacement) from an urn containing N
balls, of which m are white and N − m are black. If we let X denote the number of white balls selected,
P(X = i) = C(m, i) · C(N−m, n−i) / C(N, n),   i = 0, 1, …, n

X is then called a hypergeometric random variable with parameters (n, N, m).

⑫the expected value and variance of the hypergeometric variable:

E[X] = ∑_{i=1}^n i · C(m, i) C(N−m, n−i) / C(N, n)

Using i · C(m, i) = m · C(m−1, i−1) and C(N, n) = (N/n) C(N−1, n−1):

E[X] = (nm/N) ∑_{i=1}^n C(m−1, i−1) C((N−1)−(m−1), (n−1)−(i−1)) / C(N−1, n−1) = (nm/N) · 1 = nm/N

The sum equals 1 because the summand is the probability mass function of a hypergeometric random variable with parameters (n−1, N−1, m−1).

For E[X^2]:

E[X^2] = ∑_{i=1}^n i^2 · C(m, i) C(N−m, n−i) / C(N, n) = (nm/N) ∑_{i=1}^n [(i − 1) + 1] · C(m−1, i−1) C((N−1)−(m−1), (n−1)−(i−1)) / C(N−1, n−1)

= (nm/N) [E(Y) + 1] = (nm/N) [ (n − 1)(m − 1)/(N − 1) + 1 ]

where Y is hypergeometric with parameters (n−1, N−1, m−1), whose mean is (n − 1)(m − 1)/(N − 1) by the result above.

For the variance:


Var(X) = (nm/N) [ (n − 1)(m − 1)/(N − 1) + 1 ] − (nm/N)^2

Using (m − 1)/(N − 1) = m/N − (1 − m/N)/(N − 1):

∴ Var(X) = (nm/N) [ (n − 1)(m/N − (1 − m/N)/(N − 1)) + 1 − nm/N ] = (nm/N)(1 − m/N)(1 − (n − 1)/(N − 1))
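
A numerical sketch verifying E[X] = nm/N and the variance formula by summing over the hypergeometric pmf; N = 20, m = 8, n = 5 are illustrative values:

from math import comb

N, m, n = 20, 8, 5                   # illustrative parameters
pmf = {i: comb(m, i) * comb(N - m, n - i) / comb(N, n) for i in range(n + 1)}

mean = sum(i * q for i, q in pmf.items())
var = sum(i**2 * q for i, q in pmf.items()) - mean**2
print(mean, n * m / N)                                            # both 2.0
print(var, (n * m / N) * (1 - m / N) * (1 - (n - 1) / (N - 1)))   # both ≈ 0.947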

Chapter 5: Continuous Random Variables


①Represented by:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Note that f(a) itself is NOT P(X = a), because:

P(X = a) = ∫_a^a f(x) dx = 0

Therefore:

P(X < a) = P(X ≤ a) = F(a) = ∫_{−∞}^a f(x) dx

②The expectation of continuous random variables:


E[X] = ∫_{−∞}^∞ x f(x) dx

E[g(X)] = ∫_{−∞}^∞ g(x) f(x) dx

For example:
f(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise.   E(e^X) = ?

E(e^X) = ∫_{−∞}^∞ e^x f(x) dx = ∫_0^1 e^x dx = e^x |_0^1 = e^1 − e^0 = e − 1

③The variance of continuous random variables is defined just as for discrete random variables:

Var(X) = E[X^2] − (E[X])^2   and also   Var(aX + b) = a^2 Var(X)

④the uniform random variable:


f(x) = 1 for 0 < x < 1, and 0 otherwise.

For any 0 < a < b < 1:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx = x |_a^b = b − a

In its general form:


f(x) = 1/(β − α) for α < x < β, and 0 otherwise.

Therefore, for any value a:

F(a) = 0 for a ≤ α,   F(a) = (a − α)/(β − α) for α < a < β,   F(a) = 1 for a ≥ β

The expectation is equal to (α + β)/2, and the variance is (β − α)^2/12.
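
A short check of the expectation (α + β)/2 and variance (β − α)^2/12 by a midpoint Riemann sum; α = 2 and β = 5 are illustrative values:

alpha, beta = 2.0, 5.0               # illustrative parameters
steps = 100_000
dx = (beta - alpha) / steps

# midpoint Riemann sums of x·f(x) and x^2·f(x) with f(x) = 1/(beta - alpha)
xs = [alpha + (k + 0.5) * dx for k in range(steps)]
mean = sum(x / (beta - alpha) for x in xs) * dx
ex2 = sum(x**2 / (beta - alpha) for x in xs) * dx

print(mean, (alpha + beta) / 2)                  # both 3.5
print(ex2 - mean**2, (beta - alpha)**2 / 12)     # both ≈ 0.75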

⑤The normal random variable:


f(x) = (1/(√(2π) σ)) exp[−(x − μ)^2 / (2σ^2)]
For any normally distributed X, w/ parameters σ^2 and µ, Y = aX + b is normally distributed w/ parameters aµ + b and a^2σ^2. The cumulative distribution function of X can be expressed through the standard normal cdf Φ:
F(x) = (1/(√(2π) σ)) ∫_{−∞}^x exp[−((t − μ)/σ)^2 / 2] dt = Φ((x − μ)/σ),   where Φ(z) = (1/√(2π)) ∫_{−∞}^z exp[−u^2/2] du

∴ P(a < X < b) = P(z_a < Z < z_b) = Φ(z_b) − Φ(z_a),   where z_a = (a − μ)/σ and z_b = (b − μ)/σ
note that: Φ(∞) = 1

⑥The normal approximation to the Binomial Distribution:
The probability that, in n trials w/ a success probability of p each, the number of successes X falls in a given (standardized) range can be approximated as:

P(a ≤ (X − np)/√(np(1 − p)) ≤ b) ≈ Φ(b) − Φ(a)

The approximation is quite good when np(1-p)>10.

When using the normal approximation for a binomial probability P(X = k), use the normal probability P(k − .5 < X < k + .5) (standardize first); this is called the continuity correction. Likewise, for a binomial P(X > k), use the normal probability P(X ≥ k + .5).
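
A minimal sketch of the continuity correction, with Φ computed from math.erf (standard library only); n = 100, p = 0.4 and k = 45 are illustrative values:

from math import comb, erf, sqrt

def phi(x):
    # standard normal cdf: Φ(x) = (1 + erf(x/√2)) / 2
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p, k = 100, 0.4, 45               # illustrative values
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = comb(n, k) * p**k * (1 - p)**(n - k)
approx = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)
print(exact, approx)                 # ≈ 0.0478 (exact) vs ≈ 0.0484 (approximation)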

⑦The exponential Random Variable:


For λ > 0, w/ parameter λ:

f(x) = λe^{−λx} for x ≥ 0, and 0 for x < 0

⑧ Integration by Parts:
For calculating ∫ f(x)g(x) dx:

d/dx [f(x)g(x)] = f′(x)g(x) + f(x)g′(x)

Integrating both sides gives:

∫ d/dx [f(x)g(x)] dx = ∫ f′(x)g(x) dx + ∫ f(x)g′(x) dx

Rearranging gives the integration by parts formula:

∫ f(x)g′(x) dx = f(x)g(x) − ∫ f′(x)g(x) dx

Or, representing f(x) w/ u and g(x) w/ v gives:

∫ 𝑢𝑑𝑣 = 𝑢𝑣 − ∫ 𝑣𝑑𝑢

⑨The expectation and Variance of the Exponential Random Variable


The expected value is equal to 1/λ, and the variance is equal to 1/λ^2.
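
A brief numerical check of E[X] = 1/λ and Var(X) = 1/λ^2 by Riemann summation of the exponential density; λ = 2 is an illustrative value:

from math import exp

lam = 2.0                            # illustrative parameter
dx = 1e-4
xs = [k * dx for k in range(150_000)]   # integrate from 0 out to x = 15, far into the tail

mean = sum(x * lam * exp(-lam * x) for x in xs) * dx
ex2 = sum(x**2 * lam * exp(-lam * x) for x in xs) * dx
print(mean, 1 / lam)                 # both ≈ 0.5
print(ex2 - mean**2, 1 / lam**2)     # both ≈ 0.25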

⑪Memoryless Random Variables


Refer to Sheldon page 210
