Session 4-5 Reference: SFM Ch.5

DOM501
Probability distribution of a discrete variable
 A discrete variable takes only distinct, separated values. These values need not be integers, but they do not form a continuum.
 The probability distribution of a discrete variable is a mutually exclusive list of all possible numerical outcomes, along with the probability of each outcome occurring.
 E.g., the number of absentees in an office:
No. of absentees    Probability
0                   0.15
1                   0.35
2                   0.20
3                   0.15
4                   0.10
5                   0.05
Expected value of a discrete variable
 The expected value of a discrete variable is the weighted average of all its outcomes, the weights being the probabilities of those outcomes.
 $\mu = E(X) = \sum_{i=1}^{N} x_i \, P(X = x_i)$
 In the previous example, µ = 0.15(0) + 0.35(1) + 0.2(2) + 0.15(3) + 0.1(4) + 0.05(5) = 1.85
Variance and standard deviation of discrete variable
 The variance of a discrete variable is the sum, over all outcomes, of the squared difference between each outcome and the expected value, multiplied by the probability of that outcome.
 Variance $= \sum_{i=1}^{N} \left(x_i - E(X)\right)^2 P(X = x_i)$
 The standard deviation is the square root of the variance.
 $\sigma = \sqrt{\text{Variance}} = \sqrt{\sum_{i=1}^{N} \left(x_i - E(X)\right)^2 P(X = x_i)}$
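The definitions above can be checked numerically. The following is a minimal sketch that applies them to the absentee table; the function and variable names are illustrative choices, not part of the course material.

```python
import math

# Absentee distribution from the table above: outcome -> probability
absentees = {0: 0.15, 1: 0.35, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.05}

def expected_value(dist):
    """Probability-weighted average of the outcomes."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Probability-weighted squared deviation from the expected value."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mu = expected_value(absentees)
var = variance(absentees)
sd = math.sqrt(var)
print(round(mu, 4), round(var, 4), round(sd, 4))
```

The mean reproduces the 1.85 computed by hand; the variance works out to 1.9275.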
Covariance of a probability distribution
 Covariance measures the direction and strength of the linear relationship between two numerical random variables.
 A negative covariance indicates a negative relationship, and a positive covariance indicates a positive relationship (high values of one tend to occur with high values of the other). Independent variables have a covariance of 0, but a covariance of 0 only rules out a linear relationship; it does not by itself prove independence.
 Covariance $\sigma_{xy} = \sum_{i=1}^{N} \left(x_i - E(X)\right)\left(y_i - E(Y)\right) P(x_i y_i)$
 $P(x_i y_i)$ is the probability of both $x_i$ and $y_i$ occurring.
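As an illustration of the covariance formula, here is a minimal sketch using a made-up joint distribution of three scenarios. The numbers are hypothetical and chosen only so that X and Y move in opposite directions.

```python
# Hypothetical joint distribution: (x_i, y_i, P(x_i y_i)) for each scenario
scenarios = [
    (-100, 200, 0.2),
    (50, 50, 0.5),
    (250, -100, 0.3),
]

# Expected values of X and Y under the joint probabilities
e_x = sum(x * p for x, _, p in scenarios)
e_y = sum(y * p for _, y, p in scenarios)

# Covariance: probability-weighted product of the two deviations
cov_xy = sum((x - e_x) * (y - e_y) * p for x, y, p in scenarios)
print(e_x, e_y, cov_xy)
```

The result is negative, matching the inverse relationship built into the made-up data.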

Sum of two discrete variables
 Expected value of the sum of two variables:
E(X+Y) = E(X) + E(Y)
 Variance of the sum of two variables:
$\mathrm{Var}(X+Y) = \sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{xy}$
 Standard deviation:
$\sigma_{X+Y} = \sqrt{\mathrm{Var}(X+Y)}$
 A variable X has the following distribution:
X(i)    P(i)
10      0.2
12      0.1
15      0.4
20      0.3
 E(X) = 10(0.2) + 12(0.1) + 15(0.4) + 20(0.3) = 15.2
 Variance = (10 − 15.2)^2 (0.2) + (12 − 15.2)^2 (0.1) + (15 − 15.2)^2 (0.4) + (20 − 15.2)^2 (0.3) = 5.408 + 1.024 + 0.016 + 6.912 = 13.36
 Stdev = 13.36^0.5 = 3.655
 If there is a second variable Y with E(Y) = 21.4, Stdev(Y) = 4.22, and covariance = −1.56:
E(X+Y) = 15.2 + 21.4 = 36.6
Variance(X+Y) = 13.36 + 4.22^2 + 2(−1.56) = 28.05
Stdev(X+Y) = 28.05^0.5 = 5.296
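The worked example above can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative, and the values for Y are taken as given because its full distribution is not listed.

```python
import math

# Distribution of X from the table above: outcome -> probability
x_dist = {10: 0.2, 12: 0.1, 15: 0.4, 20: 0.3}

mean_x = sum(x * p for x, p in x_dist.items())
var_x = sum((x - mean_x) ** 2 * p for x, p in x_dist.items())

# Summary figures for Y and the covariance, as stated in the example
mean_y, sd_y, cov_xy = 21.4, 4.22, -1.56

mean_sum = mean_x + mean_y                # E(X+Y) = E(X) + E(Y)
var_sum = var_x + sd_y ** 2 + 2 * cov_xy  # Var(X+Y) includes twice the covariance
sd_sum = math.sqrt(var_sum)
print(round(mean_sum, 2), round(var_sum, 2), round(sd_sum, 3))
```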
The Uniform Distribution
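 Also called the rectangular distribution. Every value in its range has the same chance of occurring.
 For a discrete variable taking the integer values a, a+1, ..., b, where b is Max(X) and a is Min(X):
$f(X) = \frac{1}{b - a + 1}$
Mean: $\mu = \frac{a + b}{2}$
Variance: $\sigma^2 = \frac{(b - a + 1)^2 - 1}{12}$
Standard deviation: $\sigma = \sqrt{\sigma^2}$
 For a continuous variable on the interval [a, b], the density is $f(X) = \frac{1}{b - a}$, the mean is the same, and the variance is $\sigma^2 = \frac{(b - a)^2}{12}$.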
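As a quick check of the discrete case, the sketch below computes the mean and variance of a fair six-sided die (a = 1, b = 6) directly from the definitions. The die is an assumed example. For equally likely integer values the variance works out to ((b − a + 1)^2 − 1)/12, whereas (b − a)^2/12 is the variance of a continuous uniform variable on [a, b].

```python
from fractions import Fraction

a, b = 1, 6                    # assumed example: a fair six-sided die
n_values = b - a + 1           # number of equally likely integer outcomes
pmf = Fraction(1, n_values)    # f(X) = 1 / (b - a + 1)

values = range(a, b + 1)
mean = sum(x * pmf for x in values)
var = sum((x - mean) ** 2 * pmf for x in values)

discrete_formula = Fraction(n_values ** 2 - 1, 12)
continuous_formula = Fraction((b - a) ** 2, 12)
print(mean, var, discrete_formula, continuous_formula)
# prints: 7/2 35/12 35/12 25/12
```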
The Binomial Distribution
 A discrete probability distribution generated by a Bernoulli process, which has the following properties:
 It is a series of trials, and each trial has only two outcomes, with probabilities p and 1−p
 The value of p stays fixed over the course of the process
 The trials are statistically independent
 If there are n trials, the probability of obtaining exactly r successes (r <= n) is given by the binomial formula (let q = 1−p):
$P(r) = \frac{n!}{r!\,(n-r)!}\, p^r q^{n-r}$
 A roulette table has 18 black numbers, 18 red, and 1 green. A player bets on either red or black in each game. What is the chance of more than 3 wins in 5 games?
p = 18/37, q = 1 − 18/37 = 19/37
n = 5, P(x>3) = P(x=4) + P(x=5)
P(x=4) = 5C4 * (18/37)^4 * (19/37) ≈ 0.1438
P(x=5) = 5C5 * (18/37)^5 ≈ 0.0272
P(x>3) ≈ 0.171
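The roulette calculation can be carried through numerically. The following is a minimal sketch of the binomial formula using Python's `math.comb`; the helper name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

p_win = 18 / 37  # chance that a red (or black) bet wins a single game

# More than 3 wins in 5 games means exactly 4 or exactly 5 wins
p_more_than_3 = sum(binomial_pmf(r, 5, p_win) for r in (4, 5))
print(round(p_more_than_3, 4))
```

The result is about 0.171, so more than 3 wins in 5 games happens roughly one time in six.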
Graphical results of the binomial distribution
 When p is small (around 0.1) the distribution is right-skewed
 As p increases, the skewness is less noticeable until it is
symmetrical at p=0.5
 As p increases beyond 0.5, the distribution starts being skewed to
the left.
 The probabilities of the outcomes for a given value of p are the same as those for q = 1−p, except in reverse order: P(r successes at p) = P(n−r successes at q).
 Q: In 10 tosses of a fair coin, what are the chances of a) exactly 7 heads, b) fewer than 5 heads?
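One way to work the coin-toss question, and to see the reverse-order property, is sketched below. The helper `binomial_pmf` and the choice of p = 0.1 for the mirror check are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n = 10

# a) exactly 7 heads; b) fewer than 5 heads means 0, 1, 2, 3 or 4 heads
p_exactly_7 = binomial_pmf(7, n, 0.5)
p_fewer_than_5 = sum(binomial_pmf(r, n, 0.5) for r in range(5))
print(round(p_exactly_7, 4), round(p_fewer_than_5, 4))

# Mirror property: the distribution at p equals the one at q = 1 - p reversed
p = 0.1
forward = [binomial_pmf(r, n, p) for r in range(n + 1)]
mirrored = [binomial_pmf(n - r, n, 1 - p) for r in range(n + 1)]
print(all(abs(f - m) < 1e-12 for f, m in zip(forward, mirrored)))
```

In exact terms the answers are 120/1024 for part a) and 386/1024 for part b).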
Central Tendency of the Binomial Distribution
 Mean: np
 Variance: npq
 Standard deviation: $\sqrt{npq}$
 On a 6-sided die, there is a 1/6 chance of rolling a 6.
 Over 10 trials, mean = np = 10/6 ≈ 1.67, variance = npq = 50/36 ≈ 1.39
 Stdev = (50/36)^0.5 ≈ 1.18
 Final note: To apply the binomial distribution, we must first ensure that
the process meets the conditions for a Bernoulli Process.
 Excel command: binom.dist(x,n,p,cumulative) or
binom.dist.range(n,p,x1,x2)
Hypergeometric Distribution
 In the binomial distribution, the sample data are selected with replacement from a finite pool (or without replacement from an infinite pool). The hypergeometric distribution applies when the samples are taken from a finite pool without replacement.
 If a sample of n is taken from a population of N, and A members of the population are of interest, then the probability of exactly x successes in the sample is:
$P(X = x \mid n, N, A) = \frac{{}_{A}C_{x} \; {}_{N-A}C_{n-x}}{{}_{N}C_{n}}$
 Mean $= \frac{nA}{N}$, Std. dev. $\sigma = \sqrt{\frac{nA(N-A)}{N^2}} \cdot \sqrt{\frac{N-n}{N-1}}$
 Excel command: hypgeom.dist(x,n,A,N,cumulative)
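The hypergeometric formula translates directly into code. The sketch below uses hypothetical numbers (a pool of N = 30 with A = 20 members of interest and a sample of n = 4) purely for illustration; the helper name is also an assumption.

```python
from math import comb, sqrt

def hypergeom_pmf(x, n, A, N):
    """P(exactly x successes) when sampling n from N without replacement."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

# Hypothetical pool: 30 members, 20 of interest, sample of 4
N, A, n = 30, 20, 4

p_two = hypergeom_pmf(2, n, A, N)
mean = n * A / N
# The second factor is the finite population correction
sd = sqrt(n * A * (N - A) / N ** 2) * sqrt((N - n) / (N - 1))
total = sum(hypergeom_pmf(x, n, A, N) for x in range(n + 1))
print(round(p_two, 4), round(mean, 4), round(sd, 4), round(total, 4))
```

The probabilities over all possible x sum to 1, which is a useful sanity check on the argument order.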
Poisson Distribution
 Characteristics of the Poisson process:
 The process applies to a discrete random variable that takes non-negative integer values
 The average value of the random variable over the given time period is already known or can be calculated from past data
 In any one second, the probability of a positive outcome is very small and fixed
 In any one second, the probability of two or more positive outcomes is so small that we can treat it as zero
 The probability of a positive outcome in any given second is independent of when that second occurs and of the result in any other second
The Poisson Formula
 Let λ be the mean number of occurrences in the interval of time under study.
 e is the base of the natural logarithm system, approx. 2.71828
 Poisson probability of x incidents occurring:
$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
 If a binomial process has a large number of trials (n>20) and a small probability of success (p<0.05), we can use the Poisson formula after substituting the binomial mean np for λ:
$P(x) = \frac{(np)^x e^{-np}}{x!}$
 Excel command: poisson.dist(x, mean, cumulative)
 A lawyer receives an average of 6 clients a day. What is the chance of getting at least 3 clients in a day? Client arrivals follow a Poisson distribution.
λ = 6, P(x>=3) = P(x=3) + P(x=4) + … = 1 − [P(x=0) + P(x=1) + P(x=2)]
P(x=0) = $e^{-6}$
P(x=1) = $6e^{-6}$
P(x=2) = $\frac{6^2 e^{-6}}{2!} = 18e^{-6}$
 Ans: $1 - 25e^{-6} \approx 0.938$
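The lawyer example can be evaluated numerically with a direct transcription of the Poisson formula. This is a minimal sketch; the helper name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean count is lam."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 6  # average number of clients per day

# At least 3 clients is the complement of 0, 1 or 2 clients
p_at_least_3 = 1 - sum(poisson_pmf(x, lam) for x in range(3))
print(round(p_at_least_3, 4))
```

This agrees with the closed form 1 − 25e^(−6), roughly 0.938.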
The Exponential Distribution
 Right-skewed; ranges from zero to +infinity. Defined by a single parameter λ, the mean number of occurrences per unit time.
 Mean = 1/λ = standard deviation = average time between occurrences
 Density: $f(x) = \lambda e^{-\lambda x}$
 $P(X \le x) = 1 - e^{-\lambda x}$
 Excel command: expon.dist(x, lambda, cumulative)
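 A lawyer receives 6 clients a day on average. Treating this figure as the mean of an exponentially distributed variable X, what is the chance that X is at least 3?
1/λ = 6, λ = 1/6
Because X is continuous, P(X>=3) = 1 − P(X<=3)
P(X<=3) = $1 - e^{-3/6}$
 Ans: 1 − P(X<=3) = $e^{-3/6}$ = 0.6065
 The discrete-style complement 1 − P(X<=2) = $e^{-2/6}$ = 0.7165 is P(X>2), not P(X>=3).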
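A minimal sketch of the exponential cumulative probability for λ = 1/6, the rate used in the example above. It evaluates the upper tail at both 2 and 3, because for a continuous variable P(X ≥ 3) is the tail beyond 3 rather than the complement of P(X ≤ 2). The helper name is an illustrative choice.

```python
from math import exp

def exponential_cdf(x, lam):
    """P(X <= x) for an exponential variable with rate lam."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

lam = 1 / 6  # rate, so the mean 1/lam equals 6

tail_beyond_2 = 1 - exponential_cdf(2, lam)  # P(X > 2) = e^(-2/6)
tail_beyond_3 = 1 - exponential_cdf(3, lam)  # P(X > 3) = e^(-3/6)
print(round(tail_beyond_2, 4), round(tail_beyond_3, 4))
# prints: 0.7165 0.6065
```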
Problem
 On average, 5 candidates are selected out of every 50. What is the probability of between 3 and 5 selections (inclusive) among 50 candidates?
 Binom.dist.range(50,(5/50),3,5)=0.5044
The normal distribution
 It is a continuous probability distribution, also called the Gaussian distribution.
 It can be used to approximate discrete distributions when samples are sufficiently large. For example, it can approximate the binomial distribution if np and nq are both greater than 5.
 It is symmetrical and bell-shaped, and its interquartile range runs from −0.67 to +0.67 standard deviations around the mean.
 Every continuous random variable has a probability density function, which gives the relative likelihood of the variable taking values near a particular point. The probability of any single exact value is zero.
 Integrating the density between two values X1 and X2 gives the probability of the variable falling between those two values.
Normal Distribution from Z-score
 For a normal distribution, we can find the probability of the variable being below a certain value by using the Z-table.
 Calculate $Z = \frac{X - \mu}{\sigma}$; the corresponding value in the table shows the probability of the variable being less than or equal to that value.
 To find the probability of the variable being between X1 and X2: P(X1<X<X2) = P(Z(X2)) − P(Z(X1)).
 P(X<X1) = P(Z(X1))
 P(X>X1) = 1 − P(Z(X1))
 If the mean is 200 and the standard deviation is 50, Z(168) = (168 − 200)/50 = −0.64, and P(Z(168)) = 0.2611 = P(X<168)
 P(X>168) = 1 − P(X<168) = 0.7389
 P(168<X<250) = P(X<250) − P(X<168) = 0.8413 − 0.2611 = 0.5802, where Z(250) = (250 − 200)/50 = 1
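The Z-table lookups in this example can be cross-checked with Python's standard-library `statistics.NormalDist` (available from Python 3.8). This is a minimal sketch; the results may differ from the slide values in the fourth decimal place because the Z-table entries are rounded.

```python
from statistics import NormalDist

# Normal distribution with mean 200 and standard deviation 50
dist = NormalDist(mu=200, sigma=50)

z_168 = (168 - dist.mean) / dist.stdev     # Z = (X - mu) / sigma
p_below_168 = dist.cdf(168)                # P(X < 168)
p_above_168 = 1 - p_below_168              # P(X > 168)
p_between = dist.cdf(250) - dist.cdf(168)  # P(168 < X < 250)

print(round(z_168, 2), round(p_below_168, 4),
      round(p_above_168, 4), round(p_between, 4))
```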
 Excel commands for Normal Distribution
 NORM.DIST(x, Mean, stdev, 1): Returns the probability of a random variable being
less than or equal to x, for a given Mean and Stdev.
 NORM.INV(p, Mean, Stdev): Finds the value x for which P(X<=x)=p, for a given
mean and stdev.
 NORM.S.DIST(z,1): Returns the corresponding probability value for a certain Z-score
(can act as substitute for Z-table)
 NORM.S.INV(p): Returns the Z-score for a certain probability (can act as substitute
for Z-table)
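For readers working outside Excel, the four commands have close counterparts in Python's `statistics.NormalDist`. The mapping in the comments below is an informal correspondence rather than an exact equivalence, and the mean of 200 and standard deviation of 50 are reused from the earlier example.

```python
from statistics import NormalDist

dist = NormalDist(mu=200, sigma=50)  # general normal distribution
standard = NormalDist()              # standard normal: mean 0, stdev 1

p = dist.cdf(168)                # roughly NORM.DIST(168, 200, 50, 1)
x_back = dist.inv_cdf(p)         # roughly NORM.INV(p, 200, 50)
p_z = standard.cdf(-0.64)        # roughly NORM.S.DIST(-0.64, 1)
z_back = standard.inv_cdf(p_z)   # roughly NORM.S.INV(p_z)

print(round(p, 4), round(x_back, 2), round(p_z, 4), round(z_back, 2))
```

The inverse functions recover the original x and Z values, which shows how the pairs of commands undo each other.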
