Chapter 4
2023-2024
Uncertain Knowledge
Example
• Consider two boxes:
– 1 blue box: 3 apples and 1 orange
– 1 red box: 2 apples and 3 oranges
• Choose randomly one box and extract randomly one fruit from
that box. The fruit is then returned to its original box.
• We can define two random variables:
– Color of the chosen box $B$, with $\mathrm{dom}(B) = \{red, blue\}$
– Chosen fruit $F$, with $\mathrm{dom}(F) = \{orange, apple\}$
Propositions and Formulae
• An atomic or primitive proposition is:
– An expression $X \; op \; v$ such that $v \in \mathrm{dom}(X)$ and $op$ is a binary relation on $\mathrm{dom}(X) \times \mathrm{dom}(X)$. For example, $op \in \{=, \neq, <, \leq, >, \geq\}$ when the domain of $X$ is a totally ordered set (for example, a set of numbers).
– An expression $X \; op \; Y$ such that $op$ is a binary relation on $\mathrm{dom}(X) \times \mathrm{dom}(Y)$.
Random Variables vs Possible Worlds
• A possible world is an assignment of a value to each
random variable.
• So, if $\mathcal{V}$ is the set of variables and $\mathcal{D}$ is the union of all their domains, then a possible world is a function $\omega : \mathcal{V} \to \mathcal{D}$ such that $\omega(X) \in \mathrm{dom}(X)$ for every $X \in \mathcal{V}$.
• A random variable can also be defined as a function on the set of possible worlds with values in the variable's domain.
• Let $\Omega$ be the set of all possible worlds. A random variable is a function $X : \Omega \to \mathrm{dom}(X)$. The value of the variable in a world (state) $\omega \in \Omega$ is $X(\omega) \in \mathrm{dom}(X)$.
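The two views of a random variable can be illustrated with a minimal Python sketch for the two-box example. The names `domains`, `worlds` and `random_variable` are chosen here for illustration only and do not come from the slides.

```python
from itertools import product

# Domains of the two random variables from the boxes example.
domains = {"B": ["red", "blue"], "F": ["orange", "apple"]}

# A possible world assigns one value to every variable.
variables = list(domains)
worlds = [dict(zip(variables, values))
          for values in product(*(domains[v] for v in variables))]

def random_variable(name):
    """Return the variable as a function X: world -> dom(X)."""
    return lambda world: world[name]

B = random_variable("B")
F = random_variable("F")

# Each world is a function from variables to values; each variable
# is a function from worlds to values.
for w in worlds:
    print(w, "B =", B(w), "F =", F(w))
```

With two variables of two values each, the sketch enumerates four possible worlds.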
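Formulae Semantics
• Let $\mathcal{F}$ be the set of formulae. The relation $\models \; \subseteq \Omega \times \mathcal{F}$ defines the truth value of a formula $f \in \mathcal{F}$ in a world $\omega \in \Omega$. It is defined inductively as follows:
– $\omega \models X \; op \; v$ if the value assigned by $\omega$ to $X$ is in relation $op$ with $v$, i.e. $X(\omega) \; op \; v$.
– $\omega \models X \; op \; Y$ if the value assigned by $\omega$ to $X$ is in relation $op$ with the value assigned by $\omega$ to $Y$, i.e. $X(\omega) \; op \; Y(\omega)$.
– $\omega \models f \wedge g$ if $\omega \models f$ and $\omega \models g$
– $\omega \models f \vee g$ if $\omega \models f$ or $\omega \models g$ or both
– $\omega \models \neg f$ if $\omega \not\models f$
• A tautology is a formula that is true in all worlds.
• A contradiction (or impossible event) is a formula that is false in all worlds.
• $f$ and $g$ are mutually exclusive if $f \wedge g$ is a contradiction.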
Model
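• Every formula $f \in \mathcal{F}$ defines a subset $\Omega_f \subseteq \Omega$, called the set of models of $f$:
$\Omega_f = \{\omega \in \Omega \mid \omega \models f\} \subseteq \Omega$
• If $f$ is a tautology (universally true) then $\Omega_f = \Omega$.
• Example: $\bigvee_{v \in \mathrm{dom}(X)} (X = v)$ is a tautology.
• If $f$ is a contradiction (universally false, impossible) then $\Omega_f = \emptyset$.
• Example: if $v, w \in \mathrm{dom}(X)$ and $v \neq w$ then $(X = v) \wedge (X = w)$ is a contradiction.
• Observation: $f$ is a tautology iff $\neg f$ is a contradiction.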
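As a rough illustration, the set of models can be computed by brute force when the set of worlds is finite. The sketch below reuses the two-box variables and represents a formula as a Python predicate on worlds; this representation and the helper names are assumptions made for the example.

```python
from itertools import product

domains = {"B": ["red", "blue"], "F": ["orange", "apple"]}
names = list(domains)
worlds = [dict(zip(names, vals)) for vals in product(*domains.values())]

def models(formula):
    """Set of models of a formula: the worlds in which it is true."""
    return [w for w in worlds if formula(w)]

def is_tautology(formula):
    return len(models(formula)) == len(worlds)

def is_contradiction(formula):
    return len(models(formula)) == 0

# Disjunction of F = v over all v in dom(F): true in every world.
some_fruit = lambda w: any(w["F"] == v for v in domains["F"])
# (B = red) and (B = blue): false in every world.
two_colors = lambda w: w["B"] == "red" and w["B"] == "blue"
# B = red: true in some worlds only.
red_box = lambda w: w["B"] == "red"

print(is_tautology(some_fruit), is_contradiction(two_colors), len(models(red_box)))
```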
Set of models: σ-algebras
• Let $\Omega$ be the set of possible worlds. Not every collection of subsets of $\Omega$ (i.e. not every subset of $2^\Omega$) can be used to define probabilities.
• The subsets must be measurable. Mathematically, the collection of subsets must form a σ-algebra: it contains $\Omega$ and is closed under complement and countable union. So, in the general case, the semantics of probability is based on measure theory.
Examples of σ-algebras
• $\{\emptyset, \Omega\}$ is called the trivial σ-algebra. It is useful for defining the algebra of Boolean functions (logical formulae with two possible interpretations: true and false).
• $2^\Omega$ is called the power σ-algebra. It is useful for defining discrete probability distributions.
• If $\Omega = \mathbb{R}$, we consider the collection of finite or infinite, closed or open real intervals, together with the sets obtained from them by complements, countable unions and countable intersections. Its elements are measurable sets of real numbers (intuitively, their "length" can be defined) and the collection is called the Borel algebra. It is useful for defining probability distributions of continuous real random variables. The elements of this collection are called Borel sets.
Probability Measure
• We first consider the finite (discrete) case. In other words, we assume that:
– we have a finite number of discrete random variables
– the domain of each random variable is finite
• Each possible world $\omega \in \Omega$ is assigned a numerical measure $m(\omega)$, called the probability measure, with the following properties:
$0 \leq m(\omega)$ for every $\omega \in \Omega$
$\sum_{\omega \in \Omega} m(\omega) = 1$
Specifying the Full Joint Probability Distribution for Finite Domains
• Let $\mathcal{X}_i = \mathrm{dom}(X_i)$ be finite. The probabilities $P(x_1, x_2, \ldots, x_n)$ for all possible values $x_i \in \mathcal{X}_i$ are defined by a table with $n$ columns.
• These values sum to 1. Defining the full joint distribution requires $\prod_{i=1}^{n} |\mathcal{X}_i| - 1$ values. For $n$ Boolean random variables, $2^n - 1$ values are required.
Semantics of Conditional Probability
• Let $e$ be a piece of evidence with $P(e) > 0$ (if $P(e) = 0$ then $e$ is impossible and therefore cannot be observed). Observing $e$ eliminates all worlds incompatible with $e$. It thus induces a new measure of belief $m_e$, defined as:
$m_e(\omega) = \begin{cases} \frac{1}{P(e)} \times m(\omega) & \text{if } \omega \models e \\ 0 & \text{otherwise} \end{cases}$
• The conditional probability of hypothesis $h$ given evidence $e$ is defined by the general formula for the probability of a formula (the sum of the measures of its models), but using the new measure $m_e$:
$P(h \mid e) = \sum_{\omega \models h} m_e(\omega) = \frac{1}{P(e)} \times \sum_{\omega \models h \wedge e} m(\omega) = \frac{P(h \wedge e)}{P(e)}$
• The equation $P(h \mid e) = \frac{P(h \wedge e)}{P(e)}$ for $P(e) > 0$ is often taken as the definition of conditional probability.
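A minimal sketch of this construction for the two-box example follows. It assumes that each box is chosen with probability 1/2, which the example suggests but does not state; the helper names `prob` and `condition` are illustrative. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction as Fr

# Measure m over the four possible worlds (box colour, fruit).
# Assumption: each box is chosen with probability 1/2.
m = {
    ("blue", "apple"):  Fr(1, 2) * Fr(3, 4),
    ("blue", "orange"): Fr(1, 2) * Fr(1, 4),
    ("red", "apple"):   Fr(1, 2) * Fr(2, 5),
    ("red", "orange"):  Fr(1, 2) * Fr(3, 5),
}

def prob(formula, measure):
    """P(f): sum of the measures of the worlds that satisfy f."""
    return sum(p for w, p in measure.items() if formula(w))

def condition(e, measure):
    """Measure m_e induced by observing evidence e (requires P(e) > 0)."""
    p_e = prob(e, measure)
    if p_e == 0:
        raise ValueError("evidence has probability 0")
    return {w: (p / p_e if e(w) else Fr(0)) for w, p in measure.items()}

is_orange = lambda w: w[1] == "orange"
is_red = lambda w: w[0] == "red"

m_e = condition(is_orange, m)
p_red_given_orange = prob(is_red, m_e)
print(p_red_given_orange)
```

The value obtained from $m_e$ coincides with $P(h \wedge e) / P(e)$ computed from the original measure.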
Example
• It is known that:
– 90% of trains depart on time (event $P$): $P(P) = 0.9$
– 80% of trains arrive on time (event $A$): $P(A) = 0.8$
– 75% of trains depart on time and arrive on time (event $P \wedge A$): $P(P \wedge A) = 0.75$
• Questions:
1. You caught a train that departed on time. What is the probability that it arrives on time?
$P(A \mid P) = \frac{P(A \wedge P)}{P(P)} = \frac{0.75}{0.9} \approx 0.8333$
2. You got off a train that arrived on time. What is the probability that it departed on time?
$P(P \mid A) = \frac{P(P \wedge A)}{P(A)} = \frac{0.75}{0.8} = 0.9375$
3. Are the events $P$ and $A$ independent? Homework! (See the following slides for the definition of independence.)
Chain Rule for Conditional Probabilities
$P(f_1 \wedge f_2 \wedge f_3) = P(f_1) \times P(f_2 \mid f_1) \times P(f_3 \mid f_1 \wedge f_2)$
Bayes' Inversion Formula
• Proposition. If $P(e) \neq 0$ then: $P(h \mid e) = \frac{P(e \mid h) \times P(h)}{P(e)}$
• Proof.
$P(h \wedge e) = P(h \mid e) \times P(e)$
$P(e \wedge h) = P(e \mid h) \times P(h)$
The two left-hand sides are equal, so equating the right-hand sides gives Bayes' formula immediately.
• Interpretation. Bayes' formula shows how to compute the posterior probability $P(h \mid e)$ from the prior probability $P(h)$ and the likelihood $P(e \mid h)$ of hypothesis $h$. Thus the posterior probability is proportional to the product of the prior probability and the likelihood.
• Likelihood. The likelihood $P(e \mid h)$ of hypothesis $h$ is the conditional probability of observing evidence $e$ given that hypothesis $h$ holds (informally, how well $h$ explains $e$).
Applying Bayes' Inversion Formula
• In the general case, suppose we have a complete set of mutually exclusive hypotheses $\{h_1, \ldots, h_n\}$. This means that for every $1 \leq i < j \leq n$ the formula $\neg(h_i \wedge h_j)$ is a tautology and that $h_1 \vee \cdots \vee h_n$ is a tautology. Applying the rule of reasoning by cases:
$P(e) = \sum_{i=1}^{n} P(e \wedge h_i) = \sum_{i=1}^{n} P(e \mid h_i) P(h_i)$
$P(h_j \mid e) = \frac{P(e \mid h_j) P(h_j)}{\sum_{i=1}^{n} P(e \mid h_i) P(h_i)}$
Example
• A test for detecting the use of a drug is 98% sensitive and 98% specific. It is known that 0.4% of the population are users of the drug. What is the probability that a person who tests positive is a user of the drug?
• Sensitivity: the probability that users of the drug obtain true-positive results.
• Specificity: the probability that non-users of the drug obtain true-negative results.
• Let $TP$ denote a positive test result and $U$ denote that the person is a user:
$P(TP \mid U) = 0.98$
$P(\neg TP \mid \neg U) = 0.98$
$P(U) = 0.004$
$P(U \mid TP) = ?$
$P(U \mid TP) = \frac{P(TP \mid U) P(U)}{P(TP)}$
$P(TP) = P(TP \wedge U) + P(TP \wedge \neg U) = P(TP \mid U) P(U) + P(TP \mid \neg U) P(\neg U) = P(TP \mid U) P(U) + (1 - P(\neg TP \mid \neg U))(1 - P(U)) = 0.98 \times 0.004 + (1 - 0.98) \times (1 - 0.004) = 0.02384$
• The requested probability is approximately 16.44%. Why?
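The computation can be reproduced with a short Python sketch of Bayes' formula over a complete set of mutually exclusive hypotheses, here "user" and "non-user". The function name `posterior` and the variable names are illustrative choices, not part of the slides.

```python
def posterior(priors, likelihoods):
    """Posterior P(h_j | e) for a complete set of mutually exclusive hypotheses.

    priors[i] = P(h_i), likelihoods[i] = P(e | h_i).
    """
    # Reasoning by cases: P(e) = sum_i P(e | h_i) P(h_i)
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    if evidence == 0:
        raise ValueError("evidence has probability 0")
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

sensitivity = 0.98   # P(TP | U)
specificity = 0.98   # P(not TP | not U)
p_user = 0.004       # P(U)

priors = [p_user, 1 - p_user]                 # [P(U), P(not U)]
likelihoods = [sensitivity, 1 - specificity]  # [P(TP | U), P(TP | not U)]

p_user_given_positive, p_nonuser_given_positive = posterior(priors, likelihoods)
print(round(p_user_given_positive, 4))  # 0.1644
```

The posterior is small because the prior is small: the false positives among the 99.6% of non-users (0.02 × 0.996 = 0.01992) outweigh the true positives among users (0.98 × 0.004 = 0.00392).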
Bayes' Formula with Conditional Probabilities
$P(h \mid e \wedge k) = \frac{P(e \mid h \wedge k) \times P(h \mid k)}{P(e \mid k)}$
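Conditional Independence
• Definition. $h$ is conditionally independent of $f$ given $e$, written $h \perp f \mid e$, iff $P(h \mid f \wedge e) = P(h \mid e)$. In other words, knowing $f$ does not affect the belief in $h$ given $e$.
• Proof (the relation is symmetric in $h$ and $f$).
$P(h \mid e) = P(h \mid f \wedge e)$ iff
$\frac{P(h \wedge e)}{P(e)} = \frac{P(h \wedge f \wedge e)}{P(f \wedge e)}$ iff
$\frac{P(f \wedge e)}{P(e)} = \frac{P(h \wedge f \wedge e)}{P(h \wedge e)}$ iff
$P(f \mid e) = P(f \mid h \wedge e)$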
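Conditional Independence – Properties
• Proposition. $h \perp f \mid e$ iff $P(h \mid e) P(f \mid e) = P(h \wedge f \mid e)$
• Proof.
$P(h \mid e) = P(h \mid f \wedge e)$ iff
$\frac{P(h \wedge e)}{P(e)} = \frac{P(h \wedge f \wedge e)}{P(f \wedge e)}$ iff
$\frac{P(h \wedge e) P(f \wedge e)}{P(e)} = P(h \wedge f \wedge e)$ iff
$\frac{P(h \wedge e)}{P(e)} \times \frac{P(f \wedge e)}{P(e)} = \frac{P(h \wedge f \wedge e)}{P(e)}$ iff
$P(h \mid e) P(f \mid e) = P(h \wedge f \mid e)$
• Observation: $h \perp f$ iff $P(h \wedge f) = P(h) P(f)$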
The Birthday Problem
• Consider a group of $k$ people. Determine the probability that at least two people in the group share the same birthday (leap years are ignored).
• Let $n = 365$ be the number of days in a year. Let $X_1, X_2, \ldots, X_k$ be random variables such that $X_i \in \{1, 2, \ldots, n\}$ is the birthday of person $i \in \{1, 2, \ldots, k\}$. We must determine:
$P\left(\bigvee_{1 \leq i < j \leq k} X_i = X_j\right) = 1 - P\left(\bigwedge_{1 \leq i < j \leq k} X_i \neq X_j\right)$
• There are $n^k$ possible, equally likely values of the tuple $(X_1, X_2, \ldots, X_k)$. Among these, $A_n^k = n(n-1)\cdots(n-k+1)$ tuples (arrangements of $n$ taken $k$ at a time) have all values different. It follows that:
$P\left(\bigwedge_{1 \leq i < j \leq k} X_i \neq X_j\right) = \frac{(n-1)(n-2)\cdots(n-k+1)}{n^{k-1}} = \prod_{i=1}^{k-1} \frac{n-i}{n}$
• What is the minimum $k$ such that the probability that two people were born on the same day is at least 0.5?
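Python Solution
import numpy as np
import matplotlib.pyplot as plt

n = 365                 # number of days in a year
p_min = 0.5             # required probability of a shared birthday
gasit = False           # becomes True once the minimum k has been reported
prob = np.zeros(n + 1)  # prob[k] = P(all k birthdays are different)
for k in range(2, n + 1):
    prob_k = 1.0
    # product of (n - i) / n for i = 1, ..., k - 1
    for i in range(n - k + 1, n):
        prob_k *= i / n
    prob[k] = prob_k
    if prob_k <= 1 - p_min and not gasit:
        print("k =", k, "Prob =", 1 - prob_k)
        gasit = True
print(prob[2:])
plt.plot(range(2, n + 1), 1.0 - prob[2:])
plt.show()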
Problems
1. Three students, X, Y and Z, forgot to sign their papers in a written exam. Based on the students' previous results, the teacher knows that these students write a good paper with probabilities 0.8, 0.7 and 0.6, respectively. After grading all the papers, the teacher observed that, among the 3 unsigned papers, 2 have good results and one has weak results. Assuming that the students worked independently, what is the probability that the weak paper was written by student Z?
2. A program contains two modules. The first module contains an error with probability 0.2. The second module contains an error with probability 0.4. An error in the first module causes the program to freeze with probability 0.5, while an error in the second module causes the program to freeze with probability 0.8. If there are errors in both modules, the program freezes with probability 0.9. Given that the program froze, what is the probability that errors exist in both modules?
Probability Distributions
• We introduce a few basic notions related to probability distributions, for both the discrete and the continuous case.
$P_X(x) = P(X = x)$
• We present basic definitions of:
– Expected value
– Variance, covariance and standard deviation
– Correlation
• Properties:
1. $\mathbb{E}[aX + bY + c] = a\mathbb{E}[X] + b\mathbb{E}[Y] + c$
2. If $X \perp Y$ then $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$
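Example
• Determine the average number of die rolls needed to obtain a 6.
• Let $X$ be a random variable representing the number of die rolls until we obtain a 6.
$\mathrm{dom}(X) = \{1, 2, \ldots\} = \mathbb{N}^*$
• We must find $\mathbb{E}[X]$. Following the definition:
$\mathbb{E}[X] = \sum_{i=1}^{\infty} i \cdot P(X = i)$
$P(X = 1) = \frac{1}{6}$, $P(X = 2) = \frac{5}{6} \cdot \frac{1}{6}$, $P(X = 3) = \left(\frac{5}{6}\right)^2 \cdot \frac{1}{6}$, …
$\mathbb{E}[X] = \frac{1}{6} \cdot \sum_{i=1}^{\infty} i \cdot \left(\frac{5}{6}\right)^{i-1} = \frac{1}{6} \cdot S$
• We observe that:
$S = \left(1 + \frac{5}{6} + \left(\frac{5}{6}\right)^2 + \cdots\right) + \frac{5}{6} \cdot S = \frac{1}{1 - \frac{5}{6}} + \frac{5}{6} \cdot S = 6 + \frac{5}{6} \cdot S$
It follows that $S = 6^2 = 36$, so $\mathbb{E}[X] = 6$.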
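The result can be checked numerically by truncating the series. The sketch below sums the first `terms` terms of the definition of the expected value; the function name and the truncation length are arbitrary choices for illustration.

```python
def expected_rolls(p=1/6, terms=1000):
    """Partial sum of sum_i i * P(X = i), with P(X = i) = (1 - p)^(i - 1) * p."""
    q = 1 - p
    return sum(i * q ** (i - 1) * p for i in range(1, terms + 1))

print(expected_rolls())       # close to 6 for a fair die
print(expected_rolls(p=0.5))  # close to 2 for a fair coin
```

The discarded tail is negligible because the terms decay geometrically.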
Homework
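Variance and Standard Deviation
• Definition.
– The variance of random variable $X$, denoted by $\mathrm{Var}(X)$, is defined by:
$\mathrm{Var}(X) = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right]$
– The standard deviation of random variable $X$, denoted by $\mathrm{Std}(X)$, is defined by:
$\sigma_X = \mathrm{Std}(X) = \sqrt{\mathrm{Var}(X)}$
• Property.
$\mathrm{Var}(X) = \mathbb{E}[X^2] - \mu_X^2$, where $\mu_X = \mathbb{E}[X]$
• HW: Prove this property.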
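Covariance
• Definition. The covariance of two random variables $X$ and $Y$, denoted by $\mathrm{Cov}(X, Y)$ or $\sigma_{XY}$, is defined by:
$\mathrm{Cov}(X, Y) = \mathbb{E}\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right]$
• Properties:
1. $\mathrm{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$
2. If $X \perp Y$ then $\mathrm{Cov}(X, Y) = 0$
3. $\mathrm{Var}(X) = \mathrm{Cov}(X, X)$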
Correlation
• Definition. The correlation of two random variables $X$ and $Y$, denoted by $\rho_{XY}$, is defined by:
$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Std}(X)\,\mathrm{Std}(Y)}$
• Property:
$|\rho_{XY}| \leq 1$
• Observations:
1. If $\rho_{XY} = 0$ then $X$ and $Y$ are uncorrelated.
2. If $\rho_{XY} \approx 0$ then $X$ and $Y$ are weakly correlated.
3. If $\rho_{XY} > 0$ then $X$ and $Y$ are positively correlated.
4. If $\rho_{XY} < 0$ then $X$ and $Y$ are negatively correlated.
5. If $|\rho_{XY}| \approx 1$ then $X$ and $Y$ are strongly correlated (positively or negatively, depending on the sign of $\rho_{XY}$).
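The definitions can be tried on a small joint distribution. The joint pmf below is hypothetical, chosen only to illustrate the computation; the helper `expect` is likewise an illustrative name.

```python
from math import sqrt

# Hypothetical joint pmf P(X = x, Y = y) of two binary variables
# (values chosen only for illustration).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)

# Covariance by definition and by the property Cov = E[XY] - E[X]E[Y].
cov_def = expect(lambda x, y: (x - mean_x) * (y - mean_y))
cov_alt = expect(lambda x, y: x * y) - mean_x * mean_y

var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
rho = cov_def / (sqrt(var_x) * sqrt(var_y))

print(cov_def, cov_alt, rho)
```

For this pmf the two covariance formulas agree and the correlation is positive, since equal values of $X$ and $Y$ are more likely than unequal ones.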
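Bernoulli Distribution
• $\mathrm{dom}(X) = \{0, 1\}$, with $P(1) = p$ and $P(0) = q = 1 - p$:
$\mu_X = \mathbb{E}[X] = 0 \cdot P(0) + 1 \cdot P(1) = p$
$\mathbb{E}[X^2] = 0^2 \cdot P(0) + 1^2 \cdot P(1) = p$
$\mathrm{Var}(X) = \mathbb{E}[X^2] - \mu_X^2 = p - p^2 = pq$
$\sigma_X = \sqrt{\mathrm{Var}(X)} = \sqrt{pq}$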
Binomial Distribution
• Definition. A random variable $X$ that represents the total number of successes in a sequence of $n$ independent Bernoulli experiments with success probability $p$ has a binomial distribution.
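Discrete Uniform Distribution
• Definition. A random variable $X$ that can take $n \geq 2$ equally probable values has a discrete uniform distribution.
$\mathrm{dom}(X) = A = \{a_1, a_2, \ldots, a_n\}$ and $|A| = n \geq 2$
$P(x) = \frac{1}{n}$ for $x \in A$
• If $A \subset \mathbb{R}$ then:
$\mathbb{E}[X] = \sum_{x \in A} x P(x) = \frac{a_1 + a_2 + \cdots + a_n}{n}$
$\mathbb{E}[X^2] = \sum_{x \in A} x^2 P(x) = \frac{a_1^2 + a_2^2 + \cdots + a_n^2}{n}$
$\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \cdots$ HW!
$\sigma_X = \sqrt{\mathrm{Var}(X)} = \cdots$ HW!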
Continuous Uniform Distribution
• Definition. Let $X$ be a real random variable with domain $[a, b]$. If the probability density of $X$ is constant then $X$ has a continuous uniform distribution with pdf and cdf defined by:
$f(x) = \frac{1}{b - a}$ for $x \in [a, b]$
$F(x) = \frac{x - a}{b - a}$ for $x \in [a, b]$
• If $a = 0$ and $b = 1$ then $X$ has a standard continuous uniform distribution.
• Properties.
$\mathbb{E}[X] = \int_{-\infty}^{+\infty} x \cdot f(x)\, dx = \frac{b + a}{2}$
$\mathrm{Var}(X) = \frac{(b - a)^2}{12}$
Normal Distribution
• Definition: A normally distributed (Gaussian) random variable has probability density given by:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ for $x \in \mathbb{R}$
• Sums and averages of many independent random quantities are approximately normally distributed. Moreover, measurement errors and fluctuations that result from accumulating many small deviations are approximately normally distributed. Consequently, this distribution is a good model for values obtained by measuring physical quantities: weight, length, temperature, voltage and current, pollution level, salinity, etc.
Parameters of Normal Distribution
• Parameter $\mu$ is called the expected value, as:
$\mathbb{E}[X] = \mu$
• Parameter $\sigma$ is called the standard deviation, as:
$\mathrm{Std}(X) = \sigma$
• Sometimes:
– $\mu$ is called the location parameter, as it specifies the reference value around which the values of the variable are grouped
– $\sigma$ is called the scale parameter, as it specifies how the values of the variable are "spread" around its reference value.
Standard Normal Distribution
• Definition: If $\mu = 0$ and $\sigma = 1$ then we have a standard normal distribution.
• The pdf and cdf of a standard normal random variable are:
$\phi(x) = f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$ for $x \in \mathbb{R}$
$\Phi(x) = F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz$ for $x \in \mathbb{R}$
• Let $X$ be a normally distributed random variable with parameters $\mu$ and $\sigma$. Then:
$Z = \frac{X - \mu}{\sigma}$
has a standard normal distribution.
• Let $Z$ be a standard normal random variable. Then:
$X = \sigma Z + \mu$
is normally distributed with parameters $\mu$ and $\sigma$.
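Standardization makes it possible to evaluate the cdf of any normal variable through $\Phi$. The sketch below computes $\Phi$ from the error function in Python's standard `math` module, using the identity $\Phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}(x / \sqrt{2})\right)$; the function names are illustrative.

```python
from math import erf, exp, pi, sqrt

def phi(x):
    """pdf of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    """cdf of the standard normal distribution, via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X normal with parameters mu and sigma, using Z = (X - mu) / sigma."""
    return Phi((x - mu) / sigma)

print(Phi(0.0))                  # 0.5 by symmetry
print(normal_cdf(12.0, 10.0, 2.0), Phi(1.0))  # equal: 12 is one sigma above the mean
```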
Central Limit Theorem – Intuition
• The mean of a sufficiently large number of independent and
identically distributed random variables is approximately
normally distributed.
• This can be interpreted as follows: if you have a population with mean $\mu$ and standard deviation $\sigma$ and take sufficiently large random samples from the population with replacement, then the sample mean is approximately normally distributed.
• Let $n$ be the sample size and let $\bar{X}$ be the sample mean. Then:
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
• This behaviour can be observed experimentally for sufficiently large samples; a common rule of thumb is $n \geq 30$.
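A small simulation illustrates the two formulas for the sample mean. It is only a sketch: the population is taken to be the rolls of a fair die (an assumption made for this example), and the seed, sample size and number of repetitions are arbitrary.

```python
import random

rng = random.Random(0)     # fixed seed so the run is repeatable
n = 30                     # sample size
repetitions = 10000        # number of samples drawn
mu = 3.5                   # mean of one fair die roll
sigma = (35 / 12) ** 0.5   # standard deviation of one fair die roll

# Draw many samples (with replacement) and record each sample mean.
sample_means = [sum(rng.randint(1, 6) for _ in range(n)) / n
                for _ in range(repetitions)]

mean_of_means = sum(sample_means) / repetitions
std_of_means = (sum((m - mean_of_means) ** 2 for m in sample_means)
                / repetitions) ** 0.5

print("mean of sample means:", mean_of_means, "expected:", mu)
print("std of sample means: ", std_of_means, "expected:", sigma / n ** 0.5)
```

Plotting a histogram of `sample_means` would show the familiar bell shape, even though a single die roll is uniformly distributed.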
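Central Limit Theorem – Mathematical Formulation
• Theorem: Let $X_1, \ldots, X_n$ be independent, identically distributed random variables such that $\mathbb{E}[X_i] = \mu$ and $\mathrm{Std}(X_i) = \sigma$ for $1 \leq i \leq n$. Let $S_n = \sum_{i=1}^{n} X_i$. Then:
$Z_n = \frac{S_n - \mathbb{E}[S_n]}{\mathrm{Std}(S_n)} = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\frac{S_n}{n} - \mathbb{E}\left[\frac{S_n}{n}\right]}{\mathrm{Std}\left(\frac{S_n}{n}\right)}$
converges in distribution to a standard normal random variable:
$\lim_{n \to \infty} F_{Z_n}(z) = \Phi(z)$ for all $z$
• Observations:
– $S_n$ diverges.
– $\frac{S_n}{n}$ converges (to $\mu$) and $\mathrm{Var}\left(\frac{S_n}{n}\right)$ converges to zero.
– For large $n$, $\frac{S_n}{n}$ behaves like a normal random variable.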
Beta Distribution
• Definition: The value of a random variable $X$ represents the success probability of a Bernoulli event, i.e. $\mathrm{dom}(X) = [0, 1]$. Variable $X$ has a Beta distribution with parameters $\alpha$ and $\beta$, which represent the numbers of observed successes and failures, respectively, in a sequence of $\alpha + \beta$ successive repetitions of the event.
• The pdf of a random variable with Beta distribution is defined by:
$f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$, where $B(\alpha, \beta)$ is the Beta function.
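Beta Function
• The Beta function with parameters $\alpha, \beta > 0$ is defined as:
$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}$
where $\Gamma$ is the Gamma function, defined by:
$\Gamma(z) = \int_{0}^{\infty} x^{z - 1} e^{-x}\, dx$
for complex numbers $z$ with $\mathrm{Re}(z) > 0$, and extended by analytic continuation to $z \in \mathbb{C} \setminus \{0, -1, -2, \ldots\}$.
• The $\Gamma$ function generalizes factorials. If $n \geq 1$ is a natural number then:
$\Gamma(n) = (n - 1)!$
• If $\alpha, \beta \geq 1$ are natural numbers then:
$f(x; \alpha, \beta) = \frac{\alpha\beta}{\alpha + \beta}\, C_{\alpha + \beta}^{\alpha}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$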
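The two forms of the pdf can be compared numerically. The sketch below uses `math.gamma` and `math.comb` from Python's standard library; the function names are chosen for illustration, and the binomial-coefficient form is valid only for natural $\alpha, \beta \geq 1$.

```python
from math import comb, gamma

def beta_function(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b), for a, b > 0."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_pdf(x, a, b):
    """pdf of the Beta distribution, general form."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

def beta_pdf_natural(x, a, b):
    """pdf written with a binomial coefficient; a, b must be natural numbers >= 1."""
    return a * b / (a + b) * comb(a + b, a) * x ** (a - 1) * (1 - x) ** (b - 1)

print(beta_function(2, 3))             # 1 * 2 / 24 = 1/12
print(beta_pdf(0.3, 2, 3), beta_pdf_natural(0.3, 2, 3))
```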
Properties of Beta Distribution
• If $X$ has a Beta distribution with parameters $\alpha, \beta$ then:
$\mathbb{E}[X; \alpha, \beta] = \frac{\alpha}{\alpha + \beta}$
$\mathrm{Var}(X; \alpha, \beta) = \frac{\alpha\beta}{(\alpha + \beta + 1)(\alpha + \beta)^2}$
• Observations:
– The expected value represents the ratio of successes to the total number of trials.
– If $\alpha, \beta \to \infty$ then $\mathrm{Var}(X; \alpha, \beta) \to 0$, i.e. the values of $X$ tend to concentrate around the average value as the number of trials grows to infinity.
Histograms
• A histogram illustrates the pmf or pdf of a random variable (discrete or continuous).
• We divide the range of values of the variable into equal intervals (bins) and count the values that fall in each bin.
• A histogram can illustrate the frequency of values or the relative frequency of values of the variable for each bin.
– A frequency histogram is represented by a set of bars (one bar per interval) such that the height of each bar represents the number of values in that bin.
– A relative frequency histogram is represented by a set of bars (one bar per interval) such that the height of each bar represents the proportion of values in that bin.
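The binning step can be sketched in plain Python as follows. The function name, the sample data and the convention that the right end of the range belongs to the last bin are assumptions made for this example.

```python
def histogram(values, bins, low, high):
    """Return (frequencies, relative frequencies) over `bins` equal intervals of [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # The right end of the range is placed in the last bin.
            idx = min(int((v - low) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    relative = [c / total for c in counts] if total else [0.0] * bins
    return counts, relative

data = [0.1, 0.2, 0.25, 0.6, 0.7, 0.95, 1.0, 0.4]   # illustrative sample
counts, relative = histogram(data, bins=4, low=0.0, high=1.0)
print(counts)     # frequency histogram: [2, 2, 2, 2]
print(relative)   # relative frequency histogram: [0.25, 0.25, 0.25, 0.25]
```

The relative frequencies always sum to 1, which is what makes them comparable with a pmf.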
Simulating Binomial Distribution
• We use Python's numpy.random package for random sampling.
• We set the distribution parameters:
– probability of success of an event $p \in (0, 1)$
– number $n$ of trials
• Experiment: choose $x \in (0, 1)$ by uniform random sampling $n$ times and count how many times $x < p$ (the event succeeded).
• Repeat the experiment a large number $N$ of times, recording the number of successes in each repetition.
• For each integer $k \in \{0, 1, \ldots, n\}$, determine how many repetitions produced exactly $k$ successes.
Python Solution
import numpy as np
import matplotlib.pyplot as plt

N = 50000  # no. of experiment repetitions
n = 30     # no. of trials / experiment
p = 0.6    # probability of success
q = 1 - p  # probability of failure

u = np.random.rand(n, N)  # one column of n uniform samples per repetition
y = u < p                 # True where a trial succeeded
x = y.sum(axis=0)         # number of successes in each repetition
unique, counts = np.unique(x, return_counts=True)
unique_all = list(range(n + 1))
counts_all = [0] * (n + 1)  # values of k that never occurred keep count 0
for i in range(unique.size):
    counts_all[unique[i]] = counts[i]
# Convert to an array first: a plain list cannot be divided by a number.
hist = np.array(counts_all) / sum(counts_all)
plt.figure(1)
plt.plot(unique_all, hist, '.')
plt.figure(2)
plt.hist(x, [i - 0.5 for i in range(n + 2)], rwidth=1.0)
plt.show()
Verification Using numpy.random.binomial
import numpy as np
import matplotlib.pyplot as plt

N = 50000  # no. of experiment repetitions
n = 30     # no. of trials / experiment
p = 0.6    # probability of success
q = 1 - p  # probability of failure

out = np.random.binomial(n, p, N)  # N draws of the number of successes in n trials
unique, counts = np.unique(out, return_counts=True)
unique_all = list(range(n + 1))
counts_all = [0] * (n + 1)  # values of k that never occurred keep count 0
for i in range(unique.size):
    counts_all[unique[i]] = counts[i]
# Convert to an array first: a plain list cannot be divided by a number.
hist = np.array(counts_all) / sum(counts_all)
plt.figure(1)
plt.plot(unique_all, hist, '.')
plt.figure(2)
# Same integer-centred bins as in the simulation, so the two plots are comparable.
plt.hist(out, [i - 0.5 for i in range(n + 2)], rwidth=1.0)
plt.show()