Documente Academic
Documente Profesional
Documente Cultură
1
Compiled by William Chen (http://wzchen.com) with contributions
from Sebastian Chiu, Yuan Jiang, Yuqi Hou, and Jessy Hwang.
Material based off of Joe Blitzsteins (@stat110) lectures
(http://stat110.net) and Blitzstein/Hwangs Intro to Probability
textbook (http://bit.ly/introprobability). Licensed under CC
BY-NC-SA 4.0. Please share comments, suggestions, and errors at
http://github.com/wzchen/probability_cheatsheet.
Last Updated May 24, 2015
Paradoja de Simpson
c
P (A | B, C) < P (A | B , C) y P (A | B, C ) < P (A | B , C )
Conteo
Teora de Conjuntos
Regla de la multiplicaci
on - Supongamos que tenemos un
experimento compuesto (un experimento con m
ultiples componentes).
Si el componente 1 tiene n1 resultados posibles, el segundo
componente tiene n2 resultados posibles, y el componente r tiene nr
resultados posibles, entonces en general, hay n1 n2 . . . nr posibilidades
para todo el experimento.
Tabla de Muestreos - Las tablas de muestreo describen las
diferentes maneras posibles de tomar una muestra de tama
no k de una
poblaci
on de tama
no n. Los nombres de cada columna denotan si el
orden importa o no.
Importa
Con Reemplazo
Sin Reemplazo
No Importa
n + k 1
k
n
k
n
n!
(n k)!
P (Evento) =
n
umero de resultados favorables
n
umero de resultados
Probabilidad y Pensamiento
Condicionado
Eventos independientes - A y B son independientes si el
conocimiento de uno no te da ning
un tipo de informaci
on sobre el otro.
A y B son independientes si y solo si se cumple una de las siguientes
declaraciones:
P (A B) = P (A)P (B)
P (A|B) = P (A)
Independencia Condicional - A y B son condicionalmente
independientes dado C si: P (A B|C) = P (A|C)P (B|C). La
independencia condicional no implica independencia y la
independencia no implica independencia condicional.
(A B) A B
(A B) A B
Funci
on de Cuant
a (PMF) (S
olo caso discreto) es una funci
on que
toma el valor de x, devuelve la probabilidad de que la variable
aleatoria toma
on con valores
P ese valor x. La PMF es una funci
positivos, y
x P (X = x) = 1
PX (x) = P (X = x)
Funci
on de Densidad (CDF) es una funci
on que para valores x,
devuelve la probabilidad que una variable aleatoria toma hasta ese
valor x.
F (x) = P (X x)
P (A|C) = P (A|B1 , C)P (B1 |C) + ...P (A|Bn , C)P (Bn |C)
P (A|C) = P (A B1 |C) + P (A B2 |C) + ...P (A Bn |C)
Ley de la Probabilidad Total con B y Bc (Caso especial de un
conjunto divido), y con una condici
on extra (solo se agrega C)
c
xi P (X = xi )
N
otese que cualquier X e Y , a y b coeficientes escalares y c es
constante, la siguiente propieda de Linealidad de la Esperanza
mantiene:
P (A) = P (A B) + P (A B )
c
P (B|A)P (A)
P (A B)
=
P (B)
P (B)
P (A B|C)
P (B|A, C)P (A|C)
P (A|B, C) =
=
P (B|C)
P (B|C)
Odds Form of Bayes Rule, and with extra conditioning (just add C!)
Independence
Distributions
P (A) = P (A B1 ) + P (A B2 ) + ...P (A Bn )
P (A|B)
P (B|A) P (A)
=
P (Ac |B)
P (B|Ac ) P (Ac )
P (B|A, C) P (A|C)
P (A|B, C)
=
P (Ac |B, C)
P (B|Ac , C) P (Ac |C)
PX (x) = P (X = x)
FX (x0 ) = P (X x0 )
Independencia - Intuitivamente, dos variables aleatorias son
independientes si conocer una no te da ning
un tipo de informacion
sobre la otra. X e Y son independientes si para TODOS los valores de
x e y:
P (X = x, Y = y) = P (X = x)P (Y = y)
xP (X = x|A)
Funci
on de Probabilidad (PMF) (Solo Discreta) nos da la
probabilidad de que una variable aleatoria tome el valor x.
Funci
on de Distribuci
on Acumulada (CDF) nos da la
probabilidad de que una variable aleatoria tome el valor x o uno menor.
1
0
A occurs
A does not occur
Variance
2
=
2
2
What is the Cumulative Density Function (CDF)? It is the
following function of x.
F (x) = P (X x)
What is the Probability Density Function (PDF)? The PDF,
f (x), is the derivative of the CDF.
F (x) = f (x)
Or alternatively,
Z
Conditional Distributions
Moments
P (B|A)P (A)
f (t)dt
LotUS states that you can find the expected value of a function of a
random variable g(X) this way:
E(g(X)) = x g(x)P (X = x)
Z
E(g(X)) =
g(x)f (x)dx
Universality of Uniform
When you plug any random variable into its own CDF, you get a
Uniform[0,1] random variable. When you put a Uniform[0,1] into an
inverse CDF, you get the corresponding random variable. For example,
lets say that a random variable X has a CDF
F (x) = 1 e
F (X) = 1 e
fY |X (y|x) =
(k)
k = E(X ) = MX (0)
This is true by Taylor Expansion of etX
MX (t) = E(e
tX
)=
X
X
0k tk
E(X k )tk
=
k!
k!
k=0
k=0
k 0X
MX (0) = E(X e
f (x|A) =
Marginal Distributions
Review: Law of Total Probability
Says for an event A and partition
P
B1 , B2 , ...Bn : P (A) =
i P (A Bi )
To find the distribution of one (or more) random variables from a joint
distribution, sum or integrate over the irrelevant random variables.
Getting the Marginal PMF from the Joint PMF
P (X = x) =
P (X = x, Y = y)
) = E(X ) = k
t(aX+c)
ct
) = e E(e
(at)X
ct
) = e MX (at)
t(X+Y )
) = E(e
tX
)E(e
tY
) = MX (t) MY (t)
P (X = x, Y = y)
P (X = x|Y = y)P (Y = y)
=
P (X = x)
P (X = x)
Mean E(X) = 01
P (Y = y|X = x) =
F (x) =
Joint Distributions
Review: Joint Probability of events A and B: P (A B)
Both the Joint PMF and
Joint PDF must be non-negative and
P P
sum/integrate to 1. ( x y P (X = x, Y = y) = 1)
R R
( x y fX,Y (x, y) = 1). Like in the univariate cause, you sum/integrate
the PMF/PDF to get the CDF.
Multivariate LotUS
P
Review: E(g(X))
x g(x)P (X = x), or
R =
E(g(X)) = g(x)fX (x)dx
For discrete random variables:
E(g(X, Y )) =
XX
x
g(x, y)P (X = x, Y = y)
E(g(X, Y )) =
Cov(X, Y )
Var(X)Var(Y )
Continuous Transformations
Cov(X, Y )
X Y
X
Y E(XY ) = E(X)E(Y )
n
X
Var(Xi ) + 2
i=1
Cov(Xi , Xj )
i<j
n
2
Convolutions
Cov(X, Y ) = Cov(Y, X)
Cov(X + a, Y + b) = Cov(X, Y )
Cov(aX, bY ) = abCov(X, Y )
Cov(W + X, Y + Z) = Cov(W, Y ) + Cov(W, Z) + Cov(X, Y )
+ Cov(X, Z)
Covariance and Invariance - Correlation, Covariance, and Variance
are addition-invariant, which means that adding a constant to the
term(s) does not change the value. Let b and c be constants.
Var(X + c) = Var(X)
Cov(X + b, Y + c) = Cov(X, Y )
Corr(X + b, Y + c) = Corr(X, Y )
In addition to addition-invariance, Correlation is scale-invariant,
which means that multiplying the terms by any constant does not
affect the value. Covariance and Variance are not scale-invariant.
Corr(2X, 3Y ) = Corr(X, Y )
Cov(X1 , X2 )
P (T1 t) = 1 e
Order Statistics
Continuous Y
R
E(Y ) =
R yfY (y)dy
E(Y |X = x) =R
yfY |X (y|x)dy
E(Y |A) =
yf (y|A)dy
n
X
k=i
fX(i) (x) = n
n!
j1
nj
t
(1 t)
(j 1)!(n j)!
Conditional Expectation
d 1
g (y)
dy
fU(j) (u) =
X
Y Cov(X, Y ) = 0
F (X(j) ) U(j)
U(j) Beta(j, n j + 1)
J =
n 1
i1
n
k
nk
F (x) (1 F (x))
k
i1
F (x)
(1 F (X))
ni
f (x)
Conditional Variance
Eves Law (aka Law of Total Variance)
Var(Y ) = E(Var(Y |X)) + Var(E(Y |X))
Chain Properties
We use
to denote is approximately distributed. We can use the
central limit theorem when we have a random variable, Y that is a
sum of n i.i.d. random variables with n large. Let us say that
2
E(Y ) = Y and Var(Y ) = Y
. We have that:
2
Y
N (Y , Y )
When we use central limit theorem to estimate Y , we usually have
n = 1 (X1 + X2 + + Xn ).
Y = X1 + X2 + + Xn or Y = X
n
2
Specifically, if we say that each of the iid Xi have mean X and X
,
then we have the following approximations.
2
X1 + X2 + + Xn
N (nX , nX )
2
n = 1 (X1 + X2 + + Xn )
X
N (X , X )
n
n
We use
to denote converges in distribution to as n . These
are the same results as the previous section, only letting n and
not letting our normal distribution have any n terms.
1
d
N (0, 1)
(X1 + + Xn nX )
n
n X d
X
N (0, 1)
/n
Markov Chains
A Markov Chain is a walk along a (finite or infinite, but for this class
usually finite) discrete state space {1, 2, . . . , M}. We let Xt denote
which element of the state space the walk is on at time t. The Markov
Chain is the set of random variables denoting where the walk is at all
points in time, {X0 , X1 , X2 , . . . }, as long as if you want to predict
where the chain is at at a future time, you only need to use the present
state, and not any past information. In other words, the given the
present, the future and past are conditionally independent. Formal
Definition:
P (Xn+1 = j|X0 = i0 , X1 = i1 , . . . , Xn = i) = P (Xn+1 = j|Xn = i)
State Properties
A state is either recurrent or transient.
If you start at a Recurrent State, then you will always return
back to that state at some point in the future. You can
check-out any time you like, but you can never leave.
Otherwise you are at a Transient State. There is some
probability that once you leave you will never return. You
dont have to go home, but you cant stay here.
A state is either periodic or aperiodic.
If you start at a Periodic State of period k, then the GCD of
all of the possible number steps it would take to return back is
> 1.
Otherwise you are at an Aperiodic State. The GCD of all of
the possible number of steps it would take to return back is 1.
Transition Matrix
Element qij in square transition matrix Q is the probability that the
chain goes from state i to state j, or more formally:
qij = P (Xn+1 = j|Xn = i)
To find the probability that the chain goes from state i to state j in m
steps, take the (i, j)th element of Qm .
(m)
Definition
qij
Stationary Distribution
= P (Xn+m = j|Xn = i)
If you have a certain number of nodes with edges between them, and a
chain can pick any edge randomly and move to another node, then this
is a random walk on an undirected network. The stationary
distribution of this chain is proportional to the degree sequence. The
degree sequence is the vector of the degrees of each node, defined as
how many edges it has.
Continuous Distributions
Exponential Distribution
Let us say that X is distributed Expo(). We know the following:
Story Youre sitting on an open meadow right before the break of
dawn, wishing that airplanes in the night sky were shooting stars,
because you could really use a wish right now. You know that shooting
stars come on average every 15 minutes, but its never true that a
shooting star is ever due to come because youve waited so long.
Your waiting time is memorylessness, which means that the time until
the next shooting star comes does not depend on how long youve
waited already.
Example The waiting time until the next shooting star is distributed
Expo(4). The 4 here is , or the rate parameter, or how many
shooting stars we expect to see in a unit of time. The expected time
1
, or 14 of an hour. You can expect to
until the next shooting star is
wait 15 minutes until the next shooting star.
Expos are rescaled Expos
Y Expo() X = Y Expo(1)
Memorylessness The Exponential Distribution is the sole
continuous memoryless distribution. This means that its always as
good as new, which means that the probability of it failing in the
next infinitesimal time period is the same as any infinitesimal time
period. This means that for an exponentially distributed X and any
real numbers t and s,
P (X > s + t|X > s) = P (X > t)
Uniform
Let us say that U is distributed Unif(a, b). We know the following:
Properties of the Uniform For a uniform distribution, the
probability of an draw from any interval on the uniform is proportion
to the length of the uniform. The PDF of a Uniform is just a constant,
so when you integrate over the PDF, you will get an area proportional
to the length of the interval.
Example William throws darts really badly, so his darts are uniform
over the whole room because theyre equally likely to appear anywhere.
Williams darts have a uniform distribution on the surface of the room.
The uniform is the only distribution where the probably of hitting in
any specific region is proportion to the area/length/volume of that
region, and where the density of occurrence in any one specific spot is
constant throughout the whole support.
Normal
Let us say that X is distributed N (, 2 ). We know the following:
Central Limit Theorem The Normal distribution is ubiquitous
because of the central limit theorem, which states that averages of
independent identically-distributed variables will approach a normal
distribution regardless of the initial distribution.
Transformable Every time we stretch or scale the normal
distribution, we change it to another normal distribution. If we add c
to a normally distributed random variable, then its mean increases
additively by c. If we multiply a normally distributed random variable
by c, then its variance increases multiplicatively by c2 . Note that for
every normally distributed random variable X N (, 2 ), we can
transform it to the standard N (0, 1) by the following transformation:
X
N (0, 1)
Gamma Distribution
Let us say that X is distributed Gamma(a, ). We know the following:
Story You sit waiting for shooting stars, and you know that the
waiting time for a star is distributed Expo(). You want to see a
shooting stars before you go home. X is the total waiting time for the
ath shooting star.
Example You are at a bank, and there are 3 people ahead of you.
The serving time for each person is distributed Exponentially with
mean of 2 time units. The distribution of your waiting time until you
begin service is Gamma(3, 21 )
Beta Distribution
Geometric
p Beta(a, b)
Then after observing the value X = x, we get a posterior distribution
p|(X = x) Beta(a + x, b + n x)
Relationship with Gamma This is the bank-post office result. See
Reasoning by Representation
2
Distribution
Let us say that X is distributed 2n . We know the following:
Story A Chi-Squared(n) is a sum of n independent squared normals.
Example The sum of squared errors are distributed 2n
i.i.d.
n = Z1 + Z2 + + Zn , Z
n 1
,
2 2
N (0, 1)
Discrete Distributions
DWR = Draw w/ replacement, DWoR = Draw w/o replacement
n
n!
=
n1 n2 . . . n k
n1 !n2 ! . . . nk !
Hypergeometric
Negative Binomial
1
Example If each pokeball we throw has a 10
probability to catch
1
Mew, the number of failed pokeballs will be distributed Geom( 10
).
First Success
X|p Bin(n, p)
DWR
DWoR
Binom/Bern
(Bern if n = 1)
NBin/Geom
(Geom if k = 1)
HGeom
NHGeom
(see example probs)
Bernoulli
The Bernoulli distribution is the simplest case of the Binomial
distribution, where we only have one trial, or n = 1. Let us say that X
is distributed Bern(p). We know the following:
Story. X succeeds (is 1) with probability p, and X fails (is 0)
with probability 1 p.
Example. A fair coin flip is distributed Bern( 21 ).
Binomial
Let us say that X is distributed Bin(n, p). We know the following:
Story X is the number of successes that we will achieve in n
independent trials, where each trial can be either a success or a failure,
each with the same probability p of success. We can also say that X is
a sum of multiple independent Bern(p) random variables. Let
X Bin(n, p) and Xj Bern(p), where all of the Bernoullis are
independent. We can express the following:
X = X1 + X2 + X3 + + Xn
Example If Jeremy Lin makes 10 free throws and each one
independently has a 43 chance of getting in, then the number of free
throws he makes is distributed Bin(10, 34 ), or, letting X be the number
of free throws that he makes, X is a Binomial Random Variable
distributed Bin(10, 34 ).
Binomial Coefficient n
k is a function of n and k and is read n
choose k, and means out of n possible indistinguishable objects, how
many ways can I possibly choose k of them? The formula for the
binomial coefficient is:
n
n!
=
k
k!(n k)!
nk
w+b
n
Poisson
Let us say that X is distributed Pois(). We know the following:
Story There are rare events (low probability events) that occur many
different ways (high possibilities of occurences) at an average rate of
occurrences per unit space or time. The number of events that occur
in that unit of space or time is X.
Example A certain busy intersection has an average of 2 accidents
per month. Since an accident is a low probability event that can
happen many different ways, the number of accidents in a month at
that intersection is distributed Pois(2). The number of accidents that
happen in two months at that intersection is distributed Pois(4)
Xi + Xj Bin(n, pi + pj )
X1 ,X2 ,X3 Mult3 (n,(p1 ,p2 ,p3 ))X1 ,X2 +X3 Mult2 (n,(p1 ,p2 +p3 ))
X1 , . . . , Xk1 |Xk = nk Multk1
n nk ,
pk1
p1
,...,
1 pk
1 pk
Multivariate Uniform
See the univariate uniform for stories and examples. For multivariate
uniforms, all you need to know is that probability is proportional to
volume. More formally, probability is the volume of the region of
interest divided by the total volume of the support. Every point in the
support has equal density of value Total1 Area .
Distribution Properties
Important CDFs
Exponential F (X) = 1 ex , x (0, ))
Multivariate Distributions
Multinomial
Story - We have n items, and then can fall into any one of the k
buckets independently with the probabilities p
~ = (p1 , p2 , . . . , pk ).
1. X + Y Pois(1 + 2 )
2. X|(X + Y = k) Bin k,
Example - Let us assume that every year, 100 students in the Harry
Potter Universe are randomly and independently sorted into one of
four houses with equal probability. The number of people in each of
the houses is distributed Mult4 (100, p
~), where p
~ = (.25, .25, .25, .25).
Note that X1 + X2 + + X4 = 100, and they are dependent.
1
1 +2
3. X Gamma(n1 , ), Y Gamma(n2 , ),
X
Y X + Y Gamma(n1 + n2 , ) Note that Gamma
can thus be thought of as a sum of iid Expos.
4. X NBin(r1 , p), Y NBin(r2 , p),
X
Y X + Y NBin(r1 + r2 , p)
6. Z1 N (1 , 12 ), Z2 N (2 , 22 ),
Z1
Z2 Z1 + Z2 N (1 + 2 , 12 + 22 )
n!
2n
n
e
n
Miscellaneous Definitions
Medians A continuous random variable X has median m if
P (X m) = 50%
A discrete random variable X has median m if
P (X m) 50% and P (X m) 50%
3. Gamma(1, ) Expo()
1
4. 2n Gamma n
2, 2
5. NBin(1, p) Geom(p)
e1 = 1/e by a definition of ex .
Reasoning by Representation
Beta-Gamma relationship If X Gamma(a, ),
Y Gamma(b, ), X
Y then
Beta(a, b)
X
X+Y
Formulas
Example Problems
Calculating Probability (1)
2. Beta(1, 1) Unif(0, 1)
X+Y
In every time period, Bobo the amoeba can die, live, or split into two
amoebas with probabilities 0.25, 0.25, and 0.5, respectively. All of
Bobos offspring have the same probabilities. Find P (D), the
probability that Bobos lineage eventually dies out. Answer - We use
law of probability, and define the events B0 , B1 . and B2 where Bi
means that Bobo has split into i amoebas. We note that P (D|B0 ) = 1
since his lineage has died, P (D|B1 ) = P (D), and P (D|B2 ) = P (D)2
since both lines of his lineage must die out in order for Bobos lineage
to die out.
X
X+Y
1
1
1
+ + +
log n + 0.57721 . . .
2
3
n
Stirlings Approximation
1+
I call 2 UberXs and 3 Lyfts at the same time. If the time it takes for
the rides to reach me is i.i.d., what is the probability that all the Lyfts
will arrive first? Answer - since the arrival times of the five cars are
i.i.d., all 5! orderings of the arrivals are equally likely. There are 3!2!
orderings that involve the Lyfts arriving first, so the probability that
3!2!
= 1/10 . Alternatively, there are 53
the Lyfts arrive first is
5!
ways to choose 3 of the 5 slots for the Lyfts to occupy, where each of
the choices are equally likely. 1 of those choices have all 3 of the Lyfts
5
arriving first, thus the probability is 1/
= 1/10
3
Linearity of Expectation
Geometric Series
2
a + ar + ar + + ar
n1
n1
X
1 rn
=
ar = a
1r
k=0
k
Exponential Function (e )
X
xn
x2
x3
x n
x
1+
e =
=1+x+
+
+ = lim
n
n!
2!
3!
n
n=1
X1
n
n
n
+
+ +
= n
n
n1
1
j
j=1
Similarily,
P (max(X1 , X2 , . . . , Xn ) a) = P (X1 a, X2 a, . . . , Xn a)
We will use that principal to find the CDF of U(n) , where
U(n) = max(U1 , U2 , . . . , Un ) where Ui Unif(0, 1) (iid).
P (max(U1 , U2 , . . . , Un ) a) = P (U1 a, U2 a, . . . , Un a)
= P (U1 a)P (U2 a) . . . P (Un a)
= a
nk
x (1 x)
0
dx =
1
(n + 1)
n
k
E
1
X+1
=
X
k=0
1
X+1
. Answer - By LOTUS,
1 e k
e X k+1
e
=
=
(e 1)
k+1
k!
k=0 (k + 1)!
Markov Chains
William really likes speedsolving Rubiks Cubes. But hes pretty bad
at it, so sometimes he fails. On any given day, William will attempt
N Geom(s) Rubiks Cubes. Suppose each time, he has a
independent probability p of solving the cube. Let T be the number of
Rubiks Cubes he solves during a day. Find the mean and variance of
T . Answer - Note that T |N Bin(N, p). As a result, we have by
Adams Law that
p(1 s)
s
p(1 p)(1 s)
p2 (1 s)
p(1 s)(p + s(1 p))
+
=
s
s2
s2
tT
|N )) = E((pe + q) ) = s
(pe + 1 p) (1 s)
n=0
,
+ +
1
~
s=
tT
0
1
E(e
s0 = s0 (1 ) + s1 and s1 = s0 () + s0 (1 )
0
Q=
1
s
s
=
1 (1 s)(pet + 1 p)
s + (1 s)p (1 s)pet
s0 q01 =
= s1 q10
+
tX
E(e ) =
1 (1 )et
So, we would want to try to get our MGF into this form to identify
what is. Taking our original MGF, it would appear that dividing by
s + (1 s)p would allow us to do this. Therefore, we have that
Robber
E(etT ) =
s
s+(1s)p
=
(1s)p
s + (1 s)p (1 s)pet
1 s+(1s)p et
s
s + (1 s)p
E(X ) =
6
3
But a much nicer way to use the MGF here is via pattern recognition:
note that M (t) looks like it came from a geometric series:
1
1
tn
n!
X
n=0
n
=
X
n=0
n! t
n n!
The coefficient of
here is the nth moment of X, so we have
E(X n ) = n!
n for all nonnegative integers n. So again we get the same
answer.
Biohazards
Section author: Jessy Hwang
1. Dont misuse the native definition of probability - When
answering What is the probability that in a group of 3 people,
no two have the same birth month?, it is not correct to treat
the people as indistinguishable balls being placed into 12 boxes,
since that assumes the list of birth months {January, January,
January} is just as likely as the list {January, April, June},
when the latter is fix times more likely.
2. Dont confuse unconditional and conditional
probabilities, or go in circles with Bayes Rule P (B|A)P (A)
. It is not correct to say P (B) = 1
P (A|B) =
P (B)
because we know that B happened.; P(B) is the probability
before we have information about whether B happened. It is
not correct to use P (A|B) in place of P (A) on the right-hand
side.
write
F (X)dx because F (X) is a random variable. It does
not make sense to write P (X) because X is not an event.
5. A random variable is not the same thing as its
distribution - To get the PDF of X 2 , you cant just square the
PDF of X. The right way is to use one variable transformations
- To get the PDF of X + Y , you cant just add the PDF of X
Recommended Resources
Distributions
Distribution
Bernoulli
Bern(p)
P (X = 1) = p
P (X = 0) = q
k
p (1 p)nk
P (X = k) = n
k
Binomial
Bin(n, p)
Geometric
Geom(p)
Negative Binom.
NBin(r, p)
Hypergeometric
HGeom(w, b, n)
Poisson
Pois()
Expected Value
Variance
MGF
pq
q + pet
k {0, 1, 2, . . . n}
np
npq
(q + pet )n
P (X = k) = q k p
k {0, 1, 2, . . . }
r n
P (X = n) = r+n1
p q
r1
q/p
q/p2
rq/p
rq/p2
n {0, 1, 2, . . . }
P (X = k) =
Beta
Beta(a, b)
Chi-Squared
2n
Multivar Uniform
A is support
Multinomial
Multk (n, p
~)
b
nk
P (X = k) =
w+b
n
)
n
k!
e(e
1)
a+b
2
(ba)2
12
x (, )
et+
f (x) = ex
x (0, )
1/
1/2
,t
t
a/
a/2
f (x) =
1
ba
etb eta
t(ba)
2
2
1 e(x ) /(2 )
2
1
(x)a ex x1
(a)
x (0, )
f (x) =
w+bn
n (1
w+b1 n
nw
b+w
x (a, b)
f (x) =
p
r
t
( 1qe
t ) , qe < 1
f (x) =
<1
k {0, 1, 2, . . . }
Exponential
Expo()
Gamma
Gamma(a, )
w
k
k {0, 1, 2, . . . , n}
Uniform
Unif(a, b)
Normal
N (, 2 )
p
, qet
1qet
(a+b) a1
x
(1
(a)(b)
2 t2
2
a
<
,t <
x)b1
x (0, 1)
(1)
(a+b+1)
2n
n~
p
Var(Xi ) = npi (1 pi )
Cov(Xi , Xj ) = npi pj
a
a+b
1
xn/21 ex/2
2n/2 (n/2)
x (0, )
f (x) =
1
|A|
xA
~ =~
P (X
n) =
n1
n
p
n1 ...nk 1
n
. . . pk k
n = n1 + n2 + + nk
P
k
i=1
pi eti
Inequalities
Cauchy-Schwarz
p
|E(XY )| E(X 2 )E(Y 2 )
Markov
E|X|
P (X a)
a
Chebychev
P (|X X | a)
Jensen
2
X
a2
n