
Background material

1. Some Background on Probability


2. Univariate random variables: definition
3. Discrete random variables
4. Continuous random variables
5. Change of variables
6. Bivariate discrete random variables
7. Bivariate continuous random variables
8. The multivariate normal
References: Amemiya: part of 3 and 5; Goldberger: 2, 4 and part of 7.

Some Background on Probability

Probability assigns numbers to events. It forms the foundation of statistics, and can be
studied independently.

1.1 Axioms of probability

The fundamental notion is that of a probability space, which is composed of three elements:
The sample space $\Omega$. It represents a population formed of individuals, firms, etc.
A collection of events $\mathcal{F}$. An event (one element of $\mathcal{F}$) is a subset $F \subset \Omega$.
A probability measure $P$. $P$ is a function which maps $\mathcal{F}$ into $[0, 1]$. It assigns a probability to each event.

The probability space is $(\Omega, \mathcal{F}, P)$. It corresponds to the experiment we want to study.


Example: Experiment = tossing a coin twice. The sample space is
$$\Omega = \{HH, HT, TH, TT\}.$$
An event is any subset of $\Omega$. One can define a simple event (as opposed to a composite one) as an element of the sample space. Lastly, $P$ is the uniform measure on $\Omega$: each outcome has probability $1/4$.
Axioms of probability:
1. $0 \le P(A) \le 1$ for all $A \in \mathcal{F}$.
2. $P(\Omega) = 1$.
3. $P(\cup_j A_j) = \sum_j P(A_j)$ for every family of disjoint events: $A_j \in \mathcal{F}$, and $A_j \cap A_{j'} = \emptyset$ if $j \neq j'$.

Consequences of the axioms:

Let $\bar{A} = \Omega \setminus A$ be the complement of an event $A$. Then:
$$P(\bar{A}) = 1 - P(A).$$
For any two events $A$ and $B$ we have
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
Exercise: if a die is rolled twice, what is the probability of having at least one ace?
Solution: $(6 + 6 - 1)/36 = 11/36$.
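A quick numerical sanity check (a Python sketch added to these notes; it simply enumerates the 36 equally likely outcomes):

```python
from itertools import product

# All 36 ordered outcomes of rolling a die twice.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes containing at least one ace (a roll of 1).
at_least_one_ace = [o for o in outcomes if 1 in o]

print(len(at_least_one_ace), len(outcomes))   # 11 36
print(len(at_least_one_ace) / len(outcomes))  # 0.3055... = 11/36
```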

1.2 Counting techniques

Simple events with equal probability: Assume that the sample space consists of a finite number of simple events with equal probability. Then if $A \in \mathcal{F}$, $n(A)$ is the number of simple events contained in $A$ and $n(\Omega)$ is the number of simple events contained in $\Omega$, we have:
$$P(A) = \frac{n(A)}{n(\Omega)}.$$

Example: the probability of having two tails if two coins are tossed is:
$$P(\{TT\}) = \frac{n(\{TT\})}{n(\{HH, HT, TH, TT\})} = \frac{1}{4}.$$

Permutations: The number of distinct ordered sets taking $r$ elements from $n$ elements is
$$\frac{n!}{(n-r)!} = n(n-1)\cdots(n-r+1) \equiv A_n^r.$$
Combinations: The number of distinct sets taking $r$ elements from $n$ elements is
$$\frac{n!}{r!\,(n-r)!} = \frac{n(n-1)\cdots(n-r+1)}{r(r-1)\cdots 1} \equiv C_n^r.$$
Note that Newton's formula is
$$(a+b)^n = \sum_{k=0}^{n} C_n^k\,a^k b^{n-k}.$$
So in particular:
$$\sum_{k=0}^{n} C_n^k = 2^n,$$
which is the number of elements of $\mathcal{F}$ if $\Omega$ has $n$ elements.


Exercise: What is the probability that a five-card poker hand will contain 3 aces?
Solution: The number of poker hands containing 3 aces is $C_4^3$ (the suits of the aces) times $C_{48}^2$ (the other two cards, which need to be non-aces). The total number of hands is $C_{52}^5$. So the probability is: $C_4^3\,C_{48}^2 / C_{52}^5 \approx .001736$.
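The same count can be checked with Python's built-in binomial coefficients (an illustrative sketch, not part of the original notes):

```python
from math import comb

# comb(4, 3): suits of the three aces; comb(48, 2): the two non-ace cards;
# comb(52, 5): all five-card hands.
p = comb(4, 3) * comb(48, 2) / comb(52, 5)
print(p)  # ≈ 0.001736
```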

1.3 Conditional probability

We are often interested in the properties of an economic variable given others. Example:
wage given education and age. Moreover, most economic models hold ceteris paribus,
hence the need to condition.
Definition: Let A and B be events. The probability that A occurs given B is:
$$P(A|B) = \frac{P(A \cap B)}{P(B)},$$
where $A \cap B$ is the joint event that $A$ and $B$ occur.

There are two important consequences of this definition.


Law of total probability: Let events $A_1, \ldots, A_J$ be mutually exclusive ($A_i \cap A_j = \emptyset$ for all $i \neq j$) such that $P(A_1 \cup \ldots \cup A_J) = 1$. Then
$$P(B) = \sum_{j=1}^{J} P(B|A_j)\,P(A_j).$$

Bayes' theorem:
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}.$$
Generalization: if $A_1, \ldots, A_J$ are as above then
$$P(A_i|B) = \frac{P(B|A_i)\,P(A_i)}{\sum_{j=1}^{J} P(B|A_j)\,P(A_j)}.$$
Exercise: In a class there are 30 students, 20 of whom are economists. 10 economists are female, as are 5 non-economists. If I pick a female student at random, what is the probability that she is an economist?
Solution:
$$P(E|F) = \frac{P(F|E)\,P(E)}{P(F|E)\,P(E) + P(F|NE)\,P(NE)} = \frac{\frac{10}{20}\cdot\frac{20}{30}}{\frac{10}{20}\cdot\frac{20}{30} + \frac{5}{10}\cdot\frac{10}{30}} = \frac{2}{3}.$$

Independence: $A$ and $B$ are statistically independent iff
$$P(A \cap B) = P(A)\,P(B).$$
An alternative definition is:
$$P(A|B) = P(A), \quad \text{or equivalently} \quad P(B|A) = P(B).$$
That is: $A$ is independent of $B$ if $B$ (resp. $A$) does not bring any new information on $A$ (resp. $B$).
Ex: $A \in \{T, H\}$ = result of the first coin, $B \in \{T, H\}$ = result of the second coin.
We can also define pairwise independence and mutual independence.
Exercise: If the probability of being admitted to any one school is $1/n$ and you apply to $n$ schools, what is the probability that you will be admitted to at least one school?

Solution: $P(\text{admitted to at least 1})$ is $1 - P(\text{admitted at none})$; that is, if we assume that school decisions are taken independently:
$$1 - P(\text{not admitted at school } 1)\cdots P(\text{not admitted at school } n) = 1 - \left(1 - \frac{1}{n}\right)^n.$$
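Evaluating this probability for a few values of $n$ (a Python sketch; the remark that it decreases towards $1 - e^{-1} \approx 0.632$ is an added observation, not part of the original notes):

```python
import math

# P(at least one admission) = 1 - (1 - 1/n)^n for n independent schools.
for n in (2, 5, 10, 100, 10_000):
    print(n, 1 - (1 - 1 / n) ** n)

print(1 - math.exp(-1))  # the n -> infinity limit, ≈ 0.632
```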

Univariate random variables: definition

A random variable is a variable that takes values according to a certain probability distribution. More formally, a random variable is a function from the sample space to $\mathbb{R}^K$:
$$X: \Omega \to \mathbb{R}^K, \quad \omega \mapsto X(\omega).$$
If $K = 1$, $X$ is said to be univariate.
We can generally forget about $\Omega$. It will be sufficient to think of $X$ as a variable that can take some values with some probability; that is: $X$ is stochastic (as opposed to deterministic).
To every univariate random variable $X$ corresponds a cumulative distribution function (cdf), which maps the real line into the interval $[0, 1]$, given by:
$$F_X(x) = P\{\omega \in \Omega,\ X(\omega) \le x\} \equiv P(X \le x),$$
for every scalar $x$.
$F_X$ can take various forms, depending on whether $X$ is discrete, continuous or mixed.

Discrete random variables

Definition: A random variable is discrete if its support is finite or countable.


Example 1: $X$ takes the value 1 if the outcome of the coin-toss experiment is heads, zero if it is tails. Then $\Omega = \{H, T\}$.
Example 2: $X$ is the number of firms going bankrupt in a given year. Then $X$ can take any non-negative integer value.
The convention is then to order the mass points of $X$ in increasing order:
$$x_1 < x_2 < \ldots$$

The cdf of $X$ is a step function:
$$0 = F_X(x_1^-) < F_X(x_1) = F_X(x_2^-) < F_X(x_2) < \ldots < 1.$$
$F_X$ is non-decreasing and right-continuous.

Then the probability mass function (pmf) satisfies:
$$f_X(x) = P(X = x).$$
Note that $f_X$ is equal to zero outside of the discrete support of $X$. $X$ is characterized either by its cdf or by its pmf.
Bernoulli:
$$f_X(x) = p^x (1-p)^{1-x}, \quad \text{for } x = 0, 1.$$
We check that this is a pmf.
Example 1: the coin toss ($p = 1/2$).
Example 2: unemployment probability ($p$ depends on characteristics of the individual, the local labor market, previous time in unemployment, ...).
Discrete Uniform:
$$f_X(x) = 1/N \quad \text{for } x = 1, 2, \ldots, N, \text{ zero elsewhere.}$$
Example: roll of a die ($N = 6$).
Binomial: Let $S_n = X_1 + \cdots + X_n$ be the number of successes of $n$ draws from a Bernoulli distribution with parameter $p$. Then the p.m.f. of $S_n$ is given by:
$$f_{S_n}(x) = C_n^x\,p^x (1-p)^{n-x}, \quad x = 0, \ldots, n,$$
where
$$C_n^x = \frac{n!}{x!\,(n-x)!}$$
are the binomial coefficients.
We verify that
$$\sum_{x=0}^{n} f_{S_n}(x) = (p + (1-p))^n = 1.$$
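A numerical illustration of both this identity and Newton's formula above (Python sketch; $n$ and $p$ are illustrative values):

```python
from math import comb

# Binomial(n, p) pmf: f(x) = C(n, x) p^x (1-p)^(n-x).
n, p = 12, 0.3
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

print(sum(pmf))  # 1.0, since (p + (1-p))^n = 1
# Newton's formula at a = b = 1: the binomial coefficients sum to 2^n.
print(sum(comb(n, k) for k in range(n + 1)) == 2**n)  # True
```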

Poisson: The sample space is $\{0, 1, 2, 3, \ldots\}$. The p.m.f. of a Poisson distributed random variable with parameter $\lambda > 0$ is:
$$f_X(x) = \exp(-\lambda)\,\frac{\lambda^x}{x!}.$$
We verify that:
$$\sum_{x=0}^{\infty} f_X(x) = \exp(-\lambda) \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = 1.$$

The Poisson is widely used to model duration processes. Example: arrival of job offers.

Continuous random variables

Definition: A random variable is continuous if its support forms a continuum.


Recall that the cdf of $X$ is defined as
$$F_X(x) = P(X \le x).$$
As in the discrete case, $F_X$ is non-decreasing and right-continuous. In addition it satisfies: $\lim_{x \to -\infty} F_X(x) = 0$, and $\lim_{x \to +\infty} F_X(x) = 1$.
Density function: $X$ is absolutely continuous (or simply continuous) if there exists a non-negative function $f_X$ such that:
$$F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx, \quad \text{for all } a \le b.$$
Then, $f_X$ is called the probability density function (pdf) of $X$.


Note that
$$f_X(x) = \frac{dF_X(x)}{dx}$$
at the points where $F_X$ is differentiable. In particular, $f_X(x) \ge 0$, and $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$. These two properties characterize a pdf.

Probability of a simple event: An important remark is that, as $F_X$ is continuous (and not only right-continuous):
$$P(X = a) = F_X(a) - \lim_{b \uparrow a} F_X(b) = 0.$$
This implies that the event $\{X = a\}$ has probability zero! Still, this is not an impossible event.
Example: the probability of having a yearly wage of exactly 114,773.6 dollars is zero. However, there can be individuals with that wage in the sample.
Conditional density function: Let $[x_1, x_2] \subset [a, b]$. Then:
$$P(X \in [x_1, x_2] \mid X \in [a, b]) = \frac{P(X \in [x_1, x_2])}{P(X \in [a, b])}.$$
Define the conditional density of $X$ given $X \in [a, b]$ as:
$$f_X(x \mid X \in [a, b]) = \frac{f_X(x)}{\int_a^b f_X(x)\,dx} \ \text{ if } x \in [a, b], \qquad = 0 \ \text{ if } x \notin [a, b].$$

Then, for all $x_1, x_2$:
$$P(X \in [x_1, x_2] \mid X \in [a, b]) = \int_{x_1}^{x_2} f_X(x \mid X \in [a, b])\,dx.$$

Example: Let $f_X$ be the pdf of the wage variable. The pdf of the wage, given that one belongs to the 20% poorest of the population, is:
$$f_X(x \mid x \in [0, F_X^{-1}(.20)]) = \frac{f_X(x)}{F_X\big(F_X^{-1}(.20)\big)} = 5 f_X(x) \quad \text{if } x \in [0, F_X^{-1}(.20)].$$

Uniform $(0, 1)$:
$$f_X(x) = 1, \quad \text{for all } x \in [0, 1].$$
The cdf is $F_X(x) = x$ for $x \in [0, 1]$, $F_X(x) = 0$ for $x < 0$ and $F_X(x) = 1$ for $x > 1$. Can be generalized to Uniform $(a, b)$.
Exponential, with parameter $\lambda$:
$$f_X(x) = \lambda \exp(-\lambda x), \quad x > 0.$$
The cdf is $F_X(x) = (1 - \exp(-\lambda x))\,1\{x > 0\}$.

Standard normal: The sample space is $(-\infty, +\infty)$, and the p.d.f. of a standard normal random variable is given by:
$$f_X(x) = \frac{1}{\sqrt{2\pi}}\,\exp(-x^2/2) \equiv \phi(x).$$
The cdf has no simple analytical expression:
$$F_X(x) = \int_{-\infty}^{x} \phi(\tilde{x})\,d\tilde{x} \equiv \Phi(x).$$
The function $\phi$ has a maximum at $x = 0$, and presents two inflexion points at $\pm 1$.


Moreover, it is symmetric around zero so that
$$\phi(-x) = \phi(x), \qquad \Phi(-x) = 1 - \Phi(x).$$

Normal: More generally, a random variable $X$ is normal, $X \sim N(\mu, \sigma^2)$, if:
$$\frac{X - \mu}{\sigma}$$
is standard normal. The pdf of $X$ is given by:
$$f_X(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right),$$
and its cdf is given by:
$$F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right).$$
The normal family plays a central role in statistics and econometrics. In the course, we will understand why when studying the central limit theorem.

Mixed random variable: There can also be mixtures of continuous and discrete random variables. An example arises when random variables are censored.
A convenient way to represent a mixed random variable is as a couple of two random variables, one discrete and the other continuous.
Example: financial assets owned by a household (there are typically many zeros). Let $X$ be the value of the financial assets, and $Y$ be the dummy variable (i.e. binary 0/1) indicating participation in financial markets. We study $(X, Y)$.

Functions of random variables

Let $X$ be a (discrete or continuous) random variable. We are interested in the properties of the transformed random variable:
$$Y = g(X),$$
where $g$ is a univariate function.
Example: we often like to model log-wages instead of wages.
The link is obtained by the following equality of cdfs:
$$F_Y(y) = P(Y \le y) = P(g(X) \le y).$$
If $Y$ is a strictly monotonic function of $X$ then $g^{-1}$ exists and:
$$F_Y(y) = \begin{cases} P(X \le g^{-1}(y)) = F_X(g^{-1}(y)) & \text{if } g(\cdot) \text{ is increasing,} \\ P(X \ge g^{-1}(y)) = 1 - F_X(g^{-1}(y)) & \text{if } g(\cdot) \text{ is decreasing.} \end{cases}$$
Theorem: If $X$ is (absolutely) continuous the pdf of $Y$ is given by:
$$f_Y(y) = f_X(g^{-1}(y))\,\left|\frac{dg^{-1}(y)}{dy}\right|.$$

Example: if $X$, the log-wage, is distributed as a standard normal, then the pdf of the wage $Y = \exp(X)$ is given by:
$$f_Y(y) = f_X(\log y)\,\frac{1}{y} = \frac{\phi(\log y)}{y}.$$
This is the pdf of the (standard) log-normal distribution.
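A Monte Carlo sanity check of this change of variables (Python sketch; the sample size and evaluation point are illustrative):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# If the log-wage X is N(0,1), the wage Y = exp(X) is standard log-normal.
y = np.exp(rng.standard_normal(1_000_000))

def Phi(x):  # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# P(Y <= t) should equal Phi(log t).
t = 2.0
print((y <= t).mean(), Phi(math.log(t)))  # both ≈ 0.756
```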
Remark: as $g(g^{-1}(y)) = y$ we have:
$$f_Y(y) = f_X(g^{-1}(y))\,\left|\frac{dg^{-1}(y)}{dy}\right| = f_X(g^{-1}(y))\,\left|\frac{1}{g'(g^{-1}(y))}\right|.$$

Ranks: Let $Y = F_X(X)$. As:
$$F_{F_X(X)}(u) = P(F_X(X) \le u) = P(X \le F_X^{-1}(u)) = F_X(F_X^{-1}(u)) = u,$$
the rank function is uniformly distributed:
$$F_X(X) \sim U[0,1].$$
Remark: we need $F_X$ to be strictly increasing. The result does not hold for discrete random variables.
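A simulation illustrating this probability integral transform (Python sketch; the exponential distribution and its parameter are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw from an Exponential(lambda = 2) and apply its own cdf.
lam = 2.0
x = rng.exponential(1 / lam, size=1_000_000)
u = 1 - np.exp(-lam * x)  # F_X(X)

# If F_X(X) is U[0,1], each decile should hold about 10% of the draws.
print(np.histogram(u, bins=10, range=(0, 1))[0] / len(u))
```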
Flexible parametric modelling: The normal is symmetric and has zero excess kurtosis. This can be an unsatisfactory property to model some economic variables.
Example 1: age at the end of school is typically very skewed.
Example 2: stock returns data in finance present high kurtosis.
A first solution to circumvent this problem is to transform a normal random variable. For instance, the log-normal distribution is used to model wages: $X$ such that $\log X \sim N(\mu, \sigma^2)$.
A second possibility is to use other parametric families, such as the Gamma distribution.
A general approach is to use mixtures of normals. The pdf of a mixture of $K$ normals is of the form:
$$f_X(x) = \sum_{k=1}^{K} p_k\,\frac{1}{\sigma_k}\,\phi\!\left(\frac{x - \mu_k}{\sigma_k}\right),$$
where $\sum_{k=1}^{K} p_k = 1$.

Result: every continuous pdf can be approximated by the pdf of a mixture of normals. This result is true when $K$ is not fixed. In practice, the choice of $K$ is analogous to the choice of a bandwidth in nonparametric estimation, and so can be difficult. Still, the family of normal mixtures is remarkable for its simplicity and its ability to fit arbitrary continuous distributions.
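A minimal sketch evaluating a two-component normal mixture and checking numerically that its pdf integrates to one (all parameter values are illustrative):

```python
import numpy as np

def phi(z):
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

# Mixture weights, means and standard deviations (illustrative).
p = np.array([0.3, 0.7])
mu = np.array([-2.0, 1.0])
sig = np.array([0.5, 1.5])

def mixture_pdf(x):
    return sum(pk / sk * phi((x - mk) / sk) for pk, mk, sk in zip(p, mu, sig))

# Riemann-sum check that the density integrates to ~1.
x = np.linspace(-15, 15, 200_001)
print(np.sum(mixture_pdf(x)) * (x[1] - x[0]))  # ≈ 1.0
```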

Bivariate discrete random variables

Economics focuses on the relations between variables. It is thus necessary to generalize


the analysis of the previous sections to cases where several variables are involved.
We now turn to multivariate random variables $X = (X_1, \ldots, X_K)$ taking (vector) values in $\mathbb{R}^K$, where $K > 1$. To simplify, we shall most of the time consider bivariate random variables $(X, Y)$.


The random variable is characterized by its joint cdf:
$$F_{X,Y}(x, y) = P(X \le x, Y \le y).$$
Joint probability: There is a finite or countable number of values for $X$ and $Y$. Then the joint probability mass function is given by:
$$f_{X,Y}(x, y) = P(X = x, Y = y).$$
This function is zero except at the mass points of $(X, Y)$. We can list the values of $(X, Y)$ in a two-way array: $(x_j, y_k)$, where $x_1 < x_2 < \ldots$ and $y_1 < y_2 < \ldots$. Then
$$\sum_j \sum_k f_{X,Y}(x_j, y_k) = 1.$$

Example 1: We toss 2 coins. We denote $X = 1$ if we obtain 2 heads, and $Y = 1$ if the two coins show the same result. The pmf can be represented by the following contingency table:

X\Y      0      1
 0      1/2    1/4
 1       0     1/4

The pmf of $(X, Y)$ is such that $f_{X,Y}(0,0) = 1/2$, $f_{X,Y}(0,1) = 1/4$, $f_{X,Y}(1,0) = 0$, $f_{X,Y}(1,1) = 1/4$, and zero everywhere else. The cdf satisfies $F_{X,Y}(0,0) = 1/2$, $F_{X,Y}(0,1) = 3/4$, $F_{X,Y}(1,0) = 1/2$, $F_{X,Y}(1,1) = 1$, extended in the usual way.
Example 2: Trinomial distribution with parameters $n, p, q$, where $p + q \le 1$. The pmf is given by:
$$f_{X,Y}(x, y) = \frac{n!}{x!\,y!\,(n-x-y)!}\,p^x q^y (1-p-q)^{n-x-y}$$
for all $(x, y) \in \{0, \ldots, n\}^2$ such that $x + y \le n$, and zero everywhere else. This can be a relevant framework to model labor market participation, unemployment and employment over $n$ months.
We check that:
$$\sum_{x=0}^{n} \sum_{y=0}^{n-x} f_{X,Y}(x, y) = 1.$$

Marginal probability:
$$f_X(x) = P(X = x) = \sum_y P(X = x, Y = y),$$
so:
$$f_X(x) = \sum_y f_{X,Y}(x, y).$$

Example 1: adding the row and column sums to the contingency table gives the marginals:

X\Y      0      1     f_X
 0      1/2    1/4    3/4
 1       0     1/4    1/4
f_Y     1/2    1/2

Example 2: The marginals of the trinomial distribution are binomial. For instance:
$$f_X(x) = \sum_{y=0}^{n-x} \frac{n!}{x!\,y!\,(n-x-y)!}\,p^x q^y (1-p-q)^{n-x-y} = \frac{n!}{x!\,(n-x)!}\,p^x (1-p)^{n-x}.$$
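A numerical check of the trinomial pmf and its binomial marginal (Python sketch; $n$, $p$, $q$ are illustrative):

```python
from math import comb, factorial

n, p, q = 10, 0.3, 0.2

def f(x, y):  # trinomial pmf
    return (factorial(n) / (factorial(x) * factorial(y) * factorial(n - x - y))
            * p**x * q**y * (1 - p - q) ** (n - x - y))

# The pmf sums to one over the triangle x + y <= n.
print(sum(f(x, y) for x in range(n + 1) for y in range(n - x + 1)))  # 1.0

# The marginal of X is Binomial(n, p).
x = 4
print(sum(f(x, y) for y in range(n - x + 1)),
      comb(n, x) * p**x * (1 - p) ** (n - x))  # equal
```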

Conditional probability: Taking $A = \{Y = y\}$ and $B = \{X = x\}$ in the definition of conditional probability we obtain the conditional pmf of $Y$ given $X$:
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$$

Note that, given some $x$, $f_{Y|X}(\cdot|x)$ satisfies:
$$\sum_k f_{Y|X}(y_k|x) = \sum_k \frac{f_{X,Y}(x, y_k)}{f_X(x)} = \frac{\sum_k f_{X,Y}(x, y_k)}{f_X(x)} = \frac{f_X(x)}{f_X(x)} = 1.$$
Hence, for a given $x$, the conditional pmf is a pmf.


We can also define the conditional cdf :
FY |X (y|x) = P (Y y|X = x).
Example 1:
$$P(X = 0 \mid Y = 0) = \frac{P(X = 0, Y = 0)}{P(Y = 0)} = \frac{1/2}{1/2} = 1.$$

Independence: Two discrete random variables are said to be independent iff $\{X = x_j\}$ and $\{Y = y_k\}$ are independent, for all $j, k$, i.e.
$$P(X = x_j, Y = y_k) = P(X = x_j)\,P(Y = y_k).$$

An equivalent result is:
$$P(Y = y_k \mid X = x_j) = P(Y = y_k),$$
that is: $X$ does not bring any information on $Y$.
We can show the following theorem: in contingency tables such as the one in Example 1, $X$ and $Y$ are independent iff every row is proportional to any other row or, equivalently, every column is proportional to every other column.
Example 1: obviously $X$ and $Y$ are not independent.
Independence is a convenient assumption, as it allows one to model joint distributions through their marginals (random sample). In economics, however, independence rarely holds, and most of the time we are interested in correlation patterns.
Multivariate discrete random variables: All the discussion about bivariate random variables can be generalized to the multivariate case.
For instance, in statistics we are often interested in random samples (iid). A random sample of size $N$ is a multivariate random variable $X = (X_1, \ldots, X_N)$, where all $X_j$ are mutually independent and identically distributed. So the pmf of $X$ is:
$$f_X(x_1, \ldots, x_N) = \prod_{j=1}^{N} f_{X_j}(x_j) = \prod_{j=1}^{N} f_{X_1}(x_j).$$
So to define a random sample it is sufficient to define the pmf of any of the $X_j$.

Bivariate continuous random variables

Probability distribution. Let $(X, Y)$ be an absolutely continuous bivariate random variable. Then $(X, Y)$ possesses a joint pdf $f_{X,Y}$ that satisfies:
$$F_{X,Y}(b_1, b_2) = \int_{-\infty}^{b_1} \int_{-\infty}^{b_2} f_{X,Y}(x, y)\,dy\,dx.$$

Remark: in this course we shall assume that the regularity conditions are such that Fubini's theorem can be applied, so that the order of integration can be interchanged.
We have:
$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\,\partial y}.$$

As $F_{X,Y}$ is increasing with respect to both arguments it follows that $f_{X,Y}$ is non-negative. Moreover: $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f_{X,Y}(x, y)\,dy\,dx = 1$. Together, these two properties characterize a bivariate pdf.
Moreover:
$$P(X = a,\ b_1 \le Y \le b_2) = 0.$$
As in the univariate case, this zero-probability event is not impossible.
Example 1: the roof distribution, the joint pdf of which is
$$f_{X,Y}(x, y) = x + y, \quad 0 \le x \le 1,\ 0 \le y \le 1.$$
We verify that this is a joint pdf.
Example 2: $f_{X,Y}(x, y) = xy\,\exp(-(x+y))$ for $x > 0$ and $y > 0$, zero otherwise.
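A quick midpoint-rule check that the roof density integrates to one (Python sketch; the grid size is arbitrary):

```python
import numpy as np

# Midpoints of an n-by-n grid on the unit square.
n = 2000
x = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(x, x)

# Each cell has area 1/n^2, so the integral is the mean of f over the cells.
print(np.mean(X + Y))  # ≈ 1.0
```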
Marginal density. The bivariate pdf (cdf) contains all the information about the marginal pdf (cdf) of $X$ and $Y$.
Indeed, for continuous variables we have:
$$f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\,dy.$$
This is a valid (univariate) pdf, as:
$$\int f_X(x)\,dx = \int\!\!\int f_{X,Y}(x, y)\,dy\,dx = 1.$$
The marginal cdf is given by:
$$F_X(x) = P(X \le x,\ Y \le +\infty) = F_{X,Y}(x, +\infty).$$
Again, we verify that this is a valid (univariate) cdf.
Example 1: for the roof distribution
$$f_X(x) = \int_0^1 (x + y)\,dy = x + \frac{1}{2}, \quad \text{for } 0 \le x \le 1.$$
Example 2:
$$f_X(x) = \int_0^{+\infty} xy\,\exp(-(x+y))\,dy = x\,\exp(-x), \quad \text{for } x > 0.$$

Conditional density. Given a continuous bivariate random variable $(X, Y)$ one can define its conditional pdfs:
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}; \qquad f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.$$

Example: for the roof distribution we have, for $0 \le x \le 1$:
$$f_{Y|X}(y|x) = \frac{x + y}{x + \frac{1}{2}}, \quad 0 \le y \le 1.$$

Conditional cdf. The conditional cdf is given by:
$$F_{Y|X}(y|x) = \int_{-\infty}^{y} f_{Y|X}(\tilde{y}|x)\,d\tilde{y} = \lim_{h \to 0} P(Y \le y \mid x - h \le X \le x + h).$$

We have the two factorizations:
$$f_{X,Y}(x, y) = f_X(x)\,f_{Y|X}(y|x) = f_Y(y)\,f_{X|Y}(x|y),$$
$$F_{X,Y}(x, y) = \int_{-\infty}^{x} f_X(\tilde{x})\,F_{Y|X}(y|\tilde{x})\,d\tilde{x} = \int_{-\infty}^{y} f_Y(\tilde{y})\,F_{X|Y}(x|\tilde{y})\,d\tilde{y}.$$

Independence. We say that $X$ and $Y$ are (stochastically) independent iff
$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y), \quad \text{for all } x, y.$$
Hence two variables are independent if their joint distribution is the product of their marginal distributions.
Implications:
$f_{Y|X}(y|x)$ does not depend on $x$, for all $y$. In particular, $f_{Y|X}(y|x) = f_Y(y)$.
Likewise, $f_{X|Y}(x|y) = f_X(x)$.
Example: in time series, if $f_{Y_t|Y_{t-1}}(y_t|y_{t-1})$ does not depend on $y_{t-1}$.
Independence is a symmetric relation: $Y$ is independent of $X$ if and only if $X$ is independent of $Y$.
Property: if $X$ and $Y$ are independent and $Z = g(X)$ and $W = h(Y)$ are functions of $X$ and $Y$ alone, respectively, then $Z$ and $W$ are independent.

Change of variables. As an example, consider the roof distribution, and let $Z = X + Y$. The pdf of $Z$ is
$$f_Z(z) = \int f_{X,Y}(x, z - x)\,dx.$$
So:
$$f_Z(z) = \int (x + z - x)\,1\{x \in [0, 1],\ z - x \in [0, 1]\}\,dx,$$
which yields:
$$f_Z(z) = z\,\big(\min(z, 1) - \max(z - 1, 0)\big), \quad \text{for } z \in [0, 2].$$
We check that fZ is indeed a pdf.
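A Monte Carlo check of this derivation (Python sketch; we use rejection sampling from the roof density, which is bounded by 2 on the unit square, and compare against the analytic value $P(Z \le 1) = \int_0^1 z^2\,dz = 1/3$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Rejection sampling from f(x, y) = x + y on [0,1]^2 (f <= 2 there).
n = 2_000_000
x, y, u = rng.random(n), rng.random(n), rng.random(n)
keep = u < (x + y) / 2.0
z = x[keep] + y[keep]

print((z <= 1).mean())  # ≈ 0.3333 = P(Z <= 1)
```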
Mixed random variables: In the case of mixed distributions, it is often convenient
to specify the marginal discrete pmf and the conditional pdf of the continuous variable
given the discrete one.
Example: wage and type of contract.

The multivariate normal

Theorem. Let $U$ and $V$ be two independent standard univariate normal r.v.s. Let $\rho$ be a constant such that $|\rho| < 1$. Let
$$X = U; \qquad Y = \rho U + \sqrt{1 - \rho^2}\,V.$$
Then the joint pdf of $(X, Y)$ is:
$$\phi_2(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\,\exp\left(-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right).$$

$\phi_2$ is called the standard bivariate normal pdf with parameter $\rho$. We shall denote:
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).$$

Proof. To show this result, note that $Y|X \sim N(\rho X, 1 - \rho^2)$. Hence its pdf is:
$$f_{Y|X}(y|x) = \frac{1}{\sqrt{1-\rho^2}}\,\phi\!\left(\frac{y - \rho x}{\sqrt{1-\rho^2}}\right).$$
Hence:
$$f_{X,Y}(x, y) = f_{Y|X}(y|x)\,f_X(x) = \frac{1}{\sqrt{1-\rho^2}}\,\phi\!\left(\frac{y - \rho x}{\sqrt{1-\rho^2}}\right)\phi(x).$$
We verify that this yields $\phi_2(x, y)$.


Properties.
The marginal distributions of $(X, Y)$ are $N(0, 1)$.
The conditional distribution of $Y$ given $X$ is $Y|X \sim N(\rho X, 1 - \rho^2)$.
$X$ and $Y$ are independent if and only if $\rho = 0$.
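A simulation of the construction in the theorem (Python sketch; $\rho = 0.6$ is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a standard bivariate normal pair from independent N(0,1) draws.
rho = 0.6
u, v = rng.standard_normal((2, 1_000_000))
x = u
y = rho * u + np.sqrt(1 - rho**2) * v

# Marginals are standard normal and corr(X, Y) = rho.
print(x.std(), y.std(), np.corrcoef(x, y)[0, 1])  # ≈ 1, 1, 0.6
```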
Bivariate normal distributions. Let $(X, Y)$ be standard bivariate normal with parameter $\rho$, and let $Z_1 = \mu_1 + \sigma_1 X$, and $Z_2 = \mu_2 + \sigma_2 Y$. Then the pdf of $(Z_1, Z_2)$ is given by:
$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,\exp\left(-\frac{\left(\frac{z_1-\mu_1}{\sigma_1}\right)^2 + \left(\frac{z_2-\mu_2}{\sigma_2}\right)^2 - 2\rho\left(\frac{z_1-\mu_1}{\sigma_1}\right)\left(\frac{z_2-\mu_2}{\sigma_2}\right)}{2(1-\rho^2)}\right).$$

This is the pdf of a bivariate normal:
$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim N\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\right).$$

The same properties as before are easily proved. In particular we have:
$$Z_2 = \alpha + \beta Z_1 + \varepsilon,$$
where
$$\alpha = \mu_2 - \rho\,\frac{\sigma_2}{\sigma_1}\,\mu_1; \qquad \beta = \rho\,\frac{\sigma_2}{\sigma_1}; \qquad \varepsilon \sim N\big(0,\ (1-\rho^2)\,\sigma_2^2\big),$$
and $\varepsilon$ is independent of $Z_1$.
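A simulation of this regression decomposition (Python sketch; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

rho, mu1, mu2, s1, s2 = 0.6, 1.0, -0.5, 2.0, 1.5

x, v = rng.standard_normal((2, 1_000_000))
y = rho * x + np.sqrt(1 - rho**2) * v
z1, z2 = mu1 + s1 * x, mu2 + s2 * y

beta = rho * s2 / s1
alpha = mu2 - beta * mu1
eps = z2 - alpha - beta * z1

# eps should have mean ~0, variance (1 - rho^2) s2^2, and be uncorrelated with Z1.
print(eps.mean(), eps.var(), (1 - rho**2) * s2**2, np.corrcoef(z1, eps)[0, 1])
```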
Two remarks:
The bivariate normal is more than simply normal marginals: the dependence structure is also normal. Example: in studies of default risk, the normal is typically found to present too little dependence in the tail.
Two univariate normal variables are not necessarily independent (nor jointly normal): they can be dependent if the dependence structure is not normal.

The multivariate normal. Its pdf is, for $x \in \mathbb{R}^K$:
$$f_X(x) = (2\pi)^{-K/2}\,|\Sigma|^{-1/2}\,\exp\left(-\frac{1}{2}(x - \mu)'\,\Sigma^{-1}\,(x - \mu)\right),$$
where $|\Sigma|$ is the determinant of the $K \times K$ matrix $\Sigma$, and $\mu$ is $K \times 1$.


Example: If all components of $X = (X_1, \ldots, X_K)$ are independent $N(\mu_k, \sigma_k^2)$, then
$$\Sigma = \mathrm{diag}\big(\sigma_1^2, \ldots, \sigma_K^2\big).$$
So:
$$(x - \mu)'\,\Sigma^{-1}\,(x - \mu) = \sum_{k=1}^{K} \frac{(x_k - \mu_k)^2}{\sigma_k^2},$$
and:
$$f_X(x) = (2\pi)^{-K/2}\,\frac{1}{\sigma_1 \cdots \sigma_K}\,\exp\left(-\frac{1}{2}\sum_{k=1}^{K}\frac{(x_k - \mu_k)^2}{\sigma_k^2}\right) = \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_k}\,\exp\left(-\frac{1}{2}\,\frac{(x_k - \mu_k)^2}{\sigma_k^2}\right).$$
We will see that $\mu$ is the mean of $X$, and $\Sigma$ is its variance.
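A small sketch evaluating this pdf directly and verifying the factorization in the diagonal case (illustrative parameter values):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    # Multivariate normal pdf, computed straight from the formula above.
    K = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)  # (x - mu)' Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** K * np.linalg.det(Sigma))

mu = np.array([0.0, 1.0, -2.0])
sig = np.array([1.0, 0.5, 2.0])
x = np.array([0.3, 0.8, -1.0])

# With a diagonal Sigma, the joint pdf factors into univariate normal pdfs.
lhs = mvn_pdf(x, mu, np.diag(sig**2))
rhs = np.prod(np.exp(-0.5 * ((x - mu) / sig) ** 2) / (np.sqrt(2 * np.pi) * sig))
print(lhs, rhs)  # equal
```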

