Outline
1 Definitions and Notation
What is Probability?
Notation and Definitions
Marginal, Joint and Conditional Probability
2 Random Variables and Distributions
What is a Random Variable?
Discrete and Continuous Distributions
Marginal, Joint, and Conditional Distributions
3 Expectation and Transformations
Expectation and Variance
Conditional Expectation and Variance
4 Elementary Asymptotics
Convergence of a Sequence
Convergence in Probability
Convergence in Distribution
5 Some Important Distributions
Intuitive Definition
While there are several interpretations of what probability is,
most modern (post 1935 or so) researchers agree on an
axiomatic definition of probability.
Subjective Interpretation
Frequency Interpretation
If you want to explore this debate further, check out this article
in the Stanford Encyclopedia of Philosophy.
http://plato.stanford.edu/entries/probability-interpret/
A = {a1, a2, a3}.
If A is a subset of B we write A ⊂ B.
The union of two sets A and B, written A ∪ B, is the set containing every element that is in A, in B, or in both.
Sample Spaces
Events
Events are subsets of the sample space.
For example, if
Ω = { {heads, heads}, {heads, tails}, {tails, heads}, {tails, tails} },
then each of the following is an event:
∅
{ {heads, heads}, {heads, tails}, {tails, tails} }
{ {heads, tails} }
Probability Function
Conditional Probability
P(A|B) = P(A, B) / P(B)
This implies that
P(A) = 4/52
P(B|A) = 3/51
P(A, B) = P(A) × P(B|A) = 4/52 × 3/51
Question: P(B) =?
a) 3/51
b) 4/52
c) 4/51
d) not enough information
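The numbers above suggest the reading A = "the first card drawn is an ace" and B = "the second card drawn is an ace" (that reading is an assumption here, not stated explicitly above). Under it, a quick sketch with exact fractions answers the question via the law of total probability:

```python
from fractions import Fraction

# Assumed reading: A = first card is an ace, B = second card is an ace.
p_A = Fraction(4, 52)             # P(A)
p_B_given_A = Fraction(3, 51)     # P(B|A): one ace already removed
p_B_given_Ac = Fraction(4, 51)    # P(B|A^c): all four aces still in the deck

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)
print(p_B)  # 1/13, the same as 4/52
```

Perhaps surprisingly, P(B) = P(A): before anything is observed about the first card, the second card is equally likely to be any of the 52.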
With 2 Events:
Also, if P(A) > 0 and P(B) > 0, then we can write the following.

P(A|B) = P(A)P(B|A) / P(B)

P(A|B) = P(A)P(B|A) / [P(B|A) × P(A) + P(B|Aᶜ) × P(Aᶜ)]
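Plugging hypothetical numbers (chosen here purely for illustration, not taken from the slides) into Bayes' rule with the expanded denominator shows how the pieces fit together:

```python
from fractions import Fraction

# Hypothetical inputs: P(A) = 1/100, P(B|A) = 9/10, P(B|A^c) = 1/10.
p_A = Fraction(1, 100)
p_B_given_A = Fraction(9, 10)
p_B_given_Ac = Fraction(1, 10)

# Expanded denominator: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Bayes' rule: P(A|B) = P(A)P(B|A) / P(B)
p_A_given_B = p_A * p_B_given_A / p_B
print(p_A_given_B)  # 1/12
```

Even with a fairly accurate signal (P(B|A) = 9/10), the posterior stays modest because the prior P(A) is small.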
Gov2000: Quantitative Methodology for Political Science I
a) < 1/3
b) between 1/3 and 2/3
c) > 2/3
d) not enough information
Independence
Intuitive Definition
Events A and B are independent if knowing whether A occurred
provides no information about whether B occurred.
Formal Definition
P(AB) = P(A)P(B) =⇒ A ⊥ B
With all the usual > 0 restrictions, this implies
P(A|B) = P(A)
P(B|A) = P(B)
Conditional Independence
Intuitive Definition
Events A and B are conditionally independent given C, if
knowing whether C occurred and knowing whether A occurred
provides no information about whether B occurred.
Formal Definition
With P(C) > 0, we can write

P(A, B|C) = P(A, B, C) / P(C)

and we say that A is conditionally independent of B given C (A ⊥ B | C) if

P(A, B|C) = P(A|C)P(B|C).
Discrete Distributions
[Figure: PMF f(x) (top) and CDF F(x) (bottom) of a discrete random variable.]
a) F (1)
b) F (2)
c) F (1) − F (0)
d) F (2) − F (1)
Continuous Distributions
For example,

f(x) = 1/4 for 0 < x < 4, and 0 otherwise,

or, equivalently,

f(x) = 1/4 for 0 ≤ x ≤ 4, and 0 otherwise.
Think of densities as infinite data histograms.
For example,

F(x) = 0 for x < 0, x/4 for 0 ≤ x < 4, and 1 for x ≥ 4.

[Figure: the CDF F(x) plotted over the interval 0 to 4.]
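A small numerical sketch (not from the slides) checking that this CDF really is the integral of the density f(x) = 1/4 on its support:

```python
# Midpoint-rule check that F(x) agrees with the integral of f up to x.
def f(x):
    return 0.25 if 0 <= x <= 4 else 0.0

def F(x):
    if x < 0:
        return 0.0
    return x / 4 if x < 4 else 1.0

def integral_of_f(x, steps=100_000):
    lo = -1.0                      # start below the support, where f is 0
    dx = (x - lo) / steps
    return sum(f(lo + (i + 0.5) * dx) for i in range(steps)) * dx

for x in (0.5, 1.0, 2.0, 3.5, 5.0):
    assert abs(integral_of_f(x) - F(x)) < 1e-4
```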
Example:

                 Y = 1   Y = 2   Y = 3   fX(x)
        X = 1     0.22    0.04    0.09    0.35
        X = 2     0.15    0.10    0.20    0.45
        X = 3     0.01    0.07    0.12    0.20
        fY(y)     0.38    0.21    0.41    1.00
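The marginal row and column of the table can be recomputed from the interior cells by summing over the other variable; a sketch:

```python
# Joint PMF from the table above; keys are (x, y) pairs.
joint = {
    (1, 1): 0.22, (1, 2): 0.04, (1, 3): 0.09,
    (2, 1): 0.15, (2, 2): 0.10, (2, 3): 0.20,
    (3, 1): 0.01, (3, 2): 0.07, (3, 3): 0.12,
}

# Marginal PMFs: sum the joint PMF over the other variable.
f_X = {x: sum(joint[(x, y)] for y in (1, 2, 3)) for x in (1, 2, 3)}
f_Y = {y: sum(joint[(x, y)] for x in (1, 2, 3)) for y in (1, 2, 3)}

assert [round(f_X[x], 2) for x in (1, 2, 3)] == [0.35, 0.45, 0.20]
assert [round(f_Y[y], 2) for y in (1, 2, 3)] == [0.38, 0.21, 0.41]
```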
[Figure: 3-D bar plot of the joint PMF f(x, y).]
[Figure: the joint PMF f(x, y) and the marginal PMF f(x).]
fX|Y(x|y) = fX,Y(x, y) / fY(y)

where it is assumed that fY(y) > 0. It follows that
Joint PMF fX,Y(x, y):

                 Y = 1   Y = 2   Y = 3   fX(x)
        X = 1     0.22    0.04    0.09    0.35
        X = 2     0.15    0.10    0.20    0.45
        X = 3     0.01    0.07    0.12    0.20
        fY(y)     0.38    0.21    0.41    1.00

Conditional PMF fX|Y(x|y) (each column sums to 1):

                 Y = 1   Y = 2   Y = 3
        X = 1     0.58    0.19    0.22
        X = 2     0.39    0.48    0.49
        X = 3     0.03    0.33    0.29
                  1.00    1.00    1.00
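The conditional table can be reproduced by dividing each joint cell by its column's marginal; a sketch:

```python
# Joint PMF from the tables above.
joint = {
    (1, 1): 0.22, (1, 2): 0.04, (1, 3): 0.09,
    (2, 1): 0.15, (2, 2): 0.10, (2, 3): 0.20,
    (3, 1): 0.01, (3, 2): 0.07, (3, 3): 0.12,
}
f_Y = {y: sum(joint[(x, y)] for x in (1, 2, 3)) for y in (1, 2, 3)}

# f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y)
f_X_given_Y = {(x, y): joint[(x, y)] / f_Y[y] for (x, y) in joint}

# Matches the rounded entries of the conditional table...
assert round(f_X_given_Y[(1, 1)], 2) == 0.58
assert round(f_X_given_Y[(2, 2)], 2) == 0.48
# ...and each column sums to 1.
for y in (1, 2, 3):
    assert abs(sum(f_X_given_Y[(x, y)] for x in (1, 2, 3)) - 1) < 1e-9
```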
fY|X(y|x) = fY,X(y, x) / fX(x)

where it is assumed that fX(x) > 0.
[Figure: a joint density f(x, y) over the unit square and the conditional densities f(y|x) it implies.]
Marginal Density

[Figure: the marginal density f(y) shown alongside conditional densities f(y|x) for several values of x.]
Expectation
The expected value of a random variable X is denoted by E[X ]
and is a measure of central tendency of X . Roughly speaking,
an expected value is like a weighted average.
The expected value of a discrete random variable X is defined
as
E[X] = ∑ x fX(x), where the sum runs over all x.
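As a sketch with a made-up PMF (not one from the slides), the weighted-average computation is just a sum over the support:

```python
# Hypothetical PMF: X takes the values 1, 2, 3.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

# E[X] = sum over all x of x * f_X(x)
E_X = sum(x * p for x, p in pmf.items())
assert abs(E_X - 2.1) < 1e-9   # 1(0.2) + 2(0.5) + 3(0.3) = 2.1
```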
[Figure: two PMFs f(x), illustrating the expected value as a weighted average of the support points.]
Example

[Figure: histograms of three small samples on the range 2 to 7, each with its mean.]
E[aX ] = aE[X ]
E[b] = b
E[aX + b] = aE[X ] + b
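These three properties can be checked numerically on any PMF; a sketch with a hypothetical PMF and hypothetical constants a and b:

```python
# Hypothetical PMF and constants, for illustration only.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}
a, b = 3.0, -1.0

E_X = sum(x * p for x, p in pmf.items())
E_b = sum(b * p for p in pmf.values())
E_aX = sum(a * x * p for x, p in pmf.items())
E_aXb = sum((a * x + b) * p for x, p in pmf.items())

assert abs(E_aX - a * E_X) < 1e-9          # E[aX] = aE[X]
assert abs(E_b - b) < 1e-9                 # E[b] = b
assert abs(E_aXb - (a * E_X + b)) < 1e-9   # E[aX + b] = aE[X] + b
```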
Expectation Question
a) µ/n
b) nµ
c) µ
Variance
The expected value of a function g(X) of the random variable X is denoted by E[g(X)] and is a measure of central tendency of g(X). In particular, the variance of X is the expectation of g(X) = (X − E[X])²: V[X] = E[(X − E[X])²].
[Figure: two densities f(x) centered at the same value but with different variances.]
Sample Variance
[Figure: a histogram of a small sample on the range 2 to 6, illustrating its spread around the mean.]
Variance Question
a) σ²/n
b) nσ²
c) σ²
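A simulation sketch (standard normal draws, so σ² = 1, an assumption made here for concreteness) illustrating that the variance of the sample mean is σ²/n:

```python
import random
import statistics

random.seed(0)
n, reps = 25, 20_000

# Variance of the mean of n iid draws with sigma^2 = 1 should be near 1/n.
means = [statistics.fmean(random.gauss(0, 1) for _ in range(n))
         for _ in range(reps)]
var_of_mean = statistics.pvariance(means)
assert abs(var_of_mean - 1 / n) < 0.005    # 1/25 = 0.04
```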
Conditional Expectation
E[Y|x] = ∫ y fY|X(y|x) dy, with the integral running from −∞ to ∞.
Marginal Density

[Figure: the marginal density f(y) and conditional densities f(y|x); the mean of each conditional density is E[Y|x].]
[Figure: left, the single point (E[X], E[Y]); right, the conditional expectation E[Y|X] traced across values of x.]
Conditional Variance
Likewise, we can define the conditional variance of Y given
X = x (denoted V [Y |x]) to be the variance of Y under the
conditional distribution of Y given X = x.
Marginal Density

[Figure: the marginal density f(y) and conditional densities f(y|x); the spread of each conditional density is V[Y|x].]
We say a sequence of numbers cn converges to a limit c, and write cn → c, if the terms eventually get and stay arbitrarily close to c.
Example
If cn is 1 + 1/n, then cn → 1.
[Figure: the sequence cn = 1 + 1/n plotted for n = 1 to 100, approaching 1.]
P(|Xn − θ| > ε) → 0 as n → ∞, for every ε > 0
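A simulation sketch (the choice of Uniform(0, 4) draws is an assumption for illustration): take Xn to be the mean of n Uniform(0, 4) draws, which converges in probability to θ = 2 by the law of large numbers, so the exceedance probability P(|Xn − 2| > ε) should shrink as n grows.

```python
import random

random.seed(1)

def prob_outside(n, eps=0.25, reps=5_000):
    """Estimate P(|Xn - 2| > eps), where Xn is a mean of n Uniform(0,4) draws."""
    hits = 0
    for _ in range(reps):
        xbar = sum(random.uniform(0, 4) for _ in range(n)) / n
        if abs(xbar - 2) > eps:
            hits += 1
    return hits / reps

p10, p100 = prob_outside(10), prob_outside(100)
assert p100 < p10   # the probability of being far from 2 shrinks with n
```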
[Figure: densities f(Xn) for three increasing values of n, concentrating around a single point.]
Convergence Question
Question: Does Xn appear to be converging in probability to 2?
[Figure: densities f(Xn) for n = 1, 10, and 100.]
[Figure: densities f(Xn) over the range 0 to 8 for three values of n.]
[Figure: densities of the normal distributions N(0, 1), N(2, 1), and N(0, 0.25).]
fN(x|µ, Σ) = (2π)^(−d/2) |Σ|^(−1/2) exp[ −(1/2)(x − µ)′ Σ^(−1) (x − µ) ]
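A sketch implementing this formula directly for d = 2 in plain Python (no linear-algebra library); at x = µ with Σ equal to the identity, the density should be (2π)^(−1):

```python
import math

def mvn_density(x, mu, Sigma):
    """Bivariate normal density computed straight from the formula (d = 2)."""
    d = 2
    (a, b), (c, e) = Sigma
    det = a * e - b * c
    inv = [[e / det, -b / det], [-c / det, a / det]]   # 2x2 matrix inverse
    z = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(z[i] * inv[i][j] * z[j] for i in range(2) for j in range(2))
    return (2 * math.pi) ** (-d / 2) * det ** (-0.5) * math.exp(-0.5 * quad)

val = mvn_density([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
assert abs(val - 1 / (2 * math.pi)) < 1e-12
```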
fχ²(x|ν) = [2^(−ν/2) / Γ(ν/2)] x^(ν/2 − 1) exp(−x/2) for x > 0,

where Γ(z) = ∫₀^∞ t^(z−1) exp(−t) dt (if z is a positive integer then Γ(z) = (z − 1)!).
The mean of a chi-square random variable is ν, its variance is
2ν, and (when ν ≥ 2) its modal value is ν − 2.
The parameter ν is referred to as the degrees of freedom.
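A simulation sketch of these moments, building χ²ν draws as sums of ν squared standard normals:

```python
import random
import statistics

random.seed(2)
nu, reps = 4, 50_000

# A chi-square(nu) variable is a sum of nu squared independent standard normals.
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(nu)) for _ in range(reps)]

assert abs(statistics.fmean(draws) - nu) < 0.1          # mean is about nu
assert abs(statistics.pvariance(draws) - 2 * nu) < 0.5  # variance is about 2*nu
```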
[Figure: chi-square densities with ν = 1, 4, and 15 degrees of freedom.]
The t Distribution
ft(x|ν) = [Γ((ν + 1)/2) / (√(πν) Γ(ν/2))] × (1 + x²/ν)^(−(ν+1)/2)
[Figure: t densities with ν = 1, 4, and 15 degrees of freedom.]
If Z ~ N(0, 1) and Y ~ χ²ν are independent, then

X ≡ Z / √(Y/ν)

follows a tν distribution.
If a sample (X1 , . . . , Xn ) of any size n is taken from a
normal distribution with zero mean and unknown variance
then the sampling distribution of the sample mean divided
by the sample standard error will have the t distribution
with ν = n − 1.
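A simulation sketch of the ratio construction above; a tν variable has mean 0 and variance ν/(ν − 2) for ν > 2, which the simulated draws should roughly match:

```python
import math
import random
import statistics

random.seed(3)
nu, reps = 10, 50_000

draws = []
for _ in range(reps):
    z = random.gauss(0, 1)                               # Z ~ N(0, 1)
    y = sum(random.gauss(0, 1) ** 2 for _ in range(nu))  # Y ~ chi-square(nu)
    draws.append(z / math.sqrt(y / nu))                  # t_nu by construction

assert abs(statistics.fmean(draws)) < 0.03
assert abs(statistics.pvariance(draws) - nu / (nu - 2)) < 0.1
```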
The F Distribution
[Figure: F densities for F(1, 2), F(5, 5), F(30, 20), and F(500, 200).]