Topics: Random Variables, Chebyshev's Inequality, Convergence, Law of Large Numbers, Central Limit Theorem, Random Walks.
The sample standard deviation is

$$s = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1}}, \qquad s^2 = \text{sample variance}$$

Dividing by n − 1 produces a more accurate estimate of the true population standard deviation.
Note: Because the sum of the deviations equals zero, once n-1 of the
deviations are specified, the last deviation is already determined.
Hence the denominator uses the number of quantities that are free to
vary, called degrees of freedom.
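As a quick illustration of the n − 1 convention (the data set below is an arbitrary example, not from the slides), Python's standard library exposes both denominators: `statistics.stdev` divides the squared deviations by n − 1, while `statistics.pstdev` divides by n.

```python
import statistics

data = [4.0, 7.0, 6.0, 3.0, 5.0]

# Sample standard deviation: squared deviations divided by n - 1
s = statistics.stdev(data)
# Population standard deviation: squared deviations divided by n
sigma = statistics.pstdev(data)

# The n - 1 version is slightly larger, compensating for the fact that
# deviations are measured from the sample mean rather than the (unknown)
# population mean.
print(round(s, 4), round(sigma, 4))
```

The n − 1 version is the quantity denoted s above.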
Chebyshev's inequality

If X is a random variable with mean μ and variance σ², then for any value k > 0,

$$P\left\{\left|X - \mu\right| \ge k\right\} \le \frac{\sigma^2}{k^2}$$

Equivalently, for a data set with sample mean x̄ and sample standard deviation s, at least 100(1 − 1/k²) percent of the data lie within the interval from x̄ − ks to x̄ + ks.
Note: Letting k = 3/2, Chebyshev's inequality shows that more than 100(5/9)% ≈ 55.56% of the data in any data set lie within a distance of 1.5s of the sample mean x̄; letting k = 2 shows that more than 75% of the data lie within 2s of the sample mean; and letting k = 3 shows that more than 100(8/9)% ≈ 88.9% of the data lie within 3s of the sample mean.
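These guarantees are easy to check numerically. The sketch below (the data set is an arbitrary example) verifies that the observed fraction of the data within k sample standard deviations of the mean meets the Chebyshev lower bound 1 − 1/k².

```python
import statistics

def within_k_sds(data, k):
    """Fraction of the data within k sample standard deviations of the mean."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(abs(x - xbar) <= k * s for x in data) / len(data)

data = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # any data set works
for k in (1.5, 2, 3):
    frac = within_k_sds(data, k)
    bound = 1 - 1 / k**2          # Chebyshev: at least this fraction
    assert frac >= bound
    print(f"k={k}: observed {frac:.2f} >= guaranteed {bound:.2f}")
```

The bound holds for every data set; for well-behaved data the observed fraction is usually far above it.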
Empirical rule

If a data set is approximately normal (bell-shaped) with sample mean x̄ and sample standard deviation s, then about 68% of the data lie within x̄ ± s, about 95% within x̄ ± 2s, and about 99.7% within x̄ ± 3s.
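For roughly normal (bell-shaped) data, the empirical rule predicts about 68%, 95%, and 99.7% of the data within 1, 2, and 3 standard deviations of the mean. A quick simulated check (the seed and sample size are arbitrary choices):

```python
import random

rng = random.Random(42)
# Simulated bell-shaped data: standard normal samples
data = [rng.gauss(0, 1) for _ in range(100_000)]

for k, expected in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    frac = sum(abs(x) <= k for x in data) / len(data)
    print(f"within {k} s.d.: {frac:.3f} (rule says ~{expected})")
```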
In general, a sequence of random variables can be defined on a sample space S = {s1, s2, …, sk}, with each Xn assigning a number Xn(s) to every outcome s.
Example 1

Consider the following random experiment. A fair coin is tossed once. Here the sample space has only two elements, S = {H, T}. We define a sequence of random variables X1, X2, …, Xn, … on this sample space as follows:

$$X_n(s) = \begin{cases} \dfrac{1}{n+1} & \text{if } s = H \\[4pt] 1 & \text{if } s = T \end{cases}$$

(a) Are the Xi's independent? (b) Find the pmf and cdf of Xn.
Soln. a. The Xi's are not independent, because their values are determined by the same coin toss. In particular,

$$P(X_1 = 1, X_2 = 1) = P(T) = \frac{1}{2} \;\ne\; P(X_1 = 1)\,P(X_2 = 1) = P(T)\,P(T) = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$$
b. Each Xn takes two possible values, each equally likely. Thus the pmf of Xn is given by

$$P_{X_n}(x) = P(X_n = x) = \begin{cases} \dfrac{1}{2} & \text{if } x = \dfrac{1}{n+1} \\[4pt] \dfrac{1}{2} & \text{if } x = 1 \end{cases}$$
The cdf of Xn is

$$F_{X_n}(x) = P(X_n \le x) = \begin{cases} 1 & \text{if } x \ge 1 \\[2pt] \dfrac{1}{2} & \text{if } \dfrac{1}{n+1} \le x < 1 \\[2pt] 0 & \text{if } x < \dfrac{1}{n+1} \end{cases}$$
[Figure omitted: the cdf of Xn for different values of n.] The figure shows that the cdf of Xn approaches the cdf of a Bernoulli(1/2) random variable as n → ∞.
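Since the figure is not reproduced here, the cdf can be evaluated directly. The minimal sketch below implements the formula above and shows the first jump at 1/(n + 1) sliding toward 0 as n grows, so that F_{Xn}(x) approaches the Bernoulli(1/2) cdf at every continuity point x.

```python
def F(n, x):
    """Exact cdf of X_n: a jump of 1/2 at 1/(n+1) and another jump at 1."""
    if x < 1 / (n + 1):
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

# The Bernoulli(1/2) cdf is 0 below 0, 1/2 on [0, 1), and 1 from 1 on.
# As n grows, the jump at 1/(n+1) moves toward 0, so F(n, x) matches
# the Bernoulli(1/2) cdf for every fixed x != 0 once n is large enough.
for n in (1, 5, 50):
    print(n, [F(n, x) for x in (0.05, 0.5, 1.0)])
```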
Example 2

Consider the following random experiment. A fair coin is tossed repeatedly forever. Here the sample space S consists of all possible infinite sequences of heads and tails. We define a sequence of random variables X1, X2, …, Xn, … as follows:

$$X_n = \begin{cases} 1 & \text{if the } n\text{th toss is heads} \\ 0 & \text{otherwise} \end{cases}$$
Convergence

We are interested in the behaviour of functions of random variables such as means, variances, and proportions. For large samples, exact distributions can be difficult or impossible to obtain. In these situations we would like to see whether a sequence of random variables X1, X2, …, Xn, … converges to a random variable X, i.e. whether Xn gets closer and closer to X in some sense as n increases. For example, suppose we are interested in the value of a random variable X, but we are not able to observe X directly. Instead, we can take some measurements and come up with an estimate of X, call it X1. We then perform more measurements and update our estimate, call it X2. Continuing this process yields X1, X2, X3, …. As n increases our estimate gets better and better, i.e. Xn converges to X.
Example

Let Y1, Y2, Y3, … be a sequence of iid random variables, and let

$$X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

be the average of the first n of the Yi's. This defines a new sequence X1, X2, …. In other words, the sequence of interest X1, X2, … might be a sequence of statistics based on some other sequence of iid random variables. Note that the original sequence Y1, Y2, … is iid, but the sequence X1, X2, … is not iid.
Types of convergence

There are two main types of convergence.

Convergence in Probability (CIP) – A sequence of random variables Xn is said to converge in probability to X, written Xn →P X, if for every ε > 0,

$$P\left(\left|X_n - X\right| > \varepsilon\right) \to 0 \text{ as } n \to \infty$$
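A minimal sketch of this definition, using a hypothetical sequence Xn = X + En with the error En uniform on (−1/n, 1/n) (this example sequence is my own choice, not from the slides): here P(|Xn − X| > ε) is exactly 0 once 1/n < ε, so Xn →P X. A Monte Carlo estimate shows the probability dropping to zero.

```python
import random

rng = random.Random(0)
eps = 0.05

def prob_far(n, trials=10_000):
    """Monte Carlo estimate of P(|X_n - X| > eps) when X_n - X ~ U(-1/n, 1/n)."""
    hits = sum(abs(rng.uniform(-1 / n, 1 / n)) > eps for _ in range(trials))
    return hits / trials

# As n grows, the estimated probability shrinks and hits 0 once 1/n < eps.
for n in (1, 10, 100):
    print(n, prob_far(n))
```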
Besides these, there are also other types of convergence.

Convergence in Quadratic Mean – A sequence of random variables Xn is said to converge in quadratic mean to X, written Xn →qm X, if

$$E\left(\left|X_n - X\right|^2\right) \to 0 \text{ as } n \to \infty$$

Almost Sure Convergence – Xn →a.s. X if for every ε > 0,

$$P\left(\lim_{n \to \infty}\left|X_n - X\right| < \varepsilon\right) = 1$$
Relation between different types of convergence:

$$\text{a.s.} \Rightarrow \text{CIP}, \qquad \text{qm} \Rightarrow \text{CIP} \Rightarrow \text{CID}$$
Limit Theorems

Limit theorems can be used to obtain properties of estimators as the sample size tends to infinity:
Convergence in Probability → the limit of an estimator
Convergence in Distribution → the limit of a CDF

Two important theorems in probability are the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT).

Law of Large Numbers (LLN)

Weak Law of Large Numbers (WLLN): Let X1, X2, … be iid random variables with mean μ. Then for any ε > 0,

$$\lim_{n \to \infty} P\left(\left|\bar{X} - \mu\right| > \varepsilon\right) = 0$$
Proof: The proof is easier if we additionally assume that the random variables have a finite variance, i.e. Var(Xi) = σ² < ∞. Since E[X̄] = μ and Var(X̄) = σ²/n, Chebyshev's inequality gives

$$P\left(\left|\bar{X} - \mu\right| > \varepsilon\right) \le \frac{\operatorname{Var}(\bar{X})}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0 \text{ as } n \to \infty$$
Application: Suppose that a sequence of independent trials is performed. Let E be a fixed event, and let P(E) be the probability that E occurs on any given trial. Define

$$X_i = \begin{cases} 1 & \text{if } E \text{ occurs on trial } i \\ 0 & \text{if } E \text{ does not occur on trial } i \end{cases}$$

Then X1 + X2 + … + Xn is the number of times that E occurs in the first n trials. Because E[Xi] = P(E), the WLLN implies that for any ε > 0, the probability that the proportion of the first n trials in which E occurs differs from P(E) by more than ε goes to 0 as n increases.
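This application is easy to simulate. Taking E = "a fair die shows a six" (an arbitrary example, so P(E) = 1/6), the proportion of occurrences in the first n trials settles down near 1/6 as n grows:

```python
import random

rng = random.Random(7)

def proportion(n):
    """Proportion of the first n die rolls on which the event 'six' occurs."""
    return sum(rng.randint(1, 6) == 6 for _ in range(n)) / n

# WLLN in action: the proportion approaches P(E) = 1/6 ≈ 0.1667.
for n in (100, 10_000, 1_000_000):
    print(n, round(proportion(n), 4))
```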
Central Limit Theorem (CLT): Let X1, X2, …, Xn be iid random variables with mean μ and variance σ². For large sample sizes,

$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \dots + X_n - n\mu}{\sigma\sqrt{n}}$$

converges in distribution to the standard normal random variable as n → ∞. That is,

$$\lim_{n \to \infty} P\left(Z_n \le x\right) = \Phi(x)$$
In particular, if X is a binomial random variable with parameters n and p (a sum of n iid Bernoulli(p) variables), then

$$Z_n = \frac{X - np}{\sqrt{np(1-p)}}$$

is approximately standard normal for large n.
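As a sketch of this normal approximation (the parameters n = 100, p = 0.5 and the cutoff 55 are arbitrary choices, and a standard continuity correction of 0.5 is applied), the exact binomial probability and the CLT approximation agree closely:

```python
import math

n, p, k = 100, 0.5, 55

# Exact binomial probability P(X <= k)
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# CLT approximation with continuity correction:
# standardize k + 0.5 and evaluate the standard normal cdf Phi(z)
z = (k + 0.5 - n * p) / math.sqrt(n * p * (1 - p))
approx = 0.5 * math.erfc(-z / math.sqrt(2))   # Phi(z) via erfc
print(f"exact {exact:.4f} vs normal approx {approx:.4f}")
```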
A Remark

We could have looked directly at Yn = X1 + X2 + … + Xn, so why do we normalize it first and say that the normalized version Zn becomes approximately normal? This is because E[Yn] = nE[Xi] = nμ and Var(Yn) = nVar(Xi) = nσ² both go to infinity as n goes to infinity. We normalize Yn in order to have a finite mean and variance (E[Zn] = 0, Var(Zn) = 1). Nevertheless, for any fixed n, the CDF of Zn is obtained by scaling and shifting the CDF of Yn, so the two CDFs have similar shapes.
Example:
An insurance company has 25000 automobile policy holders. If the
yearly claim of a policy holder is a random variable with mean 320 and
standard deviation 540, approximate the probability that the total yearly
claim exceeds 8.3 million.
Soln. Let $X = \sum_{i=1}^{25000} X_i$ be the total yearly claim. By the CLT, X is approximately normal with mean E[X] = 25000 × 320 = 8 × 10⁶ and standard deviation SD(X) = 540√25000 ≈ 8.5381 × 10⁴. Therefore

$$P\left\{X > 8.3 \times 10^6\right\} = P\left\{\frac{X - 8 \times 10^6}{8.5381 \times 10^4} > \frac{0.3 \times 10^6}{8.5381 \times 10^4}\right\} \approx P\left\{Z > 3.51\right\} \approx 0.0002$$
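The final tail probability can be checked with the standard library, using P(Z > z) = erfc(z/√2)/2:

```python
import math

n, mu, sd = 25_000, 320, 540
total_mean = n * mu                  # 8.0e6
total_sd = sd * math.sqrt(n)         # ~8.5381e4

z = (8.3e6 - total_mean) / total_sd  # ~3.51 standard deviations
# Standard normal tail probability: P(Z > z) = erfc(z / sqrt(2)) / 2
p = math.erfc(z / math.sqrt(2)) / 2
print(f"z = {z:.4f}, P(total claim > 8.3 million) ≈ {p:.5f}")
```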
Important Applications
The importance of the central limit theorem is that, in many real applications, a certain random variable of interest is a sum of a large number of independent random variables. In these situations, we are often able to use the CLT to justify using the normal distribution. Here are a few examples:
Laboratory measurement errors are usually modeled by normal
random variables.
In communication and signal processing, Gaussian noise is the most
frequently used model for noise.
In finance, the percentage changes in the prices of some assets are
sometimes modeled by normal random variables.
When we do random sampling from a population to obtain statistical
knowledge about the population, we often model the resulting
quantity as a normal random variable.
The CLT can also simplify our computations significantly. In a problem involving a sum of, say, one thousand iid random variables, it might be extremely difficult to find the distribution of the sum by direct calculation; using the CLT, we can immediately write down the approximate distribution if we know the mean and variance of the Xi's. If Y = X1 + X2 + … + Xn, then

$$\frac{Y - E(Y)}{\sqrt{\operatorname{Var}(Y)}} = \frac{Y - n\mu}{\sigma\sqrt{n}}$$

is approximately standard normal, so

$$P\left(y_1 \le Y \le y_2\right) \approx \Phi\left(\frac{y_2 - n\mu}{\sigma\sqrt{n}}\right) - \Phi\left(\frac{y_1 - n\mu}{\sigma\sqrt{n}}\right)$$
Examples
Example 1. A bank teller serves customers standing in the queue one by one. Suppose that the service time Xi for customer i has mean E[Xi] = 2 minutes and Var(Xi) = 1. We assume that service times for different bank customers are independent. Let Y be the total time the bank teller spends serving 50 customers. Find P(90 < Y < 110).
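A sketch of the solution via the CLT: Y is approximately Normal(100, 50), and since the interval (90, 110) is symmetric about the mean, P(90 < Y < 110) = P(|Z| < 10/√50), which the standard library can evaluate as erf((10/√50)/√2):

```python
import math

n, mu, var = 50, 2, 1
mean_Y = n * mu              # E[Y] = 100
sd_Y = math.sqrt(n * var)    # SD(Y) = sqrt(50)

# The interval (90, 110) is symmetric about E[Y], so
# P(90 < Y < 110) = P(|Z| < z) = erf(z / sqrt(2)) with z = 10 / sqrt(50).
z = (110 - mean_Y) / sd_Y
p = math.erf(z / math.sqrt(2))
print(f"P(90 < Y < 110) ≈ {p:.4f}")
```

Here z/√2 = 1 exactly, so the answer is erf(1) ≈ 0.8427.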
Random Walk