Topics: Random Variables, Chebyshev's Inequality, Convergence, Law of Large Numbers, Central Limit Theorem, Random Walks.
The sample standard deviation is

$$s = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1}}, \qquad s^2 = \text{sample variance}$$

Dividing by n − 1 produces a more accurate estimate of the true population standard deviation.
Note: Because the sum of the deviations equals zero, once n-1 of the
deviations are specified, the last deviation is already determined.
Hence the denominator uses the number of quantities that are free to
vary, called degrees of freedom.
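As a quick illustration of the n − 1 convention (the data set below is an arbitrary example, not from the slides), Python's standard library exposes both denominators: `statistics.stdev` divides the squared deviations by n − 1, while `statistics.pstdev` divides by n.

```python
import statistics

data = [4.0, 7.0, 6.0, 3.0, 5.0]

# Sample standard deviation: squared deviations divided by n - 1
s = statistics.stdev(data)
# Population standard deviation: squared deviations divided by n
sigma = statistics.pstdev(data)

# The n - 1 version is slightly larger, compensating for the fact that
# deviations are measured from the sample mean rather than the (unknown)
# population mean.
print(round(s, 4), round(sigma, 4))
```

The n − 1 version is the quantity denoted s above.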
Chebyshev's inequality

If X is a random variable with mean μ and variance σ², then for any value k > 0,

$$P\left\{\left|X - \mu\right| \ge k\right\} \le \frac{\sigma^2}{k^2}$$

Equivalently, for a data set with sample mean x̄ and sample standard deviation s, at least 100(1 − 1/k²) percent of the data lie within the interval from x̄ − ks to x̄ + ks.
Note: Letting k = 3/2, Chebyshev's inequality shows that more than 100(5/9)% ≈ 55.56% of the data in any data set lie within a distance of 1.5s of the sample mean x̄; letting k = 2 shows that more than 75% of the data lie within 2s of the sample mean; and letting k = 3 shows that more than 100(8/9)% ≈ 88.9% of the data lie within 3s of the sample mean.
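These guarantees are easy to check numerically. The sketch below (the data set is an arbitrary example) verifies that the observed fraction of the data within k sample standard deviations of the mean meets the Chebyshev lower bound 1 − 1/k².

```python
import statistics

def within_k_sds(data, k):
    """Fraction of the data within k sample standard deviations of the mean."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(abs(x - xbar) <= k * s for x in data) / len(data)

data = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # any data set works
for k in (1.5, 2, 3):
    frac = within_k_sds(data, k)
    bound = 1 - 1 / k**2          # Chebyshev: at least this fraction
    assert frac >= bound
    print(f"k={k}: observed {frac:.2f} >= guaranteed {bound:.2f}")
```

The bound holds for every data set; for well-behaved data the observed fraction is usually far above it.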
Empirical rule

If a data set is approximately normal (bell-shaped) with sample mean x̄ and sample standard deviation s, then about 68% of the data lie within x̄ ± s, about 95% within x̄ ± 2s, and about 99.7% within x̄ ± 3s.
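For roughly normal (bell-shaped) data, the empirical rule predicts about 68%, 95%, and 99.7% of the data within 1, 2, and 3 standard deviations of the mean. A quick simulated check (the seed and sample size are arbitrary choices):

```python
import random

rng = random.Random(42)
# Simulated bell-shaped data: standard normal samples
data = [rng.gauss(0, 1) for _ in range(100_000)]

for k, expected in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    frac = sum(abs(x) <= k for x in data) / len(data)
    print(f"within {k} s.d.: {frac:.3f} (rule says ~{expected})")
```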
In general, a sequence of random variables can be defined on a sample space S = {s1, s2, …, sk}, with each Xn assigning a number Xn(s) to every outcome s.
Example 1

Consider the following random experiment. A fair coin is tossed once. Here the sample space has only two elements, S = {H, T}. We define a sequence of random variables X1, X2, …, Xn, … on this sample space as follows:

$$X_n(s) = \begin{cases} \dfrac{1}{n+1} & \text{if } s = H \\[4pt] 1 & \text{if } s = T \end{cases}$$

(a) Are the Xi's independent? (b) Find the pmf and cdf of Xn.
Soln. a. The Xi's are not independent, because their values are determined by the same coin toss. In particular,

$$P(X_1 = 1, X_2 = 1) = P(T) = \frac{1}{2} \;\ne\; P(X_1 = 1)\,P(X_2 = 1) = P(T)\,P(T) = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$$
b. Each Xn takes two possible values, each equally likely. Thus the pmf of Xn is given by

$$P_{X_n}(x) = P(X_n = x) = \begin{cases} \dfrac{1}{2} & \text{if } x = \dfrac{1}{n+1} \\[4pt] \dfrac{1}{2} & \text{if } x = 1 \end{cases}$$
The cdf of Xn is

$$F_{X_n}(x) = P(X_n \le x) = \begin{cases} 1 & \text{if } x \ge 1 \\[2pt] \dfrac{1}{2} & \text{if } \dfrac{1}{n+1} \le x < 1 \\[2pt] 0 & \text{if } x < \dfrac{1}{n+1} \end{cases}$$
[Figure omitted: the cdf of Xn for different values of n.] The figure shows that the cdf of Xn approaches the cdf of a Bernoulli(1/2) random variable as n → ∞.
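Since the figure is not reproduced here, the cdf can be evaluated directly. The minimal sketch below implements the formula above and shows the first jump at 1/(n + 1) sliding toward 0 as n grows, so that F_{Xn}(x) approaches the Bernoulli(1/2) cdf at every continuity point x.

```python
def F(n, x):
    """Exact cdf of X_n: a jump of 1/2 at 1/(n+1) and another jump at 1."""
    if x < 1 / (n + 1):
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

# The Bernoulli(1/2) cdf is 0 below 0, 1/2 on [0, 1), and 1 from 1 on.
# As n grows, the jump at 1/(n+1) moves toward 0, so F(n, x) matches
# the Bernoulli(1/2) cdf for every fixed x != 0 once n is large enough.
for n in (1, 5, 50):
    print(n, [F(n, x) for x in (0.05, 0.5, 1.0)])
```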
Example 2

Consider the following random experiment. A fair coin is tossed repeatedly forever. Here the sample space S consists of all possible infinite sequences of heads and tails. We define a sequence of random variables X1, X2, …, Xn, … as follows:

$$X_n = \begin{cases} 1 & \text{if the } n\text{th toss is heads} \\ 0 & \text{otherwise} \end{cases}$$
Convergence

We are interested in the behaviour of functions of random variables such as means, variances, and proportions. For large samples, exact distributions can be difficult or impossible to obtain. In these situations we would like to see whether a sequence of random variables X1, X2, …, Xn, … converges to a random variable X, i.e. whether Xn gets closer and closer to X in some sense as n increases. For example, suppose we are interested in the value of a random variable X, but we are not able to observe X directly. Instead, we can take some measurements and come up with an estimate of X, call it X1. We then perform more measurements and update our estimate, call it X2. Continuing this process yields X1, X2, X3, …. As n increases our estimate gets better and better, i.e. Xn converges to X.
Example

Let Y1, Y2, Y3, … be a sequence of iid random variables, and let

$$X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

be the average of the first n of the Yi's. This defines a new sequence X1, X2, …. In other words, the sequence of interest X1, X2, … might be a sequence of statistics based on some other sequence of iid random variables. Note that the original sequence Y1, Y2, … is iid, but the sequence X1, X2, … is not iid.
Types of convergence

There are two main types of convergence.

Convergence in Probability (CIP) – A sequence of random variables Xn is said to converge in probability to X, written Xn →P X, if for every ε > 0,

$$P\left(\left|X_n - X\right| > \varepsilon\right) \to 0 \text{ as } n \to \infty$$
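A minimal sketch of this definition, using a hypothetical sequence Xn = X + En with the error En uniform on (−1/n, 1/n) (this example sequence is my own choice, not from the slides): here P(|Xn − X| > ε) is exactly 0 once 1/n < ε, so Xn →P X. A Monte Carlo estimate shows the probability dropping to zero.

```python
import random

rng = random.Random(0)
eps = 0.05

def prob_far(n, trials=10_000):
    """Monte Carlo estimate of P(|X_n - X| > eps) when X_n - X ~ U(-1/n, 1/n)."""
    hits = sum(abs(rng.uniform(-1 / n, 1 / n)) > eps for _ in range(trials))
    return hits / trials

# As n grows, the estimated probability shrinks and hits 0 once 1/n < eps.
for n in (1, 10, 100):
    print(n, prob_far(n))
```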
Besides these, there are also other types of convergence.

Convergence in Quadratic Mean – A sequence of random variables Xn is said to converge in quadratic mean to X, written Xn →qm X, if

$$E\left(\left|X_n - X\right|^2\right) \to 0 \text{ as } n \to \infty$$

Almost Sure Convergence – Xn →a.s. X if for every ε > 0,

$$P\left(\lim_{n \to \infty}\left|X_n - X\right| < \varepsilon\right) = 1$$
Relation between different types of convergence:

$$\text{a.s.} \Rightarrow \text{CIP}, \qquad \text{qm} \Rightarrow \text{CIP} \Rightarrow \text{CID}$$
Limit Theorems

Limit theorems can be used to obtain properties of estimators as the sample size tends to infinity:
Convergence in Probability → the limit of an estimator
Convergence in Distribution → the limit of a CDF

Two important theorems in probability are the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT).

Law of Large Numbers (LLN)

Weak Law of Large Numbers (WLLN): Let X1, X2, … be iid random variables with mean μ. Then for any ε > 0,

$$\lim_{n \to \infty} P\left(\left|\bar{X} - \mu\right| > \varepsilon\right) = 0$$
Proof: The proof is easier if we additionally assume that the random variables have a finite variance, i.e. Var(Xi) = σ² < ∞. Since E[X̄] = μ and Var(X̄) = σ²/n, Chebyshev's inequality gives

$$P\left(\left|\bar{X} - \mu\right| > \varepsilon\right) \le \frac{\operatorname{Var}(\bar{X})}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0 \text{ as } n \to \infty$$
Application: Suppose that a sequence of independent trials is performed. Let E be a fixed event, and let P(E) be the probability that E occurs on any given trial. Define

$$X_i = \begin{cases} 1 & \text{if } E \text{ occurs on trial } i \\ 0 & \text{if } E \text{ does not occur on trial } i \end{cases}$$

Then X1 + X2 + … + Xn is the number of times that E occurs in the first n trials. Because E[Xi] = P(E), the WLLN implies that for any ε > 0, the probability that the proportion of the first n trials in which E occurs differs from P(E) by more than ε goes to 0 as n increases.
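This application is easy to simulate. Taking E = "a fair die shows a six" (an arbitrary example, so P(E) = 1/6), the proportion of occurrences in the first n trials settles down near 1/6 as n grows:

```python
import random

rng = random.Random(7)

def proportion(n):
    """Proportion of the first n die rolls on which the event 'six' occurs."""
    return sum(rng.randint(1, 6) == 6 for _ in range(n)) / n

# WLLN in action: the proportion approaches P(E) = 1/6 ≈ 0.1667.
for n in (100, 10_000, 1_000_000):
    print(n, round(proportion(n), 4))
```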
Central Limit Theorem (CLT): Let X1, X2, …, Xn be iid random variables with mean μ and variance σ². For large sample sizes,

$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \dots + X_n - n\mu}{\sigma\sqrt{n}}$$

converges in distribution to the standard normal random variable as n → ∞. That is,

$$\lim_{n \to \infty} P\left(Z_n \le x\right) = \Phi(x)$$
In particular, if X is a binomial random variable with parameters n and p (a sum of n iid Bernoulli(p) variables), then

$$Z_n = \frac{X - np}{\sqrt{np(1-p)}}$$

is approximately standard normal for large n.
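As a sketch of this normal approximation (the parameters n = 100, p = 0.5 and the cutoff 55 are arbitrary choices, and a standard continuity correction of 0.5 is applied), the exact binomial probability and the CLT approximation agree closely:

```python
import math

n, p, k = 100, 0.5, 55

# Exact binomial probability P(X <= k)
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# CLT approximation with continuity correction:
# standardize k + 0.5 and evaluate the standard normal cdf Phi(z)
z = (k + 0.5 - n * p) / math.sqrt(n * p * (1 - p))
approx = 0.5 * math.erfc(-z / math.sqrt(2))   # Phi(z) via erfc
print(f"exact {exact:.4f} vs normal approx {approx:.4f}")
```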
A Remark

We could have looked directly at Yn = X1 + X2 + … + Xn, so why do we normalize it first and say that the normalized version Zn becomes approximately normal? This is because E[Yn] = nE[Xi] = nμ and Var(Yn) = nVar(Xi) = nσ² both go to infinity as n goes to infinity. We normalize Yn in order to have a finite mean and variance (E[Zn] = 0, Var(Zn) = 1). Nevertheless, for any fixed n, the CDF of Zn is obtained by scaling and shifting the CDF of Yn, so the two CDFs have similar shapes.
Example:
An insurance company has 25000 automobile policy holders. If the
yearly claim of a policy holder is a random variable with mean 320 and
standard deviation 540, approximate the probability that the total yearly
claim exceeds 8.3 million.
Soln. Let $X = \sum_{i=1}^{25000} X_i$ be the total yearly claim. By the CLT, X is approximately normal with mean E[X] = 25000 × 320 = 8 × 10⁶ and standard deviation SD(X) = 540√25000 ≈ 8.5381 × 10⁴. Therefore

$$P\left\{X > 8.3 \times 10^6\right\} = P\left\{\frac{X - 8 \times 10^6}{8.5381 \times 10^4} > \frac{0.3 \times 10^6}{8.5381 \times 10^4}\right\} \approx P\left\{Z > 3.51\right\} \approx 0.0002$$
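The final tail probability can be checked with the standard library, using P(Z > z) = erfc(z/√2)/2:

```python
import math

n, mu, sd = 25_000, 320, 540
total_mean = n * mu                  # 8.0e6
total_sd = sd * math.sqrt(n)         # ~8.5381e4

z = (8.3e6 - total_mean) / total_sd  # ~3.51 standard deviations
# Standard normal tail probability: P(Z > z) = erfc(z / sqrt(2)) / 2
p = math.erfc(z / math.sqrt(2)) / 2
print(f"z = {z:.4f}, P(total claim > 8.3 million) ≈ {p:.5f}")
```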
Important Applications
The importance of the central limit theorem is that, in many real applications, a certain random variable of interest is a sum of a large number of independent random variables. In these situations, we are often able to use the CLT to justify using the normal distribution. Here are a few examples:
Laboratory measurement errors are usually modeled by normal
random variables.
In communication and signal processing, Gaussian noise is the most
frequently used model for noise.
In finance, the percentage changes in the prices of some assets are
sometimes modeled by normal random variables.
When we do random sampling from a population to obtain statistical
knowledge about the population, we often model the resulting
quantity as a normal random variable.
The CLT can also simplify our computations significantly. In a problem involving a sum of, say, one thousand iid random variables, it might be extremely difficult to find the distribution of the sum by direct calculation; using the CLT, we can immediately write down the approximate distribution if we know the mean and variance of the Xi's. If Y = X1 + X2 + … + Xn, then

$$\frac{Y - E(Y)}{\sqrt{\operatorname{Var}(Y)}} = \frac{Y - n\mu}{\sigma\sqrt{n}}$$

is approximately standard normal, so

$$P\left(y_1 \le Y \le y_2\right) \approx \Phi\left(\frac{y_2 - n\mu}{\sigma\sqrt{n}}\right) - \Phi\left(\frac{y_1 - n\mu}{\sigma\sqrt{n}}\right)$$
Examples
Example 1. A bank teller serves customers standing in the queue one by one. Suppose that the service time Xi for customer i has mean E[Xi] = 2 minutes and Var(Xi) = 1. We assume that service times for different bank customers are independent. Let Y be the total time the bank teller spends serving 50 customers. Find P(90 < Y < 110).
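A sketch of the solution via the CLT: Y is approximately Normal(100, 50), and since the interval (90, 110) is symmetric about the mean, P(90 < Y < 110) = P(|Z| < 10/√50), which the standard library can evaluate as erf((10/√50)/√2):

```python
import math

n, mu, var = 50, 2, 1
mean_Y = n * mu              # E[Y] = 100
sd_Y = math.sqrt(n * var)    # SD(Y) = sqrt(50)

# The interval (90, 110) is symmetric about E[Y], so
# P(90 < Y < 110) = P(|Z| < z) = erf(z / sqrt(2)) with z = 10 / sqrt(50).
z = (110 - mean_Y) / sd_Y
p = math.erf(z / math.sqrt(2))
print(f"P(90 < Y < 110) ≈ {p:.4f}")
```

Here z/√2 = 1 exactly, so the answer is erf(1) ≈ 0.8427.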
Random Walk