
Statistics 201A, Jim Pitman, Fall 2015 Lecture 9

Notes by Nandish Mehta


Wednesday 16, 2015

Contents
1 Midterm Review
  1.1 Distribution of a sum of independent random variables
  1.2 The gamma function
  1.3 Graphical Explanation of Chebyshev's Inequality
  1.4 Constructing a negative binomial variable using indicators

2 Symmetry of Random variables
  2.1 Application to order statistics
  2.2 Density of the k-th order statistic

1 Midterm Review

1.1 Distribution of a sum of independent random variables

It is a general question when this works out nicely. You have seen the most
important examples for applications in statistics (binomial, Poisson, normal,
geometric, negative binomial, exponential, gamma) in Worksheets 1 and 2.
Another example is compound Poisson distributions in Worksheet 4. This
leads to the general theory of infinitely divisible distributions https://en.
wikipedia.org/wiki/Infinite_divisibility_(probability). The distribution of X
is called infinitely divisible if for every n it is possible to construct a
sequence X_{n,1}, ..., X_{n,n} of n IID random variables such that

    X \stackrel{d}{=} \sum_{i=1}^{n} X_{n,i}.

All of the above examples except binomial are infinitely divisible, as you can
easily check. Another very interesting infinitely divisible law is the Cauchy
distribution of Y with density

    P(Y \in dy) = \frac{dy}{\pi (1 + y^2)}, \qquad y \in \mathbb{R}.        (1)
This distribution of Y has the amazing property that if Y_1, Y_2, ..., Y_n are IID
copies of Y, then

    \frac{Y_1 + Y_2 + \cdots + Y_n}{n} \stackrel{d}{=} Y.                   (2)

In other words, averaging IID copies of Y does not reduce at all the spread
in the distribution of Y. Compare with the more familiar

    \frac{Z_1 + Z_2 + \cdots + Z_n}{n} \stackrel{d}{=} \frac{Z_1}{n^{1/2}}   (3)
for IID normal(0, \sigma^2) variables Z_i, for any \sigma > 0. It seems at first that (2) goes against
the law of large numbers, according to which averaging independent random
variables should reduce their variability. But the law of large numbers applies
only to IID random variables Y_i with E|Y| < \infty, and for the Cauchy
distribution we have E|Y| = \infty. The exact convolution rule (2) can be
derived with pain from the convolution formula for densities, and without
pain using the characteristic function of Y, which is E exp(itY) = e^{-|t|} for
real t, and the uniqueness theorem for characteristic functions.
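The stability property (2), and the contrast with (3), is easy to see by simulation. The following is a minimal sketch, not part of the original notes; it assumes NumPy is available and the values of n and the number of repetitions are illustrative. The interquartile range of an average of n standard Cauchy variables stays at its single-variable value of 2, while the normal average shrinks by a factor of sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 50_000

# Average n IID standard Cauchy variables, and n IID standard normals, reps times.
cauchy_avg = rng.standard_cauchy(size=(reps, n)).mean(axis=1)
normal_avg = rng.standard_normal(size=(reps, n)).mean(axis=1)

# Standard Cauchy quartiles are -1 and 1, so its IQR is 2; the average keeps that spread.
# The normal average has its IQR shrunk by a factor of sqrt(n) = 10.
print("Cauchy average IQR:", np.subtract(*np.percentile(cauchy_avg, [75, 25])))
print("Normal average IQR:", np.subtract(*np.percentile(normal_avg, [75, 25])))
print("Single standard normal IQR:", 2 * 0.6745)
```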

1.2 The gamma function

Definition: The gamma function is a function of r defined by

    \Gamma(r) := \int_0^\infty x^{r-1} e^{-x} \, dx, \qquad \text{for every } r > 0.    (4)
As such, it is just the necessary normalization constant in the probability
density of a random variable X on (0, \infty) with density f_X(x) \propto x^{r-1} e^{-x} 1(x > 0).
So the distribution of X is called gamma(r, 1) iff
f_X(x) = \Gamma(r)^{-1} x^{r-1} e^{-x} 1(x > 0), and gamma(r, \lambda) for \lambda > 0 iff
\lambda X \sim gamma(r, 1).
Recursion property: The gamma function satisfies the generalized factorial
recursion

    \Gamma(r + 1) = r \Gamma(r).                                            (5)
Particular Values:

    \Gamma(1) = \int_0^\infty e^{-x} \, dx = 1                              (6)

    \Gamma(2) = 1 \cdot \Gamma(1) = 1                                       (7)

    \Gamma(3) = 2 \Gamma(2) = 2 \cdot 1 = 2                                 (8)

    \Gamma(n) = (n - 1)!

The gamma function for half-integer values is given by

    \Gamma(1/2) = \sqrt{\pi}                                                (9)

    \Gamma(3/2) = \tfrac{1}{2} \Gamma(1/2) = \tfrac{\sqrt{\pi}}{2}          (10)

    \Gamma(7/2) = \tfrac{5}{2} \cdot \tfrac{3}{2} \cdot \tfrac{1}{2} \sqrt{\pi}    (11)

As r \to 0 the gamma function explodes to +\infty.
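These identities are easy to spot-check numerically. The short sketch below is not part of the original notes; it assumes SciPy is available, and the sample values of r are arbitrary.

```python
import numpy as np
from scipy.special import gamma

# Recursion (5): Gamma(r + 1) = r * Gamma(r), for a few values of r.
for r in (0.5, 1.7, 4.0):
    assert np.isclose(gamma(r + 1), r * gamma(r))

# Half-integer values (9)-(11).
assert np.isclose(gamma(0.5), np.sqrt(np.pi))                          # Gamma(1/2) = sqrt(pi)
assert np.isclose(gamma(1.5), np.sqrt(np.pi) / 2)                      # Gamma(3/2)
assert np.isclose(gamma(3.5), (5/2) * (3/2) * (1/2) * np.sqrt(np.pi))  # Gamma(7/2)

# Integer values: Gamma(n) = (n - 1)!.
assert gamma(5) == 24
print("all gamma identities check out")
```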


Gamma Distribution: This is a two-parameter continuous distribution defined for a random variable Y > 0 by

    Y \sim gamma(r, \lambda) \iff f_Y(y) = \frac{\lambda^r}{\Gamma(r)} y^{r-1} e^{-\lambda y} \quad \text{for } y > 0

    Y \sim gamma(r, 1) \iff f_Y(y) = \frac{y^{r-1} e^{-y}}{\Gamma(r)} \quad \text{for } y > 0            (12)

Note that Y \sim gamma(r, \lambda) \iff \lambda Y \sim gamma(r, 1).


Stirling's approximation:

    n! \sim \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n \qquad \text{as } n \to \infty    (13)

Figure 1: Graph of \Gamma(r) for 0 < r < 5

where the \sim means the ratio of the two sides converges to 1 as n \to \infty.
Written with \Gamma(n + 1) in place of n!, this approximation works also for real
r instead of n:

    \Gamma(r + 1) \sim \sqrt{2 \pi r} \left( \frac{r}{e} \right)^r \qquad \text{as } r \to \infty    (14)

To get asymptotics for \Gamma(r) instead of \Gamma(r + 1), simply use \Gamma(r + 1) = r \Gamma(r) to see

    \Gamma(r) \sim \sqrt{\frac{2 \pi}{r}} \left( \frac{r}{e} \right)^r \qquad \text{as } r \to \infty.    (15)
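A quick numerical look at (14), again as a sketch rather than part of the notes (SciPy assumed; the values of r are illustrative): the ratio of \Gamma(r + 1) to Stirling's approximation approaches 1.

```python
import numpy as np
from scipy.special import gamma

# Ratio of Gamma(r + 1) to Stirling's approximation (14); it should approach 1 as r grows.
for r in (5.0, 20.0, 80.0):
    stirling = np.sqrt(2 * np.pi * r) * (r / np.e) ** r
    print(r, gamma(r + 1) / stirling)
```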

1.3 Graphical Explanation of Chebyshev's Inequality

Problem: Bound P(|X - a| \ge b) = E[1(|X - a| \ge b)]. The method is shown
in Figure 2. For each fixed choice of a and b > 0, the graph of 1(|X - a| \ge b)
as a function of X is dominated by the parabola (X - a)^2 / b^2, which is designed
to touch the indicator function only at X = a and at X = a \pm b. This
clearly gives the inequality

    1(|X - a| \ge b) \le \frac{(X - a)^2}{b^2}.                             (16)

Figure 2: Graphical explanation of Chebyshev's inequality


Taking expectations on both sides, Chebyshev's inequality is established as
follows:

    P(|X - a| \ge b) = E[1(|X - a| \ge b)] \le \frac{E(X - a)^2}{b^2}.      (17)

If a = E(X) and b = k \sigma_X, where \sigma_X^2 := Var(X) = E(X - E(X))^2, this becomes

    P(|X - E(X)| \ge k \sigma_X) \le \frac{1}{k^2}, \qquad \text{for any real } k > 0.    (18)

Only the case k > 1 is of any interest, because for k \le 1 the right-hand
side is at least the trivial bound of 1.
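To see how conservative the bound (18) typically is, here is a small simulation sketch (not in the notes; NumPy assumed, with an exponential(1) variable chosen only for illustration): the empirical tail probabilities are far below 1/k^2.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)   # mean 1, standard deviation 1
mu, sigma = x.mean(), x.std()

# Empirical tail probability P(|X - mu| >= k*sigma) versus the Chebyshev bound 1/k^2.
for k in (2, 3, 4):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, empirical, 1 / k**2)
```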

1.4 Constructing a negative binomial variable using indicators

Question: Can a negative binomial variable be usefully written as a sum of
indicators?
Answer: Yes, on account of the following:

Figure 3: Sum of indicator functions

Fact: Every random variable X \in \{0, 1, 2, ...\} can be expressed as an infinite
sum of indicators:

    X = 1(X \ge 1) + 1(X \ge 2) + 1(X \ge 3) + 1(X \ge 4) + \cdots          (19)

The equation above is represented by Figure 3.
Taking expectations on both sides we get

    E(X) = P(X \ge 1) + P(X \ge 2) + P(X \ge 3) + P(X \ge 4) + \cdots       (20)

The equation above is called the tail-sum formula for E(X). It
applies to any X \in \{0, 1, 2, 3, ...\}.
Example: Assume a random variable X = G_p \sim Geom(p) on \{0, 1, 2, ...\}. Get

    E(G_p) = \sum_{n=1}^{\infty} P(G_p \ge n) = \sum_{n=1}^{\infty} (1 - p)^n
           = \frac{1 - p}{1 - (1 - p)} = \frac{1 - p}{p}.                   (21)
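A small numerical sketch of (20)-(21), not part of the notes (NumPy assumed; p = 0.3 and the truncation point are illustrative): the truncated tail sum, the sample mean, and (1 - p)/p all agree.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.3
# numpy's geometric counts trials up to the first success (support 1, 2, ...),
# so subtract 1 to get the number of failures, matching Gp on {0, 1, 2, ...}.
g = rng.geometric(p, size=1_000_000) - 1

tail_sum = sum(np.mean(g >= n) for n in range(1, 200))   # truncated tail sum (20)
print(tail_sum, g.mean(), (1 - p) / p)                   # all close to 7/3
```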

Figure 4: Tail integral for the expectation of a random variable X \ge 0 from its
CDF. Note: this is the graph of the CDF F(x) := P(X \le x), not P(X \ge x).
E(X) is the shaded area above the graph and below 1.

As a check, E(G_p + 1) = E(G_p) + 1 = 1/p, which is the familiar formula for
the expected number of independent Bernoulli(p) trials to produce the first
success.
For a random variable X \ge 0 with a CDF F_X(x), the analog of the
tail-sum formula is the tail-integral formula

    E(X) = \int_0^\infty P(X > x) \, dx = \int_0^\infty (1 - F_X(x)) \, dx
This tail-integral formula is represented graphically by Figure 4. The CDF is
F_X(x) := P(X \le x). The vertical dotted line above x represents a segment
of length 1 - F_X(x) = P(X > x). This method of computing expectations
works fine for any non-negative random variable X. To see this, recall that
U \sim U[0, 1] implies F^{-1}(U) \stackrel{d}{=} X. So

    E(X) = E(F^{-1}(U)) = \int_0^1 F^{-1}(u) \, du

The lengths F^{-1}(u) for selected u \in [0, 1] are represented by horizontal lines
in Figure 4. The shaded area can be computed in two different ways, either
by summing lengths of vertical segments, weighted by spacings between these
segments, or by summing similarly weighted lengths of horizontal segments,
and passing to the limit of closely spaced segments. Either way, the limiting
area is E(X), whether finite or infinite.
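The two routes to E(X), the tail integral and the integral of the quantile function F^{-1}, can be compared numerically. A minimal sketch (not from the notes; SciPy assumed, with the standard exponential chosen only because its tail and quantile functions are explicit):

```python
import numpy as np
from scipy import integrate

# X ~ exponential(1), so E(X) = 1, P(X > x) = exp(-x), and F^{-1}(u) = -log(1 - u).
tail_integral, _ = integrate.quad(lambda x: np.exp(-x), 0, np.inf)
quantile_integral, _ = integrate.quad(lambda u: -np.log1p(-u), 0, 1)

print(tail_integral, quantile_integral)   # both equal E(X) = 1
```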
Figure 5: Expectation of a discrete random variable X from its CDF. The S2 seems irrelevant.

Tail-sum formula for the discrete case: The same picture works when X is a discrete random
variable. Again, as shown in Figure 5, the area covered by the vertical or
the horizontal line segments is equal to E(X).
Side Note: Consider the CDF of a random variable X as shown in Figure 6.
For any value of X, positive or negative, X can be written as X = X_+ - X_-,
where X_+ = X \cdot 1(X \ge 0) and X_- = -X \cdot 1(X \le 0). Thus linearity
of expectation and application of the tail-integral formula to X_+ and X_-
separately give

    E(X) = E(X_+) - E(X_-) = \int_0^\infty P(X > x) \, dx - \int_{-\infty}^0 P(X \le x) \, dx    (22)

as illustrated in Figure 6. In the example of Figure 6, assuming the distribution
is all concentrated on the interval of X values shown in the diagram
(so there must be atoms at the endpoints), it appears that E(X_-) is slightly
greater than E(X_+). So E(X) must be slightly negative.

Figure 6: CDF of a random variable X that takes values of both signs.

2 Symmetry of Random variables

Recall the change of random variable principle, which asserts that

    X \stackrel{d}{=} Y \implies \phi(X) \stackrel{d}{=} \phi(Y)            (23)

where X can be any sequence or vector, and \phi is any measurable function
defined on the range of values of X.
Key Definition: A sequence of random variables (X_1, X_2, ..., X_n) is called
exchangeable (or symmetric) iff

    (X_{\sigma(1)}, X_{\sigma(2)}, ..., X_{\sigma(n)}) \stackrel{d}{=} (X_1, X_2, ..., X_n)    (24)

for every permutation \sigma of \{1, 2, ..., n\}. For example, the permutation
\sigma(i) := n + 1 - i reverses the order of the variables. So if X_1, ..., X_n
are exchangeable, then the sequence X_1, ..., X_n is reversible, meaning

    (X_1, X_2, ..., X_n) \stackrel{d}{=} (X_n, X_{n-1}, ..., X_1).

For two variables X_1 and X_2, that is all that is involved in exchangeability:
(X_1, X_2) \stackrel{d}{=} (X_2, X_1). But for n variables there are n! different permutations
of \{1, 2, ..., n\}, so exchangeability of X_1, X_2, ..., X_n for n \ge 3 is much
stronger than just reversibility.
Observations:
- An IID sequence is exchangeable.
- A pair of random variables (X, Y) with joint density f_{X,Y} is exchangeable
  if and only if f_{X,Y}(x, y) = f_{X,Y}(y, x). This is also depicted in Figure 7.
- A vector of random variables (X_1, ..., X_n) with joint density f(x_1, ..., x_n)
  is exchangeable if and only if f can be chosen to satisfy

      f(x_{\sigma(1)}, ..., x_{\sigma(n)}) = f(x_1, ..., x_n)

  for every permutation \sigma of \{1, 2, ..., n\}. That is to say, f is a symmetric
  function of n variables https://en.wikipedia.org/wiki/Symmetric_function.
  To match up with the previous definition, the permutation here should be taken
  to be the inverse of the one in (24). This is clear from consideration of the
  corresponding formula for probability functions in the discrete case.

2.1 Application to order statistics


d

Consider Y1 , Y2 , Y3 exchangeable. Then (Y1 , Y2 , Y3 ) = (Y2 , Y1 , Y3 ) = (3!


= 6 identical joint distributions). So in particular
P (Y1 < Y2 < Y3 ) = P (Y2 < Y1 < Y3 ) =

(25)

where \cdots goes through the remaining 4 possible order relations for 3 variables.
All of these probabilities are equal. If we assume that there is a
joint density, or that the Y_i are IID with a continuous CDF, then e.g.
P(Y_1 = Y_2) = 0, and the same for any other event requiring ties between
2 or more values of the Y_i. It follows in that case that the 3! probabilities in
(25) are not only all equal, they add up to 1. So each of these probabilities
is 1/3!. This argument is very general. It shows that for any sequence of
exchangeable random variables (Y_1, ..., Y_n) such that P(Y_1 = Y_2) = 0, in
particular for any such sequence with a joint density, or if the Y_i are IID
with a continuous CDF, each of the n! events requiring the Y_i to be in a
particular order has probability 1/n!. This fact is the key to all the problems
involving records of such Y_i in Worksheet 3.
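The 1/n! claim is easy to illustrate by simulation. A short sketch, not part of the notes (NumPy assumed; n = 3 IID uniforms and the repetition count are illustrative): each of the 3! = 6 orderings appears with empirical frequency close to 1/6.

```python
import numpy as np
from collections import Counter
from math import factorial

rng = np.random.default_rng(3)
n, reps = 3, 120_000

# Record which ordering the n IID uniforms fall into on each repetition.
orders = Counter(tuple(np.argsort(rng.random(n))) for _ in range(reps))

for perm in sorted(orders):
    print(perm, orders[perm] / reps, "vs", 1 / factorial(n))   # each close to 1/6
```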
Next, define order statistics for Y_1, Y_2, ..., Y_n (assuming no ties) by
Y_{n,1} < Y_{n,2} < \cdots < Y_{n,n} with Y_{n,i} = Y_{\sigma(i)} for some random
permutation \sigma of \{1, 2, ..., n\}. Figure 8 below shows one such scenario. Note
that in it \sigma(1) = 1, \sigma(2) = 3 and \sigma(3) = 2.

Figure 7: Exchangeable variables X and Y. For any two points (x, y) and
(y, x) which are reflections of each other across the diagonal, the joint density
(or joint probability function) at the two points must be the same.

Figure 8: A particular permutation of the values Y_1, Y_2, Y_3.

Figure 9: The order statistics falling in dy_1, dy_2, ..., dy_n, with an X marking
the location of each Y_i.


Theorem: Assume Y_1, Y_2, ... are IID with density f_Y(y). Then:
1) \sigma has uniform distribution on the n! different permutations of \{1, ..., n\}.
2) The joint density of the order statistics is

    P(Y_{n,1} \in dy_1, Y_{n,2} \in dy_2, ..., Y_{n,n} \in dy_n)
        = n! f_Y(y_1) f_Y(y_2) \cdots f_Y(y_n) \, 1(y_1 < y_2 < \cdots < y_n) \, dy_1 dy_2 \cdots dy_n    (26)

3) \sigma is independent of the vector of order statistics (Y_{n,1}, Y_{n,2}, ..., Y_{n,n}).

The event that Y_1, Y_2, ..., Y_n is such that the order statistics fall in
dy_1, dy_2, ..., dy_n is shown in Figure 9, with an X marking the location of
each Y_i. Note there should be an X in the last bin too, and 0s in all the
intervals between the bins, to indicate no points fall there.
Proof: Say we take n = 3. Then we express, for y_1 < y_2 < y_3,

    P(\sigma(1) = 3, \sigma(2) = 1, \sigma(3) = 2, Y_{n,1} \in dy_1, Y_{n,2} \in dy_2, Y_{n,3} \in dy_3)
        = P(Y_3 \in dy_1, Y_1 \in dy_2, Y_2 \in dy_3)                       (27)

By independence of the Y_i, the right side can be rewritten as

    f_Y(y_1) dy_1 \, f_Y(y_2) dy_2 \, f_Y(y_3) dy_3 = \frac{1}{3!} \left( 3! f_Y(y_1) dy_1 \cdots f_Y(y_3) dy_3 \right)    (28)

which is the required factorization: the 1/3! is the probability of the particular
permutation, and the rest is the joint density of the order statistics. The
interpretation of the two factors is clear by either summing or integrating
over cases, as needed. (Usual story: factorization of a probability function or
density implies independence; the constants can be found by summation or
integration of the factored expression.) The generalization of this argument
to the n! permutations of n variables is rather obvious.


Application:

    \frac{P(Y_{n,1} \in dy_1)}{dy_1} = \int \cdots \int (\text{the previous expression, integrated over the } (n-1) \text{ other variables})
        = n f_Y(y_1) (1 - F_Y(y_1))^{n-1}                                   (29)

Alternate Method: It is not really necessary to deal with the joint density
of all n order statistics to find the distribution of just one or two of them.
Look for instance at

    Y_{n,1} := \min_{1 \le i \le n} Y_i                                     (30)

A general technique for finding the distribution of a minimum is to look at the
right tail. Thus

    P(Y_{n,1} > y) = P\left( \bigcap_{i=1}^{n} \{ Y_i > y \} \right) = (1 - F_Y(y))^n \qquad \text{(by independence)}

This implies

    \frac{P(Y_{n,1} \in dy)}{dy} = \frac{d}{dy} P(Y_{n,1} \le y) = -\frac{d}{dy} P(Y_{n,1} > y)    (31)

        = -\frac{d}{dy} (1 - F_Y(y))^n = n f_Y(y) (1 - F_Y(y))^{n-1}        (32)

as before, but now using the chain rule of calculus and f_Y(y) = \frac{d}{dy} F_Y(y).
Similarly, the density of the maximum can be written down just as easily:

    \frac{P(Y_{n,n} \in dy)}{dy} = n f_Y(y) F_Y(y)^{n-1}                    (33)

or derived by differentiation of P(Y_{n,n} \le y) = F_Y(y)^n.
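Formulas (32) and (33) can be checked against simulated data. The sketch below is not from the notes; it assumes NumPy, uses exponential(1) variables purely for illustration, and estimates the density by the fraction of samples in a small window around each test point (so only rough agreement is expected).

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 5, 1_000_000
y = rng.exponential(size=(reps, n))
y_min, y_max = y.min(axis=1), y.max(axis=1)

f = lambda t: np.exp(-t)          # density f_Y of exponential(1)
F = lambda t: 1 - np.exp(-t)      # CDF F_Y

h = 0.01                          # half-width of the window approximating dy
for t in (0.3, 0.6, 1.0):
    emp_min = np.mean(np.abs(y_min - t) < h) / (2 * h)
    emp_max = np.mean(np.abs(y_max - t) < h) / (2 * h)
    print(t, emp_min, n * f(t) * (1 - F(t)) ** (n - 1))   # minimum, formula (32)
    print(t, emp_max, n * f(t) * F(t) ** (n - 1))         # maximum, formula (33)
```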

2.2 Density of the k-th order statistic

By the same intuitive calculation of differential probabilities, the density of
the k-th order statistic is given by

    \frac{P(Y_{n,k} \in dy)}{dy} = \frac{n!}{(k-1)! \, 1! \, (n-k)!} f_Y(y) F_Y(y)^{k-1} (1 - F_Y(y))^{n-k}    (34)

The multinomial coefficient comes from choosing k - 1 variables to fall before
y, one to fall in dy, and n - k to fall after y. See Figure 10.

Figure 10: The counts in three intervals that make the k-th order statistic fall
in dy = dy_1.
Important Case: The distribution of the Y's is U[0, 1]. So we need only consider
0 < u < 1:

    f_U(u) = 1(0 < u < 1)
    F_U(u) = u \qquad \text{for } 0 < u < 1
    1 - F_U(u) = 1 - u \qquad \text{for } 0 < u < 1                         (35)

So for U_1, U_2, ..., U_n IID uniform(0, 1) variables,

    \frac{P(U_{n,k} \in du)}{du} = \frac{n!}{(k-1)! (n-k)!} u^{k-1} (1 - u)^{n-k} \, 1(0 < u < 1)    (36)

So U_{n,k} \stackrel{d}{=} \beta_{k, n-k+1}, where \beta_{r,s} \sim beta(r, s) as discussed in the previous
lecture.
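A quick numerical sketch of this identity (not in the notes; SciPy assumed, and n = 10, k = 3 are illustrative): the simulated k-th order statistic of n uniforms is compared with the beta(k, n - k + 1) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k, reps = 10, 3, 100_000

# k-th smallest of n IID uniform(0, 1) variables, repeated many times.
u_nk = np.sort(rng.random((reps, n)), axis=1)[:, k - 1]

# Compare with the beta(k, n - k + 1) distribution.
print(stats.kstest(u_nk, stats.beta(k, n - k + 1).cdf))   # p-value typically not small
print(u_nk.mean(), k / (n + 1))                           # beta mean r/(r+s) = k/(n+1)
```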
Check: The \beta_{r,s} density is \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} u^{r-1} (1-u)^{s-1}. Match powers of u and
(1 - u) to see r = k and s = n - k + 1. Comparing constants, we see that

    \frac{\Gamma(k + (n - k + 1))}{\Gamma(k) \Gamma(n - k + 1)} = \frac{n!}{(k-1)! (n-k)!}    (37)

which is correct by three applications of \Gamma(n) = (n - 1)! for n = 1, 2, ....

Note that we have a separate calculation of the normalization constant in
the beta(r, s) density whenever both r and s are positive integers, by this
construction of

    U_{r+s-1, r} \sim beta(r, s) \qquad \text{for } r, s \in \{1, 2, ...\}
These beta densities with integer parameters also arise naturally as posterior
distributions in Bayesian inference for an unknown probability parameter U
in [0, 1] whose prior distribution is uniform = beta(1, 1). If, given U = p,
there are n independent Bernoulli(p) trials resulting in k successes and
n - k failures, for some 0 \le k \le n, the posterior distribution of U given this
data is beta(1 + k, 1 + n - k). To be discussed in detail in a later lecture.
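As a preview, the conjugacy statement can be checked by brute force on a grid. The sketch below is not part of the notes; it assumes SciPy, and the data n = 10, k = 7 are made up for illustration. With a uniform prior, the posterior is proportional to the likelihood, and normalizing it numerically recovers the beta(1 + k, 1 + n - k) density up to grid error.

```python
import numpy as np
from scipy import stats

n, k = 10, 7                                   # 7 successes in 10 Bernoulli(p) trials
p = np.linspace(0.001, 0.999, 999)
dp = p[1] - p[0]

likelihood = p**k * (1 - p)**(n - k)           # uniform prior, so posterior is proportional to this
posterior = likelihood / (likelihood.sum() * dp)

# Compare with the beta(1 + k, 1 + n - k) density on the same grid; the difference
# should be small (grid-discretization error only).
print(np.max(np.abs(posterior - stats.beta(1 + k, 1 + n - k).pdf(p))))
```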

