
Expectation for multivariate distributions

Definition
Let X_1, X_2, \dots, X_n denote n jointly distributed random variables with joint density function f(x_1, x_2, \dots, x_n). Then

E[g(X_1, \dots, X_n)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \dots, x_n)\, f(x_1, \dots, x_n)\, dx_1 \cdots dx_n

Example
Let X, Y, Z denote 3 jointly distributed random variables with joint density function

f(x, y, z) = \begin{cases} \dfrac{12}{7}\left(x^2 + yz\right) & 0 \le x \le 1,\ 0 \le y \le 1,\ 0 \le z \le 1 \\ 0 & \text{otherwise} \end{cases}

Determine E[XYZ].

Solution:

E[XYZ] = \int_0^1 \int_0^1 \int_0^1 xyz \cdot \frac{12}{7}\left(x^2 + yz\right) dx\, dy\, dz

= \frac{12}{7}\int_0^1 \int_0^1 \int_0^1 \left(x^3 yz + x y^2 z^2\right) dx\, dy\, dz

= \frac{12}{7}\int_0^1 \int_0^1 \left[\frac{x^4}{4} yz + \frac{x^2}{2} y^2 z^2\right]_{x=0}^{x=1} dy\, dz
= \frac{3}{7}\int_0^1 \int_0^1 \left(yz + 2 y^2 z^2\right) dy\, dz

= \frac{3}{7}\int_0^1 \left[\frac{y^2}{2} z + \frac{2 y^3}{3} z^2\right]_{y=0}^{y=1} dz
= \frac{3}{7}\int_0^1 \left(\frac{z}{2} + \frac{2 z^2}{3}\right) dz

= \frac{3}{7}\left[\frac{z^2}{4} + \frac{2 z^3}{9}\right]_0^1 = \frac{3}{7}\left(\frac{1}{4} + \frac{2}{9}\right) = \frac{3}{7}\cdot\frac{17}{36} = \frac{17}{84}
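As a quick sanity check, the value 17/84 ≈ 0.2024 can be approximated numerically. The following is a minimal sketch (added here, not part of the original slides) that estimates E[XYZ] by rejection sampling from f; all names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Rejection sampling from f(x,y,z) = (12/7)(x^2 + yz) on the unit cube.
# f is bounded above by 24/7, so accept uniform points with probability
# f / (24/7) = (x^2 + yz) / 2.
n = 2_000_000
x, y, z, u = rng.uniform(size=(4, n))
keep = u <= (x**2 + y * z) / 2.0

est = np.mean(x[keep] * y[keep] * z[keep])
print(est, 17 / 84)        # both approximately 0.2024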

Some Rules for Expectation

1.

E[X_i] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i\, f(x_1, \dots, x_n)\, dx_1 \cdots dx_n = \int_{-\infty}^{\infty} x_i\, f_i(x_i)\, dx_i

Thus you can calculate E[X_i] either from the joint distribution of X_1, \dots, X_n or from the marginal distribution of X_i.

Proof:

\int \cdots \int x_i\, f(x_1, \dots, x_n)\, dx_1 \cdots dx_n
= \int_{-\infty}^{\infty} x_i \left[\int \cdots \int f(x_1, \dots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n\right] dx_i
= \int_{-\infty}^{\infty} x_i\, f_i(x_i)\, dx_i

2. (The Linearity property)

E[a_1 X_1 + \cdots + a_n X_n] = a_1 E[X_1] + \cdots + a_n E[X_n]

Proof:

\int \cdots \int (a_1 x_1 + \cdots + a_n x_n)\, f(x_1, \dots, x_n)\, dx_1 \cdots dx_n
= a_1 \int \cdots \int x_1\, f(x_1, \dots, x_n)\, dx_1 \cdots dx_n + \cdots + a_n \int \cdots \int x_n\, f(x_1, \dots, x_n)\, dx_1 \cdots dx_n
= a_1 E[X_1] + \cdots + a_n E[X_n]

3. (The Multiplicative property) Suppose X_1, \dots, X_q are independent of X_{q+1}, \dots, X_k. Then

E[g(X_1, \dots, X_q)\, h(X_{q+1}, \dots, X_k)] = E[g(X_1, \dots, X_q)]\, E[h(X_{q+1}, \dots, X_k)]

In the simple case when k = 2:

E[XY] = E[X]\,E[Y] \quad \text{if } X \text{ and } Y \text{ are independent}

Proof:

E[g(X_1, \dots, X_q)\, h(X_{q+1}, \dots, X_k)]
= \int \cdots \int g(x_1, \dots, x_q)\, h(x_{q+1}, \dots, x_k)\, f(x_1, \dots, x_k)\, dx_1 \cdots dx_k
= \int \cdots \int g(x_1, \dots, x_q)\, h(x_{q+1}, \dots, x_k)\, f_1(x_1, \dots, x_q)\, f_2(x_{q+1}, \dots, x_k)\, dx_1 \cdots dx_q\, dx_{q+1} \cdots dx_k
= \int \cdots \int h(x_{q+1}, \dots, x_k)\, f_2(x_{q+1}, \dots, x_k) \left[\int \cdots \int g(x_1, \dots, x_q)\, f_1(x_1, \dots, x_q)\, dx_1 \cdots dx_q\right] dx_{q+1} \cdots dx_k
= E[g(X_1, \dots, X_q)] \int \cdots \int h(x_{q+1}, \dots, x_k)\, f_2(x_{q+1}, \dots, x_k)\, dx_{q+1} \cdots dx_k
= E[g(X_1, \dots, X_q)]\, E[h(X_{q+1}, \dots, X_k)]
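A short numerical illustration of the k = 2 case (an added sketch, not in the original slides): for independent samples the mean of XY tracks the product of the means, and stops doing so once dependence is introduced.

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(2.0, size=500_000)    # E[X] = 2
y = rng.normal(3.0, 1.0, size=500_000)    # E[Y] = 3, independent of X

print(np.mean(x * y), np.mean(x) * np.mean(y))        # both close to 6

# with dependence (Y' = X + noise) the factorization no longer holds
y_dep = x + rng.normal(size=500_000)
print(np.mean(x * y_dep), np.mean(x) * np.mean(y_dep))  # differ by Var(X) = 4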

Some Rules for Variance


Var[X] = E[(X - \mu_X)^2] = E[X^2] - \mu_X^2 = \sigma_X^2

1. Var[X + Y] = Var[X] + Var[Y] + 2\,\mathrm{Cov}[X, Y]

where

\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)]

Proof:

Var[X + Y] = E[(X + Y - \mu_{X+Y})^2], \quad \text{where } \mu_{X+Y} = E[X + Y] = \mu_X + \mu_Y

Thus

Var[X + Y] = E[(X - \mu_X + Y - \mu_Y)^2]
= E[(X - \mu_X)^2 + 2(X - \mu_X)(Y - \mu_Y) + (Y - \mu_Y)^2]
= Var[X] + 2\,\mathrm{Cov}[X, Y] + Var[Y]

Note: If X and Y are independent, then

\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[X - \mu_X]\, E[Y - \mu_Y] = 0

and

Var[X + Y] = Var[X] + Var[Y]
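A minimal numerical check of this rule (added as an illustration, not from the original slides), using correlated samples so the covariance term is visibly nonzero:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=300_000)
y = 0.6 * x + rng.normal(scale=0.8, size=300_000)     # correlated with x

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y)[0, 1]
print(lhs, rhs)                                        # essentially equal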

Definition: For any two random variables X and Y define the correlation coefficient \rho_{XY} to be:

\rho_{XY} = \frac{\mathrm{Cov}[X, Y]}{\sqrt{Var[X]\, Var[Y]}} = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y}

Thus \mathrm{Cov}[X, Y] = \rho_{XY}\, \sigma_X \sigma_Y, and

Var[X + Y] = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\, \sigma_X \sigma_Y
= \sigma_X^2 + \sigma_Y^2 \quad \text{if } X \text{ and } Y \text{ are independent}

Properties of the correlation coefficient \rho_{XY}

\rho_{XY} = \frac{\mathrm{Cov}[X, Y]}{\sqrt{Var[X]\, Var[Y]}} = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y}

If X and Y are independent then \rho_{XY} = 0.

Reason: \mathrm{Cov}[X, Y] = 0.

The converse is not necessarily true: \rho_{XY} = 0 does not imply that X and Y are independent.

More properties of the correlation coefficient \rho_{XY}

-1 \le \rho_{XY} \le 1

and |\rho_{XY}| = 1 if there exist a and b such that

P[Y = bX + a] = 1

where \rho_{XY} = +1 if b > 0 and \rho_{XY} = -1 if b < 0.
Proof: Let U = X - \mu_X and V = Y - \mu_Y, and let

g(b) = E[(V - bU)^2] \ge 0 \quad \text{for all } b.

Consider choosing b to minimize g(b):

g(b) = E[V^2 - 2bVU + b^2 U^2] = E[V^2] - 2b\,E[VU] + b^2 E[U^2]

g'(b) = -2E[VU] + 2b\,E[U^2] = 0

or

b = b_{\min} = \frac{E[VU]}{E[U^2]}

Since g(b) \ge 0 for all b, then g(b_{\min}) \ge 0:

g(b_{\min}) = E[V^2] - 2 b_{\min} E[VU] + b_{\min}^2 E[U^2]
= E[V^2] - 2\frac{(E[VU])^2}{E[U^2]} + \frac{(E[VU])^2}{E[U^2]}
= E[V^2] - \frac{(E[VU])^2}{E[U^2]} \ge 0

Hence

(E[VU])^2 \le E[U^2]\, E[V^2]

or

\frac{(E[VU])^2}{E[U^2]\,E[V^2]} = \frac{\left(E[(X-\mu_X)(Y-\mu_Y)]\right)^2}{E[(X-\mu_X)^2]\, E[(Y-\mu_Y)^2]} = \rho_{XY}^2 \le 1

Note: \rho_{XY}^2 = 1 if and only if

g(b_{\min}) = E[(V - b_{\min} U)^2] = 0

This will be true if P[V - b_{\min}U = 0] = 1, i.e.

P[Y - \mu_Y = b_{\min}(X - \mu_X)] = 1

i.e. P[Y = b_{\min} X + a] = 1 \quad \text{where } a = \mu_Y - b_{\min}\mu_X

Summary

-1 \le \rho_{XY} \le 1

and |\rho_{XY}| = 1 if there exist a and b such that P[Y = bX + a] = 1, where

b = b_{\min} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{E[(X - \mu_X)^2]} = \frac{\mathrm{Cov}[X, Y]}{Var[X]} = \frac{\rho_{XY}\,\sigma_X \sigma_Y}{\sigma_X^2} = \rho_{XY}\frac{\sigma_Y}{\sigma_X}

and

a = \mu_Y - b_{\min}\mu_X = \mu_Y - \rho_{XY}\frac{\sigma_Y}{\sigma_X}\mu_X
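As an illustration (a sketch added for this write-up), b_min = Cov(X, Y)/Var(X) is exactly the least-squares slope, and an exactly linear relationship drives the sample correlation to ±1:

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)

y_exact = -2.0 * x + 5.0                          # exact linear relation, b < 0
print(np.corrcoef(x, y_exact)[0, 1])              # -1.0

y_noisy = -2.0 * x + 5.0 + rng.normal(size=100_000)
b_min = np.cov(x, y_noisy)[0, 1] / np.var(x, ddof=1)
print(b_min)                                       # close to -2.0
print(np.corrcoef(x, y_noisy)[0, 1])               # between -1 and 0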

2. Var[aX + bY] = a^2 Var[X] + b^2 Var[Y] + 2ab\,\mathrm{Cov}[X, Y]

Proof:

Var[aX + bY] = E[(aX + bY - \mu_{aX+bY})^2], \quad \text{with } \mu_{aX+bY} = E[aX + bY] = a\mu_X + b\mu_Y

Thus

Var[aX + bY] = E[(a(X - \mu_X) + b(Y - \mu_Y))^2]
= E[a^2(X - \mu_X)^2 + 2ab(X - \mu_X)(Y - \mu_Y) + b^2(Y - \mu_Y)^2]
= a^2 Var[X] + 2ab\,\mathrm{Cov}[X, Y] + b^2 Var[Y]

3. Var[a_1 X_1 + \cdots + a_n X_n]
= a_1^2 Var[X_1] + \cdots + a_n^2 Var[X_n]
\quad + 2a_1 a_2\,\mathrm{Cov}[X_1, X_2] + \cdots + 2a_1 a_n\,\mathrm{Cov}[X_1, X_n]
\quad + 2a_2 a_3\,\mathrm{Cov}[X_2, X_3] + \cdots + 2a_2 a_n\,\mathrm{Cov}[X_2, X_n]
\quad + \cdots + 2a_{n-1} a_n\,\mathrm{Cov}[X_{n-1}, X_n]

= \sum_{i=1}^{n} a_i^2 Var[X_i] + 2\sum_{i < j} a_i a_j\,\mathrm{Cov}[X_i, X_j]

= \sum_{i=1}^{n} a_i^2 Var[X_i] \quad \text{if } X_1, \dots, X_n \text{ are mutually independent}
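In matrix form this rule says Var(a^T X) = a^T Σ a, where Σ is the covariance matrix of (X_1, ..., X_n). A quick sketch verifying this on simulated data (illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(4)
# three correlated variables built from common noise
z = rng.normal(size=(3, 200_000))
X = np.array([z[0], z[0] + 0.5 * z[1], z[1] - z[2]])

a = np.array([2.0, -1.0, 3.0])
Sigma = np.cov(X)                    # 3x3 sample covariance matrix

print(np.var(a @ X, ddof=1))         # direct variance of a1 X1 + a2 X2 + a3 X3
print(a @ Sigma @ a)                 # sum_i a_i^2 Var + 2 sum_{i<j} a_i a_j Cov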

Some Applications
(Rules of Expectation & Variance)
Let X_1, \dots, X_n be n mutually independent random variables each having mean \mu and standard deviation \sigma (variance \sigma^2).

Let

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}X_1 + \cdots + \frac{1}{n}X_n = a_1 X_1 + \cdots + a_n X_n \quad \text{with } a_i = \frac{1}{n}

Then

\mu_{\bar{X}} = E[\bar{X}] = \frac{1}{n}E[X_1] + \cdots + \frac{1}{n}E[X_n] = \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu = \mu

Also

Var[\bar{X}] = \frac{1}{n^2}Var[X_1] + \cdots + \frac{1}{n^2}Var[X_n] = \frac{1}{n^2}\sigma^2 + \cdots + \frac{1}{n^2}\sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}

Thus

\mu_{\bar{X}} = \mu, \quad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \quad \text{or} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}

Hence the distribution of \bar{X} is centered at \mu and becomes more and more compact about \mu as n increases.
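A brief simulation of this concentration effect (an added sketch, not from the original slides): the standard deviation of X̄ shrinks like σ/√n.

import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 10.0, 4.0

for n in (4, 16, 64, 256):
    # 20,000 replicate samples of size n; take the mean of each
    xbar = rng.normal(mu, sigma, size=(20_000, n)).mean(axis=1)
    print(n, xbar.mean(), xbar.std(), sigma / np.sqrt(n))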

Tchebychev's Inequality

Let X denote a random variable with mean \mu = E[X] and variance Var[X] = E[(X - \mu)^2] = \sigma^2. Then

P[|X - \mu| \ge k\sigma] \le \frac{1}{k^2}

P[\mu - k\sigma < X < \mu + k\sigma] \ge 1 - \frac{1}{k^2}

Note: \sigma = \sqrt{Var[X]} = \sqrt{E[(X - \mu)^2]} is called the standard deviation of X.

Proof:

\sigma^2 = Var(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty}(x - \mu)^2 f(x)\, dx

= \int_{-\infty}^{\mu - k\sigma}(x - \mu)^2 f(x)\, dx + \int_{\mu - k\sigma}^{\mu + k\sigma}(x - \mu)^2 f(x)\, dx + \int_{\mu + k\sigma}^{\infty}(x - \mu)^2 f(x)\, dx

\ge \int_{-\infty}^{\mu - k\sigma}(x - \mu)^2 f(x)\, dx + \int_{\mu + k\sigma}^{\infty}(x - \mu)^2 f(x)\, dx

\ge \int_{-\infty}^{\mu - k\sigma} k^2\sigma^2 f(x)\, dx + \int_{\mu + k\sigma}^{\infty} k^2\sigma^2 f(x)\, dx

= k^2\sigma^2\left[\int_{-\infty}^{\mu - k\sigma} f(x)\, dx + \int_{\mu + k\sigma}^{\infty} f(x)\, dx\right]

= k^2\sigma^2\left(P[X \le \mu - k\sigma] + P[X \ge \mu + k\sigma]\right) = k^2\sigma^2\, P[|X - \mu| \ge k\sigma]

Thus k^2\sigma^2\, P[|X - \mu| \ge k\sigma] \le \sigma^2

or P[|X - \mu| \ge k\sigma] \le \frac{1}{k^2}

and P[|X - \mu| < k\sigma] \ge 1 - \frac{1}{k^2}

Tchebychev's inequality is very conservative:

P[|X - \mu| < k\sigma] = P[\mu - k\sigma < X < \mu + k\sigma] \ge 1 - \frac{1}{k^2}

k = 1: \quad P[\mu - \sigma < X < \mu + \sigma] \ge 1 - \frac{1}{1^2} = 0

k = 2: \quad P[\mu - 2\sigma < X < \mu + 2\sigma] \ge 1 - \frac{1}{2^2} = \frac{3}{4}

k = 3: \quad P[\mu - 3\sigma < X < \mu + 3\sigma] \ge 1 - \frac{1}{3^2} = \frac{8}{9}
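To see how conservative the bound is, the following sketch (an added illustration; the normal distribution is just one convenient choice) compares the Tchebychev lower bound with the actual probability P[μ − kσ < X < μ + kσ]:

from scipy.stats import norm

for k in (1, 2, 3):
    tcheby = 1 - 1 / k**2
    actual = norm.cdf(k) - norm.cdf(-k)    # exact for a normal distribution
    print(k, tcheby, round(actual, 4))     # 0 vs 0.6827, 0.75 vs 0.9545, 0.889 vs 0.9973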

The Law of Large Numbers

Let X_1, \dots, X_n be n mutually independent random variables each having mean \mu (and, for the proof below, common finite variance \sigma^2).

Let \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.

Then for any \varepsilon > 0 (no matter how small)

P[|\bar{X} - \mu| < \varepsilon] = P[\mu - \varepsilon < \bar{X} < \mu + \varepsilon] \to 1 \quad \text{as } n \to \infty

Proof:

We will use Tchebychev's inequality, which states that for any random variable X

P[\mu_{\bar{X}} - k\sigma_{\bar{X}} < \bar{X} < \mu_{\bar{X}} + k\sigma_{\bar{X}}] \ge 1 - \frac{1}{k^2}

Now \mu_{\bar{X}} = \mu and \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}, so

P[|\bar{X} - \mu| < \varepsilon] = P[\mu - k\sigma_{\bar{X}} < \bar{X} < \mu + k\sigma_{\bar{X}}] \ge 1 - \frac{1}{k^2}

where k\sigma_{\bar{X}} = k\frac{\sigma}{\sqrt{n}} = \varepsilon, \quad \text{or} \quad k = \frac{\varepsilon\sqrt{n}}{\sigma}

Thus

P[|\bar{X} - \mu| < \varepsilon] \ge 1 - \frac{1}{k^2} = 1 - \frac{\sigma^2}{n\varepsilon^2} \to 1 \quad \text{as } n \to \infty

Thus P[|\bar{X} - \mu| < \varepsilon] \to 1 as n \to \infty.

A Special case

Let X_1, \dots, X_n be n mutually independent random variables each having a Bernoulli distribution with parameter p:

X_i = \begin{cases} 1 & \text{if repetition } i \text{ is S (prob } p) \\ 0 & \text{if repetition } i \text{ is F (prob } q = 1 - p) \end{cases}

E[X_i] = p

\bar{X} = \frac{X_1 + \cdots + X_n}{n} = \hat{p} = \text{proportion of successes}

Thus the Law of Large Numbers states

P[p - \varepsilon < \hat{p} < p + \varepsilon] \to 1 \quad \text{as } n \to \infty

Thus the Law of Large Numbers states that \hat{p}, the proportion of successes, converges to the probability of success p as n \to \infty.

Some people misinterpret this to mean that if the proportion of successes is currently lower than p, then the proportion of successes in the future will have to be larger than p to counter this and ensure that the Law of Large Numbers holds true. Of course, if in the infinite future the proportion of successes is p, then this is enough to ensure that the Law of Large Numbers holds true.
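A small simulation of the running proportion of successes (an added sketch, not part of the slides):

import numpy as np

rng = np.random.default_rng(6)
p = 0.3
flips = rng.random(100_000) < p                     # Bernoulli(p) trials
running_prop = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in (100, 1_000, 10_000, 100_000):
    print(n, running_prop[n - 1])                   # approaches p = 0.3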

Some more applications


Rules of expectation and Rules of
Variance

The mean and variance of a Binomial random variable

We have already computed these by other methods:
1. Using the probability function p(x).
2. Using the moment generating function m_X(t).

Suppose that we have observed n independent repetitions of a Bernoulli trial. Let X_1, \dots, X_n be n mutually independent random variables each having a Bernoulli distribution with parameter p, defined by

X_i = \begin{cases} 1 & \text{if repetition } i \text{ is S (prob } p) \\ 0 & \text{if repetition } i \text{ is F (prob } q) \end{cases}

E[X_i] = 1\cdot p + 0\cdot q = p

Var[X_i] = (1 - p)^2 p + (0 - p)^2 q = q^2 p + p^2 q = pq

Now X = X_1 + \cdots + X_n has a Binomial distribution with parameters n and p; X is the total number of successes in the n repetitions.

\mu_X = E[X_1] + \cdots + E[X_n] = p + \cdots + p = np

\sigma_X^2 = Var[X_1] + \cdots + Var[X_n] = pq + \cdots + pq = npq
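A one-line check of np and npq against simulated binomial counts (an added sketch):

import numpy as np

rng = np.random.default_rng(7)
n, p = 40, 0.25
x = rng.binomial(n, p, size=500_000)
print(x.mean(), n * p)               # ~10
print(x.var(), n * p * (1 - p))      # ~7.5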

The mean and variance of a Hypergeometric distribution

The hypergeometric distribution arises when we sample without replacement n objects from a population of N = a + b objects. The population is divided into two groups (group A and group B). Group A contains a objects while group B contains b objects.

Let X denote the number of objects in the sample of n that come from group A. The probability function of X is:

p(x) = \frac{\dbinom{a}{x}\dbinom{b}{n - x}}{\dbinom{a + b}{n}}

Let X_1, \dots, X_n be n random variables defined by

X_i = \begin{cases} 1 & \text{if the } i^{\text{th}} \text{ object selected comes from group A} \\ 0 & \text{if the } i^{\text{th}} \text{ object selected comes from group B} \end{cases}

Then

X = X_1 + \cdots + X_n

P[X_i = 1] = \frac{a}{a + b} \quad \text{and} \quad P[X_i = 0] = \frac{b}{a + b}

Proof:

P[X_i = 1] = \frac{a \cdot {}_{(a+b-1)}P_{(n-1)}}{{}_{(a+b)}P_{n}} = \frac{a\,\dfrac{(a+b-1)!}{(a+b-n)!}}{\dfrac{(a+b)!}{(a+b-n)!}} = \frac{a}{a+b}

Therefore

E[X_i] = 1\cdot P[X_i = 1] + 0\cdot P[X_i = 0] = \frac{a}{a+b}

E[X_i^2] = 1^2\cdot P[X_i = 1] + 0^2\cdot P[X_i = 0] = \frac{a}{a+b}

and

Var[X_i] = E[X_i^2] - \left(E[X_i]\right)^2 = \frac{a}{a+b} - \left(\frac{a}{a+b}\right)^2 = \frac{a}{a+b}\left(1 - \frac{a}{a+b}\right) = \frac{a}{a+b}\cdot\frac{b}{a+b}

Thus

E[X] = E[X_1 + \cdots + X_n] = \sum_{i=1}^{n} E[X_i] = n\,\frac{a}{a+b}

Also

Var[X] = Var[X_1 + \cdots + X_n] = \sum_{i=1}^{n} Var[X_i] + 2\sum_{i<j}\mathrm{Cov}[X_i, X_j]

and

Var[X_i] = \frac{a}{a+b}\cdot\frac{b}{a+b}

We also need to calculate \mathrm{Cov}[X_i, X_j].


Note:

\mathrm{Cov}[U, V] = E[(U - \mu_U)(V - \mu_V)]
= E[UV - \mu_U V - \mu_V U + \mu_U\mu_V]
= E[UV] - \mu_U E[V] - \mu_V E[U] + \mu_U\mu_V
= E[UV] - \mu_U\mu_V = E[UV] - E[U]E[V]

Thus

\mathrm{Cov}[X_i, X_j] = E[X_i X_j] - E[X_i]E[X_j], \quad \text{with } E[X_i] = \frac{a}{a+b}

and

E[X_i X_j] = 1\cdot P[X_i X_j = 1] + 0\cdot P[X_i X_j = 0] = P[X_i = 1, X_j = 1]

Note:

P[X_i = 1, X_j = 1] = \frac{a(a-1)\, {}_{(a+b-2)}P_{(n-2)}}{{}_{(a+b)}P_{n}} = \frac{a(a-1)\,\dfrac{(a+b-2)!}{(a+b-n)!}}{\dfrac{(a+b)!}{(a+b-n)!}} = \frac{a(a-1)}{(a+b)(a+b-1)}

Thus

E[X_i X_j] = \frac{a(a-1)}{(a+b)(a+b-1)}

and

\mathrm{Cov}[X_i, X_j] = E[X_i X_j] - E[X_i]E[X_j] = \frac{a(a-1)}{(a+b)(a+b-1)} - \left(\frac{a}{a+b}\right)^2

= \frac{a}{a+b}\left[\frac{a-1}{a+b-1} - \frac{a}{a+b}\right] = \frac{a}{a+b}\cdot\frac{(a-1)(a+b) - a(a+b-1)}{(a+b-1)(a+b)} = \frac{-ab}{(a+b-1)(a+b)^2}

Thus

Var[X] = Var[X_1 + \cdots + X_n] = \sum_{i=1}^{n} Var[X_i] + 2\sum_{i<j}\mathrm{Cov}[X_i, X_j]

with

Var[X_i] = \frac{a}{a+b}\cdot\frac{b}{a+b} = \frac{ab}{(a+b)^2} \quad \text{and} \quad \mathrm{Cov}[X_i, X_j] = \frac{-ab}{(a+b-1)(a+b)^2}

so

Var[X] = n\,\frac{ab}{(a+b)^2} + 2\binom{n}{2}\left(\frac{-ab}{(a+b-1)(a+b)^2}\right)
= n\,\frac{ab}{(a+b)^2} - n(n-1)\,\frac{ab}{(a+b-1)(a+b)^2}

= n\,\frac{ab}{(a+b)^2}\left[1 - \frac{n-1}{a+b-1}\right] = n\,p_A\,p_B\,(1 - f)

where

p_A = \frac{a}{a+b}, \quad p_B = \frac{b}{a+b} \quad \text{and} \quad f = \frac{n-1}{a+b-1} = \frac{n-1}{N-1}

Thus if X has a hypergeometric distribution with parameters a, b and n, then

E[X] = n\,\frac{a}{a+b} = n\,p_A

Var[X] = n\,p_A\,p_B\,(1 - f)

where p_A = \frac{a}{a+b}, \quad p_B = \frac{b}{a+b} \quad \text{and} \quad f = \frac{n-1}{a+b-1} = \frac{n-1}{N-1}

The mean and variance of a Negative Binomial distribution

The Negative Binomial distribution arises when we repeat a Bernoulli trial until k successes (S) occur. Then X = the trial on which the k-th success occurred. The probability function of X is:

p(x) = \binom{x-1}{k-1} p^k q^{x-k}, \quad x = k, k+1, k+2, \dots

Let X_1 = the number of trials up to and including the 1st success, and let X_i = the number of trials after the (i-1)-st success up to and including the i-th success (i \ge 2).

Then X = X_1 + \cdots + X_k, and X_1, \dots, X_k are mutually independent. Each X_i has a geometric distribution with parameter p, thus

E[X_i] = \frac{1}{p} \quad \text{and} \quad Var[X_i] = \frac{q}{p^2}

hence

E[X] = \sum_{i=1}^{k} E[X_i] = \frac{k}{p}

and

Var[X] = \sum_{i=1}^{k} Var[X_i] = \frac{kq}{p^2}

Thus if X has a negative binomial distribution with parameters k and p, then

E[X] = \frac{k}{p}, \quad Var[X] = \frac{kq}{p^2}
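A quick check with scipy (an added sketch). Note that scipy.stats.nbinom counts the number of failures before the k-th success, so the total number of trials is that count plus k:

from scipy.stats import nbinom

k, p = 5, 0.3
q = 1 - p

print(nbinom.mean(k, p) + k, k / p)      # 16.67, 16.67 (trials, not failures)
print(nbinom.var(k, p), k * q / p**2)    # 38.89, 38.89 (variance is unchanged by the shift)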

Multivariate Moments
Non-central and Central

Definition
Let X_1 and X_2 be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers (k_1, k_2) the joint moment of (X_1, X_2) of order (k_1, k_2) is defined to be:

\mu_{k_1 k_2} = E\left[X_1^{k_1} X_2^{k_2}\right] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} x_1^{k_1} x_2^{k_2}\, p(x_1, x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1^{k_1} x_2^{k_2}\, f(x_1, x_2)\, dx_1\, dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}

Definition
Let X_1 and X_2 be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers (k_1, k_2) the joint central moment of (X_1, X_2) of order (k_1, k_2) is defined to be:

\mu_{k_1, k_2}^{0} = E\left[(X_1 - \mu_1)^{k_1}(X_2 - \mu_2)^{k_2}\right] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} (x_1 - \mu_1)^{k_1}(x_2 - \mu_2)^{k_2}\, p(x_1, x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 - \mu_1)^{k_1}(x_2 - \mu_2)^{k_2}\, f(x_1, x_2)\, dx_1\, dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}

where \mu_1 = E[X_1] and \mu_2 = E[X_2].

Note

\mu_{1,1}^{0} = E[(X_1 - \mu_1)(X_2 - \mu_2)] = \mathrm{Cov}[X_1, X_2] = \text{the covariance of } X_1 \text{ and } X_2.
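For a sample, the (1,1) joint central moment is just the sample covariance. A small illustration (an added sketch):

import numpy as np

rng = np.random.default_rng(8)
x1 = rng.normal(size=100_000)
x2 = 0.5 * x1 + rng.normal(size=100_000)

central_11 = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))   # mu^0_{1,1}
print(central_11, np.cov(x1, x2, ddof=0)[0, 1])             # identical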


Definition: For any two random variables X and Y define the correlation coefficient \rho_{XY} to be:

\rho_{XY} = \frac{\mathrm{Cov}[X, Y]}{\sqrt{Var[X]\, Var[Y]}} = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y}

Properties of the correlation coefficient \rho_{XY}

If X and Y are independent then \rho_{XY} = 0 (since \mathrm{Cov}[X, Y] = 0). The converse is not necessarily true: \rho_{XY} = 0 does not imply that X and Y are independent.

More properties of the correlation coefficient

-1 \le \rho_{XY} \le 1, and |\rho_{XY}| = 1 if there exist a and b such that P[Y = bX + a] = 1, where \rho_{XY} = +1 if b > 0 and \rho_{XY} = -1 if b < 0.

Some Rules for Expectation

1. E[X_i] = \int \cdots \int x_i\, f(x_1, \dots, x_n)\, dx_1 \cdots dx_n = \int_{-\infty}^{\infty} x_i\, f_i(x_i)\, dx_i

Thus you can calculate E[X_i] either from the joint distribution of X_1, \dots, X_n or from the marginal distribution of X_i.

2. (The Linearity property)

E[a_1 X_1 + \cdots + a_n X_n] = a_1 E[X_1] + \cdots + a_n E[X_n]

3. (The Multiplicative property) Suppose X_1, \dots, X_q are independent of X_{q+1}, \dots, X_k. Then

E[g(X_1, \dots, X_q)\, h(X_{q+1}, \dots, X_k)] = E[g(X_1, \dots, X_q)]\, E[h(X_{q+1}, \dots, X_k)]

In the simple case when k = 2:

E[XY] = E[X]\,E[Y] \quad \text{if } X \text{ and } Y \text{ are independent}

Some Rules for Variance

Var[X] = E[(X - \mu_X)^2] = E[X^2] - \mu_X^2 = \sigma_X^2

1. Var[X + Y] = Var[X] + Var[Y] + 2\,\mathrm{Cov}[X, Y]

where \mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)]

Note: If X and Y are independent, then

\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[X - \mu_X]\,E[Y - \mu_Y] = 0

and Var[X + Y] = Var[X] + Var[Y]

Definition: For any two random variables X and Y define the correlation coefficient \rho_{XY} to be:

\rho_{XY} = \frac{\mathrm{Cov}[X, Y]}{\sqrt{Var[X]\, Var[Y]}} = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y}

Thus \mathrm{Cov}[X, Y] = \rho_{XY}\,\sigma_X\sigma_Y, and

Var[X + Y] = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\,\sigma_X\sigma_Y = \sigma_X^2 + \sigma_Y^2 \quad \text{if } X \text{ and } Y \text{ are independent}

2. Var[aX + bY] = a^2 Var[X] + b^2 Var[Y] + 2ab\,\mathrm{Cov}[X, Y]

Proof:

Var[aX + bY] = E[(aX + bY - \mu_{aX+bY})^2], \quad \text{with } \mu_{aX+bY} = a\mu_X + b\mu_Y

Thus

Var[aX + bY] = E[(a(X - \mu_X) + b(Y - \mu_Y))^2]
= E[a^2(X - \mu_X)^2 + 2ab(X - \mu_X)(Y - \mu_Y) + b^2(Y - \mu_Y)^2]
= a^2 Var[X] + 2ab\,\mathrm{Cov}[X, Y] + b^2 Var[Y]

3. Var[a_1 X_1 + \cdots + a_n X_n]
= a_1^2 Var[X_1] + \cdots + a_n^2 Var[X_n]
\quad + 2a_1 a_2\,\mathrm{Cov}[X_1, X_2] + \cdots + 2a_1 a_n\,\mathrm{Cov}[X_1, X_n]
\quad + 2a_2 a_3\,\mathrm{Cov}[X_2, X_3] + \cdots + 2a_2 a_n\,\mathrm{Cov}[X_2, X_n]
\quad + \cdots + 2a_{n-1} a_n\,\mathrm{Cov}[X_{n-1}, X_n]

= \sum_{i=1}^{n} a_i^2 Var[X_i] + 2\sum_{i < j} a_i a_j\,\mathrm{Cov}[X_i, X_j]

= \sum_{i=1}^{n} a_i^2 Var[X_i] \quad \text{if } X_1, \dots, X_n \text{ are mutually independent}

Distribution functions,
Moments,
Moment generating functions
in the Multivariate case

The distribution function F(x)

This is defined for any random variable X:

F(x) = P[X \le x]

Properties
1. F(-\infty) = 0 and F(\infty) = 1.
2. F(x) is non-decreasing (i.e. if x_1 < x_2 then F(x_1) \le F(x_2)).
3. F(b) - F(a) = P[a < X \le b].

4. Discrete Random Variables

F(x) = P[X \le x] = \sum_{u \le x} p(u)

p(x) = F(x) - F(x^-) = \text{the jump in } F(x) \text{ at } x.

[Figure: F(x) plotted against x as a step function, with the jump at each point equal to p(x).]

F(x) is a non-decreasing step function with F(-\infty) = 0 and F(\infty) = 1.

5. Continuous Random Variables

F(x) = P[X \le x] = \int_{-\infty}^{x} f(u)\, du

f(x) = F'(x) = \text{the slope of } F(x) \text{ at } x.

[Figure: F(x) plotted against x as a smooth, non-decreasing curve whose slope at x is the density f(x).]

F(x) is a non-decreasing continuous function with F(-\infty) = 0 and F(\infty) = 1.

To find the probability density function f(x), one first finds F(x) and then

f(x) = F'(x).

The joint distribution function F(x_1, x_2, \dots, x_k)

is defined for k random variables X_1, X_2, \dots, X_k:

F(x_1, x_2, \dots, x_k) = P[X_1 \le x_1, X_2 \le x_2, \dots, X_k \le x_k]

For k = 2:

F(x_1, x_2) = P[X_1 \le x_1, X_2 \le x_2]

[Figure: the event is the quadrant below and to the left of the point (x_1, x_2) in the (x_1, x_2) plane.]

Properties

1. F(x_1, -\infty) = F(-\infty, x_2) = F(-\infty, -\infty) = 0

2. F(x_1, \infty) = P[X_1 \le x_1, X_2 \le \infty] = P[X_1 \le x_1] = F_1(x_1)
   = the marginal cumulative distribution function of X_1

   F(\infty, x_2) = P[X_1 \le \infty, X_2 \le x_2] = P[X_2 \le x_2] = F_2(x_2)
   = the marginal cumulative distribution function of X_2

   F(\infty, \infty) = P[X_1 \le \infty, X_2 \le \infty] = 1

3. F(x_1, x_2) is non-decreasing in both the x_1 direction and the x_2 direction,
   i.e. if a_1 < b_1 and a_2 < b_2 then
   i.   F(a_1, x_2) \le F(b_1, x_2)
   ii.  F(x_1, a_2) \le F(x_1, b_2)
   iii. F(a_1, a_2) \le F(b_1, b_2)

[Figure: the rectangle with corners (a_1, a_2), (b_1, a_2), (a_1, b_2), (b_1, b_2) in the (x_1, x_2) plane.]

4. P[a < X_1 \le b,\ c < X_2 \le d] = F(b, d) - F(a, d) - F(b, c) + F(a, c)

[Figure: the rectangle with corners (a, c), (b, c), (a, d), (b, d) in the (x_1, x_2) plane.]
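A short numerical illustration of property 4 (an added sketch, using independent exponential components so the joint CDF factors as F(x_1, x_2) = F_1(x_1) F_2(x_2)):

from scipy.stats import expon

def F(x1, x2):
    # joint CDF of two independent Exponential(1) random variables
    return expon.cdf(x1) * expon.cdf(x2)

a, b, c, d = 0.5, 2.0, 1.0, 3.0
rect = F(b, d) - F(a, d) - F(b, c) + F(a, c)
direct = (expon.cdf(b) - expon.cdf(a)) * (expon.cdf(d) - expon.cdf(c))
print(rect, direct)   # equal: P[a < X1 <= b, c < X2 <= d]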

4. Discrete Random Variables

F(x_1, x_2) = P[X_1 \le x_1, X_2 \le x_2] = \sum_{u_1 \le x_1}\sum_{u_2 \le x_2} p(u_1, u_2)

F(x_1, x_2) is a step surface, and

p(x_1, x_2) = \text{the jump in } F(x_1, x_2) \text{ at } (x_1, x_2).

5. Continuous Random Variables

F(x_1, x_2) = P[X_1 \le x_1, X_2 \le x_2] = \int_{-\infty}^{x_2}\int_{-\infty}^{x_1} f(u_1, u_2)\, du_1\, du_2

F(x_1, x_2) is a smooth surface, and

f(x_1, x_2) = \frac{\partial^2 F(x_1, x_2)}{\partial x_1 \partial x_2} = \frac{\partial^2 F(x_1, x_2)}{\partial x_2 \partial x_1}
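A symbolic sketch of recovering the density from the joint CDF (an added example; sympy and the independent-exponential CDF are illustrative choices):

import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
F = (1 - sp.exp(-x1)) * (1 - sp.exp(-x2))   # joint CDF of independent Exp(1)'s

f = sp.diff(F, x1, x2)                      # mixed partial d^2 F / dx1 dx2
print(sp.simplify(f))                       # exp(-x1 - x2), the joint density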

Multivariate Moments
Non-central and Central

Definition
Let X_1 and X_2 be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers (k_1, k_2) the joint moment of (X_1, X_2) of order (k_1, k_2) is defined to be:

\mu_{k_1 k_2} = E\left[X_1^{k_1} X_2^{k_2}\right] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} x_1^{k_1} x_2^{k_2}\, p(x_1, x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1^{k_1} x_2^{k_2}\, f(x_1, x_2)\, dx_1\, dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}

Definition
Let X_1 and X_2 be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers (k_1, k_2) the joint central moment of (X_1, X_2) of order (k_1, k_2) is defined to be:

\mu_{k_1, k_2}^{0} = E\left[(X_1 - \mu_1)^{k_1}(X_2 - \mu_2)^{k_2}\right] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} (x_1 - \mu_1)^{k_1}(x_2 - \mu_2)^{k_2}\, p(x_1, x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 - \mu_1)^{k_1}(x_2 - \mu_2)^{k_2}\, f(x_1, x_2)\, dx_1\, dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}

where \mu_1 = E[X_1] and \mu_2 = E[X_2].

Note

\mu_{1,1}^{0} = E[(X_1 - \mu_1)(X_2 - \mu_2)] = \mathrm{Cov}[X_1, X_2] = \text{the covariance of } X_1 \text{ and } X_2.

Multivariate Moment Generating Functions

Recall the moment generating function:

m_X(t) = E\left[e^{tX}\right] = \begin{cases} \displaystyle\sum_{x} e^{tx}\, p(x) & \text{if } X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} e^{tx}\, f(x)\, dx & \text{if } X \text{ is continuous} \end{cases}

Definition
Let X_1, X_2, \dots, X_k be jointly distributed random variables (discrete or continuous). Then the joint moment generating function is defined to be:

m_{X_1,\dots,X_k}(t_1, \dots, t_k) = E\left[e^{t_1 X_1 + \cdots + t_k X_k}\right]
= \begin{cases} \displaystyle\sum_{x_1}\cdots\sum_{x_k} e^{t_1 x_1 + \cdots + t_k x_k}\, p(x_1, \dots, x_k) & \text{if } X_1, \dots, X_k \text{ are discrete} \\[2ex] \displaystyle\int \cdots \int e^{t_1 x_1 + \cdots + t_k x_k}\, f(x_1, \dots, x_k)\, dx_1 \cdots dx_k & \text{if } X_1, \dots, X_k \text{ are continuous} \end{cases}


Note: m_{X_1,\dots,X_k}(0, \dots, 0) = 1

m_{X_1,\dots,X_k}(0, \dots, 0, t_i, 0, \dots, 0) = m_{X_i}(t_i)
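A small symbolic check of these two facts for a toy discrete joint distribution (an added sketch; the probability table is made up for illustration):

import sympy as sp

t1, t2 = sp.symbols('t1 t2')

# toy joint pmf p(x1, x2) on {0,1} x {0,1}
p = {(0, 0): sp.Rational(1, 8), (0, 1): sp.Rational(3, 8),
     (1, 0): sp.Rational(2, 8), (1, 1): sp.Rational(2, 8)}

m = sum(prob * sp.exp(t1 * x1 + t2 * x2) for (x1, x2), prob in p.items())

print(m.subs({t1: 0, t2: 0}))         # 1
print(sp.simplify(m.subs(t2, 0)))     # marginal MGF of X1: 1/2 + exp(t1)/2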

Power Series expansion of the joint moment generating function (k = 2)

m_{X,Y}(t, s) = E\left[e^{tX + sY}\right] = E\left[e^{tX} e^{sY}\right]

= E\left[\left(1 + tX + \frac{(tX)^2}{2!} + \cdots\right)\left(1 + sY + \frac{(sY)^2}{2!} + \cdots\right)\right]

using e^u = 1 + u + \frac{u^2}{2!} + \frac{u^3}{3!} + \frac{u^4}{4!} + \cdots

= E\left[1 + Xt + Ys + \frac{t^2}{2!}X^2 + XY\,ts + \frac{s^2}{2!}Y^2 + \cdots + \frac{t^k s^m}{k!\,m!}X^k Y^m + \cdots\right]

= 1 + \mu_{1,0}\,t + \mu_{0,1}\,s + \mu_{2,0}\frac{t^2}{2!} + \mu_{1,1}\,ts + \mu_{0,2}\frac{s^2}{2!} + \cdots + \mu_{k,m}\frac{t^k s^m}{k!\,m!} + \cdots
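As an illustration of reading moments off this expansion (an added sketch): the joint moments μ_{k,m} are the mixed partial derivatives of m_{X,Y} at (0, 0), equivalently k! m! times the series coefficients. The MGF below is the standard closed form for a standard bivariate normal pair with correlation ρ, used only as a convenient example.

import sympy as sp

t, s, rho = sp.symbols('t s rho')

# joint MGF of a standard bivariate normal pair with correlation rho
m = sp.exp((t**2 + 2*rho*t*s + s**2) / 2)

mu_11 = sp.diff(m, t, s).subs({t: 0, s: 0})   # coefficient of ts  -> E[XY] = rho
mu_20 = sp.diff(m, t, 2).subs({t: 0, s: 0})   # second derivative in t -> E[X^2] = 1
print(mu_11, mu_20)                           # rho, 1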
