EE 278
Lecture Notes #3
Winter 2010-2011
Probability space $(\Omega, \mathcal{F}, P)$
Examples
$$X(r) = \begin{cases} 0 & \text{if } r \le 0.5 \\ 1 & \text{otherwise} \end{cases}$$
$Y = \cos(2\pi V)$
Derived distributions
Given the inverse image $X^{-1}(F)$, find $P_X(F) = P(X^{-1}(F))$
Notes:
Random vectors
$$S_n(\omega) = \frac{1}{n}\sum_{k=0}^{n-1} X_k(\omega)$$
A random vector may be written $(X_0, X_1, \ldots, X_{k-1})$, or $\mathbf{X}$, or $\{X_n;\ n = 0, 1, \ldots, k-1\}$, or $\{X_n;\ n \in \mathbb{Z}_k\}$
$$S_n = \frac{1}{n}\sum_{k=0}^{n-1} X_k$$
Earlier examples: two coin flips, k coin flips (first k binary coefficients of the fair spinner)
Random processes
where the various forms are equivalent and all stand for $\Pr(\mathbf{X} \in F)$
Technically, the formula holds for suitable events $F \in \mathcal{B}(\mathbb{R})^k$, the Borel field of $\mathbb{R}^k$ (or some suitable subset). See the book for discussion.
One multidimensional event of particular interest is a Cartesian
product of 1D events (called a rectangle):
Have seen one example: fair coin flips, a Bernoulli random process
$X(t) = A\cos(2\pi t + \Theta)$
$$F = \prod_{i=0}^{k-1} F_i = \{\mathbf{x} : x_i \in F_i;\ i = 0, \ldots, k-1\}$$
The distribution $P_X(F) = P(X^{-1}(F))$ can be described by a cdf $F_X(x) = \Pr(X \le x)$, by a pmf $p_X$ with $P_X(F) = \sum_{x \in F} p_X(x)$, or by a pdf $f_X$ with $F_X(x) = \int_{-\infty}^{x} f_X(r)\,dr$.
Notes:
$$f_X(x) = \frac{d}{dx}F_X(x), \qquad p_X(x) = \sum_{\omega:\,X(\omega)=x} p(\omega), \qquad F_X(r) = \sum_{x:\,x \le r} p_X(x)$$
$\Omega = \mathbb{Z}_+$, $P$ determined by the geometric pmf $p(k) = (1-p)^{k-1}p$.

Define a random variable $Y$:
$$Y(\omega) = \begin{cases} 1 & \text{if } \omega \text{ even} \\ 0 & \text{if } \omega \text{ odd} \end{cases}$$

Using the inverse image formula for the pmf for $Y(\omega) = 1$:
$$p_Y(1) = \sum_{k=2,4,\ldots} (1-p)^{k-1}p = p(1-p)\sum_{k=0}^{\infty}\left((1-p)^2\right)^k = \frac{p(1-p)}{1-(1-p)^2} = \frac{1-p}{2-p}$$
$$p_Y(0) = 1 - p_Y(1) = \frac{1}{2-p}$$
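A quick numerical check of this geometric-series calculation (a minimal numpy sketch; the value p = 0.3 and the truncation point are arbitrary choices, not from the notes):

```python
import numpy as np

p = 0.3
k = np.arange(1, 200)                # truncate the sample space {1, 2, ...}
pmf = (1 - p) ** (k - 1) * p         # geometric pmf
pY1 = pmf[k % 2 == 0].sum()          # P(Y = 1): sum over even outcomes
print(pY1, (1 - p) / (2 - p))        # both ~ 0.41176
print(1 - pY1, 1 / (2 - p))          # P(Y = 0) = 1/(2 - p)
```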
Suppose the original probability space has a pdf $g$, so that
$$P(F) = \int_{r \in F} g(r)\,dr, \qquad P_X(F) = \int_{r:\,X(r) \in F} g(r)\,dr.$$
For the derived distribution of $W = X^2$,
$$F_W(w) = \int_{-w^{1/2}}^{w^{1/2}} g(r)\,dr.$$
$$\frac{d}{dw}\int_{a(w)}^{b(w)} g(r)\,dr = g(b(w))\,\frac{db(w)}{dw} - g(a(w))\,\frac{da(w)}{dw}$$
In our example, with $b(w) = w^{1/2}$ and $a(w) = -w^{1/2}$,
$$f_W(w) = g(w^{1/2})\,\frac{w^{-1/2}}{2} + g(-w^{1/2})\,\frac{w^{-1/2}}{2}.$$
If $g$ is the $N(0, \sigma^2)$ pdf, this becomes
$$f_W(w) = \frac{w^{-1/2}}{\sqrt{2\pi\sigma^2}}\,e^{-w/2\sigma^2}; \quad w \in [0, \infty),$$
a chi-squared pdf with one degree of freedom
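The derived pdf is easy to sanity-check by simulating $W = X^2$ for Gaussian $X$ (a minimal sketch; the value of sigma, the sample size, and the test interval are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
x = rng.normal(0.0, sigma, 500_000)
w = x ** 2                                        # W = X^2

# empirical probability of W in [1, 2] vs. integral of the derived pdf
emp = np.mean((w >= 1) & (w <= 2))
grid = np.linspace(1, 2, 1001)
fW = grid ** -0.5 * np.exp(-grid / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
print(emp, np.trapz(fW, grid))                    # should agree closely
```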
$$\max(x, y) = \begin{cases} x & \text{if } x \ge y \\ y & \text{otherwise} \end{cases}, \qquad \min(x, y) = \begin{cases} x & \text{if } x \le y \\ y & \text{otherwise} \end{cases}$$
To find the pdf of $U = \max(X, Y)$, we first find its cdf. $U \le u$ iff both $X$ and $Y$ are $\le u$, so using independence
$$F_U(u) = \Pr(X \le u,\ Y \le u) = F_X(u)\,F_Y(u).$$
Similarly, $V = \min(X, Y) \le v$ iff $X \le v$ or $Y \le v$, so
$$F_V(v) = \Pr(X \le v \text{ or } Y \le v) = 1 - (1 - F_X(v))(1 - F_Y(v)).$$
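A Monte Carlo check of both cdf formulas (a sketch; the choice of independent exponential X and Y and the test point t are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.exponential(1.0, n)     # F_X(t) = 1 - exp(-t)
y = rng.exponential(2.0, n)     # F_Y(t) = 1 - exp(-t/2)
u, v = np.maximum(x, y), np.minimum(x, y)

t = 1.7
FX, FY = 1 - np.exp(-t), 1 - np.exp(-t / 2)
print(np.mean(u <= t), FX * FY)                    # F_U(t) = F_X F_Y
print(np.mean(v <= t), 1 - (1 - FX) * (1 - FY))    # F_V(t)
```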
The formulas simplify when $X$ and $Y$ are identically distributed, e.g., both continuous with the same pdf or both discrete with the same pmf
If a random vector has a discrete range space, then the distribution can be described by a multidimensional pmf $p_\mathbf{X}(\mathbf{x}) = P_\mathbf{X}(\{\mathbf{x}\}) = \Pr(\mathbf{X} = \mathbf{x})$ as
$$P_\mathbf{X}(F) = \sum_{\mathbf{x} \in F} p_\mathbf{X}(\mathbf{x}) = \sum_{(x_0, x_1, \ldots, x_{k-1}) \in F} p_\mathbf{X}(x_0, x_1, \ldots, x_{k-1}).$$
$$F_\mathbf{X}(\mathbf{x}) = P_\mathbf{X}\left(\prod_{i=0}^{k-1}(-\infty, x_i]\right) = P\left(\bigcap_{i=0}^{k-1} X_i^{-1}((-\infty, x_i])\right).$$
$$F_\mathbf{X}(\mathbf{x}) = F_{X_0,X_1,\ldots,X_{k-1}}(x_0, x_1, \ldots, x_{k-1}) = \Pr(X_i \le x_i;\ i = 0, 1, \ldots, k-1)$$
When the cdf is differentiable, the pdf is the mixed partial derivative
$$f_\mathbf{X}(\mathbf{x}) = \frac{\partial^k}{\partial x_0\,\partial x_1 \cdots \partial x_{k-1}}\,F_{X_0,X_1,\ldots,X_{k-1}}(x_0, x_1, \ldots, x_{k-1}).$$
$$p_{X_0}(x_0) = \sum_{x_1, x_2, \ldots, x_{k-1}} p_\mathbf{X}(x_0, x_1, x_2, \ldots, x_{k-1})$$
$$F_{X_i}(\alpha) = F_\mathbf{X}(\infty, \infty, \ldots, \infty, \alpha, \infty, \ldots, \infty),$$
$$p_{X_i}(\alpha) = \sum_{x_0, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{k-1}} p_\mathbf{X}(x_0, \ldots, x_{i-1}, \alpha, x_{i+1}, \ldots, x_{k-1}),$$
or
$$f_{X_i}(\alpha) = \int \cdots \int f_\mathbf{X}(x_0, \ldots, x_{i-1}, \alpha, x_{i+1}, \ldots, x_{k-1})\,\prod_{l \ne i} dx_l.$$
2D random vectors
If the range space of the vector (X, Y) is continuous and the cdf is differentiable so that $f_{X,Y}(x, y)$ exists,
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy;$$
in the discrete case,
$$p_X(x) = \sum_y p_{X,Y}(x, y).$$
Example

Suppose rvs X and Y are such that the random vector (X, Y) has a pmf of the form
$$p_{XY}(x, y) = p_X(x)\,p_Y(y) = \frac{1}{4}; \quad x, y = 0, 1$$
with marginals
$$p_X(x) = p_Y(y) = \frac{1}{2}; \quad x = 0, 1.$$
More generally, if the joint pmf factors as $p_{X,Y}(x, y) = r(x)\,q(y)$, then the marginal is
$$p_X(x) = \sum_y p_{X,Y}(x, y) = r(x)\sum_y q(y).$$
$$p_{XY}(x, y) \ge 0, \qquad \sum_{x,y} p_{XY}(x, y) = 1$$
Another example

A loaded pair of six-sided dice has the property that the sum of the two dice equals 7 on every roll:
$$p_{XY}(x, y) = p_{XY}(x, 7 - x) = \frac{1}{6}, \quad x = 1, 2, \ldots, 6,$$
so each marginal is uniform, $p_X(x) = p_Y(y) = 1/6$, exactly as for a fair, independent pair. Quite different joints can yield the same marginals; marginals alone do not tell the whole story.

Continuous example
(X, Y) a rv with a pdf that is constant on the unit disk in the XY plane:
$$f_{X,Y}(x, y) = \begin{cases} C & x^2 + y^2 \le 1 \\ 0 & \text{otherwise} \end{cases}$$
where C must satisfy
$$\iint_{x^2+y^2 \le 1} C\,dx\,dy = 1.$$
The marginal is
$$f_X(x) = \int_{-\sqrt{1-x^2}}^{+\sqrt{1-x^2}} C\,dy = 2C\sqrt{1-x^2}, \quad x^2 \le 1.$$
Integrating,
$$\int_{-1}^{+1} 2C\sqrt{1-x^2}\,dx = C\pi = 1, \quad \text{or } C = \frac{1}{\pi}.$$
Thus
$$f_X(x) = \frac{2}{\pi}\sqrt{1-x^2}, \quad x^2 \le 1.$$
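Both the normalization constant and the semicircular marginal can be confirmed numerically (a minimal sketch using numpy's trapezoidal rule):

```python
import numpy as np

# area of the unit disk = 1/C
xs = np.linspace(-1, 1, 2001)
area = np.trapz(2 * np.sqrt(1 - xs**2), xs)
print(area, np.pi)                         # area = pi, so C = 1/pi

# the marginal integrates to 1
fX = (2 / np.pi) * np.sqrt(1 - xs**2)
print(np.trapz(fX, xs))                    # ~ 1.0
```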
Bivariate Gaussian example with unit variances and correlation $\rho$: the covariance matrix and its inverse are
$$\Lambda = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad \Lambda^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix},$$
and the pdf is
$$f_{X,Y}(x, y) = \frac{\exp\left(-\frac{1}{2(1-\rho^2)}\left(x^2 + y^2 - 2\rho xy\right)\right)}{2\pi\sqrt{1-\rho^2}}, \quad (x, y) \in \mathbb{R}^2.$$
The joint pdf factors, since $x^2 + y^2 - 2\rho xy = (y - \rho x)^2 + x^2(1 - \rho^2)$:
$$f_{X,Y}(x, y) = \frac{\exp\left(-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right)}{2\pi\sqrt{1-\rho^2}} = \frac{\exp\left(-\frac{(y - \rho x)^2}{2(1-\rho^2)}\right)}{\sqrt{2\pi(1-\rho^2)}}\cdot\frac{\exp\left(-\frac{x^2}{2}\right)}{\sqrt{2\pi}}.$$
The axioms of probability require that these pmfs, for any choice of K and $k_1, \ldots, k_K$, be consistent in the sense that if any of the pmfs is used to compute the probability of an event, the answer must be the same. E.g.,
$$p_{X_1}(x_1) = \sum_{x_2} p_{X_1,X_2}(x_1, x_2) = \sum_{x_0,x_2} p_{X_0,X_1,X_2}(x_0, x_1, x_2) = \sum_{x_3,x_5} p_{X_1,X_3,X_5}(x_1, x_3, x_5).$$
For fair coin flips,
$$p_{X_{k_1},X_{k_2},\ldots,X_{k_K}}(x_1, x_2, \ldots, x_K) = 2^{-K}, \quad \text{all } (x_1, x_2, \ldots, x_K) \in \{0, 1\}^K.$$
The same result holds for continuous-time random processes and for continuous-alphabet processes (a family of pdfs)
If for every k the joint pmf or pdf is a product of identical marginals,
$$\prod_{i=1}^{k} p(x_i) \quad \text{or} \quad \prod_{i=1}^{k} f(x_i),$$
then there is a random process $\{X_n\}$ having these vector pmfs (or pdfs) for finite collections of samples. A process of this form is called an iid process.
$$P(X \in F, Y \in G) = \sum_{x \in F,\,y \in G} p_{XY}(x, y) = \sum_{x \in F,\,y \in G} p_X(x)\,p_Y(y) = \left(\sum_{x \in F} p_X(x)\right)\left(\sum_{y \in G} p_Y(y)\right) = P(X^{-1}(F))\,P(Y^{-1}(G))$$
For mutually independent components,
$$p_\mathbf{X}(\mathbf{x}) = \prod_{i=0}^{k-1} p_{X_i}(x_i) \quad \text{or} \quad f_\mathbf{X}(\mathbf{x}) = \prod_{i=0}^{k-1} f_{X_i}(x_i).$$
Conditional distributions
Setup: alphabets $A_X$ and $A_Y$, joint pmf $p_{X,Y}(x, y)$, marginal pmfs $p_X$ and $p_Y$.

Define for each $x \in A_X$ for which $p_X(x) > 0$ the conditional pmf
$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)}.$$

For fixed x, $p_{Y|X}(\cdot|x)$ is a pmf:
$$\sum_{y \in A_Y} p_{Y|X}(y|x) = \frac{1}{p_X(x)}\sum_{y \in A_Y} p_{X,Y}(x, y) = \frac{1}{p_X(x)}\,p_X(x) = 1.$$

Conditional probabilities follow as
$$P(Y \in F \mid X = x) = \sum_{y \in F} p_{Y|X}(y|x).$$

The joint pmf can be expressed as a product as
$$p_{X,Y}(x, y) = p_{Y|X}(y|x)\,p_X(x),$$
so that
$$P(X \in G, Y \in F) = \sum_{x,y:\,x \in G,\,y \in F} p_{X,Y}(x, y) = \sum_{x \in G} p_X(x)\sum_{y \in F} p_{Y|X}(y|x) = \sum_{x \in G} p_X(x)\,P(F \mid X = x).$$
Bayes' rule for pmfs:
$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\,p_X(x)}{\sum_{x' \in A_X} p_{Y|X}(y|x')\,p_X(x')}.$$

Example: a binary channel. Input $X \in \{0, 1\}$ with $p_X(1) = p$, noise $Z \in \{0, 1\}$ with $p_Z(1) = \epsilon$, X and Z independent, and output $Y = X \oplus Z \in \{0, 1\}$ (mod-2 addition). Then
$$p_{Y|X}(y|x) = \Pr\{x \oplus Z = y \mid X = x\} = \Pr\{Z = y \oplus x \mid X = x\} = p_Z(y \oplus x).$$

Therefore
$$p_{Y|X}(0 \mid 0) = p_Z(0 \oplus 0) = p_Z(0) = 1 - \epsilon$$
$$p_{Y|X}(0 \mid 1) = p_Z(0 \oplus 1) = p_Z(1) = \epsilon$$
$$p_{Y|X}(1 \mid 0) = p_Z(1 \oplus 0) = p_Z(1) = \epsilon$$
$$p_{Y|X}(1 \mid 1) = p_Z(1 \oplus 1) = p_Z(0) = 1 - \epsilon$$
$$p_{X|Y}(0|0) = \frac{p_{Y|X}(0|0)\,p_X(0)}{p_{Y|X}(0|0)\,p_X(0) + p_{Y|X}(0|1)\,p_X(1)} = \frac{(1-\epsilon)(1-p)}{(1-\epsilon)(1-p) + \epsilon p}$$
$$p_{X|Y}(1|0) = 1 - p_{X|Y}(0|0) = \frac{\epsilon p}{(1-\epsilon)(1-p) + \epsilon p}$$
$$p_{X|Y}(0|1) = \frac{p_{Y|X}(1|0)\,p_X(0)}{p_{Y|X}(1|0)\,p_X(0) + p_{Y|X}(1|1)\,p_X(1)} = \frac{\epsilon(1-p)}{\epsilon(1-p) + (1-\epsilon)p}$$
$$p_{X|Y}(1|1) = 1 - p_{X|Y}(0|1) = \frac{(1-\epsilon)p}{\epsilon(1-p) + (1-\epsilon)p}$$
The output marginal is
$$p_Y(y) = \sum_x p_{Y|X}(y|x)\,p_X(x) = \begin{cases} (1-\epsilon)(1-p) + \epsilon p & \text{for } y = 0 \\ \epsilon(1-p) + (1-\epsilon)p & \text{for } y = 1 \end{cases}$$
If $\epsilon = 1/2$,
$$p_Y(0) = \tfrac{1}{2}p + \tfrac{1}{2}(1-p) = \tfrac{1}{2} = p_Y(1)$$
In this case, the bit sent X and the bit received Y are independent
(check this)
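All of these channel quantities are easy to tabulate numerically; the following minimal numpy sketch does so (the values p = 0.2 and eps = 0.1 are arbitrary choices, not from the notes):

```python
import numpy as np

p, eps = 0.2, 0.1
pX = np.array([1 - p, p])                      # input pmf
pYgX = np.array([[1 - eps, eps],               # row x, column y: p_{Y|X}(y|x)
                 [eps, 1 - eps]])
pXY = pX[:, None] * pYgX                       # joint p_{X,Y}(x, y)
pY = pXY.sum(axis=0)                           # output marginal
pXgY = pXY / pY                                # Bayes: p_{X|Y}(x|y), columns sum to 1
print(pY)                                      # [(1-eps)(1-p)+eps*p, eps(1-p)+(1-eps)p]
print(pXgY)
```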
A joint pmf $p_{X_0,X_1,\ldots,X_{n-1}}$ can always be factored using the chain rule:
$$p_{X_0,\ldots,X_{n-1}}(x_0, \ldots, x_{n-1}) = p_{X_0}(x_0)\prod_{l=1}^{n-1} p_{X_l|X_0,\ldots,X_{l-1}}(x_l \mid x_0, \ldots, x_{l-1}).$$
For pdfs, define the conditional pdf
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$$
$$\int f_{Y|X}(y|x)\,dy = \int \frac{f_{X,Y}(x, y)}{f_X(x)}\,dy = \frac{1}{f_X(x)}\int f_{X,Y}(x, y)\,dy = \frac{1}{f_X(x)}\,f_X(x) = 1.$$
$$P(X \in G, Y \in F) = \iint_{x \in G,\,y \in F} f_{X,Y}(x, y)\,dx\,dy = \int_{x \in G} f_X(x)\left(\int_{y \in F} f_{Y|X}(y \mid x)\,dy\right)dx = \int_{x \in G} f_X(x)\,P(F \mid X = x)\,dx,$$
where
$$P(Y \in F \mid X = x) = \int_{y \in F} f_{Y|X}(y|x)\,dy.$$
Bayes' rule:
$$f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)\,f_X(x)}{f_Y(y)} = \frac{f_{Y|X}(y|x)\,f_X(x)}{\int f_{Y|X}(y|\alpha)\,f_X(\alpha)\,d\alpha}.$$
U = (X, Y), Gaussian pdf with mean $(m_X, m_Y)^t$ and covariance matrix
$$\Lambda = \begin{pmatrix} \sigma_X^2 & \rho\,\sigma_X\sigma_Y \\ \rho\,\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix},$$
Algebra:
$$\det(\Lambda) = \sigma_X^2\sigma_Y^2(1-\rho^2), \qquad \Lambda^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} 1/\sigma_X^2 & -\rho/(\sigma_X\sigma_Y) \\ -\rho/(\sigma_X\sigma_Y) & 1/\sigma_Y^2 \end{pmatrix},$$
so
$$f_{XY}(x, y) = \frac{1}{2\pi\sqrt{\det\Lambda}}\,e^{-\frac{1}{2}(x-m_X,\,y-m_Y)\,\Lambda^{-1}\,(x-m_X,\,y-m_Y)^t}$$
$$= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-m_X}{\sigma_X}\right)^2 - 2\rho\,\frac{(x-m_X)(y-m_Y)}{\sigma_X\sigma_Y} + \left(\frac{y-m_Y}{\sigma_Y}\right)^2\right]\right).$$

Rearrange to factor the joint pdf:
$$f_{XY}(x, y) = \frac{\exp\left(-\frac{1}{2}\left(\frac{x-m_X}{\sigma_X}\right)^2\right)}{\sqrt{2\pi\sigma_X^2}}\cdot\frac{\exp\left(-\frac{\left(y - m_Y - \rho(\sigma_Y/\sigma_X)(x-m_X)\right)^2}{2\sigma_Y^2(1-\rho^2)}\right)}{\sqrt{2\pi\sigma_Y^2(1-\rho^2)}}.$$
Dividing by the marginal of X gives
$$f_{Y|X}(y|x) = \frac{\exp\left(-\frac{(y - m_{Y|X})^2}{2\sigma_Y^2(1-\rho^2)}\right)}{\sqrt{2\pi\sigma_Y^2(1-\rho^2)}}, \qquad m_{Y|X} \triangleq m_Y + \rho\,(\sigma_Y/\sigma_X)(x - m_X).$$
Integrating the factored form over y leaves the first factor, so the marginal is
$$f_X(x) = \frac{e^{-(x-m_X)^2/2\sigma_X^2}}{\sqrt{2\pi\sigma_X^2}}.$$
The chain rule applies to pdfs as well:
$$f_{X_0,\ldots,X_{n-1}}(x_0, \ldots, x_{n-1}) = f_{X_0}(x_0)\prod_{i=1}^{n-1} f_{X_i|X_0,\ldots,X_{i-1}}(x_i \mid x_0, \ldots, x_{i-1}).$$
Detection setup. Received: rv Y; conditional pmf (noisy channel) $p_{Y|X}(y|x)$; decision rule $\hat{X}(y)$. The probability of correct decision is
$$P_c(\hat{X}) = \Pr(X = \hat{X}(Y)) = 1 - P_e, \quad \text{where} \quad P_e(\hat{X}) = \Pr(\hat{X}(Y) \ne X).$$
For a binary symmetric channel,
$$p_{Y|X}(y|x) = \begin{cases} \epsilon & x \ne y \\ 1 - \epsilon & x = y \end{cases}.$$
$$\Pr(X = \hat{X}) = 1 - P_e(\hat{X}) = \sum_{(x,y):\,\hat{X}(y)=x} p_{X,Y}(x, y) = \sum_{(x,y):\,\hat{X}(y)=x} p_Y(y)\,p_{X|Y}(x|y) = \sum_{y} p_Y(y)\,p_{X|Y}(\hat{X}(y) \mid y).$$
Maximizing is accomplished by $\hat{X}(y) = \arg\max_u p_{X|Y}(u|y)$, which yields
$$p_{X|Y}(\hat{X}(y) \mid y) = \max_u p_{X|Y}(u|y).$$
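In code the MAP rule is just an argmax over each posterior column (a minimal sketch reusing the binary-channel arrays from the earlier example; the parameter values are again arbitrary):

```python
import numpy as np

p, eps = 0.2, 0.1
pX = np.array([1 - p, p])
pYgX = np.array([[1 - eps, eps], [eps, 1 - eps]])
pXY = pX[:, None] * pYgX                       # joint pmf
pXgY = pXY / pXY.sum(axis=0)                   # posterior p_{X|Y}(x|y)

x_hat = np.argmax(pXgY, axis=0)                # MAP decision for y = 0, 1
Pc = sum(pXY[x_hat[y], y] for y in range(2))   # sum_y p_Y(y) p_{X|Y}(xhat(y)|y)
print(x_hat, Pc, 1 - Pc)                       # decisions, P_c, P_e
```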
Additive noise: Y = X + W with X and W independent. Then
$$p_{X,Y}(x, y) = \sum_{\alpha,\beta:\,\alpha=x,\,\alpha+\beta=y} p_{X,W}(\alpha, \beta) = p_{X,W}(x, y - x) = p_X(x)\,p_W(y - x),$$
so
$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)} = p_W(y - x). \quad \text{Intuitive!}$$
Marginal for Y:
$$p_Y(y) = \sum_x p_{X,Y}(x, y) = \sum_x p_X(x)\,p_W(y - x),$$
a discrete convolution, as the sketch below illustrates. The above uses ordinary real arithmetic; similar results hold for other definitions of addition, e.g., modulo-2 arithmetic for binary alphabets.
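For pmfs supported on {0, 1, ..., N}, this discrete convolution is exactly what numpy's np.convolve computes (a minimal sketch; the two small pmfs are arbitrary):

```python
import numpy as np

pX = np.array([0.5, 0.3, 0.2])        # pmf of X on {0, 1, 2}
pW = np.array([0.7, 0.3])             # pmf of W on {0, 1}
pY = np.convolve(pX, pW)              # pmf of Y = X + W on {0, ..., 3}
print(pY, pY.sum())                   # sums to 1
```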
In the continuous case,
$$F_Y(y) = \int_{\alpha,\beta:\,\alpha+\beta \le y} f_X(\alpha)\,f_W(\beta)\,d\beta\,d\alpha = \int d\alpha\, f_X(\alpha)\,F_W(y - \alpha).$$
Taking derivatives:
$$f_Y(y) = \int f_X(x)\,f_W(y - x)\,dx,$$
a convolution integral. Bayes' rule then gives
$$f_{X|Y}(x|y) = \frac{f_X(x)\,f_W(y - x)}{\int f_X(\alpha)\,f_W(y - \alpha)\,d\alpha}.$$

Gaussian example: $X \sim N(0, \sigma_X^2)$ and $W \sim N(0, \sigma_W^2)$, independent, with $Y = X + W$, so that $f_{Y|X}(y|x) = f_W(y - x)$, which is $N(x, \sigma_W^2)$ as a function of y.
Since
$$f_{Y|X}(y|x) = f_W(y - x) = \frac{e^{-(y-x)^2/2\sigma_W^2}}{\sqrt{2\pi\sigma_W^2}},$$
the output pdf is
$$f_Y(y) = \int f_{Y|X}(y|\alpha)\,f_X(\alpha)\,d\alpha = \int \frac{\exp\left(-\frac{1}{2}\frac{(y-\alpha)^2}{\sigma_W^2}\right)}{\sqrt{2\pi\sigma_W^2}}\cdot\frac{\exp\left(-\frac{1}{2}\frac{\alpha^2}{\sigma_X^2}\right)}{\sqrt{2\pi\sigma_X^2}}\,d\alpha$$
$$= \frac{1}{2\pi\sigma_X\sigma_W}\int \exp\left(-\frac{1}{2}\left[\frac{y^2 - 2y\alpha + \alpha^2}{\sigma_W^2} + \frac{\alpha^2}{\sigma_X^2}\right]\right)d\alpha$$
$$= \frac{\exp\left(-\frac{y^2}{2\sigma_W^2}\right)}{2\pi\sigma_X\sigma_W}\int \exp\left(-\frac{1}{2}\left[\left(\frac{1}{\sigma_X^2} + \frac{1}{\sigma_W^2}\right)\alpha^2 - \frac{2y\alpha}{\sigma_W^2}\right]\right)d\alpha.$$

The integrand resembles a Gaussian,
$$\exp\left(-\frac{1}{2}\left(\frac{\alpha - m}{\sigma}\right)^2\right), \quad \text{for which} \quad \int \exp\left(-\frac{1}{2}\left(\frac{\alpha - m}{\sigma}\right)^2\right)d\alpha = \sqrt{2\pi\sigma^2}.$$
Compare the exponents:
$$\left(\frac{1}{\sigma_X^2} + \frac{1}{\sigma_W^2}\right)\alpha^2 - \frac{2y\alpha}{\sigma_W^2} \quad \text{vs.} \quad \frac{1}{\sigma^2}(\alpha - m)^2 = \frac{\alpha^2}{\sigma^2} - \frac{2m\alpha}{\sigma^2} + \frac{m^2}{\sigma^2}.$$
Matching coefficients of $\alpha^2$,
$$\frac{1}{\sigma^2} = \frac{1}{\sigma_W^2} + \frac{1}{\sigma_X^2} \implies \sigma^2 = \frac{\sigma_X^2\,\sigma_W^2}{\sigma_X^2 + \sigma_W^2},$$
and of $\alpha$,
$$\frac{m}{\sigma^2} = \frac{y}{\sigma_W^2} \implies m = \frac{\sigma^2}{\sigma_W^2}\,y.$$
Completing the square,
$$\left(\frac{1}{\sigma_X^2} + \frac{1}{\sigma_W^2}\right)\alpha^2 - \frac{2y\alpha}{\sigma_W^2} = \frac{1}{\sigma^2}(\alpha - m)^2 - \frac{m^2}{\sigma^2}.$$
Hence
$$\int \exp\left(-\frac{1}{2}\left[\left(\frac{1}{\sigma_X^2} + \frac{1}{\sigma_W^2}\right)\alpha^2 - \frac{2y\alpha}{\sigma_W^2}\right]\right)d\alpha = \int \exp\left(-\frac{1}{2}\left(\frac{\alpha - m}{\sigma}\right)^2\right)d\alpha\;\exp\left(\frac{m^2}{2\sigma^2}\right) = \sqrt{2\pi\sigma^2}\,\exp\left(\frac{m^2}{2\sigma^2}\right),$$
and therefore
$$f_Y(y) = \frac{\exp\left(-\frac{y^2}{2\sigma_W^2}\right)}{2\pi\sigma_X\sigma_W}\,\sqrt{2\pi\sigma^2}\,\exp\left(\frac{m^2}{2\sigma^2}\right) = \frac{\exp\left(-\frac{y^2}{2(\sigma_X^2 + \sigma_W^2)}\right)}{\sqrt{2\pi(\sigma_X^2 + \sigma_W^2)}}.$$
So $f_Y = N(0, \sigma_X^2 + \sigma_W^2)$.

For the posterior,
$$f_{X|Y}(x|y) = \frac{f_W(y - x)\,f_X(x)}{f_Y(y)} = \frac{\exp\left(-\frac{(y-x)^2}{2\sigma_W^2}\right)\exp\left(-\frac{x^2}{2\sigma_X^2}\right)\Big/\exp\left(-\frac{y^2}{2(\sigma_X^2+\sigma_W^2)}\right)}{\sqrt{2\pi\,\sigma_X^2\sigma_W^2/(\sigma_X^2+\sigma_W^2)}}$$
$$= \frac{\exp\left(-\frac{1}{2}\left[\frac{y^2 - 2yx + x^2}{\sigma_W^2} + \frac{x^2}{\sigma_X^2} - \frac{y^2}{\sigma_X^2+\sigma_W^2}\right]\right)}{\sqrt{2\pi\,\sigma_X^2\sigma_W^2/(\sigma_X^2+\sigma_W^2)}}$$
$$= \frac{\exp\left(-\frac{1}{2\,\sigma_X^2\sigma_W^2/(\sigma_X^2+\sigma_W^2)}\left(x - \frac{\sigma_X^2}{\sigma_X^2+\sigma_W^2}\,y\right)^2\right)}{\sqrt{2\pi\,\sigma_X^2\sigma_W^2/(\sigma_X^2+\sigma_W^2)}}.$$

That is,
$$f_{X|Y}(x|y) = N\left(\frac{\sigma_X^2}{\sigma_X^2+\sigma_W^2}\,y,\ \frac{\sigma_X^2\,\sigma_W^2}{\sigma_X^2+\sigma_W^2}\right).$$
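The posterior mean and variance can be verified by Monte Carlo, conditioning on Y falling in a narrow window around a chosen value (a minimal sketch; the parameter values, window width, and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
sx, sw, y0 = 1.0, 0.5, 0.8
x = rng.normal(0, sx, 2_000_000)
y = x + rng.normal(0, sw, x.size)

sel = np.abs(y - y0) < 0.01                    # approximate conditioning on Y = y0
m_hat, v_hat = x[sel].mean(), x[sel].var()
m = sx**2 / (sx**2 + sw**2) * y0               # posterior mean from the formula
v = sx**2 * sw**2 / (sx**2 + sw**2)            # posterior variance from the formula
print(m_hat, m)                                # ~ 0.64
print(v_hat, v)                                # ~ 0.2
```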
Mixed discrete and continuous cases: let X be discrete with pmf $p_X$, and let the noise W be continuous with pdf $f_W$, independent of X, with Y = X + W. Then
$$F_{Y|X}(y|x) = \Pr(x + W \le y \mid X = x) = \Pr(W \le y - x \mid X = x) = \Pr(W \le y - x) = F_W(y - x).$$
Differentiating,
$$f_{Y|X}(y|x) = \frac{d}{dy}F_{Y|X}(y|x) = \frac{d}{dy}F_W(y - x) = f_W(y - x).$$
Then
$$\Pr(X \in F \text{ and } Y \in G) = \sum_{x \in F} p_X(x)\int_G f_{Y|X}(y|x)\,dy = \sum_{x \in F} p_X(x)\int_G f_W(y - x)\,dy.$$
Choosing $F = \mathbb{R}$ yields
$$\Pr(Y \in G) = \sum_x p_X(x)\int_G f_W(y - x)\,dy, \quad \text{so} \quad f_Y(y) = \sum_x p_X(x)\,f_W(y - x).$$
Define the conditional pmf of X given Y = y by
$$p_{X|Y}(x|y) = \frac{f_W(y - x)\,p_X(x)}{f_Y(y)},$$
so that
$$\Pr(X \in F \mid Y = y) = \sum_{x \in F} p_{X|Y}(x|y).$$
Apply to binary input and Gaussian noise: the conditional pmf of the binary input given the noisy observation is
$$p_{X|Y}(x|y) = \frac{f_W(y - x)\,p_X(x)}{\sum_\alpha p_X(\alpha)\,f_W(y - \alpha)}; \quad y \in \mathbb{R},\ x \in \{0, 1\}.$$
As in the purely discrete case, the MAP detector $\hat{X}(y)$ is given by
$$\hat{X}(y) = \operatorname*{argmax}_x p_{X|Y}(x|y) = \operatorname*{argmax}_x \frac{f_W(y - x)\,p_X(x)}{\sum_\alpha p_X(\alpha)\,f_W(y - \alpha)}$$
or, since the denominator does not depend on x,
$$\hat{X}(y) = \operatorname*{argmax}_x p_{X|Y}(x|y) = \operatorname*{argmax}_x f_W(y - x)\,p_X(x).$$
With equiprobable inputs ($p_X(0) = p_X(1) = 1/2$) and Gaussian noise, the MAP rule reduces to
$$\hat{X}(y) = \operatorname*{argmax}_x p_{X|Y}(x|y) = \operatorname*{argmax}_x \frac{1}{\sqrt{2\pi\sigma_W^2}}\exp\left(-\frac{(x - y)^2}{2\sigma_W^2}\right),$$
a minimum distance or nearest neighbor decision: choose the x closest to y. This is a threshold detector:
$$\hat{X}(y) = \begin{cases} 0 & y < 0.5 \\ 1 & y > 0.5 \end{cases}.$$

The error probability is
$$P_e = \Pr(\hat{X}(Y) \ne X) = \Pr(\hat{X}(Y) \ne 0 \mid X = 0)\,p_X(0) + \Pr(\hat{X}(Y) \ne 1 \mid X = 1)\,p_X(1)$$
$$= \Pr(Y > 0.5 \mid X = 0)\,p_X(0) + \Pr(Y < 0.5 \mid X = 1)\,p_X(1)$$
$$= \Pr(W + X > 0.5 \mid X = 0)\,p_X(0) + \Pr(W + X < 0.5 \mid X = 1)\,p_X(1)$$
$$= \Pr(W > 0.5 \mid X = 0)\,p_X(0) + \Pr(W + 1 < 0.5 \mid X = 1)\,p_X(1)$$
$$= \frac{1}{2}\left(1 - \Phi\left(\frac{0.5}{\sigma_W}\right)\right) + \frac{1}{2}\,\Phi\left(-\frac{0.5}{\sigma_W}\right) = 1 - \Phi\left(\frac{1}{2\sigma_W}\right),$$
where $\Phi$ is the N(0, 1) cdf.
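The formula is easy to confirm by simulating the threshold detector (a minimal sketch; sigma_W = 0.4 is arbitrary, and Phi is evaluated via math.erf):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
sw, n = 0.4, 1_000_000
x = rng.integers(0, 2, n)                  # equiprobable bits
y = x + rng.normal(0, sw, n)               # additive Gaussian noise
x_hat = (y > 0.5).astype(int)              # threshold detector
Pe_mc = np.mean(x_hat != x)

Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
print(Pe_mc, 1 - Phi(1 / (2 * sw)))        # ~ 0.1056 for sigma_W = 0.4
```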
Statistical estimation
MAP Estimation
Characteristic functions
Summing independent random variables arises frequently in signal analysis problems, e.g., when an iid random process $\{X_k\}$ is put into a linear filter. When summing independent random variables, the derived distribution is found by convolution of pmfs or pdfs. This can be complicated, but is avoidable using transforms, as in linear systems.

Consider a probability space $(\mathcal{X}, \mathcal{B}(\mathcal{X}), P_X)$ with $P_X$ described by a pmf $p_X$. Expectation of a function g: $E(g) = \sum_\omega p(\omega)\,g(\omega)$. The characteristic function is
$$M_X(ju) = \sum_x p_X(x)\,e^{jux}.$$
Define the random variable $g(X)$ on this space by $g(X)(x) = e^{jux}$. Then
$$E[g(X)] = \sum_x p_X(x)\,e^{jux} = M_X(ju),$$
so the characteristic function is a close relative of the Fourier and z transforms:
$$\mathcal{F}(p_X) = \sum_x p_X(x)\,e^{-j2\pi f x}, \qquad \mathcal{Z}_z(p_X) = \sum_x p_X(x)\,z^{-x}.$$
For an integer-valued random variable the pmf can be recovered from $M_X$:
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} M_X(ju)\,e^{-juk}\,du = \sum_x p_X(x)\,\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ju(x-k)}\,du = \sum_x p_X(x)\,\delta_{k-x} = p_X(k).$$

Now consider $Y = X + W$ with X and W independent.
$$M_Y(ju) = \sum_y p_Y(y)\,e^{juy}, \qquad p_Y(y) = \sum_{x,w:\,x+w=y} p_{X,W}(x, w).$$
Substituting,
$$M_Y(ju) = \sum_y \left(\sum_{x,w:\,x+w=y} p_{X,W}(x, w)\right)e^{juy} = \sum_y \sum_{x,w:\,x+w=y} p_{X,W}(x, w)\,e^{ju(x+w)} = \sum_{x,w} p_{X,W}(x, w)\,e^{ju(x+w)}.$$
By independence $p_{X,W}(x, w) = p_X(x)\,p_W(w)$, so
$$M_Y(ju) = \left(\sum_x p_X(x)\,e^{jux}\right)\left(\sum_w p_W(w)\,e^{juw}\right) = M_X(ju)\,M_W(ju).$$
Iterate: for a sum of n independent random variables,
$$M_Y(ju) = \prod_{i=1}^{n} M_{X_i}(ju).$$
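A quick numeric confirmation that the transform of an independent sum is the product of the transforms (a minimal sketch; the pmfs and the test frequency u are arbitrary):

```python
import numpy as np

u = 0.7
pX = np.array([0.5, 0.3, 0.2])                    # pmf on {0, 1, 2}
pW = np.array([0.7, 0.3])                         # pmf on {0, 1}
M = lambda pmf, u: np.sum(pmf * np.exp(1j * u * np.arange(len(pmf))))

pY = np.convolve(pX, pW)                          # pmf of Y = X + W
print(M(pY, u), M(pX, u) * M(pW, u))              # equal up to roundoff
```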
Bernoulli example: $p_X(1) = p$, $p_X(0) = 1 - p$, so
$$M_X(ju) = \sum_{k=0}^{1} p_X(k)\,e^{juk} = (1 - p) + p\,e^{ju}.$$
If $Y_n = \sum_{i=1}^{n} X_i$, then
$$M_{Y_n}(ju) = \left((1 - p) + p\,e^{ju}\right)^n = \sum_{k=0}^{n}\binom{n}{k}(1 - p)^{n-k}p^k\,e^{juk},$$
so matching coefficients gives the binomial pmf
$$p_{Y_n}(k) = \binom{n}{k}(1 - p)^{n-k}p^k; \quad k \in \mathbb{Z}_{n+1}.$$

For continuous random variables define
$$M_X(ju) = E\left(e^{juX}\right) = \int f_X(x)\,e^{jux}\,dx,$$
again closely related to the Fourier transform $\mathcal{F}(f_X) = \int f_X(x)\,e^{-j2\pi f x}\,dx$ and the Laplace transform $\mathcal{L}_s(f_X) = \int f_X(x)\,e^{-sx}\,dx$.

Uniqueness of transforms: the pdf is recoverable by
$$f_X(x) = \frac{1}{2\pi}\int M_X(ju)\,e^{-jux}\,du.$$
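The binomial pmf can be recovered numerically from the n-fold product of the Bernoulli characteristic function via the inversion integral (a minimal sketch; the parameter values and quadrature grid are arbitrary):

```python
import numpy as np
from math import comb

p, n, k = 0.3, 5, 2
u = np.linspace(-np.pi, np.pi, 20001)
MX = (1 - p) + p * np.exp(1j * u)               # Bernoulli characteristic function
MYn = MX ** n                                   # characteristic function of the sum
pk = np.trapz(MYn * np.exp(-1j * u * k), u).real / (2 * np.pi)
print(pk, comb(n, k) * p**k * (1 - p)**(n - k)) # both ~ 0.3087
```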
$X \sim N(m, \sigma^2)$. Then
$$M_X(ju) = E\left(e^{juX}\right) = \frac{1}{(2\pi\sigma^2)^{1/2}}\int e^{-(x - m)^2/2\sigma^2}\,e^{jux}\,dx$$
$$= \frac{1}{(2\pi\sigma^2)^{1/2}}\int e^{-(x^2 - 2mx - 2ju\sigma^2 x + m^2)/2\sigma^2}\,dx$$
$$= \left(\frac{1}{(2\pi\sigma^2)^{1/2}}\int e^{-(x - (m + ju\sigma^2))^2/2\sigma^2}\,dx\right)e^{jum - u^2\sigma^2/2} = e^{jum - u^2\sigma^2/2}.$$

For $Y_n = \sum_{k=1}^{n} X_k$ with the $X_k$ iid $N(m, \sigma^2)$,
$$M_{Y_n}(ju) = \prod_{i=1}^{n} M_{X_i}(ju) = \left[e^{jum - u^2\sigma^2/2}\right]^n = e^{ju(nm) - u^2(n\sigma^2)/2},$$
so $Y_n \sim N(nm, n\sigma^2)$.
For random vectors,
$$M_\mathbf{X}(j\mathbf{u}) = M_{X_0,\ldots,X_{n-1}}(ju_0, \ldots, ju_{n-1}) = E\left(e^{j\mathbf{u}^t\mathbf{X}}\right) = E\left(\exp\left(j\sum_{k=0}^{n-1} u_k X_k\right)\right),$$
which for a Gaussian vector with means $m_k$ and covariance $\Lambda(k, m)$ is
$$= \exp\left(j\sum_{k=0}^{n-1} u_k m_k - \frac{1}{2}\sum_{k=0}^{n-1}\sum_{m=0}^{n-1} u_k\,\Lambda(k, m)\,u_m\right).$$
Used to define discrete-time iid processes and processes which can be constructed from iid processes by coding or filtering. Is the resulting family of pmfs consistent?
Bernoulli process:
$$p_{X_n}(x) = \begin{cases} p & x = 1 \\ 1 - p & x = 0 \end{cases}, \qquad p_{X^n}(x^n) = \prod_{i=0}^{n-1} p_X(x_i).$$

Define a new process by
$$Y_n = \begin{cases} Y_0 & n = 0 \\ X_n \oplus Y_{n-1} & n = 1, 2, \ldots \end{cases}$$
where $Y_0$ is an equiprobable binary rv, independent of the $X_n$.
Alternatively:
$$Y_n = \begin{cases} 1 & \text{if } X_n \ne Y_{n-1} \\ 0 & \text{if } X_n = Y_{n-1} \end{cases}.$$
The first-order marginal: using the inverse image formula and independence,
$$p_{Y_1}(y_1) = \sum_{y_0} p_{Y_0,Y_1}(y_0, y_1) = \frac{1}{2}; \quad y_1 = 0, 1.$$
In a similar fashion it can be shown that the marginals for $Y_n$ are all the same:
$$p_{Y_n}(y) = \frac{1}{2}; \quad y = 0, 1;\ n = 0, 1, 2, \ldots$$

The joint pmf is
$$p_{Y^n}(y^n) = \frac{1}{2}\prod_{i=1}^{n-1} p^{y_i \oplus y_{i-1}}(1 - p)^{1 - (y_i \oplus y_{i-1})} = \frac{1}{2}\prod_{i=1}^{n-1} p_X(y_i \oplus y_{i-1}).$$
This used the facts that (1) $a \oplus b = c$ iff $a = b \oplus c$, (2) $Y_0, X_1, X_2, \ldots, X_{n-1}$ are mutually independent, and (3) the $X_n$ are iid.

The joint pmf is not the product of the marginals (provided $p \ne 1/2$), but the chain rule with conditional probabilities writes it as a product of conditional pmfs, given by
$$p_{Y_l|Y_{l-1}}(y_l \mid y_{l-1}) = \frac{p_{Y^{l+1}}(y^{l+1})}{p_{Y^{l}}(y^{l})} = p_X(y_l \oplus y_{l-1}) = p^{y_l \oplus y_{l-1}}(1 - p)^{1 - (y_l \oplus y_{l-1})},$$
which depends only on the previous sample $y_{l-1}$.
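Simulating the process confirms that the marginals stay uniform even though consecutive samples are dependent (a minimal sketch; p, the length, and the trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, trials = 0.2, 6, 100_000
x = rng.random((trials, n)) < p                # X_1, ..., X_n Bernoulli(p)
y = np.zeros((trials, n + 1), dtype=int)
y[:, 0] = rng.integers(0, 2, trials)           # Y_0 fair and independent
for i in range(1, n + 1):
    y[:, i] = y[:, i - 1] ^ x[:, i - 1]        # Y_n = X_n xor Y_{n-1}
print(y.mean(axis=0))                          # each marginal ~ 0.5
print(np.mean(y[:, 1] == y[:, 2]))             # ~ 1 - p: neighbors are dependent
```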
Next consider the binomial counting process:
$$Y_n = \begin{cases} Y_0 = 0 & n = 0 \\ \sum_{k=1}^{n} X_k = Y_{n-1} + X_n & n = 1, 2, \ldots \end{cases}$$
so that $Y_n = Y_{n-1}$ or $Y_n = Y_{n-1} + 1$; $n = 2, 3, \ldots$
To completely describe this process we need a formula for the joint pmfs. Since the $X_n$ are iid, a similar derivation shows
$$p_{Y^n}(y^n) = \prod_{i=1}^{n} p^{(y_i - y_{i-1})}(1 - p)^{1 - (y_i - y_{i-1})},$$
where $y_i - y_{i-1} = 0$ or $1$, $i = 1, 2, \ldots, n$; $y_0 = 0$, or, equivalently,
$$p_{Y^n}(y^n) = \prod_{i=1}^{n} p_X(y_i - y_{i-1}).$$
Again the conditional pmfs depend only on the most recent sample: the process is Markov.
The marginal pmf for $Y_n$ is binomial, as follows from the binomial theorem with $\Pr(X_n = 1) = p$:
$$p_{Y_n}(k) = \binom{n}{k}p^k(1 - p)^{n-k}; \quad k \in \mathbb{Z}_{n+1}.$$

A related example is the random walk
$$Y_n = \begin{cases} 0 & n = 0 \\ \sum_{k=1}^{n} X_k & n = 1, 2, \ldots \end{cases}, \qquad Y_n = Y_{n-1} + X_n,\ n = 1, 2, \ldots,$$
where now the steps are $X_n = \pm 1$, with $\Pr(X_n = -1) = p$ (the sign convention consistent with the pmf below). Transform of the iid random variables is $M_X(ju) = (1 - p)e^{ju} + p\,e^{-ju}$, so
$$M_{Y_n}(ju) = \sum_{k=0}^{n}\binom{n}{k}(1 - p)^{n-k}p^k\,e^{ju(n - 2k)} = \sum_{k=-n,-n+2,\ldots,n-2,n}\binom{n}{(n-k)/2}(1 - p)^{(n+k)/2}\,p^{(n-k)/2}\,e^{juk}.$$
Matching coefficients of $e^{juk}$ gives
$$p_{Y_n}(k) = \binom{n}{(n-k)/2}(1 - p)^{(n+k)/2}\,p^{(n-k)/2}, \quad k = -n, -n+2, \ldots, n-2, n.$$
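The closed form matches the n-fold convolution of the one-step pmf (a minimal sketch; the step pmf places probability p on the value -1, per the convention above, and p and n are arbitrary):

```python
import numpy as np
from math import comb

p, n = 0.3, 6
step = np.array([p, 0.0, 1 - p])           # support {-1, 0, +1}: mass p at -1
pmf = np.array([1.0])
for _ in range(n):
    pmf = np.convolve(pmf, step)           # walk pmf on {-n, ..., +n}

ks = np.arange(-n, n + 1, 2)
closed = [comb(n, (n - k) // 2) * (1 - p) ** ((n + k) // 2) * p ** ((n - k) // 2)
          for k in ks]
print(pmf[ks + n])                         # value k sits at array index k + n
print(np.array(closed))                    # identical
```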
Continuous analog: $X_n$ iid with pdf $f_X$, and
$$Y_n = \begin{cases} 0 & n = 0 \\ \sum_{k=1}^{n} X_k & n = 1, 2, \ldots \end{cases}$$
Handle in essentially the same way, but use cdfs and then pdfs. Previously found the marginal $f_{Y_n}$ using transforms to be $N(0, n\sigma_X^2)$.
To find the joint pdfs, use conditional pdfs and the chain rule; as in the discrete case,
$$f_{Y^n}(y^n) = \prod_{i=1}^{n} f_X(y_i - y_{i-1}), \quad y_0 = 0.$$
If $f_X = N(0, \sigma^2)$, then
$$f_{Y^n}(y^n) = \frac{\exp\left(-\frac{y_1^2}{2\sigma^2}\right)}{\sqrt{2\pi\sigma^2}}\prod_{i=2}^{n}\frac{\exp\left(-\frac{(y_i - y_{i-1})^2}{2\sigma^2}\right)}{\sqrt{2\pi\sigma^2}} = (2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\left(\sum_{i=2}^{n}(y_i - y_{i-1})^2 + y_1^2\right)\right).$$
This is a joint Gaussian pdf with mean vector 0, and a similar argument implies that the covariance matrix is
$$K_X(m, n) = \sigma^2\min(m, n), \quad m, n = 1, 2, \ldots$$
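The covariance $K(m, n) = \sigma^2\min(m, n)$ is easy to confirm by simulating cumulative sums of iid Gaussians (a minimal sketch; the indices m, n and trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, N, trials = 1.0, 10, 200_000
x = rng.normal(0, sigma, (trials, N))
y = np.cumsum(x, axis=1)                   # Y_1, ..., Y_N for each trial

m, n = 3, 7
K_mc = np.mean(y[:, m - 1] * y[:, n - 1])  # zero mean, so E[Y_m Y_n]
print(K_mc, sigma**2 * min(m, n))          # ~ 3.0
```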