Reading:
Ch. 3 in Kay-II.
Notes by Prof. Don Johnson on detection theory,
see http://www.ece.rice.edu/~dhj/courses/elec531/notes5.pdf.
Ch. 10 in Wasserman.
We wish to test
$H_0 : \theta \in \Theta_0$ (null hypothesis)
versus
$H_1 : \theta \in \Theta_1$ (alternative hypothesis),
where $\Theta_0 \cup \Theta_1 = \Theta$. For simple hypotheses,
$\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$.
A decision rule $\phi$ maps the data space $X$ to $\{0, 1\}$:
$\phi(x) = 1$, decide $H_1$,
$\phi(x) = 0$, decide $H_0$,
and partitions $X$ into $X_0 = \{x : \phi(x) = 0\}$ and $X_1 = \{x : \phi(x) = 1\}$.
The false-alarm and miss probabilities are
$P_{\rm FA} = E_{x|\theta}[\phi(X) \,|\, \theta] = \int_{X_1} p(x \,|\, \theta)\, dx$   for $\theta$ in $\Theta_0$,
$P_{\rm M} = E_{x|\theta}[1 - \phi(X) \,|\, \theta] = 1 - \int_{X_1} p(x \,|\, \theta)\, dx = \int_{X_0} p(x \,|\, \theta)\, dx$   for $\theta$ in $\Theta_1$.
The losses incurred by the two possible decisions are described by the quantities

                    true state $\theta \in \Theta_1$    true state $\theta \in \Theta_0$
  decide $H_1$:     $L(1|1) = 0$                         $L(1|0)$
  decide $H_0$:     $L(0|1)$                             $L(0|0) = 0$

(correct decisions incur zero loss). The posterior expected losses of the two decisions are
$\rho_0(x) = \int_{\Theta_1} L(0|1)\, p(\theta|x)\, d\theta + \int_{\Theta_0} L(0|0)\, p(\theta|x)\, d\theta$   (the second term is 0)
$\phantom{\rho_0(x)} = L(0|1) \int_{\Theta_1} p(\theta|x)\, d\theta$   ($L(0|1)$ is constant)
$\phantom{\rho_0(x)} = L(0|1)\, P[\Theta_1 \,|\, x]$
and, similarly,
$\rho_1(x) = \int_{\Theta_0} L(1|0)\, p(\theta|x)\, d\theta = L(1|0) \int_{\Theta_0} p(\theta|x)\, d\theta = L(1|0)\, P[\Theta_0 \,|\, x]$   ($L(1|0)$ is constant),
where $P[\Theta_1|x] = \int_{\Theta_1} p(\theta|x)\, d\theta$ and $P[\Theta_0|x] = \int_{\Theta_0} p(\theta|x)\, d\theta$.
The Bayes rule decides $H_1$ when $\rho_1(x) \le \rho_0(x)$, or
$X_1 = \Bigl\{ x : \dfrac{P[\Theta_1|x]}{P[\Theta_0|x]} \ge \dfrac{L(1|0)}{L(0|1)} \Bigr\}$   (1)
or, equivalently, upon applying the Bayes rule:
$X_1 = \Bigl\{ x : \dfrac{\int_{\Theta_1} p(x|\theta)\, \pi(\theta)\, d\theta}{\int_{\Theta_0} p(x|\theta)\, \pi(\theta)\, d\theta} \ge \dfrac{L(1|0)}{L(0|1)} \Bigr\}.$   (2)
Simple hypotheses: suppose $\Theta_0 = \{\theta_0\}$ and $\Theta_1 = \{\theta_1\}$. (In particular, for the 0-1 loss, the table above becomes $L(1|1) = 0$, $L(0|1) = 1$, $L(1|0) = 1$, $L(0|0) = 0$.) Then (1) and (2) become the posterior-odds-ratio test
$X_1 = \Bigl\{ x : \underbrace{\dfrac{p(\theta_1|x)}{p(\theta_0|x)}}_{\text{posterior-odds ratio}} \ge \dfrac{L(1|0)}{L(0|1)} \Bigr\}$   (5)
and the likelihood-ratio test
$X_1 = \Bigl\{ x : \underbrace{\dfrac{p(x|\theta_1)}{p(x|\theta_0)}}_{\text{likelihood ratio}} \ge \dfrac{\pi_0\, L(1|0)}{\pi_1\, L(0|1)} \Bigr\}$   (6)
where
$\pi_0 = \pi(\theta_0)$,   $\pi_1 = \pi(\theta_1) = 1 - \pi_0$.
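As a concrete illustration of (6), here is a minimal Python sketch that applies the Bayes likelihood-ratio test to two simple Gaussian hypotheses and estimates the resulting preposterior risk by simulation; the densities, priors, and losses are arbitrary assumed values, not ones from these notes.

```python
import numpy as np
from scipy.stats import norm

# Assumed toy setup: H0: x ~ N(0, 1), H1: x ~ N(1, 1).
pi0, pi1 = 0.7, 0.3          # prior probabilities pi(theta0), pi(theta1)
L10, L01 = 1.0, 2.0          # losses L(1|0) and L(0|1); correct decisions cost 0

def bayes_decision(x):
    """Decide H1 iff the likelihood ratio exceeds pi0*L(1|0) / (pi1*L(0|1)), cf. (6)."""
    lam = norm.pdf(x, loc=1.0, scale=1.0) / norm.pdf(x, loc=0.0, scale=1.0)
    threshold = pi0 * L10 / (pi1 * L01)
    return int(lam >= threshold)

rng = np.random.default_rng(0)
n_trials = 100_000
theta = rng.random(n_trials) < pi1                  # True -> H1 is in force
x = rng.normal(loc=theta.astype(float), scale=1.0)  # one observation per trial
decisions = np.array([bayes_decision(xi) for xi in x])
# Average loss: L(0|1) on misses, L(1|0) on false alarms.
risk = np.mean(np.where(theta, (1 - decisions) * L01, decisions * L10))
print(f"estimated preposterior risk of the Bayes rule: {risk:.4f}")
```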
Preposterior (Bayes) risk: the average risk of the rule $\phi$ (equivalently, of the region $X_1$) is
$E_{x,\theta}[\text{loss}] = \int_{X_1} \int_{\Theta_0} L(1|0)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx + \int_{X_0} \int_{\Theta_1} L(0|1)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx.$
Which $X_1$ minimizes this risk? Adding and subtracting $\int_{X_1} \int_{\Theta_1} L(0|1)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx$ gives
$\int_{X_1} \int_{\Theta_0} L(1|0)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx + \int_{X_0} \int_{\Theta_1} L(0|1)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx$
$\quad = \int_{X_1} \int_{\Theta_0} L(1|0)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx - \int_{X_1} \int_{\Theta_1} L(0|1)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx + \int_{X_0} \int_{\Theta_1} L(0|1)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx + \int_{X_1} \int_{\Theta_1} L(0|1)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx$
$\quad = \underbrace{\int_{X} \int_{\Theta_1} L(0|1)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx}_{\text{const, not dependent on } \phi(x)} + \int_{X_1} \Bigl\{ L(1|0) \int_{\Theta_0} p(x|\theta)\, \pi(\theta)\, d\theta - L(0|1) \int_{\Theta_1} p(x|\theta)\, \pi(\theta)\, d\theta \Bigr\}\, dx.$
The first term does not depend on the rule, so the risk is minimized by collecting in $X_1$ exactly those $x$ for which the braced integrand is negative, which again yields (2).
0-1 loss: For the 0-1 loss, i.e. $L(1|0) = L(0|1) = 1$, the preposterior (Bayes) risk for rule $\phi(x)$ is
$E_{x,\theta}[\text{loss}] = \int_{X_1} \int_{\Theta_0} p(x|\theta)\, \pi(\theta)\, d\theta\, dx + \int_{X_0} \int_{\Theta_1} p(x|\theta)\, \pi(\theta)\, d\theta\, dx.$   (7)
For simple hypotheses, the Bayes rule (6) can be written as a likelihood-ratio detector:
$\underbrace{\Lambda(x)}_{\text{likelihood ratio}} = \dfrac{p(x|\theta_1)}{p(x|\theta_0)} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \dfrac{\pi_0\, L(1|0)}{\pi_1\, L(0|1)},$   (8)
see also Ch. 3.7 in Kay-II. (Recall that $\Lambda(x)$ is the sufficient
statistic for the detection problem, see p. 37 in handout # 1.)
Equivalently, we may compare the log-likelihood ratio $\log \Lambda(x)$ with $\log[\pi_0 L(1|0)/(\pi_1 L(0|1))]$.
For the 0-1 loss, the preposterior risk (7) is simply the average error probability $\pi_0 P_{\rm FA} + \pi_1 P_{\rm M}$, where
$\pi_0 = \pi(\theta_0)$,   $\pi_1 = \pi(\theta_1) = 1 - \pi_0$.
In this case, our Bayes decision rule simplifies to the MAP rule
(as expected, see (5) and Ch. 3.6 in Kay-II):
$\underbrace{\dfrac{p(\theta_1|x)}{p(\theta_0|x)}}_{\text{posterior-odds ratio}} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ 1$   (9)
or, in terms of the likelihood ratio,
$\underbrace{\dfrac{p(x|\theta_1)}{p(x|\theta_0)}}_{\text{likelihood ratio}} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \dfrac{\pi_0}{\pi_1}.$   (10)
Handling nuisance parameters: suppose that, in addition to $\theta$, the model contains a nuisance parameter $\psi$, and we test $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_1$. The Bayes test has the same form as before, but $p_{\theta|x}(\theta|x)$ is computed as follows:
$p_{\theta|x}(\theta|x) = \int p_{\theta,\psi|x}(\theta, \psi|x)\, d\psi$
and, therefore,
$\dfrac{\int_{\Theta_1} \int p_{\theta,\psi|x}(\theta, \psi|x)\, d\psi\, d\theta}{\int_{\Theta_0} \int p_{\theta,\psi|x}(\theta, \psi|x)\, d\psi\, d\theta} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \dfrac{L(1|0)}{L(0|1)}$   (11)
or, applying the Bayes rule,
$\dfrac{\int_{\Theta_1} \int p_{x|\theta,\psi}(x|\theta, \psi)\, \pi_{\theta,\psi}(\theta, \psi)\, d\psi\, d\theta}{\int_{\Theta_0} \int p_{x|\theta,\psi}(x|\theta, \psi)\, \pi_{\theta,\psi}(\theta, \psi)\, d\psi\, d\theta} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \dfrac{L(1|0)}{L(0|1)}.$   (12)
If $\theta$ and $\psi$ are independent a priori,
$\pi_{\theta,\psi}(\theta, \psi) = \pi_{\theta}(\theta)\, \pi_{\psi}(\psi),$   (13)
then (12) becomes
$\dfrac{\int_{\Theta_1} \pi_{\theta}(\theta) \int p_{x|\theta,\psi}(x|\theta, \psi)\, \pi_{\psi}(\psi)\, d\psi\, d\theta}{\int_{\Theta_0} \pi_{\theta}(\theta) \int p_{x|\theta,\psi}(x|\theta, \psi)\, \pi_{\psi}(\psi)\, d\psi\, d\theta} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \dfrac{L(1|0)}{L(0|1)},$   (14)
where the inner integral is $\int p_{x|\theta,\psi}(x|\theta, \psi)\, \pi_{\psi}(\psi)\, d\psi = p(x|\theta)$.
For simple hypotheses $\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$, the test (11) reads
$\dfrac{\int p_{\theta,\psi|x}(\theta_1, \psi|x)\, d\psi}{\int p_{\theta,\psi|x}(\theta_0, \psi|x)\, d\psi} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \dfrac{L(1|0)}{L(0|1)}.$   (15)
Now, if $\theta$ and $\psi$ are independent a priori, i.e. (13) holds, then (15) becomes
$\underbrace{\dfrac{\int p_{x|\theta,\psi}(x|\theta_1, \psi)\, \pi_{\psi}(\psi)\, d\psi}{\int p_{x|\theta,\psi}(x|\theta_0, \psi)\, \pi_{\psi}(\psi)\, d\psi}}_{\text{integrated likelihood ratio}} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \dfrac{\pi_0\, L(1|0)}{\pi_1\, L(0|1)}$   (16)
where
$\pi_0 = \pi(\theta_0)$,   $\pi_1 = \pi(\theta_1) = 1 - \pi_0$.
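To make the integrated likelihood ratio (16) concrete, here is a small numerical sketch: the tested parameter is a DC level $a \in \{0, A\}$, the nuisance parameter $\psi$ is the unknown noise standard deviation with an assumed grid prior, and the integrals over $\psi$ are approximated by sums over that grid. All parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

A = 1.0                                   # known DC level under H1 (assumed)
sigma_grid = np.linspace(0.5, 2.0, 300)   # grid over the nuisance parameter psi = sigma
dpsi = sigma_grid[1] - sigma_grid[0]
prior_psi = np.ones_like(sigma_grid)      # flat prior pi(psi), normalized on the grid
prior_psi /= prior_psi.sum() * dpsi

def integrated_likelihood(x, a):
    """Approximate p(x | theta = a) = integral of p(x | a, sigma) * pi(sigma) d sigma."""
    loglik = np.array([norm.logpdf(x, loc=a, scale=s).sum() for s in sigma_grid])
    return np.sum(np.exp(loglik) * prior_psi) * dpsi

rng = np.random.default_rng(1)
x = A + rng.normal(scale=1.3, size=20)    # record generated under H1 with sigma = 1.3

ilr = integrated_likelihood(x, A) / integrated_likelihood(x, 0.0)
print(f"integrated likelihood ratio: {ilr:.3g}")
# Compare ilr with pi0*L(1|0)/(pi1*L(0|1)) to make the Bayes decision, cf. (16).
```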
Chernoff bound on the minimum Bayes risk: for the 0-1 loss, the preposterior risk (7) of the rule defined by $X_1$ is
$\int_{X_1} \int_{\Theta_0} p(x|\theta)\, \pi(\theta)\, d\theta\, dx + \int_{X_0} \int_{\Theta_1} p(x|\theta)\, \pi(\theta)\, d\theta\, dx$
and the optimal (Bayes) decision region is
$X_1^{\star} = \Bigl\{ x : \int_{\Theta_0} p(x|\theta)\, \pi(\theta)\, d\theta - \int_{\Theta_1} p(x|\theta)\, \pi(\theta)\, d\theta < 0 \Bigr\}.$
Hence, the minimum preposterior risk is
$\int_{X_1^{\star}} \int_{\Theta_0} p(x|\theta)\, \pi(\theta)\, d\theta\, dx + \int_{X_0^{\star}} \int_{\Theta_1} p(x|\theta)\, \pi(\theta)\, d\theta\, dx
 \ \underset{\text{using the def. of } X_1^{\star}}{=}\ \int_{X} \min\Bigl\{ \underbrace{\int_{\Theta_0} p(x|\theta)\, \pi(\theta)\, d\theta}_{=\, q_0(x)},\ \underbrace{\int_{\Theta_1} p(x|\theta)\, \pi(\theta)\, d\theta}_{=\, q_1(x)} \Bigr\}\, dx$
(the outer integral is over the entire data space $X$). Using $\min\{a, b\} \le a^{\lambda} b^{1-\lambda}$ for $0 \le \lambda \le 1$, $a, b \ge 0$, we obtain the Chernoff bound
$\int_{X} \min\{q_0(x), q_1(x)\}\, dx \ \le\ \int_{X} [q_0(x)]^{\lambda}\, [q_1(x)]^{1-\lambda}\, dx.$
Now consider $N$ measurements
$x = [x_1, x_2, \ldots, x_N]^T$
with $x_1, x_2, \ldots, x_N$ conditionally independent, identically distributed (i.i.d.) given $\theta$, and simple hypotheses $\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$, so that
$q_0(x) = p(x|\theta_0)\, \pi_0 = \pi_0 \prod_{n=1}^{N} p(x_n|\theta_0)$,
$q_1(x) = p(x|\theta_1)\, \pi_1 = \pi_1 \prod_{n=1}^{N} p(x_n|\theta_1)$,
yielding the Chernoff bound for $N$ conditionally i.i.d. measurements (given $\theta$) and simple hypotheses:
$\int \Bigl[ \pi_0 \prod_{n=1}^{N} p(x_n|\theta_0) \Bigr]^{\lambda} \Bigl[ \pi_1 \prod_{n=1}^{N} p(x_n|\theta_1) \Bigr]^{1-\lambda} dx
 = \pi_0^{\lambda}\, \pi_1^{1-\lambda} \prod_{n=1}^{N} \int [p(x_n|\theta_0)]^{\lambda}\, [p(x_n|\theta_1)]^{1-\lambda}\, dx_n
 = \pi_0^{\lambda}\, \pi_1^{1-\lambda} \Bigl\{ \int [p(x|\theta_0)]^{\lambda}\, [p(x|\theta_1)]^{1-\lambda}\, dx \Bigr\}^{N}.$
Optimizing over $\lambda$ shows that the bound (and the minimum average error probability itself) decays exponentially with $N$ (Lemma 1):
min av. error probability $\approx f(N)\, \exp\{-N\, C\}$   as $N \to \infty$,
where
$C = -\min_{\lambda \in [0,1]} \log \int [p(x|\theta_0)]^{\lambda}\, [p(x|\theta_1)]^{1-\lambda}\, dx$
is the Chernoff information for a single observation and $f(N)$ is a subexponential factor.
For Gaussian densities $p_i(x) = \mathcal{N}(x; \mu_i, \Sigma_i)$, $i = 1, 2$, we have $\int [p_1(x)]^{1-\lambda}\, [p_2(x)]^{\lambda}\, dx = \exp\{-g(\lambda)\}$, where
$g(\lambda) = \dfrac{\lambda(1-\lambda)}{2}\, (\mu_2 - \mu_1)^T [\lambda \Sigma_1 + (1-\lambda) \Sigma_2]^{-1} (\mu_2 - \mu_1) + \tfrac12 \log \dfrac{|\lambda \Sigma_1 + (1-\lambda) \Sigma_2|}{|\Sigma_1|^{\lambda}\, |\Sigma_2|^{1-\lambda}}.$
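A quick numerical check of the Gaussian Chernoff exponent: for scalar Gaussians (illustrative means and variances assumed below), the integral $\int p_1^{1-\lambda} p_2^{\lambda}\, dx$ can be evaluated on a grid and compared with $\exp\{-g(\lambda)\}$; maximizing $g(\lambda)$ over $\lambda$ then gives the Chernoff information.

```python
import numpy as np

# Assumed example parameters for p1 = N(mu1, s1^2) and p2 = N(mu2, s2^2).
mu1, s1 = 0.0, 1.0
mu2, s2 = 1.0, 1.5

def g(lam):
    """Scalar version of the Chernoff exponent g(lambda) given above."""
    v = lam * s1**2 + (1 - lam) * s2**2
    quad = 0.5 * lam * (1 - lam) * (mu2 - mu1)**2 / v
    logdet = 0.5 * np.log(v / (s1**(2 * lam) * s2**(2 * (1 - lam))))
    return quad + logdet

def chernoff_integral(lam, num=200_001, lo=-20.0, hi=20.0):
    """Brute-force check: integral of p1(x)^(1-lambda) * p2(x)^lambda over a fine grid."""
    x = np.linspace(lo, hi, num)
    logp1 = -0.5 * ((x - mu1) / s1)**2 - np.log(s1 * np.sqrt(2 * np.pi))
    logp2 = -0.5 * ((x - mu2) / s2)**2 - np.log(s2 * np.sqrt(2 * np.pi))
    vals = np.exp((1 - lam) * logp1 + lam * logp2)
    return vals.sum() * (x[1] - x[0])

lam_grid = np.linspace(0.0, 1.0, 101)
C = max(g(l) for l in lam_grid)        # Chernoff information = max over lambda of g(lambda)
print("g(0.5) =", g(0.5), " numerical -log integral =", -np.log(chernoff_integral(0.5)))
print("Chernoff information C ~", C, " => error-probability scale ~ exp(-N*C)")
```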
[Figure: pdfs of the test statistic under the two hypotheses, plotted versus the test statistic; the tail areas beyond the threshold correspond to PFA and PD.]
For a detector that compares the test statistic $\Lambda(x)$ with a threshold $\gamma$,
$P_{\rm FA} = P[\,\text{test statistic} > \gamma \,|\, \theta = \theta_0\,]$   (the event $X \in X_1$ under $H_0$),
$P_{\rm D} = P[\,\text{test statistic} > \gamma \,|\, \theta = \theta_1\,]$   (the event $X \in X_1$ under $H_1$).
Comments:
(i) As the region $X_1$ shrinks (i.e. $\gamma \nearrow$), both of the above
probabilities shrink towards zero.
Neyman-Pearson approach: consider the simple hypotheses
$H_0 : \theta = \theta_0$ versus $H_1 : \theta = \theta_1$.
No prior pdf/pmf on $\theta$ is available.
Goal: Design a test that maximizes the probability of detection
$P_{\rm D} = P[X \in X_1\, ;\, \theta = \theta_1]$
(equivalently, minimizes the miss probability $P_{\rm M}$) under the constraint
$P_{\rm FA} = P[X \in X_1\, ;\, \theta = \theta_0] = \alpha_0.$
Here, we consider simple hypotheses; the classical version of
testing composite hypotheses is much more complicated. (The
Bayesian version of testing composite hypotheses is trivial, as
we have already seen.)
Introduce a Lagrange multiplier $\lambda$:
$L = P_{\rm D} + \lambda\, (P_{\rm FA} - \alpha_0)
 = \int_{X_1} p(x\, ;\, \theta_1)\, dx + \lambda \Bigl[ \int_{X_1} p(x\, ;\, \theta_0)\, dx - \alpha_0 \Bigr]
 = \int_{X_1} [\, p(x\, ;\, \theta_1) + \lambda\, p(x\, ;\, \theta_0)\,]\, dx - \lambda\, \alpha_0.$
To maximize $L$, set
$X_1 = \{ x : p(x\, ;\, \theta_1) + \lambda\, p(x\, ;\, \theta_0) > 0 \} = \Bigl\{ x : \dfrac{p(x\, ;\, \theta_1)}{p(x\, ;\, \theta_0)} > \gamma \Bigr\}$,   with $\gamma = -\lambda$.
Again, we find the likelihood ratio:
$\Lambda(x) = \dfrac{p(x\, ;\, \theta_1)}{p(x\, ;\, \theta_0)}.$
The threshold $\gamma$ is chosen to satisfy the false-alarm constraint
$\int_{X_1} p(x\, ;\, \theta_0)\, dx = P_{\rm FA} = \alpha_0.$
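The following sketch illustrates the Neyman-Pearson recipe numerically for a pair of assumed toy densities: the threshold $\gamma$ is set from the Monte Carlo distribution of $\Lambda(X)$ under $H_0$ so that the false-alarm constraint is met, and the resulting $P_{\rm D}$ is then estimated.

```python
import numpy as np
from scipy.stats import norm

alpha0 = 0.05                         # false-alarm constraint P_FA = alpha0
rng = np.random.default_rng(2)

def likelihood_ratio(x):
    """Lambda(x) = p(x; theta1)/p(x; theta0) for the toy densities N(1,1) vs N(0,1)."""
    return norm.pdf(x, 1.0, 1.0) / norm.pdf(x, 0.0, 1.0)

# Choose gamma as the (1 - alpha0) quantile of Lambda(X) under H0 (Monte Carlo estimate).
lam_h0 = likelihood_ratio(rng.normal(0.0, 1.0, 200_000))
gamma = np.quantile(lam_h0, 1.0 - alpha0)

# Estimate P_FA and P_D of the resulting test.
p_fa = np.mean(lam_h0 > gamma)
lam_h1 = likelihood_ratio(rng.normal(1.0, 1.0, 200_000))
p_d = np.mean(lam_h1 > gamma)
print(f"gamma = {gamma:.3f}, estimated P_FA = {p_fa:.4f}, estimated P_D = {p_d:.4f}")
```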
Equivalently, expressing the threshold in terms of the pdf/pmf of $\Lambda(x)$ under $H_0$:
$\int_{\gamma}^{\infty} p_{\Lambda ; \theta_0}(l\, ;\, \theta_0)\, dl = \alpha_0.$
Chernoff-Stein Lemma: for a fixed $P_{\rm FA}$, the miss probability of the Neyman-Pearson test decays exponentially in the number $N$ of conditionally i.i.d. observations,
$P_{\rm M} \approx \exp\bigl\{ -N\, \underbrace{D\bigl( p(X_n|\theta_0)\, \|\, p(X_n|\theta_1) \bigr)}_{\text{K-L distance for a single observation}} \bigr\}$   as $N \to \infty$,
where, for pmfs,
$D\bigl( p(X_n|\theta_0)\, \|\, p(X_n|\theta_1) \bigr) = \sum_{x_n} p(x_n|\theta_0)\, \log \dfrac{p(x_n|\theta_0)}{p(x_n|\theta_1)}$
and, in general, the Kullback-Leibler (K-L) distance between pmfs $p$ and $q$ is
$D(p\, \|\, q) = \sum_{k} p_k \log \dfrac{p_k}{q_k}.$
The complete proof of this lemma for the discrete (pmf) case can be found in the literature.
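As a small illustration with made-up pmfs, the K-L distance of a single discrete observation can be computed directly from its definition, and $\exp\{-N D\}$ then indicates the order of magnitude of the achievable miss probability.

```python
import numpy as np

# Assumed example pmfs of a single observation under H0 and H1 (3-valued alphabet).
p0 = np.array([0.5, 0.3, 0.2])
p1 = np.array([0.2, 0.3, 0.5])

def kl(p, q):
    """D(p || q) = sum_k p_k log(p_k / q_k)."""
    return float(np.sum(p * np.log(p / q)))

D = kl(p0, p1)
print(f"D(p0 || p1) = {D:.4f} nats per observation")
for N in (10, 50, 100):
    print(f"N = {N:3d}: Chernoff-Stein miss-probability scale exp(-N*D) = {np.exp(-N * D):.3e}")
```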
Example (DC level in white Gaussian noise): consider testing
$H_0 :\ x[n] = w[n]$,   $n = 1, 2, \ldots, N$
versus
$H_1 :\ x[n] = A + w[n]$,   $n = 1, 2, \ldots, N$
where
$A > 0$ is a known constant,
$w[n]$ is zero-mean white Gaussian noise with known variance $\sigma^2$, i.e. $w[n] \sim \mathcal{N}(0, \sigma^2)$.
The above hypothesis-testing formulation is EE-like: noise
versus signal plus noise. It is similar to the on-off keying
scheme in communications, which gives us an idea to
rephrase it so that it fits our formulation on p. 4 (for which
we have developed all the theory so far). Here is such an
equivalent formulation: define
$p(x\, ;\, a) = \dfrac{1}{(2\pi\sigma^2)^{N/2}} \exp\Bigl[ -\dfrac{1}{2\sigma^2} \sum_{n=1}^{N} (x[n] - a)^2 \Bigr]$   (17)
and test
$H_0 :\ a = 0$ (off) versus $H_1 :\ a = A$ (on).
The likelihood ratio is
$\Lambda(x) = \dfrac{p(x\, ;\, a = A)}{p(x\, ;\, a = 0)}
 = \dfrac{ \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\bigl[ -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x[n] - A)^2 \bigr] }{ \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\bigl( -\frac{1}{2\sigma^2} \sum_{n=1}^{N} x[n]^2 \bigr) }.$
We first apply the Bayes detector that minimizes the average error probability (i.e. the MAP rule (10), with prior probabilities $\pi_0$ and $\pi_1$ of $H_0$ and $H_1$): taking the logarithm of the likelihood ratio,
$\log \Lambda(x) = -\dfrac{1}{2\sigma^2} \sum_{n=1}^{N} (x[n] - A)^2 + \dfrac{1}{2\sigma^2} \sum_{n=1}^{N} x[n]^2 \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \log \dfrac{\pi_0}{\pi_1}$
$\dfrac{1}{2\sigma^2} \sum_{n=1}^{N} \underbrace{(x[n] - x[n] + A)}_{=\, A}\, (x[n] + x[n] - A) \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \log \dfrac{\pi_0}{\pi_1}$
$2 A \sum_{n=1}^{N} x[n] - A^2 N \ \overset{H_1}{\underset{H_0}{\gtrless}}\ 2\sigma^2 \log \dfrac{\pi_0}{\pi_1}$
$\sum_{n=1}^{N} x[n] \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \dfrac{\sigma^2}{A} \log \dfrac{\pi_0}{\pi_1} + \dfrac{A N}{2}$
and, finally, since $A > 0$ (dividing by $N A$ does not flip the inequality),
$\bar{x} = \dfrac{1}{N} \sum_{n=1}^{N} x[n] \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \underbrace{\dfrac{\sigma^2}{N A} \log \dfrac{\pi_0}{\pi_1} + \dfrac{A}{2}}_{\gamma}.$
In particular, for equal prior probabilities $\pi_0 = \pi_1 = \tfrac12$, the test reduces to
$\bar{x} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \dfrac{A}{2}.$
36
1
2
A
A
P [X > | a = 0] + 12 P [X < | a = A]
2
2
|
{z
}
|
{z
}
PFA
1
2
PM
i
A/2
p
;a=0
>p
2
2
/N
/N
| {z }
standard
normal
random var.
h X A
i
A/2
A
+ 21 P p
< p
;a=A
2
2
/N
/N
| {z }
standard
normal
random var.
1
2
r
N A2
Q
2
|
{z4 }
standard
normal complementary cdf
EE 527, Detection and Estimation Theory, # 5
+ 21
A2
N
2
|
{z 4 }
= Q 21
N A2
2
standard
normal cdf
37
since
(x) = Q(x).
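A quick Monte Carlo sanity check of this expression ($A$, $\sigma$, and $N$ below are arbitrary assumed values): simulate both hypotheses with equal priors, apply the $\bar{x} \gtrless A/2$ rule, and compare the empirical error rate with $Q(\tfrac12\sqrt{NA^2/\sigma^2})$.

```python
import math
import numpy as np

A, sigma, N = 1.0, 2.0, 25            # assumed example values
rng = np.random.default_rng(3)
n_trials = 200_000

def Q(z):
    """Standard normal complementary cdf."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Equal priors: roughly half of the trials come from H0 (a=0), half from H1 (a=A).
h1 = rng.random(n_trials) < 0.5
a = np.where(h1, A, 0.0)
# Simulate the sample mean directly: xbar | a ~ N(a, sigma^2 / N).
xbar = a + rng.normal(scale=sigma / math.sqrt(N), size=n_trials)
decide_h1 = xbar > A / 2.0
empirical_pe = np.mean(decide_h1 != h1)

theoretical_pe = Q(0.5 * math.sqrt(N * A**2 / sigma**2))
print(f"empirical P_e = {empirical_pe:.4f}, Q(0.5*sqrt(N*A^2/sigma^2)) = {theoretical_pe:.4f}")
```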
The minimum probability of error decreases monotonically with
$N A^2/\sigma^2$, which is known as the deflection coefficient. In this
case, the Chernoff information is equal to $\dfrac{A^2}{8\sigma^2}$ (see Lemma 1):
$-\min_{\lambda \in [0,1]} \log \Bigl\{ \int p_0^{\lambda}(x)\, p_1^{1-\lambda}(x)\, dx \Bigr\} = -\min_{\lambda \in [0,1]} \Bigl[ -\dfrac{\lambda(1-\lambda)\, A^2}{2\sigma^2} \Bigr] = \dfrac{A^2}{8\sigma^2}$
(where $p_0$ and $p_1$ denote the densities of a single observation under $a = 0$ and $a = A$) and, therefore, we expect the following asymptotic behavior:
min av. error probability $\approx f(N)\, \exp\Bigl( -N\, \dfrac{A^2}{8\sigma^2} \Bigr)$   as $N \to +\infty$.   (19)
Indeed, using $Q(z) \approx \dfrac{1}{z\sqrt{2\pi}}\, e^{-z^2/2}$ for large $z$,
min av. error probability $= Q\Bigl(\tfrac12 \sqrt{\dfrac{N A^2}{\sigma^2}}\Bigr) \approx \underbrace{\dfrac{1}{\sqrt{2\pi N A^2/(4\sigma^2)}}}_{f(N)}\, \exp\Bigl( -\dfrac{N A^2}{8\sigma^2} \Bigr).$   (20)
Neyman-Pearson detector: we now use the sample mean $T(x) = \bar{x} = \frac{1}{N}\sum_{n=1}^{N} x[n]$ as the test statistic and compare it
with a threshold $\gamma$.
If $T(x) > \gamma$, decide $H_1$ (i.e. reject $H_0$),
otherwise decide $H_0$ (see also the discussion on p. 59).
Performance evaluation: Assuming (17), we have
$T(x) \,|\, a \sim \mathcal{N}(a, \sigma^2/N).$
Therefore, $T(X) \,|\, a = 0 \sim \mathcal{N}(0, \sigma^2/N)$, implying
$P_{\rm FA} = P[T(X) > \gamma\, ;\, a = 0] = Q\Bigl( \dfrac{\gamma}{\sqrt{\sigma^2/N}} \Bigr)$
so that
$\gamma = \sqrt{\dfrac{\sigma^2}{N}}\, Q^{-1}(P_{\rm FA}).$
Similarly, $T(X) \,|\, a = A \sim \mathcal{N}(A, \sigma^2/N)$ and
$P_{\rm D} = P(T(X) > \gamma \,|\, a = A) = Q\Bigl( \dfrac{\gamma - A}{\sqrt{\sigma^2/N}} \Bigr)
 = Q\Bigl( Q^{-1}(P_{\rm FA}) - \sqrt{\dfrac{A^2}{\sigma^2/N}} \Bigr)
 = Q\Bigl( Q^{-1}(P_{\rm FA}) - \sqrt{\underbrace{N A^2/\sigma^2}_{=\,\text{SNR}\, =\, d^2}} \Bigr).$
Given the false-alarm probability $P_{\rm FA}$, the detection probability
$P_{\rm D}$ depends only on the deflection coefficient:
$d^2 = \dfrac{N A^2}{\sigma^2} = \dfrac{\{ E[T(X) \,|\, a = A] - E[T(X) \,|\, a = 0] \}^2}{\mathrm{var}[T(X) \,|\, a = 0]}$
and
$P_{\rm D} = Q\bigl( Q^{-1}(P_{\rm FA}) - d \bigr).$
Comments:
As we raise the threshold $\gamma$, $P_{\rm FA}$ goes down but so does $P_{\rm D}$.
The ROC should be above the 45-degree line; otherwise we can do
better by flipping a coin.
Performance improves with $d^2$.
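The closed-form relation $P_{\rm D} = Q(Q^{-1}(P_{\rm FA}) - d)$ is easy to tabulate; the short sketch below (with assumed values of $A$, $\sigma$, and $N$) uses scipy's norm.isf and norm.sf for $Q^{-1}$ and $Q$.

```python
import numpy as np
from scipy.stats import norm

A, sigma = 1.0, 2.0                           # assumed example values
for N in (10, 25, 100):
    d = np.sqrt(N * A**2 / sigma**2)          # deflection coefficient d = sqrt(N A^2 / sigma^2)
    pfa = np.array([1e-3, 1e-2, 0.05, 0.1])
    pd = norm.sf(norm.isf(pfa) - d)           # P_D = Q(Q^{-1}(P_FA) - d)
    summary = ", ".join(f"P_D({p:.0e}) = {q:.3f}" for p, q in zip(pfa, pd))
    print(f"N = {N:3d}, d = {d:.2f}: {summary}")
```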
Chernoff-Stein lemma for this example: the relevant K-L distance is
$D\bigl( p(X_n \,|\, a = 0)\, \|\, p(X_n \,|\, a = A) \bigr) = -E_{p(x_n|a=0)} \Bigl\{ \log \dfrac{p(X_n \,|\, a = A)}{p(X_n \,|\, a = 0)} \Bigr\}$
where
$p(x_n \,|\, a = 0) = \mathcal{N}(0, \sigma^2)$
and
$\log \dfrac{p(x_n \,|\, a = A)}{p(x_n \,|\, a = 0)} = -\dfrac{(x_n - A)^2}{2\sigma^2} + \dfrac{x_n^2}{2\sigma^2} = \dfrac{2 A x_n - A^2}{2\sigma^2} = \dfrac{A}{\sigma^2} \Bigl( x_n - \dfrac{A}{2} \Bigr).$
Taking the expectation under $a = 0$ (so $E[x_n] = 0$) gives $-A^2/(2\sigma^2)$ and, therefore,
$D\bigl( p(X_n \,|\, 0)\, \|\, p(X_n \,|\, 1) \bigr) = \dfrac{A^2}{2\sigma^2}.$
The miss probability indeed behaves accordingly: as $N \to +\infty$,
$P_{\rm M} = 1 - P_{\rm D}
 = 1 - Q\bigl( Q^{-1}(P_{\rm FA}) - \sqrt{N A^2/\sigma^2} \bigr)
 = Q\bigl( \sqrt{N A^2/\sigma^2} - Q^{-1}(P_{\rm FA}) \bigr)$
$\phantom{P_{\rm M}} \approx \dfrac{1}{\sqrt{2\pi}\, \bigl[ \sqrt{N A^2/\sigma^2} - Q^{-1}(P_{\rm FA}) \bigr]}\, \exp\Bigl\{ -\tfrac12 \bigl[ \sqrt{N A^2/\sigma^2} - Q^{-1}(P_{\rm FA}) \bigr]^2 \Bigr\}$
$\phantom{P_{\rm M}} = \dfrac{1}{\sqrt{2\pi}\, \bigl[ \sqrt{N A^2/\sigma^2} - Q^{-1}(P_{\rm FA}) \bigr]}\, \exp\Bigl\{ -\dfrac{N A^2}{2\sigma^2} - \tfrac12 [Q^{-1}(P_{\rm FA})]^2 + Q^{-1}(P_{\rm FA}) \sqrt{N A^2/\sigma^2} \Bigr\},$   (21)
so the exponential decay rate of $P_{\rm M}$ is $N A^2/(2\sigma^2) = N\, D\bigl( p(X_n|0)\, \|\, p(X_n|1) \bigr)$, consistent with the Chernoff-Stein lemma.
Distributed detection: suppose that $N$ sensors make local (binary) decisions $d_n \in \{0, 1\}$, $n = 1, \ldots, N$, which are conditionally independent given the hypothesis, with local detection and false-alarm probabilities $P_{{\rm D},n} = P[d_n = 1 \,|\, H_1]$ and $P_{{\rm FA},n} = P[d_n = 1 \,|\, H_0]$. The fusion center receives $d = [d_1, \ldots, d_N]^T$ and applies a likelihood-ratio test, e.g. to maximize the global detection probability for a given global false-alarm probability. Now,
$\log \Lambda(d) = \sum_{n=1}^{N} \log \Bigl[ \dfrac{p(d_n \,|\, \theta_1)}{p(d_n \,|\, \theta_0)} \Bigr]
 = \sum_{n=1}^{N} \log \Bigl[ \dfrac{P_{{\rm D},n}^{\,d_n}\, (1 - P_{{\rm D},n})^{1 - d_n}}{P_{{\rm FA},n}^{\,d_n}\, (1 - P_{{\rm FA},n})^{1 - d_n}} \Bigr] \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \log \gamma.$
To further simplify the exposition, we assume that all sensors
have identical performance:
$P_{{\rm D},n} = P_{\rm D}$,   $P_{{\rm FA},n} = P_{\rm FA}$.
Then, with $u_1 = \sum_{n=1}^{N} d_n$ denoting the number of sensors that declare $H_1$,
$u_1 \log \dfrac{P_{\rm D}}{P_{\rm FA}} + (N - u_1) \log \dfrac{1 - P_{\rm D}}{1 - P_{\rm FA}} \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \log \gamma$
or
$u_1 \log \Bigl[ \dfrac{P_{\rm D}\, (1 - P_{\rm FA})}{P_{\rm FA}\, (1 - P_{\rm D})} \Bigr] \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \log \gamma + N \log \dfrac{1 - P_{\rm FA}}{1 - P_{\rm D}}.$   (22)
This is a counting rule: it is equivalent to comparing $u_1$ with a suitably transformed threshold $\gamma'$ obtained from (22),
$u_1 \ \overset{H_1}{\underset{H_0}{\gtrless}}\ \gamma'.$
The Neyman-Pearson performance analysis of this detector
is easy: the random variable $U_1$ is binomial given $\theta$ (i.e.
conditional on the hypothesis) and, therefore,
$P[U_1 = u_1] = \dbinom{N}{u_1}\, p^{u_1} (1 - p)^{N - u_1}$
where $p = P_{\rm FA}$ under $H_0$ and $p = P_{\rm D}$ under $H_1$. Hence, the
global false-alarm probability is
$P_{\rm FA,global} = P[U_1 > \gamma' \,|\, \theta_0] = \sum_{u_1 = \lceil \gamma' \rceil}^{N} \dbinom{N}{u_1}\, P_{\rm FA}^{\,u_1} (1 - P_{\rm FA})^{N - u_1}.$
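The global performance of the counting rule follows directly from binomial tail probabilities; here is a short sketch using scipy.stats.binom, with assumed values for the number of sensors and the local sensor performance.

```python
import math
from scipy.stats import binom

N = 20                 # number of sensors (assumed example value)
p_fa_local = 0.10      # each sensor's local false-alarm probability (assumed)
p_d_local = 0.60       # each sensor's local detection probability (assumed)

for gamma in (4, 6, 8):
    # Decide H1 globally when U1 > gamma; U1 is binomial(N, p) with p = P_FA or P_D.
    k = math.floor(gamma)
    p_fa_global = binom.sf(k, N, p_fa_local)   # P[U1 > gamma | H0]
    p_d_global = binom.sf(k, N, p_d_local)     # P[U1 > gamma | H1]
    print(f"gamma = {gamma}: global P_FA = {p_fa_global:.4f}, global P_D = {p_d_global:.4f}")
```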
Testing multiple (M-ary) hypotheses: suppose the parameter space is partitioned into $M$ disjoint regions,
$\Theta_i \cap \Theta_j = \emptyset$,   $i \ne j$,
and we test
$H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_1$ versus $\cdots$ versus $H_{M-1} : \theta \in \Theta_{M-1}$
and, consequently, our action space consists of $M$ choices. We
design a decision rule $\phi : X \to \{0, 1, \ldots, M-1\}$:
$\phi(x) = 0$, decide $H_0$,
$\phi(x) = 1$, decide $H_1$,
$\quad\vdots$
$\phi(x) = M-1$, decide $H_{M-1}$,
where $\phi$ partitions the data space $X$ [i.e. the support of $p(x \,|\, \theta)$]
into $M$ regions:
Rule $\phi$: $X_0 = \{x : \phi(x) = 0\}, \ldots, X_{M-1} = \{x : \phi(x) = M-1\}$.
Assume that losses $L(m|i)$ are assigned to deciding $H_m$ when the true state is $i$, for $m, i = 0, 1, \ldots, M-1$. The posterior expected loss of deciding $H_m$ is
$\rho_m(x) = \sum_{i=0}^{M-1} \int_{\Theta_i} L(m|i)\, p(\theta|x)\, d\theta = \sum_{i=0}^{M-1} L(m|i) \int_{\Theta_i} p(\theta|x)\, d\theta$,   $m = 0, 1, \ldots, M-1$,
and the Bayes rule assigns $x$ to the decision with the smallest posterior expected loss:
$X_m^{\star} = \bigl\{ x : m = \arg\min_{0 \le l \le M-1} \rho_l(x) \bigr\}$,   $m = 0, 1, \ldots, M-1$
$\phantom{X_m^{\star}} = \Bigl\{ x : m = \arg\min_{0 \le l \le M-1} \sum_{i=0}^{M-1} \int_{\Theta_i} L(l|i)\, p(x|\theta)\, \pi(\theta)\, d\theta \Bigr\}$
$\phantom{X_m^{\star}} = \Bigl\{ x : m = \arg\min_{0 \le l \le M-1} \sum_{i=0}^{M-1} L(l|i) \int_{\Theta_i} p(x|\theta)\, \pi(\theta)\, d\theta \Bigr\}.$
Alternatively, in terms of the preposterior (Bayes) risk,
$E_{x,\theta}[\text{loss}] = \sum_{m=0}^{M-1} \sum_{i=0}^{M-1} \int_{X_m} \int_{\Theta_i} L(m|i)\, p(x|\theta)\, \pi(\theta)\, d\theta\, dx
 = \sum_{m=0}^{M-1} \int_{X_m} \underbrace{\sum_{i=0}^{M-1} L(m|i) \int_{\Theta_i} p(x|\theta)\, \pi(\theta)\, d\theta}_{h_m(x)}\, dx
 = \sum_{m=0}^{M-1} \int_{X_m} h_m(x)\, dx
 \ \ge\ \sum_{m=0}^{M-1} \int_{X_m^{\star}} h_m(x)\, dx,$
i.e. the risk is minimized by collecting in $X_m^{\star}$ those $x$ for which $h_m(x)$ is the smallest among $h_0(x), \ldots, h_{M-1}(x)$ (the same rule as above, since $h_m(x) = p(x)\, \rho_m(x)$).
0-1 loss: for the 0-1 loss [$L(m|i) = 0$ if $m = i$ and $1$ if $m \ne i$], the posterior expected loss becomes
$\rho_m(x) = \sum_{i=0,\, i \ne m}^{M-1} \int_{\Theta_i} p(\theta|x)\, d\theta
 = \underbrace{\int_{\Theta} p(\theta|x)\, d\theta}_{\text{const, not a function of } m} - \int_{\Theta_m} p(\theta|x)\, d\theta$
and, therefore,
$X_m^{\star} = \bigl\{ x : m = \arg\max_{0 \le l \le M-1} P[\Theta_l \,|\, x] \bigr\}
 = \Bigl\{ x : m = \arg\max_{0 \le l \le M-1} \int_{\Theta_l} p(\theta|x)\, d\theta \Bigr\}$   (23)
$\phantom{X_m^{\star}} = \bigl\{ x : m = \arg\max_{0 \le l \le M-1} p(\theta_l \,|\, x) \bigr\}$   (simple hypotheses, $\Theta_l = \{\theta_l\}$)
or, equivalently,
$X_m^{\star} = \bigl\{ x : m = \arg\max_{0 \le l \le M-1} [\pi_l\, p(x|\theta_l)] \bigr\}$,   $m = 0, 1, \ldots, M-1$,
where
$\pi_0 = \pi(\theta_0)$, ..., $\pi_{M-1} = \pi(\theta_{M-1})$;
this is the MAP rule. For a uniform prior, $\pi_l = \frac{1}{M}$, it reduces to the maximum-likelihood (ML) rule:
$X_m^{\star} = \Bigl\{ x : m = \arg\max_{0 \le l \le M-1} \Bigl[ \dfrac{1}{M}\, p(x|\theta_l) \Bigr] \Bigr\}
 = \bigl\{ x : m = \arg\max_{0 \le l \le M-1} \underbrace{p(x|\theta_l)}_{\text{likelihood}} \bigr\}.$   (24)
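For simple M-ary hypotheses with Gaussian observations, the MAP rule (and, with a uniform prior, the ML rule (24)) is just an argmax over per-hypothesis log-posteriors. A minimal sketch with made-up means and priors:

```python
import numpy as np
from scipy.stats import norm

means = np.array([-2.0, 0.0, 3.0])          # assumed means under H0, H1, H2 (sigma = 1)
priors = np.array([0.2, 0.5, 0.3])          # assumed prior probabilities pi_l

def map_decide(x):
    """MAP rule: argmax_l [pi_l * p(x | theta_l)], computed via log-likelihoods."""
    logpost = norm.logpdf(x[:, None], loc=means, scale=1.0).sum(axis=0) + np.log(priors)
    return int(np.argmax(logpost))

def ml_decide(x):
    """ML rule (uniform prior): argmax_l p(x | theta_l), cf. (24)."""
    loglik = norm.logpdf(x[:, None], loc=means, scale=1.0).sum(axis=0)
    return int(np.argmax(loglik))

rng = np.random.default_rng(4)
x = rng.normal(loc=means[2], scale=1.0, size=10)   # a record generated under H2
print("MAP decision:", map_decide(x), " ML decision:", ml_decide(x))
```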
P Values
Reporting "accept H0" or "accept H1" is not very informative.
Instead, we could vary PFA and examine how our report would
change.
Generally, if H1 is accepted for a certain specified PFA, it will
also be accepted for any P'FA > PFA. Therefore, there exists a smallest
PFA at which H1 is accepted. This motivates the introduction
of the p value.
To be more precise (and to be able to handle composite
hypotheses), here is a definition of the size of a hypothesis
test.
Definition 1. The size of a hypothesis test described by
Rule $\phi$: $X_0 = \{x : \phi(x) = 0\}$,   $X_1 = \{x : \phi(x) = 1\}$
is defined as follows:
$\alpha = \max_{\theta \in \Theta_0} P[x \in X_1 \,|\, \theta]$ = max possible PFA.
For simple hypotheses, the size is just the false-alarm probability:
$\alpha = P[x \in X_1 \,|\, \theta = \theta_0] = P_{\rm FA}$   (simple hypotheses).
Multiple Testing
We may conduct many hypothesis tests in some applications,
e.g.
bioinformatics and
sensor networks.
Here, we perform many (typically binary) tests (one test per
node in a sensor network, say). This is different from testing
multiple hypotheses that we considered on pp. 54-58, where
we performed a single test of multiple hypotheses. For a
sensor-network related discussion on multiple testing, see
E. B. Ermis, M. Alanyali, and V. Saligrama, "Search and discovery in an uncertain networked world," IEEE Signal Processing Magazine, vol. 23, pp. 107-118, Jul. 2006.
Suppose that each test is conducted with false-alarm probability
PFA = α. For example, in a sensor-network setup, each node
conducts a test based on its local data.
Although the chance of false alarm at each node is only α, the
chance of at least one falsely alarmed node is much higher,
since there are many nodes. Here, we discuss two ways to deal
with this problem.
Bonferroni method: perform each of the M individual tests at level α/M, i.e. reject H0i if
$p_i < \alpha/M.$
The analysis relies on the union bound: for any events $A_1, \ldots, A_n$,
$P\Bigl[ \bigcup_{i=1}^{n} A_i \Bigr] \le \sum_{i=1}^{n} P[A_i].$
Let $A_i$ denote the event that H0i is falsely rejected (so that, under H0i, $P\{A_i\} = \alpha/M$, since the p values are uniform(0, 1) under the null), and let $A = \bigcup_{i=1}^{M} A_i$ be the event that at least one null hypothesis is falsely rejected. Then
$P\{A\} = P\Bigl\{ \bigcup_{i=1}^{M} A_i \Bigr\} \le \sum_{i=1}^{M} P\{A_i\} = \sum_{i=1}^{M} \dfrac{\alpha}{M} = \alpha.$   $\Box$
Comments: Suppose now that the tests are statistically
independent and consider the smallest of the M p values.
Under H0i, the pi are uniform(0, 1). Then
$P\{\min\{P_1, P_2, \ldots, P_M\} > x\} = (1 - x)^M$   (assuming H0i, i = 1, 2, ..., M),
yielding the proper p value to be attached to $\min\{p_1, p_2, \ldots, p_M\}$ as
$1 - \bigl(1 - \min\{p_1, p_2, \ldots, p_M\}\bigr)^M$.   (26)
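A tiny sketch of the two corrections discussed above, using simulated p values: Bonferroni rejects H0i when pi < α/M, and the smallest p value is converted to a proper p value via 1 - (1 - p_min)^M as in (26), assuming independent tests.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, M = 0.05, 50
# Simulated p values: all nulls are true here, so p_i ~ uniform(0, 1).
p = rng.random(M)

bonferroni_rejections = np.sum(p < alpha / M)
p_min_adjusted = 1.0 - (1.0 - p.min())**M      # proper p value attached to min p_i, cf. (26)
print(f"Bonferroni rejections at level {alpha}: {bonferroni_rejections}")
print(f"smallest p value = {p.min():.4f}, adjusted (1-(1-p_min)^M) = {p_min_adjusted:.4f}")
```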
False-discovery rate (FDR): suppose that we perform m tests and let m0 + m1 = m be
number of true H0i hypotheses + number of true H1i
hypotheses = total number of hypotheses (nodes, say).
Summarize the outcomes of the m tests as follows:

                 declared H1 (H0i rejected)    total
  H0i true       V                             m0
  H1i true       S                             m1
  total          R                             m

and define the false-discovery proportion
FDP = V/R if R > 0, and FDP = 0 if R = 0,
together with the false-discovery rate FDR = E[FDP].
Benjamini-Hochberg (BH) method:
(i) Order the p values: $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$.
(ii) Let $R$ be the largest $i$ for which
$p_{(i)} < \dfrac{i\, \alpha}{C_m\, m},$
where $C_m$ is defined to be 1 if the p values are
independent and $C_m = \sum_{i=1}^{m} (1/i)$ otherwise.
(iii) Define the BH rejection threshold $T = p_{(R)}$.
(iv) Accept all H1i for which $p_{(i)} \le T$.
Theorem 4. (formulated and proved by Benjamini and
Hochberg) If the above BH method is applied, then, regardless
of how many null hypotheses are true and regardless of the
distribution of the p values when the null hypothesis is false,
FDR $= E[\text{FDP}] \le \dfrac{m_0}{m}\, \alpha \le \alpha.$
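Finally, a compact sketch of the BH procedure as described in steps (i)-(iv), applied to simulated p values; C_m is set to 1 under the independence assumption.

```python
import numpy as np

def benjamini_hochberg(p, alpha, independent=True):
    """Return a boolean mask of rejected nulls using the BH threshold T = p_(R)."""
    m = len(p)
    c_m = 1.0 if independent else np.sum(1.0 / np.arange(1, m + 1))
    order = np.argsort(p)
    p_sorted = p[order]
    below = p_sorted < (np.arange(1, m + 1) * alpha) / (c_m * m)   # p_(i) < i*alpha/(C_m*m)
    if not below.any():
        return np.zeros(m, dtype=bool)
    R = np.max(np.nonzero(below)[0]) + 1       # largest i (1-based) with p_(i) below the line
    T = p_sorted[R - 1]                        # BH rejection threshold T = p_(R)
    return p <= T

rng = np.random.default_rng(6)
# 90 true nulls (uniform p values) and 10 false nulls (small p values), as an assumed example.
p = np.concatenate([rng.random(90), rng.beta(1, 30, 10)])
rejected = benjamini_hochberg(p, alpha=0.1)
print(f"rejected {rejected.sum()} of {len(p)} hypotheses; "
      f"{rejected[:90].sum()} of the rejections are true nulls (false discoveries)")
```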