
5.2. MEASUREMENTS AND EVENTS

where the F(i) are disjoint, and hence since g_n ∈ M(σ(f)) that g_n^{-1}(r_i) = F(i) ∈ σ(f). Since σ(f) = f^{-1}(B(A)), there are disjoint sets Q(i) ∈ B(A) such that F(i) = f^{-1}(Q(i)). Define the function h_n : A → ℜ by

h_n(a) = \sum_{i=1}^{M} r_i 1_{Q(i)}(a)

and

h_n(f(\omega)) = \sum_{i=1}^{M} r_i 1_{Q(i)}(f(\omega)) = \sum_{i=1}^{M} r_i 1_{f^{-1}(Q(i))}(\omega) = \sum_{i=1}^{M} r_i 1_{F(i)}(\omega) = g_n(\omega).

This proves the result for simple functions. By construction we have that g(ω) = lim_n g_n(ω) = lim_n h_n(f(ω)) where, in particular, the right-most limit exists for all ω. Define the function h(a) = lim_n h_n(a) where the limit exists and 0 otherwise. Then g(ω) = lim_n h_n(f(ω)) = h(f(ω)), completing the proof. □
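The factorization in Lemma 5.2.1 is easy to visualize on a finite sample space. The following Python sketch (illustrative only; all names are ad hoc, not from the text) recovers h from g exactly as the simple-function step does: g is σ(f)-measurable precisely when it is constant on the level sets of f.

```python
def factor_through(f, g, omega):
    """Return a dict h with g(w) == h[f(w)] for every w in omega, or None
    if g is not constant on the level sets of f (not sigma(f)-measurable)."""
    h = {}
    for w in omega:
        a = f(w)
        if a in h and h[a] != g(w):
            return None          # g separates two points that f identifies
        h[a] = g(w)
    return h

omega = [0, 1, 2, 3]
f = lambda w: w % 2              # f observes only the parity of w
g_ok = lambda w: 10 * (w % 2)    # constant on {0, 2} and {1, 3}
g_bad = lambda w: w              # separates 0 from 2, which f cannot

h = factor_through(f, g_ok, omega)
print(h)                               # {0: 0, 1: 10}
print(factor_through(f, g_bad, omega)) # None
```

Here the dict h plays the role of the function h in the lemma: g_ok(ω) = h(f(ω)) for every sample point, while g_bad admits no such factorization.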
Thus far we have developed the properties of σ-fields induced by random variables and of classes of functions measurable with respect to σ-fields. The idea of a σ-field induced by a single random variable is easily generalized to random vectors and sequences. We wish, however, to consider the more general case of a σ-field induced by a possibly uncountable class of measurements. Then we will have associated with each class of measurements a natural σ-field and with each σ-field a natural class of measurements. Toward this end, given a class of measurements M, define σ(M) as the smallest σ-field with respect to which all of the measurements in M are measurable. Since any σ-field satisfying this condition must contain all of the σ(f) for f ∈ M and hence must contain the σ-field induced by all of these sets, and since this latter collection is a σ-field,

\sigma(M) = \sigma\left( \bigcup_{f \in M} \sigma(f) \right).
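On a finite sample space this relation can be realized by brute force: the generator sets partition the space into atoms, and the generated σ-field is the collection of all unions of atoms. A small Python sketch (names and data are invented for illustration):

```python
from itertools import combinations

def generate_sigma_field(omega, generators):
    """Smallest sigma-field on a finite omega containing every generator set:
    group points by which generators contain them (the atoms), then take all
    possible unions of atoms."""
    signature = lambda w: tuple(w in g for g in generators)
    atoms = {}
    for w in omega:
        atoms.setdefault(signature(w), set()).add(w)
    atoms = list(atoms.values())
    field = set()
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            field.add(frozenset(set().union(*combo)))
    return field

omega = {0, 1, 2, 3}
f = lambda w: w % 2                      # two measurements on omega
g = lambda w: w // 2
sigma_f = [{w for w in omega if f(w) == v} for v in (0, 1)]  # level sets of f
sigma_g = [{w for w in omega if g(w) == v} for v in (0, 1)]  # level sets of g
sigma_M = generate_sigma_field(omega, sigma_f + sigma_g)
print(len(sigma_M))   # 16: together f and g separate all four points
```

σ(f) alone contains only 4 sets here; adjoining the σ(g) sets refines the atoms to singletons, so σ(M) for M = {f, g} is the full power set.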

The following lemma collects some simple relations among σ-fields induced by measurements and classes of measurements induced by σ-fields.

Lemma 5.2.2 Given a class of measurements M, then

M ⊂ M(σ(M)).

Given a collection of events G, then

G ⊂ σ(M(σ(G))).

If G is also a σ-field, then

G = σ(M(G)),

that is, G is the smallest σ-field with respect to which all G-measurable functions are measurable. If G is a σ-field and I(G) = {all 1_G, G ∈ G} is the collection of all indicator functions of events in G, then

G = σ(I(G)),

that is, the smallest σ-field induced by indicator functions of sets in G is the same as that induced by all functions measurable with respect to G.

CHAPTER 5. CONDITIONAL PROBABILITY AND EXPECTATION

Proof: If f ∈ M, then it is σ(f)-measurable and hence in M(σ(M)). If G ∈ G, then its indicator function 1_G is σ(G)-measurable and hence 1_G ∈ M(σ(G)). Since 1_G ∈ M(σ(G)), it must be measurable with respect to σ(M(σ(G))) and hence 1_G^{-1}(1) = G ∈ σ(M(σ(G))), proving the second statement. If G is a σ-field, then since all functions in M(G) are G-measurable and since σ(M(G)) is the smallest σ-field with respect to which all functions in M(G) are measurable, G must contain σ(M(G)). Since I(G) ⊂ M(G),

σ(I(G)) ⊂ σ(M(G)) = G.

If G ∈ G, then 1_G is in I(G) and hence must be measurable with respect to σ(I(G)); hence 1_G^{-1}(1) = G ∈ σ(I(G)) and hence G ⊂ σ(I(G)), completing the proof. □
We conclude this section with a reminder of the motivation for considering classes of events and measurements. We shall often consider such classes either because a particular class is important for a particular application or because we are simply given a particular class. In both cases we may wish to study both the given events (measurements) and the related class of measurements (events). For example, given a class of measurements M, σ(M) provides a σ-field of events whose occurrence or nonoccurrence is determinable by the output events of the functions in the class. In turn, M(σ(M)) is the possibly larger class of functions whose output events are determinable from the occurrence or nonoccurrence of events in σ(M) and hence by the output events of M. Thus knowing all output events of measurements in M is effectively equivalent to knowing all output events of measurements in the more structured and possibly larger class M(σ(M)), which is in turn equivalent to knowing the occurrence or nonoccurrence of events in the σ-field σ(M). Hence when a class M is specified, we can instead consider the more structured classes σ(M) or M(σ(M)). From the previous lemma, this is as far as we can go; that is,

\sigma(M(\sigma(M))) = \sigma(M).    (5.1)

We have seen that a function g will be in M(σ(f)) if and only if it depends on the underlying sample points through the value of f. Since we did not restrict the structure of f, it could, for example, be a random variable, vector, or sequence; that is, the conclusion is true for countable collections of measurements as well as individual measurements. If instead we have a general class M of measurements, then it is still a useful intuition to think of M(σ(M)) as being the class of all functions that depend on the underlying points only through the values of the functions in M.

Exercises

1. Which of the following relations are true and which are false?

f² ∈ σ(f),  f ∈ σ(f²)

f + g ∈ σ(f, g),  f ∈ σ(f + g)

2. If f : A → ℜ, g : ℜ → ℜ, and g(f) : A → ℜ is defined by g(f)(x) = g(f(x)), then g(f) ∈ σ(f).

3. Given a class M of measurements, is ⋃_{f ∈ M} σ(f) a σ-field?

4. Suppose that (A, B) is a measurable space and B is separable with a countable generating class {V_n; n = 1, 2, . . .}. Describe σ(1_{V_n}; n = 1, 2, . . .).

5.3 Restrictions of Measures

The first application of the classes of measurements or events considered in the previous section is the notion of the restriction of a probability measure to a sub-σ-field. This occasionally provides a shortcut to evaluating expectations of functions that are measurable with respect to sub-σ-fields and in comparing such functions. Given a probability space (Ω, B, m) and a sub-σ-field G of B, define the restriction of m to G, m_G, by

m_G(F) = m(F),  F ∈ G.

Thus (Ω, G, m_G) is a new probability space with a smaller event space. The following lemma shows that if f is a G-measurable real-valued random variable, then its expectation can be computed with respect to either m or m_G.
Lemma 5.3.1 Given a G-measurable real-valued measurement f ∈ L¹(m), then also f ∈ L¹(m_G) and

\int f \, dm = \int f \, dm_G,

where m_G is the restriction of m to G.

Proof: If f is a simple function, the result is immediate from the definition of restriction. More generally, use Lemma 4.3.1(e) to infer that q_n(f) is a sequence of simple G-measurable functions converging to f, and combine the simple function result with Corollary 4.4.1 applied to both measures. □
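On a finite space the lemma amounts to regrouping a finite sum atom by atom. A small hedged sketch (the space, the measure, the sub-σ-field, and f are all invented for illustration):

```python
# Omega = {0,1,2,3}; m is a point-mass probability; the sub-sigma-field G has
# atoms {0,1} and {2,3}; f is G-measurable, i.e. constant on those atoms.
m = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
atoms_G = [{0, 1}, {2, 3}]
f = {0: 5.0, 1: 5.0, 2: -1.0, 3: -1.0}

int_m = sum(f[w] * m[w] for w in m)                 # integral of f against m
# m_G agrees with m on G-sets, and a G-measurable f integrates atom by atom
int_mG = sum(f[min(A)] * sum(m[w] for w in A) for A in atoms_G)

print(int_m, int_mG)   # the two integrals agree, as Lemma 5.3.1 asserts
```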
Corollary 5.3.1 Given G-measurable functions f, g ∈ L¹(m), if

\int_F f \, dm \geq \int_F g \, dm, \quad \text{all } F \in G,

then f ≥ g m-a.e. If the preceding holds with equality, then f = g m-a.e.


Proof: From the previous lemma, the integral inequality holds with m replaced by m_G, and hence from Lemma 4.4.7 the conclusion holds m_G-a.e. Thus there is a set, say G, in G with m_G probability one, and hence also m probability one, for which the conclusion is true. □
The usefulness of the preceding corollary is that it allows us to compare G-measurable functions
by considering only the restricted measures and the corresponding expectations.

5.4 Elementary Conditional Probability

Say we have a probability space (Ω, B, m); how is the probability measure m altered if we are told that some event or collection of events occurred? For example, how is it influenced if we are given the outputs of a measurement or collection of measurements? The notion of conditional probability provides a response to this question. In fact there are two notions of conditional probability: elementary and nonelementary.

Elementary conditional probabilities cover the case where we are given an event, say F, having nonzero probability: m(F) > 0. We would like to define a conditional probability measure m(G|F) for all events G ∈ B. Intuitively, being told an event F occurred will put zero probability on the collection of all points outside F, but it should not affect the relative probabilities of the various events inside F. In addition, the new probability measure must be renormalized so as to assign probability one to the new certain event F. This suggests the definition m(G|F) = km(G ∩ F),

where k is a normalization constant chosen to ensure that m(F|F) = km(F ∩ F) = km(F) = 1.


Thus we define for any F such that m(F) > 0 the conditional probability measure

m(G|F) = \frac{m(G \cap F)}{m(F)}, \quad \text{all } G \in B.

We shall often abbreviate the elementary conditional probability measure m(·|F) by m_F. Given a probability space (Ω, B, m) and an event F ∈ B with m(F) > 0, we have a new probability space (F, B ∩ F, m_F), where B ∩ F = {all sets of the form G ∩ F, G ∈ B}. It is easy to see that we can relate expectations with respect to the conditional and unconditional measures by

E_{m_F}(f) = \int f \, dm_F = \frac{\int_F f \, dm}{m(F)} = \frac{E_m(1_F f)}{m(F)},    (5.2)

where the existence of either side ensures that of the other. In particular, f ∈ L¹(m_F) if and only if f 1_F ∈ L¹(m). Note further that if G = F^c, then

E_m f = m(F) E_{m_F}(f) + m(G) E_{m_G}(f).    (5.3)
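Both (5.2) and (5.3) can be verified mechanically on a finite probability space. In this hedged sketch the space, the event F, and the measurement f are all invented for illustration:

```python
m = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}       # point masses on Omega
F, Fc = {0, 1}, {2, 3}                      # an event and its complement
f = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}       # a measurement

def prob(A):
    return sum(m[w] for w in A)

def E(f):                                   # unconditional expectation E_m f
    return sum(f[w] * m[w] for w in m)

def E_cond(f, A):                           # E_{m_A}(f) computed via (5.2)
    return sum(f[w] * m[w] for w in A) / prob(A)

total = prob(F) * E_cond(f, F) + prob(Fc) * E_cond(f, Fc)
print(E(f), total)   # (5.3): the decomposition recovers E_m f
```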

Instead of being told that a particular event occurred, we might be told that a random variable or measurement f is discrete and takes on a specific value, say a, with nonzero probability. Then the elementary definition immediately yields

m(G|f = a) = \frac{m(G \cap \{f = a\})}{m(f = a)}.

If, however, the measurement is not discrete and takes on a particular value with probability zero, then the preceding elementary definition does not work. One might attempt to replace the previous definition by some limiting form, but this does not yield a useful theory in general and it is clumsy. The standard alternative approach is to replace the preceding constructive definition by a descriptive definition, that is, to define conditional probabilities by the properties that they should possess. The mathematical problem is then to prove that a function possessing the desired properties exists.

In order to motivate the descriptive definition, we make several observations on the elementary case. First note that the previous conditional probability depends on the value a assumed by f, that is, it is a function of a. It will prove more useful and more general instead to consider it a function of ω that depends on ω only through f(ω), that is, to consider conditional probability as a function m(G|f)(ω) = m(G|{ω′ : f(ω′) = f(ω)}), or, simply, m(G|f = f(ω)), the probability of an event G given that f assumes a value f(ω). Thus a conditional probability is a function of the points in the underlying sample space and hence is itself a random variable or measurement. Since it depends on ω only through f, from Lemma 5.2.1 the function m(G|f) is measurable with respect to σ(f), the σ-field induced by the given measurement. This leads to the first property:

For any fixed G, m(G|f) is σ(f)-measurable.    (5.4)

Next observe that in the elementary case we can compute the probability m(G) by averaging or integrating the conditional probability m(G|f) over all possible values of f; that is,

\int m(G|f) \, dm = \sum_a m(G|f = a) m(f = a) = \sum_a m(G \cap \{f = a\}) = m(G).    (5.5)
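Relation (5.5) is a total-probability identity and can be checked directly for a finite discrete f (the data below are purely illustrative):

```python
m = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
f = {0: 'x', 1: 'x', 2: 'y', 3: 'y'}        # a discrete measurement
G = {1, 2}                                  # an arbitrary event

def prob(A):
    return sum(m[w] for w in A)

level = {a: {w for w in m if f[w] == a} for a in set(f.values())}  # {f = a}
avg = sum((prob(G & level[a]) / prob(level[a])) * prob(level[a])
          for a in level)                    # sum_a m(G|f=a) m(f=a)
print(avg, prob(G))   # agree up to rounding with m(G) = 0.5
```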


In fact we can and must say more about such averaging of conditional probabilities. Suppose that F is some event in σ(f) and hence its occurrence or nonoccurrence is determinable by observation of the value of f, that is, from Lemma 5.2.1, 1_F(ω) = h(f(ω)) for some function h. Thus if we are given the value of f(ω), being told that ω ∈ F should not add any knowledge and hence should not alter the conditional probability of an event G given f. To try to pin down this idea, assume that m(F) > 0 and let m_F denote the elementary conditional probability measure given F, that is, m_F(G) = m(G ∩ F)/m(F). Applying (5.5) to the conditional measure m_F yields the formula

\int m_F(G|f) \, dm_F = m_F(G),

where m_F(G|f) is the conditional probability of G given the outcome of the random variable f and given that the event F occurred. But we have argued that this should be the same as m(G|f). Making this substitution, multiplying both sides of the equation by m(F), and using (5.2) we derive

\int_F m(G|f) \, dm = m(G \cap F), \quad \text{all } F \in \sigma(f).    (5.6)

To make this plausibility argument rigorous, observe that since f is assumed discrete we can write

f(\omega) = \sum_{a \in A} a 1_{f^{-1}(a)}(\omega)

and hence

1_F(\omega) = h(f(\omega)) = h\left( \sum_{a \in A} a 1_{f^{-1}(a)}(\omega) \right)

and therefore

F = \bigcup_{a : h(a) = 1} f^{-1}(a).

We can then write

\int_F m(G|f) \, dm = \int 1_F \, m(G|f) \, dm = \int h(f) \, m(G|f) \, dm
= \sum_a h(a) \frac{m(G \cap \{f = a\})}{m(\{f = a\})} m(\{f = a\}) = \sum_a h(a) m(G \cap f^{-1}(a))
= m\left( G \cap \bigcup_{a : h(a) = 1} f^{-1}(a) \right) = m(G \cap F).
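For a discrete f the chain of equalities above can also be verified exhaustively: the sets in σ(f) are exactly the unions of level sets of f, and (5.6) can be checked for every one of them. A hedged finite sketch (all data invented for illustration):

```python
from itertools import combinations

m = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
f = {0: 'x', 1: 'x', 2: 'y', 3: 'y'}
G = {0, 2}

def prob(A):
    return sum(m[w] for w in A)

levels = [{w for w in m if f[w] == a} for a in set(f.values())]

def cond_prob_given_f(w):                   # omega -> m(G | f = f(omega))
    A = next(L for L in levels if w in L)
    return prob(G & A) / prob(A)

ok = True
for r in range(len(levels) + 1):            # enumerate every F in sigma(f)
    for combo in combinations(levels, r):
        F = set().union(*combo)
        lhs = sum(cond_prob_given_f(w) * m[w] for w in F)  # int_F m(G|f) dm
        ok = ok and abs(lhs - prob(G & F)) < 1e-12
print(ok)   # True: (5.6) holds for all F in sigma(f)
```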

Although (5.6) was derived for F with m(F) > 0, the equation is trivially satisfied if m(F) = 0 since both sides are then 0. Hence the equation is valid for all F in σ(f) as stated. Eq. (5.6) states that not only must we be able to average m(G|f) to get m(G), a similar relation must hold when we cut down the average to any σ(f)-measurable event.

Equations (5.4) and (5.6) provide the properties needed for a rigorous descriptive definition of the conditional probability of an event G given a random variable or measurement f. In fact, this conditional probability is defined as any function m(G|f)(ω) satisfying these two equations. At this point, however, little effort is saved by confining interest to a single conditioning measurement, and hence we will develop the theory for more general classes of measurements. In order to do this, however, we require a basic result from measure theory, the Radon-Nikodym theorem. The next two sections develop this result.
