
Springer Texts in Statistics

Advisors:
George Casella Stephen Fienberg Ingram Olkin

Springer-Science+Business Media, LLC


Springer Texts in Statistics

Alfred: Elements of Statistics for the Life and Social Sciences


Berger: An Introduction to Probability and Stochastic Processes
Blom: Probability and Statistics: Theory and Applications
Brockwell and Davis: An Introduction to Time Series and Forecasting
Chow and Teicher: Probability Theory: Independence, Interchangeability,
Martingales, Third Edition
Christensen: Plane Answers to Complex Questions: The Theory of Linear
Models, Second Edition
Christensen: Linear Models for Multivariate, Time Series, and Spatial Data
Christensen: Log-Linear Models and Logistic Regression, Second Edition
Creighton: A First Course in Probability Models and Statistical Inference
du Toit, Steyn and Stumpf: Graphical Exploratory Data Analysis
Edwards: Introduction to Graphical Modelling
Finkelstein and Levin: Statistics for Lawyers
Flury: A First Course in Multivariate Statistics
Jobson: Applied Multivariate Data Analysis, Volume I: Regression and
Experimental Design
Jobson: Applied Multivariate Data Analysis, Volume II: Categorical and
Multivariate Methods
Kalbfleisch: Probability and Statistical Inference, Volume I: Probability,
Second Edition
Kalbfleisch: Probability and Statistical Inference, Volume II: Statistical
Inference, Second Edition
Karr: Probability
Keyfitz: Applied Mathematical Demography, Second Edition
Kiefer: Introduction to Statistical Inference
Kokoska and Nevison: Statistical Tables and Formulae
Lehmann: Testing Statistical Hypotheses, Second Edition
Lindman: Analysis of Variance in Experimental Design
Lindsey: Applying Generalized Linear Models
Madansky: Prescriptions for Working Statisticians
McPherson: Statistics in Scientific Investigation: Its Basis, Application, and
Interpretation
Mueller: Basic Principles of Structural Equation Modeling
Nguyen and Rogers: Fundamentals of Mathematical Statistics: Volume I:
Probability for Statistics
Nguyen and Rogers: Fundamentals of Mathematical Statistics: Volume II:
Statistical Inference
Noether: Introduction to Statistics: The Nonparametric Way

Continued at end of book


Yuan Shih Chow Henry Teicher

Probability Theory
Independence, Interchangeability,
Martingales

Third Edition

Springer
Yuan Shih Chow
Department of Statistics
Columbia University
New York, NY 10027
USA

Henry Teicher
Department of Statistics
Rutgers University
New Brunswick, NJ 08903
USA

Editorial Board

George Casella
Biometrics Unit
Cornell University
Ithaca, NY 14853-7081
USA

Stephen Fienberg
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213-3890
USA

Ingram Olkin
Department of Statistics
Stanford University
Stanford, CA 94305
USA

Library of Congress Cataloging-in-Publication Data


Chow, Yuan Shih, 1924-
Probability theory : independence, interchangeability, martingales
/ Yuan Shih Chow, Henry Teicher. - 3rd ed.
p. cm. - (Springer texts in statistics)
Includes bibliographical references and index.
ISBN 978-0-387-40607-7 ISBN 978-1-4612-1950-7 (eBook)
DOI 10.1007/978-1-4612-1950-7
1. Probabilities. 2. Martingales (Mathematics) I. Teicher,
Henry. II. Title. III. Series.
QA273.C573 1997
519.2-dc21 97-9299

Printed on acid-free paper.

© 1997, 1988, 1978 Springer Science+Business Media New York


Originally published by Springer-Verlag New York in 1997, 1988, 1978
Softcover reprint of the hardcover 3rd edition 1997
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New
York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly
analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or here-
after developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even
if the former are not especially identified, is not to be taken as a sign that such names, as
understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely
by anyone.

Production managed by Karina Mikhli; manufacturing supervised by Jacqui Ashri.


Typeset by Asco Trade Typesetting Ltd., Hong Kong.

987654321

ISBN 978-0-387-40607-7
To our teachers
J. L. Doob and J. Wolfowitz
Preface to the Third Edition

Apart from some additional theorems and examples, simplification of proofs,
and correction of typographical errors, the main change is the addition of
section 7.5, dealing with U-statistics. In the first two editions (1978, 1988)
Lemma 5.4.5 was contained in the proof of Theorem 10.4.1 but seems worth
highlighting here.

Preface to the Second Edition

Apart from new examples and exercises, some simplifications of proofs, minor
improvements, and correction of typographical errors, the principal change
from the first edition is the addition of section 9.5, dealing with the central
limit theorem for martingales and more general stochastic arrays.

Preface to the First Edition

Probability theory is a branch of mathematics dealing with chance phenomena
and has clearly discernible links with the real world. The origins of the sub-
ject, generally attributed to investigations by the renowned French mathe-
matician Fermat of problems posed by a gambling contemporary to Pascal,
have been pushed back a century earlier to the Italian mathematicians
Cardano and Tartaglia about 1570 (Ore, 1953). Results as significant as the
Bernoulli weak law of large numbers appeared as early as 1713, although its
counterpart, the Borel strong law of large numbers, did not emerge until 1909.
Central limit theorems and conditional probabilities were already being
investigated in the eighteenth century, but the first serious attempts to grapple
with the logical foundations of probability seem to be Keynes (1921), von
Mises (1928; 1931), and Kolmogorov (1933).
An axiomatic mold and measure-theoretic framework for probability
theory was furnished by Kolmogorov. In this so-called objective or measure-
theoretic approach, definitions and axioms are so chosen that the empirical
realization of an event is the outcome of a not completely determined physical
experiment-an experiment which is at least conceptually capable of indefi-
nite repetition (this notion is due to von Mises). The concrete or intuitive
counterpart of the probability of an event is a long run or limiting frequency
of the corresponding outcome.
In contradistinction to the objective approach-where typical realizations
of events might be: a coin falls heads, more than 50 cars reach a busy inter-
section during a specified period, a continuously burning light bulb fails
within 1000 hours-the subjective approach to probability advocated by
Keynes is designed to encompass realizations such as: it will rain tomorrow,
life exists on the planet Saturn, the Iliad and the Odyssey were written by the
same author-despite the fact that the experiments in question are clearly
unrepeatable. Here the empirical counterpart of probability is degree or
intensity of belief.
It is tempting to try to define probability as a limit of frequencies (as
advocated by von Mises) rather than as a real number between zero and one
satisfying certain postulates (as in the objective approach). Unfortunately,
incorporation of repeatability as a postulate (von Mises' "randomness
axiom") complicates matters while simultaneously circumscribing the notion
of an event. Thus, the probability of the occurrence infinitely often of some
particular event in an infinite sequence of repetitions of an experiment-
which is of considerable interest in the Kolmogorov schema - is proscribed in
(the 1964 rendition of) the von Mises approach (1931). Possibly for these
reasons, the frequency approach appears to have lost out to the measure-
theoretic. It should be pointed out, however, that justification of the measure-
theoretic approach via the Borel strong law of large numbers is circular in
that the convergence of the observed frequency of an event to its theoretically
defined probability (as the number of repetitions increases) is not pointwise
but can only be defined in terms of the concept being justified, viz., probability.
If, however, one is willing to ascribe an intuitive meaning to the notion of
probability one (hence also, probability zero) then the probability p of any
intermediate value can be interpreted in this fashion.
A number of axiomatizations for subjective probability have appeared
since Keynes with no single approach dominating. Perhaps the greatest
influence of subjective probability is outside the realm of probability theory
proper and rather in the recent emergence of the Bayesian school of statistics.
The concern of this book is with the measure-theoretic foundations of
probability theory and (a portion of) the body of laws and theorems that
emerge therefrom. In the 45 years since the appearance of von Mises' and
Kolmogorov's works on the foundations of probability, the theory itself has
expanded at an explosive pace. Despite this burgeoning, or perhaps because
of the very extent thereof, only the topics of independence, interchangeability,
and martingales will be treated here. Thus, such important concepts as
Markov and stationary processes will not even be defined, although the
special cases of sums of independent random variables and interchangeable
random variables will be dealt with extensively. Likewise, continuous param-
eter stochastic processes, although alluded to, will not be discussed. Indeed,
the time seems propitious for the appearance of a book devoted solely to such
processes and presupposing familiarity with a significant portion of the
material contained here.
Particular emphasis is placed in this book on stopping times - on the one
hand, as tools in proving theorems, and on the other, as objects of interest
in themselves. Apropos of the latter, randomly stopped sums, optimal
stopping problems, and limit distributions of sequences of stopping rules
(i.e., finite stopping times) are of special interest. Wald's equation and its
second-moment analogue, in turn, show the usefulness of such stopped sums
in renewal theory and elsewhere in probability. Martingales provide a
natural vehicle for stopping times, but a formal treatment of the latter cannot

await development of the former. Thus, stopping times and, in particular, a


sequence of copies of a fixed stopping rule appear as early as Chapter 5,
thereby facilitating discussion of the limiting behavior of random walks.
Many of the proofs given and a few of the results are new. Occasionally, a
classical notion is looked at through new lenses (e.g., reformulation of the
Lindeberg condition). Examples, sprinkled throughout, are used in various
guises: to extend theory, to illustrate a theorem that has just appeared, to
obtain a classical result from one recently proven.
A novel feature is the attempt to intertwine measure and probability
rather than, as is customary, set up between them a sharp demarcation. It is
surprising how much probability can be developed (Chapters 2,3) without
even a mention of integration. A number of topics treated later in generality
are foreshadowed in the very tractable binomial case of Chapter 2.
This book is intended to serve as a graduate text in probability theory. No
knowledge of measure or probability is presupposed, although it is recognized
that most students will have been exposed to at least an elementary treatment
of the latter. The former is confined for the most part to Chapters 1, 4, 6, with
convergence appearing in Section 3.3 (i.e., Section 3 of Chapter 3).¹ Readers
familiar with measure theory can plunge into Chapter 5 after reading Section
3.2 and portions of Sections 3.1, 3.3, 4.2, 4.3. In any case, Chapter 2 and also
Section 3.4 can be omitted without affecting subsequent developments.
Martingales are introduced in Section 7.4, where the upward case is
treated and then developed more generally in Chapter 11. Interchangeable
random variables are discussed primarily in Sections 7.3 and 9.2. Apropos
of terminology, "interchangeable" is far more indicative of the underlying
property than the current "exchangeable," which seems to be a too literal
rendition of the French word "échangeable."
A one-year course presupposing measure theory can be built around
Chapters 5, 7, 8, 9, 10, 11, 12.
Our warm thanks and appreciation go to Mary Daughaday, Beatrice
Williams, and Pat Wolf for their expert typing of the manuscript.

References
J. M. Keynes, A Treatise on Probability, 1921; Macmillan, London, 1943.
A. Kolmogorov, Foundations of the Theory of Probability, 1933; Chelsea, New York,
1950.
R. von Mises, Probability, Statistics and Truth, 1928; Wm. Hodge, London, 1939.
R. von Mises, Mathematical Theory of Probability and Statistics, 1931 (H. Geiringer,
editor), Academic Press, N.Y., 1964.
O. Ore, "Appendix," Cardano, The Gambling Scholar, Princeton University Press,
1953; Holt, New York, 1961.
I. Todhunter, A History of the Mathematical Theory of Probability, 1865; Chelsea,
New York, 1949.

¹ In the same notational vein, Theorem 3.4.2 signifies Theorem 2 of Section 4 of Chapter 3.
Contents

Preface to the Third Edition vii


Preface to the Second Edition ix
Preface to the First Edition xi
List of Abbreviations xix
List of Symbols and Conventions xxi

1 Classes of Sets, Measures, and Probability Spaces 1

1.1 Sets and set operations 1
1.2 Spaces and indicators 4
1.3 Sigma-algebras, measurable spaces, and product spaces 6
1.4 Measurable transformations 12
1.5 Additive set functions, measures, and probability
spaces 18
1.6 Induced measures and distribution functions 25

2 Binomial Random Variables 30


2.1 Poisson theorem, interchangeable events, and their
limiting probabilities 30
2.2 Bernoulli, Borel theorems 39
2.3 Central limit theorem for binomial random variables,
large deviations 45

3 Independence 54
3.1 Independence, random allocation of balls into cells 54
3.2 Borel-Cantelli theorem, characterization of
independence, Kolmogorov zero-one law 61

3.3 Convergence in probability, almost certain convergence,


and their equivalence for sums of independent random
variables 66
3.4 Bernoulli trials 75

4 Integration in a Probability Space 84


4.1 Definition, properties of the integral, monotone
convergence theorem 84
4.2 Indefinite integrals, uniform integrability, mean
convergence 92
4.3 Jensen, Hölder, Schwarz inequalities 103

5 Sums of Independent Random Variables 113


5.1 Three series theorem 113
5.2 Laws of large numbers 124
5.3 Stopping times, copies of stopping times, Wald's
equation 138
5.4 Chung-Fuchs theorem, elementary renewal theorem,
optimal stopping 150

6 Measure Extensions, Lebesgue-Stieltjes Measure,


Kolmogorov Consistency Theorem 165
6.1 Measure extensions, Lebesgue-Stieltjes measure 165
6.2 Integration in a measure space 171
6.3 Product measure, Fubini's theorem, n-dimensional
Lebesgue-Stieltjes measure 184
6.4 Infinite-dimensional product measure space,
Kolmogorov consistency theorem 191
6.5 Absolute continuity of measures, distribution
functions; Radon-Nikodym theorem 202

7 Conditional Expectation, Conditional Independence,


Introduction to Martingales 210
7.1 Conditional expectations 210
7.2 Conditional probabilities, conditional probability
measures 222
7.3 Conditional independence, interchangeable random
variables 229
7.4 Introduction to martingales 239
7.5 U-statistics 259

8 Distribution Functions and Characteristic Functions 270


8.1 Convergence of distribution functions, uniform
integrability, Helly-Bray theorem 270

8.2 Weak compactness, Fréchet-Shohat, Glivenko-


Cantelli theorems 281
8.3 Characteristic functions, inversion formula, Lévy
continuity theorem 286
8.4 The nature of characteristic functions, analytic
characteristic functions, Cramér-Lévy theorem 294
8.5 Remarks on k-dimensional distribution functions and
characteristic functions 308

9 Central Limit Theorems 313


9.1 Independent components 313
9.2 Interchangeable components 328
9.3 The martingale case 336
9.4 Miscellaneous central limit theorems 340
9.5 Central limit theorems for double arrays 345

10 Limit Theorems for Independent Random Variables 354


10.1 Laws of large numbers 354
10.2 Law of the iterated logarithm 368
10.3 Marcinkiewicz-Zygmund inequality, dominated
ergodic theorems 384
10.4 Maxima of random walks 392

11 Martingales 404
11.1 Upcrossing inequality and convergence 404
11.2 Martingale extension of Marcinkiewicz-
Zygmund inequalities 412
11.3 Convex function inequalities for martingales 421
11.4 Stochastic inequalities 432

12 Infinitely Divisible Laws 444


12.1 Infinitely divisible characteristic functions 445
12.2 Infinitely divisible laws as limits 454
12.3 Stable laws 468

Index 479
List of Abbreviations

r.v. random variable
r.v.s random variables
d.f. distribution function
c.f. characteristic function
p.d.f. probability density function
u.i. uniform integrability or uniformly integrable
i.o. infinitely often
a.c. almost certainly
a.s. almost surely
a.e. almost everywhere
i.d. infinitely divisible
i.i.d. independent, identically distributed
iff if and only if
CLT Central Limit Theorem
WLLN Weak Law of Large Numbers
SLLN Strong Law of Large Numbers
LIL Law of the Iterated Logarithm
m.g.f. moment generating function
Cov covariance

List of Symbols and Conventions

$\sigma(\mathscr{G})$  σ-algebra generated by the class $\mathscr{G}$
$\sigma(X)$  σ-algebra generated by the random variable $X$
$EX$  expectation of the random variable $X$
$\int X$  abbreviated form of the integral $\int X \, dP$
$E^p X$  abbreviated form of $(EX)^p$
$\|X\|_p$  $p$-norm of $X$, that is, $(E|X|^p)^{1/p}$
$C(F)$  continuity set of the function $F$
$\xrightarrow{\text{a.c.}}$ or $\xrightarrow{\text{a.s.}}$ or $\xrightarrow{\text{a.e.}}$  convergence almost certainly or almost surely or almost everywhere
$\xrightarrow{P}$ or $\xrightarrow{d}$ or $\xrightarrow{\mu}$  convergence in probability or in distribution or in $\mu$-measure
$\xrightarrow{L_p}$  convergence in mean of order $p$
$\xrightarrow{w}$ or $\xrightarrow{c}$  weak or complete convergence
$\mathscr{B}^n$, $\mathscr{B}^\infty$  class of $n$-dimensional or infinite-dimensional Borel sets
$\Re\{\ \}$  real part of
$\Im\{\ \}$  imaginary part of
$\wedge$  minimum of
$\vee$  maximum of
$a \le \lim Y_n \le b$  simultaneous statement that $a \le \varliminf Y_n \le \varlimsup Y_n \le b$
$Z_1 \le Z \le Z_2$  simultaneous statement that $Z \le Z_2$ and $Z \ge Z_1$
$X_F$  fictitious r.v. with d.f. $F$
$\operatorname{med} X$  median of $X$
$N(\mu, \sigma^2)$  normal r.v. with mean $\mu$, variance $\sigma^2$
1
Classes of Sets, Measures, and Probability Spaces

1.1 Sets and Set Operations


A set, in the words of Georg Cantor, the founder of modern set theory, is a
collection into a whole of definite, well-distinguished objects of our perception
or thought. The objects are called elements and the set is the aggregate of
these elements. It is very convenient to extend this notion and also envisage a
set devoid of elements, a so-called empty set, and this will be denoted by $\emptyset$.
Each element of a set appears only once therein and its order of appearance
within the set is irrelevant. A set whose elements are themselves sets will be
called a class.

Examples of sets are (i) the set of positive integers denoted by either
$\{1, 2, \ldots\}$ or $\{\omega: \omega \text{ is a positive integer}\}$ and (ii) the closed interval with end
points $a$ and $b$ denoted by either $\{\omega: a \le \omega \le b\}$ or $[a, b]$. Analogously, the
open interval with end points $a$ and $b$ is denoted by $\{\omega: a < \omega < b\}$ or $(a, b)$,
while $(a, b]$ and $[a, b)$ are designations for $\{\omega: a < \omega \le b\}$ and $\{\omega: a \le \omega < b\}$
respectively.
The statement that $\omega \in A$ means that $\omega$ is an element of the set $A$, and
analogously the assertion $\omega \notin A$ means that $\omega$ is not an element of the set $A$, or
alternatively that $\omega$ does not belong to $A$. If $A$ and $B$ are sets and every
element of $A$ is likewise an element of $B$, this situation is depicted by writing
$A \subset B$ or $B \supset A$, and in such a case the set $A$ is said to be a subset of $B$ or
contained in $B$. If both $A \subset B$ and $B \subset A$, then $A$ and $B$ contain exactly the
same elements and are said to be equal, denoted by $A = B$. Note that for
every set $A$, $\emptyset \subset A \subset A$.

A set $A$ is termed countable if there exists a one-to-one correspondence
between (the elements of) $A$ and (those of) some subset $B$ of the set of all
positive integers. If, in this correspondence, $B = \{1, 2, \ldots, n\}$, then $A$ is called
a finite set (with $n$ elements). It is natural to consider $\emptyset$ as a finite set (with
zero elements). A set $A$ which is not countable is called uncountable or
nondenumerable.
If $A$ and $B$ are two sets, the difference $A - B$ is the set of all elements of $A$
which do not belong to $B$; the intersection $A \cap B$ or $A \cdot B$ or simply $AB$ is the
set of all elements belonging to both $A$ and $B$; the union $A \cup B$ is the set of all
elements belonging to either $A$ or $B$ (or both); and the symmetric difference
$A \,\Delta\, B$ is the set of all elements which belong to $A$ or $B$ but not both. Note that

$$A \cup A = A, \quad A \cap A = A, \quad A - A = \emptyset, \quad A - B = A - (AB) \subset A,$$
$$A \cup B = B \cup A \supset A \supset AB = BA, \quad A \,\Delta\, B = (A - B) \cup (B - A).$$

Union, intersection, difference, and symmetric difference are termed set
operations.

If $A$, $B$, $C$ are sets and several set operations are indicated, it is, strictly
speaking, necessary to indicate via parentheses which operations are to be
performed first. However, such specification is frequently unnecessary. For
instance, $(A \cup B) \cup C = A \cup (B \cup C)$ and so this double union is inde-
pendent of order and may be designated simply by $A \cup B \cup C$. Analogously,

$$(AB)C = A(BC) = ABC, \qquad (A \,\Delta\, B) \,\Delta\, C = A \,\Delta\, (B \,\Delta\, C) = A \,\Delta\, B \,\Delta\, C,$$
$$A(B \cup C) = AB \cup AC, \qquad A(B \,\Delta\, C) = AB \,\Delta\, AC.$$
If $\Lambda$ is a nonempty set whose elements $\lambda$ may be envisaged as tags or labels,
$\{A_\lambda: \lambda \in \Lambda\}$ is a nonempty class of sets. The intersection $\bigcap_{\lambda \in \Lambda} A_\lambda$ (resp. union
$\bigcup_{\lambda \in \Lambda} A_\lambda$) is defined to be the set of all elements which belong to $A_\lambda$ for all
$\lambda \in \Lambda$ (resp. for some $\lambda \in \Lambda$). Apropos of order of carrying out set operations,
if $*$ denotes any one of $\cup$, $\cap$, $-$, $\Delta$, for any set $A$ it follows from the definitions
that

$$\bigcup_{\lambda \in \Lambda} A_\lambda * A = \Bigl(\bigcup_{\lambda \in \Lambda} A_\lambda\Bigr) * A, \qquad A * \bigcup_{\lambda \in \Lambda} A_\lambda = A * \Bigl(\bigcup_{\lambda \in \Lambda} A_\lambda\Bigr),$$
$$\bigcap_{\lambda \in \Lambda} A_\lambda * A = \Bigl(\bigcap_{\lambda \in \Lambda} A_\lambda\Bigr) * A, \qquad A * \bigcap_{\lambda \in \Lambda} A_\lambda = A * \Bigl(\bigcap_{\lambda \in \Lambda} A_\lambda\Bigr).$$

Then

$$A - \bigcup_{\lambda \in \Lambda} A_\lambda = \bigcap_{\lambda \in \Lambda} (A - A_\lambda), \qquad A - \bigcap_{\lambda \in \Lambda} A_\lambda = \bigcup_{\lambda \in \Lambda} (A - A_\lambda).$$

For any sequence $\{A_n, n \ge 1\}$ of sets, define

$$\varlimsup_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k, \qquad \varliminf_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k$$

and note that, employing the abbreviation i.o. to designate "infinitely often,"

$$\varlimsup A_n = \{\omega: \omega \in A_n \text{ for infinitely many } n\} = \{\omega: \omega \in A_n, \text{ i.o.}\} \tag{1}$$
$$\varliminf A_n = \{\omega: \omega \in A_n \text{ for all but a finite number of indices } n\}.$$

To prove, for example, the first relation, let $A = \{\omega: \omega \in A_n, \text{ i.o.}\}$. Then
$\omega \in A$ iff for every positive integer $m$ there exists $n \ge m$ such that $\omega \in A_n$,
that is, iff for every positive integer $m$, $\omega \in \bigcup_{n=m}^{\infty} A_n$, i.e., iff $\omega \in \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} A_n$.

In view of (1), $\varliminf A_n \subset \varlimsup A_n$, but these two sets need not be equal
(Exercise 3). If $\varliminf A_n = \varlimsup A_n = A$ (say), $A$ is called the limit of the sequence
$A_n$; this situation is depicted by writing $\lim A_n = A$ or $A_n \to A$. If $A_1 \subset A_2 \subset \cdots$
(resp. $A_1 \supset A_2 \supset \cdots$) the sequence $A_n$ is said to be increasing
(resp. decreasing). In either case, $\{A_n, n \ge 1\}$ is called monotone.

Palpably, for every monotone sequence $A_n$, $\lim_{n \to \infty} A_n$ exists; in fact, if
$\{A_n\}$ is increasing, $\lim_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n$, while if $\{A_n\}$ is decreasing,
$\lim_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_n$. Consequently, for any sequence of sets $A_n$,

$$\varlimsup_{n \to \infty} A_n = \lim_{n \to \infty} \bigcup_{k=n}^{\infty} A_k, \qquad \varliminf_{n \to \infty} A_n = \lim_{n \to \infty} \bigcap_{k=n}^{\infty} A_k.$$

EXERCISES 1.1

1. Prove (i) if $A_n$ is countable, $n \ge 1$, so is $\bigcup_{n=1}^{\infty} A_n$; (ii) if $A$ is uncountable and $B \supset A$,
then $B$ is uncountable.

2. Show that $\bigcup_{n=1}^{\infty} [0, n/(n+1)) = [0, 1)$, $\bigcap_{n=1}^{\infty} (0, 1/n) = \emptyset$.

3. Prove that $\varliminf_{n \to \infty} A_n \subset \varlimsup_{n \to \infty} A_n$. Specify $\varliminf A_n$ and $\varlimsup A_n$ when $A_{2j} = B$,
$A_{2j-1} = C$, $j = 1, 2, \ldots$.

4. Verify that $\bigcup_{n=1}^{\infty} A_n = \lim_{n \to \infty} \bigcup_{j=1}^{n} A_j$ and $\bigcap_{n=1}^{\infty} A_n = \lim_{n \to \infty} \bigcap_{j=1}^{n} A_j$. Moreover,
if $\{A_n, n \ge 1\}$ is a sequence of disjoint sets, i.e., $A_i A_j = \emptyset$, $i \ne j$, then

$$\lim_{n \to \infty} \bigcup_{j=n}^{\infty} A_j = \emptyset.$$

5. Prove that $\varlimsup_n (A_n \cup B_n) = \varlimsup_n A_n \cup \varlimsup_n B_n$ and $\varliminf_n A_n \cdot B_n = \varliminf_n A_n \cdot \varliminf_n B_n$.
Moreover, $\lim A_n = A$ and $\lim B_n = B$ imply $\lim_n (A_n \cup B_n) = A \cup B$ and
$\lim A_n B_n = AB$.

6. Demonstrate that if $B$ is a countable set and $B_n = \{(b_1, \ldots, b_n): b_i \in B \text{ for } 1 \le i \le n\}$,
then $B_n$ is countable, $n \ge 1$.

7. Prove that the set $S$ consisting of all infinite sequences with entries 0 or 1 is non-
denumerable and conclude that the set of real numbers in $[0, 1]$ or any nondegenerate
interval is nondenumerable. Hint: If $S$ were countable, i.e., $S = \{s_n, n \ge 1\}$ where
$s_n = (x_{n1}, x_{n2}, \ldots)$, then $(1 - x_{11}, 1 - x_{22}, \ldots, 1 - x_{nn}, \ldots)$ would be an infinite
sequence of zeros and ones not in $S$.

8. If $a_n$ is a sequence of real numbers, $0 \le a_n \le \infty$, prove that

$$\bigcup_{n=1}^{\infty} [0, a_n) = \Bigl[0, \sup_{n \ge 1} a_n\Bigr), \qquad \bigcup_{n=1}^{\infty} \Bigl[0, \Bigl(\frac{n-1}{n}\Bigr)^{n}\Bigr] \ne \Bigl[0, \sup_{n \ge 1} \Bigl(\frac{n-1}{n}\Bigr)^{n}\Bigr].$$

9. For any sequence of sets $\{A_n, n \ge 1\}$, define $B_1 = A_1$, $B_{n+1} = B_n \,\Delta\, A_{n+1}$, $n \ge 1$.
Prove that $\lim_n B_n$ exists iff $\lim A_n$ exists and is empty.

1.2 Spaces and Indicators


A space $\Omega$ is an arbitrary, nonempty set and is usually postulated as a reference
or point of departure for further discussion and investigation. Its elements are
referred to as points (of the space) and will be denoted generically by $\omega$. Thus
$\Omega = \{\omega: \omega \in \Omega\}$.

For any reference space $\Omega$, the complement $A^c$ of a subset $A$ of $\Omega$ is defined
by $A^c = \Omega - A$, and the indicator $I_A$ of $A \subset \Omega$ is a function defined on $\Omega$ by

$$I_A(\omega) = 1 \text{ for } \omega \in A, \qquad I_A(\omega) = 0 \text{ for } \omega \in A^c.$$

Similarly, for any real function $f$ on $\Omega$ and real constants $a$, $b$, $I_{[a \le f \le b]}$
signifies the indicator of the set $\{\omega: a \le f(\omega) \le b\}$.

For any subsets $A$, $B$ of $\Omega$

$$A \cup A^c = \Omega, \qquad A - B = AB^c, \qquad I_A \le I_B \text{ iff } A \subset B, \qquad I_{A \cup B} \le I_A + I_B,$$

with the last inequality becoming an equality for all $\omega$ iff $AB = \emptyset$. Let $\Lambda$ be an
arbitrary set and $\{A_\lambda, \lambda \in \Lambda\}$ a class of subsets of $\Omega$. It is convenient to adopt
the conventions

$$\bigcup_{\lambda \in \emptyset} A_\lambda = \emptyset, \qquad \bigcap_{\lambda \in \emptyset} A_\lambda = \Omega.$$

Moreover,

$$I_{\bigcup_{\lambda \in \Lambda} A_\lambda} = \sup_{\lambda \in \Lambda} I_{A_\lambda}.$$

If $A_\lambda \cdot A_{\lambda'} = \emptyset$ for $\lambda, \lambda' \in \Lambda$ and $\lambda \ne \lambda'$, the sets $A_\lambda$ are called disjoint. A class of
disjoint sets will be referred to as a disjoint class.

If $\{A_n, n \ge 1\}$ is a sequence of subsets of $\Omega$, then $\{I_{A_n}, n \ge 1\}$ is a sequence
of functions on $\Omega$ with values 0 or 1 and

$$I_{\varliminf A_n} = \varliminf_{n \to \infty} I_{A_n}, \qquad I_{\varlimsup A_n} = \varlimsup_{n \to \infty} I_{A_n}.$$

Moreover,

$$I_{\bigcup_{n=1}^{\infty} A_n} \le \sum_{n=1}^{\infty} I_{A_n}. \tag{1}$$

Equality holds in (1) iff $\{A_n, n \ge 1\}$ is a disjoint class. The following identity
(2) is a refinement of the finite counterpart of (1): For $A_i \subset \Omega$, $1 \le i \le n$, set

$$s_k = \sum_{1 \le i_1 < \cdots < i_k \le n} I_{A_{i_1} A_{i_2} \cdots A_{i_k}}, \qquad 1 \le k \le n.$$

Then

$$I_{\bigcup_{i=1}^{n} A_i} = s_1 - s_2 + s_3 - \cdots + (-1)^{n-1} s_n. \tag{2}$$

In proof of (2), if for some $\omega \in \Omega$, $I_{\bigcup_1^n A_i}(\omega) = 0$, clearly $s_k(\omega) = 0$, $1 \le k \le n$,
whence (2) obtains. On the other hand, if $I_{\bigcup_1^n A_i}(\omega) = 1$, then $\omega \in A_j$ for at least
one $j$, $1 \le j \le n$. Suppose that $\omega$ belongs to exactly $m$ of the sets $A_1, \ldots, A_n$.
Then $s_1(\omega) = m$, $s_2(\omega) = \binom{m}{2}, \ldots, s_m(\omega) = 1$, $s_{m+1}(\omega) = \cdots = s_n(\omega) = 0$,
whence

$$s_1(\omega) - s_2(\omega) + \cdots + (-1)^{n-1} s_n(\omega) = \sum_{k=1}^{m} (-1)^{k-1} \binom{m}{k} = 1 - (1-1)^m = 1.$$
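Identity (2) is easy to confirm pointwise by brute force. The sketch below (an added illustration, not part of the original text) checks it on randomly generated subsets of a small finite $\Omega$; the space, the number of sets, and the sampling probability are arbitrary choices.

# Illustrative check of identity (2):
# I_{union of A_i} = s_1 - s_2 + ... + (-1)**(n-1) s_n, where s_k is the
# sum, over all k-tuples i_1 < ... < i_k, of the indicator of the k-fold
# intersection A_{i_1} ... A_{i_k}.
from itertools import combinations
import random

random.seed(0)
omega = range(20)                                   # small finite reference space
n = 4
A = [{w for w in omega if random.random() < 0.4} for _ in range(n)]

def s(k, w):
    # s_k evaluated at the point w
    return sum(all(w in A[i] for i in idx) for idx in combinations(range(n), k))

for w in omega:
    lhs = 1 if any(w in Ai for Ai in A) else 0
    rhs = sum((-1) ** (k - 1) * s(k, w) for k in range(1, n + 1))
    assert lhs == rhs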

EXERCISES 1.2

1. Verify that

$$A \,\Delta\, B = A^c \,\Delta\, B^c, \qquad C = A \,\Delta\, B \text{ iff } A = B \,\Delta\, C,$$
$$\bigcup_{1}^{\infty} A_n \,\Delta\, \bigcup_{1}^{\infty} B_n \subset \bigcup_{1}^{\infty} (A_n \,\Delta\, B_n), \qquad \bigcap_{1}^{\infty} A_n \,\Delta\, \bigcap_{1}^{\infty} B_n \subset \bigcup_{1}^{\infty} (A_n \,\Delta\, B_n).$$

2. Prove that $(\varlimsup_{n \to \infty} A_n)^c = \varliminf_{n \to \infty} A_n^c$ and $(\varliminf_{n \to \infty} A_n)^c = \varlimsup_{n \to \infty} A_n^c$; also that
$\lim B_n = B$ implies $\lim B_n^c = B^c$ and $\lim A \,\Delta\, B_n = A \,\Delta\, B$.

3. Prove that $I_{\varlimsup A_n} = \varlimsup I_{A_n}$ and that $I_{\lim A_n} = \lim I_{A_n}$ whenever either side exists.

4. If $A_n \subset \Omega$, $n \ge 1$, show that

$$I_{\bigcup_{n=1}^{\infty} A_n} = \max_{n \ge 1} I_{A_n}, \qquad I_{\bigcap_{n=1}^{\infty} A_n} = \min_{n \ge 1} I_{A_n}.$$

5. If $f$ is a real function on $\Omega$, then $f^2 = f$ iff $f$ is an indicator of some subset of $\Omega$.

6. Apropos of (2), prove that if $B_m$ is the set of points belonging to exactly $m$ $(1 \le m \le n)$
of $A_1, \ldots, A_n$, then

$$I_{B_m} = s_m - \binom{m+1}{m} s_{m+1} + \binom{m+2}{m} s_{m+2} - \cdots + (-1)^{n-m} \binom{n}{m} s_n. \tag{3}$$

7. If $\{f_n, n \ge 0\}$ is a sequence of real functions on $\Omega$ with $f_n \uparrow f_0$ and $A_n = \{\omega: f_n(\omega) > c\}$,
then $A_n \subset A_{n+1}$ and $\lim A_n = A_0$.

8. If $\{f_n, n \ge 0\}$ is a sequence of real functions with $f_n \uparrow f_0$ and $g_n = f_n I_{[a \le f_n \le b]}$ for
some constants $-\infty \le a < b \le \infty$, then $\{g_n, n \ge 1\}$ is not necessarily increasing.
However, if for $n \ge 0$

$$f_n' = a I_{[f_n < a]} + f_n I_{[a \le f_n \le b]} + b I_{[f_n > b]},$$

then $f_n' \uparrow f_0'$.

9. If $f_1$ and $f_2$ are real functions on $\Omega$, prove that for all real $x$ and rational $r$

$$\{\omega: f_1(\omega) + f_2(\omega) < x\} = \bigcup_{\text{all } r} \{\omega: f_1(\omega) < r\} \cdot \{\omega: f_2(\omega) < x - r\}.$$

1.3 σ-Algebras, Measurable Spaces, and Product Spaces

Let $\Omega$ be a space.

Definition. A nonempty class $\mathscr{A}$ of subsets of $\Omega$ is an algebra if

i. $A^c \in \mathscr{A}$ whenever $A \in \mathscr{A}$,
ii. $A_1 \cup A_2 \in \mathscr{A}$ whenever $A_j \in \mathscr{A}$, $j = 1, 2$.

Moreover, $\mathscr{A}$ is a σ-algebra if, in addition,

iii. $\bigcup_{n=1}^{\infty} A_n \in \mathscr{A}$ whenever $A_n \in \mathscr{A}$, $n \ge 1$.

Evidently, (ii) implies that for every positive integer $n$, $\bigcup_{1}^{n} A_j \in \mathscr{A}$ when-
ever $A_j \in \mathscr{A}$, $1 \le j \le n$, while both (i) and (ii) entail $A_1 A_2 \in \mathscr{A}$ if $A_j \in \mathscr{A}$,
$j = 1, 2$; also, since $\mathscr{A}$ is nonempty, $\Omega \in \mathscr{A}$, $\emptyset \in \mathscr{A}$. Clearly, (iii) implies (ii)
by taking $A_n = A_2$, $n \ge 2$. Note that a σ-algebra is closed under countable
intersections.

Definition. A nonempty class $\mathscr{A}$ of subsets of $\Omega$ is a monotone class if $\lim A_n \in \mathscr{A}$
for every monotone sequence $A_n \in \mathscr{A}$, $n \ge 1$.

Obviously, a σ-algebra is a monotone class. Conversely, a monotone
algebra $\mathscr{A}$ (i.e., a monotone class which is simultaneously an algebra) is a
σ-algebra. For if $A_n \in \mathscr{A}$, $n \ge 1$, then $B_n = \bigcup_{j=1}^{n} A_j \in \mathscr{A}$, $n \ge 1$, whence
$\bigcup_{j=1}^{\infty} A_j = \lim_n B_n \in \mathscr{A}$.

Let $S_\Omega$ be the class of all subsets of $\Omega$ and $T_\Omega = \{\emptyset, \Omega\}$. Then $S_\Omega$ and $T_\Omega$ are
σ-algebras, and for any σ-algebra $\mathscr{U}_\Omega$ of subsets of $\Omega$, $T_\Omega \subset \mathscr{U}_\Omega \subset S_\Omega$.

Definition. The minimal algebra $\mathscr{E}'$ (resp. σ-algebra, monotone class) contain-
ing a nonempty class $\mathscr{E}$ of subsets of $\Omega$ is an algebra (resp. σ-algebra, monotone
class) such that

i. $\mathscr{E}' \supset \mathscr{E}$,
ii. $\mathscr{E}'' \supset \mathscr{E}'$ whenever $\mathscr{E}'' \supset \mathscr{E}$ and $\mathscr{E}''$ is an algebra (resp. σ-algebra, monotone
class).

Such a minimal algebra $\mathscr{E}'$ (resp. σ-algebra, monotone class) containing $\mathscr{E}$
is also called the algebra (resp. σ-algebra, monotone class) generated by $\mathscr{E}$ and
is denoted by $\mathscr{A}(\mathscr{E})$ (resp. $\sigma(\mathscr{E})$, $m(\mathscr{E})$).

To demonstrate the existence of $\mathscr{A}(\mathscr{E})$ (resp. $\sigma(\mathscr{E})$, $m(\mathscr{E})$), let

$$Q_{\mathscr{E}} = \{\mathscr{D}: \mathscr{D} \supset \mathscr{E},\ \mathscr{D} \text{ is an algebra (resp. σ-algebra, monotone class)}\}.$$

Then $Q_{\mathscr{E}}$ is nonempty since $S_\Omega \in Q_{\mathscr{E}}$. Since an arbitrary intersection of algebras
(resp. σ-algebras, monotone classes) is itself an algebra (resp. σ-algebra,
monotone class), if $\mathscr{D}_0 = \bigcap_{\mathscr{D} \in Q_{\mathscr{E}}} \mathscr{D}$, then $\mathscr{D}_0$ is an algebra (resp. σ-algebra,
monotone class) and $\mathscr{D}_0 \supset \mathscr{E}$. Obviously, $\mathscr{D}_0$ is minimal.
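When $\Omega$ is finite, the intersection construction can be replaced by brute-force closure under complementation and finite union, and the resulting algebra then coincides with $\sigma(\mathscr{E})$. The sketch below is an added illustration only; the four-point $\Omega$ and the generating class are arbitrary choices.

# Illustrative sketch: the algebra generated by a class of subsets of a
# finite space, via closure under complement and union.  On a finite
# space this coincides with the generated sigma-algebra.
def generated_algebra(omega, seed):
    sets = {frozenset(s) for s in seed} | {frozenset(), frozenset(omega)}
    while True:
        new = {frozenset(omega) - s for s in sets}       # complements
        new |= {s | t for s in sets for t in sets}       # pairwise unions
        if new <= sets:
            return sets
        sets |= new

omega = {1, 2, 3, 4}
E = [{1}, {1, 2}]                    # arbitrary generating class
alg = generated_algebra(omega, E)
print(sorted(sorted(s) for s in alg))
# 8 sets: all unions of the atoms {1}, {2}, {3, 4}

Termination is guaranteed because the closure increases within the finite power set of $\Omega$.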
For an arbitrary class $\mathscr{E}$ of sets, it is impossible to give a constructive
procedure for obtaining $\sigma(\mathscr{E})$ (see Hausdorff (1957)). However, some in-
dication of its structure is given by

Theorem 1. If $\mathscr{A}$ is an algebra, $m(\mathscr{A}) = \sigma(\mathscr{A})$.


PROOF. By a prior observation, $\mathscr{A} \subset m(\mathscr{A}) \subset \sigma(\mathscr{A})$. To prove that $m(\mathscr{A}) \supset \sigma(\mathscr{A})$
it suffices to show that $m(\mathscr{A})$ is a σ-algebra, and indeed merely an
algebra, again via a prior comment. To this end, for any $B \subset \Omega$, set $\mathscr{D}_B =
\{A: A \cdot B^c \in m(\mathscr{A}),\ B \cdot A^c \in m(\mathscr{A}),\ A \cup B \in m(\mathscr{A})\}$. If $A_n \in \mathscr{D}_B$, $A_{n+1} \subset A_n$,
$n \ge 1$, and $A = \bigcap_{n=1}^{\infty} A_n$, then $\{A_n \cdot B^c\}$, $\{B \cdot A_n^c\}$, and $\{A_n \cup B\}$ are mono-
tone sequences in $m(\mathscr{A})$. Thus, $A \cdot B^c$, $B \cdot A^c$, and $A \cup B$ belong to $m(\mathscr{A})$,
whence $A \in \mathscr{D}_B$. Similarly, for increasing $A_n$ in $\mathscr{D}_B$, $n \ge 1$, $\lim A_n \in \mathscr{D}_B$.
Hence $\mathscr{D}_B$ is a monotone class for all $B \subset \Omega$. If $B \in \mathscr{A}$, $\mathscr{D}_B \supset \mathscr{A}$ since $\mathscr{A}$ is an
algebra. Therefore, $\mathscr{D}_B \supset m(\mathscr{A})$ for $B \in \mathscr{A}$. Since $A \in \mathscr{D}_B$ iff $B \in \mathscr{D}_A$, necessarily
$\mathscr{D}_B \supset m(\mathscr{A})$ for $B \in m(\mathscr{A})$. This latter means that if $B \in m(\mathscr{A})$ and $E \in m(\mathscr{A})$,
then $B \cdot E^c$, $E \cdot B^c$, and $B \cup E$ all are in $m(\mathscr{A})$. In other words, $m(\mathscr{A})$ is an algebra.
(Recall that $\Omega \in \mathscr{A} \subset m(\mathscr{A})$.) $\square$

Definitions. A nonempty class $\mathscr{A}$ of subsets of $\Omega$ is a π-class if $AB \in \mathscr{A}$ when-
ever both $A, B \in \mathscr{A}$, whereas it is a λ-class if (i) $\Omega \in \mathscr{A}$, (ii) $A - B \in \mathscr{A}$ for $A,
B \in \mathscr{A}$ and $A \supset B$, (iii) $\lim A_n \in \mathscr{A}$ for every increasing sequence $A_n \in \mathscr{A}$,
$n \ge 1$.

Note that a λ-class is closed under complementation and hence (iv) $A \cup B \in
\mathscr{A}$ if $A, B \in \mathscr{A}$ and $AB = \emptyset$ (since $(A \cup B)^c = B^c - A \in \mathscr{A}$ via (ii)). In view of
$A \cup B = A \cup BA^c$, a λ-class which is simultaneously a π-class is a σ-algebra
via (iv) and (iii). Furthermore (same proof as for $\mathscr{A}(\mathscr{E})$), there exists a unique
minimal λ-class $\lambda(\mathscr{E})$ (resp. π-class) containing a given class $\mathscr{E}$ of sets.

Theorem 2. If a λ-class $\mathscr{A}$ contains a π-class $\mathscr{D}$, then $\mathscr{A} \supset \sigma(\mathscr{D})$, the σ-algebra
generated by $\mathscr{D}$. In particular, if $\mathscr{D}$ is a π-class, $\lambda(\mathscr{D}) = \sigma(\mathscr{D})$.

PROOF. If $\mathscr{G}$ is the minimal λ-class containing $\mathscr{D}$, it suffices to show that $\mathscr{G}$ is a
π-class, since for any other λ-class $\mathscr{G}'$ containing $\mathscr{D}$ necessarily $\mathscr{G}' \supset \mathscr{G} \supset \sigma(\mathscr{D})$
because $\mathscr{G}$ is then a σ-algebra. To this end, set $\mathscr{G}_1 = \{A: A \subset \Omega, AD \in \mathscr{G}$ for all
$D \in \mathscr{D}\}$. Clearly, $\mathscr{G}_1$ is a λ-class containing $\mathscr{D}$, whence $\mathscr{G}_1 \supset \mathscr{G}$. Hence, $AD \in \mathscr{G}$
whenever $D \in \mathscr{D}$, $A \in \mathscr{G}$. Thus, if $\mathscr{G}_2 = \{B: B \subset \Omega, AB \in \mathscr{G}$ for all $A \in \mathscr{G}\}$, then
$\mathscr{G}_2$ is a λ-class containing $\mathscr{D}$. Consequently, $\mathscr{G}_2 \supset \mathscr{G}$ and so $AB \in \mathscr{G}$ for both
$A, B \in \mathscr{G}$. $\square$

If $A$ is a fixed subset of $\Omega$ and $\mathscr{D}$ is any class of subsets of $\Omega$, the class of all
sets of the form $BA$ with $B \in \mathscr{D}$ will be denoted by $A \cdot \mathscr{D}$ or $\mathscr{D} \cdot A$ or $\mathscr{D} \cap A$.
If $A$ (rather than $\Omega$) is considered as the reference space (relative to which
complementation occurs), then $\mathscr{D} \cap A$ is a σ-algebra (resp. π-class, algebra)
relative to $A$ if $\mathscr{D}$ is a σ-algebra (resp. π-class, algebra) relative to $\Omega$. When more
than one reference space crops up, this will be appended to the class of sets
under consideration. Thus, $\sigma_A(\mathscr{D})$ denotes the σ-algebra generated by the class
$\mathscr{D}$ but relative to the reference space $A$.

Theorem 3. For every nonempty class $\mathscr{D}$ of subsets of $\Omega$ and every nonempty
set $A \subset \Omega$, $\sigma_\Omega(\mathscr{D}) \cap A = \sigma_A(\mathscr{D} \cap A)$.

PROOF. Let $\mathscr{G} = \{B: B \subset \Omega, B \cdot A \in \sigma_A(\mathscr{D} \cap A)\}$. Then $\mathscr{G}$ is a σ-algebra
relative to $\Omega$ and $\mathscr{G} \supset \mathscr{D}$, whence $\mathscr{G} \supset \sigma_\Omega(\mathscr{D})$. In other words, $\sigma_\Omega(\mathscr{D}) \cap A \subset
\sigma_A(\mathscr{D} \cap A)$. On the other hand, $\sigma_\Omega(\mathscr{D}) \cap A$ is a σ-algebra containing $\mathscr{D} \cap A$,
whence $\sigma_\Omega(\mathscr{D}) \cap A \supset \sigma_A(\mathscr{D} \cap A)$. $\square$

Definition. If $\mathscr{A}$ is a σ-algebra relative to $\Omega$, then the pair $(\Omega, \mathscr{A})$ is called a
measurable space. The sets of $\mathscr{A}$ are called measurable sets.

It is interesting to contrast the definition of a measurable space with that
of a topological space. Both measurable and topological spaces engender
natural product spaces.
For any measurable spaces $(\Omega_i, \mathscr{A}_i)$, $i = 1, 2, \ldots$, define for $n \ge 2$

$$\mathop{\times}_{i=1}^{n} A_i = \{(\omega_1, \omega_2, \ldots, \omega_n): \omega_i \in A_i \subset \Omega_i,\ 1 \le i \le n\},$$
$$A_1 \times A_2 \times \cdots \times A_n = \mathop{\times}_{i=1}^{n} A_i,$$
$$\mathop{\times}_{i=1}^{n} \mathscr{A}_i = \sigma\Bigl(\Bigl\{\mathop{\times}_{i=1}^{n} A_i: A_i \in \mathscr{A}_i,\ 1 \le i \le n\Bigr\}\Bigr),$$
$$\mathop{\times}_{i=1}^{n} (\Omega_i, \mathscr{A}_i) = \Bigl(\mathop{\times}_{i=1}^{n} \Omega_i,\ \mathop{\times}_{i=1}^{n} \mathscr{A}_i\Bigr).$$

Then $\mathop{\times}_{i=1}^{n} \Omega_i$ is called the product space (with components $\Omega_i$, $1 \le i \le n$).
Moreover, the measurable space $\mathop{\times}_{i=1}^{n} (\Omega_i, \mathscr{A}_i)$ is termed the $n$-dimensional
product-measurable space and $\mathop{\times}_{i=1}^{n} \mathscr{A}_i$ is the product σ-algebra.

Sets of the form $\mathop{\times}_{i=1}^{n} A_i$ with $A_i \subset \Omega_i$, $1 \le i \le n$, are called ($n$-dimen-
sional) rectangles of the product space $\mathop{\times}_{i=1}^{n} \Omega_i$, and, moreover, if $A_i \in \mathscr{A}_i$,
$1 \le i \le n$, they are dubbed measurable rectangles or rectangles with measur-
able sides. Clearly, the intersection (but not the union) of any two measurable
rectangles of a given product space is a measurable rectangle in that space. In
other words, the class of measurable rectangles of $\mathop{\times}_{i=1}^{n} \Omega_i$ is a π-class.

Theorem 4. If $(\Omega_i, \mathscr{A}_i)$, $1 \le i \le n$, are measurable spaces, the class $\mathscr{A}$ of all
finite unions of disjoint rectangles $\mathop{\times}_{i=1}^{n} A_i$ with $A_i \in \mathscr{A}_i$, $1 \le i \le n$, is the
algebra generated by the class of all measurable rectangles of the product
space $\mathop{\times}_{i=1}^{n} \Omega_i$.
PROOF. Let $\mathscr{G}$ denote the π-class of measurable rectangles of $\mathop{\times}_{i=1}^{n} \Omega_i$. Now
$\mathscr{A}$ is also a π-class, since if $A_i = \bigcup_{j=1}^{r_i} E_{ij} \in \mathscr{A}$, $i = 1, 2$, with $E_{ij} \in \mathscr{G}$, then
$A_1 \cdot A_2 = \bigcup_{h,k} E_{1h} E_{2k} \in \mathscr{A}$. Moreover, if $E = E_1 \times \cdots \times E_n \in \mathscr{G}$,

$$E^c = E_1 \times \cdots \times E_{n-1} \times E_n^c \ \cup\ E_1 \times \cdots \times E_{n-2} \times E_{n-1}^c \times \Omega_n
\ \cup \cdots \cup\ E_1^c \times \Omega_2 \times \cdots \times \Omega_n = \bigcup_{i=1}^{n} D_i \ \text{(say)} \in \mathscr{A},$$

and if $A = \bigcup_{j=1}^{r} E_j$ with $E_j \in \mathscr{G}$, then

$$A^c = \bigcap_{j=1}^{r} E_j^c = \bigcap_{j=1}^{r} \bigcup_{i=1}^{n} D_i^{(j)} = \bigcup_{1 \le i_1, \ldots, i_r \le n} D_{i_1}^{(1)} D_{i_2}^{(2)} \cdots D_{i_r}^{(r)} \in \mathscr{A}.$$

Consequently, if $A, B \in \mathscr{A}$, the preceding ensures that

$$A \cup B = A \cup B \cdot A^c \in \mathscr{A}.$$

Hence $\mathscr{A}$ is an algebra, and since $\mathscr{A} \supset \mathscr{G}$, necessarily $\mathscr{A} \supset \mathscr{A}(\mathscr{G})$. On the
other hand, every finite union of disjoint rectangles with measurable sides
$\in \mathscr{A}(\mathscr{G})$ and so $\mathscr{A} \subset \mathscr{A}(\mathscr{G})$, whence $\mathscr{A} = \mathscr{A}(\mathscr{G})$. $\square$

Clearly, the σ-algebra $\mathop{\times}_{i=1}^{n} \mathscr{A}_i$, generated by the rectangles with measur-
able sides, is also the σ-algebra generated by the algebra $\mathscr{A}$ of Theorem 4.

The points of $\mathop{\times}_{i=1}^{3} \Omega_i$, $(\Omega_1 \times \Omega_2) \times \Omega_3$, and $\Omega_1 \times (\Omega_2 \times \Omega_3)$ are respec-
tively of the form $(\omega_1, \omega_2, \omega_3)$, $((\omega_1, \omega_2), \omega_3)$, and $(\omega_1, (\omega_2, \omega_3))$, where
$\omega_i \in \Omega_i$ for $i = 1, 2, 3$, and formally they are different. However, between any
two of the three points there is a natural one-to-one correspondence and so
each one of the points will be identified with $(\omega_1, \omega_2, \omega_3)$. Similarly, $((\omega_1, \omega_2),
(\omega_3, \omega_4))$ will be identified with $(\omega_1, \omega_2, \omega_3, \omega_4)$, etc. Under this convention
it is easily seen that for $1 \le m < n$

$$\mathop{\times}_{i=1}^{n} \Omega_i = \Bigl(\mathop{\times}_{i=1}^{m} \Omega_i\Bigr) \times \Bigl(\mathop{\times}_{i=m+1}^{n} \Omega_i\Bigr)$$

and not so easily that

$$\mathop{\times}_{i=1}^{n} \mathscr{A}_i = \Bigl(\mathop{\times}_{i=1}^{m} \mathscr{A}_i\Bigr) \times \Bigl(\mathop{\times}_{i=m+1}^{n} \mathscr{A}_i\Bigr).$$

To verify the latter, let

$$X = \mathop{\times}_{i=1}^{n} \Omega_i, \qquad Y = \mathop{\times}_{i=1}^{m} \Omega_i, \qquad Z = \mathop{\times}_{i=m+1}^{n} \Omega_i,$$
$$\mathscr{F} = \mathop{\times}_{i=1}^{n} \mathscr{A}_i, \qquad \mathscr{G} = \mathop{\times}_{i=1}^{m} \mathscr{A}_i, \qquad \mathscr{H} = \mathop{\times}_{i=m+1}^{n} \mathscr{A}_i,$$
$$\mathscr{F}' = \Bigl\{\mathop{\times}_{i=1}^{n} A_i: A_i \in \mathscr{A}_i,\ 1 \le i \le n\Bigr\}, \qquad \mathscr{G}' = \Bigl\{\mathop{\times}_{i=1}^{m} A_i: A_i \in \mathscr{A}_i,\ 1 \le i \le m\Bigr\},$$
$$\mathscr{H}' = \Bigl\{\mathop{\times}_{i=m+1}^{n} A_i: A_i \in \mathscr{A}_i,\ m+1 \le i \le n\Bigr\}.$$

By definition, $\mathscr{F} = \sigma_X(\mathscr{F}')$, $\mathscr{G} = \sigma_Y(\mathscr{G}')$, $\mathscr{H} = \sigma_Z(\mathscr{H}')$, and $\mathscr{G} \times \mathscr{H} =
\sigma(\{A \times B: A \in \mathscr{G}, B \in \mathscr{H}\})$, where $\sigma_Q$ is the σ-algebra relative to the space $Q$.
Now if $\mathop{\times}_{i=1}^{n} A_i \in \mathscr{F}'$, then $\mathop{\times}_{i=1}^{m} A_i \in \mathscr{G}'$ and $\mathop{\times}_{i=m+1}^{n} A_i \in \mathscr{H}'$, whence
$\mathop{\times}_{i=1}^{n} A_i = (\mathop{\times}_{i=1}^{m} A_i) \times (\mathop{\times}_{i=m+1}^{n} A_i) \in \mathscr{G} \times \mathscr{H}$, implying $\mathscr{F} \subset \mathscr{G} \times \mathscr{H}$. On
the other hand, if $A \in \mathscr{G}$ and $B \in \mathscr{H}$, then $A \times B = (A \times \Omega_{m+1} \times \cdots \times \Omega_n) \cap
(\Omega_1 \times \cdots \times \Omega_m \times B) \in \mathscr{F}$ and so $\mathscr{G} \times \mathscr{H} \subset \mathscr{F}$. $\square$

In the product measurable space $\mathop{\times}_{i=1}^{n} (\Omega_i, \mathscr{A}_i)$, $n = 1, 2, \ldots$, sets of the
form $A \times (\mathop{\times}_{i=m+1}^{n} \Omega_i)$ with $A \in \mathop{\times}_{i=1}^{m} \mathscr{A}_i$ for $1 \le m < n$ are called cylinders
with $m$-dimensional base $A$. The quintessence of a cylinder set is that all but a
finite number of coordinates are unrestricted. The notion of a cylinder is
important in the case of an infinite-dimensional product space.

Let $(\Omega_i, \mathscr{A}_i)$, $i = 1, 2, \ldots$, be measurable spaces and define

$$\mathop{\times}_{i=1}^{\infty} A_i = \{(\omega_1, \omega_2, \ldots): \omega_i \in A_i \subset \Omega_i,\ i = 1, 2, \ldots\},$$
$$\mathscr{G} = \bigcup_{m=1}^{\infty} \Bigl\{\mathop{\times}_{i=1}^{\infty} A_i: A_i \in \mathscr{A}_i,\ 1 \le i \le m, \text{ and } A_i = \Omega_i,\ i > m\Bigr\},$$
$$\mathop{\times}_{i=1}^{\infty} \mathscr{A}_i = \sigma(\mathscr{G}),$$
$$\mathop{\times}_{i=1}^{\infty} (\Omega_i, \mathscr{A}_i) = \Bigl(\mathop{\times}_{i=1}^{\infty} \Omega_i,\ \mathop{\times}_{i=1}^{\infty} \mathscr{A}_i\Bigr).$$

Then $\mathop{\times}_{i=1}^{\infty} (\Omega_i, \mathscr{A}_i)$ is the infinite-dimensional product-measurable space with
components $(\Omega_i, \mathscr{A}_i)$. The σ-algebra in question, $\mathop{\times}_{i=1}^{\infty} \mathscr{A}_i$, is that generated
by cylinders with finite-dimensional (measurable) bases (see Exercise 1.3.6).
In the special case where $(\Omega_i, \mathscr{A}_i) = (\Omega, \mathscr{A})$ for all $i$, a convenient notation
is

$$\mathscr{A}^n = \mathop{\times}_{i=1}^{n} \mathscr{A}_i, \qquad (\Omega^n, \mathscr{A}^n) = \mathop{\times}_{i=1}^{n} (\Omega_i, \mathscr{A}_i), \qquad 1 \le n \le \infty.$$

A prominent example of the preceding considerations arises when $\Omega =
R = [-\infty, \infty] = \{\omega: -\infty \le \omega \le \infty\}$. The set $R$ will be called the real line
and $\pm\infty$ will be considered as real numbers. In contradistinction, $(-\infty, \infty) =
\{\omega: -\infty < \omega < \infty\}$ will be termed the finite real line, whose elements
are finite real numbers. The relationships between $\pm\infty$ and the finite real
numbers are such that for every finite real number $x$

$$-\infty < x < \infty, \qquad x + \infty = \infty + x = \infty, \qquad x + (-\infty) = -\infty + x = -\infty,$$
$$\frac{x}{\infty} = \frac{x}{-\infty} = 0,$$
$$\infty + \infty = (\infty) \cdot (\infty) = (-\infty) \cdot (-\infty) = -(-\infty) = \infty,$$
$$-\infty + (-\infty) = \infty \cdot (-\infty) = (-\infty) \cdot \infty = -(\infty) = -\infty,$$
$$x \cdot \infty = \infty \cdot x = (-x) \cdot (-\infty) = (-\infty) \cdot (-x) = \infty \ \text{or} \ 0 \ \text{or} \ {-\infty},$$

according as $x > 0$, $x = 0$, $x < 0$. Note that $\infty/\infty$ and $\infty - \infty$ are not defined.

For $x \in R$, the sets $[-\infty, x]$, $(-\infty, x)$, $(-\infty, x]$, $[-\infty, x)$, $[x, \infty]$, $(x, \infty)$,
$[x, \infty)$, $(x, \infty]$ are termed infinite intervals.
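As an aside added here (not in the original text), IEEE floating-point arithmetic mirrors most of these conventions and makes them easy to experiment with; the one divergence is that the text defines $x \cdot \infty = 0$ for $x = 0$, whereas IEEE leaves $0 \cdot \infty$ undefined:

# Illustrative aside: Python floats follow IEEE rules for +-infinity.
import math

inf = float('inf')
x = 3.14
assert x + inf == inf and x - inf == -inf
assert x / inf == 0.0 and x / -inf == 0.0          # -0.0 == 0.0 in IEEE
assert inf + inf == inf and inf * inf == inf
assert (-inf) * (-inf) == inf and -(-inf) == inf
assert math.isnan(inf - inf)        # not defined, as in the text
assert math.isnan(0.0 * inf)        # undefined in IEEE; the text defines it as 0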
The elements of the σ-algebra $\mathscr{B}$ generated by the class of infinite intervals
of the form $[-\infty, x)$, $-\infty < x < \infty$, are known as the Borel sets (of the line)
or linear Borel sets or Borel sets in $R$. The measurable space $(R, \mathscr{B})$ is called the
Borel line or 1-dimensional Borel space.

The product measurable spaces $(R^n, \mathscr{B}^n)$ and $(R^\infty, \mathscr{B}^\infty)$, emanating from
the Borel line $(R, \mathscr{B})$, are called the $n$-dimensional Borel space and infinite-
dimensional Borel space respectively. Moreover, the sets of $\mathscr{B}^n$ are termed
$n$-dimensional Borel sets, $1 \le n \le \infty$. Since every point $\omega \in R^\infty$ is an infinite
sequence of real numbers, the infinite-dimensional Borel space $(R^\infty, \mathscr{B}^\infty)$ is
also alluded to as a sequence space.

For any interval $J \subset R$ the σ-algebra generated by the class of all subin-
tervals of $J$ coincides with $\mathscr{B} \cap J$ according to Theorem 3. Thus, it is natural
to describe the σ-algebra in question as the class of Borel subsets of $J$.
Similar comments apply to any $n$-dimensional interval (rectangle) $J^n \subset R^n$.

EXERCISES 1.3

1. Prove that $\mathscr{A}$ is a σ-algebra if and only if it is both a λ-class and a π-class. If a class
$\mathscr{A} \subset \mathscr{D}$, then $\sigma(\mathscr{A}) \subset \sigma(\mathscr{D})$.

2. If $\mathscr{F}$ is an algebra such that $\bigcup_{n=1}^{\infty} F_n \in \mathscr{F}$ for every disjoint sequence $\{F_n, n \ge 1\}$ of
sets in $\mathscr{F}$, then $\mathscr{F}$ is a σ-algebra.

3. Let $(\Omega_i, \mathscr{A}_i)$ be measurable spaces and $\mathscr{A}_i = \sigma(\mathscr{D}_i)$ with $\Omega_i \in \mathscr{D}_i$, $1 \le i \le n$. If
$\mathscr{G} = \{A_1 \times A_2 \times \cdots \times A_n: A_i \in \mathscr{D}_i,\ 1 \le i \le n\}$, then $\sigma(\mathscr{G}) = \mathscr{A}_1 \times \mathscr{A}_2 \times \cdots \times \mathscr{A}_n$
on $\Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$.

4. If $\mathscr{A}_n$, $n \ge 1$, is an increasing sequence of σ-algebras, then $\mathscr{A} = \bigcup_{n=1}^{\infty} \mathscr{A}_n$ is merely
an algebra.

5. The σ-algebra generated by a countable class of disjoint, nonempty sets whose
union $= \Omega$ is the class of all unions of these sets.

6. If $(\Omega_i, \mathscr{A}_i)$, $i \ge 1$, are measurable spaces, the class $\mathscr{E}$ of all cylinder sets of $\mathop{\times}_{i=1}^{\infty} \Omega_i$
with bases in $\mathop{\times}_{i=1}^{m} \mathscr{A}_i$ for some $m \ge 1$ is an algebra, but not a σ-algebra. Moreover,
setting $\mathscr{D} = \{\mathop{\times}_{i=1}^{\infty} A_i: A_i \in \mathscr{A}_i\}$, verify that $\mathop{\times}_{i=1}^{\infty} \mathscr{A}_i = \sigma(\mathscr{D}) = \sigma(\mathscr{E})$.

7. Let $\mathscr{D}$ be a π-class of subsets of $\Omega$ and $\mathscr{G}$ the class of all finite unions of disjoint
sets of $\mathscr{D}$ with $\emptyset \in \mathscr{D}$. If $D^c \in \mathscr{G}$ for every $D \in \mathscr{D}$, prove that $\mathscr{G}$ is the algebra generated
by $\mathscr{D}$.

8. Show that the class of Borel sets $\mathscr{B}$ may be generated by $\{(x, \infty], -\infty < x < \infty\}$
or by $\mathscr{G} = \{\{+\infty\}, [-\infty, \infty], [a, b), -\infty \le a \le b \le \infty\}$.

9. Prove that $A = \{(x, y): x^2 + y^2 < r^2\}$ is a Borel set of $R^2$. Hint: $A$ is a countable
union of open (classical) rectangles. Utilize this to prove that $\{(x, y): x^2 + y^2 \le r^2\}$ is
likewise a Borel set and hence also the circumference of a circle.

10. If $\mathscr{G}$ is the class of open sets of $\Omega^n = (-\infty, \infty)^n$, then $\sigma(\mathscr{G}) = \mathscr{B}^n \cap \Omega^n$.

1.4 Measurable Transformations

Let $\Omega_1$ and $\Omega_2$ be reference sets and $X$ a function defined on $\Omega_1$ with values in
$\Omega_2$, the latter being denoted by $X: \Omega_1 \to \Omega_2$. For every subset $A$ of $\Omega_2$ and
class $\mathscr{G}$ of subsets of $\Omega_2$, define

$$X^{-1}(A) = \{\omega: X(\omega) \in A\}, \qquad X^{-1}(\mathscr{G}) = \{X^{-1}(A): A \in \mathscr{G}\}.$$

The set $X^{-1}(A)$ and the class $X^{-1}(\mathscr{G})$ are called respectively the inverse images
of the set $A$ and of the class $\mathscr{G}$. Clearly, if for each $\lambda \in \Lambda$ the set $A_\lambda \subset \Omega_2$, then

$$X^{-1}\Bigl(\bigcup_{\lambda \in \Lambda} A_\lambda\Bigr) = \bigcup_{\lambda \in \Lambda} X^{-1}(A_\lambda), \qquad X^{-1}\Bigl(\bigcap_{\lambda \in \Lambda} A_\lambda\Bigr) = \bigcap_{\lambda \in \Lambda} X^{-1}(A_\lambda), \qquad X^{-1}(A_\lambda^c) = (X^{-1}(A_\lambda))^c,$$

and hence the inverse image $X^{-1}(\mathscr{G})$ of a σ-algebra $\mathscr{G}$ on $\Omega_2$ is a σ-algebra on
$\Omega_1$, and the class $\{B: B \subset \Omega_2,\ X^{-1}(B) \in \mathscr{F}\}$ is a σ-algebra on $\Omega_2$ if $\mathscr{F}$ is a
σ-algebra on $\Omega_1$.
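These preimage identities are mechanical to verify on a small example. The sketch below (an added illustration; the map and the sets are arbitrary) checks that inverse images commute with union, intersection, and complementation:

# Illustrative sketch: inverse images commute with set operations.
omega1 = set(range(-5, 6))
X = lambda w: w * w                      # X: Omega_1 -> Omega_2
omega2 = {X(w) for w in omega1}

def preimage(B):
    return {w for w in omega1 if X(w) in B}

A1, A2 = {0, 1, 4}, {4, 9, 16}
assert preimage(A1 | A2) == preimage(A1) | preimage(A2)
assert preimage(A1 & A2) == preimage(A1) & preimage(A2)
assert preimage(omega2 - A1) == omega1 - preimage(A1)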
Lemma 1. For any mapping $X: \Omega_1 \to \Omega_2$ and class $\mathscr{A}$ of subsets of $\Omega_2$,

$$X^{-1}(\sigma_{\Omega_2}(\mathscr{A})) = \sigma_{\Omega_1}(X^{-1}(\mathscr{A})).$$

PROOF. Since $X^{-1}(\sigma_{\Omega_2}(\mathscr{A}))$ is a σ-algebra containing $X^{-1}(\mathscr{A})$, it must be shown
for any σ-algebra $\mathscr{D}$ of subsets of $\Omega_1$ with $\mathscr{D} \supset X^{-1}(\mathscr{A})$ that $\mathscr{D} \supset X^{-1}(\sigma_{\Omega_2}(\mathscr{A}))$.
Since $\mathscr{D}$ is a σ-algebra, the class $\mathscr{R} = \{B: B \subset \Omega_2,\ X^{-1}(B) \in \mathscr{D}\}$ is likewise a
σ-algebra, and this together with the relation $\mathscr{R} \supset \mathscr{A}$ implies $\mathscr{R} \supset \sigma_{\Omega_2}(\mathscr{A})$.
Thus $\mathscr{D} \supset X^{-1}(\sigma_{\Omega_2}(\mathscr{A}))$. $\square$
Suppose that $X: \Omega_1 \to \Omega_2$ and $Y: \Omega_2 \to \Omega_3$. If $Y(X)$ is defined by the usual
composition, that is,

$$Y(X)(\omega) = Y(X(\omega)), \qquad \omega \in \Omega_1,$$

then $Y(X): \Omega_1 \to \Omega_3$ and $(Y(X))^{-1}(A) = X^{-1}(Y^{-1}(A))$ for every $A \subset \Omega_3$.

If $(\Omega_1, \mathscr{A}_1)$ and $(\Omega_2, \mathscr{A}_2)$ are measurable spaces and $X: \Omega_1 \to \Omega_2$, then $X$ is
said to be a measurable transformation from $(\Omega_1, \mathscr{A}_1)$ to $(\Omega_2, \mathscr{A}_2)$ or an $\mathscr{A}_1$-
measurable function from $\Omega_1$ to $(\Omega_2, \mathscr{A}_2)$ provided $X^{-1}(\mathscr{A}_2) \subset \mathscr{A}_1$.

Suppose that $X$ is a measurable transformation from $(\Omega_1, \mathscr{A}_1)$ to $(\Omega_2, \mathscr{A}_2)$
and $Y$ is a measurable transformation from $(\Omega_2, \mathscr{A}_2)$ to $(\Omega_3, \mathscr{A}_3)$. It follows
immediately from the definition that $Y(X)$ is a measurable transformation
from $(\Omega_1, \mathscr{A}_1)$ to $(\Omega_3, \mathscr{A}_3)$.

Theorem 1. If $X_i$ is a measurable transformation from $(\Omega, \mathscr{A})$ to $(\Omega_i, \mathscr{A}_i)$,
$1 \le i \le n$, where $n$ may be infinite, then $X(\omega) = (X_1(\omega), \ldots, X_n(\omega))$ is a
measurable transformation from $(\Omega, \mathscr{A})$ to the product measurable space
$\mathop{\times}_{i=1}^{n} (\Omega_i, \mathscr{A}_i)$.

PROOF. If $n < \infty$, let

$$\mathscr{G} = \Bigl\{\mathop{\times}_{i=1}^{n} A_i: A_i \in \mathscr{A}_i,\ 1 \le i \le n\Bigr\},$$

while if $n = \infty$, take

$$\mathscr{G} = \bigcup_{m=1}^{\infty} \Bigl\{\mathop{\times}_{i=1}^{\infty} A_i: A_i \in \mathscr{A}_i,\ 1 \le i \le m, \text{ and } A_i = \Omega_i,\ i > m\Bigr\}.$$

Then $\sigma(\mathscr{G}) = \mathop{\times}_{i=1}^{n} \mathscr{A}_i$ and by the prior lemma

$$\sigma(X^{-1}(\mathscr{G})) = X^{-1}(\sigma(\mathscr{G})) = X^{-1}\Bigl(\mathop{\times}_{i=1}^{n} \mathscr{A}_i\Bigr).$$

Since $X_i^{-1}(\mathscr{A}_i) \subset \mathscr{A}$ for each $i$, $X^{-1}(\mathscr{G}) \subset \mathscr{A}$, whence

$$\mathscr{A} \supset \sigma(X^{-1}(\mathscr{G})) = X^{-1}\Bigl(\mathop{\times}_{i=1}^{n} \mathscr{A}_i\Bigr). \qquad \square$$


Next, if $\mathop{\times}_{i=1}^{2} (\Omega_i, \mathscr{A}_i)$ is a product-measurable space, define for $A \subset
\Omega_1 \times \Omega_2$

$$A^{(1)}(\omega_1) = \{\omega_2: (\omega_1, \omega_2) \in A\} \qquad \text{for } \omega_1 \in \Omega_1,$$
$$A^{(2)}(\omega_2) = \{\omega_1: (\omega_1, \omega_2) \in A\} \qquad \text{for } \omega_2 \in \Omega_2.$$

The sets $A^{(1)}(\omega_1)$ and $A^{(2)}(\omega_2)$ are called sections of $A$ (at $\omega_1$ and $\omega_2$ res-
pectively).

Theorem 2. Let $\mathop{\times}_{i=1}^{2} (\Omega_i, \mathscr{A}_i)$ be a product measurable space.

i. For $A \in \mathscr{A}_1 \times \mathscr{A}_2$ and $\omega_j \in \Omega_j$, the sections $A^{(j)}(\omega_j) \in \mathscr{A}_{3-j}$, $j = 1, 2$.
ii. If $T$ is a measurable transformation from $\mathop{\times}_{i=1}^{2} (\Omega_i, \mathscr{A}_i)$ to a measurable
space $(\Omega, \mathscr{A})$, then for every $\omega_2 \in \Omega_2$, $T(\omega_1, \omega_2)$ defines a measurable
transformation from $(\Omega_1, \mathscr{A}_1)$ to $(\Omega, \mathscr{A})$.

PROOF. Let

$$\mathscr{G} = \{A: A \in \mathscr{A}_1 \times \mathscr{A}_2,\ A^{(1)}(\omega_1) \in \mathscr{A}_2 \text{ for each } \omega_1 \in \Omega_1\},$$
$$\mathscr{D} = \{D: D = D_1 \times D_2 \text{ where } D_i \in \mathscr{A}_i,\ i = 1, 2\}.$$

Then $\mathscr{A}_1 \times \mathscr{A}_2 = \sigma(\mathscr{D})$ by definition. For $D = D_1 \times D_2 \in \mathscr{D}$, $D^{(1)}(\omega_1) = D_2$
or $\emptyset$ according as $\omega_1 \in D_1$ or not. Hence $\mathscr{G} \supset \mathscr{D}$. It is easy to verify that the
section of a union (resp. difference) is the union (resp. difference) of the sec-
tions, i.e., for $\bigcup_{1}^{\infty} A_n \subset \Omega_1 \times \Omega_2$ and all $\omega_1 \in \Omega_1$

$$(A_1 - A_2)^{(1)}(\omega_1) = A_1^{(1)}(\omega_1) - A_2^{(1)}(\omega_1), \qquad \Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr)^{(1)}(\omega_1) = \bigcup_{n=1}^{\infty} A_n^{(1)}(\omega_1).$$

Therefore, $\mathscr{G}$ is a σ-algebra and so $\mathscr{G} \supset \sigma(\mathscr{D}) = \mathscr{A}_1 \times \mathscr{A}_2$, proving (i) for
$j = 1$. Similarly for $A^{(2)}(\omega_2)$.

To prove (ii), let $B \in \mathscr{A}$. Then $T^{-1}(B) \in \mathscr{A}_1 \times \mathscr{A}_2$ and for every $\omega_2 \in \Omega_2$,
by (i), $\{\omega_1: T(\omega_1, \omega_2) \in B\} = (T^{-1}(B))^{(2)}(\omega_2) \in \mathscr{A}_1$. Therefore, $T(\omega_1, \omega_2)$ is
measurable from $(\Omega_1, \mathscr{A}_1)$ to $(\Omega, \mathscr{A})$ for every $\omega_2 \in \Omega_2$. $\square$

Since outcomes of interest in a chance experiment can usually be quantized,
numerical-valued functions play an important role in probability theory.
A measurable transformation $X$ from a measurable space $(\Omega, \mathscr{A})$ to the
Borel line $(R, \mathscr{B})$ is called a (real) measurable function on $(\Omega, \mathscr{A})$. Since $\mathscr{B}$ is
the σ-algebra generated by the class of intervals $[-\infty, x)$, $-\infty < x < \infty$, or
by $(x, \infty]$, $-\infty < x < \infty$, it follows from Lemma 1 that a real-valued func-
tion $X$ on $\Omega$ is measurable iff for every finite $x$, $\{X(\omega) < x\} \in \mathscr{A}$ or
$\{X(\omega) > x\} \in \mathscr{A}$, $x \in (-\infty, \infty)$.

In the special case of a measurable transformation $f$ from the Borel line
$(R, \mathscr{B})$ to itself, $f$ is termed a Borel or Borel-measurable function. Since every
open set in $(-\infty, \infty)$ is a countable union of open intervals and every open
interval is a Borel set, every continuous function on $(-\infty, \infty)$ is a Borel
function. Hence a real function on $[-\infty, \infty]$ with range $[-\infty, \infty]$ which is
continuous on $(-\infty, \infty)$ is always a Borel function.

Similarly, for any finite or infinite interval $J \subset R$, measurable transforma-
tions from $(J, \mathscr{B} \cdot J)$ to $(R, \mathscr{B})$ are termed Borel functions or Borel functions
on $J$. Analogously, for any finite or infinite rectangle $I^n \subset R^n$, measurable
functions from $(I^n, \mathscr{B}^n \cdot I^n)$ to $(R, \mathscr{B})$ are called Borel functions of $n$ variables
or Borel functions on $I^n$. Since every open set of $(-\infty, \infty)^n$ is a countable
union of open rectangles, continuous functions on $(-\infty, \infty)^n$ are likewise
Borel functions.

Let $(\Omega, \mathscr{A})$ be a measurable space and $X$, $Y$ real functions on $\Omega$. A complex-
valued function $Z = X + iY$ on $\Omega$ is called $\mathscr{A}$-measurable or simply measurable
whenever both $X$ and $Y$ are $\mathscr{A}$-measurable.

If $\{X_n, n \ge 1\}$ are real measurable functions on $(\Omega, \mathscr{A})$, the set equality
$\{\omega: \sup_{n \ge 1} X_n(\omega) > x\} = \bigcup_{n=1}^{\infty} \{\omega: X_n(\omega) > x\}$ reveals that $\sup_{n \ge 1} X_n$ is a
measurable function. Analogously, $\inf_{n \ge 1} X_n$, $\varlimsup_{n \to \infty} X_n = \inf_{k \ge 1} \sup_{n \ge k} X_n$,
and $\varliminf X_n$ are all measurable. In particular, $\max(X_1, X_2)$ and $\min(X_1, X_2)$
are measurable. Since an identically constant function is trivially measurable,
it follows that if $X$ is a real measurable function, so are its positive and
negative parts defined by

$$X^+ = \max(0, X) \quad \text{and} \quad X^- = \max(0, -X).$$

Measurability of the sum (if defined) of two measurable functions is a simple
consequence of the relation (Exercise 1.2.9)

$$\{\omega: X_1(\omega) + X_2(\omega) < x\} = \bigcup_{\text{rational } r} \{\omega: X_1(\omega) < r\} \cdot \{\omega: r < x - X_2(\omega)\}.$$

Thus, $X = X^+ - X^-$ is measurable iff $X^+$ and $X^-$ are measurable.

Clearly, if $n$ is a positive integer and $c$ is any constant, measurability of $X$
implies that of $X^n$ and $cX$. Measurability of the product of two measurable
functions is then a simple consequence of the identity

$$X_1 X_2 = \tfrac{1}{4}[(X_1 + X_2)^2 - (X_1 - X_2)^2].$$

Likewise, the ratio of two measurable functions (if defined) is measurable.

If $\{X_n, n \ge 1\}$ are real measurable functions, measurability of $\varliminf_{n \to \infty} X_n$
and $\varlimsup_{n \to \infty} X_n$ and Exercise 1.4.5 ensure that $\lim_{n \to \infty} X_n$ (if it exists) is
measurable.
As an aid in the discussion of integration in later chapters, it is helpful to
define classes of functions with properties mirroring those of a λ-class of sets.

Definition. A family $\mathscr{H}$ of nonnegative functions is called a λ-system if

i. $1 \in \mathscr{H}$,
ii. $X_n \in \mathscr{H}$, $n \ge 1$, $X_n \uparrow X \Rightarrow X \in \mathscr{H}$,
iii. $X_i \in \mathscr{H}$, $c_i$ real, finite, $i = 1, 2$, and $c_1 X_1 + c_2 X_2 \ge 0$
$\Rightarrow c_1 X_1 + c_2 X_2 \in \mathscr{H}$.

Definition. A nonempty family $\mathscr{M}$ of nonnegative functions is called a
monotone system if the following conditions are satisfied:

i. If $X_i \in \mathscr{M}$ and $c_i$ is a finite nonnegative number, $i = 1, 2$, then $c_1 X_1 +
c_2 X_2 \in \mathscr{M}$;
ii. if $X_n \in \mathscr{M}$ and $X_n \uparrow X$, then $X \in \mathscr{M}$.

Note that a λ-system is always a monotone system but not conversely. A
connection between classes of functions and classes of sets beyond that fur-
nished by indicator functions is given next. In the course of the proof, an
arbitrary nonnegative measurable function is shown to be an increasing limit
of simple (see Exercise 1.4.3) nonnegative measurable functions, a pheno-
menon that will reappear frequently in Chapters 4 and 6.

Theorem 3. Let $\mathscr{H}$ be a family of nonnegative functions on $\Omega$ which contains all
indicators of sets of some class $\mathscr{D}$ of subsets of $\Omega$. If either (i) $\mathscr{D}$ is a π-class and
$\mathscr{H}$ is a λ-system, or (ii) $\mathscr{D}$ is a σ-algebra and $\mathscr{H}$ is a monotone system, then $\mathscr{H}$
contains all nonnegative $\sigma(\mathscr{D})$-measurable functions.

PROOF. Set $\mathscr{G} = \{A: I_A \in \mathscr{H}\}$. (i) By hypothesis $\mathscr{G} \supset \mathscr{D}$ and, according to (i),
(ii), (iii) of the definition of a λ-system, $\mathscr{G}$ is a λ-class, whence $\mathscr{G} \supset \sigma(\mathscr{D})$ by
Theorem 1.3.2. If $X$ is a nonnegative $\sigma(\mathscr{D})$-measurable function, then $J_{k,n} =
\{k/2^n \le X < (k+1)/2^n\} \in \mathscr{G}$, $0 \le k \le 4^n - 1$, as does $J_{4^n, n} = \{X \ge 2^n\}$.
Hence, if

$$X_n = \sum_{k=0}^{4^n} \frac{k}{2^n} I_{J_{k,n}},$$

then $X_n \in \mathscr{H}$ by (iii). Clearly $X_n \uparrow X$ and so $X \in \mathscr{H}$ by (ii). In case (ii), where $\mathscr{H}$ is a
monotone system and $\mathscr{D}$ is a σ-algebra, $\mathscr{G} \supset \mathscr{D} = \sigma(\mathscr{D})$ and the rest follows
as before. $\square$
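The simple functions $X_n$ in this proof are the usual dyadic approximants, truncated at $2^n$. The following sketch, an added illustration only (the evaluation point is arbitrary), exhibits the monotone pointwise convergence $X_n \uparrow X$:

# Illustrative sketch of the dyadic approximation in the proof:
# X_n = sum_k (k / 2**n) I_{J_{k,n}}, with J_{4^n,n} = {X >= 2^n}.
def X_n(x, n):
    # value of the n-th approximant at a point where X takes the value x >= 0
    if x >= 2 ** n:
        return float(2 ** n)
    k = int(x * 2 ** n)          # the unique k with k/2^n <= x < (k+1)/2^n
    return k / 2 ** n

x = 2.718281828                  # arbitrary nonnegative value of X
approx = [X_n(x, n) for n in range(1, 9)]
assert all(a <= b <= x for a, b in zip(approx, approx[1:]))
print(approx)                    # [2.0, 2.5, 2.625, 2.6875, ...] increasing to x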

If $X$ is a real $\mathscr{A}$-measurable function so that $X^{-1}(\mathscr{B}) \subset \mathscr{A}$, it is natural to
speak of $X^{-1}(\mathscr{B})$ as the σ-algebra generated by $X$ and hence to denote it by
$\sigma(X)$. Then $X$ is also a measurable function from $(\Omega, \sigma(X))$ to $(R, \mathscr{B})$ and
indeed $\sigma(X)$ is the smallest σ-algebra of subsets of $\Omega$ for which that statement
is true. Moreover, if $g$ is a Borel function on $R$, then $g(X)$ is a real $\mathscr{A}$-measurable
function and $\sigma(g(X)) \subset \sigma(X)$.

Definition. The σ-algebra $\sigma(X_\lambda, \lambda \in \Lambda)$ generated by a nonempty family
$\{X_\lambda, \lambda \in \Lambda\}$ of real measurable functions on $(\Omega, \mathscr{A})$ is defined to be the σ-
algebra generated by the class of sets $\{\omega: X_\lambda(\omega) \in B\}$, $B \in \mathscr{B}$, $\lambda \in \Lambda$, or equiv-
alently by $\{X_\lambda < x\}$, $x \in (-\infty, \infty)$, $\lambda \in \Lambda$.

Clearly, $\sigma(X_\lambda, \lambda \in \Lambda) \subset \mathscr{A}$ and $\sigma(X_\lambda, \lambda \in \Lambda)$ is the smallest σ-algebra
relative to which all $X_\lambda$, $\lambda \in \Lambda$, are measurable.

In particular, for $\Lambda = \{1, 2, \ldots, n\}$ the σ-algebra generated by $X_1, \ldots, X_n$,
namely $\sigma(X_1, \ldots, X_n)$, is the σ-algebra generated by the class of sets $\{X_i \in B_i\}$,
$B_i \in \mathscr{B}$, $1 \le i \le n$, or equivalently by the sets of $\sigma(X_i)$, $1 \le i \le n$. Thus,
$\sigma(X_1, \ldots, X_n) = \sigma(\bigcup_{1}^{n} \sigma(X_i))$.

When $\Lambda = \{1, 2, \ldots\}$, it follows analogously that $\sigma(X_1, X_2, \ldots) =
\sigma(\bigcup_{1}^{\infty} \sigma(X_i))$. Consequently, for $n \ge 2$

$$\sigma(X_1) \subset \sigma(X_1, X_2) \subset \cdots \subset \sigma(X_1, \ldots, X_n),$$
$$\sigma(X_1, X_2, \ldots) \supset \sigma(X_2, X_3, \ldots) \supset \cdots \supset \sigma(X_n, X_{n+1}, \ldots).$$

The σ-algebra $\bigcap_{n=1}^{\infty} \sigma(X_n, X_{n+1}, \ldots)$, which is called the tail σ-algebra of
$\{X_n, n \ge 1\}$, will be encountered in Chapter 3.

If $X_i$ is a measurable function from $(\Omega, \mathscr{A})$ to $(R, \mathscr{B})$, $1 \le i \le n$, then $X_i$
is likewise a (real) measurable function from $(\Omega, \sigma(X_i))$ to $(R, \mathscr{B})$, $1 \le i \le n$,
and, moreover, each $X_i$ is a measurable function from $(\Omega, \sigma(X_1, \ldots, X_n))$ to
$(R, \mathscr{B})$, $1 \le i \le n$. The next theorem characterizes functions which are
$\sigma(X_1, \ldots, X_n)$-measurable.

Theorem 4. Let $X_1, \ldots, X_n$ be real measurable functions on $(\Omega, \mathscr{A})$. A (real)
function on $\Omega$ is $\sigma(X_1, \ldots, X_n)$-measurable iff it has the form $f(X_1, \ldots, X_n)$,
where $f$ is a Borel function on $R^n$.

PROOF. Let $\mathscr{G}$ denote the class of Borel functions on $R^n$ and define

$$\mathscr{H} = \{f(X_1, \ldots, X_n): f \in \mathscr{G},\ f(X_1, \ldots, X_n) \ge 0\},$$
$$\mathscr{D} = \{\{\omega: X_i(\omega) \le x_i,\ 1 \le i \le n\}: -\infty < x_i < \infty,\ 1 \le i \le n\}.$$

Then $\mathscr{D}$ is a π-class of subsets of $\Omega$, $\sigma(\mathscr{D}) = \sigma(X_1, \ldots, X_n)$, and $I_D \in \mathscr{H}$ for
$D \in \mathscr{D}$. Moreover, $\mathscr{H}$ is a λ-system. In fact, $1 \in \mathscr{H}$, and if $f_i(X_1, \ldots, X_n) \in \mathscr{H}$,
$c_i$ is a finite real number for $i = 1, 2$, and

$$c_1 f_1(X_1, \ldots, X_n) + c_2 f_2(X_1, \ldots, X_n) \ge 0,$$

then $c_1 f_1(X_1, \ldots, X_n) + c_2 f_2(X_1, \ldots, X_n) = f(X_1, \ldots, X_n) \in \mathscr{H}$, where

$$f = (c_1 f_1 + c_2 f_2) I_{[c_1 f_1 + c_2 f_2 \ge 0]} \in \mathscr{G}.$$

Now let $f_m(X_1, \ldots, X_n) \in \mathscr{H}$, $m \ge 1$, and $f_m(X_1, \ldots, X_n) \uparrow f(X_1, \ldots, X_n)$.
Set $g_m = \max(f_1, \ldots, f_m)$. Then $g_m \uparrow g \in \mathscr{G}$ and $0 \le f_m(X_1, \ldots, X_n) =
g_m(X_1, \ldots, X_n) \le f_{m+1}(X_1, \ldots, X_n)$. Therefore,

$$f(X_1, \ldots, X_n) = g(X_1, \ldots, X_n) \in \mathscr{H}$$

and $\mathscr{H}$ is a λ-system. By Theorem 3, $\mathscr{H}$ contains all nonnegative $\sigma(\mathscr{D})$-
measurable functions. Hence, if $Y$ is $\sigma(X_1, \ldots, X_n)$-measurable, both $Y^+$
and $Y^-$ are in $\mathscr{H}$ so that $Y = Y^+ - Y^- = f(X_1, \ldots, X_n)$ for some Borel
function $f$ on $R^n$.

Conversely, let $f$ be a Borel function on $R^n$ and

$$X(\omega) = (X_1(\omega), \ldots, X_n(\omega)), \qquad \omega \in \Omega.$$

Since $X_i$ is measurable from $(\Omega, \sigma(X_1, \ldots, X_n))$ to $(R, \mathscr{B})$ for $1 \le i \le n$,
$X$ is measurable from $(\Omega, \sigma(X_1, X_2, \ldots, X_n))$ to $(R^n, \mathscr{B}^n)$ by Theorem 1. By
hypothesis, $f$ is measurable from $(R^n, \mathscr{B}^n)$ to $(R, \mathscr{B})$, whence $f(X_1, \ldots, X_n) =
f(X)$ is measurable from $(\Omega, \sigma(X_1, \ldots, X_n))$ to $(R, \mathscr{B})$, that is, $f(X_1, \ldots, X_n)$
is $\sigma(X_1, \ldots, X_n)$-measurable. $\square$
In similar fashion one can prove

Theorem 5. If $\{X_\lambda, \lambda \in \Lambda\}$ is an infinite family of real measurable functions on
$(\Omega, \mathscr{A})$, then a real measurable function on $\Omega$ is $\sigma(X_\lambda, \lambda \in \Lambda)$-measurable iff it is
of the form $f(X_{\lambda_1}, X_{\lambda_2}, \ldots, X_{\lambda_n}, \ldots)$, where $\lambda_i \in \Lambda$, $i = 1, 2, \ldots$, and $f$ is an
infinite-dimensional Borel function.

EXERCISES 1.4

1. Let $T: \Omega_1 \to \Omega_2$ and $A_\lambda \subset \Omega_1$ for $\lambda \in \Lambda$. Show that $T(\bigcup_\lambda A_\lambda) = \bigcup_\lambda T(A_\lambda)$ but that
$T(A_\lambda - A_{\lambda'})$ need not equal $T(A_\lambda) - T(A_{\lambda'})$, where $T(A) = \{T(\omega_1): \omega_1 \in A\}$,
$T(\mathscr{A}) = \{T(A): A \in \mathscr{A}\}$. If $T$ is one-to-one and onto, then $\mathscr{A}$ a σ-algebra on $\Omega_1$
entails $T(\mathscr{A})$ is a σ-algebra on $\Omega_2$.

2. In the notation of Exercise 1, prove that $A \subset T^{-1}(T(A))$ with set equality holding
if $T$ is one-to-one. Also, $T(T^{-1}(B)) \subset B \subset \Omega_2$ with equality if $T$ is onto.

3. A function $X$ on $(\Omega, \mathscr{F}, P)$ is called simple if for some finite integer $n$ and real
numbers $x_1, \ldots, x_n$ it is representable as $X = \sum_{i=1}^{n} x_i I_{A_i}$ for $\{A_i, 1 \le i \le n\}$ a
disjoint subclass of $\mathscr{F}$. Prove that any nonnegative measurable function is an
increasing limit of nonnegative simple functions.

4. Prove that $(X + Y)^+ \le X^+ + Y^+$, $(X + Y)^- \le X^- + Y^-$, $X^+ \le (X + Y)^+ + Y^-$.
Also if $B_n \uparrow B$, then $X^{-1}(B_n) \uparrow X^{-1}(B)$.

5. Let $X$ and $Y$ be real measurable functions on $(\Omega, \mathscr{A})$ and $c$ any real constant. Show
that $\{\omega: X(\omega) < Y(\omega) + c\}$, $\{\omega: X(\omega) \le Y(\omega) + c\}$, $\{\omega: X(\omega) = Y(\omega)\} \in \mathscr{A}$.

6. Prove that if $X$ is a real measurable function on $(\Omega, \mathscr{A})$, so is $|X|$. Is the converse
true?

7. If $X$ and $Y$ are real functions on $(\Omega, \mathscr{A})$, then $X + iY$ is measurable iff $(X, Y)$ is
measurable from $(\Omega, \mathscr{A})$ to $(R^2, \mathscr{B}^2)$.

8. If $(\Omega_i, \mathscr{A}_i)$, $i = 1, 2$, are measurable spaces and $X: \Omega_1 \to \Omega_2$, then $X$ is a measurable
transformation from $(\Omega_1, \mathscr{A}_1)$ to $(\Omega_2, \mathscr{A}_2)$ provided $X^{-1}(\mathscr{D}) \subset \mathscr{A}_1$ for some class
$\mathscr{D}$ of sets for which $\sigma(\mathscr{D}) = \mathscr{A}_2$.

9. Prove that any real monotone function on $R$ is Borel measurable and has at most
a countable number of discontinuities.

10. For any linear Borel set $B$ and r.v. $X$ prove that $\sigma(X I_{[X \in B]}) = X^{-1}(B) \cap \sigma(X)$.

1.5 Additive Set Functions, Measures, and Probability Spaces

Let $\Omega$ be a space and $\mathscr{A}$ be a nonempty class of subsets of $\Omega$. A set function $\mu$
on $\mathscr{A}$ is a real-valued function defined on $\mathscr{A}$. If $\mu(A)$ is finite for each $A \in \mathscr{A}$, $\mu$
is said to be finite, and if there exists $\{A_n, n \ge 1\} \subset \mathscr{A}$ such that $\mu(A_n)$ is
finite for each $n$ and $\bigcup_{n=1}^{\infty} A_n = \Omega$, $\mu$ is said to be σ-finite on $\mathscr{A}$.

If $A \subset \Omega$ and $A = \bigcup_{n=1}^{m} A_n$, where $\{A_n, n = 1, \ldots, m\}$ is a disjoint
subclass of $\mathscr{A}$, the latter subclass is called a finite partition of $A$ in $\mathscr{A}$. If
$\{A_n, n = 1, 2, \ldots\}$ is a disjoint subclass of $\mathscr{A}$ and $\bigcup_{n=1}^{\infty} A_n = A$, it is called a
σ-partition of $A$ in $\mathscr{A}$.

Definition. A set function $\mu$ on $\mathscr{A}$, denoted $\mu(A)$ or $\mu\{A\}$ for $A \in \mathscr{A}$, is additive
(or more precisely, finitely additive) if for every $A \in \mathscr{A}$ and every finite
partition $\{A_n, n = 1, \ldots, m\}$ of $A$ in $\mathscr{A}$, $\sum_{1}^{m} \mu(A_n)$ is defined and

$$\mu(A) = \sum_{n=1}^{m} \mu(A_n);$$

moreover, $\mu$ is σ-additive or countably additive or completely additive if for
every $A \in \mathscr{A}$ and every σ-partition $\{A_n, n = 1, 2, \ldots\}$ of $A$ in $\mathscr{A}$, $\sum_{1}^{\infty} \mu(A_n)$ is
defined and

$$\mu(A) = \sum_{n=1}^{\infty} \mu(A_n).$$

Note in the σ-additive case that the definition precludes conditional con-
vergence of the series. Clearly, if $\emptyset \in \mathscr{A}$, countable additivity implies finite
additivity.

If an additive set function is finite on some set A of an algebra d, it is


necessarily finite on all BEd with B c A. Examples of set functions that are
additive but not u-additive appear in Exercises 1.5.6 and 1.5.7.

Definition. A nonnegative u-additive set function p, on a class d containing 0


with p,{0} = 0 is called a measure. If p, is a measure on au-algebra ff of
subsets of 0, the triplet (0, .<Y;, p,) is called a measure space. A measure space
(0, ff, P) is a probability space if prO} = 1.

From a purely formal vantage point the prior definition relegates prob-
ability theory to a special case of measure theory. However, such important
notions as independence (Chapter 3) and other links between probability and
the real world have nurtured probability theory and given it a life and direc-
tion of its own.
In addition to the basic property of u-additivity, a measure p, on an algebra
d is monotone, that is, p,{A.} ~ p,{A 2 } whenever Al c A 2 , AjEd, i = 1,
2 (Exercise l),and, moreover,subadditive, that is, ifUI" A j E d, A j E d,j ~ 1,
then

(1)

as noted in (iii) of the forthcoming Theorem 2. A prominent example of a


measure is ordinary "length" on the class d of all finite half-open intervals
[a, b), i.e., p,{[a, b)} = b - a. The extension of the definition of length to
u(d) (= class of all Borel sets) is known as Lebesgue measure and will be
encountered in Chapter 6.
Probability spaces will underlie most if not all ofthis book. In a probability
space (0, ff, P), the sets A E.? are called events and the nonnegative, real
number P{A} is referred to as the probability of the event A. The monotone
property of a measure ensures that for every event A

0= P{0} ~ P{A} ~ prO} = 1. (2)

Thus, in probability theory "event" is simply a name attached to an


element of a u-algebra of subsets of a basic reference set 0, frequently called
the sample space. From an intuitive or applicational standpoint an event is the
20 I Classes of Sets. Measures. and Probability Spaces

abstract counterpart of an observable outcome of a (not completely deter-


mined) physical experiment. The numerical-valued probability of an event is
in some sense an idealization of the intuitive notion of a long-run frequency as
attested to by (2) and the additivity property.
Events of probability zero are called null events. A real-valued measurable
function X on a probability space (0, .:1', P) is called a random variable
(abbreviated r.v.) if {w: IX(w)1 = oo} is a null event. If some property obtains
except on a null event, this property is said to obtain almost surely (abbreviated
a.s.) oralmost certainly (abbreviated a.c.) or with probability one. Hence, a r.v.
on a probability space (0, :Y', P) is just an a.c. finite :Y' -measurable function
on O.
It is an extremely useful state of affairs that a probability or more generally
a measure defined on an algebra s1 or even a semi-algebra (definition forth-
coming) may be uniquely extended to a(s1). The proof will be deferred to
Chapter 6 but a first step in this direction is Theorem 1.

Definition. A semi-algebra Y' is a n-c1ass of subsets of 0 such that 0 E Y', 0 E Y'


and for each A E Y' there is a finite partition of A C in Y'.

Let Y' be a semi-algebra. It follows easily that


1. the class <§ of all finite unions of disjoint sets of Y' is the algebra generated
by Y' (clearly, <§ c s1(Y') and <§ is an algebra containing /:/' and hence
s1(Y'));
It. for {A., n = 1, ... , m} c /:/', there is a finite partition of A 1 A'2'" A~
in Y';
111. for each A E Y' and each countable class {A., n = 1,2, ...} c Y' such
that Uf A. :::> A, there is a a-partition {B., n = 1,2, ... } of A in ,C/' for
which each B. is a subset of some Am (write A = U. AA. as a disjoint
union and utilize (ii».
Let <§ and Yf be two classes elf subsets of 0 with <§ c Yf. If J1 and v are
set functions defined on <§ and .n" respectively such that J1(A) = v(A) for
A E <§, v is said to be an extension of p to .Yf, and J1 the restriction of v to <§,
denoted by J1 = v I,~.

Theorem 1. If J1 is a nonnegative additive set function on a semi-algebra //', then


there exists a unique extension v of J1 to the algebra s1 generated by .C/' such
that v is additive. Moreover, if J1 is a-additive on Y', so is von s1.

PROOF. Since s1 is the class of all finite unions of disjoint sets in Y', every
A E s1 has a finite partition {A., 1 ~ n ~ m} in .C/'. For such an A, define
1.5 Additive Set Functions, Measures, and Probability Spaces 21

Then v is consistently defined on d, since if A Ed has two distinct finite


partitions {An, 1 ~ n ~ m} and {B j , 1 ~ j ~ k} in g,
m m m "
v(A) = L Jl(A n) = L Jl(AA n) = L L Jl(AnB)
n;1 n;1 n;1 j;1
k m k k
= L L Jl(AnB) = L Jl(AB) = L Jl(B).
j;ln;1 j;1 j;1
It is easy to see that v is additive on d. The uniqueness of v follows from the
fact that if v* is additive on d and v* I." = Jl, then for any finite partition
{An, 1 ~ n ~ m} in g of A E d,
m m
v*(A) = L v*(A n) = L Jl(A n) = v(A).
n; 1 n; 1

Suppose next that Jl is IT-additive on g, {An, n ~ 1} is a IT-partition in d of


AEdand{C n, 1 ~ n ~ m}isapartitionofAing.Foreachn,let{Bj,jn_1 <
j ~jn} be a finite partition of An in g, wherejo = O. Then {Bj,j ~ 1} is a
IT-partition of A in g and
00 00 in 00 CO

L v(A n) = L L Jl(B j ) = L Jl(B) = L Jl(AB)


1 n; 1 j; j"_1 + 1 1 1
00 m m 00

= L L Jl(CnB j ) = L L Jl(CnB)
j;1 n;1 n;1 j;1
m

= L Jl(Cn) = v(A). o
1

The subadditive property of a measure and alternative characterizations


of countable additivity appear in

Theorem 2. Let Jl be a nonnegative additive set function on an algebra d and


{An, n ~ O} c d.
1. if {An, n ~ l} is a disjoint class and Uf An C A o , then
00

Jl(A o) ~ L Jl(A n)·


1

11. if A o c Ui AJar some m = 1,2, ... , then


m

111. Jl is IT-additive iff for every increasing sequence {An' n ~ 1} c d with


limn An = A Ed,
(3)
n
22 I Classes of Sets, Measures. and Probability Spaces

In this case Ji. is subadditive on.xl, i,e" ifUf AjE.xI, AjE.xI,j ~ 1, then

Ji.(Q A j) ~ ~ Ji.(A).
iv. ifJi. is a-additive, then for every decreasing sequence An E .xl with Ji.(A t} < 00
and limn An = A E .xl,
(4)

Conversely, iffor every decreasing sequence An E.xI with limn An = 0,


(5)

then Ji. is a-additive,


PROOF, (i) Since (i) is obvious if Ji.(A o} = 00, let Ji.(A o} < 00. Now A o -
U,;, AnE.~ (m = 1,2", ,), whence by subtractivity (see Exercise 1.5.1)
o ~ Ji. ( Ao - YAn) = Ji.(A o} - Ji. (y An) = Ji.(A o} - ~ Ji.(A n},
m 00

Ji.(A o} ~ L Jt(A n} ~ L Ji.(A n}·


1 t
(ii) Since A o = U,;,
AoA n = AoA , U AoA2A~ U", U AoAmA~-,,,, A~,
by additivity and monotonicity of Ji.

Ji.(A o} = p(y AoA n) = p(AoA t } + p(A o A 2AD + ...


+ p(AoAmA~-l AD
:$: Ji.(A t } + Ji.(A 2} + + p(A m}·
(iii) If (3) holds for every increasing sequence {An, n ~ I} c.xl with
lim An = A E.xI, then for any a-partition {B n, n ~ l} in .xl of B E s~,

p(B} = p(Q Bn) = P(li: YBn) = Ii: p(y Bn)


m 00

= lim L p(Bn} = L p(Bn}·


m t t

Hence, p is a-additive. Conversely, let Ji. be a-additive and An' n ~ 1, an


increasing sequence of .xl sets with limn An = A E.xI. Then

p(A} = p(Q An) = p(A t U (A 2 - At) u (A 3 - A 2) u···}

= p(A , } + p(A 2 - At} + p(A 3 - A 2} + ...


= lim(Ji.(A t } + p(A 2 - AI} + ... + p(A n - A n- t »
= lim p(A n }.
n
1.5 Additive Set Functions, Measures, and Probability Spaces 23

The second part of (iii) follows from the first part and (ii).
(iv) If J.l is cr-additive, and An' n ~ 1, is a decreasing sequence of.91 sets with
J.l(A I) < 00 and limn An = A Ed, then Al - An is an increasing sequence in
.91, whence by (iii)

,u(A , ) - J.l(A) = J.l(A I - A) = J.l(li~(AI - An»)

= lim J.l(A. - An) = J.l(A , ) - lim J.l(A n).


n

Since J.l(A 1 ) < 00, (4) holds. Conversely, if {An} is any cr-partition in .91 of
some set A Ed, then Ui'=nAjE.9I, n~ 1, and Ui'=nAj!fIn1An=0,
whence by hypothesis lim J.l{Ui'=n A j} = O. By finite additivity,

J.l{A} = n~IJ.l{Aj} + J.lLVn Aj }.


and so countable additivity follows upon letting n ---> 00. o
Corollary 1. A nonnegative additive set function J.l on a semi-algebra Y satisfies
(i) and (ii) on Y. Moreover, a measure J.l on Y is subadditive on Y.
PROOF. Extend J.l to the algebra .9I(Y) generated by Y via Theorem 1. By
Theorem 2, the set function (resp. measure) J.l satisfies (i) and (ii) (resp.
cr-additivity, hence subadditivity) on .9I(Y) and afortiori on Y. 0

A finite measure space is a measure space (0, ff, J.l) with J.l finite. In such a
case, the finiteness proviso in Theorem 2 (iv) is superfluous.

Corollary 2. If (0, ff, P) is a probability space, P{lim n An} = limn P{A n} for
every sequence of sets An E:F whose limit exists.

Theorem 3. Let (O,:F, P) be a probability space with :F = cr(~), where ~ is an


algebra of subsets of o. Then for A E:F and all e > 0, there exists a set Bt E ~
such that P{A A Bt } < e and so IP{A} - P{Bt}1 < e.
PROOF. Let .91 = {A: A E ff and for every e> 0 there exists B, E C§ with
P{A A Br.} < e}. Then s:1 :=> C§, and moreover .91 is a cr-algebra since, setting
B = B",
A AB = A AB
C C
=> A EdC
if A Ed,
co 00 00 00

U An A U Bn U (An A Bn) => U An Ed


C if A jEd, j ~ 1,
• • 1 1

(recall Exercise 1.2.1), noting that

!~: p{Q AjA jVI Bj } = p{Q AjA jVI Bj }


Hence .91 :=> cr(C§) = ff. o
24 I Classes of Sets. Measures. and Probability Spaces

EXERCISES 1.5
1. Show that a finite measure II on a class .'# is (i) subtractive, i.e.. A; E .91, i = 1, 2.
AI C A 2 and A z - AI E.W implies/l(Az - AJl = /l(A z ) - /l(A I ), and (ii) monotone,
i.e., /l(A I) =:; /l(A 2 ) with A; as in (i). In view of (i), if there is one set in d wIth finite
measure, the proviso 11(0) = 0 is automatically satisfied.

2. If (Q, sd, /l) is a measure space, prove that /l(lim An) :5 lim /l(A n) for An E d.
Analogously, if /l(U~n A;) < 00, some n ~ I, then /l(rrm An) ~ rrm /l(A n).

3. IfQ = R = [ - x,x], then Y' = {{ + x}, R, [a, b), - x =:; a :5 b :5 x l is a semi-


algebra, not an algebra. Also, if (Qj, d j ), i = 1,2, are measurable spaces,

{AI x Az:AjEdi,i = 1,2}


is a semi-algebra. Also, if n ;:: 2 and 9; is a semi-algebra of subsets of Qi, 1 =:; i =:; n,
then Y' = {X~=1 Si: S; E 9;, 1 =:; i =:; n is a semi-algebra of subsets of X~~1 Qi'
4. Prove that the class of all finite unions of disjoint sets of a semi-algebra Y' is an
algebra.
5. If (Q, d, P) is a probability space and {An, n ~ I} is a sequence of events,

(i) verify that

p{lim An}
11-+00
= lim p{
n ..... oo
n
J=n
A j }, p{lim An}
n-oo
= lim p{
n-oo
UA
J=n
j}

(ii) show that p{n:,= 1 An} = I if PiAn} = I, n ~ 1.


6. Let Q = {positive integers}, d = {A: A c Q}. Define /ll {A} = number of elements
in A and !J2 {A} = 0 or 00 according as A is a finite or infinite set. Note that if An =
{n}, then Ui=n Aj ! lim An = 0 and /l1 {U;:, A j } = 00. Since /ll is a bona fide
measure on d, called counting measure, the finite hypothesis in the first part of (iv)
of Theorem 2 is indispensable. The set function /l2 is additive but not a-additive.
Thus, if in the second part of (iv), condition (5), were stipulated only when An! 0
and /l{A n} < 00, some n ~ 1, then a finiteness requirement for /l would also be
necessary.

7. For n, d, /ll as in Exercise 6, define Nn(A) = /ll {A[I, n]} and

. N.(A) }
£t = { A Ed: /l{A} = 11m - - exists.
"-+00 n
Prove that £t is closed under complementation and finite disjoint unions but that
nonetheless £t is not an algebra. Also /l {A}, called the asymptotic density of A, is
additive but not a-additive. Hint: Let Bk = {odd integers in [2 2\ 22k + I)} and
B~ = {even integers in [2 2k - 1, 2 Zk )}. If B = Ui"= 1(B k U B~) and A = {odd integers
ofQ}, then A E~, BE~, but AB¢~.

8. Letfbe a monotone increasing function on Q = [0, I) such that 0 =:; f =:; I, and ~
the class of finite unions of disjoint intervals [a, b) c Q.

i. ~ is an algebra.
1.6 Induced Measures and Distribution Functions 25

ii. Put Il(A) = D~ \ (f(b j) - f(a}) for A = Uj~ I [aj' b} E <§, where 0::0; a l <
b l ::0; a2 < b 2 ::0; ... ::0; a. < b. ::0; I. Then Il is additive on <§.
iii. If Il is u-additive on <§, thenfis left continuous, i.e.,«£) = limo <';-0 f(t - (i) for
every t E Q.
iv. Iffis left continuous, then Il is u-additive on r.§. Hint: For part (iv), see the proof
of Lemma 6.1.1.

9. Let (Q, ff, Il) be a measure space and r.§ a sub-u-algebra of .~. Prove that
<§* = {G fj, N: G E <§, N E .~, Il{N} = o}

is a sub-u-algebra of ff. Then <§* is called the completion of <§. When <§ = ff then
<§* = ff and so the completion of ff is rather defined by ff* = {A fj, M: A E .~,
MeN E .~,Il{N} = O}. If Il* on ff* is defined by 11*{A 11 M} = I1{A}, check that
(Q, .~*, Il*) is a measure space. It is called the completion of (Q, .r;;, 11).

10. (Cantor ternary set.) Let II = III = (t,~) be the "open middle third" of I = [0, I].
Analogously, for j ~ 2, let I j • b k = I, ... , 2j -\, be the open middle third of each of
the 2j - 1 intervals of I - UI=! Ij, where I j = Uf~-" I j . k , j ~ I. Show that the
Cantor set C = I - U~ \ I j satisfies: (i) x E C iff the digit I is not needed in the
triadic expansion x = L~ I xjr\ where, in general, X j = 0, 1,2; (ii) C is un-
countable; (iii) C is perfect (all its points are limit points); (iv) C is nowhere dense
(the closure of C contains no nondegenerate interval); (v) if Il is a measure on the
Borel sets of [0, I] whose restriction to intervals coincides with "length" (such a
" Lebesgue measure" appears in Section 6.1), then Il {C} = O.

1.6 Induced Measures and


Distribution Functions
Consider a measurable transformation X from a measure space (0, d, J-l) to a
measurable space (n, d), i.e., X: 0 -+ nand X- 1(d) c d. The measure J-l
on ~r,/ induces in a very natural fashion a measure [l on d, referred to as the
induced measure, denoted by J-lx and defined for all A Ed by
J-lx{ A} = [l{ A} = J-l{ X - 1(A)}.

It is readily verified thatJ-lx isa measureond and consequently that (n, d, J-lx)
is a measure space.
In particular, a random variable X on a probability space (0, ff, P)
engenders a new (Borel) probability space (R, 14, P x), where 14 is the class of
Borel sets of R = [ - 00, 00]. To focus attention on the latter, i.e., to consider
X of primary interest, is tantamount to restricting the former to (0, a(X), P).
Associated with any random variable X on a probability space (0, ff, P) is a
real function F x called the distribution function of the random variable X and
defined for all x E[ -00,00] by
Fx(x) = P{w: X(w) < x} = P{X- 1(( -00, x»)} (1)
26 I Classes of Sets, Measures. and Probability Spaces

Definition. A real-valued function G(x) on R = [- 00,00] is called a dis-


tribution function (abbreviated dJ,) if
I. G is nondecreasing,
II. G is left continuous, i.e.,
lim G(y) = G(x), all x e R,

iii. G( - 00) = Iimx~ _ 00 G(x) = 0, G(oo) = Iimx~oo G(x) = 1.

It follows from Theorem 1.5.2 and the fact that a r.v. is finite a.s., that the
"distribution function of any r.v." is a distribution function in the sense ofthe
prior definition. Conversely, for every distribution function G there exists a
r.v. X on some probability space (n, §', P) such that Fx = G. The prooffor a
special case is given in this section but a general treatment will be deferred to
Section 6.3.
Analogously to tbe preceding, n random variables X I' ... , X n on a
probability space (n, §', P) induce via the map X = (X I"'" X n ), the new
(Borel) probability space (R n, 14n, P x), where 14n is the class of n-dimensional
Borel sets of R n and P x is the induced measure. Rather than shift from the basic
probability space, one may restrict attention to (n, a(X I" .. , X n ), P).
Associated with the random vector X i.e., with the n-tuple of random variables
(X I> ••• , X n), is a real function on Rn called the (joint) distribution function of
(X I' ... , X n ), denoted by F XI, ...,X n and defined by

Fxt, ... ,X'<X I ,···, x n) = P{w: XI < XI"'" X n < X n},


X = (XI"'" xn)e R
n
. (2)

The joint dJ. F XI, .... Xnsatisfies properties akin to (i), (ii), (iii) (see Section 6.3),
but monotonicity in each argument is insufficient and the analogue of (i)
involves a mixed difference. Nonetheless, if a dJ. on Rn is properly defined, it is
again true that the "dJ. of (X I' ... , X n )" is a dJ. on R" and that, conversely,
givenadJ.G(x l , ... , xn)onRn,therealwaysexistrandomvariablesX t ,··" X n
on some probability space whose joint dJ. F XI, .... x n = G.
In the same vein, a sequence {X n , n 2: I} of r.v.s on a probability space
(n, §', P) induces a Borel probability space (ROO, 14 00 , P x) via the map
X = (X I' X 2' ...) or alternatively places in relief (n, a(X n , n 2: 1), P). Note
that the dJ.s of all finite n-tuples ofr.v.s (X it ..... Xi) are determined via

F X/I' .... X/n(XI'···' x n) = P{w: XiI < XI' , Xin < x n }·

It will be proved in Section 6.4 that if the dJ.s Git in are prescribed in a
consistent manner for all choices of indices I .::;; it < < in and all n 2: 1,
there exists a sequence {Xn,n 2: 1} ofr.v.s on a probability space such that
1.6 Induced Measures and Distribution Functions 2.7

AdJ. G on R is called discrete if

G(x) = L
j:Xj<x
Pj' XER, (3)

where Pj > 0 for allj, LalljPj = 1, and S = {Xj: 1 5;j < n + 1 5; oo} is a subset
of( -<X), (0). The associated function

f(X){= Pj for x = Xj' 15;j < n + 1 5; <X)


(4)
=0 for x "# x j
is termed a probability density function (abbreviated p.dJ.) on

S = {x/ 1 5; j < n + 1 5; oo}.

Clearly, a probability density function is completely determined by {Xj' Pj'


1 5; j < n + 1 5; oo}. Typically, S is the set of positive or nonnegative integers
or some finite subset thereof. This will be the case with the binomial and
Poisson dJ.s occurring in Chapter 2.
To construct a probability space (n,~, P) and a r.v. X on it whose d.f.
Fx is equal to a preassigned discrete dJ. G say (as in (3», it suffices to choose
n = s, ~ = class of all subsets of n, P {A} = Lj'XjE A. Pj' and X(w) = w. Note
that then P{w: X(w) = Xj} = Pj' 1 5;j < n + 1 5; 00, where L~ Pj = 1.
AdJ. G is called absolutely continuous if there exists a Borel function q on
R = [ - 00, 00] such that

G(x) = f 00 g(t)dt, XER. (5)

The associated function g is termed a density function and necessarily


satisfies

g ~ 0, a.e., f~oo g(t)dt = 1. (6)

Here, a.e. abbreviates" almost everywhere" (the analogue of "almost cer-


tainly" when probability is replaced by Lebesgue measure) and both in-
tegrals will be interpreted in the Lebesgue sense. The Lebesgue integral,
Lebesgue measure, and related questions will be discussed in Chapter 6.
A third type of dJ., called singular, is defined in Chapter 8. The most
general dJ. on R occurring in probability theory is a (convex) linear combina-
tion of the three types mentioned (see Section 8.1).
Distribution functions occupy a preeminent position in probability
theory, which is primarily concerned with properties of LV.S verifiable via
their dJ.s.
Random variables with discrete dJ.s as in (3) will be called discrete r.v.s
(with values x) while LV.S with absolutely continuous dJ.s as in (5) will be
called absolutely continuous r.v.s (with density g). The next chapter deals with
some important special cases thereof.
28 I Classes of Sets, Measures, and Probability Spaces

Theorem 1. If(X I " ' " X n ) and (YI , ••• , y,,) are random vectors with identical
distribution functions, that is,

then g( X I' ... , X n) and g( YI , ... , y,,) have identical dJ.s for any finite Borel
function 9 on W.
PROOF. It follows from Theorem 1.4.4 that g(X t, ... , X n) and g( YI , ... , Yn)
are r.v.s. Set
rg = {B: BE BIn, P{(X I> ... , X n) E B} = P{(YI , ... , Yn) E Bn,

!0 = {D: D = X[-
J= I
00, c), cj real}.

Then rg =:> !0 by hypothesis, and, moreover, it is easy to verify that rg is a


A-class and !0 is a n-c1ass. By Theorem 1.3.2, rg =:> a(!0) = f!4n. Hence, since
for AE ( - 00, (0), A == {(XI"'" x n) E W: g(x 1 , ••• , x n) < A} E BIn,

P{g(XI, .. ·,Xn) < A} = P{(XI, ... ,Xn)EA} = P{(YI , ... , Yn)EA}


= P{g(YI ,· .. , Yn ) < A}. 0

Corollary 1. If(X I"'" X n ) and (YI , ••• , y") are random vectors with identical
distribution functions, thenfor any Borelfunction 9 on Wand linear Borel set B

EXERCISES 1.6
1. Prove that G as defined in (3) is adJ.; verify that (n, .~, P) as defined thereafter is a
probability space and that F x = G.
2. If P,>. I ~ 0, PES, t E T, and Lie
T Lpes PI'. I = I, where Sand T are countable subsets
of (- 00, Xi), define a probability space and random variables X, Yon it with

Fx(x) = L LP
p<xteT
p ." Fy(Y)= L LP
t<ypeS
p ."

Hint: Take n = S x T.
3. Prove that if H(x) = P {X ~ x}, where X is a r. v. on (n, .~, P), then H is a nondecreas-
ing right-continuous function with H( - 00) = 0, H( + 00) = 1. Some authors define
such functions to be d.f.s.
4. For adJ. F, the set

S(F) = {x: F(x + f:) - F(x - e) > 0 for all e > O}


is called the support of F. Show that each jump point of F belongs to the support and
that each isolated point of the support is a jump point. Prove that S(F) is a closed set
and give an example of a discrete d.f. whose support is ( - 00, (0). Any point x E S(F)
is called a point of increase of F.
References 29

Comments

The notion of a n-class and A-class (Section 3) seems to have originated with
Dynkin (1961).

References
J. L. Doob, "Supplement," Stochastic Processes, Wiley, New York, 1953.
E. B. Dynkin, Theory of Markol' Processes (D. E. Brown, translator), Prentice-Hall,
Englewood Cliffs, New Jersey, 1961.
Paul R. Halmos, Measure Theory, Van Nostrand, Princeton, 1950; Springer-Verlag,
Berlin and New York, 1974.
Felix Hausdorff, Set Theory (J. Aumman et al., translators), Chelsea, New York, 1957.
Stanislaw Saks, Theory of the Integral, (L. C. Young, translator), Stechert-Hafner,
New York, 1937.
2
Binomial Random Variables

The major theorems of probability theory fall into a natural dichotomy-


those which are analytic in character and those which are measure-theoretic.
In the latter category are zero-one laws, the Borel-Cantelli lemma, strong
laws oflarge numbers, and indeed any result which requires the apparatus of a
probability space.
On the other hand, findings such as central limit theorems, weak laws of
large numbers, etc., may be stated entirely in terms of distribution functions
and hence are intrinsically analytic. The fact that these distribution functions
(dJ.s) are frequently attached to random variables (LV.S) does not alter the fact
that the underlying probability space (on which the r.v.s are defined) is of no
consequence in the statement of the analytic result. Indeed, it would be
possible, although in many cases unnatural, to express the essential finding
without any mention of LV.S (and afortiori of a probability space).
In presenting theorems, distributions will generally be attached to LV.S.
For analytic results, the r.v.s are inessential but provide a more colorful and
intuitive background. In the case of measure-theoretic results, it suffices to
recognize that a probability space and LV.S on it with the given distributions
can always be constructed.

2.1 Poisson Theorem, Interchangeable Events,


and Their Limiting Probabilities
The term distribution will be used in lieu of either dJ. or p.dJ. The binomial
distribution is not only one ofthe most frequently encountered in practice but
plays a considerable theoretical role due in part to its elemental simplicity

30
2.1 Poisson Theorem, Interchangeable Events, and Their Limiting Probabilities 31

and in part to its leading (in the limit) to even more important distributions,
such as the Poisson and the normal.
As is customary, the combinatorial symbol (j) abbreviates n !/j !(n - j)! for
integers n ~ j ~ 0 and is defined as zero when j > n or j < O.
The binomial d.f. is a discrete dJ. with jumps of sizes (j)pi( 1 - p)" - i at the
points j = 0, 1, ... , n. In other words, its p.dJ. is completely characterized
(Section 1.6) by S = {O, 1,2, ... , n} and {(j)pi(1 - p)"- i, ~ j ~ n}. Here,
p E (0, 1) and n is a positive integer.
°
The construction of Section 1.6 shows that it is a simple matter to define
a probability space and a r.v. X on this space whose associated dJ. is binomial.
Thus, a r.v. X will be alluded to as a binomial r.v. with p.dJ. b(k; n, p) if for
some positive integer nand p E (0, 1)

P{X = k} = (:)pkqn-k == b(k; n, p), k = 0, 1, ... ,n, q = 1- p. (1)

The Poisson d.f. is a discrete dJ. with jumps of sizes )./e -)./j! at the points
j = 0, 1,2, ... , and so X will be called a Poisson r.v. with p.dJ. p(k;)") if for
some)., E (0, (0)
).,ke -)'
P{X = k} = k! == p(k; ).,), k = 0, 1,2,.... (2)

The quantities nand p of the binomial distribution and)., of the Poisson


°
distribution are referred to as parameters. If )., = in (2) or p = in (1), the
LV.S (or their distributions) are called degenerate since then P{X = O} = 1
°
(i.e., its dJ. has a single jump of size one). Degeneracy likewise occurs if p = 1
in (1) since then P{X = n} = 1.
The normal dJ. is an absolutely continuous dJ. (Section 1.6) G with

x 1
= - - e _(u-II)2/2.,.2 du,
G( x )
J- 00 afiJc
XER = [- 00,00]. (3)

e
The parameters of the distribution are E ( - 00, (0) and a 2 E (0, (0). Here,
a 2 = 0 may be regarded as a degenerate distribution. It is customary to
denote the standard normal distribution corresponding to the case = 0, e
a 2 = 1 by
1
J
x
<l>(x) = - - e- u2 2
du, q>(x) = -1- e- x 2 / 2 . (4)
fiJc
/

-00
fiJc
Here q>(x) is called the (standard) normal density function. A r.v. X is
e
normally distributed with parameters and a 2 , written X is N(9, ( 2 ) if

P{X < x} X-
= G(x) = <I> ( -a- e) , XER. (5)
32 2 Binomial Random Variables

Commencing here and sporadically throughout the remaining chapters, a


simple but incredibly expedient device for asymptotic calculations, introduced
by the English mathematician G. H. Hardy, will be utilized. Thus, if rn > 0,
n ~ I, a sequence {b n , n ~ I} of real numbers is said to be little 0 of Yn (resp.
capital 0 of Yn ), denoted bn = o(rn) (resp. bn = O(r n» if limn_oo(bn/r n) = 0
(resp.lbnl/r n < C < 00, n ~ 1). It is a simple matter to check that the sum of
»
two quantities that are o(r n) (resp. O(r n is likewise o(r n) (resp. O(r n». Thus, a
veritable calculus can be established involving one or the other or both of 0
and O. In a similar vein, bn '" r n iflimn_oo(bn/r n) = 1. The same notations and
calculus apply to real functions f(x) where either x -+ 00 or x -+ O. Thus, for
example, log(1 + x) = x + O(X z ) = O(lxl) as x -+ O.
The first theorem, purely analytic in character, states that the limit of
binomial p.dJ.s may be a Poisson p.dJ. and hence implies the analogous
statement for d.f.s.

Theorem I (Poisson, 1832). IfSn is a binomial LV. with p.d.f. b(k; n, Pn), n ~ I,
and as n -+ 00, npn = A. + o(1)for some A.E [0, (0), then/or k = 0, 1,2, ...
A.k -).
lim P{Sn = k} = -kef .
n-oo .

PROOF. Set qn = 1 - Pn. Since n log(1 - (A./n) + o(l/n» = n« -A./n) +


o(l/n» -+ -A.,

P{Sn = k} = (~)p~q~-k
-_ n(n - 1)··· (n - k + 1) [A.
-+0 (1)]k[
- A. (1)]n-k
1--+0-
k! n n n n

n n
1)
= -n (n-- - ... (n - k +
n
1) -k!1[A. + 0(1)] k
o
By refining the arguments of the preceding proof, a more general result is
obtainable. To pave the way for the introduction of a new p.dJ. subsuming the
binomial p.dJ., two lemmas which are probabilistic or measure analogues of
(2) and (3) of Section 1.2 will be proved.

Lemma l. Let q> be an additive set function on an algebra d and AjEd,


1 ~ j ~ n. Iflq>(Ui A)I < 00, then

q> (0 A
1
j) = I q>(A) - l$h<j,$n
j=l
L q>(AhAj,) + ... + (_l)n-lq>(A1A z ··· An)'
(6)
2.1 Poisson Theorem, Interchangeable Events, and Their Limiting Probabilities 33

PROOF. Set CPj = cp(A), CPh.j, ... j. = cp(AhAj, ... A j), r ~ 2, Clearly, (6) holds
for n = 1. Suppose, inductively, that it holds for a fixed but arbitrary positive
integer m in [1, n), whence Icp(U~+ 1 A)I < 00. Now,

(7)
cp(A m+l ) = cp(y AjA m+l ) + cp(Am+1 - y A j).

where Icp(U~ Aj)1 < 00, Icp(Am+I)1 < 00, Icp(U~ AjAm)1 < 00. By the
induction hypothesis

cp(u AjAm+l ) = ICPj,m+1 -


I j=1
L CPh.h,m+1
lS,h<j,s,m
+ ... (8)

+ (_l)m-Icpl. 2, ...• m.m+ I'


and so employing (8) in (7) the conclusion (6) with n = m + 1 follows, 0

Lemma 1 yields immediately

Corollary 1 (Poincare Formula). If AI> .,., An are events in a probability


space (Q,~, P) and 4 = LI S,h <j,<'" <jkS,n P{AhAj, , .. A j.}, then

(9)

Definition. Events AI' ., ., An of a probability space (Q,~, P) are called


interchangeable (exchangeable) iffor all choices of 1 ::; i l < ... < i j ::; nand
all 1 ::; j ::; n,
(10)

Interchangeable events seem to have been introduced by Haag (1924-


1928) but the basic results involving these are due to de Finetti (1937).

Corollary 2. If {A I' " " An} are interchangeable events ofa probability space
(Q, ~, P) with Pj defined by (10), then

ptV1A j } = npi - (;)P2 + G)P3 - ... + (-It-Ipn' (11)

The novel part of the next lemma is (13), since for A = Q, (12) is merely the
complementary version of the Poincare formula (9),
34 2 Binomial Random Variables

Lemma 2. Let qJ be an additive set function on an algebra d and A jEd,


1 s,j s, n. IflqJ(U~ A)I < 00 and
B m = {w: W E exactly m afthe events AI' ... , An}, Os, m s, n,
thenfor any A E d
n

qJ(BoA) = qJ(A) - I qJ(AjA) + I qJ(Aj,AhA) - ...


j=1 1:5j,<h:5n

+ (-I)nqJ(A I ... AnA), (12)

qJ(B m) = 1:5j,<~<jm:5nqJ(Aj,'" A jm ) - (m; 1)1:5j,<,,~jm+.:5n


X qJ(Aj, ... A jm +) + ... + (_l)n-m(:)qJ(A I ... An)' (13)

PROOF. By Lemma 1

qJ(BoA) = qJ(A) - qJ (y AjA)


n

= qJ(A) - I qJ(AjA) + I qJ(Aj,AhA) - ...


j= I 1:5j, <h:5n
+ (-l)nqJ(A I ... AnA),
proving (12). For any choice of ii' 1 s, i s, m with 1 s, it < ... < im s, n, let
I S, i l < < i n - m S, n signify the remaining integers in [I, n]. Then taking
A = Aj, A jm in (12)
qJ(Aj, ... AjmAf• ... ALm) = qJ(A . Af• ... Afn_J
n-m
= qJ(A) - L qJ(AihA) + L qJ(Aih,Aih2A) - ...
h= I I :5h, <h2:5n-m
+ (_l)n-mqJ(A i • . . . Ain_mA). (14)
Since for any 1 s, m s, n
Bm = U A·J1 ... A·1m . A~11 ... A~In-m
1:5j, < ... <jm:5n

represents Bmas a disjoint union, (13) follows by summing over il"" ,im in
(14). 0

The following corollaries are direct consequences of Lemma 2.

Corollary 3. If AI> ... , An are events of a probability space (0, iF, P) and
Tk = II sj, < ... <jkSn P{Aj,A h AjJ for k = 1,2, ... , n, then
P{exactly m of AI' , An occur}

= Tm - (m ; 1) Tm +I + ... + (-I)n-m(::,)T". (15)


2.1 Poisson Theorem, Interchangeable Events, and Their Limiting Probabilities 35

In the case of interchangeable events, (15) simplifies considerably and may


be expressed in terms of the classical difference operator:
lipn = 1i1pn = Pn+l - Pn' likpn = 1i(lik-1pn), k 2 2,

Corollary 4. If AI> .,., An are interchangeable events of a probability space


(n, $', P) with Pi = P{A 1A 2 .•. Ai}' 1 :;; j :;; n, then, setting Po = l,for any
integer min [0, n]

P{exactlymofA t , ... , An occur} =.f (_I)i-m(~)(i)Pi


l=m m I
m
= (n) ni ( _Iy(n ~ m)Pm+ i
m ] =0 J

= (:)( -l)n-mlin-mpm' (16)

Since the events Bm , °:;m :;; n, of (16) constitute a finite partition of n,


their probabilities are nonnegative and add up to one. This shows that the
probabilities Pi in (10) cannot be arbitrary positive numbers (see Exercise 9).
In view of the correspondence between events Ai and their indicators I Aj'
it should not be surprising, setting Sn = Lj= t I Aj' that the event Bm of(16) is
equivalent to {w: Sn = m}. Indeed, discussion of the limiting behavior of the
probabilities in (16) will be couched in terms of the LV.S Sn'

Theorem 2. Iffor each n 2 I, Sn is a random variable with


m

P{Sn = m} = ni ( -I)i( n+.) (m + j)P~~i' 0:;; m :;; n, (17)


]=0 m J m
such that for some ,10 E (0, 00) and An E [0, 00)

°:: ; n(n - I) ... (n - k + 1)p~n) :::;; At, 1 :::;; k :::;; n, (18)


k 2 I, as n -> 00, (19)
then

m 2 0.

PROOF. For m 20, fixed no and all large n


A:;'e-;'n
P{Sn -- m } - - - - no-t(_IY[
'" -,- n! (n) _
'J
Anm +]
')' Pm + i
m., - i=O
L.
m.J.., ( n-m-J.
n~m( pi n!
+ i=no
(n)
L. - " ' (n - m - J')'. Pm+i
m.J. J

Am+i
- L
00

(-l)i~ = It + 12 + 13 (say).
i=no m.J.
36 2 Binomial Random Variables

Since A. n + o( I) = np\n) ~ ..1.0 , there exists A. > such that A.n < A., all n. For
°
..1. 0
any I> > and fixed m, choose no to satisfy
00 A.m+j
L -,-.,<1>,
j=nom.J.
whence 11 3 1 < I> and via (18)
n-m n(n - 1)··· (n - m - j + 1) n-m lm+ j
II 21 ~ L
j=no
,.,
m.J.
pl::~ j::;; L ~<
j=nom.J.
1>.

Now, for 1 ~ m + j ::;; m + no, by (19)


,
n. p(n) . _ lm+ j < nm+ jp(n) . _ lm+ j
(n-m-j)! m+] n - m+J n

= l~ + j + 0(1) - l~ + j = 0(1),
n.
, p(n) . _ lm+ j > (n _ m _ J')m+jp(n) . _ lm+j
.),m+J n - m+] n
( n-m-J.

= I_ m + J.)m+ j nm+jp(n)._lm+ j
( n m+J n

implying I I = 0(1), and the theorem follows. o


This result will be utilized in Theorem 3.1.2 to obtain (under suitable
conditions) a limiting Poisson distribution for the number of empty cells in
the casting of balls into cells at random. Note that this implies Theorem 1.
It follows immediately from Theorem I that the sequence Fn(x) =
Lk < x b(k; n, p) of binomial dJ.s tends to the Poisson dJ. F(x) = < x p(k; l) Lk
as n -+ 00, np -+ l E (0, (0). In this case, not only does
b(k; n, p) -+ p(k; l), (20)
for every positive integer k, but also
00

L Ib(k; n, p) -
k=O
p(k; l)l-+ ° (21)

as n -+ 00, np -+ l E (0, (0), where b(k; n, p) is defined as zero for k > n. This is
an instance of the more general phenomenon of the following

EXAMPLE 1. Let n = {WI' W 2 , .. .}, /F = class of all subsets of n, and let


{P, Pn, n ~ I} be a sequence of probability measures on (n, /F) such that
Pn -+ Pas n -+ 00, that is,
j = 1,2, .... (22)
2.1 Poisson Theorem. Interchangeable Events. and Their Limiting Probabilities 37

Then, as n --+ 00
00

L IPn,j - pjl--+ 0. (23)


j=·t

PROOF. For any I'; > 0, choose N = N, such that

L Pj <
j>N
B. (24)

Now (22) ensures that as n --+ 00

N
L IPn,j - pjl --+ 0, (25)
j= 1

whence as n --+ 00

(26)

Then (24) and (26) entail

IIri1 LPn, j :::;; B, (27)


n-ooj>N

so that via (24), (25), (26), and (27)


00

IIri1 L IPn,j -
n-oo j= 1
pjl = Iiffi
n-oo j>N
L IPn,j - pjl :::;; Iiffi
n-oo j>N
L (Pn,j + p) :::;; 21';,
which, in view of the arbitrariness of 1';, is tantamount to (23). o
EXERCISES 2.1
1. Prove that for n = 1,2, ...

f~oo cp(x)dx = 1, f~ooX 2. - l cp(x)dx = 0,

2. Verify that

L kp(k; 1) = L
00 00

1= Pp(k; 1) - 12,
k=O k=O

L• p(k; 1) = 1
-
foo e-Xx' dx,
k=O n! A

L k p(k; 1) = 1
00
3 3
+ 31 2 + 1,
.
k=O

L kib(k; n, p) = np or npq + (np)2 as j = 1 or 2.


k=O
38 2 Binomial Random Variables

3. Verify for all positive integers nl> n2 and nonnegative integers k that

4. Find the dJ. of X 2 when


i. X is a Poisson r.V. with parameter A,
ii. X is N(O, I).
Prove under (i) that Ane-;'/n!:o=::; P{X ~ n} :0=::; An/n! for 11 ~ O.

5. A deck of N cards numbered I, 2, ... , N is shuffled. A "match" occurs at position j


if the card numbered j occupies the jth place in the deck. If p~) denotes the prob-
ability of exactly m matches, 0 :0=::; m :0=::; N, prove that

I ( ( - I)N - m)
pIN) = _ 1_ I + -I - -I + ... + - -- (i)
m m! 2! 3! (N - m)! '

(N) e-
I
I I (ii)
IPm --;T <m!(N-m+ I)!'

q<;') -+ e - I (~ + ~ + ...) (iii)


2! 3! '
where q~) is the probability of at least m matches.
6. For k = 0, 1,2, ... , r = 1,2, ... ,0 < P < I, and q = I - p, put f(k; r, p) =
('+t-I)p'l. Prove that Dx,=o f(k; r, p) = I for every rand P and that jf q = q(r)
and rq -+ AE (0, I) as r -+ co, then
lk
f(k; r, p) -+ e-.l ki'

(f(k; r, p) is called a negative binomial p.dJ. with parameters rand p.)


7. Bonferroni Inequalities. If AI' ... , An are events and
7k= L
I Sj. < ... <jkSn
P{Aj,Ah···AjJ,

then

2m:o=::; 11.

8. Find the formula analogous to (15) for the probability that at least m of the events
AI>"" An occur.

9. Let Po = I and Pm' I :0=::; m :0=::; 11, be real numbers such that
m
qm = (n)m ni: (_IY(" -: m)Pm+
)=0 ]
j ~0
for 0 :0=::; m :0=::; 11. Then L::,=o qm = Po = I and there exists a probability space
(Q, .¥, P) and interchangeable events A I' ..• , An with Pi = P{A I A 2 ... A j},
I :O=::;j:O=::; 11. Hint: Take Q = {w = (WI>' .. , wn):w j = Oor I, I :O=::;j:O=::; Il}, Y; = class
of all subsets ofQ, and P({w}) = qml('~,) whenever D= I w j = m. Set
Aj = {w = (wl, ... ,wn):wj = I},I :O=::;j:O=::; 11.
2.2 Bernoulli, Borel Theorems 39

10. Verify that if Pi = pi in (16) i.e., in Exercise 9, the probabilities qm coalesce to the
binomial and P{A i, A i2 ··· Ai.} = n~=, P{A i), 1 ~ i, < ... < ik ~ n.

11. Prove that if Pi = (~~j)/(~), j = 0, ... , r, r ~ N, the p.d.f. in (16) is hypergeometric,


i.e., qm = (::')(~~'::)/(~); also if Pi = 11(j + 1), {qm} is the discrete uniform distribution,
i.e., qm = I/(n + 1), 0 ~ m ~ n.

12. Prove that if r = r. = A. N In -+ CX) and A. -+ A E [0, CX) in the hypergeometric case
of Exercise II, then Pi = P}·' satisfies (18) and (19) of Theorem 2, so that the hyper-
geometric p.dJ. tends to the Poisson p.dJ. under the stated condition.

13. Let {A., n ~ I} be an infinite sequence of interchangeable events on a probability


space (Q, .F, P), that is, for all n ~ I and indices

Prove that

P{A., i.o.} = p{U 1


Ai} =I - lim (-I)k 8 kpo ,
/<-00

P{lim A.} = p{ nA.} = lim Pk'


n= 1 1.;-00

Hint: Recall Exercise 1.5.5.

2.2 Bernoulli, Borel Theorems


Due to the ease of computations involving the binomial distribution, many
important notions such as weak and strong laws of large numbers and the
central limit theorem applicable to wide classes of random variables may be
adumbrated here.

Theorem 1 (Bernoulli Weak Law of Large Numbers, 1713). Let {Sn} be a


sequence oj binomial random variables with p.dJ.s b(k; n, p), n ~ 1. Then Jor
all e > 0,

PROOF.

p{ISn_pl~e}=p{ISn-npl~ne}=
n
L
Ik-npl2:ne
P{Sn=k}

~ L (k -2 ;)2 P{Sn = k}
Ik-npl2:nt ne
1
L (k -
n
~ 22 np)2 P{Sn = k}.
ne k=O
40 2 Binomial Random Variables

The prior inequality is a special case of the simple but very useful Tchebychev
inequality discussed in Chapter 4. Moreover, the last sum which will be
identified in Chapter 4 as the variance of Sn (or its dJ.) equals

2~ (n-2)! k-2n-k
= n(n - 1)p /;'2 (k - 2)!(n - k)! P q

I) ~ (n-1)! k-ln-k 22
- (2 np - np/;'l (k _ 1)!(n _ k)! P q +n p

= n(n - l)p2 - np(2np - 1) + n2p 2 = npq. (1)

Therefore,

o (2)

The prior theorem may be strengthened by the simple device of replacing


(k - np)2 by (k - np)4.

Theorem 2. Let Sn, n :2: 1, be binomial LV.S with p.dJ.s b(k; n, p), 0 < p < 1,
n :2: 1. Then for every t; > 0

PROOF. Set kj = k(k - I) ... (k - j + 1), 1 ~ j ~ k, whence

Since

o ~j ~ n,
2.2 Bernoulli, Borel Theorems 41

it follows that

f (k - np)4(n)pkqn-k
k=O k

= f (e - 4k 3np + 6k znzpz - 4kn 3p3 + n4p4 )(n)pkqn-k


k=O k
= (n4p4 + 6n3p 3 + 7nzp z + nIP) - 4np( n3p 3 + 3nzp z + nIP)
+ 6nZpZ(nzpZ + nIP) - 4n 3p3(nIP) + n4p4
= p4(n4 - 4nn3 + 6n znz - 4n3nl + n4) + p3(6n3 - 12nnz + 6n Zn l )
+ pZ(7nz - 4nnl) + nIP
= (3n Z - 6n)p4 - (6n Z - 12n)p3 + (3n z - 7n)pZ + np
= 3n(n - 2)pZ(p - I)Z + npq = npq(3npq - 6pq + 1).
Consequently, proceeding as at the outset of Theorem 1,

p{ I Sn - p
n
I ~ t;} = Ik-npiL ~n£P{Sn = k} ~ n; t; 4 f (k -
k=O
n
np)4(k )pkqn-\

= pq(3npq - 6pq + 1) = 0 (~)


n 3 t;4 nZ '

and therefore the series in question converges. D

The strong law of large numbers involves the pointwise convergence of a


sequence of random variables on a probability space. A discussion of this will
be facilitated by

Lemma 1. If {Yn, n ~ I} is any sequence of random variables on a probability


space (n, ff, P), then P{limn~oo Yn = O} = 1 iff

p{j};, I > ~, i.O.} = 0, k = 1,2, ....

PROOF. Let A = Uk=


I Ab where A k = {I Y"I > 11k, i.o.}. If w ¢ A, then
I YnCw)! > 11kfor only finitely many n for every positive integer k, implying
lim y"(w) = O. Conversely, iflim y"(w) = 0, then w ¢ A k for k = 1,2, ... , and
so w ¢ A. Thus A C = {lim Y" = O}, whence P{lim Y" = O} = 1 iff P{A} = 0
or equivalently
P{Ad = 0, k~1. D

The lemma which follows plays a pivotal role in probability theory in


establishing the existence of limits and constitutes one-half the Borel-
Cantelli theorem of Section 3.2.
42 2 Binomial Random Variables

Lemma 2 (Borel-Cantelli Lemma). If {An' n ~ I} is a sequence ofevents for


which If
P{A n } < 00, then P{A n , i.o.} = O.
PROOF. Since {An, i.o.} = IUX'= U:'=k An U:'=k An, all k ~
1 C I, by Theorem
1.5.2

and so
00

o ~ P{A n, i.o.} ~ lim L P{A n} = O. o


k-oo n=k

The last two lemmas in conjunction with Theorem 2 yield

Theorem 3 (Borel Strong Law of Large Numbers, 1909). Let Sn constitute


a sequence of binomial r.v.s on some probability space (0, Ji', P) with p.dJ.s
b(k; n, p), n ~ 1. Then

p{lim Sn = p} = 1.
"-00 n
PROOF. According to Theorem 2, for every e > 0

whence the Borel-Cantelli lemma guarantees that

e > O.

Thus, by Lemma 1

which is tantamount to that which was to be proved. o


The existence of such a probability space will follow from Theorems 3.1.1
and 6.4.3.
S. Bernstein ingeniously exploited the binomial distribution and Theorem I
to prove Weierstrass' approximation theorem, which asserts that every
continuous function on [0, 1] can be uniformly approximated by poly-
nomials.

EXAMPLE 1. If f is a continuous function on [0, 1] and the Bernstein poly-


nomials are defined by

P E [0, I], (3)


2.2 Bernoulli, Borel Theorems 43

then
lim Bn(P) = f(P) uniformly for p E [0, 1]. (4)

PROOF. Let 8 n be a binomial r.v. with p.d.f. b(k; n, p). Since every continuous
function on [0, 1] is bounded and uniformly continuous thereon, I f(P)1 ~
M < 00 for p E [0, 1], and for every e > 0 there exists () > 0 such that
If(P) - f(P') I < eiflp - p'l < () and 0 ~ p,p' ~ 1. Then,settingq = 1 - P
and An = {j: Ij/n - pi < ()},

IBn(P) - f(P)1 = IJJ;)piqn-j[f (~) - f(P)] I


~ jt(;)piqn-jlf(~) - f(p)1

~ e L bU; n, p) + 2M L
An A:i
bU; n, p)

By (2),

P {I Sn I } P(1-p)
--;; - p ~ () ~ n{)2 ~
1
4n{)2'

and so if n ~ M(e{)2)- 1,

IBn(P) - f(p)1 ~ e + e = 2e, o ~ p ~ 1,


yielding (4). o
If {Y", n ~ O} are r.v.s on a probability space (O,~, P), then {Y", n ~ I}
is said to converge in probability to Yo, denoted yn..e.. Yo, if
lim P{I Y" - Yol ::s; e} = 1,

all e > O. Alternatively, {Y", n ~ I} converges almost certainly (a.c.) to Yo,


denoted Y,,~ Yo, if P{limn~oo Y" = Yo} = 1. Theorems 1 and 3 of this
section assert that if {Sn' n ~ I} are binomial LV.S on (O,~, P), then both
types ofconvergence hold with Y" = SJn and Yo = p. A detailed discussion of
these concepts is given in Section 3.3, where it is shown that Y" ~ Yo implies
Y" ~ Yo, the converse being untrue in general. However, the case of a
countable sample space 0 is exceptional according to

EXAMPLE 2. If {Yn, n 2:: O} is a sequence of LV.S on a countable probability


space (o,~, P) where ~ = {all subsets ofO} and Y" ~ Yo, then Y" ~ Yo.
44 2 Binomial Random Variables

PROOF. Set A = {oo: lim n _ oo y" = Yo} and suppose that PtA} < 1 or equiv-
alently PtA"} > O. Since A" is countable and, moreover, a countable union of
null events is a null event, there exists 00 0 E A" with P{ oo o} = b > O. Moreover,
000 E A" implies that for some t:: > 0 and subsequence nj of the positive
integers, I y"/ooo) - Yo(oo o)! > t::,j ~ 1, whence

contradicting Y" .f. Yo' o

EXERCISES 2.2

1. Verify that if np ...... ). E (0, (0), then D kib(k; n, p) ...... LQ' kip(k; ).), j = 1,2. Hint:
Recall Exercise 2.1.2.

2. If gm().) = 2:;;"=0 (k - :A.ymp(k; A), show that gd:A.) = 0, gz(:A.) = g3(:A.) = :A., g4(:A.) =
3:A. z + A, g6().) = 15).3 + 25A z + A.

3. (i) Prove a weak law of large numbers where Sn has p.d.f. p(k; n).), n 2: I, that is,
P{I(Sjn) - AI > e} = 0(1), e > O. (ii) If {X, X n, n 2: I} are LV.S with a common
d.f. and n P{ I X I > n} = 0(1), then (I/n)max, ';i,;nl Xii .f, o.

4. Prove a strong law of large numbers where Sn has p.d.f. p(k; n:A.), n 2: 1, that is,
lim n_ oo (Sn/n) = :A., a.c. Hint: Consider P{lSn - n).1 4 > n4 e4 }.

5. Show that the Borel-Cantelli lemma is valid on any measure space (n, .91, /1), that is,
Do
/1{A n} < 00 implies /1{1lm n An} = 0 for An E .e/, n 2: 1.

6. Let {Xn.i , j 2: I, n 2: I} be a sequence ofr.v.s such that 2:J=' P{IXn) > e} ~O,
all e > O. Prove that sUPh I IXn.jl !. O.

7. Prove that for 0 < p < 1

lim In (k -c::np)4b(k; n, p) = 1
r::;::"
foo
t 4 e- t '/2 dt = lim I00 (k _!1).)4 p(k; A).
n-oo k=O v' npq v' 2n - 00 .1-00 k=O v' A

8. A sequence {Sn, n 2: I} of r.v.s is said to converge completely (Hsu-Robbins) if for


every e> 0, I
P{ IS.I > e} < 00. Prove that if {Sn, n 2: I} converges completely,
then lim S. = 0, a.c.

9. (i) For any LV. X and constants an ...... 0, verify that anX ~ O. Hint: It suffices to prove
c.1 XI~ 0, where Cn = supj"nlajl. (ii) If {X n , n 2: I} are r.v.s with identical d.f.s and
{a.} are as in (i), then an X• .f, O.
2.3 Central Limit Theorem for Binomial Random Variables, Large Deviations 45

2.3 Central Limit Theorem for Binomial


Random Variables, Large Deviations
To show that the limit of binomial p.dJ.s may be normal an asymptotic
estimate of the combinatorial occurring therein is essential and this follows
readily from

Lemma 1 (Stirling's Formula). For every positive integer n ~ 2


1 1
where 2 < en < -2 . (1)
1 n + lin
PROOF. Define
l)n+O/2l, 1
an = ( 1 +- n ~ L
n bn = 2n + l'

Then
1 n + I 1 1+ b b; b~
logan = (n + 2)log-- = - I o g - -n = 1 + - + - + ...
n 2b n I - bn 3 5
= 1 + bn (say), where
1 b; ~ b;
< -3 < Un < -3 (I + bn + bn + .,.)
2 4

12n + 1 12(n + 1) + 1

(2)
12n(n + I)'
so that
C()

o< L b n = C< 00.


n;l

Therefore,
(n + l)n+O/2l n n
L log a + L bj = n + C - L b
C()

log I = j = n j
n. 1 j;l j;n+t

= n + C - en + 1 (say), where via (2)


1 1
-,----- < en < - .
12n + 1 12n
Hence, for n ~ 1
n! = (n + l)n+ O /2)exp(-n - C + en + 1 ),
implying
(n + I)! = (n + l)n+(3/2l exp(-n - C + en + 1 ). (3)
46 2 Binomial Random Variables

Set K = e 1 - C
> O. Then, replacing n by n - 1 in (3),

n ~ 2. (4)

The identification of K as J2it


will be made at the end of this section via
probabilistic reasoning, thereby completing the proof. 0

Lemma 2 (DeMoivre-Laplace, 1730). For n = 1,2, ... let k = k n be a non-


negative integer and set x = Xk = (k - np)(npq)-1/2, where q = 1 - p,
o < p < 1. If x = 0(n 1/6 ) and <p(x) is the standard normal density, there exist
positive constants A, B, C such that

b(k; n, p) _ 11 < ~ + Blxl 3 + Clxl.


I
(5)
(npq)-1/2<p(x) n fi J7J
PROOF. Since x = 0(n 1/6 ), necessarily (kin) -+ p. By Stirling's formula
. _ (n) k n-k _ nn+(1/2) exp( -n + en)(2n)-1/2pkqn-k
b(k, n, p) - k P q - kk+(1/2)(n _ k)n k+(1/21 exp( -n + ek + en- k)

e' (k
= __ _ k)-n+k-(1/2 1(npq)-1/2
_ )-k-(1/21(n__
J2it np nq
where e = en - ek - en-ko Since kin -+ p, e = 0(n- 1). Now
k n-k
log{(2nnpq)1/2b(k; n, p)} = e - (k + !)log - - (n - k + !)log--
np nq

= e - (np + x~ + !)IOg(1 + xj£)

- (nq - x~ + !)IOg(1 - x!£)

I3
(np + x~ +!) [\jrq
2

= e-
x q + 0 (IX
-;p - 2np n 2)] 3/

r,;pq +!)
- (nq - x V r1Plf [ !£ X2p
-x -nq - -2nq + 0 (IX
I3
-n3 / 2)]

I3 2
= r,;pq + x 2q - -x2q
e - [ x v"Plf 2
- + 0 (IX
+ -xj£
2 np
-n 1/2) + 0 (x-n )]

r,;pq
- [ -x v"Plf + x 2 p - -X2p
2
- + 0 - 1 2) + 0 (X- )]
- -x!£
2 nq n / n
(IX I3 2

= - x; + O(~) + O(~) + O(~),


2.3 Central Limit Theorem for Binomial Random Variables, Large Deviations 47

whence

1 2 (IXI (IXI) + 0 (1))


.j2ic exp (_X
3
(npq)1/2b(k; n, p) = -2- + 0 ~) +0 ~ ;;

= ~(X)[1 + o(~) + o(~) + o(~)J


yielding (5). o
The preceding lemma is an example of a local limit theorem since it concerns
densities, while the forthcoming corollary provides a global limit theorem
involving dJ.s.

Theorem 1 (DeMoivre-Laplace (Central Limit) Theorem). IfSn is a sequence


of binomial r.v.s with p.dJ.s b(k; n, p), n ~ 1, and {m n}, {M n} are sequences of
nonnegative integers satisfying mn ::::;; M n, n ~ l,for which

(mn -2 np)3 -. 0, (M n -2 np)3 -. 0 (6)


n n
as n -. 00, then

P{mn ::::;; Sn ::::;; M n} '" et>(Mn(~q~~/: t) _et>(mn(~p;~/~ i). (7)

PROOF. For k = mn, mn + I, ... ,Mn let Xk = (k - np)h n, where hn =


(npq)-1/2. By Lemma 2
Mn Mn
P{mn ::::;; Sn ::::;; M n } = L b(k; n, p) = L ~(Xk)' h (1
k=m n k=mn
n + 0k(I», (8)

where Ok(1) -. 0 uniformly for mn ::::;; k ::::;; M n (via (6». Now


(k+(1/2)-n Pl hn
f(k-(1/2)-np)h n
~(u)du = hn~«(k) = hn exp(1(xl - a»~(Xk)'

where Xk - (h,,/2) < (k < Xk + (h,,/2). Hence,


hnqJ(Xk) = exp(1(a - XD)[et>(Xk + !h n) - et>(Xk - !h n)]. (9)
Since I(l - xli = I(k - Xk I 'I(k + Xk I : : ; hn[ IXk I + (h,,/4)], hypothesis (6)
ensures that exp{!<a - xl)} -. I uniformly for mn ::::;; k ::::;; M n •
Consequently, from (8) and (9)
Mn
P{mn ::::;; Sn ::::;; M n} = (1 + 0(1» L [et>(Xk + !hn) - et>(Xk - !h n)]
k=mn
48 1 Binomial Random Variables

Note that the second statement in (11) of Corollary 1 asserts that the dJ
of the normalized binomial r.v. S: as defined in (10) tends to the normal dJ.
Such a tendency toward normality is a widespread phenomenon and ex-
emplifies a class of theorems known as central limit theorems (Chapter 9).

Corollary 1. If {Sn, n ;::: I} is a sequence ofbinomial r. v.s with p.dJ.s b(k; n, p),
0< p < 1, and
S* = Sn - np (10)
n ;;;M'
thenfor every pair affinite constants a < b and all x
lim P{a ~ S: ~ b} = <I>(b) - <I>(a), lim P{S: < x} = <I>(x). (11)

PROOF. Let M n be the largest integer ~ np + b;;;M


and n the smallest m
integer;::: np + a;;;M.
Then M n ;::: mn ;::: 0 for all large nand (6) obtains.
Thus, by the DeMoivre-Laplace theorem
P{a ~ S: ~ b} = P{m n ~ Sn ~ M n}

"" <I>(M
n
~+ 1) _ <I>(m
n
~-1} (12)

Since the right side of (12) approaches <I>(b) - <I>(a) as n -> 00 via the con-
tinuity of <1>( . ), the first part of (11) follows. As for the second, for
-oo<a<b<oo
lim P{S: ~ b} ;::: lim P{a ~ S: ~ b} = <I>(b) - <I>(a),

whence, letting a -> - 00,

lim P{S: ~ b} ;::: <I>(b).

Thus, for a < b,


lim P{S: < b} ;::: lim P{S: ~ a} ;::: <I>(a),

yielding as a -> b
lim P{S: < b} ;::: et>(b).

Similarly,
1- IliTi P{S: < a} = lim P{S: ;::: a} ;::: lim P{a ~ S: ~ b} = <I>(b) - <I>(a),

yielding as b -> 00

IliTi P{S: < a} ~ <I>(a),

and the second part of (11) follows. o


2.3 Central Limit Theorem for Binomial Random Variables. Large Deviations 49

As an application of Corollary 1, the constant K occurring in the proof of


Stirling's formula will be shown to equal (21l)lj2: if (4) rather than (1) is
utilized in the proof of the prior theorem, then setting a = - b, (11) becomes

lim P{ISn*1 ~ b} = ~ fb e- x2 / 2dx, (13)


n-oo K -b
j
which readily implies K ~ (21l)t 2. On the other hand,
p
P{IS*I > b} < ~2 L (k - n )2(n)pkqn_k
n - b Ik-npl>b,!npq ~ k
< -1 ~
L.
(k - np? (n) Pkqn -k = -1
- b2 k=O npq k b2
via (1) of Section 2. Hence,
1
1= P{IS:I ~ b} + P{IS:I > b} ~ P{IS:I ~ b} + b2

Il-~
------> -
I fb e -x'j2 dx + -12 h-", j
(21l)t 2
------> - -
K -b b K
yielding K ~ (21l)lj2. Consequently, K = (21l)tj2. o
As will become increasingly apparent, an important characteristic of any
dJ. F(x) is the manner in which it increases to one and decreases to zero-in
other words, the order of magnitude of its tails I - F(x) and F( - x) as
x -+ + 00. In the case of a normal LV., symmetry of the distribution entails
identical behavior of the upper and lower tails

Lemma 3. For all x > 0,

x
-1-- - x 2{2
< foo e - .2j2 du < -1 e - x j2 .
2
2 e
+ x x X

PROOF. For x > °


I
-2 fOO e -.2/2 d u> foo "2e
1 -.2j2 d u--e
_ I -x 2j2 - fOO e -.2/2 du
x x x U x x

whence

-I e -x
2j2 _
- JOO ( 1 + "21 ) e -.2j2 du> JOO e -.2j2 du > X
-1-- e -x
2j2
. o
x x U x +x 2

Corollary 2. For the standard normal distribution <l>(x),

1 tf\{) 1 -x 2 j2
as x -+ 00,
- '¥\X '" X(21l)lj2 e ,
50 2 Binomial Random Variables

Corollary 1 ensures that the tails of the distribution of the normalized


binomial LV. S: = (Sn - np)j~, say 1 - F:(x) and F:( -x), tend to
1 - <J)(x) and <J)( - x) respectively as n -+ 00. The next assertion is that this
remains true for x = X n -+ 00 provided the approach of X n to 00 is sufficiently
slow. This is the prototype of what is called a large deviation theorem.

Theorem 2.lfSn is a sequence ofbinomial LV.S with p.dJ.s b(k; n, p), n ~ 1, and
{an, n ~ 1) is a sequence ofreal numbers with an -+ 00, an = o(n 1/6 ) as n -+ 00,
then setting S:
= (Sn - np)j(npq)1/2 as earlier,

(14)

PROOF. For simplicity suppose an > 1, n ~ 1, and define M n to be the largest


integer :s; np + (an + log an)(npq)1/2 and mn the largest integer :s; np +
an(npq)1/2. Then

_ Mn - np _ -1/2
XM n - (npq)1/2 - an + log an + O(n ),

whence the DeMoivre-Laplace theorem is applicable, yielding

= <I>(an + log an + O(n - 1/2)) - <I>(an + O(n - 1/2)).


By Lemma 3 and the fact that a~ = 0(n I/2 ),

Hence,

Now, for k = Mn + 1, ... , n

P{Sn = k}
-----'--"---'-- =n - k + 1 -p <n - Mn p
-
P{Sn = k - I} k q - Mn + 1 q'
2.3 Central Limit Theorem for Binomial Random Variables. Large Deviations 51

whence

P{S. ~ M.} =
.-Mn
L P{S. = M. + j}:$; j=O
L
.-M n (nM- M i-p)j P{S. = M.}
j=O .+ q
(M. + I)q n
:$; P{S. = M.} :$; P{S. = M.}.
M. +q - np M. - np
Since M. - np = xMJnpq)t /2, by Lemma 2

n P{S. = M.} _ CP(XMJ


M. - np p' qXM n
(2 np 2q2) - t 12
[a. + log a. + O(n t/2)] exp( -Ha. + log a. + 0(n- 1/2 )]2)
= o(cp~·»).
Analogously,

P{S• = } _ cp(a.) = (cp(a.») (15)


m. ( )t 2
npq /
0 .
a.
Thus,

and
P{Sn ~ mn} = P{m n :$; Sn :$; M n} + P{Sn > M n}

_ cp(a.) + 0 (cp(a.») _ ~ cp(a n). (16)


an a. an
Consequently, (14) follows from (15), (16) and
P{S. > m.} :$; P{S: ~ an} :$; P{S. ~ m.}. 0

EXERCISES 2.3
1. Show that the Bernoulli weak law of large numbers is consequence of the
DeMoivre-Laplace theorem.
2. Prove that as x

f'
-+ 00.

uqJ(u)du - x 1'" qJ(u)du,

1 - <I>(x + (a/x)) _
- - - - - - + e a.
I - <I>(x)

3. Use Lemma 2 to show that b(n; 2n, t) - (nnr t/2 .


52 2 Binomial Random Variables

4. (Normal approximation to the Poisson distribution.) If X;, is a Poisson r.v. with


parameter .l., prove that for any -00 < a < b < 00

hm. P[ a < ~
X;, -.l. < b = l1>(b) - l1>(a) J
;,-"" v.l.

II
by first showing for

z=
k-.l.
-- = O(.l.1/6) that
p(k; .l.) _ < i 81z1 3 C1zl
.ji I.l. 1/2¢J(Z) - .l. + .ji + .ji'
Obtain a large deviation result akin to Theorem 2.

5. Let Sn be binomial LV.S with p.d.f. b(k; n, p), 0 < p < 1, and S: = (Sn - np)jfiM.
Prove that for every rx > 1,
P{IS:I > (2rxlogn)I/2, i.o.} = O.
6. Utilize Exercise 5 to prove that for Sn as defined there and for every p > 1.
. Sn - np
hm - - p - = 0 a.c.
n--+co n
(If P = I, this yields the Borel strong law of large numbers.)
7. Prove for X;, as in Exercise 4 that P {X;, :.,; .l.} > c;" .l. > 0, where c;, = ! for integer
=
.l. and c;, = e- I otherwise. Hint: An,;, Ii=o(.l.iJjl)e-;' = P{X;,:";.l.} for n:";.l. <
n + 1, and A n.n - A n.n+! = J:+
I (.l. ne-;'jn!)d.l. > A +l,n+1 - A . +!, implying A . >
n nn nn
A n+, .n+!. Also A n.n - An,n+1 < A n.n - A n- I • n and by Exercise 4 P{X;, :.,; .l.} -+! as
.l. -+ 00.
8. For q > 0 and k = 0, 1, ... , n, let

ek = b(k; n, q ~ 1) = G)(q + lfnqn-k,


q
c=--
2q ,
+1

N = [nq ++ IJ,
1
h =k- N.

Prove that if! < rx < i, then


I ek = O(exp( - nn)) (i)
Ihl>n'

where '1 < 2rx - 1, and that for Ihl :.,; nO,

(ii)

(Hint: For (i) apply Theorem 2, and for (ii) apply Lemma 2.
9. (Renyi) Let Sn be a binomial r.v. with p.d.f. b(k; n, p), where 0 < p < 1. If [np] is
the largest integer:.,; np and In, k n are integers such that In = O(n"), kn = O(n") for
some rx in (0, i), and k; - J;
= o(n), prove that

lim P{Sn = k n + [np]} = 1.


n-"" P{Sn = In + [np]}
Hint: Apply Exercise 8(ii).
References 53

10. If 0 < a ~ 1 and -00 < a < 00 show that sup",Ict>(xa- 1 - a) - ct>(x)1 ~ (h)-1/2.
[a-I - 1 + lal]. Hint: Add and subtractct>(xa-I),and use the mean value theorem.

References
J. Bernoulli, Ars Conjectandi, Basel, 1713.
S. Bernstein, "Demonstration du theoreme de Weierstrass fondee sur Ie calcul des
probabilites," Soob. Charkov. Mat. Obs. 13 (1912), 1-2.
E. Borel, "Sur les probabilites demombrables et leurs applications arithmetiques,"
Rend. Circ. Mat. Palermo 27 (1909), 247-271.
W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed.,
Wiley, New York, 1950.
B. de Finetti, "La prevision, ses lois logiques, ses sources subjectives," Annales de
I'lnstitut Henri Poincare 7 (1937), 1-68.
J. Haag, "Sur un probleme general de probabilites et ses diverses applications," Proc.
Inst. Congr. Math., Toronto, 1924, 1928,629-674.
G. H. Hardy, Divergent Series, Clarendon Press, Oxford, 1949.
P. L. Hsu and H. Robbins, "Complete convergence and the law of large numbers,"
Proc. Nat. Acad. Sci. U.S.A. 33 (1947),25-31.
P. S. Laplace, Theorie analytique de probabiliu!s, 1812 (Vol. 7 in Oeuvres completes de
Laplace, Gauthier-Villars, Paris, 1886].
A. de Moivre, The Doctrine of Chances, 1718; 3rd ed., London, 1756.
S. D. Poisson, Recherches sur la probabilite des judgements, Paris, 1837.
A. Renyi, Foundations of Probability, Holden-Day, San Francisco, 1970.
H. Robbins, "A remark on Stirling's formula," Amer. Math. Mont/hy 62 (1955),
26-29.
J. Stirling, M ethodus Differentia/is, London, 1730.
H. Teicher, "An inequality on Poisson probabilities," Ann. Math. Stat. 26 (1955),
147-149.
3
Independence

Independence may be considered the single most important concept in


probability theory, demarcating the latter from measure theory and fostering
an independent development. In the course of this evolution, probability
theory has been fortified by its links with the real world, and indeed the
definition of independence is the abstract counterpart of a highly intuitive and
empirical notion. Independence of random variables {X;}, the definition of
which involves the events of <1(X j ), will be shown in Section 2 to concern only
the joint distribution functions.

3.1 Independence, Random Allocation of Balls


into Cells
Definition. If (0, ~, P) is a probability space and T a nonempty index set,
classes f§, ofevents, t E T, are termed independent iffor each m = 2, 3, ... ,each
choice of distinct t JET, and events A j E f§'j' 1 ~ j ~ m,
(1)
Events A" t E T, are called independent if the one-element classes f§, = {A,},
are independent.
t E T,

Clearly, nonempty subclasses of independent classes are likewise inde-


pendent classes. Conversely, if for every nonempty finite subset T1 c T
the classes f§" t E T1 , are independent, then so are the classes f§" t E T. It
may be noted that the validity of (1) for some fixed integer m > 2 is not
sufficientto guarantee independence ofthe events A l' A 2' ... , Am (Exercise 4).

54
3.1 Independence, Random Allocation of Balls into Cells 55

On the other hand, it is easily verified via (14) of Lemma 2.1.2 that At, lET,
are independent events iff the classes t:§t = {0, n, A" A~} are independent.

Definition. {X n' n ~ I} are termed independent random variables if the classes


t:§ n = a(X n), n ~ 1, are independent. More generally, stochastic processes, i.e.,
families of r.v.s {x~n), l E T,,}, n ~ 1, are independent (of one another) if the
classes t:§ n = a(x~n), l E T,,), n ~ 1, are independent.

Random variables which are not independent are generally referred to as


dependent. Clearly, subsets of independent families or of independent random
variables are themselves independent. Of course, independence of the families
(or random vectors) (Xl> X 2) and (X 3 , X 4 ) postulates nothing about the
independence or dependence of X I and X 2' Note that random variables X
and Yare independent if and only if for all A E a(X), BE a( Y)
P{A . B} = P{A} . P{B}.
A sequence {X n' n ~ 1} (or the random variables comprising this sequence)
is called independent, identically distributed (abbreviated i.i.d.) if X n , n ~ 1,
are independent and their distribution functions are identical.
If{Xn,n ~ I} arerandomvariableswithP{X n E An} = l,whereA n c A =
{ai' a2'" .}, n ~ 1, define
(2)

Then PI ..... ia.l, ... , an) is called the joint probability density function of
X I> .•. , X n and the latter are termed discrete LV.S. It is not difficult to ascer-
tain that the discrete random variables X I> •.• , X n are independent iff for
every choice of (ai' ... , an) in A x A x ... x A
PI, .... n(al'···' an) = PI(al)'" pian), (3)
where Pj is the (one-dimensional) probability density function of Xj' 1 ~
j ~ n. It follows from (3) that if the discrete random variables X I' ... , X n are
independent then for all real Al> ... , An

and the converse is also true. Even without the proviso of discreteness (4) is
still equivalent to independence, but the proof is more involved and hence
deferred to Section 2. This condition may be rephrased in terms of the joint
and one-dimensional d.f.s, namely, for all real Ai> 1 ~ i ~ n,
n

Fx, ..... x.(AI,· .. ,An) = OFxlA;). (5)


i= I

If r.v.s X I and X 2 on (0, ~, P) are finite everywhere or even if merely


AI = {XI = 00, X 2 = -oo} = 0,A 2 = {XI = -00, X 2 = oo} = 0,then
the definition oftheir sum is standard, i.e., (X I + X 2)(W) = X I(W) + X iw).
56 3 Independence

However, if A = Al U A 2 #0,thenX t (w) + Xiw) is undefined on A. For


definiteness, set(X 1 + X 2)(W) = 0, WE A. Since P{ IXii = oo} = 0, i = 1,2,
P{A} = 0 and so (Xl + X 2)(W) = X l(W) + X iw), a.c. Hence, for any
LV. X 3 on (n,!F, P),

Unless the contrary is stated, any subsequent relationship among LV.S will be
interpreted in the a.c. sense. Since f(tl> t 2 ) = t 1 + t 2 is a Borel function on
(R 2, 91 2 ) (with the convention that 00 +(- (0) = ( - (0) + 00 = 0), the sum
Xl + X 2 is a LV. by Theorem 1.4.4. By induction, the sum L'i
Xi of n LV.S is a
r.v. for every n = 1,2, ....

Definition. Random variables Xj' 1 ~ j ~ n ~ 00, are called Bernoulli trials


with success probability p if Xj' 1 ~ j ~ n ~ 00, are i.i.d. with P{X j = 1} =
pE(O, 1)andP{X j = O} = q = 1 - p.
The event {Xj = 1} is frequently interpreted as a success on thejth trial
(and concomitantly {Xj = O} as a failure on the jth trial) so that it is natural
to refer to p as the success probability. Then Sn = L'i
X j is the number of
successes in the first n trials.

Theorem 1. If {X n, n ~ 1} are i.i.d. LV.S on a probability space (0, !F, P) with


P{X 1 = 1} = pE(O, 1) and P{X 1 = O} = q = 1 - p, then Sn = Xi' L'i
n ~ 1, is a sequence of binomial LV.S on (n,!F, P) with p.d.f.s b(k; n, p),
n ~ 1, and

. {Sn - np
lIm P ( )1/2 < X
} _
-
1
Fe.
IX e
-u2/2
du, X E ( - 00, (0), (6)
n-oo npq y2n -00

(7)

PROOF. Clearly, {Sn' n ~ 1} are LV.S, and since P{X n = x} = pXq1-X for
x = 0 or 1, by independence for k = 0, 1, ... , n

P{Sn = k} = L P{X 1 = Xl"'" X n = Xn} = L pf."'X'qn-r.yx"


where the summation is over all (k) choices of Xl> ... , Xn such that L'i Xi = k
and Xi = 0 or 1 for i = 1, ... , n. Hence,

P{Sn = k} = L pkqn-k = (~)pkqn-k.


Now, (6) and (7) follow from Corollary 2.3.1 and Theorem 2.2.3. 0

To say that r balls have been cast at random into n cells will mean that
every possible arrangement of these r balls into the n cells has the same
3.\ Independence, Random Allocation of Balls into Cells 57

probability, i.e., is "equally likely." A mathematical model or probability


space describing such an experiment is concocted by taking
n = {w: W = (WI' , w,), w j = 1,2, ... , n, 1 ~ j ~ r}
= (1, ... , n) x x (1, ... , n),

ff = class of all subsets of n,


P{A} = L n-' = mn-', AEff,
OlEA

where m = number of elements in A. Define X 1> •.• , X, to be the coordinate


random variables ofn, i.e., Xj(w) = Wj' 1 ~ j ~ r. In words, X j is the number
of the cell containing the jth ball. Then X l' ... , X, are discrete LV.S with
joint p.dJ.
PI. .jk 1 , · · · , k,) = P{X 1 = kl>"" X, = k,} = n-',
k i = 1, ... , n, 1 ~ i ~ r.
Moreover, if Pi denotes the p.dJ. of Xi' 1 ~ i ~ r, for k i = 1, ... , n, 1 ~ i ~ r,
" n'- 1 1
PiCk;) = P{X i = kJ = L
kj= l,j"'i
Pl.. ,(k 1 , · · · , k,) = - , =-,
n n
(8)

whence for all choices of k l ' ... , k,


,
Pl, .. jkl>"" k,) = n plk;).
i=l

According to (3), XI' ... , X, are independent and, taking cognizance of (8),
X j ' 1 ~ j ~ r, are i.i.d. random variables. Thus, a random allocation of r balls
into ncells is tantamount to considering i.i.d. LV.S X 1> .•• , X, with P{X 1 = k}
= lin, k = 1, ... , n. Let

Ai = {X 1 ::1= i, ... , X, ::1= i} = {w: cell i is empty}.


Then in view of independence and (8), for 1 ~ i ~ n

P{AJ = n, P{X
j= 1
j ::1= i} = (1)'
1- -
n
;

P{AitAiJ = P{X j ::1= i 1 or i 2 , 1 ~ j ~ r}

= (1 - P{X j = id - P{Xj = i 2 })' = (1 - ~r


and, in general, for 1 ~ m ~ nand 1 ~ i1 < ... < i m ~ n

P{A-11 ... A·I".. } = P{X.J


::1= i Ior···
m or i, 1 <
- J. -< r} = (1 _ ~)' .
n
58 3 Independence

In other words, AI' ... , An are interchangeable events and consequently if


Pm(r, n) = P{exactly m cells are vacant}, o :5; m :5; n,
it follows from Corollary 2.1.4 that

Pm(r, n) = j~m(-ly-m
n .
j ( 1 - ~j)r,
m (n) (j) o :5; m :5; n. (9)

Theorem 2 (von Mises). Ifrn balls are cast at random into n cells so that each
arrangement has probability n- rn , where
n 2:: 1, (10)

then as n ~ 00 the probability Pm(rn, n) of exactly m vacant cells satisfies


Ame-).n
Pm(rn, n) - _n_,_ = 0(1) for m = 0, 1,.... (11)
m.
In particular, if An = A. + 0(1), then {Pm(rn, n), m 2:: O} tends as n ~ 00 to the
Poisson p.dJ. with parameter A..
PROOF. Set pr = (1 -
l (j/n»r n and rewrite (9) as

Pm(rn,n) = ni:m(_lY( n+
)=0 m
.)(m+j)pl::~j'
] m

Since 1 - x :5; e- x for 0 < x < 1,

n(n - 1)··· (n - k + l)p~n) :5; nke- krn /n = A.~ :5; A.~

by (10). Moreover, for any fixed k 2:: 1, if r n :5; n 3 / 2 , then

nkp~n) = nk ( 1 - ~k)rn = nk[e- k/n + O(n- 2 )]rn

= nke-krn/n[1 + O(n- 2 ek/n)]rn = A~[l + 0(1)] = A.~ + 0(1)


by (10), while if r n > n3 / 2 , then

nkp~n) = nk ( 1 - ~k)rn :5; nk [( 1 - ~k)nJ.Ift :5; nke-k..;n = 0(1),


A.~ = nke- krn/n :5; nke - k./ft" = o( 1),
nkp~n) = A.~ + 0(1),
and so the desired conclusion (11) follows from Theorem 2.1.2. 0

In Chapter 9 it will be shown that the dJ. ofthe number ofempty cells tends
to the normal distribution when r In ~ IX > 0 and even in certain cases if
rJn -+ 0 or 00.
3.1 Independence. Random Allocation of Balls into Cells 59

Definition. The conditional probability of an event A given an event B of


positive probability, denoted P{AIB}, is defined by
P{AB}
P{AIB} = P{B} . (12)

Clearly, when the events A and B are independent, P{AIB} = P{A}.

Theorem3.!f {An' n ~ I} is a sequence of events with L':'=l P{An} = 00, then

li~....:up pt=V+1 AjlAn} = 1. (13)

°
PROOF. Evidently, P{An} > for infinitely many values of n, and it may be
supposed without loss of generality that P {An} > 0, n ~ 1. Since

1 ~ pLQ An} ~ n~l P{A n)~t Ai} = n~l P{An{l- pt=Ql AjIAn}].
divergence of the series requires

lim inf[l -
n-+oo
p{. 0 AjIAn}] =
]=n+l
0,

which is tantamount to (13).

EXERCISES 3.1
I. Events An. n ~ I, are independent iff their indicator functions I An' n ~ I, are
independent LV.S iff~n = {0. n, An' A~}. n ~ I. are independent classes.

2. If the classes of events d nand !2 are independent, n ~ I. so are U:,=. d nand !2.

3. (i) Any r.v. X is independent of a degenerate r.v. Y. (ii) Two disjoint events are
independent iff one of them has probability zero. (iii) If P {X = ± 1, Y = ± I} =
for all four pairs of signs, then X and Yare independent r.v.s.
*
4. Let n = {wo , w., w 2, w3}, §' = {A: A en}, P{wJ = to::; i ::; 3, and otherwise
Pisdefined by additivity. If A j = {wo , wJ, I ::; i ::; 3, then each of {A., A 2 }, {A l' A 3 },
{A 2' A 3} is a pair of independent events but the events A I' A 2, A 3 are not inde-
pendent. On the other hand, if B. = {wo}, B 2 = A 2 , B 3 = 0, then (I) obtains for
m = 3 but the events BI> B 2 are not independent.

5. (i) If X 1>"" X n are independent LV.S and 9j, I ::; i ::; n, are finite Borel functions on
( - 00, 00), then 1'; = 9j(Xj), 1 ::; i ::; n are independent LV.S. In particular,
- X., ... , - X n are independent LV.S. (ii) If Bi> I ::; i ::; n, are linear Borel sets and
1'; = XJrx,EB;J' where I A isthe indicator function oftheset A,show that {1';, I ::; i ::;.Il}
are independent LV.S.

6. If {X n , n ~ I} are i.i.d. r.v.s with P{X 1 = O} < I and Sn = D


Xi' n ~ I, then for
every c > 0 there exists an integer n = nc such that P {ISn I > c} > O.
60 3 Independence

7. If X, Y, Z are LV.S on (Q, §, P) and signifies the relation of independence, prove or


0

give counter examples for:


•. X 0 Y iff X 2 0 y 2
II. X 0 Y, X 0 Z iff X 0 (Y + Z).
III. X 0 Y, Y 0 Z imply X 0 Z.
IV. X 0 (Y, Z), Yo Z imply X, Y, Z independent r.v.s.

8. If X and Yare independent LV.S and X + Y is degenerate, then both X and Yare
degenerate.

9. Let {X., n ~ I} be i.i.d. LV.S with P{X j = Xj} = 0 for i #j. If 1'; = D=I
J1Xj$x,J'
i ~ I, prove that {1';, i ~ I} are independent LV.S with P{ 1'; = }} = Ifi,} = 1,2, ... , i,
and i ~ I.

10. In Bernoulli trials with success probability p, let Y be the waiting time beyond the 1st
trial until the first success occurs, i.e., Y = } iff X j + 1 = I, X j = 0, i :$; }, where
{X J , . . . , Xj} are i.i.d. with P{X j = I} = p = I - P{X j = OJ. Find P{ Y = }},
} = 0, I, ... , which for obvious reasons is called the geometric distribution. If
YI ,· .• , Y,. are i.i.d. LV.S with a geometric distribution, find the p.d.f. of S, = D=
I 1';,
known as the negative binomial distribution (with parameter r). Hint: S, may be
envisaged as the waiting time beyond the rth trial until the rth success.
II. If Y and Z are independent Binomial (resp. Poisson, negative binomial) r.v.s with
parameters (n, p) and (m, p) (resp. A. 1 and A. 2, r. and r2), then Y + Z is a binomial
(resp. Poisson, negative binomial) LV. with parameter (m + n, p) (resp. A. I + A. 2 ,
r J + r2)' Thus, if {X., n ~ I} are i.i.d. Poisson LV.S with parameter A., the sum
S" = I? X j is a Poisson LV. with parameter nA..
12. In the random castingofr balls into ncells, let 1'; = I iftheithcellisemptyand =0
otherwise. For any k < n, show that the p.d.f. of 1';" ... , 1';k depends only on k and
not on i l , ... , ik • Hint: It suffices to consider P{ 1';, = I, ... , 1';k = I} for all k :$; n.
13. If{A.} is a sequence of independent events with P{A.} < I,n ~ I,andP{Uf A.}=I,
then P{A., i.o.} = I.
14. Let Q = [0, I] and .s;1 = Borel subsets of Q, and let P be a probability measure on
.s;1 such that P{[a, b)} = b = a for 0 :$; a :$; b :$; I. Such a "Lebesgue measure"
exists and is unique (Section 6.1). For WE Q, let W = W I W 2 ... be the decimal
expansion of W (for definiteness, no "finite expansion" is permitted). Prove that
{w., n ~ l} are independent LV.S with P{w. =}} = to,} = 0, 1, ... ,9.

15. If Np is a geometric LV. with parameter p, that is, P{ Np = k} = pqk, k ~ 0, prove that
limp~o P{pNp < x} = I - e-X, x > 0, and check that F(x; A.) = (I - e-AX)I[X>OI is
a dJ. for any A. > O. A r.v. with d.f. F(x; A.) is said to have an exponential distribution
with parameter A..

16. If {A., n ~ I} is a sequence of events with L~=I P {A.} = 00, show that
lim sup.~<Xl P{Uj~~AjIA.} = I for all m ~ 1.
3.2 Borel-Cantelli Theorem, Characterization of Independence 61

3.2 Borel-Cantelli Theorem, Characterization of


Independence, Kolmogorov Zero-One Law
The Borel-Cantelli theorem is a sine qua non of probability theory and is
instrumental in proving strong laws of large numbers, the law of the iterated
logarithm (Chapter 10), etc. The portion of Theorem 1 that is complementary
to Lemma 2.2.2 postulates independent events and while this proviso can be
weakened (Lemma 4.2.4) some such restriction is necessary (see Example 1).

Theorem 1 (Borel-Cantelli). If {An, n ~ I} is a sequence of events with


Lf P{A n} < 00, then P{A n, i.o.} = O. Conversely, if the events {An, n ~ I}
are independent and If
P{A n} = 00, then P{A n, i.o.} = 1.
PROOF. The first part is just the Borel-Cantelli lemma of the prior
chapter. If the events are independent, Theorem 3.1.3 ensures that
lim sUPn~oo P{Ui=n+1 AJ = 1; therefore, since P{U~nAJ is a monotone
sequence,

1 = lim P
n-oo
{Uj=n
Ai} = P{A n, i.o.}. o

Corollary 1 (Borel Zero-One Criterion). If{An, n ~ I} are independent events,


P{A n, i.o.} = 0 or 1 according as If
P{A n} < 00 or P{A n} = 00. Lf
EXAMPLE 1 (D. 1. Newman).l In a sequence of Bernoulli trials {X n} with
success probability p E (0,1), define N n to be the length of the maximal
success run commencing at trial n, that is,
{w: N n = j} = {w: X n + i = 0, Xi = 1, n ~ i < n + j}, j ~ O.
If Log n denotes logarithm to the base lip, then

p{rrm L ogn
n~oo
Nn
= I} = 1. (1)

PROOF. Since (Exercise 3.1.10) N n has a geometric distribution

P{N n > a Log n} = L qpj ~ paLogn = ~,


j>aLogn n
and so by the Borel-Cantelli theorem
P{N n > a Log n, i.o.} = 0, a> 1,
implying

p{rrm L ogn
n~oo
Nn
~ I} = 1. (2)

1 See (Feller, 1950, p. 210).


62 3 Independence

To circumvent the dependence of the events {N n > a Log n}, n ~ I, define


kn = [n Log n] = greatest integer equal to or less than n Log n. If Log 2 n
denotes Log Log nand 0 < a < 1.
kn + [a Log kn] :::; (n + I)Log n - Log n + a(Log n + Log 2 n)
:::; k n + I - (1 - a)Log n + a Log 2 n + I,
whence
kn+ 1 - (k n + [a Log kn]) ~ (I - a)Log n - a Log 2 n - 1 > I
for n ~ no. Consequently, the events

are independent and, moreover,


P {A n } -- P{X kn -- 1, .•. ,
X kn + [0 Log knI -- I} -- P[oLogknl+ I

> poLogk n + I > P


- - (n Log n)O'

implying L::'=no P{A n} = co. Thus, by the Borel-Cantelli theorem, for a E(O, 1)
P{N kn ~ a Log k n , i.o.} ~ P{N kn ~ [a Log k n ] + 1, i.o.} = I,

yielding

p{rrm L
n-oo
Nn
og n
~ I} ~ p{lim
n-oo
N knk
Log n
~ I} = I,

which, in conjunction with (2), proves (I). o


Next, it will be shown that independence ofa finite set ofr.v.s depends only
upon their joint dJ. Some preliminary lemmas permitting enlargement of
independent x-classes are needed.

Lemma 1. If~ and ~ are independent classes ofevents and ~ is a x-class, then ~
and (1(~) are independent.
PROOF. For any B E~, define

~* = {A: A E (1(~) and P{A . B} = P{A} . P{B}}.

Then: OE~*; AI - A2E~* if Aj, A2E~* and AI:::> A 2 ; AE~* if A =


lim An' where An E ~* and An C A n+ l , n ~ 1. Thus, ~* is a A.-class which
contains ~ by hypothesis. Consequently, ~* :::> (1(~) by Theorem 1.3.2. In
other words, P{A' B} = P{A}' P{B} for any BE<§ and every A E(1(~). 0

Lemma 2. Let {XI> t E T} be a stochastic process on (O,~, P) and TI , T2 non-


empty disjoint subsets of T. For i = 1,2 define ~i to be the class of all sets
3.2 Borel-Cantelli Theorem, Characterization of Independence 63

D~ = {X'li ) < ..1. 1 , .•. , X,l:,) < ..1.m }, where m is a positive integer, ..1.j a real
number, and tYI E 1;, 1 ~ j ~ m. If £i) I and £i)2 are independent classes, then so
are O"(X t E TI ) and O"(X t E T2 ).
" "
PROOF. Since £i) I and £i) 2 are n-classes and £i) I is independent of £i) 2, Lemma 1
ensures that £i)1 and 0"(£i)2) are independent. A second application of Lemma 1
yields independence of 0"(£i)1) and 0"(£i)2)' Since O"(£i)) = O"(X t E 1;), i = 1,2,
Lemma 2 is proved. "

Corollary 2. If the r.v.s X" t E T, are independent and TI , 72 are nonempty


disjoint subsets of T, then O"(X" t E TI ) and O"(X t E T2) are independent.
"
The joint dJ. of any finite set of r.v.s X I' ... , X. has been defined in (2) of
Section 1.6.by

FXI ..... Xn(Xt'···'xn)=P{XI <XI,""Xn<x n}, (3)

Theorem 2. Random variables X I' ... , X n on a probability space (n, Y', P) are
independent if and only if
n

F XI ..... Xn = fl F Xi'
j= I
(4)

PROOF. Necessity is trivial since {Xj < Xj} is an event in O"(X j), 1 ~ j~ n. To
prove sufficiency, let £i) I be the class of sets of the type {X n < ..1.n} while £i) 2 is
the class of sets of the form {X I < ..1. 1, ... , X n- I < ..1.n- d. Then £i)1 and £i)2
are n-classes and independent. It follows from Lemma 2 that O"(Xn) and
O"(X h ... , X n - I ) are independent. Consequently, if An E O"(X n) and Ai E O"(X)
C O"(X I , . . . , X n - 1 ), 1 :::;; j < n,

(5)

If n = 2, the proofis complete. Otherwise, since (4) holds with n replaced by


n - 1 so does (5), and repeating the argument a finite number of times
n

P{A I A 2 ... An} = fl P{A j}


i= I

for all A j E O"(X i ), 1 ~ i ~ n. Thus, the classes O"(X j ), 1 ~ i ~ n, and hence also
the r.v.s Xi' 1 ~ i ~ n, are independent. 0

Corollary 3. (i) The random variables comprising a stochastic process {X" t E T}


are independent iff for all finite subsets of indices {tJET, 1 ~ j ~ m} the joint
distribution of X", ... , X,," coincides with flj=
I F X'i'
(ii) If {X n , n ~ I} are independent LV.S, {Y", n ~ 1} are independent LV.S
and X. and Yn are identically distributed for n ~ 1, then F x I ..... X n = FyI ..... Yn'
n ~ 1.
64 3 Independence

It is a remarkable fact that probabilities of sets of a certain class defined in


terms of independent r.v.s cannot be other than zero or one. The sets in
::Juestion are encompassed by the following

Definition. The tail a-algebra of a sequence {X n' n 2:: I} of r.v.s on a prob-


ability space (0, ffl, P) is n:= 1 <I(X j, j 2:: n). The sets of the tail <I-algebra are
called tail events and functions measurable relative to the tail <I-algebra are
dubbed tail functions.

A typical example of a tail event of {X n' n 2:: I} is {OJ: I:= 1 X n( OJ)


converges} since convergence depends entirely upon the "tail of the series."
If An, n 2:: 1, are independent events, then X n = IAn are independent r.v.s.
Moreover, U~n AjE<I(Xj , j 2:: m) for n 2:: m 2:: 1 whence

co co
{An,i.o.} = n U Aj E<I(Xm,Xm+ 1 ,···), allm2:: 1.
n=m j=n

The Kolmogorov zero-one law (below) confines the value of P{ An, i.o.} to
zero or one while the Borel zero-one criterion provides a touchstone. The
zero-one law, however, applies to independent r.v.s other than indicator
functions.

Theorem 3 (Kolmogorov Zero-One Law). Tail events of a sequence {Xn ,


n 2:: I} ofindependent random variables have probabilities zero or one.
PROOF. By Corollary 2, for each n 2:: 1 the classes <I(X i , 1 ~ i ~ n) and
<I(Xj,j> n) are independent and so a fortiori are <I(X j , 1 ~ i ~ n) and
n:=o <I(Xj,j > n) = ~ (say) for every n 2:: 1. The latter implies that
.511 = U:= 1 <I(X j , 1 ~ j ~ n) is independent of ~. Since .511 is an algebra
(Exercise 1.3.4) and hence a n-class, Lemma 1 ensures that <1(.511) and ~ are
independent. But ~ C <I(X n, n 2:: 1) = <1(.511), whence the tail <I-algebra .Ql is
independent of itself! In other words, for every B E~, P{B} = P{B· B} =
P 2 {B}, implying P{B} = 0 or 1. 0

Corollary 4. Tail functions of a sequence of independent r.v.s are degenerate,


that is, a.c. constant.
PROOF. For any tail function Y, by the zero-one law pry < c} = 0 or 1 for
every c in ( - 00, (0). If P{ Y < c} = 0 for all c, then P{ Y = oo} = 1, whereas
if P{Y < c} = 1 for all c, then P{Y = - oo} = 1. Otherwise Co =
inf{c: pry < c} = l} is finite, whence pry = co} = 1 via the definition
of co. 0

Corollary 5. If {X n' n 2:: I} is a sequence of independent r.v.s then Ilriin_ co X n


and limn- co X n are degenerate.
3.3 Convergence in Probability 65

PROOF. Since for each n ~ k ~ 1, X n is a(Xj,j ~ k)-measurable, Yk =


SUPn~k X n is a(Xj,j ~ k)-measurable (Section 1.4), whence Yn is a(Xj,j ~ k)-
measurable for n ~ k ~ 1, implying mnn_a:> X n = limn_a:> SUPj~n X j =
lim n_", y" is a(Xj,j ~ k)-measurable for all k ~ 1. Thus, mnn_a:> X n is
nk";l a(Xj,j ~ k)-measurable, i.e., a tail function, and similarly for lim X n,
so that the conclusion follows from Corollary 4. 0

EXERCISES 3.2
1. Find a trivial example of dependent events A" with I P{A"} divergent but
P{A", i.o.} < 1.
2. If X", n :?: I, are independent LV.S, prove that P{lim"~OC' X" = O} = I iff
<Xl
I P{IX"' :?: r.} < 00,
n= 1

all r. > O.
3. Prove that if(i) {X", n:?: I} are i.i.d. N(O, (12) random variables, then

P { r=
urn X" = (J} = 1.
"~<Xl J210g n
If, rather, {X", n :?: I} are i.i.d. exponential LV.S with parameter ;. (Exercise 3.1.15)
then P{limn~<Xl Xn/log n = r J } = 1. (ii) If {X", n ~ I} are i.i.d. Poisson LV.S with
parameter A, then P{limn~oo Xn(log log n)/log n = 1} = 1 irrespective of A. Hint:
Recall Exercise 2.1.4.
4. Show that random vectors X = (X" ... , X m) and Y = (YI , . . . , yo) are independent
(of one another) iff the joint dJ. of X and Y is the product of the dJ.s of X and of Y.
5. If {Xn , n :?: 1} is a sequence of finite-valued Lv.s and Sn = I1
Xi' determine which
of the following are tail events of {Xn , n:?: I}: {Sn converges}; {limSn > IimSn};
{lim Sn = oo}; {X n > C, i.o.}; {limX n = O}; {Sn > Cn' i.o.}.
6. If {Xn, n :?: I} is a sequence of independent finite-valued LV.S and {an, n :?: I} is a
sequence of finite constants with 0 < an --+ 00, show that lim Snlan and lim Snla" are
degenerate.
7. (Barndorff-Nielsen,) If {A", n :?: I} are events satisfying I:= 1 P{A"A~+ I} < 00
and P{A"} = 0(1), then P{A", i.o.} = O.
8. Let {Xn,n:?: I} be i.i.d. LV.S with P{X 1 = I} = P{X I = -I} = 1 and set S" =
Ii= 1 X;, Prove that
i. Iim"_<Xl P{SJ';;; < x} = <l>(x), - YJ < X < x,
ii. P{inf";" S" = -oo} = P{SUPn;'l Sn = oc} = 1.
9. If X", n :?: I, are i.i.d. demonstrate equivalence of the following relations:

i. rrm"~OC IX"l/n ~ I,a.s.;


ii. I:=I P{IXnl > n} < 00;
iii. I:=I n P{n - I ~ IX,I < n} < 00.

10. Verify for i.i.d. r.v.s X n , n ~ I, that p{rrmn~oo X n = oo} = I if and only if XI is
unbounded above, i.e., P{X 1 < C} < I, all C < 00.
66 3 Independence

11. For each n ~ 1, let {X•. j,j ~ l} be a sequence of independent LV.S. Then
sup/X.)!. 0 iffIi'=1 P{IX.) > e} = 0(1), all e > O.
12. If {X., n 2:: I} are independent LV.S with E X. = 0, E X; = 1 which obey a central
limit theorem, i.e., lim._., P{S. 2:: xn 1/2} = I - <1>(x), x E ( - OC, (0), where S. =
2:7 Xi' n ~ 1, prove that fiffi._., S.Jn l / 2 = 00, a.c. Hinr: Utilize the zero-one law.

3.3 Convergence in Probability, Almost Certain


Convergence, and Their Equivalence for
Sums of Independent Random Variables
If {X., n ~ I} is a sequence ofr.v.s on (n,~, P), the set C, where X., n ~ 1,
converges (i.e., lim X.(w) exists and is finite) is a measurable set since

C = tOI .VI DI
co co co {
IX.+i(w) - X.(w)l <
I}
k.
Moreover, if P{C} = I, there exists a T.v. X on (n,~, P) such that

.-co X. = X} = I.
p{lim (I)

It suffices to define
X(w) = lim X.(w), WEC,

= lim X.(w), WECC,

and clearly X is a r.v. In such a situation, the r.v.s X., n ~ I, are said to
converge almost certainly (or almost surely) to the r.v. X, denoted symbolically
by X. ~X. Even if X(w) is not a r.v. it is still possible that lim._ co X.(w) =
lim._ oo X.(w) = X(w) with probability one, likewise denoted by X.~X.
The former will be distinguished from the latter on occasion by writing
X.~ a T.v. X or X. ~X, finite.

Definition. A sequence of r.v.s {X., n ~ I} on a probability space (n, ~, P)


is said to converge in probability (to X) if there exists a r.v. X on (n, ~, P)
such that
P{IX. - XI > e} = 0(1), all e > O.
This will be denoted symbolically by X• .E.. X.

If r.v.s X • .E.. X and X. ~ Y, then X = Y, a.c. (Exercise I). Moreover, the


calculus of convergence of real numbers carries over to convergence in
probability (Exercise l(ii), l(iv».
The next lemmas analyze a.c. convergence and convergence in probability.
3.3 Convergence in Probability 67

Lemma I. Random variables X n ~a r.v. X iff sUPj~nlXj - XI-f. 0 ifffor all


I::,b > 0

P {O [IXj - XI ~ I::]} ~ 1- b for n ~ N(I::, b).

PROOF. By Lemma 2.2.1, X n ~ X, a r.v., or equivalently X n - X ~ 0, iff


P{IX n - XI > 1::, Lo.} = 0, I:: > O.
Hence, X n ..!!4 X iff for all I:: > 0

!~~ P{~~~IXj - XI > I::} = !~~ pLQn [\X j - XI > I::]}

= pLOt jQn[IX j - XI > I::]} = 0, (2)

that is, iffsuPj~nlXj - XI-f. O. The final condition of the lemma is simply a
restatement in finite terms of the following alternative form of (2):

I:: > O.
o
If X, X n, n ~ 1 are r.v.s with L:'=l P{IXn - XI> I::} < 00, all I:: > 0 then the
Borel-Cantelli lemma ensures that X n ~ X.

Corollary l. Ifrandom variables Xn~ a LV. X, then X n ~ X.

Although a.c. convergence of r.v.s X n' n ~ 1, is, in general, stronger than


convergence in probability, the latter does circumscribe rrm
X n and lim X n
(Exercise 5).

Corollary 2. If random variables X n ..!!4 a r.v. X, then g(X n)..!!4g(X)for g


continuous.

Strictly speaking, the latter is not a corollary of the lemma but rather an
immediate consequence of the definition and continuity.

Lemma 2. Random variables X n !. a r.v. X iff(i)suPm>n P{ I X m - Xnl > I::} =


0(1), I:: > 0, iff (ii) every subsequence of {X n, n ~ I} has itself a subsequence
converging a.c. to the same r.v. X.

PROOF. If X n ~ X, then for aliI:: > 0, n ~ no(l::) implies P{ IX n - X I > I::} < 1::.
Hence, for m > n ~ no(I::/2)
68 3 Independence

which implies (i). Conversely, if (i) obtains, then for any integer k ~ 1,
P{IX n - Xml > 2- k} < 2- k provided n> m ~ mk' Set n l = m\,ni+1 =
max(nj + 1, mi+I), and X~ = X nk and Ak = {IX~+ 1 - X~I > 2- k}. Then
:D"'= 1 P{A k } < 00, whence the Borel-Cantelli theorem ensures that, apart
from an w-set A of measure zero, IX~ + 1(w) - X~(w) I :$ 2- k provided k ~
some integer ko(w). Hence, for WE AC, as n -+ 00
00 00 1
~~~IX;" - X~I:$ k~nIX~+1 - X~I:$ k~nrk = 2n - 1 = 0(1),
and so

p{rrm X~ = lim X~, finite} = 1.


Ic-C() k-et)
(3)

If X = lim X~, then X is a r.v. and, according to (3), X nk 24 X as k -+ 00. By


Corollary 1, X nk -f. X. Since for any e > 0

P{IXk - XI > e} :$ P{IXk - Xnkl > i} + P{IX nk - XI > H = 0(1)

as k -+ 00, X k ~ X.
Next, if X n ~ X, any subsequence of {X n}, say X; -f. X, whence, as already
shown, there exists a subsequence of {X;}, say X~ ~ some LV. Y. Then
X~ -f. Y but also X~ -f. X, necessitating X = Y, a.c. (Exercise 1). Thus, the
subsequence X; has a further subsequence X~ ~ X.
Finally, if X n does not converge in probability to X, there exists an e > 0
and a subsequence Xnkwith P{IX nk - XI > e} > e.Butthennosubsequence
of X nk converges in probability to X (a fortiori almost certainly to X) in
violation of (ii).

Corollary 3. If random variables X n -f. X, then g(X n) -f. g(X) for every
continuous function g.
PROOF. Every subsequence of Yn = g(X n ) has a further subsequence y"k =
g(XnJ with X nk ~ X. By Corollary 2, Ynk = g(XnJ ~ g(X), whence, by
Lemma 2, g(X n) = Yn ~ g(X).

Corollary 4. Random variables Xn~ a r.v. X iff (iii) sUPm>nlXm - Xnl.f. O.


PROOF. If X n ~ X, both sUPm>nlXm - XI-f. 0 and IX n - XI ~ 0 via
Lemma 1, whence for any e > 0

P{~~~IXm - Xnl > e} :$ P{~~~IXm - XI > i}


+ P{IX - Xnl > i} = 0(1).
3.3 Convergence in Probability 69

Conversely, (iii) entails sUPm>n P{IX m - Xnl > t:} = 0(1), all e > 0, and so
by Lemma 2 there exists a LV. X with X n E. X. Thus, for all e > 0

P{~~~IXm - e} ~ P{~~~IXm - n
n
XI > Xnl >

+ P{IXn - XI > = 0(1)

as n -+ ro, implying X n ~ X by Lemma 1. o


The question of a.c. convergence of a sequence of L v.s {X n' n 2:: I} depends
only upon the corresponding sequence of joint dJ.s {Fx, ... x., n 2:: I}. In
other words, if {X n' n 2:: I} and {Yn , n 2:: I} have the same finite-dimensional
joint dJ.s, then X n ~ X iff y" ~ Y. In fact, setting Am = {IXm - Xnl > t:},

P{sUPIXm - Xnl >


m>n
e} = p{ U Am}
m=II+1

= P{An+d + P{A~+ IA n+2}


+ P{A~+IA~+2An+3} + ...
= P{sUPI Ym
m>n
- Y"I > e}
by Theorem 1.6.1, and so the equivalence follows from Lemma 1.

Suppose that for some constants 0 < bn i 00, bn+ Ilb n -+ 1, i.i.d. r.v.s
{X n , n 2:: I} and partial sums Sn = L:~ Xi, n 2:: 1,

Sn ---+
-
•.c. S finite. (4)
bn '

Then, clearly, it is necessary that

(5)

and via the Borel-Cantelli theorem


00 00

L: P {IX I I > t:b = n;1


n;1
L: P {IX nI > eb
n} n} < 00, all t: > O. (6)

This is a restriction on the dJ. of X I (see Corollary 4.1.3) and thereby pro-
vides a necessary condition for (4). On the other hand, if (4) is replaced by

(7)
70 3 Independence

the dJ. of X 1 should likewise be constrained, but the simple subtraction of (5)
leads nowhere. However, (6) is still necessary for (7) if the r.v.s {X n' n ~ I}
are i.i.d. as follows from the second of the ensuing lemmas.

Lemma 3 (Feller-Chung). Let {An, n ~ I} and {B n, n ~ I} be sequences of


events on (0, fF, P) and set Ao = 0. If either (i) Bn and AnA~-1 ... Aoare
independentforall n ~ 1 or (ii) the classes {B n } and

are independent for all n ~ I, then

PROOF. In case (i)

pL91 AnBn } = pL9IBnAnZi(BjA)C} = JIP{BnAnZi(BjAj)C}

~ n~1P{BnAn~O>j} = J1 P{Bn}P{An:O~Aj}
~ (~~~ P{Bn}) pL91 An},
and in case (ii)

p{U AjBj} ±P{AjB j n(Ai Bi)C} ~ j=1±P{AjBj nA~}


I
=
j=1 j+1 j+1

= ±P{Bj}P{Aj nA~} ~
j=1 j+1
inf P{BJP{U Aj},
Isjsn I

whence the conclusion follows as n -+ 00. o


Lemma 4. Let {Y", n ~ I} and {Zn, n ~ I} be sequences of r.v.s such that
either (i) Y" and (Z1"'" Zn) are independent for all n ~ 1 or (ii) Y" and
(Zn, Zn+ 1" ..) are independent for all n ~ 1. Then for any constants en, bn,
e, and b,

P L9/ Zn + Y" > en]} ~ P L9 1


[Zn > en + bn]} !~~ P{ Yn ~ - bn}, (8)

p{nm (Zn + y") ~ e} ~ p{lim Zn > e+ b}. lim P{Yn ~ -b}. (9)
n-oo "-00 "-00
3.3 Convergence in Probability 71

a.c., entails liffin-+ oo Zn ~ e, a.c. Furthermore, if lim..-+oo P{Yn > -b} .


00 P{ y" < b} > 0 for all 15 > 0 ( a fortiori, if Y" ~ 0), then Zn + Y" ~ 0
lim..-+
implies Zn ~ O.
PROOF. Set An = {Zn > en + bn}, Bn = {Yn 2: -bn}. By Lemma 3, for m 2: 1

ptOm [Y" + Zn > en]} 2: pt9mAnBn} pt9m


2: An} ni~~ P{Bn},
yielding (8) for m = 1 and (9) via en == e, bn == 15, m -+ 00. The penultimate
statement follows easily from (9), and since both (8) and (9) also hold for
{ - Y,,}, { - Zn} the final assertion likewise obtains. 0

EXAMPLE 1 (Chung). Let {Xn, n 2: I} be independent r.v.s with partial sums


Sn = L~ Xi> n 2: 1. If SJn ~ 0 and S2"/2 n ~ 0, then SJn ~ O.
PROOF. For k = 1,2, ... there is a unique integer n(k) such that 2n(k)-1 ~
k < 2n(lt). Take e > 0 and set

Since Bk and A k A k- I .•. A~ A~ (A o = 0) are independent and en::;)


U~::} AkBlt , by Lemma 3

p{ UCk} p{U AkBk} p{U


m+ 1
2:
2'"
2:
2'"
Alt } inf P{Blt }·
k ~ 2'"
(10)

By hypothesis and Lemma 1 the left side of (10) is 0(1) as m -+ 00, and
moreover,
P{Bk} ::;; P{IS2"(k,j > 2n(k)-l e} + P{lSkl > 2n(k)-l e} = 0(1)
as k -+ 00. Consequently, P{Ui... Ad = 0(1) as m -+ 00, and so SJn ~ O.
o
Although, in general, a.c. convergence is much stronger than convergence
in probability, in the special case of sums Sn of independent random variables
the two are equivalent. A basic tool in demonstrating this equivalence for
sums of independent random variables is an inequality due to P. Levy. This
in turn necessitates the

Definition. For any r.v. X a real number m(X) is called a mf'dian of X if


P{X ~ m(X)} 2: t ::;; P{X 2: m(X)}.
In fact, if a = inf{A: P{X ::;; A.} 2: t}, then lal < 00 and, since (Exercise
1.6.3) P{X ::;; ..l} is right continuous, P{X ::;; a} 2: t. By definition,
P{X ::;; a - e} < t
72 3 Independence

for all e > 0, and so letting e -+ 0, P{X < a} ::; ! or equivalently P{X ~ a}
~ !. Thus, a is a median of X.
A pertinent observation concerning medians is that if for some constant c
P{IXI ~ c} < e::;!, then Im(X)1 ::; c. Moreover, if c is any finite constant,
cm(X) and m(X) + c are medians of cX and X + c respectively.

Lemma 5 (Levy Inequalities). If {X j , 1 ::; j ::; n} are independent LV.S,


Sj= II= I Xi, and m( Y) denotes a median of Y, then for every e > 0

p{ max [Sj - m(Sj - Sn)]


IS}Sn
~ e} ::; 2 P{Sn ~ d, (11)

pt~::nISj - m(Sj - Sn)1 ;::: e}::; 2 PUSnl ~ e}. (12)

PROOF. Set So = 0 and define T to be the smallest integer j in [1, n] for which
Sj - m(Sj - Sn) ~ e (if such an integer exists) and T = n + 1 otherwise. If
B j = {m(Sj - Sn) ~ Sj - Sn}, 1 ::; j ::; n,
then P{B j };:::!. Since {w: T = j} E a(X I, ... , X), B j E a(X j + 1>'''' X n),
and {Sn ;::: e} ::> Uj= I Bj{T = j},
n n

P{Sn ~ e} ;::: I P{BlT = n} = I P{B j } • P{T = j} ~ ! P{1 ::; T ::; n},


j=1 j=1

which is tantamount to (11). Rewrite (11) with Xjreplaced by - X j , 1 ::; j ::; n,


recalling that m( - Y) = -m(Y), and add this to (11) to obtain (12). 0

Definition. A r.v. X is called symmetric or said to have a symmetric d.f. if X


and - X have the same distribution function.

It is easy to verify that X is symmetric iff P{X < x} = P{X > -x} for
every real x and also that zero is a median of a symmetric r.v. It follows from
Corollary 3.2.3 that sums of independent symmetric r.v.s are themselves
symmetric r.v.s. This leads directly to

Corollary s. If {X j , 1 ::; j ::; n} are independent, symmetric r.v.s with partial


sums Sn = I~ X j , then for every e > 0

(13)

(14)

Theorem 1 (Levy). If {X n , n ~ I} is a sequence of independent LV.S, then


Sn = I~ Xi converges a.c. iff it converges in probability.
3.3 Convergence in Probability 73

PROOF. It suffices to verify sufficiency. By Lemma 2, for any e in (0, ;}), there
exists an integer ho such that n > h ~ ho implies, setting Sh,. = S. - Sh'
that P{lSh,.1 > e} < e. In view of an earlier comment, this entails Im(Sh,.)1
:5; e for n > h ~ ho . By Levy's inequality (12), for k > h ~ ho

p{ max ISh,. I >


h<.:5:k
2e} = p{h<.:5:k
max ISh,.1 > 2e, max Im(S.,k)1 :5; e}
h<.:5:k

:5; 2 P{lSh,kl > e} < 2e.

Hence, letting k -+ 00, if h ~ ho ,

p{su p ISh,.1 > 2e} :5; 2e,


.>h

and so S. ~ some r.v. S by Corollary 4. o


Lemmas 4 and 2 may be exploited to give an alternative proof of Levy's
Theorem. Since S• .!. S, the latter ensures the existence of a subsequence k.
with Sk" ~ S. Now for every integer m > 0, there is an integer n = n(m) such
that k. :5; m < k.+1' Clearly, m -+ 00 entails k. -+ 00.
Set Ym = S - Sm and Zm = Sm - Sk . By hypothesis, Ym ':' 0 and moreover
Ym + Zm ~ 0 by the choice of k•. "Clearly, (ZI"'" Zm) is a(X 1"'" X m)-
measurable and as noted in the proof of Corollary 3.2.5, Ym is a(Xj , j > m)-
measurable. Since the two a-algebras are independent via Corollary 3.2.2, so
are (ZI"'" Zm) and Ym for every m ~ 1. Hence, by Lemma 4, Zm ~ 0
implying Ym ~ O. D

EXERCISES 3.3
I. i. If X • .!. X and X • .!. Y,thenP{X = Y} = I.
ii. X • .!. X and Y• .!. Y imply X. + Y• .!. X + Y.
iii. X. ~ 0 implies m(X.) --+ O.

iv. If X • .!. X, Y• .!. Y,andgisacontinuousfunctiononR 2,theng(X., y,,).!:. g(X, Y).

2. Let{X.,n~ l}and{Y",n:2: I} be two sequences ofr.v.swith F x 1. ...• x n = F y 1• ... , y n


for n :2: I. If X• .!. X, prove that Y• .!. Y and that X and Yare identically distributed.
Hint: Apply Lemma 3.3.2.

3. i. What is wrong with the following" proof" of Corollary 3?

P{lg(X.) - g(X)1 > e} ~ P{IX. - XI > 8} = 0(1).


74 3 Independence

ii. A LV. X is symmetric iff X + and X - have identical d.f.s.


iii. If 0 is a median of a LV. X, it is also a median of XI!lxl <c) for any c > O.

4. If independent LV.S X.!. X, then X is degenerate. Prove for nondegenerate i.i.d.


LV.S {X.} that PiX. converges} = O.

5. For any sequence of LV.S {X., n ~ O} with X• .!. X o,

f
P lim X. ~ X ~ fiffi
0 X.} = 1.
) " ..... x "-00
Conversely, if fiffi.~ 00 X. = X 0 (resp.li.m..~ 00 X. = X 0), a.c., then for any € > 0
PiX. > X o + €} = 0(1) (resp. PiX. < X o - €} = 0(1»).

6. If {X., n ~ I} are independent, symmetric LV.S such that (lIb.) Ii Xi.!. 0 for some
positive constants b., then (llb.)max l ";,,. Xi!. O. Hint:

7. If the LV.S X.lb• ..!'. 0, where the constants b. satisfy 0 < b.l 00, then
max Im(X j - X.)I = o(b.).
1 SjS"

8. Prove for any LV.S {X.} and constants {b.} with 0 < b.l 00 that (i) (lIb.) I1 Xi ±
o implies X.lb. 2...
a.c.
0, (ii) if for identically
p
distributed {X.}, some nonzero
constant c and 0 < b. --+ 00, (lib.) L~ Xi --+ c, then b. - b.-I'

9. Let the stochastic processes {X., 1 ~ n ~ k} and {X~, 1 ~ n ~ k} be independent


of one another and have the same joint distributions. If m. is a median of X.,
I ~ n ~ k, then for € > 0 (see Lemma 10.1.1)

P { max IX. - m. 1 ~ €} ~ 2 P { max 1x. - X~ I ~ €}.


15:"$k 1 S"$II:

10. If r.v.s X. ~ X as n --+ 00 and {N., n ~ I} are positive-integer valued L v.s with
(i) N. ~ Xl, then X N " ~ X. If, rather, (ii) N• .!. Xl, that is, PiN. < C} = 0(1) all
C > 0 and X is a LV., then X N ".!. X.
II. If the w" on (O,:J', P) is .~.-measurable, n ~ 1, where .¥. is a decreasing
n:=
LV.

sequence of sub-u-algebras of :J' and w" ~ W, then W is I .j: .-measurable.

12. If X .. X 2' X 3 are independent, symmetric LV.S with P{ I X I + X 2 + X 31 ~ M} = 1,


then P{I~=lIXil ~ M} = I.
13. If {X, X., n ~ I} are finite measurable functions on a measure space (S, :E, J1) with
J1 { I x. - XI> t} = 0(1), € > 0, then SUPm >. J1 { I X m - X.I > €} = 0( I), € > 0, and
there exists a subsequence X' j with J1{limj~oo X' j oF X} = O.

14. Let {X., n ~ I} be LV.S such that P{IX.I ~ c > O} ~ b > 0, n ~ I. If {a.} are
finite constants for which a. X. !. 0, then a. --+ O.
3.4 Bernoulli Trials 75

15. IfLv.s X. ~ X, finite, prove thatforevery t: > Othere isaset A, with PIA,} < t:such
that lim X.(w) = X(w) uniformly on A~. This is known as Egorov's theorem.
(Hint: Verify that if A•. k = n1=' {IX j - XI < 2- k } and A = {tim X. = X}, then
lim._., P{A•. d = P{tim._., A•. d ~ PIA} = I, whence for some integers nk
P{ A~ •. k} < t:/2 k. Take A, = Uk'= I A~ •. k)' State and prove the converse.
16. If {X., n ~ I} are independent LV.S, Sm.• = Li=m+ 1 X j , S. = SO.• , then for any
f.> 0

.
This is Ottaviani's inequality. Hint: If T = inf{j

U
~ I: ISjl > 2t:}. then

{T = j,ISj.• 1 ~ t:} c {IS.I > t:}.


j= 1

17. If {X, X., n ~ 1} are i.i.d. symmetric r.v.s and S. = L~ Xi' then (i)
n
P{S. > x} ~ ZP{X > 2X}p·-1 {X ~ 2x}

for x> 0, and (ii) P{S. > x} ~ (n/2)P{X > x} [1 - (n - 1)P{X > x}]. Part (i)
is untrue if aU "twos" are deleted (Take n = 2, x = !, and P{ X = ± I} = t). Hint:
Apropos of (ii), define T = inf{j ~ I: Xi> x}.

18. Let S. = I~ Xi where {X., n ~ I} are independent LV.S and suppose that
lim._., P{S._I ~ -bn} > 0, all {j> O. Then rrm
Sjn ~ C < <x, a.c. implies
I:,=I PIX. > t:n} < x., all t: > C.

3.4 Bernoulli Trials


A sequence {X n , n ~ I} of i.i.d. LV.S with P{X. = I} = pE(O,I) and
P{X n = -I} = q = 1 - p constitutes a sequence of Bernoulli trials with
parameter p. Define S. = I~ Xi, n ~ 1. If y" = (X. + 1)/2, clearly
{y., n ~ I} is a sequence of Bernoulli trials with success probability p
(Section 1) and so {(Sn + n)/2, n ~ I} is a sequence of binomial r.v.s. Thus,
the DeMoivre-Laplace, Bernoulli, and Borel theorems all pertain to Ber-
noulli trials with parameter p.
According to the intuitive notion of fairness, a sequence of tosses of a fair
coin should at any stage n assign equal probabilities to each of the 2n n-tuples
of outcomes. If a gambler bets one dollar on correctly guessing each indi-
vidual outcome and Xi denotes his gain (or loss) at the ith toss, this is
tantamount to requiring that P{X 1 = ±1,,,,,Xn = ±1} = 2-· for each
of the 2n choices of sign, n ~ 1. Thus, his cumulative gain (or loss) after n
tosses is S. = IiXj, where {X n , n ~ I} are Bernoulli trials with parameter
1. The graph of Sn, n ~ 1, shows the random cumulative fortunes (gains) of
the gambler as a function of n (which may be envisaged as time), and the
fortunes Sn, n ~ I, are said to undergo or constitute a random walk.
76 3 Independence

The distribution of the length of time 7k to achieve a gain of k dollars is


implicit in Theorem 1 (Exercise 5) while the limit distribution of the "first
passage time" 7k as well as that of maxI sjsn Sj appear in Theorem 2.
The same limit distributions are shown in Chapter 9 to hold for a large
class of random walks, i.e., sequence of partial sums Sn, n ~ 1, of i.i.d. random
variables.
Clearly the r.v.S {X n, n ~ I} constituting Bernoulli trials with parameter
p = t are independent symmetric LV.S, and so by Corollary 3.2.3 the joint
dJ.s of (X I' ... , X n) and ( - X I' ... , - X n) are identical and Theorem 1.6.1
guarantees
P{(XI, ... ,Xn)EB} = P{(-X I, ... , -Xn)EB}
n
for any Borel set B of R .

Theorem 1. If {Xn,n ~ I} are i.i.d. with P{X 1 = I} = P{X 1 = -I} =t


and Sn = L7=
1 X;, thenfor every positive integer N:

p{ max Sj ~ N, Sn < N} = P{Sn > N}; (1)


1 ~J~n

p{ max Sj ~ N} = 2 P{Sn ~ N} - P{Sn = N}; (2)


1 SJsn

p{ max Sj
1 SJsn
= N} = P{Sn = N} + P{Sn = N + I}
=r
n
([(n + : + 1)/2]). (3)

where [A.] is the integral part of A. if A. ~ 0, [ - A.] =- [A.] if A. ~ 0;

P {Sj '" 0, 1 ~ j ~ n + I} = P {max Sj ~ o}


I SJ Sn

= P{Sn = o} + P{Sn = I} = 2-n([n~2]); (4)

P{SI 1= 0, ... , Sn '" 0, Sn+ 1 = o} = p{ max Sj


1 s,js,n-I
~ 0, Sn > o}. (5)

PROOF. Define T to be the smallest integer j in [1, n] for which Sj = N if


such j exists and T = n + 1 otherwise. Then [T = k] E O'(X I' ... , X k)
and ST = Non [T ~ n] since N ~ 1. Hence, in view of independence,

p{ maxSj ~ N,Sn < N} =P{T ~ n,Sn < N} = P{T < n,Sn < N}
Is,JS,n

=:t:P{T= k, i=t+IX; < o}


3.4 Bernoulli Trials 77

= nfp{T = k}P{
k=1
i
i=k+1
Xi <o}
= nil P{T = k}P{ i Xi> o}
k=1 i=k+1
n-I
= L P{T =
k=1
k, Sn > N}

= P{T < n, Sn > N} = P{Sn > N},


yielding (1). To obtain (2), note that via (1)

p{ m~x Sj ~ N}= p{ m~xSj


I Sjsn Sjsn
~ N,Sn < N} + p{ maxS
1 Sjsn
j ~ N, Sn ~ N}
1

= P{Sn > N} + P{Sn ~ N} = 2 P{Sn ~ N} - P{Sn = N}.


The first equality of (3) follows via (2) since

p{ m~xSj = N} = p{ ISjsn
ISjSn
m~xSj ~ N} - p{ m~xSj ~ N + I}
ISjsn
= 2 P{Sn ~ N} - 2 P{Sn ~ N + I} - P{Sn = N}
+P{Sn=N+l}
= P{Sn = N} + P{Sn = N + I}.
In proving the second equality of (3), if n + N = 2m for some m = 0, 1, ... ,

P{Sn = N} = p{Sn ; n=n ~ N} = (:)2- n


= ([(n +nN )/2J)2- n, (6)

and clearly P {Sn = N + I} = 0. Similarly, if n + N + 1= 2m for some m = 0,


1, ... then (6) holds with N replaced by N + 1.
Apropos of (4),
P{SI # O,,,,,Sn # O} = P{SI = -1,S2 ~ -1,,,,,Sn ~ -I}

+ P{SI = I,S2 ~ 1"",Sn ~ I}


= 2 P{SI = -I, S2 ~ -1, , Sn ~ -I}
= 2 P {X 1 = - 1, X 2 ~ 0, , X2 + ... + Xn ~ O}
= P{X 2 ~ 0, ... , X 2 + + Xn ~ O}
= P{X I ~ O"",X I + + Xn - l ~ O}

= p{ max Sj
ISjsn-1
~ oJ,
which is tantamount to the first equality of (4). To obtain the second, note
that via (2)
78 3 Independence

p{ max Sj:::;
ISjsn
o} = 1 - p{ ISjsn
max Sj ~ I} = 1 - 2 P{Sn ~ I} + P{Sn = I}

= P{Sn = O} + P{Sn = I}.

The last equality of (4) derives from

P{S2m = O} = C:)2- 2m
, P{S2m+1 = I} = Cmm+ 1)2- 2m - l
.

Finally, to obtain (5), note that via (4)


P{SI =I' O"",Sn =I' O,Sn+1 = O} = P{SI =I' O,,,,,Sn =I' O}
- P{SI =I' 0,. ",Sn+1 =I' O}

= P{ max Sj :::; O}
I sjSn- I

- P { max Sj :::; O}
I Sjsn

= p{ max Sj:::; 0, Sn >


I sjsn-I
o}. 0

Next, a local limit theorem (7) and global limit (8) will be obtained for
- = maxI sjsn Sj' Accordmg
Sn . to (8), the dJ. of n -1/2-Sn tends as n - 00 to the
positive normal dJ. The name arises from the fact that if X is N(O, I) then IX I
has the dJ. 2C1l(x) - I, x > O.

Theorem 2. Let {Xn , n ~ I} bei.i.d. r.v.s with P{X I = I} = P{X I = -I} = t


and partial sums Sn = It
Xi' n ~ 1, So = O. If Sn = maxI sjsn Sj and
1k = inf{ n ~ I: Sn = k}, k ~ 1 then for any sequence of positive integers Nn with
Nn = 0(n 2/3)

(7)

x> 0, (8)

lim P{1k < xk 2 } = 2[1 - C1l(x- I/2)], x> O. (9)


k-oo

PROOF. By (3) of Theorem 1

P{Sn = N} = P{Sn = N} + P{Sn = N + I},

and so, by DeMoivre-Laplace local central limit theorem (Lemma 2.3.2),


if N n + n is even,
3.4 Bernoulli Trials 79

{- _ }_P{Sn -_N }_b(N


P Sn - N n - n -
n + n.
2 ' n, 2
I) __ 2
(2nn)1/2 e
-N~/2n
,

and similarly when N n + n is odd.


Next, if x > 0, setting N = [xn l12 J and employing (2) and Lemma 2.3.2

P(Sn ~ xn 1/ 2 } = P{S. ~ N} = I - P{S. ~ N + I}


= 1 - 2 P{S. ~ N + I} + P{S. = N + I}
= 2 P{Sn ~ N} - 1 + 0(1). (10)

By the DeMoivre-Laplace global central limit theorem (Corollary 2.3.1)

P{S. ~ N} = P {n~~2 ~ x(1 + 0(1»} --+ C1>(x),

whence (8) follows from (10) and (7).


Finally, if x > 0, setting n = [xk 2 ], via (8) and continuity ofthe normal d.f.

P{1k ~ xk 2 } = P{1k ~ n} = P{Sn ~ k}


--+ 1 - [2cI>(x- 1 / 2 ) - 1] = 2[1 - cI>(x- 1 / 2 )],

and since

P{1k = xk 2 } ~ P{1k = n} ~ P{S. = k} = 0(1)


by the local central limit theorem, (9) follows. o

Theorem 3. Let {X.,n ~ I} be i.i.d. with P{X 1 = 1} = P{X 1 = -I} =t


and set S. = I?=
1 Xi> n ~ 1, So = O. Then

p{lim S.
n-oo
= oo} = 1 = p{lim S. = -oo},
n-C('I
(11)

P{S. assumes every integer value, i.o.} = I, (12)

P{S., n ~ 0 reaches k before - j} = j/U + k)for every pair of


positive integers j, k. (13)

PROOF. Since, probabilistically speaking, the sequence {X n , n ~ 2} does not


differ from {X., n ~ I}, defining qj = P{suP.~oS. ~j},j ~ 0, it follows that
for j ~ I
80 3 Independence

=t P{~~f J2 x ~ j - I} + t P{~~~ J2 X~ j + I}
i i

= tP{sUPS n ~j -
n~O
I} + tP{SUPSn
n~l
~j + I} = ¥qj-l + qj+l).

Hence, for j ~ I, qj - qj-l = qj+ 1 - qj = constant = c (say). Therefore, for


j ~ I,qj = cj + qo.SinceO:::;; qj:::;; I, necessarily c = O,whenceqj = qo = I
for j ~ l. That is, P{SUPn;, 1 Sn ~ j} = I for every j ~ I, whence

p{SUP Sn =
n~l
oo} = p{rrm Sn = oo} = l.
n-oo

By symmetry, P{limn_ <Xl Sn = - oo} = I and (II) is proved.


Next, if A = {suPn;, 1 Sn = infn;, Sn = - oo}, then by (II), P{A} = I,
00, 1
necessitating (12).
To prove (13), set r = j + k and
Yi = P{{Sn' n ~ O} reaches i before i - r}, i ~ O.

Then Yo = I, Yr = O. For 0 < i < r

Yi = P{X 1 = 1, {Sn - XI' n ~ I} reaches i - I before i - r - I}


+ P{X I = -I, {Sn - XI> n ~ I} reaches i + I before i - r + I}
= PYi-l + qYi+ 1> (14)
wherep = q = t.
As earlier, Yi = c + Yi-I = ci + Yo, 1 :::;; i:::;; r. Since Yo = I, Yr = 0,
necessarily Yi = 1 - (ifr) = (r - i)/r, 0 :::;; i :::;; r, and Yk = (r - k)/r = j/U + k).
o

Theorem 4. Let {Xn' n ~ I} be a sequence of Bernoulli trials with parameter


p "# t. If the partial sums are defined by Sn = L~ Xi' n ~ I, and So = 0, then
for any positive integers j, k
(p/qt - (p/q)k+ j
P{partial sums {Sn, n ~ O} reach k before - j} = 1 _ (p/q)k+ j . (15)

PROOF. Set r = j + k and s = p/q, where, as usual, q = I - p, and define


for any integer i ~ 0
Yi = P{ {Sn, n ~ O} reaches i before i - r}.

They Yo = 1, Yr = 0, and (14) obtains but with p "# t. Hence, for 0 < i < r
P(Yi - Yi-I) = q(Yi+ 1 - Yi) or Yi+ I - Yi = S(Yi - Yi-l)·
Thus, for 0 < i < r
Yi+l - Yi = S2(Yi_1 - Yi-2) = ... = Si(Yl - Yo), (16)
3.4 Bernoulli Trials 81

and clearly (16) holds for i = 0. Since s i: 1, for < ° i :::; r


i i i_ Si
Yi - Yo = m~I(Ym - Ym-l) = m~lsm-I(YI - Yo) = ~(YI - Yo)' (17)

Taking i = r in (17) reveals that - (I - s) = (I - S')(YI - Yo), and hence


-(I - Si) Si - S' Si - sj+k
Yi - Yo = (I _ s') or Yi = I _ s' = 1 _ sj+k '
yielding (15) for i = k. 0

When p = q, the right side of (15) becomes % and by I'Hospital's rule


Sk _ sj+k j
--,-----.,-,.+"'k -+ -. -k as s -+ 1.
l-s) }+

If it were known that the left side of (15) was a continuous function of p,
then (15) would imply (13).

EXAMPLE 1. Suppose that a gambler A with a capital of j dollars and his


adversary B whose capital is k dollars play the following game. A coin is
tossed repeatedly (sequentially) and at each toss A wins one dollar if the
outcome is a head while B wins a dollar if the outcome is a tail. The game
terminates when one of the players is ruined, i.e., when either A loses j
dollars or Bioses k dollars. If the probability of heads is p E (0, I), then by
Theorems 3 and 4

P{A ultimately wins} =j ~ k ifp =q

if s = r. i: 1. (18)
q
Interchanging p with q and j with k,

P{B ultimately wins} =j : k if p =q


s-j - 1 - Sk
r.q i:
s-(j+k)

= I - s-(j+k) = 1- sj+k if s = 1, (19)

and so for all p E (0, I)


P{A ultimately wins} + P{B ultimately wins} = I,
that is, the game terminates (in a finite number of tosses) with probability
one.
If p :::; q, that is, s :::; 1 and B has infinite capital (k = (0), then letting
k -+ 00 in (18) and (19)
P{A ultimately wins} = 0,
P{B ultimately wins} = 1,
ll2 3 Independence

whereas if p > q, that is, S > 1 and B has infinite capital,


P{games terminates} = P{B ultimately wins} = s-j.
The next result demonstrates that when the gamblers A and B have the
same initial capital r, the duration of the game and the final capital of A are
independent random variables. The duration of the game foreshadows the
notion of a finite stopping time (Chapter 5.3).

EXAMPLE 2 (S. Samuels). Let Sn = IJ:1 X j where {Xj , j ~ I} are Bernoulli trials
with parameter pE(O, l)and define T = inf{n ~ 1: ISnl = r}, r > oand T = 00
otherwise. If V = r + ST where ST = I::':1 Sn' lIT:n), then V and T are inde-
pendent random variables.
PROOF. Note that according to Example 1, P{T < a)} = 1. Let C(n, r) =
number of n-tuplets (X l ' ... , X n ) with Sn = r, ISjl < r, 1 ~ j < n. Then
P{T = n, V = 2r} = P{T = n, Sn = r} = C(n, r)p(n+r l/2 q(n-r l/2, n ~ r > O.
and by symmetry
P{T = n, V = O} = P{T = n, Sn = -r} = C(n, r)p(n-r l/2 q(n+r l/2.
Hence,
P{T= n} = [1 + (p/q)'J'P{T= n, V = O}, n~r > 0 (20)
and so

L P{T = n, U = O} = [1 + (pjq)'r
00

P{U = O} = 1
.
n=r

Consequently, according to (20)


P{T = n, V = O} = P{T = n}' P{U = O}, n~r > 0
which, in turn, implies
P{T = n, V = 2r} = P{T = n}' P{U = 2r}, n ~ r > O. 0

EXERCISES 3.4
I. Let Sn = D Xi> n ~ I, where {X n' n ~ I} are i.i.d. with P{X 1 = I} = P{X 1 = -I}
= 1. If Ii = inf {n ~ I: Sn = i}, i = I, 2, then
a. P{T1 < Xl} = I, I:,=
1 P{T1 > n} = 00,
b. T1 and T2 - TI are i.i.d.

2. Show that (a) and (b) of Exercise 1 hold if rather

T1 = inf{n ~ I:Sn = O} and T2 = inf{n > T1:S n = O}.


3. Let Sn = I~ aiX i , where {ai' i ~ I} are constants and {X n' n ~ I} are as in Exercise I.
If Sn":':::" S, where lSI::; M < 00, a.c., prove via Levy's inequality that (i) sUPn" IISnl
::; M, a.c., and, moreover, by an extension of Exercise 3.3.12 that (ii) D
lanl ::; M.
References 83

4. Let {X n ,n:2: I} be i.i.d. with P{X 1 = I} = P{X 1 = -I} =! and Sn = Xi' D


n :2: 1. For positive integers i and j, let T be the smallest positive integer for which
Sn = i or -j. Then
i P{ST = i} - j P{ST = j} = 0, i 2 P{ST = i} + l P{ST = j} = i 'j,
where ST = Sm On the set {T = m}, m :2: 1.
5. If in Bernoulli trials with parameter!, 1;" = inf{n :2: I: Sn = k}, prove that

P{ 1;" = III = 2 -n{ ([(1111:k~l2J) - ([(11 _n k-_II )/2])} for 11 :2: k.


Hillt: Apply (2) to P{1;" ::;; III = P{max 1 sisn Sj :2: k}.

6. If Sn is as in Exercise 1, prove for N = 0, 1, ... that

p{max Sj::;; N, Sn+1 > N} = 2P{Sn+i :2: N + l} - P{Sn+1 = N + I}


j';'n

- 2P {Sn :2: N + I} + P {Sn = N + I}.


7. If Sn = L1
Xj' n :2: 1 where {X n , 11 :2: I} are Bernoulli trials with parameter p, verify
that P{maxj';'n(Sj - jp) > x} ::;; C P{Sn - np > x} where c > O.
8. Let {An' n :2: I} and {B n, n :2: I} be events such that for all large m and n > m, An is
independent of B"(Uj:~AjB)c. Then L::'=I P{A n} = 00 implies P{AnBn, i.o.} :2:
limn~oo P{Bn}. Hint: Use the techniques of Theorem 3.2.1 and Lemma 3.3.3.

References
O. Barndorff-Nielsen, "On the rate of growth of the partial maxima of a sequence of
independent, identically distributed random variables," Math. Scand. 9 (1961),
383-394.
L. Baum, M. Katz, and H. Stratton, "Strong Laws for Ruled Sums," Ann. Math. Stat.
42 (197\), 625-629.
F. Cantelli, "Su due applicazioni di un teorema di G. Boole," Rend. Accad. Naz.
Lincei 26 (1917).
K. L. Chung,"The strong law of large numbers," Proc. 2nd Berkeley Symp. Stat. and
Prob. (1951), 341-352.
K. L. Chung, Elementary Probability Theory with Stochastic Processes, Springer-Verlag,
Berlin, New York, 1974.
J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed.,
Wiley, New York, 1950.
A. Kolmogorov. Foundatiolls of Probability, (Nathan Morrison, translator), Chelsea,
New York, 1950.
r
P. Levy, Theorie de addition des variables aleatoires, Gauthier-Villars, Paris, 1937;
2nd ed., 1954.
M. Loeve, Probability Theory, 3rd ed. Van Nostrand, Princeton, 1963; 4th ed., Springer-
Verlag, Berlin and New York, 1977-1978.
R. VOn Mises," Uber aufteilungs und Besetzungs Wahrscheinlichkeiten," Revue de la
Faculte des Sciences de r Universite d'lstanbul, N.S. 4 (1939), 145-163.
A. Renyi, Foundations of Probability, Holden-Day, San Francisco, 1970.
4
Integration in a Probability Space

4.1 Definition, Properties of the Integral,


Monotone Convergence Theorem
There are two basic avenues to integration. In the modern approach the
integral is introduced first for simple functions-as a weighted average of
the values of the function-and then defined for any nonnegative measurable
lunction!as a limit of the integrals of simple nonnegative functions increasing
to f Conceptually this is extremely simple, but a certain price is paid in terms
of proofs. The alternative classical approach, while employing a less intuitive
definition, achieves considerable simplicity in proofs of elementary properties.
If X is a (real) measurable function on a probability space (0, $', P), the
integral of X over 0 with respect to P is denoted by Sn
X dP, abbreviated by
E X or E[X], referred to as the expectation or mean of X and defined (when
possible) by:
I. If X ~ 0, then E X = 00 when P{X = oo} > 0, while ifP{X = oo} = 0,

E X = hm

.L
00 i {i i+ I}
2" P 2ft < X ~ - 2
ft • (I)
n-oo 1=1

ii. For general X, if either E X+ < 00 or E X- < 00, then

(2)

where X + = max(X, 0), X- = max( - X, 0). In this case, the expectation


of X is said to exist and E X E [ - 00, 00], denoted by IE X I ~ 00. If
IE X I < 00, X is called integrable.
III. If E X+ = E X- = 00, E X is undefined.

84
4.1 Definition, Properties of the Integral, Monotone Convergence Theorem 85

It is not difficult to see that the limit in (1) always exists, since setting

i i + I} <Xl i
Pn,i = P { 2n < X ~ -2" ' Sn = L fnPn,i' (3)
i= 1
additivity of P guarantees Pn,; = Pn+ I, 2; + Pn+ I, 2;+ I' whence
<Xl 2i
Sn = i~1 2n+1 (Pn+I,2i + Pn+I,2;+I) ~ Sn+I'
and so limn~ <Xl Sn exists, Furthermore, E X ~ Sn' n ~ 1, for X ~ O.
It is trivial to check that
E[I] = 1, 0 ~ E X ~ 00 when X ~ 0,
X ac. 0 ifEIXI = 0, P{IXI < oo} = 1 if EIXI < 00, (4)
EX = E YifP{X = y} = 1, E[-X] = -EX iflEXI ~ 00,

and easy to verify that if X ~ 0 and P {X = oo} = 0,

. L
EX = lIm i +-1 P {i
<Xl - i +-I}
-n< X < -
n~<Xl i=O 2
n 2 - 2n

-_ I'1m ~
L.
n~<Xl i=O
i
-n-
+
2
1 p{n 2
i ~
in
X < - + -I}
2

= lim i ~ P{~2n -< X < +2nI}.


n~<Xl;=12n
i (5)

For example, the last line of (5) equals

lim[
n~<Xl;=12
L<Xl'-'n P{'2~ < .+
X < -'-n- + L '-n P X
2
I} <Xl' 1 {
;=12
= ~. }]
2

L /Ii P{i/I <


. [<Xl
= lIm + 1}
X < i-;;- <Xl /Ii P { X = i-+
+ L n-
I}] ,
n~<Xl i=12 2 2 ;=02 2
which, in turn, coincides with the definition of E X.
Every integrable, symmetric r.v. has mean zero as follows from (2) and
Exercise 3.3.3(ii), whereas the expectation of a nonintegrable symmetric
r.v, is undefined.
A measurable function X is called elementary if for some sequence
{An, n ~ I} of disjoint sets of IF
(6)

where - 00 ~ X n ~ 00. Clearly, simple functions (Exercise 1.4.3) are ele-


mentary. An elementary function X always engenders a partition {B n } of 0
in IF (i.e., Bn E IF, Bn disjoint, and U1"
Bn = 0). It suffices to note that

(6')
86 4 Integration in a Probability Space

where B I = (Uf AnY, YI = 0, Yn+ I = x n, Bn+I = An' n ~ 1.


The basic properties of the integral-linearity, order preservation, and
monotone convergence-are embodied in

Theorem 1. Let X, Y, X n, Yn, n ~ 1, be measurable functions on a probability


space (0, §, P).
J. If X = L:'; t xnl An is a nonnegative elementary function, then

(7)

ii. If X ~ 0, a.c., there exist elementary functions X n with 0 :::; X n i X a.c.


and E X n i E X; moreover, ifP{X = oo} = 0, then X n may be chosen so
that P{X - X n :::; 2- n, n ~ l} = 1.
111. If X ~ Y ~ 0, a.c., then E X ~ E Y ~ 0 andfor any a in (0,00),
1
P{X~a}:::;-EX (Markov inequality). (8)
a

iv. If 0 :::; X niX a.c., then E X n i E X (monotone convergence theorem).


v. IfE X, E Y, E X + E Yare all defined, then E[X + Y] exists and
E[X + Y] = EX +EY (additivity). (9)

vi. If X ~ Y, a.c., and E X, E Y exist, then


EX~EY (order preservation). (10)
vii. IfE X exists and Y is integrable, then for all finite real a, b, E[aX + bY]
exists and
E[aX + bY] = aE X + bEY (linearity). (11 )

PROOF. (i) If P{X = oo} > 0, then X n = 00 for some n for which P{A n } > 0
so that (7) obtains. If, rather P{X = oo} = 0, then X m = 00 for some m
requires that P{A m } = 0, whence X m P{A m } = O. By setting x~ = 0 on any
An C {X = oo} and x~ = X n otherwise, and defining X' == X· lrx<oo) =
Lf x~IAn' it is evident that P{X' = X} = 1, EX' = E X, and X'(w) < 00
for all w. Let

Pni = P{X' E InJ = L P{AJ


j:xje/ni

and note that for all n ~ 1

J.x j
ooi 00 00

i~1 fn Pni :::; i~1 h~/nixj P{A j } :::; P{A j }, (12)


4.1 Definition, Properties of the Integral, Monotone Convergence Theorem 87

so that by prior observations

L xj P{A j} = L xjP{Aj}.
OC! OC!

EX = E X' =
1 1

To prove (ii), set


i
CfJni = 2n Ili/2"<XS(i+ 1)/2"1
and define the simple functions X~l) and elementary functions X~2) by
n' 2"- 1 OC!

X~l) = nIIX>nl + L CfJni' X (2)


n
-
-
~
L. CfJni' (14)
i=l i= 1

Then X~l) i X, X~2) i X', and X~) ~ x~t I> j = 1,2; furthermore, via (i)
E X~2) = Sn ~ Sn+ 1 = E X~221' Consequently, if P{X = oo} = 0, E X~2) i E X
according to the definition (1). Moreover, under the same proviso P{X = oo}
= 0, subadditivity guarantees

which is tantamount to the last assertion in (ii). On the other hand, if


P{X = oo} > 0, then E X~l) 2 n P{X = oo} ~ 00 = E X and it will follow
from the initial portion of the proof of (iii) that E X~I) is monotone.
Apropos of (iii), suppose first that X = Lb, 1 x;IAi' Y = Lj;.1 yjI Bj ,
where {A;} and {B j } are each partitions of n in fF. Then

X = L XJA;Bj' Y=LYjIAiBj'
i.i i.j

and P{X 2 Y 2 O} = 1 entails Xi 2 Yj 2 0 if P{AiB j } > O. By (i)


OC! OC!

EX = LXi P{AiBj} 2 LYj P{AiBJ = E Y.


i,j i. j

For the general case it may be supposed that P{X = oo} = 0, whence also
P{ Y = oo} = O. If X n = X~2) as in (14) and Yn is defined analogously via Y,
then these elementary functions satisfy P{X n 2 Yn 2 O} = 1, so that by the
part already proved E X n 2 E Yn • Hence, by (ii), E X 2 E Y.
Consequently, the Markov inequality follows directly from X 2 a1lx,,- al
and (i).
To prove the monotone convergence theorem, note first in case
P{ X = oo} > 0 that by the Markov inequality and Corollary 1.5.2

EX n 2 a P{X n > a} n-oo, a P{X > a} 2 a P{X = oo} u-~ 00,

which is fitting since E X = 00.


88 4 Integration in a Probability Space

If rather P{X = oo} = 0, note that by this same corollary as m ..... 00

p{~
2"
< X <~}
m-2"
= p{xm >~}2" - p{xm>~2"_~}
2"

. . . p{x >~} - p{x >~i~~}


= p {~ < X < i + I}. (15)
2" - 2"
For any a < E X, the definition of E X ensures the existence of positive
integers n, k such that

"~p
k

i~' 2"
. {'

2"
• +
~ <X<~---
- 2"
I} >a'
whence, via (15), for all large m

E Xm 2 J, 2n p
k i { i
2" < i
X m ~ ~ > a.
+ I} (16)

By (iii), E X m ~ E X m+' ~ EX, which in conjunction with (16) yields


EXmiEX.
Apropos of (v), if X = If
Xi/A, 2 0, Y = Yj/B· 20, where {AJ If
and {BJ are partitions of n in fJ', then X + Y = (Xi + Y)/A,Bj' t.j
yielding via (7) and a-additivity

E [X + Y] =L (Xi + Yj)P{AiB j}
i,j

= LXi P{AJ + L Yj P{Bj} = EX + E Y. (17)


i j

In conjunction with (ii), (17) yields additivity for X 2 0, Y 2 0.


In general, if EX = 00, then E X+ = 00, E X- < ~, E Y > - 00,
E Y- < 00, so that P{X > - 00, Y > - oo} = 1. Now (Exercise 1.4.4),
(X + Y)- ~ X- + Y- and X+ ~ (X + Y)+ + Y-, whence by (iii) and the
part already proved
E(X + Y)- ~ E(X- + Y-) = E X- + E Y- < 00,

E(X + Y)+ + E Y- = E [(X + Yt + Y-] 2 E X+ = 00,

implying E[X + Y] = 00 and hence (9). Similarly (9) holds ifE X = - 00 or


E Y = - 00 or E Y = 00.
Lastly, if IEXI + IE YI < 00, then EX+ < 00, EX- < 00, implying
E IX I = E X + + E X - < 00 by the portion already proved. Similarly,
ElY I < 00, whence P{ IX I < 00, I Y I < oo} = 1 and

EIX + YI ~ EIXI + EI YI < 00.


4.1 Definition. Properties of the Integral. Monotone Convergence Theorem 89

Thus, from
X+ + y+ = (X + Y)+ + [X- + Y- - (X + Y)-],
by the part already proved
E X+ +E y+ = E[X+ + Y+] = E(X + Y)+ + E[X- + Y- -(X + Y)-],
yielding again via the additivity or subtractivity already proved
EX + E Y = E(X + Y).

To dispatch (vi) in the nontrivial case EX < 00, E Y > - 00, note that
via (v) and (iii)
E X = E Y + E[X - Y] ~ E y.
Finally, apropos of (vii), let 0 < a < 00 and X ~ O. If P{X = oo} > 0,
then E[aX] = 00 = a E X. If P{X = oo} = 0, then, recalling (ii), X~2) =
I~ t CfJni i X, a.c., whence 0 ~ aX~2) i aX, a.c., and by (i) and (iv)

E[aX] = lim E[aX~2)] = lim E [.~ a CfJ ni ] = a E X.


n n l= 1

In general, for 0 < a < 00 and IE XI ~ 00

E[aX] = E(aX)+ - E(aX-) = aEX+ - aEX- = aEX.


Since E[O . X] = O· E X = 0 and E[ - X] = - EX,
EaX = aEX (18)
for any finite a whenever E X exists. Finally, (11) follows via (9) and (18).
[J

Note that if X, Z are measurable functions with X ::;; Z (respectively,


X ~ Z) where EZ+ < 00 (respectively, EZ- < (0) then EX exists. A fortiori if
X is bounded above or below by an integrable random variable, then EX exists.

Corollary 1. A measurable function X on (O,:F, P) is integrable iff IX I is


integrable, and in such a case

1
P{IXI ~ a} ::;; - E lXI, a>O (Markov inequality) (19)
a
and
IEXI ~ EIXI. (20)
If X is a discrete r.v. with values {xi' 1 ~ j ~ n} and p.dJ. {Pi' 1 ~j ~ n},
where 1 ~ n ~ 00, Theorem l(i) ensures that
n

EX = LXiPi (21)
i= 1

when xi ~ O. Consequently, (21) holds for any discrete T.v. provided that
I.i= 1 xt Pi or I.i= 1 xjpi is finite when n = 00. On the other hand, if X is an
90 4 Integration in a Probability Space

absolutely continuous r.v. with density g and E X exists,

EX = roo tg(t)dt (22)


·-00

according to Corollary 6.5.4.


In the case of an infinite series of nonnegative measurable functions, the
operations of expectation and summation may be interchanged as follows
directly from Lebesgue's monotone convergence theorem (Theorem l(iv»:

Corollary 2. If {X., n ~ I} are nonnegative measurablefunctions on (0, ~, P),


then
00 00

EI X. = I EX•. (23)
.=1 .=1
Without the nonnegativity proviso, (23) is, in general, false even when
Lr;.1 Xi converges, a.c.

EXAMPLE l. Let {Y,., n ~ I} be i.i.d. random variables with P{ Y. = ± I} = t


and define T = inf{n ~ 1: Ii'= I t; = l} where inf{0} = 00. Then T < 00,
a.c. (Exercise 3.4.1), and, setting X. = y"I[T~.I'
00 00 T
I x. = .=1
.=1
L Y.I[T~.J = L Y,. = 1
.=1
by definition of T, and so E I:,= IX. = 1. However, since the event {T ~ n} E
u(YI,.·., y"-I), the LV.S Y,. and I[T~.J are independent by Corollary 3.2.2,
whence it follows from Theorem 4.3.3 that
n ~ 1,
so that (23) fails.

Definition. Given any nonnegative constants {b., n :?: O}, a continuous


function b(-) on [0, 00) is called an extension of {b.} to [0, 00) if b(n) == b.,
n ~ O. Moreover, when {b.} is strictly monotone, b(·) is a strictly monotone
extension of {b.} if it is both strictly monotone and an extension of {b.}.

Corollary 3. Let {b., n ~ O} be a strictly increasing sequence with 0 :s; b. i 00


and let b(-) be a strictly monotone extension of {b.} to [0, 00). Then for any
r.v. X ~ 0, a.c.
00 00

L P{X ~ b.} :s; E b-I(X):s; L P{X > b.}. (24)


.=1 .=0
In particular,for any r > 0 and any r.v. X,
00 00

L P{IXI ~ nil'} :s; EIXI':s; I P{IXI > nil'}. (25)


.= I .=0
4.1 Definition, Properties of the Integral, Monotone Convf'rgence Theorem 91

EIXI' = r ICO t,-1P{IXI > t} dt. (26)

PROOF. Set cp(x) = b- 1(x),


co co
Y = L jI(jsqJ(X)<j+ 1)' Z = L U + I)I(j<qJ(X)Sj+ 1)'
1 o
whence Y ~ cp(X) ~ Z, a.c., since P{X < oo} = 1 = P{cp(X) < oo}. Thus,
EY ~ E cp( X) ~ E Z.
But
00 00 CO 00

L P{X ~ b
1
n} = L P{cp(X) ~ n} = L L P{j ~ cp(X) < j
1 n=1j=n
+ 1}

co
= L j P{j ~ cp(X) < j + I} = E Y
j= 1

and
co co co
Lo P{X > b n} = L L Pfj < cp(X) ~j + I}
n=O j=n

co
= L(J+ I)P{j < cp(X) ~j + I} = E Z,
j=O

completing the proof of (24).

Clearly, (25) follows from (24) with b(x) = x', r > 0. Finally, it suffices to
verify (26) when r = 1 since the statement for r > then follows by a change °
of variable. Now, for any a > 0, (25) ensures

lco P{laXI ~ u} du ~ n~1 P{laXI ~ n} ~ a' EIXI ~ 1 + n~1 P{laXI > n}


~ 1 + Ico P{!aXI > u} du
whence setting t = u/a

1~, P{IXI ~ t} dt ~ EIXI ~ a- 1 + Ico P{IXI > t} dt


and the conclusion follows as a -+ 00. o
EXERCISES 4.1
1. If X is a geometric LV. with parameter p, that is, P{X = k} = pqk, k = 0, 1, ... ,
0< p < 1 verify that EX = qjp, EX(X - 1) = 2(qjp)2, EX(X - I)(X - 2) = 6(qjp)3,
EX 3 = 6(qjp)3 + 6(qjp)2 + qjp and EX 2 _ (EX)2 = (qjp)2 + qjp.
92 4 Integration in a Probability Space

2. If X is an integrable LV., for every e > 0 there is a simple function X, with


EIX - X,I < e.
3. Utilize Exercise 1.2.6 to give an alternative proof of Corollary 2.1.3.
4. (i) IfP{X = 1} = p, P{X = O} = q = 1 - p, prove that E(X - EX)k = pq[qk-1 -
(- p)k-1], k = 1, 2, .... (ii) if X is a LV. with I1=o P{X = j} = 1, then EX =
I1=1 P{X ~j}.
5. If X is an integrable LV., n P{IXI ~ n} = 0(1), but the converse does not hold.

6. Construct a sequence of discrete LV.S X. such that X.!. 0 but E X. -/+ O.


7. If {X, X., n ~ 1} is a sequence of L V.S on some probability space such that

I'" EIX. - XI' < 00,


n=1

some r > 0, then X. ~ X.


8. If(i){X., n ~ 1} is a sequence of nonnegative, integrable r.v.s with S. = XI + ... +
X. and (ii) I::"=I E X. < 00, then S. converges a.c. Hence, if (i) obtains, E S. > 0,
n ~ 1, and I::"=I E X./E S. < 00, then S./E S. converges a.c.
9. If X and Yare measurable functions on (0, :#', P) with EX 1,4 = E YI,4 for all A E:#',
then X = Y, a.c. Hint: Consider A,.s = {w: 00 ~ X(w) ~ r > s ~ Y(w) ~ -oo}.
10. If {X., n ~ 1} are i.i.d. random variables then (i) 1/nmaxISh;;.IXjl ~O iff
EIXtI < 00. (ii) lin maxI SjS. IXjl !. 0 iff n' P{IXtI > n} --+ O.

4.2 Indefinite Integrals, Uniform Integrability,


Mean Convergence
For any measurable function X on a probability space (n,~, P) whose
expectation E X exists, define

v{A} = EX· 1,4 = Lx dP, AE~. (1)

The set function v{ A} defined on ~ by (1) is called the indefinite integral of X.


Moreover, any nonnegative measurable function X on (n, ~, P) generates
a new measure v on ~ via (1), and if E X = 1, this will be a probability
measure.

Lemma 1. The indefinite integral v of any measurable function X on (n, ~, P)


whose expectation exists is a a-additive set function on ~, that is,for disjoint
AjE~,j~l,
4.2 Indefinite Integrals. Uniform Integrability. Mean Convergence 93

PROOF. By Corollary 4.1.2,


ooJ
.L
00
X+ dP = .L E X+ I Aj = E.L X+ I Aj
00

)=1 Aj )=1 )=1

and, similarly,

j=1
f J
Aj
X- dP = r
JU 1"Aj
X- dP.

Thus, since E X exists,

JI {jX dP =~ ({/+ dP - {jX- dP)

= r
JUAj
X+ dP - r
JUAj
X- dP = r
J U 1"Aj
X dP. D

Corollary 1. If X is a nonnegative measurable function on (n, fF, P), its


indefinite integral is a measure on fF.
The integrability of a measurable function X can be characterized via its
indefinite integral according to

Lemma 2. A measurable function X is integrable iff for every e > 0 there


corresponds a tJ > 0 such that A E fF, P{A} < tJ implies
I
EIXI::;~. (2)

PROOF. If X is integrable and X k = IXIIlIX1,;kJ' then X k i lXI, a.c., whence


by Theorem 4.1.1(iv), E X k i EIXI, which entails EIXIIlIx1>kJ -+ 0 as
k -+ 00. If K is a positive integer for which E IX II lIxl > K] < e12, set tJ =
min(eI2K, I/EIXI). Then for A E fF with P{A} < tJ,

{IXI dP = {IXlIlIXI>Kl dP + {IXIIlIXISKJ dP ::; ~ + ~ = e,


and so (2) holds.
Conversely, (2) implies IX I and therefore X integrable. D

This suggests the following

Definition. A sequence of r.v.s {Xn , n ~ I} is called uniformly integrable


(abbreviated u.i.) if for every e > 0 there corresponds a tJ > 0 such that

~~~ {IXnl dP < e (3)


94 4 Integration in a Probability Space

whenever P{A} < E> and, in addition,


sup EIXnl < 00. (4)
n~t

Furthermore, {X n} is said to be u.i. from above or below according as {X:} or


{X;} is u.i.
Some immediate consequences of the definition are:
I. {X n} is u.i. iff { IXnI} is u.i.
11. If {X n } and {y,,} are each u.i., so is {X n + Yn }.
Ill. If {X n' n ~ I} is u.i., so is any subsequence of {X n}.
IV. {X n} is u.i. iff it is u.i. from above and from below.
v. If IXnl ~ Y with E Y < 00, then {X n } is u.i.
An alternative characterization of uniform integrability appears in

Theorem 1 (u.i. criterion). A sequence oILv.s {X n , n ~ I} is u.i. iff

lim sup
a-tO n~ t
r
JIIXnl >a]
IXnl dP = O. (5)

PROOF. If {X n } is u.i., then sup EIXnl ~ C < 00, whence for any € > 0,
choosing E> as in (3), the Markov inequality ensures that
C
P{ IX n I > a} ~ a - tEl X n I ~ - < E>, n ~ 1,
a
provided a > C/E>. Consequently, from (3) for a > C/b,

sup
n~ 1
r
JIIxnl>a]
IXnl dP < €, (6)

which is tantamount to (5). Conversely, for any € > 0, choosing a sufficiently


large to activate (6),

E IX n I ~ a + r
JIIXnl >a)
IX n I dP ~ a + €, n ~ 1,

yielding (4). Moreover, selecting E> = €/a, for any A with P{A} < E> and all
n~ 1

f IXnl
A
dP = f
AIIXn!,;a)
IXnl dP + f AIIXnl>a)
IXnl dP

~ a P {A} + r
JIIXnl >a)
IXnI dP ~ € + € = 2€,

so that (3) holds. o


The importance of uniform integrability will become apparent from
Theorem 3.
4.2 Indefinite Integrals. Uniform Integrability. Mean Convergence 95

Associated with any probability space (O,~, P) are the ;R" spaces of
all measurable functions X (necessarily LV.S) for which E I X IP < 00, denoted
by ffl P or ffl p(O, ~, P), P > O. Random variables in ffl P will be dubbed
!E" r.v.s.
The inequalities of Section 1 show that a LV. X E ffl P iff the series
Lr' P{IXI > n l/P } converges.

Definition. A sequence {X", n ~ I} of ffl P r.v.s is said to converge in mean of


order p (to a LV. X) if E I X" - X IP -+ 0 as n -+ 00. This will be denoted by
X" 2-. X. Convergence in mean of orders one and two are called convergence
in mean and convergence in mean square (or quadratic mean) respectively.
Convergence in mean of order p for any p > 0 implies convergence in
probability as follows from the Markov inequality via
P{IX" - XI ~ e} = P{IX" - XIP ~ e"} ~ e- P EIX" - XIP.
Moreover, convergence of X" to X in mean of order p entails X E ffl p in
view of the inequality

the latter being an immediate consequence of

Lemma 3. If X and Yare nonnegative LV.S and p > 0, then


E(X + Y)P ~ 2 P [E XP + E fP]. (7)

PROOF. For a > 0, b > 0, (a + by ~ [2 max(a, b)]P ~ 2P[a P + bP] and (7)
follows.
In particular, X E ffl P' Y E ffl P imply X + Y E ffl P' 0

Among the most important and frequently used results of integration are
Lebesgue's dominated convergence theorem (Corollary 3) and monotone
convergence theorem, and Fatou's lemma. The second will be obtained by first
verifying the latteL

Theorem 2 (i) (Monotone convergence theorem). If the r.v.s {X", n ~ 1} are


u.i.from below and X" 1 X, a.c., then E X- < 00 and EX" lEX.
(ii) (Fatou's lemma). If the LV.S {X", n ~ I} are u.i. from below and
E lim X" exists, then
(8)

The most typical usage of Fatou's lemma and (i) would be as in

Corollary 2. If the LV.S {X", n ~ O} satisfy X. ~ x o , a.c., n ~ I, where X o


is integrable (a fortiori, if X" ~ 0, a.c., n ~ 1) then E li.!:!1.- ex> X" exists, (8)
holds, and, moreover, if X" lX, a.c., then EX. lEX.
96 4 Integration in a Probability Space

PROOF OF THEOREM 2. (i) X. i X a.c. entails X;;! X -, a.c., whence


sup E X;; < 00 implies E X- < 00. Then, 0 =:;; X. + Xi' i X + Xi' whence,
by Theorem 4.1 (iv), E(X. + Xi') i E(X + Xi') implying that E X. i E X.
To prove (ii), define, for K > 0,
X~ = X II A (- K), X: = X. - X~ (9)
and note that, for any 8 > 0 and sufficiently large K,
inf EX: = inf E(XII + K)I[Xn~-KI ~ inf E(-X;;I[x;~KJ) ~ -8. (10)
.~1 .~1 .~1

Hence, setting y" = infi~. X;, it follows from - K =:;; Y" i fu!!...-. 00 X~ and (10)
that
E lim X~ = lim E Y" =:;; lim E X~. (11)

Now X. =:;; X~and E !.i.m..-.oo X.exists by hypothesis, whence E !.i.m..-oo X. =:;;


E Iim._ 00 X~. Thus, by (9), (10), and (11)
lim EX. = lim E(X: + X:) lim E X~ -
~ E

~ E lim X~ - 8 ~ E lim X. - E,
and (8) obtains as 8 -+ O. 0
The next lemma, especially when A. = B., may be regarded as an extension
of the Borel-Cantelli theorem which relaxes the independence requirement
therem.

Lemma 4. If {A., n ~ I} and {B., n ~ I} are sequences of events satisfying

(12)

for all large i and some positive integer k and


00

L P{Ad = 00, (13)


j= I

then

(14)

Moreover, if(12) holdsfor infinitely many integers k and (13) obtains, then
P{B., i.a.} = 1. (15)

PROOF. Replacing i by i + nk and then setting A: = A:. i = A i +. k in (12),

p{O B ~ p{.O
J=k
i}
J=l
Ai+(II+ilkIAi+lIk} = p{ 0 h=.+l
AtI A :}
4.2 Indefinite Integrals. Uniform Integrability. Mean Convergence 9'/

for all large n, whence (14) follows from Theorem 3.1.3, noting that
k 00 00

L L P{A + = 00 = n=1
;=1 n=1
L P{A:.;}
j nk }

for some i. The final statement is a consequence of the monotonicity in the


left side of (14).

EXAMPLE 1. If {Sn = D=I Xi' n ~ t} where {X n , n ~ t} are i.i.d. random


variables, then for any e ~ 0, P{ISnl ~ e, i.o.} = 1 or 0 according as
00

L P{ISnl :::;; e}
n=1
diverges or converges.

PROOF. In view of the Borel-Cantelli lemma, it suffices to consider the case


of divergence and here it may be supposed that L:'=1 prO :::;; Sn :::;; e} = 00
since otherwise L:'=1 prO ~ Sn ~ -e} = 00 whence X n may be replaced by
-Xn • Setting A j = {O:::;; Sj:::;; e}, Bj = {ISjl:::;; e}, Cij = {ISj - Sd :::;; e}, clearly
Aj Ur;j+k A j c: A j Ur;;+kC;j, implying via independence and Theorem 1.6.1
that for all k ~ 1

Thus, P{Bn , i.o.} = 1 by Lemma 4. o


Lemma 5. If {Sn = L7= 1 Xj, n ~ 1, So = O}, where {X n , n ~ I} are i.i.d.
random variables, then for any b > 0 and positive integers k, N
N N
L P{ ISil < kb} ~ 2k L P{lS;1 < b}. (16)
;=0 j=O
PROOF. If ~ = inf{n ~ O:jb :::;; Sn < (j + l)b}, where inf{0} = 00, then
N N ;
L PUb:::;; Sj < (j + l)b} = L L P{~ = n,jb :::;; Si < (j + l)b}
;=0 i=O n=O
N N N
: :; n=O
L j=n
LP{~ = n}P{lSj - Snl < b} :::;; L P{ISnl < b},
n=O
and so (16) follows by summing onj from -k to k - 1. o
A sequence ofpartial sums {Sn, n ~ l} ofi.i.d.random variables {X n , n ~ I}
is called a random walk and it is customary to set So = O. The origin and the
random walk itself are said to be recurrent if P {I Sn I < e, i.o.} = 1, for all
e > O. Otherwise, the random walk is nonrecurrent or transient. Thus,
Example 1 furnishes a criterion for a random walk to be recurrent. However,
98 4 Integration in a Probability Space

here as in virtually all questions concerning random walks, criteria involving


the underlying distribution F of X are far preferable to those involving Sn for
the simple reason that the former are much more readily verifiable (see
Example 5.2.1).
The definition of a recurrent random walk applies also to sums Sn of i.i.d.
random vectors where ISn I signifies Euclidean distance from the origin.
According to Example 2 and Exercise 4.2.10, there is an abrupt change in the
behavior of simple random walks in going from the plane to three dimensions.
This is likewise true for general random walks (Chung and Fuchs, 1951).

EXAMPLE 2 (Polya). Let {Sn, n ;::: O} be a simple random walk on (- 00, oor
whose initial position is at the origin, that is, Sn = D
Xj, where {X n , n ;::: l}
are i.i.d. random vectors with
m
and Ie;
i;;]
= 1.

When m = I, {X n , n ;::: l} constitute Bernoulli trials with parameter p = t


and it follows immediately from Theorem 3.4.3 that {Sn, n ;::: O} is recurrent in
this case. When m = 2, setting An = {S2n = (O,O)} and recalling Exercises
2.1.3, 2.3.3,

P{A n} = 4- 2n ± .,
j=O
(2n)!. ,
[j.(n - }).]
2= (2n)4-
n
2n ±(~)2 = [(2n)2_2n]2 "'_,
j=O } n 1W

and so If P{A n} = 00. Moreover, AjA j = AjCjj for i < j where Cij =
{S2j - S2j = (O,O)}, implying

Thus, Lemma 4 applies with B j = A j , revealing that a simple random walk in


the plane is recurrent. On the other hand, for m ;::: 3 the origin and random
walk are nonrecurrent (Exercise 4.2.10).

Lemma 6. Let {X n, n ;::: I} be independent, symmetric LV.S and {an' n ;::: I},
{en, n ;::: I} sequences of positive numbers with an ...... 00. If X~ = XnIUX"1 $c"I'
S~ = D= I Xj, Sn = D=
I Xj' then

implies p{rrm anSn;::: I} = I.


n- 00
(17)

PROOF. The hypothesis ensures that N m = inf{j;::: m: Sj > aj} (= x other-


wise) is a bona fide r.v. for all m ;::: 1. If whenever n ;::: m
(18)

then
4.2 Indefinite Integrals. Uniform Integrability. Mean Convergence 99

00

= L P{Sn ~ S~, N = n} ~ t.
n::;:m
m

implying

and hence also the conclusion of (17) by the Kolmogorov zero-one law.
To verify (18), set Xj = XjIlIXjl:scj) - XjI IIXj ! >'j] and note that by
symmetry and independence the joint distributions of (X t, ... , X n) and
(X!, ... , X:) are identical for all n. Hence, if n > m,

P{Sn ~ S~, N m = n}
= p{~ XjIUXjl>cj] ~ 0, ~ XjIlIXjl:scj) > an,S;:::;; ai' m:::;; i < n}

= P{Sn :::;; S~, N m = n},

and equality also holds for n = m, mutatis mutandis. D

The next result reveals the extent to which mean convergence is stronger
than convergence in probability and provides a Cauchy criterion for the
former.

Theorem 3 (i) (Mean Convergence Criterion). If the r. v.s {I X niP, n ~ I} are


°
u.i.for some P > and X n ~ X, then X E!t'p and X n ~ X.
Conversely, if X n, n ~ I, are!t'Pr.v.S with X n ~ X, then X E!t' P' X n .f. X,
and {I X niP, n ~ I} are u.i.
(ii) (Cauchy Convergence Criterion). If {X n, n ~ I} are !t'p r.v.s with
sUPn/>n EI X m - X nIP = 0(1) as n ...... 00, there exists a r.v. X E!t' P such that
X n ~ X and conversely.

PROOF. If Xn.f. X, by Lemma 3.3.2 there exists a subsequence X n• with


X n• ~ X. By Fatou's lemma (Corollary 2)

EIXIP = E[limlxn.I P] :::;; lim EIXn.I P :::;; sup EIXmlP < 00

°
k-oo k-(JJ m~l

since {I X niP} is u.i. Again employing the latter, for any t: > 0, () > may be
chosen such that P{A} < () implies
sup E{IXnIPI A ] < t:, (19)
.. ~1
100 4 Integration in a Probability Space

the second part holding by Lemma 2. Moreover, X n J'. X ensures that n ~ N


entails
P{ IX n - XI > e} < 8. (20)
Consequently, for n ~ N, by (19), (20) and Lemma 3

EIX n - XI P = E[IX n - XIP(IlIXn-XI~t) + IlIXn-XI>t))]


~ eP + 2P E[IlIxn-xl>tl(IXn,P + IXI P)] < 2P+ l(e + eP),
!I'
so that X n -4 X.
Conversely, if X n .!!1 X, then, as noted earlier, X n .f. X E:£' P and by
Lemma 3
sup EIXnlPIA ~ 2P sup E[IX n - XIPI A + IXIPIA ] < 00 (21)
n~l n~l

for all A E fi' and, in particular, for A = n. Since


sup EIX n - XIPIIIXn-XI>K) ~ max EIX n - XIPIIIXn-XI>K) + sup EIX n - XI P
n~m n>m
m-oo
XI P ------+ 0,
K~oo
------+ sup EIX n -
n>m
{IXn-XIP,n~ I} is uj. whence (21) guarantees that {IXnIP,n~ I} is
uj. Apropos of (ii), if the Cauchy criterion holds, the Markov inequality
ensures that for any e > 0
sup P{IX m - Xnl > e}:$ e-Psup EIX m - XnlP = 0(1)
m>n m>n
and Lemma 3.3.2 guarantees a subsequence nk with X nk ~ some LV. X. By
Corollary 2 (Fatou's lemma), EIX m - X IP ~ limk _ oo EIX m - XnklP, imply-
ing
o ~ lim EIX m - XI P ~ lim lim EIX m - XnklP = 0,
m-oo m-oo k-oo
Y'p . . ! fp
so that X n ~ X. Conversely, If X n ---+ X, by Lemma 3

sup EIX m - XnlP ~ 2P[SU P EIX m - XI P + EIX - XnIP] = 0(1)


m>n m>"

and the Cauchy criterion obtains. o


Corollary 3 (Lebesgue Dominated Convergence Theorem). Let {X, X n,
n ~ 1} be a sequence of r.v.s with X n .f. X. If E[suPn> I 1X n I] < 00, then
EIX n - XI-+ 0 and afortiori E X n -+ E X.
PROOF. The hypothesis ensures that {X n, n ~ I} is u.i., whence the con-
clusion follows from Theorem 3(i) with P = 1. 0

Corollary 4. If {X, X n , n ~ 1} is a sequence of nonnegative:£' I LV.S with


X n J'. X, then E X n -+ EX iffEIX n - X 1-+ 0 iff {X n , n ~ I} is u.i.
4.2 Indefinite Integrals. Uniform Integrability. Mean Convergence 101

PROOF. Sufficiency is immediate from IEX. - E X I = IE(X. - X)I :::;;


EIX. - XI.Aproposofnecessity,sinceO:::;; (X - X.)+ :::;; Xand(X - X.f'
.f. 0 by Corollary 3.3.3, dominated convergence (Corollary 3) guarantees
E(X - X.)+ ~ O. By hypothesis E(X - X.) ~ 0, whence E(X - X.)- ~ 0,
and so E IX - X. I ~ O. 0

Among other things, Corollary 4 underlines the importance of the concept


ofu.i.
!I'
Corollary 5. If X., n 2: I, are !f P r.v.s with X. ----4 X for some p > 0, then
X E !f P and E IX. IP -+ E IX IP.

PROOF.p The hypothesis implies that X • .!. X and Corollary 3.3.3 ensures that
IX.IP -+ IXI P, whence EIX.,p -+ EIXIP by Corollary 4. 0

In the proof of Theorem 2.2.2 it was shown for binomial r.v.s S. that
EIS. - np!4 = 0(n 2) and so EI(S.ln) - pl4 = 0(1). Thus, it follows directly
from Corollary 5 that E(S.ln)4 -+ p4. More generally, since S.ln ~ p and
IS.lnl :::;; 1 Corollaries 3 and 5 ensure that EIS.lnl/i ~ p/i for every P > O.

EXERCISES 4.2
I. Improve the inequality of Lemma 3 by showing that

(a + b)P ~ (a P + bP)'max(l, 2P- 1 ) for a> 0, b > 0, p > 0;

also, if a i ~ 0, then (I7=1 aJP ~ I7=1 a; (resp. ~) for p ~ I (resp. ~).


2. Prove that if LV.S X. ~ X and X • .2.. Y, then X = Y, a.c. Construct LV.S X,
y y
X., n ~ I, such that (i) X. ~ X but X.""';'" X for any p > 0, (ii) X. ~ X for all
all p > 0 but X. ~ X.
3. Let {A., n ~ I} be events with In""= I P{A.} = 00. If there exist events {Bi> i ~ I}
and {Dij,j > i, i ~ I} such that for all large i and some positive integer k (resp.
infinitely many k > 0)
i. AjA j C Dij, i <j,
ii. P{U~k+j Dij} = P{U~k+i Bj _;},
iii. the classes {Ai} and {Dj.i+k' Di.i+k+IDf.i+ko Di.i+k+lDf.i+k+IDf.i+k""} are
independen t,

then P{Ui=k Bj } = I (resp. P{B., i.o.} = I).


4. If X 1 and Xl are independent r.v.s and X I + Xl E!f' P for some p E (0,00), then
Xi E !f'p, i = 1,2. Hint: For all large A. > 0,

P{lX11 > A.} S; 2 P{IXd > A., IXll <~} ~ 2 P{IX 1 + Xli >~}
5. Let PIX. = a. > O} = I/n = I - PIX. = O}, n ~ I. Is {X.,n ~ I} uj. if
i. a. = o(n),
ii. a. = en > O?
102 4 Integration in a Probability Space

6. If {X., n :2: I} are LV.S with sup. 2. E I X. IP < IX) for some P > 0, then {I X. I', n :2: I}
is u.i. for 0 < IX < p.
7. If the r.v.s X., n :2: I, are u.i., so are S./n, n :2: 1, where S. = I7= I Xi; in particular,
if X., n:2: 1, are identically distributed !f'. random variables, then {S./n, n :2: I}
is u.i.
8. Show that (i) if the Poisson r.v.s S. have p.d.f. p(k; nil), n :2: 1, then EI(S./n) - ill-+ O.
Hint: Recall Exercise 2.2.2. (ii) Any sequence ofr.v.s y".!. 0 iff E(I y"1/(1 + Iy"1)) =
0(1).
9. (i) Construct a sequence of LV.S that is u.i. from below and for which the expectation
of lim X. does not exist. (ii) Show that the hypothesis sup.2.1 X. IE!f' I of Corollary
3 is equivalent to IX.I $ Y E !f'., n :2: 1.
10. Let {X., n :2: I} be a simple symmetric random walk in R\ that is, {X.} are i.i.d.
random vectors such that P{X. = (e., ... , ek )} = 1/2k, where ej = 0, 1, or -I and
D eJ = 1. Prove that

p{* Xi returns to its origin, i.O.} = 0 for k = 3.

II. If {X. X., n :2: I} are r. v.s on (n, ff, P), show that the indefinite integrals I' A X. dP-+
J
A X dP, finite, uniformly for all A E ff iff X. converges to X in mean.'

12. If the two sequences of integrable LV.S {X.}, {Y.} satisfy P{X.:2: Y.:2: O} = 1,
X. ~ X, Y. ~ Y, and EX. -+ E X, finite, then EI Y" - YI-+ O.
13. Let {X., n :2: I} be a sequence of !f'p LV.S, P > 0 with SUPUA IX.I P dP: n :2: 1 and
P{A} < 0] = o(l)aso -+ O. Then X. ~ X iff X• .!!4 X.
14. If X E!f'. (n, ff, P) and u(X), t§ are independent classes of events, prove that E XI A
= EX· P{A}, all A E t§.

15. (Kochen-Stone) If {Z., n :2: I} is a sequence of r.v.s with 0 < E Z; < IX), E Z. #- 0,
and ITm._oo(E Zn)2/E Z; > 0, then P{ITmn_ oo Zn/E Zn:2: I} > O. Hint: If Yn =
Zn/E Zn, there is a subsequence {n'} with E Y;' < K < IX). Replacing {n'} by {n} for
notational simplicity, E lim Y; $ K by Corollary 2. Since - Yn $ I + Y;, neces-
sarily E Iim( - Yn) exists. Then Theorem 2(ii) ensures (since {y"} is u.i.) that E r.m Y.
:2:r.mEY.=1.
16. (Kochen-Stone) If {An' n :2: I} is a sequence of events such that for some c > 0
i. P{AA} $ c P{A;} [P{A j _;} + P{Aj }], i < j
ii. I:'=I P{A n } = IX),
then P {A., i.o.} > O.
Hint: If Z. = L7=.I Aj , note that E Z; $ E Z. + 4c(I'i P{ A i })2 ~ (1 + 4c)(E Z.)2
for all large n since E Zn -+ IX) and, via Exercise 15,
P{A n, i.o.} :2: p{rrm ZJE Z. ~ I} > O.

4.3 Jensen, HOlder, Schwarz Inequalities


A finite real function g on an interval J c ( - 00, (0) is called convex on J if
whenever XI' X2 E J and A. E [0, I]
4.3 Jensen. Holder. Schwarz Inequalities 103

(1)
Geometrically speaking, the value of a convex function at any point on the
line segment joining XI to X2 lies on or below the line segment joining g(x l )
and g(X2)' Since t = u«t - s)/(u - s» + s«u - t)/(u - s»,
t-s u-t
g(t) ~ - - g(u) + - - g(s), s < t < u, (2)
u-s u-s
or equivalently
g(t) - g(s) g(u) - g(t)
----~ , s < t < u. (3)
t-s u-t
If g is convex on an open interval J 0' it follows from (2) that lims<t_s g(t) ~
g(s), limr>s_t g(s) ~ g(t), limt<u_t g(u) ~ g(t), and limu>t_u g(t) ~ g(u),
whence g is continuous on J o. Furthermore, as a consequence of (3), a
differentiable function g is convex on J 0 iff g' is nondecreasing on J o. Thus,
if g is convex on J 0 and twice differentiable, g" ~ O. Conversely, if g" ~ 0 on
J o , a two-term Taylor expansion yields, setting ql = A., q2 = 1 - A.,
qjg(X j) ~ qj[g(qlx l + q2X2) + q3-j(Xj - X3_j)g'(q I XI + q2X2)], i = 1,2,
and summing, (I) holds, that is, g is convex on J o . Moreover, it is shown in
Hardy et al. (1934, pp. 91-95) that
i. If g is convex on an open interval J 0' it has left and right derivatives g; and
g~ at every point of J 0' with g; ~ g~, each derivative being nondecreasing,
n. If g is convex on an interval J, at each interior point ~ e J,
teJ. (4)

Theorem 1. If X is an integrable LV., c is a finite constant, and g is a convex


function on (- 00, (0), then
E g-(X - EX + c) < 00. (5)
Moreover, ifa(t) and t - a(t) are nondecreasing on ( -00, (0), then Ela(X)1 < 00,
E g- (a(X) - E a(X) + c) < 00, and
E g(X - E X + c) ~ E g(cx(X) - Ecx(X) + c). (6)

PROOF. By (4), g(t) ~ g(O) + tg~(O) for t e (- 00, (0), whence (5) holds. Since
monotonicity ensures t+ + cx(O) ~ cx(t+) ~ cx(t) ~ cx( - C) ~ - C + cx(O), the
hypothesis implies Icx(X)1 ~ IXI + Icx(O) I and so cx(X) is integrable. Con-
sequently, (4) yields E g-(cx(X) - E cx(X) + c) < 00. Set
fJ(t) = t - cx(t) - E X + E cx(X), t e (- 00, (0).
Then ElfJ(X)1 < 00 and E fJ(X) = O. If P{fJ(X) = O} = 1, (6) holds trivially.
Otherwise, fJ(t l ) < 0, fJ(t 2) > 0 for some tl> t 2 e( -00, (0). If to =
inf{t: (J(t) > O}, then t l ~ to ~ t 2 by monotonicity of t - cx(t), and
t ~ to if (J(t) > 0, t ~ to if fJ(t) < O. (7)
104 4 Integration in a Probability Space

Again employing (4),


g(X - E X + c) ~ g(a(X) - E a(X) + c) + f3(X)g~(a(X) - E a(X) + c).
(8)
By (7), X ~ to when f3(X) > 0 and X ~ to for f3(X) < O. Since both g~ and a
are nondecreasing, necessarily
f3(X)g~(a(X) - E a(X) + c) ~ f3(X)g~(a(to) - E a(X) + c). (9)
Taking expectations in (8) and (9), the conclusion (6) follows by recalling
that E f3(X) = O. 0

Corollary I. If9 is a convex function on ( - 00, 00 ),for any !i't r.v. X and any
finite constant c
E g(X - E X + c) ~ g(c) (10)
and, in particular,
E g(X) ~ g(E X) (Jensen's inequality). (11)

Corollary 2. If X is an !i'l r.v., then for 1 ~ P< 00

EIX - E XIP ~ EI Y - E YIP, (12)


where for some choice of - 00 ~ a < b ~ 00

Y = XIlasXsb) + aIIX<a) + bIIX>b)'


PROOF. Take c = 0, a(t) = max [a, min(t, b)], g(t) = IW, p ~ I in (6). 0

In particular, abbreviating (E IX I)P by EP IX I. Jensen's inequality (11)


yields for I ~ p < 00
EIXIP ~ EPIXI or E'/PIXI P ~ EIXI. (13)
Replacing p and IX I respectively by p'lp and IX IP in (13),
E'/PIXI P ~ E,/p'IXIP', 0 < P < p/ < 00 (14)
and so convergence in mean of order p implies convergence in mean of any
order less than p. A convenient, widespread notation is to set
p> 0, (15)
and it is customary to refer to IIXllp as the p-norm of X. According to (14),
IIXll p ~ IIXllp' for 0 < p < p'. Moreover, IIXllp satisfies the triangle in-
equality for p ~ I as noted in Exercise 10 and IlcXll p = lei' IIXlip'

Theorem 2 (Holder Inequality). If X, Yare measurable functions on a prob-


ability space (n,~, P), thenfor p > I, p/ > I with (lip) + (1lp') = 1

(16)
4.3 Jensen, Holder. Schwarz Inequalities 105

PROOF. In proving (16), it may be supposed that 0 < IIxlip II YII p' < 00 since
(16) is trivial otherwise. Set
IXI
U = TIXII~'
entailing IIUll p = 1 = 1IVII p " Now, -log t is a convex function on (0, (0),
whence, via (1), for a, b > 0
p P

- log ( -a + -b ')
~ - -1 log a p
- -Ilog
pb' = -log ab,
P P' P p'

or equivalently

o ~ a, b ~ 00.

Thus,
1 1, 1 1
E UV < - E UP
- P
+ -p' E VP = -P + -p' = 1
,

which is tantamount to (16). o


Corollary 3 (Schwarz Inequality). For any !E 2 random variables X and Y,

(17)

Corollary 4 (Liapounov). If X is a non-negative!E p LV., all p > 0, and

g(p) = log E XP, o~ p < 00, (18)

then g is convex on [0, (0).

PROOF, For 0 ~ Pt, P2 < 00 and qt, q2 > 0 with ql + q2 = 1, noting that
l/qj > 1, i = 1,2, Holder's inequality yields

If X is a LV, on a probability space, E I X IP, P > 0 is called the pth absolute


moment of X( or its distribution), while for any positive integer k, E X k (if it
exists) is termed the kth moment of X.
For any !E I r.v. X, the variance of X is defined by

(19)
106 4 Integration in a Probability Space

while its positive square root is the standard deviation of X. Clearly, for every
finite constant c, a 2 (X + c) = a 2 (X) and a 2 (cX) = c2 a 2 (X). The variance or
standard deviation of X provides information about the extent to which the
distribution of X clusters about its mean and this is reflected in the simple but
extremely useful Tchebychev inequality

a> 0, (20)

which follows from a direct application of the Markov inequality to the LV.
(X - E X)2.
A basic tool of probability theory is truncation. Two alternative methods
of truncating a LV. X are

(i)

(ii)

where a, c are constants such that - 00 ~ a < c ~ 00. One or both equality
signs in the set of the indicator function of (i) may be deleted. Whenever both a
and c are finite, Yand Y' are bounded r.v.s and hence have moments of all
orders. For X E ff l' Corollary 2 reveals that a~ ~ ai, whereas no comparable
inequality between a~. and ai exists (Exercise I).
If X, Yare LV.S with 0 < a(X), a(Y) < 00, the correlation coefficient
between X and Y or simply the correlation of X and Y is given by

E(X - EX)(Y - E Y)
Px. y = p(X, Y) = a(X). a(Y) . (21 )

If p(X, Y) = 0, the LV.S X and Yare said to be uncorrelated.


It follows directly from the Schwarz inequality (17) that Ip(X, Y) I ~ 1.
The correlation coefficient Px. y indicates the extent to which there is a linear
relationship between the r.v.s X and Y (Exercise 7).
ff 2 L v.s X", n ~ 1, are called uncorrelated if X" and X mare uncorre1ated
for each pair of distinct indices n, m. Independent ff 2 LV.S are necessarily
uncorrelated as follows from

Theorem 3. If X and Yare independent ff 1 r.v.s, then X . Y E 2 1 and

E XY = EX· E Y. (22)

PROOF. To prove (22) suppose first that X ~ 0, Y ~ O. For m ~ 1 and'; = 1,


2, ... , set mj=j/2m, Ym.j=I[mi<Ysmj,t!' and Ym=LJ=.mjYm,j' Then
o ~ Ym i Y, a.c., and 0 ~ X Ym i X Y, a.c. By the definition of expectation and
independence, taking n j analogous to mj
4.3 Jensen, Holder, Schwarz Inequalities 107

00

E XYm,j = lim In j P{n j < XYm,j S n;+t}


" i= 1
00

= lim Inj P{n j < X S nj+t, mj < Ys mj + t }


n i= 1

00

= lim In; P{nj < X S nj+d· P{mj < Ys mj+d


" ..... 00 i= 1

= P{mj < Y S m j + t} . E X.
By the monotone convergence theorem
00 00

E XY = lim E XYm = lim E


m m
L mjXYm,j = lim I
j= t m j= t
mj E XYm,j

00

= lim I mj P{mj < Y S mj+ dE X = E y. EX.


m j= I

Thus, in the general case, recalling Exercise 3.1.5,


E IX Y I = E IX I. ElY I and E X±Y± = E X± . E Y±
for all four pairings of positive and negative parts. Consequently,
EX· E Y = (E X + - E X -)(E Y + - E Y -)
= E(X+ - X-)(Y+ - Y-) = E XY. o
Corollary 5. If {X j , 1 sj s n} are uncorrelated!f'2 r.v.s, (a fortiori, inde-
pendent !f' 2 LV.S), then

0'2 ttl Xj) = jt 2


a (X j). (23)

PROOF. Since E IJ=


I Xj =
2
LJ=
I E Xj and a (X + c) = a (X) for every
2

finite constant c, it may be supposed that E X j = 0, 1 :s j :S n. Then

0'2 (.±
)=1
Xj) = E (.± )=1
X j)2 = .± E Xf + .I.E
)=1 '''')
XjX j = .±
)=1
(12(X j). 0

Even though Tchebychev's inequality is an easy consequence of the order-


preserving property of expectations, it is quite useful, especially in verifying
convergence in probability (Exercise 4).
Let {Xn,n ~ I} be a sequence of LV.S with EIXnl/l:s C < 00 for some
p ~ I and 8n = L~ Xj. By Holder's inequality (Exercise 4.3.5) or Jensen's
inequality
108 4 Integration in a Probability Space

and so according to Exercise 4.2.6 {I Sn/n la, n ~ I} is u.i. for < IX < p. If, °
moreover, {I X nIII, n ~ I} is u.i., so is {ISn/nl ll , n ~ I}, generalizing Exercise
4.2.7.

EXAMPLE 1. If {I X nIII, n ~ l} is u.i. for some p ~ I and Sn = Ii Xi' then


ISn/n III is u.i. In particular, if {X n, n > I} are identically distributed Y II r.v.s
for f3 ~ I, then {I Sn/n III, n ~ l} is u.i.

PROOF. Since the case p = 1 is obvious (Exercise 4.2.7), suppose that p > 1.
By Holder's inequality, ISnlnl1l ~ ~ I7= 1 IXl, whence uniform integrability
~M~~ 0
EXAMPLE 2. Let Sn = Ii=1 Xj' n ~ 1 where {Xn , n ~ I} are independent r.v.s
with EX. = 0, E X n2 = 1. If {Xn2 , n ~ I} is u.i., then {S;/n, n ~ I} is u.i.
PROOF. Define y" = XnIlIXnl~KI - E XnIlIXnl~KI' K > 0 and Z. = X. - y",
n ~ 1. Then {y", n ~ I} are independent r.v.s with E y" = and E Y,,2 ~ 1. °
The same statement likewise holds for {Z., n ~ I}. Hence, if T" = Ii=l lj,
J¥" = Ii=l Zj'

E(T,,/n l/2 )4 = :2 Ct E lj4 + 2 i~ E 1';2. E lj2)

~ :2 [nK 4
+ 2 (;) ] ~ K 4+ 1
and so {(T,,/n l/2 )2, n ~ I} is u.i. by Exercise 4.2.6. Hence, for any I: > 0, there
is a () > 0 such that P {A} < () entails

sup E (T,,2 I A) < -41: .


n~l n
On the other hand, via u.i. of {X;, n ~ I}, K = K, > 0 may be chosen so
that sup.~ lEX; Illxnl>KI < 1:/4 whence
2 2 1~ 2
E(J¥" In) = -n1~
j=l n j=l J
I:
L... E Zj ~ - L... E Xj IlIX.I>KI <-.
4

Consequently, fOf P {A:} <-() and all n ~ 1,

E ( S2)
: IA ~ 2 E (T.•2 +n W 2 I ) < 1:.
n A

Clearly, E(S;/n) = 1, n ~ 1 and so {S;/n, n ~ I} and a fortiori {Sn/n 1/2 , n ~ I}


is u.i. 0

A generalization of Example 2 appears in Corollary 11.3.2.

EXAMPLE 3. If {y", n ~ I} are non-negative r.v.s satisfying lim._ex> E y" = 1 =


limn_ex> E Y/ for some p > I then Yn ~ I.
4.3 Jensen, Holder. Schwarz Inequalities 109

PROOF. Let r = min(l, p/2). If p < 2, then r < 1 < p and so by Exercise 4.3.8
(P-1l/(p-rl E y.
( lim E
"-CO
Y: ) ~ lim (E p)(l--"rl/(p-rl = 1
n-oo n

implying
(24)

Thus, (J~r ~ 0 and E Y: ~ 1 so that Y: !. 1 and hence Y,f ~ 1. Since E Y,{' ~ 1,


{Y,,P, n ~n I} is u.i. which, in turn, ensures ElY" - liP ~ O.
If rather p ~ 2, then r = 1 and E Y,,2r = E Y,,2 ~ 1 since (E y"Q)l /Q is an
increasing function of q. Thus, (24) holds with equality throughout and the
conclusion follows as before. 0

EXERCISFS 4.3
1. (i) Show for any fill r.v. X that ai = E X 2 - (E X)2. (ii) If P{X = I} = pe(O, I)
andP{X = O} = P{X = 2} = (1- p)/2,then,settingZ = XlIX>I)' W = XlIXSII'
necessarily (1ij. < (1i < (1~.

2. Calculate the mean and variance of U., the number of empty cells in the random
assignment of r balls into n cells.
3. Prove (i) for any r.v.s {S., n ~ I} with (12(S.) = o(n 2 ) that n -1 (S. - E S.) .!. 0,
(ii) for any r.v. X and positive numbers a, t that necessarily P{X ~ a} ~ e- al E elx
(iii) ifm is a median of X e fill' then Im-E XI~EIX -E XI and hence Im:'E XI
~ (1 for X e fil2 with variance (12.
(iv) for fill r.v.S. X and Y with E XY finite, their cOl'ariance, denoted Cov[X, Y],
is defined by Cov[X, Y] = E[X - E X] [Y - E Y]. Prove that if f and g are
non-decreasing functions and X is an r.v. with E f(X), E g(X), E f(X)' g(X) finite,
then Cov[f(X), g(X)] ~ O.
(v) for any fil2 r.v.s. Xj' 1 ~ j ~ n, check that

(12 (t
j=l
Xj) = t
j=1
(12(X) +2 L
l$i<j$.
Cov(X;, Xj)'

*
4. For arbitrary real numbers ai' bi, 1 ~ i ~ n, prove that

lajbd ~ (* ,adPfP(* Ibdqfq,


provided (lIp) + (llq) = 1, p > 0, q > O. Hint: Apply Holder's inequality to suit-
able r.v.s X, Y.
5. If S. = Lj=l Xj where {Xj,j ~ I} are independent r.v.s with E Xj = 0, E Xl = (1l,
s; = Lj=1 (1l, then
P{S. ~ x} ~ (1 - 2S;)
X
t P{Xj ~ 2x},
j=1
x> O.
Hint: P{S. ~ x, Xj ~ x, max/<jX/ < x} ~ P{Xj ~ 2x}' P{S. - Xj ~ -x,
maxi<jXi < x}.
110 4 Integration in a Probability Space

6. (i) If X is an !L'I r.v., Y = yea, b) is as in Corollary 2, and Z = Y(a', b'), where


a :s; a' < b' :s; b, then EI Y - E Y II' ~ EiZ - E ZIP for any P ~ 1. (ii) For get), lX(t)
as in Theorem 1, E g(X) ~ E g(IX(X) - E IX(X) + E X) ~ g(E X) and E g(X) ~
E g(eX + (1 - e)E X) ~ g(E X), 0 :s; e :s; 1.

7. (i) If X' = aX + b, Y' = eX + d, verify that p(X', Y') = ±p(X, Y) according as


ae > Oorae < O.(ii)ifX, Yare r.v.swith 0 < a(X),q(Y) < oo,thenp(X, Y) = I iff
(X - E X)/a(X) = (Y - E Y)/a(Y), a.c. (iii) if X = sin Z, Y = cos Z, where
P{Z = ± I} = i = P{Z = ±2}, then p(X, Y) = 0 despite X, Ybeing functionally
related.

8. Verify for 0 < a < b < d and any nonnegative r.v. Y that
E y b :s; (E P)ld-bl/1d-al(E yd)(b-al/1d-al.

Utilize this to conclude for any positive r.v.s {Y", n ~ I} and positive constants
{c., n ~ I} that ifI.""= I e: E Y:
< 00 for IX = IX I and 1X 2, where IX j > 0, i = 1,2, then
the series converges for all IX in [lXI' 1X2].
9. The moment-generating function (m.gJ.) of a r.v. X is the function rp(h) = E e hX •
If rp(h) is finite for h = ho > 0 and h = -h o, verify that (i) E e h1x1 :s; rp(h) +
rp( - h) < 00, 0 < h :s; ho (ii) log rp(h) is convex in [ - ho, ho] and strictly convex if
X is non-degenerate (iii) ifE X = 0, then rp(h) ~ 1 for Ihl :s; ho. (iv) The kth moment
of X is finite for every positive integer k and equals rplkl(O), (v) if X I and X 2 are
independent r.v.s with (finite) m.gJ.s rpt (h), rp2(h) (say) for Ihl < ho, then the m.gJ.
of X I + X 2 is rpl (h)' rp2(h) in this interval, (vi) if rp(h) is finite in [0, ho], all moments
of X+ are finite.

10. Prove Minkowski's inequality, that is, if X I E !L'p, X 2 E !L'p, then


P ~ 1.

Hint: Apply Holder's inequality to E IX dIX I + X 211' - I when P > 1.


II. Let{X.,n ~ l}ber.v.swith(i)EIX.i 2 :s; land(ii)E«X I + ... + X.)/n)2 = O(n-O)
for some IX> O. Then (ljn)Lj= I X/~:~ O. Note that (i) =;> (ii) if E XjX j = 0, i '" j.
Hint: Choose n.. = smallest integer ~ m210 , m = 1,2, ... Then (I/n..) Lim Xj~O
as m -+ 00 and

E{ max 1_I_iX.m+jI2}:S;En"+12-n.. ·m+f·m X;m+j=0(m- 2 ).


1 $1 < n". + 1 -"". "WI +k I "m 1

12. For any sequence {A., n ~ I} of events, define


1 • 1 •
Y. = - L (fAj - P{A j }), PI(n) = - L P{AJ,
n I n I

2
P2(n) = L P{AjAd, d. = P2(n) - p~(n).
n(n - 1) t $j<k".

Prove that (i) E Y; = dn + n-I[pI(n) - P2(n)], (ii) y" f.. 0 iff E Y; -+ 0 iff d. -+ 0,
(iii) Y" ~ 0 if d. = O(n-O) for some IX > O. Hint: Utilize Exercise 11.

13. For events {A., n ~ I} and e > 0, there exist distinct indices j, k in [1, n], where
n> l/e, such that P{AjA k } ~ piCn) - e, where PI(n) is as in Exercise 12. Hint:
P2(n) ~ piCn) - l/n.
4.3 Jensen, Holder, Schwarz Inequalities 111

14. Bernstein's Inequality. If S. = Ii=1 X j where {Xj' 1 5,j 5, n} are independent r.v.s
with E X j = 0, E Xl = ul, s; = Ii=1 ul > 0 which satisfy (i) EIXl5, (k!j2)ulc k- 2
for k > 2, 15, j 5, n,O < c < 00 (a fortiori if(i)' P{IXjl5, c} = 1,15, j 5, n), then

P{S. > x} 5, eXP{2(S~:2CX)}' x> O.

Hint: exp{tXj } 5, 1 + tXj + I~=2tkIXl/k! valid for t > 0 implies E elKj 5, expo
{ult 2/2(1 - tc)},O < tc < 1and choosing t = x/(s; + cx)yields Eexp{xS./(s; + cx)}
5, exp{x 2 /2(s; + cx)}, x> O. Now apply Exercise 4.3.3(ii).
15. If S. = Ii=1 X j where {Xj' 1 5, j 5, n} are independent LV.S with E X j = 0, E Xl =
ul, s; = Ii=1 ul > 0, then for A> Y > 1,

p{ max ISjl
15j5.
~ AS.} 5, Y~P{lS.1
- 1
~ (A - y)s.}.

Hint: 1fT = inf{1 5, j 5, n: ISjl ~ AS.} and T = n + 1otherwise, then for 1 < y < A

P t~~;.ISjl ~ AS.} 5, P{IS.I ~ (A - y)s.} + :~ P{T = j}' P{IS. - Sjl ~ YSo}.


Now apply Tchebychev's inequality.
16. If {S., X., n ~ I} are as in exercise 15 and c. --+ 00, then (l/cos.)max1 5j".ISj l !. O.
17. For X, Ye2"p, P > I, define d(X, Y) = IIX - YII p and show that d has all the
attributes of a metric except one. For X, Y e 2" p write X - Y if X = Y, a.c., and let
2"; be the space of equivalence classes of 2" p. Show that 2"; is a Banach space, i.e., a
normed, linear, complete space. When p = 2, 2"; is a Hilbert space under the inner
product (X, Y) = E X Y.
18. ProvethatifS. = I7=1 Xi' where {Xi' i ~ I} are independent LV.'S withE Xi = 0,
1 5, i 5, n, then for any e > 0,

p{ max Sj> e} 5, 2P{S.


1 :S;j:S;.
~e- EIS.I}. (24)

Hint. Use the Feller-Chung lemma with A j = {Sj > e},B j = {So - Sj > -EISol}.

References
Y. S. Chow and W. J. Studden, .. Monotonicity of the variance under truncation and
variations of Jensen's inequality," Ann. Math. Stat. 40 (1969), 1106-1108.
K. L. Chung and W. H. J. Fuchs, "On the distribution of values of sums of random
variables," Mem. Amer. Soc. 6 (\951).
J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
P. R. Halmos, Measure Theory, Van Nostrand, Princeton, 1950; Springer-Verlag,
Berlin and New York, 1974.
P. Hall, "On the 2"p convergence of random variables," Proc. Cambridge Phi/os. Soc.
82 (1977). 439-446.
G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities, Cambridge Univ. Press,
London, 1934.
112 4 Integration in a Probability Space

S. B. Kochen and C. J. Stone, "A note on the Borel-CanteIli lemma," /II. Jour. Math. 8
(1964),248-251.
A. Liapounov, "Nouvelle forme du theoreme sur la limite de probabilite," Mem. Acad.
Sc. St. Petersbourg 12 (1905), No.5.
M. Loeve, Probability Theory, 3rd ed., Van Nostrand, Princeton, 1963; 4th ed., Springer-
Verlag, Berlin and New York, 1977-1978.
G. Polya, "Uber eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die
Irrfahrt im Strassennetz," Math. Ann. 84 (1921), 149-160.
S. Saks, Theory ofthe Integral (L. C. Young, translation), Stechert-Hafner, New York,
1937.
H. Teicher, "On the law of the iterated logarithm," Ann. Proh. 2 (1974),714-728.
5
Sums of Independent Random
Variables

Of paramount concern in probability theory is the behavior of sums {Sn,


n ~ l} of independent random variables {Xi' i ~ I}. The case where the
{X i} are i.i.d. is of especial interest and frequently lends itself to more incisive
results. The sequence of sums {Sn, n ~ l} of i.i.d. r.v.s {X n} is alluded to as a
random walk; in the particular case when the component r.v.s {Xn} are
nonnegative, the random walk is referred to as a renewal process.

5.1 Three Series Theorem


The first question to be dealt with apropos of sums of independent r.v.s is
when such sums converge a.c. A partial answer is given next, and the ensuing
lemmas culminate in the Kolmogorov three series theorem.

Theorem 1 (Khintchine-Kolmogorov Convergence Theorem). Let {Xn ,


n ~ l} be independent 2 2 r.v.s with E X n = 0, n ~ 1. IfIl=1 EX} < 00,
then "'If X j converges a.c. and in quadratic mean and, moreover, E("'If X Y =
"'IJ=1 E XJ.
PROOF. If Sn = II X j ' by Corollary 4.3.5

E(Sm-Sn)2=E(fXj)2 = fEX}-O (1)


n+l n+l

as m > n - 00, whence, according to Theorem 4.2.3, Sn~ some r.v. S,


denoted by S = "'If
X j . A fortiori, Sn!. "'If
X j and so by Levy's theorem

113
114 5 Sums ofIndependent Random Variables

(Theorem 3.3.1) Sn ~ If Xj' The remainder follows from


E(f 1
Xj )2 = lim
n-oo
E S; = lim
n-ooj=t
±E = f E X;
1
X; (2)

via Corollaries 4.2.5 and 4.3.5. 0

The first lemma involves" summation by parts" and is as useful as its


integral counterpart.

Lemma 1 (Abel). If {an}' {b n} are sequences ofreal numbers and An = D=o aj,
n ~ 0, then for n ~ 1
n n- 1

I ajbj = Anb n - Aob l - I A/bj + I - b); (3)


j= I j= I
if I.!=I aj converges and A: = IJ=n+! aj' then for n ~ 1
n n- 1

I ajbj = A6 b l - A:bn + I Aj(bj + I - b); (4)


j= I j= I
moreover, if an ~ 0, bn+ I ~ bn ~ 0, A: = IJ=n+ I aj < 00, then
co co
I ajbj = A6 b l + I Aj(bj+ I - bj)' (5)
j= I j= I
PROOF.
n" ""
I ajb j = I (A j - Aj-1)bj = I Ajbj - I Aj_Ibj,
I I I 1

yielding (3). Take ao = - I.!= I aj = - A6 in (3) to obtain (4). Next, assuming


an ~ 0, bn+ I ~ bn ~ 0, if rtm A:b n > 0, then I:'+ I ajbj ~ A:bn implies
n
If ajbj = 00. By (4), A6b. + If Aj(bj + I - b) ~ If ajbj = 00, so that
(5) holds. If, rather, rtm A:b n = 0, then (5) obtains by letting n --+ 00 in (4).
o
The following Kronecker lemma is a sine qua non for probability theory as
will become apparent in the next section.

Lemma 2 (Kronecker). If {an}, {b n} are sequences of real numbers with


o < bn i
00, If (a)b) converging, then

1
- I
n
aj --+ O. (6)
bn j= I
PROOF. This will be demonstrated in the alternative equivalent form that
convergence of If
aj entails II
ajb j = o(bn). By (4),
1 bl 1 n- I
-A: + A6 -b + -b I
n

b L ajbj = Aj(bj + I - b). (7)


n j= 1 n " 1
5.1 Three Series Theorem 115

For any I: > 0, choose the integer m so that I A! I < I: for j ;;:: m. Then
1 n- 1 _I n - 1
-I::::; limb L A!(bj + 1 - bj):::; limb L Aj(bj + 1 - bj):::; 1:,
n n m n n m

whence from (7)


1 n 1 n
-I: <
- -lim
b-L'"
n
.Ja·b·
J- <
n j: 1
rrm -bL.JJ-
'" a·b· < I:
n n j: 1
and (llb n) L}: 1 ajbj --+ O. o
Lemma 3. Let {X n} be independent r.v.s with E X n = 0, Sn = L~ Xi' n ;;:: 1,
and E SUPn X~ < 00. IfP{suPnlSnl < oo} > 0, then Sn converges a.c. and in
quadratic mean and

(8)

PROOF. It suffices to prove (8) since Theorem 1 guarantees the rest. To this
end, set Z2 = sup X~ and choose K > 0 sufficiently large so that

p{s~p ISnl < K} > O.


Define T = inf{ n: ISn I ;;:: K} and note that T is positive integer valued
with P{T = oo} > O. Since {T;;:: n} = {ISjl < K, 1 :::;j < n}EO'(X 1, ... ,
X n- 1) for n ;;:: 2, the r.v.s X n and IIT?n) are independent for n ;;:: 1. Let
V n = L}:1 Xjl lnj ) and observe that V n is O'(X( ... Xn)-measurable and

Vn = SminlT. n)'

U; = ISmin(T-l.n-O + X min (T.n)1


2
:::; 2(K 2 + Z2), (9)

E V~ :::; 2(K 2 + E Z2) = C < 00.

Now, setting V 0 = 0, for j ;;:: 1

V; = V;-l + 2Vj-1XiIT?j) + X;/IT?j)'


and so by independence and Theorem 4.3.3
E V; - E V;-1 = P{T ;;::j}E X;'
Summing over 1 :::; j :::; n,
n n

C ;;:: E V~ = L P{T ;;:: j}E X; ; : P{T = oo} L E XI,


j: 1 j: 1

which yields (8) when n --+ 00. D

Lemma 4. If{X n} are independent r.v.swith E sUPnlXnl < ooandSn = L~ Xi


converges a.c., then Lf E X n converges.
116 5 Sums ofIndependent Random Variables

PROOF. Define K, T, V nas in Lemma 3, which is permissible since Sn ~ some


LV. S. Now, min[T, n] -+ T and so (9) ensures that

(10)

whereS T = Snon {T = n},n ~ l,andS T = Son {T = oo}. A more extensive


discussion of ST occurs in Section 3 of this chapter, where it is pointed out
that ST is a bona fide r.v. As in (9),

which, in conjunction with (10) and the Lebesgue dominated convergence


theorem, ensures
finite. (11)

By the independence observed in Lemma 3,

n n

E Vn = E I
j= 1
XjI(T""j] = I
1
P{T ~ j}E Xj'

whence

Employing Lemma I with bj = I/P{T ~ j}, aj = E V j - E V j _ I' j ~ I,


ao = A o = 0,

IEX.= EV n _nf[ I _ I JEV.


j=1 J P{T ~ n} j=1 P{T > j} P{T ~j} J

and so, recalling (11) and P{T = oo} > 0, If E X j converges. o


Corollary 1. If {X n' n ~ I} are independent L V.S which are uniformly bounded,
i.e., P{lXnl :s;; C < 00, n ~ 1} = 1, and moreover, Sn = D Xi~Sfmite,
then If E X j and If:1 /12(X) converge.

PROOF. The series of means converges by Lemma 4, whence If (Xi - E Xi)


converges a.c. and Lemma 3 applies. D

Definition. Two sequences of r.v.s {X n } and {y"} will be called equivalent if


If P{X n ;/= y"} < 00.

If {X n }, {y"} are equivalent sequences of LV.S, the Borel-Cantelli lemma


ensures that P{X n ;/= Y", i.o.} = O. Hence, p{I Xi converges} = I iff
peL Y; converges} = 1.
5.1 Three Series Theorem 117

The way is now paved for presentation of

Theorem 2 (Kolmogorov Three Series Theorem). If {X n} are independent


r.v.s, then 2:1" Xi converges a.c. iff
i. 2:1" P{IXnl > 1} < 00,
ii. 2:1" E X~ converges,
iii. 2:1" O'L
< 00,
where X~ = XnIIIXnl,;l)' n ~ 1.
PROOF. Sufficiency: If the three series converge, then 2:1" (X~ - E X~) con-
verges a.c. by Theorem 1, whence (ii) implies that L1" X~ converges a.c.
According to (i), {X n}, {X~} are equivalent sequences of LV.S and so LXi
converges a.c.
Conversely, if 2: X n converges a.c., then Xn~O, implying P{IXnl > 1,
i.o.} = 0, whence (i) holds by the Borel-Cantelli theorem (Theorem 3.1.1).
Also, {X n}, {X~} are equivalent sequences, so that necessarily L1" X~ con-
verges a.c. The remaining series, (ii) anq (iii), now converge by Corollary 1. 0

Corollary 2. If {X n} are independent r.v.s satisfying E X n = 0, n ~ I, and


ro
L E[X;IIIX nl';l) + IXnIIIIXnl>1J] < 00, (12)
1

then 2:1" X n converges a.c.


PROOF. Since E X n = 0,
ro ro ro
2: IE XnIIIXnl,;1I1 = LIE XnIlIXnl >111 :s; L EIXnlIIIXnl >1) < 00.
1 1 1
Moreover, by the Markov inequality (Theorem 4.1.l(iii»
P{lXn' > I} = P{IXnIIIIXnl>l) > 1} :s; EIXnlIllXnl >1)'
whence the corollary flows from Theorem 2. 0
Corollary 3 (Loeve). If {X n} are independent LV.S and for some constants
°L1"< X
IXn :s; 2, 2:1" E 1X n Ian < 00, where E X n =
n converges a.c.
°
when l:s; IXn :s; 2, then

PROOF. It suffices to consider separately the cases 1 :s; IXn :s; 2, n ~ 1, and
0< IXn < 1, n ~ 1. In the former instance, X;IIIX nl,;l) + IXnIIIIXnl>l):S;
IX nlan, whence (12) obtains. In the latter,
ro ro
2: E(X; + IXnI)IIIXnl,; 1) :s; 22: EIXnlan < 00,
1 1

and in both cases


ro ro
2: P[ I X n1 ~ 1] :s; 2: E I X nIan < 00,
1 1

SO that the three series of Theorem 2 converge. o


118 5 Sums oflndependent Random Variables

In much of probability theory, integration is with respect to a single


probability measure P, and so it seems natural to write fAX as an abbrevia-
tion for fAX dP. Abundant use will be made of this concise notation.
Turning to the i.i.d. case:

Theorem 3 (Marcinkiewicz-Zygmund). If {X n} are i.i.d. with EIXII P < 00


for some p in (0, 2), then

converges a.c., where

Furthermore, if either
If Xnln llp converges a.c.
(i) °< p < 1 or (ii) 1 < P < 2 and E X I = 0, then

PROOF. Set Aj = {(j - WIP < IXII ~/IP},j ~ 1. Then for rx > p > °
~EIy"la = JI Jl n -
alP
{jlXlla = JI Jt- aiP
{jlXlla

~ .I: (r alP + ~
J; I rx P
p-a)IP) f IX t1
Aj
a

~ I: (; +~)
j; I rx ]
f IXti ~ ~EIX,IP
rxP
<
Aj
P
P
00, (13)

whence (rx = 2) If
(Y" - E Yn) converges a.c. by Theorem 1. Since, re-
calling Corollary 4.1.3,

I'"
I
P{X-1% 1= n}= I'" P{IXII > nllp }
n
Y
I
~ EIXIIP < 00,

the sequences {X nln IIP }, {y"} are equivalent, and so (X nln l/P - E y") Ir'
converges a.c.
°
In case (i), where < P < 1, In"';
I IEy"1 < 00 via (13) with rx = 1, and
this same series converges in case (ii), 1 < P < 2 and E X I = 0, since

I:IEy"I~In-'IPi
I I IIXnl>nl/p)
IXnl= fIn-liP
n;lj;n+1
f Aj
IX,I

.I ji,ln-IIPf 'XI,~~.I(j-l)(P-')IPf
J;2n;1 Aj P I J ;I Aj
IXII

~ -P- L'" f IX liP = -P- E IX liP < 00.


P- 1 I Aj P- 1
Thus, the second part of Theorem 3 follows from the first. o
5.1 Three Series Theorem 119

EXAMPLE 1. Let {X n, n ;;::: 1} be i.i.d. r.v.s with EIX 11 < 00. If {an, n ;;::: l} are
real numbers such that an = O(1/n) and If
an converges, then 1 anX n L:'=
converges a.c.
PROOF. By considering X;; and X;; separately, it may and will be supposed
that XI;;::: O. Set
Y" = X~ - E X~.

By the Borel-Cantelli theorem, {X~} and {X n } are equivalent sequences, and


from (4) it follows that a convergent series remains convergent when its terms
are multiplied by any convergent, monotone sequence (Abel's test). Thus,
E X~ lEX 1 implies I
an E X~ converges. Consequently, to prove an X n I
converges a.c., it suffices to prove L
an Yn converges a.c. By hypothesis,
na; : :;
2
A < 00, and so, setting A j = {j - I < XI:::; j},

00
:::; An~l j~ln-2
n f AjXi:::; 2A E X I'

where the last inequality follows via (13) with IX = 2, p = 1. Hence, Lan Yn
converges a.c. by the Khintchine-Kolmogorov convergence theorem. 0

EXAMPLE 2. If {b n , n ;;::: I} are constants satisfying 0 < bn l 00 and

b; Lb
00
2
j- = O(n) (14)
j=n

and {X, X n , n;;::: I} are i.i.d. r.v.s with I:'=l P{!XI > bn } < 00, then
00

L b;;l(Xn -
n= 1
E XIUX1s;b"l) (15)

converges a.c. Moreover, if E X = 0 and


n
bn I bj 1 = O(n), (16)
j= 1

then

(17)

converges a.c.

Remark. If bn/n l or bibn ;;::: AWn)6 for j ;;::: n, where b > t, A > 0, then
(14) holds; if bibn ;;::: AWn)lJ for j :::; n, where 0 < b < 1, A > 0, then (16)
obtains.
120 5 Sums oflndependent Random Variables

00 ex> co a:>

00 > LP{!Xnl > bn} = L L P{AJ = LjP{AJ (18)


n=O n=O j=n+ I j= I

whence for some C in (0, 00),

j~lbj-2a2(lj)~ ~ bj2 E Y; ~ Jl bj- 2 nt I"x;


= JI j~nbj-2 I"X 2 ~ Jl b;Jnbj- 2 P{A n}
<Xl

~ C L n P{A n} < 00
I

via (14) and (18). By Theorem 5.1.1, Lf bjl(lj - E lj) converges a.c., and
so, again employing (18), Lf bj-I(X j - E lj) converges a.c., yielding (15).
Moreover, if E X = 0,

<Xl

~ C L (n + I)P{A +tl <


.= I
n 00

via (16) and (18). Consequently, (17) follows via (15). o


Theorem 4. Let {X.} be independent r.v.s such thatfor some positive e, (j
inf P { 1X.I > e} = (j, (19)
n~1

and suppose that Lf


a. X. converges a.c.for some sequence {a.} ofreal numbers.
If either (i) X., n ~ 1, are i.i.d. and nondegenerate or (ii) EX. = 0, n ~ 1,
and {X.} is u.i., then
<Xl

La; < 00. (20)


I

PROOF. Since Lf a. X. converges a.c., necessarily a. X• ..E. 0, implying via (19)


that a. 0. In proving (20), it may be supposed that a. ::f. 0, n ~ I. If Y. =
--+
X.llla"X"I~I]' the three series theorem requires that

(21)

In case (i), a~" = E(X I I!la"Xd:S; I] - E Y,Y. If aL --+ for some subsequence °
nj , then XI Illa".xd:S; I] - E Y. .!. 0, implying Xl degenerate, contrary to the
j
J
hypothesis of (i). Thus lim. a~" > 0, whence (21) implies (20).
5.1 Three Series Theorem 121

Under (ii), since E X n = 0,

IE Y"I = IJUX"I
f
> la"l- 1)
Xn I~ JIIX"I
f
> la"l- 1)
IXnl = 0(1)

by uniform integrability and an -+ O. Hence,

lim at = lim(E Y; - E 2
y") = lim E Y; ~ lim E 2
1Y" I
n

by (19), and once more (20) flows from (21). o


Corollary 4 (Marcinkiewicz-Zygmund). If {X n} are independent r.v.s with
E X n = 0, EX; = 1, n ~ 1, and infn EIXnl > 0, then a.c. convergence of
L anXnfor some real numbers an entails convergence ofL a;.
PROOF.Uniform integrability of {X n } is implied by EX; = 1, n ~ I, while
infEIXnl > 0 and EX; = 1 ensure (19) (Exercise 5). 0

Definition. A series I:'= I X n of r.v.s will be said to converge a.c. uncondition-


ally if L:'=I X nk converges a.c. for every rearrangement (n l , n2 , . . . ) of
(1, 2, ., .). More specifically, a rearrangement is a one-to-one map of the
positive integers onto the positive integers.
In the case of degenerate LV.S X n , n ~ 1, the series I f X n converges
unconditionally iff I IXnl converges (Weierstrass). However, the analogue
for nondegenerate independent random variables is invalid. In fact, if X n ,
n ~ 1, are independent LV.S with P{X n = ± lin} = t, then X n converges L
a.c. unconditionally by Theorem 1, but L IXnl = 00, a.c.

Lemma 5. If {X n} are independent r.v.s with E X n = 0, n ~ I, and I f EX;


L
< 00, then X n converges a.c. unconditionally and Lj=
I X nj = I f Xj' a.c.,
for every rearrangement {n j } of {j}.
PROOF. Theorem 1 ensures that L
X n converges a.c. unconditionally. More-
over, for any fixed rearrangement {nJ, define S;,. = Ij= I X nj , Sm = Li Xj'
Then, setting Q = {n l , ... , nm}L\{l, 2, ... m},

E(S;" - Sm)2 = I E
keQ
xr
Now, if {nl, ... , nm } ~ {l, 2, ... ,j}, then
00

E(S;" ~ Sm)2 ~ IE xf = 0(1) asj -+ 00.


j+ I

Hence, S;,. - Sm ~ 0 as m -+ 00, implying Ij= I X nj = Lf X j a.c. 0


122 5 Sums of Independent Random Variables

Theorem 5. A series If X n of independent r.v.s X n converges a.c. uncondi-


tionally ifffor Yn = Xn/IIX"ls I)

i'. If P{IXnl > 1} < 00,


ii'. If IE Y,.I < 00,
llJ. L.,I E y n < co,
..., "'" 2

and if so, If
X nj = If
X j a.c.for every rearrangement {n j} of {j}. Moreover,
a series ofindependent r.v.s {X n } converges absolutely a.c. iff (i'), (iii'), and
ii". If EI Ynl < 00

hold.
PROOF. Since the series appearing in (i'), (ii'), (iii') are independent of the
order of summation, the three series theorem (Theorem 2) guarantees that
If X n converges a.c. unconditionally, and by Lemma 5

I'" (Y,.j - E Yn) = I'" (lj - E lj), a.c.


I I

Then, in view of (ii'), If


Y,.j = If
lj a.c., whence (i') ensures that Xnj = If
If X j a.c.
Conversely, if I X n converges a.c. unconditionally, (i), (ii), (iii) ofTheorem
2 are valid for every rearrangement {nj} of {j}. By the Weierstrass theorem
(ii') holds, and hence also If
E 2 Y,. < 00. But this latter and (iii) entail
(iii').
The proof of the final statement is similar. D

Corollary 5. If the series If


Xn of independent r.v.s Xn converges a.c., then
I (X n - en) converges a.c. unconditionally, where
(22)

PROOF. Set Yn = Xn/UX"ISI)' By the three series theorem

I'" at < 00, (23)


I

whence Lemma 5 guarantees that If


(Yn - E Y,.) converges a.c. uncondi-
tionally. Then by (23), II
(X n - E Y.) converges a.c. unconditionally. D

EXERCISES 5.1
1. Let S. = D X;, where (X", n ~ I} are independent LV.S. (i) IfI:'=1 P(IX.I > c}
= 00, all c> 0, then lim"_oo IS.I = 00, a.c. (ii) If (b", n ~ I} are positive constants for
which li.m.-oo P(S"-l > -bb"} > 0 for all b > 0, then fIn1 S"/b" s: C < 00, a.c.,
implies I:'= I PIX" > eb"} < 00 for all e > C. Hint: Recall Lemma 3.3.4.
2. Any series I~1 Xi of r.v.s converges absolutely a.c. if I;;'=1 EIX.I'" < 00 for some
sequence r" in (0, I].
5.1 Three Series Theorem 123

3. If X., n ~ I, are independent r.v.s and Y~ = X.Iux.l,;c], then a.c. convergence of


I x. ensures convergence of Do P{ IX.I > e}, D" E Y~, D" a~~ for every e > O.
Conversely, convergence of these series for some e > 0 guarantees a.c. convergence
I
of X •.
4. If {X.}, {Y.} are equivalent sequences of r.v.s, prove that
i. P{D" X.converges} = I iffP{D" Y.converges} = I,
ii. if 0 < b. l 00, then P{D Xi = a(b.)} = 1 iff P{D 1"; = a(b.)} = I.
5. Let {X.} be U.i.LV.S. Then inf P[IX.I > e] > 0, for some e > 0, iffinf. EIX./ > O.
6. If X., n ~ I, are i.i.d. !l! 1 LV.S, then I (X.ln) converges a.c. if either (i) Xl is sym-
metric or (ii) E I X 1 Ilog + IX d < 00 and E Xl = O.
7. Let {a.} be a sequence of positive numbers with I f a. = 'X>, Then, if p > 0, there
exist independent LV.S X. with EX. = 0, EIXnlP = a. such that If X. diverges
a.c., thereby furnishing a partial converse to Corollary 3 when 0 < p :0; 2.
8. If X., n ~ I, are independent r.v.s with P(X. = I} = P(X. = -I} = t, then
If X./~ diverges a.c. although I f EI XJ~IP < 00, all P > 2. Thus, the restric-
tion of exponents in Corollary 2 is essential.
9. If {A., n ~ I} are independent events with P {A.} > 0, n ~ 1 and I:=I P {An} = 00
then D=I IA)D=I P{Aj} ~ l. Hint: If a. = D=I P{AJ and Xj = (fA j - P{Aj })/aj
then {Xj' j ~ I} are independent with E X j = 0, E Xl :0; P{Aj}/aJ ~ l/aj_1 - l/aj,
j> l.
10. If x., n ~ I, are independent r.v.s with P. = P{X. = an} = I - P(X n = -an},
characterize the sequences (a., P.) for which If Xi converges a.c.; specialize to
P. == t, to a. == a, and to an = n-· (IX > 0).
II. If, in the three series theorem, the alternative truncation Z. = min[l, max(X., -I)]
is employed, then convergence of the two series If E Zn, If at
is equivalent to the
a.c. convergence of I f X •.
12. For any sequence of LV.S {S.}, it is always possible to find constants 0 < a. l 00 for
which SJa. ~ O. Hint: If 0 < e. 10, choose a. > a.- 1 such that P{ IS.I > a.e.}
< 2-'.

13. (Chung) Let '¥ be a positive, even function with x-Z'¥(x) 1, Ixrl,¥(x) l as Ixll. If
0< bn l 00, {X.} are independent with EX. = 0, I (E 'I'(X.)/'¥(b.» < 00, then
I (X.lb.) converges a.c. Hint: Apply Corollary 2 with X n replaced by Xnlb n.

14. If{X.,n ~ I} arei.i.d.r.v.swithEIX I I < oo,provethatI:'=l X.(sinnt)/nconverges


a.c. for every t E ( - 00, (0). Conversely, a.c. convergence of this series for some
t =I kn, k an integer and i.i.d. {X., n ~ l} implies EIXd < 00. Hint: For m = 1,
2, ... , choose integers nm so that nmt E (2mn + (nI4), 2mn + (nI2)] for t = n14.

15. If {b., n ~ I} are finite constants with 0 < b.l 00 and (i) b; IJ=. bj- Z = O(d(b.»,
where d is a nondecreasing mapping of [0, (0) into [0, (0) for which (ii) d(b.) ~
en> 0, n ~ I, and (iii) xZld(lxi) l as Ixll, then for any i.i.d. LV.S {X, X n , n ~ I}
with (iv) E d( IX I) < 00, the conclusion (15) obtains. Moreover, if E X = 0 and
(v) Ixl/d(lxl) 1 as Ixll and (vi) bn I:i= 1 b; 1 = O(d(b.», then (17) holds.

16. If {X, X., n ~ I} are i.i.d. LV.S with E X = 0, E XZ(l + log+ IXI)-26 < 00 for some
(j > 0, then I n- I/Z (log nr(l/Z)-~X. converges a.c.
124 5 Sums of Independent Random Variables

5.2 Laws of Large Numbers


In a sense, a.c. convergence of a series of independent LV.S X n is atypical and
in the nondegenerate i.i.d. case it is nonexistent. Thus, the issue rather be-
comes one ofthe magnitude of the partial sums Sn = L~ Xi·WhenP{Xn = l}
= p = 1 - P{Xn = OJ, so that Sn is a binomial LV. with parameters nand p,
it was proved in Theorem 2.2.3 that

S -np 1
= _ L (Xi
n
n - E Xi) ~ O.
n n i~l

Definition. A sequence {Xn } of IE I r.v.s is said to obey the classical strong


law of large numbers (SLLN) if

(a)

If, merely,

(b)

the sequence {X n} satisfies the classical weak law of large numbers (WLLN).

From a wider vista, n may not reflect the real magnitude and the expecta-
tions need not exist. Thus, there is occasion to consider the more general
strong and weak laws of large numbers

~ ~(X.
"-- I
- b.)~O
I P ,
an i~ I

where 0 < an t 00. Here, the smaller the order of magnitude of an, the more
precise the SLLN becomes; the fuzzy notion of an optimal choice of an
impinges on the law of the iterated logarithm (Chapter 10). Note, in this
context, Exercise 5.1.12.
The first SLLN may be reaped as a direct application of Kronecker's
lemma to Corollary 3 of the three series theorem (Theorem 5.1.2), thereby
obtaining Loeve's generalization of a result of Kolmogorov (IXn == 2).

Theorem 1. If {X n } are independent r.v.s satisfying


00 EIX Ian
7
"
nan
n <00 (1)

for some choice of IXn in (0, 2], where E X n = 0 whenever 1 :::; IXn :::; 2, then
1 n
- LXj~O. (2)
n j~ 1
5.2 Laws of Large Numbers 125

Corollary 1. Let {Xn} be independent !t' 2 LV.S with E X n = 0, Lf E X;/n 2


< 00. Then (2) holds.

According to Corollary 1, independent LV.S with means zero and variances


n(log n)-6 obey the classical SLLN if () > 1. This remains true for any () > 0
by Corollary 10.1.4. provided (as is necessary) L:~I P{IXnl ~ ne} < 00,
e> O.
In the i.i.d. case, the next theorem gives a generalization due to Marcin-
kiewicz and Zygmund of a classical SLLN (p = 1) of Kolmogorov.

Theorem 2 (Marcinkiewicz-Zygmund). If {X n} are i.i.d. LV.S and Sn =


Li Xi' then for any p in (0,2)

(3)

for some finite constant c iff E IX I IP < 00, and if so, c = E X I when 1 S; P < 2
while c is arbitrary (and hence may be taken as zero)for 0 < p < 1.
PROOF. If (3) holds, then

X n _ Sn - nc _ (n - 1)I/P Sn-I - nc ~O
n l/p - n l/p n (n - l)llp ,

whence by the Borel-Cantelli theorem Lf P{ I X II ~ n l/p } < 00. Thus,


EIX liP < 00 by Corollary 4.1.3. Conversely, if E\X liP < 00, by Theorem
5.1.3 the following series converge a.c.:

1 < P < 2;

p = 1; (4)

00 X
O<p<1.
~ nl;P'

For p :1= 1, (3) obtains (with c = E X I or 0) by Kronecker's lemma. When


p = 1, since E XnIIiXnl:s;nl = E XIIllx'l:s;nj-+ E XI as n -+ 00, Kronecker's
lemma, in conjunction with (4), yields

a.c. o

Corollary 2 (Kolmogorov). If {Xn} are i.i.d. LV.S, Sn = Li Xi and E XI


exists, then Sn/n ~ E X I'
126 5 Sums of Independent Random Variables

PROOF. If IE X II < 00, the conclusion is contained in Theorem 2. If E X I =


- 00, define y" = max(X n' - K), where K > O. Then {y,,} are i.i.d. with
E I YI I < 00. By Theorem 2

r.-= Sn r.-= 1 ~ a.c. K-oo


Hm - ~ Hm - L. 1'; = E YI----->E X I'
n n I

Hence, S,jn .!:::. - 00. Analogously, S,jn ~ 00 when E X I = 00. 0

EXAMPLE 1. If {X n' n ~ 1} are i.i.d. ffp random varibles for some p in (0, 2),
then (i) L::"=I n- 2jpX; < 00, a.c. Moreover, if E X = 0 whenever 1 ~ p < 2,
•. )
( It n -2jp,,\,~ X.S. ~O.
L.J = 2 J r I

PROOF. According to the proof of Theorem 5.1.3, y" = n - ljp X nlllx"1 sn1/PI'
n~ 1 and n-IIPX n , n ~ 1 are equivalent sequences and L::"=l E Y,,2 < 00
implying (i). Then, by Kronecker's lemma, n- 2IP Lj=1 XJ~ 0, and so
Theorem 2 ensures that

n- 2lp L XjS j _
n

j=2
1
1 2Ip [ S; -
= _n-
2
L XJ
n

j=l
]
~O. o
If {X n , n ~ 1} are i.i.d. LV.S, Sn = L~ Xi and {b n , n ~ 1} are constants
with 0 < bn t 00, the relationship between X ..Ibn -!:4 0 and (Sn - Cn)/bn ~ 0
for some sequence of constants {Cn , n ~ 1} is described by

Theorem 3. Let {X, X n, n ~ 1} be i.i.d. r.v.s with partial sums {Sn, n ~ I} and
{b n, n ~ I} constants such that 0 < bn t 00 and
00

L P{IXnl > b n} < 00. (5)


n= I

(i) (Feller) If(rx.) b,jn t 00 or (P) bn/n!, the first halfof(6) holds and E X = 0
or (y) E X = 0 and
00 n
b; Lbj2 = O(n), bn L bj- I = O(n), (6)
j=n j= I

then
S
~~O
b . (7)
n

(ii) If {an, n ~ l} are positive constants with An = L~ aj -+ 00 such that


bn = A,jan t 00 and satisfies the first halfof (6), then
5.2 Laws of Large Numbers 127

(8)

and, moreover, if E X = 0,
1 n
-
A"L. a j X j a.c.
---+
0. (9)
n I

PROOF. According to Example 5.1.2, condition (5), the initial portion of (6)
and Kronecker's lemma ensure

(10)

The conclusion (8) in case (ii) follows likewise.


Now bn/n i 00 implies the first half of (6), hence (10) and also

Lj P{b
00

n~oo. j _ , < IXI ::; bj } N-",.O


j=N

by Example 5.1.2 (18), and so (7) follows from (10), thereby proving (0:) of
(i). In case ({3) since E X = 0,
128 5 Sums ofIndependent Random Variables

=0 b (N)n + j~/{IXI > bJ + n~/ P{b j -


00 00
1 < IXI ~ bj }

= 0(1)

as n, and then N -+ 00 by Example 5.1.2 (18).


In case (y), (7) is a direct consequence of Kronecker's lemma applied to
(17) of Example 5.1.2.
To prove the remaining portion of (ii) it suffices, via (8), to note that
LJ= 1 aj E XIIIXI:SAj/Djl = o(A n ) in view of EX = 0 and

Corollary 3. Let {an, n ~ I} be positive constants with An = Ii aj -+ 00 and

(11)

For any i.i.d. LV.S {X n, n ~ I}, there exist constants Cnfor which
1
-An L a ~X. -
n
C-) ~ 0 (12)
1 J J J

iff
anXn
- - ----+
a.c. 0 (13)
An .

PROOF. If (13) obtains, so does (12) with C j = EX I ux1 :SAj/Dj) by Theorem 3.


Conversely, since an = o(A n) guarantees An/A n- 1 -+ I, (12) ensures

an(X n - Cn)/A n ~ O.

Moreover, anXn/A n ~ 0 via an = o(A n). Then anCn/A n -+ 0, which, in turn,


ensures (13). 0

Although necessary and sufficient conditions for the WLLN are available
in the case of independen( r.v.s, a complete discussion (see Section 10.1)
requires the notion of symmetrization. The i.i.d. situation, however, is
amenable to methods akin to those already employed.

Theorem 4 (Feller). If {X n} are i.i.d. LV.S and Sn = 2:7= 1 Xi' then


Sn - Cn ~ 0 (14)
n
5.2 Laws of Large Numbers 129

for some choice ofreal numbers Cn iff

n P {I X I I> n} - 0, (15)

and ifso, Cn/n = EX IIlIXdsnl + 0(1).


PROOF. Sufficiency: Set Xi = XjIlIXjlsnl for 1 ~ j ~ n and S~ = Lj= I Xi·
Then, for each n ~ 2, {Xi, 1 ~ j ~ n} are i.i.d. and for c > 0, P{I(Sn/n) -
(S~/n)1 > c} ~ P{Sn #= S~} = P{U~ [Xj #= Xi]} ~ n P{lX II> n], so that
(15) entails (S~/n) - (Sn/n) -f. O. Thus, to prove (14) it suffices to verify that

S~ - E S~ S~ p
n =-; - EXIII/Xdsnl- O. (16)

By Lemma 5.1.1 (4), Corollary 4.3.5, and (15),


n n

E(S~ - E S~)2 = L (T2(Xi) ~ L E(Xi)2 = n E(X'Y

f
I I

= nLn Xi
j=1 [j-I<IXdsjl
n

:$; nLl[P{lXII >j - I} - P{lXII >n]


j=1

= n[p{lXII > O} - n 2 P{IXII > n}

+ ~t>U + 1)2 -l)P{IXII > nJ

:$; 3n[1 + ~t:jP{lXd 2


>nJ = 0(n ), (17)

which implies (16) and hence (14) with Cn = n EX IIlIXdsnl'


Conversely, if (14) holds, setting Cn = Cn - C n - I' n ;;::: I, Co = 0,

whence (X I - cn)/n -f. 0, necessitating Cn = o(n). By Levy's inequality


(Lemma 3.3.5), for any B > 0

p{ max IS. - C· - m(S· - C· - S + C ) I > nc}


I sjSn J J J J n n - 2

:$; 2P{I Sn - Cnl ~ ~} = 0(1); (18)


130 5 Sums ofindependent Random Variables

but, taking X j = Sj - Cj in Exercise 3.3.7,


max Im(Sj - C j - Sn + Cn)1 = o(n). (19)
1 s,jsn

Thus, from (18) and (19), for all f. > 0

lim p{ max ISj - Cjl <


n 1 '5:J$n
m;} = 1. (20)

Moreover, for max, $j$nICjl < ne, and hence for all large n,

p{ max ISj - Cjl < ne}


lSJsn
~ p{ max IX
1 '5:Jsn
j - cjl < 2ne}

~ p{ ,max
$)$n
IXjl < 3ne},

which, in conjunction with (20), yields

pn{IX,1 < 3ne} = pt~~:nIXjl < 3ne}-+ 1


or, equivalently, for all e > 0

n loge 1 - P {IX ,I ~ 3ne}] -+ 0 (21)

as n -+ 00. Since log(1 - x) = -x + o(x) as x -+ 0, (21) entails (15).


The final characterization of Cn/n results from the fact that (14) entails
(15), which, in turn, implies (14) with Cn/n = EX ,/!lXd$nj' 0

EXAMPLE 2 (Chung-Ornstein). If {X, X n, n ~ I} are i.i.d. with n P{IXj > n}


= 0(1) and E XI !lXI $n) = 0(1) (a fortiori if E X = 0) then the random walk
{Sn = L~ Xi' n ~ I} is recurrent. If, rather, EX> 0 or E X < 0, the random
walk is nonrecurrent.
PROOF. Take N = k . m in Lemma 4.2.5, where m is an arbitrary integer. Then
for any e > 0

According to Theorem 4, Sn/n .f. 0, and so

LP{ISnl<e}~-2Iim-k
00

n=O
1 IP
km
k-oo m n=O
m {I S
-!'.
n
Ie} m
<- =-2'
m

Since m is arbitrary, the series on the left diverges for all e > 0 and the con-
clusion follows from Example 4.2.1. The final remark stems from the strong
law of large numbers. 0
5.2 Laws of Large Numbers 131

In Kolmogorov's classical SLLN (Corollary 2) involving identically distrib-


uted 2 1 random variables {X n , n 2:: I}, independence can be weakened to
pairwise independence (Xi independent of X j for all i oF j). One way of demon-
strating this is via strong laws for non-negative random variables.

Lemma 1 (Etemadi). If Sn = Lj=IXj, n 2:: 1, where {X n, n 2:: I} are non-


negative 2 2 random variables satisfying

sup E X n = B < 00 (22)


n~1

(23)

n > m 2:: 1, (24)


then
Sn - E Sn •. s. 0
------+ . (25)
n
PROOF. Let k n = [an], a > 1. Since all pairs of r.v.'s have non-positive covari-
ance, for all e > 0 and some C in (0, (0)

L ai/k;
00 00 00 k
2
e L P{ISk n - E SkJ > ek n} ~ L a~k /p;. ~ L
n=1 n=1 n n=1 j=1

whence the Borel-Cantelli lemma ensures


(s - kn E Sk n )/k,. ~ O.
Via monotonicity of Sk' for any k E [k n, k n+ 1),
Isk - E Sk I ~ ~
k ISk n +! - E S k n +! I+ ISk - n E Skn I+ E(Sk n +l- Skn ) ,
k kn kn + 1 kn kn
and so, since k;;1 E(Sk n +! - SkJ ~ k;;l(k n+ 1 - kn)B ~ (IX - I)B,

-I' I
Imk--+oo Sk - kE Sk I (-
a ~ - +0,
I)B -"h a.s.

yielding (25). D

Corollary 4. If Sn = Lj= 1 Xj' n 2:: 1, where {X n' n 2:: I} are non-negative, pair-
wise independent random variables such that

sup E X n < 00 (26)


n~1

00

L n- 2 E X;IIXnSnl < 00
n=1
(27)
132 5 Sums of Independent Random Variables

L P{Xn > n} < 00,


00

(28)
n=l

then

(29)

PROOF. The non-negative, pairwise independent, 2 2 r.v.'s Y,. = XnI[Xns;n),


n ~ 1 clearly satisfy (24) and also (22) and (23) in view of (26) and (27). Thus,
by Lemma 1,

-n1 ( i=l
Ln n
Y; - LEY;
i=l
)
~ o. (30)

Now, via (26) and (27),


1 1 B 1
o ~ -E
n XnI[x n n X n - E XnI[x n<nj)
>n) = -(E -
~-
n - -2
2
n E XnI[x n S;nj = 0(1)

implying as n -+ 00 that

-
1(
n
ES n -
n
LEY; =-
i=l ni=1
) 1L
EXJ[X,>i)-+O.
n
(31)

Since (28) ensures that {Xn' n ~ I} and {Y,., n ~ I} are equivalent sequences,
(30) holds with L~= 1 Y; replaced by Sn, and this together with (31) yields (29).
D

Theorem 5 (Etemadi). Let Sn = L~= 1 Xi' n ~ 1, where {X n, n ~ I} are pairwise


independent, identically distributed 2 1 random variables. Then Sn/n~ E X 1.
PROOF. Evidently, {X:, n ~ I} (likewise {X;, n ~ I}) satisfies (26), (28), and
also (27) since
n
L L L
00 00

n-2E(X:)2I[X~s;nj = n- 2 E(X+)2I u- 1 <x+s;j)


n=l n=1 j=1

L L n- 2E(X+)2Iu_ <x+
00 00

= 1 S;j)
j=1 n=j

=CEX+<oo,
whence Corollary 4 guarantees
1 n
Sn/n=-
n j= 1
L
(xt -Xi-)~EXt -EXt =EX 1 • D
5.2 Laws of Large Numbers 133

An extremely useful tool in probability theory is the Kolmogorov in-


equality, which, as will be seen in Section 7.4, holds in a much broader con-
text.

Theorem 6 (Kolmogorov inequality). If {Xj' 1 ~j ~ n} are independent 2 2


r.v.s with E X j = 0, Sj = L{ Xi' 1 ~j ~ n, then for e > 0

P {max ISjl 2::


Ish;;n
e} ~ e1 j=1t E XJ.
2 (32)

PROOF. Define T = smallest integer j in [1, n] for which ISjl 2:: e if such
exists and T = n + 1 otherwise. Then {T 2:: j} E a(X I' ... , Xj-I) for
1 ~ j ~ n + 1, where the a-algebra in question is {0, O} whenj = 1. Thus,
X j and IIT"j) are independent r.v.s for 1 ~ j ~ n. Moreover, since ST = Sj
on {T = j}, I ~ j ~ n, and s:..inIT.n) = LJ=
I XjIIT"j]'

pt~~:nISjl2:: e} = P{T ~ n} = JIP{T =j}

I~i
~ 2' L..-
e j= I [T=Jl
Sj2 = Ii
2'
e ITSn]
2
SminlT.n)

= ~ (±I P{T 2:: j}E XJ + Isi<jsn


e
2L E(XJ[T"Jl)' E Xj)
I n
~ 2'
e
LEX;'
I
0

The versatility of the Kolmogorov inequality will be exemplified in


extending Theorem 4 to the case of random indices.

Theorem 7. If {X n , n 2:: I} are i.i.d. r.v.s obeying


n P{IX" > n} = 0(1) (33)
and {T,., n 2:: 1} are positive integer-valued LV.S satisfying

T,. p
- --+ c, where 0 < C < 00, (34)
n

then, setting Sn = L1 Xi'


(STjT,.) - E XII[lX"sn) ~ O. (35)
134 5 Sums ofindependent Random Variables

PROOF. Define Xj = XjIIIXjl$n)' Sj = If= 1 X;, Mj = E Sj, and en =


[e· n] = largest integer ~ e . n. Then,

P{Sr. i= S~., T" ~ 2e ~ n} p{y [IXjl > n]}


2,.
~ L P {I X I > j n} ~ 2en P { I X 1 I > n} = 0(1)
j= 1

by (33), whence, taking cognizance of (34),


P{Sr. i= S~J ~ P{Sr. i= S~., Tn ~ 2en } + P{T" > 2en } = 0(1),
that is,
Sr. - S~. ~ O. (36)
Now the prior calculation of (17) shows that E X't 2
= o(n), whence,
defining

Hj = {lSj - Mjl > ne},

by Kolmogorov's inequality (Theorem 6)

Thus,
P{BrJ ~ P{T" ~ 2en , BrJ + P{T" > 2en }
~ P{Dn } + P{T" > 2en } = 0(1),
so that

S~. - M~ • .f. O.
n

Hence, by (36) and (34)

Sr. - M'r. = ~ (Sr. - M'r.) .f. O.


T" T" n

o
For any sequence of LV.S {X n , n ~ I} and nonnegative numbers u, v
define
Su.v = L
t sjsv
X 1ul + j, Sv = So,v, So = O. (37)
5.2 Laws of Large Numbers 135

The r.v.s Su.v are called delayed sums (of the sequence X n). If I,.<Xl= I P{ ISnl
> m:} < 00 for every I: > 0, then by the Borel-Cantelli lemma, Sn/n ~ O.
The converse is false even when the {X n } are i.i.d. since then by the classical
SLLN (Theorem 5.2.2, p = 1), Snln ~ 0 iff E XI = 0, whereas according
to a theorem of Hsu-Robbins-Erdos (Chapter to) I,.<Xl= I P{lSnl > nl:} < 00,
aliI: > 0 iff E X I = 0, E xi < 00. However, in the i.i.d. case Snln ~ 0 does
imply that I,.<Xl= 1 (l/n)P{ ISnl > nl:} < 00 for every I: > 0 by a result of
Spitzer, and the next theorem due to Baum-Katz extends this.
For any sequence of r.v.s {X n' n ~ I} define via (37).

S:.v = max ISu.jl, S~ = max ISjl, u ~ 0, v ~ O. (38)


1 sjsv I sjsv

Theorem 8 (Baum-Katz-Spitzer). Let {X n, n ~ I} be i.i.d. r.v.s with EIXIIP


< 00 for some p in (0, 2) and E X I = 0 if 1 ~ P < 2. If rxp ~ 1 (hence rx > t).
then for every I: > 0
<Xl
L nap - 2 P{S: > nal:} < 00. (39)
n=1
PROOF. By Theorem 5.2.2, Sn/n l / p ~ 0, implying S:/n l / p ~ 0 and hence

(40)

Suppose first that rxp = 1. Since {S!n.2n, n ~ I} are independent r.v.s, (40)
and the Borel-Cantelli theorem (Theorem 3.2.1) imply that for every I: > 0
and c = (log 2)-1
<Xl <Xl
00 > L P{S!n.2n ~ 2 l:} = L P{S!n ~ 2 l:}
an an
n=1 n=1

~ L<Xl P{S!, ~ 2a(l+ 1)I:}dt > C {<XlX-I P{S: ~ 2al:xa}dx,

and so for aliI:' > 0

Suppose next that rxp > 1. Since for m ~ 1

(m + l)a p/(a p- l ) > map/(ap-I) + ~ ml/(ap-I) > map/(ap-I) + ml/(ap-I)


- rxp - 1 - ,
the r.v.s
{S:~P/(~P - II. m'/(~p - I), m ~ I}
are independent. Moreover, (40) implies

.
m-a/(ap-I)S:~P/(~P _I) ml/(~p _ I) ~ m-a/(ap- I) S:~P/(~P- I).m~p/(~p-') ~ 0
136 5 Sums of Independent Random Variables

as m ..... 00, whence for all 13 > 0


co
00 > "P{S*
i.J > m«/(<<P-I)}
mCll:p/(ClP-l),m1/(Clp-l) _ e
m=l
co
= L P{S:I/(~P-l) ~ m«/(<<P-O e }
m=l

~ fico P{S~/(~P-I) ~ (t + l)«/(<<P-I)e}dt

~ (lXp - 1) fico X«p-2 P{S: ~ e'x«}dx


for 13' = 2«/(<<p-l)e and (39) follows. o
The converse to Theorem 8 appears in Section 6.4.

Corollary 5. If {Xn,n~ I} are i.i.d. r.v.s with EX I =0, EIXIIP< 00 for


some p in [1, 2), then
CO

L nP - 2 P{lSnl > ne} < 00, 13 > O. (41)


n=1

The convergence of series such as (39), (41) is enhanced when an explicit


bound C(e, p) is obtained for the sum since uniform integrability of stopping
or last times related to Sn may be deducible therefrom.

EXERCISES 5.2
I. Prove for i.i.d. LV.S (X.} with S. = D Xi that (S. - C.)fn ~O for some sequence
ofconstantsC.iffEIXII < 00.
2. If {X.} are i.i.d. with EIXIIP = 00 forsomepE(O, 00), then P{lim.IS.lln l / P = oo} = 1.
3. Demonstratefor i.i.d. r.v.s {X.} that E sup. I X .Inl < 00 iff EI X Illog+ IX II < 00.

4. If S. = D
Xi' where {X.} are i.i.d. ft'p r.v.s for some p;<: 1, then EIS'/nI P -+
P
IE X I . Hint: Recall Example 4.3.1.

5. If {X.} are i.i.d. LV.S with E XI = I and {a.} are bounded real numbers, then
(lIn) D a j -+ I iff (lIn) D= I ajX j ~ I.

6. Let {X.} be i.i.d. LV.S with (S.ln) - C• .!. 0 for some sequence of constants {C.}.
Prove that

i. J[~'<IXoI</I'IIXII = o(l)wheneverO < IX < P < 00,


ii. 2 JlIxoIs2.) XI - JIIX,+X,IS2.) (Xl + X 2 ) = 0(1).
7. Let {X, X., n ;<: I} be i.i.d. ft'p r.v.s for some p in (0, 2) where EX = 0 whenever
I :s; p < 2. If {a., n ;<: I} are positive constants satisfying a:/LJ=I at = O(lln),
IJ=I at -+ 00, then
5.2 Laws of Large Numbers 13'1

1 •
• )I/P ~ ajXj~O.
I at
( j=1
J=I

Conversely, ifa:/Il=1 at -
Cn, then (*)ensures X E 2'p(and also EX = Oifp = I).
Hint: if y,. = a.(IJ=1 af}-I/PX.· Inx.I".IIP], then EI y"1" :::; Cn-"/PEIXI"IIIXI""IPI
whence I~=I EI y"1" < 00 for (X> P and I~=I IE Y,.I < 00 for I < p < 2 as in
Theorem 5.1.3. N.B. Exercise 7 with p = I and Theorem 5.2.3(ii) are related to
summability.
8. For r > 0 and any LV. X, prove that X E .!l;. iff
00

I n,-I(logn)'P{IXI;;:: nlogn} < 00


n=l

Hint: Employ the techniques of Theorem 7.


9. (Klass-Teicher) If {X, X., n ;;:: I} are i.i.d. r.v.s and {b., n ;;:: I} are constants with
(i) b./n tor (ii) b./n L b.!n l12 -+ 00, I~ (bjlN = O(b;/n) and b. t, then

~
b.
fX
I
j - ~ E XII!XI:<b"I'!' 0 iffn P{lXI
b.
> b.} = 0(1).

10. Prove that if {X., n ;;:: I} are independent r.v.s with EX. = 0, EX; = a;, s; =
D a; -+ 00, then s;; l(log s;)-" D Xi ~ 0 for (X > t-
Il. (Feller-Chung) Let P(x) be nonnegative and nonincreasing on (0, 00) and suppose
that the positive, nondecreasing sequence {b., n ~ I} satisfies (*) lim.- 00 b.,lb. >
c > I for some integer r or a fortiori either (**) b.lnP t for some fJ > 0 or (***)
b; If=. bj- 2 = O(n). Then Loo= I P(xb.) either diverges for all x > 0 or converges
for all x > O. (Hint: Any integer m satisfies'" :::; m < ,M I, where k = k(m) -+ 00 as
m -+ 00, whenceb. m ~ ckb.,alliargen). Thus, if {X, X.} are i.i.d. and Loo=l P{IXI >
Cb.} = 00 for some C > 0 and {b.} as stipulated, 1IiTI._ 00 I X.llb. 'c 00 = 1IiTI._ oo
IS. lib•.
12. (Heyde-Rogozin) If {X, X., n;;:: 1} are i.i.d. with (*) lim x _ oo x 2 P{lXI > x}1
E X 2 I lIxl :<x) > (X> 0, then for every sequence {b.} satisfying 0 < b. too either
1IiTI._ooIS.llb. a.C. CD or (Ilb.XS. - L~ E X/IIXI';bJJ~O. In particular, for sym-
metric i.i.d. LV.S satisfying (*), Sjb. - + 0 or IIiTI S./b. a.c. w. Hint: P{I X I > x} :::;
(I + b 2(X-l)P{lXI > bx}, b > I, x > x o , whence the series of Exercise II either
converges for all C > 0 or diverges for all C > O.

13. (Komlos-Revesz) Let {X.,n ~ I} be independent r.v.s with means E X.and positive
variances a; c
satisfying lim EX. = and I~= 1 (j;; 2 = 00. Then

Ii=1 Xj/a} 'c


~. 2 -+c.
L..j=1 aj

Hint: Ii= I [(X j - E X)la} If= I a j- 2)] converges a.c.


14. (Heyde) Let {X., n ~ 1} be i.i.d. 2'p LV.S, S. = I~ Xi'S. = max1:<j:<. Sj' (i) If
I :::; p < 2, then n-1/P(S. - nil) ~ 0 or n-I/PS. ~ 0 according as Il = E X I ;;:: 0
or Il < O. (ii) If 0 < p < I, then n-1/PS. ~ O.
138 5 Sums of Independent Random Variables

15. (Derman-Robbins) {X.} i.i.d. with E X I nonexistent does not preclude P{S.Jn
--.. oo} = I. Hint: Let Iim x _..,x· PIX > x} > 0 and E(X-)P < 00 for I> f3 >
:x > O. The latter implies n- IIP I7 X i- ~ 0, while the former entails P{I7 xt =:;
Cn 1IP } =:; P{max 1 "i". xt =:; Cn l/P } =:; exp( -en Y ), y = I - rxfJ-l > 0, C > O.

16. (Klass) Let {X.} be i.i.d. with E XI = 0 and I:'=I P{IX.I > b.} < 00, where
b. > 0, n- b; L n-Ib; i
2
00.

Then EIS.I = o(b.). Hint: X j = y".j + Z•. j, where y".j = X/UXj!';b n ]' In particular,
X. i.i.d. with E X I = 0, EI X liP < 0:, P E [1,2), implies EIS.I = o(n 1IP ).
17. Strong Laws for Arrays. Let {X. i , 1 =:;; i =:;; n} bean arrayofr.v.'s that are identically
distributed and rowwise independent, i.e., {Xnl , ... , X •• } are independent for n ;;::: 2.
If Elxlllq<oo, 0<q<2, and EXll=O whenever l=:;;q<2, then
n- 2 /q I7=1 X'i~ O. (For an extension to 2 =:; q < 4, see Example 10.4.1.)

5.3 Stopping Times, Copies of Stopping


Times, Wald's Equation

Let (n,!IF, P) be a probability space and {!lF n , n ~ I} an increasing sequence


of sub-u-algebras of !IF, that is,!IF 1 c !IF 2 C . . . c !IF. A measurable function
T = T(w) taking values 1,2, ... , 00 is called a stopping time relative to {!lF n }
or simply an {~lI}-time if {T = j} E~, j = 1,2 .... Clearly, If T m, then =
T is an !F.-time. If T is an !F.-time, then setting !F 0 = {0, O}, !F 00 =
u(Uf !F.),
.- 1
{T ;;::: n} = Q - U {T = j} E !IF.- 1, I ~ n ~ 00. (I)
1

Moreover, since {T = oo} = n - {T < oo}, a stopping time is completely


determined by the sets {T = n}, I ~ n < 00.
A stopping time T is said to be finite if P{T = oo} = 0 and defective if
P {T = oo} > O. A finite stopping time is also called a stopping rule or
stopping variable. When!IF n = u(X 1 , ••• , X n ), n ~ I, for some sequence of
LV.S {X n }, an !lFn-time will generally be alluded to as an {Xn}-time or a
stopping time relative to {X n }. Stopping times and rules have already
appeared incognito in Theorem 5.2.5, Lemmas 5.1.3, 5.1.4, Theorem 3.4.1,
and Lemma 3.3.5.
The notion of a stopping time derives from gambling. Its definition is
simply the mathematical formulation of the fact that, since an honest gambler
cannot peer into the future, his decision to stop gambling at any time n must
be based solely upon the outcomes Xl' ... , X. up to that time and not on
subsequent outcomes Xj,j > n.
Let {X n , n ~ l} constitute a sequence of LV.S and {!IF., n ~ I} an in-
creasing sequence of sub-u-algebras of !IF such that X nis !IF n-measurable for
each n ~ I. Then {Xn , !IF n' n ~ I} will be called a stochastic sequence.
5.3 Stopping Times, Copies of Stopping Times, Wald's Equation 139

Clearly, for any sequence of r.v.s {X n}, {X n, u(X t, ... , X n), n;::: I} is a
stochastic sequence.
For any stochastic sequence {X n , fF n , n ;::: I} and fFn-timc T, define
X T = XT(W)(w), where X",(w) = rrm Xn(w), (2)

(3)
fF '" = u(9 fFn).

It is easy to verify that (i) fF T is a sub-u-algebra of fF "" (ii) X T and Tare


fFT-measurable, and (iii) fFT coincides with fFmwhen T == m.
Since
P{IXTI = oo} = P{T = 00, IX", I = oo},
X T is a r. v. if either T is finite or rrm X n is finite, a.c.
For {fFn}-times T t and T2 with T1 ~ T2 ,
(4)

since if A e fF T, then A{T2 ~ n} = A{T1 ~ n} . {T2 ~ n} e fF n, 1 ~ n < 00.


If T is an {fFn}-time, so is T + m for every integer m ;::: 1, and since T ~
T + m, necessarily fF T C fF T+ 1 C .•. c fF "'. Hence, fF '" = u<Uf fF n)
c u(Uf fF T+n) C fF "" that is, u(Uf fF T+n) = fF "'. Consequently, the
sequence t:§n = fF T+n' n ;::: 1, may be utilized to define a new stopping time.
Note, incidentally, that fFT+n = ~+m when T == m.
Suppose that T1 is an {fF n' n ;::: 1}-time and T2 is a {t:§ n' n ;::: 1}-time,
where t:§n = fFT,+n' n ;::: 1. Then T = T1 + T2 is an {fF n, n ~ 1}-time and
fF T = t:§ T2' Since for 1 ~ m < 00
m
{T = m} = U {T1 =j, T2 = m -j}efFm ,
j= 1

T is an {fF n, n ~ 1}-time. To prove fF T = t:§ T2' let A e t:§ T2' Then for m =
1, 2, ... and j = 1, ... , m - 1,

A {T2 = m - j} e t:§ m - j = fF T , + m - j'

A {T1 = j, T2 = m - j} = A {T2 = m - j}{ T1 +m- j = m} e fF m'


m-l
A{T = m} = U A{T1 =j, T2 =m- j}efFm,
j= 1

which implies that AefFT. Conversely, let AefF T. Then for r = 1,2, ... ,
and m = 0, I, ... ,

A{T = r + m} efFr+m,
A{T2 = m}· {T1 + m = r + m} = A{T = r + m}{Tl = r} e fF,+m,
A{T2 = m} e fF T, +m = t:§m,

which implies that A e t:§ T2'


140 5 Sums of Independent Random Variables

On the other hand, if T I, Tz are {~., n ~ l}-times with TI < Tz and

r = {Tz - T I ~f TI < 00 (5)


00 If TI = 00,

then Tz = T I + rand r is a {~., n ~ 1}-time, where~. = ~Tl + •. It suffices


to note that when 1 ~ n < 00
{r = n}{TI + n =j + n} = {TI =j}{Tz =j + n}E~j+.' 1 ~j < 00,

implies {-r = n} E ~ T, +. = ~., n ~ 1.

Lemma 1. 1fT is an {X.}-timefor some sequence {X.} ofr.v.s, there exists a


sequence {C.} ofdisjoint Borel cylinder sets of (ROO, ~oo) whose corresponding
bases B. are n-dimensional Borel sets, n ~ 1, such that
{w: T = n} = {w: (X 1' ... ' X., .. .)E Cn }, n = 1,2 .... (6)

Conversely, given any sequence {C n, n ~ I} of disjoint Borel cylinder sets


with n-dimensional Borel bases, an {Xn}-time T is defined by (6) and {T = oo}
= n - Uf {T= n}.
PROOF. If T is an {Xn}-time, then {T = n} E a(X I' ... ' X n), n ~ 1, whence by
Theorem 1.4.4 there exists an n-dimensional Borel set Bn for which
{T = n} = {(X 1, ... , X n) E Bn }.
For each n ~ I, let C~ be the cylinder set in Roo with base Bn. Then
n ~ l.
t
Moreover, C n = C~ - Uj= Cj, n ~ I, are disjoint Borel cylinder sets with
n-dimensional Borel bases. Since {T = m} . {T = n} = 0 for m #- n, (6)
follows.
Conversely, given a sequence of disjoint cylinders C. E fJlOO with n-
dimensional Borel bases B., n ~ I, if T is defined as stipulated, then
{T = n} = {(XI, ... ,Xn)EB.}Ea(XI, ... ,Xn), n ~ 1,
so that T is an {X.}-time. o
Lemma 2. If {X n, n ~ l} are i.i.d. r.v.s and T is a finite {~n}-time where
!Ii'" and a(Xj, j > n) are independent, n ~ I, then ~T and a(XT +I , X T +Z ' ••• ) are
independent and {XT+.' n ~ I} are i.i.d. with the same distribution as X I.
PROOF. If AI' ... , An are real numbers and A E ~ T'

P{A ·l\ [X T + i < AJ} = JI P{A. [T = niOI [X j +i < AJ}


= j~1 P{A· [T = n}· P{OI [X j +i < AJ}
5.3 Stopping Times, Copies of Stopping Times, Wald's Equation 141

since A· [T = j]E~. Hence,


n
P{ A· DI [X T+i < AJ =
}
JI
<Xl

P{A· [T
n
= j]}iG P{X j +i < AJ
<Xl n

= I
j= I
P{A . [T = j]} TI P{X i < AJ
i= I

= P{A} TI P{X
i= I
i < AJ. (7)

Hence, taking A = n,
n

P {X T + i < Ai' 1 :s; i :s; n} = TI P {X


i= I
i < Ai}

and, in particular, P{X T+j < Aj } = P{X j < Aj } = P{X I < Aj }, 1 :S;j:S; n.
Thus, since n is an arbitrary positive integer, {X T + n' n 2:: I} are i.i.d. with the
same distribution as X I' Consequently, from (7)

P{A. iQI[XT+i < AJ} = P{A}tl P{XT+i < AJ,


and therefore, in view of the arbitrariness of AI' ... , An and n, the classes
ff'T and a(X T+I , X T+2, ... ) are independent. 0

Corollary 1. a(T) and a(X T+ I' X T+2"") are independent.


PROOF. It suffices to recalI that T is ff'T- measurable. o
Next, let {X n , !l 2:: I} be i.i.d. LV.S and T a finite {Xn}-time. Then, by
Lemma 1 there exist disjoint cylinder sets {C n , n 2:: I} in f!4<Xl with n-dimen-
sional Borel bases such that
{T = n} = {(X I ,X 2 , ... )ECn }, 1:s; n < 00.

Define T(I) = TI = T and T(j+ I),j 2:: 1, via ~ = L{ Tli), by


{T(j+I) = n} = {(XTj+I,XTj+2, ...)ECn}, 1:s; n < 00.

Then, as noted earlier for j = 1, T(j + I) is a finite {X T J + n' n 2:: 1}-time, j 2:: 1.
The stopping variables {T(j+ I), j 2:: I} or {TW, j 2:: l} will be called copies
of T. Moreover, as folIows from earlier discussion, Tm = Lj= I TU) is a
(finite) {Xn}-time and ff'T~ c ff'T~+I' m 2:: 1.
In a sequence of Bernoulli trials with parameter p = t, i.e., Sn = Xi' I1
where {X n} are i.i.d. r.v.s with P{X j = ± I} = t, let T = TI = T(I) =
inf{n 2:: I:S n = O}and T(j+I) = inf{n 2:: I:S Tj +n = O}.Then ~ = If=1 Tli),
j 2:: I, are the return times to the origin and T(j) is the time between the
(j - l)st and jth return. If, rather, T = inf{n 2:: 1: Sn = l}, then ~ is the
first passage time through the barrier at j, whereas TW is the amount of time
required to pass from j - 1 to j. Either choice of T yields a finite {X n}-time
with infinite expectation (Exercises 3.4.1, 3.4.2).
142 5 Sums of Independent Random Variables

Lemma 3. Let {X., n ~ I} be i.i.d. r.v.s and T a finite {X.}-time. If To = 0,


T(l) = T, and {T(j),j > 1} are copies ofT, then, setting Tm = D=I T(j), the
random vectors
m ~ I,
are i.i.d.
PROOF. As already noted, Tm is a finite {X.}- time. Moreover, it is easy to see
that Vm and hence also (VI' ... , Vm ) is :FT... -measurable. By Lemma 2,
a(X T... + I , X Tm + 2 , .•.) is independent of :F T... , m ~ I, and, since T(m+l) and
(X T + 1 , .•• , XT... +J are a(X T... + I , X T... + 2 , ••• )-measurable, a(Vm + l ) and
:F T are independent for m ~ I. Thus Vm + 1 is independent of (VI' ... , Vm ),
m ~ I, which is tantamount to independence of {Vm, m ~ 1}.
Furthermore, for all real Ai' I ~ i ~ n, and m, n ~ I, if {C n , n ~ I} are
the Borel cylinder sets of (6) defining T(Il,
qm = p{T(m) = n,X T _1+ I < AI>X T ... _ + 2 < A2 , •. ·,X T ... < ATI ... )}
1

= p{T(m) = n,X T _
1
+1 < AI, ... ,X Tm _ +. < A.} I

= P{(X T _1 + I> X T ... _ I + 2, , X T... _I +., ...) E C.,


XT _1+ 1 < A1, ... ,X T _ +. < A.}. 1

= P{(X., X 2 , )EC., XI < AI> ... , X. < An} =


... ql

by Lemma 2, since Tm is a finite {X.}-time. Thus, {Vm, m ~ 1} are i.i.d.


random vectors. 0

Corollary 2. If T is a finite {X.}-time, where {X., n ~ l} are i.i.d. random


variables, then the copies {T(·), n ~ l} of Tare i.i.d. random variables.

If {X., n ~ 1} are r.v.s and T is an {X.}-time, then the expectation of


IXTI is

EIXTI = r
J\T<",j
IXTI + r
J{T="'}
ITmi X.I

~
j=. J{T=j}
r IXjl + r
J{T="'1
Ilim X.I. (8)

Analogously, if the expectation of X T exists,

EXT = E(X T)+ - E(X T )- = ~


j=1
i [T=j)
Xj + i[T=oo]
Tmi X.. (9)

As customary, X T is called integrable iLl EXT I < 00. If a stopping time T is


integrable, then necessarily P {T ~ n} = o(n - I) and a fortiori T is a finite
stopping time or stopping variable.
Consider a sum of a random number of LV.S ST = Lf Xi' where-T is an
{X., n ~ I}-time. If T is integrable and the {X n } are i.i.d. 2 1 LV.S, the ex-
pectation of ST has the natural form (10).
5.3 Stopping Times, Copies of Stopping Times, Wald's Equation 143

Theorem 1 (Wald's Equation). Let {X n, n 2: 1} be i.i.d. LV.S on (0, IF, P),


Sn = L~ Xi' n 2: I, and let {IF n, n 2: l} constitute an increasing sequence of
sub-u-algebras of IF with (i) IFnand u(Xn + 1) independent, n 2: 1. If E Xl exists
and T is an {IFn}-time with E T < 00, then

EST = EX 1 • E T. (10)
PROOF. Suppose first that E I X I I < 00. Then by Theorem 4.3.3
00 00 00

E L IX nl/[T2:n) = L EIX nl/[T2:n) = L EIXnIP{T 2: n} = E T· EIX II,


I I I
whence by the dominated convergence theorem
00 00

EST = E L X n · I[T2:n) = L P{T 2: n}· E X n = EX l ' E T.


1 1

If, rather, E X I = 00, then E Xl < 00, EX: = 00, whence by what has
just been proved
00

E Si ~ E L X n- I[T2:n) = E T· E Xl < 00.


t

Therefore, EST exists and


00 00 00

EST = E L (X: - X n-)I[T2:n)= E L X n+ IIT2:n) - E L X; I[T2:n)


I 1 1

=ET·EX: -ET·EX , =ET·EX t •


The proof for E X I = - 00 is analogous. o
Corollary 3. If {X n} are i.i.d. Lv.sfor which E XI exists and T is an {X n}-
time with E T < 00, then, setting Sn = L~ Xi' (10) obtains.

The next result is a refinement of Theorem 1, with (i) due to Robbins-


Samuel.

Theorem 2. Let {X n} be i.i.d. r.v.s, let Sn = D Xi' and let T be a finite {X n}-
time for which E ST exists. (i) IfE X I exists and either E X I#-O or E T < 00,
then (10) holds. (ii) IfP{ IX II> n} = o(n- I) and E T < 00, then

EST rEI
ET = X I lIX Ii sn)'
1m (II)
n-oo

and when Sr is integrable,


Sn pEST
--4-- (12)
nET'
PROOF. Let T(j), j 2: 1, be the copies of T and set To = 0, Tm = TW. L':'
Then according to Corollary 2 and Lemma 3, T(j),j 2: 1, are i.i.d. and so are
144 5 Sums ofindependent Random Variables

SW = X Tj_ I + I + ... + X T for j ;::: 1. By Corollary 5.2.2


j

S(l) + ... + stm)


ST Tm~ET.
m
------~ES
m T, (13)
m m
If E X I exists, then by this same corollary and exercise 3.3.10
ST
-2!'~E XI'
Tm
which, in conjunction with (13), yields (10). On the other hand, if P { I X I I > n}
= o(n - I), then by Theorem 5.2.6
ST p
-'-+0,
-2!'-C
T m
(14)
m

where Cm = EX IIIIXd"m)' Hence, any subsequence {n"} of the posItive


integers has itself a subsequence, say {n'}, for which by (14) and Lemma 3.3.2

ST•. _ Cn' ~ O. (15)


7;,.
Then, via (15) and (13), recalling that E T < 00,

en' = (en. _ ST•. ) + ST•.. !!.-..!£., EST.


1'". n' Tn' ET
Thus, every subsequence of {cn} has itself a subsequence whose limit is
independent of the subsequences, yielding (11). Finally, (12) follows from
(11) and Theorem 5.2.4. 0

Corollary 4. Let {X n, n ;::: I} be i.i.d. r.v.s, let Sn = L~ Xi and let T be an


integrable {X n}-time. If n P[ IX II > n] = 0(1) and E X I Illx d "nl has no limit
as n -+ 00, then E ST does not exist.

The next theorem is the second moment analogue of Wald's equation.

Theorem 3. If {Xn} are independent r.v.s with E X n = 0, E X; = u 2 < 00, Sn =


L~ Xi' n ;::: 1 and T is an {.~.}-time with E T < 00 where (i) fF" ;:) u(X I ... X n)
and (ii) fF" and u(Xn+d are independent, n ;::: 1, then
E S? = u 2 E T. (16)

*
PROOF. If T(n) = min[T, n], then T(n) is a finite stopping time and

ES?(n) E( XjIIT~j))
= 2

ECt XjI[T~j)r + EX;I[T~n) + 2 EXnI[T~n)nfXjI[T~j)'


l

=
5.3 Stopping Times, Copies of Stopping Times, Wald's Equation 145

fi'n-t,
n- I n- I

E Xnl1T"nl L Xjl1T"jl = E XnE L XjI1T"jI11T"nl = 0,


1 1

whence

E(~ Xil T"J1Y - ECtlXjIlnJlr = E X;lIT"nl = (12 PET ~ n],


and summing,
n
ES}(n) = E ( ~XjllT"Jl
)2 = (12j~1P{T ~j} = (12 E T(n).
n

Since T(n) t T,
lim E S}(n) = (12 E T < 00. (17)
n~'"

Moreover, in completely analogous fashion,

E(ST(n) - ST(m»2 = E( ±
m+1
Xjllnjl)2 = (12[E T(n) - E T(m)]

= 0(1) as n > m - 00.

Thus, ST(n) -!4 S, and so by Corollary 4.2.5 and (17), E S2 = (12 E T. It


remains to identify S with ST, and this follows from the existence of a sub-
sequence of ST(n) converging a.c. to Sand T(n) ~ T. 0

Corollary 5. If {X n } are i.i.d. r.v.s with E X I = 0, E Xi = (12 < 00 and T is


an {Xn}-time with E T < 00, then (16) holds.

EXAMPLE 1. Let {X n , n ~ I} be independent r.v.s with sUPn" I EIXnl' s;


M < 00. where either r = 2, E X n = 0, or 0< r S; l. If {an' n ~ I} are
positive constants with an 1. An = LJ=I aj and Sn = L'i Xi. then for any finite
{Xn}-time T

E aTISTl' S; MEAT (18)


and, moreover, for any ex in (0, 1)
(19)
PROOF.
as r
If T is a bounded stopping time, So
= 2 or not, then via independence
= 0, and {)" 2 = 2 or °according

E aTISTl' = JI LT=jlajlSjl'
146 5 Sums ofIndependent Random Variables

00 j

= M L L a" P {T =
j= ,,=
1 1
j} = MEAT·

Hence, for any finite stopping time T, (18) holds for T(n) = min[T, n],
yielding (18) as n -+ 00 by Fatou's lemma and monotone convergence. To
prove (19), note that by Holder's inequality

EIS Ir«
T
= E T«(l-«)
ISTlr« . T«(l-«) <
-
(E ISTI')«(E P)l-«
T1-« '

and so, employing (18) with all = l/n 1-«, All ~ n«/a., (19) follows. 0

As an application of Example 1, if {XII' n ~ 1} are independent f.V.s that


obey the Central Limit Theorem with E XII = 0, EX; = 1, SII = Ii Xi' n ~ 1
and

(20)

then E T~ = 00 for c 2 > 1/a., 0 < a. < I, and E U~ = 00 for m > m«, all
a. > O. The latter statement follows from the former which, in turn, results

Gr
from (19) via
2
c «E T~ ~ EISTf« ~ E T~.
The same conclusion holds if {XII' n ~ I} are independent r.v.s with mean
zero and variance one which obey the central limit theorem (Chapter 9),
since r; is a finite stopping time (Exercise 3.2.12).

Lemma 4. Let {XII' n ~ I} be i.i.d. random variables and T = inf {n ~ 1:


X" E B}, where B is a linear Borel set such that 0 < P{X 1 E B} < 1. If To = 0,
T,. = IJ=l TW, n ~ 1, where {TW,j ~ I} are copies of T, then setting
n ~ 1, (21)

{ Y1 , Z l' Y2 , Z 2' ... } is a sequence of independent variables.


PROOF. If n = 1, Y1 and Zl are independent according to exercise 5.3.12. For
n ~ 2, via Lemma 3,
5.3 Stopping Times, Copies of Stopping Times, Wald's Equation 147

since the joint distributions of (Y", Zn) and (Yl' Z d are identical. The conclu-
sion then follows by induction. 0

If {X, X n' n 2:: I} are i.i.d. random variables with E X = 0, E X 2 < 00, then
limn .... 00 Sn/n 1/2 = 00, a.c., as noted in exercise 3.2.12. This remains true even
when E X 2 = 00, according to

Theorem 4 (Stone). If Sn = Ij= 1Xj' n 2:: 1, where {Xn' n 2:: 1} are i.i.d. random
variables with E Xl = 0, E IX11 > 0, then
lim Sn/n 1/2 = 00 = - lim Sn/n 1/2 , a.s. (22)
n.... oo

PROOF. It suffices to prove the first half of (22). To this end, choose finite
°
constants a, b such that a < < band P{a < Xl < O}P{O < Xl < b} > 0.
Then T = inf{n 2:: 1: a < X n < b} is an {Xn}-time with finite expectation.
Since, as noted earlier, the theorem holds when E xi < 00, it may be supposed
that E Xf = 00, whence 0< P{a < Xl < b} < 1. Set To = 0, T" = Lj= 1 TW,
n 2:: 1, where {TW,j 2:: 1} are copies of T and define
(23)

According to Lemma 3, { ~,j 2:: 1} and Z j' j 2:: 1} are each i.i.d. sequences and,
moreover, via Lemma 4, {~,j 2:: 1} is independent of {Zj,j 2:: I}. By Wald's
equation, E Y1 + E Zl = E If X j = 0, whence
n n n
ST n = I (~+ Zj) = I (~- E ~) + I (Zj - E Zj)' (24)
j=l j=l j=l

By example 5.2.2, {Zj - E Z j' j 2:: 1} is a recurrent sequence, and so


P{IIj=l (Zj - E Zj)1 < M, i.o.} = 1 for some M in (0,00). Define 'tift =
inf{ n 2:: m: IIi (Zj - E Zj)1 < M}, whence 'tift < 00, a.s. and 'tift i 00 as m -+ 00.
Now, (J2 = E(Y1 - E y 1 )2 E (0,00) whence

M + ST tm
2:: ~ (~ -
jf-1
E ~) --[1 (J't:.!2
~ (~ -
jf-1
E~) ] 1/2
(J'tlft (25)
148 5 Sums of Independent Random Variables

via (24), and since r m is independent of {lj,j ~ I},

p{-k
arm
.I (lj -
)=1
E lj) > x}
= f p{ak11 /
k=m
2 .t
)=1
(lj - E lj) > x}p{r m = k}~ 1 - $(x)

for all x via the Central Limit Theorem for i.i.d. random variables (Corollary
9.1.2).
Consequently, for any x > 0, as n ~ 00,

P {ST tm > ~ ar~2 } ~ 1 - $(x) + 0(1). (26)

Now, T.jr m = : Ii':: 1 TUl ~ E T < 00, implying via (26) that, for x > 0,
m
as m --+ 00,

and so

p{s T tm
/T.t 1 / 2 > xa
rn 3
E- 1 / 2 T. i.o.}
'

= lim
m-oo
p{.O [ST /Tr~/2 > xa3 E- T]}
)=m
t
j
12
/

~ 1 - $(x)

whence for all x > °


Qx = P {!~~ S./n 1/ 2 ~ x; E -1/2 T } ~ 1- $(x).

Thus, Qx = 1 by the Kolmogorov zero-one law and (22) follows as x --+ 00.
D

Lemma 5. Let So = 0, S. = I7= 1 Xi'S. = maXO~j~.Sj' n ~ 1, where {X, X.,


n ~ I} are i.i.d. random variables with E X = 0, E IX I > 0. Then, for x > and
any integer k ~ 1,
°
P{S. ~ 2kx} ~ nP{X > x} + Pk{S. ~ x}.
PROOF. IfX o = O,X. = maxO~j~.Xj,andAk = {So ~ 2kx, X. ~ x},itsuffices
to prove that P{A k} ~ Pk{S. ~ x}. To this end, define To = 0, 1; = L~= 1 T(h l ,
1 ~ i ~ k, where
T(i) = inf{j ~ 1: ST i _ 1 +j - ST'_I ~ x}, 1 ~ i ~ k.
5.3 Stopping Times, Copies of Stopping Times, Wald's Equation 149

By Corollary 5.4.2, T1 is a finite stopping time and, by Corollary 5.3.2, {T(il ,


1 ~ i ~ k} are i.i.d. Clearly, T 1 ~ n on the set Al since otherwise Al . {S. < x}
=I 0, a contradiction. For k ~ 2, suppose, inductively, that 1; ~ n on Ai for
1 ~ i ~ m < k. Then, on Am + l '
m+l m
ST m
+ -1
1
~ L
i=1
max (ST i _ 1 + j
l,;j<T(i)
- ST i _ ) + L X TV) < 2(m + l)x,
i=1
implying that n > Tm + 1 - 1, that is, Tm + 1 ~ non A m + l' Thus, for m = k - 1,

P{A k } ~ P{tl T(il ~ n} = Pk{S. ~ x}. D

EXERCISES 5.3
1. Verify that if T is an {~}-time, X Tm +j is §Tm'j-measurable,j ~ 1, m ~ 1.
2. If T I and T2 are ~-times, so are T I T2, max(TI , T2), min(Tl , T2), and kTl , where k
is a positive integer.
3. If T is an integrable {X.}-time, where {X.} are independent LV.S with E X. = 0,
EIX.I ::; C < 00, n ~ 1, then Wald's equation holds.
4. If (Xi' ~), i ~ 1, are i.i.d. 2 2 random vectors with E Xl =E Yl = 0, and ~ =
a(X I , YI ,···, X n, Y,,), S. = I~ Xi' Un = I~ ~, then for any integrable ~-time T,
the identity E STUT = E T· EX l YI holds.
5. Let S. = I7=l Xi where {X., n ~ 1} are i.i.d. r.v.s with E XI = Jl > 0 and N = Np
an {X. }-time (or a LV. independent of {X., n ~ I}) having the geometric distribu-
tion. Then Iimp~o P{SN/E SN < x} = 1 - e-X, x> O.
6. Show that the condition E T < 00 cannot be dropped in Corollary 3. Hint: Con-
sider P{X. = 1} = P{X. = -I} = tand T= inf{n ~ 1: S. > O}.
7. If {X., n ~ I} are independent random variables with E X. = 0, E X; = 1 and T*
(resp. T*) = inf{n ~ 1: IS.I > (resp. <)en l /2 }, where inf 0 = 00, then E T* = 00,
e ~ 1 and E T* = 00, e ::; 1.
8. If {X., n ~ I} are independent LV.S with EX. = 0, E X; = a;, n ~ I and T is an
§ .-time for which E Ii aJ < 00, where (i) §. ~ a(X I' ... , X.), (ii) § nand
a(X.+ I) are independent, n ~ 1, then E(Ii X)2 = E Ii aJ.

9. (Yadrenko) If {X., n ~ I} are i.i.d. LV.S uniformly distributed on [0, I] and T =


inf{n> I: S. ~ I}, where S. = D
Xj, prove that E T = e = 2 EST'

10. Utilize the method of Example 1 to give an alternative proof of Theorem 3.


11. Let {X., n ~ I} be i.i.d. with P{X I > M} > 0, all M in (0, (0). If 1; = inf{n ~ 1:
X. ~ e}, where lei < 00, prove that E T';' < 00, m ~ I, and E X Tc ::; E X Td for
e < d and E X Tc = P-I{X ~ e} SIX2:C) X.
12. If {X, X., n ~ I} are i.i.d., B is a linear Borel set with P{X E B} > 0, and T =
inf{n ~ I: X. E B}, then X T and Ii-I Xi are independent LV.S. Moreover, if
EIXI < 00, then E X T = E Xl IXeB )' E T.
150 5 Sums of Independent Random Variables

13. Show that lim sup.~oo n -1/2IIi'= I Xii = 00, a.c. for any sequence of Li.d. random
variables {X., n ~ I} with EIXti > O.
14. Let S. = Ii'=1 Xi' n ~ 1, where {X, X., n ~ I} are i.i.d. random variables with
E X = 0, E IX I > o. For x > 0, p > 0, and k = 1, 2, ... , show that

qk == p{ max Sj//
I';j';.
~ 2kx}:S; p{ m~x X)/ ~ x} +
I';j';'
pk{ m~x
I';j';'
Sj// ~ x},
and if 1 :s; r :s; 2 and P< l/r, there exists a universal constant C',I1 such that

qk:S; p{ max X)/


I ' ; j ,;.
~ x} + [C',1l(2x)-'n l
-,flEIXI'r.

Hint: This generalizes Lemma 5.3.5.

5.4 Chung-Fuchs Theorem, Elementary


Renewal Theorem, Optimal Stopping
An instance when the generality of Theorem 5.3.1 (as opposed to the specif-
icity of its corollary) is desirable is furnished by

Theorem 1. Let {X.} be i.i.d. r.v.s with EIXII > 0, S. = D Xi' and define
T+ = inf{n ~ 1: Sn ~ O}, T~ = inf{n ~ 1: S. > c > O}. (1)
Then:
i. P{T+ < oo} = 1 iff rrm S. ~. 00, in which case E ST + > 0;
ii. E T+ < 00 iff E T~ < 00 for all c in (0, (0).

PROOF. If under the hypothesis of (i), T+ is a finite {Xn}-time, then ST + is a


LV. and EST + ~ E X: ~ O. Let TUl,j ~ 1, be copies of T+ and set To = 0,
T" = L1
TUl,n ~ 1. By Lemma 5.3.3,ther.v.sstm ) = X Tm _I + 1 + ... + X Tm ,
m ~ 1, are i.i.d., whence by Corollary 5.2.2
S(l)
+ ... + st.) ---+
a.c. E S E X +
T+~ t·
n

Hence, rrm
s. ~ rrm
n EXt ~ 0, a.c., and, moreover, s. = + 00, a.c., rrm
since EXt > 0 in view of the fact that the only permissible alternative,
namely, E Xl < 0, would imply (same corollary) that S.ln ~ E Xl < 0,
contradicting rrm
S. ~ 0, a.c. The converse is trivial.
Apropos of (ii), since via (i)

S(l) + ... + st.)


- - - - - - ~ EST > 0,
n +
5.4 Chung-Fuchs Theorem, Elementary Renewal Theorem, Optimal Stopping 151

for any c > 0 there exists an integer k such that p{I~ SUI > c} ~ t. Setting
Zn = I(':- t)k+ t SUI, the r.v.s {Zn, n ~ l} are i.i.d. by Lemma 5.3.3 and
clearly Zn ~ 0, a.c., with P{ZI > c} ~ t. Define r = inf{n ~ 1: Zn > c}.
Then
P{r ~ n} = P{ZI ~ c"",Zn_1 ~ c} = pn-I{ZI ~ c},
whence
00 00 I
Er= IP{r~n}= Ipn-I{ZI~C}=p{Z }~2.
n=1 n=1 I>C
Moreover, since
t kt kt Tj Tk'
C < I Zj =
1
L SU) = j=1
1
I I
Tj_I+1
Xi = I Xj =
1
STk<'

necessarily T~ ~ 1;." whence


kt
E T~ ~ E 1;.t = E I TU) = E(kr)· E T+ < 00
1

by Wald's equation since kr is a ~n-time, where ~n = o(TO), ... , TIn!,


SOl, ... , SIn), i.e., {kr = kn} E ~nk and o(Tln+ ll) is independent of ~n (recall
Lemma 5.3.3). Again, the converse is trivial since T+ ~ T~. 0

The stopping rule T+ and its partial analogue T_ , defined in (2) below, are
interconnected as in

Theorem 2. Let {X n} be i.i.d. LV.S, Sn = Ii Xi' and define


T+ = inf{n ~ 1: Sn ~ O}, L = inf{n ~ 1:Sn ~ O},
(2)
T+ = inf{n ~ I: Sn > O}, T_ = inf{n ~ 1: Sn < O}.
Then, (i)
1
E T+ = P{T_ = oo}' E T+ = P{L = oo}' (3)

P{L = oo} = (l -~)P{T_ = oo}, (4)


where

L P{S 1 > 0, ... , Sn- 1 > 0, Sn = O} =


00

~ = P{ST _ = 0, T_ < oo}. (5)


I

Moreover, ifEIX11 > 0,


ii. ~ < 1,
iii. L is defective iff Sn ~ 00.
152 5 Sums of Independent Random Variables

PROOF. (i) For'k > 1 and 1 ~ n ~ k, define


A~ = {SI ~ SZ, .. ·,SI ~ Sd,
A~ = {SI > Sn, ... , Sn-I > Sn ~ Sn+ I' ... , Sn ~ Sd·
Then, if k ~ n ~ I,
P {A ~} = P {X n < 0, ... , X 2 + ... + X n < O}
. P {X n + 1 ~ 0, ... , X n + 1 + ... + X k ~ O}

= P{T+ ~ n}P{T'_ > k - n},


and so when k ~ I
k
I = L P{T+ ~ n}P{T'_ > k - n}, (6)
n=1

yielding
1 ~ P{T'_ = oo}E T+, (7)
whence P{ T'_ = oo} > 0 implies E T+ < 00. Conversely, if E T+ < 00,
via (6)
j k
1~ L P{T+ ~ n}P{T'_ > k - j} + L P{T+ ~ n};
n= 1 n=j

letting k -+ 00,
j 00

1~ L P{T+ ~ n}P{T'_ = oo} + L P{T+ ~ n},


n=1 n=j

and then letting j -+ 00,

1 ~ P{T_ = oo}E T+,

implying P {T_ = oo} > O. Consequently, P {T_ = oo} E T+ = I if either


P{T'_ = oo} > 0 or E T+ < 00, and so, without qualification,

1
E T+ = P{T'_ = oo}

Similarly,

establishing (3). Next,

P{T'_ = oo} - P{L = oo} = P{T'_ = 00, L < oo}


00

= L P{L = n, T'_ = oo}


n=1
5.4 Chung-Fuchs Theorem, Elementary Renewal Theorem, Optimal Stopping 153

00

L P{St > 0, ... , S._ t > 0, S. = 0, T'_ = oo}


.=1
00

=L P {S, > 0, ... , S. _, > 0, S. = 0, S. + 1 ~ 0, S. + 2 ~ 0, ...}


.=J
00

= LP{SI >O"",S._I >O,S.=O,X.+, ~O,X.+, + X.+ 2 ~O, ... }


.=1
00

=L P{S, > 0, ... , S.-1 > 0, S. = O}P{S, ~ 0, S2 ~ O, ... }


.=1
=~ P{T'_ = oo},

yielding
(I - ~)P{T'_ = oo} = P{L = oo},

which completes the proof of (i).

(ii) Suppose that E I X 1 I > o. If ~ = I, then T_ < 00, a.c., and, replacing
{X.} by {-X.} in Theorem I(i), E Sr- < O. On the other hand, ~ = I
entails ST _ = 0, a.c., via (5), a flagrant contradiction.
(iii) If L and hence also T'_ is defective, E T+ < 00 by (3) and afortiori
T+ < 00, a.c., so that according to Theorem I(i)

(8)
Thus, T = inf{ n ~ 1: S. > K} < 00, a.c. for all K > 0 and, via Lemma 5.3.2,
P{fu!!,,~oo (ST+. - ST) ~ O} = P{fu!!,,~oo S. ~ O} = e> o.
Consequently, with probability at least e,
fu!!,,-oo S. = lim._ oo ST+. = ST + fu!!,,~oo (ST+. - ST) > K,
implying that P{fu!!,,-oo S. = oo} ~ e > 0 whence, by the Kolmogorov zero-
one law, fu!!,,- 00 S. = 00, a.c., that is, S. ~ 00.
Conversely, if lim._ oo S. a.c. 00, T cannot be finite since Theorem 1 (with
X -+ - X) then would entail lim S. a.c. -00. 0
.-00

A combination of Theorems I and 2 yields

Corollary l. If {X.} are i.i.d. r.v.s, S. = L~ Xi' and


T+ = inf{n ~ l:S. ~ OJ, T_ = inf{n ~ I:S.:s;; OJ,
T~ = inf{n ~ I:S. > c > OJ, T'-c = inf{n ~ I:S. < -c < OJ,
then either S. = 0, n ~ I, a.c., or one ofthe following holds:
i. L is defective, lim S. ac. 00, E T+ < 00, E T~ < 00, C > 0;
154 5 Sums of Independent Random Variables

ii. T+ is defective, lim Sn ac. - 00, E T_ < 00, E T'-c < 00, C > 0;
iii. T+ and L are finite, rrm Sn a.c. 00,lim Sn a.c. - 00, E T+ = E L = 00.

PROOF. Supposing {Sn, n ~ I} nondegenerate, necessarily EIXti > O.


(i) If L is defective, T'_ is defective a fortiori, whence by Theorem 2
Sn ~oo and E T+ < 00. Moreover, by Theorem I(ii), E T~ < 00 for > O. c
Similarly for (ii).
If, as in (iii), both T+ and T_ are finite, then Theorem I(i) ensures that
rrm Sn = 00 a.c.,
moreover, according to Theorem 2(ii) and (4), T'_ and (analogously) T'+ are
finite, whence by Theorem 2(i), E T+ = E T_ = 00. 0

The following provides a criterion for category (iii) of Corollary I.

Corollary 2 (Chung-Fuchs). Let {X n , n ~ I} be i.i.d. r.v.s with EXt = 0


and EIX t I> 0 and let Sn = I~ Xj' Then
rrm Sn = 00 a.c., lim Sn = - 00 a.c.

PROOF. By Corollary I, it suffices to prove that E T+ = E L = 00. Suppose,


on the contrary, that E T+ (say) < 00. Then, by Wald's equation,
o ~ E(X n ~ E ST = E T+ . EXt = 0,
+

implying E Xi = O. Since EXt = 0, E XI = 0 and therefore EIXtl = 0,


a contradiction. 0

The next theorem, which extends Corollary 2, asserts that the same
conclusion holds if SJn !. 0 and E I X • I > O.

Theorem 3 (Chung-Ornstein). If {X n } are i.i.d. LV.S with EIX.I > 0, Sn =


Ii Xi' and
(9)

then T+ and T_ are finite and

p{n:nSn = oo} = I =P{li~Sn = -oo}- (10)

PROOF. According to Corollary I it suffices to prove that T+, L are finite


and, by symmetry, merely that T_ is finite. Suppose contrariwise that T_
is defective, whence E T+ < 00 by Theorem 2. Now the WLLN (Theorem
5.2.4) together with (9) implies
nP{IXtI>n} =0(1), EX.I I1X I!:5n) =0(1). (II)
Since E ST + exists and E T+ < 00 as noted above, Theorem 5.3.2(ii) ensures
5.4 Chung-Fuchs Theorem, Elementary Renewal Theorem, Optimal Stopping 155

EST + = O. Thus, EX: ::; EST + = 0, entailing EX IIlIxd,;n) -+ - E Xl < 0,


in contradiction of (11). Consequently T_ is finite. 0

The same proof yields

Corollary 3. Let {X n} be i.i.d. r.v.s with EIX II > 0, Sn= Li Xi' and n P{ IX t I
> n} O. (i) Ifeitherlim EX IIIIXd,;n) does not exist orlim E X IIIIXd,;n) ~ 0,
-+
then T+ < 00, a.c. (ii) If, moreover, EX: = E Xl = 00, and if lim E X I
X IUXd,;n) = c finite or EX IIUXtl,;n) has no limit, then p{nm Sn = oo} =
P{lim Sn = -oo} = 1.

Corollary 1 implies that apart from degeneracy only three possible modes
of behavior exist for sums Sn of i.i.d. r.v.s {X n}. If, moreover, E X: = E Xl
= 00, the same trichotomy will be shown to exist for the averages Snln.

Lemma 1. (i) If {X n, n ~ I} are Li.d. with Sn = Xi' n ~ 1, then


LV.S Li
Sn ~ 00 iff there exists an {Xn}-time T for which E T < 00 and EST> O.
(ii) Moreover, when E Xi = 00, then (Snln) ~ 00 iff there exists an {X n}-
time T with E T < 00 and E ST > - 00.
PROOF. Under (i), if Sn ~ 00, then Corollary 1 ensures that E T+ < 00,
where T+ = inf{n ~ 1: Sn ~ OJ, and clearly EST + ~ E Xi > Con- o.
versely, if T is an {Xn}-time with E T < 00 and EST> 0, let T(j),} ~ I, be
copies of T with T" = Ii
T(j), To = O. By Lemma 5.3.3, v" = (Tt n ), stn),
n ~ 1, are i.i.d. random vectors, where sIn) = X T" _1 + I + ... + X Tn' Define a
{v,,}-time t by

t = inf{n ~ l:ST n ~ O} = inf{n ~ l:~S(j) ~ O}


Since Li SW ~ 00, Corollary 1 guarantees that E t < 00 and since ST, ~ 0,
T+ ::; Te. Hence, via Wald's equation
t

E T+ ::; E Te = E I TU) = E t· E T(I) < 00


I

and, invoking Corollary 1 once more, Sn ~ Xl.


Apropos of (ii), let T be an {XnHime with E T < 00 and EST> - 00.
For K > 0, define X~ = X n - K, S~ = Xj, and Ii
156 5 Sums of Independent Random Variables

Consequently, by (i) Sn - nK = S~ ~ 00 for all K > 0, implying Snln ~ 00.


The remainder of (ii) follows trivially from (i). 0

Theorem 4 (Kesten). If {X n' n ~ I} are i.i.d. r.v.s with EX: = E Xl = 00


and Sn = L~ Xi> n ~ 1, then one of the following holds:
i. Snln ~ 00;

11. Snln ~ - 00;

PROOF. (i) If Sn ~ 00, Corollary 1 ensures E T+ < 00 and E ST + ~ E xt


= where T+ = inf{n ~ 1: Sn ~ OJ, so that Snln ~ 00 by Lemma 1.
00,
Similarly, Sn ~ - 00 guarantees (ii). Since the hypothesis precludes
degeneracy, only the alternative lim Sn = 00, a.c., and lim Sn = -00, a.c., re-
mains, in which case by (ii) of Lemma 1 no {Xn}-time T with E T < 00 and
IE STI :5: 00 exists. Hence, if X~ = X n - K and S~ = L~ X~, no {Xn}-time T'
with E T' < 00 and IE S~·I :5: 00 exists for any finite constant K. Again
invoking Lemma 1 and Corollary 1, lim S~ = 00, a.c., and lim S~ = -00, a.c.
In view of the arbitrariness of K, (iii) follows. 0

Corollary 4. If {X n} are i.i.d. random variables, then, for every a in (0, t], one
of the following holds:
(i) Snln"~ 00.
(ii) Snln"~ -00.
(iii) lim Snln" •. c. lim Snln" •. c. -00.

PROOF. If p. = E X exists, then Snln~ p. E [ -00,00], and if p. "# 0, either


(i) or (ii) holds. If, rather, p. = 0, Theroem 5.3.4 asserts that (iii) obtains for
a = 1/2 (a fortiori, for a < t). Finally, if E X+ = E X- = 00, Corollary 4
follows from Theorem 4. 0

Renewal theory is concerned with the so-called renewal function E Nt>


where

Nc = max{j: Sj = *
Xi :5: c}, c > 0,

the r. v.s {X n' n ~ I} being i.i.d. with p. = E XI E (0, 00]. Although, N c is not
a stopping time, when X 1 ~ 0, a.c. Nc + 1 = Tc'. Thus, when X 1 ~ 0, a.c.,
(13) and (14) below hold with N c replacing T~ and the former is known as the
elementary renewal theorem. A stronger result (due to Blackwell) asserts
that E N c +" - E N c - alp. as c - 00 (modification being necessary when
{X n } are lattice r.v.s).
References 157

Stopping times may be utilized to obtain the elementary renewal theorem.


A first step in this direction is

Lemma 2 (Gundy-Siegmund). Let {Xn, n ;;::: I} be independent, nonnegative 2 1


LV.S and let {Tm , m ;;::: I} be a sequence of {Xn}-times satisfying E Tm < 00,
m;;::: 1, and lim m_ co E Tm = 00. If

L
j= 1
n 1[Xj >£jj
Xj = o(n), I: > 0,

then E X Tn = o(E 1',,) as n -+ 00.

PROOF. For I: > 0, choose N ;;::: 1 such that Lj= 1 S[xj>£j) X j < nl: for n ;;::: N.
Then, if n ;;::: N,

N-l co
~ I: E 1'" + LE
j= 1
X j + LE
j=N
XjI[xj>£j. Tn~Jl

co
= I: E 1'" + 0(1) + L P{1'" ;;::: j}E XjI[xj>£j)
j=N

co k
= I: E 1'" + 0(1) + L LE
k=N j=N
X j I[Xj>£Jl P{1'" = k}
co
~ I: E 1'" + O( 1) + L I:k P{1'" = k} ~ 21: E 1'" + O( 1),
k=N

and so E X Tn = o(E 1',,). o


Theorem 5 (Elementary Renewal Theorem). If {X n, n ;;::: 1} are i.i.d. with
Jl = E Xl E(O, 00], Sn = Xi' and Li
T~ = inf{n ;;::: 1: Sn > c}, c> 0, (12)
then (i) E T~ < 00 and
lim E T~ = _1_. (13)
,-co c EX l
Moreover, (ii) if a} = a 2 <
I 00, then

(14)

PROOF. By Corollary 5.2.2, Sn/n ~ E Xl> 0, whence Sn ~ 00. Thus,


Corollary 5.4.1 guarantees that the {X n}-time T~ has finite expectation.
158 5 Sums of Independent Random Variables

For any m in (0, J1.), choose the positive constant K sufficiently large to
ensure E X 1I IX I :$ K) > m and define
n

S~ = LX;, v= inf{n ~ I:S~ > c}.


1

Then {X~} are i.i.d. and, as earlier, E V < 00. By Wald's equation
K +c ~ E S~ = E V· E X'I'
and so
E T~ EVII
--<--<--<-
K + c - K + c - E X'I - m'
whence

~I E T~ 1
1m --~-
c-ex> C m
and (m --> J1.)
~I ET~ I
lm--~-. (15)
C-ex> c J1.
If J1. = 00, (15) is tantamount to (13). If, rather, J1. < 00, then by Wald's
equation J1. E T~ = E Sf~ > c, implying

. E T~ 1
I1m - - > -
c -00 C - J1.'

which, in conjunction with (15), yields (13).


Apropos of (ii), setting T = 'Fe', by Theorem 5.3.3

(16)

Since T = T~ i 00 as c i 00 and JIXT >cn) xi = 0(1), e > 0, by Lemma 2 and


part (i)
EX} = o(E T) = o(c) as c --> 00.

Hence,
E(ST - C)2 ~ E(X;)2 ~ E X} = o(c) as c --> 00, (17)
so that

E S} < 00, EIXTI = o(~),


(18)
c ~ J1. E T = E ST ~ c +EIXTI = c + o(~),
whence E T 2 < 00 via (16) and (18).
From (17) and (18)
E(ST - J1. E T)2 ~ 2[E(ST - C)2 + (c - J1. E T)2] = o(c), (19)
5.4 Chung-Fuchs Theorem, Elementary Renewal Theorem, Optimal Stopping 159

and from (16), (19), and Schwarz's inequality

Jl.2 E(T - E T)2 = E(Jl.T - ST + ST - Jl. E T)2 = u2 E T + o(e).


Hence, by (13)

2
u2
uf = E(T - E T)2 = 2 E T
U
+ o(e) = 3" e + o(e),
Jl. Jl.

which is (14). o
When X n , n ~ 1 are i.i.d. with EX 1 = Jl. > 0, u 2 = E Xr E (0, 00), (Jl.3leu 2)1 /2
(1;, - (clJl.) has a limiting normal distribution as e -+ 00 according to Theorem
9.4.2.

EXAMPLE 1. If {X, X n , n ~ I} are nonnegative i.i.d. r.v.s with P{X > O} > 0
and Tx = inf{n ~ 1: Sn = I~ Xi > X},
X
! E Tx ~ E mm
. (X ,X ) ~ E Tx ' X> O. (20)

Remark. Since, setting So = 0,


00 00

E Tx = I P{Tx ~ n} =I P{Sn ~ X},


n=1 n=O

and since if F is the dJ. of X and


x
(21)
a(x) = So [1 - F(y)]dy'

according to Corollary 6.2.1 and Example 6.2.2

a~) = J: y dF(y) + x[1 - F(x)] = E min(X, x),

it follows that the inequalities of (20) may be transcribed as


1 00 00

2n~o P{Sn ~ x} ~ a(x) ~ n~o P{Sn ~ X}, x> O. (22)

PROOF. Since Tx ~ n iff Sn-l ~ x, the series expression for E Tx is immediate


(.see Exercise 4.1.4). Set
n

S~ = IX;,
1

and note that (omitting subscr~pts) x ~ min(ST' x) ~ S~, implying

x ~ E S'r ~ x + E X'r ~ 2x.


160 5 Sums of Independent Random Variables

Moreover, by Wald's equation

E S'r = E T· E XI = E T· E min(X, x),

yielding (20). o
Consider the plight of a gambler who makes a series of plays with out-
comes X n , n ~ 1, where the X n , n ~ 1, are i.i.d. r.v.s, and who has the option
of stopping at any time n ~ 1 with the fortune y" = maxI ,;;i';;n Xi - en,
e > O. Since the gambler is not clairvoyant his choice of a rule for cessation of
play is a stopping time, i.e., his decision to stop at any specific time n must
be based only on X I' ... , Xn (and not on X j with j > n). Does there exist an
optimal stopping rule, i.e., one which maximizes E YT over the class of all
finite stopping times?

Theorem 6. Let {X n, n ~ I} be i.i.d. r.v.s with EI X II < 00 and

Yn = max Xi - en, n ~ 1, (23)


1 S;iS;n

where e is a positive constant. If T is the finite {X n}-time defined by

T = inf{n ~ 1: X n ~ f3}, (24)

where 13 is the unique solution of

(25)

then E T < 00 and T is an optimal stopping rule, that is,

(26)

for every finite {Xn}-time T' for which E Yr exists.

PROOF. Since f(f3) = E(X I - 13)+ is a decreasing continuous function of 13


with Iim p- oo f(f3) = 0, lim p _- oo f(f3) = 00, and is strictly decreasing when
f(f3) > 0, necessarily (25) has a unique solution 13. Set p = P{X I ~ f3} and
q = 1 - p. By (25), p > O. Then

P{T = n} = pqn-I, P{T < oo} = Lpqn-I = -p- = 1,


l-q

E T = L00 npqn-I = p -d (00


L qn ) = 1
P 2 = -,
I dq 0 (l - q) p
and, since X j ~ 13 on {T =}},
00
EXT = L E XjI,T=jl
I
5.4 Chung-Fuchs Theorem, Elementary Renewal Theorem, Optimal Stopping 161

co
= L E XjI[T~J1I[xj~PI
!
co
= L P{T ~j}E X!I[Xl~Pl
I

co
=I [E(X I - f3)+ + f3p]P{T ~ j}
I

e
= (e + f3p)E T = - + f3.
p
Moreover,

E Yi = E(X T - eT)- ~ E Xi + e E T ~ f3- + e E T < 00,

whence E YT exists and


e
E YT = E max X j - e E T = EXT - - = f3,
! s;js;T P

yielding the first part of (26). Moreover, if T' is any finite {Xn}-time for which
E Yr exists, it may be supposed without loss of generality that E Yr > - 00.
Since for any b > f3
n
Yn = max Xi - en ~ b + L [(Xi - b)+ - e]
ISisn 1

and E[(X I - b)+ - e] < 0 by (25), Theorem 5.3.2 ensures that


r
E YT , ~ b + E L [(Xi - b)+ - e] < b.
!

Hence, E Yr ~ f3, which is the second portion of (26). o


The stopping rule 1; in (28) is of especial interest since the finiteness of its
moments depends on the value of the parameter e. According to Exercise 5.3.7
(also the application of Corollary 5.3.5) E 1; = 00 for e ~ 1 whereas Example
2 demonstrates under the Lindeberg condition (27) (see Chapter 9 for a more
general form) that E 1; < 00 for 0 < e < l. For {X n , n ~ I} as in Example 2
there exist constants ck ! 0 (c k is the smallest positive root of the Hermite
polynomial of degree 2k) such that E 1;k(m) < 00 for C < Ck> all m ~ I while
E 1;k(m) = 00 for all large m when C > ck [B. Brown].
EXAMPLE 2. Let Sn = Ii=! Xj where {Xj' j ~ I} are independent r.v.s with
E Xn = 0, E X; = I, n ~ 1 obeying

In
j=!
i [x]>.jj
Xl = o(n) (27)
162 5 Sums ofIndependent Random Variables

and define
1;, = 1;,(1), 1;,(m) = inf{n ~ m: IS.I > cn l/2 }, c > 0, m = 1,2,.... (28)
Then for all m ~ 1, E 1;,(m) < 00 for all c in (0, 1).
PROOF. The argument for m > 1 requires only minor modifications from that
for m = 1 which will therefore be supposed. Let c E (0, I) and assume that
E 1;, = 00. If V = min(1;" rt), clearly E V < 00 and E V - 00 as n - 00. Thus,
E xt = o(E V) by Lemma 2 and via Theorem 5.3.3

E V = ESt = E St-l + 2 E SV-IXV + EXt


2
~c E V + 2c[E V' E xtr /2
+ EXt
= c2 E V + o(E V).
2
Hence, (I - c ) E V ~ o(E V) yielding a contradiction as n - 00. Thus,
E 1;, < 00 for 0 < c < 1. 0

EXERCISES 5.4
I. Verify the second equality of (3).
2. Prove that the stopping rule T of (24) remains optimal when Y. = X. - en, e > O.
3. Prove Corollary 3.
4. Prove that if {X.} are i.i.d. with E X I = J.I E (0,00] and N, = sup{n ;::: I: S. $ e},
then Ncle ~ I/Jl..
5. Let {X.} be i.i.d. r.v.s with E X I = J.I > 0 and
T= T; = inf{n;::: I:S. > e} fore> O.

Prove that Tic ~ IIJl., a.c. If, moreover, E Ki < 00, then
ST - Jl. E T
ft .!. 0 as e -+ 00.

6. Prove that if {X.} are i.i.d. with E X I = J.I E (0, 00] and 1; = inf{n ;::: I: S. > en·},
e > 0,0 < IX < I, then
7 1 -. I
-~e-- ~ ~ as e -+ 00 and E 1; -
(e)I/(1-.'
~ .

7. (Chow-Robbins) Let {Y", n ;::: I} be positive LV.S with lim Y" = I, a.c., and {a.,
n ;::: I} positive constants with a. -+ 00, aJa._ I -+ I. For e > 0 define N = N c =
inf{n ;::: I: Y" $ a.le}. Prove that P{N < 00, lim, aNle = I} = I and, if E sup.;> I Y"
< 00, then lim,~", E aNle = I.
8. If S. = L'iXi' where {X., n ;::: I} are nondegenerate i.i.d. r.v.s, T = inf{j ;::: 1: Sj <
- a < 0 or S j > b > O} is of interest in sequential analysis (Wald). Prove that T is a
stopping variable with finite moments of all orders. Hint: P {T > rn} $ P { IS jr -
SrU- \)1 < a + b, I $ j $ n} for all integers r > O.
5.4 Chung-Fuchs Theorem, Elementary Renewal Theorem, Optimal Stopping 163

9. (Alternative proof that T'+ < 00, a.c., implies E L = 00.) Let {T (J), j ~ I} be
copies of T'+ and set T;, = I~ TW. Then Z = If
IITn<oo) = number of times S.
exceeds Si' i < n (take So = 0) and P{T_ ~ n + I} = P{S. > Si' 0 ~ i < n} =
P{U~ [1] = n]}, whence E T_ - I = E Z = 00.

10. If {X., n ~ I} are i.i.d. with S. = D Xi' then (i) Jmi S. = 00, a.c., iff there is a finite
{X.}-time T with EST> O. (ii) Moreover, when EXt = 00 Jmi S"jn = 00, a.c.,
iff there exists a finite {X .}-time T with E ST > - 00.
I I. If {X., n ~ I}arei.i.d. r.v.swithE XI = Jl > - 00 and S. = D Xi' 1; = inf{n ~ I:
S. > cn}, then E 1; < 00 for c < Jl and E 1; = 00 for c ~ Jl.
12. Let {So = I~ Xi' n ~ I, So = O} be a random walk on the line. A real number c
is said to be recurrent if PUS. - cl < e, i.o.} = I for all e > 0 and is called possible
if for every e > 0 there is an integer n such that P {IS. - c 1 < e} > O. Prove that if
c is possible and b is recurrent, then b - c is recurrent. Since every recurrent value
is clearly possible, the set Q of recurrent values is an additive group. Show that Q is
closed. Thus, if Q is nonempty, Q = (- 00, 00) or Q = Inc: c ¥- 0, n = 0, ± I,
± 2, ...} or Q = {O} (the latter only if X is degenerate at 0).

References
J. H. Abbott and Y. S. Chow, "Some necessary conditions for a.s. convergence of
sums of independent r. v.s.," Bull. Institute Math. Academia Sinica 1 (1973), 1-7.
L. E. Baum and M. Katz, "Convergence rates in the law of large numbers," Trans.
Amer. Math. Soc. 120 (1965), 108-123.
B. Brown, "Moments of a stopping rule related to the central limit theorem," Ann.
Math. Stat. 40 (1969), 1236-1249.
D. L. Burkholder, "Independent sequences with the Stein property," Ann. Math.
Stat. 39 (1968),1282-1288.
Y. S. Chow, "Local convergence of martingales and the law of large numbers," Ann.
Math. Stat. 36 (1965),552-558.
Y. S. Chow, "Delayed sums and Borel summability of independent, identically distrib-
uted random variables," Bull. Inst. Math., Academia Sinica 1 (1973), 207-220.
Y. S. Chow and H. Robbins, "On the asymptotic theory of fixed-width sequential
confidence intervals for the mean." Ann. Math. Stat. 36 (1965),457-462.
Y. S. Chow and H. Teicher, "Almost certain summability of i.i.d. random variables,"
Ann. Math. Stat. 42 (1971), 401-404.
Y. S. Chow, H. Robbins, and D. Siegmund, Great Expectations: The Theory ofOptimal
Stopping, Houghton Mifflin, Boston, 1972.
Y. S. Chow, H. Robbins, and H. Teicher, "Moments of randomly stopped sums,"
Ann. Math. Stat. 36 (1965),789-799.
K. L. Chung, "Note on some strong laws of large numbers," Amer. Jour. Math. 69
(1947),189-192.
K. L. Chung, A Course in Probability Theory, Harcourt Brace, New York, 1968; 2nded.,
Academic Press, New York, 1974.
K. L. Chung and W. H. J. Fuchs, "On the distribution of values of sums of random
variables," Mem. Amer. Math. Soc. 6 (1951).
K. L. Chung and D. Ornstein, "On the recurrence of sums of random variables," Bull.
Amer. Math. Soc. 68 (1962),30-32.
164 5 Sums oflndependent Random Variables

C. Derman and H. Robbins, "The SLLN when the first moment does not exist,"
Proc. Nat. Acad. Sci. U.S.A. 41 (1955),586-587.
J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
K. B. Erickson, "The SLLN when the mean is undefined," Trans. Amer. Math. Soc.
185 (1973),371-381.
W. Feller, "Uber das Gesetz der grossen Zahlen," Acta Univ. Szeged., Sect. Sci. Math.
8 (1937),191-201.
W. Feller, "A limit theorem for random variables with infinite moments," Amer. Jour.
Math. 68 (1946),257-262.
W. Feller, An Introduction to Probability Theory and its applications, Vol. 2, Wiley,
New York, 1966.
R. Gundy and D. Siegmund, "On a stopping rule and the central limit theorem," Ann.
Math. Stat. 38 (1967),1915-1917.
C. C. Heyde, "Some renewal theorems with applications to a first passage problem,"
Ann. Math. Stat. 37 (1966), 699-710.
T. Kawata, Fourier Analysis in Probability Theory, Academic Press, New York, 1972.
H. Kesten, "The limit points ofa random walk," Ann. Math. Stat. 41 (1970),1173-1205.
A. Khintchine and A. Kolmogorov, "Uber Konvergenz von Reichen, derem Glieder
durch den Zufall bestimmt werden," Rec. Math. (Mat. Sbornik) 32 (1924),668-677.
A. Kolmogorov, "Uber die Summen durch den Zufall bestimmer unabbiingiger
Grossen," Math. Ann. 99 (1928),309-319; 102 (1930), 484-488.
M. J. Klass, "Properties of optimal extended-valued stopping rules," Ann. Prob. I
(1973),719-757.
M. Klass and H. Teicher, "Iterated logarithm laws for random variables barely with or
without finite mean," Ann. Prob. 5 (1977), 861-874.
K. Knopp, Theory and Application of Infinite Series, Stechert-Hafner, New York, 1928.
P. Levy, Theorie de faddition des variables aleatoires, Gauthier-Villars, Paris, 1937;
2nd ed., 1954.
M. Loeve, "On almost sure convergence," Proc. Second Berkeley Symp. Math. Stat.
Prob., pp. 279-303, Univ. of California Press, 1951.
M. Loeve, Probability Theory, 3rd ed., Van Nostrand, Princeton, 1963; 4th ed., Springer-
Verlag, Berlin and New York, 1977-1978.
J. Marcinkiewici! and A. Zygmund, "Sur les fonctions independantes," Fund. Math.
29 (1937),60-90.
P. Revesz, The Laws of Large Numbers, Academic Press, New York, 1968.
H. Robbins and E. Samuel, "An extension of a lemma of Wald," J. Appl. Prob. 3
(1966),272-273.
F. Spitzer, "A combinatorial lemma and its applications to probability theory," Trans.
Amer. Math. Soc. 82 (1956),323-339.
C. J. Stone, "The growth of a recurrent random walk," Ann. Math. Stat. 37 (1966),
1040-1041.
H. Teicher, "Almost certain convergence in double arrays," Z. Wahr. verw. Gebiete 69
(1985),331-345.
A. Wald, "On cumulative sums of random variables," Ann. Math. Stat. 15 (1944),
283-296.
6
Measure Extensions,
Lebesgue-Stieltjes Measure,
Kolmogorov Consistency Theorem

6.1 Measure Extensions, Lebesgue-Stieltjes


Measure
A salient underpinning of probability theory is the one-to-one correspondence
between distribution functions on R" and probability measures on the Borel
subsets of R". Verification of this correspondence involves the notion of
measure extension.
Recall that a measure J1. on a class .91 of subsets of a space 0 is a-finite if
° = U:'= 1 0" with 0" E .91, J1.{O,,} < co, n ~ l. Moreover, if J1. and v are set
functions on classes <§ and Yf respectively with <§ c Yf and J1.{A} = v{A}
for each A E <§, then v is dubbed an extension of J1. to Yf while J1. is called the
restriction of v to <§, the latter being denoted by J1. = v I~.

Theorem 1. If J1. is a measure on a semi-algebra Y of subsets of 0, there exists


a measure extension v of J1. to a(Y). Moreover, if J1. is a probability or a-finite
measure, then v is likewise and the extension is unique.

PROOF. For each subset A c 0, define

v{A} = inf{~ J1.{Sn}: QS,,::::> A, S" E Y, n ~ I}. (1)

vIt = {D c 0: v{D· A} + v{Dc . A} = v{A} for all A cO}. (2)

Clearly, for A c B c 0, 0 = v{0} ~ v{A} ~ v{B}; moreover, 0 E vIt,


and DC E vIt whenever D E vIt.
(i) J1. = vlY"

165
166 6 Measure Extensions

If S E Y, then by (1), v{S} :5: J.l{S} + J.l{0} + J.l{0} + ... = J.l{S}, while
if Uf Sn ::J S, where Sn E Y, n ~ I, then, since J.l is a measure on Y, by
Corollary 1.5.1 J.l{S} = J.l{Uf SnS} :5: If
J.l{Sn}, and so via (1), J.l{S} :5: v{S}.
(ii) v is subadditive, that is, for A o c Uf An C n
00

v{A o } :5: L v{A n}. (3)


t

In proving (3) it may be supposed that v{A n } < 00, n ~ I. For any
e > 0 and n ~ 1, choose Sn.m E Y, m ~ I, with

00

v{A o} :5: ~>{Sn.m} :5: L v{A n} + e,


n,m 1

and, letting e --+ 0, (3) obtains.


(iii) Y c vii.
Let S E Y and A c n. By (ii) and the definition of vii it suffices to verify
that
v{A} ~ v{S·A} + v{sc·A}, (4)

and in so doing it may clearly be supposed that v{A} < 00.


For any e > 0, choose Sn E Y, n ~ 1, such that Uf Sn ::J A and
00

v{A} +e~ I J.l{Sn}· (5)


!

Since Y is a semi-algebra, for each n ~ 1 there exists a finite partition of


SCSn in Y, say (Sn.m, m = I, ... , mn ). Then (S· Sn, Sn.m, m = 1,2, ... , mn ) is
a partition of Sn in Y, and by additivity of J.l on Y
mn
J.l{Sn} = J.l{S . Sn} + L J.l{Sn.m}, n ~ I. (6)
m=!
But U:'=! U:::~! Sm.n = U:'= t SC. Sn ::J A· SC and Uf SSn ::J SA, whence
by (1)
00 mn 00

L L J.l{Sn.m} ~ v{A . SC}, LJ.l{S,Sn} ~ v{S·A}. (7)


n= t m= t n= t
Consequently, from (5), (6) and (7)
00 00 mn
v{A} +e~ L J.l{S· Sn} + L L J.l{Sn.m} ~ v{S· A} + v{Sc. A},
1 n=1 m=1
which is tantamount to (4) as e --+ O.
6.1 Measure Extensions, Lebesgue-Stieltjes Measure 167

(iv) vii is an algebra and for every DEvil and A c:: n and finite partition
{D n , n = 1,2, ... ,m} of D in vii,
m
v{A . D} = L v{A . Dn }. (8)
j

Now for A en and D; E vii, i = 1,2,


v{A} = v{A· Dd + v{A- Dn
= v{A· DjD z } + v{AD.D z } + v{AD~Dz}
+ 1'{AD~Dz}. (9)
Replacing A by A(D j u Dz ) in (9),
v{A(D j u Dz )} = v{A· DjD z } + v{ADjD z } + v{AD~Dz}, (10)
whence from (9) and (10)
v{A} = v{A(D j u Dz )} + v{ADWz},
and so, via the definition, D. u D z E vii. Since, as noted at the outset, .,If
is closed under complements, vii is an algebra, and, moreover, if DjD z = 0,
(10) yields
v{A(D j u D z)} = v{ADd + v{AD z },
which is precisely (8) when m = 2. The general statement of (8) follows by
induction.
(v) vii is a a-algebra, and for every DEvil and A c n and a-partition
{D n , n 2 I} of D in vii,
00

v{A . D} = L v{A . Dn }· (II)


j

Let {D n , n 2 l} be the stated a-partition of D and set En = U~ D;. By


(iv), En E vii for every positive integer n, whence for any A c n,

v{A} = v{A En} + v{A E~} 2 v{A· En} + v{ADC}


n
= L v{A ·DJ + v{A ·DC}.
j

Hence, by (3)

L: v{A . DJ + v{A . DC} 2


00

v{A} 2 v{A- D} + v{A . DC}, (12)


j

and so equality holds throughout (12), yielding (11) upon replacement of A


by AD. Moreover, if {Dj,j 2 I} is any sequence of disjoint sets of vii and
D = Uf Dj , (12) remains intact, whence it is clear that (see Exercise 1.3.2)
vii is a a-algebra. Clearly, if J1. is finite or a-finite, v inherits the characteristic.
It follows directly via (i), (iii), and (v) that (vi) vii ~ a(Y), v is a measure on
vii, and J1. = vly·
168 6 Measure Extensions

Finally, to prove uniqueness, let v* be any measure extension of J1. to


a( Y) and define
$ = {E: E E a(Y) and v(E) = v*(E)}.
If J1.{O} < 00, then v{O} = v*{O} < 00 and it is easily verified that $ is a
A.-class containing the n-class Y, whence v = v* on a(Y). If J1. is merely
a-finite on Y, there exist sets On E Y with Uf
On = 0 and J1.{On} < 00,
n ~ 1. Then, as just seen, v = v* on On n a(Y), n ~ 1, and so v = v* on
a(Y). []
The set function v defined by (I) on the a-algebra of all subsets of 0 is
called the outer measure induced by J1., while the a-algebra .A described in (2)
is termed thea-algebra of v-measurable sets.
Any measure J1. on a a-algebra SII is called complete if B cAE SII and
J1.{A} = 0 imply BE SII (and necessarily p{B} = 0). A complete measure
space is a measure space whose measure is complete.
The outer measure vstipulated in (1) defines a complete measure extension
of J1. to.A, the a-algebra of all v-measurable sets (Exercise 2).
As an important application of Theorem 1, the special cases of Lebesgue-
Stieltjes and Lebesgue measure on the line will be considered. Let R =
[ - 00, 00] and Y denote the semi-algebra of all intervals of the form [a, b),
- 00 :5 a :5 b :5 00 and R, {oo} in addition. A class of measures known as
Lebesgue-Stieltjes measures will be generated on Y via monotone functions.
For any mappingf of R = [ - 00, 00] into R, set
f(t +) = lim f(s) = lim f(s), tE [ - 00, 00), f( 00 +) = f( 00),
$-1+ t<s-t

f(t -) = lim f(s) = lim f(s), t E ( - 00, 00], f( - 00 -) = f( - 00),

when these limits exist. Iff(t) = f(t-), thenfis said to be continuous from
the left or left continuous at t E R. A function which is left continuous at all
points of some set T c R is called left continuous on T and when T = R,
simply left continuous. Similarly, f(t) = f(t+) defines right continuity at
t and the analogous terms are employed.
If f is a function with f(t - ) existing for each t E R, then g(t) = f(t - ) is
left continuous (Exercise 3). In particular, iff is a monotone function on R,
f(t-) exists and g(t) = f(t-) is left continuous. Since the set of discon-
tinuities of a monotone function is countable and two left-continuous
functions are identical if they coincide except for a countable set, every
monotone function m on R defines a unique left-continuous function F = F m
via F(t) = m(t - ).
Lemma 1. Let F be a nondecreasing, left continuous function on R = [- 00, 00]
withF(-oo) = F(-oo+)andIF(t)1 < oo,ltl < 00. If
J1.{[a, b)} = F(b) - F(a), - 00 :5 a :5 b :5 00,
J1.({00}) = 0 = J1.{0}, J1.{R} = F(oo) - F(-oo), (13)
then J1. is a measure on the semi-algebra Y.
6.1 Measure Extensions, Lebesgue-Stieltjes Measure 169

PROOF. Clearly, jJ. is nonnegative and additive on Y. To verify a-additivity,


consider S E Y with S #- {oo} or 0 and let {Sn, n ~ I} be a a-partition of S
in Y. By Corollary 1.5.1

(14)

In proving the reverse inequality, since jJ.{R} = jJ.{[ -00, oo)}, it may be
supposed that S = [a, b), -00 ~ a < b ~ 00. Moreover, in view of the
hypothesized equality F( -00) = F( -00+) and left continuity, it suffices to
establish that

L jJ.{Sn· [c, d)}


00

jJ.{[c, d)} ~ whenever a ~ c < d < b ~ 00, c #- -00. (15)


1

Since 00 ¢ [a, b), necessarily Sn #- {oo}, n ~ 1 whence Sn' [c, d) = [an, bn),
where c ~ an ~ ba ~ d, n ~ 1.
For any e > 0, set I n = (an - (jn, bn) where (jn > 0 satisfies F(a n) - F(a n -
n
(jn) < e/2 , n ~ 1 via left continuity. Then [c, d] =
whence by the Heine-Borel theorem [c, d) c J nk c U7
[an, bn) Cur
U7
In
[a nk - (jnk' bn) for
ur
some finite set of integers n1 , ••• , nm• By Corollary 1.5.1
m m
jJ.{[c, d)} ~ L (F(bn) - F(a nk - (jn) ~ L (F(bn) - F(a n) + e2- nk )
1 1
m
Lt (F(bn.) - + e ~ L jJ.{Sd + e.
00

~ F(a n.)
1

Thus, (15) obtains as e -+ 0, and jJ. is a measure on Y. o


Theorem 2. Any nondecreasing, finite function m on ( - 00, 00) determines a
complete measure Vm on the a-algebra .11m of all vm-measurable subsets of
R = [ - 00, 00] with

vm{[a, b)} = m(b-) - m(a-), - 00 < a ~ b< 00,

vm({ oo}) = vm({ - oo}) = O. (16)

Moreover, .IIm =:> the a-algebra f!J ofall Borel sets of Rand Vm is unique on f!J.

PROOF. Set F(t) = m(t -) for - 00 < t ~ 00 and F( - 00) = m( - 00 + ).


Then F is defined and left continuous on R with F( - 00 + ) = F( - 00).
By Lemma 1, the set function jJ. defined on Y by (13) is a measure thereon,
whence Theorem 1 guarantees the existence of a measure extension vm of jJ.
to .11m =:> f!J = a(Y), where vm and .11m are defined by (1) and (2) respectively.
According to Exercise 2, vm is complete on .IIm'
Hence, vm{A} = vm{A - { -oo}}, A E.IIm is a complete measure on .11m
satisfying (16).
To verify uniqueness on f1l, let v* be any other measure on f1l satisfying
170 6 Measure Extensions

(16). Then
= v*{{ -oo}} = v.. {{oo}} = v.. {{ -oo}}
v*{{oo}}
v*{[a, b)} = m(b-) - m(a-) = v..{[a, b)}, -00 < a:S; b < 00,

and for -00 <b< 00

v*{[ -00, b)} = v*{( -00, b)} = lim v*{[a, b)} = lim v.. {[a, b)}
Q--oo

similarly, for -00 <a< 00

v*{[a, oo)} = v.. {[a, oo)}


Thus, taking a = b,
v*{R} = v*{[ -00, oo)} = v.. {[ -00, oo)} = v.. {R}.
Hence, v.. and v* coincide on II' whence by Theorem 1 these a-finite measures
also coincide on !Jl = a(9'). 0

For any finite, nondecreasing function m on ( - 00, 00) the corresponding


measure v.. is called the Lebesgue-Stieltjes measure determined by m. Similarly,
the complete measure space (R, j {.. , v..) is referred to as the Lebesgue-
Stieltjes measure space determined by m.
In the important special case m(t) = t for t E ( - 00, 00), the corresponding
measure v = v.. is the renowned Lebesgue measure, generalizing the notion of
length; the sets of j { = j { .. are called Lebesgue-measurable sets and (R, j{, v)
is the Lebesgue measure space (of R).

EXERCISES 6.1
1. Let J.l be a measure on a a-algebra .91 and defined = {A AN: A E.9I, NcB E.9I,
J.l{B} = O}. Then .91 is a a-algebra, .fj => .91, and ji is a complete measure on .91,
where ji{A AN} = J.l{A} for A A NEd.
2. Prove that the extension v (as defined in Theorem I) of a measure J.l on .Y' to the
a-algebra .I( of all v-measurable sets is complete.
3. Let f map R = [ - 00, 00] into R with f(t - ) existing for every t E R. Then g(t) =
f(t - ), t E R, is left continuous, i.e., g(t) = g(t - ), t E R.

4. If v.. is the Lebesgue-Stieltjes measure determined by a finite nondecreasing function


mon( - 00, (0) and F(t) = m(t-), - 00 :;;; t :;;; oo,provethat for - 00 < a < b < 00
v{[a, b)} = F(b-) - F(a-), v{(a, b]} = F(b+) - F(a+),
v{[a, b]} = F(b+) - F(a-), v{(a, b)} = F(b) - F(a +),
v{{oo}} = v{{ -oo}} = 0, v{{a}} = F(a+) - F(a-).

5. If m(t) is as in Exercise 4 and G(t) = m(t +), - 00 :;;; t :;;; 00, then the nondecreasing,
right continuous function G determines a measure ji on the semi-algebra ii' of all
6.2 Integration in a Measure Space 171

finite or infinite intervals ofthe form (a, b], a < b, and also 0, R = [ - 00, 00], { - oo}
via jl{(a, b]} = G(b) - G(a), jl{{ -oon = Jl{0} = 0, jl{R} = G(oo) - G( -(0).
6. There is a I-I correspondence between dJ.s and probability measures on the Borel
sets of the line.
7. If (R,.n, v) is the Lebesgue measure space (of R), n = [0, I], .91 = n·.n, and
P = vi...., then (n, .91, P) is a probability space.
8. If X is a T.v. with dJ. F, then P{X+ = O} = F(O+), P{X+ < x} = F(x), x> 0, and
P{X- = O} = I - F(O),P{X- < x} = I - F(-x+),x > O.FindthedJ.ofIXI.
9. Give an example to show that the uniqueness assertion of Theorem I is not true
withouttherestriction ofa-finiteness on Y. Hint: Taken = {r: rrational,O::; r < I},
Y = {n· [a, b): 0 ::; a ::; b ::; I}, Jl(0) = 0 and Jl(A) = 00 if 0 ¥ A E Y, v(A) =
number of elements in A for A E a(Y), v*(A) = 2v(A) for A E a(Y).
10. If (R, .n, v) is the Lebesgue measure space of the real line and E is a Lebesgue
measurable set, so is E + x = {y + x: y E E} and, moreover, v{E + x} = v{E} for
every x E ( - 00, (0).
II. For any real x, y consider the equivalence relation x - y if x - y = r = rational.
Let the subset E of [0, I) contain exactly one point of each equivalence class. Then E
is a non-Lebesgue-measurable set. Hint: (i) if x E (0, I), thenx E E + r for some r in
(-I, 1). (ii)(E + r) n (E + s) = 0 for distinct rationals r, s. Thus, if E is Lebesgue
measurable so is S = U,eF(E + r), where F is the set of rationals in (-1, 1) via
Exercise 10. Then S c ( -1,2), whence 3 ~ v{S} = If
v{E}, implying v{S} =
I v{E} = O. However, by (i), [0, I) c U(E + r) = S, so that v{S} ~ 1, a contradic-
tion.

6.2 Integration in a Measure Space


Let (S, :E, JI.) constitute a measure space, that is, S is an arbitrary nonempty
set, :E is a a-algebra of subsets of S, and JI. is a measure on :E. In the event that
JI. is a-finite, (S, :E, JI.) will be termed a a-finite measure space. Any property
that obtains except on a set of JI.-measure zero will be said to hold almost
everywhere (abbreviated a.e.).
Let X be a :E-measurable function, that is, a mapping from S to R =
[ - 00, 00] for which X - 1(81) c :E, where 81 is the class of Borel subsets
of R. Then, paralleling the approach in Chapter 4, the integral

EX = Lx dJl.

may be defined as follows:


(i) If X ~ 0, a.e., and JI.{X = oo} > 0, then EX = 00, while if X ~ 0, a.e.,
and JI.{X = oo} = 0,

E X = hm

L ni JI. {in2
00

n-ooi=12
< X ::;
i+ I}
-n-
2
. (I)
172 6 Measure Extensions

(ii) In general, if either E X + < 00 or E X - < 00,

EX = EX+ - EX-, (2)


inwhichcaseE Xissaidtoexist,denotedbylE XI :s;; oo.IfiE XI < oo,thatis,
E X exists and is finite, X is called an integrable function, or, simply, in-
tegrable.
If IE X I :s;; 00, the indefinite integral of X is defined by

Lx dfJ. = EX/A' AEL.

As in Chapter 4, it is easy to verify that the limit in (1) always exists and
that:
o :s;; E X :s;; 00 if X ~ 0, a.e., E I = fJ.(S);
IXI < oo,a.e., ifEIXI < 00; X=O,a.e., iffEIXI=O;
EX = E·Y if X = Y, a.e., and IE X I :s;; 00.

From (I) and (2) it is readily seen that if (S, L i , J1J, i = 1,2, are measure
spaces with fJ.l = fJ.2 b:. and X is a L1-measurable function, then

Lx dfJ.2 = Lx dfJ.l (3)

in the sense that if one of the integrals exists, so does the other and the two
are equal.
Associated with any measure space (S, L, fJ.) are the spaces !l' p =
!l'iS, L, fJ.), P > 0, of all measurable functions X for which E I X IP < 00.
For any measurable function X, and especially when X E !l' P' the !l'p-norm
of X is defined by
(4)
Let {X n , n ~ I} be a sequence of measurable functions on the measure
space (S, L, fJ.). Jf fJ.{lim X n -:f. limn X} = 0, set X = lim X n whence lim X n =
X, a.e., denoted by X n ~ X. If X is finite a.e., X n will be said to converge a.e.,
denoted by X n ~ X, finite. Alternatively, if X n is finite a.e., n ~ 1, then X n
converges in measure to X, written X n -4 X, iflimfJ.{IX n - XI > t:} = o for
all t: > O. These are obvious analogues of a.c. convergence and convergence
in probability on a probability space, but the correspondence is not without
limitation (see Exercise 6.2.4). In case IIX n - Xll p = 0(1), the earlier notation
:I!
X n ~ X will be employed.
In dealing with the basic properties of the integral in Chapter 4, the proof
of Theorem 4.1.1(i) utilized the fact that for any nonnegative integrable LV.
X on a probability space (0, JF, P)

EX = lIm
n-oo

;=0
L -+-
00 1
n P
i 2
{i
-n < X < --
2 - 2
n .
i+ I} (5)
6.2 Integration in a Measure Space 173

The counterpart of (5) in the case of a nonnegative, integrable function X


on a measure space (S, ~, p.) is
. ex> i +
E X = 11m I - 2 n
{i 1 i+
p. Ii < X ~ - n - .
I} (6)
n-ex>i=1 2 2
n n
To prove (6), note that, setting Sn = L~ 1 (iI2 )p.{iI2 < X ~ (i + 1)/2n},
2"-1 i {i i+
s2n - sn ~ i~1 22n p. 22n < X ~ ~ ~ r
2n I} n
p.{2-2n < X ~ r } = 0(1)
since EX = limn_ex> Sn' Moreover, r 2n p.{X > 2- n} ~ 2- nE X = 0(1), so
that

whence
2-(2n-l)p.{X > r(2n-1)} ~ 2. 2- 2n p.{X > r 2n } = 0(1).
Consequently, the difference of the right and left sides of (6), viz.,
limn_ex> 2- np.{X > r n}, is zero.
It may be noted via (6) or (1) that if X is a nonnegative, integrable function
on (S,~, p.), then p. is IT-finite on S{X > OJ.

Theorem 1. Let X, Y, X n, Y" be measurable functions on the measure space


(S, ~,p.).

i. If X = If xnI A" ~ 0, where {An, n ~ 1} are disjoint measurable sets,


then
ex>
EX = L xnJ.l{A n} where o· 00 = 00 ·0 = O. (7)
1

ii. (a) If X ~ 0, a.e., there exist elementary functions X n with 0 ~ X n i X,


a.e., and E X n i E X. (b) If, moreover, X < 00, a.e., then X - X n ~ 2- n,
n ~ 1, a.e., is attainable.
iii. If X ~ Y ~ 0, a.e., then E X ~ E Y and for 0 < a < 00
EX ~ ap.{X ~ a}, (8)
iv. If 0 ~ X n i X, a.e., then E X n i E X (monotone convergence).
v. IfE X, E Y, and E X + E Yare defined, then so is E(X + Y) and

E(X + Y) = EX + E Y.
vi. If X ~ Y, a.e., IE X I ~ 00, lEY I ~ 00, then E X
~ E y.
vii. IfE X, E Y, a EX + bEY are defined for finite a, b, then
E(aX + bY) = aEX + bE Y. (9)
viii. X is integrable iff IX I is integrable and if X ~ 0, Y ~ 0, a.e., then
E(X + Y)p ~ 2P(EXP + E ¥P),p > O.
174 6 Measure Extensions

PROOF. The argument follows that of Theorem 4.1.1 with two modifications.
Firstly, in proving (i) write Ini = (iI2 n, (i + 1)/2n], P.ni = p.{ X' E Ini }, and replace
(13) of Theorem 4.1.1 by

Consequently, E X = L~ I xjp.{AJ via (6).


Secondly, in the proof of (iv), i.e., monotone convergence, replace (15),
(16) of Section 4.1 by

~~~ E X m ~ i~1 ~ ~~~p.{Xm E In;} ~ JI ~ p.{X E In;}


k • k •

> a,

utilizing Exercise 1.5.2. D

The next theorem incorporates analogues of Lemma 4.2.1, Corollaries


4.2.2 and 4.2.3, and Theorem 4.3.2.

Theorem 2. Let {X n, n ~ 1} be a sequence of measurable functions on the


measure space (S, L, p.).
1. If X n ~ 0, a.e., n ~ I, then
E lim X n :s; lim E X n (Fatou). (10)

II. IflXnl :s; Y, a.e., where E Y < 00 and either X n ~ X or X n -4 X, then

E IX n - X I- 0 (Lebesgue dominated convergence theorem). (11)


111. IfIXIPE!l'I' IYIP'E!l'I, where p> 1, p' > I, (lIp) + (lip') = I, then
XY E!l'1 and
EIXYI:S; IIXllpll YII p ' (Holder inequality). (12)
iv. If IE X I :s; 00, the indefinite integral v{ A} = JA X dp. is a a-additive set
function on L.
PROOF. The proofs of Holder's inequality and (iv) are identical with those of
Chapter 4, while the argument in Fatou's lemma is precisely that used for X~
in Theorem 4.2.2(ii). Apropos of (ii), suppose first that Xn~X, The
hypothesis ensures that X n E !l'1' X E !l'1' whence by (10)
E(Y ± X) = E lim(Y ± X n ) :s; lim E(Y ± X n),

implying EX :s; Iim,,-oo E X n :s; ITiii n _ oo E X n :s; E X, that is, IXnl :s; Y E!l' I
and X n ~X imply
EX = limE X n • (13)

Since 0 :s; IX n - XI :s; 2Y, (13) ensures that E IX n - XI- o.


6.2 Integration in a Measure Space 175

On the other hand, if X n J:... X, then (Exercise 3.3.13) every subsequence


ofXn,sayX~,satisfiessuPm>nll{IX~- X~I > £} = 0(1), all £ > O,andhence
has a further subsequence, say X nk , with X nk ~ Yo, finite. By the portion
already proved, E I X nk - Yo I = o( 1), whence X nk J:... Yo via (8). Thus X = Yo,
a.e.,andEIX nk - XI = 0(1). Consequently, every subsequence ofEI X n - XI
has a further subsequence converging to zero and so E IX n - X I = 0(1).
o
Although most properties of a probability space carryover to a measure
space, Jensen's inequality and the usefulness of uniform integrability are
conspicuous exceptions (see Exercises 12, 13, and 4).
Let (R, .Am, v:) be the Lebesgue-Stieltjes measure space determined by a
finite, nondecreasing function m on ( - 00, (0) and let X be a Borel function on
R with IE XI ~ 00. Then
Lx dm or L X(t)dm(t), A E .Am,

is just an alternative notation for the indefinite Lebesgue-Stieltjes integral


SA X dv:, A E .Am' that is, for the indefinite integral on the Lebesgue-
Stieltjes measure space determined by m. According to (3), if Vm = v: I.~,
then
LX dm = Lx dv: = Lx dVm' AE~,

since X is a Borel function. Consequently, when dealing with Borel sets B,


the Lebesgue-Stieltjes integral SB X dm may be envisaged as being defined
either in the Lebesgue-Stieltjes measure space (R, .Am, v:) or in the Borel
measure space (R,~, vm). Since vm{{oo}} = vm{{- oo}} = 0, the integral
S~ co X dm may be unambiguously defined as

r
J[-CO,CO]
X dm = f
(-co,co)
X dm = r
J[-CO,CO)
X dm = f(-co,co)
X dm.

However, if - 00 < a < b< 00, S: X dm is not, in general, well defined since
r
J[a,bl
X dm = [m(a+) - m(a- )]X(a)

+ [m(b+) - m(b- )]X(b) + f (a,b)


X dm

via additivity ofthe indefinite integral and Exercise 6.1.4. On the other hand, if
a and b are continuity points of m, i.e., m(b +) = m(b - ) and m(a +) = m(a - ),
S:
then X dm may be interpreted as the common value

r
~~~
X dm = f~~ X dm = f~~ X dm = r
~~~
X dm.

Thus, only when a and b are continuity points (including ± (0) of m will
the notationf: X dm be utilized for a Lebesgue-Stieltjes integral.
176 6 Measure Extensions

As in the case of Riemann integrals, if a < b,

fX dm = - fx dm

by fiat (under the prior proviso).


In the important special case m(t) = t, - 00 < t < 00, the Lebesgue-
Stieltjes integral SA X dm is denoted by SA X(t)dt and is called the Lebesgue
integral.
Let X be a measurable transformation from a measure space (S, L, J1.) to
a measurable space (T, d). Then, as in Section 1.6, the measure J1. induces
a measure v on d via
v{A} = J1.{X- I (A)}, AEd. (14)

In fact, v{0} = 0, v{T} = J1.{S}, and v is a-additive since if {An, n 2 I} are


disjoint sets of d, then X-I(A n ), n 2 1, are disjoint sets of Land
X-I(Uf An) = Uf X-1(A n)· The measure v, induced by J1. and the measur-
able transformation X, will be denoted by J1.X - I. The next result might well
be called the change of variable theorem since it justifies a technique im-
mortalized by integral calculus.

Theorem 3. Let X be a measurable transformation from a measure space


(S, L, J1.) to a measurable space (T, d) and let v = J1.X - I be the induced
measure on d. If g is a real d -measurable function on T, then

(15)

in the sense that if either integral exists, so does the other and the two are equal.
PROOF. Note that g(X) is a real L-measurable function. Since g = g+ - g-
and g± are measurable functions on (T, d), it suffices to prove that every
nonnegative d-measurable function is in~, where
~ = {g: g is a nonnegative d-measurable function for which (15) holds}.
Now by monotone convergence and linearity, ~ is a monotone system (see
Section 1.4), and if g = I A for any A Ed, then

1
f/ dv = {dv = J1.{X- (A)} = f/A(X)dJ1..

Thus, ~ contains all indicator functions of sets of the a-algebra d, whence by


Theorem 1.4.3, ~ contains all nonnegative d-measurable functions. 0

As an application of Theorem 3, let (n, fF, P) be a probability space and


X a r.v. thereon. Then X is a measurable transformation from (n, fF, P) to
(R, fA) (as usual, R = [- 00,00]) and in conjunction with P induces a
probability measure Vx = P X-Ion fA via
vx{B} = P{X-I(B)}, BE fA. (16)
6.2 Integration in a Measure Space 177

In particular, for - 00 < a <b< 00

vx{[a,b)} = P{a:S; X < b} = Fx(b) - Fx(a),


where Fx is the dJ. of X. Since Fx is left continuous, Theorem 6.1.2 ensures
that the measure Vx coincides with the restriction to !!4 of the Lebesgue-
Stieltjes measure determined by F x. Actually, if V x is completed to fj as in
Exercise 6.1.1, then Vx on fj is precisely the Lebesgue-Stieltjes measure on
.-ItF determined by F x (for a proof, see Halmos (1950, p. 56)).

Corollary l. If X is a LV. on a probability space (n, !F, P) with dJ. Fx and


g is a finite Borel function,

E g(X) = f:oog(t)dFX<t) = f:oot dFg(xlt) (17)

in the sense that if either of the integrals exists, so does the other and the two
are equal.
PROOF. Let Vx be defined on !!4' by (16). Then, as noted earlier, Vx is the re-
striction to !!4 of the Lebesgue-Stieltjes measure determined by F x. By
Theorem 3

E g(X) = Lg(X) dP = {g dvx = f:}(t)dF x(t) (18)

in the sense delineated therein. Replacing g(t) by t and X by g(X) in (18)


yields

E g(X) = f:oot dFg(xlt), (19)

again in the aforementioned sense. Thus (17) and hence the corollary is
~~. 0

It may be noted that E X has so far been calculated only when X is an


elementary function. Corollary I says explicitly (and Theorem 3 implicitly)
that E X may be replaced by a Lebesgue-Stieltjes integral, but this does not
resolve the problem of evaluation. Theorem 4 asserts that under modest
conditions the Lebesgue-Stieltjes and Riemann-Stieltjes integrals coincide,
and fortunately the latter is susceptible of calculation via (21)(x), (vii),
etc. (see below), as illustrated by Lemma 1.
A finite nondecreasing function m on ( - 00, 00) elicits, in addition to the
Lebesgue-Stieltjes integral SA X dm (defined for Borel functions X and Borel
sets A whenever hX dm exists), also the Riemann-Stieltjes integral

f f(t)dm(t), - 00 < a< b< 00.

(This will never be denoted by SD f(t)dm(t), where D is a closed, open, or


half-open interval with end points a and b, such notation being reserved for
Lebesgue-Stieltjes integrals).
178 6 Measure Extensions

If f(t) and m(t) are bounded functions on [a, b] the Riemann-Stieltjes


integral offwith respect to m from a to b is defined by

f
b n

a f(t)dm(t) = lim k~/(~k)[m(tk) - m(tk- 1 )] (20)

provided the limit, which is taken as maxI ~ksn (t k - t k- I ) -+ 0, exists


independently of the choice of ~k in [t k- I' tk], where
a = to < t I < ... < t n = b, 1 :s; k :s; n.

When f is continuous and m is finite and nondecreasing on [a, b], both


J~ f(t)dm(t) and J~ m(t)df(t) exist (see Widder (1961, Chapter 5». In the
special case m(t) = t, the integral defined by (20) coalesces to the Riemann
integral off from a to b and is denoted J~ f(t)dt.
Let J, fl'/2 be continuous functions and m, ml> m2 finite, nondecreasing
functions on the finite interval [a, b]. Then (Widder (1961, Chapter 5»,
if c denotes a finite constant,

i. f cf(t)dm(t) = c f f(t)dm(t), f f(t)d(m(t) + c) = f f(t)dm(t),

11. f dm(t) = m(b) - m(a),

fUI(t) + fit»dm(t) = f fl(t)dm(t) + f f2(t)dm(t),

111. f f(t)d(ml(t) + mit» = f f(t)dml(t) + f f(t)dmit),

IV. f f(t)dm(t) = f f(t)dm(t) + f f(t)dm(t), a < c < b,

v. f fl(t)dm(t) :s; f f2(t)dm(t) if fl(t) :s; fit) on [a, b], (21)

VI.I f f(t)dm(t) I :s; fI f (t) Idm(t) :s; [m(b) - m(a)] a~~~xb I f(t) I,
Vl1. f f(t)dm(t) + f m(t)df(t) = m(b)f(b) - m(a)f(a),

V111. f m(t)df(t) = m(a) f df(t) + m(b) f df(t) for some ~ in [a, b],
IX. f f(t)m(t)dt = m(a) ff(t)dt + m(b) f f(t)dt for some ~ in [a, b],

x. f f(t)dm (t) = f f(t)m'(t)dt,

provided m has a continuous derivative m' on [a, b].


6.2 Integration in a Measure Space 179

J:
The Riemann-Stieltjes integral f(t)dm(t) has been defined for finite
closed intervals [a, b] and exists iff is continuous and m is finite and non-
decreasing on [a, b]. Moreover, iff is continuous and m is finite and non-
decreasing on [a, (0) for some finite constant a, the definition may be extended
via

i'" f(t)dm(t) = !~~ f f(t)dm(t) (22)

provided the limit on the right exists. Analogously, for any finite constant b,
if f(t) is continuous and m(t) is finite and nondecreasing on ( - 00, b] (resp.
on ( - 00, (0» define

J
b

_ 00 f(t)dm(t) = a~i~oo L b

f(t)dm(t),

!~~ f
(23)
{oooo f(t)dm(t) = f(t)dm(t)
Q-+ - 00

provided the limits on the right exist (independently of the manner in which
b ---+ 00, a ---+ - (0). If a = - 00 or b = 00 or both, f(t)dm(t) is frequently J:
alluded to as an improper Riemann-Stieltjes integral.
The relationship between the Riemann-Stieltjes integral f(t)dm(t) J:
and the Lebesgue-Stieltjes integral Jla,b)f(t)dm(t) is embodied in

Theorem 4. If f(t) is a continuous function and m(t) is a finite nondecreasing


function on (- 00, (0), then ifm(a) = m(a-) and m(b) = m(b-),

ila,b)
f(t)dm(t) = fb f(t)dm(t),
a
-oo<a<b<oo; (24)

moreover, if I J(- 00. oo)f(t)dm(t) I ::; 00, then the Riemann-Stieltjes integral
J~ 00 f(t)dm(t) exists and

i_ 00, oo/(t)dm(t) = f:oo f(t)dm(t),


i_ oo.b/(t)dm(t) = f 00 f(t)dm(t) if m(b) = m(b - ), (25)

t, oo/(t)dm(t) = 1 00

f(t)dm(t) if m(a -) = m(a).

PROOF. Choose tt.Jn) such that m(tt.nl\


J}
= m(tt.) n) -),I -
< J' <
-
n,

2(b - a)
a = t\» < t\n) < ... < t~n) = b, max (t~n) - t~n~ 1) < .
1 skSn n
180 6 Measure Extensions

On [a, b) define
n

fn(t) = L f(tL~dI[I~"2"I~n)(t),
k=!

and note thatfn -+ fon [a, b) by the continuity off


Since the Lebesgue-Stieltjes measure v = Vm determined by m is finite
on the finite interval [a, b) and max a :5,:5blf(t)1 < 00, by the Lebesgue
dominated convergence theorem

i[a. b)
f(t)dm(t) = i [a. b)
f dv = lim
n
i[a, b)
fn dv

= lim E L f(tL~dI[I~n~l,Il:'»

f
n k=!

= li:,n kt f(t~n~ !)[m(t~n) - m(t~n~ I)J = f(t)dm(t)

since the Riemann-Stieltjes integral exists.


To prove (25), note that via the portion already proved

i eo)
[a.
f±(t)dm(t) =
b~eo
lim
m(b)=m(b-)
i [a,b)
f±(t)dm(t) = lim
b-eo
m(b)=m(b-)
fb f±(t)dm(t)
a

= leo f±(t)dm(t),
and the comparable statement with [a, (0) replaced by (- 00, b) follows
analogously. The remaining portion of (25) obtains by letting a -+ - 00
subject to m(a) = m(a - ). 0

The Lebesgue-Stieltjes and Riemann-Stieltjes integrals can be extended


to nonincreasing functions m in an obvious fashion by defining

{f(t)dm(t) = - {f(t)d( - m(t», BE fJI,

f f(t)dm(t) = - f f(t)d( -m(t», -oo.:s; a < b .:s; 00,


(26)

whenever the integrals on the right are defined. In the special case where
m = I - F with F a d.f., the latter may also be expressed as

f f(t)d[l - F(t)J = - f f(t)dF(t). (27)

It may be noted via the definition that whenever J~f(t)dm(t) exists for
- 00 < a < b < 00, so does J=b f( -t)dm( -t) and the two are equal.
The absolute moment EIXI', r > 0 always exists for any r.v. X, and its
6.2 Integration in a Measure Space 181

finiteness has been shown equivalent to convergence of the series


If' P{IXI ~ n t/'} in Corollary 4.1.2. It may be evaluated as a Riemann
integral involving the tails of the distribution of X as follows from Corollary 2
of

Lemma 1. If X is a r.v. with dJ. F and m(t) is a continuous nondecreasing


function on ( - 00, 00) with m(O) = 0, then
i. Elm(X)1 = J~<x,F(t)dm(t) + J~ [1 - F(t)]dm(t),
ii. if IE m(X)1 < 00, E m(X) = J~ [1 - F(t)] dm(t) - r~-oo F(t) dm(t).
iii. E m( IX I) = J~ [1 - F(t) + F( - t)]dm(t).
PROOF. By Theorem 4, the discussion following, and integration by parts
((21)(vii», if C > 0 and G = 1 - F,

Elm(X)1 = f:oo Im(t)IdF(t) = {OO m(t)dF(t) - foo m(t)dF(t)


= lim [- (m(t)d[1 - F(t)] -
c-oo Jo
fO-cm(t)dF(t)]
= 1~~ [m( -C)F( -C) + f/(t)dm(t) - m(C)G(C)

+ I: G(t)dm(t)] ~ f 00 F(t)dm(t) + {OO G(t)dm(t). (28)

Thus, in proving (i) it may be supposed that Elm(X)1 < 00, and so, since

o ~ m(C)G(C) - m( -C)F( -C) ~ Loom(t)dF(t) - r-: m(t)dF(t)

= i [Itl2C)
Im(t)ldF(t) = 0(1)

as C -+ 00, (i) follows from (28).


Part (ii) follows by applying (i) separately to m+ and m-. Likewise (i)
implies (iii) in view of 1 - F1XI(t) = P{IXI ~ t} = G(t) + F( -t+) and
Exercise 6.2.6.

Corollary 2. For any LV. X with dJ. F,

EIX!' = r {OOt'-I[l - F(t) + F(-t)]dt, r> 0, (29)

and ifE X 2k + I < 00 for some k = 0, 1, ... , then

E X 2 k+ 1 = (2k + 1) {OO t 2k [1 - F(t) - F( - t)]dt. (30)


182 6 Measure Extensions

PROOF. The first statement is an immediate consequence of(iii) and (x) of(21),
while the second follows similarly from (ii). 0

The following combines Example 2.1.1 and Corollary 4.2.4:

EXAMPLE 1. Let {Xn, n ~ O} be nonnegative functions with X n E !l'1(S, I:, p.)


for n ~ 0 and E X n - E X o. If either X n .4 X 0 or X n~ X 0' then
EIX n - Xol- O.
PROOF. If y" = (X 0 - X n) +, then y,,::; X 0' n ~ 1. Since either y".4 0 or
y" ~ 0, by Lebesgue's dominated convergence theorem E(X o - X n)+ =
E y" - O. Then E(X o - X n) - 0 implies E(X o - X n)- - 0, and so
EIX o - Xnl--+ O. 0

Two alternative types of truncation of r.v.s have already been utilized in


Theorem 4.2.2 and Theorem 5.1.3, 5.2.4, etc., and it is of interest to compare
the corresponding expectations.

EXAMPLE 2. If X is a nonnegative r.v. with dJ. F and


y = min(X, s), z = XIIX<s), (31)
then for any choice of rand S in (0, 00)

E yr = r {tr-l[l - F(t)]dt, (32)

E zr = E yr - s'[l - F(s)] = {t r dF(t). (33)

PROOF. If G is the dJ. of y, then


G(x) = F(x)I[x<s) + Ilx~s),
whence by Corollary 2

E yr = r {"'('-I[1 - G(t)]dt = r {tr-l[l - F(t)]dt.


Since Z = YIIX<sj, via (21)(vii)
E zr = E yrIIX<s) = E yr - E yrIIX~S)

= r {('-I [1 - F(t)]dt - s[1 - F(s)] = {t r dF(t). 0

EXERCISES 6.2
l. If 111,112 are measures on (n, §) and S X d(1l1 + 112) exists so does S X dill' i = 1,2,
and S X d(llt + J,l2) = S X dJ,lI + S X d1l 2 ·
2. The integral of a nonnegative measurable function over a set of measure zero has the
value 0; also, if SA 9 dJ,l = 0 for every measurable set A, then 9 = 0, a.e.
6.2 Integration in a Measure Space 183

3. If S is the set of positive integers, ~ is the class of all subsets of S, and J.l(A) = number
of integers in A E ~, then (S, ~, J.l) is a non finite measure space and convergence in
measure is equivalent to uniform convergence everywhere.
4. Demonstrate in a nonfinite measure space (S, ~, J.l) that X. ~ X does not neces-
sarily imply X. 4 X. Hint: Utilize Exercise 3. If X. 4 X, does X. Y 4 XY?
~
5. If X. E .PiS, ~, J.l) and X. -4 X for some p > 0, then X E.P p and EI X.I P --+ EIXI P•
6. Iff is a finite, nondecreasing function and m is a continuous nondecreasing function
on [a, b], where - 00 < a < b < 00, then

f f(t)dm(t) = f f(t + )dm(t) = f f(t- )dm(t).

7. Prove Minkowski's inequality: If Xi E .Pp, i = 1,2, and p ~ I, IIX I + X zllp :s;


IIX dip + Il X zll p ·
8. If F ixi is the dJ. of lXI, verify that for every c > 0

f't
e
dF,xl(t) = c P{IXI ~ c} + f'e
P{lXI ~ t}dt.
(i) Show that a sequence ofr.v.'s {X., n ~ I} is u.i. iffsup.~ 1 S:'
P{IX.' ~ t} dt --+ 0
as c --+ 00.
ALV. X is said to be stochastically larger thanaLv. YifP{X ~ x} ~ P{Y ~ x} for
all x. (ii) If the LV.S X., n ~ I, are u.i. and IX.I is stochastically larger than IY.1.
n ~ I, then {y", n ~ l} is u.i.
9. If {X.,n ~ I} are.P I LV.S with a common distribution, then Emax1,;i,;.IXd =
o(n). Hint: Use Exercise 8 to establish u.i.
10. Show that the analogue of (2IXiv) for Lebesgue-Stiettjes integrals is not true in
general. Construct an example for which the Riemann-Stieltjes integral over a
finite interval [a, b] fails to exist.
II. Establish that g(t) = (sin t)/t is Riemann but not Lebesgue integrable over ( - 00, 00)
and find a function g(t) which is Lebesgue integrable but not Riemann integrable.
12. Let S= {t,2}, ~ = {{I}, {2},0,S} and J.l = counting measure. If Xes) =1,
Jensen's inequality fails for the convex function XZ.
13. Let S = {I, 2, ... }, ~ = {A: A c S}, J.l = counting measure on ~. If X.(s) =
n- 11(1 ,;s,;.I' then X. ~ 0 and EX. =
I despite the fact that for J.l{A} < l, E X.I A
= O. Thus, Theorem 4.2.3(i) may fail in a a-finite measure space.
14. If f, m are finite and nondecreasing on (- 00, 00) with f continuous, prove that
SID.b1f(t)dm(t) + SID.b] m(t)df(t) = f(b)m(b+) - f(a)m(a-).
15. Let P. E (0, I), q. = I - P., where np. --+ A. E (0, 00). Let J.l be counting measure on
the class of all subsets of!l = {t, 2, ...}. If XU) = ;Je-),/j! and X.U) = (j)~q:- i,
IJ !£p
provethatX. -+ XandX.~ X,p ~ 1. Hint: Apply Example 1 or Example 2.1.1.
16. (Erickson) In Example 5.4.1 thefunction a(x) = x/J~ [I - F(y)]dy was encountered,
where F is the dJ. of a nonnegative LV. X. (i) Show that a(x) is nondecreasing. (ii)
Prove that EX = 00 implies E a(X) = 00. Hint: E a(X) < 00 entails a(x) =
0«(1 - F(xW 1) and hence E X/g y dF(y) < 00, contradicting the Abel-Dini
theorem.
184 6 Measure Extensions

17. Random variables {IXnIP,n;;;: I} are u.i. iffsuPn"IJ~tP-lP{lXnl >t}dt-+O as


K -+ 00.
18. (Young's inequality) Let rp be a continuous, strictly increasing function on [0, (0)
with rp(O) = O. If t/t is the inverse function (which therefore satisfies the same con-
ditions) then for any a ;;;: 0, b ;;;: 0

ab:;;;, frp(X)dX + s: t/t(x)dx

and equality holds iff b = rp(a).

6.3 Product Measure, Fubini's Theorem,


n-Dimensional Lebesgue-Stieltjes Measure
Let (ni , $'i' JJ.), i = 1,2, denote two measure spaces. Ignoring for a moment
the measures, the spaces engender (Section 1.3) a product measurable space
(n, $'), where n = Xl;l ni = n l x n z and $' = Xl; I $'i = $'1 X $'z =
a({A I x A z : Ai E $'i' i = I, 2}).
For any set A E $', the sections

are $' 2- and $' I-measurable respectively according to Theorem 1.4.2. Thus,
JJ.z{A(l)(wI)} and JJ.l {A(2)(wz)} are well-defined real functions, the first on n l
and the second on nz . For notational simplicity these will be denoted by
JJ.Z{A(l)} and JJ.l{A(Z)} respectively.
Now if JJ.I and JJ.z are finite measures,
sf = {A E $': JJ.3-j{A(j)} is $'i-measurable, i = 1,2}
is a A.-class containing all rectangles with measurable sides, whence sf :::l $',
that is, JJ.3-d A(i)} is ~-measurable i = 1, 2; this carries over to the case
where JJ.i is a-finite, i = 1,2, since if n j = U;;"
I Bi,i' where Bij are disjoint

sets of $'j, JJ.i{Bij} < oo,j ~ I, i = 1,2, then n l x n 2 = Ui'.h (B li , x B Zh )


and every measurable set A = UiI,h A(B liI x B Zh ) with JJ.i{A(3-i)} =
Lh.h JJ.i{[A(B Iii x B zh )]l3 - i)} being $' 3 _ i-measurable, i = I, 2. Thus, when
(ni , $'i, JJ.) are a-finite measure spaces, i = I, 2, and A E $' = $' 1 X $' Z'
JJ.3 -i {A (i)}'IS Ja;; i-measura bl e, I. -- I , 2.

Theorem 1. If (nj , $'i' JJ.), i = 1, 2, are a-finite measure spaces and n =


nl x n z ,$' = $'1 x $'z,thenthereexistsQa-finitemeasurespace(n,$',JJ.)
for which
JJ.{B I x B 2} = JJ.I {B I } . JJ.z{B z } if Bi E $'i' i = 1,2; (1)
moreover, the measure JJ. is uniquely determined by (I) and, furthermore,for
every A E $', JJ.3 _i{A(i)} is $'j-measurable, i = 1,2, and

(2)
6.3 Product Measure, Fubini's Theorem, n-Dimensional Lebesgue-Stieltjes Measure 185

PROOF. For A E fF, define J.l by the first equality in (2). Then J.l ~ 0 and
J.l{0} = O. Since (Uf An)(Z) = Uf A~) for every sequence {An} of fF sets,
and disjointness of {An} entails that of {A~Z)}, the set function 11 so defined
is a measure on fF, whence (il, fF, J.l) is a measure space. Moreover, if
A = B I X B z , where B i E .-7;, i = 1,2, then

so that (1) holds. Now (Exercise 1.5.3),


'§ = {B I x B z : B i E fF i , i = 1,2}
is a semi-algebra and, furthermore, since III and J.lz are a-finite, ji = J.l1(§ is
a a-finite measure on '§.
By Theorem 6.1.1 the extension 11 of ji is a-finite on g; = a('§) and
uniquely determined by ji. Finally, if

1l*{A} = i
0,
Ilz{A(l)}dll j, A E g;,

then, by analogy with the preceding, J.l* is a measure on fF satisfying (1),


whence by uniqueness J.l = J.l*, i.e., (2) holds. D

The measure J.l defined in (2) is called the product measure of the a-finite
measures J.lj and J.lz and is denoted by J.l1 x 117.' The a-finite measure space
(il j x il z , fF j X fFz, J.lI x J.lz) is referred to as the product measure space
(of the a-finite measure spaces (ili' fF i , J.la, i = 1,2).
In situations where more than one measure is floating around, almost
everywhere statements must be qualified. Thus, a.e. [J.lj] abbreviates the
statement" except for a set of J.ll-measure zero."

Corollary I. If (il, fF, J.l) is the product measure space of the a-finite measure
spaces (ili' fF i , J.la, i = 1,2, and J.l{A} = Ofor some A E fF, then J.lj {A(Z)(wz)}
= 0, a.e. [J.lz], and J.lz{A(l)(w j)} = 0, a.e. [J.lI].

If X = X(w j, w z ) is a function on the product measurable space


(il j x il z , fF j x fFz), then the functions X~/(wz) defined for each w j E il j
by

X~/(Wz) = X(w 1, wz)


are called sections of X at W j • Analogously, the functions X~l(wI) are
sections of X at W7.' It follows directly from Theorem 1.4.2(ii) that every
section at Wi of an .'ff'l x fF z-measurable function X is an fF 3 _ i-measurable
function, i = 1,2. Define for i = 1,2

A 3-i = {W 3-i : i{ll


X+(w j, WZ)dJ.li = i Or
X-(w j, WZ)dJ.li = oo}. (3)
186 6 Measure Extensions

Theorem 2 (Fubini). If (0, :IF, J.L) is the product measure space of the a-finite
measure spaces (0;, :IF i, J.LJ, i = 1, 2, and X is an :IF-measurablefunction whose
integral exists, then

is:IF 3_cmeasurable, i = 1,2, and

moreover, in the case where X is integrable so are almost all sections of X at


wJor i = 1,2.
The general case follows from consideration of X +, X -, and so
°
PROOF.
X 2: may be so supposed; moreover, J.Ll and J.L2 may be assumed finite.
Let ytJ = {X: X 2: 0, X is :IF-measurable, So.
X(w 1 , w 2)dJ.Li is :IF 3 _i-measur-
able, i = 1,2, and (4) holds)}. Then ytJ contains all indicator functions of sets

°
of:IF by Theorem 1. Since ytJ is a monotone system, ytJ contains all nonnegative
:IF -measurable functions by Theorem 1.4.3. Moreover, if X 2: is integrable,
it is clear from (4) that so are S02 X(Wl> w 2)dJ.L2 and SOl X(Wl, w2)dJ.Ll for
almost all WI and W2 respectively. 0

If (OJ, :lF j, J1.i)' 1 ~ i ~ n, are a-finite measure spaces and (O,:IF) =


X7; I (Oi' :lFJ is the product measurable space, it can be proved inductively
that there is one and only one measure J.L on X7; I :lF i such that for every
measurable rectangle B I X ... x Bn

n J.Lj{BJ.
n

J.L{B I X ••• x Bn} =


i; I

Then J.L is called the product measure and denoted by J.Ll x ... x J.Ln or
X7; I J.Li· Associativity of ordinary multiplication together with Theorem 6.1.1

guarantees that X7; I J.Lj = (X~ 1 J.LJ x (X~+ I J.LJ for any integer m in [1, n).
The measure space (0, :IF, J.L) = (X~ OJ, X7; X7;
I :lF j , I J.LJ is alluded to as

the (a-finite) product measure space of (OJ, :lFj, J.Li), 1 ~ i ~ n, and is denoted
by X7; 1 (Oi, :lF j , J.LJ The extension of Fubini's theorem is immediate, and

so the integral of any nonnegative :IF -measurable or J.L-integrable function on


the product measure space X7; 1 (OJ, :lFi , J.Lj) may be evaluated by iterated
integrals in any order (i.e., via an n-fold iteration of integrals analogous to (4),
where n = 2).
In the important special case where Oi = R = [ - oo,oo],:lF i = fJI = the
class of all (linear) Borel measurable sets and J.Li = J.L = Lebesgue measure,
I ~ i ~ n, the corresponding product measure space (R n, fJln, J.Ln) is the
n-dimensional Borel measure space.
In Section 1.6 it was observed that the d.f. Fx of a LV. X on a probability
6.3 Product Measure, Fubini's Theorem, n-Dimensional Lebesgue-Stieltjes Measure 187

space (0, fF, P) is a dJ. in the sense that it is a nondecreasing, left-continuous


function on R with
F( - (0) = F( - 00 +) = 0, F(oo) = F(oo -) = 1.
The converse was mentioned there but its proof need be deferred no longer.
If G is an arbitrary dJ. on the line, there always exists a r.v. X on some
probability space with F x = G. It suffices to take n = R, fF = fJI = linear
Borel sets, X(w) = w (the coordinate r.v.) and to define P{[a, b)} = G(b) -
G(a), - 00 ~ a :s; b :s; 00. By Theorem 6.1.2 this ensures a unique probability
measure P on fJI, and, clearly,
F x(x) = P{X(w) < x} = P{w: w < x} = G(x).
This statement and indeed the notions of Lebesgue-Stieltjes measure,
Lebesgue-Stieltjes integral, etc., extend readily to Rn. An n-dimensional
distribution function or d.f. on Rn is a function F satisfying
lim F(XI"'" x n) = F(XI" .. , x j _" - 00, x j + I' ... , x n) = 0, l:S;j:S;n
Xj--OO

(5, i)
lim F(x ... , x j _" Yj, x j + I,· .. , x n) = F(x . .. , x j , . .. , x n),
Xj>Yj-Xj " "
1 :s; j :s; n, (5, ii)
n

11:· b F == F(b" ... ,bn) - LF(b" ... ,bj_"aj,bj+" ... ,bn)


j; 1

+ L F(bl,···,bj-l,aj,bj+I, ... ,bk_l,abbk+I,···,bn) - ...


1 $j<k$n

+ ( -1)nF(a ... , an) 2 0, (5, iii)


"
lim F(x" ... ,xn) = F(oo, ... ,oo) = 1, (5, iv)

whenever - 00 :s; aj :s; bj :s; 00, 1 :s; i :s; n.


Let !/ be the class of all finite or infinite intervals [a, b) together with 0,
R, {oo} and define
!/n = {S:S = SI x S2 X •.. X Sn,SjE!/, l:s; i:S; n}

and
n

11:· b F when S = X Sj with either Sj = [aj, bj) or


j; 1

P{S} = (6)
n

o if S = X Sj
j; 1
with some Sj =0 or {oo}.
188 6 Measure Extensions

where F satisfies (5, i)-(5, iii) and is finite on (-co, co)". As in Lemma 6.1.1,
P is a measure on the semi-algebra Y", and so by Theorem 6.1.1, P can be
extended to .A", the class of all P-measurable sets, the extension being unique
on the class (fI" of n-dimensional Borel sets. The measure P and the measure
space (R", --It", P) are called respectively the n-dimensional Lebesgue-
Stieltjes measure and the n-dimensional Lebesgue-Stieltjes measure space
determined by F. The same terms respectively are used for the restriction of P
to f!J" and (R", fJI", P). If F is a dJ., that is, satisfies (5, iv) also, then (R", fJ4", P)
is a probability space.
If X" ... , X" are LV.S on a probability space (0, !IF, P), their joint dJ.
F x, ..... xn(Xt, ... , x") = P{X I < XI' ... , X" < x"}
is readily verified to be a dJ. on R" in the sense of (5). Conversely, given any
n-dimensional dJ. F as in (5, i)-(5, iv), there always exist n r.v.s X I"'" X"
on some probability space (0, !IF, P) whose joint dJ. is the preordained F.
It suffices to choose 0 = R",!IF = (fI", Xi(w) = ith coordinate of w, I ~ i ~ n,
and to define P on Y" via (6). According to the prior discussion, P is uniquely
determined on the Lebesgue-Stieltjes measure space (R", (fI", P). Moreover,

P{X I < x("",X n < x n } = P{w:w i < Xi' I ~ i ~ n}


= lim A~·xF = F(x l , . . . , x n).
Q- - 00

Thus, there is a one-to-one correspondence between distribution functions on


R n and probability measures on (R n, (fin).
If Y = Y(w(, ... , w n) is an integrable function on the Lebesgue-Stieltjes
measure space (Rn, .An, P), the integral of Y . I A for any A E (fin is denoted by
S YI A dP or by
r~· f Y(w" ... , wn)dF(w t , .. ·, w") (7)

and abbreviated (when the context is clear) by SA Y dF.


The analogue of the first part of Corollary 6.2.1 is

Theorem 3. If X = (X I' ... , X m) is a random vector with df Fx and g is a


Borelfunction on Rm,

Eg(X) = f g(tl, ... ,tm)dFx(tI, ... ,tm)


JRm
in the sense that if either of the integrals exists, so does the other and the two
are equal.
It should be noted that the iterated integrals (4) of Fubini's theorem may
also be written as in

(8)
6.3 Product Measure, Fubini's Theorem, n-Dimensional Lebesgue-Stieltjes Measure 189

Definition. Convolution of d.f.s is a binary operation denoted by *, which


associates with every pair F I' F 2 of dJ.s, adJ. F, denoted by F I * F 2, and
defined by

(9)

The dominated convergence theorem permits limits to be taken inside the


integral and thus the convolution F t * F 2of any two dJ.s is again adJ.
The primary interest of convolution for probability theory stems from

Theorem 4. If X I and X 2 are independent LV.S on some probability space


(n, :IF, P) with corresponding dJ.s F I and F 2' then their sum XI + X 2 has
the dJ. F I * F2 .
PROOF. Since, via the independence of X 1 and X 2' the mapping
W --+ (X I(W), X iw)) takes P-measure on:IF sets into the product (F 1 x F2 )-
measure on the Borel sets of the plane, by Theorem 3 and Fubini's theorem

P{X I + X 2 < x} =
Jor I[x,+x,<X](w) dP(w) = J[z+y<X]
r d(F1 x F2 )(z, y)

= f:oo f:/[z+y<X](Z, y)dF (z)dF iy)


I

= f:ooFI(X - y)dFiy),
and so the dJ. of XI + X 2 is F I * F2 • 0

The convolution formula (9) may be paralleled by a convolution of the


measures F I {-}, F 2 {·} determined by the dJ.s F I, F 2 . To this end, let the
translation of the Borel set B by the amount y be denoted by B - y =
{x - y: x E B} and define

F{B} = f:ooFdB - y}dF 2 (y). (10)

It is a simple matter to check that F is a probability measure on the Borel sets


of the line. Moreover, when B = ( - 00, x), (10) reduces to (9). Then by
Theorem 6.1.1 the measure F{.} of (10) is that determined by the dJ. F 1 * F 2
of (9), thereby justifying the common name F. Furthermore, for any Borel
function 9 integrable relative to F I * F 2,

When 9 is the indicator function I B of some Borel set B, then g(x + y) =


I B - y for every real y, whence (11) coalesces to (10). Hence, by Theorem
1.4.3, (11) holds for any (F 1 * F 2)-integrable function.
190 6 Measure Extensions

EXAMPLE 1. Let (r, §) and (A, <§) be measurable spaces with v a measure on
<§ and JL" {A} a function on § x A such that JL" {.} is a measure on § for

t
almost all A E A and moreover JL" {A} is <§-measurable for every A E §. Define

JL{A} = JL,,{A}dv(A), AE§

and for any § -measurable function f

L f(y)dJL,,(Y) = [L f+ (y)dJL,,(Y) - L f- (y)dJL,,(y)] I[frJ< d/l'=oo= frf- d/l,)C

Then JL is a measure on §, fr f(y)dJLh) is <§-measurable and


L f(y)dJL(Y) = t (L f(y)dJLAY ») dV(A) (12)

if either integral exists.

PROOF. Let A' = {k JL" is a measure on §}. Then JL{A - A'} = 0 whence
JL{A} = fA'JL,,{A}dv(A). Via the monotone convergence theorem, JL is a mea-
sure on §. As usual, in verifying (12), it suffices to consider non-negative f.
fr
Let Yl' = {f ~ 0: f is §-measurable, f(y)dJL,,(Y) is <§-measurable and (12)
holds}. Then IA E Yl' for all A E § by definition and moreover by Theorem
6.2.1 Yl' is a monotone system. Thus, by Theorem 1.4.3, Yl' contains all
non-negative § -measurable functions. D

EXERCISES 6.3
1. Let (OJ, ff j , P j ) be a probability space, where OJ = (- 00, (0) and ff j = {B: B is a
Borel subset of(-oo,oo)},i= 1,2.U(Q,ff,P) = (0. x 0z,ff. x ffz,P. x P z)
and if Xlw) = Wj, i = 1,2, for W = (w., wz)e 0, then X. and X z are independent
LV.S on (n, ff, P).
2. un l= Oz = [0, 1], ff. = ff z = [0, 1] .~, fl. = Lebesgue measure, and fl2{A} =
number of points in A, then (OJ, ffi> flJ, i = 1,2, are measure spaces. The set
A = {(w.,wz):w. = wz}eff l x ffz,but

whence Theorem 1 as well as the uniqueness part of Theorem 6.1.1 is invalid if (1-
finiteness is omitted.
3. For any (not necessarily (1-finite) measure spaces (OJ, ff j , flj), i = 1,2, and any
integrable function X on the product measure space (n, ff, fl),

Hint: Consider X ~ 0 and utilize the remark just prior to Theorem 6.2.1.
6.4 Infinite-Dimensional Product Measure Space, Kolmogorov Consistency Theorem 191

4. For j ~ I, define a i ,2j-1 = (2 j - l)i-I/2 jj and a i ,2j = - a ,2j+I' i ~ l. The row


j

sums of the double series Lj au are simply the elements of the first column. What are
the column sums? Taking III and 112 as counting measure, does this example con-
tradict Fubini's theorem?
5. Show that g(wl' W2) = e-w,w, - 2e- 2w ,w, is Lebesgue integrable (i) over Q 1 =
[I, 00) for each W 2 and (ii) over Q 2 = (0, IJ for each WI' but that Fubini's theorem
fails. Why?
6. An alternative construction of a LV. X on a probability space with a preassigned dJ.
F is to take Q = [0, I J, ff = Borel subsets of [0, I J, P = Lebesgue measure on
[0, IJ, and X(w) = F-I(w), where F-I(w) = sup{x: F(x) < w}.
7. Prove that the random vectors X = (X I' ... , Xm)and f = (fl , ... , fn)on (Q, ff, P)
are independent of one another iff their joint dJ. F x, y = F x . F y and conclude that X
and f independent entails (R m+n, gr+n, vx, y) = (R m, [JIm, Vx) X (Rn, [JIn, Vy), where
vx, Vy, and VX,y are the Lebesgue-Stieltjes measures determined by F x , F y, F x . y
respecti vely,
8. Random variables XI' X 2 have a bivariate Poisson distribution if
min(j. k) ai aj - iak - i
P{X = ' X = k} = e-(a,+a,+a12) " 12 I 2
I j, 2 L.
i=O I. ")'(k
"("j _ I . _ I')'.

for any pair of nonnegative integers (j, k), where aI' a2' al2 are nonnegative param-
eters. Define a probability space and LV.S X I' X 2 on it whose joint distribution is
bivariate Poisson and show that X j is a Poisson LV. with mean aj + a 12 . Prove that
the correlation p(X I> X 2) ~ 0 and that X I and X 2 are independent iff a l 2 = O.
9. Random variables XI" .. , X k have a multinomial distribution if P{X i = Xi' 1 ::;;
i ::;; k} = n! n~=1 (PN(Xi!» for any nonnegative integers Xi' I ::;; i ::;; k, with
L~ Xi = n and zero otherwise. Here, n is a positive integer and L~ pj = I, Pi> 0,
I ::;; i ::;; k. Prove that if {A 1 ::;; i ::;; k} is a partition orQ in ff with Pi = P{A;} and
j ,

Xi = number ofoccurrences of Ai in n independent trials, I ::;; i::;; k,thenXI,,,,,X k


have a multinomial distribution. Show that Xi is a binomial LV. with parameters Pi'
n and that p(X j , X) < O.
10. Random \ariables {Xi' 1 ::;; i::;; n} are called interchangeable if their joint dJ. is a
symmetric function, i.e., invariant under permutations, and LV.S {X n , n ~ I} are
interchangeable if every finite subset is interchangeable. Prove that if {X n , n ~ l}
are!:£'2 interchangeable LV.S, then p(X I , X 2) ~ O. Hint: 112(L~ X) ~ O.

6.4 Infinite-Dimensional Product Measure


Space, Kolmogorov Consistency Theorem
In determining a product measure on an infinite-dimensional product
measurable space (Section 1.3) it is well to keep in mind the distinguished
role played by the number 1 in infinite (and even finite) products.
192 6 Measure Extensions

Let (a j, §j, PJ, i ~ I, be a sequence of probability spaces and define


n

a~ = Xaj,
j= I

00 n 00

§= X§.· §'=X§· §"= X§·"


j= I " n "
I
n
n+ 1

'§n = {A: A = Bn x a; where Bn E §~},

Then for each n ~ I, (a;, §;) is a measurable space, {'§n} is an increasing


sequence of sub-a-algebras of §, and '§ is the algebra of cylinders of §. For
A = Bn x a; E '§n, define
P{A} = (Pt x P 2 x ... x Pn){Bn}.

Note that P is well defined on '§, since if A = Bn x a; = Bm x a;;. E '§n' '§m


where, say m > n, then necessarily Bm= Bn x an + I x ... x am, whence

(PI X ... x Pm){Bm} = (PI X ..• x Pn){Bn}· n Pj{ail


m

i=n+ I

Clearly, P is a-additive on '§n, n ~ I, and additive on the algebra ,§, and,


moreover, P{Q} = I.
Now a = a~ x a;,§ = §~ x §;,n ~ l,thatis,ifwEn,w=(w~,w;),
where w~ E a~, w; E a; for n ~ I. For any A c a, set
A(w~) = {w;: (w~, w;) E A}, A(w;) = {w~: (w~, w;) E A}.
By Theorem 1.4.2, if A E §, then A(w~) is §;-measurable and if A E '§, say
A = Bm x a;;. E '§m' where m > n, then
A(w~) = {w;: (w~, w;) E Bm x a;;.}
= {(w n+ I"'" wm): (WI"'" w m) E Bm} x n;;. (I)
= Bm(w~) x a;;.

is an (m - n)-dimensional cylinder in (n~, §~) so that if pIn) is the analogue


of P for the space (n~, §~)
p(n){A(w~)} = (Pn+l x ... x Pm){Bm(w~)}.

Hence, via Theorem 6.3.1 and Fubini's theorem,


P{A} = (PI x ... x Pm){Bm}

= r (P
Jn~
Pm){Bm(w~)}d(PI x ... x P
n + 1 X ... X n)

= r ... r p(n){A(w~)}dPl(wd"'dPn(wn)'
Jnn Jn,
(2)
6.4 Infinite-Dimensional Product Measure Space, KolmogoroY Consistency Theorem 193

Theorem 1. If (Qj, fF j , P), i~ 1, is a sequence of probability spaces, there


exists a unique probability measure P on the product cr-algebra I fF j such X;-;
that for every cylinder A = B n x Q n + I X Q n + 2 X ... with B n E X~ fF;,
(3)
PROOF. Clearly, P, as defined by (3), is nonnegative and additive on the
algebra t§ with P{Q} = 1. In view of Theorem 6.1.1 it is sufficient to verify
cr-additivity of P on t§ and thus by Theorem 1.5.2(iv) to prove that An E t§,
n ~ I, with An 1 and infn 2 I P{A n} = E > 0 imply I An -:f. 0. To this end,
set Dn = {w t : P(1){A n(wI)} > E/2}, noting that the prime in (2) is super-
n:=
fluous since 0.'1 = Qt. Since

E:S; P{A n } = i (},


P(t){An(wt)} dP I

= Ln P(1){A n(WI)}dP t + Lt(1){A n(WI)}dP I


E
:s; PdDn } + 2'
necessarily P dD n } ~ E/2, n ~ 1. Now {D n } is a decreasing sequence of fF I
sets, whence, since P I is a probability measure on fF I' Theorem 1.5.2(iv)
ensures the existence of a point wT E 0. 1 with wT E D n. Thus, An(w!), nl"
n ~ l,isadecreasingsequenceoft§(1)setswithP(1){A.(w!)} ~ E/2,n ~ land
the argument just offered with 0., {An}' E applies to Q'{, {An(w!)}, E/2, and
yields a point w! E 0. 2 with P(2){A n(wT, wn} ~ E/4, n ~ 1. Continuing in
this fashion a sequence (wT, w!, ...) is obtained with w: E Qm and

p(m){An(wT, ... , w:)} ~ 2Em ' m = 1,2, ... ;'n = 1,2,.... (4)

To prove that w* = (wT, w!, ...) E nl"


An, note that since for each n ~ I,
An = B m X Q~ for some m ~ 1, necessarily
*
A n(WI' *) = {Q~ if (wT,· w:) E Bm
0
0"

.. 0, W m Of *
1 (w t , 000'
*) J
W m 'F
Bmo

But (4) ensures that An(wT, w:) -:f. 0. Thus, (w*,


0 0 0' 0 0 0' w:) E Bm, whence
(wT, .. 0,w:, w:+ ... . 0) E Bm x Q~ = An, n ~ 1. 0

Theorem 1 establishes that a sequence of probability spaces (Qj, fF j , P)


i ~ 1, engenders an infinite-dimensional product measure space (X;-; I Qj,
X;-; I fF j , X;-; I P j) such that for every n = 1,2, denoting X;-; I P j by P,
0 0"

The proof is based on the fact that if t§n = {A: A = Bn x Q~} is the class of
cylinders with n-dimensional bases B n E Xi= I fF j and
t\{A} = (PI x 000 x Pn){B n},
194 6 Measure Extensions

then Pn is a probability measure on the a-algebra t§n with Pn = Pn+ II~. and,
moreover, if P{A} = limn Pn{A} for A E t§ = Uf t§n, then P is a-additive
on the algebra t§, whence there is a unique extension to a(t§) = ~i' Xf
The following question then poses itself: If (O,~, PJ, i ~ 1, is a sequence
of probability spaces with ~ c ~+1 and Pi = Pi+d~;, is P{A} = limnPn{A},
A E t§ = Uf~, necessarily a-additive on the algebra '§? The answer is,
in general, negative; see Example 3. However, if 0 = ROO and g;, is the
class of cylinders of (ROO, &1 00 ) with n-dimensional bases, the answer becomes
affirmative.

Theorem 2. Let (Rn, f!Jn, P n ), n ~ I, be a sequence of probability spaces with

P n + I {An X R} = Pn{A n}, An E &In, n ~ 1, (5)

and let ijn he the a-algebra ofcylinders in (ROO, &1 00 ) with n-dimensional Borel
bases. Ift§ = Uf ijn andfor each A = An X X:'+ I R with An E :!in,

P{A} = Pn{A n}, (6)

then P is a-additive on the algebra t§ and determines a unique probability


measure extension P to &1 00 •

PROOF. In view of Theorem 6.1.1, it suffices to prove that P is well defined and
a probability measure on t§. The former is an immediate consequence of (5)
and, clearly, P is nonnegative and additive on t§ with P{R oo } = l.
Let ~ n and ~ n denote the classes of all sets of the form J I X •.. x J nand
J I X ... x I n x R x R x ... respectively, where J i signifies an interval
of R, 1 :5: i :5: n, i.e., J i = [ai' bJ, [ai' bi), (ai' bi ), or (ai' bJ for - 00 :5: aj :5:
bi :5: 00. Then the classes J(' nand Ji" nof all finite unions of sets of ~ n and ~ n
respectively are algebras. If Ji" = Uf Ji"n' then Ji" is a subalgebra of'S,
whence P is defined and additive on Ji".
To check a-additivity on Ji", let {An, n ~ I} be a decreasing sequence of
sets of Ji" with infn P{An} = E> 0, whence An = Am X R x R x ... for .
some Am. E J(' m.' and mn + 1 > mn , n ~ l. Since P n is a probability measure on
J('n, every interval J I X ... x I n of ~n contains a closed subinterval whose
Pn-measure is arbitrarily close to that of J I X ... x J n' Thus, there is a
closed set Bn (which is a finite union of closed intervals) with

(7)

Let En = Bn X R x R···, whence (7) holds with An, En' P replacing Am.,
Bn , Pm. respectively. Consequently, if en = 8 1 . E2 ••• 8n ,
6.4 Infinite-Dimensional Product Measure Space, Kolmogorov Consistency Theorem 195

whence
l: l:
7<
P{L n } > P{A n }
-
- 2 ~ 2

and Cn -# 0. Let wIn) = (w\n), wT 1, ••• ) E En. Since En!' necessarily w(n+ p) E
Cnc fin' implying (w\n+p), ... ,w~.7P)EBn, p = 0, 1, .... Choose a sub-
sequence {n a } of the positive integers for which w\n 1k ) - . a limit W\OI (finite or
infinite) as k -. 00. Likewise, there is a subsequence {n2k} of {n1k} with
WT2k) -. W~OI as k -. 00, etc. Then
w(n kk ) = (w\n kk ), w~nkk), ...) -. (w\O), w~o), ..•) == w(O)

and so as k -. 00

n ~ 1.

Therefore w(O) E fin cAn, n ~ 1, so that nf


An -# 0. Consequently, P is
O'-additive on the algebra ,it with P{R oo } = 1, whence by Theorem 6.1.1, P
has a unique extension P to f!loo = O'(,it). Clearly, P is O'-additive on f§, and
since P = P on ,itn, P = P on O'(,itn) = din' n 2 1, whence P = P on
f§ = Uf din· 0

Is it always possible to define a sequence of r.v.s {X n , n 2 I} on some


probability space (Q, fF, P) such that the joint dJ.s of all finite subsets of
r.v.s XI" .. , X n coincide with dJ.s F 1, .... n assigned a priori? The answer is
yes provided the assignment is not internally contradictory.
A family {F I ..... n(x I' ... , x n)} of n-dimensional dJ.s defined for all n 2 1
will be called consistent if for all n ~ 1

Fl. .... n(x\' ... ,xn)= lim Fl. .... n+\(x\' ... ,xn+\). (8)
X .. + I - +dJ

Theorem 3 (Kolmogorov Consistency Theorem). If {F I. .... n' n ~ I} is a con-


sistentfamily ofdJ.s, then there exists a probability measure P on (R 00, f!loo) such
that the dJ.s ofthe coordinate r.v.s X I' ... , Xnon (ROO, f!loo, P) coincide with the
preassigned d.f.s F I. .... n' that is, such that if
Xk(w) = Wk, k = 1,2, ... , for w = (w\, W2,"') E ROO (9)
then for all n 2 1,
P{X\ < x\,,,,,X n < x n} = FI. .... n(x\' ... ,xn). (to)
PROOF. If An is an n-dimensional set of f!ln, define

Pn{A n} = f··· fdF1 .....n<t\, ... , t n), (11)


A ..

whence (R n, f!ln, P n) is a probability space, n 2 1. Employing the notation of


(6) of Section 3, for all pairs of real numbers a j < bi' 1 ~ i ~ n,
196 6 Measure Extensions

Pn+\{~[ai,bJ R} = an+l,i~_ooPn+\{~:[ai,bJ}
X

bn + 1-00

tin
lim
+ 1- - 00
~:·:\F\ ..... n+\ = ~:.bF\, .... n = Pn{.X[aj,b J }
I = 1
bn + t - OO

in view of (8). Hence, P n+ \ {An X R} = Pn{A n } for all An E f!ln, n ~ 1, and so


by Theorem 2 there is a probability measure P on (ROO, f!loo) such that for all
An E f!ln
n ~ 1.
The coordinate functions defined by (9) are LV.S and by (9) and (11)
P{X\ < x\"",X n < x n} = P{w:w\ < x\"",wn < xn}
= Fl, .... n(x\' ... ,xn). 0
Corollary 1. If {X n , n ~ I} is a sequence of r.v.s on some probability space
(n, ~, P), there exists a sequence of coordinate r.v.s {X~, n ~ I} on the prob-
ability space (ROO, f!loo, P') such that thejoint dJ.sofX \, ... , X n and X'I' ... , X~
are identical for all n ~ 1.

Theorem 4. If {X n , n~ I} are LV.S on some probability space (n,~, P), there


exist two sequences ofcoordinate L V.S {X;, n ;;:: I} and {X~, n ;;:: I} on (ROO, fAoo)
each having the same joint distributions as {X n , n ~ I}, i.e.,
F x·;..... x~ = FX·, ..... X~ = F x, ..... x n ' n ~ 1, (12)
such that the stochastic processes
{X:, n ~ I} and {X~, n ~ I} (3)
are independent of one another.

PROOF. For w = (w\, w 2 , ...) E ROO, define ~{w) = Wj, j ~ I, and Fn , j> by
F2n = P{Y\ < y\, ... , Y2n < Y2n} = F X'.X2 Xn(Y\' Y3,"" Y2n-\)
. FX'.X2 XJY2' Y4,"" Y2n),
F2n - 1 = P{Y1 <YI"'" Y2n - t <Y2n-.}
= F X'.Xl Xn(Y\' Y3,"" Y2n-\)
. FX'.X2 Xn_'(Y2, Y4,"" Y2n-2)

for n ~ I. Then {F n' n ~ I} is a consistent sequence of dJ.s and hence


determines a probability measure P on (ROO, 91 00 ) which clearly satisfies (12)
and (13) if X: = Y2n -\, X~ = Y2n , n ~ 1. 0

The LV.S X:, n ~ 1, defined by


n ~ 1,
6.4 Infinite-Dimensional Product Measure Space, Kolmogorov Consistency Theorem 197

are called the symmetrized XII. Given any sequence of LV.S {X"' n ~ l}, in
subsequent allusion to the symmetrized X"' namely X:, the distinction be-
tween X" and X~ (which are probabilistically indistinguishable) will be
glossed over and X:
will be written
n ~ 1,
where {X~, n ~ I} is independent of {X"' n ~ l} and possesses the same
joint distributions.
Since the {X:, n ~ I} are symmetric LV.S, it is frequently easier (especially
for independent {X"' n ~ I}) to prove a result for {X:} rather than attempt a
direct argument with {X,,} (see Chapter 10).
The following examples complement the three series theorem, the second
furnishing a typical exploitation of symmetrization. For any r.v.s {X"' n ~ I}
and positive constant c, define
" "
s;(c) = L a2(XjIlIXk;c]) = L (E XIIllxjl"C] 2
- E XjIuxjl"c)· (14)
j= 1 j= 1

EXAMPLE 1. Let S" = L~ Xi, n ~ 1, where {X"' n ~ I} are independent,


symmetric LV.S and let s,,(c) be as in (14).
If for some c > 0, s;(c) -+ 00, then

~ S" a.c. a.C. I.


urn - - = 0 0 = - 1m - - .
S" (15)
"-00 S,,(c) "-00 S,,(C)
If L:= 1 P {I X" I > C} = 00, all C > 0, then
(16)

If s;(c) = 0(1), all c > 0 and Loo= 1 P{lX,,1 > c} < 00, some c > 0 then
S" converges a.c. (17)

PROOF. Define
(18)

Then symmetry guarantees that E X~ = 0, n ~ 1, and the hypothesis of (15)


ensures that the uniformly bounded LV.S {X~, n ~ I} obey a central limit
theorem (Theorem 9.1.1 or Exercise 9.1.1). Thus if <I> denotes the normal dJ.,
for all b > 0

implying by the Kolmogorov zero-one law that for all b > 0

p{mn _1_ ± Xf > ~} ~ p{mn _1_ ± Xf ~ b} = 1.


"-00 s,,(c) 2
i= 1 "-00 s,.(c) i= 1
198 6 Measure Extensions

Since sn(c) --+ 00, Lemma 4.2.6 ensures

P ~ {
hm - Sn > - = I b} (19)
n-oo sn(c) - 2 '
and so, in view of the arbitrariness of b, (19) obtains with b = 00. The an-
alogous statement for the lower limit follows by symmetry.
Apropos of (16), via IXnI ::;; ISn I + ISn _ I I and the Borel-Cantelli theorem

2 wn ISnl ~ wn IXnl = 00, a.c.,

and by symmetry wnn_ oo Sn = 00 = -lim Sn, a.c. The final statement, (17),
follows immediately from the three series theorem (Theorem 5.1.2). 0

Corollary 2. For independent, symmetric T.V.S {Xn} either L~ Xi converges a.c.


or lim n_ oo L~ Xi = 00 = -li!!!n-oo L~ Xi' a.c.

EXAMPLE 2. Let {X n, n ~ l} be independent T.V.S and Sn = L~ Xi' n ~ 1.


IfLOO= I P{ IXnl > c} = 00, all c > 0, then

wn ISn I = 00, a.c. (20)

If for some c > 0, Loo= I P{ IX nI > c} < 00, and s;(c) --+ 00, then
= ISnl ac.
JIm - - = 0 0 . (21)
n-oo sn(c)
Suppose that for some c > 0,
00

L P{lXnl > c} < 00 and s;(c) = 0(1). (22)


n=1
Ifwn n_ oo D= 1 E Xi = 00 (resp·li!!!n_oo Lj= 1 E Xi = - (0), then wnn_ oo Sn
= 00 (resp. li!!!n- 00 Sn = - (0), a.c. If - 00 < li!!!n- 00 D E Xi ::;; wnn_ 00
L~ E Xj < 00, then wnn_oolSnl < 00, a.c.
PROOF. Since the proof of (16) did not utilize symmetry, (20) is immediate. To
establish (21), let {X:, n ~ I}and {X~, n ~ l} be i.i.d. stochastic processes on
on some probability space whose joint distributions (of X'I' ... ' X~ or
X'{, ... , X:) are identical with those of XI' ... , Xn. The double prime will be
deleted in what follows. Since, in the notation of (18),

+2 i £IX., ,;c.I·\~1 > c)


X;
6.4 Infinite-Dimensional Product Measure Space, Kolmogorov Consistency Theorem 199

necessarily
2ah ~ afX"_X~)2C + 2e P{lXnl > e},
2

afX"_X~)2C ~ 2ai~ + 8e 2 P{lXnl > e},


and so Ij; 1 afXj_Xj)2C - 2s;(e).Itfollowsfrom(15)thatflri1n_ 00 ISn - S~I/sn(c)
= 00, a.c., and this, in turn, ensures the conclusion of (21).
Finally, the hypothesis of (22) guarantees that l(X j - E Xj) con- Ii;
verges a.c. The final conclusion of (22) follows from the identity
n n n
L1 X = L1(X
j;
j
j;
j - E Xj) + L E Xj,
j; 1

while the prior ones follow upon replacing n in this identity by suitable
subsequences {nJ. 0

Since Levy's inequality acquired a very simple form for symmetric r.v.s.
(Corollary 3.35), symmetrization is especially useful in proving the following
converse to Theorem 5.2.8.

Theorem 5. Let IXp ~ 1 and Sn = L~=1 Xi' n ~ 1, where {X, X n, n ~ I} are


non-degenerate i.i.d. random variables. If, for some 8 > 0,

I
00
naP-2P{ISnl ~ ena } < 00, (23)
n=1
then IX > 0 and EIXIP < 00. Moreover, EX = 0 if either IX < 1 or IX = 1 and
(23) holds for all 8 > O.
PROOF. By Theorem 4, there exists a sequence {X', X~, n ~ 1} such that
{X, X n, n ~ 1} and {X', X~, n ~ 1} are i.i.d. Set Y = X - X', Y" = X n - X~,
and 1'" = Ii=1 1], n ~ 1. Then, for some 6 > 0,

00

I nap - 2
P{IT"I ~ 2en
a
} < 00, (24)
n=1
so that, as m ~ 00,

2m 2m
I nap - 2
P{IT"I ~ 28(1 + 2a )ma } ~ I naP- 2 P{IT"I ~ 2ena } = 0(1). (25)
n=m n=m

Now, if 1',,* = maxI s,j S,n 11j1 and Y,,* = max 1 s,j S,n 11]1, for any tJ > 0, via Levy's
inequality,

2P{IT"I ~ tJ} ~ P{T,,* ~ tJ} ~ P{Y,,* ~ 2tJ} = 1 - P{IYI < 2tJ}


200 6 Measure Extensions

.-1
= P{I YI ~ 2b} L pj{1 YI < 2b}
j=O
.-1
~ P{IYI ~ 2b} L
j=O
[1- 2P{I1j1 ~ b}J. (26)

Consequently, choosing b = I'/na, where 1'/ = 2e, (24) implies that


00 n-l
L nap - 2 P{IYI ~ 21'/na} L [1- 2P{I1j1 ~ I'/n a}] < 00, (27)
.= t j=O

while (25) and the initial inequality of (26) ensure that, as m -+ 00,

2m
map-tP{T",* ~ 1'/(1 + 2a)ma}::; L nap - 2 P{1;'* ~ 1'/(1 + 2a)ma} = 0(1).
n=m

Since r.xp ~ 1, as m -+ 00,

P{T",* ~ I'/,ma} = 0(1), (28)


where 1'/' = 1'/(1 + 2a) = 2e(1 + 2a). In view of P{T",* > O} ~ P{T1* > O} =
P{I YI #- O} > 0, necessarily r.x ~ 0, so that r.xp ~ 1 guarantees r.x > O.
Clearly, (23) holds when e is replaced by e(1 + 2a), whence (27) likewise
holds with 1'/ replaced by 1'/'. Thus, since P{I1j1 ~ I'/,na} ::; P{T,,* ~ I'/'na} for
1 ::; j ::; n, (28) ensures that the second sum in (27) with 1'/ replaced by 1'/' is
n + o(n), implying that
00

L nap - 1 P{IYI ~ 21'/ ' na} < 00,


.=1

and hence also that

for some positive constant C, yielding EI YIP < 00. Hence, via independence
of X and X',

00 > EIX - X'IP ~ E[(X - K)+]PP{X' < K},


and choosing K large enough so that P{X' < K} > 0, necessarily E(X+)P <
00. Similarly, E(X-)P < 00 and so EIXIP < 00.
When r.x ::; 1, clearly p ~ 1, so that E IXI < 00. By Theorem 5.2.8, for all
e' > 0,
00

L nap - 2 P{IS. - n E XI ~ e'na} < 00,


.=1

which, in conjunction with (23), yields

L
00

nap - 2 P{IE XI ~ 2en a - t } < 00 •


• =1

Thus, E X = 0 if either r.x < 1 or r.x = 1 and (23) holds for all e > O. 0
6.4 Infinite-Dimensional Product Measure Space, Kolmogorov Consistency Theorem 201

Actually, IX> 1/2 in Theorem 5, since the alternative, IX ~ 1/2, requires p ~ 2,


whence the Central Limit Theorem for i.i.d. random variables (Corollary 9.1.2)
implies that
P{IT"I ~ 2w <X} ~ P{IT"I ~ 2W 1/ 2 } -+ 2[1 - $(28)] > 0,
which is incompatible with (24).
EXAMPLE 3. Let Q = {t, 2, ... }, fF" = IT({t}, {2}, ... , {n})j, Pn{A} = 0 or I
according as A is a finite set or not. Since no two infinite sets of fF" are disjoint,
Pn is a probability measure on fF" and moreover, Pn+11fF" = Pn. However,
P{A} = lim Pn{A} is not IT-additive since
P{ {m}} = lim Pn{{m}} = 0, m = 1,2, ....

whereas
00

P{Q} = lim Pn{Q} = 1 -#


"-+00
L P{ {n}}
n:;:;:;1

EXERCISES 6.4
l. Verify that it is possible to define a sequence of independent LV.S {Xn } with specified
dJ.s F. on a probability space.

2. Let (n, !fI, v) be the probability space consisting of Lebesgue measure on the Borel
subsets of [0, 1]. Each w in [0, I] may be written in binary expansion as w = X 1 X 2 ...
= If 2-·X., where X. = X.(w) = 0 or I and this expansion is unique except for a
set of w of the form m/2n which is countable and hence has probability (Lebesgue
measure) zero. For definiteness regarding such w, only the expansion consisting of
infinitely many ones will be used. Show that for all n ~ I, {w: X .(w) = I} E (JI and that
{X., n ~ I} is a sequence of independent r.v.s. Describe the LV.S Y. where w =
In"":1 ,-'y" and, is an integer> 2.

3. In Exercise 2, define Z. = Z.(w) = I or - I according as the integer i for which


(i - 1)/2· :5; w < i/2· is odd or even. Show that any two but not all three of Z I'
Z2' Z\ . Z2 are independent. What is the relation between Z. and X.? The Z.,
n ~ I, are known as the Rademacher functions.

4. Let no = {O, I}, 5'0 = class


of subsets of no, ~o({l}) =! = ~o({O}), and define
~j == ~o. i ~ I, and (n, .~.~)
= (nO', 5'0'. XI~;)' Prove the following. (i) Each
point of n is an 5'-set of ~-measure zero. (ii) The set D of all points of n with only
finitely many coordinates equal to I has J.l-measure O. (iii) Define n' = n - D,5" =
5'n', ~'{A ·n'} = ~{A}, A E 5'. (iv) For each WEn', z(w) = I~I wir i is a I-I
map of n' onto Z = (0, I). (v) If C = {Z: 0:5; a:5; Z < b:5; I}, A = {w: Z(W)EC},
then A is measurable and ~'{ A} = b - a. Hint: It suffices to take binary rational a, b.
(vi) For any Borel subset B of Z, the set A = {w: z(w) E B} is measurable and J.l' {A} =
Lebesgue measure of B.
5. Let T= [a, b], -00:5; a < b:5; 00, bean index set,n = R T = {w: W = w(t), tE T} =
space of real functions on T. If B. is a Borel set of R·, then A = {w(t): (w(td, ... ,
W(t.))E B.} is a Borel cylinder set for any choice of distinct t \' ... , t. in T. The class
202 6 Measure Extensions

d T of Borel cylinder sets is an algebra. Define f1IT = u(d T) and let T = [a, b],
-00 ~ a < b ~ 00. Do the sets {m(t): m(t) is bounded on T}, {m(t): m(t) is continu-
ous on T} belong to f1IT? If A* = {m(t): m(ti)e Ri , i = 1,2, ... }, and .91* is the class
of all such set A * (as t j and Rj vary), set f1I* = u(d*). Is f1IT = f1I*?
6. If S. =Li Xi where {X, X., n ;::: I} are i.i.d. then LI n-1P{IS.1 > en} < 00 all e > 0
iff E X = O. Hint: Sufficiency is contained in Theorem 5.2.7. For necessity, define
S: = Li Xr where {X:, n;::: I} is the symmetrized sequence. The hypothesis ensures
convergence ofLI n- I P{IS:I >ne} and hence also LI n-1P{max 1"j".IXrl >ne},
e > O.

6.5 Absolute Continuity of Measures,


Distribution Functions; Radon-Nikodym
Theorem
Let (n, ff, /l) be a measure space and T an arbitrary nonempty set. The
essential supremum g of a family {g" t E T} of measurable functions from n
into R = [ - 00, 00], denoted by esuPI eT gl' is defined by the properties:
i. g is measurable,
ii. g 2 g" a.e., for each t E T,
iii. For any h satisfying (i) and (ii), h 2 g, a.e.
Clearly, if such a g exists, it is unique in the sense that two such essential
suprema of the same family are equal a.e.

Lemma 1. Let (n, ff, /l) be a a-finite measure space and {g" t E T} a nonempty
family of real, measurable functions. Then there exists a countable subset
To c: T such that
sup gl = esup gl'
leTo leT
PROOF. Since /l is a-finite, it suffices to prove the theorem when /l is finite;
moreover, by considering tan - 1 gl if necessary, it may be supposed that
IgIl ~ C < 00 for all t E T. Let 5 signify the class of all countable subsets
J c: T and set

(1. = sup E (sup gl)'


Ie", tel

whence (1. is finite. Choose In E 5, n 2 1 for which (1. = sUPn~ I E(SUPleI. gl)
and let To = Uf In· Then To is a countable subset of T and clearly (1. =
E[suPte To gIl The measurable function g = sUPle To gl satisfies (ii) since
otherwise for some t E T necessarily (1. < E max(g, gl) ~ (1.. Obviously, (iii)
holds, and so g = esuPleT gl' 0
6.5 Absolute Continuity of Measures, Distribution Functions 203

Definition. If (n,~, JlJ, i = 1,2, are two measure spaces and JlI {A} = 0
whenever Jl2{A} = 0, then JlI is said to be absolutely continuous with respect to
pz or simply pz-continuous. If, rather JlI {N C } = 0 for some set N E ~ with
Jl2 {N} = 0, then JlI is called pz-singular (or the pair JlI' Jl2 is dubbed singular).
If g is a nonnegative integrable function on (n,~, Jl), the indefinite
integral vg{A} = fA g dJl is absolutely continuous relative to Jl. This is
tantamount to saying that the integral of a nonnegative function g over a set
A of measure zero has the value zero (Exercise 6.2.2). The Radon- Nikodym
theorem asserts under modest assumptions that the indefinite integral v9 is the
prototype of a measure absolutely continuous with respect to Jl. The crucial
step in proving this is

Lemma 2. Let (n, ~,Jl) be a a-finite measure space and va a-finite measure on
~, and let ;t(' denote the family of all measurable functions h ~ 0 satisfying
fA h dJl ~ v{A}, A E~. Then
v{A} = t/J{A} + 19dJl' AE~ (I)

where t/J is a jl-singular measure and


g = esup h. (2)
he.K

PROOF. Since Jl and v are a-finite, it suffices by a standard argument to con-


sider the case where both are finite and, furthermore, the trivial case Jl == 0
may be eliminated. According to Lemma 1, there exists a sequence hn E;t(',
n ~ 1, for which g = esuPhe.K h = sUPn~ I hn. Now if hI> h 2 E;t(', then h =
max(h l , h 2 ) E ;t(' since

fA
h dJl = f A[hl ~ h21
hi dlJ, + fA[hl < h21
h 2 dlJ, ~ v{A}, A E 17,

and so it may be supposed that hn ~ hn+ I' n ~ 1. Then g = limn hn, whence by
the monotone convergence theorem

1 9 dJl ~ v{A}, A E~.

Consequently, t/J as defined by (1) is a measure.


Next, for n ~ 1 and A E ~ with Jl{A} > 0,

gyn(A) = {B E~: Be A, t/J{B} < ~ Jl{B}}


is nonempty; otherwise, the choice ho = (1/n)1 A would guarantee for all
BE ~ that

J[ ho dJl = ~n Jl{AB} ~ t/J{AB} ~ t/J{B}


B
= v{B} - [g dJl,
J
B

implying ho + g E f![ and thus violating g = eSUPhefi h.


204 6 Measure Extensions

Choose BI,. E ~.(n) with


J.L{B I ,.} ~ i sup{J.L{B}: B E ~.(n)}
= (XI,. (say), If J.L{B~,.} = 0, stop; otherwise, choose B 2 ,. E~. (B~,.) with
J.L{B 2 ,.} ~ i sup{J.L{B}: B E ~.(B~,.)}
= (X2,. (say). If J.L{BL· B2,.} = 0, stop; otherwise, choose B 3 ,. E ~.(B~,.'
B2,.) with
J.L{B 3 ,.} ~ i sup{J.L{B}: B E ~.(Bt..· B2,.)} = (X3,.
and so on. If the process terminates at some finite step k., set Bi ,. = 0,
j> k•.
Since ~.(AI) c ~.(A2) for Al C A 2, necessarily Bi,.E~.(n) for j ~ 1 if
Bi ,. -# 0, and since ~.(A) is closed under countable disjoint unions,
M. = U~ I Bi ,. E ~.(n), n ~ 1. Now, if J.L{M~} > 0, for some n ~ 1, there
exists some D E ~.(M~), whence J.L{D} > O. Moreover, (Xm,. -+ 0 as m -+ 00 via
disjointness of {Ri ,., j ~ I} and finiteness of J.L. However, for all m

C0
1

2(Xm,. = sup {J.L{B} : B E~. Bi,.)}

~ sup{J.L{B}: B E ~.(M~)} ~ J.L{D} > 0,


a contradiction. Thus, J.L{M~} = 0, n ~ 1, and l/t{M.} < (lJn)J.L{M.} =
(lJn)J.L{n}. Consequently,

l/t{.c\ M.} ~ !~~ l/t{M.} = 0


o

Corollary 1 (Lebesgue Decomposition Theorem). If J.L, v are a-finite measures


on a measurable space (n, :F), there exist two uniquely determined measures
At, ,1.2 such that v = ;'1 + ,1.2' where ,1.2 is J.L-continuous and Al is j.L-singular.
PROOF. It suffices to verify uniqueness. Let Al + ,1.2 = ,1.'1 + A~, where AI, ,1.'1
are J.L-singular and ,1.2' ,1.2 are J.L-continuous. If Al -# A'., there exists A E :F with
J.L{A} = 0 and AdA} -# A.'dA}. But then A2 {A} -# A2{A}, violating absolute
~~~ 0

The Lebesgue decomposition may be used to prove

Theorem 1 (Radon-Nikodym). If VI' V2, J.L are a-finite measures on a measur-


able space (n,:F) with Vi being J.L-continuous, i = 1,2, and ifv = VI - V2 is well
defined on:F (i.e., v.{n} and V2{n} are not both (0) then there exists an:F-
6.5 Absolute Continuity of Measures, Distribution Functions 205

measurable function g,finite a.e. [jl], such that

v{A} = f/ djl, A E?l', (3)

and 9 is unique to within sets of p.-measure zero.


PROOF. Let gj and t/Ji be defined as in Lemma 2, i = 1,2. Then both Vi and
JA gj dp,
are p,-continuous and hence also t/Jj, i = 1,2. Since according to
Lemma 2, t/Jj is p,-singular, i = 1, 2, necessarily (Exercise 1) t/Jj == 0, whence

vj{A} = Lgj djl, A E?l', i = 1, 2.

Moreover, 9 = gl - g2 is ?l'-measurable and so

v{A} = Vi {A} - v2 {A} = f/ djl, A E?l',

which is (3). In proving uniqueness, it may be assumed that p, is finite. If g*


is any other ?l'-measurable function satisfying (3), then for any C > 0
A= {C > g* > 9 > -C}E?l',
whence

Lg* djl = v{A} = f/ djl,

necessitating jl{A} = 0, all C > 0 and hence jl{g* > g} = O. Analogously,


p,{g* < g} = 0 and so g* = g, a.e. Finally, when V is finite, 9 is p,-integrable
and hence finite a.e. [jl], whence the latter also obtains when V is a-finite. 0

Corollary 2. Iv{A} I < 00 for all A E .iF iff 9 is p,-integrable and v is a measure iff
9 ~ Oa.e. [jl].

A function 9 defined by (3) is called the Radon-Nikodym derivative of v


with respect to Jl and is denoted, in suggestive fashion, by dv/djl. Thus, if v is a
(well-defined) difference of two jl-continuous, a-finite measures, (3) may be
restated as
dv
v{A} = f
A djl djl, A E?l'. (3')

Theorem 2. Let jl be a a-finite measure and v a p.-continuous, a-finite measure on


the measurable space (n, ?l'). If X is an ?l'-measurablefunction whose integral
Jo X dv exists, then for every A E?l',

LX dv = Lx ~: djl (4)
206 6 Measure Extensions

PROOF. It may be supposed that Jl is finite and (via X = X+ - X-) that


X ~ O. Let Jf be the class of nonnegative ~-measurable functions for which
(4) obtains. Then Jf is a monotone system which contains the indicator
functions of all sets in ~. By Theorem 1.4.3, Jf contains all nonnegative ~­
measurable functions. 0

Corollary 3. If v, Jl, A. are a-finite measures on a measurable space (n, ~) with v


being Jl-continuous and Jl being A.-continuous, then v is A.-continuous and
dv dv dJl
a.e. [A.].
dA. = dJl . dA. '
PROOF. v
Clearly, is A.-continuous and dv/dJl is ~ -measurable with Sn (dv/dJl)dJl
extant. Thus, by Theorem 2, for all A E ~

which is tantamount to the conclusion. o


If F is a dJ. on R for which there exists a Borel function f on ( - 00, 00) with

F(x) = f_oo.x/(t)dt = foof(t)dt, -00 <x< 00 (5)

where dt signifies Lebesgue measure on (R, Bl), then, as noted in Section 1.6, F
is said to be absolutely continuous and f is called the density function of F. In
particular, if F is the dJ. attached to some r.v. X on a probability space, thenf
is also called the density of X. Clearly, when the f of (5) exists, it is unique to
within sets of Lebesgue measure zero.

Corollary 4. If X is a r.v. on a probability space (n, ~, P) with dJ. F and


density f, and 9 is a Borelfunction on R = [- 00,00] such that E g(X) exists,
then for any linear Borel set B

i[XEB)
g(X) dP = [g(t)f(t)dt
JB
(6)

and, in particular,

P{X E B} = {f(t)dt. (7)

PROOF. Let Jl denote the restriction to Bl of the Lebesgue-Stieltjes measure


determined by F. From (5), for - 00 < a < b < 00

i
[a, b)
f(t)dt = F(b) - F(a) = F(b - ) - F(a - ),
6.5 Absolute Continuity of Measures, Distribution Functions 207

and so by the uniqueness of the restriction of Lebesgue-Stieltjes measure to ~

Jt{B} = {!(t)dt, BE~.

Thus, Jt is absolutely continuous with respect to Lebesgue measure, whence by


Theorem 2

{g(t)!(t)dt = {g(t)dJt(t) = f:oo g(t)/it)dJt(t)

= E /B(X)g(X) = [ g(X)dP,
J[XEBI

recalling Corollary 6.2.1. o


Analogously, if (i) F is an n-dimensional dJ. or (ii) F is the joint dJ. of r.v.s
XI' ... , X n on some probability space and there exists a Borel function! on R"
with

F(XI"'" xn ) = f~'" f~!(tl"'" tn)dt l •.• dt n, (8)

where dt 1 ..• dt n signifies Lebesgue measure on (R n, ~n), then F is declared


absolutely continuous and! is called the density function of F in case (i) and
the joint density of the r.v.s X I' ... , X n in case (ii). Again, ! is unique to
within sets of n-dimensional Lebesgue measure zero. Under (ii), if g is any
Borel function on R n for which E g(X I, ... , X n) exists, then for any B E ~n

and, in particular,

P{(X I, ...,X EB} = {!(tt> ...,tn)dt


n) l ... dt n. (10)

EXERCISES 6.5
I. If t/J and Il are measures such that t/J is both Jl-continuous and Wsingular, then t/J == O.
2. Two measures Jl, v are called equivalent, denoted Jl == v, if each is absolutely con-
tinuous with respect to the other. Verify that this is an equivalence relation. If
(n,~, Jl) is a probability space, Xi is a nonnegative!f I random variable and Jli is the
indefinite integral of Xi> i = 1,2, then, if Jl{[X 1 = 0] ~ [X 2 = OJ} = 0, the two
indefinite integrals are equivalent measures.
3. If (O,~, Jli) is a measure space, i = 1,2, then JlI is absolutely continuous relative to
III +Jl2·
4. If F(x; a, b) = (b - a)-I(x - a), a :'S: x :'S: b, the corresponding measure F{ . } is
absolutely continuous relative to Lebesgue measure with Radon-Nikodym
derivativef(x) = (b - a)-l/la,;x';bl and F is called the uniform distribution on
208 6 Measure Extensions

[a, b]. When a = 0, b = I, F{ . } coincides with Lebesgue measure on [0, I] and a


LV. with d.f. F is said to be uniformly distributed. Show that if XI' ... , X. are i.i.d.
uniformly distributed LV.S, the measure determined by the dJ. of X = (X I' ... , X.)
is the n-dimensional Lebesgue measure on the hypercube 0 ::;; Xj ::;; I, I ::;; i ::;; n, of
R·.
5. A completely additive set function von a measurable space (n, ~) which assumes at
most one of the values + 00 and - 00 and satisfies v{0} = 0 is sometimes called a
si2fted measure. If v is a signed measure on a measurable space (Q, §) and v + {A} =
supfvfB}: A::::> Be§'}, then v+ is a measure satisfying v+ {A} ;;:: v{A}, Ae§'.
Likewise, v-{A} = -inf{v{B}:A::::> Be§} is a measure with v-{A};;:: -viA};
the measures v + and v - are called the upper and lower variations respectively and
the representation v = v + - v- is the Jordan decomposition of v. If v is a-finite, so are
v+ and V-.

6. Ifv = v+ - v- is theJordan decomposition (Exercise 5) of the signed measure v, then


v = v+ + v- is a measure called the total variation of v. Clearly, IviA} I ::;; viA},
Ae§.

7. If v is a signed measure on a measurable space (Q, §) with total variation Ii (see


Exercise 6) and X is integrable relative to v, one may define JX dv = JX dv+ -
J X dv-. Prove that if v is finite, v{A} = SUP{lJAX dvl: X is measurable and
IXI::;; I}.
8. Let V denote a linear functional on If in, §, Jl), i.e., range V = ( - 00, (0) and
V(af + bg) = aV(f) + bV(g) for all f, geIf p' It is continuous if V(f.) ..... V(f)
whenever 1If. - flip ..... 0, and V is bounded if W(f)I::;; Cpllfllp for all feIfp,
where Cp is a finite constant. Prove that a continuous linear functional is bounded.
Hint : Otherwise, thereexistl~ e If pwith I V(j~)1 > nlll.ll p , whence ifg. = l~/(nllj~llp),
Ilg.ll p = 0(1) but I V(g.)! > 1.
9. If (n, §, Jl) is a a-finite measure space and V is a continuous linear functional on
If p(n, §, Jl), p > 1, there exists 9 e If q , where (l/p) + (l/q) = 1, such that V(f) =
J
f . 9 dJl for all f e If p' This is known as the (Riesz) representation theorem.
Hint: Let Jl be finite. Since I A e If in, §, Jl) for all A e §, a set function on § is
defined by v{A} = V(IA)' It is finitely additive and, moreover, a-additive by con-
tinuity of V and the fact that V(O) = O. Further, v is finite since Vis bounded (Exercise
8); v is absolutely continuous with respect to Jl. By the Radon-Nikodym theorem,
there is an §'-measurable g, finite a.e., with V(l A) = v{ A} = JA 9 dJl.

10. Set functions {v., n ;;:: I} on (n, .'F, Jl) are uniformly absolutely continuous relative
to Jl if for all e > 0, Jl{A} < 0, implies Iv.{A}1 < e for all n;;:: I. The sequence
{v., n ;;:: I} is equicontinuous from above at 0 iffor all e > 0 and Am! 0, Iv.{ Am} I
< doralln;;:: Iwhereverm;;:: m,.Provethatifmeasures{v.,n;;:: I}areequicon-
tinuous from above at 0 and also absolutely continuous relative to Jl, then {v.,n ;;:: I}
are uniformly absolutely continuous relative to Jl.

11. If f. e If p(n, §, Jl), n ;;:: I, then Ilf. - fmllp = o(l) as n, m ..... 00 iff (i) f. - fm 4 0
as, n, m ..... 00 and (ii) JA 1f.IP dJl, n ;;:: I, are equicontinuous from above at 0·
12. Random variables X I' . . . , X. have a (nonsingular) joint normal distribution if their
dJ. is absolutely continuous with density defined by f(x l , •.• , x.) = (2n)-'/2
6.5 Absolute Continuity of Measures, Distribution Functions 209

jlAl exp{ -! Li.j= 1 ai/Xi - ei)(Xj - ej)}, where A = {aiJ is a positive definite
matrix of order n and IA I signifies the determinant of A. Here, e = (e l , ... , e.) is a
e
real vector. Verify that this yields a bona fide probability measure and that E Xi = i ,
p(X i , Xi) = au!aiaj' where {aij} is the inverse matrix of A.

References
J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
P. R. Halmos, Measure Theory, Van Nostrand, Princeton, 1950; Springer-Verlag,
Berlin and New York, 1974.
G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities, Cambridge Univ. Press,
London. 1934.
A. N. Kolmogorov, Foundations of Probability (Nathan Morrison, translator),
Chelsea, New York, 1950.
M. Loeve, Probability Theory, 3rd ed., Van Nostrand, Princeton, 1963; 4th ed., Springer-
Verlag, Berlin and New York, 1977-1978.
E. J. McShane, Integration, Princeton Univ. Press, Princeton, 1944.
M. E. Monroe, Introduction to Measure and Integration, Addison-Wesley, Cambridge,
Mass., 1953.
H. Robbins, "Mixture of Distributions," Ann. Math. Statist. 19 (1948),360-369.
S. Saks, Theory of the Integral (L. C. Young, translator), Stechert-Hafner, New York,
1937.
J. L. Snell, "Applications of martingale system theorems," Trans. Amer. Math. Soc.
73 (1952), 293-312.
D. V. Widder, Advanced Calculus, 2nd ed., Prentice-Hall, Englewood Cliffs, New
Jersey, 1961.
7
Conditional Expectation,
Conditional Independence,
Introduction to Martingales

7.1 Conditional Expectations

From a theoretical vantage point, conditioning is a useful means of exploiting


auxiliary information. From a practical vantage point, conditional prob-
abilities reflect the change in unconditional probabilities due to additional
knowledge.
The latter is represented by a sub-a-algebra '§ of the basic a-algebra fi'
of events occurring in the underlying probability space (0, fi', P). Associated
with any measurable function X on 0 whose integral is defined, i.e., IE X I
~ ro, is a function Y on 0 with lEY I ~ ro satisfying

1. Y is '§-measurable,
11. SA Y dP = SA X dP, all A E '§.

Such a function Y is called the conditional expectation of X given '§ and is


denoted by E{X I'§}. In view of (i) and (ii), any '§-measurable function Z
which differs from Y = E{X I'§} on a set of probability zero also qualifies as
E{X I'§}. In other words, the conditional expectation E{X I'§} is only defined
to within an equivalence, i.e., is any representative of a class of functions
whose elements differ from one another only on sets of probability measure
zero-an unpleasant feature.
To establish the existence of E{X I'§} for any fi'-measurable function X
with IE X I ~ ro, define the set functions A, A+, A- on fi' by

A E fi'. (1)

210
7.1 Conditional Expectations 211

The measures A. ± are P-continuous on fF and so, if their restrictions ~ ==


A. ± 1\9 are a-finite, the Radon-Nikodym theorem (Theorem 6.5.1) ensures
the existence of Y = dA.\9/dP\9 satisfying

J Y dP = A.{A} = f X dP,
·A A
A Ef"S.

Thus, it suffices when ~ are a-finite to equate

E{XIf"S} = dA.ff (2)


dP ff
and to recall that the Radon- Nikodym derivative of (2) is unique to within
sets of measure zero.
The second of the following lemmas shows that a similar procedure may
be employed even when a-finiteness is lacking.

Lemma 1. Ifv is a P-continuous measure on fF, there exists a set E E fF such


that v is a-finite on fF (\ E and for each A E fF (\ EC

v(A) = 0 = P{A} or v{A} = 00 > P{A} > O. (3)

PROOF. Set

fi} = {D: D E fF, v is a-finite on fF (\ D}

and then choose D. E fi}, n ~ 1, with sup,,~ 1 P{D.} = SUPDe!iJ P{D} = a


(say). Clearly, E = U1"
D. E fi}, whence P{E} = a. Moreover, for D E fF (\ E C

either v{D} < 00, implying DuE E fi} and hence a ~ P{D u E} = P{D} + a,
that is, P{D} = 0 = v{D}, or alternatively v{D} = 00, whence P{D} > 0 by
the P-continuity of v. 0

Lemma 2. '{v is a P-continuous measure on fF, there exists an fF-measurable


function dv/dP ~ 0, a.c., with

f
dv
v{A} = A dP dP, AEfF. (4)

Moreover, ifv is a-finite, then dv/dP is a random variable.


PROOF. Choose E as in Lemma 1 and set v' =V 13'" n E and P' = P 13'" n E' Then
Viis a-finite and P'-continuous, whence by the Radon- Nikodym theorem
(Theorem 6.5.1) dv'/dP' exists on E. Define

dV'
on E
:;= :' {
on P.
212 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

Then dv/dP is g;--measurable and (4) holds for A E g;- n E. If, rather, A E
g;- n EC, then (3) ensures that (4) still obtains. Finally, if dv/dP were infinite
on a set B of positive P-measure, then for every measurable subset A c B

v{A} = {:;dP = 00 or 0

and v would not be a-finite. In other words, if v is a-finite, dv/dP is a r.v.


D
Theorem 1. If X is an g;- -measurable function with IE X I ::; 00 and '§ is a
sub-a-algebra of g;-, then (i) there exists a '§-measurable function E{ X I'§}
unique to within sets of measure zero, called the conditional expectation of X
relative to ~~, for which

{E{XI'§}dP= {X dP forallAE'§. (5)

(ii) If X is integrable and Z is an integrable, '§-measurable random variable


such that for some n-class f0 with a(f0) = '§

EZ = EX, {Z dP = {X dP, A Ef0, (6)

then Z = E{X I'§}, a.c.


PROOF. (i) For A E '§, define the measures v± {A} = fAX± dP. By Lemma 2,
dv±/dP(9 exists, where P(9 = PI(9' Since IE XI::; 00, at least one of the pair
E X+, E X- is finite, and so at least one of dv±/dP(9 is integrable. Thus, if
dV+ dv- . (dv+ dV-)
--- onmm - - <00
Y = dP(9 dP(9 dP(9' dP(9 (7)
{
0, otherwise,
the function Y is '§-measurable, IE YI ::; 00, and for A E '§

{YdP = {~~: dP(9 - {~~~ dP(9


= {x+ dP - {x- dP = {X dP,
so that (5) holds with E{XI'§} = Y. As for uniqueness, let .Yt and Y2 be '§-
measurable and satisfy (5). Then P[YI > r > Y2 ] = 0 for every finite rational
number r. Hence, P {Yt > Y2 } = O. Similarly P {Y2 > Yt } = 0 and so
Y, = Y2 , a.c.
Apropos of (ii), if

d = {A: A g;-, {z dP = {X dP}.


E
7.1 Conditional Expectation 213

then nEd:::> P} by (6). Since X and Z are integrable, d is a A.-class, whence


by Theorem 1.3.2 d :::> u(P}) = C§ and so Z = E{XIC§}, a.c. 0

An immediate consequence of Theorem 1 is

Corollary 1. Let C§ I' C§ 2 be u-algebras of events and let X, f be ff I random


variables. /fu(X) and C§. are independent, then E{XIC§d = EX a.c. and if

then E{Xlu(C§~ u ~2)} = E{flu(C§1 U C§2)}, a.c.


A concrete construction of a conditional expectation appears in

Corollary 2. Let the random vectors X = (X I' ... , X m ) and f = (fl , ... , y,,)
on (n, /F, P) be independent of one another and let f be a Borel function on
R m x WwithIEf(X, f)l::; oo·/f,forxERm ,

g(x) = {Ef(X, Y) ifIEf(x, f)! ::; 00


(8)
0, otherwise,

then g is a Borel function on R m with

g(X) = E{f(X, y)!u(X)}, a.c. (9)

PROOF. Let F x' F y, and F x. y be the joint distribution functions of the random
vectors, X, f and (X, f) respectively, and denote the corresponding
Lebesgue-Stieltjes measures by vx, Vy, and vx . y. Since f is a Borel function
on R m + n , Theorem 1.4.2 ensures that f(x, y) is a Borel function on R n for
each fixed x E R m , and so by Theorem 6.3.3 and Fubini's theorem

are Borel functions on Rm • Thus

is a Borel set and g(x) = [g + (x) - g _(x)]/ DC(X) is a Borel function whence
by Theorem 1.4.4, g(X) is u(X)-measurable.
By independence (R m +n, fJm+n, vx . y) = (R m, 81 m, Vx ) x (R n, fJn, Vy). If
A E u(X), Theorem 1.4.4 guarantees the existence of B E!JI m such that A =
{X E B}, and once more via Theorem 6.3.3 and Fubini's theorem
214 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

f/±(X) dP = E IAg±(X) = f/±(X)dvx(x)

= r r f±(x, y)dvy(y)dvx(x)
J JR"
B

= r
JBXR"
f± dvx,y

= f,/±(X, Y)dP.

SinceIEf(X, Y)I ~ oo,eitherEg+(X)orEg_(X)isfinitewhenceg+(X) <


00, a.c. or g_(X) < 00, a.c. Hence, ID(X) = 0, a.c. Thus, for A E a(X),

L g(X) dP = L [g+(X) - g_(X)]IDc dP = L g+(X) dP - L g_(X) dP

= L f+(X, Y) dP - L f-(X, Y) dP = L f(X, Y) dP,

and so (9) holds by Theorem l(i). o


As will be seen in Section 2, the conditional probability of an event A
given~, denoted P{A I~}, may be defined as E{IAI~}. A typical exploitation
of conditional expectation appears in

EXAMPLE 1 (Kesten). If {S" = L~ Xi' n ~ I}, where {X, X" ~ I} are non-
negative i.i.d. r.v.s with E X = 00, then fiiii"_ex> XJS"-l = 00, a.c.
PROOF. Set A j = {Sj-l ~ I:Xil, I: > 0, where So = 0, and note that AjA j C
Aj{2J=: +1 X h ~ I:X j} for i < j, implying for k ~ 1 via independence and
identical distributions

P{A i , UA ~ P{Aj}P{O A
]=l+k
j}
]=k
j }'

so that (12) of Lemma 4.2.4 obtains with B j = A j • Moreover, if F is the dJ.


of X and a(O) = 0,
x
a(x) = So [1 _ F(y)]dy' x > 0,

then, clearly, a(x)/x ! and (Exercise 6.2.16) a(x) i and E a(X) = 00. It follows
that E a(I:X) = 00, all I: > O. However, via Corollary 2 and Examples
5.4.1,6.2.2

J1P{A"} = EJ1P{S"-1 < I:X"IX"} = {ex> J1P{S"-1 =s:; I:x}dF(x)

~ {ex> a(l:x)dF(x) = E a(I:X),


7.1 Conditional Expectations 21;

and so Lemma 4.2.4 ensures P{A n, i.o.} = 1, all e > 0, which is tantamount
to the conclusion of Example l.
It follows immediately from this example that for any i.i.d. LV.S {X, X n ,
n ~ I} with EIXI = 00,

a.c. o

Let X be an ~ -measurable function with IE X I ~ 00 and {Y)., AE A},


{~)., AE A}, nonempty families of random variables and O'-algebras of
events respectively. It is customary to define
E{X I Y)., AE A} = E{X IO'(Y)., AE A)},
(10)
E{XI~)., AE A} = E{XIO'(~)., AE A)},
and, in particular,
E{XI Yt , · · · , Yn } = E{XI0'(Y1 , · · · , y,,)},
(11)
E{XI Y} = E{XIO'(Y)}.
Since, by definition E{X I Y1 , ••• , Y,,} is 0'(Y1 , •.• , y")-measurable, Theorem
1.4.4 guarantees that for some Borel function g on R n
E{XI Yl>'''' Yn } = g(Y., ... , y").
Conversely, if g is a Borel function on Rn such that IE g(Y1 , .•• , Y,,)I ~ 00
and for every A E 0'( Yl> .. , y")

Lg(y 1 , ••• , Yn) dP = Lx dP,


then g(Y1 , .•• , y") = E{XI Yl>"" Yn }, a.c.
In particular, if Y = I A for some A E~, then O'(Y) = {0, A, AC, O} and
every version of E{X I Y} must be constant on each of the sets A, AC , neces-
sitating

P/A} Lx dP if WE A

E{XI Y}(w) = (12)

P{~C} LeX dP if WE AC,

where either of the constants on the right can be construed as any number in
[ - 00, 00] when the corresponding set A or A C has probability zero.
More generally, if {An, n ~ 1} is a O'-partition of 0 in ~ with P{A n} > 0,
n ~ 1, and ~ = O'(A n , n ~ 1), then for any measurable function X with
IEXI ~ 00

a.c. (13)
216 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

°
Moreover, this remains valid even if PrAm} = for certain indices m, the
quantity in the parenthesis being then interpreted as an arbitrary constant.
Some simple consequences of the definition of conditional expectation are

E{II':9'} = I, a.c., E{XI':9'} ;::: 0, a.c. if X ;::: 0, a.c., (14, i)

E{cXI':9'} = c E{XI':9'}, a.c., if IE X I ::; 00 and


c is a finite constant, (14, ii)

E{X + YI':9'} = E{XI':9'} + E{YI':9'}, a.c., ifE(X- + Y-) < 00 or


E(X+ + Y+) < 00. (14, iii)

These properties assert roughly that if TX = E{XI':9'}, then T is linear,


order preserving (monotone), and Tl = 1.

E{XI':9'} = X, a.c. if X is ':9'-measurable. (14, iv)

If ':9'1' ':9'2 are a-algebras with ':9'1 c ':9'2 c !F and IE XI::; 00, then
E{E{XI':9'2} l':9'd = E{XI':9'd = E{E{XI':9'd 1':9'2}, a.c. (14, v)

the first equality of (14, v) follows. Since E{X I':9' d is ':9'j-measurable for
i = 1 and hence i = 2, the second equality of (14, v) follows from (14, iv).
D

Theorem 2. Let {X n' n ;::: I} and Y be random variables with ElY I < 00 and
':9' a a-algebra ofevents.
i. (Monotone convergence theorem for conditional expectations). If Y ::;
X n iX, a.c., then E{X nl':9'} i E{XI':9'}, a.c.
ii. (Farou lemma for conditional expectations). If Y ::; X n' n ;::: 1, a.c., then
E{lim X nl':9'} ::; lim E{X nl':9'}, a.c.
iii. (Lebesgue dominated convergence theorem for conditional expectations).
If X n ~ X and IXnl ::; I YI, n ;::: 1, a.c., then E{X nl':9'} ~ E{XI':9'}.
PROOF. (i) By the monotone property of conditional expectations, E{ Y I':9'}
::; E{X nl':9'} i some function Z, a.c. For A E ':9', by ordinary monotone
convergence

fA
Z dP = lim n f
n A n
E{X l':9'}dP = lim f A
X n dP = f
A
X dP,

and, since Z is ':9'-measurable, Z = E{XI':9'}, a.c.


7.1 Conditional Expectation 217

Apropos of (ii), set y" = infm2:n X m • Then Y ~ Y" i limm-oo X m , whence


limn E{Xnl~} ;::: limn E{Ynl~} = E{lim n Xnl~} by (i). Finally, (iii) is a
consequence of (ii) via
a.c. D

An extremely useful fact about conditional expectations is furnished by

Theorem 3. Let X be a random variable with IE X I ~ 00 and ~ a a-algebra


of events. If Y is a .finite-valued ~-measurable random variable such that
IE XYI ~ 00, then
E{XYI~} = Y E{XI~}, a.c. (15)
PROOF. By separate consideration of X± and y± it may be supposed that
X ;::: 0 and Y ;::: O. Moreover, by the monotone convergence theorem for
conditional expectations it may even be assumed that X and Yare bounded
LV.S. Set

v{A} = I XY dP, /irA} = I X dP, AE~.

Then both /i and v are finite, P-continuous measures on ~ and, denoting as


usual the restrictions of P, v, /i to ~ by P(9, V(9, /i<§,
dV(9 d/i
dP<§ = E{XYI~}, :;: = E{XI~}' dP = X, a.c.

For A E~, by Theorem 6.5.2

L Y d/i(9 = I :~:
Y dP(9 = Iy E{XI~} dP(9,

and for AE~

Consequently,

Iy E{XI~} I dP = XY dP, A E~,

and since Y E{XI~} is ~-measurable, (15) follows. 0

Theorem 4 (Jensen's Inequality for Conditional Expectations). Let Y be ff-


measurable with IE YI ~ 00 and g any finite convex function on ( -00, (0) with
IEg(Y)1 ~ 00. Iffor some a-algebra ~ of events, (i) X = E{YI~}, a.c. or (ii)
X is ff-measurable with X ~ E{ YI~}, a.c. and g i, then
g(X) ~ E{g(Y)I~}, a.c. (16)
218 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

PROOF. Since (ii) follows directly from (i), it suffices to prove the latter. To
this end, define
*( )
gt=lm
I' g(s) - g(t) ,
5-1- s-t
whence g* is a finite, nondecreasing function on ( - 00, (0). Now the secant
line of a convex function is always above the one-sided tangent line, that is,
g(t) ~ g(s) + (t - s)g*(s), -00 < S, t < 00, (17)
whence if A = {IXI ::;; M, g*(X) ~ a}, 0< M < 00 and B = {IXI < 00,
g*(X) ~ o} both g(X) and g*(X) are bounded on A E <;#, so by (17)
IAg(Y) ~ IAg(X) + IA(Y - X)g*(X), a.c.
Since IE IA(Y - X)g*(X)1 ::;; 00, by Theorem 3,
IA E{g(Y)I<;#}'= E{IAg(Y)I<;#} ~ IAg(X), a.c.
As M -+ 00, IA ~ IB , so (16) holds on B. Similarly, for {g*(X) ::;; 0, IXI < oo}.
Consider next D = {X = oo}. If g*(s) > 0, some s in (-00, (0), then (17)
ensures
I D E{g(Y)I<;#} ~ g(s)ID + I D E{Y - sl<;#}g*(s) = 00 = g(X)' ID
If g*(s)::;; 0, all s in( -00, (0), theng! whence g(oo) ::;; g(X), a.c. and (16) holds
on D since
I D E{g(Y)I<;#} ~ IDg(oo) = IDg(X).
Let D' = {X = -oo}. If g*(s) > 0, all s in (-00, (0), then 9 i whence
I D· E{g(Y)I<;#} ~ ID,g( -(0) = g(X)I D,

(Y - s)g*(s) implying
°
whereas if g*(s) < for some s in (-00, (0), then via (17) g(Y) ~ g(s) +

E{g(Y)I<;#} ~ + [E{YI<;#} - s]g*(s)


g(s)
= g(s) + [X - s]g*(s) = 00, a.c. on D'
and (16) holds on D'. D

Corollary 3. For X and <;# as in the theorem, with probability one


IE{XI<;#}I::;;E{IXII<;#}, Er{IXII<;#}::;;E{IXI'I<;#}, r~l,

E{max(a, X) I<;#} ~ max{a, E{X I<;#}}, - 00 < a< 00.

Theorem 5 (Extended Fatou Lemma For Conditional Expectations), Let


{X n , n ~ I} be random variables with IE Xnl ::;; 00, n ~ 1, and <;# a a-algebra
ofevents. If IE limn X nI ::;; 00 and
sup E{X; I[x~ >kll <;#} ~
n?1
° as k -+ 00,

then E{lim n X nI<;#} ::;; limn E{ X nI<;#}, a.c,


7.1 Conditional Expectation 219

PROOF. If lk = SUPn~ 1 E{X; I[X;; >kll ~}, k > 0, then lk~ by hypothesis.
Since with probability one
°
E{Xnl~} = E{Xn(J[X,;-:<>kl + I[x,;->kj) I~} 2:: E{XnI[x,;- :<>kll~} - lk,
it follows via Theorem 2(ii) that for all k > °
lim E{X n I~} 2:: lim E{XnI[x,;- :<>kll~} - lk
n n

2:: E{lim XnI[x,;- SkI I ~} - lk


n

2:: E{lim X n I~} - Yk , a.c.,


which yields the theorem upon letting k -+ 00. o
Corollary 4. Let ~ be a a-algebra ofevents, {X n' n 2:: I} random variables with
IE Xnl ~ 00, n 2:: 1, and lim k_ oo E{lXnIIlIXnl>k] I~} = 0, uniformly in n with
probability one. If X n ~ X, where IE XI ~ 00, then E{X n I~} ~ E{X I ~}.
PROOF. Applying the theorem t-o X n and -X n ,
E{X I~} ~ lim E{X n I~} ~ rrm E{Xn I~} ~ E{X I ~}, a.c. 0
n

Corollary 5. Let ~ be a a-algebra of events and {X n' n 2:: I} random variables


with IE Xnl ~ 00, n 2:: 1. If Xn~ X o, where IE Xol ~ 00 and for some r > 1
sup E{IXnl'I~} < 00, a.c.,
n~l

then E{Xnl~}~ E{Xol~}·


PROOF. Let X~ = XnI A , n 2:: 0, where A = {suPn~lE{IXnl'I~} < C < oo}.
Since E IX~I' = E[E{IX~I'I~}] ~ C, Fatou's lemma ensures that E IXol' ~ C.
Moreover, for K > 0,
E{IX~II[lx~I>K11~} ~ Kl-'E{IX~I'I~} ~ CK 1 - '
uniformly in n with probability one, whence E{X~I~} ~ E{Xol~} by Corol-
lary 4 and the conclusion follows as C -+ 00. 0

EXAMPLE 2. Let Sn = LJ= 1 X j ' where {X n' n 2:: I} are independent LV.S and
let {lX n, n 2:: I} be constants such thatP{Sn < IXn} > 0, n 2:: 1. Then

(18)

entails

P{Sn 2:: IX n, i.o.} = 1. (19)

PROOF. Set AN = U.~)=N [Sn 2:: IX n], N 2:: 1, and suppose that P{A N} = I for
220 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

some N ~ 1. Then 1 = E P{ANISN}' implying P{ANISN} = 1, a.c" and so on


the set {SN < aN}
a.c. (20)

Next,ifh(x) = P{U:=N+l [Sn - SN ~ an - x]},thenhjandbyCorollary


7.1.2

a.c. (21)

According to (20), h(SN) = 1, a.c., on {SN < aN}' Thus

f(-<Xl.aN)
[1 - h(x)] dP{SN < x} = P{SN < aN} - i [SN<aN]
h(SN) dP = 0,

and so h(x) = 1 for every x < an that is a point of increase of P {SN < x}. Since
P{SN < aN} > 0, there must be at least one such x. Then monotonicity of h
guarantees h(x) = 1 for all x ~ aN and (21) ensures
P{A N+ 1 I SN} = 1, a.c.
on [SN ~ aN]. Consequently, recalling (20), P{A N+ I I SN} = 1, a.c., and so
P{AN+d = 1. Since P{Ad = 1 by hypothesis, it follows inductively that
P{A N } = 1 for all N ~ 1, which, in turn, yields (19). 0

Corollary 6. If Sn = 2:7
Xi' where {X n, n ~ I} are independent LV.S with
P{Xn < O} > 0, n ~ 1, then for nonnegative, constants {an, n ~ I}, (18) implies
(19).

Remark. Clearly, both equality signs can be deleted from (18) and (19)
provided P{Sn < an} > 0, n ~ 1, is modified to P{Sn ::; an} > 0, n ~ 1.

EXERCISES 7.1
I. If X is a r.V. with IE X I ~ 00 and ':§ a a-algebra of events with a(X) and '!J indepen-
dent classes, then EIX I ':§} = EX. a.c. In particular. if IX n} are independent LV.S
with IE Xnl ~ 'X), n:2: I, then E{Xnl X to . . . , X n- I } = EIXnl X n+ l • X"+2' ... : =
E X n • a.c.

2. Let I An. n :2: I} be a-partition of n in .'F. Verify that if t§ = a(A n. n :2: I) and
IE XI ~ x. then

where the parenthetical quantity is an arbitrary real number if PI An: = O.

3. Show that Corollary 5 remains true if EIXI < 00 and C = 00.


7.1 Conditional Expectation 221

4. Let ~ be a semi-algebra of events and X, Y r.v.s with IE XI < 00, IE YI < 00. If
SoX dP :::;; So
Y dP, DE C/, then E{X IO'(.@)} :::;; E{ Y IO'(.@)}, a.c.
5. Let (X I' X 2) be jointly normally distributed (Exercise 6.5.12) with p.d.f.

2 1/2
[27t0'10'2(1 - p) r I exp { 1
2(1 _
[xi 2px IX2
2) "2 - - - +"2
x~J} .
P 0'1 0'10'2 0'2

Find E(X I IX 2} and the conditional variance.

6. Prove that
i. if X isan 9"2 r.v. and Yisar.v. such that E{X I Y} = Y,a.c.and E{ Y IX} = X,a.c.,
then X = Y, a.c.
ii. Let ~I' ~2 be a-algebras of events and X an 9"1 r.v. If Xl = E{XI~d, X 2 =
E(XII~2)' and X = X 2 , a.c., then Xl = X 2 , a.c.
iii. If X, Yare 9"1 r.v.s with E{XI Y} = Y, a.c., and E{YIX} = X, a.c., then
X = Y, a.c.
7. Prove that the set E of lemma I and the function dvldP are unique to within an
equivalence.

8. Let X be an!E 2 r.v. and <§ a a-algebra ofevents. Prove that (i) 0'2(E{ X I <§}) :::;; 0'2(X),
(ii) iffor any oc in ( - 00, 00), Y = min(X, oc),
E{[X - E{XI~}J21~} :?: E{[Y - E{YI':§}J21<;§}, a.c.
(iii) E(X - E{XI~})2 :::;; E(X - y)2 for any ~-measurable r.v. Y.
:/
9. Let <§ be an a-algebra of events and (X., n :?: I} r. v.s. If for some p :?: I, X. ---4 X,
then E[X. I~} ~E[X I ~}.

10. Let (g be a a-algebra of events and X a nonnegative r.v. Prove that E[X I~} =
esup{h: h is ~-measurable, h :?: 0, a.c., and SA h dP :::;; .fAX dP, all A E~}.

II. Show via an example that in Theorem 2(iii) X. ~ X cannot replace X. ~ X.

12. If [ X., n :?: I} are 9" I interchangeable r.v.s (Exercise 6.3.10) and fi'. = O'(L!= I X j,
) :?: n), prove that E{Xd fi'.} = (lin) D X j , a.c., I :::;; i :::;; n. More generally, if
[X., n:?: I} are interchangeable r.v.s, cp is a symmetric Borel function on R m
with E Icp(X ..... , X m)1 < 00, and fi'. = O'(U m.j,) :?: n), where

(mn)U m .• = L:
IS; I < ... < i m S n
cp(X i ,,···, Xi,.), n:?: m,

then for 1 :::;; i l < ... < im :::;; n+ 1

E(cp(X i,,···, Xi,.) I·~.+ I} = E[cp(X I"'" X m ) Ifi'.+ I}' a.c.

13. (Chow-Robbins) Let S. = D


Xi> n:?: I, where (X, X., n:?: I} are i.i.d. with
E IX I = 00. Then for any positive constants {b.}, either P (lim,,- 00 b; II S. I = O} = I
or (*) P(mn._oob.-IIS.I = oo} = 1. Hint: If(*) fails, by the zero-one law mn
b.-IIS.I < 00, a.c., whence p{mnb;IIS._11 < oo} = P{mnb;IIL7=2 Xd < oo}
= I, entailing P{mnb;IIX.+.1 < oo}:?: p{mnb;I(IS.1 + IS.-II) < oo} = 1.
Now apply lim b.-I IS.I :::;; IimIS.I/(I + IX.+II)·mn(l + IX.+11)jb. and
Example I.
222 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

7.2 Conditional Probabilities, Conditional


Probability Measures
Let (Q, ff, P) be a fixed probability space and '1 a sub-a-algebra of ~', For
A, BE ff, define
P{A 1'1} = E{IA I '1}, P{A I B} = E{IA I fB}' (1)
The former, P{A 1'1}, is called the conditional probability of the event A
given '1 and according to the prior section is a '1-measurable function on Q
satisfying

Lp{A,'1} dP = f/A dP = P{A· G}, GE'1. (2)

The latter, P{A I B}, is called the conditional probability of the event A given
the event B and according to 7.1(12), if P{B} > 0,

WEB. (3)

The properties 7.1 (14) ofconditional expectation and monotone convergence

°
yield:
~ P{A I '1} ~ 1, a.c,; (4, i)

P{A 1'1} = 0, a.c" iff P{A} = 0, P{A I '1} = 1, a.c., iff P{A} = 1; (4, ii)

If {An, n ;;::: l} are disjoint sets of ff, then

p{ QAn I'1} = J, P{A n I '1}, a.c.; (4, iii)

lim P{A n I '1} = P{A I '1}, a,c, (4, iv)

Property (iii) asserts that for each sequence {An} of disjoint events

except for w in a null set which may well depend on the particular sequence
{An}. It does not stipulate that there exists a fixed null set N such that

for every sequence {An} of disjoint events. In fact, the later is false (Halmos,
1950, p. 210).
7.2 Conditional Probabilities, Conditional Probability Measures 223

Definition. Let ff l ' <§ be a-algebras of events. A conditional probability


measure or regular conditional probability on .~ I given (If is a function P(A, w)
defined for A E ff 1 and WEn such that

1. for each WEn, P(A, w) is a probability measure on ff l'


11. for each A E ff l' P(A, w) is a <§-measurable function on n coinciding with
the conditional probability of A given <§, i.e., P(A, w) = P{A I <§}(w), a.c.

One advantage of regular conditional probabilities is that conditional


expectations may then be envisaged as ordinary expectations relative to the
conditional probability measure, as stated in

Theorem 1. Iffor some pair ff l' <§ ofa-algebras ofevents, P",{A} == P(A, w) is
a regular conditional probability on ff 1 given <§ and X is an ff i-measurable
function with IE X I ::; 00, then

E{XI<§}(w) = Lx dP"', a.c. (5)

PROOF. By separate consideration of X+ and X-, it may be supposed that


X ~ O. Let
yt' = {X: X ~ 0, X is ff i-measurable, and (5) holds}.

By definition of conditional probability measure, I A E yt' for A E ff i' Since,


as is readily verified, yt' is a monotone system, Theorem 1.4.3 ensures that yt'
contains all nonnegative ff i-measurable functions. 0

In general, a regular conditional probability measure does not exist


(Doob, 1953, p. 624). A simple case where it does is

EXAMPLE 1. Let ff i = §, <§ = a(A n, n ~ 1), where {An, n ~ I} is a a-


partition of n in ff. Then, if

P(A ) = {P{AAn}/P{A n}, wEAn' P{A n} > 0


,w P{A}, wEAn, P{A n} = 0,

P(A, W) = P{A I<§} a.c. (as in Exercise 7.1.2), and so P(A, w) is a regular
conditional probability relative to ff given cr;.

A more interesting and important illustration is that of

EXAMPLE 2. Let Xi' X 2 be coordinate LV.S on a probability space (R 2, [JI2, P)


with an absolutely continuous dJ. F(x i , X2), i.e., for some Borel functionf

(6)
224 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

Let iF l = fJl2 = u(X l , X 2), '§ = u(X 2) = R x fJI = {R x B: BEfJI}, and


for (Xl' X2) E R 2 define

fl(Xl) = f:oof(Xl' t)dt, fi x 2) = f:oof(S, x 2)ds, (7)

- {f(Xl' x2)/f2(X2) iffix 2) > 0


f 1(Xl I X2 ) - fl(Xl)
.
Iff2(X2) = o.
(8)

By Fubini's theorem, j;(x) is a Borel function on R 1 for i = 1, 2, and so


fl(XI I X2) is a Borel function on R 2. For BEfJl 2 and X = (Xl' X2)ER 2 define

P(B, x) = 1 [5: (5, X2)e BI


fl(S I x2)ds. (9)

Then for each X E R 2, P(B, x) is a probability measure on .JJ2, and for each
B E .~2, P(B, x) is a Borel function in X2 and hence u(X 2)-measurable. More-
over, for BE fJl2 and A 2 = R X B 2 E u(X 2)

L2 P(B, X) dP = {2 f:oo P(B, (s, t»f(s, t)ds dt


=f foo [1
B2 - 00 [u: (u, t) e BI
fl(U I t)dU]f(S, t)ds dt
.

=f 1 fl(U I t)fit)du dt
B2 [u: (u. I) e BI

=f r f(u,t)dudt = f fOO IBf(u,t)dudt


JB2 [u: (u, t) e BI B2 - 00

and so according to (2), P(B, x) = P{BIX 2}(x), a.c., for each BE .:fd 2. Con-
sequently, P(B, x) is a regular conditional probability measure on fJl2 given
u(X 2)' Hence, by Theorem 1 for any Borel function h on R 2 with
IE h(X l , X 2)1 :::;; 00

a.c. (10)

The Borel function fl (x I I X2) is called the conditional density of X I


given X 2 = X2, while J~oo h(s, x2)fl(slx2)ds is termed the conditional
expectation of h(X I' X 2) given that X 2 = X2 and is denoted by

Analogously, f2(X2Ix l ) is the conditional density of X 2 given XI = Xl'


7.2 Conditional Probabilities, Conditional Probability Measures 225

Moreover,

FI(xl I X2) = i [u<x,)


fl(u I x2)du, F 2(X21 XI) = i
[u<x21
fiu I XI)dU

are the conditional d.f.s of X I given X 2 = X2 and of X 2 given XI = XI


respectively, and from (10)
P{X 1 < x 1 1X 2 } = F 1(XIIX 2),
As noted in Chapter 1, any random vector X = (X 1' ... ' X") on (n,.?, P)
induces a probability measure P x on the class flI" of n-dimensional Borel sets
via Px{B} = P{X- 1(B)}, BE flI". For any a-algebra ~ c .? if Px{B I~}
is defined by Px{BI~} = P{X-l(B)I~}, BEflI", then Px{BI~} is a~­
measurable r.v. for each BE fJI". Moreover, for any sequence of disjoint sets
BmE flI", m 2': I, there exists a null set N E ~ depending in general on {B m,
m 2': I} such that

Px{QBml~}(W)=m~IPx{Bml~}(W), wENc. (11)

Can a version of Px{B I~} be chosen for each BE flI" so that on the comple-
ment of a single null set N E~, (11) holds for all disjoint sequences
{B m , m 2': I} c .9B"? An affirmative answer to this case has been given by
Doob in Theorem 2 (below)

Definition. Let X = (X 1, ... , X") or (X l' X 2 , ...) be a stochastic process on


(n, .?, P) and ~ a a-algebra of events. A function Px(B, w) on flI" x n,
1 ~ n ~ 00, is termed a regular conditional distribution for X given ~IJ pro-
vided that for each WEn, Px(B, w) is a probability measure on flI" and that
for BE flI", Px(B, w) is ~-measurable with
a.c. (12)

Theorem 2 (Doob). If X = (X 1, ... , X") is a random vector on a probability


space (n, .?, P) and ~ is a a-algebra ofevents, there exists a regular conditional
distribution for X given ~.
PROOF. For any rational numbers Ajo 1 ~ i ~ n, define

F:;'(A 1, ... , A") = P{OI [Xi < AiJ I~ }(W). (13)

By the properties enumerated in (4) there is a null set N E ~ such that for
WENC and all rational numbers Ai, Ai, rim
F:;'(A 1, , A") 2': F:;'(A.'I' ... ,A~) if Ai > Ai, I ~ i ~ n, (i)
F:;'(A 1, , A") = lim F:;'(r 1m , ... , r"m), (ii)
rim t Ai
1 SiSn
226 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

lim F:;'(A I , ... , An) = 0, 1 :$; i :$; n, lim F:;'(A b ... , An) = I, (iii)
Aj - - 00 Aj - 00
1 :5i$n

(iv)
where the notation in (iv) is that of (5) of Section 6.3; also, A:$; A' signifies
that the inequality holds for each coordinate. Define for any real numbers
pj and WE N C
F::'(PI"'" Pn) = lim Fn"'(A I ,···, An), Aj rational, 1 :$; i :$; n, (14)
Ai-+Jli-
iSiS"

while for wEN, set

(15)

Then for each WEn, F:;'(PI' ... , Pn) is an n-dimensional d.f. and hence
determines a Lebesgue-Stieltjes measure P,.} on f!Jn with p",{R n} '= I. For
B E f!Jn and WEn define
(16)
If
:If = {B: BE f!Jn, Px(B. w) is <§'-measurable, Px(B, w) = P{X-I(B)I<§'}, a.c.},
g = {B: B = ,X[-
.= I
00, A rational AjE( -00,(0),1
j ), :$; i :$; n}
then :If is a A-class, g is a n-c1ass, and by (13) through (16):If :=> g. Hence,
by Theorem 1.3.2,:If :=> a(g) = f!Jn, that is, Px(B, w) is a regular conditional
distribution for X given <§'. 0
Corollary 1. If X = (XI' X 2 , ... ) is a (countable) stochastic process on
(n, ~, P) and <§' is a a-algebra of events, there exists a regular conditional
distribution for X given <§'.
PROOF. For all n ~ I and rational Ai> I :$; i :$; n, choose F:;'(A b . , . , An) as in
(13). Select the null set N E <§' such that in addition to properties (i)-(iv), for
WEN C
lim F:;'+I(A I , ... , An + l ) = F:;'(A I , ... , An), n~1. (v)
A. n + I-a::>

Define F:;'(J.lI' ... , J.ln) as in (14), (15). Then for each WEn, {F:;', n ~ I} is a
consistent family of dJ.s and hence by Theorem 6.4.3 determines a measure
J.l", on (ROO, f!Joo). Define Px(B, w) = p",{B}, BE f!Joo, and
:If = {B: BE f!Joo, P x(B, w) is <§'-measurable,
Px(B, w) = P{X-I(B)\<§'}, a.c.},

g = .91 {B:BEf!J"" B= i~ [ - 00, A;) x R x R x ... rational AjE( - 00, CO)}.


7.2 Conditional Probabilities, Conditional Probability Measures 227

whence Ye is a A-class, ~ is a n-c1ass, and .Yf ~ ~, so that by Theorem 1.3.2


Ye ~ a(~) = [flOC>. 0

Corollary 2. Let X = (X 1, . . . , X n) or (X l' X 2, . . .) be a stochastic process on


(Q, IF, P) and r§ a a-algebra of events. IfP'X{B} = Px(B, w) is a regular con-
ditional distribution for X given r§ and h is a Borelfunction on R n , 1 :$; n :$; 00,
with IE h(X) I :$; 00, then

E{h(X) I r§}(w) = Lnh(X) dP'X(x), a.c.,

where P'X(x) = P'X{X7= I ( - 00, x;)} for x = (XI' x 2 , .•• , x n)·


PROOF. As in Theorem I. o
The question naturally poses itself whether a regular conditional distribu-
tion for X given r§ engenders a corresponding regular conditional prob-
ability on a(X) given r§. The answer is positive under the proviso of

Theorem 3. Let X = (X I' ... , X n ) or (X I' X 2, ...) be a stochastic process on


(0, IF, P) and r§ a a-algebra ofevents. If the range of X, i.e., {X(w); w E O} is a
Borel set, then there exists a regular conditional probability on a(X) given r§.
PROOF. Let Px(B, w) be a regular conditional distribution for X given r§ and
set S = {X(w): w EO}. Then
Px(S, w) = P{X-I(S) I r§}(w) = P{O I r§}(w) = 1
except for WEN E r§ with P{N} = O. If for A E a(X) there are two sets B I,
B 2 E [fin with A = X-I(B I ) = X- 1 (B 2 ), then B I .1 B 2 eSc, implying
PX(B 1 , w) = PX (B IB 2 , w) = PX (B 2 , w), wENC.
Hence, for A E a(X) it is permissible to define
P(A, w) = Px(B, w), wENC,
and for any fixed WoE N C set
P(A, w) = Px(B, w o), WEN.
Then for each WE 0, P(A, w) is a probability measure on a(X), and for each
AE a(X), P(A, w) is r§-measurable with

P(A, w) = Px(B, w) = P{A I r§}(w), a.c. o


The preceding notions will be utilized to prove the conditional Holder
inequality.

Theorem 4. If X, Yare random variables on (0, IF, P), f§ is a a-algebra of


events and 1 < p < 00, (lip) + (1Iq) = 1, then
E{jXYII f§} ~ EI/P{jXIP I f§}. E1/q{j Ylq Ir§}, a.c. (17)
228 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

PROOF. For BE tJ42 and WEn, let Pw{B} = P(B, w) be a regular conditional
distribution for (X, Y) given r§. By Corollary 2

E{IXYIIr§}(w) =
Jr !x x21 dPw ,
R2
1 a.c.,

E,/p{lxIPIr§}(w) = ({2IXIIP dPw),IP, a.c.,

E1/q{IYlqlr§}(w) = ({2IX2IqdPw)'lq, a,c.

Since for each WEn, Pw is a probability measure on tJ42, (17) follows from the
ordinary Holder inequality. 0

EXERCISES 7.2

I. If the events (B j } of (n, fF, P) constitute a finite or a-partition of n in fF and have


positive probabilities, prove Bayes' theorem, i.e., for any event A

2. If in Example 2, Flx') = Iim x,_,_oo F(x l , X2), i = 1,2, prove that the dJ. F I is a
mixture of the family F(x 1 I X2) in the sense of Exercise 8.2.3.

3. If X = (X l' ... , X n) is a random vector whose d.f. F is absolutely continuous relative


to Lebesgue measure with densitYffind the conditional density of (X I •... , X) given
X j + 1 = Xj+l • . . . , X n = X n , where 1 ::;;j < n and verify the existence of a regular
conditional probability measure on 9In given a(X j + .. ... , X n ).

4. If LV.S X and Y have a joint normal distribution (Exercise 6.5.12), show that the
conditional distributions of X given Y and Y given X are both normal. Is the con-
verse true?

5. If XI' ... , X k have a multinomial distribution (Exercise 6.3.9) find the conditional
distribution of X I given X 2, the conditional mean E( X 1 I X 2}. and the conditional
variance a 2(X 11 X 2)'

6. (D. G. Kendall) If AI' , An are interchangeable events on (n, fF, P) and Sn =


number of events AI' , An that occur, prove for I ::;; i I < ... < i j ::;; nand 1 s; j
::;; N ::;; n that

7. Let (fl, fF, P) be a probability space and ~ a a-algebra of events. If A E fF and B =


(w: PtA I~} > OJ, verify that (i) B E~, (ii) PtA - B} = 0, and, moreover, if B'
satisfies (i) and (ii) then P(B - B'} = 0 (i.e., B is a ~-measurable cover of A).

8. Prove the conditional Minkowski inequality (Exercise 6.2.7) and also the condi-
tional Markov inequality E{IYIIr§} ~ cP{1 Y ~ clr§} for any c > 0, Ye 2 1 ,
and sub-a-algebra r§.
7.3 Conditional Independence, Interchangeable Random Variables 229

7.3 Conditional Independence, Interchangeable


Random Variables
The notion of conditional independence parallels that of unconditional or
ordinary independence. As usual, all events and random variables pertain to
the basic probability space (n, iF, P).

Definition. Let t§ be a a-algebra of events and {t§., n ~ I} a sequence of


classes of events. Two such classes ~I and ~1 are said to be conditionally
independent given ~ if for all Aj E t§j, i = 1,2,

a.c. (1)

More generally, the sequence {~., n ~ I} is declared conditionally inde-


pendent given C!J if for all choices of Am E t§k m' where k i # k j for i # j, m =
1,2, ... , n, and n = 2,3, ... ,

P{A ,A 2 ... A. I t§} = n• P{A I


i= ,
i t§}, a.c. (2)

Finally, a sequence {X•• n ~ I} of random variables is called conditionally


independent given ~ if the sequence of classes t§. = a(X.), n ~ I, is condi-
tionally independent given t§.

If t§ = {0, il}, then conditional independence given t§ coalesces to


ordinary (unconditional) independence, while ift§ = iF, then every sequence
of classes of events is conditionally independent given t§.
Independent LV.S {X.} may lose their independence under conditioning;
for example, if {X.} are Bernoulli trials with parameter p and S. = Xi' LI
ther. P{X i = 11 S2} > 0, i = 1, 2 for S2 = 0 or 2, whereas P{X 1 = 1,
X 2 = 1 IS2} = 0 when S2 = o.
On the other hand, dependent LV.S may gain independence under con-
ditioning, i.e., become conditionally independent. Thus, if {X.} are inde-
pendent integer valued LV.S (see Exercise 7.3.3) and S. = LI
Xi' clearly the
LV.S {S., n ~ I} are dependent. However, given that the event {S2 = k} (of
positive probability) occurs,

P{S = . S ='1 S } = P{SI = i, S2 = k, S3 = j}


1 I, 3 } 2 P {S 2 = k}

P{SI = i}P{X 2 = k - i}P{X 3 =j - k}


P{S2 = k}

= P{S ='1 S } P{X 3 = j - k, S2 = k}


, 1 2 P{S2 = k}
= P{S, = i IS2}P{S3 = j 1S2}'
230 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

If the subscript n of Sn is interpreted as "time" and the r.v.s SI, Sz, S3 are
envisaged as chance occurrences of the past, present and future times
respectively, then the prior relationship may be picturesquely stated as, "The
past and future are conditionally independent given the present." In fact,
designating the LV. Sn as "the present," the r.v.s SI"'" Sn-l as "the past,"
and the r.v.s Sn+ 1 " ' " Sn+m as "the future," it may be verified for any n 2:: 2,
m > 0, that the past and the future are conditionally independent given the
present. This property holds not only for sums Sn of independent r.v.s but
more generally when the r.v.s {Sn' n 2:: I} constitute a Markov chain (Exercise
7.3.1).

Theorem 1. lfr§i, i= 1,2,3, are a-algebras of events, then conditional inde-


pendence Ofr§1 and r§z given r§3 is equivalent to anyone of the following:
I. For all Al Er§I,
a.c. (3)
ii. For every D E ~ where ~ is a n-class with r§ 1 = a(~),
a.c. (4)
iii. For every D j E ~j where ~i is a n-class with r§i = a(~i)' i = 1,2,
a.c. (5)

IV. For every a(r§1 u r§3)-measurable function X with IE XI :s; 00,

a.c. (6)

PROOF.(i) (1) => (3): Let Ai E r§j, i = 1,2, 3. By the definition of conditional
expectation, (1), and Theorem 7.1.3,

fA2A3
IA, = f
A3
IA,A2 = f
A3
P{A I A z l r§3} = fA3
P{A 1 I r§3} . P{Azl r§3}

= f
A3
IA2 P{A 1 I r§3} = fA2A3
P{A 1 I r§3},

whence (3) obtains by Theorem 7.1.l(ii). Conversely, by 7.1, (l4)(v), Theorem


7.1.3, and (3),

P{A1Azl r§3} = E{P{A1Azl a(r§z u r§3)} I r§3}


= E{IA2 P{A 1 I a(r§z u r§3)} I r§3}
= E{I A P{A 1 I r§3} I r§3} = P{A 1 I r§3}p{A z l r§3},
2 a.c.

Apropos of (ii), since (3) =;> (4) trivially, it suffices to prove the reverse
implication. If

d = {A: A E:F, P{A I a(r§z u r§3)} = P{A I r§3}, a.c.},


7.3 Conditional Indepenaence, Interchangeable Random Variables 231

then.s;1 is a A.-class ~ fi) via (4), and so by Theorem 1.3.2,.s;1 ~ a(fi) = ~t,
which is tantamount to (3).
Concerning (iii), it suffices to prove (5) ==- (1), the converse being trivial.
Since
.s;1 = {A 2 E fF: P{D t A 2 ~3} = P{D I I ~3} . P{A 2 ~3}, a.c., all D 1 E fi)tl,
1 1

.s;1* = {AIEfF:P{AIA21~3} = P{AII~3}·P{A21~3},a.c.,allA2E~2}


are both A.-classes and .s;1 ~ fi)2 by (5), Theorem 1.3.2 ensures that .s;1 ~
a(fi)2) = ~2. Then a reapplication of this theorem yields.s;1* ~ a(fi)I) = ~I
and (1) obtains.
Finally, to verify (iv) note that, taking X = I A" (6) ==- (3), and hence it
suffices to prove (3) ==- (6). For Ai E ~i' i = 1, 3 via (3),
P{AIA3Ia(~2 u ~3)} = IA3P{Atla(~2 u ~3)} = IA)P{AII~3}
= P{AIA31~3}' a.c.
lf d denotes the class of non-negative fF -measurable functions for which (6)
holds, then d is a A-system which, as just shown, contains the indicator
functions of all sets of a n-class generating a(~ I u ~ 3)' Hence, by Theorem
1.4.3, d contains all non-negative a(~ I u ~ 3)-measurable functions and
therefore (6) holds. 0

Corollary 1. Random variables XI' ... , X. on (n, fF, P) are conditionally


independent given a a-algebra <# of events iff for all (x I' ... , x.) E R·

p{Q I
[Xi < xJ <#} = it P{X i < Xi I <#}, a.c.

PROOF. This follows easily from (iii). o


A direct application of (6) yields

Corollary 2. If rqndom variables X I and X 2 are conditionally independent


given a a-algebra ~ of events and IE X 1 I ::; 00, then
E{X I I a(~ u a(X 2))} = E{X I I ~}, a.c.

Corollary 3. Let <#i be a a-algebra of events, i = 1, 2, 3. If a(<#1 U <#3) is


independent of <#2 , then ~I and <#2 are conditionally independent given ~3 (as
well as unconditionally independent).
PROOF. Since (3) with the subscripts 1 and 2 interchanged holds via Exercise
7.1.1, this follows directly from Theorem 1. 0
Recall that r. v.s {X., n ~ I} are called interchangeable if the joint distri-
bution of every finite subset of k of these r.v.s depends only upon k and not
the particular subset, k ~ 1.
232 7 Conditional Expectation. Conditional Independence. Introduction to Martingales

Definition. A mapping n = (n l , n z , ...) from the set N of all positive integers


onto itself is called a finite permutation if n is one-to-one and n. = n for all
but a finite number of integers. Let Q denote the set of all finite permutations
n and let 91 00 be the class of Borel subsets of Roo = R x R x ... and X =
(X l ' X z' ...) a sequence of r.v.s on (n, g;, P). Define 1tX = (X "I' X "2' ...)
for 1t = (1t l , 1t z , ...). Then
!/ = {X- l (B):BE91 ,P{X- ' (B) OO
~ (nX)-I(B)} = O,all1tEQ}, (7)
is called the O'-algebra of permutable events (of X).

Theorem 2. Random variables X n , n = 1,2, .. , on (n, g;, P)are interchangeable


iff they are conditionally independent and identically distributed given some
a-algebra <§ of events. Moreover, <§ can be taken to be either the a-algebra !/
of permutable events or the tail a-algebra ff, and
P{X I < XI I!/} = P{X I < XI Iff}, a.c. (8)
PROOF. Sufficiency is immediate since if

n P{X I <
m

P{X I < X" ... , Xm < X m <§} I = Xj I <§}, a.c., m ~ 1,


j= I

it follows upon taking expectations that {X., n ~ I} are interchangeable.


Apropos of necessity, for n ~ 1 and any real X, define
1 n
~n(x) = -
nj=1
L IIX,<x)' (9)

Then
E[~.(x) - ~m(x)]Z = 1m - nl (P{X I < x} - P{X I < X, X z < x}) -+ 0
m'n
as m, n -+ 00, whence ~.(x)!. some ff-measurable LV. ~(x), implying
(Exercise 3.3.1(iv))
n
m

j= I
~.(Xj)!. n
m

j= I
~(x), m~1. (10)

Let lXI' ... , IXmdenote positive integers and set


M = {IX: IX = (IXI"'" IX m), where 1 :$; IXj :$; n for 1 :$; i :$; m},
N = {1X:IXEM,lXj "# IXj,i "#j, I:$; i,j:$; m},

I(IX) = IIX" <XI •...• X, ... <X"')·

Whenever n > m, taking cognizance of (9),


1 m n 1
n n
m

j=1
~.(Xj) = m L IIX;<Xj)
n j=1 i=1
= m L I(IX)
n HM

= m1 ( L + L ) I(IX). (11)
n ~eM-N ~eN
7.3 Conditional Independence, Interchangeable Random Variables 233

Since as n --+ 00

1 1 nm - n(n - I)· .. (n - m + 1)
Iii
n aeM-N
L l(a)::; Iii
n
L
aeM-N
1=
n
m --+ 0

for each element A = X - I(B) E Y, (11) entails

f Ii~n(Xj)dP=n-mI f
Aj=1 aeN X-I(B)
l(a)dP+o(1). (12)

Now for any (X E N, if 7t E Q is chosen so that 7t i = (Xi' I ::; i ::; m, then by


definition of Y and interchangeability

fX-I (B)
l(a)dP = f
(nX)-I(B)
1IX,,<Xl ..... X'.,,<XmldP

= f
X-I(B)
1[Xl <XI.·· .• Xm<xm] dP .

Thus, from (12)


+
f n ~n(x)dP = n(n -
m 1) ... (n - m I)
m
A j= I n

x f
X-I(B)
1 1XI <XI, .... X'!'<Xm) dP + 0(1)

--+ JA11XI <Xl ..... Xm<Xm] dP. (13)

On the other hand, the left side of(13) tends to SA nj= I ~(x)dP by (10) and
dominated convergence. Hence for any A E Y, all real x I' ... ,Xm , and positive
integers m,

f Ii~(x)dP= f
A j=1 A
1 I X I<Xl, ... ,Xm<xml dP ,

and therefore

EtJjI~(X)ly}=p{XI <xI,··"Xm<xmIY}, a.c. (14)

Since Y :::::> ff and ~(x) is ff-measurable for every x, (14) entails

n ~(x),
m

P{X I < XI"'" X m < XmI Y} = a.c. (15)


I

P{X I < XI"'" X m < XmI ff} = E{ I) ~(x) Iff} = I) ~(Xj), a.c. (16)

In particular, for m = I, ~(XI) = P{X I < xIIY} = P{X I < XI Iff}, a,c.,
and by (15), (16)
234 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

P{X I < XI"",X m < x m l9'} = nm

I
P{X I < x;I9'} = n
m

I
P{X I < xilff}

= P {X I < X I' ... , Xm < Xm Iff}, a.c.


(17)
Finally, for any j = 2, ... , m, letting X; -+ co for i '" j, I ~ i ~ m, in (17),

P{X j < Xj I·'I'} = P{X I < x j l9'} = P{X I < Xj Iff}


= P{X j < Xj If}, a.c. (18)
Together, (17) and (18) assert that {X n , n ~ I} are conditionally i.i.d. given
either 9' or ff and that (8) holds. 0

Corollary 4 (de Finetti). If {Xn , n~ I} are interchangeable LV.S, there exists a


a-algebra I'§ of events such that for all m ~ I

P{X I < XI"'" X m < x m } = {Ol P{X I < Xj II'§} dP.

Corollary 5. If r.v.s {X n , n ~ I} are conditionally independent given some


a-algebra I'§ of events, there exists a regular conditional distribution, say pro,
for X = (X 10 X 2, " .) given I'§ such that for each WEn the coordinate LV.S
{~n' n ~ 1} of the probability space (R 00, f!400, pro) are independent. Moreover,
if X j E !fl , 1 ~ j ~ n, and E[X IX 2'" X n ] exists,
nE{X; II'§},
n

E{X IX 2'" X n II'§} = a.c., n ~ 2. (19)


;=1

PROOF. For any rational A., define Fi(A.) = P{X; < A. I I'§}(w), i ~ 1. There
exists a null set N E I'§ such that for WE NO, all positive integers n, and all
rational A., A.', r, r l , . . . , rn :

P{OI [X; < rJ!1'§ }(W) = t~ Fi(r;);


Fi(A.) ~ Fi(A.'), A. > A.'; Fi(A.) = limFi(r), I ~ i ~ n;
'fA
lim Fi(A.) = 0, lim Fi(A.) = I, 1~ i ~ n.
A--CO .. ~oo

Define for WE N C , rational rio ... , r n , and all n ~ 2,

nFi(r;).
n

Fn.ro(rl' ... , rn) =


i= I

Then properties (i)-(v) of Theorem 7.2.2 and Corollary 7.2.1 hold, so that
defining Fn,ro = nJ= I F Xj for WEN and continuing as in the proof of
Corollary 7.2.1, existence is ensured of a regular conditional distribution
pro for X given I'§.
7.3 Conditional Independence, Interchangeable Random Variables 235

Let {¢n, n ~ I} be coordinate LV.S on (ROO, fJloo, P"'). Since for all n ~ 2
and real XI"'" Xn
P"'{¢I < XI"'" ¢n < x n} = nn

i= 1
Ff'(xj-) = nP"'{¢j < xJ
n

i= 1

for WE N C , while

n P{X j < xJ
n

P"'Rt < XI' ... , ¢n < x n} = Fn,,,,(x l , · · · , x n) =


j= I
n
= np"'{¢j < xJ
i=1
for wEN, it follows that {¢n, n ~ I} are independent LV.S on (ROO, fJloo, P"')
for each WEn.
Apropos of (19), if P~ is a regular conditional distribution for X =
(XI' X 2 , ••. ) given t§, then since Px is a product measure, say, PI x P2 X ...
on fJloo, via Corollary 7.2.2 and Fubini's theorem

E{X I X2"'Xnlt§} = f XIX2···xndP.f.

L
Roo

= iU Xi dPj(x;) = D E{X;it§}, a.c. D

EXAMPLE I. Let {Xn' n ~ I} be interchangeable r. V.S on (0, §', P) and t§ = f7


or Y' (as in Theorem 2). (i) If X I E 2"1' then lin I7=1 Xi ~ E {X II t§} (ii) If
n' P{IXII> n} = 0(1), then lin I7= 1 X j - E{X 11t§} !. O.
PROOF. Let P'" be a regular conditional distribution for X = (X 1, X 2, . , ,)
given t§ such that the coordinate r.v.s {¢n, n ~ I} of the probability space
(ROO, fAoo, Poo} are i.i.d. for each WEQ. If Sn = Ii X j and E"'¢I = fRoo ~I dP"',
then by Corollary 7.2.2 E{Xllt§} = E"'¢I' a.c. By Corollaries 7.2.2 and 7.3.5
for any e > 0

!~ pLvm[1~Sn - E{XIIY'} I> e]} = !~ pLVm [1~Sn(W) - E"'¢II > e]}


=!~ EP"'LVm[l~i~ ¢i-E"'¢II>e]}=o. D

e
Apropos of (ii), set S~ = IJ=I Xi, T: = IJ=I j , where Xi = XAIXil~n)'
¢j = eill~jl,;;n), 1 ~j ~ n. Then P{Sn -:F S~} ~ IJ=I P{IXjl > n} = 0(1) and

E(S~ - n E{X~It§W = EE"'(T: - n E"'¢~)2 = EE"'Ltl (¢j - E"'eDJ

= n EE"'(e~ - E"'e~)2 ~ nEe? = 0(n 2 )


according to (17) of Theorem 5.2.4. Thus, S~/n- E{X11t§} !. 0, and so Snln-
E{X 11t§} !. O. 0
236 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

In view of Corollaries 4 and 5 it seems plausible that many results valid for
sequences of i.i.d. random variables carryover to sequences of interchange-
able r.v.s (see Chapter 9).

Theorem 3. Let {X n , n ;;:: I} be conditionally independent given a a-algebra <§


of events and let ff = n:,=t a(X j , j ;;:: n) denote the tail a-algebra. Thenfor

any T E ff there exists some G E <§ with P{G ~ T} = O.


PROOF. Via Corollary I, a(X t, ... , X n) and a(X j ' j > n) are conditionally
independent given <§ for any n ;;:: l. Then U:,=
1 a(X t, ... , X n) and ff are
conditionally independent given <§, whence by Theorem l(iii), a(X 1, X 2' ...)
and ff are conditionally independent given <§. Hence, for T E ff, P{TI<§} =
P 2 {TI<§}, a.c., implying P{P{TI<§} = 0 or I} = l. Thus,
P{T I <§} = IG , a.c.,
for some G E <§, implying

P{T·G} = Lp{T1<§}dP = P{G}

or P{G· T C } = O. Analogously, P{T· G = 0, and so P{G ~ T} = 0.


C
} 0

Theorem 2 states that interchangeable LV.S {X n , n = 1,2, ...} are con-


ditionally i.i.d. given either the tail a-algebra ~ or the a-algebra !/ of
permutable events. What is the relationship between .'J' and ff? More
generally, if {X n , n ;;:: I} are conditionally i.i.d. given <§ and also given £,
what is the connection between <§ and £?

Definition. If (n, ~, P) is a probability space and <§ is a sub-a-algebra of ~,


the completion <§* of <§ is a sub-a-algebra of ~ (Exercise 1.5.9) defined by
<§* = {G~N: GE<§, NE~ with P{N} = O}.
This definitition of completion differs slightly from the customary one
which replaces <§* by <§* = {G ~ M: G E <§, MeN E~, P{N} = O}.
However, <§* may not be a sub-a-algebra of~, whereas <§* c ~.

Lemma 1. Let <§ be a a-algebra of events and f§* its completion. If X* is any
<§*-measurable function, there exists a <§-measurable function X such that
X = X*, a.c.
PROOF. Let
£ = {X*: X* ;;:: 0, X* is r1*-measurable, and X* = X, a.c.,
for some <§-measurable function X}.
Then £ is a monotone system and I G E £ for all G E <§*. Hence, £ con-
tains all nonnegative <§*-measurable functions. The general case now follows
from X* = X*+ - X*-. 0
7.3 Conditional Independence, Interchangeable Random Variables 237

Lemma 2. Let ~* be the completion ofa a-algebra ~ ofevents. (i) Ifl E X I ~ 00,
then E{X I ~*} = E{X I ~}, a.c. (ii) LV.S {X n , n ~ I} are conditionally
independent (resp. conditionally i.i.d.) given ~ iff {X n , n ~ I} are condition-
ally independent (resp. conditionally i.i.d.) given ~*.
PROOF. (i) Since by Lemma 1, E{X I ~*} = Y, a.c., where Y is ~-measurable,
E{X I ~*} = E{X I ~}, a.c., whence (ii) follows immediately from (i). 0

Theorem 4. Let {X n' n ~ I} be a sequence of r.v.s, ~ a sub-a-algebra of


a(X n , n ~ 1), and f7 the tail a-algebra. If~* and f7* are the completions of
~ and f7 respectively and {X n , n ~ I} are conditionally i.i.d. given ~, then
~* = f7*.

PROOF. According to Theorem 3, ~* =:> f7 and hence ~* :=l f7*. By Theorem


2, {X n' n ~ I} are interchangeable and the proof of this theorem shows that

~n(x) = ~
n
f I IXj <x)!!2. P{X
j=1
1 < x I f7},

whence (Exercise 7.1.9)

E Hj~ IIXj<Xll~*}.!. E{P{X 1 < xlf7}I~*} = P{X 1 < xlf7}.

By Lemma 2, {X n } are conditionally i.i.d. given ~*, implying

E{~n .r. IIXj<x) I~~*} = P{X


j= 1
1 < x I ~*} = P{X 1 < x I f7}, a.c.

Consequently, for A E ~ and (x I' ... , x m ) E Rm , employing Lemma 2 again,

J
= E[IAjQ P{X < Xi I ~*} If7 = E[IAiQ P{X j < Xi I f7} If7
j J
= P{A I f7} . p{0[Xj < xiJ If7}. a.c.

Hence, by Theorem 1, ~ and a(X I" .. , X m) are conditionally independent


given f7 for any positive integer m. Therefore, ~ and a(X I' X 2' ...) are
conditionally independent given f7. Since ~ c a(X I' X 2' ...), P{ G I f7} = 0
or 1, a.c., for G E~. For any G E ~ ifT = {w: P{G I f7} = I}, then T E f7 and
P{TG} = E P{TG I f7} = E IT P{G I f7} = P{T},
P{G} = EP{GIf7} = EI T P{GIf7} = P{TG}.
Therefore, P{T ~ G} = 0, whence G E f7*. Hence, ~ c f7*, implying
~* c f7*. Consequently, ~* = f7*. 0
238 7 Conditional Expectation. Conditional Independence. Introduction to Martingales

It follows directly from Theorem 4 that if {X., n ~ I} are conditionally


i.i.d. given l'§ and also given Jr, then l'§* = Jr*.

Definition. A a-algebra l'§ of events is called degenerate if P{G} =0 or 1,


for all G E l'§.

Corollary 6. If {X., n ~ 1} are conditionally i.i.d. given a a-algebra l'§ c


a(X., n ~ 1) and the tail a-algebra is degenerate, then l'§ is degenerate.

Corollary 7. If {X., n ~ 1} are interchangeable LV.S with a degenerate tail


a-algebra, then the a-algebra of permutable events is degenerate.

In particular, Corollary 7 in conjunction with the Kolmogorov zero-one


law yields

Corollary 8 (Hewitt-Savage Zero-One Law). If {X., n ~ I} are i.i.d. r.v.s,


then the a-algebra of permutable events is degenerate.

EXERCISES 7.3
I. Let S denote the set of positive or nonnegative or all integers and let (P•• n E S} be a
probability density function. Define P = {Pi}} to be a stochastic matrix. that is,
Pi} ~ 0 for all i, j E S, and Iies Pi} = I for all i E S. (i) Verify that p{nj=o [Xi =
iJ} = Pio nj;;J Pij.ij+1 satisfies the consistency requirement of Theorem 6.4.3. The
LV.S (X., n ~ O} are called a temporally homogeneous Markov chain with state
space S. (ii) Check that Pi} = P{X.+ 1 = j IX. = i}, n ~ 0 for all i, j E S, and verify
for n ~ I, m > 0, that "the past" X 0' ... , X .-1 and "the future" X.+ I> . . . , X .+m
are conditionally independent given "the present" X•. (iii) Verify that the product
of two stochastic matrices is a stochastic matrix and interpret the equation pm+. =
pm. p. probabilistically.

2. Verify that the class Y of permutable events as defined in (7) is a a-algebra and prove
Corollary I.
3. If S. = Ii Xi where {X., n ~ I} are independent r.v.s then 8 1 and 8 3 are condi-
tionally independent given 8 2 , Hint: Via Corollary 7.1.2 if F3 is the dJ. of X 3

P{8 1 < YI' 83 < Y318 1 , S2} = I[S,<Yd' F3(Y3 - S2)


4. If {X., n ~ I} are interchangeable LV.S, then for all r > 0

E IX I X 2 ... X. I' = f E· {IX I I' I Y} dP, n = 1,2, ... ,

where [/ is the a-algebra of permutable events.


5. There exist interchangeable r.v.s (X., n ~ 1} not i.i.d. with E Xf = XJ but

finite for all k ~ I (in particular, with zero covariances).


7.4 Introduction to Martingales 239

6. If a r.v. X is independent of the i.i.d. r.v.s {X., n ~ I}, show that y" = X. + X,
n ~ 1, are interchangeable r.v.s. Express that joint dJ. of (YI , ... , y,,) in terms of
the dJ.s of X and X I .
7. For any n ~ 2, find n interchangeable r.v.s {X 1, ... , X.} which cannot be embedded
in a collection of n + I interchangeable LV.S {XI"'" X., X.+ I }. Hint: Recall
Exercise 6.3.10.
8. If {X., n ~ I} are .Ii'l interchangeable LV.S with p = E(X 1 - EX I)(X 2 - E X I),
then E X I X 2 = 0 iff E X I = 0 and p = O.
9. If {X., n ~ I} are .Ii'2 interchangeable LV.S and {Y", n ~ I} is defined by X. =
y" + E{XdY'}, then {y", n ~ I} constitute uncorrelated interchangeable LV.S.

10. Prove that if {X n} are .Ii'l interchangeable LV.S then

I' I I I I
I
E ~ ~ Xj ~ E n _ I ~ Xj n 2.
n- 1
,
~

11. Prove that if the r.v.'s {X n, n ~ I} of Corollary 5 are conditionally i.i.d. given rJ,
then the coordinate LV.'S {en, n ~ I} are i.i.d.

7.4 Introduction to Martingales


The indebtedness of probability theory to gambling is seldom as visible as in
the concept and development of martingales. The underlying notion is that
of a fair game in which regardless of the whims of chance in assigning out-
comes to the past and present, one's (conditional) total expected future
fortune is precisely one's current aggregate. Analogously, submartingales
and supermartingales correspond to favorable and unfavorable games,
respectively. Although martingales were first discussed by Levy, the realiz-
ation of their potential, the fundamental development of the subject and
indeed most of the theorems presented are due to Doob.
Let (n, fi', P) be a probability space and N some subset ofthe integers and
±oo, that is, N c {-oo, ... , -2, -1,0,1,2, ... , oo}. A sequence fi'n of
sub-a-algebras of fi' with indices in N will be called a stochastic basis if it is
increasing, i.e., fi'm c fi'n for m < n. If {fi'n, n EN} is a stochastic basis and
Sn is an fi'n-measurable function for each n E N, then {Sn, fi'n, n EN} is
called a stochastic sequence. An fR p stochastic sequence, p > 0, is one for
which EISnlP < 00, n E N, and correspondingly an fRp bounded stochastic
sequence is one satisfying sUPnEN EISnlP < 00.

Definition. A submartingale is a stochastic sequence {Sn, fi' n' n EN} with


IE Sn I ::; 00, n E N, such that for all m, n E N with m < n
E{Sn I fi' m} ~ Sm, a.c. (I)
If {-Sn' ·'F n, nE N} is a submartingale, then {Sn, .'F n, nE N} is called a super-
martingale. Moreover, if {Sn' fi' n' n E N} is both a submartingale and a
supermartingale, it is termed a martingale.
240 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

Definition. If {ffn, n E N} is a stochastic basis, a stopping time or ffn-time T


is a function from Q to N u {oo} such that {T= n} E ffn, n E N, and
P{[TEN] u [T = oo]} = 1.

When N = {I, 2, ... , oo}, the preceding coincides with the definition of
Section 5.3 and T is a finite stopping time or stopping variable if
P{T = oo} = O.
If {Sn, ffn' n E N} is a submartingale and there exists a measurable
function R with IE R I ::s; 00 (resp. an ()nEN ffn-measurable function L with
IELI ::s; (0) such that for each n E N

E{R I ffn} :2: Sn I


(res p . E{Sn DNffn} ~ L)' a.c., (2)

then {Sn, ffn, N} is said to be closed on the right (resp. left). A closed sub-
martingale {Sn, ffn, N} is one which is closed both on the right and left. A
submartingale {Sn' ffn, N} is declared to have a last (resp., first) element if
N has a maximum (resp. minimum). Obviously, a submartingale with a last
(resp. first) element is closed on the right (resp.left). A martingale {Sn, ffn' N}
is said to be closed on the right by R (resp. on left by L) if (2) holds with
equality.

The submartingale {Sn, ff n' n :2: I} will be closed iff it is closed on the right
since it is automatically closed on the left by S I' An analogous statement
holds for {Sn, ffn, n ::s; -I}.
The important special case ofa submartingale or martingale {Sn, ffn, N}
with ffn = u(Sm, m ::s; n, mEN), n E N, will be denoted by {Sn, N} or {Sn,
nEN}.
Simple properties of conditional expectations readily entail the following:
i. The stochastic sequence {Sn, ffn, - 00 < n < oo} is a submartingale
iff for every finite integer n, IE Snl ::s; 00 and E{Sn+ II ffn} ~ Sn, a.c.
ii. If {Sn, ffn, N} is a submartingale and {Sn, <§n, n E N} is a stochastic
sequence with <§n c ffn, n EN, then {Sn, <§n, N} is a submartingale.
iii. Ifboth {Sn' ffn, N} and {S~, '~n, N} are submartingales and E Sn + E S~
exists for each n E N, then {Sn + S~, ffn' N} is a submartingale.
iv. An IE I stochastic sequence {Sn = I~ Xi, ffn, - 00 < n < oo} is a
submartingale (resp. martingale) iff for -00 < n < 00
E{X n+ 1 I ffn} :2: 0 (resp. E{X n+ 1 I ffn} = 0), a.c.;

the LV.S {X n } are called martingale differences in the latter case.


v. If {Sn = Ii= I Xi> ffn, n :2: 1} is an !e2 martingale, then

n
E S; = I EX;, n ~ 1.
j= I
7.4 Introduction to Martingales 241

The condition of (iv) corroborates the view of Sn as an aggregate of outcomes


X n of favorable or fair games.
If, as in Example 3, {Sn, iF n, n ~ m} is a stochastic sequence such that
{S:, iF:, n ::s;; -m} is a martingale, where S: = L n , iF: = iF -n' n ::s;; -m,
then {Sn, iFn, n ~ m} is sometimes alhided to as a downward (or reverse)
martingale, in contradistinction to the more standard upward martingale
{Sn, iF n, n ;?: m} of Example 1.
The following examples attest to the pervasiveness of martingales.

EXAMPLE 1. Let {X n' n ~ I} be independent .P 1 random variables with


Sn = L7=1 Xi' Then {Sn' n ~ I} is a submartingale ifE X n ~ 0, n > 1, and a
martingale if E X n = 0, n > 1.

EXAMPLE 2. Let {X n' n ~ l} be independent .P t random variables with


E Xn = 0, n ~ 1. For any integer k ~ I, if

Sk.n = L X i,X i2 •.• X ik , n ;?: k,


1 ~il < ... <ik5:n

then {Sk.n, n ~ k} is an .P1 martingale. The important special case k = 1 is


subsumed in Example 1. More generally, if {X n , n;?: I} are .P k random
variables with E{X n+ tl·'1'n} = 0, n ;?: 1, where {X n, ·'1'n, n ;?: I} is a sto-
chastic sequence, then {Sk.n' iFn, n ~ k} is an.P1 martingale with E Sk.k = 0.

EXAMPLE 3. Let {X n' n ~ I} be interchangeable random variables and cpO,


a symmetric Borel function on R mwith Elcp(Xl>"" Xm)1 < 00. A sequence
of so-called U -statistics is defined for any integer m ;?: I by

Um. n = (n)-I
m
L
1 ;s;i l < ... <i m :5n
cp(X i " ... , X im ), n ;?: m.

If iF n = a(Um.j,j ~ n), then for 1 ::s;; i l < ... < im ::s;; n + I and BEiF n+t

via symmetry and interchangeability (Exercise 7.1.12), implying


E{cp(X i,,···, Xi.J I iF n+d = E{cp(X 1"'" X m) I iF n+d, a.c.,
for 1 ::s;; i 1 < ... < im ;S; n + 1, and hence

E{Um.nliFn+d = E{Um.n+1IiFn+d = Um. n+ 1 , a.c.

Consequently, if U: = Um. -n and iF: = iF -n' n::S;; -m, then {U:, iF:,
n ::s;; -m} is a martingale closed on the right. The important special cases
m = I, cp(x) = x and m = 2, cp(x 1, X2) = (Xl - X2)2j2 yield the arithmetic
mean U l.n = (ljn) L7= 1 Xi = Xn (say) and the sample variance U 2.n =
(n - 1)-1 L7=1 (Xi - Xn )2 respectively.
242 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

EXAMPLE 4. Let {X n' n ~ I} be ff j LV.S which are conditionally independent


given r; with E{X n I r;} ~ 0 a.c. for n ~ 1. If Sn = L~ Xi and fl'n = (J(r; u
(J(X j, ... , X n», then, recalling Theorem 7.3.1(iv), {Sn, fl'n, n ~ I} is a sub-
martingale. In particular, if {X n' n ~ 1} are ff I interchangeable LV.S with
E X IX 2 = 0, then {Sn, fl'n, n ~ 1} is a martingale since, invoking Theorem
7.3.2 and Corollary 7.3.5,0 = EX IXn = E[E{X jXn,r;}] = E[E 2 {X nlr;}],
n ~ 1.

EXAMPLE 5. Let {Sn = D Xj, fl'n, n ~ I} be a submartingale with EX:


< ro, n ~ 1, and let T be an fl'n-time. If 1;. = min(T, n) and V n = ST = n

L~ XJIT"2jj, then {Vn, fl'n' n ~ 1} is a submartingale since V n is fl'n-


measurable, E V n+ ~ D EX: < ro, and

E{Vnl'~n-l} = Vn- I+ E{XnI[T"2n)I'~n- d


= V n- I + I(T"njE{Xnl'~n-l} ~ Vn-I> a.c., n > 1.

EXAMPLE 6. Let (0, fl', P) be the probability space consisting of Lebesgue


measure on the Lebesgue-measurable sets of 0 = [0, 1] and for each n =
1,2, ... let the points 0 = w n. -I < w n. O < w n. 1 < ... < Wn,2n = 1 engender
a partition, say Qn' of 0 into disjoint intervals such that Qn+ I is a subpartition
of Qn' Choose .'Fn = the (J-algebra generated by the intervals An. o =
[0, Wn.o], An. j = (Wn.j- I> Wn.j], I ~ j ~ 2n, and for any finite, real-valued
function 9 on 0 define

Vn(w) = g(wn,) - g(wn.j-d, WE An • j , 0 ~j ~ 2n .


Wn,j - Wn,j-I
Since, as is readily verified, for every n ~ 1

1 VndP = 1 V n+ 1 dP, AEfl'n, (3)

{Vn' ·'Y'n, n ~ I} is a martingale. Note that for N = {I, 2, ...}, EI Vnl < ro,
n ~ I, and (3) constitute an alternative definition of an ff j martingale
{Vn,fl'n,n ~ 1}.

EXAMPLE 7. Let {X n , n ~ I} be any sequence of 2) LV.S and define X~ =


X n - E{Xnl XI' ... , Xn-d, n ~ 2, and X'I = XI or XI - E XI' If S~ =
L~ X'i' then {S~, n ~ 1} is a martingale. In particular, if {Sn = D X j '
n ~ I} is an ffl submartingale and
n

S~ = Sn - IE{XjIXI ... ,Xj-d, n ~ 2, S'j = SI = Xl> (4)


j= 2

then {S~, n ~ I} is a martingale and 0 ~ L1=2 E{XjIXI, ... , Xj-d ja.c.


7.4 Introduction to Martingales 243

n
EIS~I ~ EISnl + L E Xj = EISnl + E Sn -
j:2
EX) ~ 3 sup EISnl·
n~1

Hence, every fill submartingale {Sn, n 2 1} can be expressed as Sn =


S~+ S:, n 2 1, where {S~, n 2 1} is an fill martingale and 0 ~ S: i a.c.
Moreover, if {Sn, n 2 1} is fill bounded so are S~, n 2 1, and S~, n 2 1.

The first question to be explored is convergence of submartingales.

Theorem 1. If {Sn' §n, n 2 1} is a submartingale such that for every {§n}-


time T

J [T< 00]
ST dP :F 00,

then limn.... 00 Sn exists a.c.


PROOF. If Pn {A} = fA Sn dP, A E § n' then by the submartingale property,
Pn+) {A} 2 Pn{A} for A E §no Suppose that for some pair of real numbersIX,/J

V = {flni Sn > IX > [J > lim Sn}


n-oo n-oo

has positive probability, say P{ V} > lJ > O. It may and will be supposed,
replacing Sn by (IX - [J)- )(Sn - (J), that IX = 1, [J = O.
Set mo = 1, B o = Q, Vo = B o V and C~ = Bo{Sn > 1, Sj ~ 1, for mo ~
j < n}. Define A) = U~~ C~, where nl is large enough to ensure P{Vo - Ad
< lJ/4.
Next, define D~ = Al {Sn < 0, Sj 2 0, n) ~ j < n}, B) = U::'i D~, where
ml is large enough to guarantee P{Vo - Bd < lJ/4 and note that
m,
Pm,{A 1 - Bd 2 Pml{A1 - Bd + L pj{DJ}
j=nl

nl
2 Pn,{Ad = L Pnl{C~}
n=mo
244 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

Furthermore, if VI = B 1 . Vo = BIB o V, then P{Vd = PWo} - PWo - Bd


> 3t5/4.
If C~ = B I {Sn > 1, Sj :::; 1, m1 :::; j < n}, A 2 = U~', C~, then n2 may be
chosen so that P{ VI - A 2} < t5/2 3 . Set D~ = A 2{Sn < 0, Sj;;::: 0, n2 :::;
j < n}, and
m2 m2 "1 mt

B2 = U D~ = A2 U {Sn < O} = U {S; > 1}' U {Sj < O}


"2 "2 1 n1

"2 m2

.U {Sk > 1}' U {Sn < O},


whence, analogously, P{ VI - B 2} < t5/2 3 if m2 is large and, moreover,

J.lm,{A 2 - B 2} ;;::: P{A 2} ;;::: P{VI } - P{VI - A 2} ;;::: 5:-


Proceeding inductively, for k = 1, 2, ... there exist integers mk+ 2 >
mk+ I > nk+ I > nk and sets A k E ~ nk' Bk E ~ mk with A k ::> Bk ::> Ak+ I such
that
(2 k + 1)t5 t5
J.lmk{A k - Bd > 2k+ I > 2'

Now the disjoint sets Ck = A k - BkE~mk' k;;::: 1, and so setting {T = md


= CdSmk ;;::: O}, k ;;::: 1, and {T = oo} = (Ur Ck[Smk ;;::: OJ)", it follows since
A k - Bk C {Smk ;;::: O} that

I Si=O, I s;=f,1 S,-:;;;:::I.J.lm{Cj}=oo,


JIT<OO) JIT<OO) I JCJIS,"j~O) J I J

contradicting the hypothesis. Consequently, P{ V} = 0 for any choice of


> Ii, implying the conclusion of the theorem.
IY. 0

Lemma 1. If {Sn, ~n, n EN} is a martingale and ({) is any real convex function
with IE({)(Sn) I :::; 00, then {({)(Sn)' ~n' nEN} is a submartingale. If (Sn' ~n'
n E N} is merely a submartingale but ({) is, in addition, nondecreasing, then
{({)(Sn)' ~ n' n E N} is likewise a submartingale.
By the conditional Jensen inequality, Theorem 7.1.4, for m < nand
PROOF.
m,nEN,
E{({)(Sn) I·~ m} ;;::: ({)(E{Sn I ~ m}) = ({)(Sm)' a.c.
yielding the first statement. The submartingale hypothesis combined with ({)
nondecreasing converts the prior equality to ;;:::, thereby preserving the
oo~~oo. 0

In particular, if {Sn' n ;;::: l} is a martingale, then {ISnIP, n;;::: l} is a sub-


martingale for any p ;;::: 1, and if {Sn, n ;;::: 1} is a submartingale, so is
(max(Sn, K), n ;;::: 1} for K E ( - 00, 00).
7.4 Introduction to Martingales 245

Lemma 2. If{Sn,g;n,n :2: I} is a submartingale with sUPn"21 ES; = M < 00,


thenfor any stopping time T

r
J[T<oo]
S;dP-:s;,M, r
J[T<oo)
ISTldP-:s;,2M-ES I -:s;,3supEI Snl.
n"21
(5)

PROOF.Set T' = min(T, n), n :2: I. Since {S;, g;n, n :2: I} is a submartingale
by Lemma 1,

E s;, = L n
j= I
I
[T=j)
st + I [T>n)
s; -:s;, Ln
j= I
I
[T=j)
s; + I
[T>n]
s; = E Sn+,
so that E S;, ~ M, whence the first part of (5) follows from Fatou's lemma,
Next,

E SI = i[T= I]
SI + i [T> I)
SI -:s;, i
[T= I)
SI + i[T> I)
S2

-:s;, i[T=I]
SI + i [T=2)
S2 + i
[T>2]
S2 -:s;, ...

-:s;, r
J[I :5T:5n)
ST + r
J[T>n)
Sn = EST"

implying
E 1ST' I = 2 E S;, - EST' -:s;, 2M - E SI'
and the remainder of (5) follows once more by Fatou. o
The special case X n == X o of the next corollary, due to Doob (1940), may
be considered a milestone in the development of martingales, When X 0 is
bounded, this had been obtained by Levy (1937).

Corollary 1. Let {g; n' n :2: 1} be a stochastic basis and {X n' n :2: O} a sequence
of.If I LV.S with X n ~ X 0 and E sUPn"211 Xnl < 00. Then, if

g; 00 = a( Ug;n),
n=1

PROOF. (i) It will first be demonstrated for any integrable r,v, X 0 that
E{X o I g;n} ~ E{X o I g; oo}·

By considering xt and X o separately, it may and will be supposed that


X o :2: O. Then, setting Sn = E{X o I g;n}, {Sn, g;n, n:2: I} is a nonnegative
martingale with SUPn"21 EISnl = E SI = E X o < 00. Hence, by Theorem 1
and Lemma 2, Iim n_ oo Sn = Soo exists, a.c., and, moreover, Soo is an integrable
r.v. by Fatou's lemma. Since
246 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

P{Sn > C} ~ c- I E Sn = C- I EX 0 --+ 0 as C --+ 00,

r
JISn>C)
Sn= r
JISn>C)
E{Xolg;n}= r
JISn>C)
Xo--+O

as C --+ 00, and so {Sn' n 2 I} is u.i., whence Sn~ Sro by Corollary 4.2.4.
Thus, for all n 2 I and A E g;n, if m > n,

IXo = f}n = ISm ~ f}ro,


implying SA X o = SA Sro for all A E Uf g;n, so that E{X o I g; ro} = Sro =
lim Sn, a.c.
Next, set Ym = sUPn~mlXn - Xol. For every integer m and all n 2 m

Dn == IE{X n I g;n} - E{X o Ig; ao} I


~ IE{(X n - XO)lffn}1 + IE{Xolg;n} - E{Xolg;ao}1
~ E{Ymlg;n} + IE{Xolg;n} - E{Xolg;ao}l,

whence for every integer m


lim Dn ~ E{ Ym I g; ao}

by the part (i) already proved. Since Ym ~ 0 and I Ym I ~ 2 sup IXnIE !l\,
it follows that E{ Ym I g; ao} ~ 0 as m --+ 00 by Theorem 7.1.2, whence
~~Q 0

Theorem 2. Let {Sn, g;n, n 2 I} be a submartingale and g; ao = a(Uf g;n)'


I. If SUPn~ I E S: < 00, then Sao = lim Sn exists a.c. with Sao finite a.c. on
{SI > - oo}. Moreover, ifsuPn~ I EI Snl < 00, then EI Sao I < 00.
11. If {S:,n 21} are u.i., then Sao = limS n exists a.c. and {Sn,g;n,
I ~ n ~ oo} is a submartingale.
iii. If {Sn, g;n, n 2 I} is a submartingale closed by some g;-measurable
function S with E S+ < 00, then {S:, n 2 I} are u.i., so that
Sao = lim Sn

exists a.c. and, moreover, {Sn, g;n, 1 ~ n ~ oo} is a submartingale closed


byS.
iv. The LV.S {Sn, n 21} are u.i. iff {Sn, g;n, 1 ~ n ~ oo} is an!E 1 sub-
martingale with lim n_ ao E Sn = E Sao iff Sn ~ Sao' where Sao = lim Sn'
PROOF. (i) By Lemma 2 and Theorem I, Sn ~ Sao' Moreover, if sUPn~1
EISnl < 00, Fatou's lemma guarantees EISaol < 00. Next, for any k > 0 set
S~ = SnIIS,>-k)' Then {S~, g;n, n 2 I} is a submartingale with sUPn~1
E S~+ < 00 and E S'I 2 -k. Lemma 2 with T == n ensures sUPn~1 EIS~I
< 00, whence Sao is finite a.c. on {SI > - k}. Letting k --+ 00, the remaining
portion of (i) obtains.
7.4 Introduction to Martingales 247

Apropos of (ii), the hypothesis implies Sn ~ Soo via (i). Moreover, for
A E fF mand n 2= m, applying Theorem 4.2.2(ii) to - Sn 1A'

f f
A
Sm :5;
A
Sn :5; TIm
n
f f
A
Sn :5;
A
Soo,

whence E{Soo IfFm} 2= Sm, a.c., for m 2= 1.


In case (iii), the hypothesis and Lemma 1 ensure that {Sn+' fF n' n 2= I} is a
submartingale closed by S+, whence for n 2= 1 and k > 0

i [Sn>k)
S+ <
n -
i[Sn>k)
S+
.

Since P{Sn > k} :5; k- I E Sn+ :5; k- I E S+ -+ 0 uniformly in n as k -+ 00, it


follows that

lim i
k-oo [Sn>k)
S: :5; lim
k-Q
i [Sn>k]
S+ = 0,
uniformly in n as k -+ 00. Now each S: is integrable and so {Sn+, n 2= I} are
u.i. By (ii) {Sn, fF n' 1 :5; n :5; oo} is a submartingale. To verify that it is closed
by S, define S~k) = max(Sn' -k), 1 :5; n :5; 00, and S(k) = max(S, -k), where
k > O. Then {S~k), n 2= I} are u.i., S~k)~S~), and by Lemma 1 {S~k), fF n,
n 2= I} is a submartingale closed by S(k). Hence, for A E fF n , n 2= 1,

L LS~k) ~ LS~) L
S(k) 2= 2= S 00'

Since E S+ < 00 and -S(1) :5; _S(k) i -S,

f A
S = lim
k-oo
f A
S(k) 2= f A
S 00 ,

implying E{S I fF oo} 2= Soo' a.c.


In part (iv), if {Sn' n 2= I} are u.i., (i) and (ii) ensure that Sn ~ Soo and
{Sn, fF n' 1 :5; n :5; oo} is an .P I submartingale. Then, by u.i., lim n_ 00 E Sn =
E Soo'
Conversely, if {Sn, fF n , 1 :5; n :5; oo} is an.P I submartingale with lim E Sn
= ES oo , then {S:, n 2= I} are u.i. by (iii). Hence, ES: -+ ES;:;, and so
E S;; -+ E S~. Since S;; ~ S~, Corollary 4.2.4 ensures that {S;;, n 2= I} are
u.i.
Finally, if {Sn, n 2= 1} are u.i., Sn ~ Soo by Theorem 4.2.3, and this same
theorem also yields the converse. 0

Theorem 3. Let {Sn, fF n' n 2= I} be a martingale and fF 00 = a(U.~)= I fF n)


I. IfsupEISnl < oo,thenSn~SooE.PI'
n. If {Sn, n 2= I} is u.i., then Sn~ Sa; E.P I and {Sn, fF n, 1 :5; n :5; oo} is a
martingale.
248 7 Conditional Expectation. Conditional Independence. Introduction to Martingales

iii. If the martingale {Sn' fF n, n ~ I} is closed by some r.v. SE!l'I, then


{Sn, n ~ 1} is u.i., so that Sn ~ SOC) E!l' I and, moreover, E{S I fF n} = Sn,
a.c., 1 ~ n ~ 00.
IV. The LV.S {Sn, n ~ I} are u.i. iff {Sn' fF n, 1 ~ n ~ oo} is an!l' I martingale

with lim E Sn = E SOC) iff Sn ~ SOC), where SOC) = lim Sn'


PROOF. Parts (i), (ii), (iii), (iv) follow directly from their counterparts in
Theorem 2 (applied in the latter three cases to both Sn and - Sn)' 0

Corollary 2. If {Sn, fF n, n ~ I} is a positive (or negative) !l'1 martingale,


Sn~SOC)E!l'I'
The next theorem illustrates how global martingale convergence in
conjunction with stopping times yields local martingale convergence.

Theorem 4. If

is a submartingale with E sUPn;, I X: < 00, then Sn converges a.c. on the set
{XI> -00, sUPn;,1 Sn < oo}.
PROOF. For any c> 0, define T = T;, = inf{n ~ 1: Sn > c}. Then T is a
stopping time and {T;, = oo} = {suPn;, I Sn ~ c} --+ {suPn;, I Sn < oo} as
c --+ 00. As seen in Example 5, {Un = LJ=I XjI[T;,j), fF n, n ~ I} is a
submartingale and

E U: = E(.f. XjIIT;,jJ) +
J= 1
~ E(.f. X j I IT >i1) + + E(f. XjIIT=jl)+
J= 1 I

~ C+ EsupX: < 00.

Hence, by Theorem 2, Un converges a.c. on {X I > - oo}. Therefore, Sn =


LJ= I X j converges a.c. on {X I > - 00, Tc = oo}. Consequently, letting
c --+ 00, Sn converges a.c. on {X I > - 00, sUPn;' 1 Sn < oo}. 0

An issue of considerable importance and unquestionable utility in prob-


ability theory is the effect on expected values of randomly stopping a sto-
chastic sequence.

Theorem S. Let {Sn, fF n, n ~ I} be a submartingale.


I. 1fT is afinite {fFn}-time with

i IT>n)
s: = 0, (6)

then for n ~ 1

E{ST I fF n} ~ Sn, a.c. on {T ~ n} and E ST ~ E SI' (7)


7.4 Introduction to Martingales 249

tl. If {T", n ~ I} is a sequence of finite {j"n}-times with T 1 :$ T2 :$ ...


satisfying
m ~ I, (8)

and !F Tn = {B c 0: B[T" = j] E!Fj,j ~ I}, then {ST n , !F Tn , n ~ I} is a


submartingale.
PROOF. To prove the first part of (7) it suffices to verify for n ~ I and A E j "n
that

f
A[T:2:nl
ST> f
A[T:2:nl
Sn· (9)

Now

f A[T:2:nl
Sn = f A[T=nl
Sn + f A[T>nl
Sn

= f A[T=nl
ST + f A[T:2:n+ II
Sn+ I·

The last term on the right is the first term on the left with n replaced by n + 1,
so, repeating the argument m - n - I times,

fA[T:2:nl
Sn:$ f
A[nST<ml
ST + f A[T:2:ml
Sm = f
A[nSTSml
ST + fA[T>ml
Sm· (10)

Noting that JA[T>ml Sm :$ J[T>ml S;,


the desired conclusion (9) follows via
(6) sinceIE STI :$ 00. The remaining portion of (7) follows from (9) with
n = I, A = n = [T ~ I].
A propos of (ii), since S Tn is j " Tn-measurable, it suffices to prove that

a.c., n~1. (11 )

Let BE!F Tn and Bm = B[T" = m]. Then Bm E j" m and T,,+ t ~ m a.c. on
Bm • By (i)

whence, summing on m,

implying (11). o
250 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

Corollary 3. If {Sn' !F n, n E N} is a submartingale with N having a finite last


element and Tis a finite !Fn-time with IE STI ~ 00, then (7) holds for n EN.
PROOF. If m is the last element of N, the last term of (10) vanishes. D

Corollary 4. If {Sn' !F n, n ~ I} is a martingale and Tis a finite {!FnHime


with

lim
n-",
r
JIT>nl
ISnl = 0, (12)

thenfor n ~ 1

E{ST I !F n} = Sn, a.c. on {T ~ n} and EST = E SI' (13)

Moreover, under the hypothesis of (ii) of Theorem 5 with S: replaced by ISnl in


(8), {STn' !FTn , n ~ t} is a martingale; in particular,for any !F,.-time T, {SminIT,nl'
n ~ I} is a martingale.

In the age-old problem of a gambler's ruin (Section 3.4) i.e., symmetric


random walk on the line with absorbing barriers at integers b > 0 and
- a < 0, a particle starting at the origin moves at each stage one unit to the
right or left with equal probabilities, the successive moves Xi' i = 1,2, ... ,
being independent. Then {Sn = I~ X;, n ~ I} is a martingale and T =
inf{n ~ 1: Sn = b or -a} is a stopping variable satisfying (12) and T ~ 1.
According to Corollary 4, {SI' Sd is a martingale, so that
b P{ST = b} - a P{ST = -a} = EST = E SI = o.
Since the sum of these two probabilities is unity,
b
P{ST = -a} = - - .
a+b
As another illustration of the preceding, consider the following generaliz-
ation of the so-called ballot problem, in which r votes for the incumbent
and s votes for his rival are cast successively in random order with s > r.
The probability that the winner was ahead at every stage of the voting is
(s - r)/(s + r) and can be obtained from

EXAMPLE 8. Let {X j , 1 ~ j ~ n} be nonnegative integer-valued 2 1 inter-


changeable random variables and set Sj = I{
Xi' 1 ~j ~ n. Then

P{Sj <j, 1~j ~ nlSn} = (1 - ~nr


PROOF. Since the above is trivially true when Sn ~ n, suppose Sn < n. If
L j = S/j, !F _ j = a(Sj, ... , Sn), 1
j ~ n, then, as noted in Example 3,
~
OJ, !F j , - n ~j ~ -I} is a martingale. Furthermore, if

T = inf{j: -n ~j ~ -1, Yj ~ I}
7.4 Introduction to Martingales 251

and T = -1 if no suchj exists, then T is a stopping rule or finite {§)-time


and, moreover, a bounded LV. Since by definition YT = 1 on {Sj ?: j} U7
and zero elsewhere, Corollary 3 implies that on the set {Sn < n}

p{y [Sj ?: j] Sn} I = E{ YT I §" -n} = L n = ~n,


which is tantamount to the proposition. o
Setting X j = 2 or 0 according as the jth vote goes to the loser or his rival,
note that if rj (resp. s) of the first j votes are cast for the loser (resp. rival),
then given that Sn = 2r the event Sj < j, 1 :::;; j :::;; n = r + s, is tantamount
to rj < Sj' 1 :::;; j :::;; n.

Theorem 6. Let {Sn = I7


Xj, §"n,n ?: I} be a submartingale with EX: < 00,
n?: 1 and let T be a finite {§"n}-time and §"o = (0, Q). If(i)
T
EIE{X:I§"n-d < 00 (14)
n=1

or (ii) {S: , n ?: I} are uniformly integrable, then for n ?: 1


E{ST I §"n} ?: Sn, a.c. on {T ?: n}, and EST?: E SI' (15)
PROOF. Under (i)
T 00 00

E I X n+ = E I X n+ I[nn) = I E[I[T~n) E{X: I§"n- dJ


1 1 1

00 T
= EI I[T~nl E{X: I§"n-I} = E I E{X: I§"n- d < 00.
1 1

Hence, E S; :::;; E If X + < 00 and, moreover,

I
n

I[T>n)
s+n <-
I
[T>n)
L
n

j=1
x:t
'\' . J -
< '\'
T
L.J
[T>n)j=1
x:t = 0(1)

as n --+ 00. Thus, (6) and consequently the conclusion (7) of Theorem 5 hold.
Under (ii), sUPn2: 1 E S: < 00 whence E S; < 00 by Lemma 2. Since
P{T> n} = 0(1), the remainder of(6) and consequently (15) follow from u.i.
o
Corollary 5. If {Sn = I7
Xi' §"n' n ?: 1} is an It'l martingale and T is a
finite {§"n} -time satisfying
T
ELE{lXnll§"n-d < 00 (16)
n=1

(in particular, if T is a bounded r.v.) or if {Sn' §" n' n ?: 1} is u.i., then for any
252 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

n ~ 1

E{ST I '?n} = Sn, a.c. on {T ~ n}, and EST = E SI' (17)

Corollary 6. If {Sn = D Xj, .?n' n ~ I} is an 2 1 martingale and T is a


finite {~}-time satisfying (12) or (16), thenfor any r ~ 1

E{ ISri' I'?n} ~ ISnl', a.c. on {T ~ n}, and EISrI' ~ EISII'. (18)

PROOF. By Lemma 1, {lSnl. .?n, n ~ I} is a submartingale, and so, by


Theorem 6
(ISnII[nnl ~ E'{ISTII[nn) I'?n} ~ E{ISTI'I[nn) I'?n},
recalling the conditional Jensen inequality. o
Next, martingale generalizations of Theorem 5.3.1 (Wald's equation)
and Theorem 5.3.3 will be obtained. Let {Sn = LJ=I Xj' .?n, n ~ I} be an
2 2 martingale. As noted in Example 2, {S; - D
XJ = 2 II $i<j$n XiX j ,
~, n ~ 2} is an 2 1 martingale with expectation zero, whence for any stopping
variable T, setting T(n) = min(T, n),
T(n)
E S}(n) = E X J II
(19)

by Corollary 4. Since T(n) i T, it therefore follows from Fatou's lemma that

(20)

Thus, if E S} = 00, equality holds in (20). In the contrary case E S} < 00, in
view of (20), equality will hold therein recalling (19) if,

(21 )

Hence, it suffices to verify (21) to establish equality in (20).

Lemma 3. If {Sn' .? n' n ~ I} is an 2 2 martingale and T is any finite {.? n}-


time, then (20) obtains. Moreover, if

lim
n-""
i
IT>n]
S; < 00 (22)

or

lim i
n-"" [T>n]
ISnl =0 (23)

holds, then E S} = E If XJ.


PROOF. In view of the prior discussion, it suffices to show that (22) ~ (23)
7.4 Introduction to Martingales 253

=- (21). Suppose that lim f[T>ndSnl = C > O. For any k in (0,00)

lim
n~co
r
J[T>n)
S; ~ k lim
n~co
r
JIT>n,ISnl>k)
ISn I = k· c --+ 00

as k --+ 00, and so (22) =- (23). Next, supposing, as is permissible, that


E S} < 00, necessarily EISTI < 00, and so via Theorem 5, (23) implies that
on {T ~ n}

whence on {T ~ n}

Since E{S} I ~n} = E{S}(n) I ~n} = Shn) on {T < n}, (21) follows. 0

Theorem 7. If {Sn = I~ X j, ~ n' n ~ I} is an .se 2 martingale and T is any


finite {~n}-time, then E S} ::; E If XJ. Moreover, if anyone of the four con-
ditions

lim r
n~co J[T>n)
ISnl = 0, lim r
n~co J[T>n)
S; < 00,

T (24)
EI
n=1
x; < 00,

holds, then, setting ~ 0 = {0, Q},


T T
E S} = E I XJ = E I E{XJ I ~j- d· (25)
j= 1 j= 1

Either of the last two conditions of(24) implies E ST = EX l ' If {Sn = D Xi'
~ n' n ~ I} is merely an .se 1 martingale, the last condition of (24) entails
EST=EX t ·

PROOF. First, E If XJ < 00 =- (22) since, recalling (19),

i [T>n)
S; ::; E sLn = E
TAn
I XJ ::; E I XI.
1
T

1
(26)

and so in view of Lemma 3 and Corollary 4 it suffices to note that the final
condition of (24) ensures that
T
EISTI ::; E I IXnl < 00
I
254 7 Conditional Expectation. Conditional Independence, Introduction to Martingales

and
n T
EISnIIIT>n] ~ E L IXjIIIT>n) ~ E I
1 t
IXjIIIT>n] = 0(1). 0

Corollary 7. If {X n' n 2 I} are independent LV.S with EX n = 0, n 2 1, and


Tis afinite {Xn}-time, then, setting Sn = L7=t Xi'

(27)

implies E ST = O. Ifa~ = E X~ < 00, n 2 I, then either (27) or


T
EI af < 00 (28)
1

implies
E Si- = ELf Xl = ELf af. (29)

PROOF. This follows directly from Theorem 7. o


A useful device in obtaining Doob's maximal inequality (33) and a mart-
ingale generalization (36) of Kolmogorov's inequality is

Lemma 4. Let {Sn = I1= 1 X j, !#'n' n 2 I} be an 2 t stochastic sequence and


{vn, g;,-l, n 2 I} a stochastic sequence with vn E2oo , n 2 I. Then for any
bounded {!#'n}-time T,
T
E VTST = E I [Vj E{X j I !#'j-l} + (Vj - Vj-l)Sj-l],
1
(30)
and, moreover, i/(Vj+l - Vj)Sj ~ 0, a.c.,j 2 0,
T
E VTS T $ E I Vj E{X j I !#'j- d· (31)
1

PROOF. If Un = VnS n - II [vjE{Xjl!#'j_d + (Vj - Vj-l)Sj-l], then {Un,


!#'n' n 2 l} is a martingale and (30) follows from Corollary 4 or 5. Then (31)
follows directly from (30). 0

Corollary 8 (Dubins-Freedman). If {Sn = Ll Xj' !#'n' n 2 l} is an 2 2


martingale with EX 1 = 0 and Yn = E{X~ I !#'n- d, n 2 1, where!#'o =
{0, n}, then for any stopping time T and real numbers a, b with b > 0

r 2
(.,---------,-,.-a_+_S--'-T_-,.--) 2 < a + Y1 + _1_ (32)
JIT<OO) b + Y1 + ... + YT - (b + y1 )2 b + Y1 '

PROOF. If v,;- 1 = (b + Y1 + ... + Yn?, then {vn, !#'n- I> n 2 l} is a stochastic


sequence with Vn+1 $ Vn' Since {(a + Sn)2, !#'n, n 2 I} is a nonnegative
submartingale, by (31) (with So = -a)
7.4 Introduction to Martingales 255

f (+
ITSn)
b Ya + ST
I + ... + T
~ E Vj E{
Y )2 ~ L.
j~ I
(a +2
Sj) - (a + Sj_l) 2 IY'
tZ
j- I }
n

= vI(a 2 + YJ + E L: Vj lj.
j~ 2
Since
Y 1
v· Y. = ) < ,..--------
)) (b + YI + ... + lj)2 - b + YI + ... + lj_1 b + YI + ... + lj'
the conclusion (32) follows as n -+ 00. o
Theorem 8. If {Sn = L:~ Xj' Ji'n' n ~ I} is a nonnegative 2 1 submartingale
and {v n, Ji' n - 1> n ~ I} Q stochastic sequence of 2 00 r. v.s with Vn ~ Vn+ I ~ 0,
a.c., then for any A> 0, (i)

A p{ I sjSnVjSj ~ A} + r
max vnS n ~ j~i I E VjX j (33)
J[maxvjsj<).]

and (ii) (Doob Inequalities)

P { max Sj
I sjSn
~ A} ~ -1
A
f
[maxSj2:).)
Sn, (34)

whence

IISnllp ~ II max Sj
I I <)Sn
II
IP
~ P ~ 1 IISnllp, p> 1,

111~~:n Sj L< e ~ 1 (I + II Sn log+ Snllp), p = 1. (35)

(iii) (Hajek-Renyi Inequality) If {Un = Lj~1 uj , fF", n ~ I} is an 2 2

°
martingale and {b n, n ~ I} is a positive, nondecreasing real sequence, then for
any A>

P { V.I
max -.!... > A < -12 } L --)
Eu
n
2
(36)
I~j~n bj - I - ..1. j~1 b} .

PROOF.Let T; = inf{n ~ I: vnS n ~ Ai} and T; = min(T;, n), n ~ I, i = 1,2.


From (31)of Lemma 4

Ai p{ max VjSj
1 :5j:5:n
~ Ai} + fL ~lf~"VjSj<).r]
VnS n

~ i [T;Sn)
VT;ST; + i [T;>n)
VnSn = E VTiST,

Ti n

~ E L: Vj E{X j I Ji'j_ d ~ L: E VjX j, (37)


I j~1
256 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

yielding (33) for i = 1. To obtain (36), take i = 2, Vj = b j- 2 in (37), noting


that S. = U;, n 2: 1, is a nonnegative 2 t submartingale. Since (34) is an
immediate consequence of (33), it remains to prove (35). Setting S: =
maxI ~j~. Sj' if p > 1 it follows from Corollary 6.2.2 and (34) that

E S:p = p (00 ).p-I P{S: 2: )'}d)' s; p (00 ).p- 2 ( S. dP d)'


Jo Jo J[S~~l)
s'
= pES (n).p_2 d). = -p- E S (S*)(P-I)
•Jo p- 1 •• .

Hence, if (p - l)q = p, by Holder

E S:p s; ---.!!.-1 IIS.ll p IlS:(P-1)ll q = ~1 IIS.ll p Et/qS:P,


p- p-
yielding the first part of (35). If, rather, p = 1, again via (34)

E S: - 1 s; E(S: - 1)+ = L oo
P{S: - 1 2: )'}d)'

1 1
00
s; -1- s. dP d)'
o A. + 1 [S~ ~ H I)

r(S~-W d)'
=ES. A+l=ES.log+S:.
Jo
Since for constants a ;::: 0, b > 0 necessarily a log b :$: a log+ a + be-I,
E S: - 1 s; E S. log+ S. + e- I E S:,
from which the second portion of (35) is immediate. o
EXAMPLE 9. If {S., $i., n ;::: I} is a submartingale and h is any nonnegative,
increasing, convex function, then for any positive t and real x
E h(tS.)
P { max Sj 2: x } s; h( ) (38)
I ~)~. tx
and, in particular,

p{ max Sj 2: x} S; e- 1x E e'Sn , t > O. (39)


I ~)~.

PROOF. Since {h(tS), /7j, 1 :$: j S; n} is a nonnegative submartingale via


Lemma 1, (34) ensures that

p{ max Sj 2: x}
I~)~.
S; p{h(m~x tSj) 2: h(tX)} =
I$)$n
p{ m~x h(tS) 2: h(tX)}
15)$.

< E h(tS.). 0
- h(tx)
7.4 Introduction to Martingales 257

The next example generalizes Example 5.2.1.

EXAMPLE 10. If Sn = Lj=1 Xj' where {X n, n ~ 1} are i.i.d. fiJ p r.v.'s for some
= 0 whenever 1 :5: P < 2, then
p in (0, 2) with E XI
00

"L.. ]'-2/PX j S j-1 converges a.c. (40)


j=2
PROOF. If Y" = n-l/PXnlllXnl:5nl/Pj, n ~ 1, and 1'" = n-1/PSn-111lSn_d:snlJPj,
n ~ 2, then {1'"(Y,, - E Y,,), n ~ 1} is a martingale difference sequence wtih
00 00 00

L E 1'"2(Y,, - E y")2:s; L E(y" - E Y,,)2:5: L E y"2 < 00


n=2 n=2 n=2
by (13) of Theorem 5.1.3. Then L~=2 1'"(Y,, - E y") converges a.c. via Exercise
7.4.1.Moreover,L~=2EI1'"E Y"I :5:L~=2IE Y"I < oo,and{y"} and {n- 1 / PX n }
are equivalent sequences according to Theorem 5.1.3. Hence, L~=2 1'"E Y"
converges a.c., and so L~= 2 1'" Y" converges a.c.
By the Marcinkiewicz-Zygmund strong law oflarge numbers, P {I Sn -11 >
n 1/p , i.o.} = 0 whence {1'" Y", n ~ 1} and {Sn- dn 1/P. X n/n 1/p , n ~ 1} are equiv-
alent sequences and (40) follows. 0

EXERCISES 7.4
I. If {S., n :2: I} is a martingale satisfying (i) I1'=1
E X/ < 00 or (ii) If
E(IXjl/uxjl>l]
1])
+ X/IUXjl:S < 00 or (iii) I1'=1
EIXjlP < 00 for some p in [1, 2], then converges S.
a.c. Hint: For (ii) consider X; = X j l[I Xk;l] - E{X/[IXj l:S1)IX 1 , ••• , Xj-d and
Xi' = Xj - X;.
2. In statistical problems, likelihood ratios U. = g.(X l ' ... , X.)/f,,(X I ' ... , X.) are
encountered. where!., g. are densities, each being a candidate for the ac[ual density
ofLv.s XI.' .. , X •. If {X., n :2: I} are coordinate LV.S on (ROO, atoo, P) and g. vanishes
whenever f" does, show that {U., n ~ I} is a martingale whenf. is [he true density.
3. There exist martingales {S., ff., n :2: I} and stopping variables T for which (13)
fails. Let {X n , n ~ I} be i.i.d. with E XI = 0 and set T = inf{n:2: I:S. = D Xi
> O}. Then for n = I, E{ST - S.lff.} > 0 on [T > I].

4. If {S., ff., n :2: I} is a martingale or positive submartingale. then for any stopping
time T, EISTI ~ lim._oo EIS.I. In particular, if {S.,ff.,n:2: I} is!f'. bounded,
EISTI < 00 for every stopping time T.

5. If E 1STI < 00, it is spurious generality to have !im,,-oo rather than lim._oo in (6) or
(12). Hint: If v" = S;; in the first case and IS. I in the second, then, as in the proof of
Theorem 5,

zA[T«n] v" ~ fA[.:sT:sm]


VT + f
A[T>m]
Vm •

6. (i) Find a positive !f'1 martingale which is not u.i. (ii) If Y", n :2: 1, are r.v.s on
(!l, ff, P) and A E u(Y1 , Y2 , •• • ), then P{AI Y1 , •.• , Y,,} ~ I A • Hint: Apply Corol-
lary 1. (iii) If {ff., n:2: I} is a stochastic basis and {X., n:2: I} are r.v.'s with
258 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

E sUP.«lIX.I<co, then
E{limn_co X.lffco } ::;; limn-co E{X.lffco }·

7. If {A., n ~ I} is a sequence of events and A. E ff. i, prove the following version of


the Borel-Cantelli theorem:

8. If {So = II=l Xi' n ~ I} is an Ifp martingale for some p in [1,2] and 0 < b. i co
then lim S./b. = 0, a.c. on the set A where I:'=2 b;PE{lX.IPIff.-d < 00. In par-
ticular, S./b. ~ 0 if P{A} = 1. Hint: Set T" = T 1\ n where T = inf{n ~ 1:
D~: bj-PE{ IXjIPIj"j-l} > K > 0 and apply Exercise 1 to Ii" X)bj.

9. Let Y be an 2. LV. and {'§., -00 < n < oo} a stochastic basis. If '§co =
a(U,:,co '§.), '§-co = n,:,co '§., and U. = E{YI'§.}, -co::;; n::;; 00, then {U.,
'§., - 00 ::;; n ::;; oo} is a martingale.

10. Let (Sn = I~ Xi' ff n, n ~ I} be a martingale with bounded conditional variance,


i.e., E{S. - E{S.Iff.-. }]21ff._.} = E{X;Iff._.} ::;; a;, n
~ 1, where is a a;
finite constant. If Uk .• is as defined in Example 2, show that E( Uk. n - Uk .• _ .)2 ::;;
a; E Uf-I.n-t, and hence that E
2k k
uL : ; D:: a;+.
1 2k
D= t ar. a; a
If = 2 , E ul,. ::;;
(;)a and E(U k.• - U k.• _ 1 )2 ::;; n - a , k ~ I.

11. Show, if (S., ff., n ~ I} is a nonnegative martingale with E S. = 1, that PIS. ~ A.,
some n ~ I} ::;; I/X

12. Prove that an If. stochastic sequence {II= t Xi' a(X., ... , X.), n ~ I} is a martin-
gale iff EX. + 1 cp(X I' ... , X.) = 0 for all n ~ I and all bounded Borel functions cp,
whereas the 2. LV.S {X., n ~ I} are independent iff for all n ~ 1

E t/!( X" + • )cp( X .. ... , X n) =0


for all bounded Borel functions cp, t/! with E t/!( X. + .) = o.
13. (Harris) A branching process is a sequence {Z., n ~ O} of nonnegative integer-
valued LV.S with Zo = I and such that the conditional distribution of Z.+ 1 given
(Zo, ... , Z.) is that of a sum of Z. i.i.d. LV.S each with the distribution of ZI' If
E Z t = mE (0, 00), verify that {w;, = Z.lm', n ~ I} is a convergent martingale.

14. (Breiman) Let A, B be linear Borel sets and {X., n ~ I} Lv.s.1f

ptYn [XjE B] Ix I"'" X.} ~ M'X"eA)'" > 0,

prove that P{X.E A, i.o.} ::;; P{X.EB, i.o.}. Hint: If FN= Ui=N [XjEB],
limn_co P{FNIX 1 ••• X n } = IF N '
15. (Doob) If (X.n ~ I} are independent LV.S with E X n = 0, n ~ 1, and S. = I~ Xi,
S: = maxI sis.ISd, then E S: ::;; 8 EIS.I, thereby improving (33) for p = 1. Hint:
Via Ottaviani's inequality (Exercise 3.3.16)

fo
co PIS: > 2y}dy::;; 2 EIS.I + I co

2EIS,,1
2 P{lS.1 > y}dy.
7.5 U-Statistics 259

7.5. U -Statistics
Let h be a measurable symmetric function on R\ k ~ 1 (i.e., invariant under
the permutations of its arguments for k ~ 2) and {X.., n ~ 1} a sequence of
i.i.d. random variables. Then (Example 7.4.3) a sequence of V-statistics Vk...(h)
and their corresponding sums Sk... (h) are defined by
n
(k ) Vk...(h) = Sk...(h) = . L.
1:5:11<"'<lk:5:n
h(X il , ... , X ik ), n ~ k~ 1. (1)

For hE .!f 1 , that is, Elhl = Elh(X 1 , ... , Xk)1 < 00, the V-statistic Vk...(h)
and its "kernel" h are said to be degenerate of order i - 1 where 2 ~ i ~ k if
E{h(X 1, ... , XdlX 1 ' ' ' ' , Xj} = O,a.s. forj = i - 1but not forj = i; otherwise
(i.e., if E{h(X 1"'" Xk)IX d is not a.s. zero), they are non-degenerate. In the
particular case i = k, they are called completely degenerate. It is sometimes
convenient to express non-degeneracy as "degeneracy of order zero."
Let I denote the identity operator, that is, If = f, and define ~ and opera-
tors Qj' 1 ~ j ~ k by
~ = QJ = E{f(X 1 , ... , Xk)IX«, 1 ~ IX ~ k, IX #j}
for any function f on R k with E If I < 00.

Lemma 1. Qr f = QJ, 1 ~ i ~ k and for 1 ~ i # j ~ k


QiQJ = QjQJ = E{f(X 1 , •.. , Xk}IX«, IX # i,j}.
PROOF. Since Qrf = Qi[QJJ, the initial statement is immediate via (14, iv) of
Section 7.1. Next, set ~1 = u(X i ), ~2 = u(X), and ~3 = u(X«, IX # i,j). Then
~ is U(~l u ~3}-measurable and, moreover, ~2 and U(~l u ~3) are indepen-
dent. By Corollary 7.3.3, ~1 and ~2 are conditionally independent given ~3
whence, by Theorem 7.3.1 (iv), E{~lu(~2 u ~3)} = E{~1~3}' a.c. Hence,
QiQjf = E{~lu(~2 U ~3)} = E{~1~3}
= E{E{fIX«, IX # j}IXp , P# i,j} = E{fIX p , P# i,j} = QjQJ. D

Settingjj* = E{fIX 1 , ... , Xj}, Lemma 1 ensures that

jj* = E{fIX«, IX #j + 1, ... , k} = Qj+1'" Qd-


Define for 1 ~ j ~ k,

whence, via Exercise 7.1.1,


260 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

and, for r ~ 1,
ft(X,) = E{J(X" X,+t, ... , X'+k-dIX,} - Ef
are i.i.d. random variables.

Lemma 2. The functions Jj = Jj(X t, ... , X) defined in (2) are completely degen-
erate for 2 :S::j:S:: k and
k
f(Xt, .. ·,Xk)=Ef+ Lt ft(X;)
i=

+ t ';;i,L< i ,;;k
fz(X i , Xi,) + ...
2

+ L
1 $;;i t <···<ik-t::::;k
h-t(Xi,,··.,X ik _,)

+ fk(X t,···, X k)·


PROOF. Since QP - Q)Jj* = 0 and the operators commute,

n (I -
j
E{JjIX t,· .. , Xj-d = Qj'" QkJj = Qj'" Qk Q;)Jj* = 0, a.c.
i= t

whence Jj(X t, ... , X) is completely degenerate, 2 :s:: j :s:: k. The final statement
follows from the representation

n [(I -
k
f = Qi) + Q;]f,
i= 1

noting via Lemma 1 that


Qt ... Qd = E{JIXa , a :1= 1, ... , k} = Ef. o
Lemma 3. Let f E R\ k ~ 2, with IEfl:s:: 00 and let 1:s:: it < ... < ik :s:: n
and I:s::Pt<"'<Pm:s::n be two sets of integers. (i) If {it, ... ,ik}n
{Pt, ... , 13m} = {<5 t ,· .. , <5j }, where 1 :S::j:S:: k /\ m, then
E{f(X i1 , ... , Xi k
)IX pi
, •.•
m
, Xp } = E{f(X i , ... , Xi
1 k
)IX~ 1 , ... , X~.}.
J

(ii) If ik ¢ {Pt, ... , 13m}, then


E{f(X i " ••. , Xi)IX p" ••• , X p.., Xi" ... , X ik _,}
= E{J(Xi " , Xi)IX i " ... , Xik_J
PROOF. Let {a j+t , , ad = {it, , id - {<5 t , ... , <5j } and {Yj+1' ... , Ym} =
{Pt, ... , 13m} - {<5 t , , <5j }. Set <;§t = a(X aj +1' ••• , X a ) , <;§z = a(X yj +" ... ,
X ~ ), and <;§3 = a(X ~ 1 , ... , X ~j ). Then a(<;§ t U <;§3) is independent of <;§ zwhence
1m

<;§t and <;§z are conditionally independent given <;§3 by Corollary 7.3.3. Since
f(X i " .•. , Xi) is a(<;§t U <;§3)-measurable, (i) follows from Theorem 7.3.1 (iv).
Apropos of (ii), if A is the set of distinct integers among Pt, ... , 13m' it, ... ,
ik- t , then {it, ... , ik- d = A n {it, ... , ik}, so that(ii) follows from (i). 0
7.5 V-Statistics 261

CoroUary 1. If h is a completely degenerate kernel on R\ k ~ 2, and ofFn =


O'(X l ' ... , X n) then, for any constants {aj,j ~ I},

{.f. aj
J=k
. L.
1:S;I,<'''<lk-,<J
.h(Xi",,,,Xik_,,Xj),ofFn,n~k}
is a martingale and, in particular (a j = l,j ~ 1), so is {Sk,n(h), ofFn, n ~ k}.

An important aspect of non-degenerate V -statistics is their decomposition


into an average of i.i.d. random variables plus a finite linear combination of
completely degenerate V-statistics, This is an immediate consequence of

Theorem 1 (Hoeffding's decomposition). If {X n, n ~ I} are i.i.d. random vari-


ables and h is a symmetric function on R\ k ~ 2, with E h(X l ' ... , X k) = 0,
then (i)

Vk,n(h) = ±(~)
j= 1 ]
Vj,n(h j), (3)

where hj is as in (2). Moreover, if h is degenerate of order i-I, where 2 :::; i :::; k,


the first i-I terms of the sum in (3) vanish. Furthermore, (ii) if Sj,n(h j) =
(j) Vj,n(h j ), 1 :::; j :::; k, then {Sj,n(h), ofFn = O'(X l' ... , X n), n ~ j} is a martin-
gale, 1 :::; j :::; k, and, if E h2 < 00, then E SI,n(hl)Sm,n(h m) = 0 for m :f. I.
PROOF. Replacingf, X l ' ... , X k by h, XII' ... ,Xlk in Lemma 2 and summing,
n
(k
) V k,ih) = Sk,ih k) +
J=1
kf [L . L.
1 :S;I,<"'<lk:S;n 1 :S;I, <"'<Ij:s;k
h(XI, ' ... , Xl,
' J
)J.
Each of the (j) terms hj(X i" ... , Xi) appears the same number of times in
the bracketed double sum, and all together there are terms. Thus, me)
hj(X i,, ... , Xi) is repeated (~)(~)/(j) times. Hence,

n)
( k U.,,(h) ~ S.,,(h.) + k-1
j~'
C)G)
C) (n) k (k)
SJ,.(h j ) ~ k J~' j UJ,.(hj >

which is tantamount to (3). According to Lemma 2, hj is completely degenerate


for j ~ 2, whereas V 1,n(hd = lin L~= 1 E{hIX r } is an average of Li.d. random
variables. Moreover, if h is degenerate of order i-I, where 2 :::; i :::; k, then
hj = 0 = hjfor 1 :::;j:::; i-I via (2), whencethefirsti - 1 terms of(3) vanish.
Furthermore, since hj is completely degenerate for j ~ 2, Corollary 1 en-
sures that {Sj,n(h), ofFn, n ~ k} is a martingale whenj ~ 2, and clearly the same
conclusion holds for j = 1.
Finally, if I :f. m, it follows from Lemma 4(i) that
E h/(Xi" ... , Xi,)hm(X p" ... , XpJ = 0,
implying that E S/,ihl)Sm,n(hm) = 0 for I :f. m. D
262 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

Lemma 4. Let 1 ~ i 1 < ... < ik ~ nand 1 ~ 131 < ... < 13m ~ n be two sets of
integers with {i 1, ... , ik } n {131"'" 13m} = {i 1, ... , ij }, where 1 ~j < k. (i) Iff
is a completely degenerate function on Rk, k ;::: 2, and 9 is a function on R m with
IE gl ~ 00, IEfgl ~ 00, then
Ef(X i" ... , XiJg(X p" ... , XpJ = O.
(ii) If m = k and h is a symmetric function on Rk, k ;::: 2, with E h 2 < 00, then
E h(X i" ... , XiJh(X p" ... , XpJ = E[E{h(X 1 , ... , X k)!X 1 , ... , XJ2,
and the right side is an increasing function of j.
PROOF. Employing Lemma 3(i),
E{f(X j 1 , ••• , Xi k p , ...
)g(XI m
, X p )IX j 1 , ... , Xj}
)

= E{f(X i" , XiJE{g(X p" , XpJIX i" , XdlXi" , XiJ


= E{f(X i" , XiJE{g(X p1 , , XpJIX i" , XiJIX i" , XiJ
= E{g(X p , 1
, X pm )IX iI , ... , Xi}E{f(X
J iI , , Xi k }IX i1 , , Xi j },
(4)
and the last term equals zero a.s. when f is completely degenerate, so that (i)
is an immediate consequence of (4).
Apropos of (ii), taking m = k, 9 = f = h in (4),
E{h(X i" ... , XiJh(X p" ... , XpJIX i" ... , XiJ
= E{h(X i" , XiJIX i" , XiJE{h(X p" ... , XpJIXi" ... , XiJ
= E 2{h(X 1 , , X k)IX 1 , , Xj}
since {X n' n ;::: 1} are i.i.d. whence (ii) follows upon taking expectations. More-
over, if 1] = E{h(X 1"'" Xk)!X 1'"'' XJ, 1 ~j ~ k and ~j = a(X l' ... , Xj),
then {Yj, ~j' 1 ~ j ~ k} is a martingale for r = 1 and hence a submartingale
for r = 2. Thus, E 1]2, 1 ~j ~ k is an increasing sequence. D

Corollary 2. Let E h 2 < 00. If the V-statistic Vk,n(h) is degenerate of order


i - 1, where 2 ~ i ~ k or non-degenerate (i = 1) with E h = 0, then its variance
is

a~k."(h) = (~) -1 jt c)(: =~) 2


E[E {h(X 1 ,···, Xk)IX 1"'" Xj}],

and so
7.5 U-Statistics 263

PROOF. Set qj = E[E 2 {h(X 1 , .•. , Xk)IX 1 " ' " Xj}]. Since the number of pairs

(i1,"" i k), (PI"'" Pk) with exactly jintegers in common in (~)e)(~=-~).


Lemma 4 yields

(J~k.n(hl=(kn)-2 L L
l~il<"'<ik~n l~PI<"'<Pk~n
Eh(Xi"···,XiJh(Xp,, ... ,XpJ

= (~r1 jt1 e)(~ =- ~)qj'


and, if h is degenerate of order i-I, that first i-I terms of the sum vanish.
For large n, the dominant term of the sum is

k) (n - k) (n)-l . (k)2 [(n - k)!J2


( i k-i k qi=l! i qi n !(n-2k+i)!

via Stirling's formula (Lemma 2.3.1). o


Since (Example 7.4.3) {Uk,n(h), (J(Uk,j' j ~ n), n ~ k} is a reverse martingale
for hE .P l ' it follows (see the discussion after Theorem 11.1.1) under this
proviso that Vk,n(h)~ E h or, equivalently, [1/(~)JSk,n(h)~ E h. The next
theorem provides an analogue of the Marcinkiewicz-Zygmund strong law of
large numbers for V-statistics.
Define
jp
Pj = k _ (k _ j)p' 1 ~j ~ k. (5)

Then Pk = P and for P in (1, 2k/(2k - j)), the sequence {Pj' 1 ~ j ~ k} is de-
creasing, and Pj E (1, 2).

Theorem 2. Let {Xn' n ~ I} be i.i.d. random variables on some probability space


(!l, IF, P) and h a symmetric function on R\ k ~ 2.1f (i) h is degenerate of order
i-I and 1 < P < 2k/(2k - i), where 2 ~ i ~ k, then
nk(P-1 l!PI Uk,n(h) - E hi ~ 0, that is, n-k!PSk,n(h - E h)~ 0 (6)

provided

E{h(X1"",Xk)IX1, ... ,XJE.Ppj i~j~k. (7)


(ii) Alternatively, (6) holds for non-degenerate h if 1 < p < k/(k - 1) and (7)
is satisfied for i = 1. (iii) Moreover, if 0 < p < 1 and h E .PP' then
n-k!PSk,n(h)~ O.
264 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

PROOF. Suppose initially that i = k and h E .fep whence, according to (i),


1 < p < 2. Hence, 0: = kIp E (kI2, k). Set
gj(X 1, ... , X k) = h(X 1, ... , Xk)Illhl:S;j"]
- E{h(X 1, ... , X k)I[lhISj"]IX 1, ... , X k- 1}
and define
n
Sk,n(h) = Lr
j=k
a
L
1 si, <"'<i k - , <j
h(Xi" ... , Xik-l' X),
n

1ic.n(h) = L: r
j=k
a
L:
lsi,<"'<ik _,<j
giXi" ... , Xik-l' Xj}'

If f7n = a(X l ' ... , X n), Corollary 1 asserts that {Sk.n(h), f7n, n ~ k} is a mar-
tingale and the same is true of {lic.n(h), f7n, n ~ k} since, via Lemma 3(i),
E{gn(Xi " ... , Xik-l' X n)lf7n-d
= E{gn(Xi " ... , X ik _" Xn)IX i " ... , X ik _,} a.s. O.
Hence, employing Lemma 4(i) and setting An = {(n - 1) < Ihl 1/a ~ n},n ~ 1,

E[T.k.n (h)] 2 fJ'-2a(j-l)Eg~<


= L..- k 1 J -
~J'k-1-2aEh21[Ihl :S;f'].
L..-
j=k - j=k

= f /-1-2a n=lt f h2 dP~ n=lf (!+_I_)n


j=k An n 20:-k
k- 2a f h2 dP
An

~ (20: - k + 1) f f Ihl dP (20: - k + 1)E h1P <


k /a = 00.
20: - k n= 1 20: - k
An
1

Moreover, since h is completely degenerate,

sup EISk.n(h) - 1ic.n(h} I ~ 2 .L: r


_ co a(j-l)
k _ 1 ElhII[lhl>f']
n;;,k J=k

~ 2E(lh l L /-l-a) ~ CElhl k/a < 00.


jSlhl""

Thus, in view of
sup EISk.ih)1 ~ sup EISk.n(h) -lic.ih)1 + sup Ellic.n(h)!,
n;;,k n;;,k n;;,k

the martingale {Sk.n(h), f7., n ~ k} is .fe1-bounded and hence convergent so


that Kronecker's lemma yields

n- k/PSk.n_1(h) = n- a t
j=k
L
l:S;i,<'''<i k _,<j
h(X i" ... , X ik _" X)~O. (8)

Therefore, (8) holds whenever h is a completely degenerate kernel belonging


7.5 U-Statistics 265

to .PP' 1 < P < 2 and k is any integer ~ 2. Clearly, (8) is tantamount to (6)
since E h = O.
Suppose next that 2 :::; i < k. Then, via Hoeffding's decomposition (Theo-
rem 1) and since again E h = 0,

k p
k 1 (n - j)!
:::; n- / ~. (k _ ')' (n _ k)'. ISj,.(h)1
)-1 ] •

k 1
" k-j-k/PIS (h)1
/..-. (k _ ] ')'• n
:::; )=1 j,. j

k 1 j
= ~. (k _ .),ISj,.(h)l/n /PJ , (9)
)=1 ] •

where Pj is as in (5) and hj is completely degenerate for j ~ i ~ 2. In view of


the cited properties of {Pj' 1 :::;j :::; k}, the hypothesis 1 < P < 2kl(2k - i)
implies that Pi E (1, 2) whence Pj E (1,2) for j ~ i. Hence, if hj E .PPJ for j ~ i,
(8) in conjunction with (9) ensures that n-k/PSk.• (h)~ O.
Now, E{h(X 1, ... , X k )IX 1, ... , Xj}, l:::;j:::; k is a martingale whence
IE{h(X 1 , ... , X k)!X 1, ... , XjW, 1 :::;j:::; k is a submartingale for any P ~ 1,
implying that
EIE{hIX 1 , ... , Xj-dI P :::; EIE{hIX 1 , ... , XjW, P ~ 1.
Next, either employ Exercise 7.5.4 or note via (2) that

where I is the identity operator and QJ = E{fIX a , IX #- r}. It follows that


E{hIX 1 , ... , Xj} E .PPJ entails hj E .PPJ' thereby proving (i).
If, rather, h is non-degenerate and 1 < P < kl(k - 1), suppose without loss
of generality that E h = O. Then, exactly as in case (i), all terms of the sum in
(9) for whichj ~ 2 converge a.s. to zero, and whenj = 1,
n -Ik -i k - 1)pl/PS l,.(h 1)

= n -lk-ik-1)pl/p"
L... E{h(X r> X r+1' ... , X r+k-1 )IXr--+
} a.s. 0
r= 1

by the classical Marcinkiewicz-Zygmund strong law (Theorem 5.2.2) so that


(6) holds in case (ii).
Finally, when 0 < P < 1, it suffices to prove that n-k/PSk,.(Ihl)~0 whence
h may be supposed non-negative. Set hj = hl lh s;.f]' where IX = kip and

D k- 1,j(h) = L
1 :::;i 1 < ... <i k - 1 <j
h(X i " ... , Xik-l' X).
266 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

Now,

L P{h(X i" ... , Xik-l' X) > r


00

:::; j=k
for some choice of 1 :::; i1 < ... < i k - 1 < j}

:::; L L
00 00

(n - l)kE IAn:::; E hk/·IAn = E hP < 00.


n=1 n=1
Hence, P{D k- 1,n(h) # Dk- 1,n(hn), i.o.} = O. Moreover, since rx > k,
n
E "·-·D
L.. J
j=k
k -I.}}
n
.(h.) = "L. J.-. k
j=k
(J'-I)-
00

j=k
j
1 E h·} -< "L.. J·k-l-. "L.. E hIAn
n=1

< (rx - k + 1) ~ _1_ E hI < (rx - k + 1) E hk/.


- rx- k n=ln
L. .-k An - rx- k < 00.

Thus, the series L'f'=kF r'D k- 1 ,j(h) and hence also LJ=kF·Dk- 1•j (h)converges
a.s. whence n-k/PSk.n(h)~ O. 0

Corollary 3. Let h E !l?P with h degenerate of order i - I and


E{h(X 1, ... , XdIX 1, ... , Xk-d E !l?Pi where Pi is as in (5). If (i) 1 < P <
2k/(2k - i) when i;;::: 2 or (ii) 1 < P < k/(k - 1) when i = 1 and E h = 0, then
n-k/PSk.n(h)~ O.

Corollary 4. If {Uk.n(h), n ;;::: k;;::: 2} is a sequence of completely degenerate


U-statistics with ElhI 2k/(2k-l) < 00, then, as n --+ 00,
nl/2Uk,n(h)~O. (10)
PROOF. For completely degenerate hE !l? p' 1 < P < 2, Theorem 2 ensures that
nk(P-1)/PUk,n(h)~ O. Since k(p - 1)/p = 1/2 for P = 2k/(2k - 1), (10) follows.
o
Lemma 5. Let {Uk,n(h), n ;;::: k ;;::: 2} be a sequence of non-degenerate U-statistics
with E h = O. If
E{h(X 1, ... , XdIX 1, ... , XJ E !l?2j/(2j-l), (11)
(a fortiori, if E Ih1 4 / 3 < (0), then

_ f E {h(Xr , X r + 1, ... , X r +k-l)IXr } + 0(1),


k L..
n1/2 Uk,n(h) -172 a.s. (12)
n r=1
7.5 U-Statistics 267

PROOF. Via Hoeffding's decomposition,

n 1/2 Vk,n(h) _
-172
n
k ~ E{h(X" ... , X r+k-l)!Xr}+
L,
r= 1
n 1/2 ~
L,
j= 2
(k) Vj,n()'
}
.
h
As in the proof of Theorem 2, (11) implies that hj E 2 2j/(2j-l), 2 5, j 5, k
whence (12) follows from Corollary 4. Clearly, Elh1 4 / 3 < 00 ensures (11).
o
A Central Limit Theorem and Law of the Iterated Logarithm for non-
degenerate V-statistics will be proved in Sections 9.1 and 10.2.

EXERCISE 7.5

k
(k + l)Sk+
-
1 •• =
" .-
'-- (-lYSk_ ).•S.U+l) , k ~ 1,
)=0

and (ii) if {X, X., n ~ 1} are i.i.d. with E X = 0, then Sk.• /(k) is a completely degener-
ate U-statisticfor k ~ 2.

2. Let Sk.•, n ~ k and S~\j ~ 1 be as in Exercise 1, where {X, X., n ~ 1} are i.i.d., and
let {b n , n ~ 1} be constants such that 0 < b./n 1/ 2 t 00. If (i)(l/b.) D XI~ 0 and
(ii) I~=l P{lXI > bn } < 00, prove that Sk,./b:~~,k ~ 2. In particular, if X E 2 p ,
o< p < 2 and E X = 0 whenever p ~ 1, then Sk.•/nk/P ~ O. Hint: Employ the
Newton identities

where the sum is over all non-negative integers m), 1 :$; j :$; ksatisfyingI'= 1jm) = k.
3. Find a non-degenerate kernel h such that Elhl P = 00, some p > 1, but E{h(X l' ... ,
Xk)IX dE !l'q, all q > O.
4. Show that if E h(X 1- ... _ X k) = 0, for 2 :$; j :$; k
)
E{h(X l ,· .. ,Xk)IX 1 ,···,X)} = hiX 1""'X) + I
1=1
hl (X;)

+ I
1 $1, <1,$)
h2 (X I" XI,) + ...

5. (Kemperman) For any countable partition of (-00, (0) into Borel sets B),
j = 1, 2 ... , define the non-negative, symmetric function h by h(X l ' ... , Xk) =
Ij=1 Ujn~=1 [[X/EBil' k ~ 2 and let {X, X n , n ~ 1} be i.i.d. random variables with
P{X E B)} = 'Ttj > O,j ~ 1, where I f 'Tt) = 1. Set 'Tt) = cfj", u) = jP with IX> 1,p > 1.
Prove that if klX - fJ > 1 ~ klX - pfJ and 1X[1 + (k - l)p] > 1 + pfJ, then hE 2 1 ,
268 7 Conditional Expectation, Conditional Independence, Introduction to Martingales

h ¢!I! P' E{h(X 1 " ' " Xk)IX dE !l!p. These inequalities hold, for example, if p=
k - 1 and 1 < IX < [1 + (k - l)p]/k.
6. If 1 < P < (2k/2k - i), there exist degenerate kernels of order i = 1, 2::;; i ::;; k
such that nk(P-l)/PVk.• ~ 0 despite EIE{h(X 1"'" Xk)IX 1"", Xi}IPi = 00, where
Pi is as in (5). Hint: Let {X., n ~ I} be symmetric i.i.d. random variables with
P{IXI > t} = ct- P(1og t)-I, t ~ 2, where 1 < P < 2. Then

is degenerate of order i-I, 2 ::;; i ::;; k. Hint: Set X; = X,IlIX,1 $v'2v1p,], I ~ 1, IX > O.

References
Y. V. Borovskich and V. S. Korolyuk, Theory of V-Statistics, Kluwer Academic
Publishers, Boston, 1994.
L. Breiman, Probability, Addison-Wesley, Reading, Mass., 1968.
H. Buhlman, "Austauschbare stochastiche Variabeln und ihre Grenzwartsatze,"
Univ. of California Publications in Statistics, 3 (1960), 1-36.
Y. S. Chow, "A martingale inequality and the law ofIargenumbers," Proc. Amer. Math.
Soc. 11 (1960),107-111.
Y. S. Chow, H. Robbins, and D. Siegmund, Great Expectations: The Theory ofOptimal
Stopping, Houghton Mifflin, Boston, 1972.
Y. S. Chow, H. Robbins, and H. Teicher, "Moments of randomly stopped sums," Ann.
Math. Stat. 36( 1965), 789-799.
K. L. Chung, A course in Probability Theory, Harcourt Brace, New York, 1968; 2nd ed.,
Academic Press, New York, 1974.
J. L. Doob, "Regular properties of certain families of chance variables," Trans. Amer.
Math. Soc. 47 (1940), 455-486.
J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
L. E. Dubins and D. A. Freedman, "A sharper form of the Borel-Cantelli lemma and
the strong law," Ann. Math. Stat. 36 (1965), 800-807.
E. B. Dynkin and A. Mandelbaum, "Symmetric statistics, Poisson point processes and
multiple Wiener integrals," Ann. Statist. 11 (1983), 739-745.
B. de Finetti, La prevision; ses lois logiques, ses sources subjectives," Annales de
/'Institut Henri Poincare 7 (1937), 1-68.
E. Gine and 1. Zinn, "Marcinkiewicz-type laws of large numbers and convergence of
moments for V-statistics," Probability in Banach Spaces 8 (1992), 273-291, Birk-
hauser, Boston.
J. Hajek and A. Rcnyi, "Generalization of an inequality of Kolmogorov," Acta Math.
Acad. Sci. Hung. 6 (1955),281-283.
P. R. Halmos, Measure Theory, Van Nostrand, Princeton, N. J., 1950; Springer-Verlag,
Berlin and New York, 1974.
E. Hewitt and L. J. Savage, "Symmetric measures on Cartesian products," Trans. Amer.
Math. Soc. 80 (1955), 470-501.
W. Hoeffding, "The strong low of large numbers for V -statistics," Institute of Statistics
Mimeo Series 302 (1961), University of North Carolina, Chapel Hill, NC.
D. G. Kendall, "On finite and infinite sequences of exchangeable events," Studia Scient.
Math. Hung. 2 (1967),319-327.
7.5 U-Statistics 269

M. J. Klass, "Properties of optimal extended-valued stopping rules," Ann. Prob. I


(1973),719-757.
K. Krickeberg, Probability Theory, Addison-Wesley, Reading, Mass., 1965.
P. Levy, Theorie de faddition des variables aleatoires, Gauthier-Villars, Paris, 1937;
2nd ed., 1954.
M. Loeve, Probability Theory, 3rd ed., Van Nostrand, Princeton, 1963; 4thed., Springer-
Verlag, Berlin and New York, 1977-1978.
P. K. Sen, "On !l'p convergence of V-statistics," Ann. [nst. Statist. Math. 26 (1974),
55-60.
R.1. Serfling, Approximation Theorems of Mathematical Statistics, Wiley, New York.
H. Teicher, "On the Marcinkiewicz-Zygrnund strong law for V-statistics," J.
Theoret. Probability 10 (1997),
8
Distribution Functions and
Characteristic Functions

8.1 Convergence of Distribution Functions,


Uniform Integrability, Helly-Bray
Theorem
Distribution functions are mathematical artifacts with properties that are
independent of any probabilistic setting. Notwithstanding, most of the
theorems of interest are geared to dJ.s of LV.S and the majority of proofs are
simpler and more intuitive when couched in terms of LV.S having, or prob-
ability measures determined by, the given d.f.s. Since r.v.s possessing pre-
assigned dJ.s can always be defined on some probability space, the language of
LV.S and probability will be utilized in many ofthe proofs without further ado.
Recall that a dJ. on the line is a nondecreasing, left-continuous function on
R=[-oo,oo] with F(oo) = Iim x -+ ao F(x) = 1, F(-oo)=lim x -+_ ao F(x)
= 0. A discrete dJ. was defined in Section 1.6 as roughly tantamount to a
"step function" with a finite or denumerable number of jumps. As such, it
determines and is determined by a probability density function (p.d.f.), say f,
.r
and a nonempty countable subset S of ( - 00, (0) with positive on Sand
vanishing on S<. Absolutely continuous dJ.s were encountered in Section 6.5.
AdJ. F is absolutely continuous iff F(x) = J~ ao f(t)dt, - 00 < x < 00, for
some Borel function f ;::: 0, a.e., with J~ ao f(t)dt = 1. AdJ. F is termed
singular if it is continuous and its corresponding probability measure is
singular with respect to Lebesgue measure (Exercise 5).
The first proposition states that any dJ. on R is a convex linear combination
of these three types.
Also, a d.f. is degenerate or improper if it has only a single point of increase
(Exercise 1.6.4) and otherwise nondegenerate or proper.

270
8.1 Convergence of Distribution Functions 271

Proposition 1. If F is an arbitrary dJ. on R = [ - 00, 00], then F = (XI F I +


(X2 F2 + (X3 F3 where Lf;l(Xj=I,(Xj2:0, i=I,2,3, and FJ, F 2, F 3 are
discrete, absolutely continuous and singular d.f.s respectively.
PROOF. IfF is discrete, F = FJ, (XI = 1, while if F is continuous, F coincides
with F* in what follows. Set 51 = {x: F(x+) - F(x) > O} so that if F is
neither discrete nor continuous, (XI = P{5dE(0, 1) where P is the measure
induced by F. Hence, if PI is the probability measure determined by

1
Pd{x}} = -.P{{x}},
(XI
PI{B} = 0, BE 5~· Pi,
then the dJ. corresponding to PI' say F I' is discrete. Moreover, p* =
[1/(1 - (Xt)](P - (XIPI) is a probability measure vanishing on all one-point
sets, whence its corresponding d.f. F* = [1/(1 - (XI)](F - (XtFt) is con-

°°
tinuous. If p* is absolutely continuous (resp. singular) relative to Lebesgue
measure, its dJ. F* may be taken as F 2 (resp. F 3) and (X3 = (iesp. (X2 = 0).
Otherwise, by Corollary 6.5.1, F* = f3F 2 + (1 - f3)F 3' < f3 < 1, where F2
is absolutely continuous and, moreover, F 3 is singular. Thus, F - (XIF I =
(1 - (X t)F* = f3(1 - (X t)F 2 + (l - f3) (l - (X t)F 3 is the asserted decom-
position. 0

The support (Exercise 1.6.4) or spectrum of an arbitrary d.f. F is the closed


set 5 defined by
5 = {x: F(x + e) - F(x - e) > 0, all e > O}.
and the elements of 5 are called points of increase.
An instance of convergence of a sequence of dJ.s to a dJ. occurred in
Corollary 2.3.1, but the situation there was too specialized to furnish clues to
the general problem.
For any real function G, let C(G) denote the set of continuity points of G,
that is, C(G) = {x: - 00 < x < 00, G(x-) = G(x+) = G(x)}. Note that if
G is monotone, C(G) is the complement of a countable set and afortiori dense
in (- 00,00).

Definition. A sequence of nondecreasing functions G. on (- 00, 00) is said to


converge weakly to a nondecreasing function G on (- 00, 00), denoted by
G. ~ G, iflim._ oo G.(x) = G(x) for all x E C(G). If, in addition, G.( 00) -+ G( 00)
and G.( - 00) -+ G( - 00) where, as usual, G( ± 00) = Iim x _ ± 00 G(x), then
{G.} is said to converge completely to G, denoted by G. ~ G.

In the special case of dJ.s F., complete convergence of {F.} guarantees that
the "limit function" F, if left continuous (as may and will be supposed via
Lemma 8.2.1 even when merely F. ~ F), is a d.f.
272 8 Distribution Functions and Characteristic Functions

If {X n' n ~ I} is a sequence of r.v.s on some probability space (0, ff, P)


with dJ.s FXn that converge completely to F, the r.v.s {Xn } are said to converge
in distribution or law, symbolized by X n ..!4 X F • Here X F is, in general, a
fictitious r.v. with dJ. F. It is not asserted that any such "LV." exists on (0,
iF, P), but, of course, one can always define a r.v. X with d.f. F on another
probability space; rather, X n ~ X F is simply a convenient alternative nota-
tion for F X n ~ F. Clearly, convergence in distribution is a property of the
dJ.s of the LV.S in question and not of the r.v.s themselves.
However, if X n !. X (afortiori, if X n ~ X or X n :£'P, X), then the following
Corollary 1 asserts that a bona fide r.v. X F on (0, iF, P) does exist and coin-
cides with X. Such a case may be denoted simply by X n i. X, that is, FXn ~ F x.

Theorem 1 (Slutsky). If {X n , n ~ I} and {Y", n ~ I} are r.v.s on some prob-


ability space with X n - Y" ~ 0 and Y" i. X F , then X n 2. X F •
PROOF. Let x, x ± t: E C(F), where t: > 0 and x E ( - 00, (0). Then

P{X n < x} = P {X n < x, IX n - Y"I < t:} + P{X n < x, IX n - Ynl ~ t:}

~Fdx+t:)+P{IXn- Ynl~d

and, analogously,

Thus,

F(x - t:) ~ lim Fxn(x) ~ rrm Fxn(x) ~ F(x + t:),


n n

and letting t: -+ 0 subject to x ± t: in C(F), the conclusion follows. 0

Corollary 1. If {X, X n , n ~ I} are r.v.s on some probability space with


X n ~ X, then X n i. X.

Corollary 2. If {X n}, {Yn}, {Zn} are sequences of r.v.s on (0, iF, P) with
X n 2. X F, Yn X. a, Zn ~ b, where a, b are finite constants, then

X n Y" + Zn 2. aXF + b.
Note. Here, aX F + b is a fictitious r.v. whose distribution coincides with
that of aX + b when X is a bona fide LV. with dJ. F.

PROOF. By the theorem it suffices to prove X n Y" + b 2. aX F + b or equiv-


alently that X n Y" ~ aX F • Since it is obvious that aX n ~ aX F , applying the
theorem once more it suffices to prove that Xn(Y" - a) ~ Oor, renotating, that
X n Un ~ 0 if X n ~ X F , Un !. O. To this end, for any <5 > 0, choose ±h E C(F)
8.1 Convergence of Distribution Functions 273

such that F(h) - F( - h) ~ 1 - fJ. Then, for all sufficiently large n and any
t:> 0,
P { I Un X nI > t:} ~ P { I Un X nI > t:, 0 < IXnI ~ h} + 2fJ
~ P{lUnl > t:lh} + 2fJ ~2fJ
and the result follows as fJ -+ O. D

Corollary 3. If {a, b, an' bn, n ~ I} arefinite constants with an -+ a, bn -+ band


the LV.S X n .'!.. X F , then anX n + bn ~ aX F + b.

If {X n } is a sequence ofr.v.s and bn is a sequence of positive constants such


that X nlbn ~ 0, it is natural, paralleling the classical notation, to designate this
by X n = op(b n). Analogously, X n = Op(b n) will signify that X nlb nis bounded in
probability, i.e., for every t: > 0, there are constants C" Nt such that
P{ IXnl ~ Ctb n } ~ t:
for n > Nt. In this notation, Theorem 1 says that if X n ~ X F , then X n +
op(l) ~ X F · A calculus paralleling that of 0 and 0 exists for op and Op. For
example, the Taylor expansion

f(x) = L (x J.- cy f(j)(c)


k

j=O
'f
.
+ o(lx - elk),

valid as x -+ c under the hypothesis below (Cramer, 1946, p. 290) leads directly
to

Corollary 4. Iff(x) has k derivatives at x = c and the LV.S X n satisfy X n =


c + op(b n), where bn = 1 or bn = 0(1), then

f(X n) = ±(Xn.~
j=O J.
cy f(j)(c) + op(b:).
It will be advantageous to prove the ensuing for nondecreasing functions
Gn on ( - 00, (0). In the special case where the Gn are dJ.s, the condition which
conjoined with weak convergence yields complete convergence is (iii)(y) in

Lemma 1. Let {G n, n ~ O} be finite, nondecreasing functions on (- 00, (0)


with Gn ~ Go. Set ~Gn = Gn(oo) - Gn( -(0), n ~ 0, where G(oo) = G(oo-)
and G( - (0) = G( - 00 +). Then
i. ITffin _ oo Gn( - (0) ~ Go( - (0) ~ Go((0) ~ l!m,,-oo Gn( (0),
ii. ~Go ~ limn_ oo ~Gn'
Moreover, if ~Gn<a) = Gn(a) - Gn( -a) for n ~ 0, 0 < a < 00, and if
~Gn < 00 for n ~ 1, then
iii. (ex) lim Gn( ± (0) = Go( ± 00 ),finite iff (13) lim n_ 00 ~Gn = ~Go < 00 iff (y)
SUPn", l[~Gn - ~Gn(a)] = 0(1) as a -+ 00.
274 8 Distribution Functions and Characteristic Functions

PROOF. Since G.( - 00) :::;; G.(x) :::;; G.( 00), takingx E C(Go) and lettingn - 00,

yielding (i) as x - ± 00. Then (ii) follows immediately from (i). That (a)
implies ({3) is trivial. For the reverse implication, let 6.G n - 6.G o < 00. Then
Go( ± 00) are finite and by (i)

ITm G.(oo) = ITm[6.G n + Gn(-oo)]

= 6.G o + ITm Gn( - 00) :::;; Go( 00) :::;; lim Gn( 00),

whence lim n_ <Xl Gn( (0) = Go( 00), finite, and so lim Gn( - 00) = Go( - 00),
finite.
Under(y),forany£ > ochoose a = a(£) > Osuchthat6.Gn - 6.G n(a) < £,
n ~ 1, for a ~ a. Then if ±a E C(G o),

ITm 6.G n :::;; 6.G o(a) + £< 00,


n

ensuring 6.G o < 00 by (ii), and since £ is arbitrary, ITm n 6.G n :::;; 6.G o . In
conjunction with (ii), this yields ({3). Conversely, under ({3), for any £ > 0
choose the integer nl such that n ~ nl entails 6.G n - 6.G o < £ and select
a > 0 with ±a E C(G o) such that 6.G o - 6.G o(a) < £. Then for n ~ some
integer n2, 6.G oUi) - 6.G n(a) < £, implying for n ~ no = max(nl, n2) that

Choose aj such that 6.G j - 6.G j(aj) < 3£, 1 :::;; j < no, whence for a ~ a' =
max (a, at> ... , ano-I)
sup[6.G n - 6.G.(a)] < 3£,
n~1

which is tantamount to (y). o


Lemma 2 (Helly-Bray). If {F n , n ~ I} is a sequence ofdJ.s with Fn ~ F and
a E C(F), b E C(F), then for every real, continuous function 9 on [a, b]

lim fb g dF n
= fb g dF. (1)
n- 00 a CJ

PROOF. As the notation indicates, the integrals in (1) are Riemann-Stieltjes,


although they may also be interpreted (Theorem 6.2.4) as Lebesgue-Stieltjes
integrals over [a, b). For £ > 0, choose by uniform continuity b > 0 so that
Ig(x) - g(y) I < £ for Ix - yl < b, x,yE[a,b]. Select xjEC(F), 1 < i:::;; k,
such that a = XI < X2 < ... < Xk+ I = b and maxI :Si:Sk(Xi+ I - Xi) < b.
8.1 Convergence of Distribution Functions 275

Then

Hn == fgdFn - f9dF

= itl {[fi"g(X)dFn(X) - fi"g(XJdFn(X)]

+ [S;"'9(Xi)dFn(X) - S;"'9(XMF(X)]

+ [fi"g(XMF(X) - fi"g(X)dF(X)]}

JI {S;i"[g(X) - g(xJJdFn(x) + fi+ '[g(xJ - g(x)]dF(x)

+ g(xJ[Fixi+t) - Fn(xJ - F(Xi+l) + F(XJJ}-


Hence,
k
IHnl:S e + e + L:lg(Xi)IIFn(Xi+d - Fn(xJ - F(Xi+l) + F(Xi)l--- 2e
i= I

as n -+ 00. Since e is arbitrary, (l) follows. o


Lemma 3. (i) If {G n, n ~ O} are finite, nondecreasing functions on ( - 00. (0)
with lim n_oo Gn(x) = Go(x) for x E some dense subset D of ( - 00,(0). then
Gn ~ Go·
(ii) Let {F n• n ~ I} be dJ.s with F n ~ Fo and 9 a nonnegative continuous
function on (-00, (0). For n ~ 0, aE C(Fo), and x E [ -00,00], define

Gn(x) = fg dF n •

Then Gn is finite and nondecreasing on (- 00, (0), n ~ 0, and


(C() Gn ~ Go,
(f3) lim J:
9 dF n ~ J: 9 dF o , lim J~oo 9 dF n ~ J~oo 9 dF o ·
PROOF. (i)IfxE C(Go)ande > O,chooseb > Osuch that IGo(Y) - Go(x)1 < e
for Ix - yl < b. Select XiED, i = 1,2, with x - b < XI < X < X2 < X + b.
Then
Go(x) - e < Go(xd <-- Gn(XI) :S Gn(x) :S Gn(X2) -+ GO(X2) < Go(x) + e,
whence lim Gn(x) = Go(x) for x E C(G o). Apropos of (ii), note that by (the
Helly-Bray) Lemma 2 and part (i) of the current lemma Gn ~ Go. Then (f3)
follows directly from Lemma l(i). 0
276 8 Distribution Functions and Characteristic Functions

Definition. If {F", n ~ I} is a sequence of dJ.s on R, and g is a real, continuous


function on ( - 00, (0), then g is called uniformly integrable(u.i) relative to {E,,}
if

sup f Ig(y)ldF,,(y) = 0(1) as a -> 00. (2)


"~lJ[lYI~al
Furthermore, {F", n 2 1} is said to be tight if the function 1 is u.i. relative to
{F,,}.

Clearly, (i) iffand g are u.i. relative to {F,,}, so aref+ and af + bg for any
finite real numbers a, b. (ii) iff, g are continuous, If I ~ Igl, and g is u.i.
relative to {F,,}, so is f.
Thus, in the case of dJ.s, Lemma l(iii) may be rephrased as follows:

If the dJ.s F" ~ F, then F,,':' F iff {F,,} is tight iff every bounded continuous
function g is u.i. relative to {F,,}.

Theorem 2. If {F", n 2 I} is a sequence ofdJ.s on R with F" ~ F and g is a


nonnegative, continuous function on (- 00, (0) for which J~ ao g dF" < 00,
n ~ 1, then

(3)

iff g is u.i. relative to {F,,}.


PROOF. For any a E C(F) and x E [ - 00, 00] define G,,(x) = .f~ g dF", G(x) =
J~ g dF. By Lemma 3, G" ~ G.1f g is u.i. relative to {F,,}, then (iii) (y) of Lemma
1 holds, whence by (iii)(a) thereof

G,,( ± (0) -> G( ± (0), finite,


which is virtually (3).
Conversely, if (3) holds, so does (iii)(f) of Lemma 1 for G, G" as defined,
whence by (iii)(y), g is u.i. relative to {F,,}. 0

Corollary 5. If the dJ.s F" ~ F and g is a continuous function on (- 00, (0)


which is u.i. relative to {F,,}, then (3) holds and J~aolgldF < 00.

Corollary 6 (Helly-Bray Theorem). (i) If the d.f.s F,,':' F and g is a bounded,


continuous function on ( - 00, (0), then

!~~ f:} dF" = f:ao g dF. (4)

(ii) If the dJ.s F" ~ F and g is continuous on ( - 00, (0) with lim y _ ± ao g(y)
= 0, then Eq. (4) holds.
8.1 Convergence of Distribution Functions 277

PROOF. Since Igi ~ M < 00 and 1 is u.i. relative to {F n } by (iii)(y) of Lemma 1,


necessarily 9 is u.i. relative to {F n}, and the conclusion follows from Corollary 5.
In case (ii), for any 8 > 0 and sufficiently large a, 1g(y) I < 8 for I y I ~ a and
so 9 is u.i. relative to (F n }. 0

Corollary 7. IfdJ.s Fn ~ F and {flxl s dFn(x), n ~ I} is a bounded sequence


for some s > 0, then
i. F n -':' F,
ii. Slxl' dFn(x) -+ Slxl' dF(x), 0 ~ r < s, and
iii. Sx k dFn(x) -+ S x k dF(x), k = 1,2, ... [s], k # S.

PROOF. This follows from Corollary 5 since for 0 ~ r < s and some C in
(0, (0)

[
J[lxl~al
lxi' dF.(x) ~
a
.t-, flx1s dF.(x) < a~" n~1. D

The Helly-Bray theorem (Corollary 6(i» is extremely useful and clearly


complements the earlier Helly-Bray lemma.
The notion of a function u.i. relative to d.f.s {F n } is redolent of that of
uniform integrability of r.v.s {X n} encountered in Chapter 4. The connection
between these is elucidated in

Proposition 2. Let 9 be a continuous function on ( - 00, (0) and let {X n} be


r.v.s on a probability space (0, fi', P) with dJ.s {F n}. If 9 is u.i. relative to {F n},
then the r.v.s {g(X n)} are u.i. Conversely, if the LV.S {g(X n)} are u.i. and either
(i) Ig(t)l-+ 00 as Itl-+ 00 or (ii) {F n} is tight, then 9 is u.i. relative to {F n}.

nr
PROOF. Throughout, in addition to any other requirements choose a > 0 so
that ±a E C(Fn ). If 9 is u.i. relative to {F n } and 8> 0, select a so that for
n ~ 1 it also satisfies the first inequality of

8> f ~al
Ilyl
Ig(y)ldF n = [
J UXnl ~a)
Ig(Xn)ldP ~ [
JUg(Xnll >b]
Ig(Xn)1 dP, (5)

whereb = max{ Ig(y)l: Iyl ~ a} and the equality holds via Theorem 6.2.4 and
Corollary 6.2.1. Thus {g(X n )} are u.i.
Conversely, in case (i), as a -+ 00 there exists K = K a -+ 00 such that

fUyl~al Ig(y)ldFn = J[IXnl~al


[ Ig(Xn)1 dP ~ [ I g(X n) I dP,
JUg(Xnll~KI
whence u.i. of {g(X n )} implies that of 9 relative to {F n }. Under (ii), for any
8> 0 choose b > 0 such that

sup [ Ig(Xn)1 dP < 8


n~l JUg(Xnll>bl
278 8 Distribution Functions and Characteristic Functions

and then select a> 0 so that sUPn~1 P{IXnl ::2': a} < e/b. Then, for n::2': 1

f.
lIyl~al
Ig(Y)ldFn(Y) = i
[IXHI~al
Ig(Xn)1 dP ~ e + b P{lXnl ::2': a} < 2e,

whence 9 is u.i. relative to {F n }. o


Proposition 2 in conjunction with Theorem 2 yields the following improve-
ment of Theorem 4.2.3(i):

Corollary 8. For some p > 0, let {X n, n::2': I} be Ifp r.v.s on (0, f/', P) with
X n ~ X F· Then EIXnlP -+ EIXFIP,jinite iff {IXnIP, n ::2': 1} is u.i.

If f/'* denotes the class of all d.f.s on R, many distances may be defined
on f/'*. One prominent choice is d*[F, G] = SUpxERIF(x) - G(x) I (see
Exercise 2). The Levy distance d[F, G] corresponds to the maximum distance
between F and G measured along lines of slope - 1 (in contemplating this,
draw vertical lines connecting F and also G at any discontinuities) multiplied
by the factor 1/.j2. Formally,
d[Fn, F] = inf{h > 0: F(x - h) - h ~ Fn(x) ~ F(x + h) + h, all x}. (6)

Theorem 3. Let {F, F., n ::2': 1} bed.f.s. Then (i) Fn '; F iff (ii) J9 dF n -+ J9 dF
for every bounded, continuous function 9 iff (iii) d[F., F] -+ 0 iff (iv) rrm Fn{C}
~ F{C}, lim F.{V} ~ F{V} for all closed sets C and open sets V, where
Fn{·}, F{·} are the probability measures determined by F., F respectively.
PROOF. That (i) implies (ii) is the Helly-Bray theorem, Corollary 6. To
show that (i) implies (iii), for any e > 0 choose a, bE C(F) such that e/2
exceeds both F(a) and 1 - F(b) and then select ajE C(F), 0 ~ j ~ m,
with ao = a < al < ... < am = band laj - aj-II < e, 1 ~ j ~ m. Deter-
mine N j , 0 ~ j ~ m, so that n ::2': N j entails IFn(a) - F(a) I < e/2 and set
N = maXO,;;j,;;m N j • Let n > N. If x ~ ao,

e
F.(x) ::2': 0 > F(x) - 2 ::2': F(x - e) - e,

and, analogously, F(x - e) - e ~ Fn(x) ~ F(x + e) + e for x ::2': am' More-


over, if aj_1 ~ x ~ aj for somej, 1 ~j ~ m,
e
F.(x) ~ F.(aj) < F(aj) + 2~ F(x + e) + e,

e
Fn(x) ::2': F.(aj_,) > F(aj_ I) - 2 ::2': F(x - e) - e.

Combining these, d[F n, F] < e and (iii) follows.


8.1 Convergence of Distribution Functions 279

To verify that (iii) implies (i), for any X o E C(F) and e > 0 choose 1J > 0
such that Ix - Xo I :$ 1J entails IF(x) - F(xo) I < e. Set h = min(e, 1J) and
select N so that d[F., F] < h when n ~ N. Then, for n ~ N from (6),
Fn(xo) :$ F(xo + h) + h :$ F(xo + 1J) + e :$ F(xo) + 2e,
Fn(xo) ~ F(xo - h) - h ~ F(xo - 1J) - e ~ F(xo) - 2e,
and (i) follows.
To obtain (i) from (ii), retain the choice of xo, 1J, e; define

I
I, x :$ Xo - 1J

h(x) = x o ; x, Xo - 1J :$ x :$ Xo

0, x> xo,
= h(x), hix) = h(x

r:-\I r:
and set hl(x) - 1J). For any dJ. G

G(xo - 1J) = dG :$ f:", hi dG :$ dG :$ G(xo)

G(xo) = r: h 2 dG :$ f:", h 2 dG :$ r:+o dG :$ G(xo + 1J),


and so, taking G = F and then G = F.,

f hi dF - f hi dFn ~ F(xo - 1J) - Fn(xo) ~ F(xo) - Fn(xo) - e,

f h 2 dF - f h 2 dF. :$ F(xo + 1J) - Fn(xo) :$ F(xo) - Fn(xo) + e,


whence via (ii), for all sufficiently large n

IFn(xo) - F(xo)1 :$ e + it! If hi dF - f hi dF.1 < 3e.


It remains to establish the equivalence of (i) and (iv). Under the latter, for
any a, x E C(F) with a < x
lim Fn(x) ~ lim[Fn(x) - Fn(a)] ~ lim Fn{(a, x)} ~ F{(a, x)}

= F(x) - F(a) ~ F(x),

I - rrm Fn(a) = lim[1 - F.(a)] ~ lim[Fn(x) - Fn(a)]

~ F(x) - F(a) ~ 1 - F(a).

Hence,
280 8 Distribution Functions and Characteristic Functions

Finally, to confirm that (i) entails (iv), it suffices by considering comple-


ments to verify the assertion about open sets, and these may be supposed
subsets of (- 00, 00). For any - 00 < a < b < 00, choose e > 0 so that
a + e, b - e are in C(F). Then

lim Fn{(a, b)} 2 Iim[F n(b - e) - Fn(a + e)]


= F(b - e) - F(a + e) = F{(a + e, b - e)}.
As e! 0, (a + e, b - e) i (a, b), and so lim Fn{(a, b)} 2 F{(a, b)}. Since every
open set of( - OC, 00) is a countable disjoint union of finite open intervals, the
second statement of (iv) follows. 0

If X F is a fictitious LV. with dJ. F and 9 is a finite Borel function on (-00, 00),
it is natural to signify by g(X F) a fictitious r.v. with dJ. G(x) = F {g-l( -00, x)},
where, as earlier, F{·} represents the probability measure determined by the
dJ. F(·).

Corollary 9. If {X n' n 2 I} is a sequence ofLv.s on (0., ff', P) with X n ~ X F,


and F {D} = 0, where D is the discontinuity set of the Borel function g, then
g(X n) ~ g(X F)·
PROOF. Let Fn. Gn denote the dJ.s of X n, g(X n) respectively. By (iv), for any
closed set C of ( - 00, 00), if A = closure of A,
---
lim Gn{C} = rrm Fn{g-l(C)} ~ rrm Fn{g-l(C)} ~ F{g-l(C)}
It-IX:

:::;; F{g-l(C) u D} = F{g- l(C)} = G{C},


and so the conclusion follows from Theorem 3. o
Corollary 10. If {X n' n 2 I} is a sequence ofLv.s on (0., ff', P) with X n ~ X F'
then g(X n) ~ g(XF)for any continuousfunction g.

EXERCISES 8.1
I. Let F be the d.f. ofa LV. Y. where pry = I} = pry = O} = 1and define X n == Y,
X = I - Y. Verify that X n 2.. X but X n -¥. X. Also prove that if X n 2.. X F' where F is
degenerate at c (i.e.• F ( {c}} = I), then X n !'., C.

2. If F n is the d.f. of X n • n ~ 0, where P{X n = -(lIn)} = I - P{X n = O} = 1. n ~ I,


and P(X o = O} = I. verify that Fn ~ Fo , lim F.(O) # Fo(O),

d*{F n , Fo] = supx IF.(x) - Fo(x)l-f-+ O.

3. If X and Yare r.v.s with d.f.s F and G and P{IX - YI ~ £} < £, then the Levy
distance d[F, G] :::;; £.

4. If dJ.s F n .s. F 0 and mn is the unique median of F n' n ~ 0, prove that mn --> mo. Can
an analogous statement be made if the medians are not unique?
8.2 Weak Compactness, Frechet-Shohat, Glivenko-Cantelli Theorems 281

5. Define X = 2 If= 1 X/Y, where the {Xj} are i.i.d. r.v.s on some probability space
with P{X I = I} = P{X I = O} = 1. Then 0 s X s I, and if XI = I, then X ~~,
while if X I = 0, X s 2 If= 2 r j = t. Verify that the dJ. F of X satisfies F(x) =
rkFWx),O < x < r\ k ~ I, and that F is a singular dJ.

6. If {S., n ~ I} is a sequence of binomial LV.S with p.dJ. b(k; n, p), find the density and
dJ. F such that F. ~ F, where F. is the dJ. of(npq)-l(S. - np)2.
7. If g(x) = x", ex> 0, and P(X. = a.} = I/n = I - P(X. = OJ, is g u.i. relative to
F. = F X n if a. = a· nil", a > 0; if a. == a?

8. (Chernoff) Suppose that f,,/a.) = O(b.}, I s j s m, andf,,/a.) = o(b.), m < j s k,


imply g.(a.) = o(b.) for some constants a. and b. > 0, b. j > 0, and Borel functions
g., f. j . If {X.} are LV.S with f"j{X,) = Op(b.) or op(b.) according as I s j s m or
m < j s k, then g.(X.) = op(b.). If, rather, g.(a.) = O(b.), then g.(X.) = Olb.).
9. If {F., n ~ I} is a sequence of dJ.s with F m(x) - F.(x) --+ 0 as m, n --+ 00 for all x E
( - 00,00), does F. ~ some F?

10. If LV.S X.!. X and IglP is u.i. relative to Fxn , then g(X.).::4 g(X).
II. If (F.(x) = J~ <X> f.(y)dy, n ~ O} are absolutely continuous d.f.s with f.(x) --+ fo(x),
a.e., then F.{B} --+ Fo{B} uniformly in all Borel sets B. Hint: J'.'.'<X>If,,(x) - fo(x)/dx
--+ O.

12. FinddJ.sF. ~ Fanda Borel set Bsuch that F{B} = O,F.{B} == 1. Hint: IfY.,n ~ I
is a sequence of binomial r.v.s with parameters nand p = t, let F n be the d.f. of
[Y., - (n 2 /2)]/(n/2).
13. If dJ.s F. ~ F, where F is continuous, then F. converges to F uniformly on ( - 00, 00).

14. Ifforn ~ 1 and lui < UoE(O, 00), them.gJ.s q>.(u) = J'.'.'<X>e"X dF.(x) < g(u) < 00 and
F.':' F, then q>.(u) --+ J'.'.'<x> e"X dF(x), finite for lui < u o'

15. Let S. = Ii Xi where {X., n ~ I} are i.i.d. r.v.s with E Xl = Jl. > O. If N = N p is
an {X.} - time (or a LV. independent of {X., n ~ I}) having the geometric distri-
bution then limp_o P{SN/E SN < x} = I - e- x , x> O. Hint: Recall Exercise 3.1.15.

16. Let H k ('), k ~ I, denote the Hermite polynomial of degree k, which satisfies
HHdx) = xH k(x)-kH k _ 1(X), k~l, with Ho(x) = I, Hdx)=x. If Sk,.=
II :5i, <"'<i.:5. X i,··· Xi., n ~ k ~ I, where {X, X., n ~ I} are i.i.d. with E X = 0,
-
E X 2 = 1, prove that k! Sk,./n kl 2 --+
d
Hk(Z), where Z is a standard normal random
variable. Hint: Use induction, Exercise 7.5.1, and Theorem 8.1.1.

8.2 Weak Compactness,. Frechet-Shohat,


Glivenko-Cantelli Theorems
The Frechet-Shohat and Glivenko-Cantelli theorems are of special interest
and use in statistics. It is advantageous that neither in the proof or application
ofthe former to LV.S X n is any supposition, such as the X n being independent,
interchangeable, or constituting a martingale, needed. A first step in the
282 8 Distribution Functions and Characteristic Functions

direction of proof is the notion of (sequential) weak compactness. Recall that


y -+ x - means that y < x and y -+ x, and analogously for y -+ x +.

Lemma 1. If G is a bounded nondecreasing function on D, a dense subset of


( - 00, 00), then
F(x) = lim G(y)
yeD
y-x-

is a left-continuous, nondecreasing function on ( - 00, 00) with C(F) ::::l C(G)


and F(x) = G(x)for x E C(G).
PROOF. Let F(x) = a. For any G > 0 there exists x' E D with x' < x and
G(x') > a-G. Hence, F(y) > a - G for y E D n (x', x), implying F(x - ) ~
a - G, and thus F(x - ) ~ a. Since F inherits the monotonicity of G, neces-
sarily F(x - ) :::; a, whence F(x - ) = a = F(x). Moreover, if Yn E D, Yn f x,
X n E D, X n 1 x E C(G), it follows that

G(x) ....... G(Yn) :::; F(Yn+ I) :::; F(x) :::; F(x n) :::; G(x n) -+ G(x), 0
yielding the final statement of the lemma.

Lemma 2. (i) Every sequence of dJ.s is weakly compact, that is, contains a
subsequence c;onverging weakly to a left-continuous function. (ii) A sequence of
dJ.s {F n} is completely compact, i.e., every subsequence contains a further
subsequence converging completely, iff {F n' n ~ I} is tight.
(iii) A sequence of d.f.s {F n} converges completely (resp. weakly) to F iff
every subsequence of {F n} has itselfa subsequence converging completely (resp.
weakly) to the same function F.
PROOF. Let D = {r j } be a countable dense set, say the rationals. Since
0:::; Fn(r l ) :::; I, there exists a convergent subsequence {Fn'i(rl),j ~ I}. Then
O:s; Fn'i(rZ) :s; I and there exists a convergent subsequence {Fn2irz)} of
{Fn,/rz)}. Continuing in this fashion, the diagonal sequence {Fnjj' j ~ I}
converges to a bounded nondecreasing function G on D, whence by Lemma 1
F(x) = lim G(y)

is left continuous on ( - 00, 00). If now x E C(F), choose Xm E D, x~ E D with


Xm 1x strictly and x~ -+ x -. Then,
F(x) ~ G(x~)~ Fni;(x~):::; Fn/x):::; Fnjx m) j~ocJ G(x m)
:s; F(Xm-l) m~ocJ F(x),
and so Fnjj (x) -+ F(x) for x E C(F), establishing (i).
Sufficiency in (ii) is an immediate consequence of (i) and Lemma 8.1 .I(iii).
For necessity, suppose 1 is not u.i, relative to {F n }. Then for some G > 0 there
exists a sequence an -+ 00 and a subsequence {nJ of {n} such that AFn; -
AFnj(a) > G for j ~ l. Hence, by Lemma 8. 1.1 (iii) no subsequence of Fnj can
converge completely.
8.2 Weak Compactness, Frechet-Shohat, Glivenko-Cantelli Theorems 283

Apropos of (iii), let x E C(F) and suppose that Fix) does not tend toward
F(x) as n -> 00. Then there exists a subsequence F n· for which lim Fn·(x) exists
but differs from F(x), thereby precluding Fn • from having a subsequence
converging completely (or weakly) to F. The remaining portion of (iii) is
trivial. 0
Lemma 8.1.3 ensures the following

Corollary 1. If {F n, n ~ O} are dJ.s with limn_eX> Fn(x) = Fo(x) for x in a


dense subset of ( - 00, (0), then Fn ~ F o·
Theorem I (Frechet-Shohat). If {F n} is a sequence of dJ.s whose moments
CX n. k = J~eX> x k dFn(x)~cxkfinite, k = 1,2, ... , where {cxd are the moments
ofa uniquely determined dJ. F, then Fn -'+ F.
PROOF. By (i) of Lemma 2, any subsequence of {Fn} has a further subsequence,
say {F n ,}, converging weakly to a left-continuous function F*, whence by
hypothesis and Corollary 8.1.7, Fni -'+ F* and

CXk = lim CXn;,k = feX> x k dF*(x), k = 1,2, ....


1-00 -00

Since, by hypothesis, F is uniquely determined, F* = F and so (iii) of Lemma 2


ensures that F n ..:. F. 0
This raises the question of when a dJ. is uniquely determined by its mo-
ments, if indeed they exist. A partial answer appears in Proposition 8.4.6.
The next lemma is of interest in its own right and instrumental in proving
the Glivenko-Cantelli theorem.

Lemma 3. If dJ.s Fn ~ F and Fn(x ±) -> F(x ± ) at all discontinuity points x of


F, then Fn converges uniformly to F in ( - 00, (0). In particular, if the dJ.s Fn
converge to a continuous dJ. F, the convergence is uniform throughout ( - 00, (0).
PROOF. By hypothesis, Fn(x) -> F(x) for all x. For any positive integer k, let
Xjk be the smallest x for which jlk :5 F(x + ), 1 :5 j < k, and set XOk = - 00,
Xkk = 00. Then F(xjd :5 j/k,O :5 j < k and for x jk < x < Xj+l.k, 0 :5 F(xj+l.d -
F(x jk +) :5 11k so that
1
Fn(Xjk +) - F(x jk +) - k :5 Fn(x jk +) - F(xj+I,k) :5 Fn(x) - F(x)

:5 Fn(xj+ 1,k) - F(Xjk +)


1
:5 Fn(xj+ I.k) - F(xj+ 1,k) + k'
Hence,
sup IFn(x) - F(x) I :5 max IFn(Xjk) - F(Xjk)!
-eX><x<eX> l:Sj<k
1
+ max IFn(Xjk +) - F(Xjk +)1 +-k '
o :sj<k
284 8 Distribution Functions and Characteristic Functions

and since as n --+ 00 the right side --+ 11k, which is arbitrarily small for large k,
the left side --+ 0 as n --+ 00. 0

If X I' X Z' ... , are i.i.d. r.v.s, the sample or empirical d.f. F~ based on
Xl' ... , X n is defined by F:;>(x) = (lIn) D=
1 I[X j <x)(w). Note that for all n

and x, F~(x) is a r. v. and for every n and almost all w, F~(x) is a dJ. Therefore,
for almost all w, {F~(x), n ~ I} is a sequence of dJ.s.

Theorem 2 (Glivenko-Cantelli). If {X n, n ~ 1} are i.i.d. r.v.s on a probability


space (0, fi', P) with dJ. F and F~ is the empirical dJ. based on X I' ... , X n'
then sUP_oo<x<ooIF~(x) - F(x)l~ o.

PROOF. For every x in (- 00, (0), lj = IIXj<x) and Zj = IIXj,,;x), j ~ 1,


constitute Bernoulli trials with success probabilities p = F(x) and F(x +)
respectively. By the strong law of large numbers for Bernoulli r.v.s, F~(x) =
(lIn) L~ lj ~ F(x) and F:;>(x +) = (lIn) L~ Zj ~ F(x +). Thus, if C =
{cj,j ~ I} = set of rational numbers enlarged by any irrational discon-
tinuity points of F and
Al = {w: F~(cj ±) --+ F(cj ±),j ~ I},
Az = {w: F~(x) is adJ., n ~ I},
it follows that P{Ad = 1. Moreover, as noted above, P{A 2 } = 1 and so
P{AIA z} = 1. Since for WE A IAz, {F:;>(x), n ~ I} is a sequence of dJ.s with
F~(c; ±) --+ F(c; ±), j ~ I, Corollary 1 ensures that F~ ~ F for WE A I A 2'
whence by Lemma 3, F:;>(x) converges uniformly to F(x) for WE A I A 2. 0

In problems of interest, dJ.s are likely to be attached to r.v.s X n on a


probability space (0, fi', P) and, as in the case of a.c. convergence or con-
vergence in probability, it is generally necessary to normalize the r.v.s X n' say
to Yn = (X n - bn)la n, (an> 0), to achieve convergence in distribution. This
converts the dJ. Fn of X n to
FyJx) = P{X n < anx + bn} = Fianx + bn)
and raises some questions in the uniqueness realm.

Definition. Two d.f.s F and G are said to be of the same type if for some
positive a and real b
G(x) = F(ax + b). (I)

Theorem 3. If {F n} are dJ.s such that for some positive constants an and real
constants bn

where F and G are nondegeneratedJ.s, then F and G are ofthe same type, that is,
(1) holds, and an--+ a, bn --+ b.
8.2 Weak Compactness, Frechet-Shohat, Glivenko-Cantelli Theorems 285

PROOF. Note at the outset that if H is any nondegenerate dJ. with H(a l x + b l )
= H(a2x + b 2) for all x, where aj and bj are real,j = 1,2, and a l + a2 ¥- 0,
then al = a2, b l = b 2. For if alx + b l ¥- a 2x + b 2, then
x' = t [(al + a2)x + b l + b 2]
cannot be a point of increase of H(x). Thus, if x' is a point ofincrease,alx + b l
= a2 x + b 2 and, since there are two distinct points of increase, necessarily
al = a2,b l = b 2·
The hypothesis (and proof) is easily couched in terms of LV.S, namely,
X n ~ X F (nondegenerate) and y" = a;; I(X n - bn) ~ YG (nondegenerate),
where an > O. Let a be a finite or infinite limit point of {an} and {n'} a sub-
sequence of the positive integers {n} with an' --> a; let b be a finite or infinite
limit point of {b n.} and {nO} a subsequence of {n'} with bn·· --> b.
If a = 00, Xn·la n· ~ 0, whence by Theorem 8.1.1 -bn"la n" = Yn" -
(Xn"la n,,) ~ YG , which is impossible since YG is nondegenerate. Likewise,
a = 0 is precluded since this would entail bn" = X n" - an" Yn" ~ X F' Thus,
o < a < 00 and, since (an" - a) y"" ~ 0,
aYn" + bn" = X n" - (an" - a)Yn" ~ X F ,
which ensures b finite in view of a Yn" ~ a YG . Thus, by Corollary 8.1.3
YG ~ y"" = a,;;.I(X n" - bn,,) ~ a-I(X F - b)
or G(x) = F(ax + b). By the remark at the outset, no other subsequence of
{b n·} can have an alternative limit point b', that is, lim bn· = b. Analogously,
if {n*} is a subsequence of {n} with an' --> a*, bn• --+ b*, the prior reasoning
requires F(ax + b) = F(a*x + b*), which, by the initial comment, entails
a = a*, b = b*. Thus an --> a, bn --> b. D

Corollary 2. If Fn(anx + bn) -=. F(x), Fn(Ct.nx + Pn) -=. G(x), where F, G are
nondegenerate, an' Ct. n > 0, then F and G are related by (1) and Ct.nla n --> a,
(Pn - bn)lan --+ b. In particular, if G = F, Ct. n -- an' Pn - bn = o(a n)·

As seen in Section 6.3, the class of distribution functions on R is closed


under the convolution operation *. It is likewise closed under the more
general operation of mixture (Exercise 3).

EXERCISES 8.2
I. Give an example showing that Theorem 3 is invalid without the nondegeneracy
hypothesis.
2. Verify that the class 9'* of all dJ.s on R is a complete metric space under the Levy
distance.
3. Let 1\ be a Borel subset of Rm where R = ( - co, co), m ;:::: 1, and for every AE 1\ let
F(x; A) be a dJ. on R such that F(x; A) is a Borel function on Rm + I. If G is any dJ.
whose support c 1\, show that H(x) = .fA F(x: A)dG(A) is a dJ on R. It is called a
286 8 Distribution Functions and Characteristic Functions

G-mixture ofthe family.F = {F(x; i.), i. E ;\} or simply a mixture. Convolution is the
special case F(x; A) = F(x - A), m = I. If EH and Ef ; denote integration relative to
Hand F(x; A) respectively, show for any Borel function cp with EH Icp I < 'X that
EH[cp] = JEdcp]dG(A).
4. Let Q be the planar region bounded by the quadrilateral with vertices (0,0), (I, I),
(0, 1), (1, I) and the triangle with vertices (1,0), (I, 0), (I, t). If X, Yare jointly uni-
formly distributed on Q, that is, P is Lebesgue measure on the Borel subsets of Q,
prove that F x + y = F x * F y despite the nonindependence of X and Y.

5. The convolution of two discrete distributions Fj , j = I, 2, is discrete, and if the sup-


port S(F j ) contains nj points, j = I, 2, S(F I * F 2) contains at most n I . n2 and at
least nl + n2 - I points.

6. The convolution of two d.f.s is absolutely continuous ifat least one of the components
is absolutely continuous. The converse is false (Exercise 8.4.6).
7. If {X., n :;:.: I} are i.i.d. with a uniform distribution on [0, I], that is, F(x) = x for
0::; x ::; I, show that n(max i SjS. Xi - I) 2.. X G •

8. The Levy concentration function of adJ. F is defined by

Q(a) = supxeR[F«x + a)+) - F(x)], a :;:.: O.

Demonstrate that if F = F 1 * F 2' Q I (a12) . Q2(a12) ::; Q(a) ::; minj = I. 2 Qj(a), and
deduce that F I * F 2 is continuous iff F I or F 2 is continuous.
9. Let F i be a discrete dJ. with maximum jump qj, i :;:.: I, and suppose that G. =
FI * F 2 * ... * F. ~ G. Prove that if G is continuous, Oi= I qi = 0(1).
10. If flJ2 are densities, their convolution is defined by

f = fl * f2 = J
fl(X - y)f2(y)dy·

Verify that if F j is an absolutely continuous dJ. with density Jj,j = 1,2, then F =
F I * F 2 is absolutely continuous with density f = II * I2'
11. (Chow-Robbins) Prove that if F is a dJ. with F(O) = 0, then G(x) = F(x + n) is Of
J
a dJ. iff x dF(x) <x. More generally, Ghas a finite kth moment iff F has a finite
(k + I)st moment.

12. If X and Yare independent r.v.s such that X + Yand X have identical dJ.s, then
Y = 0, a.c. Hint: Employ the Levy concentration function of Exercise 8.

8.3 Characteristic Functions, Inversion


Formula, Levy Continuity Theorem
Any r.v. X on a probability space (0, fF, P) has an associated characteristic
function (abbreviated c.f.) CPx(t) defined for all real t by
cp(t) = q>x(t) = E ei1X = E cos tX + i E sin tX. (1)
8.3 Characteristic Functions, Inversion Formula, Uvy Continuity Theorem 287

On the other hand, any dJ. F on R has a corresponding Fourier-Stieltjes


transform </JF(t) defined for all real t by

cp(t) = CPF(t) = f: e itx dF(x) = f: cos tx dF(x) +i f: sin tx dF(x), (1')

where the integrals are of the improper Riemann-Stieltjes variety.


When F = Fx is the dJ. of a r.v. X on some probability space (n,~, P),
then according to Theorem 6.2.4 the two integrals coincide. Thus, the prob-
ability integral of(1) may be envisaged as the Riemann-Stieltjes (or Lebesgue-
Stieltjes) integral of (1') with F = Fx and vice versa in view of oft-repeated
comments about construction of a r.v. X on (n, ~, P) with a preassigned d.f.
F. Thus, the term d. may also be used in conjunction with d.f.
Clearly, </J(O) = 1, I</J(t) I ~ 1, </J(t) = </J( -t), where l{> signifies the complex
conjugate of </J, and </J is uniformly continuous on ( - 00, 00) since by the
dominated convergence theorem
I</J(t + h) - </J(t) I = IE eitX(eihX - 1)1 ~ Ele ihx - II~O
independently of t. Moreover, for any constants a, b
</JaX+b(t) = eibt</Jiat). (2)
The importance of d.s in probability theory stems on the one hand from
the one-to-one correspondence between cJ.s and dJ.s, stated as a corollary to
the Levy inversion formula below, and on the other from the ease of operation
with d.s, due largely to the parallelism between convolution of dJ.s and
multiplication of (the corresponding) c.f.s as formulated in Theorem 2
(below). In delving into probablistic problems one may, therefore, operate
interchangeably with dJ.s or their c.f.s.

Theorem I (Levy Inversion Formula). If X is a LV. with d. cP, then for


-oo<a<b<oo

lim -2
C-oc n
1
-c
I C
e- Ita -

it
e- Itb
</J(t)dt

_ P{
- a<
X
<
b}
--- = a} +
+ P{X 2
P{X = b}
. (3)

If a, bE C(F), where F = F x' the right side of (3) reduces to F(b) - F(a).
The integrand of (3) is defined at t = 0 by continuity and is not, in general,
absolutely integrable over ( - 00, 00).
PROOF. Set

I(C) = -2
1 fC e- ita -. e- itb </J(t)dt = -21 fe e- ita - e- irb E eUx. dt.
n -c It 1r -c it
288 8 Distribution Functions and Characteristic Functions

Since [(e- ita - e-ilb)/it]eiIX is bounded for all wand t, by Fubini's theorem
I fC eit(X - a) _ eit(X - b)
I(C) = -2 E dt
n -c it

1 E fC sin t(X - a) - sin t(X - b)


=- ~
not

=-
1sin t
-dt - E
[fC(X - a) fC(X - b) sin t
-dt
J= EJdX),
not 0 t.

Ifc(u-a) . {I,
where
a<u<b
- sm t d ('-00 1.
( )-
J CU - -- t ~ 2' u = a,b
n C(u-b) t 0
, u < a, u > b.
Since IJdu)1 ~ 2 for all - 00 < u, C < 00, by dominated convergence
lim I(C) = lim E JdX) = E lim JdX)
C-+C() C-oo C-oo

1 P{X = a} + P{X = b}
= E[2 I [x=aorbl + I[a<x<bJ] = 2 + P{a < X < b}.
o
Corollary 1. There is a one-to-one correspondence between dJ.s on Rand
their cJ.s.
PROOF. The very definition (1') shows that identical dJ.s have identical d.s,
while the converse follows from
I fC e- i1a - e- i1b
F(b) = lim lim -2 cp(t)dt, bE C(F). o
a--oo c-oo n -c it

Corollary 2. If J~ 00 Icp(t) Idt < 00, then for - 00 < a< b< 00

-2
I foo e- ita - e- itb
. cp(t)dt = P{a < X < b} +
P{X = a} + P{X = b}
2
n _00 It
= F(b) - F(a),
(3')
where the corresponding dJ. F is absolutely continuous with a bounded,
continuous density
f(x) = F(x) = -2I
e- Ilx cp(t)dt. foo (4) .
n _00
PROOF. In view of the hypothesis, (3) transcribes as the first equality in (3'),
whence, letting b -+ a +, Lebesgue's dominated convergence theorem ensures
P{X = a} = O. Hence, F is continuous, yielding the second equality. Since
for b approaching a the integrand of (3') divided by b - a tends to e-itacp(t),
(4) follows by dominated convergence. Boundedness of f is apparent from
8.3 Characteristic Functions, Inversion Formula, Uvy Continuity Theorem 289

(4), and continuity of f follows once more by dominated convergence. To


verify absolute continuity, note that via Fubini's theorem for - 00 < a <
b < 00

f bf(x)dx =
a
1
-2
1t
fb dx fex>
a -ex>
.
e-llxcp(t)dt

= 2~ f~ex>(fe-iIX dx )cp(t)dt
I fex> e- il• - e- i1b
= -2 . cp(t)dt = F(b) - F(a)
1t _ ex> It
by (3'). Since f is continuous, f(x) ~ 0 on (- 00, (0) and, letting b -+ 00,
a-+ -OO,fE2'I(-OO, (0) and Ilfll = 1. 0
Theorem 2. If Fi is a dJ. with corresponding c.f. cpj, i = 0, I, 2, then F 0 =
F 1 * F 2 iff CPo = CPI . CP2·
ilZ
PROOF. Suppose F 0 = F t * F 2. Since e is bounded and continuous, by (II)
of Section 6.3

CPo(t) = f eilZ d(F 1 * F 2)(z) = f~ex> f~ex>eil(X+Y) dFt(x)dFiy)

Conversely, if CPo = cP 1 • CP2 and F = F 1 * F 2 has c.f. cP, then by the portion of
the theorem just proved cP = CPl· CP2 = CPo, whence by Corollary 1, F o =
F = F 1 * F 2. 0
Corollary 3. If X I' X 2 are independent LV.S on some probability space with
cJ.s cP x I' cP X 2' then the cJ. of their sum is
(5)

PROOF. This is an immediate consequence of Theorem 2 and Theorem 6.3.4.


o
Corollary 4. Convolution is associative and commutative and so the class §*
ofall dJ.s on R is an Abelian semigroup.

What is the analogue vis-a-vis c.f.s of complete convergence of dJ.s? This


and more is answered by

Theorem 3 (Levy continuity theorem). Let {F n} be a sequence ofdJ.s on R with


corresponding c.f.s {CPn}' If Fn ~ F, then limn_ex> CPn(t) = CPF(t) uniformly in
It I ~ T for all T > o. Conversely, ifcpit) converges to a limit g(t) on ( - 00, (0)
which is continuous at t = 0, then g is the c.f. of some dJ. F and Fn ~ F.
290 8 Distribution Functions and Characteristic Functions

PROOF.For arbitrary e > 0, choose ±MEC(F)· nf


C(Fn) for which F( -M) +
1 - F(M) < e and then select N 1 such that n ;;:: N 1 entails
Fn( - M) < F( - M) + e < 2e, 1 - Fn(M) < 1 - F(M) + e < 2e.
Then

IqJn(t) - qJ(t) 1 = If~ooei'X dFn(x) - f~ooeitx dF(X)! :$; II 1 1 + II 2 1 + II 3 1,

(6)

t-: t-:
where

IIII = I e
i1x
dFn(X) - e
i1x
dF(x) I :$; F n( - M) + F( - M) < 3e,

II 2 1 = I f:ei,XdFn(X) - f:eitxdF(X)!:$; 1 - Fn(M) + 1 - F(M) < 3[;.

for n ;;:: N 1, and

I 3 = fM e itx dFn(x) - fM e itx dF(x)


-M -M

M
= [Fn(x) - F(x)]e iIX 1 - it fM ei'X[Fn(x) - F(x)] dx,
-M -M

whence for fixed but arbitrary T > 0 and 1t I :$; T

II 3 1:$; I FiM) - F(M) I + IF n( -M) - F( -M)I + T f~MlFn(X) - F(x)ldx.

(7)
Since lim Fn(x) = F(x) except on a set of Lebesgue measure zero, the Lebesgue
dominated convergence theorem ensures that for n ;;:: N 2 the last term in (7)
is less than e. Hence, for n;;:: N = max(N1 , N2 ) and It I :$; T, II 3 1 < 7e, and
so from (6), qJn(t) -+ qJ(t) uniformly in It I :$; T.
Conversely, suppose that qJn(t) -+ g(t) where g is continuous at t = O.
By Lemma 8.2.2 there is a monotone, left-continuous function F and a
subsequence {F n,} of {F n} such that Fn; ~ F. Then for any (j > 0

(8)

By dominated convergence and (ii) of Corollary 8.1.6, it follows as i -+ 'X) that

I
2(j
fd
_/(t)dt =
foo -00
sin (jx
~ dF(x),
8.3 Characteristic Functions, Inversion Formula, Uvy Continuity Theorem 291

and so, as b --+ 0, by continuity of 9 at the origin and, once more, dominated
convergence,
F( 00) - F( - 00) = g(O) = lim lpn(O) = l.

Thus, F n; ~ F, whence by the portion of the theorem already proved lpn. --+ lpF'
implying 9 = lpF' Analogously, every subsequence of {F n} has itself a sub-
sequence converging completely to adJ. F* and lpF* = 9 = lpF, entailing
F* = F. Consequently, Lemma 8.2.2(iii) ensures F n -=. F. 0
Corollary 5. If {F, F n, n ~ I} are d.f.s with corresponding d.s {lp, lpn' n ~ I},
then Fn -=. F iff lim lpn = lp.

The few but powerful theorems concerning d.s already established permit
the extension of Theorem 3.3.1 via the ensuing

Lemma 1 (Doob). If X is a LV. with d. thenfor any positive C and b

L[I -~{lp(t)}]dt
lp,

P{lXI ~ C} ~ (l + 2[)n/C[»2 (9)

PROOF. It suffices to prove

2)-2
f b
o (l - cos Cu)du ~ [)3
(
b +; (10)

since (9) then follows from

J:[I - ~{lp(t)}]dt = J:E(l- costX)dt


= E fb(l - cos tX)dt
o
~ i
!IXI"Cj
dP fb(l - cos tX)dt
0

>
-
i!IXI"Cj
[)3 ([) + ~) - 2 dP
IXI

~b 3
(b + ~r2p{IXI ~ C}.
To this end, choose the minimal [) 1 ~ [) such that C[) 1 = 2kn for some positive
integer k. Then [)l < [) + 2n/C and

fo
Cb
(l - cos u)du = L
k f
2j"b 1b
' (l - cos u)du.
j=l 2(j-l)"blbl
Now for 0 < b < 2n and any real a

f
O+b fbl2 fb l2
cos u du ~ cos u du =2 cos u du,
o -N2 0

whence, since y - sin y ~ .\.3/n2 for 0 ~ y ~ n,


292 8 Distribution Functions and Characteristic Functions

Cb
fo (1 - cos u)du ~ 2k
f1tb/bl
0 (1 - cos u)du = 2k [(no)
~ - sin ~
(no)]
0)3 Co 3 Co 3
~ 2kn ( 0 1 = oi ~ (0 + 2njC?
and (10) follows. o
(O,~, P) andfor some T >
Corollary 6. If {X n} is a sequence ofr.v.s on the °
c.f.ofX m - X n,saYCfJm.n(t),convergesto 1 uniformly in It I ~ Tasm > n -+ 00,
then X n ~ some LV. X on (O,~, P).

PROOF. For every C > 0, from (9)

sup P{IX m - Xnl > C} ~ sup T- 3(T + 2cn)2 fT 11 - CfJm.n(t)ldt = 0(1),


m>n m>n 0

and so, by Lemma 3.3.2, X n ~ X. 0


Theorem 4. If {X n} are independent LV.S on a probability space (O,~, P),
Sn= Ii'=
I Xi' n ~ 1 and Sn ~ SF' then there exists a r.v. Son (O,~, P) such

that Sn~S,

PROOF. For n ~ 1, let CfJn' t/Jn, and CfJ be the d.s of Sn, X n and F respectively. By

°
hypothesis and Theorem 3, CfJn(t) -+ CfJ(t) uniformly in every bounded interval.
Hence, for any 6 in (0, !) there exists T > and an integer N > such that for
It I ~ T and n ~ N
°
ICfJ(t) I > !, ICfJn(t) - CfJ(t) I < 6, ICfJn+k(t) - CfJn(t) I < 26, k ~ 1, (11)

recalling that CfJ(O) = 1and CfJiscontinuous. ByCorollary3,CfJn(t) = ni=1 t/JJ{t),


andsoforltl ~ Tandn ~ N,(11) implies

n+m I = ICfJn(t)!ICfJn(t)
1
CfJn+ m(t) I < 46, m~l.
I 1 - nn t/JJ{t) -

Therefore, limn_ oo CfJsk-sJt) = 1 uniformly for It I ~ T and k > n. By


Corollary 6, there exists a LV. S on (0, ~, P) with Sn ~ S and, consequently,
by Levy's theorem (Theorem 3.3.1) Sn ~ S. D

EXERCISES 8.3
I. If q> is a c.f. so are I q>(t) 12 and .J4l (q>(t)} where .J4l {z} denotes the real part of z. Find the
c.f. of the singular dJ. of Exercise 8.1.5.
2. If Fn, Gn are dJ.s with F n -"-> F, Gn ':" G, then F n * Gn -"-> F * G.
3. If H is a G-mixture of.:1' = (F(x; A), A E t\} as in Exercise 8.2.3 and q>(t, A) is the d.
corresponding to F(x; A), show that q>H(t) = SA q>(t; A)dG(A). Thus, if (q>n(t), n 2: I}
is a sequence of c.f.s and Lf Cn = I, Cn 2: 0, q>(t) = Lf Cjq>j is a c.f. Verify that
exp{A[q>(t) - I]}, A> 0, is a c.f. if q> is.

4. Prove that if q> is a c.f., 1 - .J4l{q>(2t)} ::; 4[1 - .~[q>(t)}].


8.3 Characteristic Functions, Inversion Formula, Levy Continuity Theorem 293

5. Prove that a real-valued c.f. cp satisfies cp(t + h) ~ cp(h) for t > 0 and sufficiently
small h > O.
6. Verify that the d. corresponding to a uniform distribution on (-IX,:X) is cp,(t) =
(sin IXt)/:Xt, IX > 0, and that lim cp.(t) exists but F" '!4 to adJ.

7. Find thed.cpx ifP{X = IX} = P{X = -IX} = -!-. Show by iteration of an elementary
trigonometric identity that

sin I sin t/2" n" I n°O t


-- = - - cos -c --+ cos -c
t 1/2" j=l 2J j=l 2J
and utilize this to state a result on convergence in law of sums of independent LV.S.

8. If cp is the c.f. of the dJ. F, prove that

lim ~ fC e-i,xcp(l)dl = F(x+) - F(x),


c-oo 2C -c
and if X is an integer valued LV., show that

-
I ..
fn e-UJcpx(t)dl = P{X = j}.
2n -n

Give an analogous formula for S" = D= I Xi> where {X;} are i.i.d. integer-valued
LV.S.

9. Prove that Iim c _ 00 (l/2C)S<:'clcp(t)1 2 dt = LxeR[F(x+)-F(xW. Hint: if XI' X 2


are independent with common d.f. F, apply Exercise 8 at x = 0 to X I - X 2'

f
10. Prove that
I e-i,xcp(t)
F(x + ) + F(x - ) = I - - . dt
n It

where

Ie) ,
f (f = lim
c ..... o -c
-<
+
£

and hence that


I I fe-il>CP(I)
F(x) = - - - ---dt
2 2n it

for F continuous.

II. Utilize d.s to give an alternative proof of Corollary 8.1.10.

12. Prove for any dJ. with d. cp that for all real x and h > 0

-
I fX+ 2h F(y)dy - -
I fX F(y)dy = -
I foo (Sin
-
U)2 .
e-·uxtPcp(u/h)du.
2h x 2h x- 2h n - 00 u

Hint: Apply the Levy inversion formula to F * Gh , where Gh is the uniform dis-
tribution on [-h, h].

13. Utilize Exercise 12 to prove the converse portion of Theorem 3.


294 8 Distribution Functions and Characteristic Functions

14. Let H,(x) = SA F(x; A)dGi(A) be a G,-mixture (Exercise 8.2.3) of the additively
closed family $' = {F(x; 1), lEA c R m } (Exercise 8.4.5). Verify that the convolu-
tion HI * H2 is a (G I * G2 )-mixture of $'. Hint: Utilize Exercise 3.
15. If, in Exercise 14, m = I and A = [0, (0) or [0, I, 2, ...] or (nonnegative rationals},
then the mapping of ~ = (G: G(A} = I} onto .Yt = (H: H(x) = SA F(x; l)dG(A.),
G E~} is one-to-one. Then.Yt is said to be identifiable. Hint: ljJ(z; G) = .fA ZA dG(l) is
analytic in 0 < Izl < I and tp(t; 1) = tpA(t), 1 E A where tp(t) = tp(t; I).

8.4 The Nature of Characteristic Functions,


Analytic Characteristic Functions,
Cramer-Levy Theorem
In view ofthe crucial importance ofd.s in probability theory, it is desirable to
amass some information concerning their nature. A first step in this direction
is a tally of the more popular dJ.s and the corresponding d.s.

Distribution Density Support c.f.

Degenerate (rx} e iClt

Symmetric I
(I, -I} cos t
Bernoulli 2
Binomial G)pX(1 - p r x xE(O,I, ... ,n} (pe i ' + 1 - p)",O < p < 1

Poisson lx~
-A

x!
XE (0, I, ...} exp(l(e i' - I)}, 1> 0

Normal I
--exp
(J$
{-(X-tW }
20'2
(- 00, (0) ei8 ,-a 2,2/ 2 , (J > 0
Symmetric
Uniform
-
I [ -rx, IX]
sin rxt
- - rx>O
2rx IXt '

~ (I - 1:1)
2(1 - cos rxt)
Triangular [-IX, IX] , rx > 0
IX 2 t 2
Inverse
Triangular
I- cos rxx
(- 00, (0) [I---;. ' It'r rx > 0
1trxx 2
A
Gamma
IX
[(1) x
),-1 -(U
e
(0,00) (I - ~rA, 1> 0, IX> 0

rx
Cauchy (- 00, (0) e-· III , rx > 0
1t(rx 2
+ x 2)

Characteristic functions and moments are intimately related, as will be


seen via the preliminary
8.4 The Nature of Characteristic Functions 295

Lemma 1. For t E ( - (0) and nonnegative integers n

Ifl
00,

" (. \i (. )"+
ei. - " I t ) _ It
L- -.-, - - - , - -u du
ei'U(I)"
i=O J. n. 0

(1)

and for any fJ in [0, I]

. " (it)il 2 1 - Oltl"H


Ie"l - j~O Y ~ (1 + fJ)(2 + fJ)· .. (n + (5) , (2)

where the denominator of the right side of (2) is unity for n = O.


PROOF. Since for n ~ I

(')"+Ifl
_It__
, e
ilU(1 _
u
)" d
u
=~,
(.)" (.)" fl
+ ( _It I)' e
ilU(1 _
u
)"-1 d
u,
n. 0 n. n . 0

the first part of (1) follows by summing while the second is obtained in-
ductively.
To prove (2), let I" denote the left side of (I). Since Ieil - II = 21 sin t/21 ~
1 O
2 - lt10 for 0 ~ fJ ~ I, let n ~ I, whence from (1)

11"1 ~ fl {"+1 ... f~3Iei'2 - IIdt 2 ... dt"+1

~2
1 lf 'l f'"+1 f'
-0 0 0 . .. 0
3
t~ dt 2 ... dt"+ I
2
1
-
= (I + fJ) ... (n + fJ)'
O
ltl"H
0

Corollary 1. le z - 11 ~ (e lzl - I) for any complex z.

PROOF. (2) with {) = 1, n = 0 yields the bound 2(elzl - 1). However, directly

le
z
- II = I
I~ zi/j! ~ ~ Izli/j! = elzl - 1. 0

Theorem l. If X is a LV. with d. ({J and EIXI"H < 00 for some nonnegative
integer n and some {) in [0, I], then ({J has continuous derivatives q>(k l of orders
k :s; nand
I ~ k ~ n, 0)

q> ()
t = ~(it)iEXj
L. ., + 0 (II")
t as t -+ O. (5)
j=O J.
296 8 Distribution Functions and Characteristic Functions

Conversely,ifcpl2k)(0)existsandisfiniteforsomek = 1,2, ... ,then


E X 2k < 00.
(4) follows easily from (2). To prove (3), note that via (1)

i'i'k i'2.
PROOF.

e ilX - L -.-,-
k-t (itXY = (iX)k ... e"' X dt t ... dt k, (6)
j=O ). 0 0 0

so that

and

- +
IjJk(t
--, h)---IjJk(t)
- - -_
h
E('X)k
I -1
h
f'+h i'k
I 0
... i 0
'2
e i'IX d tt ... d t k.

Since
2
I~ f+h f~k ... { e i/ IXdt t ... dtkl :5: (It I + Ihl)k-t = 0(1) as h -+ 0,

by the dominated convergence theorem, for 1 :5: k :5: n

(7)

and, in particular,
cp'(t) = ljJ't(t) = i E Xe itX .
Repeating the previous argument, it follows via (7) that for 2 :5: k :5: n
2
d 1jJ (t) = ik E X k
1jJ~2)(t) = --~-
dt 0 0
...
0
i'i'k-2 it2
eitlX dt t ... dt k- 2,

and, continuing in this fashion, for k :5: n

IW-1)(t) = jk E X k {e itlX dt t ,

cp(k)(t) = ljJ~k)(t) = ik E Xke itX ,


which is (3). Now by (3),
cp(n)(t + h) - cpln)(t) = in E xneilX(eihX - 1),

and since IeihX - 11 :5: 2, once more by dominated convergence


lim cp(n)(t + h) = cpln)(t)
h-O

and cp has a continuous nth derivative, whence (5) follows from Taylor's
theorem.
8.4 The Nature of Characteristic Functions 297

To prove the converse, define for any finite function g for - 00 < x, h < 00

~~t)g(x) = + h) - g(x - h),


g(x
~~n)g(x) = ~~n-1)g(x + h) - ~~n-1)g(x - h), n = 2,3, ....
Then ~~n)[agl(x) + bg 2(x)] = a~~n)gl(X) + b~~n)gix) for - 00 < a, b < 00
and

If g(x) is a polynomial of degree n, ~hg(X) is a polynomial of degree at most


n - 1. Hence ~~m)g(x) = 0 if g is a polynomial of degree < m and, moreover,
~~2m)x2m = (2h)2m(2m)! Since 1p(2n)(0) is finite, as t -+ 0 (Hardy, 1952, p. 290)

t 2n
Ip(t) = 1p(0) + tlp'(O) + ... + (2n)! [1p(2n)(0) + 0(1)],

I1p(2n)(0) I = Ilim E(_ei_hX e_-_ih_X)2n I =I·I m


E(sin-hX)2n
->X E n 2
h-O 2h h-O h -
by Fatou's lemma, completing the proof. o
In Section 3.3, a r.v. X was defined to be symmetric if X and -X have
the same distribution. The property in question belongs to the dJ. and is
tantamount to saying that a dJ. is symmetric if F(x) = 1 - F( - x +), X E R.

Proposition 1. A d. Ip(t) is real valued (for real t) iff its dJ. F is symmetric.
PROOF. The proof is most easily couched in terms of LV.S. If F is symmetric,
that is, X and - X are identically distributed,
Ip(t) = Ipx(t) = Ip-x(t) = Ipx( -t) = Ip(t),
so that Ip is real. Conversely, if Ip is real,
Ipx(t) = Ipx(t) = Ipx( - t) = Ip - x(t),
and so by the one-to-one correspondence between d.s and dJ.s, X and - X
are identically distributed, that is, F is symmetric. 0

A function Ip( t) on the real line is said to be periodic with a period to if


Ip(t + to) = Ip(t) for all real t.

Proposition 2. IJ Ip is the d. oj the dJ. F and IIp(t o) I = 1 Jor some to "" 0,


then F is discrete with support S consisting oj a subset oj points in arithmetic
progression and conversely. In this case, S c {(2nj + ()o)/t o , j = 0, ± 1, ...}
Jor some real ()o and e-iIIJO/I°Ip(t) is a periodic Junction with period to.
298 8 Distribution Functions and Characteristic Functions

PROOF. Since 1cp(t o) 1 = 1, cp(t o) = ei80 for some real (Jo, whence 1 = e - i8ocp(t O)
= J ei(l ox- 8o) dF(x) = Jcos(tox - (Jo)dF(x). In r.v. parlance,
E[1 - cos(t o X - (Jo)] = 0,
necessitating
P{toX - (Jo = 2jn,j = 0, ± 1, ...}
2jn + (J 0 .
=p{ X = to
}
,J = 0, ± 1,... = 1.

Conversely, the latter entails, setting Pj = P{X = (2jn + (Jo)/t o} ~ 0,


cp(t) = L pjeit(2j" + 8o)/to = ei8ot/to L pje 2j"it/I O, (8)
j j

whence 1cp(t o) 1 = 1 and e-itOo/tocp(t) is periodic with a period to. D

Corollary2.Ac.f.cp(t)satisfieseither(i)lcp(t)1 < Iforallt -::/: O,(ii) 1cp(t) 1 == 1,


or (iii) Icp(t) I = 1 for countably many isolated values of t.
PROOF. It suffices to prove that if 1cp(t n) 1 = 1 for t n - s (finite), tn -::/: s,
then Icp(t) \ == 1. Now, via (8) with t n replacing to and then t m replacing to,
°-: /:
for k, m, n = 1,2, ...
Icp(k(t m - t n» 1 == I cp(kt m ) 1 = 1.
Since tm - tn - 0, every interval (a, b) contains a point c with Icp(c) I = 1,
whence by continuity Icp(t) I == 1. D

Corollary 3. If a d. cp satisfies I cp(t o) 1 = 1cp(ext o) 1 = 1 for some to -::/:


irrational ex, then cp is degenerate, that is, cp(t) = ei8t for some real (J.
°
and

PROOF. If cp is nondegenerate, by Proposition 2, P{X = (2nj j + (Jo)/t o} >


for some (Jo and integers j I ¥ h. Then for some real (J I and integers k I ¥ k 2
°
2nji + (Jo 2nk i + (JI i = 1,2,
to ext o

( kl) = -;(JI -
2n jl - -; (Jo = 2n (k 2
j2 - -; ) ,

whence k l - k2 = exUI - h), a contradiction. D

Corollary 2 in combination with Theorem 8.3.3. elicits some interesting


conclusions about convergence in distribution of normalized sums of i.i.d.
LV.S.

Theorem 2. If {X n} are i.i.d. with nondegenerate dJ. F, Sn = Ii Xi' and for


some positive constants an and real bn and nondegenerate dJ. G
Sn - b n d S
- G,
an
8.4 The Nature of Characteristic Functions 299

then

PROOF. If an -r> 00, there is a subsequence {n'} of the positive integers {n}
(which, for notational simplicity, will be replaced by {n}) such that an -+ a,
finite. By Corollary 8.1.3, Sn - bn = an(Sn - bn)/an 2.. aSG' Thus, if t/J and cP
are d.s respectively of XI and aSG' by Theorem 8.3.3 e-ibn't/Jn(t) -+ cp(t).
Clearly, this entails cp(t) = 0 for all t such that It/J(t) I < 1. By Corollary 2,
cp(t) = 0 for all except perhaps countably many isolated values of t, whence
cp(t) == 0 by continuity, a contradiction. Thus, an -+ 00, whence X dan ~ 0
and likewise X Jan ~ O. Theorem 8.1.1 then ensures that (Sn- 1 - bn)/an 2.. SG'
which in conjunction with (Sn-I - bn-I)/an-I 2.. SG necessitates

by Corollary 8.2.2. D

The next proposition reveals inter alia that CPa(t) = e-/'I is a c.f. for
o < ex ~ 1 but fails to detect that CPa is likewise a d. for 1 < ex ~ 2, a fact
which will be established in Theorem 12.3.2.

Proposition 3 (Polya). A nonnegative, even function t/J convex and decreasing


on (0, (0) with t/J(O +) = t/J(O) = I is a d.
PROOF. Apart from the trivial case t/J(t) == 1, if t/J(t) == c for all t > to, then
t/J = Iim n _ oo t/Jn' where t/Jn is strictly decreasing (to c) for t > to' Thus, by
Theorem 8.3.3 it suffices to prove the proposition for t/J strictly decreasing.
For m = n2 n and n = 1,2, ... , set t j = j2- n, t/Jj = t/J(t), j = 0, 1, ... , m. If
(t, yit)) is a point on the line determined by the points (t j _ l , t/Jj-I) and
(t j , t/J), then for 1 ~ j ~ m

() t/Jj-Itj - t/Jjt j - I
Yj\t = t/Jjt -- t/Jj-I
tj_ 1
t + -'---'-----"--''-----'---'~...e:-
t - t _
j j j 1

whence, by monotomclty, 1 ~ Ym(O) = Cm > 0 and A.". ~ t m. Note that


Ym(t) = cm(l - (t/Am))+ for t ~ t m ~ Am' Define Cj, Aj inductively for
j = m - 1, ... , 2, 1 via

m
L Cj = Yj(O),
i=j
300 8 Distribution Functions and Characteristic Functions

Then
m C. m
Yj(t) = - t .L. -;f + .Lei>
l=) A. 1=)

and so y/t) = Yj+ t(t) entails t = Aj = t j for j < m. Moreover,

./,
'l'j+1
./,
+ 'l'j-I ./, = (t
~ 2'1'j
j + 1t+ t 1) ./,
j -
'l'j
j

by convexity, implying y/O) ~ Yj+ 1(0). Thus, Cj = y/O) - Yj+ 1(0) ~ 0 and
Lj=1 Cj = YI(O) = 1. Furthermore,ljJm(t) == Lj=l cp - (ItI!Aj)t coincides
with Yj(ltl) for t j _ 1 S; It I S; t j , 1 S; j S; m, and ljJm<t) is a d., m ~ 1, via
Exercise 8.3.3 and the table of c.f.s. By hypothesis ljJ(O) = ljJ(O+ ), whence
ljJ(t) = Iim n .... co ljJn(t), and so ljJ is a d. by Theorem 8.3.3. 0

Corollary 4. Two different d.s may coincide in anyfixedfmite interval ( - T, T),


T > O.
PROOF. If suffices to change the d. e- III in (T, T + 1) and ( - T - 1, - T) by
replacing its arcs therein by line segments preserving continuity. According
to Proposition 3, the modified function is still a d. 0

A necessary and sufficient condition for a continuous complex function ({J


on (- 00, 00) with ({J(O) = 1 to be a d. exists (Exercise 16) but is too unwieldy
to constitute a working criterion.
One of the most beautiful and striking theorems of probability theory,
conjectured by Levy and proved by Cramer, asserts that if a normal dJ. is the
convolution of two nondegenerate dJ.s, each must be normal.
To place this in perspective, an algebraic viewpoint is helpful. If dJ.s F,
F I' F 2 are in the relation F = F 1 * F 2, it is natural to call F 1 and F 2 factors or
divisors of F. The corresponding d.s ({JI and ({J2 are literal divisors of ({J. If
J a denotes the dJ. degenerate at Ct, then J a is a factor of every dJ. F since
((JF(t) = eial[e-ial({JF(t)], but these are trivial and unwanted factors. Clearly,
all divisors of discrete dJ.s are discrete, but the analogous statement for
continuous dJ.s is false, as revealed by the trigonometric identity

sin 2t sin t
- - = --cost.
2t t

Definition. A family g;' of dJ.s is factor closed if for all FE g;' the convolu-
tion relation F = G 1 * G 2 with G i a nondegenerate dJ. implies that G i E g;',
i = 1,2.
The theorem of Cramer-Levy may be rephrased by saying that the family
of normal d.f.s is factor closed. The only known method of proof is via d.s and
requires a discussion of analytic d.s.
8.4 The Nature of Characteristic Functions 301

A d. <p is called analytic or more specifically r-analytic, r > 0 (resp. entire),


if <p can be represented by a convergent power series in -r < t < r (resp. in
- 00 < t < 00), i.e.,
00 a .t
j

<p(t) = L ~, -r < t < r.


j=O }.

Proposition 4. If F is a dJ. with corresponding d. <p, the following conditions


are equivalent:
i. <p is r-analytic;
ii. <p(t) = L.f=o IX/it)jjj!, -r < t < r, where IX j = Jx j dF(x) (finite);
...
111. '\'00
J.' < 00, 0 < t < r,.
L-j=O 1X 2j t 2jj(2')
iv. Je'lxl dF(x) < 00,0 < t < r.
PROOF. Suppose as in (i) that for some r > 0 and complex numbers {an}
<p(t) = LO antnjn!, -r < t < r. Then, <p(k)(t) = L:'=k antn-kj(n - k)!, k =
1, 2, ... , whence by Theorem 8.4.1, ak = q>(k)(O) = ik J~ 00 x k dF(x) = iklXk
(say), yielding (ii). Since (ii) = (iii) is trivial, set Pj = JIx Ij dF(x) and suppose
L.f=o P2j t 2jj(2j)! < 00, which is (iii). Via Jensen's inequality,
1/(2j-1) < PI / 2j
P2j-1 - 2j ,

whence
f
j=1
2j 1
P2j_ 1 t - <
(2j - I)! - j=1
f (l + P2)t
(2j - I)!
2j
-
1
< '+ d
- e dt
i (2j)!
I
2j
P2j t <
00

for 0 < t < r, implying in this range that

f e'lx l dF(x) = Loooo Jo I~I:tj dF(x) = Jo Pit < 00, (9)

so that (iii) = (iv). Finally, if (iv) obtains, then, employing the language of
random variables, (9) implies that

E f I(itX)jl
j=O j!
= Ee 1tX1 < 00 for It I < r,

whence by dominated convergence for It I < r

<p(t) = E eitX = f
j=O
E (it~)j =
).
f
j=o}'
IXJ~i,t)j. o
Corollary 5. A d. <p with dJ. F is entire iff J~ 00 er1xl dF(x) < 00 for all r > O.

For any complex number z = x + iy define

<p(z) = f~oo eizu dF(u) (10)

provided J~oo e- Yu dF(u) < 00.


302 8 Distribution Functions and Characteristic Functions

The next proposition asserts that a d. q> is r-analytic iff q>(z) is defined and
analytic in the strip 1.1"(z) I = 1y 1 < r ofthe complex plane. Consequently, the
statement that <p is r-analytic may also be interpreted as asserting that <p(z)
as defined by (10) is analytic in 1.1"(z) I < r.

Proposition 5. A d. <p with dJ. F is r-analytic iff <p(z) (in (10» is an analytic
function in 1.1"(z) I < r, and in this case for k = 0, 1,2, ... ,

OO'j j-k
(k)( )
z
= "L. I ajz Izi < r, (1t)
q> j=k j - k) ,'.
(.

where a j is the jth moment of F and <p(O) = <p.


PROOF. If <p is r-analytic and C(i) is the continuity set of F, for abE C(F) by
uniform convergence of the exponential series

b. (izy fb .
= f e'zx dF(x) = L -.-,
00
g(z) Xl dF(x),
a j=O j. a

and since the latter series converges uniformly in 1z I ~ M for all 0 < M < 00,
g(z) is entire and for k = 0, 1, ...

j-k fb fb
g(k)(Z) = L 00
~z , x
j=k(j-k).
'j

a
j
dF(x) =
a
(ixte izx dF(x).

Choose ±a.EC(F) with 0 < a. i 00, n;::: 1, and define

9.(Z) = [ eizx dF(x), n> 1.


J[an - 1 ,; Ixi < anI

Then by the preceding, g.(z) is entire, n ;::: 1, and for 1.1"(z)\ ~ s < r

f:
.=m
19n(Z) \ S; [
Jllul 2:am_ d
e1u.1( zlI dF(u) S; [
Jllul 2:am_ II
~luldF(u) < 00

by Proposition 4. Thus, If
g.(z) converges uniformly in 1.1"(z) 1 S; s < r,
whence (Titchmarsh, 1932, p. 95) <p(z) = J~ 00 eizu dF(u) = Lf 9.(Z) is
analytic in 1.1"(z) 1 < r. Moreover, for 1.1"(z) I < rand k = 0, 1, ... , setting
ao = 0,

= f [
n=1 J[an-,,;lxl<anl
(ixte izx dF(x) = E (iX)keiZX
8.4 The Nature of Characteristic Functions 303

and so, via (and in the notation of) Proposition 4, for 1z 1:::;; s < r

f J=kf IUik·(i~k-~). i
n=1 [0.-1,,1"1<0.)
j
u dF(u) I
co ~-k dk co p.~
: :; j=k
L U _ k)I. P = dk Sj=O)'
L ~- < 00,
j

whence
(k) _ ik(izy-k..
.L (. _ k)IC1..
00

CfJ (z) - j , Izi < r.


J=k )
Conversely, if CfJ(z) is defined in 1..I(z) 1 < r, then (iv) of Proposition 4 holds
whence CfJ is r-analytic. 0

Proposition 6. (i) If F is a dJ. whose d. CfJ is r-analytic for some r > 0, then F
is uniquely determined by its moment sequence {IXj = xj dF(x),j ~ 1}. J
(ii) A d. CfJ with moment sequence {IX j , j ~ I} is r-analytic for some r > 0 iff
~ (IX 2n)I/2n
hm 2 < 00. (12)
n-+ 00 n
j
PROOF. (i) Denote F, CfJ by F I' CfJI and suppose for j ~ 1 that IXj = Jx dFk(x),
k = 1, 2, where F2 is a dJ. with d. CfJ2' By hypothesis and Proposition 4,
CfJ2 is r-analytic, whence by Proposition 5, CfJb) is analytic in 1..I(z)1 < r,
k = 1,2, and
IX~iz)j
L _J_.,- = CfJ2(Z),
co
CfJI(Z) = Izi < r.
j=O ).

Therefore (Titchmarsh, 1932, p. 88), CfJI(Z) == CfJiz) in 1..I(z) 1 < r, whence


CfJI(t) = CfJ2(t) for - 00 < t < 00. Then F I = F 2 by Corollary 8.3.1.
(ii) By Stirling's formula (Section 2.3)

~ log n! = ~ [(n + ~}Og n - n + C + o(1)J = log n + 0(1)


whence, if bn = (IIXnl/n!)-I/n, necessarily logb n = -log(IIXnll/n/n) + 0(1).
Since the radius of convergence of a power series L
Cn zn is lim 1Cn 1- I/n
(Titchmarsh, 1932, p. 213), (ii) follows from Proposition 4. Conversely, (12)
and IIX2n_t1l/(2n-I):::;; Pl~l!~-I):::;; IX~~2n imply (11) for k = 0 and some
r > 0, so that a fortiori (ii) of Proposition (4) holds 0

On the other hand a dJ. may be uniquely determined by its moment


sequence without having an analytic d. To verify this, it is necessary to
invoke the celebrated Carleman criterion (Shohat and Tamarkin, 1943,
p. 19) which asserts that adJ. F is uniquely determined by its moment
sequence {IXj,j ~ 1} if
00 1
L -(
-)1/2 j = 00.
j= I IX2j
(13)
304 8 Distribution Functions and Characteristic Functions

EXAMPLE l. Let X be a random variable with P{X = n} = Ce-n/logn, n ~ 3,


where C is such that P{X ~ 3} = 1. Then for x ~ 3 and m ~ 4, set
h(x) = xme-x/logx.
Now
h'(x) mil 0
--=----+--=
h(x) x log X log2 X

iff x = x m , where

m
xm
= log X m
( 1) Xm
1 - log X m :-s; log X m.

As m -> 00, xm-> 00, and therefore m log xm/x m -> 1, whence (m/2)log m :-s;
Xm :-s; 2m log m for m ~ mo. Define

M m = max xme-x/logx = x:::e-xm/IOgXm.


x~ 3

Now for m ~ mo

:-s; CM 2m +2 :-s; C[(4m + 4)log(2m + 2)]2m+2 e -2m-2,


and therefore for all large m
(tX2m)I/2m :-s; 4C I/2m[(2m + 2)log(2m + 2)]I+O/m) :-s; 8(2m + 2)log(2m + 2).
Hence, II" (tX2)-1/2j = 00, and moreover limm_oo(tX2m)I/2m/2m = 00 since

(tX2m)I/2m> (CM 2m )I/2m > e- I (llOg m) for all large m.

Thus qJ is not r-analytic for any r > 0 by Proposition 6, whereas the Carleman
criterion ensures that the distribution in question is uniquely determined by
its moments. 0

Proposition 7. If qJ, qJh qJ2 are d.s with qJ(t) = qJI (t)qJit)for all real t and qJ is
ro-analytic, then qJi is ro-analytic, i = 1,2, and qJ(z) = qJI(Z)' qJiz) for
Izi < roo
PROOF. If F, F h F 2 are the corresponding dJ.s, by Proposition 4

f:oo e'lx l dF(x) < 00

for 0 < r < ro, whence by Section 6.3, (11),

f:ooe'" dF(u) = f:oo€rx dFI(x) f:ooe'Y dF2(y), Irl < roo


8.4 The Nature of Characteristic Functions 305

Since neither integral on the right vanishes, J~ <Xl e"" dF;(x) < 00, i = 1, 2,
for Irl < r o, so that for i = 1,2

{<Xl e"" dF;(x) < 00, 0 < r < r o,

f:<Xlerx dFj(x) < 00, - ro < r < 0,

whence J~ <Xl erlxl dFb) < 00 for 0 < r < ro and i = 1,2. Again invoking
Proposition 4, cP; is ro-analytic, i = 1,2. Finally, since cp(z) and qJI(Z)' cpiz)
are both analytic in IZI < r o and coincide on the real axis, they must coincide
in Izi < roo D

Corollary 6. If cP, CPI, CP2 are c.f.s with cp(t) = CPI(t)· CP2(t)for all real t and cP is
an entire c.f., so is cP;, i = 1,2, and qJ(z) = qJI(Z)' CP2(z)for all complex Z.

Theorem 3 (Cramer-Levy). Thefamity ofnorma I distributions isfactor closed.


PROOF. It suffices to prove that if CPo is the c.f. of the standard normal dJ. and
t2
e- /2 = qJo(t) = qJ.(t)· CP2(t), then the c.f. cpj is normal (or degenerate),
j = 1,2. Now CPo(z) is an entire function, and so by Corollary 6, qJJ{z) is
likewise an entire function j = 1,2. Thus, the corresponding dJ. F j has a
finite mean, and so replacing CPj by exp[( -IYitex]qJj, it may be supposed that
Fj has mean zero, j = 1,2. In the parlance of random variables, X 0 =
X I + X 2 with X I' X2 independent and E X j = 0, j = 0,1,2. Then
E{XoIXJ = Xj' j = 1,2, whence, employing the conditional Jensen
inequality, for j = 1, 2 and all complex Z
Icpi Z ) I = IE e;ZXjl :s: E elzXjl = Ee1zE{Xo1Xjll
:s: E e 1zlE (IXoll Xj) :s: E e Izxol

= _2_ f<Xle'ZX'-X2/2 dx = 2elz12/2 f<Xle-(1/2)(X-IZll2 dx


Jhro Jhro
:s: 2e W / 2
• (14)

Again invoking Corollary 6, e- z2 /2 = CPo(z) = CPI(Z)' cpiz) for all complex


z. Thus, cp;(z) is a nonvanishing entire function, whence by the ensuing Lemma
2, qJ;(z) = exp{giz)} with giz) an entire function j = 1,2. Hence, recalling
(14), for all complex Z
9l{giz)} = 10glcpJ{z)1 :s: 1 + !lzI 2 •
If giz) = Lo anjz n, by Lemma 3 which follows
lan/I :s: 4(1r 2 + 1), 0 < r < 00, n ~ 3,
implying anj = 0, n ~ 3, j = 1,2. Thus, giz) = aOj + aliz + a2jz2. Since
cP j(O) = 1 and E Xj = 0, necessarily aOj = 0 = a Ij for j = 1, 2. Finally,
306 8 Distribution Functions and Characteristic Functions

E XJ = -cpj(O) = -2a2jimpliescpit) = e-aJI2/2,whereo} = E XJ ~ Oand


ai + a~ = 1. 0

Lemma 2 (Titchmarsh, 1932, p. 247). Ifcp(z) is an entire, nonvanishingfunction,


then cp(z) = exp{g(z)}, where g(z) is entire.
PROOF. If h(z) = cp'(z)jcp(z), then h is entire, whence g(z) = So h(w)dw is
entire. Since

~ [cp(z)e-g(Z)] = e-g(Z) [cp'(Z) - cp(z). CP'(Z)] =0


dz cp(z)
necessarily, cp(z) = CeY(Z) = eg(z) upon noting that g(O) = O. o
Lemma 3 (Titchmarsh, 1932, p. 176). If cp(z) = LO
anz n is analytic in Iz I < r o
and A(r) = maxlzl=r 9t{cp(z)}, 0 < r < r o, then lanlr" 4A +(r) - 291{cp(O)}, :$
n > O.
PROOF. Set z = reilJ , 0 < r < r o , and an = IX n + iPn, n ~ 0, whence cp(z) =
LO anz n = u(r, 0) + iv(r, 0) say. Since
00

u(r, 0) = L (IX n cos nO - Pn sin nO)r"


n=O

converges uniformly in 0, for n > 0,

-1
n
1
211

0
u(r, O)cos nO dO = IXn rn, -I
n
1211

0
u(r, O)sin nO dO = - Pnrn

and, recalling that an = IXn + iPn,

lanr"1 = I~ f 1lu
(r, (J)e-
i 1J
• dO 1:$ ~ f 1l
!u(r, 0)1 dO, n > O.

Now 9l{cp(0)} = 1X0 = (l/2n) g1l u(r, O)dO, implying for r> 0 that la.rnl +
21X0 :$ ljn S~1I [u(r, 0) + lu(r, 0)1] dO :$ 4A +(r), which is Lemma 3. 0

A theorem of Raikov-Ottaviani asserts that the Poisson family is factor


closed (to within translations), but the proof closely parallels that of Theorem
3 (see Exercise 12).

Theorem 4. The family of binomial distributions is factor closed (to within


translations).
PROOF. Supposethatcp!(t)'CP2(t) = cp(t) = (pe il + q)',O < p < l,q = 1- p,
n = positive integer. Clearly, the "translations" eialcp!(t) and e-ia1cpit) are
also factors, but the theorem asserts that for some choice of IX these are both
binomial. Since 1 = Icp(2n) I = TIJ=! Icppn) I 1, Proposition 2 guarantees :$
that Fj is discrete with support Sj c {cj + k: k an integer, cj real},j = 1,2.
Clearly (Exercise 8.2.5), Sj is bounded since the support of the binomial
8.4 The Nature of Characteristic Functions 307

distribution is. Thus, without loss of generality, qJJ{t) = e icjl Li:j;o Pjkeikt
with Cj real, 0 ::; Pjk ::; I, PjO > O. Since qJ is entire, setting z = ir, r real, in
Corollary 6,
(pe- r + qt = e- r(C'+ C2I nL
2

j; 1 k;O
nj

pjke-rk,

whence C1 + C 2 = 0 via r -+ ± 00. Consequently,


L Plk Wk . k;O
L P2k Wk,
". "2
(pw + q)" =
k;O
implying nl + n2 = n and D~o Pjk wk = (pw + q)"j, j = 1,2, which tS
tantamount to the conclusion of the theorem. o
EXERCISES 8.4
I. (i) Since the triangular distribution (see d. table) is the convolution of two uniforms,
the d. of the former follows readily. Utilize Corollary 8.3.2 to obtain the c.f. of the
inverse triangular distribution.
(ii) Use contour integration to obtain the normal d.
2. Prove that if a c.f. cp(t) = I + wet) + 0(t 2 ), where w(t) = - w( - t), then cp(t) == I.
Hint: Apply Lemma 8.3.1. .
3. Ifcpisad.,whichofeiSin',(1 - c)j(l - ccp(t»,O < c < 1,lcp(tW,.;f{cp(t)},lj(l + t 4 ),
(1 -ltl')I I1 'I<I),1 > IX> O,ared.s?

4. Find the dJ. whose c.f. is C L~ 2 [cos jtjV logj)] and utilize this d. to show that the
converse of Theorem I is false for odd derivatives (e.g., cp').
5. A family of d.f.s ff 0 = (F(x; ),), ), E A} with A an Abelian semigroup under addition,
is called reproductive or additively closed if F(x: ),1) • F(x: ),2) = F(x:)" + ),2) for
all ),j E A, j = I, 2. If A = {O, 1,2, ...}, the corresponding family of d.s cp(t; ),) =
[cp(t)].l for some d. cp. Show that if cp(t) and Ijcp(t) are both d.s, then cp is degenerate.
Hence, if A c ( - 00, 'XJ), usually A c [0, 'XJ).
6. Prove that the convolution of 2 singular dJ.s may be absolutely continuous. Hint:
sin t O O t
-
t
= n
cos 2 2j -
J
1 . n00 cos 2t
1
2j

7. Show that
e il - D~A [(it)jjj!]
cp"(t) = (it)"jn!
is a d., n ~ I.

8. If (X., n ~ I} are independent LV.S uniformly distributed on [-IX., IX.], n ~ I and


s; = D lXi -> 00,find the limit dJ. of (lj s.)D
Xi as -> 00. n
9. Prove that if F is absolutely continuous with c.f. cp, then Iim111_00 cp(t) = O. This is the
Riemann-Lebesgue lemma. Hint: Suppose initially that density vanishes outside some
finite interval and in general approximate. Show that if F has an absolutely con-
tinuous component, then 1Inl"I_oolcp(t)1 < I.
308 8 Distribution Functions and Characteristic Functions

10. If Fj,j = 1,2,3, are dJ.s with F I * F 3 = F 2 * F 3 , it does not follow that F I = F 2.
Hint: If qJ is the d. of a LV. X with PIX = O} =.!, PIX = ± kn} = 2j(k 2n 2)},
k = 1,3,5, ... , then qJ is a periodic c.f. coinciding with qJl(t) = (I - Itl)/IIIISI1 in
[-I, I].
11. There exist uncountably many absolutely continuous dJ.s with support (0, ::t:J) and
identical moments Vb k = 1,2, .... Hint: For 0 < cx < 1, r > 0, and a nonnegative
integer k, set c = (k + l)/cx, b = r + is in the well-known formula b-cr(c) =
S~ i-Ie-by dy. Convert this to a similar example with support (- oc, ::t:J).

12. Prove the Raikov-Ottaviani theorem that the family of Poisson dJ.s is factor closed.
Hint: It may be supposed, without loss of generality, that the support Sj (of any
factor F j) C {O, 1, 2, ... }. Rather than d.s, utilize the corresponding probability
generating functions t/J( w) = e),(W- 11, t/J l v,) = E ",x j = Ii';:o P [X j = ilw i , j = I, 2.
13. If c.f.s qJ., qJ2 satisfy nJ: I [qJj(t)]'j = (pe i • + q)",
where cx j > 0, j = 1,2, then
qJj{t)= eifCj(peil + q)"j with n l ', n2 positive integers such that (Xln l + cx 2nz = n; also
CI(XI + C2CX2 = O.

14. If I - qJ(t) = 0(1 t I') as t -. 0 for some (X in (0,2], then PI I X I > c) = 0«'-') as
C -. Xj. Hint: Integrate.f IIxl > c] (I - cos tx)dF(x) ~ kt' over (0, C- I) or use Lemma
lj.3.1.
15. Prove that

2
-
foo I - .3f(<p(t)}
2 dt =
foo IxldF(x).
n -00 t -00

16. Show that a c.f. is positive definite i.e., D.k: I qJ(t j - tk)pi5k ~ 0 for any n ~ I.
real t I' . . . , t n and complex Plo ... , Pn' Conversely, a continuous, positive definite
function qJ on ( - rx:, ::t:J) with qJ(O) = I is a d. This is Bochner's theorem.

8,5 Remarks on k-Dimensional Distribution


Functions and Characteristic Functions
Multidimensional dJ.s (Section 6.3) lack the simplicity of their one-
dimensional counterparts and it is frequently easier to deal with the cor-
responding measure or d. The d. of a random vector X = (X l ' ... , X k)
on a probability space (0, fJ', P) is defined by

cp(t) = cp(tl, .. ·,tk)= EeXP(ittjXj) (1)

and is also expressible as a Fourier-Stieltjes transform

cp(t l ,···, tk) = f exp (i t tjXj)dF(Xb"" Xk),

where F is the dJ. of X.


8.5 Remarks on k-Dimensional Distribution Functions and Characteristic Functions 309

It is easy to verify that a joint dJ. F(x I' . . . , Xk) is continuous at a point
(XI, . .. , Xk) if its marginal d.C.

Fh) = lim F(Yi"'" Yj-I' Y, Yj+ I,···, h)


Yi- OO
i*j

is continuous at Y = x j for eachj = 1,2, ... , k. However, in contradistinction


to the case k = 1, F may be discontinuous even though the corresponding
measure F{·} = P X-I assigns probability zero to every point of Rk • For
example F may allocate probability one to some hyperplane or hypersurface
of Rm , 1 ~ m < k, without assigning positive probability to any point therein
(see Exercise 1).
For any bounded increasing function F on R\ define
C(F) = {x = (Xl>' .. , Xk): F/xj+) = F/xj-), 1 ~ j ~ k},

where F j is the" marginal function" of the prior paragraph. As noted above,


C(F) need not coincide with the set of continuity points of F when k > 1.

Definition. A sequence {F n , n ;;::: I} of dJ.s on Rk converges weakly to a


function F on R\ denoted by F n ~ F, if lim Fn(x) = F(x) for all X E C(F).
If, moreover, F is a dJ., then F n is said to converge completely to F, denoted
by F n ":" F.

A straightforward generalization of Theorem 8.3.1 yields

Theorem 1. If X = (X I"'" X k) is a random vector with d. (() and dJ. F, then

= E n[!I[xj=ajorbj) + I[a J
<Xj <bja
j= I

and the right side reduces to P{aj < X j < bj , 1 ~j ~ k}whena = (al, ... ,ak)
and b = (b l , ... , bk) are in C(F).

The reformulation and proof of Theorem 8.3.2 for R k are immediate. The
statement of the Levy continuity theorem (Theorem 8.3.3) carries over
verbatim to R k and the proof generalizes readily.
It is worthy of note that independence may be characterized in terms of
d.s as well as dJ.s.

Theorem 2. Random variables X t, ... , X k on .some probability space are


independent iff their joint d. is the product oftheir marginal d.s, i.e.,

n ({)Xj(t j).
k

({)x, ..... x.(tt,···, t k ) = (2)


j=1
310 8 Distribution Functions and Characteristic Functions

PROOF. If XI' ... , X k are independent, by Theorem 4.3.3


k k k
itXj
((JX' ..... Xk(t 1 ' ••• 't k ) = Ene = nEeitXj = n((JXj(t).
j=1 j=1 j=1
Conversely, since the d. of the product measure FIx F 2 X ..• x Fk is

fex P (itt j Xj)d(F I x .. · x Fk)

= roo fexP(ittjXj)dFl(Xl) ... dFk(Xk)

via Fubini's theorem, if(2) holds, the d.s of F x, ..... Xk and the product measure
FIx ... x Fk are identical. But then by the uniqueness theorem for dJ.s
and d.s on R k so are the dJ.s, that is,
k
Fx, ..... Xk = nF x ,
j= 1

which ensures independence. o


The question may be posed as to whether the collection ofone-dimensional
dJ.s of cX' = I'=
1 CjX j for all choices of the constant vector C = (Ch' .. ,Ck)
determines a unique dJ. for X = (X h ... , X k)' If the assignment of distribu-
tions to cX' is compatible with the existence of a joint distribution, the latter
is necessarily unique since, denoting the joint d. of X by ((Jx(t), for any scalar
u in ( - 00, (0)

and so, setting u = 1, the family of univariate d.s on the right determines the
multivariate d. on the left.

EXERCISES 8.5
1. Let F(xl> X2) be the dJ. corresponding to a uniform density over the interval (0, 1) of
the X2 axis of R 2. Verify that (i) F is discontinuous at all points (0, X2) with X2 > 0,

°
(ii) the marginal dJ. F I (x I) is discontinuous I/r x I = 0, (iii) F 2(X2) is continuous. Note
that F { . } assigns probability to all points of R 2 • Construct an F for which C(F) '#
discontinuity set of F.
2. If q> is a c.f. on R I withdJ. F, what is the dJ. corresponding to '/J{tl'"'' tk) = q>{L~ t j )?
3. Prove the dJ. F(x) is continuous at x = (XI"'" X2) if X E C(F). Construct a dis-
continuous density I on R2 with continuous marginal densities II and 12'
References 311

4. If Fi , I :$ i :$ k, are dJ.s on RI, show that for any IX in [-I, I], F(Xh""
nw -
Xk) =
[I + IX Fi(xj)] n~= 1 Fj(Xj) is a dJ. with the given marginal dJ.s.
5. Prove that X = (X 1, ... , X k) has a multivariate normal distribution with mean
vector () = «()l, ... , ()k) and covariance matrix ~ = {aij} iff every linear combination
cX' = L~ CjX j has a normal distribution on R 1 with mean C/I' and variance c~c'.

6. If X. = (X. h ... , X .k), n ~ I, is a sequence of random vectors for which every


linear combination cX~ ~ N cp •• cu ·, where N P.> is a fictitious normal LV. with meanp
and variance IX, prove that the dJ. of X. converges to the normal d.f. on R k with mean
vector Ii and covariance matrix ~.
7. Prove the Cramer-Levy theorem (Theorem 8.4.3) in R k • Hint: Use the result for
k = I.
8. Generalize Theorem 8.4.4 to the multinomial distribution.
9. Prove Theorem 8.5.1 and deduce the one-to-one correspondence between dJ.s and
cJ.s on R k •
10. Prove the k-dimensional analogue of the continuity theorem (Theorem 8.3.3).

II. Verify that if F. ~ F, the marginal dJ.s F•. j ~ F j , I :$ j :$ k.

12. Let the random vectors X. ~ X 0, where X 0 = (X 01' ... , X Ok) is a possibly fictitious
random vector with dJ. F. If {Y", n ~ O} are k-dimensional random vectors whose
ith component is gi(X•. 1" " , X•. k ), I :$ i :$ k, n ~ 0, where {gj, I :$ i :$ k} are
continuous functions on R\ then Y. ~ Yo.

References
H. E. Bray, "Elementary properties of the Stieltjes integral," Ann. Math. 20 (1919),
177-186.
F. P. Cantelli, "Una teoria astratta del calcola delle probabilitci," 1st. Ital. Attuari 3
(1932).
H. Chernoff. "Large sample theory: Parametric case, Ann. Math. Stat. 27 (1956),1-22.
Y. S. Chow and H. Robbins, "On optimal stopping rules:' Z. Wahr. 2 (1963), 33-49.
K. L. Chung, A Course in Probability Theory, Harcourt Brace, New York, 1968; 2nd ed.,
Academic Press, New York, 1974.
H. Cramer, "Ober eine Eigenschaft der normalen Verteilungsfunktion," Math. Z. 41
(1936),405-414.
H. Cramer, Mathematical Methods ofStatistics, Princeton Univ. Press, Princeton, 1946.
H. Cramer, Random Variables and Probability Distributions, Cambridge Tracts Math.
No. 36, Cambridge Univ. Press, London, 1937; 3rd ed., 1970.
J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 2, Wiley,
New York, 1966.
M. Frechet and J. Shohat, "A proof of the generalized second limit theorem in the
theory of probability," Trans. Amer. Math. Soc. 33 (1931).
J. Glivenko, Stieltjes Integral, 1936 [in Russian].
312 8 Distribution Functions and Characteristic Functions

B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent


Random Variables (K. L. Chung, translator), Addison-Wesley, Reading, Mass.,
1954.
G. H. Hardy, A course ofPure Mathematics, 10thed., Cambridge Univ. Press, New York,
1952.
E. Helly, "Uber lineare Funktionaloperationen," Sitz. Nat. Kais. Akad. Wiss. 121, IIa
(1921),265-277.
P. Levy, Calcul des probabilites, Gauthier-Villars, Paris, 1925.
r
P. Levy, Theorie de addition des variables aleatoires, Gauthier-Villars, Paris, 1937; 2nd
ed.,1954.
M. Loeve, Probability Theory, 3rd ed., Van Nostrand, Princeton, 1963; 4th ed., Springer-
Verlag, Berlin and New York, 1977-1978.
E. Lukacs, Characteristic Functions, 2nd ed., Hoffner, New York, 1970.
G. Polya, "Remarks on characteristic functions," Proc. 1st Berkeley Symp. Stat. and
Prob., 1949, 115-123.
D. A. Raikov, "On the decomposition of Gauss and Poisson laws," Izv. Akad. Nauk,
USSR (Ser. Mat.) 2 (1938a), 91-124 [in Russian].
D. A. Raikov, "Un theoreme de la theorie des fonctions caracteristiques analytiques,"
1zuest. Fak. Mat. Mek. Uniu. Tomsk NIl 2 (l938b), 8-11.
H. Robbins, "Mixture of distributions," Ann. Math. Stat. 19 (1948), 360-369.
S. Saks, Theory of the Integral (L. C. Young, translator), Stechert-Hafner, New York,
1937.
N. A. Sapogov. "The stability problem for a theorem of Cramer," Izv. Akad. Nauk,
USSR (ser. Mat.) 15 (1951),205-218. [See also selected translations, Math. Stat.
and Prob. 1,41-53, Amer. Math. Soc.]
H. Scheffe, "A useful convergence theorem for probability distributions," Ann. Math.
Stat. 18 (1947), 434-438.
J. A. Shohat and J. D. Tamarkin, "The problem of moments," Math. Survey No. I,
Amer. Math. Soc., New York, 1943.
E. Slutsky, "Uber stochastiche Asymptoten und Grenzwerte," Metron 5 (\925), 1-90.
H. Teicher, "On the factorization of distributions," Ph.D. Thesis, 1950. [See also Ann.
Math. Stat. 25 (1954), 769-774.]
H. Teicher, "Sur les puissances de fonctions caracteristiques," Comptes Rendus 246
(1958), 694-696.
H. Teicher, "On the mixture of distributions," Ann. Math. Stat. 31 (1960), 55-73.
H. Teicher, "Identifiability of mixtures," Ann. Math. Stat. 32 (1961),244-248.
E. C. Titchmarsh, The Theory of Functions, Oxford Univ. Press, 1932; 2nd ed., 1939.
9
Central Limit Theorems

Central limit theorems have played a paramount role in probability theory


starting-in the case of independent random variables-with the DeMoivre-
Laplace version and culminating with that of Lindeberg-Feller. The term
"central" refers to the pervasive, although nonunique, role of the normal
distribution as a limit of dJ.s of normalized sums of (classically independent)
random variables. Central limit theorems also govern various classes of
dependent random variables and the cases of martingales and interchange-
able random variables will be considered.

9.1 Independent Components


Consider at the outset a sequence {X n' n ~ I} of independent random
variables with finite variances {a;, n ~ I}. No generality is lost and much
convenience is gained by supposing (as will be done) that E X n = 0, n ~ 1.
Set
n n

Sn = LXi, s; = E S; = L af, n ~ 1. (I)


i= 1 i= 1

The problem, of analytic rather than measure-theoretic character, is to


determine when Sn, suitably normalized (to Sn!sn), converges in law to the
standard normal distribution. The solution is linked to the following

n
Definition. Random variables {X n , ~ I} with E X n = 0, E X; = < 00, a;
and dJ.s F n are said to obey the Lindeberg condition if s; = L~ af > 0 for

i
some nand

f
j= 1 lIxl> tS;)
2
x dFi x ) = o(s;), all E > O. (2)

313
314 9 Centra) Limit Theorems

Condition (2) requires that Sft -+ 00 and is equivalent to the classical form
of the Lindeberg condition, viz.,

±i
j= 1 IIxl >u.)
x 2 dFi x ) = o(s;) for all 8> O. (2')

Monotonicity of Sft yields (2) ~ (2'), while the reverse implication follows by
noting that for all 8 > 0 and arbitrarily small (j > 0

S;;2 Ln xi LL
2 dFi x ) = S;;2 + L ]
j= 1 IIxl> Uj) ":sjS6sn j:sJ>6s.

s; (j2 + S;;2 f
j =1
i IIxl > .6sn )
X2 dF j -+ (j2.

Despite their equivalence, (2) is structurally simpler than (2'). Moreover,


(2) or (2') entails
(1~
max --t = 0(1), (3)
lSjSftSft

since for arbitrary 8 > 0

max
j
alS;; 2 s; max s;; 2 [8 2S; +
j
iIIxl>.s.)
X2dF j] = 8
2
+ 0(1).

If {X n} are independent with E X n = 0, E X; = (1;, either s; = L~ (1f -+ 00


or s; i S2 < 00. The latter contingency is devoid of interest in the current
context since, if N #. a2 denotes a fictitious normal random variable with mean
p., variance (12, and both Sjsn ~ N O• 1 and Sft i s, then Sft ~ N 0.s2 by Corollary
8.1.3. In terms of the characteristic function of X j ' say CfJit), this entails

CfJl(t)· jU CfJit) = !~"c:, i\ CfJi t ) = ex p { _ S;2}.

By the Cramer-Levy theorem (Theorem 8.4.3) both of the c.f.s CfJl(t) and
fLoo
= 2 CfJJ{t) must be normal. Isolating CfJ2(t) analogously, it follows that X 2
and eventually all Xft are normally distributed.

Theorem I (Lindeberg). If {X n' n ~ I} are independent random variables with


zero means, variances {(1;}, and distribution functions {Fft} satisfying (2), then
the distribution functions of the normalized sums SjSft tend to the standard
normal. Conversely (Feller), convergence of these distribution functions to
the standard normal and

(4)

imply (2).
9.1 Independent Components 315

PROOF. Let t, t: be fixed but arbitrary real numbers, the latter being positive.
Set
t2X 2
Y't)
~
=e
o
l
/
Xj - 1- itX 0

J
+ _-.!...
2'
and note that I ~{t)1 ::s; min[t 2 XJ, (l/6)ltXjI3] by Lemma 8.4.1 with () = I
and n = 1,2. Consequently, recalling that E X j = 0,

(5)
Thus, for 1 ::s; j ::s; n, setting So = 0 = So and utilizing independence and (5),
2
t }1
IE exp {it (
So) S~t2}
t
{(SO_l) S~-l
+ ~S; - E exp it ~n + 2s; J

SO_l) + s2t2} ({itX-} {U~t2 })\


=
IE exp {it ( ~n ~S; E exp sn J - exp - ~S;

2E exp {itX} 2
{U t }1
2

exp - ~S;
2
::s; e/ /
I sn j -

(6)

Finally, via (6) and (3)

(7)

and, since t: is arbitrary, the conclusion follows from the hypothesis and
Theorem 8.3.3.
Conversely, (4) entails (3) via
Uo
max --!.::s; max --!.
Uo + max --!.
Uo --+ 0
t :s,j:s,n Sn 1 :s,j:s,m Sn m< j:s,n Sj
316 9 Central Limit Theorems

as first n and then m ~ 00. This, in turn, yields as n ~ 00

= O(1).f <Pj
)= 1
I (.!-) - 11 = 0(1)
Sn
(8)

since Lemma 8.4.1 guarantees I<Pitlsn) - II :s; t 2 aJl2s;, which in con-


junction with (3) implies

max
1 $,) $,n
I<Pj (!.-) - 1 I= 0(1),
Sn

Since the integrand on the right :s; 2 :s; 2x 2 /fh;, while that on the left :s;
t 2 x 212s;, it follows that as n ~ 00

which is tantamount to (2'). D

Corollary 1 (Liapounov). If {X n } are independent with E X n = ° and


L:;=I EI X j l2H = o(s;H)for some b > O,then

limp{Sn < x} = ~ IX e- u2 / 2 duo (9)


n-oo Sn V 2n - 00

PROOF. Take q = 2 and r = 2 + bin

°< q < r. D (10)


9.1 Independent Components 317

Corollary 2. If {X.} are i.i.d. with EX. = /1, E(X. - /1)2 = a 2 E (0, (0), then

I1m· p{S.afi
• -00
- n/1 < x } = -1- IX
j2;r. - 00
e -"2/2 du. (11)

If the {X.} have finite moments of order k, what additional assumptions


will ensure moment convergence, i.e., that for any positive integer k > 2

(12)

The answer is tied to the next

n = 0, EX; = a;
Definition. Random variables {X., ~ I} with EX.
said to obey a Lindeberg condition of order r > if ° are

f r
j= I Jllxl>tSj]
Ixl'dF/x) = o(s~), all [; > 0. (13)

For r = 2, this is just the ordinary Lindeberg condition. Surprisingly, for


r> 2, (13) is equivalent to

L EIXjl' = o(s~) (13')
j= I

and also to (I 3") (defined as (13) with Sj replaced by s.). Clearly (13') ~ (13) ~
(13") and so to establish equivalence for r > 2 it suffices to verify that
(13") ~ (13'). The latter follows from the fact that for r > 2 and all [; > °
• •
L EIXX ~ L E{([;s.y-
j= I j= I
2
XJ I IIX jl,;;tS"] + IXXIllxjl>tS")}

~ [;'-2S~ + o(s~).
According to (10), a Lindeberg condition of order r > 2 implies that of order
q, for all q in [2, r]; in particular, a Lindeberg condition of order r > 2
implies the central limit theorem (9).

Theorem 2. Let {X., n ~ l} be independent random variables with E X. = 0,


EX; = a;. If {X.} satisfies a Lindeberg condition of order r for some integer
r ~ 2, then (12) obtains for k = 1,2, ... , r.

Corollary 3. If {X., n ~ 1} are i.i.d. with E XI = 0, E XfkE(O, (0) for some


positive integer k, then lim._ oo E(S./s.?i = (2j)!j2 ij!, E(S./S.)2 i - I = 0(1),
1 ~j ~ k.
PROOF OF THEOREM 2. Since E(S./S.)2 = 1, n ~ 1, the theorem is clearly valid
for r = 2. In the case r > 2, suppose inductively that the theorem holds for
318 9 Central Limit Theorems

k = 2,3, ... , r - 1, whence, recalling that a Lindeberg condition of order r


entails that of lower orders, (12) obtains for 2 :$ k :$ r - 1. Thus,

Let {Y", n ~ O} be independent, normally distributed random variables


with means zero and variances a;, where a~ = 1 and, in addition, are
independent of {X n' n ~ I}. Set
j-I n

Qj,n = LXi + L ~, f(t) = t'.


i;1 i;j+1

Then, Qj,n + X j = Qj+ I,n + ll+ I> 1 :$ j < n, and P')(t) = r!, whence

(14)

noting that by independence, for all j

E Yj ~ X~ pi) (Qj,n) = E (Yj ~ X~)Ef(i)(Qj,n) = 0, i = 1,2,


Sn Sn Sn Sn

= O(1)E(,
Xj i
l ~ Illl)E[ I~: r- i + I YOI'-i]
Xj1i
= O(I)EC s7 Illl)
for i = 3, ... ,r, recalling that {I Sj la, 1 :$ j :$ n} is a submartingale, IX ~ 1.
The latter together with the fact that {X n } obeys Lindeberg conditions of
orders 3,4, ... , r ensures that the last expression in (14) is 0(1), noting that
for i ~ 3, Ellll i = Cia~ = CJE XJ)i /2 :$ C i EIXjl i for some constant Ci in
(0, (0). D

A central limit theorem is greatly enhanced by knowledge of the rate of


approach to normality and even more so by a precise bound on the error in
approximating for fixed n. The latter, known as the Berry-Esseen theorem,
has considerable practical and theoretical significance.
9.1 Independent Components 319

Lemma I. If F is a dJ. and G is a real differentiable function with G(x) -+ 0


or 1 according as x -+ - 00 or + 00 and supx IG'(x) I :::; M > 0, then there
exists a real number c such that for all T > 0

If
oo (sin X)2 Hc (2X)dX
-00 x T
I~ 2Mb[~2 - 3 foo sin: x dxJ
n/2 x
~ 2Mb[~ - ~J
2 Tb
(l5)
where 15 = (1j2M)supx IF(x) - G(x) I and Hc(x) = F(x + c) - G(x + c).
PROOF. Since G is necessarily bounded, 15 is finite and the integral on the left
of (15) exists. In the nontrivial case 15 > 0, there exists a real sequence {x n }
with F(x n ) - G(x n) -+ ±2Mb. Since F(x) - G(x) -+ 0 as x -+ ±oo, {xn }
has a finite limit point, say b, and the continuity of G ensures F(b) - G(b) :::;
-2Mb or F(b +) - G(b) ~ 2Mb. Suppose the former for specificity and set
c = b - b. Then if Ix I < 15, by the mean value theorem

Hlx) = F(x + c) - G(x + c) = F(b + x - b) - G(b +x - b)


:::; F(b) - [G(b) + (x - b)G'(O)]
:::; -2Mb + (15 - x)M = -M(b + x),
whence
b1- cos Tx ()d fb (x + b)(l - cos Tx)
f -b
----=2,.---
X
Hex x:::; - M
-b X
2 dx

= -2Mb fb 1 - c~s Tx dx
o x

= _ 2MbT [~_ [00 sin: x dXJ. (l6)


2 JHj2 x
Moreover,

1 llxl>b)
1 - c~s Tx Hc(x)dx :::; 2Mb
X
1 lIxj>b)
1 - c~s Tx dx
X

oo sin2 x
= 4MbT
i bT/2
-2-
X
dx. (l7)

Without loss of generality, suppose T large enough so that the middle of (15)
is positive and hence the sum ofthe right sides of (16) and (17) negative. Then

If-00 I (1
=-
oo
lIxlsb]
+ 1) >
llxj>b) -
2M bT - - 3
2
[n ibT/2
oo
-
2
sin -
2
x
dx x ] '

which is tantamount to the first inequality in (15), and the second follows
directly therefrom. 0
320 9 Central Limit Theorems

Lemma 2.11, in addition to the hypotheses ofLemma 1, G is ofbounded variation


on (- 00, (0) and F - G E!f l' thenfor every T > 0

sup IF(x) - G(x)1


x
~~
rr
iT I({>F(t) - ({>G(t) Idt + 24M,
0 t rrT
(18)

where ({>F, ({>G are Fourier-Stieltjes transforms of F, G.

PROOF. In the nontrivial case where the integral is finite,

whence

({>F(t) - . ({>G(t) e -ile = Joo ( )eilx d x.


H eX
-It -00

Since the above right side is bounded, via Fubini

T ({>F(t)~. ((>G(t) e-ilTT -ltlJdt = JT Joo Hix)ei1x(T-ltl)dxdt


J
-T It -T -00

= Joo 2(1 - c~s Tx) Hix)dx


-00 x

Joo 2 (2X)d
-_ 2T sin x He --
--2- X
-00 x T

and so

whence by Lemma 1

yielding (18). D
9.1 Independent Components 321

Lemma 3. Let <p:(t) be the d. of Sn = Xj' where {X n} are independent L1


LV.S WI zero means an varzances an· n = "n
·th d · 2 I'fr 2 +0
L...j= 1 Yj2+0 and Sn2 = "n
L...j= 1 a j2,
where yJ H = E IXj - E X j 1 H, then for 0 < fJ :$; 1
2

<P: (!.-) - e- /2/2 31 rnt 12+Oe-/2/2 for It I < ~. (19)


I
1 :$;
Sn Sn 2rn
PROOF. Let 0 be a generic designation for a complex number with 101 :$; I.
If <P j is the d. of X j ' 1 :$; j :$; n, by Theorem 8.4.1
2
Yj l t l )2+0
<Pj ( -t) = 1 - - t 2
aJ + 0 ( - -
Sn 2sn Sn

Thus, since log(l + z) = z + (40/5)lzI 2 for Izl :$; 3/8,

whence, summing on j,

log <P *(t) _t


- = -
2
-nl-
+ -80 (r t l)2+0.
Sn 2 5 sn
Hence, via Ie Z
- 11 :$; Iz Ie 1zl

< 8- (rn
--l tl)2+0 exp - {_t 2
+ -2} < 3 (rn1tl)2+0
-- e
-/ /2 2

- 5 Sn 2 5 Sn
o
Lemma 4. Under the conditions of Lemma 3

I<P: (~) - e-
t2/2
1 :$; 16 (r~~tIYH e- /2/3 for
1 (s
It I :$; [ 36 r"n
)2+0J1 /O

and 0 < fJ :$; 1.


322 9 Central Limit Theorems

PROOF. If {Xi, X j } are i.i.d., 1 ~ j ~ n, employing Theorem 8.4.1 and


Exercise 4.2.1,
2 2
t)
<Pj ( ~ 1
= E exp (it(X j sn- X'))
j
t E(X
~ 1 - 2s; j - Xi)2
I
2H 2 2
+ E -t (X· - X') 1 ~ I _t _
aJ
sn s;

e2r
J J
I
2H 2H
+2
2H
I ~ 1 EI X j l
2H
~ 1- + 81 Y:: 1

~ exp ( _(atr + 8 IY:J2+b) ,


implying since 361tl b ~ (sn/rn)2+b that
2 2H
I <P: (~) 1 ~ exp ( _t 2
+ 81 t~n 1 ) ~ e-/ + 2 2
/
2
/
9
~ e- 2
2/ /3

Hence, supposing (as is permissible by Lemma 3) that It I ~ sn/(2rn),

I <P: (~) - e-/


2
/
2
1 ~ e-/ 3 + e-/ 2 ~ 2e-/ 3 ~
2
/
2
/
2
/ 16 1 t~n l2+be -/ 2
/
3. 0

Theorem 3 (Berry- Esseen). If {Xn' n ~ I} are independent random variables


E X n = 0, EX; = a;, s; = L7=t af > 0, r;H = L7=1 EIXd2+b < 00,
n ~ l,for some b in (0, 1] and Sn = L:i Xi' there exists a universal constant
C b such that

sup
- 00 <x< 00
IP{Sn < xs n } - ~ IX
Y 2n - 00
e-
y2 2
/ dy I~ cb(rn)2H. Sn
(20)

Remark. The thrust of the theorem is for rn/sn = 0(1), in which case
Esseen (1945) asserts that C b ~ 7.5 when b = 1.
PROOF. If F(x) = P{Sn < xsn} and cI>(x) = standard normal dJ., <D'(x) =
(1/.jbr)e- x2 / 2 ~ 1/.jbr = M and, since both F and <D have mean zero and
°
variance one, Tchebychev's inequality ensures F(x) ~ 1/x 2 , X < 0, and
I - F(x) ~ 1/x 2 , X > (similarly for <D). Thus, F - <D E!I' 10 whence by
Lemma 2, taking T d = (1/36) (sn/rn)2+b and recalling Lemma 4,
supl P{Sn < xsn } - cI>(x)1
x
9.1 Independent Components 323

Consequently, (20) is valid for rnls n ::; 1 and holds trivially with C" = I
when r nlsn > 1. 0

Corollary 4. If {Xn' n ~ I} are i.i.d. random variables with E Xn = 0, E X; = (12,


+" +"
E I X n 12 = y2 < 00 for some ~ in (0, 1] and <I> is the standard normal dJ.,
there exists a universal constant c" such that
2+"
sup I P{Sn < x(1n 1 / 2 } - <l>(x)l::; ~~2 1: . (21)
-oo<x<oo n ( )(1
Remark. It is asserted in Van Beek, 1972 that CA ::; .7975 when ~ = 1.
If {X n , n ~ I} is a sequence of independent random variables with
E Xn = 0, EX; = 1 obeying the Lindeberg condition, it seems extremely
plausible that (1Ijn) LJ::'n+ t X j ~ No, I ' This would be encompassed by
a central limit theorem involving a double sequence of rowwise independent
random variables {X nj ,l::;j::; kn -+ oo}, i.e., such that {Xnl, .. "XnkJ
were independent for each n ~ I. For, if necessary and sufficient conditions
for asymptotic normality of L~~ 1 X nj were available, it would suffice to set
X nj = Xn+jljn, 1 ::;j::; n. Amazingly, the class of all possible limit distri-
butions of (centered) row sums L~~ 1 X nj of rowwise independent
{X nj' 1 ::; j ::; kn -+ oo} can be characterized and coincides with the class of
socalled infinitely divisible laws (under a minor additional proviso). But this
will be the subject of Chapter 12.
A central limit theorem (C.L.T) has already been established for i.i.d.
random variables with finite variance in Corollary 9.1.2, and the case of
infinite variance is dealt with next. The latitude in choosing the normalizing
constants An, En of Theorem 4 is governed by Corollary 8.2.2.

Theorem 4. If {Xn, n ~ I} are i.i.d. random variables with non-degenerate dJ.


F and <I> is the standard normal dJ., then

!~~ p{~n J/i - An < x} = <l>(x) (22)

for some Bn > 0 and An iff


· Jllxl>c)dF(x) - o·
I1m 2 J 2 -, (23)
c~oo (l/c ) llxl<c)x dF(x)
moreover, Bn may be chosen as in (25) while An may be taken as

-n
Bn
iIIxl<B.)
xdF(x). (24)

PROOF, Note that either (22) or (23) implies that E xi > o. If J~ 00 x 2 dF(x) < 00,
2
thenc Jllxl>c) dF ::; Juxl>C) x 2 dF(x) = o(1)asc -+ 00,sothat(23)obtainsand
in this case (22) has already been proved with

An = jn foo xdF,
(1 - 00
324 9 Central Limit Theorems

which differ from the choices of (25), (24) only in the sense of Corollary 8.2.2.
Thus, it may and will be supposed that E X 2 = 00.
By the dominated convergence theorem

c- 2 f x 2 dF(x) = 0(1)
JUxl<C]

as c -+ 00 and so, defining

Bn = sup{c: c- 2 f
JUxl <cl
x 2 dF(x) ~ !},
n
(25)

necessarily Bn i 00 and via continuity

;2 r 2
x dF(x) = 1. (26)
n JIlXI<Bnl

Under the hypothesis of (23), if 0 < e ~ I,

JlcBn<lxl<Bnlx2 dF(x) < B; JUxl;o,cBn1dF = (I)


Juxl< BnI X 2 dF(x) - JUxl<cBnlx2dF(x) 0 ,

whence, recalling (26), for 0 < e ~ I

_; f
Bn J Uxl < cBnl
x 2 dF(x) = n
2
f
Bn JUxl < Bnl
x 2 dF(x) [1 _J[CBn<lxl<Bn~x2
JUxl < BnlX
dF(X)]
dF(x)
-+ 1.

(27)
Thus, via (27) and (23)
lim n P{IXII > eBn} = lim nJllxl>cBnldF
2 2 = 0 (28)
n~oo n~oo nBn J[lxl<cB nlx dF
for 0 < e ~ I and hence for all e > O. Likewise (27) holds for all e > O.
Define Xi = Xj[UXI<Bnl' 1 ~ j ~ n, and S~ = D Xi. Now (26) entails
n l / 2 = o(B n) so that if y" = (n l / 2jBn)X 1[[IXIi <cBnl' then Yn !. 0 and, moreover,
E Y; -+ I by (27). Thus, {y", n ~ I} is uniformly integrable, whence
(n tl2 j Bn ) JUxl < cBnl X dF = E Yn = 0(1) for all e > 0, and so (27) implies

--
n
B;
[i x 2 dF -
Uxl < cBn]
(i [Ixl < cBnl
x dF )2] -+ 1
'
e > O. (29)

Now for e = 1, (29) asserts that s; == (j2(S~) '" B; and, moreover, E Yn = 0(1)
implies E X'I = 0(B n n- I /2 ). Thus, for 0 < e ~ 1 and all large n
9.1 Independent Components 325

according to (28). Consequently, (S~ - E S~)jsn ~ N o. 1 by Exercise 9.1.2 and,


P{Sn ¥- S~} S n P{IXII ~ Bn } = 0(1),
it follows that
Sn Sn - E S~ d
B - An =
- Bn -No • I'
n

Conversely, if (22) prevails, then (28) and (29) hold for all E > 0 by
Corollary 12.2.3, whence
(EB n)2 Jllxl>Bn£)dF
2
E n P{IXti > BnE}
JIIxl<Bn 2
£)x
S 2
dF(x)
2 J 2 = 0(1)
nB n [lIxl<B n£]x dF - ([lxl<£Bn)xdF) ]
J
for all E > 0, which is tantamount to (23). 0

EXAMPLE 1 (Friedman, Katz, Koopmans). If {X, X n , n ;;:: 1} are i.i.d. LV.S with
EX = 0, E X 2 = u 2 E (0, 00), U; = U2(X~) where X~ = (X /\ n l /2 ) v (_n l /2 ),
Sn = LJ=I Xj' then
L n- I sup IP{Snjn l /2 < x} - <I> (xju ) I < 00.
00

(30) n
n=l x

PROOF. Since Un - U, it may and will be supposed that Un > 0, n ;;:: 1 and that
U = 1. Let S~ be the sum on n i.i.d. r.v.s having the common distribution of X~
and set Jln = E X~. In view of
IP {Sn < xn 1/2} - P {S~ < xn 1/2 }I s n P {IX I > n 1/2 }
it suffices to prove (30) with Sn replaced by Now
)J
S~.
l 2 l 2
Sf - nJl xn / - nJl } (x - n / Jl
P{S~ < xn l / 2 } - <I>(xjun) = [ P { n 1/2 n < 1/2 n _ <I> n
n ~ n ~ ~

If Cj , j ;;:: 1 are positive constants, the Berry-Esseen theorem ensures that if


A~ = supx IAnl and B~ = supx IBnl

L L L
00 00 00

n- 1 A~ S C 1 n- 3/ 2 EIX~ - Jlnl 3 ju; S C2 n- 3/2 EIX~13 < 00


n=1 n=1 n=1
since
00
n~1 n- 3/2 EIX~13 = 3 n~l n-
00
32
/
fn
0
'12
2
t P{IXI > t}dt

= 3 foo t 2 P{IXI > t} L n- 3/2 dt


o n;;"(tvI)2

= C3 IX> t P{IXI > t}dt < 00.

On the other hand, since Un - u, via integration by parts, E X = 0 and


326 9 Central Limit Theorems

Corollary 6.2.2 (30),

f n-1B~::; f
n=1 n=1
n_llet>(-nl/2fln) - et>(0)! = C4
(Tn n=1
n- 1 f f0
n
I/
21
1'"1/11" e- t2 / 2 dt

::; C4 n~1 n- 1/2Iflnl/(Tn::; Cs n~1 n- 1/2 f.~/2 P{IXI > t}dt


= Cs f" P{IXI > t} nt n- 1/2 dt = C6
1
f" t P{IXI > t}dt < 00. 0

Dominated convergence in conjunction with (30) yields the result of Rosen


that under the above conditions
00

n=1
L n- 1 IP {Sn < O} - tl < 00. (31)

°
Theorem 5 (CLT for non-degenerate U-statistics). Let Uk,n(h), n ~ k ~ 2 be a
sequence of U-statistics with E h = and (T2 = EIE{h(X 1'"'' Xk)IX d]2 E
(0,00). If
E{h(Xl, ... ,Xk)IX1, ... ,XJE22j/(2j-l), 2::;j::;k (32)
(a fortiori, if E Ih1 4 / 3 < (0), then for any real x

lim p{n / Uk,ih) <


n"'OO k(T
12
x} = V~2n fX- 00
e-
y2 2
/ dy. (33)

PROOF. According to Lemma 7.5.5,


n 1/ 2 1 n
-k Uk,n(h)
(T
= -----ri2
(Tn r=1
L E{h(X" ... , X r+k-l)IXr} + 0(1), a.s.

Since y,. = E{h(X r, ... , X r+k-l)IXr}, r ~ 1, are i.i.d. random variables with
E y,. = 0, E Y,.2 = (T2 E (0, 00), the conclusion (33) follows from Corollary 2 and
Theorem 8.1.1. 0

EXERCISFS 9.1
I. Show that if {X., n ~ I} are independent LV.S with IX.I ~ C., a.c., n ~ I, and
C. = o(s.), where s; = L~ E(X j - E X j)2 -+ 00, then (S. - E S.)/s. ~ No. I'
2. If {X. j , 1 ~j ~ kn -+ oo} are rowwise independent LV.S with S. = D',;,I X. j ,
EX.j = 0, E X;j = a;j' s; = D',;,
1 a;j -+ 00, then S./s. ~ No. I jf

k.
L E X;JlIX.il> £s.l = o(s;), e > O.
j=1

3. If {X.} are independent with


1 1
P{X. = in"} = 2n P ' P{X. = O} = 1 - Ii' 2rx>P-l
n
the Lindeberg condition holds when and only when 0 ~ p< 1.
9.1 Independent Components 327

4. Failure of the Lindeberg condition does not preclude asymptotic normality. Let
{Y.} be i.i.d. with E Y" = 0, E Y; = I; let {Z.} be independent with P{Z. = ± n} =
1/2n 2, P{Z. = O} = I - (1/n 2) and {Z.} independent of {Y.}. Prove that if X. =
Y" + Zn, Sn = D Xi' then Sn/';;' ~ N O• 1 and the Lindeberg condition cannot
hold. Explain why this does not contravene Theorem I.
5. Let {Y", n ;::: I} be i.i.d. LV.S with finite variance a 2 (say a 2 = I) and let {a;, n ;::: I}
be nonzero constants with 5; = I~ af -> 00. Show that the weighted i.i.d. LV.S
{an Y", n ;::: I} obey the central limit theorem, i.e., (lis.) D
a j >j ~ N o. 1 if an = o(sn)
and E Y = O.
6. Let {X n} be independent with

P{Xn = ± I} = L, P{Xn = ±n} = HI -~) ~2'


P{X. = O} = (I - ~)(I - ~2),n;::: I,a > I.

Again, Sn/';;' has a limiting normal distribution despite the Lindeberg condition
being vitiated.

7. Prove that the Liapounov condition of Corollary 9.1.1 is more stringent the larger
the value of b.
8. (i) For what positive values of lX, ifany, does a CL.T. hold for i.i.d. symmetric random
variables with F(x) = I - 1/(2x"), x ;::: I, F(x) = to 0 ~ x ~ I.
(ii) Does a c.L.T. hold for independent {X.} with P{Xn = ± I} = t
P{Xn = ±n}
= 1/(2n 3 ), P{X. = O} = (1/2) - (lln 3 )? Hint: Apply Theorems 4 and I.

9. CL.T. for V-statistics: Let {X.} be i.i.d. and cp(x l , ... , x m ) a real, symmetric function
of its m arguments. If E cp2(X I' ... , X m) < 00 and E cp(X I' , X m) = 8, then
n l /2(V n - 8) ~ N o.n , where V n = (~)-I I I $;, < ... <im$n cp(X i " , XiJ and a =
2
m E[E {cp(X I"'" Xm)IX d - 8 ]. Hint: A CL.T. applies to
2 2 2

since the components of the sum are i.i.d. with mean 0 and variance a 21m 2 , while
V n - v".!. 0 via E(V n - v,,)2 = 0(1).

10. Let X n = (X nl , ... , X nk ), n ;::: I, be i.i.d. random vectors with E X nj = I1j,


Cov(X.i,X n) = aij, 1 ~ i ~j ~ k.
If Snj = I7= I Xij' n ;::: I, I ~ j ~ k, prove that
(n- I/2(Snl - nl1l)' ... ' n- 1/2(Snk - nl1k»
converges in distribution to the k-dimensional normal distribution with mean vector
zero and covariance matrix {aij}. Hint: Recall Exercise 8.5.6.
II. Let {X n} be independent LV.S with EX. == 0, E X; == I which obey the Lindeberg
condition. If nj = [jnlk] = greatest integer ~ jnlk for j = 0, I, ... , k and Sn =
I7= I Xi' prove that «kln)I/2Sn" .. ·, (kln)I/2S n.) converges in distribution to the

k-dimensional normal with mean vector zero and covariance matrix {aij}, where
aij = min(i, j). Conclude that
328 9 Central Limit Theorems

lim p{ max S'j <


'-00 1 :5,J:5,k
xn l 2
/ } = p{ ~
vi k
max lj <
1 :5,j:5,k
x},
where {lj, 1 ~ j ~ k} are i.i.d. r.V.s with the standard normal d.f.

12. Provethatif{X., n;::: l}areindependentr.v.swithE X, = O,E X; = u;whichobey


central limit theorem and Iim,_oo E(S';-I/2 Ii X j )2k = (2k)!/2 k k!, then {X,} obeys
a Lindeberg condition of order 2k.

13. Let {X., n ;::: I} be i.i.d. with symmetric density f(x) = Ixl- 3 IlIxl> I)' Show that
(n log n)-1/2 Ii Xi 2.. NO,I'

9.2 Interchangeable Components


For an infinite sequence of interchangeable random variables {X n }, Corollary
7.3.4 exhibits the joint distributions as mixtures ofdistributions corresponding
to i.i.d. random variables and thereby provides a tool for proving central and
other limit theorems.
Initially it will be assumed that E X f < 00, whence no further generality
is lost in supposing E Xl = 0, E Xf = 1.

Theorem 1. Let {X n' n ~ I} be interchangeable random values. If E X = 0,


E x 2 = 1, and
= = COV(xi, xD,
Cov(X l' X 2) ° (1)
then (1/j;;)L~Xj~No,l' Conversely, if EX < 00 and (l/j;;)L~Xj~
2

N O,I' then (1) holds.


PROOF. According to Theorem 7.3.2 the r.v.s {X n } are conditionally i.i.d.
given the a-algebra '§ of permutable events, and according to Corollary
7.3.5 there is a regular conditional distribution pw for X given '§ such that
the coordinate r. v.s {~n, n ~ 1} are i.i.d. on the probability space (R 00, !!J oo , PW).
Moreover, by this same corollary, for.i # j
E XjX j = E[E{XjXjl'§}] = E[E 2 {X;i'§}]'
(2)
E[Xf - 1] [X; - I] = E[E 2 {Xf - I)I'§}]
and so (I) is equivalent to
E{Xd'§} = 0, E{X?I'§} = I, j ~ 1, a.c. (I')
Consequently, for all ill

and if Sn = L~ Xi' T,. = L~ ~i' then via Corollaries 7.2.2, 7.3.4, 7.3.5, and
9.2 Interchangeable Components 329

dominated convergence

~~~p{~ < x} = ~~~ fp{~ < XI~}dP = ~~~ fpw{fi < x} dP

= f~~~pw{fi < X}dP, (3)

and so sufficiency follows from Corollary 9.1.2.


Conversely, if Sn/~.!... N 0,1' then Sn/n!. O. By Example 1 of 7.3.1,
Sn/n ~ E {X I~} and the first half of (1)' follows. Apropos of the second half,
e- t2 /2 = lim E[E{eitSn/nlI21~}] = lim EEweitTn/nl/2
n~OC) n-+co

so that
E e-[1-a~Wlt2/2 = 1, all t E (-00, (0),

implying (t -+ (0) that a~(~) ~ 1, a.c. Consequently, a~(~) = 1 = E{X21~},


a.c., whence (1)' and hence (1) is obtained. 0
If Sn = D Xi'S; = E S;, it is quite possible that Sn/sn ~ No. I despite
violation of (I). The point is that if Cov(X I' X 2) = p2 > 0, the order of
magnitude of Sn is n rather than n 1/ 2 . Recall that X F denotes a fictitious r.v.
with dJ. F.

Theorem 2. If {X n, n ~ I} are fill interchangeable random variables with


partial sums Sn, n ~ 1, then
(i) Sn/n ~ SG, where G is the dJ. of E{X II~} and ~ is the a-algebra of
permutable events,
(ii) if F is any distribution function uniquely determined by its moments
akok ~ 1,andEX 1 X 2 ···X kexists,allk ~ 1,thenSn/n~SFiffEXIX2",Xk
= ak , k ~ I.
Corollary 1. If {X n' n ~ I} are fil 1 interchangeable random variables with
Cov(X I' X 2) = p2 > 0 and EX IX 2' .. X k exists for all k ~ 1 then
(1/ pn)L1 X i ~ No. I iff E X I X 2 •.. X k vanishes for odd integers and equals
1 . 3 ... (k - 1)lfor 'even integers k.
PROOF OF THEOREM 2. (i) is an immediate consequence of the strong law of
large numbers for interchangeable r.v.s (Example 7.3.1). To prove (ii) note
via Corollary 1.3.5 that, for all k ~ 1,
E Xl ... X k = E[E{X 1 ... Xkl~}] = E[E{Xd~}r. (4)
Thus, if Sn/n.!... SF, where F has finite moments ak , k ~ 1, part (i) ensures that
the right and hence the left sides of (4) are ak'
Conversely, if the left and hence the right sides of (4) equal ak> then (i) and
d
the Frechet-Shohat theorem 8.2.1 guarantee that Sn/n -+ SF' 0
330 9 Central Limit Theorems

Unfortunately, interchangeable r.v.s encountered in practical situations


of interest are likely to be finite in number and not embeddable in an infinite
sequence, so that Corollary 7.3.4 and prior results are inapplicable. A case of
this sort occurred in Chapter 3.1 with the random distribution of balls into
cells.
Suppose that N = N. balls are distributed at random into n cells and set
X. j = 1 or 0 according as the jth cell, 1 :::; j :::; n, is or is not empty. Then
{X. j ,1 :::; j :::; n, n ~ I} constitute a double sequence ofr.v.s which, for each
n > I, form a finite collection of interchangeable r.v.s with

(5)

Thus, recalling Exercises 7.3.6 and 6.3.10, {X. j , 1 :::; j :::; n} is not embeddable
in an infinite sequence of interchangeable r.v.s. Nonetheless, asymptotic
normality of the distribution of the number of empty cells, i.e., Ii= t X. j ,
can be prove~ by more ad hoc methods.
By way of preliminaries, set U = U. = Ii X. j and note that

EU = I• E X.
j=t
j = n (I)N
1- -
n
(6)

-(I - -(1 -
and from (5)

q~ = n[(1 - ~r ~rN] + n(n - 1)[(1 - ~r ~rN]

= n[(1 -
- ~r + (n - 1)(1 - ~r n(1 - ~rN]:::; n(1 - ~r· (7)

Let Sj = SJ"l denote the waiting time until the occupation of the jth new
cell by a ball. Set So = 0 and y".j = Sj - Sj-t, j ~ l. Clearly, {y".j, 1 :::; j :::; n}
are independent with Y,..t = SI = 1 and

P{y".j (j_l)i-t( -n-


= i} = -n-
j-l) ,1- j ~ 2, i·~ l. (8)

That is, {y",j - 1,2:::; j :::; n} are independent geometric r.v.s.


At most n - k empty cells, i.e., at least k occupied cells in the random
casting of N balls into n cells is tantamount to the kth new cell being entered
by the Nth or a prior ball. Thus, for 2 :::; k :::; n
9.2 Interchangeable Components 331

P{Sk ~ N} = P{U n ~ n - k}, (9)

and the possibility arises of shunting the asymptotic normality of a sum Un of


interchangeable r.v.s to one Sk of independent LV.S.

Theorem 3. Let U = Un designate the number of empty cells engendered by


distributing N = N n balls at random into n cells, and set a = an = NIn,
b = bn = (eO - 1 - a)I/2, and (J = (Ju n '
(i) If N ---> 00 and aln ---> 0 as n ---> 00, then
(10)
(ii) (J ---> 00 iff ne - 2°b2 ---> 00, in which case a = o(n), N ---> 00, (10) holds, and

Un - E Un d N
(J ---> 0 , I' (II)

(12)

PROOF, (i) Since for - 00 < ex < 00

(1 - ~r = exp{n( - ~ - ;;2 + O(n-


3
»)} = ex p { -ex - ~: + O(n- 2)},
(13)

it follows via (7) that

- ex p ( - ~ + O(an- 2»)] - ex p ( _ 2: + O(an- 2»)}

aeO }
= ne- 20 eO - a - 1 -
[ 2n (1 + 0(1» + O(an- 1 ) + O(an- 2)

ae-o
= ne- 2°b 2 - -2-(1 + 0(1» + O(ae- 20 )
(14)

under the assumptions that n ---> 00 and a = o(n). Moreover,


332 9 Central Limit Theorems

a
- (1 + 0(1» = 0(1) if a -+ 00
n

ae-
2a
a
2
ae
a
2 a
ae
a
ne- b = nb = n(e _ 1 _ a) =
2a(1 + 0(1» 0
na 2 = N If a
(1) . -+ °
if a -+ a. E (0, (0),

yielding (10) when N -+ 00 and a = o(n).

:;
(ii) From the definition of a and (14), if (12 -+ 00, then N -+ 00, and from (7)

(12 ::; n(1 - ~r ne- a,

implying a = o(n) when (12 -+ 00. On the other hand, if ne - 2ab2 -+ 00, then
N
00 +- ne- 2a b 2 = - e- 2a(e a - 1 - a) = O(N),
a
a ae a e 2a
- ::; -b2 ::; -b2 = 0(1),
n n n
and so again N -+ 00 and a = o(n). Hence, if one of (12 and ne - 2ab2 tends to 00,
then by (i), (10) holds and the other likewise approaches 00. Now assume that
2a 2
(12 = ne- b (1 + 0(1» -+ 00. (15)
From (15)
ea b a

by' n
; : -+ 0, .fit -+ 0,
.fit(e - 1)
b -+ 00.
(16)

In order to evaluate

P{ j;;nbe _ < x
V - ne- a
a
}
= P{U n < ne- a + x.fitbe- a} (17)

for x :F 0, define k = kn so that n - k + 1 is the smallest integer 2: ne- a +


x.fitbe- a. Then
n - k = ne- a + x.fitbe- a + 0(1),
and from (15)

n- k = ne- a + x(1 + o(1»y';:nbe- a = ne- a [Xb(1


1+ .fit
+ 0(1»] ,(18)

so that via (18)

10g(1 -~) = -a + xb(l + 0(1», (19)


n .fit
k n ea
n - k = n _ k - 1= T + [xb(1-+-0-(1-»/~7~] - 1 (20)
9.2 Interchangeable Components 333

recalling (16).
From (9) and (17)

p{U.fi" n~-Q <


nbe a
x} = P{U n <n- k + l} = P{U n :$ n - k}

= P{S < N} = p{Sk - E Sk < N - E Sk} (21)


k- a(Sk) - a(Sk) .

Now by (8)
n
E Y. . = - - - , - -
n.) n - j + l'
for j ~ 1, whence
k n 1
L n - J. + 1 = L+ -;-,J
n
E Sk = n
j= 1 n-k 1

2
a (Sk) = i [(
j= 1 n- J
~ + 1)2 - n - J~ + I] = f
n-k+ 1
(~:
J
- J~).
Since for m = 1,2

I
i dt 1 1 1
<
0_. -
tm - -.m-('_l)m
< --'m
;-1 J J J
it follows that
n nil 1 k
o <- log - - - L - < -- - - = -,---:-.,-
n - k n-k+ I j - n - k n n(n - k)'

1 1)
0:::;; ( n _ k - ~ - n-hlnp i l
: :; (n - k)2 -
1 2nk
n2 :::;; n2(n _ k)2'
whence, recalling (19) and (20),

N - E Sk = N - n log n : k + °C ~ k)

= n[a + 10g (1 - ~)] + O(e Q


)

= xbjn(1 + 0(1», (22)

and via (20), (19), and (16)

2
a (Sk) = n nk n (nk)
_ k - n log n _ k + 0 (n _ k)2
334 9 Central Limit Theorems

= n{n ~ k + 10g(1 -~) + O(n: k)2)}


= n{e o- + o(j;) - a + oeo + ~~b~O/fi»)}
1

= nb 2(1 + 0(1». (23)


Moreover, by (22), (23)
N - ESk
= x(1 + 0(1». (24)
(1 (Sk )
Now, from (6) and (13)

= ne- + 0(1)
O
(25)

To complete the proof, by Exercise 9.1.2 it remains to verify that {Yn,j - E y",j'
2 ~ j ~ k n } obey the Lindeberg condition.
Setting lj' = y",j+l - 1 and % = j/n, P{Y/ = h} = qjh(l - q), h = 0, I, ... ,
1 ~ j ~ k - 1. Recalling Exercise 4.1.1 and (23),

(1§k = I
j=l
[(-!L)2
1 - qj
+ -!LJ = nb 2 (1 + 0(1»
1- %

I
j=l
E lj'3 =6 I
j=l
[(-!L)3
1- %
+ (-!L)2J +
1 - qj
kf -!L.
j=l1 - qj
For 1 ~ j ~ k, noting (20) and (16),
q. k
__
J _ ~ __ = eO _ 1 + 0(be o/n 1/2 ) = 0(bn 1/ 2 ) = o«1s)
1 - qj n- k
whence

Clearly,

and so
k-l k-l
L
j=l
Ellj' - E lj'1 3 ~ 8 L
j=l
E lj'3 = o«1§J
9.2 Interchangeable Components 335

Thus, the Liapounov and hence also the Lindeberg condition of Exercise 9.1.2
holds. 0

Remarks. (i) The denominator in (12) can be replaced by fie- a

°
(e' - 1 - 1X)1/2 if a -'IXE(O, (0), by a(nI2)1/2 if a -. 0, and by (ne- a )1/2 if
a -. 00. (ii) If aln > 6> 0, then (J2 ~ ne- a - . via (7); if N = na ~ C < 00,
then n l/2 a = 0(1) and (J2 = (na 2/2)(1 + 0(1» + O(a) -. via (14). (iii) If °
(J2 -. (J~ E (0, (0) and a -. 00, then via (10) a = log n - log (J~ + 0(1); if (J2 -.
(J~ E (0, (0) and a -. 0, then a 2 = (2(J5In)(1 + 0(1». Here, the possible

°
limiting distributions can be completely determined (see Theorem 4, Theorem
3.1.4, and Exercise 6). (iv) If (J2 -.0, either a -. or a -. 00 and the limit
distributions are degenerate (Exercise 6).

Theorem 4. If Un is the number of empty cells in distributing N n balls at random


into n cells and Nn2In = Nna n -. (j2 E [0, (0), then

lim P{U
n
- (n _ N
n
) =j} = «(j2/2~~-62/2, j = 0, 1, .... (26)
n-oo J.
PROOF. According to (9) and (8) with N = Nn,j
P{U n ~ n - N + k} = P{SN-k ~ N} = P{SN-k - (N - k) ~ k},
whereS N _ k - (N - k) = 'L7=-Nlj -
I)and{lj - l,j;;::: 2}areindependent,
geometric r.v.s with success probabilities 1 - U - 1)ln. Hence, the char-
acteristic function of SN-k - (N - k), say q>(t), is given by

q>(t) n
= N-k( 1 -
)=1
'-1)[ '-1 J-I
J__
n
1 _ J_ _ e it
n

=N
n- exp k { • _
_J_ _1 +J_
. __1 eit + 0 ( J2'2)}
j= Inn n

= exp{~~ (e it
- 1) + O(~O}
-.exp{(j; (e it -l)},
and Corollary 8.3.5 guarantees that the dJ. of Un - (n - N n) tends to a
Poisson d.f. with parameter (j2/2, yielding (26). 0

EXERCISES 9.2
I. For an arbitrary dJ. G, concoct a sequence of interchangeable r.v.s for which
P{S. < x~} -+ J<Il(x/y)dG(y).
2. If {X.,n ~ I} are interchangeable r.v.s with E XI = 0, E xi
°
Cov(X., X 2) = = Cov(Xi, XD, then Corollary 9.1.4 holds.
= 1, EIXd 3 < 00,

3. Suppose that for each n = I~ 2, ... , {X ni , i = 1,2, ... } constitute interchangeable


random variables with E X. I = 0, E X;I = 1, EIXnl 13 < 00 and set
336 9 Central Limit Theorems

when C§. is the a-algebra relative to which {X. i , i ~ I} are conditionally i.i.d. If
';;'m.(w)!. 0, a.(w)!. I, rY..(w)/';;'a;'(w)!. 0, and S. = D=I X.;, then S.;.;;, ~
N o. 1 ,
4. If for each n ~ I {X. i , i ~ I} are interchangeable LV.S with E X. I = 0, E X;I = I,
EIX. 113 < 00 and E X. I X. 2 = o(n- I ), E X;IX;2 --> I, EIX. 113 = o(n I/2 ), then
S./';;' ~ N o. 1,

5. If two of the last three conditions of Exercise 4 obtain but the third is violated,
construct a sequence {X .i, i ~ I} of interchangeable processes for which S./';;' f2-.
N o. 1,
6. Apropos of Theorems 3 and 4, prove that (i) if a. - log n --> 00, P{L'i X. i = O} --> I,
(ii) if N a. --> 0, P {I X. i = n - N} --> I. (iii) if a. - log n --> fJ finite and P A designates
a Poisson r.V. with mean A., I7=1 X. i ~ P. xp{-6} (see Theorem 3.1.2).
7. Let the i.i.d. LV.S {Z., n ~ I} be independent of Y, where Yis uniformly distributed on
(0, I) and P{Z. = ± I} = t. Then X. = y- 1/ 2 Z., n ~ I, are !f I interchangeable
LV.S with (lIn) L'i Xi ~ 0 but E X IX 2 does not exist.

9.3 The Martingale Case


If the component r.v.s {X., n ~ I} are martingale differences, that is,
E{Xn + II~.} = 0, n ~ 1,
and, moreover, obey the Lindeberg condition, a central limit theorem will
hold provided the conditional variances are sufficiently well behaved.

Theorem 1./f{X., ~., n ~ I} is a stochastic sequence with E{X.+ II~.} = 0,


n ~ 0 (with ~ 0 = {0, On, such that {X.} obey the Lindeberg condition and,
' u.2 = E X .'
sett mg
2 2
s. = " . 2
L.,I Uj,


I EIE{XJI~j-d - uJI = o(s;), (1)
j= I

then (l/sn)Ij= I X j ~ N o. 1 ,

PROOF. The proof of Theorem 9.1.1 may easily be adapted to handle the
current situation. In fact, let ~{t), ait) be exactly as defined there and set
2
bit) = ~ [E{XJI~j- d - uJJ.
Then 9.1(5) holds provided (i) all expectations E therein are changed to
E{ 'l~j_I}' (ii) the term bP/s.) is added to the expression within the absolute
value signs on the right, and (iii) the term Ibi t/ s.) I is added within the brackets
9.3 The Martingale Case 337

on the extreme right. Analogously, (6) obtains if the alteration (iii) is effected
and E is replaced by E{·lffj_d in the second expectation of the second
expression (where independence was used) and in all succeeding expressions
of (6). Finally, (7) holds if E :D=
I Ibit/sn) I is appended to the extreme
right side and so, recalling (l), the theorem follows. 0
A version of the Berry-Esseen theorem also carries over to martingales.

Theorem 2. If {Sn = D Xi' n 2 I} is an !!'3 martingale with E Sn = 0,

J/
E S~ = s~ = :L'i a;, n 2 I, there exists an absolute constant Co such that

Ip{~: < x} - $(x) I ~ co[k ~ EIX l j


3
4

I ]L'iEIE{XfIXI, ... ,Xj-d-afl


+ [c ---
o J2Tc s~/2J:L'i EIX j l 3 .

Lemma 1. Iff is a realfunction withpn-I) absolutely continuous on [a, bJ, then,


setting ftO) = f,
(b - a)i 1 fb
= L --.,-
n-I
f(b) fUl(a) + , ( b - x)"-Ipn)(x)dx. (2)
j=O J. (n-l). a

PROOF. Set

F(x) = f(b) _ nil (b ~,x)i f(j)(x),


j=O J.
(3)

whence F'(x) = [-(b - x)"-I/(n - 1)!Jftn)(x), a.e. on [a, b]. Thus, F is


absolutely continuous on [a, bJ and

F(a) = F(b) - fbF'(x)dx = 1 ,fb(b _ x)n- Ipn)(x)dx,


a (n - I). a

which is tantamount to (2) in view of (3). o


Lemma 2. If {Sn = :L'i Xi' n 2 I} is as in Theorem 2, thenfor any functionf
with absolutely continuous second derivative on every bounded interval and some
absolute constant C in (0, iJ

where Yo is normal with mean 0, variance I, and IlfU)11 = inf[M: J.lU f(j)(x) I 2
M} = OJ, I ~ j ~ 3, with J.l denoting Lebesgue measure on the real line.
338 9 Central Limit Theorems

PROOF. Let {Y,., n ~ O} be independent normal LV.S with means zero and
variances a; (a5 = 1) with {Yn , n ~ O} independent of {X n , n ~ I}.
Set Qj,n = D=:
Xi + LJ+ I 1';, 1 ~ j ~ n, and note that Qj,n + Xj =
Qj+ I,n + lJ+ I' By Lemma 1
If(a + h) - f(a) - hf'(a) - (~)P2)(a) I ~ Iht IIf(3)1I,
whence for some LV.S ()j with I()jl ~ I,j = 1,2,

(5)

Since EIlJI 3 = claj = cl(E XJ)3/2 ~ CI EIXjl3 and

E(X i. - yi.)f(j) (<lie'!)


J J Sn

= lE{f(j)(<l1ne'!)E{X~ - Y~IXI'" Xj-I' lJ+I"'" Y,.}} = 0, j = I

E fm (<l;.e'!) [E{XJ I X I' ... , Xj_d - an i = 2,

the conclusion of the lemma follows from (5). D

PROOF OF THEOREM 2. Let £ > 0 and define h( - t) =- h(t), where

£
O<t<-

r
- -4

h(t) = 3£63
1
G- t -"2' (6)

1
2'

Then h is a nonincreasing odd function with h" absolutely continuous and


Ilh"ll = 8£-2, Ilh"'ll = 32£-3, and consequently

f(t) = h (t - x - n +~ (7)
9.3 The Martingale Case 339

is a nonincreasing function on ( - 00, 00) whose second and third derivatives


have the same norms as those of h and which vanishes on (x + t:, 00), while
f(t) = I for t :$; x. Since j<2) is absolutely continuous on (- 00, 00), by
Lemma 2, for any real x

p{:: < x} :$; Ef G:) :s; E f(Yo) + I :s; $(x + t:) + I, (8)

where

By the mean value theorem $(x + t:) - $(x) :s; t:/jb., whence from (8)

{Sn}
P - < x :s; $(x)
Sn
+ t:
M:
V 2n
+ I. (9)

Define

Then, choosing t: = K~/4 in (9),

which is tantamount to the upper bound of the theorem. In analogous fashion


a lower bound is obtained which combined with (10) yields the theorem. 0

A discussion of martingale and other central limit theorems based on


Dvoretzky (1970) and McLeish (1974) is given in Section 5.

EXERCISES 9.3
1. Utilize Theorem 9.3.1 to prove sufficiency in Theorem 9.2.1.
2. If {X.} are independent with E X j = 0, E xl = (fl, EIXY H < 00, 0 < b ::; I,
modify the right side of (4) of Lemma 2 to (1I!(2)1I + IIP))II) D=
I EIX;i2+6. Hint:

Employ a 2- or 3-term Taylor expansion according as Ihi> I or Ihi::; I.


3. If {X.} are i.i.d. with EX. = 0, EX; = I, EIX.I2+6 < 00,0 < b::; I, prove the
large deviation result that if b log n - 4IX; --+ 00, P{S./~ ~ IX.} - I - <l>(1X.).
340 9 Central Limit Theorems

9.4 Miscellaneous Central Limit Theorems


The first central limit theorem concerns sums of random numbers, say tn'
of independent random variables {X n' n ~ I} and permits t n to be highly
dependent upon the sequence {X n }.

Theorem 1 (Doeblin-Anscombe). Let {X n' n ~ I} be independent random


variables with E X n = 0, EX; = I, and {tn, n ~ l} positive integer-valued
random variables with tnlb n ~ c, where {c, bn, n ~ I} are positive, finite
constants such that bn i 00. Ifn- 1/2 Ll= I X j ~ N o. 1, then
1 '"
172 LXj~No.I'
tn j= I

PROOF.Set Sn = Lt X j and k n = [cbnJ = greatest integer::; cbn·


Now,
S'n (k n)I/2[Sk n S'n - Sk n]
t~/2 = tn 'k'[t2 + k~/2 (1)

and according to the hypothesis the first factor on the right side of (1) con-
verges in probability to one. Moreover, Skjk~/2 as a subsequence of Snln l ' 2
converges in distribution to No. I' Thus, to prove the theorem it suffices to
establish that

(2)

To this end, note that for any positive numbers e, (j


PUS'n - Sk..l > ek~/2} ::; PUS'n - Sd > ek~/2, It n - knl ::; (jk n}
+ PUt n - knl > (jk n}. (3)

Since the event corresponding to the first term on the right implies A: u A;,
where

An+ = { max ISj - Sk..l > ek~/2},


+6)k n
kn~j~(l

A; = { max
kn(l-6)~j~kn
ISj - Sk..l > ek~/2},
and since by Kolmogorov's inequality
P(A;)::; (e 2kn)-1 E(Sk n ±[6kn J - SkY::; (e 2kn)-I(jk n = e- 2(j,
it follows from (3) that
P{IS'n - Sk..l > ek~/2} ::; 2&-2 + 0(1). (4)
For arbitrary e > 0, the first term on the right in (4) tends to zero with (j and
the theorem follows. 0
9.4 Miscellaneous Central Limit Theorems 341

Actually, Theorem 1 holds even if tnln converges in probability to a


positive random variable (Blum et al., 1962-1963; Mogyorodi, 1962).
If r.v.s {X n , n ~ I} obey the hypothesis of the ensuing theorem, then
N c = sup{n ~ 1: L7=
I Xi ~ e}, is a bona fide LV. In the special case of
classical renewaltheory(Section 5.4), where X n > O,n ~ 1,impliesNc + 1 =
1;,(0) [see (5)], asymptotic normality of N c is an immediate consequence
of Theorem 2 below. On the other hand, if merely E X n == J-L > 0, then only
the inequality N c ~ 1;,(0) - 1 may be inferred. Nonetheless, N c is still
asymptotically normal under the hypothesis of Corollary 2.

Theorem 2. If {X, X n, n ~ I} are i.i.d. LV.S with EX = J-L > 0 and a Z =


a~ E (0, (0), define Sn = L~ Xi and
T = 1;, = 1;,(oc) = inf {n ~ 1: Sn > en a}, -00 < oc < 1, e > O. (5)
Then, as e -+ 00

J-L(1 - oc)[1;,(oc) - (elJ-L)I/II-a)] d N


(6)
a(elJ-L)I/Z(l a) -+ 0.1'

PROOF. For simplicity, take J-L = l. Since via the strong law of large numbers
1;,le l /(l-a) ~ 1 as e -+ 00 (Exercise 5.4.6), by Theorems 1 and 8.Ll

T ) I/Z (ST - T) ST - eTa era - T d


( Cl/O-a) aft = acl/ZO-a) + acl/zO-a)-+No.I' (7)

Since E xi < 00, Xn/.j;J ~O, implying XTift ~O as e -+ 00, whence

o< ST - cra < X T ft ~O


- ae l / ZO a) - ft ael/Z(l a) ,

and (7) ensures


cTa - T d
ac l / ZII a) -+ No. I'

Now Zc = I + 0(1) entails Z: -a = 1 + (Zc - 1)(1 - OC + 0(1», so that


cTa - T = T{[T-lel/(l-a)]I-a - I}
= T[T-lel/ll-a) - 1](1 - IX + 0(1»
-(1 - oc)[T - e l /ll - a)](1 + 0(1»,
and the theorem follows. o
Corollary 1. Under the hypothesis of Theorem 2, for -00 < oc < 1
·a)
maxI <isn(SiI.J - J-Ln
I-a
d N (8)
an(l/Z) a -+ O. I'

PROOF. For x "# 0, define


(9)
342 9 Central Limit Theorems

Then, setting qc = [a/Jl(l - iX)](C/Jl)I/2 0 - a l,

p{ max ~- Jln 1 - a
> - xan O / 2l - a }
O$j$nJ

= P {max
o$j$.J
~! > c} = P {1; ~ n}

by Theorem 2 via inversion of (9). o


Corollary 2. If {X, X n , n ~ I} are i.i.d. with EX 1 = Jl > 0, a 2 = ai E (0, (0),
and N c = sup{n ~ 1: I7=1 Xi ~ c}, C ~ 0, then
N c - c/Jl d
c 1 / 2 aJl- 3/2 ~ N o. 1 as c -+ 00.

PROOF. In view of Theorem 2, it suffices to show that (N c - 1;(0)/C I/2 con-


verges to zero in probability or in 2 1 , and, since N c - 1;(0) + 1 ~ 0, it is
enough to verify that E(N c - 1;(0» ~ EN 0 < 00. Since E 1;(0) < 00, by
Corollary 5.4.1,
<Xl

E(N c - 1;(0» = L (P{N c ~ n} - P{1;(O) ~ n})


n=1
<Xl

~ I P{1;(O) < n ~ N c }
n=1
ex> n-l
= L I P{1;(O) = k, Sj ~ dor somej ~ n}
n= I k= 1

~ JI :t: P{1;(O) = k}P{~~~(Sj - Sk) < O}

= Jl nJ+I {1;(O) =
P k}pt~~~/j < O}
JI P {1;(O) = k}JIP{~~~Sj < O}
f p{inf Sj < O} ~ E No <
n= 1 J~n
00

via Corollary 10.4.5. o


The invariance principle, launched by Erdos and Kac (1946) consists in (i)
showing that under some condition on r.v.s {X n , n ~ I} (e.g., that of Linde-
berg) the limit distribution of some functional (e.g., maxI $j$n If= 1 Xi) of
9.4 Miscellaneous Central Limit Theorems 343

{Xn} is independent of the underlying distributions Fn of X n' n ;::: I, (ii)


evaluating this limit distribution for an expeditious choice of {F n }.
By combining their technique with a sequence of stopping times, it is
possible to eliminate step (ii) in

Theorem 3. If {X n , n ;::: I} are independent LV.S with E X n = 0, EX; =


cr 2 E(0, 00), n;::: I, which satisfy the Lindeberg condition, Sj = Xi' and L{
Ti = inf{j;::: I:Sj > c},thenasc- 00,(cr 2 /c 2 )TiconlJergesindistributionto
the positive stable distribution of characteristic exponent! or equivalently

°
(I/crn I/2 )max 1 $;j$;n Sj converges to the positive normal distribution, i.e., for
x> 0, y >
2
lim P {1;,i > c : } = lim P {m~x Sj:5: xcrn
l/2 } = 2l1> ( ~/2) - 1. (10)
c-oo cr n-oo 1 ~J~ny y
Note. For y = I, the right side of (10) is a dJ. in x, namely, the positive
normal distribution, while for x = 1 it is one minus a dJ. in y; the latter,
2[1 - lI>(y- 1/ 2 )], Y > 0, is the so-called positive stable distribution of
characteristic exponent! (Chapter 12).
PROOF. Without loss of generality, take cr = I = y and let x > 0. In view of
the Lindeberg condition, for every () > °
n I n
j~IP{lXjl > ()~}::;; n{)2 j~IEXJI[IXjl>cSJlil = 0(1). (II)

For any positive integer k, set nj = [jn/k], j = 0, 1, ... , k and n = k,


k + 1, .... If
j = 0, I, ... , k - I,

then Sn = Snk = L~:J 1';. Moreover, Yo, ... , ¥,.-1 are independent LV.S and,
furthermore, for fixed j = 0, 1, ... , k - 1 as n - 00

Consequently, as n - 00 the r.v.s Yn)~ ~ No. 11k for j = 0, 1, ... , k - 1.


Next, for each i = 1, 2, ... , n let m(i) be the integer for which nm(i)-1 <
i :5: nm(i) and note that ° < m(i) :5: k. For any e > 0, setting Ai = Ai,n(e) =
{ISnm,,) - S;I ;::: e~}, 1 :5: i :5: n, and omitting the * in 1;,*

P{TxJli :5: n} - P{Sn > x~} = P{Sn :5: x~, max Sm > x~}
t :::;;m<n
n-l n-l
(13)
: ; L P{Txyin =
i= 1
i, Ai} + L P{Sn::;; x~, Txyin =
i= 1
i, Ai}

= D + B (say). Now, via Kolmogorov's inequality


344 9 Central Limit Theorems

.-1
D= L1 P{'Txvr.; =
j=
i}P{AJ

.- 1 1 1 2n 2
~ L -2(nm(i)-nm(j)-I)P{Txm=i}~-2-k
i= 1 ne ne
=-k2'
e
(14)

On the other hand, recalling that nk = nand s./fi ~ No. I'

L [P{S. ~ (x - = i, An
.- I

B~ e)fi, Txm
i= 1

.- 1
~ L P{Txm = i, S'm(j) -
i= 1
S. > O} + P{(x - e)fi < S. ~ xfi}

.-1
= L P{TxJii = i}P{S'm(j) - S. > O} + O(e). (15)
j= 1

By virtue of n- 1/2 Y.,j ~ No, 11k' j = 0, 1, , .. , k - 1, and independence, for fixed


k as n -+ 00 necessarily
k-l
n -1/2(S" - S 'm(i)
) - n -1/2
-
"L. Yj d
-+ N
0, (k-m(j)/k ·I -- 1, 2,
t 'lor ... , nk- 1 •
j=m(i)

Noting that 1 ~ m(i) ~ k (despite I ~ i < n -+ (0)


P{S'm(j) - S•• > O} -+ t uniformly for 1 :s; i:S; nk - I• (16)
On the other hand, for nk _ 1 < i < n necessarily m(i) = k, whence the left side
of(16) is zero. Consequently, from (15) and (16)
.-1
B~t L P{TxJii = i} + 0(1) + O(e)
i= 1
(17)
~ tP{'Tx Jii ~ n} + 0(1) + O(e),
and so, combining (13), (14), and (17),
1- 2
2 lim P{TxJii ~ n} ~ I - <l>(x) + -k2 + O(e). (18)
n-(() e

To obtain the reverse inequality for the lower limit, observe that A. = 0,
whence via (14)
P{Txv" ~ n} - P{S. > (x + 2e)fi}
= P{S. ~ (x + 2e)fi, max Sj > xfi}

2: L P{TxJii = i, S. < S'm,'" IXd ~ efi, An
i= 1

2: f P{TXJii
i= 1
= i, S. < S'm(,,' IXd ~ efi} -
i= 1
f P{TxJii = i, Ai}
9.5 Central Limit Theorems for Double Arrays 345

n n 2
~ i~IP{TXJiI = i,Sn < Sn m,,,} - i~IP{lX;i > ejn} - kt;2

I n 2
= "2 i~l P{TxJiI = i} + 0(1) - ke 2
in view of (11) and the last equality of (15) and its aftermath. Thus,

1 . 2
-2 lIm P{TxJiI :s; n} ~ 1 - Cl>(x + 2e) - -k2' (19)
ft-Q) e
Letting k - 00 and then e - 0 in (18) and (19), it follows that as n - 00

p{ max Sj > xjn} = P{TxJiI :s; n} - 2[1 - lI>(x)], x > 0,


I:S;Jsn

which is tantamount to (10) for y = 1 = a. 0

EXERCISES 9.4
1. Construct a sequence of independent r.v.s {X.} with identical means and variances
for which the Lindeberg condition fails.
2. Let{X., n ~ I} be independent LV.S obeying the central limit theorem with EX. = ~,
EX; == ~2 + (12 < 00. If Nc(lX) = sup{k ~ I: Sk :::::; ck"}, c > 0, prove if ~ > 0,
o : : :; IX < I that Ne<lX) is a bona fide LV.
3. Let {X., n ~ I} be independent LV.S with (l/fi) D (E X j - ~) -+ 0 for ~ E (0,00),
(lIn) D (12(X) -+ (12 E (0,00), and (l/fi(1)(s. - DE Xj) ~ No,l' Show that the
conclusions of Theorems I and 2 remain valid.
4. If {X•• n ~ I} are i.i.d. LV.S with J1. = EX> 0 and 0 < b. --+ 00, then, if

likewise (llb.)(max l S;js;.Sj - n~) ~ SF' This yields an alternative proof of Corollary
9.4.1 in the special case IX = O.

9.5 Central Limit Theorems for Double Arrays


The initial result employs a double sequence schema {Xn,j' 1 :s; j :s; kn < 00,
n ;;:::: I} (see Chapter 12 for the independent case) and furnishes conditions for
the row sums Sn = L~~l Xn,j to converge in distribution to a mixture of normal
distributions with means zero.
Since conditions (1), (2), and (5) of Theorem 1 can be interpreted as con-
vergence in distribution (to the distribution with unit mass at zero), this
theorem does not require the array of r.v.s {Xn ;} to be defined on a single
probability space. In other words, for each n ~ 1, {Xn,l"'" Xn,k.} may
be r.v.s on some probability space (On' ~, Pn) with the a-algebras ~,1 c
~,2 c .. · C ~.k" c~.
346 9 Central Limit Theorems

Theorem I (Hall, Heyde). For each n ~ 1, let {Sn,j = '~J=1 Xn,i, :F..,j, 1 ~ j ~
kn < oo} be an 2 2 stochastic sequence with X: = maxI $i$k n IXn,;!, Un~j =
I3=1 X;,i' 1 ~ j ~ k n such thatjor some :F..,I-measurable r.v. u;
U 2 - u2 !. 0
n n(where U 2 = U 2 ) (1)
"n.k,.

X:!. 0 (2)
Un2 ~ 1];' (3)
sup E
n2:1
X: 2
< 00 (4)

k ~

L E{Xn):F..,j-d !.O,
j=1
L E 2{Xn,jl:F..,j_d!.O
j=1
(5)

with :F..,o the trivial u-algebra, then Sn = Sn,k n


~ Sa where E e ilSG = E e-(12f2)~:.
PROOF. Set X~,j = X n.j - Ej- 1 X n,j where Ej- 1 X n,j = E{Xn):F..,j-d, and let
S~,j' U~,j' X~* be the analogues of Sn,j, Un,j, X:. In view of(2) and (5), X~*!. o.
By (5), (3), Schwarz's inequality and Slutsky's theorem,

whence

by (1) and (3). Moreover via Theorem 7.4.8


E max (Xn,j - Ej- 1 X n,Y ~ 2 E(X: 2 + max E]-IXn)
j$k n j

~ 2 E(X: 2 + max Ej _ X: 2) ~ 10 E
jSk,.
1
2
< 00 X:
implying sUPn;" 1 E(X~*)2 < 00. Hence (1), (2), (3) and (4) hold for the primed
r.v.'s (X~j}'
F or any c > 0, Iet I] ,,2 = I] 2 1\ C, Un"2 = Un2 1\ C, X"nj = X'n,j I [~;"Jl W here IX =
min{1 ~ i ~ kn: U2i> c} or k n according as U~2 > c or not. Define U~',i' S;,j'
X;* in an obvious fashion. Since either U~2 < c whence U~'2 = U~2 or IX ~ kn
implying c < U~'2 ~ C + (X~*)2,

U~2 1\ C ~ U~2 ~ (U~2 1\ c) + (X~*)2


and so U~2 ~ 1]2, U~2 - U~2 !. O. Clearly X~*!. 0 and sup E(X~*)2 < 00.
Hence the "double prime" analogues of (1), (2), (3), (4), say (1)", (2)", (3)", (4)"
hold.
Next, setting T,. = n~~1 (l + iX;) and noting that X;,j = 0 for j > IX

EI T,.1 2 = E nk

j=1
(1 + X;/) = E(l + X;,~) n (1 + X;/)
j<~
9.5 Central Limit Theorems for Double Arrays 347

:5: E(1 + X;.~)eu;;.~-I :5: eC [1 + E(X:*)2]


implying via (4)" that {T", n.:<:: I} is u.i.
Define r(x) by r(O) = 0 and (see Lemma 12.1.1)
e ix = (1 + ix)e-(X 2 /2)+r(X) (6)

whence 1 = exp{r(x) + LJ=3(-ixY/j} for Ixl < 1 implying r(x) = x 4 a(x) +


ix 3 b(x), where
0< a(x) = 1/4 - x 2 /6 + x 4 /8··· < 1/4
0< b(x) = 1/3 - x /52
+ x 4 /7'" < 1/3
so that Ir(x)1 < Ixl 3 for Ixl < 1. Now (6) entails

eiS~ = 1'" W:' = (1'" - 1) J.v" + (1'" - I)(W:' - J.v,,) + W:' (7)

where

W:' = exp{ -tU;2 + L r(X;)}, (8)


j

Since on {X"*
n I"~n r(X"n,}·)1
< I} 'L.Il=! <" ·IX" .1
~J n.J
3
< X"*U"2
- n n P
-+ 0,
(3)" guarantees
(9)
while (1)" and (8) ensure W:'/J.v,,!. 1. Thus, 0 :5: J.v" :5: 1 entails W:' - J.v,,!. 0
a~d so recalling that {T,,} is u.i. (and a fortiori, tight)
(1'" - I)(W:' - J.v,,)!. O. (10)
Next, since {X;.j' .fF".j, 1 :5: j :5: k"} are martingale differences,

=E n (1 + iX"".}.) = ... = 1 + iX"".1


kn-I

j=1

and so, recalling that 0 :5: J.v" :5: 1

via (2)", (4)" and dominated convergence.


Now in view of (7)

eiS~ - J.v"(T,, - 1) = (1'" - I)(W:' - J.v,,) + W:' (12)

while (9) and (10) ensure that the right and hence left side of (12).2. e-~"2/2.
Since the left side is u.i., Corollary 8.1.8 in conjunction with (11) guarantees
that
(13)
348 9 Central Limit Theorems

Consequently,
IE(eiS~ _ e-,,2/ 2 )! ~ IE(eis~ _ eiS;;)1 + IE e iS;; _ e,,"2/2 1+ \E(e-,,"2/ 2 _ e-,,2/ 2 )1
~ 2P{U~2 > c} + IE(eiS~ _ e-,,"2/ 2)1 + P{,,2 > c}
which, in conjunction with (13) implies that for any t: >0
lim IE(eiS~ - e-,,2/ 2 )1 ~ 3s
provided c :?: Ct' Replacing S~ by tS~ for t #- 0,
lim E ei'S~ = E e-,2,,2/2.

Thus S~ ~ Sa where the c.f. of Sa is E e-,,2,2/2. Finally, Sn ~ Sa via (5).


Corollary 1 (McLeish). Iffor each n:?: 1, {S",j=~] Xn,i, §,.,j' 1 ~j ~ k n < <Xl}
is an 2 2 stochastic sequence on (0, ff, P) satisfying (2), (4), (5) and
(1')
for some non-negative n;:;"l
§,.,1 -measurable r.v. ,,2
then the conclusion of
Theorem 1 holds.
In order to replace conditions (2), (4) by a conditional Lindeberg condition
(22) and substitute v,,2 = ~J=l E{X~I§,.j_l} for V;,
several lemmas will be
needed.

Lemma 1 (Dvoretzky).Ifthe events AjE~, 1 ~ j ~ n where {~, j:?: O} is an


increasing sequence of sub-u-algebras of the basic u-algebra ff, then

p{ )=1UA j} t
~ e+ p{ P{Ajl~-d > e},
J-I
e > O. (14)

In particular, for any non-negative stochastic sequence {Yn,j'§",j' 1 ~ j ~ kn},


if~)~'1 E{ Y".J(Yn.j~tll~,j-l} .!. 0, all e > 0, then
p
max Y",j -+ O. (15)
i5:j'5:k n

PROOF. Setting}J.k = I1=1 P{Ajl~-d, 1~ k ~ n

pLQ AltLn jt
~ e]} ~ pLQ Aj[}J.j ~ e]} = E P{Ajl~_I}I(~j~t) ~ e l

so that

Lemma 2 (Dvoretzky). Let W be any C§-measurable, 2 2 , r.v. where C§ is a


sub-u-algebra offf. Thenfor any e > Oand any 2 2 r.v. Yfor which E{ YIC§} a~·O,
with probability one
(16)
9.5 Central Limit Theorems for Double Arrays 349

PROOF. Without loss of generality suppose e = 1. Define


A={W2~1}, Z = (2Y + W)2 _ y2
2
Q(D) = E(y 2 + Z)I[Y2+Z>IID - E y I[Y2>IID' De$'.
It suffices to prove for any Ge~ that Q(G) ~ O. Since E WYIAG =
E WIAGE{YI~} = 0 implies
E ZIAG = 3 E y 2I AG + E W 2I AG
and moreover

necessarily
2 2
Q(AG) = E ZIAG - E ZI[Y2+Z~ IIAG + E y I[y2+Z>IIAG - E y I[Y2>IIAG ~ O.
c 2
DnA , y + Z ~ 1 iff(2Y + W + 1)(2Y + W - 1) ~ iff -(1 + W)/2 ~ Y ~ o
(1 - W)/2. Hence A c[y 2 + Z ~ 1] C [y 2 ~ 1] so that A c[y 2 + Z ~ 1]·
[y2 > 1] = ,p. Furthermore, since Z < 0 iff 12 y + WI < IYI which, in turn,
implies I YI < IWI, necessarily [Z < O]A C C [y 2 < 1] so that Z ~ 0 on
A c [y 2 ~ 1]. Thus
Q(AcG) = E(y 2 + Z)I[Y2+Z>I2:y2WG - E y2I[Y2+Z~I<Y2WG

+ E ZI[y2+Z>I,Y2>IIAcG ~ 0
whence
Q(G) = Q(AG) + Q(AcG) ~ O. D

Corollary 2. If EIXI 2 < ex) and e > 0, with probability one


E{IX - E{XI~Wlrlx-E{XI~}I>2<11~} ~ 4 E{X2I[lxl>'II~}. (17)

PROOF. Take W = 2 E{XI~} and Y = X - E{XI~} 0

Lemma 3 (McLeish). For each n ~ 1 let {y",j' ff",j' 1 ~ j ~ kn < ex)} be a


non-negative !l'1 stochastic sequence and let
j

Sn,j = L
i=l
y",i'

If
kn

i~1 E{ Y"j[y n,j><llff",j- d !. 0, e>O (18)

and
{/In, n ~ I} is tight (19)

then
p
max ISn,j - /In,jl-+ O. (20)
1 $j~k"
350 9 Central Limit Theorems

PROOF. Via Lemma I,


Y,,* = max Yn •j ':' O.
jS k n

Let Y:. j = Y",J[Yn.jSK'2.~n,jSK1' K > O.


Now

implying

lim P
"-00
{t I
1
Y".i - Y:) > o} : ; sup P{JLn > K} + lim P{ Y,,* > K-
n~ 1 n-oo
2
} ~0
whence defining S~.j' S~, JL~.j' JL~ analogously

Moreover,

whence, in view of(18) and (19), for any b > 0

P {max IJLn,j -
1 SjSk n
JL~) > fJ} ::; P {t E{Y",JIYn,j>K'211'?;'.j-d > fJ/2}
1

+ P{JLnIll'n>Kl > b/2} ::; P {t E{Yn,J[yn,pK' 211'?;'.j-d > b/2}

+ sup P { JLn > K ,j n~<X)


----.
{
sup P JLn > K ----. O.
} K~<X)

n~l n~l

Furthermore, by Theorem 7.4.8 for any b > 0

(F /4) P {m~x IS~,j - JL~) > b} ::; ~ E m~x (S~.j - JL~,Y ::; E(S~ - JL~)2
k" kn II"
::; L E Y:j ::; K- 2E L Y:. j = K- 2 E L E {Y:).?;..i-d
i=l 1 1
9.5 Central Limit Theorems for Double Arrays 351

Finally,
max ISn,j - J.ln)
j

~ max ISn,j - S~) + max IS~,j - J.l~,jl + max 1J.l~,j - J.ln,jl !. O. 0


j j j
As earlier, Ej - 1{ . } abbreviates E{ '1§;"j-1}' Condition (22) is a conditional
Lindeberg condition and hence less stringent than the classical version.

where §;',O is the trivial u-algebra, Then Sn == Sn,k n ~ SG where E e i1So =


E e-~;12/2.

PROOF. For n ~ 1, define X~,j = Xn,j - E j - 1{Xn,j}' By Corollary 2, {X~,j}


satisfies (22) and hence also condition (2) of Theorem 1, via Lemma 1.
Set v,,2 = L~~l Ej - 1 {X';.J, V; = L~n X~ and define V;2 and U~2 analogously.
Via (23)
v.,2
n
~.,2
'IF

The latter ensures that V;2 is tight which, in conjunction with (22) for {X~.j}
guarantees that

by Lemma 3, Consequently, (1) and (3) of Theorem 1 hold for {X~.J.


Next, define the martingale difference sequence
X"n,}. = X'n,}.[["j E
L.l i - I
{X'2/ } 1)
n.i (IX~.ll>tJ ::5

and note that (2) and (5) hold trivially for {X;,j}' Likewise (1) and (3) obtain
since U~2 - U~'2 !. 0 in view of
U,2 - U"2 < U'21 k n '2
n n - n [Ln:, Ej-l{Xn.iIIX~.jl><l}>l)
and the conditional Lindeberg condition.
Moreover,
kn
E max
.
X"~
n,) -
< /;2 +E "X,2.I,
!-- n,) IIX'\iI>£,L~1
j '2
Ei-,{Xn.;lIIX· il>"}::S; 1)
}5':k n )=1 IJ.

kn

~ /;2 + E j~ Ej - 1 {X~JIIIX~ȣ)} I[L{ E i - dX~:i/lIX~.il>'I}::S; 1)


~ /;2 +1
352 9 Central Limit Theorems

so that (4) also holds for {X;J. Consequently, S; = IJg,1 X;,j ~ Sa by


Theorem 1. Again employing the conditional Lindeberg condition

lim P {S~ # S;} ~ lim P {~ E i - {X~~iI[lx~.,I>'I} > I} = 0


1

whence S~ ~ Sa. Finally, Sn ~ Sa via (24). 0

Likewise in Theorem 2 the r.v.s {Xnj , 1 ~ j ~ k n} may be defined on differ-


ent probability spaces (nn, ff", Pn)'

Corollary 3 (Dvoretzky). If, for each n ~ 1, {Sn,j = I,1=1 X ni , ff".j, 1 ~ j ~


kn < oo} is an .!l'2 stochastic sequence on (n,~, P) satisfying (22), (24) and

f
k
v,,2 = [E{X;,jlff",j-d - E 2{Xn •j lff".j_d].!.,,2 (23')
j=1

for some non-negative n:;'1 ff".I-measurable r.v. ,,2, then the conclusion of
Theorem 2 holds.

EXERCISES 9.5
1. (Helland) If for n;;:: 1, {Xnj , !F,.j' 1 :::;; j :::;; k n} is a stochastic sequence on (O,§', P)
with E Xn* = E maXj IXnjl = 0(1) as n .... 00, then L~~I Ej_1{IXnPllx nJ I>tl} !. 0, e > O.
In particular, ifevents A nj E!F,.j where!F,., c ... C !F,.k n C jO, then p{U~n A nj } =0(1)
as n .... 00 implies D~I P{Anjl!F,..j-l} !. O.
Hint: Let T" = inf{1 :::;; j :::;; k n : IX.jl > e} and T" = k. otherwise. Then

2. (Helland) (i) If for each n;;:: 1, {X.j ,9';,j' 1 :::;; j:::;; k.} is a stochastic sequence with
IXnjl :::;; c, 1 :::;; j :::;; k. and U; = Dn X~ !. 1, then {v,,2, n ;;:: 1} is tight.
Hint: If 1;, = inf{l :::;; j :::;; kn , Lt=1 X;i > 2} and T" = k. otherwise, then

Tn
P{v,,2 > a} ~ P{U; > 2} + a-I E I X;j ~ 0(1) + (c + 2)a- 1•
1

(ii) If, in addition, X: = maXj IXnjl !. 0, then v,,2 !. 1.


Hint: The conditional Lindeberg condition (22) holds via Exercise 1.
3. (Ganssler-Hausler) If for each n ;;:: 1, {Snj = II=1 X ni , !F,.j' 1 :::;; j :::;; kn} is a martingale
with E XI = 0, n ;;:: 1 satisfying (i). E X: .... 0 and (ii) U; !. 1, then I~n X.j ~ No,l'
Hint: X: !. 0 implies (iii) D~1 X=jIIIXnJI>I]!. 0, r = 1, 2 and via Exercise 1, (iv)
I~~I Ej_1{XnjIIIXnJI,;; I)}!. O. Then setting X~j = XnjIIIXnJI,;; IJ - Ej _ 1{X.iIIXnJI';; I)}'
Corollary 3 ensures I'~~ X~j ~ No. 1 since {X~j} satisfies (22) and moreover via (ii),
(iii) and (iv), D~I (X~j)2 .... 1 whence Exercise 2 (ii) yields (V:)2 !. 1.
References 353

References
F. Anscombe, "Large sample theory of sequential estimation," Proc. Cambro Phi/os.
Soc. 48 (1952),.600-607.
S. Bernstein, "Several comments concerning the limit theorem of Liapounov," Dokl.
Akad. Nauk. SSSR 24(1939),3-7.
A. C. Berry, "The accuracy of the Gaussian approximation to the sum of independent
variates," Trans. Amer. Math. Soc. 49 (1941), 122-136.
J. Blum, D. Hanson, and 1. Rosenblatt, "On the CLT for the sum of a random number
of random variables," Z. Wahr. Verw. Geb. 1 (1962-1963), 389-393.
J. Blum, H. Chernoff, M. Rosenblatt, and H. Teicher, "Central limit theorems for
interchangeable processes," Can. Jour. Math. 10 (1958),222-229.
K. L. Chung, A Course in Probability Theory, Harcourt Brace, New York, 1968; 2nd ed.,
Academic Press, New York, 1974.
W. Doeblin, "Sur deux problemes de M. Kolmogorov concernant les chaines denom-
brables," Bull, Soc. Math. France 66 (1938),210-220.
J. L. Doob, Stochastic Processes, Wiley New York, 1953.
A. Dvoretzky, "Asymptotic normality for sums of dependent random variables,"
Proc. Sixth Berkeley Symp. on Stat. and Prob. 1970,513-535.
P. Erdos and M. Kac, "On certain limit theorems of the theory of probability," Bull.
Amer. Math. Soc. 52 (1946), 292-302.
C. Esseen, "Fourier analysis of distribution functions," Acta Math. 77 (1945), 1-125.
W. Feller, .. Ober den Zentralen Grenzwertsatz der wahrscheinlichkeitsrechnung," Math,
Zeit. 40 (1935),521-559.
N. Friedman, M. Katz, and L. Koopmans, "Convergence rates for the central limit
theorem," Proc. Nat. Acad. Sci. 56 (1966), 1062-1065.
P. Hall and C. C. Heyde, Martingale Limit Theory and its Application, Academic Press,
New York, 1980.
K. Knopp, Theory and Application of J"jinite Series, Stechert-Hafner, New York, 1928.
P. Levy, Theorie de faddition des variables aleatoiries, Gauthier-Villars, Paris, 1937;
2nd ed., 1954.
J. Lindeberg, "Eine neue Herleitung des Exponentialgesetzes in der Wahrschein-
lichkeitsrechnung," Math. Zeit. 15 (1922),211-225.
D. L. McLeish, "Dependent Central Limit Theorems and invariance principles," Ann.
Prob. 2 (1974), 620-628.
J. Mogyorodi, "A CLT for the sum of a random number of independent random
variables," Magyor. Tud. Akad. Mat. Kutato Int. Kozl. 7 (1962), 409--424.
A. Renyi, "Three new proofs and a generalization of a theorem of Irving Weiss,"
Magyor. Tud. Akad. Mat. Kutato Int. Kozl. 7 (1962),203-214.
A. Renyi, "On the CLT for the sum of a random number of independent random
variables, Acta Math. Acad. Sci. Hung. 11 (1960), 97-102.
B. Rosen, "On the asymptotic distribution of sums of independent, identically distri-
buted random variables," ArkivfOr Mat. 4 (1962),323-332.
D. Siegmund, "On the asymptotic normality of one-sided stopping rules," Ann. Math.
Stat. 39 (1968), 1493-1497.
H. Teicher, "On interchangeable random variables," Studi di Probabilita Statistica e
Ricerca Operativa in Onore di Giuseppe Pompilj, pp. 141-148, Gubbio, 1971.
H. Teicher, "A classical limit theorem without invariance or reflection, Ann. Math. Stat.
43 (1973), 702-704.
P. Van Beek, "An application of the Fourier method to the problem of sharpening the
Berry-Esseen inequality," Z. Wahr. 23 (1972), 187-197.
I. Weiss, "Limit distributions in some occupancy problems," Ann. Math. Stat. 29 (1958),
878-884.
V. Zolotarev, "An absolute estimate of the remainder term in the c.L.T.," Theor. Prob.
and its Appl. II (1966), 95-105.
10
Limit Theorems for
Independent Random Variables

10.1 Laws of Large Numbers


Prior discussion of the strong and weak laws of large numbers centered
around the i.i.d. case. Necessary and sufficient conditions for the weak law
are available when the underlying random variables are merely independent
and have recently been obtained for the strong law as weIl. Unfortunately, the
practicality of the latter conditions leaves much to be desired.
A few words are in order on a method of considerable utility in probability
theory, namely, that of symmetrization. In Chapter 6 it was pointed out that,
given a sequence ofr.v.s {X n , n ;::: 1} on (Q,:F, P),a symmetrized sequence of
LV.S {X:, n ;::: l} can be defined-if necessary by constructing a new prob-
ability space. The joint distributions of

n;::: I,

are determined by the fact that {X~, n ;::: 1} is independent of {X n , n ;::: 1}


and possesses the same joint dJ.s, that is, {X n,.n ;::: 1} and {X~, n ;::: 1}
are i.i.d. stochastic processes. In particular, if the initial LV.S X n , n ;::: I,
are independent, so are the symmetrized X n , n;::: 1, namely, the X:,
n ;::: 1.
The salient point concerning symmetrization is that the X: are symmetric
about zero while the magnitude of the sum of the two tails of the correspond-
ing dJ.F: is roughly the same as that of F n' The relation between the distribu-
tions and moments is stated explicitly in

354
10.1 Laws of Large Numbers 355

Lemma t. If{X j , 1 5. j 5. n} and {Xi, 1 5. j 5. n} are i.i.d. stochastic processes


with medians mj and X1 = X j - Xi, then for any n ~ I, e > 0, and real a

t p{ max (X j
l~)~n
- m) ~ e} 5. p{ max X1 ~ e},
l~J~n
(I)

t p{ max IXj - mjl ~ e} 5. p{ max IXjl ~ e}


15)5" 15J:5n
(2)
5. 2 p{ max IXj - a I
15j5n
~ ~},
2

tE(maxIXj-mjl)P 5.E(max IXjl)P 5.2E(2max l x j -a l )P, (2')


I SJ:5n 1 :5):5" 1 SJ5n

where p > O. Moreover, ifE X I = 0,


p ~ 1. (3)

PROOF. Set A j = {Xj - mj ~ e}, B j = {Xi - mj 5. O}, C j = {Xj ~ e}. Then


A j · Bj c C j and by Lemma 3.3.3

which is (I). To prove (2), set


T = inf{j ~ I: IXj - mjl ~ e}.
Then via independence,
n
P{T5.n} 5.2 L1 [P{T=j,Xj-mj~e,Xi5.mj}
j=

n n
5. 2 L
j=l
P{T=j, IXjl ~ e} =2 L
j=l
P{T=j, IX}I ~ e}

5. 2P{ m~x
15is n
IXjl ~ e}.
This yields the first inequality of (2) and the second follows from

p{m~x IXjl ~ e} ~ p{max IXj -


ISi5n ISiSn
al ~ ~or max IXi - al
I~i~n
~~}
5. 2 P{2 max IXj - al ~ e}.
ISiS n

Moreover, (2) in turn yields (2') via Corollary 6.2.2.


356 10 Limit Theorems for Independent Random Variables

Apropos of (3), since E{ X I - X'I IX I} = X I' the conditional Jensen


inequality ensures E I XT IP ~ E IX liP. The remaining portion follows
trivially from Exercise 4.2.1 and does not require that E X I = O. 0
Theorem 1 (Weak law). For each n ~ 1, let {X nj , 1 5",j 5", k n -+ oo} be
independent r.v.s, Sn = L~n X nj , and let mnj denote a median of X nj . Thenfor
some real numbers An
(4)
max Imnjl-+ 0, (5)
t $ j,; k n
as n -+ 00 iff
kn
L P {IXnj I ~ t:} -+ 0, t: > 0, (6)
j= I

kn

L (12(XnjIIIXnil< 11) -+ 0,
j= I
(7)

in which case
kn

An - L E X )II
j= I
n X ni!<II-+ O. (8)

PROOF. (6) implies (5) trivially. To prove (4), set

and note that via independence (7) ensures Vn - E Vn .!:. O. Since by (6)
kn

P{Vn -# Sn} S; L P{lXnjl ~ l} = 0(1), (9)


j= I

Sn - E Vn !:.. 0, yielding (4) and (8).


Conversely, if (4) and (5) hold, let (X~I' ... , X~kJ and (X. I , ... , Xnd be
i.i.d. random vectors for each n ~ 1 and set X:j = Xnj - X~j' S: = D~ I X:j .
Then (4) entails S: !:.. 0 and so by Levy's inequality (Corollary 3.3.5) for any
t:>0

n
whence

ex p {- II
P{IX:j ! ~ t:}} ~ J= I
P{IX:jl < t:} = p{maxlx:jl < t:} -+
J $k n
I

as n -+ 00, implying for all t: > 0 that


kn

L P{ IX:jl ~ t:} = 0(1). (10)


j= I
10.1 Laws of Large Numbers 357

Since (5) ensures Imnjl < c;, 1 ~ j ~ k n for all large n, by Lemma 1

2 P{lX:jl ~ c;} ~ P{lX nj - mnjl ~ c;} ~ P{lXnjl ~ 2c;}

for all large n, and (6) follows via (10).


To establish (7), set

V: = L Y:
k"
j •
j= 1

By (6), (9), and (4), v" - An!. 0 entailing V: !. O. Hence, if V:k = D= 1 Y:j ,
1 ~ k ~ kn , by Levy's inequality for all c; > 0

p{max 1V:kl ~ c;} ~ 2 P{I V:I ~ c;} = 0(1). (11)


k$k"

For fixed n ~ 1 and c; > 0, define T = inf{j: I s j s k n, IVnjl ~ e} and


T = 00 if this set is empty. Then 1;, = min( T, kn ) is a bounded stopping
variable and since I Y:jl s 2, I s j s k n ,

whence it follows from the second moment analogue of Wald's equation


(specifically, Corollary 7.4.7) that
T"
e2 + 4(e + 1)P{T ~ k n } = E[V:. TJ2 = E L a 2 (y:j )
1
k"
~ 2 P{T ~ kn } L a 2 (Y,,).
1

As n -+ 00, (11) ensures that P{T ~ kn } = 0(1), yielding


k"
C;2 ~ 2 Imi L a 2 ( Y,,),
n-+ (() 1

and since c; is arbitrary, (7) follows. o


Remark: Sufficiency only requires (6) for e = 1.

Corollary 1. If for each n 1, {X nj , 1 ~ j ~ k n -+ (f)} are independent r.v.s


~
with mnj a median of X nj , 1 ~ j ~ k n, and Sn = D~ 1 X nj , then

(12)

for some real numbers An iff


k"
L P {IX nj - mnj I ~ I} = 0(1), (13)
j= 1
358 10 Limit Theorems for Independent Random Variables

kn

L E(X nj -
j; I
mnj)21Ilxnrmnj!<11 = 0(1), (14)

PROOF. Since Znj = X nj - m nj has zero as a median, (12) follows from (13)
and (14) by Theorem I, noting (9). Conversely, under (12), setting Un =
D~I Znj, B n = An - D~I mnj ,

Un - Bn ~ 0, (15)

whence (13) holds by Theorem l. To prove (14), set

and let (¥~ I> ... , ¥~kJ and (Y" I' ... , ¥nkJ be i.i.d. random vectors. It follows
from (15) and (13) that

and since I(¥nj - ¥~)/31 < I and (Exercise 3.3.3) m( Y,,) = 0, Theorem I and
Lemma I ensure that

which is tantamount to (14). o


Corollary 2. lffor each n ;;::: I, {¥nj' I ~ j ~ k n --+ oo} are independent, positive
LV.S, then Sn = L'~ I Y"j ~ I and maxI 5,j5,k n m( ¥n) --+ 0 iff for all e > 0

kn
L P {Y"j ;;::: e} = 0(1), (16)
j; I

kn

L E ¥njl[Ynj< II --+ l.
j;1
(17)

Moreover, ifL~~ I E ¥nj = I, then Sn!. I and max l 5,j5,k m(¥n) --+ 0 iff
n

kn

j; I
L E ¥nJ[Ynj~'1 = 0(1), e> O. (18)

PROOF. Necessity of (16) and (17) is an immediate consequence of (6) and (8)
of Theorem 1 with An == l. Sufficiency likewise follows from Theorem I once
10.1 Laws of Large Numbers 359

it is noted that for arbitrary e in (0, 1)


kn kn
0::; L (E Y;jI[y nj <
j;1
1) - E 2 y")[Y nj <I ) : : ; L E Y;)[y nj <1)
j;l
kn kn

::; e L E Y"jI[Ynj<tl + j;L1 P{Ynj 2


j; 1
e}.

The final remark is a direct consequence of


kn

o::; e L P{Ynj 2 e}
j; 1

kn kn

::; L E YnjI[Ynj"tl::; L (P{e ::;


j; 1 j; I
Ynj } + E y"jI[y nj " I ) ' 0

Corollary 3. If {X n' n 2 1} are independent r.v.s, Sn =D X j' and {b n , n 2 1}


are constants with 0 < bn i 00, then Snlbn ~ 0 iff
n

L P{lXjl
j; 1
2 bn } = 0(1), (19)

(20)

(21)

PROOF. Apply Theorem 1 twice. Conditions (20), (21) imply


1 n p

bn j~l 2 nj --> 0 (22)

for 2 nj = XjI[IXjl<bnl - E XjIIIXjl<bnl' and so (20) guarantees (22) with


2 nj = XjIIIXjl<bn); moreover, (19) ensures (22) for 2 nj = XjIIIXjl"bn)' and
these combine to yield Snlbn ~ O. Conversely, Snlbn ~ 0 entails ISn-llbnl ::;
ISn - Ilb n- 1 I ~ 0, whence X nlbn ~ 0, and hence m(Xnlbn) = 0(1). Since for
I ::;j::;n

I m(::) I =1 ~~ m(;;) 1 ::; I m(;;) I,

max X.)
m ( ---1
bn
I ::; - I max Im(X)1
bn 1 SjSno
+ max I (X ")
m---1
bj
1

1
1 sjsn I no<jsn

~ ~Up m(~-l)
b)
I~o,
)"no
360 10 Limit Theorems for Independent Random Variables

conditions (4) and (5) of Theorem 1 hold with An = 0, so that (6), (7), (8) are
tantamount to (19), (20), (21). D

The following lemmas will be useful in discussing the strong law.

Lemma 2. If x j, 1 5, j 5, n, are real numbers, Sn = L~ x j, and


Qk.n = L X il X i2 . . . X ik , n ~ k ~ I,
15i l <"'<i,cs n

thenfor n ~ k ~ 2, Qk,n = Li=k XjQk-t.j-t and


s~ - klQk,n = Ck> (23)
where C k is a generic designation for a finite linear combination (coefficients
independent of n) of terms I Or= (D=
1 X~i) of order k, that is, L?'= I hi = k,
1 5, hi 5, k, 1 5, m < k.

PROOF. Suppose inductively that


s: - h! Qh,n = Ch , (24)
Since this implies S~+I - klsnQk,n = SnCk = CHI, it suffices to verify that
SnQk,n = (k + l)QH1.n + CHI' (25)
However, via the induction hypothesis and the identity below,

xt
k-l n
+ (_l)HI L
11

SnQk,n = (k + l)QHI,n + L (-l)i+IQk-j,n L l


xf+!
i=1 j=1 j=1

= (k + l)Q
k+!,n.L..
+ ~I (_I)i+1 [S~-i ~ X!+I + CHI
+ Ck - i ] !--)
(k _ ')1
,=1 I . )=1

+ l)QHI,n + CHI'
= (k
Since (24) clearly holds for k = 1 and 2, the lemma follows. 0

Lemma 3. If k is any positive integer, {Yn, n ~ l} is a sequence of positive


numbers, QOn == I, and
n

Qk,n= LYilYj,"·Yj.= LYjQk-t,j-I' n ~ k, (26)


15i1<·"<j.5n j=k

then Yj 5, A, 1 5, j 5, n, and Qk,n 5, A k imply Li= t Yj 5, (2k - I)A for any


positive integer n and any positive constant A.

PROOF. The lemma is trivially true for k = 1. Suppose inductively that it


holds for k - 1 where k ~ 2. If the set I = {j: k 5, j 5, n, Qk-I, ' - I > A k - I }
has an empty complement, then, since Qk,n 5, A k, (26) yields ti=k Yj 5, A,
implying D=
I Yj 5, kA 5, (2k - l)A. Alternatively, there is a largest integer
10.1 Laws of Large Numbers 361

min I C, whence by the induction hypothesis Lj,;l Yj :5; (2k - 3)A, implying
D= 1 Yj :5; (2k - 2)A. Consequently, noting that (26) entails LjelYj :5; A,

n m
LYj:5; L Yj + LYj :5; (2k - I)A,
1 1 je 1

completing the induction. o


Let 0 < bn i 00 and consider the series
Q)

LI = L b j-
2
aJ,
j= 1

Q) jk-1 j,-1
(27)
Lk = L b;;,2kaJk L aJ.-l··· L aJ" k ~ 2.
j.=k j.-,=k-l h=1
Corollary 5.2.1 states that for independent r.v.s with E X n = 0, EX; = a;,
the convergence ofLI is sufficient for the classical strong law (where bn = n).
The next theorem asserts that convergence of Lk for some k ~ 2 in con-
junction with the necessary condition X n = o(b n), a.c., ensures the generalized
strong law SJb n ~ O.

Theorem 2. Let {X n, n ~ I} be independent r.v.s with E X n = 0, EX; = a;


and set Sn = L~ Xi' If the series ~k of(27) converges for some k ~ 1 and
Q)

L P{lXnl > eb n} < 00 for all e > 0, (28)


n=1
where 0 < bn i 00, then Sn/bn ~O.

PROOF. Define ur,..n = LJ=kbj-kXjVk-l.j-l, n ~ k ~ 1, where V O• n == I


and
n
V k. n = LXjVk-l.j-l = L X i,X i2 ••• Xi.,
j=k 1 :sit <···<ik=s:n

Then {ur,.,n, /Fn' n ~ k} is a martingale, where /F n = a(X j , 1 :5; j :5; n), and,
moreover, E Wrn is the series in (27) modified in that the first summation
only goes up to n rather than 00. The convergence of Lk thus ensures that
{ur,..n, /Fn, n ~ k} is an 2 2 bounded martingale, hence convergent to some
r.v. by Theorem 7.4.3. Consequently, by Kronecker's lemma
n
bn-kU k,n = b-
n "L, X j V k - l , j - l --+
k a.c. 0
. (29)
j=k
This proves the theorem for k = 1, in which case (28) is superfluous. Next, in
view of (27)
n ~-1 j,-l
Zk,n = L bj~2kXJk L XJ._,,,. L Xl., n ~ k,
jk =k j. - 1 =k - 1 h =1
362 10 Limit Theorems for Independent Random Variables

is an !i'l bounded submartingale and hence convergent a.c. Thus, by Kro-


necker's lemma
• j.- 1 j,- I

Qk .• = I XJ. L xJ._,'" L XJ, = O(b;k), a.c.


j. =k
=k - 1j. - I iI =1

Moreover, (28) ensures X; = o(b;), a.c. Applying Lemma 3 with Yj = XJ, it


follows that with probability one

· I (. )h12
I ,I X~ ~ .L XJ = o(b~),
h 2. 2. (30)
J= I J= I

By Lemma 2 there exist finite constants Ck; C I' ... , Ck _ 2 (when k = 2,


merely C2) such that
k-2
(b;;IS.)k = I ch(b;;lS.)hA k_h.• + CkAk.• + klb;;kU k.• , (31)
h=l
where for 0 ~ h ~ k - 2, A k - h •• is a finite linear combination of terms
IJ=
n7'= I (b;;h i 1 X~i) satisfying hi 2. 2 for 1 ~ i < m < k and 1 hi = L:"=
k - h.
In view of (30), Ak-h.• ~O for 0 ~ h ~ k - 2. Thus, according to (31)
and recalling (29), S./b. is a root of a kth degree polynomial in which the
leading coefficient is unity and the remaining coefficients converge (a.c.) to
zero. The conclusion of the theorem follows from the well-known relations
between the roots and coefficients of a polynomial. 0
The next corollary reveals, under the necessary (when 0 < b. i 00) condi-
tion (28) with b. = s.(logs.)a, 0( > 0, that independent r.v.s with zero means,
variances a; and s; = L~=l at
-+ 00, obey the strong law


s;; 1(log s.)-a I Xj~O,
j= 1

thereby generalizing Exercise 5.2.10, where rx > t.


Corollary 4. If {X., n 2. l} are independent r.v.s with E X. = 0, E X; = a;,
then S./b.~O provided (28) and

(32)

hold.
PROOF. Setting s; = L~ aJ, if k~ > 1, the series Lk of (27) converges, being
dominated by

"'.
L.
[b;;2ka;(~1L. a 2)k-l]J
< C~
- .~l s;(log s;)U
a; < 00. o
10.1 Laws of Large Numbers 363

If higher-order moments are assumed finite, the next result asserts that
convergence of a single series suffices for the classical strong law.

Theorem 3 (Brunk-Chung). If {X., n ~ I} are independent r. v.s with EX.


= 0, n ~ I, and for some r ~ I
co EI X.1 2 r
.=L 1 n
r+ 1 < 00, (33)

then (I/n) :L7=1 Xi ~


.z 2.
O.

PROOF. Setting S. = L~ Xi' the submartingale inequality of Theorem 7.4.8


(33) yields

t: 2r P { sup _._J .I} =


IS ~ t: t: 2r lim { I ·1
S r
P max ~
2
~ t: 2r }
j",. } m-co .SjSm}

and so, in view of Lemma 3.3.1 it suffices to show that the right side of (34) is
0(1). Now,

(35)

and by the Marcinkiewicz-Zygmund (Section 3) and Holder inequalities

It follows via (33) and Kronecker's lemma that EIS.1 2 r = 0(n 2r ) and, moreover,
that the series on the right of (35) is bounded by

which converges as n --. 00 by hypothesis. o


Theorem 4. Let {S., n ~ I} be the partial sums ofindependent random variables
{X., n ~ I} with EX. = 0, EIX.la ~ a.,a' A. =: A•. a = (L:7=1 ai,a)l/a --. 00
364 10 Limit Theorems for Independent Random Variables

where 1 < tX ~ 2 and A n +l,JA n • a ~ Y< 00, all n ~ 1. If, for some P in
[0, l/tX) and positive b, c,
00

L P{IXnl > bA n(lOg2 An)I-P} < 00 (36)


n=1

then

(38)

PROOF. Set y" = XnIUXnl:s;cAn(1oR2An)-l'j, Jv" = XnIUXnl>Mn(10R2An)1-I'I' v" =


X n - Y" - Jv". Now since f3 < l/tX,

~ An(lOg2 An)-P L" P{ IX;i > bA;(lOg2 AY -P}


i= 1

and so, in view of (36),

Secondly, (37) and Kronecker's lemma guarantee

1 n

A (I
n og2
A)I
n
p.L (Vi -
1= I
E Vi) ~ O.

Thus, since E X n = 0, it suffices to verify that


1 n

A (I
n og2
A)1
n
p.L (l'i -
1= I
E l'i) ~ O. (39)

To this end, note that if nk = inf{n ~ 1: An ~ yk},


10.1 Laws of Large Numbers 365

and so {nk> k ~ I} is strictly increasing. Moreover, for all k ~ 1

Therefore, setting U. = L7= 1 (}j - E }j), for all E > 0

P{U. > 2y2EA.(I0g2 A.)I-//, i.o.}

: :; p{ max
"k-I <nSnk
U. > 2y2EA•• _,(I0g2 A•• _Y-//, i.O.}
(40)


t; = E U; : :; L E i= 1
XrIlIxil:scAi(IOg2Ai)-Pj

Consequently, setting
EA 2(log A)1 - 2//
A. =
• •
t;
2.
,
it follows that e.' x. = 2e. Since (41) ensures that 2eA.x. ~ 2e«-lE(log2 A.)l-//«
-+ 00 and h(x) ~ (x/2)log(1 + x) as x -+ 00, for all large n (see 10.2(1»

x;h(2eA.) ~ eA. x; log(1 + 2eA.) = E(log 2 A.) log(1 + 2eA.) > 8e 2 1og 2 A•.

Thus, via (3) of Lemma 10.2.1

P {max U. > EA••(log 2 A•.>l-//} = P {max U. > A.•• x•• t•• }


IS.S.. IS.S ••

1
:::;; exp{ -t x ;.h(2eA.,>/4e 2
} :::;;exp{ - 21og 2 A•.} :::;; (k logy)2

and so (40) and the Borel-Cantelli lemma ensure

~ U.
11m 1 /I < 0, a.c. (42)
• -00 A.(I0g2 A.) - -

Since {- (Y• .. E y"), n ~ I} have the same bounds and variances as


{Y" - E Y", n ~ I}, (42) likewise obtains with - U. replacing U. thereby
proving (39) and the theorem. 0
366 10 Limit Theorems for Independent Random Variables

Corollary 5. Let {Xn, n ~ 1} be independent random variables with E X n = 0,


EX; = a;,
s; = 2:7 a;
-+ 00 and Sn+ tlsn :-::; Y < 00, n ~ 1. If for some f3 in
[0, t) and some positive c, [)
00

I P{ IXnI > [)Sn(log2 Sn)! - P} < 00 (43)


n; !

then

(45)

Note that Corollary 5 precludes f3 = t. In fact, (43) and (44) when f3 = t


comprise two of the three conditions for the Law ofthe Iterated Logarithm in
Theorem 10.2.3.

Corollary 6. If {X n' n ~ 1} are independent random variables with E X n = 0,


EX; = a;, s; = Ii a; -+ 00 and IXnl :-::; cpsn(log2 sn)-P, a.s., n ~ 1 where
o :-: ; f3 < t, cp > 0 then (45) obtains provided in the case f3 = 0 that Sn + ! = O(sn).
Prohorov (1950) has shown for bn = nand nk = 2k that convergence ofthe
senes

00

I exp{ -eb;._,!(s;. - s;._J}, e > 0


k;!

is necessary and sufficient for Sn/n ~ 0 when IXnI < Knjlog2 n, a.c. for
n ~ 1. Unrestricted necessary and sufficient conditions depending upon
solutions of equations involving truncated moment generating functions
have been given by Nagaev (1972).

EXERCISES 10.1

I. Verify via independent {X.} with

that (21) of Corollary 3 (b. = n) cannot be replaced by an analogous condition with


a truncated second moment replacing the variance.

2. If {X.} are i.i.d. with Pi = P{X 1 = 2i } = 1/[2i (j + I)j], j ~ I, and P{X! = O} =


I - If Pi' prove that (Log n/n)(S. - n) -!'. - 1. where Log denotes logarithm to the
base 2. Hint: Consider Y. i = XjI(xj$./LognJ'
10.1 Laws of Large Numbers 367

3. Let {X., n::e: I} be independent LV.S with EX. = 0, EX; = 0'; < 00, s; =
L~ af -> 00, and a. = o(s.). Prove the result of Raikov that

. I ~ 2 p
Iff 2 L. X j -> I.
sn j= t

Hint: Apply Corollary 2 to Y. j = XYIs;, I ~j ~ n,


4. Necessary and sufficient conditions for the strong law for independent {X.} with
EX. = 0, EX; = 0';
< 'X! cannot be framed solely in terms of variances. Hint:
Consider
I
P{X. = ±cn} = 2 = HI - P{X. = on, n::e: 2,
2c nlogn
and

p{Xn = ±!o:n} = !~~n = t[1- P{Xn = O}], n::e: 2.

5. Define a sequence of LV.S for which (lin) X;..:.=.. 0 but D L (X .In) diverges a.c.,
revealing limitations to the" Kronecker lemma approach."

6. Let {X., n ::e: I} be independent LV.S with E X. = 0, E X; = satisfying (28). 0';,


(i) Show that if 0'; -
nl(log n)", " > 0, then (*) (lIn) X; ~ O. D
(ii) If, rather, 0'; -
nl(log log n)b, gives necessary and sufficient conditions for
(*) in terms of b when IX.I = D(nllog log n), a.c.

7. Let {X n , I} be independent LV.S with E X n = 0, EX; =


n::e: s; = af, and 0';, D
a;s;_ ~ E X~,
1 where (*) 1 L:'= n-
4 E X~ < 00. Prove that the classical SLLN
holds. Compare (*) with (33) when r = 2.

8. Show that Theorem 2extends to the case where {X n' n ::e: I} are martingaledifferences
with constant conditional variances 0';.
9. If {X n } are independent LV.S with variances 0';, s; = L~ af = o(b;), and
D' (E X~/b~) < X!, where bn i 00, then (JIb;) I L7= xf ~ o.
10. Let X n = b· Yn , n ::e: I, b > I, where {Yn } are bounded i.i.d. random variables. Prove
that (llb n) L7= n
1 X; ~ 0 provided bnlb -> 00. Compare with Exercise 5.2.8.

II. Let IX n, n ::e: I} be independent, symmetric r.v.s and X~ = X.lllx.1<b.J' where


0< hn < (fj.lf X: = XnlllX.lsb.J - XnlllX.1>b.)' then {X:, n ::e: I} are mdependent,
symmetric LV.S with the same joint dJ.s as {X n, n ::e: I}. Let Sn' S~, S: be the cor-
responding partial sums and En > O. Then

[S~ > En] C [Sn > en] v [S: > en]


and
P{S~ > en, i.o.} ~ 2 P{Sn > En' i.o.}.
12. Let X n = an Y", n > I, where {Yn} are bounded i.i.d. LV.S and

0'; = (log n)- I ex p { Un },


log n
n > I, i. > O.

Show that s; = D
af - (2),) - I exp{2).nllog n} and that Ilb n L7 = 1 X i ~ 0 when-
ever s; log log s; = o(b;).
368 10 Limit Theorems for Independent Random Variables

10.2 Law of the Iterated Logarithm


One of the most beautiful and profound discoveries of probability theory is
the celebrated iterated logarithm law. This law, due in the case of suitably
bounded independent random variables to Kolmogorov, was the culmina-
tion of a series of strides by mathematicians of the caliber of Hausdorff,
Hardy, Littlewood, and Khintchine. The crucial instruments in the proof
are sharp exponential bounds for the probability of large deviations of sums
of independent r.v.s with vanishing means.
The next lemmas are generalized analogues of Kolmogorov's exponential
bounds. The probabilistic inequality (3) is of especial interest in the cases
(i) i' n == A, Xn -- 00, Cn -- 0, and (ii) An -- 00, CnX n = a > O.
Define
h(x) = (1 + x)log(1 + x) - x, x ~ 0
(1)
g(x) = x- 2 (e X
- 1 - x), -00 <x< 00

Lemma 1. Let 8n = IJ=l Xi where {Xi' 1 ::; j ::; n} are independent r.v.s with
s;
E Xi = 0, E Xl = ul, = IJ=l ul > o.
(i) If P {Xi::; CnSn} = 1, 1 ::; j ::; n, then
t > 0, (2)

and if, in addition, CnX n ::; an, thenfor all An and X n such that an· X n > 0, anAn > 0

P {max s"J >


-
s } -<
A.n x n" e-h(an}.n)x~/a~ (3)
1 $i$n

t> 0, (4)

and if, in addition, ui ::; CnSn' 1 ::; j ::; n,


E elSn/Sn ~ exp{t 2 g( -tcn)[1 - t 2 c;g( -tcn)]}
~ exp{t 2 g( -tcn)[1 - (t 2 c;/2)]}, t > O. (5)

PROOF. The representation g(x) = SA So e XY dy du shows that g is non-negative,


increasing and convex. The point of departure for (2), (3), (4) is the simple
observation

tX"J = 1 + t E --+
2
EelXj/Sn = I + E [elXj/Sn - I - _J X g (tX")2 _J . (6)
Sn Sn Sn

Hence, monotonicity ensures under (i) that if t > 0,


t2u~
E elXj/Sn ::; I + -f g(tc n) ::; exp{t 2 g(tc n)uJ/s;} (7)
Sn

and (1) follows via independence.


10.2 Law of the Iterated Logarithm 369

If, rather, (ii) obtains, then (6) in conjunction with the elementary in-
equality (l + u)e > e u > 0, yields for t > 0
Ul U
,

whence (4) is immediate. Under the additional hypothesis aj ~ c"s", 1 ~ j ~ n,


necessarily S;4 IJ=I af ~ c;, and (5) follows from (4) in view of g(O) = t.
To establish (3), note that via Example 7.4.9 and (1), for t > 0

p{ m~x Sj ~ }X"S"} ::::;; e-J.txnsn E etSn ::::;; exp{ -AtX"S" + t s;g(c"s"t}}, (8)
I S)S"
2

and so, setting a = a" and t = bx"/s" where bx" > 0,

p{ max Sj ~ AX"S"} ~ exp{ -x;[Ab - 2


b g(c"x"b)]}
ISjS" ::::;; exp{ -x;[Ab - b 2g(ab)]}

Employing the definition (1) of g, the prior exponent is minimized at b =


a-I log(1 + aA) with a value of
_x 2 _x 2
-f
a
[(1 + aA) log(1 + aA) - aA] = -f
a
h(aA)

Clearly, nothing precludes A or a from being a function of n. o


The simple inequality of (3) yields a generalization of the easier half of the
law of the iterated logarithm. In what follows log2 n will abbreviate log log n.

Corollary 1. Let {X", n ~ 1} be independent r.v.s with EX" = 0, EX; = a;,


s; = I~ al-+ co.
(i) If P{X" ::::;; d,,} = 1, n ~ 1 where d" > 0, then with probability one

(9)

according as (lOg2 S;)1/2d"/s" -+ 0 or (log2 S;)1/2d"/s" -+ a > O.


(ii) If P{IX", ::::;; d"} = 1, n ~ 1 where (lOg2 S;)1/2d"/s" -+ a, then with prob-
ability one,

(10)

PROOF. Let b" = s"(log2 s;r 1/ 2 and suppose d"/b" -+ a ~ O. Since 0 < b" i co, it
follows that b;l(max l 5;;iS" d;) -+ a and so d" may be taken to be increasing.
For any ex>1, define no=inf{n~1:s"~ex} and nk = inf{n> nk-I:
k
s" ~ exS"k_I}' k ~ 1. Then S"k_l ~ S"k- 1 < exS"k_l so that S"k- 1 ~ ex and
log2 S;k_l '" log2 S;k- I · Hence,
370 10 Limit Theorems for Independent Random Variables

P {Sn > Aa 2sn(log2 S;)1/2, i.o.} ::5: P { max Sn > Aa 2Snk _1 (log2 S;k_Y/2, i.O.}
nk-I ~n<nk

::5: P {max Sj> ASnk-1 (lOg2 s;k_d l/2, i.O.} (11)


1 $.j<nk

For any y > 0, setting A = (l/u)h- l (y), X n = (lOg2 S;)1/2, n = nk - 1 and


noting for any u > a and all large n that Cn(lOg2 S;)1/2 = (lOg2 s;)1/2dn /s n < u,
Lemma 1 ensures that for all large k

p{ I
max Sj > !h-l(y)Snk-I(lOg2S;k-dl/2}::5:
$;j<nk U
eXP{-~h(h-I(Y»IOg2S;k_l}
U

::5: (2k log a)-y/u 2


Hence, via (11) and the Borel-Cantelli lemma, for all y > u 2

P {Sn > :2 h- l (y)sn(log2 S;)1/2, i.O.} = 0

whence with probability one for a > 0,


-.- Sn a2 . -I h- l (a 2 )
11m 2 1/2::5: - mf h (y)!--- (12)
n-oo Sn(log2 sn) a y>u2 a

proving the second part of (i). If rather, a = 0 then (12) holds for arbitrarily
small a. Since h(a) '" a2/2 as a -+ 0, necessarily h-I(a) '" (2a)I/2 whence
(l/a)h- l (a 2 ) -+.j2 as a -+ 0 yielding the first portion of (i). Finally, under (ii),
(12) holds for both Sn and -Sn so that (10) obtains. 0

For any positive integer n, let {Xn, j, 1 ::5: j ::5: n} constitute independent LV.S
with dJ.s Fn,j and finite moment generating functions ((In.J{t) = exp{I/Injt)}
for 0 ::5: t < to' Suppose that Sn,n = 2:J=1 Xn,j has dJ. Fn. For any tin [0, to),
define associated d.C.s F~~~ by

F(t) ~x) I
= ---- IX e'Y dF ~y)
n,J ({Jnjt) -00 n,J

and let {Xnjt), I ::5: j ::5: n} be (fictitious) independent r.v.s with dJ.s {F~!j'
1 ::5: j ::5: n}. Since the d. of Xnjt) is ({Jnjt + iU)/({JnJt), setting I/In(t) =
L:J= I I/Injt), the c.f. of Sit) = L:J= 1 Xnjt) is given by
nn ({In J{t + iu) }
Ee
. S
1U
"
(I)
= . = exp{I/I.(t + iu) - I/In(t) .
j= I ((Jnjt)
Thus, the mean and variance of Sn(t) are I/I~(t) and I/I;(t) respectively and,
moreover, the dJ. of Sit) is
10.2 Law of the Iterated Logarithm 371

whence for any tin [0, to) and real u


P{S._. > u}

= exp{l/I.(t) - tl/l~(t)} foo exp{-tyJI/I~(t)}dF~}(yJI/I~(t) + I/I~(t»,


[u-I/f~(I})/,~
(13)

If 1/1. and its derivatives can be approximated with sufficient accuracy, (13)
holds forth the possibility of obtaining a lower bound for the probability that
a sum of independent LV.S with zero means exceeds a multiple of its standard
deviation.

Lemma 2. Let {Xj , 1 ~ j ~ n} be independent r.v.s with E Xj = 0, E Xl = al,


s; = L~ al > 0, and P{IXjl::;; d.} = 1, 1::;; j ::;; n. If S. = X j and LJ=1
lim.~oo d.x./s. = 0, where x. > X o > 0, thenfor every y in (0, 1), some C y in (0, .!),
and all large n
P{S. > (1 - y)2 S• X .} ~ Cyexp{ -x;(1 - y)(l - y2)/2}. (14)

PROOF. Let qJj(t) denote the m.gJ. of Xj and set S._. = S./s. = IJ=1 X)s. and
c. = d./s.,
°
Since, in the notation leading to (13), qJ.)t) = qJit/s.), 1 ~ j ::;; n,
and gl (x) = X-I (eX - 1)j, it follows for t > and 1 ::;; j ::;; n that
< a2
'1'._ J'
m' It) = -dtd E e,Xj/ = E --!.

S
s
"(e,Xj!s" - 1) - tg (+tc)---l.
~ I - • S2'
• •

where g is as in Lemma 1. Hence if I/I.(t) = LJ= 1 I/I.Jt),


• ~ tg 1 (tc.)
I/I~(t) = L I/I~)t)
j=l ~ tg 1 ( -tc.)/[l + t 2 c;g(tc.)]. (15)
Moreover, since !gl(X) - 11 < (lxl/2)[1 - (lxl/3)r 1 for °< Ixl < 3 and
372 10 Limit Theorems for Independent Random Variables

Ig(x) - 11 < (Ixl/6)(1 - (!xl/4W 1 for °< Ixl < 4, iflimtnc n = 0,


n
t/t~(tn) = L
j=l
t/t~.j(tn) = 1 + O(tncn)· (16)

Thus, via (5), (15), (16) and g(O) = 1, gl(O) = 1, for any y in (0,1) and all
large n
2
tn
t/tn(tn) - tnt/t~(tn) ~ t;[g( -tncn) - t;c;g2( -tncn) - gt(tnCn)] ~ -2 (l + y),
= (1 - y)t n - t/t~(tn) = _ t (1 + (1» < -ytn
Vn - J t/t~(tn) y n 0 - 2 .

Consequently, taking u = (1 - y)t n in (13),

x r
P{Sn> (1 - y)sntn} ~ exp{t/tn(t n) - tnt/t~(tn)}

exp{ - tyJ t/t~(tn)}dF~n) (yJ t/t~(tn) + t/t~(tn»

~ exp{ -t;/2)(1 + y)} fO dF~n)(yJt/t~(tn) + t/t~(tn»


-Y'n/2
~ C y exp{( - t;/2)(1 + y)} (17)

since
n
Sn(t n) - t/t~(tn)
LZnj~ No,t
Jt/t~(tn) j= 1

by Exercise 9.1.2 or Corollary 12.2.2 in view of


n

E Zn,j = 0, L E Z;j = 1, and


j= 1

Finally, set t n = (1 - y)x n in (17) to obtain (14). o


Remark. If X n --+ 00, then for every y in (0, 1) the constant C y > 1- for
°
f.
all f. > provided n ~ some integer Nt.

The strong law asserts under certain conditions that with probability one
sums Sn of independent r.v.s with zero means are o(n). In the symmetric
Bernoulli case, Hausdorff proved in 1913 that Sn ac O(n(l/2)+t), € > 0. The
order of magnitude was improved to O(Jn log n) by Hardy and Littlewood
in 1914 and to O(Jn log2 n) by Khintchine in 1923. (Here, as elsewhere,
log2 n denotes log log nand logk + 1 n = log logk n, k ~ 1). One year later
Khintchine obtained the iterated logarithm law for the special case in
question and in 1929 Kolmogorov proved
10.2 Law of the Iterated Logarithm 373

Theorem 1 (Law of the Iterated Logarithm). Let {XO, n ~ I} be independent


r.v.s with E X o = 0, EX; = 17;, s; = L~ 171--+ 00. If IXol ~ do, a.c., where the
constant do = 0(So/(lOg2 so)1 /2) as n --+ 00, then, setting So = L?=1 Xi'

(18)

PROOF. Choose the integers nk , k ~ 1 such that SOk ~ a l < SOk+1 and note that
u;/s; = 0(1) whence SOk ' " a k • According to Corollary 1,

a.c. (19)

To establish the reverse inequality, choose y in (0, 1) and define independent


events
k ~ I,
where, since SOk '" ak , a > I,

(20)

for all large k. Thus, taking X Ok = hk in Lemma 2, noting (20) and that
dokhk/g k = 0(1),
P{Ad ~ C y exp{ -hf(1 - yXI - yZ)/2} ~ C y exp{ -(1 - yZ)Zlog k}
Cy
=k~

for all large k, whence by the Borel-Cantelli theorem


P{Ak> i.o.} = 1. (21)
Next, choose a so large that (l - y?(1 - a-Z)lIZ - (2/a) > (I - y)3 and
set to = j2logz So, implying for all large k that
(1 - y)Zgkhk - 2s0k _,t ok _, '" [(1 - y)z(1 - a- Z)I /Z - 2a-l]sOktok
> (l - y)3soktok·
Hence, setting Bk = {I SOk _ ,I ~ 2s0k _ ,tOk _,},

AkBk C {SOk > (1 - y)Zgkhk - 2s0k _,tOk_,} c {SOk > (I - y)3soktoJ


again for all large k. However, (ii) of Corollary 1 guarantees P{BL i.o.} = 0,
which, in conjunction with (21), entails
P{SOk > (I - y)3soktOk' i.o.} ~ P{A k · B k, i.o.} = 1.
Thus, with probability one
374 10 Limit Theorems for Independent Random Variables

and letting y! ° the reverse inequality of (19) is proved. o


Corollary 2. Under the conditions of Theorem 1

To extend the law of the iterated logarithm (LIL) from bounded to un-
bounded LV.S without losing ground, a refined truncation is necessary. This
means that the truncation constants, far from being universal, should (as
first realized by Hartman and Wintner in the i.i.d. case) depend upon the
tails of the distributions of the r.v.s involved.
Let {X., n 2:: I} denote independent random variables with EX. = 0,
EX; = a;, s; = D=t a? -+00. Then {X., n 2:: I} obeys the LIL if (17)
obtains.

Theorem 2. If {X., n 2:: I} are independent LV.S with EX. = 0, EX; = a;,
s; = L~ a? --+ 00, in order that {X., n 2:: l} obey the LIL it is necessary that
00

L P{X. > <5S.(I0g2 S;)t/2} < 00, <5 > fi. (22)
.=1

PROOF. If b; = 2s; log2 s; and S. = L~ Xi' So = 0, then lim._", Snlb. 3':;,1. Now
°
S._llb. ~ since a 2 (S._d = S;_I = o(b;), and clearly S._I is independent of
~X., X.+1' ... ) for all n 2:: 1. Hence, by Lemma 3.3.4(ii) lim X.lb. ::;; 1 + e, a.c.
and (22) follows by the Borel-Cantelli theorem. 0

Corollary 3. Under the hypothesis of Theorem 2, in order that both {X.} and
{-X.} obey the LIL, it is necessary that
00

L P{ IX. I > <5S.(I0g2 S;)1/2} < 00, <5 > fi. (23)
.=1
The next result stipulates two conditions which, conjoined with (23) for a
fixed <5, are sufficient for the LIL. One of these, (25), clearly implies the
Lindeberg criterion and hence the asymptotic normality of L~ X )s•.

Theorem 3. If {X., n 2:: I} are independent r.v.s with EX. = 0, EX;


s;= L~ a? --+ 00, and dJ.s {F., n 2:: I} satisfying for some <5 > ° = a;,

00

L P{IX.I > <5S.(I0g2 S;)1/2} < 00, (24)


.=t
10.2 Law of the Iterated Logarithm 375

(25)

L
00
2
n= 1 Sn(\og2
1 2
i
Sn) [£Snllog2s~)-1/2<lxI56sn(log,s~)1/2)
X
2
dFn(x) < 00 for all e > 0,
(26)
then the law of the iterated logarithm (17) holdsfor {X n} and -{X n}. Alter-
natively, if(24) is valid for all e5 > 0, (25) obtains and (26) is replaced by
00 j. - 1 h - 1
L(S].log 2 S]k)-kyj• L Yj._,··· LYj,<oo, (27)
j. = k j. - I =k- 1 j, = 1

for some k ~ 2 and all e > 0, where

then the LIL likewise holdsfor {X n} and {-X n}.

PROOF. Condition (25) implies

<fJn(e) = max S;;;2


m ~n
f
j= 1
i [x 2 > £2sJllog2 S7) - ']
x 2 dF/x) = 0(1), e > 0,

and hence permits the choice of integers nk+ 1 > nk such that <fJn(k - 2) < k - 2
for n ~ nk> k ~ 1. Define e~ = k - 2, nk ::s; n < nk+ l' k ~ 1. Then e~ l and °
for nk ::s; n < nk+ I

::s; <fJn(en) ::s; <fJn.(en) < k- 2 = 0(1) (28)

as n -> 00 provided en = e~. Proceeding in a similar spirit with the tail of the
series of (26), there is a sequence e~ = 0(1) such that

nk + 1

L L ... ::s; L k-
00 00

= 2
< 00, (29)
k= 1 n>n. k= 1

where en = e~.
Consequently, en = max(e~, e~) = 0(1) and both (25) and (26) hold with e
replaced by ej and en respectively.
Define truncation constants {b n , n ~ I} by
376 10 Limit Theorems for Independent Random Variables

and set

n n n

S~ = L Xj, S"n = "l..J X/~J' Sn'" =-I..


"X/~/
J O
'

I I I

Now
0'; - at = E X;IlIx.l>b.) + E 2
XnIIIX.lsb.1 ::; 2 E X;IlIx.l>b.I'
recalling that E X n = 0, and so (28) ensures o'~~ - at. Thus Theorem
yields
S~ - E S~ a.c. I.
fIn1 2 1/2 (31)
n- 00 sn(2 log2 Sn)

Secondly, Kronecker's lemma and (29) guarantee that

(32)

Thirdly, (24) implies that S;' = 0(1) with probability one, and, further-
more,

IE S;/I::; ±i
i= I [Ixl >clsj(loK2S~)1/2)
IxldFi x )

: ; it i l clSil lO K2 s~)1/2 < Ixl :ss.(log2 S~) _ 1/2 ) 1x I dFi


x
)

+ itl LXI>s.(IOg2S~)_1/2)IXldFJ{X)

via (24) and (25).


The first portion of the theorem is an immediate consequence of (30),
(31), (32), and the assertion just after (32).
In the alternative case, note that since Yn(l» and hence the series of (27) is
decreasing in 1>, there exists, as earlier, a sequence I>n = 0(1) such that (25) and
(27) hold with I> replaced by I>j and I>n respectively. Define
bn = I>n SnO og2 S;)- 1/2
and X~, X~, X;' as in (30), but now with (j = 1 and the new choice of bn.
The only link in the prior chain of argument requiring modification is that
used to establish (32).
10.2 Law of the Iterated Logarithm 377

Now

in view of the strengthened version of (25), and so for any b > 0


00

L P{JX~ - E X~I > bsn(logz s;)t/Z}


n= 1

~ 0(1) + JIP{IX~I > ~SnCIogz S;)I/Z}


~ 0(1) + JIP{IXnl > ~Sn(lOgz S;)t/Z} < 00 (33)

for all b > 0 as hypothesized.


Since the variance of X~ is dominated by 'YnCen ), it follows from the
strengthened (or en) version of (27) and (33) that Theorem 10.1.2 applies to
X~ - E X~ with bn = snClogz S;)I/ Z. Thus (32) and the final portion of the
theorem follow. 0

The first corollary reduces the number of conditions of the theorem while
the second circumvents the unwieldy series of (27).

Corollary 4. If {X n} are independent random variables with E X n = 0, EX;


= a;, = s; LI af --+ 00, satisfying (25), and for some (J. in (0, 2],
f Z 1
n=1 (sn logz sn)
z alZ r
JlIxl>tsn(IOg2S~)-'121
Ixl a dFn(x) < 00 for all e > 0 (34)

then the LIL holdsfor {X n} and {-X n}.


PROOF. Clearly, the series of (34) exceeds the series obtained from (34) by
restricting the range of integration (i) to (esnClogz S;)-I/Z, bsn(logz S;)I/Z]
or (ii) to (esn(logz S;)I/Z, 00). But the series corresponding to (i) dominates
the series of (26) multiplied by ba - Z, while the series.corresponding to (ii)
(with e < b) majorizes the series of (24) multiplied by ba • 0

Corollary 5. Let {X n} be independent random variables with E X n = 0,


E X; = a;, s; = LI
af --+ 00, satisfying (24) for all b > 0, and (25). Iffor
some p > 0,

(35)

where

then the LIL holdsfor {X n} and {-X n}.


378 10 Limit Theorems for Independent Random Variables

PROOF. For all e > 0,

for k > lip. o


All the prior conditions for the LIL simplify greatly in the special case of
weighted i.i.d. random variables. Define such a class Q by
Q = {a. Y., n ~ I: Y",n ~ l,arei.i.d.randomvariableswithmeanO,variance
af in (0, co] and a., n ~ 1 are nonzero constants satisfying
s; = Ii a1 --> co} (36)
and let F denote the common dJ. of {Y.}.
To obtain the classical Hartman- Wintner theorem governing the i.i.d.
case only part (i) of the following"theorem is needed.

Theorem 4. Let {a. Y,,} E Q with a; = 0(s;/log 2 s;) and ai = I. If either (i) for
some (X in (0, 2]

a; 2),,/2 i
f (2s. log2 Iyl" dF(y) < co for all e> 0 (37)
n=I s. [y2;" esMC1~ log2 s~)

or (ii)

~ {2
.~I P Y I >
[)s; log2
a;
s;} < co fi II ~
or aU>
0 (38)

and for some p > 0,

for all e > 0

(39)

then the LIL holds for {anY,,}, that is,

(40)

PROOF. In the weighted i.i.d. case, condition (25) becomes


2:
s.
La;
j= 1
i
[y2>es'/a2Iog2s'l
J J J
y2 dF(y) = 0(1), e> 0, (41)

and is automatic whenever the integral therein is o( I), that is, whenever
10.2 Law of the Iterated Logarithm 379

a; = o(s;/log 2 s;). The first part of the theorem thus follows from Corollary 4
since (37) is just a transcription of (34).
Likewise, (38) is a transliteration of (24), and so the second portion will
follow from Corollary 5 once (35) is established. To this end, note that
a; = 0(s;/log 2 s;) entails lim a;/s;_ 1 = 0, whence

log(I+(a;/s;_I» O(a;/s;_I) 0(1)


an == log s;_ 1 = log s; _ 1 = log s;- 1

and
log s; = (I + an)log s; - I' (42)
so that,
s;(Iog s; _ I)P log2 s;
= (s;_ I + a;)[(1 + an)-I log S;]P[lOg2 S;_I + log(I + an)]

= (I + ~)[I
S;_I
+ 0(1)
P log S;_I
][1 + 0(1)
(log s;_I)log s;
]

X s; _ 1(log s;)P log2 s;- 1


= s;_I(log s;)P log2 S;_I + (I + o(l»a;(Iog s;)P log2 S;_I'

Hence, if qn == qn(f:) = C,(I0g2 s;/(Iog s;)P), noting that


logi S;_I = (I + o(l)log i s;, i = 1,2,

s; log2 s; _ S;_I log2 S;_I = (I + o(l»a; log2 S;-l = C-I(I (1» 2


2 P
(log sn) (I og Sn-
2)p
I (I og Sn-I
2)p ,+ 0 an qn,

implying for all large n that


n 2 2 Sn2 1og2 Sn2 2C,Sn2 Iog2 Sn2
L a·q· <
j= 1 J J -
C < ------.==o----"-i'--
'(log s;)P - (log L~ aJq)P
(43)

Consequently, qn(f:) = Cllog 2 s;/(Iog s;)P) (and a fortiori qn(f:) =


O(lOg2 s;/(Iog s;)P» entails (43). But (43) is precisely (35) in the weighted
i.i.d. case since then Yn(f:) = a;q.(f:). 0

The status of the LIL in Q is conveniently described in terms of

Yn = n2
a; , n~l. (44)
Sn
Note that YI = 1,0 < Yn < n, n > I, and

S2
~=n
n ( Y
1--.1
)-1 (45)
si j= 2 j
380 10 Limit Theorems for Independent Random Variables

Consequently, under the hypothesis

-Yn = -a; < 1 - -I lor


l"
some u~ > 1, (46)
n s; - J

it follows from (45) that for some c > 0

IOg2 s; ~ (l + o(l»log n. (47)

Lemma 3. If {an, n ~ I} satisfies (46), s; -+ OC) and

Yn = o((log s;)log 2 s;), (48)

then for every J-II > 0 and real J-I2 necessarily nl'l/(Iog2 s;t 2 i OC) (aI/large n)
and

(49)

PROOF. Under (46), recalling (45) and employing

(l-~rl = 1 +~(l-~rl ~exp{JYn/n},


there follows

log s; = log S;_I - log(l - Yn)


n
~ (1 + n Iogsn-I
JYn 2 )Iog s;_ \0

implying

Therefore, noting that these entaillog j s; = (l + o(l»log j S;_I' i = 1,2,

n _1)1"( log S2 )1'2 >1- (1)1"[


1- ( -- 2 n1--n 1+ n(log S;-I»)Og2
Jy ]1'2 n
n IOg2 S;_I - S;_I

J-II + 0(1)
n

for J-I2 ~ 0; the same conclusion is obvious when J-I2 < 0, so that for all J-I2
nI'l (n - 1)1'1 J11(l + 0(1»
(50)
(I0g2 S;)1'2 (I0g2 S;_1)1'2 ~ n I'I(lOg2 s;t 2 '
l
10.2 Law of the Iterated Logarithm 381

whence for all large n


1 2 nl"
L1 /
n

j= 1"(log2 sJ'r 2< - -::------..-:-::-


- Jl.l (lOg2 S;)1'2'

which is tantamount to (49). Moreover, (50) ensures that nl"(log2 S;)-1'2 is


increasing for all large n. When Jl.2 > 0, (47) guarantees that it tends to 00 as
n -+ 00, whereas this is obvious if Jl.2 ~ 0. D

Theorem 5. If {un Y,,} E Q, where u; = o(S;/lOg2 s;) and ')In = 0((lOg2 s;)P) for
some p < 1, then the LIL holds for {un Y", n 2: 1} provided E y2 < 00.
PROOF. According to Theorem 4 it suffices to verify (37) for some IX in (0, 2].
Now the hypotheses entail ')In = o(n), thus a fortiori (46) and also ')In ~
K(log2 s;l, whence Lemma 3 is applicable. Setting

ej ej

J
== ')Ij log2 s} -> K(lOg2 SJ)1 +/1 == q.,J

this lemma guarantees qj i 00 all large j (and for convenience this will be
supposed for allj 2: 1); the lemma also certifies for any IX in [0, 2) that

Consequently, for any e > ° and some constant K. in (0, (0)

~ K. f
n=1
(lOg2 S;)1 +p-a [
JIQn S y 2<qn+tl
y2 dF(y) < 00

provided 1 + P~ IX < 2. Thus, (37) obtains and the theorem is proved. D

Corollary 6. If s; -+ 00, ')In = 0(1), and {Y, Y", n 2: 1} are i.i.d. random
variables with E Y = 0, the LIL holds for {Un Y,,} and {- Un Y,,} iff E Y 2 < 00.
PROOF. The hypothesis implies (46), whence (47) ensures

u; log2 S; _ ')In log2 S; S; -


---"-----'<-2"----"- -
Clog n _
-- - 0
(1)
,
Sn n n
and so the conclusions follow from Theorems 5 and 6. D
382 10 Limit Theorems for Independent Random Variables

In the special case an = I,n ~ l,necessarilY1'n = I,n ~ l,andCorollary6


reduces to

Corollary 7 (Hartman- Wintner). If {Yn} are i.i.d. random variables with


E Y\ = 0, the L1L holdsfor {Yn} and {- Yn} iffE Yi < 00.

In Q, the necessary condition (23) for the two-sided L1L becomes

~
L, p{ 2
Yt >
~ n log2 s;a~} <
U 00, J > 2a~. (51 )
n= I 1'n

If 1'n increases faster than C log2 s;,


(51) asserts that something beyond a
finite second moment for Yt is required for a two-sided L1L. On the other
hand, if 1'n = 0(1), (51) does not even stipulate that the variance be finite.
Nonetheless, this is necessary for a two-sided L1L according to

Theorem 6. Let {an' n ~ I} be nonzero constants satisfying s; = L~ a; --+ 00,


a; = 0(s;/log 2 s;). If {Y, Y", n ~ I} are i.i.d. with E Y = 0, E y 2 = 00, then

p{lim ILJ=t
n~ 00 sn(log2
aj~~
sn)
= oo} = 1. (52)

PROOF. Let {Y:, n ~ I} denote the symmetrized {Yn } and for c > 0 set
- Y*I
Y 'n- n IIY~I:scl' X'n-an
- Y'n,an d ac2 -- E y'2
n' Thensn,2 =L,j=taXj-acsn,
- "n 2 _ 22
whence {X~} obey the conditions of Theorem I, implying

-
{n~
P lim
00
d- }
L~-I a·Y'·
Sn Og2 sn)
t~2 > ac
}
= 1.

By Lemma 4.2.6

(53)

and since ac --+ 00 as c --+ 00, (53) holds with ac replaced by + 00, which, in
turn, yields (52). 0

Theorem 7 (L1L for non-degenerate V-statistics). Let Vk,n(h), n ~ k ~ 2 be a


sequence of V-statistics with E h = 0 and 0'2 = E[E{h(X 1, .•. , X k)IX 1 }]2 E
(0, (0). If
E{h(X 1 , ... , X k)IX 1 , ... , Xj} E 2'2j/(2j-l)' 2 ~j ~ k
(a fortiori, if E Ih 4/3
1 < (0), then with probability one

limn~oo (21ognlog n )1 2V k' n(h)


/
-ka =

~ nmn~oo (21ognlog n)1 2V k'n(h) = ka,


/
10.2 Law of the Iterated Logarithm 383

PROOF. Via Lemma 7.5.5,


/
~ t
( og og n 2 Uk.n(h) = (2 n Iog og n)1 /2 r= 1 E{h(X r, ... , Xr+k-dIXr}
21 nl )1

+ o((log log n)-1 /2), a.s.


Since y,. = E {h(X" ... , X r +k _ 1) IX r }, r ~ 1 are i.i.d. random variables with
E y,. = 0, E Y,.2 = (12 E (0, 00), the conclusion follows from Corollary 7. D

EXERCISES 10.2
I. Show under the conditions of Theorem 1 or Corollary 7 that
~ S. r:. a.c.
hm ". 2 ". 2 1/2 = V 2,
'-00 (L.,j=1 X j IOg2 L.,j=1 Xj)
2. If a.A.. ~ e - 1, the upper bound in (3) of Lemma 1 may be replaced by
exp{ -(A..' x;)fan(e - I)}.
3. Verify that the LIL holds for independent r.v.s X. distributed as N(O, a;) provided
5; = L~ a1 -+ 00, a. = 0(5.). Hint: Use a sharp estimate of the normal tail for the
dJ. of S./s•.

4. If {Xj' 1 S; j S; n} are independent r.v.s with P{Xj S; Jl + d.} = 1, d. > 0 where


E Xj = Jl, aij = a 2 E(O, (0), 1 S; j S; n then for h as in (I),
2
P{S. ~ n(1 + e)Jl} S; ex p { -d7 h(eJl d./a 2 )} or exp{ -nJle/d.(e - I)}

according as eJl > 0 oreJl ~ (e - 1)a2 /d•. Hint: Apply Lemma 1 with A.. = eJl/a 2 ,
x. = an l /2 .
5. When {X.} are independent with EX. = 0, EX; = a;, = D af, X. S;
5; c.s. i,
a.c., lim c.x. = 0, check via (2) that for all y > 0, r > 0, and all large n

p{ max Sj >
t SJ'5n
(I + y)'x.s.} s; exp{ -!x;(I + y)2.-I}.

6. Under the conditions of Theorem I, show that with probability one that every point
of [ -1,1] is a limit point of S./s.(2Iog 2 S.)1/2. Hint: For d # 0, 0 < d < I, y > 0,
setting ak = (I - y)dh k, bk = (I + y)2 dh k, and 7k = s •• - S•• _" for all large k
P{ghak < 7k < bkgd = P{7k > akgd - P{7k > bkgd
-(I + y)ai } {-(I + Y)bf} 1 {-(I - Y)d hi }
2
~ Cyexp { 2 - exp 2 > "2Cyexp 2

via Exercise 5.

7. Let {y,,} be i.i.d. with E YI = 0 and let s; = D


a1 = exp{n: (log; ny'}, where
ex; ~ 0, i = 1,2,3. Note that if 0 < ex l < I, or ex l = I, 0 < ex2 < I, Theorem 5
applies. Show that if exl > I, the two-sided LIL holds for {a. y"} iff
E y 2 (log I YI)",-I(lOg21 YI)"2- 1 (log 3 1YI)"' < 00.

8. If {X., n ~ I} are independent r.v.S with P{X. = ±n"} = tn-/l, P{X. = O} =


1 - n-/l, then {X.} obeys the LIL if 1 - P> max(O, -2ex), P> O.
384 10 Limit Theorems for Independent Random Variables

9 Let S. = D'=I Xi where {X, X., n ~ I} are i.i.d. with E X = 0, E X 2 = 1 and let T
be an {X.}-time with E T < 00. If T,. = Ij=1 TW where TW, j ~ 1 are copies of T,
then

10. If {X., n ~ I} are i.i.d. with E ei'x, = e-I'I', 0 < a. ~ I, prove that
P{Ilill!n-I/'S.ll/lou. = ell'} = I,

that is, P{IS.I > n /'(Iogn)(1+t ll', i.o.} = Oor 1 according as/; > 0 or /; < O. Hint:
l

Show that P{n-I/'IS., > x} = P{lX II> x} and use the known (Chapter 12)
fact that PiX II > x} - Cx-' as x --+ 00.
11. If {X., n ~ I} are interchangeable r.v.s with E X I = 0, E xi = I, Cov(X I' X 2) =
0= Cov(Xi, X~), then 1lill(2n log2 n)-112 D
Xi = I, a.c.

12. For {X., n ~ I} as in Lemma 2 except that (*) lim d.x./s. = a > 0, x. --+ 00, prove
that for all y in (0, 1) and all u in (0, uo)

p{s. > I: Y (-'e~~~")s.x.} ~ (t + O(l»eX P{ ~~; [h(u) + o(l)]},


where Uo is the root of the equation e -" = (e" - u)(e" - 1)2 and

h(u) = U2[gl(U) - g( -u) + u 2g 2( -u)]


with 9 and gl as in Lemma 2. Utilize this to conclude under these conditions with
x. = (I0g2 S.)112 that Ilill S.(s; log2 S.)-112 = C E (0, (0), where C depends upon a
and perhaps also the underlying dJ.s.

10.3 Marcinkiewicz-Zygmund Inequality,


Dominated Ergodic Theorems
The first theorem, an inequality due to Khintchine concerning symmetric
Bernoulli trials, will playa vital role in establishing an analogous inequality
due to Marcinkiewicz and Zymund applicable to sums of independent r.v.s
with vanishing expectations.

Theorem 1 (Khintchine Inequality). If {X., n ~ I} are i.i.d. r.v.s with


t
P{X I = I} = P{X I = -l} = and {c.} are any real numbers, thenfor every

t
pin (0, (0) there exist positive,finite constants A p , Bp such that

Ap(* cf yl2 ~ "jtl CjXj ~ Bp(~ cf Y12. (1)

PROOF. Suppose initially that p = 2k, where k is a positive integer. Then,


setting SII = D=
I CjX j,
10.3 Marcinkiewicz-Zygmund Inequality, Dominated Ergodic Theorems 385

where IX I , .•• , IX j are positive integers with L{= I IX i = 2k, A<lI •...• <lj
(IX I + + IXj)!/IX I !'" IX j !, and i l , ... , i j are distinct integers in [I, n]. Since
E Xf; Xf{ = I when IXI>' •• , IX j are all even and zero otherwise,
E S2k
n
= "L, A 2fJ,,···.2fJj.,
c 2fJ , ... c 'j2fJj ,

{31"'" {3j being positive integers with D= I {3i = k. Hence

E Sn2k -- L AzfJl
A
..... zfJj . A fJ
I ... ·•
ZfJ, ... c·ZfJ )
fJ j c·" Ij
fJ, ... ·• fJj

~ B~:s;k,
where s; = L7= cf and I

{31!"'{3j!
BZk
Zk = sup AzfJl ..... ZfJj = sup (2k)!
-----
AfJ, ... fJj (2{31)!'" (2{3)' k!
2k(2k - I)···(k + I) 2k(2k - I)· .. (k + I)
< sup . < ------:,.......,:----,-...,...----~
- nl= I 2{3l2{3j - 1) ... ({3i + I) - 2fJ , + ... +fJj

= 2k(2k - I;~" (k + I) ~ kk.

Thus, when p = 2k the upper inequality of (I) holds with Blk ~ kllz. Since
IISnllp is increasing in p, IISnllp ~ IISnib ~ Blks n for p ~ 2k, whence the
upper inequality of (I) obtains with Bp ~ kllz, where k is the smallest integer
~ p12.
It suffices to establish the lower inequality for 0 < p < 2 since IISnllp ~
I Snllz = Sn for p ~ 2. Recalling the logarithmic convexity of the Y p norm
established in Section 4.3 and choosing r l , r z > 0 such that r l + r z = I,
pr l + 4r z = 2,
s; = IISnll~ ~ IISnl!:"IISnll:'2 ~ IISnll~"(21IZsn)4'Z,
whence
IISnll~" ~ 4-'2S;-4'2 = 4-'2S~",
IISnilp ~ 4-'2lp"sn

Hence, the lower inequality holds for 0 < p < 2 with Ap ~ 4 -'21p'l =
2-(Z-Pllp and for p ~ 2 with Ap ~ 1. 0

Corollary I. Under the hypothesis of Theorem I, if SZ = cJ < 00, then If


(i) Sn = D
CiXi~S, (ii) IISll p ~ kllzs, where k is the smallest integer
,s2
~ p12, (iii) E e < 00 for all t > O.
PROOF. Theorem 5.1.2 guarantees (i) while (ii) follows from Khintchine's
inequality and Fatou's lemma. Apropos of (iii),
tj tj 'j
I I I
<Xl <Xl <Xl <Xl

Ee =
,s2
~ E SZj ~ ~ V /2 s)Zj = L ~ (tsZY < (tsZey
j=O)' j=O}' j=O}' j=O
since /Ij! < L:= Z
0 j"In! = ei. Thus, E e,s2 < 00 for ts e < 1. Finally, since
386 10 Limit Theorems for Independent Random Variables

Sn -+ S, for any t > 0 the integer n may be chosen so that 2te(s2 - s;) < 1.
Then
E e,S2 = E e,(S-s.+s.)2 ~ E[e2IS~ . e21(S-S.)'] < 00

since S; is a bounded LV. for fixed n. o


Theorem 2 (Marcinkiewicz-Zygmund Inequality). If {X n, n ~ I} are
independent r.v.s with E X n = 0, then for every p ~ I there exist positive
constants A p , B p depending only upon p for which

PROOF. Clearly (Exercise 4.2.4), II X j E 2 P iff X j E 2 P' 1 ~ j ~ n, iff


(II Xf>I/2 E 2
p , whence the latter may be supposed. Let {X:, n ~ I} be the
symmetrized {X n, n ~ 1}, that is, X: = X n - X~, n ~ 1, where {X~, n ~ I}
is independent of and identically distributed with {X n' n ~ I}. Moreover, let
{Vn, n ~ I} constitute a sequence ofi.i.d. LV.S independent of {X n' X~, n ~ I}
with P{VI = I} = P{VI = -I} = 1. Since

E{~ V;(Xi - X;)I VI'···' v", XI'···, X n} = ~ V;Xj,


it follows that for any integer n > 0, {II V; Xi' II V;(X i-X;)} is a two-term
martingale, leading to the first inequality of

Elf v;xilP ~ EI* v;xrl ~ 2


P
P
-E{I*
1
P
v;xil + I~ V;x;n
=2
P
EI* v;xiI
P

.
(3)

Since Khintchine's inequality (1) is applicable to E{ III V; Xi IP IXI' X2, ...},


necessarily

A~E(*Xfr2 ~ EI~ V;XiI ~ B~E(*Xfr2,


P

which, in conjunction with (3), yields

A~E(~Xfr2 ~EI~v;xrr ~2PB~E(~Xfr2. (4)

However, in view of the symmetry of {Xj, I ~ j ~ n}

whence, recalling Lemma I0.1.1 (or repeating the earlier two-term martingale
10.3 Marcinkiewicz-Zygmund Inequality, Dominated Ergodic Theorems 387

argument),

and so (2) follows from (4) and (5), the upper and lower constants B p and A p
being twice and one half respectively those of the Khintchine inequality. 0

Corollary 2. If {X n, n 2 I} are i.i.d. with E XI = 0, EIXtlP < 00, P 2 2,


andSn = L1 Xj,thenEISnl P = O(n P/ 2 ). (6)
PROOF. If p> 2, by Holder's inequality D XJ ~ n(P- 2l/p (L11 Xil P)2/ p,
and the conclusion follows from (2).

Corollary 3. If {X n} are independent r.v.s with E X n = 0, n 2 I, and both


L1 X j and L1 xl converge a.c. as n --+ 00, then, denoting the limits by Lf Xj,
Lf xl respectively,for p 2 1

If T is a stopping time relative to sums Sn of independent LV.S and {cn} is a


sequence of positive constants, the finiteness of E C TIST I, which is of interest
in problems of optimal stopping (see, e.g., Theorem 5.4.6), is guaranteed by
that ofE sUPn~ I CnISnl. Questions such as the latter have long been of interest
in ergodic theory in a framework far more general than will be considered
here. In fact, the classical dominated ergodic theorem of Wiener encompasses
the sufficiency part of

Theorem 3 (Marcinkiewicz-Zygmund). For r ~ 1, independent, identically


distributed r.v.s {X, X n, n 2 I} satisfy

Esupn-'Ifxjl' < 00 (8)


n~ t I

iff
EIXI' < 00, r> I, and EIXllog+ IXI < 00, r = I. (9)

PROOF. Since {X: , n 2 I} and {X;; , n 2 I} are each i.i.d. with moments of the
same order as {Xn}, Example 7.4.3 stipulates that {(Iln) L1 xt,.? n' n 2 I}
and {(Iln) D X j- , .?n, n 2 I} are positive (reversed) martingales, whence
(9) and Theorem 7.4.8 (35) ensure that (8) holds with X replaced by X + or
X -. The exact conclusion (8) then follows by Lemma 4.2.3.
Conversely, if(8) obtains forr 2 I,EIXII' ~ ESUPn~1 n-'IL1 Xii' < 00,
so that only the case r = 1 needs further attention. Now

Esupn-IIXnl=Esupn- 1 ~Xj-
n n- t n I I I
~Xi
I
~2Esupn-1 ~Xi <00,
388 10 Limit Theorems for Independent Random Variables

and thus, choosing M > 1 so that P{IXI < M} > 0,

ro r
Xl Xl

00 > p{supn- I IX nl2 t}dt 2 p{su p n- I \X nI2 t}dt


J n;z,1 JM n;z,1

f:n~IP{IXI2 nt})J:p{lXI <jt}dt


f:
=

2l\ P{lXI <jM} JI P{lXI2 nt}dt.

Now E I X I < entails positivity of the infinite product, whence

f
00

OO
L P{ I X I 2
00
= foo L I1n,,;'-'IXIJ dt dP
00
00 >
fM n=)
nt}dt
lIXI;z,MJ M n=1

> ( (IXI(L!! _ l)dt dP > ( IXl(loglXI - log M - l)dP


J11XI;z,MJ J M 1 JlIXIZMJ

= E IXllog+ IXI + 0(1),


establishing (9) for r = 1. o
If the i.i.d. random variables of the prior theorem have mean zero, the
order of magnitude n- r appearing therein can almost be halved provided
r 2 2.
A useful preliminary step in proving this is furnished by

Lemma 1. If{ Y,., n 2 1} are independent, nonnegative r.v.s then E(Lf YS < 00
for some r 2 I provided
00 00

n=)
L E Y~ < 00, LEY: <
n= 1
00, (10)

where CL = 1 if r is an integer and IX = r - [r] = fractional part of r otherwise.


PROOF. Since

independence guarantees that


10.3 Marcinkiewicz-Zygmund Inequality, Dominated Ergodic Theorems 389

Since (*) y~-a ~ y~ + Y: (or via Exercise 4.3.8), (10) ensur.es

L E y~-a < 00.


00

(12)
\

The lemma is trivial for r = 1. If r is an integer 2: 2, the lemma follows


inductively via (II) and (12). If r is not an integer, (10) and (*) entail
D'" E y" < 00 and (12), whence the conclusion follows from (II) and the
already established integral case. D

Theorem 4. For r 2: 2, independent, identically distributed r.v.s {X, X n , n 2: I}


with E X = 0 satisfy

E sup
IL?;\ XJ <
,/2 00 (13)
n;,e" (n log2 n)

iff
X 2 log IXI
EIXI' < 00, r> 2, and E log21XI IllXI>e"1 < 00, r = 2. (14)

PROOF. Let E X 2 = 1, Sn = L?;


I Xj, Cn = (n IOg2 n)-1/2 or I according as

n > ee or not, and set bn = nl/' or (njlog2 n)I/2 according as r > 2 or r = 2.


Assume initially that X n is symmetric; in proving (13), r will be supposed
> 2, the case of equality requiring only minor emendations. Define

n n
S~ = LX;, s~ = LX;"
i I

Now for h = rx = r - [r] > 0 or h = 1 and positive constants no, K I, K

< K EIXI' < 00,

and the same conclusion follows analogously when h = r. Hence, by Lemma I


390 10 Limit Theorems for Independent Random Variables

Thus, to complete the proof of sufficiency for symmetric {X n } it suffices


via Lemma 4.2.3 to prove that E(sup CnIS~ I)' < 00, and this will be done for
all r > 0, supposing only that E X = 0, E X 2 = 1. To this end, set nk = [3 k],
whence by Levy's inequality and symmetry

a:>

~ 4 L P{cn,S~'+1 2: u}. (15)


k= 1

It follows from Example 7.4.9 and (1) of Lemma 10.2.1, noting g(l) < 1, that
for 0 < tb n ~ 1

P{S~ 2: x} ~ e- rx fI EerX'; ~ exp(-tx + t 2 f E Xl) ~ e-rx+nr2.


j= 1 1

for some positive constant a. Therefore, choosing Uo such that auo > 5,
it follows via (15) that

a:>
f "0 U,-I P{sup cnlS~1 2: u}du ~ 4J, fa:>
a:>
"0 U,-I exp{ -(au - 1)log(k + 1)}du

a:> 1
< c~ k 2 < 00,

so that E(sup CnIS~ I)' < 00 by Corollary 6.2.2.


In the case of general {Xn }, if X: = X n - X~, n 2: 1, are the symmetrized
r.v.s,

E sup c~ ISn I' = E sup c~ IE{S: IXI' X 2, ... }"


~ E sup C~ E{lS:1' IX I' X 2''''}
~ E supc~IS:1' < 00.

For r > 2 the converse follows from the necessity part of Theorem 3.
When r = 2, Theorem 3 merely yields E xi < 00. However,

X; 2E S; + S;_I
E sup ~ sup < 00,
n~e' n log2 n n~e' n log2 n
10.3 Marcinkiewicz-Zygmund Inequality, Dominated Ergodic Theorems 391

and so, choosing AI > ee such that P{IX I < AI} > 0,

00 > fooo p{sup n IX;


og2 n
~ t}dt ~ J(00 p{sup n IX;
og2 n
M
~ t}dt

~ }J I
P{X 2 < Mj IOg2 j} f: ~ P{X; ~ tn IOg2 n}dt.
Now E X 2 < 00 entails positivity of the infinite product, whence for some
C>O
00 > (00 L P{X 2 ~ tn IOg2 n}dt = f (00 L I'nlo82n:sx2t-'j dt dP
JM n IIXI2:C] J M n

>
- i lIX!2:C] M
f X2(t IOg2X2X t2 I - ee ) dt dP

X2
i f
X2
> 2 dt dP + O( 1)
- lIX/2:C] M t IOg2 X

X 2 10glXI
=2E IOg21XI IlIX1>e-j+O(I). D

EXERCISES 10.3
I. Show via examples that Theorem 3 is false for 0 < r < I and Theorem 4 fails for
I ::; r < 2.
2. Verify under the hypothesis of Theorem 4 that for any finite {X n}-time T
E[(T log2 T)- l l2jSTIJ < oc.

3. If X(1), X(21 are LV.S. with respective symmetric dJ.s F and G, where F(x) ~ G(x) for
all x > 0, then F.(x) ~ Gn(x), x > 0, where F n (resp. Gn) is the n-fold convolution of
F ( resp. G) WI'th't If H ence, I'f SU)
I se. n -- "n
L"i = 1 XU)·
i ' ) -- I , 2,were
h i ' I _< I. <
XU) _ n, are
are i.i.d., then EIS~I)IP ::; EIS~2)IP, n ~ I, P > O.

4. Show that in Theorem 3 sufficiency and necessity for r > I extend to interchangeable
LV.S. Is (9) necessary for r = I?

S.1f {Xn,n~ I} are independent LV.S with EXn=O, n~ I, and pE(I,2], then
EmaxlsjsnII{=1 Xd P ::; ApD=1 EIXjlP for some finite constant Ap. Hint:
Use the Doob and Marcinkiewicz-Zygmund inequalities.

6. USn = I7=1 Xi where {X, X n, n ~ I} are i.i.d. r.v.s,prove that E exp{t sup cn'Snl} < 00
for some t > OiffE{exptlXI} < 00 for some t > 0 where Cn = (nlog 2 nf l /2, n ~ ee.
392 10 Limit Theorems for Independent Random Variables

10.4 Maxima of Random Walks


In Section 5.2 moments ofi.i.d. r.v.s {X n , n ~ I} and hence behavior of the
corresponding random walk {Sn = L1
Xi' n ~ I} were linked to convergence
of series involving ISnl and S: = maxI sisnl Sil. Here, analogous results for
the one-sided case involving explicit bounds for the series will be obtained.
For any positive v, set
[v)
Xo = 0, Xv = max Xj' Sv = LXi' Sv = max Sj'
Osjsv i:O Osjsv
(1)
X: = max Osjsv
IXjl, S: = max ISjl.
Osjsv

Theorem I. If {X, X n , n ~ I} are i.i.d. r.v.s with EX = and p, oc, yare


constants satisfying 1 :::;; y :::;; 2, ocy > 1, ocp > 1, there exists a constant C =
°
C P. a. Y E (0, (0) such that
00

Lnap - 2 P{Sn ~ na} :::;; C[E(X+)P + (EIXI Y)lap -II/(aY-l)). (2)


n= 1

PROOF. Suppose without loss of generality that E I X I > 0, E(X +)P < 00, and
EIXIY < 00. Clearly, for any k > 0, by Lemma 5.3.5,
P{Sn ~ na} :::;; nP{X > na/2k} + Pk{Sn ~ na/2k}. (3)

Now via Theorem 7.4.8 and the Marcinkiewicz-Zygmund inequality

(4)

Set k = 1 + [(ocp - 1)/(ocy - 1)]. Then A. = k - (ocp - 2)/(ocy - 1) >


1/(ocy - 1), and ifEIXIY ~ 1, from (4)

I '= f nap - p {S ~ 2kna}


I
2
k
= L
noy-t>EIXIY
+ L
nOY-'sEIXI'

:::;; C I[L nap - 2+ kll-ay) EkIX IY + L nap - 2] (5)


= C 2 [(EI X jY)([I-).(aY-I))/lay-I)}+k + (EI X Iy)(a p - II/lay- II]
= C(EIXIY)lap-l)/(aY-l l .
On the other hand, if E IXI Y< 1, again via (4)

L n-).(aY-1) EklXIY :::;; C


00

I :::;; C 1 2
Ekl XI Y:::;; C(EIXl y)(a p -I I!(aY-I). (6)
n: 1
10.4 Maxima of Random Walks 393

;a
Hence, from (2), (5), and (6)

n~lnaP-2 P{Sn ~ na} ~ ~ [n ap - 1


p{X >
ap
+ n - 2 J>k{Sn ~ ;~}J
~ C[E(X+Y + (EIXn(ap-l)/(ay-l)]. 0

Taking IX = I,)' = p in Theorem 1 yields


Corollary 1. If {X, X n, n ~ I} are i.i.d.
then for some constant C P E (0, (0)
LV.S with EX = ° and 1 < p ~ 2,

co
L nP - 2 P{Sn ~ n} ~ CpEIXI P ,
n=l
co (7)
Ln P
-
2
P{S: ~ n} ~ 2C p EIXI P .
n=l

Corollary 2 (Hsu-Robbins). If {X, X n, n ~ l} are i.i.d. LV.S, then


co
L P{ ISnl ~ n£} < 00, £ > 0, (8)
n=l
iff E X = 0, E X 2 < 00.

PROOF. (8) follows immediately from Corollary 1 applied to X/£ while the
converse is the special case IX = 1, p = 2 of Theorem 6.4.5. 0

EXAMPLE 1. Strong Laws for Arrays. Let {Xni' 1 ~ i ~ n, n ~ I} be an array


of random variables that are identically distributed and rowwise independent,
i.e., {X n1 , ... , X nn } are independent for n ~ 2. IfEIXlllq < 00, 0< q < 4, and
E Xli = 0 whenever 1 ~ q < 4, then n-2/qL~=1 Xni~O.
PROOF. The case 0< q < 2 is covered in Exercise 5.2.17. Set Snn = L~=l X ni ,
n ~ 1. When 1 ~ q < 4, replacing p by q and choosing)' = 2, IX = 2/q in
Theorem 1, it follows that
co
L P{Snn > £n 2 /q } < 00, £ > O. (9)
n=l
Since Xi may be replaced by -Xi in Theorem 1 (when EIXIP < (0), (9) holds
with Snn replaced by - Snn, and so
co
L P{ISnnl > en
n=l
2
/
q
}< 00, £>0, (10)

whence the Borel-Cantelli lemma implies that n-2/qsnn~ 0. o


For any r.v.s {Yn , n ~ l} and nonnegative v, set Yo = 0,
Yv = max lj, (11)
O~j~v
394 10 Limit Theorems for Independent Random Variables

and for p > 0, a > 0 define (as usual, sup 0 = 0)


M(e) = M(e, a) = sup(y" - en a) = supey. - en a),
.2:0 .2:0

L(e) = L(e, a) = sup{n ~ I: y. ~ en a },

/(e) = /(e, p, a) = (00 vap - 2 p{suprar; ~ e}dV,


Jo J2:V

J(e) = J(e, p, a) = {oo vap - 2 P{Yv ~ wa}dv. (12)

Clearly, J(e) :$ /(e).

Lemma 1. For ap > 1,

I(2e):$ ap ~ I e(l-apl/a E[M(e)Ja p -1)/a:$ (2(ap-1)/a - 1)-IJ(~), (13)

00
J(e):$ max(2 a p-2, 1) L nap - 2
P{Y. ~ en a}, (14)
n= 1

E[L(e)r p - 1 :$ (ap - I)/(e). (15)

PROOF. (14) is trivial. For (13), set p = (ap - I)/a. Then

e-P E[M(e)]P = (00 p{SUP(y" _ ena) ~ W1/P}dV


Jo .2: 1

establishing the second half of (13). Moreover,

/(2e) = (oov ap - 2 p{supra}j ~ 2e}dv


Jo J2:v
10.4 Maxima of Random Walks 395

=
f OO

o
V, p
-
2 P
{(M(e») 1/.
--
e
>
-
V
}
dv = -1- E [M(e)]I.p-I)/.
--
ap - 1 e '

which is the first half of (13). Apropos of (15),


P{L(e) ~ v} = P{Yn ~ £n. for some n ~ v}

~ p{sup n-·Y ~ e}, n


"~V

so that

E[L(r.)]·P-1 = (ap - 1) L oo
v· p- 2 P{L(e) ~ v}dv ~ (ap - 1)I(e). 0

The combination of Lemma 1 and Theorem 1 yields

Theorem 2. If {X, X n, n ~ I} are i.i.d. LV.S with EX = 0, Sn = D Xi'


So = 0, and 1 ~ y ~ 2, ay > 1, ap > 1, then for some constant C = Cp.•. yE
(0,00)

E [snuf~Sn - n·)Tp-I)/. ~ C[E(X+)P + (EIXjY)I.p-1)/I.r1)]. (16)

t
and EX °
Lemma 2. Let {X, X n, n ~ l} be i.i.d. r.v.s with EIXI I /. < 00, where a >
= if a ~ 1. If Sn = D Xi and

L(e) = L(e, a) = sup {n ~ 1: Sn ~ en·}, (17)


LI(e) = LI(e, a) = sup{n ~ 1: X n ~ £n.}, (18)

where e > °and sup 0 = 0, then E V(e) < 00 implies E LH2e) < 00, Y > 0.
PROOF. Set A j = {Xj ~ 2ej·}, B j = {ISj- II ~ ell. Now n-·Sn ~o by
Theorem 5.2.2, and so P{Bn} -+ 1 as n -+ 00. Since for n ~ 1 the classes {B n}
and {An' AnA~+ I""} are independent, by Lemma 3.3.3 for n ~ no

P{L(e) ~ n} ~ PtVn AjBj } ~ ptVnAj}~~~P{Bj}


~ t P{L I (2e) ~ n},
and the lemma follows. 0

Lemma 3. Let {X, X n, n ~ I} be i.i.d., a> 0, ap > 1, e > 0, and

J I (e) = Loo
v· p - 2
P{X v ~ w·}dv. (19)

Then, if L1(e) is as in (18),


(ap - I)J 1 (e) ~ E[L I (2-·e)]"P-l, apePJI(e) ~ E(X+)P, (20)
396 10 Limit Theorems for Independent Random Variables

(21)
PROOF. The second inequality of (20) follows immediately from

As for the first,

(ap - I)J I(e) = (a.p - I) foo V"p-2 p{ max X n Z W"}dV


o v<nS2v-l

= (ap - I) 1OOV"P-2 P{L 1(2-"e) Z v}dv = E[L 1(2-"e)]"P-I.

To prove (21), note that for v Z I


[vI
P{X v Z v"} = L P{X j Z V,,}pj-I {X < v"}
j= 1

Z [v]p[v1{X < v"}P{X z v"}


= [v]P{X z v"} [1 - P{X v ~ v"}],
whence for v ~ 1

[v]P{X z v"} ~ (1 + [v]P{X ~ v"})P{Xv ~ v"}


~ [I + E(X+)l/"]P{X v z v"}

by the Markov inequality. Hence

E(X+Y = a.p 1 00

V"p-l P{X z v"}dv

$ I + 2ap 1 00

V"p-2 P{X v z v"}dv' [I + E(X+)l/"]. (22)

Finally, if X' = max(X, c), C > 0, then (22) holds with X+ replaced by X'.
IfE(X+Y = 00, then, since a.p > 1, E(X')I/" = o(E(X'Y) as C -. 00, implying
J 1(1) = 00. Thus, (21) obtains. 0

Theorem 3. Let {X, X n , n z I} be i.i.d. LV.S, Sn = It Xi> So = X o = 0, and


p > I/a > 0, e > O. Set

M(e) = sup(Sn - en"), M 1 (e) = sup(Xn - en"),


n~O n~O
10.4 Maxima of Random Walks 397

Then, (i)

E(X+Y < 00 iff J I(l) = LX> V<x p -2 P{.K v 2 v<X}dv < 00

iff E[M I(£)J<XP-Il/<x < 00; £ > 0,


iff E[LI(£)]<xP-I < 00, £ > O.

(ii) Suppose E(X+Y < 00 for p 2 1, EX = 0, and EIXI7< 00 for some


y E (l/lX, 2] when 1- < lX < 1. Then for any lX > 1- and all £ > 0
E[M(£)J<XP-Il/<X < 00, E[L(£)]<X P- I < 00, (23)

and J(£) :::;; 1(£) < 00, where the latter are as in (12) but with Sv replacing v".
(iii) Let lX> 1, EIXII/<x < 00, and EX = 0 if ex :::;; 1. If either of the
conditions of (23) holds or 1(£) < 00 or J(£) < 00, then E(X+Y < 00.
PROOF. By Lemma 3, E(X+Y < 00 iff J 1(1) < 00 iff J 1(£) < 00 for all £ > O.
Then Lemma 1 ensures E[M 1(£)]l<XP-ll/<x < 00,1(2£) < 00, £ > 0, and hence
E[LI(£)]<xP-I < 00 since L I (£) :::;; L I (£). Conversely, by Lemma 1
E[M 1(£)]l<X P- Il/<X < 00

implies 11(£) < 00, £ > 0, and hence J \(£) < 00, £ > O. Moreover, if
E[LI(£)]<xP-I < 00, £ > 0, then by Lemma 3, J I(£) < 00, £ > O.
Apropos of (ii), if 1 < lX < 1, the first half of (23) follows from Theorem 2.
Then by Lemma I, J(£) :::;; 1(£) < 00 and E[L(£)YP-I ::; E[[(£)]<X P- \ <: 00.
If, rather, lX 2 1, define X~ = XnI[Xn~ -C]' C > O. Then S~ = D Xj 2 Sn'
Since E(X+Y < 00 for some p 2 1 (p > 1 if lX = 1), necessarily y ='
min(p, 2) E [1, 2] and EIX'II Y < 00. Hence, by Theorem 2

n )](<X P - I)/<X
,L
:::;; E [ sup ( (Xj - E X'I) - w<X < 00
n~O )=1

by Theorem 1, and since E XI = 0(1) as C ~ 00, the first half of (23) is


established. The remainder of (23) and (ii) follow from L'(£) 2 L(£),
1'(£) 2 1(£).
To prove (iii), note that by Lemma 1

J(£) < 00 = E[M(£)J<XP-Il/<X < = 1(£) < = E[L(£)]<X P- I <


00 00 00,

whence E[L(£)YP-I < 00. Then by Lemma 2, E[LI(£)YP-I < 00, implying
E(X+Y < 00 by part (i). 0

Corollary 3 (Kiefer-Wolfowitz). If {X, X n, n 2 I} are i.i.d. with EX < 0


and Sn = L~ Xi' So = 0, p > I, then E(suPn~O sny- I < 00 iff E(X+Y < 00.
398 10 Limit Theorems for Independent Random Variables

PROOF. Apply Theorem 3 with a. = 1 and e = - E X to {X n - EX, n 2 I}.

Corollary 4. Let {X, X n , n 2 I} be i.i.d. r.v.s with EX = 0, Sn = L~ Xi' and


define L(e) = sup{n 2 1: ISnl 2 ne}, e > O. Then for p > I, E[L(e)]p-1 < 00
for all e > 0 iffEIXIP < 00.
PROOF. Clearly, L(e) = max[L +(e), L -(e)] :$ L +(e) + L -(e), where L -(e) =
sup{n 2 I: Sn :$ -ne} and L +(e) = sup{n 2 I: Sn 2 en}. 0

Corollary 5. If{X, X n , n 2 1} are i.i.d. with E X = Jl. > 0, E(X-)2 < 00 and
No = sup{n 20: Sn :$ O}, where Sn = L~ Xi' then E No < 00 and

f p{inf Sj O} <
n=1 J",n
:$ 00.

PROOF. Set Yn = Jl. - X n , whence

f p{inf Sj o}= f p{~upt (li - Jl.) 2O}


"=1 J~"
:$
n=1 j?;nl;:l

:$ f p{suj",np i=±I (li - ~)2 2 nJl.}


n= I 2

:$ LP
<0 {
sup j (L li - Jl.)
- J1
> n- } < 00
n=1 j",1 i=1 2 2
by Corollary 3.. Since {No 2 n} = U~n {Sj :$ O} c {infj ",. Sj :$ O}, neces-
sarily E No < 00. 0

For any r.v.s Y, Z let the relation Y '" Z signify that Y and Z have identical
distributions.

Theorem 4. Let {X, X n , n 2 1} be i.i.d. r.v.s with Sn = D Xi' n 2 I, So = 0,


5n = maxO,;j,;n Sj, 5 = SUPn",O Sn' Then
i. 5n ' " (5n - 1 + X)+, n 2 1,
ii. if Sn~ - 00, then 5 '" (5 + X)+,
iii. If EX e( -00,0) and E X 2 < 00, then E S = [O',i - O'(hx)- ]/( -2 EX).
PROOF. If Wo = 0, J¥" = (J¥,,-I + X n )+, n 2 1, then

n21. (24)

Clearly, (24) obtains for n = 1 and, assuming its validity for n - 1 2 1,


J¥" = (max[Sn_I,Sn_1 - SI"",Sn-1 - Sn-2,0] + X n )+
= (max[Sn, Sn - SI,"" Sn - Sn-I])+
= max[Sn, Sn - SI>"" Sn - Sn-1, 0],
10.4 Maxima of Random Walks 399

whence (24) holds for n ~ 1. Thus,


J¥" '" max[S., S.-t,···, St, 0] = 5., n ~ 1,
and (i) follows.
Next, if S. ~ - 00, then 5. ~ 5 (finite), whence (ii) is an immediate
consequence of (i).
Under the hypothesis of(iii), E 5 < <X) by Corollary 3. Moreover, assuming
temporarily that E(X +)3 < 00, this same corollary ensures E 52 < 00.
By (ii),
E S = E(S + xt = E(S + X) + E(S + X)-,
implying
EX=-E(S+Xr. (25)
Similarly,
E[(5 + X)-]2 - E X 2 = E[(5 + X)2 - 2(5 + X)(5 + X)+
+ (5 + X)+2] - E X 2
= E S2 + 2 E SX - E[(S + Xt]2
=2E5EX,
and combining this with (25) yields (iii) when E(X+)3 < 00. In general, set
X~ = min(X., C) and define 5', X' analogously via X~, whence

(26)

Since (5' + X')- :::; (X')- = X-, 5' :::; 5, and lime_oo 5' = S, by Lebesgue's
dominated convergence theorem E 5' ~ E Sand ais, + X') - --+ afs+ X) - and
(iii) follows from (26), 0

Corollary 6./f{X, X., n ~ 1} are i.i.d. LV.S with EX = 0, a 2 = E X 2 .( 00,


S. = D Xi' So = 0, and

M = M(e) = sup(S. - ne) = sup L (Xi - e), (27)
"2:0 ft2:0 i= 1

where e > 0, then


a2
lim e E M(e) = -2 . (28)
<-0

PROOF. To avoid triviality, suppose a 2 > O. By Theorem 4(iii)


2e E M(e) = ai-. - afM+X-W' (29)
Since (M + X - er :::; X- + e and
lim M(e) ~ lim sup (Sj - je) = sup Sj .":'.~ 00,
.-0 .-0 O<j~. O~j<.

afM+X-W --+ 0 by dominated convergence and (28) follows from (29). 0


400 10 Limit Theorems for Independent Random Variables

Theorem 5. Let {X, X n, n z I} be i.i.d. LV.S with EX = 0, EIXI > 0,


Sn = I7= J Xi' So = 0 and set M(t;) = sUPn"O (Sn - nt;). If

V_ = V_(t;) = inf{n z I:Sn ~ nt;}, L = inf{n z I:Sn ~ O},


V+ = V+(t;) = inf{n z I:Sn > nt;}, T+ = inf{n z I:Sn > O},

then lim,_o V ±(t;) = T± < 00, a.c., and

I. E M(t;) = (l/P{V+ = oo})J[U+<OOI(SU+ - t;V+)


= (EV_)J[u+<OO)(Su+ -t;V+).
II. if (J2 = E X2 < 00, then EST + < 00, EST _ < 00, and

E(t;V _ - Su _) r
J[U+<ool
(Su + - t;V J ----> (J; ,

(J2
E( - ST - ). E ST + <
-
-
2'

III. J[U + < 00] (Su + - t;V +) < 00 iff E M(t;) < 00 iff E(X+)2 < 00.

PROOF. By Corollaries 5.4.1, 5.4.2, T+ and L are finite stopping times and,
clearly, lim,_o V ±(t;) = T±. The second equality of (i) follows immediately
from Theorem 5.4.2. Apropos of the first, define VO) = V +, V1i + J) = 00 on
{VIi) = oo} and otherwise V n = Ii=J VIi) where

V(i+J) = inf { n z I: t
U·+n
j=U;+l
(Xj -
}
t;) > 0 , i z l.

Thus, {V(i), i z I} are copies of V + which, however, is not a finite stopping


time. Nonetheless, analogously to Corollary 5.3.2, for any positive integers
mi , I ~ i ~ n,

(30)

Now, setting

n z I, (31)
necessarily
00

M(t;) = Wu/[U, <00) + L (WUn - WUn_.)I[U(n1<00)· (32)


n=2

Since
00

E(WU2 - W u.)I[ul2l<oo) = L E(Wu(2)+j - J.i.j)I[u(I)=j,U(21<00)


j= 1
References 401

00

L P{U(I) =j}E Wv • /(V.<ool


j= 1

= P{U+ < oo}E WV./(V.<ool

and similarly for n > 2, it follows from (32) that


00

E M(e) = L pn-I{U+ < oo}E WV,/(V.<OOI


n=1

To prove (ii), note that via Wald's equation, the initial notation of (31),
and part (i),

E( - Wv J . E Wv • / [U • < 00 I = e E V - . E Wv • / [U • < 00 I
(72
= e E M(e) --+ -
2

as e --+ 0 by Corollary 6. Hence, from Fatou's lemma

E( - S r J . E Sr. ::::;; lim E( - Wv _) . lim E Wv • I (v. < 00 1


£-0 f.-O

Since both E( - Sr _) and E Sr. are positive, each is therefore finite and (ii)
is proved.
Finally, as already observed, E V _ = I/P{ V + = oo} < 00, whence (iii)
follows from (i) and Corollary 3.

References
L. E. Baum and M. Katz, "Convergence rates in the law of large numbers," Trans.
Amer. Math. Soc. 120 (1965), 108-123.
H. D. Brunk, "The strong law of large numbers," Duke Math. J. 15 (1948), 181-195.
Y. S. Chow, "A martingale inequality and the law oflarge numbers," Proc. Amer. Math.
Soc. II (1960), 107-111.
Y. S. Chow, "On a strong law of large numbers for martingales," Ann. Math. Stat. 38
(1967),610.
Y. S. Chow, .. Delayed sums and Borel summability of independent, identically dis-
tributed random variables, Bull. [nst. Math., Academia Sinica I (1973), 207-220.
402 10 Limit Theorems for Independent Random Variables

Y. S. Chow and T. L. Lai, "Some one-sided theorems on the tail distribution of sample
sums with applications to the last time and largest excess of boundary crossings,
Trans. Amer. Math. Soc. (1975).
Y. S. Chow, H. Robbins, and D. Siegmund, Great Expectations: The Theory ofOptimal
Stopping, Houghton Mifflin, Boston, 1972.
K. L. Chung, "Note on some strong laws of large numbers," Amer. J. Math. 69 (1947),
189-192.
K. L. Chung, "The strong law of large numbers," Proc. 2nd Berkeley Symp. Stat. and
Prob. (\951), 341-352.
J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
V. A. Egorov. "On the strong law oflarge numbers and the law of the iterated logarithm
for sequences of independent random variables." Theor. Prob. Appl. 15 (1970),
509-514.
W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 2, Wiley,
New York, 1966.
W. Feller, "An extension of the law of the iterated logarithm to variables without
variance," Joum. Math. and Mech. 18 (\968),343-356.
B. V. Gnedenko and A. N. Kolmogorov, Limit Distributionsfor Sums of Independent
Random Variables (K. L. Chung, translator), Addison-Wesley, Reading, Mass.,
1954.
P. Hartman and A. Wintner, "On the law of the iterated logarithm," Amer. Jour. Math.
63 (1941), 169-176.
P. L. Hsu and H. Robbins, "Complete convergence and the law of large numbers,
Pmc. Nat. Acad. Sci. U.S.A. 33 (\947),25-31.
A. Khintchine, "Uber dyadische Bruche," Math. Zeit. 18 (1923),109-116.
A. Khintchine. "Uber einen Satz der Wahrscheinlichkeitsrechnung," Fund. Math. 6
(1924),9-20.
J. Kiefer and J. Wolfowitz, "On the characteristics of the general queueing process with
applications to random walk," Ann. Math. Stat. 27 (1956),147-161.
J. F. C. Kingman, "Some inequalities for the queue GI/G/I," Biometrika 49 (1962),
315-324.
A. Kolmogorov, "Uber der Gesetz des Iterierten Logarithmus," Math. Annalen 101
(1929),126-135.
M. Loeve. Probability Theory, 3rd ed., Van Nostrand, Princeton, 1963; 4th ed., Springer-
Verlag, Berlin and New York, 1977-1978.
J. Marcinkievicz and A. Zygmund, "Sur les fonctions independantes," Fund. Math. 29
(1937). 60-90.
1. Marcinkiewicz and A. Zygmund, "Quelques theoremes sur les fonctions indepen-
dantes," Studia Math. 7 (1938),104-120.
S. V. Nagaev, "On necessary and sufficient conditions for the strong law of large
numbers," Theor. Proh. Appl. 17 (1972),573-581.
Y. V. Prohorov. "The strong law of large numbers," Izv. Akad. Nauk. Ser. Mat. 14
(1,950),523-536 [in Russian].
Y. V. Prohorov, "Some remarks on the strong law of large numbers," Theor. Prob.
Appl. 4 (\959), 201-208.
D. A. Raikov, "On a connection between the central limit theorem in the theory of
probability and the law of large numbers," /zv. Nauk USSR, Sov. Math. (1938),
323-338.
D. Siegmund, "On moments of the maximum of normed sums," Ann. Math. Stat. 40
(\969),527-531.
F. Spitzer, "A combinatorial lemma and its applications to probability theory,"
Trans. Amer. Math. Soc. 82 (\956), 323-339.
F. Spitzer, "The Wiener-Hopf equation whose kernal is a probability density," Duke
Math. Jour. 24 (\ 957). 327-343.
References 403

V. Strassen, "A converse to law of the iterated logarithm," Z. Wahr. 4 (1966), 265-268.
H. M. Taylor, "Bounds for stopped partial sums," Ann. Math. Stat. 43(1972), 733-747.
H. Teicher, "A dominated ergodic type theorem," Z. Wahr. 8 (1967), 113-116.
H. Teicher, "Some new conditions for the strong law," Proc. Nat. Acad. Sci. U.S.A. 59
(1968), 705-707.
H. Teicher, "Completion of a dominated ergodic theorem," Ann. Math. Stat. 42 (1971),
2156-2158.
H. Teicher, "On interchangeable random variables," Studi di Probabilita, Statistica e
Ricerca Operative in Onore di Giuseppe Pompilj, pp. 141-148, Gubbio, 1971.
H. Teicher, "On the law of the iterated logarithm, "Ann. Prob. 2 (1974), 714-728.
H. Teicher, "A necessary condition for the iterated logarithm law," Z. Wahr. Verw. Geb.
(1975),343-349.
H. Teicher, "Generalized exponential bounds, iterated logarithm and strong laws,"
Z. Wahr. Verw. Geb. (1979), 293-307.
N. Wiener, "The ergodic theorem," Duke Math. Jour. 5 (1938) 1-18.
A. Zygmund, Trigonometric Series, Vol. I, Cambridge, 1959.
11
Martingales

An introduction to martingales appeared in Section 7.4, where convergence


theorems for submartingales {Sn, g; n' n ~ I} (relating to differentiation
theory) were discussed. Here, emphasis will fall upon convergence theorems
for martingales {S-n, g; -n' n ~ -l} (relating to ergodic theorems). In
demarcating the two cases, it is natural to refer to a martingale {Sn, g; n' n ~ I}
as an upward martingale and to allude to a martingale {S-n, g; -n' n ~ -l}
when written {Sn, g; n' n ~ I} as a downward or reverse martingale. Martin-
gale and stochastic inequalities will also be dealt with.

11.1 Upcrossing Inequality and Convergence


The convergence approach of Section 7.4 does not carryover to downward
martingales since the formal analogue of a "first time" in the former case
is a "last time" in the latter, whereas a genuine first time is not well defined.
The ensuing upcrossing inequality provides an avenue of approach to
downward martingales.

Definition. If r l < r 2 and a l , •.• , an are finite, real numbers, the number of
upcrossings of the interval [r I' r 2] by the sequence a I' a 2 , ••. ,an is defined
as the number of times the elements a j pass from "on or below r l " to "on or
above r2'" More precisely, let 0(1 be the smallest integer (if any) for which
aa, ~ r l , and in general for j ~ 2 let O(j be the smallest integer (if any) ex-
ceeding O(j_ I for which

404
ll.l Upcrossing Inequality and Convergence 405

If 2u is the largest even integer j for which (Xj is defined then u is called the
number of upcrossings (if (X2 is undefined, then u = 0).

Lemmal(Panzone).If{Sj,~j == a{Si' l::s; i ::S;j),l::S;j::S; n}isanonnegative


submartingale, r is a positive ~ I-measurable random variable and U is the
number ofupcrossings of [0, r] by (SI"'" S.), then

ErU + ESI::S; Ln_,>OIS. + Ln_,=o.Sn~}.::S; ES.. (1)

PROOF. For n = 2, (1) is confirmed by


E rU + E SI = 1 r +1 SI ::s; 1 S2
[S,=0.S2~r] [S,>OI [S,=O,S2~r]

+1 S2 ::s; E S2'
[S, >0]

Suppose inductively that (1) holds with n replaced by n - 1 for all sub-
martingales, and set
1::S;j::S;n-2
So ifO < S._I < r
T,,- 1 = { S._I otherwise.
For A E a(Tj , I ::s; j ::s; n - 2) = a(Sj' 1 ::s; j ::s; n - 2)

{T,,-2 = {S.-2::S; {S._I

::s; fAISn_ ,~rl


S._I + fA[O <Sn- ,<rl
S. = f T,,- •.
A

Hence, E{T,,- ,11j, 1 ::s; j ::s; n - 2} ~ T,,- 2' a.c. Clearly, for 2 ::s; m < n - 1
E{Tm l1j, l::S;j::S; m - I} ~ Tm - I, a.c., and so {1j'~j' l::S;j::S; n - I} is
a nonnegative submartingale.
Let r be the number of upcrossings of [0, r] by (TI , ... , T" _I)' Then
U = V + I[Sn_I=O.Sn~r],
whence by the induction hypothesis
ErU + ES I = ErV + ErI[Sn_I=O,Sn~rl + ES.
::s; E T,,-I + ErI[Sn_l=O.Sn~rl

= 1 [0 <Sn _, <rl
s. + 1 [Sn _ , ~ rl
S._I + ErI[Sn_,=O,Sn,U]

::s;1[O<Sn_l<r] ~+1 [Sn_,~r)


~+1 [Sn_I=O.Sn~r)
~
o
406 11 Martingales

Corollary 1. Let {Sj' ff'j = I1(X j , 1 S; i S; j), 1 S; j S; n} be a submartingale


and let r l < r2 be finite real numbers. If U is the number of upcrossings of
[r" r2] by (SI' ... ' Sn), then
E U S; (r2 - r l )-I E(Sn - r l )+ S; (r2 - rl)-I[E S: + Irlll
PROOF. Since {(Sj - r l )+, ff'j' 1 S;j S; n} is a nonnegative submartingale
and U is the number of upcrossings of [0,r 2 - rl] by «SI - r l )+, ... ,
(Sn - r l )+), the lemma ensures
(r2 - rl)E U S; E(Sn - rl)+ - E(SI - rl)+ S; E(Sn - rlt. D

Theorem 1. Let {Sn, ff'n, - 00 < n S; O} be a submartingale and ff' - 00 =


n~oo ff'n·
(i) If E st < 00, then S _ 00 = limn__ 00 Sn exists a.c. with E S~ 00 < 00.
Moreover, iflim n__ oo E Sn = K > -00, then sUPn50 EISnl < 00, sn!:4s- oo
and {Sn, ff'n, - 00 S; n S; O} is a submartingale.
(ii) If{Sn, ff'n, - 00 < n S; O} isa nonnegative submartingale with E sg < 00
forsomep~ 1,thenSn~S_ooasn~-00.
PROOF. Set S = wn
n_- oo Sn, S-oo = lim n__ oo Sn and suppose that for some
choice of r l < r2 the set A = [S> r2 > r l > S-oo] has positive prob-
ability. Then if Un is the number of upcrossings of [r" r2] by (S -n'· .. ,So),
the prior corollary yields

00 > (r 2 - rl)-I[E st + Irll] ~ E Un 2 I Un ~ 00


as n ~ 00, a contradiction. Hence, P{A} = 0 for all choices of r l < r 2 ,
implying S _ 00 = S, a.c., and Sn ~ S _ 00. By Fatou's lemma

E S: 00 =E lim S: S; lim E S: S; E st,


n- - 00 n- - 00

yielding the initial portion of (i). Moreover, iflim n__ 00 E Sn =K > - 00, for
-oo<nS;O

EISnl = 2ES: - ESn S; 2ESt - K,


implying sUPn50 EISnl < 00 and via Fatou that EIS-ool < 00. To prove
sn!:4 S_ 00' it suffices to verify uniform integrability of {Sn}. To this end,
choose 8 > 0 and fix m S; 0 so that E Sm < K + 8. Then for n S; m and
a> 0,

r
)lIs.l~aJ
ISn I = r
)[s.~aJ
Sn +(r
)[s.> -aJ
Sn) - E Sn

S; r
)[S.~al
Sm + (r )[s"> -aJ
Sm) - K

S; r
JIIS"I ~aJ
ISm I + E Sm - K < r
)IIS"I ~aJ
ISm I + 8 < 28 (2)
11.1 Upcrossing Inequality and Convergence 407

for large a since P{ ISn I ;;::: a} :5: a- I E ISn I -+ 0 as a -+ 00. The conclusion of
(2) clearly also holds for the finitely many integers n in Em, OJ, and so uniform
integrability is established.
Next, for A E :IF _ ro and every m :5: 0

Is - ro =
A
lim
n- -ro A
I
Sn
A
~ ISm,
whence E{Sm I:IF _ ro} ;;::: S _ ro' a.c. for all m, concluding the proof of (i).
Apropos of (ii), {S~, :lFn , - 00 < n ~ O} is a nonnegative submartingale
which is easily seen to be u.i., whence the conclusion of (ii) follows. 0

Since U-statistics

Um,n = (n)-I
m
I
1 Sil <O"<im$n
<p(X j " ... , XjJ, n;;::: m,

constitute a downward martingale (Example 3, Section 7.4), Theorem 1


yields directly a.c. and .P t convergence of U m,n as n -+ 00. Moreover, when
the {Xn} are i.i.d., the Hewitt-Savage zero-one law (Corollary 7.3.7) ensures
that the limit is a constant which clearly must coincide with E <p(X t, ... , X m)'

Corollary 2. If {Sn,:IF n' - 00 < n < oo} is an .P t submartingale with


SUP-ro<n<ro EISnl < 00 and X n = Sn - Sn-t, then
ro n

IX j = lim IXj=Sro-S-ro, a.c.


-00 m.lI-oo -m

where S ± ro = lim n_ ± ro Sn and, moreover, E ISro - S - rol < 00.

PROOF. By Theorem 1, L~m X j = So - L m- I ~So - S-ro with


E IS _ rol < 00 by Fatou. Moreover, by Theorem 7.4.2, L':' X j = Sm -
SO~Sro - So with EISrol < 00. 0

Corollary 3.If{Sn, :lFn, - 00 :5: n :5: O} is a submartingale with E Sri < 00 and

PROOF. By Theorem 1, Sn *
lim n__ ro E Sn = K > -00, then Sn* S and E{SI:IF -ro} ;;::: S-ro, a.c.
S as n -+ 00. For A E :IF - ro

by uniform integrability and the corollary follows. o


Corollary 4. Let {Y", -00 < n < oo} be a sequence of LV.S on a probability
space {n,:IF, P) and {:IF n' - 00 < n < oo} a sequence of increasing sub-u-
algebras of:IF with:IF ro = u(UO' :lFn) and :IF - ro = n:::
~ :lFn· lflim n_ ro Y" =
Yro ' a.c., lim n__ ro Y" = Y- ro , a.c., and E sUP-ro<n<rol Ynl < 00, then

(3)

n--oo
408 11 Martingales

In particular, for any ft'l r.v. Y


limE{YIg;n} = E{YIg; oo}, a.c.
(4)

n- - 00

PROOF. The initial portion of (3) is Corollary 7.4.1. Apropos of the second,
note that by Fatou, EI Lool < 00. If Dn = E{Y"Ig;n} - E{Loolg; -oo}, then
for n ~ m

IDnl ~ IE{Y"Ig;n} - E{Loolg;n}1 + IE{Loolg;n} - E{Loolg; -00}1

~ E{~~~I}} - L0011g;n} + IE{Loolg;n} - E{Loolg; -00}1·

By Corollary 3, lim n__ oo E{Loolg;n} = E{Loolg; -oo} and

lim IDnl
11--00
~ lim E{SUP1 }}- L0011g;n}
11--00 )5.m

Theorem 2 (Austin). If {Sn, g;n, -00 < n < oo} is a martingale with
sUPnEISnl = K < oo,thenI:~oo(Sn - Sn_t)2 < oo,a.c.
PROOF. By Theorem 1, Sn~ an g; _oo-measurable r.v. S-oo as n --+ -00.
Since S-oo E ft'l' it may and will be supposed that S-oo = 0, a.c. Set X n =
Sn - Sn-l' whence by Corollary 2, Ii; _00 X j = Sn, a.c. For any C >
define T = inf{n > - 00: ISnl > q, where inf{0} = 00. Then, noting that
°
ISjl ~ Con {T > j},

1 I XJ
[T;oo] -00
n
~ I
n

j;-oo
1 [T>jl
(Sj - Sj_ d 2

Now

In
-00
1
[T>Jl
In {(1
(SJ - SJ- 1) = -00 [T>Jl
SJ - 1
[T>j-ll
SJ- 1) +1
IT;jl
SJ- t }

=1 S;+1
IT>nl IT"nl
Sf-l~C2+1IT"nlSf-l'
11.1 Upcrossing Inequality and Convergence 409

i r
j=-ooJ[T>jl
Sj_I(Sj - Sj_l) = ±r
-ooJ[T=j)
(Sj_ISj - SJ-I)

:s;; r
J[T:5.)
(C1STI - S}_I)

:s;; CK - r
J[T:5.]
S}_I'

recalling Exercise 7.4.4. Consequently, (5) ensures that for all n > - 00

r
J[T=oo)
±
-00
XJ :s;; C 2 + 2CK.
Thus, L~oo XJ < 00, a.c. on the set {T = oo}. By Corollary 2, P{T = oo} =
P{sup.~ -00 IS.I :s;; C} --+ 1 as C --+ 00, and so L~ 00 XJ converges a.c. 0

Corollary 5. If {So = D Xj' ~., I :s;; n < oo} is an 2 1 bounded martingale,


then for every integer k 2': 2, Uk .• = LI:5 i '<"'<ik:5. X i ,X i2 .. ,Xik converges
a.c. as n --+ 00.
PROOF. Since for k 2': 2, LJ=IIXl:s;; (Ii=l XJ)k/2, n> I, Theorem 2
ensures that LJ= 11 X j Ik converges a.c. as n --+ 00, k 2': 2, while Theorem 7.4.3
guarantees that S. converges a.c. The corollary now follows from the identity
of Lemma 10.1.2. D

In view of the identity Uk.• - U k.• - I = X.Uk-t,.-I' n 2': k > 1, the


prior corollary asserts that if S. = L~ X j, n 2': I, is an 2 1 bounded martin-
gale, then Uk .• = I,J=k X j Uk-l,j-l converges a.c. Note that in the forma-
tion of Uk,. the martingale differences X j have been multiplied by an
S'J_I-measurable function, viz., Uk - 1 ,j-l' A more general result of this nature
appears in part (ii) of the ensuing theorem.
For any sequence of LV.S {y"}, the generic notation

Y* = sup I Y"I (6)

will be employed for the maximal function sup. I Y.I.

Lemma 2. Let {So = D


Xj,~.,n 2': l} be an 2 1 martingale and v,. =
(LfIX.n1/r, r 2': 1. Thenfor every K > 0
(7)
where for j = 1, 2, 3, {X~), ~., n 2': I} are martingale difference sequences
satisfying
00

E L IX~I)lr :s;; cr E min(v,., KY (8)


I
410 II Martingales

with c, = 2', r #- 2, C z = 1,
00
E L IX~Z)I ~ 2 EIXTI[T<oo)1 ~ 2 E X*, (9)
1

where

T= inf{n ~ I: v.." == (* IX j I'r I' > K},


E(~ IX~3)I'r/' ~ E v,., {X(3)* > O} c {v,. > K}, (10)

p{X(3)* > O} ~ E;,


PROOF. For K > 0 and T, v.." as in (9) note that v.." ~ v,., Moreover, for
n ~ 1 define
X~l) = XnI[T>nl - d,
E{XnI[T>n)l~n-

X~ZI = XnI[T=nl - E{XnI[T=n)l~n- d,


n -- X nI [T<nl'
X(3)

where ~0 is any sub-a-algebra of ~ I' For r ~ 1

00
~ 2'-1 L E(lXnI[T>nl + IE{XnI[T>n)l~n- d I')
1
00 T-I
~ 2' E DXnl'I[T>n) = 2' E L IXnl'
1 1

~ 2' E min(v,., Ky,


while for r = 2
00 00 T-I
E L IX~I)IZ ~ E L X;I[T>n) ~ E L X; ~ E min(Vz, K)Z,
1 1 1

so that (8) obtains, Furthermore,


00 00
EDX~Z)I ~ 2EDX nI[T=n)1 ~ 2EIX TII[T<001 ~ 2EX*,
1 1

/
E(~ IX~3)l'r/' ~ E(~ IXnl'r ' = E v,.,
and, since {X(3)' > O} c {v,. > K},
P{X(3)' > O} ~ P{v,. > K} ~ K- 1 E v,.,
11.1 Upcrossing Inequality and Convergence 411

Clearly, {X~j), :#'n, n ~ l}, j = 1, 2, 3, are martingale difference sequences


satisfying (7). 0

Theorem 3. Let {Sn"~n,n ~ I} be a martingale and {y",:#'n-"n ~ I} a


stochastic sequence with y* < 00, a.c., where:#'o is any sub-a-algebra of:#'o
(i) If E X* < 00, where X n = Sn - Sn _ I' then Sn converges a.c. on
{If X; < oo}. In particular, ifE(If X;)1/2 < 00, Sn converges a.c.
(ii) If SUPn;;, I E ISn I < 00, then If X n Yn converges a.c.
PROOF. Under the hypothesis of (i), taking r = 2 in the prior lemma, for every
K > 0 there exists a decomposition as in (7) with S~1) = I~ XlI), n ~ 1, an
!e 2 bounded martingale, If IX~2) I < 00, a.c., and

Thus, in view of the arbitrariness of K, the martingale Sn converges a.c. on


{If X; < oo}.
If, as in (ii), {Sn, :#'n, n ~ l} is an !e l bounded martingale, Theorem 2
guarantees that If X; < 00, a.c. Then, since y* < 00 a.c., clearly

I x; Y;
00

< 00, a.c.


1

Define T = inf{n ~ 1:ISnl ~ K or 1y"+11 ~ K}, X~ = XnY"IIT;;,n), and


S~ = LJ; I Xi, n ~ 1. Now {S~, :#'n, n ~ I} is a martingale and If (X~)2 < 00,
a.c. Moreover,

IX~I ~ K(ISnl + ISn-II)I,T;;,n) ~ K[2K + ISnII'T;n)],


sup IX~I ~ K(2K + ISTIIIT<OO),
n

implying EX" < 00. By (i), S~ converges a.c., whence If X n Y" converges
a.c. on {T = oo}. Since Sn converges a.c. and Y* < 00 a.c., P{T = oo} ~ 1
as K ~ 00, whence If X n Y" converges a.c. 0

Corollary 6. Let {Sn = I~ X j' :#' n' n ~ I} be an !e I martingale such that

sup{EIXTI[T<oo)l:stopping times T} < 00.

Then Sn converges a.c. on [If X; < 00].

EXERCISES 11.1

1. If Un is the number of upcrossings of ['1' '2] by the submartingale {Sj' §j =


a(Sj,1 :-; i :-; j), 1 :-; j :-; n} prove that

P{ Un 2: K} :-; (r2 - '1)-1 E(Sn - r 1)+ I[U"=k)' k 2: I.


412 11 Martingales

2. If {X n , n ;;:: I} are !L'I interchangeable r.V.S, prove that I/n L~ Xi~ some LV.
Y (also in !L'I)' Verify that Y = E{X11nf ~n}' where ~n = a(D X,,};;:: n). Hint:
n- 1 D Xi = E{n- I D Xd~n} = E{X II~n}'

3. Let Ybear.v.withE Y = OandEIYllog(l + IY!) < co.lf{~n,n;;:: I}isasequence


of independent a-algebras of events, then E{ Y I~ n} ~ 0 as n -+ co. Hint: Apply
Corollary 4 and Theorem 7.4.8.

11.2 Martingale Extension of


Marcinkiewicz- Zygmund Inequalities
Let (n, :F, P) be a probability space and let {0, n) = :Foe :F Ie· .. c
:F 00 c :F be a sequence of u-algebras with :F 00 = u(ur :F.). For any
stochastic sequence {f., :F., n ~ l} define fo = 0 and
f = {f.,n ~ I}, f* = suplf.!. Ilfll p = supllJ..ll p , foo = lim f.,
.":1 n~ 1 n-+oo

f~ = max IJiI, n ~ 1,
I s;js;.

00 ) 1/2 • )1 /2
S(f) = Soo(f) = ~ dJ
(
' S.(f) = ( ~dJ .

Recall thatfis fiJ p bounded if Ilfll p < 00. Moreover,fwill be called a


submartingale or martingale whenever {f., :F., n ~ I} is such. For any real
numbers a, b let a /\ b = min(a, b) and a v b = max(a, b).

Lemma 1. If f is an fiJ I bounded martingale or nonnegative fiJ I bounded


submartingale and T = inf{n ~ 1: If. I > ..t}, ..t > 0, then
E S}-I(f) + E f}-, ~ 2 EfT fT-1 ~ 2..tllfll. (I)
PROOF. Since 1fT-II ~ ..t, andfoo = Iimf. = lim f., a.c., by Fatou
ElfTfT-tl ~ AElfTI ~ A lim Elf.1 = Allfll,

and so it remains to prove the first inequality in (1). Setting T" = T /\ n for
n ~ 1,
.-1

S;-I(f) + f;-, = 2 L
ls;jS;kS;.-1
djd k =2 L dj(f.-,
j~1
- Ji-I)

whence (employing Corollary 7.4.5 in the martingale case)


E[S}n-,(f) + f}n- ] ~ 2 E fT.!Tn-I'
'
)).2 Martingale Extension of Marcinkiewicz-Zygmund Inequalities 413

Since IfTJTn-11 ::;; A.(A. + IfTI) and ElfTI < 00 by Exercise 7.4.4, the first
inequality of (1) follows as n -+ 00. 0

Lemma 2. If f is a martingale or nonnegative submartingale,for every A. > °


A. P{S(f) > A.,f* ::;; A.} < 211fll, (2)
A. P{S(f) > A.} ::;; 311fll. (3)
PROOF. Theorem 7.4.8 (34) implies A. PU* > A.} ::;; Ilfll, whence (3) is an
immediate consequence of (2). To prove the latter, suppose without loss
of generality that f is 2 1 bounded and define T = inf{n ~ 1: Ifni> A.}.
Now ST-I(f) = S(f) on the set {T = oo} = U* ::;; A.} and, utilizing
Lemma 1,
A. P{S(f) > A.,f* ::;; A.} ::;; A. P{ST-I(f) > A.}
::;; A.-I ES}-I(f)::;; 211fll. D

Lemma 3. Letf be a nonnegative submartingale, 0< e < 00, y" = Sn(ej) v f:,
n ~ 1. Then,for A. > 0, {3 = (1 + 2e 2)1/2, and p E (1, (0)

A. P{Yn > {3A.} ::;; 3 f.[Y n > ,,]


fn, (4)

9p 3/2
IISn(f)ll p ::;; - 1 Ilfnllp, (5)
p-
9p 3/2
IIS(f)ll p ::;; - 1 Ilfll p. (6)
p-
PROOF. Define I j = I[sj(9f»,,) and gj = IJj,j ~ 1. Since Ij+1 ~ I j, neces-
sarily g = {gn, n ~ I} is a nonnegative submartingale. Let T = inf{n ~ I,:
Sn(ef) > A.}. On the set {Sn(ef) > {3A..f: ::;; A.}, note that T ::;; n, g: ::;; A., and
IdTI = 1fT - fT-t1::;; fT v fT-I ::;; f:::;; A., so that, recalling the definition
of {3,
(1 + 2e2)A 2 < S;(ef) = S}-I(ef) + e 2d} + e 2 L dJ
T<j'5,n
::;; A2 + e 2A2 + e 2 L (gj - gj_I)2
T<j'5,n
::;; (1 + e 2)A 2 + e 2S;(g),
implying that Sn(g) > A on the set in question. Hence, applying Lemma 2 to
{gt> ... , gn, gn," .},
A P{Sn(ej) > {3A, f: ::;; A} ::;; AP{Sig) > A.. g: ::;; A} ::;; 211gnll = 211 I nfnll·
On the other hand, by Theorem 7.4.8 (34)

A PU: > A} ::;; f.


In>,,)
In ::;; f. IY n >")
In,
414 11 Martingales

and so, combining these estimates,


AP{ y" > f3A} :$ A PU: > A} + AP{Sn(l~f) > f3A, f: :$ A}

:$ 3 i [Y n > Al
fn

and (4) is proved. To obtain (5), note that via (4)

f3- P E Y: = p rooAP-I P{Yn >


Jo
f3A}dA:$ 3p rooAP-z
Jo
i[Yn>AI
fn dPdA

=.3 P E(fn fnAP-ZdA) = /~ I Efn y :- I

:$ ~I
P-
Ilfnllpll Ynll~-I,

implying

Choose () = p- liZ, whence fJP = (I + (2jp»pIZ < 3 and (5) follows. Finally,
let n -+ 00 in (5) to obtain (6). 0

Theorem 1 (Burkholder). Iff= Un' n 2 I} isan2" 1 martingale and p E (1,00),


there exist constants A p = [18 p 3/Zj(p - 1)]-1 and Bp = 18 p 3/Zj(p _ I)IIZ
such that
ApIISn(f)ll p :$ Ilfnllp :$ BpIISn(f)lIp, (7)

ApIIS(f)ll p :$ IIfll p :$ BpIIS(f)ll p. (8)

PROOF. It suffices to verify (7) since (8) then follows as n


-+ 00. To this end,
set gj = EUn+ I.~), h j = EU; I.~). Then gn = fn+, hn = f;; and jj =
EUnl.~) = gj - h j for 1 :$ j :$ n. Since Sn(f) :$ Sn(g) + Sn(h), by Min-
kowski's inequality and Lemma 3

II Sn(f)ll p :$ IISn(g)ll p + II Sn(h)ll p


9 p3 / Z 18 p3 / Z
:$ - - I (1lgnllp + II hnllp) :$ - - I Ilfnllp,
p- p-
yielding the first inequality of (7). Apropos of the second, suppose without
loss of generality that IIJ~lIp > Oand IISif)llp < oo.Thenfj E2"p, I:$j:$ n,
whence, if

I :$j:$ n,
11.2 Martingale Extension of Marcinkiewicz-Zygmund Inequalities 415

and (lIp) + (llq) = I, it follows that {gj' I ~ j ~ n} is an !£Iq martingale


with Ilgnllq = I and E fngn = IIfnllp. Consequently, if e\ = gb ej = gj -
gj_ 1 for 2 ~ j ~ n, then via the Schwarz and Holder inequalities

= E L djej ~ E Sn(f)Sig) ~ II Sn(f)ll pIISn(g)ll q


I

utilizing the portion of (7) already proved. o


Theorem I in conjunction with Theorem 7.4.8 yields

Corollary 1. If Un' n z l} is an !£II martingale and p E (I, 00), there exist


constants A p = [18 p3 / 2I(p - 1)r I and B~ = 18p 5/2 /(p - 1)3/2 such that

ApIISn(f)ll p ~ Ilf:ll p ~ B~IISn(f)llp, (9)

ApIIS(f)ll p ~ Ilf*ll p ~ B~IIS(f)llp. (10)

The usefulness ofTheorem I will be demonstrated in obtaining a martingale


strong law of large numbers and an extension of Wald's equation that does
not require integrability of the stopping rule.

Corollary 2. Iff = Un, ffn , n z I} is an !£I2r martingale such that for some r z 1

(II)

then fnln .;;:.


.z- 2.
O.

PROOF. The argument of Theorem 10.1.3 carries over verbatim. o


Corollary 3. If f = Un, ffn , n z I} is an !£I r martingale such that for some r in
(1,2] and B E (0,00)
n

supn- I LE{!J} - J}-ll'lffj-d ~ B, a.c., (12)


n", I j= I

then E fT = E fl for every stopping time T with E Tl/r < 00.

PROOF. If 1;, = T /\ n, then EfT" = E fl for n z I by Corollary 7.4.4. Hence,


it suffices via dominated convergence to prove that Z = sUPn", I I fT.! is
416 11 Martingales

integrable. To this end, set m = [v'] for v> 0, whence, employing Theorem
7.4.8,
P{Z ~ v} ~ P{T ~ v'} + P{T ~ m, Z ~ v}

~ P{T ~ v'} + p{ max IfTjl ~ v}


1 S:Jsm

~ P{T ~ v'} + v-' ElfTJ. (13)


By Theorem 1 there exists a constant C E (0, 00) such that
T'" )'/2 ~ CE ~ Idjl' T."
ElfTJ ~ C E ~dJ (
Tm
~ CE LE{ldXI~j-d ~ CBE Tm
I

= CB(m P{T ~ v'} + [T) (14)


JIT,;;V'I

in view of (29) of Corollary 7.4.7 (which holds in general) and (12). Con-
sequently, from (13) and (14)

P{Z ~ v} ~ (1 + CB)P{T ~ v'} + CBv-' [ T,


JIT';; v'l
whence

EZ = [00 P{Z 2: v}dv ~ (l + CB)E Til' + CB [00 v- r [ T dP dv


Jo Jo JIT,;; vrl
= (1 + CB)E Til' + CB [ T dP foo v-' dv
In Tllr

= (1 + CB + CB)E Til' <


r- 1
00. o

common variance (12 < 00, E If Xi = whenever E T I /2 < 00. °


Thus, if {X., n ~ I} are independent r.v.s with common mean zero and

In contrast to the Marcinkiewicz-Zygmund theorem (Theorem 10.3.2),


Theorem 1 does not hold for p = 1. For example, if {X., n ~ l} are i.i.d. LV.S
with P{X. = ±l} = t, lv,. = L'i X j and T = inf{n ~ 1: lv,. = l}, then
U. == WT ,,"' n ~ I} is an !£ 1 martingale with
Ilf.11 = Elf.1 = Ef.+ + E f; = 2 E f: -+ 2
as n -+ 00, noting that E J. = 0. However,

IIS.U)II = E S.U) = E ( JI
TA.
1
)1/2
= E(T /\ n)1/2 -+ 00,
11.2 Martingale Extension of Marcinkiewicz-Zygmund Inequalities 417

so that the first inequality of(7) fails for p = 1. However, the second inequality
does hold. More precisely, Corollary 1 obtains when p = 1, as will be shown
in the next theorem.

Lemma 4. Let f = {fn == D dj , n :2: I} be an!l'l martingale with Idjl ~ J-j,


where {v", ff n _ I> n :2: I} is a stochastic sequence. If A > 0, fJ > 1 and 0 < tJ <
fJ - 1 then

P{f* > fJA, S(f) v V* ~ tJA} ~ (fJ _ ~~ 1)2 P{f* > A}, (15)

9tJ 2
P{S(f) > fJA, f* v v* ~ tJA} ~ fJ2 _ tJ2 _ 1 P{S(f) > A}. (16)

PROOF. Set So(f) = 0 and define


J.L = inf {n :2: 1: I fn I > A}, v = inf {n :2: 1: If" I > fJA},
a = inf{n :2: 0: Sn(f) v v,,+ 1 > tJA}.
If
n

hn = LdjI[IJ<j$vl\al
j= 1

then h = {h n, n :2: I} is a martingale with S(h) = 0 on {J.L = oo} = {f* ~ A}.


Moreover, recalling that Idjl ~ V; by hypothesis,
S2(h) ~ S;(f) = [S;-l(f) + d;]I[a<ool + S;(f)I[a=ool
~ 2tJ 2A2.
Hence,
IIhll~ = E S2(h)I(J'>J.] ~ 2tJ 2A2 P{f* > A},
implying via the Kolmogorov inequality (Theorem 7.4.8) that
P{f* > fJA, S(f) v V* ~ tJA}
= P{a = 00, v < oo,hn = fVl\n - fjJl\n,n:2: I}
~ P{h* > fJA - (I + tJ)A} ~ [(fJ - 1 - tJ)Ar2I1hll~
2tJ 2
~ (fJ - tJ - 1)2 P{f* > A}.

To obtain (16), define analogously


J.L' = inf{n :2: 1: Sn(f) > A}, v' = inf{n :2: 1: Sn(f) > fJA}
a' = inf{n:2: O:f~ v v,,+l > tJA}.
If
n

gn = L djI lI" <js v' 1\ a'l'


t
418 II Martingales

then 9 = {gn, n ~ I} is a martingale with g* = 0 on Vi' ~ a'} ~ {S(f) :::;; A},


whence
g* :::;; (f;, + /:.)1[1"<"'<00] + 2f*/[I"<"'=001
:::;; 3JA,
implying E(g*? :::;; 9J 2 A2 p{S(f) > A}. Thus,
P{S(f) > /3A,f* v V* :::;; JA}
= P{ v' < 00 = a', S;(g) = S;, ,..if) - S~, An(f), n ~ I}
:::;; P{S2(g) > [/32 - (I + J2)]A 2} :::;; [(/32 - 1 - J2)A 2 r t ES 2(g)

:::;; [(/32 - 1 - J2)A 2 r I E (g*)2 :::;; /32 _ ~~2_ I P{S(f) > A}, 0

Lemma 5. If f = {f" = L~ d j , n ~ I} is an !E I martingale and

then 9 = {gn, n ~ I} and h = {h n, n ~ l} are !E I martingales with f" =


gn + hn, n ~ 1 and
(19)
00

L Idn/lld"I>2d~_tll :::;; 2d*, (20)


n= I

00

L Elbnl :::;; 4 E d*, (21)


n=1

PROOF, The validity of(l9) is clear. On the set {jdjl > 2dj_ d, Idjl + 2dj_ t :::;;
21djl :::;; 2dj, implying
00 00

L Idj/lldil>2dJ_ tll :::;; 2 L (dj - dj-I) = 2d*,


I I

which is (20), This, in turn, yields (21) via


00 00 00

LElbjl:::;; ELldj/lldjl>2dj_tll + LEIE{dj/lldjl>2dj_tllffj-dl


I I I

:::;; 2 E d* + 2 E d* = 4 E d*, o
Theorem 2 (Davis). There exist constants 0 < A < B < 00 such that, for any
fill martingale f = {f", n ~ I},
A E S(f) :::;; Ef* :::;; BE S(f), (22)
11.2 Martingale Extension of Marcinkiewicz-Zygmund Inequalities 419

PROOF. Writing f,. = g. + h. as in Lemma 5, it follows therefrom that


00

Ef* ~ E(g* + h*)~ Eg* + IElbjl ~ Eg* + 4Ed*, (23)


1
00

E S(f) ~ E[S(g) + S(h)] ~ E S(g) +I Elbjl ~ E S(g) + 4 E d*. (24)


1

Since g. = Ii aj is a martingale with Ia.1 ~ 4d:_ I' Lemma 4 ensures that


for . 1. > 0, p > <5, and 0 < <5 < 1
2<5 2
P{g* > p..1., S(g) v 4d* ~ <5..1.} ~ (p _ <5 _ 1)2 P{g* > ..1.},

9<5 2
P{S(g) > p..1., g* v 4d* ~ <5..1.} ~ p2 _ <5 2 _ 1 P{S(g) > ..1.}.

Hence,
2<5 2
P{g* > P..1.} ~ P{S(g) > <5..1.} + P{4d* > <5A.} + (P _ <5 _ 1)2 P{g* > ..1.},
9<5 2
P{S(g) > P..1.} ~ P{g* > <5..1.} + P{4d* > <5,1,} + p2 _ <5 2 _ 1 P{S(g) > ,1,},
implying
2<5 2
P- I Eg* -< <5- t ES(g) + 4<5- 1 Ed* + Eg* (25)
(P - <5 - 1)2 '

p- 1 E S(g) ~ <5 - 1 E g* +
9<5 2
4<5 - I E d* + p2 _ <5 2 _ 1 E S(g). (26)

Now, recalling Lemma 5,


00

E g* ~ E(f* + h*) ~ Ef* + L: Elbjl ~ E f* + 4 E d*, (27)


1
00

E S(g) ~ E[S(f) + S(h)] ~ E S(f) +I Elbjl ~ E S(f) + 4 E d*. (28)


I

In order to subtract in (25) and (26), rewrite these for the martingale
{gt> g2, ... , g., g., g., ...}, obtaining via (27) and (28)
2
2<5 ]
[ p-l - (P _ <5 _ 1)2 Eg: ~ <5- 1 ES.(f) + 8<5- 1 Ed: ~ 9<5- 1 ES(f),

(29)

[p-I - p2 !~: _I]ES.(9) < <5- 1


Ef: + 8<5- 1 Ed: ~ 17<5- 1
Ef*.

(30)
420 11 Martingales

For small b, the coefficients on the left in (29), (30) are positive, and so, letting
n -+ 00 and recalling (23), (24), there exist constants B I , B, AI, A for which
E f* ~ BI E S(f) + 4 E d* ~ BE S(f),
E S(f) ~ A I E f* + 4 E d* ~ A - I E f*. 0

Corollary 4. If the stochastic sequence {IX.IP, fff'., n ~ l} is uniformly


integrable for some p E (0,2) and S. = L~ Xi' then

lim ~ EIS. - anl P = 0, (31)


n-<X) n
where an == 0 for 0 < p < 1 and an = Li = I E{X j Ifff' j _ d if 1 ~ p < 2.
PROOF. Define Y" = X. if 0 < p < 1 and Y. = X. - E{X.Ifff'n-d when
1 ~ p < 2. Then {I Y" IP, n ~ I} is u.i., whence for any t: > 0 there exists a
constant M > 1 with

Set
Y~ = Y. - Y~.

Then, irO < p < 1,

E IS.I P = EI t (Yj + Yj) IP ~ E (t I Yj I) P +E (t I Yj I) P


~ (nM)p + LEI YjlP ~ (nM)p + nt:
I

and (31) follows. If, rather, 1 ~ p < 2, then Theorems 1 and 2 guarantee the
existence of a constant C with

~ CE[(t Y?f2 + (t Y?f2]


~ C [(nM 2)p/2 + E t I Yj IP] ~ C[(nM 2)p/2 + nt:],
again yielding (31). 0

EXERCISES 11.2
I. Let {X., n ~ I} be independent LV.S with EX. = 0, EX; = I for n ~ 1. If T =
inf{n ~ l:D X j > O}, then P{T < oo} = I and E T I /2 = 00.

2. Let {X.,n~ I} be interchangeable LV.S with EX I =0= EX I X 2 , Exf = 1. If


T = inf{n ~ I: D
Xi> O}, then P{T < oo} = I and E T I /2 = 00.
11.3 Convex Function Inequalities for Martingales 421

3. If {X, X., n ~ l} are martingale differences with EX = 0, EIXI P < 00 for some
p ~ 2, and {S., n ~ I} are the partial sums, then {IS./n 1/ 2 IP, n ~ l} are u.i. Hint:
Consider EIS~IP+l and EIS;I Pwhere X~ = X.I[IXnl$MJ - E{X.I[IXnl$MJIX 1 ••• X.- 1 }
and X; = X n - X~.
4. If {X., n ~ I} are martingale differences and p E (I, 2J then E SUP.2 I ID X X :s;
A p L;'O EI X.I P for some finite constant A p .

11.3 Convex Function Inequalities for


Martingales
In the prior section it was shown that for p ~ 1 and any martingale f there
exist constants 0 < A p < Bp < 00 such that

(I)

where f* and S(f) are defined in Section 2. Here, it will be shown for any
convex, nondecreasing function <1> on [0, 00) with <1>(0) = 0 and

<1>(2.1) :s;; c<1>(A) for all A ~ 0 and some c > 0


that there exist constants 0 < A e < Be < 00 such that for any martingale f
A e E <1>(S(f) :s;; E <1>(f*) :s;; Be E <D(S(f». (2)

In what follows, let (0, .?, P) be a probability space, {ff n' n ~ I} an


increasing sequence of sub-a-algebras of ff, and ff 0 = {0,0}.

Lemma 1. Let <1> be a nondecreasing function on [0, 00], continuous on [0, 00)
with <1>(0) = 0, <1>( 00) = <1>(00 - ), and
<1>(2.1) :s;; c<1>(A) for all A E [0, 00) and some c > O. (3)
Iff and g are nonnegative, measurable functions on (0, ff, P) and b, e, f3 are
positive constants with f3 > 1 satisfying
eC I +Iogp < 1 where 210gp = f3, (4)

P{g> f3A,f :s;; bA} :s;; eP{g > A} for all A E (0,00), (5)
there exists A = A e• P. o. e E (0, 00) such that
E<1>(g):s;; A E<1>(f). (6)

PROOF. Suppose without loss of generality that E <1>(f) < 00. From (3) and
(4) there exists y = 'Ie. p E (0, 00) and Yf = Yfe.o E (0, 00) with 'Ie < 1 such that
for all A> 0

(7)
422 II Martingales

Since <I>(t) = fO' I[O,t)(A)d<l>(A), for any nonnegative measurable function h


on (O,~, P)

E <I>(h) = E {Xl (A)d<l>(A) = E {\;.. ro)(h)d<l>(A)


1[0, h)

= {ro P{h > A}d<l>(A). (8)

Now (5) ensures that


P{g > f3A} ~ P{f > t5A} + t: P{g > A},
and so via (8) and (7)

E <l>(f3- 'g) = {ro P{g > f3A.}d<l>(A.)


~ {ro P{f > t5A.}d<l>(A.) + {ro P{g > A.}d<l>(A.)
t:

= E <1>(15 - If) + t: E <I>(g) ~ 11 E <I>(f) + t: E <I>(g). (9)

Let gn =9 /\ n for n ~ 1. Then gn satisfies (5), so that from (6) and (9)

implying
(1 - ')'t:)E <l>(gn) ~ ')'11 E <I>(f),
and hence (6) and A = ')'11/(1 - ')'t:). o
Lemma 2 (Zygmund). If<l> is a convex function on an interval [a, b), where a
is finite, there exists a nondecreasing, integrable cp on [a, c)for every c E (a, b)
such that

<l>(t) = <l>(a) + {CP(U)dU, t E [a, b). (10)

PROOF. Since <I> is convex, cp(t) = limo<h_o[<I>(t + h) - <I>(t)]/h, t E [a, b)


exists, is a finite, nondecreasing function on [a, b), and coincides with
d<l>(t)/dt, a.e. Now <I> is absolutely continuous on [a, c) for every c E (a, b)
and so (10) follows. 0

Lemma 3 (Burkholder- Davis-Gundy). Let <I> be a nondecreasing function on

°
[0, 00], finite and convex on [0, 00) with <1>(0) = 0, <I>( 00) = <I>( 00 - ) = 00,
and <I>(2A.) ~ c<l>(A.) for all A. ~ and some c E (0, 00). Then there exists a
constant B = Be E (0, 00) such that for every sequence {Zn, n ~ I} of non-
negative measurable functions on (0, ~, P)

(11 )
11.3 Convex Function Inequalities for Martingales 423

PROOF. Set Wo = Zo = 0 and define for n ~ 1

• 00

JtY" = L E{ZJ~j-l}' W = Woo = L E{zjlS;;j_ d·


1 1

In verifying (II) it may be supposed that E Z > 0 and E «I>(Z) < 00. Then
convexity and cI>(00-) = 00 entail E Z < 00, so that E W < 00. For A. ~ 0
define the S;;.-time T = inf{n ~ 0: JtY,,+ 1 > A.}, whence

i
[O,;T,;n]
(W - Z - A.) =
j=O
f i
[T=j)
(~ - A. + W - ~- Z)

~
j=O
f i
[T=j)
E{W - ~- ZIS;;j} ~0
or

i [O,;T,;.)
(W - A.) ~ i [O,;T,;.)
Z,

and so, letting n ~ 00,

E(W - A.)+ = i[O,;T<oo)


(W - A.) ~ i [O,;T<oo)
Z = i [W>).)
Z. (12)

Next, setting d = c - I and applying Lemma 2,


21
tcp(t) ~ f I cp(u)du = cI>(2t) - «I>(t) ~ d«l>(t). (13)

Since cp is nonnegative, nondecreasing on [0, 00), and bounded on every


finite interval [0, M], necessarily cp is a function of bounded variation. Hence,
via integration by parts (for Riemann -Stieltjes integrals), for t E (0, 00)

Lcp(U)dU = tcp(t) - Lu dcp(u)

= L(t - u)dcp(u) + tcp(O) = {oo (t - u)+ dcp(u) + tcp(O),

and so by Theorem 6.2.4 and Lemma 2,

cI>(t) = {oo (t - u) + dcp(u) + tcp(O), (14)


424 II Martingales

where the integral is in the Lebesgue-Stieltjes sense. From (14), recalling (12),

E<I>(J¥,,) = E[LXl(J¥" - ut d<p(u) + J¥"q>(0)]

= 100

E(J¥" - u)+ dq>(u) + q>(O)E J¥"

~ fOO
o
r
J[W >")
Zn dP dq>(u) + q>(O)E Zn

f
n

= EZ n d<p(u) + q>(O)EZ n = EZnq>(J¥,,),


[D.W n )

implying

fora E (0,00). (IS)

Let ljJ on [0,00) be an inverse function of <p, that is, ljJ(<p(u» = u =


<p(ljJ(u» for u E (0,00), and set

'P(t) = LljJ(U)dU.

Then, by Young's inequality (Exercise 6.2.15) uv ~ <I>(u) + 'P(v) for u,


v E (0, 00) with equality holding for v = q>(u). Consequently,

and so from (15)

Next, take v = q>(u) in Young's inequality, yielding via (13)


'P(q>(t)) = tq>(t) - <I>(t) ~ (d - 1)<I>(t), d";? I,
and, combining this with (16),

[a - b(d - I)]E<I>(~) ~ b E<I>(~!} (17)

From (13), <p(u) ~ (d/u)<I>(u), whence for r E [1, 00)


<I>(ru)
log - - =
fro -q>(t)- dt < d fro -dt = log r d
<I>(u) "<I>(t) - u t
or
r ";? I. (18)
11.3 Convex Function Inequalities for Martingales 425

Taking a = d and b = 1 in (17),

E <I> (~) :::;; E <I>(Zn),

whence, recalling (18),


E <l>(lv,,) :::;; E <I>(dZn) :::;; dd E <l>(Zn), (19)
and (11) follows with B = dd by monotone convergence. o
Corollary 1 (Garsia). For every sequence {zn' n ~ I} ofnonnegative measurable
functions on (n, :F, P) and any p > 1

(20)

PROOF. When <I>(t) = t P, (13) yields d = p, whence (20) follows directly from
(19).

Theorem 1 (Burkholder-Davis-Gundy). Let <I> be a nondecreasing function


on [0, 00], finite and convex on [0, (0) with <1>(0) = 0, <I>( (0) = <1>( 00 - ), and
<I>(2A.) :::;; c<l>(A.) for all A. > 0 and some (; in (0, (0). Then there exist constants
o < Ae < Be < 00 such that for any martingale f
Ae E <l>(S(f» :::;; E <l>(f*) :::;; Be E <I>(S(f), (21)
where S(f) and f* are as in Section 2.
PROOF. By Lemma 11.2.5,f = 9 + h, where the martingales g, h are defined
by gn = Ij= t aj, hn = Ij= I bj with
an = dnI[ld"I~2d~_tl - E{dnllld"I~2d~_tll:Fn-d,
bn = dnllld"I>2d~_tl + E{dnllld"I~2d~_tll:Fn-d,
00

Z == L Idjllldjl>2dj_tll :::;; 2d*,


I

00 00

W == L IE{djI[djl~2dj_tll:Fj-d I:::;; L E{ldjIIlIdjl>2dj_tll:Fj- d. (22)


I I
Then
00

f* :::;; g* + h* :::;; g* + L1bjl :::;; g* + Z + W,


I
00
(23)
S(g) :::;; S(f) + S(h) ~ S(f) + L1bjl :::;; S(f) + Z + W,
I
and
S(f) ~ S(g) + S(h) ~ S(g) + Z + W, d* ~ S(f)
(24)
g* :::;; f* + h* :::;; f* + Z + W, d* :::;; 2f*.
426 II Martingales

By Lemmas 11.2.4 and 1 there are finite, positive constants B j = Bic),


j = 1,2 such that
E <I>(g*) :::;; B I E <I>(S(g) v 4d*) :::;; B I E[$(S(g» + <I>(4d*)],
(25)
E $(S(g» :::;; B2 E <I>(g* v 4d*) :::;; B2 E[<I>(g*) + <I>(4d*)].
Moreover, by Lemma 3, for some B 3 = B 3(c) E (0,00)
E <I>(W) :::;; B3 E <I>(Z), (26)
whence via (23)
E $(f*) :::;; E <I>(g* + Z + W) :::;; E[$(3g*) + $(3Z) + <1>(3 W)]
:::;; c2 E[$(g*) + <I>(Z) + <I>(W)]
:::;; c2B I E[$(S(g» + <I>(4d*)] + c 2 E[$(Z) + <I>(W)] by (25)
:::;; c 2B I E <I>(S(f) + Z + W) + c 2B) E $(4d*)
+ c 2 E[<I>(Z) + <I>(W)] by (23)
:::;; c4 B I E <I>(S(f) + c2B I E <I>(4d*)
+ (c 4 B I + c 2 )E[<I>(Z) + $(W)]
:::;; c4 B I E <I>(S(f) + c2 B I E <I>(4d*)
+ (c 4 B I + c 2 )(1 + B 3 )E $(Z) by (26)
:::;; c 4 B I E <I>(S(f» + c4 B I E $(d*)
+ (c 5 B I + c 3 )(1 + B 3 )E $(d*) by (22)
:::;; [2c 4 B I + (I + B3 )(c SB I + c 3 )]E $(S(f) by (24),
yielding the upper inequality in (21). Similarly, for some finite, positive
constants A j = Aic), 1 :::;; j :::;; 7,

E $(S(f) :::;; E $(S(g) + Z + W) by (24)


:::;; A 2 E[$(g*) + <I>(4d*) + <I>(Z) + <I>(W)] by (25)
:::;; A 3 E[$(f* + Z + W) + <I>(4d*) + $(Z) + <I>(W)] by (24)
:::;; A 4 E[<I>(f*) + $(d*) + $(Z) + $(W)]
:::;; As E[<I>(f*) + <I>(d*) + <I>(Z)] by (26)
:::;; A 6 E[$(f*) + $(d*)] by (22)
:::;; A 7 E $(f*),
completing the proof of (21).

As will be demonstrated shortly, it is useful to have a version of Theorem I


with S(f) = (If dJ)1/2 replaced by s(f) = (If = I E{dJ I~j_ d)1/2, and for
this an analogue of Lemma 11.2.4 is needed.

Lemma 4. Iff is a martingale, I :::;; (X :::;; 2, and

(27)
11.3 Convex Function Inequalities for Martingales 427

then for any A. > 0, f3 > 1, and b E (0, f3 - 1) there exists a finite, positive
constant Ba such that
B ba
PU* > f3)., s(f) v d* ::;; bA.} ::;; (f3 _ ~ _ b)a PU* > A.}. (28)

PROOF. Define sif) = (D= 1 E{ldjlal~j_ d)"a and

Jl = inf{n ~ 1: Ifni> A.},


V = inf{n ~ 1:lfnl > f3A.},
(J = inf{n ~ 0: Idnl V Sn+ l(f) > bA.}
and
n n

hn = L djI IIl <j$vA<11 == L aj


j= 1 j= 1
(say).

Then {h n , ~n, n ~ I} is a martingale with S(h) = ° on {Jl = oo} and


sa(h) = ~(h)IIIl<OOI ::;; baA.aII!">).I'
Hence, by Theorems 11.2.1 and 11.2.2, for some B = B a E (0, (0)
)a/2
= sup Elhnl a ::;; E h*a ::;; Ba E ( L aJ
00 00
IIhll~ ::;; Ba E Liajla
n,,- 1 1 1

= Ba E sa(h) ::;; BabaA.a PU* > A.},


whence
PU* > f3)., s(f) v d* ::;; bA.}
= P{v < 00, (J = 00, hn = fVAn - fllAn for all n ~ I}
::;; P{h* > f3A. - A. - bA.}

::;; Ba (f3 _ ~ -br PU* > A.}. o

Theorem 2. Let «I> be a nondecreasing function on [0, 00], continuous on [0, (0)
with «1>(0) = 0, «1>(00) = «I>( <X) - ), and «I>(2A.) ::;; c«l>(A.) for all A. > and some
c E (0, (0). Then, for every IX in [1, 2] there exists a finite positive constant
°
B = Be• a such that for any martingale f
E «I>(f*) ::;; B E «I>(s(f» + B E «I>(d*), (29)

wheres(f) = (Lj=l E{ldjlal~j_d)l/a.

PROOF. Choose f3 = 3 and bE (0, 1) such that Ba baC 3 < 1, where B is the
constant in (28). Then by Lemmas 1 and 4 there exists B = Be. a E (0, (0)
such that
E «I>(f*) ::;; B E «I>(s(f) v d*) ::;; B E[«I>(s(f» + «I>(d*)]. 0
428 11 Martingales

Theorem 2 will be applied in generalizing Theorem 9.1.2 so as to encom-


pass convergence of absolute moments.

Corollary 2 (Brown). Let {X n , n ~ I} be independent LV.S with E Xn = 0,


EX; = 0';,
s; = L~ O'f, If {X n } obey a Lindeberg condition of order r ~ 2,
that is,
all t: > 0, (30)
then
(31)

PROOF. Set
X~ = XnIIIXnl9nJ - E XnIIIXnI9nJ'

X~ = XnIIIXnl>snJ - E XnIIIXnl>snJ'
and Sn = L~ Xj' S~ = D Xi, s; = D Xi· Then Sn = S~ + S; and, by
Theorem 2, for some constant C E (0, 00)
]('+ll/2
L E(Xi?
n
EIS~I'+ 1 ~ C [ + C E max IXil'+ 1
1 I$j$n
~ C(S~+I + 2r+ls~+I).

Thus, EIS~/snl'+1 ~ CO + 2,+1), implying {IS;';snl',n ~ I} is uniformly


integrable.

*
Again via Theorem 2, for some B in (0, 00)

EIS;I' ~ B[ E(Xi)2T2 + BEI~~:nIXil'


~ B(t
1
r
JIIXjl>Sj)
Xl]'/2 + 2'B t Jr
j=1 !IXjl>Sj)
IXX

= 0(s;)'/2 + o(s~) = o(s~)

since as noted in Section 9.1 (see, e.g., (10) therein), a Lindeberg condition of
order r > 2 ensures that of lower orders. Thus, E IS:/sn I' = 0(1), implying
{I S:/sn I', n ~ I} is uniformly integrable. Consequently, the same is true of
{ISn/snl', n ~ I}, and so, in view of the central limit theorem (Theorem 9.1.1)
and Corollary 8.1.8, the conclusion (31) follows.

In order to complement Theorem 2 a counterpart to (14) of Lemma 3 is


needed for concave functions. A function <I> is concave if - <I> is convex.

Lemma 5. If <I> is a nondecreasing function on [0, 00], finite and concave on


(0, 00) with <1>(0) = 0, then

<l>(t) = L q>(u)du, t E [0, 00), (32)


11.3 Convex Function Inequalities for Martingales 429

for some finite nonnegative, nonincreasing function cp on [0, 00) and, moreover,

<l>(t) = - I" (t A u)dcp(u) + tcp( 00) (33)

for t E [0, 00), where cp(00) = lim,... 00 cp(t).


PROOF. Lemma 2 ensures (32) and, since <I> is nondecreasing, cp ~ 0. Via
integration by parts in (32) for Riemann-Stieltjes integrals,

<l>(t) = tcp(t) - {u dcp(u)

= - {u dcp(u) - fOOt dcp(u) + tcp( 00)

= - 100

(t A u)dcp(u) + tcp( 00). o


The following is a direct analogue of Lemma 3.

°
Lemma 6. If <I> is a nondecreasing function on [0, 00], finite and concave on
(0, 00) with <1>(0) = and <I>( 00) = <1>(00 - ), thenfor every sequence {zn, n ~ I}
of nonnegative measurable functions on (n,:#', P)

E<I>(~Zj) ~ 2E<I>(~E{Zjl:#'j_d). (34)

°
PROOF. Define
for A >
Jv", Zn, W = Woo, and Z = Zoo as in the proofof Lemma 3, and
set T = inf{n ~ 0: Jv,,+ t > A}. Then WT ~ W A A and
00 00

E ZT = E L zilT~jl = E L E{zjl:#'j_ d I 1nJl


1 1

= E WT ~ E(W A A),
whence
E(Z A A) ~ E(ZT + A.I IT < 00)

~ E(W A A) + A P{W > A} ~ 2 E(W A A).


Consequently, employing Lemma 5,
OO

E <l>(Z) = E[ - 1 (z A u)dcp(u) + ZCP(oo)]


00

= - 1 E(Z A u)dcp(u) + cp( oo)E Z


00

~ -21 E(W A u)dcp(u) + cp(oo)E W = 2E <I>(W). 0


430 11 Martingales

Theorem 3. If (J) is a nondecreasing function on [0, 00], finite and concave on


(0, (0) with (J)(O) = 0, (J)( (0) = (J)( 00 - ) and IX E [1, 2], there exists a constant
A = A<I E (0, (0)' such that for any martingale f
E (J)(f*") ~ A E (J)(s<l(f», (35)
where s<l(f) = If E{ldjl<ll~j_d·
PROOF. By Theorems 11.2.1 and 11.2.2, for some constant C = C<I E (0, (0)
)<112
df
00 00

E(f*t ~ C E ( t ~ C E t1djl<l = C E s<l(f). (36)

For u > 0, define

T = inf{n ~ O:S:+I(f) == nfE{ldjl<ll~j_d > u}


Then sT(f) ~ s<l(f) /\ u and
(f*)<1/\ U ~ uI[T<oo) + suplfTAnI<l.
n~l

Hence, via (36),


E[(f*)<1 /\ u] ~ U P{s<l(f) > u} + C E ~(f)
~ (C + 1)E[s<l(f) /\ u].
Consequently, from Lemma 5 and (36)

E (J)(f*<I) = - {Xl E(f*<I /\ u)dcp(u) + cp(oo)E f*<I


~ -(C + 1) LooE[S<l(f) /\ u]dcp(u) + Ccp(oo)E s<l(f)

~ (C + o[- LooE[S<l(f) /\ u]dcp(u) + cp(oo)E S<l(f)J


= (C + 1)E (J)(s<l(f». 0

Corollary 3. If °
< p < IX and 1 ~ IX ~ 2, there exists a constant
A = A<I E (0, (0) such that, for any martingale f,

(37)

PROOF. (J)(t) = tP/<I, t ~ 0, satisfies the requirements of Theorem 3, whence


E(f*Y = E (J)(f*<I) ~ A E (J)(s<l(f» = A E sP(f). 0

Lemma 7. If {X n, ~n' n ~ 1} is a stochastic sequence and T is an ~n-time


satisfying
11.3 Convex Function Inequalities for Martingales 431

T
E L
i= I
E{IXil11~i_d < 00, (38)

for some y ~ 1, then

(39)

PROOF. When y = 1, the first two lines of the proof of Theorem 7.4.6 (with X:
replaced by IXIII) yield (39). Fory > 1, set v",11 = Ll= I E{IXlllal~i_I}' IX> 0,
n ~ 1, and Yi = IXiI - E{IXill~i-d, i ~ 1, and consider the martingale
f.. = Ll=l YiIIT;;,il' n ~ 1. Since DO=1 E{I YiIIIT~i]I~i-d = L[E{I Yill~i-d
~ 2V1 ,T, it follows, taking IX = 1, et>(x) = Ixl 1 in Theorem 11.3.3, that

1 1
L L
E TAll Yi 1 ~ E sup I" YiIIT;;,i) 1
II II ~ I I

~ B'E[(2V1,T)Y + (~~r IYiIIIT;;,i 1) ]


~ B· E [(2V 1, T)1 + it YiI1IIT~ilJ
l
I

= B·lE(2V1,T)1 + E JI IIT;;,ilE{IYiI11~i-dJ
I
~ 21 + B[E Vl,T + E ~,T].
Consequently,

E ( TAll
~ IX;! )1 ~21 E TAli
~ [I Yi 11 +EVl,T] =0(1),
and so monotone convergence yields (39). o
Theorem 4. If {SII = LJ= I Xi' ~II' n ~ I} is a martingale and T is an ~II-time
with
T )112
E L E{IX;!11~i_d <
T
00, E ( ~ E{Xfl~i-1 < 00 (40)
I

for some y ~ 2, then


(41)
PROOF. By Theo.rem 11.2.1,
TAll
E 1ST 1\111 1 ~ 0(1)' E ( ~ xf
)11 2~ O(I)E (T~ xf )112 = 0(1)
432 II Martingales

via Lemma 7 with IXI replaced by X 2 , whence the conclusion follows by


Fatou's lemma. 0

Corollary 4. If Sn = Lj;l Xj' n ~ 1, where {X n , n ~ I} are independent ran-


dom variables with E X n = 0, n ~ 1, and sUPn~l EIX nl1 < 00 for some y ~ 2,
then EISTI1 < 00 for any {Xn}-time Twith E p/2 < 00.

EXERCISE 11.3
1. Prove Corollary 11.2.3 by applying Corollary 11.3.3.

11.4 Stochastic Inequalities


Throughout this section the generic notation
a=l-a (1)
will be adopted and the conditional expectation E{ V I<:§} and conditional
variance will be denoted E!f V and ai( V) respectively. Integrability require-
ments in many of the lemmas of this section can clearly be weakened to a.c.
finiteness of the conditional expectations therein.

Lemma 1. If<:§ is a a-algebra ofmeasurable sets, 13 is a <:§-measurable r. v. at most


equal to one a.c., and V is an .!l' 2 random variable satisfying
E!f V ~ - aai( V), P{f3 +V ~ l} = I, (2)
for some a > 0, then

E f3+V <_13_
!f 1 + af3 + V-I + ap'
PROOF.

1 1 aV
1 + af3 + V - 1 + ap = 1 + af3 + V 1 + ap

=
aV (1 1 av)
1 + ap 1 + ap + 1 + af3 + V 1 + ap ~
2 2
aV + a V
(1 + ap)2 ' (3)
and so via (2)
1 1 1 a E!f U
1 + af3 + E!f U - 1 + ap 1 + af3 + E!f U 1 + ap
< - a 2 ai( U)
(4)
- (I + af3 + E!f U)2'
11.4 Stochastic Inequalities 433

Hence, from (3) and then (4)

E 1 _ E 1
'§ 1 + 1X{3 + U - '§ 1 + 1X({3 + E'§ U) + (U - E'§ U)
1 1X2CJ~( U) 1
< + <--.
- 1 + 1X{3 + E'§ U (I + 1X{3 + E'§ U)2 - 1 + 1X/3
Consequently,

E f3 + U
'§ I + 1X{3 + U
= E ! [-I +
'§ IX
1 + IX ] <! [-I
1 + 1X{3 + U - IX
+~]
1 + 1X/3
f3
=1+1X/3"
o

Lemma 2. If {Sn = L1 Xi' n ;;::: I} is an Y 2 stochastic sequence satisfying


E~nXn+ I ~ -1XCJ.}JX n+I), P{Sn ~ I} = I, n;;::: I, (5)
for some IX> 0, then {Sn/[I + 1X(1 - Sn)], 17 n, n ;;::: I} is a supermartingale.
PROOF" By Lemma I, for n ;;::: 1

E~
Sn+1
n 1 + IXSn+1
-E
- ~ Sn+Xn+1 Sn
<----"--,~
n 1 + IXS n + X n+1 - 1 + IXS n
o
According to the ensuing theorem, it is uncertain that a nonnegative super-
martingale Sn starting in (0, I) will ever exceed one if the conditional coef-
ficients of variation of the differences are bounded away from zero.

Theorem I (Dubins-Savage). If P{O ~ X I ~ I} = 1 and {Sn = I.i Xi'


17n' 1 ~ n < oo} is a nonnegative Y 2 supermartingale satisfying
E{X nl17 n-d ~ -IXCJ}n_,(X n), n> I, (6)
for some IX > 0, then
X
P{Sn;;::: I, some n ;;::: I} ~ E 1 + 1X(11- XI) (7)

PROOF. Let T = inf {n ;;::: I: Sn ;;::: l} and set v" = L7= I lj, where
y" = XnI[T>nl + (l - Sn-I)I[T=nl' So = o. (8)
On [T = n], 0 < Yn = 1 - Sn-I ~ X n , and so Y; ~ X; on n, whence
E'F n_I Y; ~ E~n_ IX;, n > 1. On the set [T ;;::: n]
E~n_. Yn ~ E~n_,Xn ~ -IXCJ}n_,(X n),
so that E}n_. Y" ; : : E}n_ .Xn, implying CJ}n_ .(y") ~ lX}n __ ,(Xn) on [T ;;::: n].
Therefore, (6) obtains for {y"} on [T ;;::: n], and on the complement [T < n]
434 11 Martingales

it is trivially true since y" = O. Hence, Lemma 2 ensures that {v,,/( I + ex v,,),
~n' n ~ I} is a supermartingale. Consequently, noting that VT = I, setting
T(n) = min[T, n], and employing Theorem 7.4.5.

P{Sn ~ I,somen ~ I} = P{v" = I,somen ~ l} =


i [T<oo]
I
VT
+ ex
V
T

· E
< I1m VT(n)
. < E Y1
- n-oo I + exVT(n) I + exYI
Since Xl = Y1 , (7) follows. D

Theorem 2. If {Sn = L:~ Xi' ~n, n ~ I} is an fe 2 martingale with E Sl =0


and ~o = (0, {l), thenfor any positive constants a, b

P{Sn ~ a ~ E{XJI~j-d + b, some n ~ I} ~ (l + ab)-l. (9)

PROOF. It suffices to prove the theorem when a = 1. Define So = X o = 0,


~ 0 = ~ - 1, and for K > 0 set

U = Sn + K - L:j;o E~j_' XJ n ~ 0,
n K +b '
whence Un = Un - U n- 1 = (K + b)-I[X n - E~n_. X;], n ~ 1, and Uo =
K(K + b)-l = Uo . Note that for n ~ I
-':1
E ~n-I U
n
- - E ~n-' X n2
=K+b

and
a}n_.(u n) = (K + b)-2 E~n_. X; = -(K + b)-l E~n_' Un' n ~ I,
so that for ex = (K + b)
(10)
Then, via Theorem 1

P{Sn ~ ~ Ej-1XJ + b, some n ~ I} = P{U n ~ 1, some n ~ O}


Uo K 1 K-oo 1
< = - - -------+ - - D
1 + exU o K + b 1 + b 1 + b'

Corollary l. If {X n , n ~ I} are independent LV.S with E X n = 0, EX; = I,


n ~ I, and Sn = L~ Xi' then for positive a,b
1
P{Sn ~ an + b,somen ~ I} ~ I + abo
11.4 Stochastic Inequalities 435

Notation. In the remainder of this section, the generic notation


a' = eO - 1 - a (11)
will be employed.

Next, analogues of Lemmas I and 2 will yield a counterpart of Theorem 1.

Lemma 3. If v, V, Yare r.v.s with Y ~ 0, I V, ~ I, E V = 0, and both Yand V


are C§-measurable for some a-algebra C§ of measurable sets, then

E~[cosh Y(V + V)] ~ exp(Y' E~ V 2 )cosh YV.

PROOf. (YV)' = Lf.z yjVjjjl ~ VZY', whence

E~ e
YU
~ I + Y' E!J V Z ~ exp(Y' E!J V 2 ). (12)

Applying (12) to - V also,


E!J cosh Y(V + V) = i E!J(eY(V+U) + e-Y(V+U) ~ exp(Y' E!J VZ)cosh YV.
o
Lemma 4. Let {8n = L~ Xj' ff n, n ~ 1} be a martingale with IXn, ~ I,
n ~ I, and set Un = E,.
_I X;, n ~ I, where ff 0 = (0, Q). For any real y and

positive numbers A., u

.r.
{ ex p {- )=1 Uj(U+ ~j .)'}COSh A( Y+L~n
lUI U+
.), jOn,
lU)
n ~ I}
is a supermartingale.
PROOF. Designate the putative supermartingale by {v", jOn, n ~ 1} and set
Uo =u. By Lemma 3, for n ~ 2

E3'._1 v" = exp { - 7~ Uj


(
U + L{
A.
Ui
)'}
E,._ 1cosh
A(y + La
8 n- 1
Uj
+ X n)

~ exp { - n
.LUj
(A)' (A )'} A(yLn.
" j . + Un "n . cosh
+8 n - 1)
~ Vn -l·
)=1 L.o U • L.o U ) oU)

Setting Vo = cosh(Ayju), the same argument shows that irE Xl = 0, {v", jOn,
n ~ O} is likewise a supermartingale. 0

Theorem 3. If {8n = L~ Xj' jOn' n ~ 1} is an .f£'z martingale with E 8 1 = 0


and T is a stopping variable for which X; ~ Ion {T ~ n}, n ~ I, thenfor any
real y and positive numbers A., U

E cosh "T +E 8
A(y T)
XZ ~ e
u(A)' Ay
- cosh - . (13)
U + L.l 'j_1 j U U
436 11 Martingales

PROOF. Set Uo = U and Un = Ej'"_1 X:, n ~ 1. If Yn = XnI 1nn ), Un =


Ej'" _I Y:, °
then Ej'" _I Yn = and Un = unI lnn )· Since Y: ~ 1 and the indices
in the sums below are at most T,
y + If X j U + If lj
~-+ If Uj = + If U/
X: ~ 1, a.c. Now, for a > °
U

whence no generality is lost in supposing

( A)j (A)j(- -
U )2
(U ~ a)' I1- -< I1-
00 00

j=2j! U + a- j=d! U U + a

= (u: ar(~r
implying

I u· (A)'
n
j=l
- .- < (A)'
IbUj -
J
- I u·
j=l U
n
J
( - .- -U)2
If=ou j
< u2
-
-
U
(1 1)
(A)' I-;=;-:--
n
I Ib I Uj D=ou j

(14)

Employing the notation and final remark of Lemma 4, setting T(n) =


min[T, n], and invoking Theorem 7.4.5,

E cosh A(Y +",TST) = E VT exp { ITI Uj ~)'}


(A
U+ L.,I Uj L.,O Uj

~ lim E VT(n) exp { II


T(n)
Uj
(
"'j
A )' }
n L.,O U j

recalling (14). o

Corollary 2. If {Sn = I~ Xj' f7 n' n ~ I} is an .P 2 martingale with E X I = 0,


U I = E xi > 0, and T is a stopping variable with X: ~ Ion [T ~ n], n ~ 1,

then for any A > °


EeXP{If E~~, Xf} ~ 2eXP{2U I (:J}
11.4 Stochastic Inequalities 437

PROOF. Via Theorem 3 and then Lemma 3 with V = 0, Y = A/U h V = XI'


r§ = (0,0),
AST } ~ 2 E cosh "T
E exp { "T AST = 2 E E {cosh
A "TST I=
.'#' I
}
L.,I ui L.,I U i L.,l U i

Corollary 3. Under the conditions of Corollary 2

E eXP{AST - ;'" ~ E~j_' XJ} ~ 1. (15)

PROOF. As in the theorem, suppose X; ~ 1, a.c. By (12) with Y = A,


<"§ = .'Fn - l>

E,j'n_, eXP{ASn - X~Uj}

= eXP(ASn- 1 - A' ~J1j)E~"_,eAXn

~ eXP{ASn_ 1 - X ~u j + XU n}

= eXP{AS n_ 1 - X nt1Uj},

whence {exp{ASn - A.' Ii U), ~n' n ~ I} is a nonnegative supermartingale.


Again via (12)

E eXP{AS T - X ~ U ~ lim E eXP{AST(n) -


j} A' ~)Uj}
o
Theorem 4 (Blackwell). If {Sn = Ii Xi' ~n, n ~ O} is a supermartingale with
So = 0, IXnl ~ 1, and E~n_1Xn ~ -rxfor some rx in (0,1) and all n ~ 1, then
I rx).
P{Sn ~ A > 0, some n ~ I} ~ ( 1 + rx . (16)

PROOF. If Fn is the conditional distribution of X n given ~ n_ l ' by convexity


of the exponential function for e > and n ~ 1 °
438 II Martingales

I - IX 8 I + IX -8 I I+IX
<--e +--e < for 0 :::; lJ :::; log I _ IX
- 2 2 -

= eo (say). Consequently, for n ~ I


E'~"-I e 8oS" = e8oS" - E I
fFn-1
e8oX " <
-
e8oS" - I ,
whence {e 8oS", fF n , n ~ O} is a supermartingale. Hence, for any stopping
rule T, setting T(n) = min[T, n],
E e80ST :::; lim E e8oST (") :::; E e8oSo = I.

Finally, if T = inf{n ~ I: Sn ~ A. > O},


e 801
P{Sn ~ A. for some n ~ I} = e801 P{T < co} :::; E e80ST :::; I,
which is tantamount to (16). o
Suppose that a gambler with initial capital 10 E (0, I) wins or loses (at
each play) whatever amount he stakes with probabilities P and q respectively,
where 0 < P :::; t, the successive plays being independent. What is the best
gambling scheme for boosting his capital to one dollar? It will be shown that a
bold strategy is optimal, namely, whenever his capital In :::; the entire to
capital should be staked, whereas whenever J. > only the amount I - J. to
(needed to obtain his goal) should be bet.
Let {X n' n ~ I} be i.i.d. random variables with
P{X n = I} = I - P{X n = - I} = P :::; t
and let Ii = min(a, I - a) whenever 0:::; a :::; 1.
For any constant 0:::; I = 10:::; I, define for n ~ 1,

f,n = f,n- f,- X = {In-I(l + X n) if 0 :::; In-I:::; t (17)


1 + n- 1 n
In - •(l - X n) + Xn
Of 1 f,
1 2":::; n- I:::; ,
I

and note that {In, n ~ I} represent the successive (random) fortunes


associated with the bold strategy just described. Set

Pn(f) = p{ m~x fj ~ I},


O$)$n
o:::; I :::; I, n = 0, I, ... ,

Pn(f) = I if I ~ I, pif) = 0 if I :::; o. (18)

Lemma 5. For any constants 0:::; I - s :::; f, setting q = I - P,


Pn+ 1 (f) ~ PPn(f + s) + qPn(f - s), n = 0, I,. 0 0 0 (19)
PROOF. For every n, by (17) Pn(f)jin f If An(f) = {maxO$i$nfj ~ I}, then
Pn+I(!f)=pP{An+I<!f)IX 1 = I} +qP{An+I(!f)IX. =-l}
= P P{An(f)} = PPn(f)
11.4 Stochastic Inequalities 439

and

P.+IC ;f)=PP{A.+I(~~f)IXI = I}
+ q P { A. + 1 C~ f) IXI = - 1}

= p + q P{A.(f)} = P + qP.(f)·
Hence, for 0 :::; f :::; 1 and n 2: 0
P.+ 1(f) = pp.(2f) + qp.(2f - 1), (20)
and obviously (20) holds for f :::; 0 or f 2: 1. Consequently, (20) obtains for all
f E (-00,00) and all n 2: O.
To prove (19), define for 0:::; f - s :::; f and n 2: 0
!l.(f, s) = P.+ I(f) - PP.(f + s) - qP.(f - s). (21)
If f + s < 1, then !lo(f, s) = PI(f) 2: 0, while if f + s 2: 1, monotonicity
of P. and (20) ensure !lo(f, s) 2: O. Hence, for 0 :::; f - s :::; f and n = 0
(22)
Now from (20) and (21)
!l.(f, s) = p(P.(2f) - P.(f + s)] + q(P.(2f - 1) - P.(f - s)],
and employing (20) once more
!l.+ l(f, s) = p 2[P.(4f) - p.(2f + 2s)] + q2(P.(4f - 3) - p.(2f - 2s - 1)]
+ pq(P.(4f - 1) - p.(2f + 2s - 1)
+ p.(4f - 2) - p.(2f - 2s)], (23)
and so
!l.+ l(f, s) = pA.(2f, 2s) + q{p(P.(4f - 2) - p.(2f + 2s - 1)]
+ q(P.(4f - 3) - p.(2f - 2s - 1m· (24)
To verify (22) for n 2: 1, suppose inductively that it holds for a fixed
but arbitrary integer n. If (i) f - s 2: t, then 2f - 1 2: 2s, whence by (24)
!l.+I(f, s) = p!l.(2f, 2s) + q!l.(2f - 1, 2s) 2: 0; if (ii) f + s :::; t, then 2f-
2s :::; 2f + 2s :::; 1 and again via (24), !l. + 1(f, s) 2: p!l.(2f, 2s) 2: 0; finally, if
t
(iii) 0 :::; f - s < < f + s, then 1 < f, and by (23)
!l.+ l(f, s) 2: pq[p.(4f - 1) - p.(2f + 2s - 1) + P.(4f - 2) - p.(2f - 2s)].
(25)
Since q 2: p, if s > 1, then from (25)
!l.+ I(f, s) 2: p{(P(P.(4f - 1) - p.(2f + 2s - 1)]
440 11 Martingales

+ q[Pn(4f - 2) - Pn(2f - 2s)]}


= p6. n(2f - t, 2s - t) ~ 0,

and if s ::; i. again via (25)


6.n + 1(/' s) ~ P{P[Pn(4f - I) - Pn(2f - 2s)]
+ q[Pn(4f - 2) - Pn(2f + 2s - I)]
= p6..(2f - t, t - 2s) ~ 0,
completing the induction. D

Lemma 6. Let {X n, :l'n, I ::;; n ::;; N < oo} be an .ff I stochastic sequence, let
CN be the class of all stopping rules T with P{T ::;; N} = I and define
YN = yZ = XN,
(26)
Yn = Y: = max [X n, E{Yn+ II:l'n}], I::;; n < N.
Then, if a = inf{n ~ I: X n = Yn},
supEXT=EYI =EX u ' (27)
TeCN

PROOF. Clearly, {Yn, :l'n, I ::; n ::; N} is a supermartingale. Thus, for any
TEC N ,

Moreover,

E"t=[ YI+[ ')11=[ YI+[ E{')I21:l'd


J lu = II J lu > II J lu = II Jlu> II

= [ YI + [ ')12 = [ ')II + [ Y2 + [ ')12


J lu = I) Jlu > I) Jlu = 1) J lu = 2] Jlu> 2)

= ... = ~ [ ')Ij = E Yu = E Xu,


j= I JIU=j]

and (27) follows. D

Lemma 7. For any random variables Yt> ... , YN setting X n = llYn "2 I) and
:l'. = a(YI , ... , y"), I ::;; n ::;; N,

p{ max Yn
l:5,n:5,N
~ I} = EYI (28)

where ')I 1 is as in (26).


11.4 Stochastic Inequalities 441

PROOF. Set T = inf{1 ~ n ~ N: X. = I} where inf 0 = N. Then TEeN


and

P{IsjsN
max lj ~ I} = P{IsjSN
max Xj = I}
= P{T < N} + P{T = N, X N = I}
=EXr=EYI
by Lemma 6. o
TheoremS (Dvoretzky). Let {X., n ~ l} be i.i.d. with P{X I = I} = P = 1 -
P{X I = -I} and q = 1 - P ~!. Set !!F. = <1(X I ,· .. ,X.), n ~ 1, and
!!F o = {0, Q}. For any constant 0 < f = fo ~ I, designate the fortunes
associated with the bold strategy by
f. = In-I + in-IX. where ii = min(a, 1 - a). (29)
If g., n ~ I, are any !!F.-measurable functions with 0 ~ g. ~ 1 and h., n ~ 1,
are the fortunes associated with this alternative betting strategy, that is,
ho = f and
n ~ 1, (30)
then for aU n = 0, 1, ... , and aUf E (0, 1)

PN(f) = pt~:SXNJj ~ I} ~ pt~::N h ~ I}. j (31)


PROOF. Without loss of generality suppose 0 < f < 1. Set
dN = I1hN;;'ll' d. = max[Ilhn;;'l)' E{d.+i1SiO.}] (32)
for 0 ~ n < N. By Lemma 7

pt~:SXN h ~ I} = Edo,
j (33)

and obviously dN = Po(h N ), a.c.


Suppose inductively for some n in [1, N) that
d.+ I ~ PN-.-I(h.+ I), a.c.

Now (32) entails

d. = 1 = PN-.(h.) if h. ~ I,
while if h. < 1, via the induction hypothesis with probability one,
d. = E{d.+ I I!!F.} ~ E{PN-n-I(h.+ I)I!!F.}
= E{PN_._I(h. + h.g.Xn+I)ISiO.}
442 11 Martingales

= P'PN-"-I(h" + h,,9,,) + Q'PN-"-I(h" - h,,9,,)


~ PN-,,(h,,),

recalling Lemma 5. This completes the (backward) induction, whence for


o~ n ~ N
d" ~ PN-,,(h,,),
and, in particular, recalling (33),

p{ max hj
O~;,j:sN
~ t} = Edo ~ PN(ho) = PN(f). o

References
D. G. Austin, "A sample property of martingales," Ann. Math. Stat. 37 (1966), 1396-
1397.
D. Blackwell, "On optimal systems," Ann. Math. Stat. 25 (1954),394-397.
B. M. Brown, "A note on convergence of moments," Ann. Math. Stat. 42 (1971),
777-779.
D. L. Burkholder, "Martingale transforms," Ann. Math. Stat. 37 (1966),1494-1504.
D. L. Burkholder, "Distribution function inequalities for martingales," Ann. Prob-
ability 1 (1973), 19-42.
D. L. Burkholder and R. F. Gundy, "Extrapolation and interpolation of quasi-linear
operators on martingales," Acta Math. 124 (1970),249-304.
D. L. Burkholder, B. J. Davis, and R. F. Gundy, "Inequalities for convex functions of
operators on martingales," Proc. Sixth Berkeley Symp. Math. Stat. Prob. 2 (1972),
223-240.
Y.S. Chow, "On a strong law of large numbers for martingales," Ann. Math. Stat. 38
(\967),610.
Y. S. Chow, "Convergence of sums of squares of martingale differences," Ann. Math.
Stat. 39 (1968),123-133.
Y. S. Chow, "On the Lp-convergence for n-IIPS", 0< p < 2," Ann. Math. Stat. 42
(1971),393-394.
K. L. Chung, A Course in Probability Theory, Harcourt Brace, New York, 1968; 2nd ed.,
Academic Press, New York, 1974.
B. Davis, "A comparison test for martingale inequalities," Ann. Math. Stat. 40 (1969),
505-508.
1. L. Doob, Stochastic Processes, Wiley, New York, 1953.
L. E. Dubins and D. A. Freedman, "A sharper form of the Borel-Cantelli lemma and the
strong law," Ann. Math. Stat. 36 (1965),800-807.
L. E. Dubins and L. J. Savage, How to Gamble /frou Must, McGraw-Hill, New York,
1965.
A. M. Garsia, "On a convex function inequality for martingales," Ann. Probability 1
(1973),171-174.
R. F. Gundy, "A decomposition for g\-bounded martingales," Ann. Math. Stat. 39
(\968), 134-138.
J. Neveu, Martingales a temps discrets, Masson, Paris, 1972.
References 443

R. Panzone, "Alternative proofs for certain upcrossing inequalities," Ann. Math. Stat.
38 (1967),735-741.
E. M. Stein, Topics in Harmonic Analysis Related to the Littlewood-Paley Theory,
Princeton Univ. Press, Princeton, 1970.
H. Teicher, "Moments of randomly stopped sums-revisited," Jour. Theor. Prob. 9
(1995), 779-793.
A. Zygmund, Trigonometric Series, Vol. I, Cambridge, 1959.
12
Infinitely Divisible Laws

Row sums L~::'l X,,; of arrays of random variables {X,,;, 1 ;s; i ;s; k" -+ 00,
n ~ I} that are rowwise independent have been considered briefly with respect
to the Marcinkiewicz-Zygmund type strong laws of large numbers (Example
10.4.1). In this same context, non-Iterated Logarithm laws and generalizations
thereof have been dealt with by H. Cramer and C. Esseen (see references at
the end of this chapter). Here, limit distributions of row sums of the variables
in such an array will be treated.
It is a remarkable fact that the class oflimit distributions of normed sums of
i.i.d. random variables is severely circumscribed. If the underlying L v.s, say
{X"' n ~ I} have merely absolute moments of order r, then for r ~ 2 only
the normal distribution can arise as a limit, while if 0 < r ;S; 2, the limit law
belongs to a class called stable distributions. If the basic LV.S are merely
independent (and infinitesimal when normed cf. (l) of Section 2), a larger
class of limit laws, the so-called class !f' emerges. But even the class !f' does
not contain a distribution of such crucial importance as the Poisson. A
perusal of the derivation (Chapter 2) ofthe Poisson law as a limit of binomial
laws B" reveals that the success probability associated with B" is a function
of n. Thus, if B,,_ I is envisaged as the distribution of the sum of i.i.d. random
variables Yl , ... , Y,,-l, then B" must be the distribution of the sum of n
different i.i.d. random variables which, therefore, may as well be labeled
X". l' X". 2' ... , X".". In other words, to obtain the Poisson law as a limit of
distributions of sums of i.i.d. (or even independent) random variables, a
double sequence schema {X "j' j = 1, ... , k,,_ oo} must be employed (with
X".I"'" X".k n independent within rows for each n = 1,2, ...). Under one
further proviso, the class of limit laws of (row) sums of such r.v.s coincides
with the class of infinitely divisible laws.

444
12.1 Infinitely Divisible Characteristic Functions 445

12.1 Infinitely Divisible Characteristic Functions


It should be borne in mind that the notion of infinite divisibility as presented
below is a distribution concept requiring no mention or consideration of
LV.S. In fact, an attempt to define it via r.v.s can lead to unnecessary difficulty
and complication (see Gnedenko and Kolmogorov (1954»).

Definition. AdJ. F is called infinitely divisible (i.d.) if for every integer n ;::: 1
there exists adJ. Fnsuch that F = Fn * Fn * ... * Fn = (F n)"' or equivalently if
its d. cP (also called i.d.) is the nth power of a c.f. CPn for every integer n ;::: 1.

Clearly, the normal, Poisson, and degenerate distributions are i.d.


Moreover, if F(x) is i.d., all distributions of the same type F(ax + b) are i.d.
A number of useful facts about i.d. distributions and d.s will be amassed
in the propositions which follow.

Proposition l. AdJ. F with bounded support is i.d. iff it is degenerate.


PROOF. Although the proof can be couched solely in terms of d.f.s, it is more
easily intuited, and hence will be presented, in terms of r.v.s. Thus, if X is a
LV. with dJ. F, the hypothesis implies that with probability one, IXI ~ C < 00.
Without loss of generality suppose EX = O. Then, since F = (F n )"', n ;::: I, if
{X ni, I ~ i ~ n} are (fictitious) i.i.d. random variables with dJ. F n' necessarily
E X nl = 0 and IXn,1 ~ Cjn with probability one. Consequently, if (J2
denotes the variance of X, 0 ~ (J2 = L~= 1 E(X ni - E X n ;)2 = n E X;l ~
n( CIn)2 = 0(1), whence (J2 = 0, implying P {X = EX} = I, i.e., F degenerate.
D
Proposition 2. An i.d. d. cp(t) does not vanish (for real t).
PROOF. By hypothesis, cP = cP~, n;::: 1, with CPn a c.f. Then r/J = Icpl2 and
r/Jn = ICPnl 2 are real-valued d.s and the positive real function r/J has a unique
real, positive nth root, say r/JI/n, n ;::: I. Since necessarily r/J = r/J:, n ;::: 1, the
positive real function r/Jn must coincide with r/JI/n. Thus, 0 ~ r/J ~ 1 implies
r(t) = limn_co r/Jn(t) = 1 or 0 according as r/J(t) > 0 or r/J(t) = O. Then,
r/J(O) = 1 and continuity ofr/J imply r(t} = 1 throughout some neighborhood
of t = O. Theorem 8.3.3 ensures that f(t) is a d., whence continuity dictates
that r is nonvanishing. Hence, r/J and therefore also cP is nonvanishing. D

If cP is an i.d. d. so that cP = CP:, n ;::: 1, it seems plausible that CPn = cpl/n.


But how can one choose a continuous version of cpl/n? The following lemma
asserts that a continuous logarithm (hence also nth root) of a continuous,
nonvanishing complex function on ( - 00, (0) can be defined.

Lemma l. IJJ(t) is a continuous, nonvanishing complex Junction on [ - T, T]


with J(O) = 1, there is a unique (single-valued) continuous Junction ..t(t) on
446 12 Infinilely Divisible Laws

[ - T, T] with A(O) = 0 and f(t) = eA(I). Moreover, [ - T, T] is replaceable by


( - 00,(0).

PROOF. If PT = inf[ - T, T) I f(t) I, then 0 < PT S; l. Since f is uniformly


continuous on [ - T, T], there exists b T in (0, PT) such that It' - tiS; b T
implies If(1') - f(t)1 < pT!2 S; 1· Choose points {tJ with to = 0 such that
- T = Lift < ... < L 1 < to < t l < ... < tift = T and t j +I - t j = t l - to
S; bT . Define

(-Iy-I .
= L .
<Xl

L(z) (z - 1)1, Iz-II<l.


j= I J
Then L(z) is the unique determination (principal value) of log z vanishing at
z = l. For t E [t _ I' t I], 1f(t) - II = 1f(t) - f(t o) I S; 1, and so L(f(t» is
defined. Set
A(t) = L(f(t», t E [L I' til
Then ,1.(0) = L(l) = 0 and A.(t) is continuous with exp{A.(t)} = f(t) in
[t-I, til Since for tE [t k, tk+I],I(f(t)/f(t k» - II S; (PT/2)/PT = 1, for any
k the definition of A. may be extended from [t -k> t k] to [tk> tk+ I] by A.(t) =
A.(tk) + L«f(t)/f(t k))); analogously, replacing t k by t -k> the definition extends
to [L k- h Lkl Then A.(t) is defined and continuous in [ - T, T], and for
t E [tk> tk+ I], k ~ I,

eA(I) = ex p ( L(;g!») + A.(tk») = ex p ( L(;(~!»)


+ k~l L (f(t j+
j=O
1»)) = f(t).
f(t)
A similar statement holds in [L k- I ' Lkl Next, given A. in [ - T, T], it may
be extended by the prior method to [- T - 1, T + I], and hence by
induction to ( - 00, (0). Finally, if two such functions A and A.' exist, eA(I) =
e A'(!), whence A.(t) - A.'(t) = 2nik(t) with k(t) an integer. Since k(t) is continuous
with k(O) = 0, necessarily k(t) = 0 and A. is unique.

Definition. The function A.(t) defined by Lemma I is called the distinguished


logarithm off(t) and is denoted by Log f(t). Also exp{(l/n)A.(t)} is called the
distinguished nth root off(t) and is denoted by fl/"(t).

Note. Clearly, if t/J(t) is a continuous complex function on ( - 00, (0) with


t/J(O) = 0, then Log e"'(l) = t/J(t). Moreover, forf, g as in Lemma I, Logf· g =
Log f + Log g, Log(f/g) = Log f - Log g, and Log f = L(f) for 1tiS; T
whenever sUPIII:STlf(t) - 11 < l. Thus, for k an integer, Log(eDil+2klti) = ait
and

Lemma 2. Let f,.h. k ~ I be as in Lemma l. Iff" --> f uniformly in [ - T, T],


then Log f" - Log f -+ 0 uniformly in [ - T, Tl
12.1 Infinitely Divisible Characteristic Functions 447

PROOF. Since minlll~T If(t)1 >


!
and°
SUPIII ~ T I(J".(t)j f(t» - II ~ for k ~ K o . Then
J". - f uniformly in [-T, T],

Log J". - Log f J". = L (fk(t»)


= Log 7 f(t) - L(I) = 0, uniformly in [ - T, T].
o
Proposition 3. A d. q>(t) is i.d. iff its distinguished nth root q>l/ n(t) = e(l/n)Logtp(l)
is a c.f.jor every positive integer.
PROOF. If q> is i.d., q> = q>~, n ~ 1, and so by Proposition 2, q> and hence also
q>n is nonvanishing, whence their distinguished nth roots and logarithms
are well defined by Lemma 1. Moreover, eLogtp = q> = q>~ = e"Logtpn, so that
Log q>(t) = n Log q>n(t) + 2nik(t) with k(t) an integer. Since Log q> and Log q>n
are continuous and vanish at zero, k(t) = 0, whence Log q>n = (ljn)Log q>,
which is tantamount to q>n = q>l/n.
Conversely if the distinguished nth root of q> exists and is a d. for every
n ~ 1, q> = eLogtp = (eO/n)Logtp)n shows that q> is i.d. 0

Proposition 4. A finite product of i.d. d.s is i.d. Moreover, if i.d. d.s q>k - q>,
a d., then this limit d. q> is i.d.
PROOF. Clearly, if q> = q>~, t/J = t/J~, n ~ 1, then q>' t/J = [q>n . t/Jn]n, n ~ 1,
shows that a product of two and hence any finite number of Ld. d.s is i.d.
Suppose next that the i.d. d.s q>k - q>, a d. Then, the i.d. d.s t/Jk(t) =
lq>k(t)1 2 = q>k(t)· q>k( -t) - the d. t/J(t) = 1q>(tW. Consequently, t/J~/n as the
positive nth root of the positive function t/Jk tends as k - 00 to the nonnegative
nth root t/Jl/nofthe nonnegative function t/J. Since for n ~ 1, t/J~/n is a sequence
of d.s whose limit t/J l /n is continuous at t = 0, t/J l /n is a d. for n ~ 1. Thus, t/J
is i.d. and hence nonvanishing. Consequently, <p is nonvanishing, whence
Log q> is defined by Lemma 1. By Lemma 2,

as k - 00, and since q>l/ n is continuous at t = 0, it is a c.f. for all n ~ 1, so that


q> is i.d. by Proposition 3. 0

Since c.f.s exp{A.(e iIU - 1) + itO}, A. > 0, of Poisson type are i.d., it follows
from Proposition 4 that exp{D= 1 [A.J{eiluj - 1) + itOj ]} and hence also
exp{itO + J~«) (e i1u - I)dG(u)} with G a bounded, increasing function is i.d.
The latter comes close but does not quite exhaust the class of i.d. d.s.

Proposition 5. The class of i.d. laws coincides with the class of distribution
limits offinite convolutions ofdistributions of Poisson type.
PROOF. That every such limit is i.d. follows directly from Proposition 4.
Conversely, if q> is i.d., so that q> = q>~, n ~ 1, then

n[q>n(t) - 1] = n[e(l/n)Logtp - 1] -+ Log q>,


448 12 Infinitely Divisible Laws

that is
lim exp{n[lpn(t) - I]} = lp(t).
Now,

n ~ 1,

and a net -00 < - M n = Un,l < U n ,2 < ... < un,kn+1 = M n < 00 may be
chosen whose points are continuity points of F n and such that
1
max(u n j+1 - un) ~-3
j' '2n
and

Then for It I ~ n, choosing An, j = n[Fn< Un, j +I) - Fn( Un, j)],

If"~~'ii+ I(e il" - l)n dFn(u) - Anjeit"n,i - 1) I


"n,i+ 1 it I
=
I f
"n,i (e " - eil"n.i)n dFn<u)

1 f"n,i+ 1

~"2 dFn(u).
n "n,j

Hence, for It I ~ n, summing over 1 ~ j ~ kn ,

<Xl .~ An,){eil"n,i - I
If -<Xl
(eil" - l)n dFn(u) - 1)
)-1

f.
kn 1 f"n'i 1
~ 2n dFn(u) + L"2 + 1
dFn(u) ~ -.
lI"I~Mnl 1 n "n,i n
2onsequently, for It I ~ n and sufficiently large n

I
en('Pn(tj-Ij - .n
)=1
exp(Anjeit"n,i - 1»\

= leXP(~Anjeil"n'i -1)) - exp(n(lpn(t) - 1»1


= len('Pn(lj-i) II exp (~ Anjeil"n,i - 1) - f~<Xl (e il" - l)n dFn(U») - 11
~ 2Ilp(t)12(e l / n - 1) = 0(1),
12.1 Infinitely Divisible Characteristic Functions 449

recalling Corollary 8.4.1. Consequently, for all real t

<p(t) = lim en[<pn(I)- I)


n-oo
= lim n exp(A'n,){ei1u".i -
kn

n-oo j= 1
1». 0

For any real y and nondecreasing, left-continuous G(u) with G( - (0) = 0,


G(oo) < 00, set

l{!(t) = l{!(t; y, G) = iyt + oo (ellU. - f


-00 1- 1
itu )
+ u2
(1 + 2
U )
~ dG(u), (1)

where the integrand, say h(t, u), is defined at u = 0 by continuity (whence


h(t, 0) = - t 2 /2). Since ei1u - 1 = O( 1) as Iu I -. 00 and ei1u - 1 = itu + O(u 2 )
as u -+ 0, dominated convergence ensures that l{!(t) is continuous, and,
clearly, l{!(0) = 0 and l{!(t) = Log eoJ/(I).

Theorem 1. <p(t) = exp{l{!(t; y, G)} as defined by (1) is an i.d. d.for every real
y and G as stipulated. Moreover, <p uniquely determines y and G.
PROOF. The integrand h(t, u) of l{! satisfies Ih(t, u)1 ::s; C < 00 for It I ::s; T
and all real u. Choose -M n = Un,l < Un ,2 < ... < un ,kn +l = M n to be
nonzero continuity points of G for which

Then, for ItI ::s; T

If:oo h(t, u)dG(u) - jtl h(t, un.)[G(Un,j+ 1) - G(un,)] I


kn G(u . ) - G(u.) 1
L
< C
- f. [lul~Mn)
dG(u) +
j;1
n,)+ I
2nG(00)
n,) ::s; -.
n

Thus, setting An,j = [(1 + u;,)/u;,j] [G(un,j+ 1) - G(un,j)] and

y G(un,j+ I) - G(un,j)
an,j = kn - u .
n,)
'

n exp(itan,j + AnJeilUn.i -
kn
= lim 1»,
ft-(I) j=l

and so <p(t) is a d. and i.d. by Proposition 4.


450 12 Infinitely Divisible Laws

Apropos of uniqueness, define

- V(t) = f'+ 1t/I(w)dw -


1-1 2t/1(t) =
f'+ 1 fOO-00 h(w, u)dG(u)dw + 2iyt -
1-1 2t/1(t)
iU
= fOO [eilu(eiU ~ e- ) _ 2 _ ~](~)dG(U)
_00 IU 2 2
1+u u

- 2 f:oo h(t, u)dG(u)


2f:oo (1 - Si: u) C:2u
2
= - e
ilu
) dG(u) = - f:oo eilu dH(u),

2f 00 (1 - Si: x) C:2X2 )dG(X).


where

H(u) =

Clearly, H is nondecreasing and left continuous with H( - (0) = 0, H( (0) =


C 1 < 00. Thus, t/I determines V, which, in turn, determines H (by Theorem
8.3.1), which by Theorem 6.5.2 determines G (hence, also y). D

Let {G, GN , n ~ I} be nondecreasing, left-continuous functions ofbounded


variation with G( - (0) = G,,( - (0) = 0, n ~ 1. Recall as in Section 8.1 that
GN ~ G iff G" ~ G and G,,(oo) -+ G(oo), G,,( - (0) -+ G( - (0).

Theorem 2. Let {y, YN' n ~ I} be finite real numbers and {G, GN, n ~ I}
nondecreasing left-continuous functions of bounded variation which vanish at
- 00.1fYN -+ Yand GN~ G, then t/I(t, YN' G,,) -+ t/I(t; Y, G)for all real t, where t/I
is as in (1). Conversely, if t/I(t; Y", GN ) tends to a continuous function g(t)
as n -+ 00, then necessarily g(t) = t/I(t, Y, G) and YN -+ Y, GN~ G.
PROOF. °
If G(oo) = 0, then G,,(oo) -+ by complete convergence, whence
t/I(t; YN' G,,) -+ t/I(t; Y, G) = iyt, recalling that the integrand h(t, u) of (1) is
bounded in modulus for fixed t. If G( (0) > 0, then G,,( (0) > for all large n
by Lemma 8.1.1, whence (ljG,,( 00 »t/I(t; Y", G,,) -+ (ljG( (0» t/I(t; Y, G) by the
°
Helly-Bray theorem, and so t/I(t; y", GN ) -+ t/I(t; Y, G).
Apropos of the final assertion, Theorem 1 ensures that the i.d. c.f.s
= e"'(I; Yn, G n ) -+ e9(I), continuous. Thus, e9(I) is a c.f. and i.d. by Theorem 1
e"'n(l)
and Proposition 4. Define oc(t) = Log e9(I) and ocN(t) = Log e"'n(l) = t/lN(t). By
Theorem 8.3.3 e"'n(l) -+ e9(I) uniformly in It I ~ T for all T E (0, (0), whence
by Lemma 2, t/lN(t) -+ oc(t) uniformly in It I ~ T and oc(t) is continuous. Hence,
recalling the proof and notation of the last part of Theorem 1 and defining
'+ foo-00eily dHiy),
f1-1 t/lN(y)dy =
I
y"(t) = 2t/1,,(t) -

f
'+l
V(t) = 2oc(t) - 1_IOC(y)dy,
12.1 Infinitely Divisible Characteristic Functions 451

it follows that v,,(t) ..... V(t), continuous, and, in particular, Hft(oo) = v,,(0) .....
V(O), whence V(O) ;;::: O. If V(O) = 0, then

x) (1~
f
2

Hft(oo) = 2 oo ( 1 -
-00
sin
-~
+ dGft(x) ..... 0,
X )

implying Gft( (0) ..... 0, whence G(u) == 0 and necessarily ')1ft tends to a finite
limit ')I. If V(O) > 0, the dJ.s Hft(u)/Hft( (0) (whose d.s v,,(t)jVft(O) ..... V(t)/V(O»
converge to a limit dJ., say H(u)/V(O). Thus, Hft -4 H, and by the Helly-Bray
theorem for any continuity point u of H, recalling Theorem 6.5.2,

f
u ( sin y) - 1 y2
2Gft(U) = 1- - -1--2 dHft(Y)
-00 Y +Y

f
u ( sin y) - 1 y2
--+ 1- - --2 dH(y). (2)
-00 y 1+Y
Define G(u) to be the integral on the right side of (2). Since the continuity
points of G and H are identical, Gft ..:. G. Hence, ')1ft tends to a finite limit ')I.
Clearly, r/I(t, ')I, G) = g(t). 0

From the preceding, a canonical form for i.d. cJ.s known as the Levy-
Khintchine representation follows readily.

Theorem 3 (Levy-Khintchine representation). A d. cp(t) is i.d. iff

cp(t) = exp i')lt { f + _oo ( e"u


. 00
ituu ) --;;-
- I - 1+ 2
(I
+ u ) dG(u) } ,
2
(3)

where y, G are as stipulated in (I).


PROOF. Theorem 1 asserts that cp( t) = e~(I; y, G) as above is i.d., and so it
suffices to prove the converse. If cp = CP:, n ;;::: I, as in the proof of Proposition
5, n[cpft(t) - I] ..... Log cp(t). Now,

n[cpft(t) - I] = f~oo n(ei1u - I)dFiu)

oo
f
U
= it --2 n dFft(U)
-00 1 +u
(1
f
2 2
+ . - 1 - -itu-) -+-
oo ( e"u u ) - -un d F (u).
_ 00 1 + u2 u2 I + u2 ft

Set
nu fU ny2
f
oo
')1ft = -1--2 dFft(u), Giu) = -1- 2 dFft(y),
-00 +U -00 +y
r/lft = r/I(t; ')1ft, Gft)·
452 12 Infinitely Divisible Laws

As noted above, .pn(t) -+ Log <p(t), which is continuous. Thus, by Theorem 2


Yn -+ Y, Gn .4 G, and
o
In the case of distributions with finite variance, the canonical form in (3)
admits considerable simplification.

Theorem 4 (Kolmogorov). A function <p(t) is the d. ofan i.d. distribution with


finite variance iff for some real y* and nondecreasing, left-continuous G* with
G*( - 00) = 0, G*(oo) < 00

<p(t) = eXP{iY*t + f~oo (e ilu - 1 - itu) u12 dG*(U)} (4)

and, moreover, <p uniquely determines y* and G*.


PROOF. If the d.f. corresponding to the i.d. c.f. <p(t) = e"'(I) has a finite second
moment, .p has a finite second derivative at zero and a fortiori a generalized
second derivative at zero. Now, via (3)

.p(2t) - 2.p(O) + .p( - 2t) foo e 2ilu - 2 + e- 2i'u 1 + u2


(2t)2 = -00 -(2it)2 u2 dG(u)

=- oo .
iIU
(eilU - e- )2 1 + u
2
- 2 - dG(u) = -
foo (sin tU)2
- - (1 + u2) dG(u),
f -00 21t u -00 tu

whence

-.p"(O) ~ lim
,-0 I (
lIul:s 1/1]
-
tu
tu) 2 (1
sin - + u2)dG(u) ~ (sin 1)2 fOO
- 00
(1 + u2)dG(u).

Thus, G*(u) = J~ 00 (1 + y2)dG(y) has the asserted properties i.e., G*( 00) < 00
and

J(
2
eilU - 1- ~)
1 + u2
(1 +u2u )dG(U) = JI~(ei'U - 1-~) ~ dG*(u)
1 + u2 u2

= f(e ilU - 1 - itu) ~ dG*(u) + it f(u - _u_2) ~ dG*(u),


u 1+u u
whence (4) holds with y* = y + Ju dG.
Conversely, if <p(t) = exp{r(t)} is as in (4), then, since

u12 (u - 1 : u 2) = 0(1),
necessarily r(t) = .p(t; y, G) + itc for some constant c, where

G(u) = foo (1 + l)-1 dG*(y),


12.1 Infinitely Divisible Characteristic Functions 453

whence qJ(t) is an i.d. c.f. by Theorem 1. Moreover, as t' -> t


it U itu
r(t') - r(t)
, = lY. * + f(e ' , - e - IU
.) dG*(u)
--2-
t-t t-t u

f
dG*(u)
-> iy* +i (e itu - I) - u - = f'(t) (5)

since for It' - tl ~ 1, recalling Lemma 8.4.1,


u
eil ' - eitu
- - - - - IU =
. I leitl'-nu - I - lUe
. -',Iu I
i t'-t t'-t

~
ei(t"-t)u - 1 - i(t' - t)u
,
I + lu(1 - e-itU)1 :s; u 2 (! + Itl),
I t - t
which is integrable with respect to u - 2 dG*(u). Analogously, as t' -> t
it u itu
f'(t'), - f'(t) =1. f e ' , - e dG*( U ) - > - f eitu dG*( u.)
t - t u(t - t)
Thus, f and hence qJ has a finite second derivative, whence the transform of qJ
has a finite second moment. Moreover, f" and hence qJ" uniquely determines
G* and therefore also y*. From (5), y* = - if'(O) is the mean ofthe underlying
distribution and it is readily verified that G*( (0) is the variance. 0

EXERCISES 12.1

I. Prove that if cp is an i.d. d., then cpA is a d. for every A. ~ O.

2. Prove Proposition I without mentioning LV.S.

3. Verify that the function G of (3) has a finite moment of even order 2k iff the same is
true for the underlying i.d. distribution.

4. Let cp be a non-i.d. d. having the representation (3) with G not nondecreasing (but
otherwise as in (1». Prove that y and G are still uniquely determined by cp.
5. If
1X(1 - fI) I - f3 k
PIX = -I} = - - - - , PIX = k} = - - ( 1 + 1Xf3)f3 ,k = 0, I, ... ,
I+IX I+IX
where 0 < IX < f3 < I, show that X has an i.d. c.f. cp(t) iff IX = 0 and, further, that
Icp(tW is i.d. even when IX -:f. O.
6. Show for an i.d. c.f. cp(t) = exp{ljJ(t; y, G)} that if the support of its d.f. F is bounded
from below, so is that of G. Is the converse true?

7. Prove that if CPo = cpZ" k ~ I, where CPk is a c.f. for k ~ 0 and nk is a sequence of
positive integers -+ 00, then cp is i.d.

8. Prove that if {X n' n ~ I} are i.i.d. r.v.s with d.f. G and N is a Poisson LV. (parameter A.)
J
independent of {X n' n ~ I}, then the c.f. of 1~ Xi is exp{A. (e itu - l)dG(u)}.
9. Show that an i.d. mixing G of an additively closed family ff = {F(x; A.), A.EA c: Rm}
yields an i.d. mixture H. Hint: Recall Exercise 8.3.14.
454 12 Infinitely Divisible Laws

12.2 Infinitely Divisible Laws as Limits


On several occasions double sequences of random variables (independent
within rows) have been briefly encountered. Such a general schema comprises
an array of r.v.s
(1)

with corresponding dJ.s {Fn.d and c.f.s {q>n.d such that within each row,
i.e., for each n ~ 1, the r.v.s X n. l , X n. 2 " " , X n. kn are independent. The r.v.s
of an array such as in (1) will be called infinitesimal if

(i) max P{lXn kl > e} = 0(1), all e > 0,


I sksk n

that is, if the row elements become uniformly small in the sense of (i).
Exactly as in the proof of the weak law of large numbers (Section 10.1),
this implies
(ii) max Im(Xn.k)1 = 0(1),
I sksk n

where, as usual, m(X) is a median of X. Moreover, since

max f Ixl'dF nk =:;; e' + max f Ixl'dF nk


k Jllxl < rJ k Jlt < Ixl < rl

=:;; e' + r' max P{lXnkl > e},


k

infinitesimality also entails

(iii) max i
ISkSkn Ilxl<rl
Ixl'dFnk(x) = 0(1) for all r > 0, r > O.

Lemma 1. The infinitesimality condition (i) is equivalent to either

(i')

or
(i") max II - q>n.k(t) I = 0(1) uniformly in It I =:;; T for all T > O.
1 SkSk n

PROOF. maxd [X 2 /(1 + x 2 )]dFnix) =:;; e2 + maxdllxl~tl dF nk ...... 0 as n ...... 00,


and then e ...... 0 under (i). Conversely, under (i'), for all e > 0

e
-1--
2
max P{lXn.kl > e} =:;; max
+ e2 k k
x 2 dFnk(x)
-1--
IIxl ~ tJ + x
i 2
= 0(1),
12.2 Infinitely Divisible Laws as Limits 455

and so (i) obtains. Next, for It I ::;; T, recalling Lemma 8.4.1,

m:x 1 l - CP.k(t) I = m:x f(e I ilX


- l)dF.k I
::;; max[
k
r
J Uxl ,;; <]
ItxldF. k + 2 r
J Uxl > <I
dF.kJ

::;; f.T + 2 max P{ IX .k I > E;} --+ 0


k

as n --+ 00, and then E; --+ 0, so that (i) implies (in). Finally, since 1 - 9l {cp} ::;;
11 - cpl, Lemma 8.3.1 stipulates a positive constant a(c, <5) such that

max P{lX.d > c}::;; a(c, <5) rlmaxll - CP•. k(t)ldt = 0(1)
k o k J
for all c > 0, whence (in) ensures (i). o
For fixed but arbitrary f, 0< f < 00, define

a•. k = a•. k(f) = r


JUxl <r]
x dF.k> X.,k = X •. k - a.,k>
(2)
F.,k(X) = F.,k(X + a•. k), cP•. k(t) = e-i1an·'CP.,k(t).
Since (i) entails (iii), max1';;k';;k.lan,kl = 0(1) for all f > 0, and so {X.,k}
infinitesimal implies {X•. d infinitesimal and hence also, via Lemma 1,
(iv) max 11 - cP.,k(t) I = 0(1) uniformly in It I ::;; T for all T > O.
l,;;k,;;k n

Lemma 2. If {X.,d are infinitesimal and {F.,k> cPn.d are defined by (2), then
for any f, T > 0 and n ~ NT' there exist positive constants c i = Ci(T, f),
i = 1,2, such thatfor 1 ::;; k ::;; k.

C 1 sup 11 - cPn,k(t)l::;; f-I +Xl-xl dF.,k(X)


-
::;; -C l
iT log ICP.,k(t) Idt.
Itl,;;T 0

PROOF. For It I ::;; T, omitting subscripts,

11 - cP(t) I ::;; Ir (eit(X-a) - l)dF I+ 2 r dF


JUxl < r) JUxl ~ r)

:; I r (eit(X-a) - 1 - it(x - a»dF I


JUxl<rJ

+ r it(x -
IJUxl<r] a)dF(x) I+ 2P{!XI ~ f}
::;; T
2
2
r
JUxl<r)
(x _ a)l dF + T r (x -
IJUxl<r) a)dF I
+ 2P{!XI ~ f}. (3)
456 12 Infinitely Divisible Laws

Now, noting via (2) that Ia I < T,

r (x - a)2 dF S; [I + (T + lal)2] r (x - 0)2 2 dF,


JUxl < ,) Juxi < 'J I + (x - a)

Ir (x - a) dFI = lal r dF
JlIxl<'J

S;
lal[1
JlIxl:;::,]

+ (r + lal)2]
(r - lal)
2
I IIxl:;::']
(x - a)2
d
( ) 2 F,
1+ x - a
(4)

and so it follows from (3) that

sup II - CPn.k(t)1
~ S;
2 + lal
[I + (r + lal)] -2 + ( _I 1)2
2 (T 2
T) f-1--
x
dFn.
2
-
k
III ~ r r a + x2

where, noting that (i) entails max k Ian. k I < r/2 for all large n,

( 9r2) [2 +/4rT +T2 J--_-I


2

-< 1+-
4 r2 C '
1

yielding the lower bound. Next, if F* denotes the dJ. of a LV. X* with c.f.
Icp(tW, from the elementary inequality

sin
( I - -~
TX) (I~
+x
2
)
~ c(T) > 0,

(I - Icp(uW)du = foo ((I - cos ux)du dF*


Jo -00 Jo
sin TX)
= T
f-
oo ( 1 - -
00

oo
- dF*(x)
Tx

f
X2
~ Te(T) --2 dF*(x). (5)
_ool+x

For any LV. X with dJ. F and median m, let X* denote the symmetrized X
and define
Fm(x) = PIX - m < x}, qm(x) = P{IX - ml ~ x},
q*(x) = P{IX*I ~ x}.
12.2 Infinitely Divisible Laws as Limits 457

By Lemmas 6.2.1 (iii) and 10.1.1

f:oo 1 :2X2 dF =
m
l°Otf"~dC :2 X2 )::; 21°O q *(X)dC :2 X2 )

oo
f
X2
=2 --2 dF*(x). (6)
_ool+x
Moreover, from the elementary inequality
2(m - a)(x - a) and the first equality in (4),

(X-a)2dF::;i (x-m)2dF+2(r+1ml)li (X-a)dFI


i lIxl < r] [Ixl < rJ [xl <: r]

:; i [Ixl <r]
(x - m)2 dF + 2r(r + Iml) i
lIxl2: r]
dF,

whence

f~
+
dF = f
1 x 1
(x - a)2
+ (x - a)
2 dF ::; i
lIxl<rJ
(x - a? dF + i lIxj2:r]
dF

::; i (x - m)2 dF + [1 + 2r(r + Iml)] i dF. (7)


IIxl < r] lIxl2: r]

But

i IIxl<r]
(x - m)2 dF ::; [1 + (r + Im1)2] i IIxl<r)1
(x - m)2 2 dF
+ (x - m)
oo X2
::; [1 + (r + Im1)2]
f - - 2 dF m
_ool+x

and as in (4)

dF::; 1 +(r+I~1)2i (x-m)22 dF


i IIxl2:r) (r - Iml) IIxl2:r) 1 + (x - m)

<
1 + (r + Iml)2 foo - -x 2d Fm
- (r - 1m 1)2 _ 1 + x2
00 '

so that, combining these with (7) and recalling (6) and (5),

f oo - -X22 dF- ::; d 2 -1--2


_ool+x
x f x 2 dF*
dFm ::; 2d 2 - -
+x
2
f 2
l+x

2d 2
Jo (l
( 2
::; Tc(T) - 1<p(u)1 )du. (8)
458 12 Infinitely Divisible Laws

In view of 1 - 11p12 :s; -logllp12 = -210gllpl, the upper bound follows


from (8) with C 2 = 2d 2 (-r)jTc(T), noting that for sufficiently large n

d 2 = [1 + (r + Im I)]
2 (
1+
1 + 2r(r + Im
(r _ 1m 1)2
I»)

:s; (1 + 4r 2)(1 + \~2;~2) = dir). 0

Lemma 3. If {X n.k} are infinitesimal r.vs with d.s Ipn.kfor which


kn
lim n Ilpn.k(t) I = f(t)
n-ook=l

exists and is continuous at t = 0, then for any r in (0, (0) there exists a constant
C depending on rand {lpn.k} such that

k
L
k=l
n
f+ x
2
-
2 dFn.k(x) :s;
-1--
X
c.
PROOF. Since n~':.l lq>n.k(tW -+ f2(t), continuous at zero,f2 is a d., whence
T may be chosen so thatf2 > i for It I :s; T. Then, by uniform convergence
n~nllpn.k(tW >! for n ~ NT and It I s; T and D':.lloglq>n.k(t)l-+ logf(t)
uniformly in ItiS; T. Hence from Lemma 2

L
k
n foo x
2
-1--2 dFn.k(x) :s; -C2 L
k
n iT logl q>n.k(t) Idt
k=l -00 +x k=10

-+ -c 2 f:IOgf(t)dt. 0

The next lemma indicates that the d. of a sum of infinitesimal rowwise


independent random variables behaves like that of a related i.d. d.

Lemma 4. If {Xn.d are infinitesimal LV.S such that for some r in (0,00) there
J
exists a C in (0, (0) with D':.l [x 2j(1 + x 2)]dFn. k(x) :s; C, n ~ 1, then
L~':.l [Log <Pn.k(t) - (<Pn.k(t) - 1)] = 0(1) for all real t. Moreover, for any
constants An and all real t

n q>n.k(t) -
kn
Log e- itAn I/I(t; Yn' Gn) = 0(1), (9)
k=l
where

and 1/1 is as in (1) of Section 1.


12.2 Infinitely Divisible Laws as Limits 459

By hypothesis and Lemma 2, for It I ::;; T and n ~ Nt


I
PROOF.
2
I kn C
L lqin,k(t) - L
n
k x -
11::;; - -1--2 dFn,k(X) ::;; - .
k=1 Clk=1 +x CI

Furthermore, infinitesimality implies (iv), whence Log qin,k is well defined for
It I ::;;
T and 1 ::;; k ::;; k n provided n ~ n(T, t), whence under these circum-
stances
ILog qin,k(t) - (qin,k(t) - 1)1 ::;; lqin,k(t) - 11 2 .
Thus, since T is arbitrary, for all real t if n ~ n (T, Itl),

I J. [Log qin.k(t) - (qin,k(t) - I)] I::; k~llqin'k(t) - 11


2

C ~
::;; - max l4>n,k(t) - II = 0(1). (10)
CI I "k"k n
Next,

Log qin,it) - (qinjt) - 1) = Log 4>njt) - itan,k - I(e it " - I)dFn,k(u)

= Log 4>n,k(t) - [itan'k + it I 1 : x2 dFn,k


X2
+ I(e
ilX
- I - I ~XX2)C :2 ) I :2X2dFn'kJ

and so, upon summing and setting

Yn = -An + ~ (an'k + I -+XIX dFn,k)'


k=1
2

L I"
n 2
Gn(u) = k
-I-x- 2 dFn,k,
-
I -00 +X
(9) follows from (10). D
The connections between i.d. laws and the array of (1) is unfolded in
Theorem 1 below.
Theorem l. If {X n, b I ::;; k ::;; k n --+ 00, n ~ I} are infinitesimal, rowwise
independent LV.S, the class oflimit distributions ofcentered sums L~~ I X n,k - An
coincides with the class ofi.d.laws. Moreover, D~ I X n. k - An .!4 i,d. distribu-
tion characterized by (y, G) iff Yn --+ Y, Gn .s. G, where

Yn =- An + ~
k=1
(an'k X 2 dFn,k(X~,
+ f -1
+X ')

L
k
Gn(u) = x -
n

-1- - 2 dFn,k(X),
f" 2

k=1 -00 +X
and t is an arbitrary but fixl:d constant in (0, (0).
460 12 Infinitely Divisible Laws

PROOF. Any i.d.law characterized by (y, G) is obtainable as a limit of distribu-


tions of row sums of independent, infinitesimal LV.S X n • k . It suffices to choose
kn = n and take as the d. of X n.k the i.d. d. characterized by (y/k n, (l/k n)G)
since such X n. k are clearly infinitesimal. Next, if for some constants
An,e-itAnn~'::t qJn.k(t)--+g(t), a d., then Lemma 3 applies withf = Igl,
and hence also Lemma 4, so that by Theorem 12.1.2, Yn --+ Y, Gn . ; G, and
9 = exp{l/!(t; Y, G)}. Finally, if Yn --+ Y, Gn '; G, Theorem 12.1.2 ensures
l/!(t,Yn,Gn) --+ l/!(t;y,G), and L~=1 J[x 2/(1 + x 2)] dFn.k = Gn(oo) --+ G(oo) < 00
whence Lemma 4 guarantees that e- ilAn n~~1 qJn.k(t) --+ et/!(t;Y.Gl. 0

Corollary 1. The only admissible choices of the constants An are

An = I (an.k + f~
k= I +
dFn.k) - Y +
1 X
0(1)

for some constants Y and r > O.

The next question that poses itself is under what conditions on F n• k a


particular i.d. limit is obtained.

Theorem 2. If {X n.k, 1 ::::;; k ::::;; k n --+ 00, n 2 I} are rowwise independent,


infinitesimal LV.S, then for any constants An' e- iAnt n~,:: 1 qJn.k(t) --+ the i.d. d.
exp{l/!(t; Y, G)} iff
k
n
fU I + x2
0> u E C(G)
~Fnk(U)--+ -00 ~dG(x),

k [1
L
n

- Fnk(u)] --+
foo -1 -+2-x 2 dG(x), 0< U E C(G) (11)
1 U X

lim lim
,-On-ook=1
I [r J1xl<,]
X2 dFnk(x) - (r
J1xl<'J
X dFnk(X») 2] = G(O+) - G(O-),

(12)

- An + Ir
1 J l1 xl<rJ
X dFnk(x) --+ y + r
J l1 xl<tl
X dG(x) - r
J l1 xl<:rJ X
~ dG(x) (13)

for some fixed r > 0 for which ± r are continuity points of G.


PROOF. By Theorem 1 it suffices to show that (11), (12), (13) are equivalent to

(14)

and

-An + ~ (ank + f 1 : x2 dFnk(X») = Yn --+ y. (15)

This will be effected by proving (14) <:> (14') <:> (II '), (12') <:> (I 1), (12), and,
moreover, that when (14) obtains (13)<:>(15). In this schema, (11') is (II)
12.2 Infinitely Divisible Laws as Limits 461

with F n • k replaced by Fn • b and (12') and (14') are defined by


lim hm [Gie) - Gn( -e)] = G(O+) - G(O-) (12')

(where Gn is as in (14», and


Gix) --+ G(x), 0> X E C(G),
Gn(oo) - Gn(x) --+ G(oo) - G(x), 0 < X E C(G), (14')
lim lim [Gn(e) - Gn( -e)] = G(O+) - G(O-).

Since (14) implies lim.~o hm[Gn(e) - Gn( -e)] = lim.~o[G(e) - G( -e)]


= G(O+) - G(O-), it is apparent that (14) = (14'). Conversely, under (14'),
taking ± x E C( G),
G(O+) - G(O-) = lim lim[Gn(x) - Gn( -x)]
x-o+ "

= lim [G(x) - G(oo) - G( -x) + lim Gioo)]


x ..... O+

= G(O+) - G(O-) - G(oo) + lim Gn(oo),


so that !illi"Gn(oo) = G(oo), whence (14) holds. Of course, (14)=(12')
trivially and (14') = (II ') since for continuity points x, by the Helly-Bray
theorem,

t
kn _
Fnk(x) =
kn
t IX_
I + UZ UZ
00 ~ I + UZ dFnk(u)
_

z z
= IX I +zu dGn(u)--+ IX I +zu dG(u), x < 0,
-00 u -00 U

L
k
n

[I
_
- Fnk(x)] =
Joo -I Z
+
- dGn(u)
Joo -I +z - dG(u),
U
Z
--+
U
Z
x> O.
I X u x u
Conversely, (11 '), (12') = (14') since for continuity points u, by the Helly-
Bray theorem,

Giu) = I" dGn(x) = I" A


d(I Fnk(X»)

"
-00 +x
-00 1

I +X
I
XZ Z
--+ - I - -Z --z-dG(x) = G(u), u < 0,
-00 +x x

Gn(oo) - Giu) = 1 00

I :zxz d(~ Fnk(X»)


= - 1 00

I :ZXZ d(~ [I - Fnk(X)])

--+ 1 00

dG(x) = G( 00) - G(u), u > 0


462 12 Infinitely Divisible Laws

Thus, (14)¢>(14')¢>(11'), (12'). Next, if an = maxI :Sk:5kJllxl<,dxl dF nk , then


maxklankl ~ an = 0(1) by infinitesimality. Now,
kn kn
L Fnk(x - an) ~ L Fnk(x) ~ L Fnk(x + an),
k=1 k k=t
whence
L Fnk(x -
an) ~ L Fnk(x) ~ L Fnk(x + an)'
k k k
Thus, if u < 0, for any e > 0 for which u, u ± e are negative continuity points
of G, if (11) obtains,
u+t I + x 2
rrm rrm
L Fnk(u) ~ L Fnk(u + e) = - - 2 - dG(x),
f
- f
k k - <Xl X

u 2
+x
t

lim L Fnk(u) ~ lim L Fnk(u - e) = - I


- - 2- dG(x),
k k - <Xl X

and so, letting e -+ 0, lim D~ 1 Fnk(u) = S~ <Xl [(1 + X2)jx 2]dG(x). An


analogous statement holds for continuity points u > 0, whence (11) => (11').
The same argument works in reverse, yielding (11 ') => (11). Consequently,
(11') ¢> (11). Next, sincefor any e > 0

_1_2 ~ f x 2 dFnk(x) ~L f
~ dFnk(x) ~ L f x 2 dFnk>
I + e k=1 JUxl<t) JUxl<t)I + x k k Jllxl<tl

(12') ¢> Iim t _o+ rrm 2


n D~ I SUxl <t) x dFnk(x) = G(O+) - G(O-), and there-
fore, to complete the verification that (II'), (12') ¢> (11), (12) it suffices to
show that for all e > 0

In(e) = I If
k=1 JUxl<tl
x 2 dFnk(x) - [f
JUxl<tl
x 2 dFnk(x) - (f
JUxl<tl
x dFnk(X») 2] I
= 0(1).
Recalling that Iank I ~ an = 0(1), for 0 < e < r (omitting subscripts
temporarily)

f x 2 dFnk - f (x - ank)2 dF nk I
IJUxl<tl JUxl<tl

= If (x - a)2 dF - f (x - a)2 dF I
JUx-al<t) JUxl<tl

= Iilx-al<t:Slxll - lX'<t:s,x-alll

~ e2 f dF + f (e + lal)2 dF
Jlt :slxl:s t+lall Jlt -Ial :slxl <t)

~ (e + an? r
Jlt - an :slxl:s t+anl
dF nk
12.2 Infinitely Divisible Laws as Limits 463

and

I
2
f (x - ank)2 dF nk - f x 2 dF nk + (f x dF nk )
I J11xl«1 JUxl«1 JUxl«J

= If J l1xl < <I


(-2ax + a2)dF + (f Jl1xl < <I
XdF)21

=
(1
1 Uxl < <I
x dF - a
) 12 - a2
Uxl2!: <I
dF I

=
I(1 1<:5lxl<rl
x dF
) 1
2 - a2
Ilxl2!:<l
dF I

.::;; an r
J I <:5lxl < rl
r dF + a; r
J ux!2!: <I
dF

implying

since, choosing 11 in (0, E), where ±E, E ± 11 are continuity points of G, and
noting that an < 11 for n ~ N. (and recalling that (11) ¢ > (11'))

1 + x2
f
oo
+ - - 2- dG(x) < 00,
< x

= 0(1) as 11 -> o.
464 12 Infinitely Divisible Laws

To complete the proof of the theorem it remains to show that (13) ~ (15)
under (14). Since for all k

IJrlIxl<t]x dFnk(x) I= IJrlIx-al<t](x - a)dFI

= IJrlIx-al<t)(x - a)dF - r
JlIxl < r)
(x - a)dF +a r
JlIxl;;, f]
dF I
< r
IJ[ix-al<f:<;lxll (x - a)dF - r
JlIxl<f:<;lx-all
(x - a)dF I
+Ial r
JlIxl;;'f)
dF

~, r dF
J[f:<;lxl<f+lall

+ (, + lal) r
J[f-Ial:<; Ixl < f]
dF + lal r
J lIxl ;;, f)
dF

via (16), and so, recalling that ±' are continuity points of G, it follows from
I
k=1
f-~
I+x
dFnk(x) = I (1
k=l [iXI<f)
x dFnk - 1Ilxl<f)l+x
~ dFnk(x)
+
I lIxl;;'r)
x2
--
1+ x
-
dFnk(x) )

= 0(1) - r
J[iXI<f]
x dGn(x) + r
JlIxl;;'f]
~ dGn(x)
x
that (13)~(15) under (14). 0

Theorem 3. Let {X n. k, 1 ~ k ~ k n -> 00, n ~ I} be rowwise independent


LV.S.
(i) If {Xn,d are infinitesimal and D:
I Xn,k - An converges in distribution

for some choice ofconstants An' the limit is normal iff


max I X n k I .!. 0 (17)
1 Sk Sk"
12.2 Infinitely Divisible Laws as Limits 465

or equivalently

L
k= 1
k
n
I [lxl" £)
dF nk = 0(1), e> O. (18)

(ii) {Xn,d are infinitesimal and LZ~ 1 (X n. k - an,k) has a limiting normal
distribution necessarily N(O, (12) iffjor all e > 0, (18) holds and

lim ~
n k= 1 J[lxl<d
[r x 2dFnk(x) - (rJ[lxl<tl
x dF nk (X»)2] = (12. (19)

PROOF. (i) tfL X n. k - An has a limiting distribution, then (II), (12), and (13)
hold. Moreover, if the limit is normal, since exp{tjJ(t, y, G)} = e-(<1' r' /2)+it9 iff
y = 0, G(u) = 0 for u < 0 and G(u) = (12, U > 0, (11) of Theorem 2 requires
Dn Fnk ( -e) = 0(1) = Dn[1 -
Fnk(e)], e > 0, which is tantamount to (18)
and also to fl~n(l -
S[lxl>t)dFnk(x»-> I, f, > 0, in view of 1- LPi:-:;
fl (1 - Pi) :-:; e - L.p, :-:; 1 (where 0 :-:; Pi :-:; 1). It is therefore also equivalent to
(17)via P{maxJsksdXnkl;;::: e} = I - nZ~J(1 - P{IXn.d;;::: f.}).
Apropos of (ii), (18) and (19) ensure (II) and (12) with G as above and
(12 = G(O+) - G(O-), while (13) is satisfied with An = D~ 1 an.b Y = O.
Thus, D~ J (X n. k - an.d has a limiting normal dJ. N(O, (12) by Theorem 2.
As for the converse, (18) holds via (i), and therefore it suffices to verify that
(12) and (18) entail (19). Now, for 0 < [;' < f,

Lk{I
n

k= I [lxl<d
x 2 dF nk - (I [lxl<t]
)2}
x dF nk

L {I
k" x 2 dF nk - (I x dF nk )2}
k = I. [Ixl < t') [lxl < t']

+ L {I k" x 2 dF nk - (1 x dF nk )2}
k =1 It' slxl < t) [t' slxl < t)

- 2 L (1 k" x dF nk )(1 x dF nk ' ) (20)


k =1 [lxl <t') [t' slxl <t]

and in view of (18),

0:-:; L {I
k" x 2 dF nk - (I x dF nk )2}
1
< Ixl < t) slxl < £)

I
k= 1 [t' It'

:-:; L L
n n
k x 2 dF nk :-:; e2 k dFnk = 0(1),
k= J It' slxl < t) k= J [Ixl ~ t']

o< 2 L
k
n

II [lxl <t')
x dF nk
III slxl <t)
x dF nk
I
I
k= 1 It'

~ 2£f,' L
k dF nk = 0(1).
k= J [lxl"')
466 12 Infinitely Divisible Laws

Consequently, from (20), for all positive e, e'

lim "
-
.~oo
L.
k {1 n

k= t Uxl<e)
x 2 dF. k - (1 Ilxl<e)
X dF. k )2}
= fiili L {1 (1 X dF k)2} .
n

k x 2 dF. k -
• ~ 00 k= 1 Uxl < e') llxl < e')

Thus, the prior upper and lower limits are independent of e, whence (12)
ensures (l9). 0

Corollary 2. If {X.k> 1 :$; k :$; k. --+ 00, n ~ l} are infinitesimal rowwise


independent LV.S with zero means and variances a;k satisfying L~~ 1 a;k = 1,
n ~ 1, then L~~ 1 X. k has a limiting standard normal distribution iff for all
e>O
(21)

PROOF. In view of

L
k= 1
k
n
1
llxl ;,e)
x 2 dF.k(x)

~maX[I(r
k= 1 JUxl;, e)
XdF. k(x))2,e I
k= 1
r
JUxl;, e)
I XldF. k(X),[;2I
k= 1
r
J uxl ;, e)
dF.k(X)] ,

(22)
(21) implies (18) and also D=
1 (Jlxl <e X dF. k )2 = o(1)for all [; > 0, noting that
E X. k = O. Consequently (21), which is equivalent to D~ 1 JUxl<e) x 2 dF.k(x)
--+ 1 via L~~ 1 E X;k = 1, also implies (19). Then, Theorem 3 guarantees a
limiting standard normal dJ. for D~ 1 X n. k since D~ 1 a.k = 0(1) via (22),
Conversely, if Ihl X •. k is asymptotically N(O, 1), then taking y = 0 = A. in
Theorem 2, (11) and (13) ensure I ~n a. k = o( 1) so that for all [; > 0,

via (19), implying D~ 1 Jllxl <e) X2 dF.k(x) --+ 1 for all [; > 0, and, as already
noted, this is tantamount to (21). 0

Corollary 3. If {X., n ~ I} are independent LV.S with d.f.s {F., n ~ I}, then
(lIB.) L~ Xi - A. has a limiting standard normal dJ. and {XkIB., 1 :$; k :$; n}
are infinitesimal for some sequence {B.} of positive constants tending to 00 iff
for all [; > 0

(23)
12.2 Infinitely Divisible Laws as Limits 467

(24)

where

PROOF. Define Xn,k = XklB n, 1 ~ k ~ n, n ~ 1. Then {Xn,b 1 ~ k ~ n, n ~ I}


are rowwise independent LV.S with Fnk(x) = Fk(Bnx), and (23), (24) are
simply transcriptions of (18), (19). 0

EXERCISES 12.2
1. If {X n . k , I ~ k ~ kn ---+X.J,n:2: I} are rowwise independent LV.S, prove that
L~n X nk -.':. some constant )I and that {Xn.k} are infinitesimal iff for every e > 0

(i) If I !lxl2:c)
dFnk(x) = 0(1), (ii) If I Ilxl<t)
x dF.k(x) ---+ )I,

(iii) I {f
k= I !lxl<')
x 2 dFnk(x) - (f
. Ixl<'
x dF•. b))2} = 0(1).

~ k ~ k. ---+ 00, n :2: I} be rowwise independent and positive


2. Let {X n • k , I LV.S.
Prove that D~ I X n. k ~ 1 and {X •. k} are infinitesimal iff for every f. > 0

(iv) k~l
k
n
SOC:
, dFnjx) = 0(1), (v) I
k= I
f' x dF.ix) ---+ 1.
Jo
3. If {X•. k> I~ k ~ k....... ,Xl} are rowwise independent positive LV.S with finite
expectations satisfying D~ 1 E X. k = I, then D~ 1 X •. k -.':. I and {X •. k} are
infinitesimal iff for every c > 0, (v) holds.

4. (Raikov) Let {X •. k! have finite variances and {X •. k - E Xn.k> I ~ k ~ k., n :2: I}


be rowwise independent, infinitesimal LV.S satisfying D~ 1 E (X n.k - EX nk)2 = I,
n 2: 1. Then L~~ 1 (X n.k - EX n.k) has a limiting standard normal dJ. iff Un =
D~ 1 (X n. k - EX •. k)2 .!:. 1.

5. Construct rowwise independent, infinitesimal LV.S {X n • k } which do not satisfy (i).


6. Since the uniform distribution is not i.d., (prove) why does Exercise 8.3.7 not contra-
dict Theorem I?

7. Give necessary and sufficient conditions for sums L~n X nk of rowwise independent,
infinitesimal LV.S {Xn.d to have limiting Poisson distributions.

8. Prove that if {X., n :2: I} are independent LV.S, there exist constants An' Bn > 0
such that B; I D
Xi - Anhas a limiting standard normal d.f. and {Xk/B n, 1 ~ k ~ Il}
are infinitesimal iff there exists constants Cn ---+ OCJ with

(vi) if
1 !lxl>Cn)
dF k = 0(1), (vii) -~
C.
i 1
{f !lxl <Cn)
2
x dF k - (f IIxl <Cn!
XdFk)2}""".N.
468 12 Infinitely Divisible Laws

Hint: Under (23) and (24), choose Gn --+ 0 such that Gn Bn --+ CIJ and then determine
nj such that for n 2 nj the left side of (23) (resp. (24» with G = Gj is < I/j (resp.
> I - I/j). Then take C n = GjBn for nj ~ n < nj + I' Conversely, under (vi), (vii)
choose B; to be C; multiplied by the left side of (vii), whence C n = o(B n ) and (23),
(24) hold.

9. If {X n ,.'12 I} are independent LV.S with P{X k = ±k} = (/2k, P{X k = O} =


I - I/k, does B;; I I~ Xi - An have a limiting standard normal dJ. for some An'
Bn > O?

10. If {X, X n , n 2 I} are i.i.d. with EX = 0, E Xl = 1 and {a n • j , 1 ~ i ~ n} are con-


stants with maxi lan.i1 = 0(1) and Ii'=1 a;.i = 1 then Ii'=1 an.jXj ~ No. I'
11. The subclass of infinitely divisible distributions which are limit laws of normed sums
(1/ Bn)D Xi - An of independent LV.S {X n' n 2 (} (0 < Bn --+Xl) is known as the
class !fl (Levy). Employ characteristic functions to prove that FE !fl iff for every
rJ. in (0, I) there exists adJ. G. such that F(x) = F(x/rJ.) * G•. (If (rJ., G) characterizes
an i.d. c.f. whose distribution E !fl, then its left and right derivatives, denoted G'(x),
exist on (- Xl, 0) and (0, Xl) and [(I + x 2 )/x]G'(x) is non increasing.)

12.3 Stable Laws

As indicated at the outset of this chapter, the class of limit laws of normed
sums of i.i.d. random variables is a narrow subclass of the infinitely divisible
laws consisting of stable distributions.

Definition. AdJ. F or its d. q> is called stable if for every pair of positive
constants b l , b 2 and real constants ai' a2 there exists b > Oand real a such that

(I)

Clearly, if F(x) is stable, so is F(cx + d), c > 0, so that one may speak of
"stable types." Patently, degenerate distributions and normal distributions
are stable, and in fact these are the only stable dJ.s with finite variance. The
class of stable d.s will be completely characterized but, unfortunately,
explicit expressions for stable dJ.s are known in only a handful of cases.

Theorem l. The class of limit distributions ofnormed sums (lIB.) L~ Xi - A.


of i.i.d. random variables {X., n ~ I} coincides with the class of stable laws.
PROOF. If F is a stable dJ. and {X., n ~ I} are i.i.d. with distribution F, then
via(I),PfD Xi < x} = [F(x)]"' = F(bx + a),wheretheparametersdepend
on n, say b = liB. > 0, and a = -A •. Then
12.3 Stable Laws 469

p {~
B
n .=1
t Xi - An < x} = P {f Xi < Bnx + AnBn}
1

= F[~n (Bnx + AnBn) - An] = F(x)

for all n ~ 1 and, a fortiori, in the limit.


Conversely, suppose that F is a limit distribution of normed sums
(lIB n) L~ Xi - An ofi.i.d. {X n, n ~ I}. IfF is improper, it is certainly stable,
while otherwise by Theorem 8.4.2, (i) Bn -+ 00 and (ii) BnlBn- 1 -+ 1. For any
constants 0 < b l < b 2 < 00, define m = mn = inf{m > n: BmlBn > b 2Ib.},
whence BmlBn -+ b 2lb. via (i) and (ii). For any real at, a2' define constants
A m • n so that

By hypothesis, the left and hence right side of (2) converges in distribution to
F(b.tx + at)*F(bi·x + a2)·Ontheotherhand,(lIBm + n ) L7+ n Xi - A m+n
converges in distribution. According to Corollary 8.2.2, the two limit
distributions must be of the same type, that is, (1) obtains for some b > 0
anda. 0
It follows immediately from Theorem 1 that the stable distributions
form a subclass of the infinitely divisible laws and hence (l) may be used in
conjunction with the representation of i.d. c.f.s to glean further information.

Theorem 2. Afunction qJ is a stable d. iff

qJ(t) = qJa(t;y,p,c) = eXP{iyt - c1t1a[1 + iP~IW(t'IX)J} (3)

where °< IX S 2, IPis 1, c ~ 0, and


tan Tr1X/2, IX#I
w (t IX ) ={
, (2/Tr)log It I, IX = 1.

Note. The subclass with P = 0 = y comprises the symmetric stable


distributions. The parameter IX is called the characteristic exponent. If IX = 2,
necessarily P = 0, yielding the normal d. When IX < 2, absolute moments of
order r are finite iff r < IX.

PROOF. If qJ is a stable c.f. it is i.d. by Theorems 1 and 12.2.1, whence qJ(t) =


exp{t/t(t)}, where, according to the representation theorem (Theorem 12.1.3),
470 12 Infinitely Divisible Laws

z
l/t(bt) = itby + foo (e i1bX
-00
- I - I itbx Z) I
+x
\X
X
dG(x)

y )b + i (y)b
f
Z
oo ( . it
= itby + _ 00 e"Y - I - I + i Ib z i dG

= it [bY + 0 - b )
Z
f~oo I : yZ dG(~) ]
Z
+ foo (eilY _ 1 _~) b + yZ dG(~)
_ 00 I + yZ yZ b

= itb' + foo
-00
(eilY _ 1_~)(1 + +
+i i i +
1
i) b
Z
),Z
i dG(~).
b
(4)

Since q> is stable (taking a l = a z = 0) for any positive pair b t , b z , there


exists b > 0 and real a with l/t(b l t) + l/t(b z t) = ita + l/t(bt). Hence, from (4)
and uniqueness of the i.d. representation, for all x

f x
-00
bi + i dG
---=---'--;;-z
I+y
(1-) + fX
bl -00
b~ + i Z dG
I +y
(1-) = fX
bz -00
Z
b + i Z dG (~) ,
1+y b

(bi + i)dG (:J + (b~ + i)dG (:J = (b


Z
+ yZ)dG (~), (5)

(bi + b~ - bZ)[G(O+) - G(O-)] = 0. (6)


Set
+ yZ I + yZ
f
1 -ex
f
OO

J(x) = e' ~ dG(y), r(x) = -z-dG(y)


-00 y
for real x. If b = e- h , b i = e- h ;, i = 1,2,
Z Z
J(x + h) =
(. /b)eX f
1 + yZ oo
--z-dG(y)
Y
Z
= foo
eX
b +Zu dG-
U
(u)
b ,

-ex bZ + u
r(x+h)= -00 UZ
f (u)
dG b'

Thus, from (5), for all x and arbitrary hI' h z there exists h such that
+ hi) + J(x + h z ) = J(x + h),
J(x
r(x + hi) + r(x + h z ) = r(x + h). (7)

Taking hi = h z = 0, there exists b z such that 2J(x) = J(x + b z ), and


inductivelynJ(x) = J(x + bn - . ) + J(x) = J(x + bn)forsomebnE(-oo, 00).
Hence, (mln)J(x) = Oln)J(x + b m ) = J(x + bm - b n ) = J(x + b(m/n) say, for
any positive integers m, n, whence rJ(x) = J(x + b,) for all real x, every
positive rational r, and some function b,.
If J(x o ) is positive, J(x o + b(l/2) = O/2)J(x o) > 0, implying b(l/2) > °
12.3 Stable Laws 471

since 1 is nonincreasing. In similar fashion,


l(x o + nb(1/2l)= (lj2)l(x o + (n - l)b(1/2l) > 0
for every positive integer n, implying lex) > 0 for all x. Thus, either 1 == 0
or (as will be supposed) 1 is nonvanishing.
Since 0 < l(x)l as xT, it follows from rl(x) = l(x + b,) that for rational
numbers r' > r> 0, b, > b", whence as r i I (through rationals), 0 ::; b,l
some number 15'. Thus, lex) = lim'll rl(x) = lim, I I l(x + b,)::; l(x + b')
::; lex), implying lex) = l(x +) and l(x + nJ') = lex) for n ~ I. Since
l( 00 - ) = 0, b' = 0, Analogously, for rational r 1 I, 0 ~ b, Tb* and lex) =
l(x -) and b* = 0, Consequently, 1 is continuous and if rational r T any
positive r o , then b,l some b,o' whence
rol(x) = lim rl(x) = lim l(x + b,) = l(x + b,o)'
'i'o '1'0

Thus, b, is defined and strictly decreasing (the same functional equation


obtains) for all real, positive rand b l = O.
(i) Note that by definition lex) < 00 for all x > - 00, and so
00 = lim'ioo rl(x) = lim'loo l(x + b,) implies b,l - 00 as r Too and
l( - (0) = 00, As r 10, b, T00 since 0 = lim,_o r lex) = lim,_ 00 l(x + b,),
implying b, ---> 00 as r ---> O.
(ii) l(x + e) < lex) for f. > 0 and all x E ( - 00, (0). Suppose contrariwise
that l(x o + e) = l(xo) for some Xo and e > O. Since b l _ = 0, the quantity
r may be chosen so that 0 < b, < e, implying rl(xo) = l(xo + b,) = l(xo),
a contradiction since l(xo) > O. Thus 1 is strictly decreasing implying b,
continuous.
(iii) For all positive r l , r 2

l(x + b",,) = rlr2l(x) = rll(x + b'l) = l(x + b'l + b,,),


and so by strict monotonicity J"'l = b" + b'l for all r > 0, i = 1,2. This is
j

the multiplicative form of Cauchy's functional equation, and since b, is


continuous, necessarily b, = - (ljIX)log r for some constant IX. As r increases
from 0 to 00, b, decreases from 00 to - 00, necessitating IX > O. Moreover
r leO) = l(b,) = l( - (tjIX)log r), implying for x E ( - 00, (0) that

l()
X = leO) ·e -ax =-e
C I -ax
,
IX

where CI = IXl(O) ~ O. Note that, CI > 0 if G( (0) - G(O +) > O. Hence,

+ y2
f
oo I C
--2- dG(y) = l(log x) = ~ X-a, x> 0,
x y IX
or
x 1- a
dG(x) = C I -1--2 dx, x>O CI = al(O), (8)
+X
472 12 Infinitely Divisible Laws

Since G(oo) - G(O+) < 00, necessarily 0 < a < 2 and moreover, from (7)
e-a(x+h d + e- a (x+h = 2) e-a(x+h) or

(9)
Proceeding in similar fashion with r(x) if G(O-) > 0, it follows that

dG(x) = C2lxll-2ao dx, x < 0, c 2 = aor(O) > 0, (10)


l+x
and again via (7)
(11)
Setting b l = b 2 = 1 in (9) and (11) reveals that ao = a.
Summarizing, if G I 0, either G( 00) - G(O + ) = 0 = C1 and G(O - ) =
) = C2, whence G(O+) - G(O-) = (12 > 0, entailing b 2 = bi + b~ via
(6) and <p(t) normal, i.e., a = 2 or alternatively 0 < a < 2 and G(O+) =
G(O-) via (6), (9) and (11) with b l = b2 = 1. In the latter case, from (4),
(8), (10)

1
00

ljJ(t) = iyt + C (
e,tx
-
- 1 - -itx-2) -
dx-
1 0 + l+x x l +a

+ C2 f O- (
_ 00
itx- ) -
e itx - 1 - - dx-
1 + x 2 IX II + a .
(12)

Next, (12) will be evaluated in terms of elementary functions.


~i) 0 < a < I_

ljJ(t) = it [ Y - c l Jo xa(l + x2) + c 2


[00 dx fO-00
dx
Ixla(l + x2)
]

fO- (e
1
00
itx dX itx dX (13)
+C I (e -l)~+C2 -l)-I-II+a-
0+ X -00 X

By contour integration

0= -
(e lZ
dz
1)- =
f R(e'" -
2 -
1)
dv
-+
fR' (e- U
1)-
du
-
v l +a fu 1 +a
- -

1Q zl+a R, R2

2 _ 11</2 _ R _eilJide
+ I(-I)J (exp(iR j e '8 ) - I) l;a i8(1+a)-
j=l 0 Rj e
Now, since v == lexp(iRe i8 ) - 11:5: 2R for 0 < R < 1 and v:5: e-Rsin8 +
e
1 :5: 2 for 0 :5: :5: n12,
1</2

IJ[
1</2 _
(exp(iRe'8) - I)
ide
:5:
2
10
--81
R I - a de = 0(1) as R --+ 0

[1</2 de

1J
o Rae' a

2 w = 0(1) as R --+ 00,


o
12.3 Stable Laws 473

it follows that

via integration by parts and the recursion formula for the r function. Thus,
if t > 0,

Since

setting

and
an
c = - (c\ + c 2 )· r( -a)cos 2 ~ 0,

for t > 0, from (13),

= ity' + ar( - a)t' ([J(O) + J - (O)]cos ~ a + [J - (0) - J(O)]sin ~ a)

= ity' - ct' {I + iPtan(~)a},


where

P= c 2 - c\ = r(O) - J(O) E [ -1,1]. (14)


C2 + c\ r(O) + J(O)
For t < 0,
474 12 Infinitely Divisible Laws

l{J(t) = l{J( -I) = - iy'( - t) - c( -;- t)a{ I - iP tan (~}x}

= iy't - C1t1a{1 + iP~ tan(~)()l


which dovetails with (3).
(ii) I < ex < 2. Since x/O + x 2) = X - [x 3/0 + x 2 )], it follows from (12)
~hat for suitable y"

[00 . dx
l{J(t) = ity" + C 1 J (e"X - I - itx) Xl+a
o

+ C 2 f o (itX
e . ) -dx-
- I - Itx l a (5)
-00 Ixl + '

By the same contour integration

= i-aex- I 1 000 - e-y)y-ady

= i-aM(ex) say, where 0< M(ex) < 00, and so, setting
1r
C = - M(ex)(c 1 + c 2 )coS"2 ex ~ 0,

for t > 0, from (15)

l{J(t) = ity" - Cta{1 + iPtan(~)ex}

= ity" - cit la {I + i I~ I P tan (~) ex},

and exactly as in case (i) the above also holds for I < 0.
(iii) ex = I. Since

[00 I - ~os u du = [00 sin u du =~


Jo u Jo u 2
and

'" sin v fEU v + 0(v 2)


dv = dv = log u, u > 0,
,-0 + f, l
lim -.2- lim 2
,-0 + , v

if t > 0,
12.3 Stable Laws 475

r'( .,
Jo e' x -
itx ) dx
I - 1 + X 2 ~2

=
1
o
00
cos tx - I d X
x2
+I .1 00

0
(
sm

tx - -tx-2 ) -dx2
I +x x

= - -n t + i lim -
2 <-0 +
[f<'
t sin v dv + t foo(sin v
<
-2-
V <
-2- -
V v(l
I) J
+ v )
2 dv

n
= -"2 t - it log t + itYo ,

noting
3
sin v I = v + O(v ) _ ~ + _v_ = O(v) as v -+ o.
7 - v(1 +v 2
) v2 V I + v2

Thus, setting c = (n/2)(c 1 + C2) ~ 0, for t > 0, from (12)

t/J(t) = itji + C1[ - ~t - it log tJ + C2 [ - ~ t + it log tJ

= it') - ct [I + ifJ ~n log tJ = itji - cit I [I + ifJ -~Itln~ log It I],

which coincides with (3) for t > 0 and also for t < O. Clearly, I fJ I ::;; I
from (14).
Conversely, suppose that rp(t) is defined by (3). If IX = 2, then rp(t) =
exp{ iyt - ct 2 /2}, so that rp is a normal d. and hence a stable d. If, rather,
o < IX < 2, let J - (0) and J + (0) be determined by

r(o) - J(O)
(16)
J (0) + J(O) = fJ,

c = j-lXn -1X)[r (0) + J(O)]cos"2 IX


n
if IX #- I

n (X [r(O) + J(O)] if IX = l. (17)


2
Set

I
x dy
IXr(o) _oolyla-I(I +y2)' x<O
=
G(x)
r
1 IXJ(O) Jo ya
dy
1( 1 + i) + G(O), x>O (18)
476 12 Infinitely Divisible Laws

and define for arbitrary y' E ( - 00, 00)

l/!(t) = ity' + _ fOO (


00 e
i1x
- I - I
itx ) I + x2
+ x 2 ~ dG(x). (19)

Then

l/!(t) = ity' + IXJ(O) fOO(e ilX -


o
1- ~) ~
l+x Xl+~
2

+ IXr(O)
f o ( e'., x_I - -itx- ) ----
-00
dx
1 + x 2 Ixl l +~

and, from the computations following (12), for some y" E ( - 00,00)

l/!(t) = it(y" + y') - cltl~{ 1 + iP I:' w(t, IX)}. (20)

Hence, from (3), (19), and (20), choosing y' =y- y",

qJ(t) = exp{l/!(t)},

whence qJ is an i.d. c.f. by Theorem 12.1.3.


Moreover, qJ(t) is stable, since setting

s(t) = ity - cltl~[1 + iP 1:1 w(t, IX)).

for IX #- 1,2 and positive bi' i = 1,2,

where b = b l + b 2 - a > 0 and b~ = b~ + b'2. If, rather, IX = 1 then (21)


obtains with b = b l + b 2 and

Thus, qJ is a stable c.f. o


EXERCISES 12.3

I. Show that the mass of a proper stable dJ. is confined to (0, (0) (resp. ( - 00, 0» iff
P = -I (resp. P = I).
2. Prove that if X has a stable distribution with characteristic exponent IX E (0, 2), then
E I X IP < 00 for P < IX and E IX IP = 00 for P > IX. Hint: If X = X I + X 2' where
X l' X 2 are i.i.d., then EIX IP < 00 for P < IX by Exercise 8.4.11. If P > IX, then EIX IP
= 00 by Theorem 8.4.1 (4).
References 477

3. Prove that all proper stable distributions are continuous and infinitely differentiable.
4. A d.r. F(with c.r. cp) is said to be in the domain ofattraction (resp. domain of normal
attraction) of a (stable) distribution G (with d. ljJ) if for suitable constants A., B.
(resp. for some An and B. = bn l /), lim e;'A·cp'(t/Bn) = ljJ(t). Show that every stable
distribution belongs to its own domain of (normal) attraction.
5. In coin tossing, prove that the probability that the mth return to equilibrium (equal
number of heads and tails) occurs before m2 x tends to 2[1 - <I>(X- 1/2)] as m -+ 00. As
usual, <I> is the standard normal d.f.
6. The limit distribution of Exercise 5 has density f(x) = (2nx 3 )-1/2 e -lflx, x> 0.
This is actually the stable density function corresponding to IX = t, fJ = -I, y = 0,
c = l.
7. If Sn = I~ Xi> n ;;:: 1 is a random walk with X I having a symmetric stable distribu-
tion of characteristic exponent IX, show that Sn/n 1/. also has this distribution. Hence, if
I $ r < IX < 2, E Is. I = Cn 1/. for some C in (0, (0) whence the conclusion ofTheorem
10.3.4 fails for I $ r < 2.

References
K. L. Chung, A Course in Probability Theory, Harcourt Brace, New York, 1968; 2nded.,
Academic Press, New York, 1974.
H. Cramer, "Su un teorema relativo alia legge uniforme dei grande numeri," Giornale
del/' lstituto degli Attuari 5 (1934), 1-13.
e. Esseen, "Fourier analysis of distribution functions," Acta Math 77 (1945), 79.
B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent
Random Variables, Addison-Wesley, Reading, Mass., 1954.
P. Levy, Theorie de faddition des variables aleatoires, Garthier-Villars, Paris, 1937;
2nd ed., 1954.
M. Loeve, Probability Theory, 3rd ed., Van Nostrand, Princeton, 1963; 4th ed., Springer-
Verlag, Berlin and New York, 1977-1978.
Index

d measurable function, 15 Barndorff-Nielsen, 65, 83


Abbott, 163 Baum, 83, 135, 163,401
Abel, 114, 119 Baum-Katz-Spitzer theorem, 135
Absolutely continuous converse, 199
distribution function, 27, 270 Bayes theorem, 228
set function, 203, 205 Bernoulli, 53, 75
random variables, 27 trials with parameter p, 75-81
Additive (set function), 18 trials with success probability p,
a-, 19 56
see also Subadditive set function weak law of large numbers, 39
Algebra, 6 Bernstein, 53, 353
degenerate a-, (of events), 238 inequality, 111
product a-, 8 polynomial, 42
semi-,20 Berry, 318,322, 353
a-,6 -Esseen theorem, 322, 337
see also a-algebra generated by; Binomial
a-algebra of permutable distribution function, 31,294
events negative, 38, 60
tail a-, 63 random variable, 31, 50, 56
Almost certainly (surely), 20 Bivariate Poisson distribution, 191
Almost everywhere, 171 Blackwell, 156,437,442
Anscombe, 340, 353 Blum, 341, 353
At random, 56 Bochner theorem, 308
Austin, 408, 442 Bonferroni inequalities, 38
Borel, 53, 75
Ballot problem, 250 -Cantelli lemma, 42, 44,101, 102
Banach space, 111 -Cantelli theorem, 61, 96, 258

479
480 Index

Borel (cant.) Class, 1


(measurable) function, 14 J..-,7
line, 11 monotone, 6
set, 11 1t-,7
space, 11, 186 Complement, 4
strong law of large numbers, 42 Complete
zero-one criterion, 61 compactness, 282
Bounded in probability, 273 convergence of distribution
Branching process, 258 functions, 271
Bray, 274,276,311 convergence of random variables,
Breiman, 258, 268 43
Brown, B., 161, 163,428,442 Completion (of), 25, 236
Brunk, 363, 401 Conditional
-Chung strong law of large density, 224
numbers, 363 distribution function, 225
Buhlman, 268 expectation, 210, 212, 213,
Burkholder, 163,414,422,425,442 216-218
-Davis-Gundy inequality, 425 independence, 229-232
inequality, 414 probability, 59, 222, 223
probability measure (regular
Cantelli, 42, 44, 61, 83, 96, 258, 284, conditional probability), 223
311 see also Regular conditional
Carleman criterion, 303 distribution, 225
Cauchy Consistent
convergence criterion, 99 family of distribution functions,
distribution function, 294 195
Central limit theorem, 48, 313, 345 Continuity point, 175, 271
De Moivre-Laplace, 46, 47 Convergence
Doeblin-Anscombe, 340 almost certainly (a.c.) or almost
for martingales, 327, 336, 345 surely (a.s.), 43, 66
for Poisson random variables, 52 a.c. unconditionally, 121
for sums of interchangeable almost everywhere (a.e.), 172, 173
random variables, 328, 329 complete (for distribution
for V-statistics, 326 functions), 271, 309
Liapounov, 316 complete (for random variables),
Lindeberg-Feller, 314 44
Characteristic exponent, 469 in distribution (law), 272
Characteristic function, 286 in mean of order p, 95, 99, 102,
r-analytic, 301 104,109
entire, 301 in measure, 172
Chernoff, 281, 311, 353 in probability, 43, 66, 72-74
Chow, 111, 163, 221, 268, 286, 311, moment, 317
401,402,442 weak,271
Chung, 70, 71, 83, 98, 111, 123, 130, Convex function, 102-104, 110
137,154,163,268,311,353, inequality for martingales, 421,
402,477 422,425
Index 481

Convolution, 189 hypergeometric, 39


Coordinate random variable, 57, infinitely divisible, 445
187, 196 inverse triangular, 294
Copies (of a stopping time), 141, joint, 26, 187
142, 146, 147, 150, 155 joint normal, 208, 221, 228
Correlation coefficient, 106 marginal, 309
Countable set, 1 multinomial, 191
Counting measure, 24 n-dimensional, 187
Covariance, 109 negative binomial, 38, 60
Cramer, 273, 311,477 normal, 31, 53, 294
- Levy theorem, 305 Poisson, 31,294
positive normal, 78, 343
Davis, 418, 422, 425, 442 positive stable, 343
Defective stopping time, 138 sample (empirical), 284
de Finetti, 33, 53, 234, 268 singular, 270
Degenerate stable, 468
distribution function, 31, 70, 294 symmetric Bernoulli, 294
a-algebra (of events), 238 triangular, 294
random variable, 31,64 Doeblin, 340, 353
V-statistic, 259 -Anscombe central limit theorem,
Delayed sum, 135 340
De Moivre, 53, 75 Doob, 29, 83, 111, 164,209,223,
-Laplace (central limit) theorem, 225,239,245,268,291,311,
46-48 353,402,442
Density (function), 27,31,46,206 maximal inequalities, 255
Discrete upcrossing inequality, 405
distribution function, 27 Domain of attraction, 477
random variable, 27 Dubins, 254, 268, 433-437
Disjoint Dvoretzky, 339, 348, 352, 353,441
class, 4 Dynkin, 29, 268
Distinguished
logarithm, 446 Egorov, D. H., theorem, 75
nth root, 446 Egorov, V. A., 402
Distribution function (d.f.), 25, 26, Elementary function, 85
28,30,270,271 Equicontinuous, 208
absolutely continuous, 270 Equivalence
associated, 370 of measures, 207
binomial, 31 of sequences of random variables,
bivariate Poisson, 191 116
Cauchy, 294 Erdos, 135,342,353
conditional, 225 Erickson, 164, 183
degenerate, 31,270,294 Esseen, 318, 322, 353
discrete, 27 Essential supremum, 202
exponential, 60 Etemadi, 131, 132
gamma, 294 Exchangeable (see interchangeable)
geometric, 60 random variables
482 Index

Event, 19 elementary, 85
Expectation (mean), 84 finite set, 18
existence of, 84 integrable, 84, 89, 172
Exponential joint distribution, 26, 187
distribution, 60 left continuous, 25, 26
random variable, 60, 65 moment generating, 110
Extension monotone set, 19
of a sequence, 90 probability density, 27
of a set function, 20, 165 set, 18
(i-additive (countably additive)
Factor closed, 300 set, 19
Fatou lemma, 95, 174 (i-finite set, 18
for conditional expectations, 216 simple, 18
extended, 218 subadditive set, 19
Feller, 53, 61, 70, 83, 126, 128, 137, subtractive set, 24
164,311,313,314,353,402 tail,64
-Chung lemma, 70
weak law of large numbers, 128 Gambler's ruin problem, 81, 250
Fictitious random variable, 272 Gamma distribution, 294
Finite Garsia, 425, 442
measure space, 23 Geometric distribution, 60, 91
partition, 18 Glivenko, 284, 311
permutation, 232 -Cantelli theorem, 284, 311
real line, 11 Gnedenko, 312,402,477
set, 2 Gundy, 157, 164,422,425,442
set function, 18
stopping time (rule, variable), 138, Haag, 33, 53
240 Hajek, 255, 268
Finitely additive set function, 18 -Renyi inequality, 255
First passage time, 141 Halmos,29, 111, 177,209,222,268
Frechet, 283, 311 Hanson, 353
-Shohat theorem, 283 Hardy, 53, 103, 111,209,297,312,
Freedman, 254,268,442 368,372
Friedman, 325 Harris, 258
Fubini theorem, 186, 191,213 Hartman, 374, 378,402
Fuchs, 98, 111, 154, 163 - Wintner law of the iterated
Function logarithm, 382
d-measurable,15 Hausdorff, 29, 368, 372
absolutely continuous, 27 Helly, 274, 312
additive, 18 -Bray lemma, 274
Borel (measurable), 14 -Bray theorem, 276
convex, 102-104 Hewitt, 238, 268
density, 27, 31,46,206 Heyde, 137,164, 346, 353
discrete distribution, 27 Hoeffding, 268
distribution, 26 decomposition, 261
Index 483

Holder inequality, 104, 109, 174,227 Lebesgue-Stieltjes, 175, 180


Hsu,44,53,135,393,402 Riemann, 178
Hypergeometric distribution, 39 Riemann-Stieitjes, 178-180
Interchangeable (exchangeable)
Identifiable, 294 events, 33
Independent random variables, 191,231-234,
classes, 54 238,241
conditionally, 229-232 Inverse triangular distribution, 294
events, 54 Inversion formula, 287
families of random variables, 55
identically distributed (i.i.d.) Jensen inequality, 104
random variables, 55 for conditional expectations, 217
Indicator, 4 Joint
Induced measure, 25, 176 distribution function, 26, 187
Infinite dimensional product normal distribution, 208
measurable space, 10 probability density function, 55
measure space 191-193 Jordan decomposition, 208
Infinitely
often (i.o.), 2 Kac, 342, 353
divisible (distribution), 445 Katz, 83, 135, 163, 325, 401
Inequality Kawata, 164
Burkholder, 414 Kemperman, 267
Burkholder-Davis-Gundy, 422, Kendall, 228, 268
425 Kesten, 156, 164,214
Doob maximal, 255 Khintchine, 113, 164, 368, 372, 402,
Doob upcrossing, 404-406 451
Hajek-Renyi,255 inequality, 384
Holder, 104, 109, 174, 227 - Kolmogorov convergence
Jensen, 104,217 theorem, 113
Khintchine, 384 Kiefer, 397, 402
Kolmogorov, 133, 255 Kingman, 402
Levy, 72 Klass, 137, 138, 164, 269
Marcinkiewicz-Zygmund, 386 Knopp, 164,353
Markov,86,89,173 Kochen, 102, 112
Minkowski, 110, 183 Kolmogorov, 83,113,124,164,208,
Ottaviani,75 312,368,372,402,445,452,
Schwarz, 105 477
Tchebychev, 40, 106 consistency theorem, 195
Young, 184,424 inequality, 133, 255
Integrable, 84, 93, 172 law of the iterated logarithm,
function, 172 373
uniformly, 93 strong law of large numbers, 125
Integral, 84, 171 three series theorem, 117
indefinite, 92, 93 zero-one law, 64
Lebesgue, 176 Komlos, 137
484 Index

Koopmans, 325 -Stieltjes measure space, 170, 175,


Krickeberg, 269 188
Kronecker lemma, 114 see also Non-Lebesgue measur-
able set, 171
Lai,402 Left continuous, 25, 168
Laplace, 46, 47, 53, 75 Levy, 71, 72,83,164,239,245,269,
Law of the iterated logarithm (LIL) 305,312,353,451,477
independent random variables, class .P (of distributions), 468
373-382 concentration function, 286
V-statistics, 382 continuity theorem, 289
Hartman-Wintner, 382 distance, 278
Kolmogorov, 373 inequality, 72
Law of large numbers, Strong inversion formula, 287
(SLLN), Weak (WLLN) - Khintchine representation, 451
definition, 124 theorem, 72
Bernoulli (weak), 39 Liapounov, 105, 112
Borel (strong), 42 central limit theorem, 316
Brunk-Chung (strong), 363 Likelihood ratio, 257
Etemadi, 132 Lindeberg, 313, 353
for arrays, 138,393 condition, 313, 317
for interchangeable random -Feller central limit theorem, 314
variables, 235 Linear Borel set, 11
for V-statistics, 263 Littlewood, 111, 209, 368, 372
Feller (weak), 128 Loeve, 83, 112, 117, 124, 164,209,
generalized SLLN, 126-128, 269,312,402,477
361-366 .Pp random variable, 95
generalized WLLN, 356-359,467 .Pp space, 95
Kolmogorov (strong), 125 Lukacs, 312
Marcinkiewicz-Zygmund (strong),
125 Marcinkiewicz, 118, 121, 125, 164,
Lebesgue 386,387,402
decomposition theorem, 204 -Zygmund inequality, 387
dominated convergence theorem, -Zygmund strong law of large
100,174 numbers, 125, 138, 263,
for conditional expectations, 393
216 Marginal distribution, 309
integral, 176 Markov
measurable set, 170 chain, 238
measure, 170 inequality, 86, 89, 173
measure space, 170 Martingale, 239
monotone convergence theorem, central limit theorems, 336, 345,
86,90,95,173 346,351
for conditional expectations, convergence theorems, 247, 248,
216 406-408, 411
-Stieltjes measure, 170, 188 differences, 241, 336
Index 485

inequalities, 255, 256, 409, 414, Moment, 105


418,425,427,430,434 convergence, 277, 317,442
moments of stopped generating function, 110,281
(martingales), 250, 253, 431 of randomly stopped sums, 250,
reversed (downward), 241 253,431,432
Wald equation for, 250-253,415 Monotone
Match, 38 class, 6
Maximal inequalities, 255, 256, 368 convergence theorem, 86, 90, 95,
McShane, 209 173
McLeish, 348, 349, 353 convergence theorem for
Mean convergence criterion, 99 conditional expectations, 216
Measurable sequence of sets, 3
cover, 228 set function, 19
function, 14 system 15
rectangle, 9 Monroe, 209
set, 8 Multinomial distribution, 191
see also d -measurable function;
Lebesgue or v-measurable set Nagaev, 366,402
Measure, 19 n-dimensional
complete, 168 Borel measure space, 186
conditional probability, 222 distribution function, 187
convergence in, 172 Negative
counting, 24 binomial distribution 38, 60
extension, 165 part, 15
finite, 23 Neveu, 442
induced, 25 Newman, 61
Lebesgue, (and measure space), Nikodym, 204, 205
170 Non-Lebesgue measurable set, 171
Lebesgue-Stieltjes, (and measure Normal distribution, 31, 294
space), 170 positive, 78, 343
n-dimensional Lebesgue-Stieltjes, Normal random variable, 31
188 v-measurable set, 168
outer, 168 Null event, 20
product, 184 Number of upcrossings, 404-406
restriction of a, 165
a-finite, 165 Optimal stopping rule, 160
signed, 208 Ornstein, 130, 154, 163
space, 19 Ottaviani, 305
see also Infinite (and n-) inequality, 75
dimensional product measure Outer measure, 168
space, 185, 193
Median, 71, 109 p-norm,I04
Minkowski inequality, 110, 183 Panzone, 405, 443
Mixture, 190,286,292,294,345,453 Parameter, 31
Mogyorodi, 341, 353 Periodic, 297
486 Index

Permutable events, 232 discrete, 27


Poincare formula, 33 exponential, 65
Point fictitious, 272
of continuity, 175,271 independent, identically
of increase, 28,271 distributed (i.i.d.), 55
Poisson interchangeable (exchangeable),
distribution, 31, 52,294 191,231-234,238,241
random variable, 31, 65 2 p -, 95
theorem, 32 normal, 31, 65
Polya, 98, 112,299,312 Poisson, 31, 65
Positive symmetric, 72, 74
definite, 308 symmetrized, 197
normal distribution, 78, 343 Real line, 11
part, 15 Rectangle, 9
stable distribution, 343 Recurrent, 97, 98, 130, 163
Probability, 19 Regular conditional distribution,
conditional, 59, 222, 223 225
density function, 27 Renyi, 52, 53, 268, 353
space, 19 Renewal theorem (elementary),
success probability, 56 157-159
Product Restriction, 20, 165
measurable space, 8 Revesz, 137, 164
measure, 184 Riemann
measure space, 184, 185 integral, 178
a-algebra, 8 - Lebesgue lemma, 307
space, 8 -Stieltjes integral, 178-180
Prohorov, 402 Riesz representation theorem, 208
Robbins, 45,53, 135, 138, 160, 164,
Rademacher functions, 201 209,221,286,312,375,384
Radon Rogozin, 137
- Nikodym derivative, 205 Rosenblatt, J., 353
- Nikodym theorem, 204 Rosenblatt, M., 353
Raikov, 312,402,467 Rowwise independence, 138, 393,
-Ottaviani theorem, 308 454,459,460
Random
allocation of balls into cells, 56, Saks,29,112,209,312
57,330-335 Same type, 284
vector, 26 Sample (empirical)
walk, 76, 392-401 distribution, 284
see also Simple random walk space, 19
Random variable, 20 Samuel, 143, 164
absolutely continuous, 27 Samuels, 82
binomial,31 Sapogov, 312
coordinate, 57, 196,234,235 Savage,238,269,433-437
degenerate, 31 Scheffe, 312
Index 487

Schwarz inequality, 105 Stone, 102, 112, 147, 164


Second moment analogue ofWald's Stopping time, 138, 240
equation, 144, 253, 254 {Xn }-time,138
Section Strassen, 403
of a function, 185 Stratton, 83
of a set, 13 Strong law of large numbers
Semi-algebra, 20, 24 (SLLN),124
Set independent random variables,
Borel,ll 42,124-128,356-359,
function, 19 361-366
measurable, 8 interchangeable random
operation, 2 variables, 235
Shohat, 283, 303, 312 martingale differences, 258, 415
Siegmund, 164, 402 pairwise independent random
a-additive (countably additive), 19 variables, 132
a-algebra, 6 V-statistics, 263
generated by, 7, 16 arrays, 138, 393
of permutable events, 232 Studden, 111
a-finite, 18 Subadditive set function, 19
measure, 165 Submartingale, 239
partition, 18 closed,240
Signed measure, 208 convergence theorems, 246, 248,
Simple 406,407
function, 18 Subtractive, 24
random walk, 98 Success probability, 56
Singular Supermartingale, 239
distribution, 270 Support (spectrum), 28, 271
jl-singular, 203 Symmetric
Slutsky, 272, 312 Bernoulli distribution, 294
Snell,209 distribution, 72, 297
Space, 4, 8, 11, 19,95, 111, 168, 185 random variable, 72
Spitzer, 135, 164,402 Symmetrized random variable,
Stable distribution, 468-477 197
Standard System
deviation, 106 A-,15
normal,31 monotone, 15
Stein, E., 402
Stirling, 53 Tail
formula, 45, 49 event, 64
Stochastic function, 64
larger, 183 of a distribution, 49
.<t'p-bounded stochastic sequence, a-algebra
239 Tamarkin, 283, 303, 312
matrix, 238 Taylor, 403
sequence, 138,239 Tchebychev inequality, 40, 106
488 Index

Teicher, 53, 112, 137, 163, 164, 268, Wald, 158, 162, 164
269,312,353,403,443 equation, 143, 144,253, 254,415
Three series theorem, 117, 123 Weak
Tight, 276 compactness (sequential), 282
Titchmarsh, 302, 303, 306, 312 law of large numbers, 124, 128,
Total variation, 208 235,356-359
Triangular distribution, 294 Weierstrass, 121
Truncation, 106, 110, 182,213 approximation theorem, 42
Weiss, I., 353
Vncorrelated, 106 Widder, 178, 209
Vniform distribution, 39, 293 Wiener dominated ergodic theorem,
Uniformly 387,403
absolutely continuous, 208 Wintner, 374, 378, 402
bounded random variables, 116 Wolfowitz, 347,402
integrable (u.i.) random variables,
93,94 Yadrenko, 149
integrable relative to distribution Young, L. C, 112
functions, 276, 277 Young, W. H., inequality, 184,424
V-statistics, 241, 259-268
Central limit theorem, 326 Zero-one law, 97
degenerate, 259 Hewitt-Savage, 238
decomposition of, 261 Kolmogorov,64
Strong law of large numbers, 263 for interchangeable random
Law of the iterated logarithm, 382 variables, 238
see also Borel zero-one criterion
Van Beek, 323, 353 Zolotarev, 353
Variance, 105 Zygmund, 118, 121, 125, 164,386,
Von Mises, 58, 83 387,402,422,443
Springer Texts in Statistics (continued from page ii)

Peters: Counting for Something: Statistical Principles and Personalities


Pfeiffer: Probability for Applications
Pitman: Probability
Robert: The Bayesian Choice: A Decision-Theoretic Motivation
Santner and Duffy: The Statistical Analysis of Discrete Data
Saville and Wood: Statistical Methods: The Geometric Approach
Sen and Srivastava: Regression Analysis: Theory, Methods, and Applications
Whittle: Probability via Expectation, Third Edition
Zacks: Introduction to Reliability Analysis: Probability Models and Statistical
Methods

S-ar putea să vă placă și