

INEQUALITIES: A JOURNEY INTO LINEAR ANALYSIS
Contains a wealth of inequalities used in linear analysis, and explains in
detail how they are used. The book begins with Cauchy's inequality and
ends with Grothendieck's inequality; in between one finds the Loomis–
Whitney inequality, maximal inequalities, inequalities of Hardy and of
Hilbert, hypercontractive and logarithmic Sobolev inequalities, Beckner's
inequality, and many, many more. The inequalities are used to obtain
properties of function spaces, linear operators between them, and of special
classes of operators such as absolutely summing operators.
This textbook complements and fills out standard treatments, providing
many diverse applications: for example, the Lebesgue decomposition
theorem and the Lebesgue density theorem, the Hilbert transform and other
singular integral operators, the martingale convergence theorem, eigenvalue
distributions, Lidskii's trace formula, Mercer's theorem and Littlewood's 4/3
theorem.
It will broaden the knowledge of postgraduate and research students, and
should also appeal to their teachers, and all who work in linear analysis.
D. J. H. Garling is Emeritus Reader in Mathematical Analysis at the
University of Cambridge and a Fellow of St John's College, Cambridge.
INEQUALITIES: A JOURNEY INTO LINEAR ANALYSIS
D. J. H. GARLING
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521876247

© D. J. H. Garling 2007

This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.

First published in print format 2007

ISBN-13 978-0-511-28936-1 eBook (EBL)
ISBN-10 0-511-28936-7 eBook (EBL)
ISBN-13 978-0-521-87624-7 hardback
ISBN-10 0-521-87624-9 hardback
ISBN-13 978-0-521-69973-0 paperback
ISBN-10 0-521-69973-8 paperback
Contents

Introduction page 1
1 Measure and integral 4
1.1 Measure 4
1.2 Measurable functions 7
1.3 Integration 9
1.4 Notes and remarks 12
2 The Cauchy–Schwarz inequality 13
2.1 Cauchy's inequality 13
2.2 Inner-product spaces 14
2.3 The Cauchy–Schwarz inequality 15
2.4 Notes and remarks 17
3 The AM–GM inequality 19
3.1 The AM–GM inequality 19
3.2 Applications 21
3.3 Notes and remarks 23
4 Convexity, and Jensen's inequality 24
4.1 Convex sets and convex functions 24
4.2 Convex functions on an interval 26
4.3 Directional derivatives and sublinear functionals 29
4.4 The Hahn–Banach theorem 31
4.5 Normed spaces, Banach spaces and Hilbert space 34
4.6 The Hahn–Banach theorem for normed spaces 36
4.7 Barycentres and weak integrals 39
4.8 Notes and remarks 40
5 The L^p spaces 45
5.1 L^p spaces, and Minkowski's inequality 45
5.2 The Lebesgue decomposition theorem 47
5.3 The reverse Minkowski inequality 49
5.4 Hölder's inequality 50
5.5 The inequalities of Liapounov and Littlewood 54
5.6 Duality 55
5.7 The Loomis–Whitney inequality 57
5.8 A Sobolev inequality 60
5.9 Schur's theorem and Schur's test 62
5.10 Hilbert's absolute inequality 65
5.11 Notes and remarks 67
6 Banach function spaces 70
6.1 Banach function spaces 70
6.2 Function space duality 72
6.3 Orlicz spaces 73
6.4 Notes and remarks 76
7 Rearrangements 78
7.1 Decreasing rearrangements 78
7.2 Rearrangement-invariant Banach function spaces 80
7.3 Muirhead's maximal function 81
7.4 Majorization 84
7.5 Calderón's interpolation theorem and its converse 88
7.6 Symmetric Banach sequence spaces 91
7.7 The method of transference 93
7.8 Finite doubly stochastic matrices 97
7.9 Schur convexity 98
7.10 Notes and remarks 100
8 Maximal inequalities 103
8.1 The Hardy–Riesz inequality (1 < p < ∞) 103
8.2 The Hardy–Riesz inequality (p = 1) 105
8.3 Related inequalities 106
8.4 Strong type and weak type 108
8.5 Riesz weak type 111
8.6 Hardy, Littlewood, and a batsman's averages 112
8.7 Riesz's sunrise lemma 114
8.8 Differentiation almost everywhere 117
8.9 Maximal operators in higher dimensions 118
8.10 The Lebesgue density theorem 121
8.11 Convolution kernels 121
8.12 Hedberg's inequality 125
8.13 Martingales 127
8.14 Doob's inequality 130
8.15 The martingale convergence theorem 130
8.16 Notes and remarks 133
9 Complex interpolation 135
9.1 Hadamard's three lines inequality 135
9.2 Compatible couples and intermediate spaces 136
9.3 The Riesz–Thorin interpolation theorem 138
9.4 Young's inequality 140
9.5 The Hausdorff–Young inequality 141
9.6 Fourier type 143
9.7 The generalized Clarkson inequalities 145
9.8 Uniform convexity 147
9.9 Notes and remarks 150
10 Real interpolation 154
10.1 The Marcinkiewicz interpolation theorem: I 154
10.2 Lorentz spaces 156
10.3 Hardy's inequality 158
10.4 The scale of Lorentz spaces 159
10.5 The Marcinkiewicz interpolation theorem: II 162
10.6 Notes and remarks 165
11 The Hilbert transform, and Hilbert's inequalities 167
11.1 The conjugate Poisson kernel 167
11.2 The Hilbert transform on L^2(R) 168
11.3 The Hilbert transform on L^p(R) for 1 < p < ∞ 170
11.4 Hilbert's inequality for sequences 174
11.5 The Hilbert transform on T 175
11.6 Multipliers 179
11.7 Singular integral operators 180
11.8 Singular integral operators on L^p(R^d) for 1 ≤ p < ∞ 183
11.9 Notes and remarks 185
12 Khintchine's inequality 187
12.1 The contraction principle 187
12.2 The reflection principle, and Lévy's inequalities 189
12.3 Khintchine's inequality 192
12.4 The law of the iterated logarithm 194
12.5 Strongly embedded subspaces 196
12.6 Stable random variables 198
12.7 Sub-Gaussian random variables 199
12.8 Kahane's theorem and Kahane's inequality 201
12.9 Notes and remarks 204
13 Hypercontractive and logarithmic Sobolev inequalities 206
13.1 Bonami's inequality 206
13.2 Kahane's inequality revisited 210
13.3 The theorem of Latała and Oleszkiewicz 211
13.4 The logarithmic Sobolev inequality on D_2^d 213
13.5 Gaussian measure and the Hermite polynomials 216
13.6 The central limit theorem 219
13.7 The Gaussian hypercontractive inequality 221
13.8 Correlated Gaussian random variables 223
13.9 The Gaussian logarithmic Sobolev inequality 225
13.10 The logarithmic Sobolev inequality in higher dimensions 227
13.11 Beckner's inequality 229
13.12 The Babenko–Beckner inequality 230
13.13 Notes and remarks 232
14 Hadamard's inequality 233
14.1 Hadamard's inequality 233
14.2 Hadamard numbers 234
14.3 Error-correcting codes 237
14.4 Note and remark 238
15 Hilbert space operator inequalities 239
15.1 Jordan normal form 239
15.2 Riesz operators 240
15.3 Related operators 241
15.4 Compact operators 242
15.5 Positive compact operators 243
15.6 Compact operators between Hilbert spaces 245
15.7 Singular numbers, and the Rayleigh–Ritz minimax formula 246
15.8 Weyl's inequality and Horn's inequality 247
15.9 Ky Fan's inequality 250
15.10 Operator ideals 251
15.11 The Hilbert–Schmidt class 253
15.12 The trace class 256
15.13 Lidskii's trace formula 257
15.14 Operator ideal duality 260
15.15 Notes and remarks 261
16 Summing operators 263
16.1 Unconditional convergence 263
16.2 Absolutely summing operators 265
16.3 (p, q)-summing operators 266
16.4 Examples of p-summing operators 269
16.5 (p, 2)-summing operators between Hilbert spaces 271
16.6 Positive operators on L^1 273
16.7 Mercer's theorem 274
16.8 p-summing operators between Hilbert spaces (1 ≤ p ≤ 2) 276
16.9 Pietsch's domination theorem 277
16.10 Pietsch's factorization theorem 279
16.11 p-summing operators between Hilbert spaces (2 ≤ p ≤ ∞) 281
16.12 The Dvoretzky–Rogers theorem 282
16.13 Operators that factor through a Hilbert space 284
16.14 Notes and remarks 287
17 Approximation numbers and eigenvalues 289
17.1 The approximation, Gelfand and Weyl numbers 289
17.2 Subadditive and submultiplicative properties 291
17.3 Pietsch's inequality 294
17.4 Eigenvalues of p-summing and (p, 2)-summing endomorphisms 296
17.5 Notes and remarks 301
18 Grothendieck's inequality, type and cotype 302
18.1 Littlewood's 4/3 inequality 302
18.2 Grothendieck's inequality 304
18.3 Grothendieck's theorem 306
18.4 Another proof, using Paley's inequality 307
18.5 The little Grothendieck theorem 310
18.6 Type and cotype 312
18.7 Gaussian type and cotype 314
18.8 Type and cotype of L^p spaces 316
18.9 The little Grothendieck theorem revisited 318
18.10 More on cotype 320
18.11 Notes and remarks 323
References 325
Index of inequalities 331
Index 332
Introduction
Inequalities lie at the heart of a great deal of mathematics. G. H. Hardy
reported Harald Bohr as saying 'all analysts spend half their time hunting
through the literature for inequalities which they want to use but cannot
prove'. Inequalities provide control, to enable results to be proved. They
also impose constraints; for example, Gromov's theorem on the symplectic
embedding of a sphere in a cylinder establishes an inequality that says that
the radius of the cylinder cannot be too small. Similar inequalities occur
elsewhere, for example in theoretical physics, where the uncertainty principle
(which is an inequality) and Bell's inequality impose constraints, and, more
classically, in thermodynamics, where the second law provides a fundamental
inequality concerning entropy.
Thus there are very many important inequalities. This book is not
intended to be a compendium of these; instead, it provides an introduc-
tion to a selection of inequalities, not including any of those mentioned
above. The inequalities that we consider have a common theme; they relate
to problems in real analysis, and more particularly to problems in linear
analysis. Incidentally, they include many of the inequalities considered in
the fascinating and ground-breaking book Inequalities, by Hardy, Littlewood
and Pólya [HaLP 52], originally published in 1934.
The first intention of this book, then, is to establish fundamental
inequalities in this area. But more importantly, its purpose is to put them in
context, and to show how useful they are. Although the book is very largely
self-contained, it should therefore principally be of interest to analysts, and
to those who use analysis seriously.
The book requires little background knowledge, but some such knowledge
is very desirable. For a great many inequalities, we begin by considering
sums of a finite number of terms, and the arguments that are used here lie
at the heart of the matter. But to be of real use, the results must be extended
to infinite sequences and infinite sums, and also to functions and integrals.
In order to be really useful, we need a theory of measure and integration
which includes suitable limit theorems. In a preliminary chapter, we give a
brief account of what we need to know; the details will not be needed, at
least in the early chapters, but a familiarity with the ideas and results of
the theory is a great advantage.
Secondly, it turns out that the sequences and functions that we consider
are members of an appropriate vector space, and that their size, which
is involved in the inequalities that we prove, is described by a norm. We
establish basic properties of normed spaces in Chapter 4. Normed spaces
are the subject of linear analysis, and, although our account is largely self-
contained, it is undoubtedly helpful to have some familiarity with the ideas
and results of this subject (such as are developed in books such as Linear
analysis by Béla Bollobás [Bol 90] or Introduction to functional analysis by
Taylor and Lay [TaL 80]). In many ways, this book provides a parallel text
in linear analysis.
Looked at from this point of view, the book falls naturally into two unequal
parts. In Chapters 2 to 13, the main concern is to establish inequalities
between sequences and functions lying in appropriate normed spaces. The
inequalities frequently reveal themselves in terms of the continuity of certain
linear operators, or the size of certain sublinear operators. In linear analysis,
however, there is interest in the structure and properties of linear operators
themselves, and in particular in their spectral properties, and in the last four
chapters we establish some fundamental inequalities for linear operators.
This book journeys into the foothills of linear analysis, and provides a
view of high peaks ahead. Important fundamental results are established,
but I hope that the reader will find him- or herself hungry for more. There
are brief Notes and Remarks at the end of each chapter, which include
suggestions for further reading: a partial list, consisting of books and papers
that I have enjoyed reading. A more comprehensive guide is given in the
monumental Handbook of the geometry of Banach spaces [JoL 01,03] which
gives an impressive overview of much of modern linear analysis.
The Notes and Remarks also contain a collection of exercises, of a varied
nature: some are five-finger exercises, but some establish results that are
needed later. Do them!
Linear analysis lies at the heart of many areas of mathematics, includ-
ing, for example, partial differential equations, harmonic analysis, complex
analysis and probability theory. Each of them is touched on, but only to a
small extent; for example, in Chapter 9 we use results from complex analysis
to prove the Riesz–Thorin interpolation theorem, but otherwise we seldom
use the powerful tools of complex analysis. Each of these areas has its own
collection of important and fascinating inequalities, but in each case it would
be too big a task to do them justice here.
I have worked hard to remove errors, but undoubtedly some remain.
Corrections and further comments can be found on a web-page on my per-
sonal home page at www.dpmms.cam.ac.uk
1 Measure and integral

1.1 Measure
Many of the inequalities that we shall establish originally concern finite
sequences and finite sums. We then extend them to infinite sequences and
infinite sums, and to functions and integrals, and it is these more general
results that are useful in applications.
Although the applications can be useful in simple settings (concerning the
Riemann integral of a continuous function, for example), the extensions are
usually made by a limiting process. For this reason we need to work in the
more general setting of measure theory, where appropriate limit theorems
hold. We give a brief account of what we need to know; the details of the
theory will not be needed, although it is hoped that the results that we
eventually establish will encourage the reader to master them. If you are
not familiar with measure theory, read through this chapter quickly, and
then come back to it when you find that the need arises.
Suppose that Ω is a set. A measure ascribes a size to some of the subsets
of Ω. It turns out that we usually cannot do this in a sensible way for all
the subsets of Ω, and have to restrict attention to the measurable subsets of
Ω. These are the 'good' subsets of Ω, and include all the sets that we meet
in practice. The collection of measurable sets has a rich enough structure
that we can carry out countable limiting operations.
A σ-field Σ is a collection of subsets of a set Ω which satisfies:

(i) if (A_i) is a sequence in Σ then ∪_{i=1}^∞ A_i ∈ Σ, and
(ii) if A ∈ Σ then the complement Ω \ A ∈ Σ.

Thus

(iii) if (A_i) is a sequence in Σ then ∩_{i=1}^∞ A_i ∈ Σ.

The sets in Σ are called Σ-measurable sets; if it is clear what Σ is, they
are simply called the measurable sets.
Here are two constructions that we shall need, which illustrate how the
conditions are used. If (A_i) is a sequence in Σ then we define the upper
limit lim sup A_i and the lower limit lim inf A_i:

lim sup A_i = ∩_{i=1}^∞ (∪_{j=i}^∞ A_j)  and  lim inf A_i = ∪_{i=1}^∞ (∩_{j=i}^∞ A_j).

Then lim sup A_i and lim inf A_i are in Σ. You should verify that
x ∈ lim sup A_i if and only if x ∈ A_i for infinitely many indices i, and that
x ∈ lim inf A_i if and only if there exists an index i_0 such that x ∈ A_i for
all i ≥ i_0.
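A small computational sketch can make these characterizations concrete. The Python fragment below is illustrative only: the sequence (A_i) is hypothetical (chosen so that 0 lies in every A_i, 1 in infinitely many, and 2 in only finitely many), and a finite horizon stands in for the infinite unions and intersections.

```python
# Hypothetical sequence of sets: 0 always, 1 at even indices (infinitely
# often), 2 only for i < 5 (finitely often).
def A(i):
    s = {0}
    if i % 2 == 0:
        s.add(1)
    if i < 5:
        s.add(2)
    return s

N = 200            # truncation horizon standing in for infinity
I = range(1, 100)  # indices i at which the tails are inspected

# lim sup A_i = intersection over i of the tail unions U_{j>=i} A_j
limsup = set.intersection(*[set.union(*[A(j) for j in range(i, N)])
                            for i in I])
# lim inf A_i = union over i of the tail intersections
liminf = set.union(*[set.intersection(*[A(j) for j in range(i, N)])
                     for i in I])

print(limsup)  # {0, 1}: the points lying in infinitely many A_i
print(liminf)  # {0}: the points lying in all A_i from some index on
```

As the characterizations predict, the upper limit is {0, 1} and the lower limit is {0}.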
If Ω is the set N of natural numbers, or the set Z of integers, or indeed
any countable set, then we take Σ to be the collection P(Ω) of all subsets of
Ω. Otherwise, Σ will be a proper subset of P(Ω). For example, if Ω = R^d
(where R denotes the set of real numbers), we consider the collection of Borel
sets: the sets in the smallest σ-field that contains all the open sets. This
includes all the sets that we meet in practice, such as the closed sets, the G_δ
sets (countable intersections of open sets), the F_σ sets (countable unions of
closed sets), and so on. The Borel σ-field has the fundamental disadvantage
that we cannot give a straightforward definition of what a Borel set looks
like; this has the consequence that proofs must be indirect, and this gives
measure theory its own particular flavour.
Similarly, if (X, d) is a metric space, then the Borel sets of X are the sets in
the smallest σ-field that contains all the open sets. [Complications can arise
unless (X, d) is separable (that is, there is a countable set which is dense in
X), and so we shall generally restrict attention to separable metric spaces.]
We now give a size (non-negative, but possibly infinite or zero) to each of
the sets in Σ. A measure on a σ-field Σ is a mapping μ from Σ into [0, ∞]
satisfying

(i) μ(∅) = 0, and
(ii) if (A_i) is a sequence of disjoint sets in Σ then μ(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ μ(A_i):

μ is countably additive.
The most important example that we shall consider is the following. There
exists a measure λ (Borel measure) on the Borel sets of R^d with the property
that if A is the rectangular parallelepiped ∏_{i=1}^d (a_i, b_i) then λ(A) is the
product ∏_{i=1}^d (b_i − a_i) of the lengths of its sides; thus λ gives familiar
geometric objects their natural measure. As a second example, if Ω is a
countable set, we can define #(A), or |A|, to be the number of points, finite
or infinite, in A; # is counting measure. These two examples are radically
different: for counting measure, the one-point sets {x} are atoms; each has
positive measure, and any subset of it has either the same measure or zero
measure. Borel measure on R^d is atom-free; no subset is an atom. This is
equivalent to requiring that if A is a set of non-zero measure, and if
0 < β < μ(A), then there is a measurable subset B of A with μ(B) = β.
Countable additivity implies the following important continuity
properties:

(iii) if (A_i) is an increasing sequence in Σ then

μ(∪_{i=1}^∞ A_i) = lim_{i→∞} μ(A_i).

[Here and elsewhere, we use 'increasing' in the weak sense: if i < j then
A_i ⊆ A_j. If A_i ⊂ A_j properly for i < j, then we say that (A_i) is strictly
increasing. Similarly for 'decreasing'.]
(iv) if (A_i) is a decreasing sequence in Σ and μ(A_1) < ∞ then

μ(∩_{i=1}^∞ A_i) = lim_{i→∞} μ(A_i).

The finiteness condition here is necessary and important; for example,
if A_i = [i, ∞) ⊆ R, then λ(A_i) = ∞ for all i, but ∩_{i=1}^∞ A_i = ∅, so that
λ(∩_{i=1}^∞ A_i) = 0.
We also have the following consequences:

(v) if A ⊆ B then μ(A) ≤ μ(B);
(vi) if (A_i) is any sequence in Σ then μ(∪_{i=1}^∞ A_i) ≤ ∑_{i=1}^∞ μ(A_i).
There are many circumstances where μ(Ω) < ∞, so that μ only takes
finite values, and many where μ(Ω) = 1. In this latter case, we can consider
μ as a probability, and frequently denote it by P. We then use probabilistic
language, and call the elements of Σ 'events'.
A measure space is then a triple (Ω, Σ, μ), where Ω is a set, Σ is a σ-field of
subsets of Ω (the measurable sets) and μ is a measure defined on Σ. In order
to avoid tedious complications, we shall restrict our attention to σ-finite
measure spaces: we shall suppose that there is an increasing sequence (C_k)
of measurable sets of finite measure whose union is Ω. For example, if μ is
Borel measure then we can take C_k = {x : |x| ≤ k}.
Here is a useful result, which we shall need from time to time.

Proposition 1.1.1 (The first Borel–Cantelli lemma) If (A_i) is a
sequence of measurable sets and ∑_{i=1}^∞ μ(A_i) < ∞ then μ(lim sup A_i) = 0.

Proof For each i, μ(lim sup A_i) ≤ μ(∪_{j=i}^∞ A_j), and
μ(∪_{j=i}^∞ A_j) ≤ ∑_{j=i}^∞ μ(A_j) → 0 as i → ∞.
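The lemma can be illustrated with Borel measure on R and the sets A_i = [0, 2^-i) (my choice of example, not the book's). Here ∑ μ(A_i) = 1 < ∞, and the tail unions ∪_{j≥i} A_j = [0, 2^-i) have measure 2^-i → 0, which squeezes μ(lim sup A_i) down to 0 (indeed lim sup A_i = {0}). A Python sketch:

```python
# sum of the measures mu(A_i) = 2^-i: the hypothesis of the lemma holds
total = sum(2.0 ** (-i) for i in range(1, 60))
print(total)  # ~1.0

def mu_tail_union(i):
    # U_{j>=i} [0, 2^-j) = [0, 2^-i), an interval of measure 2^-i
    return 2.0 ** (-i)

# mu(lim sup A_i) <= mu(U_{j>=i} A_j) for every i, and the bound -> 0
tails = [mu_tail_union(i) for i in range(1, 30)]
print(tails[:4])  # [0.5, 0.25, 0.125, 0.0625]
```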
If μ(A) = 0, A is called a null set. We shall frequently consider properties
which hold except on a null set: if so, we say that the property holds almost
everywhere, or, in a probabilistic setting, almost surely.
1.2 Measurable functions

We next consider functions defined on a measure space (Ω, Σ, μ). A real-
valued function f is Σ-measurable, or more simply measurable, if for each
real α the set (f > α) = {x : f(x) > α} is in Σ. A complex-valued function
is measurable if its real and imaginary parts are. (When P is a probability
measure and we are thinking probabilistically, a measurable function is called
a random variable.) In either case, this is equivalent to the set (f ∈ U) =
{x : f(x) ∈ U} being in Σ for each open set U. Thus if Σ is the Borel σ-field
of a metric space, then the continuous functions are measurable. If f and g
are measurable then so are f + g and fg; the measurable functions form an
algebra L = L(Ω, Σ, μ). If f is measurable then so is |f|. Thus in the real
case L is a lattice: if f and g are measurable, then so are f ∨ g = max(f, g)
and f ∧ g = min(f, g).
We can also consider the Borel σ-field of a compact Hausdorff space (X, τ):
but it is frequently more convenient to work with the Baire σ-field: this is
the smallest σ-field containing the closed G_δ sets, and is the smallest σ-field
for which all the continuous real-valued functions are measurable. When
(X, τ) is metrizable, the Borel σ-field and the Baire σ-field are the same.
A measurable function f is a null function if μ(f ≠ 0) = 0. The set N of
null functions is an ideal in L. In practice, we identify functions which are
equal almost everywhere: that is, we consider elements of the quotient space
M = L/N. Although these elements are equivalence classes of functions,
we shall tacitly work with representatives, and treat the elements of M as
if they were functions.
What about the convergence of measurable functions? A fundamental
problem that we shall frequently consider is this: when does a sequence of
measurable functions converge almost everywhere? The first Borel–Cantelli
lemma provides us with the following useful criterion.
Proposition 1.2.1 Suppose that (f_n) is a decreasing sequence of non-
negative measurable functions. Then f_n → 0 almost everywhere if and only
if μ((f_n > ε) ∩ C_k) → 0 as n → ∞ for each k and each ε > 0.

Proof Suppose that f_n → 0 almost everywhere, and that ε > 0.
Then ((f_n > ε) ∩ C_k) is a decreasing sequence of sets of finite measure,
and if x ∈ ∩_n ((f_n > ε) ∩ C_k) then (f_n(x)) does not converge to 0, so that
this intersection is a null set. Thus, by condition (iv) above,
μ((f_n > ε) ∩ C_k) → 0 as n → ∞.
For the converse, we use the first Borel–Cantelli lemma. Suppose that the
condition is satisfied. For each n there exists N_n such that
μ((f_{N_n} > 1/n) ∩ C_n) < 1/2^n. Then since
∑_{n=1}^∞ μ((f_{N_n} > 1/n) ∩ C_n) < ∞,
μ(lim sup((f_{N_n} > 1/n) ∩ C_n)) = 0. But if x ∉ lim sup((f_{N_n} > 1/n) ∩ C_n)
then x lies in only finitely many of these sets; since x ∈ C_n for all large n,
it follows that f_{N_n}(x) ≤ 1/n for all large n, and so, since (f_n) is
decreasing, f_n(x) → 0.
Corollary 1.2.1 A sequence (f_n) of measurable functions converges almost
everywhere if and only if

μ((sup_{m,n≥N} |f_m − f_n| > ε) ∩ C_k) → 0 as N → ∞

for each k and each ε > 0.
It is a straightforward but worthwhile exercise to show that if f(x) =
lim_{n→∞} f_n(x) when the limit exists, and f(x) = 0 otherwise, then f is
measurable.
Convergence almost everywhere cannot in general be characterized in
terms of a topology. There is however a closely related form of conver-
gence which can. We say that f_n → f locally in measure (or in probability)
if μ((|f_n − f| > ε) ∩ C_k) → 0 as n → ∞ for each k and each ε > 0; similarly
we say that (f_n) is locally Cauchy in measure if μ((|f_m − f_n| > ε) ∩ C_k) → 0
as m, n → ∞ for each k and each ε > 0. The preceding proposition, and an-
other use of the first Borel–Cantelli lemma, establish the following relations
between these ideas.
Proposition 1.2.2 (i) If (f_n) converges almost everywhere to f, then (f_n)
converges to f locally in measure.
(ii) If (f_n) is locally Cauchy in measure then there is a subsequence which
converges almost everywhere to a measurable function f, and f_n → f locally
in measure.
Proof (i) This follows directly from Corollary 1.2.1.
(ii) For each k there exists N_k such that μ((|f_m − f_n| > 1/2^k) ∩ C_k) < 1/2^k
for m, n > N_k. We can suppose that the sequence (N_k) is strictly increasing.
Let g_k = f_{N_k}. Then μ((|g_{k+1} − g_k| > 1/2^k) ∩ C_k) < 1/2^k. Thus, by
the first Borel–Cantelli lemma, μ(lim sup((|g_{k+1} − g_k| > 1/2^k) ∩ C_k)) = 0.
But lim sup((|g_{k+1} − g_k| > 1/2^k) ∩ C_k) = lim sup(|g_{k+1} − g_k| > 1/2^k).
If x ∉ lim sup(|g_{k+1} − g_k| > 1/2^k) then ∑_{k=1}^∞ |g_{k+1}(x) − g_k(x)| < ∞,
so that (g_k(x)) is a Cauchy sequence, and is therefore convergent.
Let f(x) = lim g_k(x), when this exists, and let f(x) = 0 otherwise.
Then (g_k) converges to f almost everywhere, and locally in measure. Since
(|f_n − f| > ε) ⊆ (|f_n − g_k| > ε/2) ∪ (|g_k − f| > ε/2), it follows easily that
f_n → f locally in measure.
In fact, there is a complete metric on M under which the Cauchy sequences
are the sequences which are locally Cauchy in measure, and the convergent
sequences are the sequences which are locally convergent in measure. This
completeness result is at the heart of very many completeness results for
spaces of functions.
If A is a measurable set, its indicator function I_A, defined by setting
I_A(x) = 1 if x ∈ A and I_A(x) = 0 otherwise, is measurable. A simple
function is a measurable function which takes only finitely many values,
and which vanishes outside a set of finite measure: it can be written as
∑_{i=1}^n α_i I_{A_i}, where A_1, . . . , A_n are measurable sets of finite measure
(which we may suppose to be disjoint).
Proposition 1.2.3 A non-negative measurable function f is the pointwise
limit of an increasing sequence of simple functions.

Proof Let A_{j,n} = (f > j/2^n), and let f_n = (1/2^n) ∑_{j=1}^{4^n} I_{A_{j,n} ∩ C_n}.
Then (f_n) is an increasing sequence of simple functions, which converges
pointwise to f.
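The construction in the proof is easy to try numerically. The sketch below takes Ω = R with C_n = [−n, n] and the hypothetical function f(x) = x², and evaluates the simple functions f_n at a point; the values increase towards f(x), as the proposition asserts.

```python
def f(x):
    return x * x  # an illustrative non-negative measurable function

def f_n(x, n):
    # f_n = (1/2^n) * sum_{j=1}^{4^n} I_{(f > j/2^n) and C_n}, C_n = [-n, n]
    if not (-n <= x <= n):
        return 0.0
    count = sum(1 for j in range(1, 4 ** n + 1) if f(x) > j / 2 ** n)
    return count / 2 ** n

x = 1.3
vals = [f_n(x, n) for n in range(1, 9)]
print(vals)  # increasing, approaching f(1.3) = 1.69
assert all(a <= b for a, b in zip(vals, vals[1:]))
```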
This result is extremely important; we shall frequently establish inequal-
ities for simple functions, using arguments that only involve finite sums,
and then extend them to a larger class of functions by a suitable limit-
ing argument. This is the case when we consider integration, to which we
now turn.
1.3 Integration

Suppose first that f = ∑_{i=1}^n α_i I_{A_i} is a non-negative simple function. It is
then natural to define the integral as ∑_{i=1}^n α_i μ(A_i). It is easy but tedious
to check that this is independent of the representation of f. Next suppose
that f is a non-negative measurable function. We then define

∫_Ω f dμ = sup {∫ g dμ : g simple, 0 ≤ g ≤ f}.

A word about notation: we write ∫_Ω f dμ or ∫ f dμ for brevity, and
∫_Ω f(x) dμ(x) if we want to bring attention to the variable (for example, when
f is a function of more than one variable). When integrating with respect to
Borel measure on R^d, we shall frequently write ∫_{R^d} f(x) dx, and use familiar
conventions such as ∫_a^b f(x) dx. When P is a probability measure, we write
E(f) for ∫ f dP, and call E(f) the expectation of f.
We now have the following fundamental continuity result:

Proposition 1.3.1 (The monotone convergence theorem) If (f_n)
is an increasing sequence of non-negative measurable functions which con-
verges pointwise to f, then (∫ f_n dμ) is an increasing sequence and

∫ f dμ = lim_{n→∞} ∫ f_n dμ.
Corollary 1.3.1 (Fatou's lemma) If (f_n) is a sequence of non-negative
measurable functions then ∫ (lim inf f_n) dμ ≤ lim inf ∫ f_n dμ. In particular,
if f_n converges almost everywhere to f then ∫ f dμ ≤ lim inf ∫ f_n dμ.
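The inequality in Fatou's lemma can be strict. A standard illustrative example (not taken from the text): on [0, 1] with Borel measure, let f_n = n·I_{(0,1/n)}. Then f_n → 0 pointwise on (0, 1], so ∫ (lim inf f_n) dμ = 0, while every ∫ f_n dμ = 1. A Python sketch:

```python
from fractions import Fraction

def integral_fn(n):
    # integral of n * I_{(0,1/n)}: height n times mu((0,1/n)) = 1/n,
    # computed exactly with rational arithmetic
    return n * Fraction(1, n)

liminf_of_integrals = min(integral_fn(n) for n in range(1, 100))
print(liminf_of_integrals)  # 1

def f_n_at(x, n):
    return n if 0 < x < 1.0 / n else 0

# at any fixed x in (0,1], eventually 1/n < x and f_n(x) = 0
print([f_n_at(0.3, n) for n in (1, 2, 3, 10)])  # [1, 2, 3, 0]
```

So the left-hand side of Fatou's inequality is 0 here, while the right-hand side is 1.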
We now turn to functions which are not necessarily non-negative. A
measurable function f is integrable if ∫ f⁺ dμ < ∞ and ∫ f⁻ dμ < ∞, and
in this case we set ∫ f dμ = ∫ f⁺ dμ − ∫ f⁻ dμ. Clearly f is integrable if
and only if ∫ |f| dμ < ∞, and then |∫ f dμ| ≤ ∫ |f| dμ. Thus the integral
is an absolute integral; fortuitous cancellation is not allowed, so that for
example the function sin x/x is not integrable on R. Incidentally, integration
with respect to Borel measure extends proper Riemann integration: if f is
Riemann integrable on [a, b] then f is equal almost everywhere to a Borel
measurable and integrable function, and the Riemann integral and the Borel
integral are equal.
The next result is very important.

Proposition 1.3.2 (The dominated convergence theorem) If (f_n) is
a sequence of measurable functions which converges pointwise to f, and if
there is a measurable non-negative function g with ∫ g dμ < ∞ such that
|f_n| ≤ g for all n, then ∫ f_n dμ → ∫ f dμ as n → ∞.

This is a precursor of results which will come later; provided we have
some control (in this case provided by the function g) then we have a good
convergence result. Compare this with Fatou's lemma, where we have no
controlling function, and a weaker conclusion.
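A small numerical illustration (my example, not the book's): f_n(x) = x^n on [0, 1] is dominated by the integrable function g = 1 and converges to 0 almost everywhere (everywhere except x = 1), so the theorem gives ∫ f_n dμ → 0, in agreement with the exact values 1/(n + 1).

```python
def riemann(n, N=100000):
    # midpoint Riemann sum approximating the integral of x^n over [0, 1]
    return sum(((k + 0.5) / N) ** n for k in range(N)) / N

exact = [1.0 / (n + 1) for n in (1, 10, 100, 1000)]
print(exact)  # [0.5, 0.0909..., 0.0099..., 0.000999...] -> 0

# the quadrature agrees with the exact value for, say, n = 10
assert abs(riemann(10) - 1.0 / 11) < 1e-6
```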
Two integrable functions f and g are equal almost everywhere if and only
if ∫ |f − g| dμ = 0, so we again identify integrable functions which are equal
almost everywhere. We denote the resulting space by L^1 = L^1(Ω, Σ, μ); as
we shall see in Chapter 4, it is a vector space under the usual operations.
Finally, we consider repeated integrals. If (X, Σ, μ) and (Y, T, ν) are mea-
sure spaces, we can consider the σ-field Σ ⊗ T, which is the smallest σ-field
containing A × B for all A ∈ Σ, B ∈ T, and can construct the product mea-
sure μ × ν on Σ ⊗ T, with the property that (μ × ν)(A × B) = μ(A)ν(B).
Then the fundamental result, usually referred to as Fubini's theorem, is that
everything works very well if f ≥ 0 or if f ∈ L^1(X × Y):

∫_{X×Y} f d(μ × ν) = ∫_X (∫_Y f dν) dμ = ∫_Y (∫_X f dμ) dν.

In fact the full statement is more complicated than this, as we need to discuss
measurability, but these matters need not concern us here.
This enables us to interpret the integral as 'the area under the curve'.
Suppose that f is a non-negative measurable function on (Ω, Σ, μ). Let
A_f = {(ω, x) : 0 ≤ x < f(ω)} ⊆ Ω × R⁺. Then

(μ × λ)(A_f) = ∫_Ω (∫_{R⁺} I_{A_f} dλ) dμ
             = ∫_Ω (∫_0^{f(ω)} dx) dμ(ω) = ∫_Ω f dμ.

The same argument works for the set S_f = {(ω, x) : 0 ≤ x ≤ f(ω)}.
This gives us another way to approach the integral. Suppose that f is a
non-negative measurable function. Its distribution function λ_f is defined as
λ_f(t) = μ(f > t), for t ≥ 0.

Proposition 1.3.3 The distribution function λ_f is a decreasing right-
continuous function on (0, ∞), taking values in [0, ∞]. Suppose that (f_n)
is an increasing sequence of non-negative functions, which converges point-
wise to f ∈ M. Then λ_{f_n}(u) → λ_f(u) for each 0 < u < ∞.
Proof Since (|f| > u) ⊆ (|f| > v) if u > v, and since (|f| > u_n) ↑ (|f| > v)
if u_n ↓ v, it follows that λ_f is a decreasing right-continuous function on
(0, ∞).
Since (f_n > u) ↑ (f > u), λ_{f_n}(u) → λ_f(u) for each 0 < u < ∞.
Proposition 1.3.4 Suppose that f is a non-negative measurable function
on (Ω, Σ, μ), that φ is a non-negative measurable function on [0, ∞), and
that Φ(t) = ∫_0^t φ(s) ds. Then

∫_Ω Φ(f) dμ = ∫_0^∞ φ(t) λ_f(t) dt.
Proof We use Fubini's theorem. Let A_f = {(ω, t) : 0 ≤ t < f(ω)} ⊆ Ω × R⁺.
Then

∫_Ω Φ(f) dμ = ∫_Ω (∫_0^{f(ω)} φ(t) dt) dμ(ω)
            = ∫_{Ω×R⁺} I_{A_f}(ω, t) φ(t) d(μ(ω) × λ(t))
            = ∫_0^∞ (∫_Ω I_{A_f}(ω, t) dμ(ω)) φ(t) dt
            = ∫_0^∞ φ(t) λ_f(t) dt.
Taking φ(t) = 1, we obtain the following.

Corollary 1.3.2 Suppose that f is a non-negative measurable function on
(Ω, Σ, μ). Then

∫_Ω f dμ = ∫_0^∞ λ_f(t) dt.

Since λ_f is a decreasing function, the integral on the right-hand side of this
equation can be considered as an improper Riemann integral. Thus the equa-
tion can be taken as the definition of ∫_Ω f dμ. This provides an interesting
alternative approach to the integral.
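On a finite measure space the corollary can be checked exactly. The Python sketch below uses counting measure on a five-point set and hypothetical values of f; λ_f is then a step function, constant between consecutive values of f, so its improper Riemann integral can be computed from the jumps.

```python
f = [3, 1, 4, 1, 5]  # hypothetical values of f on a 5-point space

lhs = sum(f)  # integral of f with respect to counting measure

def lam(t):
    # distribution function lambda_f(t) = #{omega : f(omega) > t}
    return sum(1 for v in f if v > t)

# integrate the step function lambda_f piecewise over [0, max(f)]
pts = sorted(set([0] + f))
rhs = sum((b - a) * lam(a) for a, b in zip(pts, pts[1:]))

print(lhs, rhs)  # 14 14
assert lhs == rhs
```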
1.4 Notes and remarks

This brief account is adequate for most of our needs. We shall introduce fur-
ther ideas when we need them. For example, we shall consider vector-valued
functions in Chapter 4. We shall also prove further measure-theoretic
results, such as the Lebesgue decomposition theorem (Theorem 5.2.1) and a
theorem on the differentiability of integrals (Theorem 8.8.1) in due course,
as applications of the theory that we shall develop.
There are many excellent textbooks which give an account of measure
theory; among them let us mention [Bar 95], [Bil 95], [Dud 02], [Hal 50],
[Rud 79] and [Wil 91]. Note that a large number of these include probability
theory as well. This is very natural, since in the 1920s Kolmogorov explained
how measure theory can provide a firm foundation for probability theory.
Probability theory is an essential tool for analysis, and we shall use ideas
from probability in the later chapters.
2
The Cauchy–Schwarz inequality
2.1 Cauchy's inequality
In 1821, Volume I of Cauchy's Cours d'analyse de l'École Royale Polytechnique
[Cau 21] was published, putting his course into writing 'for the greatest
utility of the students'. At the end there were nine notes, the second of
which was about the notion of inequality. In this note, Cauchy proved the
following.

Theorem 2.1.1 (Cauchy's inequality) If $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ are
real numbers, then
$$\sum_{i=1}^n a_i b_i \le \left(\sum_{i=1}^n a_i^2\right)^{1/2} \left(\sum_{i=1}^n b_i^2\right)^{1/2}.$$
Equality holds if and only if $a_i b_j = a_j b_i$ for $1 \le i, j \le n$.
Proof Cauchy used Lagrange's identity:
$$\left(\sum_{i=1}^n a_i b_i\right)^2 + \sum_{(i,j)\colon i<j} (a_i b_j - a_j b_i)^2 = \left(\sum_{i=1}^n a_i^2\right)\left(\sum_{i=1}^n b_i^2\right).$$
This clearly establishes the inequality, and also shows that equality holds if
and only if $a_i b_j = a_j b_i$, for all $i, j$.

Cauchy then used this to give a new proof of the Arithmetic Mean–Geometric
Mean inequality, as we shall see in the next chapter, but gave
no other applications. In 1854, Buniakowski extended Cauchy's inequality
to integrals, approximating the integrals by sums, but his work remained
little-known.
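Lagrange's identity can be checked mechanically; the brief script below (illustrative, with arbitrary random data) verifies it term by term.

```python
# Verify Lagrange's identity, which yields Cauchy's inequality:
# (sum a_i b_i)^2 + sum_{i<j} (a_i b_j - a_j b_i)^2 = (sum a_i^2)(sum b_i^2).
import random

random.seed(1)
n = 10
a = [random.uniform(-5, 5) for _ in range(n)]
b = [random.uniform(-5, 5) for _ in range(n)]

lhs = sum(x * y for x, y in zip(a, b)) ** 2 + sum(
    (a[i] * b[j] - a[j] * b[i]) ** 2
    for i in range(n) for j in range(i + 1, n)
)
rhs = sum(x * x for x in a) * sum(y * y for y in b)

assert abs(lhs - rhs) < 1e-6
```

Since the correction term $\sum_{i<j}(a_ib_j - a_jb_i)^2$ is non-negative, Cauchy's inequality follows immediately from the identity.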
2.2 Inner-product spaces
In 1885, Schwarz [Schw 85] gave another proof of Cauchy's inequality, this
time for two-dimensional integrals. Schwarz's proof is quite different from
Cauchy's, and extends to a more general and more abstract setting, which
we now describe.
Suppose that $V$ is a real vector space. An inner product on $V$ is a real-valued
function $(x, y) \to \langle x, y\rangle$ on $V \times V$ which satisfies the following:
(i) (bilinearity)
$$\langle \alpha_1 x_1 + \alpha_2 x_2, y\rangle = \alpha_1 \langle x_1, y\rangle + \alpha_2 \langle x_2, y\rangle,$$
$$\langle x, \beta_1 y_1 + \beta_2 y_2\rangle = \beta_1 \langle x, y_1\rangle + \beta_2 \langle x, y_2\rangle,$$
for all $x, x_1, x_2, y, y_1, y_2$ in $V$ and all real $\alpha_1, \alpha_2, \beta_1, \beta_2$;
(ii) (symmetry)
$\langle y, x\rangle = \langle x, y\rangle$ for all $x, y$ in $V$;
(iii) (positive definiteness)
$\langle x, x\rangle > 0$ for all non-zero $x$ in $V$.
For example, if $V = \mathbf{R}^d$, we define the usual inner product by setting
$\langle z, w\rangle = \sum_{i=1}^d z_i w_i$ for $z = (z_i)$, $w = (w_i)$.
Similarly, an inner product on a complex vector space $V$ is a function
$(x, y) \to \langle x, y\rangle$ from $V \times V$ to the complex numbers $\mathbf{C}$ which satisfies the
following:
(i) (sesquilinearity)
$$\langle \alpha_1 x_1 + \alpha_2 x_2, y\rangle = \alpha_1 \langle x_1, y\rangle + \alpha_2 \langle x_2, y\rangle,$$
$$\langle x, \beta_1 y_1 + \beta_2 y_2\rangle = \bar\beta_1 \langle x, y_1\rangle + \bar\beta_2 \langle x, y_2\rangle,$$
for all $x, x_1, x_2, y, y_1, y_2$ in $V$ and all complex $\alpha_1, \alpha_2, \beta_1, \beta_2$;
(ii) (the Hermitian condition)
$\langle y, x\rangle = \overline{\langle x, y\rangle}$ for all $x, y$ in $V$;
(iii) (positive definiteness)
$\langle x, x\rangle > 0$ for all non-zero $x$ in $V$.
For example, if $V = \mathbf{C}^d$, we define the usual inner product by setting
$\langle z, w\rangle = \sum_{i=1}^d z_i \bar w_i$ for $z = (z_i)$, $w = (w_i)$.
A (real or) complex vector space $V$ equipped with an inner product is
called an inner-product space. If $x$ is a vector in $V$, we set $\|x\| = \langle x, x\rangle^{1/2}$.
Note that we have the following parallelogram law:
$$\|x+y\|^2 + \|x-y\|^2 = (\langle x,x\rangle + \langle x,y\rangle + \langle y,x\rangle + \langle y,y\rangle)
+ (\langle x,x\rangle - \langle x,y\rangle - \langle y,x\rangle + \langle y,y\rangle)
= 2\|x\|^2 + 2\|y\|^2.$$
2.3 The Cauchy–Schwarz inequality
In what follows, we shall consider the complex case: the real case is easier.

Proposition 2.3.1 (The Cauchy–Schwarz inequality) If $x$ and $y$ are
vectors in an inner-product space $V$, then
$$|\langle x, y\rangle| \le \|x\| \cdot \|y\|,$$
with equality if and only if $x$ and $y$ are linearly dependent.

Proof This depends upon the quadratic nature of the inner product. If
$y = 0$ then $\langle x, y\rangle = 0$ and $\|y\| = 0$, so that the inequality is trivially true.
Otherwise, let $\langle x, y\rangle = re^{i\theta}$, where $r = |\langle x, y\rangle|$. If $\lambda$ is real then
$$\left\|x + \lambda e^{i\theta} y\right\|^2 = \langle x, x\rangle + \lambda\langle e^{i\theta}y, x\rangle + \lambda\langle x, e^{i\theta}y\rangle + \lambda^2\langle e^{i\theta}y, e^{i\theta}y\rangle
= \|x\|^2 + 2\lambda|\langle x, y\rangle| + \lambda^2\|y\|^2.$$
Thus $\|x\|^2 + 2\lambda|\langle x, y\rangle| + \lambda^2\|y\|^2 \ge 0$. If we take $\lambda = -\|x\|/\|y\|$, we obtain
the desired inequality.
If equality holds, then $\|x + \lambda e^{i\theta}y\| = 0$, so that $x + \lambda e^{i\theta}y = 0$, and $x$ and $y$
are linearly dependent. Conversely, if $x$ and $y$ are linearly dependent, then
$x = \alpha y$, and $|\langle x, y\rangle| = |\alpha| \|y\|^2 = \|x\| \|y\|$.

Note that we obtain Cauchy's inequality by considering $\mathbf{R}^d$, with its usual
inner product.
Corollary 2.3.1 $\|x+y\| \le \|x\| + \|y\|$, with equality if and only if either
$y = 0$ or $x = \lambda y$, with $\lambda \ge 0$.

Proof We have
$$\|x+y\|^2 = \|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2 \le \|x\|^2 + 2\|x\| \cdot \|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$
Equality holds if and only if $\Re\langle x, y\rangle = \|x\| \cdot \|y\|$, which is equivalent to the
condition stated.
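Both Proposition 2.3.1 and its corollary can be exercised numerically on the usual inner product of $\mathbf{C}^d$; the sketch below (with arbitrary random vectors) is purely illustrative.

```python
# Check the Cauchy-Schwarz and triangle inequalities for the usual
# inner product on C^d, <z, w> = sum z_i * conj(w_i).
import random, math

random.seed(2)
d = 8
z = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)]
w = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)]

def ip(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(ip(x, x).real)

cs_gap = norm(z) * norm(w) - abs(ip(z, w))     # >= 0 by Cauchy-Schwarz
tri_gap = norm(z) + norm(w) - norm([a + b for a, b in zip(z, w)])  # >= 0

assert cs_gap >= -1e-12 and tri_gap >= -1e-12
```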
Since $\|\lambda x\| = |\lambda| \|x\|$, and since $\|x\| = 0$ if and only if $x = 0$, this corollary
says that the function $x \to \|x\|$ is a norm on $V$. We shall consider norms in
Chapter 4.
As our second example of inner-product spaces, we consider spaces of
functions. Suppose that $(\Omega, \Sigma, \mu)$ is a measure space. Let $\mathcal{L}^2 = \mathcal{L}^2(\Omega, \Sigma, \mu)$
denote the set of complex-valued measurable functions on $\Omega$ for which
$\int_\Omega |f|^2\,d\mu < \infty$.
It follows from the parallelogram law for scalars that if $f$ and $g$ are in $\mathcal{L}^2$
then
$$\int_\Omega |f+g|^2\,d\mu + \int_\Omega |f-g|^2\,d\mu = 2\int_\Omega |f|^2\,d\mu + 2\int_\Omega |g|^2\,d\mu,$$
so that $f+g$ and $f-g$ are in $\mathcal{L}^2$. Since $\lambda f$ is in $\mathcal{L}^2$ if $f$ is, this means that
$\mathcal{L}^2$ is a vector space.
Similarly, since
$$|f(x)|^2 + |g(x)|^2 - 2|f(x)g(x)| = (|f(x)| - |g(x)|)^2 \ge 0,$$
it follows that
$$2\int_\Omega |f\bar g|\,d\mu \le \int_\Omega |f|^2\,d\mu + \int_\Omega |g|^2\,d\mu,$$
with equality if and only if $|f| = |g|$ almost everywhere, so that $f\bar g$ is
integrable. We set
$$\langle f, g\rangle = \int_\Omega f\bar g\,d\mu.$$
This function is sesquilinear, Hermitian and positive semi-definite. Further,
$\langle f, f\rangle = 0$ if and only if $f = 0$ almost everywhere. We therefore identify
functions which are equal almost everywhere, and denote the resulting quotient
space by $L^2 = L^2(\Omega, \Sigma, \mu)$. $L^2$ is again a vector space, and the value
of the integral $\int_\Omega f\bar g\,d\mu$ is unaltered if we replace $f$ and $g$ by equivalent
functions. We can therefore define $\langle f, g\rangle$ on $L^2 \times L^2$: this is now an inner
product. Consequently, we have the following result.
Theorem 2.3.1 (Schwarz's inequality) If $f, g \in L^2(\Omega, \Sigma, \mu)$, then
$$\left|\int_\Omega f\bar g\,d\mu\right| \le \left(\int_\Omega |f|^2\,d\mu\right)^{1/2} \left(\int_\Omega |g|^2\,d\mu\right)^{1/2},$$
with equality if and only if $f$ and $g$ are linearly dependent.
More particularly, when $\Omega = \mathbf{N}$, and $\mu$ is counting measure, we write
$$l^2 = \left\{x = (x_i)\colon \sum_{i=1}^\infty |x_i|^2 < \infty\right\}.$$
Then if $x$ and $y$ are in $l^2$ the sum $\sum_{i=1}^\infty x_i \bar y_i$ is absolutely convergent and
$$\left|\sum_{i=1}^\infty x_i \bar y_i\right| \le \sum_{i=1}^\infty |x_i||y_i| \le \left(\sum_{i=1}^\infty |y_i|^2\right)^{1/2} \left(\sum_{i=1}^\infty |x_i|^2\right)^{1/2}.$$
We shall follow modern custom, and refer to both Cauchy's inequality and
Schwarz's inequality as the Cauchy–Schwarz inequality.
2.4 Notes and remarks
Seen from this distance, it now seems strange that Cauchy's inequality
did not appear in print until 1821, and stranger still that Schwarz did not
establish the result for integrals until more than sixty years later. Nowadays,
inner-product spaces and Hilbert spaces have their place in undergraduate
courses, where the principal difficulty that occurs is teaching the correct
pronunciation of 'Cauchy' and the correct spelling of 'Schwarz'.
We shall not spend any longer on the Cauchy–Schwarz inequality, but it is
worth noting how many of the results that follow can be seen as extensions
or generalizations of it.
An entertaining account of the Cauchy–Schwarz inequality and related
results is given in [Ste 04].
Exercises
2.1 Suppose that $\mu(\Omega) < \infty$ and that $f \in L^2(\mu)$. Show that
$$\int_\Omega |f|\,d\mu \le (\mu(\Omega))^{1/2} \left(\int_\Omega |f|^2\,d\mu\right)^{1/2}.$$
The next two inequalities are useful in the theory of hypercontractive
semigroups.
2.2 Suppose that $r > 1$. Using Exercise 2.1, applied to the function $f(x) = 1/\sqrt{x}$
on $[1, r^2]$, show that $2(r-1) \le (r+1)\log r$.
2.3 Suppose that $0 < s < t$ and that $q > 1$. Using Exercise 2.1, applied to
the function $f(x) = x^{q-1}$ on $[s, t]$, show that
$$(t^q - s^q)^2 \le \frac{q^2}{2q-1}\,(t^{2q-1} - s^{2q-1})(t - s).$$
2.4 Suppose that $P$ is a Borel probability measure on $\mathbf{R}$. The characteristic
function $f_P(u)$ is defined (for real $u$) as
$$f_P(u) = \int_{\mathbf{R}} e^{ixu}\,dP(x).$$
(i) Prove the incremental inequality
$$|f_P(u+h) - f_P(u)|^2 \le 4(1 - \Re f_P(h)).$$
(ii) Prove the Harker–Kasper inequality
$$2(\Re f_P(u))^2 \le 1 + \Re f_P(2u).$$
This inequality, proved in 1948, led to a substantial breakthrough in
determining the structure of crystals.
2.5 Suppose that $g$ is a positive measurable function on $\Omega$ and that
$\int_\Omega g\,d\mu = 1$. Show that if $f \in L^1(\mu)$ then
$$\left|\int_\Omega f\,d\mu\right| \le \left(\int_\Omega (f^2/g)\,d\mu\right)^{1/2}.$$
3
The arithmetic mean–geometric mean inequality
3.1 The arithmetic mean–geometric mean inequality
The arithmetic mean–geometric mean inequality is perhaps the most famous
of all inequalities. It is beloved by problem setters.

Theorem 3.1.1 (The arithmetic mean–geometric mean inequality)
Suppose that $a_1, \ldots, a_n$ are positive numbers. Then
$$(a_1 \cdots a_n)^{1/n} \le \frac{a_1 + \cdots + a_n}{n},$$
with equality if and only if $a_1 = \cdots = a_n$.
The quantity $g = (a_1 \cdots a_n)^{1/n}$ is the geometric mean of $a_1, \ldots, a_n$, and the
quantity $a = (a_1 + \cdots + a_n)/n$ is the arithmetic mean.
Proof We give three proofs here, and shall give another one later.
First we give Cauchy's proof [Cau 21]. We begin by proving the result
when $n = 2^k$, proving the result by induction on $k$. Since
$$(a_1 + a_2)^2 - 4a_1 a_2 = (a_1 - a_2)^2 \ge 0,$$
the result holds for $k = 1$, with equality if and only if $a_1 = a_2$.
Suppose that the result holds when $n = 2^{k-1}$. Then
$$a_1 \cdots a_{2^{k-1}} \le \left(\frac{a_1 + \cdots + a_{2^{k-1}}}{2^{k-1}}\right)^{2^{k-1}}
\quad\text{and}\quad
a_{2^{k-1}+1} \cdots a_{2^k} \le \left(\frac{a_{2^{k-1}+1} + \cdots + a_{2^k}}{2^{k-1}}\right)^{2^{k-1}},$$
so that
$$a_1 \cdots a_{2^k} \le \left(\left(\frac{a_1 + \cdots + a_{2^{k-1}}}{2^{k-1}}\right)\left(\frac{a_{2^{k-1}+1} + \cdots + a_{2^k}}{2^{k-1}}\right)\right)^{2^{k-1}}.$$
But
$$(a_1 + \cdots + a_{2^{k-1}})(a_{2^{k-1}+1} + \cdots + a_{2^k}) \le \tfrac{1}{4}(a_1 + \cdots + a_{2^k})^2,$$
by the case $k = 1$. Combining these two inequalities, we obtain the required
inequality. Further, equality holds if and only if equality holds in each of
the inequalities we have established, and this happens if and only if
$$a_1 = \cdots = a_{2^{k-1}} \quad\text{and}\quad a_{2^{k-1}+1} = \cdots = a_{2^k},$$
and
$$a_1 + \cdots + a_{2^{k-1}} = a_{2^{k-1}+1} + \cdots + a_{2^k},$$
which in turn happens if and only if $a_1 = \cdots = a_{2^k}$.
We now prove the result for general $n$. Choose $k$ such that $2^k > n$, and
set $a_j$ equal to the arithmetic mean $a$ for $n < j \le 2^k$. Then, applying the
result for $2^k$, we obtain
$$a_1 \cdots a_n \cdot a^{2^k - n} \le a^{2^k}.$$
Multiplying by $a^{n - 2^k}$, we obtain the inequality required. Equality holds if
and only if $a_i = a$ for all $i$.
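Cauchy's padding step can be traced numerically: extend a list of $n$ positive numbers to length $2^k$ by repeating their arithmetic mean $a$, apply AM–GM at length $2^k$, and divide out the padding factor. The numbers below are illustrative.

```python
# Trace Cauchy's reduction from length 2^k to general n.
import math

nums = [1.0, 4.0, 2.5, 6.0, 0.5]
n = len(nums)
a = sum(nums) / n                       # arithmetic mean

k = 3                                   # smallest k with 2**k > n
padded = nums + [a] * (2 ** k - n)      # pad with copies of the mean

# padding with the mean leaves the arithmetic mean unchanged
assert abs(sum(padded) / 2 ** k - a) < 1e-12

gm_padded = math.prod(padded) ** (1 / 2 ** k)
assert gm_padded <= a + 1e-12           # AM-GM at length 2^k

# dividing out a^(2^k - n) recovers the general inequality
gm = math.prod(nums) ** (1 / n)
assert gm <= a + 1e-12
```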
The second proof involves the method of transfer. We prove the result
by induction on the number $d$ of terms $a_j$ which are different from the
arithmetic mean $a$. The result is trivially true, with equality, if $d = 0$. It is
not possible for $d$ to be equal to 1. Suppose that the result is true for all
values less than $d$, and that $d$ terms of $a_1, \ldots, a_n$ are different from $a$. There
must then be two indices $i$ and $j$ for which $a_i > a > a_j$. We now transfer
some of $a_i$ to $a_j$; we define a new sequence of positive numbers by setting
$a'_i = a$, $a'_j = a_i + a_j - a$, and $a'_k = a_k$ for $k \ne i, j$. Then $a'_1, \ldots, a'_n$ has the
same arithmetic mean $a$ as $a_1, \ldots, a_n$, and has less than $d$ terms different
from $a$. Thus by the inductive hypothesis, the geometric mean $g'$ of the new
sequence is less than or equal to $a$. But
$$a'_i a'_j - a_i a_j = a a_i + a a_j - a^2 - a_i a_j = (a_i - a)(a - a_j) > 0,$$
so that $g < g'$. This establishes the inequality, and also shows that equality
can only hold when all the terms are equal.
The third proof requires results from analysis. Let
$$\Delta = \{x = (x_1, \ldots, x_n) \in \mathbf{R}^n\colon x_i \ge 0 \text{ for } 1 \le i \le n,\ x_1 + \cdots + x_n = na\}.$$
$\Delta$ is the set of $n$-tuples $(x_1, \ldots, x_n)$ of non-negative numbers with arithmetic
mean $a$. It is a closed bounded subset of $\mathbf{R}^n$. The function $\phi(x) = x_1 \cdots x_n$
is continuous on $\Delta$, and so it attains a maximum value at some point $c =
(c_1, \ldots, c_n)$. [This basic result from analysis is fundamental to the proof;
early versions of the proof were therefore defective at this point.] Since
$\phi(a, \ldots, a) = a^n > 0$, $\phi(c) > 0$, and so each $c_i$ is positive. Now consider any
two distinct indices $i$ and $j$. Let $p$ and $q$ be points of $\mathbf{R}^n$, defined by
$$p_i = 0,\quad p_j = c_i + c_j,\quad p_k = c_k \text{ for } k \ne i, j,$$
$$q_i = c_i + c_j,\quad q_j = 0,\quad q_k = c_k \text{ for } k \ne i, j.$$
Then $p$ and $q$ are points on the boundary of $\Delta$, and the line segment $[p, q]$ is
contained in $\Delta$. Let $f(t) = (1 - t)p + tq$, for $0 \le t \le 1$, so that $f$ maps $[0, 1]$
onto the line segment $[p, q]$. $f(c_i/(c_i + c_j)) = c$, so that $c$ is an interior point
of $[p, q]$. Thus the function $g(t) = \phi(f(t))$ has a maximum at $c_i/(c_i + c_j)$.
Now
$$g(t) = t(1 - t)(c_i + c_j)^2 \prod_{k \ne i, j} c_k, \quad\text{so that}\quad
\frac{dg}{dt} = (1 - 2t)(c_i + c_j)^2 \prod_{k \ne i, j} c_k,$$
and
$$\frac{dg}{dt}\left(\frac{c_i}{c_i + c_j}\right) = (c_j - c_i)(c_i + c_j) \prod_{k \ne i, j} c_k = 0.$$
Thus $c_i = c_j$. Since this holds for all pairs of indices $i, j$, the maximum is
attained at $(a, \ldots, a)$, and at no other point.
We shall refer to the arithmetic mean–geometric mean inequality as the
AM–GM inequality.

3.2 Applications
We give two applications of the AM–GM inequality. In elementary analysis,
it can be used to provide polynomial approximations to the exponential
function.

Proposition 3.2.1 (i) If $nt \le 1$, then $(1 - t)^n \ge 1 - nt$.
(ii) If $-x < n < m$ then $(1 + x/n)^n \le (1 + x/m)^m$.
(iii) If $x > 0$ and $\alpha > 1$ then $(1 - x/n^\alpha)^n \to 1$.
(iv) $(1 + x/n)^n$ converges as $n \to \infty$, for all real $x$.

Proof (i) Take $a_1 = 1 - nt$ and $a_2 = \cdots = a_n = 1$.
(ii) Let $a_1 = \cdots = a_n = 1 + x/n$, and $a_{n+1} = \cdots = a_m = 1$. Then
$$(1 + x/n)^{n/m} = (a_1 \cdots a_m)^{1/m} \le (a_1 + \cdots + a_m)/m = 1 + x/m.$$
(iii) Put $t = x/n^\alpha$. Then if $n^\alpha > x$, $1 - x/n^{\alpha-1} \le (1 - x/n^\alpha)^n < 1$, by
(i), and the result follows since $1 - x/n^{\alpha-1} \to 1$ as $n \to \infty$.
(iv) If $x < 0$ then, for $n > -x$, $((1 + x/n)^n)$ is an increasing sequence which is
bounded above by 1, and so it converges, to $e(x)$ say. If $x > 0$, then
$$(1 + x/n)^n (1 - x/n)^n = (1 - x^2/n^2)^n \to 1,$$
so that $(1 + x/n)^n$ converges, to $e(x)$ say, where $e(x) = e(-x)^{-1}$.
We set $e = e(1) = \lim_{n \to \infty} (1 + 1/n)^n$.
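The behaviour established in Proposition 3.2.1 is easy to observe numerically; the script below (illustrative values) checks monotonicity in $n$ and the cancellation $(1+x/n)^n(1-x/n)^n \to 1$.

```python
# Observe Proposition 3.2.1 numerically for x = 1.
x = 1.0

# (ii): (1 + x/n)^n increases with n
seq = [(1 + x / n) ** n for n in range(2, 2000)]
assert all(s <= t + 1e-12 for s, t in zip(seq, seq[1:]))

# (iv): (1 + x/n)^n (1 - x/n)^n = (1 - x^2/n^2)^n -> 1
n = 10 ** 6
prod = (1 + x / n) ** n * (1 - x / n) ** n
assert abs(prod - 1) < 1e-5

# e = e(1) = lim (1 + 1/n)^n
e_approx = (1 + 1.0 / n) ** n
assert abs(e_approx - 2.718281828) < 1e-3
```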
Carleman [Car 23] established an important inequality used in the study
of quasi-analytic functions (the Denjoy–Carleman theorem: see for example
[Hör 90], Theorem 1.3.8). In 1926, Pólya [Pól 26] gave the following elegant
proof, which uses the AM–GM inequality.

Theorem 3.2.1 (Carleman's inequality) Suppose that $(a_j)$ is a sequence
of positive numbers for which $\sum_{j=1}^\infty a_j < \infty$. Then
$$\sum_{n=1}^\infty (a_1 \cdots a_n)^{1/n} < e \sum_{j=1}^\infty a_j.$$
Proof Let $m_n = n(1 + 1/n)^n$, so that $m_1 \cdots m_n = (n+1)^n$, and let $b_n =
m_n a_n$. Then
$$(n+1)(a_1 \cdots a_n)^{1/n} = (b_1 \cdots b_n)^{1/n} \le (b_1 + \cdots + b_n)/n,$$
so that
$$\sum_{n=1}^\infty (a_1 \cdots a_n)^{1/n} \le \sum_{n=1}^\infty \frac{1}{n(n+1)} \sum_{j=1}^n b_j
= \sum_{j=1}^\infty b_j \sum_{n=j}^\infty \frac{1}{n(n+1)}
= \sum_{j=1}^\infty \frac{b_j}{j}
= \sum_{j=1}^\infty \left(1 + \frac{1}{j}\right)^j a_j < e \sum_{j=1}^\infty a_j.$$
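Carleman's inequality can be checked numerically on a truncated series; the geometric sequence below is an illustrative choice, with running geometric means accumulated in log space for stability.

```python
# Check sum_n (a_1...a_n)^(1/n) < e * sum_n a_n for a_j = (1/2)^j.
import math

r = 0.5
N = 60
a = [r ** (j + 1) for j in range(N)]

total = 0.0
log_prod = 0.0
for n, an in enumerate(a, start=1):
    log_prod += math.log(an)
    total += math.exp(log_prod / n)     # (a_1 ... a_n)^(1/n)

assert total < math.e * sum(a)
```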
3.3 Notes and remarks
The AM–GM inequality has been around for a long time, and there are
many proofs of it: 52 are given in [BuMV 87]. The first two proofs that
we have given are truly elementary, using only the algebraic properties of
an ordered field. The idea behind the second proof is called the method of
transfer: it will recur later, in the proof of Theorem 7.7.1. It was introduced
by Muirhead [Mui 03] to prove Theorem 7.9.2, which provides a far-reaching
generalization of the AM–GM inequality.
The salient feature of the AM–GM inequality is that it relates additive and
multiplicative averages: the logarithmic and exponential functions provide a
link between addition and multiplication, and we shall use these to generalize
the AM–GM inequality, in the next chapter.
Exercises
3.1 The harmonic mean $h$ of $n$ positive numbers $a_1, \ldots, a_n$ is defined as
$h = \left(\sum_{j=1}^n (1/a_j)/n\right)^{-1}$. Show that the harmonic mean is less than or equal
to the geometric mean. When does equality occur?
3.2 Show that a $d$-dimensional rectangular parallelepiped of fixed volume
has least surface area when all the sides have equal length. Show
that solving this problem is equivalent to establishing the AM–GM
inequality.
3.3 Suppose that $a_1, \ldots, a_n$ are $n$ positive numbers. Show that if $1 < k < n$
then
$$(a_1 \cdots a_n)^{1/n} \le \binom{n}{k}^{-1} \sum_{i_1 < \cdots < i_k} (a_{i_1} \cdots a_{i_k})^{1/k} \le \frac{a_1 + \cdots + a_n}{n}.$$
3.4 With the terminology of Proposition 3.2.1, show that $e(x)e(y) =
e(x+y)$, that $e = e(1) = \sum_{j=0}^\infty 1/j!$ and that $e(x) = \sum_{j=0}^\infty x^j/j!$.
3.5 Let $t_n = n^n/n!$. By considering the ratios $t_{n+1}/t_n$, show that $n^n < e^n n!$.
3.6 Suppose that $(a_n)$ and $(f_n)$ are sequences of positive numbers such that
$\sum_{n=1}^\infty a_n = \infty$ and $f_n \to f > 0$ as $n \to \infty$. Show that
$$\left(\sum_{n=1}^N f_n a_n\right) \Big/ \left(\sum_{n=1}^N a_n\right) \to f \quad\text{as } N \to \infty.$$
3.7 Show that the constant $e$ in Carleman's inequality is best possible.
[Consider finite sums in the proof, and strive for equality.]
4
Convexity, and Jensen's inequality
4.1 Convex sets and convex functions
Many important inequalities depend upon convexity. In this chapter, we
shall establish Jensen's inequality, the most fundamental of these inequalities,
in various forms.
A subset $C$ of a real or complex vector space $E$ is convex if whenever $x$
and $y$ are in $C$ and $0 \le \lambda \le 1$ then $(1-\lambda)x + \lambda y \in C$. This says that the
real line segment $[x, y]$ is contained in $C$. Convexity is a real property: in
the complex case, we are restricting attention to the underlying real space.
Convexity is an affine property, but we shall restrict our attention to vector
spaces rather than to affine spaces.
Proposition 4.1.1 A subset $C$ of a vector space $E$ is convex if and only if
whenever $x_1, \ldots, x_n \in C$ and $p_1, \ldots, p_n$ are positive numbers with $p_1 + \cdots +
p_n = 1$ then $p_1 x_1 + \cdots + p_n x_n \in C$.

Proof The condition is certainly sufficient. We prove necessity by induction
on $n$. The result is trivially true when $n = 1$, and is true for $n = 2$, as this
reduces to the definition of convexity. Suppose that the result is true for
$n - 1$, and that $x_1, \ldots, x_n$ and $p_1, \ldots, p_n$ are as above. Let
$$y = \frac{p_{n-1}}{p_{n-1} + p_n}\,x_{n-1} + \frac{p_n}{p_{n-1} + p_n}\,x_n.$$
Then $y \in C$ by convexity, and
$$p_1 x_1 + \cdots + p_n x_n = p_1 x_1 + \cdots + p_{n-2} x_{n-2} + (p_{n-1} + p_n)y \in C,$$
by the inductive hypothesis.
A real-valued function $f$ defined on a convex subset $C$ of a vector space
$E$ is convex if the set
$$U_f = \{(x, \lambda)\colon x \in C,\ \lambda \ge f(x)\} \subseteq E \times \mathbf{R}$$
of points on and above the graph of $f$ is convex. That is to say, if $x, y \in C$
and $0 \le \lambda \le 1$ then
$$f((1-\lambda)x + \lambda y) \le (1-\lambda)f(x) + \lambda f(y).$$
$f$ is strictly convex if
$$f((1-\lambda)x + \lambda y) < (1-\lambda)f(x) + \lambda f(y)$$
whenever $x$ and $y$ are distinct points of $C$ and $0 < \lambda < 1$. $f$ is concave
(strictly concave) if $-f$ is convex (strictly convex).
We now use Proposition 4.1.1 to prove the simplest version of Jensen's
inequality.

Proposition 4.1.2 (Jensen's inequality: I) If $f$ is a convex function on
a convex set $C$, and $p_1, \ldots, p_n$ are positive numbers with $p_1 + \cdots + p_n = 1$,
then
$$f(p_1 x_1 + \cdots + p_n x_n) \le p_1 f(x_1) + \cdots + p_n f(x_n).$$
If $f$ is strictly convex, then equality holds if and only if $x_1 = \cdots = x_n$.

Proof The first statement follows by applying Proposition 4.1.1 to $U_f$. Suppose
that $f$ is strictly convex, and that $x_1, \ldots, x_n$ are not all equal. By
relabelling if necessary, we can suppose that $x_{n-1} \ne x_n$. Let
$$y = \frac{p_{n-1}}{p_{n-1} + p_n}\,x_{n-1} + \frac{p_n}{p_{n-1} + p_n}\,x_n,$$
as above. Then
$$f(y) < \frac{p_{n-1}}{p_{n-1} + p_n}\,f(x_{n-1}) + \frac{p_n}{p_{n-1} + p_n}\,f(x_n),$$
so that
$$f(p_1 x_1 + \cdots + p_n x_n) = f(p_1 x_1 + \cdots + p_{n-2} x_{n-2} + (p_{n-1} + p_n)y)$$
$$\le p_1 f(x_1) + \cdots + p_{n-2} f(x_{n-2}) + (p_{n-1} + p_n)f(y)
< p_1 f(x_1) + \cdots + p_n f(x_n).$$

Although this is very simple, it is also very powerful. Here for example is
an immediate improvement of the AM–GM inequality.
Although this is very simple, it is also very powerful. Here for example is
an immediate improvement of the AMGM inequality.
26 Convexity, and Jensens inequality
Proposition 4.1.3 Suppose that a
1
, . . . , a
n
are positive, that p
1
, . . . , p
n
are
positive and that p
1
+ +p
n
= 1. Then
a
p
1
1
. . . a
p
n
n
p
1
a
1
+ +p
n
a
n
,
with equality if and only if a
1
= = a
n
.
Proof The function e
x
is strictly convex (see Proposition 4.2.1), and so
e
p
1
x
1
. . . e
p
n
x
n
= e
p
1
x
1
++p
n
x
n
p
1
e
x
1
+ +p
n
e
x
n
for any real x
1
, . . . , x
n
, with equality if and only if x
1
= = x
n
. The result
follows by making the substitution x
i
= log a
i
.
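The weighted AM–GM inequality of Proposition 4.1.3 can be probed numerically; the random values below are illustrative.

```python
# Check a_1^p_1 ... a_n^p_n <= p_1 a_1 + ... + p_n a_n for weights
# summing to 1, together with the equality case a_1 = ... = a_n.
import random, math

random.seed(3)
a = [random.uniform(0.1, 10) for _ in range(6)]
w = [random.random() for _ in range(6)]
p = [x / sum(w) for x in w]             # normalise the weights

geo = math.exp(sum(pi * math.log(ai) for pi, ai in zip(p, a)))
ari = sum(pi * ai for pi, ai in zip(p, a))
assert geo <= ari + 1e-12

# equality case: all a_i equal
geo_eq = math.exp(sum(pi * math.log(7.0) for pi in p))
assert abs(geo_eq - 7.0) < 1e-9
```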
We can think of Proposition 4.1.2 in the following way. We place masses
$p_1, \ldots, p_n$ at the points $(x_1, f(x_1)), \ldots, (x_n, f(x_n))$ on the graph of $f$. This
defines a measure on $E \times \mathbf{R}$. Then the centre of mass, or barycentre, of these
masses is at the point
$$(p_1 x_1 + \cdots + p_n x_n,\ p_1 f(x_1) + \cdots + p_n f(x_n)),$$
and this lies above the graph, because $f$ is convex. For a more sophisticated
version, we replace the measure defined by the point masses by a more
general measure. In order to obtain the corresponding version of Jensen's
inequality, we need to study convex functions in some detail, and also need
to define the notion of a barycentre with some care.
4.2 Convex functions on an interval
Let us consider the case when $E$ is the real line $\mathbf{R}$. In this case the convex
subsets are simply the intervals in $\mathbf{R}$. First let us consider differentiable
functions.

Proposition 4.2.1 Suppose that $f$ is a differentiable real-valued function
on an open interval $I$ of the real line $\mathbf{R}$. Then $f$ is convex if and only if its
derivative $f'$ is an increasing function. It is strictly convex if and only if $f'$
is strictly increasing.

Proof First suppose that $f$ is convex. Suppose that $a < b < c$ are points in
$I$. Then by convexity,
$$f(b) \le \frac{c-b}{c-a}\,f(a) + \frac{b-a}{c-a}\,f(c).$$
Rearranging this, we find that
$$\frac{f(b) - f(a)}{b-a} \le \frac{f(c) - f(a)}{c-a} \le \frac{f(c) - f(b)}{c-b}.$$
Thus if $a < b \le c < d$ are points in $I$,
$$\frac{f(b) - f(a)}{b-a} \le \frac{f(d) - f(c)}{d-c}.$$
It follows from this that $f'$ is increasing.
Conversely, suppose that $f'$ is increasing. Suppose that $x_0 < x_1$ are points
in $I$ and that $0 < \lambda < 1$: let $x_\lambda = (1-\lambda)x_0 + \lambda x_1$. Applying the mean-value
theorem, there exist points $x_0 < c < x_\lambda < d < x_1$ such that
$$f(x_\lambda) - f(x_0) = (x_\lambda - x_0)f'(c) = \lambda(x_1 - x_0)f'(c),$$
$$f(x_1) - f(x_\lambda) = (x_1 - x_\lambda)f'(d) = (1-\lambda)(x_1 - x_0)f'(d).$$
Multiplying the first equation by $(1-\lambda)$ and the second by $\lambda$, and subtracting
the first from the second, we find that
$$(1-\lambda)f(x_0) + \lambda f(x_1) - f(x_\lambda) = \lambda(1-\lambda)(x_1 - x_0)(f'(d) - f'(c)) \ge 0.$$
If $f'$ is strictly increasing then this inequality is strict, so that $f$ is strictly
convex. If it is not strictly increasing, so that there exist $y_0 < y_1$ in $I$ with
$f'(x) = f'(y_0)$ for $y_0 \le x \le y_1$, then $f(x) = f(y_0) + (x - y_0)f'(y_0)$ for
$y_0 \le x \le y_1$, and $f$ is not strictly convex.
We now drop the requirement that $f$ is differentiable. Suppose that $f$ is
a convex function on an open interval $I$, and that $x \in I$. Suppose that $x+t$
and $x-t$ are in $I$, and that $0 < \lambda < 1$. Then (considering the cases where
$t > 0$ and $t < 0$ separately) it follows easily from the inequalities above that
$$\lambda(f(x) - f(x-t)) \le f(x + \lambda t) - f(x) \le \lambda(f(x+t) - f(x)),$$
so that
$$|f(x + \lambda t) - f(x)| \le \lambda \max(|f(x+t) - f(x)|, |f(x) - f(x-t)|),$$
and $f$ is Lipschitz continuous at $x$. (A function $f$ from a metric space $(X, d)$
to a metric space $(Y, \rho)$ is Lipschitz continuous at $x_0$ if there is a constant $C$
such that $\rho(f(x), f(x_0)) \le C\,d(x, x_0)$ for all $x \in X$. $f$ is a Lipschitz function
if there is a constant $C$ such that $\rho(f(x), f(z)) \le C\,d(x, z)$ for all $x, z \in X$.)
We can go further. If $t > 0$, it follows from the inequalities above, and
the corresponding ones for $f(x-t)$, that
$$\frac{f(x) - f(x-t)}{t} \le \frac{f(x) - f(x-\lambda t)}{\lambda t} \le \frac{f(x + \lambda t) - f(x)}{\lambda t} \le \frac{f(x+t) - f(x)}{t},$$
so that the right and left derivatives
$$D_+ f(x) = \lim_{h \searrow 0} \frac{f(x+h) - f(x)}{h} \quad\text{and}\quad D_- f(x) = \lim_{h \searrow 0} \frac{f(x) - f(x-h)}{h}$$
both exist, and $D_+ f(x) \ge D_- f(x)$. Similar arguments show that $D_+ f$
and $D_- f$ are increasing functions, that $D_+ f$ is right-continuous and $D_- f$
left-continuous, and that $D_- f(x) \ge D_+ f(y)$ if $x > y$. Consequently, if
$D_+ f(x) \ne D_- f(x)$ then $D_+ f$ and $D_- f$ have jump discontinuities at $x$.
Since an increasing function on an interval has only countably many discontinuities,
it follows that $D_+ f(x)$ and $D_- f(x)$ are equal and continuous,
except at a countable set of points. Thus $f$ is differentiable, except at this
countable set of points.
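The one-sided derivatives are easy to approximate numerically; the sketch below does so for the convex function $f(x) = |x|$, whose kink at 0 is the standard example of $D_-f < D_+f$.

```python
# Approximate the one-sided derivatives D+f(x) and D-f(x) of a convex
# function by small difference quotients.
def Dplus(f, x, h=1e-7):
    return (f(x + h) - f(x)) / h

def Dminus(f, x, h=1e-7):
    return (f(x) - f(x - h)) / h

f = abs                                    # convex, with a kink at 0
assert abs(Dplus(f, 0.0) - 1.0) < 1e-6
assert abs(Dminus(f, 0.0) + 1.0) < 1e-6    # D-f(0) = -1 < D+f(0) = 1

g = lambda x: x * x                        # differentiable convex function
assert abs(Dplus(g, 1.0) - Dminus(g, 1.0)) < 1e-5   # they agree
```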
Proposition 4.2.2 Suppose that $f$ is a convex function on an open interval
$I$ of $\mathbf{R}$, and that $x \in I$. Then there is an affine function $a$ on $\mathbf{R}$ such that
$a(x) = f(x)$ and $a(y) \le f(y)$ for $y \in I$.

Proof Choose $\lambda$ so that $D_- f(x) \le \lambda \le D_+ f(x)$. Let $a(y) = f(x) + \lambda(y-x)$.
Then $a$ is an affine function on $\mathbf{R}$, $a(x) = f(x)$ and $a(y) \le f(y)$ for $y \in I$.
Thus $f$ is the supremum of the affine functions which it dominates.
We now return to Jensen's inequality. Suppose that $\mu$ is a probability
measure on the Borel sets of a (possibly unbounded) open interval $I = (a, b)$.
In analogy with the discrete case, we wish to define the barycentre to be
$\int_I x\,d\mu(x)$. There is no problem if $I$ is bounded; if $I$ is unbounded, we require
that the identity function $i(x) = x$ is in $L^1(\mu)$: that is, $\int_I |x|\,d\mu(x) < \infty$. If
so, we define $\bar\mu$ as $\int_I x\,d\mu(x)$. Note that $\bar\mu \in I$.

Theorem 4.2.1 (Jensen's inequality: II) Suppose that $\mu$ is a probability
measure on the Borel sets of an open interval $I$ of $\mathbf{R}$, and that $\mu$ has a
barycentre $\bar\mu$. If $f$ is a convex function on $I$ with $\int_I f^-\,d\mu < \infty$ then
$f(\bar\mu) \le \int_I f\,d\mu$. If $f$ is strictly convex then equality holds if and only if $\mu(\{\bar\mu\}) = 1$.
A probability measure whose mass is concentrated at just one point $x$,
so that $\mu(\{x\}) = 1$ and $\mu(\Omega \setminus \{x\}) = 0$, is called a Dirac measure, and is
denoted by $\delta_x$.

Proof The condition on $f$ ensures that $\int_I f\,d\mu$ exists, taking a value in
$(-\infty, \infty]$. By Proposition 4.2.2, there exists an affine function $a$ on $\mathbf{R}$ with
$a(\bar\mu) = f(\bar\mu)$ and $a(y) \le f(y)$ for all $y \in I$. Then
$$f(\bar\mu) = a(\bar\mu) = \int_I a\,d\mu \le \int_I f\,d\mu.$$
If $f$ is strictly convex then $f(y) - a(y) > 0$ for $y \ne \bar\mu$, so that equality holds
if and only if $\mu(I \setminus \{\bar\mu\}) = 0$.
An important special case of Theorem 4.2.1 arises in the following way.
Suppose that $p$ is a non-negative measurable function on an open interval
$I$, and that $\int_I p\,d\mu = 1$. Then we can define a probability measure $p\,d\mu$ by
setting
$$(p\,d\mu)(B) = \int_B p\,d\mu = \int_I p I_B\,d\mu,$$
for each Borel set $B$. If $\int_I |x|\,p(x)\,d\mu(x) < \infty$, then $p\,d\mu$ has barycentre
$\int_I x\,p(x)\,d\mu(x)$. We therefore have the following corollary.

Corollary 4.2.1 Suppose that $p$ is a non-negative measurable function on
an open interval $I$, that $\int_I p\,d\mu = 1$ and that $\int_I |x|\,p(x)\,d\mu(x) < \infty$. If $f$ is a
convex function on $I$ with $\int_I p(x)f^-(x)\,d\mu(x) < \infty$ then
$$f\left(\int_I x\,p(x)\,d\mu(x)\right) \le \int_I f(x)\,p(x)\,d\mu.$$
If $f$ is strictly convex then equality cannot hold.
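Corollary 4.2.1 can be illustrated with a concrete density; the choice $p(x) = 2x$ on $(0,1)$ and $f(x) = x^2$ below is arbitrary, and the Riemann sums are only a crude stand-in for the integrals.

```python
# Jensen's inequality with a density: f(int x p(x) dx) <= int f(x) p(x) dx
# for f(x) = x^2 and p(x) = 2x on (0, 1), via midpoint Riemann sums.
N = 20000
h = 1.0 / N
xs = [(k + 0.5) * h for k in range(N)]
p = [2 * x for x in xs]                   # density, integrates to 1

mass = sum(p) * h                         # should be ~1
bary = sum(x * pi for x, pi in zip(xs, p)) * h     # barycentre = 2/3
lhs = bary ** 2                            # f(barycentre)
rhs = sum(x * x * pi for x, pi in zip(xs, p)) * h  # int f p dmu = 1/2

assert abs(mass - 1) < 1e-6
assert lhs <= rhs + 1e-12                  # strict for strictly convex f
```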
4.3 Directional derivatives and sublinear functionals
We now return to the case where $E$ is a vector space. We consider a radially
open convex subset $C$ of a vector space $E$: a subset $C$ of $E$ is radially open
if whenever $x \in C$ and $y \in E$ then there exists $\lambda_0 = \lambda_0(x, y) > 0$ such that
$x + \lambda y \in C$ for $0 \le \lambda < \lambda_0$. Suppose that $f$ is a convex function on $C$,
that $x \in C$ and that $y \in E$. Then arguing as in the real case, the function
$(f(x + \lambda y) - f(x))/\lambda$ is an increasing function of $\lambda$ on $(0, \lambda_0(x, y))$ which is
bounded below, and so we can define the directional derivative
$$D_y(f)(x) = \lim_{\lambda \searrow 0} \frac{f(x + \lambda y) - f(x)}{\lambda}.$$
This has important properties that we shall meet again elsewhere. A real-valued
function $p$ on a real or complex vector space $E$ is positive homogeneous
if $p(\lambda x) = \lambda p(x)$ when $\lambda$ is real and positive and $x \in E$; it is subadditive
if $p(x + y) \le p(x) + p(y)$, for $x, y \in E$, and it is sublinear or a sublinear
functional if it is both positive homogeneous and subadditive.

Proposition 4.3.1 Suppose that $f$ is a convex function on a radially open
convex subset $C$ of a vector space $E$, and that $x \in C$. Then the directional
derivative $D_y(f)(x)$ at $x$ is a sublinear function of $y$, and $f(x+y) \ge f(x) +
D_y(f)(x)$ for $x, x+y \in C$.

Proof Positive homogeneity follows from the definition of the directional
derivative. Suppose that $y_1, y_2 \in E$. There exists $\lambda_0$ such that $x + \lambda y_1$ and
$x + \lambda y_2$ are in $C$ for $0 \le \lambda < \lambda_0$. Then by convexity $x + \lambda(y_1 + y_2) \in C$ for
$0 \le \lambda < \lambda_0/2$ and
$$f(x + \lambda(y_1 + y_2)) \le \tfrac{1}{2} f(x + 2\lambda y_1) + \tfrac{1}{2} f(x + 2\lambda y_2),$$
so that
$$D_{y_1 + y_2}(f)(x) \le \tfrac{1}{2} D_{2y_1}(f)(x) + \tfrac{1}{2} D_{2y_2}(f)(x) = D_{y_1}(f)(x) + D_{y_2}(f)(x).$$
The final statement follows from the fact that $(f(x + \lambda y) - f(x))/\lambda$ is an
increasing function of $\lambda$.
Radially open convex sets and sublinear functionals are closely related.

Proposition 4.3.2 Suppose that $V$ is a radially open convex subset of a real
vector space $E$ and that $0 \in V$. Let $p_V(x) = \inf\{\lambda > 0\colon x \in \lambda V\}$. Then $p_V$
is a non-negative sublinear functional on $E$ and $V = \{x\colon p_V(x) < 1\}$.
Conversely, if $p$ is a sublinear functional on $E$ then $U = \{x\colon p(x) < 1\}$
is a radially open convex subset of $E$, $0 \in U$, and $p_U(x) = \max(p(x), 0)$ for
each $x \in E$.
The function $p_U$ is called the gauge of $U$.

Proof Since $V$ is radially open, $p_V(x) < \infty$ for each $x \in E$. $p_V$ is positive
homogeneous and, since $V$ is convex and radially open, $x \in \lambda V$ for $\lambda >
p_V(x)$, so that $\{\lambda > 0\colon x \in \lambda V\} = (p_V(x), \infty)$. Suppose that $\lambda > p_V(x)$ and
$\mu > p_V(y)$. Then $x/\lambda \in V$ and $y/\mu \in V$, and so, by convexity,
$$\frac{x + y}{\lambda + \mu} = \frac{\lambda}{\lambda + \mu} \cdot \frac{x}{\lambda} + \frac{\mu}{\lambda + \mu} \cdot \frac{y}{\mu} \in V,$$
so that $x + y \in (\lambda + \mu)V$, and $p_V(x + y) < \lambda + \mu$. Consequently $p_V$ is
subadditive. If $p_V(x) < 1$ then $x \in V$. On the other hand, if $x \in V$ then
since $V$ is radially open $(1 + \epsilon)x = x + \epsilon x \in V$ for some $\epsilon > 0$, so that
$p_V(x) \le 1/(1 + \epsilon) < 1$.
For the converse, if $x, y \in U$ and $0 \le \lambda \le 1$ then
$$p((1 - \lambda)x + \lambda y) \le (1 - \lambda)p(x) + \lambda p(y) < 1,$$
so that $(1 - \lambda)x + \lambda y \in U$: $U$ is convex. Since $p(0) = 0$, $0 \in U$. If
$x \in U$, $y \in E$ and $\mu > 0$ then $p(x + \mu y) \le p(x) + \mu p(y)$, so that if $0 <
\mu < (1 - p(x))/(1 + p(y))$ then $x + \mu y \in U$, and so $U$ is radially open. If
$p(x) > 0$ then $p(x/p(x)) = 1$, so that $x \in \lambda U$ if and only if $\lambda > p(x)$; thus
$p_U(x) = p(x)$. If $p(x) \le 0$, then $p(x/\lambda) \le 0 < 1$ for all $\lambda > 0$. Thus $x \in \lambda U$
for all $\lambda > 0$, and $p_U(x) = 0$.
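The gauge $p_V(x) = \inf\{\lambda > 0\colon x \in \lambda V\}$ can be computed by bisection for a concrete convex set; the sketch below takes $V$ to be the open unit disc in $\mathbf{R}^2$, for which $p_V$ is the Euclidean norm, and checks sublinearity numerically.

```python
# Compute the gauge of the open unit disc in R^2 by bisection on lambda.
import math

def in_V(x):
    return x[0] ** 2 + x[1] ** 2 < 1          # open unit disc

def gauge(x, hi=100.0, steps=60):
    lo = 0.0
    for _ in range(steps):                    # is x/lam in V?
        mid = (lo + hi) / 2
        if in_V((x[0] / mid, x[1] / mid)):
            hi = mid
        else:
            lo = mid
    return hi

x, y = (3.0, 4.0), (1.0, -2.0)
assert abs(gauge(x) - 5.0) < 1e-9             # p_V = Euclidean norm here
s = (x[0] + y[0], x[1] + y[1])
assert gauge(s) <= gauge(x) + gauge(y) + 1e-9           # subadditive
assert abs(gauge((2 * x[0], 2 * x[1])) - 2 * gauge(x)) < 1e-8  # homogeneous
```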
4.4 The Hahn–Banach theorem
Does an analogue of Proposition 4.2.2 hold for an arbitrary vector space
$E$? The answer to this question is given by the celebrated Hahn–Banach
theorem. We shall spend some time proving this, and considering some of
its consequences, and shall return to Jensen's inequality later.
Recall that a linear functional on a vector space is a linear mapping of
the space into its field of scalars.

Theorem 4.4.1 (The Hahn–Banach theorem) Suppose that $p$ is a sublinear
functional on a real vector space $E$, that $F$ is a linear subspace of $E$
and that $f$ is a linear functional on $F$ satisfying $f(x) \le p(x)$ for all $x \in F$.
Then there is a linear functional $h$ on $E$ such that
$$h(x) = f(x) \text{ for } x \in F \quad\text{and}\quad h(y) \le p(y) \text{ for } y \in E.$$
Thus $h$ extends $f$, and still respects the inequality.

Proof The proof is an inductive one. If $E$ is finite-dimensional, we can
use induction on the dimension of $F$. If $E$ is infinite-dimensional, we must
appeal to the Axiom of Choice, using Zorn's lemma.
First we describe the inductive argument. Let $\mathcal{S}$ be the set of all pairs
$(G, g)$, where $G$ is a linear subspace of $E$ containing $F$, and $g$ is a linear
functional on $G$ satisfying
$$g(x) = f(x) \text{ for } x \in F \quad\text{and}\quad g(z) \le p(z) \text{ for } z \in G.$$
We give $\mathcal{S}$ a partial order by setting $(G_1, g_1) \le (G_2, g_2)$ if $G_1 \subseteq G_2$ and
$g_2(z) = g_1(z)$ for $z \in G_1$: that is, $g_2$ extends $g_1$. Every chain in $\mathcal{S}$ has an
upper bound: the union of the linear subspaces occurring in the chain is a
linear subspace $K$, say, and if $z \in K$ we define $k(z)$ to be the common value
of the functionals in whose domain it lies. Then it is easy to check that
$(K, k)$ is an upper bound for the chain. Thus, by Zorn's lemma, there is a
maximal element $(G, g)$ of $\mathcal{S}$. In order to complete the proof, we must show
that $G = E$.
Suppose not. Then there exists $y \in E \setminus G$. Let $G_1 = \operatorname{span}(G, y)$.
$G_1$ properly contains $G$, and we shall show that $g$ can be extended to a
linear functional $g_1$ on $G_1$ which satisfies the required inequality, giving the
necessary contradiction.
Now any element $x \in G_1$ can be written uniquely as $x = z + \lambda y$, with
$z \in G$, so that if $g_1$ is a linear functional that extends $g$ then $g_1(x) =
g(z) + \lambda g_1(y)$. Thus $g_1$ is determined by $g_1(y)$, and our task is to find a
suitable value for $g_1(y)$. We need to consider the cases where $\lambda$ is zero,
positive or negative. There is no problem when $\lambda = 0$, for then $x \in G$, and
$g_1(x) = g(x)$. Let us suppose then that $z + \lambda y$ and $w - \mu y$ are elements of
$G_1$ with $\lambda > 0$ and $\mu > 0$. Then, using the sublinearity of $p$,
$$\lambda g(w) + \mu g(z) = g(\lambda w + \mu z) \le p(\lambda w + \mu z)
= p(\lambda(w - \mu y) + \mu(z + \lambda y))
\le \lambda p(w - \mu y) + \mu p(z + \lambda y),$$
so that
$$\frac{g(w) - p(w - \mu y)}{\mu} \le \frac{p(z + \lambda y) - g(z)}{\lambda}.$$
Thus if we set
$$\alpha_0 = \sup\left\{\frac{g(w) - p(w - \mu y)}{\mu}\colon w \in G,\ \mu > 0\right\},$$
$$\alpha_1 = \inf\left\{\frac{p(z + \lambda y) - g(z)}{\lambda}\colon z \in G,\ \lambda > 0\right\},$$
then $\alpha_0 \le \alpha_1$. Let us choose $\alpha_0 \le \alpha \le \alpha_1$, and let us set $g_1(y) = \alpha$. Then
$$g_1(z + \lambda y) = g(z) + \lambda\alpha \le p(z + \lambda y),$$
$$g_1(w - \mu y) = g(w) - \mu\alpha \le p(w - \mu y)$$
for any $z, w \in G$ and any positive $\lambda, \mu$, and so we have found a suitable
extension.
Corollary 4.4.1 Suppose that $f$ is a convex function on a radially open
convex subset $C$ of a real vector space $E$ and that $x \in C$. Then there exists
an affine function $a$ such that $a(x) = f(x)$ and $a(y) \le f(y)$ for $y \in C$.

Proof By the Hahn–Banach theorem there exists a linear functional $g$ on $E$
such that $g(z) \le D_z(f)(x)$ for all $z \in E$ (take $F = \{0\}$ in the theorem). Let
$a(z) = f(x) + g(z - x)$. This is affine, and if $y \in C$ then
$$a(y) = f(x) + g(y - x) \le f(x) + D_{y-x}(f)(x) \le f(y),$$
by Proposition 4.3.1.
We can also express the Hahn–Banach theorem as a separation theorem.
We do this in three steps.

Theorem 4.4.2 (The separation theorem: I) Suppose that $U$ is a non-empty radially open convex subset of a real vector space $E$.

(i) If $0 \notin U$ there exists a linear functional $\phi$ on $E$ for which $\phi(x) > 0$ for $x \in U$.

(ii) If $V$ is a non-empty convex subset of $E$ disjoint from $U$ there exists a linear functional $\phi$ on $E$ and a real number $\alpha$ for which $\phi(x) > \alpha$ for $x \in U$ and $\phi(y) \le \alpha$ for $y \in V$.

(iii) If $F$ is a linear subspace of $E$ disjoint from $U$ there exists a linear functional $\phi$ on $E$ for which $\phi(x) > 0$ for $x \in U$ and $\phi(y) = 0$ for $y \in F$.
Proof (i) Choose $x_0$ in $U$ and let $W = U - x_0$. $W$ is radially open and $0 \in W$.
Let $p_W$ be the gauge of $W$. Then $-x_0 \notin W$, and so $p_W(-x_0) \ge 1$. Let
$y_0 = -x_0/p_W(-x_0)$, so that $p_W(y_0) = 1$. If $\lambda y_0 \in \mathrm{span}\,(y_0)$, let $f(\lambda y_0) = \lambda$. Then
$f$ is a linear functional on $\mathrm{span}\,(y_0)$ and $f(-x_0) = p_W(-x_0) \ge 1$. If $\lambda \ge 0$,
then $f(\lambda y_0) = \lambda p_W(y_0) = p_W(\lambda y_0)$, and if $\lambda < 0$ then $f(\lambda y_0) = -|\lambda|p_W(y_0) \le |\lambda|p_W(-y_0) = p_W(\lambda y_0)$,
since $p_W(y_0) + p_W(-y_0) \ge p_W(0) = 0$. By the Hahn–Banach theorem, $f$
can be extended to a linear functional $h$ on $E$ for which $h(x) \le p_W(x)$ for all
$x \in E$. If $x \in U$ then, since $h(-x_0) = p_W(-x_0) \ge 1$ and $p_W(x - x_0) < 1$,

$h(x) = h(x - x_0) - h(-x_0) \le p_W(x - x_0) - p_W(-x_0) < 0;$

now take $\phi = -h$.

(ii) Let $W = U - V$. Then $W$ is radially open, and $0 \notin W$. By (i),
there exists a linear functional $\phi$ on $E$ such that $\phi(x) > 0$ for $x \in W$: that
is, $\phi(x) > \phi(y)$ for $x \in U$, $y \in V$. Thus $\phi$ is bounded above on $V$: let
$\alpha = \sup\{\phi(y): y \in V\}$. The linear functional $\phi$ is non-zero: let $z$ be a vector
for which $\phi(z) = 1$. If $x \in U$ then, since $U$ is radially open, there exists
$\delta > 0$ such that $x - \delta z \in U$. Then $\phi(x) = \phi(x - \delta z) + \delta\phi(z) \ge \alpha + \delta > \alpha$.

(iii) Take $\phi$ as in (ii) (with $F$ replacing $V$). Since $F$ is a linear subspace,
$\phi(F) = \{0\}$ or $\mathbb{R}$. The latter is not possible, since $\phi(F)$ is bounded above.
Thus $\phi(F) = \{0\}$, and we can take $\alpha = 0$.
4.5 Normed spaces, Banach spaces and Hilbert space

Theorem 4.4.1 is essentially a real theorem. There is however an important
version which applies in both the real and the complex case. A real-valued
function $p$ on a real or complex vector space is a semi-norm if it is sub-additive and if $p(\lambda x) = |\lambda|p(x)$ for every scalar $\lambda$ and vector $x$. A semi-norm is necessarily non-negative, since $0 = p(0) \le p(x) + p(-x) = 2p(x)$. A
semi-norm $p$ is a norm if in addition $p(x) \ne 0$ for $x \ne 0$.

A norm is often denoted by a symbol such as $\|x\|$. $(E, \|\cdot\|)$ is then a
normed space. The function $d(x, y) = \|x - y\|$ is a metric on $E$; if $E$ is
complete under this metric, then $(E, \|\cdot\|)$ is called a Banach space.

Many of the inequalities that we shall establish involve normed spaces and
Banach spaces, which are the building blocks of functional analysis. Let us
give some important fundamental examples. We shall meet many more.

Let $B(S)$ denote the space of bounded functions on a set $S$. $B(S)$ is a
Banach space under the supremum norm $\|f\|_\infty = \sup_{s\in S}|f(s)|$. It is not
separable if $S$ is infinite. We write $l_\infty$ for $B(\mathbb{N})$. The space

$c_0 = \{x \in l_\infty : x_n \to 0 \text{ as } n \to \infty\}$

is a separable closed linear subspace of $l_\infty$, and is therefore also a Banach
space under the norm $\|\cdot\|_\infty$. If $(X, \tau)$ is a topological space then the space
$C_b(X)$ of bounded continuous functions on $X$ is a closed linear subspace of
$B(X)$ and is therefore also a Banach space under the norm $\|\cdot\|_\infty$.

Suppose that $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ are normed spaces. It is a standard
result of linear analysis that a linear mapping $T$ from $E$ to $F$ is continuous
if and only if

$\|T\| = \sup_{\|x\|_E \le 1} \|T(x)\|_F < \infty,$

that $L(E, F)$, the set of all continuous linear mappings from $E$ to $F$, is a
vector space under the usual operations, and that $\|T\|$ is a norm on $L(E, F)$.
Further, $L(E, F)$ is a Banach space if and only if $F$ is. In particular $E'$, the
dual of $E$, the space of all continuous linear functionals on $E$ (continuous
linear mappings from $E$ into the underlying field), is a Banach space under
the norm $\|\phi\|' = \sup\{|\phi(x)|: \|x\|_E \le 1\}$.

Standard results about normed spaces and Banach spaces are derived in
Exercises 4.9–4.13.
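For a linear map between finite-dimensional spaces the supremum defining $\|T\|$ can be computed directly. The following sketch is my own illustration, with the assumption that $E = F = \mathbb{R}^2$ under the sup norm and $T$ given by a matrix; for this norm the operator norm is the maximum absolute row sum.

```python
# Operator norm of a matrix T : (R^2, sup norm) -> (R^2, sup norm).
import itertools

T = [[1.0, -2.0], [0.5, 3.0]]

def apply(T, x):
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

# By convexity the sup over the unit ball is attained at a sign
# vector in {-1, 1}^2, so a finite search suffices here.
norm = max(max(abs(v) for v in apply(T, x))
           for x in itertools.product([-1.0, 1.0], repeat=2))

row_sum = max(sum(abs(t) for t in row) for row in T)
assert norm == row_sum      # both equal the maximal row sum
```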
Suppose that $f, g \in L_1(\Omega, \Sigma, \mu)$. Integrating the inequality $|f(x)+g(x)| \le |f(x)| + |g(x)|$ and the equation $|\lambda f(x)| = |\lambda|\,|f(x)|$, we see that $L_1(\Omega, \Sigma, \mu)$
is a vector space, and that the function $\|f\|_1 = \int |f|\,d\mu$ is a seminorm on it.
But $\int |f|\,d\mu = 0$ only if $f = 0$ almost everywhere, and so $\|\cdot\|_1$ is in fact a
norm. We shall see later (Theorem 5.1.1) that $L_1$ is a Banach space under
this norm.

If $V$ is an inner-product space, then, as we have seen in Chapter 2,
$\|x\| = \langle x, x\rangle^{1/2}$ is a norm on $V$. If $V$ is complete under this norm, $V$
is called a Hilbert space. Again, we shall see later (Theorem 5.1.1) that
$L_2 = L_2(\Omega, \Sigma, \mu)$ is a Hilbert space. A large amount of analysis, including
the mathematical theory of quantum mechanics, takes place on a Hilbert
space. Let us establish two fundamental results.
Proposition 4.5.1 Suppose that $V$ is an inner-product space. If $x, y \in V$,
let $l_y(x) = \langle x, y\rangle$. Then $l_y$ is a continuous linear functional on $V$, and

$\|l_y\|' = \sup\{|l_y(x)|: \|x\| \le 1\} = \|y\|.$

The mapping $l: y \to l_y$ is an antilinear isometry of $V$ into the dual space
$V'$: that is, $\|l_y\|' = \|y\|$ for each $y \in V$.

Proof Since the inner product is sesquilinear, $l_y$ is a linear functional on
$V$. By the Cauchy–Schwarz inequality, $|l_y(x)| \le \|x\|\,\|y\|$, so that $l_y$ is
continuous, and $\|l_y\|' \le \|y\|$. On the other hand, $l_0 = 0$, and if $y \ne 0$ and
$z = y/\|y\|$ then $\|z\| = 1$ and $l_y(z) = \|y\|$, so that $\|l_y\|' = \|y\|$. Finally, $l$ is
antilinear, since the inner product is sesquilinear.
When $V$ is complete, we can say more.

Theorem 4.5.1 (The Fréchet–Riesz representation theorem) Suppose that $\phi$ is a continuous linear functional on a Hilbert space $H$. Then
there is a unique element $y \in H$ such that $\phi(x) = \langle x, y\rangle$.

Proof The theorem asserts that the antilinear map $l$ of the previous proposition maps $H$ onto its dual $H'$. If $\phi = 0$, we can take $y = 0$. Otherwise, by
scaling (considering $\phi/\|\phi\|'$), we can suppose that $\|\phi\|' = 1$. Then for each
$n$ there exists $y_n$ with $\|y_n\| \le 1$ such that $\phi(y_n)$ is real and $\phi(y_n) \ge 1 - 1/n$.
Since $\phi(y_n + y_m) \ge 2 - 1/n - 1/m$, $\|y_n + y_m\| \ge 2 - 1/n - 1/m$. We now
apply the parallelogram law:

$\|y_n - y_m\|^2 = 2\|y_n\|^2 + 2\|y_m\|^2 - \|y_n + y_m\|^2 \le 4 - (2 - 1/n - 1/m)^2 < 4(1/n + 1/m).$

Thus $(y_n)$ is a Cauchy sequence: since $H$ is complete, $y_n$ converges to some
$y$. Then $\|y\| = \lim_n \|y_n\| \le 1$ and $\phi(y) = \lim_n \phi(y_n) = 1$, so that
$\|y\| = 1$. We claim that $\phi(x) = \langle x, y\rangle$, for all $x \in H$.

First, consider $z \ne 0$ for which $\langle z, y\rangle = 0$. Now $\|y + \lambda z\|^2 = 1 + |\lambda|^2\|z\|^2$
and $\phi(y + \lambda z) = 1 + \lambda\phi(z)$, so that $|1 + \lambda\phi(z)|^2 \le 1 + |\lambda|^2\|z\|^2$ for all scalars
$\lambda$. Setting $\lambda = \overline{\phi(z)}/\|z\|^2$, we see that

$\left(1 + \dfrac{|\phi(z)|^2}{\|z\|^2}\right)^2 \le 1 + \dfrac{|\phi(z)|^2}{\|z\|^2},$

so that $\phi(z) = 0$. Suppose that $x \in H$. Let $z = x - \langle x, y\rangle y$, so that
$\langle z, y\rangle = 0$. Then $\phi(x) = \langle x, y\rangle\phi(y) + \phi(z) = \langle x, y\rangle$. Thus $y$ has the required
property. This shows that the mapping $l$ of the previous proposition is
surjective. Since $l$ is an isometry, it is one-one, and so $y$ is unique.

We shall not develop the rich geometric theory of Hilbert spaces (see
[DuS 88] or [Bol 90]), but Exercises 4.5–4.8 establish results that we shall
use.
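In finite dimensions the representing vector can be read off directly. The sketch below is my own illustration, under the assumption $H = \mathbb{R}^3$ with the usual inner product: any linear functional $\phi(x) = \sum_i c_i x_i$ is represented by the vector $y = (c_1, c_2, c_3)$.

```python
# Frechet-Riesz in H = R^3: phi(x) = sum c_i x_i is <x, y> with y = c.
c = [2.0, -1.0, 0.5]

def phi(x):
    return sum(ci * xi for ci, xi in zip(c, x))

y = c                                   # the representing vector
x = [1.0, 4.0, -2.0]
inner = sum(a * b for a, b in zip(x, y))
assert phi(x) == inner                  # phi(x) = <x, y>
```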
4.6 The Hahn–Banach theorem for normed spaces

Theorem 4.6.1 Suppose that $p$ is a semi-norm on a real or complex vector
space $E$, that $F$ is a linear subspace of $E$ and that $f$ is a linear functional on
$F$ satisfying $|f(x)| \le p(x)$ for all $x \in F$. Then there is a linear functional $h$
on $E$ such that

$h(x) = f(x)$ for $x \in F$ and $|h(y)| \le p(y)$ for $y \in E$.

Proof In the real case, $p$ is a sublinear functional on $E$ which satisfies
$p(-x) = p(x)$. By Theorem 4.4.1, there is a linear functional $h$ on $E$ which
satisfies $h(x) \le p(x)$. Then

$|h(x)| = \max(h(x), h(-x)) \le \max(p(x), p(-x)) = p(x).$

We use Theorem 4.4.1 to deal with the complex case, too. Let $f_R(x)$
be the real part of $f(x)$. Then $f_R$ is a real linear functional on $E$, when
$E$ is considered as a real space, and $|f_R(x)| \le p(x)$ for all $x \in F$, and
so there exists a real linear functional $k$ on $E$ extending $f_R$ and satisfying
$k(x) \le p(x)$ for all $x$. Set $h(x) = k(x) - ik(ix)$. We show that $h$ has the
required properties. First, $h$ is a complex linear functional on $E$: $h(x+y) = h(x) + h(y)$, $h(\lambda x) = \lambda h(x)$ when $\lambda$ is real, and

$h(ix) = k(ix) - ik(-x) = k(ix) + ik(x) = ih(x).$

Next, if $y \in F$ and $f(y) = re^{i\theta}$, then $f(e^{-i\theta}y) = r = k(e^{-i\theta}y)$ and
$f(ie^{-i\theta}y) = ir$ so that $k(ie^{-i\theta}y) = 0$; thus $h(e^{-i\theta}y) = r = f(e^{-i\theta}y)$, and
so $h(y) = f(y)$: thus $h$ extends $f$. Finally, if $h(x) = re^{i\theta}$ then

$|h(x)| = r = h(e^{-i\theta}x) = k(e^{-i\theta}x) \le p(e^{-i\theta}x) = p(x).$
This theorem is the key to the duality theory of normed spaces (and indeed
of locally convex spaces, though we won't discuss these).

Corollary 4.6.1 Suppose that $x$ is a non-zero vector in a normed space
$(E, \|\cdot\|)$. Then there exists a linear functional $\phi$ on $E$ such that

$\phi(x) = \|x\|, \qquad \|\phi\|' = \sup_{\|y\|\le 1}|\phi(y)| = 1.$

Proof Take $F = \mathrm{span}\,(x)$, and set $f(\lambda x) = \lambda\|x\|$. Then $f$ is a linear
functional on $F$, and $|f(\lambda x)| = |\lambda|\,\|x\| = \|\lambda x\|$. Thus $f$ can be extended to
a linear functional $\phi$ on $E$ satisfying $|\phi(y)| \le \|y\|$, for $y \in E$. Thus $\|\phi\|' \le 1$.
As $\phi(x/\|x\|) = 1$, $\|\phi\|' = 1$.

The dual $E''$ of $E'$ is called the bidual of $E$. The next corollary is an
immediate consequence of the preceding one, once the linearity properties
have been checked.

Corollary 4.6.2 Suppose that $(E, \|\cdot\|)$ is a normed space. If $x \in E$ and
$\phi \in E'$, let $E_x(\phi) = \phi(x)$. Then $E_x \in E''$ and the mapping $x \to E_x$ is a
linear isometry of $E$ into $E''$.
We now have a version of the separation theorem for normed spaces.

Theorem 4.6.2 (The separation theorem: II) Suppose that $U$ is a
non-empty open convex subset of a real normed space $(E, \|\cdot\|_E)$.

(i) If $0 \notin U$ there exists a continuous linear functional $\phi$ on $E$ for which
$\phi(x) > 0$ for $x \in U$.

(ii) If $V$ is a non-empty convex subset of $E$ disjoint from $U$ there exists a
continuous linear functional $\phi$ on $E$ and a real number $\alpha$ for which $\phi(x) > \alpha$
for $x \in U$ and $\phi(y) \le \alpha$ for $y \in V$.

(iii) If $F$ is a linear subspace of $E$ disjoint from $U$ there exists a continuous
linear functional $\phi$ on $E$ for which $\phi(x) > 0$ for $x \in U$ and $\phi(y) = 0$ for
$y \in F$.
Proof $U$ is radially open, and so by Theorem 4.4.2 there exists a linear
functional $\phi$ on $E$ for which $\phi(x) > 0$ for $x \in U$. We show that $\phi$ is
continuous: inspection of the proof of Theorem 4.4.2 then shows that (ii)
and (iii) are also satisfied.

Let $x_0 \in U$. Since $U$ is open, there exists $r > 0$ such that if $\|x - x_0\|_E \le r$
then $x \in U$. We show that if $\|x\|_E \le 1$ then $|\phi(x)| < \phi(x_0)/r$. Suppose
not, so that there exists $x_1$ with $\|x_1\|_E \le 1$ and $|\phi(x_1)| \ge \phi(x_0)/r$. Let
$y = x_0 - r(\phi(x_1)/|\phi(x_1)|)x_1$. Then $y \in U$ and $\phi(y) = \phi(x_0) - r|\phi(x_1)| \le 0$,
giving the required contradiction.
We also have the following metric result.

Theorem 4.6.3 (The separation theorem: III) Suppose that $A$ is a
non-empty closed convex subset of a real normed space $(E, \|\cdot\|_E)$, and that
$x_0$ is a point of $E$ not in $A$. Let $d = d(x_0, A) = \inf\{\|x_0 - a\| : a \in A\}$. Then
there exists $\phi \in E'$ with $\|\phi\|' = 1$ such that $\phi(x_0) \ge \phi(a) + d$ for all $a \in A$.

Proof We apply Theorem 4.6.2 (ii) to the disjoint convex sets $x_0 + dU$ and
$A$, where $U = \{x \in E: \|x\| < 1\}$. There exists a continuous linear functional
$\psi$ on $E$ and a real number $\alpha$ such that $\psi(a) \le \alpha$ for $a \in A$ and $\psi(x_0 + x) > \alpha$
for $\|x\|_E < d$. Let $\phi = \psi/\|\psi\|'$, so that $\|\phi\|' = 1$. Suppose that $a \in A$
and that $0 < \epsilon < 1$. There exists $y \in E$ with $\|y\| < 1$ such that $\phi(y) > 1 - \epsilon$.
Then $\phi(x_0) - (1-\epsilon)d > \phi(x_0 - dy) > \phi(a)$. Since this holds for all $0 < \epsilon < 1$,
$\phi(x_0) \ge \phi(a) + d$.
We also have the following normed-space version of Corollary 4.4.1.

Corollary 4.6.3 Suppose that $f$ is a continuous convex function on an open
convex subset $C$ of a real normed space $(E, \|\cdot\|)$ and that $x \in C$. Then there
exists a continuous affine function $a$ such that $a(x) = f(x)$ and $a(y) \le f(y)$
for $y \in C$.

Proof By Corollary 4.4.1, there exists an affine function $a$ such that $a(x) = f(x)$ and $a(y) \le f(y)$ for $y \in C$. We need to show that $a$ is continuous.
We can write $a(z) = f(x) + \phi(z - x)$, where $\phi$ is a linear functional on $E$.
Given $\epsilon > 0$, there exists $\delta > 0$ such that if $\|z\| < \delta$ then $x + z \in C$ and
$|f(x + z) - f(x)| < \epsilon$. Then if $\|z\| < \delta$,

$f(x) + \phi(z) = a(x + z) \le f(x + z) < f(x) + \epsilon,$

so that $\phi(z) < \epsilon$. But also $\|-z\| < \delta$, so that $-\phi(z) = \phi(-z) < \epsilon$, and
$|\phi(z)| < \epsilon$. Thus $\phi$ is continuous at 0, and is therefore continuous (Exercise
4.9); so therefore is $a$.
4.7 Barycentres and weak integrals

We now return to Jensen's inequality, and consider what happens on Banach
spaces. Once again, we must first consider barycentres. Suppose that $\mu$ is a
probability measure defined on the Borel sets of a real Banach space $(E, \|\cdot\|)$.
If $\phi \in E'$ then $\phi$ is Borel measurable. Suppose that each $\phi \in E'$ is in $L_1(\mu)$.
Let $I_\mu(\phi) = \int_E \phi(x)\,d\mu(x)$. Then $I_\mu$ is a linear functional on $E'$. If there
exists $\bar{x}$ in $E$ such that $I_\mu(\phi) = \phi(\bar{x})$ for all $\phi \in E'$, then $\bar{x}$ is called the
barycentre of $\mu$.

A barycentre need not exist: but in fact if $\mu$ is a probability measure
defined on the Borel sets of a real Banach space $(E, \|\cdot\|)$, and $\mu$ is supported
on a bounded closed set $B$ (that is, $\mu(E \setminus B) = 0$), then $\mu$ has a barycentre
in $E$.

Here is another version of Jensen's inequality.

Theorem 4.7.1 (Jensen's inequality: III) Suppose that $\mu$ is a probability
measure on the Borel sets of a separable real normed space $E$, and that $\mu$ has
a barycentre $\bar{x}$. If $f$ is a continuous convex function on $E$ with $\int_E f^+\,d\mu < \infty$
then $f(\bar{x}) \le \int_E f\,d\mu$. If $f$ is strictly convex then equality holds if and only
if $\mu = \delta_{\bar{x}}$.

Proof The proof is exactly the same as Theorem 4.2.1. Corollary 4.6.3
ensures that the affine function that we obtain is continuous.
Besides considering measures defined on a Banach space, we shall also
consider functions taking values in a Banach space. Let us describe here
what we need to know.

Theorem 4.7.2 (Pettis' theorem) Suppose that $(\Omega, \Sigma, \mu)$ is a measure
space, and that $g: \Omega \to (E, \|\cdot\|)$ is a mapping of $\Omega$ into a Banach space
$(E, \|\cdot\|)$. The following are equivalent:

(i) $g^{-1}(B) \in \Sigma$, for each Borel set $B$ in $E$, and there exists a sequence $(g_n)$
of simple $E$-valued measurable functions which converges pointwise almost
everywhere to $g$.

(ii) $g$ is weakly measurable (that is, $\phi(g)$ is measurable for each $\phi$ in $E'$)
and there exists a closed separable subspace $E_0$ of $E$ such that $g(\omega) \in E_0$ for
almost all $\omega$.

If these equivalent conditions hold, we say that $g$ is strongly measurable.
Now suppose that $g$ is strongly measurable and that $I \in E$. We say that $g$
is weakly integrable, with weak integral $I$, if $\phi(g) \in L_1(\mu)$, and $\int_\Omega \phi(g)\,d\mu = \phi(I)$, for each $\phi \in E'$. Note that when $\mu$ is a probability measure this simply
states that $I$ is the barycentre of the image measure $g(\mu)$, which is the Borel
measure on $E$ defined by $g(\mu)(B) = \mu(g^{-1}(B))$ for each Borel set $B$ in $E$.

By contrast, we say that a measurable function $g$ is Bochner integrable if
there exists a sequence $(g_n)$ of simple functions such that $\int_\Omega \|g - g_n\|\,d\mu \to 0$
as $n \to \infty$. Then $\int g_n\,d\mu$ (defined in the obvious way) converges in $E$, and
we define the Bochner integral $\int g\,d\mu$ as the limit. A measurable function
$g$ is Bochner integrable if and only if $\int \|g\|\,d\mu < \infty$. A Bochner integrable
function is weakly integrable, and the Bochner integral is then the same as
the weak integral.
We conclude this chapter with the following useful mean-value inequality.

Proposition 4.7.1 (The mean-value inequality) Suppose that
$g: (\Omega, \Sigma, \mu) \to (E, \|\cdot\|)$ is weakly integrable, with weak integral $I$. Then
$\|I\| \le \int_\Omega \|g\|\,d\mu$.

Proof There exists an element $\phi \in E'$ with $\|\phi\|' = 1$ such that

$\|I\| = \phi(I) = \int \phi(g)\,d\mu.$

Then since $|\phi(g)| \le \|g\|$,

$\|I\| \le \int |\phi(g)|\,d\mu \le \int \|g\|\,d\mu.$
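On a finite measure space the mean-value inequality reduces to a finite sum, which can be checked directly. The example below is my own sketch, under the assumptions of counting measure on three points and $E = \mathbb{R}^2$ with the Euclidean norm.

```python
# Mean-value inequality on a finite space: ||sum g|| <= sum ||g||.
vals = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]       # values of g

I = [sum(v[i] for v in vals) for i in range(2)]    # the (weak) integral

def norm(x):
    return sum(t * t for t in x) ** 0.5

total = sum(norm(v) for v in vals)                 # integral of ||g||
assert norm(I) <= total
```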
4.8 Notes and remarks

Jensen proved versions of his inequality in [Jen 06], a landmark in convex
analysis. He wrote: "It seems to me that the notion of convex function is
almost as fundamental as these: positive function, increasing function. If
I am not mistaken in this then the notion should take its place in elementary
accounts of real functions."

The Hahn–Banach theorem for real vector spaces was proved independently by Hahn [Hah 27] and Banach [Ban 29]. The complex version was
proved several years later, by Bohnenblust and Sobczyk [BoS 38].

Details of the results described in Section 4.7 are given in [DiU 77].
Exercises

4.1 (i) Use Jensen's inequality to show that if $x > 0$ then

$\dfrac{2x}{2+x} < \log(1+x) < \dfrac{2x + x^2}{2 + 2x}.$

Let $d_n = (n + 1/2)\log(1 + 1/n) - 1$. Show that

$0 < d_n < 1/4n(n+1).$

Let $r_n = n!e^n/n^{n+1/2}$. Calculate $\log(r_{n+1}/r_n)$, and show that $r_n$
decreases to a finite limit $C$. Show that $r_n \le e^{1/4n}C$.

(ii) Let $I_n = \int_0^{\pi/2} \sin^n\theta\,d\theta$. Show that $(I_n)$ is a decreasing sequence
of positive numbers, and show, by integration by parts, that $nI_n = (n-1)I_{n-2}$ for $n \ge 2$. Show that

$\dfrac{I_{2n+1}}{I_{2n}} = \dfrac{2^{4n+1}(n!)^4}{\pi(2n)!(2n+1)!} \to 1$

as $n \to \infty$, and deduce that $C = \sqrt{2\pi}$. Thus $n! \sim \sqrt{2\pi}\,n^{n+1/2}/e^n$.
This is Stirling's formula. Another derivation of the value of $C$ will
be given in Theorem 13.6.1.
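The conclusions of this exercise can be checked numerically. The sketch below is my own, and is a sanity check rather than a proof: it verifies that $r_n = n!e^n/n^{n+1/2}$ decreases towards $C = \sqrt{2\pi}$ and respects the bound $r_n \le e^{1/4n}C$.

```python
# Numerical check of Exercise 4.1: r_n decreases to sqrt(2*pi).
import math

def r(n):
    return math.factorial(n) * math.e ** n / n ** (n + 0.5)

C = math.sqrt(2 * math.pi)
assert r(10) > r(20) > r(40) > C          # decreasing towards C
assert r(40) <= math.e ** (1 / 160) * C   # the bound r_n <= e^{1/4n} C
```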
4.2 Suppose that $f$ is a convex function defined on an open interval $I$ of
the real line. Show that $D^+f$ and $D^-f$ are increasing functions, that
$D^+f$ is right-continuous and $D^-f$ left-continuous, and that $D^-f(x) \ge D^+f(y)$ if $x > y$. Show that $D^+f(x)$ and $D^-f(x)$ are equal and
continuous, except at a countable set of points where

$\lim_{h\searrow 0} D^+f(x-h) = D^-f(x) < D^+f(x) = \lim_{h\searrow 0} D^-f(x+h).$

Show that $f$ is differentiable, except at this countable set of points.

4.3 Suppose that $f$ is a real-valued function defined on an open interval
$I$ of the real line. Show that $f$ is convex if and only if there exists an
increasing function $g$ on $I$ such that

$f(x) = \int_{x_0}^{x} g(t)\,dt + c,$

where $x_0$ is a point of $I$ and $c$ is a constant.
4.4 Suppose that $(\Omega, \Sigma, P)$ is a probability space, and that $f$ is a non-negative measurable function on $\Omega$ for which

$E(\log^+ f) = \int_\Omega \log^+ f\,dP = \int_{(f>1)} \log f\,dP < \infty,$

so that $E(\log f) < \infty$. Let $G(f) = \exp(E(\log f))$, so that
$0 \le G(f) < \infty$. $G(f)$ is the geometric mean of $f$. Explain this
terminology. Show that $G(f) \le E(f)$.
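The inequality of this exercise can be seen on a finite probability space. The sketch below is my own illustration, under the assumption of a four-point space with uniform probabilities: $G(f) = \exp(E(\log f))$ never exceeds $E(f)$, by Jensen's inequality applied to the concave function $\log$.

```python
# Geometric mean vs arithmetic mean on a finite probability space.
import math

f = [0.5, 2.0, 8.0, 1.0]
P = [0.25] * 4                  # uniform probabilities (an assumption)

E_f = sum(p * x for p, x in zip(P, f))
G_f = math.exp(sum(p * math.log(x) for p, x in zip(P, f)))
assert G_f <= E_f               # Jensen: log is concave
```

Here the values have product 8, so $G(f) = 8^{1/4}$, while $E(f) = 2.875$.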
4.5 This question, and the three following ones, establish results about
Hilbert spaces that we shall use later. Suppose that $A$ is a non-empty
subset of a Hilbert space $H$. Show that $A^\perp = \{y: \langle a, y\rangle = 0 \text{ for } a \in A\}$
is a closed linear subspace of $H$.

4.6 Suppose that $C$ is a non-empty closed convex subset of a Hilbert space
$H$ and that $x \in H$. Use an argument similar to that of Theorem
4.5.1 to show that there is a unique point $c \in C$ with $\|x - c\| = \inf\{\|x - y\|: y \in C\}$.

4.7 Suppose that $F$ is a closed linear subspace of a Hilbert space $H$ and
that $x \in H$.

(i) Let $P(x)$ be the unique nearest point to $x$ in $F$. Show that
$x - P(x) \in F^\perp$, and that if $y \in F$ and $x - y \in F^\perp$ then $y = P(x)$.

(ii) Show that $P: H \to H$ is linear and that if $F \ne \{0\}$ then
$\|P\| = 1$. $P$ is the orthogonal projection of $H$ onto $F$.

(iii) Show that $H = F \oplus F^\perp$, and that if $P$ is the orthogonal projection of $H$ onto $F$ then $I - P$ is the orthogonal projection of $H$ onto
$F^\perp$.
4.8 Suppose that $(x_n)$ is a linearly independent sequence of elements of a
Hilbert space $H$.

(i) Let $P_0 = 0$, let $P_n$ be the orthogonal projection of $H$ onto
$\mathrm{span}\,(x_1, \ldots, x_n)$, and let $Q_n = I - P_n$. Let $y_n = Q_{n-1}(x_n)/\|Q_{n-1}(x_n)\|$. Show that $(y_n)$ is an orthonormal sequence in $H$:
$\|y_n\| = 1$ for each $n$, and $\langle y_m, y_n\rangle = 0$ for $m \ne n$. Show that
$\mathrm{span}\,(y_1, \ldots, y_n) = \mathrm{span}\,(x_1, \ldots, x_n)$, for each $n$.

(ii) [Gram–Schmidt orthonormalization] Show that the sequence
$(y_n)$ can be defined recursively by setting

$y_1 = x_1/\|x_1\|, \quad z_n = x_n - \sum_{i=1}^{n-1} \langle x_n, y_i\rangle y_i \quad \text{and} \quad y_n = z_n/\|z_n\|.$
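The recursion in part (ii) is directly implementable. The following is my own sketch for the special case $H = \mathbb{R}^n$ with the usual inner product, in plain Python with no external libraries.

```python
# Gram-Schmidt orthonormalization in R^n (Exercise 4.8(ii)).
def gram_schmidt(xs):
    ys = []
    for x in xs:
        # z_n = x_n - sum_i <x_n, y_i> y_i
        z = list(x)
        for y in ys:
            c = sum(a * b for a, b in zip(x, y))
            z = [zi - c * yi for zi, yi in zip(z, y)]
        nz = sum(zi * zi for zi in z) ** 0.5
        ys.append([zi / nz for zi in z])      # y_n = z_n / ||z_n||
    return ys

ys = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
dot = sum(a * b for a, b in zip(ys[0], ys[1]))
assert abs(dot) < 1e-12                               # orthogonality
assert abs(sum(v * v for v in ys[0]) - 1.0) < 1e-12   # unit norm
```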
4.9 This question, and the four following ones, establish fundamental
properties about normed spaces. Suppose that $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$
are normed spaces. Suppose that $T$ is a linear mapping from $E$ to $F$.
Show that the following are equivalent:

(i) $T$ is continuous at 0;
(ii) $T$ is continuous at each point of $E$;
(iii) $T$ is uniformly continuous;
(iv) $T$ is Lipschitz continuous at 0;
(v) $T$ is a Lipschitz function;
(vi) $\|T\| = \sup\{\|T(x)\|_F: \|x\|_E \le 1\} < \infty$.

4.10 Show that the set $L(E, F)$ of continuous linear mappings from $E$ to
$F$ is a vector space under the usual operations. Show that $\|T\| = \sup\{\|T(x)\|_F: \|x\|_E \le 1\}$ is a norm (the operator norm) on $L(E, F)$.
Show that if $(F, \|\cdot\|_F)$ is complete then $L(E, F)$ is complete under the
operator norm.
4.11 Suppose that $T \in L(E, F)$. If $\phi \in F'$ and $x \in E$, let $T'(\phi)(x) = \phi(T(x))$. Show that $T'(\phi) \in E'$ and that $\|T'(\phi)\|_{E'} \le \|T\|\,\|\phi\|_{F'}$.
Show that $T' \in L(F', E')$ and that $\|T'\| \le \|T\|$. Use Corollary 4.6.1
to show that $\|T'\| = \|T\|$. $T'$ is the transpose or conjugate of $T$.

4.12 Suppose that $\phi$ is a linear functional on a normed space $(E, \|\cdot\|_E)$.
Show that $\phi$ is continuous if and only if its null-space $\phi^{-1}(0)$ is
closed.

4.13 Suppose that $F$ is a closed linear subspace of a normed space $(E, \|\cdot\|_E)$,
and that $q: E \to E/F$ is the quotient mapping. If $x \in E$, let $d(x, F) = \inf\{\|x - y\|_E: y \in F\}$. Show that if $q(x_1) = q(x_2)$ then $d(x_1, F) = d(x_2, F)$. If $z = q(x)$, let $\|z\|_{E/F} = d(x, F)$. Show that $\|\cdot\|_{E/F}$ is a
norm on $E/F$ (the quotient norm). Show that if $E$ is complete then
$(E/F, \|\cdot\|_{E/F})$ is.
4.14 Show that the vector space $B(S)$ of all bounded (real- or complex-valued) functions on a set $S$ is complete under the norm $\|f\|_\infty = \sup\{|f(s)|: s \in S\}$, and that if $(X, \tau)$ is a topological space then the
space $C_b(X)$ of bounded continuous functions on $X$ is a closed linear
subspace of $B(X)$ and is therefore also a Banach space under the norm
$\|\cdot\|_\infty$.

4.15 Suppose that $f$ is a bounded convex function defined on an open convex subset of a normed space $E$. Show that $f$ is Lipschitz continuous.
Give an example of a convex function defined on an open convex subset
of a normed space $E$ which is not continuous.

4.16 Show that a sublinear functional is convex, and that a convex positive
homogeneous function is sublinear.

4.17 Show that the closure and the interior of a convex subset of a normed
space are convex.

4.18 Here is a version of the separation theorem for complex normed spaces.
A convex subset $A$ of a real or complex vector space is absolutely
convex if whenever $x \in A$ then $\lambda x \in A$ for all $\lambda$ with $|\lambda| \le 1$. Show that
if $A$ is a closed absolutely convex subset of a complex normed space
$(E, \|\cdot\|_E)$ and $x_0 \notin A$ then there exists a continuous linear functional
$\phi$ on $E$ with $\|\phi\|' = 1$, $\phi(x_0)$ real and

$\phi(x_0) \ge \sup_{a\in A}|\phi(a)| + d(x_0, A).$
4.19 Let $E$ be the vector space of all infinite sequences with only finitely
many non-zero terms, with the supremum norm. Let $\mu$ be defined by

$\mu(A) = \sum\{2^{-n}: e_n \in A\},$

where $e_n$ is the sequence with 1 in the $n$th place, and zeros elsewhere.
Show that $\mu$ is a probability measure on the Borel sets of $E$ which
is supported on the unit ball of $E$, and show that $\mu$ does not have a
barycentre.

4.20 Let $\mu$ be the Borel probability measure on $c_0$ defined by

$\mu(A) = \sum\{2^{-n}: 2^n e_n \in A\},$

where $e_n$ is the sequence with 1 in the $n$th place, and zeros elsewhere.
Show that $\mu$ does not have a barycentre.
5

The $L_p$ spaces

5.1 $L_p$ spaces, and Minkowski's inequality

Our study of convexity led us to consider normed spaces. We are interested
in inequalities between sequences and between functions, and this suggests
that we should consider normed spaces whose elements are sequences, or
(equivalence classes of) functions. We begin with the $L_p$ spaces.

Suppose that $(\Omega, \Sigma, \mu)$ is a measure space, and that $0 < p < \infty$. We
define $L_p(\Omega, \Sigma, \mu)$ to be the collection of those (real- or complex-valued)
measurable functions for which

$\int_\Omega |f|^p\,d\mu < \infty.$

If $f = g$ almost everywhere, then $\int_\Omega |f - g|^p\,d\mu = 0$ and $\int_\Omega |f|^p\,d\mu = \int_\Omega |g|^p\,d\mu$. We therefore identify functions which are equal almost everywhere, and denote the resulting space by $L_p = L_p(\Omega, \Sigma, \mu)$.

If $f \in L_p$ and $\lambda$ is a scalar, then $\lambda f \in L_p$. Since $|a+b|^p \le 2^p\max(|a|^p, |b|^p) \le 2^p(|a|^p + |b|^p)$, $f + g \in L_p$ if $f, g \in L_p$. Thus $L_p$ is a vector space.
Theorem 5.1.1 (i) If $1 \le p < \infty$ then $\|f\|_p = (\int |f|^p\,d\mu)^{1/p}$ is a norm on $L_p$.

(ii) If $0 < p < 1$ then $d_p(f, g) = \int |f - g|^p\,d\mu$ is a metric on $L_p$.

(iii) $(L_p, \|\cdot\|_p)$ is a Banach space for $1 \le p < \infty$ and $(L_p, d_p)$ is a complete
metric space for $0 < p < 1$.

Proof The proof depends on the facts that the function $t^p$ is convex on
$[0, \infty)$ for $1 \le p < \infty$ and is concave for $0 < p < 1$.

(i) Clearly $\|\lambda f\|_p = |\lambda|\,\|f\|_p$. If $f$ or $g$ is zero then trivially $\|f + g\|_p \le \|f\|_p + \|g\|_p$. Otherwise, let $F = f/\|f\|_p$, $G = g/\|g\|_p$, so that $\|F\|_p = \|G\|_p = 1$. Let $\lambda = \|g\|_p/(\|f\|_p + \|g\|_p)$, so that $0 < \lambda < 1$. Now

$|f + g|^p = (\|f\|_p + \|g\|_p)^p\,|(1-\lambda)F + \lambda G|^p \le (\|f\|_p + \|g\|_p)^p\,((1-\lambda)|F| + \lambda|G|)^p \le (\|f\|_p + \|g\|_p)^p\,((1-\lambda)|F|^p + \lambda|G|^p),$

since $t^p$ is convex, for $1 \le p < \infty$. Integrating,

$\int |f+g|^p\,d\mu \le (\|f\|_p + \|g\|_p)^p\left((1-\lambda)\int|F|^p\,d\mu + \lambda\int|G|^p\,d\mu\right) = (\|f\|_p + \|g\|_p)^p.$

Thus we have established Minkowski's inequality

$\left(\int|f+g|^p\,d\mu\right)^{1/p} \le \left(\int|f|^p\,d\mu\right)^{1/p} + \left(\int|g|^p\,d\mu\right)^{1/p},$

and shown that $\|\cdot\|_p$ is a norm.
(ii) If $0 < p < 1$, the function $t^{p-1}$ is decreasing on $(0, \infty)$, so that if $a$
and $b$ are non-negative, and not both 0, then

$(a+b)^p = a(a+b)^{p-1} + b(a+b)^{p-1} \le a^p + b^p.$

Integrating,

$\int|f+g|^p\,d\mu \le \int(|f|+|g|)^p\,d\mu \le \int|f|^p\,d\mu + \int|g|^p\,d\mu;$

this is enough to show that $d_p$ is a metric.

(iii) For this, we need Markov's inequality: if $f \in L_p$ and $\alpha > 0$ then
$\alpha^p I_{(|f|>\alpha)} \le |f|^p$; integrating, $\alpha^p\mu(|f| > \alpha) \le \int|f|^p\,d\mu$. Suppose that
$(f_n)$ is a Cauchy sequence. Then it follows from Markov's inequality that
$(f_n)$ is locally Cauchy in measure, and so it converges locally in measure
to a function $f$. By Proposition 1.2.2, there is a subsequence $(f_{n_k})$ which
converges almost everywhere to $f$. Now, given $\epsilon > 0$ there exists $K$ such that
$\int|f_{n_k} - f_{n_l}|^p\,d\mu < \epsilon$ for $k, l \ge K$. Then, by Fatou's lemma, $\int|f_{n_k} - f|^p\,d\mu \le \epsilon$
for $k \ge K$. This shows first that $f_{n_k} - f \in L_p$, for $k \ge K$, so that $f \in L_p$,
and secondly that $f_{n_k} \to f$ in norm as $k \to \infty$. Since $(f_n)$ is a Cauchy
sequence, it follows that $f_n \to f$ in norm, as $n \to \infty$, so that $L_p$
is complete.
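With counting measure on a finite set the integrals become sums, and both parts of the theorem can be checked numerically. The sketch below is my own sanity check, not a proof: it verifies the triangle inequality for $\|\cdot\|_p$ with several $p \ge 1$, and the triangle inequality for the metric $d_p$ with $p < 1$.

```python
# Quick checks of Theorem 5.1.1 with counting measure on three points.
def norm_p(f, p):
    return sum(abs(x) ** p for x in f) ** (1 / p)

def d_p(f, g, p):
    return sum(abs(a - b) ** p for a, b in zip(f, g))

f = [3.0, 0.0, 1.0]
g = [0.0, 4.0, 2.0]
h = [1.0, 1.0, -1.0]
fg = [a + b for a, b in zip(f, g)]

for p in (1.0, 1.5, 2.0, 4.0):        # Minkowski for p >= 1
    assert norm_p(fg, p) <= norm_p(f, p) + norm_p(g, p) + 1e-12

p = 0.5                               # d_p is a metric for p < 1
assert d_p(f, g, p) <= d_p(f, h, p) + d_p(h, g, p) + 1e-12
```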
In a similar way if $E$ is a Banach space, and $0 < p < \infty$, then we denote
by $L_p(\mu; E) = L_p(E)$ the collection of (equivalence classes of) measurable
$E$-valued functions for which $\int \|f\|^p\,d\mu < \infty$. The results of Theorem 5.1.1
carry over to these spaces, with obvious changes to the proof (replacing
absolute values by norms).

Let us also introduce the space $L_\infty = L_\infty(\Omega, \Sigma, \mu)$. A measurable function
$f$ is essentially bounded if there exists a set $B$ of measure 0 such that $f$
is bounded on $\Omega \setminus B$. If $f$ is essentially bounded, we define its essential
supremum to be

$\mathrm{ess\,sup}\,f = \inf\{t: \lambda_{|f|}(t) = \mu(|f| > t) = 0\}.$

If $f$ is essentially bounded and $g = f$ almost everywhere then $g$ is also essentially bounded, and $\mathrm{ess\,sup}\,f = \mathrm{ess\,sup}\,g$. We identify essentially bounded
functions which are equal almost everywhere; the resulting space is $L_\infty$.
$L_\infty$ is a vector space, $\|f\|_\infty = \mathrm{ess\,sup}\,|f|$ is a norm, and straightforward
arguments show that $(L_\infty, \|\cdot\|_\infty)$ is a Banach space.
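On a finite measure space the essential supremum simply discards values taken on null sets. The sketch below is my own illustration, under the assumption of point masses with one null point.

```python
# Essential supremum on a finite measure space with a null point.
mu = {0: 0.5, 1: 0.5, 2: 0.0}       # the point 2 is mu-null
f  = {0: 1.0, 1: 3.0, 2: 100.0}

def ess_sup(f, mu):
    # inf{ t : mu(|f| > t) = 0 } = max of |f| over non-null points
    return max(abs(f[w]) for w in mu if mu[w] > 0)

assert ess_sup(f, mu) == 3.0        # the value 100 on the null set is ignored
```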
5.2 The Lebesgue decomposition theorem

As an important special case, $L_2$ is a Hilbert space. We now use the Fréchet–Riesz representation theorem to prove a fundamental theorem of measure
theory.

Theorem 5.2.1 (The Lebesgue decomposition theorem) Suppose that
$(\Omega, \Sigma, \mu)$ is a measure space, and that $\nu$ is a measure on $\Sigma$ with $\nu(\Omega) < \infty$.
Then there exists a non-negative $f \in L_1(\mu)$ and a set $B$ with $\mu(B) = 0$
such that $\nu(A) = \int_A f\,d\mu + \nu(A \cap B)$ for each $A \in \Sigma$.

If we define $\nu_B(A) = \nu(A \cap B)$ for $A \in \Sigma$, then $\nu_B$ is a measure. The
measures $\mu$ and $\nu_B$ are mutually singular: we decompose $\Omega$ as $B \cup (\Omega \setminus B)$,
where $\mu(B) = 0$ and $\nu_B(\Omega \setminus B) = 0$; $\mu$ and $\nu_B$ live on disjoint sets.

Proof Let $\lambda(A) = \mu(A) + \nu(A)$; $\lambda$ is a measure on $\Sigma$. Suppose that $g \in L_2^{\mathbb{R}}(\lambda)$. Let $L(g) = \int g\,d\nu$. Then, by the Cauchy–Schwarz inequality,

$|L(g)| \le (\nu(\Omega))^{1/2}\left(\int|g|^2\,d\nu\right)^{1/2} \le (\nu(\Omega))^{1/2}\,\|g\|_{L_2(\lambda)},$

so that $L$ is a continuous linear functional on $L_2^{\mathbb{R}}(\lambda)$. By the Fréchet–Riesz
theorem, there exists an element $h \in L_2^{\mathbb{R}}(\lambda)$ such that $L(g) = \langle g, h\rangle$, for
each $g \in L_2(\lambda)$; that is, $\int_\Omega g\,d\nu = \int_\Omega gh\,d\mu + \int_\Omega gh\,d\nu$, so that

$\int_\Omega g(1-h)\,d\nu = \int_\Omega gh\,d\mu. \qquad (*)$
Taking $g$ as an indicator function $I_A$, we see that

$\nu(A) = L(I_A) = \int_A h\,d\lambda = \int_A h\,d\mu + \int_A h\,d\nu$

for each $A \in \Sigma$.

Now let $N = (h < 0)$, $G_n = (0 \le h \le 1 - 1/n)$, $G = (0 \le h < 1)$ and
$B = (h \ge 1)$. Then

$\nu(N) = \int_N h\,d\mu + \int_N h\,d\nu \le 0$, so that $\nu(N) = \mu(N) = 0$,

and

$\nu(B) = \int_B h\,d\mu + \int_B h\,d\nu \ge \mu(B) + \nu(B)$, so that $\mu(B) = 0$.

Let $f(x) = h(x)/(1 - h(x))$ for $x \in G$, and let $f(x) = 0$ otherwise. Note
that if $x \in G_n$ then $0 \le f(x) \le 1/(1 - h(x)) \le n$. If $A \in \Sigma$, then, using $(*)$,

$\nu(A \cap G_n) = \int_\Omega \dfrac{1-h}{1-h}\,I_{A\cap G_n}\,d\nu = \int_\Omega f I_{A\cap G_n}\,d\mu = \int_{A\cap G_n} f\,d\mu.$

Applying the monotone convergence theorem, we see that $\nu(A \cap G) = \int_{A\cap G} f\,d\mu = \int_A f\,d\mu$. Thus

$\nu(A) = \nu(A \cap G) + \nu(A \cap B) + \nu(A \cap N) = \int_A f\,d\mu + \nu(A \cap B).$

Taking $A = \Omega$, we see that $\int_\Omega f\,d\mu < \infty$, so that $f \in L_1(\mu)$.

This beautiful proof is due to von Neumann.
Suppose that $(\Omega, \Sigma, \mu)$ is a measure space, and that $\nu$ is a real-valued
function on $\Sigma$. We say that $\nu$ is absolutely continuous with respect to $\mu$ if,
given $\epsilon > 0$, there exists $\delta > 0$ such that if $\mu(A) < \delta$ then $|\nu(A)| < \epsilon$.

Corollary 5.2.1 (The Radon–Nikodým theorem) Suppose that $(\Omega, \Sigma, \mu)$ is a measure space, and that $\nu$ is a measure on $\Sigma$ with $\nu(\Omega) < \infty$. Then
$\nu$ is absolutely continuous with respect to $\mu$ if and only if there exists a
non-negative $f \in L_1(\mu)$ such that $\nu(A) = \int_A f\,d\mu$ for each $A \in \Sigma$.

Proof Suppose first that $\nu$ is absolutely continuous with respect to $\mu$. If
$\mu(B) = 0$ then $\nu(B) = 0$, and so the measure $\nu_B$ of the theorem is zero.

Conversely, suppose that the condition is satisfied. Let $B_n = (f > n)$. Then
by the dominated convergence theorem, $\nu(B_n) = \int_{B_n} f\,d\mu \to 0$. Suppose
that $\epsilon > 0$. Then there exists $n$ such that $\nu(B_n) < \epsilon/2$. Let $\delta = \epsilon/2n$. Then
if $\mu(A) < \delta$,

$\nu(A) = \nu(A \cap B_n) + \int_{A\cap(0\le f\le n)} f\,d\mu < \epsilon/2 + n\delta = \epsilon.$
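On a finite measure space the Radon–Nikodým representation is simply a weighted sum against a density. The sketch below is my own illustration, with $\Omega = \{0, 1, 2\}$ and the weights chosen arbitrarily.

```python
# A density f turns mu into a new measure nu(A) = sum over A of f dmu.
mu = {0: 0.2, 1: 0.3, 2: 0.5}
f  = {0: 1.0, 1: 4.0, 2: 0.0}

def nu(A):
    return sum(f[w] * mu[w] for w in A)

assert nu({0, 1, 2}) == nu({0}) + nu({1, 2})   # additivity
assert abs(nu({0, 1, 2}) - 1.4) < 1e-9         # the integral of f dmu
```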
We also need a signed version of this corollary.

Theorem 5.2.2 Suppose that $(\Omega, \Sigma, \mu)$ is a measure space, with $\mu(\Omega) < \infty$,
and that $\nu$ is a bounded absolutely continuous real-valued function on $\Sigma$
which is additive: if $A, B$ are disjoint sets in $\Sigma$ then $\nu(A \cup B) = \nu(A) + \nu(B)$.
Then there exists $f \in L_1$ such that $\nu(A) = \int_A f\,d\mu$, for each $A \in \Sigma$.

Proof If $A \in \Sigma$, let $\nu^+(A) = \sup\{\nu(B): B \subseteq A\}$. $\nu^+$ is a bounded
additive non-negative function on $\Sigma$. We shall show that $\nu^+$ is countably
additive. Suppose that $A$ is the disjoint union of $(A_i)$. Let $R_j = \bigcup_{i>j} A_i$.
Then $R_j \searrow \emptyset$, and so $\mu(R_j) \to 0$ as $j \to \infty$. By absolute continuity,
$\sup\{|\nu(B)|: B \subseteq R_j\} \to 0$ as $j \to \infty$, and so $\nu^+(R_j) \to 0$ as $j \to \infty$.
This implies that $\nu^+$ is countably additive. Thus $\nu^+$ is a measure on $\Sigma$,
which is absolutely continuous with respect to $\mu$, and so it is represented by
some $f^+ \in L_1(\mu)$. But now $\nu^+ - \nu$ is additive, non-negative and absolutely
continuous with respect to $\mu$, and so is represented by a function $f^-$. Let
$f = f^+ - f^-$. Then $f \in L_1(\mu)$ and

$\nu(A) = \nu^+(A) - (\nu^+(A) - \nu(A)) = \int_A f^+\,d\mu - \int_A f^-\,d\mu = \int_A f\,d\mu.$
5.3 The reverse Minkowski inequality

When $0 < p < 1$ and $L_p$ is infinite-dimensional then there is no norm on
$L_p$ which defines the topology on $L_p$. Indeed if $(\Omega, \Sigma, \mu)$ is atom-free there
are no non-trivial convex open sets, and so no non-zero continuous linear
functionals (see Exercise 5.4). In this case, the inequality in Minkowski's
inequality is reversed.

Proposition 5.3.1 (The reverse Minkowski inequality) Suppose that
$0 < p < 1$ and that $f$ and $g$ are non-negative functions in $L_p$. Then

$\left(\int f^p\,d\mu\right)^{1/p} + \left(\int g^p\,d\mu\right)^{1/p} \le \left(\int (f+g)^p\,d\mu\right)^{1/p}.$

Proof Let $q = 1/p$ and let $w = (u, v) = (f^p, g^p)$. Thus $w$ takes values in
$\mathbb{R}^2$, which we equip with the norm $\|(x, y)\|_q = (|x|^q + |y|^q)^{1/q}$. Let

$I(w) = \int w\,d\mu = \left(\int u\,d\mu, \int v\,d\mu\right).$

Then

$\|I(w)\|_q^q = \left(\int u\,d\mu\right)^q + \left(\int v\,d\mu\right)^q = \left(\int f^p\,d\mu\right)^{1/p} + \left(\int g^p\,d\mu\right)^{1/p},$

while

$\left(\int \|w\|_q\,d\mu\right)^q = \left(\int (u^q + v^q)^{1/q}\,d\mu\right)^q = \left(\int (f+g)^p\,d\mu\right)^{1/p},$

so that the result follows from the mean-value inequality (Proposition 4.7.1).

In the same way, the inequality in Proposition 4.7.1 is reversed.

Proposition 5.3.2 Suppose that $0 < p < 1$ and that $f$ and $g$ are non-negative functions in $L_1$. Then

$\int (f^p + g^p)^{1/p}\,d\mu \le \left(\left(\int f\,d\mu\right)^p + \left(\int g\,d\mu\right)^p\right)^{1/p}.$

Proof As before, let $q = 1/p$ and let $u = f^p$, $v = g^p$. Then $u, v \in L_q$ and,
using Minkowski's inequality,

$\int (f^p + g^p)^{1/p}\,d\mu = \int (u+v)^q\,d\mu = \|u+v\|_q^q \le (\|u\|_q + \|v\|_q)^q = \left(\left(\int f\,d\mu\right)^p + \left(\int g\,d\mu\right)^p\right)^{1/p}.$
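The reversal is easy to observe numerically. The sketch below is my own instance of Proposition 5.3.1 with counting measure on three points: for $0 < p < 1$ and non-negative sequences, the sum of the quasi-norms does not exceed the quasi-norm of the sum.

```python
# Reverse Minkowski with counting measure, p = 1/2.
p = 0.5
f = [3.0, 0.0, 1.0]
g = [0.0, 4.0, 2.0]

def quasi_norm(f, p):
    return sum(x ** p for x in f) ** (1 / p)

lhs = quasi_norm(f, p) + quasi_norm(g, p)
rhs = quasi_norm([a + b for a, b in zip(f, g)], p)
assert lhs <= rhs            # the triangle inequality runs backwards
```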
5.4 Hölder's inequality

If $1 < p < \infty$, we define the conjugate index $p'$ to be $p' = p/(p-1)$. Then
$1/p + 1/p' = 1$, so that $p$ is the conjugate index of $p'$. We also define $\infty$ to
be the conjugate index of 1, and 1 to be the conjugate index of $\infty$.

Note that, by Proposition 4.1.3, if $p$ and $p'$ are conjugate indices, and $t$
and $u$ are non-negative, then

$tu \le \dfrac{t^p}{p} + \dfrac{u^{p'}}{p'},$

with equality if and only if $t^p = u^{p'}$. We use this to prove Hölder's inequality. This inequality provides a natural and powerful generalization of the
Cauchy–Schwarz inequality.

We define the signum $\mathrm{sgn}(z)$ of a complex number $z$ as $\bar{z}/|z|$ if $z \ne 0$, and
0 if $z = 0$.
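The displayed inequality (often called Young's inequality) and its equality case can be probed on a grid. The following is my own sketch, a numerical sanity check rather than a proof, with an arbitrarily chosen $p$.

```python
# Young's inequality tu <= t^p/p + u^{p'}/p', with equality when t^p = u^{p'}.
p = 2.5
q = p / (p - 1)                 # the conjugate index p'

def gap(t, u):
    return t ** p / p + u ** q / q - t * u

for i in range(1, 20):          # the gap is non-negative on a grid
    for j in range(1, 20):
        assert gap(i / 5, j / 5) >= -1e-12

t = 1.3
u = t ** (p / q)                # then u^{p'} = t^p: the equality case
assert abs(gap(t, u)) < 1e-9
```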
Theorem 5.4.1 (Hölder's inequality) Suppose that $1 < p < \infty$, that
$f \in L_p$ and $g \in L_{p'}$. Then $fg \in L_1$, and

$\left|\int fg\,d\mu\right| \le \int |fg|\,d\mu \le \|f\|_p\,\|g\|_{p'}.$

Equality holds throughout if and only if either $\|f\|_p\,\|g\|_{p'} = 0$, or $g = \alpha\,\mathrm{sgn}(f)|f|^{p-1}$ almost everywhere, where $\alpha \ne 0$.

Proof The result is trivial if either $f$ or $g$ is zero. Otherwise, by scaling, it is
enough to consider the case where $\|f\|_p = \|g\|_{p'} = 1$. Then by the inequality
above $|fg| \le |f|^p/p + |g|^{p'}/p'$; integrating,

$\int |fg|\,d\mu \le \int |f|^p/p\,d\mu + \int |g|^{p'}/p'\,d\mu = 1/p + 1/p' = 1.$

Thus $fg \in L_1(\mu)$ and $|\int fg\,d\mu| \le \int |fg|\,d\mu$.

If $g = \mathrm{sgn}(f)|f|^{p-1}$ almost everywhere, then $fg = |fg| = |f|^p = |g|^{p'}$
almost everywhere, so that equality holds.

Conversely, suppose that

$\left|\int fg\,d\mu\right| = \int |fg|\,d\mu = \|f\|_p\,\|g\|_{p'}.$

Then, again by scaling, we need only consider the case where $\|f\|_p = \|g\|_{p'} = 1$. Since $|\int fg\,d\mu| = \int |fg|\,d\mu$, there exists $\theta$ such that $e^{i\theta}fg = |fg|$ almost
everywhere. Since

$\int |fg|\,d\mu = 1 = \int |f|^p/p\,d\mu + \int |g|^{p'}/p'\,d\mu \quad\text{and}\quad |f|^p/p + |g|^{p'}/p' \ge |fg|,$

$|fg| = |f|^p/p + |g|^{p'}/p'$ almost everywhere, and so $|f|^p = |g|^{p'}$ almost
everywhere. Thus $|g| = |f|^{p/p'} = |f|^{p-1}$ almost everywhere, and $g = e^{-i\theta}\,\mathrm{sgn}(f)|f|^{p-1}$
almost everywhere.
52 The L
p
spaces
Corollary 5.4.1 If $f \in L_p$ then
$$\|f\|_p = \sup\Bigl\{\int |fg|\,d\mu : \|g\|_{p'} \le 1\Bigr\} = \sup\Bigl\{\Bigl|\int fg\,d\mu\Bigr| : \|g\|_{p'} \le 1\Bigr\},$$
and the supremum is attained.

Proof The result is trivially true if $f = 0$; let us suppose that $f \ne 0$. Certainly
$$\|f\|_p \ge \sup\Bigl\{\int |fg|\,d\mu : \|g\|_{p'} \le 1\Bigr\} \ge \sup\Bigl\{\Bigl|\int fg\,d\mu\Bigr| : \|g\|_{p'} \le 1\Bigr\},$$
by Hölder's inequality. Let $h = |f|^{p-1}\,\mathrm{sgn}\,f$. Then
$$fh = |fh| = |f|^p = |h|^{p'},$$
so that $h \in L_{p'}$ and $\|h\|_{p'} = \|f\|_p^{p/p'}$. Let $g = h/\|h\|_{p'}$, so that $\|g\|_{p'} = 1$. Then
$$\int fg\,d\mu = \int |fg|\,d\mu = \int |f|^p\,\|f\|_p^{-p/p'}\,d\mu = \|f\|_p^p / \|f\|_p^{p/p'} = \|f\|_p.$$
Thus
$$\|f\|_p = \sup\Bigl\{\int |fg|\,d\mu : \|g\|_{p'} \le 1\Bigr\} = \sup\Bigl\{\Bigl|\int fg\,d\mu\Bigr| : \|g\|_{p'} \le 1\Bigr\},$$
and the supremum is attained.
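Hölder's inequality and the attainment of the supremum in Corollary 5.4.1 can be verified numerically. This sketch (my addition; `lp_norm` and `sgn` are ad hoc helpers, and the real signum is used) checks both on random real vectors with counting measure:

```python
import random

def lp_norm(f, p):
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

def sgn(x):
    # real signum (for complex z the book's convention is sgn z = conj(z)/|z|)
    return 0.0 if x == 0 else (1.0 if x > 0 else -1.0)

random.seed(2)
p = 3.0
q = p / (p - 1)                                  # the conjugate index
f = [random.uniform(-2.0, 2.0) for _ in range(8)]
g = [random.uniform(-2.0, 2.0) for _ in range(8)]

# Holder: |sum fg| <= sum |fg| <= ||f||_p ||g||_{p'}
assert abs(sum(a * b for a, b in zip(f, g))) <= sum(abs(a * b) for a, b in zip(f, g))
assert sum(abs(a * b) for a, b in zip(f, g)) <= lp_norm(f, p) * lp_norm(g, q) * (1 + 1e-9)

# the supremum in Corollary 5.4.1 is attained at h/||h||_{p'}, with h = |f|^{p-1} sgn f
h = [abs(a) ** (p - 1) * sgn(a) for a in f]
nh = lp_norm(h, q)
h = [x / nh for x in h]
assert abs(sum(a * b for a, b in zip(f, h)) - lp_norm(f, p)) < 1e-9
```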
As an application of this result, we have the following important corollary.

Corollary 5.4.2 Suppose that $f$ is a non-negative measurable function on $(X_1, \Sigma_1, \mu_1) \times (X_2, \Sigma_2, \mu_2)$ and that $0 < p \le q < \infty$. Then
$$\Bigl(\int_{X_1}\Bigl(\int_{X_2} f(x,y)^p\,d\mu_2(y)\Bigr)^{q/p} d\mu_1(x)\Bigr)^{1/q} \le \Bigl(\int_{X_2}\Bigl(\int_{X_1} f(x,y)^q\,d\mu_1(x)\Bigr)^{p/q} d\mu_2(y)\Bigr)^{1/p}.$$
Proof Let $r = q/p$. Then
$$\begin{aligned}
\Bigl(\int_{X_1}\Bigl(\int_{X_2} f(x,y)^p\,d\mu_2(y)\Bigr)^{q/p} d\mu_1(x)\Bigr)^{1/q}
&= \Bigl(\int_{X_1}\Bigl(\int_{X_2} f(x,y)^p\,d\mu_2(y)\Bigr)^{r} d\mu_1(x)\Bigr)^{1/rp}\\
&= \Bigl(\int_{X_1}\Bigl(\int_{X_2} f(x,y)^p\,d\mu_2(y)\Bigr) g(x)\,d\mu_1(x)\Bigr)^{1/p}\quad\text{for some $g$ with } \|g\|_{r'} = 1\\
&= \Bigl(\int_{X_2}\Bigl(\int_{X_1} f(x,y)^p g(x)\,d\mu_1(x)\Bigr) d\mu_2(y)\Bigr)^{1/p}\quad\text{(by Fubini's theorem)}\\
&\le \Bigl(\int_{X_2}\Bigl(\int_{X_1} f(x,y)^{pr}\,d\mu_1(x)\Bigr)^{1/r} d\mu_2(y)\Bigr)^{1/p}\quad\text{(by Corollary 5.4.1)}\\
&= \Bigl(\int_{X_2}\Bigl(\int_{X_1} f(x,y)^q\,d\mu_1(x)\Bigr)^{p/q} d\mu_2(y)\Bigr)^{1/p}.
\end{aligned}$$
We can consider $f$ as a vector-valued function $f(y)$ on $X_2$, taking values in $L_q(\mu_1)$, and with $\int_{X_2} \|f(y)\|_q^p\,d\mu_2 < \infty$: thus $f \in L_p^{\mu_2}(L_q^{\mu_1})$. The corollary then says that $f \in L_q^{\mu_1}(L_p^{\mu_2})$ and $\|f\|_{L_q^{\mu_1}(L_p^{\mu_2})} \le \|f\|_{L_p^{\mu_2}(L_q^{\mu_1})}$.
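For counting measure on finite sets the corollary reduces to an inequality between iterated sums over a non-negative matrix, which is easy to test (a randomized check of mine, not from the text):

```python
import random

random.seed(3)
for _ in range(200):
    p = random.uniform(0.3, 3.0)
    q = p + random.uniform(0.0, 3.0)             # 0 < p <= q
    m, n = random.randint(1, 5), random.randint(1, 5)
    F = [[random.uniform(0.0, 2.0) for _ in range(n)] for _ in range(m)]
    # mixed-norm inequality: integrate in y first on the left, in x first on the right
    lhs = sum(sum(F[x][y] ** p for y in range(n)) ** (q / p) for x in range(m)) ** (1.0 / q)
    rhs = sum(sum(F[x][y] ** q for x in range(m)) ** (p / q) for y in range(n)) ** (1.0 / p)
    assert lhs <= rhs * (1 + 1e-9)
```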
Here is a generalization of Hölder's inequality.

Proposition 5.4.1 Suppose that $1/p_1 + \cdots + 1/p_n = 1$ and that $f_i \in L_{p_i}$ for $1 \le i \le n$. Then $f_1 \cdots f_n \in L_1$ and
$$\int |f_1 \cdots f_n|\,d\mu \le \|f_1\|_{p_1} \cdots \|f_n\|_{p_n}.$$
Equality holds if and only if either the right-hand side is zero, or there exist $\lambda_{ij} > 0$ such that $|f_i|^{p_i} = \lambda_{ij}|f_j|^{p_j}$ for $1 \le i, j \le n$.

Proof By Proposition 4.1.3,
$$|f_1 \cdots f_n| \le |f_1|^{p_1}/p_1 + \cdots + |f_n|^{p_n}/p_n.$$
We now proceed exactly as in Theorem 5.4.1.

It is also easy to prove this by induction on $n$, using Hölder's inequality.
5.5 The inequalities of Liapounov and Littlewood

Hölder's inequality shows that there is a natural scale of inclusions for the $L_p$ spaces, when the underlying space has finite measure.

Proposition 5.5.1 Suppose that $(\Omega, \Sigma, \mu)$ is a measure space and that $\mu(\Omega) < \infty$. Suppose that $0 < p < q < \infty$. If $f \in L_q$ then $f \in L_p$ and $\|f\|_p \le \mu(\Omega)^{1/p - 1/q}\|f\|_q$. If $f \in L_\infty$ then $f \in L_p$ and $\|f\|_p \le \mu(\Omega)^{1/p}\|f\|_\infty$.
Proof Let $r = q/(q-p)$, so that $p/q + 1/r = 1$ and $1/rp = 1/p - 1/q$. We apply Hölder's inequality to the functions $1$ and $|f|^p$, using exponents $r$ and $q/p$:
$$\int |f|^p\,d\mu \le (\mu(\Omega))^{1/r}\Bigl(\int |f|^q\,d\mu\Bigr)^{p/q},$$
so that
$$\|f\|_p \le (\mu(\Omega))^{1/rp}\Bigl(\int |f|^q\,d\mu\Bigr)^{1/q} = \mu(\Omega)^{1/p - 1/q}\|f\|_q.$$
When $f \in L_\infty$, $\int |f|^p\,d\mu \le \|f\|_\infty^p\,\mu(\Omega)$, so that $\|f\|_p \le \mu(\Omega)^{1/p}\|f\|_\infty$.
When the underlying space has counting measure, we denote the space $L_p(\Omega)$ by $l_p(\Omega)$ or $l_p$; when $\Omega = \{1, \ldots, n\}$ we write $l_p^n$. With counting measure, the inclusions go the other way.

Proposition 5.5.2 Suppose that $0 < p < q \le \infty$. If $f \in l_p$ then $f \in l_q$ and $\|f\|_q \le \|f\|_p$.
Proof The result is certainly true when $q = \infty$, and when $f = 0$. Otherwise, let $F = f/\|f\|_p$, so that $\|F\|_p = 1$. Thus if $i \in \Omega$ then $|F_i| \le 1$ and so $|F_i|^q \le |F_i|^p$. Thus $\sum_i |F_i|^q \le \sum_i |F_i|^p = 1$, so that $\|F\|_q \le 1$ and $\|f\|_q \le \|f\|_p$.
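Both Proposition 5.5.1 (with $\mu(\Omega) = n$, all weights one) and Proposition 5.5.2 can be checked on random vectors (my addition, not part of the text):

```python
import random

def lp_norm(f, p):
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

random.seed(4)
for _ in range(500):
    p = random.uniform(0.2, 5.0)
    q = p + random.uniform(0.0, 5.0)             # 0 < p <= q
    f = [random.uniform(-3.0, 3.0) for _ in range(random.randint(1, 10))]
    # counting measure (Proposition 5.5.2): ||f||_q <= ||f||_p
    assert lp_norm(f, q) <= lp_norm(f, p) * (1 + 1e-9)
    # finite measure mu(Omega) = n (Proposition 5.5.1): ||f||_p <= n^{1/p - 1/q} ||f||_q
    n = len(f)
    assert lp_norm(f, p) <= n ** (1.0 / p - 1.0 / q) * lp_norm(f, q) * (1 + 1e-9)
```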
For general measure spaces, if $p \ne q$ then $L_p$ neither includes nor is included in $L_q$. On the other hand if $0 < p_0 < p < p_1 \le \infty$ then
$$L_{p_0} \cap L_{p_1} \subseteq L_p \subseteq L_{p_0} + L_{p_1}.$$
More precisely, we have the following.

Theorem 5.5.1 (i) (Liapounov's inequality) Suppose that $0 < p_0 < p_1 < \infty$ and that $0 < \theta < 1$. Let $p = (1-\theta)p_0 + \theta p_1$. If $f \in L_{p_0} \cap L_{p_1}$ then $f \in L_p$ and $\|f\|_p^p \le \|f\|_{p_0}^{(1-\theta)p_0}\|f\|_{p_1}^{\theta p_1}$.
(ii) (Littlewood's inequality) Suppose that $0 < p_0 < p_1 < \infty$ and that $0 < \theta < 1$. Define $p$ by $1/p = (1-\theta)/p_0 + \theta/p_1$. If $f \in L_{p_0} \cap L_{p_1}$ then $f \in L_p$ and $\|f\|_p \le \|f\|_{p_0}^{1-\theta}\|f\|_{p_1}^{\theta}$.

(iii) Suppose that $0 < p_0 < p_1 \le \infty$ and that $0 < \theta < 1$. Define $p$ by $1/p = (1-\theta)/p_0 + \theta/p_1$. Then if $f \in L_p$ there exist functions $g \in L_{p_0}$ and $h \in L_{p_1}$ such that $f = g + h$ and $\|g\|_{p_0}^{1-\theta}\|h\|_{p_1}^{\theta} \le \|f\|_p$.
Proof (i) We use Hölder's inequality with exponents $1/(1-\theta)$ and $1/\theta$:
$$\|f\|_p^p = \int |f|^p\,d\mu = \int |f|^{(1-\theta)p_0}|f|^{\theta p_1}\,d\mu \le \Bigl(\int |f|^{p_0}\,d\mu\Bigr)^{1-\theta}\Bigl(\int |f|^{p_1}\,d\mu\Bigr)^{\theta} = \|f\|_{p_0}^{(1-\theta)p_0}\|f\|_{p_1}^{\theta p_1}.$$

(ii) Let $1 - \alpha = (1-\theta)p/p_0$, so that $\alpha = \theta p/p_1$. We apply Hölder's inequality with exponents $1/(1-\alpha)$ and $1/\alpha$:
$$\begin{aligned}
\|f\|_p = \Bigl(\int |f|^p\,d\mu\Bigr)^{1/p} &= \Bigl(\int |f|^{(1-\theta)p}|f|^{\theta p}\,d\mu\Bigr)^{1/p}\\
&\le \Bigl(\int |f|^{(1-\theta)p/(1-\alpha)}\,d\mu\Bigr)^{(1-\alpha)/p}\Bigl(\int |f|^{\theta p/\alpha}\,d\mu\Bigr)^{\alpha/p}\\
&= \Bigl(\int |f|^{p_0}\,d\mu\Bigr)^{(1-\theta)/p_0}\Bigl(\int |f|^{p_1}\,d\mu\Bigr)^{\theta/p_1} = \|f\|_{p_0}^{1-\theta}\|f\|_{p_1}^{\theta}.
\end{aligned}$$

(iii) Let $g = fI_{(|f|>1)}$ and let $h = f - g$. Then $|g|^{p_0} \le |f|^p$, and so $\|g\|_{p_0} \le \|f\|_p^{p/p_0}$. On the other hand, $|h| \le 1$, so that $|h|^{p_1} \le |h|^p \le |f|^p$, and $\|h\|_{p_1} \le \|f\|_p^{p/p_1}$. Thus
$$\|g\|_{p_0}^{1-\theta}\|h\|_{p_1}^{\theta} \le \|f\|_p^{p((1-\theta)/p_0 + \theta/p_1)} = \|f\|_p.$$

Liapounov's inequality says that $\log \|f\|_p^p$ is a convex function of $p$, and Littlewood's inequality says that $\log \|f\|_{1/t}$ is a convex function of $t$.
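Both interpolation inequalities can be tested numerically on random vectors with counting measure (my addition; the assertions use a relative tolerance for floating-point rounding):

```python
import random

def lp_norm(f, p):
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

random.seed(5)
f = [random.uniform(0.1, 4.0) for _ in range(10)]
for _ in range(300):
    p0 = random.uniform(0.2, 6.0)
    p1 = random.uniform(0.2, 6.0)
    th = random.uniform(0.0, 1.0)
    # Liapounov: p = (1-th)p0 + th p1 gives ||f||_p^p <= ||f||_{p0}^{(1-th)p0} ||f||_{p1}^{th p1}
    p = (1 - th) * p0 + th * p1
    assert lp_norm(f, p) ** p <= lp_norm(f, p0) ** ((1 - th) * p0) * lp_norm(f, p1) ** (th * p1) * (1 + 1e-9)
    # Littlewood: 1/p = (1-th)/p0 + th/p1 gives ||f||_p <= ||f||_{p0}^{1-th} ||f||_{p1}^{th}
    p = 1.0 / ((1 - th) / p0 + th / p1)
    assert lp_norm(f, p) <= lp_norm(f, p0) ** (1 - th) * lp_norm(f, p1) ** th * (1 + 1e-9)
```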
5.6 Duality

We now consider the structure of the $L_p$ spaces, and their duality properties.

Proposition 5.6.1 The simple functions are dense in $L_p$, for $1 \le p < \infty$.
Proof Suppose that $f \in L_p$. Then there exists a sequence $(f_n)$ of simple functions with $|f_n| \le |f|$ which converges pointwise to $f$. Then $|f - f_n|^p \le 2^p|f|^p$, and $|f - f_n|^p \to 0$ pointwise, and so by the theorem of dominated convergence, $\|f - f_n\|_p^p = \int |f - f_n|^p\,d\mu \to 0$.

This result holds for $L_\infty$ if and only if $\mu(\Omega) < \infty$.
Proposition 5.6.2 Suppose that $1 \le p < \infty$. A measurable function $f$ is in $L_p$ if and only if $fg \in L_1$ for all $g \in L_{p'}$.

Proof The condition is certainly necessary, by Hölder's inequality. It is trivially sufficient when $p = 1$ (take $g = 1$). Suppose that $1 < p < \infty$ and that $f \notin L_p$. There exists an increasing sequence $(k_n)$ of non-negative simple functions which increases pointwise to $|f|$. By the monotone convergence theorem, $\|k_n\|_p \to \infty$; extracting a subsequence if necessary, we can suppose that $\|k_n\|_p \ge 4^n$, for each $n$. Let $h_n = k_n^{p-1}$. Then as in Corollary 5.4.1, $\|h_n\|_{p'} = \|k_n\|_p^{p/p'}$; setting $g_n = h_n/\|h_n\|_{p'}$, $\|g_n\|_{p'} = 1$ and
$$\int |f|g_n\,d\mu \ge \int k_n g_n\,d\mu = \|k_n\|_p^{-p/p'}\int k_n^p\,d\mu = \|k_n\|_p \ge 4^n.$$
If we set $s = \sum_{n=1}^\infty g_n/2^n$, then $\|s\|_{p'} \le 1$, so that $s \in L_{p'}$, while $\int |f|s\,d\mu = \infty$.
Suppose that $1 \le p < \infty$ and that $g \in L_{p'}$. If $f \in L_p$, let $l_g(f) = \int fg\,d\mu$. Then it follows from Hölder's inequality that the mapping $g \to l_g$ is a linear isometry of $L_{p'}$ into $(L_p)^*$. In fact, we can say more.

Theorem 5.6.1 If $1 < p < \infty$, the mapping $g \to l_g$ is a linear isometric isomorphism of $L_{p'}$ onto $(L_p)^*$.
Proof We shall prove this in the real case: the extension to the complex case is given in Exercise 5.11. We must show that the mapping is surjective. There are several proofs of this; the proof that we give here appeals to measure theory. First, suppose that $\mu(\Omega) < \infty$. Suppose that $\phi \in (L_p)^*$ and that $\phi \ne 0$. Let $\nu(E) = \phi(I_E)$, for $E \in \Sigma$. Then $\nu$ is an additive function on $\Sigma$. Further, $|\nu(E)| \le \|\phi\|\,(\mu(E))^{1/p}$, so that $\nu$ is absolutely continuous with respect to $\mu$. By Theorem 5.2.2 there exists $g \in L_1$ such that $\phi(I_E) = \nu(E) = \int_E g\,d\mu$ for all $E \in \Sigma$. Now let $\phi_+(f) = \phi(f.I_{(g \ge 0)})$ and $\phi_-(f) = \phi(f.I_{(g<0)})$: $\phi_+$ and $\phi_-$ are continuous linear functionals on $L_p$, and $\phi = \phi_+ + \phi_-$. If $f$ is a simple function then $\phi_+(f) = \int f g^+\,d\mu$. We now show that $g^+ \in L_{p'}$. There exists a sequence $(g_n)$ of non-negative simple functions which increase pointwise to $g^+$. Let $f_n = g_n^{p'-1}$. Then
$$\int g_n^{p'}\,d\mu \le \int g_n^{p'-1}g^+\,d\mu = \phi_+(f_n) \le \|\phi_+\|\,\|f_n\|_p = \|\phi_+\|\Bigl(\int g_n^{p(p'-1)}\,d\mu\Bigr)^{1/p} = \|\phi_+\|\Bigl(\int g_n^{p'}\,d\mu\Bigr)^{1/p},$$
so that $\int g_n^{p'}\,d\mu \le (\|\phi_+\|)^{p'}$. It now follows from the monotone convergence theorem that $\int (g^+)^{p'}\,d\mu \le (\|\phi_+\|)^{p'}$, and so $g^+ \in L_{p'}$. Similarly $g^- \in L_{p'}$, and so $g \in L_{p'}$. Now $\phi(f) = l_g(f)$ when $f$ is a simple function, and the simple functions are dense in $L_p$, and so $\phi = l_g$.

In the general case, we can write $\Omega = \bigcup_n \Omega_n$, where the sets $\Omega_n$ are disjoint sets of finite measure. Let $\phi_n$ be the restriction of $\phi$ to $L_p(\Omega_n)$. Then by the above result, for each $n$ there exists $g_n \in L_{p'}(\Omega_n)$ such that $\phi_n = l_{g_n}$. Let $g$ be the function on $\Omega$ whose restriction to $\Omega_n$ is $g_n$, for each $n$. Then straightforward arguments show that $g \in L_{p'}(\Omega)$ and that $\phi = l_g$.
The theorem is also true for $p = 1$ (see Exercise 5.8), but is not true for $p = \infty$, unless $L_\infty$ is finite dimensional. This is the first indication of the fact that the $L_p$ spaces, for $1 < p < \infty$, are more well-behaved than $L_1$ and $L_\infty$.

A Banach space $(E, \|.\|)$ is reflexive if the natural isometry of $E$ into $E^{**}$ maps $E$ onto $E^{**}$: thus we can identify the bidual of $E$ with $E$.

Corollary 5.6.1 $L_p$ is reflexive, for $1 < p < \infty$.

The proof of Theorem 5.6.1 appealed to measure theory. In Chapter 9 we shall establish some further inequalities, concerning the geometry of the unit ball of $L_p$, which lead to a very different proof.
5.7 The Loomis–Whitney inequality

The spaces $L_1$ and $L_\infty$ are clearly important, and so is $L_2$, which provides an important example of a Hilbert space. But why should we be interested in $L_p$ spaces for other values of $p$? The next few results begin to give an answer to this question.

First we need to describe the setting in which we work, and the notation which we use. This is unfortunately rather complicated. It is well worth writing out the proof for the case $d = 3$. Suppose that $(\Omega_1, \Sigma_1, \mu_1), \ldots, (\Omega_d, \Sigma_d, \mu_d)$ are measure spaces; let $(\Omega, \Sigma, \mu)$ be the product measure space $\prod_{i=1}^d (\Omega_i, \Sigma_i, \mu_i)$. We want to consider products with one or two factors omitted. Let $(\Omega^j, \Sigma^j, \mu^j) = \prod_{i \ne j}(\Omega_i, \Sigma_i, \mu_i)$. Similarly, if $j, k$ are distinct indices, let $(\Omega^{j,k}, \Sigma^{j,k}, \mu^{j,k}) = \prod_{i \ne j,k}(\Omega_i, \Sigma_i, \mu_i)$. If $\omega \in \Omega$, we write $\omega = (\omega_j, \omega^j)$, where $\omega_j \in \Omega_j$ and $\omega^j \in \Omega^j$, and if $\omega^j \in \Omega^j$, where $j \ne 1$, we write $\omega^j = (\omega_1, \omega^{1,j})$, where $\omega_1 \in \Omega_1$ and $\omega^{1,j} \in \Omega^{1,j}$.
Theorem 5.7.1 Suppose that $h_j$ is a non-negative function in $L_{d-1}(\Omega^j, \Sigma^j, \mu^j)$, for $1 \le j \le d$. Let $g_j(\omega_j, \omega^j) = h_j(\omega^j)$ and let $g = \prod_{j=1}^d g_j$. Then
$$\int_\Omega g\,d\mu \le \prod_{j=1}^d \|h_j\|_{d-1}.$$
Proof The proof is by induction on $d$. The result is true for $d = 2$, since we can write $g(\omega_1, \omega_2) = h_1(\omega_2)h_2(\omega_1)$, and then
$$\int_\Omega g\,d\mu = \Bigl(\int_{\Omega^1} h_1\,d\mu^1\Bigr)\Bigl(\int_{\Omega^2} h_2\,d\mu^2\Bigr).$$
Suppose that the result holds for $d - 1$. Suppose that $\omega_1 \in \Omega_1$. We define the function $g^{\omega_1}$ on $\Omega^1$ by setting
$$g^{\omega_1}(\omega^1) = g(\omega_1, \omega^1);$$
similarly if $2 \le j \le d$ we define the function $h_{j,\omega_1}$ on $\Omega^{1,j}$ by setting
$$h_{j,\omega_1}(\omega^{1,j}) = h_j(\omega_1, \omega^{1,j})$$
and define the function $g_{j,\omega_1}$ on $\Omega^1$ by setting
$$g_{j,\omega_1}(\omega^1) = g_j(\omega_1, \omega^1).$$
Then by Hölder's inequality, with indices $d-1$ and $(d-1)/(d-2)$,
$$\int_{\Omega^1} g^{\omega_1}\,d\mu^1 = \int_{\Omega^1} h_1 \prod_{j=2}^d g_{j,\omega_1}\,d\mu^1 \le \|h_1\|_{d-1}\Bigl(\int_{\Omega^1}\Bigl(\prod_{j=2}^d g_{j,\omega_1}\Bigr)^{(d-1)/(d-2)} d\mu^1\Bigr)^{(d-2)/(d-1)}.$$
But now by the inductive hypothesis,
$$\begin{aligned}
\Bigl(\int_{\Omega^1}\Bigl(\prod_{j=2}^d g_{j,\omega_1}\Bigr)^{(d-1)/(d-2)} d\mu^1\Bigr)^{(d-2)/(d-1)}
&= \Bigl(\int_{\Omega^1}\prod_{j=2}^d g_{j,\omega_1}^{(d-1)/(d-2)}\,d\mu^1\Bigr)^{(d-2)/(d-1)}\\
&\le \Bigl(\prod_{j=2}^d \bigl\|h_{j,\omega_1}^{(d-1)/(d-2)}\bigr\|_{d-2}\Bigr)^{(d-2)/(d-1)}\\
&= \prod_{j=2}^d\Bigl(\int_{\Omega^{1,j}} h_{j,\omega_1}^{d-1}\,d\mu^{1,j}\Bigr)^{1/(d-1)} = \prod_{j=2}^d \|h_{j,\omega_1}\|_{d-1}.
\end{aligned}$$
Consequently, integrating over $\Omega_1$, and using the generalized Hölder inequality with indices $(d-1, \ldots, d-1)$,
$$\begin{aligned}
\int_\Omega g\,d\mu &\le \|h_1\|_{d-1}\int_{\Omega_1}\Bigl(\prod_{j=2}^d \|h_{j,\omega_1}\|_{d-1}\Bigr) d\mu_1(\omega_1)\\
&\le \|h_1\|_{d-1}\prod_{j=2}^d\Bigl(\int_{\Omega_1}\|h_{j,\omega_1}\|_{d-1}^{d-1}\,d\mu_1\Bigr)^{1/(d-1)}\\
&= \|h_1\|_{d-1}\prod_{j=2}^d\Bigl(\int_{\Omega_1}\Bigl(\int_{\Omega^{1,j}} h_{j,\omega_1}^{d-1}\,d\mu^{1,j}\Bigr) d\mu_1\Bigr)^{1/(d-1)} = \prod_{j=1}^d \|h_j\|_{d-1}.
\end{aligned}$$
Corollary 5.7.1 Suppose that $h_j \in L_{\alpha_j}(\Omega^j, \Sigma^j, \mu^j)$ for $1 \le j \le d$, where $\alpha_j \ge 1$. If $f$ is a measurable function on $\Omega$ satisfying $|f(\omega_j, \omega^j)| \le |h_j(\omega^j)|$ for all $\omega = (\omega_j, \omega^j)$, for $1 \le j \le d$, then
$$\|f\|_{\alpha/(d-1)} \le \Bigl(\prod_{j=1}^d \|h_j\|_{\alpha_j}^{\alpha_j}\Bigr)^{1/\alpha} \le (1/\alpha)\sum_{j=1}^d \alpha_j\|h_j\|_{\alpha_j},$$
where $\alpha = \alpha_1 + \cdots + \alpha_d$.

Proof For $|f(\omega)|^{\alpha/(d-1)} \le \prod_{j=1}^d |h_j(\omega^j)|^{\alpha_j/(d-1)}$. The second inequality follows from the generalized AM–GM inequality.
Corollary 5.7.2 (The Loomis–Whitney inequality) Suppose that $K$ is a compact subset of $\mathbb{R}^d$. Let $K_j$ be the image of $K$ under the orthogonal projection onto the subspace orthogonal to the $j$-th axis. Then
$$\lambda_d(K) \le \prod_{j=1}^d \bigl(\lambda_{d-1}(K_j)\bigr)^{1/(d-1)}.$$
[Here $\lambda_d$ denotes $d$-dimensional Borel measure, and $\lambda_{d-1}$ $(d-1)$-dimensional measure.]

Proof Apply the previous corollary to the characteristic functions of $K$ and the $K_j$, taking $\alpha_j = 1$ for each $j$.
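A discrete analogue of the Loomis–Whitney inequality holds for finite subsets of $\mathbb{Z}^3$ (the counting-measure case of Theorem 5.7.1 applied to indicator functions): $|K|^2 \le |K_1||K_2||K_3|$. A quick randomized check (my addition, not part of the text):

```python
import random

random.seed(6)
for _ in range(100):
    # a random finite "voxel" set K in Z^3
    K = {(random.randint(0, 4), random.randint(0, 4), random.randint(0, 4))
         for _ in range(random.randint(1, 60))}
    # projections forgetting one coordinate at a time
    K1 = {(y, z) for (x, y, z) in K}
    K2 = {(x, z) for (x, y, z) in K}
    K3 = {(x, y) for (x, y, z) in K}
    # Loomis-Whitney with d = 3: |K| <= (|K1| |K2| |K3|)^{1/(d-1)}
    assert len(K) ** 2 <= len(K1) * len(K2) * len(K3)
```

The comparison is between integers, so no tolerance is needed; equality occurs when $K$ is a full box.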
5.8 A Sobolev inequality

In the theory of partial differential equations, it is useful to estimate the size of a function in terms of its partial derivatives. Such estimates are called Sobolev inequalities. We use Corollary 5.7.1 to prove the following fundamental Sobolev inequality.

Theorem 5.8.1 Suppose that $f$ is a continuously differentiable function of compact support on $\mathbb{R}^d$, where $d > 1$. If $1 \le p < d$ then
$$\|f\|_{pd/(d-p)} \le \frac{p(d-1)}{2(d-p)}\Bigl(\prod_{j=1}^d \Bigl\|\frac{\partial f}{\partial x_j}\Bigr\|_p\Bigr)^{1/d} \le \frac{p(d-1)}{2d(d-p)}\sum_{j=1}^d \Bigl\|\frac{\partial f}{\partial x_j}\Bigr\|_p.$$
Proof We first consider the case when $p = 1$. Let us write $x = (x_j, x^j)$. Then
$$f(x) = \int_{-\infty}^{x_j} \frac{\partial f}{\partial x_j}(t, x^j)\,dt = -\int_{x_j}^{\infty} \frac{\partial f}{\partial x_j}(t, x^j)\,dt,$$
so that
$$|f(x)| \le \frac12\int_{-\infty}^{\infty}\Bigl|\frac{\partial f}{\partial x_j}(t, x^j)\Bigr|\,dt.$$
Then, applying Corollary 5.7.1 with $\alpha_j = 1$ for each $j$,
$$\|f\|_{d/(d-1)} \le \frac12\Bigl(\prod_{j=1}^d \Bigl\|\frac{\partial f}{\partial x_j}\Bigr\|_1\Bigr)^{1/d} \le \frac{1}{2d}\sum_{j=1}^d \Bigl\|\frac{\partial f}{\partial x_j}\Bigr\|_1.$$
Next suppose that $1 < p < d$. Let $s = p(d-1)/(d-p)$. Then $(s-1)p' = sd/(d-1) = pd/(d-p)$; we shall see why this is useful shortly. Now
$$|f(x)|^s = \Bigl|\int_{-\infty}^{x_j}\frac{\partial}{\partial x_j}\bigl(|f(t, x^j)|^s\bigr)\,dt\Bigr| \le s\int_{-\infty}^{x_j}|f(t, x^j)|^{s-1}\Bigl|\frac{\partial f}{\partial x_j}(t, x^j)\Bigr|\,dt;$$
similarly
$$|f(x)|^s \le s\int_{x_j}^{\infty}|f(t, x^j)|^{s-1}\Bigl|\frac{\partial f}{\partial x_j}(t, x^j)\Bigr|\,dt,$$
so that
$$|f(x)| \le \Bigl(\frac{s}{2}\int_{-\infty}^{\infty}|f(t, x^j)|^{s-1}\Bigl|\frac{\partial f}{\partial x_j}(t, x^j)\Bigr|\,dt\Bigr)^{1/s}.$$
Now take $\alpha_j = s$ for each $j$: by Corollary 5.7.1,
$$\|f\|_{sd/(d-1)}^s \le \frac{s}{2}\Bigl(\prod_{j=1}^d\Bigl\||f|^{s-1}\Bigl|\frac{\partial f}{\partial x_j}\Bigr|\Bigr\|_1\Bigr)^{1/d}.$$
Now
$$\Bigl\||f|^{s-1}\Bigl|\frac{\partial f}{\partial x_j}\Bigr|\Bigr\|_1 \le \bigl\||f|^{s-1}\bigr\|_{p'}\Bigl\|\frac{\partial f}{\partial x_j}\Bigr\|_p = \|f\|_{(s-1)p'}^{s-1}\Bigl\|\frac{\partial f}{\partial x_j}\Bigr\|_p,$$
so that
$$\|f\|_{sd/(d-1)}^s \le \frac{s}{2}\|f\|_{(s-1)p'}^{s-1}\Bigl(\prod_{j=1}^d\Bigl\|\frac{\partial f}{\partial x_j}\Bigr\|_p\Bigr)^{1/d}.$$
Thus, bearing in mind that $(s-1)p' = sd/(d-1) = pd/(d-p)$,
$$\|f\|_{pd/(d-p)} \le \frac{p(d-1)}{2(d-p)}\Bigl(\prod_{j=1}^d\Bigl\|\frac{\partial f}{\partial x_j}\Bigr\|_p\Bigr)^{1/d} \le \frac{p(d-1)}{2d(d-p)}\sum_{j=1}^d\Bigl\|\frac{\partial f}{\partial x_j}\Bigr\|_p.$$

This theorem illustrates strongly the way in which the indices and constants depend upon the dimension $d$. This causes problems if we wish to let $d$ increase to infinity. We return to this point in Chapter 13.
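The exponent bookkeeping $(s-1)p' = sd/(d-1) = pd/(d-p)$ can be confirmed in exact rational arithmetic (a check of mine, not part of the text):

```python
from fractions import Fraction

def sobolev_exponents(p, d):
    # s = p(d-1)/(d-p); returns (s-1)p', sd/(d-1) and pd/(d-p), which should coincide
    s = Fraction(p * (d - 1), d - p)
    p_conj = Fraction(p, p - 1)
    return (s - 1) * p_conj, s * Fraction(d, d - 1), Fraction(p * d, d - p)

for d in range(3, 9):
    for p in range(2, d):                        # 1 < p < d
        a, b, c = sobolev_exponents(p, d)
        assert a == b == c
```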
5.9 Schur's theorem and Schur's test

We end this chapter with two results of Schur, which depend upon Hölder's inequality. The first of these is an interpolation theorem. Although the result is a remarkable one, it is a precursor of more powerful and more general results that we shall prove later. Suppose that $(X, \Sigma, \mu)$ and $(Y, T, \nu)$ are $\sigma$-finite measure spaces, and that $K$ is a measurable function on $X \times Y$ for which there are constants $M$ and $N$ such that
$$\int (\operatorname{ess\,sup}_y |K(x,y)|)\,d\mu(x) \le M,$$
and
$$\int |K(x,y)|\,d\nu(y) \le N, \text{ for almost all } x \in X.$$
If $f \in L_1(\nu)$, then
$$\Bigl|\int K(x,y)f(y)\,d\nu(y)\Bigr| \le (\operatorname{ess\,sup}_y |K(x,y)|)\int |f(y)|\,d\nu(y),$$
so that, setting $T(f)(x) = \int K(x,y)f(y)\,d\nu(y)$,
$$\|T(f)\|_1 \le \int(\operatorname{ess\,sup}_y |K(x,y)|)\,d\mu(x)\,\|f\|_1 \le M\|f\|_1.$$
Thus $T \in L(L_1(\nu), L_1(\mu))$, and $\|T\| \le M$.

On the other hand, if $f \in L_\infty(\nu)$, then
$$|T(f)(x)| \le \int |K(x,y)||f(y)|\,d\nu(y) \le \|f\|_\infty \int |K(x,y)|\,d\nu(y) \le N\|f\|_\infty,$$
so that $T \in L(L_\infty(\nu), L_\infty(\mu))$, and $\|T\| \le N$.

Hölder's inequality enables us to interpolate these results. By Theorem 5.5.1, if $1 < p < \infty$ then $L_p \subseteq L_1 + L_\infty$, and so we can define $T(f)$ for $f \in L_p$.
Theorem 5.9.1 (Schur's theorem) Suppose that $(X, \Sigma, \mu)$ and $(Y, T, \nu)$ are $\sigma$-finite measure spaces, and that $K$ is a measurable function on $X \times Y$ for which there are constants $M$ and $N$ such that
$$\int(\operatorname{ess\,sup}_y |K(x,y)|)\,d\mu(x) \le M, \quad\text{and}\quad \int |K(x,y)|\,d\nu(y) \le N \text{ for almost all } x \in X.$$
Let $T(f) = \int K(x,y)f(y)\,d\nu(y)$. If $1 < p < \infty$ and $f \in L_p(\nu)$ then $T(f) \in L_p(\mu)$ and $\|T(f)\|_p \le M^{1/p}N^{1/p'}\|f\|_p$.

Proof Applying Hölder's inequality,
$$\begin{aligned}
|T(f)(x)| &\le \int |K(x,y)||f(y)|\,d\nu(y) = \int |K(x,y)|^{1/p}|f(y)|\,|K(x,y)|^{1/p'}\,d\nu(y)\\
&\le \Bigl(\int |K(x,y)||f(y)|^p\,d\nu(y)\Bigr)^{1/p}\Bigl(\int |K(x,y)|\,d\nu(y)\Bigr)^{1/p'}\\
&\le N^{1/p'}\Bigl(\int |K(x,y)||f(y)|^p\,d\nu(y)\Bigr)^{1/p}\quad x\text{-almost everywhere}.
\end{aligned}$$
Thus
$$\begin{aligned}
\int |T(f)(x)|^p\,d\mu(x) &\le N^{p/p'}\int\Bigl(\int |K(x,y)||f(y)|^p\,d\nu(y)\Bigr)d\mu(x)\\
&= N^{p/p'}\int\Bigl(\int |K(x,y)|\,d\mu(x)\Bigr)|f(y)|^p\,d\nu(y) \le N^{p/p'}M\|f\|_p^p.
\end{aligned}$$
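For finite sets with counting measure, Schur's theorem becomes a statement about matrices: with $M = \sum_x \max_y |K(x,y)|$ and $N = \max_x \sum_y |K(x,y)|$, one has $\|Tf\|_p \le M^{1/p}N^{1/p'}\|f\|_p$. A randomized check (my addition, not part of the text):

```python
import random

def lp_norm(f, p):
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

random.seed(7)
for _ in range(200):
    p = random.uniform(1.1, 5.0)
    pc = p / (p - 1)                             # conjugate index
    m, n = random.randint(1, 5), random.randint(1, 5)
    K = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]
    f = [random.uniform(-1.0, 1.0) for _ in range(n)]
    M = sum(max(abs(K[x][y]) for y in range(n)) for x in range(m))
    N = max(sum(abs(K[x][y]) for y in range(n)) for x in range(m))
    Tf = [sum(K[x][y] * f[y] for y in range(n)) for x in range(m)]
    assert lp_norm(Tf, p) <= M ** (1.0 / p) * N ** (1.0 / pc) * lp_norm(f, p) * (1 + 1e-9)
```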
The next result remains a powerful tool.

Theorem 5.9.2 (Schur's test) Suppose that $k = k(x,y)$ is a non-negative measurable function on a product space $(X, \Sigma, \mu) \times (Y, T, \nu)$, and that $1 < p < \infty$. Suppose also that there exist strictly positive measurable functions $s$ on $(X, \Sigma, \mu)$ and $t$ on $(Y, T, \nu)$, and constants $A$ and $B$ such that
$$\int_Y k(x,y)(t(y))^{p'}\,d\nu(y) \le (As(x))^{p'} \text{ for almost all } x,$$
and
$$\int_X (s(x))^p k(x,y)\,d\mu(x) \le (Bt(y))^p \text{ for almost all } y.$$
Then if $f \in L_p(Y)$, $T(f)(x) = \int_Y k(x,y)f(y)\,d\nu(y)$ exists for almost all $x$, $T(f) \in L_p(X)$ and $\|T(f)\|_p \le AB\|f\|_p$.

Proof Hölder's inequality shows that it is enough to prove that if $h$ is a non-negative function in $L_{p'}(X)$ and $g$ is a non-negative function in $L_p(Y)$ then
$$\int_X\int_Y h(x)k(x,y)g(y)\,d\nu(y)\,d\mu(x) \le AB\|h\|_{p'}\|g\|_p.$$
Now, using Hölder's inequality,
$$\begin{aligned}
\int_Y k(x,y)g(y)\,d\nu(y) &= \int_Y (k(x,y))^{1/p'}t(y)\cdot(k(x,y))^{1/p}\frac{g(y)}{t(y)}\,d\nu(y)\\
&\le \Bigl(\int_Y k(x,y)(t(y))^{p'}\,d\nu(y)\Bigr)^{1/p'}\Bigl(\int_Y k(x,y)(g(y))^p(t(y))^{-p}\,d\nu(y)\Bigr)^{1/p}\\
&\le As(x)\Bigl(\int_Y k(x,y)(g(y))^p(t(y))^{-p}\,d\nu(y)\Bigr)^{1/p}.
\end{aligned}$$
Thus, using Hölder's inequality again,
$$\begin{aligned}
\int_X\int_Y h(x)k(x,y)g(y)\,d\nu(y)\,d\mu(x) &\le A\int_X h(x)s(x)\Bigl(\int_Y k(x,y)(g(y))^p(t(y))^{-p}\,d\nu(y)\Bigr)^{1/p} d\mu(x)\\
&\le A\|h\|_{p'}\Bigl(\int_X (s(x))^p\Bigl(\int_Y k(x,y)(g(y))^p(t(y))^{-p}\,d\nu(y)\Bigr) d\mu(x)\Bigr)^{1/p}\\
&= A\|h\|_{p'}\Bigl(\int_Y\Bigl(\int_X (s(x))^p k(x,y)\,d\mu(x)\Bigr)(g(y))^p(t(y))^{-p}\,d\nu(y)\Bigr)^{1/p}\\
&\le AB\|h\|_{p'}\Bigl(\int_Y (g(y))^p\,d\nu(y)\Bigr)^{1/p} = AB\|h\|_{p'}\|g\|_p.
\end{aligned}$$
5.10 Hilbert's absolute inequality

Let us apply Schur's test to the kernel $k(x,y) = 1/(x+y)$ on $[0, \infty) \times [0, \infty)$. We take $s(x) = t(x) = 1/x^{1/pp'}$. Then
$$\int_0^\infty (s(x))^p k(x,y)\,dx = \int_0^\infty \frac{dx}{(x+y)x^{1/p'}} = \frac{\pi}{\sin(\pi/p')}\cdot\frac{1}{y^{1/p'}} = \frac{\pi}{\sin(\pi/p)}(t(y))^p,$$
and similarly
$$\int_0^\infty k(x,y)(t(y))^{p'}\,dy = \frac{\pi}{\sin(\pi/p)}(s(x))^{p'}.$$
Here we use the formula
$$\int_0^\infty \frac{dy}{(1+y)y^\alpha} = \frac{\pi}{\sin\alpha\pi} \quad\text{for } 0 < \alpha < 1,$$
which is a familiar exercise in the calculus of residues (Exercise 5.13).

Thus we have the following version of Hilbert's inequality for the kernel $k(x,y) = 1/(x+y)$. (There is another more important inequality, also known as Hilbert's inequality, for the kernel $k(x,y) = 1/(x-y)$: we consider this in Chapter 11. To distinguish the inequalities, we refer to the present inequality as Hilbert's absolute inequality.)
Theorem 5.10.1 (Hilbert's absolute inequality: the continuous case) If $f \in L_p[0,\infty)$ and $g \in L_{p'}[0,\infty)$, where $1 < p < \infty$, then
$$\int_0^\infty\int_0^\infty \frac{|f(x)g(y)|}{x+y}\,dx\,dy \le \frac{\pi}{\sin(\pi/p)}\|f\|_p\|g\|_{p'},$$
and the constant $\pi/\sin(\pi/p)$ is best possible.
Proof It remains to show that the constant $\pi/\sin(\pi/p)$ is the best possible. Suppose that $1 < \beta < 1 + 1/2p'$. Let
$$f_\beta(x) = (\beta-1)^{1/p}x^{-\beta/p}I_{[1,\infty)}(x) \quad\text{and}\quad g_\beta(y) = (\beta-1)^{1/p'}y^{-\beta/p'}I_{[1,\infty)}(y).$$
Then $\|f_\beta\|_p = \|g_\beta\|_{p'} = 1$. Also
$$\begin{aligned}
\int_0^\infty\int_0^\infty \frac{f_\beta(x)g_\beta(y)}{x+y}\,dx\,dy &= (\beta-1)\int_1^\infty\Bigl(\int_1^\infty\frac{dx}{x^{\beta/p}(x+y)}\Bigr)\frac{dy}{y^{\beta/p'}}\\
&= (\beta-1)\int_1^\infty\Bigl(\int_{1/y}^\infty\frac{du}{u^{\beta/p}(1+u)}\Bigr)\frac{dy}{y^\beta}\\
&= (\beta-1)\int_1^\infty\Bigl(\int_0^\infty\frac{du}{u^{\beta/p}(1+u)}\Bigr)\frac{dy}{y^\beta} - (\beta-1)\int_1^\infty\Bigl(\int_0^{1/y}\frac{du}{u^{\beta/p}(1+u)}\Bigr)\frac{dy}{y^\beta}\\
&\ge \frac{\pi}{\sin(\beta\pi/p)} - (\beta-1)\int_1^\infty\Bigl(\int_0^{1/y}\frac{du}{u^{\beta/p}}\Bigr)\frac{dy}{y^\beta}.
\end{aligned}$$
Now $\int_0^{1/y}u^{-\beta/p}\,du = 1/(\gamma y^\gamma)$, where $\gamma = 1 - \beta/p = 1/p' - (\beta-1)/p \ge 1/2p'$, and so
$$\int_1^\infty\Bigl(\int_0^{1/y}\frac{du}{u^{\beta/p}}\Bigr)\frac{dy}{y^\beta} = \frac{1}{\gamma}\int_1^\infty\frac{dy}{y^{\beta+\gamma}} = \frac{1}{\gamma(\beta+\gamma-1)} \le 4p'^2.$$
Thus
$$\int_0^\infty\int_0^\infty\frac{f_\beta(x)g_\beta(y)}{x+y}\,dx\,dy \ge \frac{\pi}{\sin(\beta\pi/p)} - 4p'^2(\beta-1).$$
Letting $\beta \to 1$, we obtain the result.
Similar arguments establish the following discrete result.

Theorem 5.10.2 (Hilbert's absolute inequality: the discrete case) If $a \in l_p(\mathbb{Z}^+)$ and $b \in l_{p'}(\mathbb{Z}^+)$, where $1 < p < \infty$, then
$$\sum_{m=0}^\infty\sum_{n=0}^\infty \frac{|a_m b_n|}{m+n+1} \le \frac{\pi}{\sin(\pi/p)}\|a\|_p\|b\|_{p'},$$
and the constant $\pi/\sin(\pi/p)$ is best possible.
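For $p = 2$ the constant is $\pi/\sin(\pi/2) = \pi$, which is easy to test on truncated sequences (a numerical check of mine, not from the text):

```python
import math
import random

def l2(x):
    return math.sqrt(sum(t * t for t in x))

random.seed(8)
a = [random.uniform(0.0, 1.0) for _ in range(40)]
b = [random.uniform(0.0, 1.0) for _ in range(40)]
double_sum = sum(a[m] * b[n] / (m + n + 1)
                 for m in range(len(a)) for n in range(len(b)))
# p = 2: the constant is pi / sin(pi/2) = pi
assert double_sum <= math.pi * l2(a) * l2(b)
```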
Let us give an application to the theory of analytic functions. The Hardy space $H^1(D)$ is the space of analytic functions $f$ on the unit disc $D = \{z : |z| < 1\}$ which satisfy
$$\|f\|_{H^1} = \sup_{0<r<1}\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(re^{i\theta})|\,d\theta < \infty.$$
Theorem 5.10.3 (Hardy) If $f(z) = \sum_{n=0}^\infty a_n z^n \in H^1$ then
$$\sum_{n=0}^\infty \frac{|a_n|}{n+1} \le \pi\|f\|_{H^1}.$$
We need the fact that we can write $f = bg$, where $b$ is an analytic function on $D$ for which $|b(re^{i\theta})| \to 1$ as $r \to 1$ for almost all $\theta$, and $g$ is a function in $H^1(D)$ with no zeros in $D$. (See [Dur 70], Theorem 2.5.) Then $\|g\|_{H^1} = \|f\|_{H^1}$. Since $0 \notin g(D)$, there exists an analytic function $h$ on $D$ such that $h^2 = g$. Let
$$h(z) = \sum_{n=0}^\infty h_n z^n, \qquad b(z)h(z) = \sum_{n=0}^\infty c_n z^n.$$
Then
$$\sum_{n=0}^\infty |h_n|^2 = \sup_{0<r<1}\frac{1}{2\pi}\int_{-\pi}^\pi |h(re^{i\theta})|^2\,d\theta = \|f\|_{H^1},$$
$$\sum_{n=0}^\infty |c_n|^2 = \sup_{0<r<1}\frac{1}{2\pi}\int_{-\pi}^\pi |b(re^{i\theta})h(re^{i\theta})|^2\,d\theta = \|f\|_{H^1},$$
and $a_n = \sum_{j=0}^n h_j c_{n-j}$. Thus, using Hilbert's inequality with $p = 2$,
$$\sum_{n=0}^\infty \frac{|a_n|}{n+1} \le \sum_{n=0}^\infty\sum_{j=0}^n \frac{|h_j c_{n-j}|}{n+1} = \sum_{n=0}^\infty\sum_{k=0}^\infty \frac{|h_n c_k|}{n+k+1} \le \pi\|f\|_{H^1}.$$
5.11 Notes and remarks

Hölder's inequality was proved in [Hol 89], and Minkowski's in [Min 96]. The systematic study of the $L_p$ spaces was inaugurated by F. Riesz [Ri(F) 10], as part of his programme investigating integral equations.
Exercises

5.1 When does equality hold in Minkowski's inequality?

5.2 (Continuation of Exercise 4.4.)
(i) Suppose that $(\Omega, \Sigma, P)$ is a probability space, and that $f$ is a non-negative measurable function on $\Omega$ for which $E(\log^+ f) < \infty$. Show that if $0 < r < \infty$ then $G(f) = \exp(E(\log f)) \le \|f\|_r = (E(f^r))^{1/r}$.
(ii) Suppose that $t > 1$. Show that $(t^r - 1)/r$ is an increasing function of $r$ on $(0, \infty)$, and that $(t^r - 1)/r \to \log t$ as $r \to 0$.
(iii) Suppose that $\|f\|_{r_0} < \infty$ for some $r_0 > 0$. Show that $\log(\|f\|_r) \le E((|f|^r - 1)/r)$ for $0 < r \le r_0$. Use the theorem of dominated convergence to show that $\|f\|_r \to G(f)$ as $r \to 0$.

5.3 Let $f^+$ and $f^-$ be the functions defined in Theorem 5.2.2. Show that $\mu((f^+ > 0) \cap (f^- > 0)) = 0$.

5.4 Suppose that $f \in L_p(0,1)$, where $0 < p < 1$. Choose $0 = t_0 < t_1 < \cdots < t_n = 1$ so that $\int_{t_{j-1}}^{t_j}|f(x)|^p\,dx = (1/n)\int_0^1 |f(x)|^p\,dx$ for $1 \le j \le n$. Let $f_j = nfI_{(t_{j-1}, t_j]}$. Calculate $d_p(f_j, 0)$. Show that if $U$ is a non-empty convex open subset of $L_p(0,1)$ then $U = L_p(0,1)$.

5.5 Show that $(L_\infty, \|.\|_\infty)$ is a Banach space.

5.6 Show that the simple functions are dense in $(L_\infty, \|.\|_\infty)$ if and only if $\mu(\Omega) < \infty$.

5.7 Give an inductive proof of Proposition 5.4.1.

5.8 Prove the following:
(i) If $f \in L_1$ and $g \in L_\infty$ then $fg \in L_1$ and $|l_g(f)| = |\int fg\,d\mu| \le \|f\|_1\|g\|_\infty$.
(ii) $l$ is a norm-decreasing linear mapping of $L_\infty$ into $(L_1)^*$.
(iii) If $g$ is a non-zero element of $L_\infty$ and $0 < \epsilon < 1$ there exists a set $A_\epsilon$ of finite positive measure such that $|g(\omega)| > (1-\epsilon)\|g\|_\infty$ for $\omega \in A_\epsilon$.
(iv) Show that $\|l_g\|_{(L_1)^*} = \|g\|_\infty$. (Consider $\mathrm{sgn}\,g\,I_{A_\epsilon}$.)
(v) By following the proof of Theorem 5.6.1, show that $l$ is an isometry of $L_\infty$ onto $(L_1)^*$. (Find $g$, and show that $\mu(|g| > \|\phi\|) = 0$.)

5.9 Show that there is a natural isometry $l$ of $L_1$ into $(L_\infty)^*$.

5.10 It is an important fact that the mapping $l$ of the preceding question is not surjective when $L_1$ is infinite-dimensional: $L_1$ is not reflexive.
(i) Let $c = \{x = (x_n) : x_n \to l \text{ for some } l, \text{ as } n \to \infty\}$. Show that $c$ is a closed linear subspace of $l_\infty$. If $x \in c$, let $\phi(x) = \lim_n x_n$. Show that $\phi \in c^*$, and that $\|\phi\| = 1$. Use the Hahn–Banach theorem to extend $\phi$ to $l_\infty$. Show that $\phi \notin l(l_1)$.
(ii) Use the Radon–Nikodym theorem, and the idea of the preceding example, to show that $l(L_1(0,1)) \ne (L_\infty(0,1))^*$.

5.11 Suppose that $\phi$ is a continuous linear functional on the complex Banach space $L_p^{\mathbb{C}}(\Omega, \Sigma, \mu)$, where $1 \le p < \infty$. If $f \in L_p^{\mathbb{R}}(\Omega, \Sigma, \mu)$, we can consider $f$ as an element of $L_p^{\mathbb{C}}(\Omega, \Sigma, \mu)$. Let $u(f)$ be the real part of $\phi(f)$ and $v(f)$ the imaginary part. Show that $u$ and $v$ are continuous linear functionals on $L_p^{\mathbb{R}}(\Omega, \Sigma, \mu)$. Show that $\phi$ is represented by an element $g$ of $L_{p'}^{\mathbb{C}}(\Omega, \Sigma, \mu)$. Show that $\|g\|_{p'} = \|\phi\|$.

5.12 Suppose that $(\Omega, \Sigma, \mu)$ is a $\sigma$-finite measure space, that $E$ is a Banach space and that $1 \le p < \infty$.
(i) If $\phi = \sum_{j=1}^k \phi_j I_{A_j}$ is a simple measurable $E^*$-valued function and $f \in L_p(E)$, let
$$j(\phi)(f) = \sum_{j=1}^k \int_{A_j} \phi_j(f)\,d\mu.$$
Show that $j(\phi) \in (L_p(E))^*$ and that $\|j(\phi)\|_{(L_p(E))^*} = \|\phi\|_{L_{p'}(E^*)}$.
(ii) Show that $j$ extends to an isometry of $L_{p'}(E^*)$ into $(L_p(E))^*$. [It is an important fact that $j$ need not be surjective: this requires the so-called Radon–Nikodym property. See [DiU 77] for details; this is an invaluable source of information concerning vector-valued functions.]
(iii) Show that
$$\|f\|_{L_p(E)} = \sup\{j(\phi)(f) : \phi \text{ simple}, \|\phi\|_{L_{p'}(E^*)} \le 1\}.$$

5.13 Prove that
$$\int_0^\infty \frac{1}{1+y}\cdot\frac{1}{y^\alpha}\,dy = \frac{\pi}{\sin\alpha\pi} \quad\text{for } 0 < \alpha < 1,$$
by contour integration, or otherwise.

5.14 Write out a proof of Theorem 5.10.2.
6

Banach function spaces

6.1 Banach function spaces

In this chapter, we introduce the idea of a Banach function space; this provides a general setting for most of the spaces of functions that we consider. As an example, we introduce the class of Orlicz spaces, which includes the $L_p$ spaces for $1 < p < \infty$. As always, let $(\Omega, \Sigma, \mu)$ be a $\sigma$-finite measure space, and let $M = M(\Omega, \Sigma, \mu)$ be the space of (equivalence classes of) measurable functions on $\Omega$.

A function norm on $M$ is a function $\rho: M \to [0, \infty]$ (note that $\infty$ is allowed) satisfying the following properties:
(i) $\rho(f) = 0$ if and only if $f = 0$; $\rho(\lambda f) = |\lambda|\rho(f)$ for $\lambda \ne 0$; $\rho(f+g) \le \rho(f) + \rho(g)$.
(ii) If $|f| \le |g|$ then $\rho(f) \le \rho(g)$.
(iii) If $0 \le f_n \nearrow f$ then $\rho(f) = \lim_n \rho(f_n)$.
(iv) If $A \in \Sigma$ and $\mu(A) < \infty$ then $\rho(I_A) < \infty$.
(v) If $A \in \Sigma$ and $\mu(A) < \infty$ there exists $C_A$ such that $\int_A |f|\,d\mu \le C_A\rho(f)$ for any $f \in M$.

If $\rho$ is a function norm, the space $E = \{f \in M : \rho(f) < \infty\}$ is called a Banach function space. If $f \in E$, we write $\|f\|_E$ for $\rho(f)$. Then condition (i) ensures that $E$ is a vector space and that $\|.\|_E$ is a norm on it. We denote the closed unit ball $\{x : \rho(x) \le 1\}$ of $E$ by $B_E$. As an example, if $1 \le p < \infty$, let $\rho_p(f) = (\int |f|^p\,d\mu)^{1/p}$. Then $\rho_p$ is a Banach function norm, and the corresponding Banach function space is $L_p$. Similarly, $L_\infty$ is a Banach function space.

Condition (ii) ensures that $E$ is a lattice, and rather more: if $g \in E$ and $|f| \le |g|$ then $f \in E$ and $\|f\|_E \le \|g\|_E$. Condition (iv) ensures that the simple functions are in $E$, and condition (v) ensures that we can integrate functions in $E$ over sets of finite measure. In particular, if $\mu(\Omega) < \infty$ then $L_\infty \subseteq E \subseteq L_1$, and the inclusion mappings are continuous. Condition (iii) corresponds to the monotone convergence theorem for $L_1$, and has similar uses, as the next result shows.
Proposition 6.1.1 (Fatou's lemma) Suppose that $(f_n)$ is a sequence in a Banach function space $(E, \|.\|_E)$, that $f_n \to f$ almost everywhere and that $\liminf \|f_n\|_E < \infty$. Then $f \in E$ and $\|f\|_E \le \liminf \|f_n\|_E$.

Proof Let $h_n = \inf_{m \ge n} |f_m|$; note that $h_n \le |f_n|$. Then $0 \le h_n \nearrow |f|$, so that
$$\rho(f) = \rho(|f|) = \lim_n \|h_n\|_E \le \liminf_n \|f_n\|_E.$$
Suppose that $A \in \Sigma$. Then if $E$ is a Banach function space, we set $E_A = \{f \in E : f = fI_A\}$. $E_A$ is the linear subspace of $E$ consisting of those functions which are zero outside $A$.

Proposition 6.1.2 If $E$ is a Banach function space and $\mu(A) < \infty$ then $\{f \in E_A : \|f\|_E \le 1\}$ is closed in $L_1^A$.

Proof Suppose that $(f_n)$ is a sequence in $\{f \in E_A : \|f\|_E \le 1\}$ which converges in $L_1^A$ norm to $f_A$, say. Then there is a subsequence $(f_{n_k})$ which converges almost everywhere to $f_A$. Then $f_A$ is zero outside $A$, and it follows from Fatou's lemma that $\rho(f_A) \le 1$.
Theorem 6.1.1 If $(E, \|.\|_E)$ is a Banach function space, then it is norm complete.

Proof Suppose that $(f_n)$ is a Cauchy sequence. Then if $\mu(A) < \infty$, $(f_nI_A)$ is a Cauchy sequence in $L_1^A$, and so it converges in $L_1^A$ norm to $f_A$, say. Further, there is a subsequence $(f_{n_k}I_A)$ which converges almost everywhere to $f_A$. Since $(\Omega, \Sigma, \mu)$ is $\sigma$-finite, we can use a diagonal argument to show that there exists a subsequence $(g_k)$ of $(f_n)$ which converges almost everywhere to a function $f$. It will be enough to show that $f \in E$ and that $\|f - g_k\|_E \to 0$. First, $\rho(f) \le \sup_k \|g_k\|_E < \infty$, by Fatou's lemma, so that $f \in E$. Second, given $\epsilon > 0$ there exists $k_0$ such that $\|g_l - g_k\|_E < \epsilon$ for $l > k \ge k_0$. Since $g_l - g_k \to f - g_k$ almost everywhere as $l \to \infty$, another application of Fatou's lemma shows that $\|f - g_k\|_E \le \epsilon$ for $k \ge k_0$.
It is convenient to characterize function norms and Banach function spaces in terms of the unit ball.

Proposition 6.1.3 Let $B_E$ be the unit ball of a Banach function space. Then
(i) $B_E$ is convex.
(ii) If $|f| \le |g|$ and $g \in B_E$ then $f \in B_E$.
(iii) If $0 \le f_n \nearrow f$ and $f_n \in B_E$ then $f \in B_E$.
(iv) If $A \in \Sigma$ and $\mu(A) < \infty$ then $I_A \in \lambda B_E$ for some $0 < \lambda < \infty$.
(v) If $A \in \Sigma$ and $\mu(A) < \infty$ then there exists $0 < C_A < \infty$ such that $\int_A |f|\,d\mu \le C_A$ for any $f \in B_E$.

Conversely, suppose that $B$ satisfies these conditions. Let
$$\rho(f) = \inf\{\lambda > 0 : f \in \lambda B\}.$$
[The infimum of the empty set is $\infty$.] Then $\rho$ is a function norm, and $B = \{f : \rho(f) \le 1\}$.

Proof This is a straightforward but worthwhile exercise.
6.2 Function space duality

We now turn to function space duality.

Proposition 6.2.1 Suppose that $\rho$ is a function norm. If $f \in M$, let
$$\rho'(f) = \sup\Bigl\{\int |fg|\,d\mu : g \in B_E\Bigr\}.$$
Then $\rho'$ is a function norm.

Proof This involves more straightforward checking. Let us just check two of the conditions. First, suppose that $\rho'(f) = 0$. Then $\rho'(|f|) = 0$, and by condition (iv), $\int_F |f|\,d\mu = 0$ whenever $\mu(F) < \infty$, and this ensures that $f = 0$.

Second, suppose that $0 \le f_n \nearrow f$ and that $\sup_n \rho'(f_n) = \lambda < \infty$. If $\rho(g) \le 1$ then $\int f_n|g|\,d\mu \le \lambda$, and so $\int f|g|\,d\mu \le \lambda$, by the monotone convergence theorem. Thus $\rho'(f) \le \lambda$.

$\rho'$ is the associate function norm, and the corresponding Banach function space $(E', \|.\|_{E'})$ is the associate function space. If $f \in E'$ then the mapping $g \to \int fg\,d\mu$ is a continuous linear functional on $E$; the resulting mapping of $(E', \|.\|_{E'})$ into the dual space $E^*$ of all continuous linear functionals on $(E, \|.\|_E)$ is an isometry, and we frequently identify $E'$ with a subspace of $E^*$.
Theorem 6.2.1 If $\rho$ is a function norm then $\rho'' = \rho$.

Proof This uses the Hahn–Banach theorem, and also uses the fact that the dual of $L_1$ can be identified with $L_\infty$ (Exercise 5.8). It follows from the definitions that $\rho'' \le \rho$, so that we must show that $\rho \le \rho''$. For this it is enough to show that if $\rho(f) > 1$ then $\rho''(f) > 1$. There exist simple functions $f_n$ such that $0 \le f_n \nearrow |f|$. Then $\rho(f_n) \to \rho(|f|) = \rho(f)$. Thus there exists a simple function $g$ such that $0 \le g \le |f|$ and $\rho(g) > 1$.

Suppose that $g$ is supported on $A$, where $\mu(A) < \infty$. Then $g$ does not belong to $\{hI_A : h \in B_E\}$, and this set is a closed convex subset of $L_1^A$. By the separation theorem (Theorem 4.6.3) there exists $k \in L_\infty^A$ such that
$$\int_A gk\,d\mu > 1 \ge \sup\Bigl\{\Bigl|\int_A hk\,d\mu\Bigr| : hI_A \in B_E\Bigr\} = \sup\Bigl\{\Bigl|\int hk\,d\mu\Bigr| : h \in B_E\Bigr\}.$$
This implies first that $\rho'(k) \le 1$ and second that $\rho''(g) > 1$. Thus $\rho''(f) \ge \rho''(g) > 1$.
6.3 Orlicz spaces

Let us give an example of an important class of Banach function spaces, the Orlicz spaces. A Young's function is a non-negative convex function $\Phi$ on $[0, \infty)$, with $\Phi(0) = 0$, for which $\Phi(t)/t \to \infty$ as $t \to \infty$. Let us consider
$$B_\Phi = \Bigl\{f \in M : \int \Phi(|f|)\,d\mu \le 1\Bigr\}.$$
Then $B_\Phi$ satisfies the conditions of Proposition 6.1.3; the corresponding Banach function space $L_\Phi$ is called the Orlicz space defined by $\Phi$. The norm
$$\|f\|_\Phi = \inf\Bigl\{\lambda > 0 : \int \Phi(|f|/\lambda)\,d\mu \le 1\Bigr\}$$
is known as the Luxemburg norm on $L_\Phi$.

The most important, and least typical, class of Orlicz spaces occurs when we take $\Phi(t) = t^p$, where $1 < p < \infty$; in this case we obtain $L_p$.

[The spaces $L_1$ and $L_\infty$ are also Banach function spaces, although, according to our definition, they are not Orlicz spaces.]

Let us give some examples of Orlicz spaces.

$\Phi(t) = e^t - 1$. We denote the corresponding Orlicz space by $(L_{\exp}, \|.\|_{\exp})$. Note that if $\mu(\Omega) < \infty$ then $L_{\exp} \subseteq L_p$ for $1 \le p < \infty$, and $\|f\|_{\exp} \le 1$ if and only if $\int e^{|f|}\,d\mu \le 1 + \mu(\Omega)$.

$\Phi(t) = e^{t^2} - 1$. We denote the corresponding Orlicz space by $(L_{\exp 2}, \|.\|_{\exp 2})$. Note that $L_{\exp 2} \subseteq L_{\exp}$.

$\Phi(t) = t\log^+ t$, where $\log^+ t = \max(\log t, 0)$. We denote the corresponding Orlicz space by $(L_{L\log L}, \|.\|_{L\log L})$.
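The Luxemburg norm can be computed numerically by bisection, since $\lambda \mapsto \int \Phi(|f|/\lambda)\,d\mu$ is decreasing in $\lambda$. This sketch (my addition; the function name, the weighted counting measure, and the bisection scheme are mine, not the book's) recovers the $L_2$ norm when $\Phi(t) = t^2$:

```python
def luxemburg_norm(f, Phi, mu=None, tol=1e-10):
    # ||f||_Phi = inf{ lam > 0 : sum_i mu_i * Phi(|f_i|/lam) <= 1 }  (counting-type measure)
    if mu is None:
        mu = [1.0] * len(f)
    def modular(lam):
        return sum(m * Phi(abs(x) / lam) for m, x in zip(mu, f))
    lo, hi = tol, 1.0
    while modular(hi) > 1.0:          # find an upper bound for the norm
        hi *= 2.0
    while hi - lo > tol:              # bisect: modular(lam) is decreasing in lam
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if modular(mid) > 1.0 else (lo, mid)
    return hi

# sanity check: Phi(t) = t^2 gives the L_2 norm
assert abs(luxemburg_norm([3.0, 4.0], lambda t: t * t) - 5.0) < 1e-6
```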
We now turn to duality properties. First we consider Young's functions more carefully. As $\Phi$ is convex, it has a left-derivative $D_-\Phi$ and a right-derivative $D_+\Phi$. We choose to work with the right-derivative, which we denote by $\varphi$, but either will do. $\varphi$ is a non-negative increasing right-continuous function on $[0, \infty)$, and $\varphi(t) \to \infty$ as $t \to \infty$, since $D_+\Phi(t) \ge \Phi(t)/t$.

Proposition 6.3.1 Suppose that $\Phi$ is a Young's function with right-derivative $\varphi$. Then $\Phi(t) = \int_0^t \varphi(s)\,ds$.
Proof Suppose that $\epsilon > 0$. There exists a partition $0 = t_0 < t_1 < \cdots < t_n = t$ such that

$$\sum_{i=1}^n \varphi(t_i)(t_i - t_{i-1}) - \epsilon \le \int_0^t \varphi(s)\,ds \le \sum_{i=1}^n \varphi(t_{i-1})(t_i - t_{i-1}) + \epsilon.$$

But $\varphi(t_{i-1})(t_i - t_{i-1}) \le \Phi(t_i) - \Phi(t_{i-1})$ and

$$\varphi(t_i)(t_i - t_{i-1}) \ge D_-\Phi(t_i)(t_i - t_{i-1}) \ge \Phi(t_i) - \Phi(t_{i-1}),$$

so that

$$\Phi(t) - \epsilon \le \int_0^t \varphi(s)\,ds \le \Phi(t) + \epsilon.$$

Since $\epsilon$ is arbitrary, the result follows.
The function $\varphi$ is increasing and right-continuous, but it need not be strictly increasing, and it can have jump discontinuities. Nevertheless, we can define an appropriate inverse function: we set

$$\psi(u) = \sup\{t : \varphi(t) \le u\}.$$

Then $\psi$ is increasing and right-continuous, and $\psi(u) \to \infty$ as $u \to \infty$. The functions $\varphi$ and $\psi$ have symmetric roles.

Proposition 6.3.2 $\varphi(t) = \sup\{u : \psi(u) \le t\}$.

Proof Let us set $\theta(t) = \sup\{u : \psi(u) \le t\}$. Suppose that $\psi(u) \le t$. Then if $t' > t$, $\varphi(t') > u$. Since $\varphi$ is right-continuous, $\varphi(t) \ge u$, and so $\varphi(t) \ge \theta(t)$. On the other hand, if $u < \varphi(t)$, then $\psi(u) \le t$, so that $\theta(t) \ge u$. Thus $\theta(t) \ge \varphi(t)$.
We now set $\Psi(u) = \int_0^u \psi(v)\,dv$. $\Psi$ is a Young's function, the Young's function complementary to $\Phi$.

Theorem 6.3.1 (Young's inequality) Suppose that $\Phi(t) = \int_0^t \varphi(s)\,ds$ and $\Psi(u) = \int_0^u \psi(v)\,dv$ are complementary Young's functions. Then $tu \le \Phi(t) + \Psi(u)$, with equality if and only if $u = \varphi(t)$ or $t = \psi(u)$.

Proof We consider the integrals as areas under the curve. First suppose that $\varphi(t) = u$. Then if $0 \le s < t$ and $0 \le v < u$, then either $v \le \varphi(s)$ or $s < \psi(v)$, but not both. Thus the rectangle $[0, t) \times [0, u)$ is divided into two disjoint sets with measures $\int_0^t \varphi(s)\,ds$ and $\int_0^u \psi(v)\,dv$. [Draw a picture!]

Next suppose that $\varphi(t) < u$. Then, since $\varphi$ is right-continuous, it follows from the definition of $\psi$ that $\psi(v) > t$ for $\varphi(t) < v \le u$. Thus

$$tu = t\varphi(t) + t(u - \varphi(t)) < (\Phi(t) + \Psi(\varphi(t))) + \int_{\varphi(t)}^u \psi(v)\,dv = \Phi(t) + \Psi(u).$$

Finally, if $\varphi(t) > u$ then $\psi(u) \le t$, and we obtain the result by interchanging $\Phi$ and $\Psi$.
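For the classical pair $\Phi(t) = t^p/p$ and $\Psi(u) = u^q/q$, where $1/p + 1/q = 1$ (so $\varphi(t) = t^{p-1}$ and its inverse is $\psi(u) = u^{q-1}$), Young's inequality is the familiar one. A numerical spot-check (our own sketch, not from the text) also exhibits the equality case $u = \varphi(t)$:

```python
# Complementary pair Phi(t) = t^p/p, Psi(u) = u^q/q with 1/p + 1/q = 1.
p = 3.0
q = p / (p - 1.0)
Phi = lambda t: t ** p / p
Psi = lambda u: u ** q / q

# Young's inequality t*u <= Phi(t) + Psi(u) on a grid of (t, u):
ok = True
for t in [0.1 * i for i in range(1, 30)]:
    for u in [0.1 * j for j in range(1, 30)]:
        ok = ok and (t * u <= Phi(t) + Psi(u) + 1e-12)
print(ok)  # True

# Equality holds exactly when u = phi(t) = t^(p-1):
t = 1.7
u = t ** (p - 1)
print(abs(t * u - (Phi(t) + Psi(u))) < 1e-9)  # True
```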
Corollary 6.3.1 If $f \in L^\Phi$ and $g \in L^\Psi$ then $fg \in L^1$ and

$$\int |fg|\,d\mu \le 2\|f\|_\Phi \|g\|_\Psi.$$

Proof Suppose that $\lambda > \|f\|_\Phi$ and $\nu > \|g\|_\Psi$. Then

$$\frac{|fg|}{\lambda\nu} \le \Phi\Bigl(\frac{|f|}{\lambda}\Bigr) + \Psi\Bigl(\frac{|g|}{\nu}\Bigr);$$

integrating, $\int |fg|\,d\mu \le 2\lambda\nu$, which gives the result.
Thus $L^\Psi \subseteq (L^\Phi)'$, and $\|g\|'_\Phi \le 2\|g\|_\Psi$ (where $\|.\|'_\Phi$ is the norm associate to $\|.\|_\Phi$). In fact, we can say more.

Theorem 6.3.2 $L^\Psi = (L^\Phi)'$ and

$$\|g\|_\Psi \le \|g\|'_\Phi \le 2\|g\|_\Psi.$$
Proof We have seen that $L^\Psi \subseteq (L^\Phi)'$ and that $\|g\|'_\Phi \le 2\|g\|_\Psi$. Suppose that $g \in (L^\Phi)'$. Then there exists a sequence $(g_n)$ of simple functions such that $0 \le g_n \uparrow |g|$. Since $\|g\|_\Psi = \||g|\|_\Psi = \sup_n \|g_n\|_\Psi$, it is therefore enough to show that if $g$ is a non-negative simple function with $\|g\|_\Psi = 1$ then $\|g\|'_\Phi \ge 1$.

Let $h = \psi(g)$. Then the conditions for equality in Young's inequality hold pointwise, and so $hg = \Phi(h) + \Psi(g)$. Thus

$$\int hg\,d\mu = \int \Phi(h)\,d\mu + \int \Psi(g)\,d\mu = \int \Phi(h)\,d\mu + 1.$$

If $\|h\|_\Phi \le 1$, this implies that $\|g\|'_\Phi \ge 1$. On the other hand, if $\|h\|_\Phi = \lambda > 1$ then

$$\lambda \le \lambda \int \Phi(h/\lambda)\,d\mu \le \int \Phi(h)\,d\mu,$$

by the convexity of $\Phi$. Thus $\int hg\,d\mu \ge \|h\|_\Phi$, and so $\|g\|'_\Phi \ge \int (h/\|h\|_\Phi)g\,d\mu \ge 1$.
We write $\|.\|_{(\Phi)}$ for the norm $\|.\|'_\Psi$ on $L^\Phi$: it is called the Orlicz norm. Theorem 6.3.2 then states that the Luxemburg norm and the Orlicz norm are equivalent.

Finally, let us observe that we can also consider vector-valued function spaces. If $(X, \rho)$ is a Banach function space and $(E, \|.\|_E)$ is a Banach space, we set $X(E)$ to be the set of $E$-valued strongly measurable functions $f$ for which $\rho(\|f\|_E) < \infty$. It is a straightforward matter to verify that $X(E)$ is a vector space, that $\|f\|_{X(E)} = \rho(\|f\|_E)$ is a norm on $X(E)$, and that under this norm $X(E)$ is a Banach space.
6.4 Notes and remarks

A systematic account of Banach function spaces was given by Luxemburg [Lux 55] in his PhD thesis, and developed in a series of papers with Zaanen [LuZ 63]. Orlicz spaces were introduced in [Orl 32]. The definition of these spaces can be varied (for example, to include $L^1$ and $L^\infty$): the simple definition that we have given is enough to include the important spaces $L_{\exp}$, $L_{\exp^2}$ and $L_{L\log L}$. A fuller account of Banach function spaces, and much else, is given in [BeS 88].
Exercises

6.1 Write out a proof of Proposition 6.1.3 and the rest of Proposition 6.2.1.

6.2 Suppose that the step functions are dense in the Banach function space $E$. Show that the associate space $E'$ can be identified with the Banach space dual of $E$.

6.3 Suppose that $E_1$ and $E_2$ are Banach function spaces, and that $E_1 \subseteq E_2$. Use the closed graph theorem to show that the inclusion mapping is continuous. Give a proof which does not depend on the closed graph theorem. [The closed graph theorem is a fundamental theorem of functional analysis: if you are not familiar with it, consult [Bol 90] or [TaL 80].]

6.4 Suppose that $E$ is a Banach function space and that $fg \in L^1$ for all $g \in E$. Show that $f \in E'$.

6.5 Suppose that $E$ is a Banach function space. Show that the associate space $E'$ can be identified with the dual $E^*$ of $E$ if and only if whenever $(f_n)$ is an increasing sequence of non-negative functions in $E$ which converges almost everywhere to $f \in E$ then $\|f - f_n\|_E \to 0$.

6.6 Calculate the functions complementary to $e^t - 1$, $e^{t^2} - 1$ and $t\log^+ t$.

6.7 Suppose that $\Phi$ is an Orlicz function with right derivative $\varphi$. Show that $\int \Phi(|f|)\,d\mu = \int_0^\infty \varphi(u)\mu(|f| > u)\,du$.

6.8 Suppose that $\Phi$ is a Young's function. For $s \ge 0$ and $t \ge 0$ let $f_s(t) = st - \Phi(t)$. Show that $f_s(t) \to -\infty$ as $t \to \infty$. Let $\Phi^*(s) = \sup\{f_s(t) : t \ge 0\}$. Show that $\Phi^*$ is the Young's function conjugate to $\Phi$.

The formula $\Phi^*(s) = \sup\{st - \Phi(t) : t \ge 0\}$ expresses $\Phi^*$ as the Legendre–Fenchel transform of $\Phi$.
7

Rearrangements

7.1 Decreasing rearrangements

Suppose that $(E, \|.\|_E)$ is a Banach function space and that $f \in E$. Then $\|f\|_E = \||f|\|_E$, so that the norm of $f$ depends only on the absolute values of $f$. For many important function spaces we can say more. Suppose for example that $f \in L^p$, where $1 < p < \infty$. By Proposition 1.3.4, $\|f\|_p = (p\int_0^\infty t^{p-1}\mu(|f| > t)\,dt)^{1/p}$, and so $\|f\|_p$ depends only on the distribution of $|f|$. The same is true for functions in Orlicz spaces. In this chapter, we shall consider properties of functions and spaces of functions with this property.

In order to avoid some technical difficulties which have little real interest, we shall restrict our attention to two cases:

(i) $(\Omega, \Sigma, \mu)$ is an atom-free measure space;

(ii) $\Omega = \mathbf{N}$ or $\{1, \ldots, n\}$, with counting measure.

In the second case, we are concerned with sequences, and the arguments are usually, but not always, easier. We shall begin by considering case (i) in detail, and shall then describe what happens in case (ii), giving details only when different arguments are needed.

Suppose that we are in the first case, so that $(\Omega, \Sigma, \mu)$ is atom-free. We shall then make use of various properties of the measure space, which follow from the fact that if $A \in \Sigma$ and $0 < t < \mu(A)$ then there exists a subset $B$ of $A$ with $\mu(B) = t$ (Exercise 7.1). If $f \ge 0$ then the distribution function $\lambda_f$ takes values in $[0, \infty]$. The fact that $\lambda_f$ can take the value $\infty$ is a nuisance. For example, if $\Omega = \mathbf{R}$, with Lebesgue measure, and $f(x) = \tan^2 x$, then $\lambda_f(t) = \infty$ for all $t > 0$, which does not give us any useful information about $f$; similarly, if $f(x) = \sin^2 x$, then $\lambda_f(t) = \infty$ for $0 < t < 1$ and $\lambda_f(t) = 0$ for $t \ge 1$. We shall frequently restrict attention to functions in

$$M_1(\Omega, \Sigma, \mu) = \{f \in M(\Omega, \Sigma, \mu) : \lambda_{|f|}(u) < \infty \text{ for some } u > 0\}.$$
Thus $M_1$ contains $\sin^2 x$, but does not contain $\tan^2 x$. If $f \in M_1$, let $C_f = \inf\{u : \lambda_{|f|}(u) < \infty\}$. Let us also set

$$M_0 = \{f \in M_1 : C_f = 0\} = \{f \in M : \lambda_{|f|}(u) < \infty \text{ for all } u > 0\},$$

and at the other extreme, let $M_\infty$ denote the space of (equivalence classes of) measurable functions taking values in $[-\infty, \infty]$. Thus $M_0 \subseteq M_1 \subseteq M \subseteq M_\infty$. Note that $L^p \subseteq M_0$ for $0 < p < \infty$ and that $L^\infty \subseteq M_1$.

Suppose that $f \in M_1$. Then the distribution function $\lambda_{|f|}$ is a decreasing right-continuous function on $[0, \infty)$, taking values in $[0, \infty]$ (Proposition 1.3.3). We now consider the distribution function $f^*$ of $\lambda_{|f|}$.
Proposition 7.1.1 If $f \in M_1$, $f^*$ is a decreasing right-continuous function on $[0, \infty)$, taking values in $[0, \infty]$, and $f^*(t) = 0$ if $t > \mu(\Omega)$. If $\mu(\Omega) = \infty$ then $f^*(t) \to C_f$ as $t \to \infty$.

The functions $|f|$ and $f^*$ are equidistributed: $\mu(|f| > u) = \lambda(f^* > u)$ for $0 \le u < \infty$ (where $\lambda$ denotes Lebesgue measure on $[0, \infty)$).

Proof The statements in the first paragraph follow from the definitions, and Proposition 1.3.3.

If $\mu(|f| > u) = \infty$, then certainly $\mu(|f| > u) \ge \lambda(f^* > u)$. If $\lambda_{|f|}(u) = \mu(|f| > u) = t < \infty$, then $f^*(t) \le u$, so that $\lambda(f^* > u) \le t = \mu(|f| > u)$.

If $\lambda(f^* > u) = \infty$, then certainly $\mu(|f| > u) \le \lambda(f^* > u)$. If $\lambda(f^* > u) = t < \infty$, then $f^*(t) \le u$: that is, $\lambda(\lambda_{|f|} > t) \le u$. Thus if $v > u$, $\lambda_{|f|}(v) \le t$. But $\lambda_{|f|}$ is right-continuous, and so $\mu(|f| > u) = \lambda_{|f|}(u) \le t = \lambda(f^* > u)$.
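For a non-negative step function on a finite measure space, both $\lambda_f$ and $f^*$ can be computed by inspection; the following sketch (our own, with hypothetical helper names, not from the text) mirrors the definitions directly:

```python
# f takes the values below on sets of the given measures:
values = [2.0, 5.0, 2.0, 1.0]
masses = [0.5, 1.0, 0.5, 2.0]

def dist(u):
    # lambda_f(u) = mu(f > u)
    return sum(m for v, m in zip(values, masses) if v > u)

def f_star(t):
    # f*(t) = inf{u >= 0 : lambda_f(u) <= t}; for a step function the
    # infimum is attained at one of the finitely many levels (or at 0)
    return min(u for u in [0.0] + values if dist(u) <= t)

print(dist(1.5))    # 2.0 (total mass where f > 1.5)
print(f_star(0.5))  # 5.0 (the top value occupies [0, 1))
print(f_star(2.0))  # 1.0
```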
The function $f^*$ is called the decreasing rearrangement of $f$: it is a right-continuous decreasing function on $[0, \infty)$ with the same distribution as $|f|$. Two applications of Proposition 1.3.3 also give us the following result.

Proposition 7.1.2 If $0 \le f_n \uparrow f$ and $f \in M_1$ then $0 \le f_n^* \uparrow f^*$.

This proposition is very useful, since it allows us to work with simple functions.

Proposition 7.1.3 If $f \in M_1$ and $E$ is a measurable set, then

$$\int_E |f|\,d\mu \le \int_0^{\mu(E)} f^*\,dt.$$
Proof Let $h = |f|I_E$. Since $0 \le h \le |f|$, $h^* \le f^*$, and $h^*(t) = 0$ for $t > \mu(E)$. Since $h$ and $h^*$ are equidistributed,

$$\int_E |f|\,d\mu = \int h\,d\mu = \int_0^{\mu(E)} h^*\,dt \le \int_0^{\mu(E)} f^*\,dt.$$
Proposition 7.1.4 If $f, g \in M_1$ then

$$\int |fg|\,d\mu \le \int_0^\infty f^* g^*\,dt.$$

Proof We can suppose that $f, g \ge 0$. Let $(f_n)$ be an increasing sequence of non-negative simple functions, increasing to $f$. Then $f_n^* \uparrow f^*$, by Proposition 7.1.2. By the monotone convergence theorem, $\int fg\,d\mu = \lim_n \int f_n g\,d\mu$ and $\int_0^\infty f^* g^*\,dt = \lim_n \int_0^\infty f_n^* g^*\,dt$. It is therefore sufficient to prove the result for simple $f$. We can write $f = \sum_{i=1}^n a_i I_{F_i}$, where $a_i \ge 0$ and $F_1 \subseteq F_2 \subseteq \cdots \subseteq F_n$. (Note that we have an increasing sequence of sets here, rather than a disjoint sequence, so that $f^* = \sum_{i=1}^n a_i I_{[0,\mu(F_i))}$.) Then, using Proposition 7.1.3,

$$\int fg\,d\mu = \sum_{i=1}^n a_i \Bigl(\int_{F_i} g\,d\mu\Bigr) \le \sum_{i=1}^n a_i \Bigl(\int_0^{\mu(F_i)} g^*\,dt\Bigr) = \int_0^\infty \Bigl(\sum_{i=1}^n a_i I_{[0,\mu(F_i))}\Bigr) g^*\,dt = \int_0^\infty f^* g^*\,dt.$$
7.2 Rearrangement-invariant Banach function spaces

We say that a Banach function space $(X, \|.\|_X)$ is rearrangement-invariant if whenever $f \in X$ and $|f|$ and $|g|$ are equidistributed then $g \in X$ and $\|f\|_X = \|g\|_X$. Suppose that $(X, \|.\|_X)$ is rearrangement-invariant and $\sigma$ is a measure-preserving map of $(\Omega, \Sigma, \mu)$ onto itself (that is, $\mu(\sigma^{-1}(A)) = \mu(A)$ for each $A \in \Sigma$). If $f \in X$ then $f$ and $f \circ \sigma$ have the same distribution, and so $f \circ \sigma \in X$ and $\|f \circ \sigma\|_X = \|f\|_X$; this explains the terminology.

Theorem 7.2.1 Suppose that $(X, \|.\|_X)$ is a rearrangement-invariant function space. Then $(X', \|.\|_{X'})$ is also a rearrangement-invariant function space, and

$$\|f\|_X = \sup\Bigl\{\int f^* g^*\,dt : \|g\|_{X'} \le 1\Bigr\} = \sup\Bigl\{\int f^* g^*\,dt : g \text{ simple},\ \|g\|_{X'} \le 1\Bigr\}.$$
Proof By Proposition 7.1.4,

$$\|f\|_X = \sup\Bigl\{\int |fg|\,d\mu : \|g\|_{X'} \le 1\Bigr\} \le \sup\Bigl\{\int f^* g^*\,dt : \|g\|_{X'} \le 1\Bigr\}.$$

On the other hand, if $f \in X$ and $g \in X'$ with $\|g\|_{X'} \le 1$, there exist increasing sequences $(f_n)$ and $(g_n)$ of simple functions which converge to $|f|$ and $|g|$ respectively. Further, for each $n$, we can take $f_n$ and $g_n$ of the form

$$f_n = \sum_{j=1}^k a_j I_{E_j}, \quad g_n = \sum_{j=1}^k b_j I_{E_j},$$

where $E_1, \ldots, E_k$ are disjoint sets of equal measure (here we use the special properties of $(\Omega, \Sigma, \mu)$; see Exercise 7.7) and where $b_1 \ge \cdots \ge b_k$. Now there exists a permutation $\sigma$ of $(1, \ldots, k)$ such that $a_{\sigma(1)} \ge \cdots \ge a_{\sigma(k)}$. Let $\tilde f_n = \sum_{j=1}^k a_{\sigma(j)} I_{E_j}$. Then $f_n$ and $\tilde f_n$ are equidistributed, so that

$$\|f\|_X \ge \|f_n\|_X = \|\tilde f_n\|_X \ge \int \tilde f_n g_n\,d\mu = \int f_n^* g_n^*\,dt.$$

Letting $n \to \infty$, we see that $\|f\|_X \ge \int f^* g^*\,dt$.

Finally, suppose that $g \in X'$ and that $|g|$ and $|h|$ are equidistributed. Then if $f \in X$ and $\|f\|_X \le 1$,

$$\int |fh|\,d\mu \le \int f^* h^*\,dt = \int f^* g^*\,dt \le \|g\|_{X'}.$$

This implies that $h \in X'$ and that $\|h\|_{X'} \le \|g\|_{X'}$; similarly $\|g\|_{X'} \le \|h\|_{X'}$.
7.3 Muirhead's maximal function

In Section 4.3 we introduced the notion of a sublinear functional; these functionals play an essential role in the Hahn–Banach theorem. We now extend this notion to more general mappings.

A mapping $T$ from a vector space $E$ into a space $M_\infty(\Omega, \Sigma, \mu)$ is subadditive if $T(f + g) \le T(f) + T(g)$ for $f, g \in E$, is positive homogeneous if $T(\alpha f) = \alpha T(f)$ for $f \in E$ and $\alpha$ real and positive, and is sublinear if it is both subadditive and positive homogeneous. The mapping $f \mapsto f^*$ gives good information about $f$, but it is not subadditive: if $A$ and $B$ are disjoint sets of positive measure $t$, then $I_A^* + I_B^* = 2I_{[0,t)}$, while $(I_A + I_B)^* = I_{[0,2t)}$.

We now introduce a closely related mapping, of great importance, which is sublinear. Suppose that $f \in M_1$ and that $t > 0$. We define Muirhead's maximal function as

$$f^{**}(t) = \sup\Bigl\{\frac{1}{t}\int_E |f|\,d\mu : \mu(E) \le t\Bigr\},$$

for $0 < t < \mu(\Omega)$.

Theorem 7.3.1 The mapping $f \mapsto f^{**}$ is sublinear, and if $|f| \le |g|$ then $f^{**} \le g^{**}$. If $|f_n| \uparrow |f|$ then $f_n^{**} \uparrow f^{**}$. Further,

$$f^{**}(t) = \frac{1}{t}\int_0^t f^*(s)\,ds.$$
Proof It follows from the definition that the mapping $f \mapsto f^{**}$ is sublinear, and that if $|f| \le |g|$ then $f^{**} \le g^{**}$. Thus if $|f_n| \uparrow |f|$ then $\lim_n f_n^{**} \le f^{**}$. On the other hand, if $\mu(B) \le t$ then, by the monotone convergence theorem, $\int_B |f_n|\,d\mu \uparrow \int_B |f|\,d\mu$. Thus $\lim_n f_n^{**}(t) \ge (1/t)\int_B |f|\,d\mu$. Taking the supremum over $B$, it follows that $\lim_n f_n^{**}(t) \ge f^{**}(t)$.

If $f \in M_1$, then $f^{**}(t) \le (1/t)\int_0^t f^*(s)\,ds$, by Proposition 7.1.3. It follows from Proposition 7.1.2 and the monotone convergence theorem that if $|f_n| \uparrow |f|$ then $(1/t)\int_0^t f_n^*(s)\,ds \uparrow (1/t)\int_0^t f^*(s)\,ds$. It is therefore sufficient to prove the converse inequality for non-negative simple functions.

Suppose then that $f = \sum_{i=1}^n \alpha_i I_{F_i}$ is a simple function, with $\alpha_i > 0$ for $1 \le i \le n$ and $F_1 \subseteq F_2 \subseteq \cdots \subseteq F_n$. If $\mu(F_n) \le t$, choose $G \supseteq F_n$ with $\mu(G) = t$. If $t < \mu(F_n)$ there exists $j$ such that $\mu(F_{j-1}) \le t < \mu(F_j)$. Choose $G$ with $F_{j-1} \subseteq G \subseteq F_j$ and $\mu(G) = t$. Then $(1/t)\int_0^t f^*(s)\,ds = (1/t)\int_G f\,d\mu$, and so $(1/t)\int_0^t f^*(s)\,ds \le f^{**}(t)$.
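In the sequence setting (counting measure, taken up in Section 7.6), the failure of subadditivity of $f \mapsto f^*$ and the good behaviour of the averaged maximal function can be seen concretely. A small sketch (our own illustration; `star` and `star2` are hypothetical names for $x \mapsto x^*$ and $x \mapsto x^{**}$):

```python
def star(x):
    # decreasing rearrangement of a finitely supported sequence
    return sorted(map(abs, x), reverse=True)

def star2(x):
    # x**_i = (x*_1 + ... + x*_i)/i, the running mean of x*
    xs, out, acc = star(x), [], 0.0
    for i, v in enumerate(xs, 1):
        acc += v
        out.append(acc / i)
    return out

a = [1, 0]   # indicator of a set A
b = [0, 1]   # indicator of a disjoint set B
s = [1, 1]   # a + b
print(star(a), star(b), star(s))   # [1, 0] [1, 0] [1, 1]
# (a+b)* = [1, 1] is NOT <= a* + b* = [2, 0] coordinatewise, but the
# maximal functions are subadditive:
print(all(u <= v + w for u, v, w in zip(star2(s), star2(a), star2(b))))  # True
```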
Corollary 7.3.1 If $f \in M_1$ then either $f^{**}(t) = \infty$ for all $0 < t < \mu(\Omega)$ or $0 \le f^*(t) \le f^{**}(t) < \infty$ for all $0 < t < \mu(\Omega)$. In the latter case, $f^{**}$ is a continuous decreasing function on $(0, \mu(\Omega))$, and $tf^{**}(t)$ is a continuous increasing function on $(0, \mu(\Omega))$.

Proof If $\int_0^t f^*(s)\,ds = \infty$ for all $0 < t < \mu(\Omega)$, then $f^{**}(t) = \infty$ for all $0 < t < \mu(\Omega)$. If there exists $0 < t < \mu(\Omega)$ for which $\int_0^t f^*(s)\,ds < \infty$, then $\int_0^t f^*(s)\,ds < \infty$ for all $0 < t < \mu(\Omega)$, and so $0 \le f^*(t) \le f^{**}(t) < \infty$ for all $0 < t < \mu(\Omega)$. The function $tf^{**}(t) = \int_0^t f^*(s)\,ds$ is then continuous and increasing. Thus $f^{**}$ is continuous. Finally, if $0 < t < u < \mu(\Omega)$ then, setting $\theta = (u - t)/u$,

$$f^{**}(u) = (1 - \theta)f^{**}(t) + \theta\,\frac{1}{u - t}\int_t^u f^*(s)\,ds \le (1 - \theta)f^{**}(t) + \theta f^*(t) \le f^{**}(t).$$
Here is another characterization of Muirhead's maximal function.

Theorem 7.3.2 Suppose that $0 < t < \mu(\Omega)$. The map $f \mapsto f^{**}(t)$ is a function norm, and the corresponding Banach function space is $L^1 + L^\infty$. If $f \in L^1 + L^\infty$ then

$$f^{**}(t) = \inf\{\|h\|_1/t + \|k\|_\infty : f = h + k\}.$$

Further the infimum is attained: if $f \in L^1 + L^\infty$ there exist $h \in L^1$ and $k \in L^\infty$ with $\|h\|_1/t + \|k\|_\infty = f^{**}(t)$.
Proof We need to check the conditions of Section 6.1. Conditions (i) and (ii) are satisfied, and (iii) follows from Theorem 7.3.1. If $A$ is measurable, then $I_A^{**}(t) \le 1$, so that condition (iv) is satisfied. If $\mu(A) < \infty$ there exist measurable sets $A_1, \ldots, A_k$, with $\mu(A_i) = t$ for $1 \le i \le k$, whose union contains $A$. Then if $f \in M$,

$$\int_A |f|\,d\mu \le \sum_{i=1}^k \int_{A_i} |f|\,d\mu \le ktf^{**}(t),$$

and so condition (v) is satisfied. Thus $f \mapsto f^{**}(t)$ is a function norm.

First, suppose that $f = h + k$, with $h \in L^1$ and $k \in L^\infty$. If $\mu(A) \le t$ then $\int_A |h|\,d\mu \le \|h\|_1$, and so $h^{**}(t) \le \|h\|_1/t$. Similarly, $\int_A |k|\,d\mu \le t\|k\|_\infty$, and so $k^{**}(t) \le \|k\|_\infty$. Thus $f$ is in the corresponding Banach function space, and

$$f^{**}(t) \le h^{**}(t) + k^{**}(t) \le \|h\|_1/t + \|k\|_\infty.$$

Conversely suppose that $f^{**}(t) < \infty$. First we observe that $f \in M_1$. For if not, then for each $u > 0$ there exists a set of measure $t$ on which $|f| > u$, and so $f^{**}(t) \ge u$ for all $u > 0$, giving a contradiction. Let $B = (|f| > f^*(t))$. Then $f^*(s) > f^*(t)$ for $0 < s < \mu(B)$, and $f^*(s) \le f^*(t)$ for $\mu(B) \le s < \mu(\Omega)$. Since $|f|$ and $f^*$ are equidistributed, $\mu(B) = \lambda(f^* > f^*(t)) \le t$. Now let $h = \operatorname{sgn} f\,(|f| - f^*(t))I_B$, and let $k = f - h$. Then $h^*(s) = f^*(s) - f^*(t)$ for $0 < s < \mu(B)$, and $h^*(s) = 0$ for $\mu(B) \le s < \mu(\Omega)$, so that

$$\frac{1}{t}\int |h|\,d\mu = \frac{1}{t}\int_0^{\mu(B)} (f^*(s) - f^*(t))\,ds = \frac{1}{t}\int_0^t (f^*(s) - f^*(t))\,ds = f^{**}(t) - f^*(t).$$

On the other hand, $|k(\omega)| = f^*(t)$ for $\omega \in B$, and $|k(\omega)| = |f(\omega)| \le f^*(t)$ for $\omega \notin B$, so that $\|k\|_\infty \le f^*(t)$. Thus $\|h\|_1/t + \|k\|_\infty \le f^{**}(t)$.
Theorem 7.3.3 Suppose that $t > 0$. Then $L^1 \cap L^\infty$ is the associate space to $L^1 + L^\infty$ and the function norm

$$\rho_t(g) = \max(\|g\|_1, t\|g\|_\infty)$$

is the associate norm to $f \mapsto f^{**}(t)$.

Proof It is easy to see that $L^1 \cap L^\infty$ is the associate space to $L^1 + L^\infty$. Let $\|.\|'_t$ denote the associate norm. Suppose that $g \in L^1 \cap L^\infty$.

If $\|f\|_1 \le 1$ then $f^{**}(t) \le 1/t$, and so $|\int fg\,d\mu| \le \|g\|'_t/t$. Thus

$$\|g\|_\infty = \sup\Bigl\{\Bigl|\int fg\,d\mu\Bigr| : \|f\|_1 \le 1\Bigr\} \le \|g\|'_t/t.$$

Similarly, if $\|f\|_\infty \le 1$ then $f^{**}(t) \le 1$, and so $|\int fg\,d\mu| \le \|g\|'_t$. Thus

$$\|g\|_1 = \sup\Bigl\{\Bigl|\int fg\,d\mu\Bigr| : \|f\|_\infty \le 1\Bigr\} \le \|g\|'_t.$$

Consequently, $\rho_t(g) \le \|g\|'_t$.

Conversely, if $f^{**}(t) \le 1$ we can write $f = h + k$ with $\|h\|_1/t + \|k\|_\infty \le 1$. Then

$$\Bigl|\int fg\,d\mu\Bigr| \le \int |hg|\,d\mu + \int |kg|\,d\mu \le (\|h\|_1/t)(t\|g\|_\infty) + \|k\|_\infty\|g\|_1 \le \rho_t(g).$$

Thus $\|g\|'_t \le \rho_t(g)$.
7.4 Majorization

We use Muirhead's maximal function to define an order relation on $L^1 + L^\infty$: we say that $g$ weakly majorizes $f$, and write $f \prec_w g$, if $f^{**}(t) \le g^{**}(t)$ for all $t > 0$. If in addition $f$ and $g$ are non-negative functions in $L^1$ and $\int_\Omega f\,d\mu = \int_\Omega g\,d\mu$, we say that $g$ majorizes $f$, and write $f \prec g$. We shall however principally be concerned with weak majorization.

The following theorem begins to indicate the significance of this ordering. For $c \ge 0$, let us define the angle function $a_c$ by $a_c(t) = (t - c)^+$.

Theorem 7.4.1 Suppose that $f$ and $g$ are non-negative functions in $L^1 + L^\infty$. The following are equivalent:

(i) $f \prec_w g$;

(ii) $\int_0^\infty f^*(t)h(t)\,dt \le \int_0^\infty g^*(t)h(t)\,dt$ for every decreasing non-negative function $h$ on $[0, \infty)$;

(iii) $\int a_c(f)\,d\mu \le \int a_c(g)\,d\mu$ for each $c \ge 0$;

(iv) $\int \Phi(f)\,d\mu \le \int \Phi(g)\,d\mu$ for every convex increasing function $\Phi$ on $[0, \infty)$ with $\Phi(0) = 0$.
Proof We first show that (i) and (ii) are equivalent. Since $tf^{**}(t) = \int_0^\infty f^*(s)I_{[0,t)}(s)\,ds$, (ii) implies (i). For the converse, if $h$ is a decreasing non-negative step function on $[0, \infty)$, we can write $h = \sum_{i=1}^j \alpha_i I_{[0,t_i)}$, with $\alpha_i > 0$ and $0 < t_1 < \cdots < t_j$, so that if $f \prec_w g$ then

$$\int f^*(t)h(t)\,dt = \sum_{i=1}^j \alpha_i t_i f^{**}(t_i) \le \sum_{i=1}^j \alpha_i t_i g^{**}(t_i) = \int g^*(t)h(t)\,dt.$$

For general decreasing non-negative $h$, let $(h_n)$ be an increasing sequence of decreasing non-negative step functions which converges pointwise to $h$. Then, by the monotone convergence theorem,

$$\int f^*(t)h(t)\,dt = \lim_n \int f^*(t)h_n(t)\,dt \le \lim_n \int g^*(t)h_n(t)\,dt = \int g^*(t)h(t)\,dt.$$

Thus (i) and (ii) are equivalent.

Next we show that (i) and (iii) are equivalent. Suppose that $f \prec_w g$ and that $c > 0$. Let

$$t_f = \inf\{s : f^*(s) \le c\} \quad\text{and}\quad t_g = \inf\{s : g^*(s) \le c\}.$$

If $t_f \le t_g$, then

$$\begin{aligned}
\int a_c(f)\,d\mu &= \int_{(f>c)} (f - c)\,d\mu = \int_0^{t_f} f^*(s)\,ds - ct_f \\
&\le \int_0^{t_f} g^*(s)\,ds - ct_f + \Bigl(\int_{t_f}^{t_g} g^*(s)\,ds - c(t_g - t_f)\Bigr) \\
&= \int_0^{t_g} g^*(s)\,ds - ct_g = \int_{(g>c)} (g - c)\,d\mu,
\end{aligned}$$

since $g^*(s) > c$ on $[t_f, t_g)$.

On the other hand, if $t_f > t_g$, then

$$\begin{aligned}
\int a_c(f)\,d\mu &= \int_{(f>c)} (f - c)\,d\mu = \int_0^{t_f} f^*(s)\,ds - ct_f \\
&\le \int_0^{t_f} g^*(s)\,ds - ct_f = \int_0^{t_g} g^*(s)\,ds + \int_{t_g}^{t_f} g^*(s)\,ds - ct_f \\
&\le \int_0^{t_g} g^*(s)\,ds + c(t_f - t_g) - ct_f = \int_{(g>c)} (g - c)\,d\mu,
\end{aligned}$$

since $g^*(s) \le c$ on $[t_g, t_f)$. Thus (i) implies (iii).

Conversely, suppose that (iii) holds. By monotone convergence, the inequality also holds when $c = 0$. Suppose that $t > 0$, and let $c = g^*(t)$. Let $t_f$ and $t_g$ be defined as above. Note that $t_g \le t$.

If $t_f \le t$, then

$$\begin{aligned}
\int_0^t f^*(s)\,ds &\le \int_0^{t_f} f^*(s)\,ds + (t - t_f)c = \int_{(f>c)} (f - c)\,d\mu + tc \\
&\le \int_{(g>c)} (g - c)\,d\mu + tc = \int_0^{t_g} g^*(s)\,ds + (t - t_g)c = \int_0^t g^*(s)\,ds,
\end{aligned}$$

since $g^*(s) = c$ on $[t_g, t)$.

On the other hand, if $t_f > t$, then

$$\begin{aligned}
\int_0^t f^*(s)\,ds &= \int_0^t (f^*(s) - c)\,ds + ct \le \int_0^{t_f} (f^*(s) - c)\,ds + ct \\
&= \int_{(f>c)} (f - c)\,d\mu + ct \le \int_{(g>c)} (g - c)\,d\mu + ct \\
&= \int_0^t (g^*(s) - c)\,ds + ct = \int_0^t g^*(s)\,ds.
\end{aligned}$$

Thus $f \prec_w g$, and (iii) implies (i).

We finally show that (iii) and (iv) are equivalent. Since $a_c$ is a non-negative increasing convex function on $[0, \infty)$ with $a_c(0) = 0$, (iv) implies (iii). Suppose that (iii) holds. Then $\int \Phi(f)\,d\mu \le \int \Phi(g)\,d\mu$ when $\Phi = \sum_{i=1}^j \lambda_i a_{c_i}$, where $\lambda_i > 0$ and $a_{c_i}$ is an angle function for $1 \le i \le j$. As any convex increasing non-negative function $\Phi$ with $\Phi(0) = 0$ can be approximated by an increasing sequence of such functions (Exercise 7.8), the result follows from the monotone convergence theorem.
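For finite sequences, the equivalence of (i) and (iii) can be spot-checked numerically: domination of the partial sums of the decreasing rearrangements is the discrete form of $f^{**} \le g^{**}$, and the angle-function condition can be tested on a grid of thresholds. An illustrative sketch (our own, with arbitrary example data):

```python
f = [0.5, 2.0, 1.0, 0.25]
g = [3.0, 0.5, 0.25, 0.25]

fs = sorted(f, reverse=True)    # decreasing rearrangement f*
gs = sorted(g, reverse=True)    # decreasing rearrangement g*

# (i): partial sums of f* dominated by those of g*
weak_maj = all(sum(fs[:k]) <= sum(gs[:k]) for k in range(1, len(fs) + 1))

# (iii): sum of angle functions (x - c)^+ dominated, tested on a grid of c
angles = all(
    sum(max(v - c, 0.0) for v in f) <= sum(max(v - c, 0.0) for v in g)
    for c in [0.1 * i for i in range(41)]
)
print(weak_maj, angles)  # True True
```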
Corollary 7.4.1 Suppose that $(X, \|.\|_X)$ is a rearrangement-invariant Banach function space. If $f \in X$ and $h \prec_w f$ then $h \in X$ and $\|h\|_X \le \|f\|_X$.

Proof By Theorem 7.2.1, and (ii) of Theorem 7.4.1,

$$\|h\|_X = \sup\Bigl\{\int h^* g^*\,dt : \|g\|_{X'} \le 1\Bigr\} \le \sup\Bigl\{\int f^* g^*\,dt : \|g\|_{X'} \le 1\Bigr\} = \|f\|_X.$$
Theorem 7.4.2 Suppose that $(X, \|.\|_X)$ is a rearrangement-invariant function space. Then $L^1 \cap L^\infty \subseteq X \subseteq L^1 + L^\infty$, and the inclusions are continuous.

Proof Let $0 < t < \mu(\Omega)$, and let $E$ be a set of measure $t$. Set $C_t = \|I_E\|_{X'}/t$. Since $X'$ is rearrangement-invariant, $C_t$ does not depend on the choice of $E$. Suppose that $f \in X$ and that $\mu(F) \le t$. Then

$$\frac{1}{t}\int_F |f|\,d\mu \le \|f\|_X \|I_F\|_{X'}/t \le C_t\|f\|_X < \infty,$$

so that $f^{**}(t) \le C_t\|f\|_X$. Thus $f \in L^1 + L^\infty$, and the inclusion $X \to L^1 + L^\infty$ is continuous. Similarly $X' \subseteq L^1 + L^\infty$, with continuous inclusion; considering associates, we see that $L^1 \cap L^\infty \subseteq X$, with continuous inclusion.
7.5 Calderón's interpolation theorem and its converse

We now come to the first of several interpolation theorems that we shall prove.

Theorem 7.5.1 (Calderón's interpolation theorem) Suppose that $T$ is a sublinear mapping from $L^1 + L^\infty$ to itself which is norm-decreasing on $L^1$ and norm-decreasing on $L^\infty$. If $f \in L^1 + L^\infty$ then $T(f) \prec_w f$.

If $(X, \|.\|_X)$ is a rearrangement-invariant function space, then $T(X) \subseteq X$ and $\|T(f)\|_X \le \|f\|_X$ for $f \in X$.

Proof Suppose that $f \in L^1 + L^\infty$ and that $0 < t < \mu(\Omega)$. By Theorem 7.3.2,

$$\begin{aligned}
(T(f))^{**}(t) &\le \inf\{\|T(h)\|_1/t + \|T(k)\|_\infty : f = h + k\} \\
&\le \inf\{\|h\|_1/t + \|k\|_\infty : f = h + k\} = f^{**}(t),
\end{aligned}$$

and so $T(f) \prec_w f$. The second statement now follows from Corollary 7.4.1.
Here is an application of Calderón's interpolation theorem. We shall state it for $\mathbf{R}^d$, but it holds more generally for a locally compact group with Haar measure (see Section 9.5).

Proposition 7.5.1 Suppose that $\nu$ is a probability measure on $\mathbf{R}^d$ and that $(X, \|.\|_X)$ is a rearrangement-invariant function space on $\mathbf{R}^d$. If $f \in X$, then the convolution product $f \star \nu$, defined by

$$(f \star \nu)(x) = \int f(x - y)\,d\nu(y),$$

is in $X$, and $\|f \star \nu\|_X \le \|f\|_X$.

Proof If $f \in L^1$ then

$$\int |f \star \nu|\,dx \le \int\Bigl(\int |f(x - y)|\,dx\Bigr)d\nu(y) = \int \|f\|_1\,d\nu = \|f\|_1,$$

while if $g \in L^\infty$ then

$$|(g \star \nu)(x)| \le \int |g(x - y)|\,d\nu(y) \le \|g\|_\infty.$$

Thus we can apply Calderón's interpolation theorem.

As a consequence, if $h \in L^1(\mathbf{R}^d)$ then, since $|f \star h| \le |f| \star |h|$, $f \star h \in X$ and

$$\|f \star h\|_X \le \||f| \star |h|\|_X \le \|f\|_X \|h\|_1.$$
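A discrete stand-in for this proposition: averaging a finitely supported sequence against probability weights decreases both of the endpoint norms that feed into Calderón's theorem. A sketch (our own; the example data are arbitrary):

```python
weights = [0.5, 0.3, 0.2]           # a probability "measure"
f = [0.0, 4.0, -2.0, 1.0, 0.0, 0.0]

# discrete convolution (f * weights)(x) = sum_y weights[y] * f[x - y]
conv = [
    sum(w * (f[x - y] if 0 <= x - y < len(f) else 0.0)
        for y, w in enumerate(weights))
    for x in range(len(f) + len(weights) - 1)
]
print(sum(abs(v) for v in conv) <= sum(abs(v) for v in f))  # l^1 decreases: True
print(max(abs(v) for v in conv) <= max(abs(v) for v in f))  # sup decreases: True
```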
The first statement of Calderón's interpolation theorem has an interesting converse. We shall prove this in the case where $\Omega$ has finite measure (in which case we may as well suppose that $\mu(\Omega) = 1$), and is homogeneous: that is, if we have two partitions $\Omega = A_1 \cup \cdots \cup A_n = B_1 \cup \cdots \cup B_n$ into sets of equal measure then there is a measure-preserving transformation $R$ of $\Omega$ such that $R(A_i) = B_i$ for $1 \le i \le n$. Neither of these requirements is in fact necessary.

Theorem 7.5.2 Suppose that $\mu(\Omega) = 1$ and $\Omega$ is homogeneous. If $f, g \in L^1$ and $f \prec_w g$ then there exists a linear mapping $T$ from $L^1$ to itself which is norm-decreasing on $L^1$ and norm-decreasing on $L^\infty$ and for which $T(g) = f$. If $g$ and $f$ are non-negative, we can also suppose that $T$ is a positive operator (that is, $T(h) \ge 0$ if $h \ge 0$).

Proof The proof that we shall give is based on that given by Ryff [Ryf 65]. It is a convexity proof, using the separation theorem.

First we show that it is sufficient to prove the result when $f$ and $g$ are both non-negative. If $f \prec_w g$ then $|f| \prec_w |g|$. We can write $f = \alpha|f|$, with $|\alpha(\omega)| = 1$ for all $\omega$, and $g = \beta|g|$, with $|\beta(\omega)| = 1$ for all $\omega$. If there exists a suitable $S$ with $S(|g|) = |f|$, let $T(k) = \alpha\,S(k/\beta)$. Then $T(g) = f$, and $T$ is norm-decreasing on $L^1$ and on $L^\infty$. We can therefore suppose that $f$ and $g$ are both non-negative, and restrict attention to real-valued functions.

We begin by considering the set

$$\Gamma = \{T \in L(L^1) : T \ge 0,\ \|T(f)\|_1 \le \|f\|_1 \text{ for } f \in L^1,\ \|T(f)\|_\infty \le \|f\|_\infty \text{ for } f \in L^\infty\}.$$

If $T \in \Gamma$, the transposed mapping $T'$ is norm-decreasing on $L^\infty$. Also, $T'$ extends by continuity to a norm-decreasing linear map on $L^1$. Thus the extension of $T'$ to $L^1$, which we again denote by $T'$, is in $\Gamma$.

$\Gamma$ is a semi-group, and is a convex subset of

$$B^+ = \{T \in L(L^\infty) : T \ge 0,\ \|T\| \le 1\}.$$

Now $B^+$ is compact under the weak operator topology defined by the semi-norms $p_{h,k}(T) = |\int T(h)k\,d\mu|$, where $h \in L^\infty$, $k \in L^1$. [This is a consequence of the fact that if $E$ and $F$ are Banach spaces then $L(E, F^*)$ can be identified with the dual of the tensor product $E \otimes F$ with the projective norm, and of the Banach–Alaoglu theorem [DiJT 95, p. 120].] We shall show that $\Gamma$ is closed in $B^+$ in this topology, so that $\Gamma$ is also compact in the weak operator topology.

Suppose that $h, k \in L^\infty$ and that $\|h\|_1 \le 1$, $\|k\|_\infty \le 1$. Then if $T \in \Gamma$, $|\int T(h)k\,d\mu| \le 1$. Thus if $S \in \bar\Gamma$, $|\int S(h)k\,d\mu| \le 1$. Since this holds for all $k \in L^\infty$ with $\|k\|_\infty \le 1$, $\|S(h)\|_1 \le 1$. Thus $S \in \Gamma$.

As we have observed, we can consider elements of $\Gamma$ as norm-decreasing operators on $L^1$. We now consider the orbit

$$O(g) = \{T(g) : T \in \Gamma\} \subseteq L^1.$$

The theorem will be proved if we can show that $O(g) \supseteq \{f : f \ge 0,\ f \prec_w g\}$.

$O(g)$ is convex. We claim that $O(g)$ is also closed in $L^1$. Suppose that $k \in \overline{O(g)}$. There exists a sequence $(T_n)$ in $\Gamma$ such that $T_n(g) \to k$ in $L^1$ norm. Let $S$ be a limit point, in the weak operator topology, of the sequence $(T'_n)$. Then $S$ and $S'$ are in $\Gamma$. If $h \in L^\infty$, then

$$\int kh\,d\mu = \lim_n \int T_n(g)h\,d\mu = \lim_n \int g\,T'_n(h)\,d\mu = \int g\,S(h)\,d\mu = \int S'(g)h\,d\mu.$$

Since this holds for all $h \in L^\infty$, $k = S'(g) \in O(g)$. Thus $O(g)$ is closed.

Now suppose that $f \prec_w g$, but that $f \notin O(g)$. Then by the separation theorem (Theorem 4.6.3) there exists $h \in L^\infty$ such that

$$\int fh\,d\mu > \sup\Bigl\{\int kh\,d\mu : k \in O(g)\Bigr\}.$$

Let $A = (h > 0)$, so that $h^+ = hI_A$. Then if $k \in O(g)$, $I_A k \in O(g)$, since multiplication by $I_A$ is in $\Gamma$, and $\Gamma$ is a semigroup. Thus

$$\int fh^+\,d\mu \ge \int fh\,d\mu > \sup\Bigl\{\int I_A kh\,d\mu : k \in O(g)\Bigr\} = \sup\Bigl\{\int kh^+\,d\mu : k \in O(g)\Bigr\}.$$

In other words, we can suppose that $h \ge 0$. Now $\int fh\,d\mu \le \int_0^1 f^* h^*\,ds \le \int_0^1 g^* h^*\,ds$, and so we shall obtain the required contradiction if we show that

$$\sup\Bigl\{\int kh\,d\mu : k \in O(g)\Bigr\} \ge \int_0^1 g^* h^*\,ds.$$

We can find increasing sequences $(g_n)$, $(h_n)$ of simple non-negative functions converging to $g$ and $h$ respectively, of the form

$$g_n = \sum_{j=1}^{J_n} a_j I_{A_j}, \quad h_n = \sum_{j=1}^{J_n} b_j I_{B_j},$$

with $\mu(A_j) = \mu(B_j) = 1/J_n$ for each $j$. There exists a permutation $\sigma_n$ of $\{1, \ldots, J_n\}$ such that

$$\frac{1}{J_n}\sum_{j=1}^{J_n} a_{\sigma_n(j)} b_j = \frac{1}{J_n}\sum_{j=1}^{J_n} a^*_j b^*_j = \int_0^1 g_n^* h_n^*\,ds.$$

By homogeneity, there exists a measure-preserving transformation $R_n$ of $\Omega$ such that $R_n(B_{\sigma_n(j)}) = A_j$ for each $j$. If $l \in L^\infty$, let $T_n(l)(\omega) = l(R_n(\omega))$; then $T_n \in \Gamma$. Then

$$\int T_n(g)h\,d\mu \ge \int T_n(g_n)h_n\,d\mu = \int_0^1 g_n^* h_n^*\,ds.$$

Since $\int_0^1 g^* h^*\,ds = \sup_n \int_0^1 g_n^* h_n^*\,ds$, this finishes the proof.
7.6 Symmetric Banach sequence spaces

We now turn to the case where $\Omega = \mathbf{N}$, with counting measure. Here we are considering sequences, and spaces of sequences. The arguments are often technically easier, but they are no less important. Note that

$$L^1 = l^1 = \Bigl\{x = (x_i) : \|x\|_1 = \sum_i |x_i| < \infty\Bigr\},$$
$$M_0 = c_0 = \{x = (x_i) : x_i \to 0\}, \text{ with } \|x\|_{c_0} = \|x\|_\infty = \max_i |x_i|, \text{ and}$$
$$M_1 = l^\infty.$$

It is easy to verify that a Banach sequence space $(X, \|.\|_X)$ is rearrangement-invariant if and only if whenever $x \in X$ and $\sigma$ is a permutation of $\mathbf{N}$ then $x_\sigma \in X$ and $\|x\|_X = \|x_\sigma\|_X$ (where $x_\sigma$ is the sequence defined by $(x_\sigma)_i = x_{\sigma(i)}$). Let $e_i$ denote the sequence with 1 in the $i$-th place, and zeros elsewhere. If $(X, \|.\|_X)$ is a rearrangement-invariant Banach sequence space then $\|e_i\|_X = \|e_j\|_X$: we scale the norm so that $\|e_i\|_X = 1$: the resulting space is called a symmetric Banach sequence space. If $(X, \|.\|_X)$ is a symmetric Banach sequence space, then $l^1 \subseteq X$, and the inclusion is norm-decreasing. By considering associate spaces, it follows that $X \subseteq l^\infty$, and the inclusion is norm-decreasing.

Proposition 7.6.1 If $(X, \|.\|_X)$ is a symmetric Banach sequence space then either $l^1 \subseteq X \subseteq c_0$ or $X = l^\infty$.

Proof Certainly $l^1 \subseteq X \subseteq l^\infty$. If $x \in X \setminus c_0$, then there exist a permutation $\sigma$ and $\epsilon > 0$ such that $|x_{\sigma(2n)}| \ge \epsilon$ for all $n$; it follows from the lattice property and scaling that the sequence $(0, 1, 0, 1, 0, \ldots) \in X$. Similarly, the sequence $(1, 0, 1, 0, 1, \ldots) \in X$, and so $(1, 1, 1, 1, \ldots) \in X$; it follows again from the lattice property and scaling that $X = l^\infty$.

If $x \in c_0$, the decreasing rearrangement $x^*$ is a sequence, which can be defined recursively by taking $x^*_1$ as the absolute value of the largest term, $x^*_2$ as the absolute value of the next largest, and so on. Thus there exists a one-one mapping $\sigma : \mathbf{N} \to \mathbf{N}$ such that $x^*_n = |x_{\sigma(n)}|$. $x^*_n$ can also be described by a minimax principle:

$$x^*_n = \min\{\max\{|x_j| : j \notin E\} : |E| < n\}.$$

We then have the following results, whose proofs are the same as before, or easier.

Proposition 7.6.2 (i) $|x|$ and $x^*$ are equidistributed.

(ii) If $0 \le x^{(n)} \uparrow x$ then $0 \le (x^{(n)})^* \uparrow x^*$.

(iii) If $x \ge 0$ and $A \subseteq \mathbf{N}$ then $\sum_{i \in A} x_i \le \sum_{i=1}^{|A|} x^*_i$.

(iv) If $x, y \in c_0$ then $\sum_{i=1}^\infty |x_i y_i| \le \sum_{i=1}^\infty x^*_i y^*_i$.
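For finitely supported sequences, $x^*$ is simply $|x|$ sorted in decreasing order, and inequality (iv) is the classical rearrangement inequality; both can be spot-checked numerically. A sketch (our own, not from the text):

```python
import random

def decreasing_rearrangement(x):
    return sorted((abs(v) for v in x), reverse=True)

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(50)]
y = [random.uniform(-1, 1) for _ in range(50)]
xs = decreasing_rearrangement(x)
ys = decreasing_rearrangement(y)

# (iv): sum |x_i y_i| <= sum x*_i y*_i
lhs = sum(abs(a * b) for a, b in zip(x, y))
rhs = sum(a * b for a, b in zip(xs, ys))
print(lhs <= rhs + 1e-12)  # True
```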
We define Muirhead's maximal sequence as

$$x^{**}_i = \frac{1}{i}\sup\Bigl\{\sum_{j \in A} |x_j| : |A| = i\Bigr\}.$$

Then for each $i$, $x \mapsto x^{**}_i$ is a norm on $c_0$ equivalent to $\|x\|_\infty = \max_n |x_n|$, and $x^{**}_i = (\sum_{j=1}^i x^*_j)/i$, so that $x^{**} = (x^*)^{**}$.

Again, we define $x \prec_w y$ if $x^{**} \le y^{**}$. The results corresponding to those of Theorems 7.4.1, 7.2.1 and 7.5.1 all hold, with obvious modifications.

Let us also note the following multiplicative result, which we shall need when we consider linear operators.

Proposition 7.6.3 Suppose that $(x_n)$ and $(y_n)$ are decreasing sequences of positive numbers, and that $\prod_{n=1}^N x_n \le \prod_{n=1}^N y_n$ for each $N$. If $\Phi$ is an increasing function on $[0, \infty)$ for which $\Phi(e^t)$ is a convex function of $t$ then $\sum_{n=1}^N \Phi(x_n) \le \sum_{n=1}^N \Phi(y_n)$ for each $N$. In particular, $\sum_{n=1}^N x_n^p \le \sum_{n=1}^N y_n^p$ for each $N$, for $0 < p < \infty$.

If $(X, \|.\|_X)$ is a symmetric Banach sequence space, and $(y_n) \in X$, then $(x_n) \in X$ and $\|(x_n)\|_X \le \|(y_n)\|_X$.

Proof Let $a_n = \log x_n - \log x_N$ and $b_n = \log y_n - \log x_N$ for $1 \le n \le N$. Then $(a_n) \prec_w (b_n)$. Let $\phi(t) = \Phi(x_N e^t) - \Phi(x_N)$. Then $\phi$ is a convex increasing function on $[0, \infty)$ with $\phi(0) = 0$, and so by Theorem 7.4.1

$$\sum_{n=1}^N \Phi(x_n) = \sum_{n=1}^N \phi(a_n) + N\Phi(x_N) \le \sum_{n=1}^N \phi(b_n) + N\Phi(x_N) = \sum_{n=1}^N \Phi(y_n).$$

The second statement is just a special case, since $e^{tp}$ is a convex function of $t$. In particular, $x^{**}_n \le y^{**}_n$, and so the last statement follows from Corollary 7.4.1.
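Proposition 7.6.3 can be spot-checked with $\Phi(t) = t^p$. In the sketch below (our own; the data are arbitrary), the hypothesis on partial products is verified first, and then the conclusion for the power sums:

```python
x = [2.0, 1.5, 0.5, 0.4]    # decreasing positive sequences
y = [3.0, 1.6, 0.5, 0.45]

# hypothesis: prod(x[:N]) <= prod(y[:N]) for each N
prod = lambda s: 1.0 if not s else s[0] * prod(s[1:])
assert all(prod(x[:n]) <= prod(y[:n]) for n in range(1, 5))

# conclusion with Phi(t) = t^p for some 0 < p < infinity:
p = 0.5
for n in range(1, 5):
    assert sum(v ** p for v in x[:n]) <= sum(v ** p for v in y[:n])
print("dominated for all N")  # reached only if every assertion holds
```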
7.7 The method of transference

What about the converse of Calderón's interpolation theorem? Although it is a reasonably straightforward matter to give a functional analytic proof of the corresponding theorem along the lines of Theorem 7.5.2, we give a more direct proof, since this proof introduces important ideas, with useful applications. Before we do so, let us consider how linear operators are represented by infinite matrices.

Suppose that $T \in L(c_0)$ and that $T(x) = y$. Then $y_i = \sum_{j=1}^\infty t_{ij}x_j$, where $t_{ij} = (T(e_j))_i$, so that

$$t_{ij} \to 0 \text{ as } i \to \infty \text{ for each } j, \quad\text{and}\quad \|T\| = \sup_i \sum_{j=1}^\infty |t_{ij}| < \infty.$$

Conversely if $(t_{ij})$ is a matrix which satisfies these conditions then, setting $T(x)_i = \sum_{j=1}^\infty t_{ij}x_j$, $T \in L(c_0)$ and $\|T\| = \sup_i(\sum_{j=1}^\infty |t_{ij}|)$.

Similarly if $S \in L(l^1)$, then $S$ is represented by a matrix $(s_{ij})$ which satisfies

$$\|S\| = \sup_j\Bigl(\sum_{i=1}^\infty |s_{ij}|\Bigr) < \infty,$$

and any such matrix defines an element of $L(l^1)$.

If $T \in L(c_0)$ or $T \in L(l^1)$ then $T$ is positive if and only if $t_{ij} \ge 0$ for each $i$ and $j$. A matrix is doubly stochastic if its terms are all non-negative and

$$\sum_{i=1}^\infty t_{ij} = 1 \text{ for each } j \quad\text{and}\quad \sum_{j=1}^\infty t_{ij} = 1 \text{ for each } i.$$

A doubly stochastic matrix defines an operator which is norm-decreasing on $c_0$ and norm-decreasing on $l^1$, and so, by Calderón's interpolation theorem, it defines an operator which is norm-decreasing on each symmetric sequence space. Examples of doubly stochastic matrices are provided by permutation matrices; $T = (t_{ij})$ is a permutation matrix if there exists a permutation $\sigma$ of $\mathbf{N}$ for which $t_{\sigma(j)j} = 1$ for each $j$ and $t_{ij} = 0$ for $i \ne \sigma(j)$. In other words, each row and each column of $T$ contains exactly one 1, and all the other entries are 0. If $T$ is a permutation matrix then $(T(x))_{\sigma(j)} = x_j$, so that $T$ permutes the coordinates of a vector. More particularly, a transposition matrix is a permutation matrix that is defined by a transposition, a permutation that exchanges two elements and leaves the others fixed.
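The norm-decreasing property can be checked directly for a small doubly stochastic matrix (a finite analogue; our own example):

```python
P = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.25, 0.25, 0.5]]
# rows and columns all sum to 1, so P is doubly stochastic
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
assert all(abs(sum(P[i][j] for i in range(3)) - 1.0) < 1e-12 for j in range(3))

x = [3.0, -1.0, 2.0]
Px = [sum(P[i][j] * x[j] for j in range(3)) for i in range(3)]
print(sum(abs(v) for v in Px) <= sum(abs(v) for v in x))  # l^1 decreases: True
print(max(abs(v) for v in Px) <= max(abs(v) for v in x))  # sup decreases: True
```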
Theorem 7.7.1 Suppose that $x$ and $y$ are non-negative decreasing sequences in $c_0$ with $x \prec_w y$. There exists a doubly stochastic matrix $P = (p_{ij})$ such that $x_i \le \sum_{j=1}^\infty p_{ij}y_j$ for $1 \le i < \infty$.
Proof We introduce the idea of a transfer matrix. Suppose that $\tau = \tau_{ij}$ is the transposition of $\mathbf{N}$ which exchanges $i$ and $j$ and leaves the other integers fixed, and let $P_\tau$ be the corresponding transposition matrix. Then if $0 \le \lambda \le 1$ the transfer matrix $T = T_{\tau,\lambda}$ is defined as

$$T = T_{\tau,\lambda} = (1 - \lambda)I + \lambda P_\tau.$$

Thus

$$T_{ii} = T_{jj} = 1 - \lambda, \quad T_{kk} = 1 \text{ for } k \ne i, j, \quad T_{ij} = T_{ji} = \lambda, \quad T_{kl} = 0 \text{ otherwise.}$$

If $T(z) = z'$, then $z_k = z'_k$ for $k \ne i, j$, and

$$z'_i + z'_j = ((1 - \lambda)z_i + \lambda z_j) + (\lambda z_i + (1 - \lambda)z_j) = z_i + z_j,$$

so that some of $z_i$ is transferred to $z'_j$ (or conversely). Note also that $T$ is an averaging procedure; if we write $z_i = m + d$, $z_j = m - d$, then $z'_i = m + \mu d$, $z'_j = m - \mu d$, where $-1 \le \mu = 1 - 2\lambda \le 1$. Since $T$ is a convex combination of $I$ and $P_\tau$, $T$ is doubly stochastic, and so it is norm-decreasing on $c_0$ and on $l^1$. Note that transposition matrices are special cases of transfer matrices (with $\lambda = 1$).
We shall build P up as an innite product of transfer matrices. We use
the fact that if k < l and y
k
> x
k
, y
l
< x
l
and y
j
= x
j
for k < j < l,
and if we transfer an amount min(y
k
x
k
, x
l
y
l
) from y
k
to y
l
then the
resulting sequence z is still decreasing, and x
w
z. We also use the fact
that if x
l
> y
l
then there exists k < l such that y
k
> x
k
.
It may happen that y
i
x
i
for all i, in which case we take P to be
the identity matrix. Otherwise, there is a least l such that y
l
< x
l
. Then
there exists a greatest k < l such that y
k
> x
k
. We transfer the amount
min(y
k
x
k
, x
l
y
l
) from y
k
to y
l
, and iterate this procedure until we obtain a
sequence y
(1)
with y
(1)
l
= x
l
. Composing the transfer matrices that we have
used, we obtain a doubly stochastic matrix P
(1)
for which P
(1)
(y) = y
(1)
.
We now iterate this procedure. If it nishes after a nite number of steps,
we are nished. If it continues indenitely, there are two possibilities. First,
for each k for which y
k
> x
k
, only nitely many transfers are made from y
k
.
In this case, if P
(n)
is the matrix obtained by composing the transfers used
in the rst n steps, then as n increases, each row and each column of P
(n)
is
eventually constant, and we can take P as the term-by-term limit of P
(n)
.
The other possibility is that innitely many transfers are made from y
k
,
for some k. There is then only one k for which this happens. In this case,
we start again. First, we follow the procedure described above, omitting
the transfers from y
k
, whenever they should occur. As a result, we obtain a
doubly stochastic matrix P such that if z = P(y) then z
i
x
i
for 1 i < k,
z
k
= y
k
> x
k
, there exists an innite sequence k < l
1
< l
2
< such
that x
l
j
> z
l
j
for each j, and z
i
= x
i
for all other i. Let = x
l
1
z
l
1
.
96 Rearrangements
Note that

j=1
(x
l
j
z
l
j
) z
k
x
k
. We now show that there is a doubly
stochastic matrix Q such that Q(z) x. Then QP(y) x, and QP is
doubly stochastic. To obtain Q, we transfer an amount x
l
1
z
l
1
from z
k
to
z
l
1
, then transfer an amount x
l
2
z
l
2
from z
k
to z
l
2
, and so on. Let Q
(n)
be the matrix obtained after n steps, and let w
(n)
= Q
(n)
(z). It is easy to
see that every row of Q
(n)
, except for the k-th, is eventually constant. Let

n
be the parameter for the nth transfer, and let p
n
=

n
i=1
(1
i
). Then
easy calculations show that
Q
(n)
kk
= p
n
, and Q
(n)
kl
i
= (
i
/p
i
)p
n
.
Then
w
(n+1)
k
= (1
n+1
)w
(n)
k
+
n+1
z
l
n+1
= w
(n)
k
(x
l
n+1
z
l
n+1
),
so that
n+1
(w
(n)
k
z
l
n+1
) = x
l
n+1
z
l
n+1
. But
w
(n)
k
z
l
n+1
x
k
z
l
1
x
l
1
z
l
1
= ,
so that

n=1

n
< . Thus p
n
converges to a positive limit p. From this
it follows easily that if Q is the term-by-term limit of Q
(n)
then Q is doubly
stochastic, and Q(z) x.
Corollary 7.7.1 If $x, y \in c_0$ and $x \prec_w y$ then there is a matrix $Q$ which
defines norm-decreasing linear mappings on $l_1$ and $c_0$ and for which $Q(y) = x$.

Proof Compose $P$ with suitable permutation and multiplication operators.

Corollary 7.7.2 If $x$ and $y$ are non-negative elements of $l_1$ and $x \prec y$ then
there exists a doubly stochastic matrix $P$ such that $P(y) = x$.

Proof By composing with suitable permutation operators, it is sufficient to
consider the case where $x$ and $y$ are decreasing sequences. If $P$ satisfies the
conclusions of Theorem 7.7.1 then
$$\sum_{j=1}^\infty y_j = \sum_{i=1}^\infty x_i \le \sum_{i=1}^\infty \sum_{j=1}^\infty p_{ij} y_j \le \sum_{j=1}^\infty \Bigl( \sum_{i=1}^\infty p_{ij} \Bigr) y_j = \sum_{j=1}^\infty y_j.$$
Thus we must have equality throughout, and so $x_i = \sum_{j=1}^\infty p_{ij} y_j$ for each $i$.
7.8 Finite doubly stochastic matrices

We can deduce corresponding results for the case when $\Omega = \{1, \ldots, n\}$. In
particular, we have the following.

Theorem 7.8.1 Suppose that $x, y \in \mathbf{R}^n$ and that $x \prec_w y$. Then there exists
a matrix $T = (t_{ij})$ with
$$\sum_{j=1}^n |t_{ij}| \le 1 \text{ for } 1 \le i \le n \quad\text{and}\quad \sum_{i=1}^n |t_{ij}| \le 1 \text{ for } 1 \le j \le n$$
such that $x_i = \sum_{j=1}^n t_{ij} y_j$.

Theorem 7.8.2 Suppose that $x, y \in \mathbf{R}^n$ and that $x \ge 0$ and $y \ge 0$. The
following are equivalent:
(i) $x \prec y$.
(ii) There exists a doubly stochastic matrix $P$ such that $P(y) = x$.
(iii) There exists a finite sequence $(T^{(1)}, \ldots, T^{(n)})$ of transfer matrices
such that $x = T^{(n)} T^{(n-1)} \cdots T^{(1)} y$.
(iv) $x$ is a convex combination of $\{y_\sigma : \sigma \in \Sigma_n\}$.

Proof The equivalence of the first three statements follows as in the infinite-dimensional
case. That (iii) implies (iv) follows by writing each $T^{(j)}$ as
$(1-\lambda_j)I + \lambda_j P_{\tau(j)}$, where $P_{\tau(j)}$ is a transposition matrix, and expanding. Finally,
the fact that (iv) implies (i) follows immediately from the sublinearity of the
mapping $x \to x^*$.
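The chain of transfers in the proof of (i) implies (iii) is entirely constructive, and in the finite case it terminates. The sketch below (my own implementation of the recipe, with `transfer_chain` a hypothetical name; NumPy assumed) carries a decreasing vector $y$ onto a decreasing vector $x \prec y$ by composing transfer matrices, producing a doubly stochastic $P$ with $P(y) = x$.

```python
import numpy as np

def transfer_chain(x, y, tol=1e-12):
    """Given non-negative decreasing x, y with x majorized by y (equal sums),
    compose transfer matrices T = (1 - lam) I + lam P_tau, as in the proof,
    until y is carried onto x; returns the resulting doubly stochastic P."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    n = len(x)
    P = np.eye(n)
    while np.max(np.abs(y - x)) > tol:
        l = next(i for i in range(n) if y[i] < x[i] - tol)  # least l with y_l < x_l
        k = max(i for i in range(l) if y[i] > x[i] + tol)   # greatest k < l with y_k > x_k
        delta = min(y[k] - x[k], x[l] - y[l])               # amount moved from y_k to y_l
        lam = delta / (y[k] - y[l])
        T = np.eye(n)
        T[k, k] = T[l, l] = 1.0 - lam
        T[k, l] = T[l, k] = lam
        y = T @ y
        P = T @ P
    return P

x = np.array([0.4, 0.35, 0.25])   # x ≺ y, both decreasing with equal sums
y = np.array([0.5, 0.3, 0.2])
P = transfer_chain(x, y)
assert np.allclose(P @ y, x)
assert np.allclose(P.sum(axis=0), 1) and np.allclose(P.sum(axis=1), 1)
assert P.min() >= -1e-12
```

Each pass matches one more coordinate of $y$ to $x$ exactly, which is why the loop terminates in the finite case.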
The set $\{x : x \prec y\}$ is a bounded closed convex subset of $\mathbf{R}^n$. A point $c$ of
a convex set $C$ is an extreme point of $C$ if it cannot be written as a convex
combination of two other points of $C$: if $c = (1-\lambda)c_0 + \lambda c_1$, with $0 < \lambda < 1$,
then $c = c_0 = c_1$.

Corollary 7.8.1 The vectors $\{y_\sigma : \sigma \in \Sigma_n\}$ are the extreme points of
$\{x : x \prec y\}$.

Proof It is easy to see that each $y_\sigma$ is an extreme point, and the theorem
ensures that there are no other extreme points.
Theorem 7.8.2 and its corollary suggest the following theorem. It does
however require a rather different proof.

Theorem 7.8.3 The set $\mathcal{P}$ of doubly stochastic $n \times n$ matrices is a bounded
closed convex subset of $\mathbf{R}^{n \times n}$. A doubly stochastic matrix is an extreme
point of $\mathcal{P}$ if and only if it is a permutation matrix. Every doubly stochastic
matrix can be written as a convex combination of permutation matrices.

Proof It is clear that $\mathcal{P}$ is a bounded closed convex subset of $\mathbf{R}^{n \times n}$, and that
the permutation matrices are extreme points of $\mathcal{P}$. Suppose that $P = (p_{ij})$
is a doubly stochastic matrix which is not a permutation matrix. Then
there is an entry $p_{ij}$ with $0 < p_{ij} < 1$. Then the $i$-th row must have another
entry strictly between 0 and 1, and so must the $j$-th column. Using this fact
repeatedly, we find a circuit of entries with this property: there exist distinct
indices $i_1, \ldots, i_r$ and distinct indices $j_1, \ldots, j_r$ such that, setting $j_{r+1} = j_1$,
$$0 < p_{i_s j_s} < 1 \quad\text{and}\quad 0 < p_{i_s j_{s+1}} < 1 \quad\text{for } 1 \le s \le r.$$
We use this to define a matrix $D = (d_{ij})$, by setting
$$d_{i_s j_s} = 1 \quad\text{and}\quad d_{i_s j_{s+1}} = -1 \quad\text{for } 1 \le s \le r,$$
and $d_{ij} = 0$ otherwise. Let
$$a = \inf_{1 \le s \le r} p_{i_s j_s}, \qquad b = \inf_{1 \le s \le r} p_{i_s j_{s+1}}.$$
Then $P + \lambda D \in \mathcal{P}$ for $-a \le \lambda \le b$, and so $P$ is not an extreme point of $\mathcal{P}$.

We prove the final statement of the theorem by induction on the number
of non-zero entries, using this construction. The result is certainly true when
this number is $n$, for then $P$ is a permutation matrix. Suppose that it is true
for doubly stochastic matrices with fewer than $k$ non-zero entries, and that
$P$ has $k$ non-zero entries. Then, with the construction above, $P - aD$ and
$P + bD$ have fewer than $k$ non-zero entries, and so are convex combinations
of permutation matrices. Since $P$ is a convex combination of $P - aD$ and
$P + bD$, $P$ has the same property.
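The induction above translates into a greedy decomposition algorithm: repeatedly find a permutation supported on the positive entries, and subtract off the largest feasible multiple of it. The following sketch (an illustration under the stated assumptions, not the book's construction; it brute-forces the permutation, so it is only sensible for small $n$) recovers one such decomposition.

```python
import numpy as np
from itertools import permutations

def birkhoff(P, tol=1e-9):
    """Greedy decomposition of a doubly stochastic matrix into a convex
    combination of permutation matrices (brute-force matching; small n only)."""
    P = np.asarray(P, dtype=float).copy()
    n = P.shape[0]
    terms = []
    while P.max() > tol:
        # a permutation supported on the strictly positive entries of P
        sigma = next(s for s in permutations(range(n))
                     if all(P[i, s[i]] > tol for i in range(n)))
        lam = min(P[i, sigma[i]] for i in range(n))
        M = np.zeros((n, n))
        M[np.arange(n), list(sigma)] = 1.0
        terms.append((lam, M))
        P = P - lam * M          # at least one positive entry becomes zero
    return terms

# Hide a convex combination of permutation matrices and recover one.
n = 4
rng = np.random.default_rng(5)
target = sum(w * np.eye(n)[rng.permutation(n)] for w in (0.5, 0.3, 0.2))
terms = birkhoff(target)
assert abs(sum(lam for lam, _ in terms) - 1.0) < 1e-6
assert np.allclose(sum(lam * M for lam, M in terms), target, atol=1e-6)
```

Exercise 7.9 below shows that the recovered decomposition need not be the one we started from.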
7.9 Schur convexity

Schur [Sch 23] investigated majorization, and raised the following problem:
for what functions $\phi$ on $(\mathbf{R}^n)^+$ is it true that if $x \ge 0$, $y \ge 0$ and $x \prec y$ then
$\phi(x) \le \phi(y)$? Such functions are now called Schur convex. [If $\phi(x) \ge \phi(y)$,
$\phi$ is Schur concave.] Since $x_\sigma \prec x \prec x_\sigma$ for any permutation $\sigma$, a Schur convex
function must be symmetric: $\phi(x_\sigma) = \phi(x)$. We have seen in Theorem 7.4.1
that if $\Phi$ is a convex increasing non-negative function on $[0, \infty)$ then the
function $x \to \sum_{i=1}^n \Phi(x_i)$ is Schur convex. Theorem 7.8.2 has the following
immediate consequence.

Theorem 7.9.1 A function $\phi$ on $(\mathbf{R}^n)^+$ is Schur convex if and only if
$\phi(T(x)) \le \phi(x)$ for each $x \in (\mathbf{R}^n)^+$ and each transfer matrix $T$.
Let us give one example. This is the original example of Muirhead
[Mui 03], where the method of transfer was introduced.

Theorem 7.9.2 (Muirhead's theorem) Suppose that $t_1, \ldots, t_n$ are positive.
If $x \in (\mathbf{R}^n)^+$, let
$$\phi(x) = \frac{1}{n!} \sum_{\sigma \in \Sigma_n} t_{\sigma(1)}^{x_1} \cdots t_{\sigma(n)}^{x_n}.$$
Then $\phi$ is Schur convex.

Proof Suppose that $T = T_{\tau,\lambda}$, where $\tau = \tau_{ij}$ and $0 \le \lambda \le 1$. Let us write
$$x_i = m+d, \quad x_j = m-d, \quad T(x)_i = m+\theta d, \quad T(x)_j = m-\theta d,$$
where $-1 \le \theta = 1-2\lambda \le 1$. Then
$$\phi(x) = \frac{1}{2(n!)} \Bigl( \sum_{\sigma \in \Sigma_n} t_{\sigma(1)}^{x_1} \cdots t_{\sigma(n)}^{x_n} + \sum_{\sigma \in \Sigma_n} t_{\sigma(\tau(1))}^{x_1} \cdots t_{\sigma(\tau(n))}^{x_n} \Bigr)$$
$$= \frac{1}{2(n!)} \sum_\sigma \Bigl( \prod_{k \ne i,j} t_{\sigma(k)}^{x_k} \Bigr) \bigl( t_{\sigma(i)}^{x_i} t_{\sigma(j)}^{x_j} + t_{\sigma(j)}^{x_i} t_{\sigma(i)}^{x_j} \bigr)$$
$$= \frac{1}{2(n!)} \sum_\sigma \Bigl( \prod_{k \ne i,j} t_{\sigma(k)}^{x_k} \Bigr) \bigl( t_{\sigma(i)}^{m+d} t_{\sigma(j)}^{m-d} + t_{\sigma(i)}^{m-d} t_{\sigma(j)}^{m+d} \bigr),$$
and similarly
$$\phi(T(x)) = \frac{1}{2(n!)} \sum_\sigma \Bigl( \prod_{k \ne i,j} t_{\sigma(k)}^{x_k} \Bigr) \bigl( t_{\sigma(i)}^{m+\theta d} t_{\sigma(j)}^{m-\theta d} + t_{\sigma(i)}^{m-\theta d} t_{\sigma(j)}^{m+\theta d} \bigr).$$
Consequently
$$\phi(x) - \phi(T(x)) = \frac{1}{2(n!)} \sum_\sigma \Bigl( \prod_{k \ne i,j} t_{\sigma(k)}^{x_k} \Bigr) \Delta(\sigma),$$
where
$$\Delta(\sigma) = t_{\sigma(i)}^m t_{\sigma(j)}^m \bigl( t_{\sigma(i)}^d t_{\sigma(j)}^{-d} + t_{\sigma(i)}^{-d} t_{\sigma(j)}^d - t_{\sigma(i)}^{\theta d} t_{\sigma(j)}^{-\theta d} - t_{\sigma(i)}^{-\theta d} t_{\sigma(j)}^{\theta d} \bigr)
= t_{\sigma(i)}^m t_{\sigma(j)}^m \bigl( (a_\sigma^d + a_\sigma^{-d}) - (a_\sigma^{\theta d} + a_\sigma^{-\theta d}) \bigr),$$
and $a_\sigma = t_{\sigma(i)}/t_{\sigma(j)}$. Now if $a > 0$ the function $f(s) = a^s + a^{-s}$ is even, and
increasing on $[0, \infty)$, so that $\Delta(\sigma) \ge 0$, and $\phi(x) \ge \phi(T(x))$.

Note that this theorem provides an interesting generalization of the
arithmetic mean-geometric mean inequality: if $x \in (\mathbf{R}^n)^+$ and $\sum_{i=1}^n x_i = 1$,
then
$$\Bigl( \prod_{i=1}^n t_i \Bigr)^{1/n} \le \phi(x) \le \frac{1}{n} \sum_{i=1}^n t_i,$$
since $(1/n, \ldots, 1/n) \prec x \prec (1, 0, \ldots, 0)$.
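Muirhead's function and its two monotonicity properties can be checked directly for small $n$. The sketch below (my illustration, with arbitrary positive $t$ and exponent vectors chosen so that $x \prec y$; NumPy assumed) evaluates $\phi$ by brute force over $\Sigma_n$ and verifies both the Schur convexity and the AM-GM sandwich.

```python
import numpy as np
from itertools import permutations

def muirhead_phi(x, t):
    """phi(x) = (1/n!) * sum over sigma of t_{sigma(1)}^{x_1} ... t_{sigma(n)}^{x_n}."""
    n = len(x)
    sigmas = list(permutations(range(n)))
    return sum(np.prod([t[s[k]] ** x[k] for k in range(n)]) for s in sigmas) / len(sigmas)

t = np.array([1.3, 0.7, 2.1])      # the positive numbers t_1, ..., t_n
x = np.array([0.5, 0.3, 0.2])      # exponent vectors with x ≺ y, both summing to 1
y = np.array([0.8, 0.15, 0.05])
assert muirhead_phi(x, t) <= muirhead_phi(y, t)   # Schur convexity in the exponents

# The AM-GM sandwich: uniform exponents give the geometric mean of the t_i,
# the extreme vector (1, 0, ..., 0) gives their arithmetic mean.
geo, ari = np.prod(t) ** (1/3), t.mean()
assert np.isclose(muirhead_phi(np.full(3, 1/3), t), geo)
assert np.isclose(muirhead_phi(np.array([1.0, 0, 0]), t), ari)
assert geo - 1e-12 <= muirhead_phi(x, t) <= ari + 1e-12
```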
7.10 Notes and remarks

Given a finite set of numbers (the populations of cities or countries, the
scores a cricketer makes in a season), it is natural to arrange them in decreasing
order. It was Muirhead [Mui 03] who showed that more useful
information could be obtained by considering the running averages of the
numbers, and it is for this reason that the term 'Muirhead function' has
been used for $f^\dagger$ (which is denoted by other authors as $f^{**}$). It was also
Muirhead who showed how effective the method of transference could be.

Doubly stochastic matrices occur naturally in the theory of stationary
Markov processes. A square matrix $P = (p_{ij})$ is stochastic if all of its
terms are non-negative, and $\sum_j p_{ij} = 1$ for each $i$: $p_{ij}$ is the probability of
transitioning from state $i$ to state $j$ at any stage of the Markov process. The
matrix is doubly stochastic if and only if the probability distribution in which
all states are equally probable is an invariant distribution for the Markov
process.

Minkowski showed that every point of a compact convex subset of $\mathbf{R}^n$
can be expressed as a convex combination of the set's extreme points, and
Carathéodory showed that it can be expressed as a convex combination of
at most $n + 1$ extreme points. The extension of these ideas to the infinite-dimensional
case is called Choquet theory: excellent accounts have been given
by Phelps [Phe 66] and Alfsen [Alf 71].
Exercises

7.1 Suppose that $(\Omega, \Sigma, \mu)$ is an atom-free measure space, that $A \in \Sigma$
and that $0 < t < \mu(A) < \infty$. Let $l = \sup\{\mu(B) : B \subseteq A, \mu(B) \le t\}$
and $u = \inf\{\mu(B) : B \subseteq A, \mu(B) \ge t\}$. Show that there exist measurable
subsets $L$ and $U$ of $A$ with $\mu(L) = l$, $\mu(U) = u$. Deduce
that $l = u$, and that there exists a measurable subset $B$ of $A$ with
$\mu(B) = t$.

7.2 Suppose that $f \in M_1(\Omega, \Sigma, \mu)$, that $0 < q < \infty$ and that $C \ge 0$.
Show that the following are equivalent:
(i) $\lambda_{|f|}(u) = \mu(|f| > u) \le C^q/u^q$ for all $u > 0$;
(ii) $f^*(t) \le C/t^{1/q}$ for $0 < t < \mu(\Omega)$.

7.3 Suppose that $f \in M_1$. What conditions are necessary and sufficient
for $\lambda_{|f|}$ to be (a) continuous, and (b) strictly decreasing? If these
conditions are satisfied, what is the relation between $\lambda_{|f|}$ and $f^*$?

7.4 Show that a rearrangement-invariant function space is either equal
to $L^1 + L^\infty$ or is contained in $M_0$.

7.5 Suppose that $1 < p < \infty$. Show that
$$L^p + L^\infty = \Bigl\{ f \in M : \int_0^t (f^*(s))^p \, ds < \infty \text{ for all } t > 0 \Bigr\}.$$

7.6 Suppose that $f$ and $g$ are non-negative functions on $(\Omega, \Sigma, \mu)$ for
which $\int \log^+ f \, d\mu < \infty$ and $\int \log^+ g \, d\mu < \infty$. Let
$$G_t(f) = \exp\Bigl( \frac{1}{t} \int_0^t \log f^*(s) \, ds \Bigr),$$
and let $G_t(g)$ be defined similarly. Suppose that $G_t(f) \le G_t(g)$ for
all $0 < t < \mu(\Omega)$. Show that $\int \Phi(f) \, d\mu \le \int \Phi(g) \, d\mu$ for every
increasing function $\Phi$ on $[0, \infty)$ with $\Phi(e^t)$ a convex function of $t$:
in particular, $\int f^r \, d\mu \le \int g^r \, d\mu$ for each $0 < r < \infty$. What about
$r = \infty$?
Formulate and prove a corresponding result for sequences. (In
this case, the results are used to prove Weyl's inequality (Corollary
15.8.1).)

7.7 Suppose that $f$ is a non-negative measurable function on an atom-free
measure space $(\Omega, \Sigma, \mu)$. Show that there exists an increasing
sequence $(f_n)$ of non-negative simple functions, where each $f_n$ is of
the form $f_n = \sum_{j=1}^{k_n} a_{jn} I_{E_{jn}}$, where, for each $n$, the sets $E_{jn}$ are
disjoint, and have equal measure, such that $f_n \nearrow f$.

7.8 Suppose that $\Phi$ is a convex increasing non-negative function on
$[0, \infty)$ with $\Phi(0) = 0$. Let
$$\Phi_n(x) = D^+\Phi(0)x + \sum_{j=1}^{4^n} \Bigl( D^+\Phi\Bigl(\frac{j}{2^n}\Bigr) - D^+\Phi\Bigl(\frac{j-1}{2^n}\Bigr) \Bigr) \Bigl( x - \frac{j}{2^n} \Bigr)^+.$$
Show that $\Phi_n$ increases pointwise to $\Phi$.

7.9 Show that the representation of a doubly stochastic $n \times n$ matrix as
a convex combination of permutation matrices need not be unique,
for $n \ge 3$.

7.10 Let $\Delta_d = \{x \in \mathbf{R}^d : x = x^*\}$. Let $s(x) = (\sum_{j=1}^i x_j)_{i=1}^d$, and let
$\sigma = s^{-1} : s(\Delta_d) \to \Delta_d$. Suppose that $\phi$ is a symmetric function on
$(\mathbf{R}^d)^+$. Find a condition on $\phi \circ \sigma$ for $\phi$ to be Schur convex. Suppose
that $\phi$ is differentiable, and that
$$0 \le \partial\phi/\partial x_d \le \partial\phi/\partial x_{d-1} \le \cdots \le \partial\phi/\partial x_1$$
on $\Delta_d$. Show that $\phi$ is Schur convex.

7.11 Suppose that $1 \le k \le d$. Let
$$e_k(x) = \sum \bigl\{ x_{i_1} x_{i_2} \cdots x_{i_k} : i_1 < i_2 < \cdots < i_k \bigr\}$$
be the $k$-th elementary symmetric polynomial. Show that $e_k^{1/k}$ is Schur
concave.

7.12 Let $X_1, \ldots, X_k$ be independent identically distributed random variables
taking values $v_1, \ldots, v_d$ with probabilities $p_1, \ldots, p_d$. What is
the probability $\pi$ that $X_1, \ldots, X_k$ take distinct values? Show that $\pi$
is a Schur concave function of $p = (p_1, \ldots, p_d)$. What does this tell
you about the matching birthday story?

7.13 Suppose that $X$ is a discrete random variable taking values $v_1, \ldots, v_d$
with probabilities $p_1, \ldots, p_d$. The entropy $h$ of the distribution is
$\sum_{j : p_j \ne 0} p_j \log_2(1/p_j)$. Show that $h$ is a Schur concave function of
$p = (p_1, \ldots, p_d)$. Show that $h \le \log_2 d$.

7.14 Let
$$s(x) = \frac{1}{d-1} \sum_{i=1}^d (x_i - \bar{x})^2$$
be the sample variance of $x \in \mathbf{R}^d$, where $\bar{x} = (x_1 + \cdots + x_d)/d$. Show
that $s$ is Schur convex.
8

Maximal inequalities

8.1 The Hardy-Riesz inequality ($1 < p < \infty$)

In this chapter, we shall again suppose either that $(\Omega, \Sigma, \mu)$ is an atom-free
measure space, or that $\Omega = \mathbf{N}$ or $\{1, \ldots, n\}$, with counting measure. As its
name implies, Muirhead's maximal function enjoys a maximal property:
$$f^\dagger(t) = \sup\Bigl\{ \frac{1}{t} \int_E |f| \, d\mu : \mu(E) \le t \Bigr\} \quad\text{for } t > 0.$$
In this chapter we shall investigate this, and some other maximal functions
of greater importance. Many of the results depend upon the following easy
but important inequality.
Theorem 8.1.1 Suppose that $h$ and $g$ are non-negative measurable functions
in $M_0(\Omega, \Sigma, \mu)$, satisfying
$$\lambda \mu(h > \lambda) \le \int_{(h > \lambda)} g \, d\mu, \quad\text{for each } \lambda > 0.$$
If $1 < p < \infty$ then $\|h\|_p \le p' \|g\|_p$, and $\|h\|_\infty \le \|g\|_\infty$.

Proof Suppose first that $1 < p < \infty$. We only need to consider the case
where $h \ne 0$ and $\|g\|_p < \infty$. Let
$$h_n(\omega) = \begin{cases} 0 & \text{if } h(\omega) \le 1/n, \\ h(\omega) & \text{if } 1/n < h(\omega) \le n, \\ n & \text{if } h(\omega) > n. \end{cases}$$
Then $h_n \nearrow h$, and so, by the monotone convergence theorem, it is sufficient
to show that $\|h_n\|_p \le p' \|g\|_p$. Note that $\int h_n^p \, d\mu \le n^p \mu(h \ge 1/n)$, so that
$h_n \in L^p$. Note also that if $0 < \lambda < 1/n$, then
$$\lambda \mu(h_n > \lambda) \le (1/n)\mu(h > 1/n) \le \int_{(h > 1/n)} g \, d\mu = \int_{(h_n > \lambda)} g \, d\mu,$$
and so $h_n$ and $g$ also satisfy the conditions of the theorem.

Using Fubini's theorem and Hölder's inequality,
$$\int_\Omega h_n^p \, d\mu = p \int_0^\infty t^{p-1} \mu(h_n > t) \, dt \le p \int_0^\infty t^{p-2} \Bigl( \int_{(h_n > t)} g(\omega) \, d\mu(\omega) \Bigr) dt$$
$$= p \int_\Omega g(\omega) \Bigl( \int_0^{h_n(\omega)} t^{p-2} \, dt \Bigr) d\mu(\omega) = \frac{p}{p-1} \int_\Omega g(\omega) (h_n(\omega))^{p-1} \, d\mu(\omega)$$
$$\le p' \|g\|_p \Bigl( \int_\Omega (h_n)^{(p-1)p'} \, d\mu \Bigr)^{1/p'} = p' \|g\|_p \|h_n\|_p^{p-1}.$$
We now divide, to get the result.

When $p = \infty$, $\lambda \mu(h > \lambda) \le \int_{(h > \lambda)} g \, d\mu \le \|g\|_\infty \mu(h > \lambda)$, and so $\mu(h > \lambda) = 0$ if $\lambda > \|g\|_\infty$; thus $\|h\|_\infty \le \|g\|_\infty$.
Corollary 8.1.1 (The Hardy-Riesz inequality) Suppose that $1 < p < \infty$.
(i) If $f \in L^p(\Omega, \Sigma, \mu)$ then $\|f^\dagger\|_p \le p' \|f\|_p$.
(ii) If $f \in L^p[0, \infty)$ and $A(f)(t) = (\int_0^t f(s) \, ds)/t$ then
$$\|A(f)\|_p \le \|f^\dagger\|_p \le p' \|f\|_p.$$
(iii) If $x \in l_p$ and $(A(x))_n = (\sum_{i=1}^n x_i)/n$ then
$$\|A(x)\|_p \le \|x^\dagger\|_p \le p' \|x\|_p.$$

Proof (i) If $\lambda > 0$ and $t = \mu(f^\dagger > \lambda) > 0$ then
$$\lambda \mu(f^\dagger > \lambda) = \lambda t \le \int_0^t f^*(s) \, ds = \int_{(f^\dagger > \lambda)} f^*(s) \, ds,$$
so that $\|f^\dagger\|_p \le p' \|f^*\|_p = p' \|f\|_p$.
(ii) and (iii) follow, since $|A(f)| \le f^\dagger$ and $|A(x)| \le x^\dagger$.
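Part (iii) is easy to observe numerically. The sketch below (my illustration, with a random non-negative sequence; NumPy assumed) computes the running averages $A(x)$, the Muirhead maximal sequence $x^\dagger$ (running averages of the decreasing rearrangement), and checks the chain of inequalities.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2.5
p_conj = p / (p - 1)               # the conjugate index p'

x = rng.exponential(size=10_000)   # a non-negative (truncated) sequence
n = np.arange(1, x.size + 1)
averages = np.cumsum(x) / n                    # A(x)_n = (x_1 + ... + x_n)/n
x_dagger = np.cumsum(np.sort(x)[::-1]) / n     # x^dagger: averages of the decreasing rearrangement

lp = lambda v: np.sum(v ** p) ** (1 / p)
# ||A(x)||_p <= ||x^dagger||_p <= p' ||x||_p
assert lp(averages) <= lp(x_dagger) <= p_conj * lp(x)
```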
The constant $p'$ is best possible, in the theorem and in the corollary.
Take $\Omega = [0, 1]$, with Lebesgue measure. Suppose that $1 < r < p'$, and let
$g(t) = t^{1/r - 1}$. Then $g \in L^p$, and $h = g^\dagger = rg$, so that $\|g^\dagger\|_p = r\|g\|_p$.
Similar examples show that the constant is also best possible for sequences.

This result was given by Hardy [Har 20], but he acknowledged that the
proof that was given was essentially provided by Marcel Riesz. It enables
us to give another proof of Hilbert's inequality, in the absolute case, with
slightly worse constants.
Theorem 8.1.2 If $a = (a_n)_{n \ge 0} \in l_p$ and $b = (b_n)_{n \ge 0} \in l_{p'}$, where $1 < p < \infty$,
then
$$\sum_{j=0}^\infty \sum_{k=0}^\infty \frac{|a_j b_k|}{j+k+1} \le (p + p') \|a\|_p \|b\|_{p'}.$$

Proof Using Hölder's inequality,
$$\sum_{k=0}^\infty \sum_{j=0}^k \frac{|a_j b_k|}{j+k+1} \le \sum_{k=0}^\infty \Bigl( \frac{1}{k+1} \sum_{j=0}^k |a_j| \Bigr) |b_k| \le \|A(|a|)\|_p \|b\|_{p'} \le p' \|a\|_p \|b\|_{p'}.$$
Similarly,
$$\sum_{j=1}^\infty \sum_{k=0}^{j-1} \frac{|a_j b_k|}{j+k+1} \le p \|a\|_p \|b\|_{p'}.$$
Adding, we get the result.

In exactly the same way, we have a corresponding result for functions on
$[0, \infty)$.

Theorem 8.1.3 If $f \in L^p[0, \infty)$ and $g \in L^{p'}[0, \infty)$, where $1 < p < \infty$, then
$$\int_0^\infty \int_0^\infty \frac{|f(x)g(y)|}{x+y} \, dx \, dy \le (p + p') \|f\|_p \|g\|_{p'}.$$
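A truncated numerical check of Theorem 8.1.2 is straightforward. The sketch below (my illustration with random non-negative sequences; NumPy assumed) forms the kernel $1/(j+k+1)$ and compares the double sum against the bound $(p + p')\|a\|_p \|b\|_{p'}$.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3.0
p_conj = p / (p - 1)

a = rng.exponential(size=400)
b = rng.exponential(size=400)
j = np.arange(a.size)
kernel = 1.0 / (j[:, None] + j[None, :] + 1)   # 1/(j + k + 1)

lhs = a @ kernel @ b                            # sum_{j,k} |a_j b_k| / (j+k+1)
rhs = (p + p_conj) * np.sum(a**p)**(1/p) * np.sum(b**p_conj)**(1/p_conj)
assert lhs <= rhs
```

Truncating the sequences only decreases the left-hand side, so the finite check is a genuine instance of the inequality.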
8.2 The Hardy-Riesz inequality ($p = 1$)

What happens when $p = 1$? If $\mu(\Omega) = \infty$ and $f$ is any non-zero function in $L^1$
then $f^\dagger(t) \ge (f^\dagger(1))/t$ for $t \ge 1$, so that $f^\dagger \notin L^1$. When $\mu(\Omega) < \infty$, there
are functions $f$ in $L^1$ with $f^\dagger \notin L^1$ (consider $f(t) = 1/t(\log(1/t))^2$ on $(0, 1)$).

But in the finite-measure case there is an important and interesting result,
due to Hardy and Littlewood [HaL 30], which indicates the importance of
the space $L\log L$. We consider the case where $\mu(\Omega) = 1$.

Theorem 8.2.1 Suppose that $\mu(\Omega) = 1$ and that $f \in L^1$. Then $f^\dagger \in L^1(0,1)$
if and only if $f \in L\log L$. If so, then
$$\|f\|_{L\log L} \le \|f^\dagger\|_1 \le 6 \|f\|_{L\log L},$$
so that $\|f^\dagger\|_1$ is a norm on $L\log L$ equivalent to $\|f\|_{L\log L}$.

Proof Suppose first that $f^\dagger \in L^1$ and that $\|f^\dagger\|_1 = 1$. Then, integrating by
parts, if $\epsilon > 0$,
$$1 = \|f^\dagger\|_1 \ge \int_\epsilon^1 \frac{1}{t} \Bigl( \int_0^t f^*(s) \, ds \Bigr) dt = \Bigl( \log\frac{1}{\epsilon} \Bigr) \epsilon f^\dagger(\epsilon) + \int_\epsilon^1 f^*(t) \log\frac{1}{t} \, dt.$$
Thus $\int_0^1 f^*(t) \log(1/t) \, dt \le 1$. Also $\|f\|_1 = \|f^*\|_1 \le \|f^\dagger\|_1 = 1$, so that
$f^*(t) \le f^\dagger(t) \le 1/t$. Thus
$$\int |f| \log^+(|f|) \, d\mu = \int_0^1 f^*(t) \log^+ f^*(t) \, dt \le \int_0^1 f^*(t) \log\frac{1}{t} \, dt \le 1,$$
and so $f \in L\log L$ and $\|f\|_{L\log L} \le \|f^\dagger\|_1$. By scaling, the same result holds
for all $f \in L^1$ with $\|f^\dagger\|_1 < \infty$.

Conversely, suppose that $\int |f| \log^+(|f|) \, d\mu = 1$. Let $B = \{t \in (0,1] : f^*(t) > 1/\sqrt{t}\}$ and let $S = \{t \in (0,1] : f^*(t) \le 1/\sqrt{t}\}$. If $t \in B$ then $\log^+(f^*(t)) = \log(f^*(t)) > \frac{1}{2}\log(1/t)$, and so
$$\|f^\dagger\|_1 = \int_0^1 f^*(t) \log\frac{1}{t} \, dt \le 2\int_B f^*(t) \log^+(f^*(t)) \, dt + \int_S \frac{1}{\sqrt{t}} \log\frac{1}{t} \, dt \le 2 + \int_0^1 \frac{1}{\sqrt{t}} \log\frac{1}{t} \, dt = 6.$$
Thus, by scaling, if $f \in L\log L$ then $f^\dagger \in L^1(0, 1)$ and $\|f^\dagger\|_1 \le 6 \|f\|_{L\log L}$.
8.3 Related inequalities

We can obtain similar results under weaker conditions.

Proposition 8.3.1 Suppose that $f$ and $g$ are non-negative measurable functions
in $M_0(\Omega, \Sigma, \mu)$, and that
$$\lambda \mu(f > \lambda) \le \int_{(f > \lambda)} g \, d\mu, \quad\text{for each } \lambda > 0.$$
Then
$$\mu(f > \lambda) \le \frac{2}{\lambda} \int_{(g > \lambda/2)} g \, d\mu, \quad\text{for } \lambda > 0.$$

Proof
$$\lambda \mu(f > \lambda) \le \int_{(g > \lambda/2)} g \, d\mu + \int_{(g \le \lambda/2) \cap (f > \lambda)} g \, d\mu \le \int_{(g > \lambda/2)} g \, d\mu + \frac{\lambda}{2} \mu(f > \lambda).$$

Proposition 8.3.2 Suppose that $f$ and $g$ are non-negative measurable functions
in $M_0(\Omega, \Sigma, \mu)$, and that
$$\lambda \mu(f > \lambda) \le \int_{(g > \lambda)} g \, d\mu, \quad\text{for each } \lambda > 0.$$
Suppose that $\phi$ is a non-negative measurable function on $[0, \infty)$ and that
$\Phi(t) = \int_0^t \phi(\lambda) \, d\lambda < \infty$ for all $t > 0$. Let $\Psi(t) = \int_0^t (\phi(\lambda)/\lambda) \, d\lambda$. Then
$$\int_X \Phi(f) \, d\mu \le \int_X g \Psi(g) \, d\mu.$$

Proof Using Fubini's theorem,
$$\int_X \Phi(f) \, d\mu = \int_0^\infty \phi(\lambda) \mu(f > \lambda) \, d\lambda \le \int_0^\infty \frac{\phi(\lambda)}{\lambda} \Bigl( \int_{(g > \lambda)} g \, d\mu \Bigr) d\lambda = \int_X \Bigl( \int_0^g \frac{\phi(\lambda)}{\lambda} \, d\lambda \Bigr) g \, d\mu = \int_X g \Psi(g) \, d\mu.$$

Corollary 8.3.1 Suppose that $f$ and $g$ are non-negative measurable functions
in $M_0(\Omega, \Sigma, \mu)$, and that
$$\lambda \mu(f > \lambda) \le \int_{(g > \lambda)} g \, d\mu, \quad\text{for each } \lambda > 0.$$
If $1 < p < \infty$ then $\|f\|_p \le (p')^{1/p} \|g\|_p$.

Proof Take $\phi(t) = t^{p-1}$.

We also have an $L^1$ inequality.

Corollary 8.3.2 Suppose that $f$ and $g$ are non-negative measurable functions
in $M_0(\Omega, \Sigma, \mu)$, and that
$$\lambda \mu(f > \lambda) \le \int_{(g > \lambda)} g \, d\mu, \quad\text{for each } \lambda > 0.$$
If $\mu(B) < \infty$ then
$$\int_B f \, d\mu \le \mu(B) + \int_X g \log^+ g \, d\mu.$$

Proof Take $\phi = I_{[1, \infty)}$. Then $\Phi(t) = (t-1)^+$ and $\Psi(t) = \log^+ t$, so that
$$\int_X (f-1)^+ \, d\mu \le \int_X g \log^+ g \, d\mu.$$
Since $f I_B \le I_B + (f-1)^+$, the result follows.

Combining this with Proposition 8.3.1, we also obtain the following corollary.

Corollary 8.3.3 Suppose that $f$ and $g$ are non-negative measurable functions
in $M_0(\Omega, \Sigma, \mu)$, and that
$$\lambda \mu(f > \lambda) \le \int_{(f > \lambda)} g \, d\mu, \quad\text{for each } \lambda > 0.$$
If $\mu(B) < \infty$ then
$$\int_B f \, d\mu \le \mu(B) + \int_X 2g \log^+(2g) \, d\mu.$$
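A concrete pair satisfying the hypothesis of Proposition 8.3.1 and Corollary 8.3.3 is $f = a^\dagger$, $g = a^*$ on $\{1, \ldots, N\}$ with counting measure: the level set of $a^\dagger$ is an initial block $\{1, \ldots, k\}$, and $\lambda k \le \sum_{j \le k} a^*_j = k a^\dagger_k$ whenever $a^\dagger_k > \lambda$. The sketch below (my illustration, not from the text; NumPy assumed) checks this hypothesis and the two conclusions numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.exponential(size=2000)
g = np.sort(a)[::-1]                          # g = a^*, the decreasing rearrangement
f = np.cumsum(g) / np.arange(1, a.size + 1)   # f = a^dagger

for lam in (0.5, 1.0, 2.0):
    big = f > lam
    # the hypothesis: lam * mu(f > lam) <= integral of g over (f > lam)
    assert lam * big.sum() <= g[big].sum() + 1e-9
    # Proposition 8.3.1's conclusion:
    assert big.sum() <= (2.0 / lam) * g[g > lam / 2].sum() + 1e-9

# Corollary 8.3.3, with B the whole (finite) index set:
assert f.sum() <= a.size + np.sum(2 * g * np.log(np.maximum(2 * g, 1.0)))
```

Here `np.log(np.maximum(2*g, 1.0))` is $\log^+(2g)$.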
8.4 Strong type and weak type

The mapping $f \to f^\dagger$ is sublinear, and so are many other mappings that we
shall consider. We need conditions on sublinear mappings comparable to the
continuity, or boundedness, of linear mappings. Suppose that $E$ is a normed
space, that $0 < q < \infty$ and that $T : E \to M(\Omega, \Sigma, \mu)$ is sublinear. We say
that $T$ is of strong type $(E, q)$ if there exists $M < \infty$ such that if $f \in E$
then $T(f) \in L^q$ and $\|T(f)\|_q \le M \|f\|_E$. The least constant $M$ for which
the inequality holds for all $f \in E$ is called the strong type $(E, q)$ constant.
When $T$ is linear and $1 \le q < \infty$, 'strong type $(E, q)$' and 'bounded from $E$
to $L^q$' are the same, and the strong type constant is then just the norm of
$T$. When $E = L^p$, we say that $T$ is of strong type $(p, q)$.

We also need to consider weaker conditions, and we shall introduce more
than one of these. For the first of these, we say that $T$ is of weak type $(E, q)$
if there exists $L < \infty$ such that
$$\mu\{\omega : |T(f)(\omega)| > \lambda\} \le \frac{L^q \|f\|_E^q}{\lambda^q} \quad\text{for all } f \in E,\ \lambda > 0.$$
Equivalently (see Exercise 7.2), $T$ is of weak type $(E, q)$ if
$$(T(f))^*(t) \le L t^{-1/q} \|f\|_E \quad\text{for all } f \in E,\ 0 < t < \mu(\Omega).$$
The least constant $L$ for which the inequality holds for all $f \in E$ is called
the weak type $(E, q)$ constant.

When $E = L^p(\Omega', \Sigma', \mu')$, we say that $T$ is of weak type $(p, q)$. Since
$$\|g\|_q^q = \int |g|^q \, d\mu \ge \lambda^q \mu\{x : |g(x)| > \lambda\},$$
strong type $(E, q)$ implies weak type $(E, q)$.

For completeness' sake, we say that $T$ is of strong type $(E, \infty)$ or weak
type $(E, \infty)$ (strong type $(p, \infty)$ or weak type $(p, \infty)$ when $E = L^p$) if there
exists $M$ such that if $f \in E$ then $T(f) \in L^\infty$ and $\|T(f)\|_\infty \le M \|f\|_E$.

Here are some basic properties of strong type and weak type.

Proposition 8.4.1 Suppose that $E$ is a normed space, that $0 < q < \infty$
and that $S, T : E \to M(\Omega, \Sigma, \mu)$ are sublinear and of weak type $(E, q)$, with
constants $L_S$ and $L_T$. If $R$ is sublinear and $|R(f)| \le |S(f)|$ for all $f$ then
$R$ is of weak type $(E, q)$, with constant at most $L_S$. If $a, b > 0$ then $a|S| + b|T|$
is sublinear and of weak type $(E, q)$, with constant at most $2(a^q L_S^q + b^q L_T^q)^{1/q}$.
If $S$ and $T$ are of strong type $(E, q)$, with constants $M_S$ and $M_T$,
then $R$ and $a|S| + b|T|$ are of strong type $(E, q)$, with constants at most $M_S$
and $aM_S + bM_T$ respectively.

Proof The result about $R$ is trivial. Suppose that $\lambda > 0$. Then $(a|S(f)| + b|T(f)| > \lambda) \subseteq (a|S(f)| > \lambda/2) \cup (b|T(f)| > \lambda/2)$, so that
$$\mu(a|S(f)| + b|T(f)| > \lambda) \le \mu(|S(f)| > \lambda/2a) + \mu(|T(f)| > \lambda/2b) \le \frac{2^q a^q L_S^q}{\lambda^q} \|f\|_E^q + \frac{2^q b^q L_T^q}{\lambda^q} \|f\|_E^q.$$
The proofs of the strong type results are left as an easy exercise.
Weak type is important when we consider convergence almost everywhere.
First let us recall an elementary result from functional analysis
about convergence in norm.

Theorem 8.4.1 Suppose that $(T_r)_{r \ge 0}$ is a family of bounded linear mappings
from a Banach space $(E, \|\cdot\|_E)$ into a Banach space $(G, \|\cdot\|_G)$, such that
(i) $\sup_r \|T_r\| = K < \infty$, and
(ii) there is a dense subspace $F$ of $E$ such that $T_r(f) \to T_0(f)$ in norm,
for $f \in F$, as $r \to 0$.
Then if $e \in E$, $T_r(e) \to T_0(e)$ in norm, as $r \to 0$.

Proof Suppose that $\epsilon > 0$. There exists $f \in F$ with $\|f - e\| < \epsilon/3K$, and
there exists $r_0 > 0$ such that $\|T_r(f) - T_0(f)\| < \epsilon/3$ for $0 < r \le r_0$. If
$0 < r \le r_0$ then
$$\|T_r(e) - T_0(e)\| \le \|T_r(e - f)\| + \|T_r(f) - T_0(f)\| + \|T_0(f - e)\| < \epsilon.$$

Here is the corresponding result for convergence almost everywhere.

Theorem 8.4.2 Suppose that $(T_r)_{r \ge 0}$ is a family of linear mappings from
a normed space $E$ into $M(\Omega, \Sigma, \mu)$, and that $M$ is a non-negative sublinear
mapping of $E$ into $M(\Omega, \Sigma, \mu)$, of weak type $(E, q)$ for some $0 < q < \infty$,
such that
(i) $|T_r(g)| \le M(g)$ for all $g \in E$, $r \ge 0$, and
(ii) there is a dense subspace $F$ of $E$ such that $T_r(f) \to T_0(f)$ almost
everywhere, for $f \in F$, as $r \to 0$.
Then if $g \in E$, $T_r(g) \to T_0(g)$ almost everywhere, as $r \to 0$.

Proof We use the first Borel-Cantelli lemma. For each $n$ there exists $f_n \in F$
with $\|g - f_n\| \le 1/2^n$. Let
$$B_n = (M(g - f_n) > 1/n) \cup (T_r(f_n) \not\to T_0(f_n)).$$
Then
$$\mu(B_n) = \mu(M(g - f_n) > 1/n) \le L^q n^q 2^{-nq}.$$
Let $B = \limsup(B_n)$. Then $\mu(B) = 0$, by the first Borel-Cantelli lemma.
If $x \notin B$, there exists $n_0$ such that $x \notin B_n$ for $n \ge n_0$, so that
$$|T_r(g)(x) - T_r(f_n)(x)| \le M(g - f_n)(x) \le 1/n, \quad\text{for } r \ge 0,$$
and so
$$|T_r(g)(x) - T_0(g)(x)| \le |T_r(g)(x) - T_r(f_n)(x)| + |T_r(f_n)(x) - T_0(f_n)(x)| + |T_0(f_n)(x) - T_0(g)(x)|$$
$$\le 2/n + |T_r(f_n)(x) - T_0(f_n)(x)| \le 3/n$$
for small enough $r$.

We can of course consider other directed sets than $[0, \infty)$; for example $\mathbf{N}$,
or the set
$$\{(x, t) : t \ge 0,\ |x| \le kt\} \subseteq \mathbf{R}^{d+1}$$
ordered by $(x, t) \le (y, u)$ if $t \ge u$.
8.5 Riesz weak type

When $E = L^p(\Omega, \Sigma, \mu)$, a condition slightly less weak than weak type is
of considerable interest: we say that $T$ is of Riesz weak type $(p, q)$ if there
exists $0 < L < \infty$ such that
$$\mu\{x : |T(f)(x)| > \lambda\} \le \frac{L^q}{\lambda^q} \Bigl( \int_{(|T(f)| > \lambda)} |f|^p \, d\mu \Bigr)^{q/p}.$$
This terminology, which is not standard, is motivated by Theorem 8.1.1,
and the Hardy-Riesz inequality. We call the least $L$ for which the inequality
holds for all $f$ the Riesz weak type constant. Riesz weak type clearly implies
weak type, but strong type does not imply Riesz weak type (consider the
shift operator $T(f)(x) = f(x - 1)$ on $L^p(\mathbf{R})$, and $T(I_{[0,1]})$).

Proposition 8.5.1 Suppose that $S$ and $T$ are of Riesz weak type $(p, q)$, with
Riesz weak type constants $L_S$ and $L_T$. Then $\max(|S|, |T|)$ is of Riesz weak
type $(p, q)$, with constant at most $(L_S^q + L_T^q)^{1/q}$, and $\alpha S$ is of Riesz weak type
$(p, q)$, with constant $|\alpha| L_S$.

Proof Let $R = \max(|S|, |T|)$. Then $(R(f) > \lambda) = (|S(f)| > \lambda) \cup (|T(f)| > \lambda)$,
so that
$$\mu(R > \lambda) \le \frac{L_S^q}{\lambda^q} \Bigl( \int_{(|S(f)| > \lambda)} |f|^p \, d\mu \Bigr)^{q/p} + \frac{L_T^q}{\lambda^q} \Bigl( \int_{(|T(f)| > \lambda)} |f|^p \, d\mu \Bigr)^{q/p} \le \frac{L_S^q + L_T^q}{\lambda^q} \Bigl( \int_{(R(f) > \lambda)} |f|^p \, d\mu \Bigr)^{q/p}.$$
The proof for $\alpha S$ is left as an exercise.
We have the following interpolation theorem.

Theorem 8.5.1 Suppose that $T$ is a sublinear mapping of Riesz weak type
$(p, p)$, with Riesz weak type constant $L$. If $p < q < \infty$ then $T$ is of strong
type $(q, q)$, with constant at most $L(q/(q-p))^{1/p}$, and $T$ is of strong type
$(\infty, \infty)$, with constant $L$.

Proof Since $T$ is of Riesz weak type $(p, p)$,
$$\lambda \mu(|T(f)|^p > \lambda) \le L^p \int_{(|T(f)|^p > \lambda)} |f|^p \, d\mu.$$
Thus $|T(f)|^p$ and $L^p |f|^p$ satisfy the conditions of Theorem 8.1.1. If $p < q < \infty$,
put $r = q/p$ (so that $r' = q/(q-p)$). Then
$$\|T(f)\|_q = \||T(f)|^p\|_r^{1/p} \le (r')^{1/p} \|L^p |f|^p\|_r^{1/p} = L(r')^{1/p} \|f\|_q.$$
Similarly,
$$\|T(f)\|_\infty = \||T(f)|^p\|_\infty^{1/p} \le \|L^p |f|^p\|_\infty^{1/p} = L \|f\|_\infty.$$
8.6 Hardy, Littlewood, and a batsman's averages

Muirhead's maximal function is concerned only with the values that a function
takes, and not with where the values are taken. We now begin to
introduce a sequence of maximal functions that relate to the geometry of
the underlying space. This is very simple geometry, usually of the real line,
or $\mathbf{R}^n$, but to begin with, we consider the integers, where the geometry is
given by the order.

The first maximal function that we consider was introduced by Hardy
and Littlewood [HaL 30] in the following famous way (their account has
been slightly edited and abbreviated here).

'The problem is most easily grasped when stated in the language of cricket,
or any other game in which a player compiles a series of scores in which an
average is recorded ... Suppose that a batsman plays, in a given season, a
given stock of innings
$$a_1, a_2, \ldots, a_n$$
(determined in everything except arrangement). Suppose that $\alpha_\nu$ is ... his
maximum average for any consecutive series of innings ending at the $\nu$-th,
so that
$$\alpha_\nu = \frac{a_\mu + a_{\mu+1} + \cdots + a_\nu}{\nu - \mu + 1} = \max_{\mu' \le \nu} \frac{a_{\mu'} + a_{\mu'+1} + \cdots + a_\nu}{\nu - \mu' + 1};$$
we may agree that, in case of ambiguity, $\mu$ is to be chosen as small as
possible. Let $s(x)$ be a positive function which increases (in the wide sense)
with $x$, and let his satisfaction after the $\nu$-th innings be measured by $s_\nu = s(\alpha_\nu)$.
Finally let his total satisfaction for the season be measured by $S = \sum_{\nu=1}^n s(\alpha_\nu)$.
Theorem 2 ... shows that $S$ is ... a maximum when the
innings are played in decreasing order.'

Of course, this theorem says that $S \le \sum_{\nu=1}^n s(a^\dagger_\nu)$.

We shall not give the proof of Hardy and Littlewood, whose arguments,
as they say, 'are indeed mostly of the type which are intuitive to a student of
cricket averages'. Instead, we give a proof due to F. Riesz [Ri(F) 32]. Riesz's
theorem concerns functions on $\mathbf{R}$, but first we give a discrete version, which
establishes the result of Hardy and Littlewood. We begin with a seemingly
trivial lemma.

Lemma 8.6.1 Suppose that $(f_n)_{n \in \mathbf{N}}$ is a sequence of real numbers for which
$f_n \to \infty$ as $n \to \infty$. Let
$$E = \{n : \text{there exists } m < n \text{ such that } f_m > f_n\}.$$
Then we can write $E = \bigcup_j (c_j, d_j)$ (where $(c_j, d_j) = \{n : c_j < n < d_j\}$), with
$c_1 < d_1 \le c_2 < d_2 \le \cdots$, and $f_n < f_{c_j} \le f_{d_j}$ for $n \in (c_j, d_j)$.

Proof The union may be empty, finite, or infinite. If $(f_n)$ is increasing then
$E$ is empty. Otherwise there exists a least $c_1$ such that $f_{c_1} > f_{c_1+1}$. Let
$d_1$ be the least integer greater than $c_1$ such that $f_{d_1} \ge f_{c_1}$. Then $c_1 \notin E$,
$d_1 \notin E$, and $n \in E$ for $c_1 < n < d_1$. If $(f_n)$ is increasing for $n \ge d_1$, we are
finished. Otherwise we iterate the procedure, starting from $d_1$. It is then
easy to verify that $E = \bigcup_j (c_j, d_j)$.

Theorem 8.6.1 (F. Riesz's maximal theorem: discrete version) If
$a = (a_n) \in l_1$, let
$$\alpha_n = \max_{1 \le k \le n} \frac{|a_{n-k+1}| + |a_{n-k+2}| + \cdots + |a_n|}{k}.$$
Then the mapping $a \to \alpha$ is a sublinear mapping of Riesz weak type $(1, 1)$,
with Riesz weak type constant 1.

Proof The mapping $a \to \alpha$ is certainly sublinear. Suppose that $\lambda > 0$. Then
the sequence $(f_n)$ defined by $f_n = \lambda n - \sum_{j=1}^n |a_j|$ satisfies the conditions of
the lemma. Let
$$E_\lambda = \{n : \text{there exists } m < n \text{ such that } f_m > f_n\} = \bigcup_j (c_j, d_j).$$
Now $f_n - f_{n-k} = \lambda k - \sum_{j=n-k+1}^n |a_j|$, and so $n \in E_\lambda$ if and only if $\alpha_n > \lambda$.
Thus
$$\#\{n : \alpha_n > \lambda\} = \#(E_\lambda) = \sum_j (d_j - c_j - 1).$$
But
$$\lambda(d_j - c_j - 1) - \sum_{c_j < n < d_j} |a_n| = f_{d_j - 1} - f_{c_j} \le 0,$$
so that
$$\lambda \#\{n : \alpha_n > \lambda\} \le \sum_j \sum_{c_j < n < d_j} |a_n| = \sum_{\{n : \alpha_n > \lambda\}} |a_n|.$$

Corollary 8.6.1 $\alpha^*_n \le a^\dagger_n$.

Proof Suppose that $\lambda < \alpha^*_n$, and let $k = \#\{j : \alpha_j > \lambda\}$. Then $k \ge n$ and,
by the theorem,
$$\lambda k \le \sum_{\{j : \alpha_j > \lambda\}} |a_j| \le k a^\dagger_k.$$
Thus $\lambda \le a^\dagger_k \le a^\dagger_n$. Since this holds for all $\lambda < \alpha^*_n$, $\alpha^*_n \le a^\dagger_n$.

The result of Hardy and Littlewood follows immediately from this, since,
with their terminology,
$$S = \sum_\nu s(\alpha_\nu) = \sum_\nu s(\alpha^*_\nu) \le \sum_\nu s(a^\dagger_\nu).$$
[The fact that the batsman only plays a finite number of innings is resolved
by setting $a_n = 0$ for other values of $n$.]
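Both the weak type identity and the corollary can be observed directly on a random sequence of 'innings'. The sketch below (my illustration; `riesz_max` is a hypothetical name, and the quadratic-time loop is only for demonstration; NumPy assumed) computes the backward maximal averages $\alpha_n$ and checks the two conclusions.

```python
import numpy as np

def riesz_max(a):
    """alpha_n = max over 1 <= k <= n of (|a_{n-k+1}| + ... + |a_n|)/k."""
    s = np.concatenate(([0.0], np.cumsum(np.abs(a))))
    return np.array([max((s[n] - s[n - k]) / k for k in range(1, n + 1))
                     for n in range(1, len(a) + 1)])

rng = np.random.default_rng(4)
a = rng.exponential(size=300)
alpha = riesz_max(a)

# Riesz weak type (1,1) with constant 1:
for lam in (0.5, 1.0, 2.0):
    big = alpha > lam
    assert lam * big.sum() <= np.abs(a)[big].sum() + 1e-9

# Corollary 8.6.1: the decreasing rearrangement of alpha is dominated,
# term by term, by the Muirhead maximal sequence a^dagger.
a_dag = np.cumsum(np.sort(np.abs(a))[::-1]) / np.arange(1, a.size + 1)
assert np.all(np.sort(alpha)[::-1] <= a_dag + 1e-9)
```

Summing an increasing $s$ over both sides of the last comparison is exactly the batsman's inequality $S \le \sum_\nu s(a^\dagger_\nu)$.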
8.7 Rieszs sunrise lemma
We now turn to the continuous case; as we shall see, the proofs are similar
to the discrete case. Here the geometry concerns intervals with a given point
as an end-point, a mid-point, or an internal point.
8.7 Rieszs sunrise lemma 115
Lemma 8.7.1 (Riesz's sunrise lemma) Suppose that $f$ is a continuous real-valued function on $\mathbf{R}$ such that $f(x) \to \infty$ as $x \to \infty$ and $f(x) \to -\infty$ as $x \to -\infty$. Let
\[ E = \{x: \text{there exists } y < x \text{ with } f(y) > f(x)\}. \]
Then $E$ is an open subset of $\mathbf{R}$, every connected component of $E$ is bounded, and if $(a, b)$ is one of the connected components then $f(a) = f(b)$ and $f(x) < f(a)$ for $a < x < b$.

Proof It is clear that $E$ is an open subset of $\mathbf{R}$. If $x \in \mathbf{R}$, let $m(x) = \sup\{f(t): t < x\}$, and let $L_x = \{y: y \le x,\ f(y) = m(x)\}$. Since $f$ is continuous and $f(t) \to -\infty$ as $t \to -\infty$, $L_x$ is a closed non-empty subset of $(-\infty, x]$: let $l_x = \sup L_x$. Then $x \in E$ if and only if $f(x) < m(x)$, and if and only if $l_x < x$. If so, $m(x) = f(l_x) > f(t)$ for $l_x < t \le x$.

Similarly, let $R_x = \{z: z \ge x,\ f(z) = m(x)\}$. Since $f$ is continuous and $f(t) \to \infty$ as $t \to \infty$, $R_x$ is a closed non-empty subset of $[x, \infty)$: let $r_x = \inf R_x$. If $x \in E$ then $m(x) = f(r_x) > f(t)$ for $x \le t < r_x$. Further, $l_x, r_x \notin E$, and so $(l_x, r_x)$ is a maximal connected subset of $E$ and the result follows.
Why is this the sunrise lemma? The function $f$ represents the profile of a mountain, viewed from the north. The set $E$ is the set of points in shadow, as the sun rises in the east.
This lemma was stated and proved by F. Riesz [Ri(F) 32], but the paper also included a simpler proof given by his brother Marcel.
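The shadow set is easy to compute numerically. The following Python sketch (our own illustration, with invented names) approximates $E$ on a grid for a sample profile $f(x) = \sin x + 0.3x$, which does tend to $\pm\infty$ at $\pm\infty$, and checks that $f$ takes (nearly) equal values at the ends of each component, up to grid error.

```python
# Approximate the shadow set E = {x : f(y) > f(x) for some y < x} on a grid.
import math

def shadow_set(f, xs):
    E, running_max = [], -math.inf
    for x in xs:
        E.append(running_max > f(x))       # in shadow of an earlier peak?
        running_max = max(running_max, f(x))
    return E

f = lambda x: math.sin(x) + 0.3 * x        # "mountain profile"
xs = [i * 1e-3 for i in range(-5000, 8001)]  # grid on [-5, 8]
E = shadow_set(f, xs)
i = 0
while i < len(E):                          # walk over components of E
    if E[i]:
        j = i
        while j < len(E) and E[j]:
            j += 1
        a, b = xs[i - 1], xs[min(j, len(xs) - 1)]
        assert abs(f(a) - f(b)) < 0.05     # f(a) = f(b), up to grid error
        i = j
    else:
        i += 1
```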
Theorem 8.7.1 (F. Riesz's maximal theorem: continuous version) For $g \in L^1(\mathbf{R}, d\lambda)$, let
\[ m_-(g)(x) = \sup_{y < x} \frac{1}{x - y} \int_y^x |g(t)|\, dt. \]
Then $m_-$ is a sublinear operator, and if $\alpha > 0$ then
\[ \alpha\,\lambda(m_-(g) > \alpha) = \int_{(m_-(g) > \alpha)} |g(t)|\, dt, \]
so that $m_-$ is of Riesz weak type $(1, 1)$, with constant 1.
Proof It is clear from the definition that $m_-$ is sublinear. Suppose that $g \in L^1(\mathbf{R}, d\lambda)$ and that $\alpha > 0$. Let $G_\alpha(x) = \alpha x - \int_0^x |g(t)|\, dt$. Then $G_\alpha$ satisfies the conditions of the sunrise lemma. Let
\[ E_\alpha = \{x: \text{there exists } y < x \text{ with } G_\alpha(y) > G_\alpha(x)\} = \bigcup_j I_j, \]
where the $I_j = (a_j, b_j)$ are the connected components of $E_\alpha$. Since
\[ G_\alpha(x) - G_\alpha(y) = \alpha(x - y) - \int_y^x |g(t)|\, dt, \]
$m_-(g)(x) > \alpha$ if and only if $x \in E_\alpha$. Thus
\[ \lambda(m_-(g) > \alpha) = \lambda(E_\alpha) = \sum_j (b_j - a_j). \]
But
\[ 0 = G_\alpha(b_j) - G_\alpha(a_j) = \alpha(b_j - a_j) - \int_{a_j}^{b_j} |g(t)|\, dt, \]
so that
\[ \alpha\,\lambda(m_-(g) > \alpha) = \sum_j \int_{a_j}^{b_j} |g(t)|\, dt = \int_{(m_-(g) > \alpha)} |g(t)|\, dt. \]
In the same way, if
\[ m_+(g)(x) = \sup_{y > x} \frac{1}{y - x} \int_x^y |g(t)|\, dt, \]
$m_+$ is a sublinear operator of Riesz weak type $(1, 1)$. By Proposition 8.5.1, the operators
\[ m_u(g)(x) = \sup_{y < x < z} \frac{1}{z - y} \int_y^z |g(t)|\, dt = \max(m_-(g)(x), m_+(g)(x)), \]
\[ M(g)(x) = \max(m_u(g)(x), |g(x)|) \]
are also sublinear operators of Riesz weak type $(1, 1)$.

Traditionally, it has been customary to work with the Hardy–Littlewood maximal operator
\[ m(g)(x) = \sup_{r > 0} \frac{1}{2r} \int_{x-r}^{x+r} |g(t)|\, dt \]
(although, in practice, $m_u$ is usually more convenient).

Theorem 8.7.2 The Hardy–Littlewood maximal operator is of Riesz weak type $(1, 1)$, with Riesz weak type constant at most 4.
Proof We keep the same notation as in Theorem 8.7.1, and let $c_j = (a_j + b_j)/2$. Let $F_\alpha = (m(g) > \alpha)$. If $x \in (a_j, c_j)$ then $x \in F_\alpha$ (take $r = x - a_j$), so that
\[ \int_{(m(g) > \alpha)} |g|\, dt \ge \sum_j \left( \int_{a_j}^{c_j} |g|\, dt \right) = \sum_j \big( \alpha(c_j - a_j) - (G_\alpha(c_j) - G_\alpha(a_j)) \big) \ge \alpha \sum_j (c_j - a_j) = \alpha\,\lambda(E_\alpha)/2, \]
since $G_\alpha(c_j) \le G_\alpha(a_j)$ for each $j$. But
\[ (m(g) > \alpha) \subseteq (m_u(g) > \alpha) = (m_-(g) > \alpha) \cup (m_+(g) > \alpha), \]
so that
\[ \lambda(m(g) > \alpha) \le \lambda(m_-(g) > \alpha) + \lambda(m_+(g) > \alpha); \]
by the inequality above, and its mirror image for $m_+$, each of the two terms is at most $(2/\alpha) \int_{(m(g) > \alpha)} |g|\, dt$, and so the result follows.
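A numerical illustration (ours, not the book's): the sketch below discretizes the centred Hardy–Littlewood maximal operator on a grid, treating $g$ as zero off the grid, and checks the weak type inequality with the constant 4 from Theorem 8.7.2 for $g = I_{[0,1]}$ and $\alpha = 0.3$.

```python
# Discrete centred maximal averages over windows of 2r+1 grid points.
def centred_maximal(absg):
    n = len(absg)
    prefix = [0.0]
    for v in absg:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        best = 0.0
        for r in range(n):
            lo, hi = max(0, i - r), min(n, i + r + 1)
            best = max(best, (prefix[hi] - prefix[lo]) / (2 * r + 1))
        out.append(best)
    return out

dx = 0.01
xs = [-4 + i * dx for i in range(901)]           # grid on [-4, 5]
g = [1.0 if 0 <= x <= 1 else 0.0 for x in xs]
m = centred_maximal(g)
alpha = 0.3
over = [i for i in range(len(g)) if m[i] > alpha]
lhs = alpha * len(over) * dx                      # alpha * lambda(m(g) > alpha)
rhs = 4 * sum(g[i] for i in over) * dx            # 4 * integral of |g| over the set
assert lhs <= rhs
```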
8.8 Differentiation almost everywhere
We are interested in the values that a function takes near a point. We introduce yet another space of functions. We say that a measurable function $f$ on $\mathbf{R}^d$ is locally integrable if $\int_B |f|\, d\lambda < \infty$, for each bounded subset $B$ of $\mathbf{R}^d$. We write $L^1_{loc} = L^1_{loc}(\mathbf{R}^d)$ for the space of locally integrable functions on $\mathbf{R}^d$. Note that if $1 < p < \infty$ then $L^p \subseteq L^1 + L^\infty \subseteq L^1_{loc}$.
Here is a consequence of the F. Riesz maximal theorem.
Theorem 8.8.1 Suppose that $f \in L^1_{loc}(\mathbf{R})$. Let $F(x) = \int_0^x f(t)\, dt$. Then $F$ is differentiable almost everywhere, and the derivative is equal to $f$ almost everywhere. If $f \in L^p$, where $1 < p < \infty$, then
\[ \frac{1}{h} \int_x^{x+h} f(t)\, dt \to f(x) \text{ in } L^p \text{ norm, as } h \to 0. \]

Proof It is sufficient to prove the differentiability result for $f \in L^1$. For if $f \in L^1_{loc}$ then $f I_{(-R,R)} \in L^1$, for each $R > 0$, and if the result holds for each $f I_{(-R,R)}$, then it holds for $f$. We apply Theorem 8.4.2, using $M(f) = \max(m_u(f), |f|)$, and setting
\[ T_h(f)(x) = \frac{1}{h} \int_x^{x+h} f(t)\, dt \text{ for } h \ne 0, \qquad T_0(f)(x) = f(x). \]
Then $|T_h(f)| \le M(f)$, for all $h$. If $g$ is a continuous function of compact support, then $T_h(g)(x) \to g(x)$, uniformly in $x$, as $h \to 0$, and the continuous functions of compact support are dense in $L^1(\mathbf{R})$. Thus $T_h(f) \to f$ almost everywhere as $h \to 0$: but this says that $F$ is differentiable, with derivative $f$, almost everywhere.

If $f \in L^p$, then, applying Corollary 5.4.2,
\[ \|T_h(f)\|_p \le \left( \int \left( \frac{1}{|h|} \int_0^h |f(x+t)|\, dt \right)^p dx \right)^{1/p} \le \frac{1}{|h|} \int_0^h \left( \int |f(x+t)|^p\, dx \right)^{1/p} dt = \|f\|_p. \]
If $g$ is a continuous function of compact support $K$ then $T_h(g) \to g$ uniformly, and $T_h(g) - g$ vanishes outside $K_h = \{x: d(x, K) \le |h|\}$, and so $T_h(g) \to g$ in $L^p$ norm as $h \to 0$. The continuous functions of compact support are dense in $L^p(\mathbf{R})$; convergence in $L^p$ norm therefore follows from Theorem 8.4.1.
8.9 Maximal operators in higher dimensions
Although there are further conclusions that we can draw, the results of the previous section are one-dimensional, and it is natural to ask what happens in higher dimensions. Here we shall obtain similar results. Although the sunrise lemma does not seem to extend to higher dimensions, we can replace it by another beautiful lemma. In higher dimensions, the geometry concerns balls or cubes (which reduce in the one-dimensional case to intervals).

Let us describe the notation that we shall use: $B_r(x)$ is the closed Euclidean ball $\{y: |y-x| \le r\}$ and $U_r(x)$ is the open Euclidean ball $\{y: |y-x| < r\}$. $\Omega_d$ is the Lebesgue measure of a unit ball in $\mathbf{R}^d$. $S_r(x)$ is the sphere $\{y: |y-x| = r\}$. $Q_r(x) = \{y: |x_i - y_i| < r \text{ for } 1 \le i \le d\}$ is the cube of side $2r$ centred at $x$.
We introduce several maximal operators: suppose that $f \in L^1_{loc}(\mathbf{R}^d)$. We set
\[ A_r(f)(x) = \frac{\int_{U_r(x)} f\, d\lambda}{\lambda(U_r(x))} = \frac{1}{r^d \Omega_d} \int_{U_r(x)} f\, d\lambda. \]
$A_r(f)(x)$ is the average value of $f$ over the ball $U_r(x)$.
\[ m(f)(x) = \sup_{r>0} A_r(|f|)(x) = \sup_{r>0} \frac{1}{r^d \Omega_d} \int_{U_r(x)} |f|\, d\lambda, \]
\[ m_u(f)(x) = \sup_{r>0}\ \sup_{x \in U_r(y)} \frac{1}{r^d \Omega_d} \int_{U_r(y)} |f|\, d\lambda, \]
\[ m_Q(f)(x) = \sup_{r>0} \frac{1}{(2r)^d} \int_{Q_r(x)} |f|\, d\lambda, \]
and
\[ m^Q_u(f)(x) = \sup_{r>0}\ \sup_{x \in Q_r(y)} \frac{1}{(2r)^d} \int_{Q_r(y)} |f|\, d\lambda. \]
As before, $m$ is the Hardy–Littlewood maximal function.
The maximal operators are all equivalent, in the sense that if $m'$ and $m''$ are any two of them then there exist positive constants $c$ and $C$ such that
\[ c\, m'(f)(x) \le m''(f)(x) \le C\, m'(f)(x) \]
for all $f$ and $x$.
Proposition 8.9.1 Each of these maximal operators is sublinear. If $m'$ is any one of them, then $m'(f)$ is a lower semi-continuous function from $\mathbf{R}^d$ to $[0, \infty]$: $E_\alpha = \{x: m'(f)(x) > \alpha\}$ is open in $\mathbf{R}^d$ for each $\alpha \ge 0$.

Proof It follows from the definition that each of the maximal operators is sublinear. We prove the lower semi-continuity for $m$: the proof for $m_Q$ is essentially the same, and the proofs for the other maximal operators are easier. If $x \in E_\alpha$, there exists $r > 0$ such that $A_r(|f|)(x) > \alpha$. If $\delta > 0$ and $|x - y| < \delta$ then $U_{r+\delta}(y) \supseteq U_r(x)$, and $\int_{U_{r+\delta}(y)} |f|\, d\lambda \ge \int_{U_r(x)} |f|\, d\lambda$, so that
\[ m(f)(y) \ge A_{r+\delta}(|f|)(y) \ge \left( \frac{r}{r+\delta} \right)^d A_r(|f|)(x) > \alpha \]
for small enough $\delta > 0$.
We now come to the $d$-dimensional version of Riesz's maximal theorem.

Theorem 8.9.1 The maximal operators $m_u$ and $m^Q_u$ are of Riesz weak type $(1, 1)$, each with constant at most $3^d$.
Proof We prove the result for $m_u$: the proof for $m^Q_u$ is exactly similar. The key result is the following covering lemma.

Lemma 8.9.1 Suppose that $G$ is a finite set of open balls in $\mathbf{R}^d$, and that $\lambda$ is Lebesgue measure. Then there is a finite subcollection $F$ of disjoint balls such that
\[ \sum_{U \in F} \lambda(U) = \lambda\left( \bigcup_{U \in F} U \right) \ge \frac{1}{3^d}\, \lambda\left( \bigcup_{U \in G} U \right). \]
Proof We use a greedy algorithm. If $U = U_r(x)$ is a ball, let $U^* = U_{3r}(x)$ be the ball with the same centre as $U$, but with three times the radius. Let $U_1$ be a ball of maximal radius in $G$. Let $U_2$ be a ball of maximal radius in $G$, disjoint from $U_1$. Continue, choosing $U_j$ of maximal radius, disjoint from $U_1, \ldots, U_{j-1}$, until the process stops, with the choice of $U_k$.

Let $F = \{U_1, \ldots, U_k\}$. Suppose that $U \in G$. There is a least $j$ such that $U \cap U_j \ne \emptyset$. Then the radius of $U$ is no greater than the radius of $U_j$ (otherwise we would have chosen $U$ rather than $U_j$) and so $U \subseteq U^*_j$. Thus $\bigcup_{U \in G} U \subseteq \bigcup_{U \in F} U^*$ and
\[ \lambda\left( \bigcup_{U \in G} U \right) \le \lambda\left( \bigcup_{U \in F} U^* \right) \le \sum_{U \in F} \lambda(U^*) = 3^d \sum_{U \in F} \lambda(U). \]
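The greedy selection is easy to implement. The sketch below (our own illustration, in the plane, with invented names) picks balls in decreasing order of radius, keeping each one that is disjoint from those already kept, and checks that every original ball lies inside some kept ball tripled.

```python
# Greedy disjoint subfamily, as in the covering lemma.
import math, random

def greedy_disjoint(balls):
    """Keep balls in decreasing radius, each disjoint from those kept so far."""
    kept = []
    for (c, r) in sorted(balls, key=lambda b: -b[1]):
        if all(math.dist(c, c2) >= r + r2 for (c2, r2) in kept):
            kept.append((c, r))
    return kept

random.seed(1)
balls = [((random.uniform(0, 10), random.uniform(0, 10)),
          random.uniform(0.2, 1.5)) for _ in range(60)]
kept = greedy_disjoint(balls)
# Every original ball meets a kept ball of radius at least its own,
# hence lies inside that kept ball tripled.
for (c, r) in balls:
    assert any(math.dist(c, c2) + r <= 3 * r2 for (c2, r2) in kept)
```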
Proof of Theorem 8.9.1 Let $f \in L^1(\mathbf{R}^d)$ and let $E_\alpha = \{x: m_u(f)(x) > \alpha\}$. Let $K$ be a compact subset of $E_\alpha$. For each $x \in K$, there exist $y_x \in \mathbf{R}^d$ and $r_x > 0$ such that $x \in U_{r_x}(y_x)$ and $A_{r_x}(|f|)(y_x) > \alpha$. (Note that it follows from the definition of $m_u$ that $U_{r_x}(y_x) \subseteq E_\alpha$; this is why $m_u$ is easier to work with than $m$.) The sets $U_{r_x}(y_x)$ cover $K$, and so there is a finite subcover $G$. By the lemma, there is a subcollection $F$ of disjoint balls such that
\[ \sum_{U \in F} \lambda(U) \ge \frac{1}{3^d}\, \lambda\left( \bigcup_{U \in G} U \right) \ge \frac{\lambda(K)}{3^d}. \]
But if $U \in F$, $\alpha\,\lambda(U) \le \int_U |f|\, d\lambda$, so that, since $\bigcup_{U \in F} U \subseteq E_\alpha$,
\[ \sum_{U \in F} \lambda(U) \le \frac{1}{\alpha} \sum_{U \in F} \int_U |f|\, d\lambda \le \frac{1}{\alpha} \int_{E_\alpha} |f|\, d\lambda. \]
Thus $\lambda(K) \le 3^d \big( \int_{E_\alpha} |f|\, d\lambda \big)/\alpha$, and
\[ \lambda(E_\alpha) = \sup\{\lambda(K): K \text{ compact},\ K \subseteq E_\alpha\} \le \frac{3^d}{\alpha} \int_{E_\alpha} |f|\, d\lambda. \]
Corollary 8.9.1 Each of the maximal operators defined above is of weak type $(1, 1)$ and of strong type $(p, p)$, for $1 < p \le \infty$.

I do not know if the Hardy–Littlewood maximal operator $m$ is of Riesz weak type $(1, 1)$. This is interesting, but not really important; the important thing is that $m \le m_u$, and $m_u$ is of Riesz weak type $(1, 1)$.
8.10 The Lebesgue density theorem
We now have the equivalent of Theorem 8.8.1, with essentially the same proof.

Theorem 8.10.1 Suppose that $f \in L^1_{loc}(\mathbf{R}^d)$. Then $A_r(f) \to f$ almost everywhere, as $r \to 0$, and $|f| \le m(f)$ almost everywhere. If $f \in L^p$, where $1 < p < \infty$, then $A_r(f) \to f$ in $L^p$ norm.

Corollary 8.10.1 (The Lebesgue density theorem) If $E$ is a measurable subset of $\mathbf{R}^d$ then
\[ \frac{1}{r^d \Omega_d}\, \lambda(U_r(x) \cap E) = \frac{\lambda(U_r \cap E)}{\lambda(U_r)} \to 1 \text{ as } r \to 0, \text{ for almost all } x \in E, \]
and
\[ \frac{1}{r^d \Omega_d}\, \lambda(U_r(x) \cap E) = \frac{\lambda(U_r \cap E)}{\lambda(U_r)} \to 0 \text{ as } r \to 0, \text{ for almost all } x \notin E. \]

Proof Apply the theorem to the indicator function $I_E$.
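For $E = [0, 1]$ on the line, the density quotient can be written in closed form, which gives a small numerical illustration (ours, not the book's) of the corollary: the density tends to 1 inside $E$, to 0 outside, and boundary points (a null set) can misbehave.

```python
# Density of E = [0,1] over the interval (x - r, x + r).
def density(x, r):
    lo, hi = max(x - r, 0.0), min(x + r, 1.0)
    return max(hi - lo, 0.0) / (2 * r)

assert abs(density(0.5, 0.001) - 1.0) < 1e-9   # interior point: density -> 1
assert density(2.0, 0.001) == 0.0              # exterior point: density -> 0
assert abs(density(0.0, 0.001) - 0.5) < 1e-9   # boundary point: density 1/2
```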
8.11 Convolution kernels
We can think of Theorem 8.10.1 as a theorem about convolutions. Let $J_r(x) = I_{U_r(0)} / \lambda(U_r(0))$. Then
\[ A_r(f)(x) = \int_{\mathbf{R}^d} J_r(x - y) f(y)\, dy = \int_{\mathbf{R}^d} f(x - y) J_r(y)\, dy = (J_r \star f)(x). \]
Then $J_r \star f \to f$ almost everywhere as $r \to 0$, and if $f \in L^p$ then $J_r \star f \to f$ in $L^p$ norm.
We can use the Hardy–Littlewood maximal operator to study other convolution kernels. We begin by describing two important examples. The Poisson kernel $P$ is defined on the upper half space $H^{d+1} = \{(x, t): x \in \mathbf{R}^d,\ t > 0\}$ as
\[ P(x, t) = P_t(x) = \frac{c_d\, t}{(|x|^2 + t^2)^{(d+1)/2}}. \]
$P_t \in L^1(\mathbf{R}^d)$, and the constant $c_d$ is chosen so that $\|P_1\|_1 = 1$. A change of variables then shows that $\|P_t\|_1 = \|P_1\|_1 = 1$ for all $t > 0$.

The Poisson kernel is harmonic on $H^{d+1}$, that is,
\[ \frac{\partial^2 P}{\partial t^2} + \sum_{j=1}^d \frac{\partial^2 P}{\partial x_j^2} = 0, \]
and is used to solve the Dirichlet problem in $H^{d+1}$: if $f$ is a bounded continuous function on $\mathbf{R}^d$ and we set
\[ u(x, t) = u_t(x) = P_t(f)(x) = (P_t \star f)(x) = \int_{\mathbf{R}^d} P_t(x - y) f(y)\, dy = \int_{\mathbf{R}^d} f(x - y) P_t(y)\, dy, \]
then $u$ is a harmonic function on $H^{d+1}$ and $u(x, t) \to f(x)$ uniformly on the bounded sets of $\mathbf{R}^d$ as $t \to 0$. We want to obtain convergence results for a larger class of functions $f$.
Second, let
\[ H(x, t) = H_t(x) = \frac{1}{(2\pi t)^{d/2}}\, e^{-|x|^2/2t} \]
be the Gaussian kernel. Then $H$ satisfies the heat equation
\[ \frac{\partial H}{\partial t} = \frac{1}{2} \sum_{j=1}^d \frac{\partial^2 H}{\partial x_j^2} \]
on $H^{d+1}$. If $f$ is a bounded continuous function on $\mathbf{R}^d$ and we set
\[ v(x, t) = v_t(x) = H_t(f)(x) = (H_t \star f)(x) = \int_{\mathbf{R}^d} H_t(x - y) f(y)\, dy = \int_{\mathbf{R}^d} f(x - y) H_t(y)\, dy, \]
then $v$ satisfies the heat equation on $H^{d+1}$, and $v(x, t) \to f(x)$ uniformly on the bounded sets of $\mathbf{R}^d$ as $t \to 0$. Again, we want to obtain convergence results for a larger class of functions $f$.
The Poisson kernel and the Gaussian kernel are examples of bell-shaped approximate identities. A function $\phi = \phi_t(x)$ on $(0, \infty) \times \mathbf{R}^d$ is a bell-shaped approximate identity if
(i) $\phi_t(x) = t^{-d} \phi_1(x/t)$;
(ii) $\phi_1 \ge 0$, and $\int_{\mathbf{R}^d} \phi_1(x)\, dx = 1$;
(iii) $\phi_1(x) = \eta(|x|)$, where $\eta(r)$ is a strictly decreasing continuous function on $(0, \infty)$, taking values in $[0, \infty]$.

[In fact, the results that we present hold when $\eta$ is a decreasing function (as for example when we take $\eta = I_{[0,1]}/\lambda(U_1(0))$), but the extra requirements make the analysis easier, without any essential loss.]

If $\phi$ is a bell-shaped approximate identity, and if $f \in L^1 + L^\infty$, we set
\[ \Phi_t(f)(x) = (\phi_t \star f)(x) = \int_{\mathbf{R}^d} \phi_t(x - y) f(y)\, d\lambda(y). \]
Theorem 8.11.1 Suppose that $\phi$ is a bell-shaped approximate identity and that $f \in (L^1 + L^\infty)(\mathbf{R}^d)$. Then
(i) the mapping $(x, t) \to \Phi_t(f)(x)$ is continuous on $H^{d+1}$;
(ii) if $f \in C_b(\mathbf{R}^d)$ then $\Phi_t(f) \to f$ uniformly on the compact sets of $\mathbf{R}^d$;
(iii) if $f \in L^p(\mathbf{R}^d)$, where $1 \le p < \infty$, then $\|\Phi_t(f)\|_p \le \|f\|_p$ and $\Phi_t(f) \to f$ in $L^p$-norm.

Proof This is a straightforward piece of analysis (using Theorem 8.4.1 and Proposition 7.5.1) which we leave to the reader.

The convergence in (iii) is convergence in mean. What can we say about convergence almost everywhere? The next theorem enables us to answer this question.
Theorem 8.11.2 Suppose that $\phi$ is a bell-shaped approximate identity, and that $f \in (L^1 + L^\infty)(\mathbf{R}^d)$. Then $|\Phi_t(f)(x)| \le m(f)(x)$.

Proof Let $\phi_1(x) = \eta(|x|)$, and let us denote the inverse function to $\eta$, from $(0, \eta(0)]$ to $[0, \infty)$, by $\psi$. Then, using Fubini's theorem,
\[ \Phi_t(f)(x) = \frac{1}{t^d} \int_{\mathbf{R}^d} \phi_1\!\left( \frac{x-y}{t} \right) f(y)\, dy = \frac{1}{t^d} \int_{\mathbf{R}^d} \left( \int_0^{\phi_1((x-y)/t)} du \right) f(y)\, dy \]
\[ = \frac{1}{t^d} \int_0^{\eta(0)} \left( \int_{(\phi_1((x-y)/t) > u)} f(y)\, dy \right) du = \frac{1}{t^d} \int_0^{\eta(0)} \left( \int_{(|x-y|/t < \psi(u))} f(y)\, dy \right) du \]
\[ = \frac{1}{t^d} \int_0^{\eta(0)} \left( \int_{U_{t\psi(u)}(x)} f(y)\, dy \right) du = \frac{1}{t^d} \int_0^{\eta(0)} \Omega_d\, t^d\, \psi(u)^d\, A_{t\psi(u)}(f)(x)\, du, \]
so that
\[ |\Phi_t(f)(x)| \le m(f)(x) \int_0^{\eta(0)} \Omega_d\, \psi(u)^d\, du = m(f)(x) \int_0^{\eta(0)} \lambda(\{w: \phi_1(w) > u\})\, du = m(f)(x) \int_{\mathbf{R}^d} \phi_1(w)\, dw = m(f)(x). \]
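The domination $|\Phi_t(f)| \le m(f)$ can be observed numerically. The sketch below (our own illustration, not from the text) convolves an indicator function with the heat kernel $H_t$ on a one-dimensional grid and compares the result pointwise with a discretized maximal function.

```python
# Check |Phi_t(f)(x)| <= m(f)(x) on a grid, for the Gaussian kernel.
import math

def gaussian_smooth(f_vals, xs, t):
    """(H_t * f)(x) by Riemann sum, H_t the heat kernel."""
    dx = xs[1] - xs[0]
    out = []
    for x in xs:
        s = sum(math.exp(-(x - y) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t) * fy
                for y, fy in zip(xs, f_vals))
        out.append(s * dx)
    return out

def maximal(f_vals):
    """Discrete centred maximal function, f treated as 0 off the grid."""
    n = len(f_vals)
    prefix = [0.0]
    for v in f_vals:
        prefix.append(prefix[-1] + abs(v))
    out = []
    for i in range(n):
        best = 0.0
        for r in range(n):
            lo, hi = max(0, i - r), min(n, i + r + 1)
            best = max(best, (prefix[hi] - prefix[lo]) / (2 * r + 1))
        out.append(best)
    return out

xs = [-5 + 0.05 * i for i in range(201)]
f = [1.0 if abs(x) <= 1 else 0.0 for x in xs]
smooth = gaussian_smooth(f, xs, t=0.2)
m = maximal(f)
for s, mv in zip(smooth, m):
    assert s <= mv + 1e-6      # the smoothed value never exceeds m(f)
```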
Corollary 8.11.1 Let $\Phi^*(f)(x) = \sup_{t>0} \Phi_t(|f|)(x)$. Then $\Phi^*$ is of weak type $(1, 1)$ and strong type $(p, p)$, for $1 < p \le \infty$.

Corollary 8.11.2 Suppose that $f \in L^1(\mathbf{R}^d)$. Then $\Phi_t(f)(x) \to f(x)$ as $t \to 0$, for almost all $x$.

Proof We apply Theorem 8.4.2, with $M(f) = \Phi^*(f)$. The result holds for continuous functions of compact support; these functions are dense in $L^1(\mathbf{R}^d)$.
Theorem 8.11.3 Suppose that $f \in L^\infty(\mathbf{R}^d)$. Then $\Phi_t(f) \to f$ almost everywhere.

Proof Let us consider what happens in $\{|x| < R\}$. Let $g = f I_{(|x| \le 2R)}$, $h = f - g$. Then $g \in L^1(\mathbf{R}^d)$, so $\Phi_t(g) \to g$ almost everywhere. If $|x'| < R$,
\[ |\Phi_t(h)(x')| = \left| \int \phi_t(y - x')\, h(y)\, dy \right| \le \|h\|_\infty \int_{(|z| \ge R)} \phi_t(z)\, dz \to 0 \text{ as } t \to 0. \]

Corollary 8.11.3 If $f \in L^p(\mathbf{R}^d)$ for $1 \le p \le \infty$, then $\Phi_t(f) \to f$ almost everywhere.

Proof $L^p \subseteq L^1 + L^\infty$.
8.12 Hedberg's inequality
Our next application concerns potential theory. Suppose to begin with that $f$ is a smooth function of compact support on $\mathbf{R}^3$: that is to say, $f$ is infinitely differentiable, and vanishes outside a bounded closed region $S$. We can think of $f$ as the distribution of matter, or of electric charge. The Newtonian potential $I_2(f)$ is defined as
\[ I_2(f)(x) = \frac{1}{4\pi} \int_{\mathbf{R}^3} \frac{f(y)}{|x - y|}\, dy = \frac{1}{4\pi} \int_{\mathbf{R}^3} \frac{f(x - u)}{|u|}\, du. \]
This is well-defined, since $1/|x| \in L^1 + L^\infty$.
Since $I_2$ is a convolution operator, we can expect it to have some continuity properties, and these we now investigate. In fact, we shall do this in a more general setting, which arises naturally from these ideas. We work in $\mathbf{R}^d$, where $d \ge 2$. Suppose that $0 < \alpha < d$. Then $1/|x|^{d-\alpha} \in L^1 + L^\infty$. Thus if $f \in L^1 \cap L^\infty$, we can consider the integrals
\[ I_{d,\alpha}(f)(x) = \frac{1}{\gamma_{d,\alpha}} \int_{\mathbf{R}^d} \frac{f(y)}{|x - y|^{d-\alpha}}\, dy = \frac{1}{\gamma_{d,\alpha}} \int_{\mathbf{R}^d} \frac{f(x - u)}{|u|^{d-\alpha}}\, du, \]
where $\gamma = \gamma_{d,\alpha}$ is an appropriate constant. The operator $I_{d,\alpha}$ is called the Riesz potential operator, or fractional integral operator, of order $\alpha$.

The function $|x|^{\alpha-d}/\gamma_{d,\alpha}$ is locally integrable, but it is not integrable, and so it is not a scalar multiple of a bell-shaped approximate identity. But as Hedberg [Hed 72] observed, we can split it into two parts, to obtain continuity properties of $I_{d,\alpha}$.
Theorem 8.12.1 (Hedberg's inequality) Suppose that $0 < \alpha < d$ and that $1 \le p < d/\alpha$. If $f \in (L^1 \cap L^\infty)(\mathbf{R}^d)$ and $x \in \mathbf{R}^d$ then
\[ |I_{d,\alpha}(f)(x)| \le C_{d,\alpha,p}\, \|f\|_p^{\alpha p/d}\, (m(f)(x))^{1 - \alpha p/d}, \]
where $m(f)$ is the Hardy–Littlewood maximal function, and $C_{d,\alpha,p}$ is a constant depending only on $d$, $\alpha$ and $p$.
Proof In what follows, $A, B, \ldots$ are constants depending only on $d$, $\alpha$ and $p$. Suppose that $R > 0$. Let
\[ \phi_R(x) = \frac{A}{R^d} \left( \frac{|x|}{R} \right)^{\alpha-d} I_{(|x|<R)} = \frac{A\, I_{(|x|<R)}}{R^\alpha |x|^{d-\alpha}}, \qquad \psi_R(x) = \frac{A}{R^d} \left( \frac{|x|}{R} \right)^{\alpha-d} I_{(|x|\ge R)} = \frac{A\, I_{(|x|\ge R)}}{R^\alpha |x|^{d-\alpha}}, \]
where $A$ is chosen so that $\phi_R$ is a bell-shaped approximate identity (the lack of continuity at $|x| = R$ is unimportant). Then $\|\psi_R\|_\infty \le A/R^d$, and if $1 < p < d/\alpha$ then
\[ \|\psi_R\|_{p'} = \frac{B}{R^\alpha} \left( \int_R^\infty r^{d-1}\, r^{(\alpha-d)p'}\, dr \right)^{1/p'} = D R^{-d/p}. \]
Thus, using Theorem 8.11.2, and Hölder's inequality,
\[ |I_{d,\alpha}(f)(x)| \le \frac{R^\alpha}{A\gamma_{d,\alpha}} \left( \left| \int_{\mathbf{R}^d} f(y)\, \phi_R(x-y)\, dy \right| + \left| \int_{\mathbf{R}^d} f(y)\, \psi_R(x-y)\, dy \right| \right) \le A' \left( R^\alpha\, m(f)(x) + D' \|f\|_p\, R^{\alpha - d/p} \right). \]
We now choose $R = R(x)$ so that the two terms are equal: thus $R^{d/p}\, m(f)(x) = E\, \|f\|_p$, and so
\[ |I_{d,\alpha}(f)(x)| \le C\, \|f\|_p^{\alpha p/d}\, (m(f)(x))^{1 - \alpha p/d}. \]
Applying Corollary 8.9.1, we obtain the following.

Corollary 8.12.1 Suppose that $0 < \alpha < d$.
(i) $I_{d,\alpha}$ is of weak type $(1, d/(d-\alpha))$.
(ii) If $1 < p < d/\alpha$ and $q = pd/(d - \alpha p)$ then $\|I_{d,\alpha}(f)\|_q \le C'_{d,\alpha,p} \|f\|_p$.

Proof (i) Suppose that $\|f\|_1 = 1$ and that $\beta > 0$. Then
\[ \lambda(|I_{d,\alpha}(f)| > \beta) \le \lambda\big( m(f) > (\beta/C)^{d/(d-\alpha)} \big) \le F/\beta^{d/(d-\alpha)}. \]
(ii) Since $q(1 - \alpha p/d) = p$,
\[ \|I_{d,\alpha}(f)\|_q \le C_{d,\alpha,p}\, \|f\|_p^{\alpha p/d} \left\| (m(f))^{1-\alpha p/d} \right\|_q = C_{d,\alpha,p}\, \|f\|_p^{\alpha p/d}\, \|m(f)\|_p^{1-\alpha p/d} \le C'_{d,\alpha,p}\, \|f\|_p^{\alpha p/d}\, \|f\|_p^{1-\alpha p/d} = C'_{d,\alpha,p}\, \|f\|_p. \]
Thus in $\mathbf{R}^3$, $\|I_2(f)\|_{3p/(3-2p)} \le C'_{3,2,p} \|f\|_p$, for $1 < p < 3/2$.
Simple scaling arguments show that $q = pd/(d - \alpha p)$ is the only index for which the inequality in (ii) holds (Exercise 8.9).
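The exponent arithmetic behind the scaling argument can be checked exactly with rational arithmetic. In the sketch below (ours, not the book's), scale invariance forces $\alpha + d/q = d/p$, that is, $1/q = 1/p - \alpha/d$, and we verify this for the Newtonian case $d = 3$, $\alpha = 2$.

```python
# Exact check of the Hedberg/Sobolev exponent relation 1/q = 1/p - alpha/d.
from fractions import Fraction as F

def hedberg_q(d, alpha, p):
    """The exponent forced by the scaling argument."""
    return 1 / (F(1, 1) / p - F(alpha, d))

d, alpha, p = 3, 2, F(5, 4)            # Newtonian potential I_2 on R^3
q = hedberg_q(d, alpha, p)
assert q == F(3, 1) * p / (3 - 2 * p)  # q = 3p/(3 - 2p), as in the text
# scale invariance: ||I(delta_t f)||_q / ||delta_t f||_p independent of t
assert F(alpha, 1) + F(d, 1) / q == F(d, 1) / p
```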
8.13 Martingales
Our final example in this chapter comes from the theory of martingales. This theory was developed as an important part of probability theory, but it is quite as important in analysis. We shall therefore consider martingales defined on a $\sigma$-finite measure space $(\Omega, \Sigma, \mu)$.

First we describe the setting in which we work. We suppose that there is an increasing sequence $(\Sigma_j)_{j=0}^\infty$ or $(\Sigma_j)_{j=-\infty}^\infty$ of sub-$\sigma$-fields of $\Sigma$, such that $\Sigma$ is the smallest $\sigma$-field containing $\bigcup_j \Sigma_j$. We shall also suppose that each of the $\sigma$-fields is $\sigma$-finite. We can think of this as a system evolving in discrete time. The sets of $\Sigma_j$ are the events that we can describe at time $j$. By time $j+1$, we have learnt more, and so we have a larger $\sigma$-field $\Sigma_{j+1}$.
As an example, let
\[ Z^d_j = \{a = (a_1, \ldots, a_d): a_i = n_i/2^j,\ n_i \in \mathbf{Z} \text{ for } 1 \le i \le d\}, \]
for $-\infty < j < \infty$. $Z^d_j$ is a lattice of points in $\mathbf{R}^d$, with mesh size $2^{-j}$. If $a \in Z^d_j$,
\[ Q_j(a) = \{x \in \mathbf{R}^d: a_i - 1/2^j < x_i \le a_i, \text{ for } 1 \le i \le d\} \]
is the dyadic cube of side $2^{-j}$ with $a$ in the top right-hand corner. $\Sigma_j$ is the collection of sets which are unions of dyadic cubes of side $2^{-j}$; it is a discrete $\sigma$-field whose atoms are the dyadic cubes of side $2^{-j}$. We can think of the atoms of $\Sigma_j$ as pixels; at time $j+1$, a pixel in $\Sigma_j$ splits into $2^d$ smaller pixels, and so we have a finer resolution. $(\Sigma_j)$ is an increasing sequence of $\sigma$-fields, and the Borel $\sigma$-field is the smallest $\sigma$-field containing $\bigcup_j \Sigma_j$. This is the dyadic filtration of $\mathbf{R}^d$.
In general, to avoid unnecessary complication, we shall suppose that each $\Sigma_j$ is either atom-free, or (as with the dyadic filtration) purely atomic, with each atom of equal measure.

A sequence $(f_j)$ of functions on $\Omega$ such that each $f_j$ is $\Sigma_j$-measurable is called an adapted sequence, or adapted process. (Thus, in the case of the dyadic filtration, $f_j$ is constant on the dyadic cubes of side $2^{-j}$.) If $(f_j)$ is an adapted sequence of real-valued functions, and if $f_j \in L^1 + L^\infty$, we say that $(f_j)$ is
a local sub-martingale if $\int_A f_j\, d\mu \le \int_A f_{j+1}\, d\mu$,
a local super-martingale if $\int_A f_j\, d\mu \ge \int_A f_{j+1}\, d\mu$,
and a local martingale if $\int_A f_j\, d\mu = \int_A f_{j+1}\, d\mu$,
whenever $A$ is a set of finite measure in $\Sigma_j$. If in addition each $f_j \in L^1$, we say that $(f_j)$ is a sub-martingale, super-martingale or martingale, as the case may be. The definition of local martingale extends to complex-valued functions, and indeed to vector-valued functions, once a suitable theory of vector-valued integration is established.
These ideas are closely related to the idea of a conditional expectation
operator, which we now develop.
Theorem 8.13.1 Suppose that $f \in (L^1 + L^\infty)(\Omega, \Sigma, \mu)$, and that $\Sigma_0$ is a $\sigma$-finite sub-$\sigma$-field of $\Sigma$. Then there exists a unique $f_0$ in $(L^1 + L^\infty)(\Omega, \Sigma_0, \mu)$ such that $\int_A f\, d\mu = \int_A f_0\, d\mu$ for each $A \in \Sigma_0$ with $\mu(A) < \infty$. Further, if $f \ge 0$ then $f_0 \ge 0$, if $f \in L^1$ then $\|f_0\|_1 \le \|f\|_1$, and if $f \in L^\infty$ then $\|f_0\|_\infty \le \|f\|_\infty$.
Proof We begin with the existence of $f_0$. Since $\Sigma_0$ is $\sigma$-finite, by restricting attention to sets of finite measure in $\Sigma_0$, it is enough to consider the case where $\mu(\Omega) < \infty$ and $f \in L^1$. By considering $f^+$ and $f^-$, we may also suppose that $f \ge 0$. If $B \in \Sigma_0$, let $\nu(B) = \int_B f\, d\mu$. Then $\nu$ is a measure on $\Sigma_0$, and if $\mu(B) = 0$ then $\nu(B) = 0$. Thus it follows from the Lebesgue decomposition theorem that there exists $f_0 \in L^1(\Omega, \Sigma_0, \mu)$ such that $\int_B f\, d\mu = \nu(B) = \int_B f_0\, d\mu$ for all $B \in \Sigma_0$. If $f_1$ is another function with this property then
\[ \int_{(f_1 > f_0)} (f_1 - f_0)\, d\mu = \int_{(f_1 < f_0)} (f_1 - f_0)\, d\mu = 0, \]
so that $f_1 = f_0$ almost everywhere.

We now return to the general situation. It follows from the construction that if $f \ge 0$ then $f_0 \ge 0$. If $f \in L^1$, then $f_0 = (f^+)_0 - (f^-)_0$, so that
\[ \int |f_0|\, d\mu \le \int (f^+)_0\, d\mu + \int (f^-)_0\, d\mu = \int f^+\, d\mu + \int f^-\, d\mu = \int |f|\, d\mu. \]
If $f \in L^\infty$ and $B$ is a $\Sigma_0$-set of finite measure in $(f_0 > \|f\|_\infty)$, then
\[ \int_B (f_0 - \|f\|_\infty)\, d\mu = \int_B (f - \|f\|_\infty)\, d\mu \le 0, \]
from which it follows that $f_0 \le \|f\|_\infty$ almost everywhere. Similarly, it follows that $f_0 \ge -\|f\|_\infty$ almost everywhere, and so $\|f_0\|_\infty \le \|f\|_\infty$. Thus if $f \in (L^1 + L^\infty)(\Omega, \Sigma, \mu)$ then $f_0 \in (L^1 + L^\infty)(\Omega, \Sigma_0, \mu)$.
The function $f_0$ is denoted by $E(f|\Sigma_0)$, and called the conditional expectation of $f$ with respect to $\Sigma_0$. The conditional expectation operator $f \to E(f|\Sigma_0)$ is clearly linear. As an example, if $\Sigma_0$ is purely atomic, and $A$ is an atom in $\Sigma_0$, then $E(f|\Sigma_0)$ takes the constant value $(\int_A f\, d\mu)/\mu(A)$ on $A$. The following corollary now follows immediately from Calderón's interpolation theorem.

Corollary 8.13.1 Suppose that $(X, \|.\|_X)$ is a rearrangement invariant Banach function space. If $f \in X$, then $\|E(f|\Sigma_0)\|_X \le \|f\|_X$.
In these terms, an adapted process $(f_j)$ in $L^1 + L^\infty$ is a sub-martingale if $f_j \le E(f_{j+1}|\Sigma_j)$, for each $j$, and super-martingales and martingales are characterized in a similar way.

Proposition 8.13.1 (i) If $(f_j)$ is a local martingale, then $(|f_j|)$ is a local sub-martingale.
(ii) If $(X, \|.\|_X)$ is a rearrangement invariant function space on $(\Omega, \Sigma, \mu)$ and $(f_j)$ is a non-negative local sub-martingale then $(\|f_j\|_X)$ is an increasing sequence.

Proof (i) If $A, B \in \Sigma_j$ are of finite measure, then
\[ \int_B E(f_{j+1}|\Sigma_j) I_A\, d\mu = \int_{A \cap B} f_{j+1}\, d\mu = \int_B f_{j+1} I_A\, d\mu = \int_B E(f_{j+1} I_A|\Sigma_j)\, d\mu, \]
so that
\[ E(f_{j+1} I_A|\Sigma_j) = E(f_{j+1}|\Sigma_j) I_A = f_j I_A. \]
Thus
\[ \int_A |f_j|\, d\mu = \int |E(f_{j+1} I_A|\Sigma_j)|\, d\mu \le \int |f_{j+1} I_A|\, d\mu = \int_A |f_{j+1}|\, d\mu. \]
(ii) This follows from Corollary 8.13.1.
8.14 Doob's inequality
If $f \in (L^1 + L^\infty)(\Omega)$ then the sequence $(E(f|\Sigma_j))$ is a local martingale. Conversely, if $(f_j)$ is a local martingale and there exists $f \in (L^1 + L^\infty)(\Omega)$ such that $f_j = E(f|\Sigma_j)$, for each $j$, then we say that $(f_j)$ is closed by $f$.

If $(f_j)$ is an adapted process, we set
\[ f^*_k(x) = \sup_{j \le k} |f_j(x)|, \qquad f^*(x) = \sup_{j < \infty} |f_j(x)|. \]
Then $(f^*_j)$ is an increasing adapted process, the maximal process, and $f^*_j \to f^*$ pointwise.
Theorem 8.14.1 (Doob's inequality) Suppose that $(g_j)_{j=0}^\infty$ is a non-negative local sub-martingale. Then $\lambda\,\mu(g^*_k > \lambda) \le \int_{(g^*_k > \lambda)} g_k\, d\mu$.

Proof Let $\tau(x) = \inf\{j: g_j(x) > \lambda\}$. Note that $\tau(x) > k$ if and only if $g^*_k(x) \le \lambda$, and that $\tau(x) = \infty$ if and only if $g^*(x) \le \lambda$. Note also that the sets $(\tau = j)$ and $(\tau \le j)$ are in $\Sigma_j$; this says that $\tau$ is a stopping time. Then
\[ \int_{(g^*_k > \lambda)} g_k\, d\mu = \int_{(\tau \le k)} g_k\, d\mu = \sum_{j=0}^k \int_{(\tau = j)} g_k\, d\mu \]
\[ \ge \sum_{j=0}^k \int_{(\tau = j)} g_j\, d\mu \quad \text{(by the local sub-martingale property)} \]
\[ \ge \lambda \sum_{j=0}^k \mu(\tau = j) = \lambda\,\mu(\tau \le k) = \lambda\,\mu(g^*_k > \lambda). \]
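Since the inequality is exact for any finite probability space, it can be verified without sampling error by enumerating all paths. The Python sketch below (ours, not the book's) takes $g_j = |S_j|$ for a simple random walk $S_j$, a non-negative sub-martingale, and checks Doob's inequality over all $2^{10}$ equally likely paths.

```python
# Exhaustive check of Doob's inequality for g_j = |S_j|, S a +-1 random walk.
from itertools import product

k, lam = 10, 3
lhs_count, rhs_sum, n_paths = 0, 0.0, 0
for signs in product([-1, 1], repeat=k):
    s, g_max = 0, 0
    for step in signs:
        s += step
        g_max = max(g_max, abs(s))   # g*_k = max_{j <= k} |S_j|
    n_paths += 1
    if g_max > lam:
        lhs_count += 1
        rhs_sum += abs(s)            # contributes g_k on (g*_k > lam)
# lam * mu(g*_k > lam) <= integral of g_k over (g*_k > lam)
assert lam * lhs_count / n_paths <= rhs_sum / n_paths
```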
Although this inequality is always known as Doob's inequality, it was first established by Jean Ville [1937]. It appears in Doob's fundamental paper (Doob [1940]) (where, as elsewhere, he fully acknowledges Ville's priority).

Corollary 8.14.1 If $1 < p < \infty$ then $\|g^*_k\|_p \le p' \|g_k\|_p$ and $\|g^*\|_p \le p' \sup_k \|g_k\|_p$.

Proof This follows immediately from Theorem 8.1.1.
8.15 The martingale convergence theorem
We say that a local martingale is bounded in $L^p$ if $\sup_j \|f_j\|_p < \infty$.

Theorem 8.15.1 If $1 < p \le \infty$ and $(f_j)$ is a local martingale which is bounded in $L^p$ then $(f_j)$ is closed by some $f$ in $L^p$.
Proof We use the fact that a bounded sequence in $L^p$ is weakly sequentially compact if $1 < p < \infty$, and is weak* sequentially compact, when $p = \infty$. Thus there exists a subsequence $(f_{j_k})$ which converges weakly (or weak*, when $p = \infty$) to $f$ in $L^p(\Omega)$. Then if $A$ is a set of finite measure in $\Sigma_j$, $\int_A f_{j_k}\, d\mu \to \int_A f\, d\mu$. But if $j_k \ge j$, $\int_A f_{j_k}\, d\mu = \int_A f_j\, d\mu$, and so $\int_A f\, d\mu = \int_A f_j\, d\mu$.
We now prove a version of the martingale convergence theorem.

Theorem 8.15.2 Suppose that $(f_j)$ is a local martingale which is closed by $f$, for some $f$ in $L^p$, where $1 \le p < \infty$. Then $f_j \to f$ in $L^p$-norm, and almost everywhere.

Proof Let $F = \operatorname{span}(\bigcup_j L^p(\Sigma_j))$. Then $F$ is dense in $L^p(\Sigma)$, since $\Sigma$ is the smallest $\sigma$-field containing $\bigcup_j \Sigma_j$. The result is true if $f \in F$, since then $f \in L^p(\Sigma_j)$ for some $j$, and then $f_k = f$ for $k \ge j$. Let $T_j(f) = E(f|\Sigma_j)$, let $T_\infty(f) = f$, and let $M(f) = \max(f^*, |f|)$. Then $\|T_j\| = 1$ for all $j$, and so $f_j \to f$ in norm, for all $f \in L^p$, by Theorem 8.4.1.

In order to prove convergence almost everywhere, we show that the sublinear mapping $f \to M(f) = \max(f^*, |f|)$ is of Riesz weak type $(1, 1)$: the result then follows from Theorem 8.4.2. Now $(|f_k|)$ is a local sub-martingale, and $\int_A |f_k|\, d\mu \le \int_A |f|\, d\mu$ for each $A$ of finite measure in $\Sigma_k$, and so, using Doob's inequality,
\[ \lambda\,\mu(f^* > \lambda) = \lim_k \lambda\,\mu(f^*_k > \lambda) \le \lim_k \int_{(f^*_k > \lambda)} |f_k|\, d\mu \le \lim_k \int_{(f^*_k > \lambda)} |f|\, d\mu = \int_{(f^* > \lambda)} |f|\, d\mu, \]
and so the sublinear mapping $f \to f^*$ is of Riesz weak type $(1,1)$: $M$ is therefore also of Riesz weak type $(1, 1)$.

Corollary 8.15.1 If $1 < p < \infty$, every $L^p$-bounded local martingale converges in $L^p$-norm and almost everywhere.
Although an $L^1$-bounded martingale need not be closed, nor converge in norm, it converges almost everywhere.

Theorem 8.15.3 Suppose that $(f_j)_{j=0}^\infty$ is an $L^1$-bounded martingale. Then $f_j$ converges almost everywhere.

Proof Since $(\Omega, \Sigma_0, \mu)$ is $\sigma$-finite, it is enough to show that $f_j$ converges almost everywhere on each set in $\Sigma_0$ of finite measure. Now if $A$ is a set of finite measure in $\Sigma_0$ then $(f_j I_A)$ is an $L^1$-bounded martingale. We can therefore suppose that $\mu(\Omega) < \infty$. Let $M = \sup_j \|f_j\|_1$. Suppose that $N > 0$. Let $T$ be the stopping time $T = \inf\{j: |f_j| > N\}$, so that $T$ takes values in $[0, \infty]$. Let $B = (T < \infty)$ and $S = (T = \infty)$. Let
\[ g_j(\omega) = f_j(\omega) \text{ if } j \le T(\omega), \qquad g_j(\omega) = f_{T(\omega)}(\omega) \text{ if } j > T(\omega). \]
If $A \in \Sigma_j$, then
\[ \int_A g_{j+1}\, d\mu = \int_{A \cap (j+1 \le T)} f_{j+1}\, d\mu + \int_{A \cap (j+1 > T)} f_T\, d\mu \]
\[ = \int_{A \cap (j < T)} f_j\, d\mu + \int_{A \cap (T = j)} f_j\, d\mu + \int_{A \cap (j > T)} f_T\, d\mu \]
\[ = \int_{A \cap (j \le T)} f_j\, d\mu + \int_{A \cap (j > T)} f_T\, d\mu = \int_A g_j\, d\mu, \]
by the martingale property, since $A \cap (j < T) \in \Sigma_j$. Thus $(g_j)$ is a martingale, the martingale $(f_j)$ stopped at time $T$. Further,
\[ \|g_j\|_1 = \sum_{k \le j} \int_{(T = k)} |f_k|\, d\mu + \int_{(T > j)} |f_j|\, d\mu \le \|f_j\|_1 \le M, \]
so that $g$ is an $L^1$-bounded martingale.

Now let $h = |f_T| I_B$. Then $h \le \liminf |g_j|$, so that $\|h\|_1 \le M$, by Fatou's lemma. Thus $h + N I_S \in L^1$, and $|g_j| \le h + N I_S$, for each $j$. Thus we can write $g_j = m_j (h + N I_S)$, where $\|m_j\|_\infty \le 1$. By weak*-compactness, there exists a subsequence $(m_{j_k})$ converging weak* in $L^\infty$ to some $m \in L^\infty$. Then $(g_{j_k})$ converges weakly in $L^1$ to some $g \in L^1$. We now use the argument of Theorem 8.15.1 to conclude that $(g_j)$ is closed by $g$, and so $g_j$ converges almost everywhere to $g$, by Theorem 8.15.2. But $f_j = g_j$ for all $j$ on $S$, and $\mu(B) = \lim_k \mu(f^*_k > N) \le M/N$, by Doob's inequality. Thus $f_j$ converges pointwise except on a set of measure at most $M/N$. But this holds for all $N$, and so $f_j$ converges almost everywhere.
8.16 Notes and remarks
The great mathematical collaboration between Hardy and Littlewood was carried out in great part by correspondence ([Lit 86], pp. 9-11). Reading Hardy's papers of the 1920s and 1930s, it becomes clear that he also corresponded frequently with European mathematicians: often he writes to the effect that the proof that follows is due to Marcel Riesz (or whomsoever), and is simpler, or more general, than his original proof. Mathematical collaboration is a wonderful thing! But it was Hardy who revealed the mathematical power of maximal inequalities.

The term 'Riesz weak type' is introduced here, since it fits very naturally into the development of the theory. Probabilists, with Doob's inequality in mind, might prefer to call it 'Doob weak type'.

The martingale convergence theorem was proved by Doob in a beautiful paper [Doo 40], using Doob's inequality, and an upcrossing argument. The version of the martingale convergence theorem that we present here is as simple as it comes. The theory extends to more general families of $\sigma$-fields, to continuous time, and to vector-valued processes. It lies at the heart of the theory of stochastic integration, a theory which has been developed in fine detail, exposed over many years in the Seminar Notes of the University of Strasbourg, and the Notes on the Summer Schools of Probability at Saint-Flour, published in the Springer-Verlag Lecture Notes in Mathematics series. Progress in mathematical analysis, and in probability theory, was handicapped for many years by the failure of analysts to learn what probabilists were doing, and conversely.
Exercises
8.1 Give examples of functions $f$ and $g$ which satisfy the conditions of Theorem 8.1.1, for which $\int f\, d\mu = \infty$ and $\int g\, d\mu = 1$.
8.2 Show that if $f \ne 0$ and $f \ge 0$ then $\int_{\mathbf{R}^d} m(f)\, d\lambda = \infty$.
8.3 Suppose that $f$ is a non-negative decreasing function on $(0, \infty)$. Show that $f^\dagger = m_-(f) = m_u(f)$. What is $m_+(f)$?
8.4 [The Vitali covering lemma.] Suppose that $E$ is a bounded measurable subset of $\mathbf{R}^d$. A Vitali covering of $E$ is a collection $\mathcal{U}$ of open balls with the property that if $x \in E$ and $\epsilon > 0$ then there exists $U \in \mathcal{U}$ with radius less than $\epsilon$ such that $x \in U$. Show that if $\mathcal{U}$ is a Vitali covering of $E$ then there exists a sequence $(U_n)$ of disjoint balls in $\mathcal{U}$ such that $\lambda(E \setminus \bigcup_n U_n) = 0$. [Hint: repeated use of Lemma 8.9.1.]
8.5 Suppose that $S$ is a set of open intervals in the line which cover a compact set of measure $m$. Show that there is a finite disjoint subset $T$ whose union has measure more than $m/2$.
8.6 Give a proof of Theorem 8.11.1.
8.7 Consider the Fejér kernel
\[ K_n(t) = \frac{1}{n+1} \left( \frac{\sin((n+1)t/2)}{\sin(t/2)} \right)^2 \]
on the unit circle $\mathbf{T}$. Show that if $1 \le p < \infty$ and $f \in L^p$ then $K_n \star f \to f$ in $L^p(\mathbf{T})$-norm. What about convergence almost everywhere?
8.8 For $t \in \mathbf{R}^d$ let $\phi(t) = \eta(|t|)$, where $\eta$ is a continuous strictly decreasing function on $[0, \infty)$ taking values in $[0, \infty]$. Suppose that $\phi \in L^1 + L^{p'}$, where $1 < p < \infty$. State and prove a theorem about $\phi \star f$ which generalizes Hedberg's inequality, and its corollary.
8.9 Suppose that $f \in (L^1 + L^\infty)(\mathbf{R}^d)$. If $t > 0$ let $\delta_t(f)(x) = f(x/t)$: $\delta_t$ is a dilation operator.
(i) Suppose that $f \in L^p(\mathbf{R}^d)$. Show that $\|\delta_t(f)\|_p = t^{d/p} \|f\|_p$.
(ii) Show that $\delta_t(I_{d,\alpha}(f)) = t^{-\alpha} I_{d,\alpha}(\delta_t(f))$.
(iii) Show that if $1 < p < d/\alpha$ then $q = pd/(d - \alpha p)$ is the only index for which $I_{d,\alpha}$ maps $L^p(\mathbf{R}^d)$ continuously into $L^q(\mathbf{R}^d)$.
8.10 Suppose that $(\Omega, \Sigma, \mu)$ is a measure space and that $\Sigma_0$ is a sub-$\sigma$-field of $\Sigma$. Suppose that $1 \le p \le \infty$, and that $J_p$ is the natural inclusion of $L^p(\Omega, \Sigma_0, \mu)$ into $L^p(\Omega, \Sigma, \mu)$. Suppose that $f \in L^{p'}(\Omega, \Sigma, \mu)$. What is $J'_p(f)$?
8.11 Let $f_j(t) = 2^j$ for $0 < t \le 2^{-j}$ and $f_j(t) = 0$ for $2^{-j} < t \le 1$. Show that $(f_j)$ is an $L^1$-bounded martingale for the dyadic filtration of $(0, 1]$ which converges everywhere, but is not closed in $L^1$.
8.12 Let $K = [0, 1]^d$, with its dyadic filtration. Show that if $(f_j)$ is an $L^1$-bounded martingale then there exists a signed Borel measure $\nu$ such that $\nu(A) = \int_A f_j\, d\lambda$ for each $A \in \Sigma_j$. Conversely, suppose that $\nu$ is a (non-negative) Borel measure. If $A$ is an atom of $\Sigma_j$, let $f_j(x) = 2^{dj} \nu(A)$, for $x \in A$. Show that $(f_j)$ is an $L^1$-bounded martingale. Let $f = \lim_j f_j$, and let $\nu_f = f\, d\lambda$. Show that $\nu - \nu_f$ is a non-negative measure which is singular with respect to $\lambda$: that is, there is a set $N$ such that $\lambda(N) = 0$ and $(\nu - \nu_f)([0, 1]^d \setminus N) = 0$.
9
Complex interpolation
9.1 Hadamard's three lines inequality
Calderón's interpolation theorem and Theorem 8.5.1 have strong and satisfactory conclusions, but they require correspondingly strong conditions to be satisfied. In many cases, we must start from a weaker position. In this chapter and the next we consider other interpolation theorems; in this chapter, we consider complex interpolation, and all Banach spaces will be assumed to be complex Banach spaces. We shall turn to real interpolation in the next chapter.

We shall be concerned with the Riesz–Thorin theorem and related results. The original theorem, which concerns linear operators between $L^p$-spaces, was proved by Marcel Riesz [Ri(M) 26] in 1926; Thorin [Tho 39] gave a different proof in 1939. Littlewood described this in his Miscellany [Lit 86] as 'the most impudent idea in mathematics, and brilliantly successful'. In the 1960s, Thorin's proof was deconstructed, principally by Lions [Lio 61] and Calderón [Cal 63], [Cal 64], [Cal 66], so that the results could be extended to a more general setting. We shall need these more general results, and so we shall follow Lions and Calderón.

The whole theory is concerned with functions, possibly vector-valued, which are bounded and continuous on the closed strip $\bar{S} = \{z = x + iy \in \mathbf{C}: 0 \le x \le 1\}$ and analytic on the open strip $S = \{z = x + iy \in \mathbf{C}: 0 < x < 1\}$, and we shall begin by establishing the first fundamental inequality, from complex analysis, that we shall need.
Proposition 9.1.1 (Hadamards three lines inequality) Suppose that
f is a non-zero bounded continuous complex-valued function on

S which is
analytic on the open strip S. Let
M

= sup[f( +iy)[: y R.
135
136 Complex interpolation
Then M

0
M
1
1
.
Proof First we simplify the problem. Suppose that $N_0 > M_0$, $N_1 > M_1$. Let

$$g(z) = N_0^{z-1} N_1^{-z} f(z).$$

Then $g$ satisfies the conditions of the proposition, and

$$\sup\{|g(iy)|: y \in \mathbf{R}\} < 1 \quad \text{and} \quad \sup\{|g(1+iy)|: y \in \mathbf{R}\} < 1.$$

We shall show that $|g(z_0)| \le 1$ for all $z_0 \in S$; then

$$|f(\theta + iy)| = N_0^{1-\theta} N_1^{\theta} |g(\theta + iy)| \le N_0^{1-\theta} N_1^{\theta}.$$

Since this holds for all $N_0 > M_0$, $N_1 > M_1$, we have the required result.
Let $K = \sup\{|g(z)|: z \in \bar S\}$. We want to apply the maximum modulus principle: the problem is the behaviour of $g$ as $|y| \to \infty$. We deal with this by multiplying by functions that decay at infinity. Suppose that $\epsilon > 0$. Let $h_\epsilon(z) = e^{\epsilon z^2} g(z)$. If $z = x + iy \in \bar S$ then

$$|h_\epsilon(z)| = e^{\epsilon(x^2 - y^2)} |g(z)| \le e^{\epsilon} e^{-\epsilon y^2} K,$$

so that $h_\epsilon(z) \to 0$ as $|y| \to \infty$.

Now suppose that $z_0 = x_0 + iy_0 \in S$. Choose $R > |y_0|$ such that $e^{-\epsilon R^2} K \le 1$. Then $z_0$ is an interior point of the rectangle with vertices $\pm iR$ and $1 \pm iR$, and $|h_\epsilon(z)| \le e^{\epsilon}$ on the sides of the rectangle. Thus, by the maximum modulus principle, $|h_\epsilon(z_0)| \le e^{\epsilon}$, and so

$$|g(z_0)| = e^{\epsilon y_0^2} e^{-\epsilon x_0^2} |h_\epsilon(z_0)| \le e^{\epsilon(1 + y_0^2)}.$$

But $\epsilon$ is arbitrary, and so $|g(z_0)| \le 1$.
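The inequality can be checked numerically for a concrete function. The sketch below is purely illustrative (the test function $f(z) = e^{z^2}$ and the grid sizes are arbitrary choices, not from the text); since $|f(x+iy)| = e^{x^2 - y^2}$, we have $M_\theta = e^{\theta^2}$ exactly, and the three lines bound $M_\theta \le M_0^{1-\theta} M_1^\theta = e^\theta$ is visibly satisfied.

```python
import numpy as np

def strip_sup(f, x, ys):
    # estimate sup_y |f(x + iy)| along the vertical line Re z = x
    return max(abs(f(x + 1j * y)) for y in ys)

# bounded and analytic on the strip: |f(x+iy)| = exp(x^2 - y^2)
f = lambda z: np.exp(z * z)

ys = np.linspace(-10.0, 10.0, 4001)   # grid contains y = 0, where the sup is attained
M0 = strip_sup(f, 0.0, ys)            # = 1
M1 = strip_sup(f, 1.0, ys)            # = e

for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    M_theta = strip_sup(f, theta, ys)
    bound = M0 ** (1 - theta) * M1 ** theta
    assert M_theta <= bound + 1e-12
    print(f"theta={theta}: M_theta={M_theta:.6f} <= {bound:.6f}")
```

Since both endpoint suprema are attained at $y = 0$, the computed $M_0 = 1$ and $M_1 = e$ are exact here, so the comparison is not affected by truncating the lines to $|y| \le 10$.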
9.2 Compatible couples and intermediate spaces

We now set up the machinery for complex interpolation. Suppose that two Banach spaces $(A_0, \|.\|_{A_0})$ and $(A_1, \|.\|_{A_1})$ are linear subspaces of a Banach space $(V, \|.\|_V)$ (in fact, a Hausdorff topological vector space $(V, \tau)$ will do) and that the inclusion mappings $(A_j, \|.\|_{A_j}) \to (V, \|.\|_V)$ are continuous, for $j = 0, 1$. Then the pair $((A_0, \|.\|_{A_0}), (A_1, \|.\|_{A_1}))$ is called a compatible couple. A word about terminology here: the two Banach spaces play a symmetric role, and we shall always use $j$ to denote either 0 or 1, without repeating 'for $j = 0, 1$'.
It is straightforward to show (Exercise 9.1) that the spaces $A_0 \cap A_1$ and $A_0 + A_1$ are then Banach spaces, under the norms

$$\|a\|_{A_0 \cap A_1} = \max(\|a\|_{A_0}, \|a\|_{A_1}),$$
$$\|a\|_{A_0 + A_1} = \inf\{\|a_0\|_{A_0} + \|a_1\|_{A_1}: a = a_0 + a_1,\ a_j \in A_j\}.$$

A Banach space $(A, \|.\|_A)$ contained in $A_0 + A_1$ and containing $A_0 \cap A_1$ for which the inclusions

$$(A_0 \cap A_1, \|.\|_{A_0 \cap A_1}) \to (A, \|.\|_A) \to (A_0 + A_1, \|.\|_{A_0 + A_1})$$

are continuous is then called an intermediate space.

The obvious and most important example is given when $1 \le p_j \le \infty$. Then $(L^{p_0}, \|.\|_{p_0})$, $(L^{p_1}, \|.\|_{p_1})$ form a compatible couple, and if $p$ is between $p_0$ and $p_1$ then $(L^p, \|.\|_p)$ is an intermediate space (Theorem 5.5.1).
With Hadamard's three lines inequality in mind, we now proceed as follows. Suppose that $((A_0, \|.\|_{A_0}), (A_1, \|.\|_{A_1}))$ is a compatible couple. Let $L_0 = \{iy: y \in \mathbf{R}\}$ and $L_1 = \{1 + iy: y \in \mathbf{R}\}$ be the two components of the boundary of $S$. We set $\mathcal{F}(A_0, A_1)$ to be the vector space of all functions $F$ on the closed strip $\bar S$ taking values in $A_0 + A_1$ for which

• $F$ is continuous and bounded on $\bar S$;
• $F$ is analytic on $S$ (in the sense that $\phi(F)$ is analytic for each continuous linear functional $\phi$ on $A_0 + A_1$);
• $F(L_j) \subseteq A_j$, and $F$ is a bounded continuous map from $L_j$ to $A_j$.

We give $\mathcal{F}(A_0, A_1)$ the norm

$$\|F\|_{\mathcal{F}} = \max_{j=0,1}\bigl(\sup\{\|F(z)\|_{A_j}: z \in L_j\}\bigr).$$
Proposition 9.2.1 If $F \in \mathcal{F}(A_0, A_1)$ and $z \in \bar S$ then $\|F(z)\|_{A_0 + A_1} \le \|F\|_{\mathcal{F}}$.

Proof There exists $\phi \in (A_0 + A_1)^*$ with $\|\phi\| = 1$ and $\phi(F(z)) = \|F(z)\|_{A_0 + A_1}$. Then $\phi(F)$ satisfies the conditions of Proposition 9.1.1, and so $|\phi(F(z))| \le \|F\|_{\mathcal{F}}$.

If $(F_n)$ is an $\mathcal{F}$-Cauchy sequence, then it follows that $F_n(z)$ converges uniformly, to $F(z)$ say, on $\bar S$; then $F \in \mathcal{F}(A_0, A_1)$ and $F_n \to F$ in $\mathcal{F}$-norm. Thus $(\mathcal{F}(A_0, A_1), \|.\|_{\mathcal{F}})$ is a Banach space.

Now suppose that $0 < \theta < 1$. The mapping $F \to F(\theta)$ is a continuous linear mapping from $\mathcal{F}(A_0, A_1)$ into $A_0 + A_1$. We denote the image by $(A_0, A_1)_{[\theta]} = A_{[\theta]}$, and give it the quotient norm:

$$\|a\|_{[\theta]} = \inf\{\|F\|_{\mathcal{F}}: F(\theta) = a\}.$$

Then $(A_{[\theta]}, \|.\|_{[\theta]})$ is an intermediate space.
With all this in place, the next fundamental theorem follows easily.

Theorem 9.2.1 Suppose that $(A_0, A_1)$ and $(B_0, B_1)$ are compatible couples and that $T$ is a linear mapping from $A_0 + A_1$ into $B_0 + B_1$, mapping $A_j$ into $B_j$, with $\|T(a)\|_{B_j} \le M_j \|a\|_{A_j}$ for $a \in A_j$, for $j = 0, 1$. Suppose that $0 < \theta < 1$. Then $T(A_{[\theta]}) \subseteq B_{[\theta]}$, and $\|T(a)\|_{[\theta]} \le M_0^{1-\theta} M_1^{\theta} \|a\|_{[\theta]}$ for $a \in A_{[\theta]}$.
Proof Suppose that $a$ is a non-zero element of $A_{[\theta]}$ and that $\epsilon > 0$. Then there exists $F \in \mathcal{F}(A_0, A_1)$ such that $F(\theta) = a$ and $\|F\|_{\mathcal{F}} \le (1+\epsilon)\|a\|_{[\theta]}$. Then the function $T(F(z))$ is in $\mathcal{F}(B_0, B_1)$, and

$$\|T(F(z))\|_{B_j} \le M_j \|F(z)\|_{A_j} \le (1+\epsilon) M_j \|a\|_{[\theta]} \quad \text{for } z \in L_j.$$

Thus $T(a) = T(F(\theta)) \in B_{[\theta]}$. Set $G(z) = M_0^{z-1} M_1^{-z} T(F(z))$. Then $G \in \mathcal{F}(B_0, B_1)$, and $\|G(z)\|_{B_j} \le \|F(z)\|_{A_j} \le (1+\epsilon)\|a\|_{[\theta]}$ for $z \in L_j$. Thus

$$\|G(\theta)\|_{[\theta]} = M_0^{\theta-1} M_1^{-\theta} \|T(a)\|_{[\theta]} \le (1+\epsilon)\|a\|_{[\theta]},$$

so that $\|T(a)\|_{[\theta]} \le (1+\epsilon) M_0^{1-\theta} M_1^{\theta} \|a\|_{[\theta]}$. Since $\epsilon$ is arbitrary, the result follows.
9.3 The Riesz–Thorin interpolation theorem

Theorem 9.2.1 is the first ingredient of the Riesz–Thorin interpolation theorem. Here is the second.

Theorem 9.3.1 Suppose that $1 \le p_0, p_1 \le \infty$ and that $0 < \theta < 1$. Let $1/p = (1-\theta)/p_0 + \theta/p_1$. If $(A_0, A_1)$ is the compatible couple $(L^{p_0}(\Omega, \Sigma, \mu), L^{p_1}(\Omega, \Sigma, \mu))$ then $A_{[\theta]} = L^p(\Omega, \Sigma, \mu)$, and $\|f\|_{[\theta]} = \|f\|_p$ for $f \in L^p(\Omega, \Sigma, \mu)$.
Proof The result is trivially true if $p_0 = p_1$. Suppose that $p_0 \ne p_1$. Let us set $u(z) = (1-z)/p_0 + z/p_1$, for $z \in \bar S$; note that $u(\theta) = 1/p$ and that $\Re(u(z)) = 1/p_j$ for $z \in L_j$. First, let us consider a simple function $f = \sum_{k=1}^K r_k e^{i\alpha_k} I_{E_k}$ with $\|f\|_p = 1$. Set

$$F(z) = \sum_{k=1}^K r_k^{pu(z)} e^{i\alpha_k} I_{E_k},$$

so that $F(\theta) = f$. If $z \in L_j$ then $|F(z)| = \sum_{k=1}^K r_k^{p/p_j} I_{E_k}$, and so $\|F(z)\|_{p_j} = \|f\|_p^{p/p_j} = 1$. Thus $F$ is continuous on $\bar S$, analytic on $S$, and bounded in $A_0 + A_1$ on $\bar S$. Consequently $\|f\|_{[\theta]} \le 1$. By scaling, $\|f\|_{[\theta]} \le \|f\|_p$ for all simple $f$.
Now suppose that $f \in L^p$. Then there exists a sequence $(f_n)$ of simple functions which converge in $L^p$-norm and almost everywhere to $f$. Then $(f_n)$ is Cauchy in $\|.\|_{[\theta]}$, and so converges to an element $g$ of $(A_0, A_1)_{[\theta]}$. But then a subsequence converges almost everywhere to $g$, and so $g = f$. Thus $L^p(\Omega, \Sigma, \mu) \subseteq (A_0, A_1)_{[\theta]}$, and $\|f\|_{[\theta]} \le \|f\|_p$ for $f \in L^p(\Omega, \Sigma, \mu)$.
To prove the converse, we use a duality argument. Suppose that $f$ is a non-zero function in $(A_0, A_1)_{[\theta]}$. Suppose that $\epsilon > 0$. Then there exists $F \in \mathcal{F}(A_0, A_1)$ with $F(\theta) = f$ and $\|F\|_{\mathcal{F}} \le (1+\epsilon)\|f\|_{[\theta]}$. Now let us set $B_j = L^{p_j'}$, so that $(B_0, B_1)$ is a compatible couple, $L^{p'}(\Omega, \Sigma, \mu) \subseteq (B_0, B_1)_{[\theta]}$, and $\|g\|_{[\theta]} \le \|g\|_{p'}$ for $g \in L^{p'}(\Omega, \Sigma, \mu)$. Thus if $g$ is a non-zero simple function, there exists $G \in \mathcal{F}(B_0, B_1)$ with $G(\theta) = g$ and $\|G\|_{\mathcal{F}} \le (1+\epsilon)\|g\|_{p'}$. Let us now set $I(z) = \int F(z)G(z)\,d\mu$. Then $I$ is a bounded continuous function on $\bar S$, and is analytic on $S$. Further, if $z \in L_j$ then, using Hölder's inequality,

$$|I(z)| \le \int |F(z)||G(z)|\,d\mu \le \|F(z)\|_{p_j}\|G(z)\|_{p_j'} \le (1+\epsilon)^2 \|f\|_{[\theta]}\|g\|_{p'}.$$

We now apply Hadamard's three lines inequality to conclude that

$$|I(\theta)| = \left|\int fg\,d\mu\right| \le (1+\epsilon)^2 \|f\|_{[\theta]}\|g\|_{p'}.$$

Since this holds for all simple $g$ and all $\epsilon > 0$, it follows that $f \in L^p$ and $\|f\|_p \le \|f\|_{[\theta]}$.
There is also a vector-valued version of this theorem.

Theorem 9.3.2 Suppose that $E$ is a Banach space. Suppose that $1 \le p_0, p_1 \le \infty$ and that $0 < \theta < 1$. Let $1/p = (1-\theta)/p_0 + \theta/p_1$. If $(A_0, A_1)$ is the compatible couple $(L^{p_0}(\mu; E), L^{p_1}(\mu; E))$ then $A_{[\theta]} = L^p(\mu; E)$, and $\|f\|_{[\theta]} = \|f\|_p$ for $f \in L^p(\mu; E)$.

Proof The proof is exactly the same, making obvious changes. (Consider a simple function $f = \sum_{k=1}^K r_k x_k I_{E_k}$ with $r_k \in \mathbf{R}$, $x_k \in E$ and $\|x_k\| = 1$, and with $\|f\|_p = 1$. Set $F(z) = \sum_{k=1}^K r_k^{pu(z)} x_k I_{E_k}$, so that $F(\theta) = f$.)
Combining Theorems 9.2.1 and 9.3.1, we obtain the Riesz–Thorin interpolation theorem.

Theorem 9.3.3 (The Riesz–Thorin interpolation theorem) Suppose that $(\Omega, \Sigma, \mu)$ and $(\Phi, \mathcal{T}, \nu)$ are measure spaces. Suppose that $1 \le p_0, p_1 \le \infty$ and that $1 \le q_0, q_1 \le \infty$, and that $T$ is a linear mapping from $L^{p_0}(\Omega, \Sigma, \mu) + L^{p_1}(\Omega, \Sigma, \mu)$ into $L^{q_0}(\Phi, \mathcal{T}, \nu) + L^{q_1}(\Phi, \mathcal{T}, \nu)$ and that $T$ maps $L^{p_j}(\Omega, \Sigma, \mu)$ continuously into $L^{q_j}(\Phi, \mathcal{T}, \nu)$ with norm $M_j$, for $j = 0, 1$. Suppose that $0 < \theta < 1$, and define $p_\theta$ and $q_\theta$ by

$$\frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}, \qquad \frac{1}{q_\theta} = \frac{1-\theta}{q_0} + \frac{\theta}{q_1}$$

(with the obvious conventions if any of the indices are infinite). Then $T$ maps $L^{p_\theta}(\Omega, \Sigma, \mu)$ continuously into $L^{q_\theta}(\Phi, \mathcal{T}, \nu)$ with norm at most $M_0^{1-\theta} M_1^{\theta}$.

There is also a vector-valued version of the Riesz–Thorin theorem, which we leave the reader to formulate.
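The theorem has concrete content even for finite counting-measure spaces, where a linear map is an $n \times n$ matrix $A$: taking $p_0 = q_0 = 1$ and $p_1 = q_1 = \infty$, it gives $\|A\|_{p \to p} \le \|A\|_{1\to 1}^{1-\theta} \|A\|_{\infty\to\infty}^{\theta}$ for $1/p = 1 - \theta$. The sketch below is an illustration only (the random matrix, $\theta = 1/2$, and the trial vectors are arbitrary choices); it estimates the $\ell^p \to \ell^p$ norm from below by random trial vectors, which can only under-estimate the true norm, so the interpolation bound must dominate it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))

norm_1 = np.abs(A).sum(axis=0).max()    # ||A||_{1->1}: maximum column sum
norm_inf = np.abs(A).sum(axis=1).max()  # ||A||_{inf->inf}: maximum row sum

theta = 0.5                  # 1/p = (1 - theta)/1 + theta/inf
p = 1.0 / (1.0 - theta)      # here p = 2
bound = norm_1 ** (1 - theta) * norm_inf ** theta

# lower estimate of ||A||_{p->p} from random trial vectors
est = 0.0
for _ in range(2000):
    x = rng.standard_normal(n)
    est = max(est, np.linalg.norm(A @ x, ord=p) / np.linalg.norm(x, ord=p))

assert est <= bound + 1e-9
print(f"estimated ||A||_p = {est:.4f} <= interpolation bound {bound:.4f}")
```

For $\theta = 1/2$ this is the classical Schur test $\|A\|_{2\to 2} \le (\|A\|_{1\to 1}\|A\|_{\infty\to\infty})^{1/2}$, so the exact spectral norm also satisfies the bound.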
9.4 Young's inequality

We now turn to applications. These involve harmonic analysis on locally compact abelian groups. Let us describe what we need to know about this; an excellent account is given in Rudin [Rud 79]. Suppose that $G$ is a locally compact abelian group. Since we are restricting our attention to $\sigma$-finite measure spaces, we shall suppose that $G$ is $\sigma$-compact (a countable union of compact sets). Since we want the dual group (defined in the next section) to have the same property, we shall also suppose that $G$ is metrizable. In fact, neither condition is really necessary, but both are satisfied by the examples that we shall consider. There exists a measure $\mu$, Haar measure, on the Borel sets of $G$ for which (if the group operation is addition) $\mu(A) = \mu(-A) = \mu(A + g)$ for each Borel set $A$ and each $g \in G$; further $\mu$ is unique up to scaling. If $G$ is compact, we usually normalize $\mu$ so that $\mu(G) = 1$. In fact, we shall only consider the following examples:

• $\mathbf{R}$, under addition, with Lebesgue measure, and finite products $\mathbf{R}^d$, with product measure;
• $\mathbf{T} = \{z \in \mathbf{C}: |z| = 1\} = \{e^{i\theta}: 0 \le \theta < 2\pi\}$, under multiplication, and with measure $d\theta/2\pi$, and finite products $\mathbf{T}^d$, with product measure;
• $\mathbf{Z}$, under addition, with counting measure $\#$, and finite products $\mathbf{Z}^d$, with counting measure;
• $D_2 = \{-1, 1\}$, under multiplication, with probability measure $\mu(\{-1\}) = \mu(\{1\}) = 1/2$, finite products $D_2^d = \{\epsilon = (\epsilon_1, \ldots, \epsilon_d): \epsilon_i = \pm 1\}$, with product measure, under which each point has measure $1/2^d$, and the countable product $D_2^{\mathbf{N}}$, with product measure;
• $\mathbf{Z}_2 = \{0, 1\}$, under addition mod 2, with counting measure $\#(\{0\}) = \#(\{1\}) = 1$, finite products $\mathbf{Z}_2^d = \{v = (v_1, \ldots, v_d): v_i = 0 \text{ or } 1\}$, with counting measure, and the countable sum $\mathbf{Z}_2^{(\mathbf{N})}$, consisting of all $\mathbf{Z}_2$-valued sequences with only finitely many non-zero terms, again with counting measure. Let $P_d$ denote the set of subsets of $\{1, \ldots, d\}$. If $A \in P_d$, then we can consider $I_A$ as an element of $\mathbf{Z}_2^d$; thus we can identify $\mathbf{Z}_2^d$ with $P_d$. Under this identification, the group composition of two sets $A$ and $B$ is the symmetric difference $A \triangle B$.

Note that although $D_2^d$ and $\mathbf{Z}_2^d$ are isomorphic as groups, we have given them different measures.
Our first application concerns convolution. Suppose that $G$ is a locally compact abelian group and that $1 < p < \infty$. It follows from Proposition 7.5.1 that if $f \in L^1(G)$ and $g \in L^p(G)$ then $f * g \in L^p(G)$ and $\|f * g\|_p \le \|f\|_1 \|g\|_p$. On the other hand, if $h \in L^{p'}(G)$ then

$$\left|\int h(x - y) g(y)\,d\mu(y)\right| \le \|g\|_p \|h\|_{p'},$$

by Hölder's inequality, so that $h * g$ is defined as an element of $L^\infty$ and $\|h * g\|_\infty \le \|h\|_{p'} \|g\|_p$. If now $k \in L^q(G)$, where $1 < q < p'$, then $k \in L^1 + L^{p'}$, and so we can define the convolution $k * g$. What can we say about $k * g$?
Theorem 9.4.1 (Young's inequality) Suppose that $G$ is a $\sigma$-compact locally compact metrizable abelian group, that $1 < p, q < \infty$ and that $1/p + 1/q = 1 + 1/r > 1$. If $g \in L^p(G)$ and $k \in L^q(G)$ then $k * g \in L^r(G)$, and

$$\|k * g\|_r \le \|k\|_q \|g\|_p.$$
Proof If $f \in L^1(G) + L^{p'}(G)$, let $T_g(f) = f * g$. Then $T_g \in L(L^1, L^p)$, and $\|T_g: L^1 \to L^p\| \le \|g\|_p$. Similarly, $T_g \in L(L^{p'}, L^\infty)$, and $\|T_g: L^{p'} \to L^\infty\| \le \|g\|_p$. We take $p_0 = 1$, $p_1 = p'$ and $q_0 = p$, $q_1 = \infty$. If we set $\theta = p/q'$ we find that

$$\frac{1-\theta}{1} + \frac{\theta}{p'} = \frac{1}{q}, \qquad \frac{1-\theta}{p} + \frac{\theta}{\infty} = \frac{1}{r};$$

the result therefore follows from the Riesz–Thorin interpolation theorem.

In fact, it is not difficult to prove Young's inequality without using interpolation (Exercise 9.3).
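On $G = \mathbf{Z}$ with counting measure the theorem can be tested directly on finitely supported sequences, where convolution is the ordinary discrete convolution. The sketch below is illustrative only (the exponents $p = q = 3/2$, giving $r = 3$, and the random sequences are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 1.5, 1.5
r = 1.0 / (1.0 / p + 1.0 / q - 1.0)   # 1/p + 1/q = 1 + 1/r, so r = 3 here

lp = lambda x, s: (np.abs(x) ** s).sum() ** (1.0 / s)

for _ in range(100):
    g = rng.standard_normal(20)        # finitely supported element of L^p(Z)
    k = rng.standard_normal(30)        # finitely supported element of L^q(Z)
    conv = np.convolve(k, g)           # (k * g)(n) = sum_m k(m) g(n - m)
    assert lp(conv, r) <= lp(k, q) * lp(g, p) + 1e-9
print("Young's inequality held on all samples")
```

Note that a numerical check of this kind can only fail to falsify the inequality; it is not a proof, but it is a useful guard against mis-remembering which exponent goes with which factor.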
9.5 The Hausdorff–Young inequality

For our second application, we consider group duality, and the Fourier transform. A character on a $\sigma$-compact locally compact metrizable abelian group $G$ is a continuous homomorphism of $G$ into $\mathbf{T}$. Under pointwise multiplication, the characters form a group, the dual group $G'$, and $G'$ becomes a $\sigma$-compact locally compact metrizable abelian group when it is given the topology of uniform convergence on the compact subsets of $G$. If $G$ is compact, then $G'$ is discrete, and if $G$ is discrete, then $G'$ is compact. The dual of a finite product is (naturally isomorphic to) the product of the duals. The dual $G''$ of $G'$ is naturally isomorphic to $G$. For the examples above, we have the following duals:

• $\mathbf{R}' = \mathbf{R}$; if $x \in \mathbf{R}$ and $\xi \in \mathbf{R}'$ then $\xi(x) = e^{2\pi i \xi x}$.
• $(\mathbf{R}^d)' = \mathbf{R}^d$; if $x \in \mathbf{R}^d$ and $\xi \in (\mathbf{R}^d)'$ then $\xi(x) = e^{2\pi i \langle \xi, x \rangle}$.
• $\mathbf{T}' = \mathbf{Z}$ and $\mathbf{Z}' = \mathbf{T}$; if $n \in \mathbf{Z}$ and $e^{i\theta} \in \mathbf{T}$ then $n(e^{i\theta}) = e^{in\theta}$.
• $(D_2^d)' = \mathbf{Z}_2^d$ and $(\mathbf{Z}_2^d)' = D_2^d$. If $\epsilon \in D_2^d$ and $A \in P_d$, let $w_A(\epsilon) = \prod_{i \in A} \epsilon_i$. The function $w_A$ is a character on $D_2^d$, and is called a Walsh function. If $A = \{i\}$, we write $\epsilon_i$ for $w_{\{i\}}$; the functions $\epsilon_1, \ldots, \epsilon_d$ are called Bernoulli random variables. $\epsilon_i(\epsilon) = \epsilon_i$, and $w_A = \prod_{i \in A} \epsilon_i$.
• $(D_2^{\mathbf{N}})' = \mathbf{Z}_2^{(\mathbf{N})}$ and $(\mathbf{Z}_2^{(\mathbf{N})})' = D_2^{\mathbf{N}}$. Again, the Walsh functions are the characters on $D_2^{\mathbf{N}}$.
If $f \in L^1(G)$, we define the Fourier transform $\mathcal{F}(f) = \hat f$ as

$$\mathcal{F}(f)(\gamma) = \int_G f(g)\overline{\gamma(g)}\,d\mu(g) \quad (\gamma \in G').$$

It follows from the theorem of dominated convergence that $\mathcal{F}(f)$ is a bounded continuous function on $G'$, and the mapping $\mathcal{F}$ is a norm-decreasing linear mapping of $L^1(G)$ into $C_b(G')$. We also have the Plancherel theorem.
Theorem 9.5.1 (The Plancherel theorem) Suppose that $G$ is a $\sigma$-compact locally compact metrizable abelian group. If $f \in L^1(G) \cap L^2(G)$, then $\mathcal{F}(f) \in L^2(G', \mu')$ (where $\mu'$ is Haar measure on $G'$), and we can scale the measure $\mu'$ so that $\|\mathcal{F}(f)\|_2 = \|f\|_2$. We can then extend $\mathcal{F}$ by continuity to a linear isometry of $L^2(G)$ onto $L^2(G')$; the inverse mapping is given by

$$f(g) = \int_{G'} \mathcal{F}(f)(\gamma)\gamma(g)\,d\mu'(\gamma).$$
Proof We give an outline of the proof in the case where $G$ is a compact group, and Haar measure has been normalized so that $\mu(G) = 1$. First, the characters form an orthonormal set in $L^2(G)$. For if $\gamma \in G'$ then

$$\langle \gamma, \gamma \rangle = \int_G \gamma\bar\gamma\,d\mu = \int_G 1\,d\mu = 1,$$

while if $\gamma_1$ and $\gamma_2$ are distinct elements of $G'$, and $\gamma_1(h) \ne \gamma_2(h)$, then, using the invariance of Haar measure,

$$\langle \gamma_1, \gamma_2 \rangle = \int_G \gamma_1\bar\gamma_2\,d\mu = \int_G (\gamma_1\gamma_2^{-1})(g)\,d\mu(g) = \int_G (\gamma_1\gamma_2^{-1})(g + h)\,d\mu(g) = (\gamma_1\gamma_2^{-1})(h)\int_G (\gamma_1\gamma_2^{-1})(g)\,d\mu(g) = \gamma_1(h)\gamma_2^{-1}(h)\langle \gamma_1, \gamma_2 \rangle.$$

Thus $\langle \gamma_1, \gamma_2 \rangle = 0$. Finite linear combinations of characters are called trigonometric polynomials. The trigonometric polynomials form an algebra of functions, closed under conjugation (since $\bar\gamma = \gamma^{-1}$). The next step is to show that the characters separate the points of $G$; we shall not prove this, though it is clear when $G = \mathbf{T}^d$ or $D_2^{\mathbf{N}}$. It then follows from the complex Stone–Weierstrass theorem that the trigonometric polynomials are dense in $C(G)$. Further, $C(G)$ is dense in $L^2(G)$: this is a standard result from measure theory, but again is clear if $G = \mathbf{T}^d$ or $D_2^{\mathbf{N}}$. Thus the characters form an orthonormal basis for $L^2(G)$. Thus if $f \in L^2(G)$ we can write $f$ uniquely as $\sum_{\gamma \in G'} a_\gamma\gamma$, and then $\|f\|_2^2 = \sum_{\gamma \in G'} |a_\gamma|^2$. But then $\mathcal{F}(f)(\gamma) = a_\gamma$ and $f(g) = \sum_\gamma \mathcal{F}(f)(\gamma)\gamma(g)$.

The proof for locally compact groups is harder: the Plancherel theorem for $\mathbf{R}$, and so for $\mathbf{R}^d$, comes as an exercise later (Exercise 13.1).
After all this, the next result may seem to be an anti-climax.

Theorem 9.5.2 (The Hausdorff–Young inequality) Suppose that $f \in L^r(G)$, where $G$ is a $\sigma$-compact locally compact metrizable abelian group and $1 < r < 2$. Then the Fourier transform $\mathcal{F}(f)$ is in $L^{r'}(G')$, and $\|\mathcal{F}(f)\|_{r'} \le \|f\|_r$.

Proof The Fourier transform is an isometry on $L^2$, and is norm-decreasing from $L^1$ to $L^\infty$. We therefore apply the Riesz–Thorin interpolation theorem, taking $p_0 = 1$, $p_1 = 2$, $q_0 = \infty$ and $q_1 = 2$, and taking $\theta = 2/r'$.
9.6 Fourier type

We now turn to the Fourier transform of vector-valued functions. If $f \in L^1(G; E)$, where $E$ is a Banach space, we can define the Fourier transform $\mathcal{F}(f)$ by setting $\mathcal{F}(f)(\gamma) = \int_G f(g)\overline{\gamma(g)}\,d\mu(g)$. Then $\mathcal{F}(f) \in C_b(G'; E)$, and $\|\mathcal{F}(f)\|_\infty \le \|f\|_1$. In general though, neither the Plancherel theorem nor the Hausdorff–Young inequalities extend to this setting, as the following example shows. Let us take $G = \mathbf{T}$, $E = c_0$, and $f(\theta) = (\alpha_n e^{in\theta})$, where $\alpha = (\alpha_n) \in c_0$. Then $\|f(e^{i\theta})\|_\infty = \|\alpha\|_\infty$ for all $\theta$, so that $\|f\|_{L^p(c_0)} = \|\alpha\|_\infty$ for $1 \le p \le \infty$. On the other hand $(\mathcal{F}(f))_k = \alpha_k e_k$, where $e_k$ is the $k$th unit vector in $c_0$, and so $\sum_k \|(\mathcal{F}(f))_k\|^p = \sum_k |\alpha_k|^p$. Thus if we choose $\alpha$ in $c_0$, but not in $l^p$, for any $1 \le p < \infty$, it follows that $\mathcal{F}(f)$ is not in $l^p$, for any $1 \le p < \infty$.
On the other hand, there are cases where things work well. For example, if $H$ is a Hilbert space with orthonormal basis $(e_n)$, and $f = \sum_n f_n e_n \in L^2(G; H)$, then $f_n \in L^2(G)$ for each $n$, and $\|f\|_2^2 = \sum_n \|f_n\|_2^2$. We can apply the Plancherel theorem to each $f_n$. Then $\mathcal{F}(f) = \sum_n \mathcal{F}(f_n)e_n$, and $\mathcal{F}$ is an isometry of $L^2(G; H)$ onto $L^2(G'; H)$; we have a vector-valued Plancherel theorem. Using the vector-valued Riesz–Thorin interpolation theorem, we also obtain a vector-valued Hausdorff–Young inequality.

This suggests a way of classifying Banach spaces. Suppose that $E$ is a Banach space, that $G$ is a $\sigma$-compact locally compact metrizable abelian group and that $1 \le p \le 2$. Then we say that $E$ is of Fourier type $p$ with respect to $G$ if $\mathcal{F}(f) \in L^{p'}(G'; E)$ for all $f \in L^p(G; E) \cap L^1(G; E)$ and the mapping $f \to \mathcal{F}(f)$ extends to a continuous linear mapping from $L^p(G; E)$ into $L^{p'}(G'; E)$. It is not known whether this condition depends on $G$, for infinite $G$, though Fourier type $p$ with respect to $\mathbf{R}$, $\mathbf{T}$ and $\mathbf{Z}$ are known to be the same. If the condition holds for all $G$, we say that $E$ is of Fourier type $p$. Every Banach space is of Fourier type 1. We have seen that $c_0$ is not of Fourier type $p$ with respect to $\mathbf{T}$ for any $1 < p \le 2$, and that Hilbert space is of Fourier type 2.
Proposition 9.6.1 If $E$ is of Fourier type $p$ with respect to $G$ then $E$ is of Fourier type $r$ with respect to $G$, for $1 < r < p$.

Proof The result follows from the vector-valued Riesz–Thorin theorem, since $L^r(G; E) = (L^1(G; E), L^p(G; E))_{[\theta]}$ and $L^{r'}(G'; E) = (L^\infty(G'; E), L^{p'}(G'; E))_{[\theta]}$, where $\theta = p'/r'$.

This shows that Fourier type $p$ forms a scale of conditions, the condition becoming more stringent as $p$ increases. Kwapień [Kwa 72] has shown that a Banach space is of Fourier type 2 if and only if it is isomorphic to a Hilbert space.
9.7 The generalized Clarkson inequalities 145
Fourier type extends to subspaces. We also have the following straight-
forward duality result.
Proposition 9.6.2 A Banach space E is of Fourier type p with respect to
G if and only if its dual E

is of Fourier type p with respect to G


t
.
Proof Suppose that E is of Fourier type p with respect to G, and that
_
_
_T : L
p
(G) L
p

(G
t
)
_
_
_ = K. Suppose that h L
p
(G
t
; E

) L
1
(G
t
; E

). If
f is a simple E-valued function on G then, by Fubinis theorem
_
G
f(g)T(h)(g) d(g) =
_
G
f(g)
__
G

h()(g) d
t
()
_
d(g)
=
_
G

h()
__
G
f(g)(g) d(g)
_
d
t
()
=
_
G

h()T(f)() d
t
().
Thus
|T(h)|
p

= sup
_

_
G
fT(h) d

: f simple, |f|
p
1
_
= sup
_

_
G

T(f)hd
t

: f simple, |f|
p
1
_
sup
_
|T(f)|
p

|h|
p
: f simple, |f|
p
1
_
K|h|
p
.
Thus E

is of Fourier type p with respect to G


t
. Conversely, if E

is of
Fourier type p with respect to G
t
, then E

is of Fourier type p with respect


to G
tt
= G, and so E is of Fourier type p with respect to G, since E is
isometrically isomorphic to a subspace of E

.
Thus if L
1
is innite-dimensional, then L
1
does not have Fourier type p
with respect to Z, for any p > 1, since (L
1
)

has a subspace isomorphic to


c
0
.
9.7 The generalized Clarkson inequalities

What about the $L^p$ spaces?

Theorem 9.7.1 Suppose that $1 < p < \infty$. Then $L^p(\Omega, \Sigma, \mu)$ is of Fourier type $r$ for $1 < r \le \min(p, p')$, and if $f \in L^r(G; L^p)$ then

$$\|\mathcal{F}(f)\|_{L^{r'}(G'; L^p)} \le \|f\|_{L^r(G; L^p)}.$$

Proof We use Corollary 5.4.2 twice.

$$\left(\int_{G'} \|\mathcal{F}(f)\|_{L^p(\mu)}^{r'}\,d\mu'\right)^{1/r'} = \left(\int_{G'}\left(\int_\Omega |\mathcal{F}(f)(\gamma, \omega)|^p\,d\mu(\omega)\right)^{r'/p}d\mu'(\gamma)\right)^{1/r'} \le \left(\int_\Omega\left(\int_{G'} |\mathcal{F}(f)(\gamma, \omega)|^{r'}\,d\mu'(\gamma)\right)^{p/r'}d\mu(\omega)\right)^{1/p}$$

by Corollary 5.4.2, and

$$\left(\int_\Omega\left(\int_{G'} |\mathcal{F}(f)(\gamma, \omega)|^{r'}\,d\mu'(\gamma)\right)^{p/r'}d\mu(\omega)\right)^{1/p} \le \left(\int_\Omega\left(\int_G |f(g, \omega)|^r\,d\mu(g)\right)^{p/r}d\mu(\omega)\right)^{1/p},$$

by the Hausdorff–Young inequality. Finally

$$\left(\int_\Omega\left(\int_G |f(g, \omega)|^r\,d\mu(g)\right)^{p/r}d\mu(\omega)\right)^{1/p} \le \left(\int_G\left(\int_\Omega |f(g, \omega)|^p\,d\mu(\omega)\right)^{r/p}d\mu(g)\right)^{1/r} = \left(\int_G \|f\|_{L^p(\mu)}^r\,d\mu(g)\right)^{1/r},$$

by Corollary 5.4.2, again.
This enables us to prove the following classical inequalities concerning $L^p$ spaces.

Theorem 9.7.2 (Generalized Clarkson inequalities) Suppose that $f, g \in L^p(\Omega, \Sigma, \mu)$, where $1 < p < \infty$, and suppose that $1 < r \le \min(p, p')$.

(i) $\|f+g\|_p^{r'} + \|f-g\|_p^{r'} \le 2(\|f\|_p^r + \|g\|_p^r)^{r'-1}$.
(ii) $2(\|f\|_p^{r'} + \|g\|_p^{r'})^{r-1} \le \|f+g\|_p^r + \|f-g\|_p^r$.
(iii) $2(\|f\|_p^{r'} + \|g\|_p^{r'}) \le \|f+g\|_p^{r'} + \|f-g\|_p^{r'} \le 2^{r'-1}(\|f\|_p^{r'} + \|g\|_p^{r'})$.
(iv) $2^{r-1}(\|f\|_p^r + \|g\|_p^r) \le \|f+g\|_p^r + \|f-g\|_p^r \le 2(\|f\|_p^r + \|g\|_p^r)$.
Proof (i) Define $h \in L^r(D_2; L^p)$ by setting $h(1) = f$, $h(-1) = g$. Then $h = ((f+g)/2)\mathbf{1} + ((f-g)/2)\epsilon$, so that $\mathcal{F}(h)(\mathbf{1}) = (f+g)/2$ and $\mathcal{F}(h)(\epsilon) = (f-g)/2$. Thus, applying the Hausdorff–Young inequality,

$$\|\mathcal{F}(h)\|_{L^{r'}(\mathbf{Z}_2; L^p)} = \tfrac{1}{2}(\|f+g\|_p^{r'} + \|f-g\|_p^{r'})^{1/r'} \le \|h\|_{L^r(D_2; L^p)} = \bigl(\tfrac{1}{2}(\|f\|_p^r + \|g\|_p^r)\bigr)^{1/r} = \tfrac{1}{2^{1/r}}(\|f\|_p^r + \|g\|_p^r)^{1/r}.$$

Multiplying by 2, and raising to the $r'$-th power, we obtain (i).

(ii) Apply (i) to $u = f + g$ and $v = f - g$:

$$\|2f\|_p^{r'} + \|2g\|_p^{r'} \le 2(\|f+g\|_p^r + \|f-g\|_p^r)^{r'-1}.$$

Dividing by 2, and raising to the $(r-1)$-st power, we obtain (ii).

(iii) Since $\|h\|_{L^r(D_2, L^p)} \le \|h\|_{L^{r'}(D_2, L^p)}$,

$$2^{-1/r}(\|f\|_p^r + \|g\|_p^r)^{1/r} \le 2^{-1/r'}(\|f\|_p^{r'} + \|g\|_p^{r'})^{1/r'}.$$

Substituting this in (i), and simplifying, we obtain the right-hand inequality. Also,

$$2^{-1/r}(\|f+g\|_p^r + \|f-g\|_p^r)^{1/r} \le 2^{-1/r'}(\|f+g\|_p^{r'} + \|f-g\|_p^{r'})^{1/r'}.$$

Substituting this in (ii), and simplifying, we obtain the left-hand inequality.

(iv) These are proved in the same way as (iii); the details are left to the reader.

In fact, Clarkson [Cla 36] proved these inequalities in the case where $r = \min(p, p')$ (see Exercise 9.5).
9.8 Uniform convexity

Clarkson's inequalities give strong geometric information about the unit ball of the $L^p$ spaces, for $1 < p < \infty$. The unit ball of a Banach space $(E, \|.\|_E)$ is convex, but its unit sphere $S_E = \{x: \|x\| = 1\}$ can contain large flat spots. For example, in $L^1$, the set

$$S_{L^1}^+ = \{f \in S_{L^1}: f \ge 0\} = \left\{f \in S_{L^1}: \int f\,d\mu = 1\right\}$$

is a convex set, so that if $f_1, f_2 \in S_{L^1}^+$ then $\|(f_1 + f_2)/2\| = 1$. By contrast, a Banach space $(E, \|.\|_E)$ is said to be uniformly convex if, given $\epsilon > 0$, there exists $\delta > 0$ such that if $x, y \in S_E$ and $\|(x+y)/2\| > 1 - \delta$ then $\|x - y\| < \epsilon$. In particular, $(E, \|.\|_E)$ is $p$-uniformly convex, where $2 \le p < \infty$, if there exists $C > 0$ such that if $x, y \in S_E$ then

$$\left\|\frac{x+y}{2}\right\| \le 1 - C\|x - y\|^p.$$
Theorem 9.8.1 If $2 \le p < \infty$ then $L^p(\Omega, \Sigma, \mu)$ is $p$-uniformly convex. If $1 < p \le 2$ then $L^p(\Omega, \Sigma, \mu)$ is 2-uniformly convex.

Proof When $p \ge 2$, the result follows from the first of the generalized Clarkson inequalities, since if $\|f\|_p = \|g\|_p = 1$ then

$$\left\|\frac{f+g}{2}\right\|_p^p \le 1 - \left\|\frac{f-g}{2}\right\|_p^p, \quad \text{so that} \quad \left\|\frac{f+g}{2}\right\|_p \le 1 - \frac{1}{p2^p}\|f - g\|_p^p.$$

When $1 < p < 2$, a similar argument shows that $L^p$ is $p'$-uniformly convex. To show that it is 2-uniformly convex, we need to work harder. We need the following inequality.
the following inequality.
Lemma 9.8.1 If 1 < p < and s, t R then there exists C
p
> 0 such that
_
[s[
p
+[t[
p
2
_
2/p

_
s +t
2
_
2
+C
p
(s t)
2
.
Proof By homogeneity, it is sucient to prove the result for s = 1 and
[t[ 1. For 0 t 1, let f
p
(t) = ((1 + [t[
p
)/2)
1/p
. Then by Taylors
theorem with remainder, if 0 t < 1 there exists t < r < 1 such that
f
p
(t) = f
p
(1) + (t 1)f
t
p
(t) +
(t 1)
2
2
f
tt
p
(r).
Now
f
t
p
(t) =
t
p1
2
(f
p
(t))
1p
and f
tt
p
(t) =
(p 1)t
p2
4
(f
p
(t))
12p
so that f
p
(1) = 1, f
t
p
(1) = 1/2 and f
tt
p
(t) (p 1)/2
p
for 1/2 t 1. Thus
((1 +t
p
)/2)
1/p
(1 +t)/2
p 1
2
p+1
(1 t)
2
for 1/2 t 1. On the other hand, f
p
(t) (1 + t)/2 > 0 on [1, 1/2],
by H olders inequality, so that (((1 + [t[
p
)/2)
1/p
(1 + t)/2)/(1 t)
2
> 0
on [1, 1/2], and is therefore bounded below by a positive constant. Thus
there exists B
p
> 0 such that
((1 +[t[
p
)/2)
1/p
(1 +t)/2 B
p
(1 t)
2
for t [1, 1].
On the other hand,
((1 +[t[
p
)/2)
1/p
+ (1 +t)/2 ((1 +[t[
p
)/2)
1/p
2
1/p
for t [1, 1];
the result follows by multiplying these inequalities.
Now suppose that $f, g \in S_{L^p}$. By the lemma,

$$\left(\frac{|f|^p + |g|^p}{2}\right)^{2/p} \ge \left|\frac{f+g}{2}\right|^2 + C_p|f-g|^2,$$

so that, integrating and using the reverse Minkowski inequality for $L^{p/2}$,

$$1 \ge \left(\int\left(\left|\frac{f+g}{2}\right|^2 + C_p|f-g|^2\right)^{p/2}d\mu\right)^{1/p} \ge \left(\left(\int\left|\frac{f+g}{2}\right|^p d\mu\right)^{2/p} + C_p\left(\int|f-g|^p\,d\mu\right)^{2/p}\right)^{1/2} = \left(\left\|\frac{f+g}{2}\right\|_p^2 + C_p\|f-g\|_p^2\right)^{1/2},$$

and the result follows from this.
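The scalar inequality of Lemma 9.8.1 is easy to probe numerically. The sketch below (illustrative only; the values of $p$ and the grid size are arbitrary, and the constant it produces is an empirical lower estimate for a workable $C_p$, not the optimal constant) exploits the homogeneity reduction used in the proof: it suffices to take $s = 1$ and $t \in [-1, 1)$.

```python
import numpy as np

def worst_Cp(p, num=20001):
    # smallest value of the ratio
    #   ( ((|s|^p + |t|^p)/2)^(2/p) - ((s+t)/2)^2 ) / (s - t)^2
    # over s = 1, t in [-1, 1); by homogeneity this determines C_p
    t = np.linspace(-1.0, 1.0, num)[:-1]          # exclude t = 1 (s = t)
    lhs = ((1.0 + np.abs(t) ** p) / 2.0) ** (2.0 / p)
    mid = ((1.0 + t) / 2.0) ** 2
    return ((lhs - mid) / (1.0 - t) ** 2).min()

for p in (1.2, 1.5, 1.8):
    Cp = worst_Cp(p)
    assert Cp > 0.0          # Lemma 9.8.1: some positive C_p works
    print(f"p={p}: empirical C_p >= {Cp:.5f}")
```

As $t \to 1$ the Taylor expansion in the proof shows the ratio tends to $(p-1)/4$, which matches what the grid computation reports near the endpoint.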
Uniformly convex spaces have strong properties. Among them is the following, which provides a geometrical proof that $L^p$ spaces are reflexive, for $1 < p < \infty$.
Theorem 9.8.2 A uniformly convex Banach space is reflexive.

Proof We consider the uniformly convex space $(E, \|.\|_E)$ as a subspace of its bidual $E^{**}$. We use the fact, implied by the Hahn–Banach theorem, that the unit sphere $S_E$ is weak*-dense in $S_{E^{**}}$. Suppose that $\psi \in S_{E^{**}}$. We shall show that for each $n \in \mathbf{N}$ there exists $x_n \in S_E$ with $\|\psi - x_n\| \le 1/n$. Thus $x_n \to \psi$ in norm, so that $\psi \in S_E$, since $S_E$ is a closed subset of the complete space $E$.

Suppose that $n \in \mathbf{N}$. By uniform convexity, there exists $\delta > 0$ such that if $x, y \in S_E$ and $\|(x+y)/2\| > 1 - \delta$ then $\|x - y\| < 1/3n$. There exists $\phi \in S_{E^*}$ such that $|\psi(\phi)| > 1 - \delta/2$. Let $M$ be the non-empty set $\{x \in S_E: |\phi(x) - \psi(\phi)| < \delta/2\}$. If $x, y \in M$ then $|\phi((x+y)/2) - \psi(\phi)| < \delta/2$, so that $|\phi((x+y)/2)| > 1 - \delta$; thus $\|(x+y)/2\| > 1 - \delta$ and so $\|x - y\| < 1/3n$. Now pick $x_n \in M$. There exists $\chi \in S_{E^*}$ such that $|(\psi - x_n)(\chi)| > \|\psi - x_n\| - 1/3n$. Let $N$ be the non-empty set

$$\{x \in S_E: |\phi(x) - \psi(\phi)| < \delta/2,\ |\chi(x) - \psi(\chi)| < 1/3n\}.$$

Note that $N \subseteq M$. Pick $y_n \in N$. Then

$$\|\psi - x_n\| \le |(\psi - x_n)(\chi)| + 1/3n \le |\chi(x_n - y_n)| + |(\psi - y_n)(\chi)| + 1/3n \le 1/3n + 1/3n + 1/3n = 1/n.$$
9.9 Notes and remarks

Fourier type was introduced by Peetre [Pee 69]. The introduction of Fourier type gives the first example of a general programme of classifying Banach spaces, according to various criteria. We begin with a result which holds for the scalars (in this case, the Hausdorff–Young inequality) and find that it holds for some, but not all, Banach spaces. The extent to which it holds for a particular space then provides a classification (in this case, Fourier type). Results of Kwapień [Kwa 72] show that a Banach space has Fourier type 2 if and only if it is isomorphic to a Hilbert space.

Uniform convexity provides another way of classifying Banach spaces. The uniform convexity of a Banach space $(E, \|.\|_E)$ is related to the behaviour of martingales taking values in $E$. Theorem 9.8.2 can be extended in an important way. We say that a Banach space $(E, \|.\|_E)$ is finitely represented in $(F, \|.\|_F)$ if the finite-dimensional subspaces of $F$ look like finite-dimensional subspaces of $E$: if $G$ is a finite-dimensional subspace of $F$ and $\epsilon > 0$ then there is a linear mapping $T: G \to E$ such that

$$\|T(g)\| \le \|g\| \le (1+\epsilon)\|T(g)\| \quad \text{for all } g \in G.$$

A Banach space $(E, \|.\|_E)$ is super-reflexive if every Banach space which is finitely represented in $E$ is reflexive. It is an easy exercise (Exercise 9.9) to show that a uniformly convex space is super-reflexive. A remarkable converse holds: if $(E, \|.\|_E)$ is super-reflexive, then $E$ is linearly isomorphic to a uniformly convex Banach space, and indeed to a $p$-uniformly convex space, for some $2 \le p < \infty$ ([Enf 73], [Pis 75]). More information about uniform convexity, and the dual notion of uniform smoothness, is given in [LiT 79].
Exercises

9.1 Suppose that $(A_0, \|.\|_{A_0})$ and $(A_1, \|.\|_{A_1})$ form a compatible couple.
(i) Show that if $(x_n)$ is a sequence in $A_0 \cap A_1$ and that $x_n \to l_0$ in $(A_0, \|.\|_{A_0})$ and $x_n \to l_1$ in $(A_1, \|.\|_{A_1})$ then $l_0 = l_1$.
(ii) Show that $(A_0 \cap A_1, \|.\|_{A_0 \cap A_1})$ is a Banach space.
(iii) Show that $\{(a, -a): a \in A_0 \cap A_1\}$ is a closed linear subspace of $(A_0, \|.\|_{A_0}) \times (A_1, \|.\|_{A_1})$.
(iv) Show that $(A_0 + A_1, \|.\|_{A_0 + A_1})$ is a Banach space.
9.2 Suppose that $f$ is a non-zero bounded continuous complex-valued function on the closed strip $\bar S = \{z = x + iy: 0 \le x \le 1\}$ which is analytic on the open strip $S = \{z = x + iy: 0 < x < 1\}$, and which satisfies $|f(iy)| \le 1$ and $|f(1+iy)| \le 1$ for $y \in \mathbf{R}$. Show that

$$\phi(w) = \frac{1}{\pi i}\log\left(i\frac{1-w}{1+w}\right)$$

maps the unit disc $D$ conformally onto $S$. What happens to the boundary of $D$?

Let $g(w) = f(\phi(w))$. Show that if $w \in D$ then

$$g(w) = \frac{1}{2\pi}\int_0^{2\pi}\frac{g(e^{i\theta})e^{i\theta}}{e^{i\theta} - w}\,d\theta.$$

Deduce that $|f(z)| \le 1$ for $z \in S$.
9.3 Suppose that $1 < p, q < \infty$ and that $1/p + 1/q = 1 + 1/r > 1$. Let $\alpha = r'/p'$, $\beta = r'/q'$. Show that $\alpha + \beta = 1$, and that if $h \in L^{r'}$ and $\|h\|_{r'} = 1$ then $|h|^\alpha \in L^{p'}$, with $\||h|^\alpha\|_{p'} = 1$, and $|h|^\beta \in L^{q'}$, with $\||h|^\beta\|_{q'} = 1$. Use this to give a direct proof of Young's inequality.


9.4 Suppose that a = (a
n
) l
2
(Z).
(i) Use the CauchySchwarz inequality to show that

n,=m

a
n
mn

_
2

n=1
1
n
2
_
1/2
|a|
2
.
(ii) Let T be the the saw-tooth function
T(e
i
) = for 0 < t < ,
= for t < 0,
= 0 for t = 0.
Show that

T
0
= 0 and that

T
n
= i/n for n ,= 0.
(iii) Calculate |T|
2
, and use the Plancherel theorem to show that

n=1
(1/n)
2
=
2
/6.
(iv) Let A(e
i
) =

m=
ia
n
e
in
, so that A L
2
(T) and

A
n
=
ia
n
. Let C = AT. Show that |C|
2
|A|
2
.
152 Complex interpolation
(v) What is c
n
? Show that

m=

n,=m
a
n
mn

2

2
|a|
2
2
.
(vi) (Hilberts inequality for l
2
(Z)). Suppose that b = (b
m
)
l
2
(Z). Show that

m=

n,=m
a
n
b
m
mn

|a|
2
|b|
2
.
9.5 Verify that the generalized Clarkson inequalities establish Clarkson's original inequalities, in the following form. Suppose that $f, g \in L^p(\Omega, \Sigma, \mu)$. If $2 \le p < \infty$ then
(a) $2(\|f\|_p^p + \|g\|_p^p) \le \|f+g\|_p^p + \|f-g\|_p^p \le 2^{p-1}(\|f\|_p^p + \|g\|_p^p)$.
(b) $2(\|f\|_p^p + \|g\|_p^p)^{p'-1} \le \|f+g\|_p^{p'} + \|f-g\|_p^{p'}$.
(c) $\|f+g\|_p^p + \|f-g\|_p^p \le 2(\|f\|_p^{p'} + \|g\|_p^{p'})^{p-1}$.
If $1 < p < 2$ then the inequalities are reversed.
9.6 Show that the restrictions of the norm topology and the weak topology to the unit sphere $S_E$ of a uniformly convex space are the same. Does a weak Cauchy sequence in $S_E$ converge in norm?

9.7 Say that a Banach space is of strict Fourier type $p$ if it is of Fourier type $p$ and $\|\mathcal{F}(f)\|_{L^{p'}(G'; E)} \le \|f\|_{L^p(G; E)}$ for all $f \in L^p(G; E)$, and all $G$. Show that a Banach space of strict Fourier type $p$ is $p'$-uniformly convex.
9.8 Suppose that $f_1, \ldots, f_d \in L^p(\Omega, \Sigma, \mu)$ and that $\epsilon_1, \ldots, \epsilon_d$ are Bernoulli functions on $D_2^d$.
(i) Show that if $1 < p < 2$ then

$$\left(\frac{1}{2^d}\sum_{\epsilon \in D_2^d}\left\|\sum_{j=1}^d \epsilon_j(\epsilon)f_j\right\|_p^{p'}\right)^{1/p'} \le \left(\sum_{j=1}^d \|f_j\|_p^p\right)^{1/p}.$$

(ii) Use a duality argument to show that if $2 < p < \infty$ then

$$\left(\frac{1}{2^d}\sum_{\epsilon \in D_2^d}\left\|\sum_{j=1}^d \epsilon_j(\epsilon)f_j\right\|_p^{p'}\right)^{1/p'} \ge \left(\sum_{j=1}^d \|f_j\|_p^p\right)^{1/p}.$$
9.9 Suppose that a Banach space $(E, \|.\|_E)$ is finitely represented in a uniformly convex Banach space $(F, \|.\|_F)$. Show that $(E, \|.\|_E)$ is uniformly convex. Show that a uniformly convex space is super-reflexive.
10

Real interpolation

10.1 The Marcinkiewicz interpolation theorem: I

We now turn to real interpolation, and in particular to the Marcinkiewicz theorem, stated by Marcinkiewicz in 1939 [Mar 39]. Marcinkiewicz was killed in the Second World War, and did not publish a proof; this was done by Zygmund in 1956 [Zyg 56]. The theorem differs from the Riesz–Thorin theorem in several respects: it applies to sublinear mappings as well as to linear mappings; the conditions at the end points of the range are weak type ones; and the conclusions can apply to a larger class of spaces than the $L^p$ spaces. But the constants in the inequalities are worse than those that occur in the Riesz–Thorin theorem.

We begin by giving a proof in the simplest case. This is sufficient for many purposes; the proof is similar to the proof of the more sophisticated result that we shall prove later, and introduces techniques that we shall use there.
Theorem 10.1.1 (The Marcinkiewicz interpolation theorem: I) Suppose that $0 < p_0 < p < p_1 \le \infty$, and that $T: L^{p_0}(\Omega, \Sigma, \mu) + L^{p_1}(\Omega, \Sigma, \mu) \to L^0(\Phi, \mathcal{T}, \nu)$ is sublinear. If $T$ is of weak type $(p_0, p_0)$, with constant $c_0$, and weak type $(p_1, p_1)$, with constant $c_1$, then $T$ is of strong type $(p, p)$, with a constant depending only on $c_0$, $c_1$, $p_0$, $p_1$ and $p$.
Proof. First we consider the case when $p_1 < \infty$. Suppose that $f \in L^p$. The idea of the proof is to decompose $f$ into two parts, one in $L^{p_0}$, and one in $L^{p_1}$, and to let this decomposition vary. For $\lambda > 0$, let $E_\lambda = \{x : |f(x)| > \lambda\}$, let $g_\lambda = fI_{E_\lambda}$ and let $h_\lambda = f-g_\lambda$. Then $g_\lambda \in L^{p_0}$, since $\|g_\lambda\|_{p_0} \le (\mu(E_\lambda))^{1/p_0-1/p}\|f\|_p$, by H\"older's inequality, and $h_\lambda \in L^{p_1}$, since $\int(|h_\lambda|/\lambda)^{p_1}\,d\mu \le \int(|h_\lambda|/\lambda)^p\,d\mu$. Since $f = g_\lambda+h_\lambda$,
$$|T(f)| \le |T(g_\lambda)|+|T(h_\lambda)|,$$
so that
$$(|T(f)| > \lambda) \subseteq (|T(g_\lambda)| > \lambda/2)\cup(|T(h_\lambda)| > \lambda/2)$$
and
$$\nu(|T(f)| > \lambda) \le \nu(|T(g_\lambda)| > \lambda/2)+\nu(|T(h_\lambda)| > \lambda/2).$$
Thus
\begin{align*}
\int|T(f)|^p\,d\nu &= p\int_0^\infty\lambda^{p-1}\nu(|T(f)| > \lambda)\,d\lambda\\
&\le p\int_0^\infty\lambda^{p-1}\nu(|T(g_\lambda)| > \lambda/2)\,d\lambda + p\int_0^\infty\lambda^{p-1}\nu(|T(h_\lambda)| > \lambda/2)\,d\lambda\\
&= I_0+I_1,\ \text{say}.
\end{align*}
Since $T$ is of weak type $(p_0,p_0)$,
\begin{align*}
I_0 &\le c_0\,p\int_0^\infty\lambda^{p-1}\left(\int|g_\lambda(x)|^{p_0}\,d\mu(x)\right)\Big/(\lambda/2)^{p_0}\,d\lambda\\
&= 2^{p_0}c_0\,p\int_0^\infty\lambda^{p-p_0-1}\left(\int_{(|f|>\lambda)}|f(x)|^{p_0}\,d\mu(x)\right)d\lambda\\
&= 2^{p_0}c_0\,p\int_\Omega|f(x)|^{p_0}\left(\int_0^{|f(x)|}\lambda^{p-p_0-1}\,d\lambda\right)d\mu(x)\\
&= \frac{2^{p_0}c_0\,p}{p-p_0}\int_\Omega|f(x)|^{p_0}|f(x)|^{p-p_0}\,d\mu(x) = \frac{2^{p_0}c_0\,p}{p-p_0}\,\|f\|_p^p.
\end{align*}
Similarly, since $T$ is of weak type $(p_1,p_1)$,
\begin{align*}
I_1 &\le c_1\,p\int_0^\infty\lambda^{p-1}\left(\int|h_\lambda(x)|^{p_1}\,d\mu(x)\right)\Big/(\lambda/2)^{p_1}\,d\lambda\\
&= 2^{p_1}c_1\,p\int_0^\infty\lambda^{p-p_1-1}\left(\int_{(|f|\le\lambda)}|f(x)|^{p_1}\,d\mu(x)\right)d\lambda\\
&= 2^{p_1}c_1\,p\int_\Omega|f(x)|^{p_1}\left(\int_{|f(x)|}^\infty\lambda^{p-p_1-1}\,d\lambda\right)d\mu(x)\\
&= \frac{2^{p_1}c_1\,p}{p_1-p}\int_\Omega|f(x)|^{p_1}|f(x)|^{p-p_1}\,d\mu(x) = \frac{2^{p_1}c_1\,p}{p_1-p}\,\|f\|_p^p.
\end{align*}
Combining these two, we have the desired result.
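The proof twice uses the distribution-function ("layer-cake") identity $\int|g|^p\,d\nu = p\int_0^\infty\lambda^{p-1}\nu(|g|>\lambda)\,d\lambda$. This is easy to test numerically; the sketch below (my own illustration, not from the text — the sample data and quadrature grid are arbitrary) checks it for counting measure on a finite set:

```python
import random

def lhs(values, p):
    # integral of |g|^p against counting measure
    return sum(abs(v) ** p for v in values)

def rhs(values, p, steps=20000):
    # p * int_0^infty lambda^(p-1) * #{|g| > lambda} d(lambda),
    # computed by midpoint quadrature on [0, max|g|]
    top = max(abs(v) for v in values)
    h = top / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        dist = sum(1 for v in values if abs(v) > lam)
        total += lam ** (p - 1) * dist * h
    return p * total

random.seed(0)
g = [random.uniform(0.1, 2.0) for _ in range(50)]
print(lhs(g, 1.7), rhs(g, 1.7))  # the two values agree to a few decimal places
```

The same identity, with $\nu$ the image measure of $T(f)$, is what turns the weak type estimates into the strong type conclusion above.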
Secondly, suppose that $p_1 = \infty$, and that $f \in L^p$. Write $f = g_\lambda+h_\lambda$, as before. Then $\|T(h_\lambda)\|_\infty \le c_1\lambda$, so that if $|T(f)(x)| > 2c_1\lambda$ then $|T(g_\lambda)(x)| > c_1\lambda$. Thus, arguing as for $I_0$ above,
\begin{align*}
\int|T(f)|^p\,d\nu &= p\int_0^\infty t^{p-1}\nu(|T(f)|>t)\,dt = p(2c_1)^p\int_0^\infty\lambda^{p-1}\nu(|T(f)|>2c_1\lambda)\,d\lambda\\
&\le p(2c_1)^p\int_0^\infty\lambda^{p-1}\nu(|T(g_\lambda)|>c_1\lambda)\,d\lambda\\
&\le p(2c_1)^pc_0\int_0^\infty\lambda^{p-1}\left(\int_\Omega|g_\lambda|^{p_0}\,d\mu\right)\Big/(c_1\lambda)^{p_0}\,d\lambda\\
&= 2^pp\,c_1^{p-p_0}c_0\int_0^\infty\lambda^{p-p_0-1}\left(\int_{(|f|>\lambda)}|f|^{p_0}\,d\mu\right)d\lambda = \frac{2^pp\,c_1^{p-p_0}c_0}{p-p_0}\,\|f\|_p^p.
\end{align*}
10.2 Lorentz spaces

In order to obtain stronger results, we need to spend some time introducing a new class of function spaces, the Lorentz spaces, and to prove a key inequality due to Hardy. The Lorentz spaces are a refinement of the $L^p$ spaces, involving a second parameter; they fit well with the proof of the Marcinkiewicz theorem. The Muirhead maximal function $f^{**}$ is an important ingredient in their study; for this reason we shall assume either that $(\Omega,\Sigma,\mu)$ is atom-free or that it is discrete, with counting measure.

We begin with weak-$L^p$. If $0 < p < \infty$, the weak-$L^p$ space $L^p_w = L^p_w(\Omega,\Sigma,\mu)$, or Lorentz space $L^{p,\infty} = L^{p,\infty}(\Omega,\Sigma,\mu)$, is defined as
$$L^{p,\infty} = \left\{f \in L^1+L^\infty : \|f\|^*_{p,\infty} = \sup_{\lambda>0}\lambda\,(\mu(|f|>\lambda))^{1/p} < \infty\right\}.$$
Note that $\|f\|^*_{p,\infty} = \sup\{t^{1/p}f^*(t) : 0 < t < \mu(\Omega)\}$. This relates to weak type: a sublinear mapping $T$ of a Banach space $E$ into $M(\Omega,\Sigma,\mu)$ is of weak type $(E,p)$ if and only if $T(E) \subseteq L^{p,\infty}$ and there exists a constant $c$ such that $\|T(f)\|^*_{p,\infty} \le c\|f\|_E$. Note that, in spite of the notation, $\|\cdot\|^*_{p,\infty}$ is not a norm (and in fact if $p \le 1$, there is no norm on $L^{p,\infty}$ equivalent to $\|\cdot\|^*_{p,\infty}$). When $1 < p < \infty$ we can do better.
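As a concrete illustration (mine, not the book's): the sequence $a_n = n^{-1/p}$ is the standard example of an element of weak-$\ell^p$ that is not in $\ell^p$ — its quasinorm $\sup_n n^{1/p}a^*_n$ equals 1 exactly, while $\sum_n a_n^p = \sum_n 1/n$ diverges.

```python
# Illustration: a_n = n^(-1/p) lies in weak-l^p (quasinorm exactly 1)
# but not in l^p (the p-th power sum is the harmonic series).
p = 2.0
N = 100000
a = [n ** (-1.0 / p) for n in range(1, N + 1)]   # already decreasing, so a* = a
weak_quasinorm = max((n + 1) ** (1.0 / p) * a[n] for n in range(N))
lp_power_sum = sum(x ** p for x in a)            # partial harmonic sum ~ log N
print(weak_quasinorm, lp_power_sum)
```

The weak quasinorm stabilises at 1 however large $N$ is taken, while the $\ell^p$ partial sums grow without bound.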
Proposition 10.2.1 Suppose that $1 < p < \infty$. Then $f \in L^{p,\infty}$ if and only if
$$\|f\|^{**}_{p,\infty} = \sup\{t^{1/p}f^{**}(t) : 0 < t < \mu(\Omega)\} < \infty.$$
Further $\|\cdot\|^{**}_{p,\infty}$ is a norm on $L^{p,\infty}$, and
$$\|f\|^*_{p,\infty} \le \|f\|^{**}_{p,\infty} \le p'\|f\|^*_{p,\infty}.$$
$(L^{p,\infty},\|\cdot\|^{**}_{p,\infty})$ is a rearrangement-invariant function space.

Proof. If $\|f\|^{**}_{p,\infty} < \infty$, then since $f^* \le f^{**}$, $\|f\|^*_{p,\infty} \le \|f\|^{**}_{p,\infty}$ and $f \in L^{p,\infty}$. On the other hand, if $f \in L^{p,\infty}$ then
$$\int_0^tf^*(s)\,ds \le \|f\|^*_{p,\infty}\int_0^ts^{-1/p}\,ds = p'\|f\|^*_{p,\infty}\,t^{1-1/p},$$
so that $t^{1/p}f^{**}(t) \le p'\|f\|^*_{p,\infty}$, and $\|f\|^{**}_{p,\infty} \le p'\|f\|^*_{p,\infty}$. Since the mapping $f \to f^{**}$ is sublinear, $\|\cdot\|^{**}_{p,\infty}$ is a norm, and finally all the conditions for $(L^{p,\infty},\|\cdot\|^{**}_{p,\infty})$ to be a rearrangement-invariant function space are readily verified.
The form of the weak-$L^p$ spaces $L^{p,\infty}$ suggests a whole spectrum of rearrangement-invariant function spaces. We define the Lorentz space $L^{p,q}$ for $0 < p < \infty$ and $0 < q < \infty$ as
$$L^{p,q} = \left\{f : \|f\|^*_{p,q} = \left(\frac{q}{p}\int_0^{\mu(\Omega)}t^{q/p}f^*(t)^q\,\frac{dt}{t}\right)^{1/q} < \infty\right\}.$$
Note that $\|f\|^*_{p,q}$ is the $L^q$ norm of $f^*$ with respect to the measure $(q/p)t^{q/p-1}\,dt = d(t^{q/p})$. Note also that $L^{p,p} = L^p$, with equality of norms. In general, however, $\|\cdot\|^*_{p,q}$ is not a norm, and if $p < 1$ or $q < 1$ there is no equivalent norm. But if $1 < p < \infty$ and $1 \le q < \infty$ then, as in Proposition 10.2.1, there is an equivalent norm. In order to prove this, we need Hardy's inequality, which is also at the heart of the general Marcinkiewicz interpolation theorem which we shall prove.
10.3 Hardy's inequality

Theorem 10.3.1 (Hardy's inequality) Suppose that $f$ is a non-negative measurable function on $[0,\infty)$. Let
$$A_{\alpha,\beta}(f)(t) = t^{-\beta}\int_0^ts^{\alpha}f(s)\,\frac{ds}{s},\qquad B_{\alpha,\beta}(f)(t) = t^{\beta}\int_t^\infty s^{-\alpha}f(s)\,\frac{ds}{s},$$
for $-\infty < \alpha < \infty$ and $\beta > 0$. If $1 \le q < \infty$ then
$$\text{(i)}\quad\int_0^\infty(A_{\alpha,\beta}(f)(t))^q\,\frac{dt}{t} \le \frac{1}{\beta^q}\int_0^\infty(t^{\alpha-\beta}f(t))^q\,\frac{dt}{t},$$
and
$$\text{(ii)}\quad\int_0^\infty(B_{\alpha,\beta}(f)(t))^q\,\frac{dt}{t} \le \frac{1}{\beta^q}\int_0^\infty(t^{-\alpha+\beta}f(t))^q\,\frac{dt}{t}.$$
Proof. We shall first prove this in the case where $\alpha = 1$ and $q = 1$. Then
$$\int_0^\infty A_{1,\beta}(f)(t)\,\frac{dt}{t} = \int_0^\infty t^{-\beta-1}\left(\int_0^tf(u)\,du\right)dt = \int_0^\infty\left(\int_u^\infty t^{-\beta-1}\,dt\right)f(u)\,du = \frac{1}{\beta}\int_0^\infty u^{-\beta}f(u)\,du,$$
and so in this case we have equality.

Next, suppose that $\alpha = 1$ and $1 < q < \infty$. We write $f(s) = s^{(\beta-1)/q'}\cdot s^{-(\beta-1)/q'}f(s)$, and apply H\"older's inequality:
$$\int_0^tf(s)\,ds \le \left(\int_0^ts^{\beta-1}\,ds\right)^{1/q'}\left(\int_0^ts^{-(\beta-1)q/q'}f(s)^q\,ds\right)^{1/q} = \left(\frac{t^\beta}{\beta}\right)^{1/q'}\left(\int_0^ts^{-(\beta-1)q/q'}f(s)^q\,ds\right)^{1/q},$$
so that, since $q/q' = q-1$,
$$(A_{1,\beta}(f)(t))^q \le \frac{1}{\beta^{q-1}}\,t^{-\beta}\int_0^ts^{-(\beta-1)(q-1)}f(s)^q\,ds.$$
Thus
\begin{align*}
\int_0^\infty(A_{1,\beta}(f)(t))^q\,\frac{dt}{t} &\le \frac{1}{\beta^{q-1}}\int_0^\infty t^{-\beta-1}\left(\int_0^ts^{-(\beta-1)(q-1)}f(s)^q\,ds\right)dt\\
&= \frac{1}{\beta^{q-1}}\int_0^\infty\left(\int_s^\infty t^{-\beta-1}\,dt\right)s^{-(\beta-1)(q-1)}f(s)^q\,ds\\
&= \frac{1}{\beta^q}\int_0^\infty s^{-\beta-(\beta-1)(q-1)}f(s)^q\,ds = \frac{1}{\beta^q}\int_0^\infty(s^{1-\beta}f(s))^q\,\frac{ds}{s}.
\end{align*}
The general form of (i) now follows by applying this to the function $s^{\alpha-1}f(s)$.

To prove (ii), we set $g(u) = f(1/u)$ and $u = 1/s$. Then
$$B_{\alpha,\beta}(f)(t) = t^{\beta}\int_t^\infty s^{-\alpha}f(s)\,\frac{ds}{s} = t^{\beta}\int_0^{1/t}u^{\alpha}g(u)\,\frac{du}{u} = A_{\alpha,\beta}(g)(1/t),$$
so that
\begin{align*}
\int_0^\infty(B_{\alpha,\beta}(f)(t))^q\,\frac{dt}{t} &= \int_0^\infty(A_{\alpha,\beta}(g)(1/t))^q\,\frac{dt}{t} = \int_0^\infty(A_{\alpha,\beta}(g)(t))^q\,\frac{dt}{t}\\
&\le \frac{1}{\beta^q}\int_0^\infty(t^{\alpha-\beta}g(t))^q\,\frac{dt}{t} = \frac{1}{\beta^q}\int_0^\infty(t^{-\alpha+\beta}f(t))^q\,\frac{dt}{t}.
\end{align*}
If we set $\alpha = 1$ and apply the result to $f^*$, we obtain the following:

Corollary 10.3.1 If $f \in (L^1+L^\infty)(\Omega,\Sigma,\mu)$ then
$$\int t^{(1-\beta)q}f^{**}(t)^q\,\frac{dt}{t} \le \frac{1}{\beta^q}\int t^{(1-\beta)q}f^*(t)^q\,\frac{dt}{t}.$$
Note that if we set $\alpha = 1$ and $\beta = 1/q'$, we obtain the Hardy--Riesz inequality.
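A numerical sanity check of part (i) is straightforward (my own discretisation, not from the text): take $\alpha = 1$, $\beta = 1/2$, $q = 2$ and $f = I_{[0,1]}$, for which $\int_0^t f\,ds = \min(t,1)$ can be written down exactly.

```python
# Hardy's inequality (i) with alpha = 1:
#   int_0^inf (t^(-beta) * int_0^t f ds)^q dt/t
#     <= beta^(-q) * int_0^inf (t^(1-beta) f(t))^q dt/t
beta, q = 0.5, 2.0

def quad(g, a, b, n=20000):
    # midpoint rule
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda s: 1.0 if s <= 1.0 else 0.0      # f = indicator of [0, 1]
F = lambda t: min(t, 1.0)                   # exact primitive of f

lhs = quad(lambda t: (t ** (-beta) * F(t)) ** q / t, 1e-6, 50.0)
rhs = beta ** (-q) * quad(lambda t: (t ** (1 - beta) * f(t)) ** q / t, 1e-6, 1.0)
print(lhs, rhs)   # lhs stays below rhs, as the inequality predicts
```

For this $f$ both sides can also be computed in closed form (the left side is $\approx 1.98$, the right side $4$), so the quadrature doubles as a check on itself.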
10.4 The scale of Lorentz spaces

We now have the following result, which complements Proposition 10.2.1.

Theorem 10.4.1 Suppose that $1 < p < \infty$, $1 \le q < \infty$. Then $f \in L^{p,q}$ if and only if
$$\|f\|^{**}_{p,q} = \left(\frac{q}{p}\int_0^{\mu(\Omega)}t^{q/p}f^{**}(t)^q\,\frac{dt}{t}\right)^{1/q} < \infty.$$
Further $\|\cdot\|^{**}_{p,q}$ is a norm on $L^{p,q}$, and
$$\|f\|^*_{p,q} \le \|f\|^{**}_{p,q} \le p'\|f\|^*_{p,q}.$$
$(L^{p,q},\|\cdot\|^{**}_{p,q})$ is a rearrangement-invariant function space.

Proof. The result follows from the corollary to Hardy's inequality, setting $\beta = 1/p'$. $\|f\|^{**}_{p,q}$ is a norm, since $f \to f^{**}$ is sublinear, and the rest follows as in Proposition 10.2.1.
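For step functions the quasinorms $\|\cdot\|^*_{p,q}$ can be evaluated in closed form, which makes the basic facts easy to check numerically. The sketch below (my own, with arbitrary sample data) verifies that $\|f\|^*_{p,p}$ is the ordinary $L^p$ norm, and previews the monotonicity in $q$ proved in Theorem 10.4.2:

```python
def lorentz_quasinorm(a, p, q):
    # ||f||*_{p,q} for the step function f = sum_j a_j I_((j-1, j]) on (0, inf):
    # f* is the decreasing rearrangement, and the defining integral is
    # evaluated exactly piece by piece, since
    # (q/p) int_{j-1}^j t^(q/p - 1) dt = j^(q/p) - (j-1)^(q/p).
    astar = sorted((abs(v) for v in a), reverse=True)
    total = sum(v ** q * (j ** (q / p) - (j - 1) ** (q / p))
                for j, v in enumerate(astar, start=1))
    return total ** (1.0 / q)

a = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
p = 1.5
n1 = lorentz_quasinorm(a, p, 1.0)
np_ = lorentz_quasinorm(a, p, p)       # q = p recovers the L^p norm
n4 = lorentz_quasinorm(a, p, 4.0)
lp = sum(abs(v) ** p for v in a) ** (1.0 / p)
print(n1, np_, n4, lp)
```

With $q = p$ the bracketed factors telescope to 1 for every $j$, giving $\sum_j (a^*_j)^p = \sum_j |a_j|^p$ exactly.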
What is the relation between the various $L^{p,q}$ spaces, as the indices vary? First, let us keep $p$ fixed, and let $q$ vary.

Theorem 10.4.2 If $0 < p < \infty$ and $1 \le q < r \le \infty$ then $L^{p,q} \subseteq L^{p,r}$,
$$\|f\|^*_{p,r} \le \|f\|^*_{p,q}\quad\text{and}\quad\|f\|^{**}_{p,r} \le \|f\|^{**}_{p,q}.$$

Proof. If $f \in L^{p,q}$ and $0 < t < \mu(\Omega)$ then
$$t^{1/p}f^*(t) = \left(\frac{q}{p}\int_0^t(s^{1/p}f^*(t))^q\,\frac{ds}{s}\right)^{1/q} \le \left(\frac{q}{p}\int_0^{\mu(\Omega)}(s^{1/p}f^*(s))^q\,\frac{ds}{s}\right)^{1/q} = \|f\|^*_{p,q},$$
so that $L^{p,q} \subseteq L^{p,\infty}$, and the inclusion is norm decreasing. The same argument works for the norms $\|f\|^{**}_{p,q}$ and $\|f\|^{**}_{p,\infty}$.
Suppose that $1 \le q < r < \infty$. Since
$$\frac{q}{p}\int_0^\infty(t^{1/p}h(t))^q\,\frac{dt}{t} = q\int_0^\infty(th(t^p))^q\,\frac{dt}{t},$$
for $h$ a non-negative measurable function, we need only show that if $g$ is a decreasing function on $[0,\infty)$ then
$$\left(q\int_0^\infty t^qg(t)^q\,\frac{dt}{t}\right)^{1/q}$$
is a decreasing function of $q$.

We first consider the case where $1 = q < r$. We can approximate $g$ from below by an increasing sequence of decreasing step functions, and so it is enough to consider such functions. We take $g$ of the form
$$g = \sum_{j=1}^Ja_jI_{[0,t_j]},\quad\text{where } a_j > 0 \text{ and } t_j > 0 \text{ for } 1 \le j \le J.$$
Then, applying Minkowski's inequality,
$$\left(r\int_0^\infty t^r(g(t))^r\,\frac{dt}{t}\right)^{1/r} \le \sum_{j=1}^J\left(ra_j^r\int_0^{t_j}t^{r-1}\,dt\right)^{1/r} = \sum_{j=1}^Ja_jt_j = \int_0^\infty tg(t)\,\frac{dt}{t}.$$
Next, suppose that $1 < q < r$. Let $\gamma = r/q$, let $h(t) = (g(t^{1/q}))^q$ and let $u = t^q$, so that $h(u) = (g(t))^q$. Then changing variables, and using the result above,
$$\left(r\int_0^\infty t^r(g(t))^r\,\frac{dt}{t}\right)^{1/r} = \left(\gamma\int_0^\infty u^\gamma(h(u))^\gamma\,\frac{du}{u}\right)^{1/\gamma q} \le \left(\int_0^\infty uh(u)\,\frac{du}{u}\right)^{1/q} = \left(q\int_0^\infty t^q(g(t))^q\,\frac{dt}{t}\right)^{1/q}.$$
What happens as $p$ varies? If $(\Omega,\Sigma,\mu)$ is non-atomic and $\mu(\Omega) = \infty$, we can expect no patterns of inclusions, since there is none for the spaces $L^p = L^{p,p}$. When $(\Omega,\Sigma,\mu)$ is non-atomic and of finite measure, we have the following.

Proposition 10.4.1 Suppose that $(\Omega,\Sigma,\mu)$ is non-atomic and that $\mu(\Omega) < \infty$. Then if $0 < p_1 < p_2 < \infty$, $L^{p_2,q_2} \subseteq L^{p_1,q_1}$ for any $q_1, q_2$, with continuous inclusion.

Proof. Because of Theorem 10.4.2, it is enough to show that $L^{p_2,\infty} \subseteq L^{p_1,1}$, with continuous inclusion. But if $f \in L^{p_2,\infty}$ then
$$\frac{1}{p_1}\int_0^{\mu(\Omega)}t^{1/p_1}f^*(t)\,\frac{dt}{t} \le \left(\frac{1}{p_1}\int_0^{\mu(\Omega)}t^{1/p_1-1/p_2}\,\frac{dt}{t}\right)\|f\|^*_{p_2,\infty} = \frac{p_2}{p_2-p_1}\,(\mu(\Omega))^{1/p_1-1/p_2}\,\|f\|^*_{p_2,\infty}.$$
When $(\Omega,\Sigma,\mu)$ is atomic, we can take $\Omega = \mathbb{N}$. We then denote the Lorentz space by $l^{p,q}$. In this case, as you might expect, the inclusions go the other way.

Proposition 10.4.2 If $0 < p_1 < p_2 < \infty$, then $l^{p_1,q_1} \subseteq l^{p_2,q_2}$ for any $q_1, q_2$, with continuous inclusion.

Proof. Again it is enough to show that $l^{p_1,\infty} \subseteq l^{p_2,1}$, with continuous inclusion. But if $x \in l^{p_1,\infty}$ then
$$\frac{1}{p_2}\sum_{n=1}^\infty n^{1/p_2-1}x^*_n \le \left(\frac{1}{p_2}\sum_{n=1}^\infty n^{1/p_2-1/p_1-1}\right)\|x\|^*_{p_1,\infty}.$$
10.5 The Marcinkiewicz interpolation theorem: II

We now come to a more general version of the Marcinkiewicz interpolation theorem: we weaken the conditions, and obtain a stronger result. The proof that we give is due to Hunt [Hun 64].

Theorem 10.5.1 (The Marcinkiewicz interpolation theorem: II) Suppose that $1 \le p_0 < p_1 < \infty$ and $1 \le q_0, q_1 \le \infty$, with $q_0 \ne q_1$, and that $T$ is a sublinear operator from $L^{p_0,1}(\Omega',\Sigma',\mu')+L^{p_1,1}(\Omega',\Sigma',\mu')$ to $M_1(\Omega,\Sigma,\mu)$ which is of weak types $(L^{p_0,1},q_0)$ and $(L^{p_1,1},q_1)$. Suppose that $0 < \theta < 1$, and set
$$\frac{1}{p} = \frac{1-\theta}{p_0}+\frac{\theta}{p_1},\qquad\frac{1}{q} = \frac{1-\theta}{q_0}+\frac{\theta}{q_1}.$$
Then if $1 \le r \le \infty$ there exists a constant $B$, depending only on $p_0, p_1, q_0, q_1, \theta, r$ and the weak type constants, such that $\|T(f)\|^*_{q,r} \le B\|f\|^*_{p,r}$, for $f \in L^{p,r}$.

Corollary 10.5.1 If $q \ge p$ then there exists a constant $B$ such that $\|T(f)\|_q \le B\|f\|_p$.

Hunt [Hun 64] has shown that the result is false if $q < p$.
Proof. Before beginning the proof, some comments are in order. First, it is easy to check that $L^{p,r} \subseteq L^{p_0,1}+L^{p_1,1}$ for $p_0 < p < p_1$ and $1 \le r \le \infty$. Second, we shall only give the proof when all of the indices are finite; a separate proof is needed when one or more index is infinite, but the proofs are easier. Thirdly, we shall not keep a close account of the constants that accrue, but will introduce constants $C_i$ without comment.
We set
$$\delta = \frac{1/q_0-1/q_1}{1/p_0-1/p_1}\quad\left(= \frac{1/q_0-1/q}{1/p_0-1/p} = \frac{1/q-1/q_1}{1/p-1/p_1}\right).$$
Note that $\delta$ can be positive or negative.

Suppose that $f \in L^{p,r}$. We split $f$ in much the same way as in Theorem 10.1.1. We set
$$g_\lambda(x) = f(x)\ \text{if } |f(x)| > f^*(\lambda^\delta),\qquad g_\lambda(x) = 0\ \text{otherwise},$$
and set $h_\lambda = f-g_\lambda$.

Since $T$ is sublinear, $|T(f)| \le |T(g_\lambda)|+|T(h_\lambda)|$, and so $(T(f))^*(\lambda) \le T(g_\lambda)^*(\lambda/2)+T(h_\lambda)^*(\lambda/2)$. Thus
\begin{align*}
\|T(f)\|^*_{q,r} &\le \left(\frac{r}{q}\int_0^\infty\bigl(\lambda^{1/q}(T(g_\lambda)^*(\lambda/2)+T(h_\lambda)^*(\lambda/2))\bigr)^r\,\frac{d\lambda}{\lambda}\right)^{1/r}\\
&\le \left(\frac{r}{q}\int_0^\infty(\lambda^{1/q}T(g_\lambda)^*(\lambda/2))^r\,\frac{d\lambda}{\lambda}\right)^{1/r}+\left(\frac{r}{q}\int_0^\infty(\lambda^{1/q}T(h_\lambda)^*(\lambda/2))^r\,\frac{d\lambda}{\lambda}\right)^{1/r}\\
&= J_0+J_1,\ \text{say}.
\end{align*}
We consider each term separately.
Since $T$ is of weak type $(L^{p_0,1},q_0)$,
$$T(g_\lambda)^*(\lambda/2) \le C_0\left(\frac{2}{\lambda}\right)^{1/q_0}\|g_\lambda\|^*_{p_0,1}.$$
But $g_\lambda^* \le f^*I_{[0,\lambda^\delta)}$, so that
$$\|g_\lambda\|^*_{p_0,1} \le \frac{1}{p_0}\int_0^{\lambda^\delta}s^{1/p_0}f^*(s)\,\frac{ds}{s}.$$
Thus
\begin{align*}
J_0^r &\le C_1\int_0^\infty\left(\lambda^{1/q-1/q_0}\int_0^{\lambda^\delta}s^{1/p_0}f^*(s)\,\frac{ds}{s}\right)^r\,\frac{d\lambda}{\lambda}\\
&= C_2\int_0^\infty\left(u^{1/p-1/p_0}\int_0^us^{1/p_0}f^*(s)\,\frac{ds}{s}\right)^r\,\frac{du}{u}\quad(\text{where } u = \lambda^\delta)\\
&\le C_3\int_0^\infty\left(u^{1/p}f^*(u)\right)^r\,\frac{du}{u}\quad(\text{using Hardy's inequality})\\
&= C_4(\|f\|^*_{p,r})^r.
\end{align*}
Similarly, since $T$ is of weak type $(L^{p_1,1},q_1)$,
$$T(h_\lambda)^*(\lambda/2) \le C_5\left(\frac{2}{\lambda}\right)^{1/q_1}\|h_\lambda\|^*_{p_1,1}.$$
But $h_\lambda^* \le f^*(\lambda^\delta)$ and $h_\lambda^* \le f^*$, so that
$$\|h_\lambda\|^*_{p_1,1} \le \lambda^{\delta/p_1}f^*(\lambda^\delta)+\frac{1}{p_1}\int_{\lambda^\delta}^\infty s^{1/p_1}f^*(s)\,\frac{ds}{s}.$$
Thus
$$J_1^r \le C_6\int_0^\infty\left(\lambda^{1/q-1/q_1}\left(\lambda^{\delta/p_1}f^*(\lambda^\delta)+\frac{1}{p_1}\int_{\lambda^\delta}^\infty s^{1/p_1}f^*(s)\,\frac{ds}{s}\right)\right)^r\,\frac{d\lambda}{\lambda},$$
so that $J_1 \le C_7(K_1+K_2)$, where
$$K_1 = \left(\int_0^\infty(\lambda^{1/q-1/q_1+\delta/p_1}f^*(\lambda^\delta))^r\,\frac{d\lambda}{\lambda}\right)^{1/r} = \left(\frac{1}{|\delta|}\int_0^\infty(u^{1/p}f^*(u))^r\,\frac{du}{u}\right)^{1/r}\quad(\text{where } u = \lambda^\delta)\ \le\ C_8\|f\|^*_{p,r},$$
and
\begin{align*}
K_2^r &= \int_0^\infty\left(\lambda^{1/q-1/q_1}\int_{\lambda^\delta}^\infty s^{1/p_1}f^*(s)\,\frac{ds}{s}\right)^r\,\frac{d\lambda}{\lambda}\\
&= \frac{1}{|\delta|}\int_0^\infty\left(u^{1/p-1/p_1}\int_u^\infty s^{1/p_1}f^*(s)\,\frac{ds}{s}\right)^r\,\frac{du}{u}\quad(\text{where } u = \lambda^\delta)\\
&\le C_9(\|f\|^*_{p,r})^r,
\end{align*}
using Hardy's inequality again. This completes the proof.
We have the following extension of the Hausdorff--Young inequality.

Corollary 10.5.2 (Paley's inequality) If $G$ is a locally compact abelian group then the Fourier transform is a continuous linear mapping from $L^p(G)$ to the Lorentz space $L^{p',p}(\hat G)$, for $1 < p < 2$.

In detail, when $G = \mathbb{R}^d$ this says that there are constants $C_p$ and $K_p$ such that
$$\left(\int_{\mathbb{R}^d}|\hat f(u)|^p|u|^{d(p-2)}\,du\right)^{1/p} \le K_p\left(\int_0^\infty((\hat f)^*(t))^pt^{p-2}\,dt\right)^{1/p} \le K_pC_p\|f\|_p.$$
(Paley's proof was different!)
10.6 Notes and remarks

The Marcinkiewicz theorem has inspired a whole theory of interpolation spaces. This theory is developed in detail in the books by Bergh and L\"ofstr\"om [BeL 76] and Bennett and Sharpley [BeS 88].

The Lorentz spaces were introduced by Lorentz [Lor 50]. More details can be found in [Hun 66], [StW 71] and [BeS 88].

Exercises

10.1 Show that the simple functions are dense in $L^{p,q}$ when $p$ and $q$ are finite.

10.2 Suppose that $(E,\|\cdot\|_E)$ is a Banach function space, and that $1 \le p < \infty$. Suppose that $\|I_A\| \le \mu(A)^{1/p}$ for all sets $A$ of finite measure. Show that $L^{p,1} \subseteq E$ and that the inclusion mapping is continuous.

10.3 Suppose that $(E,\|\cdot\|_E)$ is a Banach function space in which the simple functions are dense, and that $1 \le p < \infty$. Suppose that $\|I_A\| \ge \mu(A)^{1/p}$ for all sets $A$ of finite measure. Show that $E \subseteq L^{p,\infty}$ and that the inclusion mapping is continuous.

10.4 Prove Theorem 10.5.1 when $r = \infty$, and when $q_0$ or $q_1$ is infinite.
11

The Hilbert transform, and Hilbert's inequalities

11.1 The conjugate Poisson kernel

We now consider the Hilbert transform, one of the fundamental operators of harmonic analysis. We begin by studying the Hilbert transform on the real line $\mathbb{R}$, and show how the results that we have established in earlier chapters are used to establish its properties. We then more briefly discuss the Hilbert transform on the circle $\mathbb{T}$. Finally we show how the techniques that we have developed can be applied to singular integral operators on $\mathbb{R}^d$.

Suppose that $f \in L^p(\mathbb{R})$, where $1 \le p < \infty$. Recall that in Section 8.11 we used the Poisson kernel
$$P(x,t) = P_t(x) = \frac{t}{\pi(x^2+t^2)}$$
to construct a harmonic function $u(x,t) = u_t(x) = (P_t\star f)(x)$ on the upper half-space $H^2 = \{(x,t) : t > 0\}$ such that $u_t \in L^p$, and $u_t \to f$ in $L^p$ norm and almost everywhere (Theorem 8.11.1 and Corollary 8.11.3). We can however think of $H^2$ as the upper half-plane $\mathbb{C}^+ = \{z = x+it : t > 0\}$ in the complex plane, and then $u$ is the real part of an analytic function $u+iv$ on $\mathbb{C}^+$, unique up to a constant. We now turn to the study of this function.
We start with the Poisson kernel. If $z = x+it$ then
$$\frac{i}{\pi z} = \frac{t}{\pi(x^2+t^2)}+\frac{ix}{\pi(x^2+t^2)} = P(x,t)+iQ(x,t) = P_t(x)+iQ_t(x).$$
$P$ is the Poisson kernel, and $Q$ is the conjugate Poisson kernel. Since $(P+iQ)(x+it)$ is analytic in $x+it$, $Q$ is harmonic. Note that $(Q_t)$ is not an approximate identity: it is an odd function and is not integrable. On the other hand, $Q_t \in L^p(\lambda)$ for $1 < p \le \infty$, and for each such $p$ there exists $k_p$ such that $\|Q_t\|_p \le k_p/t^{1/p'}$. This is easy to see when $p = \infty$, since $|Q_t(x)| \le Q_t(t) = 1/2\pi t$. If $1 < p < \infty$,
\begin{align*}
\int_{\mathbb{R}}|Q_t(x)|^p\,dx &= \frac{2}{\pi^p}\left(\int_0^t\frac{x^p}{(x^2+t^2)^p}\,dx+\int_t^\infty\frac{x^p}{(x^2+t^2)^p}\,dx\right)\\
&\le \frac{2}{\pi^p}\left(\int_0^t\frac{dx}{t^p}+\int_t^\infty\frac{dx}{x^p}\right) = \frac{2}{\pi^p}\left(1+\frac{1}{p-1}\right)\frac{1}{t^{p-1}} = \frac{2p}{\pi^p(p-1)t^{p-1}} = k_p^p/t^{p/p'}.
\end{align*}
If $f \in L^p(\mathbb{R})$, where $1 \le p < \infty$, we can therefore define
$$Q_t(f)(x) = v_t(x) = v(x,t) = (Q_t\star f)(x) = \frac{1}{\pi}\int_{\mathbb{R}}\frac{yf(x-y)}{y^2+t^2}\,dy,$$
and then $u+iv$ is analytic. Thus $v$ is harmonic in $(x,t)$. Further,
$$|v(x,t)| \le \|Q_t\|_{p'}\|f\|_p \le k_{p'}\|f\|_p/t^{1/p},$$
and $v$ is well-behaved at infinity. But what happens when $t \to 0$?
11.2 The Hilbert transform on $L^2(\mathbb{R})$

We first consider the simplest case, when $p = 2$. Since each $Q_t$ is a convolution operator, it is sensible to consider Fourier transforms. Simple calculations, using the calculus of residues, and Jordan's lemma (Exercise 11.1), show that
$$\mathcal{F}(P_t)(\xi) = \hat P_t(\xi) = e^{-2\pi t|\xi|}\quad\text{and}\quad\mathcal{F}(Q_t)(\xi) = \hat Q_t(\xi) = -i\,\mathrm{sgn}(\xi)e^{-2\pi t|\xi|}.$$
Here, an essential feature is that the Fourier transforms of $Q_t$ are uniformly bounded. Then
$$\hat v_t(\xi) = \hat Q_t(\xi)\hat f(\xi) = -i\,\mathrm{sgn}(\xi)e^{-2\pi t|\xi|}\hat f(\xi),$$
so that
$$\|v_t\|_2 = \|\hat v_t\|_2 \le \|\hat f\|_2 = \|f\|_2,$$
by Plancherel's theorem. Let
$$w(\xi) = -i\,\mathrm{sgn}(\xi)\hat f(\xi).$$
Then $w \in L^2$ and $\|w\|_2 = \|\hat f\|_2 = \|f\|_2$. Further,
$$|\hat v_t(\xi)-w(\xi)|^2 \le 4|w(\xi)|^2,$$
so that by the theorem of dominated convergence, $\hat v_t \to w$ in $L^2$-norm. We define the Hilbert transform $H(f)$ to be the inverse Fourier transform of $w$. Then by Plancherel's theorem again, $\|H(f)\|_2 = \|f\|_2$ and $v_t \to H(f)$ in $L^2$-norm. Further $\hat v_t = \hat P_t\widehat{H(f)}$, and so $v_t = P_t\star(H(f))$. Thus $v_t \to H(f)$ in $L^2$ norm and almost everywhere, by Theorem 8.11.1 and Corollary 8.11.3. Finally,
$$\widehat{H^2(f)}(\xi) = -i\,\mathrm{sgn}(\xi)\widehat{H(f)}(\xi) = -\hat f(\xi),$$
so that $H$ is an isometry of $L^2(\mathbb{R})$ onto $L^2(\mathbb{R})$. Let us sum up what we have shown. Let
$$Q^*(f)(x) = \sup_{t>0}|Q_t(f)(x)| = \sup_{t>0}|v_t(x)|.$$
$Q^*$ is sublinear.

Theorem 11.2.1 The Hilbert transform $H$ is an isometry of $L^2(\mathbb{R})$ onto $L^2(\mathbb{R})$, and $H^2(f) = -f$, for $f \in L^2(\mathbb{R})$. $Q_t(f) = P_t\star(H(f))$, so that $Q_t(f) \to H(f)$ in norm, and almost everywhere, and $\|Q^*(f)\|_2 \le 2\|f\|_2$.
We have defined the Hilbert transform in terms of Fourier transforms. Can we proceed more directly? As $t \to 0$, $Q_t(x) \to 1/\pi x$ and $\hat Q_t(\xi) \to -i\,\mathrm{sgn}(\xi)$. This suggests that we should define $H(f)$ as $h\star f$, where $h(x) = 1/\pi x$. But $h$ has a singularity at the origin, which we must deal with. Let us set $h_\epsilon(x) = h(x)$ if $|x| \ge \epsilon$ and $h_\epsilon(x) = 0$ if $|x| < \epsilon$. Then $h_\epsilon$ is not integrable, but it is in $L^p$ for $1 < p \le \infty$. Thus if $f \in L^2$ we can define
$$H_\epsilon(f)(x) = (h_\epsilon\star f)(x) = \frac{1}{\pi}\int_{|x-y|>\epsilon}\frac{f(y)}{x-y}\,dy,$$
and $|H_\epsilon(f)(x)| \le \|h_\epsilon\|_2\|f\|_2$.

Although neither $Q_\epsilon$ nor $H_\epsilon$ is integrable, their difference is, and it can be dominated by a bell-shaped function. This allows us to transfer results from $Q_t(f)$ to $H_\epsilon(f)$. Let $H^*(f)(x) = \sup_{\epsilon>0}|H_\epsilon(f)(x)|$. $H^*$ is sublinear; it is called the maximal Hilbert transform.
Proposition 11.2.1 (Cotlar's inequality: $p = 2$) Suppose that $f \in L^2(\mathbb{R})$. Then $H^*(f) \le m(H(f))+2m(f)$, and $H^*$ is of strong type $(2,2)$.

Proof. Let $\gamma = \log(e/2)$, and let
$$L(x) = \frac{1}{2}+\gamma(1-|x|)\quad\text{for } |x| \le 1,\qquad L(x) = \left|\frac{1}{x}-\frac{x}{x^2+1}\right|\quad\text{for } |x| > 1.$$
Then $L$ is a continuous even integrable function on $\mathbb{R}$, and it is strictly decreasing on $[0,\infty)$. $\|L\|_1 = 1+\gamma+\log 2 = 2$. Let $\phi = L/2$. Then $\phi$ is a bell-shaped approximate identity, and $|h_\epsilon-Q_\epsilon| \le 2\phi_\epsilon$. Thus if $f \in L^2$, $|H_\epsilon(f)| \le |Q_\epsilon(f)|+2m(f)$, by Theorem 8.11.2. But $|Q_\epsilon(f)| = |P_\epsilon\star(H(f))| \le m(H(f))$, again by Theorem 8.11.2. Thus $H^*(f) \le m(H(f))+2m(f)$. By Theorem 8.5.1, $H^*$ is of strong type $(2,2)$.
Theorem 11.2.2 Suppose that $f \in L^2(\mathbb{R})$. Then $H_\epsilon(f) \to H(f)$ in $L^2$ norm, and almost everywhere.

The limit
$$\lim_{\epsilon\to 0}\frac{1}{\pi}\int_{|x-y|>\epsilon}\frac{f(y)}{x-y}\,dy$$
is the Cauchy principal value of $\frac{1}{\pi}\int f(y)/(x-y)\,dy$.

Proof. If $f$ is a step function, $H_\epsilon(f)-Q_\epsilon(f) \to 0$ except at the points of discontinuity of $f$. Thus it follows from Theorem 8.4.2 that if $f \in L^2$ then $H_\epsilon(f)-Q_\epsilon(f) \to 0$ almost everywhere, and so $H_\epsilon(f) \to H(f)$ almost everywhere. Since $|H_\epsilon(f)-Q_\epsilon(f)|^2 \le 4(m(f))^2$, it follows from the theorem of dominated convergence that $\|H_\epsilon(f)-Q_\epsilon(f)\|_2 \to 0$, and so $H_\epsilon(f) \to H(f)$ in $L^2$ norm.
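The multiplier description $\widehat{H(f)}(\xi) = -i\,\mathrm{sgn}(\xi)\hat f(\xi)$ is easy to imitate on a finite grid. The sketch below (my own, not from the text) realises it with a discrete Fourier transform and checks that the resulting operator is an isometry with $H^2 = -I$ on the relevant subspace (signals with no constant or Nyquist component):

```python
import cmath, math, random

N = 64

def dft(x):
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)) / N for n in range(N)]

def sgn(k):
    # signed frequency for DFT index k; indices 0 and N/2 carry sign 0
    if k == 0 or 2 * k == N:
        return 0
    return 1 if k < N // 2 else -1

def hilbert(x):
    X = dft(x)
    return idft([-1j * sgn(k) * X[k] for k in range(N)])

random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(N)]
# remove the constant and Nyquist components, on which the multiplier vanishes
m0 = sum(x) / N
m1 = sum(x[n] * (-1) ** n for n in range(N)) / N
x = [x[n] - m0 - m1 * (-1) ** n for n in range(N)]

norm = lambda v: math.sqrt(sum(abs(c) ** 2 for c in v))
hx = hilbert(x)
hhx = hilbert(hx)
print(norm(x), norm(hx))                              # equal: an isometry
print(norm([hhx[n] + x[n] for n in range(N)]))        # ~0: H^2 = -I here
```

On the excluded two-dimensional subspace the discrete multiplier is zero, just as $H$ on $\mathbb{T}$ kills constants; on its complement the identities of Theorem 11.2.1 hold exactly.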
11.3 The Hilbert transform on $L^p(\mathbb{R})$ for $1 < p < \infty$

What about other values of $p$? The key step is to establish a weak type $(1,1)$ inequality: we can then use Marcinkiewicz interpolation and duality to deal with other values of $p$. Kolmogorov [Kol 25] showed that the mapping $f \to H(f)$ is of weak type $(1,1)$, giving a proof which is a tour de force of argument by contradiction. Subsequent proofs have been given, using the harmonicity of the kernels, and the analyticity of $P+iQ$. We shall however introduce techniques due to Calder\'on and Zygmund [CaZ 52], applying them to the Hilbert transform. These techniques provide a powerful tool for studying other more general singular integral operators, and we shall describe these at the end of the chapter.

Theorem 11.3.1 The mapping $f \to Q^*(f)$ is of weak type $(1,1)$.
Proof. By Theorem 11.2.1, $Q^*$ is of strong type $(2,2)$. Suppose that $f \in L^1$. Without loss of generality we need only consider $f \ge 0$. We consider the dyadic filtration $(\mathcal{F}_j)$, and set $f_j = \mathbb{E}(f|\mathcal{F}_j)$.

Suppose that $\alpha > 0$. Let $\tau$ be the stopping time $\tau = \inf\{j : f_j > \alpha\}$, as in Doob's lemma. Since $f_j \le 2^j\|f\|_1$, $\tau > -\infty$. We set $M_j = (\tau = j)$, $M = \cup_j(M_j) = (\tau < \infty)$ and $L = (\tau = \infty)$. We define
$$g(x) = f(x)\ \text{if } x \in L,\qquad g(x) = f_j(x)\ \text{if } x \in M_j.$$
The function $g$ is the good part of $f$; note that $\|g\|_1 = \|f\|_1$. The function $b = f-g$ is the bad part of $f$; $\|b\|_1 \le 2\|f\|_1$. Since
$$(|Q^*(f)| > \alpha) \subseteq (|Q^*(g)| > \alpha/2)\cup(|Q^*(b)| > \alpha/2),$$
we can consider the two parts separately.

We begin with the good part. If $x \in M_j$, then $f_{j-1}(x) \le \alpha$, so that, since $f \ge 0$, $f_j(x) \le 2\alpha$. If $x \in L$, $f_j(x) \le \alpha$ for all $j$, so that by the martingale convergence theorem, $f(x) \le \alpha$ for almost all $x \in L$. Consequently $\|g\|_\infty \le 2\alpha$.

Applying Doob's lemma, $\lambda(M) \le \|f\|_1/\alpha$, and so
$$\int g^2\,d\lambda = \int_Lg^2\,d\lambda+\int_Mg^2\,d\lambda \le \alpha\int_Lg\,d\lambda+\left(\frac{\|f\|_1}{\alpha}\right)4\alpha^2 \le 5\alpha\|f\|_1,$$
so that $\|Q^*(g)\|_2^2 \le 4\|g\|_2^2 \le 20\alpha\|f\|_1$. Thus, by Markov's inequality,
$$\lambda(|Q^*(g)| > \alpha/2) \le (20\alpha\|f\|_1)(2/\alpha)^2 = 80\|f\|_1/\alpha.$$
We now turn to the bad part $b$. $M$ is the union of a disjoint sequence $(E_k)$ of dyadic intervals, for each of which $\int_{E_k}b\,d\lambda = 0$. Let $F_k$ be the interval with the same mid-point as $E_k$, but two times as long, and let $N = \cup_kF_k$. Then
$$\lambda(N) \le \sum_k\lambda(F_k) = 2\sum_k\lambda(E_k) = 2\lambda(M) \le 2\|f\|_1/\alpha.$$
It is therefore sufficient to show that
$$\lambda((|Q^*(b)| > \alpha/2)\cap C(N)) \le 8\|f\|_1/\alpha,$$
and this of course follows if we show that
$$\int_{C(N)}|Q^*(b)|\,d\lambda \le 4\|f\|_1.$$
Let $b_k = b\,I_{E_k}$. Then $b = \sum_kb_k$ and $v_t(b)(x) = \sum_kv_t(b_k)(x)$ for each $x$. Consequently,
$$Q^*(b) = \sup_{t>0}|v_t(b)| \le \sup_{t>0}\sum_k|v_t(b_k)| \le \sum_kQ^*(b_k).$$
Thus
$$\int_{C(N)}Q^*(b)\,d\lambda \le \int_{C(N)}\sum_kQ^*(b_k)\,d\lambda = \sum_k\int_{C(N)}Q^*(b_k)\,d\lambda \le \sum_k\int_{C(F_k)}Q^*(b_k)\,d\lambda.$$
We now need to consider $\int_{C(F_k)}Q^*(b_k)\,d\lambda$ in detail. Let $E_k = (x_0-l,x_0+l]$, so that $F_k = (x_0-2l,x_0+2l]$. If $x_0+y \in C(F_k)$ then
$$v_t(b_k)(x_0+y) = \int_{-l}^lb_k(x_0+u)Q_t(y-u)\,d\lambda(u) = \int_{-l}^lb_k(x_0+u)(Q_t(y-u)-Q_t(y))\,d\lambda(u),$$
since $\int_{-l}^lb_k(x_0+u)\,d\lambda(u) = 0$. Thus
$$|v_t(b_k)(x_0+y)| \le \|b_k\|_1\sup_{-l\le u\le l}|Q_t(y-u)-Q_t(y)|.$$
Now if $|u| \le l$ and $|y| > 2l$ then $|y| \le 2|y-u| < 3|y|$, and so
$$|Q_t(y-u)-Q_t(y)| = \frac{1}{\pi}\left|\frac{y-u}{(y-u)^2+t^2}-\frac{y}{y^2+t^2}\right| = \frac{1}{\pi}\left|\frac{u(y(y-u)-t^2)}{((y-u)^2+t^2)(y^2+t^2)}\right| \le \frac{4l}{\pi y^2}\cdot\frac{|y(y-u)|+t^2}{y^2+t^2} \le \frac{6l}{\pi y^2}.$$
Thus
$$Q^*(b_k)(x_0+y) = \sup_{t>0}|v_t(b_k)(x_0+y)| \le \frac{6l\|b_k\|_1}{\pi y^2},$$
and so
$$\int_{C(F_k)}Q^*(b_k)\,d\lambda \le \frac{6\|b_k\|_1}{\pi}.$$
Consequently
$$\int_{C(N)}|Q^*(b)|\,d\lambda \le \frac{6}{\pi}\sum_k\|b_k\|_1 = \frac{6}{\pi}\|b\|_1 \le \frac{12}{\pi}\|f\|_1.$$
Corollary 11.3.1 Suppose that $1 < p < \infty$. Then $Q^*$ is of strong type $(p,p)$. If $f \in L^p(\mathbb{R})$ then $Q_t(f)$ is convergent, in $L^p$ norm and almost everywhere, to a function $H(f)$, say. $H(f) \in L^p(\mathbb{R})$, and the linear mapping $f \to H(f) : L^p(\mathbb{R}) \to L^p(\mathbb{R})$ is bounded.

Proof. Suppose first that $1 < p \le 2$. It follows from the Marcinkiewicz interpolation theorem that $Q^*$ is of strong type $(p,p)$ for $1 < p < 2$. If $f \in L^p\cap L^2$ then $Q_t(f)-H(f) \to 0$ almost everywhere, as $t \to 0$, and $|Q_t(f)-Q_s(f)| \le 2Q^*(f)$, so that $|Q_t(f)-H(f)| \le 2Q^*(f)$. Thus $Q_t(f) \to H(f)$ in $L^p$-norm. Since $L^2\cap L^p$ is dense in $L^p$, the remaining results of the corollary now follow.

Suppose now that $2 < p < \infty$. If $f \in L^p(\mathbb{R})$ and $g \in L^{p'}(\mathbb{R})$ then
$$\int gQ_t(f)\,d\lambda = -\int Q_t(g)f\,d\lambda,$$
and from this it follows that $Q_t(f) \in L^p(\mathbb{R})$, and that the mappings $f \to Q_t(f) : L^p(\mathbb{R}) \to L^p(\mathbb{R})$ are uniformly bounded; there exists $K$ such that $\|Q_t(f)\|_p \le K\|f\|_p$ for all $f \in L^p(\mathbb{R})$ and $t > 0$.

Suppose that $f \in L^2(\mathbb{R})\cap L^p(\mathbb{R})$. Then $Q_t(f) \to H(f)$ in $L^2(\mathbb{R})$, $Q_t(f) = P_t\star(H(f))$ and $Q_t(f) \to H(f)$ almost everywhere. Now $\{Q_t(f) : t > 0\}$ is bounded in $L^p$, and so by Fatou's lemma, $\|H(f)\|_p \le K\|f\|_p$. But then $\|Q^*(f)\|_p \le \|P^*(H(f))\|_p \le K'\|f\|_p$, for some constant $K'$. Since $L^2(\mathbb{R})\cap L^p(\mathbb{R})$ is dense in $L^p(\mathbb{R})$, this inequality extends to all $f \in L^p(\mathbb{R})$. The remaining results now follow easily from this.
Corollary 11.3.2 (Hilbert's inequality) If $1 < p < \infty$ there exists a constant $K_p$ such that if $f \in L^p(\mathbb{R})$ and $g \in L^{p'}(\mathbb{R})$ then
$$\left|\int_{\mathbb{R}}\left(\int_{\mathbb{R}}\frac{f(x)}{x-y}\,dx\right)g(y)\,dy\right| \le K_p\|f\|_p\|g\|_{p'}.$$
[Here the inner integral is the principal value integral.]

With these results, we can mimic the proof of Proposition 11.2.1 to obtain the following.

Proposition 11.3.1 (Cotlar's inequality) Suppose that $1 < p < \infty$ and that $f \in L^p(\mathbb{R})$. Then $H^*(f) \le m(H(f))+2m(f)$, and $H^*$ is of strong type $(p,p)$.

Similarly we have the following.

Theorem 11.3.2 If $f \in L^p(\mathbb{R})$, where $1 < p < \infty$, then $H_\epsilon(f) \to H(f)$ in $L^p$-norm and almost everywhere.
11.4 Hilbert's inequality for sequences

We can easily derive a discrete version of Hilbert's inequality.

Theorem 11.4.1 (Hilbert's inequality for sequences) If $1 < p < \infty$ there exists a constant $K_p$ such that if $a = (a_n) \in l^p(\mathbb{Z})$ then
$$\sum_{m=-\infty}^\infty\left|\sum_{n\ne m}\frac{a_n}{m-n}\right|^p \le K_p^p\|a\|_p^p.$$
Thus if $b \in l^{p'}$ then
$$\left|\sum_{m=-\infty}^\infty b_m\sum_{n\ne m}\frac{a_n}{m-n}\right| \le K_p\|a\|_p\|b\|_{p'}.$$
Proof. Let $h_0 = 0$, $h_n = 1/n$ for $n \ne 0$. Then $h \in l^p$ for $1 < p \le \infty$, and so the sum $\sum_{n\ne m}a_n/(m-n)$ converges absolutely. For $0 < \epsilon < 1/2$ let $J_\epsilon = (2\epsilon)^{-1/p}I_{(-\epsilon,\epsilon)}$ and let $K_\epsilon = (2\epsilon)^{-1/p'}I_{(-\epsilon,\epsilon)}$, so that $J_\epsilon$ and $K_\epsilon$ are unit vectors in $L^p(\mathbb{R})$ and $L^{p'}(\mathbb{R})$ respectively. Then the principal value
$$\int\frac{J_\epsilon(x)}{x}\,dx = \lim_{\delta\to 0}\int_{|x|>\delta}\frac{J_\epsilon(x)}{x}\,dx$$
is zero, while
$$\frac{2\epsilon}{|m-n|+2\epsilon} \le \left|\iint\frac{J_\epsilon(x-n)K_\epsilon(y-m)}{y-x}\,dx\,dy\right| \le \frac{2\epsilon}{|m-n|-2\epsilon},$$
for $m \ne n$. If $(a_n)$ and $(b_m)$ are sequences each with finitely many non-zero terms, let
$$A_\epsilon(x) = \sum_na_nJ_\epsilon(x-n)\quad\text{and}\quad B_\epsilon(y) = \sum_mb_mK_\epsilon(y-m).$$
Then by Hilbert's inequality, $\left|\int_{\mathbb{R}}H(A_\epsilon)(y)B_\epsilon(y)\,dy\right| \le K_p\|A_\epsilon\|_p\|B_\epsilon\|_{p'}$. But $\|A_\epsilon\|_p = \|a\|_p$ and $\|B_\epsilon\|_{p'} = \|b\|_{p'}$, and, the diagonal terms $n = m$ contributing nothing by the principal value remark above,
$$\frac{\pi}{2\epsilon}\int_{\mathbb{R}}H(A_\epsilon)(y)B_\epsilon(y)\,dy \to \sum_mb_m\sum_{n\ne m}\frac{a_n}{m-n}\quad\text{as }\epsilon\to 0.$$
Thus
$$\left|\sum_mb_m\sum_{n\ne m}\frac{a_n}{m-n}\right| \le K_p\|a\|_p\|b\|_{p'};$$
letting $b$ vary,
$$\sum_m\left|\sum_{n\ne m}\frac{a_n}{m-n}\right|^p \le K_p^p\|a\|_p^p.$$
The usual approximation arguments then show that the result holds for general $a \in l^p(\mathbb{Z})$ and $b \in l^{p'}(\mathbb{Z})$.
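For $p = 2$ the best constant in Theorem 11.4.1 is known to be $\pi$ (this is Hilbert's original inequality). A quick numerical check (my own; the sample vector and truncation of the output are arbitrary):

```python
import math, random

# (Ta)_m = sum_{n != m} a_n / (m - n); on l^2(Z) this operator has norm pi,
# so ||Ta||_2 <= pi * ||a||_2 for every finitely supported a.
random.seed(2)
a = {n: random.gauss(0.0, 1.0) for n in range(40)}      # support in 0..39
M = 400                                                 # output truncation
Ta = {m: sum(v / (m - n) for n, v in a.items() if n != m)
      for m in range(-M, M + 1)}
norm_a = math.sqrt(sum(v * v for v in a.values()))
norm_Ta = math.sqrt(sum(v * v for v in Ta.values()))
print(norm_Ta / norm_a)    # strictly below pi
```

Truncating the output range can only decrease the computed norm of $Ta$, so the inequality is genuinely being tested rather than enforced by the truncation.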
11.5 The Hilbert transform on $\mathbb{T}$

Let us now consider what happens on the circle $\mathbb{T}$, equipped with Haar measure $\mathbb{P} = d\theta/2\pi$. If $f \in L^1(\mathbb{T})$, then we write $\mathbb{E}(f)$ for $\int_{\mathbb{T}}f\,d\mathbb{P}$, and set $P_0(f) = f-\mathbb{E}(f)$. For $1 \le p \le \infty$, $P_0$ is a continuous projection of $L^p(\mathbb{T})$ onto $L^p_0(\mathbb{T}) = \{f \in L^p(\mathbb{T}) : \mathbb{E}(f) = 0\}$.

Let $c(z) = (1+z)/(1-z)$. If $z = re^{i\theta}$ and $r < 1$ then
\begin{align*}
c(z) &= 1+2\sum_{k=1}^\infty z^k = 1+2\sum_{k=1}^\infty r^ke^{ik\theta} = \sum_{k=-\infty}^\infty r^{|k|}e^{ik\theta}+\sum_{k=-\infty}^\infty\mathrm{sgn}(k)r^{|k|}e^{ik\theta}\\
&= P_r(e^{i\theta})+iQ_r(e^{i\theta}) = \left(\frac{1-r^2}{1-2r\cos\theta+r^2}\right)+i\left(\frac{2r\sin\theta}{1-2r\cos\theta+r^2}\right).
\end{align*}
$P(re^{i\theta}) = P_r(e^{i\theta})$ and $Q(re^{i\theta}) = Q_r(e^{i\theta})$ are the Poisson kernel and conjugate Poisson kernel, respectively. If $f \in L^1(\mathbb{T})$, we define $P_r(f) = P_r\star f$ and $Q_r(f) = Q_r\star f$. $P_r \ge 0$ and $\|P_r\|_1 = \mathbb{E}(P_r) = 1$, and so $\|P_r(f)\|_p \le \|f\|_p$ for $f \in L^p(\mathbb{T})$, for $1 \le p \le \infty$. We define the maximal function
$$m(f)(e^{i\theta}) = \sup_{0<t\le\pi}\frac{1}{2t}\int_{-t}^t|f(e^{i(\theta+\phi)})|\,d\phi.$$
$(P_r)_{0<r<1}$ is an approximate identity, and, arguing as in Theorem 8.11.2, $P^*(f) = \sup_{0<r<1}|P_r(f)| \le m(f)$. From this it follows that if $f \in L^p(\mathbb{T})$, then $P_r(f) \to f$ in $L^p$ norm and almost everywhere, for $1 \le p < \infty$.

Now let us consider the case when $p = 2$. If $f \in L^2(\mathbb{T})$, let
$$H(f) = -i\sum_{k=-\infty}^\infty\mathrm{sgn}(k)\hat f_ke^{ik\theta};$$
the sum converges in $L^2$ norm, and $\|H(f)\|_2 = \|f-\mathbb{E}(f)\|_2 = \|P_0(f)\|_2 \le \|f\|_2$. $H(f)$ is the Hilbert transform of $f$. $H^2(f) = -P_0(f)$, so that $H$ maps $L^2_0(\mathbb{T})$ isometrically onto itself.
(T), then Q
r
(f) = P
r
(H(f)), so that Q

(f) P

(H(f)),
Q
r
(f) f in L
2
norm, and almost everywhere, and |Q

(f)|
2
2 |H(f)|
2

2 |f|
2
. Further Q
r
(e
i
) cot(/2) as r 1. Let us set, for 0 < < ,
H

(e
i
) = cot(/2) for < ,
= 0 for 0 < .
Then H
1r
and Q
r
are suciently close to show that H

(f) H(f) in L
2
norm, and almost everywhere, as 0.
What happens when $1 < p < \infty$? It is fairly straightforward to use the Calder\'on--Zygmund technique, the Marcinkiewicz interpolation theorem, and duality to obtain results that correspond exactly to those for $L^p(\mathbb{R})$. It is however possible to proceed more directly, using complex analysis, and this we shall do.
First we have the following standard result.

Proposition 11.5.1 Suppose that $1 < p < \infty$ and that $u$ is a harmonic function on $D = \{z : |z| < 1\}$ with the property that
$$\sup_{0<r<1}\left(\frac{1}{2\pi}\int_0^{2\pi}|u(re^{i\theta})|^p\,d\theta\right) < \infty.$$
Then there exists $f \in L^p(\mathbb{T})$ such that $u(re^{i\theta}) = P_r(f)(e^{i\theta})$ for all $re^{i\theta} \in D$.

Proof. Let $u_r(e^{i\theta}) = u(re^{i\theta})$. Then $\{u_r : 0 < r < 1\}$ is bounded in $L^p(\mathbb{T})$, and so there exist $r_n \to 1$ and $f \in L^p(\mathbb{T})$ such that $u_{r_n} \to f$ weakly as $n \to \infty$. Thus if $0 < r < 1$ and $0 \le \theta < 2\pi$ then $P_r(u_{r_n})(e^{i\theta}) \to P_r(f)(e^{i\theta})$. But $P_r(u_{r_n}) = u_{rr_n}$, and so $u_r(e^{i\theta}) = P_r(f)(e^{i\theta})$.
We begin with the weak type $(1,1)$ result.

Theorem 11.5.1 Suppose that $f \in L^1(\mathbb{T})$. Then $Q_r(f)$ converges pointwise almost everywhere to a function $H(f)$ on $\mathbb{T}$ as $r \to 1$, and if $\alpha > 0$ then
$$\mathbb{P}(|H(f)| > \alpha) \le 4\|f\|_1/(2\|f\|_1+\alpha).$$

Proof. By considering positive and negative parts, it is enough to consider $f \ge 0$ with $\|f\|_1 = 1$, and to show that $\mathbb{P}(|H(f)| > \alpha) \le 2/(1+\alpha)$. For $z = re^{i\theta}$ set
$$F(z) = P_r(f)(e^{i\theta})+iQ_r(f)(e^{i\theta}).$$
$F$ is an analytic function on $D$ taking values in the right half-plane $H_r = \{x+iy : x > 0\}$, and $F(0) = 1$. First we show that $\mathbb{P}(|Q_r(f)| > \alpha) \le 2/(1+\alpha)$ for $0 < r < 1$. Let $w_\alpha(z) = 1+(z-\alpha)/(z+\alpha)$: $w_\alpha$ is a M\"obius transformation mapping $H_r$ conformally onto $\{z : |z-1| < 1\}$. Note also that if $z \in H_r$ and $|z| > \alpha$ then $\Re(w_\alpha(z)) > 1$.

Now let $G_\alpha(z) = w_\alpha(F(z)) = J_\alpha(z)+iK_\alpha(z)$. Then $J_\alpha(z) > 0$, and if $|Q_r(f)(z)| > \alpha$ then $J_\alpha(z) > 1$. Further, $J_\alpha(0) = w_\alpha(1) = 2/(1+\alpha)$. Thus
$$\mathbb{P}(|Q_r(f)| > \alpha) \le \frac{1}{2\pi}\int_0^{2\pi}J_\alpha(re^{i\theta})\,d\theta = J_\alpha(0) = \frac{2}{1+\alpha}.$$
Now let $S(z) = 1/(1+F(z))$. Then $S$ is a bounded analytic function on $D$, and so by Proposition 11.5.1, there exists $s \in L^2(\mathbb{T})$ such that $S(re^{i\theta}) = P_r(s)(e^{i\theta})$. Thus $S(re^{i\theta}) \to s(e^{i\theta})$ almost everywhere as $r \to 1$. Consequently, $F$, and so $Q_r(f)$, have radial limits, finite or infinite, almost everywhere. But, since $\mathbb{P}(|Q_r(f)| > \alpha) \le 2/(1+\alpha)$ for $0 < r < 1$, the limit $H(f)$ must be finite almost everywhere, and then $\mathbb{P}(|H(f)| > \alpha) \le 2/(1+\alpha)$.
If $f \in L^1(\mathbb{T})$, let $Q^*(f) = \sup_{0<r<1}|Q_r(f)|$.

Theorem 11.5.2 If $1 < p < \infty$ then $Q^*$ is of strong type $(p,p)$.

Proof. It is enough to show that there exists a constant $K_p$ such that $\|Q_r(f)\|_p \le K_p\|f\|_p$ for all $f \in L^p(\mathbb{T})$. For then, by Proposition 11.5.1, there exists $g \in L^p(\mathbb{T})$ such that $Q_r(f) = P_r(g)$, and then $Q^*(f) = P^*(g)$, so that $\|Q^*(f)\|_p \le p'\|g\|_p \le p'K_p\|f\|_p$. If $f \in L^p(\mathbb{T})$ and $h \in L^{p'}(\mathbb{T})$, then $\mathbb{E}(Q_r(f)h) = \mathbb{E}(fQ_r(\tilde h))$, where $\tilde h(e^{i\theta}) = h(e^{-i\theta})$, and so a standard duality argument shows that we need only prove this for $1 < p < 2$. Finally, we need only prove the result for $f \ge 0$.

Suppose then that $f \in L^p(\mathbb{T})$, that $f \ge 0$ and that $0 < r < 1$. Let $\beta = \pi/(p+1)$, so that $0 < \beta < \pi/2$ and $\pi/2 < p\beta < p\pi/2 < \pi$. Note that $\cos p\beta = -\cos\beta$. As before, for $z = re^{i\theta}$ set
$$F(z) = P_r(f)(e^{i\theta})+iQ_r(f)(e^{i\theta}).$$
$F$ is an analytic function on $D$ taking values in the right half-plane $H_r$, and so we can define the analytic function $G(z) = (F(z))^p = J(z)+iK(z)$. Then
$$\|Q_r(f)\|_p^p \le \frac{1}{2\pi}\int_0^{2\pi}|G(re^{i\theta})|\,d\theta.$$
We divide the unit circle into two parts: let
$$S = \{e^{i\theta} : 0 \le |\arg F(re^{i\theta})| \le \beta\},\qquad L = \{e^{i\theta} : \beta < |\arg F(re^{i\theta})| < \pi/2\}.$$
If $e^{i\theta} \in S$ then $|F(re^{i\theta})| \le P_r(f)(e^{i\theta})/\cos\beta$, so that
$$\frac{1}{2\pi}\int_S|G(re^{i\theta})|\,d\theta \le \frac{1}{2\pi(\cos\beta)^p}\int_S(P_r(f)(e^{i\theta}))^p\,d\theta \le (\|P_r(f)\|_p/\cos\beta)^p \le (\|f\|_p/\cos\beta)^p.$$
On the other hand, if $e^{i\theta} \in L$ then $p\beta < |\arg G(re^{i\theta})| < p\pi/2$, so that $J(re^{i\theta}) < 0$ and $|G(re^{i\theta})| \le -J(re^{i\theta})/\cos\beta$. But
$$\frac{1}{2\pi}\int_LJ(re^{i\theta})\,d\theta+\frac{1}{2\pi}\int_SJ(re^{i\theta})\,d\theta = J(0) = (\mathbb{E}(f))^p \ge 0,$$
and so
$$\frac{1}{2\pi}\int_L|G(re^{i\theta})|\,d\theta \le -\frac{1}{2\pi\cos\beta}\int_LJ(re^{i\theta})\,d\theta \le \frac{1}{2\pi\cos\beta}\int_SJ(re^{i\theta})\,d\theta \le \frac{1}{2\pi\cos\beta}\int_S|G(re^{i\theta})|\,d\theta \le \|f\|_p^p/(\cos\beta)^{p+1}.$$
Consequently $\|Q_r(f)\|_p^p \le (2/(\cos\beta)^{p+1})\|f\|_p^p$.
The following corollaries now follow, as in Section 11.3.

Corollary 11.5.1 Suppose that $1 < p < \infty$. If $f \in L^p(\mathbb{T})$ then $Q_r(f)$ is convergent, in $L^p$ norm and almost everywhere, to a function $H(f)$, say, as $r \to 1$. $H(f) \in L^p(\mathbb{T})$, and the linear mapping $f \to H(f) : L^p(\mathbb{T}) \to L^p(\mathbb{T})$ is bounded.

Corollary 11.5.2 If $f \in L^p(\mathbb{T})$, where $1 < p < \infty$, then $H_\epsilon(f) \to H(f)$ in $L^p$-norm and almost everywhere, as $\epsilon \to 0$.
11.6 Multipliers
We now explore how the ideas of Section 11.3 extend to higher dimensions. We shall see that there are corresponding results for singular integral operators. These are operators which reflect the algebraic structure of R^d, as we shall describe in the next two sections. We consider bounded linear operators on L^2(R^d). If y ∈ R^d, the translation operator τ_y is defined as τ_y(f)(x) = f(x − y). This is an isometry of L^2(R^d) onto itself; first, we consider operators which commute with all translation operators. (This idea clearly extends to L^2(G), where G is a locally compact abelian group, and is the starting point for commutative harmonic analysis.) Operators which commute with all translation operators are characterized as follows.
Theorem 11.6.1 Suppose that T ∈ L(L^2(R^d)). The following are equivalent.
(i) T commutes with all translation operators.
(ii) If g ∈ L^1(R^d) and f ∈ L^2(R^d) then T(g ⋆ f) = g ⋆ T(f).
(iii) There exists h ∈ L^∞(R^d) such that (T(f))^ = h f̂ for all f ∈ L^2(R^d).
If these conditions are satisfied, then ‖T‖ = ‖h‖_∞.
If so, then we write T = M_h, and call T a multiplier.
Proof Suppose that (i) holds. If g ∈ L^1(R^d) and f, k ∈ L^2(R^d) then
⟨g ⋆ T(f), k⟩ = ⟨∫ τ_y(T(f)) g(y) dy, k⟩ = ⟨∫ T(τ_y(f)) g(y) dy, k⟩ = ⟨T(∫ τ_y(f) g(y) dy), k⟩ = ⟨T(g ⋆ f), k⟩.
Thus (ii) holds.
On the other hand, if (ii) holds and if f ∈ L^2(R^d) then
T(τ_y(f)) = lim_{t→0} T(τ_y(P_t ⋆ f)) = lim_{t→0} T(τ_y(P_t) ⋆ f) = lim_{t→0} τ_y(P_t) ⋆ T(f) = τ_y(T(f)),
where (P_t)_{t>0} is the Poisson kernel on R^d and convergence is in L^2 norm. Thus (i) holds.
If (iii) holds then
(T(τ_y(f)))^(ξ) = h(ξ)(τ_y(f))^(ξ) = h(ξ) e^{−2πi⟨y,ξ⟩} f̂(ξ) = (τ_y(T(f)))^(ξ),
so that Tτ_y = τ_y T, and (i) holds. Further,
‖T(f)‖₂ = ‖(T(f))^‖₂ ≤ ‖h‖_∞ ‖f̂‖₂ = ‖h‖_∞ ‖f‖₂.
Finally, if (i) and (ii) hold, and f ∈ L^2(R^d), let
φ(f) = (P₁ ⋆ T(f))(0) = c_d ∫ T(f)(x)/(|x|² + 1)^{(d+1)/2} dx.
Then |φ(f)| ≤ ‖P₁‖₂ ‖T(f)‖₂ ≤ ‖P₁‖₂ ‖T‖ ‖f‖₂, so that φ is a continuous linear functional on L^2(R^d). Thus there exists k ∈ L^2(R^d) such that φ(f) = ⟨f, k⟩. Let j(y) = k̄(−y). Then
(f ⋆ j)(x) = ∫ f(y) k̄(y − x) dy = ∫ f(y + x) k̄(y) dy = φ(τ_{−x}(f)) = (P₁ ⋆ T(τ_{−x}(f)))(0)
= (P₁ ⋆ τ_{−x}(T(f)))(0) = ∫ P₁(y) T(f)(y + x) dy = ∫ P₁(x − y) T(f)(y) dy = (P₁ ⋆ T(f))(x).
Thus P₁ ⋆ T(f) = f ⋆ j. Taking Fourier transforms, e^{−2π|ξ|} (T(f))^(ξ) = f̂(ξ) ĵ(ξ), so that (T(f))^(ξ) = h(ξ) f̂(ξ), where h(ξ) = e^{2π|ξ|} ĵ(ξ). Suppose that λ(|h| > ‖T‖) > 0. Then there exists B of positive finite measure on which |h| > ‖T‖. But then there exists g ∈ L^2(R^d) for which ĝ = (sgn h) I_B. Then
‖T(g)‖₂² = ∫_B |h(ξ)|² dξ > ‖T‖² ‖ĝ‖₂² = ‖T‖² ‖g‖₂²,
giving a contradiction. Thus h ∈ L^∞(R^d), and ‖h‖_∞ ≤ ‖T‖.
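The equivalence in Theorem 11.6.1 has an exact discrete analogue on the cyclic group Z/n, which can be checked directly with the fast Fourier transform. The sketch below is an added illustration, not part of the text: the multiplier h and the test vector f are arbitrary, and the FFT plays the role of the Fourier transform on R^d.

```python
import numpy as np

# Discrete analogue of Theorem 11.6.1 on Z/n: an operator given by a
# bounded Fourier multiplier h commutes with all translations, and its
# l2 operator norm is at most sup |h|.
rng = np.random.default_rng(0)
n = 64
f = rng.standard_normal(n)
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # arbitrary multiplier

def M_h(f):
    # multiply the Fourier transform pointwise by h
    return np.fft.ifft(h * np.fft.fft(f))

def translate(f, y):
    return np.roll(f, y)

# M_h commutes with every translation
for y in (1, 5, 17):
    assert np.allclose(M_h(translate(f, y)), translate(M_h(f), y))

# and is bounded by ||h||_inf on l2, matching ||T|| = ||h||_inf
assert np.linalg.norm(M_h(f)) <= np.abs(h).max() * np.linalg.norm(f) + 1e-9
```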
11.7 Singular integral operators
R^d is not only a locally compact abelian group under addition, but is also a vector space. We therefore consider multipliers on L^2(R^d) which respect scalar multiplication. If λ > 0 the dilation operator D_λ is defined as D_λ(f)(x) = f(x/λ). If f ∈ L^p(R^d) then ‖D_λ(f)‖_p = λ^{d/p} ‖f‖_p, so that dilation introduces a scaling factor which varies with p.
We consider multipliers on L^2(R^d) which commute with all dilation operators.
If f ∈ L^2(R^d) then (D_λ(f))^(ξ) = λ^d f̂(λξ). Thus if M_h commutes with dilations then
(M_h D_λ(f))^(ξ) = λ^d h(ξ) f̂(λξ) = (D_λ M_h(f))^(ξ) = λ^d h(λξ) f̂(λξ),
so that h(λξ) = h(ξ); h is constant on rays from the origin, and h(ξ) = h(ξ/|ξ|). If we now proceed formally, and let K be the inverse Fourier transform of h, then a change of variables shows that K(λx) = K(x)/λ^d; K is homogeneous of degree −d, and if x ≠ 0 then K(x) = (1/|x|^d) K(x/|x|).
Such functions have a singularity at the origin; we need to impose some regularity on K. There are various possibilities here, but we shall suppose that K satisfies a Lipschitz condition on S^{d−1}: there exists C < ∞ such that |K(x) − K(y)| ≤ C|x − y| for |x| = |y| = 1. In particular, K is bounded on S^{d−1}; let A = sup{|K(x)| : |x| = 1}.
Thus we are led to consider a formal convolution K ⋆ f, where K is homogeneous of degree −d, and satisfies this regularity condition. K is not integrable, but if we set K_ε(x) = K(x) for |x| ≥ ε and K_ε(x) = 0 for |x| < ε, then K_ε ∈ L^p(R^d) for all 1 < p ≤ ∞. Following the example of the Hilbert transform, we form the convolution K_ε(f) = K_ε ⋆ f, and see what happens as ε → 0.
Let us see what happens if f is very well behaved. Suppose that f is a smooth function of compact support, and that f(x) = 1 for |x| ≤ 2. If |x| ≤ 1 and 0 < η < ε < 1 then
(K_η ⋆ f)(x) − (K_ε ⋆ f)(x) = (∫_{S^{d−1}} K(θ) ds(θ)) log(ε/η),
so that if the integral is to converge, we require that ∫_{S^{d−1}} K(θ) ds(θ) = 0.
We are therefore led to the following definition. A function K defined on R^d \ {0} is a regular Calderón–Zygmund kernel if
(i) K is homogeneous of degree −d;
(ii) K satisfies a Lipschitz condition on the unit sphere S^{d−1};
(iii) ∫_{S^{d−1}} K(θ) ds(θ) = 0.
The Hilbert transform kernel K(x) = 1/x is, up to scaling, the only regular Calderón–Zygmund kernel on R. On R^d, the Riesz kernels c_d x_j/|x|^{d+1} (1 ≤ j ≤ d) (where c_d is a normalizing constant) are important examples of regular Calderón–Zygmund kernels (see Exercise 11.3).
The regularity conditions lead to the following consequences.
Theorem 11.7.1 Suppose that K is a regular Calderón–Zygmund kernel.
(i) There exists a constant D such that |K(x − y) − K(x)| ≤ D|y|/|x|^{d+1} for |x| > 2|y|.
(ii) (Hörmander's condition) There exists a constant B such that
∫_{|x|>2|y|} |K(x − y) − K(x)| dx ≤ B and ∫_{|x|>2|y|} |K_ε(x − y) − K_ε(x)| dx ≤ B
for all ε > 0.
(iii) There exists a constant C such that ‖K̂_ε‖_∞ ≤ C for all ε > 0.
Proof We leave (i) and (ii) as exercises for the reader (Exercise 11.2); (i) is easy, and, for K, (ii) follows by integrating (i). The argument for K_ε is elementary, but more complicated, since there are two parameters |y| and ε. The fact that the constant does not depend on ε follows from homogeneity.
(iii) K̂_ε(ξ) = lim_{R→∞} I_{ε,R}, where
I_{ε,R} = ∫_{ε≤|x|≤R} e^{−i⟨x,ξ⟩} K(x) dx.
Thus K̂_ε(0) = 0, by condition (iii). For ξ ≠ 0 let r = π/|ξ|. If ε < 2r then I_{ε,R} = I_{ε,2r} + I_{2r,R} and
|I_{ε,2r}| = |∫_{ε≤|x|≤2r} (e^{−i⟨x,ξ⟩} − 1) K(x) dx| ≤ |ξ| ∫_{|x|≤2r} |x| (A/|x|^d) dx ≤ C_d 2r|ξ| A = 2πC_d A.
We must therefore show that I_{a,R} is bounded, for a ≥ 2r. Let z = πξ/|ξ|², so that |z| = r and e^{i⟨z,ξ⟩} = e^{iπ} = −1. Now
I_{a,R} = ∫_{a≤|x−z|≤R} e^{−i⟨x−z,ξ⟩} K(x − z) dx = −∫_{a≤|x−z|≤R} e^{−i⟨x,ξ⟩} K(x − z) dx,
so that
I_{a,R} = ½ (I_{a,R} − ∫_{a≤|x−z|≤R} e^{−i⟨x,ξ⟩} K(x − z) dx)
= F + ½ ∫_{a+r≤|x|≤R−r} e^{−i⟨x,ξ⟩} (K(x) − K(x − z)) dx + G,
where the fringe function F is of the form ∫_{a−r≤|x|≤a+r} f(x) dx, where |f(x)| ≤ A/(a − r)^d, so that |F| ≤ ω_d A((a + r)/(a − r))^d, and the fringe function G is of the form ∫_{R−r≤|x|≤R+r} g(x) dx, where |g(x)| ≤ A/(R − r)^d, so that |G| ≤ ω_d A((R + r)/(R − r))^d. Thus |F| ≤ 3^d ω_d A and |G| ≤ 3^d ω_d A.
Finally, Hörmander's condition implies that
|½ ∫_{a+r≤|x|≤R−r} e^{−i⟨x,ξ⟩} (K(x) − K(x − z)) dx| ≤ B/2.
Suppose now that g is a smooth function of compact support. Then
(K_ε ⋆ g)(x) = ∫_{|y|>1} g(x − y) K(y) dy + ∫_{1≥|y|>ε} (g(x − y) − g(x)) K(y) dy.
The first integral defines a function in L^p, for all 1 < p ≤ ∞, while
|(g(x − y) − g(x)) K(y)| ≤ A ‖g′‖_∞ / |y|^{d−1},
since |g(x − y) − g(x)| ≤ ‖g′‖_∞ |y|, and so the second integral, which vanishes outside a compact set, converges uniformly as ε → 0. Thus for such g, K_ε(g) converges pointwise and in L^p norm as ε → 0.
Corollary 11.7.1 If f ∈ L^2 then K_ε(f) = K_ε ⋆ f converges in L^2 norm, to K(f) say, as ε → 0.
For ‖K_ε(f)‖₂ ≤ C ‖f‖₂, and so the result follows from Theorem 8.4.1.
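The role of the cancellation condition (iii) can be seen numerically. The following sketch is an added illustration, not part of the text: it takes d = 1 and the kernel K(y) = 1/y (for which K(1) + K(−1) = 0), with an arbitrary smooth g and point x, and checks that the truncated integrals over ε < |y| ≤ 1 settle down as ε → 0.

```python
import numpy as np

# Numerical illustration (d = 1, K(y) = 1/y): thanks to the cancellation
# K(1) + K(-1) = 0, the truncated integrals of (g(x - y) - g(x)) K(y)
# over eps < |y| <= 1 converge as eps -> 0 for smooth g.
g = lambda u: np.exp(-u * u)        # smooth test function (arbitrary choice)
x = 0.3

def truncated(eps, m=400001):
    # fold y and -y together: the integrand becomes (g(x - y) - g(x + y)) / y,
    # which stays bounded as y -> 0
    y = np.linspace(eps, 1.0, m)
    vals = (g(x - y) - g(x + y)) / y
    dy = y[1] - y[0]
    return np.sum(vals[1:] + vals[:-1]) * dy / 2.0   # trapezoid rule

v = [truncated(10.0 ** (-k)) for k in (2, 3, 4, 5)]
# successive truncations change less and less: a Cauchy sequence
assert abs(v[3] - v[2]) < abs(v[2] - v[1]) < abs(v[1] - v[0])
```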
11.8 Singular integral operators on L^p(R^d) for 1 ≤ p < ∞
We now follow the proof of Theorem 11.3.1 to establish the following.
Theorem 11.8.1 K_ε is of weak type (1, 1), with a constant independent of ε.
Proof As before, a scaling argument shows that it is enough to show that K₁ is of weak type (1, 1).
Suppose that f ∈ L^1(R^d), that f ≥ 0 and that α > 0. As in Theorem 11.3.1, we consider the dyadic filtration of R^d, define the stopping time τ, and define the good part g and the bad part b of f. Then ‖g‖₁ = ‖f‖₁, ‖b‖₁ ≤ 2‖f‖₁ and ‖g‖_∞ ≤ 2^d α. Then ∫ g² dλ ≤ (4^d + 1)α ‖f‖₁, so that
‖K₁(g)‖₂² ≤ (4^d + 1)αB ‖f‖₁ and λ(|K₁(g)| > α/2) ≤ 4(4^d + 1)B ‖f‖₁/α.
What about b? Here we take F_k to be the cube with the same centre x_k as E_k, but with side 2d^{1/2} times as big. This ensures that if x ∉ F_k and y ∈ E_k
then |x − x_k| ≥ 2|y − x_k|. As in Theorem 11.3.1, it is enough to show that ∫_{C(F_k)} |K₁(b_k)| dλ ≤ B ‖b_k‖₁. We use Hörmander's condition:
∫_{C(F_k)} |K₁(b_k)| dλ = ∫_{C(F_k)} |∫_{E_k} K₁(x − y) b_k(y) dy| dx
= ∫_{C(F_k)} |∫_{E_k} (K₁(x − y) − K₁(x − x_k)) b_k(y) dy| dx
≤ ∫_{C(F_k)} ∫_{E_k} |K₁(x − y) − K₁(x − x_k)| |b_k(y)| dy dx
= ∫_{E_k} (∫_{C(F_k)} |K₁(x − y) − K₁(x − x_k)| dx) |b_k(y)| dy ≤ B ‖b_k‖₁.
Compare this calculation with the calculation that occurs at the end of the proof of Theorem 11.3.1.
Using the Marcinkiewicz interpolation theorem and duality, we have the following corollary.
Corollary 11.8.1 For 1 < p < ∞ there exists a constant C_p such that if f ∈ L^p(R^d) then ‖K_ε(f)‖_p ≤ C_p ‖f‖_p, and K_ε(f) converges in L^p norm to K(f), as ε → 0.
What about convergence almost everywhere? Here we need a d-dimensional version of Cotlar's inequality.
Proposition 11.8.1 Suppose that K is a regular Calderón–Zygmund kernel. There exists a constant C such that if f ∈ L^p(R^d), where 1 < p < ∞, then
K*(f) = sup_{ε>0} |K_ε(f)| ≤ m(K(f)) + C m(f).
This can be proved in the following way. Let φ be a bump function: a smooth bell-shaped function on R^d with ‖φ‖₁ = 1 which vanishes outside the unit ball of R^d. Let φ_ε(x) = ε^{−d} φ(x/ε), for ε > 0. Then φ_ε ⋆ K(f) = K(φ_ε) ⋆ f, so that, by Theorem 8.11.2, sup_{ε>0} |K(φ_ε) ⋆ f| ≤ m(K(f)).
Straightforward calculations now show that there exists D such that
|K₁(x) − K(φ₁)(x)| ≤ D min(1, |x|^{−(d+1)}) = L₁(x), say.
Then, by scaling,
sup_{ε>0} |K_ε(f) − K(φ_ε) ⋆ f| ≤ sup_{ε>0} |L_ε ⋆ f| ≤ ‖L₁‖₁ m(f),
and Cotlar's inequality follows from this.
The proof of convergence almost everywhere now follows as in the one-dimensional case.
11.9 Notes and remarks
The results of this chapter are only the beginning of a very large subject, the study of harmonic analysis on Euclidean space, and on other Riemannian manifolds. An excellent introduction is given by Duoandikoetxea [Duo 01]. After several decades, the books by Stein [Stei 70] and Stein and Weiss [StW 71] are still a valuable source of information and inspiration. If you still want to know more, then turn to the encyclopedic work [Stei 93].
Exercises
11.1 Use contour integration and Jordan's lemma to show that
P̂_t(ξ) = e^{−2πt|ξ|} and Q̂_t(ξ) = −i sgn(ξ) e^{−2πt|ξ|}.
11.2 Prove parts (i) and (ii) of Theorem 11.7.1.
11.3 Let R_j(x) = c_d x_j/|x|^{d+1}, where c_d is a normalizing constant, be the jth Riesz kernel.
(i) Verify that R_j is a regular Calderón–Zygmund kernel.
(ii) Observe that the vector-valued kernel R = (R₁, ..., R_d) is rotation-invariant. Deduce that the Fourier transform R̂ is rotation-invariant. Show that R̂_j(ξ) = −i b_d ξ_j/|ξ|. In fact, c_d is chosen so that b_d = 1.
Let T_j be the singular integral operator defined by R_j.
(iii) Show that ∑_{j=1}^d T_j² = −I.
(iv) Suppose that f₀ ∈ L^2(R^d), and that f_j = T_j(f₀). Let u_j(x, t) = P_t(f_j), for 0 ≤ j ≤ d. For convenience of notation, let x₀ = t. Show that the functions u_j satisfy the generalized Cauchy–Riemann equations
∑_{j=0}^d ∂u_j/∂x_j = 0,  ∂u_j/∂x_k = ∂u_k/∂x_j,
for 0 ≤ j, k ≤ d. These equations are related to Clifford algebras, and the Dirac operator. For more on this, see [GiM 91].
(v) Suppose that 1 < p < ∞. Show that there exists a constant A_p such that if f is a smooth function of compact support on R^d then
‖∂²f/∂x_j∂x_k‖_p ≤ A_p ‖Δf‖_p,
where Δ is the Laplacian.
[Show that ∂²f/∂x_j∂x_k = −T_j T_k(Δf).]
For more on this, see [Stei 70] and [GiM 91].
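The identity in Exercise 11.3(iii) can be checked numerically on the discrete torus. The sketch below is an added illustration, not part of the text: it implements the Riesz multipliers m_j(ξ) = −iξ_j/|ξ| (set to 0 at ξ = 0, which is why f is taken to have mean zero) via the two-dimensional FFT, and verifies that T₁² + T₂² = −I.

```python
import numpy as np

# Discrete check of Exercise 11.3(iii): with the Riesz multipliers
# m_j(xi) = -i xi_j / |xi|, we have m_1^2 + m_2^2 = -1 away from xi = 0,
# so applying each T_j twice and summing recovers -f for mean-zero f.
n = 32
k = np.fft.fftfreq(n) * n                      # integer frequencies on Z/n
xi1, xi2 = np.meshgrid(k, k, indexing="ij")
norm = np.hypot(xi1, xi2)
norm[0, 0] = 1.0                               # dummy; multipliers vanish at xi = 0
m1, m2 = -1j * xi1 / norm, -1j * xi2 / norm
m1[0, 0] = m2[0, 0] = 0.0

def T(m, f):
    return np.fft.ifft2(m * np.fft.fft2(f))

rng = np.random.default_rng(1)
f = rng.standard_normal((n, n))
f -= f.mean()                                  # remove the xi = 0 component

g = T(m1, T(m1, f)) + T(m2, T(m2, f))
assert np.allclose(g, -f)
```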
12
Khintchine's inequality
12.1 The contraction principle
We now turn to a topic which will recur for the rest of this book. Let (F, ‖·‖_F) be a Banach space (which may well be the field of scalars). Let s(F) denote the space of all infinite sequences in F, and let s_d(F) denote the space of all sequences of length d in F. Then D₂^N acts on s(F); if ε ∈ D₂^N and x = (x_n) ∈ s(F) we define x(ε) by setting x(ε)_n = ε_n x_n. Similarly D₂^d acts on s_d(F). In general, we shall consider the infinite case (although the arguments usually concern only finitely many terms of the sequence), and leave the reader to make any necessary adjustments in the finite case.
First we consider the case where F is a space of random variables. Suppose that X = (X_n) is a sequence of random variables, defined on a probability space (Ω, Σ, P) (disjoint from D₂^N), and taking values in a Banach space (E, ‖·‖_E). In this case we can consider ε_n X_n as a random variable defined on D₂^N × Ω. We say that X is a symmetric sequence if the distribution of X(ε) is the same as that of X for each ε ∈ D₂^N. This says that each X_n is symmetric, and more. We shall however be largely concerned with independent sequences of random variables. If (X_n) is an independent sequence, it is symmetric if and only if each X_n is symmetric.
If (X_n) is a symmetric sequence and if (ε_n) is a Bernoulli sequence of random variables, independent of the X_n, then (X_n) and (ε_n X_n) have the same distribution, and in the real case, this is the same as the distribution of (ε_n |X_n|).
Symmetric sequences of random variables have many interesting properties which we now investigate. We begin with the contraction principle. This name applies to many inequalities, but certainly includes those in the next proposition.
Proposition 12.1.1 (The contraction principle) (i) Suppose that (X_n) is a symmetric sequence of random variables, taking values in a Banach space E. If α = (α_n) is a bounded sequence of real numbers then
‖∑_{n=1}^N α_n X_n‖_p ≤ ‖α‖_∞ ‖∑_{n=1}^N X_n‖_p for 1 ≤ p < ∞.
(ii) Suppose that (X_n) and (Y_n) are symmetric sequences of real random variables defined on the same probability space (Ω₁, Σ₁, P₁), that |X_n| ≤ |Y_n| for each n, and that (u_n) is a sequence in a Banach space (E, ‖·‖_E). Then
‖∑_{n=1}^N X_n u_n‖_p ≤ ‖∑_{n=1}^N Y_n u_n‖_p for 1 ≤ p < ∞.
(iii) Suppose that (X_n) is a symmetric sequence of real random variables and that ‖X_n‖₁ ≥ 1/C for all n. Suppose that (ε_n) is a Bernoulli sequence of random variables and that (u_n) is a sequence in a Banach space (E, ‖·‖_E). Then
‖∑_{n=1}^N ε_n u_n‖_p ≤ C ‖∑_{n=1}^N X_n u_n‖_p for 1 ≤ p < ∞.
Proof (i) We can suppose that ‖α‖_∞ = 1. Consider the mapping T : β → ∑_{n=1}^N β_n X_n from l_∞^N into L_p(Ω). Then T(α) is a convex combination of {T(β) : β_n = ±1}, and, by symmetry, ‖T(β)‖_p = ‖∑_{n=1}^N X_n‖_p for each such β, so that
‖∑_{n=1}^N α_n X_n‖_p = ‖T(α)‖_p ≤ max{‖T(β)‖_p : β_n = ±1} = ‖∑_{n=1}^N X_n‖_p.
(ii) Suppose that (ε_n) is a sequence of Bernoulli random variables on a separate space Ω₂ = D₂^N. Then
‖∑_{n=1}^N X_n u_n‖_p^p = E₁(‖∑_{n=1}^N X_n u_n‖_E^p) = E₁E₂(‖∑_{n=1}^N ε_n |X_n| u_n‖_E^p)
≤ E₁E₂(‖∑_{n=1}^N ε_n |Y_n| u_n‖_E^p) (by (i))
= E₁(‖∑_{n=1}^N Y_n u_n‖_E^p) = ‖∑_{n=1}^N Y_n u_n‖_p^p.
(iii) Again suppose that (X_n) are random variables on (Ω₁, Σ₁, P₁) and that (ε_n) is a sequence of Bernoulli random variables on a separate space Ω₂ = D₂^N. Then
‖∑_{n=1}^N ε_n u_n‖_p^p = E₂(‖∑_{n=1}^N ε_n u_n‖_E^p)
≤ E₂(‖∑_{n=1}^N C ε_n E₁(|X_n|) u_n‖_E^p) (by (i))
≤ E₂((E₁ ‖∑_{n=1}^N C ε_n |X_n| u_n‖_E)^p) (by the mean-value inequality)
≤ E₂E₁(‖∑_{n=1}^N C ε_n |X_n| u_n‖_E^p) (by Proposition 5.5.1)
= C^p ‖∑_{n=1}^N X_n u_n‖_p^p.
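For Rademacher sums the contraction principle can be verified exactly by enumerating the whole group D₂^N. The sketch below is an added illustration, not part of the text: the vectors a and α are arbitrary, with ‖α‖_∞ = 1.

```python
import itertools
import numpy as np

# Exact check of the contraction principle (i) in the scalar case:
# the L_p norm over D_2^N (all 2^N sign patterns, equal weight) of
# sum alpha_n eps_n a_n is at most ||alpha||_inf times that of sum eps_n a_n.
a = np.array([1.0, -0.7, 2.3, 0.4, 1.1])
alpha = np.array([0.9, -0.3, 0.5, 1.0, -0.8])   # ||alpha||_inf = 1
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=len(a))))

def p_norm(coeffs, p):
    sums = signs @ coeffs
    return np.mean(np.abs(sums) ** p) ** (1.0 / p)

for p in (1, 2, 3, 4.5):
    assert p_norm(alpha * a, p) <= p_norm(a, p) + 1e-12
```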
12.2 The reflection principle, and Lévy's inequalities
The next result was originally due to Paul Lévy, in the scalar-valued case.
Theorem 12.2.1 (The reflection principle: Lévy's inequalities) Suppose that (X_n) is a symmetric sequence of random variables taking values in a Banach space (E, ‖·‖_E). Let S_m = X₁ + ··· + X_m, and let S* = sup_m ‖S_m‖_E.
(i) If S_m converges to S almost everywhere then P(S* > t) ≤ 2P(‖S‖_E > t), for t > 0.
(ii) If σ is an infinite set of natural numbers, and S*_σ = sup_{m∈σ} ‖S_m‖_E, then P(S* > t) ≤ 2P(S*_σ > t), for t > 0.
Proof We use a stopping time argument. Let τ = inf{j : ‖S_j‖_E > t} (we set τ = ∞ if S* ≤ t). Let A_m be the event (τ = m). The events A_m are disjoint, and (S* > t) = ⋃_{m=1}^∞ A_m, so that P(S* > t) = ∑_{m=1}^∞ P(A_m).
(i) Let B = (‖S‖_E > t). Note that B = lim_{j→∞}(‖S_j‖_E > t). We shall use the fact that
S_n = ½(S + (2S_n − S)) = ½([S_n + (S − S_n)] + [S_n − (S − S_n)]),
so that
‖S_n‖_E ≤ max(‖S_n + (S − S_n)‖_E, ‖S_n − (S − S_n)‖_E) = max(‖S‖_E, ‖S_n − (S − S_n)‖_E).
Let C_n = (‖S_n − (S − S_n)‖_E > t). Then
A_n = (A_n ∩ B) ∪ (A_n ∩ C_n),
so that P(A_n) ≤ P(A_n ∩ B) + P(A_n ∩ C_n). We shall show that these two summands are equal.
If j > n, then
P(A_n ∩ (‖S_j‖_E > t)) = P(A_n ∩ (‖S_n + (S_j − S_n)‖_E > t)) = P(A_n ∩ (‖S_n − (S_j − S_n)‖_E > t)),
by symmetry. Since
A_n ∩ B = lim_{j→∞}(A_n ∩ (‖S_j‖_E > t))
and
A_n ∩ C_n = lim_{j→∞}(A_n ∩ (‖S_n − (S_j − S_n)‖_E > t)),
P(A_n ∩ B) = P(A_n ∩ C_n); thus P(A_n) ≤ 2P(A_n ∩ B). Adding,
P(S* > t) ≤ 2P(B) = 2P(‖S‖_E > t).
(ii) Let E = (S*_σ > t), and let
E_n = (sup{‖S_m‖_E : m ∈ σ, m ≥ n} > t),
F_n = (sup{‖2S_n − S_m‖_E : m ∈ σ, m ≥ n} > t).
Then, arguing as before, A_n = (A_n ∩ E_n) ∪ (A_n ∩ F_n) and P(A_n ∩ E_n) = P(A_n ∩ F_n), so that
P(A_n) ≤ 2P(A_n ∩ E_n) ≤ 2P(A_n ∩ E).
Adding, P(S* > t) ≤ 2P(E) = 2P(S*_σ > t).
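Lévy's inequality (i) can be verified exactly, in its finite form, by enumerating all sign patterns for the symmetric sequence X_n = ε_n a_n. The sketch below is an added illustration, not part of the text; the vector a and the thresholds t are arbitrary.

```python
import itertools
import numpy as np

# Exact check of Levy's inequality for X_n = eps_n a_n:
# P(S* > t) <= 2 P(|S_N| > t), computed by enumerating all 2^N
# sign patterns with equal probability.
a = np.array([1.0, 0.6, -1.3, 0.8, 0.5, -0.9, 1.7, 0.3])
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=len(a))))
partial = np.cumsum(signs * a, axis=1)        # S_1, ..., S_N per pattern
s_star = np.max(np.abs(partial), axis=1)
s_final = np.abs(partial[:, -1])

for t in (0.5, 1.0, 2.0, 3.0):
    assert np.mean(s_star > t) <= 2.0 * np.mean(s_final > t)
```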
The reflection principle has many important consequences.
Corollary 12.2.1 If (X_n) is a symmetric sequence of random variables then ∑_{n=1}^∞ X_n converges almost everywhere if and only if it converges in probability.
Proof Since a sequence which converges almost everywhere converges in probability, we need only prove the converse. Suppose then that (S_n) converges in probability to S. First we show that, given ε > 0, there exists N such that P(sup_{n>N} ‖S_n − S_N‖_E > ε) < ε. There is a subsequence (S_{n_k}) which converges almost everywhere to S. Let A_K = (sup_{k≥K} ‖S_{n_k} − S‖_E ≤ ε). Then (A_K) is an increasing sequence, whose union contains the set on which S_{n_k} converges to S, and so there exists K such that P(sup_{k≥K} ‖S_{n_k} − S‖_E > ε) < ε/4. Let N = n_K. We discard the first N terms: let Y_j = X_{N+j}, let m_k = n_{K+k} − N, let σ = {m_k : k ∈ N} and let Z_k = Y_{m_{k−1}+1} + ··· + Y_{m_k}. The sequences (Y_j) and (Z_k) are symmetric. Let T_j = ∑_{i=1}^j Y_i and let U_k = T_{m_k} = ∑_{l=1}^k Z_l. Then T_j → S − S_N in probability, and U_k → S − S_N almost everywhere. Then, applying the reflection principle twice,
P(sup_{n>N} ‖S_n − S_N‖_E > ε) = P(T* > ε) ≤ 2P(T*_σ > ε) = 2P(U* > ε) ≤ 4P(‖S − S_N‖_E > ε) < ε.
We now use the first Borel–Cantelli lemma. Let (ε_r) be a sequence of positive numbers for which ∑_{r=1}^∞ ε_r < ∞. We can find an increasing sequence (N_r) such that, setting B_r = (sup_{n>N_r} ‖S_n − S_{N_r}‖_E > ε_r), P(B_r) < ε_r. Thus the probability that B_r happens infinitely often is zero: S_n converges almost everywhere.
Corollary 12.2.2 If (X_n) is a symmetric sequence of random variables for which ∑_{n=1}^∞ X_n converges almost everywhere to S, and if S ∈ L^p(E), where 0 < p < ∞, then S* ∈ L^p and ∑_{n=1}^∞ X_n converges to S in L^p norm.
Proof
E((S*)^p) = p ∫₀^∞ t^{p−1} P(S* > t) dt ≤ 2p ∫₀^∞ t^{p−1} P(‖S‖_E > t) dt = 2E(‖S‖_E^p).
Since ‖S_n − S‖_E^p ≤ (2S*)^p and ‖S_n − S‖_E^p → 0 almost everywhere, E(‖S_n − S‖_E^p) → 0 as n → ∞, by the dominated convergence theorem.
Corollary 12.2.3 Suppose that (X_n) is a symmetric sequence of random variables for which ∑_{n=1}^∞ X_n converges almost everywhere to S. Then, for each subsequence (X_{n_k}), ∑_{k=1}^∞ X_{n_k} converges almost everywhere. Further, if S ∈ L^p(E), where 0 < p < ∞, then ∑_{k=1}^∞ X_{n_k} converges in L^p norm.
Proof Let X′_n = X_n, if n = n_k for some k, and let X′_n = −X_n otherwise. Then (X′_n) has the same distribution as (X_n), and so it has the same convergence properties. Let Y_n = ½(X_n + X′_n). Then ∑_{n=1}^∞ Y_n = ∑_{k=1}^∞ X_{n_k}, from which the result follows.
12.3 Khintchine's inequality
Let us now consider possibly the simplest example of a symmetric sequence. Let X_n = ε_n a_n, where (a_n) is a sequence of real numbers and (ε_n) is a sequence of Bernoulli random variables. If (a_n) ∈ l₁, so that ∑_n a_n converges absolutely, then ∑_n ε_n(ω) a_n converges for all ω, and the partial sums s_n converge in norm in L_∞(D₂^N). On the other hand, if (a_n) ∈ c₀ and (a_n) ∉ l₁ then ∑_n ε_n(ω) a_n converges for some, but not all, ω. What more can we say?
First, let us consider the case where p = 2. Since
E(ε_m ε_n) = E(1) = 1 if m = n, E(ε_m ε_n) = E(ε_m)E(ε_n) = 0 otherwise,
(ε_n) is an orthonormal sequence in L₂(Ω). Thus ∑_{n=1}^∞ ε_n a_n converges in L₂ norm if and only if (a_n) ∈ l₂. If this is so then ‖∑_{n=1}^∞ ε_n a_n‖₂ = ‖(a_n)‖₂; further, the series converges almost everywhere, by Corollary 12.2.1 (or by the martingale convergence theorem). Thus things behave extremely well.
We now come to Khintchine's inequality, which we prove for finite sums. This does two things. First, it determines what happens for other values of p. Second, and perhaps more important, it gives information about the Orlicz norms ‖·‖_{exp} and ‖·‖_{exp₂}, and the distribution of the sum.
Theorem 12.3.1 (Khintchine's inequality) There exist positive constants A_p and B_p, for 0 < p < ∞, such that if a₁, ..., a_N are real numbers and ε₁, ..., ε_N are Bernoulli random variables, then
A_p ‖s_N‖_p ≤ σ ≤ B_p ‖s_N‖_p,
where s_N = ∑_{n=1}^N ε_n a_n and σ² = ‖s_N‖₂² = ∑_{n=1}^N a_n².
If 0 < p ≤ 2, we can take A_p = 1 and B_p = 3^{1/p−1/2}. If 2 ≤ p < ∞ we can take B_p = 1, and we can take A_p so that A_p ~ (e/p)^{1/2} as p → ∞.
If t is real then E(e^{t s_N}) ≤ e^{t²σ²/2}. Further, E(e^{s_N²/4σ²}) ≤ 2 and P(|s_N| > λ) ≤ 2e^{−λ²/2σ²}, for λ > 0.
Proof This proof was given by Khintchine and independently, in a slightly different form, by Littlewood. The inclusion mapping L_q → L_p is norm-decreasing for 0 < p < q < ∞, and so ‖s_N‖_p ≤ σ for 0 < p < 2 and ‖s_N‖_p ≥ σ for 2 < p < ∞. Thus we can take A_p = 1 for 0 < p ≤ 2 and B_p = 1 for 2 < p < ∞. The interest lies in the other inequalities. First we consider the case where 2 < p < ∞. If 2k − 2 < p ≤ 2k, where 2k is an even integer, then ‖s_N‖_{2k−2} ≤ ‖s_N‖_p ≤ ‖s_N‖_{2k}. Thus it is sufficient to establish the existence and asymptotic properties of A_{2k}, where 2k is an even integer. In this case,
‖∑_{n=1}^N ε_n a_n‖_{2k}^{2k} = E(∑_{n=1}^N ε_n a_n)^{2k}
= ∑_{j₁+···+j_N=2k} ((2k)!/(j₁! ··· j_N!)) a₁^{j₁} ··· a_N^{j_N} E(ε₁^{j₁} ··· ε_N^{j_N})
= ∑_{j₁+···+j_N=2k} ((2k)!/(j₁! ··· j_N!)) a₁^{j₁} ··· a_N^{j_N} E(ε₁^{j₁}) ··· E(ε_N^{j_N}),
by independence. Now E(ε_n^{j_n}) = E(1) = 1 if j_n is even, and E(ε_n^{j_n}) = E(ε_n) = 0 if j_n is odd. Thus many of the terms in the sum are 0, and
‖∑_{n=1}^N ε_n a_n‖_{2k}^{2k} = ∑_{k₁+···+k_N=k} ((2k)!/((2k₁)! ··· (2k_N)!)) a₁^{2k₁} ··· a_N^{2k_N}.
But (2k₁)! ··· (2k_N)! ≥ 2^{k₁} k₁! ··· 2^{k_N} k_N! = 2^k k₁! ··· k_N!, and so
‖∑_{n=1}^N ε_n a_n‖_{2k}^{2k} ≤ ((2k)!/(2^k k!)) ∑_{k₁+···+k_N=k} (k!/(k₁! ··· k_N!)) a₁^{2k₁} ··· a_N^{2k_N} = ((2k)!/(2^k k!)) σ^{2k}.
Thus we can take A_{2k} = ((2k)!/2^k k!)^{−1/2k}. Note that A_{2k} ≥ 1/√(2k), and that A_{2k} ~ (e/2k)^{1/2} as k → ∞, by Stirling's formula.
Then, since E(s_N^n) = 0 if n is odd,
E(e^{t s_N}) = ∑_{n=0}^∞ t^n E(s_N^n)/n! = ∑_{k=0}^∞ t^{2k} E(s_N^{2k})/(2k)! ≤ ∑_{k=0}^∞ (t^{2k}/(2k)!)((2k)! σ^{2k}/(2^k k!)) = e^{t²σ²/2}.
Similarly,
E(e^{s_N²/4σ²}) = ∑_{k=0}^∞ E(s_N^{2k})/(2^{2k} σ^{2k} k!) ≤ ∑_{k=0}^∞ (2k)!/(2^{3k} (k!)²) ≤ 2,
since (2k)! ≤ 2^{2k} (k!)².
Further, by Markov's inequality,
P(|s_N| > λ) = 2P(s_N > λ) ≤ 2e^{−tλ} E(e^{t s_N}) ≤ 2e^{−tλ} e^{t²σ²/2}.
Setting t = λ/σ², we obtain the final inequality.
We now consider the case where 0 < p < 2. Here we use Littlewood's inequality. Note that the argument above shows that we can take A₄ = 3^{−1/4}, so that ‖s_N‖₄ ≤ 3^{1/4} σ. Suppose that 0 < p < 2. Let θ = (4 − 2p)/(4 − p), so that 1/2 = (1 − θ)/p + θ/4. Then, by Littlewood's inequality,
σ = ‖s_N‖₂ ≤ ‖s_N‖_p^{1−θ} ‖s_N‖₄^θ ≤ 3^{θ/4} σ^θ ‖s_N‖_p^{1−θ},
so that σ ≤ 3^{1/p−1/2} ‖s_N‖_p, and we can take B_p = 3^{1/p−1/2}. In particular we can take B₁ = √3.
This part of the argument is due to Littlewood; unfortunately, he made a mistake in his calculations, and obtained B₁ = √2. This is in fact the best possible constant (take N = 2, a₁ = a₂ = 1), but this is much harder to prove. We shall do so later (Theorem 13.3.1).
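Khintchine's inequality for a finite sum can be checked exactly by averaging over all sign patterns. The sketch below is an added illustration, not part of the text: it uses the constants B₁ = √3 and A₄ = 3^{−1/4} from the proof, for an arbitrary coefficient vector a.

```python
import itertools
import numpy as np

# Exact check of Khintchine's inequality: with sigma^2 = sum a_n^2,
# verify ||s_N||_1 <= sigma <= sqrt(3) ||s_N||_1 (so B_1 = sqrt 3)
# and sigma <= ||s_N||_4 <= 3^(1/4) sigma (so A_4 = 3^(-1/4)).
a = np.array([1.0, -0.4, 0.8, 2.1, 0.6, -1.2])
sigma = np.sqrt(np.sum(a * a))
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=len(a))))
sums = signs @ a

def norm(p):
    return np.mean(np.abs(sums) ** p) ** (1.0 / p)

assert norm(1) <= sigma <= np.sqrt(3.0) * norm(1)
assert sigma <= norm(4) <= 3.0 ** 0.25 * sigma
```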
12.4 The law of the iterated logarithm
Why did Khintchine prove his inequality? In order to answer this, let us describe another setting in which a Bernoulli sequence of random variables occurs. Take Ω = [0, 1), with Lebesgue measure. If x ∈ [0, 1), let x = 0.x₁x₂ ... be the binary expansion of x (disallowing recurrent 1s). Let r_j(x) = 2x_j − 1, so that r_j(x) = 1 if x_j = 1 and r_j(x) = −1 if x_j = 0. The functions r_j are the Rademacher functions; considered as random variables on Ω, they form a Bernoulli sequence of random variables. They are closely connected to the dyadic filtration of [0, 1); the Rademacher function r_j is measurable with respect to the finite σ-field Σ_j generated by the intervals [k/2^j, (k + 1)/2^j), for 0 ≤ k ≤ 2^j − 1. Suppose now that x = 0.x₁x₂ ... is a number in [0, 1), in its binary expansion (disallowing recurrent 1s). Let t_n(x) be the number of times that 1 occurs in x₁, ..., x_n, and let a_n(x) = t_n(x)/n. We say that x is 2-normal if a_n(x) → ½ as n → ∞. In 1909, Borel proved his normal numbers theorem, the first of all the strong laws of large numbers. In its simplest form, this says that almost every number in [0, 1) is 2-normal.
We can express this in terms of the Rademacher functions, as follows. Let s_n(x) = ∑_{j=1}^n r_j(x); then s_n(x)/n → 0 for almost all x. Once Borel's theorem had been proved, the question was raised: how does the sequence (t_n(x) − ½n) behave as n → ∞? Equivalently, how does the sequence (s_n(x)) behave? Hardy and Littlewood gave partial answers, but in 1923, Khintchine [Khi 23] proved the following.
Theorem 12.4.1 (Khintchine's law of the iterated logarithm) For n ≥ 3, let L_n = (2n log log n)^{1/2}. If (r_n) are the Rademacher functions and s_n = ∑_{j=1}^n r_j then
lim sup_{n→∞} |s_n(x)/L_n| ≤ 1 for almost all x ∈ [0, 1).
Proof The proof that follows is essentially the one given by Khintchine, although he had to be rather more ingenious, since we use the reflection principle, which had not been proved in 1923. Suppose that β > 1. We need to show that for almost all x, |s_n(x)| > βL_n for only finitely many n, and we shall use the first Borel–Cantelli lemma to do so.
Let γ = β^{1/2}, so that 1 < γ < β. Let n_k be the least integer greater than γ^k. The sequence n_k is eventually strictly increasing: there exists k₀ such that n_k > n_{k−1} > 3 for k > k₀. Let
B_k = (sup_{n_{k−1}<n≤n_k} |s_n| > βL_n), for k ≥ k₀.
Now L_{n_k}/L_{n_{k−1}} → γ^{1/2} as k → ∞, and so there exists k₁ ≥ k₀ so that L_{n_k} ≤ γL_{n_{k−1}} for k ≥ k₁. Thus if k > k₁ and n_{k−1} < n ≤ n_k then βL_n ≥ βL_{n_{k−1}} ≥ γL_{n_k}, and so
B_k ⊆ (sup_{n_{k−1}<n≤n_k} |s_n| > γL_{n_k}) ⊆ (s*_{n_k} > γL_{n_k}),
so that, since E(s_{n_k}²) = n_k,
P(B_k) ≤ P(s*_{n_k} > γL_{n_k})
≤ 2P(|s_{n_k}| > γL_{n_k}) (by the reflection principle)
≤ 4e^{−β log log n_k} (by Khintchine's inequality)
≤ 4e^{−β log(k log γ)} (by the choice of n_k)
= 4(1/(k log γ))^β,
and so ∑_{k=k₁}^∞ P(B_k) < ∞. Thus for almost all x, |s_n(x)| ≤ βL_n for all but finitely many n.
Later Khintchine and Kolmogorov showed that this is just the right answer:
lim sup_{n→∞} |s_n(x)/L_n| = 1 for almost all x ∈ [0, 1).
We shall however not prove this; a proof, in the spirit of the above argument, using a more detailed version of the De Moivre central limit theorem that we shall prove in the next chapter, is given in [Fel 70], Theorem VIII.5.
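The L_n scale can be seen in simulation. The sketch below is an added Monte Carlo illustration, not part of the text: the number of walks, their length, and the margins 0.5 and 3 are arbitrary choices, picked so that the assertions hold with overwhelming probability.

```python
import numpy as np

# Monte Carlo illustration of the law of the iterated logarithm:
# along long random sign sequences, |s_n|/L_n with L_n = sqrt(2 n log log n)
# stays bounded (comfortably below 3) while not collapsing to 0.
rng = np.random.default_rng(2)
N, walks = 20000, 100
n = np.arange(1000, N + 1)
L = np.sqrt(2.0 * n * np.log(np.log(n)))

ratios = []
for _ in range(walks):
    s = np.cumsum(rng.choice([-1.0, 1.0], size=N))
    ratios.append(np.max(np.abs(s[n - 1]) / L))

assert max(ratios) < 3.0      # lim sup |s_n|/L_n <= 1 a.e.; 3 is a safe margin
assert max(ratios) > 0.5      # the walks do fluctuate on the L_n scale
```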
12.5 Strongly embedded subspaces
We have proved Khintchine's inequality for finite sums. From this, it is a straightforward matter to prove the following result for infinite sums.
Theorem 12.5.1 Let S be the closed linear span of the orthonormal sequence (ε_n)_{n=1}^∞ in L₂(D₂^N), and suppose that f ∈ S. If 0 < p < 2, then ‖f‖_p ≤ ‖f‖₂ ≤ B_p ‖f‖_p; if 2 < p < ∞ then A_p ‖f‖_p ≤ ‖f‖₂ ≤ ‖f‖_p; and ‖f‖_{exp₂} ≤ 2‖f‖₂ ≤ 2‖f‖_{exp₂}. Further, P(|f| > λ) ≤ 2e^{−λ²/2‖f‖₂²}.
Proof The details are left to the reader.
The fact that all these norms are equivalent on S is remarkable, important, and leads to the following definition. A closed linear subspace S of a Banach function space X(E) is said to be strongly embedded in X(E) if whenever f_n ∈ S and f_n → 0 in measure (or in probability) then ‖f_n‖_{X(E)} → 0.
Proposition 12.5.1 If S is strongly embedded in X(E) and X(E) ⊆ Y(E) then the norms ‖·‖_{X(E)} and ‖·‖_{Y(E)} are equivalent on S, and S is strongly embedded in Y(E).
Proof A simple application of the closed graph theorem shows that the inclusion mapping X(E) → Y(E) is continuous. If f_n ∈ S and ‖f_n‖_{Y(E)} → 0 then f_n → 0 in measure, and so ‖f_n‖_{X(E)} → 0. Thus the inverse mapping is continuous on S, and the norms are equivalent on S. It now follows immediately that S is strongly embedded in Y(E).
Proposition 12.5.2 Suppose that μ(Ω) = 1 and that 1 ≤ p < q < ∞. If S is a closed linear subspace of L_q(E) on which the L_p(E) and L_q(E) norms are equivalent, then S is strongly embedded in L_q(E).
Proof We denote the norms on L_p(E) and L_q(E) by ‖·‖_p and ‖·‖_q. There exists C such that ‖f‖_q ≤ C ‖f‖_p for f ∈ S. We shall show that there exists ε₀ > 0 such that if f ∈ S then
μ(|f| ≥ ε₀ ‖f‖_q) ≥ ε₀.
Suppose that f ∈ S, that f ≠ 0, and that μ(|f| ≥ ε ‖f‖_q) < ε for some ε > 0. We shall show that ε must be quite big. Let L = (|f| ≥ ε ‖f‖_q). Then
‖f‖_p^p = ∫_L |f|^p dμ + ∫_{Ω\L} |f|^p dμ ≤ ∫_L |f|^p dμ + ε^p ‖f‖_q^p.
We apply Hölder's inequality to the first term. Define t by p/q + 1/t = 1. Then
∫_L |f|^p dμ ≤ (∫_L |f|^q dμ)^{p/q} (μ(L))^{1/t} ≤ ε^{1/t} ‖f‖_q^p.
Consequently
‖f‖_p ≤ (ε^p + ε^{1/t})^{1/p} ‖f‖_q ≤ C (ε^p + ε^{1/t})^{1/p} ‖f‖_p.
Thus ε > ε₀, for some ε₀ > 0 which depends only on C, p and q. Thus if f ∈ S,
μ(|f| ≥ ε₀ ‖f‖_q) ≥ ε₀.
Suppose now that f_n ∈ S and f_n → 0 in probability. Let ε > 0. Then there exists n₀ such that μ(|f_n| ≥ ε₀ε) < ε₀/2 for n ≥ n₀, and so ε₀ ‖f_n‖_q ≤ ε₀ε for n ≥ n₀. Consequently ‖f_n‖_q ≤ ε for n ≥ n₀.
Corollary 12.5.1 The space S of Theorem 12.5.1 is strongly embedded in L_{exp₂}, and in each of the L_p spaces.
Proof S is certainly strongly embedded in L_p, for 1 ≤ p < ∞; since the norms ‖·‖_p and ‖·‖_{exp₂} are equivalent on S, it is strongly embedded in L_{exp₂}.
Combining this with Corollary 12.2.1, we have the following.
Corollary 12.5.2 Suppose that (a_n) is a real sequence. The following are equivalent:
(i) ∑_{n=1}^∞ a_n² < ∞;
(ii) ∑_{n=1}^∞ a_n ε_n converges in probability;
(iii) ∑_{n=1}^∞ a_n ε_n converges almost everywhere;
(iv) ∑_{n=1}^∞ a_n ε_n converges in L_p norm for some 0 < p < ∞;
(v) ∑_{n=1}^∞ a_n ε_n converges in L_p norm for all 0 < p < ∞;
(vi) ∑_{n=1}^∞ a_n ε_n converges in L_{exp₂} norm.
12.6 Stable random variables

Are there other natural examples of strongly embedded subspaces? A real-valued random variable $X$ is a standard real Gaussian random variable if it has density function $(1/2\pi)^{1/2}e^{-t^2/2}$, and a complex-valued random variable $X$ is a standard complex Gaussian random variable if it has density function $(1/\pi)e^{-|z|^2}$. Each has mean $0$ and variance $E(|X|^2) = 1$. If $(X_n)$ is a sequence of independent standard Gaussian random variables and $(a_1, \ldots, a_N)$ are real numbers then $S_N = \sum_{n=1}^N a_nX_n$ is a normal random variable with mean $0$ and variance
$$\sigma^2 = E\Big(\Big(\sum_{n=1}^N a_nX_n\Big)^2\Big) = \sum_{n=1}^N |a_n|^2;$$
that is, $S_N/\sigma$ is a standard Gaussian random variable. Thus if $0 < q < \infty$ then
$$E(|S_N|^q) = \sigma^q\sqrt{\frac{2}{\pi}}\int_0^\infty t^qe^{-t^2/2}\,dt = \sigma^q\frac{2^{q/2}}{\sqrt{\pi}}\int_0^\infty u^{(q-1)/2}e^{-u}\,du = \sqrt{\frac{2^q}{\pi}}\,\Gamma\big((q+1)/2\big)\,\sigma^q.$$
Thus if $S$ is the closed linear span of $(X_n)$ in $L_2$ then all the $L_p$ norms on $S$ are multiples of the $L_2$ norm, and the mapping $(a_n) \to \sum_{n=1}^\infty a_nX_n$ is a scalar multiple of an isometry of $l_2$ into $L_p(\Omega)$. Similarly, if $\|S_N\|_2 = \sqrt{3/8}$ then $E(e^{S_N^2}) = 2$, so that in general $\|S_N\|_{\exp_2} = \sqrt{8/3}\,\|S_N\|_2$, the mapping $(a_n) \to \sum_{n=1}^\infty a_nX_n$ is a scalar multiple of an isometry of $l_2$ into $L_{\exp_2}$, and the image of $l_2$ is strongly embedded in $L_{\exp_2}$.
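As a sanity check (an illustrative sketch, not part of the text), the closed form $E(|S_N|^q) = \sqrt{2^q/\pi}\,\Gamma((q+1)/2)\,\sigma^q$ can be verified numerically for $\sigma = 1$ by simple trapezoidal quadrature; the function name below is of course only illustrative.

```python
import math

def gaussian_abs_moment(q, n=200000, upper=12.0):
    # trapezoidal approximation of E|X|^q = sqrt(2/pi) * int_0^inf t^q e^{-t^2/2} dt
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * (t ** q) * math.exp(-t * t / 2.0)
    return math.sqrt(2.0 / math.pi) * total * h

for q in (1.0, 2.0, 3.0, 4.5):
    closed_form = math.sqrt(2.0 ** q / math.pi) * math.gamma((q + 1.0) / 2.0)
    assert abs(gaussian_abs_moment(q) - closed_form) < 1e-5
```

For $q = 2$ the closed form reduces to $1$, the variance of a standard Gaussian, which is a useful spot check.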
Here is another example. A real random variable $X$ is said to have the Cauchy distribution with parameter $a$ if it has probability density function $a/\pi(t^2 + a^2)$. If so, then it has characteristic function $E(e^{iXt}) = e^{-|at|}$. $X$ is not integrable, but is in $L_q(\Omega)$, for $0 < q < 1$. Now let $(X_n)$ be an independent sequence of random variables, each with the Cauchy distribution, with parameter $1$. If $(a_1, \ldots, a_N)$ are real numbers then $S_N = \sum_{n=1}^N a_nX_n$ is a Cauchy random variable with parameter $\|(a_n)\|_1$, so that $S_N/\|(a_n)\|_1$ is a Cauchy random variable with parameter $1$. Thus the mapping $(a_n) \to \sum_{n=1}^\infty a_nX_n$ is a scalar multiple of an isometry of $l_1$ into $L_q(\Omega)$, for $0 < q < 1$, and the image of $l_1$ is strongly embedded in $L_q(\Omega)$, for $0 < q < 1$.
These examples are special cases of a more general phenomenon. If $X$ is a standard real Gaussian random variable then its characteristic function $E(e^{itX})$ is $e^{-t^2/2}$, while if $X$ has Cauchy distribution with density $1/\pi(x^2+1)$ then its characteristic function is $e^{-|t|}$. In fact, for each $0 < p < 2$ there exists a random variable $X$ with characteristic function $e^{-|t|^p/p}$; such a random variable is called a symmetric $p$-stable random variable. $X$ is not in $L_p(\Omega)$, but $X \in L_q(\Omega)$ for $0 < q < p$. If $(X_n)$ is an independent sequence of random variables, each with the same distribution as $X$, and if $a_1, \ldots, a_N$ are real, then $S_N/\|(a_n)\|_p = (\sum_{n=1}^N a_nX_n)/\|(a_n)\|_p$ has the same distribution as $X$; thus if $0 < q < p$, the mapping $(a_n) \to \sum_{n=1}^\infty a_nX_n$ is a scalar multiple of an isometry of $l_p$ into $L_q(\Omega)$, and the image of $l_p$ is strongly embedded in $L_q(\Omega)$, for $0 < q < p$.
12.7 Sub-Gaussian random variables

Recall that Khintchine's inequality shows that if $S_N = \sum_{n=1}^N a_n\epsilon_n$ then its moment generating function $E(e^{tS_N})$ satisfies $E(e^{tS_N}) \le e^{\sigma^2t^2/2}$. On the other hand, if $X$ is a random variable with a Gaussian distribution with mean $0$ and variance $E(X^2) = \sigma^2$, its moment generating function $E(e^{tX})$ is $e^{\sigma^2t^2/2}$. This led Kahane [Kah 85] to make the following definition. A random variable $X$ is sub-Gaussian, with exponent $b$, if $E(e^{tX}) \le e^{b^2t^2/2}$ for $-\infty < t < \infty$.

The next result gives basic information about sub-Gaussian random variables.

Theorem 12.7.1 Suppose that $X$ is a sub-Gaussian random variable with exponent $b$. Then
(i) $P(X > R) \le e^{-R^2/2b^2}$ and $P(X < -R) \le e^{-R^2/2b^2}$ for each $R > 0$;
(ii) $X \in L_{\exp_2}$ and $\|X\|_{\exp_2} \le 2b$;
(iii) $X$ is integrable, $E(X) = 0$, and $E(X^{2k}) \le 2^{k+1}k!\,b^{2k}$ for each positive integer $k$.
Conversely if $X$ is a real random variable which satisfies (iii) then $X$ is sub-Gaussian with exponent $2\sqrt2\,b$.
Proof (i) By Markov's inequality, if $t > 0$ then
$$e^{tR}P(X > R) \le E(e^{tX}) \le e^{b^2t^2/2}.$$
Setting $t = R/b^2$, we see that $P(X > R) \le e^{-R^2/2b^2}$. Since $-X$ is also sub-Gaussian with exponent $b$, $P(X < -R) \le e^{-R^2/2b^2}$ as well.

(ii)
$$E(e^{X^2/4b^2}) = \frac{1}{2b^2}\int_0^\infty te^{t^2/4b^2}P(|X| > t)\,dt \le \frac{1}{b^2}\int_0^\infty te^{-t^2/4b^2}\,dt = 2.$$

(iii) Since $X \in L_{\exp_2}$, $X$ is integrable. Since $tx \le e^{tx} - 1$, $tE(X) \le e^{b^2t^2/2} - 1$, from which it follows that $E(X) \le 0$. Since $-X$ is sub-Gaussian, $E(-X) \le 0$ as well. Thus $E(X) = 0$.
Further,
$$E(X^{2k}) = 2k\int_0^\infty t^{2k-1}P(|X| > t)\,dt \le 2\cdot2k\int_0^\infty t^{2k-1}e^{-t^2/2b^2}\,dt = (2b^2)^k\,2k\int_0^\infty s^{k-1}e^{-s}\,ds = 2^{k+1}k!\,b^{2k}.$$
Finally, suppose that $X$ is a real random variable which satisfies (iii). If $y > 0$ and $k \ge 1$ then
$$\frac{y^{2k+1}}{(2k+1)!} \le \frac{y^{2k}}{(2k)!} + \frac{y^{2k+2}}{(2k+2)!},$$
so that
$$E(e^{tX}) \le 1 + \sum_{n=2}^\infty E\Big(\frac{|tX|^n}{n!}\Big) \le 1 + 2\sum_{k=1}^\infty E\Big(\frac{|tX|^{2k}}{(2k)!}\Big) \le 1 + 4\sum_{k=1}^\infty \frac{k!\,(2b^2t^2)^k}{(2k)!} \le 1 + \sum_{k=1}^\infty \frac{(4b^2t^2)^k}{k!} = e^{4b^2t^2},$$
since $2(k!)^2 \le (2k)!$.
Note that this theorem shows that if $X$ is a bounded random variable with zero expectation then $X$ is sub-Gaussian.

If $X_1, \ldots, X_N$ are independent sub-Gaussian random variables with exponents $b_1, \ldots, b_N$ respectively, and $a_1, \ldots, a_N$ are real numbers, then
$$E\big(e^{t(a_1X_1 + \cdots + a_NX_N)}\big) = \prod_{n=1}^N E(e^{ta_nX_n}) \le \prod_{n=1}^N e^{a_n^2b_n^2t^2/2},$$
so that $a_1X_1 + \cdots + a_NX_N$ is sub-Gaussian, with exponent $(a_1^2b_1^2 + \cdots + a_N^2b_N^2)^{1/2}$. We therefore obtain the following generalization of Khintchine's inequality.

Proposition 12.7.1 Suppose that $(X_n)$ is a sequence of independent identically distributed sub-Gaussian random variables with exponent $b$, and let $S$ be their closed linear span in $L_2$. Then $S$ is strongly embedded in $L_{\exp_2}$.
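To make the definition concrete, here is a small exact computation (an illustrative sketch, not from the text): a Bernoulli variable is sub-Gaussian with exponent $1$, since $E(e^{t\epsilon}) = \cosh t \le e^{t^2/2}$, and a weighted Bernoulli sum is then sub-Gaussian with exponent $\|a\|_2$, exactly as in the additivity argument above.

```python
import math
from itertools import product

# Bernoulli (Rademacher) variables: E(e^{t eps}) = cosh(t) <= e^{t^2/2}
for i in range(-60, 61):
    t = i / 10.0
    assert math.cosh(t) <= math.exp(t * t / 2.0) + 1e-12

def mgf(a, t):
    # exact moment generating function of sum a_n eps_n over all 2^d sign patterns
    return sum(math.exp(t * sum(e * x for e, x in zip(eps, a)))
               for eps in product((-1, 1), repeat=len(a))) / 2 ** len(a)

a = [0.8, -0.5, 0.3, 1.1]
b2 = sum(x * x for x in a)      # squared exponent ||a||_2^2
for t in (0.5, 1.0, 2.0):
    assert mgf(a, t) <= math.exp(b2 * t * t / 2.0) + 1e-9
```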
12.8 Kahane's theorem and Kahane's inequality

We now turn to the vector-valued case. We restrict our attention to an independent sequence of symmetric random variables, taking values in the unit ball of a Banach space $E$.

Theorem 12.8.1 Suppose that $(X_n)$ is an independent sequence of symmetric random variables, and suppose that $\sum_{n=1}^\infty X_n$ converges almost everywhere to $S$. Let $S^* = \sup_n \|S_n\|_E$. Then, if $t > 0$,
$$P(S^* > 2t+1) \le 4\big(P(S^* > t)\big)^2.$$

Proof Once again, we use a stopping time argument. Let $T = \inf\{j: \|S_j\| > t\}$ and let $A_m = (T = m)$. Fix an index $k$, and consider the event $B_k = (\|S_k\|_E > 2t+1)$. Clearly $B_k \subseteq (T \le k)$, and so
$$P(B_k) = \sum_{j=1}^k P(A_j \cap B_k).$$
But if $\omega \in A_j$ then $\|S_{j-1}(\omega)\|_E \le t$, so that $\|S_j(\omega)\|_E \le t+1$. Thus if $\omega \in A_j \cap B_k$, $\|S_k(\omega) - S_j(\omega)\|_E > t$. Using the fact that $A_j$ and $S_k - S_j$ are independent, we therefore have
$$P(A_j \cap B_k) \le P\big(A_j \cap (\|S_k - S_j\|_E > t)\big) = P(A_j)P(\|S_k - S_j\|_E > t).$$
Applying the reflection principle to the sequence $(S_k - S_j, S_j, 0, 0, \ldots)$, we see that
$$P(\|S_k - S_j\|_E > t) \le 2P(\|S_k\|_E > t) \le 2P(S^* > t).$$
Substituting and adding,
$$P(B_k) = \sum_{j=1}^k P(A_j \cap B_k) \le 2\Big(\sum_{j=1}^k P(A_j)\Big)P(S^* > t) \le 2\big(P(S^* > t)\big)^2.$$
Using the reflection principle again,
$$P\Big(\sup_{1\le n\le k}\|S_n\|_E > 2t+1\Big) \le 2P(B_k) \le 4\big(P(S^* > t)\big)^2.$$
Letting $k \to \infty$, we obtain the result.
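Since a finite sum is a special case of the theorem (take $X_n = 0$ for large $n$), the inequality can be checked exactly by enumerating all sign patterns of a scalar Rademacher sum with steps in the unit ball; this brute-force sketch is illustrative only.

```python
from itertools import product

# all 2^d sign patterns of a scalar sum of symmetric +-x_n steps, |x_n| <= 1
x = [1.0, 0.9, 1.0, 0.7, 1.0, 0.8, 1.0, 0.6, 1.0, 0.5]

def s_star(eps):
    # running maximum of |partial sums|, i.e. S* for this sign pattern
    s, m = 0.0, 0.0
    for e, xn in zip(eps, x):
        s += e * xn
        m = max(m, abs(s))
    return m

stars = [s_star(eps) for eps in product((-1, 1), repeat=len(x))]
for t in (0.5, 1.0, 1.5, 2.0, 2.5):
    p_t = sum(1 for m in stars if m > t) / len(stars)
    p_big = sum(1 for m in stars if m > 2 * t + 1) / len(stars)
    assert p_big <= 4 * p_t * p_t + 1e-12   # P(S* > 2t+1) <= 4 P(S* > t)^2
```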
Theorem 12.8.2 (Kahane's Theorem) Suppose that $(X_n)$ is an independent sequence of symmetric random variables, taking values in the unit ball of a Banach space $E$. If $\sum_{n=1}^\infty X_n$ converges almost everywhere to $S$ then $S^* \in L_{\exp}$, $E(e^{\alpha S^*}) < \infty$ for each $\alpha > 0$, and $\sum_{n=1}^\infty X_n$ converges to $S$ in $L_{\exp}$ norm.

Proof Suppose that $\alpha > 0$. Choose $0 < \epsilon < 1$ so that $e^{\alpha\epsilon} < 3/2$ and $\epsilon e^{4\alpha} < 1/2$. Since $S_n \to S$ almost everywhere, there exists $N$ such that $P(\|S - S_N\|_E > \epsilon) < \epsilon/8$. Let $Z_n = X_{N+n}$, let $R_k = \sum_{j=1}^k Z_j$, let $R = \sum_{j=1}^\infty Z_j$, and let $R^* = \sup_k \|R_k\|_E$. We shall show that $E(e^{\alpha R^*}) \le 2$, so that $R^* \in L_{\exp}$ and $\|R^*\|_{\exp} \le 1/\alpha$. Since $S^* \le N + R^*$, it follows that $S^* \in L_{\exp}$, that $\|S^*\|_{\exp} \le \|N\|_{\exp} + \|R^*\|_{\exp} \le (N/\log 2) + 1/\alpha$ and that $E(e^{\alpha S^*}) \le e^{\alpha N}E(e^{\alpha R^*}) \le 2e^{\alpha N}$. Further, since $\|S_n - S\|_E \le 2R^*$ for $n \ge N$, $\|S_n - S\|_{\exp} \le 2/\alpha$ for $n \ge N$. Since this holds for any $\alpha > 0$, $S_n \to S$ in $L_{\exp}$ norm.
It remains to show that $E(e^{\alpha R^*}) \le 2$. Since $R = S - S_N$, $P(\|R\|_E > \epsilon) < \epsilon/8$, and so by the reflection principle $P(R^* > \epsilon) < \epsilon/4$. Let $\beta = \epsilon + 1$, let $t_0 = \beta - 1 = \epsilon$, and let $t_r = 2^r\beta - 1$, for $r \in \mathbb{N}$. Then $t_{r+1} = 2t_r + 1$; applying Theorem 12.8.1 inductively, we find that
$$P(R^* > t_r) \le \frac{\epsilon^{2^r}}{4}.$$
Then, since $t_{r+1} \le 2^{r+2} - 1$ and $\epsilon e^{4\alpha} < \tfrac12$,
$$E(e^{\alpha R^*}) \le e^{\alpha t_0}P(R^* \le t_0) + \sum_{r=0}^\infty e^{\alpha t_{r+1}}P(t_r < R^* \le t_{r+1}) \le \frac32 + \frac14\sum_{r=0}^\infty e^{\alpha(2^{r+2}-1)}\epsilon^{2^r} = \frac32 + \frac{e^{-\alpha}}{4}\sum_{r=0}^\infty \big(e^{4\alpha}\epsilon\big)^{2^r} < \frac32 + \frac14\sum_{r=0}^\infty 2^{-2^r} < 2.$$
Corollary 12.8.1 $S \in L_p(\Omega)$, for $0 < p < \infty$, and $S_n \to S$ in $L_p$ norm.

Corollary 12.8.2 Suppose that $(\epsilon_n)$ is a Bernoulli sequence of random variables, and that $E$ is a Banach space. Let
$$S = \Big\{\sum_{n=1}^\infty \epsilon_nx_n : x_n \in E,\ \sum_{n=1}^\infty \epsilon_nx_n \text{ converges almost everywhere}\Big\}.$$
Then $S$ is strongly embedded in $L_{\exp}(E)$.

Proof Take $\alpha = 1$ and $\epsilon = e^{-5}$, so that $e^{\alpha\epsilon} < 3/2$ and $\epsilon e^{4\alpha} < 1/2$. If $s = \sum_{n=1}^\infty \epsilon_nx_n \in S$, then $\|s\|_1 < \infty$. Suppose that $\|s\|_1 \le \epsilon^2/8$. Then $\|x_n\| \le 1$ for each $n$, and so we can apply the theorem. Also $P(\|s\|_E > \epsilon) \le \epsilon/8$, by Markov's inequality, and the calculations of the theorem then show that $\|s\|_{\exp} \le 1$. This shows that $S$ is strongly embedded in $L_{\exp}$, and the final inequality follows from this.
Corollary 12.8.3 (Kahane's inequality) If $1 < p < q$ then there exists a constant $K_{pq}$ such that if $u_1, \ldots, u_N \in E$ then
$$\Big\|\sum_{n=1}^N \epsilon_nu_n\Big\|_q \le K_{pq}\Big\|\sum_{n=1}^N \epsilon_nu_n\Big\|_p.$$
We shall prove a more general form of Kahane's inequality in the next chapter.
12.9 Notes and remarks

Spellings of Khintchine's name vary. I have followed the spelling used in his seminal paper [Khi 23]. A similar remark applies to the spelling of Kolmogorov.

For more details about p-stable random variables, see [Bre 68] or [ArG 80].

We have discussed Khintchine's use of his inequality. But why did Littlewood prove it? We shall discuss this in Chapter 18.
Exercises

12.1 Suppose that $L_\Phi(\Omega, \Sigma, \mu)$ is an Orlicz space and that $f \in L_\Phi$. Suppose that $g$ is a measurable function for which $\mu(|g| > t) \le 2\mu(|f| > t)$ for all $t > 0$. Show that $g \in L_\Phi$ and $\|g\|_\Phi \le 2\|f\|_\Phi$.
Hint: Consider the functions $g_1$ and $g_{-1}$ defined on $\Omega \times D_2$ as
$$g_1(\omega, 1) = g(\omega),\quad g_1(\omega, -1) = 0,\qquad g_{-1}(\omega, 1) = 0,\quad g_{-1}(\omega, -1) = g(\omega).$$
12.2 Let
$$A_n = \Big(\frac{1}{2^n},\ \frac{1}{2^n} + \frac{1}{2^{n+1}}\Big],\qquad B_n = \Big(\frac{1}{2^n} + \frac{1}{2^{n+1}},\ \frac{1}{2^{n-1}}\Big],$$
and let $X_n = n(I_{A_n} - I_{B_n})$. Show that $(X_n)$ is a symmetric sequence of random variables defined on $(0,1]$, equipped with Lebesgue measure. Let $S_n = \sum_{j=1}^n X_j$ and $S = \sum_{j=1}^\infty X_j$. Show that $S^* = |S|$, and that $S^* \in L_{\exp}$. Show that $S_n \to S$ pointwise, but that $\|S - S_n\|_{\exp} = 1/\log 2$, so that $S_n \not\to S$ in norm. Compare this with Corollary 12.2.2.
12.3 Suppose that $a_1, \ldots, a_N$ are real numbers with $\sum_{n=1}^N a_n^2 = 1$. Let $f = \sum_{n=1}^N \epsilon_na_n$ and let $g = \prod_{n=1}^N (1 + i\epsilon_na_n)$.
(a) Use the arithmetic mean–geometric mean inequality to show that $\|g\|_\infty \le \sqrt e$.
(b) Show that $E(fg) = i$.
(c) Show that we can take $B_1 = \sqrt e$ in Khintchine's inequality.
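Parts (a) and (b) can be confirmed by exact enumeration for a small example (an illustrative check; the coefficient vector is arbitrary subject to $\sum_n a_n^2 = 1$).

```python
import math
from itertools import product

a = [0.6, 0.8]                       # sum of squares is 1
pats = list(product((-1, 1), repeat=len(a)))
Efg = 0
for eps in pats:
    f = sum(e * x for e, x in zip(eps, a))
    g = 1
    for e, x in zip(eps, a):
        g *= 1 + 1j * e * x
    assert abs(g) <= math.sqrt(math.e) + 1e-12   # part (a): |g| <= sqrt(e)
    Efg += f * g
Efg /= len(pats)
assert abs(Efg - 1j) < 1e-12                      # part (b): E(fg) = i
```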
12.4 Suppose that $X$ is a random variable with Cauchy distribution with parameter $a$. Show that $E(e^{iXt}) = e^{-|at|}$. [This is a standard exercise in the use of the calculus of residues and Jordan's lemma.]
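A numerical version of this exercise (an illustrative sketch, not a substitute for the contour-integral argument): approximate $E(e^{iXt}) = \int_{-\infty}^\infty \cos(tx)\,a/\pi(x^2+a^2)\,dx$ by quadrature over a large window and compare with $e^{-|at|}$.

```python
import math

def cauchy_cf(t, a, X=4000.0, n=400000):
    # trapezoidal quadrature of 2 * int_0^X cos(t x) a/(pi (x^2 + a^2)) dx
    # (the integrand is even in x; the slow 1/x^2 tail limits the accuracy)
    h = X / n
    total = 0.0
    for i in range(n + 1):
        x = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.cos(t * x) * a / (math.pi * (x * x + a * a))
    return 2.0 * total * h

for t, a in ((1.0, 1.0), (0.5, 2.0)):
    assert abs(cauchy_cf(t, a) - math.exp(-abs(a * t))) < 1e-3
```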
12.5 Suppose that $F$ is a strongly embedded subspace of $L_p(\Omega)$, where $2 < p < \infty$. Show that $F$ is isomorphic to a Hilbert space, and that $F$ is complemented in $L_q(\Omega)$ (that is, there is a continuous linear projection of $L_q(\Omega)$ onto $F$) for $p' \le q \le p$.
inequalities
13.1 Bonamis inequality
In the previous chapter, we proved Kahanes inequality, but did not estimate
the constants involved. In order to do this, we take a dierent approach.
We start with an inequality that seems banal, and has an uninformative
proof, but which turns out to have far-reaching consequences. Throughout
this chapter, we set r
p
= 1/

p 1, for 1 < p < .


Proposition 13.1.1 (Bonamis inequality) Let
F
p
(x, y) = (
1
2
([x +r
p
y[
p
+[x r
p
y[
p
))
1/p
,
where x, y R. Then F
p
(x, y) is a decreasing function of p on (1, ).
Proof By homogeneity, we can suppose that $x = 1$. We consider three cases.

First, suppose that $1 < p < q \le 2$ and that $0 \le |r_py| \le 1$. Using the binomial theorem and the inequality $(1+x)^\alpha \le 1 + \alpha x$ for $0 < \alpha \le 1$, and putting $\alpha = p/q$, we find that
$$F_q(1,y) = \Big(1 + \sum_{k=1}^\infty \binom{q}{2k}\Big(\frac{y^2}{q-1}\Big)^k\Big)^{1/q} \le \Big(1 + \frac pq\sum_{k=1}^\infty \binom{q}{2k}\Big(\frac{y^2}{q-1}\Big)^k\Big)^{1/p}.$$
Now
$$\frac pq\binom{q}{2k}\Big(\frac{1}{q-1}\Big)^k = \frac pq\cdot\frac{q(q-1)\cdots(q-2k+1)}{(2k)!\,(q-1)^k} = \frac{p(2-q)\cdots(2k-1-q)}{(2k)!\,(q-1)^{k-1}} \le \frac{p(2-p)\cdots(2k-1-p)}{(2k)!\,(p-1)^{k-1}} = \binom{p}{2k}\Big(\frac{1}{p-1}\Big)^k.$$
Thus
$$F_q(1,y) \le \Big(1 + \sum_{k=1}^\infty \binom{p}{2k}\Big(\frac{y^2}{p-1}\Big)^k\Big)^{1/p} = F_p(1,y).$$

Second, suppose that $1 < p < q \le 2$ and that $|r_py| \ge 1$. We use the fact that if $0 < s, t < 1$ then $1 - st > s - t$ and $1 + st > s + t$. Set $\sigma = r_q/r_p$ and $\tau = 1/|r_py|$. Then, using the first case (applied with $r_qy' = \sigma\tau$, so that $r_py' = \tau \le 1$),
$$F_q(1,y) = \big(\tfrac12(|1 + r_qy|^q + |1 - r_qy|^q)\big)^{1/q} = \frac1\tau\big(\tfrac12(|\tau + \sigma|^q + |\tau - \sigma|^q)\big)^{1/q} \le \frac1\tau\big(\tfrac12(|1 + \sigma\tau|^q + |1 - \sigma\tau|^q)\big)^{1/q} \le \frac1\tau\big(\tfrac12(|1 + \tau|^p + |1 - \tau|^p)\big)^{1/p} = F_p(1,y).$$

Again, let $\sigma = r_q/r_p = \sqrt{(p-1)/(q-1)}$. Note that we have shown that the linear mapping $K \in L(L_p(D_2), L_q(D_2))$ defined by
$$K(f)(x) = \int_{D_2} k(x,y)f(y)\,d\mu(y),$$
where $k(1,1) = k(-1,-1) = 1 + \sigma$ and $k(1,-1) = k(-1,1) = 1 - \sigma$, is norm-decreasing.

Third, suppose that $2 \le p < q < \infty$. Then $1 < q' < p' \le 2$ and $\sigma^2 = (p-1)/(q-1) = (q'-1)/(p'-1)$, so that $K$ is norm-decreasing from $L_{q'}$ to $L_{p'}$. But $k$ is symmetric, and so $K^* = K$ is norm-decreasing from $L_p$ to $L_q$.
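The monotonicity asserted by Bonami's inequality is easy to probe numerically (an illustrative sketch; the grids of $p$ and $y$ values are arbitrary).

```python
def F(p, y):
    # F_p(1, y) with r_p = 1/sqrt(p - 1)
    r = (p - 1.0) ** -0.5
    return (0.5 * (abs(1 + r * y) ** p + abs(1 - r * y) ** p)) ** (1.0 / p)

ps = [1.1, 1.5, 2.0, 3.0, 4.0, 8.0, 16.0]
for y in (0.0, 0.3, 1.0, 2.5, 10.0):
    vals = [F(p, y) for p in ps]
    # decreasing in p, for small and large |r_p y| alike
    assert all(v1 >= v2 - 1e-12 for v1, v2 in zip(vals, vals[1:]))
```

For example $F_2(1,y) = \sqrt{1+y^2}$, while $F_4(1,y) = (1 + 2y^2 + y^4/9)^{1/4}$, which is indeed smaller.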
Next we extend this result to vector-valued functions.

Corollary 13.1.1 Let
$$F_p(x,y) = \big(\tfrac12(\|x + r_py\|^p + \|x - r_py\|^p)\big)^{1/p},$$
where $x$ and $y$ are vectors in a normed space $(E, \|\cdot\|_E)$. Then $F_p(x,y)$ is a decreasing function of $p$ on $(1, \infty)$.

Proof We need the following lemma.
Lemma 13.1.1 If $x$ and $z$ are vectors in a normed space and $-1 \le \lambda \le 1$ then
$$\|x + \lambda z\| \le \frac12(\|x+z\| + \|x-z\|) + \frac\lambda2(\|x+z\| - \|x-z\|).$$

Proof Since
$$x + \lambda z = \Big(\frac{1+\lambda}{2}\Big)(x+z) + \Big(\frac{1-\lambda}{2}\Big)(x-z),$$
we have
$$\|x + \lambda z\| \le \Big(\frac{1+\lambda}{2}\Big)\|x+z\| + \Big(\frac{1-\lambda}{2}\Big)\|x-z\| = \frac12(\|x+z\| + \|x-z\|) + \frac\lambda2(\|x+z\| - \|x-z\|).$$
We now prove the corollary. Let us set $s = x + r_py$, $t = x - r_py$ and $\lambda = r_q/r_p$, so that $0 < \lambda < 1$. Then, using the lemma and the scalar case (Proposition 13.1.1),
$$\big(\tfrac12(\|x + r_qy\|^q + \|x - r_qy\|^q)\big)^{1/q} \le \Big(\tfrac12\Big(\big[\tfrac12(\|s\| + \|t\|) + \tfrac\lambda2(\|s\| - \|t\|)\big]^q + \big[\tfrac12(\|s\| + \|t\|) - \tfrac\lambda2(\|s\| - \|t\|)\big]^q\Big)\Big)^{1/q}$$
$$\le \Big(\tfrac12\Big(\big[\tfrac12(\|s\| + \|t\|) + \tfrac12(\|s\| - \|t\|)\big]^p + \big[\tfrac12(\|s\| + \|t\|) - \tfrac12(\|s\| - \|t\|)\big]^p\Big)\Big)^{1/p} = \big(\tfrac12(\|s\|^p + \|t\|^p)\big)^{1/p} = \big(\tfrac12(\|x + r_py\|^p + \|x - r_py\|^p)\big)^{1/p}.$$
We now extend Bonami's inequality.

Theorem 13.1.1 (Bonami's Theorem) Suppose that $1 < p < q < \infty$, and that $\{x_A : A \subseteq \{1, \ldots, N\}\}$ is a family of vectors in a normed space $(E, \|\cdot\|_E)$. Then
$$\Big\|\sum_A r_q^{|A|}w_Ax_A\Big\|_{L_q(E)} \le \Big\|\sum_A r_p^{|A|}w_Ax_A\Big\|_{L_p(E)},$$
where the $w_A$ are Walsh functions.
Proof We prove the result by induction on $N$. The result is true for $N = 1$, by Corollary 13.1.1. Suppose that the result is true for $N - 1$. We can write $D_2^N = D_2^{N-1} \times D_2^{(N)}$, and $P^N = P^{N-1} \otimes P_N$. Let $\mathcal P(N-1)$ denote the set of subsets of $\{1, \ldots, N-1\}$ and let $\mathcal P(N)$ denote the set of subsets of $\{1, \ldots, N\}$. If $B \in \mathcal P(N-1)$, let $B^+ = B \cup \{N\}$, so that $\mathcal P(N) = \mathcal P(N-1) \cup \{B^+ : B \in \mathcal P(N-1)\}$. Let
$$u_p = \sum_{B \in \mathcal P(N-1)} r_p^{|B|}w_Bx_B \quad\text{and}\quad v_p = \sum_{B \in \mathcal P(N-1)} r_p^{|B|}w_Bx_{B^+},$$
so that $\sum_{A \in \mathcal P(N)} r_p^{|A|}w_Ax_A = u_p + \epsilon_Nr_pv_p$; let $u_q$ and $v_q$ be defined similarly. Then we need to show that
$$\|u_q + \epsilon_Nr_qv_q\|_{L_q(E)} \le \|u_p + \epsilon_Nr_pv_p\|_{L_p(E)}.$$
Now, by the inductive hypothesis, for each $\omega \in D_2^{(N)}$,
$$\big(\mathbb E_{N-1}\|u_q + \epsilon_N(\omega)r_qv_q\|_E^q\big)^{1/q} = \Big(\mathbb E_{N-1}\Big\|\sum_{B \in \mathcal P(N-1)} r_q^{|B|}w_B\big(x_B + \epsilon_N(\omega)r_qx_{B^+}\big)\Big\|_E^q\Big)^{1/q} \le \Big(\mathbb E_{N-1}\Big\|\sum_{B \in \mathcal P(N-1)} r_p^{|B|}w_B\big(x_B + \epsilon_N(\omega)r_qx_{B^+}\big)\Big\|_E^p\Big)^{1/p} = \big(\mathbb E_{N-1}\|u_p + \epsilon_N(\omega)r_qv_p\|_E^p\big)^{1/p}.$$
Thus, using Corollary 5.4.2 and the result for $N = 1$,
$$\|u_q + \epsilon_Nr_qv_q\|_{L_q(E)} = \big(\mathbb E_N(\mathbb E_{N-1}(\|u_q + \epsilon_Nr_qv_q\|_E^q))\big)^{1/q} \le \big(\mathbb E_N\big((\mathbb E_{N-1}(\|u_p + \epsilon_Nr_qv_p\|_E^p))^{q/p}\big)\big)^{1/q} \le \big(\mathbb E_{N-1}\big((\mathbb E_N(\|u_p + \epsilon_Nr_qv_p\|_E^q))^{p/q}\big)\big)^{1/p} \le \big(\mathbb E_{N-1}(\mathbb E_N(\|u_p + \epsilon_Nr_pv_p\|_E^p))\big)^{1/p} = \|u_p + \epsilon_Nr_pv_p\|_{L_p(E)}.$$
13.2 Kahane's inequality revisited

We have the following generalization of Kahane's inequality (which corresponds to the case $n = 1$). Let $W_n$ denote the set of Walsh functions $w_A$ with $|A| = n$ and let $H_n(E)$ be the closed linear span of random vectors of the form $w_Au_A$, with $|A| = n$.

Theorem 13.2.1 Suppose that $(u_k)$ is a sequence in a Banach space $E$ and that $(w_{A_k})$ is a sequence of distinct elements of $W_n$. Then if $1 < p < q$
$$\Big\|\sum_{k=1}^K w_{A_k}u_k\Big\|_{L_q(E)} \le \Big(\frac{q-1}{p-1}\Big)^{n/2}\Big\|\sum_{k=1}^K w_{A_k}u_k\Big\|_{L_p(E)}.$$
Thus $H_n$ is strongly embedded in $L_p$ for all $1 < p < \infty$. Further $H_1(E)$ is strongly embedded in $L_{\exp_2}(E)$ and $H_2(E)$ is strongly embedded in $L_{\exp}(E)$.
Proof If $S_K = \sum_{k=1}^K \epsilon_ku_k$ and $\|S_K\|_2 \le 1/(2\sqrt e)$ then
$$E(e^{\|S_K\|^2}) = \sum_{j=0}^\infty \frac{E(\|S_K\|^{2j})}{j!} \le \sum_{j=0}^\infty \frac{(2j)^j}{2^{2j}e^jj!} \le \sum_{j=0}^\infty \frac{1}{2^j} = 2,$$
since $j^j \le e^jj!$ (Exercise 3.5).
Similarly, if $T_K = \sum_{k=1}^K w_{A_k}u_k$ with $|A_k| = 2$ for all $k$ and $\|T_K\|_2 \le 1/(2e)$ then
$$E(e^{\|T_K\|}) = \sum_{j=0}^\infty \frac{E(\|T_K\|^j)}{j!} \le \sum_{j=0}^\infty \frac{j^j}{2^je^jj!} \le \sum_{j=0}^\infty \frac{1}{2^j} = 2.$$
We also have the following result in the scalar case.

Corollary 13.2.1 $\mathrm{span}\{H_k : k \le n\}$ is strongly embedded in $L_p$ for all $1 < p < \infty$.

Proof Since the spaces $H_k$ are orthogonal, if $f = f_0 + \cdots + f_n$ and $q > 2$ then
$$\|f\|_q \le \sum_{j=0}^n \|f_j\|_q \le \sum_{j=0}^n (q-1)^{j/2}\|f_j\|_2 \le \Big(\sum_{j=0}^n (q-1)^j\Big)^{1/2}\Big(\sum_{j=0}^n \|f_j\|_2^2\Big)^{1/2} \le \Big(\frac{(q-1)^{n+1}}{q-2}\Big)^{1/2}\|f\|_2.$$
13.3 The theorem of Latała and Oleszkiewicz

Theorem 13.2.1 gives good information about what happens for large values of $p$ (which is the more important case), but does not deal with the case where $p = 1$. We do however have the following remarkable theorem relating the $L_1(E)$ and $L_2(E)$ norms of Bernoulli sums, which not only shows that $\sqrt2$ is the best constant in Khintchine's inequality but also shows that the same constant works in the vector-valued case.

Theorem 13.3.1 (Latała–Oleszkiewicz [LaO 94]) Let $S_d = \sum_{i=1}^d \epsilon_ia_i$, where $\epsilon_1, \ldots, \epsilon_d$ are Bernoulli random variables and $a_1, \ldots, a_d$ are vectors in a normed space $E$. Then $\|S_d\|_{L_2(E)} \le \sqrt2\,\|S_d\|_{L_1(E)}$.
Proof The Walsh functions form an orthonormal basis for $L_2(D_2^d)$, so that if $f \in C_{\mathbb R}(D_2^d)$ then
$$f = \sum_A \hat f_Aw_A = E(f) + \sum_{i=1}^d \hat f_i\epsilon_i + \sum_{|A|>1} \hat f_Aw_A,$$
and $\|f\|_2^2 = \langle f, f\rangle = \sum_A \hat f_A^2$.

We consider a graph with vertices the elements of $D_2^d$ and edges the set of pairs
$$\{(\omega, \omega') : \omega_i \ne \omega'_i \text{ for exactly one } i\}.$$
If $(\omega, \omega')$ is an edge, we write $\omega \sim \omega'$. We use this to define the graph Laplacian of $f$ as
$$L(f)(\omega) = \frac12\sum_{\omega' : \omega' \sim \omega}\big(f(\omega) - f(\omega')\big),$$
and the energy $\mathcal E(f)$ of $f$ as $\mathcal E(f) = \langle f, L(f)\rangle$. Let us calculate the Laplacian for the Walsh functions. If $\omega \sim \omega'$ and $\omega_i \ne \omega'_i$, then
$$w_A(\omega') = w_A(\omega) \text{ if } i \notin A,\qquad w_A(\omega') = -w_A(\omega) \text{ if } i \in A,$$
so that $L(w_A) = |A|w_A$. Thus the Walsh functions are the eigenvectors of $L$, and $L$ corresponds to differentiation. Further,
$$L(f) = \sum_{i=1}^d \hat f_i\epsilon_i + \sum_{|A|>1} |A|\hat f_Aw_A,$$
so that
$$\mathcal E(f) = \sum_{i=1}^d \hat f_i^2 + \sum_{|A|>1} |A|\hat f_A^2.$$
Thus
$$2\|f\|_2^2 = 2\langle f, f\rangle \le \mathcal E(f) + 2(E(f))^2 + \sum_{i=1}^d \hat f_i^2.$$
We now embed $D_2^d$ as the vertices of the unit cube of $l_\infty^d$. Let $f(x) = \|x_1a_1 + \cdots + x_da_d\|$, so that $f(\omega) = \|S_d(\omega)\|$, $\langle f, f\rangle = \|S_d\|_{L_2(E)}^2$, and $E(f) = \|S_d\|_{L_1(E)}$. Since $f$ is an even function, $\hat f_i = 0$ for $1 \le i \le d$, and since $f$ is convex and positively homogeneous,
$$\frac1d\sum_{\omega' : \omega' \sim \omega} f(\omega') \ge f\Big(\frac1d\sum_{\omega' : \omega' \sim \omega}\omega'\Big) = f\Big(\frac{d-2}{d}\,\omega\Big) = \frac{d-2}{d}\,f(\omega),$$
by Jensen's inequality. Consequently,
$$Lf(\omega) \le \frac12\big(df(\omega) - (d-2)f(\omega)\big) = f(\omega),$$
so that $\mathcal E(f) \le \|f\|_2^2$ and $2\|f\|_2^2 \le \|f\|_2^2 + 2(E(f))^2$. Thus $\|S_d\|_{L_2(E)} \le \sqrt2\,\|S_d\|_{L_1(E)}$.
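The vector-valued inequality can be checked exactly on small examples by enumerating all $2^d$ sign patterns; here $E = \mathbb R^2$ with the Euclidean norm (an illustrative sketch; the vectors are arbitrary).

```python
import math
from itertools import product

a = [(1.0, 0.0), (0.3, 0.7), (-0.5, 0.2), (0.1, -0.9)]   # vectors in R^2
norms = []
for eps in product((-1, 1), repeat=len(a)):
    sx = sum(e * v[0] for e, v in zip(eps, a))
    sy = sum(e * v[1] for e, v in zip(eps, a))
    norms.append(math.hypot(sx, sy))          # ||S_d(omega)||
l1 = sum(norms) / len(norms)                   # ||S_d||_{L_1(E)}
l2 = math.sqrt(sum(n * n for n in norms) / len(norms))  # ||S_d||_{L_2(E)}
assert l2 <= math.sqrt(2.0) * l1 + 1e-12
```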
13.4 The logarithmic Sobolev inequality on $D_2^d$

The introduction of the Laplacian in the proof of Theorem 13.3.1 indicates that the results that we have proved are related to semigroup theory. Let $P_t = e^{-tL}$; then $(P_t)_{t\ge0}$ is a semigroup of operators on $C_{\mathbb R}(D_2^d)$ with infinitesimal generator $-L$. Then $P_t(w_A) = e^{-t|A|}w_A$, and so Bonami's theorem shows that if $1 < p < \infty$ and $q(t) = 1 + (p-1)e^{2t}$ then
$$\|P_t(f)\|_{q(t)} \le \|f\|_p.$$
This inequality is known as the hypercontractive inequality.
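On a small cube the hypercontractive inequality can be verified exactly, expanding $f$ in Walsh functions and scaling the coefficient of $w_A$ by $e^{-t|A|}$ (an illustrative sketch; the coefficients are arbitrary).

```python
import math
from itertools import product

d, p, t = 3, 1.5, 0.4
q = 1.0 + (p - 1.0) * math.exp(2.0 * t)        # q(t) = 1 + (p-1)e^{2t}
verts = list(product((-1, 1), repeat=d))
subsets = list(product((0, 1), repeat=d))       # indicator vectors of A
coeffs = dict(zip(subsets, [1.0, 0.7, -0.4, 0.2, 0.5, -0.1, 0.3, 0.05]))

def walsh(A, v):
    # w_A(v) = product of the coordinates of v indexed by A (empty product = 1)
    return math.prod(x for x, in_A in zip(v, A) if in_A)

f = {v: sum(c * walsh(A, v) for A, c in coeffs.items()) for v in verts}
Ptf = {v: sum(c * math.exp(-t * sum(A)) * walsh(A, v) for A, c in coeffs.items())
       for v in verts}

def norm(g, r):
    return (sum(abs(g[v]) ** r for v in verts) / len(verts)) ** (1.0 / r)

assert norm(Ptf, q) <= norm(f, p) + 1e-12       # ||P_t f||_{q(t)} <= ||f||_p
```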
The hypercontractive inequality is closely related to the logarithmic Sobolev inequality, which is obtained by differentiation. Suppose that $f$ is a non-negative function on $D_2^d$. We define its entropy, $\mathrm{Ent}(f)$, as
$$\mathrm{Ent}(f) = E(f\log f) - \|f\|_1\log\|f\|_1.$$
[We set $0\log 0 = 0$, since $x\log x \to 0$ as $x \to 0$.] Since the function $x\log x$ is strictly convex, it follows from Jensen's inequality that $\mathrm{Ent}(f) \ge 0$, with equality if and only if $f$ is constant. If $\|f\|_1 = 1$ then $\mathrm{Ent}(f) = E(f\log f)$, and generally $\mathrm{Ent}(\lambda f) = \lambda\,\mathrm{Ent}(f)$ for $\lambda > 0$. This entropy is a relative entropy, related to the entropy of information theory in the following way. Recall that the information entropy $\mathrm{ent}(\mu)$ of a probability measure $\mu$ on $D_2^d$ is defined as $-\sum_{\omega \in D_2^d}\mu(\omega)\log_2\mu(\omega)$. Thus $\mathrm{ent}(P_d) = d$ (where $P_d$ is Haar measure), and, as we shall see, $\mathrm{ent}(\mu) \le \mathrm{ent}(P_d)$ for any other probability measure $\mu$ on $D_2^d$. Now if $f \ge 0$ and $\|f\|_1 = 1$ then $f$ defines a probability measure $f\,dP_d$ on $D_2^d$ which gives $\omega$ the point probability $f(\omega)/2^d$. Thus
$$\mathrm{ent}(f\,dP_d) = -\sum_{\omega \in D_2^d}\frac{f(\omega)}{2^d}\log_2\Big(\frac{f(\omega)}{2^d}\Big) = d - \frac{\mathrm{Ent}(f)}{\log 2}.$$
Thus $\mathrm{Ent}(f)$ measures how far the information entropy of $f\,dP_d$ falls below the maximum entropy $d$.
Theorem 13.4.1 (The logarithmic Sobolev inequality) If $f \in C_{\mathbb R}(D_2^d)$ then $\mathrm{Ent}(f^2) \le 2\mathcal E(f)$.

Proof Take $p = 2$ and set $q(t) = 1 + e^{2t}$. Since $P_t(w_A) = e^{-t|A|}w_A$, $dP_t(w_A)/dt = -|A|e^{-t|A|}w_A = -LP_t(w_A)$, and so by linearity $dP_t(f)/dt = -LP_t(f)$. Suppose that $\|f\|_2 = 1$. Then $\|P_t(f)\|_{q(t)} \le 1$, so that $(d/dt)E(P_t(f)^{q(t)}) \le 0$ at $t = 0$. Now
$$\frac{d}{dt}\big(P_t(f)^{q(t)}\big) = P_t(f)^{q(t)}\frac{d}{dt}\log\big(P_t(f)^{q(t)}\big) = P_t(f)^{q(t)}\frac{d}{dt}\big(q(t)\log P_t(f)\big) = 2e^{2t}P_t(f)^{q(t)}\log P_t(f) - (1 + e^{2t})P_t(f)^{q(t)-1}LP_t(f).$$
Taking expectations, and setting $t = 0$, we see that
$$0 \ge E\big(f^2\log(f^2)\big) - 2E(fL(f)) = \mathrm{Ent}(f^2) - 2\mathcal E(f).$$
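The inequality $\mathrm{Ent}(f^2) \le 2\mathcal E(f)$ can be verified exactly on a small cube, computing the energy directly from the graph Laplacian (an illustrative sketch; the test function is arbitrary, chosen so that $f^2 > 0$ everywhere).

```python
import math
from itertools import product

d = 4
verts = list(product((-1, 1), repeat=d))

def f(w):                     # an arbitrary nowhere-zero test function
    return 1.0 + 0.5 * w[0] + 0.25 * w[0] * w[1] - 0.3 * w[2] * w[3] + 0.1 * w[3]

fv = {w: f(w) for w in verts}

def neighbours(w):
    for i in range(d):
        yield w[:i] + (-w[i],) + w[i + 1:]

# energy E(f) = <f, Lf> with L(f)(w) = (1/2) sum_{w'~w} (f(w) - f(w'))
energy = sum(fv[w] * 0.5 * sum(fv[w] - fv[u] for u in neighbours(w))
             for w in verts) / len(verts)
sq = [fv[w] ** 2 for w in verts]
mean_sq = sum(sq) / len(sq)
ent = sum(s * math.log(s) for s in sq) / len(sq) - mean_sq * math.log(mean_sq)
assert 0.0 <= ent <= 2.0 * energy + 1e-12
```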
We can use the logarithmic Sobolev inequality to show that certain functions are sub-Gaussian. For $\omega \in D_2^d$ let $\omega^i \in D_2^d$ be defined by $(\omega^i)_i = -\omega_i$, $(\omega^i)_j = \omega_j$ otherwise. If $f \in C_{\mathbb R}(D_2^d)$ and $\omega \in D_2^d$, define the gradient $\nabla f(\omega) \in \mathbb R^d$ by setting $\nabla f(\omega)_i = f(\omega^i) - f(\omega)$. Then
$$|\nabla f(\omega)|^2 = \sum_{i=1}^d \big(f(\omega^i) - f(\omega)\big)^2 = \sum_{\omega' : \omega' \sim \omega}\big(f(\omega') - f(\omega)\big)^2.$$
Note that
$$\mathcal E(f) = \frac{1}{2^{d+1}}\sum_\omega\sum_{\omega' : \omega' \sim \omega}\big(f(\omega) - f(\omega')\big)f(\omega) = \frac{1}{2^{d+2}}\sum_\omega\sum_{\omega' : \omega' \sim \omega}\big(f(\omega) - f(\omega')\big)^2 = \frac14E\big(|\nabla f|^2\big).$$
Theorem 13.4.2 Suppose that $E(f) = 0$ and that $|\nabla f(\omega)| \le 1$ for all $\omega \in D_2^d$. Then $f$ is sub-Gaussian with exponent $1/\sqrt2$: that is, $E(e^{\lambda f}) \le e^{\lambda^2/4}$, for all real $\lambda$.

Proof It is clearly sufficient to consider the case where $\lambda > 0$. Let $H(\lambda) = E(e^{\lambda f})$. First we show that $E(|\nabla(e^{\lambda f/2})|^2) \le \lambda^2H(\lambda)/2$. Using the mean value theorem to establish the first inequality,
$$E\big(|\nabla(e^{\lambda f/2})|^2\big) = \frac{1}{2^d}\sum_\omega\sum_{\omega' : \omega' \sim \omega}\big(e^{\lambda f(\omega)/2} - e^{\lambda f(\omega')/2}\big)^2 = \frac{2}{2^d}\sum\Big\{\big(e^{\lambda f(\omega)/2} - e^{\lambda f(\omega')/2}\big)^2 : \omega' \sim \omega,\ f(\omega') < f(\omega)\Big\}$$
$$\le \frac{\lambda^2}{2\cdot2^d}\sum\Big\{\big(f(\omega) - f(\omega')\big)^2e^{\lambda f(\omega)} : \omega' \sim \omega,\ f(\omega') < f(\omega)\Big\} \le \frac{\lambda^2}{2\cdot2^d}\sum_\omega\sum_{\omega' : \omega' \sim \omega}\big(f(\omega) - f(\omega')\big)^2e^{\lambda f(\omega)} = \frac{\lambda^2}{2}E\big(|\nabla f|^2e^{\lambda f}\big) \le \frac{\lambda^2}{2}E(e^{\lambda f}) = \frac{\lambda^2H(\lambda)}{2}.$$
Thus, applying the logarithmic Sobolev inequality,
$$\mathrm{Ent}(e^{\lambda f}) \le 2\mathcal E(e^{\lambda f/2}) = \tfrac12E\big(|\nabla(e^{\lambda f/2})|^2\big) \le \frac{\lambda^2H(\lambda)}{4}.$$
But
$$\mathrm{Ent}(e^{\lambda f}) = \lambda E(fe^{\lambda f}) - H(\lambda)\log H(\lambda) = \lambda H'(\lambda) - H(\lambda)\log H(\lambda),$$
so that
$$\lambda H'(\lambda) - H(\lambda)\log H(\lambda) \le \frac{\lambda^2H(\lambda)}{4}.$$
Let $K(\lambda) = (\log H(\lambda))/\lambda$, so that $e^{\lambda K(\lambda)} = E(e^{\lambda f})$. Then
$$K'(\lambda) = \frac{H'(\lambda)}{\lambda H(\lambda)} - \frac{\log H(\lambda)}{\lambda^2} \le \frac14.$$
Now as $\lambda \to 0$, $H(\lambda) = 1 + \lambda E(f) + O(\lambda^2) = 1 + O(\lambda^2)$, so that $\log H(\lambda) = O(\lambda^2)$, and $K(\lambda) \to 0$ as $\lambda \to 0$. Thus $K(\lambda) = \int_0^\lambda K'(s)\,ds \le \lambda/4$, and $H(\lambda) = E(e^{\lambda f}) \le e^{\lambda^2/4}$.
Corollary 13.4.1 If $r > 0$ then $P(f \ge r) \le e^{-r^2}$.
This leads to a concentration of measure result. Let $h$ be the Hamming metric on $D_2^d$, so that $h(\omega, \omega') = \frac12\sum_{i=1}^d |\omega_i - \omega'_i|$, and $\omega \sim \omega'$ if and only if $h(\omega, \omega') = 1$. If $A$ is a non-empty subset of $D_2^d$, let $h_A(\omega) = \inf\{h(\omega, \alpha) : \alpha \in A\}$.

Corollary 13.4.2 Suppose that $P(A) > 1/e$. Then $E(h_A) \le \sqrt d$. Let $A_s = \{\omega : h(\omega, A) \le s\}$. If $t > 1$ then $P(A_{t\sqrt d}) \ge 1 - e^{-(t-1)^2}$.

Proof Let $g(\omega) = h_A(\omega)/\sqrt d$. Then $|g(\omega) - g(\omega')| \le h(\omega, \omega')/\sqrt d$, so that $|\nabla g(\omega)| \le 1$ for each $\omega \in D_2^d$. Applying Corollary 13.4.1 to $E(g) - g$ with $r = 1$, we see that $P(g \le E(g) - 1) \le 1/e$. But $P(g \le 0) > 1/e$, so that $E(g) \le 1$. Now apply Corollary 13.4.1 to $g - E(g)$, with $r = t - 1$:
$$1 - P(A_{t\sqrt d}) = P(g > t) \le P\big(g - E(g) > t - 1\big) \le e^{-(t-1)^2}.$$
13.5 Gaussian measure and the Hermite polynomials

Although, as we have seen, analysis on the discrete space $D_2^d$ leads to interesting conclusions, it is natural to want to obtain similar results on Euclidean space. Here it turns out that the natural underlying measure is not Haar measure (that is, Lebesgue measure) but is Gaussian measure. In this setting, we can obtain logarithmic Sobolev inequalities, which correspond to the Sobolev inequalities for Lebesgue measure, but have the great advantage that they are not dependent on the dimension of the space, and so can be extended to the infinite-dimensional case.

First, let us describe the setting in which we work. Let $\gamma_1$ be the probability measure on the line $\mathbb R$ given by
$$d\gamma_1(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx,$$
and let $\xi_1$ be the random variable $\xi_1(x) = x$, so that $\xi_1$ is a standard Gaussian or normal random variable, with mean $0$ and variance $E(\xi_1^2) = 1$. Similarly, let $\gamma_d$ be the probability measure on $\mathbb R^d$ given by
$$d\gamma_d(x) = \frac{1}{(2\pi)^{d/2}}e^{-|x|^2/2}\,dx,$$
and let $\xi_i(x) = x_i$, for $1 \le i \le d$. Then $(\xi_1, \ldots, \xi_d)$ is a sequence of independent standard Gaussian random variables. More generally, a closed linear subspace $H$ of $L_2$ is a Gaussian Hilbert space if each $f \in H$ has a centred Gaussian distribution (with variance $\|f\|_2^2$). As we have seen, $H$ is then strongly embedded in $L_{\exp_2}$. If, as we shall generally suppose, $H$ is separable and $(f_i)$ is an orthonormal basis for $H$, then $(f_i)$ is a sequence of independent standard Gaussian random variables.
We shall discuss in some detail what happens in the one-dimensional case, and then describe how the results extend to higher dimensions. The sequence of functions $(1, x, x^2, \ldots)$ is linearly independent, but not orthogonal, in $L_2(\gamma_1)$; we apply Gram–Schmidt orthonormalization to obtain an orthonormal sequence $(\tilde h_n)$ of polynomials. We shall see that these form an orthonormal basis of $L_2(\gamma_1)$. Each $\tilde h_n$ is a polynomial of degree $n$, and we can choose it so that its leading coefficient is positive. Let us then write $\tilde h_n = c_nh_n$, where $c_n > 0$ and $h_n$ is a monic polynomial of degree $n$ (that is, the coefficient of $x^n$ is $1$). The next proposition enables us to recognize $h_n$ as the $n$th Hermite polynomial.
Proposition 13.5.1 Define the $n$th Hermite polynomial as
$$h_n(x) = (-1)^ne^{x^2/2}\Big(\frac{d}{dx}\Big)^ne^{-x^2/2}.$$
Then
$$h_n(x) = \Big(x - \frac{d}{dx}\Big)h_{n-1}(x) = \Big(x - \frac{d}{dx}\Big)^n1.$$
Each $h_n$ is a monic polynomial of degree $n$, $(h_n)$ is an orthogonal sequence in $L_2(\gamma_1)$, and $\|h_n\|_2 = (n!)^{1/2}$.

Proof Differentiating the defining relation for $h_{n-1}$, we see that $dh_{n-1}(x)/dx = xh_{n-1}(x) - h_n(x)$, which gives the first assertion, and it follows from this that $h_n$ is a monic polynomial of degree $n$. If $m \le n$, then, integrating by parts $m$ times,
$$\int x^mh_n(x)\,d\gamma_1(x) = \frac{(-1)^n}{\sqrt{2\pi}}\int_{-\infty}^\infty x^m\Big(\frac{d}{dx}\Big)^ne^{-x^2/2}\,dx = \frac{(-1)^{n-m}m!}{\sqrt{2\pi}}\int_{-\infty}^\infty\Big(\frac{d}{dx}\Big)^{n-m}e^{-x^2/2}\,dx = \begin{cases}0 & \text{if } m < n,\\ n! & \text{if } m = n.\end{cases}$$
Thus $h_n$ is orthogonal to all polynomials of lower degree; consequently $(h_n)$ is an orthogonal sequence in $L_2(\gamma_1)$. Finally,
$$\|h_n\|_2^2 = \langle h_n, x^n\rangle + \langle h_n, h_n - x^n\rangle = n!$$
Corollary 13.5.1 We have the following relations:
(i) $h_n(x) = i^ne^{x^2/2}\int_{\mathbb R} u^ne^{-iux}\,d\gamma_1(u) = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^\infty (x+iy)^ne^{-y^2/2}\,dy$;
(ii) $\dfrac{dh_n}{dx}(x) = nh_{n-1}(x)$;
(iii) $\displaystyle\int\Big(\frac{dh_n}{dx}\Big)^2\,d\gamma_1 = n\cdot n!$, and $\displaystyle\int\frac{dh_n}{dx}\frac{dh_m}{dx}\,d\gamma_1 = 0$ for $m \ne n$.

Proof The first equation of (i) follows by repeatedly applying the operator $x - d/dx$ to the equation $1 = e^{x^2/2}\int_{\mathbb R} e^{-iux}\,d\gamma_1(u)$. Making the change of variables $y = u + ix$ (justified by Cauchy's theorem), we obtain the second equation. Differentiating under the integral sign (which is easily seen to be valid), we obtain (ii), and (iii) follows from this, and the proposition.
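The recurrence $h_n = (x - d/dx)h_{n-1}$ and the relations $\langle h_n, h_m\rangle = n!\,\delta_{nm}$ can be checked exactly, representing polynomials by coefficient lists and using the Gaussian moments $\int x^{2k}\,d\gamma_1 = (2k-1)!!$ (an illustrative sketch).

```python
import math

def next_hermite(h):
    # apply (x - d/dx): multiply by x, then subtract the derivative
    out = [0.0] * (len(h) + 1)
    for j, c in enumerate(h):
        out[j + 1] += c
        if j >= 1:
            out[j - 1] -= j * c
    return out

def moment(k):
    # E(x^k) under gamma_1: 0 for odd k, (k-1)!! for even k (1 for k = 0)
    return 0 if k % 2 else math.prod(range(1, k, 2))

def inner(p, q):
    # <p, q> in L_2(gamma_1), via moments of the product
    return sum(cp * cq * moment(i + j)
               for i, cp in enumerate(p) for j, cq in enumerate(q))

hs = [[1.0]]                         # h_0 = 1
for n in range(1, 7):
    hs.append(next_hermite(hs[-1]))
for n, hn in enumerate(hs):
    assert abs(inner(hn, hn) - math.factorial(n)) < 1e-9   # ||h_n||^2 = n!
    for m in range(n):
        assert abs(inner(hn, hs[m])) < 1e-9                # orthogonality
```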
Proposition 13.5.2 The polynomial functions are dense in $L_p(\gamma_1)$, for $0 < p < \infty$.

Proof We begin by showing that the exponential functions are approximated by their power series expansions. Let $e_n(x) = \sum_{j=0}^n (\lambda x)^j/j!$. Then
$$|e^{\lambda x} - e_n(x)|^p = \Big|\sum_{j=n+1}^\infty \frac{(\lambda x)^j}{j!}\Big|^p \le e^{p|\lambda x|},$$
and $\int e^{p|\lambda x|}\,d\gamma_1(x) < \infty$, so that by the theorem of dominated convergence $\int |e^{\lambda x} - e_n(x)|^p\,d\gamma_1 \to 0$ as $n \to \infty$, and so $e_n(x) \to e^{\lambda x}$ in $L_p(\gamma_1)$.

Now suppose that $1 \le p < \infty$ and that $f \in L_p(\gamma_1)$ is not in the closure of the polynomial functions in $L_p(\gamma_1)$. Then by the separation theorem there exists $g \in L_{p'}(\gamma_1)$ such that $\int fg\,d\gamma_1 = 1$ and $\int qg\,d\gamma_1 = 0$ for every polynomial function $q$. But then $\int e^{\lambda x}g(x)\,d\gamma_1(x) = 0$ for all $\lambda$, so that
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{isx}g(x)e^{-x^2/2}\,dx = \int e^{isx}g(x)\,d\gamma_1(x) = 0,$$
so that the Fourier transform of $g(x)e^{-x^2/2}$ is zero, and so $g = 0$, giving a contradiction.

Thus the polynomial functions are dense in $L_p(\gamma_1)$, for $1 \le p < \infty$. Since $L_1(\gamma_1)$ is dense in $L_p(\gamma_1)$ for $0 < p < 1$, the polynomial functions are dense in these spaces too.
Corollary 13.5.2 The functions $(\tilde h_n)$ form an orthonormal basis for $L_2(\gamma_1)$.

It is worth noting that this is a fairly sophisticated proof, since it uses the theorem of dominated convergence, and Fourier transforms. It is possible to give a more elementary proof, using the Stone–Weierstrass theorem, but this is surprisingly complicated.
13.6 The central limit theorem

We wish to establish hypercontractive and logarithmic Sobolev inequalities in this Gaussian setting. We have seen that in $D_2^d$ these inequalities are related to a semigroup of operators. The same is true in the Gaussian case, where the semigroup is the Ornstein–Uhlenbeck semigroup $(P_t)_{t\ge0}$ acting on $L_2(\gamma_1)$:
$$\text{if } f = \sum_{n=0}^\infty f_n\tilde h_n(\xi), \text{ then } P_t(f) = \sum_{n=0}^\infty e^{-nt}f_n\tilde h_n(\xi).$$
There are now two ways to proceed. The first is to give a careful direct analysis of the Ornstein–Uhlenbeck semigroup; but this would take us too far into semigroup theory. The second, which we shall follow, is to use the central limit theorem to carry results across from the $D_2^d$ case. For this we only need the simplest form of the central limit theorem, which goes back to the work of De Moivre, in the eighteenth century.

A function $g$ defined on $\mathbb R$ is of polynomial growth if there exist $C > 0$ and $N \in \mathbb N$ such that $|g(x)| \le C(1 + |x|^N)$, for all $x \in \mathbb R$.
Theorem 13.6.1 (De Moivre's central limit theorem) Let $(\epsilon_n)$ be a sequence of Bernoulli random variables and let $C_n = (\epsilon_1 + \cdots + \epsilon_n)/\sqrt n$. Let $\gamma$ be a Gaussian random variable with mean $0$ and variance $1$. Then $P(C_n \le t) \to P(\gamma \le t)$ for each $t \in \mathbb R$, and if $g$ is a continuous function of polynomial growth then $E(g(C_n)) \to E(g(\gamma))$ as $n \to \infty$.
Proof We shall prove this for even values of $n$: the proof for odd values is completely similar. Fix $m$, and let $t_j = j/\sqrt{2m}$. The random variable $C_{2m}$ takes values $t_{2k}$, for $-m \le k \le m$, and
$$P(C_{2m} = t_{2k}) = \frac{1}{2^{2m}}\binom{2m}{m+k}.$$
First we show that we can replace the random variables $(C_{2m})$ by random variables $(D_{2m})$ which have density functions, and whose density functions are step functions. Let $I_{2k} = (t_{2k-1}, t_{2k+1}]$ and let $D_{2m}$ be the random variable which has density
$$p_{2m}(t) = \sqrt{\frac m2}\,\frac{1}{2^{2m}}\binom{2m}{m+k} \text{ if } t \in I_{2k} \text{ for some } -m \le k \le m,\qquad p_{2m}(t) = 0 \text{ otherwise.}$$
Thus $P(C_{2m} \in I_{2k}) = P(D_{2m} \in I_{2k})$. The random variables $C_{2m}$ are all sub-Gaussian, with exponent $1$, and so $P(|C_{2m}| > R) \le 2e^{-R^2/2}$, and if $m \ge 2$ then $P(|D_{2m}| > R+1) \le 2e^{-R^2/2}$. Thus if $g$ is a continuous function of polynomial growth and $\epsilon > 0$ there exists $R > 0$ such that
$$\int_{|C_{2m}|>R} |g(C_{2m})|\,dP \le \frac\epsilon3 \quad\text{and}\quad \int_{|D_{2m}|>R} |g(D_{2m})|\,dP \le \frac\epsilon3$$
for all $m$. On the other hand, it follows from the uniform continuity of $g$ on $[-R, R]$ that there exists $m_0$ such that
$$\Big|\int_{|C_{2m}|\le R} g(C_{2m})\,dP - \int_{|D_{2m}|\le R} g(D_{2m})\,dP\Big| \le \frac\epsilon3$$
for $m \ge m_0$. Thus $E(g(C_{2m})) - E(g(D_{2m})) \to 0$ as $m \to \infty$. Similarly, $P(C_{2m} \le t) - P(D_{2m} \le t) \to 0$ as $m \to \infty$. It is therefore sufficient to prove the result with the random variables $(D_{2m})$ in place of $(C_{2m})$.
First we show that $p_{2m}(t) \to e^{-t^2/2}/C$ (where $C$ is the constant in Stirling's formula) as $m \to \infty$. Applying Stirling's formula (Exercise 13.1),
$$p_{2m}(0) = \sqrt{\frac m2}\,\frac{(2m)!}{2^{2m}(m!)^2} \to 1/C.$$
If $t > 0$ and $m \ge 2t^2$ then $t \in I_{2k_t}$ for some $k_t$ with $|k_t| \le m/2$. Then
$$p_{2m}(t) = p_{2m}(0)\,\frac{(m-1)\cdots(m-k_t)}{(m+1)\cdots(m+k_t)} = p_{2m}(0)\,\frac{(1-1/m)\cdots(1-k_t/m)}{(1+1/m)\cdots(1+k_t/m)}.$$
Let
$$r_{2m}(t) = \log\Big(\frac{(1-1/m)\cdots(1-k_t/m)}{(1+1/m)\cdots(1+k_t/m)}\Big) = \sum_{j=1}^{k_t}\log(1-j/m) - \sum_{j=1}^{k_t}\log(1+j/m).$$
Since $|\log(1+x) - x| \le x^2$ for $|x| < 1/2$,
$$\big|r_{2m}(t) + k_t(k_t+1)/m\big| \le k_t(k_t+1)(2k_t+1)/3m^2,$$
for large enough $m$. But $k_t^2/m \to t^2/2$ as $m \to \infty$, and so $r_{2m}(t) \to -t^2/2$ as $m \to \infty$. Thus $p_{2m}(t) \to e^{-t^2/2}/C$ as $m \to \infty$. By symmetry, the result also holds for $t < 0$.
Finally, $p_{2m}$ is a decreasing function on $[0, \infty)$, so that the functions $p_{2m}$ are uniformly bounded; further, if $|t| \ge 3$ and $m \ge 2$ then, since $p_{2m}$ is decreasing,
$$p_{2m}(t) \le \frac{2}{|t|}P\big(|D_{2m}| > |t|/2\big) \le |t|\,e^{-(|t|/2-1)^2/2}.$$
We apply the theorem of dominated convergence: if $g$ is a continuous function of polynomial growth then
$$E(g(D_{2m})) = \int_{-\infty}^\infty g(t)p_{2m}(t)\,dt \to \frac1C\int_{-\infty}^\infty g(t)e^{-t^2/2}\,dt = E(g(\gamma)).$$
In particular, taking $g = 1$, $1 = (1/C)\int_{-\infty}^\infty e^{-t^2/2}\,dt$, so that the constant $C$ in Stirling's formula is $\sqrt{2\pi}$. Similarly,
$$P(D_{2m} \le t) = \int_{-\infty}^t p_{2m}(s)\,ds \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^t e^{-s^2/2}\,ds = P(\gamma \le t).$$
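De Moivre's theorem can be illustrated numerically: for moderate $m$ the exact binomial distribution of $C_{2m}$ is already close to the Gaussian limit (an illustrative sketch; the tolerance below is safe since the Berry–Esseen bound for Bernoulli sums is $0.4748/\sqrt{2m} \approx 0.017$ for $m = 400$).

```python
import math
from math import comb

def binom_cdf(m, t):
    # C_{2m} takes the value 2k/sqrt(2m) with probability comb(2m, m+k)/4^m
    total = 0.0
    for k in range(-m, m + 1):
        if 2 * k / math.sqrt(2 * m) <= t:
            total += comb(2 * m, m + k) / 4.0 ** m
    return total

def phi(t):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

for t in (-1.5, -0.5, 0.0, 0.5, 1.0, 2.0):
    assert abs(binom_cdf(400, t) - phi(t)) < 0.02
```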
13.7 The Gaussian hypercontractive inequality

If $f$ is a function on $D_2^d$ and $\sigma \in \Sigma_d$, the group of permutations of $\{1, \ldots, d\}$, we set $f_\sigma(\omega) = f(\omega_{\sigma(1)}, \ldots, \omega_{\sigma(d)})$. Let
$$SL_2(D_2^d) = \{f \in L_2(D_2^d) : f = f_\sigma \text{ for each } \sigma \in \Sigma_d\}.$$
Then $SL_2(D_2^d)$ is a $(d+1)$-dimensional subspace of $L_2(D_2^d)$, with orthonormal basis $(S_0^{(d)}, \ldots, S_d^{(d)})$, where
$$S_j^{(d)} = \Big(\sum_{A : |A|=j} w_A\Big)\Big/\binom dj^{1/2}.$$
But $\mathrm{span}(S_0^{(d)}, \ldots, S_j^{(d)}) = \mathrm{span}(1, C_d, \ldots, C_d^j)$, where $C_d = S_1^{(d)} = (\sum_{i=1}^d \epsilon_i)/\sqrt d$. Thus $(1, C_d, \ldots, C_d^d)$ is also a basis for $SL_2(D_2^d)$, and there exists a non-singular triangular matrix $H^{(d)} = (h_{k,j}^{(d)})$ such that
$$S_k^{(d)} = \sum_{j=0}^k h_{k,j}^{(d)}C_d^j = h_k^{(d)}(C_d),$$
where $h_k^{(d)}(x) = \sum_{j=0}^k h_{k,j}^{(d)}x^j$. With this notation, we have the following corollary of Bonami's theorem.
Corollary 13.7.1 Suppose that $1 < p < q < \infty$, and that $(x_0, \dots, x_N)$ is a sequence of vectors in a normed space $(E, \|\cdot\|_E)$. If $d \ge N$ then
$$\Bigl\|\sum_{k=0}^{N} r_q^k\, h_k^{(d)}(C_d)\, x_k\Bigr\|_{L^q(E)} \le \Bigl\|\sum_{k=0}^{N} r_p^k\, h_k^{(d)}(C_d)\, x_k\Bigr\|_{L^p(E)}.$$
We now show that the polynomials $h_k^{(d)}$ converge to the normalized Hermite polynomial $\tilde h_k$ as $d \to \infty$.

Proposition 13.7.1 $h_{k,j}^{(d)} \to \tilde h_{k,j}$ (the coefficient of $x^j$ in the normalized Hermite polynomial $\tilde h_k$) as $d \to \infty$.
Proof We prove this by induction on $k$. The result is certainly true when $k = 0$. Suppose that it is true for all $l < k$. Note that, since $\|C_d\|_2 = 1$, it follows from Khintchine's inequality that there exists a constant $M_k$ such that $E(|C_d|^k(1 + |C_d|^k)) \le M_k$, for all $d$. It follows from the inductive hypothesis that given $\epsilon > 0$ there exists $d_k$ such that $|h_l^{(d)}(x) - \tilde h_l(x)| < \epsilon(1 + |x|^k)/M_k$ for $l < k$ and $d \ge d_k$. Now it follows from orthogonality that
$$h_k^{(d)}(x) = x^k - \sum_{l=0}^{k-1}\bigl(E(C_d^k\, h_l^{(d)}(C_d))\bigr)\, h_l^{(d)}(x).$$
If $d \ge d_k$ then
$$|E(C_d^k(h_l^{(d)}(C_d) - \tilde h_l(C_d)))| \le \epsilon\, E(|C_d|^k(1 + |C_d|^k))/M_k \le \epsilon,$$
and $E(C_d^k\, \tilde h_l(C_d)) \to E(\gamma^k\, \tilde h_l(\gamma))$, by De Moivre's central limit theorem, and so $E(C_d^k\, h_l^{(d)}(C_d)) \to E(\gamma^k\, \tilde h_l(\gamma))$ as $d \to \infty$. Consequently
$$h_k^{(d)}(x) \to x^k - \sum_{l=0}^{k-1} E(\gamma^k\, \tilde h_l(\gamma))\,\tilde h_l(x) = h_k(x),$$
for each $x \in \mathbf{R}$, from which the result follows.
We now have the following consequence.

Theorem 13.7.1 Suppose that $1 < p < q < \infty$ and that $a_0, \dots, a_N$ are real numbers. Then
$$\Bigl\|\sum_{n=0}^{N} r_q^n\, a_n \tilde h_n\Bigr\|_{L^q(\gamma_1)} \le \Bigl\|\sum_{n=0}^{N} r_p^n\, a_n \tilde h_n\Bigr\|_{L^p(\gamma_1)},$$
where as before $r_p = 1/\sqrt{p-1}$ and $r_q = 1/\sqrt{q-1}$.
Proof Suppose that $\epsilon > 0$. As in Proposition 13.7.1, there exists $d_0$ such that
$$\Bigl|\,\bigl|\sum_{n=0}^{N} r_p^n\, a_n h_n^{(d)}(x)\bigr|^p - \bigl|\sum_{n=0}^{N} r_p^n\, a_n \tilde h_n(x)\bigr|^p\Bigr| \le \epsilon(1 + |x|^{Np}),$$
for $d \ge d_0$, from which it follows that
$$\Bigl\|\sum_{n=0}^{N} r_p^n\, a_n h_n^{(d)}(C_d)\Bigr\|_p - \Bigl\|\sum_{n=0}^{N} r_p^n\, a_n \tilde h_n(C_d)\Bigr\|_p \to 0.$$
But
$$\Bigl\|\sum_{n=0}^{N} r_p^n\, a_n \tilde h_n(C_d)\Bigr\|_p \to \Bigl\|\sum_{n=0}^{N} r_p^n\, a_n \tilde h_n(\gamma)\Bigr\|_p,$$
as $d \to \infty$, by De Moivre's central limit theorem. Thus
$$\Bigl\|\sum_{n=0}^{N} r_p^n\, a_n h_n^{(d)}(C_d)\Bigr\|_p \to \Bigl\|\sum_{n=0}^{N} r_p^n\, a_n \tilde h_n(\gamma)\Bigr\|_p,$$
as $d \to \infty$. Similarly,
$$\Bigl\|\sum_{n=0}^{N} r_q^n\, a_n h_n^{(d)}(C_d)\Bigr\|_q \to \Bigl\|\sum_{n=0}^{N} r_q^n\, a_n \tilde h_n(\gamma)\Bigr\|_q,$$
as $d \to \infty$, and so the result follows from Corollary 13.7.1.
We can interpret this inequality as a hypercontractive inequality. If $(P_t)_{t \ge 0}$ is the Ornstein–Uhlenbeck semigroup, if $1 < p < \infty$, if $q(t) = 1 + (p-1)e^{2t}$ and if $f \in L^p(\gamma_1)$, then $P_t(f) \in L^{q(t)}(\gamma_1)$, and $\|P_t(f)\|_{q(t)} \le \|f\|_p$.
13.8 Correlated Gaussian random variables

Suppose now that $\xi$ and $\eta$ are standard Gaussian random variables with a joint normal distribution, whose correlation $\rho = E(\xi\eta)$ satisfies $-1 < \rho < 1$. Then if we set $\xi_1 = \xi$ and $\xi_2 = (\eta - \rho\xi)/\sigma$, where $\sigma = \sqrt{1 - \rho^2}$, then $\xi_1$ and $\xi_2$ are independent standard Gaussian random variables, and $\eta = \rho\xi_1 + \sigma\xi_2$. Let $\gamma_2$ be the joint distribution of $(\xi_1, \xi_2)$. We can consider $L^2(\xi)$ and $L^2(\eta)$ as subspaces of $L^2(\gamma_2)$. Let $\Pi_\eta$ be the orthogonal projection of $L^2(\gamma_2)$ onto $L^2(\eta)$; it is the conditional expectation operator $E(\cdot\,|\,\eta)$.

Proposition 13.8.1 Suppose that $\xi$ and $\eta$ are standard Gaussian random variables with a joint normal distribution, whose correlation $\rho = E(\xi\eta)$ satisfies $-1 < \rho < 1$. Then $\Pi_\eta(\tilde h_n(\xi)) = \rho^n \tilde h_n(\eta)$.
Proof Since $\Pi_\eta(f) = \sum_{m=0}^{\infty}\bigl\langle f, \tilde h_m(\eta)\bigr\rangle \tilde h_m(\eta)$, we must show that
$$\bigl\langle \tilde h_n(\xi), \tilde h_m(\eta)\bigr\rangle = \rho^n \text{ if } m = n, \qquad = 0 \text{ otherwise.}$$
First observe that if $m < n$ then
$$\tilde h_m(\eta) = \tilde h_m(\rho\xi_1 + \sigma\xi_2) = \sum_{j=0}^{m} p_j(\xi_2)\,\xi_1^j,$$
where each $p_j$ is a polynomial of degree $m - j$, so that
$$\bigl\langle \tilde h_n(\xi), \tilde h_m(\eta)\bigr\rangle = \sum_{j=0}^{m}\bigl(E(\tilde h_n(\xi_1)\xi_1^j)\bigr)\bigl(E(p_j(\xi_2))\bigr) = 0,$$
by the orthogonality of $\tilde h_n(\xi_1)$ and $\xi_1^j$. A similar result holds if $m > n$, by symmetry. Finally, if $m = n$ then $p_n(\xi_2) = \rho^n/(n!)^{1/2}$, and so
$$\bigl\langle \tilde h_n(\xi), \tilde h_n(\eta)\bigr\rangle = E\bigl((\rho^n/(n!)^{1/2})\,\tilde h_n(\xi_1)\,\xi_1^n\bigr) = \rho^n.$$
Corollary 13.8.1 Let $\xi_1$ and $\xi_2$ be independent standard Gaussian random variables, and for $t \ge 0$ let $\xi_t = e^{-t}\xi_1 + (1 - e^{-2t})^{1/2}\xi_2$. If $f \in L^2(\gamma_1)$ then $P_t(f)(\xi_1) = E(f(\xi_t)\,|\,\xi_1)$ (where $(P_t)_{t \ge 0}$ is the Ornstein–Uhlenbeck semigroup).

This proposition enables us to prove the following fundamental result.
Theorem 13.8.1 Suppose that $\xi$ and $\eta$ are standard Gaussian random variables with a joint normal distribution, whose correlation $\rho = E(\xi\eta)$ satisfies $-1 < \rho < 1$, and suppose that $(p-1)(q-1) \ge \rho^2$. If $f \in L^2(\xi) \cap L^p(\xi)$ and $g \in L^2(\eta) \cap L^q(\eta)$ then
$$|E(fg)| \le \|f\|_p\,\|g\|_q.$$
Proof By approximation, it is enough to prove the result for $f = \sum_{j=0}^{m} a_j \tilde h_j(\xi)$ and $g = \sum_{k=0}^{n} b_k \tilde h_k(\eta)$. Let $e^{-2t} = \rho^2$, and let $r = 1 + \rho^2(p'-1)$. Note that $1 < r \le q$ and that $p' = 1 + e^{2t}(r-1)$. Then
$$|E(fg)| = |E(f\,E(g\,|\,\xi))| \le \|f\|_p\,\|E(g\,|\,\xi)\|_{p'} \quad\text{(by H\"older's inequality)}$$
$$= \|f\|_p\,\|P_t(g)\|_{p'} \le \|f\|_p\,\|g\|_r \quad\text{(by hypercontractivity)}$$
$$\le \|f\|_p\,\|g\|_q.$$
The statement of this theorem does not involve Hermite polynomials. Is there a more direct proof? There is a very elegant proof by Neveu [Nev 76], using stochastic integration and the Itô calculus. This is of interest, since it is easy to deduce Theorem 13.7.1 from Theorem 13.8.1. Suppose that $1 < p < q < \infty$. Let $\rho = \sqrt{(p-1)/(q-1)}$, and let $\xi$ and $\eta$ be standard Gaussian random variables with a joint normal distribution, with correlation $\rho$. If $f(\xi) = \sum_{n=0}^{N} r_p^n\, a_n \tilde h_n(\xi)$ then $\Pi_\eta(f(\xi)) = \sum_{n=0}^{N} r_q^n\, a_n \tilde h_n(\eta)$. There exists $g \in L^{q'}$ with $\|g\|_{q'} = 1$ such that $|E(\Pi_\eta(f)(\eta)g(\eta))| = \|\Pi_\eta(f)\|_q$. Then
$$\|\Pi_\eta(f)\|_q = |E(\Pi_\eta(f)(\eta)g(\eta))| = |E(f(\xi)g(\eta))| \le \|f\|_p\,\|g\|_{q'} = \|f\|_p.$$
13.9 The Gaussian logarithmic Sobolev inequality

We now turn to the logarithmic Sobolev inequality. First we consider the infinitesimal generator $L$ of the Ornstein–Uhlenbeck semigroup. What is its domain $D(L)$? Since $(P_t(\tilde h_n) - \tilde h_n)/t \to -n\tilde h_n$, $\tilde h_n \in D(L)$ and $L(\tilde h_n) = -n\tilde h_n$. Let
$$D = \Bigl\{f = \sum_{n=0}^{\infty} f_n \tilde h_n \in L^2(\gamma_1)\colon \sum_{n=0}^{\infty} n^2 f_n^2 < \infty\Bigr\}.$$
If $f \in D$ then, applying the mean value theorem term by term, $\|(P_t(f) - f)/t\|_2^2 \le \sum_{n=0}^{\infty} n^2 f_n^2$, and so $f \in D(L)$, and $L(f) = -\sum_{n=0}^{\infty} n f_n \tilde h_n$. Conversely, if $f \in D(L)$ then
$$\bigl\langle (P_t(f) - f)/t, \tilde h_n\bigr\rangle = ((e^{-nt} - 1)/t)f_n \to \bigl\langle L(f), \tilde h_n\bigr\rangle,$$
so that $L(f) = -\sum_{n=0}^{\infty} n f_n \tilde h_n$, and $f \in D$. Thus $D = D(L)$. Further, if $f \in D(L)$ then
$$\mathcal{E}(f) = \langle f, -L(f)\rangle = \sum_{n=0}^{\infty} n f_n^2 = \int_{-\infty}^{\infty}\Bigl(\frac{df}{dx}\Bigr)^2 d\gamma_1,$$
where $df/dx = \sum_{n=1}^{\infty} \sqrt n\, f_n \tilde h_{n-1} \in L^2$ is the formal derivative of $f$.
We want to use De Moivre's central limit theorem. To this end, let us denote the infinitesimal generator of the semigroup acting on $SL_2(D_2^d)$ by $L_d$, and denote the entropy and the energy of $f(C_d)$ by $\mathrm{Ent}_d$ and $\mathcal{E}_d(f)$.

Proposition 13.9.1 If $f$ is a continuous function of polynomial growth which is in $D(L)$, then $\mathrm{Ent}_d(f^2) \to \mathrm{Ent}(f^2)$.

Proof Since $f^2$ and $f^2\log f^2$ are of polynomial growth,
$$E((f(C_d))^2) \to \int f^2\,d\gamma_1 \quad\text{and}\quad E((f(C_d))^2\log(f(C_d))^2) \to \int f^2\log f^2\,d\gamma_1$$
as $d \to \infty$; the result follows from this.
Theorem 13.9.1 Suppose that $f \in L^2(\gamma_1)$ is differentiable, with a uniformly continuous derivative $f'$. Then $\mathcal{E}_d(f) \to \mathcal{E}(f)$ as $d \to \infty$.

Proof The conditions ensure that $f' \in L^2(\gamma_1)$ and that $\mathcal{E}(f) = \int_{-\infty}^{\infty}(f')^2\,d\gamma_1$. We shall prove the result for even values of $d$: the proof for odd values is completely similar. We use the notation introduced in the proof of De Moivre's central limit theorem.

Fix $d = 2m$. If $C_d(\omega) = t_{2k}$ then
$$L_d(f(C_d))(\omega) = \tfrac12\bigl((m+k)f(t_{2k-2}) + (m-k)f(t_{2k+2}) - 2mf(t_{2k})\bigr),$$
so that $\langle f, L_d(f)\rangle = \tfrac12(J_1 + J_2)$, where
$$J_1 = \sum_k (m-k)f(t_{2k})(f(t_{2k+2}) - f(t_{2k}))P(C_d = t_{2k})$$
and
$$J_2 = \sum_k (m+k)f(t_{2k})(f(t_{2k-2}) - f(t_{2k}))P(C_d = t_{2k}) = -\sum_k (m+k+1)f(t_{2k+2})(f(t_{2k+2}) - f(t_{2k}))P(C_d = t_{2k+2}),$$
by a change of variables. Now
$$(m+k+1)P(C_d = t_{2k+2}) = (m+k+1)\frac{(2m)!}{2^{2m}(m+k+1)!(m-k-1)!} = (m-k)\frac{(2m)!}{2^{2m}(m+k)!(m-k)!} = (m-k)P(C_d = t_{2k}),$$
so that
$$\mathcal{E}_d(f) = \langle f, -L_d(f)\rangle = \tfrac12\sum_k (m-k)(f(t_{2k+2}) - f(t_{2k}))^2 P(C_d = t_{2k}) = \sum_k \Bigl(\frac{m-k}{m}\Bigr)\Bigl(\frac{f(t_{2k+2}) - f(t_{2k})}{t_{2k+2} - t_{2k}}\Bigr)^2 P(C_d = t_{2k}).$$
Given $\epsilon > 0$ there exists $\delta > 0$ such that $|(f(x+h) - f(x))/h - f'(x)| < \epsilon$ for $0 < |h| < \delta$, so that
$$|(f(x+h) - f(x))^2/h^2 - (f'(x))^2| < \epsilon(2|f'(x)| + \epsilon).$$
Also, $k/m = t_{2k}/\sqrt d$. Thus it follows that
$$|\mathcal{E}_d(f) - E((f'(C_d))^2)| \le \epsilon(E(2|f'(C_d)|) + \epsilon) + K_d,$$
where
$$K_d = \sum_k \frac{|k|}{m}|f'(t_{2k})|^2 P(C_d = t_{2k}) = \frac{1}{\sqrt d}E(|C_d|(f'(C_d))^2).$$
By De Moivre's central limit theorem, $E(|C_d|(f'(C_d))^2) \to E(|\gamma|(f'(\gamma))^2)$ as $d \to \infty$, so that $K_d \to 0$ as $d \to \infty$; further, $E((f'(C_d))^2) \to E((f')^2(\gamma))$ as $d \to \infty$ and so $\mathcal{E}_d(f) \to E((f'(\gamma))^2) = \mathcal{E}(f)$ as $d \to \infty$.
Corollary 13.9.1 (The Gaussian logarithmic Sobolev inequality) Suppose that $f \in L^2(\gamma_1)$ is differentiable, with a uniformly continuous derivative $f'$. Then $\mathrm{Ent}(f^2) \le 2\mathcal{E}(f)$.
13.10 The logarithmic Sobolev inequality in higher dimensions

What happens in higher dimensions? We describe briefly what happens in $\mathbf{R}^d$; the ideas extend easily to the infinite-dimensional case. The measure $\gamma_d$ is the $d$-fold product $\gamma_1 \otimes \cdots \otimes \gamma_1$. From this it follows that the polynomials in $x_1, \dots, x_d$ are dense in $L^2(\mathbf{R}^d, \gamma_d)$. Let $P_n$ be the finite-dimensional subspace spanned by the polynomials of degree at most $n$, let $p_n$ be the orthogonal projection onto $P_n$, let $\Delta_n = p_n - p_{n-1}$ and let $H^{:n:} = \Delta_n(L^2(\gamma_d))$. Then $L^2(\gamma_d) = \bigoplus_{n=0}^{\infty} H^{:n:}$. This orthogonal direct sum decomposition is the Wiener chaos decomposition; $H^{:n:}$ is the $n$-th Wiener chaos. If $x^\alpha = x_1^{\alpha_1}\cdots x_d^{\alpha_d}$, with $|\alpha| = \alpha_1 + \cdots + \alpha_d = n$, then $\Delta_n(x^\alpha) = \prod_{i=1}^{d} h_{\alpha_i}(x_i)$. This is the Wick product: we write it as $:\!x^\alpha\!:$.

A more complicated, but essentially identical, argument, using independent copies $C_{m,1}, \dots, C_{m,d}$ of $C_m$, establishes the Gaussian version of Bonami's theorem.
Theorem 13.10.1 Suppose that $1 < p < q < \infty$, and that $\{y_\alpha\}_{\alpha \in A}$ is a family of elements of a Banach space $(E, \|\cdot\|_E)$, where $A$ is a finite set of multi-indices $\alpha = (\alpha_1, \dots, \alpha_d)$. Then
$$\Bigl\|\sum_{\alpha \in A} r_q^{|\alpha|}\, {:}x^\alpha{:}\; y_\alpha\Bigr\|_{L^q(E)} \le \Bigl\|\sum_{\alpha \in A} r_p^{|\alpha|}\, {:}x^\alpha{:}\; y_\alpha\Bigr\|_{L^p(E)}.$$
Proof The details are left to the reader.

This result then extends by continuity to infinite sums, and to infinitely many independent Gaussian random variables.
The logarithmic Sobolev inequality also extends to higher dimensions. The Ornstein–Uhlenbeck semigroup acts on multinomials as follows: if $f = \sum_{\alpha \in A} f_\alpha\, {:}x^\alpha{:}$ then
$$P_t(f) = \sum_{\alpha \in A} e^{-|\alpha|t} f_\alpha\, {:}x^\alpha{:} \quad\text{and}\quad L(f) = -\sum_{\alpha \in A} |\alpha|\, f_\alpha\, {:}x^\alpha{:}.$$
Then we have the following theorem.
Theorem 13.10.2 Suppose that $f \in L^2(\gamma_d)$ has a uniformly continuous derivative $\nabla f$, and that $\|f\|_{L^2(\gamma_d)} = 1$. Then
$$0 \le \int |f|^2\log|f|^2\,d\gamma_d \le 2\int |\nabla f|^2\,d\gamma_d.$$
This theorem and its corollary have the important property that the inequalities do not involve the dimension $d$; contrast this with the Sobolev inequality obtained in Chapter 5 (Theorem 5.8.1).

We also have the following consequence: the proof is the same as the proof of Theorem 13.4.2.
Theorem 13.10.3 Suppose that $f \in L^2(\gamma_d)$ has a uniformly continuous derivative $\nabla f$, that $\int_{\mathbf{R}^d} f\,d\gamma_d = 0$, and that $|(\nabla f)(x)| \le 1$ for all $x \in \mathbf{R}^d$. Then $f$ is sub-Gaussian with index $1/\sqrt 2$: that is,
$$\int_{\mathbf{R}^d} e^{\lambda f}\,d\gamma_d \le e^{\lambda^2/4},$$
for all real $\lambda$.

Corollary 13.10.1 If $r > 0$ then $\gamma_d(f \ge r) \le e^{-r^2}$.
If $A$ is a closed subset of $\mathbf{R}^d$, and $s > 0$, we set $A_s = \{x\colon d(x, A) \le s\}$.

Corollary 13.10.2 Suppose that $\gamma_d(A) \ge 1/e$. If $s > 1$ then $\gamma_d(A_s) \ge 1 - e^{-(s-1)^2}$.

Proof Let $g(x) = d(x, A)$. Then $|\nabla g(x)| \le 1$ for each $x \notin A$, but $g$ is not differentiable at every point of $A$. But we can approximate $g$ uniformly by smooth functions $g_n$ with $|\nabla g_n(x)| \le 1$ for all $x$, and apply the argument of Corollary 13.4.2, to obtain the result. The details are again left to the reader.
13.11 Beckner's inequality

Bonami's inequality, and the hypercontractive inequality, are essentially real inequalities. As Beckner [Bec 75] showed, there is an interesting complex version of the hypercontractive inequality.

Theorem 13.11.1 (Beckner's inequality) Suppose that $1 < p < 2$, and let $s = \sqrt{p-1} = r_{p'}$, so that $0 < s < 1$. If $a$ and $b$ are complex numbers then
$$\|a + isb\epsilon\|_{p'} \le \|a + b\epsilon\|_p.$$
Proof The result is trivially true if $a = 0$. Otherwise, by homogeneity, we can suppose that $a = 1$. Let $b = c + id$. Then $|1 + isb\epsilon|^2 = |1 - sd\epsilon|^2 + s^2c^2$, so that
$$\|1 + isb\epsilon\|_{p'}^2 = \bigl\||1 + isb\epsilon|^2\bigr\|_{p'/2} = \bigl\|(1 - sd\epsilon)^2 + s^2c^2\bigr\|_{p'/2}$$
$$\le \bigl\|(1 - sd\epsilon)^2\bigr\|_{p'/2} + s^2c^2 \quad\text{(by Minkowski's inequality)}$$
$$= \|1 - sd\epsilon\|_{p'}^2 + s^2c^2 \le \|1 - d\epsilon\|_2^2 + s^2c^2 \quad\text{(by the hypercontractive inequality)}$$
$$= 1 + d^2 + s^2c^2 = \|1 + sc\epsilon\|_2^2 + d^2$$
$$\le \|1 + c\epsilon\|_p^2 + d^2 \quad\text{(by the hypercontractive inequality again)}$$
$$= \bigl\|(1 + c\epsilon)^2\bigr\|_{p/2} + d^2 \le \bigl\|(1 + c\epsilon)^2 + d^2\bigr\|_{p/2} \quad\text{(by the reverse Minkowski inequality)}$$
$$= \|1 + b\epsilon\|_p^2.$$
Following through the second half of the proof of Bonami's inequality, and the proof of Theorem 13.7.1, we have the following corollary.

Corollary 13.11.1 (Beckner's theorem) Suppose that $1 < p < 2$, and that $s = \sqrt{p-1}$.

(i) If $\{z_A\colon A \subseteq \{1, \dots, n\}\}$ is a family of complex numbers, then
$$\Bigl\|\sum_A (is)^{|A|} w_A z_A\Bigr\|_{L^{p'}} \le \Bigl\|\sum_A w_A z_A\Bigr\|_{L^p},$$
where the $w_A$ are Walsh functions.

(ii) If $f = \sum_{j=0}^{n} a_j \tilde h_j$ is a polynomial, let $M_{is}(f) = \sum_{j=0}^{n} (is)^j a_j \tilde h_j$. Then
$$\|M_{is}(f)\|_{L^{p'}(\gamma_1)} \le \|f\|_{L^p(\gamma_1)}.$$
13.12 The Babenko–Beckner inequality

Beckner [Bec 75] used Corollary 13.11.1 to establish a stronger form of the Hausdorff–Young inequality. Recall that this says that the Fourier transform is a norm-decreasing linear map from $L^p(\mathbf{R})$ into $L^{p'}(\mathbf{R})$, for $1 < p \le 2$, and that we proved it by complex interpolation. Can we do better? Babenko had shown that this was possible, and obtained the best possible result, when $p'$ is an even integer. Beckner then obtained the best possible result for all $1 < p \le 2$.

Theorem 13.12.1 (The Babenko–Beckner inequality) Suppose that $1 < p \le 2$. Let $n_p = p^{1/2p}$, $n_{p'} = (p')^{1/2p'}$ and let $A_p = n_p/n_{p'}$. If $f \in L^p(\mathbf{R})$ then its Fourier transform $\mathcal{F}(f)(u) = \int_{-\infty}^{\infty} e^{-2\pi ixu} f(x)\,dx$ satisfies $\|\mathcal{F}(f)\|_{p'} \le A_p\|f\|_p$, and $A_p$ is the best possible constant.
Proof First let us show that we cannot do better than $A_p$. If $e(x) = e^{-\pi x^2}$, then $\mathcal{F}(e)(u) = e^{-\pi u^2}$. Since $\|e\|_p = 1/n_p$ and $\|e\|_{p'} = 1/n_{p'}$, $\|\mathcal{F}(e)\|_{p'} = A_p\|e\|_p$.

There is a natural isometry $J_p$ of $L^p(\gamma_1)$ onto $L^p(\mathbf{R})$: if $f \in L^p(\gamma_1)$, let
$$J_p(f)(x) = n_p e^{-\pi x^2} f(\alpha_p x),$$
where $\alpha_p = \sqrt{2\pi p}$. Then
$$\|J_p(f)\|_p^p = n_p^p \int_{-\infty}^{\infty} e^{-p\pi x^2} |f(\alpha_p x)|^p\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2}|f(y)|^p\,dy = \|f\|_{L^p(\gamma_1)}^p.$$
We therefore consider the operator $T_p = J_{p'}^{-1}\mathcal{F}J_p\colon L^p(\gamma_1) \to L^{p'}(\gamma_1)$. Let $f_n = J_p(h_n)$. Then, since $(dh_n/dx)(x) = xh_n(x) - h_{n+1}(x)$,
$$\frac{df_n}{dx}(x) = -2\pi x f_n(x) + \alpha_p n_p e^{-\pi x^2}\frac{dh_n}{dx}(\alpha_p x) = 2\pi(p-1)x f_n(x) - \alpha_p f_{n+1}(x);$$
thus we have the recurrence relation
$$\alpha_p f_{n+1}(x) = 2\pi(p-1)x f_n(x) - \frac{df_n}{dx}(x).$$
Now let $k_n$ be the Fourier transform of $f_n$. Bearing in mind that if $f$ is a smooth function of rapid decay and if $g(x) = xf(x)$ and $h(x) = (df/dx)(x)$ then
$$\mathcal{F}(g)(u) = \frac{i}{2\pi}\frac{d\mathcal{F}(f)}{du}(u) \quad\text{and}\quad \mathcal{F}(h)(u) = 2\pi iu\,\mathcal{F}(f)(u),$$
we see that
$$\alpha_p k_{n+1}(u) = i(p-1)\frac{dk_n}{du}(u) - 2\pi iu\,k_n(u) = -i\Bigl(2\pi u\,k_n(u) - (p-1)\frac{dk_n}{du}(u)\Bigr),$$
so that, since $\alpha_p s(p'-1) = \alpha_{p'}$, we obtain the recurrence relation
$$\alpha_{p'} k_{n+1}(u) = -is\Bigl(2\pi(p'-1)u\,k_n(u) - \frac{dk_n}{du}(u)\Bigr),$$
where, as before, $s = \sqrt{p-1}$.

Now $f_0(x) = n_p e^{-\pi x^2}$, so that $k_0(u) = n_p e^{-\pi u^2} = A_p J_{p'}(h_0)(u)$. Comparing the recurrence relations for $(f_n)$ and $(k_n)$, we see that $k_n = A_p(-is)^n J_{p'}(h_n)$, so that $T_p(h_n) = A_p(-is)^n h_n$. Thus $T_p = A_p M_{-is}$, and so, by Beckner's theorem, $\bigl\|T_p\colon L^p(\gamma_1) \to L^{p'}(\gamma_1)\bigr\| \le A_p$. Since $J_p$ and $J_{p'}$ are isometries, it follows that $\bigl\|\mathcal{F}\colon L^p(\mathbf{R}) \to L^{p'}(\mathbf{R})\bigr\| \le A_p$.
An exactly similar argument establishes a $d$-dimensional version.

Theorem 13.12.2 (The Babenko–Beckner inequality) Suppose that $1 < p \le 2$. Let $A_p = p^{1/2p}/(p')^{1/2p'}$. If $f \in L^p(\mathbf{R}^d)$, then its Fourier transform $\hat f(u) = \int_{\mathbf{R}^d} e^{-2\pi i\langle x,u\rangle} f(x)\,dx$ satisfies $\|\hat f\|_{p'} \le A_p^d\|f\|_p$, and $A_p^d$ is the best possible constant.
13.13 Notes and remarks

Bonami's inequality was proved in [Bon 71]; it was used in her work on harmonic analysis on the group $D_2^N$. At about the same time, a similar inequality was proved by Nelson [Nel 73] in his work on quantum field theory, and the inequality is sometimes referred to as Nelson's inequality.

The relationship between the hypercontractive inequality and the logarithmic Sobolev inequality is an essential part of modern semigroup theory, and many aspects of the results that are proved in this chapter are clarified and extended in this setting. Accounts are given in [Bak 94] and [Gro 93]. An enjoyable panoramic view of the subject is given in [Ane 00].

A straightforward account of information and entropy is given in [App 96]. In his pioneering paper [Gro 75], Gross used the central limit theorem, as we have, to establish Gaussian logarithmic Sobolev inequalities.

The book by Janson [Jan 97] gives an excellent account of Gaussian Hilbert spaces.
Exercises

13.1 Let
$$f_n(x) = (-1)^n e^{x^2/2}\frac{d^n}{dx^n}\bigl(e^{-x^2}\bigr).$$
Show that $(f_n)_{n=0}^{\infty}$ is an orthogonal sequence in $L^2(\mathbf{R})$, whose linear span is dense in $L^2(\mathbf{R})$. Find constants $C_n$ such that $(\tilde f_n) = (C_n f_n)$ is an orthonormal basis for $L^2(\mathbf{R})$. Show that $\mathcal{F}(\tilde f_n) = i^n \tilde f_n$. Deduce the Plancherel theorem for $L^2(\mathbf{R})$: the Fourier transform is an isometry of $L^2(\mathbf{R})$ onto $L^2(\mathbf{R})$.

The idea of using the Hermite functions to prove the Plancherel theorem goes back to Norbert Wiener.

13.2 Calculate the constants given by the Babenko–Beckner inequality for various values of $p$, and compare them with those given by the Hausdorff–Young inequality.
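A short computation along the lines of Exercise 13.2 (a sketch; recall that the Hausdorff–Young inequality gives the constant 1 for every $p$):

```python
def babenko_beckner(p):
    # A_p = p^(1/2p) / p'^(1/2p'), where p' = p/(p-1) is the conjugate exponent
    q = p / (p - 1)
    return p ** (1 / (2 * p)) / q ** (1 / (2 * q))

for p in [1.1, 1.25, 1.5, 1.75, 2.0]:
    print(f"p = {p:4}: A_p = {babenko_beckner(p):.6f}  (Hausdorff-Young: 1)")
```

The constants are strictly less than 1 for $1 < p < 2$, and equal to 1 at $p = 2$, where the Fourier transform is an isometry.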
14
Hadamard's inequality

14.1 Hadamard's inequality

So far, we have been concerned with inequalities that involve functions. In the next chapter, we shall turn to inequalities which concern linear operators. In the finite-dimensional case, this means considering matrices and determinants. Determinants, however, can also be considered as volume forms. In this chapter, we shall prove Hadamard's inequality [Had 93], which can usefully be thought of in this way. We shall also investigate when equality holds, in the real case: this provides a digression into number theory, and also has an application to coding theory, which we shall also describe.

Theorem 14.1.1 (Hadamard's inequality) Let $A = (a_{ij})$ be a real or complex $n \times n$ matrix. Then
$$|\det A| \le \prod_{j=1}^{n}\Bigl(\sum_{i=1}^{n}|a_{ij}|^2\Bigr)^{1/2},$$
with equality if and only if either both sides are zero or $\sum_{i=1}^{n} a_{ij}\bar a_{ik} = 0$ for $j \ne k$.
Proof Let $a_j = (a_{ij})$ be the $j$-th column of $A$, considered as an element of the inner product space $\ell_n^2$. Then the theorem states that $|\det A| \le \prod_{j=1}^{n}\|a_j\|$, with equality if and only if the columns are orthogonal, or one of them is zero.

The result is certainly true if $\det A = 0$. Let us suppose that $\det A$ is not zero. Then the columns of $A$ are linearly independent, and we orthogonalize them. Let $E_j = \mathrm{span}\,(a_1, \dots, a_j)$, and let $Q_j$ be the orthogonal projection of $\ell_n^2$ onto $E_j^\perp$. Let $b_1 = a_1$ and let $b_j = Q_{j-1}(a_j)$, for $2 \le j \le n$. Then $\|b_j\| \le \|a_j\|$. On the other hand,
$$b_j = a_j - \sum_{i=1}^{j-1}\frac{\langle a_j, b_i\rangle}{\langle b_i, b_i\rangle}\,b_i$$
for $2 \le j \le n$, so that the matrix $B$ with columns $b_1, \dots, b_n$ is obtained from $A$ by elementary column operations. Thus $\det B = \det A$. Since the columns of $B$ are orthogonal, $B^*B = \mathrm{diag}(\|b_1\|^2, \dots, \|b_n\|^2)$, so that
$$|\det A| = |\det B| = (\det(B^*B))^{1/2} = \prod_{j=1}^{n}\|b_j\| \le \prod_{j=1}^{n}\|a_j\|.$$
We have equality if and only if $\|b_j\| = \|a_j\|$ for each $j$, which happens if and only if the columns of $A$ are orthogonal.

The theorem states that a parallelepiped in $\ell_n^2$ with given side lengths has maximal volume when the sides are orthogonal, and the proof is based on this.
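The inequality, and the equality case for orthogonal columns, are easy to check numerically; the sketch below uses a randomly generated matrix (the sizes and column lengths are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)

def hadamard_bound(A):
    # product of the Euclidean norms of the columns of A
    return np.prod(np.linalg.norm(A, axis=0))

A = rng.standard_normal((6, 6))
assert abs(np.linalg.det(A)) <= hadamard_bound(A)

# equality when the columns are orthogonal: scale the columns of a Q factor
B = np.linalg.qr(A)[0] * np.arange(1.0, 7.0)   # orthogonal columns of lengths 1..6
assert np.isclose(abs(np.linalg.det(B)), hadamard_bound(B))
```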
14.2 Hadamard numbers

Hadamard's inequality has the following corollary.

Corollary 14.2.1 Suppose that $A = (a_{ij})$ is a real or complex matrix and that $|a_{ij}| \le 1$ for all $i$ and $j$. Then $|\det A| \le n^{n/2}$, and equality holds if and only if $|a_{ij}| = 1$ for all $i$ and $j$ and $\sum_{i=1}^{n} a_{ij}\bar a_{ik} = 0$ for $j \ne k$.

It is easy to give examples where equality holds in the complex case, for any $n$; for example, set $a_{hj} = e^{2\pi ihj/n}$.

In the real case, it is a much more interesting problem to find examples where equality holds. An $n \times n$ matrix $A = (a_{ij})$ all of whose entries are $1$ or $-1$, and which satisfies $\sum_{i=1}^{n} a_{ij}a_{ik} = 0$ for $j \ne k$ is called an Hadamard matrix, and if $n$ is an integer for which an Hadamard matrix of order $n$ exists, then $n$ is called an Hadamard number. Note that the orthogonality conditions are equivalent to the condition that $AA^t = nI_n$.
If $A = (a_{ij})$ and $B = (b_{i'j'})$ are Hadamard matrices of orders $n$ and $n'$ respectively, then it is easy to check that the Kronecker product, or tensor product,
$$K = A \otimes B = \bigl(k_{(i,i')(j,j')}\bigr) = \bigl(a_{ij}b_{i'j'}\bigr)$$
is an Hadamard matrix of order $nn'$. Thus if $n$ and $n'$ are Hadamard numbers, then so is $nn'$. Now the $2 \times 2$ matrix
$$\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
is an Hadamard matrix. By repeatedly forming Kronecker products, we can construct Hadamard matrices of all orders $2^k$.
Are there any other (essentially different) Hadamard matrices? Hadamard [Had 93] constructed Hadamard matrices of orders 12 and 20. Forty years later, Paley [Pal 33] gave a powerful way of constructing infinitely many new Hadamard matrices. Before we present Paley's result, let us observe that not every number can be an Hadamard number.

Proposition 14.2.1 If $A = (a_{ij})$ is an Hadamard matrix of order $n$, where $n \ge 3$, then 4 divides $n$.

Proof Let $a$, $b$, $c$ be distinct columns. Then
$$\sum_{i=1}^{n}(a_i + b_i)(a_i + c_i) = \langle a+b, a+c\rangle = \langle a, a\rangle = n.$$
But each summand is $0$ or $\pm 4$, so that 4 divides $n$.
Theorem 14.2.1 (Paley [Pal 33]) Suppose that $q = p^k$ is an odd prime power. If $q \equiv 1 \pmod 4$, then there is a symmetric Hadamard matrix of order $2(q+1)$, while if $q \equiv 3 \pmod 4$ then there is a skew-symmetric matrix $C$ of order $n = q+1$ such that $I_n + C$ is an Hadamard matrix.

In order to prove this theorem, we introduce a closely related class of matrices. An $n \times n$ matrix $C$ is a conference matrix (the name comes from telephone network theory) if the diagonal entries $c_{ii}$ are zero, all the other entries are $1$ or $-1$ and the columns are orthogonal: $\sum_{i=1}^{n} c_{ij}c_{ik} = 0$ for $j \ne k$. Note that the orthogonality conditions are equivalent to the condition that $CC^t = (n-1)I_n$.

Proposition 14.2.2 If $C$ is a symmetric conference matrix, then the matrix
$$D = \begin{pmatrix} I_n + C & -I_n + C \\ -I_n + C & -I_n - C \end{pmatrix}$$
is a symmetric Hadamard matrix.

If $C$ is a skew-symmetric conference matrix, then the matrix $I_n + C$ is an Hadamard matrix.
Proof If $C$ is a symmetric conference matrix, then
$$DD^* = \begin{pmatrix} (I_n+C)^2 + (-I_n+C)^2 & (I_n+C)(-I_n+C) + (-I_n+C)(-I_n-C) \\ (-I_n+C)(I_n+C) + (-I_n-C)(-I_n+C) & (-I_n+C)^2 + (-I_n-C)^2 \end{pmatrix}$$
$$= \begin{pmatrix} 2I_n + 2C^2 & 0 \\ 0 & 2I_n + 2C^2 \end{pmatrix} = 2nI_{2n}.$$
If $C$ is a skew-symmetric conference matrix, then
$$(I_n+C)(I_n+C)^t = (I_n+C)(I_n-C) = I_n - C^2 = I_n + CC^t = nI_n.$$
In order to prove Paley's theorem, we therefore need only construct conference matrices of order $q+1$ with the right symmetry properties. In order to do this, we use the fact that there is a finite field $\mathbf{F}_q$ with $q$ elements. Let $\chi$ be the Legendre character on $\mathbf{F}_q$:
$$\chi(0) = 0, \quad \chi(x) = 1 \text{ if } x \text{ is a non-zero square}, \quad \chi(x) = -1 \text{ if } x \text{ is not a square.}$$
We shall use the elementary facts that $\chi(x)\chi(y) = \chi(xy)$, that $\chi(-1) = 1$ if and only if $q \equiv 1 \pmod 4$ and that $\sum_{x \in \mathbf{F}_q}\chi(x) = 0$.

First we define a $q \times q$ matrix $A = (a_{xy})$ indexed by the elements of $\mathbf{F}_q$: we set $a_{xy} = \chi(x-y)$. $A$ is symmetric if $q \equiv 1 \pmod 4$ and $A$ is skew-symmetric if $q \equiv 3 \pmod 4$.

We now augment $A$, by adding an extra row and column:
$$C = \begin{pmatrix} 0 & \chi(-1) & \cdots & \chi(-1) \\ 1 & & & \\ \vdots & & A & \\ 1 & & & \end{pmatrix}.$$
$C$ has the required symmetry properties, and we shall show that it is a conference matrix. Since $\sum_{x \in \mathbf{F}_q}\chi(x) = 0$, the first column is orthogonal to each of the others. If $c_y$ and $c_z$ are two other distinct columns, then
$$\langle c_y, c_z\rangle = 1 + \sum_{x \in \mathbf{F}_q}\chi(x-y)\chi(x-z) = 1 + \sum_{x \in \mathbf{F}_q}\chi(x)\chi(x+y-z)$$
$$= 1 + \sum_{x \ne 0}(\chi(x))^2\chi(1 + x^{-1}(y-z)) = 1 + \sum_{u \ne 1}\chi(u) = 1 - \chi(1) = 0.$$
This completes the proof.
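For $q$ an odd prime (the general prime-power case needs arithmetic in $\mathbf{F}_q$ rather than arithmetic mod $q$), the construction can be carried out directly, computing the Legendre character as $\chi(x) = x^{(q-1)/2} \bmod q$. A sketch:

```python
import numpy as np

def paley_conference(q):
    # Conference matrix of order q+1 for an odd prime q.
    chi = lambda x: 0 if x % q == 0 else (1 if pow(x % q, (q - 1) // 2, q) == 1 else -1)
    C = np.zeros((q + 1, q + 1), dtype=int)
    C[0, 1:] = chi(-1)                      # first row: chi(-1) everywhere
    C[1:, 0] = 1                            # first column: 1 everywhere
    for x in range(q):
        for y in range(q):
            C[x + 1, y + 1] = chi(x - y)    # a_{xy} = chi(x - y)
    return C

n, C = 8, paley_conference(7)               # 7 = 3 (mod 4): the skew-symmetric case
assert np.array_equal(C @ C.T, (n - 1) * np.eye(n, dtype=int))
H = np.eye(n, dtype=int) + C                # so I_n + C is an Hadamard matrix
assert np.array_equal(H @ H.T, n * np.eye(n, dtype=int))
```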
Paley's theorem implies that every multiple of four up to 88 is an Hadamard number. After another twenty-nine years, it was shown [BaGH 62] that 92 is an Hadamard number. Further results have been obtained, but it is still not known if every multiple of four is an Hadamard number.
14.3 Error-correcting codes

Hadamard matrices are useful for constructing error-correcting codes. Suppose that Alice wants to send Bob a message, of some 10,000 characters, say. The characters of her message belong to the extended ASCII set of 256 characters, but she must send the message as a sequence of bits (0s and 1s). She could for example assign the numbers 0 to 255 to the ASCII characters in the usual way, and put each of the numbers in binary form, as a string of eight bits. Thus her message will be a sequence of 80,000 bits. Suppose however that the channel through which she sends her message is a noisy one, and that there is a probability 1/20 that a bit is received incorrectly by Bob (a 0 being read as a 1, or a 1 being read as a 0), the errors occurring independently. Then for each character, there is probability about 0.34 that it will be misread by Bob, and this is clearly no good.

Suppose instead that Alice and Bob construct an Hadamard matrix $H$ of order 128 (this is easily done, using the Kronecker product construction defined above, or the character table of $\mathbf{F}_{127}$) and replace the $-1$s by 0s, to obtain a matrix $K$. They then use the columns of $K$ and of $\bar K$ (the matrix obtained in the same way from $-H$) as codewords for the ASCII characters, so that each ASCII character has a codeword consisting of a string of 128 bits. Thus Alice sends a message of 1,280,000 bits. Different characters have different codewords, and indeed any two codewords differ in either 64 or 128 places. Bob decodes the message by replacing the strings of 128 bits by the ASCII character whose codeword it is (if no error has occurred in transmission), or by an ASCII character whose codeword differs in as few places as possible from the string of 128 bits. Thus Bob will only decode a character incorrectly if at least 32 errors have occurred in the transmission of a codeword. The probability of this happening is remarkably small. Let us estimate it approximately. The expected number of errors in transmitting a codeword is 6.4, and so the number of errors is distributed approximately as a Poisson distribution with parameter $\lambda = 6.4$. Thus the probability of 32 errors (or more) is about $e^{-\lambda}\lambda^{32}/32!$. Using Stirling's approximation for 32!, we see that this probability is about $e^{-\lambda}(\lambda e/32)^{32}/8\sqrt\pi$, which is a number of order $10^{-13}$. Thus the probability that Bob will receive the message with any errors at all is about $10^{-9}$, which is really negligible. Of course there is a price to pay: the message using the Hadamard matrix code is sixteen times as long as the message using the simple binary code.
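The scheme sketched above is easy to simulate. In the sketch below, $\pm 1$ entries are used directly in place of bits, which turns nearest-codeword decoding into an inner-product maximization; the character value and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

H = np.array([[1]])
for _ in range(7):                          # Hadamard matrix of order 128
    H = np.kron(np.array([[1, 1], [1, -1]]), H)

codebook = np.vstack([H.T, -H.T])           # 256 codewords of length 128

def decode(word):
    # nearest codeword = largest inner product with the received +/-1 word
    return int(np.argmax(codebook @ word))

char = 200                                  # a character code in 0..255
sent = codebook[char].copy()
flips = rng.random(128) < 0.05              # each position flipped with probability 1/20
sent[flips] *= -1
assert decode(sent) == char                 # decoding survives the noise
```

With at most 31 flips the correct codeword always wins, since flipping $k$ positions changes each inner product by at most $2k$.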
14.4 Note and remark

An excellent account of Hadamard matrices and their uses is given in Chapter 18 of [vLW 92].
15
Hilbert space operator inequalities

15.1 Jordan normal form

We now turn to inequalities that involve linear operators. In this chapter, we consider operators between finite-dimensional complex vector spaces, which involve matrices and determinants, and operators between infinite-dimensional complex Hilbert spaces. Let us spend some time setting the scene, and describing the sorts of problem that we shall consider.

First, suppose that $E$ is a finite-dimensional complex vector space, and that $T$ is an endomorphism of $E$: that is, a linear mapping of $E$ into itself. We describe without proof the results from linear algebra that we need; an excellent account is given in the book by Hirsch and Smale [HiS 74], although their terminology is slightly different from what follows. We consider the operator $\lambda I - T$; this is invertible if and only if $\chi_T(\lambda) = \det(\lambda I - T) \ne 0$. The polynomial $\chi_T$ is the characteristic polynomial; its roots $\lambda_1, \dots, \lambda_d$ (repeated according to multiplicity, and arranged in decreasing absolute value) form the spectrum $\sigma(T)$. They are the singular points: if $\lambda \in \sigma(T)$ then $E_\lambda(T) = \{x\colon T(x) = \lambda x\}$ is a non-trivial linear subspace of $E$, so that $\lambda$ is an eigenvalue, with eigenspace $E_\lambda$. Of equal interest are the subspaces
$$E_\lambda^{(k)}(T) = \{x\colon (T - \lambda I)^k(x) = 0\} \quad\text{and}\quad G_\lambda(T) = \bigcup_{k \ge 1} E_\lambda^{(k)}(T).$$
$G_\lambda = G_\lambda(T)$ is a generalized eigenspace, and elements of $G_\lambda$ are called principal vectors. If $\mu_1, \dots, \mu_r$ are the distinct eigenvalues of $T$, then each $G_{\mu_s}$ is $T$-invariant, and $E$ is the algebraic direct sum
$$E = G_{\mu_1} \oplus \cdots \oplus G_{\mu_r}.$$
Further, each generalized eigenspace $G_\mu$ can be written as a $T$-invariant direct sum
$$G_\mu = H_1 \oplus \cdots \oplus H_l,$$
where each $H_i$ has a basis $(h_1, \dots, h_k)$, where $T(h_1) = \mu h_1$ and $T(h_l) = \mu h_l + h_{l-1}$ for $2 \le l \le k$. Combining all of these bases in order, we obtain a Jordan basis $(e_1, \dots, e_d)$ for $E$; the corresponding matrix represents $T$ in Jordan normal form. This basis has the important property that if $1 \le k \le d$ and $E_k = \mathrm{span}\,(e_1, \dots, e_k)$ then $E_k$ is $T$-invariant, and $T_k = T|_{E_k}$ has eigenvalues $\lambda_1(T), \dots, \lambda_k(T)$.
15.2 Riesz operators

Although we shall be concerned in this chapter with linear operators between Hilbert spaces, in later chapters we shall consider operators between Banach spaces. In this section, we consider endomorphisms of Banach spaces. Suppose then that $T$ is a bounded endomorphism of a complex Banach space $E$. Then the spectrum $\sigma(T)$ of $T$, defined as
$$\{\lambda \in \mathbf{C}\colon \lambda I - T \text{ is not invertible}\},$$
is a non-empty closed subset of $\mathbf{C}$, contained in $\{\lambda\colon |\lambda| \le \inf\|T^n\|^{1/n}\}$, and the spectral radius $r(T) = \sup\{|\lambda|\colon \lambda \in \sigma(T)\}$ satisfies the spectral radius formula $r(T) = \inf\|T^n\|^{1/n}$. The complement of the spectrum is called the resolvent set $\rho(T)$, and the operator $R_\lambda(T) = R_\lambda = (\lambda I - T)^{-1}$ defined on $\rho(T)$ is called the resolvent of $T$.

The behaviour of $\lambda I - T$ at a point of the spectrum can however be complicated; we restrict our attention to a smaller class of operators, the Riesz operators, whose properties are similar to those of operators on finite-dimensional spaces.

Suppose that $T \in L(E)$. $T$ is a Riesz operator if

• $\sigma(T)\setminus\{0\}$ is either finite or consists of a sequence of points tending to 0.

• If $\lambda \in \sigma(T)\setminus\{0\}$, then $\lambda$ is an eigenvalue and the generalized eigenspace
$$G_\lambda = \{x\colon (T - \lambda I)^k(x) = 0 \text{ for some } k \in \mathbf{N}\}$$
is of finite dimension.

• If $\lambda \in \sigma(T)\setminus\{0\}$, there is a $T$-invariant decomposition $E = G_\lambda \oplus H_\lambda$, where $H_\lambda$ is a closed subspace of $E$ and $T - \lambda I$ is an isomorphism of $H_\lambda$ onto itself.

We denote the corresponding projection of $E$ onto $G_\lambda$ with null-space $H_\lambda$ by $P_\lambda(T)$, and set $Q_\lambda(T) = I - P_\lambda(T)$.

If $T$ is a Riesz operator and $\lambda \in \sigma(T)\setminus\{0\}$, we call the dimension of $G_\lambda$ the algebraic multiplicity $m_T(\lambda)$ of $\lambda$. We shall use the following convention: we denote the distinct non-zero elements of $\sigma(T)$, in decreasing absolute value, by $\mu_1(T), \mu_2(T), \dots$, and denote the non-zero elements of $\sigma(T)$, repeated according to algebraic multiplicity and in decreasing absolute value, by $\lambda_1(T), \lambda_2(T), \dots$. (If $\sigma(T)\setminus\{0\} = \{\mu_1, \dots, \mu_t\}$ is finite, then we set $\mu_s(T) = 0$ for $s > t$, and use a similar convention for $\lambda_j(T)$.)

Suppose that $T$ is a Riesz operator and that $\lambda \in \sigma(T)\setminus\{0\}$. Then $\lambda$ is an isolated point of $\sigma(T)$. Suppose that $s > 0$ is sufficiently small that $\lambda$ is the only point of $\sigma(T)$ in the closed disc $\{z\colon |z - \lambda| \le s\}$. Then it follows from the functional calculus that
$$P_\lambda(T) = \frac{1}{2\pi i}\int_{|z-\lambda|=s} R_z(T)\,dz.$$
This has the following consequence, which we shall need later.
Proposition 15.2.1 Suppose that $T$ is a Riesz operator on $E$ and that $|\mu_j(T)| > r > |\mu_{j+1}(T)|$. Let
$$J_r = G_{\mu_1} \oplus \cdots \oplus G_{\mu_j}, \qquad K_r = H_{\mu_1} \cap \cdots \cap H_{\mu_j}.$$
Then $E = J_r \oplus K_r$. If $\pi_r$ is the projection of $E$ onto $K_r$ with null-space $J_r$ then
$$\pi_r(T) = \frac{1}{2\pi i}\int_{|z|=r} R_z(T)\,dz.$$
We denote the restriction of $T$ to $J_r$ by $T_{>r}$, and the restriction of $T$ to $K_r$ by $T_{<r}$. $T_{<r}$ is a Riesz operator with eigenvalues $\mu_{j+1}, \mu_{j+2}, \dots$.
15.3 Related operators

Suppose that $E$ and $F$ are Banach spaces, and that $S \in L(E)$ and $T \in L(F)$. Following Pietsch [Pie 63], we say that $S$ and $T$ are related if there exist $A \in L(E, F)$, $B \in L(F, E)$ such that $S = BA$ and $T = AB$. This simple idea is extremely powerful, as the following proposition indicates.

Proposition 15.3.1 Suppose that $S = BA$ and $T = AB$ are related.
(i) $\sigma(S) \setminus \{0\} = \sigma(T) \setminus \{0\}$.
(ii) Suppose that $p(x) = xq(x) + \alpha$ is a polynomial with non-zero constant term $\alpha$. Let $N_S = \{y : p(S)y = 0\}$ and let $N_T = \{z : p(T)(z) = 0\}$. Then $A(N_S) \subseteq N_T$, and $A$ is one-one on $N_S$.

Proof (i) Suppose that $\lambda \in \rho(S)$ and that $\lambda \ne 0$. Set $J_\lambda(T) = (A R_\lambda(S) B + I_F)/\lambda$. Then
$$(\lambda I_F - T)J_\lambda(T) = (A(\lambda I_E - BA)R_\lambda(S)B - AB + \lambda I_F)/\lambda = I_F,$$
$$J_\lambda(T)(\lambda I_F - T) = (A R_\lambda(S)(\lambda I_E - BA)B - AB + \lambda I_F)/\lambda = I_F,$$
so that $\lambda \in \rho(T)$ and $R_\lambda(T) = J_\lambda(T)$. Similarly if $\lambda \in \rho(T)$ and $\lambda \ne 0$ then $\lambda \in \rho(S)$.
(ii) Since $Ap(BA) = p(AB)A$, if $y \in N_S$ then $p(T)A(y) = Ap(S)(y) = 0$, and so $A(N_S) \subseteq N_T$. If $y \in N_S$ and $A(y) = 0$, then $p(S)(y) = \alpha y = 0$, so that $y = 0$. Thus $A$ is one-one on $N_S$.

Since a similar result holds for $B(N_T)$, we have the following corollary.

Corollary 15.3.1 If $S = BA$ and $T = AB$ are related Riesz operators and $\lambda \in \sigma(S) \setminus \{0\}$ then $A(G_\lambda(S)) = G_\lambda(T)$ and $B(G_\lambda(T)) = G_\lambda(S)$. In particular, $m_S(\lambda) = m_T(\lambda)$.

In fact, although we shall not need this, if $S \in L(E)$ and $T \in L(F)$ are related, and $S$ is a Riesz operator, then $T$ is a Riesz operator [Pie 63].
15.4 Compact operators

Are there enough examples of Riesz operators to make them important and interesting? To begin to answer this, we need to introduce the notion of a compact linear operator. A linear operator $T$ from a Banach space $(E, \|\cdot\|_E)$ to a Banach space $(F, \|\cdot\|_F)$ is compact if the image $T(B_E)$ of the unit ball $B_E$ of $E$ is relatively compact in $F$: that is, the closure $\overline{T(B_E)}$ is a compact subset of $F$. Alternatively, $T$ is compact if $T(B_E)$ is precompact: given $\epsilon > 0$ there exists a finite subset $G$ of $F$ such that $T(B_E) \subseteq \bigcup_{g \in G}(g + \epsilon B_F)$. It follows easily from the definition that a compact linear operator is bounded, and that its composition (on either side) with a bounded linear operator is again compact. Further, the set $K(E, F)$ of compact linear operators from $E$ to $F$ is a closed linear subspace of the Banach space $L(E, F)$ of bounded linear operators from $E$ to $F$, with the operator norm.

Theorem 15.4.1 Suppose that $T \in L(E)$, where $(E, \|\cdot\|_E)$ is an infinite-dimensional complex Banach space. If $T^k$ is compact, for some $k$, then $T$ is a Riesz operator.

The proof of this result is unfortunately outside the scope of this book. A full account is given in [Dow 78], and details are also given, for example, in [DuS 88], Chapter VII.
Our task will be to establish inequalities which give information about the eigenvalues of a Riesz operator $T$ in terms of other properties that it possesses. For example, $|\lambda_1| \le r(T)$. The Jordan normal form gives exhaustive information about a linear operator on a finite-dimensional space, but the eigenvalues and generalized eigenspaces of a Riesz operator can give very limited information indeed. The simplest example of this phenomenon is given by the Fredholm integral operator
$$T(f)(x) = \int_0^x f(t)\,dt$$
on $L_2[0,1]$. $T$ is a compact operator (Exercise 15.2). It follows from the Cauchy–Schwarz inequality that $|T(f)(x)| \le x^{1/2}\|f\|_2 \le \|f\|_2$, and, arguing inductively,
$$|T^n(f)(x)| \le \frac{x^{n-1}}{(n-1)!}\,\|f\|_2.$$
From this it follows easily that $T$ has no non-zero eigenvalues, and indeed the spectral radius formula shows that $\sigma(T) = \{0\}$. We shall therefore also seek other parameters that give information about Riesz operators.
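The quasi-nilpotence of this Volterra-type operator can be seen numerically. The following sketch (an illustration, not from the book) discretises the operator by left-endpoint Riemann sums; the matrix is lower triangular with constant diagonal $1/n$, so its spectrum is $\{1/n\}$, which shrinks to $\{0\}$ as the mesh is refined, while the spectral radius estimates $\|T^k\|^{1/k}$ fall rapidly:

```python
import numpy as np

# Discretise T(f)(x) = int_0^x f(t) dt on an n-point grid of [0,1] by
# left-endpoint Riemann sums; the matrix is lower triangular with
# diagonal 1/n, so its spectrum is just {1/n}.
n = 100
T = np.tril(np.ones((n, n))) / n

# Spectral radius formula r(T) = inf_k ||T^k||^{1/k}: the operator norms
# of the powers decay factorially, so the estimates fall fast even though
# ||T|| itself is of order 2/pi.
norms = [np.linalg.norm(np.linalg.matrix_power(T, k), 2) ** (1.0 / k)
         for k in (1, 4, 16)]
print(norms)  # strictly decreasing towards 0
```

The grid size and the choice of powers here are arbitrary; any refinement shows the same behaviour.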
15.5 Positive compact operators

For the rest of this chapter, we shall consider linear operators between Hilbert spaces, which we denote as $H, H_0, H_1, \ldots$. We shall suppose that all these spaces are separable, so that they have countable orthonormal bases; this is a technical simplification, and no important features are lost.

We generalize the notion of a Hermitian matrix to the notion of a Hermitian operator on a Hilbert space. $T \in L(H)$ is Hermitian if $T = T^*$: that is, $\langle T(x), y\rangle = \langle x, T(y)\rangle$ for all $x, y \in H$. If $T$ is Hermitian then $\langle T(x), x\rangle = \langle x, T(x)\rangle = \overline{\langle T(x), x\rangle}$, so that $\langle T(x), x\rangle$ is real. A Hermitian operator $T$ is positive, and we write $T \ge 0$, if $\langle T(x), x\rangle \ge 0$ for all $x \in H$. If $S \in L(H)$ then $S + S^*$ and $i(S - S^*)$ are Hermitian, and $S^*S$ is positive.

Proposition 15.5.1 Suppose that $T \in L(H)$ is positive. Let $w = w(T) = \sup\{\langle T(x), x\rangle : \|x\| \le 1\}$. Then $w = \|T\|$.

Proof Certainly $w \le \|T\|$. Let $v > w$. Then $vI - T \ge 0$, and so, if $x \in H$,
$$\langle (vI - T)T(x), T(x)\rangle \ge 0 \quad\text{and}\quad \langle T(vI - T)(x), (vI - T)(x)\rangle \ge 0.$$
Adding, $\langle (vT - T^2)(x), vx\rangle \ge 0$, so that $v\langle T(x), x\rangle \ge \langle T^2(x), x\rangle = \|T(x)\|^2$. Thus $vw \ge \|T\|^2$ for each $v > w$, and so $w \ge \|T\|$.

Proposition 15.5.2 If $T \in L(H)$ is positive, then $w = \|T\| \in \sigma(T)$.

Proof By the preceding proposition, there exists a sequence $(x_n)$ of unit vectors in $H$ such that $\langle T(x_n), x_n\rangle \to w$. Then
$$0 \le \|T(x_n) - wx_n\|^2 = \|T(x_n)\|^2 - 2w\langle T(x_n), x_n\rangle + w^2 \le 2w(w - \langle T(x_n), x_n\rangle) \to 0$$
as $n \to \infty$, so that $(T - wI)(x_n) \to 0$ as $n \to \infty$.
Just as a Hermitian matrix can be diagonalized, so can a compact Hermitian operator. We can deduce this from Theorem 15.4.1, but, since that theorem has been stated without proof, we prefer to give a direct proof, which corresponds to the proof of the finite-dimensional case.

Theorem 15.5.1 Suppose that $T$ is a positive compact operator on $H$. Then there exists an orthonormal sequence $(x_n)$ in $H$ and a decreasing finite or infinite sequence $(s_n)$ of non-negative real numbers such that $T(x) = \sum_n s_n\langle x, x_n\rangle x_n$ for each $x \in H$. If the sequence is infinite, then $s_n \to 0$ as $n \to \infty$.

Conversely, such a formula defines a positive element of $K(H)$.

Proof If $T = 0$ we can take any orthonormal sequence $(x_n)$, and take $s_n = 0$. Otherwise, $\lambda_1 = \|T\| > 0$, and, as in Proposition 15.5.2, there exists a sequence $(x_n)$ of unit vectors in $H$ such that $T(x_n) - \lambda_1 x_n \to 0$. Since $T$ is compact, there exists a subsequence $(x_{n_k})$ and an element $y$ of $H$ such that $T(x_{n_k}) \to y$. But then $\lambda_1 x_{n_k} \to y$, so that $y \ne 0$, and $T(y) = \lim_k T(\lambda_1 x_{n_k}) = \lambda_1 y$. Thus $y$ is an eigenvector of $T$, with eigenvalue $\lambda_1$.

Let $E_{\lambda_1}$ be the corresponding eigenspace. Then $E_{\lambda_1}$ is finite-dimensional; for, if not, there exists an infinite orthonormal sequence $(e_n)$ in $E_{\lambda_1}$, and $(T(e_n)) = (\lambda_1 e_n)$ has no convergent subsequence.

Now let $H_1 = E_{\lambda_1}^\perp$. If $x \in H_1$ and $y \in E_{\lambda_1}$ then
$$\langle T(x), y\rangle = \langle x, T(y)\rangle = \lambda_1\langle x, y\rangle = 0.$$
Since this holds for all $y \in E_{\lambda_1}$, $T(x) \in H_1$. Let $T_1 = T|_{H_1}$. Then $T_1$ is a positive operator on $H_1$, and $\lambda_2 = \|T_1\| < \lambda_1$, since otherwise $\lambda_1$ would be an eigenvalue of $T_1$. We can therefore iterate the procedure, stopping if $T_k = 0$. In this latter case, we put together orthonormal bases of $E_{\lambda_1}, \ldots, E_{\lambda_{k-1}}$ to obtain a finite orthonormal sequence $(x_1, \ldots, x_N)$. If $x_n \in E_{\lambda_j}$, set $s_n = \lambda_j$. Then it is easy to verify that $T(x) = \sum_{n=1}^N s_n\langle x, x_n\rangle x_n$ for each $x \in H$.

If the procedure does not stop, we have an infinite sequence of orthogonal eigenspaces $(E_{\lambda_k})$, with $\lambda_k > 0$. Again, we put together orthonormal bases of the $E_{\lambda_k}$ to obtain an infinite orthonormal sequence $(x_n)$, and if $x_n \in E_{\lambda_k}$, set $s_n = \lambda_k$. Then $T(x_n) = s_n x_n$, so that, since $(T(x_n))$ has a convergent subsequence, $s_n \to 0$.

If now $x \in H$ and $k \in \mathbf{N}$, we can write
$$x = \sum_{n=1}^{N_k} \langle x, x_n\rangle x_n + r_k,$$
where $N_k = \dim(E_{\lambda_1} + \cdots + E_{\lambda_k})$ and $r_k \in H_k$. Note that $\|r_k\| \le \|x\|$. Then
$$T(x) = \sum_{n=1}^{N_k} \langle x, x_n\rangle T(x_n) + T(r_k) = \sum_{n=1}^{N_k} s_n\langle x, x_n\rangle x_n + T(r_k).$$
But $\|T(r_k)\| \le \|T_k\|\,\|x\| \to 0$ as $k \to \infty$, and so $T(x) = \sum_{n=1}^\infty s_n\langle x, x_n\rangle x_n$.

For the converse, let $T^{(k)}(x) = \sum_{n=1}^k s_n\langle x, x_n\rangle x_n$. Each $T^{(k)}$ is a finite rank operator, and $T^{(k)}(x) \to T(x)$ as $k \to \infty$. Suppose that $\epsilon > 0$. There exists $N$ such that $s_N < \epsilon/2$. $T^{(N)}(B_H)$ is a bounded finite-dimensional set, and so is precompact: there exists a finite set $F$ in $H$ such that $T^{(N)}(B_H) \subseteq \bigcup_{f \in F}(f + (\epsilon/2)B_H)$. But if $x \in B_H$ then $\|T(x) - T^{(N)}(x)\| < \epsilon/2$, and so $T(B_H) \subseteq \bigcup_{f \in F}(f + \epsilon B_H)$: $T$ is compact.
15.6 Compact operators between Hilbert spaces

We now use Theorem 15.5.1 to give a representation theorem for compact linear operators between Hilbert spaces.

Theorem 15.6.1 Suppose that $T \in K(H_1, H_2)$. Then there exist orthonormal sequences $(x_n)$ in $H_1$ and $(y_n)$ in $H_2$, and a finite or infinite decreasing null-sequence $(s_n)$ of positive real numbers such that $T(x) = \sum_n s_n\langle x, x_n\rangle y_n$ for each $x \in H_1$.

Conversely, such a formula defines an element of $K(H_1, H_2)$.

Proof The operator $T^*T$ is a positive compact operator on $H_1$, and so there exist an orthonormal sequence $(x_n)$ in $H_1$, and a finite or infinite decreasing sequence $(t_n)$ of positive real numbers such that $T^*T(x) = \sum_n t_n\langle x, x_n\rangle x_n$ for each $x \in H_1$. For each $n$, let $s_n = \sqrt{t_n}$ and let $y_n = T(x_n)/s_n$, so that $T(x_n) = s_n y_n$. Then
$$\langle y_n, y_n\rangle = \langle T(x_n)/s_n, T(x_n)/s_n\rangle = \langle T^*T(x_n), x_n\rangle/t_n = 1,$$
and
$$\langle y_n, y_m\rangle = \langle T(x_n)/s_n, T(x_m)/s_m\rangle = \langle T^*T(x_n), x_m\rangle/s_n s_m = 0$$
for $m \ne n$, so that $(y_n)$ is an orthonormal sequence. The rest of the proof is just as the proof of Theorem 15.5.1.

We write $T = \sum_{n=1}^\infty s_n\langle\cdot, x_n\rangle y_n$ or $T = \sum_{n=1}^N s_n\langle\cdot, x_n\rangle y_n$.

We can interpret this representation of $T$ in the following way. Suppose that $T = \sum_{n=1}^\infty s_n\langle\cdot, x_n\rangle y_n \in K(H_1, H_2)$. Then $T^* = \sum_{n=1}^\infty s_n\langle\cdot, y_n\rangle x_n \in K(H_2, H_1)$, and $T^*T = \sum_{n=1}^\infty s_n^2\langle\cdot, x_n\rangle x_n \in K(H_1)$. Then $|T| = \sum_{n=1}^\infty s_n\langle\cdot, x_n\rangle x_n \in K(H_1)$ is the positive square root of $T^*T$, and $T = U|T|$, where $U(x) = \sum_{n=1}^\infty \langle x, x_n\rangle y_n$ is a partial isometry of $H_1$ into $H_2$, mapping the closed linear span $K$ of $(x_n)$ isometrically onto the closed linear span $L$ of $(y_n)$, and mapping $K^\perp$ to 0.

We leave the reader to formulate and prove the corresponding finite-dimensional version of Theorem 15.6.1.
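In the finite-dimensional setting this representation is the singular value decomposition, and the polar decomposition $T = U|T|$ can be read off from it. A numerical sketch (an illustration, not the book's construction):

```python
import numpy as np

# The SVD A = U diag(s) V^* packages the representation
# T = sum_n s_n <., x_n> y_n: the columns of V are the x_n, the columns
# of U are the y_n.  From it we recover |A| = (A^*A)^{1/2} and the polar
# decomposition A = W|A| with W = U V^* a partial isometry (real case here).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

absA = Vh.T @ np.diag(s) @ Vh   # |A| = V diag(s) V^*
W = U @ Vh                      # partial isometry
print(np.allclose(A, W @ absA)) # prints True: A = W|A|
```

Here `numpy` returns the singular numbers in decreasing order, matching the convention of the next section.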
15.7 Singular numbers, and the Rayleigh–Ritz minimax formula

Suppose that $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n \in K(H_1, H_2)$, where $(x_n)$ and $(y_n)$ are orthonormal sequences in $H_1$ and $H_2$ respectively, and $(s_n(T))$ is a decreasing sequence of non-negative real numbers. The numbers $s_n(T)$ are called the singular numbers of $T$, and can be characterized as follows.

Theorem 15.7.1 (The Rayleigh–Ritz minimax formula) Suppose that $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n \in K(H_1, H_2)$, where $(x_n)$ and $(y_n)$ are orthonormal sequences in $H_1$ and $H_2$ respectively, and $(s_n(T))$ is a decreasing sequence of non-negative real numbers. Then
$$s_n(T) = \inf\{\|T|_{J^\perp}\| : \dim J < n\} = \inf\{\sup\{\|T(x)\| : \|x\| \le 1,\ x \in J^\perp\} : \dim J < n\},$$
and the infimum is achieved.

Proof Let $r_n = \inf\{\|T|_{J^\perp}\| : \dim J < n\}$. If $K_{n-1} = \mathrm{span}(x_1, \ldots, x_{n-1})$, then $s_n(T) = \|T|_{K_{n-1}^\perp}\|$, and so $s_n(T) \ge r_n$. On the other hand, suppose that $J$ is a subspace with $\dim J = j < n$. If $x \in K_n = \mathrm{span}(x_1, \ldots, x_n)$, then $\|T(x)\| \ge s_n(T)\|x\|$. Let $D = K_n + J$, let $L = J^\perp \cap D$ and let $d = \dim D$. Then $\dim L \ge d - j$ and $\dim(K_n + L) \le d$, so that
$$\dim(K_n \cap L) = \dim K_n + \dim L - \dim(K_n + L) \ge n + (d - j) - d = n - j > 0.$$
Thus there exists $x \in K_n \cap L$ with $\|x\| = 1$, and then $\|T|_{J^\perp}\| \ge \|T(x)\| \ge s_n(T)$, so that $r_n \ge s_n(T)$. Finally, the infimum is achieved on $K_{n-1}^\perp$.
Proposition 15.7.1 (i) If $A \in L(H_0, H_1)$ and $B \in L(H_2, H_3)$ then $s_n(BTA) \le \|A\|\,\|B\|\,s_n(T)$.
(ii) If $S, T \in K(H_1, H_2)$ then $s_{m+n-1}(S + T) \le s_m(S) + s_n(T)$.
(iii) Suppose that $(T_k)$ is a sequence in $K(H_1, H_2)$ and that $T_k \to T$ in operator norm. Then $s_n(T_k) \to s_n(T)$ as $k \to \infty$, for each $n$.

Proof (i) follows immediately from the Rayleigh–Ritz minimax formula.
(ii) There exist subspaces $J_S$ of dimension $m - 1$ and $J_T$ of dimension $n - 1$ such that $\|S|_{J_S^\perp}\| = s_m(S)$ and $\|T|_{J_T^\perp}\| = s_n(T)$. Let $K = J_S + J_T$. Then $\dim K < m + n - 1$ and
$$s_{m+n-1}(S + T) \le \|(S + T)|_{K^\perp}\| \le \|S|_{K^\perp}\| + \|T|_{K^\perp}\| \le s_m(S) + s_n(T).$$
(iii) Suppose that $\epsilon > 0$. Then there exists $k_0$ such that $\|T - T_k\| < \epsilon$, for $k \ge k_0$. If $K$ is any subspace of $H_1$ of dimension less than $n$ and $x \in K^\perp$, then
$$\|T(x)\| \ge \|T_k(x)\| - \epsilon\|x\|,$$
so that $s_n(T) \ge s_n(T_k) - \epsilon$ for $k \ge k_0$. On the other hand, if $k \ge k_0$ there exists a subspace $K_k$ with $\dim K_k = n - 1$ such that $\|(T_k)|_{K_k^\perp}\| = s_n(T_k)$, and so $\|T|_{K_k^\perp}\| \le s_n(T_k) + \epsilon$ for $k \ge k_0$. Thus $s_n(T) \le s_n(T_k) + \epsilon$ for $k \ge k_0$.

We again leave the reader to formulate and prove the corresponding finite-dimensional versions of Theorem 15.7.1 and Proposition 15.7.1.
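The finite-dimensional versions can be checked directly with a numerical linear algebra package; the following sketch (an illustration only) tests the subadditivity inequality of Proposition 15.7.1(ii) on random matrices:

```python
import numpy as np

# Check that the singular numbers satisfy s_{m+n-1}(S+T) <= s_m(S) + s_n(T).
rng = np.random.default_rng(1)
S = rng.standard_normal((8, 8))
T = rng.standard_normal((8, 8))

def sv(M):
    # numpy returns the singular numbers in decreasing order
    return np.linalg.svd(M, compute_uv=False)

sS, sT, sST = sv(S), sv(T), sv(S + T)
for m in range(1, 5):
    for n in range(1, 5):
        # 1-based indices in the text, 0-based in numpy
        assert sST[m + n - 2] <= sS[m - 1] + sT[n - 1] + 1e-12
```

The tolerance `1e-12` only guards against floating-point rounding; the inequality itself is exact.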
15.8 Weyl's inequality and Horn's inequality

We have now set the scene. Suppose that $T \in K(H)$. On the one hand, $T$ is a Riesz operator, and we can consider its eigenvalues $(\lambda_i(T))$, repeated according to their algebraic multiplicities. On the other hand we can write $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n$, where $(s_n(T))$ are the singular numbers of $T$. How are they related?

Theorem 15.8.1 (i) Suppose that $T \in L(l_2^n)$ is represented by the matrix $A$. There exist unitary matrices $U$ and $V$ such that $A = U\,\mathrm{diag}(s_1(T), \ldots, s_n(T))\,V$. Thus
$$|\det A| = \Bigl|\prod_{j=1}^n \lambda_j(T)\Bigr| = \prod_{j=1}^n s_j(T).$$
(ii) (Weyl's inequality I) Suppose that $T \in K(H)$. Then
$$\prod_{j=1}^J |\lambda_j(T)| \le \prod_{j=1}^J s_j(T), \quad\text{for each } J.$$
(iii) (Horn's inequality I) Suppose that $T_k \in K(H_{k-1}, H_k)$ for $1 \le k \le K$. Then
$$\prod_{j=1}^J s_j(T_K \cdots T_1) \le \prod_{k=1}^K \prod_{j=1}^J s_j(T_k).$$

Proof (i) follows immediately from the finite-dimensional version of Theorem 15.6.1 and the change-of-basis formula for matrices.
(ii) We can suppose that $\lambda_J \ne 0$. Then, by the remarks in Section 15.2, there exists a $J$-dimensional $T$-invariant subspace $H_J$ for which $\tilde T = T|_{H_J}$ has eigenvalues $\lambda_1(T), \ldots, \lambda_J(T)$. Let $I_J$ be the inclusion $H_J \to H$, and let $P_J$ be the orthogonal projection $H \to H_J$. Then $s_j(\tilde T) = s_j(P_J T I_J) \le s_j(T)$. Thus
$$\prod_{j=1}^J |\lambda_j(T)| = \prod_{j=1}^J s_j(\tilde T) \le \prod_{j=1}^J s_j(T).$$
(iii) Again, we can suppose that $s_J(T_K \cdots T_1) \ne 0$. Let $T_K \cdots T_1 = \sum_{n=1}^\infty s_n(T_K \cdots T_1)\langle\cdot, x_n\rangle y_n$, and let $V_0 = \mathrm{span}(x_1, \ldots, x_J)$. Let $V_k = T_k \cdots T_1(V_0)$, so that $T_k(V_{k-1}) = V_k$. Let $\tilde T_k = T_k|_{V_{k-1}}$. Since $s_J(T_K \cdots T_1) \ne 0$, $\dim(V_k) = J$, for $0 \le k \le K$; let $W_k$ be an isometry from $l_2^J$ onto $V_k$. (There is then a commutative diagram: the $T_k$ map $H_{k-1}$ to $H_k$, their restrictions $\tilde T_k$ map $V_{k-1}$ to $V_k$, and each $V_k$ is identified with $l_2^J$ by $W_k$.)

Let $A_k$ be the matrix representing $W_k^{-1}\tilde T_k W_{k-1}$. Then $A_K \cdots A_1$ represents $(T_K \cdots T_1)|_{V_0}$, so that
$$\prod_{j=1}^J s_j(T_K \cdots T_1) = |\det(A_K \cdots A_1)| = \prod_{k=1}^K |\det A_k| = \prod_{k=1}^K \prod_{j=1}^J s_j(\tilde T_k) \le \prod_{k=1}^K \prod_{j=1}^J s_j(T_k).$$

Weyl [Wey 49] proved his inequality by considering alternating tensor products, and also proved the first part of the following corollary. As Pólya [Pól 50] observed, the inequality above suggests that majorization should be used; let us follow Pólya, as Horn [Hor 50] did when he proved the second part of the corollary.
Corollary 15.8.1 Suppose that $\Phi$ is an increasing function on $[0, \infty)$ and that $\Phi(e^t)$ is a convex function of $t$.
(i) (Weyl's inequality II) Suppose that $T \in K(H)$. Then
$$\sum_{j=1}^J \Phi(|\lambda_j(T)|) \le \sum_{j=1}^J \Phi(s_j(T)), \quad\text{for each } J.$$
In particular,
$$\sum_{j=1}^J |\lambda_j(T)|^p \le \sum_{j=1}^J (s_j(T))^p, \quad\text{for } 0 < p < \infty, \text{ for each } J.$$
Suppose that $(X, \|\cdot\|_X)$ is a symmetric Banach sequence space. If $(s_j(T)) \in X$ then $(\lambda_j(T)) \in X$ and $\|(\lambda_j(T))\|_X \le \|(s_j(T))\|_X$.
(ii) (Horn's inequality II) Suppose that $T_k \in K(H_{k-1}, H_k)$ for $1 \le k \le K$. Then
$$\sum_{j=1}^J \Phi(s_j(T_K \cdots T_1)) \le \sum_{j=1}^J \Phi\Bigl(\prod_{k=1}^K s_j(T_k)\Bigr), \quad\text{for each } J.$$
In particular,
$$\sum_{j=1}^J (s_j(T_K \cdots T_1))^p \le \sum_{j=1}^J \Bigl(\prod_{k=1}^K s_j(T_k)\Bigr)^p, \quad\text{for } 0 < p < \infty, \text{ for each } J.$$
Suppose that $(X, \|\cdot\|_X)$ is a symmetric Banach sequence space. If $(\prod_{k=1}^K s_j(T_k)) \in X$ then $(s_j(T_K \cdots T_1)) \in X$ and $\|(s_j(T_K \cdots T_1))\|_X \le \|(\prod_{k=1}^K s_j(T_k))\|_X$.

Proof These results follow from Proposition 7.6.3.
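Weyl's inequalities are easy to test numerically in finite dimensions; the following sketch (an illustration only) compares eigenvalue moduli and singular numbers for a random matrix:

```python
import numpy as np

# Weyl I: the products of the largest eigenvalue moduli are dominated by
# the products of the singular numbers; Weyl II with Phi(x) = x^p gives
# the corresponding p-th power sums.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
lam = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]   # |lambda_j|, decreasing
s = np.linalg.svd(A, compute_uv=False)              # s_j, decreasing

for J in range(1, 7):
    assert np.prod(lam[:J]) <= np.prod(s[:J]) + 1e-9   # Weyl I
for p in (0.5, 1.0, 2.0):
    assert np.sum(lam ** p) <= np.sum(s ** p) + 1e-9   # Weyl II
# for J = 6 the product inequality is an equality: |det A| = prod_j s_j
print(abs(np.prod(lam) - np.prod(s)))
```

The equality for the full product is part (i) of Theorem 15.8.1.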
15.9 Ky Fan's inequality

The Muirhead maximal numbers $(\lambda^*_k(T))$ and $(s^*_k(T))$ play as important a role in operator theory as they do for sequences. We now characterize $s^*_k$ in terms of the trace of a matrix. Let us recall the definition. Suppose that $E$ is a finite-dimensional vector space, with basis $(e_1, \ldots, e_n)$, and dual basis $(\phi_1, \ldots, \phi_n)$. Then if $T \in L(E)$, we define the trace of $T$, $\mathrm{tr}(T)$, to be $\mathrm{tr}(T) = \sum_{j=1}^n \phi_j(T(e_j))$. Thus if $T$ is represented by the matrix $(t_{ij})$, then $\mathrm{tr}(T) = \sum_{j=1}^n t_{jj}$. The trace is independent of the choice of basis, and is equal to $\sum_{j=1}^n \lambda_j$, where the $\lambda_j$ are the roots of the characteristic polynomial, counted according to multiplicity. The trace also has the following important commutation property: if $F$ is another finite-dimensional vector space, not necessarily of the same dimension, and $S \in L(E, F)$, $T \in L(F, E)$, then $\mathrm{tr}(ST) = \mathrm{tr}(TS)$; for if $S$ and $T$ are represented by matrices $(s_{ij})$ and $(t_{jk})$, then $\mathrm{tr}(ST) = \sum_{i,j} s_{ij}t_{ji} = \mathrm{tr}(TS)$.

Theorem 15.9.1 (Ky Fan's theorem) Suppose that $T \in K(H_1, H_2)$. Then
$$s^*_k(T) = \frac{1}{k}\sup\{|\mathrm{tr}(ATB)| : A \in L(H_2, l_2^k),\ B \in L(l_2^k, H_1),\ \|A\| \le 1,\ \|B\| \le 1\}.$$

Proof Suppose that $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n$. Define $A \in L(H_2, l_2^k)$ by setting $A(z) = (\langle z, y_j\rangle)_{j=1}^k$, and define $B \in L(l_2^k, H_1)$ by setting $B(v) = \sum_{j=1}^k v_j x_j$. Then $\|A\| \le 1$ and $\|B\| = 1$. The operator $ATB \in L(l_2^k)$ is represented by the matrix $\mathrm{diag}(s_1(T), \ldots, s_k(T))$, so that $s^*_k(T) = (1/k)\,\mathrm{tr}(ATB)$.

On the other hand, suppose that $A \in L(H_2, l_2^k)$, that $B \in L(l_2^k, H_1)$, and that $\|A\| \le 1$ and $\|B\| \le 1$. Let $A(y_j) = (a_{lj})_{l=1}^k$ and let $\langle B(e_i), x_j\rangle = b_{ji}$. Then
$$ATB(e_i) = A\Bigl(\sum_j s_j(T)b_{ji}y_j\Bigr) = \Bigl(\sum_j a_{lj}s_j(T)b_{ji}\Bigr)_{l=1}^k,$$
so that
$$\mathrm{tr}(ATB) = \sum_{i=1}^k \sum_j a_{ij}s_j(T)b_{ji} = \sum_j \Bigl(\sum_{i=1}^k a_{ij}b_{ji}\Bigr)s_j(T).$$
Now
$$\langle BA(y_j), x_j\rangle = \Bigl\langle B\Bigl(\sum_{i=1}^k a_{ij}e_i\Bigr), x_j\Bigr\rangle = \sum_{i=1}^k a_{ij}b_{ji},$$
and
$$|\langle BA(y_j), x_j\rangle| \le \|B\|\,\|A\|\,\|y_j\|\,\|x_j\| \le 1,$$
so that $|\sum_{i=1}^k a_{ij}b_{ji}| \le 1$, and $(1/k)|\mathrm{tr}(ATB)| \le s^*_k(T)$.
Corollary 15.9.1 (Ky Fan's inequality) If $S, T \in K(H_1, H_2)$ then $s^*_k(S + T) \le s^*_k(S) + s^*_k(T)$.
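Since $s^*_k$ is the mean of the $k$ largest singular numbers, the corollary is easy to test numerically (an illustration only, for random matrices):

```python
import numpy as np

# Ky Fan's inequality: the Muirhead maximal numbers s*_k, the means of
# the k largest singular numbers, are subadditive.
rng = np.random.default_rng(3)
S = rng.standard_normal((7, 7))
T = rng.standard_normal((7, 7))

def s_star(M, k):
    return np.mean(np.linalg.svd(M, compute_uv=False)[:k])

for k in range(1, 8):
    assert s_star(S + T, k) <= s_star(S, k) + s_star(T, k) + 1e-12
```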
15.10 Operator ideals

We are now in a position to extend the results about symmetric Banach sequence spaces to ideals of operators. Suppose that $(X, \|\cdot\|_X)$ is a symmetric Banach sequence space contained in $c_0$. We define the Banach operator ideal $S_X(H_1, H_2)$ to be
$$S_X(H_1, H_2) = \{T \in K(H_1, H_2) : (s_n(T)) \in X\},$$
and set $\|T\|_X = \|(s_n(T))\|_X$. If $X = l_p$, we write $S_p(H_1, H_2)$ for $S_X(H_1, H_2)$ and denote the norm by $\|\cdot\|_p$.

Theorem 15.10.1 $S_X(H_1, H_2)$ is a linear subspace of $K(H_1, H_2)$, and $\|\cdot\|_X$ is a norm on it, under which it is complete. If $T \in S_X(H_1, H_2)$, $A \in L(H_2, H_3)$ and $B \in L(H_0, H_1)$ then $ATB \in S_X(H_0, H_3)$, and $\|ATB\|_X \le \|A\|\,\|T\|_X\,\|B\|$.

Proof Ky Fan's inequality says that $(s_n(S + T)) \prec_w (s_n(S) + s_n(T))$. If $S, T \in S_X$ then $(s_n(S) + s_n(T)) \in X$, and so by Corollary 7.4.1 $(s_n(S + T)) \in X$, and $\|(s_n(S + T))\|_X \le \|(s_n(S))\|_X + \|(s_n(T))\|_X$. Thus $S + T \in S_X$ and $\|S + T\|_X \le \|S\|_X + \|T\|_X$. Since $\|\alpha S\|_X = |\alpha|\,\|S\|_X$, it follows that $S_X(H_1, H_2)$ is a linear subspace of $K(H_1, H_2)$, and that $\|\cdot\|_X$ is a norm on it.

Completeness is straightforward. If $(T_n)$ is a Cauchy sequence in $S_X(H_1, H_2)$ then $(T_n)$ is a Cauchy sequence in operator norm, and so converges in this norm to some $T \in K(H_1, H_2)$. Then $s_k(T_n) \to s_k(T)$, for each $k$, by Proposition 15.7.1, and so $T \in S_X(H_1, H_2)$, and $\|T\|_X \le \sup_n \|T_n\|_X$, by Fatou's Lemma (Proposition 6.1.1). Similarly, $\|T - T_n\|_X \le \sup_{m \ge n}\|T_m - T_n\|_X \to 0$ as $n \to \infty$.

The final statement also follows from Proposition 15.7.1.

The final statement of Theorem 15.10.1 explains why $S_X(H_1, H_2)$ is called an ideal. The ideal property is very important; for example, we have the following result, which we shall need later.

Proposition 15.10.1 Suppose that $S_X(H)$ is a Banach operator ideal, and that $r > 0$. The set
$$O_X^{(r)}(H) = \{T \in S_X(H) : \{z : |z| = r\} \cap \sigma(T) = \emptyset\}$$
is an open subset of $S_X(H)$, and the map $T \to T_{<r}$ is continuous on it.

Proof Suppose that $T \in O_X^{(r)}(H)$. Let $M_T = \sup_{|z|=r}\|R_z(T)\|$. If $\|S - T\|_X < 1/2M_T$ then $\|S - T\| < 1/2M_T$, so that if $|z| = r$ then $zI - S$ is invertible and $\|R_z(S)\| \le 2M_T$. Thus $S \in O_X^{(r)}(H)$, and $O_X^{(r)}(H)$ is open. Further, we have the resolvent equation
$$SR_z(S) - TR_z(T) = zR_z(S)(S - T)R_z(T),$$
so that, using Proposition 15.2.1,
$$\|S_{<r} - T_{<r}\|_X = \Bigl\|\frac{1}{2\pi i}\int_{|z|=r}(SR_z(S) - TR_z(T))\,dz\Bigr\|_X \le 2r^2M_T^2\|S - T\|_X.$$
Ky Fan's theorem allows us to establish the following characterization of $S_X(H_1, H_2)$.

Proposition 15.10.2 Suppose that $X$ is a symmetric Banach sequence space and that $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n \in K(H_1, H_2)$. Then $T \in S_X(H_1, H_2)$ if and only if $(\langle T(e_j), f_j\rangle) \in X$ for all orthonormal sequences $(e_j)$ and $(f_j)$ in $H_1$ and $H_2$, respectively. Then
$$\|T\|_X = \sup\{\|(\langle T(e_j), f_j\rangle)\|_X : (e_j), (f_j) \text{ orthonormal in } H_1, H_2 \text{ respectively}\}.$$

Proof The condition is certainly sufficient, since $(s_n(T)) = (\langle T(x_n), y_n\rangle)$. Suppose that $T \in S_X(H_1, H_2)$ and that $(e_j)$ and $(f_j)$ are orthonormal sequences in $H_1$ and $H_2$, respectively. Let us set $y_j = \langle T(e_j), f_j\rangle$. We arrange $y_1, \ldots, y_k$ in decreasing absolute value: there exists a one-one mapping $\sigma : \{1, \ldots, k\} \to \mathbf{N}$ such that $|y_{\sigma(j)}| = y^*_j$ for $1 \le j \le k$. Define $A \in L(H_2, l_2^k)$ by setting
$$A(z)_j = \mathrm{sgn}\,y_{\sigma(j)}\,\langle z, f_{\sigma(j)}\rangle,$$
and define $B \in L(l_2^k, H_1)$ by setting $B(v) = \sum_{j=1}^k v_j e_{\sigma(j)}$. Then $\|A\| \le 1$ and $\|B\| = 1$, and $\mathrm{tr}(ATB) = \sum_{j=1}^k y^*_j$. But $|\mathrm{tr}(ATB)| \le ks^*_k(T)$, by Ky Fan's theorem, and so $(y_j) \prec_w (s_j(T))$. Thus $(\langle T(e_j), f_j\rangle) \in X$ and $\|(\langle T(e_j), f_j\rangle)\|_X \le \|T\|_X$.
We can use Horn's inequality to transfer inequalities from symmetric sequence spaces to operator ideals. For example, we have the following, whose proof is immediate.

Proposition 15.10.3 (i) (Generalized Hölder's inequality) Suppose that $0 < p, q, r < \infty$ and that $1/p + 1/q = 1/r$. If $S \in S_p(H_2, H_3)$ and $T \in S_q(H_1, H_2)$ then $ST \in S_r(H_1, H_3)$ and
$$\Bigl(\sum_{j=1}^\infty (s_j(ST))^r\Bigr)^{1/r} \le \Bigl(\sum_{j=1}^\infty (s_j(S))^p\Bigr)^{1/p}\Bigl(\sum_{j=1}^\infty (s_j(T))^q\Bigr)^{1/q}.$$
(ii) Suppose that $(X, \|\cdot\|_X)$ is a symmetric Banach sequence space contained in $c_0$, with associate space $(X', \|\cdot\|_{X'})$ also contained in $c_0$. If $S \in S_X$ and $T \in S_{X'}$ then $ST \in S_1$ and $\|ST\|_1 \le \|S\|_X\,\|T\|_{X'}$.
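A quick numerical sanity check of the generalized Hölder inequality (an illustration only), in the simplest case $p = q = 2$, $r = 1$: the trace norm of a product is at most the product of the Hilbert–Schmidt norms.

```python
import numpy as np

rng = np.random.default_rng(7)
S = rng.standard_normal((6, 6))
T = rng.standard_normal((6, 6))

def schatten(M, p):
    # l_p norm of the singular numbers
    return np.sum(np.linalg.svd(M, compute_uv=False) ** p) ** (1.0 / p)

lhs = schatten(S @ T, 1.0)
rhs = schatten(S, 2.0) * schatten(T, 2.0)
assert lhs <= rhs + 1e-9
```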
15.11 The Hilbert–Schmidt class

There are two particularly important Banach operator ideals, the trace class $S_1$ and the Hilbert–Schmidt class $S_2$. We begin with the Hilbert–Schmidt class.

Theorem 15.11.1 Suppose that $H_1$ and $H_2$ are Hilbert spaces.
(i) Suppose that $T \in K(H_1, H_2)$. Then the (possibly infinite) sum $\sum_{j=1}^\infty \|T(e_j)\|^2$ is the same for all orthonormal bases $(e_j)$ of $H_1$. $T \in S_2(H_1, H_2)$ if and only if the sum is finite, and then $\|T\|_2^2 = \sum_{j=1}^\infty \|T(e_j)\|^2$.
(ii) If $S, T \in S_2(H_1, H_2)$ then the series $\sum_{j=1}^\infty \langle S(e_j), T(e_j)\rangle$ is absolutely convergent for all orthonormal bases $(e_j)$, and the sum is the same for all orthonormal bases. Let
$$\langle S, T\rangle = \sum_{j=1}^\infty \langle S(e_j), T(e_j)\rangle.$$
Then $\langle S, T\rangle$ is an inner product on $S_2(H_1, H_2)$ for which $\langle T, T\rangle = \|T\|_2^2$, for all $T \in S_2(H_1, H_2)$.

Proof (i) Suppose that $(e_j)$ is an orthonormal basis of $H_1$ and that $(f_k)$ is an orthonormal basis of $H_2$. Then
$$\sum_{j=1}^\infty \|T(e_j)\|^2 = \sum_{j=1}^\infty\sum_{k=1}^\infty |\langle T(e_j), f_k\rangle|^2 = \sum_{k=1}^\infty\sum_{j=1}^\infty |\langle e_j, T^*(f_k)\rangle|^2 = \sum_{k=1}^\infty \|T^*(f_k)\|^2.$$
Thus the sum does not depend on the choice of orthonormal basis $(e_j)$. Now there exists an orthonormal sequence $(x_j)$ such that $\|T(x_j)\| = s_j(T)$, for all $j$. Let $(z_j)$ be an orthonormal basis for $(\mathrm{span}(x_j))^\perp$, and let $(e_j)$ be an orthonormal basis for $H_1$ whose terms comprise the $x_j$s and the $z_j$s. Then
$$\sum_{j=1}^\infty \|T(e_j)\|^2 = \sum_{j=1}^\infty \|T(x_j)\|^2 + \sum_{j=1}^\infty \|T(z_j)\|^2 = \sum_{j=1}^\infty (s_j(T))^2,$$
so that the sum is finite if and only if $T \in S_2(H_1, H_2)$, and then $\|T\|_2^2 = \sum_{j=1}^\infty \|T(e_j)\|^2$.
(ii) This is a simple exercise in polarization.
The equality in part (i) of this theorem is quite special. For example, let $v_j = 1/(\sqrt{j}\log(j+1))$. Then $v = (v_j) \in l_2$; let $w = v/\|v\|_2$. Now let $P_w = \langle\cdot, w\rangle w$ be the one-dimensional orthogonal projection of $l_2$ onto the span of $w$. Then $P_w \in S_p$, and $\|P_w\|_p = 1$, for $1 \le p < \infty$, while
$$\sum_{j=1}^\infty \|P_w(e_j)\|^p = \sum_{j=1}^\infty \frac{1}{j^{p/2}(\log(j+1))^p\|v\|_2^p} = \infty$$
for $1 \le p < 2$. This phenomenon is a particular case of the following inequalities.
Proposition 15.11.1 Suppose that $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n \in K(H_1, H_2)$ and that $(e_k)$ is an orthonormal basis for $H_1$.
(i) If $1 \le p < 2$ then $\sum_{k=1}^\infty \|T(e_k)\|^p \ge \sum_{j=1}^\infty (s_j(T))^p$.
(ii) If $2 < p < \infty$ then $\sum_{k=1}^\infty \|T(e_k)\|^p \le \sum_{j=1}^\infty (s_j(T))^p$.

Proof (i) We use Hölder's inequality, with exponents $2/p$ and $2/(2-p)$:
$$\begin{aligned}
\sum_{j=1}^\infty (s_j(T))^p &= \sum_{j=1}^\infty (s_j(T))^p\Bigl(\sum_{k=1}^\infty |\langle e_k, x_j\rangle|^2\Bigr) = \sum_{k=1}^\infty\sum_{j=1}^\infty (s_j(T))^p|\langle e_k, x_j\rangle|^p\,|\langle e_k, x_j\rangle|^{2-p}\\
&\le \sum_{k=1}^\infty\Bigl(\sum_{j=1}^\infty (s_j(T))^2|\langle e_k, x_j\rangle|^2\Bigr)^{p/2}\Bigl(\sum_{j=1}^\infty |\langle e_k, x_j\rangle|^2\Bigr)^{1-p/2}\\
&\le \sum_{k=1}^\infty\Bigl(\sum_{j=1}^\infty (s_j(T))^2|\langle e_k, x_j\rangle|^2\Bigr)^{p/2} = \sum_{k=1}^\infty \|T(e_k)\|^p.
\end{aligned}$$
(ii) In this case, we use Hölder's inequality with exponents $p/2$ and $p/(p-2)$:
$$\begin{aligned}
\sum_{k=1}^\infty \|T(e_k)\|^p &= \sum_{k=1}^\infty\Bigl(\sum_{j=1}^\infty (s_j(T))^2|\langle e_k, x_j\rangle|^2\Bigr)^{p/2}\\
&= \sum_{k=1}^\infty\Bigl(\sum_{j=1}^\infty (s_j(T))^2|\langle e_k, x_j\rangle|^{4/p}\,|\langle e_k, x_j\rangle|^{2-4/p}\Bigr)^{p/2}\\
&\le \sum_{k=1}^\infty\Bigl(\sum_{j=1}^\infty (s_j(T))^p|\langle e_k, x_j\rangle|^2\Bigr)\Bigl(\sum_{j=1}^\infty |\langle e_k, x_j\rangle|^2\Bigr)^{(p-2)/2}\\
&\le \sum_{k=1}^\infty\sum_{j=1}^\infty (s_j(T))^p|\langle e_k, x_j\rangle|^2 = \sum_{j=1}^\infty (s_j(T))^p\Bigl(\sum_{k=1}^\infty |\langle e_k, x_j\rangle|^2\Bigr) = \sum_{j=1}^\infty (s_j(T))^p.
\end{aligned}$$
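In matrix terms, $\|T(e_k)\|$ for the standard basis is the $k$-th column norm, so the proposition compares $l_p$ sums of column norms and singular numbers; a numerical sketch (an illustration only):

```python
import numpy as np

# The l_p sums of column norms and singular numbers compare in opposite
# directions on either side of p = 2, with equality at p = 2
# (the Hilbert-Schmidt case).
rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6))
s = np.linalg.svd(A, compute_uv=False)
col = np.linalg.norm(A, axis=0)   # ||T(e_k)|| for the standard basis

assert np.sum(col ** 1.5) >= np.sum(s ** 1.5) - 1e-9   # 1 <= p < 2
assert np.sum(col ** 3.0) <= np.sum(s ** 3.0) + 1e-9   # 2 < p
assert np.isclose(np.sum(col ** 2), np.sum(s ** 2))    # p = 2
```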
15.12 The trace class

We now turn to the trace class. First let us note that we can use it to characterize the Muirhead maximal numbers $s^*_k(T)$.

Theorem 15.12.1 Suppose that $T \in K(H_1, H_2)$. Then
$$s^*_k(T) = \inf\{\|R\|_1/k + \|S\| : T = R + S,\ R \in S_1(H_1, H_2),\ S \in K(H_1, H_2)\},$$
and the infimum is attained.

Proof First suppose that $T = R + S$, with $R \in S_1(H_1, H_2)$ and $S \in K(H_1, H_2)$. Then by Ky Fan's inequality,
$$s^*_k(T) \le s^*_k(R) + s^*_k(S) \le \|R\|_1/k + \|S\|.$$
On the other hand, if $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n$, let
$$R = \sum_{n=1}^k (s_n(T) - s_k(T))\langle\cdot, x_n\rangle y_n$$
and
$$S = \sum_{n=1}^k s_k(T)\langle\cdot, x_n\rangle y_n + \sum_{n=k+1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n.$$
Then $T = R + S$, $\|R\|_1 = k(s^*_k(T) - s_k(T))$ and $\|S\| = s_k(T)$, so that $s^*_k(T) = \|R\|_1/k + \|S\|$.
This enables us to prove an operator version of Calderón's interpolation theorem.

Corollary 15.12.1 Suppose that $\Phi$ is a norm-decreasing linear map of $K(H_1, H_2)$ into $K(H_3, H_4)$, which is also norm-decreasing from $S_1(H_1, H_2)$ into $S_1(H_3, H_4)$. If $T \in K(H_1, H_2)$ then $s^*_k(\Phi(T)) \le s^*_k(T)$, so that $\|\Phi(T)\|_X \le \|T\|_X$ for any Banach operator ideal $S_X$.
The important feature of the trace class $S_1(H)$ is that we can define a special linear functional on it, namely the trace.

Theorem 15.12.2 (i) Suppose that $T$ is a positive compact operator on a Hilbert space $H$. Then the (possibly infinite) sum $\sum_{j=1}^\infty \langle T(e_j), e_j\rangle$ is the same for all orthonormal bases $(e_j)$ of $H$. $T \in S_1(H)$ if and only if the sum is finite, and then $\|T\|_1 = \sum_{j=1}^\infty \langle T(e_j), e_j\rangle$.
(ii) If $T \in S_1(H)$, then $\sum_{j=1}^\infty \langle T(e_j), e_j\rangle$ converges absolutely, and the sum is the same for all orthonormal bases $(e_j)$ of $H$.

Proof (i) We can write $T$ as $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle x_n$. Let $S = \sum_{n=1}^\infty \sqrt{s_n(T)}\langle\cdot, x_n\rangle x_n$. Then $S$ is a positive compact operator, and $T = S^2$. Thus
$$\sum_{j=1}^\infty \langle T(e_j), e_j\rangle = \sum_{j=1}^\infty \langle S(e_j), S(e_j)\rangle = \sum_{j=1}^\infty \|S(e_j)\|^2,$$
and we can apply Theorem 15.11.1. In particular, the sum is finite if and only if $S \in S_2(H)$, and then
$$\sum_{j=1}^\infty s_j(T) = \sum_{j=1}^\infty (s_j(S))^2 = \sum_{j=1}^\infty \langle T(e_j), e_j\rangle.$$
(ii) We can write $T$ as $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n$. Let
$$R = \sum_{n=1}^\infty \sqrt{s_n(T)}\langle\cdot, y_n\rangle y_n \quad\text{and}\quad S = \sum_{n=1}^\infty \sqrt{s_n(T)}\langle\cdot, x_n\rangle y_n.$$
Then $R$ and $S$ are Hilbert–Schmidt operators, $T = RS$, and if $(e_j)$ is an orthonormal basis then $\langle T(e_j), e_j\rangle = \langle S(e_j), R^*(e_j)\rangle$, so that the result follows from Theorem 15.11.1 (ii).
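In finite dimensions all of this is elementary matrix algebra; a numerical sketch (an illustration only) of part (i):

```python
import numpy as np

# For a positive matrix T, sum_j <T(e_j), e_j> is the matrix trace, equals
# the sum of the singular numbers, and is invariant under a change of
# orthonormal basis.
rng = np.random.default_rng(5)
B = rng.standard_normal((5, 5))
T = B.T @ B                                       # positive semi-definite
s = np.linalg.svd(T, compute_uv=False)

Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # another orthonormal basis
assert np.isclose(np.trace(T), np.sum(s))
assert np.isclose(np.trace(Q.T @ T @ Q), np.trace(T))
```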
15.13 Lidskii's trace formula

The functional $\mathrm{tr}(T) = \sum_{j=1}^\infty \langle T(e_j), e_j\rangle$ is called the trace of $T$. It is a continuous linear functional on $S_1(H)$, which is of norm 1, and which satisfies $\mathrm{tr}(T^*) = \overline{\mathrm{tr}(T)}$. It generalizes the trace of an operator on a finite-dimensional space; can it too be characterized in terms of its eigenvalues?

Theorem 15.13.1 (Lidskii's trace formula) If $T \in S_1(H)$ then $\sum_{j=1}^\infty \lambda_j(T)$ is absolutely convergent, and $\mathrm{tr}(T) = \sum_{j=1}^\infty \lambda_j(T)$.

Proof This result had been conjectured for a long time; the first proof was given by Lidskii [Lid 59]; we shall however follow the proof given by Leiterer and Pietsch, as described in [Kön 86].

The fact that $\sum_{j=1}^\infty \lambda_j(T)$ is absolutely convergent follows immediately from Weyl's inequality. Let us set $\Lambda(T) = \sum_{j=1}^\infty \lambda_j(T)$. If $T$ is of finite rank, then $\Lambda(T) = \mathrm{tr}(T)$. The finite rank operators are dense in $S_1(H)$, and $\mathrm{tr}$ is continuous on $S_1(H)$, and so it is enough to show that $\Lambda$ is continuous on $S_1(H)$.

The key idea of the proof is to introduce new parameters which are more useful, in the present circumstances, than the singular numbers. The next lemma gives the details.
Lemma 15.13.1 Suppose that $S, T \in K(H)$. Let $t_k(T) = (s_k(T))^{1/2}$, $t^*_k(T) = (1/k)\sum_{j=1}^k t_j(T)$ and $y_k(T) = (t^*_k(T))^2$. Then
(i) $\sum_{k=1}^l s_k(T) \le \sum_{k=1}^l y_k(T) \le 4\sum_{k=1}^l s_k(T)$;
(ii) $|\lambda_k(T)| \le y_k(T)$;
(iii) $y_{2k}(S + T) \le 2y_k(S) + 2y_k(T)$.

Proof (i) Clearly $s_k(T) \le y_k(T)$; this gives the first inequality. On the other hand, applying the Hardy–Riesz inequality,
$$\sum_{k=1}^l y_k(T) = \sum_{k=1}^l (t^*_k(T))^2 \le 4\sum_{k=1}^l (t_k(T))^2 = 4\sum_{k=1}^l s_k(T).$$
(ii) It follows from Weyl's inequality that
$$|\lambda_k(T)| \le \Bigl(\frac{1}{k}\sum_{j=1}^k |\lambda_j(T)|^{1/2}\Bigr)^2 \le \Bigl(\frac{1}{k}\sum_{j=1}^k t_j(T)\Bigr)^2 = y_k(T).$$
(iii) Using Proposition 15.7.1, and the inequality $(a + b)^{1/2} \le a^{1/2} + b^{1/2}$ for $a, b \ge 0$,
$$t^*_{2k}(S + T) \le \frac{1}{k}\sum_{j=1}^k (s_{2j-1}(S + T))^{1/2} \le \frac{1}{k}\sum_{j=1}^k (s_j(S) + s_j(T))^{1/2} \le \frac{1}{k}\sum_{j=1}^k ((s_j(S))^{1/2} + (s_j(T))^{1/2}) = t^*_k(S) + t^*_k(T);$$
thus
$$y_{2k}(S + T) \le (t^*_k(S) + t^*_k(T))^2 \le 2(t^*_k(S))^2 + 2(t^*_k(T))^2 = 2y_k(S) + 2y_k(T).$$
Let us now return to the proof of the theorem. Suppose that $T \in S_1(H)$ and that $\epsilon > 0$. Then $\sum_{j=1}^\infty y_j(T) \le 4\sum_{j=1}^\infty s_j(T) < \infty$, and so there exists $J$ such that $\sum_{j=J+1}^\infty y_j(T) < \epsilon/24$, and there exists $0 < r < \min(\epsilon/24J, |\lambda_J(T)|)$ such that $T \in O_1^{(r)}$. By Proposition 15.10.1, there exists $0 < \delta < \epsilon/24$ such that if $\|S - T\|_1 < \delta$ then $S \in O_1^{(r)}(H)$, $\|S_{<r} - T_{<r}\|_1 < \epsilon/24$ and $\|S_{>r} - T_{>r}\|_1 < \epsilon/24$. Consequently, for such $S$,
$$\Bigl|\sum_{|\lambda_j(T)|>r} \lambda_j(T) - \sum_{|\lambda_j(S)|>r} \lambda_j(S)\Bigr| = |\mathrm{tr}(T_{>r}) - \mathrm{tr}(S_{>r})| \le \|T_{>r} - S_{>r}\|_1 < \epsilon/24.$$
On the other hand, using the inequalities of Lemma 15.13.1,
$$\sum_{|\lambda_j(T)|<r} |\lambda_j(T)| \le \sum_{j=J+1}^\infty y_j(T) < \epsilon/24,$$
and
$$\begin{aligned}
\sum_{|\lambda_j(S)|<r} |\lambda_j(S)| = \sum_{j=1}^\infty |\lambda_j(S_{<r})| &\le \sum_{j=1}^{2J} |\lambda_j(S_{<r})| + \sum_{j=2J+1}^\infty y_j(S_{<r})\\
&\le 2Jr + \sum_{j=2J+1}^\infty y_j(S)\\
&\le 2\epsilon/24 + 4\sum_{j=J+1}^\infty y_j(T) + 4\sum_{j=J+1}^\infty y_j(S - T)\\
&\le 6\epsilon/24 + 4\sum_{j=1}^\infty y_j(S - T)\\
&\le 6\epsilon/24 + 16\sum_{j=1}^\infty s_j(S - T) \le 22\epsilon/24.
\end{aligned}$$
Thus $|\Lambda(T) - \Lambda(S)| < \epsilon$, and $\Lambda$ is continuous.
We can now apply Corollary 15.3.1.

Theorem 15.13.2 If $S \in S_1(H_1)$ and $T \in S_1(H_2)$ are related operators, then $\mathrm{tr}(S) = \mathrm{tr}(T)$.
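In finite dimensions this is the familiar fact that $BA$ and $AB$ share their non-zero eigenvalues, and hence their traces; a numerical sketch (an illustration only):

```python
import numpy as np

# S = BA and T = AB are related operators on spaces of different
# dimensions; their non-zero eigenvalues coincide, and so -- as Lidskii's
# formula predicts in this finite-dimensional setting -- do their traces.
rng = np.random.default_rng(6)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))
S, T = B @ A, A @ B                      # 5x5 and 3x3

assert np.isclose(np.trace(S), np.trace(T))
lamS = np.sort(np.abs(np.linalg.eigvals(S)))[::-1]
lamT = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
# the three largest eigenvalue moduli of S match those of T; the rest vanish
assert np.allclose(lamS[:3], lamT)
assert np.all(lamS[3:] < 1e-10)
```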
15.14 Operator ideal duality

We can now establish a duality theory for Banach operator ideals analogous to that for symmetric Banach function spaces. The basic results are summarized in the next theorem; the details are straightforward, and are left to the reader.

Theorem 15.14.1 Suppose that $X$ is a symmetric Banach sequence space contained in $c_0$, whose associate space $X'$ is also contained in $c_0$. If $S \in S_X(H_1, H_2)$ and $T \in S_{X'}(H_2, H_1)$ then $TS \in S_1(H_1)$ and $ST \in S_1(H_2)$, $\mathrm{tr}(TS) = \mathrm{tr}(ST)$ and $|\mathrm{tr}(TS)| \le \|S\|_X\,\|T\|_{X'}$. Further,
$$\|S\|_X = \sup\{|\mathrm{tr}(ST)| : T \in S_{X'}(H_2, H_1),\ \|T\|_{X'} \le 1\}.$$

The inner product on $S_2(H_1, H_2)$ can also be expressed in terms of the trace: if $S, T \in S_2(H_1, H_2)$, and $(e_j)$ is an orthonormal basis for $H_1$, then
$$\langle S, T\rangle = \sum_{j=1}^\infty \langle S(e_j), T(e_j)\rangle = \sum_{j=1}^\infty \langle T^*S(e_j), e_j\rangle = \mathrm{tr}(T^*S).$$
The ideals $S_p$ enjoy the same complex interpolation properties as $L_p$ spaces.

Theorem 15.14.2 Suppose that $1 \le p_0, p_1 \le \infty$, that $0 < \theta < 1$ and that $1/p = (1-\theta)/p_0 + \theta/p_1$. Then $S_p = (S_{p_0}, S_{p_1})_{[\theta]}$ (where $S_\infty = K$).

Proof The proof is much the same as for the $L_p$ spaces. Suppose that $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n \in S_p$; by homogeneity, we may suppose that $\|T\|_p = 1$. Let $u(z) = (1-z)/p_0 + z/p_1$, and let $T(z) = \sum_{n=1}^\infty (s_n(T))^{pu(z)}\langle\cdot, x_n\rangle y_n$. Then $\|T(z)\|_{p_j} = \|T\|_p$ for $z \in L_j$ (for $j = 0, 1$), and $T(\theta) = T$, so that $\|T\|_{[\theta]} \le \|T\|_p$. On the other hand, suppose that $F \in \mathcal{F}(S_{p_0}, S_{p_1})$, with $F(\theta) = T$. Let $r_n = (s_n(T))^{p-1}$, and for each $N$ let
$$R_N = \sum_{n=1}^N r_n\langle\cdot, y_n\rangle x_n \quad\text{and}\quad G_N(z) = \sum_{n=1}^N r_n^{p'v(z)}\langle\cdot, y_n\rangle x_n,$$
where $v(z) = (1-z)/p_0' + z/p_1'$. Then
$$\sum_{n=1}^N (s_n(T))^p = \mathrm{tr}(R_N T) \le \max_{j=0,1}\sup_{z\in L_j}|\mathrm{tr}(G_N(z)F(z))| \le \|R_N\|_{p'}\max_{j=0,1}\sup_{z\in L_j}\|F(z)\|_{p_j} = \Bigl(\sum_{n=1}^N (s_n(T))^p\Bigr)^{1/p'}\|T\|_{[\theta]},$$
so that $(\sum_{n=1}^N (s_n(T))^p)^{1/p} \le \|T\|_{[\theta]}$. Letting $N \to \infty$, we see that $T \in S_p$ and $\|T\|_p \le \|T\|_{[\theta]}$.
15.15 Notes and remarks
Information about the spectrum and resolvent of a bounded linear operator is given in most books on functional analysis, such as [Bol 90], Chapter 12.
Accounts of the functional calculus are given in [Dow 78] and [DuS 88].
The study of ideals of operators on a Hilbert space was inaugurated by
Schatten [Scha 50], although he expressed his results in terms of tensor products, rather than operators.
Exercises
15.1 Suppose that $T \in L(E)$, where $(E, \|\cdot\|_E)$ is a complex Banach space.
(i) Suppose that $\lambda, \mu \in \rho(T)$. Establish the resolvent equation
$$R_\lambda - R_\mu = (\mu - \lambda)R_\lambda R_\mu = (\mu - \lambda)R_\mu R_\lambda.$$
(ii) Suppose that $S, T \in L(E)$, that $\lambda \in \rho(ST)$ and that $\lambda \ne 0$. Show that $\lambda \in \rho(TS)$ and that
$$R_\lambda(TS) = \frac{1}{\lambda}(I + TR_\lambda(ST)S).$$
What happens when $\lambda = 0$?
(iii) Suppose that $\lambda$ is a boundary point of $\sigma(T)$. Show that $\lambda$ is an approximate eigenvalue of $T$: there exists a sequence $(x_n)$ of unit vectors such that $\|T(x_n) - \lambda x_n\| \to 0$ as $n \to \infty$. (Use the fact that if $\lambda \in \rho(T)$ and $|\mu - \lambda| < \|R_\lambda\|^{-1}$ then $\mu \in \rho(T)$.) Show that if $T$ is compact and $\lambda \ne 0$ then $\lambda$ is an eigenvalue of $T$.
15.2 Show that the functions $\{e^{2\pi int} : n \in \mathbb{Z}\}$ form an orthonormal basis for $L_2(0,1)$. Using this, or otherwise, show that the Fredholm integral operator
$$T(f)(x) = \int_0^x f(t)\,dt$$
is a compact operator on $L_2(0,1)$.
15.3 (i) Suppose that $(x_n)$ is a bounded sequence in a Hilbert space $H$. Show, by a diagonal argument, that there is a subsequence $(x_{n_k})$ such that $\langle x_{n_k}, y\rangle$ is convergent for each $y \in H$. (First reduce the problem to the case where $H$ is separable.) Show that there exists $x \in H$ such that $\langle x_{n_k}, y\rangle \to \langle x, y\rangle$ as $k \to \infty$, for each $y \in H$.
(ii) Suppose that $T \in L(H, E)$, where $(E, \|\cdot\|_E)$ is a Banach space. Show that $T(B_H)$ is closed in $E$.
(iii) Show that $T \in L(H, E)$ is compact if and only if $\overline{T(B_H)}$ is compact.
(iv) Show that if $T \in K(H, E)$ then there exists $x \in H$ with $\|x\| = 1$ such that $\|T(x)\| = \|T\|$.
(v) Give an example of $T \in L(H)$ with $\|T\| = 1$ for which $\|T(x)\| < 1$ for all $x \in H$ with $\|x\| = 1$.
15.4 Suppose that $T \in K(H_1, H_2)$, where $H_1$ and $H_2$ are Hilbert spaces. Suppose that $\|x\| = 1$ and $\|T(x)\| = \|T\|$ (as in the previous question). Show that if $\langle x, y\rangle = 0$ then $\langle T(x), T(y)\rangle = 0$. Use this to give another proof of Theorem 15.6.1.
15.5 Use the finite-dimensional version of Theorem 15.6.1 to show that an element $T$ of $L(l_2^d)$ with $\|T\| \le 1$ is a convex combination of unitary operators.
15.6 Suppose that $T \in L(H_1, H_2)$. Show that $T \in K(H_1, H_2)$ if and only if $\|T(e_n)\| \to 0$ as $n \to \infty$ for every orthonormal sequence $(e_n)$ in $H_1$.
16
Summing operators
16.1 Unconditional convergence
In the previous chapter, we obtained inequalities for operators between Hilbert spaces, and endomorphisms of Hilbert spaces, and considered special spaces of operators, such as the trace class and the space of Hilbert–Schmidt operators. For the rest of the book, we shall investigate inequalities for operators between Banach spaces, and endomorphisms of Banach spaces. Are there spaces of operators that correspond to the trace class and the space of Hilbert–Schmidt operators?

We shall however not approach these problems directly. We begin by considering a problem concerning series in Banach spaces.
Suppose that $\sum_{n=1}^\infty x_n$ is a series in a Banach space $(E, \|\cdot\|_E)$. We say that the series is absolutely convergent if $\sum_{n=1}^\infty \|x_n\|_E < \infty$, and say that it is unconditionally convergent if $\sum_{n=1}^\infty x_{\sigma(n)}$ is convergent in norm, for each permutation $\sigma$ of the indices: however we rearrange the terms, the series still converges. An absolutely convergent series is unconditionally convergent, and a standard result of elementary analysis states that the converse holds when $E$ is finite-dimensional. On the other hand, the series $\sum_{n=1}^\infty e_n/n$ converges unconditionally in $l_2$, but does not converge absolutely. What happens in $l_1$? What happens generally?
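The behaviour of $\sum_n e_n/n$ can be seen numerically. A minimal sketch (finite truncations standing in for the infinite series): the $l_2$ norm of a tail of the series is small — and, since the terms sit on disjoint coordinates, this bound is independent of the order of summation — while the sum of the norms of the terms is a harmonic sum, which grows without bound.

```python
import math

# The n-th term of the series is e_n / n, whose l2 norm is 1/n.

def tail_norm(N, M):
    # l2 norm of sum_{n=N+1}^{M} e_n/n: the terms live on disjoint
    # coordinates, so the norm is (sum_{n=N+1}^{M} 1/n^2)^{1/2}
    return math.sqrt(sum(1.0 / n ** 2 for n in range(N + 1, M + 1)))

def sum_of_norms(N):
    # sum_{n=1}^{N} ||e_n/n||_2 = harmonic sum, which diverges
    return sum(1.0 / n for n in range(1, N + 1))

print(tail_norm(1000, 10 ** 6))  # small: the partial sums are Cauchy in l2
print(sum_of_norms(10 ** 6))     # large: no absolute convergence
```

The tail norm is roughly $N^{-1/2}$, while the harmonic sum grows like $\log N$.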
Before we go further, let us establish some equivalent characterizations of
unconditional convergence.
Proposition 16.1.1 Suppose that $(x_n)$ is a sequence in a Banach space $(E, \|\cdot\|_E)$. The following are equivalent:
(i) The series $\sum_{n=1}^\infty x_n$ is unconditionally convergent.
(ii) If $n_1 < n_2 < \cdots$ then the series $\sum_{i=1}^\infty x_{n_i}$ converges.
(iii) If $\epsilon_n = \pm 1$ then the series $\sum_{n=1}^\infty \epsilon_n x_n$ converges.
(iv) If $b = (b_n)$ is a bounded sequence then the series $\sum_{n=1}^\infty b_n x_n$ converges.
(v) Given $\epsilon > 0$, there exists a finite subset $F$ of $\mathbb{N}$ such that whenever $G$ is a finite subset of $\mathbb{N}$ disjoint from $F$ then $\bigl\|\sum_{n\in G} x_n\bigr\|_E < \epsilon$.
Proof It is clear that (ii) and (iii) are equivalent, that (v) implies (i) and (ii) and that (iv) implies (ii). We shall show that each of (ii) and (i) implies (v), and that (v) implies (iv).

Suppose that (v) fails, for some $\epsilon > 0$. Then recursively we can find finite sets $F_k$ such that $\bigl\|\sum_{n\in F_k} x_n\bigr\|_E \ge \epsilon$, and with the property that $\min F_k > \sup F_{k-1} = N_{k-1}$, say, for $k > 1$. Thus, setting $N_0 = 0$, $F_k \subseteq J_k$, where $J_k = \{n : N_{k-1} < n \le N_k\}$. We write $\bigcup_{k=1}^\infty F_k$ as $\{n_1 < n_2 < \cdots\}$; then $\sum_{j=1}^\infty x_{n_j}$ does not converge. Thus (ii) implies (v). Further there exists a permutation $\sigma$ of $\mathbb{N}$ such that $\sigma(J_k) = J_k$ for each $k$ and $\sigma(N_{k-1}+i) \in F_k$ for $1 \le i \le \#(F_k)$. Then $\sum_{n=1}^\infty x_{\sigma(n)}$ does not converge, and so (i) implies (v).

Suppose that (v) holds, and that $b$ is a bounded sequence. Without loss of generality we can suppose that each $b_n$ is real (in the complex case, consider real and imaginary parts) and that $0 \le b_n < 1$ (scale, and consider positive and negative parts). Suppose that $\epsilon > 0$. Then there exists $n_0$ such that $\bigl\|\sum_{n\in G} x_n\bigr\|_E < \epsilon$ if $G$ is a finite set with $\min G > n_0$. Now suppose that $n_0 < n_1 < n_2$. Let $b_n = \sum_{k=1}^\infty b_{n,k}/2^k$ be the binary expansion of $b_n$, so that $b_{n,k} = 0$ or $1$. Let $B_k = \{n : n_1 < n \le n_2,\ b_{n,k} = 1\}$. Then
$$\Bigl\|\sum_{n=n_1+1}^{n_2} b_n x_n\Bigr\| = \Bigl\|\sum_{k=1}^\infty \frac{1}{2^k}\sum_{n\in B_k} x_n\Bigr\| \le \sum_{k=1}^\infty \frac{1}{2^k}\Bigl\|\sum_{n\in B_k} x_n\Bigr\| < \sum_{k=1}^\infty \epsilon/2^k = \epsilon.$$
Thus $\sum_{n=1}^\infty b_n x_n$ converges, and (v) implies (iv).
Corollary 16.1.1 Suppose that the series $\sum_{n=1}^\infty x_n$ is unconditionally convergent and that $\sigma$ is a permutation of $\mathbb{N}$. Let $s = \sum_{n=1}^\infty x_n$ and $s_\sigma = \sum_{n=1}^\infty x_{\sigma(n)}$. Then $s = s_\sigma$.

Proof Suppose that $\epsilon > 0$. There exists a finite set $F$ satisfying (v). Then if $N > \sup F$, $\bigl\|\sum_{n=1}^N x_n - \sum_{n\in F} x_n\bigr\| < \epsilon$, and so $\bigl\|s - \sum_{n\in F} x_n\bigr\| \le \epsilon$. Similarly, if $N > \sup\{\sigma^{-1}(n) : n \in F\}$, then $\bigl\|\sum_{n=1}^N x_{\sigma(n)} - \sum_{n\in F} x_n\bigr\| < \epsilon$, and so $\bigl\|s_\sigma - \sum_{n\in F} x_n\bigr\| \le \epsilon$. Thus $\|s - s_\sigma\| \le 2\epsilon$. Since this holds for all $\epsilon > 0$, $s = s_\sigma$.
Corollary 16.1.2 If the series $\sum_{n=1}^\infty x_n$ is unconditionally convergent and $\phi \in E^*$ then $\sum_{n=1}^\infty |\phi(x_n)| < \infty$.

Proof Let $b_n = \operatorname{sgn}(\phi(x_n))$. Then $\sum_{n=1}^\infty b_n x_n$ converges, and so therefore does $\sum_{n=1}^\infty \phi(b_n x_n) = \sum_{n=1}^\infty |\phi(x_n)|$.
We can measure the size of an unconditionally convergent series.

Proposition 16.1.2 Suppose that $(x_n)$ is an unconditionally convergent sequence in a Banach space $(E, \|\cdot\|_E)$. Then
$$M_1 = \sup\Bigl\{\Bigl\|\sum_{n=1}^\infty b_n x_n\Bigr\| : b = (b_n) \in l_\infty,\ \|b\|_\infty \le 1\Bigr\}$$
and
$$M_2 = \sup\Bigl\{\sum_{n=1}^\infty |\phi(x_n)| : \phi \in B_{E^*}\Bigr\}$$
are both finite, and equal.
Proof Consider the linear mapping $J : E^* \to l_1$ defined by $J(\phi) = (\phi(x_n))$. This has a closed graph, and is therefore continuous. Thus $M_2 = \|J\|$ is finite.

If $b \in l_\infty$ with $\|b\|_\infty \le 1$ then, for $\phi \in B_{E^*}$,
$$\Bigl|\phi\Bigl(\sum_{n=1}^\infty b_n x_n\Bigr)\Bigr| \le \sum_{n=1}^\infty |b_n|\,|\phi(x_n)| \le \sum_{n=1}^\infty |\phi(x_n)| \le M_2.$$
Thus
$$\Bigl\|\sum_{n=1}^\infty b_n x_n\Bigr\| = \sup\Bigl\{\Bigl|\phi\Bigl(\sum_{n=1}^\infty b_n x_n\Bigr)\Bigr| : \phi \in B_{E^*}\Bigr\} \le M_2,$$
and $M_1 \le M_2$. Conversely, suppose that $\phi \in B_{E^*}$. Let $b_n = \operatorname{sgn}(\phi(x_n))$. Then $\sum_{n=1}^\infty |\phi(x_n)| = \phi\bigl(\sum_{n=1}^\infty b_n x_n\bigr) \le M_1\|\phi\| \le M_1$, so that $M_2 \le M_1$.
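In finite dimensions the equality $M_1 = M_2$ can be checked by direct computation. A minimal sketch (real scalars in $\mathbb{R}^2$, with a hypothetical set of three sample vectors): $M_1$ is evaluated over sign sequences, the extreme points of the unit ball of $l_\infty$, and $M_2$ by maximizing $\sum_n |\phi(x_n)|$ over a fine grid of unit functionals $\phi = (\cos\theta, \sin\theta)$.

```python
import itertools
import math

xs = [(1.0, 0.0), (0.5, 0.5), (0.0, -1.0)]  # hypothetical sample vectors

# M1: sup over ||b||_inf <= 1 of ||sum_n b_n x_n||; for real scalars the
# supremum is attained at b_n = +-1, so enumerate sign patterns
M1 = max(
    math.hypot(sum(b * x[0] for b, x in zip(bs, xs)),
               sum(b * x[1] for b, x in zip(bs, xs)))
    for bs in itertools.product((-1, 1), repeat=len(xs))
)

# M2: sup over unit functionals phi of sum_n |phi(x_n)|; in R^2 the dual
# unit ball is the Euclidean circle, sampled on a fine grid of angles
M2 = max(
    sum(abs(math.cos(t) * x[0] + math.sin(t) * x[1]) for x in xs)
    for t in (2 * math.pi * k / 100000 for k in range(100000))
)

print(M1, M2)  # the two suprema agree (up to the grid resolution)
```

The agreement reflects the duality in the proof: maximizing over signs and then over functionals gives the same value in either order.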
16.2 Absolutely summing operators
We now linearize and generalize: we say that a linear mapping $T$ from a Banach space $(E, \|\cdot\|_E)$ to a Banach space $(F, \|\cdot\|_F)$ is absolutely summing if whenever $\sum_{n=1}^\infty x_n$ converges unconditionally in $E$ then $\sum_{n=1}^\infty T(x_n)$ converges absolutely in $F$. Thus every unconditionally convergent series in $E$ is absolutely convergent if and only if the identity mapping on $E$ is absolutely summing.

Theorem 16.2.1 A linear mapping $T$ from a Banach space $(E, \|\cdot\|_E)$ to a Banach space $(F, \|\cdot\|_F)$ is absolutely summing if and only if there exists a constant $K$ such that
$$\sum_{n=1}^N \|T(x_n)\|_F \le K\sup_{\phi\in B_{E^*}}\sum_{n=1}^N |\phi(x_n)|,$$
for all $N$ and all $x_1, \ldots, x_N$ in $E$.
Proof Suppose first that $K$ exists, and suppose that $\sum_{n=1}^\infty x_n$ is unconditionally convergent. Then
$$\sum_{n=1}^\infty \|T(x_n)\|_F = \sup_N\sum_{n=1}^N \|T(x_n)\|_F \le K\sup_N\sup_{\phi\in B_{E^*}}\sum_{n=1}^N |\phi(x_n)| = K\sup_{\phi\in B_{E^*}}\sum_{n=1}^\infty |\phi(x_n)| < \infty,$$
so that $T$ is absolutely summing.

Conversely, suppose that $K$ does not exist. Then we can find $0 = N_0 < N_1 < N_2 < \cdots$ and vectors $x_n$ in $E$ such that
$$\sup_{\phi\in B_{E^*}}\sum_{n=N_{k-1}+1}^{N_k} |\phi(x_n)| \le \frac{1}{2^k} \quad\text{and}\quad \sum_{n=N_{k-1}+1}^{N_k}\|T(x_n)\|_F \ge 1.$$
Then $\sup_{\phi\in B_{E^*}}\sum_{n=1}^\infty |\phi(x_n)| \le 1$, so that $\sum_{n=1}^\infty x_n$ is unconditionally convergent. Since $\sum_{n=1}^\infty \|T(x_n)\|_F = \infty$, $T$ is not absolutely summing.
16.3 (p, q)-summing operators
We now generalize again. Suppose that $1 \le q \le p < \infty$. We say that a linear mapping $T$ from a Banach space $(E, \|\cdot\|_E)$ to a Banach space $(F, \|\cdot\|_F)$ is $(p,q)$-summing if there exists a constant $K$ such that
$$\Bigl(\sum_{n=1}^N \|T(x_n)\|_F^p\Bigr)^{1/p} \le K\sup_{\phi\in B_{E^*}}\Bigl(\sum_{n=1}^N |\phi(x_n)|^q\Bigr)^{1/q} \qquad (*)$$
for all $N$ and all $x_1, \ldots, x_N$ in $E$. We denote the smallest such constant $K$ by $\pi_{p,q}(T)$, and denote the set of all $(p,q)$-summing mappings from $E$ to $F$ by $\Pi_{p,q}(E,F)$. We call a $(p,p)$-summing mapping a $p$-summing mapping, and write $\pi_p$ for $\pi_{p,p}$ and $\Pi_p$ for $\Pi_{p,p}$. Thus Theorem 16.2.1 states that the absolutely summing mappings are the same as the $1$-summing mappings. In fact we shall only be concerned with $p$-summing operators, for $1 < p < \infty$, and $(p,2)$-summing operators, for $2 \le p < \infty$.
We then have the following:
Theorem 16.3.1 Suppose that $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ are Banach spaces and that $1 \le q \le p < \infty$. Then $\Pi_{p,q}(E,F)$ is a linear subspace of $L(E,F)$, and $\pi_{p,q}$ is a norm on $\Pi_{p,q}(E,F)$, under which $\Pi_{p,q}(E,F)$ is a Banach space. If $T \in \Pi_{p,q}(E,F)$ then $\|T\| \le \pi_{p,q}(T)$, and if $R \in L(D,E)$ and $S \in L(F,G)$ then $STR \in \Pi_{p,q}(D,G)$ and $\pi_{p,q}(STR) \le \|S\|\,\pi_{p,q}(T)\,\|R\|$. If $(*)$ holds for all $x_1, \ldots, x_N$ in a dense subset of $E$ then $T \in \Pi_{p,q}(E,F)$, and $\pi_{p,q}(T)$ is the smallest constant $K$.
Proof We outline the steps that need to be taken, and leave the details to the reader. First, $\|T\| \le \pi_{p,q}(T)$: consider a sequence of length 1. Next, $\pi_{p,q}(\alpha T) = |\alpha|\pi_{p,q}(T)$ (trivial) and $\pi_{p,q}(S+T) \le \pi_{p,q}(S) + \pi_{p,q}(T)$ (use Minkowski's inequality on the left-hand side of $(*)$), so that $\Pi_{p,q}(E,F)$ is a linear subspace of $L(E,F)$, and $\pi_{p,q}$ is a norm on $\Pi_{p,q}(E,F)$. If $(T_n)$ is a $\pi_{p,q}$-Cauchy sequence, then it is a $\|\cdot\|$-Cauchy sequence, and so converges in the operator norm, to $T$, say. Then $T \in \Pi_{p,q}$ and $\pi_{p,q}(T_n - T) \to 0$ (using $(*)$), so that $\Pi_{p,q}(E,F)$ is a Banach space. The remaining results are even more straightforward.
Recall that if $1 \le r < s < \infty$ then $l_r \subseteq l_s$, and the inclusion is norm-decreasing. From this it follows that if $1 \le q_1 \le q_0 \le p_0 \le p_1 < \infty$ and $T \in \Pi_{p_0,q_0}(E,F)$ then $T \in \Pi_{p_1,q_1}(E,F)$ and $\pi_{p_1,q_1}(T) \le \pi_{p_0,q_0}(T)$. We can however say more.
Proposition 16.3.1 Suppose that $1 \le q_0 \le p_0 < \infty$, that $1 \le q_1 \le p_1 < \infty$ and that $1/p_0 - 1/p_1 = 1/q_0 - 1/q_1 > 0$. If $T \in \Pi_{p_0,q_0}(E,F)$ then $T \in \Pi_{p_1,q_1}(E,F)$ and $\pi_{p_1,q_1}(T) \le \pi_{p_0,q_0}(T)$.

In particular, if $1 \le p_0 < p_1$ and $T \in \Pi_{p_0}(E,F)$ then $T \in \Pi_{p_1}(E,F)$ and $\pi_{p_1}(T) \le \pi_{p_0}(T)$.
Proof Let $r = p_1/p_0$ and $s = q_1/q_0$. If $x_1, \ldots, x_N \in E$, then using Hölder's inequality with exponents $s'$ and $s$,
$$\Bigl(\sum_{n=1}^N \|T(x_n)\|^{p_1}\Bigr)^{1/p_0} = \Bigl(\sum_{n=1}^N \bigl\|T(\|T(x_n)\|^{r-1}x_n)\bigr\|^{p_0}\Bigr)^{1/p_0}$$
$$\le \pi_{p_0,q_0}(T)\sup_{\|\phi\|\le 1}\Bigl(\sum_{n=1}^N |\phi(\|T(x_n)\|^{r-1}x_n)|^{q_0}\Bigr)^{1/q_0}$$
$$= \pi_{p_0,q_0}(T)\sup_{\|\phi\|\le 1}\Bigl(\sum_{n=1}^N \|T(x_n)\|^{(r-1)q_0}|\phi(x_n)|^{q_0}\Bigr)^{1/q_0}$$
$$\le \pi_{p_0,q_0}(T)\Bigl(\sum_{n=1}^N \|T(x_n)\|^{(r-1)q_0s'}\Bigr)^{1/s'q_0}\sup_{\|\phi\|\le 1}\Bigl(\sum_{n=1}^N |\phi(x_n)|^{sq_0}\Bigr)^{1/sq_0}$$
$$= \pi_{p_0,q_0}(T)\Bigl(\sum_{n=1}^N \|T(x_n)\|^{p_1}\Bigr)^{1/p_0 - 1/p_1}\sup_{\|\phi\|\le 1}\Bigl(\sum_{n=1}^N |\phi(x_n)|^{q_1}\Bigr)^{1/q_1},$$
since $(r-1)q_0s' = p_1$ and $1/s'q_0 = 1/p_0 - 1/p_1$. Dividing, we obtain the desired result.
The following easy proposition provides a useful characterization of $(p,q)$-summing operators.

Proposition 16.3.2 Suppose that $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ are Banach spaces, that $T \in L(E,F)$, that $1 \le q \le p < \infty$ and that $K > 0$. Then $T \in \Pi_{p,q}$ and $\pi_{p,q}(T) \le K$ if and only if for each $N$ and each $S \in L(l_{q'}^N, E)$
$$\Bigl(\sum_{n=1}^N \|TS(e_n)\|^p\Bigr)^{1/p} \le K\|S\|.$$
Proof Suppose first that $T \in \Pi_{p,q}$ and $S \in L(l_{q'}^N, E)$. Let $x_n = S(e_n)$. If $\phi \in B_{E^*}$ then
$$\sum_n |\phi(x_n)|^q = \sum_n |(S^*\phi)(e_n)|^q = \|S^*(\phi)\|_q^q \le \|S^*\|^q = \|S\|^q,$$
so that
$$\Bigl(\sum_{n=1}^N \|TS(e_n)\|^p\Bigr)^{1/p} = \Bigl(\sum_{n=1}^N \|T(x_n)\|^p\Bigr)^{1/p} \le \pi_{p,q}(T)\|S\| \le K\|S\|.$$

Conversely, suppose that the condition is satisfied. If $x_1, \ldots, x_N \in E$, define $S : l_{q'}^N \to E$ by setting $S(\alpha_1, \ldots, \alpha_N) = \alpha_1 x_1 + \cdots + \alpha_N x_N$. Then
$$\|S\| = \|S^*\| = \sup_{\phi\in B_{E^*}}\Bigl(\sum_{n=1}^N |S^*(\phi)(e_n)|^q\Bigr)^{1/q} = \sup_{\phi\in B_{E^*}}\Bigl(\sum_{n=1}^N |\phi(x_n)|^q\Bigr)^{1/q},$$
so that
$$\Bigl(\sum_{n=1}^N \|T(x_n)\|^p\Bigr)^{1/p} \le K\sup_{\phi\in B_{E^*}}\Bigl(\sum_{n=1}^N |\phi(x_n)|^q\Bigr)^{1/q}.$$
Corollary 16.3.1 Suppose that $1 \le q \le p_1 \le p_2$ and that $T \in \Pi_{p_1,q}$. Then
$$\pi_{p_2,q}(T) \le \|T\|^{1-p_1/p_2}(\pi_{p_1,q}(T))^{p_1/p_2}.$$
Proof For
$$\Bigl(\sum_{n=1}^N \|TS(e_n)\|^{p_2}\Bigr)^{1/p_2} \le \Bigl(\sup_{1\le n\le N}\|TS(e_n)\|\Bigr)^{1-p_1/p_2}\Bigl(\sum_{n=1}^N \|TS(e_n)\|^{p_1}\Bigr)^{1/p_2}$$
$$\le (\|T\|\,\|S\|)^{1-p_1/p_2}(\pi_{p_1,q}(T))^{p_1/p_2}\|S\|^{p_1/p_2} = \|T\|^{1-p_1/p_2}(\pi_{p_1,q}(T))^{p_1/p_2}\|S\|.$$
16.4 Examples of p-summing operators
One of the reasons why $p$-summing operators are important is that they occur naturally in various situations. Let us give some examples. First, let us introduce some notation that we shall use from now on. Suppose that $K$ is a compact Hausdorff space and that $\mu$ is a probability measure on the Baire subsets of $K$. We denote the natural mapping from $C(K)$ to $L_p(\mu)$, sending $f$ to its equivalence class in $L_p$, by $j_p$.

Proposition 16.4.1 Suppose that $K$ is a compact Hausdorff space and that $\mu$ is a probability measure on the Baire subsets of $K$. If $1 \le p < \infty$ then $j_p$ is $p$-summing, and $\pi_p(j_p) = 1$.
Proof Suppose that $f_1, \ldots, f_N \in C(K)$. If $x \in K$, the mapping $f \mapsto f(x)$ is a continuous linear functional of norm 1 on $C(K)$, and so
$$\sum_{n=1}^N \|j_p(f_n)\|_p^p = \sum_{n=1}^N \int_K |f_n(x)|^p\,d\mu(x) = \int_K \sum_{n=1}^N |f_n(x)|^p\,d\mu(x)$$
$$\le \sup\Bigl\{\sum_{n=1}^N |\phi(f_n)|^p : \phi \in C(K)^*,\ \|\phi\| \le 1\Bigr\}.$$
Thus $j_p$ is $p$-summing, and $\pi_p(j_p) \le 1$. But also $\pi_p(j_p) \ge \|j_p\| = 1$.
Proposition 16.4.2 Suppose that $(\Omega, \Sigma, \mu)$ is a measure space, that $1 \le p < \infty$ and that $f \in L_p(\Omega, \Sigma, \mu)$. Let $M_f(g) = fg$, for $g \in L_\infty$. Then $M_f \in \Pi_p(L_\infty, L_p)$ and $\pi_p(M_f) = \|M_f\| = \|f\|_p$.
Proof We use Proposition 16.3.2. Suppose first that $p > 1$. Suppose that $S \in L(l_{p'}^N, L_\infty)$. Let $g_n = S(e_n)$. If $\alpha_1, \ldots, \alpha_N$ are rational and $\|(\alpha_1, \ldots, \alpha_N)\|_{p'} \le 1$ then $|\sum_{n=1}^N \alpha_n g_n(\omega)| \le \|S\|$, for almost all $\omega$. Taking the supremum over the countable collection of all such $(\alpha_1, \ldots, \alpha_N)$, we see that $\|(g_1(\omega), \ldots, g_N(\omega))\|_p \le \|S\|$, for almost all $\omega$. Then
$$\sum_{n=1}^N \|M_fS(e_n)\|_p^p = \sum_{n=1}^N \|fg_n\|_p^p = \sum_{n=1}^N \int |fg_n|^p\,d\mu = \int |f|^p\Bigl(\sum_{n=1}^N |g_n|^p\Bigr)d\mu \le \|S\|^p\|f\|_p^p.$$
Thus it follows from Proposition 16.3.2 that $M_f$ is $p$-summing, and $\pi_p(M_f) \le \|f\|_p$. But $\pi_p(M_f) \ge \|M_f\| = \|f\|_p$.

If $p = 1$ and $S \in L(l_\infty^N, L_\infty)$ then for each $\omega$
$$\sum_{n=1}^N |S(e_n)(\omega)| = S\Bigl(\sum_{n=1}^N \epsilon_n e_n\Bigr)(\omega)$$
for some $\epsilon = (\epsilon_n)$ with $\|\epsilon\|_\infty = 1$. Thus $\bigl\|\sum_{n=1}^N |S(e_n)|\bigr\|_\infty \le \|S\|$, and so
$$\sum_{n=1}^N \|M_fS(e_n)\|_1 \le \Bigl\|\sum_{n=1}^N |S(e_n)|\Bigr\|_\infty \|f\|_1 \le \|S\|\,\|f\|_1.$$
Proposition 16.4.3 Suppose that $(\Omega, \Sigma, \mu)$ is a measure space, and that $\Phi \in L_p(E^*)$, where $E$ is a Banach space and $1 \le p < \infty$. Then the mapping $I_\Phi : x \mapsto \Phi(\cdot)(x)$ from $E$ to $L_p(\Omega, \Sigma, \mu)$ is $p$-summing, and $\pi_p(I_\Phi) \le \|\Phi\|_p$.
Proof Suppose that $x_1, \ldots, x_N \in E$. Let $A = \{\omega : \Phi(\omega) \ne 0\}$. Then
$$\sum_{n=1}^N \|I_\Phi(x_n)\|_p^p = \int_A \sum_{n=1}^N |\Phi(\omega)(x_n)|^p\,d\mu(\omega)$$
$$= \int_A \sum_{n=1}^N \bigl|(\Phi(\omega)/\|\Phi(\omega)\|)(x_n)\bigr|^p\,\|\Phi(\omega)\|^p\,d\mu(\omega)$$
$$\le \Bigl(\sup_{\|\phi\|\le 1}\sum_{n=1}^N |\phi(x_n)|^p\Bigr)\int_A \|\Phi(\omega)\|^p\,d\mu(\omega).$$
We wish to apply this when $E$ is an $L_q$ space. Suppose that $K$ is a measurable function on $(\Omega_1, \Sigma_1, \mu_1) \times (\Omega_2, \Sigma_2, \mu_2)$ for which
$$\int_{\Omega_1}\Bigl(\int_{\Omega_2}|K(x,y)|^{q'}\,d\mu_2(y)\Bigr)^{p/q'}d\mu_1(x) < \infty,$$
where $1 \le p < \infty$ and $1 < q \le \infty$. We can consider $K$ as an element of $L_p(L_{q'}) = L_p((L_q)')$; then $I_K$ is the integral operator
$$I_K(f)(x) = \int_{\Omega_2} K(x,y)f(y)\,d\mu_2(y).$$
The proposition then states that $I_K$ is $p$-summing from $L_q(\Omega_2, \Sigma_2, \mu_2)$ to $L_p(\Omega_1, \Sigma_1, \mu_1)$, and
$$\pi_p(I_K) \le \Bigl(\int_{\Omega_1}\Bigl(\int_{\Omega_2}|K(x,y)|^{q'}\,d\mu_2(y)\Bigr)^{p/q'}d\mu_1(x)\Bigr)^{1/p}.$$
16.5 (p, 2)-summing operators between Hilbert spaces
How do these ideas work when we consider linear operators between Hilbert
spaces? Do they relate to the ideas of the previous chapter?
Proposition 16.5.1 Suppose that $H_1$ and $H_2$ are Hilbert spaces and that $2 \le p < \infty$. Then $\Pi_{p,2}(H_1, H_2) = S_p(H_1, H_2)$, and if $T \in S_p(H_1, H_2)$ then $\pi_{p,2}(T) = \|T\|_p$.
Proof Suppose that $T \in \Pi_{p,2}(H_1, H_2)$. If $(e_n)$ is an orthonormal sequence in $H_1$ and $y \in H_1$, then $\sum_{n=1}^N |\langle e_n, y\rangle|^2 \le \|y\|^2$, and so $\sum_{n=1}^N \|T(e_n)\|^p \le (\pi_{p,2}(T))^p$. Consequently, $\sum_{n=1}^\infty \|T(e_n)\|^p \le (\pi_{p,2}(T))^p$, and in particular $\|T(e_n)\| \to 0$ as $n \to \infty$. Thus $T$ is compact (Exercise 15.7). Suppose that $T = \sum_{n=1}^\infty s_n(T)\langle\cdot, x_n\rangle y_n$. Then
$$\sum_{j=1}^\infty (s_j(T))^p = \sum_{j=1}^\infty \|T(x_j)\|^p \le (\pi_{p,2}(T))^p,$$
so that $T \in S_p(H_1, H_2)$, and $\|T\|_p \le \pi_{p,2}(T)$.

Conversely, if $T \in S_p(H_1, H_2)$ and $S \in L(l_2^N, H_1)$, then $\bigl(\sum_{n=1}^N \|TS(e_n)\|^p\bigr)^{1/p} \le \|TS\|_p \le \|S\|\,\|T\|_p$, by Proposition 15.11.1 (ii). By Proposition 16.3.2, $T \in \Pi_{p,2}(H_1, H_2)$ and $\pi_{p,2}(T) \le \|T\|_p$.
In particular, $\Pi_2(H_1, H_2) = S_2(H_1, H_2)$. Let us interpret this when $H_1$ and $H_2$ are $L_2$ spaces.
Theorem 16.5.1 Suppose that $H_1 = L_2(\Omega_1, \Sigma_1, \mu_1)$ and $H_2 = L_2(\Omega_2, \Sigma_2, \mu_2)$, and that $T \in L(H_2, H_1)$. Then $T \in S_2(H_2, H_1)$ if and only if there exists $K \in L_2(\Omega_1 \times \Omega_2)$ such that $T = I_K$. If so, and if $T = \sum_{j=1}^\infty s_j\langle\cdot, g_j\rangle f_j$, then
$$K(x,y) = \sum_{j=1}^\infty s_jf_j(x)g_j(y),$$
the sum converging in norm in $L_2(\Omega_1 \times \Omega_2)$, and $\|K\|_2 = \|T\|_2$.
Proof If $T = I_K$, then $T \in \Pi_2(H_2, H_1)$, by Proposition 16.4.3, and $\|T\|_2 = \|K\|_2$. Conversely, suppose that $T = \sum_{j=1}^\infty s_j\langle\cdot, g_j\rangle f_j \in S_2(H_2, H_1)$. Let $h_j(x,y) = f_j(x)g_j(y)$. Then $(h_j)$ is an orthonormal sequence in $L_2(\Omega_1 \times \Omega_2)$, and so the sum $\sum_{j=1}^\infty s_jh_j$ converges in $L_2$ norm, to $K$, say. Let $K_n = \sum_{j=1}^n s_jh_j$. If $f \in L_2(\Omega_2)$ then
$$T(f) = \lim_{n\to\infty}\sum_{j=1}^n s_j\langle f, g_j\rangle f_j = \lim_{n\to\infty} I_{K_n}(f) = I_K(f),$$
since
$$\|I_K(f) - I_{K_n}(f)\|_2 \le \|I_{K-K_n}\|\,\|f\|_2 \le \|I_{K-K_n}\|_2\,\|f\|_2,$$
and $\|I_{K-K_n}\|_2 \to 0$ as $n \to \infty$.
16.6 Positive operators on $L_1$
The identification of 2-summing mappings with Hilbert–Schmidt mappings, together with the results of the previous section, leads to some strong conclusions.

Let us introduce some more notation that we shall use from now on. Suppose that $(\Omega, \Sigma, P)$ is a probability space. Then if $1 \le p < q \le \infty$ we denote the inclusion mapping $L_q \subseteq L_p$ by $I_{q,p}$.
Theorem 16.6.1 Suppose that $(\Omega, \Sigma, P)$ is a probability space. Suppose that $T \in L(L_1, L_\infty)$ and that $\int T(f)\bar{f}\,dP \ge 0$ for $f \in L_1$. Let $T_1 = I_{\infty,1}T$. Then $T_1$ is a Riesz operator on $L_1$, every non-zero eigenvalue $\lambda_j$ is positive, the corresponding generalized eigenvectors are eigenvectors, and $\sum_{j=1}^\infty \lambda_j \le \|T\|$. The corresponding eigenvectors $f_j$ are in $L_\infty$ and can be chosen to be orthonormal in $L_2$. The series
$$\sum_{j=1}^\infty \lambda_jf_j(x)\overline{f_j(y)}$$
then converges in $L_2(\Omega \times \Omega)$ norm to a function $K \in L_\infty(\Omega \times \Omega)$, and if $f \in L_1$ then $T(f)(x) = \int_\Omega K(x,y)f(y)\,dP(y)$.
Proof Let $T_2 = I_{\infty,2}TI_{2,1} : L_2 \to L_2$. Then $T_2$ is a positive Hermitian operator on $L_2$. Since, by Proposition 16.4.1, $I_{\infty,2}$ is 2-summing, with $\pi_2(I_{\infty,2}) = 1$, $T_2$ is also a 2-summing operator, with $\pi_2(T_2) \le \|T\|$. Thus $T_2$ is a positive Hilbert–Schmidt operator, and we can write $T_2 = \sum_{j=1}^\infty \lambda_j\langle\cdot, f_j\rangle f_j$, where $(\lambda_j) = (\lambda_j(T_2))$ is a decreasing sequence of non-negative numbers in $l_2$. Now $T_1^2 = I_{2,1}T_2I_{\infty,2}T$, so that $T_1^2$ is compact, and $T_1$ is a Riesz operator. Since $T_1 = I_{2,1}I_{\infty,2}T$, the operators $T_1$ and $T_2$ are related, and $(\lambda_j)$ is the sequence of eigenvalues of $T_1$, repeated according to their multiplicity, and each principal vector is in fact an eigenvector. Since $\lambda_jf_j = T_2(f_j) = I_{\infty,2}TI_{2,1}(f_j)$, $f_j \in L_\infty$.

Now let $S = \sum_{j=1}^\infty \sqrt{\lambda_j}\langle\cdot, f_j\rangle f_j$, so that $S^2 = T_2$. If $f \in L_2$ then
$$\|S(f)\|_2^2 = \langle S(f), S(f)\rangle = \langle T_2(f), f\rangle = \int_\Omega T(f)\bar{f}\,dP \le \|T(f)\|_\infty\|f\|_1 \le \|T\|\,\|f\|_1^2.$$
Thus $S$ extends to a bounded linear mapping $S_1 : L_1 \to L_2$ with $\|S_1\| \le \|T\|^{1/2}$. Then $S_1^* \in L(L_2, L_\infty)$, with $\|S_1^*\| \le \|T\|^{1/2}$. Since $S$ is self-adjoint, $S = I_{\infty,2}S_1^*$, and so $S$ is 2-summing, by Proposition 16.4.1, with $\pi_2(S) \le \|T\|^{1/2}$. But $\pi_2(S) = \bigl(\sum_{j=1}^\infty(\sqrt{\lambda_j})^2\bigr)^{1/2} = \bigl(\sum_{j=1}^\infty \lambda_j\bigr)^{1/2}$, and so $\sum_{j=1}^\infty \lambda_j \le \|T\|$. Thus $T_2$ is a trace class operator.

Now let $W_n = \sum_{j=1}^n \lambda_j(T_2)\langle\cdot, f_j\rangle f_j$ and let $K_n(x,y) = \sum_{j=1}^n \lambda_j(T_2)f_j(x)\overline{f_j(y)}$. Then
$$\langle W_n(f), f\rangle = \sum_{j=1}^n \lambda_j(T_2)|\langle f, f_j\rangle|^2 \le \sum_{j=1}^\infty \lambda_j(T_2)|\langle f, f_j\rangle|^2 = \langle T(f), f\rangle,$$
and $|\langle W_n(f), g\rangle|^2 \le \langle W_n(f), f\rangle\langle W_n(g), g\rangle$, so that
$$\Bigl|\int_{A\times B}K_n(x,y)\,dP(x)\,dP(y)\Bigr|^2 = |\langle W_n(I_A), I_B\rangle|^2 \le \langle W_n(I_A), I_A\rangle\langle W_n(I_B), I_B\rangle$$
$$\le \langle T(I_A), I_A\rangle\langle T(I_B), I_B\rangle \le \|T\|^2(P(A))^2(P(B))^2,$$
so that $|K_n(x,y)| \le \|T\|$ almost everywhere. Since $K_n \to K$ in $L_2(\Omega\times\Omega)$, it follows that $|K(x,y)| \le \|T\|$ almost everywhere. Thus $I_K$ defines an element $T_K$ of $L(L_1, L_\infty)$. But $I_K = T_2$ on $L_2$, and $L_2$ is dense in $L_1$, and so $T = T_K$.
16.7 Mercer's theorem

Theorem 16.6.1 involved a bounded kernel $K$. If we consider a continuous positive-definite kernel on $X \times X$, where $(X, \tau)$ is a compact Hausdorff space, we obtain even stronger results.
Theorem 16.7.1 (Mercer's theorem) Suppose that $P$ is a probability measure on the Baire sets of a compact Hausdorff space $(X, \tau)$, with the property that if $U$ is a non-empty open Baire set then $P(U) > 0$, and that $K$ is a continuous function on $X \times X$ such that
$$\int_{X\times X} K(x,y)f(x)\overline{f(y)}\,dP(x)\,dP(y) \ge 0 \quad\text{for } f \in L_1(P).$$
Then $T = I_K$ satisfies the conditions and conclusions of Theorem 16.6.1. With the notation of Theorem 16.6.1, the eigenvectors $f_j$ are continuous, and the series $\sum_{j=1}^\infty \lambda_jf_j(x)\overline{f_j(y)}$ converges absolutely to $K(x,y)$, uniformly in $x$ and $y$. $T$ is a compact operator from $L_1(P)$ to $C(X)$, and
$$\sum_{j=1}^\infty \lambda_j = \int_X K(x,x)\,dP(x).$$
Proof If $x \in X$ and $\epsilon > 0$ then there exists a neighbourhood $U$ of $x$ such that $|K(x', y) - K(x, y)| < \epsilon$ for $x' \in U$ and all $y \in X$. Then $|T(f)(x') - T(f)(x)| \le \epsilon\|f\|_1$ for $x' \in U$, and so $T$ is a bounded linear mapping from $L_1(P)$ into $C(X)$, which we can identify with a closed linear subspace of $L_\infty(P)$. Then $T$ satisfies the conditions of Theorem 16.6.1. If $\lambda_j$ is a non-zero eigenvalue, then $T(f_j) = \lambda_jf_j \in C(X)$, and so $f_j$ is continuous.

Now let $W_n = \sum_{j=1}^n \lambda_j\langle\cdot, f_j\rangle f_j$; let $R_n = T - W_n$ and $L_n = K - K_n$, so that $R_n = I_{L_n} = \sum_{j=n+1}^\infty \lambda_j\langle\cdot, f_j\rangle f_j$. Thus $L_n(x,y) = \sum_{j=n+1}^\infty \lambda_jf_j(x)\overline{f_j(y)}$, the sum converging in norm in $L_2(P \times P)$. Consequently, $L_n(x,y) = \overline{L_n(y,x)}$, almost everywhere. But $L_n$ is continuous, and so $L_n(x,y) = \overline{L_n(y,x)}$ for all $(x,y)$. In particular, $L_n(x,x)$ is real, for all $x$. If $x_0 \in X$ and $U$ is an open Baire neighbourhood of $x_0$ then
$$\int_{U\times U} L_n(x,y)\,dP(x)\,dP(y) = \langle R_n(I_U), I_U\rangle = \sum_{j=n+1}^\infty \lambda_j\Bigl|\int_U f_j\,dP\Bigr|^2 \ge 0,$$
and so it follows from the continuity of $L_n$ that $L_n(x_0, x_0) \ge 0$, for all $x_0 \in X$. Thus
$$K_n(x,x) = \sum_{j=1}^n \lambda_j|f_j(x)|^2 \le K(x,x) \quad\text{for all } x \in X,$$
and so $\sum_{j=1}^\infty \lambda_j|f_j(x)|^2$ converges to a sum $Q(x)$, say, with $Q(x) \le K(x,x)$, for all $x \in X$.

Suppose now that $x \in X$ and that $\epsilon > 0$. There exists $n_0$ such that $\sum_{j=n+1}^m \lambda_j|f_j(x)|^2 < \epsilon^2$, for $m > n \ge n_0$. But if $y \in X$ then
$$\sum_{j=n+1}^m \lambda_j|f_j(x)f_j(y)| \le \Bigl(\sum_{j=n+1}^m \lambda_j|f_j(x)|^2\Bigr)^{1/2}\Bigl(\sum_{j=n+1}^m \lambda_j|f_j(y)|^2\Bigr)^{1/2} \le \epsilon(K(y,y))^{1/2} \le \epsilon\|K\|_\infty^{1/2} \qquad (\dagger)$$
by the Cauchy–Schwarz inequality, so that $\sum_{j=1}^\infty \lambda_jf_j(x)\overline{f_j(y)}$ converges absolutely, uniformly in $y$, to $B(x,y)$, say. Similarly, for fixed $y$, the series converges absolutely, uniformly in $x$. Thus $B(x,y)$ is a separately continuous function on $X \times X$. We want to show that $B = K$. Let $D = K - B$. Since $\sum_{j=1}^\infty \lambda_jf_j(x)\overline{f_j(y)}$ converges to $K$ in norm in $L_2(P \times P)$, it follows that $D = 0$ $P\times P$-almost everywhere. Let $G = \{x : D(x,y) = 0 \text{ for all } y\}$. For almost all $x$, $D(x,y) = 0$ for almost all $y$. But $D(x,y)$ is a continuous function of $y$, and so $x \in G$ for almost all $x$. Suppose that $D(x,y) \ne 0$. Then there exists a Baire open neighbourhood $U$ of $x$ such that $D(z,y) \ne 0$ for $z \in U$. Thus $U \cap G = \emptyset$. But this implies that $P(U) = 0$, giving a contradiction. Thus $B = K$.

In particular, $Q(x) = K(x,x)$ for all $x$, and $\sum_{j=1}^\infty \lambda_j|f_j(x)|^2 = K(x,x)$. Since the summands are positive and continuous and $K$ is continuous, it follows from Dini's theorem (see Exercise 16.3) that the convergence is uniform in $x$. Using the inequality $(\dagger)$ again, it follows that $\sum_{j=1}^\infty \lambda_jf_j(x)\overline{f_j(y)}$ converges absolutely to $K(x,y)$, uniformly in $(x,y)$. Thus $I_{K_n} \to I_K = T$ in operator norm. Since $I_{K_n}$ is a finite-rank operator, $T$ is compact. Finally,
$$\sum_{j=1}^\infty \lambda_j = \sum_{j=1}^\infty \lambda_j\int_X |f_j|^2\,dP = \int_X \sum_{j=1}^\infty \lambda_j|f_j|^2\,dP = \int_X K(x,x)\,dP(x).$$
It is not possible to replace the condition that $K$ is continuous by the condition that $T \in L(L_1, C(K))$ (see Exercise 16.4).
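Mercer's expansion can be checked numerically for a concrete kernel. A minimal sketch, assuming the classical eigendecomposition of the kernel $K(x,y) = \min(x,y)$ on $[0,1]$ with Lebesgue measure: eigenvalues $\lambda_j = 1/((j-\tfrac12)^2\pi^2)$ and eigenfunctions $f_j(x) = \sqrt{2}\sin((j-\tfrac12)\pi x)$. The partial sums of $\sum_j \lambda_jf_j(x)f_j(y)$ approach $\min(x,y)$, and $\sum_j \lambda_j = \int_0^1 K(x,x)\,dx = 1/2$.

```python
import math

# Kernel K(x, y) = min(x, y) on [0, 1] with Lebesgue measure.
# Its Mercer decomposition (a classical computation, stated here as an
# assumption): lambda_j = 1/((j - 1/2)^2 pi^2),
# f_j(x) = sqrt(2) sin((j - 1/2) pi x).

def lam(j):
    return 1.0 / (((j - 0.5) * math.pi) ** 2)

def f(j, x):
    return math.sqrt(2.0) * math.sin((j - 0.5) * math.pi * x)

def mercer_sum(x, y, terms=2000):
    # partial sum of the Mercer series for K(x, y)
    return sum(lam(j) * f(j, x) * f(j, y) for j in range(1, terms + 1))

trace = sum(lam(j) for j in range(1, 2001))
print(mercer_sum(0.3, 0.7))  # should be close to min(0.3, 0.7) = 0.3
print(trace)                 # should be close to 1/2
```

The tail of the trace sum is of order $1/(\pi^2 n)$, so a few thousand terms already give three or four correct digits.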
16.8 p-summing operators between Hilbert spaces ($1 \le p \le 2$)
We know that the 2-summing operators between Hilbert spaces are simply the Hilbert–Schmidt operators, and the $\pi_2$ norm is the same as the Hilbert–Schmidt norm. What about $p$-summing operators between Hilbert spaces, for other values of $p$? Here the results are rather surprising. First we establish a result of interest in its own right, and a precursor of stronger results yet to come.
Proposition 16.8.1 The inclusion mapping $i_{1,2} : l_1 \to l_2$ is 1-summing, and $\pi_1(i_{1,2}) = \sqrt{2}$.
Proof The proof uses the Kahane–Khintchine inequality for complex numbers. Suppose that $x^{(1)}, \ldots, x^{(N)} \in l_1$. Suppose that $K \in \mathbb{N}$, and let $\epsilon_1, \ldots, \epsilon_K$ be Bernoulli random variables on $D_2^K$. Then, by Theorem 13.3.1,
$$\sum_{n=1}^N\Bigl(\sum_{k=1}^K |x_k^{(n)}|^2\Bigr)^{1/2} \le \sqrt{2}\sum_{n=1}^N\mathbb{E}\Bigl(\Bigl|\sum_{k=1}^K \epsilon_k(\omega)x_k^{(n)}\Bigr|\Bigr) = \sqrt{2}\,\mathbb{E}\Bigl(\sum_{n=1}^N\Bigl|\sum_{k=1}^K \epsilon_k(\omega)x_k^{(n)}\Bigr|\Bigr)$$
$$\le \sqrt{2}\sup\Bigl\{\sum_{n=1}^N\Bigl|\sum_{k=1}^K \beta_kx_k^{(n)}\Bigr| : |\beta_k| \le 1 \text{ for all } k\Bigr\}.$$
Thus
$$\sum_{n=1}^N \bigl\|x^{(n)}\bigr\|_2 \le \sqrt{2}\sup\Bigl\{\sum_{n=1}^N |\phi(x^{(n)})| : \phi \in (l_1)^* = l_\infty,\ \|\phi\|_\infty \le 1\Bigr\},$$
so that $i_{1,2}$ is 1-summing, and $\pi_1(i_{1,2}) \le \sqrt{2}$. To show that $\sqrt{2}$ is the best possible constant, consider $x^{(1)} = (1/2, 1/2, 0, 0, \ldots)$, $x^{(2)} = (1/2, -1/2, 0, 0, \ldots)$.
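The extremal pair at the end of the proof can be checked by direct computation: for $x^{(1)} = (1/2, 1/2)$ and $x^{(2)} = (1/2, -1/2)$ the left-hand side of the 1-summing inequality is $\sqrt{2}$ while the supremum on the right is 1. A minimal sketch (real scalars and two coordinates, so the supremum over the unit ball of $l_\infty$ is attained at sign vectors):

```python
import itertools
import math

x1 = (0.5, 0.5)
x2 = (0.5, -0.5)

# left-hand side: sum of the l2 norms of the two vectors
lhs = sum(math.hypot(*x) for x in (x1, x2))

# right-hand side supremum: over phi in the unit ball of l_infty of
# sum_n |phi(x_n)|; for real scalars it is attained at phi with
# entries +-1, so enumerate the four sign patterns
rhs = max(
    abs(s1 * x1[0] + s2 * x1[1]) + abs(s1 * x2[0] + s2 * x2[1])
    for s1, s2 in itertools.product((-1, 1), repeat=2)
)

print(lhs / rhs)  # = sqrt(2), matching pi_1(i_{1,2})
```

Every sign pattern gives the value 1 on the right, while the left-hand side is $2\cdot\frac{1}{\sqrt 2} = \sqrt 2$, so no constant smaller than $\sqrt 2$ can work.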
Theorem 16.8.1 If $T = \sum_{j=1}^\infty s_j(T)\langle\cdot, x_j\rangle y_j \in S_2(H_1, H_2)$ then $T \in \Pi_1(H_1, H_2)$ and $\pi_1(T) \le \sqrt{2}\,\|T\|_2$.
Proof If $x \in H_1$, let $S(x) = (s_j(T)\langle x, x_j\rangle)$. Applying the Cauchy–Schwarz inequality,
$$\sum_{j=1}^\infty |S(x)_j| \le \Bigl(\sum_{j=1}^\infty (s_j(T))^2\Bigr)^{1/2}\Bigl(\sum_{j=1}^\infty |\langle x, x_j\rangle|^2\Bigr)^{1/2} \le \|T\|_2\|x\|,$$
so that $S \in L(H_1, l_1)$ and $\|S\| \le \|T\|_2$. If $\alpha \in l_2$ let $R(\alpha) = \sum_{j=1}^\infty \alpha_jy_j$. Clearly $R \in L(l_2, H_2)$ and $\|R\| = 1$. Since $T = Ri_{1,2}S$, the result follows from Proposition 16.8.1.
Corollary 16.8.1 $S_2(H_1, H_2) = \Pi_p(H_1, H_2)$, for $1 \le p \le 2$.

We shall consider the case $2 < p < \infty$ later, after we have developed the general theory further.
16.9 Pietsch's domination theorem

We now establish a fundamental theorem, whose proof uses the Hahn–Banach separation theorem in a beautiful way. First we make two remarks. If $(E, \|\cdot\|_E)$ is a Banach space, there is an isometric embedding $i$ of $E$ into $C(K)$, for some compact Hausdorff space $K$: for example, we can take $K$ to be the unit ball of $E^*$, with the weak* topology, and let $i(x)(\phi) = \phi(x)$. Second, the Riesz representation theorem states that if $\psi$ is a continuous linear functional on $C(K)$ then there exists a probability measure $\mu$ in $P(K)$, the set of probability measures on the Baire subsets of $K$, and a measurable function $h$ with $|h(k)| = \|\psi\|$ for all $k \in K$ such that $\psi(f) = \int_K fh\,d\mu$ for all $f \in C(K)$. We write $\psi = h\,d\mu$.
Theorem 16.9.1 (Pietsch's domination theorem) Suppose that $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ are Banach spaces and that $T \in L(E, F)$. Suppose that $i : E \to C(K)$ is an isometric embedding, and that $1 \le p < \infty$. Then $T \in \Pi_p(E, F)$ if and only if there exist $\mu \in P(K)$ and a constant $M$ such that $\|T(x)\| \le M\bigl(\int |i(x)|^p\,d\mu\bigr)^{1/p}$ for each $x \in E$. If so, then $M \ge \pi_p(T)$, and we can choose $\mu$ so that $M = \pi_p(T)$.
Proof If such $\mu$ and $M$ exist, and $x_1, \ldots, x_N \in E$ then, since for each $k \in K$ the mapping $x \mapsto i(x)(k)$ is a continuous linear functional of norm at most 1 on $E$,
$$\sum_{n=1}^N \|T(x_n)\|_F^p \le M^p\int_K \sum_{n=1}^N |i(x_n)(k)|^p\,d\mu(k) \le M^p\sup\Bigl\{\sum_{n=1}^N |\phi(x_n)|^p : \phi \in E^*,\ \|\phi\| \le 1\Bigr\},$$
and so $T \in \Pi_p(E, F)$ and $\pi_p(T) \le M$.

Conversely, suppose that $T \in \Pi_p(E, F)$; by scaling, we can suppose that $\pi_p(T) = 1$. For $S = (x_1, \ldots, x_N)$ a finite sequence in $E$ and $k \in K$, set
$$g_S(k) = \sum_{n=1}^N |i(x_n)(k)|^p \quad\text{and}\quad l_S(k) = \sum_{n=1}^N \|T(x_n)\|_F^p - g_S(k).$$
Then $g_S \in C_R(K)$. Since $K$ is compact, $g_S$ attains its supremum $G_S$ at a point $k_S$ of $K$. Now if $\phi \in E^*$ then by the Hahn–Banach extension theorem there exists $h\,d\mu \in C_R(K)^*$ with $\|h\,d\mu\| = \|\phi\|$ such that $\phi(x) = \int_K i(x)h\,d\mu$, and so
$$\sum_{n=1}^N \|T(x_n)\|_F^p \le \sup\Bigl\{\sum_{n=1}^N |\phi(x_n)|^p : \phi \in E^*,\ \|\phi\| \le 1\Bigr\}$$
$$= \sup\Bigl\{\sum_{n=1}^N\Bigl|\int i(x_n)h\,d\mu\Bigr|^p : h\,d\mu \in C(K)^*,\ \|h\,d\mu\| \le 1\Bigr\}$$
$$\le \sup\Bigl\{\sum_{n=1}^N\int |i(x_n)|^p\,d\mu : \mu \in P(K)\Bigr\} = \sup\Bigl\{\int\sum_{n=1}^N |i(x_n)|^p\,d\mu : \mu \in P(K)\Bigr\} \le G_S.$$
Thus $l_S(k_S) \le 0$. Now let
$$L = \{l_S : S = (x_1, \ldots, x_N) \text{ a finite sequence in } E\},$$
and let
$$U = \{f \in C_R(K) : f(k) > 0 \text{ for all } k \in K\}.$$
Then $L$ and $U$ are disjoint, and $U$ is convex and open. $L$ is also convex: for if $S = (x_1, \ldots, x_N)$ and $S' = (x'_1, \ldots, x'_{N'})$ are finite sequences in $E$ and $0 < \theta < 1$ then $(1-\theta)l_S + \theta l_{S'} = l_{S''}$, where
$$S'' = ((1-\theta)^{1/p}x_1, \ldots, (1-\theta)^{1/p}x_N,\ \theta^{1/p}x'_1, \ldots, \theta^{1/p}x'_{N'}).$$
Thus by the Hahn–Banach separation theorem (Theorem 4.6.2), there exist $h\,d\mu \in C_R(K)^*$ and $\beta \in \mathbb{R}$ such that $\int fh\,d\mu > \beta$ for $f \in U$ and $\int l_Sh\,d\mu \le \beta$ for $l_S \in L$. Since $0 \in L$, $\beta \ge 0$. If $f \in U$ and $\alpha > 0$ then $\alpha f \in U$, and so $\alpha\int fh\,d\mu > \beta$. Since this holds for all $\alpha > 0$, it follows that $\beta = 0$. Thus $\int fh\,d\mu > 0$ if $f \in U$, and so $h(k) = \|h\,d\mu\|$ $\mu$-almost everywhere. Thus, normalizing so that $\mu \in P(K)$, $\int l_S\,d\mu \le 0$ for $l_S \in L$. Applying this to a one-term sequence $S = (x)$, this says that $\|T(x)\|_F^p \le \int_K |i(x)(k)|^p\,d\mu(k)$. Thus the required inequality holds with $M = 1 = \pi_p(T)$.
16.10 Pietsch's factorization theorem

Proposition 16.4.1 shows that if $\mu$ is a probability measure on the Baire sets of a compact Hausdorff space $K$, and if $1 \le p < \infty$, then the natural map $j_p : C(K) \to L_p(\mu)$ is $p$-summing, and $\pi_p(j_p) = 1$. We can also interpret Pietsch's domination theorem as a factorization theorem, which shows that $j_p$ is the archetypical $p$-summing operator.
Theorem 16.10.1 (The Pietsch factorization theorem) Suppose that $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ are Banach spaces and that $T \in L(E, F)$. Suppose that $i : E \to C(K)$ is an isometric embedding, and that $1 \le p < \infty$. Then $T \in \Pi_p(E, F)$ if and only if there exist $\mu \in P(K)$ and a continuous linear mapping $R : \overline{j_pi(E)} \to F$ (where $\overline{j_pi(E)}$ is the closure of $j_pi(E)$ in $L_p(\mu)$, and is given the $L_p$ norm) such that $T = Rj_pi$. If so, then we can find a factorization such that $\|R\| = \pi_p(T)$.
Proof If $T = Rj_pi$, then since $j_p$ is $p$-summing, so is $T$, and $\pi_p(T) \le \|R\|\,\pi_p(j_p)\,\|i\| = \|R\|$. Conversely, suppose that $T \in \Pi_p(E, F)$. Let $\mu$ be a probability measure satisfying the conclusions of Theorem 16.9.1. If $f = j_pi(x) = j_pi(y) \in j_pi(E)$ then $\|T(x) - T(y)\|_F \le \pi_p(T)\|j_pi(x) - j_pi(y)\|_p = 0$, so that $T(x) = T(y)$. We can therefore define $R(f) = T(x)$ without ambiguity, and then $\|R(f)\|_F \le \pi_p(T)\|f\|_p$. Finally, we extend $R$ to $\overline{j_pi(E)}$, by continuity.
We therefore have the following diagram:

            T
       E ---------> F
       |            ^
     i |            | R
       v    j_p     |
      i(E) -----> cl(j_p i(E))
       |            |
       v    j_p     v
      C(K) -----> L_p(mu)

In general, we cannot extend $R$ to $L_p(\mu)$, but there are two special cases when we can. First, if $p = 2$ we can compose $R$ with the orthogonal projection of $L_2(\mu)$ onto $\overline{j_2i(E)}$. We therefore have the following.
Corollary 16.10.1 Suppose that $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ are Banach spaces and that $T \in L(E, F)$. Suppose that $i : E \to C(K)$ is an isometric embedding. Then $T \in \Pi_2(E, F)$ if and only if there exist $\mu \in P(K)$ and a continuous linear mapping $R : L_2(\mu) \to F$ such that $T = Rj_2i$. If so, we can find a factorization such that $\|R\| = \pi_2(T)$.

            T
       E ---------> F
       |            ^
     i |            | R
       v    j_2     |
      C(K) -----> L_2(mu)
Second, suppose that $E = C(K)$, where $K$ is a compact Hausdorff space. In this case, $j_p(E)$ is dense in $L_p(\mu)$, so that $R \in L(L_p(\mu), F)$. Thus we have the following.

Corollary 16.10.2 Suppose that $K$ is a compact Hausdorff space, that $(F, \|\cdot\|_F)$ is a Banach space and that $T \in L(C(K), F)$. Then $T \in \Pi_p(C(K), F)$ if and only if there exist $\mu \in P(K)$ and a continuous linear mapping $R : L_p(\mu) \to F$ such that $T = Rj_p$. If so, then we can find a factorization such that $\|R\| = \pi_p(T)$.

This corollary has the following useful consequence.
Proposition 16.10.1 Suppose that $K$ is a compact Hausdorff space, that $(F, \|\cdot\|_F)$ is a Banach space and that $T \in \Pi_p(C(K), F)$. If $p < q < \infty$ then
$$\pi_q(T) \le \|T\|^{1-p/q}(\pi_p(T))^{p/q}.$$
Proof Let $T = Rj_p$ be a factorization with $\|R\| = \pi_p(T)$. Let $j_q : C(K) \to L_q(\mu)$ be the natural map, and let $I_{q,p} : L_q(\mu) \to L_p(\mu)$ be the inclusion map. If $\psi \in F^*$ then $g_\psi = R^*(\psi) \in (L_p(\mu))^* = L_{p'}(\mu)$. By Littlewood's inequality, $\|g_\psi\|_{q'} \le \|g_\psi\|_1^{1-p/q}\|g_\psi\|_{p'}^{p/q}$, and
$$\|g_\psi\|_1 = \|j_p^*(g_\psi)\| = \|j_p^*R^*(\psi)\| = \|T^*(\psi)\| \le \|T^*\|\,\|\psi\| = \|T\|\,\|\psi\|.$$
Thus
$$\pi_q(T) = \pi_q(RI_{q,p}j_q) \le \|RI_{q,p}\|\,\pi_q(j_q) = \|RI_{q,p}\| = \|I_{q,p}^*R^*\| = \sup\{\|I_{q,p}^*R^*(\psi)\| : \|\psi\| \le 1\}$$
$$= \sup\{\|g_\psi\|_{q'} : \|\psi\| \le 1\} \le \sup\{\|g_\psi\|_1^{1-p/q} : \|\psi\| \le 1\}\sup\{\|g_\psi\|_{p'}^{p/q} : \|\psi\| \le 1\}$$
$$\le \|T\|^{1-p/q}\|R\|^{p/q} = \|T\|^{1-p/q}(\pi_p(T))^{p/q}.$$
16.11 p-summing operators between Hilbert spaces (2 ≤ p < ∞)

Pietsch's theorems have many applications. First let us complete the results on operators between Hilbert spaces.
Theorem 16.11.1 Suppose that $H_1$ and $H_2$ are Hilbert spaces and that $2 \le p < \infty$. Then $T \in \Pi_p(H_1, H_2)$ if and only if $T \in S_2(H_1, H_2)$.
Proof If $T \in S_2(H_1, H_2)$ then $T \in \Pi_2(H_1, H_2)$, and so $T \in \Pi_p(H_1, H_2)$. Conversely, if $T \in \Pi_p(H_1, H_2)$ then $T \in \Pi_{p,2}(H_1, H_2)$, and so $T \in S_p(H_1, H_2)$. Thus $T$ is compact, and we can write $T(x) = \sum_{j=1}^\infty s_j(T) \langle x, x_j\rangle\, y_j$. Let $B_1$ be the unit ball of $H_1$, with the weak topology. By Pietsch's domination theorem, there exists $\mu \in P(B_1)$ such that
$$\|T(x)\|^p \le (\pi_p(T))^p \int_{B_1} |\langle x, y\rangle|^p \, d\mu(y)$$
for all $x \in H_1$. Once again, we make use of the Kahane–Khintchine inequality. Let $\epsilon_1, \ldots, \epsilon_J$ be Bernoulli random variables on $D^J_2$, and let $x(\omega) = \sum_{j=1}^J \epsilon_j(\omega) x_j$. Then $T(x(\omega)) = \sum_{j=1}^J \epsilon_j(\omega) s_j(T) y_j$, so that $\|T(x(\omega))\| = (\sum_{j=1}^J (s_j(T))^2)^{1/2}$, for each $\omega$. Thus
$$\Bigl(\sum_{j=1}^J (s_j(T))^2\Bigr)^{p/2} \le (\pi_p(T))^p \int_{B_1} |\langle x(\omega), y\rangle|^p \, d\mu(y).$$
Integrating over $D^J_2$, changing the order of integration, and using the Kahane–Khintchine inequality, we see that
$$\Bigl(\sum_{j=1}^J (s_j(T))^2\Bigr)^{p/2} \le (\pi_p(T))^p \int_{D^J_2} \Bigl(\int_{B_1} |\langle x(\omega), y\rangle|^p \, d\mu(y)\Bigr) dP(\omega)$$
$$= (\pi_p(T))^p \int_{B_1} \Bigl(\int_{D^J_2} \Bigl|\sum_{j=1}^J \epsilon_j(\omega) \langle x_j, y\rangle\Bigr|^p dP(\omega)\Bigr) d\mu(y)$$
$$\le (\pi_p(T))^p B_p^p \int_{B_1} \Bigl(\sum_{j=1}^J |\langle x_j, y\rangle|^2\Bigr)^{p/2} d\mu(y),$$
where $B_p$ is the constant in the Kahane–Khintchine inequality. But $\sum_{j=1}^J |\langle x_j, y\rangle|^2 \le \|y\|^2 \le 1$ for $y \in B_1$, and so $\|T\|_2 = \|(s_j(T))\|_2 \le B_p\, \pi_p(T)$.
16.12 The Dvoretzky–Rogers theorem

Pietsch's factorization theorem enables us to prove the following.

Theorem 16.12.1 Suppose that $S \in \Pi_2(E, F)$ and $T \in \Pi_2(F, G)$. Then $TS$ is 1-summing, and compact.
Proof Let $i_E$ be an isometry of $E$ into $C(K_E)$ and let $i_F$ be an isometry of $F$ into $C(K_F)$. We can write $S = \tilde S j_2 i_E$ and $T = \tilde T j^t_2 i_F$. Then $j^t_2 i_F \tilde S$ is 2-summing, and therefore is a Hilbert–Schmidt operator. Thus it is 1-summing, and compact, and so therefore is $TS = \tilde T (j^t_2 i_F \tilde S) j_2 i_E$.
We can now answer the question that was raised at the beginning of the chapter.

Theorem 16.12.2 (The Dvoretzky–Rogers theorem) If $(E, \|.\|_E)$ is a Banach space in which every unconditionally convergent series is absolutely convergent, then $E$ is finite-dimensional.

Proof For the identity mapping $I_E$ is 1-summing, and therefore 2-summing, and so $I_E = I_E^2$ is compact.
Since $\pi_1(T) \ge \pi_2(T)$, the next result can be thought of as a finite-dimensional metric version of the Dvoretzky–Rogers theorem.

Theorem 16.12.3 If $(E, \|.\|_E)$ is an $n$-dimensional normed space, then $\pi_2(E) = \sqrt n$.
Proof Let $I_E$ be the identity mapping on $E$. We can factorize $I_E = R j_2 i$, with $\|R\| = \pi_2(I_E)$. Let $H_n = j_2 i(E)$. Then $\dim H_n = n$ and $j_2 i R$ is the identity mapping on $H_n$. Thus
$$\sqrt n = \pi_2(I_{H_n}) \le \pi_2(j_2)\, \|i\| \cdot \|R\| = \|R\| = \pi_2(I_E).$$
For the converse, we use Proposition 16.3.2. Let $S \in L(l_2^J, E)$, let $K$ be the null-space of $S$, and let $Q$ be the orthogonal projection of $l_2^J$ onto $K^\perp$. Then $\dim K^\perp \le n$, and $I_E S = S = S I_{K^\perp} Q$, so that $\pi_2(S) \le \|S\|\, \pi_2(I_{K^\perp}) \le \sqrt n\, \|S\|$. Thus $(\sum_{j=1}^J \|I_E S(e_j)\|^2)^{1/2} \le \sqrt n\, \|S\|$, and so $\pi_2(I_E) \le \sqrt n$.
This result is due to Garling and Gordon [GaG 71], but this elegant proof is due to Kwapień. It has three immediate consequences.
Corollary 16.12.1 Suppose that $(E, \|.\|_E)$ is an $n$-dimensional normed space. Then there exists an invertible linear mapping $T : E \to l_2^n$ with $\|T\| = 1$ and $\|T^{-1}\| \le \sqrt n$.

Proof Let $U : l_2^n \to H_n$ be an isometry, and take $T = U^{-1} j_2 i$, so that $T^{-1} = R U$, and $\|T^{-1}\| = \|R\| = \sqrt n$.
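For $E = l_1^2$ this bound is attained: the formal identity map $T : l_1^2 \to l_2^2$ has $\|T\| = 1$ and $\|T^{-1}\| = \sqrt 2 = \sqrt n$. A minimal numerical sketch (sampling the unit circle in one-degree steps, which hits the extremal directions $e_1$ and $(1,1)/\sqrt 2$ exactly; the sampling scheme is purely illustrative):

```python
import math

# T : l_1^2 -> l_2^2, the formal identity map.
# ||T||      = sup ||x||_2 / ||x||_1  (attained at x = e_1),
# ||T^{-1}|| = sup ||x||_1 / ||x||_2  (attained at x = (1, 1)/sqrt(2)).
def norm1(x): return abs(x[0]) + abs(x[1])
def norm2(x): return math.hypot(x[0], x[1])

points = [(math.cos(k * math.pi / 180), math.sin(k * math.pi / 180))
          for k in range(360)]
norm_T = max(norm2(x) / norm1(x) for x in points)
norm_Tinv = max(norm1(x) / norm2(x) for x in points)
```

The product $\|T\|\,\|T^{-1}\| = \sqrt 2 = \sqrt n$, matching Corollary 16.12.1 for $n = 2$.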
Corollary 16.12.2 Suppose that $E_n$ is an $n$-dimensional subspace of a normed space $(E, \|.\|_E)$. Then there exists a projection $P$ of $E$ onto $E_n$ with $\|P\| \le \sqrt n$.

Proof Let $i$ be an isometric embedding of $E$ into $C(K)$, for some compact Hausdorff space $K$, and let $I_{E_n} = R j_2 i|_{E_n}$ be a factorization with $\|R\| = \sqrt n$. Then $P = R j_2 i$ is a suitable projection.
Corollary 16.12.3 Suppose that $(E, \|.\|_E)$ is an $n$-dimensional normed space and that $2 < p < \infty$. Then $\pi_{p,2}(I_E) \le n^{1/p}$.

Proof By Corollary 16.3.1, $\pi_{p,2}(I_E) \le \|I_E\|^{1-2/p} (\pi_2(I_E))^{2/p} = n^{1/p}$.

We shall obtain a lower bound for $\pi_{p,2}(I_E)$ later (Corollary 17.4.2).
16.13 Operators that factor through a Hilbert space

Corollary 16.10.1 raises the problem: when does $T \in L(E, F)$ factor through a Hilbert space? We say that $T \in \Gamma_2 = \Gamma_2(E, F)$ if there exist a Hilbert space $H$ and $A \in L(H, F)$, $B \in L(E, H)$ such that $T = AB$. If so, we set $\gamma_2(T) = \inf\{\|A\|\,\|B\| : T = AB\}$.

To help us solve the problem, we introduce the following notation: if $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_n)$ are finite sequences in a Banach space $(E, \|.\|_E)$ we write $x \prec y$ if $\sum_{i=1}^m |\phi(x_i)|^2 \le \sum_{j=1}^n |\phi(y_j)|^2$ for all $\phi \in E^*$.
Theorem 16.13.1 Suppose that $T \in L(E, F)$. Then $T \in \Gamma_2$ if and only if there exists $C \ge 0$ such that whenever $x \prec y$ then $\sum_{i=1}^m \|T(x_i)\|^2 \le C^2 \sum_{j=1}^n \|y_j\|^2$. If so, then $\gamma_2(T)$ is the infimum of the $C$ for which the condition holds.
Proof Suppose first that $T \in \Gamma_2$ and that $C > \gamma_2(T)$. Then there is a factorization $T = AB$ with $\|B\| = 1$ and $\|A\| < C$. Suppose that $x \prec y$. Let $(e_1, \ldots, e_l)$ be an orthonormal basis for $\mathrm{span}(B(x_1), \ldots, B(x_m))$, and let $\phi_k = B^*(e_k)$ for $1 \le k \le l$. Then
$$\sum_{i=1}^m \|T(x_i)\|^2 \le C^2 \sum_{i=1}^m \|B(x_i)\|^2 = C^2 \sum_{i=1}^m \sum_{k=1}^l |\langle B(x_i), e_k\rangle|^2 = C^2 \sum_{k=1}^l \sum_{i=1}^m |\phi_k(x_i)|^2$$
$$\le C^2 \sum_{k=1}^l \sum_{j=1}^n |\phi_k(y_j)|^2 = C^2 \sum_{j=1}^n \sum_{k=1}^l |\langle B(y_j), e_k\rangle|^2 \le C^2 \sum_{j=1}^n \|B(y_j)\|^2 \le C^2 \sum_{j=1}^n \|y_j\|^2.$$
Thus the condition is necessary.
Second, suppose that the condition is satisfied. First we consider the case where $E$ is finite-dimensional. Let $K$ be the unit sphere of $E^*$: $K$ is compact. If $x \in E$ and $k \in K$, let $\hat x(k) = k(x)$. Then $\hat x \in C(K)$. Now let
$$S = \Bigl\{(x, y) : \sum_{i=1}^m \|T(x_i)\|^2 > C^2 \sum_{j=1}^n \|y_j\|^2\Bigr\},$$
and let
$$D = \Bigl\{\sum_{j=1}^n |\hat y_j|^2 - \sum_{i=1}^m |\hat x_i|^2 : (x, y) \in S\Bigr\}.$$
Then $D$ is a convex subset of $C(K)$, and the condition ensures that $D$ is disjoint from the convex open set $U = \{f : f(k) > 0$ for all $k \in K\}$. By the Hahn–Banach theorem, there exists a probability measure $P$ on $K$ such that $\int g \, dP \le 0$ for all $g \in D$. Then it follows by considering sequences of length 1 that if $\|T(x)\| > C \|y\|$ then $\int |\hat x|^2 \, dP \ge \int |\hat y|^2 \, dP$. Let $a = \sup\{\int |\hat x|^2 \, dP : \|x\| = 1\}$. Then $a \le 1$, and it is easy to see that $a > 0$ (why?). Let $\nu = a^{-1}P$, and let $B(x) = j_2(\hat x)$, where $j_2$ is the natural map from $C(K)$ to $L_2(\nu)$, and let $H = B(E)$. Then $\|B\| = 1$, and it follows that if $\|B(x)\| < \|B(y)\|$ then $\|T(x)\| \le C \|y\|$. Choose $y$ so that $\|B(y)\| = \|y\| = 1$. Thus if $\|B(x)\| < 1$ then $\|T(x)\| \le C$. This implies that $\|T(x)\| \le C \|B(x)\|$ for all $x \in E$, so that if $B(x) = B(z)$ then $T(x) = T(z)$. We can therefore define $A \in L(H, F)$ such that $T = AB$ and $\|A\| \le C$.
We now consider the case where $E$ is infinite-dimensional. First suppose that $E$ is separable, so that there is an increasing sequence $(E_i)$ of finite-dimensional subspaces whose union $E_\infty$ is dense in $E$. For each $i$ there is a factorization $T|_{E_i} = A_i B_i$, with $\|A_i\| \le C$ and $\|B_i\| = 1$. For $x, y \in E_i$ let $\langle x, y\rangle_i = \langle B_i(x), B_i(y)\rangle$. Then a standard approximation and diagonalization argument shows that there is a subsequence $(i_k)$ such that if $x, y \in E_\infty$ then $\langle x, y\rangle_{i_k}$ converges, to $\langle x, y\rangle_\infty$, say. $\langle x, y\rangle_\infty$ is a pre-inner product; it satisfies all the conditions of an inner product except that $N = \{x : \langle x, y\rangle_\infty = 0$ for all $y \in E_\infty\}$ may be a non-trivial linear subspace of $E_\infty$. But then we can consider $E_\infty/N$, define an inner product on it, and complete it, to obtain a Hilbert space $H$. Having done this, it is then straightforward to obtain a factorization of $T$; the details are left to the reader. If $E$ is non-separable, a more sophisticated transfinite induction is needed; an elegant way to provide this is to consider a free ultrafilter defined on the set of finite-dimensional subspaces of $E$.
Let us now consider the relation $x \prec y$ further.
Proposition 16.13.1 Suppose that $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_n)$ are finite sequences in a Banach space $(E, \|.\|_E)$. Then $x \prec y$ if and only if there exists $A = (a_{ij}) \in L(l_2^n, l_2^m)$ with $\|A\| \le 1$ such that $x_i = \sum_{j=1}^n a_{ij} y_j$ for $1 \le i \le m$.
Proof Suppose that $x \prec y$. Consider the subspace $V = \{(\phi(y_j))_{j=1}^n : \phi \in E^*\}$ of $l_2^n$. If $v = (\phi(y_j))_{j=1}^n \in V$, let $A_0(v) = (\phi(x_i))_{i=1}^m \in l_2^m$. Then $A_0$ is well-defined, and $\|A_0\| \le 1$ (both because $x \prec y$). Let $A = A_0 P$, where $P$ is the orthogonal projection of $l_2^n$ onto $V$. Then $A$ has the required properties.

Conversely, if the condition is satisfied and $\phi \in E^*$ then
$$\sum_{i=1}^m |\phi(x_i)|^2 = \sum_{i=1}^m \Bigl|\sum_{j=1}^n a_{ij} \phi(y_j)\Bigr|^2 \le \sum_{j=1}^n |\phi(y_j)|^2.$$
In Theorem 16.13.1, we can clearly restrict attention to sequences $x$ and $y$ of equal length. Combining Theorem 16.13.1 with this proposition, and with Exercise 16.6, we obtain the following.

Theorem 16.13.2 Suppose that $T \in L(E, F)$. Then the following are equivalent:

(i) $T \in \Gamma_2$;
(ii) there exists $C \ge 0$ such that if $y_1, \ldots, y_n \in E$ and $A = (a_{ij}) \in L(l_2^n, l_2^n)$ then
$$\sum_{i=1}^n \Bigl\|T\Bigl(\sum_{j=1}^n a_{ij} y_j\Bigr)\Bigr\|^2 \le C^2 \|A\|^2 \sum_{j=1}^n \|y_j\|^2;$$
(iii) there exists $C \ge 0$ such that if $y_1, \ldots, y_n \in E$ and $U = (u_{ij})$ is an $n \times n$ unitary matrix then
$$\sum_{i=1}^n \Bigl\|T\Bigl(\sum_{j=1}^n u_{ij} y_j\Bigr)\Bigr\|^2 \le C^2 \sum_{j=1}^n \|y_j\|^2.$$
If so, then $\gamma_2(T)$ is the infimum of the $C$ for which the conditions hold.
16.14 Notes and remarks

Absolutely summing operators were introduced by Grothendieck [Grot 53] as 'applications semi-intégrales à droite', and many of the results of the rest of the book have their origin in this fundamental work. It was however written in a very compressed style, and most of the results were expressed in terms of tensor products, rather than linear operators, and so it remained impenetrable until the magnificent paper of Lindenstrauss and Pełczyński [LiP 68] appeared. This explained Grothendieck's work clearly in terms of linear operators, presented many new results, and ended with a large number of problems that needed to be resolved.

Theorem 16.8.1 was first proved by Grothendieck [Grot 53]. The proof given here is due to Pietsch [Pie 67], who extended the result to p-summing operators, for $1 \le p \le 2$. Theorem 16.11.1 was proved by Pełczyński [Pel 67]. Grothendieck proved his result by calculating the 1-summing norm of a Hilbert–Schmidt operator directly. Garling [Gar 70] did the same for the p-summing norms, thus giving a proof that does not make use of the Kahane–Khintchine inequality.

If $(E, \|.\|_E)$ and $(F, \|.\|_F)$ are finite-dimensional spaces of the same dimension, the Banach–Mazur distance $d(E, F)$ is defined as
$$\inf\{\|T\|\,\|T^{-1}\| : T \text{ a linear isomorphism of } E \text{ onto } F\}.$$
This is a basic concept in the local theory of Banach spaces, and the geometry of finite-dimensional normed spaces. Corollary 16.12.1 was originally proved by John [Joh 48], by considering the ellipsoid of maximal volume contained in the unit ball of $E$. This more geometric approach has led to many interesting results about finite-dimensional normed spaces. For this, see [Tom 89] and [Pis 89].

Mercer was a near contemporary of Littlewood at Trinity College, Cambridge (they were bracketed as Senior Wrangler in 1905): he proved his theorem in 1909 [Mer 09] for functions on $[a, b] \times [a, b]$. His proof was classical: a good account is given in [Smi 62].
Exercises

16.1 Prove Proposition 16.1.2 without appealing to the closed graph theorem.

16.2 Why do we not consider (p, q)-summing operators with $p < q$?

16.3 Suppose that $(f_n)$ is a sequence in $C(K)$, where $K$ is a compact Hausdorff space, which increases pointwise to a continuous function $f$. Show that the convergence is uniform (Dini's theorem). [Hint: consider $A_{n,\epsilon} = \{k : f_n(k) \ge f(k) - \epsilon\}$.]

16.4 Give an example where $P$ is a probability measure on the Baire sets of a compact Hausdorff space $K$, and $T \in L(L_1, C(K))$ satisfies the conditions of Theorem 16.6.1, but where the conclusions of Mercer's theorem do not hold.
16.5 (i) Suppose that $P$ is a probability measure on the unit sphere $K$ of $l_2^d$. Show that there exists $x \in l_2^d$ with $\|x\| = 1$ and
$$\int_K |\langle x, k\rangle|^2 \, dP(k) \le 1/d.$$
(ii) Give an example of a probability measure $P$ on the unit sphere $K$ of $l_2^d$ for which $\int_K |\langle x, k\rangle|^2 \, dP(k) = \|x\|^2/d$ for all $x$.
(iii) Use Corollary 16.12.1 to obtain a lower bound for $a$ in Theorem 16.13.1.
16.6 Suppose that $\sum_{i=1}^\infty f_i$ is an unconditionally convergent series in $L^1_{\mathbb{R}}(\Omega, \Sigma, \mu)$. Show that
$$\Bigl(\sum_{i=1}^m \|f_i\|_1^2\Bigr)^{1/2} \le \Bigl\|\Bigl(\sum_{i=1}^m f_i^2\Bigr)^{1/2}\Bigr\|_1 \le \sqrt 2\, \mathbf{E}\Bigl(\Bigl\|\sum_{i=1}^m \epsilon_i f_i\Bigr\|_1\Bigr),$$
where $(\epsilon_i)$ is a sequence of Bernoulli random variables. Deduce that $\sum_{i=1}^\infty \|f_i\|_1^2 < \infty$ (Orlicz's theorem).
What happens if $L^1$ is replaced by $L^p$, for $1 < p \le 2$, and for $2 < p < \infty$?
16.7 Prove the following extension of Theorem 16.13.1.
Suppose that $G$ is a linear subspace of $E$ and that $T \in L(G, F)$. Suppose that there exists $C \ge 0$ such that if $x$ is a finite sequence in $G$, $y$ is a finite sequence in $E$ and $x \prec y$, then $\sum_{i=1}^m \|T(x_i)\|^2 \le C^2 \sum_{j=1}^n \|y_j\|^2$. Show that there exist a Hilbert space $H$ and $B \in L(E, H)$, $A \in L(H, F)$ with $\|A\| \le C$, $\|B\| \le 1$ such that $T(x) = AB(x)$ for $x \in G$.
Show that there exists $\tilde T \in \Gamma_2(E, F)$ such that $\tilde T(x) = T(x)$ for $x \in G$, with $\gamma_2(\tilde T) \le C$.

16.8 Show that $\Gamma_2(E, F)$ is a vector space and that $\gamma_2$ is a norm on it. Show that $(\Gamma_2(E, F), \gamma_2)$ is complete.
17 Approximation numbers and eigenvalues

17.1 The approximation, Gelfand and Weyl numbers

We have identified the p-summing operators between Hilbert spaces $H_1$ and $H_2$ with the Hilbert–Schmidt operators $S_2(H_1, H_2)$, and the (p, 2)-summing operators with $S_p(H_1, H_2)$. These spaces were defined using singular numbers: are there corresponding numbers for operators between Banach spaces? In fact there are many analogues of the singular numbers, and we shall mention three. Suppose that $T \in L(E, F)$, where $E$ and $F$ are Banach spaces.

The $n$-th approximation number $a_n(T)$ is defined as
$$a_n(T) = \inf\{\|T - R\| : R \in L(E, F),\ \mathrm{rank}(R) < n\}.$$
The $n$-th Gelfand number $c_n(T)$ is defined as
$$c_n(T) = \inf\{\|T|_G\| : G \text{ a closed subspace of } E \text{ of codimension less than } n\}.$$
The $n$-th Weyl number $x_n(T)$ is defined as
$$x_n(T) = \sup\{c_n(TS) : S \in L(l_2, E),\ \|S\| \le 1\}.$$
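When $E$ and $F$ are Hilbert spaces, Proposition 17.1.1 below identifies all three numbers with the singular numbers $s_n(T)$; concretely, the infimum defining $a_n(T)$ is then attained at the truncated singular value decomposition (the Eckart–Young theorem). A short sketch with NumPy (the matrix is an illustrative choice, not from the text):

```python
import numpy as np

# a_n(T) = inf{ ||T - R|| : rank(R) < n }.  Between Hilbert spaces this
# infimum is attained by truncating the SVD, and a_n(T) = s_n(T).
T = np.array([[2.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(T)            # singular numbers s = (3, 1)

R = s[0] * np.outer(U[:, 0], Vt[0])    # best approximation of rank 1
a2 = np.linalg.norm(T - R, 2)          # operator norm of the remainder
```

Here $a_2(T)$, computed as the operator norm of $T$ minus its best rank-one approximation, coincides with $s_2(T) = 1$.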
The approximation numbers, Gelfand numbers and Weyl numbers are closely related to singular numbers, as the next proposition shows. The Weyl numbers were introduced by Pietsch; they are technically useful, since they enable us to exploit the strong geometric properties of Hilbert space.

Proposition 17.1.1 Suppose that $T \in L(E, F)$, where $E$ and $F$ are Banach spaces. Then $x_n(T) \le c_n(T) \le a_n(T)$, and if $E$ is a Hilbert space they are all equal. Further,
$$x_n(T) = \sup\{a_n(TS) : S \in L(l_2, E),\ \|S\| \le 1\}.$$
If $E$ and $F$ are Hilbert spaces and $T$ is compact then $a_n(T) = c_n(T) = x_n(T) = s_n(T)$.
Proof If $S \in L(l_2, E)$ and $G$ is a subspace of $E$ with $\mathrm{codim}\, G < n$ then $\mathrm{codim}\, S^{-1}(G) < n$, so that $c_n(TS) \le c_n(T)\,\|S\|$, and $x_n(T) \le c_n(T)$. If $R \in L(E, F)$ and $\mathrm{rank}\, R < n$ then the null-space $N$ of $R$ has codimension less than $n$, and $\|T|_N\| \le \|T - R\|$; thus $c_n(T) \le a_n(T)$. If $E$ is a Hilbert space then clearly $x_n(T) = c_n(T)$; if $G$ is a closed subspace of $E$ of codimension less than $n$, and $P$ is the orthogonal projection onto $G^\perp$, then $\mathrm{rank}(TP) < n$ and $\|T - TP\| = \|T|_G\|$, so that $c_n(T) = a_n(T)$. Consequently
$$x_n(T) = \sup\{a_n(TS) : S \in L(l_2, E),\ \|S\| \le 1\}.$$
Finally, the Rayleigh–Ritz minimax formula (Theorem 15.7.1) states that if $T \in K(H_1, H_2)$ then $s_n(T) = c_n(T)$.
In general, the inequalities can be strict: if $J$ is the identity map from $l_1^3(\mathbb{R})$ to $l_2^3(\mathbb{R})$, then $c_2(J) = 1/\sqrt 2 < \sqrt{2/3} = a_2(J)$; if $I$ is the identity map on $l_1^2(\mathbb{R})$ then $x_2(I) = 1/\sqrt 2 < 1 = c_2(I)$.
It is clear that if $T \in L(E, F)$ then $T$ can be approximated in operator norm by finite rank operators if and only if $a_n(T) \to 0$ as $n \to \infty$. In particular, if $a_n(T) \to 0$ as $n \to \infty$ then $T$ is compact. It is however a deep and difficult result that not every compact operator between Banach spaces can be approximated by finite rank operators. This illuminates the importance of the following result.

Theorem 17.1.1 If $T \in L(E, F)$ then $T$ is compact if and only if $c_n(T) \to 0$ as $n \to \infty$.
Proof First, suppose that $T$ is compact, and that $\epsilon > 0$. There exist $y_1, \ldots, y_n$ in the unit ball $B_F$ of $F$ such that $T(B_E) \subseteq \bigcup_{i=1}^n (y_i + \epsilon B_F)$. By the Hahn–Banach theorem, for each $i$ there exists $\phi_i \in F^*$ with $\|\phi_i\| = 1$ and $\phi_i(y_i) = \|y_i\|$. Let $G = \{x \in E : \phi_i(T(x)) = 0$ for $1 \le i \le n\}$. $G$ has codimension less than $n + 1$. Suppose that $x \in B_E \cap G$. Then there exists $i$ such that $\|T(x) - y_i\| < \epsilon$. Then $\|y_i\| = \phi_i(y_i) = \phi_i(y_i - T(x)) < \epsilon$, and so $\|T(x)\| < 2\epsilon$. Thus $c_{n+1}(T) \le 2\epsilon$, and so $c_n(T) \to 0$ as $n \to \infty$.
Conversely, suppose that $T \in L(E, F)$, that $\|T\| = 1$ and that $c_n(T) \to 0$ as $n \to \infty$. Suppose that $0 < \epsilon < 1$ and that $G$ is a finite-codimensional subspace such that $\|T|_G\| < \epsilon$. Since $\|T|_{\bar G}\| = \|T|_G\| < \epsilon$, we can suppose that $G$ is closed, and so there is a continuous projection $P_G$ of $E$ onto $G$. Let $P_K = I - P_G$, and let $K = P_K(E)$. Since $K$ is finite-dimensional, $P_K$ is compact, and there exist $x_1, \ldots, x_n$ in $B_E$ such that $P_K(B_E) \subseteq \bigcup_{i=1}^n (P_K(x_i) + \epsilon B_E)$. If $x \in B_E$ there exists $i$ such that $\|P_K(x - x_i)\| \le \epsilon$; then
$$\|P_G(x - x_i)\| \le \|x - x_i\| + \|P_K(x - x_i)\| \le \|x\| + \|x_i\| + \epsilon \le 2 + \epsilon.$$
Consequently
$$\|T(x) - T(x_i)\| \le \|T(P_G(x - x_i))\| + \|T(P_K(x - x_i))\| \le \epsilon(2 + \epsilon) + \epsilon < 4\epsilon.$$
Thus $T$ is compact.
17.2 Subadditive and submultiplicative properties

The approximation numbers, Gelfand numbers and Weyl numbers enjoy subadditive properties. These lead to inequalities which correspond to the Ky Fan inequalities.
Proposition 17.2.1 Let $\sigma_n$ denote one of $a_n$, $c_n$ or $x_n$. If $S, T \in L(E, F)$ and $m, n, J \in \mathbb{N}$ then $\sigma_{m+n-1}(S + T) \le \sigma_m(S) + \sigma_n(T)$, and
$$\sum_{j=1}^{2J} \sigma_j(S + T) \le 2\Bigl(\sum_{j=1}^{J} \sigma_j(S) + \sum_{j=1}^{J} \sigma_j(T)\Bigr),$$
$$\sum_{j=1}^{2J-1} \sigma_j(S + T) \le 2\Bigl(\sum_{j=1}^{J-1} \sigma_j(S) + \sum_{j=1}^{J-1} \sigma_j(T)\Bigr) + \sigma_J(S) + \sigma_J(T).$$
If $(X, \|.\|_X)$ is a symmetric Banach sequence space and $(\sigma_n(S))$ and $(\sigma_n(T))$ are both in $X$ then $(\sigma_n(S + T)) \in X$ and
$$\|(\sigma_n(S + T))\|_X \le 2\|(\sigma_n(S) + \sigma_n(T))\|_X \le 2(\|(\sigma_n(S))\|_X + \|(\sigma_n(T))\|_X).$$

Proof The first set of inequalities follows easily from the definitions, and the next two follow from the fact that
$$\sigma_{2j}(S + T) \le \sigma_{2j-1}(S + T) \le \sigma_j(S) + \sigma_j(T).$$
Let $u_{2n-1} = u_{2n} = \sigma_n(S) + \sigma_n(T)$. Then $(\sigma_n(S + T)) \prec_w (u_n)$, and so
$$\|(\sigma_n(S + T))\|_X \le \|(u_n)\|_X \le 2\|(\sigma_n(S) + \sigma_n(T))\|_X,$$
by Corollary 7.4.1.
The approximation numbers, Gelfand numbers and Weyl numbers also enjoy submultiplicative properties. These lead to inequalities which correspond to the Horn inequalities.

Proposition 17.2.2 Let $\sigma_n$ denote one of $a_n$, $c_n$ or $x_n$. If $S \in L(E, F)$ and $T \in L(F, G)$ and $m, n, J \in \mathbb{N}$ then $\sigma_{m+n-1}(TS) \le \sigma_n(T) \cdot \sigma_m(S)$, and
$$\prod_{j=1}^{2J} \sigma_j(TS) \le \Bigl(\prod_{j=1}^{J} \sigma_j(T) \cdot \sigma_j(S)\Bigr)^2,$$
$$\prod_{j=1}^{2J-1} \sigma_j(TS) \le \Bigl(\prod_{j=1}^{J-1} \sigma_j(T) \cdot \sigma_j(S)\Bigr)^2 \sigma_J(T)\, \sigma_J(S).$$
Suppose that $\Phi$ is an increasing function on $[0, \infty)$ and that $\Phi(e^t)$ is a convex function of $t$. Then
$$\sum_{j=1}^{2J} \Phi(\sigma_j(TS)) \le 2\sum_{j=1}^{J} \Phi(\sigma_j(T) \cdot \sigma_j(S)), \quad \text{for each } J.$$
In particular,
$$\sum_{j=1}^{2J} |\sigma_j(TS)|^p \le 2\sum_{j=1}^{J} (\sigma_j(T) \cdot \sigma_j(S))^p, \quad \text{for } 0 < p < \infty, \text{ for each } J.$$
Suppose that $(X, \|.\|_X)$ is a symmetric Banach sequence space. If $(\sigma_j(T))$ and $(\sigma_j(S))$ are both in $X$ then $(\sigma_j(TS)) \in X$ and $\|(\sigma_j(TS))\|_X \le 2\|(\sigma_j(T)\sigma_j(S))\|_X$.
Proof For $(a_n)$ and $(c_n)$, the first inequality follows easily from the definitions. Let us prove it for $(x_n)$. Suppose that $R \in L(l_2, E)$, that $\|R\| \le 1$, and that $\epsilon > 0$. Then there exists $A_m \in L(l_2, F)$ with $\mathrm{rank}(A_m) < m$ and
$$\|SR - A_m\| < a_m(SR) + \epsilon \le x_m(S) + \epsilon.$$
There also exists $B_n \in L(l_2, G)$ with $\mathrm{rank}(B_n) < n$ and
$$\|T(SR - A_m) - B_n\| \le a_n(T(SR - A_m)) + \epsilon \le x_n(T)\,\|SR - A_m\| + \epsilon.$$
Then $\mathrm{rank}(TA_m + B_n) < m + n - 1$, and so
$$a_{m+n-1}(TSR) \le \|T(SR - A_m) - B_n\| \le x_n(T)\,\|SR - A_m\| + \epsilon \le x_n(T)(x_m(S) + \epsilon) + \epsilon.$$
Taking the supremum as $R$ varies over the unit ball of $L(l_2, E)$,
$$x_{m+n-1}(TS) \le x_n(T)(x_m(S) + \epsilon) + \epsilon;$$
this holds for all $\epsilon > 0$, and so the inequality follows.

The next two inequalities then follow from the fact that
$$\sigma_{2j}(TS) \le \sigma_{2j-1}(TS) \le \sigma_j(T)\sigma_j(S).$$
Thus if we set $v_{2j-1} = v_{2j} = \sigma_j(T)\sigma_j(S)$ then $\prod_{j=1}^{J} \sigma_j(TS) \le \prod_{j=1}^{J} v_j$, and the remaining results follow from Proposition 7.6.3.
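In the Hilbert-space case, where $\sigma_n = s_n$, the inequality $\sigma_{m+n-1}(TS) \le \sigma_n(T)\sigma_m(S)$ is Horn's classical bound for singular numbers, and it can be spot-checked numerically. A sketch (NumPy; the seeded random matrices are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4))

s_S = np.linalg.svd(S, compute_uv=False)   # decreasing singular numbers
s_T = np.linalg.svd(T, compute_uv=False)
s_TS = np.linalg.svd(T @ S, compute_uv=False)

# s_{m+n-1}(TS) <= s_n(T) * s_m(S), for all valid 1-indexed m, n.
ok = all(s_TS[m + n - 2] <= s_T[n - 1] * s_S[m - 1] + 1e-12
         for m in range(1, 5) for n in range(1, 5) if m + n - 1 <= 4)
```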
We next consider the Gelfand and Weyl numbers of (p, 2)-summing operators. For this, we need the following elementary result.

Proposition 17.2.3 Suppose that $T \in L(H, F)$, where $H$ is a Hilbert space, and that $0 < \epsilon_n < 1$, for $n \in \mathbb{N}$. Then there exists an orthonormal sequence $(e_n)$ in $H$ such that $\|T(e_n)\| \ge (1 - \epsilon_n)c_n(T)$ for each $n$.
Proof This follows from an easy recursion argument. Choose a unit vector $e_1$ such that $\|T(e_1)\| > (1 - \epsilon_1)\|T\| = (1 - \epsilon_1)c_1(T)$. Suppose that we have found $e_1, \ldots, e_n$. If $G = \{e_1, \ldots, e_n\}^\perp$, then $\mathrm{codim}\, G = n$, so that there exists a unit vector $e_{n+1}$ in $G$ with $\|T(e_{n+1})\| > (1 - \epsilon_{n+1})c_{n+1}(T)$.
Corollary 17.2.1 If $T \in \Pi_{p,2}(H, F)$, where $2 \le p < \infty$, then
$$\Bigl(\sum_{n=1}^\infty (c_n(T))^p\Bigr)^{1/p} \le \pi_{p,2}(T).$$

Proof Suppose that $\epsilon > 0$. Let $(e_n)$ satisfy the conclusions of the proposition, with each $\epsilon_n = \epsilon$. If $N \in \mathbb{N}$ then
$$(1 - \epsilon)\Bigl(\sum_{n=1}^N (c_n(T))^p\Bigr)^{1/p} \le \Bigl(\sum_{n=1}^N \|T(e_n)\|^p\Bigr)^{1/p} \le \pi_{p,2}(T) \sup\Bigl\{\Bigl(\sum_{n=1}^N |\langle e_n, y\rangle|^2\Bigr)^{1/2} : \|y\| \le 1\Bigr\} \le \pi_{p,2}(T).$$
Since $\epsilon$ and $N$ are arbitrary, the inequality follows.
Corollary 17.2.2 If $T \in \Pi_{p,2}(E, F)$, where $E$ and $F$ are Banach spaces and $2 \le p < \infty$, then $x_n(T) \le \pi_{p,2}(T)/n^{1/p}$.

Proof Suppose that $S \in L(l_2, E)$ and that $\|S\| \le 1$. Then $\pi_{p,2}(TS) \le \pi_{p,2}(T)$, and so
$$c_n(TS) \le \Bigl(\frac 1n \sum_{i=1}^n c_i(TS)^p\Bigr)^{1/p} \le \frac{\pi_{p,2}(TS)}{n^{1/p}} \le \frac{\pi_{p,2}(T)}{n^{1/p}}.$$
The result follows on taking the supremum over all $S$ in the unit ball of $L(l_2, E)$.
17.3 Pietsch's inequality

We are now in a position to prove a fundamental inequality, which is the Banach space equivalent of Weyl's inequality.

Theorem 17.3.1 (Pietsch's inequality) Suppose that $T$ is a Riesz operator on a Banach space $(E, \|.\|_E)$. Then
$$\prod_{j=1}^{2n} |\lambda_j(T)| \le (2e)^n \Bigl(\prod_{j=1}^{n} x_j(T)\Bigr)^2, \qquad \prod_{j=1}^{2n+1} |\lambda_j(T)| \le (2e)^{n+1/2} \Bigl(\prod_{j=1}^{n} x_j(T)\Bigr)^2 x_{n+1}(T).$$
Proof We shall prove this for $2n$; the proof for $2n + 1$ is very similar. As in Sections 15.1 and 15.2, there exists a $T$-invariant $2n$-dimensional subspace $E_{2n}$ of $E$ for which $T_{2n} = T|_{E_{2n}}$ has eigenvalues $\lambda_1(T), \ldots, \lambda_{2n}(T)$. Note that $x_j(T_{2n}) \le x_j(T)$ for $1 \le j \le 2n$. Since $\pi_2(I_{E_{2n}}) = \sqrt{2n}$, the Pietsch factorization theorem tells us that there exists an isomorphism $S$ of $E_{2n}$ onto $l_2^{2n}$ with $\pi_2(S) = \sqrt{2n}$ and $\|S^{-1}\| = 1$. Let $R = S T_{2n} S^{-1} : l_2^{2n} \to l_2^{2n}$. Then $R$ and $T_{2n}$ are related operators, and so $R$ has the same eigenvalues as $T_{2n}$. Using Weyl's inequality and Proposition 17.2.2,
$$\prod_{j=1}^{2n} |\lambda_j(T)| = \prod_{j=1}^{2n} |\lambda_j(R)| \le \prod_{j=1}^{2n} s_j(R) \le \Bigl(\prod_{j=1}^{n} s_{2j-1}(R)\Bigr)^2 = \Bigl(\prod_{j=1}^{n} x_{2j-1}(S T_{2n})\Bigr)^2 \le \Bigl(\prod_{j=1}^{n} x_j(S)\, x_j(T)\Bigr)^2.$$
Now $x_j(S) \le \pi_2(S)/\sqrt j = (2n/j)^{1/2}$, by Corollary 17.2.2, and $\prod_{j=1}^n (2n/j) = 2^n n^n/n! \le (2e)^n$, since $n^n \le e^n n!$ (Exercise 3.5), so that
$$\prod_{j=1}^{2n} |\lambda_j(T)| \le (2e)^n \Bigl(\prod_{j=1}^{n} x_j(T)\Bigr)^2.$$
Corollary 17.3.1 (i) Suppose that $\Phi$ is an increasing function on $[0, \infty)$ and that $\Phi(e^t)$ is a convex function of $t$. Then
$$\sum_{j=1}^{2J} \Phi(|\lambda_j(T)|) \le 2\sum_{j=1}^{J} \Phi(\sqrt{2e}\, x_j(T)), \quad \text{for each } J.$$
In particular,
$$\sum_{j=1}^{2J} |\lambda_j(T)|^p \le 2(2e)^{p/2} \sum_{j=1}^{J} (x_j(T))^p, \quad \text{for } 0 < p < \infty, \text{ for each } J.$$
(ii) Suppose that $(X, \|.\|_X)$ is a symmetric Banach sequence space. If $(x_j(T)) \in X$ then $(\lambda_j(T)) \in X$ and $\|(\lambda_j(T))\|_X \le 2\sqrt{2e}\, \|(x_j(T))\|_X$.

Proof Let $y_{2j-1}(T) = y_{2j}(T) = \sqrt{2e}\, x_j(T)$. Then $\prod_{j=1}^{J} |\lambda_j(T)| \le \prod_{j=1}^{J} y_j(T)$, for each $J$, and the result follows from Proposition 7.6.3.
We use Weyl's inequality to establish the following inequality.

Theorem 17.3.2 If $T \in L(E, F)$ then
$$\prod_{j=1}^{2n} c_j(T) \le (4en)^n \Bigl(\prod_{j=1}^{n} x_j(T)\Bigr)^2.$$
Proof Suppose that $0 < \epsilon < 1$. A straightforward recursion argument shows that there exist unit vectors $z_j$ in $E$ and $\phi_j$ in $F^*$ such that $\phi_i(T(z_j)) = 0$ for $i < j$ and $|\phi_j(T(z_j))| \ge (1 - \epsilon)c_j(T)$. Let $A : l_2^{2n} \to E$ be defined by $A(e_j) = z_j$, let $B : F \to l_\infty^{2n}$ be defined by $(B(y))_j = \phi_j(y)$, let $I_{\infty,2}^{(2n)} : l_\infty^{2n} \to l_2^{2n}$ be the identity map and let $S_{2n} = I_{\infty,2}^{(2n)} B T A$. Then $\|A\| \le \sqrt{2n}$, since
$$\|A(\alpha)\| \le \sum_{j=1}^{2n} |\alpha_j| \cdot \|z_j\| \le \sqrt{2n}\, \|\alpha\|,$$
by the Cauchy–Schwarz inequality. Further, $\|B\| \le 1$ and $\pi_2(I_{\infty,2}^{(2n)}) = \sqrt{2n}$, so that $x_j(I_{\infty,2}^{(2n)} B) \le \sqrt{2n/j}$, for $1 \le j \le 2n$, by Corollary 17.2.2.

Now $S_{2n}$ is represented by a lower triangular matrix with diagonal entries $\phi_j(T(z_j))$, and so
$$(1 - \epsilon)^{2n} \prod_{j=1}^{2n} c_j(T) \le \prod_{j=1}^{2n} s_j(S_{2n}) \le \Bigl(\prod_{j=1}^{n} s_{2j-1}(S_{2n})\Bigr)^2,$$
by Weyl's inequality. But, arguing as in the proof of Pietsch's inequality,
$$s_{2j-1}(S_{2n}) \le \|A\|\, x_{2j-1}(I_{\infty,2}^{(2n)} B T) \le \sqrt{2n}\, x_j(I_{\infty,2}^{(2n)} B)\, x_j(T) \le (2n/\sqrt j)\, x_j(T),$$
so that
$$(1 - \epsilon)^{2n} \prod_{j=1}^{2n} c_j(T) \le \frac{(2n)^{2n}}{n!} \Bigl(\prod_{j=1}^{n} x_j(T)\Bigr)^2 \le (4en)^n \Bigl(\prod_{j=1}^{n} x_j(T)\Bigr)^2.$$
Since $\epsilon$ is arbitrary, the result follows.
Since $(2n)^{2n} \le e^{2n} \cdot (2n)!$ we have the following corollary.

Corollary 17.3.2
$$\prod_{j=1}^{2n} \bigl(c_j(T)/\sqrt j\bigr) \le 2^n e^{2n} \Bigl(\prod_{j=1}^{n} x_j(T)\Bigr)^2.$$

Applying Proposition 7.6.3, we deduce the following corollary.

Corollary 17.3.3
$$\sum_{j=1}^{\infty} (c_j(T))^2/j \le 2e^2 \sum_{j=1}^{\infty} (x_j(T))^2.$$

Corollary 17.3.4 If $\sum_{j=1}^{\infty} (x_j(T))^2 < \infty$ then $T$ is compact.

Proof For then $\sum_{j=1}^{\infty} (c_j(T))^2/j < \infty$, so that $c_j(T) \to 0$, and the result follows from Theorem 17.1.1.
17.4 Eigenvalues of p-summing and (p, 2)-summing endomorphisms

We now use these results to obtain information about the eigenvalues of p-summing and (p, 2)-summing endomorphisms of a complex Banach space.

Theorem 17.4.1 If $(E, \|.\|_E)$ is a complex Banach space and $T \in \Pi_2(E)$, then $T^2$ is compact, so that $T$ is a Riesz operator. Further, $(\sum_{j=1}^\infty |\lambda_j(T)|^2)^{1/2} \le \pi_2(T)$.

Proof Let $T = R j_2 i$ be a factorization, with $\|R\| = \pi_2(T)$, and let $S = j_2 i R$. Then $T$ and $S$ are related operators, and $S$ is a Hilbert–Schmidt operator with $\|S\|_2 \le \pi_2(T)$. As $T^2 = R S j_2 i$, $T^2$ is compact, and so $T$ is a Riesz operator. Since $T$ and $S$ are related,
$$\sum_{j=1}^\infty |\lambda_j(T)|^2 = \sum_{j=1}^\infty |\lambda_j(S)|^2 \le \|S\|_2^2 \le (\pi_2(T))^2.$$
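The Hilbert-space core of this estimate is Schur's inequality: for a Hilbert–Schmidt operator $S$, $\sum_j |\lambda_j(S)|^2 \le \|S\|_2^2$, with equality exactly when $S$ is normal. A quick numerical illustration (NumPy; the matrix is an arbitrary non-normal example, not from the text):

```python
import numpy as np

S = np.array([[1.0, 4.0],
              [0.0, 2.0]])                 # triangular, so eigenvalues 1, 2
eigvals = np.linalg.eigvals(S)

lhs = float(np.sum(np.abs(eigvals) ** 2))  # 1 + 4 = 5
rhs = float(np.sum(np.abs(S) ** 2))        # Hilbert-Schmidt norm squared: 21
```

The slack between 5 and 21 reflects how far this matrix is from normal.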
Theorem 17.4.2 If $T \in \Pi_{p,2}(E)$ and $m > p$ then $T^m$ is compact, and so $T$ is a Riesz operator.

Proof Using submultiplicativity, and applying Corollary 17.2.2,
$$x_{mn-1}(T^m) \le (x_n(T))^m \le (\pi_{p,2}(T))^m/n^{m/p},$$
and so $\sum_{j=1}^\infty (x_j(T^m))^2 < \infty$. The result follows from Corollary 17.3.4.
Corollary 17.4.1 Suppose that $T \in \Pi_{p,2}(E)$. Then
$$n^{1/p} |\lambda_n(T)| \le n^{1/p} \lambda_n^\dagger(T) \le 2p'\sqrt{2e}\, \pi_{p,2}(T),$$
where $\lambda_n^\dagger(T) = \frac 1n \sum_{j=1}^n |\lambda_j(T)|$.

Proof
$$n^{1/p} |\lambda_n(T)| \le n^{1/p} \lambda_n^\dagger(T) \le \|(\lambda(T))\|^*_{p,\infty} \le 2\sqrt{2e}\, \|(x(T))\|^*_{p,\infty} \quad \text{(by Corollary 17.3.1)}$$
$$\le 2p'\sqrt{2e}\, \|(x(T))\|_{p,\infty} \quad \text{(by Proposition 10.2.1)}$$
$$= 2p'\sqrt{2e}\, \sup_j j^{1/p} x_j(T) \le 2p'\sqrt{2e}\, \pi_{p,2}(T) \quad \text{(by Corollary 17.2.2)}.$$
Applying this to the identity mapping on a finite-dimensional space, we have the following, which complements Corollary 16.12.3.

Corollary 17.4.2 If $(E, \|.\|_E)$ is an $n$-dimensional normed space, then
$$\pi_{p,2}(E) \ge n^{1/p}/(2p'\sqrt{2e}).$$

If $T \in \Pi_p(E)$ for some $1 \le p \le 2$, then $T \in \Pi_2(E)$, and $T$ is a Riesz operator with $(\sum_{j=1}^\infty |\lambda_j(T)|^2)^{1/2} \le \pi_2(T) \le \pi_p(T)$ (Theorem 17.4.1). What happens when $2 < p < \infty$?
Theorem 17.4.3 If $T \in \Pi_p(E)$ for some $2 < p < \infty$, then $T$ is a Riesz operator and $(\sum_{j=1}^\infty |\lambda_j(T)|^p)^{1/p} \le \pi_p(T)$.

Proof Since $T \in \Pi_{p,2}(E)$, $T$ is a Riesz operator. Suppose that $p < r < \infty$. Then, by Corollary 17.4.1,
$$|\lambda_j(T)|^r \le (2p'\sqrt{2e}\, \pi_{p,2}(T))^r/j^{r/p} \le (2p'\sqrt{2e}\, \pi_p(T))^r/j^{r/p},$$
so that $\sum_{j=1}^\infty |\lambda_j(T)|^r \le C_r\, \pi_p(T)^r$, where $C_r = (2p'\sqrt{2e})^r\, r/(r - p)$.

Note that $C_r \to \infty$ as $r \to p$: this seems to be an unpromising approach. But let us set
$$D_r = \inf\Bigl\{C : \sum_{j=1}^\infty |\lambda_j(T)|^r \le C(\pi_p(T))^r,\ E \text{ a Banach space},\ T \in \Pi_p(E)\Bigr\}.$$
Then $1 \le D_r \le C_r$: we shall show that $D_r = 1$. Then
$$\Bigl(\sum_{j=1}^\infty |\lambda_j(T)|^p\Bigr)^{1/p} = \lim_{r \searrow p} \Bigl(\sum_{j=1}^\infty |\lambda_j(T)|^r\Bigr)^{1/r} \le \pi_p(T).$$
In order to show that $D_r = 1$, we consider tensor products. Suppose that $E$ and $F$ are Banach spaces. Then an element $t = \sum_{j=1}^n x_j \otimes y_j$ of $E \otimes F$ defines an element $T_t$ of $L(E^*, F)$: $T_t(\phi) = \sum_{j=1}^n \phi(x_j) y_j$. We give $t$ the corresponding operator norm:
$$\|t\|_\vee = \|T_t\| = \sup\Bigl\{\Bigl\|\sum_{j=1}^n \phi(x_j) y_j\Bigr\|_F : \|\phi\|_{E^*} \le 1\Bigr\} = \sup\Bigl\{\Bigl|\sum_{j=1}^n \phi(x_j)\psi(y_j)\Bigr| : \|\phi\|_{E^*} \le 1,\ \|\psi\|_{F^*} \le 1\Bigr\}.$$
This is the injective norm on $E \otimes F$. We denote the completion of $E \otimes F$ under this norm by $E \mathbin{\check\otimes} F$. If $S \in L(E_1, E_2)$ and $T \in L(F_1, F_2)$ and $t = \sum_{j=1}^n x_j \otimes y_j$ we set $(S \otimes T)(t) = \sum_{j=1}^n S(x_j) \otimes T(y_j)$. Then it follows from the definition that $\|(S \otimes T)(t)\|_\vee \le \|S\|\,\|T\|\,\|t\|_\vee$.
Proposition 17.4.1 Suppose that $i_1 : E_1 \to C(K_1)$ and $i_2 : E_2 \to C(K_2)$ are isometries. If $t = \sum_{j=1}^n x_j \otimes y_j \in E_1 \otimes E_2$, let
$$I(t)(k_1, k_2) = \sum_{j=1}^n i_1(x_j)(k_1)\, i_2(y_j)(k_2) \in C(K_1 \times K_2).$$
Then $\|I(t)\|_\infty = \|t\|_\vee$, so that $I$ extends to an isometry of $E_1 \mathbin{\check\otimes} E_2$ into $C(K_1 \times K_2)$.
Proof Let $f_j = i_1(x_j)$, $g_j = i_2(y_j)$. Since
$$|I(t)(k_1, k_2)| = \Bigl|\sum_{j=1}^n \delta_{k_1}(f_j)\, \delta_{k_2}(g_j)\Bigr| \le \|t\|_\vee,$$
we have $\|I(t)\|_\infty \le \|t\|_\vee$. If, for $k = 1, 2$, $\phi_k \in E_k^*$ and $\|\phi_k\|_{E_k^*} = 1$, then by the Hahn–Banach theorem, $\phi_k$ extends, without increase of norm, to a continuous linear functional on $C(K_k)$, and by the Riesz representation theorem this is given by $h_k \, d\mu_k$, where $\mu_k$ is a Baire probability measure and $|h_k| = 1$. Thus
$$\Bigl|\sum_{j=1}^n \phi_1(x_j)\phi_2(y_j)\Bigr| = \Bigl|\int_{K_1} \Bigl(\int_{K_2} \sum_{j=1}^n f_j(k_1) g_j(k_2)\, h_2(k_2)\, d\mu_2\Bigr) h_1(k_1)\, d\mu_1\Bigr|$$
$$= \Bigl|\int_{K_1} \Bigl(\int_{K_2} I(t)\, h_2(k_2)\, d\mu_2\Bigr) h_1(k_1)\, d\mu_1\Bigr| \le \int_{K_1} \Bigl(\int_{K_2} |I(t)|\, d\mu_2\Bigr) d\mu_1 \le \|I(t)\|_\infty.$$
Consequently $\|t\|_\vee \le \|I(t)\|_\infty$.
Theorem 17.4.4 Suppose that $1 \le p < \infty$ and that $T_1 \in \Pi_p(E_1, F_1)$, $T_2 \in \Pi_p(E_2, F_2)$. Then $T_1 \otimes T_2 \in \Pi_p(E_1 \mathbin{\check\otimes} E_2, F_1 \mathbin{\check\otimes} F_2)$ and
$$\pi_p(T_1 \otimes T_2) \le \pi_p(T_1)\, \pi_p(T_2).$$
Proof Let $i_1 : E_1 \to C(K_1)$ and $i_2 : E_2 \to C(K_2)$ be isometric embeddings, and let $I : E_1 \mathbin{\check\otimes} E_2 \to C(K_1 \times K_2)$ be the corresponding embedding. By Pietsch's domination theorem, there exist, for $k = 1, 2$, probability measures $\mu_k$ on the Baire sets of $K_k$ such that
$$\|T_k(x)\| \le \pi_p(T_k) \Bigl(\int_{K_k} |i_k(x)|^p \, d\mu_k\Bigr)^{1/p}.$$
Now let $\mu = \mu_1 \times \mu_2$ be the product measure on $K_1 \times K_2$. Suppose that $t = \sum_{j=1}^n x_j \otimes y_j$ and $\psi \in B_{F_1^*}$, $\chi \in B_{F_2^*}$. Let $f_j = i_1(x_j)$, $g_j = i_2(y_j)$. Then
$$\Bigl|\sum_{j=1}^n \psi(T_1(x_j))\, \chi(T_2(y_j))\Bigr| = \Bigl|\psi\Bigl(T_1\Bigl(\sum_{j=1}^n \chi(T_2(y_j))\, x_j\Bigr)\Bigr)\Bigr| \le \Bigl\|T_1\Bigl(\sum_{j=1}^n \chi(T_2(y_j))\, x_j\Bigr)\Bigr\|$$
$$\le \pi_p(T_1) \Bigl(\int_{K_1} \Bigl|\sum_{j=1}^n \chi(T_2(y_j))\, f_j(k_1)\Bigr|^p \, d\mu_1(k_1)\Bigr)^{1/p}$$
$$\le \pi_p(T_1) \Bigl(\int_{K_1} \Bigl\|T_2\Bigl(\sum_{j=1}^n f_j(k_1)\, y_j\Bigr)\Bigr\|^p \, d\mu_1(k_1)\Bigr)^{1/p}$$
$$\le \pi_p(T_1)\, \pi_p(T_2) \Bigl(\int_{K_1} \int_{K_2} \Bigl|\sum_{j=1}^n f_j(k_1)\, g_j(k_2)\Bigr|^p \, d\mu_2(k_2)\, d\mu_1(k_1)\Bigr)^{1/p}$$
$$= \pi_p(T_1)\, \pi_p(T_2) \Bigl(\int_{K_1 \times K_2} |I(t)|^p \, d\mu\Bigr)^{1/p}.$$
Thus $\|(T_1 \otimes T_2)(t)\|_\vee \le \pi_p(T_1)\,\pi_p(T_2) (\int_{K_1 \times K_2} |I(t)|^p \, d\mu)^{1/p}$, and this inequality extends by continuity to any $t \in E_1 \mathbin{\check\otimes} E_2$.
We now complete the proof of Theorem 17.4.3. We consider $T \otimes T$. If $\lambda_1, \lambda_2$ are eigenvalues of $T$ then $\lambda_1\lambda_2$ is an eigenvalue of $T \otimes T$, whose generalized eigenspace contains
$$\bigoplus\{G_\mu \otimes G_\nu : \mu, \nu \text{ eigenvalues of } T,\ \mu\nu = \lambda_1\lambda_2\},$$
and so
$$\Bigl(\sum_{j=1}^\infty |\lambda_j(T)|^r\Bigr)^2 \le \sum_{j=1}^\infty |\lambda_j(T \otimes T)|^r \le D_r\, \pi_p(T \otimes T)^r \le D_r\, (\pi_p(T))^{2r}.$$
Thus $D_r \le D_r^{1/2}$, and $D_r = 1$.
17.5 Notes and remarks

Detailed accounts of the distribution of eigenvalues are given in [Kön 86] and [Pie 87]; the latter also contains a fascinating historical survey.

Theorem 17.1.1 was proved by Lacey [Lac 63]. Enflo [Enf 73] gave the first example of a compact operator which could not be approximated in norm by operators of finite rank; this was a problem which went back to Banach.
Exercises

17.1 Verify the calculations that follow Proposition 17.1.1.

17.2 Suppose that $(\Omega, \Sigma, \mu)$ is a measure space, and that $1 < p < \infty$. Suppose that $K$ is a measurable kernel such that
$$\|K\|_p = \Bigl(\int_\Omega \Bigl(\int_\Omega |K(\omega_1, \omega_2)|^{p'} \, d\mu(\omega_2)\Bigr)^{p/p'} d\mu(\omega_1)\Bigr)^{1/p} < \infty.$$
Show that $K$ defines an operator $T_K$ in $L(L^p(\Omega, \Sigma, \mu))$ with $\|T_K\| \le \|K\|_p$. Show that $T_K$ is a Riesz operator, and that if $1 < p \le 2$ then $\sum_{k=1}^\infty |\lambda_k(T_K)|^2 \le \|K\|_p^2$, while if $2 < p < \infty$ then $\sum_{k=1}^\infty |\lambda_k(T_K)|^p \le \|K\|_p^p$.

17.3 Let $(\Omega, \Sigma, \mu)$ be $\mathbb{T}$, with Haar measure. Suppose that $2 < p < \infty$ and that $f \in L^{p'}$. Let $K(s, t) = f(s - t)$. Show that $K$ satisfies the conditions of the preceding exercise. What are the eigenvectors and eigenvalues of $T_K$? What conclusion do you draw from the preceding exercise?
18
Grothendiecks inequality, type and cotype
18.1 Littlewoods 4/3 inequality
In the previous chapter, we saw that $p$-summing and $(p,2)$-summing properties of a linear operator can give useful information about its structure. Pietsch's factorization theorem shows that if $\mu$ is a probability measure on the Baire sets of a compact Hausdorff space $K$ and $1 \le p < \infty$ then the natural mapping $j_p: C(K) \to L^p(\mu)$ is $p$-summing. This implies that $C(K)$ and $L^p(\mu)$ are very different. In this chapter, we shall explore this idea further, and obtain more examples of $p$-summing and $(p,2)$-summing mappings.

We consider inequalities between norms on the space $M_{m,n} = M_{m,n}(\mathbf{R})$ or $M_{m,n}(\mathbf{C})$ of real or complex $m \times n$ matrices. Suppose that $A = (a_{ij}) \in M_{m,n}$. Our main object of study will be the norm

$$\|A\| = \sup\Big\{\sum_{i=1}^m \Big|\sum_{j=1}^n a_{ij}t_j\Big| : |t_j| \le 1\Big\} = \sup\Big\{\Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}s_it_j\Big| : |s_i| \le 1,\ |t_j| \le 1\Big\}.$$

$\|A\|$ is simply the operator norm of the operator $T_A: l^n_\infty \to l^m_1$ defined by $T_A(t) = (\sum_{j=1}^n a_{ij}t_j)_{i=1}^m$, for $t = (t_1, \ldots, t_n) \in l^n_\infty$. In this section, we restrict attention to the real case, where

$$\|A\| = \sup\Big\{\sum_{i=1}^m \Big|\sum_{j=1}^n a_{ij}t_j\Big| : t_j = \pm 1\Big\} = \sup\Big\{\sum_{i=1}^m \sum_{j=1}^n a_{ij}s_it_j : s_i = \pm 1,\ t_j = \pm 1\Big\}.$$
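Since the book contains no code, the following small Python sketch is an addition of ours, not part of the text; the function names are hypothetical. It evaluates the norm $\|A\|$ by brute force and checks, for a sample matrix, that the two suprema coincide: both are attained at sign vectors, because $\sum_i |u_i| = \sup_{s_i = \pm 1} \sum_i s_iu_i$.

```python
from itertools import product

def norm_A(A):
    # operator norm of T_A : l_inf^n -> l_1^m; the sup over |t_j| <= 1
    # is attained at the extreme points t in {-1, +1}^n
    m, n = len(A), len(A[0])
    return max(sum(abs(sum(A[i][j] * t[j] for j in range(n)))
                   for i in range(m))
               for t in product([-1, 1], repeat=n))

def norm_A_bilinear(A):
    # the second formula: sup over s in {-1,+1}^m and t in {-1,+1}^n
    m, n = len(A), len(A[0])
    return max(sum(A[i][j] * s[i] * t[j] for i in range(m) for j in range(n))
               for s in product([-1, 1], repeat=m)
               for t in product([-1, 1], repeat=n))

A = [[1.0, 2.0, -1.0],
     [0.5, -1.0, 3.0]]
```

The exponential cost in $m$ and $n$ is harmless here, since the sketch is only meant to illustrate the definition on small matrices.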
We set $a_i = (a_{ij})_{j=1}^n$, so that $a_i \in \mathbf{R}^n$. The following inequalities are due to Littlewood and Orlicz.

Proposition 18.1.1 If $A \in M_{m,n}(\mathbf{R})$ then $\sum_{i=1}^m \|a_i\|_2 \le \sqrt{2}\,\|A\|$ (Littlewood) and $(\sum_{i=1}^m \|a_i\|_1^2)^{1/2} \le \sqrt{2}\,\|A\|$ (Orlicz).
Proof Using Khintchine's inequality,

$$\sum_{i=1}^m \|a_i\|_2 = \sum_{i=1}^m \Big(\sum_{j=1}^n |a_{ij}|^2\Big)^{1/2} \le \sqrt{2}\sum_{i=1}^m \mathbf{E}\Big(\Big|\sum_{j=1}^n \epsilon_ja_{ij}\Big|\Big) = \sqrt{2}\,\mathbf{E}\Big(\sum_{i=1}^m \Big|\sum_{j=1}^n \epsilon_ja_{ij}\Big|\Big) \le \sqrt{2}\,\|A\|.$$

Similarly $\sum_{j=1}^n (\sum_{i=1}^m |a_{ij}|^2)^{1/2} \le \sqrt{2}\,\|A\|$. Orlicz's inequality now follows by applying Corollary 5.4.2.
As a corollary, we obtain Littlewood's 4/3 inequality; it was for this that he proved Khintchine's inequality.

Corollary 18.1.1 (Littlewood's 4/3 inequality) If $A \in M_{m,n}(\mathbf{R})$ then $(\sum_{i,j} |a_{ij}|^{4/3})^{3/4} \le \sqrt{2}\,\|A\|$.
Proof We use Hölder's inequality twice.

$$\sum_{i,j} |a_{ij}|^{4/3} = \sum_i \sum_j |a_{ij}|^{2/3}|a_{ij}|^{2/3} \le \sum_i \Big(\sum_j |a_{ij}|^2\Big)^{1/3}\Big(\sum_j |a_{ij}|\Big)^{2/3}$$

$$\le \Big(\sum_i \Big(\sum_j |a_{ij}|^2\Big)^{1/2}\Big)^{2/3}\Big(\sum_i \Big(\sum_j |a_{ij}|\Big)^2\Big)^{1/3} = \Big(\sum_i \|a_i\|_2\Big)^{2/3}\Big(\sum_i \|a_i\|_1^2\Big)^{1/3} \le \big(\sqrt{2}\,\|A\|\big)^{4/3}.$$
The exponent 4/3 is best possible. To see this, let $A$ be an $n \times n$ Hadamard matrix. Then $(\sum_{i,j} |a_{ij}|^p)^{1/p} = n^{2/p}$, while if $\|t\|_\infty = 1$ then, since the $a_i$ are orthogonal,

$$\sum_i \Big|\sum_j a_{ij}t_j\Big| \le \sqrt{n}\,\Big(\sum_i \Big(\sum_j a_{ij}t_j\Big)^2\Big)^{1/2} = \sqrt{n}\,\Big(\sum_i \langle a_i, t\rangle^2\Big)^{1/2} = n\|t\|_2 \le n^{3/2}.$$
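This sharpness computation can be checked numerically. The sketch below is our addition, not the book's: it builds a Sylvester-Hadamard matrix and compares $(\sum_{i,j}|a_{ij}|^{4/3})^{3/4} = n^{3/2}$ with the norm $\|A\|$, which equals $n^{3/2}$ here, so Littlewood's inequality is tight up to the factor $\sqrt{2}$.

```python
from itertools import product

def hadamard(k):
    # Sylvester construction: a 2^k x 2^k matrix with +-1 entries
    # and mutually orthogonal rows
    H = [[1]]
    for _ in range(k):
        H = ([row + row for row in H]
             + [row + [-x for x in row] for row in H])
    return H

def op_norm(A):
    # operator norm l_inf^n -> l_1^m, attained at sign vectors
    m, n = len(A), len(A[0])
    return max(sum(abs(sum(A[i][j] * t[j] for j in range(n)))
                   for i in range(m))
               for t in product([-1, 1], repeat=n))

H = hadamard(2)                      # n = 4
lhs = sum(abs(a) ** (4 / 3) for row in H for a in row) ** (3 / 4)  # n^{3/2}
norm = op_norm(H)                    # n^{3/2} = 8 by orthogonality
```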
18.2 Grothendieck's inequality

We now come to Grothendieck's inequality. We set

$$g(A) = \sup\Big\{\sum_{i=1}^m \Big\|\sum_{j=1}^n a_{ij}k_j\Big\|_H : k_j \in H,\ \|k_j\| \le 1\Big\} = \sup\Big\{\Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}\langle h_i, k_j\rangle\Big| : h_i, k_j \in H,\ \|h_i\| \le 1,\ \|k_j\| \le 1\Big\},$$

where $H$ is a real or complex Hilbert space. $g(A)$ is the operator norm of the operator $T_A: l^n_\infty(H) \to l^m_1(H)$ defined by $T_A(k) = (\sum_{j=1}^n a_{ij}k_j)_{i=1}^m$ for $k = (k_1, \ldots, k_n) \in l^n_\infty(H)$.
Theorem 18.2.1 (Grothendieck's inequality) There exists a constant $C$, independent of $m$ and $n$, such that if $A \in M_{m,n}$ then $g(A) \le C\|A\|$.

The smallest value of the constant $C$ is denoted by $K_G = K_G(\mathbf{R})$ or $K_G(\mathbf{C})$, and is called Grothendieck's constant. The exact values are not known, but it is known that $1.338 \le K_G(\mathbf{C}) \le 1.405$ and that $\pi/2 = 1.571 \le K_G(\mathbf{R}) \le 1.782 = \pi/(2\sinh^{-1}(1))$.
Proof There are several proofs of this inequality. We shall give two, neither of which is the proof given by Grothendieck, and neither of which gives good values for the constants.

We begin by giving what is probably the shortest and easiest proof. Let $K_{m,n} = \sup\{g(A) : A \in M_{m,n},\ \|A\| \le 1\}$.
If $\|A\| \le 1$ then $\sum_{i=1}^m |a_{ij}| \le 1$ for each $j$, and so $g(A) \le n$; we need to show that there is a constant $C$, independent of $m$ and $n$, such that $K_{m,n} \le C$.

We can suppose that $H$ is an infinite-dimensional separable Hilbert space. Since all such spaces are isometrically isomorphic, we can suppose that $H$ is a Gaussian Hilbert space, a subspace of $L^2(\Omega, \Sigma, \mathbf{P})$. (Recall that $H$ is a closed linear subspace of $L^2(\Omega, \Sigma, \mathbf{P})$ with the property that if $h \in H$ then $h$ has a normal, or Gaussian, distribution with mean 0 and variance $\|h\|_2^2$; such a space can be obtained by taking the closed linear span of a sequence of independent standard Gaussian random variables.) The random variables $h_i$ and $k_j$ are then unbounded random variables; the idea of the proof is to truncate them at a judiciously chosen level. Suppose that $0 < \epsilon < 1/2$. There exists $M$ such that if $h \in H$ and $\|h\| = 1$ then $\int_{|h| > M} |h|^2\, d\mathbf{P} = \epsilon^2$. If $h \in H$, let $h^M = hI_{(|h| \le M\|h\|)}$. Then $\|h - h^M\| = \epsilon\|h\|$.

If $\|A\| \le 1$ and $\|h_i\|_H \le 1$, $\|k_j\|_H \le 1$ then

$$\Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}\langle h_i, k_j\rangle\Big| \le \Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}\langle h^M_i, k^M_j\rangle\Big| + \Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}\langle h_i - h^M_i, k^M_j\rangle\Big| + \Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}\langle h_i, k_j - k^M_j\rangle\Big|.$$

Now

$$\Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}\langle h^M_i, k^M_j\rangle\Big| = \Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}\int_\Omega h^M_i(\omega)\overline{k^M_j(\omega)}\, d\mathbf{P}(\omega)\Big| \le M^2,$$

while

$$\Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}\langle h_i - h^M_i, k^M_j\rangle\Big| \le \epsilon K_{m,n} \quad \text{and} \quad \Big|\sum_{i=1}^m \sum_{j=1}^n a_{ij}\langle h_i, k_j - k^M_j\rangle\Big| \le \epsilon K_{m,n},$$

so that $K_{m,n} \le M^2 + 2\epsilon K_{m,n}$, and $K_{m,n} \le M^2/(1 - 2\epsilon)$.

For example, in the real case if $M = 3$ then $\epsilon \approx 0.16$ and $K_G \le 13.5$.
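The arithmetic of the truncation step can be made concrete. The sketch below is our addition, and the numerical values are our own computation rather than the book's: for a real standard Gaussian $g$, $\epsilon(M)^2 = \mathbf{E}(g^2; |g| > M)$, and the bound $M^2/(1 - 2\epsilon)$ is minimized over a grid of truncation levels $M$; the minimum lies near $M \approx 3$.

```python
import math

def eps(M):
    # eps(M)^2 = E(g^2; |g| > M) for a real standard Gaussian g;
    # integration by parts gives E(g^2; |g| > M) = 2(M phi(M) + Q(M)),
    # with phi the normal density and Q(M) = P(g > M)
    phi = math.exp(-M * M / 2) / math.sqrt(2 * math.pi)
    Q = 0.5 * math.erfc(M / math.sqrt(2))
    return math.sqrt(2 * (M * phi + Q))

def bound(M):
    e = eps(M)
    return M * M / (1 - 2 * e) if e < 0.5 else float("inf")

# minimize the bound M^2 / (1 - 2 eps(M)) over a grid of levels M
best = min(bound(1 + 0.01 * i) for i in range(400))
```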
18.3 Grothendieck's theorem

The following theorem is the first and most important consequence of Grothendieck's inequality.

Theorem 18.3.1 (Grothendieck's theorem) If $T \in L(L^1(\Omega, \Sigma, \mu), H)$, where $H$ is a Hilbert space, then $T$ is absolutely summing and $\pi_1(T) \le K_G\|T\|$.

Proof By Theorem 16.3.1, it is enough to consider simple functions $f_1, \ldots, f_n$ with

$$\sup\Big\{\Big\|\sum_{j=1}^n b_jf_j\Big\|_1 : |b_j| \le 1\Big\} \le 1.$$
We can write

$$f_j = \sum_{i=1}^m c_{ij}I_{A_i} = \sum_{i=1}^m a_{ij}g_i,$$

where $A_1, \ldots, A_m$ are disjoint sets of positive measure, and where $g_i = I_{A_i}/\mu(A_i)$, so that $\|g_i\|_1 = 1$. Let $h_i = T(g_i)$, so that $\|h_i\|_H \le \|T\|$. Then

$$\sum_{j=1}^n \|T(f_j)\|_H = \sum_{j=1}^n \Big\|\sum_{i=1}^m a_{ij}h_i\Big\|_H \le g(A)\|T\| \le K_G\|A\|\,\|T\|,$$

where $A$ is the matrix $(a_{ij})$. But if $|t_j| \le 1$ for $1 \le j \le n$ then

$$\sum_{i=1}^m \Big|\sum_{j=1}^n a_{ij}t_j\Big| = \Big\|\sum_{j=1}^n t_jf_j\Big\|_1 \le 1,$$

so that $\|A\| \le 1$.
Grothendieck's theorem is essentially equivalent to Grothendieck's inequality. For suppose that we know that $\pi_1(S) \le K\|S\|$ for each $S \in L(l_1, H)$, and suppose that $A \in M_{m,n}$. If $h_1, \ldots, h_m$ are in the unit ball of $H$, let $S: l_1 \to H$ be defined by $S(z) = \sum_{i=1}^m z_ih_i$. Then $\|S\| \le 1$, so that $\pi_1(ST_A) \le \pi_1(S)\|T_A\| \le K\|A\|$. But then

$$\sum_{j=1}^n \Big\|\sum_{i=1}^m a_{ij}h_i\Big\| = \sum_{j=1}^n \|ST_A(e_j)\| \le \pi_1(ST_A)\sup\Big\{\Big\|\sum_{j=1}^n b_je_j\Big\|_\infty : |b_j| \le 1 \text{ for } 1 \le j \le n\Big\} \le K\|A\|.$$
18.4 Another proof, using Paley's inequality

It is of interest to give a direct proof of Grothendieck's theorem for operators in $L(l_1, H)$, and this was done by Pełczyński and Wojtaszczyk [Pel 77]. It is essentially a complex proof, but the real version then follows from it. It uses an interesting inequality of Paley.

Recall that if $1 \le p < \infty$ then

$$H^p = \Big\{f : f \text{ analytic on } D,\ \|f\|_p = \sup_{0 \le r < 1}\Big(\frac{1}{2\pi}\int_0^{2\pi} |f(re^{i\theta})|^p\, d\theta\Big)^{1/p} < \infty\Big\},$$

and that $A(D) = \{f \in C(\bar{D}) : f \text{ analytic on } D\}$. We give $A(D)$ the supremum norm. If $f \in H^p$ or $A(D)$ we can write $f(z) = \sum_{n=0}^\infty \hat{f}_nz^n$, for $z \in D$. If $f \in H^2$, then $\|f\|_{H^2} = (\sum_{n=0}^\infty |\hat{f}_n|^2)^{1/2}$.

Theorem 18.4.1 (Paley's inequality) If $f \in H^1$ then $(\sum_{k=0}^\infty |\hat{f}_{2^k-1}|^2)^{1/2} \le 2\|f\|_1$.

Proof We use the fact that if $f \in H^1$ then we can write $f = bg$, where $b$ is a Blaschke product (a bounded function on $D$ for which $\lim_{r \uparrow 1} |b(re^{i\theta})| = 1$ for almost all $\theta$), and $g$ is a function in $H^1$ with no zeros in $D$. From this it follows that $g$ has a square root in $H^2$: there exists $h \in H^2$ with $h^2 = g$. Thus, setting $k = bh$, we can write $f = hk$, where $h, k \in H^2$ and $\|f\|_1 = \|h\|_2\|k\|_2$. For all this, see [Dur 70].
Thus $\hat{f}_n = \sum_{j=0}^n \hat{h}_j\hat{k}_{n-j}$, and so

$$\sum_{k=0}^\infty |\hat{f}_{2^k-1}|^2 \le \sum_{k=0}^\infty \Big(\sum_{j=0}^{2^k-1} |\hat{h}_j||\hat{k}_{2^k-1-j}|\Big)^2 = \sum_{k=0}^\infty \Big(\sum_{j=0}^{2^{k-1}-1} |\hat{h}_j||\hat{k}_{2^k-1-j}| + \sum_{j=0}^{2^{k-1}-1} |\hat{h}_{2^k-1-j}||\hat{k}_j|\Big)^2$$

$$\le 2\sum_{k=0}^\infty \Big(\Big(\sum_{j=0}^{2^{k-1}-1} |\hat{h}_j||\hat{k}_{2^k-1-j}|\Big)^2 + \Big(\sum_{j=0}^{2^{k-1}-1} |\hat{h}_{2^k-1-j}||\hat{k}_j|\Big)^2\Big).$$

By the Cauchy-Schwarz inequality,

$$\Big(\sum_{j=0}^{2^{k-1}-1} |\hat{h}_j||\hat{k}_{2^k-1-j}|\Big)^2 \le \Big(\sum_{j=0}^{2^{k-1}-1} |\hat{h}_j|^2\Big)\Big(\sum_{j=2^{k-1}}^{2^k-1} |\hat{k}_j|^2\Big) \le \|h\|_2^2\Big(\sum_{j=2^{k-1}}^{2^k-1} |\hat{k}_j|^2\Big),$$

so that

$$\sum_{k=0}^\infty \Big(\sum_{j=0}^{2^{k-1}-1} |\hat{h}_j||\hat{k}_{2^k-1-j}|\Big)^2 \le \|h\|_2^2\|k\|_2^2;$$

similarly

$$\sum_{k=0}^\infty \Big(\sum_{j=0}^{2^{k-1}-1} |\hat{h}_{2^k-1-j}||\hat{k}_j|\Big)^2 \le \|h\|_2^2\|k\|_2^2,$$

and so $\sum_{k=0}^\infty |\hat{f}_{2^k-1}|^2 \le 4\|h\|_2^2\|k\|_2^2$.
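Paley's inequality can be sanity-checked numerically on polynomials, which lie in $H^1$. The sketch below is our addition: it approximates $\|f\|_1$ by averaging $|f|$ over roots of unity and compares it with the lacunary coefficient sum.

```python
import cmath
import random

def h1_norm(coeffs, samples=4096):
    # approximate ||f||_1 = (1/2pi) int |f(e^{i theta})| d theta for a
    # polynomial with the given Taylor coefficients, by averaging |f|
    # over roots of unity (Horner evaluation)
    total = 0.0
    for s in range(samples):
        z = cmath.exp(2j * cmath.pi * s / samples)
        v = 0j
        for c in reversed(coeffs):
            v = v * z + c
        total += abs(v)
    return total / samples

def paley_ratio(coeffs):
    # (sum_k |f_hat_{2^k - 1}|^2)^{1/2} / ||f||_1, which Paley bounds by 2
    lac = 0.0
    k = 0
    while 2 ** k - 1 < len(coeffs):
        lac += abs(coeffs[2 ** k - 1]) ** 2
        k += 1
    return lac ** 0.5 / h1_norm(coeffs)

random.seed(0)
coeffs = [complex(random.uniform(-1, 1), random.uniform(-1, 1))
          for _ in range(32)]
ratio = paley_ratio(coeffs)
```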
We also need the following surjection theorem.

Theorem 18.4.2 If $y \in l_2$, there exists $f \in A(D)$ with $\|f\|_\infty \le \sqrt{e}\,\|y\|_2$ such that $\hat{f}_{2^k-1} = y_k$ for $k = 0, 1, \ldots$.
Proof We follow the proof of Fournier [Fou 74]. By homogeneity, it is enough to prove the result when $\|y\|_2 = 1$. Note that

$$\log\Big(\prod_{k=0}^\infty (1 + |y_k|^2)\Big) = \sum_{k=0}^\infty \log(1 + |y_k|^2) < \sum_{k=0}^\infty |y_k|^2 = 1,$$

so that $\prod_{k=0}^\infty (1 + |y_k|^2) < e$.

First we consider sequences of finite support. We show that if $y_k = 0$ for $k > K$ then there exists $f(z) = \sum_{j=0}^{2^K-1} \hat{f}_jz^j$ with $\hat{f}_{2^k-1} = y_k$ for $k = 0, 1, \ldots, K$ and $\|f\|_\infty^2 \le \prod_{k=0}^K (1 + |y_k|^2)$. Let us set $f^{(0)}(z) = y_0$ and $g^{(0)}(z) = 1$, and define $f^{(1)}, \ldots, f^{(K)}$ and $g^{(1)}, \ldots, g^{(K)}$ recursively by setting

$$\begin{pmatrix} f^{(k)}(z) \\ g^{(k)}(z) \end{pmatrix} = \begin{pmatrix} 1 & y_kz^{2^k-1} \\ -\bar{y}_kz^{-(2^k-1)} & 1 \end{pmatrix}\begin{pmatrix} f^{(k-1)}(z) \\ g^{(k-1)}(z) \end{pmatrix} = M_k\begin{pmatrix} f^{(k-1)}(z) \\ g^{(k-1)}(z) \end{pmatrix},$$

for $z \ne 0$.

Now if $|z| = 1$ then $M_kM_k^* = (1 + |y_k|^2)I_2$, so that

$$|f^{(k)}(z)|^2 + |g^{(k)}(z)|^2 = (1 + |y_k|^2)(|f^{(k-1)}(z)|^2 + |g^{(k-1)}(z)|^2) = \prod_{j=0}^k (1 + |y_j|^2).$$

It also follows inductively that $f^{(k)}$ is a polynomial of degree $2^k - 1$ in $z$, and $g^{(k)}$ is a polynomial of degree $2^k - 1$ in $z^{-1}$. Thus $f^{(k)} \in A(D)$ and $\|f^{(k)}\|_\infty^2 \le \prod_{j=0}^k (1 + |y_j|^2)$. Further, $f^{(k)} = f^{(k-1)} + y_kz^{2^k-1}g^{(k-1)}$, and $y_kz^{2^k-1}g^{(k-1)}$ is a polynomial in $z$ whose non-zero coefficients lie in the range $[2^{k-1}, 2^k - 1]$. Thus there is no cancellation of coefficients in the iteration, and so $(\hat{f}^{(k)})_{2^j-1} = y_j$ for $0 \le j \le k$. Thus the result is established for sequences of finite support.

Now suppose that $y \in l_2$ and that $\|y\| = 1$. Let $\prod_{k=0}^\infty (1 + |y_k|^2) = \eta^2e$, so that $0 < \eta < 1$. There exists an increasing sequence $(k_j)_{j=0}^\infty$ of indices such that $\sum_{n=k_j+1}^\infty |y_n|^2 < (1 - \eta)^2/4^{j+1}$. Let

$$a^{(0)} = \sum_{i=0}^{k_0} y_ie_i \quad \text{and} \quad a^{(j)} = \sum_{i=k_{j-1}+1}^{k_j} y_ie_i \quad \text{for } j > 0.$$

Then there exist polynomials $f_j$ with $(\hat{f}_j)_{2^k-1} = a^{(j)}_k$ for all $k$, and with

$$\|f_0\|_\infty \le \Big(\prod_{k=0}^{k_0} (1 + |y_k|^2)\Big)^{1/2} \le \eta\sqrt{e}, \qquad \|f_j\|_\infty \le (1 - \eta)\sqrt{e}/2^j \text{ for } j > 0.$$

Then $\sum_{j=0}^\infty f_j$ converges in norm in $A(D)$ to $f$ say, with $\|f\|_\infty \le \sqrt{e}$, and $\hat{f}_{2^k-1} = y_k$ for $0 \le k < \infty$.
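Fournier's recursion for sequences of finite support is directly implementable. The following sketch is our addition; it represents $f^{(k)}$ and $g^{(k)}$ as dictionaries mapping powers of $z$ to coefficients, and checks that the lacunary coefficients reproduce $y_k$ exactly and that the sup-norm bound $\|f\|_\infty^2 \le \prod_k(1 + |y_k|^2)$ holds on sampled points of the circle.

```python
import cmath
import math

def fournier(y):
    # the recursion f^{(k)} = f^{(k-1)} + y_k z^{2^k - 1} g^{(k-1)},
    # g^{(k)} = -conj(y_k) z^{-(2^k - 1)} f^{(k-1)} + g^{(k-1)};
    # f is stored as {power: coefficient} (non-negative powers),
    # g likewise (non-positive powers)
    f = {0: complex(y[0])}
    g = {0: 1 + 0j}
    for k in range(1, len(y)):
        shift = 2 ** k - 1
        f_new = dict(f)
        for p, c in g.items():
            f_new[p + shift] = f_new.get(p + shift, 0) + y[k] * c
        g_new = dict(g)
        for p, c in f.items():
            g_new[p - shift] = (g_new.get(p - shift, 0)
                                - complex(y[k]).conjugate() * c)
        f, g = f_new, g_new
    return f

y = [0.6, -0.3, 0.5, 0.2]
f = fournier(y)
interp = [f.get(2 ** k - 1, 0) for k in range(len(y))]   # should equal y
bound = math.sqrt(math.prod(1 + abs(c) ** 2 for c in y))
samples = 512
sup = max(abs(sum(c * cmath.exp(2j * cmath.pi * s / samples) ** p
                  for p, c in f.items()))
          for s in range(samples))
```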
We combine these results to prove Grothendieck's theorem for $l_1$.

Theorem 18.4.3 If $T \in L(l_1, l_2)$ then $T$ is absolutely summing and $\pi_1(T) \le 2\sqrt{e}\,\|T\|$.

Proof Let $T(e_i) = h^{(i)}$. For each $i$, there exists $f^{(i)} \in A(D)$ with $\|f^{(i)}\|_\infty \le \sqrt{e}\,\|h^{(i)}\| \le \sqrt{e}\,\|T\|$ such that $(\hat{f}^{(i)})_{2^k-1} = h^{(i)}_k$, for each $k$. Let $S: l_1 \to A(D)$ be defined by $S(x) = \sum_{i=0}^\infty x_if^{(i)}$, let $J$ be the inclusion $A(D) \to H^1$, and let $P: H^1 \to l_2$ be defined by $P(f)_k = \hat{f}_{2^k-1}$, so that $T = PJS$. Then $\|S\| \le \sqrt{e}\,\|T\|$, $\pi_1(J) = 1$, by Pietsch's domination theorem, and $\|P\| \le 2$, by Paley's inequality. Thus $T = PJS$ is absolutely summing, and $\pi_1(T) \le \|P\|\pi_1(J)\|S\| \le 2\sqrt{e}\,\|T\|$.
18.5 The little Grothendieck theorem

We can extend Grothendieck's theorem to spaces of measures. We need the following elementary result.

Lemma 18.5.1 Suppose that $K$ is a compact Hausdorff space and that $\mu_1, \ldots, \mu_n \in C(K)^*$. Then there exist a probability measure $\mathbf{P}$ on the Baire sets of $K$ and $f_1, \ldots, f_n$ in $L^1(\mathbf{P})$ such that $\mu_j = f_j\, d\mathbf{P}$ for each $j$.

Proof By the Riesz representation theorem, for each $j$ there exist a probability measure $\mathbf{P}_j$ on the Baire sets of $K$ and a measurable $h_j$ with $|h_j| = \|\mu_j\|$ everywhere, such that $\mu_j = h_j\, d\mathbf{P}_j$. Let $\mathbf{P} = (1/n)\sum_{j=1}^n \mathbf{P}_j$. Then $\mathbf{P}$ is a probability measure on the Baire sets of $K$, and each $\mathbf{P}_j$ is absolutely continuous with respect to $\mathbf{P}$. Thus for each $j$ there exists $g_j \ge 0$ with $\int_K g_j\, d\mathbf{P} = 1$ such that $\mathbf{P}_j = g_j\, d\mathbf{P}$. Take $f_j = h_jg_j$.
Theorem 18.5.1 Suppose that $K$ is a compact Hausdorff space. If $T \in L(C(K)^*, H)$, where $H$ is a Hilbert space, then $T$ is absolutely summing and $\pi_1(T) \le K_G\|T\|$.

Proof Suppose that $\mu_1, \ldots, \mu_n \in C(K)^*$. By the lemma, there exist a probability measure $\mathbf{P}$ and $f_1, \ldots, f_n \in L^1(\mathbf{P})$ such that $\mu_j = f_j\, d\mathbf{P}$ for $1 \le j \le n$. We can consider $L^1(\mathbf{P})$ as a subspace of $C(K)^*$. $T$ maps $L^1(\mathbf{P})$ into $H$, and

$$\sum_{j=1}^n \|T(\mu_j)\| \le K_G\|T\|\sup\Big\{\Big\|\sum_{j=1}^n b_jf_j\Big\|_1 : |b_j| \le 1\Big\} = K_G\|T\|\sup\Big\{\Big\|\sum_{j=1}^n b_j\mu_j\Big\| : |b_j| \le 1\Big\}.$$
Corollary 18.5.1 (The little Grothendieck theorem) If $T \in L(C(K), H)$, where $K$ is a compact Hausdorff space and $H$ is a Hilbert space, then $T \in \Pi_2(C(K), H)$ and $\pi_2(T) \le K_G\|T\|$.

Proof We use Proposition 16.3.2. Suppose that $S \in L(l^N_2, C(K))$. Then $S^* \in L(C(K)^*, l^N_2)$. Thus $\pi_1(S^*) \le K_G\|S^*\|$, and so $\pi_2(S^*T^*) \le \pi_1(S^*T^*) \le K_G\|S^*\|\,\|T^*\|$. But $\pi_2(S^*T^*)$ is the Hilbert-Schmidt norm of $S^*T^*$, and so $\pi_2(S^*T^*) = \sigma_2(TS)$. Thus $(\sum_{n=1}^N \|TS(e_n)\|^2)^{1/2} \le K_G\|T\|\,\|S\|$, so that $T \in \Pi_2(C(K), H)$ and $\pi_2(T) \le K_G\|T\|$.
We also have a dual version of the little Grothendieck theorem.

Theorem 18.5.2 If $T \in L(L^1(\Omega, \Sigma, \mu), H)$, where $H$ is a Hilbert space, then $T$ is 2-summing, and $\pi_2(T) \le K_G\|T\|$.

Proof By Theorem 16.3.1, it is enough to consider simple functions in $L^1(\Omega, \Sigma, \mu)$, and so it is enough to consider $T \in L(l^d_1, H)$. We use Proposition 16.3.2. Suppose that $S \in L(l^N_2, l^d_1)$. Then $S^* \in L(l^d_\infty, l^N_2)$, and so $\pi_2(S^*) \le K_G\|S^*\|$, by the little Grothendieck theorem. Then $\pi_2(S^*T^*) \le K_G\|S^*\|\,\|T^*\|$. But $\pi_2(S^*T^*)$ is the Hilbert-Schmidt norm of $S^*T^*$, and so $\pi_2(S^*T^*) = \sigma_2(TS)$. Thus

$$\Big(\sum_{n=1}^N \|TS(e_n)\|^2\Big)^{1/2} \le K_G\|S\|\,\|T\|\sup\Big\{\Big(\sum_{n=1}^N |\langle e_n, h\rangle|^2\Big)^{1/2} : \|h\| \le 1\Big\} = K_G\|S\|\,\|T\|.$$

Thus $T$ is 2-summing, and $\pi_2(T) \le K_G\|T\|$.
18.6 Type and cotype

In fact, we can obtain a better constant in the little Grothendieck theorem, and can extend the result to more general operators. In order to do this, we introduce the notions of type and cotype. These involve Bernoulli sequences of random variables: for the rest of this chapter, $(\epsilon_n)$ will denote such a sequence.

Let us begin by considering the parallelogram law. This says that if $x_1, \ldots, x_n$ are vectors in a Hilbert space $H$ then

$$\mathbf{E}\Big(\Big\|\sum_{j=1}^n \epsilon_jx_j\Big\|^2\Big) = \sum_{j=1}^n \|x_j\|_H^2.$$
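For a finite Bernoulli sequence the expectation is an average over the $2^n$ sign patterns, so the parallelogram law can be verified exactly in $\mathbf{R}^d$ by enumeration. The following sketch is our addition:

```python
from itertools import product

def rademacher_second_moment(xs):
    # E || sum_j eps_j x_j ||^2 for vectors in R^d, by exact enumeration
    # of the 2^n equally likely sign patterns
    n, d = len(xs), len(xs[0])
    total = 0.0
    for eps in product([-1, 1], repeat=n):
        v = [sum(eps[j] * xs[j][i] for j in range(n)) for i in range(d)]
        total += sum(c * c for c in v)
    return total / 2 ** n

xs = [[1.0, 2.0, 0.0], [0.5, -1.0, 2.0], [3.0, 0.0, 1.0]]
lhs = rademacher_second_moment(xs)
rhs = sum(sum(c * c for c in x) for x in xs)   # sum of squared norms
```

The cross terms $2\epsilon_j\epsilon_k\langle x_j, x_k\rangle$ average to zero, which is exactly why the two sides agree.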
We deconstruct this equation; we split it into two inequalities, we change an index, we introduce constants, and we consider linear operators.

Suppose that $(E, \|.\|_E)$ and $(F, \|.\|_F)$ are Banach spaces, that $T \in L(E, F)$ and that $1 \le p < \infty$. We say that $T$ is of type $p$ if there is a constant $C$ such that if $x_1, \ldots, x_n$ are vectors in $E$ then

$$\Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_jT(x_j)\Big\|_F^2\Big)^{1/2} \le C\Big(\sum_{j=1}^n \|x_j\|_E^p\Big)^{1/p}.$$

The smallest possible constant $C$ is denoted by $T_p(T)$, and is called the type $p$ constant of $T$. Similarly, we say that $T$ is of cotype $p$ if there is a constant $C$ such that if $x_1, \ldots, x_n$ are vectors in $E$ then

$$\Big(\sum_{j=1}^n \|T(x_j)\|_F^p\Big)^{1/p} \le C\Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_jx_j\Big\|_E^2\Big)^{1/2}.$$

The smallest possible constant $C$ is denoted by $C_p(T)$, and is called the cotype $p$ constant of $T$.
It follows from the parallelogram law that if $T$ is of type $p$, for $p > 2$, or cotype $p$, for $p < 2$, then $T = 0$. If $T$ is of type $p$ then $T$ is of type $q$, for $1 \le q < p$, and $T_q(T) \le T_p(T)$; if $T$ is of cotype $p$ then $T$ is of cotype $q$, for $p < q < \infty$, and $C_q(T) \le C_p(T)$. Every Banach space is of type 1. By the Kahane inequalities, we can replace $(\mathbf{E}\|\sum_{j=1}^n \epsilon_jT(x_j)\|_F^2)^{1/2}$ by $(\mathbf{E}\|\sum_{j=1}^n \epsilon_jT(x_j)\|_F^q)^{1/q}$ in the definition, for any $1 < q < \infty$, with a corresponding change of constant.
Proposition 18.6.1 If $T \in L(E, F)$ and $T$ is of type $p$, then $T^* \in L(F^*, E^*)$ is of cotype $p'$, and $C_{p'}(T^*) \le T_p(T)$.

Proof Suppose that $\phi_1, \ldots, \phi_n$ are vectors in $F^*$ and $x_1, \ldots, x_n$ are vectors in $E$. Then

$$\Big|\sum_{j=1}^n T^*(\phi_j)(x_j)\Big| = \Big|\sum_{j=1}^n \phi_j(T(x_j))\Big| = \Big|\mathbf{E}\Big(\Big(\sum_{j=1}^n \epsilon_j\phi_j\Big)\Big(\sum_{j=1}^n \epsilon_jT(x_j)\Big)\Big)\Big|$$

$$\le \Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_j\phi_j\Big\|^2\Big)^{1/2}\Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_jT(x_j)\Big\|^2\Big)^{1/2} \le \Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_j\phi_j\Big\|^2\Big)^{1/2}T_p(T)\Big(\sum_{j=1}^n \|x_j\|^p\Big)^{1/p}.$$

But

$$\Big(\sum_{j=1}^n \|T^*(\phi_j)\|^{p'}\Big)^{1/p'} = \sup\Big\{\Big|\sum_{j=1}^n T^*(\phi_j)(x_j)\Big| : \Big(\sum_{j=1}^n \|x_j\|^p\Big)^{1/p} \le 1\Big\},$$

and so

$$\Big(\sum_{j=1}^n \|T^*(\phi_j)\|^{p'}\Big)^{1/p'} \le T_p(T)\Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_j\phi_j\Big\|^2\Big)^{1/2}.$$
Corollary 18.6.1 If $T \in L(E, F)$ and $T^*$ is of type $p$, then $T$ is of cotype $p'$, and $C_{p'}(T) \le T_p(T^*)$.

The converse of this proposition is not true (Exercise 18.3).

An important special case occurs when we consider the identity operator $I_E$ on a Banach space $E$. If $I_E$ is of type $p$ (cotype $p$), we say that $E$ is of type $p$ (cotype $p$), and we write $T_p(E)$ ($C_p(E)$) for $T_p(I_E)$ ($C_p(I_E)$), and call it the type $p$ constant (cotype $p$ constant) of $E$. Thus the parallelogram law states that a Hilbert space $H$ is of type 2 and cotype 2, and $T_2(H) = C_2(H) = 1$.
18.7 Gaussian type and cotype

It is sometimes helpful to work with sequences of Gaussian random variables, rather than with Bernoulli sequences. Recall that a standard Gaussian random variable is, in the real case, a real-valued Gaussian random variable with mean 0 and variance 1, so that its density function on the real line is $(1/\sqrt{2\pi})e^{-x^2/2}$, and in the complex case is a rotationally invariant, complex-valued Gaussian random variable with mean 0 and variance 1, so that its density function on the complex plane is $(1/\pi)e^{-|z|^2}$. For the rest of this chapter, $(g_n)$ will denote an independent sequence of standard Gaussian random variables, real or complex. The theories are essentially the same in the real and complex cases, but with different constants. For example, for $0 < p < \infty$ we define $\gamma_p = \|g\|_p$, where $g$ is a standard Gaussian random variable. Then in the real case, $\gamma_1 = \sqrt{2/\pi}$, $\gamma_2 = 1$ and $\gamma_4 = 3^{1/4}$, while, in the complex case, $\gamma_1 = \sqrt{\pi}/2$, $\gamma_2 = 1$ and $\gamma_4 = 2^{1/4}$.
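The constants $\gamma_p$ have closed forms: for a real standard Gaussian, $\mathbf{E}|g|^p = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$, while for a complex standard Gaussian $|g|^2$ is exponentially distributed with mean 1, so that $\mathbf{E}|g|^p = \Gamma(p/2 + 1)$. A quick check of the values quoted above (our addition):

```python
import math

def gamma_p_real(p):
    # gamma_p = (E|g|^p)^{1/p} for a real standard Gaussian:
    # E|g|^p = 2^{p/2} Gamma((p + 1)/2) / sqrt(pi)
    moment = 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)
    return moment ** (1 / p)

def gamma_p_complex(p):
    # for a complex standard Gaussian (E|g|^2 = 1), |g|^2 is exponential
    # with mean 1, so E|g|^p = Gamma(p/2 + 1)
    return math.gamma(p / 2 + 1) ** (1 / p)
```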
If in the definitions of type and cotype we replace the Bernoulli sequence $(\epsilon_n)$ by $(g_n)$, we obtain the definitions of Gaussian type and cotype. We denote the corresponding constants by $T^\gamma_p$ and $C^\gamma_p$.
Proposition 18.7.1 If $T \in L(E, F)$ is of type 2 (cotype 2) then it is of Gaussian type 2 (Gaussian cotype 2), and $T^\gamma_2(T) \le T_2(T)$ ($C^\gamma_2(T) \le C_2(T)$).

Proof Let us prove this for cotype: the proof for type is just the same. Let $x_1, \ldots, x_n$ be vectors in $E$. Suppose that the sequence $(g_n)$ is defined on $\Omega$ and the sequence $(\epsilon_n)$ on $\Omega'$. Then for fixed $\omega$,

$$\sum_{j=1}^n |g_j(\omega)|^2\|T(x_j)\|_F^2 \le (C_2(T))^2\,\mathbf{E}_{\epsilon}\Big(\Big\|\sum_{j=1}^n \epsilon_jg_j(\omega)x_j\Big\|_E^2\Big).$$

Taking expectations over $\omega$, and using the symmetry of the Gaussian sequence, we find that

$$\sum_{j=1}^n \|T(x_j)\|_F^2 \le (C_2(T))^2\,\mathbf{E}\Big(\Big\|\sum_{j=1}^n \epsilon_jg_jx_j\Big\|_E^2\Big) = (C_2(T))^2\,\mathbf{E}\Big(\Big\|\sum_{j=1}^n g_jx_j\Big\|_E^2\Big).$$
The next theorem shows the virtue of considering Gaussian random variables.

Theorem 18.7.1 (Kwapień's theorem) Suppose that $T \in L(E, F)$ and $S \in L(F, G)$. If $T$ is of Gaussian type 2 and $S$ is of Gaussian cotype 2 then $ST \in \Pi_2(E, G)$, and $\pi_2(ST) \le T^\gamma_2(T)C^\gamma_2(S)$.

Proof We use Theorem 16.13.2. Suppose that $y_1, \ldots, y_n \in E$ and that $U = (u_{ij})$ is unitary (or orthogonal, in the real case). Let $h_j = \sum_{i=1}^n g_iu_{ij}$. Then $h_1, \ldots, h_n$ are independent standard Gaussian random variables. Thus

$$\Big(\sum_{i=1}^n \Big\|ST\Big(\sum_{j=1}^n u_{ij}y_j\Big)\Big\|^2\Big)^{1/2} \le C^\gamma_2(S)\Big(\mathbf{E}\Big\|\sum_{i=1}^n g_i\Big(\sum_{j=1}^n u_{ij}T(y_j)\Big)\Big\|^2\Big)^{1/2}$$

$$= C^\gamma_2(S)\Big(\mathbf{E}\Big\|\sum_{j=1}^n h_jT(y_j)\Big\|^2\Big)^{1/2} \le T^\gamma_2(T)C^\gamma_2(S)\Big(\sum_{j=1}^n \|y_j\|^2\Big)^{1/2}.$$

Corollary 18.7.1 A Banach space $(E, \|.\|_E)$ is isomorphic to a Hilbert space if and only if it is of type 2 and cotype 2, and if and only if it is of Gaussian type 2 and Gaussian cotype 2.
18.8 Type and cotype of $L^p$ spaces

Let us give some examples.

Theorem 18.8.1 Suppose that $(\Omega, \Sigma, \mu)$ is a measure space.
(i) If $1 \le p \le 2$ then $L^p(\Omega, \Sigma, \mu)$ is of type $p$ and cotype 2.
(ii) If $2 \le p < \infty$ then $L^p(\Omega, \Sigma, \mu)$ is of type 2 and cotype $p$.
Proof (i) Suppose that $f_1, \ldots, f_n$ are in $L^p(\Omega, \Sigma, \mu)$. To prove the cotype inequality, we use Khintchine's inequality and Corollary 5.4.2.

$$\Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_jf_j\Big\|_p^2\Big)^{1/2} \ge \Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_jf_j\Big\|_p^p\Big)^{1/p} = \Big(\mathbf{E}\int_\Omega \Big|\sum_{j=1}^n \epsilon_jf_j(\omega)\Big|^p\, d\mu(\omega)\Big)^{1/p} = \Big(\int_\Omega \mathbf{E}\Big|\sum_{j=1}^n \epsilon_jf_j(\omega)\Big|^p\, d\mu(\omega)\Big)^{1/p}$$

$$\ge A_p\Big(\int_\Omega \Big(\sum_{j=1}^n |f_j(\omega)|^2\Big)^{p/2} d\mu(\omega)\Big)^{1/p} \ge A_p\Big(\sum_{j=1}^n \Big(\int_\Omega |f_j(\omega)|^p\, d\mu(\omega)\Big)^{2/p}\Big)^{1/2} = A_p\Big(\sum_{j=1}^n \|f_j\|_p^2\Big)^{1/2}.$$

Thus $L^p(\Omega, \Sigma, \mu)$ is of cotype 2.
To prove the type inequality, we use the Kahane inequality.

$$\Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_jf_j\Big\|_p^2\Big)^{1/2} \le K_{p,2}\Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_jf_j\Big\|_p^p\Big)^{1/p} = K_{p,2}\Big(\int_\Omega \mathbf{E}\Big|\sum_{j=1}^n \epsilon_jf_j(\omega)\Big|^p\, d\mu(\omega)\Big)^{1/p}$$

$$\le K_{p,2}\Big(\int_\Omega \Big(\sum_{j=1}^n |f_j(\omega)|^2\Big)^{p/2} d\mu(\omega)\Big)^{1/p} \le K_{p,2}\Big(\sum_{j=1}^n \int_\Omega |f_j(\omega)|^p\, d\mu(\omega)\Big)^{1/p} = K_{p,2}\Big(\sum_{j=1}^n \|f_j\|_p^p\Big)^{1/p}.$$

Thus $L^p(\Omega, \Sigma, \mu)$ is of type $p$.
(ii) Since $L^{p'}(\Omega, \Sigma, \mu)$ is of type $p'$, $L^p(\Omega, \Sigma, \mu)$ is of cotype $p$, by Proposition 18.6.1. Suppose that $f_1, \ldots, f_n$ are in $L^p(\Omega, \Sigma, \mu)$. To prove the type inequality, we use Khintchine's inequality and Corollary 5.4.2.

$$\Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_jf_j\Big\|_p^2\Big)^{1/2} \le \Big(\mathbf{E}\Big\|\sum_{j=1}^n \epsilon_jf_j\Big\|_p^p\Big)^{1/p} = \Big(\int_\Omega \mathbf{E}\Big|\sum_{j=1}^n \epsilon_jf_j(\omega)\Big|^p\, d\mu(\omega)\Big)^{1/p}$$

$$\le B_p\Big(\int_\Omega \Big(\sum_{j=1}^n |f_j(\omega)|^2\Big)^{p/2} d\mu(\omega)\Big)^{1/p} \le B_p\Big(\sum_{j=1}^n \Big(\int_\Omega |f_j(\omega)|^p\, d\mu(\omega)\Big)^{2/p}\Big)^{1/2} = B_p\Big(\sum_{j=1}^n \|f_j\|_p^2\Big)^{1/2}.$$

Thus $L^p(\Omega, \Sigma, \mu)$ is of type 2.
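A concrete illustration of the limits of this theorem (our addition): in $l_1^n$ the unit vectors $e_1, \ldots, e_n$ show that the type 2 inequality must fail, since every sign pattern gives $\|\sum_j \epsilon_je_j\|_1 = n$, while $(\sum_j \|e_j\|_1^2)^{1/2} = \sqrt{n}$, so any type 2 constant grows at least like $\sqrt{n}$.

```python
from itertools import product

def avg_l1_norm_sq(n):
    # E || sum_j eps_j e_j ||_1^2 in l_1^n: every sign pattern gives
    # ||(eps_1, ..., eps_n)||_1 = n, so the average is n^2
    total = sum(sum(abs(e) for e in eps) ** 2
                for eps in product([-1, 1], repeat=n))
    return total / 2 ** n

n = 6
lhs = avg_l1_norm_sq(n) ** 0.5        # equals n
rhs = float(n) ** 0.5                 # (sum_j ||e_j||_1^2)^{1/2} = sqrt(n)
# any type 2 constant for the identity of l_1^n is at least lhs / rhs = sqrt(n)
```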
18.9 The little Grothendieck theorem revisited

We now give the first generalization of the little Grothendieck theorem.

Theorem 18.9.1 Suppose that $(E, \|.\|_E)$ is a Banach space whose dual $E^*$ is of Gaussian type $p$, where $1 < p \le 2$. If $T \in L(C(K), E)$, then $T \in \Pi_{p',2}(C(K), E)$, and $\pi_{p',2}(T) \le \gamma_1^{-1}T^\gamma_p(E^*)\|T\|$.
Proof Suppose that $f_1, \ldots, f_n \in C(K)$. We must show that

$$\Big(\sum_{j=1}^n \|T(f_j)\|^{p'}\Big)^{1/p'} \le C\|T\|\sup_{k \in K}\Big(\sum_{j=1}^n |f_j(k)|^2\Big)^{1/2},$$

where $C = \gamma_1^{-1}T^\gamma_p(E^*)$.

For $f = (f_1, \ldots, f_n) \in C(K; l^n_2)$, let $R(f) = (T(f_j))_{j=1}^n \in l^{p'}_n(E)$. Then we need to show that $\|R\| \le C\|T\|$. To do this, let us consider the dual mapping $R^*: l^p_n(E^*) \to C(K; l^n_2)^*$. If $\phi = (\phi_j)_{j=1}^n \in l^p_n(E^*)$, then $R^*(\phi) = (T^*(\phi_1), \ldots, T^*(\phi_n))$. By Lemma 18.5.1, there exist a Baire probability measure $\mathbf{P}$ on $K$ and $w_1, \ldots, w_n \in L^1(\mathbf{P})$ such that $T^*(\phi_j) = w_j\, d\mathbf{P}$ for $1 \le j \le n$. Then

$$\|R^*(\phi)\|_{M(K; l^n_2)} = \int_K \Big(\sum_{j=1}^n |w_j(k)|^2\Big)^{1/2} d\mathbf{P}(k) = \gamma_1^{-1}\int_K \mathbf{E}\Big|\sum_{j=1}^n g_jw_j(k)\Big|\, d\mathbf{P}(k) = \gamma_1^{-1}\mathbf{E}\Big(\int_K \Big|\sum_{j=1}^n g_jw_j(k)\Big|\, d\mathbf{P}(k)\Big)$$

$$= \gamma_1^{-1}\mathbf{E}\Big(\Big\|T^*\Big(\sum_{j=1}^n g_j\phi_j\Big)\Big\|\Big) \le \gamma_1^{-1}\|T^*\|\,\mathbf{E}\Big\|\sum_{j=1}^n g_j\phi_j\Big\|_{E^*} \le \gamma_1^{-1}\|T^*\|\Big(\mathbf{E}\Big\|\sum_{j=1}^n g_j\phi_j\Big\|_{E^*}^2\Big)^{1/2}$$

$$\le \gamma_1^{-1}\|T^*\|\,T^\gamma_p(E^*)\Big(\sum_{j=1}^n \|\phi_j\|_{E^*}^p\Big)^{1/p} = \gamma_1^{-1}\|T^*\|\,T^\gamma_p(E^*)\|\phi\|_{l^p_n(E^*)}.$$
This gives the best constant in the little Grothendieck theorem.

Proposition 18.9.1 The best constant in the little Grothendieck theorem is $\gamma_1^{-1}$ ($\sqrt{\pi/2}$ in the real case, $2/\sqrt{\pi}$ in the complex case).

Proof Theorem 18.9.1 shows that $\gamma_1^{-1}$ is a suitable upper bound. Let $\mathbf{P}$ be standard Gaussian measure on $\mathbf{R}^d$ (or $\mathbf{C}^d$), so that if we set $g_j(x) = x_j$ then $g_1, \ldots, g_d$ are independent standard Gaussian random variables. Let $K$ be the one-point compactification of $\mathbf{R}^d$ (or $\mathbf{C}^d$), and extend $\mathbf{P}$ to a probability measure on $K$ by setting $\mathbf{P}(\{\infty\}) = 0$.
Now let $G: C(K) \to l^d_2$ be defined by $G(f) = (\mathbf{E}(f\bar{g}_j))_{j=1}^d$. Then

$$\|G(f)\| = \Big(\sum_{j=1}^d |\mathbf{E}(f\bar{g}_j)|^2\Big)^{1/2} = \sup\Big\{\Big|\mathbf{E}\Big(f\sum_{j=1}^d \alpha_j\bar{g}_j\Big)\Big| : \sum_{j=1}^d |\alpha_j|^2 \le 1\Big\} \le \gamma_1\|f\|_\infty,$$

so that $\|G\| \le \gamma_1$.

On the other hand, if $f = (f_1, \ldots, f_d) \in C(K; l^d_2)$, set $R(f) = (G(f_i))_{i=1}^d \in l^d_2(l^d_2)$. Then

$$\|f\|_{C(K; l^d_2)} = \sup_{k \in K}\Big(\sum_{i=1}^d |f_i(k)|^2\Big)^{1/2} \quad \text{and} \quad \|R(f)\| = \Big(\sum_{i=1}^d \|G(f_i)\|^2\Big)^{1/2},$$

so that

$$\|R(f)\| \le \pi_2(G)\sup_{k \in K}\Big(\sum_{i=1}^d |f_i(k)|^2\Big)^{1/2} = \pi_2(G)\|f\|_{C(K; l^d_2)},$$

and $\|R\| \le \pi_2(G)$.

We consider $R^*$. If $e = (e_1, \ldots, e_d)$, then $R^*(e) = (\bar{g}_1, \ldots, \bar{g}_d)$. Then $\|R^*(e)\| = \mathbf{E}(\gamma)$, where $\gamma = (\sum_{j=1}^d |g_j|^2)^{1/2}$. By Littlewood's inequality, $\sqrt{d} = \|\gamma\|_2 \le \|\gamma\|_1^{1/3}\|\gamma\|_4^{2/3}$. But

$$\|\gamma\|_4^4 = \mathbf{E}\Big(\Big(\sum_{j=1}^d |g_j|^2\Big)^2\Big) = \sum_{j=1}^d \mathbf{E}(|g_j|^4) + \sum_{j \ne k} \mathbf{E}(|g_j|^2|g_k|^2) = d\gamma_4^4 + d(d-1).$$

Thus

$$\|\gamma\|_1^2 \ge d^3/\|\gamma\|_4^4 = d/(1 + (\gamma_4^4 - 1)/d),$$

so that, since $\|e\| = \sqrt{d}$,

$$\|R\|^2 = \|R^*\|^2 \ge 1/(1 + (\gamma_4^4 - 1)/d).$$

Consequently, $\pi_2(G) \ge \|G\|/(\gamma_1(1 + (\gamma_4^4 - 1)/d)^{1/2})$. Since $d$ is arbitrary, the result follows.
18.10 More on cotype

Proposition 18.10.1 Suppose that $(E, \|.\|_E)$ and $(F, \|.\|_F)$ are Banach spaces and that $F$ has cotype $p$. If $T \in \Pi_q(E, F)$ for some $1 \le q < \infty$ then $T \in \Pi_{p,2}$ and $\pi_{p,2}(T) \le C_p(F)B_q\pi_q(T)$ (where $B_q$ is the constant in Khintchine's inequality).
Proof Let $j: E \to C(K)$ be an isometric embedding. By Pietsch's domination theorem, there exists a probability measure $\mu$ on $K$ such that

$$\|T(x)\|_F \le \pi_q(T)\Big(\int_K |j(x)|^q\, d\mu\Big)^{1/q} \quad \text{for } x \in E.$$

If $x_1, \ldots, x_N \in E$, then, using Fubini's theorem and Khintchine's inequality,

$$\Big(\sum_{n=1}^N \|T(x_n)\|_F^p\Big)^{1/p} \le C_p(F)\Big(\mathbf{E}\Big\|\sum_{n=1}^N \epsilon_nT(x_n)\Big\|_F^2\Big)^{1/2} \le C_p(F)\Big(\mathbf{E}\Big\|T\Big(\sum_{n=1}^N \epsilon_nx_n\Big)\Big\|_F^q\Big)^{1/q}$$

$$\le C_p(F)\pi_q(T)\Big(\mathbf{E}\int_K \Big|j\Big(\sum_{n=1}^N \epsilon_nx_n\Big)\Big|^q\, d\mu\Big)^{1/q} = C_p(F)\pi_q(T)\Big(\int_K \mathbf{E}\Big|j\Big(\sum_{n=1}^N \epsilon_nx_n\Big)\Big|^q\, d\mu\Big)^{1/q}$$

$$\le C_p(F)B_q\pi_q(T)\Big(\int_K \Big(\sum_{n=1}^N |j(x_n)|^2\Big)^{q/2} d\mu\Big)^{1/q} \le C_p(F)B_q\pi_q(T)\sup_{\|\phi\|_{E^*} \le 1}\Big(\sum_{n=1}^N |\phi(x_n)|^2\Big)^{1/2}.$$
We now have the following generalization of Theorem 16.11.1.

Corollary 18.10.1 If $(F, \|.\|_F)$ has cotype 2 then $\Pi_q(E, F) = \Pi_2(E, F)$ for $2 \le q < \infty$.

We use this to give our final generalization of the little Grothendieck theorem. First we establish a useful result about $C(K)$ spaces.

Proposition 18.10.2 Suppose that $K$ is a compact Hausdorff space, that $F$ is a finite-dimensional subspace of $C(K)$ and that $\epsilon > 0$. Then there exists a projection $P$ of $C(K)$ onto a finite-dimensional subspace $G$, with $\|P\| = 1$, such that $G$ is isometrically isomorphic to $l^d_\infty$ (where $d = \dim G$) and $\|P(f) - f\| \le \epsilon\|f\|$ for $f \in F$.
Proof The unit sphere $S_F$ of $F$ is compact, and so there exists a finite set $f_1, \ldots, f_n \in S_F$ such that if $f \in S_F$ then there exists $j$ such that $\|f - f_j\| \le \epsilon/3$. If $k \in K$, let $J(k) = (f_1(k), \ldots, f_n(k))$. $J$ is a continuous mapping of $K$ onto a compact subset $J(K)$ of $\mathbf{R}^n$ (or $\mathbf{C}^n$). There is therefore a maximal finite subset $S$ of $K$ such that $\|J(s) - J(t)\| \ge \epsilon/3$ for $s, t$ distinct elements of $S$. We now set

$$h_s(k) = \max(1 - 3\|J(k) - J(s)\|/\epsilon,\ 0)$$

for $s \in S$, $k \in K$. Then $h_s(k) \ge 0$, $h_s(s) = 1$, and $h_s(t) = 0$ for $t \ne s$. Let $h(k) = \sum_{s \in S} h_s(k)$. Then, by the maximality of $S$, $h(k) > 0$ for each $k \in K$. We now set $g_s = h_s/h$. Then $g_s(k) \ge 0$, $g_s(s) = 1$, $g_s(t) = 0$ for $t \ne s$, and $\sum_{s \in S} g_s(k) = 1$. Let $G = \text{span}\{g_s\}$. If $g \in G$ then $\|g\| = \max\{|g(s)| : s \in S\}$, so that $G$ is isometrically isomorphic to $l^d_\infty$, where $d = \dim G$.

If $f \in C(K)$, let $P(f) = \sum_{s \in S} f(s)g_s$. Then $P$ is a projection of $C(K)$ onto $G$, and $\|P\| = 1$. Further,

$$f_j(k) - P(f_j)(k) = \sum_{s \in S} (f_j(k) - f_j(s))g_s(k) = \sum\{(f_j(k) - f_j(s))g_s(k) : |f_j(k) - f_j(s)| \le \epsilon/3\},$$

since $g_s(k) = 0$ if $|f_j(k) - f_j(s)| > \epsilon/3$. Thus $\|f_j - P(f_j)\| \le \epsilon/3$. Finally if $f \in S_F$, there exists $j$ such that $\|f - f_j\| \le \epsilon/3$. Then

$$\|f - P(f)\| \le \|f - f_j\| + \|f_j - P(f_j)\| + \|P(f_j) - P(f)\| \le \epsilon.$$
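The construction in this proof is effective, and can be sketched on a finite model of $K$ (our addition; function and variable names are ours). The greedy loop produces a maximal $\epsilon/3$-separated net, and the resulting partition of unity satisfies $\sum_s g_s = 1$, $g_s(s) = 1$, and $\|f_j - P(f_j)\| \le \epsilon/3$ on the net functions.

```python
def build_partition(fs, eps):
    # finite model of the proof: K = {0, ..., N-1}, fs lists of values on K;
    # returns the net S and the partition of unity (g_s)
    N = len(fs[0])

    def dist(j, k):                   # ||J(j) - J(k)|| in the sup norm
        return max(abs(f[j] - f[k]) for f in fs)

    S = []                            # greedy maximal eps/3-separated subset
    for k in range(N):
        if all(dist(k, s) >= eps / 3 for s in S):
            S.append(k)

    h = {s: [max(1.0 - 3 * dist(k, s) / eps, 0.0) for k in range(N)]
         for s in S}
    tot = [sum(h[s][k] for s in S) for k in range(N)]   # > 0 by maximality
    g = {s: [h[s][k] / tot[k] for k in range(N)] for s in S}
    return S, g

def project(f, S, g):
    # P(f) = sum_s f(s) g_s
    return [sum(f[s] * g[s][k] for s in S) for k in range(len(f))]

N = 200
xs = [i / (N - 1) for i in range(N)]
f1 = list(xs)
f2 = [x * x for x in xs]
eps = 0.3
S, g = build_partition([f1, f2], eps)
err = max(abs(a - b) for a, b in zip(f1, project(f1, S, g)))
```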
Theorem 18.10.1 If $(F, \|.\|_F)$ has cotype 2 and $T \in L(C(K), F)$ then $T$ is 2-summing, and $\pi_2(T) \le \sqrt{3}(C_2(F))^2\|T\|$.

Proof First, we consider the case where $K = \{1, \ldots, d\}$, so that $C(K) = l^d_\infty$. Then $T \in \Pi_2(C(K), F)$, and $\pi_2(T) \le C_2(F)B_4\pi_4(T)$, by Proposition 18.10.1. But $\pi_4(T) \le (\pi_2(T))^{1/2}\|T\|^{1/2}$, by Proposition 16.10.1. Combining these inequalities, we obtain the result.

Next, we consider the general case. Suppose that $f_1, \ldots, f_N \in C(K)$ and that $\epsilon > 0$. Let $P$ and $G$ be as in Proposition 18.10.2. Then

$$\Big(\sum_{n=1}^N \|T(f_n)\|_F^2\Big)^{1/2} \le \Big(\sum_{n=1}^N \|TP(f_n)\|_F^2\Big)^{1/2} + \epsilon\sqrt{N}\,\|T\| \le \sqrt{3}(C_2(F))^2\|T\|\Big(\sup_{s \in S}\sum_{n=1}^N |f_n(s)|^2\Big)^{1/2} + \epsilon\sqrt{N}\,\|T\|,$$

by the finite-dimensional result. Since $\epsilon > 0$ is arbitrary, it follows that

$$\Big(\sum_{n=1}^N \|T(f_n)\|_F^2\Big)^{1/2} \le \sqrt{3}(C_2(F))^2\|T\|\Big(\sup_{k \in K}\sum_{n=1}^N |f_n(k)|^2\Big)^{1/2}.$$
18.11 Notes and remarks

Littlewood was interested in bilinear forms, rather than linear operators: if $B$ is a bilinear form on $l^m_\infty \times l^n_\infty$ then $B(x, y) = \sum_{i=1}^m \sum_{j=1}^n x_ib_{ij}y_j$, and $\|B\| = \sup\{|B(x, y)| : \|x\|_\infty \le 1, \|y\|_\infty \le 1\}$. Looking at things this way, it is natural to consider multilinear forms; these (and indeed forms of fractional dimension) are considered in [Ble 01].

Grothendieck's proof depends on the identity

$$\langle x, y\rangle = \cos\Big(\frac{\pi}{2}\Big(1 - \int_{S^{n-1}} \text{sgn}\langle x, s\rangle\, \text{sgn}\langle y, s\rangle\, d\sigma(s)\Big)\Big),$$

where $x$ and $y$ are unit vectors in $l^n_2(\mathbf{R})$ and $\sigma$ is the rotation-invariant probability measure on the unit sphere $S^{n-1}$.

In fact, the converse of Proposition 18.7.1 is also true. See [DiJT 95].

Paley's inequality was generalized by Hardy and Littlewood. See [Dur 70] for details.

Kwapień's theorem shows that type and cotype interact to give results that correspond to Hilbert space results. Here is another result in the same direction, which we state without proof.

Theorem 18.11.1 (Maurey's extension theorem) Suppose that $E$ has type 2 and that $F$ has cotype 2. If $T \in L(G, F)$, where $G$ is a linear subspace of $E$, then there exists $\tilde{T} \in L(E, F)$ which extends $T$: $\tilde{T}(x) = T(x)$ for $x \in G$.

Note that, by Kwapień's theorem, we may assume that $F$ is a Hilbert space.

In this chapter, we have only scratched the surface of a large and important subject. Very readable accounts of this are given in [Pis 87] and [DiJT 95].
Exercises

18.1 How good a constant can you obtain from the proof of Theorem 18.2.1?

18.2 Suppose that $T \in L(E, F)$ is of cotype $p$. Show that $T \in \Pi_{p,1}(E, F)$. Compare this with Orlicz's theorem (Exercise 16.6).

18.3 Give an example of an operator $T$ which has no type $p$ for $1 < p \le 2$, while $T^*$ has cotype 2.

18.4 Suppose that $f(z) = \sum_{k=0}^\infty a_kz^k \in H^1$. Let $T(f) = (a_k/\sqrt{k})$. Use Hardy's inequality to show that $T(f) \in l_2$ and that $\|T(f)\|_2 \le \sqrt{\pi}\,\|f\|_{H^1}$.
Let $g_k(z) = z^k/(\sqrt{k+1}\log(k+2))$. Show that $\sum_{k=0}^\infty g_k$ converges unconditionally in $H^2$, and in $H^1$. Show that $T$ is not absolutely summing.
$H^1$ can be considered as a subspace of $L^1(\mathbf{T})$. Compare this result with Grothendieck's theorem, and deduce that there is no continuous projection of $L^1(\mathbf{T})$ onto $H^1$.

18.5 Show that $\gamma_1^{-1}$ is the best constant in Theorem 18.5.2.
References
[Alf 71] E.M. Alfsen (1971). Compact Convex Sets and Boundary Integrals (Springer-
Verlag).
[Ane 00] C. Ane et al. (2000). Sur les Inegalites de Sobolev Logarithmiques (Soc.
Math. de France, Panoramas et Synth`eses, 10).
[App 96] D. Applebaum (1996). Probability and Information (Cambridge University
Press).
[ArG 80] A. Araujo and E. Gine (1980). The Central Limit Theorem for Real and
Banach Valued Random Variables (Wiley).
[Bak 94] D. Bakry (1994). Lhypercontractivite et son utilisation en theorie des
semigroupes, Lectures on Probability Theory.

Ecole d

Ete de Saint Flour 1992


(Springer Lecture Notes in Mathematics, volume 1581).
[Ban 29] S. Banach (1929). Sur les fonctionelles lineaires II, Studia Math. 1 223239.
[Bar 95] R.G. Bartle (1995). The Elements of Integration and Lebesgue Measure
(Wiley).
[BaGH 62] L.D. Baumert, S.W. Golomb and M. Hall Jr.(1962) Discovery of a
Hadamard matrix of order 92, Bull. Amer. Math. Soc. 68 237238.
[Bec 75] W. Beckner (1975). Inequalities in Fourier analysis, Ann. Math. 102 159
182.
[BeS 88] C. Bennett and R. Sharpley (1988). Interpolation of Operators (Academic
Press).
[BeL 76] J. Bergh and J. L ofstr om (1976). Interpolation Spaces (Springer-Verlag).
[Bil 95] P. Billingsley (1995). Probability and Measure (Wiley).
[Ble 01] R. Blei (2001). Analysis in Integer and Fractional Dimensions (Cambridge
University Press).
[BoS 38] H.F. Bohnenblust and A. Sobczyk (1938). Extensions of functionals on
complex linear spaces, Bull. Amer. Math. Soc. 44 9193.
[Bol 90] B. Bollob as (1990). Linear Analysis (Cambridge University Press).
[Bon 71] A. Bonami (1971).

Etude des coecients de Fourier des fonctions de L
p
(G),
Ann. Inst. Fourier (Grenoble) 20 335402.
[Bre 68] L. Breiman (1968). Probability (Addison Wesley).
[BuMV 87] P.S. Bullen, D.S. Mitrinovic and P.M. Vasic (1987). Means and their
Inequalities (Reidel, Boston).
[Cal 63] A.P. Calder on (1963). Intermediate spaces and interpolation, Studia Math.
Special Series 1 3134.
325
326 References
[Cal 64] A.P. Calder on (1964). Intermediate spaces and interpolation, the complex
method, Studia Math. 24 113190.
[Cal 66] A.P. Calder on (1966). Spaces between L
1
and L

and the theorem of


Marcinkiewicz, Studia Math. 26 273299.
[CaZ 52] A.P. Calder on and A. Zygmund (1952). On the existence of certain singular
integrals, Acta Math. 82 85139.
[Car 23] T. Carleman (1923). Sur les fonctions quasi-analytiques, in Proc. 5th Scand.
Math. Cong. (Helsinki).
[Cau 21] A. Cauchy (1821). Cours dAnalyse de l

Ecole Royale Polytechnique (De-


bures fr`eres, Paris).
[Cla 36] J.A. Clarkson (1936). Uniformly convex spaces, Trans. Amer. Math. Soc.
40 396414.
[DiJT 95] J. Diestel, H. Jarchow and A. Tonge (1995). Absolutely Summing Opera-
tors (Cambridge University Press).
[DiU 77] J. Diestel and J.J. Uhl Jr. (1977). Vector Measures (American Mathemat-
ical Society).
[Doo 40] J.L. Doob (1940). Regularity properties of certain families of chance vari-
ables, Trans. Amer. Math. Soc. 47 455–486.
[Dow 78] H.R. Dowson (1978). Spectral Theory of Linear Operators (Academic
Press).
[Dud 02] R.M. Dudley (2002). Real Analysis and Probability (Cambridge University
Press).
[DuS 88] N. Dunford and J.T. Schwartz (1988). Linear Operators Part I: General
Theory (Wiley Classics Library).
[Duo 01] J. Duoandikoetxea (2001). Fourier Analysis (Amer. Math. Soc. Graduate
Studies in Mathematics 29).
[Dur 70] P.L. Duren (1970). Theory of H^p Spaces (Academic Press).
[Enf 73] P. Enflo (1973). On Banach spaces which can be given an equivalent
uniformly convex norm, Israel J. Math. 13 281–288.
[Fel 70] W. Feller (1970). An Introduction to Probability Theory and its Applications,
Volume I (Wiley International Edition).
[Fou 74] J.J.F. Fournier (1974). An interpolation problem for coefficients of H^∞
functions, Proc. Amer. Math. Soc. 48 402–408.
[Gar 70] D.J.H. Garling (1970). Absolutely p-summing operators in Hilbert space,
Studia Math. 38 319–331.
[GaG 71] D.J.H. Garling and Y. Gordon (1971). Relations between some constants
associated with finite dimensional Banach spaces, Israel J. Math. 9 346–361.
[GiM 91] J.E. Gilbert and M.A.M. Murray (1991). Clifford Algebras and Dirac Op-
erators in Harmonic Analysis (Cambridge University Press).
[Gro 75] L. Gross (1975). Logarithmic Sobolev inequalities, Amer. J. Math. 97
1061–1083.
[Gro 93] L. Gross (1993). Logarithmic Sobolev inequalities and contractivity proper-
ties of semigroups, Dirichlet Forms (Varenna, 1992) 54–88 (Springer Lecture
Notes in Mathematics, Volume 1563).
[Grot 53] A. Grothendieck (1953). Résumé de la théorie métrique des produits ten-
soriels topologiques, Bol. Soc. Mat. São Paulo 8 1–79.
[Had 93] J. Hadamard (1893). Résolution d'une question relative aux déterminants,
Bull. des Sciences Math. (2) 17 240–248.
[Hah 27] H. Hahn (1927). Über lineare Gleichungen in linearen Räumen, J. für die
reine und angewandte Math. 157 214–229.
[Hal 50] P.R. Halmos (1950). Measure Theory (Van Nostrand Reinhold).
[Har 20] G.H. Hardy (1920). Note on a theorem of Hilbert, Math. Zeitschr. 6 314
317.
[HaL 30] G.H. Hardy and J.E. Littlewood (1930). A maximal theorem with function-
theoretic applications, Acta Math. 54 81116.
[HaLP 52] G.H. Hardy, J.E. Littlewood and G. P olya (1952). Inequalities, 2nd edn
(Cambridge University Press).
[Hed 72] L. Hedberg (1972). On certain convolution inequalities, Proc. Amer. Math.
Soc. 36 505510.
[HiS 74] M.W. Hirsch and S. Smale (1974). Dierential Equations, Dynamical Sys-
tems, and Linear Algebra (Academic Press).
[H ol 89] O. H older (1889)

Uber ein Mittelwertsatz, Nachr. Akad. Wiss. G ottingen
Math. Phys. Kl. 3847.
[Hor 50] A. Horn (1950). On the singular values of a product of completely contin-
uous operators, Proc. Nat. Acad. Sci. USA 36 374–375.
[Hör 90] L. Hörmander (1990). The Analysis of Linear Partial Differential Opera-
tors I (Springer-Verlag).
[Hun 64] R.A. Hunt (1964). An extension of the Marcinkiewicz interpolation theo-
rem to Lorentz spaces, Bull. Amer. Math. Soc. 70 803–807.
[Hun 66] R.A. Hunt (1966). On L(p, q) spaces, L'Enseignement Math. (2) 12
249–275.
[Jan 97] S. Janson (1997). Gaussian Hilbert Spaces (Cambridge University Press).
[Jen 06] J.L.W.V. Jensen (1906). Sur les fonctions convexes et les inégalités entre
les valeurs moyennes, Acta Math. 30 175–193.
[Joh 48] F. John (1948). Extremum problems with inequalities as subsidiary condi-
tions, Courant Anniversary Volume 187–204 (Interscience).
[JoL 01,03] W.B. Johnson and J. Lindenstrauss (eds) (2001, 2003). Handbook of the
Geometry of Banach Spaces, Volumes 1 and 2 (Elsevier).
[Kah 85] J.-P. Kahane (1985). Some Random Series of Functions, 2nd edn
(Cambridge University Press).
[Khi 23] A. Khintchine (1923). Über dyadische Brüche, Math. Z. 18 109–116.
[Kol 25] A.N. Kolmogoroff (1925). Sur les fonctions harmoniques conjuguées et les
séries de Fourier, Fundamenta Math. 7 24–29.
[Kön 86] H. König (1986). Eigenvalue Distribution of Compact Operators
(Birkhäuser).
[Kwa 72] S. Kwapień (1972). Isomorphic characterizations of inner product spaces
by orthogonal series with vector valued coefficients, Studia Math. 44 583–595.
[Lac 63] H.E. Lacey (1963). Generalizations of Compact Operators in Locally Convex
Topological Linear Spaces (Thesis, New Mexico State University).
[La O 94] R. Latała and K. Oleszkiewicz (1994). On the best constant in the
Khinchin–Kahane inequality, Studia Math. 109 101–104.
[Lid 59] V.B. Lidskii (1959). Non-self-adjoint operators with a trace (Russian),
Doklady Acad. Nauk SSSR 125 485–487.
[LiP 68] J. Lindenstrauss and A. Pełczyński (1968). Absolutely summing operators
in L^p spaces and their applications, Studia Math. 29 275–321.
[LiT 79] J. Lindenstrauss and L. Tzafriri (1979). Classical Banach Spaces II
(Springer-Verlag).
[Lio 61] J.L. Lions (1961). Sur les espaces d'interpolation: dualité, Math. Scand. 9
147–177.
[Lit 86] J.E. Littlewood (1986). Littlewood's Miscellany, edited by Béla Bollobás
(Cambridge University Press).
[Lor 50] G.G. Lorentz (1950). Some new functional spaces, Ann. Math. 51 37–55.
[Lux 55] W.A.J. Luxemburg (1955). Banach Function Spaces, PhD thesis, Delft In-
stitute of Technology.
[LuZ 63] W.A.J. Luxemburg and A.C. Zaanen (1963). Notes on Banach function
spaces I–V, Indag. Math. 18 135–147, 148–153, 239–250, 251–263, 496–504.
[Mar 39] J. Marcinkiewicz (1939). Sur l'interpolation d'opérations, C. R. Acad. Sci.
Paris 208 1272–1273.
[Mer 09] J. Mercer (1909). Functions of positive and negative type, and their con-
nection with the theory of integral equations, Phil. Trans. A 209 415–446.
[Min 96] H. Minkowski (1896). Diophantische Approximationen (Leipzig).
[Mui 03] R.F. Muirhead (1903). Some methods applicable to identities and inequal-
ities of symmetric algebraic functions of n letters, Proc. Edinburgh Math. Soc.
21 144–157.
[Nel 73] E. Nelson (1973). The free Markov field, J. Funct. Anal. 12 211–227.
[Nev 76] J. Neveu (1976). Sur l'espérance conditionnelle par rapport à un mouvement
brownien, Ann. Inst. Poincaré Sect. B (N.S.) 12 105–109.
[Orl 32] W. Orlicz (1932). Über eine gewisse Klasse von Räumen vom Typus B,
Bull. Int. Acad. Polon. Sci. Lett. Cl. Math. Nat. A 207–222.
[Pal 33] R.E.A.C. Paley (1933). On orthogonal matrices, J. Math. Phys. 12 311–320.
[Pee 69] J. Peetre (1969). Sur la transformation de Fourier des fonctions à valeurs
vectorielles, Rend. Sem. Mat. Univ. Padova 42 15–26.
[Pel 67] A. Pełczyński (1967). A characterization of Hilbert–Schmidt operators,
Studia Math. 28 355–360.
[Pel 77] A. Pełczyński (1977). Banach Spaces of Analytic Functions and Absolutely
Summing Operators (Amer. Math. Soc. Regional conference series in mathe-
matics, 30).
[Phe 66] R.R. Phelps (1966). Lectures on Choquet's Theorem (Van Nostrand).
[Pie 63] A. Pietsch (1963). Zur Fredholmschen Theorie in lokalkonvexen Räumen,
Studia Math. 22 161–179.
[Pie 67] A. Pietsch (1967). Absolut p-summierende Abbildungen in Banachräumen,
Studia Math. 28 333–353.
[Pie 87] A. Pietsch (1987). Eigenvalues and s-Numbers (Cambridge University
Press).
[Pis 75] G. Pisier (1975). Martingales with values in uniformly convex spaces, Israel
J. Math. 20 326–350.
[Pis 87] G. Pisier (1987). Factorization of Linear Operators and Geometry of Ba-
nach Spaces (Amer. Math. Soc. Regional conference series in mathematics, 60,
second printing).
[Pis 89] G. Pisier (1989). The Volume of Convex Bodies and Banach Space Geometry
(Cambridge University Press).
[Pól 26] G. Pólya (1926). Proof of an inequality, Proc. London Math. Soc. 24 57.
[Pól 50] G. Pólya (1950). Remark on Weyl's note: inequalities between the two kinds
of eigenvalues of a linear transformation, Proc. Nat. Acad. Sci. USA 36 49–51.
[Ri(F) 10] F. Riesz (1910). Untersuchungen über Systeme integrierbarer Funktio-
nen, Math. Ann. 69 449–497.
[Ri(F) 32] F. Riesz (1932). Sur un théorème de MM. Hardy et Littlewood, J. London
Math. Soc. 7 10–13.
[Ri(M) 26] M. Riesz (1926). Sur les maxima des formes linéaires et sur les fonc-
tionnelles linéaires, Acta Math. 49 465–497.
[Rud 79] W. Rudin (1979). Real and Complex Analysis (McGraw-Hill).
[Rud 90] W. Rudin (1990). Fourier Analysis on Groups (Wiley Classics Library).
[Ryf 65] J.V. Ryff (1965). Orbits of L^1-functions under doubly stochastic transfor-
mations, Trans. Amer. Math. Soc. 117 92–100.
[Scha 50] R. Schatten (1950). A Theory of Cross-Spaces (Ann. Math. Stud. 26).
[Sch 23] I. Schur (1923). Über eine Klasse von Mittelbildungen mit Anwendungen
auf die Determinantentheorie, Sitzungsber. d. Berl. Math. Gesellsch. 22 9–20.
[Schw 85] H.A. Schwarz (1885). Über ein die Flächen kleinsten Flächeninhalts betr-
effendes Problem der Variationsrechnung, Acta Soc. Scient. Fenn. 15 315–362.
[Smi 62] F. Smithies (1962). Integral Equations (Cambridge University Press).
[Ste 04] J.M. Steele (2004). The CauchySchwarz Master Class (Cambridge Univer-
sity Press).
[Stei 70] E.M. Stein (1970). Singular Integrals and Differentiability Properties of
Functions (Princeton University Press).
[Stei 93] E.M. Stein (1993). Harmonic Analysis: Real-Variable Methods, Orthogo-
nality and Oscillatory Integrals (Princeton University Press).
[StW 71] E.M. Stein and G. Weiss (1971). Introduction to Fourier Analysis on Eu-
clidean Spaces (Princeton University Press).
[TaL 80] A.E. Taylor and D.C. Lay (1980). Introduction to Functional Analysis
(Wiley).
[Tho 39] G.O. Thorin (1939). An extension of a convexity theorem due to M. Riesz,
Kungl. Fysiogr. Sällsk. i Lund Förh. 8 no. 14.
[Tom 89] N. Tomczak-Jaegermann (1989). Banach–Mazur Distances and Finite-
Dimensional Operator Ideals (Pitman).
[vLW 92] J.H. van Lint and R.M. Wilson (1992). A Course in Combinatorics
(Cambridge University Press).
[Vil 39] J. Ville (1939). Étude Critique de la Notion de Collectif (Gauthier-Villars).
[Wey 49] H. Weyl (1949). Inequalities between the two kinds of eigenvalues of a
linear transformation, Proc. Nat. Acad. Sci. USA 35 408–411.
[Wil 91] D. Williams (1991). Probability with Martingales (Cambridge University
Press).
[Zyg 56] A. Zygmund (1956). On a theorem of Marcinkiewicz concerning interpola-
tion of operations, J. Math. Pures Appl. 35 223–248.
Index of inequalities
arithmetic mean–geometric mean 19
generalized 25, 100
Babenko–Beckner 230–231
Beckner 229
Bell 1
Bonami 206
Carleman 22, 23
Cauchy 13
Cauchy–Schwarz 15, 35, 151, 243, 275
Clarkson 152
generalized 146
Cotlar 169, 184
Doob 130
Grothendieck 304–306
Hadamard 233
Hadamard's three lines 135
Hardy 158, 160, 164, 324
Hardy–Riesz 103–106, 258
Harker–Kasper 18
Hausdorff–Young 143, 165, 230
vector-valued 144
Hedberg 125
generalized 134
Hilbert 65, 173–175
absolute 65, 66
Hölder 50, 51
generalized 53, 253
Horn 247–249, 253, 291
hypercontractive 213, 219
Gaussian 221–223
incremental 18
Jensen 25, 28, 39, 212
Kahane 201, 204, 210, 276, 281–282, 287, 312, 317
Khintchine 192, 211, 222, 276, 281–282, 287, 303
Ky Fan 251, 291
Lévy 189
Liapounov 54
Littlewood 55
Littlewood's 4/3 302–303, 320
logarithmic Sobolev 213–216, 219
Gaussian 225–228
Loomis–Whitney 60
Markov 46, 200, 203
mean-value 40
Minkowski 46
reverse 49
Paley 165
(about H^1) 307
Pietsch 294, 296
Schwarz 17
Sobolev 60, 228
Weyl 101, 247–249, 258, 294–296
Young 75, 141, 151
Index
absolutely convergent 263
absolutely convex 43
absolutely summing operators 265, 287, 310,
324
adapted sequence, process 127
Alfsen, E.M. 100
algebraic multiplicity 240
almost everywhere 6
convergence 110
differentiation 117
almost surely 6
Ané, C. et al. 232
angle function 85
Applebaum, D. 232
approximate identity 122–124
approximation numbers 289–294
Araujo, A. 204
arithmetic mean 19
associate function norm 72
space 72
atom 5
atom-free 5
Baire σ-field
Bakry, D. 204
ball
Euclidean closed 118
Euclidean open 118
Banach, S. 40
Banach–Alaoglu theorem 90
Banach–Mazur distance 287
Banach space 34
associate function space 72
function space 70
Bartle, R.G. 12
barycentre 26, 39, 44
batsman 112
Baumert, L.D. 237
Beckner, W. 229, 230
Beckner's theorem 230
bell-shaped approximate identity 122–124
Bennett, C. 76, 165
Bergh, J. 165
Bernoulli random variables 142, 187
Billingsley, P. 12
Blaschke product 307
Blei, R. 323
Bochner integrable 40
Bohnenblust, H.F. 40
Bollobás, B. 2, 36, 261
Bonami, A. 232
Bonami's theorem 208, 227–228
Borel–Cantelli lemma, first 6, 7, 110, 191, 195
Borel set 5
Breiman, L. 204
Bullen, P.S. 23
Buniakovski, V. 13
Calderón, A.P. 135, 170
Calderón's interpolation theorem 88, 89, 129,
256
converse 89, 93
Calderón–Zygmund kernel, regular 181–182
Carathéodory, C. 100
Carleman, T. 22
Cauchy, A. 13, 19
Cauchy distribution 199, 205
Cauchy–Riemann equations, generalized 185
central limit theorem, De Moivre's 196, 219,
222, 226–227, 232
centre of mass 26
character 141
characteristic function 18
characteristic polynomial 239
Choquet theory 100
Clarkson 147
Clifford algebras 185
compact operator 242–246
compatible couple 136
concave 25
Schur 98, 102
strictly 25
conference matrix 235
conjugate 43
index 50
Poisson kernel 167, 175
contraction principle 187–189
convergence almost everywhere 110
convex set 24
function 24
Schur 98, 102
strictly 25
convexity, uniform 147
convolution 88
kernels 121
correlation 223
cotype 312–317
Gaussian 314–315
counting measure 5
decreasing rearrangement 79
De Moivre central limit theorem 196, 219, 222,
226–227, 232
Denjoy–Carleman theorem 22
Diestel, J. 40, 90, 323
differentiation almost everywhere 117
dilation operator 180
Dini's theorem 276, 287
Dirac measure 29
Dirac operator 185
directional derivative 29
distribution function 11
dominated convergence theorem 10
Doob, J.L. 130, 133
weak type 133
doubly stochastic matrix 94–98, 102
Dowson, H.R. 242, 261
dual Banach space 34
dual group 140, 142
Dudley, R.M. 12
Dunford, N. 36, 242, 261
Duoandikoetxea, J. 185
Duren, P.L. 307, 321
Dvoretzky–Rogers theorem 282–283
dyadic cube 127
dyadic filtration 127
eigenspace 239
generalized 239
eigenvalues 239–240, 296–300
Enflo, P. 150, 301
entropy 102, 213
information 213
equidistributed 79
error-correcting codes 237–238
essentially bounded 47
Euclidean ball
closed 118
open 118
expectation 9
extreme point 97
Fatou's lemma 10, 71, 252
Feller, W. 196
finitely represented 150
first Borel–Cantelli lemma 6, 7, 110, 191, 195
Fourier transform 141, 168
vector-valued 143
Fourier type 144–145
strict 152
Fournier, J.J.F. 309
fractional integral operator 125
Fréchet–Riesz representation theorem 35, 47
Fredholm integral operator 243, 262
Fubini's theorem 10
function norm 70
associate function space 72
Garling, D.J.H. 283, 287
gauge 30
Gaussian
correlated random variables 223
kernel 122
Hilbert space 216, 232, 305
measure 216
random variable, standard 198
Gelfand numbers 289–294
generalized Cauchy–Riemann equations 185
geometric mean 19, 42
Gilbert, J.E. 185, 186
Giné, E. 204
Golomb, S.W. 237
Gordon, Y. 283
Gram–Schmidt orthonormalization 42, 217
graph Laplacian 212
greedy algorithm 120
Gross, L. 232
Grothendieck, A. 287, 323
constant 304
little theorem 310–311, 318–322
theorem 306–310, 324
Haar measure 88, 140, 142
Hadamard, J. 233, 235
matrix 234
numbers 234, 237
Hahn, H. 40
Hahn–Banach theorem 31, 36, 40, 73, 278, 299
separation theorem 33, 37, 38, 218, 277, 279,
285
Hall Jr., M. 237
Halmos, P. R. 12
Hamming metric 215
Hardy, G.H. 1, 67, 105, 106, 112, 195, 323
Hardy–Littlewood maximal operator 116, 119
Hardy space 66
harmonic analysis 140
harmonic mean 23
Hedberg, L. 125
Hermite polynomial 217
Hermitian operator 243
Hilbert–Schmidt class 253–254
operators 257, 263, 273, 287
Hilbert space 35
Gaussian 216, 232
Hilbert transform 167–178
maximal 169
Hirsch, M.W. 239
Hölder, O. 67
homogeneous measure space 89
Horn, A. 249
Hörmander, L. 22
Hörmander's condition 182
Hunt, R.A. 162, 165
hypercontractive semigroup 18
indicator function 9
information entropy 213
inner product 14
integrable 10
Bochner 40
locally 117
strongly 40
weakly 40
integration 9
intermediate space 137
interpolation theorem 62
Calderón 88, 89, 129, 256
Marcinkiewicz 154, 162, 170
Riesz–Thorin 138–140
iterated logarithm, law 194–196
Itô calculus 225
Janson, S. 232
Jarchow, H. 90, 323
Jensen, J.L.W.V. 40
John, F. 287
Johnson, W.B. 2
Jordan normal form 239–240
Kahane, J.-P. 199
Kahane's theorem 202
kernel
convolution 121
Gaussian 122
Poisson 121
regular Calderón–Zygmund 181–182
Riesz 181, 185
Khintchine, A. 193, 195, 196, 204
Kolmogoroff, A.N. 12, 170, 196, 204
König, H. 258, 301
Kronecker product
Kwapień, S. 144, 150, 283, 323
theorem 315
Ky Fan's theorem 250
Lacey, H.E. 301
Lagrange's identity 13
Laplacian, graph 212
Latała, R. 211
law of the iterated logarithm 194–196
Lay, D.C. 2
Lebesgue
decomposition theorem 47
density theorem 121
Legendre character
Legendre–Fenchel transform 77
Lidskii, V.B. 258
trace formula 257–260
Lindenstrauss, J. 2, 150, 287
Lipschitz condition 180
continuity 27, 43
function 27
Lions, J.L. 135
Littlewood, J.E. 1, 106, 112, 133, 135,
193–195, 204, 287, 303, 323
local
martingale 128
martingale, bounded in L^p 130
sub-martingale 128
super-martingale 128
locally compact group 88, 140
locally in measure 8
locally integrable 117
L^p spaces 45
Löfström, J. 165
Lorentz, G.G. 165
Lorentz space 156–162
Luxemburg, W.A.J. 76
Luxemburg norm 73
majorization 84
weak 84
Marcinkiewicz, J. 154, 162
interpolation theorem 154,
162, 170
martingale 127
closed 130
convergence theorem 131–132
stopped 132
sub-martingale 128
super-martingale 128
matrix
conference 235
doubly stochastic 94–98, 102
Hadamard
permutation 94, 98, 102
stochastic 100
transfer 94
transposition 94
Maurey's extension theorem 323
maximal function
Muirhead's 82, 83, 156
maximal Hilbert transform 169
maximal operator
Hardy–Littlewood 116, 119
maximal sequence
Muirhead's 93
maximal theorem
F. Riesz's 113, 115, 119
measure 4, 5
Borel 5, 140
counting 5
Dirac 29
Gaussian 216
Haar 88, 140
locally in 8
singular 134
σ-finite 6
measure space 6
homogeneous 89
measurable set 4, 6
function 7
strongly 40
weakly 39
Mercer, J. 287
theorem 274–276, 287
Minkowski, H. 67, 100
Mitrinović, D.S. 23
monotone convergence theorem 10
Muirhead, R.F. 23, 99, 100
Muirhead's maximal function 82, 83, 156
maximal numbers 242, 256
theorem 99
multiplier 179
Murray, M.A.M. 85, 186
M_0, M_1, M, M^∞, 78–79
Nelson, E. 232
Neveu, J. 225
norm 34
function 70
Luxemburg 73
Orlicz
projective 90
supremum 34
normal numbers 195
null set 6
function 7
Oleszkiewicz, K. 211
operator
absolutely summing 265, 287, 310, 324
compact 243–246
Dirac 185
fractional integral 125
Hardy–Littlewood maximal 116, 119
Hermitian 243
Hilbert–Schmidt 257, 263, 273, 287
positive 243–246, 273–274
p-summing 269–271, 281–282
(p, q)-summing 266–269
(p, 2)-summing 271–272
related 241
Riesz 240, 273, 301
Riesz potential 125
singular integral 167, 179–185
that factor through a Hilbert space 284
operator ideals 251–252
duality 260–261
Orlicz, W. 76, 303
Orlicz space 70, 73, 204
norm 76
theorem 288, 323
Ornstein–Uhlenbeck semigroup 219, 228
infinitesimal generator 225
orthogonal projection 42
Paley, R.E.A.C. 235, 323
Paley's theorem 235–237
Peetre, J. 150
Pełczyński, A. 287, 307
permutation matrix 94, 98, 102
Pettis theorem 39
Phelps, R.R. 100
Pietsch, A. 241–242, 287, 289, 301, 307
domination theorem 277–279, 299, 320
factorization theorem 279–280
Pisier, G. 150, 323
Plancherel theorem 142–143, 151, 168–169
Poisson kernel 121, 167, 175, 179
conjugate Poisson kernel 167, 175
Pólya, G. 1, 22, 249
positive homogeneous 30, 81
positive operator 243–246, 273–274
potential theory 125
Riesz potential operator 125
principal vector 239
projective norm 90
p-summing operators 269–271, 281–282
(p, q)-summing operators 266–269
(p, 2)-summing operators 271–272
Rademacher functions 194
radially open 29
Radon–Nikodým theorem 48, 68
property 69
random variable 7
Bernoulli 142, 187
Gaussian, standard 198
stable 199
sub-Gaussian 199, 214
Rayleigh–Ritz minimax formula 246–247, 290
rearrangement, decreasing 79
invariant 80, 87, 129
reflection principle 189
reflexive 57, 68, 149
regular Calderón–Zygmund kernel 181–182
resolvent 240
set 240
Riesz, F. 67, 113, 115
maximal theorem 113, 115, 119
representation theorem 269, 299, 310
sunrise lemma 115
Riesz kernel 181, 185
Riesz, M. 67, 105, 115
Riesz operator 240, 273, 301
Riesz potential operator 125
Riesz–Thorin interpolation theorem 138–140
Riesz weak type 111, 116, 131, 133
constant 111, 116
Rudin, W. 12, 140
Ryff, J.V. 89
sample variance 102
Schatten, R. 261
Schur, I. 98
Schur convex 98, 102
concave 98, 102
Schur's test 63
theorem 63
Schwarz, H.A. 14
Schwartz, J.T. 36, 242, 261
semi-norm 34
separation theorem 33, 90, 218
Sharpley, R. 76, 165
signum 51
simple function 9
singular integral operator 167, 179–185
singular numbers 246
Smale, S. 239
Smithies, F. 287
Sobczyk, A. 40
spectrum 239
spectral radius 240
stable random variable 199
Steele, J.M. 17
Stein, E.M. 165, 185, 186
Stirling's formula 41, 193, 220, 238
stochastic integration 225
stochastic matrix 100
Stone–Weierstrass theorem 143, 219
stopping time 130, 190
strict Fourier type 152
strong law of large numbers 195
strong type 108
constant 108
strongly embedded 196–197, 203
subadditive 30, 81, 291
sub-Gaussian random variable 199, 214
sublinear 30
functional 30, 81
sub-martingale 128
submultiplicative 291
sunrise lemma 115
super-martingale 128
super-reexive 150, 153
supremum norm 34
symmetric Banach sequence space 92
symmetric sequence 187
σ-compact 140
σ-field 4
Taylor, A.E. 2
tensor product 90
Thorin, G.O. 135
Tomczak-Jaegermann, N. 287
Tonge, A. 90, 323
trace 242–243, 257
class 256–257
Lidskii's formula 257–260
transfer
matrix 94, 99
method of 20
translation operator 179
transpose 43
transposition matrix 94
trigonometric polynomials 143
type 312–317
Gaussian 314–315
Riesz weak 111, 116, 131, 133
Riesz weak constant 111, 116
strong 108
strong constant 108
weak 109
weak constant 100
Tzafriri, L. 150
Uhl Jr., J.J. 40
unconditionally convergent
263265
uniform convexity 147149, 153
uniform smoothness 150
van Lint, J.H. 238
Vasić, P.M.
Ville, J. 130
Vitali covering lemma 133
von Neumann, J. 48
Walsh function 142, 209,
211, 212
weak-L^p space 156
weak operator topology 90
weak type 109
constant 109
Riesz 111, 116,
131, 133
weakly integrable 40
weakly measurable 39
Weiss, G. 165, 195
Weyl, H. 249
numbers 289–294
Wick product 227
Wiener chaos 227
Williams, D. 12
Wilson, R.M. 238
Wojtaszczyk, P. 307
Young's function 73
complementary 75
Zaanen, A.C. 76
Zygmund, A. 154, 170