
Functional Analysis I

Autumn Term 2008

James C. Robinson
Introduction

I hope that these notes will be useful. They are, of course, much more wordy
than the notes you will have taken in lectures, but the maths itself is usually
done in a little more detail and should generally be tighter. You may find
that the order in which the material is presented is a little different to the
lectures, but this should make things more coherent.

Solutions to the examples sheets will follow separately.

I hope that there are relatively few mistakes, but if you find yourself
staring at something thinking that it must be wrong then it most likely is,
so do email me at j.c.robinson@warwick.ac.uk. I will post a list of errata
as and when people find them on my webpage for the course,
www.maths.warwick.ac.uk/jcr/FAI.

These notes will form the basis of the first part of a textbook on functional
analysis, so any general comments would also be welcome.

Contents

1 Vector spaces and bases
2 Norms and normed spaces
   2.1 Norms and normed spaces
   2.2 Convergence
3 Compactness and equivalence of norms
   3.1 Compactness
4 Completeness
   4.1 The completion of a normed space
5 Lebesgue integration
   5.1 Integrals of functions in L_step(R)
   5.2 Increasing sequences of functions in L_step(R): L_inc(R)
   5.3 The space L^1(R) of integrable functions
   5.4 The Lebesgue spaces L^p
6 Inner product spaces
   6.1 Inner products and norms
   6.2 The Cauchy-Schwarz inequality
   6.3 The relationship between inner products and their norms
7 Orthonormal bases in Hilbert spaces
   7.1 Orthonormal sets
   7.2 Convergence and orthonormality in Hilbert spaces
   7.3 Orthonormal bases in Hilbert spaces
8 Closest points and approximation
   8.1 Closest points in convex subsets
   8.2 Linear subspaces and orthogonal complements
   8.3 Best approximations
9 Separable Hilbert spaces and ℓ^2
10 Linear maps between Banach spaces
   10.1 Bounded linear maps
   10.2 Kernel and range
11 The Riesz representation theorem and the adjoint operator
   11.1 Linear operators from H into H
12 Spectral theory I: general theory
   12.1 Spectrum and point spectrum
13 Spectral theory II: compact self-adjoint operators
   13.1 Complexification and real eigenvalues
   13.2 Compact operators
14 Sturm-Liouville problems

1
Vector spaces and bases

In all that follows we use K to denote R or C, although one can define vector spaces over arbitrary fields.

R^n is the simplest and most natural example of a vector space. We give a formal definition, but it is the closure property inherent in the definitions,

αf + g ∈ V  for all  f, g ∈ V, α ∈ K,  K = R or C,

that one usually has to check.

Definition 1.1 A vector space V over K is a set V with operations + : V × V → V and · : K × V → V such that

(i) additive and multiplicative identities exist: there exists a zero element 0 ∈ V such that x + 0 = x for all x ∈ V; and 1 ∈ K is the identity for scalar multiplication, 1 · x = x for all x ∈ V;
(ii) there are additive inverses: for every x ∈ V there exists an element −x ∈ V such that x + (−x) = 0;
(iii) addition is commutative and associative: x + y = y + x and x + (y + z) = (x + y) + z for all x, y, z ∈ V;
(iv) multiplication is associative,

α · (β · x) = (αβ) · x  for all α, β ∈ K, x ∈ V,

and distributive,

α · (x + y) = α · x + α · y  and  (α + β) · x = α · x + β · x

for all α, β ∈ K, x, y ∈ V.


The multiplication operator · can usually be understood from the context, and so we generally drop this notation, writing αx for α · x.

As remarked above we will only consider K = R or C here, and will refer to real or complex vector spaces respectively. Generally we will omit the word 'real' or 'complex' unless we wish to make an explicit distinction between real and complex vector spaces.

Examples: R^n is a real vector space over R; it is not a vector space over C (since ix ∉ R^n for any non-zero x ∈ R^n); C^n is a vector space over both R and C (so if we take K = R the space C^n can be thought of, somewhat unnaturally, as a real vector space).

Example 1.2 For 1 ≤ p < ∞ define the space ℓ^p(K) of all pth power summable sequences with elements in K (recall that K = R or C):

ℓ^p(K) = {x = (x_1, x_2, ...) : x_j ∈ K, Σ_{j=1}^∞ |x_j|^p < +∞}.

For p = ∞, ℓ^∞(K) is the space of all bounded sequences in K.

For x, y ∈ ℓ^p(K) set

x + y = (x_1 + y_1, x_2 + y_2, ...),

and for α ∈ K, x ∈ ℓ^p, define

αx = (αx_1, αx_2, ...).

With these definitions ℓ^p(K) is a vector space. The only issue is whether x + y is still in ℓ^p(K); for 1 ≤ p < ∞ this follows since

Σ_{j=1}^∞ |x_j + y_j|^p ≤ Σ_{j=1}^∞ 2^p (|x_j|^p + |y_j|^p) = 2^p Σ_{j=1}^∞ |x_j|^p + 2^p Σ_{j=1}^∞ |y_j|^p < +∞,

and for p = ∞ it is clear.

Sometimes we will simply write ℓ^p for ℓ^p(R).
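The closure estimate above is easy to illustrate numerically on truncated sequences. A minimal sketch (the particular sequences x_j = 1/j and y_j = 1/j^2, the truncation level, and the choice p = 2 are all arbitrary, for illustration only):

```python
# Partial sums of |x_j|^p, and the closure bound from Example 1.2:
#   sum |x_j + y_j|^p <= 2^p (sum |x_j|^p + sum |y_j|^p).
N = 10_000                                 # truncation level (illustrative)
x = [1.0 / j for j in range(1, N + 1)]     # x_j = 1/j, lies in l^p for p > 1
y = [1.0 / j**2 for j in range(1, N + 1)]  # y_j = 1/j^2

def power_sum(seq, p):
    """Partial sum of |s_j|^p, approximating the series defining l^p membership."""
    return sum(abs(s) ** p for s in seq)

p = 2
lhs = power_sum([a + b for a, b in zip(x, y)], p)
rhs = 2**p * (power_sum(x, p) + power_sum(y, p))
assert lhs <= rhs  # the bound holds term by term, hence for every partial sum
```

The same bound holds at every truncation level, which is what guarantees that the full series on the left converges.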

Example 1.3 The space C^0([0,1]) of all real-valued continuous functions on the interval [0,1] is a vector space with the obvious definitions of addition and scalar multiplication, which we give here for the one and only time: for f, g ∈ C^0([0,1]) and α ∈ R, we denote by f + g the function whose values are given by

(f + g)(x) = f(x) + g(x),  x ∈ [0,1],

and by αf the function whose values are

(αf)(x) = αf(x),  x ∈ [0,1].

Example 1.4 Denote by L^1(0,1) the set of all real-valued continuous functions on (0,1) for which

∫_0^1 |f(x)| dx < +∞.

Then L^1(0,1) is a vector space (with the obvious definitions of addition and scalar multiplication).

The only thing to check here is that f + αg ∈ L^1(0,1) whenever f, g ∈ L^1(0,1) and α ∈ R. Clearly f + αg ∈ C^0(0,1), and we have

∫_0^1 |(f + αg)(x)| dx ≤ ∫_0^1 |f(x)| + |α||g(x)| dx = ∫_0^1 |f(x)| dx + |α| ∫_0^1 |g(x)| dx < +∞.

Note that if f ∈ C^0([0,1]) then, since it is a continuous function on a closed bounded interval, it is bounded and attains its bounds. It follows that for some M ≥ 0, |f(x)| ≤ M for all x ∈ [0,1], and so

∫_0^1 |f(x)| dx ≤ M < ∞,

i.e. f ∈ L^1(0,1).

But while the function f(x) = x^{−1/2} is not continuous on [0,1], it is continuous on (0,1) and

∫_0^1 |x^{−1/2}| dx = [2x^{1/2}]_0^1 = 2 < ∞,

so f ∈ L^1(0,1). These two examples show that C^0([0,1]) is a strict subset of L^1(0,1).

We now discuss spanning sets, linear independence, and bases. Note that the definitions and the following arguments also apply to infinite-dimensional spaces. In particular the result of Lemma 1.9 is valid for infinite-dimensional spaces.

Definition 1.5 The linear span of a subset E of a vector space V is the collection of all finite linear combinations of elements of E:

Span(E) = {v ∈ V : v = Σ_{j=1}^n α_j e_j, n ∈ N, α_j ∈ K, e_j ∈ E}.

We say that E spans V if V = Span(E), i.e. every element of V can be written as a finite linear combination of elements of E.

Note that this definition requires v to be expressed as a finite linear combination of elements of E. When discussing bases for abstract vector spaces with no additional structure the only option is to take finite linear combinations, since these are defined using only the axioms for a vector space (scalar multiplication and addition of vector space elements). In order to take infinite linear combinations we require some way to discuss convergence, which is not available in a general vector space.

Definition 1.6 A set E is linearly independent if any finite collection of elements of E is linearly independent:

Σ_{j=1}^n α_j e_j = 0  ⟹  α_1 = ··· = α_n = 0

for any choice of n ∈ N, α_j ∈ K, and e_j ∈ E.

Definition 1.7 A Hamel basis for V is a linearly independent spanning set.

Expansions in terms of basis elements are unique:

Lemma 1.8 If E is a Hamel basis for V then any element of V can be written uniquely in the form

v = Σ_{j=1}^n α_j e_j

for some n ∈ N, α_j ∈ K, and e_j ∈ E.



For a proof see Linear Algebra.

If E is a linearly independent set that spans V then it is not possible to find an element of V that can be added to E to obtain a larger linearly independent set (otherwise E would not span V). We now show that the converse also holds.

Lemma 1.9 Suppose that E ⊆ V is a maximal linearly independent set, i.e. a linearly independent set E such that E ∪ {v} is not linearly independent for any v ∈ V \ E. Then E is a Hamel basis for V.

Proof Suppose that E does not span V: in particular take v ∈ V that cannot be written as any finite linear combination of elements of E. To obtain a contradiction, choose n ∈ N and {e_j}_{j=1}^n with e_j ∈ E, and suppose that

Σ_{j=1}^n α_j e_j + α_{n+1} v = 0.

Since v cannot be written as a sum of any finite collection of the {e_j}, we must have α_{n+1} = 0, which leaves Σ_{j=1}^n α_j e_j = 0. However, {e_j} is a finite subset of E and is thus linearly independent by assumption, and so we must have α_j = 0 for all j = 1, ..., n + 1. But this says that E ∪ {v} is linearly independent, a contradiction. So E spans V.

We recall here the following fundamental theorem:

Theorem 1.10 Suppose that V has a basis consisting of a finite number of elements. Then every basis of V contains the same number of elements.

This allows us to make the following definition:

Definition 1.11 If V has a basis consisting of a finite number of elements then the dimension of V is the number of elements in this basis. If V has no finite basis then V is infinite-dimensional.

Since a Hamel basis is a maximal linearly independent set (Lemma 1.9), it follows that a space is infinite-dimensional iff for every n ∈ N one can find a set of n linearly independent elements of V.

Example 1.12 For any n ∈ N the n elements of ℓ^2(K) given by

(1, 0, 0, 0, ...), (0, 1, 0, 0, ...), ..., (0, 0, ..., 0, 1, 0, ...),

i.e. elements e^(j) with e^(j)_j = 1 and e^(j)_i = 0 if i ≠ j, are linearly independent. It follows that ℓ^2(K) is an infinite-dimensional vector space.

Example 1.13 Consider the functions f_n ∈ C^0([0,1]), where f_n is zero for x ∉ I_n = [2^{−n} − 2^{−(n+2)}, 2^{−n} + 2^{−(n+2)}] and interpolates linearly between the values

f_n(2^{−n} − 2^{−(n+2)}) = 0,  f_n(2^{−n}) = 1,  f_n(2^{−n} + 2^{−(n+2)}) = 0.

The intervals I_n where f_n ≠ 0 are disjoint, and f_n(2^{−n}) = 1. It follows that for any n the {f_j}_{j=1}^n are linearly independent, and so C^0([0,1]) is infinite-dimensional.
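A sketch of these 'tent' functions, checking that the supports really are disjoint (the explicit formula for f_n below is one interpolation consistent with the three values given above):

```python
# f_n is the piecewise linear 'tent' of height 1 supported on
# I_n = [2^-n - 2^-(n+2), 2^-n + 2^-(n+2)], as in Example 1.13.
def f(n, x):
    c, r = 2.0**-n, 2.0 ** -(n + 2)        # centre and half-width of I_n
    return max(0.0, 1.0 - abs(x - c) / r)  # linear interpolation; f_n(2^-n) = 1

# The supports are pairwise disjoint: the right end of I_{n+1} lies strictly
# to the left of the left end of I_n.
for n in range(1, 10):
    left_end_n = 2.0**-n - 2.0 ** -(n + 2)
    right_end_next = 2.0 ** -(n + 1) + 2.0 ** -(n + 3)
    assert right_end_next < left_end_n

# Hence if sum_j a_j f_j = 0 identically, evaluating at the peak x = 2^-j
# (where only f_j is non-zero) forces a_j = 0: linear independence.
assert f(3, 2.0**-3) == 1.0
assert f(4, 2.0**-3) == 0.0 and f(2, 2.0**-3) == 0.0
```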

We end this section with the following powerful-looking theorem:

Theorem 1.14 Every vector space has a Hamel basis.

The proof makes use of Zorn's Lemma. In order to state this result (which is in fact more of an axiom, since it is equivalent to the Axiom of Choice) we need to introduce some auxiliary concepts.

A partial order on a set P is a binary relation ≼ on P such that

(i) a ≼ a for all a ∈ P,
(ii) a ≼ b and b ≼ a implies that a = b, and
(iii) a ≼ b and b ≼ c implies that a ≼ c.

The order is partial because two arbitrary elements of P need not be ordered: consider, for example, the case when P consists of all subsets of R and X ≼ Y if X ⊆ Y; one cannot order [0,1] and [1,2].

A subset C of P in which all elements can be ordered is called a chain (i.e. for all a, b ∈ C, either a ≼ b or b ≼ a (or both, in which case a = b)).

An element b ∈ P is an upper bound for a subset S of P if s ≼ b for all s ∈ S. An element m of P is maximal if m ≼ a for some a ∈ P implies that a = m.

Lemma 1.15 (Zorn's Lemma) If every chain in a non-empty partially ordered set P has an upper bound, then P has at least one maximal element.

We can now give the proof of Theorem 1.14.

Proof If V is finite-dimensional then V has a finite basis, by definition.

So assume that V is infinite-dimensional. Let P be the collection of all linearly independent subsets of V. Let E_1 ≼ E_2 if E_1 ⊆ E_2. Let C = {E_i}_{i∈I} be a chain in P, where I is an index set, and let

E = ⋃_{i∈I} E_i.

Note that E is linearly independent, since any finite collection of elements of E must be contained in one of the E_i (which is linearly independent). Clearly E_i ≼ E for all i ∈ I, so E is an upper bound for C.

It follows from Zorn's Lemma that P has a maximal element, i.e. a maximal linearly independent set. By Lemma 1.9 this is a basis for V.

Exercise 1.16 Show that the space f consisting of all sequences that contain only finitely many non-zero terms is a vector space. Show that

e_j = (0, ..., 0, 1, 0, ...)

(all zeros except for a single 1 in the jth position) is a Hamel basis for f.

This is a very artificial example. No Banach space (a particularly nice kind of vector space, see later) can have a countable Hamel basis. We will show in Chapter * that no Hilbert space can have a countable Hamel basis.
2
Norms and normed spaces

2.1 Norms and normed spaces

Definition 2.1 A norm on a vector space V is a map ‖·‖ : V → R such that for all x, y ∈ V and α ∈ K

(i) ‖x‖ ≥ 0 with equality iff x = 0;
(ii) ‖αx‖ = |α|‖x‖; and
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality).

A vector space equipped with a norm is called a normed space.

Strictly a normed space should be written (V, ‖·‖_V), where ‖·‖_V is the particular norm on V. However, many normed spaces have standard norms, and so often the norm is not specified. E.g. unless otherwise stated, R^n is equipped with the standard norm

‖x‖ = (Σ_{j=1}^n |x_j|^2)^{1/2}.   (2.1)

However, others are possible, such as

‖x‖_∞ = max_{i=1,...,n} |x_i|  and  ‖x‖_1 = Σ_{i=1}^n |x_i|.

Exercise 2.2 Show that ‖·‖, ‖·‖_∞, and ‖·‖_1 are all norms on R^n.
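The norm axioms can at least be sanity-checked numerically before attempting Exercise 2.2; the sample dimension, vectors, and tolerance below are arbitrary choices:

```python
# The three norms on R^n from the text, checked on random sample vectors.
import math
import random

def norm2(x):     # the standard (Euclidean) norm (2.1)
    return math.sqrt(sum(v * v for v in x))

def norm_inf(x):  # the max norm
    return max(abs(v) for v in x)

def norm1(x):     # the sum norm
    return sum(abs(v) for v in x)

random.seed(0)
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    lam = random.uniform(-3, 3)
    for N in (norm2, norm_inf, norm1):
        # homogeneity and the triangle inequality, up to rounding error
        assert abs(N([lam * v for v in x]) - abs(lam) * N(x)) < 1e-12
        assert N([a + b for a, b in zip(x, y)]) <= N(x) + N(y) + 1e-12
```

Of course this checks instances, not the axioms themselves; the proofs are the content of the exercise.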


Example 2.3 For 1 ≤ p < ∞ the standard norm on ℓ^p(K) is

‖x‖_p = (Σ_{j=1}^∞ |x_j|^p)^{1/p},

where x = (x_1, x_2, x_3, ...). For p = ∞ we set

‖x‖_∞ = sup_j |x_j|.

Note that when p = 2 and K = R this is the natural extension of the standard Euclidean norm to a countable collection of real numbers.

We now show that this really does define a norm. It is clear that ‖x‖_p ≥ 0, and that if the norm is zero then x = 0. It is also clear that

‖αx‖_p = (Σ_{j=1}^∞ |αx_j|^p)^{1/p} = (|α|^p Σ_{j=1}^∞ |x_j|^p)^{1/p} = |α| (Σ_{j=1}^∞ |x_j|^p)^{1/p} = |α|‖x‖_p.

(This requirement explains why we have to take the pth root of the sum of pth powers.)

It is the triangle inequality that requires some work. Although the argument is a little long, on the way we will prove two auxiliary and very useful inequalities. We say that (p, q) with 1 < p, q < ∞ are conjugate if

1/p + 1/q = 1;   (2.2)

we extend the definition to cover the case p = 1, q = ∞.

Lemma 2.4 (Young's inequality) Let a, b > 0 and (p, q) be conjugate with 1 < p, q < ∞. Then

ab ≤ a^p/p + b^q/q.   (2.3)

Proof Consider the function

f(t) = t^p/p + 1/q − t.

Then

df/dt = t^{p−1} − 1,

and this is zero only for t = 1, where f(1) = 0. The second derivative is positive for t > 0, so this is a minimum. It follows that f(t) ≥ 0 for all t ≥ 0. In particular, choosing t = ab^{−q/p} we obtain

(a^p b^{−q})/p + 1/q − ab^{−q/p} ≥ 0,

so that

a^p/p + b^q/q ≥ ab^{−q/p} b^q = ab.
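Young's inequality is easy to probe numerically; the exponents and sample values below are arbitrary:

```python
# Check ab <= a^p/p + b^q/q for conjugate exponents 1/p + 1/q = 1.
for p in (1.5, 2.0, 3.0, 10.0):
    q = p / (p - 1.0)                      # conjugate exponent
    for a in (0.1, 0.5, 1.0, 2.0, 7.3):
        for b in (0.2, 1.0, 3.0, 5.5):
            assert a * b <= a**p / p + b**q / q + 1e-12

# Equality holds exactly when a^p = b^q (i.e. t = ab^{-q/p} = 1 in the proof):
p = 3.0
q = p / (p - 1.0)
b = 2.0
a = b ** (q / p)                           # then a^p = b^q
assert abs(a * b - (a**p / p + b**q / q)) < 1e-12
```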

Lemma 2.5 (Hölder's inequality) Let x ∈ ℓ^p and y ∈ ℓ^q, with p, q conjugate (1 ≤ p, q ≤ ∞). Then

Σ_{j=1}^∞ |x_j y_j| ≤ ‖x‖_p ‖y‖_q.   (2.4)

Proof For p = 1, q = ∞, for each n ∈ N

Σ_{j=1}^n |x_j y_j| ≤ (max_{j=1,...,n} |y_j|) Σ_{j=1}^n |x_j| ≤ ‖y‖_∞ ‖x‖_1.

Taking the limit as n → ∞ it follows that

Σ_{j=1}^∞ |x_j y_j| ≤ ‖y‖_∞ ‖x‖_1.

For 1 < p < ∞ (we may assume x, y ≠ 0), apply Young's inequality (2.3) with a = |x_j|/‖x‖_p and b = |y_j|/‖y‖_q and sum over j:

Σ_{j=1}^n (|x_j|/‖x‖_p)(|y_j|/‖y‖_q) ≤ Σ_{j=1}^n [(1/p)(|x_j|^p/‖x‖_p^p) + (1/q)(|y_j|^q/‖y‖_q^q)] ≤ 1/p + 1/q = 1.

It follows that for each n,

Σ_{j=1}^n |x_j y_j| ≤ ‖x‖_p ‖y‖_q,

and so (2.4) follows by letting n → ∞.
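Hölder's inequality can likewise be checked on truncated sequences (which stand in for elements of ℓ^p with only finitely many non-zero terms); the sample exponents and data are arbitrary:

```python
# Check sum |x_j y_j| <= ||x||_p ||y||_q with 1/p + 1/q = 1.
import random

def lp_norm(x, p):
    """The l^p norm of a finite sequence."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

random.seed(1)
for p in (1.25, 2.0, 4.0):
    q = p / (p - 1.0)
    for _ in range(50):
        x = [random.uniform(-1, 1) for _ in range(20)]
        y = [random.uniform(-1, 1) for _ in range(20)]
        lhs = sum(abs(a * b) for a, b in zip(x, y))
        assert lhs <= lp_norm(x, p) * lp_norm(y, q) + 1e-12
```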



Note that for p = q = 2 we obtain the Cauchy-Schwarz inequality for x, y ∈ ℓ^2,

Σ_{j=1}^∞ |x_j y_j| ≤ (Σ_{j=1}^∞ |x_j|^2)^{1/2} (Σ_{j=1}^∞ |y_j|^2)^{1/2}.

Choosing x = (x_1, ..., x_n, 0, 0, ...) and y = (y_1, ..., y_n, 0, 0, ...) we recover the Cauchy-Schwarz inequality in R^n,

Σ_{j=1}^n |x_j y_j| ≤ (Σ_{j=1}^n |x_j|^2)^{1/2} (Σ_{j=1}^n |y_j|^2)^{1/2},   (2.5)

which is just the familiar inequality for dot products |x · y| ≤ |x||y|.

Finally we can prove the triangle inequality for the ℓ^p norm:

Lemma 2.6 (Minkowski's inequality) Let x, y ∈ ℓ^p, 1 ≤ p ≤ ∞. Then x + y ∈ ℓ^p, and

‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p.   (2.6)

Proof For p = ∞ the triangle inequality is clear, since

max_{j=1,...,n} |x_j + y_j| ≤ max_{j=1,...,n} |x_j| + max_{j=1,...,n} |y_j|,

and the proof follows by taking limits. Similarly for p = 1,

Σ_{j=1}^n |x_j + y_j| ≤ Σ_{j=1}^n |x_j| + Σ_{j=1}^n |y_j|,

and again one takes the limit as n → ∞.

For 1 < p < ∞, we have already noted that x + y ∈ ℓ^p. Now let q be the conjugate exponent to p, i.e. q = p/(p − 1) (so that 1/p + 1/q = 1). It follows that (p − 1)q = p, and therefore, since x + y ∈ ℓ^p, the sequence z with z_j = |x_j + y_j|^{p−1} lies in ℓ^q. So, applying Hölder's inequality (2.4) to the sequences truncated after n terms, we can write

Σ_{j=1}^n |x_j + y_j|^p ≤ Σ_{j=1}^n |x_j + y_j|^{p−1}(|x_j| + |y_j|)
  = Σ_{j=1}^n |x_j + y_j|^{p−1}|x_j| + Σ_{j=1}^n |x_j + y_j|^{p−1}|y_j|
  ≤ (Σ_{j=1}^n |x_j + y_j|^{(p−1)q})^{1/q} (‖x‖_p + ‖y‖_p)
  = (Σ_{j=1}^n |x_j + y_j|^p)^{1/q} (‖x‖_p + ‖y‖_p),

i.e.

(Σ_{j=1}^n |x_j + y_j|^p)^{1−1/q} ≤ ‖x‖_p + ‖y‖_p.

Since 1 − (1/q) = 1/p one can now let n → ∞ to obtain (2.6).
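The triangle inequality (2.6) itself can be checked the same way (the sample exponents and vectors are arbitrary):

```python
# Check ||x + y||_p <= ||x||_p + ||y||_p for truncated sequences.
import random

def lp_norm(x, p):
    """The l^p norm of a finite sequence."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

random.seed(2)
for p in (1.0, 1.5, 2.0, 3.0, 10.0):
    for _ in range(50):
        x = [random.uniform(-2, 2) for _ in range(15)]
        y = [random.uniform(-2, 2) for _ in range(15)]
        s = [a + b for a, b in zip(x, y)]
        assert lp_norm(s, p) <= lp_norm(x, p) + lp_norm(y, p) + 1e-12
```

For 0 < p < 1 the same expression fails to be a norm: x = (1, 0) and y = (0, 1) give ‖x + y‖_p = 2^{1/p} > 2 = ‖x‖_p + ‖y‖_p, which is one reason the restriction p ≥ 1 appears throughout.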

One of our main concerns in what follows will be normed spaces that consist of functions. For example, the following are norms on C^0([0,1]), the space of all continuous functions on [0,1]: the sup(remum) norm,

‖f‖_∞ = sup_{x∈[0,1]} |f(x)|

(convergence in this norm is equivalent to uniform convergence) and the L^1 and L^2 norms

‖f‖_{L^1} = ∫_0^1 |f(x)| dx  and  ‖f‖_{L^2} = (∫_0^1 |f(x)|^2 dx)^{1/2}.

Note that of the three candidates here the L^2 norm looks most like the expression (2.1) for the familiar norm in R^n.

Lemma 2.7 ‖·‖_{L^1} is a norm on C^0([0,1]).

Proof The only part that requires much thought is (i), to make sure that ‖f‖_{L^1} = 0 iff f = 0. So suppose that f ≠ 0. Then |f(y)| = ε > 0 for some y ∈ (0,1) (if f(0) ≠ 0 or f(1) ≠ 0 it follows from continuity that f(y) ≠ 0 for some y ∈ (0,1)). Since f is continuous, there exists a δ > 0 such that for any x ∈ (0,1) with |x − y| < δ we have

|f(x) − f(y)| < ε/2.

If necessary, reduce δ so that [y − δ, y + δ] ⊂ (0,1). Then

∫_0^1 |f(x)| dx ≥ ∫_{y−δ}^{y+δ} |f(x)| dx ≥ ∫_{y−δ}^{y+δ} ε/2 dx = δε > 0.

(ii) and (iii) are clear, since

‖αf‖_{L^1} = ∫ |αf(x)| dx = |α| ∫ |f(x)| dx = |α|‖f‖_{L^1}

and

‖f + g‖_{L^1} = ∫ |f(x) + g(x)| dx ≤ ∫ |f(x)| + |g(x)| dx = ‖f‖_{L^1} + ‖g‖_{L^1}.

For ‖·‖_{L^2}, (i) and (ii) are the same as above; we will see (iii) below as a consequence of the Cauchy-Schwarz inequality for inner products.

Definition 2.8 Two norms ‖·‖_1 and ‖·‖_2 on a vector space V are equivalent (we write ‖·‖_1 ∼ ‖·‖_2) if there exist constants 0 < c_1 ≤ c_2 such that

c_1‖x‖_1 ≤ ‖x‖_2 ≤ c_2‖x‖_1  for all x ∈ V.

It is clear that the above notion of equivalence is reflexive and symmetric. It is also transitive:

Lemma 2.9 Suppose that ‖·‖_1, ‖·‖_2, and ‖·‖_3 are all norms on a vector space V, and that ‖·‖_2 and ‖·‖_3 are both equivalent to ‖·‖_1. Then ‖·‖_2 and ‖·‖_3 are equivalent.

Proof There exist constants 0 < λ_1 ≤ λ_2 and 0 < μ_1 ≤ μ_2 such that

λ_1‖x‖_2 ≤ ‖x‖_1 ≤ λ_2‖x‖_2  and  μ_1‖x‖_3 ≤ ‖x‖_1 ≤ μ_2‖x‖_3,

and so

‖x‖_2 ≤ λ_1^{−1}‖x‖_1 ≤ λ_1^{−1}μ_2‖x‖_3

and

‖x‖_2 ≥ λ_2^{−1}‖x‖_1 ≥ λ_2^{−1}μ_1‖x‖_3,

i.e. λ_2^{−1}μ_1‖x‖_3 ≤ ‖x‖_2 ≤ λ_1^{−1}μ_2‖x‖_3, and ‖·‖_2 and ‖·‖_3 are equivalent.

Exercise 2.10 Show that the norms ‖·‖, ‖·‖_1, and ‖·‖_∞ on R^n are all equivalent.
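One standard route through Exercise 2.10 is the chain ‖x‖_∞ ≤ ‖x‖ ≤ ‖x‖_1 ≤ n‖x‖_∞, which exhibits explicit equivalence constants; a numerical sketch (dimension and sample vectors arbitrary):

```python
# Check ||x||_inf <= ||x|| <= ||x||_1 <= n ||x||_inf on random vectors in R^n.
import math
import random

random.seed(3)
n = 6
for _ in range(200):
    x = [random.uniform(-5, 5) for _ in range(n)]
    n_inf = max(abs(v) for v in x)
    n_2 = math.sqrt(sum(v * v for v in x))
    n_1 = sum(abs(v) for v in x)
    assert n_inf <= n_2 + 1e-12      # one term of the sum vs the whole sum
    assert n_2 <= n_1 + 1e-12        # since sum v^2 <= (sum |v|)^2
    assert n_1 <= n * n_inf + 1e-12  # each |v| is at most the maximum
```

By Lemma 2.9, chaining these pairwise equivalences shows that all three norms are equivalent.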

This is a particular case of the general result that all norms on a finite-dimensional vector space are equivalent, which we will prove in the following chapter. As part of this proof, the following proposition, which shows that one can always find a norm on a finite-dimensional vector space, will be useful.

Proposition 2.11 Let V be an n-dimensional vector space, and E = {e_j}_{j=1}^n a basis for V. Define a map ‖·‖_E : V → [0, ∞) by

‖Σ_{j=1}^n α_j e_j‖_E = (Σ_{j=1}^n |α_j|^2)^{1/2}

(taking the positive square root). Then ‖·‖_E is a norm on V.

Proof First, note that any v ∈ V can be written uniquely as v = Σ_j α_j e_j, so the map v ↦ ‖v‖_E is well-defined. We check that ‖·‖_E satisfies the three requirements of a norm:

(i) clearly ‖v‖_E ≥ 0, and if ‖v‖_E = 0 then v = Σ_j α_j e_j with Σ_j |α_j|^2 = 0, i.e. α_j = 0 for j = 1, ..., n, and so v = 0.

(ii) If v = Σ_j α_j e_j then λv = Σ_j (λα_j)e_j, and so

‖λv‖_E^2 = Σ_j |λα_j|^2 = |λ|^2 Σ_j |α_j|^2 = |λ|^2 ‖v‖_E^2.

(iii) For the triangle inequality, if u = Σ_j α_j e_j and v = Σ_j β_j e_j then, using the Cauchy-Schwarz inequality (2.5),

‖u + v‖_E^2 = ‖Σ_j (α_j + β_j)e_j‖_E^2
  = Σ_j |α_j + β_j|^2
  = Σ_j (|α_j|^2 + α_j β̄_j + ᾱ_j β_j + |β_j|^2)
  ≤ ‖u‖_E^2 + 2 Σ_j |α_j||β_j| + ‖v‖_E^2
  ≤ ‖u‖_E^2 + 2 (Σ_j |α_j|^2)^{1/2} (Σ_j |β_j|^2)^{1/2} + ‖v‖_E^2
  = ‖u‖_E^2 + 2‖u‖_E‖v‖_E + ‖v‖_E^2
  = (‖u‖_E + ‖v‖_E)^2,

i.e. ‖u + v‖_E ≤ ‖u‖_E + ‖v‖_E.

We now want to show that with the ‖·‖_E norm, any finite-dimensional normed space is 'the same' as R^n. For two objects to be 'the same' we generally require an isomorphism that also preserves the essential structures of the objects. Here we want to say that two linear spaces, along with their norms, are the same. So we will need the isomorphism Φ to be linear (so that Φ(x) + Φ(y) = Φ(x + y)) and we will also want Φ to preserve the norm (an isometry).

Definition 2.12 Two normed spaces (X, ‖·‖_X) and (Y, ‖·‖_Y) are isometrically isomorphic, or simply isometric, if there exists a linear isomorphism Φ : X → Y that is also an isometry, i.e.

‖Φ(x)‖_Y = ‖x‖_X  for all x ∈ X.

Corollary 2.13 With the norm ‖·‖_E defined in Proposition 2.11, (V, ‖·‖_E) is isometrically isomorphic to (R^n, |·|).
Proof For v ∈ (V, ‖·‖_E) with v = Σ_{j=1}^n α_j e_j, define

Φ(Σ_{j=1}^n α_j e_j) = (α_1, ..., α_n),

which has a well-defined inverse

Φ^{−1}(α_1, ..., α_n) = Σ_{j=1}^n α_j e_j.

It is clear that Φ is one-to-one and onto, that Φ (and its inverse) are linear, and it follows directly from the definition of ‖·‖_E that |Φ(x)| = ‖x‖_E for all x ∈ V.

2.2 Convergence

In a normed space we can measure the distance between x and y using ‖x − y‖. So we can define notions of convergence and continuity using this idea of distance:

Definition 2.14 A sequence {x_k}_{k=1}^∞ in a normed space X converges to a limit x ∈ X if for any ε > 0 there exists an N such that

‖x_k − x‖ < ε  for all k ≥ N.

This definition is sensible in that limits are unique:

Exercise 2.15 Show that the limit of a convergent sequence is unique.

The following result shows that if x_n → x then the norm of x_n converges to the norm of x. This will turn out to be a very useful observation.

Lemma 2.16 If x_n → x in (X, ‖·‖) then ‖x_n‖ → ‖x‖.

Proof The triangle inequality gives

‖x_n‖ ≤ ‖x‖ + ‖x_n − x‖  and  ‖x‖ ≤ ‖x_n‖ + ‖x − x_n‖,

which implies that

|‖x_n‖ − ‖x‖| ≤ ‖x_n − x‖.

Two equivalent norms give rise to the same notion of convergence:

Lemma 2.17 Suppose that ‖·‖_1 and ‖·‖_2 are two equivalent norms on a space X. Then

‖x_n − x‖_1 → 0  iff  ‖x_n − x‖_2 → 0,

i.e. convergence in one norm is equivalent to convergence in the other, with the same limit.

The proof of this lemma is immediate from the definition of the equivalence of norms, since there exist constants 0 < c_1 ≤ c_2 such that

c_1‖x_n − x‖_1 ≤ ‖x_n − x‖_2 ≤ c_2‖x_n − x‖_1.

Using convergence we can also define continuity:

Definition 2.18 A map f : (X, ‖·‖_X) → (Y, ‖·‖_Y) is continuous if

x_n → x in X  ⟹  f(x_n) → f(x) in Y,

i.e. if

‖x_n − x‖_X → 0  ⟹  ‖f(x_n) − f(x)‖_Y → 0.

Exercise 2.19 Show that this is equivalent to the ε-δ definition of continuity: for each x ∈ X, for every ε > 0 there exists a δ > 0 such that

‖y − x‖_X < δ  ⟹  ‖f(y) − f(x)‖_Y < ε.

Lemma 2.17 has an immediate implication for continuity:

Corollary 2.20 Suppose that ‖·‖_{X,1} ∼ ‖·‖_{X,2} are two equivalent norms on a space X, and ‖·‖_{Y,1} ∼ ‖·‖_{Y,2} are two equivalent norms on a space Y. Then a function f : (X, ‖·‖_{X,1}) → (Y, ‖·‖_{Y,1}) is continuous iff it is continuous as a map from (X, ‖·‖_{X,2}) into (Y, ‖·‖_{Y,2}).

We remarked above that all norms on a finite-dimensional space are equivalent, which means that there is essentially only one notion of convergence and of continuity. But in infinite-dimensional spaces there are distinct norms, and the different notions of convergence implied by the norms we have introduced for continuous functions are not equivalent, as we now show.

[Figure omitted.] Fig. 2.1. (a) definition of f_k: a piecewise linear spike of height 1 supported on [1/2 − 1/k, 1/2 + 1/k], with f_k(1/2) = 1; (b) definition of g_k: a piecewise linear spike of height k^2 − 1 supported on [1/(k+1), 1/(k−1)], with its peak at x = 1/k.

First we note that convergence of f_k ∈ C^0([0,1]) to f in the supremum norm, i.e.

sup_{x∈[0,1]} |f_k(x) − f(x)| → 0,   (2.7)

implies that f_k → f in the L^1 norm, since clearly

∫_0^1 |f_k(y) − f(y)| dy ≤ ∫_0^1 (sup_{x∈[0,1]} |f_k(x) − f(x)|) dy = sup_{x∈[0,1]} |f_k(x) − f(x)|.

This inequality should make very clear the advantage of the shorthand norm notation, since it just says

‖f_k − f‖_{L^1} ≤ ‖f_k − f‖_∞.

It is also clear that if (2.7) holds then f_k(x) → f(x) for each x ∈ [0,1], which is pointwise convergence. However, neither pointwise convergence nor L^1 convergence imply uniform convergence:

Example 2.21 Consider the sequence of functions {f_k} as illustrated in Figure 2.1(a). Then f_k → 0 in the L^1 norm, since

‖f_k − 0‖_{L^1} = ‖f_k‖_{L^1} = 1/k.

However, f_k does not converge to 0 pointwise, since f_k(1/2) = 1 for all k.

Example 2.22 Consider the sequence of functions {g_k} as illustrated in Figure 2.1(b). Then g_k → 0 pointwise, but

‖g_k‖_{L^1} = (1/2)(k^2 − 1)[1/(k−1) − 1/(k+1)] = 1,

so g_k does not converge to 0 in the L^1 norm.
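The two examples can be reproduced numerically. The precise spike shapes below are those read off from Figure 2.1 (the heights and supports are an assumption, chosen to match the norms computed above):

```python
# Sketch of Examples 2.21 and 2.22.
def tent(x, centre, halfwidth, height):
    """Symmetric piecewise linear spike: value `height` at `centre`, zero outside."""
    return max(0.0, height * (1.0 - abs(x - centre) / halfwidth))

def g(k, x):
    """Asymmetric spike on [1/(k+1), 1/(k-1)] with peak k^2 - 1 at x = 1/k."""
    a, c, b = 1.0 / (k + 1), 1.0 / k, 1.0 / (k - 1)
    h = float(k * k - 1)  # height chosen so that the area is exactly 1
    if a < x <= c:
        return h * (x - a) / (c - a)
    if c < x < b:
        return h * (b - x) / (b - c)
    return 0.0

def integral(fn, lo, hi, n_pts=100_000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    step = (hi - lo) / n_pts
    return sum(fn(lo + (i + 0.5) * step) for i in range(n_pts)) * step

for k in (10, 50):
    # f_k -> 0 in L^1 (norm 1/k), yet f_k(1/2) = 1 for every k
    fk = lambda x, k=k: tent(x, 0.5, 1.0 / k, 1.0)
    assert abs(integral(fk, 0.5 - 1.0 / k, 0.5 + 1.0 / k) - 1.0 / k) < 1e-6
    assert fk(0.5) == 1.0
    # g_k -> 0 pointwise (its support slides towards 0), yet ||g_k||_{L^1} = 1
    assert abs(integral(lambda x: g(k, x), 1.0 / (k + 1), 1.0 / (k - 1)) - 1.0) < 1e-3
    assert g(k, 0.3) == 0.0
```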
3
Compactness and equivalence of norms

3.1 Compactness

One fundamental property of the real numbers is expressed by the Bolzano-


Weierstrass Theorem:

Theorem 3.1 (Bolzano-Weierstrass) A bounded sequence of real num-


bers has a convergent subsequence.

This can easily be generalised to sequences in R^n:

Corollary 3.2 (Bolzano-Weierstrass in R^n) A bounded sequence in R^n has a convergent subsequence.

Proof Let {x^(k) = (x^(k)_1, ..., x^(k)_n)} be a bounded sequence in R^n. Since {x^(k)_1} is a bounded sequence in R, there is a subsequence x^(k_{1,j}) for which x^(k_{1,j})_1 converges. Since x^(k_{1,j}) is again a bounded sequence in R^n, x^(k_{1,j})_2 is a bounded sequence in R. We can therefore find a subsequence x^(k_{2,j}) of x^(k_{1,j}) such that x^(k_{2,j})_2 converges. Since x^(k_{2,j}) is a subsequence of x^(k_{1,j}), x^(k_{2,j})_1 still converges. We can continue this process inductively to obtain a subsequence x^(k_{n,j}) such that all the x^(k_{n,j})_i for i = 1, ..., n converge.

We now make two definitions:


Definition 3.3 A subset X of a normed space (V, ‖·‖) is bounded if there exists an M > 0 such that

‖x‖ ≤ M  for all x ∈ X.

Note that if ‖·‖_1 ∼ ‖·‖_2 are two equivalent norms on V then X is bounded wrt ‖·‖_1 iff it is bounded wrt ‖·‖_2.

Definition 3.4 A subset X of a normed space (V, ‖·‖) is closed if whenever a sequence {x_n} with x_n ∈ X converges to some x, we must have x ∈ X.

Note that if ‖·‖_1 ∼ ‖·‖_2 are two equivalent norms on V then X is closed wrt ‖·‖_1 iff it is closed wrt ‖·‖_2.

Example: any closed interval in R is closed in this sense. Any product of closed intervals is closed in R^n.

Exercise 3.5 Show that if (X, ‖·‖) is a normed space then the closed unit ball

B̄_X(0, 1) = {x ∈ X : ‖x‖ ≤ 1}

and the unit sphere

S_X = {x ∈ X : ‖x‖ = 1}

are both closed.

Definition 3.6 A subset K of a normed space (V, ‖·‖) is compact if any sequence {x_n} with x_n ∈ K has a convergent subsequence x_{n_j} → x with x ∈ K.

Note that if ‖·‖_1 ∼ ‖·‖_2 are two equivalent norms on V then K is compact wrt ‖·‖_1 iff it is compact wrt ‖·‖_2.

Two properties of compact sets are easy to prove:

Theorem 3.7 A compact set is closed and bounded.

Proof Let K be a compact set in (V, ‖·‖) and x_n → x with x_n ∈ K. Since K is compact {x_n} has a convergent subsequence; its limit must also be x, and from the definition of compactness x ∈ K, and so K is closed.

Suppose that K is not bounded. Then for each n ∈ N there exists an x_n ∈ K such that ‖x_n‖ ≥ n. But {x_n} must have a convergent subsequence, and any convergent sequence is bounded, which yields a contradiction.

It follows from the Bolzano-Weierstrass theorem that any closed bounded set K in R^n is compact: a sequence in a bounded subset K of R^n has a convergent subsequence by Corollary 3.2; since K is closed, this subsequence converges to an element of K. So K is compact. We have therefore shown:

Theorem 3.8 A subset of R^n is compact iff it is closed and bounded.

We will see later that this characterisation does not hold in infinite-
dimensional spaces (and this is one way to characterise such spaces).

We now prove two fundamental results about continuous functions on


compact sets:

Theorem 3.9 Suppose that K ⊂ (X, ‖·‖_X) is compact and that f is a continuous map from (X, ‖·‖_X) into (Y, ‖·‖_Y). Then f(K) is a compact subset of (Y, ‖·‖_Y).

Proof Let {y_n} ⊂ f(K). Then y_n = f(x_n) for some x_n ∈ K. Since {x_n} ⊂ K, and K is compact, there is a subsequence of x_n that converges, x_{n_j} → x* ∈ K. Since f is continuous it follows that as j → ∞

y_{n_j} = f(x_{n_j}) → f(x*) = y* ∈ f(K),

i.e. the subsequence y_{n_j} converges to some y* ∈ f(K), and so f(K) is compact.

From which follows:

Proposition 3.10 Let K be a compact subset of (X, ‖·‖). Then any continuous function f : K → R is bounded and attains its bounds, i.e. there exists an M > 0 such that |f(x)| ≤ M for all x ∈ K, and there exist x_*, x^* ∈ K such that

f(x_*) = inf_{x∈K} f(x)  and  f(x^*) = sup_{x∈K} f(x).

Proof Since f is continuous and K is compact, f(K) is a compact subset of R, i.e. f(K) is closed and bounded. It follows that

f^* = sup_{y∈f(K)} y ∈ f(K),

and so f^* = f(x^*) for some x^* ∈ K. [That sup(S) ∈ S for any closed bounded S is clear, since for each n there exists an s_n ∈ S such that s_n > sup(S) − 1/n. Since s_n ≤ sup(S) by definition, s_n → sup(S), and it follows from the fact that S is closed that sup(S) ∈ S.] The argument for x_* is identical.

This allows one to prove the equivalence of all norms on a finite-dimensional space.

Theorem 3.11 Let V be a finite-dimensional vector space. Then all norms on V are equivalent.

Proof Let E = {e_j}_{j=1}^n be a basis for V, and let ‖·‖_E be the norm on V defined in Proposition 2.11. Let ‖·‖ be another norm on V. We will show that ‖·‖ is equivalent to ‖·‖_E. Since equivalence of norms is an equivalence relation, this will imply that all norms on V are equivalent.
Now, if u = Σ_j α_j e_j then

‖u‖ = ‖Σ_j α_j e_j‖
  ≤ Σ_j |α_j|‖e_j‖  (using the triangle inequality)
  ≤ (Σ_j |α_j|^2)^{1/2} (Σ_j ‖e_j‖^2)^{1/2}  (using the Cauchy-Schwarz inequality)
  = C_E ‖u‖_E,

where C_E^2 = Σ_j ‖e_j‖^2, i.e. C_E is a constant that does not depend on u.

Now, observe that this estimate ‖u‖ ≤ C_E‖u‖_E implies for u, v ∈ V,

‖u − v‖ ≤ C_E ‖u − v‖_E,

and so, since |‖u‖ − ‖v‖| ≤ ‖u − v‖, the map u ↦ ‖u‖ is continuous from (V, ‖·‖_E) into R.

Now, note that the set

S_V = {u ∈ V : ‖u‖_E = 1}
is the image of S_n = {α ∈ R^n : |α| = 1} under the map α ↦ Σ_{j=1}^n α_j e_j. Since by definition

‖Σ_{j=1}^n α_j e_j‖_E = |α|,

this map is continuous. Since S_n is closed and bounded, it is a compact subset of R^n; since S_V is the image of S_n under a continuous map, it is also compact.

Therefore the map v ↦ ‖v‖ is bounded on S_V, and attains its bounds. In particular, there exists an a ≥ 0 such that

‖v‖ ≥ a  for every v ∈ V with ‖v‖_E = 1.

Since the bound is attained, there exists a v* ∈ S_V such that ‖v*‖ = a. If a = 0 then ‖v*‖ = 0, i.e. v* = 0. But since v* ∈ S_V we have ‖v*‖_E = 1, and so v* cannot be zero. It follows that a > 0. Then for an arbitrary v ∈ V, we have v/‖v‖_E ∈ S_V, and so

‖v/‖v‖_E‖ ≥ a  ⟹  ‖v‖ ≥ a‖v‖_E.

Combining this with ‖u‖ ≤ C_E‖u‖_E shows that ‖·‖ and ‖·‖_E are equivalent.

Corollary 3.12 A subset of a finite-dimensional normed space V is compact iff it is closed and bounded.

Proof Let K be a closed bounded subset of (V, ‖·‖). Then K is also a closed bounded subset of (V, ‖·‖_E), since ‖·‖ ∼ ‖·‖_E.

The map Φ : (V, ‖·‖_E) → R^n defined by

Φ(Σ_j α_j e_j) = (α_1, ..., α_n)

is an isometry: Φ and Φ^{−1} are continuous, and |Φ(x)| = ‖x‖_E.

Since |Φ(x)| = ‖x‖_E and K is bounded, it follows that Φ(K) is bounded. Since Φ^{−1} is continuous and K is closed, it follows that Φ(K) is closed. To see this, take y_n ∈ Φ(K), and suppose that y_n → y*. Since Φ^{−1} is continuous, it follows that

Φ^{−1}(y_n) → Φ^{−1}(y*),

and since Φ^{−1}(y_n) ∈ K and K is closed it follows that Φ^{−1}(y*) ∈ K. In particular, therefore, y* = Φ(Φ^{−1}(y*)) ∈ Φ(K) and Φ(K) must be closed.

So Φ(K) is a closed bounded subset of R^n, and hence compact. Φ^{−1} is continuous, so K = Φ^{−1}(Φ(K)) is the continuous image of a compact set, and hence a compact subset of (V, ‖·‖_E). Since ‖·‖ ∼ ‖·‖_E, it follows that K is a compact subset of (V, ‖·‖) as required.
4
Completeness

In the treatment of convergent sequences of real numbers, one natural question is whether there is a way to characterise which sequences converge without knowing their limit. The answer, of course, is yes, and is given by the notion of a Cauchy sequence.

Theorem 4.1 A sequence of real numbers {x_n}_{n=1}^∞ converges if and only if it is a Cauchy sequence, i.e. given any ε > 0 there exists an N such that

|x_n − x_m| < ε  for all n, m ≥ N.

Note that the proof makes use of the Bolzano-Weierstrass Theorem, so


is in some way entangled with compactness properties of closed bounded
subsets of R.

A sequence in a normed space $(X, \|\cdot\|)$ is Cauchy if given any $\varepsilon > 0$ there
exists an $N$ such that

$$\|x_n - x_m\| < \varepsilon \quad\text{for all } n, m \geq N.$$

Lemma 4.2 Any Cauchy sequence is bounded.

Proof There exists an $N$ such that

$$\|x_n - x_m\| < 1 \quad\text{for all } n, m \geq N.$$

It follows in particular that $\|x_n\| \leq \|x_N\| + 1$ for all $n \geq N$, and hence
$\|x_n\| \leq \max(\|x_1\|, \dots, \|x_{N-1}\|, \|x_N\| + 1)$ for every $n$, i.e. $\{x_n\}$ is bounded. $\Box$


Definition 4.3 A normed space $X$ is complete if any Cauchy sequence in
$X$ converges to some $x \in X$. A complete normed space is called a Banach
space.

Theorem 4.1 states that $\mathbb{R}$ with its standard norm is complete ($\mathbb{R}$ is a
Banach space). It follows fairly straightforwardly that the same is true for
any finite-dimensional normed space.

Theorem 4.4 Every finite-dimensional normed space $(V, \|\cdot\|)$ (over $\mathbb{R}$ or
$\mathbb{C}$) is complete.

Proof Choose a basis $E = (e_1, \dots, e_n)$ of $V$, and define another norm $\|\cdot\|_E$
on $V$ by

$$\Big\|\sum_{j=1}^n x_j e_j\Big\|_E = \Big(\sum_{j=1}^n |x_j|^2\Big)^{1/2}.$$

Since all norms on $V$ are equivalent (Theorem 3.11), a sequence $\{x^k\}$ that
is Cauchy in $\|\cdot\|$ is Cauchy in $\|\cdot\|_E$.

Writing $x^k = \sum_{j=1}^n x_j^k e_j$, it follows that given any $\varepsilon > 0$ there exists an $N$
such that for $k, l \geq N$

$$\|x^k - x^l\|_E^2 = \sum_{j=1}^n |x_j^k - x_j^l|^2 < \varepsilon^2. \qquad (4.1)$$

In particular $\{x_j^k\}_{k=1}^\infty$ is a Cauchy sequence of scalars for each fixed $j =
1, \dots, n$. It follows that for each $j = 1, \dots, n$ we have $x_j^k \to x_j$ as $k \to \infty$ for some $x_j$.
Set $x = \sum_{j=1}^n x_j e_j$.

Letting $l \to \infty$ in (4.1) shows that

$$\|x^k - x\|_E^2 = \sum_{j=1}^n |x_j^k - x_j|^2 \leq \varepsilon^2 \quad\text{for all } k \geq N,$$

i.e. $x^k \to x$ with respect to $\|\cdot\|_E$. It follows that $x^k \to x$ with respect to $\|\cdot\|$,
and clearly $x \in V$, and so $V$ is complete. $\Box$

Note that in particular $\mathbb{R}^n$ is complete.

The completeness of $\ell^p$ is a little more delicate, but only in the final steps.

Proposition 4.5 (Completeness of $\ell^p$) The sequence space $\ell^p(\mathbb{K})$ (equipped
with its standard norm) is complete.

Proof Suppose that $x^k = (x_1^k, x_2^k, \dots)$ is a Cauchy sequence in $\ell^p(\mathbb{K})$. Then
for every $\varepsilon > 0$ there exists an $N$ such that

$$\|x^n - x^m\|_p^p = \sum_{j=1}^\infty |x_j^n - x_j^m|^p < \varepsilon^p \quad\text{for all } n, m \geq N. \qquad (4.2)$$

In particular $\{x_j^k\}_{k=1}^\infty$ is a Cauchy sequence in $\mathbb{K}$ for every fixed $j$. Since $\mathbb{K}$
is complete (recall $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$) it follows that for each $j \in \mathbb{N}$,

$$x_j^k \to a_j \quad\text{as } k \to \infty$$

for some $a_j \in \mathbb{K}$.

Set $a = (a_1, a_2, \dots)$. We want to show that $a \in \ell^p$ and that $\|x^k - a\|_p \to 0$
as $k \to \infty$. First, since $\{x^k\}$ is Cauchy we have from (4.2) that $\|x^n - x^m\|_p < \varepsilon$
for all $n, m \geq N$, and so in particular for any $N' \in \mathbb{N}$

$$\sum_{j=1}^{N'} |x_j^n - x_j^m|^p \leq \sum_{j=1}^\infty |x_j^n - x_j^m|^p \leq \varepsilon^p.$$

Letting $m \to \infty$ we obtain

$$\sum_{j=1}^{N'} |x_j^n - a_j|^p \leq \varepsilon^p,$$

and since this holds for all $N'$ it follows that

$$\sum_{j=1}^\infty |x_j^n - a_j|^p \leq \varepsilon^p,$$

and so $x^n - a \in \ell^p$ with $\|x^n - a\|_p \leq \varepsilon$ for all $n \geq N$. But since $\ell^p$ is a vector space and $x^n \in \ell^p$, this implies
that $a \in \ell^p$ and $\|x^k - a\|_p \to 0$ as $k \to \infty$. $\Box$

Since the norm on $\ell^2$ is the natural generalisation of the norm on $\mathbb{R}^n$,
and since $\ell^2$ is complete, it is tempting to think that $\ell^2$ will behave just like
$\mathbb{R}^n$. However, it does not have the Bolzano-Weierstrass property (bounded
sequences have a convergent subsequence), as we can see easily by considering
the sequence $\{e^{(j)}\}_{j=1}^\infty$, where $e^{(j)}$ consists entirely of zeros apart from a 1 in
the $j$th position. Then clearly $\|e^{(j)}\|_2 = 1$ for all $j$; but if $i \neq j$ then

$$\|e^{(i)} - e^{(j)}\|_2^2 = 2,$$

i.e. any two elements of the sequence are always a distance $\sqrt{2}$ from each other.
It follows that no subsequence of the $\{e^{(j)}\}$ can form a Cauchy sequence, and
so there cannot be a convergent subsequence.

This is really the first time we have seen a significant difference between $\mathbb{R}^n$
and the abstract normed vector spaces that we have been considering. The
failure of the Bolzano-Weierstrass property is in fact a defining characteristic
of infinite-dimensional spaces.
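The $\sqrt{2}$-separation of the unit vectors can be checked numerically. Below is a minimal Python sketch (my own illustration, not part of the notes), truncating $\ell^2$ to finitely many coordinates:

```python
import math

def l2_dist(x, y):
    # l^2 distance between two finite sequences, padding the shorter with zeros
    n = max(len(x), len(y))
    x = list(x) + [0.0] * (n - len(x))
    y = list(y) + [0.0] * (n - len(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def e(j, n):
    # j-th standard basis vector of R^n (1-indexed), viewed inside l^2
    return [1.0 if i == j - 1 else 0.0 for i in range(n)]

# every pair of distinct basis vectors is exactly sqrt(2) apart
n = 10
for i in range(1, n + 1):
    for j in range(1, n + 1):
        if i != j:
            assert abs(l2_dist(e(i, n), e(j, n)) - math.sqrt(2)) < 1e-12
```

Since every pairwise distance is $\sqrt{2}$, no subsequence can be Cauchy, exactly as argued above.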

Theorem 4.6 $C^0([0,1])$ equipped with the sup norm $\|\cdot\|_\infty$ is complete.

Proof Let $\{f_k\}$ be a Cauchy sequence in $C^0([0,1])$: so given any $\varepsilon > 0$ there
exists an $N$ such that

$$\sup_{x\in[0,1]} |f_n(x) - f_m(x)| < \varepsilon \quad\text{for all } n, m \geq N. \qquad (4.3)$$

In particular $\{f_k(x)\}$ is a Cauchy sequence for each fixed $x$, so $f_k(x)$ converges
for each fixed $x \in [0,1]$: define

$$f(x) = \lim_{k\to\infty} f_k(x).$$

We need to show that in fact $f_k \to f$ uniformly. But this follows since for
every $x \in [0,1]$ we have from (4.3)

$$|f_n(x) - f_m(x)| < \varepsilon \quad\text{for all } n, m \geq N,$$

where $N$ does not depend on $x$. Letting $m \to \infty$ in this expression we obtain

$$|f_n(x) - f(x)| \leq \varepsilon \quad\text{for all } n \geq N,$$

where again $N$ does not depend on $x$. It follows that

$$\sup_{x\in[0,1]} |f_n(x) - f(x)| \leq \varepsilon \quad\text{for all } n \geq N,$$

i.e. $f_n$ converges uniformly to $f$ on $[0,1]$. Completeness of $C^0([0,1])$ then
follows from the fact that the uniform limit of a sequence of continuous
functions is still continuous. $\Box$

For this reason the supremum norm is the standard norm on $C^0([0,1])$;
if no norm is mentioned this is the norm that is intended.

Example 4.7 C 0 ([0, 1]) equipped with the L1 norm is not complete.

[Figure here: graph of $f_k$, which is 0 up to $x = \frac12 - \frac1k$, rises linearly to 1 at $x = \frac12$, and equals 1 thereafter.]

Fig. 4.1. Definition of $f_k$.

Consider the sequence of functions $\{f_k\}$ defined by

$$f_k(x) = \begin{cases} 0 & 0 \leq x \leq \frac12 - \frac1k \\ k\big(x - \frac12 + \frac1k\big) & \frac12 - \frac1k < x < \frac12 \\ 1 & \frac12 \leq x \leq 1, \end{cases}$$

see Figure 4.1.

This sequence is Cauchy in the $L^1$ norm, since for $n, m \geq N$ we have

$$f_n(x) = f_m(x) \quad\text{for all } x \leq \tfrac12 - \tfrac1N \text{ and all } x \geq \tfrac12,$$

and so

$$\|f_n - f_m\|_{L^1} = \int_0^1 |f_n(x) - f_m(x)|\,\mathrm{d}x = \int_{\frac12-\frac1N}^{\frac12} |f_n(x) - f_m(x)|\,\mathrm{d}x \leq \frac1N, \qquad (4.4)$$

since $|f_n(x) - f_m(x)| \leq 1$ for all $x \in [0,1]$ and all $n, m \in \mathbb{N}$.

But what is the limit, $f(x)$, as $n \to \infty$? Clearly one would expect $f$ to be
the pointwise limit,

$$f(x) = \begin{cases} 0 & 0 \leq x < \frac12 \\ 1 & \frac12 \leq x \leq 1, \end{cases}$$

but this is certainly not continuous. It is not a priori obvious that $f_n \to f$
(in the $L^1$ norm), but this is easy to check:

$$\|f_k - f\|_{L^1} = \int_0^1 |f_k(x) - f(x)|\,\mathrm{d}x = \int_{\frac12-\frac1k}^{\frac12} k\big(x - \tfrac12 + \tfrac1k\big)\,\mathrm{d}x = \frac{1}{2k} \to 0 \quad\text{as } k \to \infty.$$

So this sequence converges in the L1 norm but not the sup norm.
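The computation $\|f_k - f\|_{L^1} = 1/(2k)$ can be confirmed numerically. The sketch below (my own illustration, not from the notes) approximates the $L^1$ distance by a midpoint Riemann sum:

```python
def f(k, x):
    # the ramp functions f_k from Example 4.7
    if x <= 0.5 - 1.0 / k:
        return 0.0
    if x < 0.5:
        return k * (x - 0.5 + 1.0 / k)
    return 1.0

def l1_dist(g, h, n=100000):
    # midpoint-rule approximation of the L^1 distance on [0, 1]
    dx = 1.0 / n
    return sum(abs(g((i + 0.5) * dx) - h((i + 0.5) * dx)) for i in range(n)) * dx

step = lambda x: 0.0 if x < 0.5 else 1.0  # the pointwise limit f
for k in (2, 5, 10):
    # ||f_k - f||_{L^1} = 1/(2k)
    assert abs(l1_dist(lambda x: f(k, x), step) - 1.0 / (2 * k)) < 1e-3
```

The midpoint rule is exact on the linear pieces, so the only error comes from the two cells containing the kinks of $f_k$.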

4.1 The completion of a normed space

However, every normed space has a completion, i.e. a minimal set $\overline{V}$ such
that $V \subseteq \overline{V}$ and $(\overline{V}, \|\cdot\|)$ is a Banach space. Essentially $\overline{V}$ consists of all
limit points of Cauchy sequences in $V$ (and in particular, therefore, contains
a copy of $V$ via the constant sequences $v_n = v \in V$).

This implies that for any $\overline{v} \in \overline{V}$ there exist $v_n \in V$ such that $v_n \to \overline{v}$ in
the norm $\|\cdot\|$; we say that $V$ is dense in $\overline{V}$.

Definition 4.8 Let $(V, \|\cdot\|)$ be a normed space. Then $X \subseteq V$ is dense in $V$ if
for any $v \in V$ and any $\varepsilon > 0$ there is an $x \in X$ such that

$$\|x - v\| < \varepsilon.$$

This is equivalent to the fact that given any $v \in V$ there exists a sequence
$x_n \in X$ such that

$$\|x_n - v\| \to 0 \quad\text{as } n \to \infty.$$

This is the particularly useful form of density: if $X$ is dense in $V$ one can
often deduce properties of elements of $V$ by approximating them with elements of $X$.

Example 4.9 $\mathbb{R}$ is the completion of $\mathbb{Q}$ in the standard norm on $\mathbb{R}$.

Exercise 4.10 Recall that we defined $\ell_f$ to be the set of all sequences in
which only a finite number of terms are non-zero. Show that $\ell_f$ is dense in
$\ell^2$.
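A numerical illustration of the idea behind Exercise 4.10 (mine, not from the notes): truncating a square-summable sequence gives a finitely-supported sequence, and the $\ell^2$ error is the tail sum, which tends to zero:

```python
import math

# x is (the first 10000 terms of) the square-summable sequence (1, 1/2, 1/3, ...)
x = [1.0 / j for j in range(1, 10001)]

def tail_norm(x, n):
    # l^2 distance between x and its truncation (x_1, ..., x_n, 0, 0, ...)
    return math.sqrt(sum(v * v for v in x[n:]))

errs = [tail_norm(x, n) for n in (10, 100, 1000)]
assert errs[0] > errs[1] > errs[2]  # the error decreases as we keep more terms
assert errs[2] < 0.04
```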

The description of the completion of $(V, \|\cdot\|)$ above is not strictly correct.
Clearly it must be missing some subtleties, since we are adding to $V$
elements that are not in $V$ and hence, in the setting of a general abstract
normed space $(V, \|\cdot\|_V)$, are not defined.

A completion of $(V, \|\cdot\|_V)$ is a Banach space $(X, \|\cdot\|_X)$ that contains an
isometric image of $(V, \|\cdot\|_V)$ that is dense in $X$. One can show that there
is essentially only one completion, in that any two candidates must be isometric:

Theorem 4.11 Let $(X, \|\cdot\|_X)$ be a normed space. Then there exists a
complete normed space $(\tilde X, \|\cdot\|_{\tilde X})$ and a linear map $i : (X, \|\cdot\|_X) \to
(\tilde X, \|\cdot\|_{\tilde X})$ that is an isometry between $X$ and its image, such that $i(X)$
is a dense subspace of $\tilde X$. Furthermore $\tilde X$ is unique up to isometry: if
$(\tilde Y, \|\cdot\|_{\tilde Y})$ is a complete normed space and $j : (X, \|\cdot\|_X) \to (\tilde Y, \|\cdot\|_{\tilde Y})$ is
an isometry between $X$ and its image, such that $j(X)$ is a dense subspace
of $\tilde Y$, then $\tilde X$ and $\tilde Y$ are isometric.

Proof We consider Cauchy sequences in $X$, writing

$$x = (x_1, x_2, \dots), \qquad x_j \in X,$$

for a sequence in $X$. We say that two Cauchy sequences $x$ and $y$ are equivalent,
$x \sim y$, if

$$\lim_{n\to\infty} \|x_n - y_n\|_X = 0.$$

We let $\tilde X$ be the space of equivalence classes of Cauchy sequences in $X$.
It is clear that $\tilde X$ is a vector space, since the sum of
two Cauchy sequences in $X$ is again a Cauchy sequence in $X$. We define a
candidate for our norm on $\tilde X$: if $\xi \in \tilde X$ then

$$\|\xi\|_{\tilde X} = \lim_{n\to\infty} \|x_n\|_X, \qquad (4.5)$$

for any $x \in \xi$ (recall that $\xi$ is an equivalence class of Cauchy sequences).

Note that (i) if $y$ is a Cauchy sequence in $X$, then $\{\|y_n\|\}$ forms a Cauchy
sequence in $\mathbb{R}$, so for a particular choice of $y \in \xi$ the right-hand side of (4.5)
exists, and (ii) if $x, y \in \xi$ then

$$\Big|\lim_{n\to\infty}\|x_n\| - \lim_{n\to\infty}\|y_n\|\Big| = \lim_{n\to\infty}\big|\|x_n\| - \|y_n\|\big| \leq \lim_{n\to\infty}\|x_n - y_n\| = 0,$$

since $x \sim y$. So the map in (4.5) is well-defined, and it is easy to check that
it satisfies the three requirements of a norm.

Now we define a map $i : X \to \tilde X$, by setting

$$i(x) = [(x, x, x, x, x, x, \dots)].$$

Clearly $i$ is linear, and an isometry between $X$ and its image. We want to
show that $i(X)$ is a dense subset of $\tilde X$.

For any given $\xi \in \tilde X$, choose some $x \in \xi$. Since $x$ is Cauchy, for any given
$\varepsilon > 0$ there exists an $N$ such that

$$\|x_n - x_m\|_X < \varepsilon \quad\text{for all } n, m \geq N.$$

In particular, $\|x_n - x_N\|_X < \varepsilon$ for all $n \geq N$, and so

$$\|\xi - i(x_N)\|_{\tilde X} = \lim_{n\to\infty} \|x_n - x_N\|_X \leq \varepsilon,$$

which shows that $i(X)$ is dense in $\tilde X$.


Finally, we have to show that $\tilde X$ is complete, i.e. that any Cauchy sequence
in $\tilde X$ converges to another element of $\tilde X$. A Cauchy sequence in $\tilde X$
is a Cauchy sequence of equivalence classes of Cauchy sequences in $X$! Take
such a Cauchy sequence, $\{\xi^{(k)}\}_{k=1}^\infty$. For each $k$, find $x_k \in X$ such that

$$\|i(x_k) - \xi^{(k)}\|_{\tilde X} < 1/k,$$

using the density of $i(X)$ in $\tilde X$. Now let

$$x = (x_1, x_2, x_3, \dots).$$

We will show (i) that $x$ is a Cauchy sequence, and so $[x] \in \tilde X$, and (ii) that
$\xi^{(k)}$ converges to $[x]$. This will show that $\tilde X$ is complete.

(i) To show that $x$ is Cauchy, observe that

$$\|x_n - x_m\|_X = \|i(x_n) - i(x_m)\|_{\tilde X}
= \|i(x_n) - \xi^{(n)} + \xi^{(n)} - \xi^{(m)} + \xi^{(m)} - i(x_m)\|_{\tilde X}
\leq \|i(x_n) - \xi^{(n)}\|_{\tilde X} + \|\xi^{(n)} - \xi^{(m)}\|_{\tilde X} + \|\xi^{(m)} - i(x_m)\|_{\tilde X}
\leq \frac1n + \|\xi^{(n)} - \xi^{(m)}\|_{\tilde X} + \frac1m.$$

So now given $\varepsilon > 0$, choose $N$ such that $\|\xi^{(n)} - \xi^{(m)}\|_{\tilde X} < \varepsilon/3$ for $n, m \geq N$.
If $N' = \max(N, 3/\varepsilon)$, it follows that

$$\|x_n - x_m\|_X < \varepsilon \quad\text{for all } n, m \geq N',$$

i.e. $x$ is Cauchy. So $[x] \in \tilde X$.

(ii) To show that $\xi^{(k)} \to [x]$, simply observe that

$$\|[x] - \xi^{(k)}\|_{\tilde X} \leq \|[x] - i(x_k)\|_{\tilde X} + \|i(x_k) - \xi^{(k)}\|_{\tilde X}.$$

Given $\varepsilon > 0$, choose $N$ large enough that $\|x_n - x_m\|_X < \varepsilon/2$ for all $n, m \geq N$,
and then set $N' = \max(N, 2/\varepsilon)$. It follows that for $k \geq N'$,

$$\|[x] - i(x_k)\|_{\tilde X} = \lim_{n\to\infty} \|x_n - x_k\|_X \leq \varepsilon/2$$

and $\|i(x_k) - \xi^{(k)}\|_{\tilde X} < 1/k \leq \varepsilon/2$, i.e.

$$\|[x] - \xi^{(k)}\|_{\tilde X} \leq \varepsilon,$$

and so $\xi^{(k)} \to [x]$.

We will not prove the uniqueness of $\tilde X$ here. $\Box$

The space $\tilde X$ in the above theorem is a very abstract one, and we are
fortunate that in most situations there is a more concrete description of the
completion of interesting normed spaces.

Definition 4.12 The space $L^1(0,1)$ is the completion of $C^0([0,1])$ with respect
to the $L^1$ norm.

Note that with this definition it is immediate that $L^1(0,1)$ is complete,
and that $C^0([0,1])$ is dense in $L^1(0,1)$.

What is this space $L^1(0,1)$? There are a number of possible answers.

Heuristically, $L^1(0,1)$ consists of all functions that can arise as the limit
(with respect to the $L^1$ norm) of sequences $f_n \in C^0([0,1])$.

However, how do we characterise these limits? Certainly $L^1(0,1)$ is larger
than $C^0([0,1])$ (and larger than $C^0(0,1)$). We saw above that it contains
functions that are not continuous, and even functions whose values at individual
points (e.g. $x = \frac12$) are not well defined.

Formally, $L^1(0,1)$ is isometrically isomorphic to the space of equivalence classes of
sequences in $C^0([0,1])$ that are Cauchy in the $L^1$ norm, where $\{f_k\} \sim \{g_k\}$
if

$$\int_0^1 |f_k(x) - g_k(x)|\,\mathrm{d}x \to 0 \quad\text{as } k \to \infty.$$

This is hardly helpful.

More usefully, the space $L^1(0,1)$ consists of all real-valued functions $f$ such that

$$\int_0^1 |f(x)|\,\mathrm{d}x$$

is finite, where the integral is understood in the sense of Lebesgue integration.
We say that $f = g$ in $L^1(0,1)$ (the functions are 'essentially the same')
if

$$\int_0^1 |f(x) - g(x)|\,\mathrm{d}x = 0.$$

This is the most intrinsic definition, and in some ways the most useful.
But note that given this definition it is certainly not obvious that $L^1(0,1)$ is
complete, nor that $C^0([0,1])$ is dense in $L^1(0,1)$. We will assume these properties
in what follows, but at the risk of over-emphasis: if we use Definition
4.12 to define $L^1$ these properties come for free; if we use the 'useful' definition
above there is actually some work to do to check them (which would
be part of a proper development of the Lebesgue integral and the corresponding
Lebesgue spaces).

Although we cannot discuss the theory of Lebesgue integration in detail


here, we can give a quick overview of its fundamental features and give
a rigorous definition of the notion of almost everywhere. Essentially the
Lebesgue integral extends more elementary definitions of the integral in a
mathematically consistent way.
5
Lebesgue integration

We follow the presentation in Priestley (1997), and start the construction of
the Lebesgue integral by defining the integral of simple functions for which
there can be no argument as to the correct definition.

We define the measure (or length) $|I|$ of an interval $I = [a,b]$, $(a,b)$, $(a,b]$,
or $[a,b)$ to be

$$|I| = b - a.$$

5.1 Integrals of functions in $L^{\rm step}(\mathbb{R})$

The class $L^{\rm step}(\mathbb{R})$ of step functions on $\mathbb{R}$ consists of all those functions $s(x)$
that are piecewise constant on a finite number of intervals, i.e.

$$s(x) = \sum_{j=1}^n c_j \chi_{I_j}(x), \qquad (5.1)$$

where $c_j \in \mathbb{R}$, each $I_j$ is an interval, and $\chi_A$ denotes the characteristic
function of the set $A$,

$$\chi_A(x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A. \end{cases}$$

We define the integral of $s(x)$ by

$$\int s = \sum_{j=1}^n c_j |I_j|. \qquad (5.2)$$


Even though this definition appears entirely reasonable, note that we have
not specified the nature of the intervals $I_j$, so the functions $\chi_{(0,1)}$ and $\chi_{[0,1]}$
have the same integral (which is 1); by extension we can change the value
of $s$ at a finite number of points and leave the value of $\int s$ unchanged.
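The definition (5.2), and the fact that endpoints do not affect the integral, can be sketched in a few lines of Python (an illustration of mine; the representation of a step function as (value, left, right) triples is my own choice):

```python
def integrate_step(pieces):
    # a step function represented as a list of (c, a, b): the sum of c * chi_{<a,b>};
    # open vs closed endpoints don't matter, since they don't change b - a
    return sum(c * (b - a) for (c, a, b) in pieces)

# s = 2 * chi_[0,1) + 3 * chi_[1,4):  integral = 2*1 + 3*3 = 11
assert integrate_step([(2.0, 0.0, 1.0), (3.0, 1.0, 4.0)]) == 11.0
```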

Most of the results we state below about the integrals of functions in
$L^{\rm step}(\mathbb{R})$ rely on the following two observations:

(1) If $\varphi \in L^{\rm step}(\mathbb{R})$ then $\varphi$ can be written as

$$\varphi = \sum_{j=1}^n c_j \chi_{K_j}, \qquad (5.3)$$

where the intervals $K_j$ are disjoint.

(2) If $\varphi, \psi \in L^{\rm step}(\mathbb{R})$ then one can write

$$\varphi = \sum_{j=1}^n c_j \chi_{I_j} \quad\text{and}\quad \psi = \sum_{j=1}^n d_j \chi_{I_j},$$

i.e. $\varphi$ and $\psi$ can be expressed using the same choice of intervals (in this
case some of the $c_j$s and $d_j$s may be zero).

The following properties of step functions are easily proved:

(A) If $\varphi \in L^{\rm step}(\mathbb{R})$ then $|\varphi| \in L^{\rm step}(\mathbb{R})$.

(B) If $\varphi \in L^{\rm step}(\mathbb{R})$ then $\varphi^+ = \max(\varphi, 0)$ and $\varphi^- = -\min(\varphi, 0)$ are both in
$L^{\rm step}(\mathbb{R})$.

It is tedious but fairly elementary to check that this integral is well-defined
on $L^{\rm step}(\mathbb{R})$, so that if $s(x)$ is given by two possible expressions of the form (5.1) then
the corresponding integrals in (5.2) agree: to do this one uses the disjoint form in (5.3).

It is also relatively simple to check that this integral satisfies the following
fundamental properties:

(L) Linearity: if $\varphi, \psi \in L^{\rm step}(\mathbb{R})$ and $\lambda \in \mathbb{R}$ then $\lambda\varphi + \psi \in L^{\rm step}(\mathbb{R})$ and

$$\int (\lambda\varphi + \psi) = \lambda\int\varphi + \int\psi.$$

Proof Use (2) above. Then

$$\lambda\varphi + \psi = \sum_{j=1}^n (\lambda c_j + d_j)\chi_{I_j},$$

and so

$$\int(\lambda\varphi + \psi) = \sum_{j=1}^n (\lambda c_j + d_j)|I_j| = \lambda\sum_{j=1}^n c_j|I_j| + \sum_{j=1}^n d_j|I_j| = \lambda\int\varphi + \int\psi. \qquad\Box$$

(P) Positivity: if $\varphi \in L^{\rm step}(\mathbb{R})$ with $\varphi \geq 0$ then $\int\varphi \geq 0$.

Proof Use (1). Then

$$\varphi = \sum_{j=1}^n c_j \chi_{K_j}$$

with the $K_j$ disjoint, and $\varphi$ is non-negative iff $c_j \geq 0$ for each $j$ (with $K_j$ non-empty). It is immediate that $\int\varphi \geq 0$. $\Box$

(M) Modulus property: if $\varphi \in L^{\rm step}(\mathbb{R})$ then $|\varphi| \in L^{\rm step}(\mathbb{R})$ and $|\int\varphi| \leq \int|\varphi|$.

Proof Clearly $|\varphi| - \varphi \geq 0$, and so, since $\varphi \in L^{\rm step}(\mathbb{R})$ implies (see (A) above) that
$|\varphi| \in L^{\rm step}(\mathbb{R})$, we can use properties (L) and (P) to give

$$\int|\varphi| - \int\varphi = \int(|\varphi| - \varphi) \geq 0,$$

whence $\int\varphi \leq \int|\varphi|$; applying the same argument to $-\varphi$ gives $-\int\varphi \leq \int|\varphi|$,
as required. $\Box$

(T) Translation invariance: take $\varphi \in L^{\rm step}(\mathbb{R})$. For $h \in \mathbb{R}$ define $\varphi_h(x) =
\varphi(h + x)$. Then $\varphi_h \in L^{\rm step}(\mathbb{R})$ and $\int\varphi_h = \int\varphi$.

Proof Clearly if

$$\varphi = \sum_{j=1}^n c_j \chi_{I_j},$$

where $I_j = \langle a_j, b_j\rangle$, then $\varphi_h$ is given by

$$\varphi_h = \sum_{j=1}^n c_j \chi_{I'_j},$$

where $I'_j = \langle a_j - h, b_j - h\rangle$. It is clear that $|I'_j| = |I_j|$, and so
$\int\varphi_h = \int\varphi$. $\Box$

Note that combining positivity and linearity gives the comparison result
which will be critical in what follows: if $\varphi, \psi \in L^{\rm step}(\mathbb{R})$ and $\varphi \leq \psi$ then
$\int\varphi \leq \int\psi$.

5.2 Increasing sequences of functions in $L^{\rm step}(\mathbb{R})$: $L^{\rm inc}(\mathbb{R})$

Now, if $s_n(x)$ is a monotonically increasing sequence of functions in $L^{\rm step}(\mathbb{R})$
($s_{n+1}(x) \geq s_n(x)$ for each $x \in \mathbb{R}$), then it follows from the comparison
property above that the sequence

$$\int s_n \qquad (5.4)$$

is also monotonically increasing. Provided that the integrals in (5.4) are
uniformly bounded in $n$,

$$\lim_{n\to\infty} \int s_n$$

exists.

We would like to deduce that the functions $s_n(x)$ must converge to some
limit, but we do not know that $s_n(x)$ is bounded, only that the integral
$\int s_n$ is bounded. Nevertheless, we can show that $s_n(x)$ converges almost
everywhere, an idea that we now introduce.

First, we will say that a set $A \subseteq \mathbb{R}$ has measure zero (zero 'length') if,
given any $\varepsilon > 0$, one can find a (possibly countably infinite) set of intervals
$[a_j, b_j]$ that cover $A$ but whose total length is less than $\varepsilon$:

$$A \subseteq \bigcup_{j=1}^\infty [a_j, b_j] \quad\text{and}\quad \sum_{j=1}^\infty (b_j - a_j) < \varepsilon.$$

Exercise 5.1 Show that if $A_j$ has measure zero for each $j = 1, 2, \dots$ then

$$\bigcup_{j=1}^\infty A_j$$

also has measure zero. [Hint: $\sum_{j=n+1}^\infty 2^{-j} = 2^{-n}$.]
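The hint suggests the standard $\varepsilon/2^j$ trick: cover the $j$th set (here, for simplicity, the $j$th point of a countable set) by an interval of length $\varepsilon/2^j$, so the total length is below $\varepsilon$. A Python sketch of mine, using exact rational arithmetic:

```python
from fractions import Fraction

def cover(points, eps):
    # cover the j-th point (j = 0, 1, 2, ...) by an interval of length eps / 2**(j+1),
    # so the total length is eps * (1/2 + 1/4 + ...) < eps
    return [(p - eps / 2 ** (j + 2), p + eps / 2 ** (j + 2)) for j, p in enumerate(points)]

# a finite stand-in for a countable set: the points k/10 in [0, 1]
pts = [Fraction(k, 10) for k in range(11)]
intervals = cover(pts, Fraction(1, 100))
assert all(a <= p <= b for p, (a, b) in zip(pts, intervals))
assert sum(b - a for (a, b) in intervals) < Fraction(1, 100)
```

The same bookkeeping, applied to a cover of each $A_j$ by intervals of total length $\varepsilon/2^j$, gives Exercise 5.1.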

A property is said to hold almost everywhere, or for almost every $x$, if
the set of points at which it does not hold has measure zero.

Exercise 5.2 Show that if each property $P_j$, $j = 1, 2, \dots$, holds almost
everywhere in an interval $I$, then all the $P_j$ hold simultaneously at almost
every point of $I$.

What we will now show is that each increasing sequence $s_n(x)$ of step functions with the integrals (5.4)
uniformly bounded tends pointwise to a limit almost everywhere,
i.e. except on a set of measure zero.

Theorem 5.3 Let $\{\varphi_n\}$ be an increasing sequence of step functions ($\varphi_{n+1}(x) \geq
\varphi_n(x)$) such that $\int\varphi_n \leq K$ for all $n$. Then $\varphi_n(x)$ converges for almost every
$x$.

Proof First, replace $\{\varphi_n\}$ by $\psi_n := \varphi_n - \varphi_1$. Then $\psi_n$ satisfies the conditions
of the theorem with $K$ replaced by some $K'$, but now $\psi_n \geq 0$.
If we can show that $\psi_n(x)$ converges for almost every $x$ then the same
clearly follows for $\varphi_n(x) = \psi_n(x) + \varphi_1(x)$.

We want to show that

$$E = \{x : \psi_n(x) \to \infty\}$$

has measure zero. Note that, since $\psi_n(x)$ is non-decreasing for each $x$, this
is precisely the set of points where $\psi_n(x)$ does not converge.

Fix $m > 0$, and set

$$E_n = \{x : \psi_n(x) \geq K'm\}.$$

Then $E_n$ can be covered by a finite collection $\mathcal{I}_n$ of disjoint intervals of
total length at most $1/m$: indeed, using (1) and writing $\psi_n = \sum_j c_j \chi_{K_j}$ with the $K_j$
disjoint, let $\mathcal{I}_n$ be the collection of those $K_j$ with corresponding $c_j \geq K'm$.
Then

$$K' \geq \int\psi_n = \sum_j c_j|K_j| \geq \sum_{K_i \in \mathcal{I}_n} c_i|K_i| \geq K'm \sum_{K_i \in \mathcal{I}_n} |K_i|,$$

and so

$$\sum_{K_i \in \mathcal{I}_n} |K_i| \leq \frac1m.$$

Now, note that $E_n \subseteq E_{n+1}$ (since $\psi_{n+1} \geq \psi_n$); by splitting up the intervals
that occur in $\mathcal{I}_{n+1}$ if need be, one can ensure that $\mathcal{I}_n \subseteq \mathcal{I}_{n+1}$.

One can therefore list the intervals occurring in the $\mathcal{I}_n$ as $\{J_1, J_2, J_3, \dots\}$:
first list those in $\mathcal{I}_1$, then the additional intervals in $\mathcal{I}_2$, and so on.

Now, suppose that $x \in E$. Then for $n$ sufficiently large, $x \in E_n$. So

$$E \subseteq \bigcup_i E_i \subseteq \bigcup_{k=1}^\infty J_k. \qquad (5.5)$$

Now, for any fixed $N$, $\{J_1, \dots, J_N\} \subseteq \mathcal{I}_r$ for some $r$, and so $\sum_{i=1}^N |J_i| \leq 1/m$.
Since this bound does not depend on $N$,

$$\sum_{i=1}^\infty |J_i| \leq 1/m. \qquad (5.6)$$

Since $m$ is arbitrary, it follows from (5.5) and (5.6) that $E$ has zero
measure. $\Box$

We denote by $L^{\rm inc}(\mathbb{R})$ the set of all functions that can be arrived at as the almost-everywhere
limit of an increasing sequence of step functions (with uniformly bounded integrals). For
such a function ($f = \lim_{n\to\infty} s_n$), we define

$$\int f = \lim_{n\to\infty} \int s_n.$$

Again, we have to check that this definition does not depend on exactly
which sequence $\{s_n\}$ we have chosen.

The proof uses the following technical lemma, whose proof (which appeals
to the Heine-Borel Theorem, i.e. compactness of closed bounded intervals) we omit.

Lemma 5.4 Let $\psi_n$ be a decreasing sequence of non-negative step functions
such that $\psi_n \to 0$ almost everywhere. Then $\int\psi_n \to 0$ as $n \to \infty$.

Lemma 5.5 Suppose that $f, g \in L^{\rm inc}(\mathbb{R})$ with $f \leq g$ almost everywhere. If $\varphi_n$ and $\psi_n$
are increasing sequences in $L^{\rm step}(\mathbb{R})$ tending (almost everywhere) to $f$ and $g$, respectively, then

$$\lim_{n\to\infty}\int\varphi_n \leq \lim_{n\to\infty}\int\psi_n.$$

Proof Since $f \leq g$ almost everywhere, there exists a set $E'$ of zero measure
such that $f(x) \leq g(x)$ for all $x \notin E'$. Also $\varphi_n(x) \to f(x)$ for $x \notin E''$ ($E''$
with zero measure) and $\psi_n(x) \to g(x)$ for $x \notin E'''$ ($E'''$ with zero measure).
Let $E = E' \cup E'' \cup E'''$; then $E$ has zero measure.

For each fixed $k$, the sequence $\varphi_k - \psi_n$ is decreasing in $n$, with limit $\varphi_k - g$
for $x \notin E$. For $x \notin E$, $\varphi_k \leq f$, and so

$$\varphi_k - g \leq f - g \leq 0.$$

Thus $\varphi_k - \psi_n$ converges to a non-positive limit for $x \notin E$, and so the sequence
$(\varphi_k - \psi_n)^+$ converges to zero almost everywhere. It follows from Lemma 5.4
that for each fixed $k$,

$$\int(\varphi_k - \psi_n)^+ \to 0$$

as $n \to \infty$. Since

$$\int\varphi_k - \int\psi_n = \int(\varphi_k - \psi_n) \leq \int(\varphi_k - \psi_n)^+,$$

letting $n \to \infty$ it follows that

$$\int\varphi_k \leq \lim_{n\to\infty}\int\psi_n,$$

and now letting $k \to \infty$,

$$\lim_{k\to\infty}\int\varphi_k \leq \lim_{n\to\infty}\int\psi_n. \qquad\Box$$

It follows from this lemma that $\int f$ is well-defined for $f \in L^{\rm inc}(\mathbb{R}):$ if
$f = \lim \varphi_n = \lim \psi_n$ then applying the lemma twice (with the roles of the
two sequences interchanged) gives $\lim \int\varphi_n = \lim \int\psi_n$.

We can prove the following properties for the integral of functions in
$L^{\rm inc}(\mathbb{R})$:

Proposition 5.6
(i) If $f, g \in L^{\rm inc}(\mathbb{R})$ and $f \leq g$ then $\int f \leq \int g$.
(ii) If $f, g \in L^{\rm inc}(\mathbb{R})$ then $f + g \in L^{\rm inc}(\mathbb{R})$ and $\int(f + g) = \int f + \int g$.
(iii) If $f \in L^{\rm inc}(\mathbb{R})$ and $\alpha \geq 0$ then $\alpha f \in L^{\rm inc}(\mathbb{R})$ and $\int \alpha f = \alpha\int f$.
(iv) If $f, g \in L^{\rm inc}(\mathbb{R})$ then $\max(f,g), \min(f,g) \in L^{\rm inc}(\mathbb{R})$; in particular,
$f^+ = \max(f, 0) \in L^{\rm inc}(\mathbb{R})$.

The proofs are obvious.



5.3 The space $L^1(\mathbb{R})$ of integrable functions

Finally, we define the space of integrable functions on $\mathbb{R}$, written $L^1(\mathbb{R})$, to
be all functions of the form $f(x) = g(x) - h(x)$ with $g, h \in L^{\rm inc}(\mathbb{R})$, and set

$$\int f = \int g - \int h.$$

This definition is consistent: if $f = g_1 - h_1 = g_2 - h_2$ then $g_1 + h_2 = g_2 + h_1$,
and so by the addition property for functions in $L^{\rm inc}(\mathbb{R})$,

$$\int g_1 + \int h_2 = \int(g_1 + h_2) = \int(g_2 + h_1) = \int g_2 + \int h_1,$$

whence

$$\int g_1 - \int h_1 = \int g_2 - \int h_2.$$

We can now show that the integral of $L^1$ functions has the same properties
as the integral on $L^{\rm step}(\mathbb{R})$:

(L) If $f_1, f_2 \in L^1(\mathbb{R})$ and $\alpha \in \mathbb{R}$ then $f_1 + \alpha f_2 \in L^1(\mathbb{R})$ and

$$\int(f_1 + \alpha f_2) = \int f_1 + \alpha\int f_2.$$

Proof With $f_i = g_i - h_i$, $g_i, h_i \in L^{\rm inc}(\mathbb{R})$, consider first $\alpha \geq 0$. Then

$$f_1 + \alpha f_2 = (g_1 + \alpha g_2) - (h_1 + \alpha h_2),$$

where both the bracketed terms are in $L^{\rm inc}(\mathbb{R})$. So $f_1 + \alpha f_2 \in L^1(\mathbb{R})$, and

$$\int(f_1 + \alpha f_2) = \int(g_1 + \alpha g_2) - \int(h_1 + \alpha h_2)
= \int(g_1 - h_1) + \alpha\int(g_2 - h_2) = \int f_1 + \alpha\int f_2.$$

Finally if $f \in L^1(\mathbb{R})$ with $f = g - h$ ($g, h \in L^{\rm inc}(\mathbb{R})$), then $-f = h - g$.
So $-f \in L^1(\mathbb{R})$, and clearly $\int(-f) = -\int f$. $\Box$

(P) If $f \in L^1(\mathbb{R})$ and $f \geq 0$ a.e. then $\int f \geq 0$.

Proof If $f = g - h$, $g, h \in L^{\rm inc}(\mathbb{R})$, then $f \geq 0$ implies that $g \geq h$ (almost everywhere).
Since integration respects order for functions in $L^{\rm inc}(\mathbb{R})$, $\int g \geq \int h$, and
therefore

$$\int f = \int g - \int h \geq 0. \qquad\Box$$

Note that it is a consequence of this property that if $f = 0$ a.e. then
$\int f = 0$.

(M) If $f \in L^1(\mathbb{R})$ then $|f| \in L^1(\mathbb{R})$ and

$$\Big|\int f\Big| \leq \int|f|.$$

Proof Write $f = g - h$ with $g, h \in L^{\rm inc}(\mathbb{R})$. One can easily check that

$$|f| = \max(g, h) - \min(g, h),$$

and so $|f| \in L^1(\mathbb{R})$ using Proposition 5.6. We need to show that

$$\int g - \int h = \int f \leq \int|f| = \int\max(g,h) - \int\min(g,h)$$

and

$$\int h - \int g = -\int f \leq \int|f| = \int\max(g,h) - \int\min(g,h).$$

By symmetry we need only check one of these, and for this it is sufficient
to show that

$$g(x) + \min(g(x), h(x)) \leq h(x) + \max(g(x), h(x)).$$

If $g < h$ then the LHS is $2g(x)$ and the RHS is $2h(x)$, so the inequality
holds; while if $g \geq h$ the LHS is $g + h$ and the RHS is $h + g$, so the
inequality holds once more. $\Box$

This result implies that there are some functions that can be integrated
(in an improper sense) but nevertheless are not in $L^1(\mathbb{R})$; see the Examples Sheet.

(T) If $f \in L^1(\mathbb{R})$ and $f_h(x) = f(x + h)$ then $f_h \in L^1(\mathbb{R})$ and $\int f_h = \int f$.

Proof Follows immediately from the same property for functions in $L^{\rm inc}(\mathbb{R})$. $\Box$

We have now defined the space $L^1(\mathbb{R})$ of integrable functions, and shown
that the integral as we have defined it has the four properties we started with
for the integral on $L^{\rm step}(\mathbb{R})$.

If we want to talk about integrals on intervals, we say that $f \in L^1(a,b)$ —
$f$ is integrable on $(a,b)$ — if $f\chi_{(a,b)} \in L^1(\mathbb{R})$.

There are three fundamental theorems for the Lebesgue integral. The first
is the Monotone Convergence Theorem, which looks like the construction
of the Lebesgue integral, but with a monotone sequence of step functions
replaced by a monotone sequence of integrable functions.

Theorem 5.7 (Monotone Convergence Theorem) Suppose that $f_n \in
L^1(\mathbb{R})$, $f_n(x) \leq f_{n+1}(x)$ almost everywhere, and $\int f_n \leq K$ for some $K$
independent of $n$. Then there exists a $g \in L^1(\mathbb{R})$ such that $f_n \to g$ almost
everywhere, and

$$\int g = \lim_{n\to\infty} \int f_n.$$

Corollary 5.8 Suppose that $f \in L^1(\mathbb{R})$ and $\int|f| = 0$. Then $f = 0$ almost
everywhere.

Proof Exercise. $\Box$

Theorem 5.9 (Dominated Convergence Theorem) Suppose that $f_n \in
L^1(\mathbb{R})$ and that $f_n \to f$ almost everywhere. If there exists a function $g \in
L^1(\mathbb{R})$ such that $|f_n(x)| \leq g(x)$ for almost every $x$, and for every $n$, then
$f \in L^1(\mathbb{R})$ and

$$\int f = \lim_{n\to\infty} \int f_n.$$
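As an illustration of the Monotone Convergence Theorem (my own example, not from the notes), take $f_n(x) = \min(n, x^{-1/2})$ on $(0,1]$: each $f_n$ is bounded and integrable, $f_n$ increases to $x^{-1/2}$ almost everywhere, and $\int f_n = 2 - 1/n$ increases to $2 = \int_0^1 x^{-1/2}\,\mathrm{d}x$:

```python
def integral_fn(n):
    # exact value of the integral of f_n(x) = min(n, x**-0.5) over (0, 1]:
    # f_n = n on (0, 1/n^2] and f_n = x**-0.5 on (1/n^2, 1], so the
    # integral is n * (1/n^2) + (2 - 2/n) = 2 - 1/n
    return n * (1.0 / n**2) + (2.0 - 2.0 / n)

vals = [integral_fn(n) for n in range(1, 50)]
assert all(a < b for a, b in zip(vals, vals[1:]))  # monotonically increasing
assert abs(integral_fn(10**6) - 2.0) < 1e-5       # increasing to 2
```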

To define an integral of a function of two variables one would naturally
proceed by analogy with the construction above: take step functions that
are constant on rectangles, construct an integral on $L^{\rm inc}(\mathbb{R}^2)$ by taking limits
of monotonic sequences, and then construct $L^1(\mathbb{R}^2)$ via differences. But this does not relate double integrals to single integrals. That
is achieved by the Fubini and Tonelli theorems. We give a less-than-rigorous
formulation:

Theorem 5.10 (Fubini-Tonelli) If $f : \mathbb{R}^2 \to \mathbb{R}$ is such that either

$$\int\Big(\int |f(x,y)|\,\mathrm{d}x\Big)\mathrm{d}y < \infty \quad\text{or}\quad \int\Big(\int |f(x,y)|\,\mathrm{d}y\Big)\mathrm{d}x < \infty$$

then $f \in L^1(\mathbb{R}^2)$ and

$$\int\Big(\int f(x,y)\,\mathrm{d}x\Big)\mathrm{d}y = \int\Big(\int f(x,y)\,\mathrm{d}y\Big)\mathrm{d}x.$$

(The less-than-rigorous nature is that the conditions implicitly require integrability
properties of $|f(x,y)|$: first that for almost every $y \in \mathbb{R}$, $|f(\cdot,y)| \in L^1(\mathbb{R})$,
and then that the resulting function $g(y) = \int|f(x,y)|\,\mathrm{d}x$ is again in $L^1(\mathbb{R})$.)

5.4 The Lebesgue spaces $L^p$

We denote by $L^p(\mathbb{R})$ [$L^p(I)$] the space of functions whose $p$th power is integrable
[on an interval $I$]. It is natural to try to use the $L^p$ norm

$$\|f\|_{L^p} = \Big(\int |f(x)|^p\Big)^{1/p}$$

on $L^p(I)$. However, although this was a norm on the space of continuous
functions with finite $L^p$ norm considered earlier, on $L^p(I)$ it in fact fails to satisfy property (i)
in the definition of a norm: if $\|f\|_{L^p} = 0$ then we can only conclude that $f = 0$ almost
everywhere.

So strictly speaking, if we want to consider $L^p$ as a normed space we in
fact have to consider $L^p/\!\sim$, where $f \sim g$ if $f = g$ almost everywhere. This
identification is usually done tacitly, rather than by being explicit about the
quotienting operation.

We have now gone almost full circle. We started developing these $L^p$
spaces because $C^0$ is not complete in the $L^1$ norm. Using the abstract
definition of completion, we defined $L^1$ as the completion of $C^0$ in the $L^1$
norm. But now we have defined a space, $L^1(\mathbb{R})$, which a priori has nothing
to do with $C^0$ and its completion.

In fact one can show fairly easily that $C^0$ is dense in $L^1$ (see Exercises).
We now show that $L^1$ as defined here is complete (and so is indeed the completion of
$C^0$ that we were after).

We start with a general lemma of independent interest.



Lemma 5.11 Suppose that $(X, \|\cdot\|)$ is a normed space in which

$$\sum_{j=1}^\infty \|y_j\| < \infty$$

implies that $\lim_{n\to\infty} \sum_{j=1}^n y_j$ exists. Then $(X, \|\cdot\|)$ is complete.

Theorem 5.12 The Lebesgue space $L^1(\mathbb{R})$ is complete.

In fact we prove the following:

Proposition 5.13 Let $f_k \in L^1(\mathbb{R})$. Then

(i) if $\sum_{k=1}^\infty \|f_k\|_{L^1} < \infty$ then $\sum_{k=1}^\infty |f_k|$ converges (almost everywhere) to an element of
$L^1(\mathbb{R})$;
(ii) if $\sum_{k=1}^\infty |f_k| \in L^1(\mathbb{R})$ then $\sum_{k=1}^\infty f_k \in L^1(\mathbb{R})$.

Proof
(i) Suppose that $f_k \in L^1(\mathbb{R})$ and

$$K = \sum_{k=1}^\infty \int |f_k| < \infty.$$

We apply the MCT to $g_n = \sum_{k=1}^n |f_k|$, since $g_{n+1} \geq g_n$ and $\int g_n \leq K$
for every $n$.

(ii) We now apply the DCT to $h_n = \sum_{k=1}^n f_k$. Each $h_n \in L^1(\mathbb{R})$, and by
the triangle inequality

$$|h_n| \leq \sum_{k=1}^n |f_k| \leq \sum_{k=1}^\infty |f_k|,$$

and by assumption the right-hand side is in $L^1(\mathbb{R})$. Since $\sum_k |f_k|$
converges almost everywhere, so does $\sum_k f_k$ (if $\{a_k\} \subset \mathbb{R}$ and $\sum |a_k|$
converges then so does $\sum a_k$), and the DCT implies that $\sum_{k=1}^\infty f_k \in
L^1(\mathbb{R})$. $\Box$

For the completeness of $L^2$ (which follows similar lines) see the Examples
sheet.
6
Inner product spaces

If $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ are two elements of $\mathbb{R}^n$ then we
define their dot product as

$$x \cdot y = x_1 y_1 + \cdots + x_n y_n. \qquad (6.1)$$

This is one concrete example of an inner product on a vector space:

Definition 6.1 An inner product $(\cdot,\cdot)$ on a vector space $V$ is a map $(\cdot,\cdot) :
V \times V \to \mathbb{K}$ such that for all $x, y, z \in V$ and for all $\lambda \in \mathbb{K}$,

(i) $(x,x) \geq 0$, with equality iff $x = 0$,
(ii) $(x + y, z) = (x,z) + (y,z)$,
(iii) $(\lambda x, y) = \lambda(x,y)$, and
(iv) $(x,y) = \overline{(y,x)}$.

Note that

• in a real vector space the complex conjugate in (iv) is unnecessary;
• in the complex case the restriction that $(y,x) = \overline{(x,y)}$ implies in particular
that $(x,x) = \overline{(x,x)}$, i.e. that $(x,x)$ is real, and so the requirement that
$(x,x) \geq 0$ makes sense; and
• (iii) and (iv) imply that the inner product is conjugate linear in its second
argument, i.e. $(x, \lambda y) = \overline{\lambda}(x,y)$.

A vector space equipped with an inner product is known as an inner
product space.


Example 6.2 In the space $\ell^2(\mathbb{K})$ of square summable sequences, for $x =
(x_1, x_2, \dots)$ and $y = (y_1, y_2, \dots)$ one can define an inner product

$$(x,y) = \sum_{j=1}^\infty x_j \overline{y_j}.$$

This is well-defined since $\sum_j |x_j\overline{y_j}| \leq \frac12 \sum_j (|x_j|^2 + |y_j|^2)$.

Example 6.3 The expression

$$(f,g) = \int_a^b f(x)\overline{g(x)}\,\mathrm{d}x$$

defines an inner product on the space $L^2(a,b)$.

6.1 Inner products and norms

Given an inner product we can define $\|v\|$ by setting

$$\|v\|^2 = (v,v). \qquad (6.2)$$

We will soon show that $\|\cdot\|$ defines a norm; we say that it is the norm
induced by the inner product $(\cdot,\cdot)$.

6.2 The Cauchy-Schwarz inequality

Lemma 6.4 (Cauchy-Schwarz inequality) Any inner product satisfies
the inequality

$$|(x,y)| \leq \|x\|\|y\| \quad\text{for all } x, y \in V, \qquad (6.3)$$

where $\|\cdot\|$ is defined in (6.2).

Proof If $x = 0$ or $y = 0$ then (6.3) is clear; so suppose that $x \neq 0$ and $y \neq 0$.
For any $\lambda \in \mathbb{K}$ we have

$$(x - \lambda y, x - \lambda y) = (x,x) - \lambda(y,x) - \overline{\lambda}(x,y) + |\lambda|^2(y,y) \geq 0.$$

Setting $\lambda = (x,y)/\|y\|^2$ we obtain

$$0 \leq \|x\|^2 - 2\frac{|(x,y)|^2}{\|y\|^2} + \frac{|(x,y)|^2}{\|y\|^2} = \|x\|^2 - \frac{|(x,y)|^2}{\|y\|^2},$$

which implies (6.3). $\Box$

The Cauchy-Schwarz inequality allows us to show easily that the map
$x \mapsto \|x\|$ is a norm on $V$. Property (i) is clear, since $\|x\| \geq 0$ and if
$\|x\|^2 = (x,x) = 0$ then $x = 0$. Property (ii) is also clear, since

$$\|\lambda x\|^2 = (\lambda x, \lambda x) = \lambda\overline{\lambda}(x,x) = |\lambda|^2\|x\|^2.$$

Property (iii), the triangle inequality, follows from the Cauchy-Schwarz inequality
(6.3), since $(x,y) + (y,x) = 2\,\mathrm{Re}(x,y) \leq 2|(x,y)|$ and so

$$\|x + y\|^2 = (x + y, x + y)
= \|x\|^2 + (x,y) + (y,x) + \|y\|^2
\leq \|x\|^2 + 2\|x\|\|y\| + \|y\|^2
= (\|x\| + \|y\|)^2,$$

i.e. $\|x + y\| \leq \|x\| + \|y\|$.

As an example of the Cauchy-Schwarz inequality, consider the standard
inner product on $\mathbb{R}^n$. As we would expect, the norm derived from this inner
product is just

$$\|x\| = \Big(\sum_{j=1}^n |x_j|^2\Big)^{1/2}.$$

The Cauchy-Schwarz inequality says that

$$|(x,y)|^2 = \Big|\sum_{j=1}^n x_j y_j\Big|^2 \leq \Big(\sum_{j=1}^n |x_j|^2\Big)\Big(\sum_{j=1}^n |y_j|^2\Big), \qquad (6.4)$$

or just $|x \cdot y| \leq |x||y|$.
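A quick random check of (6.4) in $\mathbb{R}^n$ (an illustration only; the tolerance guards against floating-point round-off):

```python
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return dot(x, x) ** 0.5

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    # (6.4): |x . y| <= |x| |y|
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12
```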

Exercise 6.5 The norm on the sequence space $\ell^2$ derived from the inner
product $(x,y) = \sum_j x_j\overline{y_j}$ is

$$\|x\|_2 = \Big(\sum_{j=1}^\infty |x_j|^2\Big)^{1/2}.$$

Obtain the Cauchy-Schwarz inequality for $\ell^2$ using (6.4) and a limiting argument,
rather than Lemma 6.4.

The Cauchy-Schwarz inequality in $L^2(a,b)$ gives the very useful

$$\Big|\int_a^b f(x)g(x)\,\mathrm{d}x\Big| \leq \Big(\int_a^b |f(x)|^2\,\mathrm{d}x\Big)^{1/2}\Big(\int_a^b |g(x)|^2\,\mathrm{d}x\Big)^{1/2},$$

for $f, g \in L^2(a,b)$. (This shows in particular that if $f, g \in L^2(a,b)$ then
$fg \in L^1(a,b)$.)

6.3 The relationship between inner products and their norms

Norms derived from inner products have one key property in addition to
(i)-(iii) of Definition 2.1:

Lemma 6.6 (Parallelogram law) Let $V$ be an inner product space with
induced norm $\|\cdot\|$. Then

$$\|x+y\|^2 + \|x-y\|^2 = 2(\|x\|^2 + \|y\|^2) \quad\text{for all } x, y \in V. \qquad (6.5)$$

Proof Simply expand the inner products:

$$\|x+y\|^2 + \|x-y\|^2 = (x+y, x+y) + (x-y, x-y)
= \|x\|^2 + (y,x) + (x,y) + \|y\|^2 + \|x\|^2 - (y,x) - (x,y) + \|y\|^2
= 2(\|x\|^2 + \|y\|^2). \qquad\Box$$

Exercise 6.7 Show that there is no inner product on $C^0([0,1])$ which induces
the sup or $L^1$ norms,

$$\|f\|_\infty = \sup_{x\in[0,1]} |f(x)| \quad\text{or}\quad \|f\|_{L^1} = \int_0^1 |f(x)|\,\mathrm{d}x.$$
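One way to approach Exercise 6.7: by Lemma 6.6, any norm induced by an inner product must satisfy the parallelogram law, so it is enough to exhibit $f, g$ violating (6.5). The pair $f(x) = 1$, $g(x) = x$ works for the sup norm. A Python sketch (the grid-based sup is an approximation in general, but exact here because the extrema fall on grid points):

```python
# f(x) = 1 and g(x) = x on [0, 1]
sup = lambda h: max(abs(h(i / 1000)) for i in range(1001))
f = lambda x: 1.0
g = lambda x: x

lhs = sup(lambda x: f(x) + g(x)) ** 2 + sup(lambda x: f(x) - g(x)) ** 2
rhs = 2 * (sup(f) ** 2 + sup(g) ** 2)
assert (lhs, rhs) == (5.0, 4.0)  # the parallelogram law (6.5) fails for the sup norm
```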

Given a norm that is derived from an inner product, one can reconstruct
the inner product as follows:

Lemma 6.8 (Polarisation identity) Let V be an inner product space with
induced norm $\|\cdot\|$. Then if V is real
\[ 4(x,y) = \|x+y\|^2 - \|x-y\|^2, \tag{6.6} \]
while if V is complex
\[ 4(x,y) = \|x+y\|^2 - \|x-y\|^2 + \mathrm{i}\|x+\mathrm{i}y\|^2 - \mathrm{i}\|x-\mathrm{i}y\|^2. \tag{6.7} \]

Proof Once again, rewrite the right-hand sides as inner products, multiply
out, and simplify.

If V is a real or complex vector space and $\|\cdot\|$ is a norm on V that
satisfies the parallelogram law, then (6.6) or (6.7) defines an inner product
on V. In other words, the parallelogram law characterises those norms that
can be derived from inner products. (This argument is non-trivial.)

Lemma 6.9 If V is an inner product space with inner product $(\cdot,\cdot)$ and
derived norm $\|\cdot\|$, then $x_n \to x$ and $y_n \to y$ implies that
\[ (x_n, y_n) \to (x, y). \]

Proof Since $x_n$ and $y_n$ converge, $\|x_n\|$ and $\|y_n\|$ are bounded (the proof is
a simple exercise). Then
\[ |(x_n, y_n) - (x, y)| = |(x_n - x, y_n) + (x, y_n - y)| \le \|x_n - x\|\,\|y_n\| + \|x\|\,\|y_n - y\| \]
implies that $(x_n, y_n) \to (x, y)$.

This lemma is extremely useful: that we can swap limits and inner products
means that if
\[ \sum_{j=1}^n x_j \]
converges (so that $\sum_{j=1}^n x_j \to x = \sum_{j=1}^\infty x_j$) then
\[ \Big(\sum_{j=1}^\infty x_j,\ y\Big) = \sum_{j=1}^\infty (x_j, y), \]
i.e. we can swap inner products and sums.

Definition 6.10 A Hilbert space is a complete inner product space.

Examples: $\mathbb{R}^n$ with inner product and norm
\[ \big((x_1, \dots, x_n), (y_1, \dots, y_n)\big) = \sum_{j=1}^n x_j y_j, \qquad \|(x_1, \dots, x_n)\| = \Big(\sum_{j=1}^n |x_j|^2\Big)^{1/2}; \]
$\mathbb{C}^n$ with inner product and norm
\[ \big((w_1, \dots, w_n), (z_1, \dots, z_n)\big) = \sum_{j=1}^n w_j\bar{z}_j, \qquad \|(w_1, \dots, w_n)\| = \Big(\sum_{j=1}^n |w_j|^2\Big)^{1/2}; \]
$\ell^2(\mathbb{K})$ with inner product and norm
\[ (x, y) = \sum_{j=1}^\infty x_j\bar{y}_j, \qquad \|x\| = \Big(\sum_{j=1}^\infty |x_j|^2\Big)^{1/2} \]
(the complex conjugate is redundant if $\mathbb{K} = \mathbb{R}$); and $L^2(I)$ with inner product
and norm
\[ (f, g) = \int_I f(x)\overline{g(x)}\,dx, \qquad \|f\|_{L^2} = \Big(\int_I |f(x)|^2\,dx\Big)^{1/2}. \]

From now on we will assume unless explicitly stated that all the above
spaces are equipped with their standard inner product (and corresponding
norm).
7
Orthonormal bases in Hilbert spaces

From now on we will denote by H an arbitrary Hilbert space, with inner
product $(\cdot,\cdot)$ and norm $\|\cdot\|$; we take $\mathbb{K} = \mathbb{C}$, since the case $\mathbb{K} = \mathbb{R}$ is
simplified only by removing the complex conjugates.

Our aim in this chapter is to discuss orthonormal bases for Hilbert spaces.
In contrast to the Hamel basis we considered earlier, we are now going to
allow infinite linear combinations of basis elements (called a Schauder basis).

7.1 Orthonormal sets

Definition 7.1 Two elements x and y of an inner product space are said
to be orthogonal if $(x, y) = 0$. (We sometimes write $x \perp y$.)

Clearly if $(x, y) = 0$ then
\[ \|x+y\|^2 = (x+y, x+y) = \|x\|^2 + (x,y) + (y,x) + \|y\|^2 = \|x\|^2 + \|y\|^2 \tag{7.1} \]
(Pythagoras). Sums of orthogonal vectors are therefore very useful in calculations,
since all the cross terms in their norm vanish.

Definition 7.2 A set E is orthonormal if $\|e\| = 1$ for all $e \in E$ and
$(e_1, e_2) = 0$ for any $e_1, e_2 \in E$ with $e_1 \neq e_2$.

Note that this definition does not require E to be countable. Note also
that any orthonormal set must be linearly independent, since if
\[ \sum_{j=1}^n \alpha_j e_j = 0, \qquad n \in \mathbb{N},\ \alpha_j \in \mathbb{K},\ e_j \in E, \]
one can take the inner product with each $e_j$ in turn to show that $\alpha_j = 0$
for $j = 1, \dots, n$.

Example 7.3 The set $\{e_j\}_{j=1}^\infty$, where
\[ e_j = (0, 0, \dots, 1, \dots, 0, \dots) \]
(with the 1 in the jth position), is an orthonormal set in $\ell^2$.

Example 7.4 Consider the space $L^2(-\pi, \pi)$ and the set
\[ E = \Big\{\frac{1}{\sqrt{2\pi}},\ \frac{1}{\sqrt{\pi}}\cos t,\ \frac{1}{\sqrt{\pi}}\sin t,\ \frac{1}{\sqrt{\pi}}\sin 2t,\ \frac{1}{\sqrt{\pi}}\cos 2t,\ \dots\Big\}. \]
Then E is orthonormal, since
\[ \int_{-\pi}^{\pi} \cos^2 nt\,dt = \int_{-\pi}^{\pi} \sin^2 nt\,dt = \pi; \]
for any n, m
\[ \int_{-\pi}^{\pi} \cos nt\,dt = \int_{-\pi}^{\pi} \sin nt\,dt = \int_{-\pi}^{\pi} \sin nt\cos mt\,dt = 0; \]
and for any $n \neq m$
\[ \int_{-\pi}^{\pi} \cos nt\cos mt\,dt = \int_{-\pi}^{\pi} \sin nt\sin mt\,dt = 0. \]
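One can check the orthonormality of a few members of E numerically; the sketch below approximates the $L^2(-\pi,\pi)$ inner product with a midpoint rule (the tolerance is generous, since the quadrature is in fact very accurate for these smooth periodic integrands).

```python
import math

N = 2000
h = 2 * math.pi / N
ts = [-math.pi + (k + 0.5) * h for k in range(N)]

def l2_ip(f, g):  # midpoint-rule approximation to the L^2(-pi, pi) inner product
    return h * sum(f(t) * g(t) for t in ts)

c = 1 / math.sqrt(math.pi)
e = [lambda t: 1 / math.sqrt(2 * math.pi),
     lambda t: c * math.cos(t),
     lambda t: c * math.sin(t),
     lambda t: c * math.sin(2 * t),
     lambda t: c * math.cos(2 * t)]

# (e_i, e_j) should be 1 when i = j and 0 otherwise
for i in range(5):
    for j in range(5):
        expected = 1.0 if i == j else 0.0
        assert abs(l2_ip(e[i], e[j]) - expected) < 1e-3
```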

Expansions involving orthonormal elements of an inner product space are
easy to treat, essentially due to the following simple lemma.

Lemma 7.5 Let $e_1, \dots, e_n$ be an orthonormal set in an inner product space
V. Then for any $\alpha_j \in \mathbb{K}$,
\[ \Big\|\sum_{j=1}^n \alpha_j e_j\Big\|^2 = \sum_{j=1}^n |\alpha_j|^2. \]

Proof Use induction and the Pythagorean property (7.1), noting that
\[ \Big(\sum_{j=1}^{n-1} \alpha_j e_j,\ \alpha_n e_n\Big) = 0. \]

The following lemma, proved using the Gram-Schmidt orthonormalisation
process (which we will revisit later), guarantees the existence of an
orthonormal basis in any finite-dimensional inner product space.

Lemma 7.6 Let $(\cdot,\cdot)$ be any inner product on a vector space V of dimension
n. Then there exists an orthonormal basis $\{e_j\}_{j=1}^n$ of V.

It follows that in some sense the dot product (6.1) is the canonical inner
product on a finite-dimensional space. Indeed, with respect to any
orthonormal basis $\{e_j\}$ the inner product $(\cdot,\cdot)$ has the form (6.1), i.e.
\[ \Big(\sum_{j=1}^n x_j e_j,\ \sum_{k=1}^n y_k e_k\Big) = \sum_{j,k=1}^n x_j\bar{y}_k(e_j, e_k) = x_1\bar{y}_1 + \dots + x_n\bar{y}_n. \]

7.2 Convergence and orthonormality in Hilbert spaces

In an infinite-dimensional Hilbert space we cannot hope to find a finite basis,
since then the space would by definition be finite-dimensional. The best that
we can hope for is to find a countable basis $\{e_j\}_{j=1}^\infty$, in terms of which to
expand any $x \in H$ as a potentially infinite series,
\[ x = \sum_{j=1}^\infty \alpha_j e_j. \]

We make the obvious definition of what this equality means.

Definition 7.7 Let $(X, \|\cdot\|_X)$ be a normed space. Then
\[ \sum_{j=1}^\infty \alpha_j e_j = x \]
iff the partial sums converge to x in the norm of X, i.e.
\[ \Big\|\sum_{j=1}^n \alpha_j e_j - x\Big\|_X \to 0 \quad \text{as } n \to \infty. \]

We now formalise our notion of a basis for a Hilbert space. Note that at
present we do not require E to be countable.

Definition 7.8 A set E is a basis for H if every $x \in H$ can be written uniquely
in the form
\[ x = \sum_{j=1}^\infty \alpha_j e_j \tag{7.2} \]
for some $\alpha_j \in \mathbb{K}$, $e_j \in E$.

(Note that if E is a basis in the sense of Definition 7.8, i.e. the expansion
in terms of the $e_j$ is unique, then E is linearly independent, since if
\[ 0 = \sum_{j=1}^n \alpha_j e_j \]
there is a unique expansion for zero and so we must have $\alpha_j = 0$ for all
$j = 1, \dots, n$.)
There is a subtlety here. If E is countable then one can assume that $E = \{e_j\}_{j=1}^\infty$, i.e. that the
elements of E are specified in a particular order. In this case the uniqueness is clear. But if
E is uncountable, uniqueness means that one does not care about the order of the summation
in (7.2), and so certainly this order should not affect the value of the sum itself. This requires
proof. Suppose that $\{w_j\}$ is a rearrangement of the $\{e_j\}$. Set $\alpha_n = (x, e_n)$, $\beta_m = (x, w_m)$,
\[ x_1 = \sum_{n=1}^\infty \alpha_n e_n \quad \text{and} \quad x_2 = \sum_{m=1}^\infty \beta_m w_m. \]
Then (by Lemma 6.9) it follows that $(x_1, e_n) = (x, e_n)$ and $(x_2, w_m) = (x, w_m)$. Since $e_n = w_m$
for some $m = m(n)$, it follows that
\[ (x_1 - x_2, e_n) = (x_1, e_n) - (x_2, w_m) = (x, e_n) - (x, w_m) = 0 \]
for every n, and similarly $(x_1 - x_2, w_m) = 0$ for every m. Thus
\begin{align*}
\|x_1 - x_2\|^2 &= \Big(x_1 - x_2,\ \sum_{n=1}^\infty \alpha_n e_n - \sum_{m=1}^\infty \beta_m w_m\Big) \\
&= \sum_{n=1}^\infty \bar{\alpha}_n(x_1 - x_2, e_n) - \sum_{m=1}^\infty \bar{\beta}_m(x_1 - x_2, w_m) = 0,
\end{align*}
and so $x_1 = x_2$, as required.

If E is a basis and is orthonormal, we refer to it as an orthonormal basis.
Note that an orthonormal set is a basis provided that every x can be written
in the form (7.2): the uniqueness then follows from the orthonormality.

Indeed, suppose that
\[ x = \sum_{j=1}^\infty \alpha_j e_j \quad \text{and} \quad x = \sum_{k=1}^\infty \beta_k f_k \]
with $\alpha_j, \beta_k \in \mathbb{K}$ non-zero, and $e_j, f_k \in E$.

Write $E_1 = \{e_j\}_{j=1}^\infty$ and $E_2 = \{f_j\}_{j=1}^\infty$. Set $U = E_1 \cap E_2$, $\tilde{E}_1 = E_1 \setminus U$,
and $\tilde{E}_2 = E_2 \setminus U$. In other words, U consists of those basis elements common
to $\{e_j\}$ and $\{f_j\}$, $\tilde{E}_1$ of those occurring only in $\{e_j\}$, and $\tilde{E}_2$ of those occurring
only in $\{f_j\}$. All these sets are at most countable.

Then with some abuse of notation one can write
\[ x = \sum_{u \in U} \alpha_u u + \sum_{e \in \tilde{E}_1} \alpha_e e \quad \text{and} \quad x = \sum_{u \in U} \beta_u u + \sum_{f \in \tilde{E}_2} \beta_f f. \]
It follows that
\[ \sum_{u \in U} (\alpha_u - \beta_u)u + \sum_{e \in \tilde{E}_1} \alpha_e e - \sum_{f \in \tilde{E}_2} \beta_f f = 0. \]
Taking the inner product of this with any $e \in \tilde{E}_1$ shows that $\alpha_e = 0$, while
the inner product with any $f \in \tilde{E}_2$ implies that $\beta_f = 0$. So in fact $E_1 = E_2$,
and the inner product with any $u \in U$ then shows that $\alpha_u = \beta_u$, and so the
expansion is unique. (In each of these steps we use Lemma 6.9 to swap the
order of the inner product and summation.)

Rather than discuss general bases, we concentrate on orthonormal bases,
with a particular emphasis on countable bases $E = \{e_j\}_{j=1}^\infty$. Neglecting
for the moment the question of convergence, and of conditions to guarantee
that $\{e_j\}$ really is a basis, suppose that the equality (7.2) holds for some
$x \in H$, i.e. that
\[ x = \sum_{j=1}^\infty \alpha_j e_j. \]
To find the coefficients $\alpha_j$, simply take the inner product with some $e_k$ to
give
\[ (x, e_k) = \Big(\sum_{j=1}^\infty \alpha_j e_j,\ e_k\Big) = \sum_{j=1}^\infty \alpha_j(e_j, e_k) = \alpha_k, \]
and so we would expect $\alpha_k = (x, e_k)$. (Note that if $x = \sum_j \alpha_j e_j$ then this
manipulation is rigorous, using Lemma 6.9 to change the order of the inner
product and the sum.) So if E is an orthonormal basis we would expect to
obtain the expansion
\[ x = \sum_{j=1}^\infty (x, e_j)e_j. \]
Assuming that the Pythagoras result of Lemma 7.5 holds for infinite sums,
we would expect that
\[ \sum_{j=1}^\infty |(x, e_j)|^2 = \|x\|^2. \]
In some ways this says that the $\{e_j\}$ capture all directions in H. Presumably
if the $\{e_j\}$ do not form an orthonormal basis we should be able to find an
x such that
\[ \sum_{j=1}^\infty |(x, e_j)|^2 < \|x\|^2. \]
We have not proved any of this yet, since we are assuming that (7.2) holds;
but it motivates the following lemma, whose result is known as Bessel's
inequality.

Lemma 7.9 (Bessel's inequality) Let V be an inner product space and
$\{e_n\}_{n=1}^\infty$ an orthonormal sequence. Then for any $x \in V$ we have
\[ \sum_{n=1}^\infty |(x, e_n)|^2 \le \|x\|^2, \]
and in particular the left-hand side converges.

Proof Let us denote by $x_k$ the partial sum
\[ x_k = \sum_{j=1}^k (x, e_j)e_j. \]
Clearly
\[ \|x_k\|^2 = \sum_{j=1}^k |(x, e_j)|^2, \]
and so we have
\begin{align*}
\|x - x_k\|^2 &= (x - x_k, x - x_k) \\
&= \|x\|^2 - (x_k, x) - (x, x_k) + \|x_k\|^2 \\
&= \|x\|^2 - \sum_{j=1}^k (x, e_j)(e_j, x) - \sum_{j=1}^k \overline{(x, e_j)}(x, e_j) + \|x_k\|^2 \\
&= \|x\|^2 - \|x_k\|^2.
\end{align*}
It follows that
\[ \sum_{j=1}^k |(x, e_j)|^2 = \|x_k\|^2 = \|x\|^2 - \|x - x_k\|^2 \le \|x\|^2. \]
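A finite-dimensional illustration of Bessel's inequality: take only two of the three standard basis vectors of $\mathbb{R}^3$. For a vector with a component in the missing direction the inequality is strict, and the deficit is exactly $\|x - x_k\|^2$, as in the proof. (The vector chosen here is arbitrary.)

```python
import math

def ip(x, y):
    return sum(a * b for a, b in zip(x, y))

e1, e2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]   # orthonormal, but not a basis of R^3
x = [3.0, 4.0, 12.0]

bessel_sum = ip(x, e1)**2 + ip(x, e2)**2    # = 9 + 16 = 25
norm_sq = ip(x, x)                          # = 169
assert bessel_sum <= norm_sq                # Bessel's inequality

# the deficit is the squared norm of the unrepresented component
assert math.isclose(norm_sq - bessel_sum, 12.0**2)
```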

We can give an interesting corollary about the coefficients (x, e) when
E is an uncountable set.

Corollary 7.10 Let E be an uncountable orthonormal set in an inner product
space V. Then for each $x \in V$,
\[ \{e \in E : (x, e) \neq 0\} \]
is at most a countable set.

Proof For each fixed $m \in \mathbb{N}$, consider the set
\[ E_m = \{e \in E : |(x, e)| \ge 1/m\}. \]
Then this set can have no more than $m^2\|x\|^2$ elements. Indeed, if $E_m$ has
$N > m^2\|x\|^2$ elements, one can select N elements $\{e_1, \dots, e_N\}$ from $E_m$,
and then
\[ \sum_{j=1}^N |(x, e_j)|^2 \ge \frac{N}{m^2} > \|x\|^2. \]
But this contradicts Bessel's inequality. Thus each $E_m$ contains only a finite
number of elements, and hence
\[ \bigcup_{m=1}^\infty E_m = \{e \in E : (x, e) \neq 0\} \]
contains at most a countable number of elements.

We now use Bessel's inequality to give a simple criterion for the convergence
of a sum $\sum_{j=1}^\infty \alpha_j e_j$ when the $\{e_j\}$ are orthonormal.

Lemma 7.11 Let H be a Hilbert space and $\{e_n\}$ an orthonormal sequence
in H. The series $\sum_{n=1}^\infty \alpha_n e_n$ converges iff
\[ \sum_{n=1}^\infty |\alpha_n|^2 < +\infty, \]
and then
\[ \Big\|\sum_{n=1}^\infty \alpha_n e_n\Big\|^2 = \sum_{n=1}^\infty |\alpha_n|^2. \tag{7.3} \]
We could rephrase this as: $\sum_{n=1}^\infty \alpha_n e_n$ converges iff $\alpha = (\alpha_1, \alpha_2, \dots) \in \ell^2$.

Proof Suppose that $\sum_{j=1}^n \alpha_j e_j$ converges to x as $n \to \infty$; then
\[ \Big\|\sum_{j=1}^n \alpha_j e_j\Big\|^2 = \sum_{j=1}^n |\alpha_j|^2 \]
converges to $\|x\|^2$ as $n \to \infty$ (see Lemma 2.16).

Conversely, if $\sum_{j=1}^\infty |\alpha_j|^2 < +\infty$ then $\{\sum_{j=1}^n |\alpha_j|^2\}$ is a Cauchy sequence.
Setting $x_n = \sum_{j=1}^n \alpha_j e_j$ we have, taking wlog $m > n$,
\[ \|x_n - x_m\|^2 = \Big\|\sum_{j=n+1}^m \alpha_j e_j\Big\|^2 = \sum_{j=n+1}^m |\alpha_j|^2, \]
and so $\{x_n\}$ is a Cauchy sequence and therefore converges to some $x \in H$,
since H is complete. The equality in (7.3) follows as above.

By combining this lemma with Bessel's inequality we obtain:

Corollary 7.12 Let H be a Hilbert space and $\{e_n\}_{n=1}^\infty$ an orthonormal
sequence in H. Then for any $x \in H$ the series
\[ \sum_{n=1}^\infty (x, e_n)e_n \]
converges.

In fact one can use Corollary 7.10 to deduce that for any orthonormal set
E,
\[ \sum_{e \in E} (x, e)e \]
converges (we have already seen that this is independent of the order of
summation).

7.3 Orthonormal bases in Hilbert spaces

We now show that $E = \{e_n\}_{n=1}^\infty$ forms a basis for H iff
\[ \|x\|^2 = \sum_{j=1}^\infty |(x, e_j)|^2 \quad \text{for all } x \in H. \]
The same results hold for a general orthonormal set E, but we stick to the
countable case for simplicity of presentation.

Proposition 7.13 Let $E = \{e_j\}_{j=1}^\infty$ be an orthonormal set in a Hilbert
space H. Then the following are equivalent to the statement that E is an
orthonormal basis for H:
(a) $x = \sum_{n=1}^\infty (x, e_n)e_n$ for all $x \in H$;
(b) $\|x\|^2 = \sum_{n=1}^\infty |(x, e_n)|^2$ for all $x \in H$;
(c) $(x, e_n) = 0$ for all n implies that $x = 0$;
(d) the linear span of $E = \{e_n\}_{n=1}^\infty$ is dense in H, i.e. for any $x \in H$
and any $\epsilon > 0$ there exist $n \in \mathbb{N}$, $f_j \in E$ and $\alpha_j \in \mathbb{K}$ such that
\[ \Big\|x - \sum_{j=1}^n \alpha_j f_j\Big\| < \epsilon. \]

Proof If E is an orthonormal basis for H then we can write
\[ x = \sum_{j=1}^\infty \alpha_j e_j, \quad \text{i.e.} \quad x = \lim_{n \to \infty} \sum_{j=1}^n \alpha_j e_j. \]
Clearly if $k \le n$ we have
\[ \Big(\sum_{j=1}^n \alpha_j e_j,\ e_k\Big) = \alpha_k, \]
and using the continuity of the inner product in its limiting form we obtain $\alpha_k = (x, e_k)$,
and hence (a) holds. The same argument shows that if we assume (a) then
this expansion is unique and so E is a basis.
We now show that (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (a), and then that (a) $\Rightarrow$ (d) and
(d) $\Rightarrow$ (c).
(a) $\Rightarrow$ (b) is immediate from (7.3).
(b) $\Rightarrow$ (c) is immediate, since $\|x\| = 0$ implies that $x = 0$.
(c) $\Rightarrow$ (a): Take $x \in H$ and let
\[ y = x - \sum_{j=1}^\infty (x, e_j)e_j. \]
For each $m \in \mathbb{N}$ we have, using Lemma 6.9 (continuity of the inner product),
\[ (y, e_m) = (x, e_m) - \lim_{n \to \infty}\Big(\sum_{j=1}^n (x, e_j)e_j,\ e_m\Big) = 0, \]
since eventually $n \ge m$. It follows from (c) that $y = 0$, i.e. that
\[ x = \sum_{j=1}^\infty (x, e_j)e_j \]
as required.
(a) $\Rightarrow$ (d) is clear, since given any x and $\epsilon > 0$ there exists an n such that
\[ \Big\|\sum_{j=1}^n (x, e_j)e_j - x\Big\| < \epsilon. \]
(d) $\Rightarrow$ (c): Take $x \in H$ such that $(x, e_j) = 0$ for every j. Choose $x_n$
contained in the linear span of E such that $x_n \to x$. Then
\[ \|x\|^2 = (x, x) = (\lim_{n \to \infty} x_n, x) = \lim_{n \to \infty}(x_n, x) = 0, \]
since each $x_n$ is a (finite) linear combination of the $e_j$. So $x = 0$.

Example 7.14 The set $\{e_j\}_{j=1}^\infty$, where
\[ e_j = (0, 0, \dots, 1, \dots, 0, \dots) \]
(with the 1 in the jth position), is an orthonormal basis for $\ell^2$, since it is clear
that if $(x, e_j) = x_j = 0$ for all j then $x = 0$.

Example 7.15 The sine and cosine functions given in Example 7.4 form an
orthonormal basis for $L^2(-\pi, \pi)$.

Lemma 7.16 Any infinite-dimensional Hilbert space H contains a countably
infinite orthonormal sequence.

Proof Suppose that H contains an orthonormal set $E_k = \{e_j\}_{j=1}^k$. Then $E_k$
does not form a basis for H, since H is infinite-dimensional. It follows that
there exists a non-zero $u_k \in H$ such that
\[ (u_k, e_j) = 0 \quad \text{for all } j = 1, \dots, k \]
(for otherwise, by characterisation (c), $E_k$ would be a basis). Define $e_{k+1} = u_k/\|u_k\|$
to obtain an orthonormal set $E_{k+1} = \{e_1, \dots, e_{k+1}\}$.
The result follows by induction, starting with $e_1 = x/\|x\|$ for any non-zero
$x \in H$.

Theorem 7.17 A Hilbert space is finite-dimensional iff its unit ball is compact.

Proof The unit ball is closed and bounded. If H is finite-dimensional this
is equivalent to compactness by Corollary 3.12. If H is infinite-dimensional
then it contains a countable orthonormal set $\{e_j\}_{j=1}^\infty$, and for $i \neq j$
\[ \|e_i - e_j\|^2 = 2. \]
The $\{e_j\}$ form a sequence in the unit ball that can have no convergent
subsequence.
8
Closest points and approximation

8.1 Closest points in convex subsets

We start with a general result about closest points.

Definition 8.1 A subset A of a vector space V is said to be convex if for
every $x, y \in A$ and $\lambda \in [0,1]$, $\lambda x + (1-\lambda)y \in A$.

Lemma 8.2 Let A be a non-empty closed convex subset of a Hilbert space
H and let $x \in H$. Then there exists a unique $\hat{a} \in A$ such that
\[ \|x - \hat{a}\| = \inf\{\|x - a\| : a \in A\}. \]

Proof Set $d = \inf\{\|x - a\| : a \in A\}$ and find a sequence $a_n \in A$ such that
\[ \|x - a_n\|^2 \le d^2 + \frac{1}{n}. \tag{8.1} \]
We will show that $\{a_n\}$ is a Cauchy sequence. To this end, we use the
parallelogram law:
\[ \|(x - a_n) + (x - a_m)\|^2 + \|(x - a_n) - (x - a_m)\|^2 = 2\big[\|x - a_n\|^2 + \|x - a_m\|^2\big], \]
which gives
\[ \|2x - (a_n + a_m)\|^2 + \|a_n - a_m\|^2 \le 4d^2 + \frac{2}{m} + \frac{2}{n}, \]
or
\[ \|a_n - a_m\|^2 \le 4d^2 + \frac{2}{m} + \frac{2}{n} - 4\|x - \tfrac{1}{2}(a_n + a_m)\|^2. \]
Since A is convex, $\tfrac{1}{2}(a_n + a_m) \in A$, and so $\|x - \tfrac{1}{2}(a_n + a_m)\|^2 \ge d^2$, which gives
\[ \|a_n - a_m\|^2 \le \frac{2}{m} + \frac{2}{n}. \]
It follows that $\{a_n\}$ is Cauchy, and so $a_n \to \hat{a}$. Since A is closed, $\hat{a} \in A$.
To show that $\hat{a}$ is unique, suppose that $\|x - a'\| = d$ with $a' \neq \hat{a}$. Then
$\|x - \tfrac{1}{2}(a' + \hat{a})\| \ge d$ since A is convex, and so, using the parallelogram law
again,
\[ \|a' - \hat{a}\|^2 \le 4d^2 - 4d^2 = 0, \]
i.e. $a' = \hat{a}$ and $\hat{a}$ is unique.
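Lemma 8.2 can be illustrated concretely. For a closed box in $\mathbb{R}^2$ (a closed convex set) the closest point is obtained by clipping each coordinate; the sketch below (with an arbitrarily chosen box and point) checks that no other point of the box does better.

```python
import math

def dist(x, a):
    return math.sqrt(sum((p - q)**2 for p, q in zip(x, a)))

# A = [0,1] x [0,1], a closed convex subset of R^2
def project_to_box(x):
    return [min(max(t, 0.0), 1.0) for t in x]   # coordinate-wise clipping

x = [2.0, -0.5]
a_hat = project_to_box(x)
assert a_hat == [1.0, 0.0]

# a_hat beats a grid of other points of A (the minimiser is unique)
pts = [[i / 20, j / 20] for i in range(21) for j in range(21)]
d0 = dist(x, a_hat)
assert all(dist(x, a) >= d0 for a in pts)
```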

8.2 Linear subspaces and orthogonal complements

In an infinite-dimensional space, linear subspaces need not be closed. For
example, the space of all real sequences with only a finite number of
non-zero terms is a linear subspace of $\ell^2(\mathbb{R})$, but is not closed (consider the
sequence $x^{(n)} = (1, \tfrac{1}{2}, \tfrac{1}{3}, \dots, \tfrac{1}{n}, 0, \dots)$).

If X is a subset of H then the orthogonal complement of X in H is
\[ X^\perp = \{u \in H : (u, x) = 0 \text{ for all } x \in X\}. \]
Clearly if $Y \subseteq X$ then $X^\perp \subseteq Y^\perp$.

Lemma 8.3 If X is a subset of H then $X^\perp$ is a closed linear subspace of
H.

Proof It is clear that $X^\perp$ is a linear subspace of H: if $u, v \in X^\perp$ and $\alpha \in \mathbb{K}$
then
\[ (u + v, x) = (u, x) + (v, x) = 0 \quad \text{and} \quad (\alpha u, x) = \alpha(u, x) = 0 \]
for every $x \in X$. To show that $X^\perp$ is closed, suppose that $u_n \in X^\perp$ and
$u_n \to u$; then
\[ (u, x) = \lim_{n \to \infty}(u_n, x) = 0, \]
and so $X^\perp$ is closed.

Note that in general $X \subseteq (X^\perp)^\perp$; one has equality if X is a closed linear
subspace (see examples sheet).

Note that Proposition 7.13 shows that E is a basis for H iff $E^\perp = \{0\}$
(since this is just a rephrasing of (c): $(u, e_j) = 0$ for all j implies that $u = 0$).

Note also that if Span(E) denotes the linear span of E, i.e.
\[ \mathrm{Span}(E) = \Big\{u \in H : u = \sum_{j=1}^n \alpha_j e_j,\ n \in \mathbb{N},\ \alpha_j \in \mathbb{K},\ e_j \in E\Big\}, \]
then $E^\perp = (\mathrm{Span}(E))^\perp$. Since $E \subseteq \mathrm{Span}(E)$ one immediately has the
inclusion $(\mathrm{Span}(E))^\perp \subseteq E^\perp$, and if $y \in E^\perp$, then for any $u \in \mathrm{Span}(E)$,
i.e. for any
\[ u = \sum_{j=1}^n \alpha_j e_j, \qquad n \in \mathbb{N},\ \alpha_j \in \mathbb{K},\ e_j \in E, \]
one has
\[ (y, u) = \sum_{j=1}^n (y, \alpha_j e_j) = \sum_{j=1}^n \bar{\alpha}_j(y, e_j) = 0 \]
since $y \in E^\perp$. So $y \in (\mathrm{Span}(E))^\perp$, which shows that $E^\perp \subseteq (\mathrm{Span}(E))^\perp$ and
hence the claimed equality.

In fact one also has $E^\perp = (\mathrm{clin}(E))^\perp$. Recall that the closed linear span
of E, $\mathrm{clin}(E)$, is given by
\[ \mathrm{clin}(E) = \{u \in H : \text{for every } \epsilon > 0 \text{ there exists an } x \in \mathrm{Span}(E) \text{ with } \|u - x\| < \epsilon\}. \]
Since $\mathrm{Span}(E) \subseteq \mathrm{clin}(E)$, we have $(\mathrm{clin}(E))^\perp \subseteq (\mathrm{Span}(E))^\perp$. To show
equality, take $y \in (\mathrm{Span}(E))^\perp$ and $u \in \mathrm{clin}(E)$: we want to show that
$(y, u) = 0$, so that $y \in (\mathrm{clin}(E))^\perp$. Now, since $u \in \mathrm{clin}(E)$, there exists a
sequence $x_n \in \mathrm{Span}(E)$ such that $x_n \to u$. Therefore
\[ (y, u) = (y, \lim_{n \to \infty} x_n) = \lim_{n \to \infty}(y, x_n) = 0, \]
as required.

Proposition 8.4 If U is a closed linear subspace of a Hilbert space H then
any $x \in H$ can be written uniquely as
\[ x = u + v \quad \text{with} \quad u \in U,\ v \in U^\perp, \]
i.e. $H = U \oplus U^\perp$. The map $P_U : H \to U$ defined by
\[ P_U x = u \]
is called the orthogonal projection of x onto U, and satisfies
\[ P_U^2 x = P_U x \quad \text{and} \quad \|P_U x\| \le \|x\| \quad \text{for all } x \in H. \]

Proof If U is a closed linear subspace then U is closed and convex, so the
above result shows that given $x \in H$ there is a unique closest point $u \in U$.
It is now simple to show that $x - u \in U^\perp$ and that such a decomposition is
unique.
Indeed, consider $v = x - u$; the claim is that $v \in U^\perp$, i.e. that
\[ (v, y) = 0 \quad \text{for all } y \in U. \]
Consider $\|x - (u - ty)\| = \|v + ty\|$; then
\begin{align*}
\phi(t) = \|v + ty\|^2 &= (v + ty, v + ty) \\
&= \|v\|^2 + (ty, v) + (v, ty) + |t|^2\|y\|^2 \\
&= \|v\|^2 + t(y, v) + \bar{t}\overline{(y, v)} + |t|^2\|y\|^2 \\
&= \|v\|^2 + 2\,\mathrm{Re}\{t(y, v)\} + |t|^2\|y\|^2.
\end{align*}
We know from the construction of u that $\|v + ty\|$ is minimal when $t = 0$. If
t is real then this implies that $\phi'(0) = 2\,\mathrm{Re}\{(y, v)\} = 0$. If $t = \mathrm{i}s$, with
s real, then $\frac{\mathrm{d}}{\mathrm{d}s}\phi(\mathrm{i}s)\big|_{s=0} = -2\,\mathrm{Im}\{(y, v)\} = 0$. So $(y, v) = 0$ for any $y \in U$,
i.e. $v \in U^\perp$ as claimed.
Finally, the uniqueness follows easily: if $x = u_1 + v_1 = u_2 + v_2$, then
$u_1 - u_2 = v_2 - v_1$, and so
\[ \|v_1 - v_2\|^2 = (v_1 - v_2, v_1 - v_2) = (v_1 - v_2, u_2 - u_1) = 0, \]
since $u_2 - u_1 \in U$ and $v_1 - v_2 \in U^\perp$.
If $P_U x$ denotes the closest point to x in U then clearly $P_U^2 = P_U$, and it
follows from the definition of u that
\[ \|x\|^2 = \|u\|^2 + \|x - u\|^2, \]
thus ensuring that
\[ \|P_U x\| \le \|x\|, \]
i.e. the projection can only decrease the norm.

We will find an explicit expression for $P_U$ in Theorem 8.5.



8.3 Best approximations

We now investigate the best approximation of elements of H using the closed
linear span of an orthonormal set E. Of course, if E is a basis then there is
no approximation involved.

Theorem 8.5 Let $E = \{e_j\}_{j \in J}$ be an orthonormal set, where $J = \{1, 2, \dots, n\}$
or $\mathbb{N}$. Then for any $x \in H$, the closest point to x in $\mathrm{clin}(E)$ is
given by
\[ y = \sum_{j \in J} (x, e_j)e_j. \]
In particular the orthogonal projection of x onto $\mathrm{clin}(E)$ is given by
\[ P_{\mathrm{clin}(E)}x = \sum_{j \in J} (x, e_j)e_j. \]

Proof Consider $x - \sum_j \alpha_j e_j$. Then
\begin{align*}
\Big\|x - \sum_j \alpha_j e_j\Big\|^2 &= \|x\|^2 - \sum_j \bar{\alpha}_j(x, e_j) - \sum_j \alpha_j\overline{(x, e_j)} + \sum_j |\alpha_j|^2 \\
&= \|x\|^2 - \sum_j |(x, e_j)|^2 + \sum_j \Big[|(x, e_j)|^2 - \bar{\alpha}_j(x, e_j) - \alpha_j\overline{(x, e_j)} + |\alpha_j|^2\Big] \\
&= \|x\|^2 - \sum_j |(x, e_j)|^2 + \sum_j |(x, e_j) - \alpha_j|^2,
\end{align*}
and so the minimum occurs when $\alpha_j = (x, e_j)$ for all j.

Example 8.6 The best approximation of an element $x \in \ell^2$ in terms of
$\{e_j\}_{j=1}^n$ (elements of the standard basis) is simply
\[ \sum_{j=1}^n (x, e_j)e_j = (x_1, x_2, \dots, x_n, 0, 0, \dots). \]

Example 8.7 If $E = \{e_j\}_{j=1}^\infty$ is an orthonormal basis in H then the best
approximation of an element of H in terms of $\{e_j\}_{j=1}^n$ is just given by the
partial sum
\[ \sum_{j=1}^n (x, e_j)e_j. \]
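The formula of Theorem 8.5 is easy to test in $\mathbb{R}^3$ (the orthonormal pair and the vector below are arbitrary choices): projecting onto the span of two orthonormal vectors, the residual is orthogonal to each of them, and Pythagoras holds.

```python
import math

def ip(x, y):
    return sum(a * b for a, b in zip(x, y))

# an orthonormal pair in R^3
e1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
e2 = [0.0, 0.0, 1.0]

x = [1.0, 3.0, 5.0]
coeffs = [ip(x, e) for e in (e1, e2)]           # the (x, e_j)
proj = [coeffs[0] * e1[k] + coeffs[1] * e2[k] for k in range(3)]

residual = [x[k] - proj[k] for k in range(3)]
assert abs(ip(residual, e1)) < 1e-12 and abs(ip(residual, e2)) < 1e-12

# Pythagoras: ||x||^2 = ||Px||^2 + ||x - Px||^2
assert math.isclose(ip(x, x), ip(proj, proj) + ip(residual, residual))
```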

Now suppose that E is a finite or countable set that is not orthonormal.
We can still find the best approximation to any $u \in H$ that lies in $\mathrm{clin}(E)$
by using the Gram-Schmidt orthonormalisation process:

Proposition 8.8 (Gram-Schmidt orthonormalisation) Given a set
$E = \{e_j\}_{j \in J}$ with $J = \mathbb{N}$ or $J = \{1, \dots, n\}$, there exists an orthonormal
set $\tilde{E} = \{\tilde{e}_j\}_{j \in J}$ such that
\[ \mathrm{Span}(e_1, \dots, e_k) = \mathrm{Span}(\tilde{e}_1, \dots, \tilde{e}_k) \]
for every $k \in \mathbb{N}$.

Proof First omit all elements of $\{e_n\}$ which can be written as a linear
combination of the preceding ones.
Now suppose that we already have an orthonormal set $(\tilde{e}_1, \dots, \tilde{e}_n)$ whose
span is the same as that of $(e_1, \dots, e_n)$. Then we can define $\tilde{e}_{n+1}$ by setting
\[ \hat{e}_{n+1} = e_{n+1} - \sum_{i=1}^n (e_{n+1}, \tilde{e}_i)\tilde{e}_i \quad \text{and} \quad \tilde{e}_{n+1} = \frac{\hat{e}_{n+1}}{\|\hat{e}_{n+1}\|}. \]
The span of $(\tilde{e}_1, \dots, \tilde{e}_{n+1})$ is clearly the same as the span of $(\tilde{e}_1, \dots, \tilde{e}_n, e_{n+1})$,
which is the same as the span of $(e_1, \dots, e_{n+1})$ using the induction
hypothesis. Clearly $\|\tilde{e}_{n+1}\| = 1$, and for $m \le n$ we have
\[ (\tilde{e}_{n+1}, \tilde{e}_m) = \frac{1}{\|\hat{e}_{n+1}\|}\Big((e_{n+1}, \tilde{e}_m) - \sum_{i=1}^n (e_{n+1}, \tilde{e}_i)(\tilde{e}_i, \tilde{e}_m)\Big) = 0 \]
since $(\tilde{e}_1, \dots, \tilde{e}_n)$ are orthonormal. Setting $\tilde{e}_1 = e_1/\|e_1\|$ starts the induction.
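The inductive construction in the proof translates directly into code; this sketch orthonormalises a list of vectors in $\mathbb{R}^n$, omitting any vector that is (numerically) linearly dependent on its predecessors, exactly as in the first step of the proof.

```python
import math

def ip(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list spanning the same space as `vectors`."""
    basis = []
    for v in vectors:
        # subtract the components along the vectors already constructed
        w = list(v)
        for e in basis:
            c = ip(v, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = math.sqrt(ip(w, w))
        if n > tol:                      # omit linearly dependent elements
            basis.append([wi / n for wi in w])
    return basis

E = gram_schmidt([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [1.0, 0.0, 1.0]])
assert len(E) == 2                       # the second vector was dependent
for i, ei in enumerate(E):
    for j, ej in enumerate(E):
        assert abs(ip(ei, ej) - (1.0 if i == j else 0.0)) < 1e-12
```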

Example 8.9 Consider approximation of functions in $L^2(-1,1)$ by polynomials
of degree up to n. We can start with the set $\{1, x, x^2, \dots, x^n\}$,
and then use the Gram-Schmidt process to construct polynomials that are
orthonormal w.r.t. the $L^2(-1,1)$ inner product.

We begin with $\tilde{e}_1 = 1/\sqrt{2}$, and then consider
\[ \hat{e}_2 = x - \Big(x, \frac{1}{\sqrt{2}}\Big)\frac{1}{\sqrt{2}} = x - \frac{1}{2}\int_{-1}^1 t\,dt = x, \]
so
\[ \|\hat{e}_2\|^2 = \int_{-1}^1 t^2\,dt = \frac{2}{3}; \qquad \tilde{e}_2 = \sqrt{\frac{3}{2}}\,x. \]
Then
\[ \hat{e}_3 = x^2 - \Big(x^2, \sqrt{\frac{3}{2}}\,x\Big)\sqrt{\frac{3}{2}}\,x - \Big(x^2, \frac{1}{\sqrt{2}}\Big)\frac{1}{\sqrt{2}} = x^2 - \frac{3x}{2}\int_{-1}^1 t^3\,dt - \frac{1}{2}\int_{-1}^1 t^2\,dt = x^2 - \frac{1}{3}, \]
so
\[ \|\hat{e}_3\|^2 = \int_{-1}^1 \Big(t^2 - \frac{1}{3}\Big)^2\,dt = \Big[\frac{t^5}{5} - \frac{2t^3}{9} + \frac{t}{9}\Big]_{-1}^1 = \frac{8}{45}, \]
which gives
\[ \tilde{e}_3 = \sqrt{\frac{5}{8}}\,(3x^2 - 1). \]

Exercise 8.10 Show that $\tilde{e}_4 = \sqrt{\frac{7}{8}}\,(5x^3 - 3x)$, and check that this is
orthogonal to $\tilde{e}_1$, $\tilde{e}_2$, and $\tilde{e}_3$.

Using these orthonormal functions we can find the best approximation of
any function $f \in L^2(-1,1)$ by a degree three polynomial:
\begin{align*}
&\frac{7}{8}\Big(\int_{-1}^1 f(t)(5t^3 - 3t)\,dt\Big)(5x^3 - 3x) + \frac{5}{8}\Big(\int_{-1}^1 f(t)(3t^2 - 1)\,dt\Big)(3x^2 - 1) \\
&\qquad + \frac{3}{2}\Big(\int_{-1}^1 f(t)t\,dt\Big)x + \frac{1}{2}\int_{-1}^1 f(t)\,dt.
\end{align*}

Example 8.11 The best approximation of $f(x) = |x|$ by a third degree
polynomial is
\[ f_3(x) = \frac{5}{4}\Big(\int_0^1 (3t^3 - t)\,dt\Big)(3x^2 - 1) + \int_0^1 t\,dt = \frac{5}{16}(3x^2 - 1) + \frac{1}{2} = \frac{15x^2 + 3}{16}. \]
We have (after some tedious integration)
\[ \|f - f_3\|^2 = \frac{2}{16^2}\int_0^1 (15x^2 - 16x + 3)^2\,dx = \frac{1}{96}. \]
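These computations can be checked numerically; the quadrature below confirms that $f_3(x) = (15x^2 + 3)/16$ has squared $L^2(-1,1)$ error $1/96$ against $|x|$, and that perturbing the constant term only makes the error worse, as the projection characterisation predicts.

```python
# numeric check of the best cubic approximation to |x| on (-1, 1)
N = 20_000
h = 2.0 / N
xs = [-1.0 + (k + 0.5) * h for k in range(N)]   # midpoint rule

f  = lambda x: abs(x)
f3 = lambda x: (15 * x * x + 3) / 16

err_sq = h * sum((f(x) - f3(x))**2 for x in xs)
assert abs(err_sq - 1 / 96) < 1e-6

# shifting f3 by any constant increases the L^2 error (optimality)
for delta in (0.01, -0.01):
    shifted = lambda x: (15 * x * x + 3) / 16 + delta
    assert h * sum((f(x) - shifted(x))**2 for x in xs) > err_sq
```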

Of course, the meaning of the 'best approximation' is that this choice
minimises the $L^2$ norm of the difference. It is not the best approximation
in terms of the supremum norm: at $x = 0$ the error is $|f(0) - f_3(0)| = 3/16$, while
\[ \sup_{x \in [0,1]} \Big|x - \Big(x^2 + \frac{1}{8}\Big)\Big| = \frac{1}{8}. \]

Exercise 8.12 Find the best approximation (w.r.t. the $L^2(-1,1)$ norm) of $\sin x$
by a third degree polynomial.

Exercise 8.13 Find the first four polynomials that are orthogonal in $L^2(0,1)$
with respect to the usual $L^2$ inner product.
9
Separable Hilbert spaces and $\ell^2$

We start with a definition.

Definition 9.1 A normed space is separable if it contains a countable dense
subset.

This is an approximation property: one can find a countable set $\{x_n\}_{n=1}^\infty$
such that given any $u \in H$ and $\epsilon > 0$, there exists an $x_j$ such that
\[ \|x_j - u\| < \epsilon. \]

Example 9.2 $\mathbb{R}$ is separable, since $\mathbb{Q}$ is a countable dense subset. So is $\mathbb{R}^n$,
since $\mathbb{Q}^n$ is countable and dense. $\mathbb{C}$ is separable, since the set of complex
numbers of the form $q_1 + \mathrm{i}q_2$ with $q_1, q_2 \in \mathbb{Q}$ is countable and dense.

Example 9.3 $\ell^2$ is separable, since sequences of the form
\[ x = (x_1, \dots, x_n, 0, 0, 0, \dots) \]
with $x_1, \dots, x_n \in \mathbb{Q}$ are dense.

We now show that $C^0([0,1])$ is separable, by proving the Weierstrass
approximation theorem: every continuous function can be approximated
arbitrarily closely (in the supremum norm) by a polynomial.

Theorem 9.4 Let $f(x)$ be a real-valued continuous function on $[0,1]$. Then
the sequence of polynomials
\[ P_n(x) = \sum_{p=0}^n f(p/n)\binom{n}{p}x^p(1-x)^{n-p} \]
converges uniformly to $f(x)$ on $[0,1]$.

Proof Start with the identity
\[ (x+y)^n = \sum_{p=0}^n \binom{n}{p}x^p y^{n-p}. \]
Differentiate with respect to x and multiply by x to give
\[ nx(x+y)^{n-1} = \sum_{p=0}^n p\binom{n}{p}x^p y^{n-p}; \]
differentiate twice with respect to x and multiply by $x^2$ to give
\[ n(n-1)x^2(x+y)^{n-2} = \sum_{p=0}^n p(p-1)\binom{n}{p}x^p y^{n-p}. \]
It follows that if we write
\[ r_p(x) = \binom{n}{p}x^p(1-x)^{n-p} \]
we have
\[ \sum_{p=0}^n r_p(x) = 1, \qquad \sum_{p=0}^n p\,r_p(x) = nx, \qquad \text{and} \qquad \sum_{p=0}^n p(p-1)r_p(x) = n(n-1)x^2. \]
Therefore
\begin{align*}
\sum_{p=0}^n (p - nx)^2 r_p(x) &= n^2x^2\sum_{p=0}^n r_p(x) - 2nx\sum_{p=0}^n p\,r_p(x) + \sum_{p=0}^n p^2 r_p(x) \\
&= n^2x^2 - 2nx\cdot nx + \big(nx + n(n-1)x^2\big) \\
&= nx(1-x).
\end{align*}

Since f is continuous on the closed bounded interval it is bounded: $|f(x)| \le M$
for some $M > 0$. It also follows that f is uniformly continuous on $[0,1]$,
so for any $\epsilon > 0$ there exists a $\delta > 0$ such that
\[ |x - y| < \delta \quad \Rightarrow \quad |f(x) - f(y)| < \epsilon. \]
Since $\sum_{p=0}^n r_p(x) = 1$ we have
\begin{align*}
\Big|f(x) - \sum_{p=0}^n f(p/n)r_p(x)\Big| &= \Big|\sum_{p=0}^n \big(f(x) - f(p/n)\big)r_p(x)\Big| \\
&\le \Big|\sum_{|(p/n)-x| \le \delta} \big(f(x) - f(p/n)\big)r_p(x)\Big| + \Big|\sum_{|(p/n)-x| > \delta} \big(f(x) - f(p/n)\big)r_p(x)\Big|.
\end{align*}
For the first term on the right-hand side we have
\[ \Big|\sum_{|(p/n)-x| \le \delta} \big(f(x) - f(p/n)\big)r_p(x)\Big| \le \epsilon\sum_{p=0}^n r_p(x) = \epsilon, \]
and for the second term on the right-hand side
\begin{align*}
\Big|\sum_{|(p/n)-x| > \delta} \big(f(x) - f(p/n)\big)r_p(x)\Big| &\le 2M\sum_{|(p/n)-x| > \delta} r_p(x) \\
&\le \frac{2M}{n^2\delta^2}\sum_{p=0}^n (p - nx)^2 r_p(x) \\
&= \frac{2Mx(1-x)}{n\delta^2} \le \frac{2M}{n\delta^2},
\end{align*}
which tends to zero as $n \to \infty$.
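The Bernstein polynomials $P_n$ of Theorem 9.4 are simple to evaluate, and the sketch below checks the uniform convergence numerically for one continuous (but not smooth) function; the particular test function and grid are illustrative choices.

```python
from math import comb

def bernstein(f, n, x):
    """P_n(x) = sum_p f(p/n) C(n,p) x^p (1-x)^(n-p)."""
    return sum(f(p / n) * comb(n, p) * x**p * (1 - x)**(n - p)
               for p in range(n + 1))

f = lambda x: abs(x - 0.5)            # continuous but not smooth
grid = [k / 100 for k in range(101)]

def sup_err(n):                       # sup-norm error sampled on the grid
    return max(abs(f(x) - bernstein(f, n, x)) for x in grid)

errs = [sup_err(n) for n in (10, 40, 160)]
assert errs[0] > errs[1] > errs[2]    # error decreases as n grows
assert errs[2] < 0.05
```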

One could also state this as: the set of polynomials is dense in $C^0([0,1])$
equipped with the supremum norm.

Proposition 9.5 $C^0([0,1])$ is separable.

Proof Given any $f \in C^0([0,1])$, it can be approximated to within $\epsilon/2$ by some
polynomial, i.e.
\[ \Big\|f - \sum_{n=0}^N a_n x^n\Big\|_\infty < \epsilon/2. \]
While the set of all polynomials is not countable, the set of all polynomials
with rational coefficients is. Since
\[ \Big\|\sum_{n=0}^N a_n x^n - \sum_{n=0}^N b_n x^n\Big\|_\infty \le \sum_{n=0}^N |a_n - b_n|, \]
one can choose $b_n \in \mathbb{Q}$ such that $|a_n - b_n| < \epsilon/2(N+1)$, and then
\[ \Big\|f - \sum_{n=0}^N b_n x^n\Big\|_\infty < \epsilon. \]

If we now use the fact that $C^0([0,1])$ is dense in $L^2(0,1)$, it follows that:

Proposition 9.6 $L^2(0,1)$ is separable.

Proof Take $f \in L^2(0,1)$. Given $\epsilon > 0$ there exists a $g \in C^0([0,1])$ such that
$\|f - g\|_{L^2} < \epsilon/2$. We know from above that there exists a polynomial h with
rational coefficients such that
\[ \|g - h\|_\infty < \epsilon/2. \]
Since
\[ \|g - h\|_{L^2}^2 = \int_0^1 |g(x) - h(x)|^2\,dx \le \int_0^1 \|g - h\|_\infty^2\,dx = \|g - h\|_\infty^2, \]
it follows that
\[ \|f - h\|_{L^2} \le \|f - g\|_{L^2} + \|g - h\|_{L^2} < \epsilon. \]

The property of separability seems very strong, but it is a simple
consequence of the existence of a countable orthonormal basis, as we now show.

Proposition 9.7 An infinite-dimensional Hilbert space is separable iff it
has a countable orthonormal basis.

Note that this shows immediately that the unit ball in a separable Hilbert
space is not compact.

Proof If a Hilbert space has a countable basis then we can construct a
countable dense set by taking finite combinations of the basis elements with
rational coefficients, and so it is separable.
If H is separable, let $E = \{x_n\}$ be a countable dense subset. In particular,
the closed linear span of E is the whole of H. The Gram-Schmidt process
now provides a countable orthonormal set whose closed linear span is all of
H, i.e. a countable orthonormal basis.

Note that there are Hilbert spaces that are not separable. For example, if
$\Gamma$ is uncountable then the space $\ell^2(\Gamma)$, consisting of all functions $f : \Gamma \to \mathbb{R}$
such that
\[ \sum_{\gamma \in \Gamma} |f(\gamma)|^2 < \infty, \]
is a Hilbert space but is not separable.

By using these basis elements we can construct an isomorphism between
any separable Hilbert space and $\ell^2$, so that in some sense $\ell^2$ is the only
separable infinite-dimensional Hilbert space:

Theorem 9.8 Any infinite-dimensional separable Hilbert space H is isometric
to $\ell^2(\mathbb{K})$.

Proof Since H is separable it has a countable orthonormal basis $\{e_j\}$. Define
$\Phi : H \to \ell^2$ by the map
\[ u \mapsto \big((u, e_1), (u, e_2), \dots, (u, e_n), \dots\big); \]
clearly the inverse map is given by
\[ \alpha \mapsto \sum_{j=1}^\infty \alpha_j e_j \quad \text{where} \quad \alpha = (\alpha_1, \alpha_2, \dots, \alpha_n, \dots). \]
By the result of Lemma 7.11, $u \in H \Rightarrow \Phi(u) \in \ell^2$ and $\alpha \in \ell^2 \Rightarrow \Phi^{-1}(\alpha) \in H$,
while the characterisation of a basis in Proposition 7.13 shows that $\|u\|_H = \|\Phi(u)\|_{\ell^2}$.
10
Linear maps between Banach spaces

We now consider linear maps between Banach spaces.

10.1 Bounded linear maps

Definition 10.1 If U and V are vector spaces over $\mathbb{K}$ then an operator A
from U into V is linear if
\[ A(\alpha x + \beta y) = \alpha Ax + \beta Ay \quad \text{for all } \alpha, \beta \in \mathbb{K},\ x, y \in U. \]
The collection $L(U, V)$ of all linear operators from U into V is a vector
space.

Definition 10.2 A linear operator A from a normed space $(X, \|\cdot\|_X)$ into
another normed space $(Y, \|\cdot\|_Y)$ is bounded if there exists a constant M
such that
\[ \|Ax\|_Y \le M\|x\|_X \quad \text{for all } x \in X. \tag{10.1} \]

Linear operators on infinite-dimensional spaces need not be bounded.

Lemma 10.3 A linear operator $T : X \to Y$ is continuous iff it is bounded.

Proof Suppose that T is bounded; then for some $M > 0$
\[ \|Tx_n - Tx\|_Y = \|T(x_n - x)\|_Y \le M\|x_n - x\|_X, \]
and so T is continuous. Now suppose that T is continuous; then in particular
it is continuous at zero, and so, taking $\epsilon = 1$ in the definition of continuity,
there exists a $\delta > 0$ such that
\[ \|Tx\| \le 1 \quad \text{for all } \|x\| \le \delta. \]
It follows that
\[ \|Tz\| = \Big\|\frac{\|z\|}{\delta}\,T\Big(\frac{\delta z}{\|z\|}\Big)\Big\| = \frac{\|z\|}{\delta}\Big\|T\Big(\frac{\delta z}{\|z\|}\Big)\Big\| \le \frac{1}{\delta}\|z\|, \]
and so T is bounded.

The space of all bounded linear operators from X into Y is denoted by
$B(X, Y)$.

Definition 10.4 The operator norm of an operator A (from X into Y) is
the smallest value of M such that (10.1) holds,
\[ \|A\|_{B(X,Y)} = \inf\{M : (10.1) \text{ holds}\}. \tag{10.2} \]
Note that (and this is the key point) it follows that
\[ \|Ax\|_Y \le \|A\|_{B(X,Y)}\|x\|_X \quad \text{for all } x \in X \]
(since for each $x \in X$, $\|Ax\|_Y \le M\|x\|_X$ for every $M > \|A\|_{B(X,Y)}$).

Lemma 10.5 The following is an equivalent definition of the operator norm:
\[ \|A\|_{B(X,Y)} = \sup_{\|x\|_X = 1} \|Ax\|_Y. \tag{10.3} \]

Proof Let us denote by $\|A\|_1$ the value defined in (10.2), and by $\|A\|_2$ the
value defined in (10.3). Then given $x \neq 0$ we have
\[ \Big\|A\frac{x}{\|x\|_X}\Big\|_Y \le \|A\|_2, \quad \text{i.e.} \quad \|Ax\|_Y \le \|A\|_2\|x\|_X, \]
and so $\|A\|_1 \le \|A\|_2$. It is also clear that if $\|x\|_X = 1$ then
\[ \|Ax\|_Y \le \|A\|_1\|x\|_X = \|A\|_1, \]
and so $\|A\|_2 \le \|A\|_1$. It follows that $\|A\|_1 = \|A\|_2$.

Exercise 10.6 Show that also
\[ \|A\|_{B(X,Y)} = \sup_{\|x\|_X \le 1} \|Ax\|_Y \]
and
\[ \|A\|_{B(X,Y)} = \sup_{x \neq 0} \frac{\|Ax\|_Y}{\|x\|_X}. \tag{10.4} \]

When there is no room for confusion we will omit the $B(X,Y)$ subscript
on the norm, sometimes adding the subscript 'op' (for 'operator') to make
things clearer ($\|\cdot\|_{\mathrm{op}}$).

If $T : X \to Y$ then in order to find $\|T\|_{\mathrm{op}}$ one can try the following: first
show that
\[ \|Tx\|_Y \le M\|x\|_X \tag{10.5} \]
for some $M > 0$, i.e. show that T is bounded. It is then clear that $\|T\|_{\mathrm{op}} \le M$
(since $\|T\|_{\mathrm{op}}$ is the infimum of all M such that (10.5) holds). Then, in order
to show that in fact $\|T\|_{\mathrm{op}} = M$, find an example of a particular $z \in X$ such
that
\[ \|Tz\|_Y = M\|z\|_X. \]
This shows from the definition in (10.4) that $\|T\|_{\mathrm{op}} \ge M$ and hence that in
fact $\|T\|_{\mathrm{op}} = M$.

Example 10.7 Consider the right and left shift operators $s_r$ and $s_l$ on $\ell^2$,
given by
\[ s_r(x) = (0, x_1, x_2, \dots) \quad \text{and} \quad s_l(x) = (x_2, x_3, x_4, \dots). \]
Both operators are clearly linear. We have
\[ \|s_r(x)\|_{\ell^2}^2 = \sum_{i=1}^\infty |x_i|^2 = \|x\|_{\ell^2}^2, \]
so that $\|s_r\|_{\mathrm{op}} = 1$, and
\[ \|s_l(x)\|_{\ell^2}^2 = \sum_{i=2}^\infty |x_i|^2 \le \|x\|_{\ell^2}^2, \]
so that $\|s_l\|_{\mathrm{op}} \le 1$. However, if we choose an x with
\[ x = (0, x_2, x_3, \dots) \]
then we have
\[ \|s_l(x)\|_{\ell^2}^2 = \sum_{j=2}^\infty |x_j|^2 = \|x\|_{\ell^2}^2, \]
and so we must have $\|s_l\|_{\mathrm{op}} = 1$.

In slightly more involved examples one might have to be a little more
crafty; for example, given the bound $\|Tx\|_Y \le M\|x\|_X$, find a sequence
$z_n \in X$ such that
\[ \frac{\|Tz_n\|_Y}{\|z_n\|_X} \to M \]
as $n \to \infty$, which again shows using (10.4) that $\|T\|_{\mathrm{op}} \ge M$ and hence that
$\|T\|_{\mathrm{op}} = M$.
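The shift computations can be mirrored on finitely supported sequences, where the norms are exact (the function names here are ad hoc): the right shift is an isometry, and the left shift attains norm 1 on any sequence whose first entry is zero.

```python
import math

def l2_norm(x):
    return math.sqrt(sum(t * t for t in x))

def shift_right(x):          # (0, x1, x2, ...)
    return [0.0] + list(x)

def shift_left(x):           # (x2, x3, ...)
    return list(x[1:])

x = [3.0, 1.0, 4.0, 1.0, 5.0]
assert math.isclose(l2_norm(shift_right(x)), l2_norm(x))   # isometry
assert l2_norm(shift_left(x)) <= l2_norm(x)                # norm at most 1

# the left shift attains norm 1 on sequences with first entry zero
y = [0.0] + x
assert math.isclose(l2_norm(shift_left(y)), l2_norm(y))
```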

Example 10.8 Consider the space L²(a, b) with −∞ < a < b < +∞ and the multiplication operator T from L²(a, b) into itself given by

(Tx)(t) = f(t)x(t), t ∈ [a, b],

where f ∈ C⁰([a, b]). Then clearly T is linear and

‖Tx‖² = ∫_a^b |f(t)x(t)|² dt = ∫_a^b |f(t)|² |x(t)|² dt ≤ (max_{a≤t≤b} |f(t)|²) ∫_a^b |x(t)|² dt,

and so

‖Tx‖_{L²} ≤ ‖f‖_∞ ‖x‖_{L²},

i.e. ‖T‖_op ≤ ‖f‖_∞.

Now let s be a point at which |f| attains its maximum. Assume for simplicity that s ∈ (a, b), and for each δ > 0 consider

x_δ(t) = 1 if |t − s| < δ and x_δ(t) = 0 otherwise;

then

‖Tx_δ‖² / ‖x_δ‖² = (1/2δ) ∫_{s−δ}^{s+δ} |f(t)|² dt → |f(s)|² as δ → 0

since f is continuous. Therefore in fact

‖T‖_op = ‖f‖_∞.

If s = a then we can replace |t − s| < δ in the definition of x_δ by a ≤ t < a + δ, and if s = b we replace it by b − δ < t ≤ b; the rest of the argument is identical.
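As a sanity check on the argument above, the following sketch approximates ‖Tx_δ‖/‖x_δ‖ by Riemann sums for the sample choice f(t) = 2 + sin(πt) on [0, 1] (so ‖f‖_∞ = 3, attained at s = 1/2), and watches the ratio climb towards ‖f‖_∞ as δ → 0:

```python
import math

a, b = 0.0, 1.0
f = lambda t: 2.0 + math.sin(math.pi * t)    # continuous, max |f| = 3 at s = 1/2

def L2_norm_sq(g, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of |g|^2 over [lo, hi]."""
    h = (hi - lo) / n
    return sum(abs(g(lo + (k + 0.5) * h)) ** 2 for k in range(n)) * h

f_inf = 3.0                                   # ||f||_inf, known analytically here
s = 0.5                                       # point where |f| attains its maximum

ratios = []
for delta in [0.2, 0.1, 0.05]:
    x_delta = lambda t: 1.0 if abs(t - s) < delta else 0.0
    Tx = lambda t: f(t) * x_delta(t)
    ratios.append(math.sqrt(L2_norm_sq(Tx, a, b) / L2_norm_sq(x_delta, a, b)))

# each ratio is at most ||f||_inf, and the ratios increase towards it as delta -> 0
assert all(r < f_inf for r in ratios)
assert ratios[0] < ratios[1] < ratios[2]
assert f_inf - ratios[-1] < 0.01
```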

We now give a very important particular example.

Example 10.9 Consider the map from L²(a, b) into itself given by the integral

(Tx)(t) = ∫_a^b K(t, s)x(s) ds for all t ∈ [a, b],

where

∫_a^b ∫_a^b |K(t, s)|² ds dt < +∞.

Then T is clearly linear, and

‖Tx‖² = ∫_a^b |∫_a^b K(t, s)x(s) ds|² dt
      ≤ ∫_a^b (∫_a^b |K(t, s)|² ds)(∫_a^b |x(s)|² ds) dt   (by Cauchy–Schwarz)
      = (∫_a^b ∫_a^b |K(t, s)|² ds dt) ‖x‖²,

and so

‖T‖_op² ≤ ∫_a^b ∫_a^b |K(t, s)|² ds dt.

Note that this upper bound on the operator norm can be strict; see the examples sheets.
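The Hilbert–Schmidt bound above can be illustrated by discretising both the operator and the norms on a grid. Here K(t, s) = e^{−|t−s|} is an arbitrary sample kernel (an assumption for illustration only); for the discretised analogues the inequality ‖Tx‖ ≤ (∫∫|K|²)^{1/2}‖x‖ holds exactly, by the discrete Cauchy–Schwarz inequality:

```python
import math

a, b, n = 0.0, 1.0, 200
h = (b - a) / n
t = [a + (k + 0.5) * h for k in range(n)]       # midpoint grid

K = lambda s, u: math.exp(-abs(s - u))          # sample square-integrable kernel

def apply_T(x):
    """Discretised operator: (Tx)(t_i) ~ sum_j K(t_i, t_j) x_j h."""
    return [sum(K(ti, tj) * xj for tj, xj in zip(t, x)) * h for ti in t]

def l2(x):
    """Grid approximation of the L2 norm."""
    return math.sqrt(sum(v * v for v in x) * h)

# Hilbert-Schmidt bound: ||T|| <= (double integral of |K|^2)^{1/2}
hs = math.sqrt(sum(K(ti, tj) ** 2 for ti in t for tj in t) * h * h)

for x in ([1.0] * n,
          [math.sin(3 * ti) for ti in t],
          [ti ** 2 - 0.5 for ti in t]):
    assert l2(apply_T(x)) <= hs * l2(x) + 1e-9
```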

The space B(X, Y) is a Banach space whenever Y is a Banach space. Remarkably, this does not depend on whether the space X is complete or not.

Theorem 10.10 Let X be a normed space and Y a Banach space. Then


B(X, Y ) is a Banach space.

Proof Let {A_n} be a Cauchy sequence in B(X, Y). We need to show that A_n → A for some A ∈ B(X, Y). Since {A_n} is Cauchy, given ε > 0 there exists an N such that

‖A_n − A_m‖_op ≤ ε for all n, m ≥ N. (10.6)

We now show that for every fixed x ∈ X the sequence {A_n x} is Cauchy in Y. This follows since

‖A_n x − A_m x‖_Y = ‖(A_n − A_m)x‖_Y ≤ ‖A_n − A_m‖_op ‖x‖_X, (10.7)

and {A_n} is Cauchy in B(X, Y). Since Y is complete, it follows that

A_n x → y,

where y depends on x. We can therefore define a mapping A : X → Y by Ax = y. We still need to show, however, that A ∈ B(X, Y) and that A_n → A in the operator norm.
First, A is linear since

A(x + y) = lim_{n→∞} A_n(x + y) = lim_{n→∞} A_n x + lim_{n→∞} A_n y = Ax + Ay.

To show that A is bounded take n, m ≥ N (from (10.6)) in (10.7), and let m → ∞. Since A_m x → Ax this shows that

‖A_n x − Ax‖_Y ≤ ε‖x‖_X. (10.8)

Since (10.8) holds for every x it follows that

‖A_n − A‖_op ≤ ε, (10.9)

and so A_n − A ∈ B(X, Y). Since B(X, Y) is a vector space and A_n ∈ B(X, Y), it follows that A ∈ B(X, Y), and (10.9) shows that A_n → A in B(X, Y).

10.2 Kernel and range

Given a linear operator T : X → Y, we define its kernel

Ker T = {x ∈ X : Tx = 0}

and its range

Range T = {y ∈ Y : y = Tx for some x ∈ X}.



Corollary 10.11 If T ∈ B(X, Y) then Ker T is a closed linear subspace of X.

Proof It is easy to show that Ker(T) is a linear subspace, since if x, y ∈ Ker(T) and α, β are scalars then

T(αx + βy) = αTx + βTy = 0.

Furthermore, if x_n → x and Tx_n = 0 then since T is continuous Tx = lim_{n→∞} Tx_n = 0, so Ker(T) is closed.

Note that the range is not necessarily closed; see the examples sheets.


11
The Riesz representation theorem and the adjoint
operator

If U is a normed space over 𝕂 then a linear map from U into 𝕂 is called a linear functional on U.

Since 𝕂 = ℝ or ℂ is complete, it follows from Theorem 10.10 that the collection of all bounded linear functionals on U, B(U, 𝕂), is a Banach space. This space is termed the dual space of U, and is denoted by U*.

For any f ∈ U*,

‖f‖_{U*} = sup_{‖u‖=1} |f(u)|.

Example 11.1 Take U = C⁰([a, b]), and consider δ_x defined for x ∈ [a, b] by

δ_x(f) = f(x) for all f ∈ U.

Then

|δ_x(f)| = |f(x)| ≤ ‖f‖_∞,

so that δ_x ∈ U* with ‖δ_x‖ ≤ 1. Choosing a function f ∈ C⁰([a, b]) such that |f(x)| = ‖f‖_∞ shows that in fact ‖δ_x‖ = 1.

(Note: this shows that, at least for this particular choice of U, knowledge of T(f) for all T ∈ U* determines f ∈ U. This result is in fact true in general.)

Example 11.2 Let U be the real vector space L²(a, b), and take φ ∈ C⁰([a, b]).


Consider

f(u) = ∫_a^b φ(t)u(t) dt.

Then

|f(u)| = |∫_a^b φ(t)u(t) dt| = |(φ, u)_{L²}| ≤ ‖φ‖_{L²} ‖u‖_{L²}, using the Cauchy–Schwarz inequality,

and so f ∈ U* with

‖f‖ ≤ ‖φ‖_{L²}.

If we choose u = φ/‖φ‖_{L²} then ‖u‖_{L²} = 1 and

|f(u)| = ∫_a^b |φ(t)|² / ‖φ‖_{L²} dt = ‖φ‖_{L²},

and so ‖f‖ = ‖φ‖_{L²}.

Exercise 11.3 Let U be C⁰([a, b]) and for some φ ∈ U consider f defined as

f(u) = ∫_a^b φ(t)u(t) dt for all u ∈ U.

Show that f ∈ U* with ‖f‖ ≤ ∫_a^b |φ(t)| dt. Show that this is in fact an equality by choosing an appropriate sequence of functions u_n ∈ U for which

|f(u_n)| / ‖u_n‖ → ∫_a^b |φ(t)| dt.

Example 11.4 Let H be a Hilbert space. Given any y ∈ H, define

l_y(x) = (x, y). (11.1)

Then l_y is clearly linear, and

|l_y(x)| = |(x, y)| ≤ ‖x‖ ‖y‖

using the Cauchy–Schwarz inequality. It follows that l_y ∈ H* with ‖l_y‖ ≤ ‖y‖. Choosing x = y in (11.1) shows that

|l_y(y)| = (y, y) = ‖y‖²

and hence ‖l_y‖ = ‖y‖.



The Riesz Representation Theorem shows that this example can be reversed, i.e. every bounded linear functional on H corresponds to some inner product:

Theorem 11.5 (Riesz Representation Theorem) For every bounded linear functional f on a Hilbert space H there exists a unique element y ∈ H such that

f(x) = (x, y) for all x ∈ H (11.2)

and ‖y‖_H = ‖f‖_{H*}.

Proof Let K = Ker f, which is a closed linear subspace of H. If f = 0 then we can take y = 0, so assume that f ≠ 0, in which case K⊥ ≠ {0}.
First we show that K⊥ is a one-dimensional linear subspace of H. Indeed, given u, v ∈ K⊥ we have

f(f(u)v − f(v)u) = 0. (11.3)

Since u, v ∈ K⊥ it follows that f(u)v − f(v)u ∈ K⊥, while (11.3) shows that f(u)v − f(v)u ∈ K. It follows that

f(u)v − f(v)u = 0,

and so u and v are proportional.
Therefore we can choose z ∈ K⊥ such that ‖z‖ = 1, and decompose any x ∈ H as

x = (x, z)z + w with w ∈ K.

Therefore

f(x) = (x, z)f(z) = (x, f(z)* z),

where f(z)* denotes the complex conjugate of f(z). Setting y = f(z)* z we obtain (11.2).
To show that this choice of y is unique, suppose that

(x, y) = (x, ỹ) for all x ∈ H.

Then (x, y − ỹ) = 0 for all x ∈ H, i.e. y − ỹ ∈ H⊥ = {0}.
Finally, the calculation in Example 11.4 shows the equality of the norms of y and f.

We now use this to define the adjoint of an operator.

(We always have U ∩ U⊥ = {0}: if x ∈ U and x ∈ U⊥ then ‖x‖² = (x, x) = 0.)

Theorem 11.6 Let H and K be Hilbert spaces and T ∈ B(H, K). Then there exists a unique operator T* ∈ B(K, H), the adjoint of T, such that

(Tx, y)_K = (x, T*y)_H

for all x ∈ H, y ∈ K. In particular, ‖T*‖_{B(K,H)} ≤ ‖T‖_{B(H,K)}.

Proof Let y ∈ K and consider the function f : H → 𝕂 defined by f(x) = (Tx, y)_K. Then clearly f is linear and

|f(x)| = |(Tx, y)_K| ≤ ‖Tx‖_K ‖y‖_K ≤ ‖T‖ ‖x‖_H ‖y‖_K.

It follows that f ∈ H*, and so by the Riesz representation theorem there exists a unique z ∈ H such that

(Tx, y)_K = (x, z)_H for all x ∈ H.

Define T*y = z. Then by definition

(Tx, y)_K = (x, T*y)_H for all x ∈ H, y ∈ K.

However, it remains to show that T* ∈ B(K, H). First, T* is linear since

(x, T*(y_1 + y_2))_H = (Tx, y_1 + y_2)_K
= (Tx, y_1)_K + (Tx, y_2)_K
= (x, T*y_1)_H + (x, T*y_2)_H
= (x, T*y_1 + T*y_2)_H,

i.e.

T*(y_1 + y_2) = T*y_1 + T*y_2.

To show that T* is bounded, we have

‖T*y‖² = (T*y, T*y)_H = (TT*y, y)_K ≤ ‖TT*y‖_K ‖y‖_K ≤ ‖T‖ ‖T*y‖_H ‖y‖_K.

If ‖T*y‖_H = 0 then clearly ‖T*y‖_H ≤ ‖T‖ ‖y‖_K; otherwise we can divide both sides by ‖T*y‖_H to obtain the same conclusion (that T* is bounded from K into H). So ‖T*‖ ≤ ‖T‖.
Finally, suppose that (x, T*y)_H = (x, T̃y)_H for all x ∈ H, y ∈ K. Then for each y ∈ K, (x, (T* − T̃)y)_H = 0 for all x ∈ H: this shows that (T* − T̃)y = 0 for all y ∈ K, i.e. that T̃ = T*.

Example 11.7 Let H = K = ℂⁿ with its standard inner product. Then given a matrix A = (a_{ij}) ∈ ℂⁿˣⁿ we have

(Ax, y) = ∑_{i=1}^n ∑_{j=1}^n a_{ij} x_j ȳ_i = ∑_{j=1}^n x_j (∑_{i=1}^n a_{ij} ȳ_i) = (x, A*y),

where A* is the Hermitian conjugate of A, i.e. A* = Āᵀ (the conjugate transpose of A).
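In this finite-dimensional setting the adjoint identity (Ax, y) = (x, A*y) can be verified directly; the sketch below uses an arbitrary 2 × 2 complex matrix:

```python
# Verify (Ax, y) = (x, A*y) on C^2 with the standard inner product
# (u, v) = sum_i u_i * conj(v_i), where A* is the conjugate transpose of A.

def inner(u, v):
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

def mat_vec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def conj_transpose(M):
    n, m = len(M), len(M[0])
    return [[M[i][j].conjugate() for i in range(n)] for j in range(m)]

A = [[1 + 2j, 3 - 1j],
     [0 + 1j, 2 + 0j]]
x = [1 - 1j, 2 + 3j]
y = [4 + 0j, -1 + 2j]

lhs = inner(mat_vec(A, x), y)
rhs = inner(x, mat_vec(conj_transpose(A), y))
assert abs(lhs - rhs) < 1e-12
```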

Example 11.8 Consider H = K = L²(0, 1) and the integral operator

(Tx)(t) = ∫_0^1 K(t, s)x(s) ds.

Then for x, y ∈ L²(0, 1) we have

(Tx, y)_H = ∫_0^1 (∫_0^1 K(t, s)x(s) ds) y(t) dt
= ∫_0^1 ∫_0^1 K(t, s)x(s)y(t) ds dt
= ∫_0^1 x(s) (∫_0^1 K(t, s)y(t) dt) ds
= (x, T*y)_H,

where

(T*y)(s) = ∫_0^1 K(t, s)y(t) dt.

Exercise 11.9 Show that the adjoint of the integral operator T : L²(0, 1) → L²(0, 1) defined as

(Tx)(t) = ∫_0^t K(t, s)x(s) ds

is given by

(T*y)(t) = ∫_t^1 K(s, t)y(s) ds.

Example 11.10 Let H = K = ℓ² and consider S_r x = (0, x_1, x_2, ...); then

(S_r x, y) = x_1 ȳ_2 + x_2 ȳ_3 + x_3 ȳ_4 + ⋯ = (x, S_l y),

where S_l y = (y_2, y_3, y_4, ...). So S_r* = S_l, and similarly S_l* = S_r.

The following lemma gives some elementary properties of the adjoint:

Lemma 11.11 Let H, K, and J be Hilbert spaces, R, S ∈ B(H, K) and T ∈ B(K, J); then
(a) (R + S)* = R* + S*, and
(b) (TR)* = R*T*.

Proof
(a) Exercise.
(b) Clearly

(x, (TR)*y)_H = (TRx, y)_J = (Rx, T*y)_K = (x, R*T*y)_H.

Less trivially we have the following:

Theorem 11.12 Let H and K be Hilbert spaces and T ∈ B(H, K). Then
(a) (T*)* = T,
(b) ‖T*‖ = ‖T‖, and
(c) ‖T*T‖ = ‖T‖².

Proof
(a) Since T* ∈ B(K, H), (T*)* ∈ B(H, K). For all x ∈ K, y ∈ H we have

(x, (T*)*y)_K = (T*x, y)_H = \overline{(y, T*x)_H} = \overline{(Ty, x)_K} = (x, Ty)_K,

i.e. (T*)*y = Ty for all y ∈ H, i.e. (T*)* = T.

(b) We have already shown in the proof of Theorem 11.6 that ‖T*‖ ≤ ‖T‖. Applying this inequality to T* rather than to T we have ‖(T*)*‖ = ‖T‖ ≤ ‖T*‖, and so ‖T*‖ = ‖T‖.
(c) Since ‖T*‖ = ‖T‖ we have

‖T*T‖ ≤ ‖T*‖ ‖T‖ = ‖T‖².

But also we have

‖Tx‖² = (Tx, Tx) = (x, T*Tx) ≤ ‖x‖ ‖T*Tx‖ ≤ ‖T*T‖ ‖x‖²,

i.e. ‖T‖² ≤ ‖T*T‖.
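Property (c) can be observed numerically for a real matrix, where T* = Tᵀ and the operator norm is the largest singular value. The power-iteration routine below is a simple (not especially robust) way of computing the dominant eigenvalue of a symmetric matrix:

```python
import math

# A real 2x2 example: ||T|| is the largest singular value, and
# ||T^T T|| is the largest eigenvalue of the symmetric matrix T^T T.
T = [[1.0, 1.0],
     [0.0, 1.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def spectral_norm_sym(S, iters=200):
    """Dominant eigenvalue of a symmetric PSD 2x2 matrix by power iteration."""
    v = [1.0, 0.7]
    for _ in range(iters):
        w = [S[0][0] * v[0] + S[0][1] * v[1], S[1][0] * v[0] + S[1][1] * v[1]]
        n = math.hypot(*w)
        v = [w[0] / n, w[1] / n]
    # Rayleigh quotient at the converged unit vector
    w = [S[0][0] * v[0] + S[0][1] * v[1], S[1][0] * v[0] + S[1][1] * v[1]]
    return v[0] * w[0] + v[1] * w[1]

TtT = mat_mul(transpose(T), T)
norm_T_sq = spectral_norm_sym(TtT)                              # ||T||^2
norm_TtT = spectral_norm_sym(mat_mul(transpose(TtT), TtT)) ** 0.5

# ||T^T T|| = ||T||^2; here ||T||^2 = (3 + sqrt(5))/2 for this particular T
assert abs(norm_TtT - norm_T_sq) < 1e-8
assert abs(norm_T_sq - (3 + math.sqrt(5)) / 2) < 1e-8
```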

11.1 Linear operators from H into H

Definition 11.13 If H is a Hilbert space and T ∈ B(H, H) then T is normal if

T*T = TT*

and self-adjoint if T* = T.

Note that T*T is always self-adjoint, since (T*T)* = T*(T*)* = T*T, using (b) of Lemma 11.11 ((TR)* = R*T*) and (a) of Theorem 11.12 ((T*)* = T).

Equivalently, T ∈ B(H, H) is self-adjoint iff

(x, Ty) = (Tx, y) for all x, y ∈ H.

Example 11.14 Let H = ℝⁿ and A ∈ ℝⁿˣⁿ. Then A is self-adjoint if A = Aᵀ, i.e. if A is symmetric.

Example 11.15 Let H = ℂⁿ and A ∈ ℂⁿˣⁿ. Then A is self-adjoint if A = Āᵀ, i.e. if A is Hermitian.

Example 11.16 Consider the right-shift operator on ℓ², for which S_r* = S_l. Then S_r is not normal, since

S_r S_l x = S_r(x_2, x_3, ...) = (0, x_2, x_3, ...)

and

S_l S_r x = S_l(0, x_1, x_2, ...) = (x_1, x_2, x_3, ...).

Example 11.17 The integral operator T : L² → L² given by

Tf(t) = ∫_0^1 K(t, s)f(s) ds

is self-adjoint if K(t, s) = K(s, t), i.e. if K is symmetric.

It would be nice, of course, to have yet another expression for ‖T‖, and we can do this when T is self-adjoint.

Theorem 11.18 Let H be a Hilbert space and T ∈ B(H, H) a self-adjoint operator. Then
(a) (Tx, x) is real for all x ∈ H, and
(b) ‖T‖ = sup{|(Tx, x)| : x ∈ H, ‖x‖ = 1}.

Proof For (a) we have

(Tx, x) = (x, Tx) = \overline{(Tx, x)},

and so (Tx, x) is real. Now let M = sup{|(Tx, x)| : x ∈ H, ‖x‖ = 1}. Clearly

|(Tx, x)| ≤ ‖Tx‖ ‖x‖ ≤ ‖T‖ ‖x‖² = ‖T‖

when ‖x‖ = 1, and so M ≤ ‖T‖.
For any u, v ∈ H we have

4 Re(Tu, v) = (T(u + v), u + v) − (T(u − v), u − v) ≤ M(‖u + v‖² + ‖u − v‖²) = 2M(‖u‖² + ‖v‖²)

using the parallelogram law. If Tu ≠ 0 choose

v = (‖u‖ / ‖Tu‖) Tu

to obtain, since ‖v‖ = ‖u‖, that

4‖u‖ ‖Tu‖ ≤ 4M‖u‖²,

i.e. ‖Tu‖ ≤ M‖u‖. This also holds if Tu = 0. It follows that ‖T‖ ≤ M and therefore that ‖T‖ = M.
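Part (b) can be illustrated for the symmetric matrix A = [[2, 1], [1, 2]], whose eigenvalues are 3 and 1: sampling unit vectors never produces |(Ax, x)| above ‖A‖ = 3, and the supremum is attained at the eigenvector (1, 1)/√2:

```python
import math, random

# Symmetric 2x2 matrix with eigenvalues 3 and 1, so ||A|| = 3.
A = [[2.0, 1.0],
     [1.0, 2.0]]

def quad(A, x):
    """The quadratic form (Ax, x)."""
    ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return ax[0] * x[0] + ax[1] * x[1]

random.seed(0)
best = 0.0
for _ in range(2000):
    theta = random.uniform(0, 2 * math.pi)
    x = [math.cos(theta), math.sin(theta)]     # random unit vector
    best = max(best, abs(quad(A, x)))

# |(Ax, x)| never exceeds ||A|| = 3 on the unit sphere ...
assert best <= 3.0 + 1e-9
# ... and the supremum 3 is (nearly) attained; it is attained exactly at the
# unit eigenvector (1, 1)/sqrt(2) for the eigenvalue 3.
assert best > 3.0 - 1e-3
e = [1 / math.sqrt(2), 1 / math.sqrt(2)]
assert abs(quad(A, e) - 3.0) < 1e-12
```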
12
Spectral Theory I: General theory

12.1 Spectrum and point spectrum

Let H be a complex Hilbert space and T ∈ B(H, H); then the point spectrum of T consists of the set of all eigenvalues,

σ_p(T) = {λ ∈ ℂ : Tx = λx for some non-zero x ∈ H}.

Clearly |λ| ≤ ‖T‖_op for any λ ∈ σ_p(T), since if Tx = λx then

|λ| ‖x‖ = ‖λx‖ = ‖Tx‖ ≤ ‖T‖_op ‖x‖.

Example 12.1 Consider the right shift operator S_r on ℓ². This operator has no eigenvalues, since S_r x = λx implies that

(0, x_1, x_2, ...) = λ(x_1, x_2, x_3, ...)

and so

λx_1 = 0, x_1 = λx_2, x_2 = λx_3, ... .

If λ ≠ 0 then this implies that x_1 = 0, and then x_2 = x_3 = x_4 = ⋯ = 0, and so λ is not an eigenvalue. If λ = 0 then we also obtain x = 0, and so there are no eigenvalues, i.e. σ_p(S_r) = ∅.

Example 12.2 Consider the left shift operator S_l on ℓ²; λ ∈ ℂ is an eigenvalue if S_l x = λx, i.e. if

(x_2, x_3, x_4, ...) = λ(x_1, x_2, x_3, ...),

i.e. if

x_2 = λx_1, x_3 = λx_2, x_4 = λx_3, ... .

Given λ ≠ 0 this gives a candidate eigenfunction

x = (1, λ, λ², λ³, ...),

which is an element of ℓ² provided that

∑_{n=0}^∞ |λ|²ⁿ = 1/(1 − |λ|²) < ∞,

which is the case for any λ with |λ| < 1. It follows that

{λ ∈ ℂ : |λ| < 1} ⊂ σ_p(S_l).

If A is a linear operator on a finite-dimensional space V then λ ∈ ℂ is an eigenvalue of A if

Ax = λx for some non-zero x ∈ V.

In this case λ is an eigenvalue if and only if A − λI is not invertible (recall that you can find the eigenvalues of an n × n matrix by solving det(A − λI) = 0).

However, this is no longer true in infinite-dimensional spaces. Before we define the spectrum, we briefly discuss the definition of the inverse of a linear operator.

As with the theory of matrices, the concept of the inverse of a general linear operator is extremely useful. We say that A is injective if the equation

Ax = y

has a unique solution for every y ∈ Range(A). We say that A is invertible if it is injective and the range of A is equal to H. In this case we define the inverse of A, A⁻¹, by A⁻¹y = x. It is easy to check that

AA⁻¹u = u for all u ∈ Range(A) and A⁻¹Au = u for all u ∈ H.

Exercise 12.3 Show that if A is linear and A⁻¹ exists then it is linear too.

The injectivity of A is equivalent to the triviality of its kernel.

Lemma 12.4 A is injective iff Ker(A) = {0}.



Proof Suppose that A is injective. Then the equation Ax = y has a unique solution for any y ∈ Range(A). However, if Ker(A) contains some non-zero element z then A(x + z) = y also, so Ker(A) must be {0}. Conversely, if A is not injective then for some y ∈ Range(A) there are two distinct solutions, x_1 and x_2, of Ax = y, and so A(x_1 − x_2) = 0, giving a non-zero element of Ker(A).

We can now make the following definition:

Definition 12.5 The resolvent set of T, R(T), is

R(T) = {λ ∈ ℂ : (T − λI)⁻¹ ∈ B(H, H)}.

The spectrum σ(T) of T ∈ B(H, H) is the complement of R(T),

σ(T) = ℂ \ R(T),

i.e. the spectrum of T is the set of all complex λ for which T − λI does not have a bounded inverse defined on all of H.

Clearly σ_p(T) ⊂ σ(T), since if there is a non-zero z with Tz = λz then Ker(T − λI) ≠ {0} and so (T − λI) is not invertible (using Lemma 12.4). But the spectrum can be much larger than the point spectrum; a nice example will be a consequence of the fact that σ(T*) = {λ̄ : λ ∈ σ(T)}.

Lemma 12.6 σ(T*) = {λ̄ : λ ∈ σ(T)}.

Proof If λ ∉ σ(T) then T − λI has a bounded inverse,

(T − λI)(T − λI)⁻¹ = I = (T − λI)⁻¹(T − λI).

Taking adjoints we obtain

[(T − λI)⁻¹]*(T* − λ̄I) = I = (T* − λ̄I)[(T − λI)⁻¹]*,

and so T* − λ̄I has a bounded inverse, i.e. λ̄ ∉ σ(T*). Starting instead with T* we deduce that λ ∉ σ(T) iff λ̄ ∉ σ(T*), which completes the proof.

Example 12.7 Let S_r be the right-shift operator on ℓ². We saw above that S_r has no eigenvalues, but that for S_r* = S_l the interior of the unit disc is contained in the point spectrum. It follows that

{λ ∈ ℂ : |λ| < 1} ⊂ σ(S_r)

even though σ_p(S_r) = ∅.

We have already seen that any eigenvalue λ of T must satisfy |λ| ≤ ‖T‖_op. We now show that this also holds for any λ ∈ σ(T); the argument is more subtle, and based on considering how to solve the linear equation (λI − T)x = y.

Theorem 12.8 Suppose that T ∈ B(H, H) with ‖T‖ < 1. Then (I − T)⁻¹ ∈ B(H, H) and

(I − T)⁻¹ = I + T + T² + ⋯

with

‖(I − T)⁻¹‖ ≤ (1 − ‖T‖)⁻¹.

Proof Since

‖Tⁿx‖ ≤ ‖T‖ ‖Tⁿ⁻¹x‖

it follows that ‖Tⁿ‖ ≤ ‖T‖ⁿ. Therefore if we consider

V_n = I + T + ⋯ + Tⁿ

we have (for n > m)

‖V_n − V_m‖ = ‖Tᵐ⁺¹ + ⋯ + Tⁿ⁻¹ + Tⁿ‖
≤ ‖Tᵐ⁺¹‖ + ⋯ + ‖Tⁿ⁻¹‖ + ‖Tⁿ‖
≤ ‖T‖ᵐ⁺¹ + ⋯ + ‖T‖ⁿ⁻¹ + ‖T‖ⁿ
≤ ‖T‖ᵐ⁺¹ / (1 − ‖T‖).

It follows that {V_n} is Cauchy in the operator norm, and so converges to some V ∈ B(H, H) with

‖V‖ ≤ 1 + ‖T‖ + ‖T‖² + ⋯ = (1 − ‖T‖)⁻¹.

Clearly

V(I − T) = (I + T + T² + ⋯)(I − T) = (I + T + T² + ⋯) − (T + T² + T³ + ⋯) = I,

and similarly (I − T)V = I.
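The Neumann series of Theorem 12.8 converges quickly for a matrix with small norm; the sketch below compares the partial sums V_n with the inverse of I − T computed directly by the 2 × 2 formula:

```python
# Neumann series (I - T)^{-1} = I + T + T^2 + ... for a 2x2 matrix, ||T|| < 1.
T = [[0.2, 0.1],
     [0.0, 0.3]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

I = [[1.0, 0.0], [0.0, 1.0]]

# Partial sums V_n = I + T + ... + T^n
V, power = I, I
for _ in range(60):
    power = mat_mul(power, T)
    V = mat_add(V, power)

# Direct inverse of I - T via the 2x2 cofactor formula
M = [[1 - T[0][0], -T[0][1]], [-T[1][0], 1 - T[1][1]]]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_inv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

for i in range(2):
    for j in range(2):
        assert abs(V[i][j] - M_inv[i][j]) < 1e-10
```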

As promised:

Corollary 12.9 The spectrum of T satisfies σ(T) ⊂ {λ ∈ ℂ : |λ| ≤ ‖T‖_op}.



Proof We have T − λI = −λ(I − λ⁻¹T). So if I − λ⁻¹T is invertible, λ ∉ σ(T). But for |λ| > ‖T‖_op we have ‖λ⁻¹T‖_op < 1, and the above theorem shows that T − λI is invertible and the result follows.

We now show that the spectrum must also be closed, by showing that its complement (the resolvent set) is open. To this end, we prove the following theorem, which shows that the set of bounded linear operators with bounded inverses defined on all of H is open, i.e. that this property is stable under perturbation.

Theorem 12.10 Let H be a Hilbert space and T ∈ B(H, H) such that T⁻¹ ∈ B(H, H). Then for any U ∈ B(H, H) with

‖U‖ < ‖T⁻¹‖⁻¹

the operator T + U is invertible with

‖(T + U)⁻¹‖ ≤ ‖T⁻¹‖ / (1 − ‖U‖ ‖T⁻¹‖). (12.1)

Proof Let P = T⁻¹(T + U) = I + T⁻¹U. Then since by assumption ‖T⁻¹‖ ‖U‖ < 1, it follows from Theorem 12.8 that P is invertible with

‖P⁻¹‖ ≤ 1 / (1 − ‖T⁻¹‖ ‖U‖).

Using the definition of P we have

T⁻¹(T + U)P⁻¹ = I = P⁻¹T⁻¹(T + U);

from the first of these identities we have

(T + U)P⁻¹T⁻¹ = I,

and so

(T + U)⁻¹ = P⁻¹T⁻¹,

and (12.1) follows.

Corollary 12.11 If T ∈ B(H, H) then the spectrum of T is closed.

Proof We show that the resolvent set R(T), the complement of σ(T), is open. Indeed, if λ ∈ R(T) then T − λI is invertible. Theorem 12.10 shows that (T − λI) − μI is invertible for all

|μ| < ‖(T − λI)⁻¹‖⁻¹,

i.e. R(T) is open.

Lemma 12.12 The spectra of S_l and of S_r are both equal to the unit disc in the complex plane:

σ(S_l) = σ(S_r) = {λ ∈ ℂ : |λ| ≤ 1}.

Proof We showed earlier that for the shift operators S_r and S_l on ℓ²,

σ(S_r), σ(S_l) ⊃ {λ ∈ ℂ : |λ| < 1}.

We have already shown that ‖S_r‖_op = ‖S_l‖_op = 1, so Corollary 12.9 gives

σ(S_r), σ(S_l) ⊂ {λ ∈ ℂ : |λ| ≤ 1};

but it follows from Corollary 12.11 that each spectrum is closed, and so contains the closure of the open unit disc. It follows that in fact

σ(S_r) = σ(S_l) = {λ ∈ ℂ : |λ| ≤ 1}.

A question on the final examples sheet shows that if T is self-adjoint then σ(T) ⊂ ℝ.
13
Spectral theory II: compact self-adjoint operators

We now consider eigenvalues of compact self-adjoint linear operators on a Hilbert space. It is convenient to restrict attention to Hilbert spaces over ℂ, but this is no restriction, since we can always consider the complexification of a Hilbert space over ℝ.

13.1 Complexification and real eigenvalues

Exercise 13.1 Let H be a Hilbert space over ℝ, and define its complexification H_ℂ as the vector space

H_ℂ = {x + iy : x, y ∈ H},

equipped with operations + and · defined via

(x + iy) + (w + iz) = (x + w) + i(y + z), x, y, w, z ∈ H,

and

(a + ib) · (x + iy) = (ax − by) + i(bx + ay), a, b ∈ ℝ, x, y ∈ H.

Show that, equipped with the inner product

(x + iy, w + iz)_{H_ℂ} = (x, w) + i(y, w) − i(x, z) + (y, z),

H_ℂ is a Hilbert space.

It follows that

‖x + iy‖²_{H_ℂ} = ‖x‖² + ‖y‖².


Just as we can complexify a Hilbert space H to give H_ℂ, we can complexify a linear operator T that acts on H to a linear operator T_ℂ that acts on H_ℂ:

Lemma 13.2 Let H be a real Hilbert space and H_ℂ its complexification. Given T ∈ B(H, H), extend T to a linear operator T_ℂ : H_ℂ → H_ℂ via the definition

T_ℂ(x + iy) = Tx + iTy, x, y ∈ H.

Then T_ℂ ∈ B(H_ℂ, H_ℂ) with ‖T_ℂ‖ = ‖T‖, any eigenvalue of T is an eigenvalue of T_ℂ, and any real eigenvalue of T_ℂ is an eigenvalue of T.

Proof Clearly

‖T_ℂ(x + iy)‖²_{H_ℂ} = ‖Tx‖² + ‖Ty‖² ≤ ‖T‖²_{B(H,H)}(‖x‖² + ‖y‖²) = ‖T‖²_{B(H,H)} ‖x + iy‖²_{H_ℂ},

so ‖T_ℂ‖ ≤ ‖T‖. But also ‖Tx‖ = ‖T_ℂ(x + i0)‖, and so ‖T_ℂ‖ ≥ ‖T‖. So in fact ‖T_ℂ‖ = ‖T‖.
If λ is an eigenvalue of T with eigenvector x then

T_ℂ(x + i0) = Tx = λx = λ(x + i0);

while if T_ℂ has eigenvalue λ ∈ ℝ with eigenvector x + iy (with either x or y non-zero), it follows that

Tx + iTy = T_ℂ(x + iy) = λ(x + iy) = λx + iλy,

and so Tx = λx and Ty = λy; since x or y is non-zero, λ is an eigenvalue of T.

Lemma 13.3 Let H be a real Hilbert space and T ∈ B(H, H) a self-adjoint operator. Then T_ℂ as defined above is a self-adjoint operator on H_ℂ.

Proof For ξ, η ∈ H_ℂ, with ξ = x + iy, η = u + iv, x, y, u, v ∈ H,

(T_ℂξ, η) = (T_ℂ(x + iy), u + iv)_{H_ℂ}
= (Tx + iTy, u + iv)
= (Tx, u) − i(Tx, v) + i(Ty, u) + (Ty, v)
= (x, Tu) − i(x, Tv) + i(y, Tu) + (y, Tv)
= (x + iy, Tu + iTv)
= (ξ, T_ℂη).

If T is self-adjoint then all its eigenvalues are real:

Theorem 13.4 Let T be a self-adjoint operator on a Hilbert space H. Then all the eigenvalues of T are real and the eigenvectors corresponding to distinct eigenvalues are orthogonal.

Proof Suppose that Tx = λx with x ≠ 0. Then

λ‖x‖² = (λx, x) = (Tx, x) = (x, Tx) = (x, λx) = λ̄‖x‖²,

i.e. λ = λ̄.
Now if λ and μ are distinct eigenvalues with Tx = λx and Ty = μy then

0 = (Tx, y) − (x, Ty) = λ(x, y) − μ(x, y) = (λ − μ)(x, y),

and so (x, y) = 0.
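Both conclusions of Theorem 13.4 are visible already for a 2 × 2 real symmetric matrix; the eigenpairs below are computed by hand and checked:

```python
# Symmetric matrix A: eigenvalues are real and eigenvectors for distinct
# eigenvalues are orthogonal. For A = [[2, 1], [1, 2]] the eigenpairs are
# lambda = 3 with w = (1, 1) and mu = 1 with v = (1, -1).
A = [[2.0, 1.0],
     [1.0, 2.0]]

def mat_vec(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

w, lam = [1.0, 1.0], 3.0
v, mu = [1.0, -1.0], 1.0

# check the eigenvalue equations Tx = lambda x ...
assert mat_vec(A, w) == [lam * w[0], lam * w[1]]
assert mat_vec(A, v) == [mu * v[0], mu * v[1]]
# ... and orthogonality of eigenvectors for distinct eigenvalues
assert w[0] * v[0] + w[1] * v[1] == 0.0
```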

Corollary 13.5 If T is a self-adjoint operator on a real Hilbert space H, and T_ℂ is its complexification defined above, then σ_p(T_ℂ) = σ_p(T).

13.2 Compact operators

We will develop our spectral theory for operators that are self-adjoint and
compact according to the following definition:

Definition 13.6 Let X and Y be normed spaces. Then a linear operator T : X → Y is compact if for any bounded sequence {x_n} ⊂ X, the sequence {Tx_n} ⊂ Y has a convergent subsequence (whose limit lies in Y).

Note that a compact operator must be bounded, since otherwise there exists a sequence in X with ‖x_n‖ = 1 but ‖Tx_n‖ → ∞, and clearly {Tx_n} can then have no convergent subsequence.

Example 13.7 Take T ∈ B(X, Y) with finite-dimensional range. Then T is compact, since any bounded sequence in a finite-dimensional space has a convergent subsequence.

Theorem 13.8 Suppose that X is a normed space and Y is a Banach space. If {K_n} is a sequence of compact (linear) operators in B(X, Y) converging to some K ∈ B(X, Y) in the operator norm, i.e.

sup_{‖x‖_X = 1} ‖K_n x − Kx‖_Y → 0

as n → ∞, then K is compact.

Proof Let {x_n} be a bounded sequence in X. Then since K_1 is compact, K_1(x_n) has a convergent subsequence, K_1(x_{n_{1,j}}). Since {x_{n_{1,j}}} is bounded, K_2(x_{n_{1,j}}) has a convergent subsequence, K_2(x_{n_{2,j}}). Repeat this process to get a family of nested subsequences, x_{n_{k,j}}, with K_l(x_{n_{k,j}}) convergent for all l ≤ k.
Now consider the diagonal sequence y_j = x_{n_{j,j}}. Since, for j ≥ n, {y_j} is a subsequence of {x_{n_{n,j}}}, it follows that K_n(y_j) converges (as j → ∞) for every n.
We now show that K(y_j) is Cauchy, and hence convergent, to complete the proof. Choose ε > 0, and use the triangle inequality to write

‖K(y_i) − K(y_j)‖_Y ≤ ‖K(y_i) − K_n(y_i)‖_Y + ‖K_n(y_i) − K_n(y_j)‖_Y + ‖K_n(y_j) − K(y_j)‖_Y.

Since {y_j} is bounded and K_n → K in the operator norm, pick n large enough that

‖K(y_j) − K_n(y_j)‖_Y ≤ ε/3

for all y_j in the sequence. For such an n, the sequence K_n(y_j) is Cauchy, and so there exists an N such that for i, j ≥ N we can guarantee

‖K_n(y_i) − K_n(y_j)‖_Y ≤ ε/3.

So now

‖K(y_i) − K(y_j)‖_Y ≤ ε for all i, j ≥ N,

and {K(y_j)} is a Cauchy sequence.

We now use this theorem to show the following:

Proposition 13.9 The integral operator T : L²(a, b) → L²(a, b) given by

[Tu](x) = ∫_a^b K(x, y)u(y) dy

with

∫_a^b ∫_a^b |K(x, y)|² dx dy < ∞

is compact.

Proof Let {φ_j} be an orthonormal basis for L²(a, b). It follows that {φ_i(x)φ_j(y)} is an orthonormal basis for L²((a, b) × (a, b)). If we write K(x, y) in terms of this basis we have

K(x, y) = ∑_{i,j=1}^∞ k_{ij} φ_i(x)φ_j(y),

where the coefficients k_{ij} are given by

k_{ij} = ∫_a^b ∫_a^b K(x, y)φ_i(x)φ_j(y) dx dy,

and the sum converges in L²((a, b) × (a, b)). Since {φ_i(x)φ_j(y)} is a basis we have

‖K‖²_{L²((a,b)×(a,b))} = ∫_a^b ∫_a^b |K(x, y)|² dx dy = ∑_{i,j=1}^∞ |k_{ij}|². (13.1)

We now approximate T by operators derived from finite truncations of the expansion of K(x, y). We set

K_n(x, y) = ∑_{i,j=1}^n k_{ij} φ_i(x)φ_j(y)

and

[T_n u](x) = ∫_a^b K_n(x, y)u(y) dy.

If u ∈ L²(a, b) is given by u = ∑_{l=1}^∞ c_l φ_l, then

(T_n u)(x) = ∑_{i,j=1}^n ∑_{l=1}^∞ ∫_a^b k_{ij} φ_i(x)φ_j(y) c_l φ_l(y) dy = ∑_{i,j=1}^n k_{ij} c_j φ_i(x).

Since T_n u is therefore a linear combination of {φ_i}_{i=1}^n, the range of T_n has dimension at most n. It follows that T_n is compact for each n; if we can show that T_n → T in the operator norm then we can use Theorem 13.8 to show that T is compact.
This is straightforward, since

‖(T − T_n)u‖² = ∫_a^b |∫_a^b [K(x, y) − K_n(x, y)]u(y) dy|² dx ≤ (∫_a^b ∫_a^b |K(x, y) − K_n(x, y)|² dx dy)(∫_a^b |u(y)|² dy),

i.e.

‖T − T_n‖² ≤ ∫_a^b ∫_a^b |K(x, y) − K_n(x, y)|² dx dy = ∑_{max(i,j)>n} |k_{ij}|²,

using the expansions of K and K_n. Convergence of T_n to T follows since the sum in (13.1) is finite, and so this tail sum tends to zero as n → ∞.

We now show that any compact self-adjoint operator has at least one eigenvalue. (Recall that S_r has no eigenvalues; but S_r is not even normal, so this is no contradiction.)

Theorem 13.10 Let H be a Hilbert space and T ∈ B(H, H) a compact self-adjoint operator. Then at least one of ±‖T‖_op is an eigenvalue of T.

Proof We assume that T ≠ 0, otherwise the result is trivial. From Theorem 11.18,

‖T‖_op = sup_{‖x‖=1} |(Tx, x)|.

Thus there exists a sequence x_n of unit vectors such that

(Tx_n, x_n) → λ, where λ = ‖T‖_op or λ = −‖T‖_op.

Since T is compact there is a subsequence x_{n_j} such that Tx_{n_j} is convergent to some y. Relabel x_{n_j} as x_n again.
Now consider

‖Tx_n − λx_n‖² = ‖Tx_n‖² + λ² − 2λ(Tx_n, x_n) ≤ 2λ² − 2λ(Tx_n, x_n);

by the choice of x_n, the right-hand side tends to zero as n → ∞. It follows, since Tx_n → y, that

λx_n → y,

and since λ ≠ 0 is fixed we must have x_n → x for some x ∈ H. Therefore Tx_n → Tx and so

Tx = λx,

and clearly x ≠ 0, since ‖y‖ = |λ| ‖x‖ = ‖T‖_op ≠ 0.

Note that any eigenvalue λ must satisfy |λ| ≤ ‖T‖_op, since if Tx = λx we have

|λ| ‖x‖² = |(λx, x)| = |(Tx, x)| ≤ ‖Tx‖ ‖x‖ ≤ ‖T‖_op ‖x‖²;

it follows that ‖T‖_op = sup{|λ| : λ ∈ σ_p(T)}.

Proposition 13.11 Let T be a compact self-adjoint operator on a Hilbert space H. Then σ_p(T) is either finite or consists of a countable sequence tending to zero. Furthermore, every distinct non-zero eigenvalue corresponds to a finite number of linearly independent eigenvectors.

Proof Suppose that T has infinitely many eigenvalues that do not form a sequence tending to zero. Then for some ε > 0 there exists a sequence of distinct eigenvalues {λ_n} with |λ_n| > ε. Let x_n be a corresponding sequence of eigenvectors with ‖x_n‖ = 1; then

‖Tx_n − Tx_m‖² = (Tx_n − Tx_m, Tx_n − Tx_m) = |λ_n|² + |λ_m|² ≥ 2ε²

since (x_n, x_m) = 0. It follows that {Tx_n} can have no convergent subsequence, which contradicts the compactness of T.
Now suppose that for some eigenvalue λ there exists an infinite number of linearly independent eigenvectors {e_n}_{n=1}^∞. Using the Gram–Schmidt process we can find a countably infinite orthonormal set of eigenvectors, since any linear combination of the {e_j} is still an eigenvector:

T(∑_{j=1}^n α_j e_j) = ∑_{j=1}^n α_j Te_j = λ(∑_{j=1}^n α_j e_j).

Now, we have

‖Te_n − Te_m‖ = ‖λe_n − λe_m‖ = |λ| ‖e_n − e_m‖ = √2 |λ|.

It follows that {Te_n} can have no convergent subsequence, again contradicting the compactness of T. (Note that this second part does not in fact use the fact that T is self-adjoint.)

Lemma 13.12 Let T ∈ B(H, H) and let S be a closed linear subspace of H such that TS ⊆ S. Then T*S⊥ ⊆ S⊥.

Proof Let x ∈ S⊥ and y ∈ S. Then Ty ∈ S, and so (y, T*x) = (Ty, x) = 0 for all y ∈ S, i.e. T*x ∈ S⊥.

Since we will apply this lemma when T is self-adjoint, note that for such a T we have

TS ⊆ S ⟹ TS⊥ ⊆ S⊥

for any closed linear subspace S of H.

Theorem 13.13 (Hilbert–Schmidt Theorem) Let H be a Hilbert space and T ∈ B(H, H) a compact self-adjoint operator. Then there exists a finite or countably infinite orthonormal sequence {w_n} of eigenvectors of T, with corresponding non-zero real eigenvalues {λ_n}, such that for all x ∈ H

Tx = ∑_j λ_j (x, w_j)w_j. (13.2)

Proof By Theorem 13.10 there exists a w_1 such that ‖w_1‖ = 1 and Tw_1 = λ_1 w_1 with |λ_1| = ‖T‖. Consider the subspace of H perpendicular to w_1,

H_2 = {w_1}⊥.

Then since T is self-adjoint, Lemma 13.12 shows that T leaves H_2 invariant. If we consider T_2 = T|_{H_2} then we have T_2 ∈ B(H_2, H_2) with T_2 compact; this operator is still self-adjoint, since for all x, y ∈ H_2

(x, T_2 y) = (x, Ty) = (Tx, y) = (T_2 x, y).

Now apply Theorem 13.10 to the operator T_2 on the Hilbert space H_2 to find an eigenvalue λ_2 with |λ_2| = ‖T_2‖ and an eigenvector w_2 ∈ H_2 with ‖w_2‖ = 1. Continue this process as long as T_n ≠ 0.
If T_n = 0 for some n then for any x ∈ H we have

y := x − ∑_{j=1}^{n−1} (x, w_j)w_j ∈ H_n.

Then

0 = T_n y = Ty = Tx − ∑_{j=1}^{n−1} (x, w_j)Tw_j = Tx − ∑_{j=1}^{n−1} λ_j (x, w_j)w_j,

which is (13.2).
If T_n is never zero then consider

y_n := x − ∑_{j=1}^{n−1} (x, w_j)w_j ∈ H_n.

Then we have

‖x‖² = ‖y_n‖² + ∑_{j=1}^{n−1} |(x, w_j)|²,

and so ‖y_n‖ ≤ ‖x‖. It follows that

‖Tx − ∑_{j=1}^{n−1} λ_j (x, w_j)w_j‖ = ‖Ty_n‖ ≤ ‖T_n‖ ‖y_n‖ = |λ_n| ‖y_n‖ ≤ |λ_n| ‖x‖,

and since |λ_n| → 0 as n → ∞ (by Proposition 13.11) we have (13.2).
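In finite dimensions (13.2) is just the spectral decomposition of a symmetric matrix; the sketch below checks Tx = ∑_j λ_j (x, w_j)w_j against direct matrix–vector multiplication:

```python
import math

# Spectral decomposition of a symmetric matrix: with orthonormal eigenvectors
# w_j and eigenvalues lambda_j,  Tx = sum_j lambda_j (x, w_j) w_j.
A = [[2.0, 1.0],
     [1.0, 2.0]]
s = 1 / math.sqrt(2)
eigs = [(3.0, [s, s]), (1.0, [s, -s])]        # orthonormal eigenpairs of A

def mat_vec(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

def expand(x):
    """Evaluate sum_j lambda_j (x, w_j) w_j."""
    out = [0.0, 0.0]
    for lam, w in eigs:
        c = lam * (x[0] * w[0] + x[1] * w[1])  # lambda_j (x, w_j)
        out[0] += c * w[0]
        out[1] += c * w[1]
    return out

for x in ([1.0, 0.0], [0.3, -0.7], [2.0, 5.0]):
    direct, series = mat_vec(A, x), expand(x)
    assert abs(direct[0] - series[0]) < 1e-12
    assert abs(direct[1] - series[1]) < 1e-12
```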

There is a partial converse to this theorem on the last examples sheet: if H is a Hilbert space, {e_j} is an orthonormal set in H, and T ∈ B(H, H) is such that

Tu = ∑_{j=1}^∞ λ_j (u, e_j)e_j with λ_j ∈ ℝ and λ_j → 0 as j → ∞,

then T is compact and self-adjoint.

The orthonormal sequence constructed in this theorem is only a basis for the range of T; however, we do have:

Corollary 13.14 Let H be an infinite-dimensional Hilbert space and T ∈ B(H, H) a compact self-adjoint operator. Then there exists an orthonormal basis E of H consisting of eigenvectors of T, and for any x ∈ H

Tx = ∑_{e∈E} λ_e (x, e)e,

where Te = λ_e e.

Proof From Theorem 13.13 we have a finite or countable sequence {w_n} of eigenvectors of T such that

Tx = ∑_j λ_j (x, w_j)w_j. (13.3)

Now let F be an orthonormal basis for Ker T (this exists since Ker(T) is a Hilbert space in its own right, and every Hilbert space has an orthonormal basis); each f ∈ F is an eigenvector of T with eigenvalue zero, and since Tf = 0 but Tw_j = λ_j w_j with λ_j ≠ 0, we know that (f, w_j) = 0 for all f ∈ F and all j. So F ∪ {w_j} is an orthonormal set in H.
Now, (13.3) implies that

T(x − ∑_j (x, w_j)w_j) = 0,

i.e. that x − ∑_j (x, w_j)w_j ∈ Ker T. It follows that {w_j} ∪ F is an orthonormal basis for H.

Exercise 13.15 Show that if T is invertible and satisfies the conditions of Theorem 13.13 then there is an orthonormal basis of H consisting of eigenvectors corresponding to non-zero eigenvalues of T.

We end this chapter with a corollary of Corollary 13.14 (!) that shows that
the eigenvalues are essentially all of the spectrum of a compact self-adjoint
operator.

Theorem 13.16 Let T be a compact self-adjoint operator on a Hilbert space. Then σ(T) = \overline{σ_p(T)}, the closure of σ_p(T).

Note that this means that either σ(T) = σ_p(T) or σ(T) = σ_p(T) ∪ {0}, since σ_p(T) has no limit points except perhaps zero. So σ(T) = σ_p(T) unless there are an infinite number of eigenvalues but zero itself is not an eigenvalue.

Proof By the corollary of the Hilbert–Schmidt Theorem, we have

Tx = ∑_{e∈E} λ_e (x, e)e

for some orthonormal basis E of H.
Now take λ ∉ \overline{σ_p(T)}. For such λ there exists a δ > 0 such that

|λ − μ| ≥ δ > 0 for all μ ∈ σ_p(T)

(otherwise λ ∈ \overline{σ_p(T)}). We use this to show that T − λI is invertible with bounded inverse.
Now, the equation (T − λI)x = y becomes

∑_{e∈E} (λ_e − λ)(x, e)e = ∑_{e∈E} (y, e)e.

Taking the inner product of both sides with a particular f ∈ E, we have

(λ_f − λ)(x, f) = (y, f).

So we must have

(x, f) = (y, f) / (λ_f − λ).

Since ∑_{f∈E} |(y, f)|² < ∞ and |λ_f − λ| ≥ δ, it follows that

x = ∑_{f∈E} (x, f)f

converges, and that ‖x‖ ≤ δ⁻¹ ‖y‖. So (T − λI)⁻¹ exists and is bounded.


14
Sturm-Liouville problems

We consider the Sturm–Liouville problem

−(d/dx)(p(x) du/dx) + q(x)u = λu with u(a) = u(b) = 0. (14.1)

As a shorthand, we write L[u] for the left-hand side of (14.1), i.e.

L[u] = −(p(x)u′)′ + q(x)u.

We will assume that p(x) > 0 on [a, b] and that q(x) ≥ 0 on [a, b].

It was one of the major concerns of applied mathematics throughout the nineteenth century to show that the solutions {u_n(x)} of (14.1) form a complete basis for some appropriate space of functions (generalising the use of Fourier series as a basis for L²). We can do this easily using the theory developed in the last section.

However, first we have to turn the differential equation (14.1) into an integral equation. We do this as follows:

Lemma 14.1 Let u1(x) and u2(x) be two linearly independent non-zero
solutions of

    −(d/dx)(p(x) du/dx) + q(x)u = 0.

Then

    Wp(u1, u2)(x) := p(x)[u1′(x)u2(x) − u2′(x)u1(x)]

is a non-zero constant.


Proof First we show that Wp is constant. Differentiate Wp with respect to
x, then use the fact that L[u1] = L[u2] = 0 to substitute pu″ = qu − p′u′,
which gives:

    Wp′ = p′u1′u2 + pu1″u2 + pu1′u2′ − p′u2′u1 − pu2″u1 − pu2′u1′
        = p′(u1′u2 − u2′u1) + p(u1″u2 − u2″u1)
        = p′(u1′u2 − u2′u1) + u2(qu1 − p′u1′) − u1(qu2 − p′u2′)
        = 0.

Now, if Wp ≡ 0 then, since p ≠ 0, we have

    u1′u2 − u2′u1 = 0,   i.e.   u1′/u1 = u2′/u2,

which can be integrated to give ln u1 = ln u2 + c, i.e. u1 = e^c u2, which
implies that u1 and u2 are proportional, contradicting the fact that they are
linearly independent.
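Although it is not part of the text, the lemma is easy to check numerically for a particular choice of coefficients (here p(x) = 1 + x and q(x) = x are my own choices): integrating two independent solutions of −(pu′)′ + qu = 0, i.e. u″ = (qu − p′u′)/p, and evaluating Wp along [0, 1] gives a constant.

```python
# Numerical sanity check of Lemma 14.1 (an illustration, not a proof), with
# my own choices p(x) = 1 + x, q(x) = x: integrate two independent solutions
# of u'' = (q u - p' u') / p and check that
# W_p(u1, u2)(x) = p(x)[u1'(x) u2(x) - u2'(x) u1(x)] stays constant.
import numpy as np
from scipy.integrate import solve_ivp

p  = lambda x: 1.0 + x
dp = lambda x: 1.0            # p'(x)
q  = lambda x: x

def rhs(x, y):
    u, du = y
    return [du, (q(x) * u - dp(x) * du) / p(x)]

xs = np.linspace(0.0, 1.0, 50)
sol1 = solve_ivp(rhs, (0, 1), [1.0, 0.0], t_eval=xs, rtol=1e-10, atol=1e-12)
sol2 = solve_ivp(rhs, (0, 1), [0.0, 1.0], t_eval=xs, rtol=1e-10, atol=1e-12)

u1, du1 = sol1.y
u2, du2 = sol2.y
Wp = p(xs) * (du1 * u2 - du2 * u1)
print(Wp.min(), Wp.max())     # both close to Wp(0) = p(0)[0*0 - 1*1] = -1
```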

Theorem 14.2 Suppose that u1(x) and u2(x) are two linearly independent
non-zero solutions of

    −(d/dx)(p(x) dy/dx) + q(x)y = 0,

with u1(a) = 0 and u2(b) = 0. Set C = Wp(u1, u2)⁻¹ and define

    G(x, y) = { C u1(x)u2(y)   a ≤ x < y
              { C u2(x)u1(y)   y ≤ x ≤ b;     (14.2)

then the solution of L[u] = f is given by

    u(x) = ∫_a^b G(x, y)f(y) dy.     (14.3)
a

Proof Writing (14.3) out in full we have

    u(x) = C u2(x) ∫_a^x u1(y)f(y) dy + C u1(x) ∫_x^b u2(y)f(y) dy.

Now,

    u′(x) = C u2(x)u1(x)f(x) + C u2′(x) ∫_a^x u1(y)f(y) dy − C u1(x)u2(x)f(x)
                + C u1′(x) ∫_x^b u2(y)f(y) dy
          = C u2′(x) ∫_a^x u1(y)f(y) dy + C u1′(x) ∫_x^b u2(y)f(y) dy,

and then since C Wp(u1, u2) = 1,

    u″(x) = C u2′(x)u1(x)f(x) + C u2″(x) ∫_a^x u1(y)f(y) dy − C u1′(x)u2(x)f(x)
                + C u1″(x) ∫_x^b u2(y)f(y) dy
          = −f(x)/p(x) + C u2″(x) ∫_a^x u1(y)f(y) dy + C u1″(x) ∫_x^b u2(y)f(y) dy.

Now we have L[u] = −pu″ − p′u′ + qu, and since L is linear with L[u1] =
L[u2] = 0 it follows that

    L[u] = f(x)

as claimed.

We can now define a linear operator on L²(a, b) by the right-hand side of
(14.3):

    T f(x) = ∫_a^b G(x, y)f(y) dy.

Since G is symmetric (see (14.2)), it follows that T is self-adjoint (see
Example 11.17), and we have proved that such a T is compact in Proposition
13.9.

If L[u] = λu, then u = T(λu). Since T is linear,

    L[u] = λu   ⟺   u = λTu.

If we can show that λ ≠ 0 then the eigenvectors (or eigenfunctions) of
the ODE boundary value problem L[u] = λu (which is just (14.1)) will be
exactly those of the operator T (for which Tu = λ⁻¹u). Since the eigenvectors
of T form an orthonormal basis for L²(a, b) (we will see that Ker(T) = {0}),
the same is true of the eigenfunctions of the Sturm-Liouville problem.

Theorem 14.3 The eigenfunctions of the problem (14.1) form a complete
orthonormal basis for L²(a, b).

Proof We show first that λ = 0 is not an eigenvalue of (14.1), i.e. there is
no non-zero u for which L[u] = 0. Indeed, if L[u] = 0 then we have

    0 = (L[u], u) = ∫_a^b −(pu′)′ū + q|u|² dx
      = ∫_a^b p|u′|² + q|u|² dx − [p(x)u′(x)ū(x)]_a^b
      = ∫_a^b p|u′|² + q|u|² dx.

Since p > 0 on [a, b], it follows that u′ = 0 on [a, b], and so u must be
constant on [a, b]. Since u(a) = 0, it follows that u ≡ 0.

We now show that Ker(T) = {0}. Indeed, Tf is the solution of L[u] = f,
i.e. f = L[Tf]. So if Tf = 0, it follows that f = 0.

So φ is an eigenfunction of the SL problem iff it is an eigenvector for T:

    L[φ] = λφ   ⟺   Tφ = (1/λ)φ.

Since G(x, y) is symmetric and bounded, it follows from Examples 10.9
and 11.8 that T is a bounded self-adjoint operator; Proposition 13.9 shows
that T is also compact. It follows from Theorem 13.13 that T has a set of
orthonormal eigenfunctions {φ_j} with

    Tφ_j = μ_j φ_j,

and since Ker(T) = {0} the argument of Corollary 13.14 shows that these
form an orthonormal basis for L²(a, b).

Comparing this with our original problem we obtain an infinite set of
eigenfunctions {φ_j} with corresponding eigenvalues λ_j = μ_j⁻¹. Note that
now λ_j → ∞ as j → ∞. As above, the eigenfunctions {φ_j} form an
orthonormal basis of L²(a, b).
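The correspondence λ_j = μ_j⁻¹ can be seen numerically in a concrete case (my choice of example: p = 1, q = 0 on (0, 1), whose Green's function appears in Exercise 14.4 below). Discretising T on a grid and computing the largest eigenvalues of the resulting symmetric matrix recovers μ_k = 1/(kπ)², i.e. λ_k = (kπ)².

```python
# Sketch (my concrete example, p = 1, q = 0 on (0,1)): discretise
# T f(x) = \int_0^1 G(x,y) f(y) dy with G(x,y) = x(1-y) for x <= y and
# y(1-x) for y <= x, and check that the largest eigenvalues of the
# resulting symmetric matrix approximate mu_k = 1/(k pi)^2.
import numpy as np

n = 500
h = 1.0 / n
x = (np.arange(n) + 0.5) * h                   # midpoint grid on (0,1)
X, Y = np.meshgrid(x, x, indexing="ij")
K = np.where(X <= Y, X * (1.0 - Y), Y * (1.0 - X)) * h   # Nystrom matrix

mu = np.sort(np.linalg.eigvalsh(K))[::-1]      # eigenvalues, largest first
for k in range(1, 4):
    print(mu[k - 1], 1.0 / (k * np.pi) ** 2)   # the pairs nearly coincide
```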

This has immediate applications for Fourier series. Indeed, if we consider

    −d²u/dx² = λu    with u(0) = 0, u(1) = 0,

which is (14.1) with p = 1, q = 0, it follows that the eigenfunctions of this
equation will form a basis for L²(0, 1). These are easily found by elementary
methods, and are

    {sin kπx}_{k=1}^∞.

It follows that, appropriately normalised, these functions form an orthonor-
mal basis for L²(0, 1), i.e. that any f ∈ L²(0, 1) can be expanded in the
form

    f(x) = Σ_{k=1}^∞ α_k sin kπx.

Thus begins the theory of Fourier series...
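A quick numerical check of this expansion (my own illustration, not part of the text): the normalised eigenfunctions e_k(x) = √2 sin kπx are orthonormal in L²(0, 1), and the partial sums of the sine expansion of, say, f(x) = x(1 − x) converge to f in the L² norm.

```python
# Numerical sketch (my example): check orthonormality of
# e_k(x) = sqrt(2) sin(k pi x) on (0,1), and L^2 convergence of the sine
# expansion of f(x) = x(1-x), using midpoint-rule quadrature.
import numpy as np

N = 20000
h = 1.0 / N
x = (np.arange(N) + 0.5) * h                   # midpoint grid on (0,1)
f = x * (1.0 - x)

def e(k):
    return np.sqrt(2.0) * np.sin(k * np.pi * x)

# Gram matrix of the first four e_k: should be (close to) the identity
Gram = np.array([[h * np.sum(e(j) * e(k)) for k in range(1, 5)]
                 for j in range(1, 5)])
print(np.round(Gram, 6))

# partial sum of f = sum_k (f, e_k) e_k up to k = 49
s = sum(h * np.sum(f * e(k)) * e(k) for k in range(1, 50))
err = np.sqrt(h * np.sum((f - s) ** 2))        # L^2(0,1) error
print(err)                                     # small: the expansion converges
```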

Exercise 14.4 Show that the solution of −d²u/dx² = f with u(0) = u(1) = 0
is given by

    u(x) = ∫_0^1 G(x, y)f(y) dy,

where

    G(x, y) = { x(1 − y)   0 ≤ x < y
              { y(1 − x)   y ≤ x ≤ 1.
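This is not a solution to the exercise, but the formula is easy to sanity-check numerically: taking f ≡ 1 (my choice), the solution of −u″ = 1 with u(0) = u(1) = 0 is u(x) = x(1 − x)/2, and integrating G(x, ·) against f reproduces it.

```python
# Numerical sanity check of the Green's function above (not a proof): for
# f = 1 the solution of -u'' = f, u(0) = u(1) = 0, is u(x) = x(1-x)/2,
# and u(x) = \int_0^1 G(x,y) f(y) dy should agree with it.
import numpy as np

def G(x, y):
    return np.where(x < y, x * (1.0 - y), y * (1.0 - x))

N = 10000
h = 1.0 / N
y = (np.arange(N) + 0.5) * h                  # midpoint grid on (0,1)
for xv in [0.1, 0.25, 0.5, 0.9]:
    u = h * np.sum(G(xv, y))                  # integral of G(x, .) against f = 1
    print(xv, u, xv * (1.0 - xv) / 2)         # last two values agree
```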
