Wavelets: A Primer

Editorial, Sales, and Customer Service Office
A K Peters, Ltd.
63 South Avenue
Natick, MA 01760
Copyright 1998 by A K Peters, Ltd.
All rights reserved. No part of the material protected by this copyright notice
may be reproduced or utilized in any form, electronic or mechanical,
including photocopying, recording, or by any information storage and
retrieval system, without written permission from the copyright owner.
Library of Congress Cataloging-in-Publication Data
Blatter, Christian, 1935-
[Wavelets. English]
Wavelets: a primer / Christian Blatter.
p. cm.
ISBN 1-56881-095-4
1. Wavelets (Mathematics) I. Title.
QA403.3.B5713 1998
515' .2433-DC21 98-29959
CIP
Rev.
Originally published in the German language by Friedr. Vieweg & Sohn Verlagsgesellschaft mbH,
D-65189 Wiesbaden, with the title "Wavelets. Eine Einführung. 1st Edition"
(c) by Friedr. Vieweg & Sohn Verlagsgesellschaft mbH, Braunschweig/Wiesbaden, 1998
Printed in the United States of America
02 01 00 99 98 10 9 876 543 2 1
Contents
Preface
Read Me
1 Formulating the problem
1.1 A central theme of analysis
1.2 Fourier series
1.3 Fourier transform
1.4 Windowed Fourier transform
1.5 Wavelet transform
1.6 The Haar wavelet
2 Fourier analysis
2.1 Fourier series
2.2 Fourier transform on R
2.3 The Heisenberg uncertainty principle
2.4 The Shannon sampling theorem
3 The continuous wavelet transform
3.1 Definitions and examples
3.2 A Plancherel formula
3.3 Inversion formulas
3.4 The kernel function
3.5 Decay of the wavelet transform
4 Frames
4.1 Geometrical considerations
4.2 The general notion of a frame
4.3 The discrete wavelet transform
4.4 Proof of theorem (4.10)
5 Multiresolution analysis
5.1 Axiomatic description
5.2 The scaling function
5.3 Constructions in the Fourier domain
5.4 Algorithms
6 Orthonormal wavelets with compact support
6.1 The basic idea
6.2 Algebraic constructions
6.3 Binary interpolation
6.4 Spline wavelets
References
Index
Preface
This book is neither the grand retrospective view of a protagonist nor an
encyclopedic research monograph, but the approach of a working mathemati-
cian to a subject that has stimulated approximation theory and inspired users
in many diverse domains of applied mathematics, unlike any other since the
invention of the Fast Fourier Transform. As a matter of fact, I had only set
out to draw up a one-semester course for our students at ETH Zurich that
would introduce them to the world of wavelets ab ovo; indeed, such a course
hadn't been given here before. But in the end, thanks to encouraging com-
ments from colleagues and people in the audience, the present booklet came
into existence.
I had imagined that the target group for this course would be the following:
students of mathematics in their senior year or first graduate year, having
the usual basic knowledge of analysis, carrying around a knapsack full of
convergence theorems, but without any practical experience, say, in Fourier
analysis. In the back of my mind I also entertained the hope that some people
from the field of engineering would attend the course. In fact, they did, and
afterward I found out that exactly these students had profited the most from
my efforts.
The contents of the book can be summarized as follows: The introductory
Chapter 1 presents a tour d'horizon over various ways of signal representation;
it is here that the Haar wavelet makes its first appearance. Chapter 2 serves
primarily as a tutorial of Fourier analysis (without proofs); it is supplemented
by the discussion of two theorems that define ultimate limits of signal theory:
the Heisenberg uncertainty principle and Shannon's sampling theorem. In
Chapter 3 we are finally ready for a treatment of the continuous wavelet
transform, and Chapter 4, entitled "Frames", describes a general framework
(pun not intended) allowing us to handle the continuous and the discrete
wavelet transforms in a uniform way. All this being accomplished, we finally
arrive at the main course: multiresolution analysis with its fast algorithms in
Chapter 5 and the construction of orthonormal wavelets with compact support
in Chapter 6. The book ends with a brief treatment of spline wavelets in
Section 6.4.
Given the small size of this treatise, some things had to be left out: biorthogo-
nal systems, wavelets in two dimensions, and a detailed description of applica-
tions, to name a few. Furthermore, I decided to leave distributions out of the
picture. This means that there aren't any Sobolev spaces, nor a discussion of
pointwise convergence, etc., of wavelet approximations, and the Paley-Wiener
theorem is not at our disposal either. Fortunately, there is an elementary ar-
gument coming to our rescue in proving that the Daubechies wavelets indeed
have compact support.
When putting the material together, I made generous use of the work of other
authors. In the first place, of course, I borrowed from Ingrid Daubechies'
incomparable "Ten Lectures on Wavelets" [D], to some lesser extent from [L],
which at the time (winter 1996-97) was the only wavelet book available in
German, and from Kaiser's "Friendly Guide to Wavelets" [K]. Concerning
further sources of inspiration, I refer the reader to the list of references at the
end of the book. I have deliberately kept this list short and have refrained
from reprinting the more extensive, but not updated, lists of references given
in [D] or [L]. A substantial and at the same time very recent (1998) list of
references can be found in [Bu], which, by the way, takes an approach to
wavelets that is fairly similar to ours.
Let me comment briefly on the figures. Most graphs of mathematically defined
functions were first computed with the help of Mathematica, then output as
Plot, and finally finished in the graphics environment "Canvas". A few
of the figures, e.g., Figures 3.7 and 6.1, were generated by means of "Think
Pascal" as bitmaps, then printed out in letter format and finally reduced to
the required width photographically.
This book was published first in German by Vieweg-Verlag under the title
"Wavelets - Eine Einführung". I am grateful to Klaus Peters for consenting
to give the present English edition a chance, and to his collaborators
for streamlining the schoolboy's English of my raw translation.
Christian Blatter
Zurich, 14 August, 1998
Read Me
This book is divided into six chapters, and each chapter is subdivided into a
certain number of sections. Formulas that are used again at some later point
are numbered sectionwise in parentheses: (1). When referring to formula (5)
of the current section, we do not give the section number; 3.4.(2), however,
denotes formula (2) of Section 3.4.
New terms are printed in slanted type at their place of definition or first
appearance; as a rule there is no further warning of the "Watch out: Here
comes a definition!" type. The exact spot where a term is defined is referenced
in the index at the back of the book.
Propositions and theorems are numbered by chapters, the boldface marker
(4.3) denoting the third theorem in Chapter 4. Theorems are usually an-
nounced; in any case they are recognizable from the marker at the beginning
and from their text being printed in slanted type. The two corners ⌜ and
⌟ denote the beginning and the end of a proof.
Circled numbers mark the beginning of examples, some of them of a more
explanatory nature, some of them describing famous animals created by means
of the general theory. The numbering of examples begins anew in each section;
the empty circle ○ marks the end of an example.
A family of objects $c_\alpha$ over the index set $I$ (called an array for short) is
designated by
$$(c_\alpha \mid \alpha \in I) =: c .$$
$1_A$ denotes the characteristic function of the set $A$ and $1_X$ the identity
mapping of the vector space $X$.
If $e$ resp. $a_1, \ldots, a_r$ are given vectors of a vector space $X$, then
$\langle e \rangle$ resp. $\operatorname{span}(a_1, \ldots, a_r)$ denote the subspace
spanned by $e$ resp. the $a_k$.
$\mathbb{R}^* := \mathbb{R} \setminus \{0\}$ is the multiplicative group of real numbers.
$\mathbb{R}^2_- := \mathbb{R}^* \times \mathbb{R}$ is the $(a, b)$-plane "cut up into two halves". Note that in
corresponding figures the a-axis is drawn vertically and the b-axis horizontally,
as explained in Section 1.5.
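The symbol $\int$ without upper and lower limits always denotes the integral over
all of $\mathbb{R}$ with respect to the Lebesgue measure:
$$\int f(t)\,dt := \int_{-\infty}^{\infty} f(t)\,dt .$$
In an analogous manner, sums $\sum_k$ without upper and lower limits are meant
to be sums over all of $\mathbb{Z}$:
$$\sum_k a_k := \sum_{k=-\infty}^{\infty} a_k .$$
The Fourier transform is defined as
$$\hat f(\xi) := \frac{1}{\sqrt{2\pi}} \int f(t)\, e^{-i\xi t}\,dt ,$$
and the Fourier inversion formula, sometimes called the inverse Fourier
transform, reads
$$f(t) = \frac{1}{\sqrt{2\pi}} \int \hat f(\xi)\, e^{i\xi t}\,d\xi .$$
By $j_a^N f$ we denote the $N$-jet (the Taylor polynomial of order $N$) of $f$ at the
point $a \in \mathbb{R}$, given by
$$j_a^N f(t) := \sum_{k=0}^{N} \frac{f^{(k)}(a)}{k!}\,(t - a)^k .$$
The symbol $\mathbf{e}_\alpha$ denotes the function
$$\mathbf{e}_\alpha\colon\ \mathbb{R} \to \mathbb{C} , \qquad t \mapsto e^{i\alpha t} .$$
If $f$ is a complex-valued function defined on $X := \mathbb{R}$ or $X := \mathbb{Z}$, then $a(f)$ and
$b(f)$ denote the left and right ends of the support of $f$, respectively:
$$a(f) := \inf\{x \in X \mid f(x) \ne 0\}, \qquad b(f) := \sup\{x \in X \mid f(x) \ne 0\} .$$
A time signal is simply a function $f\colon \mathbb{R} \to \mathbb{C}$.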
1 Formulating the problem
1.1 A central theme of analysis
The approximation, resp. the representation, of arbitrary known or unknown
functions f by means of special functions can be viewed as a central theme
of analysis. "Special functions" are functions taken from a catalogue, e.g.,
monomials $t \mapsto t^k$, $k \in \mathbb{N}$, or functions of the form $t \mapsto e^{ct}$, $c \in \mathbb{C}$ a parameter.
As a rule special functions are well understood, very often they are easy to
compute and have interesting analytical properties; in particular, they tend to
incorporate and re-express the evident or hidden symmetries of the situation
under consideration.
In order to fix ideas we consider a (given or unknown) function
$$f\colon \mathbb{R} \to \mathbb{C} ,$$
assuming that $f$ is sufficiently many times differentiable in a neighbourhood
$U$ of the point $a \in \mathbb{R}$. Such a function can be approximated within $U$ by its
Taylor polynomials
$$j_a^n f(t) := \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(t - a)^k \tag{1}$$
(jets for short), up to an error that can be quantitatively controlled, and under
suitable assumptions the function $f$ is actually represented by its Taylor series,
meaning that one has
$$f(t) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}\,(t - a)^k$$
for all $t$ in a certain neighbourhood $U' \subset U$.
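To make formula (1) concrete, here is a minimal sketch that evaluates a jet from a list of derivative values. The function name `jet` and the choice of the exponential function as test case are assumptions made for illustration only; they do not come from the text.

```python
import math

def jet(derivs_at_a, a, t):
    """Evaluate the Taylor polynomial sum_k f^(k)(a) / k! * (t - a)^k.

    derivs_at_a[k] is assumed to hold the k-th derivative of f at the point a.
    """
    return sum(d / math.factorial(k) * (t - a) ** k
               for k, d in enumerate(derivs_at_a))

# Hypothetical test function: f = exp, whose derivatives at a = 0 all equal 1.
n = 10
approx = jet([1.0] * (n + 1), a=0.0, t=1.0)
error = abs(approx - math.e)   # the controllable error mentioned in the text
```

For this particular function the error shrinks rapidly as n grows; for functions whose higher derivatives are hard to obtain, the recursive computation of the coefficients is the expensive part.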
The general setup in this realm is the following: Depending on the particular
situation at hand one chooses a family $(e_\alpha \mid \alpha \in I)$ of basis functions $t \mapsto e_\alpha(t)$;
the index set $I$ may be a discrete or a "continuous" set. An approximation of
a more or less arbitrary function $f$ by means of the $e_\alpha$ then has the form
$$f(t) \approx \sum_{k=1}^{N} c_k\, e_{\alpha_k}(t)$$
with coefficients $c_k$ to be determined, and a representation of $f$ has the form
$$f(t) \equiv \sum_{\alpha \in I} c_\alpha\, e_\alpha(t) ; \tag{2}$$
or it appears as an integral over the index set $I$:
$$f(t) \equiv \int_I d\alpha\; c(\alpha)\, e_\alpha(t) . \tag{3}$$
In the ideal case there are exactly as many basis functions at our disposal as
are needed to represent any function $f$ of the considered kind in exactly one
way in the form (2) resp. (3). The operation that assigns to a given function $f$
the corresponding coefficient vector or array $(c_\alpha \mid \alpha \in I)$ is called the analysis
of $f$ with respect to the family $(e_\alpha \mid \alpha \in I)$. The coefficients $c_\alpha$ are particularly
easy to determine if the basis functions $e_\alpha$ are orthonormal (see below). In
the case of the Taylor expansion (1) the coefficients have to be determined
by computing recursively ever higher derivatives of $f$; and in the case of the
so-called Tchebycheff approximation there are no formulas for the coefficients
$c_k$, even though they are uniquely determined.
The inverse operation that takes a given coefficient vector $(c_\alpha \mid \alpha \in I)$ as input
and returns the function itself as output is called the synthesis of $f$ by means
of the $e_\alpha$.
① Suppose that the $x$-interval $[0, L]$ is modeling a heat conducting rod $S$
(see Figure 1.1). The spatially and temporally variable temperature within
this rod is described by a function $(x, t) \mapsto u(x, t)$ that satisfies the
one-dimensional heat equation
$$\frac{\partial u}{\partial t} = a^2\, \frac{\partial^2 u}{\partial x^2} ; \tag{4}$$
here $a > 0$ is a material constant. The initial temperature $x \mapsto f(x)$ along
the rod is given, as is the boundary condition that the two ends of the rod
are kept at temperature 0 at all times. Along the rod, i.e., for $0 < x < L$,
there is no heat exchange with the surroundings. The task is to determine
the resulting temperature fluctuation $u(\cdot, \cdot)$ within the rod.
In connection with problems of this kind the following procedure (called
separation of variables) has turned out to be useful: One begins by determining
functions $U(\cdot, \cdot)$ of the special form
$$(x, t) \mapsto U(x, t) = X(x)\, T(t) ,$$
satisfying (4) and vanishing at the two ends of the rod. A collection of
functions fulfilling these requirements is given by
$$U_k(x, t) := \exp\!\left(-\frac{k^2 \pi^2 a^2}{L^2}\, t\right) \sin\frac{k \pi x}{L} .$$
Figure 1.1 (axes $x$ and $u$; rod $S$)
Since the conditions imposed on the $U_k$ are linear and homogeneous, it follows
that arbitrary linear combinations
$$u(x, t) := \sum_{k=1}^{\infty} c_k\, U_k(x, t)$$
of the $U_k$ are in their turn solutions of the heat equation vanishing at the ends
of the rod. Therefore we shall have the solution of the original problem in
our hands, if we are able to specify the coefficients $c_k$ in such a way that the
initial condition $u(x, 0) \equiv f(x)$ is fulfilled as well. This means that we would
have to guarantee the identity
$$\sum_{k=1}^{\infty} c_k \sin\frac{k \pi x}{L} \equiv f(x) \qquad (0 < x < L) . \tag{5}$$
It is at this point that the question arises as to whether the function system
$$e_k(x) := \sin\frac{k \pi x}{L} \qquad (k \in \mathbb{N}_{\ge 1})$$
is "complete", that is to say, is rich enough to allow the representation of an
arbitrarily given function $f\colon\ ]0, L[\ \to \mathbb{R}$ in the form (5). The answer to this
question is yes, as is proven in the theory of Fourier series (see below). ○
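The following sketch carries out analysis and synthesis for the rod numerically. It assumes the standard sine-series coefficient formula $c_k = \frac{2}{L}\int_0^L f(x) \sin\frac{k\pi x}{L}\,dx$, which is not stated in the text above; the function names, the midpoint quadrature and the initial temperature $f(x) = x(1 - x)$ are likewise hypothetical choices for illustration.

```python
import math

def sine_coefficients(f, L, n_terms, n_quad=2000):
    """Approximate c_k = (2/L) * integral_0^L f(x) sin(k pi x / L) dx for k = 1..n_terms."""
    dx = L / n_quad
    xs = [(i + 0.5) * dx for i in range(n_quad)]          # midpoint rule nodes
    return [2.0 / L * sum(f(x) * math.sin(k * math.pi * x / L) for x in xs) * dx
            for k in range(1, n_terms + 1)]

def temperature(coeffs, a, L, x, t):
    """Partial sum of u(x, t) = sum_k c_k U_k(x, t) with the U_k given in the text."""
    return sum(c * math.exp(-(k * math.pi * a / L) ** 2 * t)
               * math.sin(k * math.pi * x / L)
               for k, c in enumerate(coeffs, start=1))

# Hypothetical data: rod of length 1, a = 1, initial temperature f(x) = x (1 - x).
L, a = 1.0, 1.0
f = lambda x: x * (1.0 - x)
c = sine_coefficients(f, L, n_terms=25)
u_start = temperature(c, a, L, x=0.5, t=0.0)   # should be close to f(0.5) = 0.25
u_later = temperature(c, a, L, x=0.5, t=0.1)   # the rod has cooled down by then
```

Because every $U_k$ decays like $\exp(-k^2 \pi^2 a^2 t / L^2)$, the high-frequency terms die out first and a handful of coefficients suffices for $t > 0$.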
As we move along, another issue enters the picture: If a function f is an-
alyzed or synthesized not only in thought and for theoretical purposes, but
concretely, as in the analysis of ECGs or of long term climate changes, then
for the numerical work a more or less complete discretization becomes almost
indispensable. The discretization refers, on the one hand, to the collection of
basis functions (in case the latter has not been discrete from the outset) and,
on the other hand, to the space parametrized by the independent variable t
(resp. x, x, ... ): The values of all occurring (given or unknown) functions are
evaluated, measured or computed only at the discrete places
$$t_k := kT \qquad (k \in \mathbb{Z},\ T > 0 \text{ fixed}) .$$
The fact that the function values f(t) themselves are represented in the com-
puter in a "quantized" form only, instead of with "infinite precision", does
not concern us here.
Wavelets are novel systems of basis functions used for the representation,
filtration, compression, storage, and so on of any "signals"
$$f\colon \mathbb{R}^n \to \mathbb{C} .$$
In the case $n = 1$, the variable $t$ represents time, and one works with time
signals $f\colon \mathbb{R} \to \mathbb{C}$. The case $n = 2$ refers to image processing; a concrete
example is the representation and storage of millions upon millions of finger-
prints in the FBI's computer, see [1]. We shall approach these wavelets by
recalling briefly some facts about Fourier series and the Fourier transform. A
more complete tutorial of Fourier analysis is given in Sections 2.1 and 2.2.
1.2 Fourier series
Fourier series concern $2\pi$-periodic functions
$$f\colon \mathbb{R} \to \mathbb{C} , \qquad f(t + 2\pi) \equiv f(t) ,$$
equivalently written as $f\colon \mathbb{R}/2\pi \to \mathbb{C}$. The "natural" domain of definition of
such a function is the unit circle $S^1$ in the complex $z$-plane, see Figure 1.2.
On $S^1$ the infinitely many modulo $2\pi$ equivalent points $t + 2k\pi$, $k \in \mathbb{Z}$, appear
as a single point $z = e^{it}$.
Figure 1.2
Expressing the monomial power functions
$$z \mapsto z^k$$
in terms of the variable $t$, one arrives at the trigonometrical basis functions
or pure harmonics
$$\mathbf{e}_k\colon\ t \mapsto e^{ikt} \qquad (k \in \mathbb{Z}) .$$
(Unfortunately there is no universally used and accepted notation for these
functions; so we shall give the boldface e a try here.)
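The natural scalar product for functions $f\colon \mathbb{R}/2\pi \to \mathbb{C}$ is given by
$$\langle f, g \rangle := \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, \overline{g(t)}\,dt . \tag{1}$$
The $\mathbf{e}_k$ are orthonormal:
$$\langle \mathbf{e}_j, \mathbf{e}_k \rangle = \delta_{jk} ;$$
in particular, they are linearly independent. From general principles of linear
algebra it follows that
$$c_k := \langle f, \mathbf{e}_k \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, e^{-ikt}\,dt$$
is the "$k$-th coordinate of $f$ with respect to the basis $(\mathbf{e}_k \mid k \in \mathbb{Z})$", and
$$S_N := \sum_{k=-N}^{N} c_k\, \mathbf{e}_k \qquad \text{resp.} \qquad S_N(t) := \sum_{k=-N}^{N} c_k\, e^{ikt}$$
is the orthogonal projection of $f$ onto the subspace
$$U_N := \operatorname{span}(\mathbf{e}_k \mid |k| \le N) \tag{2}$$
formed by all linear combinations of the $\mathbf{e}_k$ having $|k| \le N$. Being the foot of
the perpendicular from $f$ to $U_N$ (see Figure 1.3), the point $S_N$ is nearest to
$f$ among all points of $U_N$. In saying this we have tacitly assumed that in our
function space the distance function
$$d(f, g) := \|f - g\| := \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(t) - g(t)|^2\,dt \right)^{1/2}$$
corresponding to the scalar product (1) has been adopted.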
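The statements about orthonormality and best approximation can be checked numerically. The sketch below replaces the integral in (1) by a midpoint sum; the helper names (`inner`, `e`, `projection`, `dist`) and the sample function $f(t) = |t|$ are assumptions made for illustration.

```python
import cmath
import math

def inner(f, g, n=512):
    """Scalar product (1): (1/2pi) * integral over [-pi, pi] of f * conj(g), midpoint rule."""
    dt = 2 * math.pi / n
    ts = [-math.pi + (i + 0.5) * dt for i in range(n)]
    return sum(f(t) * complex(g(t)).conjugate() for t in ts) * dt / (2 * math.pi)

def e(k):
    """Pure harmonic e_k: t -> exp(i k t)."""
    return lambda t: cmath.exp(1j * k * t)

def projection(f, N):
    """Orthogonal projection S_N of f onto the span of the e_k with |k| <= N."""
    coeffs = {k: inner(f, e(k)) for k in range(-N, N + 1)}
    return lambda t: sum(c * cmath.exp(1j * k * t) for k, c in coeffs.items())

def dist(f, g):
    """Distance d(f, g) = ||f - g|| belonging to the scalar product above."""
    h = lambda t: f(t) - g(t)
    return math.sqrt(inner(h, h).real)

f = lambda t: abs(t)                   # hypothetical sample function on [-pi, pi]
gram_same = inner(e(3), e(3))          # close to 1
gram_diff = inner(e(3), e(5))          # close to 0
err_2 = dist(f, projection(f, 2))
err_8 = dist(f, projection(f, 8))      # not larger than err_2, since U_2 lies in U_8
```

Enlarging $N$ can only decrease the distance, because the subspaces $U_N$ are nested.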
Figure 1.3
This has been the easy part. But what is crucial here, and much more difficult
to prove as well, is that the system $(\mathbf{e}_k \mid k \in \mathbb{Z})$ is complete: Any reasonable
function $f\colon \mathbb{R}/2\pi \to \mathbb{C}$ is actually represented by its (infinite) Fourier series
$$\sum_{k=-\infty}^{\infty} c_k\, e^{ikt} ,$$
meaning that in some sense, to be made precise in each individual case, one
has the convergence $\lim_{N \to \infty} S_N = f$ resp.
$$f(t) = \sum_{k=-\infty}^{\infty} c_k\, e^{ikt} . \tag{3}$$
We shall look into this in more detail in Section 2.1 below.
What can be said about "discretization" here? The system $(\mathbf{e}_k \mid k \in \mathbb{Z})$ is
already discrete: There are only integer frequencies $k$. In numerical
computations one is of course restricted to a finite frequency range $[-N \,..\, N]$; thus
instead of representations (3) there are only approximations $S_N$.
If one discretizes with respect to the time variable t as well, one arrives at the
so-called discrete Fourier transform. The latter is a purely algebraic matter,
since convergence questions no longer enter the picture. The discrete Fourier
transform has received an enormous boost by the invention of fast algorithms
(Cooley & Tukey, 1965; but there are predecessors). The key phrase here
is fast Fourier transform, FFT for short. We shall see that wavelets are
structured for fast algorithms right from the outset. This was a key ingredient
in making wavelets a powerful tool in various application fields within a small
number of years.
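To illustrate the difference between a direct evaluation and a fast algorithm, here is a minimal sketch of a naive discrete Fourier transform next to a recursive radix-2 FFT. The unnormalized convention with kernel $e^{-2\pi i jk/N}$ is an assumption (the text has not fixed one at this point), and the sample data are hypothetical.

```python
import cmath

def dft(y):
    """Naive discrete Fourier transform, O(N^2) operations."""
    n = len(y)
    return [sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft(y):
    """Recursive radix-2 fast Fourier transform; len(y) must be a power of 2."""
    n = len(y)
    if n == 1:
        return list(y)
    even, odd = fft(y[0::2]), fft(y[1::2])     # transforms of the two half-length arrays
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

samples = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]   # hypothetical data, N = 8
slow, fast = dft(samples), fft(samples)
max_diff = max(abs(s - f) for s, f in zip(slow, fast))  # both should agree up to rounding
```

The recursion halves the problem at every level, which brings the operation count down from order $N^2$ to order $N \log N$.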
The "Fourier transform" that assigns to a $2\pi$-periodic function $f$ its array of
Fourier coefficients $(c_k \mid k \in \mathbb{Z})$ treats $f$ as an "overall object" (Gesamtobjekt
in German). In particular, there is no localization on the time axis. In an
array $(y_k \mid 0 \le k < N)$,
$$y_k := f\!\left(\frac{2\pi k}{N}\right) \qquad (0 \le k < N) ,$$
i.e., a simple table of values of $f$, information about $f$ is stored in a way that
allows easy and precise localization of individual features (e.g., local maxima,
turning points, and so on) on the time axis. In marked contrast to this
characteristic of a table $(y_k \mid 0 \le k < N)$, each individual Fourier coefficient $c_k$
contains information about $f$ originating from the entire domain of definition
of $f$. One cannot decide, merely from looking at the $c_k$, where $f$ has, e.g., its
maximum or a jump discontinuity.
Figure 1.4
① The jump function
$$f(t) := \begin{cases} \dfrac{\pi - t}{2} & (0 < t < 2\pi) \\[4pt] 0 & (t = 0) \end{cases} , \qquad f(t + 2\pi) = f(t) \quad \forall t$$
(Figure 1.4) can be developed into a Fourier series as follows:
$$f(t) = \sum_{k=1}^{\infty} \frac{1}{k} \sin(kt) .$$
The given series actually represents $f$ at all points $t$, but it is converging
"uniformly poorly": Since the coefficients $1/k$ decay so slowly when $k \to \infty$, at
each point $t \not\equiv 0 \pmod{2\pi}$ one is dependent on the oscillations of $k \mapsto \sin(kt)$
to obtain convergence. Furthermore, the well known Gibbs phenomenon rears
its ugly head: Any partial sum $S_N$ of the Fourier series overshoots the maximal
function value $\pi/2$ at some point $t_N$ near 0 by about 18%.
Now if, e.g., the Fourier analysis of the function $g$ shown in Figure 1.5 is at
stake, then, because of the jump discontinuity at $t_0$, this function has a Fourier
series that is everywhere poorly convergent to begin with; furthermore, one
cannot see from looking at the $c_k$ where the jump is, even though it may be
that this is the only interesting thing about $g$. ○
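The overshoot can be observed directly. The sketch below evaluates the partial sum $S_N(t) = \sum_{k=1}^{N} \sin(kt)/k$ at $t = \pi/(N+1)$, the first zero of its derivative to the right of 0; the value $N = 200$ and the variable names are arbitrary choices for illustration.

```python
import math

def partial_sum(n_terms, t):
    """S_N(t) = sum_{k=1}^{N} sin(k t) / k for the jump function of the example."""
    return sum(math.sin(k * t) / k for k in range(1, n_terms + 1))

N = 200
t_peak = math.pi / (N + 1)          # first local maximum of S_N to the right of 0
overshoot_ratio = partial_sum(N, t_peak) / (math.pi / 2)
```

The ratio stays clearly above 1 however large $N$ is chosen; only the location $t_N$ of the peak moves toward the jump.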
Figure 1.5
If one approximates a function f by means of wavelets then there will defi-
nitely be some kind of localization; moreover, this localization is, so to speak,
tailored to measure: Transient features (short-lived details) of f, like, e.g.,
jump discontinuities or marked peaks can easily be localized from looking at
the wavelet coefficients, whereas longtime trends of f are stored in deeper lay-
ers of the coefficient hierarchy and are automatically represented in a smaller
scale; as a consequence they are less precisely localized on the time axis.
1.3 Fourier transform
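The Fourier transform on $\mathbb{R}$, FT for short, has as its goal the analysis and synthesis
of functions
$$f\colon \mathbb{R} \to \mathbb{C} ,$$
using the pure harmonics
$$\mathbf{e}_\alpha\colon\ \mathbb{R} \to \mathbb{C} , \qquad t \mapsto e^{i\alpha t} , \tag{1}$$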
as basis functions, but this time of arbitrary real frequencies $\alpha$. In other
words, the index set is $\mathbb{R}$ and so is isomorphic (i.e., structurally equal) to
the domain of definition of the functions $f$ under consideration. The relevant
scalar product now is
$$\langle f, g \rangle := \int_{-\infty}^{\infty} f(t)\, \overline{g(t)}\,dt$$
(cf. 1.2.(2)); it is the decisive structural element of the so-called $L^2$-theory
(for details see Section 2.2). Since the functions $\mathbf{e}_\alpha$ do not lie in $L^2$, it makes
no sense to ask whether they are orthonormal: The scalar product $\langle \mathbf{e}_\alpha, \mathbf{e}_\beta \rangle$
is not defined. Nevertheless it is allowed and makes sense for a great many
functions $f \in L^2$ to define a "coefficient vector" $(\hat f(\alpha) \mid \alpha \in \mathbb{R})$ by means of
the formula
$$\hat f(\alpha) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\alpha t}\,dt .$$
The function
$$\hat f\colon\ \mathbb{R} \to \mathbb{C} , \qquad \alpha \mapsto \hat f(\alpha)$$
is called the Fourier transform, sometimes also the spectral function, of the
function $f$. An individual value $\hat f(\alpha)$ may be viewed as the complex amplitude
by which the frequency $\alpha$ is present in the signal $f$. Again in this case there
is no localization with respect to the variable $t$: One cannot read off from the
value $\hat f(\alpha)$ at which time the "note" $\alpha$ was played.
In the field of image processing one would like to make use of the
two-dimensional Fourier transform. Think, e.g., of a picture of a landscape. In
different areas of the image you see totally different textures (a forest, a newly
plowed field, a lake, clouds, and so on). These textures cause the occurrence
of characteristic patterns in the Fourier transform $\hat f\colon \mathbb{R}^2 \to \mathbb{C}$ of this image.
Again, from looking at the function $\hat f$ you might perhaps be able to tell which
kinds of textures occur in the original picture, but definitely not where in
the picture these textures manifest themselves. For this reason one does not
subject the picture as a whole to the Fourier transform. Instead one divides
it into small squares that can be considered homogeneously textured, then
these small squares are individually Fourier transformed.
Simultaneous localization with respect to both variables $t$ and $\alpha$ in a single
data array is available only within specific bounds - and these bounds cannot
be transgressed even with wavelets. An "oscillation impulse" manifest in the
time interval $[t_0 - h, t_0 + h]$ (and $\equiv 0$ outside) and having a frequency range
$[\alpha_0 - \delta, \alpha_0 + \delta]$, where $h > 0$, $\delta > 0$ are arbitrarily small, does not exist. The
quantitative expression of this fundamental fact is the Heisenberg uncertainty
principle
$$\int_{-\infty}^{\infty} t^2\, |f(t)|^2\,dt \cdot \int_{-\infty}^{\infty} \alpha^2\, |\hat f(\alpha)|^2\,d\alpha \;\ge\; \frac{1}{4}\, \|f\|^4 \tag{2}$$
(see Section 2.3). Here the first factor on the left is a certain measure for
the "spread" of the graph of $f$ over the $t$-axis, and the second factor is a
measure for the "spread" of the graph of $\hat f$ over the $\alpha$-axis (Figure 1.6). The
inequality (2) says that the graphs of $f$ and $\hat f$ cannot simultaneously have a
single marked peak at the origin. For the constant multiples of the functions
$t \mapsto \exp(-ct^2)$, $c > 0$, and only for these, one has equality in (2).
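The equality case of (2) can be checked numerically. In the sketch below all integrals, including the Fourier transform itself, are replaced by Riemann sums on truncated grids; the grid sizes, the value $c = 1/2$ and the helper names are assumptions chosen for illustration.

```python
import cmath
import math

def riemann(values, step):
    """Riemann sum of sampled values with constant step width."""
    return sum(values) * step

def fourier_transform(f, ts, dt, alpha):
    """hat f(alpha) = (1/sqrt(2 pi)) * integral of f(t) exp(-i alpha t) dt, as a Riemann sum."""
    return riemann([f(t) * cmath.exp(-1j * alpha * t) for t in ts], dt) / math.sqrt(2 * math.pi)

c = 0.5                                        # hypothetical choice of the parameter c > 0
f = lambda t: math.exp(-c * t * t)

dt, da = 0.02, 0.05
ts = [-10.0 + i * dt for i in range(1001)]     # time grid on [-10, 10]
alphas = [-8.0 + i * da for i in range(321)]   # frequency grid on [-8, 8]

spread_t = riemann([t * t * abs(f(t)) ** 2 for t in ts], dt)
spread_a = riemann([a * a * abs(fourier_transform(f, ts, dt, a)) ** 2 for a in alphas], da)
norm_sq = riemann([abs(f(t)) ** 2 for t in ts], dt)

lhs = spread_t * spread_a                      # left-hand side of (2)
rhs = 0.25 * norm_sq ** 2                      # (1/4) * ||f||^4
```

For the Gaussian both sides agree up to discretization error; replacing `f` by another rapidly decaying function should make `lhs` strictly larger than `rhs`.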
Figure 1.6
For reasonable functions $f\colon \mathbb{R} \to \mathbb{C}$, the Fourier inversion formula
$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \hat f(\alpha)\, e^{i\alpha t}\,d\alpha$$
resp.
$$f = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\alpha\; \hat f(\alpha)\, \mathbf{e}_\alpha \tag{3}$$
is valid. This formula represents resp. synthesizes the function $f$ as an
(integral) superposition of pure harmonics (1). It is of course fundamental in
theoretical considerations, but for practical purposes it produces more than
one really needs: A real-world signal is negligibly weak or even identically
zero outside of some $t$-interval $I$. The user knows this from the start, and he
is not interested at all in synthesizing the signal outside of the interval $I$. But
the inversion formula (3) produces a function value at all points of the $t$-axis;
in particular, it goes to great pains to generate "identically 0" on $\mathbb{R} \setminus I$ by
mutual complete cancellation of the $\mathbf{e}_\alpha$ - and nobody is looking.
1.4 Windowed Fourier transform
It may be clear from what we have said in the last two sections that we are
looking out for a "data type" that allows easy extraction or retrieval of both
temporal (resp. spatial) and frequency information about a signal $f\colon \mathbb{R} \to \mathbb{C}$.
A musical score is a data type having just these characteristics: If you can
read music and are given a musical score, then you can see at a glance at
which instances of time which frequencies are activated.
The so-called windowed or short time Fourier transform, abbreviated WFT
resp. STFT, constitutes a continuous version of such a data type. However,
the simultaneous localization (within the fundamental bounds, of course) with
respect to the time and frequency variables comes at the price of an enormous
redundancy, insofar as now the index set of the resulting data vector
$$(Gf(\alpha, s) \mid (\alpha, s) \in \mathbb{R} \times \mathbb{R})$$
is two-dimensional, although a function of only one real variable $t$ is encoded.
Figure 1.7: $y = g(t)$
The WFT can be described as follows: One begins by choosing a window
function $g\colon \mathbb{R} \to \mathbb{R}_{\ge 0}$ once and for all. The function $g$ should have "total
mass" 1 and be more or less concentrated around $t = 0$, which means that it
should have, e.g., a compact support containing 0 (see Figure 1.7) or at least
a maximum at $t = 0$ and fast decay when $|t| \to \infty$. A widely used window is
given by the function
$$g(t) := N_{\sigma,0}(t) := \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{t^2}{2\sigma^2}\right) , \tag{1}$$
$\sigma$ being a fixed parameter.¹ The corresponding transform is often called
Gabor transform, since Dennis Gabor (Nobel prize in physics, 1971) was one
of the first to use the WFT systematically; in particular, he remarked that
the window $N_{\sigma,0}$ is in some sense optimal.
For a given $s \in \mathbb{R}$, the function
$$g_s\colon\ t \mapsto g(t - s)$$
represents the window $g$, translated by the amount $s$ (to the right, if $s > 0$).
We retain the functions 1.3.(1) as our basic oscillation patterns and define the
window transform
$$Gf\colon\ \mathbb{R} \times \mathbb{R} \to \mathbb{C} , \qquad (\alpha, s) \mapsto Gf(\alpha, s)$$
of a function $f$ by
$$Gf(\alpha, s) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, g(t - s)\, e^{-i\alpha t}\,dt . \tag{2}$$
If we choose, e.g., the window function $g$ shown in Figure 1.7, then formula
(2) may be interpreted as follows: The value $Gf(\alpha, s)$ represents to some
extent the complex amplitude by which the pure harmonic $\mathbf{e}_\alpha$ is present in
$f$ during the time interval $[s - h, s + h]$. If during this interval, among others,
the "note" $\alpha$ is played, then $|Gf(\alpha, s)|$ will be large.
Since the information about f is represented redundantly in Gf, there are
several inversion formulas for the windowed Fourier transform $f \mapsto Gf$, see,
e.g., [K], Section 2.3. For practical-numerical purposes, one of course has to
resort to a discrete version of the WFT, using equidistant subdivisions both
on the t- and the a-axis.
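A discrete version of formula (2) can be sketched as follows, with the integral replaced by a Riemann sum and the window taken to be the normal density of standard deviation $\sigma$. The truncation of the window at $8\sigma$, the step width and the two-note test signal are assumptions made for illustration.

```python
import cmath
import math

def gauss_window(t, sigma=1.0):
    """Window (1): normal density with mean 0 and standard deviation sigma."""
    return math.exp(-t * t / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def wft(f, alpha, s, sigma=1.0, dt=0.01):
    """Riemann-sum approximation of Gf(alpha, s) from formula (2)."""
    half_width = 8.0 * sigma                  # the window is negligible beyond this range
    n = int(round(2 * half_width / dt))
    total = 0j
    for i in range(n + 1):
        t = s - half_width + i * dt
        total += f(t) * gauss_window(t - s, sigma) * cmath.exp(-1j * alpha * t)
    return total * dt / math.sqrt(2 * math.pi)

# Hypothetical signal: the "note" 5 is played for t < 0, the "note" 15 afterwards.
signal = lambda t: math.cos(5 * t) if t < 0 else math.cos(15 * t)

early = abs(wft(signal, alpha=5.0, s=-5.0))   # note 5 is sounding around t = -5
late = abs(wft(signal, alpha=5.0, s=5.0))     # note 5 is silent around t = +5
```

The modulus is large only where the window sits over a stretch of the signal that actually contains the frequency in question, which is exactly the musical-score behaviour described above.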
It is a consequence of the constant window width $2h$ (resp. $\approx 2\sigma$ in the case
(1)) that for large $|\alpha|$ the "key pattern" $t \mapsto g(t - s)\, e^{-i\alpha t}$ has the shape shown
in Figure 1.8. Now, a given signal might contain just a couple of oscillations
of frequency $\alpha$ within the interval $[s - h, s + h]$, and these will take place in
a very small part of this interval. Therefore $Gf(\alpha, s)$ will have a respectable
value, but the "key pattern" shown in Figure 1.8 will not be able to detect
the location of such an oscillatory impulse with the desired precision.
¹ The official symbol for this function is $N(0, \sigma)$, but the symbol we are proposing
here is in accordance with the notation 1.5.(1) commonly used in wavelet theory.
Figure 1.8: $y = g(t - s) \cos(\alpha t)$, $|\alpha|$ large
At the lower end of the audible range, i.e., for small frequencies $|\alpha|$, things are
even worse. In this case the "key pattern" has the shape shown in Figure 1.9.
If the signal $f$ possesses a (perhaps highly interesting) oscillatory component
of such a small characteristic frequency $|\alpha|$, then the transformation $G$ will not
detect it: The window in Figure 1.9 is too narrow to encompass even a single
full turnaround of such a low frequency.
Figure 1.9
1.5 Wavelet transform
In order to make clear what is so decisively new about the wavelet transform,
WT for short, as compared to the FT and WFT described in the preceding
sections, we are going to repeat resp. summarize the main features of the
latter as follows:
The Fourier transform of functions $f\colon \mathbb{R} \to \mathbb{C}$ uses a special analyzing
function $t \mapsto e^{it}$ that is distinguished by a host of interesting analytical
properties. This analyzing function is dilated by the real frequency
parameter $\alpha$ and appears as $t \mapsto e^{i\alpha t}$ in the transformation formulas.
The windowed Fourier transform uses the same analyzing function $t \mapsto e^{it}$
as well as its dilated versions. There is an additional element in the form
of a movable but otherwise rigid window function $g$. Note that there is a
certain freedom in choosing this window function.
Figure 1.10
The basic model of the wavelet transform works on complex-valued time signals
$f\colon \mathbb{R} \to \mathbb{C}$, also. One begins by choosing a suitable analyzing wavelet,
also called the mother wavelet or simply a wavelet, $x \mapsto \psi(x)$. Figure 1.10
shows a $\psi$ having compact support $[0, L]$. Dilated and translated copies of
the mother wavelet $\psi$ we shall call wavelet functions. The "key patterns"
used for the analysis of time signals $f$ will be just such wavelet functions, and
the following notation shall be adopted for them:
$$\psi_{a,b}(t) := \frac{1}{|a|^{1/2}}\, \psi\!\left(\frac{t - b}{a}\right) . \tag{1}$$
The double index $(a, b)$ appearing here runs through the set $\mathbb{R}^* \times \mathbb{R}$ or
$\mathbb{R}_{>0} \times \mathbb{R}$. The variable $a$ is called the scaling parameter, and $b$ is the translation
parameter. The factor $1/|a|^{1/2}$ in (1) is not crucial and is more of a technical
nature; it is thrown in to guarantee $\|\psi_{a,b}\| = 1$.
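The effect of the normalizing factor can be checked numerically. The sketch below assumes a mother wavelet with $\|\psi\| = 1$; as a stand-in we use a Haar-type step function, which has not been introduced in the text up to this point. The helper names, the integration range and the sample pairs $(a, b)$ are hypothetical as well.

```python
import math

def haar(x):
    """Hypothetical mother wavelet with norm 1: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def wavelet_function(psi, a, b):
    """psi_{a,b}(t) = |a|^(-1/2) * psi((t - b) / a), as in formula (1)."""
    return lambda t: psi((t - b) / a) / math.sqrt(abs(a))

def l2_norm(func, lo=-10.0, hi=10.0, n=100000):
    """Midpoint-rule approximation of the L2 norm of func on [lo, hi]."""
    dt = (hi - lo) / n
    return math.sqrt(sum(abs(func(lo + (i + 0.5) * dt)) ** 2 for i in range(n)) * dt)

# Narrow, wide, shifted and reflected copies of the mother wavelet.
norms = {(a, b): l2_norm(wavelet_function(haar, a, b))
         for (a, b) in [(1.0, 0.0), (0.25, 3.0), (4.0, -2.0), (-2.0, 1.0)]}
```

All four norms come out as 1 up to rounding: the dilation changes the width of the window by the factor $|a|$, and the factor $|a|^{-1/2}$ compensates for it.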
Figure 1.11: $y = \psi((t - b)/a)$, $0 < a \ll 1$; the window has width $aL$
As may be gathered from Figures 1.11 and 1.12, the width of the "key pattern"
resp. "key window" grows proportionally to $|a|$, and for all values of $a$ and $b$
this window presents a single and complete copy of the analyzing wavelet. Of
the following facts one should take note right at the beginning:
Scaling parameter values $a$ of modulus $0 < |a| \ll 1$ result in very narrow
windows and serve for the precisely localized registration of high
frequency resp. transient phenomena present in the signal $f$.
Scaling parameter values $a$ of modulus $|a| \gg 1$ result in very wide
windows and serve for the registration of slow phenomena resp. long wave
oscillatory components of $f$.
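Due to everything that has been said so far it is now clear that the wavelet
transform
$$Wf\colon\ \mathbb{R}^* \times \mathbb{R} \to \mathbb{C} , \qquad (a, b) \mapsto Wf(a, b)$$
of a time signal $f$ is defined as follows:
$$Wf(a, b) := \langle f, \psi_{a,b} \rangle = \frac{1}{|a|^{1/2}} \int_{-\infty}^{\infty} f(t)\, \overline{\psi\!\left(\frac{t - b}{a}\right)}\,dt .$$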
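The localization property claimed for small $|a|$ can be seen in a small experiment. The sketch below approximates $Wf(a, b)$ by a midpoint sum for a real-valued wavelet (so that the complex conjugation has no effect); the Haar-type wavelet, the step signal with its jump at $t = 0.3$ and the integration range are assumptions chosen for illustration.

```python
import math

def haar(x):
    """Hypothetical real mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def wavelet_transform(f, psi, a, b, lo=-5.0, hi=5.0, n=20000):
    """Midpoint-rule approximation of Wf(a, b) = <f, psi_{a,b}> for a real-valued psi."""
    dt = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * dt
        total += f(t) * psi((t - b) / a)
    return total * dt / math.sqrt(abs(a))

step = lambda t: 1.0 if t >= 0.3 else 0.0      # hypothetical signal with a jump at t = 0.3

at_jump = wavelet_transform(step, haar, a=0.1, b=0.25)   # window [0.25, 0.35] straddles the jump
away = wavelet_transform(step, haar, a=0.1, b=0.60)      # window [0.60, 0.70] sees a constant
```

With this narrow window the coefficient is noticeably different from zero only when the window covers the jump; over a stretch where the signal is constant the positive and negative halves of the wavelet cancel.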
Figure 1.12: $y = \psi((t - b)/a)$, $a \gg 1$; the window has width $aL$
To be completely correct we should write W"'! instead of Wi, for the resulting
data array
(W!(a,b) I (a, b) E]R* X]R)
depends on the mother wavelet 'l/J chosen at the beginning. In all cases where
there is only one mother wavelet at stake, we are allowed to do without the
full notation W",.
The domain of definition of the transform W! is the (a, b)-plane, "cut into two
halves". Since the variable b denotes a translation along the time axis, it has
become standard in wavelet theory to draw the b-axis horizontally and the
a-axis vertically, contrary to the usual disposition of the axes corresponding
to the first and second factors of a cartesian product.
We shall see in Section 3.3 that for the wavelet transform there is again an
inversion formula. This formula represents the original signal $f$ as a "linear
combination" of the basis functions $\psi_{a,b}$, with the values $Wf(a, b)$ of the
wavelet transform serving as coefficients. In order to set up such a formula
one needs a characteristic "volume element" on the index set $\mathbb{R}^* \times \mathbb{R}$. If the
functions $\psi_{a,b}$ are given by (1), then one has
$$f = \frac{1}{C_\psi} \int_{\mathbb{R}^* \times \mathbb{R}} \frac{da\,db}{|a|^2}\; Wf(a,b)\,\psi_{a,b}$$
with a constant $C_\psi$ depending only on the chosen $\psi$ (Theorem (3.7)).
It is a fundamental feature of the setup described here that on the scaling
axis (the wavelet analog of the frequency axis) a logarithmic scale becomes
prevalent. Such an experience is maybe familiar to the reader from acoustics
resp. from music: Equal tone steps correspond to equal frequency ratios $\omega_2/\omega_1$
(e.g., 5 : 4 for the major third) and not to equal frequency differences $\omega_2 - \omega_1$.
This fact becomes particularly evident when as our next step we discretize
the index set $\mathbb{R}_{>0} \times \mathbb{R}$: We choose a zoom step $\sigma > 1$ (the
value $\sigma := 2$ is most commonly used here) and consider from here on only the
discrete set of dilation factors
$$a_r := \sigma^r \qquad (r \in \mathbb{Z}).$$
Note that larger numbers $r \in \mathbb{Z}$ correspond to larger dilation factors $a_r > 0$.
With regard to the translation parameter $b$, we cannot simply choose a base
step $\beta > 0$ and then have a single grid of translation values $b_k := k\beta$ $(k \in \mathbb{Z})$
as in the case of the Fourier transform. The truth is that at finer scales,
which is to say: for smaller values of $r$, we need a correspondingly smaller
translational step size as well, if everything is to come out right. Concretely,
on the level $a_r$ in the $(a, b)$-plane ($a$ scaled vertically, $b$ horizontally!) we select
as grid values the numbers
$$b_{r,k} := k\,\sigma^r \beta \qquad (k \in \mathbb{Z})$$
(see Figure 4.4). This means that consecutive $b_{r,k}$'s have a distance $\sigma^r\beta$ from
each other. A moment's reflection shows that this choice is in fact quite
natural; in particular, it allows in an optimal way the precise localization of
high frequency and/or transient phenomena occurring in the processed time
signal $f$.
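As a small illustration of this discretization, the grid points in the $(a, b)$-plane can be tabulated directly. This is only a sketch: the function name `dyadic_grid` and the dictionary layout are assumptions made for the example, with the zoom step $\sigma$ and the base step $\beta$ as parameters.

```python
def dyadic_grid(sigma, beta, r_values, k_values):
    """Map each index pair (r, k) to the grid point (a_r, b_rk) with
    a_r = sigma**r and b_rk = k * sigma**r * beta."""
    return {(r, k): (sigma ** r, k * sigma ** r * beta)
            for r in r_values for k in k_values}

grid = dyadic_grid(sigma=2.0, beta=1.0, r_values=range(-2, 3), k_values=range(-4, 5))

# Translation step on each level r: distance between consecutive b values.
steps = {r: grid[(r, 1)][1] - grid[(r, 0)][1] for r in range(-2, 3)}
```

With $\sigma = 2$ and $\beta = 1$ the step size doubles from one level to the next coarser one, so fine scales are sampled densely in $b$ and coarse scales sparsely.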
In this way a discrete group of self-similarities of $\mathbb{R}$ on the one hand and
between $\psi$ and its scaled versions on the other hand has been established.
The systematic exploitation of this group leads to the so-called multiresolution
analysis and to the fast algorithm that goes with it. The latter, called fast
wavelet transform, FWT for short, serves for the computation of the wavelet
coefficients
$$c_{r,k} := Wf(a_r, b_{r,k})$$
and likewise for the reconstruction (i.e., synthesis) of the signal $f$ from the
stored data $c_{r,k}$.
In choosing the analyzing wavelet $\psi$ one has great freedom, this being in
marked contrast to the rigid framework of Fourier analysis. Essentially it is
enough to make sure that $\psi$ belongs to $L^1 \cap L^2$ and that $\int_{-\infty}^{\infty} \psi(t)\,dt = 0$.
Depending on circumstances and desirabilities, things can always be set up in
such a way that
$\psi$ has compact support,
the wavelet functions (the "key patterns")
$$\psi_{r,k}(t) := \sigma^{-r/2}\,\psi\Bigl(\frac{t - k\,\sigma^r\beta}{\sigma^r}\Bigr)$$
belonging to the described discretization are orthonormal,
fast algorithms are available,
$\psi$ is so and so many times differentiable,
the wavelet coefficients have optimal decay when $r \to -\infty$,
and so on.
As we proceed through the chapters of this book we shall meet several "famous"
mother wavelets $\psi$, some of them represented by simple formulas,
others given in the form of theoretical constructs; and in each case we shall
present a numerical resp. graphical realization of the wavelet under discussion
as well. These are, in order of appearance (at the left the number of the
corresponding figure is shown):
1.13  Haar wavelet
3.4   Mexican hat
3.5   Modulated Gaussian
3.9   Derivative of the Gaussian
4.8   Daubechies-Grossmann-Meyer wavelet corresponding to $\sigma = 2$, $\beta = 1$
5.4   Meyer wavelet
6.4   Daubechies wavelet ${}_3\psi$
6.6   Daubechies wavelet ${}_2\psi$
6.9   Battle-Lemarié wavelet corresponding to $n = 1$
6.11  Battle-Lemarié wavelet corresponding to $n = 3$
The central aim of this book is to present the mathematical foundations of
wavelet analysis in a form readily accessible to the student. Nevertheless it
is appropriate and perhaps even mandatory to take a quick glance at the
applications of this new theory, too.
Fourier analysis is a mighty tool within mathematics as well as in applied
fields. Within mathematics it is primarily used in the theory of (linear) partial
differential equations. A toy model for this kind of application is given by
Example 1.1.(1). Outside mathematics Fourier theory comes to the fore in the
modeling, description, and analysis of any spatially or temporally periodic
phenomena, to mention the most obvious. The Fourier transform draws its
power from the overwhelming invariance and symmetry properties of the pure
harmonics $e_\alpha$.
In marked contrast to the above, the invention of wavelets is directly tied to
practical applications (to the analysis of seismic waves, as a matter of fact).
The analytic properties of wavelets are decidedly more intricate than those
of the pure harmonics $e_\alpha$; as a consequence their use within mathematics,
i.e., as a tool for the working mathematician, has been somewhat limited (but
things are beginning to change). A nice example of this type can be found in
[M], Chapter 5.
The two applied fields where wavelets have been used with the greatest suc-
cess are signal processing and image processing. Signal processing is con-
cerned with time signals, so it makes use of the "one-dimensional" wavelets
whose theory is presented in this book. In the realm of image processing two-
dimensional wavelets are used. The theory of these two-dimensional wavelets
is in part a straightforward "squaring" of the one-dimensional theory, but it
also contains other elements; it is not treated in the present book.
Under the term processing we subsume the analysis, "purification", filtering,
efficient storage, retrieval, and transmission of time signals resp. image data,
and above all their compression. In information theory an image is viewed
as the result of a random process, in the ultimate limit as a bitmap without
any correlation between adjacent pixels. But in a real world image (or audio
document) there are typically regions of high information density and other
regions (e.g., cloudless sky) where there is almost no pictorial content. Now
assume that the given image is subject to a (discrete) wavelet transform,
resulting in a large amount of data $c_{r,k}$, say. Then it is easy to filter out those
coefficients $c_{r,k}$ whose values transcend a certain threshold. Only these $c_{r,k}$
are actually stored resp. transmitted. In this way (and now we are coming to
the essence of the whole set-up) in each region of the image exactly as much
image content per unit of area is expressed as is in fact present there. That
is to say, by dynamically adapting the image resolution to the changing local
information density one can achieve respectable data compression ratios, the
whole with no noticeable loss in overall image quality.
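The thresholding idea can be sketched in a few lines. Everything here is an assumption made for illustration: a one-dimensional list stands in for a row of image data, the discrete orthonormal Haar transform (pairwise sums and differences divided by $\sqrt{2}$) stands in for the wavelet transform, and the names `haar_analysis`, `haar_synthesis` and `compress` are hypothetical.

```python
import math

def haar_analysis(x):
    """Discrete orthonormal Haar transform of a list whose length is a power of 2.
    Returns the coarsest average and the detail coefficients, finest level first."""
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(u - v) / math.sqrt(2) for u, v in pairs])
        approx = [(u + v) / math.sqrt(2) for u, v in pairs]
    return approx[0], details

def haar_synthesis(coarse, details):
    """Inverse of haar_analysis."""
    approx = [coarse]
    for level in reversed(details):
        rebuilt = []
        for s, w in zip(approx, level):
            rebuilt += [(s + w) / math.sqrt(2), (s - w) / math.sqrt(2)]
        approx = rebuilt
    return approx

def compress(x, threshold):
    """Keep only the detail coefficients whose modulus exceeds the threshold."""
    coarse, details = haar_analysis(x)
    kept = [[w if abs(w) > threshold else 0.0 for w in level] for level in details]
    stored = sum(1 for level in kept for w in level if w != 0.0)
    return haar_synthesis(coarse, kept), stored

# A mostly flat signal ("cloudless sky") with one isolated feature.
signal = [1.0] * 12 + [5.0, 1.0, 1.0, 1.0]
restored, stored = compress(signal, threshold=1e-9)
max_error = max(abs(u - v) for u, v in zip(signal, restored))
```

For this signal only 4 of the 15 detail coefficients survive the threshold, and all of them sit over the isolated feature; the flat region costs nothing to store.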
The reader who wants to go more deeply and in more detail into the various
applications of wavelets is referred to the volumes [Be], [C'] and [D'], each of
which contains a collection of essays by various authors, or to [L], Chapter 3.
The computational and programming aspects of signal and image processing
using wavelets are extensively treated in [W]. As a novel descriptional
tool wavelets have found their way into various subdomains of mathematical
physics as well; in this regard see, e.g., [K], Part II.
We conclude this section with a very brief historical note. Predecessors of
wavelets, albeit without the melodious name, have been in existence since 1910
(see the next section). Over the course of subsequent decades several commu-
nication theorists have attempted to overcome the aforementioned drawbacks
of Fourier analysis resp. the WFT by various wavelet-like constructions. We
should also mention a famous integral formula by Calderón (1964) which in a
way is the godfather of the inversion formula for the wavelet transform. The
main breakthrough, however, came only in the late 1980s with the axiomatic
description of multiresolution analysis (by Mallat and Meyer [12]) and with
the construction of orthonormal wavelets having compact support, by Ingrid
Daubechies [3]. For a more detailed presentation of this course of events,
accompanied by an extensive bibliography (complete as of 1992), we refer the
interested reader to the standard treatise [D].
1.6 The Haar wavelet
Many important aspects of wavelet theory can already be observed and com-
prehended by studying the most simple wavelet of all, the so-called Haar
wavelet. To do this we don't need any profound preparations; on the con-
trary, it is possible to begin with our bare hands. It goes without saying that
the Haar wavelet will show up time and again in later chapters and so will
serve as a handy example throughout the book.
In 1910 the mathematician Alfred Haar was the first to describe a complete
orthonormal system for the Hilbert space $L^2 := L^2(\mathbb{R})$, and in so doing he
proved that this space is isomorphic to the space
$$l^2 := \Bigl\{ (c_k \mid k \in \mathbb{N}) \Bigm| \sum_k |c_k|^2 < \infty \Bigr\}$$
of square-summable sequences. Nowadays, resp. in connection with the matter
under discussion, we view the basis functions given by Haar as dilated and
translated copies of a certain mother wavelet $\psi$, as described in the foregoing
Section 1.5.
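The Haar wavelet is the following simple step function:
$$\psi(x) := \begin{cases} 1 & (0 \le x < \tfrac12) \\ -1 & (\tfrac12 \le x < 1) \\ 0 & (\text{otherwise}) \end{cases}$$
(see Figure 1.13). This $\psi =: \psi_{\mathrm{Haar}}$ has compact support; furthermore, it is
obvious that
$$\int_{-\infty}^{\infty} \psi(x)\,dx = 0.$$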
Figure 1.13 The Haar wavelet
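The Haar wavelet is well localized in the time domain, but unfortunately not
continuous. The Fourier transform $\hat\psi$ of $\psi_{\mathrm{Haar}}$ is computed as follows:
$$\begin{aligned}
\hat\psi(\alpha) &= \frac{1}{\sqrt{2\pi}} \Bigl( \int_0^{1/2} e^{-i\alpha x}\,dx - \int_{1/2}^{1} e^{-i\alpha x}\,dx \Bigr) \\
&= \frac{1}{\sqrt{2\pi}}\,\frac{1}{-i\alpha} \Bigl( e^{-i\alpha x}\Big|_{x=0}^{1/2} - e^{-i\alpha x}\Big|_{x=1/2}^{1} \Bigr) \\
&= \frac{i}{\sqrt{2\pi}}\,\frac{\sin^2(\alpha/4)}{\alpha/4}\,e^{-i\alpha/2}.
\end{aligned} \tag{1}$$
The (even) function $|\hat\psi|$ has its maximum at the frequency $\alpha_0 \doteq 4.6622$, see
Figure 1.14, and decays like $1/\alpha$ when $\alpha \to \infty$. As a consequence one might
say that $\hat\psi$ is "fairly well" localized at the frequency $\alpha_0$, but the discontinuity
of $\psi_{\mathrm{Haar}}$ causes a slow decay of $\hat\psi$ at infinity.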
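The closed form (1) and the location of the maximum can be checked numerically. The sketch below makes several assumptions of its own: the integral is approximated by a midpoint rule, the maximum is searched on a grid of step 0.0005 between 0.5 and 10.5, and the function names are illustrative.

```python
import cmath
import math

def haar_hat_closed(alpha):
    """Closed form (1): i/sqrt(2 pi) * sin^2(alpha/4)/(alpha/4) * exp(-i alpha/2)."""
    x = alpha / 4.0
    return 1j / math.sqrt(2 * math.pi) * math.sin(x) ** 2 / x * cmath.exp(-1j * alpha / 2)

def haar_hat_numeric(alpha, n=20_000):
    """Midpoint-rule value of (1/sqrt(2 pi)) * integral of psi(x) e^{-i alpha x}
    over [0, 1[, where psi is +1 on the left half and -1 on the right half."""
    h = 1.0 / n
    total = 0j
    for i in range(n):
        x = (i + 0.5) * h
        sign = 1.0 if x < 0.5 else -1.0
        total += sign * cmath.exp(-1j * alpha * x)
    return total * h / math.sqrt(2 * math.pi)

# Compare the two at a sample frequency.
difference = abs(haar_hat_closed(3.0) - haar_hat_numeric(3.0))

# Locate the maximum of the modulus on a grid.
grid = [0.5 + 0.0005 * i for i in range(20_000)]
alpha0 = max(grid, key=lambda a: abs(haar_hat_closed(a)))
```

Setting the derivative of $\sin^2(x)/x$ to zero with $x = \alpha/4$ gives $\tan x = 2x$, whose first positive root $x \doteq 1.1656$ corresponds to the value of $\alpha_0$ quoted in the text.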
Figure 1.14
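Using $\psi_{\mathrm{Haar}}$ as a template we now generate the wavelet functions
$$\psi_{r,k}(t) := 2^{-r/2}\,\psi_{\mathrm{Haar}}\Bigl(\frac{t - k \cdot 2^r}{2^r}\Bigr) \qquad (r, k \in \mathbb{Z}) \tag{2}$$
(see Figure 1.15). The function $\psi_{r,k}$ has as its support the interval
$$I_{r,k} := [\,k \cdot 2^r,\ (k+1) \cdot 2^r\,[$$
of length $2^r$. Let us repeat the following here: A larger value of $r$ means longer
intervals $I_{r,k}$, and the corresponding wavelet functions $\psi_{r,k}$ are mimicking
longer "waves". The amplitude of $\psi_{r,k}$ is chosen in such a way that
$$\|\psi_{r,k}\|^2 = \int |\psi_{r,k}(t)|^2\,dt = 1 \tag{3}$$
for all $r$ and all $k$. But in reality much more is true:
(1.1) The functions $\psi_{r,k}$ $(r \in \mathbb{Z},\ k \in \mathbb{Z})$ constitute an orthonormal basis of
the space $L^2(\mathbb{R})$.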
Figure 1.15 The wavelet function $\psi_{r,k}$, $\|\psi_{r,k}\| = 1$
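If $k \ne l$, then the functions $\psi_{r,k}$ and $\psi_{r,l}$ (same $r$!) have disjoint supports,
and
$$\langle \psi_{r,k}, \psi_{r,l} \rangle = 0 \qquad (k \ne l)$$
is an immediate consequence.
If, on the other hand, $s < r$ then $\psi_{r,k}$ is constant (equal to $-2^{-r/2}$, $0$ or $2^{-r/2}$)
on the support of $\psi_{s,l}$, see Figure 1.16. Since $\psi_{s,l}$ has mean value 0, we
therefore have
$$\langle \psi_{r,k}, \psi_{s,l} \rangle = 0 \qquad (s \ne r,\ \text{all } k, l),$$
and in conjunction with (3) it follows that the $\psi_{r,k}$ do indeed form an
orthonormal system.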
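The orthonormality just derived can be confirmed on a finite sample of indices. This is a sketch under assumptions: only the levels $r = -2, \dots, 2$ and the shifts $k = -2, \dots, 1$ are examined, and the integrals are evaluated by a midpoint rule on a dyadic grid of cell width $2^{-4}$, which is exact here because every function involved is constant on each cell. The helper names are illustrative.

```python
from itertools import product

def haar(x):
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def psi(r, k, t):
    """Wavelet function psi_{r,k}(t) = 2^(-r/2) * haar((t - k 2^r) / 2^r)."""
    return 2.0 ** (-r / 2) * haar((t - k * 2.0 ** r) / 2.0 ** r)

def inner(p, q, lo=-8.0, hi=8.0, cell=2.0 ** -4):
    """Scalar product <psi_p, psi_q> by the midpoint rule on a dyadic grid."""
    n = int((hi - lo) / cell)
    mids = (lo + (i + 0.5) * cell for i in range(n))
    return cell * sum(psi(p[0], p[1], t) * psi(q[0], q[1], t) for t in mids)

indices = [(r, k) for r in (-2, -1, 0, 1, 2) for k in (-2, -1, 0, 1)]
gram = {(p, q): inner(p, q) for p, q in product(indices, repeat=2)}
```

The resulting Gram matrix is the identity up to rounding. This checks orthonormality only; completeness is what the remainder of the proof establishes.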
Figure 1.16
Now to the essential point: We have to show that any $f \in L^2$ can be
approximated arbitrarily well (in terms of the $L^2$-metric) by finite linear combinations
of the $\psi_{r,k}$. Such linear combinations we shall call wavelet polynomials. By
general principles it is enough to consider an $f\colon \mathbb{R} \to \mathbb{C}$ of the following kind:
There is an $m \ge 0$ and an $n \ge 0$ such that
(a) $f(x) \equiv 0$ $(|x| \ge 2^m)$,
and
(b) $f$ is a step function, constant on the intervals $I_{-n,k}$ of length $2^{-n}$.
We are now going to construct a sequence $(\Psi_r \mid r \ge -n)$ of wavelet polynomials
$$\Psi_r := \sum_{j=-n+1}^{r} \Bigl( \sum_k c_{j,k}\,\psi_{j,k} \Bigr)$$
as follows: Beginning with the finest details in the signal $f$ itself we shall
extract recursively out of the remainder $f_r := f - \Psi_r$ the finest details still
present therein, the latter becoming ever more spread out as we go along.
This means in particular that in the limit $r \to \infty$ the lowest frequency parts
of $f$ are treated last, just the reverse from what one has in Fourier analysis
resp. synthesis.
We start the construction with
$$\Psi_{-n} := 0, \qquad f_{-n} := f.$$
For the induction step $r \rightsquigarrow r' := r + 1$ we make the following assumption
(which is obviously fulfilled for $r := -n$):
$A_r$: The wavelet polynomial $\Psi_r$ and the remainder $f_r$ have been determined
in such a way that
$$f = \Psi_r + f_r \tag{4}$$
and such that $f_r$ is constant on each of the intervals $I_{r,k}$. The value of
$f_r$ on $I_{r,k}$, denoted by $f_{r,k}$, is nothing other than the mean value of the
original function $f$ on the interval $I_{r,k}$.
Now we define the quantities
$$b_{r',k} := \tfrac12\,(f_{r,2k} - f_{r,2k+1}), \qquad f_{r',k} := \tfrac12\,(f_{r,2k} + f_{r,2k+1})$$
(see Figure 1.17) and put
$$\begin{aligned}
c_{r',k} &:= 2^{r'/2}\,b_{r',k} \quad \text{(cf. the normalization of the } \psi_{r,k}\text{)}, \\
\Psi_{r'} &:= \Psi_r + \sum_k c_{r',k}\,\psi_{r',k}, \\
f_{r'}(x) &:= f_{r',k} \quad (x \in I_{r',k}).
\end{aligned} \tag{5}$$
Then (4) is true with $r'$ instead of $r$, the function $f_{r'}$ is constant on the
intervals $I_{r',k}$, and $f_{r',k}$ is the mean value of $f$ on $I_{r',k}$; in other words: $A_{r'}$
is fulfilled.
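The induction step translates almost literally into code. In the following sketch the data layout is an assumption: the means $f_{r,k}$ are held in a list ordered by $k$, starting at $k = -2^{m-r}$, so that the pair $(f_{r,2k}, f_{r,2k+1})$ always occupies two consecutive list positions. The names `haar_step` and `haar_pyramid` are illustrative.

```python
def haar_step(values, r):
    """One step r -> r' = r + 1: from the means f_{r,k} to the coefficients
    c_{r',k} = 2^(r'/2) * b_{r',k} and the coarser means f_{r',k}."""
    r1 = r + 1
    half = len(values) // 2
    b = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    means = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    c = [2.0 ** (r1 / 2) * x for x in b]
    return c, means

def haar_pyramid(values, n, m):
    """values: f on the intervals I_{-n,k}, k = -2^(m+n), ..., 2^(m+n) - 1.
    Returns the coefficients c_{j,k} for j = -n+1, ..., m and the remainder f_m."""
    coeffs, current = {}, list(values)
    for r in range(-n, m):
        coeffs[r + 1], current = haar_step(current, r)
    return coeffs, current

# Example with m = 1, n = 1: eight values on intervals of length 1/2 in [-2, 2[.
coeffs, remainder = haar_pyramid([0, 0, 1, 3, 2, 2, 0, 0], n=1, m=1)
```

After $m + n$ steps the remainder consists of just two numbers, the means of $f$ over the two halves of its support.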
Figure 1.17
Beginning with $r := -n$, one arrives after $n + m$ such steps at the formula
$$f = \Psi_m + f_m = \sum_{j=-n+1}^{m} \Bigl( \sum_k c_{j,k}\,\psi_{j,k} \Bigr) + f_m.$$
The remainder $f_m$ is constant on the intervals $I_{m,k}$ of length $2^m$. Note,
however, that at most the two values
$$A := f_{m,-1} = \text{mean of } f \text{ on } [-2^m, 0[ \quad\text{and}\quad B := f_{m,0} = \text{mean of } f \text{ on } [0, 2^m[$$
are different from 0; for up to this moment all functions coming into the
picture were $\equiv 0$ for $|x| \ge 2^m$.
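We can continue our doubling procedure with the as yet unprocessed remainder
$f_m$. After $p$ further steps we have
$$f_m = \sum_{j=m+1}^{m+p} \Bigl( \sum_k c_{j,k}\,\psi_{j,k} \Bigr) + f_{m+p},$$
the function $f_{m+p}$ being constant on the two intervals $[-2^{m+p}, 0[$, $[0, 2^{m+p}[$
and $\equiv 0$ outside. Since $f$ is identically zero outside the interval $[-2^m, 2^m[$, it
follows that
$$f_{m+p,-1} = 2^{-p} A, \qquad f_{m+p,0} = 2^{-p} B.$$
Therefore we have
$$\|f_{m+p}\|^2 = 2^{m+p}\bigl( |2^{-p}A|^2 + |2^{-p}B|^2 \bigr) = 2^{m-p}\bigl( |A|^2 + |B|^2 \bigr),$$
resp.
$$\|f - \Psi_{m+p}\|^2 = 2^{m-p}\bigl( |A|^2 + |B|^2 \bigr).$$
Letting $p \to \infty$, we finally obtain
$$\lim_{p \to \infty} \|f - \Psi_{m+p}\| = 0,$$
as stated.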
This proof of theorem (1.1) is constructive in the sense that it also yields an
algorithm for the determination of the wavelet coefficients Cj,k, and, what's
more, this is not any old algorithm, but what people call a fast algorithm.
We can easily convince ourselves that this is indeed the case by counting the
number of arithmetical operations required for the complete analysis.
The original function $f$ is determined by
$$N := 2 \cdot 2^m \cdot 2^n = 2^{m+n+1}$$
individual entries. The first reduction step concerns $N/2$ pairs of intervals
and requires essentially two additions per pair (dividing by 2 does not count,
neither does the scaling (5)). Every subsequent reduction step requires half
as many operations as the preceding one; furthermore, it makes sense to stop
the process after $m + n$ steps. This means that for the determination of all
coefficients $c_{j,k}$ altogether only
$$\frac{N}{2}\Bigl( 1 + \frac12 + \frac14 + \dots \Bigr) \cdot 2 \le 2N$$
arithmetical operations are required, a number that grows linearly with the
input length. We shall see in Section 5.4 that the reconstruction of $f$, using
the $c_{j,k}$ as input, can be accomplished with about the same number of
operations. By way of comparison: The straightforward multiplication of a data
vector of length $N$ by a square matrix of order $N$ requires $O(N^2)$ arithmetical
operations.
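The operation count can be reproduced by a short tally. This sketch follows the counting convention of the text (two additions per pair, halvings and rescalings not counted); the function name `haar_operation_count` and the assumption that the input has $2^{m+n+1}$ entries are made for the example.

```python
def haar_operation_count(m, n):
    """Count the additions used by m + n reduction steps on an input of
    N = 2^(m+n+1) entries, with two additions (sum and difference) per pair."""
    N = 2 ** (m + n + 1)
    length, ops = N, 0
    for _ in range(m + n):
        pairs = length // 2
        ops += 2 * pairs
        length = pairs
    return N, ops, length

N, ops, remaining = haar_operation_count(m=3, n=5)
dense_matrix_ops = N * N  # order of magnitude for a matrix-vector product
```

For $m = 3$, $n = 5$ the tally gives 1020 additions for 512 input values, just below the bound $2N = 1024$, while a dense matrix-vector product of the same size needs on the order of 262144 multiplications.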
The most welcome algorithmic facts we have encountered here are not a
specialty of the Haar wavelet; on the contrary, they are guaranteed to us for
all mother wavelets $\psi$ admitting, as $\psi_{\mathrm{Haar}}$ does, a so-called multiresolution
analysis. For more details we refer the reader to Section 5.4: Algorithms.
We bring this section and with it the introductory chapter to a close by
pointing our finger at a certain paradox that is apt to worry the novice. It
is the following: All wavelet functions $\psi_{r,k}$ (including the ones that we shall
meet only later) have mean value 0:
$$\int_{-\infty}^{\infty} \psi_{r,k}(t)\,dt = 0 \qquad (r, k \in \mathbb{Z}).$$
How is it possible to approximate, e.g., the function f shown in Figure 1.18
by linear combinations of such functions?
Figure 1.18 The graph $y = f(t)$
Well, the approximation $\Psi_r \to f$ $(r \to \infty)$ takes place in $L^2$, in many practical
cases even pointwise, but not in $L^1$. The latter may be seen formally as follows:
The functional
$$L\colon\quad f \mapsto \int_{-\infty}^{\infty} f(t)\,dt$$
is continuous on $L^1$, and for a function $f$ as shown in Figure 1.18 one has
$L(f) > 0$. Since on the other hand for all approximating functions the equality
$L(\Psi_r) = 0$ holds, we cannot have $\lim_{r \to \infty} \Psi_r = f$ in $L^1$.
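What happens in reality can best be examined with the help of the following
simple example: We are going to approximate the function
$$f(x) := \begin{cases} 1 & (0 \le x < 1) \\ 0 & (\text{otherwise}) \end{cases}$$
by means of the procedure used for the proof of theorem (1.1). To simplify
matters we replace the wavelet functions $\psi_{r,k}$ as defined in (2) by the functions
$$\tilde\psi_{r,k}(t) := \psi_{\mathrm{Haar}}\Bigl(\frac{t - k \cdot 2^r}{2^r}\Bigr),$$
i.e., the normalization factor appearing in (2) is omitted. In addition, we
introduce the functions
$$g_r(t) := \begin{cases} 1 & (0 \le t < 2^r) \\ 0 & (\text{otherwise}) \end{cases} \qquad (r \ge 0);$$
they are related to the $\tilde\psi_{r,k}$ by means of the recursion formula
$$g_r = \tfrac12\bigl( \tilde\psi_{r+1,0} + g_{r+1} \bigr) \qquad (r \ge 0),$$
as is easily verified by looking at Figure 1.19. From the last equation it follows
by induction that
$$f = g_0 = \sum_{j=1}^{r} \frac{1}{2^j}\,\tilde\psi_{j,0} + \frac{1}{2^r}\,g_r \qquad (r \ge 0).$$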
Here the sum on the right hand side is just the approximating wavelet
polynomial $\Psi_r$ appearing in the proof of theorem (1.1), whereas the term $g_r/2^r$
is constant on the interval $I_{r,0}$ and therefore represents the remainder $f_r$. We
now can see the following: The function $f$ being approximated by the wavelet
polynomials $\Psi_r$ has the interval $[0, 1[$ as its support, but the supports of the
approximating functions $\Psi_r$ are ever more spread out over the $t$-axis. The
discrepancy that "for mean value reasons" necessarily has to persist between
$f$ and the $\Psi_r$ is smeared out over a larger and larger domain: $\Psi_r$ has the
value $1 - \frac{1}{2^r}$ on the interval $[0, 1[$ and the value $-\frac{1}{2^r}$ on the interval $[1, 2^r[$.
As was to be expected, one has
as well as
(r --+ 00) ,
the latter in agreement with (6), and finally the formula
$$\lim_{r \to \infty} |f(t) - \Psi_r(t)| = \lim_{r \to \infty} |f_r(t)| = 0 \qquad \forall t$$
is true as well, the convergence even being uniform in $t$.
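The different modes of convergence in this example can be made concrete with exact arithmetic. The sketch below takes from the text only the two values of the approximating wavelet polynomial (1 - 2^(-r) on [0, 1[ and -2^(-r) on [1, 2^r[) and computes from them, as a check of my own, the integral of the approximant and the size of the error in three norms. The function name `error_summary` is illustrative.

```python
from fractions import Fraction

def error_summary(r):
    """For the r-th approximant of the indicator function f of [0, 1[, return
    (integral of the approximant, L1 error, squared L2 error, sup error)."""
    eps = Fraction(1, 2 ** r)
    # Each piece: (interval length, value of f, value of the approximant).
    pieces = [(Fraction(1), Fraction(1), 1 - eps),        # on [0, 1[
              (Fraction(2 ** r - 1), Fraction(0), -eps)]  # on [1, 2^r[
    integral = sum(length * p for length, _, p in pieces)
    l1_error = sum(length * abs(f - p) for length, f, p in pieces)
    l2_error_sq = sum(length * (f - p) ** 2 for length, f, p in pieces)
    sup_error = max(abs(f - p) for _, f, p in pieces)
    return integral, l1_error, l2_error_sq, sup_error

results = {r: error_summary(r) for r in (1, 5, 10)}
```

For every $r$ the approximant has integral 0 and the $L^1$ error stays equal to 1, while the squared $L^2$ error and the uniform error both equal $2^{-r}$ and tend to 0. This is the resolution of the paradox in numbers.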
2 Fourier analysis
The most important tool in the construction of wavelet theory is Fourier ana-
lysis. The subsequent chapters rely on many of the well-known theorems and
formulas relating to Fourier series, as well as on a basic understanding of the
Fourier transform on $\mathbb{R}$. These ideas will be presented in the following sections
in the way of a review, so that they can readily be used later on. For the
corresponding proofs we refer the reader to the pertinent textbooks, e.g., [2], [5],
[10], [15]. In Sections 2.3 and 2.4 we give an account of the Heisenberg
uncertainty principle and of the Shannon sampling theorem. These two theorems
point to certain definitive limits of signal theory, and, in consequence, they
also play a decisive, if sometimes hidden, role in all work with wavelets.
2.1 Fourier series
As our basic environment we use the function space $L^2_\circ := L^2(\mathbb{R}/2\pi)$. The
points of this space are measurable functions $f\colon \mathbb{R} \to \mathbb{C}$ which are $2\pi$-periodic:
$$f(t + 2\pi) = f(t) \qquad \forall t \in \mathbb{R},$$
and for which the integral
$$\int_0^{2\pi} |f(t)|^2\,dt$$
is finite. To be precise, the space $L^2_\circ$ consists of equivalence classes of such
functions; two functions $f$ and $g$ differing only on a set of $t$-values of measure
0 are considered to be the same point in $L^2_\circ$. Among other things, this has
the following consequence: A function $f \in L^2_\circ$, about which nothing more
specific is known, has no definite values at individual points. Under these
circumstances, it makes no sense to speak, for example, about the value $f(0)$.
It takes some time to become familiar with this not very functionlike behavior.
On the other hand, arbitrary integrals $\int_a^b f(t)\,dt$ have a well-determined value.
The formula
$$\langle f, g \rangle := \frac{1}{2\pi} \int_0^{2\pi} f(t)\,\overline{g(t)}\,dt$$
defines a scalar product on $L^2_\circ$. To this scalar product belong the norm
$$\|f\| := \sqrt{\langle f, f \rangle} = \Bigl( \frac{1}{2\pi} \int_0^{2\pi} |f(t)|^2\,dt \Bigr)^{1/2}$$
and the distance function $d(f, g) := \|f - g\|$. With regard to this distance
function, our space $L^2_\circ$ becomes a complete metric space, which means that
Cauchy sequences of functions $f_n \in L^2_\circ$ are automatically convergent to some
point $f \in L^2_\circ$. All in all (don't forget that $L^2_\circ$ is also a vector space over $\mathbb{C}$),
the space $L^2_\circ$ is an example of a (complex) Hilbert space.
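The functions
$$e_k\colon\quad t \mapsto e^{ikt} = \cos(kt) + i \sin(kt) \qquad (k \in \mathbb{Z})$$
are $2\pi$-periodic, and because of
$$\langle e_j, e_k \rangle = \frac{1}{2\pi} \int_0^{2\pi} e^{i(j-k)t}\,dt = \begin{cases} 1 & (j = k) \\[4pt] \dfrac{1}{2\pi i (j-k)}\,e^{i(j-k)t}\Big|_0^{2\pi} = 0 & (j \ne k) \end{cases}$$
they form an orthonormal system in $L^2_\circ$.
Any $f \in L^2_\circ$ has Fourier coefficients
$$c_k := \hat f(k) := \langle f, e_k \rangle = \frac{1}{2\pi} \int_0^{2\pi} f(t)\,e^{-ikt}\,dt \qquad (k \in \mathbb{Z}). \tag{1}$$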
The $c_k$ are nothing more than the coordinates of $f$ with respect to the
orthonormal basis $(e_k \mid k \in \mathbb{Z})$, cf. the analogous formulas for vectors of the
euclidean $\mathbb{R}^n$. The following so-called Riemann-Lebesgue lemma is not very
difficult to prove:
(2.1) $\displaystyle \lim_{k \to \pm\infty} c_k = 0$.
But the central result of the $L^2_\circ$-theory is Parseval's formula. It says that the
scalar product of any two functions $f$ and $g \in L^2_\circ$ coincides with the "formal
scalar product" of the corresponding coefficient vectors $\hat f$ and $\hat g$:
(2.2) For arbitrary $f$ and $g \in L^2_\circ$, the equality
$$\sum_{k=-\infty}^{\infty} \hat f(k)\,\overline{\hat g(k)} = \langle f, g \rangle$$
is valid; in particular, one has $\sum_{k=-\infty}^{\infty} |c_k|^2 = \|f\|^2$.
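For a trigonometric polynomial the special case of Parseval's formula (sum of squared moduli of the coefficients equals the squared norm) can be verified directly. In this sketch the coefficients are chosen arbitrarily for illustration, and the normalized integral is evaluated by an equispaced sum, which is exact (up to rounding) for trigonometric polynomials of degree below half the number of samples.

```python
import cmath
import math

def norm_sq(f, samples=64):
    """(1 / 2 pi) * integral over [0, 2 pi] of |f(t)|^2, by an equispaced sum."""
    return sum(abs(f(2 * math.pi * j / samples)) ** 2 for j in range(samples)) / samples

# Arbitrary sample coefficients c_k of a trigonometric polynomial.
coeffs = {-2: 0.5j, 0: 1.0, 1: -2.0, 3: 0.25 + 0.25j}

def f(t):
    return sum(c * cmath.exp(1j * k * t) for k, c in coeffs.items())

sum_of_squares = sum(abs(c) ** 2 for c in coeffs.values())
squared_norm = norm_sq(f)
```

Both sides evaluate to 5.375 for these coefficients. Note that the factor $1/2\pi$ built into the scalar product is what makes the formula hold without any extra constant.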
Using the Fourier coefficients of $f$, one forms the series
$$\sum_{k=-\infty}^{\infty} c_k\,e^{ikt}, \tag{2}$$
called the (formal) Fourier series of $f$. Occasionally one writes
$$f(t) \rightsquigarrow \sum_{k=-\infty}^{\infty} c_k\,e^{ikt} \tag{3}$$
to express the fact that the series (2) belongs to the given function $f$. The
analogies between the geometries of $L^2_\circ$ and of $\mathbb{R}^n$ lead one to conjecture that
the series (2) "represents" the function $f$ in a certain sense. In this regard we
can say the following:
The series (2) has partial sums
$$s_N(t) := \sum_{k=-N}^{N} c_k\,e^{ikt}.$$
In Section 1.2 we remarked that $s_N$ is nothing but the orthogonal projection
of $f$ into the $(2N + 1)$-dimensional subspace
$$\operatorname{span}\{\, e_k \mid -N \le k \le N \,\}.$$
In particular the vector $s_N$ is orthogonal to $f - s_N$, see Figure 1.3. From this
observation it follows by Pythagoras' theorem that
$$\|f - s_N\|^2 = \|f\|^2 - \|s_N\|^2 = \|f\|^2 - \sum_{k=-N}^{N} |c_k|^2.$$
On account of (2.2), we therefore may conclude that $\lim_{N \to \infty} \|f - s_N\|^2 = 0$,
which is to say
(2.3) The formal Fourier series of a function $f \in L^2_\circ$ converges to $f$ in the
sense of the $L^2_\circ$-metric.
For most practical purposes one would need much more than this, namely
a theorem that guarantees the pointwise convergence of SN(t) to f(t) for
sufficiently regular functions. The deepest result in this direction is Carleson's
theorem (1966). Its proof is so difficult that it has not shown up in the usual
textbooks on Fourier series. Since we shall make use of the theorem in several
places, we state it here:
(2.4) The partial sums $s_N(t)$ of a function $f \in L^2_\circ$ converge to $f(t)$ for almost
all $t$.
The following theorems are easier to prove. In these theorems the notion
of "variation" of a function $f\colon \mathbb{R}/2\pi \to \mathbb{C}$ appears (we are talking about a
bona fide function here, not an equivalence class). This notion is explained
as follows: To an arbitrary subdivision
$$T\colon\quad 0 = t_0 < t_1 < t_2 < \dots < t_n = 2\pi$$
of the interval $[0, 2\pi]$ belongs the increment sum
$$V_T(f) := \sum_{k=1}^{n} |f(t_k) - f(t_{k-1})|.$$
(Note that the absolute values of the increments are summed here!) The
total variation $V(f)$ of the $2\pi$-periodic function $f$ is the supremum of these
sums over all subdivisions $T$. If $V(f)$ is finite, then $f$ is called a function of
bounded variation. One may consider the function $t \mapsto f(t)$ as a parametric
representation of a closed curve $\gamma$ in the complex plane. In light of this
interpretation the quantity $V(f)$ is nothing more than the length $L(\gamma)$ of this
curve. If $f$ is, e.g., piecewise continuously differentiable, then
$$V(f) = L(\gamma) = \int_0^{2\pi} |f'(t)|\,dt < \infty.$$
(2.5) Let the function $f\colon \mathbb{R}/2\pi \to \mathbb{C}$ be continuous and of bounded variation.
Then the partial sums $s_N(t)$ of the Fourier series of $f$ converge for $N \to \infty$
uniformly on $\mathbb{R}/2\pi$ to $f(t)$.
Using the idea of variation we can formulate the following "quantitative
version" of the Riemann-Lebesgue lemma:
(2.6) Let $f^{(r)}$ denote the $r$-th derivative, $r \ge 0$, of the function $f\colon \mathbb{R}/2\pi \to \mathbb{C}$.
If $f^{(r)}$ is continuous and $V(f^{(r)}) =: V$ is finite, then
$$|c_k| \le \frac{V}{2\pi\,|k|^{r+1}} \qquad \forall k \ne 0.$$
This can be summarized as follows: The smoother the function $f$, the faster
the Fourier coefficients $c_k$ are decaying with $k \to \pm\infty$. Theorem (2.6) can,
in a way, be reversed:
(2.7) If the coefficients $c_k$ obey an estimate of the form
$$|c_k| \le \frac{C}{|k|^{r+1+\varepsilon}} \qquad (k \ne 0)$$
for some $\varepsilon > 0$, then the function $f(t) := \sum_k c_k\,e^{ikt}$ is at least $r$ times
continuously differentiable.
When the series defining the function $f$ is differentiated term-by-term $p$
times, one obtains
$$\sum_k c_k\,(ik)^p\,e^{ikt}.$$
The estimate
$$\bigl| c_k\,(ik)^p\,e^{ikt} \bigr| \le \frac{C}{|k|^{1+\varepsilon+(r-p)}}$$
shows that the resulting series is uniformly convergent (to a continuous
function) as long as $p \le r$. In fact, for such $p$ the series represents $f^{(p)}$, so
altogether we have $f \in C^r$.
The phenomena described in (2.6) and (2.7) become manifest again when
we are dealing with Fourier analysis on $\mathbb{R}$ and will have decisive consequences
for the smoothness of our wavelets; we shall come back to this.
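The link between smoothness and decay can be observed on two simple $2\pi$-periodic examples. Both test functions are my own choice for illustration: a square wave (discontinuous) and a triangle wave (continuous, with a derivative of bounded variation). A direct calculation gives $|c_k| = 2/(\pi k)$ resp. $|c_k| = 2/(\pi k^2)$ for odd $k$; the sketch approximates the coefficient integrals by a midpoint rule and compares.

```python
import cmath
import math

def fourier_coefficient(f, k, n=4096):
    """c_k = (1 / 2 pi) * integral over [0, 2 pi] of f(t) e^{-ikt}, midpoint rule."""
    h = 2 * math.pi / n
    return sum(f((j + 0.5) * h) * cmath.exp(-1j * k * (j + 0.5) * h) for j in range(n)) / n

def square(t):
    """Discontinuous: +1 on [0, pi[, -1 on [pi, 2 pi[."""
    return 1.0 if t < math.pi else -1.0

def triangle(t):
    """Continuous with kinks: |t - pi| on [0, 2 pi[."""
    return abs(t - math.pi)

table = [(k, abs(fourier_coefficient(square, k)), abs(fourier_coefficient(triangle, k)))
         for k in (1, 3, 5, 15, 31)]
```

Multiplying the square-wave column by $k$ and the triangle-wave column by $k^2$ gives values close to $2/\pi \doteq 0.6366$ in every row: one extra degree of smoothness buys one extra power of $1/k$.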
We conclude this section by writing down the relevant formulas for the Fourier
series and its coefficients in case of a period of arbitrary length $L > 0$ instead
of $2\pi$. For $L := 2\pi$, these formulas must become (1) and (3), and similarly
for Parseval's formula.
(2.8) Let $f\colon \mathbb{R} \to \mathbb{C}$ be a periodic function with period $L > 0$, and suppose
$\int_0^L |f(x)|^2\,dx < \infty$. Then the formal Fourier series of $f$ is given by
$$f(x) \rightsquigarrow \sum_{k=-\infty}^{\infty} c_k\,e^{2k\pi i x/L}, \qquad c_k := \frac{1}{L} \int_0^L f(x)\,e^{-2k\pi i x/L}\,dx, \tag{4}$$
and Parseval's formula appears as
$$\sum_{k=-\infty}^{\infty} |c_k|^2 = \frac{1}{L} \int_0^L |f(x)|^2\,dx.$$
The function $g(t) := f\bigl(\frac{L}{2\pi}\,t\bigr)$ is $2\pi$-periodic, thus the relations (4) are
obtained by a simple substitution of variables. From (2.2), it follows that for
$L$-periodic functions an equality of the form
$$\sum_{k=-\infty}^{\infty} |c_k|^2 = C \int_0^L |f(x)|^2\,dx$$
must hold. The special function $f(t) :\equiv 1$ has Fourier coefficients $c_k = \delta_{0k}$
(Kronecker-delta), which leads to the conclusion $C = \frac{1}{L}$.
2.2 Fourier transform on $\mathbb{R}$
Notation: From this point on until the end of the book an integral sign $\int$
without upper and lower limits denotes the integral with respect to the Lebesgue
measure on $\mathbb{R}$, extended over the whole real axis:
$$\int f(t)\,dt := \int_{-\infty}^{\infty} f(t)\,dt.$$
Fourier analysis on $\mathbb{R}$ is governed not by one theory but by at least three
different theories, all depending on which function space is chosen as the
basic environment. All of these theories deal with functions of the type
$$f\colon\quad \mathbb{R} \to \mathbb{C}; \tag{1}$$
we shall call such functions time signals for short.
The space $L^1$ consists of the measurable functions (1) for which the integral
$$\int |f(t)|\,dt =: \|f\|_1$$
(the 1 is a notational index!) is finite; to be precise, it consists of equivalence
classes of such functions. Analogously, the space $L^2$ consists of the functions
(1) for which the integral
$$\int |f(t)|^2\,dt =: \|f\|^2$$
(the 2 is an exponent!) is finite. The third of these spaces is the so-called
Schwartz space $\mathcal{S}$; its elements are the functions (1) with the following
properties: $f$ has derivatives of all orders (in symbols, $f \in C^\infty(\mathbb{R})$), and for $|t| \to \infty$
all derivatives decay faster to 0 than any negative power $1/|t|^n$. Examples of
such functions include
$$t \mapsto \frac{1}{\cosh t}.$$
Figure 2.1 shows the inclusions that are valid between these spaces. All
wavelets of any practical significance belong to the intersection $L^1 \cap L^2$, so
the $L^1$-theory as well as the $L^2$-theory is available for them. The famous
"Mexican hat" (see Figure 3.4) even lies in $\mathcal{S}$.
Figure 2.1 Inclusions between the spaces $L^1$, $L^2$ and $\mathcal{S}$
The Fourier transform $\hat f$ of a function $f \in L^1$ is defined by the integral
$$\hat f(\xi) := \frac{1}{\sqrt{2\pi}} \int f(t)\,e^{-i\xi t}\,dt \qquad (\xi \in \mathbb{R}). \tag{2}$$
The definition of $\hat f$ is not uniform in the mathematical literature. In addition
to the integral given here, one also encounters
$$\int f(t)\,e^{-2\pi i \xi t}\,dt$$
and others. The content of the theory remains intact under such changes, of
course, but the formulas will look a little different throughout.
For a given $\xi \in \mathbb{R}$, the well-determined value $\hat f(\xi)$ may be interpreted as
follows: $\hat f(\xi)$ is the complex amplitude with which the pure oscillation $e_\xi$ is
represented in $f$. The following "Gedankenexperiment" (thought experiment)
will illustrate this: Consider a time signal $f$ whose value $f(t)$ oscillates around
the origin (not necessarily in circles) with an angular velocity approximately
$\xi$ during some length of time and is very weak the rest of the time. If $I$ is
the time interval of this encircling motion, then $\arg\bigl(f(t)\,e^{-i\xi t}\bigr)$ is more or less
constant on $I$, and the integral
$$\int_I f(t)\,e^{-i\xi t}\,dt$$
has a large absolute value, since there is little cancellation. The remaining
integral
$$\int_{\mathbb{R} \setminus I} f(t)\,e^{-i\xi t}\,dt,$$
on the other hand, will have a very small value, since the signal-reading $f(t)$ is
more or less constant on $\mathbb{R} \setminus I$, while $e_\xi$ is oscillating rapidly and harmonically
there, so that we have a great deal of cancellation during the summation
process on $\mathbb{R} \setminus I$.
(2.9) The Fourier transform $\hat f$ of a function $f \in L^1$ is automatically
continuous. Furthermore, one has
$$\lim_{\xi \to \pm\infty} \hat f(\xi) = 0.$$
The vanishing of $\hat f$ at $\pm\infty$ is nothing more than the Fourier transform version
of the Riemann-Lebesgue lemma.
We now derive a few rules for calculating the Fourier transforms of functions
related to some given f by translation, dilation and the like.
For any time signal $f$ and arbitrary $h \in \mathbb{R}$, the function $T_h f$ is defined by
$$T_h f(t) := f(t - h).$$
Figure 2.2
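If $h$ is positive, then $T_h$ translates the graph of $f$ by $h$ to the right (see Figure
2.2). Let $f$ be in $L^1$ and $g(t) := T_h f(t)$. Then the Fourier transform of $g$ is
computed as
$$\hat g(\xi) = \frac{1}{\sqrt{2\pi}} \int f(t - h)\,e^{-i\xi t}\,dt = \frac{1}{\sqrt{2\pi}} \int f(t')\,e^{-i\xi(t' + h)}\,dt' = e^{-i\xi h}\,\hat f(\xi).$$
This proves our first rule:
(R1) $\displaystyle \widehat{T_h f}(\xi) = e^{-i\xi h}\,\hat f(\xi)$,
which may be expressed in words as follows: If $f$ is translated by $h$ to the
right along the time axis, then its Fourier transform $\hat f$ picks up a factor $e^{-i\xi h}$.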
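We again consider an arbitrary signal $f \in L^1$ and modulate $f$ with a pure
oscillation $e_\omega$, $\omega \in \mathbb{R}$; that is to say, we consider the function $g(t) := e^{i\omega t} f(t)$.
The Fourier transform of $g$ is given by
$$\hat g(\xi) = \frac{1}{\sqrt{2\pi}} \int f(t)\,e^{i\omega t}\,e^{-i\xi t}\,dt = \frac{1}{\sqrt{2\pi}} \int f(t)\,e^{-i(\xi - \omega)t}\,dt = \hat f(\xi - \omega).$$
So we have the following rule, which is in a way "dual" to (R1):
(R2) $\displaystyle \widehat{e_\omega f}(\xi) = \hat f(\xi - \omega)$.
In words: If the signal $f$ is modulated with $e_\omega$, then the graph of $\hat f$ is translated
by $\omega$ (to the right, if $\omega > 0$) on the $\xi$-axis.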
Speaking philosophically, one can say that Fourier theory is the systematic
exploitation of translational symmetry. In the realm of wavelets dilations of
the time axis play a role of even more importance. For this reason we have to
investigate how the Fourier transform behaves under the operation $D_a$, which
for arbitrary $a \in \mathbb{R}^*$ is defined by
$$D_a f(t) := f\Bigl(\frac{t}{a}\Bigr).$$
Figure 2.3 The effect of $D_a$ for $a = 3$
The effect of $D_a$ on the graph of a signal $f$ is shown in Figure 2.3 for the case
$a := 3$. If $|a| > 1$, then the graph $\mathcal{G}(f)$ is stretched horizontally by the factor $|a|$,
and for $|a| < 1$ the graph is compressed horizontally by the factor $|a|$. If $a < 0$, then,
in addition, $\mathcal{G}(f)$ is reflected on the vertical axis. So let $g(t) := D_a f(t)$. In
order to compute $\hat g$ we use the substitution
$$t := a\,t' \quad (t' \in \mathbb{R}), \qquad dt = |a|\,dt'$$
(absolute value of the Jacobian!) and obtain
$$\hat g(\xi) = \frac{1}{\sqrt{2\pi}} \int f\Bigl(\frac{t}{a}\Bigr)\,e^{-i\xi t}\,dt = \frac{|a|}{\sqrt{2\pi}} \int f(t')\,e^{-i\xi a t'}\,dt' = |a|\,\hat f(a\xi).$$
All in all, we have proven the formula
(R3) $\displaystyle \widehat{D_a f}(\xi) = |a|\,\hat f(a\xi) \qquad (a \in \mathbb{R}^*)$.
In terms of the graphs of $f$ and $\hat f$ this means the following: If the graph of $f$
is stretched horizontally by a factor $a > 1$, then the graph of $\hat f$ is compressed
horizontally to the fraction $\frac{1}{a} < 1$ of its original width; moreover, it is scaled
vertically by the factor $|a|$.
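The three rules can be spot-checked numerically. The sketch below rests on assumptions of its own: the test signal is the Gaussian $e^{-t^2/2}$, whose transform under the normalization (2) is the well-known $e^{-\xi^2/2}$; the integral is approximated by a midpoint rule on [-30, 30]; and the parameter values and the helper name `fourier` are arbitrary.

```python
import cmath
import math

def fourier(f, xi, lo=-30.0, hi=30.0, n=6000):
    """(1 / sqrt(2 pi)) * integral of f(t) e^{-i xi t} dt, midpoint rule."""
    h = (hi - lo) / n
    total = 0j
    for j in range(n):
        t = lo + (j + 0.5) * h
        total += f(t) * cmath.exp(-1j * xi * t)
    return total * h / math.sqrt(2 * math.pi)

def gauss(t):
    return math.exp(-t * t / 2)

def gauss_hat(xi):
    """Known transform of the Gaussian under this normalization."""
    return math.exp(-xi * xi / 2)

xi, h, omega, a = 0.7, 1.5, 2.0, -3.0

# (R1) translation by h multiplies the transform by e^{-i xi h}.
err_r1 = abs(fourier(lambda t: gauss(t - h), xi) - cmath.exp(-1j * xi * h) * gauss_hat(xi))
# (R2) modulation by e^{i omega t} shifts the transform by omega.
err_r2 = abs(fourier(lambda t: cmath.exp(1j * omega * t) * gauss(t), xi) - gauss_hat(xi - omega))
# (R3) dilation by a rescales the argument by a and the values by |a|.
err_r3 = abs(fourier(lambda t: gauss(t / a), xi) - abs(a) * gauss_hat(a * xi))
```

All three discrepancies are at the level of rounding error in this experiment, including the case of a negative dilation factor.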
For any two given functions $f$ and $g \in L^1$, their convolution product $f * g$ is
defined by
$$f * g(x) := \int f(x - t)\,g(t)\,dt \qquad (x \in \mathbb{R}).$$
In any case the object $f * g$ is an element of $L^1$. This means that a priori it is
only an equivalence class of functions. In most concrete cases, however, $f * g$
is a bona fide function with well-determined values. One can even say more:
The function $f * g$ is at least as smooth as the smoother of the two functions
$f$ and $g$. A typical application of convolution is the so-called regularization
of a given function $f$ by means of smooth bump functions $g_\varepsilon \in C^\infty$. The $g_\varepsilon$
have total mass $\int g_\varepsilon(t)\,dt = 1$ and are identically zero outside of the interval
$[-\varepsilon, \varepsilon]$, see Figure 2.4. The value $f * g_\varepsilon(x)$ can then be regarded as a weighted
average of the $f$-values taken in an $\varepsilon$-neighbourhood of $x$, so the $C^\infty$-function
$f_\varepsilon := f * g_\varepsilon$ is an "$\varepsilon$-smeared out" version of the given function $f$.
Figure 2.4 A bump function $g_\varepsilon$ with $\int g_\varepsilon(t)\,dt = 1$
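With the help of Fubini's theorem (on the interchange of the order of integration)
we can now easily compute the Fourier transform of $f * g$:
$$(f * g)^\wedge(\xi) = \frac{1}{\sqrt{2\pi}} \int \Bigl( \int f(x - t)\, g(t)\,dt \Bigr) e^{-i\xi x}\,dx$$
$$= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}\times\mathbb{R}} f(x - t)\, g(t)\, e^{-i\xi x}\,d(x, t)$$
$$= \frac{1}{\sqrt{2\pi}} \int g(t) \Bigl( \int f(x - t)\, e^{-i\xi x}\,dx \Bigr) dt .$$
By rule (R1), the resulting inner integral has the value $\sqrt{2\pi}\, e^{-i\xi t} \hat f(\xi)$, and
here only the factor $e^{-i\xi t}$ is dependent on $t$. Thus we may continue the above
chain of equalities with
$$\ldots = \hat f(\xi) \int g(t)\, e^{-i\xi t}\,dt = \sqrt{2\pi}\, \hat f(\xi)\, \hat g(\xi) .$$
Our computation proves the so-called convolution theorem
(2.10) $(f * g)^\wedge = \sqrt{2\pi}\; \hat f \cdot \hat g$ .
In words: The Fourier transform converts the convolution product of the
two functions $f$ and $g$ into the ordinary, meaning pointwise, product of their
Fourier transforms.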
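The convolution theorem (2.10), including its factor $\sqrt{2\pi}$, can be observed on a grid. The sketch below is only an illustration under stated assumptions: the two Gaussians, the grid, and the helper names `ft` and `convolution_theorem_error` are hypothetical choices, and all integrals are replaced by Riemann sums.

```python
import numpy as np

t = np.linspace(-20.0, 20.0, 4001)   # symmetric grid with an odd number of points
dt = t[1] - t[0]

f = np.exp(-t ** 2)                  # illustrative choice of f
g = np.exp(-0.5 * (t - 1.0) ** 2)    # illustrative choice of g

# (f * g)(x) = integral f(x - t) g(t) dt, approximated by a Riemann sum.
# On a symmetric odd-length grid, mode="same" returns the values at the grid points t.
conv = np.convolve(f, g, mode="same") * dt

def ft(samples, xi):
    """Riemann-sum approximation of (1/sqrt(2*pi)) * integral samples(t) exp(-i xi t) dt."""
    return np.sum(samples * np.exp(-1j * xi * t)) * dt / np.sqrt(2.0 * np.pi)

def convolution_theorem_error(xi):
    """|(f*g)^(xi) - sqrt(2 pi) f^(xi) g^(xi)|, which (2.10) predicts to be zero."""
    lhs = ft(conv, xi)
    rhs = np.sqrt(2.0 * np.pi) * ft(f, xi) * ft(g, xi)
    return abs(lhs - rhs)

for xi in (0.0, 0.5, 1.5):
    print(xi, convolution_theorem_error(xi))
```

Dropping the factor `np.sqrt(2.0 * np.pi)` makes the error clearly visible, which shows that the constant in (2.10) depends on the chosen normalization of the Fourier transform.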
Now for the $L^2$-theory. On $L^2$ one defines a scalar product by
$$\langle f, g \rangle := \int f(t)\, \overline{g(t)}\,dt . \qquad (3)$$
For any two functions $f$, $g \in L^2$, their scalar product $\langle f, g \rangle$ is a well-determined
complex number. Any $f \in L^2$ has a finite 2-norm, norm for short,
$$\|f\| := \sqrt{\langle f, f \rangle} = \Bigl( \int |f(t)|^2\,dt \Bigr)^{1/2} ,$$
and one easily proves Schwarz' inequality
$$|\langle f, g \rangle| \le \|f\|\, \|g\| . \qquad (4)$$
$L^2$ is a Hilbert space, as was its periodic counterpart in Section 2.1, but not everything carries over. For a general
$f \in L^2$, the Fourier integral (2) need not exist: Since $e_\xi$ is not an element of
$L^2$, this integral cannot be regarded as being the scalar product $\frac{1}{\sqrt{2\pi}} \langle f, e_\xi \rangle$.
Fortunately, the subset $X := L^1 \cap L^2$ is dense in $L^2$, and this makes it possible
to extend the Fourier transform
$$\mathcal{F}: \quad f \mapsto \hat f ,$$
defined on $X$ by formula (2), in a unique way to all of $L^2$. This implies, of
course, that the Fourier transform of a function $f \in L^2 \setminus X$ becomes accessible
only through an additional limiting process. Working out the details, one
arrives at the following picture: The Fourier transform $\hat f$ of a function $f \in L^2$
about which nothing else is known is again an $L^2$-object, i.e., an equivalence
class of functions, and does not have well-determined values at individual
points $\xi \in \mathbb{R}$. But as a map
$$\mathcal{F}: \quad L^2 \to L^2$$
the Fourier transform is well-defined and bijective (a miracle!). In fact,
even more is true: $\mathcal{F}$ is an isometry with respect to the scalar product (3).
This is analytically expressed by the following theorem, called the Parseval-
Plancherel formula:
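(2.11) For arbitrary $f$, $g \in L^2$ one has
$$\langle \hat f, \hat g \rangle = \langle f, g \rangle ,$$
or, written out in full,
$$\int \hat f(\xi)\, \overline{\hat g(\xi)}\,d\xi = \int f(t)\, \overline{g(t)}\,dt .$$
In particular,
$$\|\hat f\|^2 = \|f\|^2 \quad \text{resp.} \quad \int |\hat f(\xi)|^2\,d\xi = \int |f(t)|^2\,dt .$$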
A periodic function $f$ can be reconstructed from its Fourier coefficients $c_k =
\hat f(k)$ by summing the series. In a similar vein, there is also a reconstruction
procedure (called the inversion formula) for the Fourier transform. It accepts
the Fourier transform $\hat f$ of a time signal $f$ as input and reproduces the original
signal $f$ by means of a summation process. In the textbooks on Fourier
analysis one finds various approaches to such an inversion formula under ever
weaker assumptions about $f$ and $\hat f$. Let us note here the following version:
(2.12) If $f$ and $\hat f$ are both in $L^1$, then
$$f(t) = \frac{1}{\sqrt{2\pi}} \int \hat f(\xi)\, e^{i\xi t}\,d\xi$$
almost everywhere, in particular at all points $t$ where $f$ is continuous.
This formula can be written "abstractly" in the form
$$f = \frac{1}{\sqrt{2\pi}} \int \hat f(\xi)\, e_\xi\,d\xi ,$$
which may be interpreted as follows: The original signal $f$ is a linear combination
of pure oscillations of all possible frequencies $\xi \in \mathbb{R}$; to be more precise,
any individual oscillation $e_\xi$ occurs in $f$ with complex amplitude $\hat f(\xi)$ (cf. our
remarks following the definition (2) of $\hat f$).
In Theorem (2.12) there are assumptions not only about the original signal
$f$ but also about $\hat f$. Thus we have to address the following question: How are
the properties of $\hat f$ (continuity, decay at infinity, etc.) related to those of $f$?
Generally speaking, the following can be said in this regard: The smoother
the time signal $f$, the faster the decay of $\hat f(\xi)$ for $|\xi| \to \infty$. Reflecting this
in a logical mirror, one has the following dual statement: The faster the
original signal decays for $|t| \to \infty$, the smoother, or more regular, is its Fourier
transform $\hat f$. (Following the general custom, we use the word regular to convey
a not very precise notion of smoothness.) A function $f$ in Schwartz space $S$
is "super smooth", and as a consequence its Fourier transform decays "super
fast". On the other hand, $f$ and all its derivatives enjoy "super fast" decay,
and as a consequence $\hat f$ is "super smooth". All in all, it turns out that $\mathcal{F}$,
restricted to $S$, maps this space bijectively onto itself.
We want to formulate the described general principle somewhat more pre-
cisely, i.e., in a more quantitative way. The smoothness (regularity) of a
function is most easily expressed by the number of times it can be continu-
ously differentiated. So we first have to investigate the interplay between the
Fourier transform and differentiation.
Let $f$ be a $C^1$-function and assume that $f$ as well as $f'$ are integrable, i.e.,
in $L^1$. Then in any case one has $\lim_{t \to \pm\infty} f(t) = 0$ (an exercise!), and partial
integration of the Fourier integral (2) gives
$$\int f'(t)\, e^{-i\xi t}\,dt = f(t)\, e^{-i\xi t} \Big|_{t = -\infty}^{\infty} + i\xi \int f(t)\, e^{-i\xi t}\,dt ,$$
from which we can read off the following rule for computing the Fourier transform
of a derivative:
(R4) $(f')^\wedge(\xi) = i\xi\, \hat f(\xi)$ .
Continuing in this way, we obtain, at least formally, for arbitrary $r \ge 0$, the
formula
$$(f^{(r)})^\wedge(\xi) = (i\xi)^r\, \hat f(\xi) . \qquad (5)$$
Assume, e.g., that our signal $f$ is $r$ times continuously differentiable and that
the derivatives $f^{(k)}$ $(0 \le k \le r)$ are in $L^1$. Then formula (5) is applicable,
and Theorem (2.9), applied to $f^{(r)}$, guarantees
$$\lim_{\xi \to \pm\infty} |\xi|^r\, \hat f(\xi) = 0 .$$
This can be read as follows: Under the described circumstances the Fourier
transform $\hat f$ has a decay at infinity (i.e., for $|\xi| \to \infty$) that is faster than the
decay of $1/|\xi|^r$.
Using (2.11) instead of (2.9) we arrive at a similar result: If, under suitable
assumptions about the derivatives $f^{(k)}$ $(0 \le k \le r)$, the integral $\int |f^{(r)}(t)|^2\,dt$
is finite, then the integral $\int |\xi|^{2r}\, |\hat f(\xi)|^2\,d\xi$ is finite as well, which implies that
$\hat f$ must have corresponding decay at infinity.
As a counterpart to the considerations in the last paragraph we start afresh,
but this time with time signals $f$ that have fast decay at infinity. We consider
an $f \in L^1$ decaying for $|t| \to \infty$ at least fast enough to make the integral
$\int |t|\, |f(t)|\,dt$ convergent. We shall denote the function $t \mapsto t f(t)$ by $tf$ for
short, so we assume $tf \in L^1$. We now compute the derivative of $\hat f$. To this
end we write
$$\frac{\hat f(\xi + h) - \hat f(\xi)}{h} = \frac{1}{\sqrt{2\pi}} \int f(t)\, e^{-i\xi t}\, \frac{e^{-ith} - 1}{h}\,dt .$$
Here the integrand
$$g_h(t) := f(t)\, e^{-i\xi t}\, \frac{e^{-ith} - 1}{h}$$
can be estimated as follows:
$$|g_h(t)| \le |t|\, |f(t)| \quad \forall h \ne 0 .$$
By Lebesgue's theorem (about the interchange of limit and integration) we
conclude that the derivative
$$(\hat f)'(\xi) = \lim_{h \to 0} \frac{\hat f(\xi + h) - \hat f(\xi)}{h} = \frac{1}{\sqrt{2\pi}} \int f(t)\, e^{-i\xi t}\, (-it)\,dt$$
exists. If the last equation is read from right to left, one obtains the following
rule for computing the Fourier transform of $tf$:
(R5) $(tf)^\wedge(\xi) = i\, (\hat f)'(\xi)$ .
Because of (2.9), the function $(\hat f)'$ is even continuous. By induction one
proves easily that the following is true for arbitrary $r \ge 1$:
(2.13) Assume that $f \in L^1$ decays fast enough for $|t| \to \infty$ to make the
integral $\int |t|^r\, |f(t)|\,dt$ finite. Then the Fourier transform $\hat f$ is at least $r$ times
continuously differentiable. Furthermore,
$$(t^r f)^\wedge(\xi) = i^r\, \hat f^{(r)}(\xi) . \qquad (6)$$
An extremal case of fast decay is when the time signal $f \in L^1$ has in fact
compact support. If $\mathrm{supp}(f) \subset [-b, b]$, we may write
$$\hat f(\zeta) = \frac{1}{\sqrt{2\pi}} \int_{-b}^{b} f(t)\, e^{-i\zeta t}\,dt . \qquad (7)$$
Note that we have replaced the frequency variable $\xi$ by a $\zeta$, for something
essential has happened: The Fourier transform $\hat f$ has become an entire holomorphic
function of the complex variable $\zeta = \xi + i\eta$. Looking back, we remark
that for the convergence of the Fourier integral (2) in general it was crucial
that the factor $e^{-i\xi t}$ remain bounded when $t \to \pm\infty$. Now in the integral
(7) over a finite interval, the factor $e^{-i\zeta t}$ can be estimated for complex $\zeta$ as
follows:
$$|e^{-i\zeta t}| = |e^{-i(\xi + i\eta)t}| \le e^{b|\eta|} \quad (-b \le t \le b) .$$
This shows that the integral (7) is convergent for arbitrary values of $\zeta \in \mathbb{C}$,
and as in the proof of (R5) it follows that one may differentiate (7) in the
sense of complex function theory with respect to the variable $\zeta$. Furthermore,
one has for $\hat f$ itself an estimate of the form
$$|\hat f(\zeta)| \le \frac{1}{\sqrt{2\pi}} \int_{-b}^{b} |f(t)|\, e^{|t\, \mathrm{Im}(\zeta)|}\,dt \le C\, e^{b|\mathrm{Im}(\zeta)|} .$$
Thus the size of the support of $f$ determines the rate of increase of the entire
function $\zeta \mapsto \hat f(\zeta)$ in the vertical direction.
Since the Fourier transform $\hat f$ in this case has turned out to be an entire
holomorphic function, and a holomorphic function that vanishes on an interval
vanishes identically, it is impossible that $\hat f$ has compact support if this is
the case for $f$ (unless $f = 0$). Turned the other way around, a bandlimited signal (see Section
2.4) cannot have compact support.
We conclude this section with a few examples.
① Let $a > 0$, and consider the function $f := 1_{[-a,a]}$. Its Fourier transform
is computed as follows:
$$\hat f(\xi) = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-i\xi t}\,dt = \frac{1}{\sqrt{2\pi}}\, \frac{1}{-i\xi}\, e^{-i\xi t} \Big|_{t=-a}^{a} = \frac{1}{\sqrt{2\pi}}\, \frac{2}{\xi}\, \frac{e^{i\xi a} - e^{-i\xi a}}{2i} = \sqrt{\frac{2}{\pi}}\, \frac{\sin(a\xi)}{\xi} \quad (\xi \ne 0) .$$
The value $\xi = 0$ is special. By a separate calculation or by looking at
$\lim_{\xi \to 0} \hat f(\xi)$ one finds
$$\hat f(0) = \sqrt{\frac{2}{\pi}}\; a .$$
The graphs of both $f$ and $\hat f$ are shown in Figure 2.5. In the signal theoretic
literature, very often the so-called sinc function is introduced as a standard
tool. It is usually defined by
$$\mathrm{sinc}(x) := \begin{cases} \dfrac{\sin x}{x} & (x \ne 0) \\ 1 & (x = 0) \end{cases}$$
and is an entire holomorphic function of $x$, when $x$ is considered as a complex
variable. Using this function we may write down our result about $\hat f$ in the
following way:
$$\hat f(\xi) = \sqrt{\frac{2}{\pi}}\; a\, \mathrm{sinc}(a\xi) . \qquad (8)$$
Figure 2.5 (the graphs of $f = 1_{[-a,a]}$ and of $\hat f$, with maximal value $\sqrt{2/\pi}\, a$)
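Formula (8) is easy to confirm numerically. The sketch below assumes the normalization of formula (2) and approximates the integral over $[-a, a]$ by the trapezoidal rule; the function names are hypothetical. Note that NumPy's `np.sinc` is the normalized variant $\sin(\pi x)/(\pi x)$, so it has to be rescaled to obtain the sinc function defined above.

```python
import numpy as np

def sinc(x):
    """Unnormalized sinc of the text: sin(x)/x with sinc(0) = 1.

    np.sinc(y) computes sin(pi*y)/(pi*y), hence the division by pi."""
    return np.sinc(np.asarray(x, dtype=float) / np.pi)

def box_transform_numeric(a, xi, n=200001):
    """Trapezoidal approximation of (1/sqrt(2*pi)) * integral_{-a}^{a} exp(-i xi t) dt."""
    t = np.linspace(-a, a, n)
    dt = t[1] - t[0]
    values = np.exp(-1j * xi * t)
    integral = dt * (values.sum() - 0.5 * (values[0] + values[-1]))
    return integral / np.sqrt(2.0 * np.pi)

def box_transform_formula(a, xi):
    """Right-hand side of (8): sqrt(2/pi) * a * sinc(a * xi)."""
    return np.sqrt(2.0 / np.pi) * a * sinc(a * xi)

a = 1.5
for xi in (0.0, 0.3, 2.0, 7.0):
    print(xi, abs(box_transform_numeric(a, xi) - box_transform_formula(a, xi)))
```

The case `xi = 0.0` covers the special value $\hat f(0) = \sqrt{2/\pi}\, a$ without a separate branch, because the sinc function is defined to be 1 at the origin.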
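As an exercise in using our rules, we compute the Fourier transform of the
Haar wavelet (see Section 1.6) a second time. Considered as an element of
$L^1$, the Haar wavelet may be written as follows:
$$\psi_{\mathrm{Haar}} = 1_{[0,\frac12]} - 1_{[\frac12,1]} = T_{\frac14} 1_{[-\frac14,\frac14]} - T_{\frac34} 1_{[-\frac14,\frac14]} .$$
Rule (R1) now allows us to read off $\hat\psi_{\mathrm{Haar}}$ directly from (8):
$$\hat\psi_{\mathrm{Haar}}(\xi) = \bigl( e^{-i\xi/4} - e^{-3i\xi/4} \bigr)\, \frac{1}{2\sqrt{2\pi}}\, \mathrm{sinc}\Bigl(\frac{\xi}{4}\Bigr) = \frac{i}{\sqrt{2\pi}}\, e^{-i\xi/2}\, \frac{\sin^2(\xi/4)}{\xi/4} ,$$
as before.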
The function
$$g(t) := 1_{[-a,a]}(t) \cdot e^{i\omega_0 t}$$
models a certain process setting in at the exact time $t := -a$ and abruptly
stopping at time $t := a$. In between, we observe a pure oscillation of frequency
(angular velocity, to be exact) $\omega_0$. The Fourier transform treats this process
mandatorily as an overall phenomenon extended over the full time axis. Rule
(R2) gives, in this case:
$$\hat g(\xi) = \sqrt{\frac{2}{\pi}}\; a\, \mathrm{sinc}\bigl( a(\xi - \omega_0) \bigr) .$$
As was to be expected, the function $\hat g$ has a more or less distinctive maximum
at the frequency $\xi := \omega_0$ (see Figure 2.6). But because of the jump discontinuities
of $g$ at the times $t := \pm a$, the absolute value $|\hat g(\xi)|$ decays only slowly
with $|\xi| \to \infty$; in fact, $\hat g$ is not even in $L^1$. ○
Figure 2.6
② The Fourier transform of the function
$$N_{1,0}(t) := \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$$
is most easily computed via the methods of complex function theory. Since
$N_{1,0}$ is real and even, its Fourier transform $\hat N_{1,0}$ will also be a real and even
function. So it suffices to discuss $\xi > 0$. Inspired by $N_{1,0}$, we consider the
function $f(z) := e^{-z^2/2}$, holomorphic in the full complex $z$-plane, and draw
the rectangle $R$ shown in Figure 2.7. Since, in the end, we shall take the limit
$a \to \infty$, we may assume right from the start that $a > \xi$; note that $\xi$ is
fixed here.
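Figure 2.7 (the rectangle $R$ over the interval $[-a, a]$)
Cauchy's integral theorem tells us that $\int_{\partial R} f(z)\,dz = 0$. Therefore we have
$$\int_{\sigma_1} f(z)\,dz = \int_{\sigma_0} f(z)\,dz + \int_{\gamma_+} f(z)\,dz - \int_{\gamma_-} f(z)\,dz ,$$
which we may abbreviate as
$$I_1 = I_0 + I_+ - I_- .$$
For $I_1$ we use the parametric representation
$$\sigma_1: \quad t \mapsto z(t) := t + i\xi \quad (-a \le t \le a)$$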
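and obtain
$$I_1 = \int_{-a}^{a} \exp\Bigl( -\frac{t^2 + 2i\xi t - \xi^2}{2} \Bigr) dt = e^{\xi^2/2} \int_{-a}^{a} e^{-t^2/2}\, e^{-i\xi t}\,dt = e^{\xi^2/2} \bigl( 2\pi\, \hat N_{1,0}(\xi) + o(1) \bigr) \quad (a \to \infty) . \qquad (9)$$
The integral $I_0$ can be written as
$$I_0 = \int_{-a}^{a} e^{-t^2/2}\,dt = \sqrt{2\pi} + o(1) \quad (a \to \infty) . \qquad (10)$$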
Here we have used a well-known special value of the probability integral, which
can be obtained without excursion into the complex domain. To compute the
remaining integrals $I_\pm$, we use the parametric representation
$$\gamma_\pm: \quad t \mapsto z(t) := \pm a + it \quad (0 \le t \le \xi)$$
and obtain
$$I_\pm = \int_0^{\xi} \exp\Bigl( -\frac{a^2 \pm 2iat - t^2}{2} \Bigr)\, i\,dt .$$
Because of $a > \xi$ the last integral can be estimated as follows:
$$|I_\pm| \le \int_0^{a} \exp\Bigl( -\frac{(a - t)(a + t)}{2} \Bigr) dt \le \int_0^{a} \exp\Bigl( -\frac{a}{2}(a - t) \Bigr) dt = \ldots = \frac{2}{a} \bigl( 1 - e^{-a^2/2} \bigr) = o(1) \quad (a \to \infty) .$$
This proves $I_1 = I_0 + o(1)$ $(a \to \infty)$; therefore from (9) and (10), by passing
to the limit $a \to \infty$, we obtain
$$\hat N_{1,0}(\xi) = \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2} .$$
We see that the special function $N_{1,0}$ has as its Fourier transform an identical
copy of itself, but living on the $\xi$-axis.
Figure 2.8 ($\sigma = 1$, $\omega = 5$)
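We conclude the present example by computing the Fourier transform of the
"wave train"
$$g(t) := N_{\sigma,0}(t) \cos(\omega_0 t) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\Bigl( -\frac{t^2}{2\sigma^2} \Bigr)\, \frac{e^{i\omega_0 t} + e^{-i\omega_0 t}}{2}$$
(see Figure 2.8). To this end we use our rules. First, one has $N_{\sigma,0} = \frac{1}{\sigma} D_\sigma N_{1,0}$,
so rule (R3) gives
$$\hat N_{\sigma,0}(\xi) = \hat N_{1,0}(\sigma\xi) = \frac{1}{\sqrt{2\pi}}\, e^{-\sigma^2 \xi^2/2} .$$
To this we apply rule (R2) and obtain
$$\hat g(\xi) = \frac{1}{2\sqrt{2\pi}} \Bigl( e^{-\sigma^2 (\xi - \omega_0)^2/2} + e^{-\sigma^2 (\xi + \omega_0)^2/2} \Bigr) .$$
We see that the Fourier transform of our "wave train" has peaks at the two
points $\pm\omega_0$ of the $\xi$-axis, and these peaks become more and more pronounced
as $\sigma$ increases, i.e., when the number of oscillations of frequency $\omega_0$ that in
fact could be observed becomes larger and larger. ○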
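The behaviour of the wave train can be reproduced numerically. The following sketch compares a trapezoidal approximation of the Fourier integral with the closed form $\hat g(\xi) = \frac{1}{2\sqrt{2\pi}}\bigl(e^{-\sigma^2(\xi-\omega_0)^2/2} + e^{-\sigma^2(\xi+\omega_0)^2/2}\bigr)$ that follows from rules (R3) and (R2); the function names and grid parameters are illustrative assumptions.

```python
import numpy as np

def fourier_transform(f, xi, t_max=60.0, n=240001):
    """Approximate (1/sqrt(2*pi)) * integral f(t) exp(-i xi t) dt by the trapezoidal rule."""
    t = np.linspace(-t_max, t_max, n)
    dt = t[1] - t[0]
    values = f(t) * np.exp(-1j * xi * t)
    integral = dt * (values.sum() - 0.5 * (values[0] + values[-1]))
    return integral / np.sqrt(2.0 * np.pi)

def wave_train(t, sigma, omega0):
    """g(t) = N_{sigma,0}(t) * cos(omega0 * t)."""
    density = np.exp(-t ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return density * np.cos(omega0 * t)

def wave_train_transform(xi, sigma, omega0):
    """Closed form obtained from (R3) and (R2): two Gaussian peaks at +-omega0."""
    peak = lambda x: np.exp(-0.5 * (sigma * x) ** 2)
    return (peak(xi - omega0) + peak(xi + omega0)) / (2.0 * np.sqrt(2.0 * np.pi))

def wave_train_error(xi, sigma, omega0):
    numeric = fourier_transform(lambda t: wave_train(t, sigma, omega0), xi)
    return abs(numeric - wave_train_transform(xi, sigma, omega0))

for sigma in (1.0, 3.0):
    print(sigma, [wave_train_error(xi, sigma, 5.0) for xi in (0.0, 4.0, 5.0, 6.0)])
```

Evaluating `wave_train_transform` at a fixed distance from $\omega_0$ for increasing `sigma` shows how the two peaks become narrower, in line with the remark on the number of observable oscillations.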
For additional formulas giving the Fourier transforms of special functions we
refer the reader to the extensive tables in [13].
2.3 The Heisenberg uncertainty principle
We have seen at several places already that a time signal $f$ and its Fourier
transform $\hat f$ cannot be simultaneously localized in a small domain of the $t$-
resp. the $\xi$-axis:
The scaling rule (R3) implies that the graph of $\hat f$ is stretched horizontally
(and, in addition, flattened by vertical scaling) when the graph of $f$ is
compressed horizontally.
The Fourier transform of a pure oscillation cut off outside an interval $[-a, a]$ has all of
$\mathbb{R}$ as its support and is not even absolutely integrable for $|\xi| \to \infty$.
A time signal with compact support cannot be bandlimited (see Section
2.4).
Further observations can be made in the same vein, which the reader
is invited to make on his own.
The phenomenon described here rather intuitively has found its quantitative
expression in the famous Heisenberg uncertainty principle, a theorem of
Fourier analysis that plays an important role in quantum mechanics. There
the motion of a particle is described "abstractly" by a certain function $\psi \in S$
(no connection with our wavelets) in the following way: The function $f_X(x) :=
|\psi(x)|^2$ is interpreted as the probability density for the position $X$ of this particle,
considered as a random variable, and $f_P(\xi) := |\hat\psi(\xi)|^2$ is the corresponding
density for its momentum $P$. The uncertainty principle states in the form of a
precise inequality that these two densities cannot simultaneously have a single
marked peak.
Here we have tacitly assumed $\psi \in L^2$, and, for the probabilistic interpretation,
$$\|\psi\|^2 = \int f_X(x)\,dx = 1 .$$
The quantity
$$\|x\psi\|^2 := \int x^2\, |\psi(x)|^2\,dx = \int x^2 f_X(x)\,dx$$
is the expectation of the random variable $X^2$ and consequently a measure for
the horizontal spread of the function $\psi$. Analogously, the integral
$$\|\xi\hat\psi\|^2 := \int \xi^2\, |\hat\psi(\xi)|^2\,d\xi$$
can be regarded as a measure of the spread of $\hat\psi$ over the $\xi$-axis. In terms of
these quantities, the Heisenberg uncertainty principle can be formulated as
follows:
(2.14) Let $\psi$ be an arbitrary function in $L^2$. Then
$$\|x\psi\| \cdot \|\xi\hat\psi\| \ge \frac{1}{2}\, \|\psi\|^2 , \qquad (1)$$
the left-hand side being allowed to assume the value $\infty$. The equality sign is
valid exactly for the constant multiples of the functions $x \mapsto e^{-cx^2}$, $c > 0$.
⌈ If $\|x\psi\| = \infty$ or $\|\xi\hat\psi\| = \infty$, then there is nothing to prove. In this
case at least one of the two functions $\psi$ and $\hat\psi$ is definitely "very spread out".
Therefore we may assume that the left-hand side of (1) is finite and prove
this inequality first for functions $\psi \in S$. Under this additional hypothesis
all convergence questions are moved out of the way; in particular, we have
$\lim_{x \to \pm\infty} x\, |\psi(x)|^2 = 0$.
The Fourier transform $\hat\psi$ may be eliminated from (1) by means of rule (R4)
and Parseval's formula (2.11). One has
$$\|\xi\hat\psi\| = \|(\psi')^\wedge\| = \|\psi'\| ,$$
from which it follows that the stated inequality (1) is equivalent to
$$\|x\psi\| \cdot \|\psi'\| \ge \frac{1}{2}\, \|\psi\|^2 . \qquad (2)$$
Now by Schwarz' inequality 2.2.(4), we have
$$\|x\psi\|\, \|\psi'\| \ge |\langle x\psi, \psi' \rangle| \ge |\mathrm{Re}\, \langle x\psi, \psi' \rangle| . \qquad (3)$$
Here the right-hand side can be computed as follows:
$$2\, \mathrm{Re}\, \langle x\psi, \psi' \rangle = \langle x\psi, \psi' \rangle + \langle \psi', x\psi \rangle = \int x \bigl( \psi(x)\, \overline{\psi'(x)} + \psi'(x)\, \overline{\psi(x)} \bigr) dx$$
$$= x\, |\psi(x)|^2 \Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} |\psi(x)|^2\,dx = -\|\psi\|^2 .$$
If we insert this on the right side of (3), the inequality (2) follows.
To finish up the proof we have to get rid of the assumption $\psi \in S$. Since $S$ is
dense in $L^2$, a simple approximation argument (which we leave as an exercise)
will do the job.
One has equality in (1) if and only if both $\ge$ relations in (3) are in fact
equalities, and for this to be valid it is necessary, in the first place, that the
two vectors $x\psi$ and $\psi' \in L^2$ are linearly dependent. So there has to be a
$\mu + i\nu \in \mathbb{C}$ with
$$\psi'(x) \equiv (\mu + i\nu)\, x\, \psi(x) \quad (x \in \mathbb{R}) . \qquad (4)$$
The solutions of this differential equation are given by
$$\psi(x) = C\, e^{(\mu + i\nu) x^2/2} ,$$
and such a $\psi$ is an element of $L^2$ if and only if $\mu =: -c$ is negative. For the
second $\ge$ in (3) to be an equality, $\langle x\psi, \psi' \rangle$ has to be real. Together with (4)
we are led to the condition
$$\langle x\psi, \psi' \rangle = (\mu - i\nu)\, \|x\psi\|^2 \in \mathbb{R} .$$
So $\nu$ has to be zero. ⌋
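According to this theorem, the two functions $\psi$, $\hat\psi$ cannot simultaneously be
sharply localized at $x := 0$, $\xi := 0$: At least one of the numbers $\|x\psi\|^2$ and
$\|\xi\hat\psi\|^2$ is $\ge \|\psi\|^2/2$. Of course the same is true for an arbitrary pair $(x_0, \xi_0)$
instead of $(0, 0)$:
(2.15) For any $\psi \in L^2$ and arbitrary $x_0 \in \mathbb{R}$, $\xi_0 \in \mathbb{R}$ one has
$$\|(x - x_0)\psi\| \cdot \|(\xi - \xi_0)\hat\psi\| \ge \frac{1}{2}\, \|\psi\|^2 .$$
Here $\|(x - x_0)\psi\|$ resp. $\|(\xi - \xi_0)\hat\psi\|$ denote the following quantities:
$$\Bigl( \int (x - x_0)^2\, |\psi(x)|^2\,dx \Bigr)^{1/2} \quad \text{resp.} \quad \Bigl( \int (\xi - \xi_0)^2\, |\hat\psi(\xi)|^2\,d\xi \Bigr)^{1/2} .$$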
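⌈ We bring the auxiliary function
$$g(t) := e^{-i\xi_0 t}\, \psi(t + x_0)$$
into play and compute
$$\|g\|^2 = \int |\psi(t + x_0)|^2\,dt = \|\psi\|^2 ,$$
$$\|tg\|^2 = \int t^2\, |\psi(t + x_0)|^2\,dt = \int (x - x_0)^2\, |\psi(x)|^2\,dx .$$
Writing $g$ in the form
$$g(t) = e^{-i\xi_0 t}\, h(t) , \qquad h(t) := \psi(t + x_0) ,$$
and with the help of rules (R2) and (R1), we deduce that
$$\hat g(\tau) = \hat h(\tau + \xi_0) = e^{i x_0 (\tau + \xi_0)}\, \hat\psi(\tau + \xi_0) .$$
This implies
$$\|\tau \hat g\|^2 = \int \tau^2\, |\hat\psi(\tau + \xi_0)|^2\,d\tau = \int (\xi - \xi_0)^2\, |\hat\psi(\xi)|^2\,d\xi .$$
If we now apply (2.14) to the function $g$ and insert the values obtained for
$\|g\|$, $\|tg\|$ and $\|\tau \hat g\|$, we arrive at the stated formula. ⌋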
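The inequality (2.14) and its equality case can be observed numerically. The sketch below uses the identity $\|\xi\hat\psi\| = \|\psi'\|$ from the proof, so no Fourier transform has to be computed; the grid, the finite-difference derivative, and the two test functions are illustrative assumptions.

```python
import numpy as np

x = np.linspace(-30.0, 30.0, 600001)
dx = x[1] - x[0]

def norm(values):
    """L2 norm on the grid, approximated by a Riemann sum."""
    return np.sqrt(np.sum(np.abs(values) ** 2) * dx)

def uncertainty_ratio(psi):
    """||x psi|| * ||psi'|| divided by ||psi||^2 / 2; by (2.14) this ratio is >= 1."""
    dpsi = np.gradient(psi, dx)      # finite-difference approximation of psi'
    return norm(x * psi) * norm(dpsi) / (0.5 * norm(psi) ** 2)

gaussian = np.exp(-0.7 * x ** 2)     # of the form exp(-c x^2): equality is expected
other = x * np.exp(-0.5 * x ** 2)    # not a Gaussian: strict inequality is expected

print(uncertainty_ratio(gaussian))
print(uncertainty_ratio(other))
```

For the second function the three norms can be evaluated in closed form, $\|\psi\|^2 = \sqrt{\pi}/2$ and $\|x\psi\|^2 = \|\psi'\|^2 = 3\sqrt{\pi}/4$, so the ratio is 3.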
2.4 The Shannon sampling theorem
The Shannon sampling theorem gives a surprising answer to the following
question: Is it possible to reconstruct a time signal $f$ from discrete values
$(f(kT) \mid k \in \mathbb{Z})$ completely, i.e., for all values of the continuous variable $t$?
Without further assumptions about f the answer to this question of course
has to be no, for in the open intervals between the sample points kT the graph
of f could be filled in more or less arbitrarily.
The sampling theorem has an interesting history; see [9] for a very readable
account. The fact is that the series representation given by Shannon's theorem
had been known long before Shannon by the name of cardinal series.
A function $f \in L^1$ is called $\Omega$-bandlimited if its Fourier transform $\hat f$ vanishes
identically for $|\xi| > \Omega$:
$$\hat f(\xi) = 0 \quad (|\xi| > \Omega) .$$
Shannon's theorem states that an $\Omega$-bandlimited function can be reconstructed
completely from its values
$$\bigl( f(kT) \mid k \in \mathbb{Z} \bigr) , \qquad T := \frac{\pi}{\Omega} , \qquad (1)$$
sampled at the discrete points $kT$. By "completely" we mean that at all
points $t \in \mathbb{R}$ we get back the exact original value $f(t)$. Now this might come
as a surprise, but a moment's reflection shows that it is not so surprising
after all: A bandlimited time signal $f$ is automatically an entire holomorphic
function of the complex variable $t$ (cf. the corresponding statement about
the Fourier transform of time signals having compact support), and it is well
known that such a function is determined on all of $\mathbb{C}$ by giving its values on
a comparatively "modest" set. So uniqueness follows from general principles,
but Shannon's theorem even gives a formula for $f$.
In (1) a certain rigid relation between the bandwidth $\Omega$ and the sampling
interval $T$ is stipulated. There is a lot to be said about that, and we shall come
back to this matter later on. For the moment, the following will suffice: All
harmonic components actually occurring in $f$ have a period length $\ge 2\pi/\Omega$.
Thus, by requiring $T := \pi/\Omega$, one makes sure that any pure oscillation possibly
present in $f$ would be sampled at least twice per period. Here is the sampling
theorem (Figure 2.9):
(2.16) Let the continuous function $f: \mathbb{R} \to \mathbb{C}$ be $\Omega$-bandlimited and assume
that $f$ satisfies an estimate of the form
$$f(t) = O\Bigl( \frac{1}{|t|^{1+\varepsilon}} \Bigr) \quad (t \to \pm\infty) \qquad (2)$$
for some $\varepsilon > 0$. Let $T := \pi/\Omega$. Then
$$f(t) = \sum_{k=-\infty}^{\infty} f(kT)\, \mathrm{sinc}\bigl( \Omega(t - kT) \bigr) \quad (t \in \mathbb{R}) . \qquad (3)$$
Figure 2.9
The formal series appearing in (3) is called the cardinal series in the literature.
Because the sinc-function is bounded on $\mathbb{R}$, the assumption (2) guarantees that
the cardinal series is uniformly convergent on $\mathbb{R}$ and so represents a function
$\tilde f$ that is continuous on all of $\mathbb{R}$. The relations $\mathrm{sinc}(k\pi) = \delta_{0k}$ imply that
the function $\tilde f$ automatically interpolates the given values $f(kT)$. This means
that the cardinal series can be used as a continuous interpolant of the given
data $(f(kT) \mid k \in \mathbb{Z})$ even in cases where $f$ is not bandlimited.
From what was said above about $f$, it is no restriction of generality to assume
right from the start that $f$ is continuous. The assumption (2) could be
weakened.
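⌈ Because of (2) the function $f$ is in $L^1 \cap L^2$ and has a continuous Fourier
transform by (2.9). Since $\hat f$ vanishes for $|\xi| > \Omega$, it is in $L^1$ as well, and
the right side of the inversion formula (2.12) produces a continuous function
$t \mapsto \tilde f(t)$ which coincides with $f$ almost everywhere, so is actually $\equiv f$:
$$f(t) = \frac{1}{\sqrt{2\pi}} \int \hat f(\xi)\, e^{it\xi}\,d\xi \;\overset{A}{=}\; \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \hat f(\xi)\, e^{it\xi}\,d\xi \quad (t \in \mathbb{R}) . \qquad (4)$$
Since $\hat f$ is continuous, one has $\hat f(-\Omega) = \hat f(\Omega) = 0$, and one may say that on
the $\xi$-interval $[-\Omega, \Omega]$ the function $\hat f$ coincides with a certain periodic function
$F$ of period $2\Omega$:
$$\hat f(\xi) \equiv F(\xi) \quad (-\Omega \le \xi \le \Omega) . \qquad (5)$$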
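This function $F \in L^2(\mathbb{R}/(2\Omega))$ can be developed into a Fourier series according
to the formulas (2.8):
$$F(\xi) \rightsquigarrow \sum_{k=-\infty}^{\infty} c_k\, e^{2k\pi i \xi/(2\Omega)} , \qquad (6)$$
and we know by Carleson's theorem (2.4) that the series written here converges
for almost all $\xi$ to the true function value $F(\xi)$. The coefficients $c_k$ are
computed as follows:
$$c_k = \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} F(\xi)\, e^{-ik\pi\xi/\Omega}\,d\xi = \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \hat f(\xi)\, e^{-ik\pi\xi/\Omega}\,d\xi \;\overset{B}{=}\; \frac{1}{2\Omega} \int \hat f(\xi)\, e^{-ik\pi\xi/\Omega}\,d\xi . \qquad (7)$$
Comparing this equality with (4) we see that the last integral can be interpreted
as an $f$-value, so we get
$$c_k = \frac{\sqrt{2\pi}}{2\Omega}\, f(-k\pi/\Omega) = \frac{\sqrt{2\pi}}{2\Omega}\, f(-kT) ,$$
and formula (6) becomes
$$F(\xi) = \frac{\sqrt{2\pi}}{2\Omega} \sum_{k=-\infty}^{\infty} f(kT)\, e^{-ikT\xi} \quad (\text{almost all } \xi \in \mathbb{R}) .$$
On account of (5) we may therefore replace (4) by
$$f(t) = \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \Bigl( \sum_{k=-\infty}^{\infty} f(kT)\, e^{-ikT\xi} \Bigr) e^{it\xi}\,d\xi . \qquad (8)$$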
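Because of (2), the series under the integral sign converges uniformly, and we
are allowed to integrate it term by term:
$$f(t) = \frac{1}{2\Omega} \sum_{k=-\infty}^{\infty} f(kT) \int_{-\Omega}^{\Omega} e^{i(t - kT)\xi}\,d\xi .$$
The last integral is computed as follows:
$$\int_{-\Omega}^{\Omega} e^{i(t - kT)\xi}\,d\xi = \int_{-\Omega}^{\Omega} \cos\bigl( (t - kT)\xi \bigr)\,d\xi = \frac{2}{t - kT}\, \sin\bigl( \Omega(t - kT) \bigr) \quad (t \ne kT)$$
$$= 2\Omega\, \mathrm{sinc}\bigl( \Omega(t - kT) \bigr) \quad (t \in \mathbb{R}) ,$$
so that we definitively obtain the stated formula
$$f(t) = \sum_{k=-\infty}^{\infty} f(kT)\, \mathrm{sinc}\bigl( \Omega(t - kT) \bigr) \quad (t \in \mathbb{R}) . \quad ⌋$$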
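The cardinal series can be tried out on a concrete bandlimited signal. The sketch below is an illustration under stated assumptions: the test signal $f(t) = (\sin t / t)^2$ is our choice (its Fourier transform is a triangle supported on $[-2, 2]$, so $\Omega = 2$ and $T = \pi/2$), the series is truncated at a hypothetical `k_max`, and the function names are not from the text.

```python
import numpy as np

def sinc(x):
    """Unnormalized sinc of the text, sin(x)/x with sinc(0) = 1 (np.sinc is the normalized one)."""
    return np.sinc(np.asarray(x, dtype=float) / np.pi)

def cardinal_series(f, omega, t, k_max):
    """Truncated cardinal series: sum over |k| <= k_max of f(kT) sinc(omega (t - kT)), T = pi/omega."""
    T = np.pi / omega
    k = np.arange(-k_max, k_max + 1)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    terms = f(k * T) * sinc(omega * (t[:, None] - k * T))   # shape (len(t), len(k))
    return terms.sum(axis=1)

def f(t):
    # Assumed test signal: bandlimited with Omega = 2 and decaying like 1/t^2,
    # so that an estimate of the form (2) holds.
    return sinc(t) ** 2

omega = 2.0
t = np.array([-2.3, -0.4, 0.0, 0.9, 3.1])        # points between and at the sample points
approx = cardinal_series(f, omega, t, k_max=400)
print(np.max(np.abs(approx - f(t))))
```

The remaining error is caused only by the truncation of the series; it shrinks when `k_max` is increased.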
The frequency (angular velocity, to be exact) $\Omega := \pi/T$ is called the Nyquist
frequency for the chosen sampling interval $T$. Conversely, the quantity $T^{-1}$
represents the number of samples taken per unit of time and is called the
sampling rate. The sampling rate $T^{-1} := \Omega/\pi$ is called the Nyquist rate for
functions of bandwidth $\Omega$.
Assume now that a certain sampling rate is given, e.g., $T^{-1} := 40000\ \mathrm{sec}^{-1}$.
What can be said when the actual bandwidth $\Omega'$ of the sampled function $f$
is larger than the Nyquist frequency $\Omega := \pi/T$? In order to answer this
question we need to go once more through the above proof. The places A
in (4) and B in (7) are the only two instances where the assumption that $\hat f$
vanishes identically outside of the interval $[-\Omega, \Omega]$ has actually been used. If
this assumption is not fulfilled, i.e., if the true bandwidth $\Omega'$ of $f$ is larger
than $\Omega = \pi/T$, then at the places A and B we no longer have equality, and
the cardinal series will not represent $f$.
Which other function is then represented by the cardinal series? One might
perhaps entertain the idea that simply the harmonic components $e_\xi$ with
frequencies $|\xi| > \Omega$ are filtered out, so that the cardinal series would essentially
produce the function
$$\tilde f(t) := \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \hat f(\xi)\, e^{it\xi}\,d\xi .$$
Unfortunately, this conjecture is false. In reality, a new phenomenon occurs.
It is called aliasing and is a nuisance in various fields of technology (telephone
communications, computer tomography, etc.), where discretization of analog
phenomena is an essential ingredient.
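Things become clearer when we now consider an $f$ that is only "moderately"
undersampled. We take
$$\Omega < \Omega' < 3\Omega$$
and assume that $\hat f(\xi) \equiv 0$ for $|\xi| > \Omega'$. Then we can write (cf. (4))
$$f(kT) = \frac{1}{\sqrt{2\pi}} \Bigl( \int_{-\Omega'}^{-\Omega} + \int_{-\Omega}^{\Omega} + \int_{\Omega}^{\Omega'} \Bigr) \hat f(\xi)\, e^{ikT\xi}\,d\xi .$$
If we make the substitution
$$\xi := \xi' \pm 2\Omega$$
in the two exterior integrals on the right, then $e^{ikT\xi} = e^{ikT\xi'}$ (because of
$2\Omega T = 2\pi$), and we obtain
$$f(kT) = \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \bigl( \hat f(\xi) + \hat f(\xi - 2\Omega) + \hat f(\xi + 2\Omega) \bigr)\, e^{ikT\xi}\,d\xi . \qquad (9)$$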
This brings into the game the continuous function $g \in L^2$ whose Fourier
transform is given by
$$\hat g(\xi) := \begin{cases} \hat f(\xi) + \hat f(\xi - 2\Omega) + \hat f(\xi + 2\Omega) & (-\Omega \le \xi \le \Omega) \\ 0 & (|\xi| > \Omega) \end{cases} \qquad (10)$$
Because of (9), the function $g$ satisfies
$$g(kT) = \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \hat g(\xi)\, e^{ikT\xi}\,d\xi = f(kT) \quad (k \in \mathbb{Z}) .$$
We realize that $g$ has the same cardinal series as $f$, but $g$ is, contrary to $f$,
truly $\Omega$-bandlimited. This implies that the common cardinal series of $f$ and
$g$ represents not $f$ but $g$, and we are led to the following general conclusion:
If the true bandwidth $\Omega'$ of $f$ is larger than the Nyquist frequency $\Omega := \pi/T$,
then the high frequency parts of $f$ are not simply filtered out or "forgotten"
by the cardinal series, but they appear therein, afflicted with a mysterious
frequency shift. The cardinal series produces an $\Omega$-bandlimited function $g$
whose Fourier transform $\hat g$ is given by (10) and is shown in Figure 2.10.
Figure 2.10 Aliasing
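The frequency shift by $2\Omega$ can be made visible with pure oscillations. The following sketch is only an illustration: a cosine is not in $L^1$ and therefore lies outside the hypotheses of the sampling theorem, and the sampling interval and the offset `delta` are arbitrary assumed values. It shows that an oscillation of frequency $\Omega + \delta$ and one of frequency $\Omega - \delta$ produce identical samples at the points $kT$.

```python
import numpy as np

T = 0.5                      # sampling interval (illustrative value)
omega = np.pi / T            # Nyquist frequency Omega = pi / T
delta = 0.3 * omega          # offset with Omega < Omega + delta < 3 * Omega

k = np.arange(-50, 51)
samples_high = np.cos((omega + delta) * k * T)   # frequency above the Nyquist frequency
samples_low = np.cos((omega - delta) * k * T)    # frequency (Omega + delta) - 2*Omega, up to sign

max_difference = np.max(np.abs(samples_high - samples_low))
print(max_difference)
```

Since the two sample sequences coincide up to rounding, no procedure that only sees the values at the points $kT$ can distinguish the two oscillations; the high frequency reappears as a low one.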
While undersampling leads, as we have seen, to the undesirable effect of alias-
ing, the skillful deployment of oversampling can be used to improve the rate
of convergence. We now show how this can be realized.
Let a sampling rate $T^{-1}$ be given and let $\Omega := \pi/T$ be the corresponding
Nyquist frequency. We assume that the signals $f$ taken into consideration are
$\Omega'$-bandlimited for some $\Omega' < \Omega$. Let the auxiliary function $q \in L^2$ be defined
by giving its Fourier transform:
$$\hat q(\xi) := \begin{cases} 1 & (|\xi| \le \Omega') \\[4pt] \dfrac{1}{2} \Bigl( 1 - \sin \dfrac{\pi (2|\xi| - \Omega - \Omega')}{2(\Omega - \Omega')} \Bigr) & (\Omega' \le |\xi| \le \Omega) \\[4pt] 0 & (|\xi| \ge \Omega) \end{cases}$$
Note that $q$ is, apart from the parameter values $\Omega$ and $\Omega'$, independent of $f$.
Figure 2.11 shows the graphs of $\hat q$ and of a typical $\hat f$ under consideration.
Figure 2.11
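The signal $f$ satisfies the assumptions of theorem (2.16), therefore (8) is valid,
and we may write
$$\hat f(\xi) = \frac{\sqrt{2\pi}}{2\Omega} \sum_{k=-\infty}^{\infty} f(kT)\, e^{-ikT\xi} \quad (|\xi| \le \Omega) .$$
Furthermore, we know that $\hat f(\xi)$ is identically zero for $\Omega' \le |\xi| \le \Omega$. In the
interval $|\xi| \le \Omega'$ we have $\hat q(\xi) \equiv 1$. This implies that, starting with (4), we
may do the following computation:
$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \hat f(\xi)\, \hat q(\xi)\, e^{it\xi}\,d\xi = \sum_{k=-\infty}^{\infty} f(kT)\, \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \hat q(\xi)\, e^{i(t - kT)\xi}\,d\xi .$$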
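Using the abbreviation
$$\frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \hat q(\xi)\, e^{is\xi}\,d\xi =: Q(s) , \qquad (11)$$
we see that the cardinal series (3) has been transformed into the novel representation
$$f(t) = \sum_{k=-\infty}^{\infty} f(kT)\, Q(t - kT) . \qquad (12)$$
In order to be able to judge the announced improvement in convergence we
need the "universal" (i.e., independent of $f$) function $Q$ in explicit form. Since
$\hat q$ is an even function, the integral (11) is computed as follows:
$$Q(s) = \frac{1}{\Omega} \Bigl( \int_0^{\Omega'} \cos(s\xi)\,d\xi + \int_{\Omega'}^{\Omega} \hat q(\xi) \cos(s\xi)\,d\xi \Bigr) = \ldots = \frac{\pi^2}{2\Omega s} \cdot \frac{\sin(\Omega' s) + \sin(\Omega s)}{\pi^2 - (\Omega - \Omega')^2 s^2} .$$
From this, we immediately deduce
$$Q(s) = O\Bigl( \frac{1}{|s|^3} \Bigr) \quad (|s| \to \infty) .$$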
Let us consider an example. Oversampling the time signal $f$ twice means
$\Omega' = \Omega/2$. Imagine that we want to reconstruct the signal $f$ in the $t$-interval
$[0, T]$. For the comparison of (12) and (3) we have to estimate the order of
magnitude of the factor $Q(t - kT)$ in (12) when $|k| \to \infty$. It is given by
$$\frac{2\pi^2}{2\Omega\, |k| T \cdot (\Omega/2)^2 (kT)^2} = \frac{4}{\pi}\, \frac{1}{|k|^3} .$$
In simplifying, we have used the relation $\Omega T = \pi$. Compare this with
the cardinal series (3): The order of magnitude of the corresponding factor
$\mathrm{sinc}(\Omega(t - kT))$ when $|k| \to \infty$ is much larger, namely
$$\frac{1}{\pi}\, \frac{1}{|k|} .$$
It follows that, using (3), one would have to take several times more terms into
account as compared to (12) in order to guarantee the same level of precision.
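The effect of oversampling on the truncation error can be observed in a small experiment. The sketch below relies on several assumptions that are not in the text: the test signal $f(t) = (\sin t/t)^2$ with $\Omega' = 2$, the choice $\Omega = 4$ (oversampling twice), the evaluation point `t`, and the closed form $Q(s) = \frac{\pi^2}{2\Omega s}\,\frac{\sin(\Omega' s) + \sin(\Omega s)}{\pi^2 - (\Omega-\Omega')^2 s^2}$ for the kernel. It compares the truncated series (3) and (12) for the same number of terms.

```python
import numpy as np

def sinc(x):
    """Unnormalized sinc of the text, sin(x)/x with sinc(0) = 1."""
    return np.sinc(np.asarray(x, dtype=float) / np.pi)

def Q(s, omega, omega_prime):
    """Closed form of the kernel Q; s must avoid the removable singularities
    s = 0 and s = +-pi / (omega - omega_prime)."""
    numerator = np.pi ** 2 * (np.sin(omega_prime * s) + np.sin(omega * s))
    denominator = 2.0 * omega * s * (np.pi ** 2 - (omega - omega_prime) ** 2 * s ** 2)
    return numerator / denominator

def f(t):
    return sinc(t) ** 2               # assumed test signal, bandlimited with Omega' = 2

omega_prime, omega = 2.0, 4.0         # oversampling twice: Omega' = Omega / 2
T = np.pi / omega
t = 0.3                               # t - kT never hits a singularity of Q for this choice

def truncation_errors(k_max):
    """Errors of the series (3) and (12), both truncated to |k| <= k_max."""
    k = np.arange(-k_max, k_max + 1)
    s = t - k * T
    error_sinc = abs(np.sum(f(k * T) * sinc(omega * s)) - f(t))
    error_q = abs(np.sum(f(k * T) * Q(s, omega, omega_prime)) - f(t))
    return error_sinc, error_q

for k_max in (10, 50, 250):
    print(k_max, truncation_errors(k_max))
```

In an actual implementation the removable singularities of `Q` would need separate treatment; here they are simply avoided by the choice of `t`.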
3 The continuous wavelet transform
3.1 Definitions and examples
A function $\psi: \mathbb{R} \to \mathbb{C}$ satisfying the conditions
$$\|\psi\| = 1 \qquad (1)$$
and
$$2\pi \int_{\mathbb{R}^*} \frac{|\hat\psi(a)|^2}{|a|}\,da =: C_\psi < \infty \qquad (2)$$
is called a mother wavelet or simply a wavelet. These two conditions represent
the bare minimum that is necessary for the functioning of the theory described
in this chapter. All wavelets occurring in practice are $L^1$-functions as well,
most of them are continuous (the Haar wavelet isn't), many are differentiable,
and the wavelets that are the most popular (as mathematical objects, if not
in the applications) have compact support.
Whether a proposed function $\psi \in L^2$ fulfills condition (2) cannot be decided
just by looking at it. That's why the following criterion is of help, at least for
reasonable $\psi$'s; at the same time it gives an intuitively accessible interpretation
of condition (2):
(3.1) For functions $\psi \in L^2$ satisfying $t\psi \in L^1$, i.e., $\int |t|\, |\psi(t)|\,dt < \infty$,
condition (2) is equivalent to
$$\int_{-\infty}^{\infty} \psi(t)\,dt = 0 \quad \text{resp.} \quad \hat\psi(0) = 0 . \qquad (3)$$
According to this proposition a wavelet has mean value 0. From this we infer
that the graph $\mathcal{G}(\psi)$ of a wavelet $\psi$ lies, as most graphs of "waves" do, partly
above and partly below the $t$-axis.
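⌈ A function $\psi$ of the described kind is automatically in $L^1$, and one has
$$\hat\psi(0) = \frac{1}{\sqrt{2\pi}} \int \psi(t)\,dt .$$
By (2.9) the Fourier transform $\hat\psi$ is continuous. Then the integral (2) can
only converge if $\hat\psi(0) = 0$.
Conversely: The condition $t\psi \in L^1$ implies $\hat\psi \in C^1$ by (2.13). Let
$$\sup \bigl\{ |\hat\psi'(\xi)| \bigm| |\xi| \le 1 \bigr\} =: M .$$
Now, if $\hat\psi(0) = 0$, then the mean value theorem of differential calculus implies
$$|\hat\psi(a)| \le M |a| \quad (|a| \le 1) ,$$
and we obtain the estimate
$$\int_{\mathbb{R}^*} \frac{|\hat\psi(a)|^2}{|a|}\,da \le M^2 \int_{|a| \le 1} |a|\,da + \int_{|a| > 1} |\hat\psi(a)|^2\,da \le M^2 + \|\psi\|^2 < \infty . \quad ⌋$$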
Assume that a certain wavelet $\psi$ has been chosen and is held fixed. Then the
function
$$Wf(a, b) := \frac{1}{|a|^{1/2}} \int f(t)\, \overline{\psi\Bigl( \frac{t - b}{a} \Bigr)}\,dt \quad (a \ne 0) \qquad (4)$$
is called the wavelet transform of the time signal $f \in L^2$ with respect to $\psi$.
The domain of definition of $Wf$ is the $(a, b)$-plane, "cut into two halves", i.e.,
the set
$$\mathbb{R}^2_- := \{ (a, b) \mid a \in \mathbb{R}^*,\ b \in \mathbb{R} \} .$$
Note again that in wavelet theory the $a$-axis is drawn vertically and the $b$-axis
horizontally (see, for example, Figure 3.7). Very often the domain of $Wf$ is
restricted to positive $a$-values. In this case, condition (2) has to be modified
slightly (see below).
$Wf$ is a function of two real variables; therefore its graphical representation
in a figure is not as easily accomplished as that of $f$ or $\hat f$. We refer the reader
to Example for a version that is easily implemented on a computer.
Assume that a wavelet $\psi$ has been chosen once and for all. For arbitrary $a \ne 0$
let
$$\psi_a(t) := \frac{1}{|a|^{1/2}}\, \psi\Bigl( \frac{t}{a} \Bigr)$$
be the function obtained from $\psi$ by stretching its graph horizontally from 0
by the factor $|a|$, reflecting it at the vertical axis in case $a < 0$, and finally
scaling it appropriately in the vertical direction, making $\|\psi_a\| = 1$.
If after this dilation process the function $\psi_a$ is translated along the time axis
by the amount $b$ (to the right, if $b > 0$), one obtains the function
$$\psi_{a,b}(t) := \psi_a(t - b) = \frac{1}{|a|^{1/2}}\, \psi\Bigl( \frac{t - b}{a} \Bigr) , \qquad (5)$$
appearing in the integral (4); see Figure 3.1. We obviously have
$$\|\psi_{a,b}\| = 1 \quad \forall (a, b) \in \mathbb{R}^2_- .$$
Using the $\psi_{a,b}$ we can write the definition (4) of the wavelet transform in the
form of a scalar product:
$$Wf(a, b) = \langle f, \psi_{a,b} \rangle . \qquad (6)$$
Figure 3.1
This implies, first, that at each point $(a, b) \in \mathbb{R}^* \times \mathbb{R}$ the wavelet transform
$Wf$ has a well determined value $Wf(a, b)$ and, second, by Schwarz' inequality,
that $Wf$ is uniformly bounded on $\mathbb{R}^2_-$:
$$|Wf(a, b)| \le \|f\| \quad \forall (a, b) \in \mathbb{R}^2_- . \qquad (7)$$
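We now compute the Fourier transforms of the functions $\psi_{a,b}$. According to
rule (R3) we have
$$\hat\psi_a(\xi) = |a|^{1/2}\, \hat\psi(a\xi) ,$$
whence we obtain by rule (R1), applied to (5):
$$\hat\psi_{a,b}(\xi) = |a|^{1/2}\, e^{-ib\xi}\, \hat\psi(a\xi) . \qquad (8)$$
On account of (2.11) (Parseval's formula) and (6) we therefore can write
$Wf(a, b)$ in the following form:
$$Wf(a, b) = \langle \hat f, \hat\psi_{a,b} \rangle = |a|^{1/2} \int \hat f(\xi)\, \overline{\hat\psi(a\xi)}\, e^{ib\xi}\,d\xi . \qquad (9)$$
The last integral can be regarded as a Fourier integral; to be precise, it gives
the Fourier$^\vee$ transform (i.e., the inverse Fourier transform) of the $L^1$-function
$$F_a(\xi) := \sqrt{2\pi}\, |a|^{1/2}\, \hat f(\xi)\, \overline{\hat\psi(a\xi)} , \qquad (10)$$
written as a function of the variable $b$. Altogether, we have proven the following
proposition:
(3.2) For fixed $a \ne 0$ the function
$$Wf(a, \cdot): \quad b \mapsto Wf(a, b)$$
can be regarded as the Fourier$^\vee$ transform of the function $F_a$, the latter given
by (10).
Because of (2.9) one may conclude in particular that the function $Wf$ is
continuous on horizontal lines $a = \mathrm{const.}$, and takes the limit 0 when $b \to \pm\infty$,
keeping $a$ fixed.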
CD The function 1/J := 1/JHaar is obviously a wavelet in the sense of the general
definition. If a > 0 then
and consequently

(b + :::; t < b + a)
( otherwise)
1 (1b+
a
/2 l
b
+
a
)
Wf(a, b) = r;; f(t) dt - f(t) dt
va b b+a/2
= - - f(t) dt - - f(t) dt
fa (21b+
a
/2 21
b
+
a
)
2 a b a b+a/2
This shows that (apart from the normalizing factor) the value $Wf(a,b)$ represents a difference between two mean values of $f$, these means being taken over two adjacent intervals of length $\tfrac{a}{2}$ in the neighbourhood of $b$, as indicated in Figure 3.2.
Figure 3.2
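The "difference of two means" reading can be checked numerically. The following is a minimal sketch, not taken from the book: the helper names `integrate` and `haar_wf` and the use of a midpoint rule are assumptions made for illustration, and only the case $a > 0$ is covered.

```python
import math

def integrate(f, lo, hi, n=2000):
    """Midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def haar_wf(f, a, b):
    """Wf(a, b) for the Haar wavelet and a > 0, written as a scaled difference of two means."""
    mean_left = (2.0 / a) * integrate(f, b, b + a / 2)       # mean of f over [b, b + a/2]
    mean_right = (2.0 / a) * integrate(f, b + a / 2, b + a)  # mean of f over [b + a/2, b + a]
    return (math.sqrt(a) / 2.0) * (mean_left - mean_right)

# For f(t) = t the two means differ by -a/2, so Wf(a, b) = -a**1.5 / 4 for every b.
value = haar_wf(lambda t: t, 4.0, 1.0)
print(value)
```

For the linear signal the result does not depend on $b$, which matches the interpretation of $Wf(a,b)$ as a local measure of change rather than of level.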
We may also look at the same quantity $Wf(a,b)$ in a totally different way:
$$Wf(a,b) = \frac{1}{\sqrt a}\int_b^{b+a/2}\bigl(f(t) - f(t + \tfrac{a}{2})\bigr)\,dt = -\frac{1}{\sqrt a}\int_b^{b+a/2}\left(\int_t^{t+a/2} f'(\tau)\,d\tau\right)dt = \ldots = -\frac{1}{\sqrt a}\int_{-a/2}^{a/2}\Bigl(\frac{a}{2} - |\tau|\Bigr)\, f'\bigl(b + \tfrac{a}{2} + \tau\bigr)\,d\tau .$$
Written in this form the value $Wf(a,b)$ appears as a weighted mean of the derivative $f'$ over the interval $[b, b+a]$. Figure 3.3 shows the graph of the weight function relating to this second interpretation of $Wf(a,b)$. ○
Figure 3.3
② Consider the function
$$\psi(t) := \frac{2}{\sqrt 3}\,\pi^{-1/4}\,(1 - t^2)\,e^{-t^2/2}, \tag{11}$$
where the leading numerical factor ($=: \gamma$) is chosen so as to make $\|\psi\| = 1$. The graph of $\psi$ is shown in Figure 3.4; its shape immediately reminds one of a Mexican hat.
As is easily verified, one has $\psi(t) = -\gamma\,g''(t)$, where $g(t) := e^{-t^2/2}$ denotes the Gaussian. In Example 2.2.② we computed the Fourier transform of the latter and found that it is equal to $g$. We conclude, using rule (R4), that
$$\hat\psi(\xi) = \gamma\,\xi^2\,e^{-\xi^2/2}.$$
In particular, we have $\hat\psi(0) = 0$, and from Proposition 3.1 we infer that the function $\psi$ is indeed a wavelet. For obvious reasons this function is called the Mexican hat.
Figure 3.4 Mexican hat
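The two defining properties of the Mexican hat, mean 0 and norm 1, are easy to confirm numerically. This is a minimal sketch under stated assumptions: the constant `GAMMA` is the normalizing factor $2/(\sqrt3\,\pi^{1/4})$, and the integration window $[-12, 12]$ and the midpoint rule are choices made here, not part of the text.

```python
import math

GAMMA = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)  # normalizing factor of the Mexican hat

def mexican_hat(t):
    """psi(t) = GAMMA * (1 - t^2) * exp(-t^2 / 2)."""
    return GAMMA * (1.0 - t * t) * math.exp(-t * t / 2.0)

def integrate(f, lo=-12.0, hi=12.0, n=24000):
    """Midpoint rule; the Gaussian tails beyond |t| = 12 are negligible."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

mean = integrate(mexican_hat)                            # should vanish (wavelet condition)
norm_squared = integrate(lambda t: mexican_hat(t) ** 2)  # should equal 1
print(mean, norm_squared)
```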
In Figure 3.5 the graph of a modulated Gaussian is shown. It is constructed as follows: First a fundamental frequency $\omega > 0$ is chosen and held fixed. It seems that for certain practical reasons the value $\omega := 5$ is a good choice, see [D], 3.3.5.C, for details. It is evident that the "wave train"
$$\chi(t) := e^{i\omega t}\,e^{-t^2/2}$$
would be an interesting candidate to serve as a "key pattern". Unfortunately the condition $\hat\chi(0) = 0$ is not fulfilled. For this reason we modify $\chi$ slightly to
$$\psi(t) := \bigl(e^{i\omega t} - A\bigr)\,e^{-t^2/2}$$
and now have to pick a suitable value for $A$. Rule (R2) gives
$$\hat\psi(\xi) = e^{-(\xi-\omega)^2/2} - A\,e^{-\xi^2/2},$$
and consequently $\hat\psi(0) = e^{-\omega^2/2} - A$. This shows that setting $A := e^{-\omega^2/2}$ we can satisfy condition (3); therefore the complex valued function
$$\psi(t) := \bigl(e^{i\omega t} - e^{-\omega^2/2}\bigr)\,e^{-t^2/2}$$
is in principle acceptable as a wavelet. The $\psi$ as given by this formula has yet to be normalized. We leave it to the reader as an exercise to perform the necessary calculations to that end. ○
Figure 3.5 Modulated Gaussian ($\omega = 3$)
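The effect of the correction term can be seen numerically: the plain wave train has a small but nonzero mean, while the corrected function has mean 0. The sketch below assumes $\omega = 5$ and uses a midpoint rule on $[-12, 12]$; the function names are chosen here for illustration only.

```python
import cmath
import math

OMEGA = 5.0
A = math.exp(-OMEGA ** 2 / 2.0)  # correction constant A = exp(-omega^2 / 2)

def chi(t):
    """Uncorrected wave train exp(i*omega*t) * exp(-t^2/2)."""
    return cmath.exp(1j * OMEGA * t) * math.exp(-t * t / 2.0)

def psi(t):
    """Corrected (not yet normalized) modulated Gaussian."""
    return (cmath.exp(1j * OMEGA * t) - A) * math.exp(-t * t / 2.0)

def integrate(f, lo=-12.0, hi=12.0, n=24000):
    """Midpoint rule for a complex-valued integrand."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

mean_chi = integrate(chi)  # equals sqrt(2*pi) * A, which is small but not zero
mean_psi = integrate(psi)  # vanishes, so psi satisfies the mean-zero condition
print(abs(mean_chi), abs(mean_psi))
```

For $\omega = 5$ the constant $A$ is of the order $10^{-6}$, which illustrates why the correction is numerically tiny at this frequency.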
④ An arbitrary function $\psi \in L^2 \cap L^1$ having norm 1, mean 0 and compact support is automatically a wavelet: Let $\psi(t)$ be $\equiv 0$ for $|t| > b$. The function $h(t) := |t|\,1_{[-b,b]}(t)$ is obviously in $L^2$, thus
$$\int |t|\,|\psi(t)|\,dt = \langle h, |\psi|\rangle < \infty,$$
and the above statement follows using (3.1). ○
Figure 3.6
The following is an attempt to visualize the wavelet transform of a given time signal $f$ as a function of two real variables. As our analyzing wavelet we take the Mexican hat (11). We let the time signal $f$ be a superposition of the three "notes"
$$f_1(t) := 2 - 2|t+2| \quad (-3 \le t \le -1), \qquad := 0 \quad (\text{otherwise}),$$
$$f_2(t) := 1 - \cos(2\pi t) \quad (0 \le t \le 3), \qquad := 0 \quad (\text{otherwise}),$$
$$f_3(t) := \tfrac12\bigl(1 - \cos(5\pi t)\bigr) \quad (4 \le t \le 6), \qquad := 0 \quad (\text{otherwise})$$
(see Figure 3.6) with suitably chosen coefficients:
$$f(t) := 2.883\,f_1(t) + 1.205\,f_2(t) + 0.968\,f_3(t). \tag{12}$$
In order to compensate for the natural decay of $Wf(a,b)$ when $a \to 0$ (see Theorem (3.15) below) we show a density plot of the function
$$w(a,b) := \frac{1}{a^{3/2}}\,|Wf(a,b)| \qquad (0 < a \le 0.4)$$
instead of $Wf$. The intensities appearing in (12) were chosen in such a way that the three components $w_1$, $w_2$, $w_3$ assume the same maximal value $w_{\max} = 10$ in the considered $(a,b)$-domain. Figure 3.7 consists of $480 \times 768$ pixels, each of them representing a point $(a,b)$ in the indicated rectangle. For each pixel we computed its test score $p := w(a,b)/w_{\max}$ numerically; subsequently the pixel in question was colored black with probability $p$, using a random number generator. ○
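The quantity $w(a,b)$ can be evaluated pointwise with a few lines of code. The following is a rough sketch, not the procedure used for the figure: it assumes the Mexican hat with normalizing factor $2/(\sqrt3\,\pi^{1/4})$, uses the substitution $t = b + ay$, and integrates with a midpoint rule over $|y| \le 10$; all function names are illustrative.

```python
import math

GAMMA = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)

def mexican_hat(y):
    return GAMMA * (1.0 - y * y) * math.exp(-y * y / 2.0)

def signal(t):
    """The three-note signal f of (12)."""
    f1 = 2.0 - 2.0 * abs(t + 2.0) if -3.0 <= t <= -1.0 else 0.0
    f2 = 1.0 - math.cos(2.0 * math.pi * t) if 0.0 <= t <= 3.0 else 0.0
    f3 = 0.5 * (1.0 - math.cos(5.0 * math.pi * t)) if 4.0 <= t <= 6.0 else 0.0
    return 2.883 * f1 + 1.205 * f2 + 0.968 * f3

def wf(a, b, half_width=10.0, n=4000):
    """Wf(a, b) for a > 0: sqrt(a) * integral of f(b + a*y) * psi(y) dy (psi is real)."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        y = -half_width + (i + 0.5) * h
        total += signal(b + a * y) * mexican_hat(y)
    return math.sqrt(a) * h * total

def w(a, b):
    """Compensated intensity w(a, b) = |Wf(a, b)| / a^(3/2)."""
    return abs(wf(a, b)) / a ** 1.5

# At the kink of the first note (b = -2) the compensated value stays close to 10 for small a.
kink = w(0.05, -2.0)
print(kink)
```

A density plot in the spirit of Figure 3.7 would evaluate `w` on a grid of $(a,b)$ values; the grid size and the random dithering are left out of this sketch.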
Figure 3.7 The wavelet transform of the function $f$ given by (12); cf. Figure 3.6
3.2 A Plancherel formula
The wavelet transform accepts functions $f \in L^2(\mathbb{R})$ as input and produces functions $Wf\colon \mathbb{R}^2_- \to \mathbb{C}$ as output. If in such a situation we contemplate establishing a Plancherel formula, we of course need a scalar product for functions $u\colon \mathbb{R}^2_- \to \mathbb{C}$. For the definition of a scalar product we need a measure on the set $\mathbb{R}^2_- := \mathbb{R}^* \times \mathbb{R}$. The two-dimensional Lebesgue measure $da\,db$ comes to mind first, but it is not appropriate here for the following reason: The variables $a$ and $b$ are not on an equal footing, as, e.g., the variables $x$ and $y$ in the euclidean plane are. Looking at the integral 3.1.(4) defining the wavelet transform we see that a point $(a,b) \in \mathbb{R}^2_-$ is used implicitly to characterize the affine transformation
$$S_{a,b}\colon\ \mathbb{R} \to \mathbb{R}, \qquad \tau \mapsto t := a\tau + b$$
of the time axis, and here it is for everyone to see that the stretching factor $|a|$ is of much greater importance than the translational variable $b$.
The totality
$$\mathrm{Aff}(\mathbb{R}) := \{S_{a,b} \mid (a,b) \in \mathbb{R}^2_-\} \tag{1}$$
of these affine transformations is a topological group with respect to $\circ$ (i.e., composition) and as such it carries a "natural" measure $d\mu$, called left invariant Haar measure. Formula (1) defines a parametrization of the group $\mathrm{Aff}(\mathbb{R})$ by the set $\mathbb{R}^2_-$, so the measure $d\mu$ becomes manifest as a measure in the $(a,b)$-plane. The resulting expression for $d\mu = d\mu(a,b)$ can be computed explicitly; one finds
$$d\mu = d\mu(a,b) := \frac{1}{|a|^2}\,da\,db . \tag{2}$$
The explanations given here only serve to motivate heuristically why we adopt the particular measure (2) on the set $\mathbb{R}^2_-$ and no other. For a more detailed account of Haar measure we refer the reader to the literature, e.g., [8] or [16]; but the general theory of Haar measure will not be needed in the remainder of the book.
This having been settled, we can talk about the Hilbert space
$$H := L^2(\mathbb{R}^2_-, d\mu),$$
whose scalar product is defined by
$$\langle u, v\rangle_H := \int_{\mathbb{R}^2_-} u(a,b)\,\overline{v(a,b)}\,\frac{da\,db}{|a|^2} .$$
Having all the necessary ingredients ready we can finally formulate the Plancherel theorem announced in the title of this section.
(3.3) Let $\psi$ be an arbitrary wavelet and let $W$ denote the corresponding wavelet transform. Then for all $f, g \in L^2$ the following is true:
$$\langle Wf, Wg\rangle_H = C_\psi\,\langle f, g\rangle .$$
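⌜ We work with the function $F_a$ introduced in 3.1.(10) and let the function $G_a$ be defined analogously from $g$. Using (3.2) and (2.11) we obtain successively
$$\begin{aligned}
\langle Wf, Wg\rangle_H &= \int_{\mathbb{R}^*} \frac{da}{|a|^2}\int Wf(a,b)\,\overline{Wg(a,b)}\,db = \int_{\mathbb{R}^*} \frac{da}{|a|^2}\int F_a(\xi)\,\overline{G_a(\xi)}\,d\xi \\
&= 2\pi\int_{\mathbb{R}^*} \frac{da}{|a|}\int \hat f(\xi)\,\overline{\hat g(\xi)}\,|\hat\psi(a\xi)|^2\,d\xi = 2\pi\int \hat f(\xi)\,\overline{\hat g(\xi)}\left(\int_{\mathbb{R}^*} \frac{|\hat\psi(a\xi)|^2}{|a|}\,da\right)d\xi .
\end{aligned} \tag{3}$$
The inner integral in the last line ($=: Q$) is trivially 0 when $\xi = 0$, and for $\xi \ne 0$ the substitution
$$a := \frac{a'}{\xi} \quad (a' \in \mathbb{R}^*), \qquad da = \frac{da'}{|\xi|}$$
(absolute value of the Jacobian!) gives for $Q$ the value
$$Q = \int_{\mathbb{R}^*} \frac{|\hat\psi(a')|^2}{|a'|}\,da' = \frac{C_\psi}{2\pi},$$
independently of $\xi$. Therefore we may continue the chain of equations (3) by
$$\langle Wf, Wg\rangle_H = C_\psi \int \hat f(\xi)\,\overline{\hat g(\xi)}\,d\xi = C_\psi\,\langle \hat f, \hat g\rangle = C_\psi\,\langle f, g\rangle .$$
By Fubini's theorem the resulting expression justifies all our previous formal manipulations. ⌟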
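For a concrete wavelet the constant $C_\psi$ can be computed directly. The following minimal sketch treats the Mexican hat, assuming its Fourier transform $\hat\psi(\xi) = \gamma\,\xi^2 e^{-\xi^2/2}$ with $\gamma = 2/(\sqrt3\,\pi^{1/4})$ and the definition $C_\psi = 2\pi\int_{\mathbb{R}^*} |\hat\psi(\xi)|^2/|\xi|\,d\xi$; the closed value $8\sqrt\pi/3$ used for comparison follows from $\int_0^\infty \xi^3 e^{-\xi^2}\,d\xi = \tfrac12$.

```python
import math

GAMMA = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)

def psi_hat(xi):
    """Fourier transform of the Mexican hat (unitary convention assumed)."""
    return GAMMA * xi * xi * math.exp(-xi * xi / 2.0)

def c_psi(upper=12.0, n=24000):
    """C_psi = 2*pi * integral over R* of |psi_hat(xi)|^2 / |xi|, using evenness of the integrand."""
    h = upper / n
    half_line = 0.0
    for i in range(n):
        xi = (i + 0.5) * h
        half_line += psi_hat(xi) ** 2 / xi
    return 2.0 * math.pi * 2.0 * h * half_line

numeric = c_psi()
closed_form = 8.0 * math.sqrt(math.pi) / 3.0
print(numeric, closed_form)
```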
Before we analyze this theorem and its consequences we present some alternative versions of (3.3).
In many cases only scaling factors $a > 0$ are taken into consideration; i.e., the wavelet transform $Wf$ is restricted to the upper half-plane
$$\mathbb{R}^2_> := \mathbb{R}_{>0} \times \mathbb{R},$$
and on $\mathbb{R}^2_>$ the same measure (2) is adopted as before. Let
$$H' := L^2(\mathbb{R}^2_>, d\mu)$$
be the corresponding Hilbert space. If we insist that already "half the wavelet transform" $Wf|_{\mathbb{R}^2_>}$ should allow a Plancherel formula, then our wavelet $\psi$ must satisfy a certain symmetry condition, namely
$$2\pi\int_{\mathbb{R}_{<0}} \frac{|\hat\psi(a)|^2}{|a|}\,da = 2\pi\int_{\mathbb{R}_{>0}} \frac{|\hat\psi(a)|^2}{|a|}\,da =: C'_\psi . \tag{4}$$
This condition is automatically fulfilled if $\psi$ is symmetric (i.e., even) or real-valued: If $\psi$ is symmetric, then $\hat\psi$ is symmetric as well, and if $\psi$ is a real-valued function, then $\hat\psi(-\xi) \equiv \overline{\hat\psi(\xi)}$.
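(3.4) Let $\psi$ be a wavelet satisfying the symmetry condition (4) and let $W$ denote the corresponding wavelet transform. Then for all $f, g \in L^2$ the following is true:
$$\langle Wf, Wg\rangle_{H'} = C'_\psi\,\langle f, g\rangle .$$
⌜ The chain of equations analogous to (3) now reads as follows:
$$\langle Wf, Wg\rangle_{H'} = \int_{\mathbb{R}_{>0}} \frac{da}{|a|^2}\left(\int Wf(a,b)\,\overline{Wg(a,b)}\,db\right) = 2\pi\int \hat f(\xi)\,\overline{\hat g(\xi)}\left(\int_{\mathbb{R}_{>0}} \frac{|\hat\psi(a\xi)|^2}{|a|}\,da\right)d\xi .$$
The inner integral in the last line ($=: Q'$) is trivially 0 when $\xi = 0$. If $\xi > 0$, the substitution
$$a := \frac{a'}{\xi} \quad (a' \in \mathbb{R}_{>0}), \qquad da = \frac{da'}{\xi}$$
leads to
$$Q' = \int_{\mathbb{R}_{>0}} \frac{|\hat\psi(a')|^2}{|a'|}\,da' = \frac{C'_\psi}{2\pi} .$$
Similarly, in the case $\xi < 0$, the substitution
$$a := \frac{a'}{\xi} \quad (a' \in \mathbb{R}_{<0}), \qquad da = \frac{da'}{|\xi|}$$
gives
$$Q' = \int_{\mathbb{R}_{<0}} \frac{|\hat\psi(a')|^2}{|a'|}\,da' = \frac{C'_\psi}{2\pi} .$$
Now one continues as before:
$$\langle Wf, Wg\rangle_{H'} = C'_\psi\,\langle \hat f, \hat g\rangle = C'_\psi\,\langle f, g\rangle . \quad ⌟$$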
A second look at the proof of theorem (3.3) shows that the bilinearity of the Plancherel formula with respect to the variables $f$ and $g$ permits a considerable generalization of the theorem: One may transform $f$ and $g$ by means of two different wavelets and still get a formula of type (3.3). This fact of course increases the flexibility of the wavelet transform both for the analysis and for the synthesis of time signals $f$.
(3.5) Let $\psi$ and $\chi$ be two wavelets and assume that the integral
$$2\pi\int_{\mathbb{R}^*} \frac{\overline{\hat\psi(a)}\,\hat\chi(a)}{|a|}\,da =: C_{\psi\chi} \tag{5}$$
is defined, i.e., finite. If $W_\psi$ and $W_\chi$ denote the wavelet transform with respect to $\psi$ and $\chi$, then the following is true for arbitrary $f, g \in L^2$:
$$\langle W_\psi f, W_\chi g\rangle_H = C_{\psi\chi}\,\langle f, g\rangle .$$
⌜ Repeat the proof of (3.3) with $F_a$ defined by 3.1.(10) as before, while $G_a$ obviously has to be replaced by
$$G_a(\xi) := \sqrt{2\pi}\,|a|^{1/2}\,\hat g(\xi)\,\overline{\hat\chi(a\xi)} .$$
We leave the details to the reader. ⌟
The formulas established in this section are best understood in the framework of topological groups and their representations. For a short but very readable presentation of this aspect see [L], Section 1.6.
3.3 Inversion formulas
The continuous wavelet transform encodes a given time signal, i.e., a function $f$ of one real variable $t$, as a function $Wf$ of two real variables $a$ and $b$. Instead of $\infty^1$ data we now have, so to speak, $\infty^2$ of them, and this means that $f$ is represented in the data $(Wf(a,b) \mid (a,b) \in \mathbb{R}^2_-)$ with very high redundancy. It will come as no surprise that this circumstance greatly facilitates the reconstruction of the original signal $f$ from $Wf$. As a matter of fact, there is not only one inversion formula, as with the Fourier transform, but in the end there is an arbitrary number of such formulas. We shall see in the next chapter that even an appropriate discrete collection of values of $Wf$ suffices to restore $f$ completely; in other words, there is also a kind of Shannon theorem for the wavelet transform.
In purely set theoretic terms the set $\mathbb{R}^2_-$ has "the same number" of points as $\mathbb{R}$, and consequently there are "equally many" functions of the form $u\colon \mathbb{R}^2_- \to \mathbb{C}$ as there are functions $f\colon \mathbb{R} \to \mathbb{C}$. Nevertheless, it is beyond question that not every theoretically possible set of data $(u(a,b) \mid (a,b) \in \mathbb{R}^2_-)$ can actually occur as a wavelet transform of some function $f \in L^2$. This means that the values $Wf(a,b)$ of genuine wavelet transforms must be intercorrelated in an as yet mysterious way. We shall come back to this point in Section 3.4.
We will need the following regularization lemma:
(3.6) Let
$$g_\sigma(t) := \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl(-\frac{t^2}{2\sigma^2}\Bigr)$$
denote the normal distribution with standard deviation $\sigma$, and assume that the function $f \in L^1$ is continuous at some given point $x$. Then
$$\lim_{\sigma\to 0+} (f * g_\sigma)(x) = f(x) .$$
⌜ Let an $\varepsilon > 0$ be given. There is an $h > 0$ (not dependent on $\sigma$) with
$$|f(x-t) - f(x)| < \varepsilon \qquad (|t| \le h) .$$
Because of $\int g_\sigma(t)\,dt = 1$ we may write
$$(f * g_\sigma)(x) - f(x) = \int \bigl(f(x-t) - f(x)\bigr)\,g_\sigma(t)\,dt,$$
which can be estimated as follows:
$$\begin{aligned}
|(f * g_\sigma)(x) - f(x)| &\le \int_{|t|\le h} |f(x-t) - f(x)|\,g_\sigma(t)\,dt + \int_{|t|\ge h} \bigl(|f(x-t)| + |f(x)|\bigr)\,g_\sigma(t)\,dt \\
&\le \varepsilon\int_{-h}^{h} g_\sigma(t)\,dt + \|f\|_1\,g_\sigma(h) + |f(x)|\int_{|t|\ge h} g_\sigma(t)\,dt .
\end{aligned}$$
Here the first integral on the right hand side has a value $< 1$, and $g_\sigma(h)$ as well as the last integral tend to 0 with $\sigma \to 0+$; see Figure 3.8. Thus one can find a $\sigma_0$ so that for all $\sigma < \sigma_0$ the following is true:
$$|(f * g_\sigma)(x) - f(x)| < 2\varepsilon .$$
Since $\varepsilon > 0$ was arbitrary, the proof is complete. ⌟
We note as an addendum the following identity, valid for arbitrary $f \in L^2$:
$$(f * g_\sigma)(x) = \langle f, T_x g_\sigma\rangle . \tag{1}$$
The left hand side of (1) is by definition equal to $\int f(t)\,g_\sigma(x-t)\,dt$, but the same is true for the right hand side, since $g_\sigma$ is a real symmetric (i.e., even) function.
Figure 3.8
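The regularization lemma can be watched at work on a simple example. The sketch below is an illustration under assumptions chosen here: $f$ is the indicator function of $[0,1)$, which is in $L^1$ and continuous at $x = 0.5$, and the convolution is approximated by a midpoint rule on $[-10\sigma, 10\sigma]$.

```python
import math

def g(sigma, t):
    """Gaussian density with standard deviation sigma."""
    return math.exp(-t * t / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def f(t):
    """Indicator of [0, 1): integrable, continuous at 0.5, discontinuous at 0 and 1."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def smoothed(x, sigma, n=20000):
    """(f * g_sigma)(x) = integral of f(x - t) * g_sigma(t) dt, midpoint rule."""
    lo, hi = -10.0 * sigma, 10.0 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += f(x - t) * g(sigma, t)
    return h * total

# The smoothed values approach f(0.5) = 1 as sigma decreases.
values = [smoothed(0.5, sigma) for sigma in (1.0, 0.3, 0.1, 0.03)]
print(values)
```

For this particular $f$ the exact value is $\operatorname{erf}\bigl(0.5/(\sigma\sqrt2)\bigr)$, which gives an independent check of the numerical convolution.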
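The Plancherel formula (3.3) can be written as follows:
$$\langle f, g\rangle = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} Wf(a,b)\,\langle \psi_{a,b}, g\rangle\,\frac{da\,db}{|a|^2} . \tag{2}$$
Letting $g := T_x g_\sigma$ this becomes
$$\langle f, T_x g_\sigma\rangle = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} Wf(a,b)\,\langle \psi_{a,b}, T_x g_\sigma\rangle\,\frac{da\,db}{|a|^2},$$
so that by means of (1) we obtain the formula
$$(f * g_\sigma)(x) = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} Wf(a,b)\,(\psi_{a,b} * g_\sigma)(x)\,\frac{da\,db}{|a|^2} . \tag{3}$$
We now let $\sigma \to 0+$ on both sides of (3) and use Lemma (3.6). This leads to the following reconstruction formula for our time signal $f$:
(3.7) Let $x$ be a point of continuity of the time signal $f$. Under suitable assumptions about $f$ and $\psi$ one has the equality
$$f(x) = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} Wf(a,b)\,\psi_{a,b}(x)\,\frac{da\,db}{|a|^2} . \tag{4}$$
⌜ Performing the limit under the integral sign in (3) is quite subtle. For a complete proof we refer the reader to [D], Proposition 2.4.2. ⌟
Formula (4) can be viewed "abstractly" as saying
$$f = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} d\mu\;Wf(a,b)\,\psi_{a,b} . \tag{5}$$
Written in this form it represents the original signal $f$ as a superposition ("linear combination") of wavelet functions $\psi_{a,b}$, the values $Wf(a,b)$ of the wavelet transform serving as coefficients.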
By the way, the validity of (5) in the so-called "weak sense" can be regarded as an immediate consequence of the Plancherel formula (3.3). We are referring here to the following functional-analytic hocus-pocus: Any vector $f \in L^2$ possesses a second ("weak") personality in the form of a continuous conjugate-linear functional, to wit
$$g \mapsto \langle f, g\rangle ;$$
and any continuous conjugate-linear functional $\phi\colon L^2 \to \mathbb{C}$ belongs to a well determined $f$. If we now look at the Plancherel formula in the form (2) for a fixed $f$ and variable $g \in L^2$, then it says no more and no less than
$$\langle f, \cdot\rangle = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} d\mu\;Wf(a,b)\,\langle \psi_{a,b}, \cdot\rangle .$$
This can be expressed in words as follows: The "weak version" of $f$ is retrieved from $Wf$ by superimposing the functionals $\langle \psi_{a,b}, \cdot\rangle$, using the values $Wf(a,b)$ as coefficients. The formal agreement with (5) is evident.
From the two variants (3.4) and (3.5) of the Plancherel formula one derives in the same way the following reconstruction formulas:
(3.8) Under suitable regularity assumptions one has
$$f(x) = \frac{1}{C'_\psi}\int_{\mathbb{R}^2_>} Wf(a,b)\,\psi_{a,b}(x)\,\frac{da\,db}{|a|^2},$$
if $\psi$ satisfies the symmetry condition 3.2.(4), and similarly
$$f(x) = \frac{1}{C_{\psi\chi}}\int_{\mathbb{R}^2_-} W_\psi f(a,b)\,\chi_{a,b}(x)\,\frac{da\,db}{|a|^2},$$
if the quantity $C_{\psi\chi}$, see 3.2.(5), is defined.
The last formula can be read as
$$f = \frac{1}{C_{\psi\chi}}\int_{\mathbb{R}^2_-} d\mu\;\langle f, \psi_{a,b}\rangle\,\chi_{a,b} .$$
It performs the reconstruction of $f$ using a different set of wavelet functions from the ones previously used for the analysis of $f$. We shall encounter analysis-synthesis pairings of this kind a second time in connection with the discretized version of the wavelet transform.
3.4 The kernel function
Formula 3.3.(5) can be paraphrased in the following way: The mapping
$$f \mapsto \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} d\mu\;\langle f, \psi_{a,b}\rangle\,\psi_{a,b} \tag{1}$$
is the identity. If in this connection people talk about a resolution of the identity, then this is to be understood in an almost chemical sense: The map $\mathrm{id}\colon L^2 \to L^2$ is first resolved into its $(a,b)$-constituents and in the end recrystallized in the integral 3.3.(5) resp. (1). Resolutions of the identity are encountered already on a very elementary level: If $(e_1, \ldots, e_n)$ is an orthonormal basis of the euclidean $\mathbb{R}^n$, then the formula
$$x = \sum_{k=1}^{n} \langle x, e_k\rangle\,e_k$$
is valid identically in $x \in \mathbb{R}^n$; in other words, the mapping
$$x \mapsto \sum_{k=1}^{n} \langle x, e_k\rangle\,e_k$$
is the identity. There is, however, an essential difference relative to 3.3.(5) resp. (1): The vectors $e_k$ $(1 \le k \le n)$ are linearly independent, but the functions $\psi_{a,b}$ $(a \in \mathbb{R}^*,\ b \in \mathbb{R})$ are not. In Sections 4.1 and 4.2 we shall study these matters once again and in a more general setting.
For the moment we stay with $H := L^2(\mathbb{R}^2_-, d\mu)$. From (3.3) we infer
$$\|Wf\| \le \sqrt{C_\psi}\,\|f\|,$$
showing that the wavelet transform $W\colon L^2 \to H$ is a continuous map. Let
$$U := \{Wf \mid f \in L^2\} \subset H$$
be the image space. In the case at hand there is an inverse mapping
$$W^{-1}\colon\ U \to L^2,$$
the inverse $W^{-1}$ being given (at least formally), according to 3.3.(5), by
$$W^{-1}u = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} d\mu\;u(a,b)\,\psi_{a,b} .$$
The space $U$ consisting of all wavelet transforms $Wf$, $f \in L^2$, is a proper subspace of $H$. We know, e.g., that the functions $u \in U$ have a well determined value at all points $(a,b) \in \mathbb{R}^2_-$, and each individual $u \in U$ is globally bounded owing to 3.1.(7):
$$\|u\|_\infty := \sup\{|u(a,b)| \mid (a,b) \in \mathbb{R}^2_-\} < \infty .$$
More is true, however: The function space $U$ admits a so-called reproducing kernel, and this implies that the values of any given $u \in U$ are correlated over large distances, as is the case for holomorphic functions.
We remind the reader that holomorphic functions have a reproducing property that can be described as follows: Let $G \subset \mathbb{C}$ be a domain with boundary cycle $\partial G$, and assume that $f$ is holomorphic on an open set $\Omega \supset G \cup \partial G$. Then
$$f(z) = \frac{1}{2\pi i}\int_{\partial G} \frac{f(\zeta)}{\zeta - z}\,d\zeta \qquad (z \in G) .$$
Consider a fixed $u \in U$. There is an $f \in L^2$ with $u = Wf$. On account of (3.3) we may write
$$u(a,b) = \langle f, \psi_{a,b}\rangle = \frac{1}{C_\psi}\,\langle Wf, W\psi_{a,b}\rangle_H = \frac{1}{C_\psi}\,\langle u, W\psi_{a,b}\rangle_H \qquad ((a,b) \in \mathbb{R}^2_-) . \tag{2}$$
If we want to present the right hand side of (2) in the form of an integral, we have to express the function $W\psi_{a,b}$ as a function of new variables $a'$, $b'$. To this end we regard the wavelet function $\psi_{a,b}$ as a time signal and deduce from 3.1.(6) the following expression for $W\psi_{a,b}(a',b')$:
$$W\psi_{a,b}(a',b') = \langle \psi_{a,b}, \psi_{a',b'}\rangle .$$
Inserting this into (2) we finally get
$$u(a,b) = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} u(a',b')\,\overline{\langle \psi_{a,b}, \psi_{a',b'}\rangle}\,\frac{da'\,db'}{|a'|^2} .$$
The function
$$K(a,b,a',b') := \langle \psi_{a',b'}, \psi_{a,b}\rangle$$
is well defined at all points $(a,b,a',b') \in \mathbb{R}^2_- \times \mathbb{R}^2_-$ and is called a reproducing kernel for the functions $u \in U$. Altogether we have proven the following theorem:
(3.9) ($C_\psi$, $U$ and $K$ are as explained in the text.) For arbitrary $u \in U$ and $(a,b) \in \mathbb{R}^2_-$ one has
$$u(a,b) = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} K(a,b,a',b')\,u(a',b')\,\frac{da'\,db'}{|a'|^2} . \tag{3}$$
① Let us compute the kernel function belonging to the following wavelet:
$$\psi(t) := \sqrt2\,\pi^{-1/4}\,t\,e^{-t^2/2} ;$$
for a picture of the graph, see Figure 3.9. The leading numerical factor was chosen so as to make $\|\psi\| = 1$. On account of rule (R4) and Example 2.2.② one has
$$\hat\psi(\xi) = -\sqrt2\,\pi^{-1/4}\,i\,\xi\,e^{-\xi^2/2} .$$
Figure 3.9 Derivative of the Gaussian
If we restrict ourselves to positive $a$, then the reproducing formula (3) takes the form
$$u(a,b) = \frac{1}{C'_\psi}\int_{\mathbb{R}^2_>} K(a,b,a',b')\,u(a',b')\,\frac{da'\,db'}{|a'|^2},$$
where $C'_\psi$ is given by 3.2.(4) and is computed as follows:
$$C'_\psi = 2\pi\int_0^\infty \frac{|\hat\psi(\xi)|^2}{\xi}\,d\xi = 4\sqrt\pi\int_0^\infty \xi\,e^{-\xi^2}\,d\xi = 2\sqrt\pi .$$
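We shall arrive at $K(a,b,a',b')$ by means of Parseval's formula, therefore we need $\widehat{\psi_{a,b}}$. Rule 3.1.(8) gives
$$\widehat{\psi_{a,b}}(\xi) = a^{1/2}\,e^{-ib\xi}\,\hat\psi(a\xi) = -\sqrt2\,\pi^{-1/4}\,a^{3/2}\,i\,\xi\,e^{-a^2\xi^2/2}\,e^{-ib\xi},$$
and a similar formula holds for $\widehat{\psi_{a',b'}}$. We now can write
$$K(a,b,a',b') = \langle \widehat{\psi_{a',b'}}, \widehat{\psi_{a,b}}\rangle = \frac{2}{\sqrt\pi}\,a^{3/2}\,a'^{3/2}\int \xi^2\,e^{-(a^2+a'^2)\xi^2/2}\,e^{-i(b'-b)\xi}\,d\xi .$$
The resulting integral may be regarded as a Fourier integral, in fact
$$K(a,b,a',b') = 2\sqrt2\,a^{3/2}\,a'^{3/2}\,\hat G(b'-b), \tag{4}$$
where the function $G(\cdot)$ is given by
$$G(\xi) := \xi^2\,e^{-(a^2+a'^2)\xi^2/2} .$$
As an abbreviation we write $\sqrt{a^2 + a'^2} =: A$. Since the function $e^{-\xi^2/2}$ is reproduced by the Fourier transform, according to rule (R3) the Fourier transform of $g(\xi) := e^{-A^2\xi^2/2}$ can be written as
$$\hat g(x) = \frac{1}{A}\,e^{-(x/A)^2/2},$$
so that with the help of (2.13) we get
$$\hat G(x) = -(\hat g)''(x) = \frac{1}{A^5}\,(A^2 - x^2)\,e^{-(x/A)^2/2} .$$
Inserting this into (4) we finally obtain
$$K(a,b,a',b') = 2\sqrt2\,\frac{a^{3/2}\,a'^{3/2}}{A^5}\,(A^2 - x^2)\,e^{-(x/A)^2/2},$$
where $x := b' - b$ and $A := \sqrt{a^2 + a'^2}$. ○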
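A closed formula of this kind is conveniently cross-checked against a direct numerical evaluation of the scalar product $\langle \psi_{a',b'}, \psi_{a,b}\rangle$. The sketch below assumes the real wavelet $\psi(t) = \sqrt2\,\pi^{-1/4}\,t\,e^{-t^2/2}$, positive scales $a, a'$, and a midpoint rule on a window wide enough for the Gaussian tails to be negligible; the function names are illustrative.

```python
import math

C = math.sqrt(2.0) * math.pi ** -0.25  # normalizing factor of the wavelet

def psi(t):
    return C * t * math.exp(-t * t / 2.0)

def psi_ab(a, b, t):
    """Dilated and translated wavelet psi_{a,b}(t) for a > 0."""
    return psi((t - b) / a) / math.sqrt(a)

def kernel_direct(a, b, a2, b2, lo=-20.0, hi=20.0, n=16000):
    """K = <psi_{a',b'}, psi_{a,b}> by numerical integration (psi is real)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += psi_ab(a2, b2, t) * psi_ab(a, b, t)
    return h * total

def kernel_closed(a, b, a2, b2):
    """Closed formula with x = b' - b and A = sqrt(a^2 + a'^2)."""
    x = b2 - b
    big_a = math.sqrt(a * a + a2 * a2)
    return (2.0 * math.sqrt(2.0) * (a * a2) ** 1.5 / big_a ** 5
            * (big_a ** 2 - x * x) * math.exp(-(x / big_a) ** 2 / 2.0))

direct = kernel_direct(1.0, 0.0, 2.0, 0.5)
closed = kernel_closed(1.0, 0.0, 2.0, 0.5)
print(direct, closed)
```

On the diagonal $(a',b') = (a,b)$ both expressions reduce to $\|\psi_{a,b}\|^2 = 1$, a quick sanity check of the normalization.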
② We leave it to the reader as an exercise to compute $C_\psi$ and the kernel function for the Haar wavelet. Since in this case the scalar products $\langle \psi_{a',b'}, \psi_{a,b}\rangle$ can immediately be read off from suitable figures (see Figure 3.10), it is no longer necessary to make the detour via the Fourier transform. The other side of the matter is that there are many different cases to consider, so that in the end no simple expression for the kernel function $K$ results. ○
Figure 3.10
3.5 Decay of the wavelet transform
In this section we investigate the asymptotic properties of the function $(a,b) \mapsto Wf(a,b)$ in the limit $a \to 0$. The values $Wf(a,b)$ corresponding to arguments $|a| \ll 1$ encode information about high frequency and/or short-lived (called transient in signal theoretic circles) components of $f$. We have seen that in the realm of the Fourier transform, jump discontinuities of the signal $f$, say, entail a slow decay of $\hat f(\xi)$ when $\xi \to \pm\infty$. As a consequence the inversion formula (in practice a suitable discretization and/or truncation of this formula) converges only poorly even in zones of the $t$-axis where the function $f$ is well behaved, e.g., infinitely differentiable. With the wavelet transform this slowing down of convergence can be localized: If the time signal $f$ is smooth in the neighborhood of $t = b$, then $Wf(a,b)$ converges very rapidly to 0 for $a \to 0$; and only in zones where the time signal $f$ has sharp peaks or clicks do we encounter a slow decay of $Wf(a,b)$ when $a \to 0$.
The circumstances we have just described have significant practical consequences: When a time signal $f$ is worked on numerically, then of its wavelet transform $Wf(a,b)$ only, e.g., the values $c_{r,k} := Wf(2^r, k\,2^r)$ are computed (resp. measured) and stored. Now, if the signal behaves very well over long stretches of the time axis, let's say, if it is so many times differentiable there, then the overwhelming part of the $c_{r,k}$ will become so minuscule that these $c_{r,k}$ may as well be taken to be zero. In this way one can achieve an enormous rate of data compression: Only the $c_{r,k}$ whose absolute value exceeds a certain threshold are kept at all, then stored and used for the reconstruction of $f$ later on. A vast body of numerical evidence demonstrates that these "essential" $c_{r,k}$ are completely sufficient to restore the original signal $f$ with the desired precision. For a further glimpse into this matter we refer the interested reader to the article [19].
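The thresholding idea can be tried out on a toy signal. The following sketch is an illustration only: it uses the Haar wavelet, a hypothetical test signal with a single jump at $t = 0.3$, the single scale $r = -6$, and an arbitrarily chosen threshold; none of these choices comes from the text.

```python
import math

def f(t):
    """Smooth bump with one jump discontinuity at t = 0.3 (hypothetical test signal)."""
    return math.exp(-t * t) * (2.0 if t >= 0.3 else 1.0)

def integrate(lo, hi, n=400):
    """Midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def haar_wf(a, b):
    """Haar wavelet transform Wf(a, b) for a > 0."""
    return (integrate(b, b + a / 2) - integrate(b + a / 2, b + a)) / math.sqrt(a)

r = -6
scale = 2.0 ** r
# Coefficients c_{r,k} = Wf(2^r, k * 2^r) covering the time interval [-2, 2).
coeffs = {k: haar_wf(scale, k * scale) for k in range(-128, 128)}

threshold = 0.005
kept = sorted(k for k, c in coeffs.items() if abs(c) > threshold)
print(kept)  # indices of the coefficients that survive the threshold
```

At this scale only the coefficient whose interval $[k\,2^r, (k+1)\,2^r)$ contains the jump exceeds the threshold; all coefficients over the smooth stretches are smaller by more than an order of magnitude.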
We begin with two statements of a rather simple type.
(3.10) Assume that a wavelet $\psi$ with $t\psi \in L^1$ has been chosen. Let the time signal $f \in L^2$ be globally bounded and assume that $f$ is Hölder continuous at the point $b$, i.e., there is $\alpha \in\ ]0,1]$ such that in a neighbourhood of $b$ an estimate of the form
$$|f(t) - f(b)| \le C\,|t-b|^\alpha \tag{1}$$
holds. Then
$$|Wf(a,b)| \le C'\,|a|^{\alpha+\frac12} . \tag{2}$$
⌜ It is enough to consider the case $a > 0$. Since $f$ is bounded, we may assume (enlarging $C$, if necessary) that (1) is true for all $t \in \mathbb{R}$.
Because of $\int \psi(t)\,dt = 0$ we have
$$Wf(a,b) = \frac{1}{a^{1/2}}\int \bigl(f(t) - f(b)\bigr)\,\overline{\psi\Bigl(\frac{t-b}{a}\Bigr)}\,dt$$
and consequently
$$|Wf(a,b)| \le \frac{C}{a^{1/2}}\int |t-b|^\alpha\,\Bigl|\psi\Bigl(\frac{t-b}{a}\Bigr)\Bigr|\,dt .$$
In the integral on the right we substitute $t := b + ay$ $(-\infty < y < \infty)$ and get
$$|Wf(a,b)| \le C\,|a|^{\alpha+\frac12}\int |y|^\alpha\,|\psi(y)|\,dy .$$
From $\alpha \le 1$ we deduce $|y|^\alpha \le 1 + |y|$, therefore by assumption on $\psi$ the last integral has a finite value, and (2) is proven. ⌟
A Lipschitz continuous function $f \in L^2$ is necessarily bounded and is everywhere Hölder continuous with exponent $\alpha = 1$. Thus we get the following corollary:
(3.11) Assume that a wavelet $\psi$ with $t\psi \in L^1$ has been chosen. If the time signal $f \in L^2$ is globally Lipschitz continuous, then there is a $C$, not depending on $b$, such that
$$|Wf(a,b)| \le C\,|a|^{3/2} .$$
There are various variants of converses to these statements, see, e.g., [D], Theorems 2.9.2 and 2.9.4. As an example we quote the following theorem; the reader is referred to [D] for a proof.
(3.12) Assume that a wavelet $\psi$ with compact support has been chosen. If $f \in L^2$ is a continuous time signal whose wavelet transform satisfies an estimate of the form
$$|Wf(a,b)| \le C\,|a|^{\alpha+\frac12} \qquad ((a,b) \in \mathbb{R}^2_-)$$
for some $\alpha \in\ ]0,1]$, then $f$ is globally Hölder continuous with exponent $\alpha$.
The following theorems are of a more subtle nature. The essential lesson we learn from them is that in order to optimize the asymptotic properties of our wavelet transforms $Wf$ we have to impose additional conditions on the selected wavelet $\psi$. The regularity of $\psi$ is not an issue here, but it turns out that it is to our advantage to extend the basic requirement $\int \psi(t)\,dt = 0$ to higher order moments.
The indicated line of thought is based on the following definitions: For arbitrary $k \in \mathbb{N}$ the quantity
$$\int t^k\,\psi(t)\,dt,$$
defined whenever $t^k\psi \in L^1$, is called the $k$-th moment of $\psi \in L^1$. The wavelet $\psi$ is a wavelet of order $N$ if its moments of order $0, 1, \ldots, N-1$ vanish, while its $N$-th moment
$$\gamma := \int t^N\,\psi(t)\,dt$$
exists and is $\ne 0$.
If no special measures are taken, the order of a wavelet is 1, by definition. Symmetric wavelets have an order $\ge 2$, if we assume the existence of the relevant moments. By (2.13) the Fourier transform $\hat\psi$ of a wavelet of order $N$ is $N$-times continuously differentiable, and the moment conditions imply
$$\hat\psi^{(k)}(0) = 0 \quad (0 \le k \le N-1), \qquad \hat\psi^{(N)}(0) \ne 0 .$$
It follows that the Taylor expansion of $\hat\psi$ at 0 has the form
$$\hat\psi(\xi) = \frac{\hat\psi^{(N)}(0)}{N!}\,\xi^N + \text{higher terms}. \tag{3}$$
(3.13) Assume that the chosen wavelet $\psi$ has order $N$ and compact support. If the time signal $f \in L^2$ is of class $C^N$ in a neighbourhood $U$ of the point $b$, then
$$Wf(a,b) = |a|^{N+\frac12}\,\bigl(\gamma'\,f^{(N)}(b) + o(1)\bigr) \qquad (a \to 0), \tag{4}$$
where $\gamma' := \operatorname{sgn}^N(a)\,\bar\gamma / N!$.
⌜ Suppose that $\psi(t) \equiv 0$ for $|t| > T$. It suffices to consider the case $a > 0$, and one may assume from the beginning that $a$ is so small that the whole interval $[b - aT, b + aT]$ is contained in $U$.
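The function $f$ has a Taylor expansion centered at the point $b$: For given $t \in U$ there is a $\tau$ between $b$ and $t$ such that
$$f(t) = j_b^{N-1} f(t) + \frac{f^{(N)}(\tau)}{N!}\,(t-b)^N = j_b^{N} f(t) + \frac{f^{(N)}(\tau) - f^{(N)}(b)}{N!}\,(t-b)^N, \tag{5}$$
where the leading term on the right hand side can be unpacked as
$$j_b^{N} f(t) = \sum_{k=0}^{N} c_k\,(t-b)^k, \qquad c_N = \frac{f^{(N)}(b)}{N!} .$$
This implies that for the computation of
$$Wf(a,b) := a^{-1/2}\int f(t)\,\overline{\psi\bigl((t-b)/a\bigr)}\,dt$$
we need among others the following integrals:
$$a^{-1/2}\int (t-b)^k\,\overline{\psi\bigl((t-b)/a\bigr)}\,dt = a^{k+\frac12}\int y^k\,\overline{\psi(y)}\,dy = \begin{cases} 0 & (0 \le k \le N-1) \\ a^{N+\frac12}\,\bar\gamma & (k = N) . \end{cases}$$
Altogether we have
$$Wf(a,b) = a^{N+\frac12}\,\bar\gamma\,\frac{f^{(N)}(b)}{N!} + R,$$
and we now have to estimate the error term $R$, stemming from the remainder term in (5). Using the substitution $t := b + at'$ $(-T \le t' \le T)$ the quantity $R$ can be written as
$$R = \frac{a^{N+\frac12}}{N!}\int_{-T}^{T} \bigl(f^{(N)}(\tau) - f^{(N)}(b)\bigr)\,t'^N\,\overline{\psi(t')}\,dt' .$$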
The last integral suggests that we should introduce the auxiliary function
$$\omega(h) := \sup_{|\tau - b| \le h} |f^{(N)}(\tau) - f^{(N)}(b)| \qquad (h \ge 0) ;$$
by assumption on $f$ we are sure that
$$\lim_{h\to 0+} \omega(h) = 0 . \tag{6}$$
Since the (variable) point $\tau$ is known to lie between $b$ and $t = b + at'$, we now can estimate $R$ as follows:
$$|R| \le \frac{a^{N+\frac12}}{N!}\int_{-T}^{T} \omega(a|t'|)\,|t'|^N\,|\psi(t')|\,dt' \le \frac{a^{N+\frac12}}{N!}\,\omega(aT)\int_{-T}^{T} |t'|^N\,|\psi(t')|\,dt' .$$
By assumption on $\psi$ the last integral is finite, therefore by means of (6) we arrive at the stated relation
$$R = a^{N+\frac12}\,o(1) \qquad (a \to 0) . \quad ⌟$$
According to this theorem the rate of decay of the wavelet transform when $a \to 0$ is determined by the order $N$ of the chosen wavelet, at least in regions of the $b$- resp. $t$-axis where $f$ is sufficiently smooth. One can even say more: The proportionality factor appearing in the asymptotic formula (4) is essentially the exact value $f^{(N)}(b)$ of the $N$-th derivative of $f$ at $b$, which means that the "zoom"
$$a \mapsto Wf(a,b) \qquad (a \to 0)$$
can be used as a measuring device for this value. In any case, for the reasons indicated in the beginning of this section, it pays to choose a wavelet whose order $N$ is (under the given circumstances) as large as possible.
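The "measuring device" reading can be illustrated with the Haar wavelet, which has compact support and order $N = 1$ with first moment $\int t\,\psi_{\rm Haar}(t)\,dt = -\tfrac14$, so that $Wf(a,b) \approx -\tfrac14\,a^{3/2} f'(b)$ for small $a > 0$. The sketch below, with a test signal and step sizes chosen here for illustration, recovers $f'(b)$ from the zoom.

```python
import math

def f(t):
    """Smooth test signal (an assumption for this illustration)."""
    return math.exp(-t * t)

def integrate(lo, hi, n=200):
    """Midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def haar_wf(a, b):
    """Haar wavelet transform Wf(a, b) for a > 0."""
    return (integrate(b, b + a / 2) - integrate(b + a / 2, b + a)) / math.sqrt(a)

b = 0.5
exact_derivative = -2.0 * b * math.exp(-b * b)

# Rescaled zoom: -4 * Wf(a, b) / a^(3/2) should approach f'(b) as a -> 0+.
estimates = [(a, -4.0 * haar_wf(a, b) / a ** 1.5) for a in (0.1, 0.01, 0.001)]
for a, estimate in estimates:
    print(a, estimate, exact_derivative)
```

The error of the rescaled value shrinks roughly in proportion to $a$, which is the $o(1)$ term of the asymptotic formula made visible.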
In cases where the smoothness of $f$ is smaller than is honoured by the order of the chosen wavelet, the following generalization of (3.11) gives an overall decay estimate:
(3.14) Assume that a wavelet $\psi$ of order $N$ has been chosen. If the time signal $f \in L^2$ is of class $C^r$, $r < N$, and if $f^{(r)}$ is Lipschitz continuous, then there is a $C$, not dependent on $b$, with
$$|Wf(a,b)| \le C\,|a|^{r+\frac32} .$$
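⌜ We may again assume $a > 0$. Computing the Taylor expansion of $f$ at an arbitrary point $b \in \mathbb{R}$ one obtains (cf. (5))
$$f(t) = j_b^{r} f(t) + \frac{f^{(r)}(\tau) - f^{(r)}(b)}{r!}\,(t-b)^r,$$
the point $\tau$ lying between $b$ and $t$. Because of $r < N$ only the remainder term contributes anything to $Wf(a,b)$ at all; so we have
$$Wf(a,b) = \frac{1}{r!\,a^{1/2}}\int \bigl(f^{(r)}(\tau) - f^{(r)}(b)\bigr)\,(t-b)^r\,\overline{\psi\Bigl(\frac{t-b}{a}\Bigr)}\,dt = \frac{a^{r+\frac12}}{r!}\int \bigl(f^{(r)}(\tau) - f^{(r)}(b)\bigr)\,t'^r\,\overline{\psi(t')}\,dt' .$$
Since the point $\tau$ lies between $b$ and $t = b + at'$, by assumption on $f$ we are sure that
$$|f^{(r)}(\tau) - f^{(r)}(b)| \le C_{\rm Lip}\,a\,|t'|$$
for a suitable $C_{\rm Lip}$. Therefore we are able to estimate $Wf(a,b)$ as follows:
$$|Wf(a,b)| \le \frac{C_{\rm Lip}}{r!}\,a^{r+\frac32}\int |t'|^{r+1}\,|\psi(t')|\,dt' .$$
Here the last integral is finite by assumption on $\psi$. ⌟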
We conclude this section by investigating how "clicks" of a time signal $f$ influence the decay of the wavelet transform $Wf$. In our terminology an $r$-click, $r \ge 0$, of $f$ is an isolated jump discontinuity of the $r$-th derivative of $f$ at some point $b \in \mathbb{R}$:
$$f^{(r)}(b+) - f^{(r)}(b-) =: \Delta .$$
Apart from that all derivatives $f^{(j)}$ of order $\le r$ are assumed to be continuous in a neighbourhood of the point $b$. About such clicks we prove the following:
(3.15) Assume that the chosen wavelet $\psi$ has order $N$ and compact support. If the time signal $f \in L^2$ has an $r$-click, $r < N$, at the point $b$, then
$$Wf(a,b) = |a|^{r+\frac12}\,\bigl(C\Delta + o(1)\bigr) \qquad (a \to 0),$$
the constant $C$ being independent of $f$.
The left part of Figure 3.7 illustrates the case $r = 1$, $N = 2$ of this theorem.
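⌜ As in the proof of (3.13) we suppose that $\psi(t) \equiv 0$ for $|t| > T$. It is no restriction of generality to assume $b = 0$; furthermore it suffices to consider the limit $a \to 0+$. Instead of (5) we now have
$$f(t) = j_0^{r-1} f(t) + \frac{f^{(r)}(\tau)}{r!}\,t^r \qquad (t > 0)$$
for some $\tau$ between 0 and $t$, and similarly for $t < 0$. Setting
$$A := \tfrac12\bigl(f^{(r)}(0+) + f^{(r)}(0-)\bigr)$$
we obtain the following representation of $f$, valid for all $t \ne 0$:
$$f(t) = j_0^{r-1} f(t) + \frac{A}{r!}\,t^r + \frac{\Delta}{2\,r!}\,\operatorname{sgn} t \cdot t^r + \frac{f^{(r)}(\tau) - f^{(r)}(0\pm)}{r!}\,t^r .$$
Here the $\pm$-sign has to be interpreted as $+$ when $t > 0$ and as $-$ when $t < 0$. Because of $N > r$ this formula implies
$$Wf(a,0) = \frac{1}{r!\,a^{1/2}}\int \Bigl(\frac{\Delta}{2}\,\operatorname{sgn} t + \bigl(f^{(r)}(\tau) - f^{(r)}(0\pm)\bigr)\Bigr)\,t^r\,\overline{\psi\Bigl(\frac{t}{a}\Bigr)}\,dt = \frac{a^{r+\frac12}}{r!}\int_{-T}^{T} \Bigl(\frac{\Delta}{2}\,\operatorname{sgn} t' + \bigl(f^{(r)}(\tau) - f^{(r)}(0\pm)\bigr)\Bigr)\,t'^r\,\overline{\psi(t')}\,dt' . \tag{7}$$
Putting
$$\frac{1}{2\,r!}\int_{-T}^{T} \operatorname{sgn} t \cdot t^r\,\overline{\psi(t)}\,dt =: C$$
we arrive at
$$Wf(a,0) = a^{r+\frac12}\,C\Delta + R .$$
It remains to estimate the error term $R$. To this end we use the auxiliary function
$$\omega(h) := \sup_{0 < |\tau| \le h} |f^{(r)}(\tau) - f^{(r)}(0\pm)|,$$
defined for $h > 0$ and where again the $\pm$-sign is to be interpreted as $+$ when $\tau > 0$ and as $-$ when $\tau < 0$. By assumption on $f$ we have
$$\lim_{h\to 0+} \omega(h) = 0 . \tag{8}$$
The (variable) point $\tau$ in the integral (7) lies between 0 and $t = at'$. This implies that the remainder $R$ can be estimated as follows:
$$|R| \le \frac{a^{r+\frac12}}{r!}\int_{-T}^{T} \omega(a|t|)\,|t|^r\,|\psi(t)|\,dt \le \frac{a^{r+\frac12}}{r!}\,\omega(aT)\int_{-T}^{T} |t|^r\,|\psi(t)|\,dt .$$
Since the integral on the right side of this inequality is finite, we may conclude with the help of (8) that the stated formula
$$Wf(a,0) = a^{r+\frac12}\,\bigl(C\Delta + o(1)\bigr) \qquad (a \to 0+)$$
is true. ⌟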
4 Frames
The general notion of a "frame" will enable us to present the continuous wavelet transform and its discretized version (to be studied later on) from a single functional-analytic viewpoint. The next two sections, 4.1 and 4.2, are essentially borrowed from [K], where this unified aspect of the two theories is described in a particularly lucid way.
To summarize the general idea in a few lines: A frame is a collection $a_\bullet := (a_\iota \mid \iota \in I)$ of vectors in a Hilbert space $X$ that is rich enough to make sure that no vector $x \in X$ other than 0 is orthogonal to all $a_\iota$. In the infinite-dimensional case this is not so easy to guarantee. The $a_\iota$ need not be linearly independent, let alone orthonormal. As a consequence, frames are in general a "redundant" collection of vectors.
4.1 Geometrical considerations
In order to get acquainted with the proposed "framework" we consider the following situation:
Let $X$ be a finite-dimensional complex Hilbert space: $\dim X =: n < \infty$, and assume that $r$ vectors $a_1, \ldots, a_r \in X$ are given. The number $r$ of these vectors should be thought of by the reader as being larger than the dimension $n$ of the space $X$. With the aid of these $a_j$ we construct the mapping
$$T\colon\ X \to \mathbb{C}^r, \qquad x \mapsto Tx, \quad (Tx)_j := \langle x, a_j\rangle \quad (1 \le j \le r) .$$
Denoting the canonical basis of $\mathbb{C}^r =: Y$ by $(e_1, \ldots, e_r)$ we can write the mapping $T$ in the following form:
$$Tx = \sum_{j=1}^{r} \langle x, a_j\rangle\,e_j . \tag{1}$$
Since $X$ has dimension $n$, the image space
$$U := \mathrm{im}(T) := \{Tx \mid x \in X\}$$
is at most $n$-dimensional, therefore $U$ is a proper subspace of the $r$-dimensional space $Y$ in case $r > n$; see Figure 4.1.
Figure 4.1
We now want to investigate the following questions: Is a vector $x \in X$ uniquely determined by its image $y := Tx \in Y$? Or, to put it differently: Is $T$ an injective mapping? Or, expressed yet a third way: Is $\ker T = \{0\}$? And, if the answer is yes: How, in such a case, could one reconstruct the vector $x$ from its image $y$?
If $T$ is injective (and hence, in principle, invertible on its image) then the given collection $a_\bullet := (a_1, \ldots, a_r)$ of vectors $a_j \in X$ is called a frame for the (finite-dimensional) Hilbert space $X$, and the mapping $T$ is called the frame operator belonging to the collection $a_\bullet$.
If we adopt on the space $Y$ the canonical scalar product
$$\langle y, z\rangle := \sum_{k=1}^{r} y_k\,\overline{z_k}, \tag{2}$$
the space $Y$ becomes a Hilbert space, too. This setup may be expressed in a more sophisticated way as follows: $Y = L^2(\{1, \ldots, r\}, \#)$. The fact is that the vectors $y \in Y$ can be regarded as complex-valued functions
$$\{1, \ldots, r\} \to \mathbb{C}, \qquad k \mapsto y_k,$$
and $\#$ denotes as usual the counting measure, which assigns each point of the domain under consideration the measure (mass) 1.
In this way the mapping $T$ becomes a mapping between Hilbert spaces, therefore it is possible to consider its adjoint $T^*\colon Y \to X$. It is characterized by the following identity:
$$\langle x, T^*y\rangle_X = \langle Tx, y\rangle_Y \qquad \forall\,x \in X,\ \forall\,y \in Y .$$
In particular, one has
$$\langle x, T^*e_j\rangle = \langle Tx, e_j\rangle = (j\text{-th coordinate of } Tx) = \langle x, a_j\rangle \qquad \forall x \in X,$$
which allows the conclusion
$$T^*e_j = a_j \qquad (1 \le j \le r). \tag{3}$$
If we compose the mapping $T$ with $T^*$ we obtain the Gram operator (so called by us; see footnote 1 below)
$$G := T^*T\colon X \to X,$$
a mapping from $X$ to $X$. Applying the mapping $T^*$ to both sides of (1), we obtain, thanks to (3), the following formula for $G$:
$$Gx = \sum_{j=1}^{r} \langle x, a_j\rangle\, a_j. \tag{4}$$
Regarding kernels we now assert:
$$\ker T = \ker G. \tag{5}$$
⌈ $Tx = 0$ of course implies $Gx = 0$, and the identity
$$\|Tx\|^2 = \langle Tx, Tx\rangle = \langle T^*Tx, x\rangle = \langle Gx, x\rangle \tag{6}$$
proves the converse. ⌋
Formula (5) admits the following conclusion:
(4.1) The mapping $T\colon X \to Y$ is injective if and only if the corresponding Gram operator $G := T^*T\colon X \to X$ is regular.
1 The Gram matrix or Gramian of a collection of vectors $a_k \in X$ is by definition the matrix of the scalar products $\langle a_k, a_l\rangle$. This is not the matrix of $G$ but the matrix of the mapping $TT^*\colon Y \to Y$.
We have to take a closer look at the Gram operator. Since for arbitrary $x$, $u \in X$ we have
$$\langle x, Gu\rangle = \langle x, T^*Tu\rangle = \langle Tx, Tu\rangle = \langle T^*Tx, u\rangle = \langle Gx, u\rangle, \tag{7}$$
we conclude that the operator $G$ is self-adjoint. This has the consequence that all its eigenvalues $\lambda_i$ are real, and, what's more, if $\lambda$ is an eigenvalue of $G$ and $x \ne 0$ a corresponding eigenvector, then from (6) one deduces
$$\lambda\,\langle x, x\rangle = \langle Gx, x\rangle = \|Tx\|^2 \ge 0,$$
which in turn implies $\lambda \ge 0$. We arrange the $\lambda_i$ in increasing order as follows:
$$0 \le \lambda_1 \le \lambda_2 \le \ldots \le \lambda_n.$$
By the same token there is an orthonormal basis $(e_1, \ldots, e_n)$ of $X$ that diagonalizes $G$. With respect to this basis the image of the vector $x = (x_1, \ldots, x_n)$ is given by $Gx = (\lambda_1 x_1, \ldots, \lambda_n x_n)$. Computing $\|Tx\|^2$ using these coordinates one gets
$$\lambda_1\|x\|^2 \le \|Tx\|^2 = \langle Gx, x\rangle = \sum_{i=1}^{n} \lambda_i |x_i|^2 \le \lambda_n\|x\|^2.$$
These inequalities are going to play an essential role in the rest of the book. For the time being we note the following proposition:
(4.2) A collection $a_\bullet = (a_1, \ldots, a_r)$ of vectors is a frame for the (finite-dimensional) Hilbert space $X$ if and only if there are constants $B \ge A > 0$ such that
$$A\,\|x\|^2 \le \|Tx\|^2 \le B\,\|x\|^2 \qquad \forall x \in X.$$
The numbers $B \ge A > 0$ are the frame constants of the frame $a_\bullet$. If $A = B$, then the frame $a_\bullet$ is called a tight frame. In this case one has
$$\|Tx\|^2 = A\,\|x\|^2 \qquad \forall x \in X,$$
which means that $T$ maps $X$ essentially isometrically onto $U$; and the Gram operator belonging to a tight frame is given by
$$G = A\,I_X,$$
where $I_X$ denotes the identity map of the vector space $X$.
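As a small numerical illustration of (4.2), the following sketch computes the Gram operator of three vectors in the real plane and takes its smallest and largest eigenvalue as frame constants. The concrete vectors, the restriction to real scalars, and all function names are assumptions made for this example only; they do not come from the text.

```python
import math

# Hypothetical example frame: r = 3 vectors in the real plane (n = 2).
FRAME = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def analysis(x, frame):
    """Frame operator: Tx = (<x, a_1>, ..., <x, a_r>), real scalars."""
    return [x[0] * a[0] + x[1] * a[1] for a in frame]


def gram_entries(frame):
    """Entries g11, g12, g22 of the symmetric 2x2 matrix of G = T*T."""
    g11 = sum(a[0] * a[0] for a in frame)
    g12 = sum(a[0] * a[1] for a in frame)
    g22 = sum(a[1] * a[1] for a in frame)
    return g11, g12, g22


def frame_constants(frame):
    """Smallest and largest eigenvalue of G (closed form for a 2x2 matrix)."""
    g11, g12, g22 = gram_entries(frame)
    mean = 0.5 * (g11 + g22)
    radius = math.hypot(0.5 * (g11 - g22), g12)
    return mean - radius, mean + radius


def norm_sq(v):
    return sum(c * c for c in v)


A, B = frame_constants(FRAME)

# Sample the unit circle and record the values of ||Tx||^2 for ||x|| = 1.
values = []
for k in range(360):
    angle = math.radians(k)
    x = (math.cos(angle), math.sin(angle))
    values.append(norm_sq(analysis(x, FRAME)))

print("A =", A, "B =", B)
print("range of ||Tx||^2 on the unit circle:", min(values), max(values))
```

The sampled values of $\|Tx\|^2$ should stay between the two eigenvalues, which is the content of the inequalities in (4.2) for unit vectors.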
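① Let $X$ be the space $\mathbb{C}^2$, fitted with the canonical scalar product (2). For an arbitrarily chosen number $r \ge 3$ we put $\omega := e^{2\pi i/r}$ and define the $r$ unit vectors
$$a_j := \frac{1}{\sqrt{2}}\,\bigl(\omega^j,\ \overline{\omega}^{\,j}\bigr) \qquad (0 \le j \le r-1).$$
Figure 4.2 shows the first coordinates of the vectors $a_j$. We are now going to study the corresponding frame operator $T\colon X \to \mathbb{C}^r$. For a general vector $x = (x_1, x_2) \in X$ we have
$$(Tx)_j = \langle x, a_j\rangle = \frac{1}{\sqrt{2}}\,\bigl(x_1\overline{\omega}^{\,j} + x_2\,\omega^j\bigr)$$
and consequently
$$\|Tx\|^2 = \frac12 \sum_{j=0}^{r-1} \bigl(x_1\overline{\omega}^{\,j} + x_2\,\omega^j\bigr)\bigl(\overline{x_1}\,\omega^j + \overline{x_2}\,\overline{\omega}^{\,j}\bigr) \underset{\uparrow}{=} \frac12 \sum_{j=0}^{r-1} \bigl(|x_1|^2 + |x_2|^2\bigr) = \frac{r}{2}\,\|x\|^2$$
(at the up arrow $\uparrow$ we have made use of $\sum_{j=0}^{r-1} \omega^{2j} = 0$). The resulting identity shows that the collection $a_\bullet = (a_0, \ldots, a_{r-1})$ is a tight frame with frame constant $A = r/2$. One may regard the value $r/2$ as a measure of the redundancy of the frame $a_\bullet$. It is clear that for $\mathbb{C}^2$ two suitably chosen vectors would do. ○
[Figure 4.2: the first coordinates of the vectors $a_j$ in $\mathbb{C}$, on the circle of radius $1/\sqrt{2}$.]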
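The identity $\|Tx\|^2 = \frac{r}{2}\|x\|^2$ of this example can be checked numerically. The sketch below is only an illustration: the function names, the value $r = 5$ and the test vector are arbitrary choices, and the vectors $a_j$ are taken as $\frac{1}{\sqrt2}(\omega^j, \overline{\omega}^{\,j})$.

```python
import cmath
import math


def roots_of_unity_frame(r):
    """The r unit vectors a_j = (w^j, conj(w)^j) / sqrt(2) with w = exp(2*pi*i/r)."""
    w = cmath.exp(2j * math.pi / r)
    s = 1.0 / math.sqrt(2.0)
    return [(s * w**j, s * (w**j).conjugate()) for j in range(r)]


def inner(x, a):
    """Scalar product <x, a> = sum_k x_k * conj(a_k)."""
    return sum(xk * ak.conjugate() for xk, ak in zip(x, a))


def analysis(x, frame):
    """Frame operator: (Tx)_j = <x, a_j>."""
    return [inner(x, a) for a in frame]


def norm_sq(v):
    return sum(abs(c) ** 2 for c in v)


r = 5                                   # arbitrary choice with r >= 3
frame = roots_of_unity_frame(r)
x = (1.5 - 0.5j, -2.0 + 1.0j)           # arbitrary test vector
ratio = norm_sq(analysis(x, frame)) / norm_sq(x)
print("||Tx||^2 / ||x||^2 =", ratio, " expected r/2 =", r / 2)
```

For $r = 2$ the same computation gives a different picture, since then $\omega^2 = 1$ and the two vectors are linearly dependent.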
② Let $a_\bullet = (a_1, \ldots, a_n)$ be an orthonormal basis of the Hilbert space $X$. If $T$ is the corresponding frame operator, then
$$\|Tx\|^2 = \sum_{j=1}^{n} |\langle x, a_j\rangle|^2 = \|x\|^2 \qquad \forall x \in X.$$
It follows that $a_\bullet$ is a tight frame with frame constant $A = 1$. ○
③ In order to strengthen the geometric intuition we consider in this last example the following real situation: Let
$$a_j = (a_{j1}, a_{j2}, a_{j3}) \qquad (1 \le j \le 3) \tag{8}$$
be three linearly independent vectors of the euclidean $\mathbb{R}^3$. Writing the three row vectors (8) one below the other, one obtains a regular $(3 \times 3)$-matrix $[M]$. The frame operator $T$ maps a general vector $x \in \mathbb{R}^3$ onto the vector
$$Tx := \Bigl(\sum_{k=1}^{3} a_{1k}x_k,\ \sum_{k=1}^{3} a_{2k}x_k,\ \sum_{k=1}^{3} a_{3k}x_k\Bigr) \in \mathbb{R}^3.$$
Computing
$$\|Tx\|^2 = \sum_{j=1}^{3} \Bigl(\sum_{k=1}^{3} a_{jk}x_k\Bigr)^2 = \sum_{k,l=1}^{3} Q_{kl}\,x_k x_l$$
brings into the picture the quadratic form $Q$ whose matrix elements $Q_{kl}$ are given by
$$Q_{kl} := \sum_{j=1}^{3} a_{jk}\,a_{jl}.$$
These are not the scalar products of the $a_j$, but the scalar products of the column vectors of $[M]$. The above formula for the $Q_{kl}$ is equivalent to the matrix equation $[Q] = [M]'\,[M]$, where the prime $'$ denotes transposition. It follows that the symmetric matrix $[Q]$ is regular as well; therefore the quadratic form $Q$ is positive definite. This implies that $Q$ assumes a certain maximum value $B$ and a positive minimum value $A$ on the unit sphere $S^2 \subset \mathbb{R}^3$, from which we may immediately conclude that the three given vectors form a frame with frame constants $B \ge A > 0$. ○
We now address the second question: How can the vector $x \in X$ be reconstructed from its image $y := Tx$?
Thus we assume that the collection $a_\bullet = (a_1, \ldots, a_r)$ is indeed a frame and let $G\colon X \to X$ be the corresponding Gram operator. Since $G$ is regular, it has an inverse $G^{-1}\colon X \to X$. Using $G^{-1}$ we define the mapping
$$S := G^{-1}T^*\colon Y \to X.$$
The formula
$$ST = G^{-1}T^*T = G^{-1}G = I_X \tag{9}$$
shows that $S$ is a left inverse of the frame operator $T$ and so may be used for the reconstruction of $x$ from $y = Tx$. If the frame $a_\bullet$ is tight, then
$$G^{-1} = \frac{1}{A}\,I_X$$
and consequently
$$S = \frac{1}{A}\,T^*.$$
This means that in the case of a tight frame the inverse transformation $S$ is obtained for free, i.e., without having to compute a matrix inverse.
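The remark about tight frames can be tried out on the frame of Example ①, for which $A = r/2$. The following sketch reconstructs a vector from its frame coefficients by applying $\frac{1}{A}T^*$; it uses $T^*y = \sum_j y_j a_j$, which follows from (3). All names, the value $r = 7$ and the test vector are assumptions of this illustration.

```python
import cmath
import math


def roots_of_unity_frame(r):
    """Tight frame of Example 1: a_j = (w^j, conj(w)^j) / sqrt(2)."""
    w = cmath.exp(2j * math.pi / r)
    s = 1.0 / math.sqrt(2.0)
    return [(s * w**j, s * (w**j).conjugate()) for j in range(r)]


def analysis(x, frame):
    """y = Tx with y_j = <x, a_j> = sum_k x_k * conj(a_jk)."""
    return [sum(xk * ak.conjugate() for xk, ak in zip(x, a)) for a in frame]


def adjoint(y, frame):
    """T*y = sum_j y_j a_j, a consequence of T*e_j = a_j."""
    dim = len(frame[0])
    return tuple(sum(yj * a[k] for yj, a in zip(y, frame)) for k in range(dim))


r = 7
frame = roots_of_unity_frame(r)
A = r / 2.0                              # frame constant of this tight frame

x = (0.3 + 1.2j, -2.0 + 0.5j)            # arbitrary test vector
y = analysis(x, frame)                   # the r "measurements" of x
x_rec = tuple(c / A for c in adjoint(y, frame))   # S y = (1/A) T* y

error = max(abs(u - v) for u, v in zip(x, x_rec))
print("reconstruction error:", error)
```

No matrix is inverted anywhere; the only operations are scalar products and one weighted sum of the frame vectors.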
We now compose $S$ and $T$ the other way around and obtain the mapping
$$P := TS\colon Y \to Y.$$
It can be characterized geometrically as follows:
(4.3) $P := TS$ is the orthogonal projection of the space $Y$ onto the subspace $U := \operatorname{im}(T)$.
⌈ Let $P_U$ be the orthogonal projection of $Y$ onto $U$. Any vector $y \in Y$ has a uniquely determined decomposition of the form
$$y = u + v, \qquad u = P_U\,y \in U, \quad v \in U^\perp.$$
For vectors $u = Tx \in U$, formula (9) implies the identity $Pu = TSTx = Tx = u$. For a $v \in U^\perp$ we have
$$\langle x, T^*v\rangle = \langle Tx, v\rangle = 0 \qquad \forall x \in X.$$
From this we conclude $T^*v = 0$, and this in turn gives $Pv = T(G^{-1}T^*)v = 0$. Altogether we obtain
$$Py = Pu + Pv = u = P_U\,y \qquad \forall y \in Y,$$
as stated. ⌋
Proposition (4.3) may be interpreted as follows (see Figure 4.3): The $S$-image $x := Su$ of a vector $u \in U$ is the uniquely determined vector $x \in X$ whose $T$-image is the given $u$, and the $S$-image $x := Sy$ of an arbitrary $y \in Y$ is the one vector $x \in X$ whose $T$-image is nearest to the given $y$. In this way we have obtained a simple geometric description of the mapping $S$.
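The statements about $S$ and $P$ can be verified on a small real example. The sketch below assumes a hypothetical frame of three vectors in $\mathbb{R}^2$ (so the adjoint is the transpose), builds the matrices of $G$, $S$ and $P$ with hand-written helpers, and compares the residual $\|Tx - y\|^2$ at $x = Sy$ with the residual at nearby points.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse_2x2(m):
    (p, q), (u, v) = m
    det = p * v - q * u
    return [[v / det, -q / det], [-u / det, p / det]]


def apply(m, vec):
    """Matrix times vector."""
    return [sum(row[k] * vec[k] for k in range(len(vec))) for row in m]


def dist_sq(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Rows of T are the (hypothetical) frame vectors a_1, a_2, a_3 in R^2.
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
T_adj = transpose(T)                  # real case: adjoint = transpose
G = matmul(T_adj, T)                  # Gram operator G = T*T  (2x2)
S = matmul(inverse_2x2(G), T_adj)     # S = G^{-1} T*          (2x3)
P = matmul(T, S)                      # P = T S                (3x3)

ST = matmul(S, T)                     # should be the 2x2 identity
PP = matmul(P, P)                     # should equal P (projection)

y = [1.0, 2.0, 0.0]                   # a vector that does not lie in U
x_best = apply(S, y)
best = dist_sq(apply(T, x_best), y)
nearby = [dist_sq(apply(T, [x_best[0] + dx, x_best[1] + dy]), y)
          for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)]
print("ST =", ST)
print("residual at Sy:", best, " smallest residual nearby:", min(nearby))
```

That $P$ is symmetric and idempotent is exactly what characterizes an orthogonal projection in the real case.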
Now for the next step: Using $G^{-1}$ we define the vectors
$$\tilde a_j := G^{-1}a_j \qquad (1 \le j \le r).$$
[Figure 4.3: the spaces $X$ and $Y$ with the subspace $U$, the mapping $S := G^{-1}T^*$, a vector $y \in Y$, its image $x = Sy$, and $Tx = Py$.]
The collection $\tilde a_\bullet := (\tilde a_1, \ldots, \tilde a_r)$ is called the dual frame of the frame $a_\bullet$. If the given frame $a_\bullet$ is tight, then the $\tilde a_j$ coincide with the $a_j$ up to the constant factor $\frac1A$. In the following theorem we sum up what can be said about the relation between a frame $a_\bullet$ and its dual $\tilde a_\bullet$.
(4.4) Let $a_\bullet$ be a frame with frame constants $B \ge A > 0$ and let $\tilde a_\bullet$ be the corresponding dual frame. Then the following are true:
(a) The two frames $a_\bullet$ and $\tilde a_\bullet$ together incorporate a resolution of the identity for the space $X$:
$$x = \sum_{j=1}^{r} \langle x, a_j\rangle\,\tilde a_j \qquad \forall x \in X.$$
(b) The image $Sy$ of an arbitrary vector $y = (y_1, \ldots, y_r) \in Y$ is given by
$$Sy = \sum_{j=1}^{r} y_j\,\tilde a_j.$$
(c) The collection $\tilde a_\bullet$ is in fact a frame with frame constants $\frac1A \ge \frac1B > 0$.
(d) The dual frame of $\tilde a_\bullet$ is $a_\bullet$; in particular, one has the following mirror formula to (a):
$$x = \sum_{j=1}^{r} \langle x, \tilde a_j\rangle\,a_j \qquad \forall x \in X.$$
⌈ (a) Using (4) one immediately obtains
$$x = G^{-1}(Gx) = G^{-1}\Bigl(\sum_j \langle x, a_j\rangle\,a_j\Bigr) = \sum_j \langle x, a_j\rangle\,\tilde a_j.$$
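(b) Formula (3) implies
$$Sy = G^{-1}T^*\Bigl(\sum_j y_j e_j\Bigr) = G^{-1}\Bigl(\sum_j y_j a_j\Bigr) = \sum_j y_j\,\tilde a_j.$$
(c) Let $\tilde T$ be the frame operator belonging to the collection $\tilde a_\bullet$. Since $G$ is self-adjoint, the same is true for $G^{-1}$. Now we have
$$(\tilde Tx)_j = \langle x, \tilde a_j\rangle = \langle x, G^{-1}a_j\rangle = \langle G^{-1}x, a_j\rangle = (TG^{-1}x)_j$$
for all $x$ and all $j$. This proves
$$\tilde T = TG^{-1}, \tag{10}$$
and (6) implies, in turn,
$$\|\tilde Tx\|^2 = \|TG^{-1}x\|^2 = \langle G\,G^{-1}x, G^{-1}x\rangle = \langle x, G^{-1}x\rangle.$$
There is an orthonormal basis $(e_1, \ldots, e_n)$ of $X$ that diagonalizes both $G$ and $G^{-1}$. Using this basis we now obtain the required estimates:
$$\|\tilde Tx\|^2 = \langle x, G^{-1}x\rangle = \sum_{i=1}^{n} \frac{1}{\lambda_i}\,|x_i|^2 \ \begin{cases} \ \ge \frac1B\,\|x\|^2, \\[2pt] \ \le \frac1A\,\|x\|^2. \end{cases}$$
(d) With the help of (10) one obtains the following expression for the Gram operator $\tilde G$ belonging to the collection $\tilde a_\bullet$:
$$\tilde G := \tilde T^*\tilde T = G^{-1}T^*\,TG^{-1} = G^{-1}.$$
This implies $\tilde{\tilde a}_j := \tilde G^{-1}\tilde a_j = G\,\tilde a_j = a_j$ for all $j$, as stated. ⌋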
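The two expansion formulas (4.4)(a) and (4.4)(d) can be checked numerically. The sketch below again assumes a hypothetical frame of three vectors in the real plane; the helper names are illustrative and the $2 \times 2$ inverse is written out by hand.

```python
FRAME = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical frame in R^2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(frame):
    """Matrix of G = sum_j a_j a_j^T (real case)."""
    n = len(frame[0])
    return [[sum(a[i] * a[k] for a in frame) for k in range(n)] for i in range(n)]


def inverse_2x2(m):
    (p, q), (u, v) = m
    det = p * v - q * u
    return [[v / det, -q / det], [-u / det, p / det]]


def dual_frame(frame):
    """Dual vectors G^{-1} a_j."""
    g_inv = inverse_2x2(gram_matrix(frame))
    return [tuple(dot(row, a) for row in g_inv) for a in frame]


def expand(x, analysis_frame, synthesis_frame):
    """sum_j <x, b_j> c_j with b_j from analysis_frame, c_j from synthesis_frame."""
    n = len(x)
    return tuple(sum(dot(x, b) * c[k]
                     for b, c in zip(analysis_frame, synthesis_frame))
                 for k in range(n))


DUAL = dual_frame(FRAME)
x = (0.7, -1.9)                         # arbitrary test vector
x_a = expand(x, FRAME, DUAL)            # formula (4.4)(a)
x_d = expand(x, DUAL, FRAME)            # mirror formula (4.4)(d)
dual_of_dual = dual_frame(DUAL)         # should reproduce FRAME
print("dual frame:", DUAL)
print("(a):", x_a, " (d):", x_d)
```

Taking the dual twice returns the original vectors, in line with (4.4)(d).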
If $r > n := \dim(X)$, then the $\tilde a_j$ are linearly dependent, so there have to be infinitely many representations of a given vector $x \in X$ as a linear combination of the $\tilde a_j$. Among these the representation (4.4)(a) is distinguished as follows:
(4.5) Let $a_\bullet$ and $\tilde a_\bullet$ be dual frames, and let $x = \sum_{j=1}^{r} \xi_j\,\tilde a_j$ be an arbitrary representation of the vector $x \in X$ as a linear combination of the $\tilde a_j$. Then
$$\sum_{j=1}^{r} |\xi_j|^2 \ \ge\ \sum_{j=1}^{r} |\langle x, a_j\rangle|^2,$$
the equality sign holding only if $\xi_j = \langle x, a_j\rangle$ for $1 \le j \le r$.
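Before turning to the argument, the inequality of (4.5) can be observed on a small example. The sketch assumes a hypothetical frame of three vectors in $\mathbb{R}^2$ whose dual vectors were computed by hand, and a coefficient vector `NULL_VECTOR` (also found by hand) whose combination of the dual vectors vanishes; adding a multiple of it changes the coefficients but not the represented vector.

```python
FRAME = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]                 # hypothetical frame
DUAL = [(2 / 3, -1 / 3), (-1 / 3, 2 / 3), (1 / 3, 1 / 3)]    # G^{-1} a_j, by hand
NULL_VECTOR = (1.0, 1.0, -1.0)          # sum_j v_j * DUAL[j] = 0


def synthesize(coeffs, vectors):
    """Linear combination sum_j c_j v_j."""
    n = len(vectors[0])
    return tuple(sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(n))


def energy(coeffs):
    """Coefficient energy sum_j |c_j|^2 (real case)."""
    return sum(c * c for c in coeffs)


x = (2.0, -1.0)
natural = [x[0] * a[0] + x[1] * a[1] for a in FRAME]         # <x, a_j>
alternative = [c + 0.5 * v for c, v in zip(natural, NULL_VECTOR)]

print("natural:    ", natural, "energy", energy(natural))
print("alternative:", alternative, "energy", energy(alternative))
print("both represent:", synthesize(natural, DUAL), synthesize(alternative, DUAL))
```

Both coefficient vectors reproduce the same $x$, but the natural coefficients $\langle x, a_j\rangle$ have the smaller sum of squares.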
⌈ Consider the point $(\xi_1, \ldots, \xi_r) =: y \in Y$. According to (4.4)(b) one has $x = Sy$, and (4.3) implies $Tx = TSy = P_U\,y$. This at once leads to
$$\sum_{j=1}^{r} |\langle x, a_j\rangle|^2 = \|Tx\|^2 = \|P_U\,y\|^2 \le \|y\|^2 = \sum_{j=1}^{r} |\xi_j|^2.$$
Here we can have equality only if $y = P_U\,y = Tx$. Expressing these geometric facts in terms of coordinates one obtains the statements of the theorem. ⌋
The content of Theorem (4.5) can be expressed in this way: The "natural" representation (4.4)(a) uses the least amount of "coefficient energy".
4.2 The general notion of a frame
The geometrical (and finite-dimensional) analysis presented in the foregoing section served to prepare us for the following general dispositions:
$X$ is a complex Hilbert space whose vectors we denote by $f$, $h$ and similar letters. One should imagine $X$ being infinite-dimensional.
$M$ is an "abstract" set of points $m$. On the set $M$ a measure $\mu$ is defined that assigns each measurable subset $E \subset M$ its "mass" or "volume" $\mu(E) \in [0, \infty]$. The measurable subsets form a so-called $\sigma$-algebra $\mathcal{F}$, and care is taken that any "reasonable" subset $E \subset M$ belongs to $\mathcal{F}$. According to general principles it is then possible to set up an integral calculus for functions on $M$, and it makes sense, e.g., to speak about the Hilbert space $Y := L^2(M, \mu)$. The pair $(M, \mu)$ is the abstraction of the pair $(\{1, 2, \ldots, r\}, \#)$ that played such a prominent role in the last section.
Furthermore, a family $h_\bullet := (h_m \mid m \in M)$ of vectors $h_m \in X$ is given, the measure space $M$ serving as index set for this family. The $h_m$ are (analogous to the $a_j$ of Section 4.1) to be viewed as "measuring probes", by means of which we want to explore the individual vectors $f \in X$ as completely as possible. In Section 1.5 we tentatively spoke of "key patterns" when actually the same "measuring probes" were meant.
The fact is, for a given $f \in X$, one gets hold (numerically, experimentally, conceptually, or otherwise) of the family of all scalar products
$$Tf(m) := \langle f, h_m\rangle \qquad (m \in M).$$
In this way one obtains an array $(Tf(m) \mid m \in M)$ that is nothing other than a function $Tf\colon M \to \mathbb{C}$. The integral installed on $M$ now enables us to quantify the yield of our measuring efforts: The $L^2$-integral
$$\|Tf\|^2 := \int_M |Tf(m)|^2\,d\mu(m) \tag{1}$$
is obviously a natural measure for the amount of information so collected about $f$.
This brings us to the following definition: The family $h_\bullet$ is a frame if the following conditions are satisfied:
the function $Tf$ is $\mu$-measurable for all $f \in X$, so that the integral (1) is always defined;
there are constants $B \ge A > 0$ such that
$$A\,\|f\|^2 \ \underset{(a)}{\le}\ \|Tf\|^2 \ \underset{(b)}{\le}\ B\,\|f\|^2 \qquad \forall f \in X.$$
Here the inequality (b) guarantees that the frame operator
$$T\colon X \to \mathbb{C}^M, \qquad f \mapsto Tf$$
is a bounded operator from $X$ to $Y := L^2(M, \mu)$. The inequality (a), in most cases the crucial one of the two, serves to make sure that $T$ is injective, signifying that no information is lost in the process $f \mapsto Tf$.
While we are at it, we proceed to explain the related notion of a "Riesz basis", which will play a certain role in connection with the discrete wavelet transform later on. Here the set $M$ is countable to start with, and $\mu$ is the counting measure $\#$ on $M$. A family $h_\bullet = (h_m \mid m \in M)$ of vectors $h_m \in X$ is called a Riesz basis of $X$ if the following conditions are satisfied:
$\overline{\operatorname{span}(h_\bullet)} = X$;
there are constants $B \ge A > 0$ such that
$$A \sum_m |\xi_m|^2 \ \underset{(c)}{\le}\ \Bigl\|\sum_m \xi_m h_m\Bigr\|^2 \ \le\ B \sum_m |\xi_m|^2 \qquad \forall\, \xi_\bullet \in l^2(M). \tag{2}$$
Altogether, these conditions say that the mapping
$$K\colon l^2(M) \to X, \qquad \xi_\bullet \mapsto \sum_m \xi_m h_m$$
is a bounded operator having a bounded inverse $K^{-1}\colon X \to l^2(M)$.
The relation between the two concepts "frame" and "Riesz basis" is not obvious, because the two definitions speak about totally different things. Thus it is not a bad idea to prove the following proposition:
(4.6) A Riesz basis $h_\bullet$ with constants $B \ge A > 0$ is automatically a frame with $A$ and $B$ as frame constants.
⌈ Let $(e_m \mid m \in M)$ be the canonical orthonormal basis of $l^2(M)$. Then one has $Ke_m = h_m$ and consequently
$$K^*x = \sum_m \langle K^*x, e_m\rangle\,e_m = \sum_m \langle x, Ke_m\rangle\,e_m = \sum_m \langle x, h_m\rangle\,e_m = Tx$$
for all $x \in X$. By general principles of functional analysis the conditions (2) imply the analogous inequalities for $K^* = T$. This means that we also have
$$A\,\|x\|^2 \le \|Tx\|^2 \le B\,\|x\|^2 \qquad \forall x \in X.$$
⌋
The following somewhat vague statement is not so far from the truth: A Riesz basis is a countable frame whose vectors are linearly independent and stay so even "in the limit". To wit, the inequality (c) in (2) guarantees that it is impossible for a nontrivial linear combination $\sum_m \xi_m h_m$ to represent the zero vector.
In the finite-dimensional case the inverse $G^{-1}$ of the Gram operator and the dual frame $\tilde a_\bullet$ could be computed by inverting a certain matrix. In the case at hand, an operator
$$G\colon X \to X, \qquad \dim(X) = \infty,$$
has to be inverted. This can be accomplished by means of an iteration procedure whose rate of convergence is tied to the quotient $\frac{B}{A}$: The nearer this quotient is to 1, the better the convergence of our procedure is. In fact, we shall prove the following:
(4.7) Assume that $h_\bullet$ is a frame for $X$ with frame constants $B \ge A > 0$, and let $y \in X$ be an arbitrary vector. If the sequence $x_\bullet$ is recursively defined by
$$x_0 := 0, \qquad x_{n+1} := x_n + \frac{2}{A+B}\,(y - Gx_n) \quad (n \ge 0),$$
then $\lim_{n\to\infty} x_n = G^{-1}y$.
In practice (that is to say, in the actual numerical computation of the frame vectors $\tilde a_j := G^{-1}a_j$), the described procedure is cut short as soon as the increments $\frac{2}{A+B}\,(y - Gx_n)$ become negligibly small.
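The recursion of (4.7), together with the stopping rule just described, can be sketched in a few lines. The example below is a finite-dimensional stand-in: it assumes a hypothetical frame of three vectors in $\mathbb{R}^2$ with frame constants $A = 1$ and $B = 3$ (the eigenvalues of its Gram operator), and the names `apply_gram`, `solve_gram`, `tol` and `max_iter` are choices made here.

```python
FRAME = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical frame in R^2
A, B = 1.0, 3.0                                # its frame constants


def apply_gram(x, frame):
    """Gx = sum_j <x, a_j> a_j (real case); no matrix of G is formed."""
    coeffs = [sum(xk * ak for xk, ak in zip(x, a)) for a in frame]
    return [sum(c * a[k] for c, a in zip(coeffs, frame)) for k in range(len(x))]


def solve_gram(y, frame, lower, upper, tol=1e-12, max_iter=10_000):
    """Approximate G^{-1} y by x_{n+1} = x_n + 2/(A+B) * (y - G x_n), x_0 = 0."""
    relax = 2.0 / (lower + upper)
    x = [0.0] * len(y)
    for _ in range(max_iter):
        gx = apply_gram(x, frame)
        step = [relax * (yk - gk) for yk, gk in zip(y, gx)]
        x = [xk + sk for xk, sk in zip(x, step)]
        if max(abs(s) for s in step) < tol:    # increments negligibly small
            break
    return x


# Dual vectors G^{-1} a_j obtained without inverting a matrix.
dual = [solve_gram(list(a), FRAME, A, B) for a in FRAME]
print(dual)
```

Only applications of $G$ are needed, which is what makes the procedure usable when $G$ is not available as a finite matrix.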
⌈ We consider the auxiliary operator
$$R := I_X - \frac{2}{A+B}\,G.$$
In terms of $R$, the iteration formula can be rewritten as
$$x_{n+1} := \frac{2}{A+B}\,y + Rx_n.$$
Now $G$ is a positive definite self-adjoint operator, and by assumption on $T$ we know that $A\,I_X \le G \le B\,I_X$ (such inequalities make sense in this case!). This implies
$$\Bigl\|G - \frac{A+B}{2}\,I_X\Bigr\| \le \frac{B-A}{2},$$
so that we get the following estimate for the norm of $R$:
$$\|R\| = \Bigl\|\frac{2}{A+B}\,G - I_X\Bigr\| \le \frac{B-A}{B+A} = \frac{B/A - 1}{B/A + 1} < 1.$$
By the contraction principle (i.e., the general fixed-point theorem) we can conclude now that $\lim_{n\to\infty} x_n =: x \in X$ exists, and furthermore that
$$x = \frac{2}{A+B}\,y + Rx = x + \frac{2}{A+B}\,(y - Gx).$$
The last equation implies $y - Gx = 0$, whence $x = G^{-1}y$, as stated. ⌋
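The same argument also gives a convergence rate. The following short computation is a sketch that uses only the operator $R$ and the estimate for $\|R\|$ obtained in the proof; the abbreviation $q$ is introduced here for convenience. Writing $x := G^{-1}y$ for the limit and subtracting the fixed-point equation $x = \frac{2}{A+B}\,y + Rx$ from the iteration formula, one finds

$$x_{n+1} - x = R\,(x_n - x), \qquad\text{hence}\qquad \|x_n - x\| \le q^n\,\|x_0 - x\| = q^n\,\|G^{-1}y\|, \qquad q := \frac{B-A}{B+A} < 1.$$

For instance, frame constants with $B/A = 3$ give $q = \frac12$, so that each step at least halves the error bound, whereas $B/A = 1.1$ gives $q = \frac{0.1}{2.1} \approx 0.048$.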
At this time we can see the following two applications of the concepts presented here: Number one, of course, the finite-dimensional model discussed in Section 4.1, and number two, the continuous wavelet transform as treated in Chapter 3. We are now going to review and interpret the latter in the functional analytic framework (!) set up in this section.
$X$ is the space $L^2(\mathbb{R})$ of time signals $f$, and $M$ is the set
$$\mathbb{R}^2_- := \{(a, b) \mid a \in \mathbb{R}^*,\ b \in \mathbb{R}\},$$
provided with the measure $d\mu := da\,db/|a|^2$. The Hilbert space $Y := L^2(M)$ is the space $L^2(\mathbb{R}^2_-, d\mu)$ that was denoted by $H$ in Chapter 3.
After a mother wavelet $\psi$ has been selected, one defines the wavelet functions
$$\psi_{a,b}(t) := \frac{1}{|a|^{1/2}}\,\psi\Bigl(\frac{t-b}{a}\Bigr)$$
and in this way installs a family
$$\psi_\bullet := \bigl(\psi_{a,b} \mid (a, b) \in \mathbb{R}^2_-\bigr)$$
of vectors $\psi_{a,b} \in L^2$. The corresponding frame operator $T$ transforms any function $f \in L^2$ into a function $Tf\colon \mathbb{R}^2_- \to \mathbb{C}$ according to the prescription
$$Tf(a, b) := \langle f, \psi_{a,b}\rangle = \mathcal{W}f(a, b).$$
We see that the wavelet transform $\mathcal{W}$ is nothing other than the frame operator $T$ corresponding to the family $\psi_\bullet$. Now by Theorem (3.3) one has
$$\|\mathcal{W}f\|^2 = C_\psi\,\|f\|^2 \qquad \forall f \in L^2,$$
where the constant $C_\psi$ is given by
$$C_\psi := 2\pi \int_{\mathbb{R}^*} \frac{|\hat\psi(a)|^2}{|a|}\,da.$$
In terms of the concepts defined in the current chapter, we can express this fact as follows:
(4.8) Let $\psi$ be an arbitrary mother wavelet. Then the family $\psi_\bullet$ is a tight frame with frame constant $C_\psi$.
In view of this theorem, the inverse of the Gram operator is given by $G^{-1} = \frac{1}{C_\psi}\,I_X$, and the dual frame $\tilde\psi_\bullet$ coincides with $\psi_\bullet$ up to the same constant factor:
$$\tilde\psi_{a,b} = \frac{1}{C_\psi}\,\psi_{a,b}.$$
If we now apply formula (4.4)(a), which reconstructs a given vector $x \in X$ from the values $(Tx)_j := \langle x, a_j\rangle$, to the situation at hand, we arrive at the following:
$$f = \int_{\mathbb{R}^2_-} \frac{da\,db}{|a|^2}\ \mathcal{W}f(a, b)\,\frac{1}{C_\psi}\,\psi_{a,b}. \tag{3}$$
This is in agreement with (3.7) resp. 3.3.(4). It must be admitted, however, that (4.4)(a) is related to a finite-dimensional model, so the validity of (3) is not guaranteed in the present situation. As a matter of fact, formula (3) is valid only in a "weak" sense or else under stronger assumptions on $f$ and $\psi$; see our remarks in Section 3.3 regarding this point.
4.3 The discrete wavelet transform
Shannon's sampling theorem (Section 2.4) accomplishes the full reconstruction of a bandlimited time signal $f$ from a discrete collection $(f(kT) \mid k \in \mathbb{Z})$ of sampled values. In this section we set out to attain something similar in the realm of the wavelet transform. The data that we shall use in the reconstruction of $f$ are no longer $f$-values at equally spaced points $kT$, but results of "wavelet measurements" $\langle f, \psi_{a,b}\rangle$; that is to say, suitably chosen values of the wavelet transform $\mathcal{W}f\colon \mathbb{R}^2_- \to \mathbb{C}$. One must always keep in mind that a given signal $f$ is encoded in its wavelet transform with an enormous redundancy. Under these circumstances it is not so surprising that a discrete set of $\mathcal{W}f$-values is already sufficient to reconstruct the given $f$ as an $L^2$-object or even pointwise, and all this even without the assumption that $f$ is bandlimited.
We now describe the class of "grids" in the $(a, b)$-plane that we shall use for the sampling of the function $\mathcal{W}f$: First a zoom step $\sigma > 1$ is chosen (the habitual choice is $\sigma = 2$) as well as a base step $\beta > 0$ (a good choice is $\beta = 1$). These two parameters characterize the chosen "grid" and are kept fixed in the following. Then one sets
$$a_m := \sigma^m, \qquad b_{m,n} := n\,\sigma^m\beta \qquad (m, n \in \mathbb{Z}),$$
and with these numbers one defines the countable set
$$M := \{(a_m, b_{m,n}) \mid m, n \in \mathbb{Z}\}$$
shown in Figure 4.4. Note that negative $a$-values are no longer taken into consideration. From a structural point of view, i.e., for the purposes of addressing the individual points of $M$, we obviously can say that $M \sim \mathbb{Z} \times \mathbb{Z}$.
[Figure 4.4: the grid points $(a_m, b_{m,n})$ in the $(a, b)$-plane; the rows $m = 0$ and $m < 0$ are marked.]
Our next question is: What should be the correct measure on this $M$? Each point $(a_m, b_{m,n}) \in M$ represents a rectangle $R_{m,n}$ of width $\sigma^m\beta$ and height $\sigma^m\sqrt{\sigma} - \sigma^m/\sqrt{\sigma}$ in the $(a, b)$-plane (see Figure 4.5), and the $R_{m,n}$ constitute a disjoint decomposition of the upper half-plane $\mathbb{R}^2_>$. The $\mu$-content of the rectangle $R_{m,n}$ is computed as follows:
$$\mu(R_{m,n}) = \int_{R_{m,n}} \frac{da\,db}{a^2} = \sigma^m\beta \int_{\sigma^m/\sqrt{\sigma}}^{\sigma^m\sqrt{\sigma}} \frac{da}{a^2} = \sigma^m\beta\,\Bigl(\frac{\sqrt{\sigma}}{\sigma^m} - \frac{1}{\sigma^m\sqrt{\sigma}}\Bigr) = \beta\,\Bigl(\sqrt{\sigma} - \frac{1}{\sqrt{\sigma}}\Bigr);$$
therefore it is independent of $m$ and $n$. This crucial observation leads us to choose the counting measure $\#$ as our measure on the set $M \sim \mathbb{Z}^2$, so that the space $Y$ of the foregoing section becomes $Y := l^2(\mathbb{Z}^2)$.
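[Figure 4.5: the rectangle $R_{m,n}$ in the $(a, b)$-plane, with lower edge at $a = \sigma^m/\sqrt{\sigma}$.]
Assume now that a mother wavelet $\psi$ has been chosen once and for all. From the full set of wavelet functions $\psi_{a,b}$, $(a, b) \in \mathbb{R}^2_-$, we only retain the ones that belong to the points $(a_m, b_{m,n}) \in M$, and of course these functions get a new address: $\psi_{\sigma^m,\,n\sigma^m\beta} =: \psi_{m,n}$. This means we now have the family
$$\psi_\bullet := \bigl(\psi_{m,n} \mid (m, n) \in \mathbb{Z}^2\bigr)$$
consisting of the following wavelet functions:
$$\psi_{m,n}(t) := \frac{1}{\sigma^{m/2}}\,\psi\Bigl(\frac{t - n\sigma^m\beta}{\sigma^m}\Bigr) = \frac{1}{\sigma^{m/2}}\,\psi\Bigl(\frac{t}{\sigma^m} - n\beta\Bigr).$$
The corresponding frame operator $T\colon f \mapsto Tf$ is connected to the wavelet transform $\mathcal{W}\colon f \mapsto \mathcal{W}f$ by means of the formula
$$Tf(m, n) = \langle f, \psi_{m,n}\rangle = \mathcal{W}f(\sigma^m, n\sigma^m\beta). \tag{1}$$
We are now ready for the essential questions of this section: Under which assumptions on $\psi$, $\sigma$ and $\beta$ can we be sure that the collection $\psi_\bullet$ is in fact a frame, and what are the resulting frame constants?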
Regarding the second question, in [D], Theorem 3.3.1, the following is proven:
(4.9) Let $\psi$ be a wavelet and let $C_-$, $C_+$ be defined by
$$C_- := 2\pi \int_{\xi<0} \frac{|\hat\psi(\xi)|^2}{|\xi|}\,d\xi, \qquad C_+ := 2\pi \int_{\xi>0} \frac{|\hat\psi(\xi)|^2}{|\xi|}\,d\xi.$$
If the family $\psi_\bullet$ corresponding to given step sizes $\sigma$ and $\beta$ is in fact a frame, then the resulting frame constants $B \ge A > 0$ satisfy the following inequalities:
$$A \le \frac{C_\pm}{\beta\,\log\sigma} \le B.$$
In particular, one cannot have $A = B$ unless $C_- = C_+$. This is a consequence of the fact that we have rejected negative $a$-values; cf. the analogous condition in Theorem (3.4). For the proof of (4.9) we refer the reader to [D].
So far, so good, but what we really want is a theorem of the following kind: Under exactly described circumstances it is guaranteed that the collection $\psi_\bullet$ is a frame, with the frame constants $B \ge A > 0$ obeying tolerances stipulated in advance.
Assume that a zoom step $\sigma > 1$ is given. A wavelet $\psi$ is called admissible for the purposes of this discussion if its Fourier transform $\hat\psi$ fulfills the conditions (a) and (b) below.
(a) There are constants $\alpha > 0$, $\rho > 0$ and $C$, such that
$(|\xi| \le 1)$
$(|\xi| \ge 1)$
(2)
This condition is in fact harmless and serves mainly to introduce the constants $\alpha$, $\rho$ and $C$. If, e.g., we have $t\psi \in L^1$ and $\psi'$ is of bounded variation, then estimates of the form (2) are valid with $\alpha = 1$ and $\rho = \frac12$.
(b) There is a constant $A' > 0$ such that
$$\sum_{m=-\infty}^{\infty} |\hat\psi(\sigma^m\xi)|^2 \ge A' \qquad (\xi \in \mathbb{R}^*). \tag{3}$$
Since the left hand side of (3) is invariant with respect to the transformation $\xi \mapsto \sigma\xi$, it is enough to check the required inequality on the domain $1 \le |\xi| \le \sigma$. According to this condition the zeros of $\hat\psi$ are in a way forbidden to be in "logarithmic conspiracy". Thus it is in particular excluded that the support of $\hat\psi$ is contained in a single interval of the form $]b, \sigma b[$. Assume, e.g., that $\psi$ has finite order $N$. Then because of 3.5.(3) there is an $h > 0$ with
$(0 < |\xi| < h)$,
and (3) is guaranteed.
For the purposes of the current discussion we call the constants $\alpha$, $\rho$, $C$, and $A'$ the parameters of $\psi$. After all these preparations we can finally formulate the central theorem of this chapter:
(4.10) Let a zoom step $\sigma > 1$ be given and assume that $\psi$ is an admissible wavelet with parameters $\alpha$, $\rho$, $C$ and $A'$. Then there are constants $\beta_0$, $B'$ and $C'$, so that the following is true: For any base step $\beta < \beta_0$ the family $\psi_\bullet = (\psi_{m,n} \mid (m, n) \in \mathbb{Z}^2)$ is a frame with frame constants
$$A = \frac{2\pi}{\beta}\,\bigl(A' - C'\beta^{1+\rho}\bigr), \qquad B = \frac{2\pi}{\beta}\,\bigl(B' + C'\beta^{1+\rho}\bigr).$$
We defer the proof of this theorem to the next section. For the time being, the following heuristic argument should be sufficient:
We have to show that the operator $T$ satisfies the frame condition
$$A\,\|f\|^2 \le \|Tf\|^2 \le B\,\|f\|^2. \tag{4}$$
According to (1) we have
$$\|Tf\|^2 = \sum_{m,n} |Tf(m, n)|^2 = \sum_{m,n} |\mathcal{W}f(\sigma^m, n\sigma^m\beta)|^2. \tag{5}$$
Now the above considerations concerning the rectangles $R_{m,n}$ show that the right hand side of this equation can essentially be regarded as a Riemann sum for the integral
$$\int_{\mathbb{R}^2_>} |\mathcal{W}f(a, b)|^2\,\frac{da\,db}{a^2},$$
and according to Theorem (3.4) this integral is a constant multiple of $\|f\|^2$. For this reason it is quite plausible that for sufficiently small $\sigma > 1$ and sufficiently small $\beta > 0$ the quantities $\|Tf\|^2$ and $\|f\|^2$ have the same order of magnitude, as required by (4). Theorem (4.10) shows that in reality very modest assumptions about $\psi$ suffice to guarantee that the data
$$\bigl(Tf(m, n) \mid (m, n) \in \mathbb{Z}^2\bigr) \tag{6}$$
encode all features of the analyzed function $f$, as soon as $\beta$ is small enough; in particular, it is all right to take $\sigma := 2$ in such a case.
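For the reconstruction of the original $f$ using the data (6) we need the frame $\tilde\psi_\bullet$ dual to $\psi_\bullet$. If the frame $\psi_\bullet$ is not tight, we have to compute the $\tilde\psi_{m,n}$ using the prescription
$$\tilde\psi_{m,n} := G^{-1}(\psi_{m,n}).$$
Unfortunately the $\tilde\psi_{m,n}$ cannot be obtained from a single $\tilde\psi$ by mere dilation and translation, unless of course $\psi$ is chosen in a very special way at the outset. The following considerations will make this more clear:
The two operators
$$Df(t) := \frac{1}{\sqrt{\sigma}}\,f\Bigl(\frac{t}{\sigma}\Bigr) \qquad\text{and}\qquad Sf(t) := f(t - \beta)$$
are unitary; therefore we have $D^* = D^{-1}$ and $S^* = S^{-1}$. Consider now the Gram operator $G$, given by
$$Gf = \sum_{m,n} \langle f, \psi_{m,n}\rangle\,\psi_{m,n}.$$
Regarding $D$, we have
$$D\psi_{m,n}(t) = \frac{1}{\sqrt{\sigma}}\,\psi_{m,n}\Bigl(\frac{t}{\sigma}\Bigr) = \frac{1}{\sigma^{(m+1)/2}}\,\psi\Bigl(\frac{t/\sigma}{\sigma^m} - n\beta\Bigr) = \psi_{m+1,n}(t)$$
and consequently
$$DGf = \sum_{m,n} \langle f, \psi_{m,n}\rangle\,D\psi_{m,n} = \sum_{m,n} \langle f, \psi_{m,n}\rangle\,\psi_{m+1,n} = \sum_{m,n} \langle f, D^{-1}\psi_{m+1,n}\rangle\,\psi_{m+1,n} = \sum_{m,n} \langle Df, \psi_{m+1,n}\rangle\,\psi_{m+1,n} = \sum_{m,n} \langle Df, \psi_{m,n}\rangle\,\psi_{m,n} = GDf. \tag{7}$$
Obviously in this case $G^{-1}$ commutes with $D$ as well, and we obtain
$$\tilde\psi_{m,n} = G^{-1}D^m\psi_{0,n} = D^m G^{-1}\psi_{0,n} = D^m\tilde\psi_{0,n},$$
that is to say,
$$\tilde\psi_{m,n}(t) = \frac{1}{\sigma^{m/2}}\,\tilde\psi_{0,n}\Bigl(\frac{t}{\sigma^m}\Bigr).$$
Unfortunately $G$ and $S$ do not commute, so that the above calculation (7) cannot (mutatis mutandis) be repeated. The reason is the following: The functions $S\psi_{m,n}$ appearing on the right hand side of the formula
$$SGf = \sum_{m,n} \langle f, \psi_{m,n}\rangle\,S\psi_{m,n}$$
cannot be identified with certain $\psi_{m',n'}$, as the $D\psi_{m,n}$ could; rather, they look like this:
$$S\psi_{m,n}(t) = \frac{1}{\sigma^{m/2}}\,\psi\Bigl(\frac{t - \beta}{\sigma^m} - n\beta\Bigr) = \frac{1}{\sigma^{m/2}}\,\psi\Bigl(\frac{t}{\sigma^m} - (n + \sigma^{-m})\,\beta\Bigr),$$
and in general the factor $n + \sigma^{-m}$ is not an integer. From this observation one has to draw the conclusion that the dual wavelet functions $\tilde\psi_{0,n}$, $n \in \mathbb{Z}$, are not related to each other in a simple way, so they have to be determined individually.
For the reasons described above, in most circumstances one is eager to choose a tight frame $\psi_\bullet$ right at the outset. The following theorem shows that such a choice is indeed possible: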
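(4.11) Assume that the Fourier transform $\hat\psi$ of the mother wavelet $\psi$ has compact support in the interval $I := [\omega, \omega']$, $\omega' > \omega > 0$, and that
$$\sum_{m=-\infty}^{\infty} |\hat\psi(\sigma^m\xi)|^2 \equiv A' > 0 \qquad (\xi > 0).$$
Then the collection $\psi_\bullet = (\psi_{m,n} \mid (m, n) \in \mathbb{Z}^2)$ belonging to the zoom step $\sigma$ and arbitrary base step
$$\beta \le \frac{2\pi}{\omega' - \omega}$$
is a tight frame for real-valued time signals $f \in L^2$.
⌈ Without restriction of generality we may assume
$$\beta := \frac{2\pi}{\omega' - \omega}. \tag{8}$$
On account of Parseval's formula (2.11) and rule 3.1.(8) one has
$$\|Tf\|^2 = \sum_{m,n} |\langle f, \psi_{m,n}\rangle|^2 = \sum_{m,n} \sigma^m\,\Bigl|\int \hat f(\xi)\,e^{in\sigma^m\beta\xi}\,\overline{\hat\psi(\sigma^m\xi)}\,d\xi\Bigr|^2.$$
Introducing the auxiliary function
$$g(\xi) := \hat f(\xi)\,\overline{\hat\psi(\sigma^m\xi)}$$
we can write $\|Tf\|^2$ in the form
$$\|Tf\|^2 = \sum_{m,n} \sigma^m\,\Bigl|\int g(\xi)\,e^{in\sigma^m\beta\xi}\,d\xi\Bigr|^2 = \sum_{m,n} \sigma^m\,|Q_{mn}|^2,$$
the $Q_{mn}$ being given by
$$Q_{mn} := \int_{\sigma^{-m}I} g(\xi)\,e^{in\sigma^m\beta\xi}\,d\xi$$
(note that the function $g$ is identically zero outside the interval $\sigma^{-m}I$). The functions
$$\xi \mapsto e^{in\sigma^m\beta\xi} \qquad (n \in \mathbb{Z})$$
are the trigonometrical basis functions for an interval of length
$$\frac{2\pi}{\sigma^m\beta} = \sigma^{-m}(\omega' - \omega),$$
in particular for the interval $\sigma^{-m}I$. This indicates that the $Q_{mn}$ are essentially Fourier coefficients; in fact the formulas (2.8) give
$$Q_{mn} = \sigma^{-m}\,\frac{2\pi}{\beta}\,\hat g(-n),$$
and summing over $n$ ($m$ is fixed) gives
$$\sum_n |Q_{mn}|^2 = \sigma^{-2m}\,\frac{4\pi^2}{\beta^2} \sum_n |\hat g(-n)|^2 \underset{\uparrow}{=} \sigma^{-m}\,\frac{2\pi}{\beta} \int_{\sigma^{-m}I} |g(\xi)|^2\,d\xi = \sigma^{-m}\,\frac{2\pi}{\beta} \int |\hat f(\xi)|^2\,|\hat\psi(\sigma^m\xi)|^2\,d\xi.$$
At the up arrow $\uparrow$ we have used Parseval's formula for period length $\sigma^{-m}\,2\pi/\beta$, as quoted in (2.8). In this way we finally obtain
$$\|Tf\|^2 = \frac{2\pi}{\beta} \int_0^\infty |\hat f(\xi)|^2 \sum_m |\hat\psi(\sigma^m\xi)|^2\,d\xi = \frac{2\pi A'}{\beta} \int_0^\infty |\hat f(\xi)|^2\,d\xi = \frac{\pi A'}{\beta}\,\|f\|^2.$$
It is only at the very end that we have used that $f$ should be real-valued. In this case the identity $|\hat f(-\xi)| \equiv |\hat f(\xi)|$ holds. ⌋
We now are confronted with the task of producing a mother wavelet $\psi$ that satisfies the assumptions of Theorem (4.11). Since these assumptions refer to the Fourier transform $\hat\psi$, it is natural to start with $\hat\psi$. In the following example, constructed by Daubechies-Grossmann-Meyer, a suitable $\hat\psi$ is given in terms of simple formulas; the actual wavelet $\psi$ in the time domain then has to be computed numerically. Now, this Fourier inversion concerns a single function and may be performed once and for all, preceding the wavelet analysis of any time signal $f$.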
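① We shall need the auxiliary function
$$\nu(x) := \begin{cases} 0 & (x \le 0) \\ 10x^3 - 15x^4 + 6x^5 & (0 \le x \le 1) \\ 1 & (x \ge 1) \end{cases} \tag{9}$$
(or some other function with similar properties). In the interval $0 \le x \le 1$ this function can be written as
$$\nu(x) = 30 \int_0^x t^2(1-t)^2\,dt.$$
[Figure 4.6: axes $t$, $x$ and $y$; the point $1/2$ is marked on the horizontal axis.]
Looking at the integrand on the right hand side (see Figure 4.6), we see that it has a double zero both at $t = 0$ and at $t = 1$, is otherwise positive, and is symmetrical with respect to the point $t = \frac12$. It follows that $\nu(x)$ increases monotonically from 0 to 1 in the interval $0 \le x \le 1$, with $C^2$-transitions at the points $x = 0$ and $x = 1$; moreover, the mentioned symmetry implies the identity
$$\nu(1 - x) = 1 - \nu(x) \qquad \forall x \in \mathbb{R}, \tag{10}$$
which is going to play a certain role later on.
Let $\sigma > 1$ and $\beta > 0$ be given, and set
$$\omega := \frac{2\pi}{(\sigma^2 - 1)\,\beta}, \qquad \omega' := \sigma^2\omega;$$
in this way (8) is fulfilled. We now define $\hat\psi$ having support $I := [\omega, \omega']$ by the formula
$$\hat\psi(\xi) := \begin{cases} \sqrt{A'}\,\sin\Bigl(\dfrac{\pi}{2}\,\nu\Bigl(\dfrac{\xi - \omega}{\sigma\omega - \omega}\Bigr)\Bigr) & (\omega \le \xi \le \sigma\omega) \\[8pt] \sqrt{A'}\,\cos\Bigl(\dfrac{\pi}{2}\,\nu\Bigl(\dfrac{\xi - \sigma\omega}{\sigma^2\omega - \sigma\omega}\Bigr)\Bigr) & (\sigma\omega \le \xi \le \sigma^2\omega) \\[8pt] 0 & (\text{otherwise}) \end{cases} \tag{11}$$
(see Figure 4.7). The constant $A'$ appearing here is determined by the condition $\|\psi\| = 1$.
[Figure 4.7: graph of $\hat\psi$ for $\sigma = 2$, $\beta = 1$; the points $2\pi/3$ and $4\pi/3$ are marked on the horizontal axis.]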
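As we remarked earlier, the function
$$\Psi(\xi) := \sum_m |\hat\psi(\sigma^m\xi)|^2$$
is invariant with respect to the transformation $\xi \mapsto \sigma\xi$. If we restrict our attention to the $\xi$-interval $[\omega, \sigma\omega]$, then we see that only the two terms corresponding to $m = 0$ and $m = 1$ contribute anything to $\Psi$ at all. Therefore we have
$$\Psi(\xi) = |\hat\psi(\xi)|^2 + |\hat\psi(\sigma\xi)|^2 = A'\,\Bigl(\sin^2\bigl(\tfrac{\pi}{2}\,\nu(x)\bigr) + \cos^2\bigl(\tfrac{\pi}{2}\,\nu(x)\bigr)\Bigr) = A',$$
where we have used the abbreviation
$$\frac{\xi - \omega}{\sigma\omega - \omega} =: x.$$
So much for $\hat\psi$. The (complex-valued) wavelet $\psi$ having the given $\hat\psi$ as its Fourier transform is shown in Figure 4.8; one observes that $\operatorname{Re}(\psi)$ is an even, $\operatorname{Im}(\psi)$ an odd function. We shall come back to this example in Section 5.3.
○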
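The two facts used in this example, the symmetry $\nu(1-x) = 1 - \nu(x)$ and the constancy of $\sum_m |\hat\psi(\sigma^m\xi)|^2$ on $\xi > 0$, are easy to test numerically. The sketch below implements the polynomial $10x^3 - 15x^4 + 6x^5$ and the two-branch definition of $\hat\psi$ from the example; the normalization `a_prime = 1.0` is an assumption made here (in the text $A'$ is fixed by $\|\psi\| = 1$), and the function names and the truncation of the sum over $m$ are choices of this illustration.

```python
import math


def nu(x):
    """Auxiliary function: 0 for x <= 0, 1 for x >= 1, polynomial in between."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 10 * x**3 - 15 * x**4 + 6 * x**5


def psi_hat(xi, sigma=2.0, beta=1.0, a_prime=1.0):
    """Fourier transform of the wavelet, supported in [omega, sigma^2 * omega]."""
    omega = 2 * math.pi / ((sigma**2 - 1) * beta)
    amp = math.sqrt(a_prime)
    if omega <= xi <= sigma * omega:
        return amp * math.sin(0.5 * math.pi * nu((xi - omega) / (sigma * omega - omega)))
    if sigma * omega < xi <= sigma**2 * omega:
        arg = (xi - sigma * omega) / (sigma**2 * omega - sigma * omega)
        return amp * math.cos(0.5 * math.pi * nu(arg))
    return 0.0


def dilation_sum(xi, sigma=2.0, beta=1.0, a_prime=1.0, m_range=range(-40, 41)):
    """Truncated sum over m of |psi_hat(sigma^m * xi)|^2."""
    return sum(psi_hat(sigma**m * xi, sigma, beta, a_prime) ** 2 for m in m_range)


samples = [0.01 * 1.37**k for k in range(30)]        # positive frequencies
sums = [dilation_sum(xi) for xi in samples]
print("min and max of the sum over m:", min(sums), max(sums))
print("sum at a negative frequency:", dilation_sum(-1.0))
```

For negative $\xi$ the sum vanishes, which reflects the restriction to real-valued signals in Theorem (4.11).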
[Figure 4.8: Daubechies-Grossmann-Meyer wavelet (step sizes $\sigma = 2$, $\beta = 1$); curve label $y = \operatorname{Im}(\psi(t))$.]
4.4 Proof of theorem (4.10)
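The following proof is essentially taken from [D], Section 3.3.2.
We are confronted with the task of estimating the sum on the right hand side of 4.3.(5) as accurately as possible. To this end, we begin with 3.1.(9):
$$\mathcal{W}f(a, b) = |a|^{1/2} \int \hat f(\xi)\,\overline{\hat\psi(a\xi)}\,e^{ib\xi}\,d\xi.$$
Introducing the auxiliary function
$$g(\xi) := \hat f(\xi)\,\overline{\hat\psi(a\xi)}, \tag{1}$$
we can write
$$\mathcal{W}f(a, nb) = |a|^{1/2} \int g(\xi)\,e^{inb\xi}\,d\xi = |a|^{1/2} \int_0^{2\pi/b} \sum_l g\Bigl(\xi + l\,\frac{2\pi}{b}\Bigr)\,e^{inb\xi}\,d\xi, \tag{2}$$
where we have tacitly assumed $b \ne 0$. The function
$$G(\xi) := \sum_l g\Bigl(\xi + l\,\frac{2\pi}{b}\Bigr)$$
is periodic and of period $\frac{2\pi}{b}$. Because of the formulas (2.8) we therefore can interpret (2) as
$$\mathcal{W}f(a, nb) = |a|^{1/2}\,\frac{2\pi}{b}\,\hat G(-n).$$
Taking the sum with respect to $n$ we obtain
$$\sum_n |\mathcal{W}f(a, nb)|^2 = |a|\,\frac{4\pi^2}{b^2} \sum_n |\hat G(-n)|^2 = \frac{2\pi\,|a|}{b} \int_0^{2\pi/b} |G(\xi)|^2\,d\xi, \tag{3}$$
where at the end we used Parseval's formula for period length $\frac{2\pi}{b}$, see (2.8).
We now take a closer look at the last integral:
$$\int_0^{2\pi/b} |G(\xi)|^2\,d\xi = \int_0^{2\pi/b} \sum_l g\Bigl(\xi + l\,\frac{2\pi}{b}\Bigr) \sum_j \overline{g\Bigl(\xi + j\,\frac{2\pi}{b}\Bigr)}\,d\xi.$$
Substituting $\xi + l\,\frac{2\pi}{b} =: \xi'$, we can continue with
$$\ldots = \sum_k \int g(\xi')\,\overline{g\Bigl(\xi' + k\,\frac{2\pi}{b}\Bigr)}\,d\xi'.$$
The last expression is now inserted into (3), leading to the following intermediate result:
$$\sum_n |\mathcal{W}f(a, nb)|^2 = \frac{2\pi\,|a|}{b} \sum_k \int g(\xi)\,\overline{g\Bigl(\xi + k\,\frac{2\pi}{b}\Bigr)}\,d\xi.$$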
n k
116
4 Frames
Here we set
(mE Z)
and sum over m as well, so that we finally obtain
IITfll2 = 2:jWf(a
m
,na
m
,8)j2 = ~ 2:Qkm.
m,n k,m
(4)
When the Qkm appearing on the right are unpacked using the definition (1)
of g, they look as follows:
It will turn out that the terms with k = 0 in (4) account for the lion's share
of II T f 112. For this reason we collect all terms Q km belonging to k =1= 0 into a
single remainder term Q and write (4) in the form
We now have to play the dominant part and the remainder term against each
other. In order to bring the main line of reasoning to a close, we formulate
the following lemma:
(4.12) Let'lj; be an admissible wavelet with parameters a, p, C and A'. Then
there is a constant B' such that
Vt;, E JR.,
m
and, more important, one has
(5)
with a constant C' that does not depend on ,8.
Using this lemma and, of course, Definition 4.3.(3) of the parameter A' we
arrive at the inequalities
4.4 Proof of theorem (4.10)
117
appearing in the statement of the theorem. This completes the proof of
(4.10), modulo the lemma. ~
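It remains to carry out the proof of Lemma (4.12).
⌈ In order to estimate the sum $\sum_m |\hat\psi(\sigma^m\xi)|^2$ from above we have to treat the terms corresponding to $m < 0$ resp. to $m \ge 0$ separately, using the appropriate inequality concerning $\hat\psi$ in each of the two cases. In this way we obtain
as stated.
Now we come to (5), but this is a longer story. We regard $Q_{km}$ as a scalar product, using a suitable decomposition of the various factors appearing in the definition of $Q_{km}$. In this way we obtain, by Schwarz' inequality,
$$|Q_{km}| \le \Bigl(\int |\hat f(\xi)|^2\,|\hat\psi(\sigma^m\xi)|\,\Bigl|\hat\psi\Bigl(\sigma^m\xi + \frac{2k\pi}{\beta}\Bigr)\Bigr|\,d\xi\Bigr)^{1/2} \Bigl(\int \Bigl|\hat f\Bigl(\xi + \frac{2k\pi}{\sigma^m\beta}\Bigr)\Bigr|^2\,|\hat\psi(\sigma^m\xi)|\,\Bigl|\hat\psi\Bigl(\sigma^m\xi + \frac{2k\pi}{\beta}\Bigr)\Bigr|\,d\xi\Bigr)^{1/2}.$$
If we use the substitution $\xi + 2k\pi/(\sigma^m\beta) =: \xi'$ in the second factor, this formula transforms into
$$|Q_{km}| \le \Bigl(\int |\hat f(\xi)|^2\,|\hat\psi(\sigma^m\xi)|\,\Bigl|\hat\psi\Bigl(\sigma^m\xi + \frac{2k\pi}{\beta}\Bigr)\Bigr|\,d\xi\Bigr)^{1/2} \Bigl(\int |\hat f(\xi')|^2\,\Bigl|\hat\psi\Bigl(\sigma^m\xi' - \frac{2k\pi}{\beta}\Bigr)\Bigr|\,|\hat\psi(\sigma^m\xi')|\,d\xi'\Bigr)^{1/2}.$$
For the estimate (5) we now have to sum the $|Q_{km}|$ over all $k \ne 0$ and all $m$. For the inner sum (with respect to $m$) we use Schwarz' inequality in the form
$$\sum_m \sqrt{u_m}\,\sqrt{v_m} \le \Bigl(\sum_m u_m\Bigr)^{1/2} \Bigl(\sum_m v_m\Bigr)^{1/2},$$
leading to
$$|Q| \le \sum_{k \ne 0} \Bigl(\int |\hat f(\xi)|^2 \sum_m |\hat\psi(\sigma^m\xi)|\,\Bigl|\hat\psi\Bigl(\sigma^m\xi + \frac{2k\pi}{\beta}\Bigr)\Bigr|\,d\xi\Bigr)^{1/2} \Bigl(\int |\hat f(\xi)|^2 \sum_m |\hat\psi(\sigma^m\xi)|\,\Bigl|\hat\psi\Bigl(\sigma^m\xi - \frac{2k\pi}{\beta}\Bigr)\Bigr|\,d\xi\Bigr)^{1/2}. \tag{6}$$
In order to estimate the sums $\sum_m$ under the integral signs we introduce the auxiliary function
$$q(s) := \sup_\xi \sum_m |\hat\psi(\sigma^m\xi)|\,|\hat\psi(\sigma^m\xi + s)|,$$
where, as we have seen in similar cases before, it is enough to take the supremum over the set of $\xi$ with $1 \le |\xi| \le \sigma$. In terms of this function $q(\cdot)$ the inequality (6) takes the following form:
$$|Q| \le \|f\|^2 \sum_{k \ne 0} \sqrt{q(2k\pi/\beta)\,q(-2k\pi/\beta)}. \tag{7}$$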
In estimating q(.) we may assume (3 ::::; 1f from the outset; this has the conse-
quence that only values q(s) for lsi 2 need to be considered. As in the first
part of the lemma we have to treat the terms corresponding to m < 0 resp.
to m 0 separately. To this end we split q(.) into the two parts
so that in any case
(8)
We take up m < 0 first. The inequalities ::::; 0" and lsi 2 together imply
Therefore the assumptions on allow the estimate
4.4 Proof of theorem (4.10) 119
and taking the sum over all m < 0 one obtains
In the case $m\ge0$ we argue as follows: At least one of the two numbers $|\sigma^m\xi|$
and $|\sigma^m\xi+s|$ is $\ge|s|/2$ (note that $\xi$ and $s$ may be of different signs) and at
least one is $\ge|\sigma^m\xi|$. Both $|s|/2$ and $|\sigma^m\xi|$ are $\ge1$. Since $|\hat\psi(\xi)|\le C$ for all
$\xi$, these circumstances allow the following conclusion:
Taking the sum over all $m\ge0$, we see that $q_+(\cdot)$ can be estimated as follows:
Because of (8), we now have
($|s|\ge2$)
and consequently
($k\ne0$).
Inserting this into (7) and performing the summation over all $k\ne0$, we finally
obtain the stated estimate for $Q$:
It is easy to verify that the introduced constants $C_1,\ldots,C_4$ and $A'$ do not
depend on $\beta$.
5 Multiresolution analysis
The triumphant progress wavelets have made in a great variety of applications
is based in the first place on the so-called "fast algorithms" (fast wavelet
transform, FWT), and these in turn owe their existence to a careful choice
of the mother wavelet $\psi$. So far in this book the particular mother wavelet
chosen only had to fulfill some "technical" conditions, such as $t^r\psi\in L^1$ or
$\psi\in C^r$ for some $r\ge0$ and, of course, $\hat\psi(0)=0$ or, even better, $\psi$ should be
of a certain order $N>1$.
The trigonometric basis functions $e_\alpha\colon t\mapsto e^{i\alpha t}$ are distinguished by the following
linear reproducing property: If such a function is subject to a translation
$T_h$, it simply picks up a constant factor:
$$T_he_\alpha(t) = e^{i\alpha(t-h)} = e^{-i\alpha h}\,e_\alpha(t)\,.$$
Contrary to this, in the realm of wavelets the operation of scaling is the central
theme, i.e., for arbitrary $a\in\mathbb R^*$ the operation
$$D_a\colon\quad D_af(t) := f(t/a)\,.$$
With respect to this operation, the wavelets considered so far did not behave
in a special way (except $\psi_{\rm Haar}$). OK, their graph became flattened out or
got compressed in the $t$-direction, depending on the value of $a$, but there was
no reproduction property in the sense that the scaled version of a $\psi$ could
be related to the original $\psi$ in some other way. In the discrete case only the
integer iterates of a single scaling operation $D_\sigma$, $\sigma>1$ denoting the zoom
step, enter the picture. From now until the end of the book we choose $\sigma:=2$;
by the way, this is also the value most commonly used in practice. If we now
adopt a mother wavelet that in a certain way "reproduces itself" when it is
subject to the scaling $D_2$, then novel and highly desirable effects develop.
That's what "multiresolution analysis" is all about.
To be more specific, things are arranged in such a way that the mother wavelet
$\psi$ satisfies a linear identity having the following structure:
$$D_2\psi(t) \equiv \sum_{k=0}^{n} c_k\,\psi(t-k)\,.$$
This identity carries in its wake analogous linear formulas between the scalar
products $\langle f,\psi_{n,k}\rangle$ and $\langle f,\psi_{n+1,k}\rangle$, so that these scalar products (called the
wavelet coefficients of $f$) need not be computed by tedious integrations over
and over again when going from one zoom level to the next one. The definitive
formulas will look somewhat different, but this is the general idea.
5.1 Axiomatic description
In Section 4.3 we discretized the continuous wavelet transform, and we showed
that under suitable assumptions a discrete, i.e., countable, set of "wavelet
measurements" $\bigl(Tf(m,n)\mid(m,n)\in\mathbb Z^2\bigr)$ is sufficient to allow the complete
reconstruction of $f$ in the $L^2$-sense or pointwise, etc., depending on the exact
circumstances. Multiresolution analysis is discrete to begin with, and
the wavelet functions $\psi_{j,k}$ being used form an orthonormal basis of $L^2$ by
construction, so it is not necessary to compute any $\tilde\psi_{j,k}$'s.
We now come to the formal definition. A multiresolution analysis, abbreviated
MRA, is constituted by the following ingredients (a)-(c).
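(a) A bilateral sequence $(V_j\mid j\in\mathbb Z)$ of closed subspaces of $L^2$. These $V_j$ are
ordered by inclusion,
$$\ldots\subset V_2\subset V_1\subset V_0\subset V_{-1}\subset\ldots\subset V_j\subset V_{j-1}\subset\ldots\subset L^2 \tag{1}$$
(smaller values of $j$ correspond to larger spaces $V_j$!), and one has
$$\bigcap_jV_j=\{0\}\qquad\text{(separation axiom)}, \tag{2}$$
$$\overline{\bigcup_jV_j}=L^2\qquad\text{(completeness axiom)}. \tag{3}$$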
The following intuitive description will be helpful later on: The time
signals $f\in V_j$ only comprise features (i.e., details) exhibiting a spread
of size $\ge2^j$ on the time axis. The more negative $j$ is, the finer are the
details that may occur in an $f\in V_j$, and "in the limit" every single $f\in L^2$
can be attained by functions $f_j\in V_j$.
(b) The $V_j$ are connected to each other by a rigid scaling property:
$$V_{j+1}=D_2(V_j)\qquad\forall j\in\mathbb Z. \tag{4}$$
Referring to time signals $f$ this can be expressed as follows:
$$f\in V_j\iff f(2^j\,\cdot)\in V_0\,. \tag{5}$$
(c) $V_0$ contains one basis vector per base step 1. To be precise, there is a
function $\phi\in L^2\cap L^1$ such that its translates $(\phi(\cdot-k)\mid k\in\mathbb Z)$ form an
orthonormal basis of $V_0$. This function $\phi$ is commonly called the scaling
function of the MRA under consideration; it is the determining element
of the whole setup.
Please note: Several authors number the $V_j$'s in the reverse direction compared
to (1). We stick to the ordering used in [D].
According to (c) above, the space $V_0$ can be described as a set of time signals
$f$ in the following way:
$$V_0=\Bigl\{f\in L^2\Bigm| f(t)=\sum_kc_k\,\phi(t-k),\ \sum_k|c_k|^2<\infty\Bigr\}. \tag{6}$$
Using $\phi$ as a template we now define the functions
$$\phi_{j,k}(t):=2^{-j/2}\,\phi\Bigl(\frac{t-k\cdot2^j}{2^j}\Bigr)=2^{-j/2}\,\phi\Bigl(\frac t{2^j}-k\Bigr)\qquad(j\in\mathbb Z,\ k\in\mathbb Z);$$
this being in obvious concordance with the formulas defining the wavelet functions
$\psi_{m,n}$. It then follows immediately from (b) that the family $(\phi_{j,k}\mid k\in\mathbb Z)$
is an orthonormal basis of $V_j$, two subsequent functions $\phi_{j,k}$ and $\phi_{j,k+1}$ now
being translated by the amount $2^j$ with respect to each other.
According to our remarks concerning point (a) above, one may interpret the
orthogonal projection $P_j$ of $L^2$ onto $V_j$ as a low-pass filter: The image $P_jf$
of a time signal $f\in L^2$ incorporates all features of $f$ whose horizontal spread
over the time axis is of size $2^j$ or larger. $P_j$ is given by the following formula:
$$P_jf=\sum_{k=-\infty}^{\infty}\langle f,\phi_{j,k}\rangle\,\phi_{j,k}\,. \tag{7}$$
① The simplest example of an MRA is obtained as follows: Choose $\phi:=1_{[0,1[}$ and set
$$V_0:=\{f\in L^2\mid f\text{ constant on intervals }[k,k+1[\,\},\qquad V_j:=D_{2^j}(V_0)\quad(j\ne0).$$
Then (b) and (c) are obviously fulfilled, and (1) is also guaranteed. The
separation axiom (2) holds trivially, and completeness (3) is an immediate
consequence of the fact that the step functions with jumps at the binary rationals
$k\,2^j$ are dense in $L^2$. If one applies the general constructions described
in Sections 5.1-5.3 to this example, one obtains the Haar wavelet. We shall
explore this in detail in subsequent examples.
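The projection onto $V_j$ in the Haar example can be tried out numerically. The following Python sketch is only an illustration under stated assumptions: the function names (`haar_phi`, `phi_jk`, `coefficient`, `project`) are made up for this purpose, the signal $f$ is assumed to be real-valued, and the scalar products $\langle f,\phi_{j,k}\rangle$ are approximated by a midpoint rule rather than computed exactly.

```python
import math

def haar_phi(t):
    """Indicator function of [0, 1[, the Haar scaling function."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def phi_jk(j, k, t):
    """phi_{j,k}(t) = 2^(-j/2) * phi(t / 2^j - k)."""
    return 2.0 ** (-j / 2) * haar_phi(t / 2.0 ** j - k)

def coefficient(f, j, k, n_quad=1000):
    """Approximate <f, phi_{j,k}> by the midpoint rule on [k 2^j, (k+1) 2^j[."""
    width = 2.0 ** j
    left = k * width
    step = width / n_quad
    total = 0.0
    for i in range(n_quad):
        t = left + (i + 0.5) * step
        total += f(t) * phi_jk(j, k, t)
    return total * step

def project(f, j, t):
    """(P_j f)(t); in the Haar case only one index k contributes at a given t."""
    k = math.floor(t / 2.0 ** j)
    return coefficient(f, j, k) * phi_jk(j, k, t)
```

For $f(t)=t$ and $j=-1$ the sketch returns the mean value of $f$ over the interval of length $\frac12$ containing $t$, as one expects from a projection onto functions that are constant on such intervals.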
Because of the inclusions (1) the $\phi_{j,k}$ cannot be brought together to form a
"big" orthonormal basis of $L^2$. For this reason we construct, besides the chain
of spaces $V_j$, a system $(W_j\mid j\in\mathbb Z)$ of pairwise orthogonal subspaces $W_j\subset L^2$
in the following way: $W_j$ is the space gained in the transition from $V_j$ to the
next larger space $V_{j-1}$ in the chain (1). By this intuitive description we of
course mean the following: $W_j$ is the orthogonal complement of $V_j$ in $V_{j-1}$.
Then one has
$$V_{j-1}=V_j\oplus W_j\,,\quad W_j\perp V_j\qquad\forall j\in\mathbb Z; \tag{8}$$
furthermore, everything is set up in such a way that the formulas analogous
to (4) and (5), namely
$$W_{j+1}=D_2(W_j)\quad\text{resp.}\quad f\in W_j\iff f(2^j\,\cdot)\in W_0\,, \tag{9}$$
hold likewise; their easy verification may safely be left to the reader.
Bearing the chain (1) and the definition (8) of the $W_j$ in mind, the following
proposition becomes plausible:
(5.1) If the system $(V_j\mid j\in\mathbb Z)$ possesses the properties (a) of an MRA, then
the corresponding subspaces $W_j$ are pairwise orthogonal, and furthermore
$$\bigoplus_jW_j=L^2\qquad\text{(orthogonal direct sum)}. \tag{10}$$
If $i>j$, then $W_i\subset V_{i-1}\subset V_j$, and using (8) one concludes that $W_i\perp W_j$.
For the proof of (10) we need completeness (3) as well as the separation
condition (2). We have to prove that $f\in L^2$ and
$$f\perp W_j\qquad\forall j\in\mathbb Z$$
together imply $f=0$.
Let an $\varepsilon>0$ be given. By (3) there is a $j_0$ and an $h_0\in V_{j_0}$ with $\|f-h_0\|<\varepsilon$;
for the sake of simplicity we may assume $j_0=0$. Such an $h_0\in V_0$ being chosen,
there is an $h_1\in V_1$ and a $g_1\in W_1$ with
$$h_0=h_1+g_1\,,$$
and similarly there are $h_2\in V_2$ and $g_2\in W_2$ such that
$$h_1=h_2+g_2\,.$$
Proceeding in this manner along the descending chain $V_0\supset V_1\supset V_2\supset\ldots$,
one arrives after $n$ steps at the representation
$$h_0=h_n+\sum_{k=1}^ng_k\,.$$
Since all vectors appearing on the right hand side of this equation are orthogonal
to each other, we have
$$\|h_n\|^2+\sum_{k=1}^n\|g_k\|^2=\|h_0\|^2\qquad\forall n.$$
This implies that the series $\sum_{k=1}^\infty\|g_k\|^2$ is convergent, whence the series
$\sum_{k=1}^\infty g_k$ converges in $L^2$, from which in turn we may conclude that the limit
$\lim_{n\to\infty}h_n=:h$ exists.
Consider a fixed $j\in\mathbb Z$. For all $n\ge j$ one has $h_n\in V_n\subset V_j$; and since the
$V_j$ are closed, we also have $h\in V_j$. This being true for all $j$ we conclude from
(2) that $h=0$. This implies
$$h_0=\sum_{k=1}^\infty g_k\,.$$
Now, by assumption, the function $f$ is orthogonal to all $g_k\in W_k$, whence we
have
$$\langle f,h_0\rangle=\sum_{k=1}^\infty\langle f,g_k\rangle=0\,.$$
This implies the inequality
$$\|f\|^2=\|f-h_0\|^2-\|h_0\|^2<\varepsilon^2$$
by the Pythagorean theorem; and since $\varepsilon$ was arbitrary, we come to the conclusion
that $f=0$.
Let $Q_j$ denote the orthogonal projection of $L^2$ onto $W_j$. From (8) we conclude
by general principles that
$$Q_j=P_{j-1}-P_j\quad\text{resp.}\quad P_{j-1}=P_j+Q_j\,.$$
A few moments ago we interpreted the projection $P_j$ as a low-pass filter.
Pursuing such ideas further we can now say the following: $P_{j-1}f$ incorporates
all features or details of the signal $f$ exhibiting a spread of size $\ge2^{j-1}$ on the
time axis, and in forming the difference $P_{j-1}f-P_jf=Q_jf$ one removes from
$P_{j-1}f$ all features with a time spread of size $\ge2^j$. In this way we can regard
$Q_j$ as a kind of filter that retains resp. sieves out of $f$ just those features or
details that have a time spread of size $\sim2^j/\sqrt2$. Or, to look at it another
way, one obtains the more detailed $P_{j-1}f$ by adjoining to $P_jf$, the latter
encompassing all features of $f$ with a time spread of size $2^j$ and greater, the
details of size $\sim2^j/\sqrt2$ stored in the vector $Q_jf$.
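In the Haar case the splitting of $P_{j-1}f$ into $P_jf$ and $Q_jf$ can be illustrated on sampled data. The sketch below is a toy example under explicit assumptions: a function in $V_{j-1}$ is represented by the list of its constant values on consecutive intervals of length $2^{j-1}$, the sample values are invented, and the names `coarsen`, `detail` and `inner` are illustrative only.

```python
def coarsen(samples):
    """Haar low-pass step: replace each pair of neighbouring values by their mean."""
    out = []
    for i in range(0, len(samples), 2):
        mean = 0.5 * (samples[i] + samples[i + 1])
        out += [mean, mean]
    return out

def detail(samples):
    """Detail part: finer projection minus coarser projection."""
    return [s - c for s, c in zip(samples, coarsen(samples))]

def inner(u, v):
    """Scalar product of two step functions, up to a fixed normalizing factor."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

fine = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]   # stands for P_{j-1} f
low = coarsen(fine)                                # stands for P_j f
high = detail(fine)                                # stands for Q_j f
```

Here `low` and `high` are orthogonal with respect to `inner`, and the squared norms of `low` and `high` add up to that of `fine`, in line with the orthogonal decomposition described above.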
Looking at the orthogonal decomposition
$$V_{-1}=V_0\oplus W_0$$
we can put forward the following naive miscalculation: To fix up the space $V_{-1}$
we need two basis vectors per unit length, and from $V_0$ we already have one
basis vector per unit length at our disposal. As a consequence the space $W_0$
should get by with one basis vector per unit length as well; furthermore, on
account of symmetry reasons it should be possible to arrange matters in such
a way that the basis vectors of $W_0$ are integer translates of a single function
$\psi$, in the same way as the basis vectors of $V_0$ are integer translates of a single
$\phi$. In other words, there is some hope that we can find a function $\psi\in L^2$ such
that the collection $(\psi(\cdot-k)\mid k\in\mathbb Z)$ is an orthonormal basis of $W_0$.
Such a $\psi$ would then be our mother wavelet. If one subsequently sets
$$\psi_{j,k}(t):=2^{-j/2}\,\psi\Bigl(\frac{t-k\cdot2^j}{2^j}\Bigr)\qquad(j\in\mathbb Z,\ k\in\mathbb Z),$$
as agreed upon in the beginning of Section 4.3, then the family
$$(\psi_{j,k}\mid k\in\mathbb Z)$$
($j$ is fixed here) is an orthonormal basis of $W_j$, and the orthogonal projection
$Q_j\colon L^2\to W_j$ is given by the following formula:
$$Q_jf=\sum_{k=-\infty}^\infty\langle f,\psi_{j,k}\rangle\,\psi_{j,k}\,.$$
The totality of all $\psi_{j,k}$, i.e., the family
$$(\psi_{j,k}\mid j\in\mathbb Z,\ k\in\mathbb Z),$$
would then be an orthonormal wavelet basis of all of $L^2$ by Proposition (5.1).
The following sections are devoted to the realization of this dream. In the
particularly simple case of Example ① the above naive miscalculation is actually
correct, because the supports of the functions $\phi_{0,k}$ don't overlap; and
it is easy to see that $\psi:=\psi_{\rm Haar}$ accomplishes what we have been asking for.
5.2 The scaling function
The scaling function $\phi$ is the alpha and the omega of any multiresolution
analysis. When a $\phi$ has been chosen, the space $V_0$ is determined by 5.1.(6),
the remaining $V_j$ are given by 5.1.(5), and the $W_j$ are characterized by 5.1.(8).
When choosing $\phi=\phi_{0,0}\in L^2\cap L^1$ three kinds of conditions have to be met.
First, the $\phi_{0,k}$, $k\in\mathbb Z$, should be orthonormal. If the $\phi_{0,k}$ arising from a
given function $\phi$ are not orthonormal, maybe such a state of affairs could be
brought about by means of a Gram-Schmidt process. In the next section we
shall meet an "orthogonalization trick" ([D]) that turns the collection $\phi_{0,\cdot}$ in
one single stroke into an orthonormal system whose individual members are
still related to each other by integer translations on the time axis.
Second, we have to make sure that conditions 5.1.(2) (separation) and 5.1.(3)
(completeness) are met. These issues will be dealt with at the end of this
section. For the time being we quote the following result of our analysis
(Theorem (5.8)): The modest normalization condition
$$\Bigl|\int\phi(x)\,dx\Bigr|=1\quad\text{resp.}\quad|\hat\phi(0)|=\frac1{\sqrt{2\pi}} \tag{1}$$
is necessary and sufficient for 5.1.(2) and 5.1.(3).
The third and last condition that we have to take into account is maybe less
obvious than the first two, but it is the most crucial of them all: We have to
make sure that the inclusions 5.1.(1) are guaranteed. The verification of the
following lemma is left to the reader:
(5.2) Assume that a $\phi\in L^2$ has been chosen, $\phi\ne0$. Define $V_0$ by 5.1.(6)
and the remaining $V_j$ by 5.1.(5). If in this situation the inclusion $V_0\subset V_{-1}$ is
true, then all inclusions 5.1.(1) hold.
This brings us to the essential point, that is to say, to the precise property
of a scaling function $\phi$ that makes it feasible for a multiresolution analysis in
the first place.
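(5.3) For the inclusion $V_0\subset V_{-1}$ it is necessary and sufficient that an identity
of the form
$$\phi(t)\equiv\sqrt2\sum_{k=-\infty}^\infty h_k\,\phi(2t-k)\qquad(\text{almost all }t\in\mathbb R) \tag{2}$$
is valid with a coefficient vector $h_\cdot\in l^2(\mathbb Z)$.
The relations 5.1.(5) and 5.1.(6) imply
$$V_{-1}=\Bigl\{f\in L^2\Bigm|f(t)=\sum_kc_k\,\phi(2t-k),\ \sum_k|c_k|^2<\infty\Bigr\},$$
so in order for $\phi\in V_0\subset V_{-1}$ to hold, condition (2) is necessary.
Conversely, the identity (2) implies for arbitrary $l\in\mathbb Z$ the identity
$$\phi(t-l)=\sqrt2\sum_{k=-\infty}^\infty h_k\,\phi\bigl(2t-(k+2l)\bigr)\qquad(\text{almost all }t\in\mathbb R),$$
and as a consequence one has
$$\phi_{0,l}=\sum_{k=-\infty}^\infty h_k\,\phi_{-1,k+2l}\in V_{-1}\qquad\forall l\in\mathbb Z.$$
Under such circumstances it is clear that arbitrary linear combinations of the
$\phi_{0,l}$ are lying in $V_{-1}$ as well, thus $V_0\subset V_{-1}$ is proven.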
The identity (2) goes by the name of the scaling equation; as we have said,
it controls the entire multiresolution analysis. As a matter of fact, we shall
see in Theorem (6.1) resp. 6.1.(2) that the coefficient vector $h_\cdot$ determines
the scaling function $\phi$ uniquely. The coefficients $h_k$ also appear in the corresponding
algorithms; in fact, they determine more or less everything. When
doing numerical computations, one does not need the scaling function nor
the corresponding mother wavelet (that we shall construct in due course) at
one's constant disposal. This is in marked contrast to Fourier analysis, where
one has to compute function values $e^{i\xi}$ time and again.
The scaling equation describes a kind of "self-similarity". It can be compared
with the equation
$$K=\bigcup_{i=1}^rf_i(K)$$
appearing in the theory of fractal sets, resp. of iterated function systems, to
be exact. The $f_i$ in this latter equation are contracting similarities of the
euclidean plane; in the same vein the maps $\tau\mapsto t:=\frac12(\tau+k)$, playing a
key role in wavelet theory, are contracting similarities of the real axis. That
the scaling function $\phi$ should have the reproducing property (2) is obviously
a very strong restriction on the possible choices for such a function.
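The $h_k$ cannot be chosen arbitrarily, either. Indeed, we have to make sure
that the $\phi_{0,k}$ form an orthonormal basis of $V_0$. Since the scalar product in $L^2$
is translation invariant, the equations
$$\langle\phi_{0,n},\phi_{0,0}\rangle=\delta_{0n}\qquad\forall n\in\mathbb Z$$
are necessary and sufficient for that. In conjunction with (2) this leads to
$$\begin{aligned}\delta_{0n}&=\int\phi(t-n)\,\overline{\phi(t)}\,dt=2\sum_{k,l}h_k\overline{h_l}\int\phi(2t-2n-k)\,\overline{\phi(2t-l)}\,dt\\&=\sum_{k,l}h_k\overline{h_l}\int\phi(t'-2n-k)\,\overline{\phi(t'-l)}\,dt'=\sum_{k,l}h_k\overline{h_l}\,\delta_{2n+k,l}=\sum_kh_k\overline{h_{k+2n}}\,.\end{aligned}$$
We see that in order for the $\phi_{0,k}$ to be orthonormal it is necessary that the
$h_k$ satisfy the so-called consistency relations
(5.4)
$$\sum_{k=-\infty}^\infty h_k\,\overline{h_{k+2n}}=\delta_{0n}\qquad\forall n\in\mathbb Z;$$
in particular, one must have $\sum_k|h_k|^2=1$.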
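The consistency relations (5.4) are easy to test for a concrete finite coefficient vector. The Python sketch below is a hedged illustration: the helper name `consistency` and the dictionary representation of $h_\cdot$ are assumptions made here, and the four-tap filter `D4` is the well-known Daubechies example quoted from the general literature, not from this section.

```python
import math

def consistency(h, n):
    """sum_k h_k * conj(h_{k+2n}) for a finitely supported filter h (dict k -> h_k)."""
    return sum(hk * complex(h.get(k + 2 * n, 0.0)).conjugate() for k, hk in h.items())

S2, S3 = math.sqrt(2.0), math.sqrt(3.0)

# Haar coefficients h_0 = h_1 = 1/sqrt(2).
HAAR = {0: 1 / S2, 1: 1 / S2}

# Four-tap Daubechies coefficients (assumption: taken from the standard literature).
D4 = {0: (1 + S3) / (4 * S2), 1: (3 + S3) / (4 * S2),
      2: (3 - S3) / (4 * S2), 3: (1 - S3) / (4 * S2)}
```

For both filters `consistency(h, 0)` equals 1 and `consistency(h, n)` vanishes for $n\ne0$, up to rounding.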
While we are at it, we are going to prove a certain linear relation among the
$h_k$; the condition $q\ne0$ appearing therein is of no importance because of (1).
(5.5) Suppose that $h_\cdot\in l^1(\mathbb Z)$ and that $\int\phi(t)\,dt=:q\ne0$. Then
$$\sum_kh_k=\sqrt2\,.$$
Integrating the scaling equation (2) from $-N$ to $N$ with respect to $t$ gives
$$\int_{-N}^N\phi(t)\,dt=\sqrt2\sum_kh_k\int_{-N}^N\phi(2t-k)\,dt=\frac1{\sqrt2}\sum_kh_k\int_{-2N-k}^{2N-k}\phi(t')\,dt'\,. \tag{3}$$
Since
$$\Bigl|\int_{-2N-k}^{2N-k}\phi(t')\,dt'\Bigr|\le\|\phi\|_1\qquad\forall k\in\mathbb Z,$$
we can apply the theorem of Lebesgue to the sum on the right hand side of
(3). Letting $N\to\infty$ in (3) we obtain
$$q=\frac1{\sqrt2}\sum_kh_k\,q\,,$$
from which the theorem follows.
But we should be careful: Even if we have a coefficient vector $h_\cdot\in l^2(\mathbb Z)$ that
satisfies the relations (5.4) and (5.5), we can by no means be sure that there
exists a usable function $\phi$ fulfilling the scaling equation (2).
Let us assume for the moment that a multiresolution analysis according to
(a)-(c) above is given to us. If we write (2) in the form
$$\phi=\sum_kh_k\,\phi_{-1,k}\,,$$
then we see that according to general principles about orthonormal bases one
has the formula
$$h_k=\langle\phi,\phi_{-1,k}\rangle\qquad(k\in\mathbb Z). \tag{4}$$
The scalar product $\langle\phi,\phi_{-1,k}\rangle$ can only be $\ne0$ if the supports of $\phi$ and of
$\phi_{-1,k}$ overlap. Thus formula (4) allows us to conclude the following:
(5.6) If the scaling function $\phi$ has compact support, then only finitely many
$h_k$ are different from 0.
But one can say even more. To this end, for arbitrary functions $f\colon\mathbb R\to\mathbb C$ we
define the quantities
$$a(f):=\inf\{x\mid f(x)\ne0\}\ge-\infty\,,\qquad b(f):=\sup\{x\mid f(x)\ne0\}\le\infty\,.$$
Thus $a(f)$ and $b(f)$ are respectively the "left end" and the "right end" of the
support of $f$. In the following theorem we assume for simplicity that $\phi$ is a
bona fide function, not a mere $L^2$-object.
(5.7) If the scaling function $\phi$ has compact support, then the quantities $a:=a(\phi)$
and $b:=b(\phi)$ are integers, and at most the $h_k$ with $a\le k\le b$ are
different from 0.
One has
$$a(\phi_{-1,k})=\tfrac12\bigl(a(\phi)+k\bigr)\,,\qquad b(\phi_{-1,k})=\tfrac12\bigl(b(\phi)+k\bigr)\,.$$
On account of (5.6), the integers
$$k_{\min}:=\min\{k\mid h_k\ne0\}\,,\qquad k_{\max}:=\max\{k\mid h_k\ne0\}$$
are well defined. Considering the right hand side of the identity (2) as a
superposition of congruent graphs, translated with respect to each other by
steps of $\frac12$, and taking the $a(\cdot)$ and the $b(\cdot)$ on both sides we see that the
following is true:
$$a(\phi)=\tfrac12\bigl(a(\phi)+k_{\min}\bigr)\,,\qquad b(\phi)=\tfrac12\bigl(b(\phi)+k_{\max}\bigr)\,.$$
The last two equations give at once $k_{\min}=a$, $k_{\max}=b$; in particular, one
has $h_ah_b\ne0$ as a bonus.
Taking into account that only the $h_k$ are going to play a role in the numerical
algorithms, the last two propositions make it obvious that constructing scaling
functions with compact support is not a mere academic exercise. But we still
have a long way to go until we are there.
① Because of $1_{[0,1[}=1_{[0,\frac12[}+1_{[\frac12,1[}$ the scaling function
$$\phi:=\phi_{\rm Haar}:=1_{[0,1[}$$
considered in Example 5.1.① satisfies the scaling identity
$$\phi(t)\equiv\phi(2t)+\phi(2t-1)\quad\text{resp.}\quad\phi=\frac1{\sqrt2}\,\phi_{-1,0}+\frac1{\sqrt2}\,\phi_{-1,1}$$
(see Figure 5.1). Thus in the case at hand we have
$$h_0=h_1=\frac1{\sqrt2}\,,\qquad h_k=0\quad\forall k\in\mathbb Z\setminus\{0,1\}. \tag{5}$$
It is easily verified that the statements (5.4), (5.5) and (5.7) are confirmed
by this example.
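The scaling identity of the Haar function can also be checked pointwise on a grid. The following Python lines are a small sketch, not part of the theory: the names `phi`, `H_HAAR`, `rhs_scaling`, `grid` and `max_defect` are chosen here for illustration, and the grid of dyadic points in $[-1,2]$ is an arbitrary choice.

```python
import math

def phi(t):
    """Haar scaling function: indicator of [0, 1[."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

# Coefficients h_0 = h_1 = 1/sqrt(2) of the Haar scaling function.
H_HAAR = {0: 1 / math.sqrt(2.0), 1: 1 / math.sqrt(2.0)}

def rhs_scaling(t, h=H_HAAR):
    """Right hand side of the scaling equation: sqrt(2) * sum_k h_k * phi(2t - k)."""
    return math.sqrt(2.0) * sum(hk * phi(2.0 * t - k) for k, hk in h.items())

# Dyadic sample points in [-1, 2]; the largest deviation should be at rounding level.
grid = [i / 64.0 - 1.0 for i in range(193)]
max_defect = max(abs(phi(t) - rhs_scaling(t)) for t in grid)
```

The smallest and largest indices of nonzero coefficients, 0 and 1, coincide with the end points of the support $[0,1]$ of $\phi$.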
[Figure 5.1: graph of $\phi_{\rm Haar}$]
To conclude this section, we take up a problem that we have postponed so
far: We have to formulate precise assumptions on the scaling function $\phi$ that
guarantee separation 5.1.(2) and completeness 5.1.(3) of the resulting family
$(V_j\mid j\in\mathbb Z)$. The following theorem shows that under very mild technical
assumptions on $\phi$, condition (1), listed at the beginning of this section, is
indeed the only condition for these axioms to hold.
(5.8) Assume that the scaling function $\phi\in L^2$ satisfies an estimate of the
form
$$|\phi(t)|\le\frac C{1+t^2}\qquad(t\in\mathbb R) \tag{6}$$
and that the family $(\phi_{0,k}\mid k\in\mathbb Z)$ is an orthonormal basis of $V_0$. Then, first,
one has separation:
$$\bigcap_jV_j=\{0\}; \tag{7}$$
and second, if and only if the integral $\int\phi(t)\,dt=:q$ has absolute value 1, one
also has $\overline{\bigcup_jV_j}=L^2$, i.e., completeness.
Any $f\in V_0$ has a representation of the form $f=\sum_kf_k\,\phi_{0,k}$ with
$\sum_k|f_k|^2=\|f\|^2<\infty$. Because of (6) we have the further estimate
$$\sum_k|\phi(t-k)|^2\le C^2\qquad\forall t\in\mathbb R$$
(with another $C$), and this implies, by Schwarz' inequality, that
$$|f(t)|\le\sum_k|f_k|\,|\phi(t-k)|\le C\,\|f\|\qquad(\text{almost all }t\in\mathbb R).$$
Since $f\in V_0$ was arbitrary, we therefore can say that
$$\|f\|_\infty:=\operatorname*{ess\,sup}_{t\in\mathbb R}|f(t)|\le C\,\|f\|\qquad\forall f\in V_0\,.$$
For a given $g\in V_j$ the function $f:=g(2^j\,\cdot)$ is in $V_0$, whence we can say the
following:
$$\|g\|_\infty=\|f\|_\infty\le C\,\|f\|=C\,2^{-j/2}\,\|g\|\,.$$
Now, if such a $g$ belongs to all $V_j$ ($j>0$) simultaneously, then this is possible
only if $\|g\|_\infty=0$, whence $g=0$. This proves (7).
The space $V:=\overline{\bigcup_jV_j}$ is invariant with respect to the translations $T_k$ ($k\in\mathbb Z$)
and the dilations $D_{2^j}$ ($j\in\mathbb Z$); on the other hand, the step functions with
jumps at the binary rationals $k\cdot2^j$ are dense in $L^2$. To prove the second
statement it is therefore enough to prove the following:
The function $f:=1_{[-1,1[}$ belongs to $V$, if and only if $|q|=1$.
The relation $f\in V$ can be expressed as follows: The function $f$ is arbitrarily
well approximated in the $L^2$-sense by its projections $P_{-j}f$ when $j\to\infty$, i.e.,
$$\lim_{j\to\infty}\|f-P_{-j}f\|=0\,.$$
By general principles this is equivalent with
$$\lim_{j\to\infty}\|P_{-j}f\|^2=\|f\|^2=2\,. \tag{8}$$
Keep $j>0$ fixed for the moment. By 5.1.(7) we have
$$P_{-j}f=\sum_kc_k\,\phi_{-j,k}\,,\qquad c_k:=\langle f,\phi_{-j,k}\rangle\,,$$
and consequently
$$\|P_{-j}f\|^2=\sum_k|c_k|^2\,.$$
The $c_k$ can be computed as follows:
$$c_k=\int_{-1}^1\overline{\phi_{-j,k}(t)}\,dt=2^{j/2}\int_{-1}^1\overline{\phi(2^jt-k)}\,dt=2^{-j/2}\int_{-N-k}^{N-k}\overline{\phi(t')}\,dt'\,, \tag{9}$$
where we have written $2^j=:N$ as an abbreviation.
In the following, the letter $C$ denotes various positive constants that may
depend on the chosen scaling function $\phi$, but not on $j$ (resp. $N$) and $k$, and
the letter $\Theta$ denotes various complex numbers of absolute value $\le1$.
From (6) we deduce for arbitrary $a>0$ the estimate
$$\int_a^\infty|\phi(t)|\,dt\le C\int_a^\infty\frac1{t^2}\,dt=\frac Ca\,. \tag{10}$$
In order to obtain additional maneuvering space in the subsequent convergence
discussion we now choose an $\varepsilon\in\,]0,1]$. It then follows that there is an $M\in\mathbb N$
with
$$\int_{|t|\ge M}|\phi(t)|\,dt\le\varepsilon\,. \tag{11}$$
We are now going to estimate the integral on the right hand side of (9). We
may assume from the outset that $N:=2^j\ge M$ and distinguish the following
three cases:
(a) If $|k|\le N-M$, then one has $-N-k\le-N+(N-M)=-M$ and
analogously $N-k\ge N-(N-M)=M$. Because of (11) we therefore
may conclude that
$$c_k=2^{-j/2}\,(\bar q+\Theta\varepsilon)\,,$$
and from this we easily obtain
$$|c_k|^2=2^{-j}\,(|q|^2+C\Theta\varepsilon)\,.$$
(b) If $N-M<|k|\le N+M$, then
$$\Bigl|\int_{-N-k}^{N-k}\phi(t)\,dt\Bigr|\le\int|\phi(t)|\,dt=C$$
implies the estimate $|c_k|\le2^{-j/2}\,C$.
(c) If $|k|>N+M$ and, e.g., $k>0$, then for the upper limit of the integral
in question, one has $N-k\le-M<0$. This implies in view of (10) that
the corresponding $c_k$ can be estimated as follows:
Summing over all such $k$ one obtains
Taking into account the respective numbers of $k$'s in the two cases (a) and
(b) we arrive at the following representation of $\|P_{-j}f\|^2$:
$$\begin{aligned}\|P_{-j}f\|^2&=\sum_k|c_k|^2\\&=\bigl(2\,(2^j-M)+1\bigr)\,2^{-j}\,(|q|^2+C\Theta\varepsilon)+2^{-j}\,\Theta\Bigl(4MC+\frac CM\Bigr)\\&=(2|q|^2+C\Theta\varepsilon)+2^{-j}\,\Theta\Bigl(2M(|q|^2+C)+4MC+\frac CM\Bigr).\end{aligned}$$
Letting $j\to\infty$ we can draw the conclusion that
$$\lim_{j\to\infty}\|P_{-j}f\|^2=2|q|^2+C\Theta\varepsilon\,.$$
As $\varepsilon>0$ was arbitrary we see that (8) is valid if and only if $|q|=1$.
5.3 Constructions in the Fourier domain
Multiresolution analysis is "invariant" with respect to (a) integer translations
of the time axis and (b) dilations by powers of 2. In order to make the best use
of this inner symmetry we shall transfer the actual construction of admissible
scaling functions $\phi$ and corresponding mother wavelets $\psi$ into the "Fourier
domain". As a consequence, e.g., the orthonormality of the $\phi_{0,k}=\phi(\cdot-k)$
has to be expressed in terms of properties of $\hat\phi$; of course we also need a Fourier
version of the scaling equation, and so on.
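For an arbitrary function $\phi\in L^2$ one may write
$$\|\phi\|^2=\int_{-\infty}^\infty|\hat\phi(\xi)|^2\,d\xi=\sum_l\int_0^{2\pi}|\hat\phi(\xi+2\pi l)|^2\,d\xi\,.$$
The integral on the right hand side can be thought of as an integral over
$\mathbb Z\times[0,2\pi]$. If one interchanges the order of integration, the function
$$\Phi(\xi):=\sum_l|\hat\phi(\xi+2\pi l)|^2$$
appears as the new inner integral. By Fubini's theorem $\Phi$ is defined almost
everywhere, first on $[0,2\pi]$, then on all of $\mathbb R$, is $2\pi$-periodic, and one has
$$\int_0^{2\pi}\Phi(\xi)\,d\xi=\|\phi\|^2\,.$$
We first prove the following lemma:
(5.9) The integer translates $\phi_k:=\phi(\cdot-k)$ of an arbitrarily given function
$\phi\in L^2$ constitute an orthonormal system if and only if the following identity
holds:
$$\Phi(\xi)=\sum_l|\hat\phi(\xi+2\pi l)|^2\equiv\frac1{2\pi}\qquad(\text{almost all }\xi\in\mathbb R). \tag{1}$$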
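For symmetry reasons it is enough to consider the scalar products of the
form $\langle\phi_0,\phi_k\rangle$. They are computed as follows:
$$\langle\phi_0,\phi_k\rangle=\int\hat\phi(\xi)\,\overline{e^{-ik\xi}\,\hat\phi(\xi)}\,d\xi=\int_0^{2\pi}\Phi(\xi)\,e^{ik\xi}\,d\xi=2\pi\,\hat\Phi(-k)\,.$$
This implies that the orthonormality condition $\langle\phi_0,\phi_k\rangle=\delta_{0k}$ is equivalent to
$$\hat\Phi(k)=\frac1{2\pi}\,\delta_{0k}\qquad\forall k\in\mathbb Z,$$
and the latter obviously means $\Phi(\xi)\equiv\frac1{2\pi}$ almost everywhere.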
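The criterion of the lemma can be observed numerically for the Haar scaling function $1_{[0,1[}$. The sketch below rests on two assumptions that are not stated in the lemma itself: the Fourier transform is normalized as $\hat f(\xi)=(2\pi)^{-1/2}\int f(t)\,e^{-i\xi t}\,dt$, so that $|\hat\phi(\xi)|^2=\frac1{2\pi}\bigl(\sin(\xi/2)/(\xi/2)\bigr)^2$ for this particular $\phi$, and the infinite sum over $l$ is simply truncated. The names `phi_hat_abs2` and `periodized` are illustrative.

```python
import math

def phi_hat_abs2(xi):
    """|phi_hat(xi)|^2 for phi = indicator of [0,1[, with the 1/sqrt(2 pi) normalization."""
    if xi == 0.0:
        return 1.0 / (2.0 * math.pi)
    x = 0.5 * xi
    return (math.sin(x) / x) ** 2 / (2.0 * math.pi)

def periodized(xi, terms=5000):
    """Truncated version of the sum over l of |phi_hat(xi + 2 pi l)|^2."""
    return sum(phi_hat_abs2(xi + 2.0 * math.pi * l) for l in range(-terms, terms + 1))
```

Since the terms decay only like $1/l^2$, the truncated sum approaches $\frac1{2\pi}$ rather slowly; with the default number of terms the product `2 * math.pi * periodized(xi)` differs from 1 by a small truncation error.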
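The next point on our agenda is the scaling equation
$$\phi(t)\equiv\sqrt2\sum_kh_k\,\phi(2t-k)\qquad(\text{almost all }t\in\mathbb R). \tag{2}$$
Taking the Fourier transform on both sides of (2) we obtain, using the rules
(R1) and (R2), the identity
$$\hat\phi(\xi)=\frac1{\sqrt2}\sum_kh_k\,e^{-ik\xi/2}\,\hat\phi\Bigl(\frac\xi2\Bigr)\,.$$
Looking at this formula we are led to introduce (at first only formally) the
function
$$H(\xi):=\frac1{\sqrt2}\sum_kh_k\,e^{-ik\xi}\,; \tag{3}$$
we call it the generating function of the multiresolution analysis under consideration.
Because of $\|h_\cdot\|=1$, the series (3) is almost everywhere convergent,
by Theorem (2.4), and defines $H$ as an actual $2\pi$-periodic function. If only
finitely many $h_k$ are different from zero, then $H$ is a trigonometric polynomial.
In terms of $H$ the Fourier version of the scaling equation reads
$$\hat\phi(\xi)=H\Bigl(\frac\xi2\Bigr)\,\hat\phi\Bigl(\frac\xi2\Bigr)\qquad(\text{almost all }\xi\in\mathbb R). \tag{4}$$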
(5.10) The generating function $H$ of a multiresolution analysis satisfies the
identity
$$|H(\omega)|^2+|H(\omega+\pi)|^2\equiv1\qquad(\text{almost all }\omega\in\mathbb R).$$
This of course implies that $H$ is uniformly bounded on $\mathbb R$:
$$|H(\omega)|\le1\qquad(\omega\in\mathbb R). \tag{5}$$
Furthermore, since $\hat\phi(0)\ne0$ by 5.2.(1), it follows from (4) that $H(0)=1$, and
(5.10) in turn implies $H(\pi)=0$.
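For a finite coefficient vector the generating function is a trigonometric polynomial, so the identity (5.10) and the values $H(0)=1$, $H(\pi)=0$ can be checked directly. The following Python sketch is illustrative only: the names `generating_function` and `defect` are chosen here, and the four-tap filter `D4` is the well-known Daubechies example taken from the general literature, not from this section.

```python
import cmath
import math

def generating_function(h, xi):
    """H(xi) = 2^(-1/2) * sum_k h_k * exp(-i k xi) for a finite filter h (dict k -> h_k)."""
    return sum(hk * cmath.exp(-1j * k * xi) for k, hk in h.items()) / math.sqrt(2.0)

def defect(h, omega):
    """Absolute deviation of |H(omega)|^2 + |H(omega + pi)|^2 from 1."""
    a = abs(generating_function(h, omega)) ** 2
    b = abs(generating_function(h, omega + math.pi)) ** 2
    return abs(a + b - 1.0)

S2, S3 = math.sqrt(2.0), math.sqrt(3.0)
HAAR = {0: 1 / S2, 1: 1 / S2}
# Assumption: standard four-tap Daubechies coefficients.
D4 = {0: (1 + S3) / (4 * S2), 1: (3 + S3) / (4 * S2),
      2: (3 - S3) / (4 * S2), 3: (1 - S3) / (4 * S2)}
```

For both filters `defect(h, omega)` stays at rounding level for any `omega`, `generating_function(h, 0.0)` is 1 and `generating_function(h, math.pi)` vanishes up to rounding.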
Our next goal is to describe the space $W_0$, i.e., the orthogonal complement of
$V_0$ in the larger space $V_{-1}$, as explicitly as possible. Having such a description
in hand we shall be able to give an explicit formula for a possible mother
wavelet $\psi$ belonging to the given scaling function $\phi$.
We begin with $V_{-1}$. Any $f\in V_{-1}$ has a representation of the form
$$f=\sum_kf_k\,\phi_{-1,k}\,,\qquad f_k=\langle f,\phi_{-1,k}\rangle\quad(k\in\mathbb Z),$$
and taking the Fourier transform on both sides we obtain (cf. the same calculation
for the scaling function $\phi$)
$$\hat f(\xi)=\frac1{\sqrt2}\sum_kf_k\,e^{-ik\xi/2}\,\hat\phi\Bigl(\frac\xi2\Bigr)\,. \tag{6}$$
Therefore we introduce (analogously to $H$ above) the function
$$m_f(\xi):=\frac1{\sqrt2}\sum_kf_k\,e^{-ik\xi}\,. \tag{7}$$
In this way formula (6) becomes
$$\hat f(\xi)=m_f\Bigl(\frac\xi2\Bigr)\,\hat\phi\Bigl(\frac\xi2\Bigr)\,. \tag{8}$$
The series appearing in (7) is convergent for almost every $\xi\in\mathbb R/2\pi$; therefore
we can say that the representation (8) is valid for almost all $\xi\in\mathbb R$.
The above chain of arguments can be reversed: If (8) is true for some function
$m_f\in L^2_\circ$, then $f\in V_{-1}$.
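A function $f\in W_0\subset V_{-1}$ is orthogonal to $V_0$, and as a consequence one has
$\langle f,\phi_{0,k}\rangle=0$ for all $k\in\mathbb Z$. This in turn implies
$$\int\hat f(\xi)\,\overline{\hat\phi(\xi)}\,e^{ik\xi}\,d\xi=0\qquad\forall k\in\mathbb Z$$
for such $f$, and the latter is possible only if the periodic function
$$\sum_l\hat f(\xi+2\pi l)\,\overline{\hat\phi(\xi+2\pi l)}$$
vanishes for almost all $\xi\in\mathbb R/2\pi$. In the last sum we again separate the
partial sums corresponding to even resp. odd values of $l$, then we express $\hat f$
by means of (8) and analogously $\hat\phi$ by means of (4), noting that $m_f$ and $H$
are $2\pi$-periodic. Altogether we obtain the following chain of equations, where
in the end we again make use of (5.9):
$$\begin{aligned}0&\equiv\sum_l\hat f(\xi+4\pi l)\,\overline{\hat\phi(\xi+4\pi l)}+\sum_l\hat f(\xi+2\pi+4\pi l)\,\overline{\hat\phi(\xi+2\pi+4\pi l)}\\&=m_f\bigl(\tfrac\xi2\bigr)\,\overline{H\bigl(\tfrac\xi2\bigr)}\sum_l\bigl|\hat\phi\bigl(\tfrac\xi2+2\pi l\bigr)\bigr|^2+m_f\bigl(\tfrac\xi2+\pi\bigr)\,\overline{H\bigl(\tfrac\xi2+\pi\bigr)}\sum_l\bigl|\hat\phi\bigl(\tfrac\xi2+\pi+2\pi l\bigr)\bigr|^2\\&=\frac1{2\pi}\Bigl(m_f\bigl(\tfrac\xi2\bigr)\,\overline{H\bigl(\tfrac\xi2\bigr)}+m_f\bigl(\tfrac\xi2+\pi\bigr)\,\overline{H\bigl(\tfrac\xi2+\pi\bigr)}\Bigr)\,.\end{aligned}$$
It turns out that we have proven the following identity:
$$m_f(\omega)\,\overline{H(\omega)}+m_f(\omega+\pi)\,\overline{H(\omega+\pi)}=0\qquad(\text{almost all }\omega\in\mathbb R). \tag{9}$$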
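Formulas (5.10) and (9) together can be paraphrased as follows: For (almost)
every fixed $\omega$ the vector
$$\mathbf H:=\bigl(H(\omega),H(\omega+\pi)\bigr)$$
is a unit vector in the unitary space $\mathbb C^2$, and the vector
$$\mathbf m_f:=\bigl(m_f(\omega),m_f(\omega+\pi)\bigr)$$
is orthogonal to $\mathbf H$.
It is easy to see that $\mathbf H$ and the further vector
$$\mathbf H':=\bigl(\overline{H(\omega+\pi)},\,-\overline{H(\omega)}\bigr)$$
together form an orthonormal basis of $\mathbb C^2$. This implies by general principles
that
$$\mathbf m_f=\lambda(\omega)\,\mathbf H'\,, \tag{10}$$
where the coefficient $\lambda(\omega)$ is given by the formula
$$\lambda(\omega)=\langle\mathbf m_f,\mathbf H'\rangle=m_f(\omega)\,H(\omega+\pi)-m_f(\omega+\pi)\,H(\omega)\,.$$
The function $\omega\mapsto\lambda(\omega)$ satisfies the identity $\lambda(\omega+\pi)\equiv-\lambda(\omega)$, consequently
there is a $2\pi$-periodic function $\nu(\cdot)$ such that
$$\lambda(\omega)=e^{i\omega}\,\nu(2\omega)\,. \tag{11}$$
Inserting this into (10) and extracting the first coordinate we obtain the following
representation of $m_f$:
$$m_f(\omega)=e^{i\omega}\,\nu(2\omega)\,\overline{H(\omega+\pi)}\,.$$
Introducing this into (8) we finally get for $\hat f$ the expression
$$\hat f(\xi)=e^{i\xi/2}\,\nu(\xi)\,\overline{H\bigl(\tfrac\xi2+\pi\bigr)}\,\hat\phi\bigl(\tfrac\xi2\bigr)\qquad(\text{almost all }\xi\in\mathbb R). \tag{12}$$
This line of reasoning leads us to the following theorem: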
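(5.11) A function $f\in L^2$ belongs to the space $W_0$, if and only if there exists
a function $\nu(\cdot)\in L^2_\circ$ such that $\hat f$ can be written in the form (12).
We have already shown that $f\in W_0$ implies the existence of a $2\pi$-periodic
function $\nu\colon\mathbb R\to\mathbb C$ such that $\hat f$ has a representation of the form (12). Solving
(11) for $\nu(\cdot)$ we get the expression $\nu(\xi)=e^{-i\xi/2}\lambda(\xi/2)$, and we infer from
(10) that
This implies
Conversely, if (12) is true for some $\nu(\cdot)\in L^2_\circ$, then we have (8) with
$$m_f(\omega)=e^{i\omega}\,\nu(2\omega)\,\overline{H(\omega+\pi)}\,.$$
Because of (5) we may conclude that $m_f\in L^2_\circ$, and this in turn implies
$f\in V_{-1}$. Furthermore, we have
$$m_f(\omega)\,\overline{H(\omega)}+m_f(\omega+\pi)\,\overline{H(\omega+\pi)}=e^{i\omega}\,\nu(2\omega)\,\Bigl(\overline{H(\omega+\pi)}\,\overline{H(\omega)}-\overline{H(\omega)}\,\overline{H(\omega+\pi)}\Bigr)=0\,,$$
proving that the vector $\mathbf m_f$ is orthogonal to $\mathbf H$ for almost all $\omega$. This means
that (9) is true for almost all $\omega$; on the other hand, for an $f\in V_{-1}$ this is
equivalent to $f\perp V_0$.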
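Inspired by the identity (12) we now define the mother wavelet $\psi$ corresponding
to the given $\phi$ by the following formula:
$$\hat\psi(\xi):=e^{i\xi/2}\,\overline{H\bigl(\tfrac\xi2+\pi\bigr)}\,\hat\phi\bigl(\tfrac\xi2\bigr)\,. \tag{13}$$
It appears that in doing so we are successful:
(5.12) If the mother wavelet $\psi$ is defined by (13), then the system of functions
$(\psi_{0,k}\mid k\in\mathbb Z)$ constitutes an orthonormal basis of $W_0$.
According to (5.9) the orthonormality of the $\psi_{0,k}$ is proven by the following
calculation:
$$\begin{aligned}\sum_l|\hat\psi(\xi+2\pi l)|^2&=\sum_l|\hat\psi(\xi+4\pi l)|^2+\sum_l|\hat\psi(\xi+2\pi+4\pi l)|^2\\&=\bigl|H\bigl(\tfrac\xi2+\pi\bigr)\bigr|^2\sum_l\bigl|\hat\phi\bigl(\tfrac\xi2+2\pi l\bigr)\bigr|^2+\bigl|H\bigl(\tfrac\xi2\bigr)\bigr|^2\sum_l\bigl|\hat\phi\bigl(\tfrac\xi2+\pi+2\pi l\bigr)\bigr|^2\\&=\Bigl(\bigl|H\bigl(\tfrac\xi2+\pi\bigr)\bigr|^2+\bigl|H\bigl(\tfrac\xi2\bigr)\bigr|^2\Bigr)\,\frac1{2\pi}\equiv\frac1{2\pi}\,.\end{aligned}$$
As $1\in L^2_\circ$ it follows from (5.11) that $\psi$ is indeed in $W_0$, whence all integer
translates $\psi_{0,k}$ belong to $W_0$ as well.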
On the other hand, consider an arbitrary $f\in W_0$. By Theorem (5.11) resp.
(12) and (13) we know that there is a $\nu(\cdot)\in L^2_\circ$ such that
$$\hat f(\xi)=\nu(\xi)\,\hat\psi(\xi)\qquad(\text{almost all }\xi\in\mathbb R). \tag{14}$$
The function $\nu(\cdot)$ can be developed into a Fourier series $\sum_k\nu_k\,e^{-ik\xi}$, and by
Carleson's theorem (2.4) this series converges almost everywhere to $\nu(\xi)$. It
follows that we can replace (14) by
$$\hat f(\xi)=\sum_k\nu_k\,e^{-ik\xi}\,\hat\psi(\xi)\qquad(\text{almost all }\xi\in\mathbb R).$$
Now, this is nothing more than the Fourier transform of the representation
$$f(t)=\sum_k\nu_k\,\psi(t-k)\quad\text{resp.}\quad f=\sum_k\nu_k\,\psi_{0,k}\,,$$
the series appearing on the right converging in $L^2$. Altogether this proves
that the $\psi_{0,k}$ do indeed form an orthonormal basis of $W_0$.
The scaling function $\phi$ does not determine the corresponding mother wavelet $\psi$
uniquely, thus formula (13) can be modified to a certain degree. For instance,
amending it by factors $e^{i\alpha}\,e^{-iN\xi}$ with $\alpha\in\mathbb R$, $N\in\mathbb Z$, is allowed. An additional
factor $e^{-iN\xi}$ in $\hat\psi$ produces a translation of the graph of $\psi$ by $N$ units to the
right. In this way, depending on circumstances, one can achieve that $\psi$ has
the same support as $\phi$.
Formula (13) gives only the Fourier transform of the wavelet $\psi$. In order
to obtain the function $\psi$ itself we have to translate (13) back into the time
domain. Using (3) we get
$$e^{i\xi/2}\,\overline{H\bigl(\tfrac\xi2+\pi\bigr)}=\frac1{\sqrt2}\sum_k\overline{h_k}\,(-1)^k\,e^{i(k+1)\xi/2}=\frac1{\sqrt2}\sum_{k'}(-1)^{k'-1}\,\overline{h_{-k'-1}}\,e^{-ik'\xi/2}\,,$$
where at the very end we performed the substitution $k:=-k'-1$ ($k'\in\mathbb Z$).
Therefore (13) can be replaced by
$$\hat\psi(\xi)=\frac1{\sqrt2}\sum_k(-1)^{k-1}\,\overline{h_{-k-1}}\,e^{-ik\xi/2}\,\hat\phi\bigl(\tfrac\xi2\bigr)\,. \tag{15}$$
According to the rules (R1) and (R3) the last formula is nothing other than
the Fourier transform of the representation
$$\psi(t)=\sqrt2\sum_k(-1)^{k-1}\,\overline{h_{-k-1}}\,\phi(2t-k)\,. \tag{16}$$
In order to get a well-structured set of formulas we set
$$(-1)^{k-1}\,\overline{h_{-k-1}}=:g_k\,. \tag{17}$$
In this way (16) becomes
$$\psi(t)=\sqrt2\sum_kg_k\,\phi(2t-k)\,, \tag{18}$$
an identity that has the same structure as the scaling equation 5.2.(2). Another
admissible definition of the $g_k$ would have been
$$g_k:=(-1)^k\,\overline{h_{2N-1-k}}\,. \tag{19}$$
If, e.g., only the $h_k$ for $0\le k\le2N-1$ are different from zero, then (19) implies
the same state of affairs for the $g_k$, and all summations in the corresponding
algorithms (see Section 5.4) range over the index set $\{0,1,\ldots,2N-1\}$.
Let us summarize the results obtained so far in the following theorem:
(5.13) Assume that $(V_j\mid j\in\mathbb Z)$ is a multiresolution analysis with scaling
function $\phi$ and generating function $H$, and let the mother wavelet $\psi$ be defined
by (13) resp. by (16). Then the function system
$$\psi_{j,k}(t):=2^{-j/2}\,\psi\Bigl(\frac{t-k\cdot2^j}{2^j}\Bigr)\qquad(j\in\mathbb Z,\ k\in\mathbb Z)$$
is an orthonormal wavelet basis of $L^2$.
Consider a fixed $j\in\mathbb Z$. Since according to (5.12) the $\psi_{0,k}$ constitute an
orthonormal basis of $W_0$, it is an easy consequence of the principle 5.1.(9)
and a small calculation that $(\psi_{j,k}\mid k\in\mathbb Z)$ is an orthonormal basis of $W_j$. The
theorem now follows from Proposition (5.1).
① As our first example we take up the Haar multiresolution analysis again,
cf. Example 5.2.①. This time we are in a position to construct the mother
wavelet $\psi$ following the prescriptions of the general theory. It is easy to verify
that $\phi:=\phi_{\rm Haar}$ has as its Fourier transform the function
$$\hat\phi(\xi)=\frac1{\sqrt{2\pi}}\,e^{-i\xi/2}\,\frac{\sin(\xi/2)}{\xi/2}\,. \tag{20}$$
On the other hand we now insert the values of the $h_k$, as computed in 5.2.(5),
into (3) and obtain the following generating function:
$$H(\xi)=\frac1{\sqrt2}\,\frac1{\sqrt2}\,(1+e^{-i\xi})=\cos\frac\xi2\;e^{-i\xi/2}\,. \tag{21}$$
It is easily seen that the functional equation (4) is fulfilled in this case. The
recipe (13) now gives
$$\hat\psi(\xi)=-\frac i{\sqrt{2\pi}}\,e^{i\xi/2}\,\frac{\sin^2(\xi/4)}{\xi/4}\,,$$
which is the same as 1.6.(1), up to a factor $-e^{i\xi}$. This means that the $\psi$ we
have constructed here is translated one unit to the left and is multiplied by
$-1$ with respect to the "official" Haar wavelet. This fact is corroborated, if
we now compute the $g_k$ by means of (17):
$$g_{-1}=\overline{h_0}=\frac1{\sqrt2}\,,\qquad g_{-2}=-\overline{h_1}=-\frac1{\sqrt2}\,,$$
all remaining $g_k$ being zero. This gives
$$\psi=\frac1{\sqrt2}\,\phi_{-1,-1}-\frac1{\sqrt2}\,\phi_{-1,-2}$$
resp. $\psi(t)=\phi(2t+1)-\phi(2t+2)$, as announced above. The reader may
convince himself on his own that the alternative definition (19) of the $g_k$ (in
the case at hand we have $N=1$) would have led to the "official" $\psi_{\rm Haar}$, whose
support coincides with that of $\phi_{\rm Haar}$.
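② As our second example we present the so-called Meyer wavelet. For its construction we again make use of the auxiliary function
$$\nu(x) := \begin{cases} 0 & (x \le 0) \\ 10x^3 - 15x^4 + 6x^5 & (0 \le x \le 1) \\ 1 & (x \ge 1) \end{cases}$$
shown in Figure 4.6 (this $\nu(\cdot)$ has nothing to do with the $\nu$'s appearing in Theorem (5.11)). We set
$$\hat\phi(\xi) := \begin{cases} \dfrac{1}{\sqrt{2\pi}} & \bigl(|\xi| \le \tfrac{2\pi}{3}\bigr) \\[2mm] \dfrac{1}{\sqrt{2\pi}}\,\cos\!\Bigl(\dfrac{\pi}{2}\,\nu\bigl(\tfrac{3}{2\pi}|\xi| - 1\bigr)\Bigr) & \bigl(\tfrac{2\pi}{3} \le |\xi| \le \tfrac{4\pi}{3}\bigr) \\[2mm] 0 & \bigl(|\xi| \ge \tfrac{4\pi}{3}\bigr) \end{cases}$$
Figure 5.2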
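(see Figure 5.2). This defines a function $\hat\phi \in L^2$ about which we can say the following right away: From the fact that $\hat\phi$ has compact support it follows that $\phi \in C^\infty$, and because of $\hat\phi \in C^2$ the assumption 5.2.(6) of Theorem (5.8) is satisfied by $\phi$; furthermore, one has
$$\int \phi(t)\,dt = \sqrt{2\pi}\,\hat\phi(0) = 1,$$
as is required for $\overline{\bigcup_j V_j} = L^2$, see (5.8).
In view of Proposition (5.9) we now have to examine the function
$$\Phi(\xi) := \sum_l \bigl|\hat\phi(\xi + 2\pi l)\bigr|^2 .$$
A short glance at Figure 5.2 shows that it is sufficient to verify condition (1) in the $\xi$-interval $\bigl[\tfrac{2\pi}{3}, \tfrac{4\pi}{3}\bigr]$. In this interval only the two terms corresponding to $l = 0$ and $l = -1$ contribute anything to $\Phi$ at all. Because of
$$\frac{3}{2\pi}\,|\xi - 2\pi| - 1 = 1 - \Bigl(\frac{3}{2\pi}\,\xi - 1\Bigr) \qquad \bigl(\tfrac{2\pi}{3} \le \xi \le \tfrac{4\pi}{3}\bigr)$$
and
$$\nu(1 - x) \equiv 1 - \nu(x) \qquad (x \in \mathbb{R})$$
it follows that
$$\Phi(\xi) = \frac{1}{2\pi}\cos^2\!\Bigl(\frac{\pi}{2}\,\nu\bigl(\tfrac{3}{2\pi}\xi - 1\bigr)\Bigr) + \frac{1}{2\pi}\sin^2\!\Bigl(\frac{\pi}{2}\,\nu\bigl(\tfrac{3}{2\pi}\xi - 1\bigr)\Bigr) = \frac{1}{2\pi}$$
is valid for $\tfrac{2\pi}{3} \le \xi \le \tfrac{4\pi}{3}$, as required.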
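The identity just derived can also be checked numerically. The following is only a sketch under stated assumptions: the function names `nu`, `phi_hat` and `Phi` are ours, the polynomial used for $\nu$ is the one quoted above from Figure 4.6, and the sum over $l$ is cut off at $|l| \le 3$, which is enough for $|\xi| \le \pi$ because $\hat\phi$ vanishes outside $[-\tfrac{4\pi}{3}, \tfrac{4\pi}{3}]$.

```python
import math

def nu(x):
    """Auxiliary function: 0 for x <= 0, 1 for x >= 1, a polynomial in between."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x ** 3 * (10.0 - 15.0 * x + 6.0 * x * x)

def phi_hat(xi):
    """Fourier transform of the Meyer scaling function (piecewise definition)."""
    a = abs(xi)
    c = 1.0 / math.sqrt(2.0 * math.pi)
    if a <= 2.0 * math.pi / 3.0:
        return c
    if a >= 4.0 * math.pi / 3.0:
        return 0.0
    return c * math.cos(0.5 * math.pi * nu(3.0 * a / (2.0 * math.pi) - 1.0))

def Phi(xi, lmax=3):
    """Sum over l of |phi_hat(xi + 2*pi*l)|^2; only a few terms are nonzero."""
    return sum(phi_hat(xi + 2.0 * math.pi * l) ** 2 for l in range(-lmax, lmax + 1))

# Sample one full period and record the largest deviation from 1/(2*pi).
samples = [-math.pi + 2.0 * math.pi * i / 200 for i in range(201)]
max_dev = max(abs(Phi(xi) - 1.0 / (2.0 * math.pi)) for xi in samples)
print(max_dev)
```

The deviation should be at the level of rounding errors, since the check only relies on $\nu(1-x) = 1 - \nu(x)$ and $\cos^2 + \sin^2 = 1$.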
Figure 5.3 The scaling function for the Meyer wavelet
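We now define (out of the blue) the $2\pi$-periodic function
$$H(\xi) := \sqrt{2\pi}\,\sum_l \hat\phi(2\xi + 4\pi l) \tag{22}$$
(this is, for any given $\xi \in \mathbb{R}$, a finite sum!) and assert that $H$ and $\hat\phi$ are in fact related to each other by the functional equation
$$\hat\phi(\xi) = H\!\left(\frac{\xi}{2}\right)\hat\phi\!\left(\frac{\xi}{2}\right) \tag{23}$$
as called for by the general theory.
⌜ The function $\xi \mapsto \hat\phi(\xi/2)$ has as its support the interval $\bigl[-\tfrac{8\pi}{3}, \tfrac{8\pi}{3}\bigr]$. On the other hand, all functions $\hat\phi(\,\cdot + 4\pi l)$ belonging to an $l \ne 0$ are identically zero on this interval. Therefore we already know that
$$H\!\left(\frac{\xi}{2}\right)\hat\phi\!\left(\frac{\xi}{2}\right) = \sqrt{2\pi}\;\hat\phi(\xi)\,\hat\phi\!\left(\frac{\xi}{2}\right).$$
But on the support $\bigl[-\tfrac{4\pi}{3}, \tfrac{4\pi}{3}\bigr]$ of $\hat\phi$ the identity $\hat\phi(\xi/2) \equiv \tfrac{1}{\sqrt{2\pi}}$ is true. This implies that the right hand side of (23) has for all $\xi$ the value $\hat\phi(\xi)$, as stated. ⌟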
Figure 5.4 The Meyer wavelet
According to what we have just proven, the function $\phi$ satisfies a scaling equation as well, so that now all circumstances required for a multiresolution analysis are established. Formula (13) gives the following expression for an admissible mother wavelet in this case:
$$\hat\psi(\xi) = e^{i\xi/2}\;\overline{H\!\left(\tfrac{\xi}{2} + \pi\right)}\;\hat\phi\!\left(\tfrac{\xi}{2}\right) = \sqrt{2\pi}\;e^{i\xi/2}\,\sum_l \hat\phi(\xi + 2\pi + 4\pi l)\;\hat\phi\!\left(\tfrac{\xi}{2}\right) = \sqrt{2\pi}\;e^{i\xi/2}\,\bigl(\hat\phi(\xi + 2\pi) + \hat\phi(\xi - 2\pi)\bigr)\,\hat\phi\!\left(\tfrac{\xi}{2}\right).$$
The corresponding $\psi$ is called the Meyer wavelet. One easily verifies that it is, up to the "phase factor" $e^{i\xi/2}$, nothing other than the Daubechies-Grossmann-Meyer wavelet 4.3.(11) corresponding to the step sizes $\sigma := 2$, $\beta := 1$. We refer the reader to the corresponding example in Section 4.3 for details. There the $\psi_{j,k}$ constituted only a frame. Thanks to the additional factor $e^{i\xi/2}$ provided by the general theory we now even have an orthonormal wavelet basis.
In Figures 5.3 and 5.4 the scaling function $\phi$ as well as the Meyer wavelet are shown in the time domain. ○
At the beginning of Section 5.2 we put on record that a scaling function $\phi$ has to meet three (sets of) requirements: First, the $\phi_{0,k}$ have to constitute an orthonormal system; second, there is the normalization condition 5.2.(1) securing separation and completeness; and third, there is of course the scaling equation. We conclude the current section by showing how a given $\phi$ that satisfies only the second and the third of these conditions can be improved in such a way that the resulting $\phi^\#$ is a scaling function belonging to the same exhaustion $(V_j \mid j \in \mathbb{Z})$ of $L^2$ and such that its integer translates $\phi^\#(\,\cdot - k)$ are in fact orthonormal.
(5.14) Assume that the function $\phi \in L^1 \cap L^2$ satisfies a scaling equation as well as the condition $\int \phi(t)\,dt \ne 0$, and let the spaces $V_j \subset L^2$ be defined by 5.1.(5)-(6). If there are constants $B \ge A > 0$ such that
$$A \le \Phi(\xi) := \sum_l \bigl|\hat\phi(\xi + 2\pi l)\bigr|^2 \le B \qquad (\text{almost all } \xi \in \mathbb{R}),$$
then the following are true:
(a) The family $(\phi(\,\cdot - k) \mid k \in \mathbb{Z})$ is a Riesz basis of $V_0$; in particular, it is a frame for $V_0$ with frame constants $2\pi A$ and $2\pi B$.
(b) If one defines the function $\phi^\#$ via its Fourier transform by
$$\hat\phi^\#(\xi) := \frac{\hat\phi(\xi)}{\sqrt{2\pi\,\Phi(\xi)}},$$
then $\phi^\#$ determines a multiresolution analysis with the same spaces $V_j$. This means, in particular, that the functions $(\phi^\#(\,\cdot - k) \mid k \in \mathbb{Z})$ constitute an orthonormal basis of $V_0$.
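⌜ (a) We have to show that for arbitrary
$$f := \sum_k c_k\,\phi(\,\cdot - k) \in V_0$$
the following inequalities are true:
$$2\pi A \sum_k |c_k|^2 \;\le\; \|f\|^2 \;\le\; 2\pi B \sum_k |c_k|^2 .$$
The Fourier transform of $f$ is given by
$$\hat f(\xi) = \Bigl(\sum_k c_k\,e^{-ik\xi}\Bigr)\hat\phi(\xi),$$
therefore we have
$$\|f\|^2 = \|\hat f\|^2 = \int_{\mathbb{R}} \Bigl|\sum_k c_k e^{-ik\xi}\Bigr|^2 |\hat\phi(\xi)|^2\,d\xi = \int_0^{2\pi} \Bigl|\sum_k c_k e^{-ik\xi}\Bigr|^2 \Phi(\xi)\,d\xi \;\le\; B \int_0^{2\pi} \Bigl|\sum_k c_k e^{-ik\xi}\Bigr|^2 d\xi = 2\pi B \sum_k |c_k|^2 .$$
In an analogous manner one argues with respect to $A$, and (4.6) shows that the $\phi(\,\cdot - k)$ are a fortiori forming a frame.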
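(b) Because of $\phi \in L^1$, the function $\hat\phi$ is continuous, and this implies in turn the continuity of $\Phi$ and of $1/\sqrt{\Phi}$. The two functions $1/\sqrt{2\pi\Phi}$ and $\sqrt{2\pi\Phi}$ are $2\pi$-periodic and square integrable over a period. Denoting the Fourier coefficients of $1/\sqrt{2\pi\Phi}$ by $a_k$, we have
$$\frac{1}{\sqrt{2\pi\,\Phi(\xi)}} = \sum_k a_k\,e^{-ik\xi} \qquad (\text{almost all } \xi \in \mathbb{R})$$
and consequently
$$\hat\phi^\#(\xi) = \sum_k a_k\,e^{-ik\xi}\,\hat\phi(\xi) \qquad (\text{almost all } \xi \in \mathbb{R}).$$
Translating the last equation into the time domain we come to the conclusion that
$$\phi^\# = \sum_k a_k\,\phi(\,\cdot - k) \in V_0,$$
and this in turn implies $V_0^\# \subset V_0$. In an analogous manner, using the Fourier expansion of $\sqrt{2\pi\Phi}$, one proves the inclusion $V_0 \subset V_0^\#$. It follows that each one of the spaces $V_j^\#$ coincides with the corresponding $V_j$.
That the $\phi^\#(\,\cdot - k)$ are orthonormal is an immediate consequence of (5.9).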
But our proof is not completely finished. It still remains to show that $\phi^\#$ satisfies the normalization condition 5.1.(1), which means the same as saying that the $V_j$ fulfilled the separation and the completeness axioms to begin with.
Because of $V_0 \subset V_{-1}$ the modified scaling function $\phi^\#$ satisfies a certain scaling equation as well, whence also an identity of the form (4):
$$\hat\phi^\#(\xi) = H^\#\!\left(\frac{\xi}{2}\right)\hat\phi^\#\!\left(\frac{\xi}{2}\right). \tag{24}$$
By assumption on $\phi$ we have $\hat\phi(0) \ne 0$ and consequently $\hat\phi^\#(0) \ne 0$ as well. Therefore we may conclude from (24) that $H^\#(0) = 1$ and, what's more, that $H^\#$ is continuous in a neighbourhood of 0. Since $H^\#$ satisfies the identity (5.10) we must have $H^\#(\pi) = 0$. We now assert that the following are true:
$$\hat\phi^\#(2\pi l) = 0 \qquad \forall\, l \in \mathbb{Z} \setminus \{0\}.$$
⌜ For any given $l \ne 0$ there is an $r \ge 1$ and an $n \in \mathbb{Z}$ such that $l = 2^{r-1}(2n+1)$. If we apply (24) recursively $r$ times, we get
$$\hat\phi^\#(2\pi l) = \prod_{j=1}^{r-1} H^\#\bigl(2^{r-j}(2n+1)\pi\bigr)\cdot H^\#\bigl((2n+1)\pi\bigr)\,\hat\phi^\#\bigl((2n+1)\pi\bigr) = 0,$$
since $H^\#$ vanishes at odd multiples of $\pi$. ⌟
In view of what we have just shown, and because the translates $\phi^\#(\,\cdot - k)$ are orthonormal, we now have
$$\bigl|\hat\phi^\#(0)\bigr|^2 = \sum_l \bigl|\hat\phi^\#(2\pi l)\bigr|^2 = \frac{1}{2\pi},$$
as required by 5.1.(1). ⌟
5.4 Algorithms
At this point we pause for a moment in our pursuit of the general theory, in order to present at long last the "fast algorithms" that we have repeatedly announced in earlier sections. In the framework of multiresolution analysis such algorithms lend themselves almost automatically, contrary to Fourier analysis, where it took centuries from its invention (by Euler) until the advent of the FFT.
Maybe the reader has found the numerous factors of $\sqrt{2}$ appearing in the foregoing sections to be something of a nuisance, and he very likely might have thought that such factors could have been avoided by arranging definitions and notations more carefully. The truth of the matter is that the agreements we made are very sound: Everything is set up in such a way that these annoying factors no longer occur where it really matters, to wit, in repetitive numerical calculations.
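The motor propelling the fast wavelet algorithms is the scaling equation
$$\phi(t) = \sqrt{2}\,\sum_k h_k\,\phi(2t-k), \tag{1}$$
paired with the analogous equation for $\psi$. The latter can by 5.3.(18) be written in the form
$$\psi(t) = \sqrt{2}\,\sum_k g_k\,\phi(2t-k), \tag{2}$$
the $g_k$ appearing in (2) being related to the $h_k$ according to 5.3.(17) or 5.3.(19).
From (1) we deduce, for arbitrary $j \in \mathbb{Z}$, $n \in \mathbb{Z}$, the identity
$$2^{-j/2}\,\phi\!\left(\frac{t}{2^{j}} - n\right) = \sum_k h_k\,2^{-(j-1)/2}\,\phi\!\left(\frac{t}{2^{j-1}} - 2n - k\right).$$
This may be written in the form
$$\phi_{j,n} = \sum_k h_k\,\phi_{j-1,\,2n+k} \qquad \forall j,\ \forall n, \tag{3}$$
that is to say, as a recursion formula for $\phi_{j-1,\cdot} \rightsquigarrow \phi_{j,\cdot}$. In an analogous way one obtains from (2) the formula
$$\psi_{j,n} = \sum_k g_k\,\phi_{j-1,\,2n+k} \qquad \forall j,\ \forall n, \tag{4}$$
which leads from the array $\phi_{j-1,\cdot}$ to the array $\psi_{j,\cdot}$.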
We are now going to analyze a time signal $f \in L^2$, and having done that we are going to synthesize it back to its original appearance. In the whole process there will be a finest scale to be considered; we may assume that it belongs to the value $j = 0$. Therefore the analysis begins with the data
$$a_{0,k} := \langle f, \phi_{0,k}\rangle = \int f(t)\,\overline{\phi(t-k)}\,dt .$$
These values could be determined, e.g., by numerical integration. It may also be the case that $f$ is only given in the form of a discrete array $(f(k) \mid k \in \mathbb{Z})$ to begin with. In such circumstances one simply puts
$$a_{0,k} := f(k) \qquad (k \in \mathbb{Z}).$$
This is not so farfetched in view of the fact that $\int \phi(t)\,dt = 1$, particularly in the case when $\phi$ has a narrow support and subsequent values of $f$ do not differ much from each other. Be that as it may, for the remaining discussion our basic assumption on $f$ can be summarized as follows:
$$P_0 f = \sum_k a_{0,k}\,\phi_{0,k} .$$
The wavelet analysis now proceeds in the direction of increasing $j$, and this means in the direction of ever longer waves resp. toward more drawn-out features of the signal $f$. We describe right away the step $j-1 \rightsquigarrow j$. Let $j \ge 1$ and assume
$$P_{j-1} f = \sum_k a_{j-1,k}\,\phi_{j-1,k}, \tag{5}$$
where the values $a_{j-1,k}$ are known and stored in an array. Intuitively speaking, the image $P_{j-1} f$ encompasses all features of $f$ having a spread of size $\ge 2^{j-1}$ on the time axis; see our detailed explanations in this regard in Section 5.1.
Our first task is the computing of the quantities $a_{j,n}$ $(n \in \mathbb{Z})$. Using (3) we obtain
$$a_{j,n} := \langle f, \phi_{j,n}\rangle = \sum_k \overline{h_k}\,\langle f, \phi_{j-1,\,2n+k}\rangle,$$
so that we can write down the following recursion formula for the step from $a_{j-1,\cdot}$ to $a_{j,\cdot}$:
$$\boxed{\;a_{j,n} = \sum_k \overline{h_k}\;a_{j-1,\,2n+k}\;}$$
(for real $h_k$ the complex conjugation is of course immaterial). The array $a_{j,\cdot}$ encodes the next coarser approximation of $f$, to wit
$$P_j f = \sum_k a_{j,k}\,\phi_{j,k} .$$
The approximations $P_{j-1} f$ and $P_j f$ are related to each other by the formula
$$P_{j-1} f = P_j f + Q_j f,$$
$Q_j$ denoting the orthogonal projection onto $W_j$. The image $Q_j f$ contains all features (details) of $f$ that have a time spread of size $\sim 2^{j}/\sqrt{2}$. Since $(\psi_{j,k} \mid k \in \mathbb{Z})$ is an orthonormal basis of $W_j$, we can write
$$Q_j f = \sum_k d_{j,k}\,\psi_{j,k},$$
and on account of (4) the coefficients appearing here are given by
$$d_{j,n} = \langle f, \psi_{j,n}\rangle = \sum_k \overline{g_k}\,\langle f, \phi_{j-1,\,2n+k}\rangle .$$
Expressing the scalar products on the right by means of (5) we therefore obtain the following formula for the "diagonal" step from $a_{j-1,\cdot}$ to $d_{j,\cdot}$:
$$\boxed{\;d_{j,n} = \sum_k \overline{g_k}\;a_{j-1,\,2n+k}\;}$$
The information about the time signal $f$ that was extracted in the transition from $P_{j-1} f$ to $P_j f$ is now stored in the array $d_{j,\cdot}$. Contrary to the "temporary" quantities $a_{j,k}$, the $d_{j,k}$ are actual wavelet coefficients.
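Altogether we obtain the following cascade, in the course of which at each step the signal $f$ is made coarser by a factor of two and at the same time details having a time spread of size $\sim 2^{j}/\sqrt{2}$ are extracted:
$$\begin{array}{ccccccccccc}
a_{0,\cdot} & \xrightarrow{\ \bar h\ } & a_{1,\cdot} & \xrightarrow{\ \bar h\ } & a_{2,\cdot} & \xrightarrow{\ \bar h\ } & a_{3,\cdot} & \xrightarrow{\ \bar h\ } & \cdots & \xrightarrow{\ \bar h\ } & a_{J,\cdot} \\
 & \searrow{\scriptstyle \bar g} & & \searrow{\scriptstyle \bar g} & & \searrow{\scriptstyle \bar g} & & \searrow{\scriptstyle \bar g} & & \searrow{\scriptstyle \bar g} & \\
 & & d_{1,\cdot} & & d_{2,\cdot} & & d_{3,\cdot} & & \cdots & & d_{J,\cdot}
\end{array} \tag{6}$$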
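The cascade (6) can be sketched in a few lines of code. This is a minimal illustration under assumptions that are ours and not part of the text: the filters are real and indexed $0, \ldots, 2N-1$, the arrays are stored sparsely as Python dictionaries mapping an index to a value, and the names `analysis_step` and `analyze` are hypothetical.

```python
import math

def analysis_step(a_prev, h, g):
    """One step a_{j-1,.} -> (a_{j,.}, d_{j,.}) for real filters h, g.

    a_{j,n} = sum_k h_k a_{j-1,2n+k},  d_{j,n} = sum_k g_k a_{j-1,2n+k}.
    """
    L = len(h)
    lo, hi = min(a_prev), max(a_prev)
    n_min = -((L - 1 - lo) // 2)   # smallest n with 2n + L - 1 >= lo
    n_max = hi // 2                # largest n with 2n <= hi
    a, d = {}, {}
    for n in range(n_min, n_max + 1):
        window = [a_prev.get(2 * n + k, 0.0) for k in range(L)]
        a[n] = sum(hk * x for hk, x in zip(h, window))
        d[n] = sum(gk * x for gk, x in zip(g, window))
    return a, d

def analyze(a0, h, g, J):
    """Run J steps of the cascade; returns a_{J,.} and the list [d_1, ..., d_J]."""
    a, details = dict(a0), []
    for _ in range(J):
        a, d = analysis_step(a, h, g)
        details.append(d)
    return a, details

# Haar filters; g_k = (-1)^k h_{1-k} as in 5.3.(19) with N = 1.
s = 1.0 / math.sqrt(2.0)
h_haar = [s, s]
g_haar = [s, -s]
a0 = dict(enumerate([4.0, 2.0, 5.0, 5.0]))
aJ, details = analyze(a0, h_haar, g_haar, J=2)
print(aJ, details)
```

For the four Haar-analyzed samples above the squared norm is preserved: $4^2 + 2^2 + 5^2 + 5^2 = 70$ equals the sum of the squares of all entries of $a_{2,\cdot}$, $d_{2,\cdot}$ and $d_{1,\cdot}$.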
The wavelet analysis (6) of the given time signal $f$ is terminated after $J$ steps, where the number $J$ comes out in a natural way, see below. We now address the following question: How many arithmetical operations were necessary for this analysis? In order to fix ideas we assume from the outset that the scaling function $\phi$ has compact support. We know from (5.7) that in this case the numbers $a(\phi)$ and $b(\phi)$ are integers. In keeping with the notation used in certain famous examples later on we assume that
$$a(\phi) = 0, \qquad b(\phi) = 2N-1, \qquad N \ge 1 .$$
It follows from (5.7) that only the $h_k$ with $0 \le k \le 2N-1$ are different from 0, and the same is true for the $g_k$, if we agree on 5.3.(19).
We introduce the following piece of notation: If $x_\cdot$ is an arbitrary array over the index set $\mathbb{Z}$, then the formulas
$$\operatorname{supp}(x_\cdot) \subset [\,p, q\,[\,, \qquad \operatorname{length}(x_\cdot) \le q - p$$
express the fact that at most the $x_k$ with $p \le k < q$ are nonzero and that at most $q - p$ individual entries are considered resp. stored at all. (The numbers $p$ and $q$ need not be integers.)
The array $a_{0,\cdot}$ encodes all the information that we are going to use about the time signal $f$. For simplicity, we assume, e.g.,
$$\operatorname{supp}(a_{0,\cdot}) \subset [\,0, 2^{J}\,[\,, \qquad \operatorname{length}(a_{0,\cdot}) = 2^{J} .$$
We assert that under the described circumstances the supports of the arrays $a_{j,\cdot}$ can be bounded as follows:
$$\operatorname{supp}(a_{j,\cdot}) \subset [\,-2N+2,\ 2^{J-j}\,[ \qquad (j \ge 0). \tag{7}$$
⌜ For $j = 0$ the assertion is true by assumption. For the step $j-1 \rightsquigarrow j$ we may suppose that $j \ge 1$ and that
$$\operatorname{supp}(a_{j-1,\cdot}) \subset [\,-2N+2,\ q\,[\,, \qquad q := 2^{J-(j-1)} .$$
Because of
$$a_{j,n} = \sum_{k=0}^{2N-1} \overline{h_k}\;a_{j-1,\,2n+k},$$
a component $a_{j,n}$ can be $\ne 0$ only if the two sets
$$\{2n,\ 2n+1,\ \ldots,\ 2n+2N-1\} \quad\text{and}\quad [\,-2N+2,\ q\,[$$
have a nonempty intersection, and for the latter it is necessary and sufficient that the inequalities
$$2n < q \quad\wedge\quad 2n + 2N - 1 \ge -2N + 2$$
hold. The first of these says $n < q/2 = 2^{J-j}$, the second $n \ge -2N + \tfrac{3}{2}$, which for an integer $n$ means $n \ge -2N+2$. Thus we may conclude that $\operatorname{supp}(a_{j,\cdot})$ is bounded as stated in (7). ⌟
Formula (7) suggests that we terminate the process after $J$ steps, since from then on $\operatorname{supp}(a_{j,\cdot})$ stays put at $[-2N+2,\ 0]$. How many multiplications have been carried out up to this point? (For the sake of simplicity we are disregarding the additions here.)
The computation of an individual value $a_{j,n}$ requires at most $\operatorname{length}(h_\cdot) = 2N$ multiplications. On the other hand we conclude from (7) that
$$\operatorname{length}(a_{j,\cdot}) \le 2^{J-j} + 2N - 2 \qquad (j \ge 0),$$
and for $\operatorname{length}(d_{j,\cdot})$ we obviously have the same bound. Altogether we obtain the following upper bound for the total number $\mu$ of multiplications required for the complete analysis of the given signal $f$:
$$\mu \le 2\cdot 2N \sum_{j=1}^{J} \bigl(2^{J-j} + 2N - 2\bigr) = 4N\bigl(2^{J} - 1 + J(2N-2)\bigr).$$
This implies
$$\mu \le 2\,\operatorname{length}(h_\cdot)\,\operatorname{length}(a_{0,\cdot})\,\bigl(1 + o(1)\bigr);$$
that is to say, the number of required operations is linear in the input length.
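The bound can be evaluated directly to see the linear growth. The sketch below only restates the two formulas above; the function names are ours.

```python
def mult_bound(N, J):
    """Upper bound for the number of multiplications, summed step by step."""
    return sum(2 * (2 * N) * (2 ** (J - j) + 2 * N - 2) for j in range(1, J + 1))

def mult_bound_closed(N, J):
    """The same bound in closed form: 4N(2^J - 1 + J(2N - 2))."""
    return 4 * N * (2 ** J - 1 + J * (2 * N - 2))

N = 3
for J in (4, 8, 12, 16):
    # Ratio against 2 * length(h.) * length(a_0,.) = 2 * 2N * 2^J.
    ratio = mult_bound(N, J) / (2 * (2 * N) * 2 ** J)
    print(J, mult_bound(N, J), round(ratio, 4))
```

The printed ratio tends to 1 as $J$ grows, which is the content of the factor $1 + o(1)$.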
Starting from $a_{0,\cdot}$ and proceeding in the described way we have computed in $J \ge 1$ steps the coefficient arrays
$$d_{1,\cdot}\,,\ d_{2,\cdot}\,,\ \ldots\,,\ d_{J,\cdot}\,,\ a_{J,\cdot}$$
(the intermediate or "temporary" arrays $a_{0,\cdot}, \ldots, a_{J-1,\cdot}$ are no longer needed). The total length of these arrays is about equal to $\operatorname{length}(a_{0,\cdot})$, so that at first glance we have gained nothing in terms of storage requirements. But we have to bear in mind that the individual coefficient arrays $d_{j,\cdot}$ will contain long sequences of negligible entries $d_{j,k}$, depending on the fine structure of the time signal $f$ in different regions of the $t$-axis. By disregarding all $d_{j,k}$ whose absolute value is below a certain threshold and releasing the corresponding storage cells one is able to achieve spectacular compression ratios without significant loss of information. For instructive examples in this regard, we refer the reader to [19].
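The thresholding idea just described amounts to a one-line filter on a sparsely stored coefficient array. The following sketch uses made-up sample values and a hypothetical helper `threshold`; the choice of the threshold itself depends on the application and is not addressed here.

```python
def threshold(d, eps):
    """Keep only the coefficients whose absolute value is at least eps."""
    return {k: v for k, v in d.items() if abs(v) >= eps}

# Illustrative detail coefficients (invented numbers, not taken from a real signal).
d = {0: 1.4142, 1: 0.0003, 2: -0.0001, 3: -2.5, 4: 0.00005}
kept = threshold(d, 1e-3)
print(kept)
print(len(kept) / len(d))   # fraction of storage cells still in use
```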
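Now for the synthesis: Here we obtain an algorithm of a similar simplicity. Since the step $j-1 \rightsquigarrow j$ amounts to replacing the orthonormal basis $\phi_{j-1,\cdot}$ of $V_{j-1}$ by the likewise orthonormal basis $\phi_{j,\cdot} \cup \psi_{j,\cdot}$, the reverse step $j \rightsquigarrow j-1$ does not necessitate the inversion of a certain matrix. The details are as follows: One has
$$P_{j-1} f = \sum_k a_{j,k}\,\phi_{j,k} + \sum_k d_{j,k}\,\psi_{j,k}$$
and consequently
$$a_{j-1,n} = \langle P_{j-1} f, \phi_{j-1,n}\rangle = \sum_k a_{j,k}\,\langle \phi_{j,k}, \phi_{j-1,n}\rangle + \sum_k d_{j,k}\,\langle \psi_{j,k}, \phi_{j-1,n}\rangle .$$
The scalar products appearing on the right can be read off from (3) and (4):
$$\langle \phi_{j,k}, \phi_{j-1,n}\rangle = h_{n-2k}, \qquad \langle \psi_{j,k}, \phi_{j-1,n}\rangle = g_{n-2k},$$
so that altogether the following synthesis formula emerges:
$$\boxed{\;a_{j-1,n} = \sum_k h_{n-2k}\;a_{j,k} + \sum_k g_{n-2k}\;d_{j,k}\;}$$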
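To see analysis and synthesis working together, one can run a signal through three analysis steps and back. The sketch below is an illustration under our own assumptions: real filters, dictionary storage, hypothetical function names, and the six coefficients of the Daubechies wavelet ${}_3\psi$ as they are usually tabulated, with $g_k = (-1)^k h_{5-k}$.

```python
def analysis_step(a_prev, h, g):
    """a_{j,n} = sum_k h_k a_{j-1,2n+k},  d_{j,n} = sum_k g_k a_{j-1,2n+k} (real filters)."""
    L = len(h)
    lo, hi = min(a_prev), max(a_prev)
    a, d = {}, {}
    for n in range(-((L - 1 - lo) // 2), hi // 2 + 1):
        window = [a_prev.get(2 * n + k, 0.0) for k in range(L)]
        a[n] = sum(hk * x for hk, x in zip(h, window))
        d[n] = sum(gk * x for gk, x in zip(g, window))
    return a, d

def synthesis_step(a, d, h, g):
    """a_{j-1,n} = sum_k h_{n-2k} a_{j,k} + sum_k g_{n-2k} d_{j,k}."""
    out = {}
    for coeffs, filt in ((a, h), (d, g)):
        for k, v in coeffs.items():
            for m, fm in enumerate(filt):      # m = n - 2k, i.e. n = 2k + m
                out[2 * k + m] = out.get(2 * k + m, 0.0) + fm * v
    return out

h = [0.3326705529500825, 0.8068915093110924, 0.4598775021184914,
     -0.1350110200102546, -0.0854412738820267, 0.0352262918857095]
g = [(-1) ** k * h[5 - k] for k in range(6)]

a0 = dict(enumerate([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0,
                     5.0, 3.0, 5.0, 8.0, 9.0, 7.0, 9.0, 3.0]))
a, details = dict(a0), []
for _ in range(3):                              # downward cascade, J = 3
    a, d = analysis_step(a, h, g)
    details.append(d)
for d in reversed(details):                     # upward cascade
    a = synthesis_step(a, d, h, g)

err = max(abs(a.get(n, 0.0) - a0.get(n, 0.0)) for n in set(a) | set(a0))
print(err)
```

The reconstruction error is limited only by rounding and by the number of digits in the tabulated coefficients; no matrix has to be inverted at any point.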
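In this way we obtain as a counterpart to (6) an "upward" cascade that takes the coefficient arrays
$$a_{J,\cdot}\,,\ d_{J,\cdot}\,,\ d_{J-1,\cdot}\,,\ \ldots\,,\ d_{2,\cdot}\,,\ d_{1,\cdot}$$
as its input and finally returns $a_{0,\cdot}$, i.e., $P_0 f$, as its output:
$$\begin{array}{ccccccccccc}
a_{J,\cdot} & \xrightarrow{\ h\ } & a_{J-1,\cdot} & \xrightarrow{\ h\ } & a_{J-2,\cdot} & \xrightarrow{\ h\ } & \cdots & \xrightarrow{\ h\ } & a_{1,\cdot} & \xrightarrow{\ h\ } & a_{0,\cdot} \\
 & \nearrow{\scriptstyle g} & & \nearrow{\scriptstyle g} & & & & \nearrow{\scriptstyle g} & & \nearrow{\scriptstyle g} & \\
d_{J,\cdot} & & d_{J-1,\cdot} & & & & d_{2,\cdot} & & d_{1,\cdot} & &
\end{array}$$
We leave it to the reader as an exercise to compute the total number $\mu$ of multiplications required for such a synthesis. The resulting figure will be about twice as large as the $\mu$ from the "downward" cascade (6).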
The boxed formulas show that we need only a table of the $h_k$ and the $g_k$ in order to be able to begin with concrete numerical work. Neither the scaling function $\phi$ nor the mother wavelet $\psi$ have to be stored, be it numerically or otherwise, nor do they have to be recomputed over and over at runtime. (By the way, one does not need to understand anything of the underlying theory either...)
In [D] one finds a great number of such tables; they relate to various wavelets $\psi$ that for one reason or another have proved their worth. The following example of such a table belongs to the so-called Daubechies wavelet ${}_3\psi$ having support $[0, 5]$:
$$\begin{array}{c|r|r}
k & h_k & g_k = (-1)^k h_{5-k} \\ \hline
0 & .3326705529500825 & .0352262918857095 \\
1 & .8068915093110924 & .0854412738820267 \\
2 & .4598775021184914 & -.1350110200102546 \\
3 & -.1350110200102546 & -.4598775021184914 \\
4 & -.0854412738820267 & .8068915093110924 \\
5 & .0352262918857095 & -.3326705529500825
\end{array} \tag{8}$$
We shall construct this wavelet in 6.2.② ab ovo; only there shall we see how the values of the $h_k$ tabulated above come about.
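A table such as (8) can be sanity-checked before it is used. The sketch below verifies three properties that follow from the general theory for real coefficients: $\sum_k h_k = \sqrt{2}$ (i.e. $H(0) = 1$), orthonormality of the even shifts of $h_\cdot$, and orthogonality of $g_\cdot$ to all even shifts of $h_\cdot$. The helper name `corr` is ours.

```python
import math

h = [0.3326705529500825, 0.8068915093110924, 0.4598775021184914,
     -0.1350110200102546, -0.0854412738820267, 0.0352262918857095]
g = [(-1) ** k * h[5 - k] for k in range(6)]     # second column of table (8)

def corr(x, y, shift):
    """sum_k x_k * y_{k + shift}, with entries outside 0..len-1 taken as zero."""
    return sum(x[k] * y[k + shift] for k in range(len(x)) if 0 <= k + shift < len(y))

sum_h = sum(h)                                    # should be sqrt(2)
norm_h = corr(h, h, 0)                            # should be 1
even_shifts = [corr(h, h, 2 * m) for m in (1, 2)] # should vanish
cross = [corr(h, g, 2 * m) for m in (-2, -1, 0, 1, 2)]  # should vanish
print(sum_h, norm_h, even_shifts, cross)
```

Deviations of the order of $10^{-11}$ are to be expected, since the tabulated values are themselves rounded.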
① (Continuation of 5.3.②) We have not yet computed the $h_k$ corresponding to the Meyer wavelet. That's what we are going to do now.
The generating function $H(\xi)$ is given by 5.3.(22) and is an even function, as is $\hat\phi$. Thus on account of 5.3.(3) we obtain successively
$$h_k = \frac{\sqrt2}{2\pi}\int_{-\pi}^{\pi} H(\xi)\,e^{ik\xi}\,d\xi = \frac{\sqrt2}{2\pi}\int_{-\pi}^{\pi} H(\xi)\cos(k\xi)\,d\xi = \frac{\sqrt2}{\pi}\int_0^{\pi} \sqrt{2\pi}\,\sum_l \hat\phi(2\xi + 4\pi l)\,\cos(k\xi)\,d\xi .$$
In the last sum, only the term corresponding to $l = 0$ is contributing anything to the integral, whence we obtain
$$h_k = h_{-k} = \frac{2}{\sqrt{\pi}}\int_0^{\pi} \hat\phi(2\xi)\cos(k\xi)\,d\xi .$$
These integrals now have to be computed numerically. In view of the function
v() used in the construction, the resulting has 4-clicks at the two points
2; and 3-clicks at the two points ; apart from that it is infinitely dif-
ferentiable. This implies (cf. Example 1.2.@) that for k ~ 00 the hk decay
only like 1/k4. The numerical computation results in the following values:
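$$\begin{array}{r|r||r|r}
k & h_k = h_{-k} & k & h_k = h_{-k} \\ \hline
0 & .748791 & 16 & -.000329 \\
1 & .442347 & 17 & .000061 \\
2 & -.039431 & 18 & .000333 \\
3 & -.127928 & 19 & -.000231 \\
4 & .033278 & 20 & -.000059 \\
5 & .057120 & 21 & .000174 \\
6 & -.024807 & 22 & -.000115 \\
7 & -.025310 & 23 & -.000027 \\
8 & .016000 & 24 & .000115 \\
9 & .009538 & 25 & -.000067 \\
10 & -.008556 & 26 & -.000028 \\
11 & -.002451 & 27 & .000066 \\
12 & .003416 & 28 & -.000040 \\
13 & .000058 & 29 & -.000015 \\
14 & -.000647 & 30 & .000046 \\
15 & .000225 & 31 & -.000027
\end{array}$$
○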
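The first entries of this table can be reproduced with a few lines of numerical integration. The sketch below is ours and makes the following assumptions: $\nu$ is the polynomial $10x^3 - 15x^4 + 6x^5$ on $[0,1]$, $\hat\phi$ is the piecewise function of Example 5.3.②, and the integral $h_k = \tfrac{2}{\sqrt\pi}\int_0^\pi \hat\phi(2\xi)\cos(k\xi)\,d\xi$ is evaluated by a composite Simpson rule on $[0, \tfrac{\pi}{3}]$ and $[\tfrac{\pi}{3}, \tfrac{2\pi}{3}]$ (beyond $\tfrac{2\pi}{3}$ the integrand vanishes).

```python
import math

def nu(x):
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x ** 3 * (10.0 - 15.0 * x + 6.0 * x * x)

def phi_hat(xi):
    """Fourier transform of the Meyer scaling function."""
    a = abs(xi)
    c = 1.0 / math.sqrt(2.0 * math.pi)
    if a <= 2.0 * math.pi / 3.0:
        return c
    if a >= 4.0 * math.pi / 3.0:
        return 0.0
    return c * math.cos(0.5 * math.pi * nu(3.0 * a / (2.0 * math.pi) - 1.0))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0

def meyer_h(k):
    """h_k = (2/sqrt(pi)) * integral over [0, pi] of phi_hat(2 xi) cos(k xi)."""
    f = lambda xi: phi_hat(2.0 * xi) * math.cos(k * xi)
    cuts = [0.0, math.pi / 3.0, 2.0 * math.pi / 3.0]   # the integrand is smooth on each piece
    integral = sum(simpson(f, cuts[i], cuts[i + 1]) for i in range(2))
    return 2.0 / math.sqrt(math.pi) * integral

for k in range(4):
    print(k, round(meyer_h(k), 6))
```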
6 Orthonormal wavelets
with compact support
6.1 The basic idea
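We are confronted with the task of producing scaling functions $\phi\colon \mathbb{R} \to \mathbb{C}$ having the following properties:
(a) $\phi \in L^2$, $\operatorname{supp}(\phi)$ compact,
(b) $\phi(t) = \sqrt{2}\,\sum_k h_k\,\phi(2t-k)$ resp. $\hat\phi(\xi) = H\bigl(\tfrac{\xi}{2}\bigr)\,\hat\phi\bigl(\tfrac{\xi}{2}\bigr)$,
(c) $\int \phi(t)\,dt = 1$ resp. $\hat\phi(0) = \tfrac{1}{\sqrt{2\pi}}$,
(d) $\int \phi(t)\,\overline{\phi(t-k)}\,dt = \delta_{0k}$ resp. $\sum_l \bigl|\hat\phi(\xi + 2\pi l)\bigr|^2 \equiv \tfrac{1}{2\pi}$.
If all these conditions are met, then Theorem (5.13) will provide us with an orthonormal basis of wavelets $\psi_{j,k}$ having compact support.
Condition (a) immediately implies $\phi \in L^1$ and $\hat\phi \in C^\infty$; furthermore, we know from (5.6) that only finitely many $h_k$ are nonzero. It follows that the generating function
$$H(\xi) := \frac{1}{\sqrt2}\,\sum_k h_k\,e^{-ik\xi}$$
is a trigonometric polynomial satisfying the identity
$$|H(\xi)|^2 + |H(\xi + \pi)|^2 \equiv 1 \tag{1}$$
and having the special values $H(0) = 1$, $H(\pi) = 0$; see (5.10).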
The systematic construction of polynomials with these properties is an algebraic problem that we shall take up in the next section. For the moment we assume that we have such an $H$ at our disposal, and we begin our undertaking by showing that the corresponding scaling function $\phi$, if there is one at all, is uniquely determined by $H$. Applying (b) recursively $r$ times we obtain
$$\hat\phi(\xi) = \prod_{j=1}^{r} H\!\left(\frac{\xi}{2^{j}}\right)\cdot\hat\phi\!\left(\frac{\xi}{2^{r}}\right)$$
and, therefore, because of (c),
$$\hat\phi(\xi) = \frac{1}{\sqrt{2\pi}}\,\prod_{j=1}^{\infty} H\!\left(\frac{\xi}{2^{j}}\right), \tag{2}$$
if the infinite product converges. In this regard, we show:
(6.1) Assume that the generating function $H \in C^1$ satisfies the identity (1) as well as $H(0) = 1$. Then the product (2) converges locally uniformly on $\mathbb{R}$ to a function $\hat\phi \in L^2$.
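⌜ Setting
$$\max_\xi |H'(\xi)| =: M$$
and using the mean value theorem of differential calculus we obtain
$$|H(\xi) - 1| = |H(\xi) - H(0)| \le M\,|\xi| \qquad (\xi \in \mathbb{R}),$$
therefore we may conclude
$$\left|H\!\left(\frac{\xi}{2^{j}}\right) - 1\right| \le \frac{M\,|\xi|}{2^{j}} \qquad (j \ge 0).$$
Because $\sum_{j \ge 1} 2^{-j} = 1$, this implies by general principles that the product (2) is converging locally uniformly to a continuous function $\hat\phi\colon \mathbb{R} \to \mathbb{C}$.
In order to prove $\hat\phi \in L^2$ we have to modify the limiting process leading from $H$ to $\hat\phi$ slightly by means of a "cut-off function". To this end we set
$$\hat f_0 := \frac{1}{\sqrt{2\pi}}\,1_{[-\pi,\pi[}$$
and define recursively, as in (b),
$$\hat f_r(\xi) := H\!\left(\frac{\xi}{2}\right)\hat f_{r-1}\!\left(\frac{\xi}{2}\right) \qquad (r \ge 1). \tag{3}$$
This implies
$$\hat f_r(\xi) = \frac{1}{\sqrt{2\pi}}\,\prod_{j=1}^{r} H\!\left(\frac{\xi}{2^{j}}\right)\cdot 1_{[-2^{r}\pi,\,2^{r}\pi[}(\xi). \tag{4}$$
For any given $\xi \in \mathbb{R}$ there is an $r_0$ such that
$$-2^{r}\pi \le \xi < 2^{r}\pi \qquad \forall\, r > r_0,$$
showing that the "cut-off factor" in (4) has no effect as soon as $r > r_0$. Therefore the comparison with (2) proves
$$\lim_{r\to\infty} \hat f_r(\xi) = \hat\phi(\xi) \qquad (\xi \in \mathbb{R});$$
moreover, we have locally uniform convergence of the $\hat f_r$ as well. The next point on the agenda is the following lemma: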
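(6.2) For each $r \ge 0$ the family $(f_r(\,\cdot - k) \mid k \in \mathbb{Z})$ is an orthonormal system.
⌜ Because of Proposition (5.9) the assertion of the lemma is equivalent to
$$\Phi_r(\xi) := \sum_l \bigl|\hat f_r(\xi + 2\pi l)\bigr|^2 \equiv \frac{1}{2\pi} \qquad (r \ge 0). \tag{5}$$
Now the recursion formula (3) for the $\hat f_r$ implies the following recursion formula for the functions $\Phi_r$:
$$\Phi_r(\xi) = \sum_l \bigl|\hat f_r(\xi + 4\pi l)\bigr|^2 + \sum_l \bigl|\hat f_r(\xi + 2\pi + 4\pi l)\bigr|^2 = \Bigl|H\bigl(\tfrac{\xi}{2}\bigr)\Bigr|^2\,\Phi_{r-1}\bigl(\tfrac{\xi}{2}\bigr) + \Bigl|H\bigl(\tfrac{\xi}{2} + \pi\bigr)\Bigr|^2\,\Phi_{r-1}\bigl(\tfrac{\xi}{2} + \pi\bigr).$$
Since statement (5) is obviously true in the case $r = 0$, the last equation and (1) together imply that it is true for all $r \ge 0$. ⌟
In particular we have $\|f_r\|^2 = 1$ for all $r \ge 0$. Using Fatou's lemma we therefore may draw the following conclusion about the limit function $\hat\phi$:
$$\int |\hat\phi(\xi)|^2\,d\xi \le \liminf_{r\to\infty} \int |\hat f_r(\xi)|^2\,d\xi = 1 .$$
This proves $\hat\phi \in L^2$. ⌟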
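For the Haar case the infinite product (2) can be compared with the closed form 5.3.(20). The following sketch truncates the product after $r = 40$ factors; the function names are ours, and $H(\xi) = \tfrac12(1 + e^{-i\xi})$ is the Haar generating function 5.3.(21).

```python
import cmath
import math

def H_haar(xi):
    """Generating function of the Haar multiresolution analysis."""
    return 0.5 * (1.0 + cmath.exp(-1j * xi))

def phi_hat_product(H, xi, r=40):
    """Truncated product (1/sqrt(2 pi)) * prod_{j=1}^{r} H(xi / 2^j)."""
    prod = 1.0 + 0.0j
    for j in range(1, r + 1):
        prod *= H(xi / 2.0 ** j)
    return prod / math.sqrt(2.0 * math.pi)

def phi_hat_haar(xi):
    """Closed form (1/sqrt(2 pi)) * exp(-i xi/2) * sinc(xi/2), sinc(x) = sin(x)/x."""
    half = xi / 2.0
    sinc = 1.0 if half == 0.0 else math.sin(half) / half
    return cmath.exp(-1j * half) * sinc / math.sqrt(2.0 * math.pi)

for xi in (0.0, 1.0, 5.0, -7.3):
    print(xi, abs(phi_hat_product(H_haar, xi) - phi_hat_haar(xi)))
```

The remaining factors $H(\xi/2^{j})$, $j > 40$, differ from 1 by less than $M|\xi|\,2^{-40}$, which is why so short a product already agrees with the limit to many digits.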
The existence of a scaling function $\phi$ corresponding to the given $H$ being established, we now have to take care of $\operatorname{supp}(\phi)$. How can we be certain that the scaling function (2) indeed has compact support, given that only finitely many $h_k$ are nonzero? The functions $f_r$ that were used in the proof of Theorem (6.1) and are converging to $\phi$ in $L^2$ certainly do not have compact support; in fact, they extend to holomorphic functions of a complex variable, since the sets $\operatorname{supp}(\hat f_r)$ are compact.
In order to get control over $\operatorname{supp}(\phi)$ we have to argue directly in the time domain. So let us assume that
$$a(h_\cdot) := \min\{k \mid h_k \ne 0\} = 0, \qquad b(h_\cdot) := \max\{k \mid h_k \ne 0\} = 2N-1. \tag{6}$$
If the resulting $\phi$ has indeed compact support, then we know from (5.7) that the latter is bounded below by $a(\phi) = 0$ and above by $b(\phi) = 2N-1$. We now construct a second sequence $(g_r \mid r \ge 0)$ that converges in some sense to $\phi$; but this time we make sure that the supports of all $g_r$ are lying in the interval $[0, 2N-1]$ we are aspiring to.
For the definition of such a sequence we recall the reproducing property of $\phi$ encoded in the scaling equation 5.2.(2). It can be expressed as follows: The scaling function $\phi$ is a fixed point of the transformation
$$Sg(t) := \sqrt{2}\,\sum_{k=0}^{2N-1} h_k\,g(2t-k).$$
In functional analysis the common procedure to determine a fixed point of some mapping $S$ is the following: One chooses a suitable starting point $g_0$ and defines recursively a sequence $(g_r \mid r \ge 0)$ by the formula
$$g_{r+1} := S g_r \qquad (r \ge 0). \tag{7}$$
If one is lucky, this sequence converges to "the" fixed point $\phi$ of $S$. In view of 5.2.(1), in the case at hand we choose $g_0 := 1_{[0,1[}$ and define the sequence $(g_r \mid r \ge 0)$ by (7). The first thing we prove is
$$\operatorname{supp}(g_r) \subset [0, 2N-1] \qquad \forall\, r \ge 0. \tag{8}$$
⌜ Because $N \ge 1$, the assertion is true for $r = 0$. If (8) is valid for a certain $r$, then the value $g_{r+1}(t) = S g_r(t)$ has to be 0, unless the two sets
$$\{2t - (2N-1),\ \ldots,\ 2t-1,\ 2t\} \quad\text{and}\quad [0, 2N-1]$$
have a nonempty intersection; for this to be the case, the inequalities
$$2t \ge 0 \quad\wedge\quad 2t - (2N-1) \le 2N-1$$
must hold, which is the same thing as saying that $0 \le t \le 2N-1$. ⌟
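The effect of $S$ in the Fourier domain is obviously given by
$$\widehat{Sg}(\xi) = H\!\left(\frac{\xi}{2}\right)\hat g\!\left(\frac{\xi}{2}\right),$$
and iterating this $r$ times produces for our $g_r$ the formula
$$\hat g_r(\xi) = \prod_{j=1}^{r} H\!\left(\frac{\xi}{2^{j}}\right)\cdot\hat g_0\!\left(\frac{\xi}{2^{r}}\right).$$
Now by 5.3.(20) we have
$$\hat g_0(\xi) = \frac{1}{\sqrt{2\pi}}\;e^{-i\xi/2}\,\operatorname{sinc}\!\left(\frac{\xi}{2}\right) \tag{9}$$
and therefore
$$\lim_{r\to\infty} \hat g_0\!\left(\frac{\xi}{2^{r}}\right) = \frac{1}{\sqrt{2\pi}} .$$
This implies that at least in the Fourier domain we have what we hoped for, that is to say
$$\lim_{r\to\infty} \hat g_r(\xi) = \hat\phi(\xi) \qquad (\xi \in \mathbb{R}),$$
the convergence being locally uniform on $\mathbb{R}$.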
How well the $g_r$ themselves converge to $\phi$ depends strongly on the regularity properties of $\phi$, and these we don't know. At the moment the "function" $\phi$ is but an $L^2$ object. Nevertheless, it makes sense to talk about the support of $\phi$. The statement
$$\operatorname{supp}(\phi) \subset [0, 2N-1]$$
can be deemed true, if
$$\int_{\mathbb{R} \setminus [0, 2N-1]} |\phi(t)|^2\,dt = 0$$
is guaranteed, and for the latter it is sufficient for $\phi$ to be orthogonal to all test functions $u \in C^2$ having a compact support disjoint from $[0, 2N-1]$. This is precisely what we are going to prove in the following lemma:
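(6.3) Let $u$ be a $C^2$-function having a support that is compact and disjoint from the interval $[0, 2N-1]$. Then
$$\langle \phi, u\rangle = \int \phi(t)\,\overline{u(t)}\,dt = 0 .$$
⌜ Let an $\varepsilon > 0$ be given. By assumption on $u$ we know that $\hat u \in L^1$, thus there is an $M > 0$ such that
$$\int_{|\xi| \ge M} |\hat u(\xi)|\,d\xi \le \varepsilon .$$
Such an $M$ being fixed, one can find an $r \ge 0$ such that
$$|\hat\phi(\xi) - \hat g_r(\xi)| \le \varepsilon \qquad (|\xi| \le M);$$
furthermore, we deduce from 5.3.(5) and (9) that
$$|\hat g_r(\xi)| \le \frac{1}{\sqrt{2\pi}} \qquad \forall\,\xi \in \mathbb{R},\ \forall\, r \ge 0,$$
and the same bound then holds for the limit $\hat\phi$. In view of (8) the supports of $g_r$ and $u$ are disjoint, therefore we can write
$$\langle \phi, u\rangle = \langle g_r, u\rangle + \langle \phi - g_r, u\rangle = 0 + \langle \hat\phi - \hat g_r, \hat u\rangle,$$
so that we obtain the estimate
$$|\langle \phi, u\rangle| \le \int_{-M}^{M} |\hat\phi(\xi) - \hat g_r(\xi)|\,|\hat u(\xi)|\,d\xi + \int_{|\xi| \ge M} \bigl(|\hat\phi(\xi)| + |\hat g_r(\xi)|\bigr)\,|\hat u(\xi)|\,d\xi \le \Bigl(\|\hat u\|_1 + \frac{2}{\sqrt{2\pi}}\Bigr)\,\varepsilon .$$
Since $\varepsilon > 0$ was arbitrary, we must have $\langle \phi, u\rangle = 0$, as stated in the lemma. ⌟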
Altogether, we have arrived at the following theorem:
(6.4) Assume that the coefficient vector $h_\cdot$ is bounded by (6) and that the corresponding function $H$ satisfies the identity (1), as well as $H(0) = 1$. Then the scaling equation admits a unique solution $\phi \in L^2$, and $\phi$ has compact support in the interval $[0, 2N-1]$.
By the way, the iteration procedure that we have used in the proof of (6.4) can easily be implemented for the actual numerical construction of $\phi$ as well. Figures 6.1 and 6.3 show the approximating step functions $g_r$ together with the limiting scaling function $\phi$.
Figure 6.1 Iterative construction of Daubechies' scaling function ${}_2\phi$
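The iteration $g_{r+1} := S g_r$, $g_0 := 1_{[0,1[}$, can be sketched as follows. The representation is our own choice: $g_r$ is stored as the list of its values on the cells $[m\,2^{-r}, (m+1)\,2^{-r}[$ of $[0, 2N-1[$, and the function name `cascade` is hypothetical. As test filter we take the four coefficients $\tfrac{1}{4\sqrt2}(1+\sqrt3,\ 3+\sqrt3,\ 3-\sqrt3,\ 1-\sqrt3)$ commonly quoted for Daubechies' ${}_2\phi$.

```python
import math

def cascade(h, r_max):
    """Return the values of the step function g_{r_max} on a grid of width 2^-r_max.

    For t in cell m of the finer grid, 2t - k lies in cell m - k * 2^r of g_r.
    """
    width = len(h) - 1                      # 2N - 1
    g = [1.0] + [0.0] * (width - 1)         # g_0 = indicator of [0,1[ on unit cells
    for r in range(r_max):
        shift = 2 ** r
        old_len = len(g)
        new = []
        for m in range(2 * old_len):
            acc = 0.0
            for k, hk in enumerate(h):
                i = m - k * shift
                if 0 <= i < old_len:
                    acc += hk * g[i]
            new.append(math.sqrt(2.0) * acc)
        g = new
    return g

s3 = math.sqrt(3.0)
h_d4 = [c / (4.0 * math.sqrt(2.0)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]

r = 6
g6 = cascade(h_d4, r)
integral = sum(g6) / 2 ** r                 # each step preserves the integral
print(len(g6), integral)
```

Since $\sum_k h_k = \sqrt2$, every application of $S$ preserves $\int g\,dt = 1$; plotting `g6` against the grid points $m\,2^{-6}$ gives a picture of the kind shown in Figure 6.1.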
In view of (6.1) resp. (6.4) the scaling function $\phi$ is uniquely determined by $H$ and explicitly given by (2). Therefore the following procedure suggests itself: One chooses a trigonometric polynomial $H$ that satisfies the identity (1), as well as $H(0) = 1$, and defines $\phi$ by (2). Then (a), (b) and (c) at the beginning of this section are fulfilled automatically; it remains to prove (d).
The following example shows that the consistency conditions encoded by (1) are necessary, but unfortunately not sufficient for (d).
① Taking off from Example 5.3.① we define
$$H(\xi) := \frac{1}{2}\bigl(1 + e^{-3i\xi}\bigr) = e^{-3i\xi/2}\,\cos\frac{3\xi}{2} .$$
The identity (1) is fulfilled in this case:
$$|H(\xi)|^2 + |H(\xi + \pi)|^2 = \cos^2\frac{3\xi}{2} + \cos^2\!\Bigl(\frac{3\xi}{2} + \frac{3\pi}{2}\Bigr) = \cos^2\frac{3\xi}{2} + \sin^2\frac{3\xi}{2} = 1 .$$
The uniquely determined solution of the functional equation (b) that also satisfies (c) can be written down explicitly; it is
$$\hat\phi(\xi) = \frac{1}{\sqrt{2\pi}}\;e^{-3i\xi/2}\,\operatorname{sinc}\!\left(\frac{3\xi}{2}\right).$$
Taking the inverse Fourier transform one gets
$$\phi(t) = \begin{cases} \tfrac{1}{3} & (0 \le t < 3) \\ 0 & (\text{otherwise}) \end{cases}$$
It is easy to see that the functions $\phi_{0,k} = \phi(\,\cdot - k)$ $(k \in \mathbb{Z})$ are not orthonormal. On the other hand it can be shown that the $\psi_{j,k}$ derived from this particular $\phi$ constitute a tight frame for $L^2$, see [D], Proposition 6.3.2. ○
Various additional assumptions on $H$ have been proposed to make property (d) come true; as a matter of fact the gap is not wide. We shall treat two such attempts in what follows. The following variant is due to Mallat [12]:
(6.5) Assume that the generating function $H \in C^1$ satisfies the identity (1) and $H(0) = 1$, as well as the additional condition
$$H(\xi) \ne 0 \qquad \bigl(|\xi| \le \tfrac{\pi}{2}\bigr), \tag{10}$$
and let $\hat\phi$ be defined by (2). Then the functions $\phi_{0,k}$ $(k \in \mathbb{Z})$ constitute an orthonormal basis of $V_0$.
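⌜ We have to show that the orthonormality (6.2) of the functions $f_r(\,\cdot - k)$ is preserved in the limit. That's where the extra hypothesis (10) comes in.
If $|\xi| \le \pi$, then one has $H(\xi/2^{j}) \ne 0$ for all $j \ge 1$, and this implies by definition of the convergence of an infinite product that $\hat\phi(\xi) \ne 0$. Because of the locally uniform convergence of (2) we know that $\hat\phi$ is continuous, therefore we can find a $\delta > 0$ such that
$$|\hat\phi(\xi)| \ge \delta \qquad (|\xi| \le \pi). \tag{11}$$
A moment's reflection will show that the function $\hat f_r$ can also be written in the following alternative way:
$$\hat f_r(\xi) = \begin{cases} \dfrac{\hat\phi(\xi)}{\sqrt{2\pi}\;\hat\phi(\xi/2^{r})} & (|\xi| \le 2^{r}\pi) \\[2mm] 0 & (\text{otherwise}) \end{cases}$$
In view of (11) this implies that the universal estimate
$$|\hat f_r(\xi)| \le \frac{|\hat\phi(\xi)|}{\sqrt{2\pi}\;\delta} \qquad (\xi \in \mathbb{R},\ r \ge 0)$$
is valid, so that in the concluding formula
$$\int \phi(t)\,\overline{\phi(t-k)}\,dt = \int |\hat\phi(\xi)|^2\,e^{ik\xi}\,d\xi = \lim_{r\to\infty} \int |\hat f_r(\xi)|^2\,e^{ik\xi}\,d\xi = \delta_{0k} \qquad \forall\, k \in \mathbb{Z}$$
we are allowed to apply Lebesgue's theorem (on limits under the integral sign). ⌟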
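① (Continued) In order to see what went wrong in this example we compute
$$|\hat f_r(\xi)|^2 = \frac{1}{2\pi}\,\prod_{j=1}^{r} \Bigl|H\!\left(\frac{\xi}{2^{j}}\right)\Bigr|^2 = \frac{1}{2\pi}\,\prod_{j=1}^{r} \cos^2\frac{3\xi}{2^{j+1}} \qquad (|\xi| < 2^{r}\pi).$$
Now consider the points $\xi_r := \tfrac{2}{3}\,2^{r}\pi$ $(r \ge 1)$. According to the last formula one has
$$|\hat f_r(\xi_r)|^2 = \frac{1}{2\pi}\,\prod_{j=1}^{r} \cos^2\bigl(2^{r-j}\pi\bigr) = \frac{1}{2\pi}$$
for all $r \ge 1$. Since the $\xi_r$ tend to infinity when $r \to \infty$, it seems inconceivable that the $|\hat f_r|^2$ have a common integrable majorant.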
The deeper reason for the phenomenon observed here is the following: The action
$$D\colon\ \mathbb{R}/2\pi \to \mathbb{R}/2\pi, \qquad \xi \mapsto 2\xi$$
has a closed orbit
$$(\xi_0, \ldots, \xi_{n-1}), \qquad \xi_k := D\xi_{k-1}\ \ \forall k, \qquad \xi_n = \xi_0 \tag{12}$$
such that $|H(\xi_k)| = 1$ for all $k$, namely the two-cycle $\bigl\{\tfrac{2\pi}{3}, \tfrac{4\pi}{3}\bigr\}$. It is a side-effect of condition (10) to make orbits of this kind impossible. This can be seen as follows: Condition (10) implies
$$|H(\xi)| < 1 \qquad \bigl(\tfrac{\pi}{2} \le \xi \le \tfrac{3\pi}{2}\bigr),$$
the variable $\xi$ being understood modulo $2\pi$. Let (12) be an arbitrary closed orbit of $D$. In the (necessarily periodic) binary representation of $\xi_0/2\pi$ modulo 1 each of the two sequences 01 or 10 must occur somewhere. But this implies that after finitely many steps a point $D^{j}\xi_0$ falls into the interval $\bigl[\tfrac{\pi}{2}, \tfrac{3\pi}{2}\bigr]$; therefore the orbit under discussion necessarily contains points $\xi_j$ for which one has $|H(\xi_j)| < 1$. ○
Lawton [11] has found a condition of a more algebraic nature that likewise guarantees the orthonormality of the functions $\phi_{0,k}$. We again assume (6); then by Theorem (6.4) we have $a(\phi) = 0$ and $b(\phi) = 2N-1$.
At stake are the numbers
$$\alpha_m := \langle \phi, \phi_{0,m}\rangle = \int \phi(t)\,\overline{\phi(t-m)}\,dt \qquad (m \in \mathbb{Z}).$$
Because of $\operatorname{supp}(\phi) \subset [0, 2N-1]$ all $\alpha_m$ with $|m| \ge 2N-1$ are automatically zero. Due to the scaling equation 5.2.(2) one has
$$\alpha_m = 2\sum_{k,l} h_k\,\overline{h_l} \int \phi(2t-k)\,\overline{\phi(2t-2m-l)}\,dt = \sum_{k,l} h_k\,\overline{h_l} \int \phi(t')\,\overline{\phi(t'+k-2m-l)}\,dt' = \sum_{k,l} h_k\,\overline{h_l}\;\alpha_{2m+l-k} .$$
If we substitute the summation variable $l$ according to $l := n + k - 2m$, where $n$ is the new running variable, we obtain
$$\alpha_m = \sum_n \Bigl(\sum_k h_k\,\overline{h_{n+k-2m}}\Bigr)\,\alpha_n . \tag{13}$$
In this way the square matrix $A := [A_{mn}]$ of order $4N-3$, whose elements are defined by
$$A_{m,n} := \sum_k h_k\,\overline{h_{n+k-2m}} \qquad (|m|, |n| < 2N-1), \tag{14}$$
comes into play. Formula (13) can now be read as $\alpha_m = \sum_n A_{mn}\,\alpha_n$, meaning that the vector $\alpha_\cdot$ is an eigenvector of $A$ corresponding to the eigenvalue 1.
The special vector
$$\beta_\cdot := (0, \ldots, 0, 1, 0, \ldots, 0), \quad\text{i.e.}\quad \beta_m = \delta_{0m} \qquad (|m| < 2N-1)$$
is an eigenvector of $A$ corresponding to the eigenvalue 1 as well; for, because of (1) resp. (5.4), one has
$$\sum_n A_{mn}\,\beta_n = A_{m,0} = \sum_k h_k\,\overline{h_{k-2m}} = \delta_{0,m} = \beta_m \qquad (|m| < 2N-1).$$
After all this work, we are in a position to state the following theorem:
(6.6) Assume that the coefficient vector $h_\cdot$ is bounded by (6), that the corresponding function $H$ satisfies the identity (1) and $H(0) = 1$, and that $\phi$ is the scaling function determined by (2). If 1 is a simple eigenvalue of the matrix $A$, then the functions $\phi_{0,k}$ $(k \in \mathbb{Z})$ are orthonormal.
⌜ By assumption on $A$ there is a number $c \in \mathbb{C}^*$ such that $\alpha_\cdot = c\,\beta_\cdot$; that is to say, all $\alpha_m = \langle \phi, \phi_{0,m}\rangle$ corresponding to $m \ne 0$ have the value 0 as stated, and $\alpha_0 = c \ne 0$. The computation carried out in the proof of (5.9) shows that under these circumstances the identity
$$\sum_l \bigl|\hat\phi(\xi + 2\pi l)\bigr|^2 \equiv \frac{c}{2\pi}$$
holds.
Now, if $l = 2^{r-1}(2n+1) \ne 0$, then the calculation
$$\hat\phi(2\pi l) = \prod_{j=1}^{r-1} H\bigl(2^{r-j}(2n+1)\pi\bigr)\cdot H\bigl((2n+1)\pi\bigr)\,\hat\phi\bigl((2n+1)\pi\bigr) = 0, \tag{15}$$
repeated from the proof of (5.14), shows that in fact
$$c = 2\pi\,|\hat\phi(0)|^2 = 1 .$$
⌟
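① (Continued) In this example we have $N = 2$, and the $h_k$ take the following values:
$$h_0 = h_3 = \frac{1}{\sqrt2}, \qquad h_1 = h_2 = 0 .$$
Inserting these into (14) one arrives at the matrix
$$A = \begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 \\ 1 & 0 & 0 & \tfrac12 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & \tfrac12 & 0 & 0 & 1 \\ 0 & 0 & 0 & \tfrac12 & 0 \end{pmatrix}$$
(the rows and columns are numbered from $-2$ to 2), having the eigenvalues
$$-1,\quad -\tfrac12,\quad \tfrac12,\quad 1,\quad 1 .$$
The eigenspace corresponding to the eigenvalue 1 is two-dimensional; it is spanned by the vectors $(1,2,0,2,1)$ and, of course, $(0,0,1,0,0)$. ○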
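Lawton's criterion lends itself to a direct computational check. The sketch below is ours and works under a simplifying assumption: the coefficients are real and of the form $h_k = c_k/\sqrt2$ with rational $c_k$, so that the matrix (14) can be built and row-reduced exactly with fractions. The function names are hypothetical.

```python
from fractions import Fraction

def lawton_matrix(c):
    """A_{m,n} = sum_k h_k h_{n+k-2m} for h_k = c_k / sqrt(2), c_k rational and real.

    Rows and columns are indexed by m, n = -(2N-2), ..., 2N-2.
    """
    L = len(c)                      # 2N
    R = L - 2                       # 2N - 2
    idx = range(-R, R + 1)
    def entry(m, n):
        total = Fraction(0)
        for k in range(L):
            l = n + k - 2 * m
            if 0 <= l < L:
                total += Fraction(c[k]) * Fraction(c[l])
        return total / 2            # the two factors 1/sqrt(2) combine to 1/2
    return [[entry(m, n) for n in idx] for m in idx]

def eigenspace_dim_for_one(A):
    """Dimension of the kernel of A - I, by exact Gaussian elimination."""
    size = len(A)
    M = [[A[i][j] - (1 if i == j else 0) for j in range(size)] for i in range(size)]
    rank = 0
    for col in range(size):
        pivot = next((r for r in range(rank, size) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(size):
            if r != rank and M[r][col] != 0:
                f = M[r][col] / M[rank][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[rank])]
        rank += 1
    return size - rank

A_bad = lawton_matrix([1, 0, 0, 1])     # h_0 = h_3 = 1/sqrt(2): the example above
A_haar = lawton_matrix([1, 1])          # Haar: h_0 = h_1 = 1/sqrt(2)
print(eigenspace_dim_for_one(A_bad), eigenspace_dim_for_one(A_haar))

# The overlaps alpha_m of phi = (1/3) * indicator of [0,3[ are (3 - |m|)/9.
alpha = [Fraction(3 - abs(m), 9) for m in range(-2, 3)]
A_alpha = [sum(a * x for a, x in zip(row, alpha)) for row in A_bad]
```

For the example the eigenspace is two-dimensional, so the hypothesis of (6.6) fails, whereas for the Haar filter the eigenvalue 1 is simple. The vector of actual overlaps $\alpha_m = (3 - |m|)/9$ is indeed a fixed vector of $A$, namely $\tfrac19(1,2,0,2,1) + \tfrac13(0,0,1,0,0)$.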
So far we have not touched the question of how regular the scaling functions $\phi$ are that one obtains in this way. Figures 6.1 (resp. 6.5) and 6.3 show that $\phi$ may indeed look quite jagged. Since such a $\phi$ comes into being only as the limit of a certain "fractal" process, and is not at our disposal in the form of a simple expression, the investigation of its regularity, be it via the decay of $\hat\phi(\xi)$ for $|\xi| \to \infty$ or via a careful analysis of the operator $S$, is very delicate and requires subtle estimates of various sorts. In this way one is able to prove, e.g., that the Daubechies scaling function ${}_3\phi$ and its corresponding mother wavelet ${}_3\psi$ are already continuously differentiable, and furthermore that the order of differentiability increases essentially linearly (with a proportionality factor $\sim 0.2$) with $N$. For details we refer the reader to [D], Chapter 7, or to the paper [7].
6.2 Algebraic constructions

In view of the results presented in the last section, only the following algebraic problem remains: We have to find trigonometric polynomials
\[ H(\xi) := \frac{1}{\sqrt2} \sum_k h_k\, e^{-ik\xi} \]
that satisfy the identity
\[ |H(\xi)|^2 + |H(\xi+\pi)|^2 \equiv 1 \]
and, of course, the condition $H(0) = 1$. We shall insist here on real coefficients $h_k$; the corresponding scaling functions $\phi$ as well as the mother wavelets $\psi$ will then be real-valued as well.
According to 5.3.(13) the Fourier transform of $\psi$ is given by
Now, on account of what we said in Section 3.5 (see, e.g., Theorem (3.13)), we are interested in our wavelet $\psi$ having an order $N$ as high as possible, and according to 3.5.(3) this is equivalent to the requirement that $\hat\psi$ should vanish to an order $N$ as large as possible at $\xi = 0$. As a consequence the generating function $H$ should have a zero of order $N \ge 1$ at $\xi = \pi$, a fact that we express most elegantly by writing
\[ H(\xi) = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{N} B(\xi), \qquad N \ge 1 . \]
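Instead of looking for $H$ we switch for a moment to the function
\[ M(\xi) := |H(\xi)|^2 \tag{1} \]
that would have to satisfy the linear identity
\[ M(\xi) + M(\xi+\pi) \equiv 1 . \tag{2} \]
For symmetry reasons the function $M$ is a polynomial in $\cos\xi$, and $M$ contains the factor
\[ \Bigl|\frac{1+e^{-i\xi}}{2}\Bigr|^{2N} = \Bigl(\cos^2\frac{\xi}{2}\Bigr)^{N} . \]
Therefore we may write
\[ M(\xi) = \Bigl(\cos^2\frac{\xi}{2}\Bigr)^{N} A(\xi), \qquad A(\xi) = \tilde P(\cos\xi), \tag{3} \]
where $\tilde P$ is a certain polynomial as well. Now we introduce a new variable $y$ by letting $y := \sin^2\frac{\xi}{2}$. This leads to
\[ A(\xi) = \tilde P(\cos\xi) = \tilde P(1-2y) =: P(y) , \tag{4} \]
where again $P$ is a certain polynomial. In this way (3) becomes
\[ M(\xi) = (1-y)^N P(y) . \]
Because of
\[ \cos^2\frac{\xi+\pi}{2} = \sin^2\frac{\xi}{2} = y \]
and
\[ A(\xi+\pi) = \tilde P(-\cos\xi) = \tilde P(2y-1) = \tilde P\bigl(1-2(1-y)\bigr) = P(1-y), \]
the identity (2) takes the following form when expressed in terms of the variable $y$:
\[ (1-y)^N P(y) + y^N P(1-y) \equiv 1 . \tag{5} \]
This formula is valid for $0 \le y \le 1$ at first, but by general principles on holomorphic functions we may conclude that it is true for arbitrary $y \in \mathbb{C}$.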
By the theorem on decomposition into partial fractions there are uniquely determined coefficients $c_k$, $c_k'$ such that
\[ \frac{1}{(1-y)^N\, y^N} = \sum_{k=1}^{N} \frac{c_k}{(1-y)^k} + \sum_{k=1}^{N} \frac{c_k'}{y^k} , \]
and for symmetry reasons one has $c_k = c_k'$ for all $k$. Clearing denominators, we can infer that there is a polynomial $P_N$ of degree $\le N-1$ such that
\[ (1-y)^N P_N(y) + y^N P_N(1-y) \equiv 1 \]
holds, and $P_N$ is the only polynomial solution of (5) having a degree $\le N-1$.
Now it is easy to see that any solution $P$ of (5) satisfies the identity
\[ P(y) \equiv (1-y)^{-N} \bigl(1 - y^N P(1-y)\bigr) \]
as well. In particular, this is the case for $P_N$, and this allows us to draw the following conclusion:
\[ P_N(y) = j_0^{N-1} P_N(y) = \sum_{k=0}^{N-1} \binom{-N}{k} (-y)^k = \sum_{k=0}^{N-1} \binom{N+k-1}{k}\, y^k . \tag{6} \]
Here we have made use of the fact that the part of $P_N$ carrying the factor $y^N$ gives no contribution to $j_0^{N-1} P_N$. The solution of (5) having the smallest possible degree now has been determined explicitly: It is the right hand side of (6).
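Formula (6) is easy to test with exact rational arithmetic. The sketch below is not part of the original argument; it represents a polynomial by its list of coefficients (constant term first) and verifies that $P_N$ from (6) satisfies $(1-y)^N P_N(y) + y^N P_N(1-y) \equiv 1$. All function names are ours.

```python
from fractions import Fraction
from math import comb

def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_pow(p, n):
    out = [Fraction(1)]
    for _ in range(n):
        out = poly_mul(out, p)
    return out

def reflect(p):
    """Coefficients of p(1 - y)."""
    out = [Fraction(0)]
    for k, c in enumerate(p):
        term = [c * t for t in poly_pow([Fraction(1), Fraction(-1)], k)]
        out = poly_add(out, term)
    return out

def P(N):
    """P_N(y) = sum_{k<N} binom(N+k-1, k) y^k, as in (6)."""
    return [Fraction(comb(N + k - 1, k)) for k in range(N)]

def lhs_of_5(N):
    """(1-y)^N P_N(y) + y^N P_N(1-y), with trailing zeros removed."""
    one_minus_y = [Fraction(1), Fraction(-1)]
    y = [Fraction(0), Fraction(1)]
    p = P(N)
    total = poly_add(poly_mul(poly_pow(one_minus_y, N), p),
                     poly_mul(poly_pow(y, N), reflect(p)))
    while len(total) > 1 and total[-1] == 0:
        total.pop()
    return total

print([lhs_of_5(N) == [1] for N in range(1, 7)])
```

For $N = 3$ the function `P` returns the coefficients $1, 3, 6$, i.e. $P_3(y) = 1 + 3y + 6y^2$.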
Now let $P$ be an arbitrary solution of (5). Then
\[ (1-y)^N \bigl(P(y) - P_N(y)\bigr) + y^N \bigl(P(1-y) - P_N(1-y)\bigr) \equiv 0 \tag{7} \]
and consequently
\[ P(y) - P_N(y) = y^N P^*(y) \]
for some polynomial $P^*$. If we insert this into (7) again, we obtain
\[ P^*(y) + P^*(1-y) \equiv 0 , \]
which is equivalent to
\[ P^*(y) = R(1-2y) = R(\cos\xi), \qquad R \text{ odd} . \]
Since we can perform the same computations backward as well, all in all the following theorem has been proven:
(6.7) A trigonometric polynomial $M(\xi)$ satisfies the identity (2) if and only if it has the following form:
\[ M(\xi) = \Bigl(\cos^2\frac{\xi}{2}\Bigr)^{N} P\Bigl(\sin^2\frac{\xi}{2}\Bigr) . \]
Here
\[ P(y) = P_N(y) + y^N R(1-2y) , \]
where $P_N$ is given by (6) and $R$ is an arbitrary odd polynomial.
In view of (1) such a function $M(\xi)$ is of use only if $P$ satisfies the additional condition
\[ P(y) \ge 0 \qquad (0 \le y \le 1) . \]
Letting $P := P_N$, this condition is obviously satisfied.
So much for the admissible functions $M$, these being related to $H$ by (1). In order to get the generating functions $H$ themselves, we must, so to speak, "take the square root of $M$". In doing this we only have to bother about the factor
\[ A(\xi) = P(y) = P\Bigl(\sin^2\frac{\xi}{2}\Bigr) \]
introduced in (3). For carrying out this task a surprising lemma of Riesz will come to our help. It reads as follows:
(6.8) If
\[ A(\xi) = \sum_{k=0}^{n} a_k \cos^k \xi \]
and if $A(\xi) \ge 0$ for $\xi \in \mathbb{R}$, in particular $A(0) = 1$, then there is a trigonometric polynomial
\[ B(\xi) = \sum_{k=0}^{n} b_k\, e^{-ik\xi} \]
with real coefficients $b_k$ and $B(0) = 1$, such that
\[ A(\xi) \equiv |B(\xi)|^2 , \tag{8} \]
identically in $\xi$.
⌈ The function $A(\xi)$ possesses a product representation of the form
\[ A(\xi) = a_n \prod_{j=1}^{n} (\cos\xi - c_j) , \tag{9} \]
the $c_j$ being real or else appearing in complex conjugate pairs. We introduce the complex variable $z$ by writing $e^{-i\xi} =: z$; then (9) goes over into
\[ A(\xi) = a_n \prod_{j=1}^{n} \Bigl( \frac{z+z^{-1}}{2} - c_j \Bigr) . \tag{10} \]
In investigating the individual factors appearing in (10), we need the well known properties of the mapping $z \mapsto (z+z^{-1})/2$ as well as the identity
\[ \frac{z+z^{-1}}{2} - \frac{s+s^{-1}}{2} \equiv -\frac{1}{2s}\,(z-s)\,(z^{-1}-s) \qquad (zs \ne 0) . \tag{11} \]
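(a) If $c_j \in \mathbb{R}$ and $|c_j| \ge 1$, then there is an $s \in \mathbb{R}^*$ such that $c_j = (s+s^{-1})/2$. Therefore we obtain, using (11):
\[ \frac{z+z^{-1}}{2} - c_j = -\frac{1}{2s}\,(z-s)\,(z^{-1}-s) . \]
(b) If $c_j \in \mathbb{R}$ and $|c_j| < 1$, then there is an $s = e^{i\alpha} \ne 1$ such that
\[ c_j = \frac{s+s^{-1}}{2} = \cos\alpha . \]
This implies that $A(\xi)$ contains a factor $\cos\xi - \cos\alpha$, and the latter is not compatible with $A(\xi) \ge 0$ $(\xi \in \mathbb{R})$, unless this factor occurs an even number of times. Therefore there is a $j'$ such that $c_{j'} = c_j$, and using (11) we obtain the identity
\[ \Bigl(\frac{z+z^{-1}}{2} - c_j\Bigr)\Bigl(\frac{z+z^{-1}}{2} - c_{j'}\Bigr) = \frac{1}{4e^{2i\alpha}}\,(z-e^{i\alpha})(z^{-1}-e^{i\alpha})(z-e^{i\alpha})(z^{-1}-e^{i\alpha}) \]
\[ = \frac14\,(z-e^{i\alpha})(z-e^{-i\alpha})\,(z^{-1}-e^{i\alpha})(z^{-1}-e^{-i\alpha}) = \frac14\,(z^2 - 2z\cos\alpha + 1)(z^{-2} - 2z^{-1}\cos\alpha + 1) . \]
(c) If $c_j \notin \mathbb{R}$, then there is, first, a $j'$ such that $c_{j'} = \bar c_j$ and, second, an $s \in \mathbb{C}^*$ such that
\[ c_j = \frac{s+s^{-1}}{2}, \qquad c_{j'} = \frac{\bar s+\bar s^{-1}}{2} . \]
Using (11) again we get
\[ \Bigl(\frac{z+z^{-1}}{2} - c_j\Bigr)\Bigl(\frac{z+z^{-1}}{2} - c_{j'}\Bigr) = \frac{1}{4|s|^2}\,(z-s)(z-\bar s)\,(z^{-1}-s)(z^{-1}-\bar s) . \]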
All things considered, it follows that it is possible to combine and to regroup the factors appearing in (10) in such a way that the resulting representation of $A(\xi)$ assumes the following form:
\[ A(\xi) = C\,Q(z)\,Q(z^{-1}) . \]
Here $Q(z) = \sum_k q_k z^k$ is a polynomial with real coefficients $q_k$, and the constant $C \in \mathbb{C}^*$ is obtained by collecting $a_n$ and the various numerical factors that have appeared in (a)-(c). The extra condition $A(0) = 1$ gives $C = 1/(Q(1))^2$. It follows that, if we set $B(\xi) := Q(e^{-i\xi})/Q(1)$, then (8) is valid; therefore the lemma is proven. ⌋
The decomposition (8) is not uniquely determined, since in the cases (a) and (c) interchanging $s$ and $s^{-1}$ leads to another decomposition of the corresponding partial product of $A(\xi)$. This, albeit modest, flexibility can be used to make the resulting scaling function and in consequence the related mother wavelet more symmetrical. We shall not pursue this matter any further.
Assume that $N$ is given. If we choose for simplicity $P := P_N$, then $A(\xi)$ becomes a polynomial of degree $N-1$ in $\cos\xi$ and $B(\xi)$ a polynomial of degree $N-1$ in $e^{-i\xi}$. In this way the generating function
\[ H(\xi) = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{N} B(\xi) \]
is of degree $2N-1$ in $e^{-i\xi}$, and the support of the corresponding scaling function $\phi =: {}_N\phi$ turns out to be the interval $[0, 2N-1]$. The mother wavelets ${}_N\psi$ derived from the ${}_N\phi$ are called Daubechies wavelets.
① In the case $N = 1$ we of course obtain the Haar wavelet. Formula (6) gives $P_1(y) \equiv 1$, and this in turn implies $A(\xi) \equiv 1$, $B(\xi) \equiv 1$, so that we finally get
\[ H(\xi) = \tfrac12\,(1 + e^{-i\xi}) , \]
which is in agreement with 5.3.(21). ○
The case $N = 2$ shall be dealt with in detail in the next section; the case $N = 3$ appears as Example ② below. In [D], Table 6.1, the coefficient vectors $(h_k \mid 0 \le k \le 2N-1)$ corresponding to the Daubechies wavelets ${}_N\psi$ are given to 16 decimal places for $2 \le N \le 10$. In [L], Table 2.3, one finds these coefficients to six decimal places for $N$ from 2 to 5.
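② We now describe in detail the case $N = 3$, choosing
\[ P(y) := P_3(y) = \binom{2}{0} + \binom{3}{1}\,y + \binom{4}{2}\,y^2 = 1 + 3y + 6y^2 . \]
Inserting
\[ y = \sin^2\frac{\xi}{2} = \frac14\,\bigl(-e^{-i\xi} + 2 - e^{i\xi}\bigr) \]
into (4) we get
\[ A(\xi) = \frac38\,e^{-2i\xi} - \frac94\,e^{-i\xi} + \frac{19}{4} - \ldots \]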
Figure 6.2 confirms that $A(\xi)$ is $\ge 0$ throughout so that it makes sense to proceed with our computation. In the case at hand, the function $B(\xi)$ has the form $B(\xi) = b_0 + b_1 e^{-i\xi} + b_2 e^{-2i\xi}$, so that we have to compare coefficients in the identity
\[ (b_0 + b_1 e^{-i\xi} + b_2 e^{-2i\xi})(b_0 + b_1 e^{i\xi} + b_2 e^{2i\xi}) = \frac38\,e^{-2i\xi} - \frac94\,e^{-i\xi} + \frac{19}{4} - \ldots \]
Figure 6.2
For symmetry reasons it is enough to check the coefficients corresponding to $e^{-2i\xi}$, $e^{-i\xi}$ and 1. In this way we obtain the three equations
\[ b_0 b_2 = \frac38, \qquad b_0 b_1 + b_1 b_2 = -\frac94, \qquad b_0^2 + b_1^2 + b_2^2 = \frac{19}{4} . \tag{12} \]
Because $A(0) = P(0) = 1$, Lemma (6.8) guarantees that we can find real solutions $(b_0, b_1, b_2)$ that satisfy the additional condition $b_0 + b_1 + b_2 = 1$. If we use this condition to eliminate $b_0 + b_2$ from the second equation in (12), we get for $b_1$ the quadratic equation $b_1^2 - b_1 - \frac94 = 0$, and this in turn leads to
\[ b_1 = \frac{1 \pm \sqrt{10}}{2} . \]
We leave it to the reader to pursue the upper choice of the sign here; it will result in complex solutions $b_0$ and $b_2$. This means that we definitively have $b_1 = (1 - \sqrt{10})/2$, and because of the first equation in (12) we can say that $b_0$ and $b_2$ are the two solutions of the quadratic equation
\[ b^2 - \frac{1+\sqrt{10}}{2}\,b + \frac38 = 0 . \]
Choosing arbitrarily (well, not quite ...) one of the two possible assignments, we get
\[ B(\xi) = \frac{1+\sqrt{10}+\sqrt{5+2\sqrt{10}}}{4} + \frac{1-\sqrt{10}}{2}\,e^{-i\xi} + \frac{1+\sqrt{10}-\sqrt{5+2\sqrt{10}}}{4}\,e^{-2i\xi} , \]
so that we finally obtain
\[ H(\xi) = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{3} B(\xi) = \frac18\,\bigl(1 + 3e^{-i\xi} + \ldots\bigr)\Bigl(\frac{1+\sqrt{10}+\sqrt{5+2\sqrt{10}}}{4} + \frac{1-\sqrt{10}}{2}\,e^{-i\xi} + \ldots\Bigr) \]
\[ = \frac{1+\sqrt{10}+\sqrt{5+2\sqrt{10}}}{32} + \frac{5+\sqrt{10}+3\sqrt{5+2\sqrt{10}}}{32}\,e^{-i\xi} + \ldots \]
From the part of $H$ that is actually printed out here one can immediately read off $h_0$ and $h_1$:
\[ h_0 = \sqrt2\;\frac{1+\sqrt{10}+\sqrt{5+2\sqrt{10}}}{32} = 0.33267\ldots, \qquad h_1 = \sqrt2\;\frac{5+\sqrt{10}+3\sqrt{5+2\sqrt{10}}}{32} = 0.80689\ldots, \]
both in agreement with Table 5.4.(8). We leave it to the reader as an exercise to compute the remaining $h_k$ as well and so convince herself that we have indeed determined the coefficient vector $h.$ corresponding to the Daubechies wavelet ${}_3\psi$.
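The exercise can also be delegated to a short program. The following sketch multiplies out $H(\xi) = \bigl(\frac{1+e^{-i\xi}}{2}\bigr)^3 B(\xi)$ with the values of $b_0, b_1, b_2$ found above and uses $h_k = \sqrt2 \cdot (\text{coefficient of } e^{-ik\xi})$; the variable names are ours, and floating point arithmetic is used throughout.

```python
import math

s10 = math.sqrt(10.0)
root = math.sqrt(5.0 + 2.0 * s10)

# Coefficients b_0, b_1, b_2 of B(xi) as chosen in the text.
b = [(1 + s10 + root) / 4, (1 - s10) / 2, (1 + s10 - root) / 4]

# (1 + z)^3 = 1 + 3z + 3z^2 + z^3 with z = e^{-i xi}; the factor 1/8 and the
# normalisation sqrt(2) are applied while multiplying the two polynomials.
binom3 = [1, 3, 3, 1]
h = [0.0] * 6
for i, c in enumerate(binom3):
    for j, bj in enumerate(b):
        h[i + j] += math.sqrt(2.0) / 8.0 * c * bj

for k, hk in enumerate(h):
    print(k, round(hk, 5))
```

Besides reproducing $h_0$ and $h_1$, the vector obtained in this way satisfies $\sum_k h_k = \sqrt2$, $\sum_k h_k^2 = 1$ and $\sum_k h_k h_{k+2} = \sum_k h_k h_{k+4} = 0$ up to rounding, as it must for an orthonormal scaling function.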
Figures 6.3 and 6.4 show the functions ${}_3\phi$ and ${}_3\psi$ in the time domain. ○
Figure 6.3 The Daubechies scaling function ${}_3\phi$
Figure 6.4 The Daubechies wavelet ${}_3\psi$
6.3 Binary interpolation
In the two foregoing sections we obtained scaling functions and corresponding wavelets by means of constructions in the Fourier domain, and also as limiting functions of an iteration procedure. In neither approach, however, did we discuss the convergence behaviour in the time domain. Now there is a third, called the direct method for constructing scaling functions $\phi$. This method yields without a limiting process the exact values $\phi(x)$ at all "binary rational" points $x \in \mathbb{R}$, and it is with the help of this method that one obtains the best regularity results, e.g. for the Daubechies wavelets ${}_N\psi$.
In order to fix ideas, we assume that an $N > 1$ has been chosen once and for all and, furthermore, that
\[ a(h.) = 0, \qquad b(h.) = 2N-1, \]
as agreed upon in connection with the Daubechies wavelets. The following abbreviations will prove useful:
\[ \{0, 1, \ldots, 2N-1\} =: J . \]
For the description of the binary rational numbers we use the handy notation
\[ \mathbb{D}_r := \{\, k\,2^{-r} \mid k \in \mathbb{Z} \,\} \quad (r \ge 0), \qquad \mathbb{D} := \bigcup_{r \ge 0} \mathbb{D}_r ; \]
therefore we have the inclusions
\[ \mathbb{Z} = \mathbb{D}_0 \subset \mathbb{D}_1 \subset \ldots \subset \mathbb{D}_r \subset \mathbb{D}_{r+1} \subset \ldots \subset \mathbb{D} , \]
and $\mathbb{D}$ is dense in $\mathbb{R}$.
The scaling equation now has the form
\[ \phi(t) = \sqrt2 \sum_{k=0}^{2N-1} h_k\,\phi(2t-k), \qquad h_0\,h_{2N-1} \ne 0 . \tag{1} \]
The "direct method" is founded on the following three simple facts:
If $t \in \mathbb{D}_r$ for some $r \ge 1$, then the numbers $2t - k$ $(k \in J)$ belong to $\mathbb{D}_{r-1}$.
If $t < 0$, then the numbers $2t - k$ $(k \in J)$ are $< 0$ as well.
If $t > 2N-1$, then the numbers $2t - k$ $(k \in J)$ are $> 2N-1$ as well.
On account of these facts the scaling equation (1) allows us to compute the values of $\phi$ successively on
\[ \mathbb{D}_1,\ \mathbb{D}_2,\ \mathbb{D}_3,\ \ldots , \]
and therefore on all of $\mathbb{D}$, if only these values have been determined on $\mathbb{D}_0 = \mathbb{Z}$ beforehand. Moreover, if $\phi(k) = 0$ for $k \in \mathbb{Z}_{<0}$ and $k \in \mathbb{Z}_{>2N-1}$ to begin with, then automatically $\phi(t) = 0$ for all $t \in \mathbb{D}_{<0} \cup \mathbb{D}_{>2N-1}$. (As a matter of fact, one has $\phi(0) = \phi(2N-1) = 0$ as well. The latter will emerge from the calculation of $\phi|\mathbb{Z}$.)
Now for $\phi|\mathbb{Z}$: In any case, the wholesale assignment
\[ \phi(k) := 0 \qquad (k \in \mathbb{Z} \setminus J) \]
is in agreement with (1). Therefore, we are left with the system of homogeneous equations $\phi(j) = \sqrt2 \sum_k h_k\,\phi(2j-k)$, or, equivalently,
\[ \phi(j) = \sqrt2 \sum_{k=0}^{2N-1} h_{2j-k}\,\phi(k) \qquad (0 \le j \le 2N-1) , \tag{2} \]
for the vector $(\phi(j) \mid j \in J) =: a$. This means that the $(J \times J)$-matrix
\[ B_{jk} := \sqrt2\,h_{2j-k} \qquad ((j,k) \in J \times J) \]
should have an eigenvector $a$ corresponding to the eigenvalue 1. In this regard we shall prove the following:
(6.9) The matrix $B$ has 1 as an eigenvalue in any case. If this eigenvalue is simple, then there is exactly one corresponding eigenvector $a$ such that
\[ \sum_{k \in J} a_k = 1 . \tag{3} \]
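⌈ As an illustration of this theorem we show here the matrix $B$ in the case $N = 3$:
\[ B = \sqrt2 \begin{pmatrix} h_0 & 0 & 0 & 0 & 0 & 0 \\ h_2 & h_1 & h_0 & 0 & 0 & 0 \\ h_4 & h_3 & h_2 & h_1 & h_0 & 0 \\ 0 & h_5 & h_4 & h_3 & h_2 & h_1 \\ 0 & 0 & 0 & h_5 & h_4 & h_3 \\ 0 & 0 & 0 & 0 & 0 & h_5 \end{pmatrix} \tag{4} \]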
For the proof we argue about the column sums of $B$. To this end we consider again the generating function $H$, as given in 5.3.(3). Because of
\[ H(\pi) = 0 \]
we have, in addition to (5.5), the equation
\[ \sum_k (-1)^k h_k = 0 , \]
so that the following is true:
\[ \sum_l h_{2l} = \sum_l h_{2l+1} = \frac{1}{\sqrt2} . \]
A glance at (4) shows that the matrix $B$ has (at least in the case $N = 3$) constant column sums 1. Of course, this is true in general:
\[ \sum_j B_{jk} = \sqrt2 \sum_j h_{2j-k} = \begin{cases} \sqrt2 \sum_l h_{2l} = 1 & (k \text{ even}) \\ \sqrt2 \sum_l h_{2l+1} = 1 & (k \text{ odd}) \end{cases} \]
and it is easy to verify that for each $k \in J$ the sum extends over all $h_{2l} \ne 0$ resp. all $h_{2l+1} \ne 0$. What we have found can be expressed in other words as follows: The vector $e := (1 \mid j \in J)$ is an eigenvector of the matrix $B'$, corresponding to the eigenvalue 1. Such being the case, the matrix $B$ has 1 as an eigenvalue as well, and there is a corresponding eigenvector $a \ne 0$.
For the proof of the second part of the theorem, we note the following: By general principles (see [6], 58, Theorem 1), our space $X$ is the direct sum of two $B$-invariant subspaces $U$ and $V$ such that $B - 1_X$ is nilpotent on $U$ and invertible on $V$. Therefore the characteristic polynomial $q(\lambda)$ of $B$ can be decomposed as $q(\lambda) = (\lambda - 1)^m\,q_1(\lambda)$, where $m := \dim(U)$. Now by assumption on $q(\cdot)$ we have $m = 1$; therefore $U = \langle a \rangle$ and $\dim(V) = \dim(X) - 1$.
To any $y \in V$ there is an $x \in V$ such that $y = Bx - x$, and from this we conclude that
\[ \langle e, y \rangle = \langle e, Bx \rangle - \langle e, x \rangle = \langle B'e, x \rangle - \langle e, x \rangle = 0 . \]
This proves $V \subset \langle e \rangle^{\perp}$; by counting dimensions we therefore have $V = \langle e \rangle^{\perp}$. Because $a \notin V$, this implies
\[ \sum_k a_k = \langle e, a \rangle \ne 0 , \]
which is enough to show that the sum on the left can be normalized to 1. ⌋
Condition (3), resp. $\sum_{k \in J} \phi(k) = 1$, does not come out of the blue. As a matter of fact, one has the following theorem (cf. (6.1)):
(6.10) Suppose that the generating function $H$ is as in Theorem (6.1) and that $\hat\phi \in L^2$ is defined by the infinite product 6.1.(2). If $\phi$ is in reality a continuous function, satisfying an estimate of the form
\[ |\phi(t)| \le \frac{C}{1+t^2} \qquad (t \in \mathbb{R}) , \]
then the following identity holds:
\[ \sum_k \phi(x-k) \equiv 1 \qquad (x \in \mathbb{R}) . \tag{5} \]
⌈ By assumption on $\phi$ the auxiliary function
\[ g(x) := \sum_k \phi(x-k) \]
is a continuous periodic function of period 1 and has Fourier coefficients
\[ c_j = \int_0^1 g(x)\,e^{-2j\pi i x}\,dx = \sum_k \int_0^1 \phi(x-k)\,e^{-2j\pi i (x-k)}\,dx = \int \phi(x)\,e^{-2j\pi i x}\,dx = \sqrt{2\pi}\,\hat\phi(2j\pi) = \delta_{0j} \qquad (j \in \mathbb{Z}) , \]
where in the end we have used 6.1.(15). From this it follows that $g$ has the constant value 1, as stated. ⌋
For $N \ge 2$ the Daubechies scaling functions ${}_N\phi$ are continuous. We shall prove the continuity of ${}_2\phi$ below; for the general case, however, we refer the reader to [D], Chapter 7; other sources are [4] or [7]. The continuity implies that the ${}_N\phi$ satisfy their respective scaling equations identically in $t$; furthermore, the identity (5) is valid for them. (The latter statements are true for ${}_1\phi = \phi_{\rm Haar}$ as well.)
Returning to (2) we see that the numerical construction of ${}_N\phi$ is accomplished as follows: For certain general reasons the system (2) has a solution $(\phi(j) \mid j \in J) =: a$ such that $\sum_{k \in J} \phi(k) = 1$. All other $\phi(k)$ are set at zero, whence ${}_N\phi|\mathbb{Z}$ is now fixed. (If the multiplicity of the eigenvalue 1 of $B$ is in fact 1, then ${}_N\phi|\mathbb{Z}$ is uniquely determined by (2).) Starting from ${}_N\phi|\mathbb{Z}$, one successively computes the values ${}_N\phi(x)$ at all points $x \in \mathbb{D}$ using the iteration procedure we have described above. For a graphical representation of $\phi$ this is obviously sufficient, but it is not all: In principle the value ${}_N\phi(x)$ is available now at each and every point $x \in \mathbb{R}$, since ${}_N\phi$ is continuous and $\mathbb{D}$ is dense in $\mathbb{R}$.
We conclude this section with the following theorem, covering the case N = 2:
(6.11) The Daubechies scaling function ${}_2\phi$ is continuous.
In our proof we shall make use of the binary recursion procedure described above. Note that we are not allowed to use (5) here; on the contrary, the identity (5) will be a by-product of our argument. Our presentation essentially goes along the lines of the proof given in [14].
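⌈ We begin as in Example 6.2.②: According to 6.2.(6) one has
\[ P_2(y) = 1 + 2y \]
and consequently
\[ A(\xi) = P_2\Bigl(\sin^2\frac{\xi}{2}\Bigr) = 2 - \cos\xi = -\frac12\,e^{-i\xi} + 2 - \frac12\,e^{i\xi} . \]
To this $A(\xi)$ we have to apply Riesz' Lemma (6.8). If we compare coefficients in the identity
\[ (b_0 + b_1 e^{-i\xi})(b_0 + b_1 e^{i\xi}) = -\frac12\,e^{-i\xi} + 2 - \frac12\,e^{i\xi} , \]
then the two equations
\[ b_0^2 + b_1^2 = 2, \qquad b_0 b_1 = -\frac12 \]
result. We choose the solution $(b_0, b_1) = \bigl((1+\sqrt3)/2,\ (1-\sqrt3)/2\bigr)$, which leads to
\[ H(\xi) = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{2} B(\xi) = \frac18\,(1 + 2e^{-i\xi} + e^{-2i\xi})\bigl((1+\sqrt3) + (1-\sqrt3)\,e^{-i\xi}\bigr) \]
\[ = \frac18\,\bigl((1+\sqrt3) + (3+\sqrt3)\,e^{-i\xi} + (3-\sqrt3)\,e^{-2i\xi} + (1-\sqrt3)\,e^{-3i\xi}\bigr) , \]
whence $H(0) = 1$ is satisfied, too. In this way we obtain the following table, representing the coefficient vector $h.$:
\[ h_0 = \frac{1}{\sqrt2}\,\frac{1+\sqrt3}{4} = .4829629131445341 \]
\[ h_1 = \frac{1}{\sqrt2}\,\frac{3+\sqrt3}{4} = .8365163037378079 \]
\[ h_2 = \frac{1}{\sqrt2}\,\frac{3-\sqrt3}{4} = .2241438680420134 \]
\[ h_3 = \frac{1}{\sqrt2}\,\frac{1-\sqrt3}{4} = -.1294095225512604 . \]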
All calculations that follow will take place within the following range of real numbers:
\[ \mathbb{D}[\sqrt3] := \{\, x + y\sqrt3 \mid x, y \in \mathbb{D} \,\} . \]
The set $\mathbb{D}[\sqrt3]$ is obviously a ring, and the conjugation (complex numbers do not occur any more in this section)
\[ z = x + y\sqrt3 \ \mapsto\ \bar z := x - y\sqrt3 \]
is an automorphism of $\mathbb{D}[\sqrt3]$ that keeps the elements of the ground ring $\mathbb{D}$ fixed. The following two numbers will play a special role in our computations:
\[ \alpha := \frac{1+\sqrt3}{4} = .6830\ldots, \qquad \bar\alpha = \frac{1-\sqrt3}{4} = -.1830\ldots . \]
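If $\alpha$ and $\bar\alpha$ are inserted into the scaling equation (1), it takes the form
\[ \phi(t) = \alpha\,\phi(2t) + (1-\bar\alpha)\,\phi(2t-1) + (1-\alpha)\,\phi(2t-2) + \bar\alpha\,\phi(2t-3) , \tag{6} \]
and analogously the system of equations (2) becomes
\[ \begin{pmatrix} \phi(0) \\ \phi(1) \\ \phi(2) \\ \phi(3) \end{pmatrix} = \begin{pmatrix} \alpha & & & \\ 1-\alpha & 1-\bar\alpha & \alpha & \\ & \bar\alpha & 1-\alpha & 1-\bar\alpha \\ & & & \bar\alpha \end{pmatrix} \begin{pmatrix} \phi(0) \\ \phi(1) \\ \phi(2) \\ \phi(3) \end{pmatrix} . \tag{7} \]
The system (7) has exactly one solution that satisfies condition (3) as well, namely,
\[ \begin{pmatrix} \phi(0) \\ \phi(1) \\ \phi(2) \\ \phi(3) \end{pmatrix} = \begin{pmatrix} 0 \\ 2\alpha \\ 2\bar\alpha \\ 0 \end{pmatrix} . \]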
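The statements about the matrix $B$ are easily tested for $N = 2$. The following sketch builds $B_{jk} = \sqrt2\,h_{2j-k}$ from the four coefficients tabulated above (in floating point), and checks the column sums and the eigenvector $(0, 2\alpha, 2\bar\alpha, 0)$; the function name `build_B` is ours.

```python
import math

SQRT2, SQRT3 = math.sqrt(2.0), math.sqrt(3.0)

# Coefficient vector h. of the Daubechies scaling function for N = 2.
h = [(1 + SQRT3) / (4 * SQRT2), (3 + SQRT3) / (4 * SQRT2),
     (3 - SQRT3) / (4 * SQRT2), (1 - SQRT3) / (4 * SQRT2)]

def build_B(h):
    """Matrix B[j][k] = sqrt(2) * h[2j - k], with h_i = 0 outside 0..len(h)-1."""
    size = len(h)
    return [[SQRT2 * h[2 * j - k] if 0 <= 2 * j - k < size else 0.0
             for k in range(size)] for j in range(size)]

B = build_B(h)
column_sums = [sum(B[j][k] for j in range(4)) for k in range(4)]

alpha, alpha_bar = (1 + SQRT3) / 4, (1 - SQRT3) / 4
a = [0.0, 2 * alpha, 2 * alpha_bar, 0.0]
Ba = [sum(B[j][k] * a[k] for k in range(4)) for j in range(4)]

print(column_sums)
print(Ba, a)
```

The same function can be applied to the six coefficients of the case $N = 3$, where it reproduces the pattern (4).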
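As has been said twice before, we set $\phi(k) := 0$ for all remaining $k \in \mathbb{Z}$. Then $\phi(\cdot)$ is recursively determined on all of $\mathbb{D}$ by (6). We assert that the resulting function $\phi: \mathbb{D} \to \mathbb{R}$ has the properties listed below:
(6.12) For all $x \in \mathbb{D}$, the following are true:
(a) $\phi(x) \in \mathbb{D}[\sqrt3]$,
(b) $\phi(3-x) = \overline{\phi(x)}$,
(c) $\sum_k \phi(x-k) = 1$,
(d) $\sum_k k\,\phi(x-k) = x - 2\alpha - 4\bar\alpha$.
⌈ For an $x \in \mathbb{D}_0 = \mathbb{Z}$, the statements (a)-(c) are true. In order to verify (d) for $\phi|\mathbb{Z}$ we write $\phi|\mathbb{Z}$ in the form
\[ \phi(x) = 2\alpha\,\delta_{x,1} + 2\bar\alpha\,\delta_{x,2} \qquad (x \in \mathbb{Z}) . \]
Then
\[ \phi(x-k) = 2\alpha\,\delta_{x-k,1} + 2\bar\alpha\,\delta_{x-k,2} = 2\alpha\,\delta_{x-1,k} + 2\bar\alpha\,\delta_{x-2,k} \qquad (x, k \in \mathbb{Z}) , \]
and this implies the following chain of equations for arbitrary $x \in \mathbb{Z}$:
\[ \sum_k k\,\phi(x-k) = 2\alpha \sum_k k\,\delta_{x-1,k} + 2\bar\alpha \sum_k k\,\delta_{x-2,k} = 2\alpha(x-1) + 2\bar\alpha(x-2) = x - 2\alpha - 4\bar\alpha . \]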
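We now assume that the relations (a)-(d) are true for all $x \in \mathbb{D}_r$ and consider an arbitrary $t \in \mathbb{D}_{r+1}$. All numbers $2t - k$ belong to $\mathbb{D}_r$; therefore, one may read off immediately from (6) that $\phi(t)$ lies in $\mathbb{D}[\sqrt3]$ as well. Regarding (b) and (c), one has
\[ \phi(3-t) = \alpha\,\phi(6-2t) + (1-\bar\alpha)\,\phi(5-2t) + (1-\alpha)\,\phi(4-2t) + \bar\alpha\,\phi(3-2t) \]
\[ = \alpha\,\overline{\phi(2t-3)} + (1-\bar\alpha)\,\overline{\phi(2t-2)} + (1-\alpha)\,\overline{\phi(2t-1)} + \bar\alpha\,\overline{\phi(2t)} = \overline{\phi(t)} \]
and
\[ \sum_k \phi(t-k) = \sum_k \bigl( \alpha\,\phi(2t-2k) + (1-\bar\alpha)\,\phi(2t-2k-1) + (1-\alpha)\,\phi(2t-2k-2) + \bar\alpha\,\phi(2t-2k-3) \bigr) \]
\[ = \bigl(\alpha + (1-\alpha)\bigr) \sum_k \phi(2t-2k) + \bigl((1-\bar\alpha) + \bar\alpha\bigr) \sum_k \phi(2t-2k-1) = \sum_l \phi(2t-l) = 1 . \]
Finally, the induction step for (d):
\[ \sum_k k\,\phi(t-k) = \sum_k k\,\bigl( \alpha\,\phi(2t-2k) + (1-\bar\alpha)\,\phi(2t-2k-1) + (1-\alpha)\,\phi(2t-2k-2) + \bar\alpha\,\phi(2t-2k-3) \bigr) \]
\[ = \sum_k \bigl(\alpha k + (1-\alpha)(k-1)\bigr)\,\phi(2t-2k) + \sum_k \bigl((1-\bar\alpha)k + \bar\alpha(k-1)\bigr)\,\phi(2t-2k-1) \]
\[ = \frac12 \sum_k (2k + 2\alpha - 2)\,\phi(2t-2k) + \frac12 \sum_k \bigl((2k+1) - 1 - 2\bar\alpha\bigr)\,\phi(2t-2k-1) \]
\[ = \frac12 \sum_l l\,\phi(2t-l) + (\alpha-1) \sum_l \phi(2t-l) = \frac12\,(2t - 2\alpha - 4\bar\alpha) + \alpha - 1 = t - 2\alpha - 4\bar\alpha . \]
In the last part we used the relation $2\alpha + 2\bar\alpha = 1$ several times.
In view of this induction proof, property (d) seems to come as a miracle. In reality this property may be related to certain general principles in a similar way as (c) has its theoretical foundation in Theorem (6.10).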
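The recursion (6) and the properties (6.12)(c), (d) lend themselves to a direct numerical experiment. The sketch below computes $\phi$ on $\mathbb{D}_r \cap [0,3]$ level by level, starting from $\phi(1) = 2\alpha$, $\phi(2) = 2\bar\alpha$ and $\phi = 0$ at all other integers. Points are stored as exact fractions, the values themselves as floating point numbers; the function name `daubechies2_phi` and the choice of six levels are ours.

```python
from fractions import Fraction
import math

SQRT3 = math.sqrt(3.0)
ALPHA, ALPHA_BAR = (1 + SQRT3) / 4, (1 - SQRT3) / 4
# Coefficients of phi(2t), phi(2t-1), phi(2t-2), phi(2t-3) in (6).
COEFF = [ALPHA, 1 - ALPHA_BAR, 1 - ALPHA, ALPHA_BAR]

def daubechies2_phi(levels):
    """Values of phi at all points of D_levels in [0, 3], keyed by Fraction."""
    phi = {Fraction(0): 0.0, Fraction(1): 2 * ALPHA,
           Fraction(2): 2 * ALPHA_BAR, Fraction(3): 0.0}
    for r in range(1, levels + 1):
        step = Fraction(1, 2 ** r)
        # New points of D_r in (0, 3): odd multiples of 2^-r.
        for m in range(1, 3 * 2 ** r, 2):
            t = m * step
            # 2t - k lies in D_{r-1}; outside [0, 3] the value is 0.
            phi[t] = sum(COEFF[k] * phi.get(2 * t - k, 0.0) for k in range(4))
    return phi

phi = daubechies2_phi(6)
x = Fraction(5, 64)
print(phi[x] + phi[x + 1] + phi[x + 2])                              # (6.12)(c)
print(-phi[x + 1] - 2 * phi[x + 2], float(x) - (3 - SQRT3) / 2)      # (6.12)(d)
```

With six levels the dictionary contains $3 \cdot 64 + 1$ values, which is already enough for a plot of ${}_2\phi$.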
Now consider the formulas (6.12)(c) and (d) when $x$ is restricted to the interval $0 \le x \le 1$. Because of $\operatorname{supp}(\phi) = [0,3]$ we obtain the two equations
\[ \phi(x) + \phi(x+1) + \phi(x+2) = 1 , \qquad -\phi(x+1) - 2\phi(x+2) = x - 2\alpha - 4\bar\alpha , \]
and from these the following formulas result through elimination:
\[ \left. \begin{aligned} \phi(x+1) &= -2\phi(x) + x + 2\alpha \\ \phi(x+2) &= \phi(x) - x + 2\bar\alpha \end{aligned} \right\} \qquad (x \in \mathbb{D},\ 0 \le x \le 1) . \tag{8} \]
We stick for a moment to the $x$-interval $[0,1]$. Because of $\operatorname{supp}(\phi) = [0,3]$, for such $x$ the number of terms in the scaling equation can be reduced as follows:
\[ \phi(x) = \begin{cases} \alpha\,\phi(2x) & (0 \le x \le \frac12) \\ \alpha\,\phi(2x) + (1-\bar\alpha)\,\phi(2x-1) & (\frac12 \le x \le 1) \end{cases} \tag{9} \]
The second line of (9) is not yet in its optimal form. If $\frac12 \le x \le 1$, then there is a $u \in [0,1]$ such that $2x = u + 1$. Using the first formula (8) we therefore may write
\[ \phi(2x) = \phi(u+1) = -2\phi(u) + u + 2\alpha = -2\phi(2x-1) + 2x - 1 + 2\alpha \]
and consequently
\[ \alpha\,\phi(2x) + (1-\bar\alpha)\,\phi(2x-1) = (-2\alpha + 1 - \bar\alpha)\,\phi(2x-1) + 2\alpha x - \alpha + 2\alpha^2 = \bar\alpha\,\phi(2x-1) + 2\alpha x + \frac14 . \]
This means that we can replace (9) by
\[ \phi(x) = \begin{cases} \alpha\,\phi(2x) & (x \in \mathbb{D},\ 0 \le x \le \frac12) \\ \bar\alpha\,\phi(2x-1) + 2\alpha x + \frac14 & (x \in \mathbb{D},\ \frac12 \le x \le 1) \end{cases} \tag{10} \]
In this way we have obtained a reproduction scheme for $\phi$ referring to the interval $[0,1]$ only. In both lines of (10) there is a single $\phi$-term on the right hand side, and, what's more, at both occurrences of such a term the coefficients have an absolute value $< 1$. This fact is going to be the main ingredient of our continuity proof.
We let $X$ be the space of all continuous functions $f: [0,1] \to \mathbb{R}$ assuming at 0 and 1 resp. the values 0 and $2\alpha$ and provide it with the metric
\[ d(f,g) := \sup_{0 \le x \le 1} |f(x) - g(x)| . \]
By general principles $X$ is a complete metric space. We now assert that the following proposition is valid:
(6.13) The formula
\[ Tf(x) := \begin{cases} \alpha\,f(2x) & (0 \le x \le \frac12) \\ \bar\alpha\,f(2x-1) + 2\alpha x + \frac14 & (\frac12 \le x \le 1) \end{cases} \tag{11} \]
defines a contracting mapping $T: X \to X$; to be precise, one has
\[ d(Tf, Tg) \le \alpha\,d(f,g) \qquad \forall f, g \in X . \tag{12} \]
⌈ If $f(0) = 0$ and $f(1) = 2\alpha$, then $Tf(0) = 0$ and $Tf(1) = 2\alpha$ as well. Furthermore, one has $Tf(\frac12) = 2\alpha^2$, this being the case regardless of whether the value has been computed using the first or the second line of (11). Finally it becomes clear from a look at (11) that for any $f \in X$ the image $Tf$ is continuous on each of the two half-intervals $[0, \frac12]$ and $[\frac12, 1]$, and as a consequence $Tf$ is continuous on all of $[0,1]$. Altogether, we have shown that $T$ is a well defined map from $X$ to $X$.
Now let $f$ and $g$ be two arbitrary functions in $X$. For $0 \le x \le \frac12$ one has
\[ |Tf(x) - Tg(x)| = \alpha\,|f(2x) - g(2x)| \le \alpha\,d(f,g) , \]
and for $\frac12 \le x \le 1$ the following is true:
\[ |Tf(x) - Tg(x)| = \bigl| \bigl(\bar\alpha f(2x-1) + 2\alpha x + \tfrac14\bigr) - \bigl(\bar\alpha g(2x-1) + 2\alpha x + \tfrac14\bigr) \bigr| = |\bar\alpha|\,|f(2x-1) - g(2x-1)| \le |\bar\alpha|\,d(f,g) . \]
Because of $|\bar\alpha| < \alpha$ $(< 1)$ we therefore have $|Tf(x) - Tg(x)| \le \alpha\,d(f,g)$ for all $x \in [0,1]$, and (12) is proven. ⌋
From (6.13) it follows by the general fixed point theorem that there is a unique function $f^* \in X$ satisfying $Tf^* = f^*$. This function $f^*$ coincides in the points of $\mathbb{D} \cap [0,1]$ with the function $\phi: \mathbb{D} \to \mathbb{R}$ constructed earlier, because at the points 0 and 1 the function $f^*$ has the same values as $\phi$ has, and because the reproduction scheme (11), applied to $f := f^*$ $(\Rightarrow Tf = f^*)$, goes over into the reproduction scheme (10) for the function $\phi|(\mathbb{D} \cap [0,1])$. From this it follows that our $\phi: \mathbb{D} \to \mathbb{R}$, restricted to $0 \le x \le 1$, has a continuous extension on all of $[0,1]$. Now from (8) one concludes that such continuous extensions exist in the intervals $[1,2]$ and $[2,3]$ as well, and outside of $[0,3]$ the definition $\phi(x) := 0$ trivially makes for a continuous extension.
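The contraction argument can be watched at work on a grid. In the following sketch a function $f$ on $[0,1]$ is represented by its samples at the points $m/M$ ($M$ a power of two); since $2x$ resp. $2x - 1$ are again grid points, the map $T$ from (11) acts exactly on such sample vectors. The starting functions, the grid size $M = 64$ and the number of iterations are arbitrary choices of ours.

```python
import math

ALPHA = (1 + math.sqrt(3.0)) / 4
ALPHA_BAR = (1 - math.sqrt(3.0)) / 4

def apply_T(f):
    """One application of T from (11) to samples f[m] = f(m / M), m = 0..M."""
    M = len(f) - 1
    out = []
    for m in range(M + 1):
        if 2 * m <= M:                       # 0 <= x <= 1/2
            out.append(ALPHA * f[2 * m])
        else:                                # 1/2 < x <= 1
            out.append(ALPHA_BAR * f[2 * m - M] + 2 * ALPHA * m / M + 0.25)
    return out

def sup_dist(f, g):
    return max(abs(a - b) for a, b in zip(f, g))

M = 64
f = [2 * ALPHA * m / M for m in range(M + 1)]            # f(0) = 0, f(1) = 2 alpha
g = [2 * ALPHA * (m / M) ** 2 for m in range(M + 1)]     # same boundary values
ratio = sup_dist(apply_T(f), apply_T(g)) / sup_dist(f, g)

for _ in range(80):
    f = apply_T(f)

print(ratio, ALPHA)
print(f[M // 2], 2 * ALPHA ** 2)
```

The observed ratio does not exceed $\alpha$, in accordance with (12), and after the iteration the sample vector is (numerically) a fixed point of $T$, i.e. it carries the values of ${}_2\phi$ at the points $m/64$ of $[0,1]$.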
Figure 6.5 The Daubechies scaling function ${}_2\phi$
Let us summarize our results so far:
(6.14) There is a unique continuous function $\phi: \mathbb{R} \to \mathbb{R}$ having support $[0,3]$ and satisfying, identically in $x$, the following equations:
(a) $\phi(x) = \sqrt2 \sum_{k=0}^{3} h_k\,\phi(2x-k)$,
(b) $\sum_k \phi(x-k) = 1$,
(c) $\sum_k k\,\phi(x-k) = x - \dfrac{3-\sqrt3}{2}$.
⌈ (a) The function $u(x) := \phi(x) - \sqrt2 \sum_{k=0}^{3} h_k\,\phi(2x-k)$ is continuous and vanishes at all points of $\mathbb{D}$, consequently $u(x) \equiv 0$.
In any bounded $x$-interval, the left hand side of (b) is a finite sum and therefore a continuous function $v(\cdot)$. According to (6.12)(c) this function takes the value 1 at all points of $\mathbb{D}$, therefore we have $v(x) \equiv 1$ on all of $\mathbb{R}$.
In an analogous manner one obtains the identity (c) from (6.12)(d). ⌋
The function $\phi: \mathbb{R} \to \mathbb{R}$ we constructed here is in fact the Daubechies scaling function ${}_2\phi$, for (6.14)(a) implies
\[ \hat\phi(\xi) = H\Bigl(\frac{\xi}{2}\Bigr)\,\hat\phi\Bigl(\frac{\xi}{2}\Bigr) , \]
and from (6.14)(b) one concludes
\[ \sqrt{2\pi}\,\hat\phi(0) = \int_0^3 \phi(x)\,dx = \int_0^1 \sum_{k=0}^{2} \phi(x+k)\,dx = \int_0^1 \sum_k \phi(x+k)\,dx = 1 . \]
Altogether this means that 6.1.(2) is true. It follows that our $\phi$ is the "original", i.e., time domain version of the unique scaling function belonging to the coefficient vector $(h_0, \ldots, h_3)$. This function, by definition, is ${}_2\phi$; but up to this point it was analytically available to us only in the form $\widehat{{}_2\phi}$. ⌋
In Figures 6.5 and 6.6, the functions ${}_2\phi$ and ${}_2\psi$ are shown. These figures have been created by means of the described recursion procedure, computing $3 \cdot 256$ values in each of the two cases.
Figure 6.6 The Daubechies wavelet ${}_2\psi$
6.4 Spline wavelets
In this last section we construct the so-called Battle-Lemarié wavelets. The starting material are spline functions, and that's why these wavelets are occasionally called spline wavelets as well, even though they are no longer spline functions. At the same time, the Battle-Lemarié wavelets, in contradiction to the title of the current chapter, don't have compact support either. Nevertheless it will be possible to use the formalism that we have erected in the foregoing sections for the treatment of these wavelets as well. But let's take everything in turn!
Another glance at the scaling equation in the form 5.3.(4) shows that, given two pairs $(\hat\phi_1, H_1)$ and $(\hat\phi_2, H_2)$, each of them satisfying such an equation, the pair $(\hat\phi_1 \cdot \hat\phi_2,\ H_1 \cdot H_2)$ satisfies such an equation as well. To multiplication in the Fourier domain corresponds convolution in the time domain; in other words, if $\phi_1$ and $\phi_2$ are scaling functions, then $\phi_1 * \phi_2$ will satisfy a scaling equation as well. Therefore, beginning with $\phi_0 := \phi_{\rm Haar}$ and setting up the recursion scheme $\phi_{n+1} := \phi_0 * \phi_n$ $(n \ge 0)$, we should obtain a sequence of ever more regular functions that a priori satisfy scaling equations and could maybe be adapted to be useful in the construction of wavelets.
We are going to change our notation to some extent, for the functions obtained
in this way have previously appeared in numerical practice, going by the name
of B-splines (for "basis splines"), and they play an important role in the
general theory of spline approximation. Various notations for these functions
can be found in the literature, among them the following, which suits our
purposes well enough:
\[ B_0(x) := \begin{cases} 1 & (0 \le x < 1) \\ 0 & (\text{otherwise}) \end{cases} \qquad B_{n+1}(x) := (B_0 * B_n)(x) = \int_{x-1}^{x} B_n(t)\,dt \quad (n \ge 0) . \tag{1} \]
Doing the actual computation one finds, e.g., that the cubic B-spline is given by the following formulas:
\[ B_3(x) = \begin{cases} \frac16\,x^3 & (0 \le x \le 1) \\ \frac23 - 2x + 2x^2 - \frac12\,x^3 & (1 \le x \le 2) \\ B_3(4-x) & (2 \le x \le 4) \\ 0 & (\text{otherwise}) . \end{cases} \]
Figure 6.7 shows the graphs of $B_1$, $B_2$ and $B_3$.
The easy verification of the following statements is left to the reader:
\[ \operatorname{supp}(B_n) = [0, n+1] , \qquad \int B_n(x)\,dx = 1 \qquad (n \ge 0); \]
furthermore, one has
$(n \ge 1)$.
Figure 6.7
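Since for all practical purposes $B_0 = \phi_{\rm Haar}$, copying 5.3.(20) gives
\[ \hat B_0(\xi) = \frac{1}{\sqrt{2\pi}}\,e^{-i\xi/2}\,\frac{\sin(\xi/2)}{\xi/2} . \]
The convolution theorem (2.10) converts the recursion formula (1) into the formula
\[ \hat B_{n+1}(\xi) = \sqrt{2\pi}\,\hat B_0(\xi)\,\hat B_n(\xi) , \]
and by multiplicative accumulation one obtains
\[ \hat B_n(\xi) = \frac{1}{\sqrt{2\pi}} \Bigl( e^{-i\xi/2}\,\frac{\sin(\xi/2)}{\xi/2} \Bigr)^{n+1} \qquad (n \ge 0) . \tag{2} \]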
The following can immediately be read off from this representation of Bn:
(3)
On account of what was said at the beginning of this section, we now expect
that each B-spline Bn satisfies a scaling equation. As a matter of fact, we
have
and consequently
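This means that
\[ \hat B_n(\xi) = H_n\Bigl(\frac{\xi}{2}\Bigr)\,\hat B_n\Bigl(\frac{\xi}{2}\Bigr) , \tag{4} \]
where the generating function $H_n$ is given by
\[ H_n(\xi) := \Bigl( e^{-i\xi/2} \cos\frac{\xi}{2} \Bigr)^{n+1} = \Bigl( \frac{1+e^{-i\xi}}{2} \Bigr)^{n+1} . \tag{5} \]
We see that the coefficients $h_k$ ($h_k^{(n)}$, really) of $H_n$ have the following values:
\[ h_k = \begin{cases} \dfrac{\sqrt2}{2^{n+1}} \dbinom{n+1}{k} & (0 \le k \le n+1) \\ 0 & (\text{otherwise}) \end{cases} \]
so that the scaling equation in the time domain takes on the following form:
\[ B_n(x) = \frac{1}{2^n} \sum_{k=0}^{n+1} \binom{n+1}{k}\,B_n(2x-k) \qquad (x \in \mathbb{R}) . \]
That the $B_n$ would satisfy such identities could not immediately be guessed from looking at their definition!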
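For the cubic B-spline the scaling equation can be confirmed with exact arithmetic. The sketch below uses the piecewise formula for $B_3$ quoted earlier in this section and checks $B_3(x) = 2^{-3} \sum_{k=0}^{4} \binom{4}{k} B_3(2x-k)$ at a number of rational points; the function names are ours.

```python
from fractions import Fraction
from math import comb

def B3(x):
    """Cubic B-spline with support [0, 4], evaluated exactly."""
    x = Fraction(x)
    if x < 0 or x > 4:
        return Fraction(0)
    if x > 2:                      # symmetry B3(x) = B3(4 - x)
        x = 4 - x
    if x <= 1:
        return x ** 3 / 6
    return Fraction(2, 3) - 2 * x + 2 * x ** 2 - x ** 3 / 2

def two_scale_rhs(x):
    """Right hand side of the scaling equation for n = 3."""
    x = Fraction(x)
    return Fraction(1, 8) * sum(comb(4, k) * B3(2 * x - k) for k in range(5))

points = [Fraction(m, 16) for m in range(-8, 80)]
print(all(B3(x) == two_scale_rhs(x) for x in points))
print(B3(Fraction(3, 2)), two_scale_rhs(Fraction(3, 2)))
```

At $x = \frac32$, for instance, both sides are equal to $\frac{23}{48}$.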
In order to check whether $B_n$ can be used as a scaling function, according to (5.9) we have to examine the $2\pi$-periodic function
\[ \Phi_n(\xi) := \sum_l |\hat B_n(\xi + 2\pi l)|^2 . \tag{6} \]
Because of (3) the series appearing on the right is uniformly convergent. It follows that $\Phi_n$ is a continuous function (we shall compute $\Phi_n$ explicitly later on). Furthermore, we obtain, using (2) and the inequality
\[ \frac{\sin x}{x} \ge \frac{2}{\pi} \qquad \bigl(0 < |x| \le \tfrac{\pi}{2}\bigr) , \]
the following estimate:
\[ |\hat B_n(\xi)|^2 = \frac{1}{2\pi} \Bigl( \frac{\sin(\xi/2)}{\xi/2} \Bigr)^{2n+2} \ge \frac{1}{2\pi} \Bigl( \frac{2}{\pi} \Bigr)^{2n+2} \qquad (0 < |\xi| \le \pi) . \]
Under these circumstances there are numbers $B \ge A > 0$ ($B$ and $A$ depend on $n$) such that
\[ A \le \Phi_n(\xi) \le B \qquad \forall \xi \in \mathbb{R} , \]
and on account of part (a) of Theorem (5.14) we come to the conclusion that the translates $B_n(\cdot - k)$ $(k \in \mathbb{Z})$ constitute a Riesz basis of the space
\[ V_0 := \operatorname{span}\bigl( B_n(\cdot - k) \bigm| k \in \mathbb{Z} \bigr) . \]
The proof of the following lemma is deferred to a later point:
(6.15) There are polynomials $p_n$ of respective degree $n$ such that the following is true:
\[ \Phi_n(\xi) = \frac{1}{2\pi}\,p_n(\cos\xi) \qquad (n \ge 0) . \]
The $p_n$ can be computed recursively and have rational coefficients.
We now suppose that an $n \ge 1$ has been chosen and remains fixed in what follows. Part (b) of Theorem (5.14) describes an orthonormalization procedure; in particular, it gives a formula for the "definitive" scaling function $\phi$ corresponding to the chosen $n$, meaning that the translates $\phi(\cdot - k)$ $(k \in \mathbb{Z})$ of $\phi$ are in fact orthonormal. The formula in question is
\[ \hat\phi(\xi) = \frac{\hat B_n(\xi)}{\sqrt{2\pi\,\Phi_n(\xi)}} = \frac{\hat B_n(\xi)}{\sqrt{p_n(\cos\xi)}} . \tag{7} \]
In order to get an expression for $\phi$ in the time domain, we develop the function $1/\sqrt{p_n(\cos\xi)}$ into a Fourier series:
\[ \frac{1}{\sqrt{p_n(\cos\xi)}} = \sum_k c_k\,e^{-ik\xi} . \]
Inserting this into (7) and applying rule (R1) we finally obtain the following representation of the scaling function $\phi$ corresponding to the chosen $n$:
\[ \phi(x) = \sum_k c_k\,B_n(x-k) . \tag{8} \]
It has to be admitted, however, that the coefficients
\[ c_k = c_{-k} = \frac{1}{\pi} \int_0^{\pi} \frac{\cos(k\xi)}{\sqrt{p_n(\cos\xi)}}\,d\xi \qquad (k \ge 0) \]
appearing here have to be computed numerically one by one.
Since $1/\sqrt{p_n(\cos\xi)}$ is a real-analytic $2\pi$-periodic function, the $c_k$ have exponential decay when $|k| \to \infty$: There is a $\rho < 1$ such that
\[ |c_k| \le C\,\rho^{|k|} , \]
and because of $\operatorname{supp}(B_n) = [0, n+1]$ it easily follows from this that $\phi(x)$ is exponentially decaying when $|x| \to \infty$ as well. But the compact support of $B_n$ has been lost in the orthogonalization process.
Proceeding along the lines of the general theory, we further need the modified generating function $H^{\#}$, and in order to be able to work with the mother wavelet $\psi$ corresponding to the above $\phi$ we need the coefficients $h_r^{\#}$ in the representation
\[ H^{\#}(\xi) = \frac{1}{\sqrt2} \sum_r h_r^{\#}\,e^{-ir\xi} . \tag{9} \]
From (7) we conclude because of (4) that
\[ H^{\#}(\xi) = H_n(\xi)\,\sqrt{\frac{p_n(\cos\xi)}{p_n(\cos(2\xi))}} . \]
Therefore, by means of (5), we get the representation
\[ H^{\#}(\xi) = \Bigl( \frac{1+e^{-i\xi}}{2} \Bigr)^{n+1} \sqrt{\frac{p_n(\cos\xi)}{p_n(\cos(2\xi))}} , \tag{10} \]
from which one can read off already that $\psi$ has the order $n+1$. The square root on the right now has to be developed into a Fourier series:
Pn(cosf.)
Pn (cos(2f.))
here again the coefficients
(k ;::: 0) (11)
have to be computed numerically one by one. Comparing coefficients in (9) and (10) we obtain the following formula for the $h_r^{\#}$:
(12)
Only now are we in a position to compute the Battle-Lemarié wavelet resp. spline wavelet $\psi$ corresponding to the chosen $n$. On account of 5.3.(16) resp. (8) we have
'IjJ(t) = J2 2:::( _1)k-l (2t - k)
k
= J2 2::: 2:::( _1)k-l Bn(2t - k -l)
k I
= J2 2::: 2) _1)k-l Cr-k Bn(2t - r) .
r k
This means that we should introduce the new set of coefficients
b
r
:= J2 2:::(-I)k-l Cr-k,
k
and in this way we get definitively
'IjJ(t) = 2::: br Bn(2t - r) .
r
How many terms of this expansion actually have to be taken into consideration
is best decided "at run time".
The last formula has brought our discussion to a close. It remains to supply
the proof of Lemma (6.15).
I Inserting (2) into the definition (6) of we get
( )
1. 2n+2 e '\:"' 1 1 . 2n+2 e s (t::)
e = -2 sm -2 L.... f; 2n+2 = -2 SIn -2 n <" ,
7r I 7rl) 7r
where we have introduced the auxiliary function
194 6 Orthonormal wavelets wit.h compact. support
As is easily verified, one has
(n:::: 1) ,
and t.his leads to the following recursion formula for the Ifln :
(13)
It remains to knead this prescription into a more practicable form.
Since the $B_0(\cdot - k)$ $(k \in \mathbb{Z})$ are in fact orthonormal, we have $\Phi_0(\xi) \equiv \frac{1}{2\pi}$.
Setting $\cos\xi =: y$, we introduce a new variable $y$ and write the function $\Phi_n$
in the following form:
$$\Phi_n(\xi) = \frac{1}{2\pi}\,P_n(y)\,; \qquad P_0(y) \equiv 1\,.$$
We are now going to insert this into (13). In so doing we must observe the
following differentiation rules:
$$\frac{d}{d\xi} = -\sin\xi\,\frac{d}{dy}\,, \qquad \frac{d^2}{d\xi^2} = -y\,\frac{d}{dy} + (1-y^2)\,\frac{d^2}{dy^2}\,.$$
In this way the recursion formula (13) becomes
$$P_n(y) = \frac{1}{n(2n+1)}\,(1-y)^{n+1}\left(-y\,\Bigl(\frac{P_{n-1}(y)}{(1-y)^n}\Bigr)^{\textstyle\cdot} + (1-y^2)\,\Bigl(\frac{P_{n-1}(y)}{(1-y)^n}\Bigr)^{\textstyle\cdot\cdot}\right), \qquad (14)$$
where the dot $\dot{}$ denotes differentiation with respect to the variable $y$. By
computing successively
$$\Bigl(\frac{P_{n-1}(y)}{(1-y)^n}\Bigr)^{\textstyle\cdot} = \frac{\dot P_{n-1}}{(1-y)^n} + n\,\frac{P_{n-1}}{(1-y)^{n+1}}\,,$$
$$\Bigl(\frac{P_{n-1}(y)}{(1-y)^n}\Bigr)^{\textstyle\cdot\cdot} = \frac{\ddot P_{n-1}}{(1-y)^n} + 2n\,\frac{\dot P_{n-1}}{(1-y)^{n+1}} + n(n+1)\,\frac{P_{n-1}}{(1-y)^{n+2}}\,,$$
we get rid of the denominators in (14):
$$P_n(y) = \frac{1}{n(2n+1)}\Bigl(-y(1-y)\,\dot P_{n-1} - n y\,P_{n-1} + (1+y)(1-y)^2\,\ddot P_{n-1} + 2n(1-y^2)\,\dot P_{n-1} + n(n+1)(1+y)\,P_{n-1}\Bigr)\,.$$
This can be slightly simplified by collecting like terms. In this way we obtain
the following definitive recursion formula for the $P_n$:
$$P_n = \frac{1}{n(2n+1)}\Bigl(n(n+1+ny)\,P_{n-1} + (1-y)\bigl(2n+(2n-1)y\bigr)\,\dot P_{n-1} + (1-y)^2(1+y)\,\ddot P_{n-1}\Bigr)\,;$$
and it is easy to see that $P_n$ is a polynomial of degree $n$ in the variable $y = \cos\xi$,
if $P_{n-1}$ had degree $n-1$.
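The passage from (13) to (14) uses two elementary identities tacitly. The lines below spell them out. They assume that (13) reads $\Phi_n = \frac{2}{n(2n+1)}\sin^{2n+2}\frac{\xi}{2}\,\frac{d^2}{d\xi^2}\bigl(\Phi_{n-1}/\sin^{2n}\frac{\xi}{2}\bigr)$, which is this sketch's reading of the recursion rather than a quotation.

```latex
% Half-angle identity with y = cos(xi):
\begin{align*}
\sin^2\frac{\xi}{2} = \frac{1-y}{2}
\quad &\Longrightarrow\quad
\sin^{2n+2}\frac{\xi}{2} = \frac{(1-y)^{n+1}}{2^{\,n+1}}, \qquad
\frac{1}{\sin^{2n}(\xi/2)} = \frac{2^{\,n}}{(1-y)^{n}}, \\
% Chain rule, using d/dxi = -sin(xi) d/dy:
\frac{d^2}{d\xi^2}
  &= \frac{d}{d\xi}\Bigl(-\sin\xi\,\frac{d}{dy}\Bigr)
   = -\cos\xi\,\frac{d}{dy} + \sin^2\xi\,\frac{d^2}{dy^2}
   = -y\,\frac{d}{dy} + (1-y^2)\,\frac{d^2}{dy^2}, \\
% The powers of two combine to 2 * 2^n / 2^(n+1) = 1:
\frac{2}{n(2n+1)}\cdot\frac{2^{\,n}}{2^{\,n+1}} &= \frac{1}{n(2n+1)} .
\end{align*}
```

The common factor $\frac{1}{2\pi}$ of $\Phi_n$ and $\Phi_{n-1}$ cancels, which leaves exactly the prefactor $\frac{1}{n(2n+1)}(1-y)^{n+1}$ of (14).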
If one feeds the final recursion formula to, e.g., Mathematica, the following
output is returned:
$$P_1(y) = \tfrac{1}{3}\,(2 + y)\,,$$
$$P_2(y) = \tfrac{1}{30}\,(16 + 13y + y^2)\,,$$
$$P_3(y) = \tfrac{1}{630}\,(272 + 297y + 60y^2 + y^3)\,,$$
and so on.
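The same output can be reproduced without a computer algebra system. The following is a minimal sketch in Python, assuming the definitive recursion $P_n = \frac{1}{n(2n+1)}\bigl(n(n+1+ny)P_{n-1} + (1-y)(2n+(2n-1)y)\dot P_{n-1} + (1-y)^2(1+y)\ddot P_{n-1}\bigr)$ with $P_0 \equiv 1$. Each polynomial is stored as a list of exact fractions in ascending powers of $y$; the helper names are introduced here for illustration only.

```python
from fractions import Fraction

def poly_mul(a, b):
    # product of two coefficient lists (ascending powers of y)
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_add(*polys):
    # sum of several coefficient lists, padded to the longest one
    out = [Fraction(0)] * max(len(p) for p in polys)
    for p in polys:
        for i, x in enumerate(p):
            out[i] += x
    return out

def poly_der(a):
    # derivative with respect to y
    return [i * a[i] for i in range(1, len(a))]

def next_P(prev, n):
    # one recursion step P_{n-1} -> P_n
    d1 = poly_der(prev)
    d2 = poly_der(d1)
    t0 = poly_mul([n * (n + 1), n * n], prev)                    # n(n+1+ny) P
    t1 = poly_mul(poly_mul([1, -1], [2 * n, 2 * n - 1]), d1)     # (1-y)(2n+(2n-1)y) P'
    t2 = poly_mul(poly_mul([1, -2, 1], [1, 1]), d2)              # (1-y)^2 (1+y) P''
    return [coef / (n * (2 * n + 1)) for coef in poly_add(t0, t1, t2)]

def spline_polys(n_max):
    # returns [P_0, P_1, ..., P_{n_max}]
    polys = [[Fraction(1)]]
    for n in range(1, n_max + 1):
        polys.append(next_P(polys[-1], n))
    return polys
```

Calling `spline_polys(3)` returns the coefficient lists of $P_0, \ldots, P_3$. A quick plausibility check is $P_n(1) = 1$, that is, the coefficients of each $P_n$ sum to 1; this holds because $\Phi_n(0) = \frac{1}{2\pi}$.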
① In the case $n = 1$ one obtains by means of (11) and (12) the following
table of coefficients $h_r^\#$:

   r    $h_r^\# = h_{2-r}^\#$        r    $h_r^\# = h_{2-r}^\#$
   1     .8176464014                17     .0000034798
   2     .3972970868                18     .0000018656
   3    -.0691009838                19    -.0000008823
   4    -.0519453464                20    -.0000004712
   5     .0169710467                21     .0000002249
   6     .0099905948                22     .0000001198
   7    -.0038832619                23    -.0000000576
   8    -.0022019510                24    -.0000000306
   9     .0009233709                25     .0000000148
  10     .0005116360                26     .0000000078
  11    -.0002242963                27    -.0000000038
  12    -.0001226863                28    -.0000000020
  13     .0000553563                29     .0000000010
  14     .0000300112                30     .0000000005
  15    -.0000138188                31    -.0000000003
  16    -.0000074444                32    -.0000000001

The scaling function $\phi$ and the wavelet $\psi$ corresponding to
$n = 1$ are shown in Figures 6.8 and 6.9. Both functions are piecewise linear.
Figure 6.8 The Battle-Lemarié scaling function corresponding to n = 1
Figure 6.9 The Battle-Lemarié wavelet corresponding to n = 1
Carrying out the same calculations for $n = 3$, one finds that the $h_r^\#$ now
decay considerably more slowly than before as $|r| \to \infty$. As a consequence the
following table gives these $h_r^\#$ to six decimal places only, although they were
originally computed, using Mathematica, to 14 decimal places.

   r    $h_r^\# = h_{4-r}^\#$        r    $h_r^\# = h_{4-r}^\#$
   2     .766130                    17    -.000927
   3     .433923                    18     .000560
   4    -.050202                    19     .000462
   5    -.110037                    20    -.000285
   6     .032081                    21    -.000232
   7     .042068                    22     .000146
   8    -.017176                    23     .000118
   9    -.017982                    24    -.000075
  10     .008685                    25    -.000060
  11     .008201                    26     .000039
  12    -.004354                    27     .000031
  13    -.003882                    28    -.000020
  14     .002187                    29    -.000016
  15     .001882                    30     .000010
  16    -.001104                    31     .000008

The scaling function $\phi$ and the Battle-Lemarié wavelet $\psi$ corresponding to
$n = 3$ are shown in Figures 6.10 and 6.11. ○
Figure 6.10 The Battle-Lemarié scaling function corresponding to n = 3
Figure 6.11 The Battle-Lemarié wavelet corresponding to n = 3
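The leading entries of both tables can be cross-checked with a few lines of Python. The sketch below does not follow the route via (11) and (12). Instead it assumes the product form $H^\#(\xi) = \bigl((1+e^{-i\xi})/2\bigr)^{n+1}\sqrt{P_n(\cos\xi)/P_n(\cos(2\xi))}$ and reads off $h_r^\#$ as $\sqrt{2}$ times the coefficient of $e^{-ir\xi}$, computed with the trapezoidal rule. The function names and the grid size `N` are choices made for this illustration.

```python
import cmath
import math

# P_n(y) as (numerator coefficients in ascending powers of y, common denominator)
P_COEFFS = {
    1: ([2, 1], 3),
    2: ([16, 13, 1], 30),
    3: ([272, 297, 60, 1], 630),
}

def P(n, y):
    coeffs, denom = P_COEFFS[n]
    return sum(coef * y ** k for k, coef in enumerate(coeffs)) / denom

def H_sharp(n, xi):
    # assumed product form: spline factor times the square-root correction
    root = math.sqrt(P(n, math.cos(xi)) / P(n, math.cos(2.0 * xi)))
    return ((1.0 + cmath.exp(-1j * xi)) / 2.0) ** (n + 1) * root

def h_sharp(n, r, N=1024):
    # h_r# = sqrt(2) * (coefficient of e^{-i r xi} in H#), trapezoidal rule with N nodes
    acc = 0j
    for j in range(N):
        xi = 2.0 * math.pi * j / N
        acc += H_sharp(n, xi) * cmath.exp(1j * r * xi)
    return math.sqrt(2.0) * (acc / N).real
```

For example, `h_sharp(1, 1)` and `h_sharp(3, 2)` should agree with the first entries of the two tables to the number of digits shown there. Since the integrand is smooth and periodic, the trapezoidal rule converges quickly, so a moderate `N` suffices.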
