
3 Time-frequency analysis

3.1 Time-frequency analysis


3.1.1 Fourier forever
Let $f(t)$ be a mathematical idealization of some physical signal depending on time $t$. Perhaps $f$ can be considered as a superposition of oscillating components, but these oscillations have to be limited to some finite extent in time. This poses a fundamental problem for the Fourier inversion formula, which states that, for a well-behaved signal $f$, one has
$$f(t) = \int \hat f(\xi)\, e^{2\pi i \xi t}\, d\xi,$$
expressing a complex signal $f$ as a superposition of exponentials $e^{2\pi i \xi t}$. If $f$ vanishes outside some finite set then the exponentials, which extend over all time, must cancel each other in some fantastic way that makes it virtually impossible to quantify in any intuitive way which frequencies play a dominant role at any particular time $t$.
3.1.2 Frequency local in time
Consider a physical signal to be a square integrable real-valued function of time, $x(t)$. One can define a complex extension $z(t) = x(t) + iy(t)$ by letting $y(t)$ be the inverse Fourier transform of $-i\,\mathrm{sgn}(\xi)\,\hat x(\xi)$, where $\mathrm{sgn}$ denotes the signum function $\xi/|\xi|$. In this case $\hat z = \hat x + i\hat y$ has only positive frequencies.
Exercise 3.1.1. Explain why z(t) has an extension to a complex argument t + is, s > 0 that is analytic
in the upper half plane {t + is : s > 0}. You may assume that x(t) is a continuous, bounded, absolutely
integrable function.
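The passage from $x$ to its analytic signal $z$ has a finite-dimensional analogue that is easy to experiment with: zero out the negative-frequency DFT bins, double the positive ones, and invert. The following is a minimal pure-Python sketch; the helper names and the $O(N^2)$ DFT are our own choices, made for transparency rather than speed.

```python
import cmath

def dft(x):
    # unnormalized discrete Fourier transform, O(N^2)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # inverse DFT, with the 1/N normalization
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def analytic_signal(x):
    # Discrete analogue of z = x + iy: keep the DC (and Nyquist) bin,
    # double the positive-frequency bins, zero the negative-frequency bins.
    N = len(x)
    X = dft(x)
    H = [0.0] * N
    H[0] = 1.0
    if N % 2 == 0:
        H[N // 2] = 1.0
        for k in range(1, N // 2):
            H[k] = 2.0
    else:
        for k in range(1, (N + 1) // 2):
            H[k] = 2.0
    return idft([Xk * Hk for Xk, Hk in zip(X, H)])
```

For $x$ a sampled cosine of integer frequency, the imaginary part of the result is the corresponding sine, i.e. the Hilbert transform of cosine, so that $z$ traces a complex exponential with only positive frequency.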
The analytic signal $z$ has the polar form $r(t)e^{i\theta(t)}$ where $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan(y/x)$. The instantaneous frequency can be defined as $d\theta/dt$. This point of view, however, is a little too simple because $x(t)$ can be a superposition of multiple oscillating components, and the instantaneous frequency cannot resolve multiple oscillating contributions. We will return to this point later in this chapter. First we want to consider some fundamental issues governing the impossibility of joint time-frequency localization and, in view of these limitations, mathematical tools that aim to characterize compositions of signals in terms of time-localized oscillations. One typically refers to such tools as time-frequency representations. As with the case of Fourier transforms we will encounter both continuous and discrete parameter time-frequency representations.
3.1.3 The Heisenberg-Wiener inequality
Variance inequalities
Theorem 3.1.2. (Heisenberg uncertainty principle) If $f \in L^2(\mathbb{R})$ with $\|f\|_2 = 1$ then
$$\|x f(x)\|_2\, \|\xi \hat f(\xi)\|_2 \ge \frac{1}{4\pi}.$$
Moreover, one has equality if and only if $f(x) = c\, e^{-\pi\alpha x^2}$ for some $\alpha > 0$ (with $c$ determined by the normalization).


This inequality states that $f$ cannot have most of its energy near zero in time or space while also having most of its energy near zero (or, really, any other point) in frequency. This type of inequality is called a variance inequality because, when $|f(t)|^2\,dt$ is regarded as a continuous probability density on $\mathbb{R}$, its variance is $\|x f(x)\|_2^2$ provided $\int x\,|f(x)|^2\,dx = 0$, while $\|\xi \hat f(\xi)\|_2^2$ is the variance of the density $|\hat f(\xi)|^2\,d\xi$ provided $\int \xi\,|\hat f(\xi)|^2\,d\xi = 0$. In quantum mechanics the Heisenberg inequality has an interpretation in terms of the joint variance of position and momentum operators, and can be construed, roughly, as saying that one cannot jointly measure the position and momentum of a subatomic particle with arbitrary precision, a fact that was verified in the case of an electron by photon scattering in the famous Compton effect.
Our interest in uncertainty inequalities will take more of a macroscopic interpretation involving the
formulation of a joint time-frequency picture of a signal.
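One can see the Gaussian extremal case numerically. The sketch below uses Riemann sums on a truncated interval (the grid parameters are arbitrary choices) to check that for the normalized Gaussian $f(x) = 2^{1/4}e^{-\pi x^2}$, which equals its own Fourier transform, the variance product is $1/(4\pi)$.

```python
import math

def l2_riemann(fn, lo=-8.0, hi=8.0, n=16000):
    # left-endpoint Riemann sum of fn over [lo, hi]; the Gaussian tails
    # beyond |x| = 8 are numerically negligible
    dx = (hi - lo) / n
    return sum(fn(lo + i * dx) for i in range(n)) * dx

f = lambda x: 2 ** 0.25 * math.exp(-math.pi * x * x)  # L2-normalized Gaussian

norm_sq = l2_riemann(lambda x: f(x) ** 2)         # should be close to 1
time_var = l2_riemann(lambda x: (x * f(x)) ** 2)  # ||x f(x)||_2^2
# This Gaussian is its own Fourier transform, so the frequency factor
# equals the time factor and the uncertainty product is just time_var.
product = math.sqrt(time_var) * math.sqrt(time_var)
```

Both factors equal $1/(2\sqrt\pi)$, so the product meets the bound $1/(4\pi)$ exactly, as Theorem 3.1.2 predicts for Gaussians.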
Heisenberg's inequality has an interesting and simple extension to the case of (possibly unbounded) operators on a Hilbert space $H$. Define the domain of a self-adjoint operator $A$ to be the set of $u \in H$ such that $Au \in H$.
Theorem 3.1.3. If $A$ and $B$ are self-adjoint operators on a Hilbert space $H$ then, whenever $u$ is in the domain of both $AB$ and of $BA$, and $a, b \in \mathbb{C}$, one has
$$\|(A-a)u\|\,\|(B-b)u\| \ge \frac{1}{2}\,\bigl|\langle (AB - BA)u,\, u\rangle\bigr|.$$
Equality holds precisely when $(A-a)u$ and $(B-b)u$ are purely imaginary scalar multiples of one another.
Exercise 3.1.4. Show that, at least formally,
$$\langle (AB-BA)u,\, u\rangle = 2i\,\Im\langle (B-b)u,\,(A-a)u\rangle.$$
Then apply Cauchy-Schwarz to conclude the theorem.
The Heisenberg inequality has a covariant form called the Robertson-Schrödinger inequality. It takes the form
$$(\Delta x_j)^2\,(\Delta \xi_j)^2 \ge \frac{1}{16\pi^2} + \bigl(\mathrm{Cov}(x_j, \xi_j)\bigr)^2$$
where the notation refers to covariance of operators (see, e.g., [?]).
Hermite functions
The Fourier transform exchanges differentiation and multiplication; specifically, $\widehat{\bigl(\tfrac{d}{dt}f\bigr)}(\xi) = 2\pi i \xi \hat f(\xi)$. Writing $D = \frac{1}{2\pi i}\frac{d}{dt}$ and writing $P(D)$ for a differential operator $P(D) = \sum_k a_k D^k$, we have, formally, $\mathcal{F}(P(D)f) = P(\xi)\hat f(\xi)$ where $P(\xi) = \sum_k a_k \xi^k$. Additionally, if $P(t, D)$ is a homogeneous polynomial of degree $m$ in $t$ and $D$, meaning that it has the form $\sum_{k=0}^m a_k t^k D^{m-k}$, then, also formally, $\mathcal{F}\bigl(P(t,D)f\bigr) = \sum_{k=0}^m a_k D_\xi^k\, \xi^{m-k}\,\hat f$, where $D_\xi = \frac{i}{2\pi}\frac{d}{d\xi}$, since the Fourier inversion formula implies that $\mathcal{F}(t f) = D_\xi \hat f$. This and the observation about Gaussians being preserved leads to a description of $L^2$-eigenfunctions for the Fourier transform on $\mathbb{R}$. These eigenfunctions are the Hermite functions
$$h_m(t) = \frac{2^{1/4}}{\sqrt{m!}}\,\Bigl(\frac{-1}{2\sqrt{\pi}}\Bigr)^m e^{\pi t^2}\,\frac{d^m}{dt^m}\bigl(e^{-2\pi t^2}\bigr). \qquad (3.1)$$

Exercise 3.1.5. Show that $h_m$ is an eigenfunction of the Fourier transform with eigenvalue $(-i)^m$.
Exercise 3.1.6. Explain why the Hermite functions are orthogonal with respect to the standard inner product on $\mathbb{R}$.
It turns out that, in fact, the Hermite functions form an orthonormal basis for L2 (R).
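The eigenfunction property can be checked numerically from formula (3.1). For $m = 1$, (3.1) reduces to $h_1(t) = 2^{1/4}\cdot 2\sqrt{\pi}\, t\, e^{-\pi t^2}$, and a Riemann-sum Fourier transform should return $(-i)\,h_1$. The sketch below is illustrative only; the truncation interval and grid size are arbitrary choices.

```python
import cmath
import math

def fourier(f, xi, lo=-8.0, hi=8.0, n=16000):
    # Riemann-sum approximation of \hat f(xi) = \int f(t) e^{-2 pi i xi t} dt
    dt = (hi - lo) / n
    return sum(f(lo + j * dt) * cmath.exp(-2j * math.pi * xi * (lo + j * dt))
               for j in range(n)) * dt

def h1(t):
    # the m = 1 case of (3.1): h_1(t) = 2^{1/4} * 2*sqrt(pi) * t * exp(-pi t^2)
    return 2 ** 0.25 * 2 * math.sqrt(math.pi) * t * math.exp(-math.pi * t * t)

# eigenvalue check: \hat h_1(xi) should be close to (-i) * h_1(xi)
```

The same kind of check works for higher $m$ once $h_m$ is written out from (3.1), with eigenvalue $(-i)^m$.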
Exercise 3.1.7. Compute the moment $\int t\, h_m^2(t)\, dt$.


Entropy inequality
There are a vast number of known alternative, precise mathematical statements of the fact that a function and its Fourier transform cannot both be arbitrarily well localized. One important form has to do with information, usually regarded in terms of entropy. For $f \in L^2(\mathbb{R})$ with $\|f\|_2 = 1$ one defines its entropy
$$E(f) = -\int |f|^2 \ln |f|.$$
Entropy can take on positive or negative values (including infinity), but if $f$ were highly concentrated then it would have a large negative entropy. For example, if $f = \sqrt{N}$ on $[0, 1/N)$ and zero elsewhere then (using the rule $0 \ln 0 = 0$) one would have $E(f) = -\frac12 \ln N$, whereas if $f = 1/\sqrt{N}$ on $[0, N)$ and zero elsewhere then $E(f) = +\frac12 \ln N$. This typical example indicates that entropy is a measure of energy spread. A very illuminating mathematical discussion of entropy can be found in Landau [?]. The following entropy inequality was proved by Beckner [?]. It says that $E(f)$ and $E(\hat f)$ cannot both have large negative values.
Theorem 3.1.8. If $f \in L^2(\mathbb{R})$, $\|f\|_2 = 1$, then $E(f) + E(\hat f) \ge \frac12\,(1 - \ln 2)$.
Exercise 3.1.9. Compute $E(g)$ for $g(t) = e^{-\pi t^2}$.
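The normalized Gaussian actually saturates Beckner's bound: for $f(t) = 2^{1/4}e^{-\pi t^2}$ one has $\hat f = f$ and $E(f) = \frac14(1 - \ln 2)$, so $E(f) + E(\hat f) = \frac12(1 - \ln 2)$ exactly. A Riemann-sum check (illustrative only; the truncation interval is an arbitrary choice):

```python
import math

def entropy(f, lo=-6.0, hi=6.0, n=12000):
    # E(f) = -\int |f|^2 ln|f| via a Riemann sum; the integrand tends to 0
    # in the tails, and we skip points where f underflows to 0
    dt = (hi - lo) / n
    total = 0.0
    for j in range(n):
        t = lo + j * dt
        v = f(t)
        if v > 0.0:
            total += v * v * math.log(v)
    return -total * dt

f = lambda t: 2 ** 0.25 * math.exp(-math.pi * t * t)  # normalized Gaussian, \hat f = f
lhs = 2 * entropy(f)             # E(f) + E(\hat f)
rhs = 0.5 * (1 - math.log(2))    # Beckner's bound
```

The two sides agree to discretization accuracy, confirming that Gaussians are extremal for Theorem 3.1.8.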


One can define the entropy of a vector $z = (z_1, \dots, z_N)$ similarly, namely
$$E(z) = -\sum_{k=1}^N |z_k|^2 \ln |z_k|.$$

A norm inequality for the discrete Fourier transform


We have seen that the DFT satisfies the Plancherel formula $\|\hat z\|_2 = \|z\|_2$, where $\|z\|_2$ is the usual Euclidean norm. Now let $\|z\|_p = \bigl(\sum_{k=1}^N |z_k|^p\bigr)^{1/p}$ for $1 \le p < \infty$, and let $\|z\|_\infty = \sup_{1\le k\le N} |z_k|$.
Exercise 3.1.10. Prove that $\|z\|_p$ defines a norm on $\mathbb{C}^N$ for $1 \le p \le \infty$, but that the triangle inequality can fail when $p < 1$.
Exercise 3.1.11. Explain why $\|\hat z\|_\infty \le \frac{1}{\sqrt N}\,\|z\|_1$.
Interpolation
A convexity principle known as the Riesz-Thorin interpolation theorem (e.g., [?]) allows us to conclude from Plancherel's identity (that the DFT is unitary) and from the inequality $\|\hat z\|_\infty \le \frac{1}{\sqrt N}\|z\|_1$ that
$$\|\mathcal{F}(z)\|_{p'} \le N^{(p-2)/(2p)}\, \|z\|_p, \qquad \frac1p + \frac{1}{p'} = 1, \qquad (3.2)$$
whenever $1 \le p \le 2$.
Now define, for $\|z\|_2 = 1$, the quantity
$$H_p(z) = \frac{1}{1 - \frac{p}{2}}\,\ln\Bigl(\sum_{k=1}^N |z_k|^p\Bigr)^{1/p} = \frac{2}{2-p}\,\ln\frac{\|z\|_p}{\|z\|_2}\,.$$
A short computation shows that $H_p(z) \to E(z)$ as $p \to 2$.

Taking logarithms of both sides of (3.2) gives
$$\frac{1}{p'}\,\ln \sum_k |\hat z_k|^{p'} \;\le\; \frac{p-2}{2p}\,\ln N + \frac{1}{p}\,\ln \sum_k |z_k|^p\,.$$
Multiplying through by $\frac{2(p-1)}{p-2}$, which is negative for $1 < p < 2$ and hence reverses the inequality, gives
$$\frac{2(p-1)}{p-2}\,\ln\|\hat z\|_{p'} \;\ge\; \frac{p-1}{p}\,\ln N + \frac{2(p-1)}{p-2}\,\ln\|z\|_p\,,$$
that is, since $\frac{2(p-1)}{p-2} = \frac{2}{2-p'}$ and $\|z\|_2 = \|\hat z\|_2 = 1$,
$$\frac{1}{p}\,\ln N \;\le\; H_p(z) + \frac{1}{p-1}\,H_{p'}(\hat z)\,,$$
which is equivalent to (3.2). Now consider $H_p(z)$ as a function of $p$ when $z$ is fixed. Notice that if one has two increasing functions on $(-\infty, \beta]$ that are continuous and equal at $\beta$, then the derivative of the smaller function at $\beta$ has to be at least as large as that of the larger function. In this case we are saying that the logarithmic derivative of $\|\hat z\|_{p'}/\|z\|_p$ at $p = 2$ is at least the logarithmic derivative of $N^{(p-2)/(2p)}$, and this translates into the statement that
$$-\sum_k |z_k|^2 \ln |z_k| \;-\; \sum_k |\hat z_k|^2 \ln |\hat z_k| \;\ge\; \frac{1}{2}\,\ln N \qquad (3.3)$$
which is a form of the entropy inequality for the DFT.


Exercise 3.1.12. Show that (3.3) becomes an identity in the special case when $z$ is the constant vector all of whose entries are $1/\sqrt N$, or when $z = (1, 0, \dots, 0)$.
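Both equality cases, and the inequality itself for generic vectors, are easy to test against a unitary DFT. A sketch in Python; the $O(N^2)$ transform and the zero-threshold are our own illustrative choices.

```python
import cmath
import math
import random

def unitary_dft(z):
    # unitary normalization, so that ||z||_2 is preserved
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * math.pi * j * n / N) for n in range(N))
            / math.sqrt(N) for j in range(N)]

def entropy(z):
    # E(z) = -sum |z_k|^2 ln|z_k|, with the convention 0 ln 0 = 0
    # (the threshold also discards rounding residue from exact zeros)
    return -sum(abs(c) ** 2 * math.log(abs(c)) for c in z if abs(c) > 1e-12)

random.seed(0)
N = 16
# equality case of (3.3): the constant vector of modulus 1/sqrt(N)
z_flat = [1 / math.sqrt(N)] * N
# a generic unit vector
z = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
nrm = math.sqrt(sum(abs(c) ** 2 for c in z))
z = [c / nrm for c in z]
```

For `z_flat` the transform is the standard basis vector $e_0$, whose entropy is zero, so the two entropies sum to exactly $\frac12\ln N$; for generic $z$ the sum strictly exceeds the bound.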
Fourier support properties
If $f \in L^2(\mathbb{R})$ vanishes off a finite interval then the integral $\hat f(\zeta) = \int_{\mathbb R} f(t)\, e^{-2\pi i \zeta t}\, dt$ converges absolutely and defines a differentiable function of the complex variable $\zeta$. This means that $\hat f(\zeta)$ is analytic and, therefore, can only have isolated zeros in the complex plane. There are a lot of intriguing mathematical variations of this fundamental principle concerning sets where Fourier transforms can vanish. One of the deepest versions of this principle is reflected in the following inequality due to Nazarov [?].
Theorem 3.1.13. There are absolute constants $A > 0$ and $C > 0$ such that for any $f \in L^2(\mathbb{R})$ and any sets $S, \Sigma \subset \mathbb{R}$ of finite measure,
$$\int_{\mathbb R} |f|^2 \;\le\; C\,e^{A|S||\Sigma|}\Bigl(\int_{\mathbb R\setminus S} |f|^2 + \int_{\mathbb R\setminus\Sigma} |\hat f|^2\Bigr).$$
Here $|S|$ denotes the total length of $S$ when $S$ can be expressed as a (possibly infinite) union of pairwise disjoint intervals.
Exercise 3.1.14. Can a function $f$ and its Fourier transform $\hat f$ both be supported on sets of the form $\bigcup_{i=1}^\infty [\alpha_i, \beta_i]$ with $\sum_{i=1}^\infty (\beta_i - \alpha_i) < \infty$? Explain.
Exercise 3.1.15. Let $S = [-T/2, T/2]$ and $\Sigma = [-\Omega/2, \Omega/2]$. Compute the integrals above in the case $g(t) = e^{-\pi t^2}$.
Concentration inequalities
One says that $f$ is $\varepsilon$-concentrated on $A \subset \mathbb{R}$ if $\int_{\mathbb R\setminus A} |f|^2 < \varepsilon^2 \int_{\mathbb R} |f|^2$. The following inequality was proved by Donoho and Stark.
Theorem 3.1.16. If $f \ne 0$ is $\varepsilon$-concentrated on $A$ and $\hat f$ is $\delta$-concentrated on $B$ then $|A|\,|B| \ge (1 - \varepsilon - \delta)^2$.


Exercise 3.1.17. Relate the Donoho-Stark concentration inequality to Nazarov's inequality.
3.1.4 Finite Fourier inequalities
Exercise 3.1.18. Formulate a version of the Heisenberg variance inequality for the discrete Fourier transform.
Number theory plays an important role in the fast Fourier transform algorithm. In particular, if $N = P$ is a large prime number, then one cannot use any reduction argument to speed up computation of the DFT. Curiously, concentration and support properties of the finite Fourier transform depend in an equally crucial way on the composite nature of $N$. Let $\#z$ denote the number of nonzero coordinates of $z = (z_1, \dots, z_N) \in \mathbb{C}^N$.
Exercise 3.1.19. Show that, for $z \in \mathbb{C}^N$, if $\|z\|_2 = 1$ then $-\sum_k |z_k|^2 \ln |z_k| \le \frac12 \ln \#z$.
Now one has the following corollary to (3.3).
Theorem 3.1.20. If $z \ne 0$, then $\#z \cdot \#\hat z \ge N$.


The entropy inequality does not tell us when this inequality becomes an identity. It turns out that equality occurs if and only if $z$ is a shifted or modulated picket fence vector [?]. That is, an appropriate shift or modulation of $z$ is a multiple of $\mathbf{1}_{\mathbb{Z}_{N/Q}}$ where $Q$ divides evenly into $N$. In other words, $\mathbf{1}_{\mathbb{Z}_{N/Q}}$ is the vector $z$ such that $z_k = 1$ if $k$ is a multiple of $Q$ and $z_k = 0$ otherwise. The inequality between arithmetic and geometric means implies that $\#z + \#\hat z \ge 2\sqrt N$. One can infer a stronger inequality when $N = P$ is prime (see [?]).
Corollary 3.1.21. If $N = P$ is prime, then for $z \ne 0$,
$$\#z + \#\hat z \ge P + 1.$$
Equality holds only when $z$ is a modulated version of a multiple of the vector all of whose coordinates equal one.
Exercise 3.1.22. Write a matlab script to verify this statement experimentally.
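A hedged sketch of one such experiment, written here in Python rather than matlab for self-containedness; the tolerance used to count nonzero entries is an arbitrary choice.

```python
import cmath
import math
import random

def dft(z):
    # unnormalized DFT, O(N^2); normalization does not affect support sizes
    N = len(z)
    return [sum(z[n] * cmath.exp(-2j * math.pi * j * n / N) for n in range(N))
            for j in range(N)]

def support_size(z, tol=1e-8):
    # number of numerically nonzero coordinates, i.e. #z
    return sum(1 for c in z if abs(c) > tol)

random.seed(1)
P = 13  # prime
ok = True
for _ in range(50):
    # random vector supported on a random subset of coordinates
    z = [0.0] * P
    for k in random.sample(range(P), random.randint(1, P)):
        z[k] = random.gauss(0, 1)
    if support_size(z) == 0:
        continue
    if support_size(z) + support_size(dft(z)) < P + 1:
        ok = False
```

A convenient extreme case: for $z = e_0$ the transform is the all-ones vector, so $\#z + \#\hat z = 1 + P = P + 1$, matching the equality case of Corollary 3.1.21 up to modulation.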
Concentration inequalities can actually be used to say something about approximations from sparse data.
Exercise 3.1.23. Suppose that $N = MP$ and that it is known that $\hat z$ is sparse in the sense that only $K < \min\{M, P\}$ entries are nonzero. What is the minimal number of coordinates of $z$ required to reconstruct $z$? Explain.
Exercise 3.1.24. Suppose that z is such that zk = 0 unless k = nL for some n where L divides N . Can the
complexity of computing the DFT for such z be reduced? Explain.
The problem of estimating $z$ based on the prior assumption that $\hat z$ has only a small number of nonzero coefficients is a deep and challenging active area of applied mathematics. The work of Tao, Candès et al. [?, ?] gives examples that build on earlier work of Donoho and Elad [?].

3.2 Time-frequency bases and frames


The rest of this chapter represents several aspects of the current state of the art that has evolved around trying to make sense of representing harmonic oscillations locally in time when, in point of fact, the uncertainty principle tells us that it is effectively impossible to do so. We will review a range of techniques that all, at some level, attempt to represent a signal as a superposition of frequency information local in time. Although all of these techniques try to accomplish the same basic task, the nuances that distinguish one time-frequency representation from another make all the difference when it comes to particular applications that put different levels of emphasis on the ability to recognize coherent structure within a signal versus the ability to recover those significant features while setting aside features not of interest in an effective, automatic way. Because these tools span quite a lot of mathematical technique, our treatment will be just a little beyond superficial. The goal is to give just enough detail to get a feel for the mathematical origins of the different methods, a taste of how the tools differ from one another, and the purposes that underlie these differences.
3.2.1 Gabor bases
Fourier series provide expansions of periodic functions, but they can also be considered as local expansions of functions in the sense that they only represent given functions over one period. Square integrable functions are not periodic but, thought of as functions of time or space, they can have oscillating components that emerge and decay over time or space. Sines and cosines will have some correlation with such components.
Exercise 3.2.1. Let $\varphi = \chi_{[0,1)}(x)$. Show that the functions $\varphi_{n,k}(x) = \varphi(x - n)\, e^{2\pi i k x}$ form an orthonormal basis for $L^2(\mathbb{R})$.
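The orthonormality in Exercise 3.2.1 can be sampled exactly: restricted to a unit interval, each inner-product integrand is a pure exponential, so an equispaced sum with $T$ points per unit interval reproduces the integrals without discretization error as long as $|k - k'| < T$. A sketch (the grid size $T$ and the sampling range are arbitrary choices):

```python
import cmath
import math

T = 16  # samples per unit interval

def phi(n, k, x):
    # phi_{n,k}(x) = chi_[0,1)(x - n) * exp(2 pi i k x)
    if n <= x < n + 1:
        return cmath.exp(2j * math.pi * k * x)
    return 0.0

def inner(n1, k1, n2, k2):
    # sampled inner product over [-2, 3); exact here because on each unit
    # interval the integrand is a pure exponential of integer frequency
    pts = [(-2 + j / T) for j in range(5 * T)]
    return sum(phi(n1, k1, x) * phi(n2, k2, x).conjugate() for x in pts) / T
```

Distinct shifts $n \neq n'$ give zero by disjointness of supports, while equal shifts reduce to the usual orthogonality of the exponentials $e^{2\pi i k x}$ on $[0,1)$.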
The functions $\varphi_{n,k}$ are sometimes called Gabor functions after the Hungarian Nobel laureate Dennis Gabor. In general, if $g \in L^2(\mathbb{R})$ and if $\alpha > 0$, $\beta > 0$, then the family $g_{n,k}(x) = e^{2\pi i k\beta x}\, g(x - n\alpha)$ is called a Gabor family $\mathcal{G}(g, \alpha, \beta)$ generated by $g$ and the lattice $\alpha\mathbb{Z} \times \beta\mathbb{Z}$ of time-frequency shifts. The function $g$ is sometimes called a Gabor window. There is a fairly well-developed theory now associated with Gabor representations. Much of what was known by 2000 is discussed in Gröchenig's book [?]. We will review some of those facts but we will also discuss some more recent developments.
First, there are fundamental limitations on Gabor orthonormal bases.


Theorem 3.2.2. (Balian-Low) Suppose that $\mathcal{G}(g, \alpha, \beta)$ forms an orthonormal basis for $L^2(\mathbb{R})$. Then the time-frequency variance product $\|x g(x)\|_2\, \|\xi \hat g(\xi)\|_2 = \infty$.
This tells us that the Gabor window cannot have good time-frequency localization. We can ask whether an overcomplete Gabor representation can have good time-frequency localization (finite time-frequency variance). This time we are in luck, but we need a little basic machinery to describe the main result and how it can be applied.
Frames
Given a separable Hilbert space $H$, a countable subset $\{f_n\}$ is called a frame for $H$ provided that there are constants $0 < A \le B < \infty$ such that for any $f \in H$ one has
$$A\|f\|^2 \le \sum_n |\langle f, f_n\rangle|^2 \le B\|f\|^2.$$
These inequalities imply that the frame operator is bounded and continuously invertible. Frames are necessarily complete sets, but typically they are overcomplete or redundant. For example, any three unit vectors in $\mathbb{R}^2$ that differ from one another by a rotation by $2\pi/3$ will form a frame for the Hilbert space $\mathbb{R}^2$. In the case of Gabor systems $\mathcal{G}(g, \alpha, \beta)$ one defines the frame operator
$$S_{g,\alpha,\beta}\, f = \sum_{n,k\in\mathbb{Z}} \langle f, g_{n,k}\rangle\, g_{n,k}.$$
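The three-vector example in $\mathbb{R}^2$ mentioned above is in fact a tight frame with $A = B = 3/2$, which is easy to confirm numerically. A minimal sketch:

```python
import math

# three unit vectors in R^2 differing from one another by rotations of 2*pi/3
angles = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
frame = [(math.cos(a), math.sin(a)) for a in angles]

def frame_sum(f):
    # sum_n |<f, f_n>|^2 for f in R^2
    return sum((f[0] * v[0] + f[1] * v[1]) ** 2 for v in frame)
```

For every $f$, the sum equals $\frac32 \|f\|^2$, so the frame inequalities hold with $A = B = 3/2$; the redundancy ($3$ vectors in a $2$-dimensional space) is reflected in the constant $3/2 > 1$.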
To $g$ one can assign a canonical dual window $\gamma = S_{g,\alpha,\beta}^{-1}\, g$ and then one has the reproducing formula
$$f = \sum_{n,k} \langle f, \gamma_{n,k}\rangle\, g_{n,k} = \sum_{n,k} \langle f, g_{n,k}\rangle\, \gamma_{n,k}.$$

In this case one can say that $f$ is expressed in a natural way as a superposition of time-frequency localized Gabor atoms if $g$ (or $\gamma$) is time-frequency localized. At this stage we just have a couple of minor problems. First, what does $\gamma$ look like, and second, if $g$ is time-frequency localized in a suitable sense then will $\gamma$ be localized in the same sense? A third issue comes in determining conditions on $g, \alpha, \beta$ such that one has a Gabor frame. When $g$ is the Gaussian function $g(x) = e^{-\pi x^2}$ one has the following frame density criterion.
Theorem 3.2.3. (Seip and Lyubarskii) When $g(x) = e^{-\pi x^2}$ the family $\mathcal{G}(g, \alpha, \beta)$ forms a frame for $L^2(\mathbb{R})$ if and only if $\alpha\beta < 1$.
The product $\alpha\beta$ can be regarded as a time-frequency density in the sense that $1/(\alpha\beta)$ is the number of Gabor time-frequency shifts per unit area. This has to be at least one in order that the family is complete in the case of a Gaussian. Overcompleteness implies that typically there will be more than one dual function to a Gabor frame generator $g$. The dual function $\gamma = S_{g,\alpha,\beta}^{-1}\, g$ is called the canonical dual. Except for a minor technical condition, Wexler and Raz [?, ?] characterized the Gabor duals in the analogous finite frame case that will be discussed in a minute. In the case of Gabor frames for $L^2(\mathbb{R})$ this characterization was carried out rigorously by Daubechies, H. Landau and Z. Landau as follows.
Theorem 3.2.4. (Wexler-Raz) The pair $(g, \gamma)$ is a pair of dual Gabor windows, in the sense that $S_{g,\gamma,\alpha,\beta}\, f = \sum_{n,k} \langle f, \gamma_{n,k}\rangle\, g_{n,k}$ is the identity operator on $L^2(\mathbb{R})$, if and only if
$$\frac{1}{\alpha\beta}\,\bigl\langle \gamma_{n/\beta,\,k/\alpha},\; g_{m/\beta,\,\ell/\alpha}\bigr\rangle = \delta_{mn}\,\delta_{\ell k}.$$

The frame operator $S_{g,\alpha,\beta}$ itself can be expressed as the composition of a coefficient mapping $T_{g,\alpha,\beta}(f) = \{\langle f, g_{n,k}\rangle\}$ with its adjoint $T^\ast_{g,\alpha,\beta}\{c_{nk}\} = \sum_{n,k} c_{n,k}\, g_{n,k}$. The Gabor expansion of $f$ then has the form $f = T^\ast_{\gamma,\alpha,\beta}\, T_{g,\alpha,\beta}\, f$. As noted, the main difficulty with this expansion is the computation of the dual function $\gamma$. A very important consequence of the Wexler-Raz identity is that the canonical dual is the same as the so-called Wexler-Raz dual, which is defined as follows:
$$\gamma = T^\ast_{g,1/\beta,1/\alpha}\,\bigl(T_{g,1/\beta,1/\alpha}\, T^\ast_{g,1/\beta,1/\alpha}\bigr)^{-1}\, e_{0,0} \qquad (3.4)$$
where $e_{0,0}$ is the coefficient sequence on $\mathbb{Z} \times \mathbb{Z}$ such that $e_{0,0}(n,k) = 1$ if $n = k = 0$ and $e_{0,0}(n,k) = 0$ otherwise. What is important about the formula (3.4) is that it allows for a discrete calculation of the dual function $\gamma$. This calculation can be performed numerically as follows.


Proposition 3.2.5. (Neumann series expansion) Suppose that $S$ is an operator on a Hilbert space $H$ such that $0 < S < I$ in the sense that for any $x \in H$, $x \ne 0$, one has $0 < \langle (I - S)x, x\rangle < \|x\|^2$. Then one can write
$$S^{-1} = \sum_{k=0}^\infty (I - S)^k. \qquad (3.5)$$

Formally, (3.5) is the same as the geometric series expansion $x^{-1} = \sum_{k=0}^\infty (1-x)^k$ when $0 < x < 1$, with $S$ substituted in for $x$. In the case of the operator $T_{g,1/\beta,1/\alpha}\, T^\ast_{g,1/\beta,1/\alpha}$ one has, at least formally,
$$\bigl(T_{g,1/\beta,1/\alpha}\, T^\ast_{g,1/\beta,1/\alpha}\bigr)^{-1} e_{0,0} = \frac{2/(\alpha\beta)}{A+B}\sum_{k=0}^{\infty}\Bigl(I - \frac{2/(\alpha\beta)}{A+B}\, T_{g,1/\beta,1/\alpha}\, T^\ast_{g,1/\beta,1/\alpha}\Bigr)^k e_{0,0} = \lim_{K\to\infty} e_K,$$
which allows one to compute $e_K$ recursively as
$$e_0 = \frac{2/(\alpha\beta)}{A+B}\, e_{0,0} \qquad\text{and}\qquad e_{K+1} = e_0 + e_K - \frac{2/(\alpha\beta)}{A+B}\, T_{g,1/\beta,1/\alpha}\, T^\ast_{g,1/\beta,1/\alpha}\, e_K,$$
which in turn allows one to write
$$\gamma_K = \sum_{n,k} \gamma^K_{n,k}\; g_{n/\beta,\,k/\alpha}$$
where the coefficients $\gamma^K_{n,k}$ are also defined recursively.

Exercise 3.2.6. Find a recursive expression defining the coefficients $\gamma^K_{n,k}$.
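The recursion above is just a relaxed Neumann iteration and can be prototyped on a matrix stand-in for $T_{g,1/\beta,1/\alpha}\, T^\ast_{g,1/\beta,1/\alpha}$. In the sketch below, the $2\times 2$ matrix, its frame bounds $A, B$ (its smallest and largest eigenvalues), and the relaxation constant $c = 2/(A+B)$ are illustrative choices; the $1/(\alpha\beta)$ factor appearing in the text's constant is absorbed into the stand-in here.

```python
# Neumann-series inversion from Proposition 3.2.5, applied to S e = e00:
# iterate e_{K+1} = e_0 + e_K - c S e_K with e_0 = c e00, which converges
# to S^{-1} e00 whenever the spectrum of c*S lies in (0, 2).
S = [[1.0, 0.3],
     [0.3, 1.0]]
A, B = 0.7, 1.3          # eigenvalues of S, playing the role of frame bounds
c = 2.0 / (A + B)        # relaxation constant
e00 = [1.0, 0.0]         # stand-in for the coefficient sequence e_{0,0}

def mat_vec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

e = [c * x for x in e00]  # e_0
e0 = list(e)
for _ in range(200):
    Se = mat_vec(S, e)
    e = [e0[j] + e[j] - c * Se[j] for j in range(2)]  # e_{K+1} = e_0 + e_K - c S e_K
# e now approximates S^{-1} e00
```

The fixed point satisfies $cSe^\ast = e_0 = c\,e_{0,0}$, i.e. $e^\ast = S^{-1}e_{0,0}$, independently of the particular value of $c$; the choice $c = 2/(A+B)$ simply optimizes the contraction factor $\|I - cS\|$.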
Discrete implementations of Gabor expansions
The linear time-frequency analysis toolbox (ltfat) was developed by the numerical harmonic analysis group in Vienna. It contains utilities for computing discrete Gabor transforms from sampled data, along with other time-frequency utilities that will be discussed below. The same sorts of issues of time-frequency localization in the continuous case of functions defined on $\mathbb{R}$ are present in the finite case, but the context is different. Given a sample vector $x$ of length $L$ one has to ask: on what normalized time (or space) interval was the signal sampled? Equivalently, what was the sample rate? For example, in the case of speech signals the sample rate might be 11025 or 22050 or 44100 samples per second, the latter determined by the response of the human auditory system and not necessarily by the real analogue signal generating the data. The sample rate then represents the finest time scale in the data. The Gabor shift parameter then has to be encoded as a fraction of a unit of time, hence as a given number of samples. Thus one replaces the time-shift parameter $\alpha$ (in seconds, say) by the sample shift parameter $a = \alpha F_s$ (in samples), where $F_s$ denotes the number of samples per second. Next one has to consider the frequency shift parameter $\beta$. In standard DFTs the rows of the DFT matrix are powers of the vector $\{e^{2\pi i jk/N}\}$ with $N$ the length of the (possibly zero-padded) signal. Thus the normalized frequencies go up to the number of samples. In principle, the signal of interest is bandlimited to $F_s/2$ and this can be reflected in the discrete Gabor transform. Then $\beta$ can be regarded as a fraction of the normalized frequency, and one can express $\beta/F_s = 1/M$ where $M$ is the number of Fourier modes that will be considered in the windowed data. Thus, the discrete Gabor transform of data $x$ with sample indices running from $0$ to $L-1$ will take the form $c = \mathrm{DGT}(x, g, a, M)$ with
$$c(n+1,\, k+1) = \sum_{\ell=0}^{L-1} x(\ell)\, e^{-2\pi i \ell k/M}\; \overline{g(\ell - an + 1)},$$
$$k = 0, \dots, M-1; \qquad n = 0, \dots, N-1 = L/a - 1.$$
The ltfat syntax is
[c,Ls]=dgt(f,g,a,M);
fr=idgt(c,gd,a,Ls)


where f is the input vector $x$ having Ls entries, g is the Gabor window with Gabor dual gd, a is the sample shift parameter, and M is the normalized frequency parameter, which is referred to in the ltfat documentation as the number of channels. Practically, it is the effective number of Fourier modes to be considered in the windowed data. Gabor window and dual window design for finite implementations is discussed briefly in the ltfat help. In Figure 3.1 the Gabor transform of the signal buellershort is plotted on a log intensity scale. The parameters were chosen as follows in order to give a fairly robust time-frequency picture.
g=pgauss(1024,256,512);
[c,Ls]=dgt(x,g,2,1024);
In general, the more robust the picture, the more redundancy required. This means small a and large M. Minimum redundancy would require large a and small M; a = M (the discrete analogue of $\alpha\beta = 1$) being the minimal choice. One other thing worth mentioning is that the DGT treats data as being a single period of a periodic sequence. The window functions used are also periodic so that there are no edge effects.
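The displayed DGT formula can be implemented directly (if slowly) for small examples. The sketch below follows the formula with 0-based indices and a periodic window; it is an illustration only, not the FFT-based algorithm that ltfat actually uses.

```python
import cmath

def naive_dgt(x, g, a, M):
    # Direct evaluation of
    #   c(n, k) = sum_l x[l] * exp(-2 pi i l k / M) * conj(g[(l - a n) mod L]),
    # treating both the data and the window as one period of an L-periodic sequence.
    L = len(x)
    N = L // a
    c = [[0j] * M for _ in range(N)]
    for n in range(N):
        for k in range(M):
            c[n][k] = sum(x[l] * cmath.exp(-2j * cmath.pi * l * k / M)
                          * complex(g[(l - a * n) % L]).conjugate()
                          for l in range(L))
    return c
```

With a Dirac window $g = (1, 0, \dots, 0)$, hop $a = 1$ and $M = L$, the transform reduces to $c(n,k) = x(n)\,e^{-2\pi i n k / M}$, which makes a convenient sanity check.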

Fig. 3.1. Log scale intensity plot of Gabor transform of bueller signal of length L = 8192.

3.2.2 Compression using Gabor transforms
Having a discrete implementation of the Gabor transform allows one to perform signal processing tasks such as compression and denoising. The upper left part of Figure 3.1 indicates the presence of frequencies in the first half of the signal that are not present in the second half. The lower part indicates the presence of three parallel harmonics with frequencies that gradually increase over time.
The Gabor transform is not so useful for compression of information that is dispersed in the time-frequency plane, such as the bueller speech signal. See Figures 3.2 and 3.3, where the sparser picture just shows the top ten percent magnitude terms. The problem is that one has to compute a rather large number of Gabor coefficients to get a coherent time-frequency picture. In the given example, $L = 8192$, $a = 8$ and $M = L/a$, so there are $(L/a)^2 \approx 10^6$ Gabor coefficients as opposed to only 8192 samples, so this hardly amounts to compression. On the other hand, the Gabor transform could be a useful means of denoising signals when the noise is evenly diluted over the time-frequency plane. See Appendix 3.7.1 for matlab code.

3.2 Time-frequency bases and frames

55

Fig. 3.2. Log scale intensity plot of Gabor transform of bueller signal of length L = 8192.
Fig. 3.3. Log scale intensity plot of Gabor transform of top ten percent of Gabor coefficients.

3.2.3 Denoising using Gabor transforms
To test the denoising performance of Gabor transforms we added uniform random noise to the signal buellershort with amplitude about one fifth of the original signal amplitude. Of course, results will vary with the intensity and structure of the noise. To denoise, we took the Gabor transform of the noisy signal as shown in Figure 3.4 and zeroed out all coefficients not in the top 10 percent in magnitude. The residual is compared to the noise in Figure 3.6. This type of denoising is the same as we applied when using the Fourier transform for denoising the same signal. The result sounds much better in the Gabor case because the Gabor transform isolates the noise from the spectrum of the signal locally in time.

Fig. 3.4. Log intensity plot of Gabor transform of noisy bueller signal.
Fig. 3.5. Log intensity plot of top ten percent of noisy Gabor coefficients.

[use noisy and show soft thresholding/shrinkage]


3.2.4 Short-time Fourier transform and spectrograms
Gabor coefficients are integrals of the form
$$S(f,g)(x,\xi) = \int f(t)\,\overline{\tilde g(x-t)}\,e^{-2\pi i \xi t}\,dt \qquad (3.6)$$
where $g$ is actually replaced by its reflection $\tilde g(x) = g(-x)$, and $x$ takes the value $n\alpha$ and $\xi$ the value $k\beta$. However, if one is willing to compute all of the values then one ends up with the short-time Fourier transform $S(f,g)(x,\xi)$, which is a mapping from a function $f(t)$ to a function of the variables $(x,\xi)$. As an


Fig. 3.6. Top plot shows the noise added to the bueller signal. Bottom plot shows the residual: the cleaned signal reconstructed from the top 10 percent of Gabor coefficients minus the noisy bueller signal. This residue looks much the same as the noise, and the cleaned signal sounds much like the original bueller signal.

integral it is linear in $f$. In fact it is also conjugate linear in $g$, but one usually regards $g$ as a fixed window function and then regards $f \mapsto S_g(f) = S(f,g)$ as a linear mapping. The short-time Fourier transform satisfies a remarkable inversion property. Suppose that $\|g\|_2 = 1$. Then
$$f(t) = \int\!\!\int S(f,g)(\mu,\xi)\,g(t-\mu)\,e^{2\pi i \xi t}\,d\xi\,d\mu. \qquad (3.7)$$
The inversion formula is very similar, on the one hand, to the Fourier inversion formula (corresponding to the case where $g$ is replaced by the Dirac point mass $\delta$) and, on the other, to the Gabor representation formula in the limit as the time and frequency shift parameters tend to zero. Formula (3.7) is interpreted in the sense of convergence in $L^2(\mathbb{R})$. We will not justify the formula rigorously (see [?]) but we will give a formal proof based on the formal identity $\int e^{-2\pi i \xi t}\,d\xi = \delta(t)$. Then
$$\begin{aligned}
\int\!\!\int S(f,g)(\mu,\xi)\,g(t-\mu)\,e^{2\pi i \xi t}\,d\xi\,d\mu
&= \int\!\!\int\!\!\int f(s)\,\overline{g(s-\mu)}\,e^{-2\pi i \xi s}\,ds\;g(t-\mu)\,e^{2\pi i \xi t}\,d\xi\,d\mu\\
&= \int f(s)\int \overline{g(s-\mu)}\,g(t-\mu)\int e^{-2\pi i \xi(s-t)}\,d\xi\,d\mu\,ds\\
&= \int f(s)\int \overline{g(s-\mu)}\,g(t-\mu)\,\delta(s-t)\,d\mu\,ds\\
&= f(t)\int |g(t-\mu)|^2\,d\mu = f(t).
\end{aligned}$$
In fact, one can show that $S(f,g)$ is energy preserving in the sense that, when $\|g\|_2 = 1$,
$$\int\!\!\int |S(f,g)(t,\xi)|^2\,dt\,d\xi = \int |f(t)|^2\,dt.$$
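A fully sampled discrete analogue (hop $a = 1$, $M = L$) satisfies the corresponding energy identity $\sum_{n,k} |c(n,k)|^2 = L\,\|x\|^2\,\|g\|^2$, which follows from Parseval's identity for the DFT applied to each windowed slice. A sketch with arbitrary test data:

```python
import cmath
import math

def stft_full(x, g):
    # Fully sampled discrete short-time Fourier transform (hop 1, M = L),
    # with a periodic window, as a finite stand-in for the continuous S(f, g).
    L = len(x)
    return [[sum(x[l] * complex(g[(l - n) % L]).conjugate()
                 * cmath.exp(-2j * math.pi * l * k / L) for l in range(L))
             for k in range(L)] for n in range(L)]

x = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.0, 1.0]
g = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.25]
c = stft_full(x, g)
total = sum(abs(v) ** 2 for row in c for v in row)
# discrete analogue of the energy identity: total = L * ||x||^2 * ||g||^2
```

The factor $L$ comes from the unnormalized DFT used in each column; with a unitary DFT and $\|g\|_2 = 1$ the identity reads exactly as in the continuous case.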


3.2.5 The time-frequency toolbox (tftb)
The time-frequency toolbox is a collection of time-frequency representation tools that was developed at the Centre National de la Recherche Scientifique (CNRS) in France. While matlab has a built-in spectrogram function in its signal processing toolbox, this function is proprietary. tftb has substantial overlap with ltfat, but it has many additional time-frequency representation utilities, with the primary goal of providing good tools for analyzing, as opposed to processing, data. The tftb utility for computing spectrograms is called tfrsp. Because of the highly redundant nature of the spectrogram, a typical laptop would not be able to process tfrsp of a signal like buellershort, which has 8192 samples. One possibility is first to downsample. The matlab signal processing toolbox has a utility for downsampling, but it is simple enough to write a script.
d=2;  % downsample rate
for i=1:floor(length(x)/d)
  xdownsampled(i)=x(d*i);
end
Figure 3.7 shows a full spectrogram of the bueller signal downsampled by a factor of two. The picture is relatively clean compared to the Gabor pictures because of the higher redundancy. With x the vector of bueller data, the picture was produced as follows.
xdownsampled=xdownsampled';
[tfr,t,f]=tfrsp(xdownsampled);
imagesc(log(1+10*abs(tfr)));

Fig. 3.7. Log intensity spectrogram of bueller signal computed using tfrsp.

3.2.6 Time-frequency representations
Wigner distribution
In the von Neumann model for quantum mechanics, particles are modeled as so-called states: unit vectors $\psi$ in a separable Hilbert space (say $L^2(\mathbb{R})$), with $|\psi(t)|^2\,dt$ thought of as a probability density. Then a quantity like $\int t\,|\psi(t)|^2\,dt$ can be thought of as the expected location of a particle modeled by $\psi$. Wigner sought a mathematical object to model the joint distribution of such a density in time and frequency. Such a joint density $W(f)$ should satisfy, among other things,

TF1 $W$ is bilinear, with $W(f, f) = W(f)$.
TF2 If $W(f) = W(g)$ then $f = cg$ for some $c \in \mathbb{C}$ with $|c| = 1$.
TF3 $W(f)(t,\xi) \ge 0$.
TF4 $\int W(f)(t,\xi)\,d\xi = |f(t)|^2$ and $\int W(f)(t,\xi)\,dt = |\hat f(\xi)|^2$.
TF5 If $f$ is supported in $[a,b]$ then $W(f)$ is supported in $[a,b] \times \mathbb{R}$.

It turns out that these properties are incompatible. Wigner proposed as a substitute a mapping $(f,g) \mapsto W(f,g)$, the so-called Wigner distribution, that satisfies all properties except (TF3), but requires instead that $W(f)$ be real-valued. The Wigner distribution is defined as
$$W(f,g)(t,\xi) = \int e^{-2\pi i \xi \tau}\, f\Bigl(t + \frac{\tau}{2}\Bigr)\, \overline{g\Bigl(t - \frac{\tau}{2}\Bigr)}\, d\tau. \qquad (3.8)$$
Exercise 3.2.7. With $W(f) = W(f,f)$, verify properties (TF2), (TF4) and (TF5).
Exercise 3.2.8. Show that for $g(t) = e^{-\pi t^2}$, $W(g)(t,\xi) = \sqrt{2}\, e^{-2\pi(t^2 + \xi^2)}$.

Exercise 3.2.9. Denote $D_a f(t) = \sqrt{a}\, f(at)$, $E_b f(t) = e^{2\pi i b t} f(t)$ and $T_a f(t) = f(t-a)$. Compute $W(D_a f)(t,\xi)$, $W(E_b f)(t,\xi)$ and $W(T_a f)(t,\xi)$. Also compute $W(\hat f)(t,\xi)$ where $\hat f$ is the Fourier transform of $f$. Express your answers in terms of $W(f)(t,\xi)$.
Exercise 3.2.10. (Hard) Show that shifted, dilated and modulated Gaussians are the only $L^2$ functions having completely nonnegative Wigner distributions.
Moyal's formula
The Wigner distribution is a unitary mapping from $L^2(\mathbb{R})$ to $L^2(\mathbb{R}^2)$, a fact that is known as Moyal's formula.
Theorem 3.2.11. If $f_1, f_2, g_1, g_2 \in L^2(\mathbb{R})$ then
$$\langle W(f_1,g_1),\, W(f_2,g_2)\rangle = \langle f_1, f_2\rangle\,\overline{\langle g_1, g_2\rangle}.$$
On a formal level the proof is much the same as that of the inversion formula for the short-time Fourier transform; that is, it involves changes of the order of integration and the formal identity $\int e^{-2\pi i x \xi}\,d\xi = \delta_x$.
$$\begin{aligned}
\langle W(f_1,g_1),\, W(f_2,g_2)\rangle
&= \int\!\!\int\!\!\int\!\!\int e^{-2\pi i \xi\tau_1} f_1\Bigl(t+\frac{\tau_1}{2}\Bigr)\overline{g_1\Bigl(t-\frac{\tau_1}{2}\Bigr)}\; e^{2\pi i \xi\tau_2}\, \overline{f_2\Bigl(t+\frac{\tau_2}{2}\Bigr)}\, g_2\Bigl(t-\frac{\tau_2}{2}\Bigr)\, d\tau_1\, d\tau_2\, d\xi\, dt\\
&= \int\!\!\int\!\!\int f_1\Bigl(t+\frac{\tau_1}{2}\Bigr)\overline{f_2\Bigl(t+\frac{\tau_2}{2}\Bigr)}\, g_2\Bigl(t-\frac{\tau_2}{2}\Bigr)\overline{g_1\Bigl(t-\frac{\tau_1}{2}\Bigr)}\,\delta_{\tau_1=\tau_2}\, d\tau_2\, d\tau_1\, dt\\
&= \int\!\!\int f_1\Bigl(t+\frac{\tau}{2}\Bigr)\overline{f_2\Bigl(t+\frac{\tau}{2}\Bigr)}\, g_2\Bigl(t-\frac{\tau}{2}\Bigr)\overline{g_1\Bigl(t-\frac{\tau}{2}\Bigr)}\, d\tau\, dt\\
&= \int\!\!\int f_1(u)\,\overline{f_2(u)}\, g_2(u-\tau)\,\overline{g_1(u-\tau)}\, d\tau\, du = \langle f_1, f_2\rangle\,\overline{\langle g_1, g_2\rangle}
\end{aligned}$$
by using the substitution $u = t + \tau/2$. This formally proves Moyal's identity.


Figure 3.8 shows a log scale absolute value image of a Wigner distribution of buellershort. Cross-term interference is very pronounced in this picture. It illustrates the fact that good mathematical properties do not necessarily correspond to a clean graphic.


Fig. 3.8. Log intensity Wigner distribution of bueller signal computed using tfrwv.

3.2.7 Wigner distribution and spectrogram


The spectrogram of f with window g is Spec(f, g)(t, ν) = |S(f, g)(t, ν)|². It is an energy distribution in the sense that it is nonnegative and has the property that ∫∫ |S(f, g)(t, ν)|² dt dν = ‖f‖² ‖g‖². However, unlike the Wigner distribution, it typically will not generate the correct marginal values when integrating over one of the variables. Here the Wigner distribution has another remarkable property.
Proposition 3.2.12. The spectrogram Spec(f, g)(t, ν) is the bivariate convolution of the Wigner distributions of f and g, that is,

Spec(f, g)(t, ν) = ∫∫ W(f)(s, ξ) W(g)(t − s, ν − ξ) ds dξ.
Proposition 3.2.12 follows from taking f1 = f2 = f and g1 = g2 = g in Moyal's identity and using the time-frequency shift covariance properties in Exercise 3.2.9.
Since convolution is a sort of averaging, this says that Spec(f, g)(t, ν) is an averaged or smoothed version of W(f). In particular, when g(t) = e^{−πt²}, W(g)(t, ν) = √2 e^{−2π(t² + ν²)} and Spec(f, g)(t, ν) then represents a Gaussian averaging of W(f). Remarkably, Spec(f, g)(t, ν) is always nonnegative even though W(f) is rarely completely nonnegative. This observation suggests that it might be possible to obtain time-frequency distributions having other desirable properties, i.e. substituting nonnegativity for some other property, by replacing the time-frequency kernel K(s, t; ξ, ν) = W(g)(t − s, ν − ξ) by some other time-frequency kernel K.
When K has the form K(s, t; ξ, ν) = P(t − s, ν − ξ), the resulting time-frequency distribution

C_P(f)(t, ν) = ∫∫ W(f)(s, ξ) P(t − s, ν − ξ) ds dξ    (3.9)

is said to be a Cohen class distribution after its inventor Leon Cohen. The Wigner distribution corresponds to P(t − s, ν − ξ) = δ(t − s)δ(ν − ξ). When P(t, ν) = g(t)H(ν), C_P is called a smoothed pseudo-Wigner distribution, and when g is Gaussian it is the same as the spectrogram with window g, but in general it is not a spectrogram.
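A discrete Cohen class distribution is then nothing more than a 2-D convolution of a Wigner distribution with a kernel. A minimal sketch (circular convolution via the FFT; the function name cohen_class is ours):

```python
import numpy as np

def cohen_class(W, P):
    # 2-D circular convolution of a discrete Wigner distribution W with a
    # time-frequency kernel P, as in (3.9).  A point mass at (0,0) returns W
    # itself; P = W(g) gives the spectrogram with window g (Prop. 3.2.12).
    return np.fft.ifft2(np.fft.fft2(W) * np.fft.fft2(P)).real

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 16))        # stand-in for a sampled Wigner distribution
P = rng.standard_normal((16, 16))        # stand-in for a smoothing kernel
delta = np.zeros((16, 16)); delta[0, 0] = 1.0
```

Convolving against the point mass reproduces W, and the total mass of C_P equals (Σ W)(Σ P), so a kernel with Σ P = 1 preserves total energy.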
The utility of Cohens class kernel generally will depend on what application the user has in mind, such
as reducing interference among signal components in the time-frequency plane. In this case, it is useful to
design a kernel having specific properties. How to do so is discussed in the tftb reference manual. The tftb
toolbox has a variety of time-frequency distribution tools and the general usage is


[tfr,t,f]=tfrname(signal);
where tfrname is the name of the distribution. For example, the spectrogram name is tfrsp. Some alternative time-frequency distributions are provided in Figures 3.9 and 3.10.

Fig. 3.9. Log intensity plot of smoothed pseudo Wigner distribution of bueller signal.

Fig. 3.10. Log intensity plot of Choi-Williams distribution of bueller signal.

3.2.8 Time-frequency reassignment


Because of the uncertainty principle, no reasonable time-frequency distribution can localize energy in an ideal manner. To some extent, methods to clean up time-frequency representations amount to attempts to answer: what should an ideal time-frequency representation of the data look like? Time-frequency reassignment amounts to one such method. Recall that the expected value of a random variable with density p is E(p) = ∫ x p(x) dx. In the case of a time-frequency distribution one can also define the expected value around a point. In the case of the spectrogram one can define the center of gravity of the time-frequency distribution around (t, ν) as
t̄(f; t, ν) = [ ∫∫ s W(f)(t − s, ν − ξ) W(g)(s, ξ) ds dξ ] / |S(f, g)(t, ν)|²,

ν̄(f; t, ν) = [ ∫∫ ξ W(f)(t − s, ν − ξ) W(g)(s, ξ) ds dξ ] / |S(f, g)(t, ν)|².
One then defines the reassigned spectrogram RSpec(f, g) by moving the value of the spectrogram at (t, ν) to the local center of gravity, that is,

RSpec(f, g)(t̄, ν̄) = Spec(f, g)(t, ν).


The reassigned spectrogram will no longer be a bilinear mapping since expected values of the distributions are not linear. On the other hand, remarkably, in the case of the spectrogram some of the other covariance properties are linear in expectation and thus are preserved.
Although the definition of RSpec suggests a terribly complex implementation, the reassignment map that sends (t, ν) to (t̄, ν̄) actually simplifies dramatically, [?] (??).
Proposition 3.2.13. Let Φ(t, ν) = Φ(f, g)(t, ν) = arg S(f, g)(t, ν) denote the phase of the short-time Fourier transform of f. Then

t̄ = −(1/2π) ∂Φ/∂ν (t, ν),    ν̄ = ν + (1/2π) ∂Φ/∂t (t, ν).
Numerical implementation using these rules is not completely effective, so one instead substitutes the following:

t̄ = t − ℜ{ S(f, t g(t))(t, ν) · conj S(f, g)(t, ν) / |S(f, g)(t, ν)|² },

ν̄ = ν + ℑ{ S(f, dg/dt)(t, ν) · conj S(f, g)(t, ν) / (2π |S(f, g)(t, ν)|²) }.
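For a pure tone, the reassigned frequency collapses onto the carrier no matter which analysis frequency one starts from. The Python sketch below (ours, not from the text) checks this with a direct discrete STFT; note that the sign of the correction and the 2π factors vary with the STFT convention, and the sign used here matches the convention coded in this sketch, not necessarily the one above.

```python
import numpy as np

N = 256
n = np.arange(N)
f0 = 0.2                                   # tone frequency (cycles/sample)
x = np.exp(2j * np.pi * f0 * n)
c, s = N / 2, 20.0
g = np.exp(-0.5 * ((n - c) / s) ** 2)      # Gaussian analysis window
dg = -(n - c) / s ** 2 * g                 # its derivative

def stft(win, nu):
    # STFT value at a fixed window position and frequency nu
    return np.sum(x * win * np.exp(-2j * np.pi * nu * n))

def reassigned_freq(nu):
    S, Sd = stft(g, nu), stft(dg, nu)
    return nu - (Sd * np.conj(S)).imag / (2 * np.pi * np.abs(S) ** 2)

nu_hats = [reassigned_freq(nu) for nu in (0.18, 0.20, 0.23)]
# each entry is essentially f0 = 0.2
```

The collapse follows from integration by parts: for a tone, S_{g'} = 2πi(ν − f0)S in this convention, so the correction removes ν − f0 exactly.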


Exercise 3.2.14. Comment on the quality of approximation using these formulas.


The tftb command for reassigned spectrograms is tfrrsp. The tftb has reassignment tools for a number
of time-frequency distributions beyond the spectrogram.
[tfr,rtfr,bar]=tfrrname(signal);
where tfrrname refers to the time-frequency distribution name, for example, tfrrsp in the case of the
spectrogram. The extra r is for reassigned. The vector bar refers to the local centroid mapping.
Exercise 3.2.15. Give some reasons why reassignment might not be a good idea for transforms like Gabor
transforms that are not highly redundant.
Figures 3.11 and 3.12 illustrate a reassigned spectrogram (tfrrsp) and a reassigned smoothed pseudo Wigner distribution (tfrrspwv) of the bueller signal.

Fig. 3.11. Log intensity plot of reassigned spectrogram of bueller signal.

Fig. 3.12. Log intensity plot of reassigned smoothed pseudo Wigner distribution of bueller signal.

3.3 Return to time-frequency orthonormal bases: local trigonometric bases and Wilson bases

3.3.1 Wilson bases
The Balian-Low theorem 3.2.2 says that if {e^{2πikt} g(t − n)} forms a Riesz basis for L²(ℝ) then tg(t) ∉ L²(ℝ) or ξĝ(ξ) ∉ L²(ℝ). It came as a surprise, then, when K. Wilson suggested the possibility of finding an orthonormal basis for L²(ℝ) consisting of alternating windowed sines and cosines as follows. Set

ψ_{nk}(t) = √2 w(t − n/2),                       if k = 0, n even,
ψ_{nk}(t) = √2 w(t − n/2) cos 2πkt,              if k = 1, 2, 3, …, n even,
ψ_{nk}(t) = √2 w(t − (n + 1)/2) sin 2π(k + 1)t,  if k = 0, 1, 2, …, n odd.
This says that on consecutive intervals of unit length (the origin is slightly exceptional here), one alternates
between sines and cosines as the local basis elements. The technique for designing an appropriate window
function is a bigger excursion than we want to take at this stage. Utilities for constructing discrete orthonormal Wilson bases can be found in the linear time-frequency analysis toolbox ltfat. In Figure 3.13 we have
plotted the lower half of the Wilson transform of x = buellershort as follows.
gamma=wilorth(128,1024); % orthogonal window
c=dwilt(x,gamma,64);
The window length and channel parameters govern the time versus frequency localization of the Wilson basis elements and the Wilson transform respectively. The Wilson transform plot exhibits essentially the same structure as a nonredundant Gabor transform, cf. Figure 3.1, but perhaps with better time-frequency localization.


Fig. 3.13. Log intensity plot of Wilson coefficients of bueller.

3.3.2 Local trigonometric bases


Because the window function can have exponential decay in both time and frequency, the Wilson basis has excellent time-frequency localization tradeoffs. A more flexible construction, but one for which the window function has compact support in time, is the so-called local trigonometric bases. We have already seen in essence one form of this construction with the bell functions in Chapter 2. Here we want to consider a similar bell function construction, but instead of the dilation condition Σ_{j=−∞}^{∞} b²(2^j ξ) = 1 we want b to satisfy the shift condition

Σ_{n=−∞}^{∞} b²(t − n) = 1.    (3.10)

More general constructions are outlined in [?, ?] among others.


As in Chapter 2, let ψ(x) be a nonnegative, symmetric function supported in [−1, 1] having integral π/2. Define θ(x) = ∫_{−∞}^{x} ψ(t) dt, so θ(x) − π/4 is nondecreasing, antisymmetric and has lower and upper bounds ∓π/4. In particular, θ(−x) = π/2 − θ(x). Set θ_ε(x) = θ(x/ε), s_ε(x) = sin θ_ε(x) and c_ε(x) = cos θ_ε(x). Then s_ε(x) = 0 if x < −ε, s_ε(x) = 1 if x > ε, s_ε(0) = 1/√2 and s_ε(−x) = c_ε(x), as is easily checked by properties of sine and cosine. Now set b(x) = s_{1/2}(x) c_{1/2}(x − 1). Since s_{1/2}(x) = 0 if x < −1/2 and c_{1/2}(x − 1) = 0 if x > 3/2, it follows that b(x) = 0 outside of [−1/2, 3/2].
Proposition 3.3.1. The function b(x) just defined satisfies (3.10).
Proof. We will just check that b²(x) + b²(x − 1) = 1 on [1/2, 3/2], the interval where these two overlap. In general, only two consecutive shifts will overlap, and these overlaps will follow the same pattern as on [1/2, 3/2]. First, b(x) = c_{1/2}(x − 1) on 1/2 ≤ x ≤ 3/2, while b(x − 1) = s_{1/2}(x − 1) on 1/2 ≤ x ≤ 3/2. Therefore, for 1/2 ≤ x ≤ 3/2 one has

b²(x) + b²(x − 1) = c²_{1/2}(x − 1) + s²_{1/2}(x − 1) = cos²(θ_{1/2}(x − 1)) + sin²(θ_{1/2}(x − 1)) = 1

because cos² + sin² = 1. It is important here that the same function θ_{1/2}(x − 1) is input into sine and cosine so that the Pythagorean identity can be applied.
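The bell construction is easy to reproduce numerically. The sketch below picks the admissible bump ψ(x) = (15π/32)(1 − x²)² (our choice; any nonnegative even bump on [−1, 1] with integral π/2 works) and checks the shift condition (3.10) on a grid.

```python
import numpy as np

# theta = antiderivative of psi, tabulated once on a fine grid
u = np.linspace(-1.0, 1.0, 4001)
psi = (15 * np.pi / 32) * (1 - u ** 2) ** 2       # integral over [-1,1] is pi/2
th = np.concatenate(([0.0], np.cumsum((psi[1:] + psi[:-1]) / 2 * np.diff(u))))

def theta(x):
    # theta(x) = int_{-inf}^x psi; constant 0 left of -1 and pi/2 right of 1
    return np.interp(x, u, th, left=0.0, right=np.pi / 2)

def s_eps(x, eps=0.5):
    return np.sin(theta(np.asarray(x) / eps))

def c_eps(x, eps=0.5):
    return np.cos(theta(np.asarray(x) / eps))

def b(x):
    # b = s_{1/2}(x) c_{1/2}(x-1), supported in [-1/2, 3/2]
    return s_eps(x) * c_eps(np.asarray(x) - 1.0)

x = np.linspace(0.0, 1.0, 501)
total = sum(b(x - m) ** 2 for m in range(-3, 4))
# total should be identically 1, up to quadrature/interpolation error
```

The check also confirms s_{1/2}(0) = 1/√2, since θ(0) is half the total integral π/2.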

Theorem 3.3.2. The functions e_{nk}(x) = √2 b(x − n) cos(π(k + 1/2)(x − n)), n ∈ ℤ and k = 0, 1, 2, …, form an orthonormal basis for L²(ℝ).

Proof. One must show that overlapping functions are orthogonal. We prove this for the case when one of the functions lives on [−1/2, 3/2], so n = 0. The integral defining ⟨e_{0k}, e_{mℓ}⟩ vanishes automatically unless m = 0 or m = ±1. We will consider the case m = 0 and leave the other cases as an exercise.

⟨e_{0k}, e_{0ℓ}⟩ = 2 { ∫_{−1/2}^{1/2} + ∫_{1/2}^{3/2} } b²(t) cos(π(k + 1/2)t) cos(π(ℓ + 1/2)t) dt.

The first integral can be split into ∫_{−1/2}^{0} + ∫_{0}^{1/2}. On [−1/2, 1/2], b(t) = s_{1/2}(t). Since cosine is an even function, reflecting the integral over [−1/2, 0] onto [0, 1/2] and using s_{1/2}(−t) = c_{1/2}(t) gives

{ ∫_{−1/2}^{0} + ∫_{0}^{1/2} } b²(t) cos(π(k + 1/2)t) cos(π(ℓ + 1/2)t) dt = ∫_{0}^{1/2} (s²_{1/2}(t) + c²_{1/2}(t)) cos(π(k + 1/2)t) cos(π(ℓ + 1/2)t) dt
 = ∫_{0}^{1/2} cos(π(k + 1/2)t) cos(π(ℓ + 1/2)t) dt

by the Pythagorean identity. For the integral ∫_{1}^{3/2} one uses the fact that cos(π(k + 1/2)t) = −cos(π(k + 1/2)(2 − t)) and that, on [1, 3/2), b(t) = c_{1/2}(t − 1), to write

∫_{1}^{3/2} b²(t) cos(π(k + 1/2)t) cos(π(ℓ + 1/2)t) dt = ∫_{1}^{3/2} c²_{1/2}(t − 1) cos(π(k + 1/2)(2 − t)) cos(π(ℓ + 1/2)(2 − t)) dt
 = ∫_{1/2}^{1} c²_{1/2}(1 − t) cos(π(k + 1/2)t) cos(π(ℓ + 1/2)t) dt
 = ∫_{1/2}^{1} s²_{1/2}(t − 1) cos(π(k + 1/2)t) cos(π(ℓ + 1/2)t) dt,

where we have substituted t ↦ 2 − t and used the fact that c_ε(−u) = s_ε(u). Therefore, adding this to the piece ∫_{1/2}^{1}, on which b²(t) = c²_{1/2}(t − 1), and using the Pythagorean identity, we end up with the integral of the cosine terms alone over [1/2, 1). Altogether, then,

⟨e_{0k}, e_{0ℓ}⟩ = 2 ∫_{0}^{1} cos(π(k + 1/2)t) cos(π(ℓ + 1/2)t) dt = ∫_{0}^{1} [ cos(π(k − ℓ)t) + cos(π(k + ℓ + 1)t) ] dt = δ_{kℓ},

which was to be shown. The completeness of the system {e_{nk}} follows from completeness of the trigonometric system over each of the unit intervals [n, n + 1).
Exercise 3.3.3. Show that ⟨e_{0k}, e_{1ℓ}⟩ = 0 for all k and ℓ. Explain why, in general, all inner products ⟨e_{nk}, e_{mℓ}⟩ can be reduced to calculating inner products on [0, 1], to conclude that the {e_{nk}} form an orthonormal family.
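The orthonormality asserted in Theorem 3.3.2 and Exercise 3.3.3 can be checked numerically. The sketch below replaces the integral construction of θ by the closed-form polynomial ramp ν(s) = s²(3 − 2s) (our choice; it satisfies ν(s) + ν(1 − s) = 1, so the resulting θ obeys θ(x) + θ(−x) = π/2) and computes the Gram matrix of e_{nk} for n = 0, 1 and k = 0, …, 3.

```python
import numpy as np

def theta(x, eps=0.5):
    # smooth ramp: 0 left of -eps, pi/2 right of eps, theta(x)+theta(-x)=pi/2
    s = np.clip((x / eps + 1.0) / 2.0, 0.0, 1.0)
    return (np.pi / 2) * s * s * (3.0 - 2.0 * s)

def b(x):
    # bell b = s_{1/2}(x) c_{1/2}(x-1), supported in [-1/2, 3/2]
    return np.sin(theta(x)) * np.cos(theta(x - 1.0))

def e(n, k, x):
    return np.sqrt(2.0) * b(x - n) * np.cos(np.pi * (k + 0.5) * (x - n))

M = 200000
x = -1.0 + 4.0 * (np.arange(M) + 0.5) / M           # midpoint grid on [-1, 3]
basis = [e(n, k, x) for n in (0, 1) for k in range(4)]
G = np.array([[np.dot(u, v) * (4.0 / M) for v in basis] for u in basis])
# G should be (numerically) the 8 x 8 identity
```

In particular the off-diagonal blocks, the ⟨e_{0k}, e_{1ℓ}⟩ of the exercise, vanish to quadrature accuracy.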
Though we will not go into detail here, local trigonometric bases can in fact be adapted to any partition of the real line into intervals I_n whose left endpoints t_n form a strictly increasing sequence such that lim_{n→±∞} t_n = ±∞. The main idea is to use a sequence ε_n of cutoffs in such a way that the bells b_n = s_{ε_n}(t − t_n) c_{ε_{n+1}}(t − t_{n+1}) satisfy Σ_n b_n² = 1, and such that the local trigonometric basis elements alternate polarities at the endpoints in the same sense as the cosines just considered.
3.3.3 Discrete implementations
Discrete implementations of the local trigonometric bases are often called Malvar bases because H. Malvar was the first to use them in signal processing applications. They really amount to nothing other than sampled versions of the local trigonometric bases. Coefficient pictures produced by discrete analogues of the systems


{enk } just discussed will look very much like the corresponding Wilson basis pictures such as in Figure 3.13.
As will be discussed in more detail later, it is possible to use recursive decision trees to decide whether to split
a given interval for local trigonometric analysis into two subintervals with corresponding decompositions for
each subinterval. Such splittings give rise to families of local trigonometric bases indexed by interval splittings
and one can ask which, among this family of bases, represents given data in the most efficient way. Efficiency
will be discussed in Chapter 5. The local trigonometric functions are called cosine packets and routines for
implementing cosine packet analysis can be found in the Stanford WaveLab package WaveLab802. Figure
3.14 shows the pattern of nonzero cosine packet basis coefficients in an analysis of buellershort. Although
the intensity does not show up in this scheme, the pattern essentially follows that of the other time-frequency
analysis tools.
Fig. 3.14. Cosine packet analysis of Bueller signal (phase plane of the CP best basis, frequency versus time).

3.4 Sampling and time-frequency localization


The theory of functions that are nearly jointly in the range of P_Ω and Q_T was developed in a seminal series of papers by Landau, Slepian and Pollak in the 1960s, [?]. We want to take a slightly different approach following more recent work of Khare and George [?] and of Walter et al., e.g. [?]. The goal is to give intuitive and computationally useful meaning to, though not an actual proof of, a result that was stated and proved rigorously by Slepian et al. [?]. It says, in effect, that the space of functions essentially timelimited to [−T/2, T/2] and essentially bandlimited to [−Ω/2, Ω/2] has dimension essentially ΩT. In other words, the dimension of the space of essentially time- and band-limited signals is proportional to the area of the time-frequency region. Further discussion of theoretical results as well as other numerical approaches can be found in Hogan and Lakey [?].
The operator P_Ω projects onto Ω-bandlimited functions, so one can express it in terms of integration against the sinc function. Setting P = P_1 we have

P f(x) = f ∗ sinc(x) = ∫ f(x − t) (sin πt)/(πt) dt.


If g = Q_T f is in the image of the operator Q_T then g is not bandlimited, but P Q_T f is, and we can write

P Q_T f(x) = (Q_T f) ∗ sinc(x) = ∫_{−T/2}^{T/2} f(t) (sin π(x − t))/(π(x − t)) dt.

If f itself is in PW then we can apply the sampling theorem f = Σ_{k∈ℤ} f(k) sinc(· − k) and write

P Q_T f(x) = ∫_{−T/2}^{T/2} Σ_{k∈ℤ} f(k) (sin π(t − k))/(π(t − k)) · (sin π(x − t))/(π(x − t)) dt = ⟨{f(ℓ)}, s_T(·, x)⟩_{ℓ²}

where s_T is the partial correlation

s_T(k, x) = ∫_{−T/2}^{T/2} sinc(t − k) sinc(t − x) dt.

Of course, when T → ∞, s_T(k, x) converges to sinc(x − k) and one recovers the sampling theorem. Now consider the case in which f = φ_n is the n-th eigenfunction of the operator P Q_T. Here we make use of the fact that the operator P Q_T, as a self-adjoint operator on the Hilbert space PW, has a discrete spectrum 1 > λ_0 ≥ λ_1 ≥ ⋯ → 0. If φ is an eigenfunction of P Q_T with eigenvalue λ then

λ φ(m) = P Q_T φ(m) = ⟨{φ(ℓ)}, s_T(·, m)⟩_{ℓ²}.

In other words, the sample vector v = {φ(m)} is a λ-eigenvector of the matrix A_T(m, ℓ) = s_T(ℓ, m). This means that the eigenvalue/eigenvector problem for the prolate spheroidal wave functions can be reduced to that of the discrete matrix A_T.
Exercise 3.4.1. Fix T to be a fairly large even integer, say T = 10. Estimate the entries of the matrix A_T by first approximating the sinc function by means of Taylor polynomials centered at the origin of sufficiently high order, on the one hand, and by using Legendre polynomials on the other. Then compute the svd of the matrix to obtain approximate sample eigenfunctions and plot several of them.
Now consider the operator Q_T P Q_T. The only difference between this operator and P Q_T is that the elements of the range of P Q_T are now truncated to [−T/2, T/2], so the eigenfunctions can be considered as eigenfunctions of P Q_T restricted to [−T/2, T/2]. This is no great observation, but here is a surprising one.
Proposition 3.4.2. The eigenfunctions of P Q_T are orthogonal on the whole real line and also on the interval [−T/2, T/2]; that is, if φ_n is the n-th eigenfunction of P Q_T then

∫_{−T/2}^{T/2} φ_n(t) φ_m(t) dt = λ_n δ_{nm} = λ_n ∫_{−∞}^{∞} φ_n(t) φ_m(t) dt.

Proof. Orthogonality on all of ℝ follows from the fact that eigenvectors coming from different eigenvalues of a self-adjoint operator are orthogonal. To prove orthogonality on [−T/2, T/2] we use Parseval's theorem together with an interesting little fact: the eigenfunctions are, in a sense, invariant under the Fourier transform. First, set f_ρ(x) = √ρ f(ρx), a unitary dilation. Then

P_{ρΩ} Q_{T/ρ} f_ρ = P_{ρΩ} ((Q_T f)_ρ)
 = ( ((Q_T f)_ρ)^ · 1_{[−ρΩ/2, ρΩ/2]} )ˇ
 = ( ((Q_T f)^)_{1/ρ} · 1_{[−ρΩ/2, ρΩ/2]} )ˇ
 = ( ((Q_T f)^ · 1_{[−Ω/2, Ω/2]})_{1/ρ} )ˇ
 = (P_Ω Q_T f)_ρ,

and the time-frequency area c = (ρΩ)(T/ρ) = ΩT associated with P_Ω Q_T is unchanged. In other words, dilation in a sense commutes with time-frequency localization, and the eigenfunctions of the operator P_{ρΩ} Q_{T/ρ} are exactly the dilates φ_ρ of the eigenfunctions φ of P_Ω Q_T, with the same eigenvalues.

What does the Fourier transform do to a λ-eigenfunction? Well,

λ (Q_T φ)^(ξ) = (Q_T P_Ω Q_T φ)^(ξ) = P_T (P_Ω Q_T φ)^(ξ) = P_T Q_Ω (Q_T φ)^(ξ),

where Q_Ω g(ξ) = g(ξ) 1_{[−Ω/2, Ω/2]}(ξ); here we used that, under the Fourier transform, multiplication by 1_{[−T/2, T/2]} becomes the frequency-side projection P_T and the projection P_Ω becomes multiplication by 1_{[−Ω/2, Ω/2]}. This says that (Q_T φ)^(ξ) is a λ-eigenfunction of the operator P_T Q_Ω, which is P_Ω Q_T with the roles of Ω and T interchanged. Taking ρ = T/Ω above gives P_{ρΩ} Q_{T/ρ} = P_T Q_Ω, so the eigenfunctions of P_T Q_Ω are unitary dilations by a factor T/Ω of the eigenfunctions of P_Ω Q_T. This tells us that (Q_T φ_n)^ = c_n φ_{n, T/Ω} for some constant c_n with |c_n|² = ‖Q_T φ_n‖² = ⟨P_Ω Q_T φ_n, φ_n⟩ = λ_n. Therefore, by Parseval,

∫_{−T/2}^{T/2} φ_n φ_m = (λ_n λ_m)^{−1} ∫_{−T/2}^{T/2} (P_Ω Q_T φ_n)(P_Ω Q_T φ_m)
 = (λ_n λ_m)^{−1} ∫ (Q_T P_Ω Q_T φ_n)(Q_T P_Ω Q_T φ_m)
 = ∫ (Q_T φ_n)(Q_T φ_m)
 = ∫ (Q_T φ_n)^(ξ) (Q_T φ_m)^(ξ)‾ dξ
 = c_n c̄_m ∫ φ_{n, T/Ω}(ξ) φ_{m, T/Ω}(ξ) dξ = c_n c̄_m δ_{nm} = λ_n δ_{nm}

as claimed.
Problem 3.4.3. Do the sample sequences of the prolate spheroidal wave functions satisfy some extrapolation
problem?
In other words, one would like to determine, from the sample values of φ_m inside [−T/2, T/2], its values outside [−T/2, T/2]. This problem is highly ill-posed, but it is less ill-posed when we observe that φ_m is an analytic function, and even less so when we observe that it is an eigenfunction.
3.4.1 Numerical generation of PSWFs
Figure 3.15 shows numerically generated prolate spheroidal wave functions with T = 10 and Ω = 1. The figure illustrates the fact that the first several eigenfunctions are highly concentrated in [−T/2, T/2], but that the concentration of φ_n decreases with n and, once n ≳ [ΩT], at most half of the energy of φ_n is localized inside [−T/2, T/2]. Here is a brief description of how they were created. First a matlab function sincmatrix was used to generate a partial matrix s_T(k, ℓ) for values k and ℓ running from −N to N for some user input N. The integral defining s_T(k, ℓ) was computed using matlab's built in quad function for numerical estimation of integrals. Matlab also has a built in sinc function, but for earlier versions the sinc function can be input manually, with a small correction in the denominator to avoid division by zero at t = k, ℓ.
Computing sT (k, `) is computationally intensive but only has to be done once. The eigenvectors are then
estimated numerically by using the matlab built in svd. These eigenvectors are the samples of the PSWFs.
Finally, one multiplies the matrix containing the eigenvectors of sT by a matrix containing densely sampled
values of the shifted sinc functions. The columns of the resulting product are densely sampled approximate
prolate spheroidal wave functions. The approximations here depend on two things: (i) the error tolerance in
the quadrature defining sT and (ii) the parameter N governing the size of the partial matrix of sT . In fact,
the entries of s_T decay fairly rapidly away from the diagonal. This is illustrated in Figure 3.16, showing that the entries s_T(k, ℓ) are significant only when k, ℓ lie approximately between −5 and 5 and when k ≈ ℓ. In addition, one sees in Figure 3.17 that the partial sinc matrix s_T has only about 14 significant eigenvalues and only 10 eigenvalues greater than 1/2. In fact, a theorem due to Landau [?] states that, in general, P Q_T has at most [ΩT] + 1 eigenvalues larger than 1/2. In our case, Ω = 1 and T = 10, so our numerical results are certainly consistent with the theory. Taking N any larger will provide slightly better approximations but at a higher computational cost.
Once one has the samples of the PSWFs φ_n one can compute the projection of any f ∈ PW onto the range of P Q_T simply by computing the ℓ²(ℤ) inner product (or, numerically, the partial inner product) of the samples of f with the samples of each φ_n. This is the same as multiplying the partial sample vector by the orthogonal matrix obtained in the svd above. Then one expands each of the φ_n's as before.
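A Python transcription of this recipe is short (our sketch: midpoint quadrature in place of quad, numpy.linalg.svd in place of matlab's svd; numpy's sinc is sin(πx)/(πx), matching the Ω = 1 normalization).

```python
import numpy as np

def s_T(k, x, T=10.0, m=2000):
    # s_T(k, x) = int_{-T/2}^{T/2} sinc(t - k) sinc(t - x) dt, midpoint rule
    t = -T / 2 + T * (np.arange(m) + 0.5) / m
    return np.sum(np.sinc(t - k) * np.sinc(t - x)) * (T / m)

N = 20
idx = np.arange(-N, N + 1)
A = np.array([[s_T(k, l) for l in idx] for k in idx])    # 41 x 41, symmetric
lam = np.linalg.svd(A, compute_uv=False)                 # eigenvalues (A is PSD)
big = int(np.sum(lam > 0.5))
# Landau's bound: at most [Omega*T] + 1 = 11 eigenvalues exceed 1/2
```

The right singular vectors of A are the sampled PSWFs; multiplying them against densely sampled shifted sincs, as described above, produces the curves in Figure 3.15.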

Fig. 3.15. Numerical PSWFs φ2, φ3 and φ10 for Ω = 1 and T = 10.

Fig. 3.16. Image of the 41 × 41 partial matrix s_T(k, ℓ) for T = 10.

Fig. 3.17. Eigenvalues of the partial matrix s_T.

3.5 Another look at frequency


A basic tenet of time-frequency analysis is that all signals of interest can be expressed as superpositions
of damped oscillators. In a lot of cases the signal is really a superposition of not too many oscillators,
plus random effects, but there are two fundamental reasons why untangling the individual components is
problematic. The first is that, generally, one does not know the precise nature of the oscillator. The second is
that, generally, one does not know the precise nature of forcing or damping effects. Because there are so many
unknowns one oversimplifies by treating all components as sinusoidal, then undersimplifies by expressing a
signal as a complicated superposition of sinusoids. Sinusoidal oscillations, if they are undamped, obey the
harmonic oscillator equation

d²x/dt² = −ω² x(t).

This equation is derived from Newton's law F = ma and Hooke's law F = −kx with ω = √(k/m). If, instead, the oscillations are linearly damped (e.g. thermal loss proportional to velocity) then the equation is just slightly more complicated,


d²x/dt² + (b/m) dx/dt + ω² x(t) = 0.
Damping proportional to velocity, or viscous damping, is an oversimplification; in many cases Coulomb or frictional damping is also significant. One can also call Hooke's law into question. For example, if one allows spring stiffness to depend slightly on displacement then one is quickly led to oscillators such as the Duffing oscillator

d²x/dt² + (b/m) dx/dt + ω² x(t) + ε x(t)³ = 0.
Thus, even for small amplitude vibrations and for the simplest possible nonlinearities, the result of treating
oscillations as superpositions of pure sinusoids leads to nontrivial bandwidth. Finally, there is the possibility
that a signal is generated by a coupled system of oscillators.
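The effect is easy to see numerically. Below is a hypothetical sketch integrating x'' = −ω²x − εx³ with a classical RK4 step: with ε = 0 the solution tracks cos ωt, and switching on a hardening cubic term (ε > 0) visibly raises the oscillation rate, spreading energy away from a single spectral line.

```python
import numpy as np

def simulate(omega=2 * np.pi, eps=0.0, dt=1e-3, steps=5000):
    # RK4 for x'' = -omega^2 x - eps x^3 with x(0) = 1, x'(0) = 0
    def acc(x):
        return -omega ** 2 * x - eps * x ** 3
    x, v = 1.0, 0.0
    out = np.empty(steps)
    for i in range(steps):
        out[i] = x
        k1x, k1v = v, acc(x)
        k2x, k2v = v + dt / 2 * k1v, acc(x + dt / 2 * k1x)
        k3x, k3v = v + dt / 2 * k2v, acc(x + dt / 2 * k2x)
        k4x, k4v = v + dt * k3v, acc(x + dt * k3x)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return out

def crossings(x):
    s = np.sign(x)
    return np.count_nonzero(s[1:] * s[:-1] < 0)

harmonic = simulate()             # follows cos(2*pi*t) closely
duffing = simulate(eps=30.0)      # hardening spring: oscillates faster
```

The zero-crossing counts of the two runs already reveal the frequency shift, anticipating the zero-crossing view of frequency taken up later in this section.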
It is impossible to take all of these issues into account when trying to express a system or signal, or a measurement thereof, as a superposition of damped vibrations. In many applications it is more important that time-frequency analysis can point to fundamental changes over time in the oscillatory behavior of observed signals, as time-frequency representations attempt to do. But when a signal is fundamentally a superposition of a small number of oscillators, it makes sense to try to identify waveforms beyond sinusoids.
3.5.1 Defining the spectrum of a real signal
3.5.2 Instantaneous frequency and the Hilbert transform
At the beginning of this chapter we discussed the polar or amplitude-phase decomposition z = re^{iθ} of a complex signal z(t) = x(t) + iy(t) with r = √(x² + y²) and θ = arctan(y/x). At any time t, r represents the magnitude of z and e^{iθ(t)} represents the position of a point on the unit circle. In the ideal case z = e^{iωt}, with θ expressed in radians, the signal z oscillates at a rate of ω radians per unit time and we call ω the instantaneous frequency. When z is not a pure exponential we define the instantaneous phase at time t as θ(t), and dθ/dt is called the instantaneous frequency.
These are nice definitions but they suffer a fundamental problem, which is that, ordinarily, measurable
signals are real-valued. In order to make sense out of instantaneous phase one has to make sense out of the
virtual complex signal companion of x(t).
When x is a square integrable signal of time there is a somewhat canonical way of manufacturing an analytic signal from x. This tool is called the Hilbert transform and it is defined in the following way:

f ↦ f̂ ↦ f̂ 1_{[0,∞)} ↦ (f̂ 1_{[0,∞)})ˇ = (1/2)(I + iH)f.

The operator H is called the Hilbert transform and can be defined analytically by the integral formula

Hf(t) = (1/π) p.v. ∫ f(s)/(t − s) ds.
Here p.v. stands for principal value meaning that the integral is rigorously defined by taking an appropriate limit near the singularity s = t in the denominator of the integrand. Corresponding discrete Hilbert
transforms can be defined for discrete and finite signals respectively.
Exercise 3.5.1. Give a reasonable definition of a Hilbert transform on CN .
The instantaneous frequency of f is that of its analytic extension (I + iH)f /2. While instantaneous
frequency can thus be defined in rigorous mathematical terms, its physical interpretation is still problematic.
For one thing, the phase (t) may not be a differentiable function of t. But just as critically, instantaneous
frequency may not reflect the nature of the signal if it is composed of multiple oscillating components. We
are still haunted by the fundamental tradeoff between localization in time and localization in frequency.
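For finite signals (cf. Exercise 3.5.1) the standard recipe builds the analytic signal directly from the DFT: keep the DC (and Nyquist) coefficients, double the positive frequencies and zero the negative ones; this is the same convention used by scipy.signal.hilbert. A minimal sketch:

```python
import numpy as np

def analytic(x):
    # z = x + i Hx via the DFT: zero the negative-frequency bins,
    # double the positive ones, keep DC and Nyquist unchanged.
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

n = np.arange(128)
z = analytic(np.cos(2 * np.pi * 8 * n / 128))
# z equals exp(2*pi*i*8*n/128), and the phase increments of z
# recover the instantaneous frequency 8/128 cycles per sample.
```

For a cosine on an exact DFT bin the construction is exact, and np.diff(np.unwrap(np.angle(z)))/(2π) returns the constant instantaneous frequency.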
3.5.3 Monocomponent versus multicomponent
Instantaneous frequency only yields one value at any given time. This is fine if a signal is comprised of a
single oscillating component but not if several components are present. One way to quantify the amount of
frequency variation a physical signal possesses is in terms of bandwidth.


3.5.4 Empirical bandwidth


As before let z(t) = r(t)e^{iθ(t)}, normalized so that ∫ |z|² = 1, and express the Fourier transform of z as Z(ω), which has expected value

⟨ω⟩ = ∫ ω |Z(ω)|² dω.

By Plancherel's theorem,

⟨ω⟩ = ℜ ∫ z̄(t) (1/i)(dz/dt) dt = ℜ ∫ ( dθ/dt − (i/r)(dr/dt) ) r²(t) dt = ∫ (dθ/dt) r²(t) dt

by equating real and imaginary parts. In this notation one can define the bandwidth in terms of instantaneous amplitude and frequency averages as

B² = (⟨ω²⟩ − ⟨ω⟩²)/⟨ω⟩² = (1/⟨ω⟩²) ∫ (ω − ⟨ω⟩)² |Z(ω)|² dω
 = (1/⟨ω⟩²) ∫ | (1/i)(dz/dt)(t) − ⟨ω⟩ z(t) |² dt
 = (1/⟨ω⟩²) ∫ [ (dr/dt)² + r²(t) (dθ/dt − ⟨ω⟩)² ] dt.
For a narrowband signal both terms of the last integral have to be small, meaning that both the amplitude and the instantaneous frequency have to vary slowly. In order that the Fourier transform be supported in [0, ∞), one should have dθ/dt ≥ 0. However, this definition of bandwidth still takes global averages of local information, which can lead to negative frequencies.
When signals are generated by a Gaussian stationary process meaning that random errors follow a
normal distribution and the nature of the distribution does not change over time, it is possible to compute
the expected number of zero crossings per unit time. If the average value of the signal is zero and the signal
is essentially one oscillating component, it makes sense to define the frequency locally approximately as the
number of zero crossings per unit time, for a short time average, provided a local maximum or minimum
intervenes between consecutive zero crossings. When the signal does not have an average value of zero, this
approach can be misleading.
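The zero-crossing rule is a one-liner, and a signal with nonzero mean shows exactly the failure mode just described. The numbers in the sketch below are made up for illustration.

```python
import numpy as np

def zero_crossing_freq(x, fs):
    # frequency ~ (zero crossings per second) / 2; sensible only for a
    # zero-mean, essentially single-component signal
    signs = np.sign(x)
    crossings = np.count_nonzero(signs[1:] * signs[:-1] < 0)
    return crossings * fs / (2.0 * (len(x) - 1))

fs = 1000.0
t = 0.0123 + np.arange(3001) / fs        # offset so no sample is exactly zero
x = np.sin(2 * np.pi * 5.0 * t)          # 5 Hz tone, 3 seconds
f_est = zero_crossing_freq(x, fs)        # close to 5 Hz
f_off = zero_crossing_freq(x + 2.0, fs)  # nonzero mean: no crossings at all
```

Adding the offset 2 lifts the signal entirely above zero, so the estimator returns 0 regardless of the oscillation, in the spirit of Exercise 3.5.2 below.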
Exercise 3.5.2. Compute the instantaneous frequency of the analytic signal z(t) whose real part is x(t) = a + sin t for (i) a = 0, (ii) a = 2/3, (iii) a = 4/3. What does this say about using the analytic signal to define instantaneous frequency?

3.6 Empirical mode decomposition


3.6.1 Intrinsic modes
Huang et al define an intrinsic mode function (IMF) to be any function that satisfies the following: (i) in the whole data set, the number of extrema and the number of zero crossings differ by at most one; (ii) at any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
This last condition is not completely well defined because one has to use some interpolation scheme to produce these envelopes from the maxima and minima. But it can be achieved by adding envelope interpolation points, reflecting each extremum in the data across the y-axis. The resultant waveform does not always possess a well-defined instantaneous frequency [?] (it will introduce an alias in the instantaneous frequency for nonlinearly deformed waves), but a relatively weak one in comparison with the effect of global nonstationarities.
Exercise 3.6.1. (Easy) For which values of a is x(t) = sin t + a an IMF? Is e^{−t²} sin t an IMF?


The term intrinsic mode function refers to the oscillation mode embedded in the data. An IMF is meant to have only one oscillation mode, although we will see that algorithms for extracting IMFs do not always succeed in this regard. But they do succeed in extracting riding waves that arise in the analytic extension of a + sin t, for example. An IMF is not restricted to be narrowband and is typically nonstationary. For instance, any frequency modulated (FM) signal can be an IMF.
Instantaneous frequency of an IMF
Suppose that x(t) is an IMF and let z(t) = r(t)e^{iθ(t)} be its analytic extension. Then its Fourier transform is

Z(ν) = ∫ r(t) e^{2πi(θ(t)/2π − νt)} dt.

The frequency contributing most to Z(ν) at any given time satisfies the stationary phase condition

d/dt ( θ(t)/2π − νt ) = 0,  or  (1/2π) dθ/dt = ν.

In this sense, an IMF represents a true oscillation.


3.6.2 Sifting: the Empirical Mode Decomposition
While the IMF captures some essential features of the notion of an oscillation of finite duration, data generated by a superposition of several nonstationary components will not necessarily reveal itself readily as a superposition of IMFs, so, to be of practical use, there has to be a method to disentangle the IMF components. Huang et al. [?] suggest a sifting method called the empirical mode decomposition. The crucial step is to identify characteristic scales for different oscillations.
Step 1: Extrema. The first step is to identify all local maxima and minima in the data X(t). Here one just
needs enough measurements/samples so that any reasonable notion of bandwidth is taken into account.
Step 2: Envelopes. One interpolates all of the local maxima data to produce an upper envelope. Typically
cubic spline interpolation is used. The matlab command spline can be used for this. One does the same
with all local minima information to produce a lower envelope.
Step 3: Extract zero mean signal. Denote by m(t) the average of the upper and lower envelopes. Then h(t) = X(t) − m(t) has average zero. The signal h may not be an IMF because there can still be extrema with the wrong sign. This is a result of undershoot and overshoot of the interpolation process, which can result from nonlinearities in the data. Additionally, in real oscillations the envelope mean value will not generally be zero, though it might be close. Results also depend on the interpolation method.
Step 4: Iteration over zero mean extractions. One repeats the sifting method on h and on subsequently produced means until some stopping criterion is satisfied. The desired stopping criterion is that the h produced after some number of iterations is an IMF. Huang et al. suggest that if the relative difference in ℓ² norm (on samples) of successive h's is small enough then one should also stop. "Sufficiently small" can be made a tunable, data dependent parameter; Huang et al. suggest 0.2 to 0.3 for the relative error. The resulting component C = C1 should contain the finest time scale (highest frequency) oscillation of the data X(t).
Step 5: Iterate on the residual signal. This step repeats Steps 1 to 4 to define more IMFs. If R1 = X - C1,
then iteration on R1 yields a component C2 and a residual R2, and so on, so that X = R1 + C1 = R2 + C2 + C1 = ... .
It makes sense to adopt two separate stopping criteria for this outer iteration. The first criterion is
that the component or residue becomes small after some number of steps. A second criterion is that the
residue does not have any local maxima after some number of steps, in which case it is called a trend.
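The sifting loop in Steps 1-5 is short enough to sketch in code. The following is a minimal Python/NumPy illustration, not Huang et al.'s implementation: the function names, the bare-bones envelope construction (which ignores the endpoint problems real EMD codes must handle), and the simple tolerance test are assumptions made for the sketch.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One sifting pass: subtract the mean of the spline envelopes."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]   # Step 1: local extrema
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None                            # too few extrema: a trend
    upper = CubicSpline(maxima, x[maxima])(t)  # Step 2: envelopes
    lower = CubicSpline(minima, x[minima])(t)
    m = 0.5 * (upper + lower)
    return x - m                               # Step 3: zero-mean signal

def extract_imf(x, tol=0.2, max_iter=100):
    """Step 4: repeat sifting until the relative l2 change drops below tol."""
    h = x
    for _ in range(max_iter):
        h_new = sift_once(h)
        if h_new is None:
            return h
        if np.sum((h - h_new) ** 2) / np.sum(h ** 2) < tol:
            return h_new
        h = h_new
    return h

def emd(x, max_imfs=8):
    """Step 5: peel components off the residual until it is a trend."""
    imfs, r = [], np.asarray(x, dtype=float)
    for _ in range(max_imfs):
        c = extract_imf(r)
        imfs.append(c)
        r = r - c
        if len(argrelextrema(r, np.greater)[0]) == 0:
            break                              # no maxima left: a trend
    return imfs, r
```

By construction the components plus the final residual reconstruct the input exactly, which is a useful sanity check on any implementation.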
The components Ci should be locally almost orthogonal in the sense that Ci+1 locally corresponds to
a mean value and Ci locally corresponds to a difference from that mean. Huang et al. provide a measure
called index of orthogonality, essentially the ratio of the magnitudes of the cross correlations of different
components to the total signal energy.
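That measure is easy to compute from the extracted components. The sketch below assumes the simplest reading of the definition, the zero-lag cross-correlations of distinct components normalized by the total signal energy; the exact formula in Huang et al. may differ in detail.

```python
import numpy as np

def orthogonality_index(components, x):
    """Ratio of cross-component energy to total signal energy.
    Values near zero indicate nearly orthogonal components."""
    cross = 0.0
    for i in range(len(components)):
        for j in range(len(components)):
            if i != j:
                cross += np.sum(components[i] * components[j])
    return abs(cross) / np.sum(x ** 2)
```

For two sinusoids at distinct integer frequencies on a full-period sample grid the index is essentially zero, since such sinusoids are exactly orthogonal in the discrete inner product.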
Results of the empirical mode decomposition applied to the data buellershort are given in Figure 3.18.
The first IMF is plotted over the original data in Figure 3.19.


Spectrogram intensity plots of sums of the first few IMFs are given in Figure 3.20. Each of the figures
illustrates one nontrivial issue with EMD, namely that multiple overlapping frequencies can show up in a
single IMF.

[Figure 3.18 here: eight stacked panels labeled Bueller IMF 1 through Bueller IMF 8, plotted against sample index from 0 to 9000.]
Fig. 3.18. First eight IMFs of bueller data. EMD identified 13 IMFs but the last several have small amplitude.

3.6.3 Hilbert spectrum


One of the basic premises of EMD is that the complex envelope of x(t) provides an inappropriate measure of
instantaneous spectrum because it tries to reduce all of the spectral information in a possibly multicomponent
signal to a single number. When x is expressed as a superposition of several component IMFs, it becomes
appropriate to compute the instantaneous frequencies of each of the IMFs. That is, let Zi denote the analytic
extension of Ci and write Zi(t) = Ri(t) e^{iθi(t)}. This decomposition then resembles the Fourier decomposition of
a finite superposition of oscillating components, except that in the Fourier series case each component is stationary.
Huang et al. refer to the time-frequency distribution H(t, ω) of the amplitudes Ri as the Hilbert spectrum of x. It
is a time-frequency distribution in the same manner as the spectrogram, but it is defined intrinsically, based
only on very simple suppositions about what properties an oscillating component should have.
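Concretely, each Ri and θi can be computed with a discrete Hilbert transform. The following Python sketch uses scipy.signal.hilbert for the analytic extension and a first-difference estimate of dθi/dt; the discretization choice is an assumption of the sketch, not part of the definition.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_freq(c, fs):
    """Amplitude R(t) and instantaneous frequency (in Hz) of one IMF,
    via its analytic signal Z(t) = R(t) exp(i * theta(t))."""
    z = hilbert(c)                             # analytic extension of c
    amp = np.abs(z)                            # R(t)
    phase = np.unwrap(np.angle(z))             # theta(t), without 2*pi jumps
    freq = np.diff(phase) * fs / (2 * np.pi)   # d(theta)/dt, first difference
    return amp[:-1], freq                      # trimmed to matching lengths
```

Applied to a pure tone sampled over an integer number of periods, the recovered amplitude is constant and the instantaneous frequency equals the tone's frequency, as it should.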
3.6.4 Limitations of EMD
Empirical mode decomposition has several drawbacks. For one, it does not necessarily track coherent components
whose frequencies change with time. Another drawback, illustrated in Figure 3.21, is that intrinsic
mode functions can sometimes be aggregates of multiple frequency oscillations. The beating pattern in the
first IMF of the buellershort data happens because the IMFs do not take into account fine information
about derivatives at zero crossings. In this case, the signal contains an oscillation of the form cos(α - β)t and
another of the form cos(α + β)t with similar amplitudes, so that one gets, effectively, twice the product
cos αt cos βt, in which the slower factor acts as an envelope of the faster oscillation. Such behavior arises in
real systems. The shortcoming here is that nothing in the EMD algorithm prevents the envelope itself from
having a zero. In terms of physical systems, the question becomes one of whether one wishes to represent a
coupled oscillator as a single component or as a superposition of multiple components.
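The beating identity itself is elementary and easy to check numerically. In the sketch below the rates α and β are illustrative values, not parameters of the Bueller data:

```python
import numpy as np

t = np.linspace(0, 1, 2000)
alpha = 2 * np.pi * 100   # fast inner oscillation rate (illustrative)
beta = 2 * np.pi * 5      # slow envelope rate (illustrative)

# Two close-frequency cosines summed ...
sum_form = np.cos((alpha - beta) * t) + np.cos((alpha + beta) * t)
# ... equal a slow envelope times a fast carrier.
product_form = 2 * np.cos(alpha * t) * np.cos(beta * t)
```

Since the envelope factor cos βt changes sign, the sum has zeros that no single-IMF representation with a nonvanishing envelope can reproduce.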

[Figure 3.19 here: the Bueller data and its first IMF plotted against sample index from 0 to 9000.]
Fig. 3.19. Bueller data and first IMF.

[Figure 3.20 here: spectrogram intensity plots of the sums of the first 1, 3, and 4 IMFs.]

Fig. 3.20. Sums of first few IMFs. The sum of the first four captures most of the spectrogram in Figure 3.7.

[Figure 3.21 here: detail of the first IMF over samples 1350 to 1800, showing the beating pattern.]
Fig. 3.21. Beating in first IMF of Bueller signal illustrates one limitation of EMD.

3.7 Appendices to Chapter 3


3.7.1 Code for Gabor compression
function y=gaborcompress(X,P);
%
% gaborcompress reconstructs an approximation
% of a one-dimensional signal from P percent of its largest
% Gabor coefficients.
% X should be a 1xN row vector.
% N should be at least 32.
%
L=2^(floor(log2(length(X))));          % dyadic length
X=double(X(1:L));                      % floating point
p=(100-P)/(100);
%
g=pgauss(L/4,L/16,L/8);
a=8;                                   % default shift factor
M=L/a;                                 % number of channels

dg=candual(g,a,M);
tic                                    % start clock
[dgtX,Ls] = dgt(X,g,a,M);
disp(['-----------']);
disp('Forward DGT');
toc                                    % stop clock
disp(['-----------']);
s1=size(dgtX,1)
s2=size(dgtX,2)
s=s1*s2
figure;
imagesc(log(1+10*abs(flipud(dgtX(1:floor(s1/2),:)))));
tic
fcsort = sort(abs(dgtX(:)));           % sort Gabor coeff by magnitudes
fcerr = cumsum(fcsort.^2);             % sum of squares
fcerr = flipud(fcerr);                 % decreasing order
fthresh = fcsort(floor(p*s));          % specify threshold
cf_X = dgtX .* (abs(dgtX) > fthresh);  % keep large coefficients
disp(['-----------']);
disp('Sorting/thresholding');
toc
disp(['-----------']);
figure;
imagesc(log(1+10*abs(flipud(cf_X(1:floor(s1/2),:)))));
tic
idgt_X = idgt(cf_X,dg,a,Ls);
disp(['-----------']);
disp('Inverse DGT');
toc
disp(['-----------']);
figure;
y=real(idgt_X);
subplot(1,2,1);
plot(X);
axis([1 length(X) min(X) max(X)]);
title('original data');
subplot(1,2,2)
plot(y);
axis([1 length(X) min(X) max(X)]);
title('reconstruction from large Gabor coefficients');
