
Applications of Digital Signal Processing

Stephen Redmond, Room 146, Ste.Redmond@gmail.com

Course Description
Builds on the fundamentals of Digital Signal Processing to show specific examples of signal

processing algorithms in real-world applications.

Content
24 lectures & associated homeworks and projects.

Texts
• Discrete-Time Signal Processing (2nd Ed.), Oppenheim and Schafer, Prentice-Hall,

1999.

• Digital Processing of Speech Signals, Rabiner and Schafer, Prentice-Hall, 1978.

• Statistical Digital Signal Processing and Modelling, M. Hayes, Wiley, 1996.

Syllabus
1. Brief review of DSP

Nyquist's Sampling Theorem, Discrete-Time Fourier Transform, z-transform, Inverse
z-transform.

2. Properties and design of filters

3. Adaptive filtering

4. Spectral analysis

5. Speech processing

6. DSP implementation issues

Marking
• 15% - Three homework assignments, one every 3 weeks (after every 6 lectures), with

5% for each assignment.

• 15% - For a Matlab-based practical assignment due before the Easter break. A choice

of three topics covered in the course will be available.

• 70% - For the final exam. 5 questions, attempt 4.

1 Brief review of DSP fundamentals
1.1 Nyquist's Sampling Theorem
The ideal sampling of a signal, f(t), is the same as multiplying it by an impulse train,

δ_T(t) = Σ_{n=−∞}^{∞} δ(t − nT). The resulting signal is (c.f. Figure 1)

f̄(t) = f(t) δ_T(t)

f̄(t) = Σ_{n=−∞}^{∞} f(nT) δ(t − nT).

Figure 1: Ideal sampling process.

Note that δ_T is periodic and so can be represented by a Fourier series,

δ_T(t) = Σ_{n=−∞}^{∞} c_n e^{jnω₀t},

where,

c_n = (ω₀/2π) ∫_{−π/ω₀}^{π/ω₀} δ_T(t) e^{−jnω₀t} dt = 1/T.   (Note: T/2 = π/ω₀.)

Therefore,

f̄(t) = (1/T) Σ_{n=−∞}^{∞} f(t) e^{jnω₀t}.

Taking the Laplace transform of both sides gives

F̄(s) = (1/T) Σ_{n=−∞}^{∞} F(s − jnω₀),

which gives the frequency spectrum of the ideally sampled signal as

F̄(jω) = (1/T) Σ_{n=−∞}^{∞} F(j(ω − nω₀)).

Therefore the spectrum of f̄(t) consists of an infinite number of copies of the spectrum of

f(t), shifted to be centred on the multiples of the sampling frequency and scaled by the

amount 1/T (c.f. Figure 2).

If the highest frequency in f(t) is less than ω₀/2 then the signal may be perfectly re-
constructed, given an ideal analog lowpass filter. This is Nyquist's sampling theorem for

baseband signals. We can also show for bandpass signals that the signal only needs to be

sampled at a rate greater than twice the bandwidth of the signal.

Figure 2: Ideal sampling spectrum. The spectrum is scaled by 1/T and copied to multiples
of ω₀. An ideal analog reconstruction filter is shown (dashed).

1.2 Discrete-Time Fourier Transform (DTFT)
The continuous-time Fourier Transform of a signal, f(t), is defined as

F(jω) = ∫_{−∞}^{∞} f(t) e^{−jωt} dt.

If f(t) is sampled by impulses, then we are working with the signal f̄(t):

f̄(t) = Σ_{n=−∞}^{∞} f(nT) δ(t − nT).

Taking the Fourier Transform of this gives

F̄(jω) = ∫_{−∞}^{∞} f̄(t) e^{−jωt} dt
       = ∫_{−∞}^{∞} Σ_{n=−∞}^{∞} f(nT) δ(t − nT) e^{−jωt} dt
       = Σ_{n=−∞}^{∞} f(nT) ∫_{−∞}^{∞} δ(t − nT) e^{−jωt} dt
       = Σ_{n=−∞}^{∞} f(nT) e^{−jnωT}.

F̄(jω) denotes the spectrum of the sampled version of f(t). The transform is called the

Discrete-Time Fourier Transform (DTFT). Writing it in terms of its real and imaginary parts,

F̄(jω) = α(ω) + jβ(ω),

|F̄(jω)| = √(α²(ω) + β²(ω)),

∠F̄(jω) = arctan( β(ω) / α(ω) ).

1.3 z-transform
Let f[n] be a sequence obtained by sampling the signal f(t) every T seconds (i.e. at

t = nT). The z-transform of a sequence, f[n], is defined as:

F̃(z) = Σ_{n=−∞}^{∞} f[n] z^{−n}.

z is a complex variable. We note that if z = e^{jωT} then this transform is identical to the

DTFT (i.e., F̄(jω) = F̃(e^{jωT})). Also note that as ω varies, z = e^{jωT} traces out the locus of

a unit circle in the complex plane, with its centre at z = 0.

Figure 3: Unit circle in the z-plane traced out by the function z = e^{jωT}.

Example: Find the DTFT magnitude and phase given H̃(z)
The transfer function of a digital filter is given by the z-function, H̃(z); find the magnitude
and phase response.

H̃(z) = 1 + z^{−1}
H̃(e^{jωT}) = 1 + e^{−jωT}
           = 1 + cos(ωT) − j sin(ωT)

|H̃(e^{jωT})| = √( (1 + cos(ωT))² + sin²(ωT) )
             = √2 √(1 + cos(ωT))

∠H̃(e^{jωT}) = arctan( −sin(ωT) / (1 + cos(ωT)) )
            = arctan( −sin(2[ωT/2]) / (1 + cos(2[ωT/2])) )
            = arctan( −2 sin(ωT/2) cos(ωT/2) / (2 cos²(ωT/2)) )
            = arctan( −sin(ωT/2) / cos(ωT/2) )
            = −ωT/2
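The closed forms above are easy to verify numerically. The sketch below is an illustrative addition (not part of the original notes): it evaluates H̃(e^{jωT}) directly with NumPy and compares against the derived magnitude and phase.

```python
import numpy as np

# Evaluate H(e^{jwT}) = 1 + e^{-jwT} directly and compare with the closed
# forms derived above: |H| = sqrt(2)*sqrt(1 + cos(wT)), angle(H) = -wT/2.
wT = np.linspace(0.01, np.pi - 0.01, 500)   # avoid wT = pi, where H = 0
H = 1 + np.exp(-1j * wT)

mag_closed = np.sqrt(2) * np.sqrt(1 + np.cos(wT))
phase_closed = -wT / 2
```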

1.4 Inverse z-transform
Official version:

f[n] = (1/2πj) ∮_C F̃(z) z^{n−1} dz,

where C is an appropriately chosen contour in the region of convergence of the z-plane.
However, in practice there are many simple tricks for finding the inverse z-transform.

Example: Find the inverse z-transform using partial fractions

The impulse response of a digital filter has the following z-transform, H̃(z); find the impulse
response of the filter.

H̃(z) = 1 / ( z(z − 1)(2z − 1) )

Using partial fraction expansion we get:

H̃(z) = z^{−1} ( 1 + z/(z − 1) − 2z/(z − 0.5) )

Using the formula tables we find the inverse transform is:

h[n] = δ(n − 1) + u(n − 1) − 2(0.5)^{n−1} u(n − 1),

Given a function, H̃(z), the sequence obtained by taking the inverse z-transform, h[n],
is bounded only if the poles of the function lie inside the unit circle. If the function

H̃(z) is the transfer function of a filter, this sequence h[n] is called the impulse

response.
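The long-division route mentioned above can be mechanised. The following sketch is an illustrative addition (not from the notes): it rewrites H̃(z) = 1/(z(z−1)(2z−1)) = z^{−3}/(2 − 3z^{−1} + z^{−2}) as a difference equation and drives it with a unit impulse.

```python
# H(z) = 1/(z(z-1)(2z-1)) = z^{-3} / (2 - 3z^{-1} + z^{-2}); implement the
# corresponding difference equation and drive it with a unit impulse.
b = [0.0, 0.0, 0.0, 1.0]      # numerator coefficients of z^{-k}
a = [2.0, -3.0, 1.0]          # denominator coefficients of z^{-k}
x = [1.0] + [0.0] * 9         # unit impulse
h = []
for n in range(len(x)):
    acc = sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)
    acc -= sum(a[k] * h[n - k] for k in range(1, len(a)) if n - k >= 0)
    h.append(acc / a[0])
# h begins [0, 0, 0, 0.5, 0.75, 0.875, ...], matching the partial fractions.
```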

2 Properties and Design of Filters
2.1 Frequency Transfer Functions
The frequency response of a linear digital filter may be represented by the transfer function

H̃(z). Suppose we know the z-transform of the input signal, x[n], is X̃(z). Therefore, we

can find the output of the filter, y[n], since its z-transform is defined as

Ỹ(z) = H̃(z) X̃(z).

The transfer function is usually given as, or can be reduced to, the ratio of two polynomials.

For example,

H̃(z) = B̃(z)/Ã(z) = (b₀ + b₁z^{−1} + b₂z^{−2} + ... + b_M z^{−M}) / (a₀ + a₁z^{−1} + a₂z^{−2} + ... + a_N z^{−N}) = Ỹ(z)/X̃(z).
Rearranging we get

a₀Ỹ(z) + a₁Ỹ(z)z^{−1} + ... + a_N Ỹ(z)z^{−N} = b₀X̃(z) + b₁X̃(z)z^{−1} + ... + b_M X̃(z)z^{−M}.

Taking the inverse z-transform gives

a₀y[n] + a₁y[n − 1] + ... + a_N y[n − N] = b₀x[n] + b₁x[n − 1] + ... + b_M x[n − M].

Rearranging once more gives a linear constant coefficient difference equation which can be

used to directly implement the filter,

y[n] = (1/a₀) [ (b₀x[n] + b₁x[n − 1] + ... + b_M x[n − M]) − (a₁y[n − 1] + ... + a_N y[n − N]) ].

In a finite impulse response (FIR) filter all aᵢ = 0 for i > 0, so there is no feedback. FIR

filters are always stable. An infinite impulse response (IIR) filter is unstable if the poles

(where the denominator is zero) are outside the unit circle defined by |z| = 1, i.e., z = e^{jωT}.
We will see later that, when using finite precision arithmetic, it is sometimes difficult to

ensure that the poles are inside the unit circle.
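Since the poles are the roots of a₀z^N + a₁z^{N−1} + ... + a_N, the stability condition is easy to test numerically. A minimal sketch, added here for illustration (not from the notes):

```python
import numpy as np

def is_stable(a):
    """True if the IIR filter with denominator a[0] + a[1]z^{-1} + ... +
    a[N]z^{-N} has all of its poles strictly inside the unit circle."""
    poles = np.roots(a)      # roots of a[0]z^N + a[1]z^{N-1} + ... + a[N]
    return bool(np.all(np.abs(poles) < 1.0))
```

For example, `is_stable([1.0, -0.5])` is True (single pole at z = 0.5), while `is_stable([1.0, -2.0, 2.0])` is False (poles at 1 ± j, outside the unit circle).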

2.2 Pole-Zero plots
Another way to examine the transfer function is to factorise the numerator and denominator

and see where the poles and zeros lie in the complex z-plane. For example we would

factorise the transfer function as follows:

H̃(z) = B̃(z)/Ã(z) = (b₀ + b₁z^{−1} + b₂z^{−2} + ... + b_M z^{−M}) / (a₀ + a₁z^{−1} + a₂z^{−2} + ... + a_N z^{−N}) = Π_{i=1}^{M} (1 − zᵢ z^{−1}) / Π_{i=1}^{N} (1 − pᵢ z^{−1}).

We can manipulate the equation into the following form:

H̃(z) = ( z^{−M} Π_{i=1}^{M} (z − zᵢ) ) / ( z^{−N} Π_{i=1}^{N} (z − pᵢ) ).

Now if we want to know the steady state frequency response of the filter we set z = e^{jωT},

H̃(e^{jωT}) = ( e^{−jωTM} Π_{i=1}^{M} (e^{jωT} − zᵢ) ) / ( e^{−jωTN} Π_{i=1}^{N} (e^{jωT} − pᵢ) ).

Next find the magnitude spectrum, |H̄(jω)| = |H̃(e^{jωT})| (remember |e^{jθ}| = 1):

|H̃(e^{jωT})| = ( |e^{−jωT}|^M Π_{i=1}^{M} |e^{jωT} − zᵢ| ) / ( |e^{−jωT}|^N Π_{i=1}^{N} |e^{jωT} − pᵢ| ) = Π_{i=1}^{M} |e^{jωT} − zᵢ| / Π_{i=1}^{N} |e^{jωT} − pᵢ|.

In words, the magnitude response of the filter at a frequency ω is the product of the

distances from e^{jωT} to each of the zeros, divided by the product of the distances from e^{jωT}
to each of the poles.

Example: Sketch the magnitude response of h[n] = [1, −0.25, −0.125]
Use a pole-zero plot to sketch the magnitude of the frequency response of the filter whose

impulse response is h[n] = [1, −0.25, −0.125]. The z-transform of h[n] is

H̃(z) = 1 − 0.25z^{−1} − 0.125z^{−2}.

Factorising this gives

H̃(z) = z^{−2}(z − 0.5)(z + 0.25).

Hence, the frequency response magnitude is given by

|H̃(e^{jωT})| = |e^{jωT} − 0.5| |e^{jωT} + 0.25|
             = L₁·L₂

Figure 4: An example of a pole-zero plot with two real zeros at z = −0.25 and z = 0.5.
The magnitude of the frequency response at z = e^{jωT} is the product of L₁ and L₂.

We can sketch the magnitude response by determining L₁ and L₂ at various locations

on the unit circle. Figure 5 shows a plot of the magnitude response of the filter, h[n] = [1, −0.25, −0.125].

ωT = 0:   L₁ = 1/2,  L₂ = 5/4  ⇒  |H̃(e^{jωT})| = 5/8

ωT = π/2: L₁ = √((1/2)² + 1²) = √5/2,  L₂ = √((1/4)² + 1²) = √17/4  ⇒  |H̃(e^{jωT})| = √85/8

ωT = π:   L₁ = 3/2,  L₂ = 3/4  ⇒  |H̃(e^{jωT})| = 9/8
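These spot values can be checked by computing the product of distances directly. An illustrative sketch (not part of the notes):

```python
import numpy as np

# Product of distances from e^{jwT} to the zeros at z = 0.5 and z = -0.25
# (the double pole at the origin contributes |e^{jwT}|^2 = 1).
zeros = [0.5, -0.25]

def mag(wT):
    z = np.exp(1j * wT)
    return float(np.prod([abs(z - z0) for z0 in zeros]))
```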

Figure 5: The magnitude of the frequency response of the filter H̃(z) = 1 − 0.25z^{−1} − 0.125z^{−2}.
It is sometimes intuitively helpful to think of the zeros as depressions in the z-plane.
The magnitude of the frequency response is then the height of the absolute value of the

function over the z-plane (c.f. Figure 6).

Figure 6: The log-magnitude of the function H̃(z) = 1 − 0.25z^{−1} − 0.125z^{−2} evaluated
over the z-plane. Also shown is the unit circle, z = e^{jωT}. We see that the zeros form
depressions in the surface.

Example: Place a spectral null at ωT = 2π/3

We need a zero on the unit circle at z₁ = e^{j2π/3}. However, if we want the filter coefficients

to be real we must put a zero at the complex conjugate position, z₂ = e^{j4π/3}. This gives the

following transfer function:

H̃(z) = (z − e^{j2π/3})(z − e^{j4π/3})

|H̃(e^{jωT})| = |e^{jωT} − e^{j2π/3}| |e^{jωT} − e^{j4π/3}| = L₁L₂

or

H̃(z) = 1 + z^{−1} + z^{−2}  ⇒  h[n] = [1, 1, 1]

The pole-zero plot for this function is shown in Figure 7.

Figure 7: The pole-zero plot of the transfer function H̃(z) = 1 + z^{−1} + z^{−2}. Also shown is
the magnitude frequency response, |H̃(e^{jωT})|.

This is essentially a low pass filter. We would have expected as much since the output
of the filter, y[n], is the sum of the present input and two previous inputs:

y[n] = x[n] + x[n − 1] + x[n − 2].



Again we can imagine the zeros as depressions in the surface of the function H̃(z), as is

shown in Figure 8. Note that any component in the input signal at which ωT = 2π/3 will be
completely removed. The filter is said to have a notch at ωT = 2π/3. However, the notch
may be considered unacceptably wide, since it greatly attenuates nearby frequencies. We

can remedy this by introducing a pole nearby.

Figure 8: The log-magnitude of the function H̃(z) = 1 + z^{−1} + z^{−2} evaluated over the
z-plane. Also shown is the unit circle, z = e^{jωT}. We see that the zeros lie on the unit
circle.

Example: Implementing an ad-hoc notch filter at ωT = 2π/3

Suppose we want to augment the filter above by adding a pole. We must keep the pole

inside the unit circle, and we want to place it as near as possible to the zero. Let's place

it at z = p₁ = 0.95e^{j2π/3}. But we must also place one at the complex conjugate position to

keep the filter coefficients real: z = p₂ = 0.95e^{j4π/3}. Now we have a new filter,

H̃(z) = (z − e^{j2π/3})(z − e^{j4π/3}) / ( (z − 0.95e^{j2π/3})(z − 0.95e^{j4π/3}) )
     = (1 + z^{−1} + z^{−2}) / (1 + (0.95)z^{−1} + (0.95)²z^{−2})


|H̃(e^{jωT})| = |e^{jωT} − e^{j2π/3}| |e^{jωT} − e^{j4π/3}| / ( |e^{jωT} − 0.95e^{j2π/3}| |e^{jωT} − 0.95e^{j4π/3}| )
             = L₁L₂ / (M₁M₂).

Notice that when z is far away from a pole-zero pair, the absolute value of their

ratio, let's call it R̃(z), is approximately unity, i.e.,

|R̃(z)| = |z − e^{j2π/3}| / |z − 0.95e^{j2π/3}| ≈ 1

for z far away from e^{j2π/3}. This is exactly what we want: a zero at the specified frequency

and approximately unity gain elsewhere. We can visualise this using the pole-zero plot in

Figure 9. Figure 10 shows the function H̃(z) evaluated over the z-plane. Again, we find

the magnitude of the frequency spectrum, |H̃(e^{jωT})|, by circumnavigating the unit circle,

z = e^{jωT}. A plot of |H̃(e^{jωT})| is shown in Figure 11.


We have seen how moving zeros and poles around the z-plane allows us to approximately
design filters with a desired frequency response. Iterative computer-based techniques do

this when trying to design a filter with an arbitrary frequency response: they make a

guess and then move the poles and zeros to improve that guess. One famous algorithm is

the Remez algorithm.
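The ad-hoc notch above is easy to check numerically. The sketch below is an illustrative addition (not from the notes): it evaluates the ratio of the two polynomials around the unit circle and confirms a null at ωT = 2π/3 with roughly unity gain elsewhere.

```python
import numpy as np

# Notch filter H(z) = (1 + z^{-1} + z^{-2}) / (1 + 0.95z^{-1} + 0.95^2 z^{-2})
b = np.array([1.0, 1.0, 1.0])           # numerator coefficients of z^{-k}
a = np.array([1.0, 0.95, 0.95 ** 2])    # denominator coefficients of z^{-k}

def H(wT):
    zinv = np.exp(-1j * wT)
    # sum_k c[k] * zinv^k, evaluated with polyval on reversed coefficients
    return np.polyval(b[::-1], zinv) / np.polyval(a[::-1], zinv)
```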

Figure 9: Pole-zero plot for a notch filter at ωT = 2π/3.

Figure 10: The log-magnitude of the function H̃(z) = (1 + z^{−1} + z^{−2}) / (1 + 0.95z^{−1} + (0.95)²z^{−2}) evaluated over
the z-plane. Also shown is the unit circle, z = e^{jωT}. We see that the poles 'pull' the
surface back up in the vicinity of the zeros.

Figure 11: The magnitude spectrum, |H̃(e^{jωT})|, of the notch filter.

2.3 All-pass systems

Consider the following filter

Ã(z) = −(1/p) (1 − pz^{−1}) / (1 − (1/p*)z^{−1}).

This filter has a zero at z = p and a pole at z = 1/p*, where * denotes the complex conjugate.
We can examine what the magnitude of the frequency response is:

Ã(z) = (−1/p + z^{−1}) / (1 − (1/p*)z^{−1})

Ã(e^{jωT}) = (−1/p + e^{−jωT}) / (1 − (1/p*)e^{−jωT})
           = e^{−jωT} (1 − (1/p)e^{jωT}) / (1 − (1/p*)e^{−jωT})
           = e^{−jωT} (b / b*),

where b = 1 − (1/p)e^{jωT}, so

|Ã(e^{jωT})| = 1.

The magnitude response of the filter at all frequencies is unity. So what? Well, suppose

we have designed a filter which has a pole, p, outside the unit circle. This filter will be

unstable. However, the allpass filter removes that pole by placing a zero there, and a new

pole is placed at the position 1/p* (the distance from the origin is now the inverse of the

original distance). There is also a −1/p factor to ensure the gain is one. In this way we can make any

IIR filter stable while keeping the same magnitude response. Unfortunately the phase is altered.

Example:
Make the filter with the following transfer function stable:

H̃(z) = 1 / ( (1 − (1 − j)z^{−1}) (1 − (1 + j)z^{−1}) ) = 1 / (1 − 2z^{−1} + 2z^{−2}).

Figure 12: The magnitude and phase response of the unstable filter H̃(z) =
1/((1 − (1 − j)z^{−1})(1 − (1 + j)z^{−1})) = 1/(1 − 2z^{−1} + 2z^{−2}).

This filter has poles at z = (1 − j) and z = (1 + j). We transform one pole at a time

by multiplying by an allpass filter. The new transfer function is

H̃_Stable(z) = ( −1/(1 − j) ) · 1/(1 − (1/(1+j))z^{−1}) · ( −1/(1 + j) ) · 1/(1 − (1/(1−j))z^{−1})
             = (1/2) · 1 / (1 − 2(0.5)z^{−1} + (1/2)z^{−2})
             = 1 / (2 − 2z^{−1} + z^{−2}).

This stable filter has poles at z = 0.5 ± j0.5. Since they are inside the unit circle the filter

is stable.
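A numerical check, added here for illustration (not part of the notes), that the all-pass trick preserved the magnitude response while moving the poles inside the unit circle:

```python
import numpy as np

# Compare |H(e^{jwT})| for the unstable and stabilised transfer functions.
wT = np.linspace(0.0, np.pi, 256)
zinv = np.exp(-1j * wT)
H_unstable = 1.0 / (1 - 2 * zinv + 2 * zinv ** 2)
H_stable = 1.0 / (2 - 2 * zinv + zinv ** 2)

poles_stable = np.roots([2.0, -2.0, 1.0])   # roots of 2z^2 - 2z + 1
```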

Figure 13: The magnitude and phase response of the stabilised filter H̃_Stable(z) =
1/(2 − 2z^{−1} + z^{−2}). The magnitude response is the same but the phase response is different.

2.4 Design of IIR filters from analog prototypes

We can design filters in the digital domain if we want. However, there is a huge literature

on analog filter design, so we can steal some of their ideas and make them work in the

digital domain. Before we see how to make them work in the digital domain, let's revise

how to build an analog filter to a given specification.

2.4.1 Analog Butterworth filter

Take, for example, the power transfer function of an nth order lowpass Butterworth filter,

|H(jω)|² = 1 / ( 1 + (ω/ωc)^{2n} ).

We are usually given a power transfer specification which we need to meet using this

transfer function.

Hence, we must choose ωc and n to meet the requirements. Here's how...

1 / ( 1 + (ωp/ωc)^{2n} ) ≥ Gp

(ωp/ωc)^{2n} ≤ 1/Gp − 1    (1)
Figure 14: The typical specification of a Butterworth lowpass analog filter.

and

1 / ( 1 + (ωs/ωc)^{2n} ) ≤ Gs

(ωs/ωc)^{2n} ≥ 1/Gs − 1    (2)

If we eliminate ωc from Equations 1 and 2 and solve for n we get

n ≥ [ log(1/Gs − 1) − log(1/Gp − 1) ] / [ 2 log(ωs/ωp) ].

Resolving Equations 1 and 2 for ωc gives

ωp / (1/Gp − 1)^{1/2n} ≤ ωc ≤ ωs / (1/Gs − 1)^{1/2n}.

Example: Design an analog Butterworth to meet the spec

ωp = 0.726,  Gp = 0.8
ωs = 1.376,  Gs = 10^{−2}

We start by choosing n:

n ≥ [ log(99) − log(0.25) ] / [ 2 log(1.895) ] = 4.678

Hence we choose n = 5. Now choose an ωc:

ωc ≥ ωp / (1/Gp − 1)^{1/2n} = 0.726 / (0.25)^{1/10} = 0.8339

ωc ≤ ωs / (1/Gs − 1)^{1/2n} = 1.376 / (99)^{1/10} = 0.869

So we choose ωc = 0.85 radians/second.

We look up the filter tables and see that the transfer function for a 5th order analog

Butterworth lowpass filter with cutoff ωc is

H(s) = 1 / [ (s/ωc)⁵ + 3.2361(s/ωc)⁴ + 5.2361(s/ωc)³ + 5.2361(s/ωc)² + 3.2361(s/ωc) + 1 ].
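The arithmetic in this example can be reproduced straight from the design formulas. An illustrative sketch (not from the notes):

```python
import math

# Butterworth design formulas applied to the worked specification.
wp, Gp = 0.726, 0.8
ws, Gs = 1.376, 1e-2

n_min = (math.log10(1 / Gs - 1) - math.log10(1 / Gp - 1)) / (2 * math.log10(ws / wp))
n = math.ceil(n_min)                           # n_min is about 4.678, so n = 5
wc_lo = wp / (1 / Gp - 1) ** (1 / (2 * n))     # lower bound on wc, about 0.834
wc_hi = ws / (1 / Gs - 1) ** (1 / (2 * n))     # upper bound on wc, about 0.869
```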

2.4.2 Bilinear transform
We have just revised how to design an analog filter (choose the parameters) given a specifi-
cation. But how do we use this design technique to build a digital filter? One commonly

used technique is the Bilinear Transform. We take the transfer function of an analog filter,

H_A(s) = Π_{i=1}^{M} (s − zᵢ) / Π_{i=1}^{N} (s − pᵢ),

and we apply the following transform,

s → (1 − z^{−1}) / (1 + z^{−1}),

which will give us the digital filter

H̃_D(z) = H_A( (1 − z^{−1}) / (1 + z^{−1}) ).

The frequency response of the analog filter, H_A(s), is given by setting s = jωA; the
frequency response of the digital filter, H̃_D(z), is given by setting z = e^{jωD T} (ωA denotes

analog frequency, ωD denotes digital frequency). So the digital and analog filters have the

same frequency response when

s = (1 − z^{−1}) / (1 + z^{−1})

jωA = (1 − e^{−jωD T}) / (1 + e^{−jωD T})
    = tanh( jωD T / 2 )
    = j tan( ωD T / 2 )

ωA = tan( ωD T / 2 )

This squeezes the entire frequency range of the analog filter, ωA, into the range [0, π] of the
normalised digital frequency ωD T. So if we are given the design specification of a digital
filter we:

1. prewarp the specified ωD frequencies to get the specs for the analog filter: ωA = tan(ωD T / 2);

2. design the analog filter to obtain its transfer function, H_A(s);

3. substitute s = (1 − z^{−1}) / (1 + z^{−1}).

Example: Design a 5th order lowpass Butterworth digital filter using the Bilinear
Transform
The sampling time is T = 10^{−3} seconds. The digital filter specification is

ωDp = 2π(200),  GDp = 0.8
ωDs = 2π(300),  GDs = 10^{−2}

Prewarping gives

ωAp = tan( 2(200)π(10^{−3}) / 2 ) = 0.726,  GAp = 0.8
ωAs = tan( 2(300)π(10^{−3}) / 2 ) = 1.376,  GAs = 10^{−2}

These are the same specifications as for the Butterworth filter we designed earlier, with n = 5
and ωc = 0.85. We look up the filter tables and see that the transfer function for a 5th
order analog Butterworth lowpass filter is

H(s) = 1 / [ (s/ωc)⁵ + 3.2361(s/ωc)⁴ + 5.2361(s/ωc)³ + 5.2361(s/ωc)² + 3.2361(s/ωc) + 1 ].

Performing the Bilinear Transformation gives us the digital Butterworth filter

H̃(z) = 1 / [ ( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) )⁵ + 3.2361( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) )⁴ + 5.2361( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) )³ + 5.2361( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) )² + 3.2361( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) ) + 1 ].

From here it is trivial (if a little soul-destroying) to determine the filter coefficients. Hint:
multiply top and bottom by (0.85)⁵(1 + z^{−1})⁵ and then thresh it out. Figure 15 shows the

frequency response of the lowpass Butterworth digital filter.
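Rather than expanding the polynomial, the critical gains can be checked through the prewarping identity: at z = e^{jωD T} the bilinear substitution gives s = j tan(ωD T/2), so the digital magnitude equals the analog Butterworth magnitude at ωA = tan(ωD T/2). A sketch of this check (an illustrative addition, not from the notes):

```python
import math

# Digital magnitude of the bilinear-transformed Butterworth filter, read off
# the analog power formula at the prewarped frequency wA = tan(wD*T/2).
n, wc, T = 5, 0.85, 1e-3

def mag_digital(wD):
    wA = math.tan(wD * T / 2)
    return 1.0 / math.sqrt(1 + (wA / wc) ** (2 * n))

gp = mag_digital(2 * math.pi * 200)   # passband edge, about 0.91
gs = mag_digital(2 * math.pi * 300)   # stopband edge, about 0.09
```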

Figure 15: The magnitude of the frequency response of a 5th order digital Butterworth
filter constructed using the bilinear transform. Notice that the response is zero at ωT = π.
Marked are the pass and stop band critical frequencies ωDp and ωDs (magnitude 0.909 at
ωT ≈ 0.400π and 0.087 at ωT ≈ 0.601π), which occur at f = 200 Hz and f = 300 Hz
respectively when the sample time is T = 10^{−3} s.

2.5 Reconstruction filtering

Once we are finished filtering our digital signals, how do we get the sampled signal back to

the analog world?

• In an ideal world we would have a train of impulses, which represent the sampled

signal, and we would pass the impulse train through an ideal lowpass filter.

Figure 16: Ideal signal reconstruction.

• The ideal filter would only pass the portion of the spectrum related to the original

signal and would block all copies of the spectrum which appeared at multiples of the

sampling frequency when the signal was sampled.

There are two problems with this approach:

1. It is impossible, in practice, to produce an impulse train. It is very, very difficult to

even get close to producing a reliable weighted impulse train.

2. Even if we could, it is impossible to make an ideal filter, but this is not a huge

concern since we can make quite good filters.

In practice we use a device called a digital-to-analog converter followed by an analog low-
pass filter. The digital-to-analog converter (DAC) incorporates a zero-order hold mecha-
nism. The impulse response, h(t), of the DAC is

h(t) = 1 when 0 ≤ t < T

h(t) = 0 otherwise

Figure 17: Zero-order hold impulse response.

When the impulse response is convolved with the ideally sampled signal, a staircase approx-
imation of the signal is obtained (c.f. Figure 18).

Figure 18: Zero-order hold signal reconstruction.

From your electronic circuits course you should remember that a DAC can be imple-
mented with an opamp and some resistors (c.f. Figure 19). The staircase occurs because

every T seconds a new sample is read from the computer memory to the input of the DAC.

We would expect that lowpass filtering this staircase approximation might give us a

better reconstruction of the signal. We can justify this assumption by looking at how the

DAC affects the frequency spectrum of the ideally sampled signal.

• The DAC acts as a type of lowpass filter.

Figure 19: A four-bit digital-to-analog converter.

• We can find the transfer function of the DAC by taking the Fourier transform of the

impulse response:

F{h(t)} = H(jω) = ∫_{−∞}^{∞} h(t) e^{−jωt} dt
        = ∫₀ᵀ e^{−jωt} dt
        = [ e^{−jωt} / (−jω) ]_{t=0}^{t=T}
        = (e^{−jωT} − e⁰) / (−jω)
        = ( e^{−jωT/2} / (−jω) ) [ e^{−jωT/2} − e^{jωT/2} ]
        = ( 2e^{−jωT/2} / ω ) [ (e^{jωT/2} − e^{−jωT/2}) / (2j) ]
        = ( 2e^{−jωT/2} / ω ) sin(ωT/2)
        = T e^{−jωT/2} sin(ωT/2) / (ωT/2)

The magnitude response of the DAC is therefore

|H(jω)| = T | sin(ωT/2) / (ωT/2) |.

In the limit, as x → 0, sin(x)/x → 1. For x = ±nπ, sin(x)/x = 0. Therefore the DAC

transfer function has spectral nulls at

ωT/2 = ±nπ
ωT = ±2nπ
ω = ±2nπ/T = ±nωs

Remember ωs is the sampling frequency! But also remember that when the original signal

was ideally sampled there were an infinite number of copies of the signal spectrum placed

at ±nωs and scaled by 1/T. So the DAC has scaled the signal back to its original magnitude
and placed a null at the centre of each copy of the spectrum. Let's sketch the spectrum of

the output of the DAC.

Figure 20: Plots of (a) signal spectrum, (b) spectrum of ideally sampled signal, (c) transfer
function of the DAC and (d) spectrum of the staircase approximation of the signal.

• We can now design an analog lowpass filter to remove the residues at ±nωs and hence

very closely reconstruct the original signal.

• The more over-sampled the signal is, the less sharp the cut-off of the lowpass filter

needs to be, since the space between the copies of the spectra is increased.

• Also, if the signal is oversampled there will be less shaping of the signal spectrum

which we are trying to recover, since the sin(x)/x will be approximately flat over a

larger portion of the spectrum. This agrees with our intuition.
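A quick check of the zero-order-hold rolloff, added for illustration (not from the notes): the magnitude T·|sin(ωT/2)/(ωT/2)| is T at DC and has nulls exactly at the multiples of ωs.

```python
import math

# Zero-order-hold magnitude |H(jw)| = T * |sin(wT/2) / (wT/2)|.
T = 1e-3
ws = 2 * math.pi / T          # sampling frequency in rad/s

def zoh_mag(w):
    x = w * T / 2
    return T if x == 0 else T * abs(math.sin(x) / x)
```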

2.6 Interpolation (Upsampling)
By interpolation we mean increasing the sampling rate by an integer factor. Suppose we

want to increase the sampling rate L times. We will then have L − 1 new samples between

each pair of original samples. How should we do this? There are two possible solutions, one better than

the other...

1. Reconstruct and resample: We can reconstruct an analog signal using a zero-order

hold and a lowpass filter. We can then resample the signal at the higher rate. This is the

ugly approach, since we must go back into the 'analog world' and we are sure to lose

information in the process due to the non-idealities of reconstruction and resampling.

2. Digitally resample and lowpass filter: This is the pretty way of doing it. Let's inves-
tigate how...

When we sampled the signal we obtained a sequence of numbers which represent the

samples (the impulses). What happens if we resample this signal L times faster?

Assume that sampling the signal x(t) gives x[n] = {12, 9, 15}; resampling this signal

L = 4 times faster will give the new signal y[n] = {12, 0, 0, 0, 9, 0, 0, 0, 15} (c.f. Figure 21).

Figure 21: Upsampling example

If the time between the samples of x[n] is T1 then the time between the samples of y[n]
is T2 = T1/L. Hence we may write y[n] as

y[n] = Σ_{k=−∞}^{∞} x[k] δ(n − kL)
     = ... + x[0]δ(n − 0) + x[1]δ(n − L) + x[2]δ(n − 2L) + ...

Let's see what the discrete-time Fourier transform of y[n] looks like:

Ȳ(jω) = Ỹ(e^{jωT2}) = Σ_{n=−∞}^{∞} y[n] e^{−jnωT2}
      = Σ_{k=−∞}^{∞} Σ_{n=−∞}^{∞} x[k] δ(n − kL) e^{−jnωT2}
      = Σ_{k=−∞}^{∞} x[k] [ Σ_{n=−∞}^{∞} δ(n − kL) e^{−jnωT2} ]
      = Σ_{k=−∞}^{∞} x[k] e^{−j(kL)ωT2}
      = X̃(e^{jωT2 L})
      = X̃(e^{jωT1})


• Hence the spectrum of y[n] (which is sampled at ω2 = 2π/T2) is identical to that of x[n].

• Therefore there are copies of X(jω) (the spectrum of the analog signal x(t)) at ±nω1,

scaled by 1/T1.

• But if we had sampled the analog signal x(t) at ω2 rad/s we would only have copies

of X(jω) at ±nω2 = ±nLω1, scaled by 1/T2.

So, we need to remove the copies which are not centred at ±nω2. We do this with a

digital lowpass filter. Each copy can have a footprint on the frequency axis of ±ω1/2.

• So, the filter cutoff frequency needs to be at ω = ω1/2 = ω2/(2L) = 2π/(2LT2) = π/(LT2). The digital
filter has a cutoff at ωT2 = π/L.

The spectrum is still scaled by 1/T1, so if the filter has a gain of L the filtered spectrum will
have a scaling of 1/T2 as required. By filtering y[n] this filter will output an interpolation of

x[n]. We call this interpolated sequence x_int[n].


In summary, to upsample by a factor L:

1. Insert L − 1 zeros between the samples.

2. Lowpass filter with a cutoff at ωT2 = π/L and a gain of L.
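The zero-insertion step and the spectrum identity Ỹ(e^{jωT2}) = X̃(e^{jωT2 L}) can both be checked on the small example above. An illustrative sketch (not from the notes):

```python
import numpy as np

# Zero-insertion upsampling of x[n] = {12, 9, 15} by L = 4, plus a check of
# the DTFT identity Y(e^{jw}) = X(e^{jwL}) (w standing for w*T2 here).
L = 4
x = np.array([12.0, 9.0, 15.0])
y = np.zeros(len(x) * L)
y[::L] = x        # y = [12, 0, 0, 0, 9, 0, 0, 0, 15, 0, 0, 0]

def dtft(sig, w):
    n = np.arange(len(sig))
    return np.sum(sig * np.exp(-1j * w * n))
```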

Figure 22: An example of upsampling with L = 3. Shown is the original signal spectrum,
|X(jω)|, and the spectrum of the sampled and then upsampled signal, |Ỹ(e^{jωT2})|. Also
shown is the lowpass filter used to interpolate between the original samples, |H̃(e^{jωT2})|,
which has a cutoff at ωT2 = π/L. The final upsampled spectrum is shown as |X̃_int(e^{jωT2})|.

2.7 Decimation (Downsampling)
Downsampling involves decreasing the sample rate by integer multiples. Assume we have

sampled a signal x(t) every T1 seconds to get x[n]. When downsampling by a factor of M
we take every Mth sample:

y[n] = x[Mn].

This is equivalent to sampling the original signal every T2 = MT1 seconds (c.f. Figure 23).

Figure 23: Downsampling example.

This results in copies of the analog signal spectrum, X(jω), being shifted to multiples
of ω2 = 2π/T2 = ω1/M and scaled by 1/T2.

• This was what was required. Where's the problem?

• The problem is: if the maximum frequency in X(jω) is greater than ω = ω2/2 then
downsampling will cause aliasing.

This is simply a restatement of the sampling theorem. Before the signal was sampled, all
frequencies greater than ω = ω1/2 were removed using an analog anti-alias filter. Before we
downsample we can remove, from x[n], all remaining frequencies greater than ω = ω2/2 = ω1/(2M)
using a lowpass digital filter, H̃(e^{jωT1}). Hence H̃(e^{jωT1}) has a cutoff at ωT1 = π/M. We call
this filtered version x_lpf[n]. Now, the downsampled sequence is y[n] = x_lpf[Mn].

In summary, to downsample x[n] by a rate M:

1. Lowpass filter x[n] with a cutoff ωT1 = π/M and unity gain to get x_lpf[n].

2. Take every Mth sample from x_lpf[n] to get the downsampled signal, y[n] = x_lpf[Mn].
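A minimal numeric sketch of the two steps (an illustrative addition; the windowed-sinc lowpass below is one simple choice of H̃(e^{jωT1}), not one prescribed by the notes):

```python
import numpy as np

# Downsample by M = 2: windowed-sinc lowpass with cutoff wT1 = pi/M and
# unity DC gain, then keep every Mth sample.
M = 2
rng = np.random.default_rng(0)
x = rng.standard_normal(256)

taps = 31
n = np.arange(taps) - (taps - 1) / 2
h = np.sinc(n / M) * np.hamming(taps)   # ideal pi/M lowpass, Hamming-windowed
h /= h.sum()                            # normalise to unity gain at DC

x_lpf = np.convolve(x, h, mode="same")
y = x_lpf[::M]

# Gain of the anti-alias filter at DC and at wT1 = pi (fully aliased band):
gain_dc = abs(np.sum(h))
gain_pi = abs(np.sum(h * np.exp(-1j * np.pi * np.arange(taps))))
```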

Figure 24: An example of downsampling with M = 2.

3 Optimum and Adaptive filtering
In the 1940s Norbert Wiener conducted fundamental research into the following problem:

given a measured signal, x[n], which is a corrupted version of the desired signal d[n], what

linear filter, w[n], will provide the best estimate of d[n] from the measured values of x[n]?
First we will deal with the FIR time-invariant Wiener filter.

Figure 25: Wiener filter. Desired signal, d[n]. Corrupted signal (after noise and distortion),
x[n]. Estimated signal, d̂[n].

3.1 Time-invariant FIR Wiener filter

Wiener, for mathematical convenience, decided to try and choose the filter coefficients,

w[k], which would minimise the mean-square error between the estimate of the signal,

d̂[n] = Σ_{k=0}^{p−1} w[k]x[n − k], and the desired signal, d[n]. Hence we define the error, e[n], as:

e[n] = d[n] − d̂[n]

e[n] = d[n] − Σ_{k=0}^{p−1} w[k] x[n − k].

Therefore, we can write the mean-squared error, ξ, as

ξ = E{ (e[n])² } = E{ ( d[n] − Σ_{k=0}^{p−1} w[k] x[n − k] )² }.

We wish to minimise this expression with respect to each of the w[i]. Therefore we differ-
entiate to get

∂ξ/∂w[i] = ∂/∂w[i] E{ (e[n])² }.

∂(·)/∂x and E{·} are both linear operators, so their order can be interchanged:

∂ξ/∂w[i] = E{ ∂/∂w[i] (e[n])² }
         = E{ 2e[n] ∂e[n]/∂w[i] }
         = 2E{ e[n] ∂e[n]/∂w[i] }

But e[n] = d[n] − Σ_{k=0}^{p−1} w[k]x[n − k], which when expanded is e[n] = d[n] − w[0]x[n] −
w[1]x[n − 1] − ..., so

∂e[n]/∂w[i] = −x[n − i].

This gives

∂ξ/∂w[i] = 2E{ −e[n]x[n − i] }
         = −2E{ e[n]x[n − i] }

We then minimise ξ by setting ∂ξ/∂w[i] equal to zero for each i = 0, ..., (p − 1):

E{ e[n]x[n − i] } = 0,   i = 0, ..., (p − 1)   (3)

This tells us that the error when trying to recover the signal, d[n], must be uncorrelated
with the measured signal x[n]. If our error was in some way dependent on the input to

the filter, x[n], then we would expect we could remove the dependent part using a better

filter! This is known as the orthogonality principle, or the projection theorem.

We can now substitute e[n] into Equation 3:

E{ ( d[n] − Σ_{k=0}^{p−1} w[k]x[n − k] ) x[n − i] } = 0,   i = 0, ..., (p − 1)

E{ d[n]x[n − i] − x[n − i] Σ_{k=0}^{p−1} w[k]x[n − k] } = 0,   i = 0, ..., (p − 1)

E{ d[n]x[n − i] } − E{ x[n − i] Σ_{k=0}^{p−1} w[k]x[n − k] } = 0,   i = 0, ..., (p − 1)

E{ d[n]x[n − i] } − E{ x[n − i] ( w[0]x[n − 0] + ... + w[p − 1]x[n − p + 1] ) } = 0,   i = 0, ..., (p − 1)

E{ d[n]x[n − i] } − w[0]E{ x[n − i]x[n − 0] } − ... − w[p − 1]E{ x[n − i]x[n − p + 1] } = 0,   i = 0, ..., (p − 1)   (4)

Rearranging Equation 4 gives

E{ d[n]x[n − i] } = Σ_{k=0}^{p−1} w[k] E{ x[n − i]x[n − k] },   i = 0, ..., (p − 1)   (5)

Here we define the autocorrelation of a signal a[n] as

raa[k] = E{ a[n]a[n − k] },

and we define the crosscorrelation of signals a[n] and b[n] as

rab[k] = E{ a[n]b[n − k] }.

Hence Equation 5 becomes

rdx[i] = Σ_{k=0}^{p−1} w[k] rxx[k − i],   i = 0, ..., (p − 1).

These p equations are called the Wiener-Hopf equations, due to their introduction by

Norbert Wiener and Eberhard Hopf whilst working at MIT. It's called a Wiener filter

because Hopf moved to Germany in 1936, when the National Socialist German Workers' Party

was in power, and much of his contribution went unacknowledged. History is written

by the victors!

We can write these equations in matrix form:

[ rxx[0]      rxx[1]   rxx[2]   · · ·  rxx[p − 1] ] [ w[0]     ]   [ rdx[0]     ]
[ rxx[1]      rxx[0]   rxx[1]   · · ·             ] [ w[1]     ]   [ rdx[1]     ]
[ rxx[2]      rxx[1]   rxx[0]   · · ·             ] [ w[2]     ] = [ rdx[2]     ]
[    ⋮           ⋮        ⋮       ⋱        ⋮      ] [    ⋮     ]   [     ⋮      ]
[ rxx[p − 1]   · · ·                   rxx[0]     ] [ w[p − 1] ]   [ rdx[p − 1] ]

which is the matrix form of the Wiener-Hopf equations. Written more compactly in matrix

algebra we have:

Rxx w = rdx.

Rxx is a p × p symmetric Toeplitz matrix and as such is guaranteed to be invertible.
There is an algorithm called the Levinson-Durbin algorithm which efficiently solves these
equations. We will meet this algorithm later in the Speech Processing section.

The optimum filter coefficients, wopt, are therefore:

wopt = Rxx⁻¹ rdx

In order to find these filter coefficients we must estimate the autocorrelation and cross-correlation
statistics. This makes a big assumption: that the statistics are stationary! If
they change, we're in trouble. That's why this is called a time-invariant filter: the
parameters don't change with time.
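As a concrete sketch of this procedure (the function name and the biased sample-correlation estimators are my own choices, not from the notes), the following numpy code estimates rxx and rdx from data and solves the Wiener-Hopf equations with a general linear solver:

```python
import numpy as np

def wiener_filter(x, d, p):
    """Solve the Wiener-Hopf equations, Rxx w = rdx, for an order-p filter.

    The true statistics are replaced by biased sample estimates, which
    assumes x[n] and d[n] are jointly stationary (and ergodic).
    """
    N = len(x)
    # Sample autocorrelation r_xx[k] = E{x[n]x[n-k]}, k = 0..p-1
    rxx = np.array([np.dot(x[: N - k], x[k:]) / N for k in range(p)])
    # Sample cross-correlation r_dx[k] = E{d[n]x[n-k]}, k = 0..p-1
    rdx = np.array([np.dot(d[k:], x[: N - k]) / N for k in range(p)])
    # Symmetric Toeplitz autocorrelation matrix R_xx
    Rxx = np.array([[rxx[abs(i - k)] for k in range(p)] for i in range(p)])
    return np.linalg.solve(Rxx, rdx)
```

The Levinson-Durbin algorithm mentioned above exploits the Toeplitz structure to solve the same system in O(p²) operations instead of the O(p³) of a general solver.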

3.1.1 Minimum mean squared error of Wiener filter
We can calculate the expected minimum mean square error:

ξ = E {e²[n]} = E {e[n] (d[n] − Σ_{k=0}^{p−1} w[k]x[n − k])}   (the bracketed term is e[n])

  = E {e[n]d[n]} − Σ_{k=0}^{p−1} w[k] E {e[n]x[n − k]}   (this sum is zero)

  = E {e[n]d[n]}   (6)

For optimal filter coefficients E {e[n]x[n − k]} = 0, from Equation 3. Next, we substitute
for the remaining e[n] term in Equation 6:

 
ξ = E {(d[n] − Σ_{k=0}^{p−1} w[k]x[n − k]) d[n]}   (the bracketed term is e[n])

  = E {d[n]d[n]} − Σ_{k=0}^{p−1} w[k]E {d[n]x[n − k]}

  = rdd[0] − Σ_{k=0}^{p−1} w[k]rdx[k].

This can be written in vector notation as:

ξ = rdd[0] − rdxᵀ w
  = rdd[0] − rdxᵀ Rxx⁻¹ rdx.

3.1.2 Corruption due to uncorrelated noise
When calculating the optimum filter parameters we must estimate rxx[k] and rdx[k]. Assume
that noise, v[n], has simply been added to the original signal:

x[n] = d[n] + v[n].

If we also assume the noise is uncorrelated with d[n] (rdv[k] = E {d[n]v[n − k]} = 0) then
the following simplifications can be made:

rxx[k] = E {x[n]x[n − k]}
       = E {(d[n] + v[n]) (d[n − k] + v[n − k])}
       = E {d[n]d[n − k]} + E {d[n]v[n − k]} + E {d[n − k]v[n]} + E {v[n]v[n − k]}
       = rdd[k] + rvv[k],

since the two cross terms are zero.

Also,

rdx[k] = E {d[n]x[n − k]}
       = E {d[n] (d[n − k] + v[n − k])}
       = E {d[n]d[n − k]} + E {d[n]v[n − k]}
       = rdd[k],

since the cross term is zero.

The Wiener-Hopf equations in matrix-vector form then become

[Rdd + Rvv] wopt = rdd.

Example: Given signal statistics design Wiener filter
d[n] is known to be a process with an autocorrelation given by rdd[k] = a^|k|,
with 0 < a < 1. Additive white noise with a variance of σ² has corrupted d[n] to give
x[n]. Design an optimum second order filter to retrieve an estimate of d[n] from x[n]. If
a = 0.8 and σ² = 1, determine the filter coefficients. Estimate the mean square error of
the output.

The Wiener-Hopf equations are

[ rxx[0]  rxx[1] ] [ w[0] ]   [ rdx[0] ]
[ rxx[1]  rxx[0] ] [ w[1] ] = [ rdx[1] ] .

Since d[n] and v[n] are uncorrelated and v[n] is white noise, we get rxx[k] = rdd[k] + rvv[k] =
a^|k| + σ²δ[k]. Also, rdx[k] = rdd[k]. So,

[ 1 + σ²    a    ] [ w[0] ]   [ 1 ]
[   a    1 + σ²  ] [ w[1] ] = [ a ] .

Solving gives

[ w[0] ]          1           [ 1 + σ² − a² ]
[ w[1] ] = ---------------- [      aσ²     ] .
            (1 + σ²)² − a²

When a = 0.8 and σ² = 1 we have wᵀ = [ 0.4048  0.2381 ]. Figure 26 shows the signal
before and after filtering.


Figure 26: Wiener filtering. Dashed: original signal. Dotted: signal with white noise
added. Solid: filtered signal.


Figure 27: Spectral illustration of Wiener filtering a noisy signal. The filter tries to preserve
as much signal, and remove as much noise, as possible.

The mean squared error, ξ, is given by ξ = rdd[0] − rdxᵀ Rxx⁻¹ rdx. Noting that rdd[0] = a⁰ = 1, we have

ξ = 1 − [ 1  0.8 ] [  0.5952  −0.2381 ] [  1  ]
                   [ −0.2381   0.5952 ] [ 0.8 ]
  = 0.4048
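A quick numerical check of this example in numpy (noting that rdd[0] = a⁰ = 1):

```python
import numpy as np

# Worked example: r_dd[k] = a**|k|, a = 0.8, additive white noise of variance 1.
a, var = 0.8, 1.0
Rxx = np.array([[1 + var, a], [a, 1 + var]])   # r_xx[k] = a**|k| + var*delta[k]
rdx = np.array([1.0, a])                        # r_dx[k] = r_dd[k] = a**|k|
w = np.linalg.solve(Rxx, rdx)                   # optimum taps w_opt
mse = 1.0 - rdx @ w                             # xi = r_dd[0] - r_dx^T w_opt
```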

3.2 Adaptive filtering
So what's wrong with Wiener filtering? Two things:

1. If the correlation statistics change we must re-estimate them,

2. Estimating correlation statistics takes time, since we must wait for the data and then
compute an average.

We will try and tackle the first problem now.

3.2.1 Steepest descent algorithm

If we assume that some change in the statistics has caused us to have a currently non-optimal
value for the filter coefficients, wn, we can use a steepest descent algorithm to
make iterative changes until we arrive at the new optimum. This might seem pointless, since
we could simply find the new optimum in one move by solving the Wiener-Hopf equations,
but the value of this method will become clear when we try to solve problem 2 above.

The mean squared error

ξ = E {e²[n]} = E {(d[n] − Σ_{k=0}^{p−1} wn[k]x[n − k])²}

is a quadratic function in wn[k]. The error surface traced out by varying each wn[k] is a
p-dimensional quadratic 'bowl' which has only one minimum. Figure 28 illustrates this for
a second order filter.

We can move towards that minimum by stepping a small distance, µ, in a direction down
the surface to get wn+1[k]. The direction we wish to move is the opposite of the steepest
direction up the slope (grad(ξ) = ∇ξ). Hence our updated coefficients are

wn+1 = wn − µ (∇ξ)

[ wn+1[0]     ]   [ wn[0]     ]     [ ∂ξ/∂wn[0]     ]
[ wn+1[1]     ]   [ wn[1]     ]     [ ∂ξ/∂wn[1]     ]
[     ...     ] = [    ...    ] − µ [      ...      ]
[ wn+1[p − 1] ]   [ wn[p − 1] ]     [ ∂ξ/∂wn[p − 1] ]

So we need to find the derivatives ∂ξ/∂wn[i]. (We did this earlier, but here it is again.)
Therefore we differentiate to get

∂ξ/∂wn[i] = ∂/∂wn[i] E {(e[n])²}




Figure 28: Error surface for a second order filter.

∂(·)/∂x and E {·} are both linear operators, so their order can be interchanged:

∂ξ/∂wn[i] = E {∂/∂wn[i] (e[n])²}
          = E {2e[n] ∂e[n]/∂wn[i]}
          = 2E {e[n] ∂e[n]/∂wn[i]}
But e[n] = d[n] − Σ_{k=0}^{p−1} wn[k]x[n − k], which when expanded is e[n] = d[n] − wn[0]x[n] −
wn[1]x[n − 1] − ... − wn[p − 1]x[n − p + 1], so

∂e[n]/∂wn[i] = −x[n − i].

This gives

∂ξ/∂wn[i] = 2E {−e[n]x[n − i]}
          = −2E {e[n]x[n − i]}

So the update equation becomes (with the −2 absorbed into µ)

[ wn+1[0]     ]   [ wn[0]     ]     [ E {e[n]x[n]}         ]
[ wn+1[1]     ]   [ wn[1]     ]     [ E {e[n]x[n − 1]}     ]
[     ...     ] = [    ...    ] + µ [         ...          ]   (7)
[ wn+1[p − 1] ]   [ wn[p − 1] ]     [ E {e[n]x[n − p + 1]} ]
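As a sketch of this update, reusing the statistics of the earlier worked example (a = 0.8, σ² = 1) and assuming the exact expectations E {e[n]x[n − k]} = rdx[k] − (Rxx wn)[k] are known:

```python
import numpy as np

# Known second-order statistics from the earlier worked example.
Rxx = np.array([[2.0, 0.8], [0.8, 2.0]])   # r_xx[k] = 0.8**|k| + delta[k]
rdx = np.array([1.0, 0.8])                  # r_dx[k] = 0.8**|k|
w = np.zeros(2)                             # current (non-optimal) taps
mu = 0.1                                    # step size, below 2/lambda_max
for _ in range(500):
    # Exact gradient step: E{e[n]x[n-k]} = r_dx[k] - sum_j w[j] r_xx[k-j]
    w = w + mu * (rdx - Rxx @ w)
```

After enough iterations w approaches wopt = Rxx⁻¹rdx = [ 0.4048  0.2381 ].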

Figure 29 shows a block diagram of an adaptive filter.


Figure 29: Adaptive filter.

If the signal statistics remain constant, this will converge to the optimum filter. If the
statistics change, it will try to follow them. However, problem 2 still remains: we must
estimate an expectation at each time step! We now propose a solution...

3.2.2 The LMS algorithm
The required expectation in Equation 7 may be estimated using the sample mean,

E {e[n]x[n − k]} ≈ (1/L) Σ_{l=0}^{L−1} e[n − l]x[n − k − l].

We may find a crude approximation by letting L = 1:

E {e[n]x[n − k]} ≈ e[n]x[n − k].

So the new update equation to adjust the filter taps is

[ wn+1[0]     ]   [ wn[0]     ]     [ e[n]x[n]         ]
[ wn+1[1]     ]   [ wn[1]     ]     [ e[n]x[n − 1]     ]
[     ...     ] = [    ...    ] + µ [       ...        ] .
[ wn+1[p − 1] ]   [ wn[p − 1] ]     [ e[n]x[n − p + 1] ]

This is called the LMS (Least Mean Squares) algorithm.

• This is hardware efficient, since each update requires approximately one vector multiplication
and one vector addition.

• This comes at the price of having slower convergence properties than the Steepest
Descent algorithm.

• It will sometimes move up the error surface, since we are using a crude approximation
of E {e[n]x[n]}. However, on average it will move in the right direction.

One known technical limitation of this method is that the step size, µ, must be kept small
if the algorithm is to converge. In fact it must be smaller than 2/λmax, where λmax is the
largest eigenvalue of the autocorrelation matrix, Rxx, of the input, x[n]:

µ < 2/λmax.

In practice we just make µ reasonably small.
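A minimal Python sketch of the LMS update (the function name and data layout are my own; the tap update follows Equation 7 with the L = 1 approximation):

```python
import numpy as np

def lms(x, d, p, mu):
    """Run the LMS algorithm over x[n]; return final taps and error signal."""
    w = np.zeros(p)
    e = np.zeros(len(x))
    for n in range(p - 1, len(x)):
        xn = x[n - p + 1 : n + 1][::-1]   # [x[n], x[n-1], ..., x[n-p+1]]
        e[n] = d[n] - w @ xn              # e[n] = d[n] - sum_k w[k] x[n-k]
        w = w + mu * e[n] * xn            # crude gradient step (L = 1)
    return w, e
```

For a stationary input and a small enough µ this drifts, on average, toward the Wiener solution.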

• This type of filtering is commonly known as adaptive equalisation. A training signal,
d[n], is sent down the channel. The receiver measures x[n], which is a corrupted
version of d[n]. The adaptive filter then 'equalises' the effects of the channel. Once
the training signal has finished, the filter taps are frozen and the data is transmitted.
If it is a mobile (time varying) channel then training must be repeatedly performed
after some data has been transmitted, to adjust to the changing channel.

3.3 Adaptive system identification
Equalisation essentially looks for the inverse of the effect which corrupted the original
signal. Another interesting area where adaptive filtering can be applied is system identification,
or system modelling.


Figure 30: Block diagram of system modelling.

We put the same input into the filter and the system, and adjust the filter until both
outputs are the same, or close. Conceptually the setup is different, but the mathematics
is identical to that of equalisation.


Figure 31: Adaptive system modelling.

Figure 32 illustrates the convergence properties of the LMS algorithm applied to system
identification. The system impulse response is h[n] = [ 1  0.2 ]. The input to both the
system and the filter, x[n], is coloured noise, with an autocorrelation of rxx[k] = 0.8^|k|. We
see that after about 1000 samples the system has been adequately modelled.


Figure 32: Adaptive system identification example. Coefficients converge to the known
system coefficients.

4 Spectral analysis
Spectral analysis deals with the examination of the frequency content of random signals.
Since we do not have an explicit expression for the signals in question, we cannot directly
calculate their spectra. However, using the statistics of the signals we can estimate, on
average, how much each frequency contributes to the signal power.

Since we are dealing with stochastic processes we first need to quickly revise some basics
of random variables and processes.

4.1 Review of stochastic variables and processes


4.1.1 Random variable
• A random variable, X, is a number (continuous or discrete) associated with the
outcome of an experiment. If we repeat the experiment we may get a different
number in an unpredictable way.

• We can define the probability that X will be below, or equal to, some value x as

FX(x) = P (X ≤ x).

This is called the cumulative distribution function (CDF). This has some obvious
properties:

FX(x1) ≤ FX(x2) if x1 < x2
FX(∞) = 1
FX(−∞) = 0.

• We define the probability density function (PDF) to be

fX(x) = dFX(x)/dx.

Therefore the probability of X falling in some interval [x1, x2] is

P (x1 ≤ X ≤ x2) = P (X ≤ x2) − P (X ≤ x1)
               = FX(x2) − FX(x1)
               = ∫_{x1}^{x2} fX(x)dx.

Hence, ∫_{−∞}^{∞} fX(x)dx = 1. Figure 33 shows the CDF and PDF for a Gaussian random
variable:

fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}.
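As a quick numerical illustration, the Gaussian CDF can be evaluated via the standard error-function identity FX(x) = (1 + erf((x − µ)/(σ√2)))/2 (this identity is assumed here, not derived in the notes):

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """F_X(x) for a Gaussian random variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# P(x1 <= X <= x2) = F_X(x2) - F_X(x1): about 68.3% within one sigma
p_one_sigma = gaussian_cdf(1.0) - gaussian_cdf(-1.0)
```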


Figure 33: CDF and PDF for a Gaussian random variable with zero mean and variance
σ² = 1.

• The expected value of a function of a random variable, g(X), is the average over a
large (infinite) number of experiments. Since the event X = x occurs with a relative
frequency of fX(x), this works out as

E {g(X)} = ∫_{−∞}^{∞} g(x)fX(x)dx.

• The nth moment of a random variable is defined as the expected value of g(X) = Xⁿ:

E {Xⁿ} = ∫_{−∞}^{∞} xⁿ fX(x)dx.

n = 1 gives the mean.

4.1.2 Random Processes

• Suppose we have an experiment which continually outputs random variables as time
elapses. We call this output a random process, X(t), rather than a random variable.
The CDF and PDF for a random process are defined as for a random variable, except
they are now time dependent:

FX(x, t) = P (X(t) ≤ x)
fX(x, t) = dFX(x, t)/dx.
• If we fix t = t1, then X(t1) is simply a random variable and all the rules for random
variables apply. If we were to estimate the CDF, FX(x, t1), and PDF, fX(x, t1), we
would need to set up a large (preferably infinite) number of experiments, all running
simultaneously, and inspect them when t = t1, as shown in Figure 34. This large
group of experiments is called an ensemble. Once we have estimated fX(x, t1) we
can calculate the expected value of X(t1) using

E {X(t1)} = ∫_{−∞}^{∞} x fX(x, t1)dx


Figure 34: An ensemble of random processes, X(t), which we would use to estimate
fX (x, t1 ).

4.1.3 Stationarity
• A process is said to be strictly stationary if all its statistics are independent of time. If
X(t) is strictly stationary then fX(x, t1) = fX(x, t2).

• A less strict version of stationarity often used is wide sense stationarity. In this case
only the mean and autocorrelation need to be stationary.

• The autocorrelation is defined as

rXX(t1, t1 + τ) = E {X(t1)X(t1 + τ)} .

Since the autocorrelation is stationary for a wide sense stationary process, the following
is true:

rXX(t1, t1 + τ) = rXX(t2, t2 + τ).

Hence, rXX(t1, t1 + τ) is only a function of the time difference, τ, and for a wide
sense stationary process we write the autocorrelation as rXX(τ).

4.1.4 Ergodicity
• To estimate the expected value (the mean) of a random process X(t) at time t = t1,
we have had to examine a large number of concurrent experiments and find the
average outcome across all experiments at t = t1:

E {X(t1)} = ∫_{−∞}^{∞} x fX(x, t1)dx.

• But, if this is equivalent to taking a time-average for one experiment,

E {X(t1)} = lim_{T→∞} (1/2T) ∫_{t1−T}^{t1+T} X(t)dt,

then the process is said to be ergodic in the mean. If the same is true for the
autocorrelation,

E {X(t1)X(t1 + τ)} = lim_{T→∞} (1/2T) ∫_{t1−T}^{t1+T} X(t)X(t + τ)dt,

then the process is said to be ergodic in the autocorrelation.

• If it is ergodic in both the mean and the autocorrelation, the process is simply called
ergodic.

• To be ergodic the process must be stationary. But, a process can be stationary but
not ergodic! Why?

The assumption of ergodicity allows us to use time-averages to estimate the
autocorrelation function from a single random process.
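For instance, assuming ergodicity, a single long record suffices to estimate rXX[k]. A small numpy sketch (the function name and the unbiased normalisation are my own choices):

```python
import numpy as np

def time_avg_autocorr(x, k):
    """Time-average estimate of r_XX[k] = E{X[n]X[n+k]} from one realisation."""
    N = len(x)
    return np.dot(x[: N - k], x[k:]) / (N - k)
```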

4.2 Continuous-time power spectral density
The power spectral density (PSD) and the autocorrelation function of a signal, say x(t),
form a Fourier transform pair:

Sxx(ω) = ∫_{−∞}^{∞} rxx(τ)e^{−jωτ} dτ,

rxx(τ) = (1/2π) ∫_{−∞}^{∞} Sxx(ω)e^{jωτ} dω.

We do not prove this here, but this fact is called the Wiener-Khinchin theorem and we use
it as a starting point.

• We can see why this might make sense by examining the instantaneous power of the
signal:

E {x²(t)} = rxx(0) = (1/2π) ∫_{−∞}^{∞} Sxx(ω)dω.

This is the integral of the power spectral density over all frequencies. This is why the
units of the PSD are Watts/Hz or V²/Hz. It can be shown that if x(t) is real then
Sxx(ω) ≥ 0 and symmetric.

We can also write the PSD for a discrete time process...

4.3 Discrete-time power spectral density
We assume we are examining the process x[n]. The autocorrelation becomes rxx[k] =
E {x[n]x[n + k]}, where k is an integer. Since we have samples of rxx[k] we can write the PSD as
the discrete-time Fourier transform of the autocorrelation:

S̃xx(e^{jωT}) = Σ_{k=−∞}^{∞} rxx[k]e^{−jωkT}.

If the signal x[n] was sampled at rate ωs = 2π/T (every T seconds), this gives

S̃xx(e^{jωT}) = (1/T) Σ_{k=−∞}^{∞} Sxx(ω − kωs).

If ωs is greater than twice the maximum frequency in x(t) (bandlimited), then we can write

S̃xx(e^{jωT}) = (1/T) Sxx(ω) for |ωT| < π.


Figure 35: PSD, S̃xx (ejωT ), of a sampled process.

4.4 The periodogram
Ultimately we will have to estimate the PSD from the data. If we have a signal x[n] and
we have collected N samples, we can model this by multiplying x[n] by a window function
w[n] to get

v[n] = w[n]x[n],

where the window function is a square pulse:

w[n] = 1 if 0 ≤ n ≤ N − 1,
     = 0 otherwise.

The autocorrelation, rvv[k], of this windowed sequence can be estimated as


Figure 36: Rectangular window function, for N = 8.

rvv[k] ≈ (1/N) Σ_{n=0}^{N−1} v[n]v[n + k]
       = (1/N) Σ_{n=0}^{N−1} (w[n]x[n]) (w[n + k]x[n + k]) .

The estimated PSD of the windowed process is therefore

S̃vv(e^{jωT}) = Σ_{k=−∞}^{∞} rvv[k]e^{−jωkT}
            ≈ (1/N) Σ_{k=−∞}^{∞} Σ_{n=0}^{N−1} v[n]v[n + k]e^{−jωkT}
            ≈ (1/N) Σ_{k=−∞}^{∞} Σ_{n=0}^{N−1} v[n]e^{jωnT} v[n + k]e^{−jω(n+k)T}
            ≈ (1/N) [Σ_{n=0}^{N−1} v[n]e^{jωnT}] [Σ_{k=−∞}^{∞} v[n + k]e^{−jω(n+k)T}]
            ≈ (1/N) [Σ_{n=0}^{N−1} v[n]e^{jωnT}] [Σ_{m=0}^{N−1} v[m]e^{−jωmT}]
            ≈ Ṽ*(e^{jωT})Ṽ(e^{jωT}) / N
            = |Ṽ(e^{jωT})|² / N.   (8)

We forgot to normalise for the energy in the window, so we add the normalising factor
U = Σ_{n=0}^{N−1} (w[n])²:

S̃vv(e^{jωT}) ≈ |Ṽ(e^{jωT})|² / (NU).   (9)

For the rectangular window, U = N.
This is the DFT (discrete Fourier transform, since we only used N samples) magnitude
squared, divided by the number of samples, N, and the energy in the window, U. This is
called the periodogram.

Remember, we cannot practically evaluate the spectrum at all frequencies, so the DFT
is usually evaluated at ωT = 2πm/N. This gives the machine-calculable DFT:

V̄[m] = Ṽ(e^{jωT})|_{ωT=2πm/N} = Σ_{n=0}^{N−1} v[n]e^{−j2πmn/N}
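Equation 9 can be computed directly with an FFT. A sketch (the rectangular default and the function name are mine):

```python
import numpy as np

def periodogram(x, w=None):
    """Periodogram S[m] = |V[m]|**2 / (N*U) at the N DFT frequencies (Equation 9)."""
    N = len(x)
    if w is None:
        w = np.ones(N)            # rectangular window, so U = N
    U = np.sum(w ** 2)            # window energy normalisation
    V = np.fft.fft(w * x)         # machine-calculable DFT, omega*T = 2*pi*m/N
    return np.abs(V) ** 2 / (N * U)
```

For the sinusoid example that follows, x[n] = sin(nπ/2) with N = 8, this puts a peak of 0.25 at ωT = π/2 (bin m = 2), matching the peak height in Figure 37.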

Example
We can estimate the power spectral density of a sinusoid, x[n] = sin(ω0 nT). Let's choose
ω0 T = π/2, so x[n] = sin(nπ/2). We'll take N = 8 samples of the signal. Figure 37 shows the
samples of x[n] and two DFTs of different lengths.


Figure 37: 8 samples of x[n] = sin(nπ/2) (top). We take two DFTs, one evaluated at 16
frequencies (middle) and one at 64 (bottom).

• Only the positive frequency axis is shown.

• There is another copy at ωT = −π/2.

• Of course, the range −π ≤ ωT ≤ π is copied at multiples of the sampling frequency
too.

We can also take more samples and see what happens. If we take N = 64 samples and
take two more DFTs for comparison (Figure 38):


Figure 38: 64 samples of x[n] = sin(nπ/2) (top), with two DFTs evaluated at 128 and 512
frequencies. We see that increasing N has narrowed the power spectrum.

4.5 Window functions
We can investigate what effect using a window function has on our periodogram estimator.
We want to find the bias in our estimate. Hence we are looking for E {S̃vv(e^{jωT})}:

E {S̃vv(e^{jωT})} = E {Σ_{k=−∞}^{∞} rvv[k]e^{−jωkT}}
= ... + E {rvv[−1]} e^{−jω(−1)T} + E {rvv[0]} e^{−jω(0)T} + E {rvv[1]} e^{−jω(1)T} + ...
= Σ_{k=−∞}^{∞} E {rvv[k]} e^{−jωkT}
= Σ_{k=−∞}^{∞} E {(1/N) Σ_{n=0}^{N−1} w[n]x[n]w[n + k]x[n + k]} e^{−jωkT}
= Σ_{k=−∞}^{∞} [(1/N) Σ_{n=0}^{N−1} w[n]w[n + k]] E {x[n]x[n + k]} e^{−jωkT}
= Σ_{k=−∞}^{∞} rww[k]rxx[k]e^{−jωkT}

So the expected value (the bias) of the periodogram estimator is the DTFT of the product
of the autocorrelation of the signal and the autocorrelation of the window. But, the discrete-time
Fourier transform of the product of two signals in the time domain is identical
to their periodic convolution in the frequency domain. Therefore,

E {S̃vv(e^{jωT})} = (1/2π) ∫_{−π/T}^{π/T} S̃ww(e^{j(ω−θ)T}) S̃xx(e^{jθT}) dθ.

• So the window, w[n], has the effect of smearing the frequency content of S̃xx(e^{jωT})
across the frequency spectrum. This is called spectral leakage.

• We can tailor the window shape to try and reduce this effect.

• Don't forget to normalise for the window energy, U = Σ_{n=0}^{N−1} (w[n])²:

E {S̃vv(e^{jωT})} = (1/2πU) ∫_{−π/T}^{π/T} S̃ww(e^{j(ω−θ)T}) S̃xx(e^{jθT}) dθ.

The rectangular window which we have been using up until now has very abrupt transitions
in the time domain, so its frequency spectrum is very broad. We take the DTFT to see
this:

W̃(e^{jωT}) = Σ_{n=−∞}^{∞} w[n]e^{−jωnT}
           = Σ_{n=0}^{N−1} (e^{−jωT})ⁿ
           = (1 − e^{−jωTN}) / (1 − e^{−jωT})   (by the geometric series)
           = [e^{−jωNT/2} (e^{jωNT/2} − e^{−jωNT/2})] / [e^{−jωT/2} (e^{jωT/2} − e^{−jωT/2})]
           = e^{−jω(N−1)T/2} sin(NωT/2) / sin(ωT/2).

Hence the magnitude spectrum of w[n] is

|W̃(e^{jωT})| = |sin(NωT/2) / sin(ωT/2)|.

But we can show (similar to an earlier calculation in Equation 8) that S̃ww(e^{jωT}) is

S̃ww(e^{jωT}) = |W̃(e^{jωT})|² / N
            = (1/N) sin²(NωT/2) / sin²(ωT/2).

• As ω → 0, S̃ww(e^{jωT}) → N, since lim_{x→0} sin(Nx)/sin(x) = N.

• Otherwise there are zeroes at

NωT/2 = ±kπ, i.e. ωT = ±2kπ/N,

which looks like the upper-left plot of Figure 39 (shown in dB).

Figure 39: Spectra of a selection of window functions. The importance of the resolution versus
spectral leakage trade-off becomes apparent.

• Hence the width of the main lobe is

ΔωT = 4π/N.

• This is the approximate frequency resolution of the periodogram when using a rectangular
window.

• If two frequencies are closer than this, the main lobes in the periodogram will overlap
and they will become indistinguishable.

• We can make the resolution arbitrarily small by taking more samples (↑ N). By
increasing N the spectrum becomes more like a delta function. The resolution in
terms of frequency in Hz is (using Δω = 2πΔf)

Δf = 2/(NT).

• Taken to its extreme, as N → ∞, S̃ww(e^{jωT}) → 2πU δ(ω), which gives the following
expected value of S̃vv(e^{jωT}):

E {S̃vv(e^{jωT})} = (1/2πU) ∫_{−π/T}^{π/T} 2πU δ(ω − θ) S̃xx(e^{jθT}) dθ = S̃xx(e^{jωT}).

So the estimator is unbiased in the limit as N → ∞ (on average it gives the correct
answer).

• However, the amount of spectral leakage decreases very slowly as N increases! We
can do better by altering the window shape.
The most widely-used window function for reducing spectral smearing, or leakage, is probably
the Hamming window:

whamming[n] = 0.538 − 0.462 cos(2πn/(N − 1)) for 0 ≤ n ≤ N − 1.

The tails of the spectrum are reduced, giving more contained spectral leakage, but at
the cost of a loss in resolution, since the main lobe is widened. There are other windows,
such as the Blackman window,

wblackman[n] = 0.42 − 0.5 cos(2πn/(N − 1)) + 0.08 cos(4πn/(N − 1)),

or the flat-top window,

wflattop[n] = 1 − 1.93 cos(2πn/(N − 1)) + 1.29 cos(4πn/(N − 1))
              − 0.388 cos(6πn/(N − 1)) + 0.032 cos(8πn/(N − 1)).

These windows have lower spectral tails, but broader main lobes.
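A small numerical check of this trade-off (the sidelobe-measurement procedure is my own; the 0.538/0.462 Hamming coefficients follow the notes):

```python
import numpy as np

N, pad = 64, 8192
n = np.arange(N)
rect = np.ones(N)
hamming = 0.538 - 0.462 * np.cos(2 * np.pi * n / (N - 1))

def peak_sidelobe_db(w):
    """Peak sidelobe level relative to the main lobe, in dB."""
    W = np.abs(np.fft.fft(w, pad))     # finely sampled DTFT magnitude
    W = W / W[0]                        # normalise the main-lobe peak
    k = 1
    while W[k + 1] < W[k]:              # walk down to the first null
        k += 1
    return 20 * np.log10(W[k : pad // 2].max())
```

The rectangular window's first sidelobe sits near −13 dB; the Hamming window pushes the sidelobes below roughly −40 dB, at the price of a main lobe about twice as wide.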

• If enough data is available we can use a window to reduce the leakage and make N
large enough to get the resolution we want.


Figure 40: Hamming window, Blackman window and Flat-top window. N = 256.

4.6 The averaged periodogram
While we have seen that the periodogram is asymptotically unbiased (on average it will give the right answer),
it can be shown (though not here) that as N → ∞ the variance does not tend to
zero! We say it is not a consistent estimate. In fact, it can be shown that the variance of
the periodogram estimator is approximately given as

var {S̃vv(e^{jωT})} ≈ S̃xx²(e^{jωT}).

(The reason for this is embedded in the fact that while, as N → ∞, rvv[k] =
(1/N) Σ_{n=0}^{N−1} (w[n]x[n]) (w[n + k]x[n + k]) → rxx[k], it does not do so uniformly. Choose any
value of ε and it is impossible to choose a value of N such that |rvv[k] − rxx[k]| < ε for all
k.)

• Weird! So, what am I talking about? Well, take white Gaussian noise for instance.
Its autocorrelation function is a delta function:

rxx[k] = σ²δ[k].

So its power spectral density is

S̃xx(e^{jωT}) = Σ_{k=−∞}^{∞} rxx[k]e^{−jωkT} = σ²e^{−0} = σ².

Hence it has a flat power spectrum. Figure 41 shows two estimates of the power
spectrum of Gaussian noise.


Figure 41: Power spectrum estimates of white Gaussian noise, estimated using a periodogram
with N = 512. The spectrum is not as 'flat' as expected.

• Every time we estimate the spectrum we usually get the wrong answer, and the variance
of the error is roughly S̃xx²(e^{jωT}) (which is awful!).

• And it does not get any better when we increase N, as we see from Figure 42, where
N = 2048.


Figure 42: PSD estimated from 2048 samples of a Gaussian process. Variance is about the
same as when N = 512.

So, we reduce the variance of the estimate by averaging a number of periodograms.

• The data is first broken into segments. Suppose we have N = KL data points. We
break the data into K segments, xk[n], each containing L samples:

xk[n] = x[kL + n] for 0 ≤ n ≤ L − 1,  0 ≤ k ≤ K − 1.

• We multiply each segment by our window of choice:

vk[n] = w[n]xk[n].

• The periodogram of the kth segment is (including the normalisation for the window
energy, U = Σ_{n=0}^{L−1} (w[n])²):

S̃ᵏvv(e^{jωT}) = (1/U) |Ṽk(e^{jωT})|² / L.

Note that since the number of samples in each segment is L = N/K, the frequency
resolution has also decreased by a factor of K. We are trading bias for variance. You
get nothing for free!

• Our final estimate is the average of all K periodograms:

S̃vv(e^{jωT}) = (1/K) Σ_{k=0}^{K−1} S̃ᵏvv(e^{jωT}).

Figure 43: Averaged periodogram with K = 4. The 4 periodograms (left to right),
S̃¹vv(e^{jωT}), S̃²vv(e^{jωT}), S̃³vv(e^{jωT}), S̃⁴vv(e^{jωT}), of the 4 windowed segments are averaged to
obtain the final estimate of the power spectral density. A Hamming window was used to
window each segment. The data consists of two sinusoids and Gaussian noise.

• Since at any frequency the estimate is the average of K i.i.d. (independent
and identically distributed) variables, this will result in a reduction of the variance
by a factor of K:

var {S̃vv(e^{jωT})} ≈ S̃xx²(e^{jωT}) / K.

So the more segments we have, the lower the variance of our estimate.
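A Welch/Bartlett-style sketch of this averaging (function name mine):

```python
import numpy as np

def averaged_periodogram(x, L, w):
    """Average of K = len(x)//L windowed, length-L periodograms."""
    K = len(x) // L
    U = np.sum(w ** 2)                          # window energy
    S = np.zeros(L)
    for k in range(K):
        v = w * x[k * L : (k + 1) * L]          # k-th windowed segment
        S += np.abs(np.fft.fft(v)) ** 2 / (L * U)
    return S / K
```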

Example: Spectrum analyser
You have been asked to design a spectrum analyser to be used in an electronic tuning
device which will sample a band-limited sound signal at a rate of 8 kHz. It must have a
frequency resolution of 5 Hz, and the standard deviation of the estimate must be within
20% of the true value.

• To get the standard deviation down to 20% of the true value we need

√(var {S̃vv(e^{jωT})}) ≈ S̃xx(e^{jωT})/√K ≤ 0.2 S̃xx(e^{jωT})

√K ≥ 5
K ≥ 25

So we need 25 segments to get the accuracy we require.

• The frequency resolution of the rectangular window with L samples is about

Δf = 2/(LT),

but the spectral smearing is quite bad.

• We choose to use a Hamming window, whose frequency resolution is about twice as
bad as that of the rectangular window:

Δf = 4/(LT).

• For an 8 kHz sample rate and 5 Hz resolution, we have

5 = 4(8000)/L
L = 6400

• So we need to collect K ≥ 25 segments, each with L ≥ 6400 samples in each segment.
This results in (25)(6400/8000) = 20 s of recording.

• We calculate the DFT of each of the 25 windowed segments,

V̄ᵏ[m] = Ṽᵏ(e^{jωT})|_{ωT=2πm/N} = Σ_{n=0}^{6399} vk[n]e^{−j2πmn/N},

and average the resulting periodograms:

S̃vv[m] = (1/25) Σ_{k=0}^{24} |V̄ᵏ[m]|² / (LU).
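The sizing arithmetic of this example, as a sketch:

```python
import math

fs = 8000.0                          # sample rate (Hz)
rel_std = 0.2                        # required relative standard deviation
K = math.ceil(1.0 / rel_std ** 2)    # std shrinks by sqrt(K), so K >= 25
df = 5.0                             # required resolution (Hz)
L = math.ceil(4.0 * fs / df)         # Hamming window: df ~ 4/(L*T) = 4*fs/L
seconds = K * L / fs                 # total recording time
```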

4.7 Short-time Fourier transform
Suppose we have a signal which is nonstationary. Take for example the signals in Figure 44.
These are obviously two different signals, but their power spectral density estimates are the
same. What would be more useful would be some sort of time-frequency representation.
We want something like what is shown in Figure 45. Here we can see how an estimate of
the instantaneous frequency varies with time.


Figure 44: A single periodogram estimate of the power spectral density of a nonstationary
signal. The two tones in the signal have frequencies at ω1T = π/4 and ω2T = π/8.

We can use the short-time Fourier transform to derive a plot similar to that of Figure 45...


Figure 45: Time-frequency representation of a non-stationary signal.

4.7.1 Continuous STFT

The continuous-time short-time Fourier transform is defined as:

STFT(t0, ω) = ∫_{−∞}^{∞} (g(t − t0)f(t)) e^{−jωt} dt,

where g(t − t0) is the window function, g(t), shifted to be centred at t = t0. g(t) is
a short-time window function which quickly decays to zero. A typical function would be

g(t) = e^{−αt²} for α > 0,

but you could use a Hamming window, etc.

• STFT(t0, ω) is simply the Fourier transform of the signal g(t − t0)f(t). Figure 46
illustrates what g(t − t0)f(t) looks like.

• A short section of the signal f(t) is isolated by g(t − t0) about the time t = t0.

• The frequency content of this isolated section is determined using the single periodogram
estimator.


Figure 46: An illustration of how the function g(t − t0) picks out the section of the chirp
signal f(t) at time t0 = 500.

• If we calculate the STFT at a number of different times we can construct the time-frequency
plot we desire. We call this 3-D plot a spectrogram.

• (The spectrogram is usually plotted in two dimensions, with the frequency magnitude
being represented by the image intensity or colour.)

Figure 47 shows a sketch of what the spectrogram might look like for the 'chirp' signal

f(t) = sin(ω0t²) = sin((ω0t)t).

The frequency, ω = ω0t, of the chirp signal increases linearly with time.

4.7.2 Time-frequency resolution

Just a quick note on the time-frequency resolution. The time resolution, Δt, is
inversely proportional to the frequency resolution:

Δt = K/Δf, i.e. ΔtΔf = K.

This is Heisenberg's uncertainty principle.


Figure 47: Sketch of what the spectrogram of the chirp signal in Figure 46 might look like.

• If we want fine-grain time resolution we must take small windows, which give poor
frequency resolution.

4.7.3 Discrete STFT
The STFT for discrete signals is:

STFT[n0, ω) = Σ_{n=−∞}^{∞} g[n − n0]f[n]e^{−jωnT}.

Of course we can evaluate this at any frequency we like. We usually set ωT = 2πm/N and
vary m over 0, ..., N − 1 to give:

STFT[n0, m] = Σ_{n=−∞}^{∞} g[n − n0]f[n]e^{−j2πmn/N}.

If we use a Hamming window of length N the summation becomes:

STFT[n0, ω) = Σ_{n=n0}^{n0+(N−1)} g[n − n0]f[n]e^{−jωnT}.

Let k = n − n0, so that n = k + n0. Making the substitution,

STFT[n0, ω) = Σ_{k=0}^{N−1} g[k]f[k + n0]e^{−jω(k+n0)T}
            = e^{−jn0ωT} (Σ_{k=0}^{N−1} g[k]f[k + n0]e^{−jωkT}).   (10)

• The term e^{−jn0ωT} is just a phase component, so we can ignore it and examine the
magnitude.

• We define a new filter g∗[k, ω0) = g[k]e^{−jω0kT}, so

|STFT[n0, ω0)| = |Σ_{k=0}^{N−1} g∗[k, ω0)f[k + n0]| .

This is the N samples of f[n], from n = n0 to n = n0 + N − 1, filtered by g∗[k, ω0). But,

G̃∗(e^{jωT}, ω0) = F{g∗[k, ω0)}
              = F{g[k] e^{−jω0kT}}
              = F{g[k]} ∗ F{e^{−jω0kT}}
              = G̃(e^{jωT}) ∗ 2πδ(ω − ω0)

i.e. the Fourier transform of the window convolved with a delta function at ω0.

• This results in F{g[k]} = G̃(e^{jωT}) being shifted to ω0.

• Remember, if the window function, g[k], is the rectangular window of length N, we get

|F{g[k]}| = |G̃(e^{jωT})| = | sin(NωT/2) / sin(ωT/2) |

|G̃∗(e^{jωT}, ω0)| = | sin(N(ω − ω0)T/2) / sin((ω − ω0)T/2) |

This looks like a narrow band-pass filter at ω = ω0. So...


|F{STFT[n, ω0)}| = |G̃∗(e^{jωT}, ω0)| |F̃(e^{jωT})|
                = | sin(N(ω − ω0)T/2) / sin((ω − ω0)T/2) | |F̃(e^{jωT})|

• If we evaluate the STFT at ωT = 2πm/N, for m = 0, ..., N − 1, the STFT will behave like a bank of N narrowband filters (c.f. Figure 48), each one allowing a different frequency to pass.

Figure 48: STFT is like a filter bank.

• This is a very useful transform for analysing the structure of transient signals, like

speech. Figure 49 shows a speech waveform, its PSD and its STFT.
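The filter-bank view above can be sketched numerically. Below is a minimal discrete STFT in Python (NumPy assumed; the window length N = 256, hop size and synthetic chirp are illustrative choices, not values from the notes):

```python
import numpy as np

def stft(f, N, hop):
    """Discrete STFT: slide a Hamming window g[k] of length N along
    f[n] and take an N-point DFT of each windowed frame."""
    g = np.hamming(N)
    starts = range(0, len(f) - N + 1, hop)
    return np.array([np.fft.fft(g * f[n0:n0 + N]) for n0 in starts])

# A chirp: its instantaneous frequency rises with time, so the
# dominant DFT bin should climb from frame to frame (the spectrogram
# would show a rising ridge, as sketched in Figure 47).
n = np.arange(8000)
f = np.sin(2 * np.pi * 5e-6 * n ** 2)
S = stft(f, N=256, hop=128)
peak_bins = np.argmax(np.abs(S[:, :128]), axis=1)  # positive frequencies
```

Plotting 20 log10 |S| against frame index and frequency bin gives a spectrogram like the one in Figure 49.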


Figure 49: (top) Speech waveform for the words So long. (middle) An estimate of the PSD using the averaged periodogram estimator. (bottom) The spectrogram derived using the short-time Fourier transform. The variation of frequency with time becomes apparent.
5 Speech processing
• The processing of digitised speech signals is an important application of DSP.

• Speech is a fundamental method of human communication.

• From an engineering viewpoint, speech is a time-varying signal produced by a constrained physical system (lungs, vocal cords and vocal tract).

• Speech is reasonably band-limited (the phone system samples at 8 kHz, which sounds fine).

There are 3 classes of problem people are generally interested in:

1. Speech analysis, with the aim of building speech recognition systems.

2. Speech synthesis, to allow computerised systems to converse with humans without using recorded speech snippets, to read text aloud, or to alter speech characteristics for security reasons.

3. Speech analysis/synthesis, for speech compression for transmission or storage.

5.1 Speech production model


• Speech is produced when air is expelled from the lungs, through a non-uniform

acoustic tube (the vocal tract) and is released at the lips.

• It can be thought of as the output of a slowly time-varying system which is excited

by a periodic source or a noise source.

• Speech is made up of a series of short sounds called phonemes.

There are three possible ways you can make a phoneme sound:

1. The vocal cords: You tighten your vocal cords until they seal shut. You then use your lungs to force them open. When they open, a small noise is made for a short time, due to the pressure difference. If it were possible to remove the head (!) and listen to the sound of the vocal cords vibrating, it would sound like an impulse train of sound, whose fundamental frequency is determined by the tension on the vocal cords. Sounds formed in this way are called voiced sounds. (/a/, /e/, /oo/, etc.)

2. The vocal tract: When the vocal cords are left relaxed and air is forced through the vocal tract, if the vocal tract is made narrow enough (using the tongue) a hissing sound is created. Sounds formed in this way are called unvoiced or fricative sounds. (/ss/, /sh/, /ch/)

3. Plosives: By sealing the airway shut, building pressure behind the blockage and then opening the airway, we produce sounds called plosives. (/t/, /p/, /k/) However, as a fraction of the total speech duration, plosives make up very little time.

• Therefore we can model our speech process as a system excited by either a noise

source or a periodic impulse train.

• The difference between various voiced or unvoiced sounds is due to the variable shape of the vocal tract. We model the vocal tract as a time-varying filter.

Therefore our speech model becomes that shown in Figure 50:

Figure 50: Speech model. A noise source or a pulse source (with a given pitch period) excites a time-varying vocal tract filter, controlled by the vocal tract parameters.

5.2 Voiced speech
5.2.1 The impulse train
So we know that voiced speech is formed by the filtering (by the vocal tract) of an impulse train of sound, which has a period of T0. Before we look at the filtering caused by the vocal tract, let's examine what the impulse train looks like.

If the impulse train is ideal we can write it as δT0(t) = Σ_{n=−∞}^{∞} δ(t − nT0). Note that δT0 is periodic and so can be represented by a Fourier series,


δT0(t) = Σ_{n=−∞}^{∞} cn e^{jnω0t},

where,

cn = (ω0/2π) ∫_{−π/ω0}^{π/ω0} δT0(t) e^{−jnω0t} dt = 1/T0.     (Note: ω0 = 2π/T0.)

Therefore,

δT0(t) = (1/T0) Σ_{n=−∞}^{∞} e^{jnω0t}.

Taking the Fourier transform gives

F{δT0(t)} = ∫_{−∞}^{∞} (1/T0) Σ_{n=−∞}^{∞} e^{−j(ω−nω0)t} dt
          = (1/T0) Σ_{n=−∞}^{∞} ∫_{−∞}^{∞} e^{−j(ω−nω0)t} dt
          = (2π/T0) Σ_{n=−∞}^{∞} δ(ω − nω0).

Hence, this gives an impulse train in the frequency domain (an equal contribution from all multiples of the fundamental frequency). However, we rarely ever have a perfect impulse train in the time domain, because the vocal cords can only open at a finite speed. So what we get is more like a filtered impulse train (c.f. Figure 51) in the frequency domain.

Since we will be creating a filter to model the effect of the vocal tract, we can simply assume that the filtering effect created by having an imperfect impulse train is included in that filter.

Figure 51: Perfect and imperfect impulse train for voiced speech.

5.2.2 The vocal tract


Our ability to distinguish between different voiced phonemes is due to the filtering effect the vocal tract has on the impulse train. The sound echoes around the vocal tract and is affected by the shape of the throat, nose, mouth, the position of the tongue and lips, and even the shape of the head and sinus cavities.

Figure 52 shows segments of voiced and unvoiced speech. We can see the impulse train in the frequency domain and how it is shaped (filtered) by the vocal tract. The reason the impulses in the frequency domain are not quite impulses is due to spectral smearing. (The spectrum was estimated using a finite window length.)

• Notice (by looking at the envelope of the spectrum) how the vocal tract filter transfer function appears to have a number of peaks and valleys.

• These peaks are called formants.

Figure 53 shows the spectrogram of a speech segment. In the voiced sections, the frequency impulses (vertically spaced dark patches) are clearly visible. In the unvoiced segments there are no impulses, and the high frequency regions contain more power than the low frequency regions.

Figure 52: Example of voiced speech segment (left) for the phoneme /o/ and the averaged periodogram estimate of its power spectrum (right). The pitch harmonics are visible. We also see how the vocal tract has filtered the harmonics.

Figure 53: Steve Jobs says 'Hi'. Spectrogram of a speech segment. The pitch harmonics are clearly visible. The formant structure is somewhat visible.

5.3 Linear predictive speech coding
• We have just observed how the spectrum of a phoneme is composed of an impulse train of harmonics of the fundamental frequency.

• But we also saw how the spectrum of the excitation signal, ε[n], which should be `flat' for an impulse train or for white Gaussian noise, is shaped by the vocal tract.

• The peaks represent resonances in the vocal tract. These are preferred frequencies of vibration within the vocal cavity.

• We have seen from the properties and design of filters section that we can create a filter whose transfer function has a number of peaks and valleys by placing poles near the unit circle. The nearer the pole is to the unit circle, the larger the peak.

So we could model the vocal tract filter with an all-pole filter:

H̃(z) = G / (1 − Σ_{k=1}^{p} b[k] z^{−k}),

where G is the gain (using only poles will make the gain everywhere greater than unity, so G < 1 to compensate). If the samples of the recorded speech are written as y[n] and we call the impulse train or the noise excitation ε[n], then they are related by:

Ỹ(z) = H̃(z) ε̃(z)
Ỹ(z) = G ε̃(z) / (1 − Σ_{k=1}^{p} b[k] z^{−k})
Ỹ(z) (1 − Σ_{k=1}^{p} b[k] z^{−k}) = G ε̃(z)
Ỹ(z) = G ε̃(z) + Σ_{k=1}^{p} b[k] Ỹ(z) z^{−k}

Taking the inverse z-transform gives

y[n] = Σ_{k=1}^{p} b[k] y[n − k] + G ε[n].

• We see that the current speech sample, y[n], is very much dependent on previous samples, y[n − k]. This verifies our belief that speech is quite a redundant signal. The number of previous samples needed depends on the order, p, of the filter. Usually about p = 12 will do.

• If we knew the impulse train period, or the noise source power, and if we also knew the filter coefficients, b[k], we could predict what the next speech sample would be.

• This type of encoding is called linear prediction, since we predict the current sample using a linear combination of the previous outputs.
• The hope is that it will take fewer bits to encode the pitch period, or noise source power, and the filter coefficients for a short section of speech than it would to encode the actual speech samples.

• In addition, the filter parameters give us some way to quantify the difference between different phonemes (for speech recognition, etc.).
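The difference equation y[n] = Σ_k b[k] y[n − k] + G ε[n] can be simulated directly. A small Python sketch (the 2nd-order coefficients, gain and pitch period are illustrative choices, not values from the notes):

```python
import numpy as np

def synthesise(b, G, eps):
    """Run the all-pole speech model y[n] = sum_k b[k]*y[n-k] + G*eps[n]."""
    y = np.zeros(len(eps))
    for n in range(len(eps)):
        acc = G * eps[n]
        for k in range(1, len(b) + 1):
            if n - k >= 0:
                acc += b[k - 1] * y[n - k]
        y[n] = acc
    return y

# Voiced excitation: impulse train with an 80-sample pitch period
# (100 Hz at an 8 kHz sampling rate).
eps = np.zeros(800)
eps[::80] = 1.0
b = [1.3, -0.8]            # a stable, resonant 2nd-order vocal-tract stand-in
y = synthesise(b, 0.1, eps)
```

Each impulse excites a decaying resonance, giving the kind of periodic waveform structure seen later in Figure 56.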

5.3.1 Linear prediction coefficients

So how do we find the linear prediction coefficients, b[k]? We take a short section of speech samples, say 50 ms. Sampling at 8000 Hz will give 400 samples in this section of speech. We want to find the parameters b[k] which will best model this section of speech. We write our prediction for the next speech sample, ŷ[n], according to the linear prediction model as

ŷ[n] = Σ_{k=1}^{p} b[k] y[n − k] + G ε[n].

But we've recorded the speech and we know what that sample actually is (y[n]). So we

can compare it with our prediction to get an error signal, e[n]:

e[n] = y[n] − ŷ[n]
     = y[n] − Σ_{k=1}^{p} b[k] y[n − k] − G ε[n]

We want to find the b[k] which minimises the expected mean square error,

E{e²[n]} = E{ ( y[n] − Σ_{k=1}^{p} b[k] y[n − k] − G ε[n] )² }.

Differentiate w.r.t. b[i]:

∂/∂b[i] E{e²[n]} = E{ ∂/∂b[i] e²[n] }
                = E{ 2 e[n] ∂e[n]/∂b[i] }
                = E{ 2 (−y[n − i]) e[n] }
                = −2 E{ y[n − i] e[n] }

We set the derivative equal to zero to minimise, for i = 1, ..., p:

E{ y[n − i] e[n] } = 0

E{ y[n − i] ( y[n] − Σ_{k=1}^{p} b[k] y[n − k] − G ε[n] ) } = 0     (the bracketed term is y[n] − ŷ[n])

E{ y[n − i] y[n] − Σ_{k=1}^{p} b[k] y[n − i] y[n − k] − G y[n − i] ε[n] } = 0

E{ y[n − i] y[n] } − Σ_{k=1}^{p} b[k] E{ y[n − i] y[n − k] } − G E{ y[n − i] ε[n] } = 0

These expectations will be estimated using the 400 samples from the section we're examining. For voiced sounds, the cross-correlation between the recorded speech and the excitation source should be zero for any lag i > 0, since the impulse train is zero everywhere except at the pulses, and the speech signal can't have a DC value. Hence we may assume E{y[n − i]ε[n]} = 0 for i > 0. This gives

E{ y[n − i] y[n] } = Σ_{k=1}^{p} b[k] E{ y[n − i] y[n − k] }   for i = 1, ..., p

ryy[i] = Σ_{k=1}^{p} b[k] ryy[i − k]   for i = 1, ..., p

We can write these p equations more compactly in matrix form:

[ ryy[0]      ryy[1]      ryy[2]      ···  ryy[p − 1] ] [ b[1] ]   [ ryy[1] ]
[ ryy[1]      ryy[0]      ryy[1]      ···  ryy[p − 2] ] [ b[2] ]   [ ryy[2] ]
[ ryy[2]      ryy[1]      ryy[0]      ···  ryy[p − 3] ] [ b[3] ] = [ ryy[3] ]
[   ...         ...         ...       ...     ...     ] [  ... ]   [  ...   ]
[ ryy[p − 1]  ryy[p − 2]  ryy[p − 3]  ···  ryy[0]     ] [ b[p] ]   [ ryy[p] ]

Ryy b = ryy

So the linear prediction coefficients we want are

b = Ryy^{−1} ryy.

There is an efficient algorithm for solving matrix problems with the above form. It's called the Levinson-Durbin algorithm, and we'll get to it shortly.
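Setting up and solving Ryy b = ryy takes only a few lines. A sketch (the synthetic 2nd-order AR test signal is an illustrative stand-in for a speech segment):

```python
import numpy as np

def lpc(y, p):
    """Estimate b[1..p] by building the Toeplitz autocorrelation
    matrix Ryy and solving the normal equations Ryy b = ryy."""
    r = np.array([np.dot(y[:len(y) - k], y[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:p + 1])

# Sanity check on a known 2nd-order AR process:
# y[n] = 1.3 y[n-1] - 0.8 y[n-2] + e[n].
rng = np.random.default_rng(0)
e = rng.standard_normal(50000)
y = np.zeros_like(e)
for n in range(2, len(y)):
    y[n] = 1.3 * y[n - 1] - 0.8 * y[n - 2] + e[n]
b = lpc(y, p=2)    # should come back close to [1.3, -0.8]
```

The recovered coefficients match the generating model, which is exactly the sense in which the normal equations find the "best" predictor.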

• Figure 54 shows a speech signal, the predicted signal and the error between them for the optimum prediction coefficients.


Figure 54: Example showing the speech waveform y[n] (solid) for the phoneme /o/ and the predicted signal ŷ[n] = Σ_{k=1}^{p} b[k] y[n − k] (dashed) with p = 22. The error signal e[n] = y[n] − ŷ[n] (solid) is also shown. These plots were created using the optimum prediction coefficients, b[k]. We see the error signal looks like noise.

• Figure 55 shows the magnitude and phase of the transfer function H̃(z) for the same short section of speech, and the resulting periodogram when applied to an artificial impulse train.

Figure 55: Transfer function of the filter defined by the linear prediction coefficients for the phoneme /o/. Also shown is the periodogram of the section of speech from which the coefficients were derived.

• The pitch of this speech segment is about 100 Hz. We can synthesise the speech by passing an impulse train (with 10 ms between delta functions) through the filter.

• We must adjust the gain, G, appropriately.

• Figure 56 shows the result of passing an impulse train through the filter.

Figure 56: (Top) Example of speech synthesised using the all-pole filter model (p = 22) derived from the actual speech phoneme /o/. Solid is the impulse train. Dashed is the synthesised speech waveform. (Bottom left) The periodogram of the synthesised speech. (Bottom right) Same periodogram with noise added.

5.4 The Levinson-Durbin algorithm
The Levinson-Durbin recursive algorithm is a method for solving the set of linear equations

T x = y

where T is a Toeplitz matrix (the diagonals which run from upper left to lower right contain the same entry) with a non-zero main diagonal. A special simplified case is when T is symmetric. This is what we use here for the equation Ryy b = ryy:

Algorithm 1 Symmetric Toeplitz Levinson-Durbin recursion

1. i = 0, E0 = ryy[0].

2. i = i + 1.

3. ki = ( ryy[i] − Σ_{j=1}^{i−1} b^{(i−1)}[j] ryy[i − j] ) / E_{i−1}

4. b^{(i)}[i] = ki.

5. For j = 1, ..., i − 1: b^{(i)}[j] = b^{(i−1)}[j] − ki b^{(i−1)}[i − j]

6. Ei = (1 − ki²) E_{i−1}

7. If i < p go to step 2, else terminate.

Example: Levinson-Durbin matrix inversion

The nth sample of a speech signal, y[n], can be estimated as ŷ[n] using the following 3rd order speech model:

ŷ[n] = Σ_{k=1}^{3} b[k] y[n − k] + G ε[n]

If the autocorrelation function of a speech signal y[n] is approximated by ryy[k] = ρ^{|k|}, where ρ < 1, find the coefficients, b[k], which will minimise the expected squared error, E{(y[n] − ŷ[n])²}.

Using the derivation in the notes we get:

[ ryy[0]  ryy[1]  ryy[2] ] [ b[1] ]   [ ryy[1] ]
[ ryy[1]  ryy[0]  ryy[1] ] [ b[2] ] = [ ryy[2] ]
[ ryy[2]  ryy[1]  ryy[0] ] [ b[3] ]   [ ryy[3] ]

[ 1    ρ    ρ² ] [ b[1] ]   [ ρ  ]
[ ρ    1    ρ  ] [ b[2] ] = [ ρ² ]
[ ρ²   ρ    1  ] [ b[3] ]   [ ρ³ ]
If we bash out the inverse we will get:

[ b[1] ]               [ 1    −ρ       0  ] [ ρ  ]   [ ρ ]
[ b[2] ] = 1/(1 − ρ²)  [ −ρ   1 + ρ²  −ρ  ] [ ρ² ] = [ 0 ]
[ b[3] ]               [ 0    −ρ       1  ] [ ρ³ ]   [ 0 ]
But, for a large matrix this is a little more time consuming than it needs to be. We use

the Levinson-Durbin recursion to perform the same inversion:

Iteration 0:
1. E0 = ryy[0] = 1. i = 0.
2. i = i + 1 = 1.

Iteration 1:
3. k1 = ( ryy[1] − Σ_{j=1}^{0} b^{(0)}[j] ryy[1 − j] ) / E0 = ryy[1]/E0 = ρ/1 = ρ.
4. b^{(1)}[1] = k1 = ρ
5. Skip, since i − 1 = 0.
6. E1 = (1 − k1²) E0 = (1 − ρ²) · 1 = 1 − ρ²
7. i < 3, go to step 2.

2. i = i + 1 = 2.

Iteration 2:
3. k2 = ( ryy[2] − b^{(1)}[1] ryy[1] ) / E1 = (ρ² − ρ·ρ) / (1 − ρ²) = 0.
4. b^{(2)}[2] = k2 = 0.
5. b^{(2)}[1] = b^{(1)}[1] − k2 b^{(1)}[1] = ρ − 0·ρ = ρ
6. E2 = (1 − k2²) E1 = (1 − 0)(1 − ρ²) = 1 − ρ².
7. i < 3, go to step 2.

2. i = i + 1 = 3.

Iteration 3:
3. k3 = ( ryy[3] − b^{(2)}[1] ryy[2] − b^{(2)}[2] ryy[1] ) / E2 = (ρ³ − ρ·ρ² − 0·ρ) / (1 − ρ²) = 0.
4. b^{(3)}[3] = k3 = 0.
5. b^{(3)}[2] = b^{(2)}[2] − k3 b^{(2)}[1] = 0 − 0·ρ = 0
   b^{(3)}[1] = b^{(2)}[1] − k3 b^{(2)}[2] = ρ − 0·0 = ρ
6. E3 = (1 − k3²) E2 = (1 − 0)(1 − ρ²) = 1 − ρ².
7. i = 3, terminate.

This gives the solution

[ b[1] ]   [ b^{(3)}[1] ]   [ ρ ]
[ b[2] ] = [ b^{(3)}[2] ] = [ 0 ]
[ b[3] ]   [ b^{(3)}[3] ]   [ 0 ]
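Algorithm 1 translates almost line-for-line into code; a sketch in plain Python, checked against the ryy[k] = ρ^|k| example above:

```python
def levinson_durbin(r, p):
    """Symmetric Toeplitz Levinson-Durbin recursion (Algorithm 1).
    r[0..p] are autocorrelation values; returns [b[1], ..., b[p]]."""
    b = [0.0] * (p + 1)       # b[1..p]; index 0 unused
    E = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(b[j] * r[i - j] for j in range(1, i))) / E
        b_new = b[:]          # step 5 needs the old b^{(i-1)} values
        b_new[i] = k
        for j in range(1, i):
            b_new[j] = b[j] - k * b[i - j]
        b = b_new
        E = (1.0 - k * k) * E
    return b[1:]

# Worked example from the notes: ryy[k] = rho^|k| should give [rho, 0, 0].
rho = 0.9
b = levinson_durbin([rho ** k for k in range(4)], 3)
```

This avoids the explicit matrix inverse: the recursion costs O(p²) operations instead of O(p³).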

5.5 Voiced/unvoiced/silence decision


Before we start trying to code a section of speech we need to decide whether it is voiced, unvoiced or silence. This is usually achieved by examining a number of different features and combining the information they provide in an optimal way (usually using a pattern classifier). Some common features might be:

• The energy in the speech segment,

• Number of zero-crossings,

• Autocorrelation,

• First linear prediction coefficient, b[1],

• etc., etc., etc.

Figure 57 shows how we might use two features (energy and zero-crossings) to make the voiced/unvoiced decision.

• Pattern classifiers, such as neural networks, for example, are used to divide up the feature space accordingly, even when there are multiple features.

• You could add any number of clever measures to aid the voiced/unvoiced decision and the classifier will take care of the rest!
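The two features plotted in Figure 57 are simple to compute per segment. A sketch (the sinusoid and noise below are synthetic stand-ins for voiced and unvoiced speech; a real system would feed these features to a trained classifier):

```python
import numpy as np

def segment_features(y):
    """Return (energy, zero-crossing count) for one speech segment."""
    energy = float(np.sum(y ** 2))
    crossings = int(np.sum(np.abs(np.diff(np.sign(y))) > 0))
    return energy, crossings

rng = np.random.default_rng(1)
n = np.arange(400)                                # 50 ms at 8 kHz
voiced = np.sin(2 * np.pi * 100 / 8000 * n)       # strong, low frequency
unvoiced = 0.1 * rng.standard_normal(400)         # weak, noise-like
e_v, z_v = segment_features(voiced)
e_u, z_u = segment_features(unvoiced)
```

The voiced segment lands in the high-energy/few-crossings corner of the feature space, and the unvoiced one in the low-energy/many-crossings corner, as in Figure 57.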

Figure 57: The voiced/unvoiced/silence decision just using the segment power (*) and number of zero-crossings (o). (The top plot has been normalised to fit everything on the same scale!)

5.6 Estimating the pitch period
• If we happen to find that the speech is voiced, then before we can code it we need to estimate the pitch of the speech.

• In long, clean segments of voiced speech it is quite easy to determine the pitch period by eye.

• If the speech segment is short, or in the presence of noise, it can be difficult to see the pitch period.

• In this case there are numerous ways to estimate the pitch period, some better than others. One common way is to use the autocorrelation function.

• If we have a segment of speech y[n], for n = 1, ..., N, we estimate the autocorrelation function as:

r[k] = (1/(N − |k|)) Σ_{n=1}^{N−1} y[n] y[n − k].

Figure 58 shows how the autocorrelation of a voiced speech segment reveals the pitch period. A simple algorithm that searches for the first peak above a certain threshold can be used to find the peak at a lag equal to the pitch period.

Other ad-hoc methods exist, too. One is used in the LPC-10 speech standard. It is called the average magnitude difference function (AMDF):

AMDF(k) = (1/N) Σ_{n=1}^{N} |y[n] − y[n − k]|

• It is similar to the autocorrelation, except instead of multiplying the signal with a delayed version of itself, we subtract the delayed version.

• When the lag is that of the pitch period, the summation will become small.
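Both estimators can be sketched in a few lines of Python (the search range of 40-160 samples and the synthetic 80-sample-period tone are illustrative choices):

```python
import numpy as np

def pitch_autocorr(y, k_min, k_max):
    """Lag in [k_min, k_max) maximising r[k] = sum y[n] y[n-k] / (N-k)."""
    N = len(y)
    r = [np.dot(y[k:], y[:N - k]) / (N - k) for k in range(k_min, k_max)]
    return k_min + int(np.argmax(r))

def pitch_amdf(y, k_min, k_max):
    """Lag in [k_min, k_max) minimising the average magnitude difference."""
    N = len(y)
    d = [np.mean(np.abs(y[k:] - y[:N - k])) for k in range(k_min, k_max)]
    return k_min + int(np.argmin(d))

n = np.arange(800)
y = np.sin(2 * np.pi * n / 80)     # pitch period of 80 samples
```

Both functions should return 80 for this test tone: the autocorrelation peaks, and the AMDF dips to (almost) zero, at the true pitch period.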

Figure 58: (Top) A short segment of the phoneme /o/. (Bottom) The autocorrelation of
the speech segment.

6 Image Processing
Processing of images is a fundamental concern of engineering. Some specific applications of interest are:

• Television

• Video

• Remote sensing

• Medical imaging

• Image enhancement and restoration

• Image compression/coding for storage or transmission

• Image Analysis / Machine vision, etc.

Like most engineering disciplines, the number of image processing algorithms and transforms is vast. Here we will touch on the following basic principles:

• Image representation

• Image histograms

• 2-D Fourier transform

• Image filtering

• Discrete cosine transform

• Image compression (JPEG)

• Morphological operations

6.1 Image representation


• We can represent an image with an M ×N matrix, f [m, n].

• For a binary image the matrix will have only f[m, n] = 0 (black) or f[m, n] = 1 (white) entries (c.f. Figure 59). It uses B = 1 bit for every pixel; hence it uses NM bits.

• If we allow different grey levels between black and white we have a greyscale image. Using B = 4 bits will give 2^B = 16 levels (0, ..., 15), which makes the image more `natural looking'. We can increase the number of bits to B = 8, which is usually sufficient (c.f. Figure 60).

Figure 59: 1-bit grey level image

Figure 60: (left) 4-bit image (16 grey levels). (right) 8-bit image (256 grey levels).

• Figure 61 shows an image of a man created using M = 11 and N = 8. We've made the pixels bigger and used fewer of them. We see that the resolution is too low to be useful.

Figure 61: Low resolution image of a man. M = 11 and N = 8.

• We can also represent colour images by using three M × N matrices, for the red, green and blue intensities. This is called the RGB colour space. Each pixel is represented by a triplet, e.g. White = (255, 255, 255), Black = (0, 0, 0), etc.

• You may sometimes see a different colour space used: the YCbCr colour space. This is from the old Standard-Definition TV broadcast standards.

 - Y is the brightness (luma) component.
 - Cb is the blue chroma component.
 - Cr is the red chroma component.

[ Y  ]   [  65.481   128.553    24.966 ] [ R ]   [  16 ]
[ Cb ] = [ −37.797   −74.203   112     ] [ G ] + [ 128 ]    where R, G, B ∈ (0, 1)
[ Cr ]   [ 112       −93.786   −18.214 ] [ B ]   [ 128 ]

• Alternatively, we could represent a colour image with one matrix by using a colour palette:

 - Black = 0
 - Dark red = 1
 - Light green = 2
 - etc...

6.2 Image histograms
• Let the variable R represent the various grey-levels in the image f[m, n]. We can normalise R to lie in the range (0, 1) by dividing each pixel value by 2^B − 1, the maximum possible value, where R = 0 is black and R = 1 is white.

• Therefore the set of grey-levels in the image (corresponding to the pixel brightness

values) will be a collection of samples of the random variable R.

• If we assume the bit-depth is large enough that R is approximately a continuous

variable, the image may be characterised by the PDF, fR (R = r), of the random

variable R. For notational convenience we just write fR (R = r) = fR (r).

• The PDF, fR(r), can be used in various ways. For example, we can determine the average brightness of the image as E{R}:

E{R} = ∫_{r=0}^{1} r fR(r) dr ≈ (1/(2^B − 1)) (1/(NM)) Σ_{m=1}^{M} Σ_{n=1}^{N} f[m, n].

• This allows us to quantitatively say whether an image is `bright' (e.g. E {R} = 0.8)
or `dark' (e.g. E {R} = 0.2).

• Usually the actual PDF is not available to us and we must either (1) choose a PDF which best fits the image (Gaussian mixture model, etc.), or (2) we must estimate the PDF non-parametrically.

• A popular non-parametric method for PDF estimation is the histogram.

• The histogram is constructed by simply counting the number of occurrences of each pixel brightness, and finally dividing the entire histogram by NM (the number of pixels), so that the integration of the histogram is 1 (because the area under the PDF = 1).

• The histogram can also give a good indication of the contrast of an image. If the

histogram is concentrated over a small number of brightness values the contrast will

be low (c.f. Figure 63).

• In comparison, if the image uses all of the available brightness values the contrast will be high (c.f. Figure 64).

• We can manipulate the contrast of the image by transforming the pixel values. We

can use any mapping that sends the pixel values from the domain (0, 1) to the range

(0, 1). We write this transform S = T (R), where T (·) is the transformation function.
S is a random variable which denotes the brightness of the pixels in the transformed

image. Figure 65 shows the arbitrary transformation S = T (R).
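The normalised histogram and the E{R} estimate above can be computed directly; a minimal sketch (the 4 × 4 test image is made up for illustration):

```python
import numpy as np

def normalised_histogram(img, B=8):
    """Count occurrences of each of the 2^B grey levels and divide
    by the number of pixels NM, so the histogram sums to 1."""
    counts = np.bincount(img.ravel(), minlength=2 ** B)
    return counts / img.size

def mean_brightness(img, B=8):
    """E{R}: mean pixel value normalised to (0, 1) by 2^B - 1."""
    return float(img.mean()) / (2 ** B - 1)

img = np.array([[0, 0, 128, 255],
                [64, 128, 128, 255],
                [0, 64, 192, 255],
                [64, 128, 192, 255]], dtype=np.uint8)
h = normalised_histogram(img)
```

Here h sums to 1, and mean_brightness gives a single number that quantifies whether the image is `bright' or `dark'.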

Figure 62: Sample (un-normalised) histogram for the 8-bit image in Figure 60.

Figure 63: Low contrast image and its histogram.

Figure 64: High contrast image and its histogram. Some saturation is evident in the
brighter areas.

Figure 65: An arbitrary grey-level transformation function, S = T(R) = R^{0.5}.

 - In fact, the Cathode Ray Tube (CRT) used in TV sets performs such a transformation, since the electron beam intensity, I_CRT, is related to the applied voltage, V_CRT, by

I_CRT = k V_CRT^γ,

where γ is dependent on the CRT, and k is some constant. To compensate, Gamma Correction is applied so that

V_CRT = V_IMAGE^{1/γ},

and the image brightness is unaltered at the receiving end.

• From probability theory we can show that, given a random variable X with a PDF fX(x), if this variable is transformed to get Y = g(X), for some function g(·), then the PDF of Y is given by

fY(y) = fX(x1) / |dg(x)/dx|_{x=x1} + ... + fX(xn) / |dg(x)/dx|_{x=xn},

where x1, ..., xn are the roots of the equation g(x) = y. If we assume we are using a monotonic transformation S = T(R), then there is only one root of the equation T(r) = s; we call this solution r1. Hence we can write the PDF of the transformed image as

fS(s) = fR(r1) / |dT(r)/dr|_{r=r1}.     (11)

Example: Grey-level transformation


Suppose we have an image whose PDF, fR(r), is uniform over the range (0, 1), i.e.

fR(r) = 1   for 0 ≤ r ≤ 1
      = 0   otherwise.

If we use the following transformation,

S = T(R) = 0               for R < 0.25
         = 2(R − 0.25)     for 0.25 ≤ R ≤ 0.75
         = 1               for R > 0.75

We see that for s = 0 there are an infinite number of solutions to T(r) = s = 0 along the line 0 < r < 0.25. Similarly for s = 1. Therefore in these regions we have dT(r)/dr = 0, which gives

fS(0) = ( ∫_{r=0}^{0.25} fR(r) dr ) / 0
Figure 66: Sample grey-level transformation

which is meaningless! But we can calculate the PDF in these regions, since

fS(0) = P(0 < R < 0.25) = ∫_{r=0}^{0.25} fR(r) dr = 0.25.

Similarly, fS(1) = 0.25.

In the center region, dT(r)/dr = 2, and so

fS(s) = fR(r)/2 = 0.5.

So, ∫_{−∞}^{∞} fS(s) ds = 1 as required; however, the low intensity and high intensity regions have been saturated. Meanwhile, the contrast of the middle intensity values has been increased. Figure 67 shows the result of the above transformation applied to the sample image.

Figure 67: Image transformed to saturate the high and low intensity regions

Example: Image equalisation

One particular image enhancement we might like to perform is image equalisation. This means transforming the image PDF so that all intensities provide an equal contribution.

This means we want

fS(s) = 1 for 0 ≤ s ≤ 1.

But we saw in Equation (11) that

fS(s) = fR(r1) / |dT(r)/dr|_{r=r1}   where T(r1) = s.

If we choose

s = T(r) = FR(r) = ∫_{w=−∞}^{r} fR(w) dw,

which is the cumulative distribution function (CDF) for the random variable R, then we have:

fS(s) = fR(r1) / |dT(r)/dr|_{r=r1}
      = fR(r1) / |dFR(r)/dr|_{r=r1}
      = fR(r1) / |fR(r)|_{r=r1}
      = fR(r1) / fR(r1)
      = 1.

The idea is that the new image has a uniform histogram.
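For a discrete image the CDF mapping s = FR(r) becomes a lookup table built from the cumulative histogram; a minimal sketch (the low-contrast test image is synthetic):

```python
import numpy as np

def equalise(img, B=8):
    """Histogram equalisation: map grey level r to (2^B - 1) F_R(r),
    where F_R is the empirical CDF of the pixel values."""
    levels = 2 ** B
    hist = np.bincount(img.ravel(), minlength=levels) / img.size
    cdf = np.cumsum(hist)
    lut = np.round((levels - 1) * cdf).astype(np.uint8)
    return lut[img]

# A low-contrast image: all values squeezed into roughly 100..131.
rng = np.random.default_rng(2)
img = (100 + 32 * rng.random((64, 64))).astype(np.uint8)
out = equalise(img)
```

The equalised image uses (nearly) the full brightness range, as in Figure 68.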

Figure 68: Equalised image.

6.3 2-D Fourier transform
6.3.1 Continuous 2D Fourier transform
• Up until now we have been using the 1-D Fourier transform to transform functions of one variable (time), e.g.

F(jω) = ∫_{−∞}^{∞} f(t) e^{−jωt} dt.

• However, the Fourier transform is easily defined over many variables. Take two variables, x and y, for example,

F(jωx, jωy) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} f(x, y) e^{−jωx x} e^{−jωy y} dy dx

x could denote the x-co-ordinate of an image and y the y-co-ordinate. f(x, y) would be the grey-level intensity at that point. Here x and y are continuous variables, so our image would be an analog image (on photographic film, for instance).

• The 1-D Fourier transform is a function of ω. Similarly the 2-D Fourier transform is

a function of ωx and ωy . For an image, where x and y represent positions in space,

ωx and ωy are spatial frequencies, and are measured in units of cycles per meter.

• Figure 69 shows what we might expect the magnitude of the 2-D Fourier transform

to look like for a sample image.

Figure 69: Sketch of the 2-D Fourier transform for the sample image f(x, y).

6.3.2 Discrete 2D Fourier transform


• Similarly we can sample the analog image to obtain a discrete image. The equivalent of the sampling time, T, is the sampling distance, D.

• Since we are sampling in two different directions we have two different sampling distances, Dx and Dy.

• Remember, for a discrete image, f[m, n], m indexes the rows (this is the y direction) and n indexes the columns (this is the x direction).

• We can write the discrete 2-D Fourier transform as:

F̃(e^{jωxDx}, e^{jωyDy}) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} f[m, n] e^{−jωx n Dx} e^{−jωy m Dy}.

 - Notice that we only sum from n = 0, ..., N − 1 and m = 0, ..., M − 1. This is because the image has already been `windowed' with a rectangular window. We can use a 2-D Hamming window if we like. It will have the same kind of effect we spoke about in the Spectral Analysis section: it will reduce spectral leakage, but the frequency resolution will be worsened.

Example: Discrete 2-D Fourier transform

          n=0  n=1
f[m, n] = [ 1    1 ]   m=0
          [ 1    1 ]   m=1

F̃(e^{jωxDx}, e^{jωyDy}) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} 1 · e^{−jωx n Dx} e^{−jωy m Dy}
                        = 1 + e^{−jωxDx} + e^{−jωyDy} + e^{−jωxDx} e^{−jωyDy}

Use the following trick: factor half of each phase term out the front,

F̃(e^{jωxDx}, e^{jωyDy}) = e^{−jωxDx/2} e^{−jωyDy/2} ( e^{jωxDx/2} + e^{−jωxDx/2} ) ( e^{jωyDy/2} + e^{−jωyDy/2} )
                        = e^{−jωxDx/2} e^{−jωyDy/2} · 2 cos(ωxDx/2) · 2 cos(ωyDy/2)
                        = 4 e^{−jωxDx/2} e^{−jωyDy/2} cos(ωxDx/2) cos(ωyDy/2).

So the magnitude response is

|F̃(e^{jωxDx}, e^{jωyDy})| = 4 |cos(ωxDx/2) cos(ωyDy/2)|.

This repeats every 2π, so we only need to plot the range (ωxDx, ωyDy) ∈ [−π, π] × [−π, π].
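The worked example can be checked numerically: the 2 × 2 DFT of the all-ones image samples F̃ at ωxDx, ωyDy ∈ {0, π}, where 4|cos(ωxDx/2) cos(ωyDy/2)| equals 4 at DC and 0 everywhere else:

```python
import numpy as np

f = np.ones((2, 2))
F = np.fft.fft2(f)      # samples the 2-D transform at omega*D in {0, pi}
mag = np.abs(F)
# From the derivation: 4 at (0, 0), and 0 wherever either axis hits pi.
expected = np.array([[4.0, 0.0],
                     [0.0, 0.0]])
```

All the energy of a constant image sits in the DC bin, as the cosine product predicts.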

Figure 70: Magnitude of the 2-D Fourier transform of the sequence f[m, n] = [1 1; 1 1].

6.3.3 The importance of amplitude and phase information


An interesting point to make about images, compared to speech, is the importance of the phase and amplitude information.

Figure 71: An image, the log of the magnitude of the Fourier transform and the phase.

Figure 71 shows an image and its Fourier transform. Figure 72 shows the image reconstructed using just amplitude information or just phase information. We see the importance of phase, which is not present in our perception of sound.
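The experiment behind these figures can be sketched in a few lines of NumPy (a random array stands in here for the real image):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((16, 16))   # stand-in for a real greyscale image

F = np.fft.fft2(img)

# Reconstruction from magnitude only (phase forced to zero) ...
mag_only = np.fft.ifft2(np.abs(F)).real
# ... and from phase only (magnitude forced to one).
phase_only = np.fft.ifft2(np.exp(1j * np.angle(F))).real
```

With both pieces intact the inverse transform recovers the image exactly; for natural images, discarding the phase is far more damaging to the visual structure than discarding the magnitude.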

6.4 Image filtering


Just as we performed a 1-D filtering of a signal using convolution with the filter impulse response,

\[
y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k],
\]
Figure 72: Reconstructed image using just magnitude (phase = 0) or just phase information
(magnitude=constant). We see the importance of the phase information.

We can similarly perform image filtering using 2-D convolution with the filter impulse response:

\[
g[m,n] = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} h[u,v]\, f[m-u, n-v].
\]

As with time signals, the resulting Fourier transform is given by:

G̃(ejωx Dx , ejωy Dy ) = H̃(ejωx Dx , ejωy Dy )F̃ (ejωx Dx , ejωy Dy ).

Example: low-pass image filtering


We filter the image, f[m,n], below, using the filter h[m,n], which is a low-pass filter with a DC gain of 4. The resulting image, g[m,n], is smoother and 4 times brighter. Notice g[3,2] = 288. This would be saturated to 255 in practice.

f[m,n]:

        n=0   n=1   n=2   n=3
m=0       8    10    15    16
m=1      10    15    16    20
m=2      10    20    30   100
m=3       3   215    23    37

h[m,n]:

        n=0   n=1
m=0       1     1
m=1       1     1

g[m,n]:

        n=0   n=1   n=2   n=3
m=0       8    18    25    31
m=1      18    43    56    67
m=2      20    55    81   166
m=3      13   248   288   190

• The result is a blurring of the sharp lines and the attenuation of any high frequency

detail.

• Also, we usually normalise the coefficients so the DC gain is unity. Hence we would use

\[
h = \begin{bmatrix} 0.25 & 0.25 \\ 0.25 & 0.25 \end{bmatrix}.
\]
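A direct NumPy transcription of the 2-D convolution sum reproduces the table above (using the unnormalised kernel, so the DC gain of 4 is visible):

```python
import numpy as np

f = np.array([[ 8,  10, 15,  16],
              [10,  15, 16,  20],
              [10,  20, 30, 100],
              [ 3, 215, 23,  37]])
h = np.ones((2, 2), dtype=int)   # unnormalised low-pass kernel, DC gain 4

M, N = f.shape
g = np.zeros_like(f)
for m in range(M):
    for n in range(N):
        for u in range(h.shape[0]):
            for v in range(h.shape[1]):
                # Zero-pad f outside its support.
                if 0 <= m - u < M and 0 <= n - v < N:
                    g[m, n] += h[u, v] * f[m - u, n - v]
```

g[3, 2] comes out as 288, which an 8-bit display would saturate to 255.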

Figure 73: Image before and after low-pass filtering with $h = \begin{bmatrix} 0.25 & 0.25 \\ 0.25 & 0.25 \end{bmatrix}$.

Example: high-pass image filtering


The 3×3 image filter,

\[
h[m,n] = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix},
\]

can be shown to have the magnitude frequency response:

\[
|\tilde{H}(e^{j\omega_x D_x}, e^{j\omega_y D_y})| = 4 - 2\cos(\omega_x D_x) - 2\cos(\omega_y D_y).
\]

This is plotted in Figure 74 over the range $(\omega_x D_x, \omega_y D_y) \in [-\pi, \pi] \times [-\pi, \pi]$.

• This filter has a DC gain of zero.

• It amplifies high frequency components.

• As a result all edges in the image are retained.

• Edge enhancement is usually the first step in image segmentation.

 MPEG-4 supports segmentation of the image into different regions. Fast changing regions are allowed more bits/s.
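The stated frequency response is easy to verify numerically; the sketch below centres the kernel on the origin (so the response is purely real) and assumes $D_x = D_y = 1$:

```python
import numpy as np

h = np.array([[ 0, -1,  0],
              [-1,  4, -1],
              [ 0, -1,  0]])

def freq_resp(wx, wy):
    """Frequency response of the Laplacian kernel centred at (0, 0)."""
    out = 0j
    for m in range(-1, 2):
        for n in range(-1, 2):
            out += h[m + 1, n + 1] * np.exp(-1j * (wx * n + wy * m))
    return out

assert np.isclose(freq_resp(0.0, 0.0).real, 0.0)        # zero DC gain
wx, wy = 1.0, 2.0
assert np.isclose(freq_resp(wx, wy),
                  4 - 2 * np.cos(wx) - 2 * np.cos(wy))  # matches the formula
```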


Figure 74: Magnitude of the transform of the Laplacian operator and the effect on the image after filtering.

6.5 Discrete Cosine Transform


6.5.1 1-D Discrete Cosine Transform
• The Discrete Cosine Transform (DCT) is used for image compression in a number of standards, such as JPEG and MPEG.

• There are two main reasons why it is preferred ahead of the DFT:

1. It returns only real numbers, whereas the DFT returns complex numbers.

2. Most of the energy is concentrated at the lower frequencies.

Consider a 1-D time signal. To obtain real coefficients we imagine that we are only looking at 2N samples from a signal which is symmetric about t = 0.

• If we take the Fourier transform of this signal all the sine components will be zero,

since it is an even function.

• We will be left with only cosine components.

Consider taking 4 samples from the signal f (t) in Figure 75.


Figure 75: f2 (t) is the even extension of the signal f (t).

The discrete-time Fourier transform of f2(t) is given as:

\begin{align*}
\tilde{F}_2(e^{j\omega T}) &= \int_{-\infty}^{\infty} f_2(t)\, e^{-j\omega t}\, dt \\
&= \sum_{n=-4}^{3} f_2\!\left((2n+1)\frac{T}{2}\right) e^{-j\omega(2n+1)\frac{T}{2}} \\
&= \sum_{n=-4}^{3} f_2\!\left((2n+1)\frac{T}{2}\right) \cos\!\left(\omega(2n+1)\frac{T}{2}\right) \quad \text{since the function is even,} \\
&= 2\sum_{n=0}^{3} f(nT)\cos\!\left(\omega(2n+1)\frac{T}{2}\right) \\
&= 2\sum_{n=0}^{3} f[n]\cos\!\left(\frac{\omega T}{2}(2n+1)\right)
\end{align*}

So we see that $\tilde{F}_2(e^{j\omega T})$ is real for all $\omega$. By varying $\omega$ over the range $[0, (\frac{N-1}{N})\frac{\pi}{T}]$ we define the Discrete Cosine Transform of the sequence $f[n]$ as:

\[
F[m] = c(m) \sum_{n=0}^{N-1} f[n] \cos\!\left(\frac{(2n+1)m\pi}{2N}\right) \quad \text{for } m = 0, \ldots, N-1,
\]

where

\[
c(m) = \begin{cases} \dfrac{1}{\sqrt{N}} & \text{for } m = 0 \\[4pt] \dfrac{\sqrt{2}}{\sqrt{N}} & \text{for } m \neq 0, \end{cases}
\]

are normalising constants to ensure that the signal energy stays the same after a transformation and inverse transformation.
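The definition translates directly into code; this NumPy sketch also checks the energy-preserving property of the normalising constants:

```python
import numpy as np

def dct(f):
    """Orthonormal DCT as defined above:
       F[m] = c(m) * sum_n f[n] cos((2n+1) m pi / (2N))."""
    N = len(f)
    n = np.arange(N)
    F = np.array([np.sum(f * np.cos((2 * n + 1) * m * np.pi / (2 * N)))
                  for m in range(N)])
    c = np.full(N, np.sqrt(2.0 / N))   # c(m) = sqrt(2/N) for m != 0 ...
    c[0] = 1.0 / np.sqrt(N)            # ... and 1/sqrt(N) for m = 0
    return c * F

f = np.array([1.0, 2.0, 3.0, 4.0])
F = dct(f)
assert np.isclose(np.sum(f ** 2), np.sum(F ** 2))  # energy preserved
```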

6.5.2 2-D Discrete Cosine Transform


We can define the 2-D DCT in a similar way (using an $N \times N$ image) as

\[
F(u,v) = c(u)\,c(v) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f[m,n] \cos\!\left(\frac{(2m+1)u\pi}{2N}\right) \cos\!\left(\frac{(2n+1)v\pi}{2N}\right)
\]

for

\[
u, v = 0, \ldots, N-1
\]

and

\[
c(m) = \begin{cases} \dfrac{1}{\sqrt{N}} & \text{for } m = 0 \\[4pt] \dfrac{\sqrt{2}}{\sqrt{N}} & \text{for } m \neq 0. \end{cases}
\]

• In effect we are correlating the image with $N^2$ basis functions:

\[
e_{u,v}[m,n] = \cos\!\left(\frac{(2m+1)u\pi}{2N}\right) \cos\!\left(\frac{(2n+1)v\pi}{2N}\right).
\]

• Figure 76 shows the 64 basis functions used to transform an 8×8 image.

6.5.3 Image compression


• Suppose we take the low resolution 32 × 32 pixel image of Figure 77.

• This image has 32 × 32 = 1024 pixels, and therefore there are 1024 DCT coefficients.

• We notice that most of the energy in the coefficients is concentrated around the lower values of u and v.

• Let's just take the 100 coefficients corresponding to 0 ≤ u, v < 10.

• We see that while we are using 10 times fewer bits to represent the image, it is still recognisably similar to the original.
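Because the 2-D DCT is separable, it can be written as $F = C f C^T$ for an orthonormal DCT matrix $C$, and the keep-the-low-corner idea above can be sketched as follows (a random array stands in for the 32×32 image):

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT matrix: C[m, n] = c(m) cos((2n+1) m pi / (2N))."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None]
                                  * np.pi / (2 * N))
    C[0, :] = 1.0 / np.sqrt(N)
    return C

N = 32
rng = np.random.default_rng(1)
img = rng.random((N, N))        # stand-in for the 32x32 image

C = dct_matrix(N)
F = C @ img @ C.T               # 2-D DCT via two 1-D transforms
F_kept = np.zeros_like(F)
F_kept[:10, :10] = F[:10, :10]  # keep the 100 coefficients 0 <= u, v < 10
approx = C.T @ F_kept @ C       # inverse 2-D DCT
```

For a real image the low-frequency corner carries most of the energy, so `approx` stays recognisably close to `img`; for this random stand-in the error is of course larger.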

6.6 JPEG
• The DCT forms the main building block of the JPEG (Joint Photographic Experts Group) compression standard.

 This is not to be confused with JPEG2000, which uses a wavelet transform and operates on the entire image at once.

Figure 76: 64 basis functions for N = 8. For example the basis for (u, v) = (0, 0) is at the
top left.

Figure 77: 32 × 32 image and a compressed version using just 100 of the DCT coecients.

• The JPEG standard breaks the image into blocks of 8 × 8 pixels and performs a DCT
on each block.

• The majority of the high frequency coefficients are effectively thrown away.

• Figure 78 shows an enlarged picture of a JPEG encoded image. If you look closely

you can see the 8×8 blocks.
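A minimal sketch of this block structure (not the full JPEG pipeline, which also quantises and entropy-codes the coefficients) transforms each 8×8 block, zeroes the high-frequency coefficients, and inverts:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT matrix used to transform each block."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None]
                                  * np.pi / (2 * N))
    C[0, :] = 1.0 / np.sqrt(N)
    return C

C8 = dct_matrix(8)
rng = np.random.default_rng(2)
img = rng.random((32, 32))      # dimensions assumed to be multiples of 8

out = np.zeros_like(img)
keep = 4                        # keep only the 4x4 low-frequency corner per block
for i in range(0, img.shape[0], 8):
    for j in range(0, img.shape[1], 8):
        B = C8 @ img[i:i + 8, j:j + 8] @ C8.T  # 8x8 block DCT
        B[keep:, :] = 0.0                      # discard high-frequency rows...
        B[:, keep:] = 0.0                      # ...and columns
        out[i:i + 8, j:j + 8] = C8.T @ B @ C8  # inverse DCT of the block
```

The blocky artefacts visible in Figure 78 arise precisely because each 8×8 block is approximated independently.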

111
Figure 78: Enlarged JPEG encoded image. The 8×8 blocks are just about evident.

112
