
Applications of Digital Signal Processing

Stephen Redmond, Room 146, Ste.Redmond@gmail.com

Course Description
Builds on the fundamentals of Digital Signal Processing to show specific examples of signal

processing algorithms in real-world applications.

Content
24 lectures & associated homeworks and projects.

Texts
• Discrete-Time Signal Processing (2nd Ed.), Oppenheim and Schafer, Prentice-Hall,

1999.

• Digital Processing of Speech Signals, Rabiner and Schafer, Prentice-Hall, 1978.

• Statistical Digital Signal Processing and Modelling, M. Hayes, Wiley, 1996.

Syllabus
1. Brief review of DSP

Nyquist's Sampling Theorem, Discrete-Time Fourier Transform, z-transform, Inverse
z-transform.

2. Properties and design of filters

3. Adaptive filtering

4. Spectral analysis

5. Speech processing

6. DSP implementation issues

Marking
• 15% - Three homework assignments, one every 3 weeks (after every 6 lectures), with

5% for each assignment.

• 15% - For a Matlab-based practical assignment due before the Easter break. A choice

of three topics covered in the course will be available.

• 70% - For the final exam. 5 questions, attempt 4.

1 Brief review of DSP fundamentals
1.1 Nyquist's Sampling Theorem
The ideal sampling of a signal, f(t), is the same as multiplying it by an impulse train,

δ_T(t) = Σ_{n=−∞}^{∞} δ(t − nT). The resulting signal is (c.f. Figure 1)

f̄(t) = f(t) δ_T(t)

f̄(t) = Σ_{n=−∞}^{∞} f(nT) δ(t − nT).

Figure 1: Ideal sampling process.

Note that δ_T is periodic and so can be represented by a Fourier series,

δ_T(t) = Σ_{n=−∞}^{∞} c_n e^{jnω₀t},

where,

c_n = (ω₀/2π) ∫_{−π/ω₀}^{π/ω₀} δ_T(t) e^{−jnω₀t} dt = 1/T.   (Note: T/2 = π/ω₀.)

Therefore,

f̄(t) = (1/T) Σ_{n=−∞}^{∞} f(t) e^{jnω₀t}.

Taking the Laplace transform of both sides gives

F̄(s) = (1/T) Σ_{n=−∞}^{∞} F(s − jnω₀),

which gives the frequency spectrum of the ideally sampled signal as

F̄(jω) = (1/T) Σ_{n=−∞}^{∞} F(j(ω − nω₀)).

Therefore the spectrum of f̄(t) consists of an infinite number of copies of the spectrum of

f(t), shifted to be centred on the multiples of the sampling frequency and scaled by the

amount 1/T (c.f. Figure 2).

If the highest frequency in f(t) is less than ω₀/2 then the signal may be perfectly re-
constructed, given an ideal analog lowpass filter. This is Nyquist's sampling theorem for

baseband signals. We can also show for bandpass signals that the signal only needs to be

sampled at a rate greater than twice the bandwidth of the signal.

Figure 2: Ideal sampling spectrum. The spectrum is scaled by 1/T and copied to multiples
of ω₀. An ideal analog reconstruction filter is shown (dashed).

1.2 Discrete-Time Fourier Transform (DTFT)
The continuous-time Fourier Transform of a signal, f(t), is defined as

F(jω) = ∫_{−∞}^{∞} f(t) e^{−jωt} dt.

If f(t) is sampled by impulses, then we are working with the signal f̄(t):

f̄(t) = Σ_{n=−∞}^{∞} f(nT) δ(t − nT).

Taking the Fourier Transform of this gives

F̄(jω) = ∫_{−∞}^{∞} f̄(t) e^{−jωt} dt
       = ∫_{−∞}^{∞} Σ_{n=−∞}^{∞} f(nT) δ(t − nT) e^{−jωt} dt
       = Σ_{n=−∞}^{∞} f(nT) ∫_{−∞}^{∞} δ(t − nT) e^{−jωt} dt
       = Σ_{n=−∞}^{∞} f(nT) e^{−jnωT}.

F̄(jω) denotes the spectrum of the sampled version of f(t). The transform is called the

Discrete-Time Fourier Transform (DTFT). Writing it in terms of its real and imaginary parts,

F̄(jω) = α(ω) + jβ(ω),

|F̄(jω)| = √(α²(ω) + β²(ω)),

∠F̄(jω) = arctan( β(ω) / α(ω) ).

1.3 z-transform
Let f[n] be a sequence obtained by sampling the signal f(t) every T seconds (i.e. at

t = nT). The z-transform of a sequence, f[n], is defined as:

F̃(z) = Σ_{n=−∞}^{∞} f[n] z^{−n}.

z is a complex variable. We note that if z = e^{jωT} then this transform is identical to the

DTFT (i.e., F̄(jω) = F̃(e^{jωT})). Also note that as ω varies, z = e^{jωT} traces out the locus of

a unit circle in the complex plane, with its centre at z = 0.

Figure 3: Unit circle in the z-plane traced out by the function z = e^{jωT}.

Example: Find the DTFT magnitude and phase given H̃(z)
The transfer function of a digital filter is given by the z-function, H̃(z); find the magnitude
and phase response.

H̃(z) = 1 + z^{−1}
H̃(e^{jωT}) = 1 + e^{−jωT}
           = 1 + cos(ωT) − j sin(ωT)

|H̃(e^{jωT})| = √( (1 + cos(ωT))² + sin²(ωT) )
             = √2 √(1 + cos(ωT))

∠H̃(e^{jωT}) = arctan( −sin(ωT) / (1 + cos(ωT)) )
            = arctan( −sin(2[ωT/2]) / (1 + cos(2[ωT/2])) )
            = arctan( −2 sin(ωT/2) cos(ωT/2) / (2 cos²(ωT/2)) )
            = arctan( −sin(ωT/2) / cos(ωT/2) )
            = −ωT/2
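The closed forms above are easy to verify numerically. The sketch below is an illustrative addition (not part of the original notes): it evaluates H̃(e^{jωT}) directly with NumPy and compares against the derived magnitude and phase.

```python
import numpy as np

# Evaluate H(e^{jwT}) = 1 + e^{-jwT} directly and compare with the closed
# forms derived above: |H| = sqrt(2)*sqrt(1 + cos(wT)), angle(H) = -wT/2.
wT = np.linspace(0.01, np.pi - 0.01, 500)   # avoid wT = pi, where H = 0
H = 1 + np.exp(-1j * wT)

mag_closed = np.sqrt(2) * np.sqrt(1 + np.cos(wT))
phase_closed = -wT / 2
```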

1.4 Inverse z-transform
Official version:

f[n] = (1/2πj) ∮_C F̃(z) z^{n−1} dz,

where C is an appropriately chosen contour in the region of convergence of the z-plane.
However, in practice there are many simple tricks for finding the inverse z-transform.

Example: Find the inverse z-transform using partial fractions

The impulse response of a digital filter has the following z-transform, H̃(z); find the impulse
response of the filter.

H̃(z) = 1 / ( z(z − 1)(2z − 1) )

Using partial fraction expansion we get:

H̃(z) = z^{−1} ( 1 + z/(z − 1) − 2z/(z − 0.5) )

Using the formula tables we find the inverse transform is:

h[n] = δ(n − 1) + u(n − 1) − 2(0.5)^{n−1} u(n − 1),

Given a function, H̃(z), the sequence obtained by taking the inverse z-transform, h[n],
is bounded only if the poles of the function lie inside the unit circle. If the function

H̃(z) is the transfer function of a filter, this sequence h[n] is called the impulse

response.
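The long-division route mentioned above can be mechanised. The following sketch is an illustrative addition (not from the notes): it rewrites H̃(z) = 1/(z(z−1)(2z−1)) = z^{−3}/(2 − 3z^{−1} + z^{−2}) as a difference equation and drives it with a unit impulse.

```python
# H(z) = 1/(z(z-1)(2z-1)) = z^{-3} / (2 - 3z^{-1} + z^{-2}); implement the
# corresponding difference equation and drive it with a unit impulse.
b = [0.0, 0.0, 0.0, 1.0]      # numerator coefficients of z^{-k}
a = [2.0, -3.0, 1.0]          # denominator coefficients of z^{-k}
x = [1.0] + [0.0] * 9         # unit impulse
h = []
for n in range(len(x)):
    acc = sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)
    acc -= sum(a[k] * h[n - k] for k in range(1, len(a)) if n - k >= 0)
    h.append(acc / a[0])
# h begins [0, 0, 0, 0.5, 0.75, 0.875, ...], matching the partial fractions.
```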

2 Properties and Design of Filters
2.1 Frequency Transfer Functions
The frequency response of a linear digital filter may be represented by the transfer function

H̃(z). Suppose we know the z-transform of the input signal, x[n], is X̃(z). Therefore, we

can find the output of the filter, y[n], since its z-transform is defined as

Ỹ(z) = H̃(z) X̃(z).

The transfer function is usually given as, or can be reduced to, the ratio of two polynomials.

For example,

H̃(z) = B̃(z)/Ã(z) = (b₀ + b₁z^{−1} + b₂z^{−2} + ... + b_M z^{−M}) / (a₀ + a₁z^{−1} + a₂z^{−2} + ... + a_N z^{−N}) = Ỹ(z)/X̃(z).
Rearranging we get

a₀Ỹ(z) + a₁Ỹ(z)z^{−1} + ... + a_N Ỹ(z)z^{−N} = b₀X̃(z) + b₁X̃(z)z^{−1} + ... + b_M X̃(z)z^{−M}.

Taking the inverse z-transform gives

a₀y[n] + a₁y[n − 1] + ... + a_N y[n − N] = b₀x[n] + b₁x[n − 1] + ... + b_M x[n − M].

Rearranging once more gives a linear constant coefficient difference equation which can be

used to directly implement the filter,

y[n] = (1/a₀) [ (b₀x[n] + b₁x[n − 1] + ... + b_M x[n − M]) − (a₁y[n − 1] + ... + a_N y[n − N]) ].

In a finite impulse response (FIR) filter all aᵢ = 0 for i > 0, so there is no feedback. FIR

filters are always stable. An infinite impulse response (IIR) filter is unstable if the poles

(where the denominator is zero) are outside the unit circle defined by |z| = 1, i.e., z = e^{jωT}.
We will see later that, when using finite precision arithmetic, it is sometimes difficult to

ensure that the poles are inside the unit circle.
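Since the poles are the roots of a₀z^N + a₁z^{N−1} + ... + a_N, the stability condition is easy to test numerically. A minimal sketch, added here for illustration (not from the notes):

```python
import numpy as np

def is_stable(a):
    """True if the IIR filter with denominator a[0] + a[1]z^{-1} + ... +
    a[N]z^{-N} has all of its poles strictly inside the unit circle."""
    poles = np.roots(a)      # roots of a[0]z^N + a[1]z^{N-1} + ... + a[N]
    return bool(np.all(np.abs(poles) < 1.0))
```

For example, `is_stable([1.0, -0.5])` is True (single pole at z = 0.5), while `is_stable([1.0, -2.0, 2.0])` is False (poles at 1 ± j, outside the unit circle).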

2.2 Pole-Zero plots
Another way to examine the transfer function is to factorise the numerator and denominator

and see where the poles and zeros lie in the complex z-plane. For example we would

factorise the transfer function as follows:

H̃(z) = B̃(z)/Ã(z) = (b₀ + b₁z^{−1} + b₂z^{−2} + ... + b_M z^{−M}) / (a₀ + a₁z^{−1} + a₂z^{−2} + ... + a_N z^{−N}) = Π_{i=1}^{M} (1 − zᵢ z^{−1}) / Π_{i=1}^{N} (1 − pᵢ z^{−1}).

We can manipulate the equation into the following form:

H̃(z) = ( z^{−M} Π_{i=1}^{M} (z − zᵢ) ) / ( z^{−N} Π_{i=1}^{N} (z − pᵢ) ).

Now if we want to know the steady state frequency response of the filter we set z = e^{jωT},

H̃(e^{jωT}) = ( e^{−jωTM} Π_{i=1}^{M} (e^{jωT} − zᵢ) ) / ( e^{−jωTN} Π_{i=1}^{N} (e^{jωT} − pᵢ) ).

Next find the magnitude spectrum, |H̄(jω)| = |H̃(e^{jωT})| (remember |e^{jθ}| = 1):

|H̃(e^{jωT})| = ( |e^{−jωT}|^M Π_{i=1}^{M} |e^{jωT} − zᵢ| ) / ( |e^{−jωT}|^N Π_{i=1}^{N} |e^{jωT} − pᵢ| ) = Π_{i=1}^{M} |e^{jωT} − zᵢ| / Π_{i=1}^{N} |e^{jωT} − pᵢ|.

In words, the magnitude response of the filter at a frequency ω is the product of the

distances from e^{jωT} to each of the zeros, divided by the product of the distances from e^{jωT}
to each of the poles.

Example: Sketch the magnitude response of h[n] = [1, −0.25, −0.125]
Use a pole-zero plot to sketch the magnitude of the frequency response of the filter whose

impulse response is h[n] = [1, −0.25, −0.125]. The z-transform of h[n] is

H̃(z) = 1 − 0.25z^{−1} − 0.125z^{−2}.

Factorising this gives

H̃(z) = z^{−2}(z − 0.5)(z + 0.25).

Hence, the frequency response magnitude is given by

|H̃(e^{jωT})| = |e^{jωT} − 0.5| |e^{jωT} + 0.25|
             = L₁·L₂

Figure 4: An example of a pole-zero plot with two real zeros at z = −0.25 and z = 0.5.
The magnitude of the frequency response at z = e^{jωT} is the product of L₁ and L₂.

We can sketch the magnitude response by determining L₁ and L₂ at various locations

on the unit circle. Figure 5 shows a plot of the magnitude response of the filter, h[n] = [1, −0.25, −0.125].

ωT = 0:   L₁ = 1/2,  L₂ = 5/4  ⇒  |H̃(e^{jωT})| = 5/8

ωT = π/2: L₁ = √((1/2)² + 1²) = √5/2,  L₂ = √((1/4)² + 1²) = √17/4  ⇒  |H̃(e^{jωT})| = √85/8

ωT = π:   L₁ = 3/2,  L₂ = 3/4  ⇒  |H̃(e^{jωT})| = 9/8
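These spot values can be checked by computing the product of distances directly. An illustrative sketch (not part of the notes):

```python
import numpy as np

# Product of distances from e^{jwT} to the zeros at z = 0.5 and z = -0.25
# (the double pole at the origin contributes |e^{jwT}|^2 = 1).
zeros = [0.5, -0.25]

def mag(wT):
    z = np.exp(1j * wT)
    return float(np.prod([abs(z - z0) for z0 in zeros]))
```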

Figure 5: The magnitude of the frequency response of the filter H̃(z) = 1 − 0.25z^{−1} − 0.125z^{−2}.
It is sometimes intuitively helpful to think of the zeros as depressions in the z-plane.
The magnitude of the frequency response is then the height of the absolute value of the

function over the z-plane (c.f. Figure 6).

Figure 6: The log-magnitude of the function H̃(z) = 1 − 0.25z^{−1} − 0.125z^{−2} evaluated
over the z-plane. Also shown is the unit circle, z = e^{jωT}. We see that the zeros form
depressions in the surface.

Example: Place a spectral null at ωT = 2π/3

We need a zero on the unit circle at z₁ = e^{j2π/3}. However, if we want the filter coefficients

to be real we must put a zero at the complex conjugate position, z₂ = e^{j4π/3}. This gives the

following transfer function:

H̃(z) = (z − e^{j2π/3})(z − e^{j4π/3})

|H̃(e^{jωT})| = |e^{jωT} − e^{j2π/3}| |e^{jωT} − e^{j4π/3}| = L₁L₂

or

H̃(z) = 1 + z^{−1} + z^{−2}  ⇒  h[n] = [1, 1, 1]

The pole-zero plot for this function is shown in Figure 7.

Figure 7: The pole-zero plot of the transfer function H̃(z) = 1 + z^{−1} + z^{−2}. Also shown is
the magnitude frequency response, |H̃(e^{jωT})|.

This is essentially a low pass filter. We would have expected as much since the output
of the filter, y[n], is the sum of the present input and two previous inputs:

y[n] = x[n] + x[n − 1] + x[n − 2].



Again we can imagine the zeros as depressions in the surface of the function H̃(z), as is

shown in Figure 8. Note that any component in the input signal at which ωT = 2π/3 will be
completely removed. The filter is said to have a notch at ωT = 2π/3. However, the notch
may be considered unacceptably wide, since it greatly attenuates nearby frequencies. We

can remedy this by introducing a pole nearby.

Figure 8: The log-magnitude of the function H̃(z) = 1 + z^{−1} + z^{−2} evaluated over the
z-plane. Also shown is the unit circle, z = e^{jωT}. We see that the zeros lie on the unit
circle.

Example: Implementing an ad-hoc notch filter at ωT = 2π/3

Suppose we want to augment the filter above by adding a pole. We must keep the pole

inside the unit circle, and we want to place it as near as possible to the zero. Let's place

it at z = p₁ = 0.95e^{j2π/3}. But we must also place one at the complex conjugate position to

keep the filter coefficients real: z = p₂ = 0.95e^{j4π/3}. Now we have a new filter,

H̃(z) = (z − e^{j2π/3})(z − e^{j4π/3}) / ( (z − 0.95e^{j2π/3})(z − 0.95e^{j4π/3}) )
     = (1 + z^{−1} + z^{−2}) / (1 + (0.95)z^{−1} + (0.95)²z^{−2})


|H̃(e^{jωT})| = |e^{jωT} − e^{j2π/3}| |e^{jωT} − e^{j4π/3}| / ( |e^{jωT} − 0.95e^{j2π/3}| |e^{jωT} − 0.95e^{j4π/3}| )
             = L₁L₂ / (M₁M₂).

Notice that when z is far away from a pole-zero pair, the absolute value of their

ratio, let's call it R̃(z), is approximately unity, i.e.,

|R̃(z)| = |z − e^{j2π/3}| / |z − 0.95e^{j2π/3}| ≈ 1

for z far away from e^{j2π/3}. This is exactly what we want: a zero at the specified frequency

and approximately unity gain elsewhere. We can visualise this using the pole-zero plot in

Figure 9. Figure 10 shows the function H̃(z) evaluated over the z-plane. Again, we find

the magnitude of the frequency spectrum, |H̃(e^{jωT})|, by circumnavigating the unit circle,

z = e^{jωT}. A plot of |H̃(e^{jωT})| is shown in Figure 11.


We have seen how moving zeros and poles around the z-plane allows us to approximately
design filters with a desired frequency response. Iterative computer-based techniques do

this when trying to design a filter with an arbitrary frequency response: they make a

guess and then move the poles and zeros to improve that guess. One famous algorithm is

the Remez algorithm.
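The ad-hoc notch above is easy to check numerically. The sketch below is an illustrative addition (not from the notes): it evaluates the ratio of the two polynomials around the unit circle and confirms a null at ωT = 2π/3 with roughly unity gain elsewhere.

```python
import numpy as np

# Notch filter H(z) = (1 + z^{-1} + z^{-2}) / (1 + 0.95z^{-1} + 0.95^2 z^{-2})
b = np.array([1.0, 1.0, 1.0])           # numerator coefficients of z^{-k}
a = np.array([1.0, 0.95, 0.95 ** 2])    # denominator coefficients of z^{-k}

def H(wT):
    zinv = np.exp(-1j * wT)
    # sum_k c[k] * zinv^k, evaluated with polyval on reversed coefficients
    return np.polyval(b[::-1], zinv) / np.polyval(a[::-1], zinv)
```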

Figure 9: Pole-zero plot for a notch filter at ωT = 2π/3.

Figure 10: The log-magnitude of the function H̃(z) = (1 + z^{−1} + z^{−2}) / (1 + 0.95z^{−1} + (0.95)²z^{−2}) evaluated over
the z-plane. Also shown is the unit circle, z = e^{jωT}. We see that the poles 'pull' the
surface back up in the vicinity of the zeros.

Figure 11: The magnitude spectrum, |H̃(e^{jωT})|, of the notch filter.

2.3 All-pass systems

Consider the following filter

Ã(z) = −(1/p) (1 − pz^{−1}) / (1 − (1/p*)z^{−1}).

This filter has a zero at z = p and a pole at z = 1/p*, where * denotes the complex conjugate.
We can examine what the magnitude of the frequency response is:

Ã(z) = (−1/p + z^{−1}) / (1 − (1/p*)z^{−1})

Ã(e^{jωT}) = (−1/p + e^{−jωT}) / (1 − (1/p*)e^{−jωT})
           = e^{−jωT} (1 − (1/p)e^{jωT}) / (1 − (1/p*)e^{−jωT})
           = e^{−jωT} (b / b*),

where b = 1 − (1/p)e^{jωT}, so

|Ã(e^{jωT})| = 1.

The magnitude response of the filter at all frequencies is unity. So what? Well, suppose

we have designed a filter which has a pole, p, outside the unit circle. This filter will be

unstable. However, the allpass filter removes that pole by placing a zero there, and a new

pole is placed at the position 1/p* (the distance from the origin is now the inverse of the

original distance). There is also a −1/p factor to ensure the gain is one. In this way we can make any

IIR filter stable while keeping the same magnitude response. Unfortunately the phase is altered.

Example:
Make the filter with the following transfer function stable:

H̃(z) = 1 / ( (1 − (1 − j)z^{−1}) (1 − (1 + j)z^{−1}) ) = 1 / (1 − 2z^{−1} + 2z^{−2}).

Figure 12: The magnitude and phase response of the unstable filter H̃(z) =
1/((1 − (1 − j)z^{−1})(1 − (1 + j)z^{−1})) = 1/(1 − 2z^{−1} + 2z^{−2}).

This filter has poles at z = (1 − j) and z = (1 + j). We transform one pole at a time

by multiplying by an allpass filter. The new transfer function is

H̃_Stable(z) = ( −1/(1 − j) ) · 1/(1 − (1/(1+j))z^{−1}) · ( −1/(1 + j) ) · 1/(1 − (1/(1−j))z^{−1})
             = (1/2) · 1 / (1 − 2(0.5)z^{−1} + (1/2)z^{−2})
             = 1 / (2 − 2z^{−1} + z^{−2}).

This stable filter has poles at z = 0.5 ± j0.5. Since they are inside the unit circle the filter

is stable.
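A numerical check, added here for illustration (not part of the notes), that the all-pass trick preserved the magnitude response while moving the poles inside the unit circle:

```python
import numpy as np

# Compare |H(e^{jwT})| for the unstable and stabilised transfer functions.
wT = np.linspace(0.0, np.pi, 256)
zinv = np.exp(-1j * wT)
H_unstable = 1.0 / (1 - 2 * zinv + 2 * zinv ** 2)
H_stable = 1.0 / (2 - 2 * zinv + zinv ** 2)

poles_stable = np.roots([2.0, -2.0, 1.0])   # roots of 2z^2 - 2z + 1
```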

Figure 13: The magnitude and phase response of the stabilised filter H̃_Stable(z) =
1/(2 − 2z^{−1} + z^{−2}). The magnitude response is the same but the phase response is different.

2.4 Design of IIR filters from analog prototypes

We can design filters in the digital domain if we want. However, there is a huge literature

on analog filter design, so we can steal some of their ideas and make them work in the

digital domain. Before we see how to make them work in the digital domain, let's revise

how to build an analog filter to a given specification.

2.4.1 Analog Butterworth filter

Take, for example, the power transfer function of an nth order lowpass Butterworth filter,

|H(jω)|² = 1 / ( 1 + (ω/ωc)^{2n} ).

We are usually given a power transfer specification which we need to meet using this

transfer function.

Hence, we must choose ωc and n to meet the requirements. Here's how...

1 / ( 1 + (ωp/ωc)^{2n} ) ≥ Gp

(ωp/ωc)^{2n} ≤ 1/Gp − 1    (1)
Figure 14: The typical specification of a Butterworth lowpass analog filter.

and

1 / ( 1 + (ωs/ωc)^{2n} ) ≤ Gs

(ωs/ωc)^{2n} ≥ 1/Gs − 1    (2)

If we eliminate ωc from Equations 1 and 2 and solve for n we get

n ≥ [ log(1/Gs − 1) − log(1/Gp − 1) ] / [ 2 log(ωs/ωp) ].

Resolving Equations 1 and 2 for ωc gives

ωp / (1/Gp − 1)^{1/2n} ≤ ωc ≤ ωs / (1/Gs − 1)^{1/2n}.

Example: Design an analog Butterworth to meet the spec

ωp = 0.726,  Gp = 0.8
ωs = 1.376,  Gs = 10^{−2}

We start by choosing n:

n ≥ [ log(99) − log(0.25) ] / [ 2 log(1.895) ] = 4.678

Hence we choose n = 5. Now choose an ωc:

ωc ≥ ωp / (1/Gp − 1)^{1/2n} = 0.726 / (0.25)^{1/10} = 0.8339

ωc ≤ ωs / (1/Gs − 1)^{1/2n} = 1.376 / (99)^{1/10} = 0.869

So we choose ωc = 0.85 radians/second.

We look up the filter tables and see that the transfer function for a 5th order analog

Butterworth lowpass filter with cutoff ωc is

H(s) = 1 / [ (s/ωc)⁵ + 3.2361(s/ωc)⁴ + 5.2361(s/ωc)³ + 5.2361(s/ωc)² + 3.2361(s/ωc) + 1 ].
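The arithmetic in this example can be reproduced straight from the design formulas. An illustrative sketch (not from the notes):

```python
import math

# Butterworth design formulas applied to the worked specification.
wp, Gp = 0.726, 0.8
ws, Gs = 1.376, 1e-2

n_min = (math.log10(1 / Gs - 1) - math.log10(1 / Gp - 1)) / (2 * math.log10(ws / wp))
n = math.ceil(n_min)                           # n_min is about 4.678, so n = 5
wc_lo = wp / (1 / Gp - 1) ** (1 / (2 * n))     # lower bound on wc, about 0.834
wc_hi = ws / (1 / Gs - 1) ** (1 / (2 * n))     # upper bound on wc, about 0.869
```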

2.4.2 Bilinear transform
We have just revised how to design an analog filter (choose the parameters) given a specifi-
cation. But how do we use this design technique to build a digital filter? One commonly

used technique is the Bilinear Transform. We take the transfer function of an analog filter,

H_A(s) = Π_{i=1}^{M} (s − zᵢ) / Π_{i=1}^{N} (s − pᵢ),

and we apply the following transform,

s → (1 − z^{−1}) / (1 + z^{−1}),

which will give us the digital filter

H̃_D(z) = H_A( (1 − z^{−1}) / (1 + z^{−1}) ).

The frequency response of the analog filter, H_A(s), is given by setting s = jωA; the
frequency response of the digital filter, H̃_D(z), is given by setting z = e^{jωD T} (ωA denotes

analog frequency, ωD denotes digital frequency). So the digital and analog filters have the

same frequency response when

s = (1 − z^{−1}) / (1 + z^{−1})

jωA = (1 − e^{−jωD T}) / (1 + e^{−jωD T})
    = tanh( jωD T / 2 )
    = j tan( ωD T / 2 )

ωA = tan( ωD T / 2 )

This squeezes the entire frequency range of the analog filter, ωA, into the range [0, π] of the
normalised digital frequency ωD T. So if we are given the design specification of a digital
filter we:

1. prewarp the specified ωD frequencies to get the specs for the analog filter: ωA = tan(ωD T / 2);

2. design the analog filter to obtain its transfer function, H_A(s);

3. substitute s = (1 − z^{−1}) / (1 + z^{−1}).

Example: Design a 5th order lowpass Butterworth digital filter using the Bilinear
Transform
The sampling time is T = 10^{−3} seconds. The digital filter specification is

ωDp = 2π(200),  GDp = 0.8
ωDs = 2π(300),  GDs = 10^{−2}

Prewarping gives

ωAp = tan( 2(200)π(10^{−3}) / 2 ) = 0.726,  GAp = 0.8
ωAs = tan( 2(300)π(10^{−3}) / 2 ) = 1.376,  GAs = 10^{−2}

These are the same specifications as for the Butterworth filter we designed earlier, with n = 5
and ωc = 0.85. We look up the filter tables and see that the transfer function for a 5th
order analog Butterworth lowpass filter is

H(s) = 1 / [ (s/ωc)⁵ + 3.2361(s/ωc)⁴ + 5.2361(s/ωc)³ + 5.2361(s/ωc)² + 3.2361(s/ωc) + 1 ].

Performing the Bilinear Transformation gives us the digital Butterworth filter

H̃(z) = 1 / [ ( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) )⁵ + 3.2361( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) )⁴ + 5.2361( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) )³ + 5.2361( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) )² + 3.2361( (1/0.85)·(1 − z^{−1})/(1 + z^{−1}) ) + 1 ].

From here it is trivial (if a little soul-destroying) to determine the filter coefficients. Hint:
multiply top and bottom by (0.85)⁵(1 + z^{−1})⁵ and then thresh it out. Figure 15 shows the

frequency response of the lowpass Butterworth digital filter.
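Rather than expanding the polynomial, the critical gains can be checked through the prewarping identity: at z = e^{jωD T} the bilinear substitution gives s = j tan(ωD T/2), so the digital magnitude equals the analog Butterworth magnitude at ωA = tan(ωD T/2). A sketch of this check (an illustrative addition, not from the notes):

```python
import math

# Digital magnitude of the bilinear-transformed Butterworth filter, read off
# the analog power formula at the prewarped frequency wA = tan(wD*T/2).
n, wc, T = 5, 0.85, 1e-3

def mag_digital(wD):
    wA = math.tan(wD * T / 2)
    return 1.0 / math.sqrt(1 + (wA / wc) ** (2 * n))

gp = mag_digital(2 * math.pi * 200)   # passband edge, about 0.91
gs = mag_digital(2 * math.pi * 300)   # stopband edge, about 0.09
```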

Figure 15: The magnitude of the frequency response of a 5th order digital Butterworth
filter constructed using the bilinear transform. Notice that the response is zero at ωT = π.
Marked are the pass and stop band critical frequencies ωDp and ωDs (magnitude 0.909 at
ωT ≈ 0.400π and 0.087 at ωT ≈ 0.601π), which occur at f = 200 Hz and f = 300 Hz
respectively when the sample time is T = 10^{−3} s.

2.5 Reconstruction filtering

Once we are finished filtering our digital signals, how do we get the sampled signal back to

the analog world?

• In an ideal world we would have a train of impulses, which represent the sampled

signal, and we would pass the impulse train through an ideal lowpass filter.

Figure 16: Ideal signal reconstruction.

• The ideal filter would only pass the portion of the spectrum related to the original

signal and would block all copies of the spectrum which appeared at multiples of the

sampling frequency when the signal was sampled.

There are two problems with this approach:

1. It is impossible, in practice, to produce an impulse train. It is very, very difficult to

even get close to producing a reliable weighted impulse train.

2. Even if we could, it is impossible to make an ideal filter, but this is not a huge

concern since we can make quite good filters.

In practice we use a device called a digital-to-analog converter followed by an analog low-
pass filter. The digital-to-analog converter (DAC) incorporates a zero-order hold mecha-
nism. The impulse response, h(t), of the DAC is

h(t) = 1 when 0 ≤ t < T

h(t) = 0 otherwise

Figure 17: Zero-order hold impulse response.

When the impulse response is convolved with the ideally sampled signal, a staircase approx-
imation of the signal is obtained (c.f. Figure 18).

Figure 18: Zero-order hold signal reconstruction.

From your electronic circuits course you should remember that a DAC can be imple-
mented with an opamp and some resistors (c.f. Figure 19). The staircase occurs because

every T seconds a new sample is read from the computer memory to the input of the DAC.

We would expect that lowpass filtering this staircase approximation might give us a

better reconstruction of the signal. We can justify this assumption by looking at how the

DAC affects the frequency spectrum of the ideally sampled signal.

• The DAC acts as a type of lowpass filter.

Figure 19: A four-bit digital-to-analog converter.

• We can find the transfer function of the DAC by taking the Fourier transform of the

impulse response:

F{h(t)} = H(jω) = ∫_{−∞}^{∞} h(t) e^{−jωt} dt
        = ∫₀ᵀ e^{−jωt} dt
        = [ e^{−jωt} / (−jω) ]_{t=0}^{t=T}
        = (e^{−jωT} − e⁰) / (−jω)
        = ( e^{−jωT/2} / (−jω) ) [ e^{−jωT/2} − e^{jωT/2} ]
        = ( 2e^{−jωT/2} / ω ) [ (e^{jωT/2} − e^{−jωT/2}) / (2j) ]
        = ( 2e^{−jωT/2} / ω ) sin(ωT/2)
        = T e^{−jωT/2} sin(ωT/2) / (ωT/2)

The magnitude response of the DAC is therefore

|H(jω)| = T | sin(ωT/2) / (ωT/2) |.

In the limit, as x → 0, sin(x)/x → 1. For x = ±nπ, sin(x)/x = 0. Therefore the DAC

transfer function has spectral nulls at

ωT/2 = ±nπ
ωT = ±2nπ
ω = ±2nπ/T = ±nωs

Remember ωs is the sampling frequency! But also remember that when the original signal

was ideally sampled there were an infinite number of copies of the signal spectrum placed

at ±nωs and scaled by 1/T. So the DAC has scaled the signal back to its original magnitude
and placed a null at the centre of each copy of the spectrum. Let's sketch the spectrum of

the output of the DAC.

Figure 20: Plots of (a) signal spectrum, (b) spectrum of ideally sampled signal, (c) transfer
function of the DAC and (d) spectrum of the staircase approximation of the signal.

• We can now design an analog lowpass filter to remove the residues at ±nωs and hence

very closely reconstruct the original signal.

• The more over-sampled the signal is, the less sharp the cut-off of the lowpass filter

needs to be, since the space between the copies of the spectra is increased.

• Also, if the signal is oversampled there will be less shaping of the signal spectrum

which we are trying to recover, since the sin(x)/x will be approximately flat over a

larger portion of the spectrum. This agrees with our intuition.
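A quick check of the zero-order-hold rolloff, added for illustration (not from the notes): the magnitude T·|sin(ωT/2)/(ωT/2)| is T at DC and has nulls exactly at the multiples of ωs.

```python
import math

# Zero-order-hold magnitude |H(jw)| = T * |sin(wT/2) / (wT/2)|.
T = 1e-3
ws = 2 * math.pi / T          # sampling frequency in rad/s

def zoh_mag(w):
    x = w * T / 2
    return T if x == 0 else T * abs(math.sin(x) / x)
```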

2.6 Interpolation (Upsampling)
By interpolation we mean increasing the sampling rate by an integer factor. Suppose we

want to increase the sampling rate L times. We will then have L − 1 new samples between

each pair of original samples. How should we do this? There are two possible solutions, one better than

the other...

1. Reconstruct and resample: We can reconstruct an analog signal using a zero-order

hold and a lowpass filter. We can then resample the signal at the higher rate. This is the

ugly approach, since we must go back into the 'analog world' and we are sure to lose

information in the process due to the non-idealities of reconstruction and resampling.

2. Digitally resample and lowpass filter: This is the pretty way of doing it. Let's inves-
tigate how...

When we sampled the signal we obtained a sequence of numbers which represent the

samples (the impulses). What happens if we resample this signal L times faster?

Assume that sampling the signal x(t) gives x[n] = {12, 9, 15}; resampling this signal

L = 4 times faster will give the new signal y[n] = {12, 0, 0, 0, 9, 0, 0, 0, 15} (c.f. Figure 21).

Figure 21: Upsampling example

If the time between the samples of x[n] is T1 then the time between the samples of y[n]
is T2 = T1/L. Hence we may write y[n] as

y[n] = Σ_{k=−∞}^{∞} x[k] δ(n − kL)
     = ... + x[0]δ(n − 0) + x[1]δ(n − L) + x[2]δ(n − 2L) + ...

Let's see what the discrete-time Fourier transform of y[n] looks like:

Ȳ(jω) = Ỹ(e^{jωT2}) = Σ_{n=−∞}^{∞} y[n] e^{−jnωT2}
      = Σ_{k=−∞}^{∞} Σ_{n=−∞}^{∞} x[k] δ(n − kL) e^{−jnωT2}
      = Σ_{k=−∞}^{∞} x[k] [ Σ_{n=−∞}^{∞} δ(n − kL) e^{−jnωT2} ]
      = Σ_{k=−∞}^{∞} x[k] e^{−j(kL)ωT2}
      = X̃(e^{jωT2 L})
      = X̃(e^{jωT1})


• Hence the spectrum of y[n] (which is sampled at ω2 = 2π/T2) is identical to that of x[n].

• Therefore there are copies of X(jω) (the spectrum of the analog signal x(t)) at ±nω1,

scaled by 1/T1.

• But if we had sampled the analog signal x(t) at ω2 rad/s we would only have copies

of X(jω) at ±nω2 = ±nLω1, scaled by 1/T2.

So, we need to remove the copies which are not centred at ±nω2. We do this with a

digital lowpass filter. Each copy can have a footprint on the frequency axis of ±ω1/2.

• So, the filter cutoff frequency needs to be at ω = ω1/2 = ω2/(2L) = 2π/(2LT2) = π/(LT2). The digital
filter has a cutoff at ωT2 = π/L.

The spectrum is still scaled by 1/T1, so if the filter has a gain of L the filtered spectrum will
have a scaling of 1/T2 as required. By filtering y[n] this filter will output an interpolation of

x[n]. We call this interpolated sequence x_int[n].


In summary, to upsample by a factor L:

1. Insert L − 1 zeros between the samples.

2. Lowpass filter with a cutoff at ωT2 = π/L and a gain of L.
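The zero-insertion step and the spectrum identity Ỹ(e^{jωT2}) = X̃(e^{jωT2 L}) can both be checked on the small example above. An illustrative sketch (not from the notes):

```python
import numpy as np

# Zero-insertion upsampling of x[n] = {12, 9, 15} by L = 4, plus a check of
# the DTFT identity Y(e^{jw}) = X(e^{jwL}) (w standing for w*T2 here).
L = 4
x = np.array([12.0, 9.0, 15.0])
y = np.zeros(len(x) * L)
y[::L] = x        # y = [12, 0, 0, 0, 9, 0, 0, 0, 15, 0, 0, 0]

def dtft(sig, w):
    n = np.arange(len(sig))
    return np.sum(sig * np.exp(-1j * w * n))
```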

Figure 22: An example of upsampling with L = 3. Shown is the original signal spectrum,
|X(jω)|, and the spectrum of the sampled and then upsampled signal, |Ỹ(e^{jωT2})|. Also
shown is the lowpass filter used to interpolate between the original samples, |H̃(e^{jωT2})|,
which has a cutoff at ωT2 = π/L. The final upsampled spectrum is shown as |X̃_int(e^{jωT2})|.

2.7 Decimation (Downsampling)
Downsampling involves decreasing the sample rate by integer multiples. Assume we have

sampled a signal x(t) every T1 seconds to get x[n]. When downsampling by a factor of M
we take every Mth sample:

y[n] = x[Mn].

This is equivalent to sampling the original signal every T2 = MT1 seconds (c.f. Figure 23).

Figure 23: Downsampling example.

This results in copies of the analog signal spectrum, X(jω), being shifted to multiples
of ω2 = 2π/T2 = ω1/M and scaled by 1/T2.

• This was what was required. Where's the problem?

• The problem is: if the maximum frequency in X(jω) is greater than ω = ω2/2 then
downsampling will cause aliasing.

This is simply a restatement of the sampling theorem. Before the signal was sampled, all
frequencies greater than ω = ω1/2 were removed using an analog anti-alias filter. Before we
downsample we can remove, from x[n], all remaining frequencies greater than ω = ω2/2 = ω1/(2M)
using a lowpass digital filter, H̃(e^{jωT1}). Hence H̃(e^{jωT1}) has a cutoff at ωT1 = π/M. We call
this filtered version x_lpf[n]. Now, the downsampled sequence is y[n] = x_lpf[Mn].

In summary, to downsample x[n] by a rate M:

1. Lowpass filter x[n] with a cutoff ωT1 = π/M and unity gain to get x_lpf[n].

2. Take every Mth sample from x_lpf[n] to get the downsampled signal, y[n] = x_lpf[Mn].
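A minimal numeric sketch of the two steps (an illustrative addition; the windowed-sinc lowpass below is one simple choice of H̃(e^{jωT1}), not one prescribed by the notes):

```python
import numpy as np

# Downsample by M = 2: windowed-sinc lowpass with cutoff wT1 = pi/M and
# unity DC gain, then keep every Mth sample.
M = 2
rng = np.random.default_rng(0)
x = rng.standard_normal(256)

taps = 31
n = np.arange(taps) - (taps - 1) / 2
h = np.sinc(n / M) * np.hamming(taps)   # ideal pi/M lowpass, Hamming-windowed
h /= h.sum()                            # normalise to unity gain at DC

x_lpf = np.convolve(x, h, mode="same")
y = x_lpf[::M]

# Gain of the anti-alias filter at DC and at wT1 = pi (fully aliased band):
gain_dc = abs(np.sum(h))
gain_pi = abs(np.sum(h * np.exp(-1j * np.pi * np.arange(taps))))
```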

Figure 24: An example of downsampling with M = 2.

3 Optimum and Adaptive filtering
In the 1940s Norbert Wiener conducted fundamental research into the following problem:

given a measured signal, x[n], which is a corrupted version of the desired signal d[n], what

linear filter, w[n], will provide the best estimate of d[n] from the measured values of x[n]?
First we will deal with the FIR time-invariant Wiener filter.

Figure 25: Wiener filter. Desired signal, d[n]. Corrupted signal (after noise and distortion),
x[n]. Estimated signal, d̂[n].

3.1 Time-invariant FIR Wiener filter

Wiener, for mathematical convenience, decided to try and choose the filter coefficients,

w[k], which would minimise the mean-square error between the estimate of the signal,

d̂[n] = Σ_{k=0}^{p−1} w[k]x[n − k], and the desired signal, d[n]. Hence we define the error, e[n], as:

e[n] = d[n] − d̂[n]

e[n] = d[n] − Σ_{k=0}^{p−1} w[k] x[n − k].

Therefore, we can write the mean-squared error, ξ, as

ξ = E{ (e[n])² } = E{ ( d[n] − Σ_{k=0}^{p−1} w[k] x[n − k] )² }.

We wish to minimise this expression with respect to each of the w[i]. Therefore we differ-
entiate to get

∂ξ/∂w[i] = ∂/∂w[i] E{ (e[n])² }.

∂(·)/∂x and E{·} are both linear operators, so their order can be interchanged:

∂ξ/∂w[i] = E{ ∂/∂w[i] (e[n])² }
         = E{ 2e[n] ∂e[n]/∂w[i] }
         = 2E{ e[n] ∂e[n]/∂w[i] }

But e[n] = d[n] − Σ_{k=0}^{p−1} w[k]x[n − k], which when expanded is e[n] = d[n] − w[0]x[n] −
w[1]x[n − 1] − ..., so

∂e[n]/∂w[i] = −x[n − i].

This gives

∂ξ/∂w[i] = 2E{ −e[n]x[n − i] }
         = −2E{ e[n]x[n − i] }

We then minimise ξ by setting ∂ξ/∂w[i] equal to zero for each i = 0, ..., (p − 1):

E{ e[n]x[n − i] } = 0,   i = 0, ..., (p − 1)   (3)

This tells us that the error when trying to recover the signal, d[n], must be uncorrelated
with the measured signal x[n]. If our error was in some way dependent on the input to

the filter, x[n], then we would expect we could remove the dependent part using a better

filter! This is known as the orthogonality principle, or the projection theorem.

We can now substitute e[n] into Equation 3:

E{ ( d[n] − Σ_{k=0}^{p−1} w[k]x[n − k] ) x[n − i] } = 0,   i = 0, ..., (p − 1)

E{ d[n]x[n − i] − x[n − i] Σ_{k=0}^{p−1} w[k]x[n − k] } = 0,   i = 0, ..., (p − 1)

E{ d[n]x[n − i] } − E{ x[n − i] Σ_{k=0}^{p−1} w[k]x[n − k] } = 0,   i = 0, ..., (p − 1)

E{ d[n]x[n − i] } − E{ x[n − i] ( w[0]x[n − 0] + ... + w[p − 1]x[n − p + 1] ) } = 0,   i = 0, ..., (p − 1)

E{ d[n]x[n − i] } − w[0]E{ x[n − i]x[n − 0] } − ... − w[p − 1]E{ x[n − i]x[n − p + 1] } = 0,   i = 0, ..., (p − 1)   (4)

Rearranging Equation 4 gives

E{ d[n]x[n − i] } = Σ_{k=0}^{p−1} w[k] E{ x[n − i]x[n − k] },   i = 0, ..., (p − 1)   (5)

Here we define the autocorrelation of a signal a[n] as

raa[k] = E{ a[n]a[n − k] },

and we define the crosscorrelation of signals a[n] and b[n] as

rab[k] = E{ a[n]b[n − k] }.

Hence Equation 5 becomes

rdx[i] = Σ_{k=0}^{p−1} w[k] rxx[k − i],   i = 0, ..., (p − 1).

These p equations are called the Wiener-Hopf equations, due to their introduction by

Norbert Wiener and Eberhard Hopf whilst working at MIT. It's called a Wiener filter

because Hopf moved to Germany in 1936, when the National Socialist German Workers' Party

was in power, and much of his contribution went unacknowledged. History is written

by the victors!

We can write these equations in matrix form:

[ rxx[0]      rxx[1]   rxx[2]   · · ·  rxx[p − 1] ] [ w[0]     ]   [ rdx[0]     ]
[ rxx[1]      rxx[0]   rxx[1]   · · ·             ] [ w[1]     ]   [ rdx[1]     ]
[ rxx[2]      rxx[1]   rxx[0]   · · ·             ] [ w[2]     ] = [ rdx[2]     ]
[    ⋮           ⋮        ⋮       ⋱        ⋮      ] [    ⋮     ]   [     ⋮      ]
[ rxx[p − 1]   · · ·                   rxx[0]     ] [ w[p − 1] ]   [ rdx[p − 1] ]

which is the matrix form of the Wiener-Hopf equations. Written more compactly in matrix

algebra we have:

Rxx w = rdx.

Rxx is a p × p symmetric Toeplitz matrix and as such is guaranteed to be invertible.
There is an algorithm called the Levinson-Durbin algorithm which efficiently solves these
equations. We will meet this algorithm later in the Speech Processing section.

The optimum filter coefficients, wopt, are therefore:

wopt = Rxx⁻¹ rdx

In order to find these filter coefficients we must estimate the autocorrelation and cross-correlation
statistics. This makes a big assumption: that the statistics are stationary! If
they change, we're in trouble. That's why this is called a time-invariant filter: the
parameters don't change with time.
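As a concrete sketch of this procedure (the function name and the biased sample-correlation estimators are my own choices, not from the notes), the following numpy code estimates rxx and rdx from data and solves the Wiener-Hopf equations with a general linear solver:

```python
import numpy as np

def wiener_filter(x, d, p):
    """Solve the Wiener-Hopf equations, Rxx w = rdx, for an order-p filter.

    The true statistics are replaced by biased sample estimates, which
    assumes x[n] and d[n] are jointly stationary (and ergodic).
    """
    N = len(x)
    # Sample autocorrelation r_xx[k] = E{x[n]x[n-k]}, k = 0..p-1
    rxx = np.array([np.dot(x[: N - k], x[k:]) / N for k in range(p)])
    # Sample cross-correlation r_dx[k] = E{d[n]x[n-k]}, k = 0..p-1
    rdx = np.array([np.dot(d[k:], x[: N - k]) / N for k in range(p)])
    # Symmetric Toeplitz autocorrelation matrix R_xx
    Rxx = np.array([[rxx[abs(i - k)] for k in range(p)] for i in range(p)])
    return np.linalg.solve(Rxx, rdx)
```

The Levinson-Durbin algorithm mentioned above exploits the Toeplitz structure to solve the same system in O(p²) operations instead of the O(p³) of a general solver.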

3.1.1 Minimum mean squared error of Wiener filter
We can calculate the expected minimum mean square error:

ξ = E {e²[n]} = E {e[n] (d[n] − Σ_{k=0}^{p−1} w[k]x[n − k])}   (the bracketed term is e[n])

  = E {e[n]d[n]} − Σ_{k=0}^{p−1} w[k] E {e[n]x[n − k]}   (this sum is zero)

  = E {e[n]d[n]}   (6)

For optimal filter coefficients E {e[n]x[n − k]} = 0, from Equation 3. Next, we substitute
for the remaining e[n] term in Equation 6:

 
ξ = E {(d[n] − Σ_{k=0}^{p−1} w[k]x[n − k]) d[n]}   (the bracketed term is e[n])

  = E {d[n]d[n]} − Σ_{k=0}^{p−1} w[k]E {d[n]x[n − k]}

  = rdd[0] − Σ_{k=0}^{p−1} w[k]rdx[k].

This can be written in vector notation as:

ξ = rdd[0] − rdxᵀ w
  = rdd[0] − rdxᵀ Rxx⁻¹ rdx.

3.1.2 Corruption due to uncorrelated noise
When calculating the optimum filter parameters we must estimate rxx[k] and rdx[k]. Assume
that noise, v[n], has simply been added to the original signal:

x[n] = d[n] + v[n].

If we also assume the noise is uncorrelated with d[n] (rdv[k] = E {d[n]v[n − k]} = 0) then
the following simplifications can be made:

rxx[k] = E {x[n]x[n − k]}
       = E {(d[n] + v[n]) (d[n − k] + v[n − k])}
       = E {d[n]d[n − k]} + E {d[n]v[n − k]} + E {d[n − k]v[n]} + E {v[n]v[n − k]}
       = rdd[k] + rvv[k],

since the two cross terms are zero.

Also,

rdx[k] = E {d[n]x[n − k]}
       = E {d[n] (d[n − k] + v[n − k])}
       = E {d[n]d[n − k]} + E {d[n]v[n − k]}
       = rdd[k],

since the cross term is zero.

The Wiener-Hopf equations in matrix-vector form then become

[Rdd + Rvv] wopt = rdd.

Example: Given signal statistics design Wiener filter
d[n] is known to be a process with an autocorrelation given by rdd[k] = a^|k|,
with 0 < a < 1. Additive white noise with a variance of σ² has corrupted d[n] to give
x[n]. Design an optimum second order filter to retrieve an estimate of d[n] from x[n]. If
a = 0.8 and σ² = 1, determine the filter coefficients. Estimate the mean square error of
the output.

The Wiener-Hopf equations are

[ rxx[0]  rxx[1] ] [ w[0] ]   [ rdx[0] ]
[ rxx[1]  rxx[0] ] [ w[1] ] = [ rdx[1] ] .

Since d[n] and v[n] are uncorrelated and v[n] is white noise, we get rxx[k] = rdd[k] + rvv[k] =
a^|k| + σ²δ[k]. Also, rdx[k] = rdd[k]. So,

[ 1 + σ²    a    ] [ w[0] ]   [ 1 ]
[   a    1 + σ²  ] [ w[1] ] = [ a ] .

Solving gives

[ w[0] ]          1           [ 1 + σ² − a² ]
[ w[1] ] = ---------------- [      aσ²     ] .
            (1 + σ²)² − a²

When a = 0.8 and σ² = 1 we have wᵀ = [ 0.4048  0.2381 ]. Figure 26 shows the signal
before and after filtering.


Figure 26: Wiener filtering. Dashed: original signal. Dotted: signal with white noise
added. Solid: filtered signal.


Figure 27: Spectral illustration of Wiener filtering a noisy signal. The filter tries to preserve
as much signal, and remove as much noise, as possible.

The mean squared error, ξ, is given by ξ = rdd[0] − rdxᵀ Rxx⁻¹ rdx. Noting that rdd[0] = a⁰ = 1, we have

ξ = 1 − [ 1  0.8 ] [  0.5952  −0.2381 ] [  1  ]
                   [ −0.2381   0.5952 ] [ 0.8 ]
  = 0.4048
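A quick numerical check of this example in numpy (noting that rdd[0] = a⁰ = 1):

```python
import numpy as np

# Worked example: r_dd[k] = a**|k|, a = 0.8, additive white noise of variance 1.
a, var = 0.8, 1.0
Rxx = np.array([[1 + var, a], [a, 1 + var]])   # r_xx[k] = a**|k| + var*delta[k]
rdx = np.array([1.0, a])                        # r_dx[k] = r_dd[k] = a**|k|
w = np.linalg.solve(Rxx, rdx)                   # optimum taps w_opt
mse = 1.0 - rdx @ w                             # xi = r_dd[0] - r_dx^T w_opt
```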

3.2 Adaptive filtering
So what's wrong with Wiener filtering? Two things:

1. If the correlation statistics change we must re-estimate them,

2. Estimating correlation statistics takes time, since we must wait for the data and then
compute an average.

We will try and tackle the first problem now.

3.2.1 Steepest descent algorithm

If we assume that some change in the statistics has caused us to have a currently non-optimal
value for the filter coefficients, wn, we can use a steepest descent algorithm to
make iterative changes until we arrive at the new optimum. This might seem pointless, since
we could simply find the new optimum in one move by solving the Wiener-Hopf equations,
but the value of this method will become clear when we try to solve problem 2 above.

The mean squared error

ξ = E {e²[n]} = E {(d[n] − Σ_{k=0}^{p−1} wn[k]x[n − k])²}

is a quadratic function in wn[k]. The error surface traced out by varying each wn[k] is a
p-dimensional quadratic 'bowl' which has only one minimum. Figure 28 illustrates this for
a second order filter.

We can move towards that minimum by stepping a small distance, µ, in a direction down
the surface to get wn+1[k]. The direction we wish to move is the opposite of the steepest
direction up the slope (grad(ξ) = ∇ξ). Hence our updated coefficients are

wn+1 = wn − µ (∇ξ)

[ wn+1[0]     ]   [ wn[0]     ]     [ ∂ξ/∂wn[0]     ]
[ wn+1[1]     ]   [ wn[1]     ]     [ ∂ξ/∂wn[1]     ]
[     ...     ] = [    ...    ] − µ [      ...      ]
[ wn+1[p − 1] ]   [ wn[p − 1] ]     [ ∂ξ/∂wn[p − 1] ]

So we need to find the derivatives ∂ξ/∂wn[i]. (We did this earlier, but here it is again.)
Therefore we differentiate to get

∂ξ/∂wn[i] = ∂/∂wn[i] E {(e[n])²}




Figure 28: Error surface for a second order filter.

∂(·)/∂x and E {·} are both linear operators, so their order can be interchanged:

∂ξ/∂wn[i] = E {∂/∂wn[i] (e[n])²}
          = E {2e[n] ∂e[n]/∂wn[i]}
          = 2E {e[n] ∂e[n]/∂wn[i]}
But e[n] = d[n] − Σ_{k=0}^{p−1} wn[k]x[n − k], which when expanded is e[n] = d[n] − wn[0]x[n] −
wn[1]x[n − 1] − ... − wn[p − 1]x[n − p + 1], so

∂e[n]/∂wn[i] = −x[n − i].

This gives

∂ξ/∂wn[i] = 2E {−e[n]x[n − i]}
          = −2E {e[n]x[n − i]}

So the update equation becomes (with the −2 absorbed into µ)

[ wn+1[0]     ]   [ wn[0]     ]     [ E {e[n]x[n]}         ]
[ wn+1[1]     ]   [ wn[1]     ]     [ E {e[n]x[n − 1]}     ]
[     ...     ] = [    ...    ] + µ [         ...          ]   (7)
[ wn+1[p − 1] ]   [ wn[p − 1] ]     [ E {e[n]x[n − p + 1]} ]
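As a sketch of this update, reusing the statistics of the earlier worked example (a = 0.8, σ² = 1) and assuming the exact expectations E {e[n]x[n − k]} = rdx[k] − (Rxx wn)[k] are known:

```python
import numpy as np

# Known second-order statistics from the earlier worked example.
Rxx = np.array([[2.0, 0.8], [0.8, 2.0]])   # r_xx[k] = 0.8**|k| + delta[k]
rdx = np.array([1.0, 0.8])                  # r_dx[k] = 0.8**|k|
w = np.zeros(2)                             # current (non-optimal) taps
mu = 0.1                                    # step size, below 2/lambda_max
for _ in range(500):
    # Exact gradient step: E{e[n]x[n-k]} = r_dx[k] - sum_j w[j] r_xx[k-j]
    w = w + mu * (rdx - Rxx @ w)
```

After enough iterations w approaches wopt = Rxx⁻¹rdx = [ 0.4048  0.2381 ].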

Figure 29 shows a block diagram of an adaptive filter.


Figure 29: Adaptive filter.

If the signal statistics remain constant, this will converge to the optimum filter. If the
statistics change, it will try to follow them. However, problem 2 still remains: we must
estimate an expectation at each time step! We now propose a solution...

3.2.2 The LMS algorithm
The required expectation in Equation 7 may be estimated using the sample mean,

E {e[n]x[n − k]} ≈ (1/L) Σ_{l=0}^{L−1} e[n − l]x[n − k − l].

We may find a crude approximation by letting L = 1:

E {e[n]x[n − k]} ≈ e[n]x[n − k].

So the new update equation to adjust the filter taps is

[ wn+1[0]     ]   [ wn[0]     ]     [ e[n]x[n]         ]
[ wn+1[1]     ]   [ wn[1]     ]     [ e[n]x[n − 1]     ]
[     ...     ] = [    ...    ] + µ [       ...        ] .
[ wn+1[p − 1] ]   [ wn[p − 1] ]     [ e[n]x[n − p + 1] ]

This is called the LMS (Least Mean Squares) algorithm.

• This is hardware efficient, since each update requires approximately one vector multiplication
and one vector addition.

• This comes at the price of having slower convergence properties than the Steepest
Descent algorithm.

• It will sometimes move up the error surface, since we are using a crude approximation
of E {e[n]x[n]}. However, on average it will move in the right direction.

One known technical limitation of this method is that the step size, µ, must be kept small
if the algorithm is to converge. In fact it must be smaller than 2/λmax, where λmax is the
largest eigenvalue of the autocorrelation matrix, Rxx, of the input, x[n]:

µ < 2/λmax.

In practice we just make µ reasonably small.
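A minimal Python sketch of the LMS update (the function name and data layout are my own; the tap update follows Equation 7 with the L = 1 approximation):

```python
import numpy as np

def lms(x, d, p, mu):
    """Run the LMS algorithm over x[n]; return final taps and error signal."""
    w = np.zeros(p)
    e = np.zeros(len(x))
    for n in range(p - 1, len(x)):
        xn = x[n - p + 1 : n + 1][::-1]   # [x[n], x[n-1], ..., x[n-p+1]]
        e[n] = d[n] - w @ xn              # e[n] = d[n] - sum_k w[k] x[n-k]
        w = w + mu * e[n] * xn            # crude gradient step (L = 1)
    return w, e
```

For a stationary input and a small enough µ this drifts, on average, toward the Wiener solution.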

• This type of filtering is commonly known as adaptive equalisation. A training signal,
d[n], is sent down the channel. The receiver measures x[n], which is a corrupted
version of d[n]. The adaptive filter then 'equalises' the effects of the channel. Once
the training signal has finished, the filter taps are frozen and the data is transmitted.
If it is a mobile (time varying) channel then training must be repeatedly performed
after some data has been transmitted, to adjust to the changing channel.

3.3 Adaptive system identification
Equalisation essentially looks for the inverse of the effect which corrupted the original
signal. Another interesting area where adaptive filtering can be applied is system identification,
or system modelling.


Figure 30: Block diagram of system modelling.

We put the same input into the filter and the system, and adjust the filter until both
outputs are the same, or close. Conceptually the setup is different, but the mathematics
is identical to that of equalisation.


Figure 31: Adaptive system modelling.

Figure 32 illustrates the convergence properties of the LMS algorithm applied to system
identification. The system impulse response is h[n] = [ 1  0.2 ]. The input to both the
system and the filter, x[n], is coloured noise, with an autocorrelation of rxx[k] = 0.8^|k|. We
see that after about 1000 samples the system has been adequately modelled.


Figure 32: Adaptive system identification example. Coefficients converge to the known
system coefficients.

4 Spectral analysis
Spectral analysis deals with the examination of the frequency content of random signals.
Since we do not have an explicit expression for the signals in question, we cannot directly
calculate their spectra. However, using the statistics of the signals we can estimate, on
average, how much each frequency contributes to the signal power.

Since we are dealing with stochastic processes we first need to quickly revise some basics
of random variables and processes.

4.1 Review of stochastic variables and processes


4.1.1 Random variable
• A random variable, X, is a number (continuous or discrete) associated with the
outcome of an experiment. If we repeat the experiment we may get a different
number in an unpredictable way.

• We can define the probability that X will be below, or equal to, some value x as

FX(x) = P (X ≤ x).

This is called the cumulative distribution function (CDF). This has some obvious
properties:

FX(x1) ≤ FX(x2) if x1 < x2
FX(∞) = 1
FX(−∞) = 0.

• We define the probability density function (PDF) to be

fX(x) = dFX(x)/dx.

Therefore the probability of X falling in some interval [x1, x2] is

P (x1 ≤ X ≤ x2) = P (X ≤ x2) − P (X ≤ x1)
               = FX(x2) − FX(x1)
               = ∫_{x1}^{x2} fX(x)dx.

Hence, ∫_{−∞}^{∞} fX(x)dx = 1. Figure 33 shows the CDF and PDF for a Gaussian random
variable:

fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}.
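As a quick numerical illustration, the Gaussian CDF can be evaluated via the standard error-function identity FX(x) = (1 + erf((x − µ)/(σ√2)))/2 (this identity is assumed here, not derived in the notes):

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """F_X(x) for a Gaussian random variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# P(x1 <= X <= x2) = F_X(x2) - F_X(x1): about 68.3% within one sigma
p_one_sigma = gaussian_cdf(1.0) - gaussian_cdf(-1.0)
```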


Figure 33: CDF and PDF for a Gaussian random variable with zero mean and variance
σ² = 1.

• The expected value of a function of a random variable, g(X), is the average over a
large (infinite) number of experiments. Since the event X = x occurs with a relative
frequency of fX(x), this works out as

E {g(X)} = ∫_{−∞}^{∞} g(x)fX(x)dx.

• The nth moment of a random variable is defined as the expected value of g(X) = Xⁿ:

E {Xⁿ} = ∫_{−∞}^{∞} xⁿ fX(x)dx.

n = 1 gives the mean.

4.1.2 Random Processes

• Suppose we have an experiment which continually outputs random variables as time
elapses. We call this output a random process, X(t), rather than a random variable.
The CDF and PDF for a random process are defined as for a random variable, except
they are now time dependent:

FX(x, t) = P (X(t) ≤ x)
fX(x, t) = dFX(x, t)/dx.
• If we fix t = t1, then X(t1) is simply a random variable and all the rules for random
variables apply. If we were to estimate the CDF, FX(x, t1), and PDF, fX(x, t1), we
would need to set up a large (preferably infinite) number of experiments, all running
simultaneously, and inspect them when t = t1, as shown in Figure 34. This large
group of experiments is called an ensemble. Once we have estimated fX(x, t1) we
can calculate the expected value of X(t1) using

E {X(t1)} = ∫_{−∞}^{∞} x fX(x, t1)dx


Figure 34: An ensemble of random processes, X(t), which we would use to estimate
fX (x, t1 ).

4.1.3 Stationarity
• A process is said to be strictly stationary if all its statistics are independent of time. If
X(t) is strictly stationary then fX(x, t1) = fX(x, t2).

• A less strict version of stationarity often used is wide sense stationarity. In this case
only the mean and autocorrelation need to be stationary.

• The autocorrelation is defined as

rXX(t1, t1 + τ) = E {X(t1)X(t1 + τ)} .

Since the autocorrelation is stationary for a wide sense stationary process, the following
is true:

rXX(t1, t1 + τ) = rXX(t2, t2 + τ).

Hence, rXX(t1, t1 + τ) is only a function of the time difference, τ, and for a wide
sense stationary process we write the autocorrelation as rXX(τ).

4.1.4 Ergodicity
• To estimate the expected value (the mean) of a random process X(t) at time t = t1,
we have had to examine a large number of concurrent experiments and find the
average outcome across all experiments at t = t1:

E {X(t1)} = ∫_{−∞}^{∞} x fX(x, t1)dx.

• But, if this is equivalent to taking a time-average for one experiment,

E {X(t1)} = lim_{T→∞} (1/2T) ∫_{t1−T}^{t1+T} X(t)dt,

then the process is said to be ergodic in the mean. If the same is true for the
autocorrelation,

E {X(t1)X(t1 + τ)} = lim_{T→∞} (1/2T) ∫_{t1−T}^{t1+T} X(t)X(t + τ)dt,

then the process is said to be ergodic in the autocorrelation.

• If it is ergodic in both the mean and the autocorrelation, the process is simply called
ergodic.

• To be ergodic the process must be stationary. But, a process can be stationary but
not ergodic! Why?

The assumption of ergodicity allows us to use time-averages to estimate the
autocorrelation function from a single random process.
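For instance, assuming ergodicity, a single long record suffices to estimate rXX[k]. A small numpy sketch (the function name and the unbiased normalisation are my own choices):

```python
import numpy as np

def time_avg_autocorr(x, k):
    """Time-average estimate of r_XX[k] = E{X[n]X[n+k]} from one realisation."""
    N = len(x)
    return np.dot(x[: N - k], x[k:]) / (N - k)
```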

4.2 Continuous-time power spectral density
The power spectral density (PSD) and the autocorrelation function of a signal, say x(t),
form a Fourier transform pair:

Sxx(ω) = ∫_{−∞}^{∞} rxx(τ)e^{−jωτ} dτ,

rxx(τ) = (1/2π) ∫_{−∞}^{∞} Sxx(ω)e^{jωτ} dω.

We do not prove this here, but this fact is called the Wiener-Khinchin theorem and we use
it as a starting point.

• We can see why this might make sense by examining the instantaneous power of the
signal:

E {x²(t)} = rxx(0) = (1/2π) ∫_{−∞}^{∞} Sxx(ω)dω.

This is the integral of the power spectral density over all frequencies. This is why the
units of the PSD are Watts/Hz or V²/Hz. It can be shown that if x(t) is real then
Sxx(ω) ≥ 0 and symmetric.

We can also write the PSD for a discrete time process...

4.3 Discrete-time power spectral density
We assume we are examining the process x[n]. The autocorrelation becomes rxx[k] =
E {x[n]x[n + k]}, where k is an integer. Since we have samples of rxx[k] we can write the PSD as
the discrete-time Fourier transform of the autocorrelation:

S̃xx(e^{jωT}) = Σ_{k=−∞}^{∞} rxx[k]e^{−jωkT}.

If the signal x[n] was sampled at rate ωs = 2π/T (every T seconds), this gives

S̃xx(e^{jωT}) = (1/T) Σ_{k=−∞}^{∞} Sxx(ω − kωs).

If ωs is greater than twice the maximum frequency in x(t) (bandlimited), then we can write

S̃xx(e^{jωT}) = (1/T) Sxx(ω) for |ωT| < π.


Figure 35: PSD, S̃xx (ejωT ), of a sampled process.

4.4 The periodogram
Ultimately we will have to estimate the PSD from the data. If we have a signal x[n] and
we have collected N samples, we can model this by multiplying x[n] by a window function
w[n] to get

v[n] = w[n]x[n],

where the window function is a square pulse:

w[n] = 1 if 0 ≤ n ≤ N − 1,
     = 0 otherwise.

The autocorrelation, rvv[k], of this windowed sequence can be estimated as


Figure 36: Rectangular window function, for N = 8.

rvv[k] ≈ (1/N) Σ_{n=0}^{N−1} v[n]v[n + k]
       = (1/N) Σ_{n=0}^{N−1} (w[n]x[n]) (w[n + k]x[n + k]) .

The estimated PSD of the windowed process is therefore

S̃vv(e^{jωT}) = Σ_{k=−∞}^{∞} rvv[k]e^{−jωkT}
            ≈ (1/N) Σ_{k=−∞}^{∞} Σ_{n=0}^{N−1} v[n]v[n + k]e^{−jωkT}
            ≈ (1/N) Σ_{k=−∞}^{∞} Σ_{n=0}^{N−1} v[n]e^{jωnT} v[n + k]e^{−jω(n+k)T}
            ≈ (1/N) [Σ_{n=0}^{N−1} v[n]e^{jωnT}] [Σ_{k=−∞}^{∞} v[n + k]e^{−jω(n+k)T}]
            ≈ (1/N) [Σ_{n=0}^{N−1} v[n]e^{jωnT}] [Σ_{m=0}^{N−1} v[m]e^{−jωmT}]
            ≈ Ṽ*(e^{jωT})Ṽ(e^{jωT}) / N
            = |Ṽ(e^{jωT})|² / N.   (8)

We forgot to normalise for the energy in the window, so we add the normalising factor
U = Σ_{n=0}^{N−1} (w[n])²:

S̃vv(e^{jωT}) ≈ |Ṽ(e^{jωT})|² / (NU).   (9)

For the rectangular window, U = N.
This is the DFT (discrete Fourier transform, since we only used N samples) magnitude
squared, divided by the number of samples, N, and the energy in the window, U. This is
called the periodogram.

Remember, we cannot practically evaluate the spectrum at all frequencies, so the DFT
is usually evaluated at ωT = 2πm/N. This gives the machine-calculable DFT:

V̄[m] = Ṽ(e^{jωT})|_{ωT=2πm/N} = Σ_{n=0}^{N−1} v[n]e^{−j2πmn/N}
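Equation 9 can be computed directly with an FFT. A sketch (the rectangular default and the function name are mine):

```python
import numpy as np

def periodogram(x, w=None):
    """Periodogram S[m] = |V[m]|**2 / (N*U) at the N DFT frequencies (Equation 9)."""
    N = len(x)
    if w is None:
        w = np.ones(N)            # rectangular window, so U = N
    U = np.sum(w ** 2)            # window energy normalisation
    V = np.fft.fft(w * x)         # machine-calculable DFT, omega*T = 2*pi*m/N
    return np.abs(V) ** 2 / (N * U)
```

For the sinusoid example that follows, x[n] = sin(nπ/2) with N = 8, this puts a peak of 0.25 at ωT = π/2 (bin m = 2), matching the peak height in Figure 37.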

Example
We can estimate the power spectral density of a sinusoid, x[n] = sin(ω0 nT). Let's choose
ω0 T = π/2, so x[n] = sin(nπ/2). We'll take N = 8 samples of the signal. Figure 37 shows the
samples of x[n] and two DFTs of different lengths.


Figure 37: 8 samples of x[n] = sin(nπ/2) (top). We take two DFTs, one evaluated at 16
frequencies (middle) and one at 64 (bottom).

• Only the positive frequency axis is shown.

• There is another copy at ωT = −π/2.

• Of course, the range −π ≤ ωT ≤ π is copied at multiples of the sampling frequency
too.

We can also take more samples and see what happens. If we take N = 64 samples and
take two more DFTs for comparison (Figure 38):


Figure 38: 64 samples of x[n] = sin(nπ/2) (top), with two DFTs evaluated at 128 and 512
frequencies. We see that increasing N has narrowed the power spectrum.

4.5 Window functions
We can investigate what effect using a window function has on our periodogram estimator.
We want to find the bias in our estimate. Hence we are looking for E {S̃vv(e^{jωT})}:

E {S̃vv(e^{jωT})} = E {Σ_{k=−∞}^{∞} rvv[k]e^{−jωkT}}
= ... + E {rvv[−1]} e^{−jω(−1)T} + E {rvv[0]} e^{−jω(0)T} + E {rvv[1]} e^{−jω(1)T} + ...
= Σ_{k=−∞}^{∞} E {rvv[k]} e^{−jωkT}
= Σ_{k=−∞}^{∞} E {(1/N) Σ_{n=0}^{N−1} w[n]x[n]w[n + k]x[n + k]} e^{−jωkT}
= Σ_{k=−∞}^{∞} [(1/N) Σ_{n=0}^{N−1} w[n]w[n + k]] E {x[n]x[n + k]} e^{−jωkT}
= Σ_{k=−∞}^{∞} rww[k]rxx[k]e^{−jωkT}

So the expected value (the bias) of the periodogram estimator is the DTFT of the product
of the autocorrelation of the signal and the autocorrelation of the window. But, the discrete-time
Fourier transform of the product of two signals in the time domain is identical
to their periodic convolution in the frequency domain. Therefore,

E {S̃vv(e^{jωT})} = (1/2π) ∫_{−π/T}^{π/T} S̃ww(e^{j(ω−θ)T}) S̃xx(e^{jθT}) dθ.

• So the window, w[n], has the effect of smearing the frequency content of S̃xx(e^{jωT})
across the frequency spectrum. This is called spectral leakage.

• We can tailor the window shape to try and reduce this effect.

• Don't forget to normalise for the window energy, U = Σ_{n=0}^{N−1} (w[n])²:

E {S̃vv(e^{jωT})} = (1/2πU) ∫_{−π/T}^{π/T} S̃ww(e^{j(ω−θ)T}) S̃xx(e^{jθT}) dθ.

The rectangular window which we have been using up until now has very abrupt transitions
in the time domain, so its frequency spectrum is very broad. We take the DTFT to see
this:

W̃(e^{jωT}) = Σ_{n=−∞}^{∞} w[n]e^{−jωnT}
           = Σ_{n=0}^{N−1} (e^{−jωT})ⁿ
           = (1 − e^{−jωTN}) / (1 − e^{−jωT})   (by the geometric series)
           = [e^{−jωNT/2} (e^{jωNT/2} − e^{−jωNT/2})] / [e^{−jωT/2} (e^{jωT/2} − e^{−jωT/2})]
           = e^{−jω(N−1)T/2} sin(NωT/2) / sin(ωT/2).

Hence the magnitude spectrum of w[n] is

|W̃(e^{jωT})| = |sin(NωT/2) / sin(ωT/2)|.

But we can show (similar to an earlier calculation in Equation 8) that S̃ww(e^{jωT}) is

S̃ww(e^{jωT}) = |W̃(e^{jωT})|² / N
            = (1/N) sin²(NωT/2) / sin²(ωT/2).

• As ω → 0, S̃ww(e^{jωT}) → N, since lim_{x→0} sin(Nx)/sin(x) = N.

• Otherwise there are zeroes at

NωT/2 = ±kπ, i.e. ωT = ±2kπ/N,

which looks like the upper-left plot of Figure 39 (shown in dB).

Figure 39: Spectra of a selection of window functions. The importance of the resolution versus
spectral leakage trade-off becomes apparent.

• Hence the width of the main lobe is

ΔωT = 4π/N.

• This is the approximate frequency resolution of the periodogram when using a rectangular
window.

• If two frequencies are closer than this, the main lobes in the periodogram will overlap
and they will become indistinguishable.

• We can make the resolution arbitrarily small by taking more samples (↑ N). By
increasing N the spectrum becomes more like a delta function. The resolution in
terms of frequency in Hz is (using Δω = 2πΔf)

Δf = 2/(NT).

• Taken to its extreme, as N → ∞, S̃ww(e^{jωT}) → 2πU δ(ω), which gives the following
expected value of S̃vv(e^{jωT}):

E {S̃vv(e^{jωT})} = (1/2πU) ∫_{−π/T}^{π/T} 2πU δ(ω − θ) S̃xx(e^{jθT}) dθ = S̃xx(e^{jωT}).

So the estimator is unbiased in the limit as N → ∞ (on average it gives the correct
answer).

• However, the amount of spectral leakage decreases very slowly as N increases! We
can do better by altering the window shape.
The most widely-used window function for reducing spectral smearing, or leakage, is probably
the Hamming window:

whamming[n] = 0.538 − 0.462 cos(2πn/(N − 1)) for 0 ≤ n ≤ N − 1.

The tails of the spectrum are reduced, giving more contained spectral leakage, but at
the cost of a loss in resolution, since the main lobe is widened. There are other windows,
such as the Blackman window,

wblackman[n] = 0.42 − 0.5 cos(2πn/(N − 1)) + 0.08 cos(4πn/(N − 1)),

or the flat-top window,

wflattop[n] = 1 − 1.93 cos(2πn/(N − 1)) + 1.29 cos(4πn/(N − 1))
              − 0.388 cos(6πn/(N − 1)) + 0.032 cos(8πn/(N − 1)).

These windows have lower spectral tails, but broader main lobes.
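A small numerical check of this trade-off (the sidelobe-measurement procedure is my own; the 0.538/0.462 Hamming coefficients follow the notes):

```python
import numpy as np

N, pad = 64, 8192
n = np.arange(N)
rect = np.ones(N)
hamming = 0.538 - 0.462 * np.cos(2 * np.pi * n / (N - 1))

def peak_sidelobe_db(w):
    """Peak sidelobe level relative to the main lobe, in dB."""
    W = np.abs(np.fft.fft(w, pad))     # finely sampled DTFT magnitude
    W = W / W[0]                        # normalise the main-lobe peak
    k = 1
    while W[k + 1] < W[k]:              # walk down to the first null
        k += 1
    return 20 * np.log10(W[k : pad // 2].max())
```

The rectangular window's first sidelobe sits near −13 dB; the Hamming window pushes the sidelobes below roughly −40 dB, at the price of a main lobe about twice as wide.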

• If enough data is available we can use a window to reduce the leakage and make N
large enough to get the resolution we want.


Figure 40: Hamming window, Blackman window and Flat-top window. N = 256.

4.6 The averaged periodogram
While we have seen that the periodogram is asymptotically unbiased (on average it will give the right answer),
it can be shown (though not here) that as N → ∞ the variance does not tend to
zero! We say it is not a consistent estimate. In fact, it can be shown that the variance of
the periodogram estimator is approximately given as

var {S̃vv(e^{jωT})} ≈ S̃xx²(e^{jωT}).

(The reason for this is embedded in the fact that while, as N → ∞, rvv[k] =
(1/N) Σ_{n=0}^{N−1} (w[n]x[n]) (w[n + k]x[n + k]) → rxx[k], it does not do so uniformly. Choose any
value of ε and it is impossible to choose a value of N such that |rvv[k] − rxx[k]| < ε for all
k.)

• Weird! So, what am I talking about? Well, take white Gaussian noise for instance.
Its autocorrelation function is a delta function:

rxx[k] = σ²δ[k].

So its power spectral density is

S̃xx(e^{jωT}) = Σ_{k=−∞}^{∞} rxx[k]e^{−jωkT} = σ²e^{−0} = σ².

Hence it has a flat power spectrum. Figure 41 shows two estimates of the power
spectrum of Gaussian noise.


Figure 41: Power spectrum estimates of white Gaussian noise, estimated using a periodogram
with N = 512. The spectrum is not as 'flat' as expected.

• Every time we estimate the spectrum we usually get the wrong answer, and the variance
of the error is roughly S̃xx²(e^{jωT}) (which is awful!).

• And it does not get any better when we increase N, as we see from Figure 42, where
N = 2048.


Figure 42: PSD estimated from 2048 samples of a Gaussian process. Variance is about the
same as when N = 512.

So, we reduce the variance of the estimate by averaging a number of periodograms.

• The data is first broken into segments. Suppose we have N = KL data points. We
break the data into K segments, xk[n], each containing L samples:

xk[n] = x[kL + n] for 0 ≤ n ≤ L − 1,  0 ≤ k ≤ K − 1.

• We multiply each segment by our window of choice:

vk[n] = w[n]xk[n].

• The periodogram of the kth segment is (including the normalisation for the window
energy, U = Σ_{n=0}^{L−1} (w[n])²):

S̃ᵏvv(e^{jωT}) = (1/U) |Ṽk(e^{jωT})|² / L.

Note that since the number of samples in each segment is L = N/K, the frequency
resolution has also decreased by a factor of K. We are trading bias for variance. You
get nothing for free!

• Our final estimate is the average of all K periodograms:

S̃vv(e^{jωT}) = (1/K) Σ_{k=0}^{K−1} S̃ᵏvv(e^{jωT}).

Figure 43: Averaged periodogram with K = 4. The 4 periodograms (left to right),
S̃¹vv(e^{jωT}), S̃²vv(e^{jωT}), S̃³vv(e^{jωT}), S̃⁴vv(e^{jωT}), of the 4 windowed segments are averaged to
obtain the final estimate of the power spectral density. A Hamming window was used to
window each segment. The data consists of two sinusoids and Gaussian noise.

• Since at any frequency the estimate is the average of K i.i.d. (independent
and identically distributed) variables, this will result in a reduction of the variance
by a factor of K:

var {S̃vv(e^{jωT})} ≈ S̃xx²(e^{jωT}) / K.

So the more segments we have, the lower the variance of our estimate.
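A Welch/Bartlett-style sketch of this averaging (function name mine):

```python
import numpy as np

def averaged_periodogram(x, L, w):
    """Average of K = len(x)//L windowed, length-L periodograms."""
    K = len(x) // L
    U = np.sum(w ** 2)                          # window energy
    S = np.zeros(L)
    for k in range(K):
        v = w * x[k * L : (k + 1) * L]          # k-th windowed segment
        S += np.abs(np.fft.fft(v)) ** 2 / (L * U)
    return S / K
```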

Example: Spectrum analyser
You have been asked to design a spectrum analyser to be used in an electronic tuning
device which will sample a band-limited sound signal at a rate of 8 kHz. It must have a
frequency resolution of 5 Hz, and the standard deviation of the estimate must be within
20% of the true value.

• To get the standard deviation down to 20% of the true value we need

√(var {S̃vv(e^{jωT})}) ≈ S̃xx(e^{jωT})/√K ≤ 0.2 S̃xx(e^{jωT})

√K ≥ 5
K ≥ 25

So we need 25 segments to get the accuracy we require.

• The frequency resolution of the rectangular window with L samples is about

Δf = 2/(LT),

but the spectral smearing is quite bad.

• We choose to use a Hamming window, whose frequency resolution is about twice as
bad as that of the rectangular window:

Δf = 4/(LT).

• For an 8 kHz sample rate and 5 Hz resolution, we have

5 = 4(8000)/L
L = 6400

• So we need to collect K ≥ 25 segments, each with L ≥ 6400 samples in each segment.
This results in (25)(6400/8000) = 20 s of recording.

• We calculate the DFT of each of the 25 windowed segments,

V̄ᵏ[m] = Ṽᵏ(e^{jωT})|_{ωT=2πm/N} = Σ_{n=0}^{6399} vk[n]e^{−j2πmn/N},

and average the resulting periodograms:

S̃vv[m] = (1/25) Σ_{k=0}^{24} |V̄ᵏ[m]|² / (LU).
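The sizing arithmetic of this example, as a sketch:

```python
import math

fs = 8000.0                          # sample rate (Hz)
rel_std = 0.2                        # required relative standard deviation
K = math.ceil(1.0 / rel_std ** 2)    # std shrinks by sqrt(K), so K >= 25
df = 5.0                             # required resolution (Hz)
L = math.ceil(4.0 * fs / df)         # Hamming window: df ~ 4/(L*T) = 4*fs/L
seconds = K * L / fs                 # total recording time
```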

4.7 Short-time Fourier transform
Suppose we have a signal which is nonstationary. Take for example the signals in Figure 44.
These are obviously two different signals, but their power spectral density estimates are the
same. What would be more useful would be some sort of time-frequency representation.
We want something like what is shown in Figure 45. Here we can see how an estimate of
the instantaneous frequency varies with time.


Figure 44: A single periodogram estimate of the power spectral density of a nonstationary
signal. The two tones in the signal have frequencies at ω1T = π/4 and ω2T = π/8.

We can use the short-time Fourier transform to derive a plot similar to that of Figure 45...


Figure 45: Time-frequency representation of a non-stationary signal.

4.7.1 Continuous STFT

The continuous-time short-time Fourier transform is defined as:

STFT(t0, ω) = ∫_{−∞}^{∞} (g(t − t0)f(t)) e^{−jωt} dt,

where g(t − t0) is the window function, g(t), shifted to be centred at t = t0. g(t) is
a short-time window function which quickly decays to zero. A typical function would be

g(t) = e^{−αt²} for α > 0,

but you could use a Hamming window, etc.

• STFT(t0, ω) is simply the Fourier transform of the signal g(t − t0)f(t). Figure 46
illustrates what g(t − t0)f(t) looks like.

• A short section of the signal f(t) is isolated by g(t − t0) about the time t = t0.

• The frequency content of this isolated section is determined using the single periodogram
estimator.


Figure 46: An illustration of how the function g(t − t0) picks out the section of the chirp
signal f(t) at time t0 = 500.

• If we calculate the STFT at a number of different times we can construct the time-frequency
plot we desire. We call this 3-D plot a spectrogram.

• (The spectrogram is usually plotted in two dimensions, with the frequency magnitude
being represented by the image intensity or colour.)

Figure 47 shows a sketch of what the spectrogram might look like for the 'chirp' signal

f(t) = sin(ω0t²) = sin((ω0t)t).

The frequency, ω = ω0t, of the chirp signal increases linearly with time.

4.7.2 Time-frequency resolution

Just a quick note on the time-frequency resolution. The time resolution, Δt, is
inversely proportional to the frequency resolution:

Δt = K/Δf, i.e. ΔtΔf = K.

This is Heisenberg's uncertainty principle.


Figure 47: Sketch of what the spectrogram of the chirp signal in Figure 46 might look like.

• If we want fine-grain time resolution we must take small windows, which give poor
frequency resolution.

4.7.3 Discrete STFT
The STFT for discrete signals is:

STFT[n0, ω) = Σ_{n=−∞}^{∞} g[n − n0]f[n]e^{−jωnT}.

Of course we can evaluate this at any frequency we like. We usually set ωT = 2πm/N and
vary m over 0, ..., N − 1 to give:

STFT[n0, m] = Σ_{n=−∞}^{∞} g[n − n0]f[n]e^{−j2πmn/N}.

If we use a Hamming window of length N the summation becomes:

STFT[n0, ω) = Σ_{n=n0}^{n0+(N−1)} g[n − n0]f[n]e^{−jωnT}.

Let k = n − n0, so that n = k + n0. Making the substitution,

STFT[n0, ω) = Σ_{k=0}^{N−1} g[k]f[k + n0]e^{−jω(k+n0)T}
            = e^{−jn0ωT} (Σ_{k=0}^{N−1} g[k]f[k + n0]e^{−jωkT}).   (10)

• The term e^{−jn0ωT} is just a phase component, so we can ignore it and examine the
magnitude.

• We define a new filter g∗[k, ω0) = g[k]e^{−jω0kT}, so

|STFT[n0, ω0)| = |Σ_{k=0}^{N−1} g∗[k, ω0)f[k + n0]| .

This is the N samples of f[n], from n = n0 to n = n0 + N − 1, filtered by g∗[k, ω0). But,

G̃∗(e^{jωT}, ω0) = F{g∗[k, ω0)}
              = F{g[k] e^{−jω0kT}}
              = F{g[k]} ∗ F{e^{−jω0kT}}
              = G̃(e^{jωT}) ∗ 2πδ(ω − ω0)

i.e. the Fourier transform of the window convolved with a delta function at ω0.

• This results in F{g[k]} = G̃(e^{jωT}) being shifted to ω0.

• Remember, if the window function, g[k], is the rectangular window of length N, we get

|F{g[k]}| = |G̃(e^{jωT})| = | sin(NωT/2) / sin(ωT/2) |

|G̃∗(e^{jωT}, ω0)| = | sin(N(ω − ω0)T/2) / sin((ω − ω0)T/2) |

This looks like a narrow band-pass filter at ω = ω0. So...


|F{STFT[n, ω0)}| = |G̃∗(e^{jωT}, ω0)| |F̃(e^{jωT})|
                = | sin(N(ω − ω0)T/2) / sin((ω − ω0)T/2) | |F̃(e^{jωT})|

• If we evaluate the STFT at ωT = 2πm/N, for m = 0, ..., N − 1, the STFT will behave like a bank of N narrowband filters (c.f. Figure 48), each one allowing a different frequency to pass.

Figure 48: STFT is like a filter bank.

• This is a very useful transform for analysing the structure of transient signals, like

speech. Figure 49 shows a speech waveform, its PSD and its STFT.
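The filter-bank view above can be sketched numerically. Below is a minimal discrete STFT in Python (NumPy assumed; the window length N = 256, hop size and synthetic chirp are illustrative choices, not values from the notes):

```python
import numpy as np

def stft(f, N, hop):
    """Discrete STFT: slide a Hamming window g[k] of length N along
    f[n] and take an N-point DFT of each windowed frame."""
    g = np.hamming(N)
    starts = range(0, len(f) - N + 1, hop)
    return np.array([np.fft.fft(g * f[n0:n0 + N]) for n0 in starts])

# A chirp: its instantaneous frequency rises with time, so the
# dominant DFT bin should climb from frame to frame (the spectrogram
# would show a rising ridge, as sketched in Figure 47).
n = np.arange(8000)
f = np.sin(2 * np.pi * 5e-6 * n ** 2)
S = stft(f, N=256, hop=128)
peak_bins = np.argmax(np.abs(S[:, :128]), axis=1)  # positive frequencies
```

Plotting 20 log10 |S| against frame index and frequency bin gives a spectrogram like the one in Figure 49.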


Figure 49: (top) Speech waveform for the words So long. (middle) An estimate of the PSD using the averaged periodogram estimator. (bottom) The spectrogram derived using the short-time Fourier transform. The variation of frequency with time becomes apparent.
5 Speech processing
• The processing of digitised speech signals is an important application of DSP.

• Speech is a fundamental method of human communication.

• From an engineering viewpoint, speech is a time-varying signal produced by a constrained physical system (lungs, vocal cords and vocal tract).

• Speech is reasonably band-limited (the phone system samples at 8 kHz, which sounds fine).

There are 3 classes of problem people are generally interested in:

1. Speech analysis, with the aim of building speech recognition systems.

2. Speech synthesis, to allow computerised systems to converse with humans without using recorded speech snippets, to read text aloud, or to alter speech characteristics for security reasons.

3. Speech analysis/synthesis, for speech compression for transmission or storage.

5.1 Speech production model


• Speech is produced when air is expelled from the lungs, through a non-uniform

acoustic tube (the vocal tract) and is released at the lips.

• It can be thought of as the output of a slowly time-varying system which is excited

by a periodic source or a noise source.

• Speech is made up of a series of short sounds called phonemes.

There are three possible ways you can make a phoneme sound:

1. The vocal cords: You tighten your vocal cords until they seal shut. You then use your lungs to force them open. When they open, a small noise is made for a short time, due to the pressure difference. If it were possible to remove the head (!) and listen to the sound of the vocal cords vibrating, it would sound like an impulse train of sound, whose fundamental frequency is determined by the tension on the vocal cords. Sounds formed in this way are called voiced sounds. (/a/, /e/, /oo/, etc.)

2. The vocal tract: When the vocal cords are left relaxed and air is forced through the vocal tract, if the vocal tract is made narrow enough (using the tongue) a hissing sound is created. Sounds formed in this way are called unvoiced or fricative sounds. (/ss/, /sh/, /ch/)

3. Plosives: By sealing the airway shut, building pressure behind the blockage and then opening the airway, we produce sounds called plosives. (/t/, /p/, /k/) However, as a fraction of the total speech duration, plosives make up very little time.

• Therefore we can model our speech process as a system excited by either a noise

source or a periodic impulse train.

• The difference between various voiced or unvoiced sounds is due to the variable shape of the vocal tract. We model the vocal tract as a time-varying filter.

Therefore our speech model becomes that shown in Figure 50:

Figure 50: Speech model. A noise source or a pulse source (with a given pitch period) excites a time-varying vocal tract filter, controlled by the vocal tract parameters.

5.2 Voiced speech
5.2.1 The impulse train
So we know that voiced speech is formed by the filtering (by the vocal tract) of an impulse train of sound, which has a period of T0. Before we look at the filtering caused by the vocal tract, let's examine what the impulse train looks like.

If the impulse train is ideal we can write it as δT0(t) = Σ_{n=−∞}^{∞} δ(t − nT0). Note that δT0 is periodic and so can be represented by a Fourier series,


δT0(t) = Σ_{n=−∞}^{∞} cn e^{jnω0t},

where,

cn = (ω0/2π) ∫_{−π/ω0}^{π/ω0} δT0(t) e^{−jnω0t} dt = 1/T0.     (Note: ω0 = 2π/T0.)

Therefore,

δT0(t) = (1/T0) Σ_{n=−∞}^{∞} e^{jnω0t}.

Taking the Fourier transform gives

F{δT0(t)} = ∫_{−∞}^{∞} (1/T0) Σ_{n=−∞}^{∞} e^{−j(ω−nω0)t} dt
          = (1/T0) Σ_{n=−∞}^{∞} ∫_{−∞}^{∞} e^{−j(ω−nω0)t} dt
          = (2π/T0) Σ_{n=−∞}^{∞} δ(ω − nω0).

Hence, this gives an impulse train in the frequency domain (an equal contribution from all multiples of the fundamental frequency). However, we rarely ever have a perfect impulse train in the time domain, because the vocal cords can only open at a finite speed. So what we get is more like a filtered impulse train (c.f. Figure 51) in the frequency domain.

Since we will be creating a filter to model the effect of the vocal tract, we can simply assume that the filtering effect created by having an imperfect impulse train is included in that filter.

Figure 51: Perfect and imperfect impulse train for voiced speech.

5.2.2 The vocal tract


Our ability to distinguish between different voiced phonemes is due to the filtering effect the vocal tract has on the impulse train. The sound echoes around the vocal tract and is affected by the shape of the throat, nose, mouth, the position of the tongue and lips, and even the shape of the head and sinus cavities.

Figure 52 shows segments of voiced and unvoiced speech. We can see the impulse train in the frequency domain and how it is shaped (filtered) by the vocal tract. The reason the impulses in the frequency domain are not quite impulses is due to spectral smearing. (The spectrum was estimated using a finite window length.)

• Notice (by looking at the envelope of the spectrum) how the vocal tract filter transfer function appears to have a number of peaks and valleys.

• These peaks are called formants.

Figure 53 shows the spectrogram of a speech segment. In the voiced sections, the frequency impulses (vertically spaced dark patches) are clearly visible. In the unvoiced segments there are no impulses, and the high frequency regions contain more power than the low frequency regions.

Figure 52: Example of voiced speech segment (left) for the phoneme /o/ and the averaged periodogram estimate of its power spectrum (right). The pitch harmonics are visible. We also see how the vocal tract has filtered the harmonics.

Figure 53: Steve Jobs says 'Hi'. Spectrogram of a speech segment. The pitch harmonics are clearly visible. The formant structure is somewhat visible.

5.3 Linear predictive speech coding
• We have just observed how the spectrum of a phoneme is composed of an impulse train of harmonics of the fundamental frequency.

• But we also saw how the spectrum of the excitation signal, ε[n], which should be `flat' for an impulse train or for white Gaussian noise, is shaped by the vocal tract.

• The peaks represent resonances in the vocal tract. These are preferred frequencies of vibration within the vocal cavity.

• We have seen from the properties and design of filters section that we can create a filter whose transfer function has a number of peaks and valleys by placing poles near the unit circle. The nearer the pole is to the unit circle, the larger the peak.

So we could model the vocal tract filter with an all-pole filter:

H̃(z) = G / (1 − Σ_{k=1}^{p} b[k] z^{−k}),

where G is the gain (using only poles will make the gain everywhere greater than unity, so G < 1 to compensate). If the samples of the recorded speech are written as y[n] and we call the impulse train or the noise excitation ε[n], then they are related by:

Ỹ(z) = H̃(z) ε̃(z)
Ỹ(z) = G ε̃(z) / (1 − Σ_{k=1}^{p} b[k] z^{−k})
Ỹ(z) (1 − Σ_{k=1}^{p} b[k] z^{−k}) = G ε̃(z)
Ỹ(z) = G ε̃(z) + Σ_{k=1}^{p} b[k] Ỹ(z) z^{−k}

Taking the inverse z-transform gives

y[n] = Σ_{k=1}^{p} b[k] y[n − k] + G ε[n].

• We see that the current speech sample, y[n], is very much dependent on previous samples, y[n − k]. This verifies our belief that speech is quite a redundant signal. The number of previous samples needed depends on the order, p, of the filter. Usually about p = 12 will do.

• If we knew the impulse train period, or the noise source power, and if we also knew the filter coefficients, b[k], we could predict what the next speech sample would be.

• This type of encoding is called linear prediction, since we predict the current sample using a linear combination of the previous outputs.
• The hope is that it will take fewer bits to encode the pitch period, or noise source power, and the filter coefficients for a short section of speech than it would to encode the actual speech samples.

• In addition, the filter parameters give us some way to quantify the difference between different phonemes (for speech recognition, etc.).
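The difference equation y[n] = Σ_k b[k] y[n − k] + G ε[n] can be simulated directly. A small Python sketch (the 2nd-order coefficients, gain and pitch period are illustrative choices, not values from the notes):

```python
import numpy as np

def synthesise(b, G, eps):
    """Run the all-pole speech model y[n] = sum_k b[k]*y[n-k] + G*eps[n]."""
    y = np.zeros(len(eps))
    for n in range(len(eps)):
        acc = G * eps[n]
        for k in range(1, len(b) + 1):
            if n - k >= 0:
                acc += b[k - 1] * y[n - k]
        y[n] = acc
    return y

# Voiced excitation: impulse train with an 80-sample pitch period
# (100 Hz at an 8 kHz sampling rate).
eps = np.zeros(800)
eps[::80] = 1.0
b = [1.3, -0.8]            # a stable, resonant 2nd-order vocal-tract stand-in
y = synthesise(b, 0.1, eps)
```

Each impulse excites a decaying resonance, giving the kind of periodic waveform structure seen later in Figure 56.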

5.3.1 Linear prediction coefficients

So how do we find the linear prediction coefficients, b[k]? We take a short section of speech samples, say 50 ms. Sampling at 8000 Hz will give 400 samples in this section of speech. We want to find the parameters b[k] which will best model this section of speech. We write our prediction for the next speech sample, ŷ[n], according to the linear prediction model as

ŷ[n] = Σ_{k=1}^{p} b[k] y[n − k] + G ε[n].

But we've recorded the speech and we know what that sample actually is (y[n]). So we

can compare it with our prediction to get an error signal, e[n]:

e[n] = y[n] − ŷ[n]
     = y[n] − Σ_{k=1}^{p} b[k] y[n − k] − G ε[n]

We want to find the b[k] which minimises the expected mean square error,

E{e²[n]} = E{ ( y[n] − Σ_{k=1}^{p} b[k] y[n − k] − G ε[n] )² }.

Differentiate w.r.t. b[i]:

∂/∂b[i] E{e²[n]} = E{ ∂/∂b[i] e²[n] }
                = E{ 2 e[n] ∂e[n]/∂b[i] }
                = E{ 2 (−y[n − i]) e[n] }
                = −2 E{ y[n − i] e[n] }

We set the derivative equal to zero to minimise, for i = 1, ..., p:

E{ y[n − i] e[n] } = 0

E{ y[n − i] ( y[n] − Σ_{k=1}^{p} b[k] y[n − k] − G ε[n] ) } = 0     (the bracketed term is y[n] − ŷ[n])

E{ y[n − i] y[n] − Σ_{k=1}^{p} b[k] y[n − i] y[n − k] − G y[n − i] ε[n] } = 0

E{ y[n − i] y[n] } − Σ_{k=1}^{p} b[k] E{ y[n − i] y[n − k] } − G E{ y[n − i] ε[n] } = 0

These expectations will be estimated using the 400 samples from the section we're examining. For voiced sounds, the cross-correlation between the recorded speech and the excitation source should be zero for any lag i > 0, since the impulse train is zero everywhere except at the pulses, and the speech signal can't have a DC value. Hence we may assume E{y[n − i]ε[n]} = 0 for i > 0. This gives

E{ y[n − i] y[n] } = Σ_{k=1}^{p} b[k] E{ y[n − i] y[n − k] }   for i = 1, ..., p

ryy[i] = Σ_{k=1}^{p} b[k] ryy[i − k]   for i = 1, ..., p

We can write these p equations more compactly in matrix form:

[ ryy[0]      ryy[1]      ryy[2]      ···  ryy[p − 1] ] [ b[1] ]   [ ryy[1] ]
[ ryy[1]      ryy[0]      ryy[1]      ···  ryy[p − 2] ] [ b[2] ]   [ ryy[2] ]
[ ryy[2]      ryy[1]      ryy[0]      ···  ryy[p − 3] ] [ b[3] ] = [ ryy[3] ]
[   ...         ...         ...       ...     ...     ] [  ... ]   [  ...   ]
[ ryy[p − 1]  ryy[p − 2]  ryy[p − 3]  ···  ryy[0]     ] [ b[p] ]   [ ryy[p] ]

Ryy b = ryy

So the linear prediction coefficients we want are

b = Ryy^{−1} ryy.

There is an efficient algorithm for solving matrix problems with the above form. It's called the Levinson-Durbin algorithm, and we'll get to it shortly.
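Setting up and solving Ryy b = ryy takes only a few lines. A sketch (the synthetic 2nd-order AR test signal is an illustrative stand-in for a speech segment):

```python
import numpy as np

def lpc(y, p):
    """Estimate b[1..p] by building the Toeplitz autocorrelation
    matrix Ryy and solving the normal equations Ryy b = ryy."""
    r = np.array([np.dot(y[:len(y) - k], y[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:p + 1])

# Sanity check on a known 2nd-order AR process:
# y[n] = 1.3 y[n-1] - 0.8 y[n-2] + e[n].
rng = np.random.default_rng(0)
e = rng.standard_normal(50000)
y = np.zeros_like(e)
for n in range(2, len(y)):
    y[n] = 1.3 * y[n - 1] - 0.8 * y[n - 2] + e[n]
b = lpc(y, p=2)    # should come back close to [1.3, -0.8]
```

The recovered coefficients match the generating model, which is exactly the sense in which the normal equations find the "best" predictor.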

• Figure 54 shows a speech signal, the predicted signal and the error between them for the optimum prediction coefficients.


Figure 54: Example showing the speech waveform y[n] (solid) for the phoneme /o/ and the predicted signal ŷ[n] = Σ_{k=1}^{p} b[k] y[n − k] (dashed) with p = 22. The error signal e[n] = y[n] − ŷ[n] (solid) is also shown. These plots were created using the optimum prediction coefficients, b[k]. We see the error signal looks like noise.

• Figure 55 shows the magnitude and phase of the transfer function H̃(z) for the same short section of speech, and the resulting periodogram when applied to an artificial impulse train.

Figure 55: Transfer function of the filter defined by the linear prediction coefficients for the phoneme /o/. Also shown is the periodogram of the section of speech from which the coefficients were derived.

• The pitch of this speech segment is about 100 Hz. We can synthesise the speech by passing an impulse train (with 10 ms between delta functions) through the filter.

• We must adjust the gain, G, appropriately.

• Figure 56 shows the result of passing an impulse train through the filter.

Figure 56: (Top) Example of speech synthesised using the all-pole filter model (p = 22) derived from the actual speech phoneme /o/. Solid is the impulse train. Dashed is the synthesised speech waveform. (Bottom left) The periodogram of the synthesised speech. (Bottom right) Same periodogram with noise added.

5.4 The Levinson-Durbin algorithm
The Levinson-Durbin recursive algorithm is a method for solving the set of linear equations

T x = y

where T is a Toeplitz matrix (the diagonals which run from upper left to lower right contain the same entry) with a non-zero main diagonal. A special simplified case is when T is symmetric. This is what we use here for the equation Ryy b = ryy:

Algorithm 1 Symmetric Toeplitz Levinson-Durbin recursion

1. i = 0, E0 = ryy[0].

2. i = i + 1.

3. ki = ( ryy[i] − Σ_{j=1}^{i−1} b^{(i−1)}[j] ryy[i − j] ) / E_{i−1}

4. b^{(i)}[i] = ki.

5. For j = 1, ..., i − 1: b^{(i)}[j] = b^{(i−1)}[j] − ki b^{(i−1)}[i − j]

6. Ei = (1 − ki²) E_{i−1}

7. If i < p go to step 2, else terminate.

Example: Levinson-Durbin matrix inversion

The nth sample of a speech signal, y[n], can be estimated as ŷ[n] using the following 3rd order speech model:

ŷ[n] = Σ_{k=1}^{3} b[k] y[n − k] + G ε[n]

If the autocorrelation function of a speech signal y[n] is approximated by ryy[k] = ρ^{|k|}, where ρ < 1, find the coefficients, b[k], which will minimise the expected squared error, E{(y[n] − ŷ[n])²}.

Using the derivation in the notes we get:

[ ryy[0]  ryy[1]  ryy[2] ] [ b[1] ]   [ ryy[1] ]
[ ryy[1]  ryy[0]  ryy[1] ] [ b[2] ] = [ ryy[2] ]
[ ryy[2]  ryy[1]  ryy[0] ] [ b[3] ]   [ ryy[3] ]

[ 1    ρ    ρ² ] [ b[1] ]   [ ρ  ]
[ ρ    1    ρ  ] [ b[2] ] = [ ρ² ]
[ ρ²   ρ    1  ] [ b[3] ]   [ ρ³ ]
If we bash out the inverse we will get:

[ b[1] ]               [ 1    −ρ       0  ] [ ρ  ]   [ ρ ]
[ b[2] ] = 1/(1 − ρ²)  [ −ρ   1 + ρ²  −ρ  ] [ ρ² ] = [ 0 ]
[ b[3] ]               [ 0    −ρ       1  ] [ ρ³ ]   [ 0 ]
But, for a large matrix this is a little more time consuming than it needs to be. We use

the Levinson-Durbin recursion to perform the same inversion:

Iteration 0:
1. E0 = ryy[0] = 1. i = 0.
2. i = i + 1 = 1.

Iteration 1:
3. k1 = ( ryy[1] − Σ_{j=1}^{0} b^{(0)}[j] ryy[1 − j] ) / E0 = ryy[1]/E0 = ρ/1 = ρ.
4. b^{(1)}[1] = k1 = ρ
5. Skip, since i − 1 = 0.
6. E1 = (1 − k1²) E0 = (1 − ρ²) · 1 = 1 − ρ²
7. i < 3, go to step 2.

2. i = i + 1 = 2.

Iteration 2:
3. k2 = ( ryy[2] − b^{(1)}[1] ryy[1] ) / E1 = (ρ² − ρ·ρ) / (1 − ρ²) = 0.
4. b^{(2)}[2] = k2 = 0.
5. b^{(2)}[1] = b^{(1)}[1] − k2 b^{(1)}[1] = ρ − 0·ρ = ρ
6. E2 = (1 − k2²) E1 = (1 − 0)(1 − ρ²) = 1 − ρ².
7. i < 3, go to step 2.

2. i = i + 1 = 3.

Iteration 3:
3. k3 = ( ryy[3] − b^{(2)}[1] ryy[2] − b^{(2)}[2] ryy[1] ) / E2 = (ρ³ − ρ·ρ² − 0·ρ) / (1 − ρ²) = 0.
4. b^{(3)}[3] = k3 = 0.
5. b^{(3)}[2] = b^{(2)}[2] − k3 b^{(2)}[1] = 0 − 0·ρ = 0
   b^{(3)}[1] = b^{(2)}[1] − k3 b^{(2)}[2] = ρ − 0·0 = ρ
6. E3 = (1 − k3²) E2 = (1 − 0)(1 − ρ²) = 1 − ρ².
7. i = 3, terminate.

This gives the solution

[ b[1] ]   [ b^{(3)}[1] ]   [ ρ ]
[ b[2] ] = [ b^{(3)}[2] ] = [ 0 ]
[ b[3] ]   [ b^{(3)}[3] ]   [ 0 ]
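Algorithm 1 translates almost line-for-line into code; a sketch in plain Python, checked against the ryy[k] = ρ^|k| example above:

```python
def levinson_durbin(r, p):
    """Symmetric Toeplitz Levinson-Durbin recursion (Algorithm 1).
    r[0..p] are autocorrelation values; returns [b[1], ..., b[p]]."""
    b = [0.0] * (p + 1)       # b[1..p]; index 0 unused
    E = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(b[j] * r[i - j] for j in range(1, i))) / E
        b_new = b[:]          # step 5 needs the old b^{(i-1)} values
        b_new[i] = k
        for j in range(1, i):
            b_new[j] = b[j] - k * b[i - j]
        b = b_new
        E = (1.0 - k * k) * E
    return b[1:]

# Worked example from the notes: ryy[k] = rho^|k| should give [rho, 0, 0].
rho = 0.9
b = levinson_durbin([rho ** k for k in range(4)], 3)
```

This avoids the explicit matrix inverse: the recursion costs O(p²) operations instead of O(p³).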

5.5 Voiced/unvoiced/silence decision


Before we start trying to code a section of speech we need to decide whether it is voiced, unvoiced or silence. This is usually achieved by examining a number of different features and combining the information they provide in an optimal way (usually using a pattern classifier). Some common features might be:

• The energy in the speech segment,

• Number of zero-crossings,

• Autocorrelation,

• First linear prediction coefficient, b[1],

• etc., etc., etc.

Figure 57 shows how we might use two features (energy and zero-crossings) to make the voiced/unvoiced decision.

• Pattern classifiers, such as neural networks, for example, are used to divide up the feature space accordingly, even when there are multiple features.

• You could add any number of clever measures to aid the voiced/unvoiced decision and the classifier will take care of the rest!
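The two features plotted in Figure 57 are simple to compute per segment. A sketch (the sinusoid and noise below are synthetic stand-ins for voiced and unvoiced speech; a real system would feed these features to a trained classifier):

```python
import numpy as np

def segment_features(y):
    """Return (energy, zero-crossing count) for one speech segment."""
    energy = float(np.sum(y ** 2))
    crossings = int(np.sum(np.abs(np.diff(np.sign(y))) > 0))
    return energy, crossings

rng = np.random.default_rng(1)
n = np.arange(400)                                # 50 ms at 8 kHz
voiced = np.sin(2 * np.pi * 100 / 8000 * n)       # strong, low frequency
unvoiced = 0.1 * rng.standard_normal(400)         # weak, noise-like
e_v, z_v = segment_features(voiced)
e_u, z_u = segment_features(unvoiced)
```

The voiced segment lands in the high-energy/few-crossings corner of the feature space, and the unvoiced one in the low-energy/many-crossings corner, as in Figure 57.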

Figure 57: The voiced/unvoiced/silence decision just using the segment power (*) and number of zero-crossings (o). (The top plot has been normalised to fit everything on the same scale!)

5.6 Estimating the pitch period
• If we happen to find that the speech is voiced, then before we can code it we need to estimate the pitch of the speech.

• In long, clean segments of voiced speech it is quite easy to determine the pitch period by eye.

• If the speech segment is short, or in the presence of noise, it can be difficult to see the pitch period.

• In this case there are numerous ways to estimate the pitch period, some better than others. One common way is to use the autocorrelation function.

• If we have a segment of speech y[n], for n = 1, ..., N, we estimate the autocorrelation function as:

r[k] = (1/(N − |k|)) Σ_{n=1}^{N−1} y[n] y[n − k].

Figure 58 shows how the autocorrelation of a voiced speech segment reveals the pitch period. A simple algorithm that searches for the first peak above a certain threshold can be used to find the peak at a lag equal to the pitch period.

Other ad-hoc methods exist, too. One is used in the LPC-10 speech standard. It is called the average magnitude difference function (AMDF):

AMDF(k) = (1/N) Σ_{n=1}^{N} |y[n] − y[n − k]|

• It is similar to the autocorrelation, except instead of multiplying the signal with a delayed version of itself, we subtract the delayed version.

• When the lag is that of the pitch period, the summation will become small.
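Both estimators can be sketched in a few lines of Python (the search range of 40-160 samples and the synthetic 80-sample-period tone are illustrative choices):

```python
import numpy as np

def pitch_autocorr(y, k_min, k_max):
    """Lag in [k_min, k_max) maximising r[k] = sum y[n] y[n-k] / (N-k)."""
    N = len(y)
    r = [np.dot(y[k:], y[:N - k]) / (N - k) for k in range(k_min, k_max)]
    return k_min + int(np.argmax(r))

def pitch_amdf(y, k_min, k_max):
    """Lag in [k_min, k_max) minimising the average magnitude difference."""
    N = len(y)
    d = [np.mean(np.abs(y[k:] - y[:N - k])) for k in range(k_min, k_max)]
    return k_min + int(np.argmin(d))

n = np.arange(800)
y = np.sin(2 * np.pi * n / 80)     # pitch period of 80 samples
```

Both functions should return 80 for this test tone: the autocorrelation peaks, and the AMDF dips to (almost) zero, at the true pitch period.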

Figure 58: (Top) A short segment of the phoneme /o/. (Bottom) The autocorrelation of
the speech segment.

6 Image Processing
Processing of images is a fundamental concern of engineering. Some specific applications of interest are:

• Television

• Video

• Remote sensing

• Medical imaging

• Image enhancement and restoration

• Image compression/coding for storage or transmission

• Image Analysis / Machine vision, etc.

Like most engineering disciplines, the number of image processing algorithms and transforms is vast. Here we will touch on the following basic principles:

• Image representation

• Image histograms

• 2-D Fourier transform

• Image filtering

• Discrete cosine transform

• Image compression (JPEG)

• Morphological operations

6.1 Image representation


• We can represent an image with an M ×N matrix, f [m, n].

• For a binary image the matrix will have only f[m, n] = 0 (black) or f[m, n] = 1 (white) entries (c.f. Figure 59). It uses B = 1 bit for every pixel; hence it uses NM bits.

• If we allow different grey levels between black and white we have a greyscale image. Using B = 4 bits will give 2^B = 16 levels (0, ..., 15), which makes the image more `natural looking'. We can increase the number of bits to B = 8, which is usually sufficient (c.f. Figure 60).

Figure 59: 1-bit grey level image

Figure 60: (left) 4-bit image (16 grey levels). (right) 8-bit image (256 grey levels).

• Figure 61 shows an image of a man created using M = 11 and N = 8. We've made the pixels bigger and used fewer of them. We see that the resolution is too low to be useful.

Figure 61: Low resolution image of a man. M = 11 and N = 8.

• We can also represent colour images by using three M × N matrices, for the red, green and blue intensities. This is called the RGB colour space. Each pixel is represented by a triplet, e.g. White = (255, 255, 255), Black = (0, 0, 0), etc.

• You may sometimes see a different colour space used: the YCbCr colour space. This is from the old Standard-Definition TV broadcast standards.

 - Y is the brightness (luma) component.
 - Cb is the blue chroma component.
 - Cr is the red chroma component.

[ Y  ]   [  65.481   128.553    24.966 ] [ R ]   [  16 ]
[ Cb ] = [ −37.797   −74.203   112     ] [ G ] + [ 128 ]    where R, G, B ∈ (0, 1)
[ Cr ]   [ 112       −93.786   −18.214 ] [ B ]   [ 128 ]

• Alternatively, we could represent a colour image with one matrix by using a colour palette:

 - Black = 0
 - Dark red = 1
 - Light green = 2
 - etc...

6.2 Image histograms
• Let the variable R represent the various grey-levels in the image f[m, n]. We can normalise R to lie in the range (0, 1) by dividing each pixel value by 2^B − 1, the maximum possible value, where R = 0 is black and R = 1 is white.

• Therefore the set of grey-levels in the image (corresponding to the pixel brightness

values) will be a collection of samples of the random variable R.

• If we assume the bit-depth is large enough that R is approximately a continuous

variable, the image may be characterised by the PDF, fR (R = r), of the random

variable R. For notational convenience we just write fR (R = r) = fR (r).

• The PDF, fR(r), can be used in various ways. For example, we can determine the average brightness of the image as E{R}:

E{R} = ∫_{r=0}^{1} r fR(r) dr ≈ (1/(2^B − 1)) (1/(NM)) Σ_{m=1}^{M} Σ_{n=1}^{N} f[m, n].

• This allows us to quantitatively say whether an image is `bright' (e.g. E {R} = 0.8)
or `dark' (e.g. E {R} = 0.2).

• Usually the actual PDF is not available to us and we must either (1) choose a PDF which best fits the image (Gaussian mixture model, etc.), or (2) we must estimate the PDF non-parametrically.

• A popular non-parametric method for PDF estimation is the histogram.

• The histogram is constructed by simply counting the number of occurrences of each pixel brightness, and finally dividing the entire histogram by NM (the number of pixels), so that the integration of the histogram is 1 (because the area under the PDF = 1).

• The histogram can also give a good indication of the contrast of an image. If the

histogram is concentrated over a small number of brightness values the contrast will

be low (c.f. Figure 63).

• In comparison, if the image uses all of the available brightness values the contrast will be high (c.f. Figure 64).

• We can manipulate the contrast of the image by transforming the pixel values. We

can use any mapping that sends the pixel values from the domain (0, 1) to the range

(0, 1). We write this transform S = T (R), where T (·) is the transformation function.
S is a random variable which denotes the brightness of the pixels in the transformed

image. Figure 65 shows the arbitrary transformation S = T (R).
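The normalised histogram and the E{R} estimate above can be computed directly; a minimal sketch (the 4 × 4 test image is made up for illustration):

```python
import numpy as np

def normalised_histogram(img, B=8):
    """Count occurrences of each of the 2^B grey levels and divide
    by the number of pixels NM, so the histogram sums to 1."""
    counts = np.bincount(img.ravel(), minlength=2 ** B)
    return counts / img.size

def mean_brightness(img, B=8):
    """E{R}: mean pixel value normalised to (0, 1) by 2^B - 1."""
    return float(img.mean()) / (2 ** B - 1)

img = np.array([[0, 0, 128, 255],
                [64, 128, 128, 255],
                [0, 64, 192, 255],
                [64, 128, 192, 255]], dtype=np.uint8)
h = normalised_histogram(img)
```

Here h sums to 1, and mean_brightness gives a single number that quantifies whether the image is `bright' or `dark'.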

Figure 62: Sample (un-normalised) histogram for the 8-bit image in Figure 60.

Figure 63: Low contrast image and its histogram.

Figure 64: High contrast image and its histogram. Some saturation is evident in the
brighter areas.

Figure 65: An arbitrary grey-level transformation function, S = T(R) = R^{0.5}.

 - In fact, the Cathode Ray Tube (CRT) used in TV sets performs such a transformation, since the electron beam intensity, I_CRT, is related to the applied voltage, V_CRT, by

I_CRT = k V_CRT^γ,

where γ is dependent on the CRT, and k is some constant. To compensate, Gamma Correction is applied so that

V_CRT = V_IMAGE^{1/γ},

and the image brightness is unaltered at the receiving end.

• From probability theory we can show that, given a random variable X with a PDF fX(x), if this variable is transformed to get Y = g(X), for some function g(·), then the PDF of Y is given by

fY(y) = fX(x1) / |dg(x)/dx|_{x=x1} + ... + fX(xn) / |dg(x)/dx|_{x=xn},

where x1, ..., xn are the roots of the equation g(x) = y. If we assume we are using a monotonic transformation S = T(R), then there is only one root of the equation T(r) = s; we call this solution r1. Hence we can write the PDF of the transformed image as

fS(s) = fR(r1) / |dT(r)/dr|_{r=r1}.     (11)

Example: Grey-level transformation


Suppose we have an image whose PDF, fR(r), is uniform over the range (0, 1), i.e.

fR(r) = 1   for 0 ≤ r ≤ 1
      = 0   otherwise.

If we use the following transformation,

S = T(R) = 0               for R < 0.25
         = 2(R − 0.25)     for 0.25 ≤ R ≤ 0.75
         = 1               for R > 0.75

We see that for s = 0 there are an infinite number of solutions to T(r) = s = 0 along the line 0 < r < 0.25. Similarly for s = 1. Therefore in these regions we have dT(r)/dr = 0, which gives

fS(0) = ( ∫_{r=0}^{0.25} fR(r) dr ) / 0
Figure 66: Sample grey-level transformation

which is meaningless! But we can calculate the PDF in these regions, since

fS(0) = P(0 < R < 0.25) = ∫_{r=0}^{0.25} fR(r) dr = 0.25.

Similarly, fS(1) = 0.25.

In the center region, dT(r)/dr = 2, and so

fS(s) = fR(r)/2 = 0.5.

So, ∫_{−∞}^{∞} fS(s) ds = 1 as required; however, the low intensity and high intensity regions have been saturated. Meanwhile, the contrast of the middle intensity values has been increased. Figure 67 shows the result of the above transformation applied to the sample image.

Figure 67: Image transformed to saturate the high and low intensity regions

Example: Image equalisation

One particular image enhancement we might like to perform is image equalisation. This means transforming the image PDF so that all intensities provide an equal contribution.

This means we want

fS(s) = 1 for 0 ≤ s ≤ 1.

But we saw in Equation (11) that

fS(s) = fR(r1) / |dT(r)/dr|_{r=r1}   where T(r1) = s.

If we choose

s = T(r) = FR(r) = ∫_{w=−∞}^{r} fR(w) dw,

which is the cumulative distribution function (CDF) for the random variable R, then we have:

fS(s) = fR(r1) / |dT(r)/dr|_{r=r1}
      = fR(r1) / |dFR(r)/dr|_{r=r1}
      = fR(r1) / |fR(r)|_{r=r1}
      = fR(r1) / fR(r1)
      = 1.

The idea is that the new image has a uniform histogram.
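For a discrete image the CDF mapping s = FR(r) becomes a lookup table built from the cumulative histogram; a minimal sketch (the low-contrast test image is synthetic):

```python
import numpy as np

def equalise(img, B=8):
    """Histogram equalisation: map grey level r to (2^B - 1) F_R(r),
    where F_R is the empirical CDF of the pixel values."""
    levels = 2 ** B
    hist = np.bincount(img.ravel(), minlength=levels) / img.size
    cdf = np.cumsum(hist)
    lut = np.round((levels - 1) * cdf).astype(np.uint8)
    return lut[img]

# A low-contrast image: all values squeezed into roughly 100..131.
rng = np.random.default_rng(2)
img = (100 + 32 * rng.random((64, 64))).astype(np.uint8)
out = equalise(img)
```

The equalised image uses (nearly) the full brightness range, as in Figure 68.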

Figure 68: Equalised image.

6.3 2-D Fourier transform
6.3.1 Continuous 2D Fourier transform
• Up until now we have been using the 1-D Fourier transform to transform functions of one variable (time), e.g.

F(jω) = ∫_{−∞}^{∞} f(t) e^{−jωt} dt.

• However, the Fourier transform is easily defined over many variables. Take two variables, x and y, for example,

F(jωx, jωy) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} f(x, y) e^{−jωx x} e^{−jωy y} dy dx

x could denote the x-co-ordinate of an image and y the y-co-ordinate. f(x, y) would be the grey-level intensity at that point. Here x and y are continuous variables, so our image would be an analog image (on photographic film, for instance).

• The 1-D Fourier transform is a function of ω. Similarly the 2-D Fourier transform is

a function of ωx and ωy . For an image, where x and y represent positions in space,

ωx and ωy are spatial frequencies, and are measured in units of cycles per meter.

• Figure 69 shows what we might expect the magnitude of the 2-D Fourier transform

to look like for a sample image.

Figure 69: Sketch of the 2-D Fourier transform for the sample image f(x, y).

6.3.2 Discrete 2D Fourier transform


• Similarly we can sample the analog image to obtain a discrete image. The equivalent of the sampling time, T, is the sampling distance, D.

• Since we are sampling in two different directions we have two different sampling distances, Dx and Dy.

• Remember, for a discrete image, f[m, n], m indexes the rows (this is the y direction) and n indexes the columns (this is the x direction).

• We can write the discrete 2-D Fourier transform as:

F̃(e^{jωxDx}, e^{jωyDy}) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} f[m, n] e^{−jωx n Dx} e^{−jωy m Dy}.

 - Notice that we only sum from n = 0, ..., N − 1 and m = 0, ..., M − 1. This is because the image has already been `windowed' with a rectangular window. We can use a 2-D Hamming window if we like. It will have the same kind of effect we spoke about in the Spectral Analysis section: it will reduce spectral leakage, but the frequency resolution will be worsened.

Example: Discrete 2-D Fourier transform

          n=0  n=1
f[m, n] = [ 1    1 ]   m=0
          [ 1    1 ]   m=1

F̃(e^{jωxDx}, e^{jωyDy}) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} 1 · e^{−jωx n Dx} e^{−jωy m Dy}
                        = 1 + e^{−jωxDx} + e^{−jωyDy} + e^{−jωxDx} e^{−jωyDy}

Use the following trick: factor half of each phase term out the front,

F̃(e^{jωxDx}, e^{jωyDy}) = e^{−jωxDx/2} e^{−jωyDy/2} ( e^{jωxDx/2} + e^{−jωxDx/2} ) ( e^{jωyDy/2} + e^{−jωyDy/2} )
                        = e^{−jωxDx/2} e^{−jωyDy/2} · 2 cos(ωxDx/2) · 2 cos(ωyDy/2)
                        = 4 e^{−jωxDx/2} e^{−jωyDy/2} cos(ωxDx/2) cos(ωyDy/2).

So the magnitude response is

|F̃(e^{jωxDx}, e^{jωyDy})| = 4 |cos(ωxDx/2) cos(ωyDy/2)|.

This repeats every 2π, so we only need to plot the range (ωxDx, ωyDy) ∈ [−π, π] × [−π, π].
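The worked example can be checked numerically: the 2 × 2 DFT of the all-ones image samples F̃ at ωxDx, ωyDy ∈ {0, π}, where 4|cos(ωxDx/2) cos(ωyDy/2)| equals 4 at DC and 0 everywhere else:

```python
import numpy as np

f = np.ones((2, 2))
F = np.fft.fft2(f)      # samples the 2-D transform at omega*D in {0, pi}
mag = np.abs(F)
# From the derivation: 4 at (0, 0), and 0 wherever either axis hits pi.
expected = np.array([[4.0, 0.0],
                     [0.0, 0.0]])
```

All the energy of a constant image sits in the DC bin, as the cosine product predicts.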

Figure 70: Magnitude of the 2-D Fourier transform of the sequence f[m, n] = [1 1; 1 1].

6.3.3 The importance of amplitude and phase information


An interesting point to make about images, compared to speech, is the importance of the phase and amplitude information.

Figure 71: An image, the log of the magnitude of the Fourier transform and the phase.

Figure 71 shows an image and its Fourier transform. Figure 72 shows the image reconstructed using just amplitude information or just phase information. We see the importance of phase, which is not present in our perception of sound.
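The experiment behind these figures can be sketched in a few lines of NumPy (a random array stands in here for the real image):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((16, 16))   # stand-in for a real greyscale image

F = np.fft.fft2(img)

# Reconstruction from magnitude only (phase forced to zero) ...
mag_only = np.fft.ifft2(np.abs(F)).real
# ... and from phase only (magnitude forced to one).
phase_only = np.fft.ifft2(np.exp(1j * np.angle(F))).real
```

With both pieces intact the inverse transform recovers the image exactly; for natural images, discarding the phase is far more damaging to the visual structure than discarding the magnitude.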

6.4 Image filtering


Just as we performed a 1-D filtering of a signal using convolution with the filter impulse response,

\[
y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k],
\]
Figure 72: Reconstructed image using just magnitude (phase = 0) or just phase information
(magnitude=constant). We see the importance of the phase information.

We can similarly perform image filtering using 2-D convolution with the filter impulse response:

\[
g[m,n] = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} h[u,v]\, f[m-u, n-v].
\]

As with time signals, the resulting Fourier transform is given by:

G̃(ejωx Dx , ejωy Dy ) = H̃(ejωx Dx , ejωy Dy )F̃ (ejωx Dx , ejωy Dy ).

Example: low-pass image filtering


We filter the image, f[m,n], below, using the filter h[m,n], which is a low-pass filter with a DC gain of 4. The resulting image, g[m,n], is smoother and 4 times brighter. Notice g[3,2] = 288. This would be saturated to 255 in practice.

f[m,n]:

        n=0   n=1   n=2   n=3
m=0       8    10    15    16
m=1      10    15    16    20
m=2      10    20    30   100
m=3       3   215    23    37

h[m,n]:

        n=0   n=1
m=0       1     1
m=1       1     1

g[m,n]:

        n=0   n=1   n=2   n=3
m=0       8    18    25    31
m=1      18    43    56    67
m=2      20    55    81   166
m=3      13   248   288   190

• The result is a blurring of the sharp lines and the attenuation of any high frequency

detail.

• Also, we usually normalise the coefficients so the DC gain is unity. Hence we would use

\[
h = \begin{bmatrix} 0.25 & 0.25 \\ 0.25 & 0.25 \end{bmatrix}.
\]
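A direct NumPy transcription of the 2-D convolution sum reproduces the table above (using the unnormalised kernel, so the DC gain of 4 is visible):

```python
import numpy as np

f = np.array([[ 8,  10, 15,  16],
              [10,  15, 16,  20],
              [10,  20, 30, 100],
              [ 3, 215, 23,  37]])
h = np.ones((2, 2), dtype=int)   # unnormalised low-pass kernel, DC gain 4

M, N = f.shape
g = np.zeros_like(f)
for m in range(M):
    for n in range(N):
        for u in range(h.shape[0]):
            for v in range(h.shape[1]):
                # Zero-pad f outside its support.
                if 0 <= m - u < M and 0 <= n - v < N:
                    g[m, n] += h[u, v] * f[m - u, n - v]
```

g[3, 2] comes out as 288, which an 8-bit display would saturate to 255.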

Figure 73: Image before and after low-pass filtering with $h = \begin{bmatrix} 0.25 & 0.25 \\ 0.25 & 0.25 \end{bmatrix}$.

Example: high-pass image filtering


The 3×3 image filter,

\[
h[m,n] = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix},
\]

can be shown to have the magnitude frequency response:

\[
|\tilde{H}(e^{j\omega_x D_x}, e^{j\omega_y D_y})| = 4 - 2\cos(\omega_x D_x) - 2\cos(\omega_y D_y).
\]

This is plotted in Figure 74 over the range $(\omega_x D_x, \omega_y D_y) \in [-\pi, \pi] \times [-\pi, \pi]$.

• This filter has a DC gain of zero.

• It amplifies high frequency components.

• As a result all edges in the image are retained.

• Edge enhancement is usually the first step in image segmentation.

 MPEG-4 supports segmentation of the image into different regions. Fast changing regions are allowed more bits/s.
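The stated frequency response is easy to verify numerically; the sketch below centres the kernel on the origin (so the response is purely real) and assumes $D_x = D_y = 1$:

```python
import numpy as np

h = np.array([[ 0, -1,  0],
              [-1,  4, -1],
              [ 0, -1,  0]])

def freq_resp(wx, wy):
    """Frequency response of the Laplacian kernel centred at (0, 0)."""
    out = 0j
    for m in range(-1, 2):
        for n in range(-1, 2):
            out += h[m + 1, n + 1] * np.exp(-1j * (wx * n + wy * m))
    return out

assert np.isclose(freq_resp(0.0, 0.0).real, 0.0)        # zero DC gain
wx, wy = 1.0, 2.0
assert np.isclose(freq_resp(wx, wy),
                  4 - 2 * np.cos(wx) - 2 * np.cos(wy))  # matches the formula
```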


Figure 74: Magnitude of the transform of the Laplacian operator and the effect on the image after filtering.

6.5 Discrete Cosine Transform


6.5.1 1-D Discrete Cosine Transform
• The Discrete Cosine Transform (DCT) is used for image compression in a number of standards, such as JPEG and MPEG.

• There are two main reasons why it is preferred ahead of the DFT:

1. It returns only real numbers, whereas the DFT returns complex numbers.

2. Most of the energy is concentrated at the lower frequencies.

Consider a 1-D time signal. To obtain real coefficients we imagine that we are only looking at 2N samples from a signal which is symmetric about t = 0.

• If we take the Fourier transform of this signal all the sine components will be zero,

since it is an even function.

• We will be left with only cosine components.

Consider taking 4 samples from the signal f (t) in Figure 75.


Figure 75: f2 (t) is the even extension of the signal f (t).

The discrete-time Fourier transform of f2(t) is given as:

\begin{align*}
\tilde{F}_2(e^{j\omega T}) &= \int_{-\infty}^{\infty} f_2(t)\, e^{-j\omega t}\, dt \\
&= \sum_{n=-4}^{3} f_2\!\left((2n+1)\frac{T}{2}\right) e^{-j\omega(2n+1)\frac{T}{2}} \\
&= \sum_{n=-4}^{3} f_2\!\left((2n+1)\frac{T}{2}\right) \cos\!\left(\omega(2n+1)\frac{T}{2}\right) \quad \text{since the function is even,} \\
&= 2\sum_{n=0}^{3} f(nT)\cos\!\left(\omega(2n+1)\frac{T}{2}\right) \\
&= 2\sum_{n=0}^{3} f[n]\cos\!\left(\frac{\omega T}{2}(2n+1)\right)
\end{align*}

So we see that $\tilde{F}_2(e^{j\omega T})$ is real for all $\omega$. By varying $\omega$ over the range $[0, (\frac{N-1}{N})\frac{\pi}{T}]$ we define the Discrete Cosine Transform of the sequence $f[n]$ as:

\[
F[m] = c(m) \sum_{n=0}^{N-1} f[n] \cos\!\left(\frac{(2n+1)m\pi}{2N}\right) \quad \text{for } m = 0, \ldots, N-1,
\]

where

\[
c(m) = \begin{cases} \dfrac{1}{\sqrt{N}} & \text{for } m = 0 \\[4pt] \dfrac{\sqrt{2}}{\sqrt{N}} & \text{for } m \neq 0, \end{cases}
\]

are normalising constants to ensure that the signal energy stays the same after a transformation and inverse transformation.
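The definition translates directly into code; this NumPy sketch also checks the energy-preserving property of the normalising constants:

```python
import numpy as np

def dct(f):
    """Orthonormal DCT as defined above:
       F[m] = c(m) * sum_n f[n] cos((2n+1) m pi / (2N))."""
    N = len(f)
    n = np.arange(N)
    F = np.array([np.sum(f * np.cos((2 * n + 1) * m * np.pi / (2 * N)))
                  for m in range(N)])
    c = np.full(N, np.sqrt(2.0 / N))   # c(m) = sqrt(2/N) for m != 0 ...
    c[0] = 1.0 / np.sqrt(N)            # ... and 1/sqrt(N) for m = 0
    return c * F

f = np.array([1.0, 2.0, 3.0, 4.0])
F = dct(f)
assert np.isclose(np.sum(f ** 2), np.sum(F ** 2))  # energy preserved
```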

6.5.2 2-D Discrete Cosine Transform


We can define the 2-D DCT in a similar way (using an $N \times N$ image) as

\[
F(u,v) = c(u)\,c(v) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f[m,n] \cos\!\left(\frac{(2m+1)u\pi}{2N}\right) \cos\!\left(\frac{(2n+1)v\pi}{2N}\right)
\]

for

\[
u, v = 0, \ldots, N-1
\]

and

\[
c(m) = \begin{cases} \dfrac{1}{\sqrt{N}} & \text{for } m = 0 \\[4pt] \dfrac{\sqrt{2}}{\sqrt{N}} & \text{for } m \neq 0. \end{cases}
\]

• In effect we are correlating the image with $N^2$ basis functions:

\[
e_{u,v}[m,n] = \cos\!\left(\frac{(2m+1)u\pi}{2N}\right) \cos\!\left(\frac{(2n+1)v\pi}{2N}\right).
\]

• Figure 76 shows the 64 basis functions used to transform an 8×8 image.

6.5.3 Image compression


• Suppose we take the low resolution 32 × 32 pixel image of Figure 77.

• This image has 32 × 32 = 1024 pixels, and therefore there are 1024 DCT coefficients.

• We notice that most of the energy in the coefficients is concentrated around the lower values of u and v.

• Let's just take the 100 coefficients corresponding to 0 ≤ u, v < 10.

• We see that while we are using 10 times fewer bits to represent the image, it is still recognisably similar to the original.
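Because the 2-D DCT is separable, it can be written as $F = C f C^T$ for an orthonormal DCT matrix $C$, and the keep-the-low-corner idea above can be sketched as follows (a random array stands in for the 32×32 image):

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT matrix: C[m, n] = c(m) cos((2n+1) m pi / (2N))."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None]
                                  * np.pi / (2 * N))
    C[0, :] = 1.0 / np.sqrt(N)
    return C

N = 32
rng = np.random.default_rng(1)
img = rng.random((N, N))        # stand-in for the 32x32 image

C = dct_matrix(N)
F = C @ img @ C.T               # 2-D DCT via two 1-D transforms
F_kept = np.zeros_like(F)
F_kept[:10, :10] = F[:10, :10]  # keep the 100 coefficients 0 <= u, v < 10
approx = C.T @ F_kept @ C       # inverse 2-D DCT
```

For a real image the low-frequency corner carries most of the energy, so `approx` stays recognisably close to `img`; for this random stand-in the error is of course larger.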

6.6 JPEG
• The DCT forms the main building block of the JPEG (Joint Photographic Experts Group) compression standard.

 This is not to be confused with JPEG2000, which uses a wavelet transform and operates on the entire image at once.

Figure 76: 64 basis functions for N = 8. For example the basis for (u, v) = (0, 0) is at the
top left.

Figure 77: 32 × 32 image and a compressed version using just 100 of the DCT coecients.

• The JPEG standard breaks the image into blocks of 8 × 8 pixels and performs a DCT
on each block.

• The majority of the high frequency coefficients are effectively thrown away.

• Figure 78 shows an enlarged picture of a JPEG encoded image. If you look closely

you can see the 8×8 blocks.
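A minimal sketch of this block structure (not the full JPEG pipeline, which also quantises and entropy-codes the coefficients) transforms each 8×8 block, zeroes the high-frequency coefficients, and inverts:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT matrix used to transform each block."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None]
                                  * np.pi / (2 * N))
    C[0, :] = 1.0 / np.sqrt(N)
    return C

C8 = dct_matrix(8)
rng = np.random.default_rng(2)
img = rng.random((32, 32))      # dimensions assumed to be multiples of 8

out = np.zeros_like(img)
keep = 4                        # keep only the 4x4 low-frequency corner per block
for i in range(0, img.shape[0], 8):
    for j in range(0, img.shape[1], 8):
        B = C8 @ img[i:i + 8, j:j + 8] @ C8.T  # 8x8 block DCT
        B[keep:, :] = 0.0                      # discard high-frequency rows...
        B[:, keep:] = 0.0                      # ...and columns
        out[i:i + 8, j:j + 8] = C8.T @ B @ C8  # inverse DCT of the block
```

The blocky artefacts visible in Figure 78 arise precisely because each 8×8 block is approximated independently.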

111
Figure 78: Enlarged JPEG encoded image. The 8×8 blocks are just about evident.

112
