
Lecture

DIGITAL PROCESSING
OF
SPEECH AND IMAGE SIGNALS
RWTH Aachen, WS 2006/7

Prof. Dr.-Ing. H. Ney, Dr. rer. nat. R. Schlüter
Lehrstuhl für Informatik 6
RWTH Aachen

1. System Theory and Fourier Transform


2. Discrete Time Systems
3. Spectral Analysis
4. Fourier Transform and Image Processing
5. LPC Analysis
6. Wavelets
7. Coding
8. Image Segmentation and Contour-Finding

Completions: L. Welling, A. Eiden; April 1997
Completions: J. Dahmen, F. Hilger, S. Koepke; May 2000
Completions: F. Hilger, D. Keysers; July 2001
Translation: M. Popovic, R. Schlüter; April 2003
Corrections: D. Stein; October 2006

Literature:
A. V. Oppenheim, R. W. Schafer: Discrete Time Signal Processing,
Prentice Hall, Englewood Cliffs, NJ, 1989.
A. Papoulis: Signal Analysis, McGraw-Hill, New York, NY, 1977.
A. Papoulis: The Fourier Integral and its Applications, McGraw-Hill
Classic Textbook Reissue Series, McGraw-Hill, New York, NY, 1987.
W. K. Pratt: Digital Image Processing, Wiley & Sons Inc, New York,
NY, 1991.
Further reading:
T. K. Moon, W. C. Stirling: Mathematical Methods and Algorithms
for Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2000.
J. R. Deller, J. G. Proakis, J. H. L. Hansen: Discrete-Time Processing
of Speech Signals, Macmillan Publishing Company, New York, NY,
1993.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery: Numerical Recipes in C, Cambridge Univ. Press, Cambridge, 1992.
L. Rabiner, B. H. Juang: Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
T. Lehmann, W. Oberschelp, E. Pelikan, R. Repges: Bildverarbeitung
für die Medizin, Springer Verlag, Berlin, 1997.
L. Berg: Lineare Gleichungssysteme mit Bandstruktur, VEB Deutscher
Verlag der Wissenschaften, Berlin, 1986.

Contents

1 System Theory and Fourier Transform
1.1 Introduction
1.2 Linear time-invariant Systems
1.3 Fourier Transform
1.4 Properties of the Fourier Transform
1.5 Parseval Theorem
1.6 Autocorrelation Function
1.7 Existence of the Fourier Transform
1.8 δ-Function
1.9 Motivation for Fourier Series
1.10 Time Duration and Band Width

2 Discrete Time Systems
2.1 Motivation and Goal
2.2 Digital Simulation using Discrete Time Systems
2.3 Examples of Discrete Time Systems
2.4 Sampling Theorem (Nyquist Theorem) and Reconstruction
2.5 Logarithmic Scale and dB
2.6 Quantization
2.7 Fourier Transform and z-Transform
2.8 System Representation and Examples
2.9 Discrete Time Signal Fourier Transform Theorem
2.10 Discrete Fourier Transform: DFT
2.11 DFT as Matrix Operation
2.12 From Continuous Fourier Transform to Matrix Representation of Discrete Fourier Transform
2.13 Frequency Resolution and Zero Padding
2.14 Finite Convolution
2.15 Fast Fourier Transform (FFT)
2.16 FFT Implementation
2.17 Cyclic Matrices and Fourier Transform

3 Spectral Analysis
3.1 Features for Speech Recognition
3.2 Short Time Analysis and Windowing
3.3 Autocorrelation Function and Power Spectral Density
3.4 Spectrograms
3.5 Filter Bank Analysis
3.6 Mel-frequency scale
3.7 Cepstrum
3.8 Statistical Interpretation of the Cepstrum Transformation
3.9 Energy in acoustic Vector

4 Fourier Transform and Image Processing
4.1 Spatial Frequencies and Fourier Transform for Images
4.2 Discrete Fourier Transform for Images
4.3 Fourier Transform in Computer Tomography
4.4 Fourier Transform and RST Invariance

5 LPC Analysis
5.1 Principle of LPC Analysis
5.2 LPC: Covariance Method
5.3 LPC: Autocorrelation Method
5.4 LPC: Interpretation in Frequency Domain
5.5 LPC: Generative Model
5.6 LPC: Alternative Representations

6 Outlook: Wavelet Transform
6.1 Motivation: from Fourier to Wavelet Transform
6.2 Definition
6.3 Discrete Wavelet Transform

7 Coding (appendix available as separate document)

8 Image Segmentation and Contour-Finding

List of Figures

1.1 Oscillograms of three time functions composed as sum of 20 partial oscillations. a) $\varphi_n = 0$, b) $\varphi_n = \pi/2$, c) $\varphi_n$ statistical.
1.2 Amplitude spectrum of a time function composed as sum of 20 partial tones.
1.3 From left to right: original photo, low-pass and high-pass filtered version.
1.4 Phase manipulation for portion of a speech signal (vowel "o") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT.
1.5 Phase manipulation for portion of a speech signal (consonant "n") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT.
1.6 Phase manipulation for a Heaviside function (step function).
1.7 Schematic representation of the physiological mechanism of speech production.

2.1 Digital photo.
2.2 Gradient image.
2.3 Several real cases of Laplace Operator subtraction from original image. a) Original image, b) original image minus Laplace Operator (negative values are set to 0 and values above the grey scale are set to the highest grade of grey).
2.4 Ideal reconstruction of a band-limited signal (from Oppenheim, Schafer): a) original signal, b) sampled signal, c) reconstructed signal.
2.5 Sampling of band-limited signal with different sampling rates: b) sampling rate higher than Nyquist rate, exact reconstruction possible; c) sampling rate equal to Nyquist rate, exact reconstruction possible; d) sampling rate smaller than Nyquist rate, aliasing, exact reconstruction not possible.
2.6 Amplitude spectrum of the voiceless phoneme "s" from the word "ist".
2.7 Logarithmic amplitude spectrum of the phoneme "s".
2.8 Amplitude spectrum of the voiced phoneme "ae" from the word "Ah".
2.9 Logarithmic amplitude spectrum of the phoneme "ae".
2.10 Amplitude spectrum of a speech pause.
2.11 Logarithmic amplitude spectrum of a speech pause.
2.12 Hanning window.
2.13 Example of a linear convolution of two finite length signals: a) two signals; b) signal $x[n-k]$ for different values of $n$: i) $n < 0$, no overlap with $h[k]$, therefore convolution $y[n] = 0$; ii) $n$ between 0 and $N_h + N_x - 2$, convolution $\neq 0$; iii) $n > N_h + N_x - 2$, no overlap with $h[k]$, convolution $y[n] = 0$; c) resulting convolution $y[n]$.
2.14 Flow diagram for decomposition of one $N$-DFT to two $N/2$-DFTs with $N = 8$.
2.15 Flow diagram of an 8-point FFT using butterfly operations.
2.16 Flow diagram of an 8-point FFT using butterfly operations.
2.17 Input and output arrays of an FFT. a) The input array contains $N$ ($N$ is a power of 2) complex input values in one real array of length $2N$, with alternating real and imaginary parts. b) The output array contains the complex Fourier spectrum at $N$ frequency values, again with alternating real and imaginary parts. The array begins with the zero frequency and then goes up to the highest frequency, followed by the values for the negative frequencies.

3.1 Example for the application of the Discrete Fourier Transform (DFT).
3.2 a) signal $v[n]$; b) DFT spectrum $V[k]$; c) Fourier spectrum $V(e^{j\omega})$.
3.3 a) signal $v[n]$; b) DFT spectrum $V[k]$; c) Fourier spectrum $V(e^{j\omega})$.
3.4 a) DFT of length $N = 64$; b) DFT of length $N = 128$; c) Fourier spectrum $V(e^{j\omega})$.
3.5 Influence of the window function: above: speech signal (vowel "a"); central: 512-point FFT using rectangle window; below: 512-point FFT using Hamming window.
3.6 Fourier Transform of a voiced speech segment: a) signal progression, b) high resolution Fourier Transform, c) low resolution Fourier Transform with short Hamming window (50 sampled values), d) low resolution Fourier Transform using autocorrelation function (19 coefficients), e) low resolution Fourier Transform using autocorrelation function (13 coefficients).
3.7 Signal progression and autocorrelation function of voiced (left) and unvoiced (right) speech segment.
3.8 Temporal progression of speech signal and four autocorrelation coefficients.
3.9 a) Wide-band spectrogram: short time window, high time resolution (vertical lines), no frequency resolution; for voiced signals provides information on formant structure. b) Narrow-band spectrogram: long time window, no time resolution, high frequency resolution (horizontal lines); for voiced signals provides information on fundamental frequency (pitch).
3.10 Wide-band and narrow-band spectrogram and speech amplitude for the sentence "Every salt breeze comes from the sea."
3.11 Above: logarithmized power spectrum of a spoken vowel (schematic). Below: corresponding cepstrum (inverse Fourier transform of the logarithmized power spectrum).
3.12 Cepstral smoothing: speech signal (vowel "a"), windowed speech signal (Hamming window), spectrum obtained from the whole cepstrum (blue) and smoothed spectrum obtained from the first 13 cepstral coefficients (red).
3.13 Homomorph analysis of a speech segment: signal progression, homomorph smoothed spectrum using 13 and 19 cepstral coefficients.

4.1 TV image (analog).
4.2 Digitized TV image.
4.3 Amplitude spectrum of Figure 4.2.
4.4 Low-pass filtered.
4.5 High-pass filtered.
4.6 High-pass enhancement.

5.1 LPC analysis of one speech segment: a) signal progression, b) prediction error (K=12), c) LPC spectrum with K=12 coefficients, d) spectrum of the prediction error (K=12), e) LPC spectrum with K=18 coefficients.
5.2 LPC spectra for different prediction orders K.

List of Tables

2.1 Fourier transform pairs
2.2 Fourier transform theorems

Chapter 1
System Theory and Fourier
Transform

Overview:
1.1 Introduction
1.2 Linear time-invariant Systems
1.3 Fourier Transform
1.4 Properties of Fourier Transform
1.5 Parseval Theorem
1.6 Autocorrelation Function
1.7 Existence of the Fourier Transform
1.8 δ-Function
1.9 Fourier Series
1.10 Duration and Band Width

Digital Processing of Speech and Image Signals

WS 2006/2007

1.1 Introduction

What distinguishes the Fourier Transform (FT) from other transformations?

1. Mathematical property of linear time-invariant systems:
   - The FT decomposes the time signal into eigenfunctions.
   - Eigenfunctions keep their form when passing through a linear time-invariant system (analogy in linear algebra: $A x = \lambda x$).
   - The magnitude of the FT is shift invariant.
2. Physical observation:
   - The human ear performs a kind of FT and essentially evaluates only the magnitude of the FT (strictly speaking: a short-time FT).

Example:
Time functions with different waveforms can sound the same. For stationary sounds, the human ear senses phase differences between the partial tones only very weakly or not at all.

Fourier transform in speech processing:
- Calculation of the spectral components of speech
- Basic method for obtaining observations (features) for speech recognition

Figure 1.1: Oscillograms of three time functions composed as sum of 20 partial oscillations. a) $\varphi_n = 0$, b) $\varphi_n = \pi/2$, c) $\varphi_n$ statistical (random).

Figure 1.2: Amplitude spectrum of a time function composed as sum of 20 partial tones.

Figure 1.3: From left to right: original photo, low-pass and high-pass filtered version.

Figure 1.4: Phase manipulation for portion of a speech signal (vowel "o") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT. Panels: amplitude spectrum; original signal; inverse FT for three phase settings, among them $\varphi(f) = 0$ and random $\varphi(f)$.

Figure 1.5: Phase manipulation for portion of a speech signal (consonant "n") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT. Panels as in Figure 1.4.

Figure 1.6: Phase manipulation for a Heaviside function (step function). Panels as in Figure 1.4.

Why Fourier?

Roughly:
Production, description and algorithmic operations on signals (functions or measurement curves over the time axis) can be described very well in the Fourier domain (frequency domain).

Deeper reason:
- Production, description and algorithmic operations on signals are largely based on linear time-invariant (LTI) operations.
- Fourier Transform: simple representation of LTI operations (later: convolution theorem).

Why continuous?
- The real world is continuous.
- The computer (digital = time discrete = sampled) only provides a model of the real world.

Figure 1.7: Schematic representation of the physiological mechanism of speech production. a) Block diagram: glottal pulses (period $T$, harmonic spacing $1/T$), vocal tract filter, radiation from lips and nose, speech [a:]; with the magnitude spectra $|E(f)|$, $|V(f)|$, $|A(f)|$ and $|S(f)|$ in dB. b) Anatomical model with the labels: muscle force, lung volume, trachea and bronchi, vocal cords, larynx tube, pharynx cavity, velum, nasal cavity, nose output, mouth cavity, tongue hump, mouth output.

Processing chain:

signal (speech, image)
-> feature extraction (signal analysis)
-> feature vector (pattern vector)
-> (pattern) comparison with reference data (vectors, features)
-> decision

Examples:
- Spoken language
- Written numbers (letters)
- Cell recognition (red blood cells)

Examples of applications of the Fourier Transform:
- Electrical switchgears
- Recognition and coding
  - Speech and general acoustic signals
  - Image signals
- Time series analysis:
  - Astronomical measurement curves
  - Stock-market prices
  - ...
- Computer tomography
- Solving differential equations
- Description of image formation in optical systems

1.2 Linear time-invariant Systems

Examples:
- speech production
- electrical systems

Block diagram: input signal $x(t)$ -> system $h(t)$ -> output signal $y(t)$

symbolic:
\[ \{t \to y(t)\} = S\{t \to x(t)\} \]
simplified:
\[ y(t) = S\{x(t)\} \]
Note: the complete time domain of the function is important, not individual positions in time $t$.
more exact:
\[ y = S\{x\} \]

LTI system (LTI = Linear Time-Invariant):

Linear:
- Additive:
\[ S\{x_1 + x_2\} = S\{x_1\} + S\{x_2\} \]
- Homogeneous:
\[ S\{\alpha x\} = \alpha\, S\{x\}, \qquad \alpha \in \mathbb{R} \]

Time-invariant:
\[ \{t \to y(t - t_0)\} = S\{t \to x(t - t_0)\}, \qquad t_0 \in \mathbb{R} \]

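Mathematical theorem:
Linearity and time invariance result in the convolution representation.
Output signal $y(t)$ of an LTI system $S$ with input signal $x(t)$:
\[ y(t) = \int_{-\infty}^{\infty} x(t-\tau)\, h(\tau)\, d\tau = \int_{-\infty}^{\infty} x(\tau)\, h(t-\tau)\, d\tau = x(t) * h(t) \]
$h$: impulse response of the system $S$

Figure: sketch of the elementary function $e_{\Delta\tau}(t)$ (rectangular pulse of height $1/\Delta\tau$) and of the signal $x(t)$.

System response $h_{\Delta\tau}(t)$ to the excitation $e_{\Delta\tau}(t)$:
\[ h_{\Delta\tau}(t) = S\{e_{\Delta\tau}(t)\} \]
The signal $x(t)$ is represented as a sum of amplitude-weighted and time-shifted elementary functions $e_{\Delta\tau}(t)$:
\[ x(t) = \lim_{\Delta\tau \to 0} \Big[ \sum_i x(i\Delta\tau)\, e_{\Delta\tau}(t - i\Delta\tau)\, \Delta\tau \Big] \]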

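Hence the following holds for the output signal $y(t)$:
\[ y(t) = S\{x(t)\} = S\Big\{ \lim_{\Delta\tau \to 0} \Big[ \sum_i x(i\Delta\tau)\, e_{\Delta\tau}(t - i\Delta\tau)\, \Delta\tau \Big] \Big\} \]
additivity:
\[ = \lim_{\Delta\tau \to 0} \Big[ \sum_i S\{ x(i\Delta\tau)\, e_{\Delta\tau}(t - i\Delta\tau)\, \Delta\tau \} \Big] \]
homogeneity (for $x(i\Delta\tau)$ and $\Delta\tau$):
\[ = \lim_{\Delta\tau \to 0} \Big[ \sum_i x(i\Delta\tau)\, S\{ e_{\Delta\tau}(t - i\Delta\tau) \}\, \Delta\tau \Big] \]
time invariance:
\[ = \lim_{\Delta\tau \to 0} \Big[ \sum_i x(i\Delta\tau)\, h_{\Delta\tau}(t - i\Delta\tau)\, \Delta\tau \Big] \]
limiting case $\Delta\tau \to 0$:
\[ \sum_i \Delta\tau \to \int d\tau, \qquad i\Delta\tau \to \tau, \qquad h_{\Delta\tau}(t) \to h(t) \]
result:
\[ y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t-\tau)\, d\tau = x(t) * h(t) \]
$h(t)$: impulse response of the system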
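The superposition argument above can be tried out in discrete time. The following is only a minimal sketch, assuming a short made-up impulse response `h` and made-up input sequences `x1`, `x2` (none of these values come from the lecture): every input sample launches a shifted, scaled copy of `h`, and the copies are added up.

```python
def convolve(x, h):
    """Discrete analogue of y(t) = integral x(tau) h(t - tau) dtau."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):        # each input sample x[i] ...
        for k, hk in enumerate(h):    # ... adds a copy of h shifted by i, scaled by x[i]
            y[i + k] += xi * hk
    return y

# Hypothetical example data (assumptions for illustration only).
h = [1.0, 0.5, 0.25]
x1 = [1.0, 2.0, 0.0, -1.0]
x2 = [0.0, 1.0, 1.0, 0.0]

# Additivity: the response to x1 + x2 equals the sum of the single responses.
y_sum = convolve([a + b for a, b in zip(x1, x2)], h)
y_parts = [a + b for a, b in zip(convolve(x1, h), convolve(x2, h))]

# Time invariance: delaying the input by one sample delays the output by one sample.
y_delayed = convolve([0.0] + x1, h)
```

Feeding a unit impulse `[1.0]` into `convolve` returns `h` itself, which is the discrete counterpart of calling $h$ the impulse response.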

Examples of LTI operations:

- Oscillatory systems (electrical or mechanical) with external excitation:
  $x(t)$ -> $h(\tau)$ -> $y(t)$
\[ y(t) = \int_{-\infty}^{\infty} h(t-\tau)\, x(\tau)\, d\tau \]
\[ y''(t) + 2\alpha\, y'(t) + \beta^2\, y(t) = x(t) \]
  $\alpha$, $\beta$: parameters depending on the oscillatory system

- More general electrical engineering systems: high-pass, low-pass, band-pass

- Sliding average value: $x(t) \to y(t) := \bar{x}(t)$
\[ \bar{x}(t) = \frac{1}{T} \int_{-T/2}^{+T/2} x(t+\tau)\, d\tau \]

- Differentiator: $x(t) \to y(t) := x'(t)$

- Comb filter with hypothesized period $T$: $x(t) \to y(t) := x(t) - x(t-T)$

- In general: linear differential equations with coefficients $c_k$ and $d_l$:
\[ \sum_k c_k\, y^{(k)}(t) = \sum_l d_l\, x^{(l)}(t) \qquad [\,+\ \text{further constraints}\,] \]

Example of a non-linear system:

system: $y(t) = x^2(t)$
\[ x(t) = A \cos(\beta t) \]
\[ \Rightarrow\quad y(t) = A^2 \cos^2(\beta t) = \frac{A^2}{2}\, \big(1 + \cos(2\beta t)\big) \]
Result: frequency doubling.

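1.3 Fourier Transform

Sinusoidal oscillation:
\[ x(t) = A \sin(\omega t + \varphi) \]
- amplitude $A$
- phase (null phase) $\varphi$
- angular frequency $\omega = 2\pi f$

Imaginary unit: $j^2 = -1$, $j \in \mathbb{C}$

Complex representation (unit circle in the complex plane with real part $\cos\alpha$ and imaginary part $\sin\alpha$):
\[ e^{j\alpha} = \cos\alpha + j \sin\alpha, \qquad \alpha \in \mathbb{R} \]
\[ \cos\alpha = \frac{e^{j\alpha} + e^{-j\alpha}}{2} \qquad\text{and}\qquad \sin\alpha = \frac{e^{j\alpha} - e^{-j\alpha}}{2j} \]

Dimension:
\[ \mathrm{DIM}(\omega) \cdot \mathrm{DIM}(t) = 1 \]
\[ \mathrm{DIM}(\omega) = \frac{1}{\mathrm{DIM}(t)} = \frac{1}{[\mathrm{sec}]} = [\mathrm{Hz}] \]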

LTI system:
\[ y(t) = \int_{-\infty}^{\infty} x(t-\tau)\, h(\tau)\, d\tau = x(t) * h(t) \]
Consider the following specific input signal:
\[ x(t) = A\, e^{j(\omega t + \varphi)} \]
For this input signal the output signal becomes:
\[ y(t) = \int_{-\infty}^{\infty} A\, e^{j(\omega (t-\tau) + \varphi)}\, h(\tau)\, d\tau = A\, e^{j(\omega t + \varphi)} \underbrace{\int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau}_{H(\omega) = F\{h(\tau)\}} = x(t)\, H(\omega) \]

Definition of the Fourier transform:
\[ H(\omega) = \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau = F\{h(\tau)\} = F\{\tau \to h(\tau)\} \]
(decomposition into $e^{j\omega\tau}$)

$H(\omega)$ is called the transfer function of the system.

Remark about $x(t) = A\, e^{j(\omega t + \varphi)}$:
- The shape of the input signal $x(t)$, i.e. its frequency $\omega$, remains invariant (eigenfunction).
- Amplitude (intensity) and phase (time shift) of the output depend on $H(\omega)$ (eigenvalue).
(analogy to the eigenvalue problem in linear algebra)
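The eigenfunction property can be checked numerically in a discrete-time setting. The sketch below rests on assumptions not taken from the lecture: a made-up three-tap impulse response `h`, an arbitrary frequency `omega` in radians per sample, and a sum over the taps in place of the integral. For every time index, the ratio of output to input should equal the same complex number $H(\omega)$.

```python
import cmath

h = [0.5, 0.3, 0.2]   # hypothetical impulse response (assumption)
omega = 0.7           # hypothetical angular frequency in rad/sample (assumption)

def x(n):
    """Complex exponential input x[n] = exp(j * omega * n)."""
    return cmath.exp(1j * omega * n)

def y(n):
    """Output of the LTI system: y[n] = sum_k h[k] * x[n - k]."""
    return sum(hk * x(n - k) for k, hk in enumerate(h))

# Discrete counterpart of H(omega) = integral h(tau) exp(-j omega tau) dtau.
H = sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))

# The ratio y[n] / x[n] is the same for all n: the eigenvalue H(omega).
ratios = [y(n) / x(n) for n in range(-3, 4)]
```

The magnitude `abs(H)` is the amplitude change and `cmath.phase(H)` the phase shift that the system applies to this frequency.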

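Remarks

The FT is complex:
\[ H(\omega) = \mathrm{Re}\{H(\omega)\} + j\, \mathrm{Im}\{H(\omega)\} = |H(\omega)|\, e^{j\varphi(\omega)} \]
Amplitude (spectrum):
\[ |H(\omega)| = \sqrt{\mathrm{Re}\{H(\omega)\}^2 + \mathrm{Im}\{H(\omega)\}^2} \]
Phase (spectrum):
\[ \varphi(\omega) = \begin{cases}
\arctan \dfrac{\mathrm{Im}\{H(\omega)\}}{\mathrm{Re}\{H(\omega)\}} & \mathrm{Re}\{H(\omega)\} > 0 \\[2mm]
\arctan \dfrac{\mathrm{Im}\{H(\omega)\}}{\mathrm{Re}\{H(\omega)\}} + \pi & \mathrm{Re}\{H(\omega)\} < 0 \\[2mm]
\pi/2 & \mathrm{Re}\{H(\omega)\} = 0,\ \mathrm{Im}\{H(\omega)\} > 0 \\[1mm]
-\pi/2 & \mathrm{Re}\{H(\omega)\} = 0,\ \mathrm{Im}\{H(\omega)\} < 0
\end{cases} \]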

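Examples of Fourier transforms:

1. Rectangle function
\[ h(t) = \mathrm{rect}\Big(\frac{t}{T}\Big) = \begin{cases} 1, & |t| \le T/2 \\ 0, & |t| > T/2 \end{cases} \]
\[ H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{-T/2}^{T/2} e^{-j\omega t}\, dt = \frac{1}{j\omega} \Big[ e^{j\omega T/2} - e^{-j\omega T/2} \Big] = \frac{2}{\omega} \sin\Big(\frac{\omega T}{2}\Big) = T\, \frac{\sin(\omega T/2)}{\omega T/2} \]
(here: $\mathrm{Im}\{H(\omega)\} = 0$)

Figure: rectangle function $h(t)$ and its spectrum $H(\omega)$.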

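2. Double-sided exponential
\[ h(t) = e^{-\alpha |t|} \qquad\text{with } \alpha > 0 \]
\[ H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt \]
\[ = \int_{0}^{\infty} e^{-(\alpha + j\omega) t}\, dt + \int_{-\infty}^{0} e^{(\alpha - j\omega) t}\, dt \]
\[ = \Big[ \frac{-e^{-(\alpha + j\omega) t}}{\alpha + j\omega} \Big]_{0}^{\infty} + \Big[ \frac{e^{(\alpha - j\omega) t}}{\alpha - j\omega} \Big]_{-\infty}^{0} \]
\[ = \frac{1}{\alpha + j\omega} + \frac{1}{\alpha - j\omega} = \frac{\alpha - j\omega + \alpha + j\omega}{\alpha^2 + \omega^2} = \frac{2\alpha}{\alpha^2 + \omega^2} \]
- Imaginary part equals 0
- Infinite spectrum
- No zeros

Figure: $h(t)$ and $H(\omega)$.

If $h(t)$ is symmetric (i.e. $h(t) = h(-t)$), the imaginary parts drop away and the real part is sufficient.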

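3. Damped oscillations
\[ h(t) = e^{-\alpha |t|} \cos(\beta t) \qquad\text{with } \alpha > 0 \]
\[ H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt \]
\[ = \int_{0}^{\infty} e^{-(\alpha + j\omega) t} \cos(\beta t)\, dt + \int_{-\infty}^{0} e^{(\alpha - j\omega) t} \cos(\beta t)\, dt \]
\[ = \int_{0}^{\infty} e^{-(\alpha + j\omega) t}\, \frac{e^{j\beta t} + e^{-j\beta t}}{2}\, dt + \int_{-\infty}^{0} e^{(\alpha - j\omega) t}\, \frac{e^{j\beta t} + e^{-j\beta t}}{2}\, dt \]
\[ = \ldots \quad\text{(elementary calculation)} \]
\[ = \frac{\alpha}{\alpha^2 + (\omega - \beta)^2} + \frac{\alpha}{\alpha^2 + (\omega + \beta)^2} \]
Limiting case:
\[ H(\omega)\big|_{\omega = \beta} = \frac{1}{\alpha} + \frac{\alpha}{\alpha^2 + (2\beta)^2} \]
which tends towards $\infty$ if $\alpha$ tends towards 0.

Figure: $h(t)$ and $H(\omega)$.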

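4. Modulated rectangle function (truncated cosine)
\[ h(t) = \begin{cases} \cos(\beta t), & |t| \le T/2 \\ 0, & |t| > T/2 \end{cases} \]
\[ H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{-T/2}^{T/2} \cos(\beta t)\, e^{-j\omega t}\, dt \]
\[ = \ldots \quad\text{(elementary calculation)} \]
\[ = \frac{T}{2}\, \frac{\sin\big((\omega - \beta)\, T/2\big)}{(\omega - \beta)\, T/2} + \frac{T}{2}\, \frac{\sin\big((\omega + \beta)\, T/2\big)}{(\omega + \beta)\, T/2} \]

Figure: $h(t)$ and $H(\omega)$ for two parameter settings.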
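The closed form of example 1 (rectangle function) can be cross-checked by evaluating the Fourier integral numerically. This is a rough sketch under assumptions of my own: a midpoint rule with a fixed number of steps and the hypothetical helper names `ft_rect_numeric` and `ft_rect_closed`.

```python
import cmath
import math

def ft_rect_numeric(omega, T, steps=20000):
    """Midpoint-rule approximation of the integral of exp(-j omega t) over [-T/2, T/2]."""
    dt = T / steps
    total = 0j
    for i in range(steps):
        t = -T / 2 + (i + 0.5) * dt
        total += cmath.exp(-1j * omega * t) * dt
    return total

def ft_rect_closed(omega, T):
    """Closed form H(omega) = (2 / omega) * sin(omega * T / 2), with H(0) = T."""
    if omega == 0.0:
        return T
    return (2.0 / omega) * math.sin(omega * T / 2.0)

T = 2.0  # hypothetical width of the rectangle (assumption)
errors = [abs(ft_rect_numeric(w, T) - ft_rect_closed(w, T)) for w in (0.0, 0.5, 3.0, 10.0)]
```

The imaginary part of the numerical result stays at rounding-error level, in line with the remark that $\mathrm{Im}\{H(\omega)\} = 0$ for this symmetric $h(t)$.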

Fourier transform pairs ($u = \omega / 2\pi$)

Function                                         Fourier transform
Rectangle function (from -1/2 to 1/2)            Sinc function $\dfrac{\sin(\pi u)}{\pi u}$
Triangle function                                Squared sinc function
Exponential function $e^{-\alpha |x|}$           $\dfrac{2\alpha}{\alpha^2 + (2\pi u)^2}$
Gaussian function $e^{-\pi x^2}$                 $e^{-\pi u^2}$
Unit impulse $\delta(x)$                         constant 1

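Inverse Fourier transform
\[ H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt \]
assumption:
\[ h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega \qquad\text{with:}\qquad H(\omega) = \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau \]
inserting $H(\omega)$ in $h(t)$:
\[ h(t) = \frac{1}{2\pi} \lim_{\Omega, T \to \infty} \int_{-\Omega}^{\Omega} \int_{-T}^{T} h(\tau)\, e^{j\omega (t-\tau)}\, d\tau\, d\omega \]
\[ = \frac{1}{2\pi} \lim_{\Omega \to \infty} \lim_{T \to \infty} \int_{-T}^{T} \Big[ \int_{-\Omega}^{\Omega} e^{j\omega (t-\tau)}\, d\omega \Big] h(\tau)\, d\tau \]
\[ = \frac{1}{\pi} \lim_{\Omega \to \infty} \lim_{T \to \infty} \int_{-T}^{T} \frac{\sin(\Omega (t-\tau))}{t-\tau}\, h(\tau)\, d\tau \]
\[ = \lim_{\Omega \to \infty} \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin(\Omega (t-\tau))}{t-\tau}\, h(\tau)\, d\tau \]
\[ = h(t) \]
due to:
\[ \lim_{\Omega \to \infty} \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin(\Omega t)}{t}\, h(t)\, dt = h(0) \]
formal expression:
\[ h(t) = \int_{-\infty}^{\infty} \underbrace{\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{j\omega (t-\tau)}\, d\omega}_{=\ \delta(t-\tau)}\ h(\tau)\, d\tau \]
(distribution theory, see there for a rigorous proof)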

1.4 Properties of the Fourier Transform

Symmetry
\[ H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = F\{h(t)\} \]
\[ h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega = F^{-1}\{H(\omega)\} \]
\[ F^2\{h(t)\} = F\{H(\omega)\} = 2\pi\, h(-t) \]
\[ F^{-1} F\{h(t)\} = F^{-1}\{H(\omega)\} = h(t) \]
Time domain and frequency domain are related symmetrically. Properties of the FT are valid in both domains, especially the convolution theorem (see later).

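Theorems for the Fourier transform
\[ H(\omega) = \int_{-\infty}^{\infty} e^{-j\omega t}\, h(t)\, dt \]
Consider the equation:
\[ H(\omega) = F\{h(t)\} \]
more exact:
\[ \{\omega \to H(\omega)\} = F\{t \to h(t)\} \]

1. Linearity: the integral operator is linear.

2. Inverse scaling, similarity principle:
\[ F\{h(\alpha t)\} = \int_{-\infty}^{\infty} h(\alpha t)\, e^{-j\omega t}\, dt = \frac{1}{|\alpha|} \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau/\alpha}\, d\tau = \frac{1}{|\alpha|}\, H\Big(\frac{\omega}{\alpha}\Big), \qquad \alpha \in \mathbb{R} \setminus \{0\} \]
Note: the absolute value appears because the integral boundaries are swapped for $\alpha < 0$.

3. Shift: $h(t) \to h(t - t_0)$
\[ \int_{-\infty}^{\infty} h(t - t_0)\, e^{-j\omega t}\, dt = e^{-j\omega t_0} \int_{-\infty}^{\infty} h(t - t_0)\, e^{-j\omega (t - t_0)}\, dt = e^{-j\omega t_0} \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau \]
\[ \Rightarrow\quad F\{h(t - t_0)\} = e^{-j\omega t_0}\, H(\omega), \qquad t_0 \in \mathbb{R}, \qquad\text{with } H(\omega) = F\{h(t)\} \]
important:
\[ |F\{h(t - t_0)\}| = |F\{h(t)\}| \]
because
\[ |e^{-j\omega t_0}| = |e^{-ju}| = |\cos u - j \sin u| = \sqrt{\cos^2 u + \sin^2 u} = 1 \]

4. Symmetry and antisymmetry (for real-valued $h(t)$):
- $h(t) = h(-t)$ results in $\mathrm{Im}\{H(\omega)\} = 0$
- $h(t) = -h(-t)$ results in $\mathrm{Re}\{H(\omega)\} = 0$

5. Complex conjugation: suppose that $h(t)$ is a complex function.
\[ \overline{H(\omega)} = \overline{\int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt} = \int_{-\infty}^{\infty} \overline{h(t)}\, e^{j\omega t}\, dt \]
Replacing $\omega$ by $-\omega$ gives
\[ \int_{-\infty}^{\infty} \overline{h(t)}\, e^{-j\omega t}\, dt = \overline{H(-\omega)} \qquad\Rightarrow\qquad F\{\overline{h(t)}\} = \overline{H(-\omega)} \]
Special case: $h(t)$ is real, so $h(t) = \overline{h(t)}$:
\[ \Rightarrow\quad H(\omega) = \overline{H(-\omega)} \quad\Rightarrow\quad |H(\omega)| = |\overline{H(-\omega)}| = |H(-\omega)| \]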

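6. Differentiation:
\[ \frac{dh}{dt} = \frac{1}{2\pi}\, \frac{\partial}{\partial t} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, j\omega\, e^{j\omega t}\, d\omega \]
\[ \Rightarrow\quad F\Big\{\frac{dh(t)}{dt}\Big\} = j\omega\, F\{h(t)\} \]
Interpretation: differentiation = enhancement of high frequencies (due to the multiplication with $\omega$).

7. Integration:
\[ F\Big\{ \int_{-\infty}^{t} h(\tau)\, d\tau \Big\} = \frac{1}{j\omega}\, F\{h(t)\} \]
Proof: similar to differentiation or inversion. (The formula in this form requires $H(0) = 0$; otherwise an additional term $\pi H(0)\, \delta(\omega)$ appears.)

8. Modulation principle:
\[ F\{h(t) \cos(\omega_0 t)\} = \int_{-\infty}^{\infty} h(t) \cos(\omega_0 t)\, e^{-j\omega t}\, dt \]
\[ = \frac{1}{2} \Big[ \int_{-\infty}^{\infty} h(t)\, e^{j\omega_0 t}\, e^{-j\omega t}\, dt + \int_{-\infty}^{\infty} h(t)\, e^{-j\omega_0 t}\, e^{-j\omega t}\, dt \Big] \]
\[ = \frac{1}{2} \Big[ \int_{-\infty}^{\infty} h(t)\, e^{-j(\omega - \omega_0) t}\, dt + \int_{-\infty}^{\infty} h(t)\, e^{-j(\omega + \omega_0) t}\, dt \Big] \]
\[ = \frac{1}{2} \big[ H(\omega - \omega_0) + H(\omega + \omega_0) \big] \]
and similarly
\[ F\{h(t) \sin(\omega_0 t)\} = \frac{1}{2j} \big[ H(\omega - \omega_0) - H(\omega + \omega_0) \big] \]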
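The shift theorem (item 3) has a direct discrete counterpart that is easy to check. The sketch below assumes a naive DFT helper `dft` (hypothetical, written here only for illustration), a made-up sequence `h`, and a circular shift in place of the shift on the real axis.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] exp(-2 pi j k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

h = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]  # hypothetical signal (assumption)
N = len(h)
n0 = 3                                         # shift in samples (assumption)
h_shifted = h[-n0:] + h[:-n0]                  # h_shifted[n] = h[(n - n0) mod N]

H = dft(h)
H_shifted = dft(h_shifted)

# The shift only multiplies each component by a phase factor of magnitude 1.
phase_factors = [cmath.exp(-2j * cmath.pi * k * n0 / N) for k in range(N)]
```

Comparing `abs(H[k])` with `abs(H_shifted[k])` shows identical amplitude spectra; only the phases differ, by the factors in `phase_factors`.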

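Convolution theorem

Block diagram: $x(t)$, $X(\omega)$ -> system $h(t)$, $H(\omega)$ -> $y(t)$, $Y(\omega)$

Convolution in time domain corresponds to multiplication in frequency domain.

Time domain:
\[ y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t-\tau)\, h(\tau)\, d\tau \]
Frequency domain:
\[ Y(\omega) = \int_{-\infty}^{\infty} e^{-j\omega t} \Big[ \int_{-\infty}^{\infty} h(\tau)\, x(t-\tau)\, d\tau \Big] dt \]
\[ = \int_{-\infty}^{\infty} h(\tau) \Big[ \int_{-\infty}^{\infty} x(t-\tau)\, e^{-j\omega t}\, dt \Big] d\tau \]
\[ = \int_{-\infty}^{\infty} h(\tau)\, X(\omega)\, e^{-j\omega\tau}\, d\tau \qquad\text{(shifting)} \]
\[ = X(\omega) \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau \]
\[ = X(\omega)\, H(\omega) \]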

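Likewise, multiplication in time domain corresponds to convolution in frequency domain (note the factor $\frac{1}{2\pi}$):

Time domain:
\[ y(t) = a(t)\, b(t) \]
Frequency domain:
\[ Y(\omega) = \int_{-\infty}^{\infty} a(t)\, b(t)\, e^{-j\omega t}\, dt \]
\[ = \int_{-\infty}^{\infty} a(t) \Big[ \frac{1}{2\pi} \int_{-\infty}^{\infty} B(\tilde\omega)\, e^{j\tilde\omega t}\, d\tilde\omega \Big] e^{-j\omega t}\, dt \]
\[ = \frac{1}{2\pi} \int_{-\infty}^{\infty} B(\tilde\omega) \Big[ \int_{-\infty}^{\infty} a(t)\, e^{-j(\omega - \tilde\omega) t}\, dt \Big] d\tilde\omega \]
\[ = \frac{1}{2\pi} \int_{-\infty}^{\infty} A(\omega - \tilde\omega)\, B(\tilde\omega)\, d\tilde\omega \]
\[ = \frac{1}{2\pi}\, A(\omega) * B(\omega) \]

Motivation for the Fourier transform:
- The FT gives the simplest representation of the system operation, because every LTI system can be interpreted as the convolution of the input signal $x(t)$ with the impulse response $h(t)$ of the system. The convolution can then be calculated efficiently using the FT and the convolution theorem.
- Mathematically: eigenfunctions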
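For sampled, periodic signals the convolution theorem takes the form "DFT of a circular convolution = product of the DFTs". The following sketch illustrates this with a naive DFT; the helper names `dft` and `circular_convolve` and the sample values are assumptions made for illustration, not part of the lecture.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] exp(-2 pi j k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circular_convolve(x, h):
    """Circular convolution y[n] = sum_m h[m] x[(n - m) mod N]."""
    N = len(x)
    return [sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)]

# Hypothetical example sequences of equal length (assumptions).
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 0.0]
h = [0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0, 0.0]

Y_direct = dft(circular_convolve(x, h))                  # transform of the convolution
Y_product = [X * H for X, H in zip(dft(x), dft(h))]      # product of the transforms
```

Both lists agree up to rounding error. This is what makes fast convolution via the FFT possible, as mentioned in the motivation above.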

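Example: Oscillator with excitation

Block diagram: $x(t)$ -> oscillator -> $y(t)$
\[ y''(t) + 2\alpha\, y'(t) + \beta^2\, y(t) = x(t) \]
\[ x(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(\omega)\, e^{j\omega t}\, d\omega \]
\[ y(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} Y(\omega)\, e^{j\omega t}\, d\omega \]
\[ y'(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} Y(\omega)\, j\omega\, e^{j\omega t}\, d\omega \]
\[ y''(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} Y(\omega)\, [-\omega^2]\, e^{j\omega t}\, d\omega \]
Inserting into the differential equation:
\[ \int_{-\infty}^{+\infty} [-\omega^2 + 2j\alpha\omega + \beta^2]\, Y(\omega)\, e^{j\omega t}\, d\omega = \int_{-\infty}^{+\infty} X(\omega)\, e^{j\omega t}\, d\omega \]
\[ \int_{-\infty}^{+\infty} \underbrace{\big\{ [-\omega^2 + 2j\alpha\omega + \beta^2]\, Y(\omega) - X(\omega) \big\}}_{=\ 0}\, e^{j\omega t}\, d\omega = 0 \]
In this way we obtain the transfer function of an oscillator:
\[ H(\omega) = \frac{Y(\omega)}{X(\omega)} = \frac{1}{-\omega^2 + 2j\alpha\omega + \beta^2} \]
\[ h(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} H(\omega)\, e^{j\omega t}\, d\omega \qquad\text{(can be given explicitly)} \]
\[ y(t) = \int_{-\infty}^{+\infty} x(\tau)\, h(t-\tau)\, d\tau \]
Note:
$y(t)$ does not contain the component which corresponds to the homogeneous differential equation of the oscillator.

Diagram: in the time domain, $x(t)$ is mapped to $y(t)$ by convolution with $h(t)$. Equivalently, the Fourier transform maps $x(t)$ to $X(\omega)$, multiplication with $H(\omega) = F\{h(t)\}$ gives $Y(\omega)$, and the inverse Fourier transform returns $y(t)$.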
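The transfer function of the oscillator can be evaluated directly to see the resonance. This sketch assumes the damping parameter is written `alpha` and the frequency parameter `beta` (the symbols are a reconstruction), and the numerical values are made up. Minimizing $|{-\omega^2} + 2j\alpha\omega + \beta^2|^2$ over $\omega^2$ suggests a maximum of $|H(\omega)|$ at $\omega = \sqrt{\beta^2 - 2\alpha^2}$ for weak damping.

```python
import math

def H(omega, alpha, beta):
    """Transfer function 1 / (-omega^2 + 2 j alpha omega + beta^2) of the oscillator."""
    return 1.0 / complex(beta ** 2 - omega ** 2, 2.0 * alpha * omega)

alpha, beta = 0.1, 2.0                          # hypothetical parameters (assumptions)
grid = [i * 0.001 for i in range(1, 5000)]      # frequency grid from 0.001 to 4.999

# Frequency on the grid where the amplitude response is largest.
omega_peak = max(grid, key=lambda w: abs(H(w, alpha, beta)))
omega_expected = math.sqrt(beta ** 2 - 2.0 * alpha ** 2)
```

For small `alpha` the peak sits just below `beta` and grows as the damping is reduced, which matches the limiting behaviour discussed for the damped oscillation.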

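1.5 Parseval Theorem

Convolution theorem:
\[ F^{-1}\{H(\omega)\, X(\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, X(\omega)\, e^{j\omega\tau}\, d\omega = \int_{-\infty}^{\infty} h(t)\, x(\tau - t)\, dt = (h * x)(\tau) \qquad (*) \]
We make two special assumptions:
i) $x(t) := \overline{h(-t)}$, then: $X(\omega) = \overline{H(\omega)}$
ii) $\tau = 0$

Inserting in $(*)$ results in:
\[ \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, \overline{H(\omega)}\, d\omega = \int_{-\infty}^{\infty} h(t)\, \overline{h(t)}\, dt \]
\[ \frac{1}{2\pi} \int_{-\infty}^{\infty} |H(\omega)|^2\, d\omega = \int_{-\infty}^{\infty} |h(t)|^2\, dt = E \]
Energy $E$ in time domain = energy $E$ in frequency domain
(up to the factor $\frac{1}{2\pi}$; aid: use the normalization factor $\frac{1}{\sqrt{2\pi}}$ for both directions of the Fourier Transform)

- Physical aspect: energy conservation
- Mathematical aspect: unitary (orthogonal) representation in vector space

$|H(\omega)|^2$ is called power spectral density.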

1.6 Autocorrelation Function

The autocorrelation function of a time-continuous signal or function $h(t)$ is defined as:
\[ R(t) = \int_{-\infty}^{\infty} h(\tau)\, h(t + \tau)\, d\tau \]
The following equation is valid:
\[ R(t) = h(t) * h(-t) \]
which results in
\[ R(t) = R(-t) \]
The Fourier transform gives (Wiener-Khinchin Theorem), for real-valued $h(t)$:
\[ F\{R(t)\} = H(\omega)\, \overline{H(\omega)} = |H(\omega)|^2 \]
Thus: the Fourier transform connects the autocorrelation function $R(t)$ and the power spectral density $|H(\omega)|^2$:
\[ |H(\omega)|^2 = \int_{-\infty}^{\infty} R(t)\, e^{-j\omega t}\, dt = \int_{-\infty}^{\infty} R(t) \cos(\omega t)\, dt \]
Remark:
The autocorrelation is a special case of the cross correlation between two signals $x(t)$ and $h(t)$:
\[ C_{h,x}(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t + \tau)\, d\tau \]
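A discrete illustration of the Wiener-Khinchin relation: for a real sequence, the DFT of the circular autocorrelation equals the squared magnitude of the DFT of the sequence. The helper `dft`, the circular (periodic) form of the autocorrelation and the sample values are assumptions chosen for this sketch.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] exp(-2 pi j k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

h = [1.0, 0.5, -0.25, 2.0, 0.0, -1.0, 0.75, 0.0]   # hypothetical real signal (assumption)
N = len(h)

# Circular autocorrelation R[n] = sum_m h[m] h[(m + n) mod N].
R = [sum(h[m] * h[(m + n) % N] for m in range(N)) for n in range(N)]

spectrum_of_R = dft(R)                          # Fourier transform of the autocorrelation
power_spectrum = [abs(Hk) ** 2 for Hk in dft(h)]  # |H[k]|^2
```

`R` is symmetric in the circular sense, `R[n] == R[N - n]` up to rounding, so its transform is real, matching the cosine form of the integral above.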

1.7 Existence of the Fourier Transform

Conditions for $h(t)$ for the existence of the Fourier transform:
\[ H(\omega) = \int_{-\infty}^{\infty} e^{-j\omega t}\, h(t)\, dt, \qquad h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{j\omega t}\, H(\omega)\, d\omega \]
When are those equations valid?

Sufficient conditions:
1. $h(t)$ is absolutely integrable:
\[ \int_{-\infty}^{\infty} |h(t)|\, dt < \infty \]
2. $h(t)$ has a finite number of jumps, minima and maxima in each finite interval of $\mathbb{R}$
3. $h(t)$ has no infinite jumps

More general conditions are possible (but they form a rather complex set of conditions):
generalized functions, distributions, definition as functional.

Example: $\delta$-function:
\[ \int_{-\infty}^{\infty} \delta(t)\, h(t)\, dt = h(0) \qquad\text{for all functions } h \]

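Impulse response:
\[ y(t) = \int_{-\infty}^{\infty} h(t-\tau)\, \delta(\tau)\, d\tau = h(t) * \delta(t) = h(t) \]
Consequence for $h(t) \equiv 1$:
\[ \int_{-\infty}^{\infty} \delta(t)\, dt = 1 \]
A function like $\delta(t)$ does not exist. But it is possible to define the functional $\delta$ for each function $t \to h(t)$:
\[ \delta:\ [t \to h(t)] \to \delta(h) := h(0) \]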

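1.8 δ-Function

Starting point: definition of the $\delta$-function as a boundary case of a function $\delta_\epsilon(t)$:
\[ \lim_{\epsilon \to 0} \int_{-\infty}^{+\infty} f(t)\, \delta_\epsilon(t)\, dt = f(0) \qquad (1.1) \]
Possible realizations of $\delta_\epsilon(t)$:

a) \[ \delta_\epsilon(t) = \begin{cases} \dfrac{1}{2\epsilon} & t \in [-\epsilon, +\epsilon] \\ 0 & \text{otherwise} \end{cases} \]
b) \[ \delta_\epsilon(t) = \frac{1}{\pi}\, \frac{\epsilon}{\epsilon^2 + t^2} \]
c) \[ \delta_\epsilon(t) = \frac{1}{\pi}\, \frac{\sin(t/\epsilon)}{t} \]
d) \[ \delta_\epsilon(t) = \frac{1}{\sqrt{2\pi\epsilon^2}}\, e^{-\frac{t^2}{2\epsilon^2}} \]

During inversion of the Fourier transform we have formally obtained:
\[ \delta(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{j\omega t}\, d\omega = \lim_{\Omega \to \infty} \frac{1}{\pi}\, \frac{\sin(\Omega t)}{t} \qquad (1.2) \]
Fourier transform $F\{\delta(t)\}$:
\[ F\{\delta(t)\} = \int_{-\infty}^{+\infty} e^{-j\omega t}\, \delta(t)\, dt \]
Due to (1.1) the following holds:
\[ F\{\delta(t)\} = e^{-j\omega t}\big|_{t=0} = 1 \]
Another derivation using (1.2):
\[ \delta(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{j\omega t}\, F\{\delta(t)\}\, d\omega \qquad\text{(general)} \]
\[ \delta(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{j\omega t}\, d\omega \qquad\text{(according to (1.2))} \]
Comparison results in:
\[ F\{\delta(t)\} = 1 \]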
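The limit (1.1) can be observed numerically with the Gaussian realization d). This is only a sketch under assumptions: the test function $f(t) = \cos t$ is chosen for illustration, the integral is truncated to $[-8\epsilon, 8\epsilon]$, and a simple midpoint rule is used. For this particular $f$, the exact value of the integral is $e^{-\epsilon^2/2}$, which tends to $f(0) = 1$.

```python
import math

def delta_eps(t, eps):
    """Gaussian realization d) of delta_eps(t)."""
    return math.exp(-t * t / (2.0 * eps * eps)) / math.sqrt(2.0 * math.pi * eps * eps)

def smeared_value(f, eps, steps=4000):
    """Midpoint-rule approximation of the integral of f(t) * delta_eps(t) dt."""
    a, b = -8.0 * eps, 8.0 * eps        # the Gaussian is negligible outside this range
    dt = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * dt
        total += f(t) * delta_eps(t, eps) * dt
    return total

f = math.cos                             # hypothetical test function (assumption)
eps_values = (1.0, 0.1, 0.01)
values = [smeared_value(f, eps) for eps in eps_values]
```

As `eps` shrinks, the entries of `values` approach `f(0)`, illustrating that $\delta_\epsilon$ acts more and more like evaluation at $t = 0$.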

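From this we obtain the following equations:

From the symmetry property:
\[ F\{1\} = 2\pi\, \delta(\omega) \]
From the shifting theorem:
\[ F\{e^{j\omega_0 t} \cdot 1\} = 2\pi\, \delta(\omega - \omega_0) \]
\[ \cos(\omega_0 t) = \frac{1}{2} \big[ e^{j\omega_0 t} + e^{-j\omega_0 t} \big] \]
\[ = \frac{1}{2} \Big[ \int_{-\infty}^{+\infty} \delta(\omega - \omega_0)\, e^{j\omega t}\, d\omega + \int_{-\infty}^{+\infty} \delta(\omega + \omega_0)\, e^{j\omega t}\, d\omega \Big] \]
\[ = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \pi\, \big[ \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \big]\, e^{j\omega t}\, d\omega \]
\[ \Rightarrow\quad F\{\cos(\omega_0 t)\} = \pi\, \big[ \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \big] \]
Note: another derivation:
consider the damped oscillation $e^{-\alpha |t|} \cos(\omega_0 t)$ (up to a constant factor) in the limit $\alpha \to 0$.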

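Comb function

Define the comb function (pulse train, sequence of $\delta$-impulses):
\[ x(t) = \sum_{n=-\infty}^{+\infty} \delta(t - nT) \]
Fourier transform of the comb function:
\[ X(\omega) = \int_{-\infty}^{+\infty} x(t)\, e^{-j\omega t}\, dt \]
\[ = \int_{-\infty}^{+\infty} \sum_{n=-\infty}^{+\infty} \delta(t - nT)\, e^{-j\omega t}\, dt \]
\[ = \sum_{n=-\infty}^{+\infty} \int_{-\infty}^{+\infty} \delta(t - nT)\, e^{-j\omega t}\, dt \]
\[ = \sum_{n=-\infty}^{+\infty} e^{-j\omega nT} \]
\[ = \ldots \quad\text{(see Papoulis 1962, p. 44)} \]
\[ = \frac{2\pi}{T} \sum_{n=-\infty}^{+\infty} \delta\Big(\omega - n\, \frac{2\pi}{T}\Big) \]
In words:
A $\delta$-impulse sequence with period $T$ in the time domain produces a $\delta$-impulse sequence with period $\frac{1}{T}$ in the frequency domain (i.e. $\frac{2\pi}{T}$ in the $\omega$-frequency domain): the comb function is transformed to a comb function.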

Figure: transform pairs sketched side by side. Top: comb function $\sum_n \delta(t - nT)$ with impulses at multiples of $T$ and its transform, a comb with impulses at multiples of $2\pi/T$. Below: $\cos(\omega_0 t)$ and $\sin(\omega_0 t)$ with their line spectra at $\pm\omega_0$.

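1.9 Motivation for Fourier Series

\[ x:\ \mathbb{R} \to \mathbb{R}, \qquad t \to x(t) \]
Consider a periodical function $x$ with period $T$:
\[ x(t) = x(t + T) \qquad\text{for each } t \in \mathbb{R} \]
then also
\[ x(t) = x(t + kT) \qquad\text{for } k \in \mathbb{Z} \]
Examples:
- Constant function:
\[ x_0(t) = A_0 \]
- Harmonic oscillator:
\[ x_1(t) = A_1 \cos\Big(\frac{2\pi}{T}\, t + \varphi_1\Big), \qquad A_1 > 0 \]
- All higher harmonics:
\[ x_n(t) = A_n \cos\Big(n\, \frac{2\pi}{T}\, t + \varphi_n\Big), \qquad A_n > 0 \]
therefore
\[ x(t) = \sum_{n=0}^{\infty} A_n \cos(n\, \omega_0 t + \varphi_n) \qquad\text{with } \omega_0 = \frac{2\pi}{T},\quad A_n \ge 0 \]
is periodical with period $T = \frac{2\pi}{\omega_0}$.

Another notation:
\[ x(t) = \sum_{n=-\infty}^{+\infty} B_n\, e^{j n \omega_0 t} \qquad\text{where } B_n \text{ is a complex number} \]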

Line spectrum representation

A real measured signal always has a spread-out spectrum.
Reasons:
- A strictly periodical signal (almost) never exists:
  - The period can fluctuate
  - The wave form within one period can fluctuate
- Only a finite section of the signal is analyzed (window function)

Only a strictly periodical signal has a sharp line spectrum.

Remarks:
- Fourier series are actually not restricted to periodical functions: a finite interval of $\mathbb{R}$ is sufficient (the signal is then interpreted as periodically continued to infinity).
- By the transition from the finite interval to the complete real axis, the Fourier series becomes the Fourier integral.

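Calculation of Fourier coefficients:

Consider a periodic function $x(t)$ with period $T = \frac{2\pi}{\omega_0}$.
Approach:
\[ x(t) = \sum_{n=-\infty}^{+\infty} a_n\, e^{j n \omega_0 t}, \qquad a_n \in \mathbb{C} \]
Multiplication with $e^{-j m \omega_0 t}$, where $m \in \mathbb{Z}$, and integration over one period result in:
\[ \int_{-T/2}^{+T/2} x(t)\, e^{-j m \omega_0 t}\, dt = \sum_{n=-\infty}^{+\infty} a_n \int_{-T/2}^{+T/2} e^{j (n-m) \omega_0 t}\, dt \]
Due to orthogonality the following holds:
\[ \int_{-T/2}^{+T/2} e^{j (n-m) \omega_0 t}\, dt = \begin{cases} T & \text{if } n = m \\ 0 & \text{if } n \neq m \end{cases} \]
Then:
\[ \int_{-T/2}^{+T/2} x(t)\, e^{-j m \omega_0 t}\, dt = a_m\, T \]
Result:
\begin{align*}
a_n &= \frac{1}{T} \int_{-T/2}^{+T/2} x(t)\, e^{-j n \omega_0 t}\, dt \\
&= \frac{1}{T} \int_{-T/2}^{+T/2} x(t) \cos(n\,\omega_0 t)\, dt \;-\; j\, \frac{1}{T} \int_{-T/2}^{+T/2} x(t) \sin(n\,\omega_0 t)\, dt
\end{align*}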

Spectrum of a periodic function

If $x(t)$ is periodic with the period $T = \frac{2\pi}{\omega_0}$, then
\[ x(t) = \sum_{n=-\infty}^{+\infty} a_n\, e^{j n \omega_0 t}, \qquad a_n \in \mathbb{C} \]
The Fourier transform $X(\omega)$ is:
\begin{align*}
X(\omega) &= \mathcal{F}\{x(t)\} \\
&= \sum_{n=-\infty}^{+\infty} a_n\, \underbrace{\mathcal{F}\{e^{j n \omega_0 t}\}}_{=\, 2\pi\,\delta(\omega - n\omega_0)} \\
&= 2\pi \sum_{n=-\infty}^{+\infty} a_n\, \delta(\omega - n\omega_0)
\end{align*}
Note:
This derivation is formal, because the Fourier integral does not exist in the usual sense; a strict derivation is possible within the scope of distribution theory.
In words:
A periodic function with the period $T$ has a Fourier transform in the form of a line spectrum with the distance $\omega_0 = \frac{2\pi}{T}$ between the components.

1.10 Time Duration and Band Width

1. Similarity principle:
\[ \mathcal{F}\{h(\alpha t)\} = \frac{1}{|\alpha|}\, H\!\left(\frac{\omega}{\alpha}\right) \]
[Figure: $h(t)$ and $H(\omega)$; for $0 < \alpha < 1$: $h(\alpha t)$ and $\frac{1}{\alpha} H\!\left(\frac{\omega}{\alpha}\right)$]
With time duration $T$ and band width $B$:
\[ T \cdot B = \text{const.} \]
High resolution in the time domain results in low resolution in the frequency domain and vice versa.

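2. Special case: $h(t)$ with
\[ \mathrm{Im}\{H(\omega)\} = 0 \quad (\Leftrightarrow h(t) \text{ symmetrical}) \qquad \text{and} \qquad \mathrm{Re}\{H(\omega)\} \ge 0 \]
$\Rightarrow$ $h(t)$ has its maximum for $t = 0$:
\[ h(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} H(\omega) \cos(\omega t)\, d\omega \;\le\; \frac{1}{2\pi} \int_{-\infty}^{+\infty} H(\omega)\, d\omega = h(0) \]
Define:
\[ T := \frac{1}{h(0)} \int_{-\infty}^{+\infty} h(t)\, dt, \qquad B := \frac{1}{H(0)} \int_{-\infty}^{+\infty} H(\omega)\, d\omega \]
From
\[ T = \frac{H(0)}{h(0)} \qquad \text{and} \qquad B = 2\pi\, \frac{h(0)}{H(0)} \]
follows
\[ T \cdot B = 2\pi \]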

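3. In general:

Normalized impulse $h(t) \in \mathbb{R}$ with
\[ \int_{-\infty}^{+\infty} h^2(t)\, dt = 1 \]
Define:
\[ T^2 := \int_{-\infty}^{+\infty} h^2(t)\, t^2\, dt \]
\[ B^2 := \int_{-\infty}^{+\infty} |H(\omega)|^2\, \omega^2\, d\omega = 2\pi \int_{-\infty}^{+\infty} [h'(t)]^2\, dt \]
This results in the uncertainty relation:
\[ T \cdot B \ge \sqrt{\frac{\pi}{2}} \]
Proof: Cauchy-Schwarz inequality
\[ |x^T y| \le \|x\| \cdot \|y\| \]
\[ \underbrace{\int_{-\infty}^{+\infty} [\,t\,h(t)\,]^2\, dt}_{=\, T^2} \cdot \underbrace{\int_{-\infty}^{+\infty} [\,h'(t)\,]^2\, dt}_{=\, \frac{B^2}{2\pi}} \;\ge\; \underbrace{\left| \int_{-\infty}^{+\infty} [\,t\,h(t)\,]\, h'(t)\, dt \right|^2}_{=\, \frac{1}{4}} \]
The value of the right-hand side follows from partial integration,
\[ \int u'(t)\, v(t)\, dt = u(t)\, v(t) - \int u(t)\, v'(t)\, dt \]
with $u'(t) = h(t)\,h'(t)$ and $v(t) = t$:
\[ \int_{-\infty}^{+\infty} [\,h(t)\, h'(t)\,]\; t\; dt = \underbrace{\left. \frac{1}{2}\, h^2(t)\; t\, \right|_{-\infty}^{+\infty}}_{=\, 0} - \int_{-\infty}^{+\infty} \frac{1}{2}\, h^2(t) \cdot 1\; dt = -\frac{1}{2} \]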

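The equality sign is valid for linear dependency:
\[ h'(t) = -\lambda\, t\, h(t) \]
\[ \frac{dh}{h} = -\lambda\, t\, dt \]
\[ \log(h) = -\frac{1}{2}\, \lambda\, t^2 + \text{const.}, \qquad \lambda > 0 \]
Optimum $T \cdot B = \sqrt{\frac{\pi}{2}}$ for the Gauss impulse
\[ h(t) = \sqrt[4]{\frac{\lambda}{\pi}}\; e^{-\frac{1}{2} \lambda t^2} \]
Variance: $\sigma^2 = \frac{1}{\lambda}$
Quantum physics: a similar statement holds for the position and momentum of a particle.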

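4. Finite positive signal
\[ g(t) \begin{cases} \ge 0 & 0 \le t \le T \\ = 0 & t < 0 \text{ or } T < t \end{cases} \]
The following is valid for the amplitude spectrum $|G(\omega)|$:
\begin{align*}
|G(\omega)| &= \left| \int_{-\infty}^{+\infty} g(t)\, e^{-j\omega t}\, dt \right| \\
&\le \int_{-\infty}^{+\infty} |g(t)|\, |e^{-j\omega t}|\, dt \\
&= \int_{-\infty}^{+\infty} |g(t)|\, dt \\
&= G(0) \qquad \text{because } g(t) \ge 0
\end{align*}
Define the band width $\omega_B$ as:
\[ |G(\omega_B)|^2 = \frac{G^2(0)}{2} \qquad \text{and} \qquad |G(\omega_B)|^2 \le |G(\omega)|^2 \ \text{ for } |\omega| < \omega_B \]
Then:
\[ T \cdot \omega_B \ge \frac{\pi}{2} \]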

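Proof:
The following inequalities are valid:
\[ a^2 + b^2 \ge \frac{(a - b)^2}{2}, \qquad a, b \in \mathbb{R} \]
\[ |\sin \alpha| + |\cos \alpha| \ge 1, \qquad \alpha \in \mathbb{R} \]
For the Fourier transform of $g(t)$ the following holds:
\[ \mathrm{Re}\{G(\omega)\} = \int_0^T g(t) \cos \omega t\; dt, \qquad \mathrm{Im}\{G(\omega)\} = -\int_0^T g(t) \sin \omega t\; dt \]
For $0 \le \omega t \le \frac{\pi}{2}$ holds: $\cos \omega t \ge 0$, $\sin \omega t \ge 0$, and therefore:
\[ \cos \omega t + \sin \omega t = |\cos \omega t| + |\sin \omega t| \ge 1 \]
\[ \mathrm{Re}\{G(\omega)\} - \mathrm{Im}\{G(\omega)\} = \int_0^T g(t)\, [\cos \omega t + \sin \omega t]\, dt \;\ge\; \int_0^T g(t) \cdot 1\; dt = G(0) \]
\begin{align*}
|G(\omega)|^2 &= \mathrm{Re}^2\{G(\omega)\} + \mathrm{Im}^2\{G(\omega)\} \\
&\ge \frac{1}{2}\, [\,\mathrm{Re}\{G(\omega)\} - \mathrm{Im}\{G(\omega)\}\,]^2 \\
&\ge \frac{1}{2}\, G^2(0) = |G(\omega_B)|^2
\end{align*}
This chain holds for every $\omega$ with $0 \le \omega T \le \frac{\pi}{2}$, so the band width cannot be smaller than $\frac{\pi}{2T}$, i.e. $T \cdot \omega_B \ge \frac{\pi}{2}$.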

Chapter 2
Discrete Time Systems
Overview:
2.1 Motivation and Goal
2.2 Digital Simulation using Discrete Time Systems
2.3 Examples of Discrete Time Systems
2.4 Sampling Theorem and Reconstruction
2.5 Logarithmic Scale and dB
2.6 Quantization
2.7 Fourier Transform and z-Transform
2.8 System Representation and Examples
2.9 Discrete Time Signal Fourier Transform Theorems
2.10 Discrete Fourier Transform (DFT)
2.11 DFT as Matrix Operation
2.12 From continuous FT to Matrix Representation of DFT
2.13 Frequency Resolution and Zero Padding
2.14 Finite Convolution
2.15 Fast Fourier Transform (FFT)
2.16 FFT Implementation


2.1 Motivation and Goal

If we want to process a continuous time signal $x(t)$ with a computer, we have to sample it at discrete equidistant time points
\[ t_n = n \cdot T_S \]
where $T_S$ is called the sampling period.
Terminology:
"Time discrete" is often called "digital", although this adjective often (but not always) also refers to the amplitude quantization, i.e. the quantization of the value $x(n \cdot T_S)$.
Advantages of digital processing in comparison to analog components:
independent of analog components and of the technical difficulties of their realization;
in principle arbitrarily high accuracy;
non-linear methods are also possible, in principle even every mathematical method.

2.2 Digital Simulation using Discrete Time Systems

Task definition:
Given:
Analog system with input signal $x(t)$ and output signal $y(t)$;
sampling with sampling period $T_S$
Wanted:
Discrete system with input signal $x[n]$ and output signal $y[n]$, such that
\[ x[n] = x(nT_S) \]
results in
\[ y[n] = y(nT_S) \]
For which signals is such a digital simulation possible?
The sampling theorem gives (most of) the answer.

LTI system (analogous to the continuous time case):

Linearity:
Homogeneity:
\[ S\{\alpha\, x[n]\} = \alpha\, S\{x[n]\} \]
Additivity:
\[ S\{x_1[n] + x_2[n]\} = S\{x_1[n]\} + S\{x_2[n]\} \]
Shift invariance:
\[ S\{x[n - n_0]\} = y[n - n_0], \qquad n_0 \text{ integer} \]

Representation of an LTI system as discrete convolution:

Unit impulse:
\[ \delta[n] = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases} \]
The signal $x[n]$ is represented with amplitude weighted and time shifted unit impulses $\delta[n]$. The system reacts to $\delta[n]$ with $h[n]$:
\[ h[n] = S\{\delta[n]\} \]
Input signal:
\[ x[n] = \sum_{k=-\infty}^{\infty} x[k]\, \delta[n - k] \]
Output signal:
\begin{align*}
y[n] &= S\left\{ \sum_{k=-\infty}^{\infty} x[k]\, \delta[n - k] \right\} \\
&= \sum_{k=-\infty}^{\infty} S\{\, x[k]\, \delta[n - k]\, \} && \text{(additivity)} \\
&= \sum_{k=-\infty}^{\infty} x[k]\; S\{\, \delta[n - k]\, \} && \text{(homogeneity)} \\
&= \sum_{k=-\infty}^{\infty} x[k]\; h[n - k] && \text{(time invariance)}
\end{align*}
Input signal $x[n]$ and output signal $y[n]$ of a discrete time LTI system are linked through discrete convolution.
$h[n]$ is called the impulse response, as in the continuous time case.
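For signals and impulse responses of finite length the convolution sum can be evaluated directly. The following is a minimal sketch under the assumption that all samples outside the stored ranges are zero; the function name `convolve` and its parameter layout are chosen here for illustration.

```c
#include <stddef.h>

/* y[n] = sum_k h[k] * x[n-k] for a signal x of length nx and an impulse
   response h of length nh. Samples outside the stored ranges are taken
   as zero. The output y must have room for nx + nh - 1 values. */
void convolve(const double *x, size_t nx, const double *h, size_t nh,
              double *y)
{
    size_t n, k;

    for (n = 0; n + 1 < nx + nh; n++) {
        double acc = 0.0;
        for (k = 0; k < nh; k++) {
            if (n >= k && n - k < nx)
                acc += h[k] * x[n - k];
        }
        y[n] = acc;
    }
}
```

Feeding a unit impulse (`x = {1}`) returns the impulse response itself, which mirrors the definition $h[n] = S\{\delta[n]\}$.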

2.3 Examples of Discrete Time Systems

Difference calculation:
\[ y[n] = x[n] - x[n - n_0] \]
1-2-1 averaging:
\[ y[n] = 0.5\, x[n - 1] + x[n] + 0.5\, x[n + 1] \]
Sliding window averaging (smoothing):
\[ y[n] = \frac{1}{2M + 1} \sum_{k=-M}^{M} x[n - k] \]
Weighted averaging: instead of the constant weight
\[ h[n] = \frac{1}{2M + 1} \]
arbitrary weights can be used:
\[ y[n] = \sum_{k=-M}^{M} h[k]\, x[n - k] \]
Note: the only difference from the general case is the finite length of the convolution kernel $h[n]$.
First order difference equation (recursive averaging, averaging with memory):
\[ y[n] - \alpha\, y[n - 1] = x[n] \]
(Digital) resonator (second order difference equation):
\[ y[n] - \alpha\, y[n - 1] - \beta\, y[n - 2] = x[n] \]
Image processing:
Gradient calculation and image enhancement (Roberts Operator, Laplace Operator)

Roberts Cross Operator

[Figure: 2x2 neighbourhood of gray values $x[i, j]$ with the positions $i$, $i+1$ and $j$, $j+1$]
\[ |\nabla x[i, j]|^2 = (x[i, j] - x[i + 1, j + 1])^2 + (x[i, j + 1] - x[i + 1, j])^2 \]
Note: non-linear operation
Simplified:
\[ |\nabla x[i, j]| = |x[i, j] - x[i + 1, j + 1]| + |x[i, j + 1] - x[i + 1, j]| \]
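The simplified Roberts cross operator translates almost literally into code. The following is a minimal sketch; the row-by-row storage of the image, the function name `roberts_cross` and the treatment of the last row and column (set to 0, because the 2x2 neighbourhood is incomplete there) are assumptions made for illustration.

```c
#include <stdlib.h>

/* Simplified Roberts cross magnitude |x[i,j] - x[i+1,j+1]| +
   |x[i,j+1] - x[i+1,j]| for an image stored row by row:
   x[i * width + j] is the gray value at row i, column j.
   out has the same size as x; the last row and the last column are
   set to 0 (a boundary choice made for this sketch). */
void roberts_cross(const int *x, int height, int width, int *out)
{
    int i, j;

    for (i = 0; i < height; i++) {
        for (j = 0; j < width; j++) {
            if (i + 1 < height && j + 1 < width) {
                int d1 = x[i * width + j] - x[(i + 1) * width + (j + 1)];
                int d2 = x[i * width + (j + 1)] - x[(i + 1) * width + j];
                out[i * width + j] = abs(d1) + abs(d2);
            } else {
                out[i * width + j] = 0;
            }
        }
    }
}
```

On a constant image the result is zero everywhere; across a vertical edge both differences contribute, so the edge shows up as a bright line in the gradient image.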

Figure 2.1: Digital photo

Figure 2.2: Gradient image


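Laplace Operator: discrete approximation of the second derivative
\begin{align*}
\nabla^2 x[i, j] &= \nabla_i^2\, x[i, j] + \nabla_j^2\, x[i, j] \\
&= x[i + 1, j] - 2x[i, j] + x[i - 1, j] + x[i, j + 1] - 2x[i, j] + x[i, j - 1] \\
&= x[i + 1, j] + x[i - 1, j] + x[i, j + 1] + x[i, j - 1] - 4x[i, j]
\end{align*}
[Figure: the two one-dimensional stencils $(1, -2, 1)$ along $i$ and along $j$ combine to the two-dimensional stencil with weight $-4$ at $[i, j]$ and weight $1$ at the four neighbours $[i \pm 1, j]$, $[i, j \pm 1]$]
Image enhancement:
\begin{align*}
y[i, j] &= x[i, j] - \nabla^2 x[i, j] \\
&= h[i, j] * x[i, j]
\end{align*}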
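The image enhancement $y[i,j] = x[i,j] - \nabla^2 x[i,j]$ can be sketched as follows. The clipping to the valid gray range follows the description in the caption of Figure 2.3; the function name `laplace_sharpen`, the row-by-row storage and the decision to copy border pixels unchanged are assumptions made for illustration.

```c
/* y[i,j] = x[i,j] - Laplace(x)[i,j], clipped to the range [0, max_gray].
   The image is stored row by row: x[i * width + j].
   Border pixels, where the 4-neighbourhood is incomplete, are copied
   unchanged (a boundary choice made for this sketch). */
void laplace_sharpen(const int *x, int height, int width, int max_gray,
                     int *y)
{
    int i, j;

    for (i = 0; i < height; i++) {
        for (j = 0; j < width; j++) {
            int c = x[i * width + j];
            int v = c;
            if (i > 0 && i + 1 < height && j > 0 && j + 1 < width) {
                int lap = x[(i + 1) * width + j] + x[(i - 1) * width + j]
                        + x[i * width + (j + 1)] + x[i * width + (j - 1)]
                        - 4 * c;
                v = c - lap;
            }
            if (v < 0)
                v = 0;
            if (v > max_gray)
                v = max_gray;
            y[i * width + j] = v;
        }
    }
}
```

A pixel that is brighter than its four neighbours has a negative Laplace value, so subtracting it raises the pixel further and sharpens local contrast.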

Figure 2.3: Several real cases of Laplace Operator subtraction from original image. a)
Original image b) Original image minus Laplace Operator (negative values are set to 0
and values above the grey scale are set to the highest grade of grey)


2.4 Sampling Theorem (Nyquist Theorem) and Reconstruction

The following question will be analyzed:
How should we choose the sampling period $T_S$, if we want to represent a continuous signal $x(t)$ by its sample values $x(nT_S)$ so that the signal $x(t)$ can be exactly reconstructed from its sample values?
Fourier transform of the continuous time signal $x(t)$:
\[ X(\omega) = \mathcal{F}\{x(t)\} = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \]
\[ x(t) = \mathcal{F}^{-1}\{X(\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega \qquad (2.1) \]
The signal $x(t)$ has limited bandwidth with upper limit $\omega_B$, which means:
\[ X(\omega) = 0 \quad \text{for all } |\omega| \ge \omega_B \]
Note: $X(\omega_B) = 0$
$X(\omega)$ in the domain $-\omega_B < \omega < \omega_B$ can be represented as a Fourier series:
\[ X(\omega) = \sum_{n=-\infty}^{\infty} a_n \exp\!\left(-jn\pi\, \frac{\omega}{\omega_B}\right) \qquad (2.2) \]
The coefficients $a_n$ are given by:
\[ a_n = \frac{1}{2\omega_B} \int_{-\omega_B}^{\omega_B} X(\omega) \exp\!\left(jn\pi\, \frac{\omega}{\omega_B}\right) d\omega \qquad (2.3) \]
Comparison of the equations (2.1) and (2.3) shows that the coefficients $a_n$ are given by the values of the inverse Fourier transform $x(t)$ at the points
\[ t_n = \frac{n\pi}{\omega_B} \qquad (2.4) \]
The band limitation of $X(\omega)$ has to be considered for the integration limits in (2.1). Result:
\[ a_n = \frac{\pi}{\omega_B}\; x\!\left(\frac{n\pi}{\omega_B}\right) \qquad (2.5) \]

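Inserting Eq. (2.5) into Eq. (2.2) and then into Eq. (2.1) results in:
\[ x(t) = \frac{1}{2\pi} \int_{-\omega_B}^{\omega_B} \frac{\pi}{\omega_B} \sum_{n=-\infty}^{\infty} x\!\left(\frac{n\pi}{\omega_B}\right) \exp\!\left(-jn\pi\, \frac{\omega}{\omega_B}\right) \exp(j\omega t)\, d\omega \]
After swapping summation and integration and subsequent integration:
\[ x(t) = \sum_{n=-\infty}^{\infty} x\!\left(\frac{n\pi}{\omega_B}\right) \frac{\sin\!\left(\omega_B \left(t - \frac{n\pi}{\omega_B}\right)\right)}{\omega_B \left(t - \frac{n\pi}{\omega_B}\right)} \]
Reconstruction of the signal $x(t)$ from sample values is possible if the equidistant sample values $x\!\left(\frac{n\pi}{\omega_B}\right) = x(n \cdot T_S)$ have the distance $T_S$:
\[ T_S = \frac{\pi}{\omega_B} \qquad (2.6) \]
The sampling period $T_S$ corresponds to the sampling frequency $\omega_S$:
\[ \omega_S = \frac{2\pi}{T_S} \]
Equation (2.6) shows that if the sampling frequency is
\[ \omega_S := 2\, \omega_B \]
the original signal $x(t)$ can be reconstructed exactly.
In the Fourier series representation of $X(\omega)$ in equation (2.2), the period $2\, \omega_B$ has been supposed.
$\omega_B$ is the highest frequency component of the signal $x(t)$.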

Since $X(\omega)$ is equal to zero for $|\omega| \ge \omega_B$, the period $2\, \omega_B$ can be substituted with every period $2\, \tilde{\omega}_B$ where $\tilde{\omega}_B \ge \omega_B$. The previous derivation is also valid for this $\tilde{\omega}_B$.
When
\[ \tilde{\omega}_B = \frac{\pi}{T_S} \]
then:
\[ x(t) = \sum_{n=-\infty}^{\infty} x(n \cdot T_S)\; \frac{\sin(\pi\, (t - n \cdot T_S)/T_S)}{\pi\, (t - n \cdot T_S)/T_S} \qquad \text{(reconstruction formula)} \]
Note: $\lim_{t \to 0} \frac{\sin(t)}{t} = 1$ (l'Hôpital's rule)
The condition $\tilde{\omega}_B \ge \omega_B$ results in:
\[ T_S \le \frac{\pi}{\omega_B} \qquad (2.7) \]
for the sampling period $T_S$ and in:
\[ \omega_S \ge 2\, \omega_B \qquad (2.8) \]
for the sampling frequency $\omega_S$.
The equations (2.7) and (2.8) are denoted as the sampling theorem. The sampling frequency has to be at least twice as high as the upper limit frequency $\omega_B$ of the signal, where $X(\omega) = 0$ for $|\omega| \ge \omega_B$. If and only if this condition is satisfied, an exact reconstruction (without approximation) of a continuous signal $x(t)$ from its sample values $x(nT_S)$ is possible.
Note: The sampling frequency $\omega_S = 2\, \omega_B$ is also called the Nyquist rate.

[Figure 2.4 panels: a) $x(t)$, b) $x_s(t)$, c) $x_r(t)$; sample spacing $T$]
Figure 2.4: Ideal reconstruction of a band-limited signal (from Oppenheim, Schafer)
a) original signal b) sampled signal c) reconstructed signal

[Figure 2.5 panels: a) $X(\omega)$; b) $X_{S1}(\omega)$, $\omega_S > 2\omega_B$; c) $X_{S2}(\omega)$, $\omega_S = 2\omega_B$ (Nyquist rate); d) $X_{S3}(\omega)$, $\omega_S < 2\omega_B$ (aliasing)]
Figure 2.5: Sampling of a band-limited signal with different sampling rates:
a) spectrum $X(\omega)$ of the band-limited signal
b) sampling rate higher than Nyquist rate - exact reconstruction possible
c) sampling rate equal to Nyquist rate - exact reconstruction possible
d) sampling rate smaller than Nyquist rate - aliasing - exact reconstruction not possible

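Another proof using delta- and comb-function:

Sampling of the continuous signal $x(t)$ with $\omega_S = \frac{2\pi}{T_S}$
Band limitation: $X(\omega) = 0$ for $|\omega| \ge \omega_B$
(always possible: analog low-pass with $T(\omega) = 0$ for $|\omega| \ge \omega_B$)
Sampling procedure:
Multiplication of a function with a comb function in the time domain
\[ x_s(t) = T_s\, x(t) \sum_{n=-\infty}^{+\infty} \delta(t - nT_s) \]
results in a convolution with a comb function in the frequency domain:
\begin{align*}
X_s(\omega) &= T_s\, \frac{1}{2\pi}\; X(\omega) * \left[ \frac{2\pi}{T_s} \sum_{n=-\infty}^{+\infty} \delta\!\left(\omega - \frac{2\pi n}{T_s}\right) \right] \\
&= \sum_{n=-\infty}^{+\infty} \int_{-\infty}^{+\infty} X(\tilde{\omega})\; \delta\!\left(\omega - \tilde{\omega} - \frac{2\pi n}{T_s}\right) d\tilde{\omega} \\
&= \sum_{n=-\infty}^{+\infty} X\!\left(\omega - n\, \frac{2\pi}{T_s}\right)
\end{align*}
$\Rightarrow$ the sampled signal has a periodic Fourier spectrum
(Analogy to Fourier series: a periodic signal has a line spectrum, i.e. a discrete spectrum)
No overlap if:
\[ \omega_B \le \omega_S - \omega_B \]
\[ 2\,\omega_B \le \omega_S \]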

In so-called digital simulation, the signal $x(t)$ is represented by its sampled values $x(n \cdot T_S)$ measured at equidistant time points with distance $T_S$. With a proper sampling period $T_S$ an exact reconstruction of the signal $x(t)$ from the sampled values $x(n \cdot T_S)$ is possible.
If it is possible to exactly reconstruct the signal $x(t)$ from the sampled values $x(nT_S)$, then it is possible to perform a discrete time processing of the sampled values $x(n \cdot T_S)$ on a computer, which is equivalent to the continuous time processing of the signal $x(t)$ (digital simulation).
Continuous time processing:
\[ y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau \]
Discrete time processing:
Sampling period $T_S$, $x[n] := x(nT_S)$, $h[n] := h(nT_S)$
\[ y(nT_S) = \sum_{k=-\infty}^{\infty} x(kT_S)\; h(nT_S - kT_S)\; T_S \]
\[ y[n] = \sum_{k=-\infty}^{\infty} x[k]\; h[n - k] \]
(with these definitions $y[n]$ equals $y(nT_S)$ up to the constant factor $T_S$).
As a result of the convolution theorem (convolution in the time domain corresponds to multiplication in the frequency domain), a band-limited input signal gives an output signal that is also band-limited and therefore exactly determined by its sampled values.

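Important:
In the domain $|\omega| < \omega_S/2$ the Fourier transform of a continuous time signal $x(t)$ is identical with the Fourier transform of the corresponding sampled discrete time signal $x(nT_S)$:
\[ X(\omega) = \int_{-\infty}^{\infty} x(t) \exp(-j\omega t)\, dt \]
for $|\omega| \le \omega_S/2$ is identical to
\begin{align*}
T_S\, X_S(\omega) &= T_S \sum_{n=-\infty}^{\infty} x(nT_S) \exp(-j\omega T_S\, n) \\
&= T_S \sum_{n=-\infty}^{\infty} x(nT_S) \exp\!\left(-j\, 2\pi\, \frac{\omega}{\omega_S}\, n\right)
\end{align*}
Inverse Fourier transform of the discrete time signal:
\[ x(nT_S) = \frac{1}{\omega_S} \int_{-\omega_S/2}^{\omega_S/2} X_S(\omega) \exp(j\omega T_S\, n)\, d\omega \]
One period:
\[ -\frac{\omega_S}{2} \le \omega \le \frac{\omega_S}{2} \quad \Leftrightarrow \quad -\pi \le 2\pi\, \frac{\omega}{\omega_S} \le \pi \]
The Fourier transform of a discrete time signal is periodic in $\omega$ with the period $2\pi/T_S = \omega_S$.
The Fourier transform of a discrete time signal is continuous in $\omega$.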

Frequency normalization
Define the normalized frequency $\omega_N$:
\[ \omega_N := 2\pi\, \frac{\omega}{\omega_S} \]
Definition ($\omega$ now denotes a normalized frequency):
Fourier transform of the discrete time signal $x[n]$:
\[ X(e^{j\omega}) = \sum_{n=-\infty}^{+\infty} x[n] \exp(-j\omega n) \]
Note the notation $X(e^{j\omega})$.
Inverse Fourier transform of the discrete time signal $x[n]$:
\[ x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega}) \exp(j\omega n)\, d\omega \]

2.5 Logarithmic Scale and dB

Why?
Large dynamic range for the amplitude values of a signal
\[ x(t) = A \cos(\omega t) \]
$A$ := amplitude (pressure, velocity, inclination, current, voltage, ...), linear variable
$A_0$ := reference amplitude, predefined value for calibration
dB := decibel
\[ A[\mathrm{dB}] := 20\, \lg \frac{A}{A_0} = 10\, \lg \frac{A^2}{A_0^2}, \qquad \lg \equiv \log_{10} \]
$A^2$ = quadratic variable = energy, intensity
Because of $2^{10} = 1024 \cong 10^3$:
1 bit more = factor 2 for amplitude $\cong$ 6 dB = factor 4 for intensity
3 dB $\cong$ factor 2 for intensity

[Figures 2.6 to 2.11: spectra plotted over f / Hz from 0 to 8000; left column amplitude A, right column log A; panel titles "Phonem: s", "Phonem: ae", "Pause"]
Figure 2.6: Amplitude spectrum of the voiceless phoneme s from the word ist
Figure 2.7: Logarithmic amplitude spectrum of the phoneme s
Figure 2.8: Amplitude spectrum of the voiced phoneme ae from the word Ah
Figure 2.9: Logarithmic amplitude spectrum of the phoneme ae
Figure 2.10: Amplitude spectrum of a speech pause
Figure 2.11: Logarithmic amplitude spectrum of a speech pause

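2.6 Quantization

Uniform quantization
[Figure: uniform quantization of the range from $-X_{MAX}$ to $X_{MAX}$]
Quantisation: $\hat{x} = Q(x)$
$B$ bits correspond to $2^B$ quantisation levels
Boundaries:
\[ x_0, x_1, \ldots, x_k, \ldots, x_K \qquad \text{where } K = 2^B \]
Width of one quantisation level using uniform quantisation:
\[ \Delta = \frac{2\, X_{MAX}}{2^B} \]
Quantisation error:
\[ \sigma_e^2 = \int_{-\infty}^{+\infty} (\hat{x} - x)^2\, p(x)\, dx = \sum_{k=1}^{K} \int_{x_{k-1}}^{x_k} (\hat{x}_k - x)^2\, p(x)\, dx \]
For uniform quantisation:
a) $x_k - x_{k-1} = \Delta = \text{const}(k)$
b) $\hat{x}_k = \frac{1}{2}(x_{k-1} + x_k)$
A uniform distribution with $p(x) = \text{const}(x)$ results in:
\[ \sigma_e^2 = \sum_{k=1}^{K} \frac{1}{K}\, \frac{\Delta^2}{12} = \frac{\Delta^2}{12} = \frac{X_{MAX}^2}{3 \cdot 2^{2B}} \]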

Signal-to-noise ratio in dB (general definition):
\[ SNR[\mathrm{dB}] := 10\, \lg \frac{\sigma_x^2}{\sigma_n^2} \]
$\sigma_x^2$ = power of the signal $x$
$\sigma_n^2$ = power of the noise $n$
$SNR$ = signal-to-noise ratio
Signal-to-quantisation noise ratio (special case):
\[ SNR[\mathrm{dB}] := 10\, \lg \frac{\sigma_x^2}{\sigma_e^2} \]
$\sigma_e^2$ = power of the noise caused by quantisation errors
Uniform quantisation using $B$ bits:
\[ SNR[\mathrm{dB}] = 6.02\, B + 4.77 - 20\, \lg \frac{X_{MAX}}{\sigma_x} \]
If the signal amplitude has a Gaussian distribution, only 0.064% of the samples have an amplitude greater than $4\sigma_x$:
\[ SNR[\mathrm{dB}] = 6.02\, B - 7.2 \qquad \text{for } X_{MAX} = 4\sigma_x \]

2.7 Fourier Transform and z-Transform

Transfer function and Fourier transform

Eigenfunctions of discrete linear time invariant systems (analogous to the time continuous case):
\[ x[n] = e^{j\omega n}, \qquad -\infty < n < \infty \]
($\omega$ is dimensionless here)
Proof:
\begin{align*}
x[n] &= e^{j\omega n} \\
y[n] &= \sum_{k=-\infty}^{\infty} h[k]\, e^{j\omega (n-k)} \\
&= e^{j\omega n} \sum_{k=-\infty}^{\infty} h[k]\, e^{-j\omega k}
\end{align*}
Define:
\[ H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\, e^{-j\omega k} \]
Remark:
The Fourier transform of a discrete time signal has already been introduced as a Fourier series during the derivation of the sampling theorem and the reconstruction formula (equation (2.2)).
Result:
\[ y[n] = e^{j\omega n} \cdot H(e^{j\omega}) \]

z-transform:
Fourier transform of a discrete time signal $x[n]$:
\[ X(e^{j\omega}) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j\omega n} \]
periodic in $\omega$
$\omega$ is the normalized frequency, hence:
\[ -\pi \le \omega < \pi \]
$X$ is evaluated on the unit circle ($e^{j\omega}$).
Generalization: $X$ is evaluated for any complex value $z$.
That results in the z-transform:
\[ X(z) = \sum_{n=-\infty}^{+\infty} x[n]\, z^{-n} \]
Reasons for the z-transform:
1. analytically simpler, methods of function theory are applicable
2. better handling of the convergence problem:
   convergence of a finite signal, i.e. $x[n] = 0$ for each $n > N_0$
   convergence of an infinite signal depends on $z$
Inverse z-transform:
\[ x[n] = \frac{1}{2\pi j} \oint X(z)\, z^{n-1}\, dz \]
Formally: $z = e^{j\omega}$, $dz = jz\, d\omega$
\[ x[n] = \frac{1}{2\pi} \int_0^{2\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega \]

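Example of Fourier transform and z-transform:

Truncated geometric series
\[ x[n] = \begin{cases} a^n & 0 \le n \le N - 1 \\ 0 & \text{otherwise} \end{cases} \]
z-transform:
\begin{align*}
X(z) &= \sum_{n=0}^{N-1} a^n z^{-n} = \sum_{n=0}^{N-1} (a\, z^{-1})^n \\
&= \frac{1 - (a\, z^{-1})^N}{1 - a\, z^{-1}} \\
&= \frac{1}{z^{N-1}} \cdot \frac{z^N - a^N}{z - a}
\end{align*}
Fourier transform:
The z-transform yields the Fourier transform using the substitution $z = e^{j\omega}$:
\[ X(e^{j\omega}) = \frac{1 - a^N e^{-j\omega N}}{1 - a\, e^{-j\omega}} \]
Special case for $a = 1$ (discrete time rectangle):
\[ X(e^{j\omega}) = \exp\!\left(-j\omega\, \frac{N - 1}{2}\right) \frac{\sin\!\left(\frac{\omega N}{2}\right)}{\sin\!\left(\frac{\omega}{2}\right)} \]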

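Proof for the z-transform inversion

Statement:
\[ x[k] = \frac{1}{2\pi j} \oint X(z)\, z^{k-1}\, dz \]
Cauchy integration rule:
\[ \frac{1}{2\pi j} \oint z^{-k}\, dz = \begin{cases} 1 & k = 1 \\ 0 & k \neq 1 \end{cases} \]
\begin{align*}
\frac{1}{2\pi j} \oint X(z)\, z^{k-1}\, dz &= \frac{1}{2\pi j} \oint \sum_n x[n]\, z^{-n+k-1}\, dz \\
&= \sum_n x[n]\; \underbrace{\frac{1}{2\pi j} \oint z^{-n+k-1}\, dz}_{\neq 0 \text{ only for } n = k} \\
&= x[k]
\end{align*}
Fourier:
\[ z = e^{j\omega}, \qquad dz = j\, e^{j\omega}\, d\omega \]
The integration path is the unit circle because of $z = e^{j\omega}$. Then:
\begin{align*}
x[n] &= \frac{1}{2\pi j} \int_{-\pi}^{+\pi} X(e^{j\omega})\, (e^{j\omega})^{n-1}\, j\, e^{j\omega}\, d\omega \\
&= \frac{1}{2\pi} \int_{-\pi}^{+\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega
\end{align*}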

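2.8 System Representation and Examples

Example 1: Difference calculation

Difference equation:
\[ y[n] = x[n] - x[n - n_0], \qquad n_0 \text{ integer} \]
The Fourier transform gives:
\begin{align*}
\sum_{n=-\infty}^{\infty} y[n]\, e^{-j\omega n} &= \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} - \sum_{n=-\infty}^{\infty} x[n - n_0]\, e^{-j\omega n} \\
Y(e^{j\omega}) &= X(e^{j\omega}) - \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}\, e^{-j\omega n_0} \\
&= X(e^{j\omega}) - e^{-j\omega n_0}\, X(e^{j\omega})
\end{align*}
Then follows:
\[ H(e^{j\omega}) = \frac{Y(e^{j\omega})}{X(e^{j\omega})} = 1 - e^{-j\omega n_0} \]
\begin{align*}
|H(e^{j\omega})|^2 &= (1 - \cos(\omega n_0))^2 + \sin^2(\omega n_0) \\
&= 1 - 2\cos(\omega n_0) + \cos^2(\omega n_0) + \sin^2(\omega n_0) \\
&= 2\, (1 - \cos(\omega n_0))
\end{align*}
[Figure: $|H(e^{j\omega})|^2$ plotted over $\omega n_0$]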

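Example 2: First order difference equation

[Block diagram: the input $x[n]$ is added to the delayed output $y[n-1]$, weighted with $\alpha$, to give the output $y[n]$]
\[ x[n] + \alpha\, y[n - 1] = y[n] \]
\[ y[n] - \alpha\, y[n - 1] = x[n] \]
Method 1: Determination of the transfer function $H(e^{j\omega})$ from the impulse response $h[n]$:
From the equation above with $y[n] = h[n]$ and $x[n] = \delta[n]$ follows:
\begin{align*}
h[n] &= \delta[n] + \alpha\, h[n - 1] \\
&= \delta[n] + \alpha\, \delta[n - 1] + \alpha^2\, \delta[n - 2] + \cdots \\
&= \begin{cases} \alpha^n, & n \ge 0 \\ 0, & \text{otherwise} \end{cases}
\end{align*}
Fourier spectrum / transfer function $H(e^{j\omega})$:
\begin{align*}
H(e^{j\omega}) &= \sum_{k=-\infty}^{+\infty} h[k]\, e^{-j\omega k} \\
&= \sum_{k=0}^{+\infty} \alpha^k\, e^{-j\omega k} \\
&= \sum_{k=0}^{+\infty} \left( \alpha\, e^{-j\omega} \right)^k \\
&= \frac{1}{1 - \alpha\, e^{-j\omega}} \qquad \text{for } |\alpha| < 1
\end{align*}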

Method 2: Determination of the transfer function $H(e^{j\omega})$ using the Fourier transform of the difference equation:
Difference equation:
\[ y[n] - \alpha\, y[n - 1] = x[n] \]
Fourier transform:
\[ Y(e^{j\omega}) - \alpha\, e^{-j\omega}\, Y(e^{j\omega}) = X(e^{j\omega}) \]
Result:
\[ H(e^{j\omega}) = \frac{Y(e^{j\omega})}{X(e^{j\omega})} = \frac{1}{1 - \alpha\, e^{-j\omega}} \]

Example 3: Linear difference equations (with constant coefficients)

Difference equation:
\[ y[n] = \sum_{i=0}^{I} b[i]\, x[n - i] - \sum_{j=1}^{J} a[j]\, y[n - j] \]
z-transform:
\[ Y(z) = X(z) \sum_{i=0}^{I} b[i]\, z^{-i} - Y(z) \sum_{j=1}^{J} a[j]\, z^{-j} \]
Result:
\[ H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{i=0}^{I} b[i]\, z^{-i}}{1 + \sum_{j=1}^{J} a[j]\, z^{-j}} = \sum_{n=-\infty}^{+\infty} h[n]\, z^{-n} \]
Using the definition of $H(z)$ we can obtain the impulse response as a function of the coefficients of the difference equation in the above term.
Remark:
If we factorise the denominator and numerator polynomials into linear factors, we obtain (up to a constant factor and a power of $z$) a zero-pole representation of a discrete time LTI system:
\[ H(z) = \frac{\prod_{i=1}^{I} (z - v_i)}{\prod_{j=1}^{J} (z - w_j)} \]
with zeros $v_i \in \mathbb{C}$ and poles $w_j \in \mathbb{C}$.

In general:
$h[n]$ has an infinite number of non-zero values
$\Rightarrow$ IIR filter: Infinite Impulse Response
But if $a[j] \equiv 0 \ \forall j$:
$h[n]$ is identical to zero outside of a finite interval
\[ h[n] = \begin{cases} b[n] & n = 0, \ldots, I \\ 0 & \text{otherwise} \end{cases} \]
$\Rightarrow$ FIR filter: Finite Impulse Response
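The difference equation of Example 3 can be evaluated sample by sample. The following is a minimal sketch; the function name `difference_equation`, the array layout (with `a[0]` unused so that indices match the formula) and the assumption of zero initial conditions ($x[n] = y[n] = 0$ for $n < 0$) are choices made here for illustration.

```c
/* Direct evaluation of
     y[n] = sum_{i=0}^{I} b[i] x[n-i] - sum_{j=1}^{J} a[j] y[n-j]
   for n = 0..len-1, assuming x[n] = y[n] = 0 for n < 0.
   b has I+1 entries b[0..I]; a has J+1 entries, a[0] is not used. */
void difference_equation(const double *b, int I, const double *a, int J,
                         const double *x, double *y, int len)
{
    int n, i, j;

    for (n = 0; n < len; n++) {
        double acc = 0.0;
        for (i = 0; i <= I && i <= n; i++)
            acc += b[i] * x[n - i];
        for (j = 1; j <= J && j <= n; j++)
            acc -= a[j] * y[n - j];
        y[n] = acc;
    }
}
```

With a unit impulse as input, the output is the impulse response: for `b = {1}`, `a = {0, -0.5}` it is the infinite sequence $0.5^n$ of the first order system (IIR), while for `J = 0` it reproduces the coefficients $b[n]$ and then stays zero (FIR).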

Example 4:
Impulse response as truncated geometric series
\[ h[n] = \begin{cases} a^n & 0 \le n \le M \\ 0 & \text{otherwise} \end{cases} \qquad a \in \mathbb{R} \]
\[ H(z) = \sum_{n=0}^{M} a^n z^{-n} = \frac{1 - a^{M+1}\, z^{-(M+1)}}{1 - a\, z^{-1}} \]
System operation:
\[ y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n - k] = \sum_{k=0}^{M} a^k\, x[n - k] \]
or as a difference equation (recursively):
\[ y[n] - a\, y[n - 1] = x[n] - a^{M+1}\, x[n - M - 1] \]

For this example we consider the zero-pole representation:
\[ H(z) = \frac{1 - \left(\frac{z}{a}\right)^{-(M+1)}}{1 - \left(\frac{z}{a}\right)^{-1}}, \qquad a > 0 \]
Zeros: $z_k = a \cdot e^{j \frac{2\pi k}{M+1}}$, $k = 0, 1, \ldots, M$
Pole: $z_0 = a$ (cancelled by the zero $z_0 = a$)
[Figure: zeros in the complex plane (Re, Im) for $M = 11$]

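Example 5:
Fibonacci numbers
Difference equation:
\[ h[n + 2] = h[n + 1] + h[n], \qquad n \ge 0 \]
\[ h[0] = h[1] = 1, \qquad h[n] = 0 \ \text{ for } n < 0 \]
\begin{align*}
H(z) &= \sum_{n=-\infty}^{\infty} h[n]\, z^{-n} \\
&= 1 + z^{-1} + \sum_{n=0}^{\infty} h[n + 2]\, z^{-(n+2)} \\
&= 1 + z^{-1} + \sum_{n=0}^{\infty} h[n + 1]\, z^{-(n+2)} + \sum_{n=0}^{\infty} h[n]\, z^{-(n+2)} \\
&= 1 + z^{-1} + z^{-1} \sum_{n=0}^{\infty} h[n + 1]\, z^{-(n+1)} + z^{-2} \sum_{n=0}^{\infty} h[n]\, z^{-n} \\
&= 1 + z^{-1} \underbrace{\left( 1 + \sum_{n=1}^{\infty} h[n]\, z^{-n} \right)}_{H(z)} + z^{-2} \underbrace{\sum_{n=0}^{\infty} h[n]\, z^{-n}}_{H(z)} \\
&= 1 + z^{-1}\, H(z) + z^{-2}\, H(z)
\end{align*}
\[ H(z)\, (1 - z^{-1} - z^{-2}) = 1 \]
\begin{align*}
H(z) &= \frac{1}{1 - z^{-1} - z^{-2}} \\
&= \frac{1}{\sqrt{5}} \left[ \frac{a}{1 - a z^{-1}} - \frac{b}{1 - b z^{-1}} \right]
\end{align*}
where $a = \frac{1 + \sqrt{5}}{2}$ and $b = \frac{1 - \sqrt{5}}{2}$.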

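For $a$ and $b$ the following holds:
\[ \frac{1}{1 - \left(\frac{a}{z}\right)} = \sum_{n=0}^{\infty} \left(\frac{a}{z}\right)^n = \sum_{n=0}^{\infty} a^n z^{-n} \]
That results in:
\[ H(z) = \frac{1}{\sqrt{5}} \sum_{n=0}^{\infty} \left( a^{n+1} - b^{n+1} \right) z^{-n} \overset{!}{=} \sum_{n=0}^{\infty} h[n]\, z^{-n} \]
\[ h[n] = \frac{1}{\sqrt{5}} \left( a^{n+1} - b^{n+1} \right) \]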

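Table 2.1: Fourier transform pairs

      signal                                                    Fourier transform
 1.   $\delta[n]$                                               $1$
 2.   $\delta[n - n_0]$                                         $e^{-j\omega n_0}$
 3.   $1 \quad (-\infty < n < \infty)$                          $\sum_{k=-\infty}^{\infty} 2\pi\, \delta(\omega + 2\pi k)$
 4.   $a^n u[n] \quad (|a| < 1)$                                $\frac{1}{1 - a e^{-j\omega}}$
 5.   $u[n]$                                                    $\frac{1}{1 - e^{-j\omega}} + \sum_{k=-\infty}^{\infty} \pi\, \delta(\omega + 2\pi k)$
 6.   $(n + 1)\, a^n u[n] \quad (|a| < 1)$                      $\frac{1}{(1 - a e^{-j\omega})^2}$
 7.   $\frac{r^n \sin \omega_p (n + 1)}{\sin \omega_p}\, u[n] \quad (|r| < 1)$    $\frac{1}{1 - 2r \cos \omega_p\, e^{-j\omega} + r^2 e^{-j2\omega}}$
 8.   $\frac{\sin \omega_c n}{\pi n}$                           $X(e^{j\omega}) = \begin{cases} 1, & |\omega| < \omega_c \\ 0, & \omega_c < |\omega| \le \pi \end{cases}$
 9.   $x[n] = \begin{cases} 1, & 0 \le n \le M \\ 0, & \text{otherwise} \end{cases}$    $\frac{\sin[\omega (M + 1)/2]}{\sin(\omega/2)}\, e^{-j\omega M/2}$
 10.  $e^{j\omega_0 n}$                                         $\sum_{k=-\infty}^{\infty} 2\pi\, \delta(\omega - \omega_0 + 2\pi k)$
 11.  $\cos(\omega_0 n + \varphi)$                              $\sum_{k=-\infty}^{\infty} \pi\, [\, e^{j\varphi}\, \delta(\omega - \omega_0 + 2\pi k) + e^{-j\varphi}\, \delta(\omega + \omega_0 + 2\pi k)\, ]$

with the unit step
\[ u[n] = \begin{cases} 1, & n \ge 0 \\ 0, & n < 0 \end{cases} \]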

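2.9 Discrete Time Signal Fourier Transform Theorems

Basically there is no difference between the Fourier transform theorems for the continuous time and the discrete time case, because summation has the same properties as integration. Only differentiation and difference calculation are not completely analogous, because it is not possible to form a derivative in the discrete time case.

Table 2.2: Fourier transform theorems

      signal $x[n]$, $y[n]$                       Fourier transform $X(e^{j\omega})$, $Y(e^{j\omega})$
 1.   $a\, x[n] + b\, y[n]$                       $a\, X(e^{j\omega}) + b\, Y(e^{j\omega})$
 2.   $x[n - n_d]$, $n_d$ integer                 $e^{-j\omega n_d}\, X(e^{j\omega})$
 3.   $e^{j\omega_0 n}\, x[n]$                    $X(e^{j(\omega - \omega_0)})$
 4.   $x[-n]$                                     $X(e^{-j\omega})$; $\ X^*(e^{j\omega})$ if $x[n]$ is real
 5.   $n\, x[n]$                                  $j\, \frac{dX(e^{j\omega})}{d\omega}$
 6.   $x[n] * y[n]$                               $X(e^{j\omega})\, Y(e^{j\omega})$
 7.   $x[n]\, y[n]$                               $\frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\theta})\, Y(e^{j(\omega - \theta)})\, d\theta$
 8.   $x[n] - x[n - 1]$                           $(1 - e^{-j\omega})\, X(e^{j\omega})$, with $|1 - e^{-j\omega}|^2 = 2(1 - \cos \omega)$

Parseval theorem
 9.   $\sum_{n=-\infty}^{\infty} |x[n]|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega$
 10.  $\sum_{n=-\infty}^{\infty} x[n]\, y^*[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, Y^*(e^{j\omega})\, d\omega$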

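Example 1 corresponding to Theorem 5:
\begin{align*}
X(e^{j\omega}) &= \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k} \\
\frac{d}{d\omega}\, X(e^{j\omega}) &= \sum_{k=-\infty}^{+\infty} x[k]\, \frac{d}{d\omega}\, e^{-j\omega k} \\
&= \sum_{k=-\infty}^{+\infty} x[k]\, (-jk)\, e^{-j\omega k} \\
j\, \frac{d}{d\omega}\, X(e^{j\omega}) &= \sum_{k=-\infty}^{+\infty} k\, x[k]\, e^{-j\omega k} \\
\mathcal{F}\{n\, x[n]\} &= j\, \frac{d}{d\omega}\, \mathcal{F}\{x[n]\}
\end{align*}
Example 2 corresponding to Theorem 8:
\begin{align*}
\mathcal{F}\{x[n] - x[n - 1]\} &= \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k} - \sum_{k=-\infty}^{+\infty} x[k - 1]\, e^{-j\omega k} \\
&= \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k} - \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k}\, e^{-j\omega} \\
&= X(e^{j\omega}) \left( 1 - e^{-j\omega} \right)
\end{align*}
\begin{align*}
|\mathcal{F}\{x[n] - x[n - 1]\}|^2 &= |\mathcal{F}\{x[n]\}|^2\; |1 - e^{-j\omega}|^2 \\
&= |\mathcal{F}\{x[n]\}|^2\; 2\, (1 - \cos(\omega))
\end{align*}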

2.10 Discrete Fourier Transform: DFT

The Fourier transform for discrete time signals and systems has been explained on the previous pages. For discrete time signals with finite length there is also another Fourier representation, called the Discrete Fourier Transform (DFT).
The DFT plays a central role in digital signal processing.
Decisive reasons:
fast algorithms exist for the DFT calculation (Fast Fourier Transform, FFT);
discrete frequencies $\omega_k$ can be represented better in the computer than continuous frequencies $\omega$.

Assume a discrete time signal $x[n]$ with finite length (see also chapter 3.2 on page 135):
\[ x[n] = \begin{cases} x[n] & 0 \le n \le N - 1 \\ 0 & \text{otherwise} \end{cases} \]
Note: For a continuous time signal it is impossible in the strict sense to be band-limited and time-limited at the same time (truncation effect $\hat{=}$ windowing).
The discrete time signal Fourier transform for $x[n]$ is:
\[ X(e^{j\omega}) = \sum_{n=0}^{N-1} x[n] \exp(-j\omega n) \]
$\omega$ is a continuous variable. The period is $2\pi$. Frequency discretisation is made by sampling along the frequency axis.
The Fourier transform $X(e^{j\omega})$ is evaluated at
\[ \omega_k = \frac{2\pi}{N}\, k \qquad \text{where } k = 0, 1, \ldots, N - 1 \]
Define:
\[ X[k] := X(e^{j\omega})\big|_{\omega = \omega_k} \]
[Figure: the $N = 8$ sampling points $e^{j\omega_k}$ on the unit circle in the complex plane (Re, Im)]

Discrete Fourier Transform (DFT):
\[ X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-j\, \frac{2\pi}{N}\, k\, n\right), \qquad k = 0, 1, \ldots, N - 1 \]
Inverse DFT:
\[ x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \exp\!\left(j\, \frac{2\pi}{N}\, k\, n\right), \qquad n = 0, 1, \ldots, N - 1 \]
Remark:
This equation can be proven by inserting the equation for $X[k]$ into the equation for $x[n]$ and using the orthogonality:
\[ \frac{1}{N} \sum_{n=0}^{N-1} \exp\!\left(j\, \frac{2\pi}{N}\, k\, n\right) = \begin{cases} 1 & k = m \cdot N,\ m \text{ integer} \\ 0 & \text{otherwise} \end{cases} \]
Note:
Consider the analogy between the inverse DFT (above) and the inverse Fourier transform of a discrete time signal:
\[ x[n] = \frac{1}{2\pi} \int_0^{2\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega \]
Under the given conditions the integral is equal to the sum (without approximation!).

Remarks:
The DFT coefficients $X[k]$ are not an approximation of the discrete time signal Fourier transform $X(e^{j\omega})$. On the contrary:
\[ X[k] = X(e^{j\omega})\big|_{\omega = \omega_k} \]
The number of coefficients $X[k]$ depends on the signal length $N$. A finer sampling of the discrete time signal Fourier transform is possible by appending zeros to the signal $x[n]$ (zero padding).
[Figure: signal $x[n]$ for $n = 0, \ldots, N-1$, followed by appended zeros]

Interpretation of Fourier coefficients

Fourier transform $X(e^{j\omega})$ of the time discrete signal $x[n]$
[Figure: $|X(e^{j\omega})|$]
Evaluation at $N$ discrete sampling points
\[ \omega_k = \frac{2\pi}{N}\, k \]
yields the DFT coefficients $X[k]$.
At first $k$ lies in the domain $k = -\frac{N}{2} + 1, \ldots, 0, \ldots, \frac{N}{2}$.
[Figure: $|X(e^{j\omega})|$ and $|X[k]|$ for $k = -N/2+1, \ldots, -1, 0, \ldots, N/2$]

Because of the periodicity of $X(e^{j\omega})$ the coefficients $X[k]$ can also be obtained by shifting the sampling points with negative frequency into the positive frequency domain (by one period).
Then $k = 0, \ldots, N/2, \ldots, N - 1$.
\[ X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-j\, \frac{2\pi}{N}\, k\, n\right) \]
[Figure: $|X(e^{j\omega})|$ and $|X[k]|$ for $k = 0, \ldots, N-1$]
Interpretation of the coefficients for a general signal $x[n]$:

 $k = 0$                                   $f = 0$
 $1 \le k \le \frac{N}{2} - 1$             $0 < f < \frac{f_S}{2}$
 $k = \frac{N}{2}$                         $f = \frac{f_S}{2}$
 $\frac{N}{2} + 1 \le k \le N - 1$         $-\frac{f_S}{2} < f < 0$
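The assignment of DFT indices to physical frequencies can be written as a small helper. The following is a minimal sketch; the function name `dft_bin_frequency` is chosen here for illustration, and the index $k = N/2$ (for even $N$) is mapped to $+f_S/2$, which is a convention adopted for this sketch.

```c
/* Frequency in Hz that corresponds to the DFT coefficient X[k],
   0 <= k <= N-1, for the sampling frequency fs in Hz.
   Indices up to N/2 map to non-negative frequencies, the remaining
   indices to negative frequencies (shift by one period). */
double dft_bin_frequency(int k, int N, double fs)
{
    if (2 * k <= N)
        return k * fs / N;
    return (k - N) * fs / N;
}
```

For example, with $N = 8$ and $f_S = 8000$ Hz the indices $k = 1$ and $k = 7$ correspond to $+1000$ Hz and $-1000$ Hz.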

Symmetry relations for real signals:

For the DFT coefficients $X[k]$ of a real signal $x[n]$ the following holds:
\[ X[k] = X^*[N - k] \]
\[ \mathrm{Re}(X[k]) = \mathrm{Re}(X[N - k]) \]
\[ \mathrm{Im}(X[k]) = -\mathrm{Im}(X[N - k]) \]
For the amplitude spectrum $|X[k]|$ the following holds:
\begin{align*}
|X[k]|^2 &= \mathrm{Re}^2\{X[k]\} + \mathrm{Im}^2\{X[k]\} \\
&= |X[N - k]|^2
\end{align*}

Realization of DFT:

#include <math.h>

#define PI 3.14159265358979

/* x:        input signal                                      */
/* N:        length of input signal                            */
/* Xre, Xim: real and imaginary part of DFT coefficients       */

void dft (int N, float x[], float Xre[], float Xim[]) {
  int   n, k;
  float SumRe, SumIm;

  for (k=0; k<=N-1; k++) {
    SumRe = 0.0;
    SumIm = 0.0;
    for (n=0; n<=N-1; n++) {
      /* exp(-j*2*PI*k*n/N) = cos(...) - j*sin(...) */
      SumRe += x[n]*cos(2*PI*k*n/N);
      SumIm -= x[n]*sin(2*PI*k*n/N);
    }
    Xre[k] = SumRe;
    Xim[k] = SumIm;
  }
}

Remark (discrete realization):
A reduction of the Fourier powers $e^{-\frac{2\pi j}{N} k n}$ to $e^{-\frac{2\pi j}{N} l}$ ($l = 0, 1, \ldots, N - 1$) is possible, because they are periodic (on the unit circle).

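2.11 DFT as Matrix Operation

Notation with unit roots
\begin{align*}
X[k] &= \sum_{n=0}^{N-1} x[n] \exp\!\left(-\frac{2\pi j}{N}\, k\, n\right) \\
&= \sum_{n=0}^{N-1} x[n]\, W_N^{kn}
\end{align*}
where
\[ W_N := \exp\!\left(-\frac{2\pi j}{N}\right) \]
[Figure: the powers $W_N^0 = 1$, $W_N^1$, $W_N^2$, $W_N^3$, ... on the unit circle for $N = 12$]
Periodicity of $W_N$, unit root:
\[ \exp(-j\, \omega_k) = \exp\!\left(-j\, \frac{2\pi}{N}\, k\right) = (W_N)^k, \qquad W_N := \exp\!\left(-j\, \frac{2\pi}{N}\right) \]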

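Note:
1. $W_N^r = W_N^{r \bmod N}$
2. $W_N^{kN} = (W_N^N)^k = 1^k = 1, \qquad k \in \mathbb{Z}$
3. $W_N^2 = \left[\exp\!\left(-\frac{2\pi j}{N}\right)\right]^2 = \exp\!\left(-\frac{2\pi j}{N} \cdot 2\right) = \exp\!\left(-\frac{2\pi j}{N/2}\right) = W_{N/2}, \qquad N$ even
4. $W_N^{N/2} = \exp\!\left(-\frac{2\pi j}{N} \cdot \frac{N}{2}\right) = \exp(-j\pi) = -1$
5. $W_N^{r + N/2} = W_N^{N/2} \cdot W_N^r = -W_N^r$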

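DFT as matrix multiplication
\begin{align*}
X[k] &= \sum_{n=0}^{N-1} x[n] \exp\!\left(-\frac{2\pi j}{N}\, k\, n\right) \\
&= \sum_{n=0}^{N-1} W_N^{kn}\, x[n] \\
&= \sum_{n=0}^{N-1} \{W_N\}_{kn}\, x[n]
\end{align*}
with the matrix $\{W_N\}$ and the matrix elements:
\[ \{W_N\}_{kn} := W_N^{kn} \]
Inversion:
\begin{align*}
x[n] &= \frac{1}{N} \sum_{k=0}^{N-1} X[k] \exp\!\left(\frac{2\pi j}{N}\, k\, n\right) \\
&= \frac{1}{N} \sum_{k=0}^{N-1} (W_N^{-1})^{kn}\, X[k] \\
&= \frac{1}{N} \sum_{k=0}^{N-1} \{W_N^{-1}\}_{kn}\, X[k]
\end{align*}
Therefore for the matrix $\{W_N\}^{-1}$ the following holds:
\[ \{W_N\}^{-1} = \frac{1}{N}\, \{W_N^{-1}\} \]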

DFT matrix operation: properties

DFT: invertible linear mapping
$N$ complex signal values $\leftrightarrow$ $N$ complex Fourier components
$N$ real signal values $\leftrightarrow$ $\frac{N}{2}$ complex Fourier components (due to symmetry)
In words:
The DFT causes no information loss in the signal.
Parseval theorem for DFT
General Fourier:
\[ \sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{2\pi} \int_{-\pi}^{+\pi} |X(e^{j\omega})|^2\, d\omega \]
Special DFT (recalculate for yourself!):
\[ \sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X[k]|^2 \]
In words:
Disregarding the factor $\frac{1}{N}$, the DFT is a norm conserving ($\hat{=}$ energy conserving) transformation (mathematical terminology: unitary).

2.12 From Continuous Fourier Transform to Matrix Representation of Discrete Fourier Transform

Assumption: band-limited signal $x(t)$
Fourier transform of the continuous time signal $x(t)$:
\[ X(\omega) = \mathcal{F}\{x(t)\} = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \qquad (2.9) \]
For the exact reconstruction (without approximation) of the continuous time signal from sampled values, the samples $x[n] = x(n \cdot T_s)$ must have a distance of at most
\[ T_s = \frac{\pi}{\omega_B} \]
(sampling theorem).
This results in the Fourier transform of the discrete time signal $x[n]$:
\[ X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} \qquad (2.10) \]
where $\omega$ is the frequency normalised on $T_s$.
The functions (2.9) and (2.10) agree in the interval $[-\omega_S/2, +\omega_S/2] = [-\omega_B, +\omega_B]$.
[Figure: $|X(\omega)|$]

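Figure 2.12: Hanning window (amplitude between 0 and 1 for n = 0, ..., N-1)

The signal x[n] is further decomposed by applying a window function w[n] (windowing):

\[
w[n] = \begin{cases} \ldots & n = 0, \ldots, N-1 \\ 0 & \text{otherwise} \end{cases}
\]

The windowed signal y[n]:

\[ y[n] = w[n] \, x[n] \]

can be analyzed using the Fourier transform or the DFT.

Fourier transform at the frequencies $\omega_k$:

\[
Y(e^{j\omega_k}) = \sum_{n=0}^{N-1} y[n] \, e^{-j\omega_k n},
\qquad \omega_k = \frac{2\pi k}{N} \quad \text{where } k = 0, \ldots, N-1
\]

DFT:

\[ Y[k] = \sum_{n=0}^{N-1} y[n] \, e^{-\frac{2\pi j}{N} k n} \]

Matrix representation (K = N):

\[
\begin{pmatrix} Y[0] \\ \vdots \\ Y[k] \\ \vdots \\ Y[K-1] \end{pmatrix}
=
\begin{pmatrix} & \vdots & \\ \cdots & e^{-\frac{2\pi j}{N} n k} & \cdots \\ & \vdots & \end{pmatrix}
\begin{pmatrix} y[0] \\ \vdots \\ y[n] \\ \vdots \\ y[N-1] \end{pmatrix}
\]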

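2.13 Frequency Resolution and Zero Padding

Task: a signal x[n] with finite length N is given.
Wanted: the Fourier transform $X(e^{j\omega_k})$ at

\[ \omega_k = \frac{2\pi}{K} k, \qquad \text{where } k = 0, 1, \ldots, K-1 \text{ and } K > N \]

Inserting the definitions:

\[
X(e^{j\omega_k}) = \sum_{n=0}^{N-1} x[n] \exp\left(-\frac{2\pi j}{K} k n\right)
                 = \sum_{n=0}^{K-1} \tilde{x}[n] \exp\left(-\frac{2\pi j}{K} k n\right)
\]

where

\[
\tilde{x}[n] = \begin{cases} x[n] & n = 0, \ldots, N-1 \\ 0 & n = N, \ldots, K-1 \end{cases}
\]

i.e. Zero Padding (appending zeros).

Matrix representation:

\[
\begin{pmatrix} X[0] \\ \vdots \\ \vdots \\ X[K-1] \end{pmatrix}
=
\begin{pmatrix} & \vdots & \\ \cdots & W_K^{nk} & \cdots \\ & \vdots & \end{pmatrix}
\begin{pmatrix} x[0] \\ \vdots \\ x[N-1] \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\begin{matrix} n = 0 \\ \\ n = N-1 \\ n = N \\ \\ n = K-1 \end{matrix}
\]

Note:
Zero Padding does not introduce any additional information into the signal. It is only a trick so that the DFT, and particularly the FFT (Fast Fourier Transform), can be evaluated on a finer frequency grid, i.e. with a higher (apparent) frequency resolution.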
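Zero Padding can be illustrated numerically: the DFT of the padded signal samples the same Fourier transform on a finer grid. The following is a minimal sketch, not the lecture's implementation; the function names (`dtft`, `dft`, `zero_pad`) and the example values are assumptions made for this illustration.

```python
import cmath

def dtft(x, omega):
    """Fourier transform X(e^{j omega}) of a finite-length signal x[0..N-1]."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] exp(-2*pi*j*k*n/N)."""
    N = len(x)
    return [sum(xn * cmath.exp(-2j * cmath.pi * k * n / N)
                for n, xn in enumerate(x))
            for k in range(N)]

def zero_pad(x, K):
    """Append zeros so that the padded signal has length K >= len(x)."""
    return list(x) + [0.0] * (K - len(x))

x = [1.0, 2.0, 3.0, 4.0]      # N = 4, arbitrary example values
K = 16                        # DFT length after Zero Padding
X_padded = dft(zero_pad(x, K))

# Largest deviation between the padded DFT and the Fourier transform
# evaluated at omega_k = 2*pi*k/K
max_dev = max(abs(X_padded[k] - dtft(x, 2 * cmath.pi * k / K))
              for k in range(K))
```

Every fourth coefficient of `X_padded` coincides with the unpadded 4-point DFT of `x`; the coefficients in between are additional samples of the same spectrum, not new information.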

2.14 Finite Convolution

Input signal and convolution kernel have finite duration.

Consider the finite convolution:

Impulse response: $h[n] \equiv 0$ for $n \notin \{0, 1, 2, \ldots, N_h - 1\}$

Input signal: $x[n] \equiv 0$ for $n \notin \{0, 1, 2, \ldots, N_x - 1\}$

Output signal:

\[
y[n] = \sum_{k=-\infty}^{\infty} h[k] \, x[n-k] = \sum_{k=0}^{N_h - 1} h[k] \, x[n-k]
\]

[Figure: h[k] on k = 0, ..., $N_h - 1$ and the mirrored signal x[-k] (n = 0) on k = $-(N_x - 1)$, ..., 0]

Altogether: $N_x + N_h - 1$ positions with overlap.

Therefore only $N_x + N_h - 1$ values of the output signal can be different from zero:

\[
y[n] = \begin{cases}
0 & n > N_x + N_h - 2 \\
\ldots & n = 0, 1, \ldots, N_x + N_h - 2 \\
0 & n < 0
\end{cases}
\]

Figure 2.13: Example of a linear convolution of two finite length signals:
a) the two signals h[k] ($N_h - 1 = 12$) and x[k] ($N_x - 1 = 4$);
b) signal x[n-k] for different values of n:
   i) n < 0 (shown: n = -1): no overlap with h[k], therefore convolution y[n] = 0
   ii) n between 0 and $N_h + N_x - 2$: convolution $\neq 0$
   iii) $n > N_h + N_x - 2$ (shown: $n = N_h + N_x - 1$): no overlap with h[k], convolution y[n] = 0
c) resulting convolution y[n].

Finite convolution using DFT

Convolution theorem:

\[ y[n] = \sum_{k=-\infty}^{\infty} h[k] \, x[n-k] \]

Fourier:

\[ Y(e^{j\omega}) = H(e^{j\omega}) \, X(e^{j\omega}), \qquad 0 \le \omega \le 2\pi \]

Also valid for the sampled frequencies:

\[ \omega_k := \frac{2\pi}{N} k, \qquad k = 0, \ldots, N-1 \text{ for any } N \]

Notation: Y[k] = H[k] X[k]

Question: How to choose the length N of the DFT?

Reminder: different lengths
  x[n]: $N_x$ non-zero values
  h[n]: $N_h$ non-zero values
  y[n]: $N_y = N_x + N_h - 1$ non-zero values

Answer:
The convolution theorem is certainly correct for any N > 0. If we want to calculate the output signal completely from Y[k], we have to know Y[k] for at least $N = N_x + N_h - 1$ frequency values k = 0, 1, ..., N-1.

In words: the DFT length N must satisfy

\[ N \ge N_x + N_h - 1 \]

Method: Zero Padding, i.e. appending zeros.

Note: The FFT is introduced in Section 2.15. A comparison of the costs of realizing the finite convolution by DFT and FFT can be found at the end of Section 2.16.
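The choice $N \ge N_x + N_h - 1$ can be tested by comparing a direct finite convolution with the route via the DFT. The code below is a sketch under simple assumptions (direct $O(N^2)$ DFT, real-valued signals); the function names are hypothetical and only serve this illustration.

```python
import cmath

def dft(x, sign=-1):
    """Direct DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    N = len(x)
    return [sum(xn * cmath.exp(sign * 2j * cmath.pi * k * n / N)
                for n, xn in enumerate(x))
            for k in range(N)]

def convolve_direct(h, x):
    """y[n] = sum_k h[k] x[n-k]; returns the N_x + N_h - 1 output values."""
    y = [0.0] * (len(h) + len(x) - 1)
    for k, hk in enumerate(h):
        for m, xm in enumerate(x):
            y[k + m] += hk * xm
    return y

def convolve_dft(h, x):
    """Finite convolution via Y[k] = H[k] X[k] with DFT length N = N_x + N_h - 1."""
    N = len(h) + len(x) - 1
    H = dft(list(h) + [0.0] * (N - len(h)))   # Zero Padding of h
    X = dft(list(x) + [0.0] * (N - len(x)))   # Zero Padding of x
    Y = [Hk * Xk for Hk, Xk in zip(H, X)]
    # inverse DFT; the signals are real, so only the real part is kept
    return [(v / N).real for v in dft(Y, sign=+1)]
```

For example, `convolve_direct([1, 2, 3], [1, 1])` and `convolve_dft([1, 2, 3], [1, 1])` should agree up to rounding errors. With a DFT length shorter than $N_x + N_h - 1$ the output values would wrap around (cyclic convolution).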

2.15 Fast Fourier Transform (FFT)

Principle of FFT:
The calculation of the DFT can be done by successive decomposition into smaller DFT calculations. In this way, the number of elementary operations (multiplications and additions) is dramatically reduced:

  DFT: $N^2$ operations → FFT: $\frac{N}{2} \, \mathrm{ld}\, N$ operations

  N = 1024: speed-up factor $\frac{N^2}{(N/2) \, \mathrm{ld}\, N} = \frac{2N}{\mathrm{ld}\, N} = \frac{2 \cdot 1024}{10} \approx 200$

(ld denotes the logarithm to base 2.)

The matrix is decomposed into a product of sparse matrices; therefore an N with many prime factors is convenient (not necessarily only powers of two).

Terminology for different variants of FFT:
  decimation in time / in frequency
  in place: yes/no
  radix 2 / radix 4
  decomposition into prime factors instead of $N = 2^n$

History:
  1965 Cooley and Tukey
  1942 Danielson and Lanczos
  1905 Runge
  1805 Gauss

Algorithms which are based on a decomposition of the signal x[n] are called decimation-in-time algorithms.
The case $N = 2^{\nu}$ (N is a power of two) is considered in the following.

\[
X[k] = \sum_{n=0}^{N-1} x[n] \exp\left(-j \frac{2\pi}{N} k n\right)
     = \sum_{n=0}^{N-1} x[n] \, W_N^{nk}
\qquad \text{where } k = 0, 1, \ldots, N-1 \text{ and } W_N^{nk} = \exp\left(-j \frac{2\pi}{N} k n\right)
\]

Decomposition of the sum over n into the sums over even and odd n:

\[
X[k] = \sum_{r=0}^{N/2-1} x[2r] \, W_N^{2rk} + \sum_{r=0}^{N/2-1} x[2r+1] \, W_N^{(2r+1)k}
     = \sum_{r=0}^{N/2-1} x[2r] \, (W_N^2)^{rk} + W_N^k \sum_{r=0}^{N/2-1} x[2r+1] \, (W_N^2)^{rk}
\]

Because of

\[
W_N^2 = \exp\left(-2j \frac{2\pi}{N}\right) = \exp\left(-j \frac{2\pi}{N/2}\right) = W_{N/2}
\]

the following holds for k = 0, ..., N-1:

\[
X[k] = \sum_{r=0}^{N/2-1} x[2r] \, W_{N/2}^{rk} + W_N^k \sum_{r=0}^{N/2-1} x[2r+1] \, (W_{N/2})^{rk}
     = G[k] + W_N^k \, H[k]
\]

Each of the two sums corresponds to a DFT of length N/2.
The first sum is an N/2-DFT of the even indexed signal values x[n], the second sum is an N/2-DFT of the odd indexed values.
The DFT of length N is obtained by combining the two N/2-DFTs with the factor $W_N^k$.
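The decomposition $X[k] = G[k] + W_N^k H[k]$ translates almost literally into a recursive program. The following Python sketch is an illustration only (it is not the in-place algorithm of the later implementation sections); the names `fft_dit` and `dft_direct` are chosen here, and the input length is assumed to be a power of two.

```python
import cmath

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    G = fft_dit(x[0::2])   # N/2-point DFT of the even indexed values
    H = fft_dit(x[1::2])   # N/2-point DFT of the odd indexed values
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * H[k]   # W_N^k H[k]
        X[k] = G[k] + t
        # G and H are periodic with N/2, and W_N^{k+N/2} = -W_N^k
        X[k + N // 2] = G[k] - t
    return X

def dft_direct(x):
    """Direct O(N^2) DFT, used only as a reference."""
    N = len(x)
    return [sum(xn * cmath.exp(-2j * cmath.pi * k * n / N)
                for n, xn in enumerate(x))
            for k in range(N)]
```

For a signal of length 8, `fft_dit` and `dft_direct` should return the same coefficients up to rounding errors.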

Complexity:
A direct one-dimensional DFT has the complexity $O(N^2)$. If it is instead computed from two DFTs of length $\frac{N}{2}$ (after suitable re-sorting of the values), the complexity is $O(2 \, (\frac{N}{2})^2) = O(\frac{N^2}{2})$. By successive application of this re-sorting, the complexity can be reduced to $O(N \log N)$.

The case $N = 2^3 = 8$ is considered in the following.
X[4] can be obtained from H[4] and G[4] according to the previous equation. Because the DFT length is $\frac{N}{2} = 4$:

\[ H[4] = H[0] \quad \text{and} \quad G[4] = G[0] \]

And then:

\[ X[4] = G[0] + W_N^4 \, H[0] \]

The values X[5], X[6] and X[7] can be obtained analogously.

Figure 2.14: Flow diagram for the decomposition of one N-DFT into two N/2-DFTs with N = 8. The even indexed inputs x[0], x[2], x[4], x[6] feed one N/2-point DFT with the outputs G[0], ..., G[3]; the odd indexed inputs x[1], x[3], x[5], x[7] feed a second N/2-point DFT with the outputs H[0], ..., H[3]; the outputs are combined to X[0], ..., X[7] with the factors $W_N^0, \ldots, W_N^7$.

Further analogous decomposition, until only DFTs of length N = 2 remain (the so-called Butterfly Operation).

Resulting flow diagram of the FFT:

Figure 2.15: Flow diagram of an 8-point FFT using Butterfly operations. The inputs appear in the order x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7], the outputs in the order X[0], ..., X[7]; the branches carry the factors $W_N^0, \ldots, W_N^3$ and $-1$.

Complexity reduction

Number of complex multiplications in the FFT: $\frac{N}{2} \, \mathrm{ld}\, N$.
Comparison: direct application of the DFT definition needs $N^2$ complex multiplications.

Example: $N = 1024 = 2^{10}$

\[ \frac{N^2}{(N/2) \, \mathrm{ld}\, N} \approx 200 \]

Complexity reduction by a factor of about 200.

The radix-2 FFT is not minimal with respect to the number of additions; a radix-4 FFT can be better.
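The operation counts quoted above are easy to tabulate. This is a small sketch with hypothetical helper names; it only evaluates the two formulas $N^2$ and $\frac{N}{2}\,\mathrm{ld}\,N$ for N a power of two.

```python
from math import log2

def dft_multiplications(N):
    """Complex multiplications for the direct DFT definition: N^2."""
    return N * N

def fft_multiplications(N):
    """Complex multiplications for the radix-2 FFT: (N/2) ld N, N a power of two."""
    return (N // 2) * int(log2(N))

def speedup(N):
    """Ratio of the two counts, equal to 2N / ld N."""
    return dft_multiplications(N) / fft_multiplications(N)
```

For N = 1024 the ratio evaluates to 204.8, which is the "factor 200" of the text.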

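Matrix representation of the FFT principle

The complex Fourier matrix can be decomposed into the product of $r = \mathrm{ld}\, N$ matrices, each of them having only two non-zero elements in each column.
The following shows the decomposition of the Fourier matrix for the case of the inverse transformation, where $w$ corresponds to $W_N^{-1}$:

\[ X = |w^{nk}| \, x = T_3 \, T_2 \, T_1 \, T_S \, x \]

For N = 8 this is a decomposition into r + 1 = 4 matrices ($w^4 = -1$, $w^8 = 1$). Reducing the exponents modulo 8:

\[
|w^{nk}| =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & w & w^2 & w^3 & w^4 & w^5 & w^6 & w^7 \\
1 & w^2 & w^4 & w^6 & w^8 & w^{10} & w^{12} & w^{14} \\
1 & w^3 & w^6 & w^9 & w^{12} & w^{15} & w^{18} & w^{21} \\
1 & w^4 & w^8 & w^{12} & w^{16} & w^{20} & w^{24} & w^{28} \\
1 & w^5 & w^{10} & w^{15} & w^{20} & w^{25} & w^{30} & w^{35} \\
1 & w^6 & w^{12} & w^{18} & w^{24} & w^{30} & w^{36} & w^{42} \\
1 & w^7 & w^{14} & w^{21} & w^{28} & w^{35} & w^{42} & w^{49}
\end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & w & w^2 & w^3 & w^4 & w^5 & w^6 & w^7 \\
1 & w^2 & w^4 & w^6 & 1 & w^2 & w^4 & w^6 \\
1 & w^3 & w^6 & w & w^4 & w^7 & w^2 & w^5 \\
1 & w^4 & 1 & w^4 & 1 & w^4 & 1 & w^4 \\
1 & w^5 & w^2 & w^7 & w^4 & w & w^6 & w^3 \\
1 & w^6 & w^4 & w^2 & 1 & w^6 & w^4 & w^2 \\
1 & w^7 & w^6 & w^5 & w^4 & w^3 & w^2 & w
\end{pmatrix}
\]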

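Signal flow diagram

The calculation operations which correspond to the matrix representation of the FFT can be shown in a signal flow diagram.

\[
T_3 = \begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & w & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & w^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & w^3 \\
1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -w & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & -w^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & -w^3
\end{pmatrix}
\qquad
T_2 = \begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & w^2 & 0 & 0 & 0 & 0 \\
1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & -w^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & w^2 \\
0 & 0 & 0 & 0 & 1 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & -w^2
\end{pmatrix}
\]

\[
T_1 = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1
\end{pmatrix}
\qquad
T_S = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]

[Figure: signal flow diagram of the 8-point FFT with the stages $T_S$, $T_1$, $T_2$, $T_3$, leading from x[0], ..., x[7] to X[0], ..., X[7]; the branches carry the factors $w$, $w^2$, $w^3$ and $-1$]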

The matrices $T_1$, $T_2$ and $T_3$ contain exactly two non-zero elements in each row.
The non-zero elements realize the Butterfly Operation:
  Matrix $T_1$: step width of the Butterfly Operation is 1
  Matrix $T_2$: step width of the Butterfly Operation is 2
  Matrix $T_3$: step width of the Butterfly Operation is 4
The step widths can be read off from:
  the signal flow diagram
  the distance between the non-zero elements in $T_1$, $T_2$ and $T_3$

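Butterfly Operation

The signal flow diagram and the matrix representation of the FFT are based on the following basic operation:

[Figure: butterfly flow graph with the inputs $X_{m-1}[p]$ and $X_{m-1}[q]$, the factor $W_N^r$ and the factor $-1$ on the lower branch, and the outputs $X_m[p]$ and $X_m[q]$]

For two input values $X_{m-1}[p]$ and $X_{m-1}[q]$ this operation produces two output values $X_m[p]$ and $X_m[q]$. The output values are thereby a linear combination of the input values.
Because of the shape of its flow graph, the operation is called Butterfly Operation.

\[ X_m[p] = X_{m-1}[p] + W_N^r \, X_{m-1}[q] \]
\[ X_m[q] = X_{m-1}[p] - W_N^r \, X_{m-1}[q] \]

In matrix form:

\[
\begin{pmatrix} X_m[p] \\ X_m[q] \end{pmatrix}
=
\begin{pmatrix} 1 & W_N^r \\ 1 & -W_N^r \end{pmatrix}
\begin{pmatrix} X_{m-1}[p] \\ X_{m-1}[q] \end{pmatrix}
\]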

Bit Reversal

The matrix representation of the FFT uses a sorting matrix, i.e. the signal which is to be transformed is first re-sorted.

Example for N = 8 (Bit Reversal for $N = 2^3$):

  n   Binary representation   Reversed   n
  0   000                     000        0
  1   001                     100        4
  2   010                     010        2
  3   011                     110        6
  4   100                     001        1
  5   101                     101        5
  6   110                     011        3
  7   111                     111        7

Bit Reversal is a necessary part of the FFT algorithm.
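The table above can be reproduced with a few lines of code. The following sketch reverses the bits of an index explicitly; the function names `bit_reverse` and `bit_reversed_order` are illustrative and not taken from the lecture, and the signal length is assumed to be a power of two.

```python
def bit_reverse(n, bits):
    """Reverse the lowest `bits` bits of the index n."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (n & 1)   # shift result left, append lowest bit of n
        n >>= 1
    return r

def bit_reversed_order(x):
    """Re-sort a signal of length N = 2**bits into bit-reversed order."""
    N = len(x)
    bits = N.bit_length() - 1
    return [x[bit_reverse(n, bits)] for n in range(N)]
```

For N = 8, `[bit_reverse(n, 3) for n in range(8)]` gives the last column of the table, i.e. the input order x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7] of the flow diagram.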

2.16 FFT Implementation

Fortran version

C     adapted from: Oppenheim, Schafer p. 608
C ******************************************************************
      SUBROUTINE FFT_DecimationInTime (X, ld_n)
C ******************************************************************
      REAL       PI
      PARAMETER (PI = 3.14159265358979)
      INTEGER    N_max
      PARAMETER (N_max = 2048)
      COMPLEX X(N_max)     ! array for input AND output
      COMPLEX Temp         ! temporary storage
      COMPLEX W_uni        ! root of unity
      COMPLEX W_pow        ! powers of W_uni
      INTEGER N, ld_n, i, ip, iq, ipbeg, j, k, i_exp, istp

      N = 2**ld_n
      IF (N.GT.N_max) STOP
C Bit Reversed Sorting *********************************************
      j = 1
      DO i = 1, N-1
        IF (i.LT.j) THEN
          ! swap X(j) and X(i)
          Temp = X(j)
          X(j) = X(i)
          X(i) = Temp
        ENDIF
        k = N/2
        DO WHILE (k.LT.j)
          j = j - k
          k = k / 2
        ENDDO
        j = j + k
      ENDDO
C End of Bit Reversed Sorting **************************************
C FFT Butterfly Operations *****************************************
      DO i = 1, ld_n
        i_exp = 2**i       ! size of a butterfly group on level i
        istp  = i_exp/2    ! stepsize
        W_pow = (1.0,0.0)
        W_uni = CMPLX (COS (PI/FLOAT(istp)), -SIN(PI/FLOAT(istp)))
        DO ipbeg = 1, istp
          DO ip = ipbeg, N, i_exp
            iq    = ip + istp
            Temp  = X(iq) * W_pow
            ! both outputs use the old value of X(ip)
            X(iq) = X(ip) - Temp
            X(ip) = X(ip) + Temp
          ENDDO
          W_pow = W_pow * W_uni
        ENDDO
      ENDDO
C End of FFT Butterfly Operations **********************************
      RETURN
      END

Explanations about Fortran Program

Two program parts:
  1. Bit Reversal
  2. Butterfly Operations

3 loops with the variables i, ipbeg, ip control the execution of the Butterfly operations.

outer loop, i:
  i specifies the level of the FFT.
  With the exception of the first level, the Butterfly operations are nested. Therefore two loops are used for the Butterfly operations within one level.

middle loop, ipbeg:
  ipbeg runs over the nested Butterfly operations:
    i=1: ipbeg=1
    i=2: ipbeg=1,2
    i=3: ipbeg=1,2,3,4
  ipbeg specifies the sequence of starting points for the inner loop.

inner loop, ip:
  ip: specifies the first element of the Butterfly operation
  istp: step width of the Butterfly operation
  iq=ip+istp: specifies the second element of the Butterfly operation
  The inner loop is started once per nesting.

Figure 2.16: Flow diagram of an 8-point FFT using Butterfly operations (inputs in the order x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7]; outputs X[0], ..., X[7]).

C version (from Numerical Recipes in C)

#include <math.h>
#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

/* Replaces data[1..2*nn] by its discrete Fourier transform, if isign is
   input as 1; or replaces data[1..2*nn] by nn times its inverse discrete
   Fourier transform, if isign is input as -1. data is a complex array of
   length nn or, equivalently, a real array of length 2*nn. nn MUST be an
   integer power of 2 (this is not checked for!). */
void four(float data[], unsigned long nn, int isign)
{
    unsigned long n, mmax, m, j, istep, i;
    double wtemp, wr, wpr, wpi, wi, theta;  /* double precision for the
                                               trigonometric recurrences */
    float tempr, tempi;

    n = nn << 1;
    j = 1;
    for (i = 1; i < n; i += 2) {       /* bit-reversal section of the routine */
        if (j > i) {
            SWAP(data[j], data[i]);    /* exchange the two complex numbers */
            SWAP(data[j+1], data[i+1]);
        }
        m = n >> 1;
        while (m >= 2 && j > m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
    /* Here begins the Danielson-Lanczos section of the routine. */
    mmax = 2;
    while (n > mmax) {                 /* outer loop executed log2(nn) times */
        istep = mmax << 1;
        theta = isign * (6.28318530717959 / mmax);  /* initialise the
                                                       trigonometric recurrence */
        wtemp = sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for (m = 1; m < mmax; m += 2) {        /* here are the two nested loops */
            for (i = m; i <= n; i += istep) {
                j = i + mmax;                  /* Danielson-Lanczos formula: */
                tempr = wr * data[j] - wi * data[j+1];
                tempi = wr * data[j+1] + wi * data[j];
                data[j] = data[i] - tempr;
                data[j+1] = data[i+1] - tempi;
                data[i] += tempr;
                data[i+1] += tempi;
            }
            wr = (wtemp = wr) * wpr - wi * wpi + wr;   /* trigonometric recurrence */
            wi = wi * wpr + wtemp * wpi + wi;
        }
        mmax = istep;
    }
}

Input and output arrays (C version)

Figure 2.17: Input and output arrays of an FFT. a) The input array contains N complex input values (N is a power of 2) in one real array of length 2N, with alternating real and imaginary parts: elements 1 and 2 hold the real and imaginary part of the sample at t = 0, elements 2N-1 and 2N those of the sample at t = (N-1)Δ, where Δ denotes the sampling interval. b) The output array contains the complex Fourier spectrum at N frequency values, again with alternating real and imaginary parts. The array begins with the zero frequency (f = 0, elements 1 and 2), then goes up to the highest frequency (f = ±1/(2Δ), elements N+1 and N+2), followed by the values for the negative frequencies (up to f = -1/(NΔ), elements 2N-1 and 2N).

Finite convolution: complexity by the application of FFT

Estimation of the number of necessary multiplications for a convolution of x[n] and h[n]
  x[n]: $N_x$ non-zero values
  h[n]: $N_h$ non-zero values

Direct implementation: $N_x N_h$ multiplications

Realisation via transformation:

  Step                                 DFT               FFT
  transformation                       (N_x + N_h)^2     ((N_x + N_h)/2) log2(N_x + N_h)
  multiplication in frequency domain   N_x + N_h         N_x + N_h
  inverse transformation               (N_x + N_h)^2     ((N_x + N_h)/2) log2(N_x + N_h)
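To get a feeling for these estimates, the entries of the table can simply be added up. The sketch below does this under the assumption that the three steps of each column are summed once, exactly as they are listed; the function name `multiplications` is hypothetical.

```python
from math import log2

def multiplications(Nx, Nh):
    """Estimated number of multiplications for the three realisations.

    The DFT and FFT columns sum the table rows: transformation,
    multiplication in the frequency domain, inverse transformation.
    """
    N = Nx + Nh
    direct = Nx * Nh
    dft = N ** 2 + N + N ** 2
    fft = (N / 2) * log2(N) + N + (N / 2) * log2(N)
    return {"direct": direct, "dft": dft, "fft": fft}
```

For $N_x = N_h = 512$ this gives 262144 multiplications for the direct implementation, 2098176 for the DFT route and 11264 for the FFT route, so only the FFT makes the detour via the frequency domain worthwhile.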

2.17 Cyclic Matrices and Fourier Transform

The Fourier transform plays a significant role for so-called cyclic matrices (cf. chapter 3.8):

\[
H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-1} \\
h_{N-1} & h_0 & h_1 & \cdots & h_{N-2} \\
h_{N-2} & h_{N-1} & h_0 & \cdots & h_{N-3} \\
\vdots & & \ddots & \ddots & \vdots \\
h_1 & h_2 & \cdots & h_{N-1} & h_0
\end{pmatrix}
\qquad \text{(rows and columns numbered } 0, \ldots, N-1 \text{)}
\]

so:

\[ H_{mn} = h_{(n-m) \bmod N} \]

with the so-called kernel vector $(h_0, h_1, \ldots, h_{N-1})^T$ and $h_n \in \mathbb{C}$, mostly $h_n \in \mathbb{R}$.

Remark: using cyclic matrices it is possible to define cyclic, i.e. periodic, convolutions and to build a cyclic variant of the system theory.

The eigenvectors of a cyclic matrix can be obtained from the columns of the DFT matrix:

\[
\begin{pmatrix}
w^{0 \cdot 0} & \cdots & w^{n \cdot 0} & \cdots & w^{(N-1) \cdot 0} \\
\vdots & & w^{n \cdot 1} & & \vdots \\
\vdots & & \vdots & & \vdots \\
\vdots & & w^{n (N-1)} & & \vdots
\end{pmatrix}
\qquad \text{where } w = e^{-\frac{2\pi j}{N}}
\]

The eigenvalues $\lambda_k$ of a cyclic matrix with the kernel vector $(h_0, h_1, \ldots, h_{N-1})^T$ are:

\[ \lambda_k = \sum_{n=0}^{N-1} h_n \, e^{-\frac{2\pi j}{N} k n} \qquad \text{where } k = 0, 1, \ldots, N-1 \]

The representation follows L. Berg: Linear equation systems with band structure (page 52 ff).
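The eigenvalue statement can be verified numerically for a small example. The following sketch assumes the convention $H_{mn} = h_{(n-m) \bmod N}$ and $w = e^{-2\pi j/N}$; the function names are chosen for this illustration only.

```python
import cmath

def cyclic_matrix(h):
    """H[m][n] = h[(n - m) mod N]"""
    N = len(h)
    return [[h[(n - m) % N] for n in range(N)] for m in range(N)]

def eigenpair(h, k):
    """k-th column of the DFT matrix and the k-th DFT coefficient of h."""
    N = len(h)
    w = cmath.exp(-2j * cmath.pi / N)
    v = [w ** (n * k) for n in range(N)]
    lam = sum(h[n] * w ** (n * k) for n in range(N))
    return lam, v

def residual(h, k):
    """max_m |(H v)_m - lambda * v_m|; close to zero for an eigenpair."""
    N = len(h)
    H = cyclic_matrix(h)
    lam, v = eigenpair(h, k)
    Hv = [sum(H[m][n] * v[n] for n in range(N)) for m in range(N)]
    return max(abs(a - lam * b) for a, b in zip(Hv, v))
```

For any kernel vector, e.g. `h = [4, 1, 0, 2, -1]`, `residual(h, k)` should be at rounding-error level for every k = 0, ..., N-1. In other words, the eigenvalues of a cyclic matrix are the DFT coefficients of its kernel vector.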
The special case of a cyclic matrix whose kernel vector h is symmetric and real is especially interesting for many applications:

\[ h_n = h_{N-n} \quad \text{and} \quad h_n \in \mathbb{R} \]

That means that the matrix is real, symmetric and cyclic:

\[
H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_2 & h_1 \\
h_1 & h_0 & h_1 & \cdots & h_3 & h_2 \\
h_2 & h_1 & h_0 & \cdots & & h_3 \\
\vdots & & \ddots & \ddots & & \vdots \\
h_2 & h_3 & \cdots & h_1 & h_0 & h_1 \\
h_1 & h_2 & \cdots & h_2 & h_1 & h_0
\end{pmatrix}
\qquad \text{(rows and columns numbered } 0, \ldots, N-1 \text{)}
\]

For such a matrix the following holds:

a) The eigenvalues $\lambda_k$ are:

\[ \lambda_k = \sum_{n=0}^{N-1} h_n \cos\left(\frac{2\pi k n}{N}\right) \]

b) The eigenvectors can be obtained from the columns of the discrete cosine matrix:

\[ \left( \cos\left(\frac{2\pi n m}{N}\right) \right)_{n, m = 0, \ldots, N-1} \]

Application: diagonalisation of covariance matrices, e.g. for coding of image and speech signals.

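Proof:
We will prove that

\[
v_k = \begin{pmatrix}
\cos\frac{2\pi k \cdot 0}{N} \\
\cos\frac{2\pi k \cdot 1}{N} \\
\vdots \\
\cos\frac{2\pi k (N-1)}{N}
\end{pmatrix}
\]

are eigenvectors of the given matrix with the eigenvalues $\lambda_k$.

For a symmetric cyclic matrix the following holds:

\[ H_{mn} := h_{(n-m) \bmod N} \qquad \text{where } h_n = h_{N-n} \]

One row results in (for odd N):

\[
\sum_{n=0}^{N-1} H_{mn} \, v_{kn}
= h_0 \cos\frac{2\pi k m}{N}
+ \sum_{l=1}^{\frac{N-1}{2}} h_l \left[ \cos\frac{2\pi k (m-l)}{N} + \cos\frac{2\pi k (m+l)}{N} \right]
\]

For even N, the term for l = N/2 enters the sum only once.

According to the addition theorem:

\[ \cos(x + y) + \cos(x - y) = 2 \cos x \cos y \]

With $x = 2\pi k m / N$ and $y = 2\pi k l / N$ it follows:

\[
\sum_{n=0}^{N-1} H_{mn} \, v_{kn}
= h_0 \cos\frac{2\pi k m}{N} + 2 \sum_{l=1}^{\frac{N-1}{2}} h_l \cos\frac{2\pi k m}{N} \cos\frac{2\pi k l}{N}
= \underbrace{\left[ h_0 + 2 \sum_{l=1}^{\frac{N-1}{2}} h_l \cos\frac{2\pi k l}{N} \right]}_{\lambda_k}
  \underbrace{\cos\frac{2\pi k m}{N}}_{v_{km}}
= \lambda_k \, v_{km}
\]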
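The result of the proof can be checked numerically for odd and even N. The sketch below assumes a real kernel vector with $h_n = h_{N-n}$ given as a full list of length N; the function name `cos_eigen_residual` is hypothetical.

```python
import math

def cos_eigen_residual(h, k):
    """max_m |(H v_k)_m - lambda_k * v_km| for a symmetric cyclic matrix.

    h must satisfy h[n] == h[N - n] for n = 1, ..., N-1.
    """
    N = len(h)
    H = [[h[(n - m) % N] for n in range(N)] for m in range(N)]
    v = [math.cos(2 * math.pi * k * n / N) for n in range(N)]
    lam = sum(h[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
    Hv = [sum(H[m][n] * v[n] for n in range(N)) for m in range(N)]
    return max(abs(Hv[m] - lam * v[m]) for m in range(N))

h_odd = [3.0, 1.0, 2.0, 2.0, 1.0]                    # N = 5
h_even = [5.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]    # N = 8
```

For both example kernels the residual should be at rounding-error level for all k. Note that the cosine vectors for k and N-k coincide, so the cosine matrix delivers eigenvectors but not N linearly independent ones.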

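Excursion: Toeplitz matrices

We consider only square real matrices:

\[ H \in \mathbb{R}^{N \times N} \qquad \text{with } H_{mn} \in \mathbb{R} \]

a) H is a (general) Toeplitz matrix, if:

\[ H_{mn} = h_{n-m} \]

i.e.

\[
H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} \\
h_{-1} & h_0 & h_1 & \ddots & & h_{N-2} \\
h_{-2} & h_{-1} & h_0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & h_1 & h_2 \\
h_{2-N} & & \ddots & h_{-1} & h_0 & h_1 \\
h_{1-N} & h_{2-N} & \cdots & h_{-2} & h_{-1} & h_0
\end{pmatrix}
\]

b) For a symmetric Toeplitz matrix the following then holds:

\[ H_{mn} = h_{|n-m|} \]

i.e.

\[
H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} \\
h_1 & h_0 & h_1 & \ddots & & h_{N-2} \\
h_2 & h_1 & h_0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & h_1 & h_2 \\
h_{N-2} & & \ddots & h_1 & h_0 & h_1 \\
h_{N-1} & h_{N-2} & \cdots & h_2 & h_1 & h_0
\end{pmatrix}
\]

c) A cyclic matrix is obtained by the special choice:

\[ H_{mn} = h_{(n-m) \bmod N} \]

\[
H = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} \\
h_{N-1} & h_0 & h_1 & \ddots & & h_{N-2} \\
h_{N-2} & h_{N-1} & h_0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & h_1 & h_2 \\
h_2 & & \ddots & h_{N-1} & h_0 & h_1 \\
h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} & h_0
\end{pmatrix}
\]

Also valid: each cyclic matrix is a Toeplitz matrix.

d) A symmetric cyclic matrix is obtained by the following choice of the kernel vector h:

\[ h_n = h_{N-n} \qquad \text{for } n = 0, \ldots, N \]

For example, we obtain for N = 8:

\[
H = \begin{pmatrix}
h_0 & h_1 & h_2 & h_3 & h_4 & h_3 & h_2 & h_1 \\
h_1 & h_0 & h_1 & h_2 & h_3 & h_4 & h_3 & h_2 \\
h_2 & h_1 & h_0 & h_1 & h_2 & h_3 & h_4 & h_3 \\
h_3 & h_2 & h_1 & h_0 & h_1 & h_2 & h_3 & h_4 \\
h_4 & h_3 & h_2 & h_1 & h_0 & h_1 & h_2 & h_3 \\
h_3 & h_4 & h_3 & h_2 & h_1 & h_0 & h_1 & h_2 \\
h_2 & h_3 & h_4 & h_3 & h_2 & h_1 & h_0 & h_1 \\
h_1 & h_2 & h_3 & h_4 & h_3 & h_2 & h_1 & h_0
\end{pmatrix}
\]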

Chapter 3
Spectral analysis

Overview:
3.1 Features for Speech Recognition
3.2 Short Time Analysis and Windowing
3.3 Autocorrelation function and Power Spectral Density
3.4 Spectrograms
3.5 Filter Bank Analysis
3.6 Mel-Scale
3.7 Cepstrum
- Cepstrum Calculation from Filter Bank Output
- Mel-Cepstrum according to Davis and Mermelstein
3.8 Statistical Interpretation of Cepstrum Transformation
3.9 Energy in acoustic Vector

3.1 Features for Speech Recognition

Architecture of an automatic speech recognition system

[Figure: block diagram. speech signal → short-time analysis each 10 ms (using FFT) → sequence of acoustic vectors → pattern comparison (with a reference model for each word in the vocabulary) → decision]

Short time analysis:
  window length 10-40 ms
  sampling period 10-20 ms
  in case of a sampling rate of 10 kHz:
    window: 100-400 samples
    sampling period (frame shift): 100-200 samples

Recommended windows:
  Hamming
  Kaiser
  Blackman

Model parameters:
  Energy, intensity (loudness)
  Fundamental frequency (pitch)
  Spectral parameters (timbre, smoothed amplitude spectrum)

Goal:
Ideally: Real features for the recognition
In practice: Data reduction, i.e. compact description
of the speech signal (amplitude spectrum)
Side effect:
Method also enables coding of speech signals using lowest possible
number of bits
Key words:
Fourier transform: wide band/narrow band, autocorrelation function
Filter bank
Cepstrum
Linear Predictive Coding (LPC) analysis
Fundamental frequency analysis


3.2 Short Time Analysis and Windowing

The DFT is defined for signals with finite duration.

Speech signal s[n]:
  quasi stationary, i.e. properties do not change within 20-50 ms.

Window function w[n]:
  Decomposition of the original signal s[n] into (overlapping) segments using a window function w[n]:

\[ x[n] = s[n] \, w[n] \]

  where for example

\[
w[n] = \begin{cases} 1, & |n| \le N/2 \\ 0, & \text{otherwise} \end{cases}
\]

The windowed signal x[n] is analyzed with a Fourier Transform or DFT.

The multiplication of the original signal s[n] with the window function w[n] in the time domain corresponds to the convolution of the spectrum $S(e^{j\omega})$ of the signal with the spectrum $W(e^{j\omega})$ of the window function in the frequency domain:

\[
X(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(e^{j\theta}) \, W(e^{j(\omega - \theta)}) \, d\theta
\]

This convolution performs a (spectral) smearing in the frequency domain (leakage).

[Figures: for each of the rectangle, triangle, Hanning, Hamming, Nuttall, Gauss and Chebyshev windows, a panel "Window function" (amplitude between 0 and 1 over n = 0, ..., N-1) and a panel "Impulse response" (0 to -60 dB over $-0.5 f_s$ to $0.5 f_s$)]

Figure 3.1: Example for the application of the Discrete Fourier Transform (DFT). Panels from top to bottom: Fourier Transform $S_C(\Omega)$ of a continuous time signal; frequency response $H(\Omega)$ of the anti-aliasing low-pass filter (cut-off at $\pm\pi/T$); Fourier Transform $X_C(\Omega)$ of the filtered signal; Fourier Transform $X(e^{j\omega})$ of the sampled signal (with $\omega_0 = \Omega_0 T$); Fourier Transform $W(e^{j\omega})$ of the window function; Fourier Transform $V(e^{j\omega})$ of the windowed signal and the sampled values V[k] of the continuous spectrum obtained using the DFT.

Properties of short-time DFT analysis

Important effects:

Picket Fence:
If not enough sampled values of the continuous spectrum are available, spectral sampling can yield misleading results. This problem can be reduced using Zero Padding (the spacing between the coefficients S[k] becomes smaller, i.e. the frequency resolution becomes better).

Leakage: spreading of the line spectrum
Because the window function is limited in time, a spread spectrum is measured instead of the spectrum of the original signal, which is unlimited in time. That means that the line spectrum becomes spread even for pure sinusoidal signals.

3 examples of DFT analysis

We observe a continuous time signal x(t) composed of two sinusoids:

\[ x(t) = A_0 \cos(\Omega_0 t) + A_1 \cos(\Omega_1 t), \qquad -\infty < t < \infty \]

Sampling according to the sampling theorem (with negligible quantization errors) gives the discrete time signal x[n]:

\[ x[n] = A_0 \cos(\omega_0 n) + A_1 \cos(\omega_1 n), \qquad -\infty < n < \infty \]

where $\omega_0 = \Omega_0 T_S$ and $\omega_1 = \Omega_1 T_S$.

With the window function w[n]:

\[ v[n] = A_0 \, w[n] \cos(\omega_0 n) + A_1 \, w[n] \cos(\omega_1 n) \]

Intermediate calculations (see also the modulation principle):

\[
v[n] = \frac{A_0}{2} w[n] \exp(j \omega_0 n) + \frac{A_0}{2} w[n] \exp(-j \omega_0 n)
     + \frac{A_1}{2} w[n] \exp(j \omega_1 n) + \frac{A_1}{2} w[n] \exp(-j \omega_1 n)
\]

Fourier Transform of the windowed signal:

\[
V(e^{j\omega}) = \frac{A_0}{2} W(e^{j(\omega - \omega_0)}) + \frac{A_0}{2} W(e^{j(\omega + \omega_0)})
               + \frac{A_1}{2} W(e^{j(\omega - \omega_1)}) + \frac{A_1}{2} W(e^{j(\omega + \omega_1)})
\]

Assume:

\[ \Omega_0 = \frac{2\pi}{14} \cdot 10\,\text{kHz}, \qquad \Omega_1 = \frac{4\pi}{15} \cdot 10\,\text{kHz} \]

$1/T_S = 10$ kHz, rectangle window with N = 64, $A_0 = 1$, $A_1 = 0.75$

The windowed signal v[n] for the discrete time signal x[n] is therefore:

\[
v[n] = \begin{cases}
\cos\left(\frac{2\pi}{14} n\right) + 0.75 \cos\left(\frac{4\pi}{15} n\right) & 0 \le n \le 63 \\
0 & \text{otherwise}
\end{cases}
\]

[Figure: v[n] for n = 0, ..., 63]

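[Figure: Fourier Transform $W(e^{j\omega})$ of the rectangle window function (peak value 64)]

Example 1: Leakage Effect

Variation of $\Omega_0$ and $\Omega_1$, resp. $\omega_0$ and $\omega_1$:
the difference between the frequencies $\omega_0$ and $\omega_1$ is reduced gradually.

Case 1a:

\[ \Omega_0 = \frac{2\pi}{6} \cdot 10^4\,\text{Hz}, \qquad \Omega_1 = \frac{2\pi}{3} \cdot 10^4\,\text{Hz} \]

\[ \omega_0 = \Omega_0 T_S = \frac{2\pi}{6} \cdot 10^4\,\text{Hz} \cdot 10^{-4}\,\text{s} = \frac{2\pi}{6} \]

\[ \omega_1 = \Omega_1 T_S = \frac{2\pi}{3} \cdot 10^4\,\text{Hz} \cdot 10^{-4}\,\text{s} = \frac{2\pi}{3} \]

[Figure: $|V(e^{j\omega})|$ for $\omega_0 = \frac{2\pi}{6}$, $\omega_1 = \frac{2\pi}{3}$]

Case 1b: $\omega_0 = \frac{2\pi}{14}$, $\omega_1 = \frac{4\pi}{15}$

[Figure: $|V(e^{j\omega})|$ for case 1b]

Case 1c: $\omega_0 = \frac{2\pi}{14}$, $\omega_1 = \frac{2\pi}{12}$

[Figure: $|V(e^{j\omega})|$ for case 1c]

Case 1d: $\omega_0 = \frac{2\pi}{14}$, $\omega_1 = \frac{4\pi}{25}$

[Figure: $|V(e^{j\omega})|$ for case 1d]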

Example 2: Picket Fence Effect

The DFT gives sampled values of the spectrum of the windowed signal.
Spectral sampling can yield misleading results.

Case 2a:
Windowed signal v[n]:

\[
v[n] = \begin{cases}
\cos\left(\frac{2\pi}{14} n\right) + 0.75 \cos\left(\frac{4\pi}{15} n\right) & 0 \le n \le 63 \\
0 & \text{otherwise}
\end{cases}
\]

DFT of length N = 64 without Zero Padding

Figure 3.2: a) signal v[n]; b) DFT spectrum V[k]; c) Fourier spectrum $V(e^{j\omega})$.

Case 2b:
In contrast to case 2a, the frequencies of the sinusoids are changed only slightly.
Windowed signal v[n]:

\[
v[n] = \begin{cases}
\cos\left(\frac{2\pi}{16} n\right) + 0.75 \cos\left(\frac{2\pi}{8} n\right) & 0 \le n \le 63 \\
0 & \text{otherwise}
\end{cases}
\]

DFT of length N = 64 without Zero Padding

Figure 3.3: a) signal v[n]; b) DFT spectrum V[k]; c) Fourier spectrum $V(e^{j\omega})$.

Analysis of Example 2:
The appearance of the DFT spectrum is explained by the spectral sampling. Although in Case 2b the windowed signal v[n] contains significant frequency components beyond $\omega_0$ and $\omega_1$, they do not show in the DFT spectrum of length N = 64.
Using a rectangle window, the DFT of a sinusoidal signal gives sharp spectral lines, if the period N of the transformation is an integer multiple of the signal period and no Zero Padding is applied.

Explanation for the case of a complex exponential function:
Assume the signal x[n]:

\[ x[n] = \frac{1}{N} \exp\left(j \frac{2\pi}{n_0} n\right) \]

Then:

\[ X[k] = \delta\left(k - \frac{N}{n_0}\right) \]

For the DFT of the rectangle window the following holds:

\[ W[k] = \frac{\sin(\pi k)}{\sin(\pi k / N)} \]

The convolution theorem for the windowed signal v[n] gives:

\[
V[k] = X[k] * W[k] = \frac{\sin\left(\pi \left(k - \frac{N}{n_0}\right)\right)}{\sin\left(\pi \left(k - \frac{N}{n_0}\right) / N\right)}
\]

In case of $\frac{N}{n_0} \in \mathbb{N}$, only the DFT coefficient $k = \frac{N}{n_0}$ is non-zero.

Example 2 (continued)

Assume the signal v[n] of Case 2b:

\[
v[n] = \begin{cases}
\cos\left(\frac{2\pi}{16} n\right) + 0.75 \cos\left(\frac{2\pi}{8} n\right) & 0 \le n \le 63 \\
0 & \text{otherwise}
\end{cases}
\]

In contrast to Case 2b, a DFT of length N = 128 is applied (Zero Padding).

Result: with the finer sampling, the additional frequency components that are present emerge.

Figure 3.4: a) DFT of length N = 64; b) DFT of length N = 128; c) Fourier spectrum $V(e^{j\omega})$.
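The picket fence effect of Case 2b can be reproduced with a direct DFT. The following is a sketch under simple assumptions (rectangle window of length 64, direct $O(N^2)$ DFT); the function name `dft_magnitudes` and the threshold `1e-6` for "non-zero" are choices made for this illustration.

```python
import cmath
import math

def dft_magnitudes(v, N):
    """|V[k]| of the DFT of length N; v is zero padded if N > len(v)."""
    padded = list(v) + [0.0] * (N - len(v))
    return [abs(sum(padded[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N)))
            for k in range(N)]

L = 64   # window length (rectangle window)
v = [math.cos(2 * math.pi * n / 16) + 0.75 * math.cos(2 * math.pi * n / 8)
     for n in range(L)]

mag64 = dft_magnitudes(v, 64)     # no Zero Padding
mag128 = dft_magnitudes(v, 128)   # Zero Padding to N = 128

# indices of DFT coefficients that are clearly non-zero
lines64 = [k for k, m in enumerate(mag64) if m > 1e-6]
lines128 = [k for k, m in enumerate(mag128) if m > 1e-6]
```

Without Zero Padding only the four coefficients k = 4, 8, 56, 60 are non-zero (with magnitudes 32 and 24), because both signal periods divide N = 64. With N = 128 many more coefficients are non-zero: the leakage was present all along, the 64-point DFT just sampled the spectrum exactly at its zeros.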

Example 3:
Explanation of the following illustrations:
  Assume: signal of Example 2, Case 2a.
  Window: a Kaiser window is applied instead of the rectangle window.
  First: window length L = 64 and DFT length N = 64.
  Then: window length L and DFT length N are halved.
  Afterwards: for the case L = 32, the DFT length N is gradually increased up to N = 1024 (Zero Padding).
  Finally: DFT spectrum for the case N = 1024 and L = 64.

The Kaiser window is defined as:

\[
w_K[n] = \begin{cases}
\dfrac{I_0\left[\beta \left(1 - [(n - \alpha)/\alpha]^2\right)^{1/2}\right]}{I_0(\beta)} & 0 \le n \le L-1 \\
0 & \text{otherwise}
\end{cases}
\]

In this example: $\beta = 0.8$ and $\alpha = \frac{L-1}{2}$

The windowed signal v[n]:

\[
v[n] = w_K[n] \cos\left(\frac{2\pi}{14} n\right) + 0.75 \, w_K[n] \cos\left(\frac{4\pi}{15} n\right)
\]

Example 3 (continued):

[Figure: DFT length N = 64, window length L = 64: windowed signal v[n] (n = 0, ..., 63) and DFT spectrum V[k] (k = 0, ..., 63)]

[Figure: DFT length N = 32, window length L = 32 (N and L halved): windowed signal v[n] (n = 0, ..., 31) and DFT spectrum V[k] (k = 0, ..., 31)]

Effect of changing the DFT length N at constant window length L = 32 (Zero Padding):

[Figures: DFT spectra V[k] for N = 32, N = 64, N = 128 and N = 1024, each with window length L = 32]

Increasing the window length L:

[Figure: DFT spectrum V[k] for DFT length N = 1024, window length L = 64]

Example 4: Window function influence on the spectrum

Figure 3.5: Influence of the window function.
Top: speech signal (vowel "a"); centre: amplitude spectrum from a 512 point FFT using a rectangle window; bottom: amplitude spectrum from a 512 point FFT using a Hamming window.

3.3 Autocorrelation Function and Power Spectral Density

Definition of the autocorrelation function (ACF), analogous to the continuous-time case:

\[ R[k] := \sum_{n=-\infty}^{\infty} x[n]\, x[n+k] \]

For a signal x[n] assume (e.g. after some suitable windowing):

\[ x[n] = \begin{cases} x[n] & 0 \le n \le N-1 \\ 0 & \text{otherwise} \end{cases} \]

In this case the ACF gives:

\[ R[k] = \sum_{n=0}^{N-1-k} x[n]\, x[n+k] \]

because x[n] = 0 for n < 0 and n ≥ N.

[Figure: "triangular effect": the number of terms in R[k] decreases linearly with |k| and reaches zero at k = ±N]

Cross-correlation:

\[ R_{xy}[k] = \sum_{n=-\infty}^{\infty} x[n]\, y[n-k] \]

In contrast to convolution:

\[ O_{xy}[k] = \sum_{n=-\infty}^{\infty} x[n]\, y[k-n] \]


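Properties of ACF:
1. $R[k] = R[-k]$
2. $R[k] \le R[0]$ for each $k \in \mathbb{N}$ ($R[0]$: energy, intensity)
3. If $x[n] \to R[k]$, then $\alpha\, x[n] \to \alpha^2 R[k]$
4. Intensity spectrum is the Fourier Transform of the ACF:

\[
\begin{aligned}
|X(e^{j\omega})|^2 &= X(e^{j\omega})\, \overline{X(e^{j\omega})} \\
&= \sum_{k=-\infty}^{\infty} x[k] \exp(-j\omega k) \sum_{l=-\infty}^{\infty} x[l] \exp(j\omega l) \\
&= \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} x[k]\, x[l] \exp(-j\omega k) \exp(j\omega l) \\
&= \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} x[k+l]\, x[l] \exp(-j\omega k) \exp(-j\omega l) \exp(j\omega l) \\
&= \sum_{k=-\infty}^{\infty} \left[ \sum_{l=-\infty}^{\infty} x[k+l]\, x[l] \right] \exp(-j\omega k) \\
&= \sum_{k=-\infty}^{\infty} R[k] \exp(-j\omega k)
\end{aligned}
\]

Note: The phase spectrum is removed.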


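5. Because of the symmetry $R[k] = R[-k]$, the Fourier transform of the ACF becomes a cosine transform:

\[
\begin{aligned}
|X(e^{j\omega})|^2 &= \sum_{k=-\infty}^{\infty} R[k] \exp(-j\omega k) \\
&= \sum_{k=-(N-1)}^{N-1} R[k] \exp(-j\omega k) \\
&= R[0] + \sum_{k=1}^{N-1} R[k] \left( \exp(-j\omega k) + \exp(j\omega k) \right) \qquad \text{because } R[k] = R[-k] \\
&= R[0] + 2 \sum_{k=1}^{N-1} R[k] \cos(\omega k)
\end{aligned}
\]

6. The intensity spectrum $|X(e^{j\omega})|^2$ is a polynomial in $\cos(\omega)$ of degree $N-1$.
Reason: de Moivre formula:

\[ \cos(k\omega) = \cos^k(\omega) - \binom{k}{2} \cos^{k-2}(\omega) \sin^2(\omega) + \binom{k}{4} \cos^{k-4}(\omega) \sin^4(\omega) - \dots \]

Only even powers of $\sin(\omega)$ occur, and these can be expressed through $\sin^2(\omega) = 1 - \cos^2(\omega)$.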
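
Properties 2, 4 and 5 can be checked numerically. The following is a minimal sketch, assuming NumPy and a random test signal (the signal, its length and the frequency grid are illustrative choices). It compares $|X(e^{j\omega})|^2$, computed directly from x[n], with the cosine transform of the ACF.

```python
import numpy as np

def acf(x):
    """R[k] = sum_n x[n] x[n+k] for k = 0, ..., N-1 (finite-length signal)."""
    N = len(x)
    return np.array([np.dot(x[:N - k], x[k:]) for k in range(N)])

rng = np.random.default_rng(0)
x = rng.standard_normal(16)              # illustrative test signal
N = len(x)
R = acf(x)

omega = np.linspace(-np.pi, np.pi, 101)  # frequency grid
k = np.arange(N)

# Fourier transform of x evaluated on the frequency grid
X = np.exp(-1j * np.outer(omega, k)) @ x

# Cosine transform of the ACF: R[0] + 2 * sum_{k>=1} R[k] cos(omega k)
S = R[0] + 2 * np.cos(np.outer(omega, k[1:])) @ R[1:]

print(np.allclose(np.abs(X) ** 2, S))
```

Both routes give the same intensity spectrum, which illustrates that the ACF carries the magnitude information of the spectrum but not its phase.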


Example 1: Spectral analysis using ACF

[Figure panels: speech signal (phoneme "a"); amplitude spectrum (Hamming window); amplitude spectrum (short Hamming window); amplitude spectrum (19 ACF coefficients); amplitude spectrum (13 ACF coefficients)]

Figure 3.6: Fourier Transform of a voiced speech segment:
a) signal waveform, b) high-resolution Fourier Transform, c) low-resolution Fourier Transform with a short Hamming window (50 samples), d) low-resolution Fourier Transform using the autocorrelation function (19 coefficients), e) low-resolution Fourier Transform using the autocorrelation function (13 coefficients)


Example 2: ACF of voiced and unvoiced speech segments

[Figure panels, for the phonemes "a" and "s": speech signal; autocorrelation (rectangle window); autocorrelation (Hamming window)]

Figure 3.7: Signal waveform and autocorrelation function of a voiced (left) and an unvoiced (right) speech segment


Example 3: Temporal progression of autocorrelation coefficients

[Figure panels: speech signal (digit sequence 0861909); ACF coefficient for index 0 (energy); ACF coefficient for index 3; ACF coefficient for index 6; ACF coefficient for index 9]

Figure 3.8: Temporal progression of the speech signal and of four autocorrelation coefficients


3.4 Spectrograms

Using DFT

Wide-band (in frequency domain):
- short time window
- interaction in the synchronization between time window and pitch impulses results in vertical lines
- no resolution of spectral fine structure

Narrow-band (in frequency domain):
- long time window
- good resolution of the spectral fine structure
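
The difference between the two spectrogram types comes down to the window length of a short-time DFT. The following is a minimal sketch, assuming NumPy, a synthetic periodic signal as a stand-in for voiced speech, and illustrative values for sampling rate, window lengths and hop size (none of these are taken from the figures).

```python
import numpy as np

def spectrogram(x, win_len, hop, n_fft):
    """Log-magnitude short-time DFT with a Hamming window; rows are time frames."""
    w = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * w for s in starts])
    # the small constant avoids log(0)
    return np.log(np.abs(np.fft.rfft(frames, n_fft, axis=1)) + 1e-10)

fs = 8000                                    # assumed sampling rate in Hz
t = np.arange(fs) / fs
x = np.sign(np.sin(2 * np.pi * 100 * t))     # synthetic 100 Hz periodic signal

wide = spectrogram(x, win_len=32, hop=16, n_fft=512)      # 4 ms window: wide-band
narrow = spectrogram(x, win_len=512, hop=16, n_fft=512)   # 64 ms window: narrow-band
print(wide.shape, narrow.shape)
```

With the short window, each frame covers less than one pitch period, so the frame energy varies with the position of the window relative to the pitch impulses. With the long window, several periods fall into each frame and the harmonics of the fundamental frequency are resolved.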


Example 1: speech spectrograms

Figure 3.9: a) wide-band spectrogram: short time window, high time resolution (vertical lines), no frequency resolution; for voiced signals provides information on formant
structure b) narrow-band spectrogram: long time window, no time resolution, high
frequency resolution (horizontal lines); for voiced signals provides information on fundamental frequency (pitch)

Example 2: speech spectrograms

Figure 3.10: Wide-band and narrow-band spectrogram and speech amplitude for the
sentence Every salt breeze comes from the sea.


3.5 Filter Bank Analysis

History:
Decomposition of the signal using a bank of band-pass filters and energy calculation in each frequency band

[Figure: transfer functions of the band-pass filters]

Today digitally:
- Digital filters:
\[ y_k[n] = \sum_{m=-\infty}^{\infty} h_k[n-m]\, x[m], \qquad k = 1, \dots, K \]
  FIR: Finite Impulse Response
  IIR: Infinite Impulse Response (recursive filters)
- DFT (FFT) + further processing

DFT/FFT Method:
- Window function
- Appending zeros for the desired resolution (zero padding)
- FFT
- Energy calculation: $|X(e^{j\omega})|$, $|X(e^{j\omega})|^2$, $\log |X(e^{j\omega})|$
- Weighted averaging for each channel, i.e. for each frequency band



DFT/FFT filter bank:

[Figure: two plots of the transfer functions of the filter bank channels]


Averaging:
- summation should be as smooth as possible over all channels
- Form: rectangle, triangle, trapezoid, etc.

Choosing the central frequencies $f_k$:
- constant: $\Delta f_k$ = const. for all k,
  e.g. 20 channels with $\Delta f$ = 200 Hz for 0 to 4 kHz
- constant relative band width: $\Delta f_k / f_k$ = const. for all k
- frequency groups of the ear (total number 24):
  f < 500 Hz: $\Delta f$ = 100 Hz
  f ≥ 500 Hz: $\Delta f / f$ = 20%
- adjusted to vowels or sounds


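3.6 Mel-frequency scale

The frequency resolution of the human ear decreases towards higher frequencies. This empirical dependency results in the definition of the Mel scale, which is approximately calculated as (from: Hidden Markov Toolkit, Cambridge University Engineering Department, S. J. Young):

\[ f_{MEL} = 2595 \cdot \log_{10}\left(1 + \frac{f}{700\,\mathrm{Hz}}\right) \]

[Figure: $f_{MEL}$ as a function of f / Hz; axis marks at f = 7000 Hz and $f_{MEL}$ = 2700]

Compression of the high frequencies

[Figure: intervals $\Delta f$ on the frequency axis and the corresponding intervals $\Delta f_{MEL}$ on the Mel axis]

A filter bank with constant band-widths can be used on the Mel scale:

[Figure: filter bank with filters of equal width on the $f_{MEL}$ axis]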


Table: MEL Scale:

f / Hz    fMEL
    65     100
   136     200
   213     300
   298     400
   391     500
   492     600
   603     700
   724     800
   856     900
  1000    1000
  1158    1100
  1330    1200
  1519    1300
  1724    1400
  1949    1500
  2195    1600
  2464    1700
  2757    1800
  3078    1900
  3429    2000
  3812    2100
  4230    2200
  4688    2300
  5187    2400
  5734    2500
  6331    2600
  6984    2700
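
The table entries follow directly from the Mel formula and its inverse. The following is a minimal sketch in plain Python; the function names `hz_to_mel` and `mel_to_hz` are illustrative and not taken from any toolkit.

```python
import math

def hz_to_mel(f_hz):
    """Mel value for a frequency in Hz: 2595 * log10(1 + f / 700 Hz)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    """Inverse mapping: frequency in Hz for a given Mel value."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

# Equidistant points on the Mel scale, as used for the centre frequencies of a Mel filter bank
for f_mel in (100, 500, 1000, 2000):
    print(f_mel, round(mel_to_hz(f_mel)))
```

Equal steps of 100 on the Mel scale correspond to steps of roughly 65 to 100 Hz below 1 kHz, but to several hundred Hz near the upper end of the table, which is the compression of the high frequencies described above.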


3.7 Cepstrum

The Cepstrum is the Fourier series expansion of the logarithm of the spectrum.
Comparison: autocorrelation function is a Fourier series of the normal spectrum.
We consider:

\[ y[n] = \sum_{k=-\infty}^{\infty} h[n-k]\, x[k] \]

Goal:
Separating the kernel h[n] from the input signal x[n].
This problem is also called inversion or deconvolution.
Convolution theorem:
\[ Y(e^{j\omega}) = H(e^{j\omega}) \cdot X(e^{j\omega}) \]
Logarithm (complex):
\[ \log Y(e^{j\omega}) = \log H(e^{j\omega}) + \log X(e^{j\omega}) \]
Inverse Fourier Transform:
\[ F^{-1}\left\{\log Y(e^{j\omega})\right\} = F^{-1}\left\{\log H(e^{j\omega})\right\} + F^{-1}\left\{\log X(e^{j\omega})\right\} \]


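Another notation:

\[ \hat{y}[n] = \hat{x}[n] + \hat{h}[n] \]

using the definition of the cepstrum $\hat{x}[n]$ for x[n] (analogous for y[n] and h[n]):

\[
\begin{aligned}
\hat{x}[n] &= F^{-1}\left\{\log X(e^{j\omega})\right\} \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n) \log X(e^{j\omega})\, d\omega \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n) \log\left[ \sum_m x[m] \exp(-j\omega m) \right] d\omega \\
&= C\{x[n]\}
\end{aligned}
\]

Note:
- Cepstrum = artificial word derived from spectrum
- Cepstrum is located in time domain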


Through the cepstrum transformation

\[ x[n] \to \hat{x}[n] = C\{x[n]\} \]

the convolution reduces to a simple addition.

In the cepstrum domain, a linear operation L (time invariance is not necessary) on $\hat{y}[n]$ is performed separately on $\hat{h}[n]$ and $\hat{x}[n]$:

\[
\begin{aligned}
y[n] &= \sum_{k=-\infty}^{\infty} h[n-k]\, x[k] \\
\hat{y}[n] &= \hat{h}[n] + \hat{x}[n] \\
L\{\hat{y}[n]\} &= L\{\hat{h}[n]\} + L\{\hat{x}[n]\}
\end{aligned}
\]

With the definition $G_L$ for the concatenation of the cepstrum, the operation L, and the inverse cepstrum
\[ G_L := C^{-1} \circ L \circ C \]
we obtain
\[ G_L\{h[n] * x[n]\} = G_L\{h[n]\} * G_L\{x[n]\} . \]
Such a transformation $G_L$ acts on h[n] and x[n] separately, and is called homomorphic (structure preserving).


Complex cepstrum:

\[ \hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n) \log X(e^{j\omega})\, d\omega \]

Note: complex logarithm

Simple cepstrum (real cepstrum):

\[ \hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n) \log |X(e^{j\omega})|\, d\omega \]

Cepstrum: Fourier coefficients of the logarithmized power spectral density
ACF: Fourier coefficients of the Fourier series of the power spectral density
Setting cepstral coefficients $\hat{x}[n]$ to zero for high n results in smoothing of the power spectral density.
Implementation:
Fourier Transform via N-point FFT (N = 512, 1024, 2048)
(But: discretisation error):

\[ \hat{x}[n] := \frac{1}{N} \sum_{k=0}^{N-1} e^{j \frac{2\pi}{N} k n} \log |X(e^{j \frac{2\pi}{N} k})| \]
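
The FFT implementation and the smoothing by truncation can be sketched as follows. This is a minimal sketch, assuming NumPy; the function names, the random test signals and the use of a circular convolution (so that the sampled spectra multiply exactly) are illustrative assumptions.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Inverse DFT of log|X[k]| on an n_fft-point frequency grid."""
    log_mag = np.log(np.abs(np.fft.fft(x, n_fft)))
    return np.fft.ifft(log_mag).real

def smoothed_log_spectrum(x, n_fft, n_keep):
    """Keep cepstral coefficients 0..n_keep-1 (and their mirror images), zero the rest."""
    c = real_cepstrum(x, n_fft)
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0
    lifter[n_fft - n_keep + 1:] = 1.0
    return np.fft.fft(c * lifter).real

rng = np.random.default_rng(1)
N = 512
x = rng.standard_normal(N)                               # illustrative input signal
h = rng.standard_normal(N)                               # illustrative kernel
y = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real      # circular convolution of h and x

# Convolution in the time domain becomes addition in the cepstrum domain
print(np.allclose(real_cepstrum(y, N), real_cepstrum(h, N) + real_cepstrum(x, N)))

smooth = smoothed_log_spectrum(x, N, 13)                 # spectrum from the first 13 coefficients
```

Because the real cepstrum is symmetric, the mirror images of the kept coefficients have to be kept as well; otherwise the reconstructed log spectrum would not be real.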


Example 1: Real cepstrum

Fine structure of power spectral density with the period 1/T results in a single peak in the cepstrum at time T.

[Figure panels: $\log|F(\omega)|^2$ over frequency, with ripple period 1/T (above); $F^{-1}(\log|F(\omega)|^2)$ over time (below)]

Figure 3.11: Above: logarithmized power spectrum of a spoken vowel (schematic). Below: corresponding cepstrum (inverse Fourier transform of the logarithmized power spectrum).


Example 2: Smoothing

[Figure panels: speech signal (phoneme "a"); windowed phoneme "a" (Hamming window); spectrum from cepstrum (whole cepstrum and first 13 coefficients)]

Figure 3.12: Cepstral smoothing: speech signal (vowel "a"), windowed speech signal (Hamming window), spectrum obtained from the whole cepstrum (blue) and smoothed spectrum obtained from the first 13 cepstral coefficients (red).


Example 3: Smoothing with different numbers of cepstral coefficients

[Figure panels: speech signal (phoneme "a"); spectrum from cepstrum (whole cepstrum, first 13 coefficients); spectrum from cepstrum (whole cepstrum, first 19 coefficients); spectrum from cepstrum (whole cepstrum, first 19 coefficients, first 13 coefficients)]

Figure 3.13: Homomorphic analysis of a speech segment: signal waveform, homomorphically smoothed spectrum using 13 and 19 cepstral coefficients


Cepstrum calculation using Filter Bank Output

Filter bank outputs A[k] for k = 1, . . . , K
Note: k = 0 is missing.
We complete the outputs symmetrically:

[Figure: symmetrically completed sequence of outputs $A_{-K+1}, \dots, A_{-1}, A_0, A_1, \dots, A_K$]

Symmetry $A_{-k+1} = A_k$ for all k = 1, . . . , K.


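Inverse DFT a[n] of the symmetric sequence $A_{-K+1}, \dots, A_K$:

\[
\begin{aligned}
a[n] &= \frac{1}{2K} \sum_{k=-K+1}^{K} A_k \exp\left(\frac{2\pi j}{2K}\, nk\right) \\
&= \frac{1}{2K} \sum_{k=1}^{K} A_k \left[ \exp\left(\frac{2\pi j}{2K}\, nk\right) + \exp\left(\frac{2\pi j}{2K}\, n(-k+1)\right) \right] \\
&= \exp\left(\frac{2\pi j}{2K}\, 0.5\, n\right) \frac{1}{2K} \sum_{k=1}^{K} A_k \left[ \exp\left(\frac{2\pi j}{2K}\, n(k-0.5)\right) + \exp\left(-\frac{2\pi j}{2K}\, n(k-0.5)\right) \right] \\
&= \exp\left(\frac{2\pi j}{2K}\, 0.5\, n\right) \frac{1}{K} \sum_{k=1}^{K} A_k \cos\left(\frac{\pi n}{K}(k-0.5)\right)
\end{aligned}
\]

The phase term $\exp\left(\frac{2\pi j}{2K}\, 0.5\, n\right)$ depends on the position of the symmetry axis around k = 0.5.

Cepstrum is defined as:

\[ \tilde{a}[n] = \frac{1}{K} \sum_{k=1}^{K} A_k \cos\left(\frac{\pi n}{K}(k-0.5)\right) \]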
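
The cosine transform of the filter bank outputs can be written in a few lines. The following is a minimal sketch, assuming NumPy; the function name `filterbank_cepstrum` and the constant test input are illustrative.

```python
import numpy as np

def filterbank_cepstrum(A, n_coeffs):
    """a[n] = (1/K) * sum_{k=1}^{K} A_k cos(pi * n * (k - 0.5) / K) for n = 0, ..., n_coeffs-1."""
    A = np.asarray(A, dtype=float)
    K = len(A)
    k = np.arange(1, K + 1)
    n = np.arange(n_coeffs)
    return (np.cos(np.pi * np.outer(n, k - 0.5) / K) @ A) / K

# A flat (constant) set of K = 20 filter bank outputs
a = filterbank_cepstrum(np.full(20, 3.0), 13)
print(np.round(a, 6))
```

For constant outputs only the coefficient with n = 0 is non-zero and equals the mean of the outputs; all higher coefficients vanish, since they measure the variation of the outputs across the channels.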


Mel Cepstrum according to Davis and Mermelstein

[Figure: triangular filters k = 1, ..., K along the $f_{MEL}$ axis; values marked in the figure: 100 MEL and 300 MEL]

Filter bank:
- overlapping band-pass filters of triangular shape,
- all channels have equal band width, and filter positioning is equidistant on a Mel scale.
Calculation of the filter bank outputs:
- magnitude of DFT coefficients,
- for each channel summation of the magnitudes according to the triangular weight function,
- for each channel logarithm of the sum.
Thus the filter outputs A[k] with k = 1, . . . , K are obtained. Using the filter bank outputs, the cepstrum is calculated using a cosine transform (see previous description).


3.8 Statistical Interpretation of the Cepstrum Transformation

We consider the filter bank outputs $\log|X_k|$.

[Figure: $\log|X_k|$ plotted over k, with axis mark N/2]

Assumption: The correlation between the outputs s and p, i.e. the element $C_{sp}$ of the covariance matrix, does not depend directly on s or p, but only on their difference. Because the spectrum is periodic there is no distance greater than N:
\[ C_{sp} = c_{(s-p) \bmod N} \]
It is further assumed that the correlation is locally symmetric:
\[ C_{s,s+n} = C_{s,s-n} \]
Then:
\[ c_{(s-s-n) \bmod N} = c_{(s-s+n) \bmod N} \]
\[ c_{(-n) \bmod N} = c_{(+n) \bmod N} \]
With $0 \le n \le N$ follows:
\[ c_n = c_{N-n} \]
i.e. we have a symmetric cyclic matrix with the kernel vector c.


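Example: the covariance matrix for N = 8:

\[
C = \begin{pmatrix}
c_0 & c_1 & c_2 & c_3 & c_4 & c_3 & c_2 & c_1 \\
c_1 & c_0 & c_1 & c_2 & c_3 & c_4 & c_3 & c_2 \\
c_2 & c_1 & c_0 & c_1 & c_2 & c_3 & c_4 & c_3 \\
c_3 & c_2 & c_1 & c_0 & c_1 & c_2 & c_3 & c_4 \\
c_4 & c_3 & c_2 & c_1 & c_0 & c_1 & c_2 & c_3 \\
c_3 & c_4 & c_3 & c_2 & c_1 & c_0 & c_1 & c_2 \\
c_2 & c_3 & c_4 & c_3 & c_2 & c_1 & c_0 & c_1 \\
c_1 & c_2 & c_3 & c_4 & c_3 & c_2 & c_1 & c_0
\end{pmatrix}
\]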

Such a covariance matrix will be diagonalised using the cosine transform


(or Fourier Transform, which results in the cosine transform due to the
symmetry) (see excursion in chapter 2.17).
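
The diagonalisation can be verified numerically for N = 8. The following is a minimal sketch, assuming NumPy and an arbitrary illustrative kernel vector c that satisfies $c_n = c_{N-n}$.

```python
import numpy as np

N = 8
c = np.array([5.0, 3.0, 2.0, 1.0, 0.5, 1.0, 2.0, 3.0])   # illustrative kernel with c_n = c_{N-n}
s, p = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = c[(s - p) % N]                                        # C_sp = c_{(s-p) mod N}

idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N)          # DFT matrix
D = F @ C @ F.conj().T / N                                # F C F^{-1}, with F^{-1} = F^H / N

eigenvalues = np.fft.fft(c)                               # DFT of the kernel vector
print(np.allclose(D, np.diag(eigenvalues)))
print(np.allclose(eigenvalues.imag, 0.0))
```

The transformed matrix is diagonal, and its diagonal is the DFT of the kernel vector. Because the kernel is symmetric, this DFT is real, so only the cosine terms contribute.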


3.9 Energy in acoustic Vector

The energy is usually added as zeroth (or first) component to the acoustic vector.
For the logarithmic energy we have:

\[ \log E = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log|X(e^{j\omega})|^2\, d\omega \]

For the (short time) spectrum or cepstrum it approximately holds:

\[ \log E \approx \frac{1}{K} \sum_{k=1}^{K} \log|X_k|^2 \]

Spectra are usually normalized with log E:

\[ \log Y_k^2 = \log|X_k|^2 - \log E \]
such that:
\[ \sum_{k=1}^{K} \log Y_k^2 \approx 0 \]

The cepstral coefficient $\hat{x}[0]$ is the logarithmized energy.


Chapter 4
Fourier Transform and Image Processing

Overview:
4.1 Spatial Frequencies and Fourier Transform for Images
4.2 Discrete Fourier Transform for Images
4.3 Fourier Transform in Computer Tomography
4.4 Fourier Transform and RST Invariance


4.1 Spatial Frequencies and Fourier Transform for Images

A grey-valued image g(x, y) can be interpreted as:

\[ g : \mathbb{R}^2 \to [0, \infty[ , \qquad (x, y) \mapsto g(x, y) \]
Space coordinates (x, y) are at first considered as continuous. Discretization and DFT will be analyzed later.

Convention: $g(x, y) \equiv 0$ outside of the image.

The Fourier transform $G(f_x, f_y)$ of the image g(x, y) is defined as:

\[
\begin{aligned}
G(f_x, f_y) &= F\{(x, y) \to g(x, y)\} \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, e^{-2\pi j (f_x x + f_y y)}\, dx\, dy
\end{aligned}
\]

The arguments $f_x$ and $f_y$ are called spatial frequencies.


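The two-dimensional Fourier transform can be obtained by using two one-dimensional Fourier transforms.

We consider one image row with a constant value of y:
\[ x \to g(x, y) \]
Corresponding Fourier transform $G_y(f_x)$:
\[
\begin{aligned}
G_y(f_x) &= F\{x \to g(x, y)\} \\
&= \int_{-\infty}^{+\infty} g(x, y)\, e^{-2\pi j f_x x}\, dx
\end{aligned}
\]

Then we compute the Fourier transform of the function:
\[ y \to G_y(f_x) \]
and obtain:
\[
\begin{aligned}
F\{y \to G_y(f_x)\} &= \int_{-\infty}^{+\infty} G_y(f_x)\, e^{-2\pi j f_y y}\, dy \\
&= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, y)\, e^{-2\pi j (f_x x + f_y y)}\, dx\, dy
\end{aligned}
\]

In this way we get the result:
\[ G(f_x, f_y) = F\{y \to F\{x \to g(x, y)\}\} \]
For the inverse FT we have (as can be expected):
\[
\begin{aligned}
g(x, y) &= F^{-1}\{(f_x, f_y) \to G(f_x, f_y)\} \\
&= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} G(f_x, f_y)\, e^{2\pi j (f_x x + f_y y)}\, df_x\, df_y
\end{aligned}
\]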


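We would like to interpret the two-dimensional FT visually. For this purpose, we consider the exponential factor in the FT and require the following condition:

\[ e^{2\pi j (f_x x + f_y y)} \overset{!}{=} 1 \]
\[ 2\pi j (f_x x + f_y y) = 2\pi j\, n \qquad \text{for } n \in \mathbb{N} \]
\[ y = -\frac{f_x}{f_y}\, x + \frac{n}{f_y} \]

[Figure: family of parallel straight lines in the (x, y) plane with axis intercepts $1/f_x$ and $1/f_y$; the distance between neighbouring lines is the spatial period L]

\[ L = \frac{1}{\sqrt{f_x^2 + f_y^2}} \qquad \text{(spatial period)} \]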


Special case:
$|G(f_x, f_y)|$ has a large value only at one point $(u, v) = (f_x, f_y)$ in the spatial frequency plane

[Figure: $|G(f_x, f_y)|$ in the spatial frequency plane $(f_x, f_y)$ with peaks at (u, v) and (-u, -v)]


Since $G(-f_x, -f_y) = \overline{G(f_x, f_y)}$ for a real image g(x, y), we have two dominant frequency pairs in the Fourier transform integral:
\[ |G(u, v)| \left[ e^{2\pi j (ux + vy)} + e^{-2\pi j (ux + vy)} \right] = 2\, |G(u, v)| \cos 2\pi (ux + vy) \]
This function describes a black-white cosine wave pattern with $(f_x, f_y) = (u, v)$.

Where is the value of $G(f_x, f_y)$ large?
- ideally: the points (u, v) and (-u, -v) represent a cosine-shaped variation of the grey values
- in reality: a straight line through (u, v) and (-u, -v) represents abrupt changes of the grey values


Figure 4.1: TVimage (analog)

Figure 4.2: Digitized TVimage

Figure 4.3: Amplitude spectrum of Figure 4.2

Figure 4.4: Low-pass filtered


Figure 4.5: High-pass filtered

Figure 4.6: High-pass enhancement

Explanation for figures 4.1 to 4.6 (from Duda & Hart 1973, pp. 310 to 312):
Figure 4.1: TV image (analog)
Figure 4.2: digitized TV image
- 120 x 120 pixels
- grey values from 0 (black) to 15 (white)
Figure 4.3: Fourier transform of the image from Figure 4.2 (amplitude spectrum)
$\log|G(f_x, f_y)|$: black corresponds to high amplitude
note:
1. strong components along the axes correspond to vertical and horizontal image edges
2. concentration around $(f_x, f_y) = (0, 0)$ corresponds to regions with constant grey values
Figure 4.4: Low-pass filter:
\[ H(f_x, f_y) = [\cos(\pi f_x) \cos(\pi f_y)]^{16}, \qquad 0 \le H \le 1 \]
Figure 4.5: High-pass filter:
\[ H(f_x, f_y) = 1.5 - [\cos(\pi f_x) \cos(\pi f_y)]^{4}, \qquad 0.5 \le H \le 1.5 \]
Figure 4.6: High-pass enhancement:
\[ H(f_x, f_y) = 2.0 - [\cos(\pi f_x) \cos(\pi f_y)]^{4}, \qquad 1.0 \le H \le 2.0 \]


The following general rules for $G(f_x, f_y)$ result:

Edges in the image g(x, y):
An image edge produces strong spatial frequency components along one straight line in the spatial frequency plane which is orthogonal to the edge.
The sharper the edge is, the longer is the corresponding line in the spatial frequency domain.
Regions with constant grey values:
Regions with constant grey values increase the values of $|G(f_x, f_y)|$ around the origin $(f_x, f_y) = (0, 0)$. The value at $(f_x, f_y) = (0, 0)$ is called the DC component (average grey value, DC = direct current).


4.2 Discrete Fourier Transform for Images

The analog image g(x, y) is discretized (sampled) along both axes. We obtain the discrete image:
\[ g[j, k] := g(j \cdot \Delta x,\; k \cdot \Delta y) \qquad \text{where } j, k = 0, 1, \dots, N-1 \]

Change in notation: $i = \sqrt{-1}$ instead of j

\[ G\left(e^{\frac{2\pi i}{N} u}, e^{\frac{2\pi i}{N} v}\right) = \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} g[j, k]\, e^{-\frac{2\pi i}{N}(uj + vk)} \qquad \text{where } u, v = 0, 1, \dots, N-1 \]

Discretization is written as:

\[
\begin{aligned}
G[u, v] &= \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} g[j, k]\, e^{-\frac{2\pi i}{N}(uj + vk)} \\
&= \sum_{j=0}^{N-1} e^{-\frac{2\pi i}{N} uj} \sum_{k=0}^{N-1} g[j, k]\, e^{-\frac{2\pi i}{N} vk}
\end{aligned}
\]

Interpretation:
Fourier transform of the image is first performed row by row, then column by column.
Using the usual definition of the Fourier matrix W (i.e. $W_{vk} = (e^{-\frac{2\pi i}{N}})^{vk}$), we obtain the matrix representation of the Fourier transform.
Using the notation:
\[ g \in \mathbb{R}^{N \times N}, \qquad W \in \mathbb{C}^{N \times N}, \qquad G \in \mathbb{C}^{N \times N} \]
we obtain
\[ G = [W\, g\, W] \]
\[ g = \frac{1}{N^2}\, [W^{-1}\, G\, W^{-1}] \]

Note: Depending on the definition of the Fourier transform, the factors 1 and $1/N^2$ are replaced by the factors 1/N and 1/N, or $1/N^2$ and 1.
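
The matrix representation can be compared with a library FFT. The following is a minimal sketch, assuming NumPy and a small random test image. It reads the matrix $W^{-1}$ in the formula with the factor $1/N^2$ as the matrix with the complex conjugate entries of W; this reading is an assumption about the notation.

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
g = rng.random((N, N))                               # illustrative image with random grey values

idx = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(idx, idx) / N)     # W[v, k] = exp(-2*pi*i/N)^(v*k)

G = W @ g @ W                                        # forward transform, factor 1
g_back = (W.conj() @ G @ W.conj()).real / N**2       # inverse transform, factor 1/N^2

print(np.allclose(G, np.fft.fft2(g)))
print(np.allclose(g_back, g))
```

The left multiplication by W transforms along the first index and the right multiplication along the second index, which is the successive application of two one-dimensional transforms described above.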

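4.3 Fourier Transform in Computer Tomography

[Figure: image region crossed by parallel straight lines y = ax + b with constant a and varying axis intercept b]

We consider a projection of the image g(x, y) along the straight line:
\[ y = ax + b \]
We produce a set of straight lines by keeping a constant and varying b.
Projection:
\[ g_a(b) = \int_{-\infty}^{\infty} g(x, ax + b)\, dx \]
Fourier transform:
\[
\begin{aligned}
G_a(f_b) &= \int_{-\infty}^{\infty} g_a(b)\, e^{-2\pi j f_b b}\, db \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, ax + b)\, e^{-2\pi j f_b b}\, db\, dx
\end{aligned}
\]

We substitute b = y - ax and obtain:
\[
\begin{aligned}
G_a(f_b) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, e^{-2\pi j (y f_b - x a f_b)}\, dy\, dx \\
&= G(-a f_b, f_b)
\end{aligned}
\]
This is the Fourier transform $G(f_x, f_y)$ of g(x, y) along the spatial frequency straight line $(f_x, f_y)$ with $f_y = -\frac{1}{a} f_x$.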


Remarks:
a. The straight line in the spatial frequency domain, $(f_x, f_y) = (-a f_b, f_b)$, is orthogonal to y = ax + b:
\[ y = ax + b \quad \Rightarrow \quad \text{in the Fourier transform: } f_y = -\frac{1}{a} f_x \]
The angle between these straight lines is a right angle because $a \cdot \left(-\frac{1}{a}\right) = -1$.
In general:
\[ y_1(x) = m_1 x + b_1, \qquad y_2(x) = m_2 x + b_2 \]
\[ y_1(x) \perp y_2(x) \iff m_1 \cdot m_2 = -1 \]
b. The value $G_a(f_b)$ is independent of the offset b and depends only on the orientation a of the straight line. Therefore, if we calculate the projection for many different inclinations a and apply the one-dimensional FT, we obtain the two-dimensional FT of the image g(x, y).
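
For the special case a = 0 (projection along lines y = b), the relation $G_a(f_b) = G(-a f_b, f_b)$ has an exact discrete counterpart. The following is a minimal sketch, assuming NumPy and a random test image; the index convention g[j, k] with j for x and k for y follows section 4.2.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.random((16, 16))            # g[j, k]: j indexes x, k indexes y

proj = g.sum(axis=0)                # a = 0: sum over x along each line y = b
slice_1d = np.fft.fft(proj)         # one-dimensional DFT of the projection
slice_2d = np.fft.fft2(g)[0, :]     # two-dimensional DFT on the line f_x = 0

print(np.allclose(slice_1d, slice_2d))
```

Other inclinations a require interpolation on the discrete grid, which is why this sketch is restricted to the axis-parallel case.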


4.4 Fourier Transform and RST Invariance

We will investigate the invariance of the Fourier transform to
R : Rotation
S : Scaling
T : Translation

We will use vector notation for the two-dimensional Fourier transform:
coordinates: $z = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2$
image grey values: $g(z) = g(x, y) \in \mathbb{R}^+$
spatial frequency: $f = \begin{pmatrix} f_x \\ f_y \end{pmatrix} \in \mathbb{R}^2$

We ignore the discretization.


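Translation
$z \to z + z_0$ with translation vector $z_0 = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} \in \mathbb{R}^2$
Image: $g(z) \to \tilde{g}(z) := g(z + z_0)$
FT: $\tilde{G}(f) = \exp\left(2\pi i\, [f_x x_0 + f_y y_0]\right) G(f)$

Rotation
$z \to D z$ with rotation matrix $D = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}$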


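Scaling
$z \to \alpha z$ with scaling factor $\alpha > 0$
Image: $g(z) \to \tilde{g}(z) = g(\alpha z)$
FT: $\tilde{G}(f) = \frac{1}{\alpha^2}\, G\left(\frac{f}{\alpha}\right)$

basically: similarity principle (p. ??) for the one-dimensional FT, transferred to two dimensions

Invertible linear mapping
$z \to A z$ where $A \in \mathbb{R}^{2 \times 2}$ is invertible
Image: $g(z) \to \tilde{g}(z) = g(A z)$
FT:
\[ \tilde{G}(f) = \dots = \frac{1}{|\det(A)|}\, G\left((A^{-1})^T f\right) \]
Proof: transformation of the two-dimensional integration variables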


We apply two basic rules to obtain the RST-invariance:

1. Invariance to translation (= T) can be obtained by using the square of the absolute value:
\[ g(z) \to G(f) \to |G(f)|^2 \]
2. To obtain RS-invariance we change to polar coordinates in the spatial frequency domain. We write in complex notation:
\[ f_z \;\hat{=}\; f_x + i f_y = r e^{i\varphi} = \exp(\ln r + i\varphi) \in \mathbb{C} \]
Complex logarithm:
\[ \tilde{f}_z := \ln f_z = \ln r + i\varphi \]
We already know:
a) rotation by angle $\varphi_0$ in the spatial domain corresponds to rotation by angle $\varphi_0$ in the spatial frequency domain
b) scaling with factor $\alpha$ in the spatial domain corresponds to scaling of the frequency coordinates with factor $1/\alpha$ and of the amplitude with factor $(1/\alpha)^2$ in the spatial frequency domain

[Figure: scaling and rotation map $f_z$ to $f_z'$ in the spatial frequency plane]

\[
\begin{aligned}
\tilde{f}_z' &= \ln f_z' \\
&= \ln r' + i\varphi' \\
&= \ln\left(\frac{r}{\alpha}\right) + i(\varphi - \varphi_0) \\
&= \ln r + i\varphi - \ln\alpha - i\varphi_0 \\
&= \tilde{f}_z - \ln\alpha - i\varphi_0
\end{aligned}
\]
This is a translation with the shift vector $(-\ln\alpha - i\varphi_0) \in \mathbb{C}$ in logarithmic polar coordinates of the spatial frequency plane.


RST-invariant features can therefore be obtained as follows:

image: $g(x, y)$ with $(x, y) \in \mathbb{R}^2$
-> first Fourier transform:
   $G(f_x, f_y) = F\{g(x, y)\}$ with $(f_x, f_y) \in \mathbb{R}^2$
-> squared absolute value:
   $|G(f_x, f_y)|^2$
-> logarithmic polar coordinates $(\ln r, \varphi)$:
   $|G(\ln r, \varphi)|^2$
-> second Fourier transform:
   $F(|G(\ln r, \varphi)|^2)$
-> squared absolute value:
   $|F(|G(\ln r, \varphi)|^2)|^2$
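
The first step of this chain, translation invariance of the squared absolute value, can be checked on a discrete image. The following is a minimal sketch, assuming NumPy, a random test image and a cyclic shift (on the discrete grid the translation has to be cyclic for the invariance to hold exactly). The log-polar resampling of the later steps is not covered here.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.random((32, 32))                                 # illustrative image
g_shifted = np.roll(g, shift=(5, -3), axis=(0, 1))       # cyclic translation by (5, -3) pixels

P = np.abs(np.fft.fft2(g)) ** 2                          # squared absolute value of the FT
P_shifted = np.abs(np.fft.fft2(g_shifted)) ** 2

print(np.allclose(P, P_shifted))                         # the translation only changes the phase
```

The translation multiplies the spectrum by a phase factor of absolute value 1, so the power spectrum is unchanged while the complex spectrum itself differs.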
Next page:
Analysis of the RST-invariant features.
The original grey-valued images are identical up to a 90° rotation.


[Figure, for both images: original image (axes x, y); |FFT| (axes $f_x$, $f_y$); log-polar representation (axis ln r); |FFT| of the log-polar representation]


Warning:
a) Invariant observations are not necessarily good for classification.
b) Observations that are calculated using the two-dimensional Fourier
transform are not complete, i.e. the original image cannot be reconstructed completely.


Chapter 5
LPC Analysis

Overview:
5.1 Principle of LPC Analysis
5.2 LPC: Covariance Method
5.3 LPC: Autocorrelation Method
5.4 LPC: Interpretation in Frequency Domain
5.5 LPC: Generative Model
5.6 LPC: Alternative Representations

The acronym LPC stands for


Linear Predictive Coefficients / Coding
and is utilized in signal processing and frequency analysis, as well as in
signal coding.


5.1 Principle of LPC Analysis

[Figure: signal values over time; the value at time n is predicted from the preceding values at n-1, n-2, ...]

We consider a discrete time signal x[n], possibly multiplied with a window function. The goal of an LPC analysis is to predict each signal value x[n] by its preceding values x[n-1], x[n-2], ..., x[n-K]. We distinguish:
$x[n]$: signal value
$\hat{x}[n]$: predicted value

We assume the predicted value $\hat{x}[n]$ to be a linear combination of the preceding values of x[n]:
\[ \hat{x}[n] := \sum_{k=1}^{K} \alpha_k\, x[n-k] \]

with at first unknown coefficients $\alpha_k$, k = 1, ..., K, which are called LPC coefficients or prediction coefficients.
The value K is called prediction order, e.g. K = 8, . . . , 10 at a bandwidth of 4 kHz (about 2 coefficients per kHz).


Outlook
Starting point: coding in time domain (goal: bit reduction)
-> Parseval theorem
-> parametric model for the power spectrum of the Fourier transform
   (more exactly: rough structure of the power spectrum of the speech signal)
LPC analysis applications:
- speech coding
  (ADPCM = adaptive differential pulse code modulation)
- signal processing:
  parametric modelling with autoregressive or all-pole models (order K)
- time series:
  resonance and oscillator curves, sunspots, stock-market prices, ...
- image coding
also: interpretation as Maximum Entropy approach


The coefficients $\alpha_k$ are unknown at first. To estimate these, we define the prediction error for each point n in time:
\[
\begin{aligned}
e[n] &:= x[n] - \hat{x}[n] \\
&= x[n] - \sum_{k=1}^{K} \alpha_k\, x[n-k]
\end{aligned}
\]

For a reliable set of LPC coefficients we calculate the squared error criterion E as the sum of the squared prediction errors e[n]:
\[
\begin{aligned}
E &= \sum_n e^2[n] \\
&= \sum_n \left[ x[n] - \sum_{k=1}^{K} \alpha_k\, x[n-k] \right]^2
\end{aligned}
\]
which is to be minimized with respect to $\alpha_1, \dots, \alpha_k, \dots, \alpha_K$.

Taking the derivative $\frac{\partial E}{\partial \alpha_l}$ for l = 1, . . . , K results in:
\[ \sum_n \left[ x[n] - \sum_k \alpha_k\, x[n-k] \right] x[n-l] \overset{!}{=} 0 \]
\[ \sum_k \alpha_k \sum_n x[n-k]\, x[n-l] = \sum_n x[n-l]\, x[n] \]

Here, the summation limits are not specified on purpose.

If the squared error criterion E is considered as a function of the LPC coefficients, the following properties result:
- E is quadratic in $\alpha_1, \dots, \alpha_k, \dots, \alpha_K$; it is guaranteed to be nonnegative and it has a single well-defined minimum.
- The optimal LPC coefficients are invariant to linear scaling of the signal values x[n].


Minimization of the squared error criterion with respect to the LPC coefficients results either from taking the derivative or from completing the square (recalculate for yourself!). The linear equation system for the LPC coefficients $\alpha_k$ follows:
\[ l = 1, \dots, K: \qquad \sum_{k=1}^{K} \alpha_k \sum_n x[n-k]\, x[n-l] = \sum_n x[n-l]\, x[n] \]

with still unspecified summation limits over n. We consider two methods for the choice of summation limits:
1. covariance method
2. autocorrelation method
Warning: terminology is not consistent.


5.2 LPC: Covariance Method

[Figure: known values and predicted value on the time axis up to N-1]

Covariance Method
No window function is applied, such that we obtain the following summation limits:
\[ \sum_n e^2(n) = \sum_{n=0}^{N-1} e^2(n) \]
i.e. we also use signal values x[n] with n < 0 for prediction.
The resulting equation system for the LPC coefficients:
\[ l = 1, \dots, K: \qquad \sum_{k=1}^{K} \alpha_k\, \varphi(l, k) = \varphi(l, 0) \]
with the definition:
\[ \varphi(l, k) := \sum_{n=0}^{N-1} x[n-l]\, x[n-k] \]

For the above terms the following holds:
- they describe a kind of cross-correlation between two signals
- they are similar to a covariance matrix
Computational complexity for solving the equation system: $O(K^3) + O(N \cdot K)$
- the autocorrelation method has a more favorable complexity: $O(K^2)$
- but: the calculation of the auto/cross-correlation function dominates
In contrast to the covariance method, the autocorrelation method offers an interpretation in the frequency domain and is therefore often preferred.

5.3 LPC: Autocorrelation Method

[Figure: signal multiplied with a window function covering the samples 0 to N-1]

We consider the signal after multiplication with a suitable window function, usually a Hamming window.
In principle, the summation limits now are
\[ \sum_n e^2[n] = \sum_{n=-\infty}^{+\infty} e^2[n] . \]

Since, due to windowing, the signal x[n] is identical to zero outside the window function, i.e.
\[ x[n] \equiv 0 \qquad \text{for } n < 0 \text{ or } N-1 < n \]
we obtain the following for the prediction error e[n]:
\[ e[n] \equiv 0 \qquad \text{for } n < 0 \text{ or } N-1+K < n \]

Therefore, the total error E becomes:
\[ E = \sum_{n=0}^{N+K-1} e^2[n] \]

The prediction error e[n] can become large at the window function boundaries:
- Beginning: prediction from zeros
- End: prediction of zeros


Inserting the summation limits:

\[ \sum_n x[n-k]\, x[n-l] = R(|l-k|), \qquad \sum_n x[n]\, x[n-l] = R(|l|) \]
where
\[ R(|l|) = \sum_{n=0}^{N-1-|l|} x[n]\, x[n+|l|] \]

In this way we obtain the following equation system for the LPC coefficients $\alpha_k$:
\[ l = 1, \dots, K: \qquad \sum_{k=1}^{K} \alpha_k\, R(|l-k|) = R(l) \]

or in matrix form:
\[
\begin{pmatrix}
R(0) & R(1) & \cdots & R(K-1) \\
R(1) & R(0) & \cdots & R(K-2) \\
\vdots & \vdots & \ddots & \vdots \\
R(K-1) & R(K-2) & \cdots & R(0)
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_K \end{pmatrix}
=
\begin{pmatrix} R(1) \\ R(2) \\ \vdots \\ R(K) \end{pmatrix}
\]


Note that this equation system is completely determined by the autocorrelation coefficients
R(0), ..., R(k), ..., R(K).
Hence, the LPC coefficients
$\alpha_1, \dots, \alpha_k, \dots, \alpha_K$
are merely a transformed representation of the autocorrelation coefficients.
The matrix of this equation system has the following properties:
- Toeplitz structure (follows from time invariance)
- solution: Durbin algorithm with complexity $O(K^2)$
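
The autocorrelation method can be sketched end to end as follows. This is a minimal sketch, assuming NumPy and the standard Levinson-Durbin recursion; the function names `autocorr` and `durbin`, the random test signal and the order K = 10 are illustrative choices. The result is compared with a direct solution of the Toeplitz system.

```python
import numpy as np

def autocorr(x, K):
    """R(l) = sum_{n=0}^{N-1-l} x[n] x[n+l] for l = 0, ..., K."""
    N = len(x)
    return np.array([np.dot(x[:N - l], x[l:]) for l in range(K + 1)])

def durbin(R):
    """Solve sum_k alpha_k R(|l-k|) = R(l), l = 1..K, in O(K^2) operations.

    Returns (alpha, E): alpha[k-1] holds alpha_k, E is the residual error energy.
    """
    K = len(R) - 1
    alpha = np.zeros(K)
    E = R[0]
    for i in range(1, K + 1):
        # reflection (PARCOR) coefficient of order i
        k_i = (R[i] - np.dot(alpha[:i - 1], R[i - 1:0:-1])) / E
        prev = alpha[:i - 1].copy()
        alpha[:i - 1] = prev - k_i * prev[::-1]
        alpha[i - 1] = k_i
        E *= 1.0 - k_i ** 2
    return alpha, E

rng = np.random.default_rng(0)
K = 10
x = rng.standard_normal(200) * np.hamming(200)      # windowed illustrative signal
R = autocorr(x, K)
alpha, E = durbin(R)

T = np.array([[R[abs(l - k)] for k in range(K)] for l in range(K)])   # Toeplitz matrix
alpha_direct = np.linalg.solve(T, R[1:])            # O(K^3) reference solution
print(np.allclose(alpha, alpha_direct))
```

The recursion builds the solution of order i from the solution of order i-1, which is possible only because of the Toeplitz structure. The intermediate values k_i are the PARCOR coefficients.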


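5.4 LPC: Interpretation in Frequency Domain

The LPC autocorrelation method allows the prediction error to be converted from the time domain into the frequency domain using the Parseval theorem, so that LPC analysis can be interpreted as the adaptation of a parametric model spectrum to the observed signal spectrum.
We start with the prediction error e[n]:
\[ e[n] = x[n] - \sum_{k=1}^{K} \alpha_k\, x[n-k] \]

and apply the z-transform to this equation. The z-transform is restricted to the unit circle:
\[ z = e^{j\omega} \]

For the z-transforms E(z) and X(z) we obtain:
\[ E(z) = X(z) \left[ 1 - \sum_{k=1}^{K} \alpha_k z^{-k} \right] \]

The total error $E_{tot}$ for the squared error criterion becomes:
\[
\begin{aligned}
E_{tot} &= \sum_{n=0}^{N+K-1} e^2[n] \\
&= \frac{1}{2\pi} \int_{-\pi}^{+\pi} |E(e^{j\omega})|^2\, d\omega \qquad \text{(Parseval theorem)} \\
&= \frac{1}{2\pi} \int_{-\pi}^{+\pi} \left| 1 - \sum_{k=1}^{K} \alpha_k e^{-j\omega k} \right|^2 |X(e^{j\omega})|^2\, d\omega \\
&= \frac{1}{2\pi} \int_{-\pi}^{+\pi} \left| P(e^{j\omega}) \right|^2 |X(e^{j\omega})|^2\, d\omega
\end{aligned}
\]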


with the so-called predictor polynomial:
\[ P(e^{j\omega}) := 1 - \sum_{k=1}^{K} \alpha_k e^{-j\omega k} \]

The squared absolute value of the predictor polynomial
\[
\begin{aligned}
\left| P(e^{j\omega}) \right|^2 &= \left| 1 - \sum_{k=1}^{K} \alpha_k e^{-j\omega k} \right|^2 \\
&= \dots \\
&= \sum_{k=0}^{K} B_k \cos(k\omega)
\end{aligned}
\]
(with suitable coefficients $B_k$ resulting from the predictor coefficients) is a polynomial with respect to $\cos(\omega)$, which can be obtained via application of trigonometric transformations.
The predictor polynomial tries to compensate for $|X(e^{j\omega})|^2$, especially at maxima, and to generate a white spectrum for the prediction error e[n].
The complex predictor polynomial P(z) with $z \in \mathbb{C}$ has exactly K zeros in the complex plane and therefore can be factorised into linear factors:
\[ P(z) = z^{-K} \prod_{k=1}^{K} (z - z_k) \]


Observations:
- These zeros are real or occur in complex conjugate pairs because $\alpha_k \in \mathbb{R}$.
- The zeros can cause minima of $|P(e^{j\omega})|^2$. The minima of $|P(e^{j\omega})|^2$ approximately correspond to the maxima of the smoothed spectrum $|X(e^{j\omega})|^2$, because for minimization of the error integral it is first of all necessary to compensate for the maxima of the signal spectrum.
- The LPC analysis could therefore be used to describe the formant structure of the speech signal.

[Figure: $|P(e^{j\omega})|^2$ and $|X(e^{j\omega})|^2$ plotted over frequency]


[Figure panels: windowed phoneme "a" (Hamming window); prediction error (12 LPC coefficients); LPC spectrum (12 coefficients); spectrum of prediction error (12 LPC coefficients); LPC spectrum (18 coefficients)]

Figure 5.1: LPC analysis of one speech segment:
a) signal waveform, b) prediction error (K=12), c) LPC spectrum with K=12 coefficients, d) spectrum of the prediction error (K=12), e) LPC spectrum with K=18 coefficients


[Figure panels: windowed phoneme "a" (Hamming window); amplitude spectrum (Hamming window); LPC spectra with 4, 8, 12, 16, 18 and 20 coefficients]

Figure 5.2: LPC spectra for different prediction orders K


5.5

LPC: Generative Model


e(n)

x(n)

recursive
filter
k

For the prediction error e[n] and its ztransform holds:


e[n] = x[n]

K
X

k x[n k]

k=1
K
X

E(z) = X(z)

k X(z) z k

k=1

= X(z) [1

K
X

k z k ]

k=1

If we consider the prediction error as the input signal, we can also interpret the LPC equation as a generative model which generates an output signal $x[n]$ from an adequate input signal $e[n]$:
\[ x[n] = e[n] + \sum_{k=1}^{K} \alpha_k\, x[n-k] \]
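The generative reading is a recursive filter driven by the excitation. A minimal sketch, again assuming zero initial conditions and a hypothetical function name `synthesize`:

```python
def synthesize(e, alpha):
    """x[n] = e[n] + sum_{k=1..K} alpha_k * x[n-k], with x[n] = 0 for n < 0."""
    K = len(alpha)
    x = []
    for n in range(len(e)):
        feedback = sum(alpha[k - 1] * x[n - k]
                       for k in range(1, K + 1) if n - k >= 0)
        x.append(e[n] + feedback)
    return x

# A single pulse of height 2 excites a first-order model with alpha_1 = 0.5.
x = synthesize([2.0, 0.0, 0.0, 0.0], [0.5])
```

The output is the scaled impulse response of the recursive filter, here 2, 1, 0.5, 0.25.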

For the signal spectrum $X(z)$ the following holds:
\[ X(z) = \frac{E(z)}{1 - \sum_{k=1}^{K} \alpha_k\, z^{-k}} \]

This model is called an autoregressive model. The excitation has to be chosen such that $E(z)$ is white, i.e. it has no fine structure due to the fundamental frequency (pitch frequency).
In other words:
\[ E(z) = G = \text{const.} \quad \text{(gain)} \]

Special case:
\[ e[n] = G\, \delta[n] \]
Then the following holds for the LPC model spectrum $X(z)$:
\[ X(z) = \frac{G}{1 - \sum_{k=1}^{K} \alpha_k\, z^{-k}} \]

This spectrum is often interpreted as the LPC model spectrum $X(z)$ of the observed signal. It is reasonable to set (without explanation):
\[ G^2 = R(0) - \sum_{k=1}^{K} \alpha_k R(k) = R(0) \left[ 1 - \sum_{k=1}^{K} \alpha_k \frac{R(k)}{R(0)} \right] \]
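The gain formula can be evaluated once the coefficients are known. The sketch below assumes that the coefficients are obtained from the autocorrelation values $R(0), \dots, R(K)$ with the Levinson-Durbin recursion; that choice of solver and the function names are assumptions made for this illustration. The final prediction error of the recursion coincides with $G^2$ as given above.

```python
def levinson_durbin(R, K):
    """Solve sum_k alpha_k R(|i-k|) = R(i), i = 1..K, for alpha_1..alpha_K.

    Returns the coefficients and the remaining prediction error energy.
    """
    alpha = [0.0] * (K + 1)          # alpha[0] is unused
    err = R[0]
    for i in range(1, K + 1):
        acc = R[i] - sum(alpha[j] * R[i - j] for j in range(1, i))
        k_i = acc / err              # reflection (PARCOR-type) coefficient
        updated = alpha[:]
        updated[i] = k_i
        for j in range(1, i):
            updated[j] = alpha[j] - k_i * alpha[i - j]
        alpha = updated
        err *= 1.0 - k_i * k_i
    return alpha[1:], err

def gain_squared(R, alpha):
    """G^2 = R(0) - sum_{k=1..K} alpha_k R(k)."""
    return R[0] - sum(a * R[k] for k, a in enumerate(alpha, start=1))

# Hypothetical autocorrelation of a first-order process: R(k) = 0.5**k.
R = [1.0, 0.5, 0.25]
alpha, err = levinson_durbin(R, 2)
g2 = gain_squared(R, alpha)
```

For this autocorrelation the second coefficient vanishes, and both `err` and `g2` equal 0.75.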

This LPC model spectrum does not have any zeros, only poles, and is therefore also called an all-pole model.
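To look at the all-pole model spectrum numerically, the model can be evaluated on the unit circle $z = e^{j\omega}$. A minimal sketch with a hypothetical first-order model ($\alpha_1 = 0.5$, $G = 1$); the function name is illustrative:

```python
import cmath
import math

def lpc_model_spectrum(alpha, gain, omega):
    """|X(e^{j omega})|^2 = G^2 / |1 - sum_k alpha_k e^{-j omega k}|^2."""
    denominator = 1 - sum(a * cmath.exp(-1j * omega * k)
                          for k, a in enumerate(alpha, start=1))
    return gain ** 2 / abs(denominator) ** 2

low = lpc_model_spectrum([0.5], 1.0, 0.0)       # omega = 0
high = lpc_model_spectrum([0.5], 1.0, math.pi)  # omega = pi
```

With a positive real pole the spectrum is largest at $\omega = 0$ (value 4) and smallest at $\omega = \pi$ (value 4/9), i.e. the pole shapes the spectral envelope.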
Remarks:
- stability problems when solving the equation system (truncation error in the autocorrelation)
  way out: preemphasis through difference calculation
- rule for the choice of the order K:
  1 formant needs 2 LPC coefficients
  1 formant per kHz
  + excitation pulse shape + radiation: 2 LPC coefficients
  => rule of thumb:

  bandwidth   K
  4 kHz       10
  5 kHz       12
  6 kHz       14
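Both remarks can be written down directly. The sketch below assumes that "difference calculation" means $y[n] = x[n] - a\,x[n-1]$ with $a = 1$ (in practice values slightly below 1 are also common; this parameter and both function names are assumptions for illustration), and it encodes the rule of thumb as two coefficients per formant, one formant per kHz, plus two coefficients for excitation and radiation.

```python
def preemphasis(x, a=1.0):
    """First-order difference y[n] = x[n] - a * x[n-1], with x[-1] = 0."""
    return [x[n] - a * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def lpc_order(bandwidth_khz):
    """Rule of thumb: 2 coefficients per formant (1 per kHz) + 2 extra."""
    formants = round(bandwidth_khz)
    return 2 * formants + 2

orders = [lpc_order(b) for b in (4, 5, 6)]
flattened = preemphasis([1.0, 1.0, 1.0, 1.0])
```

The orders reproduce the table above, and the constant test signal is reduced to a single pulse, which illustrates how preemphasis removes the slowly varying part of the signal.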

5.6 LPC: Alternative Representations

so far:
- $G$: gain
- $\alpha_k$: LPC coefficients

- impulse response of the generative model
- impulse response of the squared absolute value of the predictor polynomial
- cepstrum
- poles / zeros of the synthesis model / predictor polynomial
  => formants / bandwidths
  problem: susceptible to noise
- PARCOR coefficients: partial correlation
- area coefficients: cross-sectional areas $A_k$
- reflection coefficients (PARCOR); tube model

[Figure: tube model of the vocal tract with cross-sectional areas A1 to A5 between glottis and lips]


Chapter 6
Outlook: Wavelet Transform
Overview:
6.1 Motivation: from Fourier to Wavelet Transform
6.2 Definition
6.3 Discrete Wavelet Transform


6.1 Motivation: from Fourier to Wavelet Transform

The Fourier transform uses infinitely extended basis functions
\[ e^{j\omega t} \]
and therefore does not have any time resolution, i.e. there is no information about the localization along the time axis.
Therefore, a window function is often used:
\[ t \mapsto w(t), \quad \text{complex in general} \]
This function has finite support, so that it is possible to investigate a segment of the function of interest.
We define the short-time Fourier transform $F_b(\omega)$ of a time signal $t \mapsto f(t)$ at position $b \in \mathbb{R}$ in time:
\[ F_b(\omega) := \int_{-\infty}^{+\infty} f(t)\, \overline{w(t-b)}\, e^{-j\omega t}\, dt \]
where $\overline{w(t)}$ denotes the complex conjugate of $w(t)$.
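The time localization gained by the window can be illustrated with a numerical approximation of the short-time Fourier integral. This is a minimal sketch using a midpoint sum; the test signal (a 5 Hz tone burst on $0 \le t < 1$), the rectangular window of half-width 0.25 and all names are assumptions chosen for illustration.

```python
import cmath
import math

def stft(f, w, b, omega, t_min, t_max, steps=4000):
    """Midpoint-rule approximation of the short-time Fourier integral at (b, omega)."""
    dt = (t_max - t_min) / steps
    total = 0j
    for i in range(steps):
        t = t_min + (i + 0.5) * dt
        total += f(t) * complex(w(t - b)).conjugate() * cmath.exp(-1j * omega * t) * dt
    return total

omega0 = 2 * math.pi * 5.0

def tone_burst(t):
    """cos(omega0 * t) on 0 <= t < 1, zero elsewhere."""
    return math.cos(omega0 * t) if 0.0 <= t < 1.0 else 0.0

def rect_window(t):
    """Rectangular window with finite support |t| < 0.25."""
    return 1.0 if abs(t) < 0.25 else 0.0

inside = stft(tone_burst, rect_window, 0.5, omega0, -1.0, 3.0)   # window on the burst
outside = stft(tone_burst, rect_window, 2.0, omega0, -1.0, 3.0)  # window after the burst
```

With the window placed on the burst the magnitude is about 0.25; with the window placed after the burst it is zero. A plain Fourier transform could not make this distinction.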


The wavelet transform can be derived from this equation in two steps:
a) we drop the basis function $e^{j\omega t}$ and define the window function as the new basis function.
b) in addition to the localization parameter $b$, we also introduce a scaling parameter $a > 0$.
We consider the family of window functions $w_{ab}(t)$:
\[ w_{ab}(t) := w\!\left(\frac{t-b}{a}\right) \]

6.2 Definition

The following notation is usually used for the wavelet transform:
\[ \psi_{ab}(t) := \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) \]
where $t \mapsto \psi(t)$ is the so-called mother wavelet.

Like the window function for the short-time Fourier transform, the mother wavelet should be localized as well as possible.
Example:
Mexican-hat function:
\[ \psi(t) = (1 - t^2)\, e^{-\frac{1}{2} t^2} \]
[Figure: plot of the Mexican-hat function $\psi(t)$]


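The wavelet transform of $f(t)$ with respect to $\psi(t)$ is defined as:
\[ F(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \overline{\psi\!\left(\frac{t-b}{a}\right)}\, dt \]
with scaling parameter $a > 0$ and localization parameter $b \in \mathbb{R}$.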
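A numerical approximation makes the definition concrete. The sketch below assumes the Mexican-hat function $\psi(t) = (1 - t^2) e^{-t^2/2}$ as mother wavelet and approximates the integral by a midpoint sum over a finite interval; the integration limits, step count and function names are assumptions for illustration.

```python
import math

def mexican_hat(t):
    """psi(t) = (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt(f, psi, a, b, t_min=-20.0, t_max=20.0, steps=8000):
    """Midpoint-rule approximation of F(a, b) for a real-valued wavelet psi."""
    dt = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps):
        t = t_min + (i + 0.5) * dt
        total += f(t) * psi((t - b) / a) * dt   # psi is real: no conjugation needed
    return total / math.sqrt(a)

constant = cwt(lambda t: 1.0, mexican_hat, 1.0, 0.0)
matched = cwt(mexican_hat, mexican_hat, 1.0, 0.0)
```

A constant signal gives a coefficient of practically zero because the Mexican-hat function has zero mean, whereas a signal shaped like the wavelet itself gives a large coefficient, here $\int \psi^2(t)\, dt = \tfrac{3}{4}\sqrt{\pi}$.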


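For the inverse transformation the following holds:
\[ f(t) = \frac{1}{C} \int_{-\infty}^{+\infty} \int_{0}^{+\infty} \frac{1}{a^2}\, F(a,b)\, \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) da\, db \]
with
\[ C := \int_{0}^{\infty} \frac{|\Psi(\omega)|^2}{\omega}\, d\omega < \infty \]
where $\Psi(\omega)$ denotes the Fourier transform of $\psi(t)$.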

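Proof (principle only):
The proof uses the (generalized) Parseval theorem to express $F(a,b)$ through the Fourier transforms $F(\omega)$ of $f(t)$ and $\Psi_{ab}(\omega)$ of $\psi_{ab}(t)$:
\[ F(a,b) = \int_{-\infty}^{+\infty} f(t)\, \overline{\psi_{ab}(t)}\, dt \;\propto\; \int_{-\infty}^{+\infty} F(\omega)\, \overline{\Psi_{ab}(\omega)}\, d\omega \]
with
\[ \psi_{ab}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) \]
and further conversions. The constant of proportionality depends on the normalization of the Fourier transform.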


6.3 Discrete Wavelet Transform

For the scaling parameter $a > 0$ we choose:
\[ a = a_0^m \quad \text{where } a_0 > 1 \text{ and } m \in \mathbb{Z}. \]
The values of $m$ determine the width of the wavelet $\psi_{ab}(t)$.

In order to adjust the localization parameter properly, we define:
\[ b = n\, b_0\, a_0^m \quad \text{where } b_0 > 0 \text{ and } n \in \mathbb{Z}. \]
Thus we constrain the wavelet transform to discrete values:
\[ F(a,b) \;\longrightarrow\; F(m,n) \]
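The discretization can be written as a small helper that maps the index pair $(m, n)$ to the continuous parameters $(a, b)$. A minimal sketch, assuming the choice $a = a_0^m$, $b = n\, b_0\, a_0^m$ and the hypothetical defaults $a_0 = 2$, $b_0 = 1$:

```python
def wavelet_grid(m, n, a0=2.0, b0=1.0):
    """Map indices (m, n) to scale a = a0**m and position b = n * b0 * a0**m."""
    a = a0 ** m
    b = n * b0 * a
    return a, b

wide = wavelet_grid(3, 5)     # wide wavelet, large translation step
narrow = wavelet_grid(-1, 5)  # narrow wavelet, small translation step
```

Because the translation step $b_0 a_0^m$ is proportional to the scale, wide wavelets are shifted in large steps and narrow wavelets in small steps.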

The choice of the function $\psi(t)$ is still open.
It is useful to choose $\psi(t)$ such that the function system $\{\psi_{mn} \mid m, n \in \mathbb{Z},\ m > 0\}$ with
\[ \psi_{mn}(t) := a_0^{-m/2}\, \psi(a_0^{-m} t - n b_0) \]
represents an orthonormal basis for functions $t \mapsto f(t) \in L^2(\mathbb{R})$.

Note: The scalar product $\langle f(t), g(t) \rangle$ of two functions $f(t)$ and $g(t)$ is defined as:
\[ \langle f(t), g(t) \rangle = \int f(t)\, \overline{g(t)}\, dt \]


In this way we obtain the following representation for the discrete wavelet transform:
\[ F(m,n) = \int_{-\infty}^{+\infty} f(t)\, a_0^{-m/2}\, \overline{\psi(a_0^{-m} t - n b_0)}\, dt = \langle f(t), \psi_{mn}(t) \rangle \]
Due to the orthogonality it is possible to convert the integral of the inverse wavelet transform into an infinite series:
\[ f(t) = \frac{1}{C} \sum_m \sum_n F(m,n)\, \psi_{mn}(t) = \frac{1}{C} \sum_m \sum_n F(m,n)\, a_0^{-m/2}\, \psi(a_0^{-m} t - n b_0) \]


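Example: Haar function and Haar basis

special choice: $a_0 = 2$, $b_0 = 1$

The Haar function is defined as:
\[ \psi(t) = \begin{cases} 1 & 0 \le t < \tfrac{1}{2} \\ -1 & \tfrac{1}{2} \le t < 1 \\ 0 & \text{otherwise} \end{cases} \]
This defines the Haar basis $\{\psi_{mn}(t) \mid m, n \in \mathbb{Z},\ m > 0\}$:
\[ \psi_{mn}(t) = \frac{1}{\sqrt{2^m}}\, \psi(2^{-m} t - n) \]
It is easy to see that $m$ determines the resolution (the wavelet $\psi_{mn}$ has width $2^m$, so a smaller $m$ gives an increasingly finer resolution) and that $n$ determines the localization in time.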
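The orthonormality of the Haar functions can be checked numerically. This is a minimal sketch, assuming the convention $\psi_{mn}(t) = 2^{-m/2}\, \psi(2^{-m} t - n)$ and approximating the scalar product by a midpoint sum on a dyadic grid, which is exact for these piecewise constant functions; the function names and the integration interval are assumptions for illustration.

```python
def haar(t):
    """Haar function: +1 on [0, 1/2), -1 on [1/2, 1), 0 otherwise."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def haar_mn(m, n):
    """Return psi_mn(t) = 2**(-m/2) * psi(2**(-m) * t - n) as a function of t."""
    return lambda t: 2.0 ** (-m / 2.0) * haar(2.0 ** (-m) * t - n)

def inner(f, g, t_min=-8.0, t_max=8.0, steps=4096):
    """Midpoint-rule scalar product <f, g> for real-valued functions."""
    dt = (t_max - t_min) / steps
    return sum(f(t_min + (i + 0.5) * dt) * g(t_min + (i + 0.5) * dt)
               for i in range(steps)) * dt

norm_00 = inner(haar_mn(0, 0), haar_mn(0, 0))        # same function
shifted = inner(haar_mn(0, 0), haar_mn(0, 1))        # different position n
scaled = inner(haar_mn(1, 0), haar_mn(0, 0))         # different scale m
norm_fine = inner(haar_mn(-1, 0), haar_mn(-1, 0))    # narrower wavelet
```

The two norms are 1 and the two mixed products are 0, as expected for an orthonormal system. The function with $m = -1$ has half the support of the one with $m = 0$.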


Chapter 7
Coding
The following types of coding are distinguished:
- source coding (data compression)
  goal: transmission (storage) using as few bits as possible, without or with few errors
- channel coding
  goal: data transmission (storage) that is as error-free as possible,
  e.g. error-detecting and error-correcting codes
- simultaneous source and channel coding
  goal: simultaneous optimization
The following data types are distinguished:
- discrete alphabet
- continuous signal (audio, video, ...)
Source coding:
- lossless coding (compression)
  usually discrete sources, e.g. text compression
- lossy coding
  usually continuous signals
  notation: rate-distortion theory, with distortion = error and rate = bit rate

Three effects can be utilized for signal coding:
a) statistical redundancy and correlation:
   samples are not independent.
b) perceptual properties of the receiver (ear and eye):
   some fine structures in the signal are irrelevant to the receiver.
c) signal distortion:
   the coded signal differs from the original signal without significant quality deterioration.

[Block diagram: signal -> T -> Q -> C -> transmission -> C^-1 -> Q^-1 -> T^-1 -> reconstructed signal]

T: transformation, e.g. DCT
Q: quantization, e.g. vector quantization
C: mapping of bit representation
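The chain of transformation and quantization can be sketched in a few lines. The example below assumes an orthonormal DCT-II as transformation T and, for simplicity, a uniform scalar quantizer instead of the vector quantization named above; the bit mapping C is omitted. Block lengths, step sizes and function names are assumptions for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a block x (transformation T)."""
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / N))
    return out

def idct(X):
    """Inverse of the orthonormal DCT-II (T^-1)."""
    N = len(X)
    out = []
    for n in range(N):
        s = X[0] * math.sqrt(1.0 / N)
        s += sum(X[k] * math.sqrt(2.0 / N) * math.cos(math.pi * (n + 0.5) * k / N)
                 for k in range(1, N))
        out.append(s)
    return out

def quantize(X, step):
    """Uniform scalar quantizer (Q): coefficient -> integer index."""
    return [round(v / step) for v in X]

def dequantize(indices, step):
    """Q^-1: integer index -> reconstructed coefficient."""
    return [i * step for i in indices]

# A constant block is represented by a single non-zero index.
flat_indices = quantize(dct([8.0, 8.0, 8.0, 8.0]), 1.0)

# A ramp is reconstructed with a small, bounded distortion.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
step = 0.5
reconstructed = idct(dequantize(quantize(dct(signal), step), step))
mse = sum((a - b) ** 2 for a, b in zip(signal, reconstructed)) / len(signal)
```

The constant block shows how the transformation removes redundancy (only one index is non-zero), and the ramp shows the trade-off between step size and distortion: because the transformation is orthonormal, the mean squared error cannot exceed `step**2 / 4`.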


References:
Ze-Nian Li: CMPT 365 Multimedia Systems. Simon Fraser University, British Columbia, Canada, fall 1999, version Jan. 2000; http://www.cs.sfu.ca/CourseCentral/365/li/index_prev.html.
Peter Noll: MPEG Digital Audio Coding. IEEE Signal Processing Magazine, pp. 59-81, Sep. 1997.
Thomas Sikora: MPEG Digital Video-Coding Standards. IEEE Signal Processing Magazine, pp. 82-100, Sep. 1997.
A. Ortega, K. Ramchandran: Rate-Distortion Methods for Image and Video Compression. IEEE Signal Processing Magazine, pp. 23-50, Nov. 1998.
G. J. Sullivan, Th. Wiegand: Rate-Distortion Optimization for Video Compression. IEEE Signal Processing Magazine, pp. 74-90, Nov. 1998.


Chapter 8
Image Segmentation and Contour-Finding
The lecture notes for this chapter are available as a separate document.
