Digital Processing of Speech and Image Signals

Lecture
DIGITAL PROCESSING
OF
SPEECH AND IMAGE SIGNALS
RWTH Aachen, WS 2006/7
Prof. Dr.-Ing. H. Ney, Dr. rer. nat. R. Schlüter
Lehrstuhl für Informatik 6
RWTH Aachen
1. System Theory and Fourier Transform

2. Discrete Time Systems
3. Spectral Analysis
4. Fourier Transform and Image Processing
5. LPC Analysis
6. Wavelets
7. Coding
8. Image Segmentation and Contour-Finding
Completions: L. Welling, A. Eiden; April 1997

Completions: J. Dahmen, F. Hilger, S. Koepke; May 2000
Completions: F. Hilger, D. Keysers; July 2001
Translation: M. Popovic, R. Schlüter; April 2003
Corrections: D. Stein; October 2006
Literature:
A. V. Oppenheim, R. W. Schafer: Discrete Time Signal Processing,
Prentice Hall, Englewood Cliffs, NJ, 1989.
A. Papoulis: Signal Analysis, McGraw-Hill, New York, NY, 1977.
A. Papoulis: The Fourier Integral and its Applications, McGraw-Hill
Classic Textbook Reissue Series, McGraw-Hill, New York, NY, 1987.
W. K. Pratt: Digital Image Processing, Wiley & Sons Inc, New York,
NY, 1991.
Further reading:
T. K. Moon, W. C. Stirling: Mathematical Methods and Algorithms
for Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2000.
J. R. Deller, J. G. Proakis, J. H. L. Hansen: Discrete-Time Processing
of Speech Signals, Macmillan Publishing Company, New York, NY,
1993.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery: Numerical Recipes in C, Cambridge Univ. Press, Cambridge, 1992.
L. Rabiner, B. H. Juang: Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
T. Lehmann, W. Oberschelp, E. Pelikan, R. Repges: Bildverarbeitung
für die Medizin, Springer Verlag, Berlin, 1997.
L. Berg: Lineare Gleichungssysteme mit Bandstruktur, VEB Deutscher
Verlag der Wissenschaften, Berlin, 1986.
Contents
1 System Theory and Fourier Transform
1.1 Introduction
1.2 Linear time-invariant Systems
1.3 Fourier Transform
1.4 Properties of the Fourier Transform
1.5 Parseval Theorem
1.6 Autocorrelation Function
1.7 Existence of the Fourier Transform
1.8 δ-Function
1.9 Motivation for Fourier Series
1.10 Time Duration and Band Width
2 Discrete Time Systems
2.1 Motivation and Goal
2.2 Digital Simulation using Discrete Time Systems
2.3 Examples of Discrete Time Systems
2.4 Sampling Theorem (Nyquist Theorem) and Reconstruction
2.5 Logarithmic Scale and dB
2.6 Quantization
2.7 Fourier Transform and z-Transform
2.8 System Representation and Examples
2.9 Discrete Time Signal Fourier Transform Theorem
2.10 Discrete Fourier Transform: DFT
2.11 DFT as Matrix Operation
2.12 From Continuous Fourier Transform to Matrix Representation of Discrete Fourier Transform
2.13 Frequency Resolution and Zero Padding
2.14 Finite Convolution
2.15 Fast Fourier Transform (FFT)
2.16 FFT Implementation
2.17 Cyclic Matrices and Fourier Transform
3 Spectral Analysis
3.1 Features for Speech Recognition
3.2 Short Time Analysis and Windowing
3.3 Autocorrelation Function and Power Spectral Density
3.4 Spectrograms
3.5 Filter Bank Analysis
3.6 Mel-frequency scale
3.7 Cepstrum
3.8 Statistical Interpretation of the Cepstrum Transformation
3.9 Energy in acoustic Vector
4 Fourier Transform and Image Processing
4.1 Spatial Frequencies and Fourier Transform for Images
4.2 Discrete Fourier Transform for Images
4.3 Fourier Transform in Computer Tomography
4.4 Fourier Transform and RST Invariance
5 LPC Analysis
5.1 Principle of LPC Analysis
5.2 LPC: Covariance Method
5.3 LPC: Autocorrelation Method
5.4 LPC: Interpretation in Frequency Domain
5.5 LPC: Generative Model
5.6 LPC: Alternative Representations
6 Outlook: Wavelet Transform
6.1 Motivation: from Fourier to Wavelet Transform
6.2 Definition
6.3 Discrete Wavelet Transform
7 Coding (appendix available as separate document)
8 Image Segmentation and Contour-Finding
List of Figures
1.1 Oscillograms of three time functions composed as sum of 20 partial oscillations. a) $\varphi_n = 0$, b) $\varphi_n = \pi/2$, c) $\varphi_n$ statistical.
1.2 Amplitude spectrum of a time function composed as sum of 20 partial tones.
1.3 From left to right: original photo, low-pass and high-pass filtered version.
1.4 Phase manipulation for portion of a speech signal (vowel "o") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT.
1.5 Phase manipulation for portion of a speech signal (consonant "n") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT.
1.6 Phase manipulation for a Heaviside function (step function).
1.7 Schematic representation of the physiological mechanism of speech production.
2.1 Digital photo.
2.2 Gradient image.
2.3 Several real cases of Laplace operator subtraction from original image. a) Original image, b) original image minus Laplace operator (negative values are set to 0 and values above the grey scale are set to the highest grade of grey).
2.4 Ideal reconstruction of a band-limited signal (from Oppenheim, Schafer). a) original signal, b) sampled signal, c) reconstructed signal.
2.5 Sampling of band-limited signal with different sampling rates: b) sampling rate higher than Nyquist rate: exact reconstruction possible; c) sampling rate equal to Nyquist rate: exact reconstruction possible; d) sampling rate smaller than Nyquist rate: aliasing, exact reconstruction not possible.
2.6 Amplitude spectrum of the voiceless phoneme "s" from the word "ist".
2.7 Logarithmic amplitude spectrum of the phoneme "s".
2.8 Amplitude spectrum of the voiced phoneme "ae" from the word "Ah".
2.9 Logarithmic amplitude spectrum of the phoneme "ae".
2.10 Amplitude spectrum of a speech pause.
2.11 Logarithmic amplitude spectrum of a speech pause.
2.12 Hanning window.
2.13 Example of a linear convolution of two finite length signals: a) two signals; b) signal x[n-k] for different values of n: i) n < 0, no overlap with h[k], therefore convolution y[n] = 0; ii) n between 0 and $N_h + N_x - 2$, convolution $\neq 0$; iii) $n > N_h + N_x - 2$, no overlap with h[k], convolution y[n] = 0; c) resulting convolution y[n].
2.14 Flow diagram for decomposition of one N-DFT to two N/2-DFTs with N = 8.
2.15 Flow diagram of an 8-point FFT using butterfly operations.
2.16 Flow diagram of an 8-point FFT using butterfly operations.
2.17 Input and output arrays of an FFT. a) The input array contains N (N is a power of 2) complex input values in one real array of length 2N, with alternating real and imaginary parts. b) The output array contains the complex Fourier spectrum at N frequency values, again with alternating real and imaginary parts. The array begins with the zero frequency, goes up to the highest frequency, and continues with the values for the negative frequencies.
3.1 Example for the application of the Discrete Fourier Transform (DFT).
3.2 a) signal v[n]; b) DFT spectrum V[k]; c) Fourier spectrum $V(e^{j\omega})$.
3.3 a) signal v[n]; b) DFT spectrum V[k]; c) Fourier spectrum $V(e^{j\omega})$.
3.4 a) DFT of length N = 64; b) DFT of length N = 128; c) Fourier spectrum $V(e^{j\omega})$.
3.5 Influence of the window function: above: speech signal (vowel "a"); central: 512-point FFT using rectangle window; below: 512-point FFT using Hamming window.
3.6 Fourier Transform of a voiced speech segment: a) signal progression, b) high resolution Fourier Transform, c) low resolution Fourier Transform with short Hamming window (50 sampled values), d) low resolution Fourier Transform using autocorrelation function (19 coefficients), e) low resolution Fourier Transform using autocorrelation function (13 coefficients).
3.7 Signal progression and autocorrelation function of voiced (left) and unvoiced (right) speech segment.
3.8 Temporal progression of speech signal and four autocorrelation coefficients.
3.9 a) wide-band spectrogram: short time window, high time resolution (vertical lines), no frequency resolution; for voiced signals provides information on formant structure. b) narrow-band spectrogram: long time window, no time resolution, high frequency resolution (horizontal lines); for voiced signals provides information on fundamental frequency (pitch).
3.10 Wide-band and narrow-band spectrogram and speech amplitude for the sentence "Every salt breeze comes from the sea."
3.11 Above: logarithmized power spectrum of a spoken vowel (schematic). Below: corresponding cepstrum (inverse Fourier transform of the logarithmized power spectrum).
3.12 Cepstral smoothing: speech signal (vowel "a"), windowed speech signal (Hamming window), spectrum obtained from the whole cepstrum (blue) and smoothed spectrum obtained from the first 13 cepstral coefficients (red).
3.13 Homomorph analysis of a speech segment: signal progression, homomorph smoothed spectrum using 13 and 19 cepstral coefficients.
4.1 TV image (analog).
4.2 Digitized TV image.
4.3 Amplitude spectrum of Figure 4.2.
4.4 Low-pass filtered.
4.5 High-pass filtered.
4.6 High-pass enhancement.
5.1 LPC analysis of one speech segment: a) signal progression, b) prediction error (K=12), c) LPC spectrum with K=12 coefficients, d) spectrum of the prediction error (K=12), e) LPC spectrum with K=18 coefficients.
5.2 LPC spectra for different prediction orders K.
List of Tables
2.1 Fourier transform pairs
2.2 Fourier transform theorems
Chapter 1
System Theory and Fourier
Transform
Overview:
1.1 Introduction
1.2 Linear time-invariant Systems
1.3 Fourier Transform
1.4 Properties of Fourier Transform
1.5 Parseval Theorem
1.6 Autocorrelation Function
1.7 Existence of the Fourier Transform
1.8 δ-Function
1.9 Fourier Series
1.10 Duration and Band Width
Digital Processing of Speech and Image Signals
WS 2006/2007
1.1 Introduction
What distinguishes the Fourier Transform (FT) from other transformations?
1. Mathematical property of linear time-invariant systems:
FT decomposes the time signal into eigenfunctions
eigenfunctions keep their form when passing through the linear time-invariant system (analogy: $A x = \lambda x$)
Magnitude of FT: shift invariant
2. Physical observation:
The human ear performs a sort of FT and essentially evaluates only the magnitude of the FT
(strictly speaking: short-time FT)
Example:
Time functions with different waveforms can sound the same.
For stationary sounds, the human ear senses phase differences between the partial
tones of the complete sound only very weakly, or not at all.
Fourier transform in speech processing:
Calculation of the spectral components of speech
Basic method for obtaining observations (features) for speech recognition
Figure 1.1: Oscillograms of three time functions composed as sum of 20 partial oscillations. a) $\varphi_n = 0$, b) $\varphi_n = \pi/2$, c) $\varphi_n$ statistical (random).
Figure 1.2: Amplitude spectrum of a time function composed as sum of 20 partial tones.
Figure 1.3: From left to right: original photo, low-pass and high-pass filtered version.
Figure 1.4: Phase manipulation for portion of a speech signal (vowel "o") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT. Panels: amplitude spectrum; original signal; inverse FT for phase $\varphi(f) = 0$; inverse FT for a further fixed phase $\varphi(f)$; inverse FT for random phase $\varphi(f)$.
Figure 1.5: Phase manipulation for portion of a speech signal (consonant "n") sampled at 8 kHz, 25 ms analysis window (200 samples), 512-point FFT. Panels: amplitude spectrum; original signal; inverse FT for a fixed phase $\varphi(f)$; inverse FT for random phase $\varphi(f)$.
Figure 1.6: Phase manipulation for a Heaviside function (step function). Panels: amplitude spectrum; original signal; inverse FT for a fixed phase $\varphi(f)$; inverse FT for random phase $\varphi(f)$.
Why Fourier?
Roughly:
Production, description and algorithmic operations on signals (functions or measurement curves over the time axis) can be described very
well in Fourier domain (frequency domain).
Deeper reason:
Production, description and algorithmic operations on signals are largely
based on linear time-invariant (LTI) operations.
Fourier Transform: simple representation of LTI-operations (later:
convolution theorem)
Why continuous?
Real world is continuous
Computer (digital = time discrete = sampled): model of the real world
Figure 1.7: Schematic representation of the physiological mechanism of speech production. Block labels: glottal pulses, vocal tract filter, radiation from lips and nose, speech [a:], with spectra |E(f)|, |V(f)|, |A(f)|, |S(f)| in dB and pitch period T. Anatomical labels: muscle force, lung volume, trachea and bronchi, vocal cords, larynx tube, pharynx cavity, velum, nasal cavity, nose output, mouth cavity, tongue hump, mouth output.
signal (speech, image) → feature extraction (signal analysis) → feature vector (pattern vector, pattern) → comparison with reference data (vectors, features) → decision
Examples:
Spoken language
Written numbers (letters)
Cell recognition (red blood cells)
Examples of applications of Fourier Transform:

Electrical switchgears
Recognition and coding
Speech and general acoustic signals
Image signals
Time series analysis:
Astronomical measurement curves
Stock-market prices
...
Computer tomography
Solving differential equations
Description of image formation in optical systems
1.2 Linear time-invariant Systems
Example:
speech production
electrical systems
Block diagram: input signal x(t) → system h(t) → output signal y(t)
symbolic:
$$\{t \to y(t)\} = S\{t \to x(t)\}$$
simplified:
$$y(t) = S\{x(t)\}$$
Note: the complete time domain of the function is important, not
individual positions in time t.
more exact:
$$y = S\{x\}$$
LTI system (LTI = Linear Time-Invariant):
Linear:
Additive: $S\{x_1 + x_2\} = S\{x_1\} + S\{x_2\}$
Homogeneous: $S\{\alpha x\} = \alpha\, S\{x\}, \quad \alpha \in \mathbb{R}$
Time-invariant:
$$\{t \to y(t - t_0)\} = S\{t \to x(t - t_0)\}, \quad t_0 \in \mathbb{R}$$
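The three defining properties can be checked numerically on a toy system. The sketch below is only an illustration under assumptions that are not part of the lecture: it uses a discrete-time stand-in for a delay-and-subtract system, finite test sequences, and the hypothetical helper names `comb`, `shift`, and `close`. A squaring system is included as a contrast case that fails additivity.

```python
# Discrete-time stand-in (an assumption for illustration) for a system
# y(t) = x(t) - x(t - T): here y[n] = x[n] - x[n - delay],
# with samples before n = 0 taken as zero.
def comb(x, delay=3):
    return [x[n] - (x[n - delay] if n >= delay else 0.0) for n in range(len(x))]

def square(x):
    # Non-linear contrast case: y[n] = x[n]^2
    return [v * v for v in x]

def shift(x, k):
    # Delay a finite sequence by k samples, padding with zeros in front.
    return [0.0] * k + list(x[:len(x) - k])

def close(a, b, tol=1e-12):
    return all(abs(u - v) <= tol for u, v in zip(a, b))

x1 = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]
x2 = [0.0, -1.0, 4.0, 2.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
alpha = 2.5
x_sum = [a + b for a, b in zip(x1, x2)]

additive = close(comb(x_sum), [a + b for a, b in zip(comb(x1), comb(x2))])
homogeneous = close(comb([alpha * a for a in x1]), [alpha * a for a in comb(x1)])
time_invariant = close(comb(shift(x1, 2)), shift(comb(x1), 2))
square_additive = close(square(x_sum),
                        [a + b for a, b in zip(square(x1), square(x2))])

print(additive, homogeneous, time_invariant, square_additive)
```

The trailing zeros in the test sequences keep the finite-length shift from cutting off non-zero samples, so the time-invariance comparison is exact.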
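Mathematical theorem:
Linearity and time invariance result in the convolution representation.
Output signal y(t) of LTI system S with input signal x(t):
$$y(t) = \int_{-\infty}^{\infty} x(t-\tau)\, h(\tau)\, d\tau = \int_{-\infty}^{\infty} x(\tau)\, h(t-\tau)\, d\tau = x(t) * h(t)$$
h: impulse response of the system S
Elementary function $e_\Delta(t)$: rectangular pulse of height $1/\Delta$, used to approximate x(t) (figure).
System response $h_\Delta(t)$ to excitation $e_\Delta(t)$:
$$h_\Delta(t) = S\{e_\Delta(t)\}$$
The signal x(t) is represented as a sum of amplitude-weighted and time-shifted
elementary functions $e_\Delta(t)$:
$$x(t) = \lim_{\Delta \to 0} \left[ \sum_i x(i\Delta)\, \Delta\, e_\Delta(t - i\Delta) \right]$$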
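Hence the following holds for the output signal y(t):
$$y(t) = S\{x(t)\} = S\left\{ \lim_{\Delta \to 0} \left[ \sum_i x(i\Delta)\, \Delta\, e_\Delta(t - i\Delta) \right] \right\} = \lim_{\Delta \to 0} S\left\{ \sum_i x(i\Delta)\, \Delta\, e_\Delta(t - i\Delta) \right\}$$
additivity:
$$= \lim_{\Delta \to 0} \left[ \sum_i S\{ x(i\Delta)\, \Delta\, e_\Delta(t - i\Delta) \} \right]$$
homogeneity (for $x(i\Delta)$ and $\Delta$):
$$= \lim_{\Delta \to 0} \left[ \sum_i x(i\Delta)\, \Delta\, S\{ e_\Delta(t - i\Delta) \} \right]$$
time invariance:
$$= \lim_{\Delta \to 0} \left[ \sum_i x(i\Delta)\, \Delta\, h_\Delta(t - i\Delta) \right]$$
limiting case $\Delta \to 0$:
$$\sum_i \Delta \;\to\; \int d\tau, \qquad h_\Delta(t) \;\to\; h(t)$$
result:
$$y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau = x(t) * h(t)$$
h(t): impulse response of the system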
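The limiting process in this derivation can be illustrated numerically. The following sketch is an illustration only: the step input, the exponentially decaying impulse response, and the function name `lti_output` are assumptions chosen because the convolution then has the simple closed form $1 - e^{-t}$. It evaluates the sum $\sum_i x(i\Delta)\,\Delta\,h(t - i\Delta)$ for decreasing $\Delta$.

```python
import math

def lti_output(x, h, t, delta, t_max=10.0):
    # Riemann sum  sum_i x(i*delta) * delta * h(t - i*delta)
    # for a causal input (x(t) = 0 for t < 0), truncated at t_max.
    n = int(t_max / delta)
    return sum(x(i * delta) * h(t - i * delta) for i in range(n)) * delta

def step(t):
    # Assumed test input x(t): unit step
    return 1.0 if t >= 0 else 0.0

def decay(t):
    # Assumed impulse response h(t) = exp(-t) for t >= 0
    return math.exp(-t) if t >= 0 else 0.0

t = 2.0
exact = 1.0 - math.exp(-t)  # closed form of the convolution at time t
for delta in (0.1, 0.01, 0.001):
    print(delta, lti_output(step, decay, t, delta), exact)
```

The approximation error shrinks roughly in proportion to $\Delta$, as expected for a simple Riemann sum.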
Examples of LTI-operations:
Oscillatory systems (electrical or mechanical) with external excitation:
Block diagram: x(t) → system $h(\tau)$ → y(t)
$$y(t) = \int_{-\infty}^{\infty} h(t - \tau)\, x(\tau)\, d\tau$$
$$y''(t) + 2\alpha\, y'(t) + \beta^2\, y(t) = x(t)$$
$\alpha, \beta$: parameters depending on the oscillatory system
More general electrical engineering systems:
high-pass, low-pass, band-pass
Sliding average value:
$$x(t) \to y(t) := \bar{x}(t), \qquad \bar{x}(t) = \frac{1}{T} \int_{-T/2}^{+T/2} x(t + \tau)\, d\tau$$
Differentiator:
$$x(t) \to y(t) := x'(t)$$
Comb filter: hypothesized period T
$$x(t) \to y(t) := x(t) - x(t - T)$$
In general: linear differential equations with coefficients $c_k$ and $d_l$
$$\sum_k c_k\, y^{(k)}(t) = \sum_l d_l\, x^{(l)}(t)$$
[ + further constraints ]
Example of a non-linear system:
system: $y(t) = x^2(t)$
$$x(t) = A \cos(\omega t)$$
$$\Rightarrow \quad y(t) = A^2 \cos^2(\omega t) = \frac{A^2}{2}\, (1 + \cos(2\omega t))$$
frequency doubling
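1.3 Fourier Transform
Sinusoidal oscillation:
$$x(t) = A \sin(\omega t + \varphi)$$
amplitude A
phase / null phase $\varphi$
angular frequency $\omega = 2\pi f$
Complex representation (complex plane with axes Re and Im, unit circle):
$$j^2 = -1, \quad j \in \mathbb{C}$$
$$e^{j\varphi} = \cos\varphi + j \sin\varphi, \quad \varphi \in \mathbb{R}$$
$$\cos\varphi = \frac{e^{j\varphi} + e^{-j\varphi}}{2} \quad \text{and} \quad \sin\varphi = \frac{e^{j\varphi} - e^{-j\varphi}}{2j}$$
dimension:
$$\mathrm{DIM}(\omega) \cdot \mathrm{DIM}(t) = 1$$
$$\mathrm{DIM}(\omega) = \frac{1}{\mathrm{DIM}(t)} = \frac{1}{[\mathrm{sec}]} = [\mathrm{Hz}]$$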
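LTI-System
$$y(t) = \int_{-\infty}^{\infty} x(t - \tau)\, h(\tau)\, d\tau = x(t) * h(t)$$
Consider the following specific input signal:
$$x(t) = A\, e^{j(\omega t + \varphi)}$$
For this input signal the output signal becomes:
$$y(t) = \int_{-\infty}^{\infty} A\, e^{j(\omega (t - \tau) + \varphi)}\, h(\tau)\, d\tau = A\, e^{j(\omega t + \varphi)} \underbrace{\int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau}_{H(\omega) = F\{h(\tau)\}} = x(t)\, H(\omega)$$
Definition of the Fourier transform:
$$H(\omega) = \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau = F\{h(\tau)\} = F\{\tau \to h(\tau)\}$$
(decomposition into $e^{j\omega\tau}$)
$H(\omega)$ is called the transfer function of the system.
Remark about $x(t) = A\, e^{j(\omega t + \varphi)}$:
The shape of the input signal x(t), i.e. its frequency, remains invariant (eigenfunction).
Amplitude (intensity) and phase (time shift) depend on $H(\omega)$ (eigenvalue)
(analogy to the eigenvalue problem in linear algebra).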
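The eigenfunction property can be reproduced with a few lines of code. This is a minimal sketch under assumptions not taken from the lecture: a discrete-time system with a hypothetical three-tap impulse response `h`, so that the integral becomes the sum $H(\omega) = \sum_k h[k]\, e^{-j\omega k}$.

```python
import cmath

h = [0.5, 0.3, 0.2]             # hypothetical finite impulse response h[k]
omega, A, phi = 0.7, 2.0, 0.4   # arbitrary test frequency, amplitude, phase

def x(n):
    # Complex exponential input x[n] = A * exp(j (omega n + phi))
    return A * cmath.exp(1j * (omega * n + phi))

def y(n):
    # Discrete convolution y[n] = sum_k h[k] x[n - k]
    return sum(hk * x(n - k) for k, hk in enumerate(h))

# Transfer function at the test frequency: H(omega) = sum_k h[k] exp(-j omega k)
H = sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))

for n in range(5):
    # The output equals the input scaled by the complex number H(omega);
    # the printed differences are at rounding level.
    print(n, abs(y(n) - H * x(n)))
```

The magnitude `abs(H)` scales the amplitude and `cmath.phase(H)` shifts the phase, while the frequency of the oscillation is unchanged.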
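Remarks
FT is complex:
$$H(\omega) = \mathrm{Re}\{H(\omega)\} + j\, \mathrm{Im}\{H(\omega)\} = |H(\omega)|\, e^{j\varphi(\omega)}$$
Amplitude (spectrum):
$$|H(\omega)| = \sqrt{\mathrm{Re}\{H(\omega)\}^2 + \mathrm{Im}\{H(\omega)\}^2}$$
Phase (spectrum):
$$\varphi(\omega) = \begin{cases}
\arctan \dfrac{\mathrm{Im}\{H(\omega)\}}{\mathrm{Re}\{H(\omega)\}} & \mathrm{Re}\{H(\omega)\} > 0 \\[2mm]
\arctan \dfrac{\mathrm{Im}\{H(\omega)\}}{\mathrm{Re}\{H(\omega)\}} + \pi & \mathrm{Re}\{H(\omega)\} < 0 \\[2mm]
\pi/2 & \mathrm{Re}\{H(\omega)\} = 0,\ \mathrm{Im}\{H(\omega)\} > 0 \\[1mm]
-\pi/2 & \mathrm{Re}\{H(\omega)\} = 0,\ \mathrm{Im}\{H(\omega)\} < 0
\end{cases}$$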
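Examples of Fourier transforms:
1. Rectangle function
$$h(t) = \mathrm{rect}\left(\frac{t}{T}\right) = \begin{cases} 1, & |t| \le T/2 \\ 0, & |t| > T/2 \end{cases}$$
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{-T/2}^{T/2} e^{-j\omega t}\, dt = -\frac{1}{j\omega} \left[ e^{-j\omega T/2} - e^{j\omega T/2} \right] = \frac{2}{\omega} \sin\left(\frac{\omega T}{2}\right) = T\, \frac{\sin(\omega T/2)}{\omega T/2}$$
(here: $\mathrm{Im}\{H(\omega)\} = 0$)
Figure: h(t) and $H(\omega)$.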
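The closed form for the rectangle function can be cross-checked by evaluating the Fourier integral numerically. The sketch below is an illustration only; the midpoint rule, the function names `rect_ft_numeric` and `rect_ft_closed`, and the test values are assumptions.

```python
import cmath
import math

def rect_ft_numeric(omega, T, n=2000):
    # Midpoint rule for the integral of exp(-j omega t) over [-T/2, T/2]
    dt = T / n
    return sum(cmath.exp(-1j * omega * (-T / 2 + (k + 0.5) * dt))
               for k in range(n)) * dt

def rect_ft_closed(omega, T):
    # T * sin(omega T / 2) / (omega T / 2), with the limit T at omega = 0
    if omega == 0.0:
        return T
    return T * math.sin(omega * T / 2) / (omega * T / 2)

T = 2.0
for omega in (0.0, 1.0, 3.0, 2 * math.pi / T):
    numeric = rect_ft_numeric(omega, T)
    print(omega, numeric.real, numeric.imag, rect_ft_closed(omega, T))
```

The imaginary part of the numerical result stays at rounding level, and the last test frequency $\omega = 2\pi/T$ lands on the first zero of the spectrum.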
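2. Double-sided exponential
$$h(t) = e^{-\alpha |t|} \quad \text{with } \alpha > 0$$
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{0}^{\infty} e^{-(\alpha + j\omega) t}\, dt + \int_{-\infty}^{0} e^{(\alpha - j\omega) t}\, dt$$
$$= \left[ \frac{e^{-(\alpha + j\omega) t}}{-(\alpha + j\omega)} \right]_{0}^{\infty} + \left[ \frac{e^{(\alpha - j\omega) t}}{(\alpha - j\omega)} \right]_{-\infty}^{0} = 0 + \frac{1}{\alpha + j\omega} + \frac{1}{\alpha - j\omega} - 0$$
$$= \frac{\alpha - j\omega + \alpha + j\omega}{\alpha^2 + \omega^2} = \frac{2\alpha}{\alpha^2 + \omega^2}$$
Imaginary part equals 0
Infinite spectrum
No zeros
Figure: h(t) and $H(\omega)$.
If h(t) is symmetric (i.e. $h(t) = h(-t)$), the imaginary parts drop out
and the real part is sufficient.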
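3. Damped oscillations
$$h(t) = e^{-\alpha |t|} \cos(\beta t) \quad \text{with } \alpha > 0$$
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{0}^{\infty} e^{-(\alpha + j\omega) t} \cos(\beta t)\, dt + \int_{-\infty}^{0} e^{(\alpha - j\omega) t} \cos(\beta t)\, dt$$
$$= \int_{0}^{\infty} e^{-(\alpha + j\omega) t}\, \frac{e^{j\beta t} + e^{-j\beta t}}{2}\, dt + \int_{-\infty}^{0} e^{(\alpha - j\omega) t}\, \frac{e^{j\beta t} + e^{-j\beta t}}{2}\, dt$$
... (elementary calculation)
$$= \frac{\alpha}{\alpha^2 + (\omega - \beta)^2} + \frac{\alpha}{\alpha^2 + (\omega + \beta)^2}$$
Limiting case:
$$H(\omega)\big|_{\omega = \beta} = \frac{1}{\alpha} + \frac{\alpha}{\alpha^2 + (2\beta)^2}$$
which tends towards $\infty$ if $\alpha$ tends towards 0.
Figure: h(t) and $H(\omega)$.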
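4. Modulated rectangle function (truncated cosine)
$$h(t) = \begin{cases} \cos(\beta t), & |t| \le T/2 \\ 0, & |t| > T/2 \end{cases}$$
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = \int_{-T/2}^{T/2} \cos(\beta t)\, e^{-j\omega t}\, dt$$
... (elementary calculation)
$$= \frac{T}{2} \left[ \frac{\sin\left((\omega - \beta)\, \frac{T}{2}\right)}{(\omega - \beta)\, \frac{T}{2}} + \frac{\sin\left((\omega + \beta)\, \frac{T}{2}\right)}{(\omega + \beta)\, \frac{T}{2}} \right]$$
Figure: two examples of h(t) with the corresponding $H(\omega)$.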
Fourier Transform pairs ($u = \omega / 2\pi$)
Rectangle function (from -1/2 to 1/2) ↔ Sinc function $\dfrac{\sin(\pi u)}{\pi u}$
Triangle function ↔ Squared sinc function
Exponential function $e^{-\alpha |x|}$ ↔ $\dfrac{2\alpha}{\alpha^2 + (2\pi u)^2}$
Gaussian function $e^{-\pi x^2}$ ↔ $e^{-\pi u^2}$
Unit impulse $\delta(x)$ ↔ 1
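Inverse Fourier transform
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt$$
assumption:
$$h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega \quad \text{with:} \quad H(\omega) = \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau$$
inserting $H(\omega)$ in the expression for h(t):
$$\frac{1}{2\pi} \lim_{\Omega, T \to \infty} \int_{-\Omega}^{\Omega} \int_{-T}^{T} h(\tau)\, e^{j\omega (t - \tau)}\, d\tau\, d\omega$$
$$= \frac{1}{2\pi} \lim_{\Omega \to \infty} \lim_{T \to \infty} \int_{-T}^{T} \left[ \int_{-\Omega}^{\Omega} e^{j\omega (t - \tau)}\, d\omega \right] h(\tau)\, d\tau$$
$$= \lim_{\Omega \to \infty} \lim_{T \to \infty} \int_{-T}^{T} \frac{\sin(\Omega (t - \tau))}{\pi (t - \tau)}\, h(\tau)\, d\tau$$
$$= \lim_{\Omega \to \infty} \int_{-\infty}^{\infty} \frac{\sin(\Omega (t - \tau))}{\pi (t - \tau)}\, h(\tau)\, d\tau = h(t)$$
due to:
$$\frac{1}{\pi} \lim_{\Omega \to \infty} \int_{-\infty}^{\infty} \frac{\sin(\Omega t)}{t}\, h(t)\, dt = h(0)$$
formal expression:
$$h(t) = \int_{-\infty}^{\infty} \underbrace{\left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{j\omega (t - \tau)}\, d\omega \right]}_{=\, \delta(t - \tau)} h(\tau)\, d\tau$$
(distribution theory; see there for a rigorous proof)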
1.4 Properties of the Fourier Transform
Symmetry
$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt = F\{h(t)\}$$
$$h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega = F^{-1}\{H(\omega)\}$$
$$F^2\{h(t)\} = F\{H(\omega)\} = 2\pi\, h(-t)$$
$$F^{-1} F\{h(t)\} = F^{-1}\{H(\omega)\} = h(t)$$
Time domain and frequency domain are related symmetrically.
Properties of the FT are valid in both domains, especially the convolution
theorem (see later).
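The symmetry relation has a direct discrete counterpart that is easy to test: applying the DFT twice to a sequence of length N returns N times the index-reversed sequence. The sketch below is an illustration under assumptions (a direct O(N²) DFT written out in plain Python, an arbitrary test sequence); it is not the continuous transform itself.

```python
import cmath

def dft(x):
    # Direct evaluation of X[k] = sum_n x[n] exp(-2 pi j k n / N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]   # arbitrary test sequence
N = len(x)

twice = dft(dft(x))
# Discrete analogue of 2 pi h(-t): N * x[-n], indices taken modulo N
reversed_scaled = [N * x[(-n) % N] for n in range(N)]

for a, b in zip(twice, reversed_scaled):
    print(round(a.real, 9), round(abs(a.imag), 9), b)
```

The factor N plays the role of $2\pi$ in the continuous relation; both come from the normalization of the inverse transform.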
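Theorems for the Fourier transform
$$H(\omega) = \int_{-\infty}^{\infty} e^{-j\omega t}\, h(t)\, dt$$
consider the equation:
$$H(\omega) = F\{h(t)\}$$
more exact:
$$\{\omega \to H(\omega)\} = F\{t \to h(t)\}$$
1. Linearity:
The integral operator is linear.
2. Inverse scaling, similarity principle:
$$F\{h(\alpha t)\} = \int_{-\infty}^{\infty} h(\alpha t)\, e^{-j\omega t}\, dt = \frac{1}{|\alpha|} \int_{-\infty}^{\infty} h(\tau)\, e^{-j \frac{\omega}{\alpha} \tau}\, d\tau = \frac{1}{|\alpha|}\, H\left(\frac{\omega}{\alpha}\right), \quad \alpha \in \mathbb{R} \setminus \{0\}$$
Note:
Absolute value, because the integral boundaries are swapped for $\alpha < 0$.
3. Shift: $h(t - t_0)$
$$\int_{-\infty}^{\infty} h(t - t_0)\, e^{-j\omega t}\, dt = e^{-j\omega t_0} \int_{-\infty}^{\infty} h(t - t_0)\, e^{-j\omega (t - t_0)}\, dt = e^{-j\omega t_0} \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau$$
$$\Rightarrow \quad F\{h(t - t_0)\} = e^{-j\omega t_0}\, H(\omega), \quad t_0 \in \mathbb{R}, \quad \text{with } H(\omega) = F\{h(t)\}$$
important:
$$|F\{h(t - t_0)\}| = |F\{h(t)\}|$$
because
$$|e^{-j\omega t_0}| = |e^{-ju}| = |\cos u - j \sin u| = \sqrt{\cos^2 u + \sin^2 u} = 1$$
4. Symmetry and antisymmetry:
$h(t) = h(-t)$ results in $\mathrm{Im}\{H(\omega)\} = 0$
$h(t) = -h(-t)$ results in $\mathrm{Re}\{H(\omega)\} = 0$
5. Complex conjugation:
Suppose that h(t) is a complex function:
$$\int_{-\infty}^{\infty} \overline{h(t)}\, e^{-j\omega t}\, dt = \overline{\int_{-\infty}^{\infty} h(t)\, e^{j\omega t}\, dt} = \overline{H(-\omega)}$$
$$F\{\overline{h(t)}\} = \overline{H(-\omega)}$$
Special case:
h(t) is real, so $\overline{h(t)} = h(t)$
$$\Rightarrow \quad H(\omega) = \overline{H(-\omega)} \quad \Rightarrow \quad |H(\omega)| = |\overline{H(-\omega)}| = |H(-\omega)|$$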
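The shift theorem and the resulting shift invariance of the magnitude can be checked on sampled data. The sketch below is an illustration under assumptions: it uses the DFT with a circular shift by two samples as the discrete counterpart of $h(t - t_0)$, and an arbitrary test sequence.

```python
import cmath

def dft(x):
    # Direct evaluation of X[k] = sum_n x[n] exp(-2 pi j k n / N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [0.0, 1.0, 3.0, 2.0, -1.0, 0.5, 0.0, 0.0]   # arbitrary test sequence
N = len(x)
n0 = 2
shifted = x[-n0:] + x[:-n0]    # circular delay: shifted[n] = x[(n - n0) mod N]

X = dft(x)
Xs = dft(shifted)

for k in range(N):
    phase_factor = cmath.exp(-2j * cmath.pi * k * n0 / N)
    # Spectrum of the shifted signal = phase factor * original spectrum,
    # so the magnitudes agree for every k.
    print(k, abs(Xs[k] - phase_factor * X[k]), abs(abs(Xs[k]) - abs(X[k])))
```

Both printed columns stay at rounding level: the shift changes only the phase of each spectral component.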
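6. Differentiation:
$$\frac{dh}{dt} = \frac{d}{dt}\, \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, j\omega\, e^{j\omega t}\, d\omega$$
$$F\left\{\frac{dh(t)}{dt}\right\} = j\omega\, F\{h(t)\}$$
Interpretation: differentiation = enhancement of high frequencies
(due to the multiplication with $\omega$)
7. Integration:
$$F\left\{ \int_{-\infty}^{t} h(\tau)\, d\tau \right\} = \frac{1}{j\omega}\, F\{h(t)\}$$
Proof: similar to differentiation, or by inversion
8. Modulation principle:
$$F\{h(t) \cos(\omega_0 t)\} = \int_{-\infty}^{\infty} h(t) \cos(\omega_0 t)\, e^{-j\omega t}\, dt$$
$$= \frac{1}{2} \left[ \int_{-\infty}^{\infty} h(t)\, e^{j\omega_0 t}\, e^{-j\omega t}\, dt + \int_{-\infty}^{\infty} h(t)\, e^{-j\omega_0 t}\, e^{-j\omega t}\, dt \right]$$
$$= \frac{1}{2} \left[ \int_{-\infty}^{\infty} h(t)\, e^{-j(\omega - \omega_0) t}\, dt + \int_{-\infty}^{\infty} h(t)\, e^{-j(\omega + \omega_0) t}\, dt \right]$$
$$= \frac{1}{2} \left[ H(\omega - \omega_0) + H(\omega + \omega_0) \right]$$
and similarly
$$F\{h(t) \sin(\omega_0 t)\} = \frac{1}{2j} \left[ H(\omega - \omega_0) - H(\omega + \omega_0) \right]$$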
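Block diagram: input x(t), $X(\omega)$ → system h(t), $H(\omega)$ → output y(t), $Y(\omega)$
Convolution theorem
Convolution in time domain corresponds to multiplication in frequency
domain
Time domain:
$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t - \tau)\, h(\tau)\, d\tau$$
Frequency domain:
$$Y(\omega) = \int_{-\infty}^{\infty} e^{-j\omega t} \left[ \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, d\tau \right] dt = \int_{-\infty}^{\infty} h(\tau) \left[ \int_{-\infty}^{\infty} x(t - \tau)\, e^{-j\omega t}\, dt \right] d\tau$$
$$= \int_{-\infty}^{\infty} h(\tau)\, X(\omega)\, e^{-j\omega\tau}\, d\tau \quad \text{(shifting)}$$
$$= X(\omega) \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau = X(\omega)\, H(\omega)$$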
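For sampled, finite-length signals the convolution theorem can be tried out directly. The following sketch is an illustration under assumptions: it uses a plain O(N²) DFT, two short hypothetical sequences `x` and `h`, and zero padding to length `len(x) + len(h) - 1` so that the cyclic convolution implied by the DFT equals the linear one.

```python
import cmath

def dft(x, inverse=False):
    # Direct DFT; the inverse uses the opposite sign and a factor 1/N.
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N)
               for n in range(N)) for k in range(N)]
    return [v / N for v in out] if inverse else out

def convolve(x, h):
    # Linear convolution y[n] = sum_k h[k] x[n - k] in the time domain
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

x = [1.0, 2.0, 3.0]              # hypothetical input samples
h = [1.0, -1.0, 0.5, 0.25]       # hypothetical impulse response samples
L = len(x) + len(h) - 1
xp = x + [0.0] * (L - len(x))    # zero padding to the output length
hp = h + [0.0] * (L - len(h))

Y = [a * b for a, b in zip(dft(xp), dft(hp))]       # multiplication of spectra
y_freq = [v.real for v in dft(Y, inverse=True)]     # back to the time domain
y_time = convolve(x, h)

print(y_time)
print([round(v, 9) for v in y_freq])
```

With fast transforms in place of the direct DFT, this route is what makes long convolutions efficient in practice.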
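Likewise, multiplication in time domain corresponds to convolution in
frequency domain (note the factor $\frac{1}{2\pi}$):
Time domain:
$$y(t) = a(t)\, b(t)$$
Frequency domain:
$$Y(\omega) = \int_{-\infty}^{\infty} a(t)\, b(t)\, e^{-j\omega t}\, dt = \int_{-\infty}^{\infty} a(t) \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} B(\omega')\, e^{j\omega' t}\, d\omega' \right] e^{-j\omega t}\, dt$$
$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} B(\omega') \left[ \int_{-\infty}^{\infty} a(t)\, e^{-j(\omega - \omega') t}\, dt \right] d\omega' = \frac{1}{2\pi} \int_{-\infty}^{\infty} A(\omega - \omega')\, B(\omega')\, d\omega'$$
$$= \frac{1}{2\pi}\, A(\omega) * B(\omega)$$
Motivation for the Fourier transform:
The FT gives the simplest representation of the system operation, because every LTI system can be interpreted as a convolution of the input
signal x(t) with the impulse response h(t) of the system. The convolution
can then be calculated efficiently using the FT and the convolution theorem.
Mathematical: eigenfunctions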
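Example: Oscillator with excitation
Block diagram: x(t) → oscillator → y(t)
$$y''(t) + 2\alpha\, y'(t) + \beta^2\, y(t) = x(t)$$
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(\omega)\, e^{j\omega t}\, d\omega$$
$$y(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} Y(\omega)\, e^{j\omega t}\, d\omega$$
$$y'(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} Y(\omega)\, j\omega\, e^{j\omega t}\, d\omega$$
$$y''(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} Y(\omega)\, [-\omega^2]\, e^{j\omega t}\, d\omega$$
Inserting into the differential equation:
$$\int_{-\infty}^{+\infty} [-\omega^2 + 2j\alpha\omega + \beta^2]\, Y(\omega)\, e^{j\omega t}\, d\omega = \int_{-\infty}^{+\infty} X(\omega)\, e^{j\omega t}\, d\omega$$
$$\int_{-\infty}^{+\infty} \underbrace{\left\{ [-\omega^2 + 2j\alpha\omega + \beta^2]\, Y(\omega) - X(\omega) \right\}}_{=\,0}\, e^{j\omega t}\, d\omega = 0$$
In this way we obtain the transfer function of an oscillator:
$$H(\omega) = \frac{Y(\omega)}{X(\omega)} = \frac{1}{-\omega^2 + 2j\alpha\omega + \beta^2}$$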
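$$h(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} H(\omega)\, e^{j\omega t}\, d\omega \quad \text{(can be given explicitly)}$$
$$y(t) = \int_{-\infty}^{+\infty} x(\tau)\, h(t - \tau)\, d\tau$$
Note:
y(t) does not contain the component which corresponds to the homogeneous differential equation of the oscillator.
Diagram: in the time domain, x(t) is mapped to y(t) by convolution with h(t). In the frequency domain, $X(\omega)$ is mapped to $Y(\omega)$ by multiplication with $H(\omega) = F\{h(t)\}$. The Fourier transform leads from x(t) to $X(\omega)$, the inverse Fourier transform from $Y(\omega)$ back to y(t).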
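1.5 Parseval Theorem
Convolution theorem:
$$F^{-1}\{H(\omega)\, X(\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, X(\omega)\, e^{j\omega\tau}\, d\omega = \int_{-\infty}^{\infty} h(t)\, x(\tau - t)\, dt = (h * x)(\tau) \qquad (*)$$
We make two special assumptions:
i) $x(t) := \overline{h(-t)}$, then: $X(\omega) = \overline{H(\omega)}$
ii) $\tau = 0$
Inserting in $(*)$ results in:
$$\frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, \overline{H(\omega)}\, d\omega = \int_{-\infty}^{\infty} h(t)\, \overline{h(t)}\, dt$$
$$\frac{1}{2\pi} \int_{-\infty}^{\infty} |H(\omega)|^2\, d\omega = \int_{-\infty}^{\infty} |h(t)|^2\, dt = E$$
Energy E in time domain = Energy E in frequency domain
(up to the factor $\frac{1}{2\pi}$; remedy: use the normalization factor $\frac{1}{\sqrt{2\pi}}$ for both
directions of the Fourier Transform)
Physical aspect: energy conservation
Mathematical aspect: unitary (orthogonal) representation in vector
space
$|H(\omega)|^2$ is called power spectral density.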
1.6 Autocorrelation Function
The autocorrelation function of a time continuous
signal or function h(t) is defined as:
$$R(t) = \int_{-\infty}^{\infty} h(\tau)\, h(t + \tau)\, d\tau$$
The following equation is valid:
$$R(t) = h(t) * h(-t)$$
which results in
$$R(t) = R(-t)$$
Fourier transform gives (Wiener-Khinchin Theorem):
$$F\{R(t)\} = H(\omega)\, \overline{H(\omega)} = |H(\omega)|^2$$
Thus: the Fourier transform connects the autocorrelation
function R(t) and the power spectral density $|H(\omega)|^2$
$$|H(\omega)|^2 = \int_{-\infty}^{\infty} R(t)\, e^{-j\omega t}\, dt = \int_{-\infty}^{\infty} R(t) \cos(\omega t)\, dt$$
Remark:
The autocorrelation is a special case of the cross correlation between the signals $x(\tau)$ and $h(\tau)$:
$$C_{h,x}(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t + \tau)\, d\tau$$
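The Wiener-Khinchin relation can be tested on sampled data. The sketch below is an illustration under assumptions: it uses a real test sequence, the DFT, and the circular (periodic) autocorrelation $R[m] = \sum_n x[n]\, x[(n + m) \bmod N]$ as the discrete counterpart of the integral definition.

```python
import cmath

def dft(x):
    # Direct evaluation of X[k] = sum_n x[n] exp(-2 pi j k n / N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [1.0, -0.5, 2.0, 0.0, 0.5, 1.5, -1.0, 0.25]   # arbitrary real sequence
N = len(x)

# Circular autocorrelation R[m] = sum_n x[n] x[(n + m) mod N]
R = [sum(x[n] * x[(n + m) % N] for n in range(N)) for m in range(N)]

power = [abs(V) ** 2 for V in dft(x)]   # |X[k]|^2
spectrum_of_R = dft(R)                  # DFT of the autocorrelation

for k in range(N):
    print(k, round(power[k], 9), round(spectrum_of_R[k].real, 9))
```

Because R is symmetric, its transform is real, which matches the cosine form of the integral above.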
1.7 Existence of the Fourier Transform
Conditions for h(t) for the existence of the Fourier transform
$$H(\omega) = \int_{-\infty}^{\infty} e^{-j\omega t}\, h(t)\, dt, \qquad h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{j\omega t}\, H(\omega)\, d\omega$$
When are those equations valid?
Sufficient conditions:
1. h(t) is absolutely integrable:
$$\int_{-\infty}^{\infty} |h(t)|\, dt < \infty$$
2. h(t) has a finite number of jumps, minima and maxima in each finite interval
of $\mathbb{R}$
3. h(t) has no infinite jumps
More general conditions are possible (but lead to a rather complex set of conditions):
Generalized functions, distributions,
definition as functional
Example: δ-function:
$$\int_{-\infty}^{\infty} \delta(t)\, h(t)\, dt = h(0) \quad \text{for all functions } h$$
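Impulse response:
$$y(t) = \int_{-\infty}^{\infty} h(t - \tau)\, \delta(\tau)\, d\tau = h(t) * \delta(t) = h(t)$$
Consequence:
$$h(t) \equiv 1 \quad \Rightarrow \quad \int_{-\infty}^{\infty} \delta(t)\, dt = 1$$
A function like $\delta(t)$ does not exist. But it is possible to define the
functional $\delta$ for each function $t \to h(t)$:
$$\delta: [t \to h(t)] \to \delta(h) := h(0)$$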
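1.8 δ-Function
Starting point: definition of the δ-function as a limiting case of
a function $\delta_\varepsilon(t)$:
$$\lim_{\varepsilon \to 0} \int_{-\infty}^{+\infty} f(t)\, \delta_\varepsilon(t)\, dt = f(0) \qquad (1.1)$$
Possible realizations of $\delta_\varepsilon(t)$:
a) $\delta_\varepsilon(t) = \begin{cases} \dfrac{1}{2\varepsilon} & t \in [-\varepsilon, +\varepsilon] \\ 0 & \text{otherwise} \end{cases}$
b) $\delta_\varepsilon(t) = \dfrac{1}{\pi}\, \dfrac{\varepsilon}{\varepsilon^2 + t^2}$
c) $\delta_\varepsilon(t) = \dfrac{1}{\pi}\, \dfrac{\sin(t/\varepsilon)}{t}$
d) $\delta_\varepsilon(t) = \dfrac{1}{\sqrt{2\pi\varepsilon^2}}\, e^{-\frac{t^2}{2\varepsilon^2}}$
During inversion of the Fourier transform we have formally obtained:
$$\delta(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{j\omega t}\, d\omega = \lim_{\Omega \to \infty} \frac{1}{\pi}\, \frac{\sin(\Omega t)}{t} \qquad (1.2)$$
Fourier transform $F\{\delta(t)\}$:
$$F\{\delta(t)\} = \int_{-\infty}^{+\infty} e^{-j\omega t}\, \delta(t)\, dt$$
due to (1.1) the following holds:
$$F\{\delta(t)\} = e^{-j\omega t}\big|_{t=0} = 1$$
Another derivation using (1.2):
$$\delta(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{j\omega t}\, F\{\delta(t)\}\, d\omega \quad \text{(general)}$$
$$\delta(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{j\omega t}\, d\omega \quad \text{(according to (1.2))}$$
Comparison results in:
$$F\{\delta(t)\} = 1$$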
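The limiting process in (1.1) can be observed numerically with a Gaussian pulse of shrinking width as the approximating function. The sketch below is an illustration under assumptions: the test function `f`, the midpoint integration over ten standard deviations, and the helper names `delta_eps` and `sift` are choices made only for this example.

```python
import math

def delta_eps(t, eps):
    # Gaussian with standard deviation eps and unit area
    return math.exp(-t * t / (2 * eps * eps)) / math.sqrt(2 * math.pi * eps * eps)

def sift(f, eps, n=2001, width=10.0):
    # Midpoint rule for the integral of f(t) * delta_eps(t)
    # over [-width * eps, +width * eps]
    a = -width * eps
    dt = 2 * width * eps / n
    return sum(f(a + (k + 0.5) * dt) * delta_eps(a + (k + 0.5) * dt, eps)
               for k in range(n)) * dt

def f(t):
    # Assumed smooth test function with f(0) = 1
    return math.cos(t) + t

for eps in (1.0, 0.1, 0.01):
    print(eps, sift(f, eps))
```

For this particular test function the integral equals $e^{-\varepsilon^2/2}$ exactly, so the printed values approach $f(0) = 1$ as $\varepsilon$ shrinks.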
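From this we obtain the following equations:
From symmetry property:
$$F\{1\} = 2\pi\, \delta(\omega)$$
From shifting theorem:
$$F\{e^{j\omega_0 t} \cdot 1\} = 2\pi\, \delta(\omega - \omega_0)$$
$$\cos(\omega_0 t) = \frac{1}{2} \left[ e^{j\omega_0 t} + e^{-j\omega_0 t} \right] = \frac{1}{2} \left[ \int_{-\infty}^{+\infty} \delta(\omega - \omega_0)\, e^{j\omega t}\, d\omega + \int_{-\infty}^{+\infty} \delta(\omega + \omega_0)\, e^{j\omega t}\, d\omega \right]$$
$$= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \pi \left[ \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \right] e^{j\omega t}\, d\omega$$
$$F\{\cos(\omega_0 t)\} = \pi \left[ \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \right]$$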
Note: another derivation:
consider damped oscillations $e^{-\alpha |t|} \cos(\omega_0 t)$ in the limit $\alpha \to 0$.
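Comb function
Define the comb function (pulse train, sequence of δ-impulses):
$$x(t) = \sum_{n=-\infty}^{+\infty} \delta(t - nT)$$
Fourier transform of the comb function:
$$X(\omega) = \int_{-\infty}^{+\infty} x(t)\, e^{-j\omega t}\, dt = \int_{-\infty}^{+\infty} \sum_{n=-\infty}^{+\infty} \delta(t - nT)\, e^{-j\omega t}\, dt = \sum_{n=-\infty}^{+\infty} \int_{-\infty}^{+\infty} \delta(t - nT)\, e^{-j\omega t}\, dt$$
$$= \sum_{n=-\infty}^{+\infty} e^{-j\omega nT}$$
... (see Papoulis 1962, p. 44)
$$= \frac{2\pi}{T} \sum_{n=-\infty}^{+\infty} \delta\left(\omega - n\, \frac{2\pi}{T}\right)$$
in words:
A δ-impulse sequence with period T in the time domain
produces
a δ-impulse sequence with period $\frac{1}{T}$ in the frequency domain
(i.e. $\frac{2\pi}{T}$ in the $\omega$-frequency domain):
the comb function is transformed to a comb function.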
Figure: comb function with period T in the time domain and its transform, a comb function with spacing $2\pi/T$ in the frequency domain; $\cos(\omega_0 t)$ and $\sin(\omega_0 t)$ with their line spectra at $\pm\omega_0$.
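1.9 Motivation for Fourier Series
$$x: \mathbb{R} \to \mathbb{R}, \quad t \to x(t)$$
Consider a periodic function x with period T:
$$x(t) = x(t + T) \quad \text{for each } t \in \mathbb{R}$$
then also
$$x(t) = x(t + kT) \quad \text{for } k \in \mathbb{Z}$$
Examples:
Constant function:
$$x_0(t) = A_0$$
Harmonic oscillator:
$$x_1(t) = A_1 \cos\left(\frac{2\pi}{T}\, t + \varphi_1\right), \quad A_1 > 0$$
All higher harmonics:
$$x_n(t) = A_n \cos\left(n\, \frac{2\pi}{T}\, t + \varphi_n\right), \quad A_n > 0$$
therefore
$$x(t) = \sum_{n=0}^{\infty} A_n \cos(n\, \omega_0 t + \varphi_n) \quad \text{with } \omega_0 = \frac{2\pi}{T}, \quad A_n \ge 0$$
is periodic with period $T = \frac{2\pi}{\omega_0}$.
Another notation:
$$x(t) = \sum_{n=-\infty}^{\infty} B_n\, e^{j n \omega_0 t} \quad \text{where } B_n \text{ is a complex number}$$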
Line spectrum representation
A real measured signal always has a broadened spectrum.
Reasons:
A strictly periodic signal (almost) never exists
The period can fluctuate
The wave form within one period can fluctuate
Only a finite section of the signal is analyzed
(window function)
Only a strictly periodic signal has a sharp line spectrum
Remarks:
Fourier series are actually not restricted to periodic functions:
a finite interval of $\mathbb{R}$ is sufficient (the signal is then interpreted as
periodically continued).
By transition from the finite interval to the complete real axis the
Fourier series becomes the Fourier integral.
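Calculation of Fourier coefficients:
Consider a periodic function x(t) with period $T = \frac{2\pi}{\omega_0}$
approach:
$$x(t) = \sum_{n=-\infty}^{+\infty} a_n\, e^{j n \omega_0 t}, \quad a_n \in \mathbb{C}$$
Multiplication with $e^{-j m \omega_0 t}$, where $m \in \mathbb{Z}$, and integration over one
period result in:
$$\int_{-T/2}^{+T/2} x(t)\, e^{-j m \omega_0 t}\, dt = \sum_{n=-\infty}^{+\infty} a_n \int_{-T/2}^{+T/2} e^{j (n - m) \omega_0 t}\, dt$$
Due to orthogonality holds:
$$\int_{-T/2}^{+T/2} e^{j (n - m) \omega_0 t}\, dt = \begin{cases} T & \text{if } n = m \\ 0 & \text{if } n \neq m \end{cases}$$
Then:
$$\int_{-T/2}^{+T/2} x(t)\, e^{-j m \omega_0 t}\, dt = a_m\, T$$
Result:
$$a_n = \frac{1}{T} \int_{-T/2}^{+T/2} x(t)\, e^{-j n \omega_0 t}\, dt = \frac{1}{T} \int_{-T/2}^{+T/2} x(t) \cos(n \omega_0 t)\, dt - j\, \frac{1}{T} \int_{-T/2}^{+T/2} x(t) \sin(n \omega_0 t)\, dt$$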
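The coefficient formula can be evaluated numerically for a signal whose coefficients are known in advance. The sketch below is an illustration under assumptions: the test signal $\cos(\omega_0 t) + 0.5 \sin(2\omega_0 t)$, the period T = 2, the midpoint rule, and the function name `coeff` are chosen only for this example. From the complex representations of cosine and sine, the expected values are $a_{\pm 1} = 0.5$, $a_2 = -0.25j$, $a_{-2} = +0.25j$, and zero otherwise.

```python
import cmath
import math

T = 2.0                    # assumed period
w0 = 2 * math.pi / T       # fundamental angular frequency

def x(t):
    # Assumed periodic test signal
    return math.cos(w0 * t) + 0.5 * math.sin(2 * w0 * t)

def coeff(n, samples=1000):
    # a_n = (1/T) * integral over one period of x(t) exp(-j n w0 t),
    # approximated with the midpoint rule on [-T/2, T/2]
    dt = T / samples
    total = 0.0
    for k in range(samples):
        t = -T / 2 + (k + 0.5) * dt
        total += x(t) * cmath.exp(-1j * n * w0 * t)
    return total * dt / T

for n in range(-3, 4):
    a = coeff(n)
    print(n, round(a.real, 9), round(a.imag, 9))
```

For a real signal the coefficients come in conjugate pairs, $a_{-n} = \overline{a_n}$, which the output shows.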
Spectrum of a periodic function
If x(t) is periodic with the period $T = \frac{2\pi}{\omega_0}$, then
$$x(t) = \sum_{n=-\infty}^{+\infty} a_n\, e^{j n \omega_0 t}, \quad a_n \in \mathbb{C}$$
The Fourier transform $X(\omega)$ is:
$$X(\omega) = F\{x(t)\} = \sum_{n=-\infty}^{+\infty} a_n \underbrace{F\{e^{j n \omega_0 t}\}}_{=\, 2\pi\, \delta(\omega - n\omega_0)} = 2\pi \sum_{n=-\infty}^{+\infty} a_n\, \delta(\omega - n\omega_0)$$
Note:
This derivation is formal, because the Fourier integral does not
exist in the usual sense;
a strict derivation is possible within the scope of distribution theory.
In words:
a periodic function with the period T has a Fourier transform in the
form of a line spectrum with the distance $\omega_0 = \frac{2\pi}{T}$ between the components.
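1.10 Time Duration and Band Width
1. Similarity principle:
$$F\{h(\alpha t)\} = \frac{1}{|\alpha|}\, H\left(\frac{\omega}{\alpha}\right)$$
Figure: h(t) and $H(\omega)$; for $0 < \alpha < 1$: $h(\alpha t)$ and $\frac{1}{\alpha} H\left(\frac{\omega}{\alpha}\right)$.
time duration $\Delta_T$
band width $\Delta_B$
$$\Delta_T \cdot \Delta_B = \text{const.}$$
High resolution in the time domain results in low resolution in the
frequency domain and vice versa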
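2. Special case: h(t) with
$$\mathrm{Im}\{H(\omega)\} = 0 \ (h(t) \text{ symmetrical}) \quad \text{and} \quad \mathrm{Re}\{H(\omega)\} \ge 0$$
h(t) has its maximum for t = 0:
$$h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega) \cos(\omega t)\, d\omega \;\le\; \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, d\omega = h(0)$$
define:
$$\Delta_T := \frac{1}{h(0)} \int_{-\infty}^{\infty} h(t)\, dt, \qquad \Delta_B := \frac{1}{H(0)} \int_{-\infty}^{\infty} H(\omega)\, d\omega$$
from
$$\Delta_T = \frac{H(0)}{h(0)} \quad \text{and} \quad \Delta_B = 2\pi\, \frac{h(0)}{H(0)}$$
follows
$$\Delta_T \cdot \Delta_B = 2\pi$$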
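3. In general:
normalized impulse h(t) with
$$\int_{-\infty}^{\infty} h^2(t)\, dt = 1, \quad h(t) \in \mathbb{R}$$
$$\Delta_T^2 := \int_{-\infty}^{\infty} h^2(t)\, t^2\, dt$$
$$\Delta_B^2 := \int_{-\infty}^{\infty} |H(\omega)|^2\, \omega^2\, d\omega = 2\pi \int_{-\infty}^{\infty} [h'(t)]^2\, dt$$
Results in the uncertainty relation:
$$\Delta_T \cdot \Delta_B \ge \sqrt{\frac{\pi}{2}}$$
Proof: Cauchy-Schwarz inequality
$$|x^T y| \le \|x\| \cdot \|y\|$$
$$\underbrace{\int_{-\infty}^{\infty} [t\, h(t)]^2\, dt}_{=\, \Delta_T^2} \cdot \underbrace{\int_{-\infty}^{\infty} [h'(t)]^2\, dt}_{=\, \Delta_B^2 / 2\pi} \;\ge\; \underbrace{\left| \int_{-\infty}^{\infty} [t\, h(t)]\, h'(t)\, dt \right|^2}_{=\, 1/4}$$
The value 1/4 follows from partial integration,
$$\int u'(t)\, v(t)\, dt = u(t)\, v(t) - \int u(t)\, v'(t)\, dt$$
with $u'(t) = h(t)\, h'(t)$ and $v(t) = t$:
$$\int_{-\infty}^{\infty} [h(t)\, h'(t)]\, t\, dt = \left[ \frac{1}{2}\, h(t)^2\, t \right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \frac{1}{2}\, h^2(t) \cdot 1\, dt = 0 - \frac{1}{2}$$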
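Equality sign is valid for linear dependency:
$$h'(t) = -\lambda\, t\, h(t)$$
$$\frac{dh}{h} = -\lambda\, t\, dt$$
$$\log(h) = -\frac{1}{2}\, \lambda\, t^2 + \text{const.}, \quad \lambda > 0$$
Optimum $\Delta_T \cdot \Delta_B = \sqrt{\pi/2}$ for the Gauss impulse
$$h(t) = \left(\frac{\lambda}{\pi}\right)^{1/4} e^{-\frac{1}{2} \lambda t^2}$$
Variance: $\sigma^2 = 1/\lambda$
Quantum Physics: similar statement about position and momentum
of a particle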
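4. Finite positive signal
$$g(t) \begin{cases} \ge 0 & 0 \le t \le T \\ = 0 & t < 0 \text{ or } T < t \end{cases}$$
The following is valid for the amplitude spectrum $|G(\omega)|$:
$$|G(\omega)| = \left| \int_{-\infty}^{+\infty} g(t)\, e^{-j\omega t}\, dt \right|
  \;\le\; \int_{-\infty}^{+\infty} |g(t)|\, |e^{-j\omega t}|\, dt
  = \int_{-\infty}^{+\infty} |g(t)|\, dt = G(0) \qquad \text{because } g(t) \ge 0$$
Define the band width $\omega_B$ as:
$$|G(\omega_B)|^2 = \frac{G^2(0)}{2} \qquad \text{and} \qquad |G(\omega_B)|^2 \le |G(\omega)|^2 \text{ for } |\omega| < \omega_B$$
Then:
$$T \cdot \omega_B \ge \frac{\pi}{2}$$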
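Proof:
The following inequalities are valid:
$$a^2 + b^2 \ge \frac{(a - b)^2}{2}, \qquad a, b \in \mathbb{R}$$
$$| \sin \alpha | + | \cos \alpha | \ge 1, \qquad \alpha \in \mathbb{R}$$
For the Fourier transform of g(t) holds:
$$\mathrm{Re}\{G(\omega)\} = \int_0^T g(t) \cos \omega t\, dt, \qquad
  \mathrm{Im}\{G(\omega)\} = -\int_0^T g(t) \sin \omega t\, dt$$
For $0 \le \omega t \le \frac{\pi}{2}$ holds:
$$\cos \omega t \ge 0, \qquad \sin \omega t \ge 0$$
and therefore:
$$\cos \omega t + \sin \omega t = | \cos \omega t| + | \sin \omega t| \ge 1$$
$$\mathrm{Re}\{G(\omega)\} - \mathrm{Im}\{G(\omega)\} = \int_0^T g(t)\, [\cos \omega t + \sin \omega t]\, dt
  \;\ge\; \int_0^T g(t) \cdot 1\, dt = G(0)$$
$$|G(\omega)|^2 = \mathrm{Re}^2\{G(\omega)\} + \mathrm{Im}^2\{G(\omega)\}
  \;\ge\; \frac{[\mathrm{Re}\{G(\omega)\} - \mathrm{Im}\{G(\omega)\}]^2}{2}
  \;\ge\; \frac{1}{2}\, G^2(0) = |G(\omega_B)|^2$$
Hence $|G(\omega)|^2$ does not fall below $|G(\omega_B)|^2$ as long as $\omega T \le \frac{\pi}{2}$, which gives $T \cdot \omega_B \ge \frac{\pi}{2}$.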
Chapter 2
Discrete Time Systems
Overview:
2.1 Motivation and Goal
2.2 Digital Simulation using Discrete Time Systems
2.3 Examples of Discrete Time Systems
2.4 Sampling Theorem and Reconstruction
2.5 Logarithmic Scale and dB
2.6 Quantization
2.7 Fourier Transform and z-Transform
2.8 System Representation and Examples
2.9 Discrete Time Signal Fourier Transform Theorems
2.10 Discrete Fourier Transform (DFT)
2.11 DFT as Matrix Operation
2.12 From continuous FT to Matrix Representation of DFT
2.13 Frequency Resolution and Zero Padding
2.14 Finite Convolution
2.15 Fast Fourier Transform (FFT)
2.16 FFT Implementation
2.1 Motivation and Goal
If we want to process a continuous time signal x(t) with a computer, we
have to sample it at discrete equidistant time points
$$t_n = n\, T_S$$
where $T_S$ is called the sampling period.
Terminology:
"discrete time" is often called "digital", although this adjective often
(but not always) also denotes the amplitude quantization,
i.e. the quantization of the value $x(n\, T_S)$.
Advantages of digital processing in comparison to analog components:
independent of analog components and of the technical difficulties of their realization;
in principle arbitrarily high accuracy;
non-linear methods are also possible,
in principle even any mathematical method.
2.2
Digital Simulation using Discrete Time Systems
Task definition:
Given:
Analog system with input signal x(t) and output signal y(t);
Sampling with sampling period TS
Wanted:
Discrete System with input signal x[n] and output signal y[n], such
that
x[n] = x(nTS )
results in
y[n] = y(nTS )
For which signals is such a digital simulation possible?
The sampling theorem gives (most of) the answer.
LTI system (analogous to the continuous time case):
Linearity:
Homogeneity:
$$S\{\alpha\, x[n]\} = \alpha\, S\{x[n]\}$$
Additivity:
$$S\{x_1[n] + x_2[n]\} = S\{x_1[n]\} + S\{x_2[n]\}$$
Shift invariance:
$$S\{x[n - n_0]\} = y[n - n_0], \qquad n_0 \text{ integer}$$
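Representation of an LTI system as discrete convolution:
Unit impulse:
$$\delta[n] = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}$$
The signal x[n] is represented by amplitude weighted and time shifted
unit impulses $\delta[n]$. The system reacts to $\delta[n]$ with h[n]:
$$h[n] = S\{\delta[n]\}$$
Input signal:
$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\, \delta[n - k]$$
Output signal:
$$y[n] = S\left\{ \sum_{k=-\infty}^{\infty} x[k]\, \delta[n - k] \right\}$$
$$\overset{\text{Additivity}}{=} \sum_{k=-\infty}^{\infty} S\{ x[k]\, \delta[n - k] \}$$
$$\overset{\text{Homogeneity}}{=} \sum_{k=-\infty}^{\infty} x[k]\, S\{ \delta[n - k] \}$$
$$\overset{\text{Time invariance}}{=} \sum_{k=-\infty}^{\infty} x[k]\, h[n - k]$$
Input signal x[n] and output signal y[n] of a discrete time LTI system are
linked through discrete convolution.
h[n] is called impulse response, as in the continuous time case.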
2.3 Examples of Discrete Time Systems
Difference calculation:
$$y[n] = x[n] - x[n - n_0]$$
1-2-1-averaging:
$$y[n] = 0.5\, x[n - 1] + x[n] + 0.5\, x[n + 1]$$
sliding window averaging (smoothing):
$$y[n] = \frac{1}{2M + 1} \sum_{k=-M}^{M} x[n - k]$$
weighted averaging: instead of the constant weight
$$h[n] = \frac{1}{2M + 1}$$
arbitrary weights can be used:
$$y[n] = \sum_{k=-M}^{M} h[k]\, x[n - k]$$
Note: the only difference from the general case is the
finite length of the convolution kernel h[n].
First order difference equation:
(recursive averaging, averaging with memory)
$$y[n] - \alpha\, y[n - 1] = x[n]$$
(Digital) resonator (second order difference equation):
$$y[n] - \alpha\, y[n - 1] - \beta\, y[n - 2] = x[n]$$
Image processing:
Gradient calculation and image enhancement
(Roberts Operator, Laplace Operator)
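The weighted averaging above can be sketched in C as follows. This is only an illustrative sketch: the function name `weighted_average`, the storage of the weight h[k] at index k+M, and the treatment of samples outside the signal as zero are assumptions, not part of the lecture.

```c
#include <stddef.h>

/* Weighted averaging y[n] = sum_{k=-M}^{M} h[k] x[n-k].
 * h has 2M+1 entries; h[k + M] stores the weight for offset k.
 * Samples outside 0..N-1 are treated as zero (an assumption). */
void weighted_average(const double *x, double *y, size_t N,
                      const double *h, int M)
{
    for (size_t n = 0; n < N; n++) {
        double sum = 0.0;
        for (int k = -M; k <= M; k++) {
            long idx = (long)n - k;              /* index n - k */
            if (idx >= 0 && idx < (long)N)
                sum += h[k + M] * x[idx];
        }
        y[n] = sum;
    }
}
```

With all weights equal to 1/(2M+1) this reduces to the sliding window averaging; with the weights 0.5, 1, 0.5 it gives the 1-2-1-averaging.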
Roberts Cross Operator
gray values x[i, j] on the 2x2 neighbourhood (i, j), (i+1, j), (i, j+1), (i+1, j+1):
$$|\nabla x[i, j]|^2 = (x[i, j] - x[i + 1, j + 1])^2 + (x[i, j + 1] - x[i + 1, j])^2$$
Note: non-linear operation
simplified:
$$|\nabla x[i, j]| = |x[i, j] - x[i + 1, j + 1]| + |x[i, j + 1] - x[i + 1, j]|$$
Figure 2.1: Digital photo
Figure 2.2: Gradient image
Laplace Operator: discrete approximation of the second derivative
$$\nabla^2 x[i, j] = \nabla_i^2 x[i, j] + \nabla_j^2 x[i, j]$$
$$= x[i + 1, j] - 2x[i, j] + x[i - 1, j] + x[i, j + 1] - 2x[i, j] + x[i, j - 1]$$
$$= x[i + 1, j] + x[i - 1, j] + x[i, j + 1] + x[i, j - 1] - 4x[i, j]$$
Weights of the operator (two one-dimensional masks 1, -2, 1 added up):
         j-1    j   j+1
  i-1     0     1    0
  i       1    -4    1
  i+1     0     1    0
Image enhancement:
$$y[i, j] = x[i, j] - \nabla^2 x[i, j] = h[i, j] * x[i, j]$$
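A minimal C sketch of the Laplace operator and of the image enhancement y = x - Laplace(x). The names `laplace_at` and `enhance`, the row-major storage and the decision to copy border pixels unchanged are assumptions made only for this illustration.

```c
/* Discrete Laplace operator at an interior pixel (i, j) of a
 * row-major image with `cols` columns. */
static double laplace_at(const double *x, int cols, int i, int j)
{
    return x[(i + 1) * cols + j] + x[(i - 1) * cols + j]
         + x[i * cols + (j + 1)] + x[i * cols + (j - 1)]
         - 4.0 * x[i * cols + j];
}

/* y = x - Laplace(x) on interior pixels; border pixels are copied
 * unchanged (an assumption, the notes do not treat the border). */
void enhance(const double *x, double *y, int rows, int cols)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            int border = (i == 0 || j == 0 ||
                          i == rows - 1 || j == cols - 1);
            y[i * cols + j] = border
                ? x[i * cols + j]
                : x[i * cols + j] - laplace_at(x, cols, i, j);
        }
    }
}
```

For display, the result would still have to be clipped to the valid gray value range.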
Figure 2.3: Several real cases of Laplace Operator subtraction from original image. a)
Original image b) Original image minus Laplace Operator (negative values are set to 0
and values above the grey scale are set to the highest grade of grey)
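2.4 Sampling Theorem (Nyquist Theorem) and Reconstruction
The following question will be analyzed:
How should we choose the sampling period $T_S$, if we want to represent a
continuous signal x(t) by its sample values $x(nT_S)$ so that the signal x(t)
can be exactly reconstructed from its sample values?
Fourier transform of the continuous time signal x(t):
$$X(\omega) = \mathcal{F}\{ x(t) \} = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt$$
$$x(t) = \mathcal{F}^{-1}\{ X(\omega) \} = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega \qquad (2.1)$$
The signal x(t) has limited bandwidth with upper limit $\omega_B$, which means:
$$X(\omega) = 0 \qquad \text{for all } |\omega| \ge \omega_B$$
Note: $X(\omega_B) = 0$
$X(\omega)$ in the domain $-\omega_B < \omega < \omega_B$ can be represented as a Fourier series:
$$X(\omega) = \sum_{n=-\infty}^{\infty} a_n \exp\!\left(-jn\pi \frac{\omega}{\omega_B}\right) \qquad (2.2)$$
The coefficients $a_n$ are given by:
$$a_n = \frac{1}{2\omega_B} \int_{-\omega_B}^{\omega_B} X(\omega) \exp\!\left(jn\pi \frac{\omega}{\omega_B}\right) d\omega \qquad (2.3)$$
Comparison of the equations (2.1) and (2.3) shows that the coefficients
$a_n$ are given by the values of the inverse Fourier transform of $X(\omega)$,
i.e. by the values of x(t), at the points
$$t_n = \frac{n\pi}{\omega_B} \qquad (2.4)$$
The band limitation of $X(\omega)$ has to be considered for the integration
limits in (2.1). Result:
$$a_n = \frac{\pi}{\omega_B}\, x\!\left(\frac{n\pi}{\omega_B}\right) \qquad (2.5)$$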
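Inserting Eq. (2.5) into Eq. (2.2) and then into Eq. (2.1) results in:
$$x(t) = \frac{1}{2\pi} \int_{-\omega_B}^{\omega_B} \sum_{n=-\infty}^{\infty} \frac{\pi}{\omega_B}\, x\!\left(\frac{n\pi}{\omega_B}\right) \exp\!\left(-jn\pi \frac{\omega}{\omega_B}\right) \exp(j\omega t)\, d\omega$$
After swapping summation and integration and subsequent integration:
$$x(t) = \sum_{n=-\infty}^{\infty} x\!\left(\frac{n\pi}{\omega_B}\right)
  \frac{\sin\!\left(\omega_B \left(t - \frac{n\pi}{\omega_B}\right)\right)}{\omega_B \left(t - \frac{n\pi}{\omega_B}\right)}$$
Reconstruction of the signal x(t) from sample values is possible if the
equidistant sample values $x(\frac{n\pi}{\omega_B}) = x(n\, T_S)$ have the distance $T_S$:
$$T_S = \frac{\pi}{\omega_B} \qquad (2.6)$$
The sampling period $T_S$ corresponds to the sampling frequency $\omega_S$:
$$\omega_S = \frac{2\pi}{T_S}$$
Equation (2.6) shows that if the sampling frequency is
$$\omega_S := 2\, \omega_B$$
the original signal x(t) can be reconstructed exactly.
In the Fourier series representation of $X(\omega)$ in equation (2.2), the
period $2\, \omega_B$ has been assumed.
$\omega_B$ is the highest frequency component of the signal x(t).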
Since $X(\omega)$ is equal to zero for $|\omega| \ge \omega_B$, the period $2\, \omega_B$ can be
replaced by any period $2\, \tilde{\omega}_B$ where $\tilde{\omega}_B \ge \omega_B$. The previous
derivation is also valid for this $\tilde{\omega}_B$.
When $\tilde{\omega}_B = \frac{\pi}{T_S}$, then:
$$x(t) = \sum_{n=-\infty}^{\infty} x(n\, T_S)\, \frac{\sin(\pi (t - n\, T_S)/T_S)}{\pi (t - n\, T_S)/T_S} \qquad \text{(reconstruction formula)}$$
Note: $\lim_{t \to 0} \frac{\sin(t)}{t} = 1$ (l'Hôpital's rule)
The condition $\tilde{\omega}_B \ge \omega_B$ results in:
$$T_S \le \frac{\pi}{\omega_B} \qquad (2.7)$$
for the sampling period $T_S$ and in:
$$\omega_S \ge 2\, \omega_B \qquad (2.8)$$
for the sampling frequency $\omega_S$.
The equations (2.7) and (2.8) are denoted as the sampling theorem. The
sampling frequency has to be at least twice as high as the upper limit
frequency $\omega_B$ of the signal, where $X(\omega) = 0$ for $|\omega| \ge \omega_B$. If and only
if this condition is satisfied, an exact reconstruction (without approximation) of a continuous signal x(t) from its sample values $x(nT_S)$ is
possible.
Note: The sampling frequency $\omega_S = 2\, \omega_B$ is also called
Nyquist frequency.
Figure 2.4: Ideal reconstruction of a band-limited signal (from Oppenheim, Schafer)
a) original signal x(t) b) sampled signal $x_s(t)$ c) reconstructed signal $x_r(t)$
Figure 2.5: Sampling of a band-limited signal with different sampling rates:
a) spectrum $X(\omega)$ of the original signal
b) sampling rate higher than the Nyquist rate ($\omega_S > 2\omega_B$): exact reconstruction possible
c) sampling rate equal to the Nyquist rate ($\omega_S = 2\omega_B$): exact reconstruction possible
d) sampling rate smaller than the Nyquist rate ($\omega_S < 2\omega_B$): aliasing, exact reconstruction not possible
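Another proof using the delta and comb functions:
Sampling of the continuous signal x(t) with $\omega_S = \frac{2\pi}{T_S}$
Band limitation: $X(\omega) = 0$ for $|\omega| \ge \omega_B$
(always achievable with an analog low-pass with $T(\omega) = 0$ for $|\omega| \ge \omega_B$)
Sampling procedure:
Multiplication of a function with a comb function in the time domain
$$x_s(t) = T_s\, x(t) \sum_{n=-\infty}^{+\infty} \delta(t - nT_s)$$
results in a convolution with a comb function in the frequency domain:
$$X_s(\omega) = T_s\, \frac{1}{2\pi}\, X(\omega) * \left[ \frac{2\pi}{T_s} \sum_{n=-\infty}^{+\infty} \delta\!\left(\omega - \frac{2\pi n}{T_s}\right) \right]$$
$$= \int_{-\infty}^{+\infty} \sum_{n=-\infty}^{+\infty} X(\tilde{\omega})\, \delta\!\left(\omega - \tilde{\omega} - \frac{2\pi n}{T_s}\right) d\tilde{\omega}$$
$$= \sum_{n=-\infty}^{+\infty} X\!\left(\omega - n\, \frac{2\pi}{T_s}\right)$$
In words: the sampled signal has a periodic Fourier spectrum.
(Analogy to Fourier series: a periodic signal has a line spectrum, i.e. a
discrete spectrum)
No overlap if:
$$\omega_B \le \omega_S - \omega_B \quad \Longleftrightarrow \quad 2\omega_B \le \omega_S$$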
In so-called digital simulation, the signal x(t) is represented by its

sampled values x(n TS ) measured at equidistant time points with
distance TS . With a proper sampling period TS an exact reconstruction of the signal x(t) from the sampled values x(n TS ) is possible.
If it is possible to exactly reconstruct the signal x(t) from the sampled
values x(nTS ), then it is possible to perform a discrete time processing
of the sampled values x(n TS ) on a computer, which is equivalent to
the continuous time processing of the signal x(t) (digital simulation).
Continuous time processing:
$$y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau$$
Discrete time processing:
Sampling period $T_S$
$$y(nT_S) = \sum_{k=-\infty}^{\infty} x(kT_S)\, h(nT_S - kT_S)\, T_S$$
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n - k], \qquad x[n] := x(nT_S), \quad h[n] := h(nT_S)$$
(the constant factor $T_S$ is taken here to be absorbed into the scaling of h[n])
As a result of the convolution theorem (convolution in the time domain
corresponds to multiplication in the frequency domain), a band-limited
input signal gives a band-limited output signal, which is again exactly
determined by its sampled values.
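Important:
In the domain $|\omega| < \omega_S/2$ the Fourier transform of a continuous time
signal x(t) is identical with the Fourier transform of the corresponding
sampled discrete time signal $x(nT_S)$:
$$X(\omega) = \int_{-\infty}^{\infty} x(t) \exp(-j\omega t)\, dt$$
for $|\omega| \le \omega_S/2$ is identical to
$$T_S\, X_S(\omega) = T_S \sum_{n=-\infty}^{\infty} x(nT_S) \exp(-j\omega T_S\, n)
  = T_S \sum_{n=-\infty}^{\infty} x(nT_S) \exp\!\left(-j\, 2\pi \frac{\omega}{\omega_S}\, n\right)$$
Inverse Fourier transform of the discrete time signal:
$$x(nT_S) = \frac{1}{\omega_S} \int_{-\omega_S/2}^{\omega_S/2} X_S(\omega) \exp(j\omega T_S\, n)\, d\omega$$
One period:
$$-\frac{\omega_S}{2} \le \omega \le \frac{\omega_S}{2} \quad \Longleftrightarrow \quad -\pi \le 2\pi \frac{\omega}{\omega_S} \le \pi$$
The Fourier transform of a discrete time signal is periodic in $\omega$ with
the period $2\pi / T_S = \omega_S$.
The Fourier transform of a discrete time signal is
continuous in $\omega$.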
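Frequency normalization
Define the normalized frequency $\omega_N$:
$$\omega_N := 2\pi \frac{\omega}{\omega_S} = \omega\, T_S$$
Definition:
($\omega$ now denotes a normalized frequency)
Fourier transform of the discrete time signal x[n]:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{+\infty} x[n] \exp(-j\omega n)$$
Note the notation $X(e^{j\omega})$.
Inverse Fourier transform of the discrete time signal x[n]:
$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega}) \exp(j\omega n)\, d\omega$$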
2.5 Logarithmic Scale and dB
Why?
large dynamic range for the amplitude values of a signal
$$x(t) = A \cos(\omega t)$$
A := amplitude
(pressure, velocity, inclination, current, voltage, ... )
"linear" variable
$A_0$ := reference amplitude
predefined value for calibration
dB := decibel
$$A[\text{dB}] \equiv 20 \lg \frac{A}{A_0} = 10 \lg \frac{A^2}{A_0^2}, \qquad \lg \equiv \log_{10}$$
$A^2$ = "quadratic" variable = energy, intensity
because of $2^{10} = 1024 \approx 10^3$:
1 bit more = factor 2 for amplitude = 6 dB
= factor 4 for intensity
3 dB = factor 2 for intensity
Figure 2.6: Amplitude spectrum of the voiceless phoneme "s" from the word "ist" (A over f / Hz, 0 to 8000 Hz)
Figure 2.7: Logarithmic amplitude spectrum of the phoneme "s" (log A over f / Hz)
Figure 2.8: Amplitude spectrum of the voiced phoneme "ae" from the word "Ah" (A over f / Hz, 0 to 8000 Hz)
Figure 2.9: Logarithmic amplitude spectrum of the phoneme "ae" (log A over f / Hz)
Figure 2.10: Amplitude spectrum of a speech pause (A over f / Hz, 0 to 8000 Hz)
Figure 2.11: Logarithmic amplitude spectrum of a speech pause (log A over f / Hz)
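2.6 Quantization
Uniform quantization on the range $[-X_{MAX}, +X_{MAX}]$
Quantization: $\hat{x} = Q(x)$
B bits correspond to $2^B$ quantization levels
Boundaries:
$$x_0, x_1, \ldots, x_k, \ldots, x_K \qquad \text{where } K = 2^B$$
Width of one quantization level using uniform quantization:
$$\Delta = \frac{2\, X_{MAX}}{2^B}$$
Quantization error:
$$\sigma_e^2 = \int_{-\infty}^{+\infty} (\hat{x} - x)^2\, p(x)\, dx = \sum_{k=1}^{K} \int_{x_{k-1}}^{x_k} (\hat{x}_k - x)^2\, p(x)\, dx$$
for uniform quantization:
a) $x_k - x_{k-1} = \Delta = \text{const}(k)$
b) $\hat{x}_k = \frac{1}{2} (x_{k-1} + x_k)$
a uniform distribution with $p(x) = \text{const}(x)$ results in:
$$\sigma_e^2 = \sum_{k=1}^{K} \frac{\Delta^2}{12\, K} = \frac{\Delta^2}{12} = \frac{X_{MAX}^2}{3 \cdot 2^{2B}}$$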
signal-to-noise ratio in dB (general definition):
$$SNR[\text{dB}] := 10 \lg \frac{\sigma_x^2}{\sigma_n^2}$$
$\sigma_x^2$ = power of the signal x
$\sigma_n^2$ = power of the noise n
SNR = signal-to-noise ratio
signal-to-quantization noise ratio (special case):
$$SNR[\text{dB}] := 10 \lg \frac{\sigma_x^2}{\sigma_e^2}$$
$\sigma_e^2$ = power of the noise caused by quantization errors
uniform quantization using B bits:
$$SNR[\text{dB}] = 6.02\, B + 4.77 - 20 \lg \frac{X_{MAX}}{\sigma_x}$$
if the signal amplitude has a Gaussian distribution, only 0.064% of the samples
have an amplitude greater than $4\sigma_x$:
$$SNR[\text{dB}] = 6.02\, B - 7.2 \qquad \text{for } X_{MAX} = 4\sigma_x$$
2.7 Fourier Transform and z-Transform
Transfer function and Fourier transform
Eigenfunctions of discrete linear time invariant systems (analogous to the
continuous time case):
$$x[n] = e^{j\omega n}, \qquad -\infty < n < \infty$$
($\omega$ is dimensionless here)
Proof:
$$x[n] = e^{j\omega n}$$
$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, e^{j\omega (n-k)} = e^{j\omega n} \sum_{k=-\infty}^{\infty} h[k]\, e^{-j\omega k}$$
Define:
$$H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\, e^{-j\omega k}$$
Remark:
The Fourier transform of a discrete time signal has already been introduced as a
Fourier series during the derivation of the sampling theorem and the reconstruction formula (equation (2.2)).
Result:
$$y[n] = e^{j\omega n}\, H(e^{j\omega})$$
z-transform:
Fourier transform of a discrete time signal x[n]:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j\omega n}$$
periodic in $\omega$
$\omega$ is the normalized frequency, hence:
$$-\pi \le \omega < \pi$$
X is evaluated on the unit circle ($e^{j\omega}$)
Generalization: X is evaluated for any complex value z.
That results in the z-transform:
$$X(z) = \sum_{n=-\infty}^{+\infty} x[n]\, z^{-n}$$
Reasons for the z-transform:
1. analytically simpler, methods of function theory are applicable
2. better handling of the convergence problem:
convergence for a finite signal, i.e. x[n] = 0 for all $|n| > N_0$
convergence for an infinite signal depends on z
Inverse z-transform:
$$x[n] = \frac{1}{2\pi j} \oint X(z)\, z^{n-1}\, dz$$
formally: $z = e^{j\omega}$, $dz = jz\, d\omega$
$$x[n] = \frac{1}{2\pi} \int_0^{2\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$
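Example of Fourier transform and z-transform:
Truncated geometric series
$$x[n] = \begin{cases} a^n & 0 \le n \le N - 1 \\ 0 & \text{otherwise} \end{cases}$$
z-transform
$$X(z) = \sum_{n=0}^{N-1} a^n z^{-n} = \sum_{n=0}^{N-1} (a\, z^{-1})^n
  = \frac{1 - (a\, z^{-1})^N}{1 - a\, z^{-1}} = \frac{1}{z^{N-1}}\, \frac{z^N - a^N}{z - a}$$
Fourier transform
The z-transform results in the Fourier transform using the substitution
$z = e^{j\omega}$:
$$X(e^{j\omega}) = \frac{1 - a^N e^{-j\omega N}}{1 - a\, e^{-j\omega}}$$
special case for a = 1 (discrete time rectangle):
$$X(e^{j\omega}) = \exp\!\left(-j\omega\, \frac{N - 1}{2}\right) \frac{\sin\!\left(\omega \frac{N}{2}\right)}{\sin\!\left(\frac{\omega}{2}\right)}$$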
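Proof for the z-transform inversion
Statement:
$$x[k] = \frac{1}{2\pi j} \oint X(z)\, z^{k-1}\, dz$$
Cauchy integration rule
$$\frac{1}{2\pi j} \oint z^{-k}\, dz = \begin{cases} 1 & k = 1 \\ 0 & k \neq 1 \end{cases}$$
$$\frac{1}{2\pi j} \oint X(z)\, z^{k-1}\, dz = \frac{1}{2\pi j} \oint \sum_n x[n]\, z^{-n+k-1}\, dz
  = \sum_n x[n]\, \underbrace{\frac{1}{2\pi j} \oint z^{-n+k-1}\, dz}_{\neq 0 \text{ only for } n = k} = x[k]$$
Fourier:
$$z = e^{j\omega}, \qquad dz = j\, e^{j\omega}\, d\omega$$
The integration path is the unit circle because of $e^{j\omega}$.
Then:
$$x[n] = \frac{1}{2\pi j} \int_{-\pi}^{+\pi} X(e^{j\omega})\, (e^{j\omega})^{n-1}\, j\, e^{j\omega}\, d\omega
  = \frac{1}{2\pi} \int_{-\pi}^{+\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$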
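2.8 System Representation and Examples
Example 1: Difference calculation
Difference equation
$$y[n] = x[n] - x[n - n_0], \qquad n_0 \text{ integer}$$
Fourier transform gives:
$$\sum_{n=-\infty}^{\infty} y[n]\, e^{-j\omega n} = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} - \sum_{n=-\infty}^{\infty} x[n - n_0]\, e^{-j\omega n}$$
$$Y(e^{j\omega}) = X(e^{j\omega}) - \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}\, e^{-j\omega n_0}
  = X(e^{j\omega}) - e^{-j\omega n_0}\, X(e^{j\omega})$$
Then follows:
$$H(e^{j\omega}) = \frac{Y(e^{j\omega})}{X(e^{j\omega})} = 1 - e^{-j\omega n_0}$$
$$|H(e^{j\omega})|^2 = (1 - \cos(\omega n_0))^2 + \sin^2(\omega n_0)$$
$$= 1 - 2\cos(\omega n_0) + \cos^2(\omega n_0) + \sin^2(\omega n_0)$$
$$= 2\, (1 - \cos(\omega n_0))$$
[Figure: $|H(e^{j\omega})|^2$ plotted over $\omega n_0$]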
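Example 2: First order difference equation
[Block diagram: x[n] is added to the delayed output y[n-1] weighted with $\alpha$; the sum is y[n]]
$$x[n] + \alpha\, y[n - 1] = y[n] \quad \Longleftrightarrow \quad y[n] - \alpha\, y[n - 1] = x[n]$$
Method 1: Determination of the transfer function $H(e^{j\omega})$
from the impulse response h[n]:
From the equation above with y[n] = h[n] and $x[n] = \delta[n]$ follows:
$$h[n] = \delta[n] + \alpha\, h[n - 1]
  = \delta[n] + \alpha\, \delta[n - 1] + \alpha^2\, \delta[n - 2] + \cdots
  = \begin{cases} \alpha^n, & n \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
Fourier spectrum / transfer function $H(e^{j\omega})$:
$$H(e^{j\omega}) = \sum_{k=-\infty}^{+\infty} h[k]\, e^{-j\omega k}
  = \sum_{k=0}^{+\infty} \alpha^k\, e^{-j\omega k}
  = \sum_{k=0}^{+\infty} \left(\alpha\, e^{-j\omega}\right)^k
  = \frac{1}{1 - \alpha\, e^{-j\omega}} \qquad \text{for } |\alpha| < 1$$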
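Method 2: Determination of the transfer function $H(e^{j\omega})$ using the
Fourier transform of the difference equation:
Difference equation:
$$y[n] - \alpha\, y[n - 1] = x[n]$$
Fourier transform:
$$Y(e^{j\omega}) - \alpha\, e^{-j\omega}\, Y(e^{j\omega}) = X(e^{j\omega})$$
Result:
$$H(e^{j\omega}) = \frac{Y(e^{j\omega})}{X(e^{j\omega})} = \frac{1}{1 - \alpha\, e^{-j\omega}}$$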
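The first order difference equation can be run directly as a recursion. The following C sketch (function name `first_order` and the rest condition y[-1] = 0 are assumptions of this illustration) reproduces the impulse response $h[n] = \alpha^n$ when fed with a unit impulse.

```c
/* y[n] = x[n] + alpha * y[n-1], with y[-1] = 0 (system at rest). */
void first_order(const double *x, double *y, int N, double alpha)
{
    double prev = 0.0;                 /* holds y[n-1] */
    for (int n = 0; n < N; n++) {
        y[n] = x[n] + alpha * prev;
        prev = y[n];
    }
}
```

For a unit impulse at n = 0 and alpha = 0.5 the output is 1, 0.5, 0.25, 0.125, ...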
Example 3: Linear difference equations (with constant coefficients)
$$y[n] = \sum_{i=0}^{I} b[i]\, x[n - i] - \sum_{j=1}^{J} a[j]\, y[n - j]$$
z-transform:
$$Y(z) = X(z) \sum_{i=0}^{I} b[i]\, z^{-i} - Y(z) \sum_{j=1}^{J} a[j]\, z^{-j}$$
Result:
$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{i=0}^{I} b[i]\, z^{-i}}{1 + \sum_{j=1}^{J} a[j]\, z^{-j}} = \sum_{n=-\infty}^{+\infty} h[n]\, z^{-n}$$
Using the definition of H(z) we can obtain the impulse response as a
function of the coefficients of the difference equation in the above term.
Remark:
If we factorize the denominator and numerator polynomials into linear factors, we obtain (up to a constant gain factor and a power of z) a zero-pole representation of a discrete time LTI
system:
$$H(z) = \frac{\prod_{i=1}^{I} (z - v_i)}{\prod_{j=1}^{J} (z - w_j)}$$
with zeros $v_i \in \mathbb{C}$ and poles $w_j \in \mathbb{C}$.
in general:
h[n] has an infinite number of non-zero values
IIR filter: Infinite Impulse Response
but if: $a[j] \equiv 0 \quad \forall j$
h[n] is identical to zero outside of a finite interval:
$$h[n] = \begin{cases} b[n] & n = 0, \ldots, I \\ 0 & \text{otherwise} \end{cases}$$
FIR filter: Finite Impulse Response
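Example 4:
Impulse response as truncated geometric series
$$h[n] = \begin{cases} a^n & 0 \le n \le M \\ 0 & \text{otherwise} \end{cases} \qquad a \in \mathbb{R}$$
$$H(z) = \sum_{n=0}^{M} a^n z^{-n} = \frac{1 - a^{M+1}\, z^{-(M+1)}}{1 - a\, z^{-1}}$$
system operation:
$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n - k] = \sum_{k=0}^{M} a^k\, x[n - k]$$
or as difference equation (recursively):
$$y[n] - a\, y[n - 1] = x[n] - a^{M+1}\, x[n - M - 1]$$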
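For this example we consider the zero-pole representation:
$$H(z) = \frac{1 - \left(\frac{z}{a}\right)^{-(M+1)}}{1 - \left(\frac{z}{a}\right)^{-1}}$$
Zeros: $z_k = a\, e^{j \frac{2\pi k}{M+1}}$, k = 0, 1, ..., M (for a > 0)
Pole: $z_0 = a$ (cancelled by the zero $z_0 = a$)
[Figure: zeros in the complex plane (axes Re, Im) for M = 11]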
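Example 5:
Fibonacci numbers
$$h[n + 2] = h[n + 1] + h[n], \quad n \ge 0; \qquad h[0] = h[1] = 1; \qquad h[n] = 0, \quad n < 0$$
$$\begin{aligned}
H(z) &= \sum_{n=-\infty}^{\infty} h[n]\, z^{-n} \\
     &= 1 + z^{-1} + \sum_{n=0}^{\infty} h[n + 2]\, z^{-(n+2)} \\
     &= 1 + z^{-1} + \sum_{n=0}^{\infty} h[n + 1]\, z^{-(n+2)} + \sum_{n=0}^{\infty} h[n]\, z^{-(n+2)} \\
     &= 1 + z^{-1} + z^{-1} \sum_{n=0}^{\infty} h[n + 1]\, z^{-(n+1)} + z^{-2} \sum_{n=0}^{\infty} h[n]\, z^{-n} \\
     &= 1 + z^{-1} \Big( \underbrace{1 + \sum_{n=1}^{\infty} h[n]\, z^{-n}}_{H(z)} \Big) + z^{-2} \underbrace{\sum_{n=0}^{\infty} h[n]\, z^{-n}}_{H(z)} \\
     &= 1 + z^{-1} H(z) + z^{-2} H(z)
\end{aligned}$$
$$H(z)\, (1 - z^{-1} - z^{-2}) = 1$$
$$H(z) = \frac{1}{1 - z^{-1} - z^{-2}} = \frac{1}{\sqrt{5}} \left( \frac{a}{1 - a z^{-1}} - \frac{b}{1 - b z^{-1}} \right)$$
where $a = \frac{1 + \sqrt{5}}{2}$ and $b = \frac{1 - \sqrt{5}}{2}$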
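For a and b the following holds:
$$\frac{1}{1 - \left(\frac{a}{z}\right)} = \sum_{n=0}^{\infty} \left(\frac{a}{z}\right)^n = \sum_{n=0}^{\infty} a^n z^{-n}$$
That results in:
$$H(z) = \frac{1}{\sqrt{5}} \sum_{n=0}^{\infty} \left( a^{n+1} - b^{n+1} \right) z^{-n} \overset{!}{=} \sum_{n=0}^{\infty} h[n]\, z^{-n}$$
$$h[n] = \frac{1}{\sqrt{5}} \left( a^{n+1} - b^{n+1} \right)$$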
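Table 2.1: Fourier transform pairs (signal : Fourier transform)
1. $\delta[n]$ : $1$
2. $\delta[n - n_0]$ : $e^{-j\omega n_0}$
3. $1 \;\; (-\infty < n < \infty)$ : $\sum_{k=-\infty}^{\infty} 2\pi\, \delta(\omega + 2\pi k)$
4. $a^n u[n] \;\; (|a| < 1)$ : $\frac{1}{1 - a e^{-j\omega}}$
5. $u[n]$ : $\frac{1}{1 - e^{-j\omega}} + \sum_{k=-\infty}^{\infty} \pi\, \delta(\omega + 2\pi k)$
6. $(n + 1)\, a^n u[n] \;\; (|a| < 1)$ : $\frac{1}{(1 - a e^{-j\omega})^2}$
7. $\frac{r^n \sin \omega_p (n + 1)}{\sin \omega_p}\, u[n] \;\; (|r| < 1)$ : $\frac{1}{1 - 2r \cos \omega_p\, e^{-j\omega} + r^2 e^{-j2\omega}}$
8. $\frac{\sin \omega_c n}{\pi n}$ : $X(e^{j\omega}) = \begin{cases} 1, & |\omega| < \omega_c \\ 0, & \omega_c < |\omega| \le \pi \end{cases}$
9. $x[n] = \begin{cases} 1, & 0 \le n \le M \\ 0, & \text{otherwise} \end{cases}$ : $\frac{\sin[\omega (M + 1)/2]}{\sin(\omega/2)}\, e^{-j\omega M/2}$
10. $e^{j\omega_0 n}$ : $\sum_{k=-\infty}^{\infty} 2\pi\, \delta(\omega - \omega_0 + 2\pi k)$
11. $\cos(\omega_0 n + \phi)$ : $\sum_{k=-\infty}^{\infty} \pi\, [e^{j\phi}\, \delta(\omega - \omega_0 + 2\pi k) + e^{-j\phi}\, \delta(\omega + \omega_0 + 2\pi k)]$
with the unit step
$$u[n] = \begin{cases} 1, & n \ge 0 \\ 0, & n < 0 \end{cases}$$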
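2.9 Discrete Time Signal Fourier Transform Theorems
Basically there is no difference between the FT theorems for the continuous
time and the discrete time case, because summation has the same properties as integration. Only differentiation and difference calculation are not
completely analogous, because it is not possible to form a derivative in the
discrete time case.
Table 2.2: Fourier transform theorems (signal : Fourier transform)
x[n], y[n] : $X(e^{j\omega}),\; Y(e^{j\omega})$
1. $a\, x[n] + b\, y[n]$ : $a X(e^{j\omega}) + b Y(e^{j\omega})$
2. $x[n - n_d]$, $n_d$ integer : $e^{-j\omega n_d}\, X(e^{j\omega})$
3. $e^{j\omega_0 n}\, x[n]$ : $X(e^{j(\omega - \omega_0)})$
4. $x[-n]$ : $X(e^{-j\omega})$; $= X^*(e^{j\omega})$ if x[n] is real
5. $n\, x[n]$ : $j\, \frac{dX(e^{j\omega})}{d\omega}$
6. $x[n] * y[n]$ : $X(e^{j\omega})\, Y(e^{j\omega})$
7. $x[n]\, y[n]$ : $\frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\theta})\, Y(e^{j(\omega - \theta)})\, d\theta$
8. $x[n] - x[n - 1]$ : $(1 - e^{-j\omega})\, X(e^{j\omega})$, with $|1 - e^{-j\omega}|^2 = 2(1 - \cos \omega)$
Parseval theorem:
9. $\sum_{n=-\infty}^{\infty} |x[n]|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega$
10. $\sum_{n=-\infty}^{\infty} x[n]\, y^*[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, Y^*(e^{j\omega})\, d\omega$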
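Example 1 corresponding to Theorem 5:
$$X(e^{j\omega}) = \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k}$$
$$\frac{d}{d\omega} X(e^{j\omega}) = \sum_{k=-\infty}^{+\infty} x[k]\, \frac{d}{d\omega} e^{-j\omega k} = \sum_{k=-\infty}^{+\infty} x[k]\, (-jk)\, e^{-j\omega k}$$
$$j\, \frac{d}{d\omega} X(e^{j\omega}) = \sum_{k=-\infty}^{+\infty} k\, x[k]\, e^{-j\omega k}$$
$$\mathcal{F}\{n\, x[n]\} = j\, \frac{d}{d\omega} \mathcal{F}\{x[n]\}$$
Example 2 corresponding to Theorem 8:
$$\mathcal{F}\{x[n] - x[n - 1]\} = \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k} - \sum_{k=-\infty}^{+\infty} x[k - 1]\, e^{-j\omega k}$$
$$= \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k} - \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega k}\, e^{-j\omega}
  = X(e^{j\omega}) \left( 1 - e^{-j\omega} \right)$$
$$|\mathcal{F}\{x[n] - x[n - 1]\}|^2 = |\mathcal{F}\{x[n]\}|^2\, |1 - e^{-j\omega}|^2 = |\mathcal{F}\{x[n]\}|^2 \cdot 2\, (1 - \cos(\omega))$$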
2.10 Discrete Fourier Transform: DFT
The Fourier transform for discrete time signals and systems has been explained on the previous pages. For discrete time signals with finite length
there is also another Fourier representation called Discrete Fourier Transform (DFT).
The DFT plays a central role in digital signal processing.
Decisive reasons:
fast algorithms exist for the DFT calculation
(Fast Fourier Transform, FFT).
discrete frequencies $\omega_k$ can be better represented in the computer than
continuous frequencies $\omega$.
Assume a discrete time signal x[n] with finite length (see also chapter 3.2
on page 135):
$$x[n] = \begin{cases} x[n] & 0 \le n \le N - 1 \\ 0 & \text{otherwise} \end{cases}$$
Note: For a continuous time signal it is impossible in the strict sense to be
both band-limited and time-limited (truncation effect, see windowing).
The discrete time signal Fourier transform of x[n] is:
$$X(e^{j\omega}) = \sum_{n=0}^{N-1} x[n] \exp(-j\omega n)$$
$\omega$ is a continuous variable. The period is $2\pi$. Frequency discretization
is done by sampling along the frequency axis.
The Fourier transform $X(e^{j\omega})$ is evaluated at
$$\omega_k = \frac{2\pi}{N}\, k \qquad \text{where } k = 0, 1, \ldots, N - 1$$
Define:
$$X[k] := X(e^{j\omega}) \big|_{\omega = \omega_k}$$
[Figure: the N = 8 sampling points $e^{j\omega_k}$ on the unit circle (axes Re, Im)]
Discrete Fourier Transform (DFT):
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-j\, \frac{2\pi}{N}\, k\, n\right), \qquad k = 0, 1, \ldots, N - 1$$
Inverse DFT:
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \exp\!\left(j\, \frac{2\pi}{N}\, k\, n\right), \qquad n = 0, 1, \ldots, N - 1$$
Remark:
This equation can be proven by inserting the equation for X[k] into the
equation for x[n] and using the orthogonality:
$$\frac{1}{N} \sum_{n=0}^{N-1} \exp\!\left(j\, \frac{2\pi}{N}\, k\, n\right) = \begin{cases} 1 & k = m \cdot N, \; m \text{ integer} \\ 0 & \text{otherwise} \end{cases}$$
Note:
Consider the analogy between the inverse DFT (above) and the inverse
Fourier transform of a discrete time signal:
$$x[n] = \frac{1}{2\pi} \int_0^{2\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$
Under the given conditions the integral is equal to the sum
(without approximation!).
Remarks:
The DFT coefficients X[k] are not an approximation of the discrete time
signal Fourier transform $X(e^{j\omega})$. On the contrary:
$$X[k] = X(e^{j\omega}) \big|_{\omega = \omega_k}$$
The number of coefficients X[k] depends on the signal length N. A
finer sampling of the discrete time signal Fourier transform is possible
by appending zeros to the signal x[n] (zero padding).
[Figure: signal x[n] on n = 0, ..., N-1]
Interpretation of Fourier coefficients
Fourier transform $X(e^{j\omega})$ of the discrete time signal x[n]
[Figure: $|X(e^{j\omega})|$ over $\omega$]
Evaluation at N discrete sampling points
$$\omega_k = \frac{2\pi}{N}\, k$$
yields the DFT coefficients X[k].
At first k lies in the domain $k = -\frac{N}{2} + 1, \ldots, 0, \ldots, \frac{N}{2}$.
[Figure: $|X(e^{j\omega})|$ and $|X[k]|$ for k = -N/2+1, ..., N/2]
Because of the periodicity of $X(e^{j\omega})$ the coefficients X[k] can also be
obtained by shifting the sampling points with negative frequency into
the positive frequency domain (by one period).
Then k = 0, ..., N/2, ..., N-1.
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-j\, \frac{2\pi}{N}\, k\, n\right)$$
[Figure: $|X(e^{j\omega})|$ and $|X[k]|$ for k = 0, ..., N-1]
Interpretation of the coefficients for a general signal x[n]:
$k = 0$ : $f = 0$
$1 \le k \le \frac{N}{2} - 1$ : $0 < f < \frac{f_S}{2}$
$k = \frac{N}{2}$ : $f = \pm \frac{f_S}{2}$
$\frac{N}{2} + 1 \le k \le N - 1$ : $-\frac{f_S}{2} < f < 0$
Symmetry relations for real signals:
For the DFT coefficients X[k] of a real signal x[n] the following holds:
$$X[k] = X^*[N - k]$$
$$\mathrm{Re}(X[k]) = \mathrm{Re}(X[N - k])$$
$$\mathrm{Im}(X[k]) = -\mathrm{Im}(X[N - k])$$
For the amplitude spectrum $| X[k] |$ the following holds:
$$| X[k] |^2 = \mathrm{Re}^2\{X[k]\} + \mathrm{Im}^2\{X[k]\} = | X[N - k] |^2$$
Realization of DFT:
#include <math.h>

#define PI 3.14159265358979

/* x:        input signal                                  */
/* N:        length of input signal                        */
/* Xre, Xim: real and imaginary part of DFT coefficients   */
void dft (int N, float x[], float Xre[], float Xim[])
{
  int   n, k;
  float SumRe, SumIm;

  for (k=0; k<=N-1; k++)
  {
    SumRe = 0.0;
    SumIm = 0.0;
    for (n=0; n<=N-1; n++)
    {
      /* real part: +cos, imaginary part: -sin of 2*PI*k*n/N */
      SumRe += x[n]*cos(2*PI*k*n/N);
      SumIm -= x[n]*sin(2*PI*k*n/N);
    }
    Xre[k] = SumRe;
    Xim[k] = SumIm;
  }
}
Remark:
discrete realization
Reduction of the Fourier powers $e^{-\frac{2\pi j}{N} k n}$ to $e^{-\frac{2\pi j}{N} l}$ (l = 0, 1, ..., N-1)
is possible, because they are periodic (on the unit circle).
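2.11 DFT as Matrix Operation
Notation with unit roots
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-\frac{2\pi j}{N}\, k\, n\right) = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}$$
where
$$W_N := \exp\!\left(-\frac{2\pi j}{N}\right)$$
[Figure: the unit roots $W_N^0 = 1, W_N^1, W_N^2, W_N^3, \ldots$ on the unit circle for N = 12]
Periodicity of $W_N$
unit root:
$$\exp(-j\, \omega_k) = \exp\!\left(-j\, \frac{2\pi}{N}\, k\right) = (W_N)^k, \qquad W_N := \exp\!\left(-j\, \frac{2\pi}{N}\right)$$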
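Note:
1. $W_N^r = W_N^{\,r \bmod N}$
2. $W_N^{kN} = (W_N^N)^k = 1^k = 1, \qquad k \in \mathbb{Z}$
3. $W_N^2 = \left[\exp\!\left(-\frac{2\pi j}{N}\right)\right]^2 = \exp\!\left(-\frac{2\pi j}{N} \cdot 2\right) = \exp\!\left(-\frac{2\pi j}{N/2}\right) = W_{N/2}$
4. $W_N^{N/2} = \exp\!\left(-\frac{2\pi j}{N} \cdot \frac{N}{2}\right) = \exp(-j\pi) = -1$
5. $W_N^{r+N/2} = W_N^{N/2}\, W_N^r = -W_N^r, \qquad N \text{ even}$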
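DFT as matrix multiplication
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-\frac{2\pi j}{N}\, k\, n\right)
  = \sum_{n=0}^{N-1} W_N^{kn}\, x[n]
  = \sum_{n=0}^{N-1} \{W_N\}_{kn}\, x[n]$$
with the matrix $\{W_N\}$ and the matrix elements:
$$\{W_N\}_{kn} := W_N^{kn}$$
Inversion:
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \exp\!\left(\frac{2\pi j}{N}\, k\, n\right)
  = \frac{1}{N} \sum_{k=0}^{N-1} (W_N^{-1})^{kn}\, X[k]
  = \frac{1}{N} \sum_{k=0}^{N-1} \{W_N^{-1}\}_{kn}\, X[k]$$
Therefore for the matrix $\{W_N\}^{-1}$ holds:
$$\{W_N\}^{-1} := \frac{1}{N}\, \{W_N^{-1}\}$$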
DFT matrix operation: properties
DFT: invertible linear mapping
N complex signal values correspond to N complex Fourier components
N real signal values correspond to N/2 complex Fourier components
(due to symmetry)
in words:
The DFT causes no information loss in the signal.
Parseval theorem for DFT
general Fourier:
$$\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{2\pi} \int_{-\pi}^{+\pi} |X(e^{j\omega})|^2\, d\omega$$
special DFT: (recalculate for yourself!)
$$\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X[k]|^2$$
in words:
Disregarding the factor $\frac{1}{N}$, the DFT is a norm conserving (i.e. energy
conserving) transformation (mathematical terminology: unitary).
2.12 From Continuous Fourier Transform to Matrix Representation of Discrete Fourier Transform
Assumption: band-limited signal x(t)
Fourier transform of the continuous time signal x(t):
$$X(\omega) = \mathcal{F}\{x(t)\} = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \qquad (2.9)$$
For the exact reconstruction (without approximation) of the continuous
time signal from sampled values, the samples $x[n] = x(n\, T_s)$ must have
the distance of at most
$$T_s = \frac{\pi}{\omega_B} \qquad \text{(sampling theorem)}$$
This results in the Fourier transform of the discrete time signal x[n]:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} \qquad (2.10)$$
where $\omega$ is the frequency normalized on $T_s$.
Functions (2.9) and (2.10) agree in the interval $[-\omega_S/2, +\omega_S/2] = [-\omega_B, +\omega_B]$.
[Figure: $|X(\omega)|$ over $\omega$]
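Figure 2.12: Hanning window (values between 0 and 1 on n = 0, ..., N-1)
The signal x[n] is further decomposed by applying a window function w[n]
(windowing):
$$w[n] = \begin{cases} \ldots & n = 0, \ldots, N - 1 \\ 0 & \text{otherwise} \end{cases}$$
The windowed signal y[n]:
$$y[n] = w[n]\, x[n]$$
can be analyzed using the Fourier transform or the DFT:
$$Y(e^{j\omega_k}) = \sum_{n=0}^{N-1} y[n]\, e^{-j\omega_k n}$$
DFT:
$$\omega_k = \frac{2\pi k}{N} \qquad \text{where } k = 0, \ldots, N - 1$$
$$Y[k] = \sum_{n=0}^{N-1} y[n]\, e^{-\frac{2\pi j}{N} k n}$$
Matrix representation (K = N):
$$\begin{pmatrix} Y[0] \\ \vdots \\ Y[k] \\ \vdots \\ Y[K - 1] \end{pmatrix}
  = \begin{pmatrix} & \vdots & \\ \cdots & e^{-\frac{2\pi j}{N} n k} & \cdots \\ & \vdots & \end{pmatrix}
    \begin{pmatrix} y[0] \\ \vdots \\ y[n] \\ \vdots \\ y[N - 1] \end{pmatrix}$$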
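2.13 Frequency Resolution and Zero Padding
Task: a signal x[n] with finite length N is given.
Wanted: the Fourier transform $X(e^{j\omega_k})$ at
$$\omega_k = \frac{2\pi}{K}\, k, \qquad \text{where } k = 0, 1, \ldots, K - 1 \text{ and } K > N$$
Inserting the definitions:
$$X(e^{j\omega_k}) = \sum_{n=0}^{N-1} x[n] \exp\!\left(-\frac{2\pi j}{K}\, k\, n\right)
  = \sum_{n=0}^{K-1} \tilde{x}[n] \exp\!\left(-\frac{2\pi j}{K}\, k\, n\right)$$
where
$$\tilde{x}[n] = \begin{cases} x[n] & n = 0, \ldots, N - 1 \\ 0 & n = N, \ldots, K - 1 \end{cases}$$
i.e. zero padding (appending zeros).
Matrix representation:
$$\begin{pmatrix} X[0] \\ \vdots \\ X[K - 1] \end{pmatrix}
  = \left( W_K^{nk} \right)
    \begin{pmatrix} x[0] \\ \vdots \\ x[N - 1] \\ 0 \\ \vdots \\ 0 \end{pmatrix}
    \begin{matrix} n = 0 \\ \vdots \\ n = N - 1 \\ n = N \\ \vdots \\ n = K - 1 \end{matrix}$$
Note:
Zero padding does not introduce any additional information into the signal. It is only a trick so that the DFT, and particularly the FFT (Fast Fourier
Transform), can be performed with a "higher frequency resolution", i.e. a finer sampling of the same spectrum.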
2.14 Finite Convolution
Input signal and convolution kernel have finite duration
Consider the finite convolution:
Impulse response:
$$h[n] \equiv 0 \qquad \text{for } n \notin \{0, 1, 2, \ldots, N_h - 1\}$$
Input signal:
$$x[n] \equiv 0 \qquad \text{for } n \notin \{0, 1, 2, \ldots, N_x - 1\}$$
Output signal:
$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n - k] = \sum_{k=0}^{N_h - 1} h[k]\, x[n - k]$$
[Figure: h[k] on k = 0, ..., $N_h - 1$ and the mirrored input x[-k] on k = $-(N_x - 1)$, ..., 0 for n = 0]
Altogether:
$N_x + N_h - 1$ positions with overlap
Therefore only $N_x + N_h - 1$ values of the output signal can be different
from zero:
$$y[n] = \begin{cases} 0 & n > N_x + N_h - 2 \\ \ldots & n = 0, 1, \ldots, N_x + N_h - 2 \\ 0 & n < 0 \end{cases}$$
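A direct C sketch of the finite convolution, producing exactly the $N_x + N_h - 1$ output values that can be non-zero. The function name `convolve` and the argument order are assumptions of this illustration.

```c
/* y[n] = sum_{k=0}^{Nh-1} h[k] x[n-k] for n = 0..Nx+Nh-2.
 * y must provide room for Nx + Nh - 1 values. */
void convolve(const double *x, int Nx, const double *h, int Nh, double *y)
{
    for (int n = 0; n < Nx + Nh - 1; n++) {
        double sum = 0.0;
        for (int k = 0; k < Nh; k++) {
            int i = n - k;                 /* index into x */
            if (i >= 0 && i < Nx)
                sum += h[k] * x[i];
        }
        y[n] = sum;
    }
}
```

For x = (1, 2, 3) and h = (1, 1) the four output values are 1, 3, 5, 3.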
Figure 2.13: Example of a linear convolution of two finite length signals ($N_h - 1 = 12$, $N_x - 1 = 4$):
a) the two signals h[k] and x[k];
b) the signal x[n-k] for different values of n:
i) n < 0, no overlap with h[k], therefore convolution y[n] = 0
ii) n between 0 and $N_h + N_x - 2$, convolution $\neq 0$
iii) n > $N_h + N_x - 2$, no overlap with h[k], convolution y[n] = 0
c) resulting convolution y[n].
Finite convolution using DFT
$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n - k]$$
Fourier:
$$Y(e^{j\omega}) = H(e^{j\omega})\, X(e^{j\omega}), \qquad 0 \le \omega \le 2\pi$$
Also valid for the sample frequencies:
$$\omega_k := \frac{2\pi}{N}\, k, \qquad k = 0, \ldots, N - 1 \text{ for any } N$$
Notation: $Y[k] = H[k]\, X[k]$
Question:
How to choose the length N of the DFT?
Reminder: different lengths
x[n]: $N_x$ non-zero values
h[n]: $N_h$ non-zero values
y[n]: $N_y = N_x + N_h - 1$ non-zero values
Answer:
The convolution theorem is certainly correct for any N > 0.
If we want to calculate the output signal completely from Y[k],
we have to know Y[k] for
at least $N = N_x + N_h - 1$
frequency values k = 0, 1, ..., N-1.
In words: for the DFT length N the following must hold:
$$N \ge N_x + N_h - 1$$
Method:
Zero padding, i.e. appending zeros.
Note: The FFT will be introduced on the next pages. A comparison of the
costs for the realization of the finite convolution by DFT and FFT can be
found at the end of section 2.15.
2.15 Fast Fourier Transform (FFT)
Principle of FFT:
Calculation of the DFT can be done by successive decomposition into
smaller DFT calculations. In this way, the number of elementary operations (multiplications and additions) is dramatically reduced:
FFT: $N^2 \;\rightarrow\; \frac{N}{2}\, \mathrm{ld}\, N$ operations
N = 1024: $\frac{2N}{\mathrm{ld}\, N} = \frac{2 \cdot 1024}{10} \approx 200$ (factor of speed gain)
The matrix is decomposed into a product of sparse matrices; therefore an N
with many prime factors is convenient (not necessarily only powers of two).
Terminology for different variants of FFT:
in time / in frequency
in place: yes / no
radix 2 / radix 4
decomposition into prime factors instead of $N = 2^n$
History
1965 Cooley and Tukey
1942 Danielson and Lanczos
1905 Runge
1805 Gauss
Algorithms which are based on a decomposition of the signal x[n] are called
decimation-in-time algorithms.
The case $N = 2^{\nu}$ (N a power of two) is considered in the following.
$$X[k] = \sum_{n=0}^{N-1} x[n] \exp\!\left(-j\, \frac{2\pi}{N}\, k\, n\right) \qquad \text{where } k = 0, 1, \ldots, N - 1$$
$$= \sum_{n=0}^{N-1} x[n]\, W_N^{nk} \qquad \text{where } W_N^{nk} = \exp\!\left(-j\, \frac{2\pi}{N}\, k\, n\right)$$
Decomposition of the sum over n into the sums over even and odd n:
$$X[k] = \sum_{r=0}^{N/2-1} x[2r]\, W_N^{2rk} + \sum_{r=0}^{N/2-1} x[2r + 1]\, W_N^{(2r+1)k}$$
$$= \sum_{r=0}^{N/2-1} x[2r]\, (W_N^2)^{rk} + W_N^k \sum_{r=0}^{N/2-1} x[2r + 1]\, (W_N^2)^{rk}$$
Because of
$$W_N^2 = \exp\!\left(-2j\, \frac{2\pi}{N}\right) = \exp\!\left(-j\, \frac{2\pi}{N/2}\right) = W_{N/2}$$
for k = 0, ..., N-1 holds:
$$X[k] = \sum_{r=0}^{N/2-1} x[2r]\, W_{N/2}^{rk} + W_N^k \sum_{r=0}^{N/2-1} x[2r + 1]\, W_{N/2}^{rk} = G[k] + W_N^k\, H[k]$$
Each of the two sums corresponds to a DFT of length N/2.
The first sum is an N/2-DFT of the even indexed signal values x[n],
the second sum is an N/2-DFT of the odd indexed values.
The DFT of length N is obtained by combining the two N/2-DFTs
with the factor $W_N^k$.
Complexity:
The complexity $O(N^2)$ of the one-dimensional FT can be reduced by suitably
combining the values of two FTs with length $\frac{N}{2}$ and complexity $O(2\, (\frac{N}{2})^2) = O(\frac{N^2}{2})$.
By successive application of this decomposition the complexity can be reduced to $O(N \log N)$.
The case $N = 2^3 = 8$ is considered in the following.
X[4] can be obtained from H[4] and G[4] according to the previous equation.
Because of the DFT length $\frac{N}{2} = 4$:
$$H[4] = H[0] \qquad \text{and} \qquad G[4] = G[0]$$
And then:
$$X[4] = G[0] + W_N^4\, H[0]$$
The values X[5], X[6] and X[7] can be obtained analogously.
Figure 2.14: Flow diagram for the decomposition of one N-DFT into two N/2-DFTs with
N = 8 (the even samples x[0], x[2], x[4], x[6] enter one N/2-point DFT with outputs G[0], ..., G[3]; the odd samples x[1], x[3], x[5], x[7] enter the other with outputs H[0], ..., H[3]; the outputs are combined with the factors $W_N^0, \ldots, W_N^7$ to X[0], ..., X[7])
Further analogous decomposition, until only DFTs with the length
N = 2 remain (so-called butterfly operation).
Figure 2.15: Flow diagram of an 8-point FFT using butterfly operations (input order x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7]; output order X[0], ..., X[7]; branch factors $W_N^0, \ldots, W_N^3$ and -1).
Complexity reduction
Number of complex multiplications in the FFT: $(N/2)\,\mathrm{ld}\,N$.
Comparison: direct application of the DFT definition needs $N^2$ complex multiplications.
Example: $N = 1024 = 2^{10}$
\[ \frac{N^2}{(N/2)\,\mathrm{ld}\,N} = \frac{1048576}{5120} = 204.8 \approx 200 \]
Complexity reduction by a factor of about 200.

The radix-2 FFT is not minimal with respect to the number of additions; a radix-4 FFT can be better.
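The multiplication count can be reproduced with a small recursive sketch. It assumes $W_N = \exp(-2\pi j/N)$ and counts one complex multiplication per butterfly, including the trivial factors $W_N^0$; the function name `fft_recursive` and the counter argument are illustrative choices.

```python
import cmath

def fft_recursive(x, counter):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of 2.

    counter[0] accumulates the number of twiddle-factor multiplications.
    """
    N = len(x)
    if N == 1:
        return list(x)
    G = fft_recursive(x[0::2], counter)   # DFT of the even-indexed samples
    H = fft_recursive(x[1::2], counter)   # DFT of the odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * H[k]   # W_N^k * H[k]
        counter[0] += 1
        X[k] = G[k] + t             # upper butterfly output
        X[k + N // 2] = G[k] - t    # lower output, since W_N^(k+N/2) = -W_N^k
    return X

N = 1024
counter = [0]
X = fft_recursive([1.0] + [0.0] * (N - 1), counter)   # unit impulse as input
fft_mults = counter[0]            # equals (N/2) * ld N
ratio = N * N / fft_mults         # N^2 / ((N/2) ld N)
```

For N = 1024 the counter ends at 5120 and the ratio is 204.8, in line with the factor of about 200 quoted above.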
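Matrix representation of the FFT principle

The complex Fourier matrix can be decomposed into the product of
r = ld N matrices, each of them having only two non-zero elements in
each column.
The following shows the decomposition of the Fourier matrix in
the case of the inverse transformation;
w corresponds to $W_N^{-1}$:
\[ X = |w^{nk}|\, x = T_3\, T_2\, T_1\, T_S\, x \]
For N = 8 this is a decomposition into r + 1 = 4 matrices
($w^4 = -1$, $w^8 = 1$):
\[
|w^{nk}| =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & w & w^2 & w^3 & w^4 & w^5 & w^6 & w^7 \\
1 & w^2 & w^4 & w^6 & w^8 & w^{10} & w^{12} & w^{14} \\
1 & w^3 & w^6 & w^9 & w^{12} & w^{15} & w^{18} & w^{21} \\
1 & w^4 & w^8 & w^{12} & w^{16} & w^{20} & w^{24} & w^{28} \\
1 & w^5 & w^{10} & w^{15} & w^{20} & w^{25} & w^{30} & w^{35} \\
1 & w^6 & w^{12} & w^{18} & w^{24} & w^{30} & w^{36} & w^{42} \\
1 & w^7 & w^{14} & w^{21} & w^{28} & w^{35} & w^{42} & w^{49}
\end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & w & w^2 & w^3 & w^4 & w^5 & w^6 & w^7 \\
1 & w^2 & w^4 & w^6 & 1 & w^2 & w^4 & w^6 \\
1 & w^3 & w^6 & w & w^4 & w^7 & w^2 & w^5 \\
1 & w^4 & 1 & w^4 & 1 & w^4 & 1 & w^4 \\
1 & w^5 & w^2 & w^7 & w^4 & w & w^6 & w^3 \\
1 & w^6 & w^4 & w^2 & 1 & w^6 & w^4 & w^2 \\
1 & w^7 & w^6 & w^5 & w^4 & w^3 & w^2 & w
\end{pmatrix}
\]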
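Signal flow diagram

The operations which correspond to the matrix representation
of the FFT can be shown in a signal flow diagram. For N = 8:
\[
T_3 =
\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & w & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & w^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & w^3 \\
1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -w & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & -w^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & -w^3
\end{pmatrix}
\qquad
T_2 =
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & w^2 & 0 & 0 & 0 & 0 \\
1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & -w^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & w^2 \\
0 & 0 & 0 & 0 & 1 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & -w^2
\end{pmatrix}
\]
\[
T_1 =
\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1
\end{pmatrix}
\qquad
T_S =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
[Signal flow diagram: inputs x[0], ..., x[7], the stages $T_S$, $T_1$, $T_2$, $T_3$ from left to right, outputs X[0], ..., X[7].]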
Matrices $T_1$, $T_2$ and $T_3$ contain exactly two non-zero elements in each row.
The non-zero elements realize the Butterfly Operation.
Matrix $T_1$: the step width of the Butterfly Operation is 1.
The step widths can be read off:
  in the signal flow diagram
  from the distance between the non-zero elements in $T_1$, $T_2$ and $T_3$
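Butterfly Operation
The signal flow diagram and the matrix representation of the FFT are based
on the following basic operation:
[Flow graph: inputs $X_{m-1}[p]$ and $X_{m-1}[q]$; the lower branch carries the factor $W_N^r$ and, towards the lower output, the factor $-1$; outputs $X_m[p]$ and $X_m[q]$.]
For two input values $X_{m-1}[p]$ and $X_{m-1}[q]$ this operation produces
two output values $X_m[p]$ and $X_m[q]$. The output values are linear
combinations of the input values.
Because of the shape of the flow graph, the operation is called
Butterfly Operation.
\[ X_m[p] = X_{m-1}[p] + W_N^r\, X_{m-1}[q] \]
\[ X_m[q] = X_{m-1}[p] - W_N^r\, X_{m-1}[q] \]
In matrix form:
\[
\begin{pmatrix} X_m[p] \\ X_m[q] \end{pmatrix}
=
\begin{pmatrix} 1 & W_N^r \\ 1 & -W_N^r \end{pmatrix}
\begin{pmatrix} X_{m-1}[p] \\ X_{m-1}[q] \end{pmatrix}
\]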
Bit Reversal
The matrix representation of the FFT uses a sorting matrix, i.e. the
signal which is to be transformed is first re-sorted.
Example for N = 8:

  n   Binary representation   Reversed   n
  0   000                     000        0
  1   001                     100        4
  2   010                     010        2
  3   011                     110        6
  4   100                     001        1
  5   101                     101        5
  6   110                     011        3
  7   111                     111        7

Bit Reversal is a necessary part of the FFT algorithm.
Table: Bit Reversal for $N = 2^3$
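The re-sorting in the table can be sketched in a few lines. The helper names `bit_reverse` and `bit_reversed_order` are illustrative; the sketch assumes that the signal length is a power of 2.

```python
def bit_reverse(n, bits):
    """Return n with its lowest `bits` binary digits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (n & 1)   # append the lowest bit of n
        n >>= 1
    return result

def bit_reversed_order(x):
    """Re-sort x (length 2**bits) into bit-reversed index order."""
    bits = len(x).bit_length() - 1
    return [x[bit_reverse(i, bits)] for i in range(len(x))]

permutation = [bit_reverse(n, 3) for n in range(8)]
```

For N = 8, `permutation` reproduces the last column of the table, and applying the permutation twice restores the original order.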
2.16 FFT Implementation
Fortran version

C     adapted from: Oppenheim, Schafer p. 608
C **********************************************************************
      SUBROUTINE FFT_DecimationInTime (X, ld_N)
C **********************************************************************
      REAL       PI
      INTEGER    N_max
      PARAMETER  (PI = 3.14159265358979)
      PARAMETER  (N_max = 2048)

      COMPLEX    X(N_max)     ! array for input AND output
      COMPLEX    Temp         ! temporary storage
      COMPLEX    W_uni        ! root of unity
      COMPLEX    W_pow        ! powers of W_uni
      INTEGER    N, ld_N, i, ip, iq, ipbeg, j, k, i_exp, istp

      N = 2**ld_N
      IF (N.GT.N_max) STOP

C Bit Reversed Sorting *************************************************
      j = 1
      DO i = 1, N-1
        IF (i.LT.j) THEN      ! swap X(j) and X(i)
          Temp = X(j)
          X(j) = X(i)
          X(i) = Temp
        ENDIF
        k = N/2
        DO WHILE (k.LT.j)
          j = j - k
          k = k / 2
        ENDDO
        j = j + k
      ENDDO
C End of Bit Reversed Sorting ******************************************

C FFT Butterfly Operations *********************************************
      DO i = 1, ld_N
        i_exp = 2**i          ! DFT length at level i
        istp  = i_exp/2       ! step width of the butterflies
        W_pow = (1.0,0.0)
        W_uni = CMPLX (COS (PI/FLOAT(istp)), -SIN(PI/FLOAT(istp)))
        DO ipbeg = 1, istp
          DO ip = ipbeg, N, i_exp
            iq    = ip + istp
            Temp  = X(iq) * W_pow
            X(iq) = X(ip) - Temp
            X(ip) = X(ip) + Temp
          ENDDO
          W_pow = W_pow * W_uni
        ENDDO
      ENDDO
C End of FFT Butterfly Operations **************************************
      RETURN
      END
Explanations about the Fortran program

Two program parts:
1. Bit Reversal
2. Butterfly Operations
Three loops with the variables i, ipbeg and ip control the execution of
the Butterfly operations.
outer loop, i:
  i specifies the level of the FFT.
  With the exception of the first level, the Butterfly operations are nested.
  Therefore two loops are used for the Butterfly operations within
  one level.
middle loop, ipbeg:
  ipbeg runs over the nested Butterfly operations:
    i=1: ipbeg=1
    i=2: ipbeg=1,2
    i=3: ipbeg=1,2,3,4
  ipbeg specifies the sequence of starting points for the inner loop.
inner loop, ip:
  ip specifies the first element of the Butterfly operation.
  istp is the step width of the Butterfly operation.
  iq=ip+istp specifies the second element of the Butterfly operation.
  The inner loop is started once per nesting.
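For readers without a Fortran compiler, the same loop structure can be tried out with the following sketch. It is a zero-based transcription under the assumption $W_N = \exp(-2\pi j/N)$, not the routine from Oppenheim and Schafer itself; the bit-reversal part uses a slightly different but equivalent counter update.

```python
import cmath

def fft_in_place(x):
    """Iterative radix-2 decimation-in-time FFT on a list of complex numbers.

    len(x) must be a power of 2. The list x is overwritten and returned.
    """
    N = len(x)
    # Part 1: bit-reversed sorting
    j = 0
    for i in range(1, N):
        k = N >> 1
        while j & k:          # carry propagation of a bit-reversed increment
            j ^= k
            k >>= 1
        j |= k
        if i < j:
            x[i], x[j] = x[j], x[i]
    # Part 2: butterfly operations
    i_exp = 2                                   # DFT length at the current level
    while i_exp <= N:
        istp = i_exp // 2                       # step width of the butterflies
        w_uni = cmath.exp(-1j * cmath.pi / istp)
        w_pow = 1 + 0j
        for ipbeg in range(istp):               # middle loop
            for ip in range(ipbeg, N, i_exp):   # inner loop
                iq = ip + istp
                temp = x[iq] * w_pow
                x[iq] = x[ip] - temp
                x[ip] = x[ip] + temp
            w_pow *= w_uni
        i_exp *= 2
    return x
```

Note that both butterfly outputs are computed from the old value of `x[ip]`, which is why `x[iq]` is updated first.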
[Figure 2.16: Flow diagram of an 8-point FFT using Butterfly operations.]
C version (from Numerical Recipes in C)

#include <math.h>
#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

/* Replaces data[1..2*nn] by its discrete Fourier transform, if isign is
   input as 1; or replaces data[1..2*nn] by nn times its inverse discrete
   Fourier transform, if isign is input as -1. data is a complex array of
   length nn or, equivalently, a real array of length 2*nn. nn MUST be an
   integer power of 2 (this is not checked for!). */
void four(float data[], unsigned long nn, int isign)
{
    unsigned long n, mmax, m, j, istep, i;
    double wtemp, wr, wpr, wpi, wi, theta;  /* double precision for the trigonometric recurrences */
    float tempr, tempi;

    n = nn << 1;
    j = 1;
    for (i = 1; i < n; i += 2) {            /* this is the bit-reversal section of the routine */
        if (j > i) {
            SWAP(data[j], data[i]);         /* exchange the two complex numbers */
            SWAP(data[j+1], data[i+1]);
        }
        m = n >> 1;
        while (m >= 2 && j > m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
    /* Here begins the Danielson-Lanczos section of the routine. */
    mmax = 2;
    while (n > mmax) {                      /* outer loop, executed log2(nn) times */
        istep = mmax << 1;
        theta = isign * (6.28318530717959 / mmax);  /* initialise the trigonometric recurrence */
        wtemp = sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for (m = 1; m < mmax; m += 2) {     /* here are the two nested loops */
            for (i = m; i <= n; i += istep) {
                j = i + mmax;               /* this is the Danielson-Lanczos formula: */
                tempr = wr * data[j] - wi * data[j+1];
                tempi = wr * data[j+1] + wi * data[j];
                data[j] = data[i] - tempr;
                data[j+1] = data[i+1] - tempi;
                data[i] += tempr;
                data[i+1] += tempi;
            }
            wr = (wtemp = wr) * wpr - wi * wpi + wr;  /* trigonometric recurrence */
            wi = wi * wpr + wtemp * wpi + wi;
        }
        mmax = istep;
    }
}
Input and output arrays (C version)

[Figure 2.17: Input and output arrays of an FFT. a) The input array contains N (N is a power of 2) complex input values in one real array of length 2N, with alternating real and imaginary parts. b) The output array contains the complex Fourier spectrum at N frequency values, again with alternating real and imaginary parts. The array begins with the zero frequency and goes up to the highest frequency, followed by the values for the negative frequencies.]
Finite convolution: complexity by the application of FFT

Estimation of the number of multiplications necessary for a convolution of x[n]
and h[n]:
x[n]: $N_x$ non-zero values
h[n]: $N_h$ non-zero values

  Realisation                          direct implementation   DFT              FFT
  transformation                                               (Nx + Nh)^2      (Nx + Nh)/2 * log2(Nx + Nh)
  multiplication in frequency domain   Nx * Nh (in total)      Nx + Nh          Nx + Nh
  inverse transformation                                       (Nx + Nh)^2      (Nx + Nh)/2 * log2(Nx + Nh)
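As a rough worked example of these counts, the following sketch adds up the three table rows for each realisation. The function name `multiplications` and the example sizes $N_x = N_h = 512$ are assumptions made for illustration; the FFT column presumes that $N_x + N_h$ is a power of 2.

```python
import math

def multiplications(nx, nh):
    """Multiplication counts for convolving nx and nh samples, following the table."""
    n = nx + nh                                        # transform length used in the table
    direct = nx * nh
    via_dft = n * n + n + n * n                        # transformation + product + inverse
    via_fft = 2 * (n // 2) * int(math.log2(n)) + n     # n assumed to be a power of 2
    return direct, via_dft, via_fft

direct, via_dft, via_fft = multiplications(512, 512)
```

With these sizes the direct convolution needs 262144 multiplications, the DFT route 2098176 and the FFT route 11264, so only the FFT route beats the direct implementation.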
2.17 Cyclic Matrices and Fourier Transform
The Fourier transform plays a significant role for so-called cyclic matrices
(cf. chapter 3.8):
\[
H =
\begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-1} \\
h_{N-1} & h_0 & h_1 & \cdots & h_{N-2} \\
h_{N-2} & h_{N-1} & h_0 & \cdots & h_{N-3} \\
\vdots & & \ddots & \ddots & \vdots \\
h_1 & h_2 & \cdots & h_{N-1} & h_0
\end{pmatrix}
\]
with rows and columns indexed by $0, \dots, N-1$, so:
\[ H_{mn} = h_{(n-m) \bmod N} \]
with the so-called kernel vector $(h_0, h_1, \dots, h_{N-1})^T$ and $h_n \in \mathbb{C}$, mostly $h_n \in \mathbb{R}$.

Remark: using cyclic matrices it is possible to define cyclic, i.e. periodic,
convolutions and to build a cyclic variant of the system theory.
The eigenvectors of a cyclic matrix can be obtained from the columns of the
DFT matrix:
\[
\begin{pmatrix}
w^{0 \cdot 0} & \cdots & w^{n \cdot 0} & \cdots & w^{(N-1) \cdot 0} \\
\vdots & & w^{n \cdot 1} & & \vdots \\
\vdots & & \vdots & & \vdots \\
w^{0 \cdot (N-1)} & \cdots & w^{n (N-1)} & \cdots & w^{(N-1)(N-1)}
\end{pmatrix}
\qquad \text{where } w = e^{-\frac{2\pi j}{N}}
\]
The eigenvalues $\lambda_k$ of a cyclic matrix with the kernel vector $(h_0, h_1, \dots, h_{N-1})^T$
are:
\[ \lambda_k = \sum_{n=0}^{N-1} h_n\, e^{-\frac{2\pi j}{N} kn} \]
where $k = 0, 1, \dots, N-1$
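The eigenvalue statement can be checked numerically. The sketch below assumes $w = e^{-2\pi j/N}$ and uses an arbitrary illustrative kernel vector; the names `cyclic_matrix` and `check_eigenpair` are not from the lecture.

```python
import cmath

def cyclic_matrix(h):
    """Build H with H[m][n] = h[(n - m) mod N] from the kernel vector h."""
    N = len(h)
    return [[h[(n - m) % N] for n in range(N)] for m in range(N)]

def check_eigenpair(h, k):
    """Return the largest deviation |(H v_k)[m] - lambda_k * v_k[m]|."""
    N = len(h)
    w = cmath.exp(-2j * cmath.pi / N)
    v = [w ** (n * k) for n in range(N)]                 # column k of the DFT matrix
    lam = sum(h[n] * w ** (k * n) for n in range(N))     # eigenvalue lambda_k
    H = cyclic_matrix(h)
    Hv = [sum(H[m][n] * v[n] for n in range(N)) for m in range(N)]
    return max(abs(Hv[m] - lam * v[m]) for m in range(N))

h = [4.0, 1.0, -2.0, 0.5, 3.0]
max_deviation = max(check_eigenpair(h, k) for k in range(len(h)))
```

The deviation stays at rounding level for every k, i.e. each column of the DFT matrix is an eigenvector and the eigenvalues are the DFT of the kernel vector.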
The presentation follows L. Berg: Lineare Gleichungssysteme mit
Bandstruktur (page 52 ff.).
The special case of a cyclic matrix whose kernel vector h is symmetric
and real is especially interesting for many applications:
\[ h_n = h_{N-n} \quad \text{and} \quad h_n \in \mathbb{R} \]
That means that the matrix is real, symmetric and cyclic:
\[
H =
\begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_2 & h_1 \\
h_1 & h_0 & h_1 & \cdots & h_3 & h_2 \\
h_2 & h_1 & h_0 & \cdots & h_4 & h_3 \\
\vdots & & \ddots & \ddots & & \vdots \\
h_2 & h_3 & \cdots & h_1 & h_0 & h_1 \\
h_1 & h_2 & \cdots & h_2 & h_1 & h_0
\end{pmatrix}
\]
For such a matrix the following holds:

a) The eigenvalues $\lambda_k$ are:
\[ \lambda_k = \sum_{n=0}^{N-1} h_n \cos\left(\frac{2\pi kn}{N}\right) \]
b) The eigenvectors can be obtained from the columns of the discrete
cosine matrix:
\[ \left( \cos\frac{2\pi nm}{N} \right)_{m,n = 0, \dots, N-1} \]
Application: diagonalisation of covariance matrices, e.g. for coding
of image and speech signals.
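A quick numerical check of statements a) and b) is sketched below for N = 8. The kernel values are arbitrary illustrative numbers chosen to satisfy $h_n = h_{N-n}$, and the function name `symmetric_cyclic_check` is an assumption.

```python
import math

def symmetric_cyclic_check(h, k):
    """Largest deviation |(H v_k)[m] - lambda_k v_k[m]| for the cosine vector v_k.

    h must satisfy h[n] == h[N - n] for n = 1, ..., N-1.
    """
    N = len(h)
    v = [math.cos(2 * math.pi * k * n / N) for n in range(N)]
    lam = sum(h[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
    Hv = [sum(h[(n - m) % N] * v[n] for n in range(N)) for m in range(N)]
    return max(abs(Hv[m] - lam * v[m]) for m in range(N))

h = [5.0, 2.0, -1.0, 0.5, 0.25, 0.5, -1.0, 2.0]   # N = 8, h_n = h_{N-n}
worst = max(symmetric_cyclic_check(h, k) for k in range(len(h)))
```

The value of `worst` is at rounding level, so every cosine vector is an eigenvector with the eigenvalue given in a).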
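Proof:
We will prove that
\[
v_k =
\begin{pmatrix}
\cos\frac{2\pi k \cdot 0}{N} \\
\cos\frac{2\pi k \cdot 1}{N} \\
\vdots \\
\cos\frac{2\pi k (N-1)}{N}
\end{pmatrix}
\]
are eigenvectors of the given matrix with the eigenvalues $\lambda_k$.

For a symmetric cyclic matrix the following holds:
\[ H_{mn} := h_{(n-m) \bmod N} \quad \text{where} \quad h_n = h_{N-n} \]
One row results in (for odd N):
\[
\sum_{n=0}^{N-1} H_{mn}\, v_{kn}
= h_0 \cos\frac{2\pi km}{N}
+ \sum_{l=1}^{\frac{N-1}{2}} h_l \left[ \cos\frac{2\pi k(m-l)}{N} + \cos\frac{2\pi k(m+l)}{N} \right]
\]
For even N, the term for $l = N/2$ enters the sum only once.
According to the addition theorem:
\[ \cos(x+y) + \cos(x-y) = 2 \cos x \cos y \]
With $x = 2\pi km/N$ and $y = 2\pi kl/N$ it follows:
\[
\begin{aligned}
\sum_{n=0}^{N-1} H_{mn}\, v_{kn}
&= h_0 \cos\frac{2\pi km}{N} + 2 \sum_{l=1}^{\frac{N-1}{2}} h_l \cos\frac{2\pi km}{N} \cos\frac{2\pi kl}{N} \\
&= \underbrace{\left[ h_0 + 2 \sum_{l=1}^{\frac{N-1}{2}} h_l \cos\frac{2\pi kl}{N} \right]}_{\lambda_k}\; \underbrace{\cos\frac{2\pi km}{N}}_{v_{km}} \\
&= \lambda_k\, v_{km}
\end{aligned}
\]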
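Excursion: Toeplitz matrices

We consider only square real matrices:
\[ H \in \mathbb{R}^{N \times N} \quad \text{with} \quad H_{mn} \in \mathbb{R} \]
a) H is a (general) Toeplitz matrix if:
\[ H_{mn} = h_{n-m} \]
i.e.
\[
H =
\begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} \\
h_{-1} & h_0 & h_1 & \cdots & & h_{N-2} \\
h_{-2} & h_{-1} & h_0 & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & h_2 \\
h_{2-N} & & & \ddots & h_0 & h_1 \\
h_{1-N} & h_{2-N} & \cdots & h_{-2} & h_{-1} & h_0
\end{pmatrix}
\]
b) For a symmetric Toeplitz matrix the following holds:
\[ H_{mn} = h_{|n-m|} \]
i.e.
\[
H =
\begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} \\
h_1 & h_0 & h_1 & \cdots & & h_{N-2} \\
h_2 & h_1 & h_0 & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & h_2 \\
h_{N-2} & & & \ddots & h_0 & h_1 \\
h_{N-1} & h_{N-2} & \cdots & h_2 & h_1 & h_0
\end{pmatrix}
\]
c) A cyclic matrix is obtained by the special choice:
\[ H_{mn} = h_{(n-m) \bmod N} \]
\[
H =
\begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} \\
h_{N-1} & h_0 & h_1 & \cdots & & h_{N-2} \\
h_{N-2} & h_{N-1} & h_0 & \ddots & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & h_2 \\
h_2 & & & \ddots & h_0 & h_1 \\
h_1 & h_2 & \cdots & h_{N-2} & h_{N-1} & h_0
\end{pmatrix}
\]
Also valid: each cyclic matrix is a Toeplitz matrix.

d) A symmetric cyclic matrix is obtained by the following choice of
the kernel vector h:
\[ h_n = h_{N-n} \quad \text{for } n = 0, \dots, N \]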
For example, we obtain for N = 8:
H =
h0 h1 h2 h3 h4 h3 h2 h1
h1 h0 h1 h2 h3 h4 h3 h2
h2 h1 h0 h1 h2 h3 h4 h3
h3 h2 h1 h0 h1 h2 h3 h4
h4 h3 h2 h1 h0 h1 h2 h3
h3 h4 h3 h2 h1 h0 h1 h2
h2 h3 h4 h3 h2 h1 h0 h1
h1 h2 h3 h4 h3 h2 h1 h0
Chapter 3
Spectral analysis
Overview:
3.1 Features for Speech Recognition
3.2 Short Time Analysis and Windowing
3.3 Autocorrelation Function and Power Spectral Density
3.4 Spectrograms
3.5 Filter Bank Analysis
3.6 Mel Scale
3.7 Cepstrum
- Cepstrum Calculation from Filter Bank Output
- Mel Cepstrum according to Davis and Mermelstein
3.8 Statistical Interpretation of the Cepstrum Transformation
3.9 Energy in acoustic Vector
3.1 Features for Speech Recognition
Architecture of an automatic speech recognition system:
  speech signal
  -> short-time analysis, every 10 ms (using FFT)
  -> sequence of acoustic vectors
  -> pattern comparison (with the reference model for each word in the vocabulary)
  -> decision
Short time analysis:

window length: 10-40 ms
sampling period: 10-20 ms
in case of a sampling rate of 10 kHz:
  window: 100-400 samples
  sampling period (frame shift): 100-200 samples
Recommended windows:
  Hamming
  Kaiser
  Blackman
Model parameters:
  Energy, intensity (loudness)
  Fundamental frequency (pitch)
  Spectral parameters (timbre, smoothed amplitude spectrum)
Goal:
Ideally: Real features for the recognition
In practice: Data reduction, i.e. compact description
of the speech signal (amplitude spectrum)
Side effect:
Method also enables coding of speech signals using lowest possible
number of bits
Key words:
Fourier transform: wide band/narrow band, autocorrelation function
Filter bank
Cepstrum
Linear Predictive Coding (LPC) analysis
Fundamental frequency analysis
3.2 Short Time Analysis and Windowing
The DFT is defined for signals of finite duration.

Speech signal s[n]:
  quasi-stationary, i.e. its properties do not change within 20-50 ms.
Window function w[n]:
  Decomposition of the original signal s[n] into (overlapping)
  segments using a window function w[n]:
  \[ x[n] = s[n]\, w[n] \]
  where for example
  \[ w[n] = \begin{cases} 1, & |n| \le N/2 \\ 0, & \text{otherwise} \end{cases} \]
The windowed signal x[n] is analyzed with a Fourier Transform or DFT.
The multiplication of the original signal s[n] with the window function
w[n] in the time domain corresponds to the convolution of the spectrum
$S(e^{j\omega})$ of the signal with the spectrum $W(e^{j\omega})$ of the window
function in the frequency domain:
\[ X(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(e^{j\theta})\, W(e^{j(\omega-\theta)})\, d\theta \]
This convolution causes a (spectral) smearing in the frequency domain (leakage).
[Plots, two panels per window: "Window function" (time domain, samples 0 to N-1, amplitude 0 to 1) and "Impulse response" (magnitude in dB from 0 to -60 dB over the frequency range -0.5 fs to 0.5 fs), shown for the following windows: Rectangle, Triangle, Hanning, Hamming, Nuttall, Gauss, Chebyshev.]
[Figure 3.1: Example for the application of the Discrete Fourier Transform (DFT). Panels from top to bottom: Fourier Transform $S_C(\Omega)$ of a continuous time signal; frequency response $H(\Omega)$ of the anti-aliasing low-pass filter; Fourier Transform $X_C(\Omega)$ of the filtered signal; Fourier Transform $X(e^{j\omega})$ of the sampled signal; Fourier Transform $W(e^{j\omega})$ of the window function; Fourier Transform $V(e^{j\omega})$ of the windowed signal and the sampled values V[k] of the continuous spectrum obtained using the DFT.]
Properties of short-time DFT analysis

Important effects:
Picket Fence
  If too few sampled values of the continuous spectrum are available,
  spectral sampling can yield misleading results. This problem can be reduced using Zero Padding (the spacing between the coefficients S[k]
  becomes smaller, i.e. the continuous spectrum is sampled on a finer frequency grid).
Leakage: spreading of the line spectrum
  Because the window function is limited in time, a spread spectrum
  is measured instead of the spectrum of the original signal, which is unlimited in
  time. That means that the line spectrum is spread even for pure
  sinusoidal signals.
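3 examples of DFT analysis

We observe a continuous time signal x(t) composed of two
sinusoids:
\[ x(t) = A_0 \cos(\Omega_0 t) + A_1 \cos(\Omega_1 t), \qquad -\infty < t < \infty \]
Sampling according to the sampling theorem
(with negligible quantization errors) gives the
discrete time signal x[n]:
\[ x[n] = A_0 \cos(\omega_0 n) + A_1 \cos(\omega_1 n), \qquad -\infty < n < \infty \]
where $\omega_0 = \Omega_0 T_S$ and $\omega_1 = \Omega_1 T_S$.
With the window function w[n]:
\[ v[n] = A_0\, w[n] \cos(\omega_0 n) + A_1\, w[n] \cos(\omega_1 n) \]
Intermediate calculation (see also the modulation principle):
\[
v[n] = \frac{A_0}{2}\, w[n]\, e^{j\omega_0 n} + \frac{A_0}{2}\, w[n]\, e^{-j\omega_0 n}
     + \frac{A_1}{2}\, w[n]\, e^{j\omega_1 n} + \frac{A_1}{2}\, w[n]\, e^{-j\omega_1 n}
\]
Fourier Transform of the windowed signal:
\[
V(e^{j\omega}) = \frac{A_0}{2}\, W(e^{j(\omega-\omega_0)}) + \frac{A_0}{2}\, W(e^{j(\omega+\omega_0)})
               + \frac{A_1}{2}\, W(e^{j(\omega-\omega_1)}) + \frac{A_1}{2}\, W(e^{j(\omega+\omega_1)})
\]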
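Assume:
\[ \Omega_0 = \frac{2\pi}{14} \cdot 10\,\text{kHz}, \qquad \Omega_1 = \frac{4\pi}{15} \cdot 10\,\text{kHz} \]
$1/T_S = 10$ kHz, rectangle window with N = 64, $A_0 = 1$, $A_1 = 0.75$
The windowed signal v[n] for the discrete time signal x[n] is therefore:
\[
v[n] = \begin{cases}
\cos\left(\frac{2\pi}{14} n\right) + 0.75 \cos\left(\frac{4\pi}{15} n\right), & 0 \le n \le 63 \\
0, & \text{otherwise}
\end{cases}
\]
[Plot: windowed signal v[n] for n = 0, ..., 63.]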
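[Plot: Fourier Transform $W(e^{j\omega})$ of the rectangle window function; peak value 64.]
Example 1: Leakage Effect
Variation of $\Omega_0$ and $\Omega_1$, resp. $\omega_0$ and $\omega_1$.
The difference between the frequencies $\omega_0$ and $\omega_1$ is reduced gradually.
Case 1a:
\[ \Omega_0 = \frac{2\pi}{6} \cdot 10^4\,\text{Hz}, \qquad \Omega_1 = \frac{2\pi}{3} \cdot 10^4\,\text{Hz} \]
\[ \omega_0 = \Omega_0 T_S = \frac{2\pi}{6} \cdot 10^4\,\text{Hz} \cdot 10^{-4}\,\text{s} = \frac{2\pi}{6} \]
\[ \omega_1 = \Omega_1 T_S = \frac{2\pi}{3} \cdot 10^4\,\text{Hz} \cdot 10^{-4}\,\text{s} = \frac{2\pi}{3} \]
[Plot: $V(\omega)$ for $\omega_0 = 2\pi/6$, $\omega_1 = 2\pi/3$; peak value 32.]
Case 1b:
\[ \omega_0 = \frac{2\pi}{14}, \qquad \omega_1 = \frac{4\pi}{15} \]
[Plot: $V(\omega)$ with peaks at $\pm 2\pi/14$ and $\pm 4\pi/15$; peak value 32.]
Case 1c:
\[ \omega_0 = \frac{2\pi}{14}, \qquad \omega_1 = \frac{2\pi}{12} \]
[Plot: $V(\omega)$ with peaks at $2\pi/14$ and $2\pi/12$; peak value 30.]
Case 1d:
\[ \omega_0 = \frac{2\pi}{14}, \qquad \omega_1 = \frac{4\pi}{25} \]
[Plot: $V(\omega)$; peak value 40.]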
Example 2: Picket Fence Effect
The DFT gives sampled values of the spectrum of the windowed signal.
Spectral sampling can yield misleading results.
Case 2a:
Windowed signal v[n]:
\[
v[n] = \begin{cases}
\cos\left(\frac{2\pi}{14} n\right) + 0.75 \cos\left(\frac{4\pi}{15} n\right), & 0 \le n \le 63 \\
0, & \text{otherwise}
\end{cases}
\]
DFT of length N = 64 without Zero Padding
[Figure 3.2: a) signal v[n]; b) DFT spectrum V[k]; c) Fourier spectrum $V(e^{j\omega})$.]
Case 2b:
In contrast to Case 2a, the frequencies of the sinusoids are changed only
slightly.
Windowed signal v[n]:
\[
v[n] = \begin{cases}
\cos\left(\frac{2\pi}{16} n\right) + 0.75 \cos\left(\frac{2\pi}{8} n\right), & 0 \le n \le 63 \\
0, & \text{otherwise}
\end{cases}
\]
DFT of length N = 64 without Zero Padding
[Figure 3.3: a) signal v[n]; b) DFT spectrum V[k]; c) Fourier spectrum $V(e^{j\omega})$.]
Analysis of Example 2:
The appearance of the DFT spectrum is a consequence of the spectral sampling. Although in Case 2b the windowed signal v[n] contains significant frequency components apart from $\omega_0$ and $\omega_1$, they do not show up in
the DFT spectrum of length N = 64.
Using a rectangle window, the DFT of a sinusoidal signal gives sharp
spectral lines if the period N of the transformation is an integer multiple
of the signal period and no Zero Padding is applied.
Explanation for the case of a complex exponential function:
Assume the signal x[n]:
\[ x[n] = \frac{1}{N} \exp\left(j\, \frac{2\pi}{n_0}\, n\right) \]
Then:
\[ X[k] = \delta\left(k - \frac{N}{n_0}\right) \]
For the DFT of the rectangle window holds:
\[ W[k] = \frac{\sin(\pi k)}{\sin(\pi k/N)} \]
The convolution theorem for the windowed signal v[n] gives:
\[ V[k] = X[k] * W[k] = \frac{\sin\left(\pi \left(k - \frac{N}{n_0}\right)\right)}{\sin\left(\pi \left(k - \frac{N}{n_0}\right)/N\right)} \]
In case of $N/n_0 \in \mathbb{N}$, only the DFT coefficient $k = N/n_0$ is non-zero.
Example 2 (continued)
Assume the signal v[n] of Case 2b:
\[
v[n] = \begin{cases}
\cos\left(\frac{2\pi}{16} n\right) + 0.75 \cos\left(\frac{2\pi}{8} n\right), & 0 \le n \le 63 \\
0, & \text{otherwise}
\end{cases}
\]
In contrast to Case 2b, a DFT of length N = 128 is applied (Zero
Padding).
Result:
With the finer sampling, the additional frequency components that are present emerge.
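The picket fence effect of Case 2b can be reproduced with a direct DFT. The following is a small sketch; the helper name `dft_magnitudes` and the threshold `1e-6` for "clearly non-zero" are illustrative assumptions.

```python
import cmath
import math

def dft_magnitudes(v, n_dft):
    """|V[k]| of the length-n_dft DFT of v, zero-padded if n_dft > len(v)."""
    padded = list(v) + [0.0] * (n_dft - len(v))
    return [abs(sum(padded[n] * cmath.exp(-2j * cmath.pi * n * k / n_dft)
                    for n in range(n_dft)))
            for k in range(n_dft)]

# Case 2b: rectangle window of length 64
v = [math.cos(2 * math.pi * n / 16) + 0.75 * math.cos(2 * math.pi * n / 8)
     for n in range(64)]

mag_64 = dft_magnitudes(v, 64)     # no zero padding
mag_128 = dft_magnitudes(v, 128)   # zero padding to twice the length

# Indices of the coefficients that are clearly non-zero
lines_64 = [k for k, m in enumerate(mag_64) if m > 1e-6]
nonzero_128 = sum(1 for m in mag_128 if m > 1e-6)
```

Without zero padding only the four coefficients k = 4, 8, 56, 60 are non-zero, with magnitudes 32 and 24. With N = 128 the samples between these positions become visible, so `nonzero_128` is considerably larger.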
[Figure 3.4: a) DFT of length N = 64; b) DFT of length N = 128; c) Fourier spectrum $V(e^{j\omega})$.]
Example 3:
Explanation of the following illustrations:
Assume: signal of Example 2, Case 2a.
Window: a Kaiser window is applied instead of the rectangle window.
First: window length L = 64 and DFT length N = 64.
Then: window length L and DFT length N are halved.
Afterwards: for the case L = 32, the DFT length N is gradually
increased up to N = 1024 (Zero Padding).
Finally: DFT spectrum for the case N = 1024 and L = 64.
The Kaiser window is defined as:
\[
w_K[n] = \begin{cases}
\dfrac{I_0\left[\beta \left(1 - [(n-\alpha)/\alpha]^2\right)^{1/2}\right]}{I_0(\beta)}, & 0 \le n \le L-1 \\
0, & \text{otherwise}
\end{cases}
\]
where $I_0$ is the zeroth-order modified Bessel function of the first kind.
In this example:
\[ \beta = 0.8 \quad \text{and} \quad \alpha = \frac{L-1}{2} \]
The windowed signal v[n]:
\[ v[n] = w_K[n] \cos\left(\frac{2\pi}{14} n\right) + 0.75\, w_K[n] \cos\left(\frac{4\pi}{15} n\right) \]
Example 3 (continued)
DFT length N = 64, window length L = 64:
[Plots: windowed signal v[n], n = 0, ..., 63; DFT spectrum V[k], k = 0, ..., 63.]
N and L halved:
[Plots: windowed signal v[n], n = 0, ..., 31; DFT spectrum V[k], k = 0, ..., 31.]
Effect of changing the DFT length N at constant window length L = 32 (Zero
Padding):
[Plots: DFT spectra V[k] for increasing DFT length, with k up to 31, 63, 127 and 1024.]
Increasing the window length (L):
[Plot: DFT spectrum V[k] for k up to 1024.]
Example 4: Window function influence on the spectrum
[Figure 3.5: Influence of the window function. Top: speech signal (vowel "a"); middle: amplitude spectrum, 512-point FFT using a rectangle window; bottom: amplitude spectrum, 512-point FFT using a Hamming window.]
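3.3 Autocorrelation Function and Power Spectral Density
Definition of the Autocorrelation Function (ACF), analogous to the continuous
time case:
\[ R[k] := \sum_{n=-\infty}^{\infty} x[n]\, x[n+k] \]
For a signal x[n] assume (e.g. after some suitable windowing):
\[
x[n] = \begin{cases}
x[n], & 0 \le n \le N-1 \\
0, & \text{otherwise}
\end{cases}
\]
In this case the ACF is:
\[ R[k] = \sum_{n=0}^{N-1-k} x[n]\, x[n+k] \qquad \text{because } x[n] = 0 \text{ for } n < 0 \text{ and } n \ge N \]
"Triangular effect": the number of terms in R[k] decreases linearly with |k| and reaches zero at |k| = N.
[Plot: number of terms in R[k] over k, from -N to N.]
Cross correlation:
\[ R_{xy}[k] = \sum_{n=-\infty}^{\infty} x[n]\, y[n-k] \]
In contrast to convolution:
\[ O_{xy}[k] = \sum_{n=-\infty}^{\infty} x[n]\, y[k-n] \]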
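Properties of ACF:
1. $R[k] = R[-k]$
2. $R[k] \le R[0]$ for each $k \in \mathbb{N}$ (R[0]: energy, intensity)
3. If $x[n] \rightarrow R[k]$, then $\alpha\, x[n] \rightarrow \alpha^2 R[k]$
4. The intensity spectrum is the Fourier Transform of the ACF:
\[
\begin{aligned}
|X(e^{j\omega})|^2
&= X(e^{j\omega})\, \overline{X(e^{j\omega})} \\
&= \left[ \sum_{k=-\infty}^{\infty} x[k]\, e^{-j\omega k} \right] \left[ \sum_{l=-\infty}^{\infty} x[l]\, e^{j\omega l} \right] \\
&= \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} x[k]\, x[l]\, e^{-j\omega k}\, e^{j\omega l} \\
&= \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} x[k+l]\, x[l]\, e^{-j\omega k}\, e^{-j\omega l}\, e^{j\omega l} \\
&= \sum_{k=-\infty}^{\infty} \left[ \sum_{l=-\infty}^{\infty} x[k+l]\, x[l] \right] e^{-j\omega k} \\
&= \sum_{k=-\infty}^{\infty} R[k]\, e^{-j\omega k}
\end{aligned}
\]
Note:
The phase spectrum is removed.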
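5. Because of the symmetry $R[k] = R[-k]$ the DFT becomes the cosine
transform:
\[
\begin{aligned}
|X(e^{j\omega})|^2
&= \sum_{k=-\infty}^{\infty} R[k]\, e^{-j\omega k} \\
&= \sum_{k=-(N-1)}^{N-1} R[k]\, e^{-j\omega k} \\
&= R[0] + \sum_{k=1}^{N-1} R[k] \left( e^{j\omega k} + e^{-j\omega k} \right) \qquad \text{because } R[k] = R[-k] \\
&= R[0] + 2 \sum_{k=1}^{N-1} R[k] \cos(\omega k)
\end{aligned}
\]
6. The intensity spectrum $|X(e^{j\omega})|^2$ is a polynomial in $\cos(\omega)$ of degree
N - 1.
Reason:
Moivre formula:
\[
\cos(k\omega) = \cos^k(\omega) - \binom{k}{2} \cos^{k-2}(\omega) \sin^2(\omega) + \binom{k}{4} \cos^{k-4}(\omega) \sin^4(\omega) - \dots
\]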
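Properties 4 and 5 can be verified numerically for a short real signal. The sketch below uses an arbitrary illustrative signal; the function names are assumptions.

```python
import cmath
import math

def acf(x):
    """R[k] = sum_{n=0}^{N-1-k} x[n] x[n+k] for k = 0, ..., N-1."""
    N = len(x)
    return [sum(x[n] * x[n + k] for n in range(N - k)) for k in range(N)]

def power_spectrum_direct(x, omega):
    """|X(e^{j omega})|^2 from the definition of the Fourier Transform."""
    X = sum(x[n] * cmath.exp(-1j * omega * n) for n in range(len(x)))
    return abs(X) ** 2

def power_spectrum_from_acf(x, omega):
    """R[0] + 2 * sum_{k=1}^{N-1} R[k] cos(omega k)."""
    R = acf(x)
    return R[0] + 2 * sum(R[k] * math.cos(omega * k) for k in range(1, len(x)))

x = [0.5, -1.0, 2.0, 1.5, -0.25, 0.75]
omegas = [2 * math.pi * i / 50 for i in range(50)]
max_diff = max(abs(power_spectrum_direct(x, w) - power_spectrum_from_acf(x, w))
               for w in omegas)
```

Both expressions agree up to rounding at all tested frequencies, which illustrates that the intensity spectrum depends on the signal only through its ACF.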
Example 1: Spectral analysis using ACF
[Figure 3.6: Fourier Transform of a voiced speech segment (phoneme "a"): a) signal progression, b) high resolution Fourier Transform (Hamming window), c) low resolution Fourier Transform with short Hamming window (50 sampled values), d) low resolution Fourier Transform using the autocorrelation function (19 coefficients), e) low resolution Fourier Transform using the autocorrelation function (13 coefficients)]
Example 2: ACF of voiced and unvoiced speech segments
[Figure 3.7: Signal progression and autocorrelation function of voiced (left) and unvoiced (right) speech segment. Panels for the phonemes "s" and "a": speech signal, autocorrelation, autocorrelation with Hamming window.]
Example 3: Temporal progression of autocorrelation coefficients
[Figure 3.8: Temporal progression of speech signal and four autocorrelation coefficients. Speech signal: digit sequence 0861909; labelled panels include the ACF coefficient for index 3 and the ACF coefficient for index 0 (energy).]
3.4 Spectrograms
Using DFT
Wide-band:
  short time window
  in frequency domain: no resolution of the spectral fine structure
  interaction in the synchronization between time window and pitch impulses: vertical lines
Narrow-band:
  long time window
  in frequency domain: good resolution of the spectral fine structure
Example 1: speech spectrograms
[Figure 3.9: a) wide-band spectrogram: short time window, high time resolution (vertical lines), no frequency resolution; for voiced signals provides information on formant structure b) narrow-band spectrogram: long time window, no time resolution, high frequency resolution (horizontal lines); for voiced signals provides information on fundamental frequency (pitch)]
Example 2: speech spectrograms
[Figure 3.10: Wide-band and narrow-band spectrogram and speech amplitude for the sentence "Every salt breeze comes from the sea."]
3.5 Filter Bank Analysis
History:
  Decomposition of the signal using a bank of band-pass filters and
  energy calculation in each frequency band
  [Plot: transfer functions of the band-pass filters]
Today digitally:
  Digital filters:
  \[ y_k[n] = \sum_{m=-\infty}^{\infty} h_k[n-m]\, x[m], \qquad k = 1, \dots, K \]
    FIR: Finite Impulse Response
    IIR: Infinite Impulse Response (recursive filters)
  DFT (FFT) + further processing
DFT/FFT method:
  Window function
  Appending zeros for the desired resolution (zero padding)
  FFT
  Energy calculation: $|X(e^{j\omega})|$, $|X(e^{j\omega})|^2$, $\log |X(e^{j\omega})|$
  Weighted averaging for each channel, i.e. for each frequency band
DFT/FFT filter bank:
  [Plots: transfer functions of the channels]
Averaging:
  The summation over all channels should be as smooth as possible.
  Form: rectangle, triangle, trapezoid, etc.
Choosing the centre frequencies $f_k$:
  constant bandwidth:
    $\Delta f_k = \text{const.}$ for all k
    e.g. 20 channels with $\Delta f = 200$ Hz for 0-4 kHz
  constant relative bandwidth:
    $\Delta f_k / f_k = \text{const.}$ for all k
  frequency groups of the ear (total number 24):
    $f < 500$ Hz: $\Delta f = 100$ Hz
    $f \ge 500$ Hz: $\Delta f / f = 20\%$
  adjusted to vowels or sounds
3.6 Mel-frequency scale
The frequency resolution of the human ear decreases towards higher
frequencies. This empirical dependency results in the definition of the Mel
scale, which is approximately calculated as (from: Hidden Markov Toolkit,
Cambridge University Engineering Department, S. J. Young):
\[ f_{MEL} = 2595 \log_{10}\left(1 + \frac{f}{700\,\text{Hz}}\right) \]
[Plot: $f_{MEL}$ over f / Hz; f = 7000 Hz corresponds to about $f_{MEL} = 2700$.]
Compression of the high frequencies: $f \rightarrow f_{MEL}$
A filter bank with constant bandwidths can be used on the Mel scale.
[Plot: filter bank with equally wide channels on the $f_{MEL}$ axis.]
Table: MEL Scale:

  f / Hz   f_MEL
      65     100
     136     200
     213     300
     298     400
     391     500
     492     600
     603     700
     724     800
     856     900
    1000    1000
    1158    1100
    1330    1200
    1519    1300
    1724    1400
    1949    1500
    2195    1600
    2464    1700
    2757    1800
    3078    1900
    3429    2000
    3812    2100
    4230    2200
    4688    2300
    5187    2400
    5734    2500
    6331    2600
    6984    2700
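The table entries follow from the Mel formula and its inverse. The sketch below implements both directions; the choice of K = 20 channels on the band 0-4 kHz for the equidistant centre frequencies is an illustrative assumption, not a value prescribed by the lecture.

```python
import math

def hz_to_mel(f_hz):
    """f_MEL = 2595 * log10(1 + f / 700 Hz)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    """Inverse mapping, obtained by solving the formula above for f."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

# Centre frequencies in Hz of K channels spaced equidistantly on the Mel scale
# (K = 20 and the 0-4 kHz band are illustrative assumptions)
K = 20
mel_max = hz_to_mel(4000.0)
centres_hz = [mel_to_hz(mel_max * (k + 1) / (K + 1)) for k in range(K)]
```

Equidistant positions on the Mel axis map to centre frequencies whose spacing in Hz grows with frequency, which is the compression of the high frequencies described above.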
3.7 Cepstrum
The cepstrum is the Fourier series expansion of the logarithm of the spectrum.
Comparison: the autocorrelation function is the Fourier series expansion of the
non-logarithmized spectrum.
We consider:
\[ y[n] = \sum_{k=-\infty}^{\infty} h[n-k]\, x[k] \]
Goal:
Separating the kernel h[n] from the input signal x[n].
This problem is also called inversion or deconvolution.
\[ Y(e^{j\omega}) = H(e^{j\omega})\, X(e^{j\omega}) \]
Logarithm (complex):
\[ \log Y(e^{j\omega}) = \log H(e^{j\omega}) + \log X(e^{j\omega}) \]
Inverse Fourier Transform:
\[ F^{-1}\left\{ \log Y(e^{j\omega}) \right\} = F^{-1}\left\{ \log H(e^{j\omega}) \right\} + F^{-1}\left\{ \log X(e^{j\omega}) \right\} \]
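Another notation:
\[ \hat{y}[n] = \hat{x}[n] + \hat{h}[n] \]
using the definition of the cepstrum for x[n]
(analogous for y[n] and h[n]):
\[
\begin{aligned}
\hat{x}[n] &= F^{-1}\left\{ \log X(e^{j\omega}) \right\} \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n)\, \log X(e^{j\omega})\, d\omega \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n)\, \log\left[ \sum_{m} x[m] \exp(-j\omega m) \right] d\omega \\
&= C\{x[n]\}
\end{aligned}
\]
Note:
Cepstrum = artificial word derived from "spectrum"
The cepstrum is located in the time domain.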
Through the cepstrum transformation
\[ x[n] \rightarrow \hat{x}[n] = C\{x[n]\} \]
the convolution comes down to a simple addition.

In the cepstrum domain, a linear operation L (time invariance is not necessary)
on $\hat{y}[n]$ is performed separately on $\hat{h}[n]$ and $\hat{x}[n]$:
\[ y[n] = \sum_{k=-\infty}^{\infty} h[n-k]\, x[k] \]
\[ \hat{y}[n] = \hat{h}[n] + \hat{x}[n] \]
\[ L\{\hat{y}[n]\} = L\{\hat{h}[n]\} + L\{\hat{x}[n]\} \]
With the definition $G_L$ for the concatenation of the cepstrum, the operation
L, and the inverse cepstrum
\[ G_L := C^{-1} \circ L \circ C \]
we obtain
\[ G_L\{h[n] * x[n]\} = G_L\{h[n]\} * G_L\{x[n]\} . \]
Such a transformation $G_L$ acts on h[n] and x[n] separately, and is called
homomorphic (structure preserving).
Complex cepstrum:
\[ \hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n)\, \log X(e^{j\omega})\, d\omega \]
Note: complex logarithm

Simple cepstrum (real cepstrum):
\[ \hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(j\omega n)\, \log |X(e^{j\omega})|\, d\omega \]
Cepstrum: Fourier coefficients of the logarithmized power spectral
density
ACF: Fourier coefficients of the Fourier series of the power spectral density
Setting the cepstral coefficients $\hat{x}[n]$ to zero for high n results in a smoothing of
the power spectral density.
Implementation:
Fourier Transform via N-point FFT (N = 512, 1024, 2048)
(but: discretisation error):
\[ \hat{x}[n] := \frac{1}{N} \sum_{k=0}^{N-1} \log\left| X\left(e^{j \frac{2\pi}{N} k}\right) \right|\, e^{j \frac{2\pi}{N} kn} \]
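The discrete formula can be tried out on short sequences. The sketch below uses a direct DFT instead of an FFT for brevity and checks that the real cepstrum turns a circular convolution into a sum; the example sequences are arbitrary illustrative values chosen such that no DFT coefficient vanishes.

```python
import cmath
import math

def dft(x):
    """Direct DFT by definition with W_N = exp(-2*pi*j/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def real_cepstrum(x):
    """x_hat[n] = (1/N) sum_k log|X[k]| exp(j 2 pi k n / N)."""
    N = len(x)
    log_mag = [math.log(abs(Xk)) for Xk in dft(x)]   # requires X[k] != 0
    return [(sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * n / N)
                 for k in range(N)) / N).real
            for n in range(N)]

def circular_convolution(h, x):
    """y[n] = sum_k h[(n - k) mod N] x[k]."""
    N = len(x)
    return [sum(h[(n - k) % N] * x[k] for k in range(N)) for n in range(N)]

x = [1.0, 0.4, -0.2, 0.1, 0.0, 0.05, 0.0, 0.0]
h = [1.0, 0.4, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
y = circular_convolution(h, x)

c_x, c_h, c_y = real_cepstrum(x), real_cepstrum(h), real_cepstrum(y)
additivity_error = max(abs(c_y[n] - (c_x[n] + c_h[n])) for n in range(len(x)))
```

The additivity holds exactly only for the circular convolution that belongs to the DFT; for a linear convolution of long signals the discretisation error mentioned above comes into play.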
Example 1: Real cepstrum

A fine structure of the power spectral density with the period 1/T results in a
single peak in the cepstrum at time T.
[Figure 3.11: Above: logarithmized power spectrum $\log|F(\omega)|^2$ of a spoken vowel (schematic), with a fine structure of period 1/T on the frequency axis. Below: corresponding cepstrum $F^{-1}(\log|F(\omega)|^2)$ over time (inverse Fourier transform of the logarithmized power spectrum).]
Example 2: Smoothing
[Figure 3.12: Cepstral smoothing: speech signal (vowel "a"), windowed speech signal (Hamming window), spectrum obtained from the whole cepstrum (blue) and smoothed spectrum obtained from the first 13 cepstral coefficients (red).]
Example 3: Smoothing with different numbers of cepstral coefficients
[Figure 3.13: Homomorphic analysis of a speech segment (phoneme "a"): signal progression, spectrum from the whole cepstrum, homomorphically smoothed spectrum using 13 and 19 cepstral coefficients]
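Cepstrum calculation using Filter Bank Output
Filter bank outputs A[k] for $k = 1, \dots, K$
Note: k = 0 is missing.
We complete the outputs symmetrically:
[Plot: outputs $A_{-K+1}, \dots, A_{-1}, A_0, A_1, \dots, A_K$, mirrored about k = 0.5.]
Symmetry: $A_{-k+1} = A_k$ for all $k = 1, \dots, K$.
Inverse DFT a[n] of the symmetric sequence $A_{-K+1}, \dots, A_K$:
\[
\begin{aligned}
a[n] &= \frac{1}{2K} \sum_{k=-K+1}^{K} A_k \exp\left( \frac{2\pi j}{2K}\, nk \right) \\
&= \frac{1}{2K} \sum_{k=1}^{K} A_k \left[ \exp\left( \frac{2\pi j}{2K}\, nk \right) + \exp\left( \frac{2\pi j}{2K}\, n(-k+1) \right) \right] \\
&= \exp\left( \frac{2\pi j}{2K}\, 0.5\, n \right) \frac{1}{2K} \sum_{k=1}^{K} A_k \left[ \exp\left( \frac{2\pi j}{2K}\, n(k-0.5) \right) + \exp\left( -\frac{2\pi j}{2K}\, n(k-0.5) \right) \right] \\
&= \exp\left( \frac{2\pi j}{2K}\, 0.5\, n \right) \frac{1}{K} \sum_{k=1}^{K} A_k \cos\left( \frac{\pi n}{K} (k-0.5) \right)
\end{aligned}
\]
The phase term $\exp\left( \frac{2\pi j}{2K}\, 0.5\, n \right)$ results from the position of the symmetry axis
at k = 0.5.
The cepstrum is defined as:
\[ a[n] = \frac{1}{K} \sum_{k=1}^{K} A_k \cos\left( \frac{\pi n}{K} (k-0.5) \right) \]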
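The relation between the cosine transform and the inverse DFT of the symmetrically completed sequence can be checked with the following sketch. The filter bank outputs are arbitrary illustrative numbers, and the function names are assumptions.

```python
import cmath
import math

def filterbank_cepstrum(A):
    """a[n] = (1/K) sum_{k=1}^{K} A_k cos(pi n (k - 0.5) / K), n = 0, ..., K-1."""
    K = len(A)
    return [sum(A[k - 1] * math.cos(math.pi * n * (k - 0.5) / K)
                for k in range(1, K + 1)) / K
            for n in range(K)]

def symmetric_idft(A, n):
    """Inverse DFT of A_{-K+1}, ..., A_K with the completion A_{-k+1} = A_k."""
    K = len(A)
    total = 0j
    for k in range(-K + 1, K + 1):
        A_k = A[k - 1] if k >= 1 else A[-k]    # for k <= 0, A_k = A_{1-k}, stored at list index -k
        total += A_k * cmath.exp(2j * cmath.pi * n * k / (2 * K))
    return total / (2 * K)

A = [2.0, 1.5, 1.0, 0.8, 0.3, 0.1]          # illustrative outputs A_1, ..., A_K
c = filterbank_cepstrum(A)
K = len(A)
# Deviation between the inverse DFT and phase term times cosine transform
deviation = max(abs(symmetric_idft(A, n) - cmath.exp(1j * math.pi * n / (2 * K)) * c[n])
                for n in range(K))
```

The coefficient `c[0]` is the mean of the filter bank outputs, and `deviation` stays at rounding level, which confirms the phase factor derived above.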
Mel Cepstrum according to Davis and Mermelstein
[Plot: overlapping triangular filters k = 1, ..., K on the Mel frequency axis.]
Filter bank:
  overlapping band-pass filters of triangular shape,
  all channels have equal bandwidth, and the filter positioning is equidistant on a Mel scale.
Calculation of the filter bank outputs:
  magnitude of the DFT coefficients,
  for each channel summation of the magnitudes according to the triangular
  weight function,
  for each channel logarithm of the sum.
Thus the filter outputs A[k] with $k = 1, \dots, K$ are obtained. Using the
filter bank outputs, the cepstrum is calculated using a cosine transform
(see previous description).
3.8 Statistical Interpretation of the Cepstrum Transformation
We consider the filter bank outputs $\log|X_k|$.
[Plot: $\log|X_k|$ over k, up to k = N/2.]
Assumption: The correlation between the outputs s and p, i.e. the element
$C_{sp}$ of the covariance matrix, does not depend directly on s or p, but only
on their difference. Because the spectrum is periodic, there is no distance
greater than N:
\[ C_{sp} = c_{(s-p) \bmod N} \]
It is further assumed that the correlation is locally symmetric:
\[ C_{s,s+n} = C_{s,s-n} \]
Then:
\[ c_{(s-s-n) \bmod N} = c_{(s-s+n) \bmod N} \]
\[ c_{(-n) \bmod N} = c_{(+n) \bmod N} \]
With $0 \le n \le N$ follows:
\[ c_n = c_{N-n} \]
i.e. we have a symmetric cyclic matrix with the kernel vector c.
Example: the covariance matrix for N = 8:
\[
C =
\begin{pmatrix}
c_0 & c_1 & c_2 & c_3 & c_4 & c_3 & c_2 & c_1 \\
c_1 & c_0 & c_1 & c_2 & c_3 & c_4 & c_3 & c_2 \\
c_2 & c_1 & c_0 & c_1 & c_2 & c_3 & c_4 & c_3 \\
c_3 & c_2 & c_1 & c_0 & c_1 & c_2 & c_3 & c_4 \\
c_4 & c_3 & c_2 & c_1 & c_0 & c_1 & c_2 & c_3 \\
c_3 & c_4 & c_3 & c_2 & c_1 & c_0 & c_1 & c_2 \\
c_2 & c_3 & c_4 & c_3 & c_2 & c_1 & c_0 & c_1 \\
c_1 & c_2 & c_3 & c_4 & c_3 & c_2 & c_1 & c_0
\end{pmatrix}
\]
Such a covariance matrix is diagonalised by the cosine transform
(or by the Fourier Transform, which reduces to the cosine transform due to the
symmetry) (see the excursion in chapter 2.17).
3.9 Energy in acoustic Vector
The energy is usually added as zeroth (or first) component to the acoustic
vector.
For the logarithmic energy we have:
\[ \log E = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log|X(e^{j\omega})|^2\, d\omega \]
For the (short time) spectrum or cepstrum it approximately holds:
\[ \log E \approx \frac{1}{K} \sum_{k=1}^{K} \log|X_k|^2 \]
Spectra are usually normalized with log E:
\[ \log Y_k^2 = \log|X_k|^2 - \log E \]
such that:
\[ \sum_{k=1}^{K} \log Y_k^2 \approx 0 \]
The cepstral coefficient $\hat{x}[0]$ is the logarithmized energy.
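The normalization can be sketched in a few lines. The sketch approximates log E by the mean of the logarithmized channel powers, as in the approximation above; the channel values are illustrative, and the function name `normalize_log_spectrum` is an assumption.

```python
import math

def normalize_log_spectrum(power):
    """Subtract log E, approximated by the mean log power, from each channel.

    power: list of |X_k|^2 values (all > 0), k = 1, ..., K.
    Returns (log_E, [log Y_k^2]).
    """
    log_power = [math.log(p) for p in power]
    log_E = sum(log_power) / len(log_power)
    return log_E, [lp - log_E for lp in log_power]

power = [4.0, 2.5, 1.0, 0.6, 0.3, 0.1]      # illustrative values of |X_k|^2
log_E, log_Y2 = normalize_log_spectrum(power)
residual = sum(log_Y2)                      # vanishes up to rounding
```

After the normalization the overall level is carried by `log_E` alone, while the normalized spectrum only describes the spectral shape.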
Chapter 4
Fourier Transform and Image Processing
Overview:
4.1 Spatial Frequencies and Fourier Transform for Images
4.2 Discrete Fourier Transform for Images
4.3 Fourier Transform in Computer Tomography
4.4 Fourier Transform and RST Invariance
4.1 Spatial Frequencies and Fourier Transform for Images
A grey-valued image $g(x, y)$ can be interpreted as:
\[ g : \mathbb{R}^2 \to [0, \infty[, \qquad (x, y) \mapsto g(x, y) \]
Space coordinates $(x, y)$ are at first considered as continuous. Discretization and DFT will be analyzed later.
Convention:
\[ g(x, y) \equiv 0 \quad \text{outside of the image.} \]
The Fourier transform $G(f_x, f_y)$ of the image $g(x, y)$ is defined as:
\[ G(f_x, f_y) = \mathcal{F}\{(x, y) \mapsto g(x, y)\} = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, y)\, e^{-2\pi j (f_x x + f_y y)} \, dx\, dy \]
The arguments $f_x$ and $f_y$ are called spatial frequencies.
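The two-dimensional Fourier transform can be obtained by using two one-dimensional Fourier transforms.
We consider one image row with a constant value of $y$:
\[ x \mapsto g(x, y) \]
Corresponding Fourier transform $G_y(f_x)$:
\[ G_y(f_x) = \mathcal{F}\{x \mapsto g(x, y)\} = \int_{-\infty}^{+\infty} g(x, y)\, e^{-2\pi j f_x x} \, dx \]
Then we compute the Fourier transform of the function:
\[ y \mapsto G_y(f_x) \]
and obtain:
\[ \mathcal{F}\{y \mapsto G_y(f_x)\} = \int_{-\infty}^{+\infty} G_y(f_x)\, e^{-2\pi j f_y y} \, dy = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, y)\, e^{-2\pi j (f_x x + f_y y)} \, dx\, dy \]
In this way we get the result:
\[ G(f_x, f_y) = \mathcal{F}\{y \mapsto \mathcal{F}\{x \mapsto g(x, y)\}\} \]
For the inverse FT we have (as can be expected):
\[ g(x, y) = \mathcal{F}^{-1}\{(f_x, f_y) \mapsto G(f_x, f_y)\} = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} G(f_x, f_y)\, e^{+2\pi j (f_x x + f_y y)} \, df_x\, df_y \]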
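We would like to interpret the two-dimensional FT visually. For this purpose, we consider the exponential factor in the FT and require the following condition:
\[ e^{-2\pi j (f_x x + f_y y)} \overset{!}{=} 1 \]
\[ \Leftrightarrow \quad 2\pi (f_x x + f_y y) = 2\pi n \quad \text{for } n \in \mathbb{N} \]
\[ \Leftrightarrow \quad y = -\frac{f_x}{f_y}\, x + \frac{n}{f_y} \]
[Figure: the resulting family of parallel straight lines with axis intercepts $1/f_x$ and $1/f_y$ and spatial period $L$.]
\[ L = \frac{1}{\sqrt{f_x^2 + f_y^2}} \quad \text{(spatial period)} \]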
Special case:
$|G(f_x, f_y)|$ has a large value only at one point $(u, v) = (f_x, f_y)$ in the spatial frequency plane.
[Figure: $|G(f_x, f_y)|$ in the $(f_x, f_y)$ plane, with the positions $-u$ and $-v$ marked on the axes.]
Since $G(-f_x, -f_y) = \overline{G(f_x, f_y)}$ for a real image $g(x, y)$, we have two dominant frequency pairs in the Fourier transform integral:
\[ |G(u, v)| \left[ e^{2\pi j (ux + vy)} + e^{-2\pi j (ux + vy)} \right] = 2\, |G(u, v)| \cos 2\pi (ux + vy) \]
This function describes a black-white cosine wave pattern with
\[ (f_x, f_y) = (u, v) \]
Where is the value of $G(f_x, f_y)$ large?
ideally: the points $(u, v)$ and $(-u, -v)$ represent a cosine variation of the grey values
really: the straight line through $(u, v)$ and $(-u, -v)$ represents abrupt changes of the grey values
Figures 4.1 to 4.6 (from Duda & Hart 1973, pp. 310-312):
Figure 4.1: TV image (analog)
Figure 4.2: Digitized TV image
- 120 x 120 pixels
- grey values from 0 (black) to 15 (white)
Figure 4.3: Amplitude spectrum of Figure 4.2, i.e. the Fourier transform of the image shown as $\log|G(f_x, f_y)|$; black corresponds to high amplitude
note:
1. strong components along the axes correspond to vertical and horizontal image edges
2. concentration around $(f_x, f_y) = (0, 0)$ corresponds to regions with constant grey values
Figure 4.4: Low-pass filtered, with the low-pass filter:
\[ H(f_x, f_y) = [\cos(\pi f_x) \cos(\pi f_y)]^{16}, \qquad 0 \le H \le 1 \]
Figure 4.5: High-pass filtered, with the high-pass filter:
\[ H(f_x, f_y) = 1.5 - [\cos(\pi f_x) \cos(\pi f_y)]^{4}, \qquad 0.5 \le H \le 1.5 \]
Figure 4.6: High-pass enhancement:
\[ H(f_x, f_y) = 2.0 - [\cos(\pi f_x) \cos(\pi f_y)]^{4}, \qquad 1.0 \le H \le 2.0 \]
The following general rules for $G(f_x, f_y)$ ensue:
Edges in the image $g(x, y)$:
An image edge produces strong spatial frequency components along one straight line in the spatial frequency plane which is orthogonal to the edge.
The sharper the edge is, the longer is the corresponding line in the spatial frequency domain.
Regions with constant grey values:
Regions with constant grey values increase the values of $|G(f_x, f_y)|$ around the origin $(f_x, f_y) = (0, 0)$. The component at $(f_x, f_y) = (0, 0)$ is called the DC component (average grey value, DC = direct current).
4.2 Discrete Fourier Transform for Images
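The analog image $g(x, y)$ is discretized (sampled) along both axes. We obtain the discrete image:
\[ g[j, k] := g(j\, \Delta x,\; k\, \Delta y) \quad \text{where } j, k = 0, 1, \ldots, N-1 \]
Change in notation: $i = \sqrt{-1}$ instead of $j$.
\[ G\!\left(e^{\frac{2\pi i}{N} u}, e^{\frac{2\pi i}{N} v}\right) = \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} g[j, k]\, e^{-\frac{2\pi i}{N}(uj + vk)} \quad \text{where } u, v = 0, 1, \ldots, N-1 \]
The discretized transform is written as:
\[ G[u, v] = \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} g[j, k]\, e^{-\frac{2\pi i}{N}(uj + vk)} = \sum_{j=0}^{N-1} e^{-\frac{2\pi i}{N} uj} \sum_{k=0}^{N-1} g[j, k]\, e^{-\frac{2\pi i}{N} vk} \]
Interpretation:
The Fourier transform of the image is first performed row by row, then column by column.
Using the usual definition of the Fourier matrix $W$ (i.e. $W_{vk} = (e^{-\frac{2\pi i}{N}})^{vk}$), we obtain the matrix representation of the Fourier transform.
Using the notation:
\[ g \in \mathbb{R}^{N \times N}, \qquad W \in \mathbb{C}^{N \times N}, \qquad G \in \mathbb{C}^{N \times N} \]
we obtain
\[ G = W\, g\, W, \qquad g = \frac{1}{N^2}\, \overline{W}\, G\, \overline{W} \]
where $\overline{W}$ denotes the complex conjugate of $W$ (note that $W^{-1} = \overline{W} / N$).
Note: In alternative definitions of the Fourier transform, the factors 1 and $1/N^2$ are replaced by the factors $1/N$ and $1/N$, or by $1/N^2$ and 1.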
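The row-by-row, column-by-column interpretation can be sketched in code. The following is a minimal, unoptimized illustration, assuming the convention with factor 1 in the forward transform and $1/N^2$ in the inverse; the function names and the small test image are hypothetical. It compares the direct double sum with the separable computation and checks the inverse.

```python
import cmath

def dft_1d(x):
    """Unnormalised 1-D DFT: X[u] = sum_j x[j] exp(-2*pi*i*u*j/N)."""
    N = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * u * j / N) for j in range(N))
            for u in range(N)]

def dft_2d_direct(g):
    """Direct double sum G[u][v] = sum_j sum_k g[j][k] exp(-2*pi*i*(u*j+v*k)/N)."""
    N = len(g)
    return [[sum(g[j][k] * cmath.exp(-2j * cmath.pi * (u * j + v * k) / N)
                 for j in range(N) for k in range(N))
             for v in range(N)] for u in range(N)]

def dft_2d_separable(g):
    """Same transform via 1-D DFTs over k (each row), then over j (each column)."""
    N = len(g)
    rows = [dft_1d(row) for row in g]                               # rows[j][v]
    cols = [dft_1d([rows[j][v] for j in range(N)]) for v in range(N)]  # cols[v][u]
    return [[cols[v][u] for v in range(N)] for u in range(N)]

def idft_2d(G):
    """Inverse transform with factor 1/N^2 and conjugated exponent."""
    N = len(G)
    return [[sum(G[u][v] * cmath.exp(2j * cmath.pi * (u * j + v * k) / N)
                 for u in range(N) for v in range(N)) / N ** 2
             for k in range(N)] for j in range(N)]

g = [[1.0, 2.0, 0.0, 1.0],      # hypothetical 4 x 4 grey-value image
     [0.0, 3.0, 1.0, 0.0],
     [2.0, 0.0, 0.0, 4.0],
     [1.0, 1.0, 5.0, 0.0]]
G_direct = dft_2d_direct(g)
G_sep = dft_2d_separable(g)
g_back = idft_2d(G_sep)
```

The direct sum costs $O(N^4)$ operations, the separable form $O(N^3)$; replacing `dft_1d` by an FFT would reduce this further.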
4.3 Fourier Transform in Computer Tomography

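[Figure: family of straight lines $y = ax + b$ with constant slope $a$ and varying offset $b$.]
We consider a projection of the image $g(x, y)$ along the straight line:
\[ y = ax + b \]
We produce a set of straight lines by keeping $a$ constant and varying $b$.
Projection:
\[ g_a(b) = \int_{-\infty}^{+\infty} g(x, ax + b) \, dx \]
Fourier transform:
\[ G_a(f_b) = \int_{-\infty}^{+\infty} g_a(b)\, e^{-2\pi j f_b b} \, db = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, ax + b)\, e^{-2\pi j f_b b} \, db\, dx \]
We substitute $b = y - ax$ and obtain:
\[ G_a(f_b) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, y)\, e^{-2\pi j (y f_b - x a f_b)} \, dy\, dx = G(-a f_b, f_b) \]
i.e. the Fourier transform $G(f_x, f_y)$ of $g(x, y)$ along the spatial frequency straight line $(f_x, f_y)$ with $f_y = -\frac{1}{a} f_x$.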
Remarks:
a. The straight line in the spatial frequency domain, $(f_x, f_y) = (-a f_b, f_b)$, is orthogonal to $y = ax + b$:
\[ y = ax + b \quad \Rightarrow \quad \text{in the Fourier transform: } f_y = -\frac{1}{a} f_x \]
The angle between these straight lines is a right angle because $a \cdot \left(-\frac{1}{a}\right) = -1$.
In general:
\[ y_1(x) = m_1 x + b_1, \qquad y_2(x) = m_2 x + b_2, \qquad y_1(x) \perp y_2(x) \Leftrightarrow m_1 \cdot m_2 = -1 \]
b. The value $G_a(f_b)$ is independent of the offset $b$ and depends only on the orientation $a$ of the straight line. Therefore, if we calculate the projection for many different inclinations $a$ and apply the one-dimensional FT, we obtain the two-dimensional FT of the image $g(x, y)$.
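For the special case $a = 0$ (projection along the $x$ direction) the relation $G_a(f_b) = G(-a f_b, f_b)$ can be checked in the discrete setting. The sketch below is only an illustration under that assumption: it uses a hypothetical $4 \times 4$ image, sums over the first index to obtain the projection, and compares its one-dimensional DFT with the line $u = 0$ of the two-dimensional DFT. General slopes $a$ would require interpolation and are not covered.

```python
import cmath

def dft_1d(x):
    """Unnormalised 1-D DFT."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * v * k / N) for k in range(N))
            for v in range(N)]

def dft_2d(g):
    """Unnormalised 2-D DFT G[u][v], first index j ~ x, second index k ~ y."""
    N = len(g)
    return [[sum(g[j][k] * cmath.exp(-2j * cmath.pi * (u * j + v * k) / N)
                 for j in range(N) for k in range(N))
             for v in range(N)] for u in range(N)]

g = [[1.0, 2.0, 0.0, 1.0],      # hypothetical 4 x 4 image g[j][k]
     [0.0, 3.0, 1.0, 0.0],
     [2.0, 0.0, 0.0, 4.0],
     [1.0, 1.0, 5.0, 0.0]]
N = len(g)

# Projection for a = 0: sum over x (index j) for every offset b (index k)
projection = [sum(g[j][k] for j in range(N)) for k in range(N)]

P = dft_1d(projection)          # 1-D transform of the projection
G = dft_2d(g)                   # 2-D transform of the image
slice_error = max(abs(P[v] - G[0][v]) for v in range(N))
```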
4.4 Fourier Transform and RST Invariance
We will investigate invariance of the Fourier transform to
R : Rotation
S : Scaling
T : Translation
We will use vector notation for the two-dimensional Fourier transform:
coordinates: $z = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2$
image grey values: $g(z) = g(x, y) \in \mathbb{R}^+$
spatial frequency: $f = \begin{pmatrix} f_x \\ f_y \end{pmatrix} \in \mathbb{R}^2$
We ignore the discretization.
Translation
\[ z \to z + z_0 \quad \text{with translation vector } z_0 = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} \in \mathbb{R}^2 \]
Image: $g(z) \to \tilde{g}(z) := g(z + z_0)$
FT: $\tilde{G}(f) = \exp(2\pi i [f_x x_0 + f_y y_0])\, G(f)$
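The translation rule can be illustrated on the discrete transform, where the shift has to be taken cyclically. This is a minimal sketch under that assumption, with a hypothetical $4 \times 4$ image and shift $(j_0, k_0)$: the DFT of the shifted image equals the original DFT multiplied by a pure phase factor, so the squared absolute value is unchanged.

```python
import cmath

def dft_2d(g):
    """Unnormalised 2-D DFT G[u][v] = sum_j sum_k g[j][k] exp(-2*pi*i*(u*j+v*k)/N)."""
    N = len(g)
    return [[sum(g[j][k] * cmath.exp(-2j * cmath.pi * (u * j + v * k) / N)
                 for j in range(N) for k in range(N))
             for v in range(N)] for u in range(N)]

def cyclic_shift(g, j0, k0):
    """Shifted image g~[j][k] = g[(j + j0) mod N][(k + k0) mod N]."""
    N = len(g)
    return [[g[(j + j0) % N][(k + k0) % N] for k in range(N)] for j in range(N)]

g = [[1.0, 2.0, 0.0, 1.0],      # hypothetical 4 x 4 image
     [0.0, 3.0, 1.0, 0.0],
     [2.0, 0.0, 0.0, 4.0],
     [1.0, 1.0, 5.0, 0.0]]
N = len(g)
j0, k0 = 1, 3                   # hypothetical translation vector

G = dft_2d(g)
G_shifted = dft_2d(cyclic_shift(g, j0, k0))

# Phase relation: G~[u][v] = exp(+2*pi*i*(u*j0 + v*k0)/N) * G[u][v]
phase_error = max(
    abs(G_shifted[u][v] - cmath.exp(2j * cmath.pi * (u * j0 + v * k0) / N) * G[u][v])
    for u in range(N) for v in range(N))

# Translation invariance of the squared absolute value
power_error = max(abs(abs(G_shifted[u][v]) ** 2 - abs(G[u][v]) ** 2)
                  for u in range(N) for v in range(N))
```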
Rotation
\[ z \to D z \quad \text{with rotation matrix } D = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \]
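Scaling
\[ z \to \alpha z \quad \text{with scaling factor } \alpha > 0 \]
Image: $g(z) \to \tilde{g}(z) = g(\alpha z)$
FT: $\tilde{G}(f) = \frac{1}{\alpha^2}\, G\!\left(\frac{f}{\alpha}\right)$
basically: similarity principle (p. ??) for the one-dimensional FT, transferred to two dimensions
Invertible linear mapping
\[ z \to A z \quad \text{where } A \in \mathbb{R}^{2 \times 2} \text{ is invertible} \]
Image: $g(z) \to \tilde{g}(z) = g(Az)$
FT: $\tilde{G}(f) = \ldots = \frac{1}{|\det(A)|}\, G\!\left((A^{-1})^T f\right)$
Proof: transformation of the two-dimensional integration variables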
We apply two basic rules to obtain the RST-invariance:
1. Invariance to translation (= T) can be obtained by using the square of the absolute value:
\[ g(z) \to G(f) \to |G(f)|^2 \]
2. To obtain RS-invariance we change to polar coordinates in the spatial frequency domain. We write in complex notation:
\[ f_z := f_x + i f_y = r e^{i\varphi} = \exp(\ln r + i\varphi) \in \mathbb{C} \]
Complex logarithm:
\[ \tilde{f}_z := \ln f_z = \ln r + i\varphi \]
We already know:
a) rotation by angle $\varphi_0$ in the spatial domain corresponds to rotation by angle $\varphi_0$ in the spatial frequency domain
b) scaling with factor $\alpha$ in the spatial domain corresponds to scaling with the factors $\frac{1}{\alpha}$ and $\frac{1}{\alpha^2}$, respectively, in the spatial frequency domain
For the frequency $f'_z$ obtained from $f_z$ by scaling and rotation, the complex logarithm becomes:
\[ \tilde{f}'_z = \ln f'_z = \ln r' + i\varphi' = \ln\frac{r}{\alpha} + i(\varphi - \varphi_0) = \ln r + i\varphi - \ln\alpha - i\varphi_0 = \tilde{f}_z - \ln\alpha - i\varphi_0 \]
i.e. a translation with the shift vector $(-\ln\alpha - i\varphi_0) \in \mathbb{C}$ in logarithmic polar coordinates of the spatial frequency plane.
RST-invariant features can therefore be obtained as follows:
image: $g(x, y)$ with $(x, y) \in \mathbb{R}^2$
first Fourier transform: $G(f_x, f_y) = \mathcal{F}\{g(x, y)\}$ with $(f_x, f_y) \in \mathbb{R}^2$
squared absolute value: $|G(f_x, f_y)|^2$
logarithmic polar coordinates $(\ln r, \varphi)$: $|G(\ln r, \varphi)|^2$
second Fourier transform: $\mathcal{F}(|G(\ln r, \varphi)|^2)$
squared absolute value: $|\mathcal{F}(|G(\ln r, \varphi)|^2)|^2$
The following figure analyses the RST-invariant features. The original grey-valued images are identical up to a 90° rotation.
[Figure: original image (axes $x$, $y$); |FFT| (axes $f_x$, $f_y$); log-polar representation (axis $\ln r$); |FFT| of the log-polar representation.]
Warning:
a) Invariant observations are not necessarily good for classification.
b) Observations that are calculated using the two-dimensional Fourier
transform are not complete, i.e. the original image cannot be reconstructed completely.
Chapter 5
LPC Analysis
Overview:
5.1 Principle of LPC Analysis
5.2 LPC: Covariance Method
5.3 LPC: Autocorrelation Method
5.4 LPC: Interpretation in Frequency Domain
5.5 LPC: Generative Model
5.6 LPC: Alternative Representations
The acronym LPC stands for Linear Predictive Coefficients / Coding. It is used in signal processing and frequency analysis, as well as in signal coding.
5.1 Principle of LPC Analysis
[Figure: signal samples over time; the value at time $n$ is predicted from the preceding samples.]
We consider a discrete time signal $x[n]$, possibly multiplied with a window function. The goal of an LPC analysis is to predict each signal value $x[n]$ by its preceding values $x[n-1], x[n-2], \ldots, x[n-K]$. We distinguish:
$x[n]$: signal value
$\hat{x}[n]$: predicted value
We assume the predicted value $\hat{x}[n]$ to be a linear combination of the preceding values of $x[n]$:
\[ \hat{x}[n] := \sum_{k=1}^{K} \alpha_k\, x[n-k] \]
with at first unknown coefficients $\alpha_k$, $k = 1, \ldots, K$, which are called LPC coefficients or prediction coefficients.
The value $K$ is called prediction order, e.g. $K = 8, \ldots, 10$ at a sampling frequency of 4 kHz (about 2 coefficients per kHz).
Outlook
Starting point: coding in time domain (goal: bit reduction)
Parseval Theorem: leads to a parametric model for the power spectrum of the Fourier transform (more exactly: the rough structure of the power spectrum for a speech signal)
LPC analysis applications:
speech coding (ADPCM = adaptive differential pulse code modulation)
signal processing: parametric modelling with autoregressive or all-pole models (order K)
time curves: resonance and oscillator curves, sun spots, stock-market course, ...
image coding
also: interpretation as Maximum Entropy Approach
The coefficients $\alpha_k$ are unknown at first. To estimate these, we define the prediction error for each point $n$ in time:
\[ e[n] := x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{K} \alpha_k\, x[n-k] \]
For a reliable set of LPC coefficients we calculate the squared error criterion $E$ as sum of the squared prediction errors $e[n]$:
\[ E = \sum_n e^2[n] = \sum_n \left[ x[n] - \sum_{k=1}^{K} \alpha_k\, x[n-k] \right]^2 \]
which is to be minimized with respect to $\alpha_1, \ldots, \alpha_k, \ldots, \alpha_K$.
Taking the derivative $\frac{\partial}{\partial \alpha_l}$ for $l = 1, \ldots, K$ results in:
\[ \sum_n \left[ x[n] - \sum_k \alpha_k\, x[n-k] \right] x[n-l] \overset{!}{=} 0 \]
\[ \Rightarrow \quad \sum_k \alpha_k \sum_n x[n-k]\, x[n-l] = \sum_n x[n-l]\, x[n] \]
Here, the summation limits are not specified on purpose.
If the squared error criterion $E$ is considered as a function of the LPC coefficients, the following properties ensue:
$E$ is quadratic in $\alpha_1, \ldots, \alpha_k, \ldots, \alpha_K$; it is guaranteed to be nonnegative and it has a single well-defined minimum.
The optimal LPC coefficients are invariant to linear scaling of the signal values $x[n]$.
Minimization of the squared error criterion with respect to the LPC coefficients is achieved either by taking the derivative or by completing the square (verify this yourself!). The following linear equation system for the LPC coefficients $\alpha_k$ ensues:
\[ l = 1, \ldots, K : \quad \sum_{k=1}^{K} \alpha_k \sum_n x[n-k]\, x[n-l] = \sum_n x[n-l]\, x[n] \]
with still unspecified summation limits over $n$. We consider two methods for the choice of summation limits:
1. covariance method
2. autocorrelation method
Warning: terminology is not consistent.
5.2 LPC: Covariance Method
[Figure: covariance method; known values and predicted value within the analysis interval ending at $N-1$.]
No window function is applied, such that we obtain the following summation limits:
\[ \sum_n e^2[n] = \sum_{n=0}^{N-1} e^2[n] \]
i.e. we also use signal values $x[n]$ with $n < 0$ for prediction.
The resulting equation system for the LPC coefficients:
\[ l = 1, \ldots, K : \quad \sum_{k=1}^{K} \alpha_k\, \varphi(l, k) = \varphi(l, 0) \]
with the definition:
\[ \varphi(l, k) := \sum_{n=0}^{N-1} x[n-l]\, x[n-k] \]
For the above terms the following holds:
they describe a kind of cross correlation between two signals
they are similar to a covariance matrix
Computational complexity for solving the equation system:
$O(K^3) + O(N K)$
The autocorrelation method has a more favorable complexity: $O(K^2)$
but: the calculation of the auto/cross-correlation function dominates
In contrast to the covariance method, the autocorrelation method offers an interpretation in the frequency domain and is therefore often preferred.
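The covariance method can be sketched directly from the equation system. The code below is a minimal illustration, not a reference implementation: the function names, the plain Gaussian elimination and the test signal (a hypothetical second-order recursion without excitation) are assumptions. The `history` argument carries the $K$ samples with $n < 0$ that the method needs.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting, O(K^3)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lpc_covariance(history, frame, K):
    """history: samples x[-K..-1]; frame: samples x[0..N-1].
    Returns the LPC coefficients [alpha_1, ..., alpha_K]."""
    assert len(history) == K
    x = list(history) + list(frame)          # x[n] is stored at index n + K
    N = len(frame)

    def phi(l, k):
        # phi(l, k) = sum_{n=0}^{N-1} x[n-l] x[n-k]
        return sum(x[n - l + K] * x[n - k + K] for n in range(N))

    A = [[phi(l, k) for k in range(1, K + 1)] for l in range(1, K + 1)]
    b = [phi(l, 0) for l in range(1, K + 1)]
    return solve_linear(A, b)

# Hypothetical test signal: x[n] = 1.5 x[n-1] - 0.7 x[n-2], no excitation
signal = [0.0, 1.0]
for _ in range(20):
    signal.append(1.5 * signal[-1] - 0.7 * signal[-2])
alpha = lpc_covariance(signal[:2], signal[2:], K=2)
```

Because the test signal is perfectly predictable from two preceding values, the prediction error vanishes and the estimated coefficients reproduce the recursion coefficients up to rounding.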
5.3 LPC: Autocorrelation Method
[Figure: window function over the analysis interval ending at $N-1$.]
We consider the signal after multiplication with a convenient window function, usually a Hamming window.
In principle, the summation limits now are
\[ \sum_n e^2[n] = \sum_{n=-\infty}^{+\infty} e^2[n] . \]
Since, due to windowing, the signal $x[n]$ is identical to zero outside the window function, i.e.
\[ x[n] \equiv 0 \quad \text{for } n < 0 \text{ or } N-1 < n \]
we obtain the following for the prediction error $e[n]$:
\[ e[n] \equiv 0 \quad \text{for } n < 0 \text{ or } N-1+K < n \]
Therefore, the total error $E$ becomes:
\[ E = \sum_{n=0}^{N+K-1} e^2[n] \]
The prediction error $e[n]$ can become large on the window function boundaries:
- Beginning: prediction from zeros
- End: prediction of zeros
Inserting the summation limits:
\[ \sum_n x[n-k]\, x[n-l] = R(|l-k|), \]
where
\[ R(|l|) = \sum_n x[n]\, x[n-l] = \sum_{n=0}^{N-1-|l|} x[n]\, x[n+|l|] \]
In this way we obtain the following equation system for the LPC coefficients $\alpha_k$:
\[ l = 1, \ldots, K : \quad \sum_{k=1}^{K} \alpha_k\, R(|l-k|) = R(l) \]
or in matrix form:
\[
\begin{pmatrix}
R(0) & R(1) & \cdots & R(K-1) \\
R(1) & R(0) & \cdots & R(K-2) \\
\vdots & \vdots & \ddots & \vdots \\
R(K-1) & R(K-2) & \cdots & R(0)
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_K \end{pmatrix}
=
\begin{pmatrix} R(1) \\ R(2) \\ \vdots \\ R(K) \end{pmatrix}
\]
Note that this equation system is completely determined by the autocorrelation coefficients
\[ R(0), \ldots, R(k), \ldots, R(K). \]
Hence, the LPC coefficients
\[ \alpha_1, \ldots, \alpha_k, \ldots, \alpha_K \]
are obtained purely by converting the autocorrelation coefficients.
The matrix of this equation system has the following properties:
- Toeplitz structure (follows from time invariance)
- solution: Durbin algorithm with complexity $O(K^2)$
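The Durbin algorithm is only named in the notes, so the following sketch uses the standard Levinson-Durbin recursion as commonly found in the literature; it is an illustration under that assumption, with hypothetical function names. It solves $\sum_k \alpha_k R(|l-k|) = R(l)$ in $O(K^2)$ operations and also returns the remaining squared error.

```python
def autocorrelation(x, K):
    """R(l) = sum_{n=0}^{N-1-l} x[n] x[n+l] for l = 0..K (x is the windowed signal)."""
    N = len(x)
    return [sum(x[n] * x[n + l] for n in range(N - l)) for l in range(K + 1)]

def levinson_durbin(R, K):
    """Solve the Toeplitz system sum_k alpha_k R(|l-k|) = R(l), l = 1..K.
    Returns ([alpha_1, ..., alpha_K], minimal squared error)."""
    alpha = [0.0] * (K + 1)          # alpha[1..K]; index 0 is unused
    error = R[0]
    for i in range(1, K + 1):
        # reflection (PARCOR-type) coefficient of order i
        acc = R[i] - sum(alpha[k] * R[i - k] for k in range(1, i))
        refl = acc / error
        new_alpha = alpha[:]
        new_alpha[i] = refl
        for k in range(1, i):
            new_alpha[k] = alpha[k] - refl * alpha[i - k]
        alpha = new_alpha
        error *= (1.0 - refl * refl)
    return alpha[1:], error

# Hypothetical autocorrelation values R(0), R(1), R(2)
alpha_a, error_a = levinson_durbin([1.0, 0.5, 0.25], 2)
alpha_b, error_b = levinson_durbin([2.0, 1.0, 0.0], 2)
R_example = autocorrelation([1.0, 2.0, 3.0], 2)
```

For the second example the result can be checked by hand: the system $2\alpha_1 + \alpha_2 = 1$, $\alpha_1 + 2\alpha_2 = 0$ has the solution $\alpha_1 = 2/3$, $\alpha_2 = -1/3$.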
5.4 LPC: Interpretation in Frequency Domain
With the LPC autocorrelation method, the prediction error can be converted from the time domain into the frequency domain using the Parseval theorem, so that LPC analysis can be interpreted as the adaptation of a parametric model spectrum to the observed signal spectrum.
We start with the prediction error $e[n]$:
\[ e[n] = x[n] - \sum_{k=1}^{K} \alpha_k\, x[n-k] \]
and apply the z-transform to this equation. The z-transform is restricted to the unit circle:
\[ z = e^{j\omega} \]
For the z-transforms $E(z)$ and $X(z)$ we obtain:
\[ E(z) = X(z) \left[ 1 - \sum_{k=1}^{K} \alpha_k\, z^{-k} \right] \]
The total error $E_{tot}$ for the squared error criterion becomes:
\[ E_{tot} = \sum_{n=0}^{N+K-1} e^2[n] = \frac{1}{2\pi} \int_{-\pi}^{+\pi} |E(e^{j\omega})|^2 \, d\omega \quad \text{(Parseval Theorem)} \]
\[ = \frac{1}{2\pi} \int_{-\pi}^{+\pi} \left| 1 - \sum_{k=1}^{K} \alpha_k\, e^{-j\omega k} \right|^2 |X(e^{j\omega})|^2 \, d\omega \]
\[ = \frac{1}{2\pi} \int_{-\pi}^{+\pi} |P(e^{j\omega})|^2\, |X(e^{j\omega})|^2 \, d\omega \]
with the so-called predictor polynomial:
\[ P(e^{j\omega}) := 1 - \sum_{k=1}^{K} \alpha_k\, e^{-j\omega k} \]
The squared absolute value of the predictor polynomial
\[ |P(e^{j\omega})|^2 = \left| 1 - \sum_{k=1}^{K} \alpha_k\, e^{-j\omega k} \right|^2 = \ldots = \sum_{k=0}^{K} B_k \cos(k\omega) \]
(with suitable coefficients $B_k$ resulting from the predictor coefficients) is a polynomial with respect to $\cos(\omega)$, which can be obtained via application of trigonometric transformations.
The predictor polynomial tries to compensate for $|X(e^{j\omega})|^2$, especially at maxima, and to generate a white spectrum for the prediction error $e[n]$.
The complex predictor polynomial $P(z)$ with $z \in \mathbb{C}$ has exactly $K$ zeros $z_k$ in the complex plane and can therefore be factorised into linear factors:
\[ P(z) = z^{-K} \prod_{k=1}^{K} (z - z_k) \]
Observations:
These zeros are real or occur in complex conjugated pairs because $\alpha_k \in \mathbb{R}$.
The zeros can cause minima of $|P(e^{j\omega})|^2$. The minima of $|P(e^{j\omega})|^2$ approximately correspond to the maxima of the smoothed spectrum $|X(e^{j\omega})|^2$, because for minimization of the error integral it is first of all necessary to compensate for the maxima of the signal spectrum.
The LPC analysis can therefore be used to describe the formant structure of the speech signal.
[Figure: $|P(e^{j\omega})|^2$ and $|X(e^{j\omega})|^2$ plotted over $\omega$.]
Figure 5.1: LPC analysis of one speech segment
a) signal progression (Hamming window), b) prediction error (K = 12), c) LPC spectrum with K = 12 coefficients, d) spectrum of the prediction error (K = 12), e) LPC spectrum with K = 18 coefficients
Figure 5.2: LPC spectra for different prediction orders K
(panels: Hamming-windowed segment, amplitude spectrum with Hamming window, LPC spectra with 4, 8, 12, 16, 18 and 20 coefficients)
5.5 LPC: Generative Model
[Figure: recursive filter with coefficients $\alpha_k$, input $e[n]$ and output $x[n]$.]
For the prediction error $e[n]$ and its z-transform the following holds:
\[ e[n] = x[n] - \sum_{k=1}^{K} \alpha_k\, x[n-k] \]
\[ E(z) = X(z) - \sum_{k=1}^{K} \alpha_k\, X(z)\, z^{-k} = X(z) \left[ 1 - \sum_{k=1}^{K} \alpha_k\, z^{-k} \right] \]
If we consider the prediction error as input signal, we can also interpret the LPC approach as a generative model which generates an output signal $x[n]$ from an adequate input signal $e[n]$:
\[ x[n] = e[n] + \sum_{k=1}^{K} \alpha_k\, x[n-k] . \]
For the signal spectrum $X(z)$ the following holds:
\[ X(z) = \frac{E(z)}{1 - \sum_{k=1}^{K} \alpha_k\, z^{-k}} \]
This model is called autoregressive model. The excitation has to be chosen such that $E(z)$ is white, i.e. it does not have fine structure due to the fundamental frequency (pitch frequency).
In other words:
\[ E(z) = G = \text{const.} \quad \text{(gain)} \]
Special case:
\[ e[n] = G\, \delta[n] \]
Then for the LPC model spectrum $X(z)$ the following holds:
\[ X(z) = \frac{G}{1 - \sum_{k=1}^{K} \alpha_k\, z^{-k}} \]
This spectrum is often interpreted as the LPC model spectrum $X(z)$ of the observed signal. It is reasonable to set (without explanation):
\[ G^2 = R(0) - \sum_{k=1}^{K} \alpha_k\, R(k) = R(0) \left[ 1 - \sum_{k=1}^{K} \alpha_k\, \frac{R(k)}{R(0)} \right] \]
This LPC model spectrum does not have any zeros, it has only poles, and is therefore also called all-pole model.
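The model spectrum can be evaluated directly from the LPC coefficients and the autocorrelation values. The following is a minimal sketch with hypothetical function names and example values ($K = 1$, $R(0) = 1$, $R(1) = 0.5$, $\alpha_1 = 0.5$); it computes the gain $G^2 = R(0) - \sum_k \alpha_k R(k)$ and the power spectrum $G^2 / |P(e^{j\omega})|^2$.

```python
import cmath
import math

def lpc_gain_squared(R, alpha):
    """G^2 = R(0) - sum_{k=1}^{K} alpha_k R(k)."""
    return R[0] - sum(a * R[k] for k, a in enumerate(alpha, start=1))

def predictor_polynomial(alpha, omega):
    """P(e^{j omega}) = 1 - sum_k alpha_k exp(-j omega k)."""
    return 1.0 - sum(a * cmath.exp(-1j * omega * k)
                     for k, a in enumerate(alpha, start=1))

def lpc_power_spectrum(alpha, gain_sq, omega):
    """All-pole model power spectrum |X(e^{j omega})|^2 = G^2 / |P(e^{j omega})|^2."""
    return gain_sq / abs(predictor_polynomial(alpha, omega)) ** 2

R = [1.0, 0.5]                  # hypothetical autocorrelation values R(0), R(1)
alpha = [0.5]                   # corresponding LPC coefficient for K = 1
gain_sq = lpc_gain_squared(R, alpha)
spectrum_at_0 = lpc_power_spectrum(alpha, gain_sq, 0.0)
spectrum_at_pi = lpc_power_spectrum(alpha, gain_sq, math.pi)
```

In this example the model spectrum has its maximum at $\omega = 0$, where $|P(e^{j\omega})|^2$ has its minimum, in line with the interpretation given in section 5.4.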
Remarks:
stability problems when solving the equation system (caused by truncation errors in the autocorrelation)
way out: preemphasis through difference calculation
absolute rule for the choice of the order K:
1 formant needs 2 LPC coefficients
1 formant per kHz
+ excitation pulse shape + radiation: 2 LPC coefficients
rule of thumb (i.e. K = 2 x bandwidth in kHz + 2):
bandwidth 4 kHz: K = 10
bandwidth 5 kHz: K = 12
bandwidth 6 kHz: K = 14
5.6 LPC: Alternative Representations
so far:
$G$: gain
$\alpha_k$: LPC coefficients
alternative representations:
impulse response of the generative model
impulse response of the squared absolute value of the predictor polynomial
cepstrum
poles / zeros of the synthesis model / predictor polynomial, corresponding to formants / bandwidths (problem: noise susceptible)
PARCOR coefficients: partial correlation
Area coefficients: cross-section surfaces $A_k$
reflection coefficients, related to PARCOR; tube model
[Figure: tube model of the vocal tract with cross-section surfaces $A_1, \ldots, A_5$ from the glottis to the lips.]
Chapter 6
Outlook: Wavelet Transform
Overview:
6.1 Motivation: from Fourier to Wavelet Transform
6.2 Definition
6.3 Discrete Wavelet Transform
6.1 Motivation: from Fourier to Wavelet Transform
The Fourier transform uses infinitely extended basis functions
\[ e^{j\omega t} \]
and therefore does not have any time resolution, i.e. there is no information about the localization along the time axis.
Therefore, a window function is often used:
\[ t \mapsto w(t), \quad \text{complex in general} \]
This function has finite support such that it is possible to investigate a segment of a function of interest.
We define a short-time Fourier transform $F_b(\omega)$ of the time signal $t \mapsto f(t)$ at position $b \in \mathbb{R}$ in time:
\[ F_b(\omega) := \int_{-\infty}^{+\infty} f(t)\, \overline{w(t-b)}\, e^{-j\omega t} \, dt \]
where $\overline{w(t)}$ denotes the complex conjugated value of $w(t)$.
The wavelet transform can be derived from this equation in two steps:
a) we ignore the basis function $e^{j\omega t}$ and we define the window function as the new basis function.
b) in addition to the localization parameter $b$, we also introduce a scaling parameter $a > 0$.
We consider the family of window functions $w_{ab}(t)$:
\[ w_{ab}(t) := w\!\left(\frac{t-b}{a}\right) \]
6.2 Definition
The following notation is usually used for the wavelet transform:
\[ \psi_{ab}(t) := \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) \]
with the so-called mother wavelet $t \mapsto \psi(t)$.
Like the window function for the short-time Fourier transform, the mother wavelet should be localized as much as possible.
Example:
Mexican-Hat Function:
\[ \psi(t) = (1 - t^2)\, e^{-\frac{1}{2} t^2} \]
[Figure: plot of $\psi(t)$.]
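A few properties of the Mexican-hat function can be checked numerically. The sketch below is illustrative only: the midpoint-rule integrator, the integration limits and the function names are assumptions, as is the normalization $\psi_{ab}(t) = \frac{1}{\sqrt{a}}\psi(\frac{t-b}{a})$. It confirms that the mother wavelet has zero mean and that the energy of $\psi_{ab}$ does not depend on $a$ and $b$.

```python
import math

def mexican_hat(t):
    """Mother wavelet psi(t) = (1 - t^2) exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def wavelet_ab(t, a, b):
    """Scaled and shifted wavelet psi_ab(t) = psi((t - b) / a) / sqrt(a), a > 0."""
    return mexican_hat((t - b) / a) / math.sqrt(a)

def integrate(f, lo, hi, steps):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

# Zero mean of the mother wavelet
mean = integrate(mexican_hat, -10.0, 10.0, 20000)

# Energy of the mother wavelet and of a scaled, shifted copy (a = 2, b = 1)
energy_mother = integrate(lambda t: mexican_hat(t) ** 2, -10.0, 10.0, 20000)
energy_scaled = integrate(lambda t: wavelet_ab(t, 2.0, 1.0) ** 2, -40.0, 40.0, 80000)
```

The zero mean is what makes the function usable as a wavelet: its spectrum vanishes at frequency zero.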
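The wavelet transform of $f(t)$ with respect to $\psi(t)$ is defined as:
\[ F(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \overline{\psi\!\left(\frac{t-b}{a}\right)} \, dt \]
with scaling parameter $a > 0$ and localization parameter $b \in \mathbb{R}$.
For the inverse transformation the following holds:
\[ f(t) = \frac{1}{C} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \frac{da\, db}{a^2}\, F(a, b)\, \psi_{ab}(t) \]
with
\[ C := \int_{0}^{\infty} \frac{|\Psi(\omega)|^2}{\omega} \, d\omega < \infty \]
where $\Psi(\omega)$ is the Fourier transform of $\psi(t)$.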
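Proof (principle only):
The proof uses the (generalized) Parseval Theorem:
\[ F(a, b) = \int_{-\infty}^{+\infty} f(t)\, \overline{\psi_{ab}(t)} \, dt = c \int_{-\infty}^{+\infty} F(\omega)\, \overline{\Psi_{ab}(\omega)} \, d\omega \]
with
\[ \psi_{ab}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) \]
where $F(\omega)$ and $\Psi_{ab}(\omega)$ are the Fourier transforms of $f(t)$ and $\psi_{ab}(t)$, and the constant $c$ depends on the normalization of the Fourier transform ($c = \frac{1}{2\pi}$ for the transform kernel $e^{-j\omega t}$ without prefactor), using further conversions.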
6.3 Discrete Wavelet Transform
For the scaling parameter $a > 0$ we choose:
\[ a = a_0^m \quad \text{where } a_0 > 1 \text{ and } m \in \mathbb{Z}. \]
The values of $m$ determine the width of the wavelet $\psi_{ab}(t)$.
In order to adjust the localization parameter properly, we define:
\[ b = n\, b_0\, a_0^m \quad \text{where } b_0 > 0 \text{ and } n \in \mathbb{Z}. \]
Thus we constrain the wavelet transform to discrete values:
\[ F(a, b) \to F(m, n) \]
The choice of the function $\psi(t)$ is still open.
It is useful to choose $\psi(t)$ such that the function system $\{\psi_{mn} \mid m, n \in \mathbb{Z}, m > 0\}$ with
\[ \psi_{mn}(t) := a_0^{-\frac{m}{2}}\, \psi(a_0^{-m} t - n b_0) \]
represents an orthonormal basis for functions $t \mapsto f(t) \in L^2(\mathbb{R})$.
Note: The scalar product $\langle f(t), g(t) \rangle$ of two functions $f(t)$ and $g(t)$ is defined as:
\[ \langle f(t), g(t) \rangle = \int_{-\infty}^{+\infty} f(t)\, \overline{g(t)} \, dt \]
In this way we obtain the following representation for the discrete wavelet transform:
\[ F(m, n) = \int_{-\infty}^{+\infty} f(t)\, a_0^{-\frac{m}{2}}\, \overline{\psi(a_0^{-m} t - n b_0)} \, dt = \langle f(t), \psi_{mn}(t) \rangle \]
Due to the orthogonality it is possible to convert the integral of the inverse wavelet transform into an infinite series:
\[ f(t) = \frac{1}{C} \sum_m \sum_n F(m, n)\, \psi_{mn}(t) = \frac{1}{C} \sum_m \sum_n F(m, n)\, a_0^{-\frac{m}{2}}\, \psi(a_0^{-m} t - n b_0) \]
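Example:
Haar function and Haar basis
special choice: $a_0 = 2$, $b_0 = 1$
The Haar function is defined as:
\[ \psi(t) = \begin{cases} 1 & 0 \le t < \frac{1}{2} \\ -1 & \frac{1}{2} \le t < 1 \\ 0 & \text{otherwise} \end{cases} \]
This defines the Haar basis $\{\psi_{mn}(t) \mid m, n \in \mathbb{Z}, m > 0\}$:
\[ \psi_{mn}(t) = \frac{1}{\sqrt{2^m}}\, \psi(2^{-m} t - n) \]
It is easy to see that $m$ determines the resolution (the support of $\psi_{mn}$ has width $2^m$) and that $n$ determines the localization in time.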
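The orthonormality of the Haar basis can be checked numerically for a few index pairs. The sketch below assumes the form $\psi_{mn}(t) = 2^{-m/2}\, \psi(2^{-m} t - n)$; the function names, the integration interval $[0, 16)$ and the step size are hypothetical choices. Because all breakpoints lie on the integration grid for $m \ge 1$, the midpoint sum is exact up to rounding.

```python
def haar(t):
    """Haar function: 1 on [0, 1/2), -1 on [1/2, 1), 0 otherwise."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def haar_mn(t, m, n):
    """Basis function psi_mn(t) = 2^(-m/2) psi(2^(-m) t - n), support [n 2^m, (n+1) 2^m)."""
    return 2.0 ** (-m / 2.0) * haar(2.0 ** (-m) * t - n)

def inner_product(m1, n1, m2, n2, lo=0.0, hi=16.0, step=0.125):
    """Scalar product <psi_m1n1, psi_m2n2> by a midpoint sum over [lo, hi)."""
    steps = int((hi - lo) / step)
    return sum(haar_mn(lo + (i + 0.5) * step, m1, n1) *
               haar_mn(lo + (i + 0.5) * step, m2, n2)
               for i in range(steps)) * step

norm_10 = inner_product(1, 0, 1, 0)      # same function: expected 1
norm_21 = inner_product(2, 1, 2, 1)      # same function: expected 1
shifted = inner_product(1, 0, 1, 1)      # same scale, different position: expected 0
scales = inner_product(1, 0, 2, 0)       # different scales: expected 0
```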
Chapter 7
Coding
The following types of coding are distinguished:
source coding (data compression)
goal: transmission (storage) using as few bits as possible without or
with few errors
channel coding
goal: preferably faultless data transmission (storage)
e.g. error-recognizing and error-correcting codes
simultaneous source and channel coding
goal: simultaneous optimization
The following data types are distinguished:
discrete alphabet
continuous signal (audio, video, . . . )
Source coding
lossless coding (compression)
usually discrete sources, e.g. text compression
lossy coding
usually continuous signals
notation (rate-distortion theory):
distortion: error
rate: bit rate
Three effects can be utilized for signal coding:
a) statistical redundancy and correlation:
samples are not independent.
b) perceptive properties of the receiver (ear and eye):
some fine structures in the signal are irrelevant to the receiver
c) signal distortion:
coded signal differs from the original signal without significant quality deterioration.
[Figure: coding chain: signal, T, Q, C, transmission, C^-1, Q^-1, T^-1, reconstructed signal]
T: transformation, e.g. DCT
Q: quantization, e.g. vector quantization
C: mapping of bit representation
References:
Ze-Nian Li: CMPT 365 Multimedia Systems. Simon Fraser University, British Columbia, Canada, fall 1999, Version Jan. 2000; http://www.cs.sfu.ca/CourseCentral/365/li/index_prev.html.
Peter Noll: MPEG Digital Audio Coding. IEEE Signal Processing Magazine, pp. 59-81, Sep. 1997.
Thomas Sikora: MPEG Digital Video-Coding Standards. IEEE Signal Processing Magazine, pp. 82-100, Sep. 1997.
A. Ortega, K. Ramchandran: Rate-Distortion Methods for Image and Video Compression. IEEE Signal Processing Magazine, pp. 23-50, Nov. 1998.
G. J. Sullivan, Th. Wiegand: Rate-Distortion Optimization for Video Compression. IEEE Signal Processing Magazine, pp. 74-90, Nov. 1998.
Chapter 8
Image Segmentation and Contour-Finding
The lecture notes for this chapter are available as a separate document.