
Time-Domain Methods for

Speech Processing


Contents
Introduction
Time-Dependent Processing of Speech
Short-Time Energy and Average Magnitude
Short-Time Average Zero Crossing Rate
Speech vs. Silence Discrimination Using Energy and
Zero-Crossing
The Short-Time Autocorrelation Function
The Short-Time Average Magnitude Difference
Function

Introduction
Speech Processing Methods
Time-Domain Methods:
Operate directly on the waveform of the speech signal.

Frequency-Domain Methods:
Operate on some form of spectral representation.
Time-Domain Measurements
Average zero-crossing rate, energy, and the
autocorrelation function.
Very simple to implement.
Provide a useful basis for estimating
important features of the speech signal, e.g.,
Voiced/unvoiced classification
Pitch estimation

Time-Dependent
Processing of Speech
Time-Dependent Nature of Speech

[Figure: speech example for the utterance "This is a test."]
Short-Time Behavior of Speech
Assumption
The properties of speech signal change
slowly with time.

Analysis Frames
Short segments of the speech signal.
Frames usually overlap one another.
Time-Dependent Analyses
Analyzing each frame may produce either a
single number, or a set of numbers, e.g.,
Energy (a single number)
Vocal tract parameters (a set of numbers)

This will produce a new time-dependent sequence.
General Form

$Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)$

n: Frame index
x(m): Speech signal
T[ ]: A linear or nonlinear transformation.
w(m): A window function (finite or infinite).
$Q_n$ is a sequence of local weighted average values of the sequence $T[x(m)]$.
Example

Energy: $E = \sum_{m=-\infty}^{\infty} x^2(m)$

Short-Time Energy: $E_n = \sum_{m=n-N+1}^{n} x^2(m)$
Example

$T[x(m)] = x^2(m)$

$w(m) = \begin{cases} 1, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$

Short-Time Energy: $E_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m) = \sum_{m=n-N+1}^{n} x^2(m)$
General Short-Time-Analysis Scheme

[Block diagram: x(n) → Linear Filter → T[ ] → Lowpass Filter → Q_n. The lowpass filter depends on the choice of window.]

Short-Time Energy and
Average Magnitude
Applications
Silence Detection
Segmentation

Lip Sync

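Short-Time Energy

$E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2$
$\quad = \sum_{m=-\infty}^{\infty} x^2(m)\, w^2(n-m)$
$\quad = \sum_{m=-\infty}^{\infty} x^2(m)\, h(n-m)$, where $h(n) = w^2(n)$
$\quad = x^2(n) * h(n)$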
Short-Time Average Magnitude

$M_n = \sum_{m=-\infty}^{\infty} |x(m)|\, w(n-m)$
$\quad = |x(n)| * w(n)$
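The two definitions above can be sketched directly as sums. This is a minimal illustration, not an optimized implementation: the function names and the toy signal are made up for this example, and samples before the start of the signal are assumed to be zero.

```python
def short_time_energy(x, w):
    """E_n = sum_m [x(m) * w(n - m)]^2 for every n in range(len(x))."""
    N = len(w)
    energy = []
    for n in range(len(x)):
        total = 0.0
        # w(n - m) is nonzero only for n - N + 1 <= m <= n
        for m in range(max(0, n - N + 1), n + 1):
            total += (x[m] * w[n - m]) ** 2
        energy.append(total)
    return energy


def short_time_magnitude(x, w):
    """M_n = sum_m |x(m)| * w(n - m) for every n in range(len(x))."""
    N = len(w)
    magnitude = []
    for n in range(len(x)):
        total = 0.0
        for m in range(max(0, n - N + 1), n + 1):
            total += abs(x[m]) * w[n - m]
        magnitude.append(total)
    return magnitude


# Toy signal (hypothetical): a quiet stretch followed by a loud one.
x = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
w = [1.0, 1.0, 1.0, 1.0]  # rectangular window, N = 4

E = short_time_energy(x, w)
M = short_time_magnitude(x, w)
```

On this toy input the energy rises from about 0.04 (n = 3) to 4.0 (n = 7), a ratio of 100, while the average magnitude rises from about 0.4 to 4.0, a ratio of 10.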
Block Diagram Representation

Short-time energy: x(n) → [ ]^2 → x^2(n) → h(n) → E_n, where $h(n) = w^2(n)$
Short-time average magnitude: x(n) → | | → |x(n)| → w(n) → M_n

What is the effect of windows?
The Effects of Windows
Window length

Window function
Rectangular Window

$h(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$

$H(e^{j\omega}) = \dfrac{\sin(\omega N/2)}{\sin(\omega/2)}\, e^{-j\omega(N-1)/2}$
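As a quick numerical check of the closed form for the rectangular window, the sketch below compares it with the direct sum over the N window samples. The function names and the test frequency 0.3 rad/sample are arbitrary choices for illustration.

```python
import cmath
import math


def dtft_rectangular(N, omega):
    """Direct sum H(e^{j omega}) = sum_{n=0}^{N-1} e^{-j omega n}."""
    return sum(cmath.exp(-1j * omega * n) for n in range(N))


def closed_form_rectangular(N, omega):
    """sin(omega N / 2) / sin(omega / 2) * e^{-j omega (N - 1) / 2}.

    Valid when omega is not a multiple of 2*pi (the ratio is 0/0 there).
    """
    ratio = math.sin(omega * N / 2) / math.sin(omega / 2)
    return ratio * cmath.exp(-1j * omega * (N - 1) / 2)


N = 8
omega = 0.3  # arbitrary test frequency in radians per sample
direct = dtft_rectangular(N, omega)
closed = closed_form_rectangular(N, omega)

# The first zero of the response lies at omega = 2*pi/N,
# so a longer window gives a narrower mainlobe.
first_zero = 2 * math.pi / N
```

The peak value at omega = 0 equals N, and the response first reaches zero at 2π/N.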
Rectangular Window

[Figure: magnitude response $|H(e^{j\omega})|$ of the rectangular window for N = 8 (peak value 8), with the mainlobe width and the peak sidelobe marked.]

What is this?
$H(e^{j\omega}) = \dfrac{\sin(\omega N/2)}{\sin(\omega/2)}\, e^{-j\omega(N-1)/2}$
Rectangular Window
Discuss the effect of window duration.
Discuss the effect of mainlobe width and sidelobes.
Commonly Used Windows

[Figure: the Rectangular, Hamming, Bartlett, Hanning, and Blackman windows plotted for n from 0 to 20, amplitude from 0 to 1.]
Commonly Used Windows

Rectangular:
$w(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$

Bartlett (Triangular):
$w(n) = \begin{cases} 2n/(N-1), & 0 \le n \le (N-1)/2 \\ 2 - 2n/(N-1), & (N-1)/2 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$

Hanning:
$w(n) = \begin{cases} 0.5 - 0.5\cos[2\pi n/(N-1)], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$

Hamming:
$w(n) = \begin{cases} 0.54 - 0.46\cos[2\pi n/(N-1)], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$

Blackman:
$w(n) = \begin{cases} 0.42 - 0.5\cos[2\pi n/(N-1)] + 0.08\cos[4\pi n/(N-1)], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$
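The window definitions can be sketched as plain functions of the length N. The function names are chosen for this example, and the Blackman window uses the standard coefficient 0.08 on its second cosine term.

```python
import math


def rectangular(N):
    return [1.0] * N


def bartlett(N):
    half = (N - 1) / 2
    return [2 * n / (N - 1) if n <= half else 2 - 2 * n / (N - 1)
            for n in range(N)]


def hanning(N):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def blackman(N):
    return [0.42
            - 0.5 * math.cos(2 * math.pi * n / (N - 1))
            + 0.08 * math.cos(4 * math.pi * n / (N - 1))
            for n in range(N)]


N = 21  # window length chosen for illustration
windows = {
    "rectangular": rectangular(N),
    "bartlett": bartlett(N),
    "hanning": hanning(N),
    "hamming": hamming(N),
    "blackman": blackman(N),
}
```

All of the tapered windows peak at 1 in the middle (n = 10 for N = 21). The Hamming window ends at 0.08 rather than 0, while the Bartlett, Hanning, and Blackman windows end at zero up to rounding.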
Commonly Used Windows

[Figure: panels for the Rectangular, Bartlett, Hanning, Hamming, and Blackman windows, with the annotation "Least mainlobe width".]
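Examples: Short-Time Energy

[Figure: short-time energy computed with a rectangular window and with a Hamming window.]

Examples: Average Magnitude

[Figure: average magnitude computed with a rectangular window and with a Hamming window.]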


The Effects of Window Length
Increasing the window length N decreases the bandwidth.
If N is too small, e.g., less than one pitch period, En and Mn will fluctuate very rapidly.
If N is too large, e.g., on the order of several pitch periods, En and Mn will change very slowly.
The Choice of Window Length
No single value of N is entirely satisfactory.

This is because the duration of a pitch period varies from about 2 ms for a high-pitched female or child voice up to 25 ms for a very low-pitched male voice.
Sampling Rate
The bandwidth of both En and Mn is just that
of the lowpass filter.
So, they need not be sampled as frequently as
speech signals.
For example
Frame size = 20 ms
Sample period = 10 ms
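To make the example concrete, the sketch below converts the 20 ms frame size and 10 ms sample period into sample counts. The 8000 Hz sampling rate and the helper name `frame_starts` are assumptions for illustration only; the source does not specify a sampling rate.

```python
def frame_starts(num_samples, frame_len, hop):
    """Start indices of all full frames of frame_len samples, hop samples apart."""
    return list(range(0, num_samples - frame_len + 1, hop))


fs = 8000                    # assumed sampling rate in Hz (not from the source)
frame_len = fs * 20 // 1000  # 20 ms frame -> 160 samples
hop = fs * 10 // 1000        # 10 ms between measurements -> 80 samples

# One second of signal: consecutive frames overlap by half a frame.
starts = frame_starts(fs, frame_len, hop)
```

With these assumed values, En and Mn are computed only 100 times per second instead of 8000 times, which is sufficient because they are lowpass signals.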
Main Applications of En and Mn

To provide the basis for distinguishing voiced speech segments from unvoiced segments.

Silence detection.
Differences of En and Mn

$E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2$
Emphasizes large sample-to-sample variations in x(n).

$M_n = \sum_{m=-\infty}^{\infty} |x(m)|\, w(n-m)$
The dynamic range (max/min) is approximately the square root of that of En.
The differences in level between voiced and unvoiced regions are not as pronounced as with En.
FIR and IIR
All the windows discussed so far are FIR filters.

Each of them is a lowpass filter.

The window can also be an IIR filter.


IIR Example

$h(n) = \begin{cases} a^n, & n \ge 0 \\ 0, & n < 0 \end{cases}$, $\quad H(z) = \dfrac{1}{1 - a z^{-1}}$

Recursive formulas:

Short-Time Energy: $E_n = a E_{n-1} + x^2(n)$
Short-Time Average Magnitude: $M_n = a M_{n-1} + |x(n)|$
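The recursive energy formula can be checked against the direct sum with the exponential filter h(n) = a^n. This is a small sketch: the function names, the value a = 0.5, and the three-sample input are arbitrary, and the recursion is assumed to start from zero.

```python
def recursive_energy(x, a):
    """E_n = a * E_{n-1} + x(n)^2, starting from zero."""
    energies, previous = [], 0.0
    for sample in x:
        previous = a * previous + sample * sample
        energies.append(previous)
    return energies


def direct_energy(x, a):
    """E_n = sum_{m <= n} x(m)^2 * a^(n - m), i.e. x^2 filtered by h(n) = a^n."""
    return [sum(x[m] ** 2 * a ** (n - m) for m in range(n + 1))
            for n in range(len(x))]


x = [1.0, 2.0, 3.0]  # toy input (hypothetical)
a = 0.5              # arbitrary decay factor

E_recursive = recursive_energy(x, a)
E_direct = direct_energy(x, a)
```

The recursion needs one multiplication and one addition per sample beyond squaring, regardless of how long the effective window is.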

Short-Time Average
Zero-Crossing Rate
Voiced and Unvoiced Signals

[Figure: two segments of the word "this": the vowel /i/ and the fricative /s/.]
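The Short-Time Average Zero-Crossing Rate

$Z_n = \sum_{m=-\infty}^{\infty} \left|\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]\right| w(n-m)$

$\mathrm{sgn}[x(m)] = \begin{cases} 1, & x(m) \ge 0 \\ -1, & x(m) < 0 \end{cases}$

$w(m) = \begin{cases} \dfrac{1}{2N}, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$

[Block diagram: x(n) → First Difference → | | → Lowpass Filter → Z_n]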
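The zero-crossing definition can be sketched as follows. Each sign change contributes 2/(2N) = 1/N to the sum, so Z_n is the fraction of sign changes per sample inside the window. The function names and the toy signal are made up for this example, and no crossing is counted before the first sample (an assumption about the boundary).

```python
def sgn(value):
    """Sign function as defined above: 1 for value >= 0, -1 otherwise."""
    return 1 if value >= 0 else -1


def zero_crossing_rate(x, N):
    """Z_n = sum_m |sgn x(m) - sgn x(m-1)| * w(n - m), with w(m) = 1/(2N)."""
    rates = []
    for n in range(len(x)):
        total = 0.0
        # The window covers n - N + 1 <= m <= n; m starts at 1 because
        # x(m - 1) does not exist for m = 0.
        for m in range(max(1, n - N + 1), n + 1):
            total += abs(sgn(x[m]) - sgn(x[m - 1])) / (2 * N)
        rates.append(total)
    return rates


# Toy signal (hypothetical): rapid sign changes, then a constant sign.
x = [1, -1, 1, -1, 1, 1, 1, 1]
Z = zero_crossing_rate(x, N=4)
```

Here Z reaches 1.0 at n = 4, where every sample in the window is a sign change, and drops to 0.25 at n = 7, where only one crossing remains inside the window.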
Distribution of Zero-Crossings
Example

Speech vs. Silence
Discrimination Using
Energy and Zero-Crossing
Speech vs. Silence Discrimination

Locating the beginning and end of a speech utterance in an environment with background noise.

Applications:
Segmentation of isolated words
Automatic speech recognition
Saving bandwidth in speech transmission
Examples:
In some cases, we
can locate the
beginning and end of
a speech utterance
using energy alone.
Examples:
In other cases, we
can locate the
beginning and end of
a speech utterance
using zero-crossing
rate alone.
Examples:
Sometimes, we
cannot do it using
one criterion alone.
[Figure label: Actual beginning]
Difficulties
In general, it is difficult to locate the boundaries
if we encounter the following cases:
Weak fricatives (/f/, /th/, /h/) at the beginning or end.
Weak plosive bursts (/p/, /t/, /k/) at the beginning or
end.
Nasals at the end.
Voiced fricatives which become devoiced at the end
of words.
Trailing off of vowel sounds at the end of an
utterance.
Rabiner and Sambur
A 10 ms frame is used, with the measurements computed 100 times per second.
The algorithm assumes that the first 100 msec
of the interval contains no speech.
The means and standard deviations of the
average magnitude and zero-crossing rate of
this interval are computed to characterize the
background noise.
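The noise-characterization step can be sketched as below. This covers only the statistics described above, not the full endpoint algorithm. The function names, the use of the population standard deviation, and the frame values are assumptions for illustration.

```python
import math


def mean_and_std(values):
    """Mean and population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def noise_statistics(magnitude, zero_crossings,
                     frames_per_second=100, silence_ms=100):
    """Statistics of the leading frames, which are assumed to hold no speech."""
    n_frames = frames_per_second * silence_ms // 1000  # 10 frames for 100 ms
    return (mean_and_std(magnitude[:n_frames]),
            mean_and_std(zero_crossings[:n_frames]))


# Hypothetical per-frame measurements: 10 noise frames, then 5 speech frames.
magnitude = [1.0, 3.0] * 5 + [50.0] * 5
zero_crossings = [4.0] * 10 + [20.0] * 5

(mag_mean, mag_std), (zc_mean, zc_std) = noise_statistics(magnitude,
                                                          zero_crossings)
```

Only the first 10 frames enter the statistics, so the later speech frames do not affect the noise estimate.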
The Algorithm

[Figure: diagram of the algorithm, with stages marked 1 and 2 and the annotation "No more than 25 frames".]

Examples

The Short-Time
Autocorrelation Function
Autocorrelation Functions

$\phi(k) = \sum_{m=-\infty}^{\infty} x(m)\, x(m+k)$

[Figure: x(m) and its shifted copy x(m+k), offset by the lag k.]

Properties
1. Even: $\phi(k) = \phi(-k)$.
2. $\phi(k) \le \phi(0)$ for all k.
3. $\phi(0)$ is equal to the energy of x(m).
4. If x(m) has period P, i.e. x(m) = x(m+P), then $\phi(k) = \phi(k+P)$.

This motivates us to use autocorrelation for pitch detection.
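A small sketch of how the autocorrelation reveals a period. The signal is a made-up sequence with period 4, truncated to 32 samples and treated as zero elsewhere, so property 4 holds only approximately: the peaks at multiples of the period shrink slowly because fewer samples overlap at larger lags. The function name and lag range are chosen for this example.

```python
def autocorrelation(x, k):
    """phi(k) = sum_m x(m) x(m + k), with x taken as zero outside its range."""
    total = 0.0
    for m in range(len(x)):
        if 0 <= m + k < len(x):
            total += x[m] * x[m + k]
    return total


# Hypothetical periodic signal with period P = 4 (8 periods).
x = [1, 0, -1, 0] * 8

energy = autocorrelation(x, 0)

# Pitch-period estimate: the lag (excluding 0) with the largest value.
candidate_lags = range(1, 11)
estimated_period = max(candidate_lags, key=lambda k: autocorrelation(x, k))
```

For this input the largest value over lags 1 to 10 occurs at lag 4, matching the period of the signal.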



Short-Time Version

$R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, x(m+k)\, w(n-k-m)$

[Figure: the windowed segment x(m)w(n-m) and the shifted windowed segment x(m+k)w(n-k-m), offset by the lag k.]

Property
$R_n(k) = R_n(-k)$

[Figure: the segments x(m-k)w(n+k-m), x(m)w(n-m), and x(m+k)w(n-k-m), illustrating R_n(-k) and R_n(k).]

Using the property $R_n(k) = R_n(-k)$:

$R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, x(m-k)\, w(n+k-m)$
$\quad = \sum_{m=-\infty}^{\infty} x(m)\, x(m-k)\, [w(n-m)\, w(n-m+k)]$
$\quad = \sum_{m=-\infty}^{\infty} y_k(m)\, h_k(n-m)$
$\quad = y_k(n) * h_k(n)$

where $y_k(n) = x(n)\, x(n-k)$ and $h_k(n) = w(n)\, w(n+k)$.

[Block diagram: x(n) is multiplied by its delayed copy (delay $z^{-k}$), and the product is filtered by $h_k(n)$ to give $R_n(k)$.]

Another Formulation

Define $w'(n) = w(-n)$. Then

$R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w[-(m-n)]\, x(m+k)\, w[-(m-n+k)]$
$\quad = \sum_{m=-\infty}^{\infty} x(m)\, w'(m-n)\, x(m+k)\, w'(m-n+k)$
$\quad = \sum_{m=-\infty}^{\infty} x(m+n)\, w'(m)\, x(m+n+k)\, w'(m+k)$

If $w'(n) \neq 0$ only for $0 \le n \le N-1$, this becomes

$R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)\, w'(m)]\,[x(n+m+k)\, w'(m+k)]$

A noncausal formulation.
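The finite-sum form of the short-time autocorrelation can be sketched directly. The function name and toy signal are assumptions for this example, and the caller is assumed to pass 0 <= k < N with n + N no larger than the signal length.

```python
def short_time_autocorrelation(x, n, k, w_prime):
    """R_n(k) = sum_{m=0}^{N-1-k} [x(n+m) w'(m)] [x(n+m+k) w'(m+k)]."""
    N = len(w_prime)
    total = 0.0
    # Only N - k products are available: both factors must fall in the window.
    for m in range(N - k):
        total += (x[n + m] * w_prime[m]) * (x[n + m + k] * w_prime[m + k])
    return total


# Hypothetical periodic signal with period 4 and a rectangular window, N = 8.
x = [1, 0, -1, 0] * 8
w_prime = [1.0] * 8
n = 4  # arbitrary frame start

R0 = short_time_autocorrelation(x, n, 0, w_prime)
R2 = short_time_autocorrelation(x, n, 2, w_prime)
R4 = short_time_autocorrelation(x, n, 4, w_prime)
```

For this input R0 is 4 but R4 is only 2, even though lag 4 is exactly one period: the sum at lag k has only N - k terms.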
Examples

[Figure: short-time autocorrelation of voiced and unvoiced speech for N = 401, computed with a rectangular window and with a Hamming window.]


Examples

$R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)\, w'(m)]\,[x(n+m+k)\, w'(m+k)]$

Less data will be involved for larger lag k.

$R(k) = 1 - k/N, \quad |k| < N$

[Figure: short-time autocorrelation for window lengths N = 401, N = 251, and N = 125.]
Modified Short-Time Autocorrelation Function

Original Version:

$R_n(k) = \sum_{m=-\infty}^{\infty} [x(n+m)\, w'(m)]\,[x(n+m+k)\, w'(m+k)]$

Modified Version:

$\hat{R}_n(k) = \sum_{m=-\infty}^{\infty} [x(n+m)\, w_1(m)]\,[x(n+m+k)\, w_2(m+k)]$

$w_1(m) = \begin{cases} 1, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$

$w_2(m) = \begin{cases} 1, & 0 \le m \le N-1+K \\ 0, & \text{otherwise} \end{cases}$

K: maximum lag

$\hat{R}_n(k) = \sum_{m=0}^{N-1} x(n+m)\, x(n+m+k), \quad 0 \le k \le K$
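A minimal sketch of the modified version with rectangular w1 and w2: the sum always runs over N products, so the peaks do not shrink with increasing lag. The function name and toy signal are assumptions, and the caller is assumed to keep n + N - 1 + k inside the signal.

```python
def modified_autocorrelation(x, n, k, N):
    """R^_n(k) = sum_{m=0}^{N-1} x(n+m) x(n+m+k), for 0 <= k <= K."""
    return sum(x[n + m] * x[n + m + k] for m in range(N))


# Hypothetical periodic signal with period 4.
x = [1, 0, -1, 0] * 8
n, N, K = 4, 8, 10  # frame start, window length, maximum lag (all arbitrary)

R_hat = [modified_autocorrelation(x, n, k, N) for k in range(K + 1)]
```

For this input the values at lags 4 and 8 equal the value at lag 0, because the second window is long enough to supply N products at every lag up to K.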
Examples

[Figure: voiced and unvoiced speech, N = 401: autocorrelation with a rectangular window next to the modified version. The two are annotated as similar.]

Examples

[Figure: N = 401, N = 251, and N = 125: autocorrelation with a rectangular window next to the modified version.]


The Short-Time
Average Magnitude
Difference Function
The AMDF

$\gamma_n(k) = \sum_{m=-\infty}^{\infty} |x(n+m)\, w_1(m) - x(n+m+k)\, w_2(m+k)|$

If x(n) is periodic with period P, then

$\gamma_n(kP) = 0, \quad k = 0, 1, 2, \ldots$

Computationally more efficient than autocorrelation.
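The AMDF can be sketched with rectangular windows, in which case the sum reduces to N absolute differences and needs no multiplications. The function name, the toy signal, and the lag range are assumptions for this example.

```python
def amdf(x, n, k, N):
    """gamma_n(k) = sum_{m=0}^{N-1} |x(n+m) - x(n+m+k)| (rectangular windows)."""
    return sum(abs(x[n + m] - x[n + m + k]) for m in range(N))


# Hypothetical periodic signal with period P = 4.
x = [1, 0, -1, 0] * 8
n, N = 4, 8  # frame start and window length (arbitrary)

gamma = {k: amdf(x, n, k, N) for k in range(1, 11)}

# Pitch-period estimate: the smallest lag at which the AMDF is minimal.
estimated_period = min(gamma, key=gamma.get)
```

Where the autocorrelation shows peaks at multiples of the period, the AMDF shows nulls: here it is zero at lags 4 and 8 and positive elsewhere.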
Example

[Figure: AMDF of voiced and unvoiced speech.]
Exercise
Record a piece of your own speech and perform voiced/unvoiced segmentation.

Design an efficient algorithm to compute the autocorrelation.
