Outline
- Why do we need to compress speech signals
- Basic components in a source coding system
- Variable length binary encoding: Huffman coding
- Speech coding overview
- Predictive coding: DPCM, ADPCM
- Vocoder
- Hybrid coding
- Speech coding standards
[Figure: basic components of a source coding system — Transformation, followed by Quantization (lossy; scalar or vector), followed by Binary Encoding (lossless).]
Motivation for the transformation: to yield a more efficient representation of the original samples.
Binary Encoding
To represent a finite set of symbols using binary codewords.
The minimum number of bits required to represent a source is bounded by its entropy.
Average codeword length: $\bar{l} = \sum_n p_n l_n$

Entropy bound: $H \le \bar{l} \le H + 1$, with $H = -\sum_n p_n \log_2 p_n$
For this reason, variable length coding is also known as entropy coding. The goal in VLC is to reach the entropy bound as closely as possible.
Example: a source with four symbols of probabilities 36/49, 8/49, 4/49 and 1/49, assigned the codewords 1, 01, 001 and 000:

$\bar{l} = \frac{36}{49}\cdot 1 + \frac{8}{49}\cdot 2 + \left(\frac{4}{49} + \frac{1}{49}\right)\cdot 3 = \frac{67}{49} \approx 1.37$ bits/symbol

$H = -\sum_k p_k \log_2 p_k \approx 1.16$ bits/symbol
Step 3: For each symbol, determine its codeword by tracing the assigned bits from the corresponding leaf node to the top of the tree. The bit at the leaf node is the last bit of the codeword
10
.57
1.00
1 Section VIII
.43
.30 .23
.20
1 .27 Section
0 0 1
.12 .12
0 1 .08
VII
1 Section VI Section V
0 .06
.06
Section IV
1 Section III 0 1
.06 .06
Section Section II I
11
12
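As an illustration of the construction steps, here is a minimal Python sketch (my addition, not from the original slides) that builds a Huffman code with a heap and checks the numbers of the four-symbol example above (average length 67/49 ≈ 1.37 bits vs. entropy ≈ 1.16 bits). Tie-breaking inside the heap is an implementation choice, so the exact codewords may differ from the tree on the slide while the average length stays the same.

```python
import heapq
from math import log2

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: bitstring}."""
    # Heap entries: (probability, tie-breaker, subtree). Leaves are symbols,
    # internal nodes are (left, right) pairs.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)   # two least-probable subtrees
        p2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (n1, n2)))
        counter += 1
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: recurse, appending 0/1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: record the codeword
            code[node] = prefix or "0"
    walk(heap[0][2], "")
    return code

probs = {"a": 36/49, "b": 8/49, "c": 4/49, "d": 1/49}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)   # ~1.37 bits/symbol
entropy = -sum(p * log2(p) for p in probs.values())     # ~1.16 bits/symbol
print(code, avg_len, entropy)
```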
Speech Coding
Speech coders are lossy coders, i.e., the decoded signal differs from the original. The goal in speech coding is to minimize the distortion at a given bit rate, or to minimize the bit rate needed to reach a given distortion.

Figure of merit: the objective measure of distortion is the SNR (signal-to-noise ratio), computed as in the sketch below, but SNR does not correlate well with perceived speech quality.
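For concreteness, the SNR figure of merit is usually computed as follows (standard definition; the function name and use of numpy are my choices, not from the slides):

```python
import numpy as np

def snr_db(original, reconstructed):
    """SNR in dB: ratio of signal power to the power of the reconstruction error."""
    original = np.asarray(original, dtype=float)
    noise = original - np.asarray(reconstructed, dtype=float)
    return 10 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))
```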
Predictive coding:
Predict the current sample from previous samples, then quantize and code the prediction error instead of the original sample. If the prediction is accurate most of the time, the prediction error is concentrated near zero and can be coded with fewer bits than the original signal. Usually a linear predictor is used (linear predictive coding):

$x_p(n) = \sum_{k=1}^{P} a_k x(n-k)$

Other names:
- Non-predictive coding (uniform or non-uniform quantization) -> PCM
- Predictive coding -> Differential PCM (DPCM)
[Figure: DPCM encoder — the prediction $x_p(n)$ is subtracted from the input $x(n)$, the prediction error $d(n)$ is quantized to $\hat{d}(n)$ and passed to the binary encoder, which outputs the codeword $c(n)$.]

$x_p(n) = \sum_{k=1}^{P} a_k \hat{x}(n-k)$

$d(n) = x(n) - x_p(n), \qquad \hat{d}(n) = Q[d(n)]$

$\hat{x}(n) = x_p(n) + \hat{d}(n)$
Note: the predictor is closed-loop: based on reconstructed (not original) past values
[Figure: DPCM decoder — the codeword $c(n)$ is binary decoded and inverse quantized ($Q^{-1}$) to $\hat{d}(n)$, which is added to the prediction $x_p(n)$ to form the reconstructed sample $\hat{x}(n)$.]
Note: the decoder predictor MUST track the encoder predictor, hence BOTH based on reconstructed (not original) past values
Example
Code the following sequence of samples using a DPCM coder:
- Sequence: {1, 3, 4, 4, 7, 8, 6, 5, 3, 1}
- Use a simple predictor: predict the current value from the previous one, $x_p(n) = a_1 \hat{x}(n-1)$ with $a_1 = 1$.
- Show the reconstructed sequence. For simplicity, assume the first sample is known at the decoder.
- Show the coded binary stream if the following code is used for the quantized difference signal: error 0 -> "1", error +2 -> "01", error -2 -> "00". (A minimal Python sketch of this exercise follows below.)
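The sketch below (my illustration, not part of the original slides) runs the closed-loop DPCM process on this sequence. It assumes the quantizer maps each prediction error to the nearest of the three levels {-2, 0, +2}; how ties (error exactly halfway between two levels) are broken is my choice, namely the earlier level in the list.

```python
def dpcm_encode_decode(x, levels=(-2, 0, 2), codewords={-2: "00", 0: "1", 2: "01"}):
    """Closed-loop DPCM with a fixed 3-level quantizer and the codeword map above.
    The first sample is assumed known at the decoder (sent separately)."""
    recon = [x[0]]                   # reconstructed sequence x_hat(n)
    bits = []
    for n in range(1, len(x)):
        xp = recon[-1]               # predictor: previous *reconstructed* sample (a1 = 1)
        d = x[n] - xp                # prediction error
        d_hat = min(levels, key=lambda q: abs(d - q))  # nearest level (ties -> first listed)
        recon.append(xp + d_hat)     # decoder-side reconstruction
        bits.append(codewords[d_hat])
    return recon, "".join(bits)

recon, bitstream = dpcm_encode_decode([1, 3, 4, 4, 7, 8, 6, 5, 3, 1])
print(recon, bitstream)
```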
Delta-Modulation (DM)
Quantize the prediction error to 2 levels, $+\Delta$ and $-\Delta$, so each sample takes 1 bit.

Bit rate = sampling rate

$\hat{d}(n) = Q[d(n)] = \begin{cases} +\Delta, & d(n) \ge 0 \quad (c(n) = 1) \\ -\Delta, & d(n) < 0 \quad (c(n) = 0) \end{cases}$
Example
Code the following sequence of samples using a DM:
- Sequence: {1, 3, 4, 4, 7, 8, 6, 5, 3, 1}
- Use $\Delta = 2$ and assume the first sample is known.
- Show the reconstructed sequence.
- Show the coded binary sequence. (A short sketch follows below.)
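A corresponding delta-modulation sketch (again my illustration, with the same closed-loop structure, and assuming $c(n)=1$ encodes the $+\Delta$ step):

```python
def dm_encode_decode(x, delta=2):
    """Delta modulation: 1 bit per sample; step +delta if the input sample is at or
    above the previous reconstruction, -delta otherwise. First sample assumed known."""
    recon = [x[0]]
    bits = []
    for n in range(1, len(x)):
        if x[n] >= recon[-1]:          # prediction error d(n) >= 0
            bits.append("1"); recon.append(recon[-1] + delta)
        else:                          # prediction error d(n) < 0
            bits.append("0"); recon.append(recon[-1] - delta)
    return recon, "".join(bits)

print(dm_encode_decode([1, 3, 4, 4, 7, 8, 6, 5, 3, 1], delta=2))
```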
Predictor: predict the current sample from the previous one, $x_p(n) = \hat{x}(n-1)$. Step size: $\Delta = 35$.

[Figure: a waveform segment (amplitude roughly 0–250, time axis 0–0.5) overlaid with its delta-modulated reconstruction.]
[Figure: with $\Delta = 35$ the reconstruction hugs the original signal. A second panel shows the distribution of sample-to-sample difference values, roughly between -0.01 and 0.015.]
$\Delta$ should be chosen to be a difference value that occurs with a relatively large percentage. In the above example, $\Delta = 0.002$–$0.003$. In adaptive DM, $\Delta$ is varied automatically so that the reconstructed signal hugs the original signal (one possible adaptation rule is sketched below).
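One common adaptation rule (my sketch; the ADM demo later uses a multiplicative factor P = 1.5, but the exact rule is not shown on the slides) enlarges the step when consecutive output bits agree, indicating slope overload, and shrinks it when they alternate, indicating granular noise:

```python
def adaptive_dm(x, delta0=0.002, grow=1.5, shrink=1/1.5):
    """Adaptive delta modulation: the step size is scaled up after two equal bits
    and scaled down after two different bits (backward adaptation, so the decoder
    can repeat the same rule from the bitstream alone)."""
    recon, bits, delta = [x[0]], [], delta0
    for n in range(1, len(x)):
        bit = 1 if x[n] >= recon[-1] else 0
        if bits:                                   # adapt based on the previous bit
            delta *= grow if bit == bits[-1] else shrink
        recon.append(recon[-1] + delta if bit else recon[-1] - delta)
        bits.append(bit)
    return recon, bits
```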
[Figure: waveform comparison — the original at 705 Kbps; DM with Q = 0.0025 at 44 Kbps (Mozart_DM_Q_0.0025.wav); adaptive DM with P = 1.5 (Mozart_ADM_P1.5.wav); and the signal at 11 KHz with Q = 32 at 55 Kbps (Mozart_f_11_q32.wav).]
Backward adaptation
The predictor and quantizer are adapted based on the past data frame, and the decoder performs the same adaptation as the encoder. The predictor and quantizer parameters therefore do not need to be transmitted, so the side-information rate is lower; but the prediction error is typically larger than with forward adaptation.
[Figure: speech production/phonation, from http://www.phon.ox.ac.uk/~jcoleman/phonation.htm]
Excitation model:
- Vocoder: pitch pulse train / noise sequence
- Analysis-by-synthesis coder: optimal excitation sequence

Short-term predictor: $x_p(n) = \sum_{k=1}^{P} a_k x(n-k)$

Long-term (pitch) predictor: $e_p(n) = b\, x(n-m)$
Vocoder Principle
For every short frame:
- Code the filter coefficients (an LPC sketch follows below).
- Code the excitation signal: send an indicator for voiced/unvoiced, and for voiced frames send the pitch period.

Produces unnatural but intelligible sounds at very low bit rates (<= 2.4 kbps).
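To make the "code the filter coefficients" step concrete, here is a minimal autocorrelation-method LPC sketch (a standard approach, not taken from the slides; the frame length, predictor order, and use of numpy are my assumptions):

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Estimate predictor coefficients a_k by the autocorrelation method:
    solve the Yule-Walker normal equations R a = r for one short frame."""
    frame = np.asarray(frame, dtype=float)
    # Autocorrelation r[k] for lags 0..order
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix and right-hand side
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])   # a_k such that x_p(n) = sum_k a_k x(n-k)
    return a
```

The resulting all-pole filter $1/(1 - \sum_k a_k z^{-k})$ models the vocal tract for that frame; the residual after prediction carries the pitch/excitation information.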
[Figure: vocoder block diagram (by S. Quackenbush) — the encoder analyzes $x(n)$ with a predictor, producing the prediction parameters, and a pitch/noise detector, producing the excitation parameters; the decoder's pitch/noise synthesizer regenerates the excitation that drives the prediction filter.]
Hybrid Coder
To produce more natural sounds, let the excitation signal be arbitrary, chosen so that the synthesized speech waveform matches the actual waveform as closely as possible. A hybrid coder codes the filter model plus the excitation signal as a waveform. A code-excited linear prediction (CELP) coder chooses the excitation signal from codewords in a predesigned codebook (see the sketch below). This principle yields acceptable speech quality in the range 4.8-16 kbps and is used in various wireless cellular phone systems.
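The core analysis-by-synthesis idea can be sketched as follows. This is a toy illustration of the codebook search only (the function names are mine); real CELP coders also include gain terms, an adaptive/pitch codebook, and perceptual weighting of the error.

```python
import numpy as np

def celp_search(target, codebook, a):
    """Pick the codebook excitation whose synthesized output (excitation passed
    through the LPC synthesis filter 1 / (1 - sum_k a_k z^-k)) is closest to the
    target speech frame in the squared-error sense."""
    def synthesize(exc):
        out = np.zeros(len(exc))
        for n in range(len(exc)):
            pred = sum(a[k] * out[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            out[n] = exc[n] + pred          # all-pole synthesis filter
        return out
    errors = [np.sum((np.asarray(target) - synthesize(c)) ** 2) for c in codebook]
    best = int(np.argmin(errors))           # index transmitted to the decoder
    return best, synthesize(codebook[best])
```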
[Figure: hybrid (analysis-by-synthesis) coder block diagram (by S. Quackenbush) — candidate excitation vectors drive the predictor, and the error between the synthesized output and $x(n)$ is minimized to select the excitation parameters; the prediction parameters are transmitted alongside.]
Predictive coding
- Understand the predictive coding process (the encoder and decoder block diagrams).
- Understand how a simple predictive coder (including the delta modulator) works, and be able to apply it to a sequence.
- Understand the need for adapting both the predictor and the quantizer, and why ADPCM improves performance.
References
- Yao Wang, EL514 Lab Manual, Exp. 3: Speech and audio compression, Sec. 2.2, 2.3 (copies provided).
- Z. N. Li and M. Drew, Fundamentals of Multimedia, Prentice Hall, 2004. Sec. 6.3 (Quantization and transmission of audio) and Chap. 13 (Basic audio compression techniques).
- A. Spanias, "Speech coding: a tutorial review," Proc. IEEE, vol. 82, pp. 1541-1582, 1994 (and several other articles in this issue).
- Speech Coding Overview, Jason Woodard, http://wwwmobile.ecs.soton.ac.uk/speech_codecs/
- Lossless compression algorithms, http://www.cs.sfu.ca/CourseCentral/365/D1/20031/material/notes/Chap4/Chap4.1/Chap4.1.html
- The Lossless Compression (Squeeze) Page, by Dominik Szopa, http://www.cs.sfu.ca/CourseCentral/365/li/squeeze/index.html (has good demos/applets for Huffman coding and LZ coding).