
Code Excited Linear Prediction (CELP)

Aneek Anwar Zaeem Varaich Bilal Hassan Hashim Bhatti 2012-MS-EE-067 2012-MS-EE-078 2012-MS-EE-075 2008-MS-EE-116

Introduction
CELP is a speech coding algorithm proposed by Schroeder and Atal
One of the most widely used speech coding algorithms for lossy compression of speech
Based on the idea of Linear Prediction (LPC)
Used as a generic term for a variety of codecs, such as:
MPEG-4 Part 3 (CELP as an MPEG-4 Audio Object Type)
G.728 - Coding of speech at 16 kbit/s using low-delay code excited linear prediction
G.718 - uses CELP for the lower two layers for the band 50-6400 Hz in a two-stage coding structure
G.729.1 - uses CELP coding for the lower band 50-4000 Hz in a three-stage coding structure

Background on Speech Signal


The speech signal is quasi-periodic over short time intervals

It is split into short, overlapping frames by windowing; all subsequent processing is done on these frames

Source-Filter model of Speech


Assumes a source of sound and a filter that shapes that sound, organized so that the source and the filter are independent
The source is the vocal cords in the larynx; the filter is the vocal tract
The glottis vibrates at a fundamental frequency f0, so the source contains f0 and its harmonics
The vocal tract resonates at certain frequencies, called formants, which differ between vowels, so the spectrum shows peaks at these formants

Linear Prediction Coefficients (LPCs)


Based on the source-filter model of speech
The source is modelled by an impulse train at frequency f0 for voiced speech and by white noise for unvoiced speech
The filter is modelled as an all-pole filter with poles near the formant frequencies:
H(z) = 1 / A(z) = 1 / (1 - Σ_{k=1..p} a_k z^-k)
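As an illustration (not part of the original slides), a minimal Python/NumPy sketch of this source-filter model is given below; the sampling rate, f0 and the all-pole coefficients are made-up values chosen only so that the filter is stable.

import numpy as np
from scipy.signal import lfilter

fs = 8000                                # sampling rate (Hz)
f0 = 100                                 # fundamental frequency of the source (Hz)
n = np.arange(fs)                        # one second of samples

# Voiced source: impulse train at f0; unvoiced source: white noise
voiced_src = (n % (fs // f0) == 0).astype(float)
unvoiced_src = np.random.randn(len(n))

# All-pole vocal-tract filter H(z) = 1/A(z); the polynomial below is
# A(z) = 1 - 1.3 z^-1 + 0.9 z^-2 (stable, one formant-like resonance)
A = np.array([1.0, -1.3, 0.9])
voiced_speech = lfilter([1.0], A, voiced_src)
unvoiced_speech = lfilter([1.0], A, unvoiced_src)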

LPCs contd.
The next sample can be predicted as a linear combination of the previous p samples, hence the name linear prediction:
ŝ(n) = a1 s(n-1) + a2 s(n-2) + ... + ap s(n-p)

or, equivalently,
ŝ(n) = Σ_{k=1..p} a_k s(n-k)

The error between the original and the predicted sample is
e(n) = s(n) - ŝ(n) = s(n) - Σ_{k=1..p} a_k s(n-k)

LPCs contd.
Taking the z-transform, we get
E(z) = S(z) A(z),  where A(z) = 1 - Σ_{k=1..p} a_k z^-k

So S(z) is given by S(z) = E(z) / A(z) = E(z) H(z), where H(z) = 1 / A(z)
We therefore only need the coefficients a_k to model the filter
The a_k are computed from the least-squares criterion, e.g. with the Levinson-Durbin algorithm
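The a_k can be computed in a few lines; the sketch below (an assumed use of the autocorrelation method, not taken from the slides) implements the Levinson-Durbin recursion on one windowed frame, with an illustrative frame length and prediction order.

import numpy as np

def lpc(frame, order):
    # Autocorrelation values r[0..order] of the windowed frame
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order)                   # predictor coefficients a_1..a_p
    err = r[0]                            # prediction error energy
    for i in range(order):
        # Reflection coefficient for order i+1
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[:i] = a[:i] - k * a[:i][::-1]
        a = a_new
        err *= (1.0 - k * k)
    return a, err

# Usage: a 240-sample (30 ms at 8 kHz) Hamming-windowed frame, order 10
frame = np.hamming(240) * np.random.randn(240)
a, e = lpc(frame, order=10)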

Long Term Prediction (LTP)


The idea is to predict one period of the signal from the preceding one:

x̂(n) = b x(n - M)

Two unknowns, b and M


M is the pitch period and can be estimated with any pitch-estimation technique; b is the unknown coefficient, estimated with the least-squares criterion
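A minimal sketch of this search (my own illustration; the lag range corresponds to typical pitch values for 8 kHz speech) is:

import numpy as np

def long_term_predictor(x, min_lag=20, max_lag=147):
    best_lag, best_gain, best_err = None, 0.0, np.inf
    for M in range(min_lag, max_lag + 1):
        target = x[M:]                     # samples to be predicted
        past = x[:-M]                      # the same samples one lag earlier
        denom = np.dot(past, past)
        if denom == 0.0:
            continue
        b = np.dot(target, past) / denom   # least-squares gain for this lag
        err = np.sum((target - b * past) ** 2)
        if err < best_err:
            best_lag, best_gain, best_err = M, b, err
    return best_lag, best_gain

# Usage: a noisy signal with period 60 samples
n = np.arange(600)
x = np.sin(2 * np.pi * n / 60) + 0.05 * np.random.randn(len(n))
M, b = long_term_predictor(x)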

Vector Quantization
Vector quantization (VQ) allows the modeling of probability density functions by the distribution of prototype vectors. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. VQ is quite suitable for lossy data compression.

Vector Quantization: Definition


Blocks form the vectors: a sequence of audio samples, or a block of image pixels
x = [x0, x1, ..., x_{N-1}]^T

A vector quantizer maps the k-dimensional vectors of the vector space R^k onto a finite set of vectors
Unquantized vector: x = [x0, x1, ..., x_{N-1}]^T
Quantized vector: y = VQ(x) = r_i  if  x ∈ C_i
Reconstruction vector (codeword): r_i
Codebook: the set of all codewords
Voronoi region C_i: the nearest-neighbour region around codeword r_i
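To make the definitions concrete, the sketch below (an illustration, not from the slides) trains a small codebook with a plain k-means loop and quantizes vectors to the index of their nearest codeword; the codebook size and vector dimension are arbitrary.

import numpy as np

def train_codebook(data, n_codewords=16, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), n_codewords, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword (its Voronoi region)
        d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        idx = d.argmin(axis=1)
        # Move each codeword to the centroid of its region
        for j in range(n_codewords):
            if np.any(idx == j):
                codebook[j] = data[idx == j].mean(axis=0)
    return codebook

def quantize(vectors, codebook):
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    indices = d.argmin(axis=1)             # only these indices are transmitted
    return indices, codebook[indices]      # reconstruction = nearest codewords

# Usage: 1000 four-dimensional vectors
data = np.random.randn(1000, 4)
cb = train_codebook(data)
idx, recon = quantize(data, cb)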


Vector Quantizer: 2-D example


Vector Quantization Procedure


CELP
CELP is based on the previously discussed concepts
An LPC filter models the vocal tract, together with Long Term Prediction
The error signal, which acts as the excitation in the source-filter model, is quantized using VQ; both fixed and adaptive codebooks are used
A perceptual weighting filter is added to reduce audible noise

CELP algorithm
Encoding
LPC analysis to obtain H(z)
Define the perceptual weighting filter W(z); this permits more noise at the formant frequencies, where it is masked by the speech
Synthesize speech using each codebook entry in turn as the input to the synthesis filter V(z)
Calculate the optimum gain that minimizes the perceptually weighted error energy over the speech frame
Select the codebook entry that gives the lowest error

Transmit the LPC parameters and the codebook index (a sketch of this search loop follows below)
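The sketch below (an assumption of how the loop can be written, not the reference code of any standard) shows the core analysis-by-synthesis search: filter each codebook entry through 1/A(z), fit the gain by least squares, and keep the entry with the lowest error; the codebook and the LPC polynomial are illustrative.

import numpy as np
from scipy.signal import lfilter

def search_codebook(target, codebook, A):
    # target: speech frame to match; codebook: (N, frame_len) excitation entries;
    # A: polynomial coefficients of A(z), so 1/A(z) is the synthesis filter
    best_index, best_gain, best_err = None, 0.0, np.inf
    for i, entry in enumerate(codebook):
        synth = lfilter([1.0], A, entry)            # synthetic speech for this entry
        denom = np.dot(synth, synth)
        if denom == 0.0:
            continue
        g = np.dot(target, synth) / denom           # optimum gain (least squares)
        err = np.sum((target - g * synth) ** 2)     # error energy
        if err < best_err:
            best_index, best_gain, best_err = i, g, err
    return best_index, best_gain                    # transmit index and gain

# Usage with a random ternary codebook and an illustrative A(z)
frame_len, n_entries = 40, 128
codebook = np.random.choice([-1.0, 0.0, 1.0], size=(n_entries, frame_len), p=[0.1, 0.8, 0.1])
A = np.array([1.0, -1.3, 0.9])
target = np.random.randn(frame_len)
index, gain = search_codebook(target, codebook, A)

For clarity the sketch omits the perceptual weighting; in the full encoder both the target and the synthetic signal are first filtered by W(z).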

Decoding
Receive the LPC parameters and the codebook index
Resynthesize speech using H(z) and the codebook entry

CELP Basic Encoder


[Block diagram: an entry from the normalized stochastic codebook is scaled by the gain g, filtered by 1/B(z) and then 1/A(z) to produce the synthetic speech signal, which is compared with the original speech signal under a least-squares (LS) criterion]

where
1/B(z) represents the Long Term Prediction filter
1/A(z) is the LPC filter
g is the gain

CELP: Adding a perceptual filter


We want to choose the LTP delay and codebook entry that give the best-sounding resynthesized speech. We exploit the phenomenon of masking: a listener will not notice noise at the formant frequencies because it is overwhelmed by the speech energy. We therefore filter the error signal by:
W(z) = H(z/0.8) / H(z) = A(z) / A(z/0.8)
[Block diagram: as before, but the error between the original and the synthetic speech is passed through the weighting filter W(z) before the LS criterion is evaluated]
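For completeness, a minimal sketch (my own illustration) of applying W(z) = A(z) / A(z/0.8) to the error signal with scipy.signal.lfilter; the A(z) coefficients are the same illustrative values used earlier.

import numpy as np
from scipy.signal import lfilter

gamma = 0.8
A = np.array([1.0, -1.3, 0.9])                 # polynomial coefficients of A(z)
A_weighted = A * gamma ** np.arange(len(A))    # A(z/gamma): k-th coefficient scaled by gamma^k

def perceptual_weight(error):
    # Numerator A(z), denominator A(z/gamma): noise near the formants is de-emphasized
    return lfilter(A, A_weighted, error)

error = np.random.randn(160)                   # one 20 ms error frame at 8 kHz
weighted_error = perceptual_weight(error)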

Block Diagram of complete encoder


[Block diagram: the original signal goes through LPC analysis (giving 1/A(z) and the weighting filter W(z)) and through pitch estimation by analysis-synthesis (giving 1/B(z)); each waveform-codebook entry is scaled by the gain g and passed through 1/B(z) and 1/A(z) to form the synthetic speech; the perceptually weighted criterion W(z) is evaluated, and the search for the best code and gain iterates over the whole codebook]

Fixed and Adaptive Codebook


The stochastic or fixed codebook normally contains 1082 independent random values from the set {-1, 0, +1} with probabilities {0.1, 0.8, 0.1}
The adaptive codebook is formed from the Long Term Prediction (LTP) filter, i.e. from the recent excitation (a sketch follows below)
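The ternary values and probabilities above are as quoted in the slide; in the sketch below the codebook size, frame length and lag range are my own illustrative choices. It builds both codebooks, the adaptive one as shifted copies of the recent excitation.

import numpy as np

frame_len = 40

# Fixed (stochastic) codebook: random ternary entries with P(-1, 0, +1) = (0.1, 0.8, 0.1)
fixed_codebook = np.random.choice([-1.0, 0.0, 1.0], size=(512, frame_len), p=[0.1, 0.8, 0.1])

def adaptive_codebook(past_excitation, min_lag=20, max_lag=147):
    # One entry per candidate pitch lag: the excitation delayed by that lag
    entries = []
    for lag in range(min_lag, max_lag + 1):
        segment = past_excitation[-lag:][:frame_len]
        # For lags shorter than the frame, repeat the segment to fill the frame
        reps = int(np.ceil(frame_len / len(segment)))
        entries.append(np.tile(segment, reps)[:frame_len])
    return np.array(entries)

past_excitation = np.random.randn(300)    # stand-in for previously decoded excitation
adaptive = adaptive_codebook(past_excitation)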

CELP Encoder with Adaptive Codebook


[Block diagram: the speech frame is weighted by W(z) and the memory of the H filter is subtracted to form the target; an adaptive-codebook entry c1,i scaled by gain g1 and a stochastic-codebook entry c2,i scaled by gain g2 are each passed through the synthesis filter H(z); the gains and indices are chosen by a least-squares criterion on the error e]

Transmitted Parameters
Adaptive codebook gain and index
Fixed codebook gain and index
LPC filter coefficients

CELP Decoder
Decode the received parameters: index of the stochastic codebook, gain of the stochastic codebook, index of the adaptive codebook, gain of the adaptive codebook, and the linear prediction filter coefficients

[Block diagram: the adaptive-codebook entry c1,i scaled by g1 and the stochastic-codebook entry c2,i scaled by g2 are summed to form the excitation, which is filtered by 1/A(z) to give the synthetic speech]
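A minimal decoder sketch (illustrative parameter values, not a standard-conformant implementation) that combines the two scaled codebook entries into the excitation and filters it with 1/A(z):

import numpy as np
from scipy.signal import lfilter

def decode_frame(adaptive_entry, g1, fixed_entry, g2, A, state):
    excitation = g1 * adaptive_entry + g2 * fixed_entry
    synthetic, state = lfilter([1.0], A, excitation, zi=state)
    # The excitation is also fed back to extend the adaptive codebook
    return synthetic, excitation, state

# Usage with made-up received parameters for one 40-sample frame
A = np.array([1.0, -1.3, 0.9])            # decoded LPC polynomial A(z)
state = np.zeros(len(A) - 1)              # synthesis-filter memory carried across frames
adaptive_entry = np.random.randn(40)
fixed_entry = np.random.choice([-1.0, 0.0, 1.0], size=40, p=[0.1, 0.8, 0.1])
speech, excitation, state = decode_frame(adaptive_entry, 0.9, fixed_entry, 0.5, A, state)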

Various Standards for Speech coding


ITU Standard | Method         | Year | Bit rate (kbps) | Delay (ms) | Quality (MOS)  | Complexity (MIPS)
G.711        | PCM            | 1972 | 64              | 0.125      | 4.3            | <<1
G.721        | ADPCM          | 1984 | 32              | 0.125      | 4.1 at 32 kbps | 1.25
G.723        | ADPCM          | 1986 | 40/32/24        | 0.125      | -              | -
G.726        | ADPCM          | 1988 | 40/32/24/16     | 0.125      | -              | -
G.727        | ADPCM          | 1990 | 40/32/24/16     | 0.125      | -              | -
G.728        | LD-CELP        | 1992 | 16              | 2.5        | 4.0            | 30
G.729        | CS-ACELP       | 1994 | 8               | 30         | 3.9            | 25
G.729a       | CS-ACELP       | 1996 | 8               | -          | -              | 12
G.723.1      | MP-MLQ / ACELP | 1995 | 6.3 / 5.3       | 75         | 3.9            | 24

Compression Ratio
For normal PCM speech we use 8 bits per sample at a sampling rate of 8 kHz, giving a data rate of 64 kbps
For various CELP standards the data rate can be as low as about 6 kbps for the same signal
So the compression ratio is roughly 64 / 6 ≈ 10

References
B. S. Atal, "The History of Linear Prediction," IEEE Signal Processing Magazine, vol. 23, no. 2, March 2006, pp. 154-161.
M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp. 937-940, 1985.
L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall (Signal Processing Series), 1978.
