
Signal Processing for Communications

An Introduction to Advanced Technology and Research for Undergraduates

Related Technologies and Applications:


Digital Cell Phones
Technologies for Cable Modems and Wi-Fi
Secure Military Communications

April 14, 2006, 9:45am-12pm, SCOB 101

Lectures and Modules for Undergraduates on:


Speech and Audio Coders, Andreas Spanias
Channel Coders, Tolga Duman
Time-Varying Signal Processing, Antonia Papandreou-Suppappola
Multicarrier and OFDM Systems, Cihan Tepedelenlioglu

Sponsored by the NSF Combined Research and Curriculum Development Grant 0417604
April 2006 Copyright (c) 2006 - Andreas Spanias II-1

Pedagogies for transition of research to UG curriculum

• DEMO MODULES (DM): Summer Freshman and Sophomore Research Camps; ASU J-DSP Technology for on-line Java Computer Labs
• SMALL 1-LECTURE/LAB MODULES (SM): 1 lecture/1 exercise; 4 Module Summaries to inject in 303, 350, 407, 455 (SS EEE 303, RDA EEE 350, DSP EEE 407, CS EEE 455)
• LARGE 6-LECTURE 498 MODULES (LM), EEE 498 Intro to SP-COM Research:
– Source Coding (6 lect/1 lab)
– Channel Coding (6 lect/1 lab)
– Multi-carrier (6 lect/1 lab)
– Time-varying signaling (6 lect/1 lab)
• SP-COM Research: drawn from ASU SP-COM research activities and from published research work from other universities
• Feedback/Improvement



Wireless Communications
(cell phone appl.)

[Figure: block diagram of the transmission chain.]
Transmitter: Input Speech → Source Coder → Channel Coding → Modulator → Channel
Receiver: Channel → Demodulator → Channel Decoding → Source Decoding → Output Speech


Speech and Audio Coding for Mobile and Multimedia Applications
CRCD Activity, April 14, 2006

by
Andreas Spanias, Professor
DSP and Speech Processing Labs.
Dept. of Electrical Engineering
Arizona State University
Tempe, AZ 85287-5706

email: spanias@asu.edu

http://www.eas.asu.edu/~spanias



Topics

1. The Speech Coding Problem

2. Speech Processing Analysis-Synthesis Algorithms

3. Historical Perspective on Algorithmic Research

4. The Standards on Speech Coding

5. Algorithm Examples

6. Research / Remarks


Digital Speech

$s(n) = s(nT) = s_a(t)\,|_{t = nT}$

Why Digital Speech?

- Can be Manipulated with Software
- Opportunities for Encryption and Enhanced Privacy
- Stored with High Fidelity
- Error Control
- Mixing Voice/Data/Video - Multimedia



Continuous vs Discrete-time (digital) Speech
[Figure: a continuous-time (analog) signal x(t), plotted against t, is sampled at t = 0, T, 2T, ... and quantized (Q) to give the discrete-time (digital) signal x(n), plotted against n.]

A signal that is bandlimited to B Hz must be sampled at a rate $f_s \ge 2B$.

Telephone speech is typically bandlimited to 3.2 kHz and sampled at 8 kHz.

Quantization Considerations

For uncompressed telephone speech: 8 bits per sample at 8000 samples per second, for a total of 8000 x 8 = 64 kilobits per second (kbits/s).

PCM at 64 kbits/s is often used as a reference for comparison.

To transmit this signal using a basic binary signaling scheme we need at least 32 kHz of bandwidth.
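
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and the bandwidth figure assumes ideal binary signaling at the Nyquist limit of 2 bits/s per Hz.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bits_per_sample


def min_binary_bandwidth_hz(bit_rate_bps):
    """Minimum bandwidth for ideal binary signaling (assumes 2 bits/s per Hz)."""
    return bit_rate_bps / 2


rate = pcm_bit_rate(8000, 8)
print(rate / 1000, "kbits/s")                       # 64.0 kbits/s
print(min_binary_bandwidth_hz(rate) / 1000, "kHz")  # 32.0 kHz
```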



Speech Coding

Speech coding or speech compression is the field concerned with obtaining compact digital representations of voice signals for the purpose of efficient transmission or storage.

Speech coding involves sampling and amplitude quantization.

The objective of speech coding is to represent speech with a minimum number of bits while maintaining its perceptual quality.


Medium, Low, and Very-low Rate Speech Coding

The speech methods discussed in this course are those intended for digital speech communications where speech is generally bandlimited to 4 kHz (or 3.2 kHz) and sampled at 8 kHz.

medium-rate: the range of 8 - 16 kbits/s

low-rate: the range below 8 kbits/s and down to 2.4 kbits/s

very-low-rate: the range below 2.4 kbits/s

Remark: Cellular, Voice-Over-IP and speech streaming applications typically use low-rate coders
Historical Perspective
The First Vocoder - Dudley’s Channel Vocoder
[Figure: block diagram of Dudley's channel vocoder. Analysis: a pitch channel (frequency discriminator, frequency meter, 0-25~ filter) and spectrum channels (filter, EQLZR, 0-300~ and 0-25~ filters). Synthesis: pitch oscillator, noise source, modulator, filter, EQLZR.]

A total of ten channels

H. Dudley, "Remaking Speech," J. Acoust. Soc. Am., Vol. 11, p. 169, 1939.
H. Dudley, "The Vocoder," Bell Labs. Record., 17, p. 122, 1939.

Voiced and Unvoiced Speech


[Figure: two time-domain speech segments (amplitude versus time, 0 to 32 ms) with their short-time spectra (magnitude in dB versus frequency, 0 to 4 kHz). Top: TAPE TIME 8014, annotated with the fundamental frequency and the formant structure. Bottom: TAPE TIME 3840.]



Fine (Pitch) and Formant Structure of the Short-time Speech Spectrum

Fine Harmonic Structure: reflects the quasi-periodicity of speech and is attributed to the vibrating vocal cords. Note the narrow peaks.

Formant Structure (Spectral Envelope): is due to the interaction of the source and the vocal tract. The vocal tract consists of the pharynx and the mouth cavity. Note the envelope peaks.


Simple Speech Synthesis Model (2)


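[Figure: the pitch period τ and the voicing information (V/UV) set the excitation, which is scaled by a gain and passed through the vocal tract filter to produce synthetic speech. The model requires “hard” (binary) voicing.]

$$H(z) = \frac{b_0}{1 + \sum_{i=1}^{M} a_i z^{-i}}$$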
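
A minimal sketch of this two-state model in Python, assuming an impulse train for voiced frames and Gaussian noise for unvoiced frames. The function name, the argument names and the first-order example filter are illustrative and not taken from the slides; the filter follows the sign convention of H(z) on this slide.

```python
import random


def synthesize(a, b0, gain, voiced, pitch_period, n_samples, seed=0):
    """Drive the all-pole filter H(z) = b0 / (1 + sum_i a_i z^-i).

    a holds a_1..a_M. The excitation is an impulse train with the given
    pitch period when voiced, and white Gaussian noise otherwise.
    """
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        if voiced:
            x = 1.0 if n % pitch_period == 0 else 0.0
        else:
            x = rng.gauss(0.0, 1.0)
        y = b0 * gain * x
        # Recursive part: y(n) = b0*g*x(n) - sum_i a_i y(n - i)
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                y -= ai * out[n - i]
        out.append(y)
    return out


# Hypothetical first-order "vocal tract" with a pole at z = 0.9
voiced = synthesize(a=[-0.9], b0=1.0, gain=1.0, voiced=True,
                    pitch_period=40, n_samples=80)
unvoiced = synthesize(a=[-0.9], b0=1.0, gain=1.0, voiced=False,
                      pitch_period=40, n_samples=80)
```

The voiced output repeats a decaying pulse every 40 samples, while the unvoiced output is filtered noise with the same spectral envelope.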



H(z) typically estimated using short term linear prediction

The Levinson-Durbin Algorithm

The recursive coefficient update for the m-th order predictor, m = 1, 2, ..., p:

$$\epsilon^f(0) = r_{ss}(0)$$

$$a_m(m) = \frac{r_{ss}(m) - \sum_{i=1}^{m-1} a_i(m-1)\, r_{ss}(m-i)}{\epsilon^f(m-1)}$$

$$a_i(m) = a_i(m-1) - a_m(m)\, a_{m-i}(m-1), \quad 1 \le i \le m-1$$

$$\epsilon^f(m) = \left(1 - (a_m(m))^2\right) \epsilon^f(m-1)$$

In $a_i(m)$ the subscript i is the coefficient index and the argument m is the predictor order.
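
The recursion can be sketched in plain Python as follows. The function name and the list-based interface are assumptions made for illustration; the code assumes r[0] > 0 and uses the slide's convention, in which the predictor is ŝ(n) = Σ a_i s(n − i).

```python
def levinson_durbin(r, p):
    """Predictor coefficients a_1..a_p from autocorrelations r[0..p].

    Returns (a, err) where a[i - 1] = a_i(p) and err is the final
    forward prediction error energy.
    """
    a = []        # coefficients of the order m-1 predictor
    err = r[0]    # error energy at order 0 is r_ss(0)
    for m in range(1, p + 1):
        acc = r[m] - sum(a[i - 1] * r[m - i] for i in range(1, m))
        k = acc / err                                  # a_m(m)
        # a_i(m) = a_i(m-1) - a_m(m) * a_{m-i}(m-1), for 1 <= i <= m-1
        a = [a[i - 1] - k * a[m - i - 1] for i in range(1, m)] + [k]
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of a first-order process with correlation 0.5
coeffs, energy = levinson_durbin([1.0, 0.5, 0.25], p=2)
print(coeffs, energy)   # [0.5, 0.0] 0.75
```

For this input the second coefficient comes out as zero, as expected for a first-order process.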


Speech Analysis-by-Synthesis (closed-loop)

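[Figure: closed-loop analysis-by-synthesis coder. An excitation is selected or formed, scaled by a gain, and passed through the LTP synthesis filter (built from A_L(z)) and the LP synthesis filter (built from A(z)) to give ŝ(n). The error s(n) − ŝ(n) is weighted by W(z) and its MSE drives the excitation choice. Insets show the frequency responses of the two synthesis filters.]

Synthetic speech is forced to match the input speech.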



Code Excited Linear Prediction (2)
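The Nx1 error vector

$$e_c(k) = s_w - \hat{s}_w^0 - g_k \hat{s}_w(k)$$

where $\hat{s}_w^0$ is the output due to the initial filter state.

Minimizing $\epsilon_c(k) = e_c^T(k)\, e_c(k)$ w.r.t. $g_k$ we get

$$g_k = \frac{s_w^T \hat{s}_w(k)}{\hat{s}_w^T(k)\, \hat{s}_w(k)}$$

(in the gain expression, $s_w$ denotes the target after $\hat{s}_w^0$ has been subtracted)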


Code Excited Linear Prediction (3)

$$\epsilon_c(k) = s_w^T s_w - \frac{\left(s_w^T \hat{s}_w(k)\right)^2}{\hat{s}_w^T(k)\, \hat{s}_w(k)}$$

The k-th excitation vector, $x_c(k)$, that minimizes $\epsilon_c(k)$ is selected.
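
An exhaustive codebook search under this criterion might look like the sketch below. It assumes the target already has the zero-input response removed and that every codebook vector has already been passed through the weighted synthesis filter; the function names and the toy codebook are illustrative only.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def search_codebook(target, filtered_codebook):
    """Return (k, gain, error) for the entry with the smallest error.

    target: weighted speech with the initial-state response removed.
    filtered_codebook: list of filtered excitation vectors, one per index k.
    """
    best = None
    target_energy = dot(target, target)
    for k, y in enumerate(filtered_codebook):
        energy = dot(y, y)
        if energy == 0.0:
            continue                      # skip all-zero entries
        corr = dot(target, y)
        gain = corr / energy              # optimal gain for this entry
        err = target_energy - corr * corr / energy
        if best is None or err < best[2]:
            best = (k, gain, err)
    return best


target = [1.0, 2.0, 0.0]
codebook = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(search_codebook(target, codebook))   # (1, 2.0, 0.0)
```

Entry 1 is a scaled copy of the target, so it is matched exactly with a gain of 2.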

Closed-loop analysis is used for LTP parameters; range of values for τ within the integers 20 to 147

M.R. Schroeder and B. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at
Very Low Bit Rates," Proc. ICASSP-85, p. 937, Tampa, Apr. 1985.



LTP excited by a random signal creates pseudo-periodicity

$$\frac{1}{1 - 0.95 z^{-30}}$$

[Figure: impulse response and frequency response of this filter; magnitude response (dB) versus normalized frequency (Nyquist = 1).]
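
The behaviour of this filter can be reproduced numerically. The sketch below uses hypothetical function names and computes the impulse response, which repeats every 30 samples with decay 0.95, and the magnitude response, which peaks at multiples of 2π/30 rad/sample.

```python
import cmath
import math


def ltp_impulse_response(b, lag, n):
    """h(n) of 1 / (1 - b z^-lag), i.e. y(n) = x(n) + b * y(n - lag)."""
    h = []
    for i in range(n):
        x = 1.0 if i == 0 else 0.0
        h.append(x + (b * h[i - lag] if i >= lag else 0.0))
    return h


def ltp_magnitude_db(b, lag, w):
    """Magnitude response in dB at radian frequency w."""
    return -20.0 * math.log10(abs(1.0 - b * cmath.exp(-1j * w * lag)))


h = ltp_impulse_response(0.95, 30, 91)
print(h[0], h[30], h[60], h[90])    # pulses decaying by 0.95 per period

peak = ltp_magnitude_db(0.95, 30, 2 * math.pi / 30)   # on a pitch harmonic
valley = ltp_magnitude_db(0.95, 30, math.pi / 30)     # midway between harmonics
print(round(peak, 1), round(valley, 1))
```

Feeding white noise instead of a single impulse through the same recursion gives an output whose spectrum carries these harmonic peaks, which is the pseudo-periodicity named in the slide title.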


Perceptual Weighting Filter (2)

Short Term Predictor

$$H(z) = \frac{1}{1 - \sum_{i=1}^{10} a_i z^{-i}}$$

Perceptual Filter, γ = 0.9

$$W(z) = \frac{1 - \sum_{i=1}^{p} a_i z^{-i}}{1 - \sum_{i=1}^{p} \gamma^i a_i z^{-i}}$$

[Figure: magnitude responses of the short-term predictor and of the perceptual filter.]
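
As a rough illustration, W(z) can be applied sample by sample as below. The function names are assumptions; the coefficient list holds a_1..a_p in the sign convention of this slide, and the denominator taps are the same coefficients scaled by γ^i.

```python
def bandwidth_expand(a, gamma):
    """Return gamma**i * a_i for i = 1..p (denominator taps of W(z))."""
    return [(gamma ** i) * ai for i, ai in enumerate(a, start=1)]


def perceptual_weight(s, a, gamma=0.9):
    """Filter s through W(z) = (1 - sum a_i z^-i) / (1 - sum gamma^i a_i z^-i)."""
    ag = bandwidth_expand(a, gamma)
    out = []
    for n in range(len(s)):
        y = s[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                y -= a[i - 1] * s[n - i]      # numerator (FIR) part
                y += ag[i - 1] * out[n - i]   # denominator (IIR) part
        out.append(y)
    return out


print(bandwidth_expand([1.0, 1.0], 0.9))
print(perceptual_weight([1.0, 0.0, 0.0], [0.5], gamma=0.9))
```

Two limiting cases help check the behaviour: with γ = 1 the numerator and denominator cancel and the signal passes unchanged, and with γ = 0 the filter reduces to the prediction error filter.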



Performance and Computational Complexity

A speech coding algorithm is designed and evaluated based on:

1. Bit rate

2. The quality of reconstructed (“coded”) speech

3. The complexity of the algorithm

4. The end-to-end delay


Subjective Speech Quality


Broadcast
Broadcast wideband speech refers to high quality
“commentary” speech at rates above 64 kbits/s.

Network or toll
Toll or Network quality refers to quality comparable
to the classical analog speech (200-3200 Hz)
Communications
Communications quality implies somewhat degraded
speech quality but adequate for cellular communications.
Synthetic
Synthetic speech is usually intelligible but can be
unnatural and associated with a loss of speaker recognizability.



The Mean Opinion Score

MOS Scale Speech Quality


1 Bad
2 Poor
3 Fair
4 Good
5 Excellent


The Mean Opinion Score (2)

The MOS range relates to speech quality as follows :

MOS 4.0 - 4.5 : network or toll quality

MOS 3.5 - 4.0 : communications quality

MOS 2.5 - 3.5 : synthetic quality

Remarks: MOS ratings may differ significantly from test to test and hence they are not absolute measures for the comparison of different coders.



Wideband CDMA
• Objective to meet IMT 2000 requirements (at least 144 kb/s in a vehicular environment, 384 kb/s in a pedestrian environment, and 2048 kb/s in an indoor office environment)
• To support next generation data services envisioned up to 2 Mb/s (full coverage and mobility for 144 kb/s, preferably 384 kb/s; limited coverage and mobility for 2 Mb/s)
• Enhanced Voice Services (audioconferencing & voice mail)
• Concurrent high-quality video/audio
• Backward compatible with IS-95B
• High security & low power
• Significantly enhanced version of EVRC for voice services

- http://www.comsoc.org/pubs/surveys/4q98issue/prasad.html
- D. Knisely et al., "Evolution of Wireless Data Services: IS-95 to CDMA 2000," IEEE Communications Magazine, pp. 140-149, October 1998
- IS-95 CDMA and cdma2000: Cellular/PCS Systems Implementation, 1/e, Vijay K. Garg, University of Illinois, Chicago, Illinois. Published December 1999 by Prentice Hall PTR (ECS Professional)


GSM Adaptive Multirate Coder

• Adjusts its bit-rate according to network load
• Rates 12.2, 10.2, 7.95, 6.7, 5.9, 5.15, 4.75 kb/s
• Based on CELP with 20 ms frame and 5 ms subframe
• Multirate-ACELP with 10th order short-term LPC and perceptual weighting (uses Levinson-Durbin)
• Encodes LSPs using split VQ
• An open loop LTP is first obtained and refined by closed loop
• Highest bit rate provides toll quality & half rate provides communications quality
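
Since every mode uses the same 20 ms frame, each rate maps to a fixed number of bits per frame. A small sketch of that arithmetic follows; the names are illustrative, and the way the bits are split among coder parameters is not shown because the slides do not give it.

```python
AMR_RATES_KBPS = [12.2, 10.2, 7.95, 6.7, 5.9, 5.15, 4.75]
FRAME_MS = 20
SUBFRAME_MS = 5


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """kbits/s times milliseconds gives bits; round to absorb float error."""
    return round(rate_kbps * frame_ms)


print("subframes per frame:", FRAME_MS // SUBFRAME_MS)
for rate in AMR_RATES_KBPS:
    print(rate, "kb/s ->", bits_per_frame(rate), "bits per 20 ms frame")
```

The 12.2 kb/s mode, for example, spends 244 bits on each frame and the 4.75 kb/s mode spends 95.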

- ETSI TS 126 090 V.3.1.0 2000-01 - AMR SPEECH CODEC TRANSCODING FUNCTIONS 3G-TS 26.090 Technical Specification
- R. Ekudden, R. Hagen, I. Johansson, and J. Svedburg, "The Adaptive Multi-Rate speech coder," Proc. IEEE Workshop on Speech Coding, pp. 117-119, 1999



The Selectable Mode Vocoder

• Algorithm to provide higher quality, flexibility, and capacity over existing IS-96C, IS-127 EVRC, and IS-733 (that replaced IS-96C but works at a higher average rate)
• The Conexant SMV algorithm became the core technology for 3G CDMA (core SMV algorithm to be refined in the interim by participating companies according to the publication below)
• Based on 4 codecs: full rate at 8.5 kbps, half rate at 4 kbps, quarter rate at 2 kbps, and eighth rate at 800 bps
• Pre-processing includes noise suppression similar to IS-127 EVRC
• Full rate and half rate based on Conexant's eXtended CELP (eX-CELP), a core technology also used in the ITU G.4 Conexant submission to ITU-4
• Performed better than IS-733 and IS-127 in tests with and without background noise
• Scored as high as 4.1 MOS at full rate with clean speech. Performed very well with background noise
REFERENCES:
[1] “The SMV algorithm selected for TIA and 3GPP2 for CDMA applications,” conference paper by Conexant Systems, Y. Gao, E. Schlomot, A. Benyassine, J. Thyssen, H. Su, and C. Murgia (portions published at ICASSP-2001)


STANDARDS AT A GLANCE

• ITU Wideband Coding
– G.722 Coding of 7 kHz speech at 64, 56, 48 kbps - Sub-band ADPCM
– G.WB1 Coding of 7 kHz speech at 32/24 kbps - Combined Transform and CELP coding
– G.WB2 Coding of 7 kHz speech at 16 kbps or less (ongoing)

• ITU Telephony
– G.711 PCM (64 kbps) late 60’s
– G.726 ADPCM (32/40/24/16 kbps) 1988
– G.728 LD-CELP coding (16 kbps) 1992
– G.723.1 True Speech (5.3/6.3 kbps) 1995
– G.729 CS-ACELP (8/12.8/6.4 kbps) 1996 and Annex in 1998
– G.4kbps Toll quality at 4 kbps (ongoing)

• Non-ITU
– MPEG1/Audio (includes MP3), 1991
– MPEG2/Audio: 64 kbps (1992)
– MPEG4/Audio: audio/speech coding at bit rates between 64 and 2 kbps (1998)
– MPEG7/Audio: audio/speech/MIDI coding (ongoing)



STANDARDS AT A GLANCE (2)
• TIA
– CDMA
• IS96 8,4,2 kbps Q-CELP (Qualcomm CELP, 1992)
• IS127 8.55, 4, 0.8 kbps EVRC (Enhanced Variable Rate Coder, 1996)
• IS733 13.3, 6.2, 2.7, 1 kbps VRC (Variable Rate Coder, 1998)
• 3GPP2 0.8-8.55 kbps SMV (Selectable Mode Vocoder, 2001)
– TDMA
• IS54 7.95 kbps VSELP (Vector-Sum Excited Linear Prediction, 1989)
• IS641 7.4 kbps CELP (Similar to EFR but at lower rate, 1997)
– PCS1800 (GSM variant working at 1800 MHz)
• IS136-410 12.2 kbps US1 (1999)

• ETSI (GSM):
– 13 kbps RPE-LTP (Full rate GSM, 1988)
– 6.5 kbps VSELP (Half-rate GSM, 1993)
– 12.2 kbps EFR (Enhanced full-rate GSM, 1996)
– 12.2 - 4.75 kbps AMR (Adaptive Multi Rate, 1999)

• ARIB Japan
– Full-rate PDC (Personal Digital Communication) 6.7 kbps VSELP
– Half-rate PDC 3.45 kbps Multimode CELP


Vocoder/Waveform/Hybrid

[Figure: MOS (1-5) versus bit rate (1, 2, 4, 8, 16, 32, 64 kbps) for three classes of coders: vocoders (LPC10e, MELP), hybrid coders (CELP, SMV), and waveform coders (ADPCM, PCM).]



PERFORMANCE OF SOME STANDARDIZED ALGORITHMS

Algorithm               Bit Rate (kbits/s)   MOS            Complexity (MIPS)   Framesize (ms)

PCM G.711               64                   4.3            0.01                0
ADPCM G.726             32                   4.1            2                   0.125
SBC G.722               48/56/64             4.1            5                   0.125
LD-CELP G.728           16                   4              ~30                 0.625
CS-ACELP G.729          8                    4              ~20                 10
CS-ACELP-A G.729        8                    3.76           11                  10
MPC-MLQ G.723.1         6.3/5.3              3.98/3.7       ~16                 30
GSM FR RPE-LTP          13                   3.7 (ave)      5                   20
GSM EFR                 13                   4              14                  20
GSM HR VSELP            6.3                  ~3.4           14                  20
IS-54 VSELP             8                    3.5            14                  20
IS-641 EFR              8                    3.8            14                  20
Conexant eX-CELP SMV    8.55/4/2/0.8         ~4.1 (8.55)    ~20                 20
IS-96 QCELP             1.2/2.4/4.8/9.6      3.33 (9.6)     15                  20
IS-127 EVRC             1.2/4.8/9.6          ~3.8 (9.6)     20                  20
PDC VSELP               6.3                  3.5            14                  20
PDC PCI-CELP            3.45                 ~3.4           ~48                 40
FS 1015 – LPC 10e       2.4                  2.3            7                   22.5
FS 1016 – CELP 4.8      4.8                  3.2            16                  30
MELP                    2.4                  3.2            ~30                 22.5
Inmarsat-B APC          9.6/12.8             ~3.1/3.4       10                  20
Inmarsat-M IMBE         6.3                  3.4            ~13                 20


Research in Speech and Audio Coding at Arizona State


Speech Coding
S. Ahmadi and A. Spanias, “Algorithms for Low-bit rate sinusoidal coding,” Speech
Communications, Vol. 34(2001), pp.369-390, June 2001 - Research funded by Intel Corp.
Perceptual LPC, ICASSP 05, Atti Venkatraman, NSF

Audio Coding
Selection of sinusoids based on perceptual criteria: T. Painter and A. S. Spanias, “Sinusoidal Analysis-Synthesis of Audio using Perceptual Criteria,” Proc. IEEE International Symposium on Circuits and Systems (ISCAS-02), Phoenix, May 2002. - Research funded by Intel Corporation
Enhancing the Bandwidth of Speech Coders, ISCAS05, Visar Berisha, NSF

2002 Donald G. Fink Prize Paper Award by IEEE Board of Directors - Award Winning Paper
T. Painter and A. S. Spanias, “Perceptual coding of digital audio,” Proc. of the IEEE, vol. 88, no. 4, pp. 451-513, Apr. 2000. It was recognized by the IEEE Board of Directors with the prestigious 2002 IEEE Donald G. Fink Prize Paper Award. (A. Spanias principal investigator and Ph.D. advisor of T. Painter)
