
18-796

Multimedia Communications: Coding, Systems, and Networking

Prof. Tsuhan Chen tsuhan@ece.cmu.edu

MPEG Audio

Outline
- Basics
  - Psychoacoustics
  - Subband coding
- MPEG-1 audio
  - Layer I and II
  - Layer III
  - Frame structure and packetization
- MPEG-2 audio
  - Multichannel audio
  - Backward compatible coding
  - Non-backward-compatible coding
18-796/Spring 1999/Chen

Digital Audio

                   Frequency Band (Hz)   Sampling Rate (kHz)   Bits per Sample   Raw Bitrate (kbits/s)
Telephone Speech   300~3400              8                     8                 64
Wideband Speech    50~7000               16                    8                 128
Mediumband Audio   10~11000              24                    16                384
Wideband Audio     10~22000              48                    16                768

CD: 44.1 kHz × 16 bits × 2 channels = 1.411 Mbits/s
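The raw bitrates on this slide are the product of sampling rate, bits per sample, and channel count. A minimal sketch of that arithmetic follows; the function name `raw_bitrate_kbps` is illustrative and does not come from the slides.

```python
def raw_bitrate_kbps(sampling_rate_khz, bits_per_sample, channels=1):
    """Raw PCM bitrate in kbits/s: sampling rate x bits per sample x channels."""
    return sampling_rate_khz * bits_per_sample * channels


# Mono rows of the Digital Audio slide, then two-channel CD audio.
telephone_speech = raw_bitrate_kbps(8, 8)
wideband_speech = raw_bitrate_kbps(16, 8)
mediumband_audio = raw_bitrate_kbps(24, 16)
wideband_audio = raw_bitrate_kbps(48, 16)
cd_audio = raw_bitrate_kbps(44.1, 16, channels=2)  # roughly 1411 kbits/s
```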


Psychoacoustics
Threshold in quiet

26 critical bands 0~24 kHz

Frequency masking in the same critical band
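To illustrate how critical bands widen with frequency, the sketch below maps frequency in Hz to critical-band rate in Bark. It uses Zwicker's widely quoted approximation, which is an assumption brought in for illustration and is not taken from these slides; the function name is likewise illustrative.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz (Zwicker's formula)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


# Equal steps in Hz span fewer critical bands as frequency rises.
for f in (100, 500, 1000, 4000, 10000, 20000):
    print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark")
```

A 100 Hz step near the low end of the spectrum covers about one critical band, while the same step near 10 kHz covers only a small fraction of one.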



Frequency Masking
SMR (Signal-to-Mask Ratio)


Temporal Masking
Post-Masking: 50~200ms

Pre-Masking: 1/10 of post-masking


Subband Coding
[Figure: M-channel subband codec. Analysis filterbank H1(z), H2(z), ..., HM(z), each followed by downsampling by M and a quantizer Q; then upsampling by M and the synthesis filterbank F1(z), F2(z), ..., FM(z).]

- Maximal downsampling
- Q should be based on the signal-to-mask ratio (SMR)
- Ear critical bands are not uniform, but logarithmic
- The filter bank should match the critical bands
- Tree-structured filter bank (to be derived on board)
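One way to read "Q should be based on SMR" is as a bit-allocation rule: each extra quantizer bit lowers the noise by roughly 6 dB, so bits go first to the subbands whose noise sits furthest above the masking threshold. The greedy loop below is a simplified sketch of that idea, not the allocation procedure of any standard; the function name, the 6.02 dB-per-bit rule, and the example SMR values are assumptions for illustration.

```python
def allocate_bits(smr_db, bit_pool, max_bits=15, db_per_bit=6.02):
    """Greedy allocation: repeatedly give one bit to the subband whose
    quantization noise is most audible (highest noise-to-mask ratio)."""
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        # NMR = SMR - SNR, with SNR approximated as db_per_bit * bits.
        nmr = [smr - db_per_bit * b if b < max_bits else float("-inf")
               for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # noise is already below the masking threshold everywhere
        bits[worst] += 1
    return bits


# Hypothetical SMR values (dB) for four subbands.
smr = [18.0, 9.0, -3.0, 30.0]
print(allocate_bits(smr, bit_pool=8))
```

A subband with negative SMR is fully masked and receives no bits, which is where most of the bitrate saving comes from.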

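Subband Coding vs. DCT
[Figure: polyphase representation, with delays z^-1, downsamplers M, analysis matrix E(z), synthesis matrix R(z), upsamplers M, and advances z.]

- When E(z) = DCT matrix, this becomes the DCT
  - No overlap; blocking artifacts
- Modified DCT (MDCT)
  - 50% overlap; fewer blocking artifacts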
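The 50% overlap of the MDCT can be checked numerically: each frame of 2N samples yields only N coefficients, yet overlap-adding the inverse transforms of neighbouring frames cancels the time-domain aliasing. The sketch below is a direct O(N^2) implementation under stated assumptions: a sine window, a 2/N inverse scaling, and zero padding of one hop at each end. All function names are illustrative.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def _basis(n, k, N):
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct(frame, window):
    """2N windowed input samples -> N coefficients."""
    N = len(frame) // 2
    return [sum(window[n] * frame[n] * _basis(n, k, N) for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs, window):
    """N coefficients -> 2N windowed, time-aliased output samples."""
    N = len(coeffs)
    return [(2.0 / N) * window[n] * sum(coeffs[k] * _basis(n, k, N) for k in range(N))
            for n in range(2 * N)]


def mdct_roundtrip(signal, N):
    """Analyze and resynthesize with hop size N (50% overlap) and overlap-add."""
    assert len(signal) % N == 0, "signal length must be a multiple of N"
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        y = imdct(mdct(padded[start:start + 2 * N], w), w)
        for n, v in enumerate(y):
            out[start + n] += v
    return out[N:N + len(signal)]


x = [math.sin(0.3 * n) + 0.05 * n for n in range(32)]
y = mdct_roundtrip(x, N=8)
print(max(abs(a - b) for a, b in zip(x, y)))  # reconstruction error
```

Without quantization the reconstruction error should be at the level of floating-point rounding, which is what makes the overlapped transform usable as a critically sampled filterbank.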

MPEG-1 Audio
- ISO/IEC 11172-3 (1988~1991)
- First high-quality audio compression standard
- Sampling rates: 32, 44.1, 48 kHz
- CD-quality two-channel audio at ~256 kbits/s
  - CD: 44.1 kHz × 16 bits × 2 = 1.411 Mbits/s
- Quality demonstration (MPEG-1 Layer II)
  - Stereo 44.1 kHz at 64 kbits/s
  - Stereo 44.1 kHz at 128 kbits/s
  - Stereo 44.1 kHz at 192 kbits/s
  - Stereo 44.1 kHz at 256 kbits/s

Encoder Block Diagram
[Figure: 11172-3 encoder. PCM audio samples (32, 44.1, 48 kHz) -> analysis filterbank -> quantizer and coding -> frame packing -> encoded bitstream; a psychoacoustic model steers the quantizer and coding; ancillary data enters at frame packing.]

Decoder Block Diagram
[Figure: 11172-3 decoder. Encoded bitstream -> frame unpacking -> reconstruction -> synthesis filterbank -> PCM audio samples (32, 44.1, 48 kHz); ancillary data is extracted at frame unpacking.]


Layers
- Increasing complexity, delay, and quality
  - Layer I: ~384 kbits/s for perceptually lossless quality (4:1)
  - Layer II: ~192 kbits/s for perceptually lossless quality (8:1)
  - Layer III: ~128 kbits/s for perceptually lossless quality (12:1)
  - (for two channels)
- 100% perceptually lossless
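The 4:1, 8:1, and 12:1 figures are consistent with two-channel 16-bit PCM at 48 kHz (1536 kbits/s) as the uncompressed reference. That reference is an inference from the numbers rather than something stated on the slide, and the function name below is illustrative.

```python
def compression_ratio(bitrate_kbps, sampling_rate_khz=48, bits_per_sample=16, channels=2):
    """Ratio of raw PCM bitrate to coded bitrate."""
    raw_kbps = sampling_rate_khz * bits_per_sample * channels
    return raw_kbps / bitrate_kbps


for layer, kbps in (("I", 384), ("II", 192), ("III", 128)):
    print(f"Layer {layer}: {compression_ratio(kbps):.1f}:1")
```

Measured against CD audio at 44.1 kHz instead, the same bitrates give slightly lower ratios, for example `compression_ratio(128, sampling_rate_khz=44.1)` for Layer III.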


Layer I and II Encoder
[Figure: Layer I/II encoder. 32-band analysis filterbank (512-tap) -> scaler & quantizer -> coder -> mux; side chain: FFT -> masking threshold generator -> dynamic bit allocator.]

- FFT: 512-pt for Layer I, 1024-pt for Layer II/III


Block-Based Coding
[Figure: analysis filterbank output grouped into blocks of 12 samples per subband. Block: Layer I. Superblock (three blocks of 12): Layer II/III.]

- 12 samples for Layer I, 36 samples for Layer II/III
- Block companding: each block is normalized by a scalefactor
- For Layer II, up to 3 scalefactors, with a 2-bit scalefactor select
- Each block/superblock receives one bit allocation
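Block companding can be sketched as: find one scalefactor for the block, divide every sample by it, and quantize the normalized values with the number of bits allocated to that subband. The code below is a simplified illustration under two assumptions: the scalefactor is taken as the block's peak magnitude (the standard instead selects it from a fixed table), and the quantizer is a symmetric uniform one with `bits >= 2`. Function names are illustrative.

```python
def compand_block(samples, bits):
    """Normalize a block by its scalefactor and quantize uniformly.

    Assumption: the scalefactor is the block's peak magnitude.
    """
    scalefactor = max(abs(s) for s in samples) or 1.0
    half_levels = ((1 << bits) - 1) // 2  # symmetric quantizer around zero
    codes = [round(s / scalefactor * half_levels) for s in samples]
    return scalefactor, codes


def expand_block(scalefactor, codes, bits):
    """Invert compand_block up to quantization error."""
    half_levels = ((1 << bits) - 1) // 2
    return [c / half_levels * scalefactor for c in codes]


block = [0.02 * n - 0.1 for n in range(12)]  # 12 subband samples, as in a Layer I block
sf, codes = compand_block(block, bits=4)
decoded = expand_block(sf, codes, bits=4)
print(sf, codes)
```

Because the step size scales with the scalefactor, quiet blocks get proportionally finer quantization than loud ones for the same bit allocation.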

Layer III Encoder
[Figure: Layer III encoder. Analysis filterbank -> MDCT (6 or 18, with overlap) -> scaler & quantizer -> Huffman coding -> mux; side chain: FFT -> masking threshold generator, and a coding block.]


Features in Layer III
- Hybrid filterbank
  - MDCT with filterbank
- Long/short window switching
  - Short for better temporal resolution (to prevent pre-echoes)
  - Long for better frequency resolution
- Nonuniform quantization
- Entropy coding
  - Run-length and Huffman coding
- Bit reservoir (buffer)


Frame Structure
Header Info | Side Info | Subband Samples | Aux Data

- Header info: sync bits, system info, CRC (cyclic redundancy code)
- Side info: bit allocation, scalefactors (and scalefactor select for Layer II and III)
- Subband samples: 32 × 12 for Layer I, 32 × 36 for Layer II and III
- Packetization: 4-byte header, 184-byte payload
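The subband sample counts fix the frame length in samples, and from there the frame duration and average frame size follow from the sampling rate and bitrate. The sketch below works through that arithmetic; it ignores padding, assumes a frame is simply split across fixed 184-byte payloads, and uses illustrative function names throughout.

```python
import math

NUM_SUBBANDS = 32
SAMPLES_PER_SUBBAND = {1: 12, 2: 36, 3: 36}  # per frame, by layer


def frame_samples(layer):
    """PCM samples represented by one frame."""
    return NUM_SUBBANDS * SAMPLES_PER_SUBBAND[layer]


def frame_duration_ms(layer, sampling_rate_hz):
    return 1000.0 * frame_samples(layer) / sampling_rate_hz


def frame_bytes(layer, bitrate_bps, sampling_rate_hz):
    """Average frame size in bytes (padding is ignored in this sketch)."""
    return frame_samples(layer) * bitrate_bps / (8.0 * sampling_rate_hz)


def packets_per_frame(n_bytes, payload_bytes=184):
    """Assumption: the frame is split across fixed-size payloads."""
    return math.ceil(n_bytes / payload_bytes)


size = frame_bytes(2, 192_000, 48_000)
print(frame_samples(2), frame_duration_ms(2, 48_000), size, packets_per_frame(size))
```

For Layer II at 192 kbits/s and 48 kHz this gives 1152 samples per frame and a 24 ms frame, which is one reason the higher layers have more delay than Layer I.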


Stereo Redundancy Coding
- Four modes: mono, stereo, dual with two separate channels, joint stereo
- Joint stereo mode
  - Human stereo perception > 2 kHz is based on envelope
  - Intensity stereo coding > 2 kHz
    - Encode (L + R)
    - Assign independent left- and right-channel scalefactors
  - Layer III supports (L + R) and (L - R) coding
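Sum/difference (mid/side) coding replaces the left and right channels with their sum and difference; when the two channels are similar, the difference signal is small and cheap to code. The sketch below assumes a 1/sqrt(2) scaling so that the same operation inverts itself; that scaling and the function names are assumptions for illustration, not taken from the slides.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)


def ms_encode(left, right):
    """Left/right -> mid (sum) and side (difference)."""
    mid = [(l + r) * _SCALE for l, r in zip(left, right)]
    side = [(l - r) * _SCALE for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Mid/side -> left/right."""
    left = [(m + s) * _SCALE for m, s in zip(mid, side)]
    right = [(m - s) * _SCALE for m, s in zip(mid, side)]
    return left, right


left = [0.50, 0.40, -0.20, 0.10]
right = [0.48, 0.41, -0.22, 0.09]  # nearly identical to the left channel
mid, side = ms_encode(left, right)
print(sum(s * s for s in side) / sum(m * m for m in mid))  # side energy relative to mid
```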


MPEG-2 Audio
- ISO/IEC 13818-3
- Allows lower sampling rates
  - 16, 22.05, and 24 kHz: about half of MPEG-1
  - From wideband speech to mediumband audio
  - Higher frequency resolution
  - Layer I, II, and III
- Multichannel coding
  - 2~5 channels; surround sound, multilingual, for visual/hearing-impaired
  - Backward compatible and non-backward-compatible coding (13818-7: MPEG-2 AAC)
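The "higher frequency resolution" at the lower sampling rates can be seen from the subband width: if the same 32-band filterbank is kept (an assumption here, not stated on the slide), halving the sampling rate halves the width of each subband. The function name is illustrative.

```python
def subband_width_hz(sampling_rate_hz, num_subbands=32):
    """Width of one uniform subband covering 0 to half the sampling rate."""
    return sampling_rate_hz / 2 / num_subbands


for fs in (48_000, 24_000, 44_100, 22_050, 32_000, 16_000):
    print(f"{fs} Hz -> {subband_width_hz(fs):.1f} Hz per subband")
```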



Multichannel Audio
[Figure: loudspeaker configurations 2/0 (stereo), 3/0, 3/1 (with surround), 3/2, and 3/2 with woofer (5.1 system).]

- LFE: low-frequency enhancement (woofer), 15~120 Hz; can be placed anywhere


Compatibility
- Forward compatibility
  - A new decoder can decode an old bitstream
  - Usually simple to achieve
- Backward compatibility
  - An old decoder can decode a new bitstream, at least partially
  - Usually limits the coding efficiency


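MPEG-2 Backward Compatible Audio Coding
[Figure: MPEG-1/2 frame. MPEG-1 header | MPEG-1 data | MPEG-1 ancillary data, where the ancillary data field carries the MPEG-2 header and MPEG-2 data.]
[Figure: encoder. L, C, R, LS, RS -> matrix -> L0, R0 -> MPEG-1 encoder; T3, T4, T5 -> MPEG-2 extension encoder; both -> mux.]

$L0 = \alpha \, (L + \beta \, C + \gamma \, LS)$
$R0 = \alpha \, (R + \beta \, C + \gamma \, RS)$
with $\alpha = \frac{1}{1 + \sqrt{2}}$; $\beta = \gamma = \frac{1}{\sqrt{2}}$, or $\alpha = 1$; $\beta = \gamma = 0$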

Backward Compatible Audio Coding (cont.)
[Figure: matrixing and dematrixing. Matrixing: L, C, R, LS, RS -> matrix -> L0, R0 -> MPEG-1 encoder and T3, T4, T5 -> MPEG-2 extension encoder -> mux. Dematrixing: demux -> MPEG-1 decoder (L0, R0) and MPEG-2 extension decoder (T3, T4, T5) -> inverse matrix -> L, C, R, LS, RS.]
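Matrixing and dematrixing can be sketched per sample as follows. The sketch assumes a downmix of the form L0 = alpha (L + beta C + gamma LS) and R0 = alpha (R + beta C + gamma RS) with alpha = 1/(1 + sqrt(2)) and beta = gamma = 1/sqrt(2), assumes the extension channels T3, T4, T5 carry C, LS, RS (one possible choice), and ignores quantization noise from the two codecs. Function names are illustrative.

```python
import math

ALPHA = 1.0 / (1.0 + math.sqrt(2.0))
BETA = GAMMA = 1.0 / math.sqrt(2.0)


def matrix(L, C, R, LS, RS, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Five input channels -> compatible stereo pair (L0, R0) plus T3, T4, T5."""
    L0 = alpha * (L + beta * C + gamma * LS)
    R0 = alpha * (R + beta * C + gamma * RS)
    return L0, R0, C, LS, RS  # assumption: T3, T4, T5 = C, LS, RS


def dematrix(L0, R0, T3, T4, T5, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Recover L, C, R, LS, RS from the transmitted channels."""
    C, LS, RS = T3, T4, T5
    L = L0 / alpha - beta * C - gamma * LS
    R = R0 / alpha - beta * C - gamma * RS
    return L, C, R, LS, RS


sample = (0.3, -0.1, 0.25, 0.05, -0.02)  # L, C, R, LS, RS
print(dematrix(*matrix(*sample)))
```

An MPEG-1 decoder only sees L0 and R0 and plays them as ordinary stereo; a multichannel decoder additionally reads T3, T4, T5 and inverts the matrix. In a real codec the recovered L and R also inherit coding noise from the other channels, which is one reason backward compatibility limits coding efficiency.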


Non Backward Compatible (NBC) Coding
- MPEG-2 Advanced Audio Coding (AAC)
  - ISO/IEC 13818-7 (April 1997)
  - 320~384 kbits/s for 5 channels, 64 kbits/channel
  - NBC at 320 kbits/s as good as BC coding at 640 kbits/s
  - 1~48 audio channels, 0~16 LFEs, 0~16 data streams
- Same framework (perceptual subband coding) as MPEG-1, with some enhancements


MPEG-2 AAC
- Enhancements
  - Preprocessing
  - High-resolution filterbanks
    - 1024-line MDCT / 128
  - Temporal noise shaping (TNS): time-dependent quantization
  - Coupling channel
    - Intensity multichannel coding
  - Backward adaptive prediction in subbands
  - M/S stereo coding
  - Noiseless coding (entropy coding): Huffman coding

[Figure: decoder. 13818-7 coded audio stream -> bitstream demultiplex -> noiseless decoding -> inverse quantizer -> scale factors -> M/S -> prediction -> intensity/coupling -> TNS -> filter bank -> gain control -> output time signal. Legend: data, control.]

[Figure: encoder. Input time signal -> gain control -> filter bank -> TNS -> intensity/coupling -> prediction -> M/S -> scale factors -> quantizer -> noiseless coding -> bitstream multiplex -> 13818-7 coded audio stream. A perceptual model and a rate/distortion control process (iteration loops) steer the chain; the predictor uses the quantized spectrum of the previous frame. Legend: data, control.]


MPEG-2 AAC Profiles
[Figure: Main, Low Complexity, and Scalable Sampling Rate profiles; labels 20 kHz, 18 kHz, 12 kHz, 6 kHz.]

- Main profile
  - Best quality, highest complexity
  - 1024 or 128 MDCT
- Low-complexity profile
  - No temporal noise shaping, no prediction
- Scalable sampling-rate profile
  - Scalable output sampling rates and complexity
  - Uses hybrid filterbanks (like MPEG-1 Layer III)
  - No prediction, no coupling channel

Simulcast
- To achieve backward compatibility at the cost of higher bitrate
[Figure: L, C, R, LS, RS -> MPEG-2 AAC encoder and L0, R0 -> MPEG-1 encoder -> mux -> demux -> MPEG-2 AAC decoder (L, C, R, LS, RS) and MPEG-1 decoder (L0, R0).]


