MPEG Audio: Multimedia Communications: Coding, Systems, and Networking

18-796
Multimedia Communications: Coding, Systems, and Networking
Prof. Tsuhan Chen tsuhan@ece.cmu.edu
MPEG Audio
Outline
Basics
  Psychoacoustics
  Subband coding
MPEG-1 audio
  Layer I and II
  Layer III
  Frame structure and packetization
MPEG-2 audio
  Multichannel audio
  Backward compatible coding
  Non backward compatible coding
18-796/Spring 1999/Chen
Digital Audio
                   Frequency Band (Hz)  Sampling Rate (kHz)  Bits per Sample  Raw Bitrate (kbits/s)
Telephone Speech   300~3400             8                    8                64
Wideband Speech    50~7000              16                   8                128
Mediumband Audio   10~11000             24                   16               384
Wideband Audio     10~22000             48                   16               768
CD: 44.1 kHz × 16 bits × 2 channels = 1.411 Mbits/s
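The raw bitrates on this slide follow from sampling rate × bits per sample × channels. A minimal sketch of that arithmetic is given here; the function name `raw_bitrate_kbps` is illustrative and not from the slides.

```python
def raw_bitrate_kbps(sampling_rate_khz, bits_per_sample, channels=1):
    """Raw PCM bitrate in kbits/s: sampling rate x sample width x channels."""
    return sampling_rate_khz * bits_per_sample * channels

# (sampling rate in kHz, bits per sample), taken from the slide
signals = {
    "Telephone Speech": (8, 8),
    "Wideband Speech": (16, 8),
    "Mediumband Audio": (24, 16),
    "Wideband Audio": (48, 16),
}

for name, (fs_khz, bits) in signals.items():
    print(f"{name}: {raw_bitrate_kbps(fs_khz, bits)} kbits/s")

# CD audio: two channels at 44.1 kHz, 16 bits each
print(f"CD: {raw_bitrate_kbps(44.1, 16, channels=2) / 1000:.3f} Mbits/s")
```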
Psychoacoustics
Threshold in quiet
26 critical bands 0~24 kHz
Frequency masking in the same critical band
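The slide gives no formulas for the threshold in quiet or the critical-band scale. As a rough illustration only, the sketch below uses two widely quoted closed-form approximations (Terhardt's threshold in quiet and the Zwicker-Terhardt Bark mapping). Both are assumptions brought in for this example, and they need not match the curves or the band count shown in lecture.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate threshold in quiet (dB SPL), Terhardt's formula (assumed here)."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def bark(freq_hz):
    """Approximate critical-band rate in Bark, Zwicker-Terhardt formula (assumed here)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

for f in (100.0, 1000.0, 3300.0, 10000.0):
    print(f"{f:7.0f} Hz: threshold {threshold_in_quiet_db(f):6.2f} dB, {bark(f):5.2f} Bark")
```

The ear is most sensitive around 3 to 4 kHz, which is where this approximation reaches its minimum.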

Frequency Masking
SMR (Signal-to-Mask Ratio)
Temporal Masking
Post-Masking: 50~200ms
Pre-Masking: 1/10 of post-masking
Subband Coding
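[Figure: M-channel subband coder. Analysis filterbank H1(z), H2(z), ..., HM(z), each followed by downsampling by M and a quantizer Q; synthesis filterbank with upsampling by M followed by F1(z), F2(z), ..., FM(z)]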
Maximal downsampling
Q should be based on signal-to-mask ratio (SMR)
Ear critical bands are not uniform, but logarithmic
The filter bank should match the critical bands
Tree-structure filter bank (to be derived on board)
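One way to read "Q should be based on SMR" is that each subband needs just enough quantizer bits to push the quantization noise below the masking threshold. The sketch below assumes the usual rule of thumb of about 6.02 dB of signal-to-noise ratio per bit for a uniform quantizer. The SMR values are made up for illustration, and this is not the bit allocation procedure of any particular standard.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per quantizer bit (rule of thumb)

def bits_for_smr(smr_db):
    """Smallest bit count whose SNR covers the signal-to-mask ratio."""
    if smr_db <= 0:
        return 0  # signal already below the mask: the band needs no bits
    return math.ceil(smr_db / DB_PER_BIT)

# Hypothetical SMRs (dB) for four subbands
smr_per_band = [20.0, 13.0, 3.0, -5.0]
allocation = [bits_for_smr(smr) for smr in smr_per_band]
print(allocation)
```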
Subband Coding vs. DCT

[Figure: polyphase representation. Delay chain and downsampling by M, analysis matrix E(z), synthesis matrix R(z), upsampling by M and recombination]
When E(z) = DCT matrix, this becomes DCT

No overlap; blocking artifact
Modified DCT (MDCT)

50% overlap; less blocking artifact
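The 50% overlap can be checked numerically. The sketch below is a direct (slow, matrix-based) MDCT with a sine window, chosen here as an assumption because it satisfies the perfect-reconstruction condition; the slides do not specify a window. Each block of 2N samples yields N coefficients, and overlap-adding the inverse transforms cancels the time-domain aliasing.

```python
import numpy as np

def mdct_basis(N):
    """N x 2N cosine basis of the MDCT."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))

def mdct_roundtrip(x, N):
    """Analyze x with 50%-overlapped windowed MDCT blocks, then resynthesize.

    len(x) must be a multiple of N.
    """
    w = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))  # sine window
    C = mdct_basis(N)
    # Pad so that every input sample is covered by two overlapping blocks
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = padded[start:start + 2 * N]
        coeffs = C @ (w * block)                    # N coefficients per 2N samples
        recon = (2.0 / N) * (C.T @ coeffs)          # inverse MDCT (still aliased)
        out[start:start + 2 * N] += w * recon       # windowed overlap-add
    return out[N:N + len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = mdct_roundtrip(x, N=8)
print(np.max(np.abs(x - y)))  # reconstruction error, near machine precision
```

Although the blocks overlap by half, the transform stays critically sampled: each hop of N new samples produces N coefficients.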
MPEG-1 Audio
ISO/IEC 11172-3 (1988~1991)
First high quality audio compression standard
Sampling rates: 32, 44.1, 48 kHz
CD quality two-channel audio at ~256 kbits/s
CD: 44.1 kHz × 16 bits × 2 = 1.411 Mbits/s
Quality demonstration (MPEG-1 Layer II)

Stereo 44.1 kHz at 64 kbits/s
Stereo 44.1 kHz at 128 kbits/s
Stereo 44.1 kHz at 192 kbits/s
Stereo 44.1 kHz at 256 kbits/s
Encoder Block Diagram

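[Figure: 11172-3 encoder. PCM audio samples (32, 44.1, 48 kHz) enter the analysis filterbank, followed by quantizer and coding, then frame packing into the encoded bitstream; a psychoacoustic model controls the quantizer and coding; ancillary data enters at frame packing]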
Decoder Block Diagram
[Figure: 11172-3 decoder. The encoded bitstream goes through frame unpacking, reconstruction, and the synthesis filterbank to give PCM audio samples (32, 44.1, 48 kHz); ancillary data is extracted at frame unpacking]
Layers
Increasing complexity, delay, and quality
Layer I: ~384 kbits/s for perceptually lossless quality (4:1)
Layer II: ~192 kbits/s for perceptually lossless quality (8:1)
Layer III: ~128 kbits/s for perceptually lossless quality (12:1)
(for two channels)
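The 4:1, 8:1, and 12:1 figures come out exactly if the reference is 16-bit stereo PCM at 48 kHz (1536 kbits/s). That reference is an assumption made for this sketch; the slide does not state which source rate the ratios refer to, and against the 1.411 Mbits/s CD rate they would be slightly smaller.

```python
# Assumed reference: 48 kHz x 16 bits x 2 channels
PCM_KBPS = 48 * 16 * 2  # 1536 kbits/s

# Two-channel bitrates for perceptually lossless quality, from the slide
layer_kbps = {"I": 384, "II": 192, "III": 128}

ratios = {layer: PCM_KBPS / kbps for layer, kbps in layer_kbps.items()}
for layer, ratio in ratios.items():
    print(f"Layer {layer}: {ratio:.0f}:1")
```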
100% perceptual lossless
Layer I and II Encoder
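[Figure: Layer I and II encoder. A 32-band, 512-tap analysis filterbank feeds the scaler and quantizer, then the mux; an FFT (512-pt for Layer I, 1024-pt for Layer II/III) feeds the masking threshold generator, which drives the dynamic bit allocator and coder]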
Block-Based Coding
[Figure: the analysis filterbank output is grouped into blocks of 12 samples per subband. Block: Layer I. Superblock (three blocks): Layer II/III]
12 samples for Layer I, 36 samples for Layer II/III
Block companding: Each block normalized by scalefactor
For Layer II, up to 3 scalefactors, with 2-bit scalefactor select
Each block/superblock receives one bit allocation
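Block companding can be sketched as follows: a block of 12 subband samples is normalized by one scalefactor and then quantized with the number of bits allocated to that block. This is a simplified illustration. The scalefactor here is just the block's peak magnitude and the quantizer is a plain symmetric uniform one, whereas the standard draws scalefactors from a fixed table; the function names are hypothetical.

```python
import numpy as np

def compand_block(block, bits):
    """Normalize a block by its peak (the scalefactor) and quantize uniformly."""
    scale = float(np.max(np.abs(block)))
    if scale == 0.0:
        return 0.0, np.zeros(len(block), dtype=int)
    steps = 2 ** (bits - 1) - 1          # symmetric quantizer, needs bits >= 2
    codes = np.round(block / scale * steps).astype(int)
    return scale, codes

def expand_block(scale, codes, bits):
    """Decoder side: rescale the integer codes by the scalefactor."""
    steps = 2 ** (bits - 1) - 1
    return codes / steps * scale

# One Layer I block: 12 samples of a single subband (synthetic data)
block = 0.3 * np.sin(2 * np.pi * np.arange(12) / 12)
scale, codes = compand_block(block, bits=4)
rec = expand_block(scale, codes, bits=4)
print(scale, codes, np.max(np.abs(rec - block)))
```

Because the step size scales with the block's peak, quiet blocks get proportionally finer quantization than loud ones for the same bit allocation.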
Layer III Encoder

[Figure: Layer III encoder. Analysis filterbank, MDCT (6 or 18 points, with overlap), scaler and quantizer, Huffman coding, mux; an FFT feeds the masking threshold generator; a separate coding block also feeds the mux]
Features in Layer III

Hybrid filterbank
  MDCT with filterbank
Long/short window switching
  Short for better temporal resolution (to prevent pre-echoes)
  Long for better frequency resolution
Nonuniform quantization
Entropy coding
  Run-length and Huffman coding
Bit reservoir (buffer)
Frame Structure
Header Info | Side Info | Subband Samples | Aux Data
Header info: Sync bits, system info, CRC (cyclic redundancy code)
Side info: bit allocation, scalefactor, (and scalefactor select for Layer II and III)
Subband samples: 32 × 12 for Layer I, 32 × 36 for Layer II and III
Packetization: 4-byte header, 184-byte payload
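The subband sample counts fix the frame length in time, and together with the bitrate they give an approximate frame size in bytes and the number of 184-byte payloads a frame occupies. The sketch below is a back-of-the-envelope calculation. It ignores padding and any extra encapsulation overhead between the audio frame and the packets, and the function names are illustrative.

```python
import math

SUBBANDS = 32
SAMPLES_PER_SUBBAND = {"I": 12, "II": 36, "III": 36}
PAYLOAD_BYTES = 184  # packet payload size from the slide

def samples_per_frame(layer):
    """PCM samples represented by one frame."""
    return SUBBANDS * SAMPLES_PER_SUBBAND[layer]

def frame_duration_ms(layer, sampling_rate_hz):
    return 1000.0 * samples_per_frame(layer) / sampling_rate_hz

def frame_bytes(layer, bitrate_bps, sampling_rate_hz):
    """Approximate frame size: bits spent during one frame duration, in bytes."""
    return samples_per_frame(layer) * bitrate_bps / sampling_rate_hz / 8

def packets_per_frame(layer, bitrate_bps, sampling_rate_hz):
    return math.ceil(frame_bytes(layer, bitrate_bps, sampling_rate_hz) / PAYLOAD_BYTES)

# Layer II stereo at 192 kbits/s, 48 kHz
print(samples_per_frame("II"))                  # 1152 samples
print(frame_duration_ms("II", 48000))           # 24.0 ms
print(frame_bytes("II", 192000, 48000))         # 576.0 bytes
print(packets_per_frame("II", 192000, 48000))   # 4 payloads
```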
Stereo Redundancy Coding

Four modes: mono, stereo, dual channel (two separate channels), joint stereo
Joint stereo mode
  Human stereo perception > 2 kHz is based on envelope
  Intensity stereo coding > 2 kHz
    Encode (L + R)
    Assign independent left and right scalefactors
  Layer III supports (L+R) and (L−R) coding
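Sum/difference coding replaces the left and right channels by a mid channel proportional to L + R and a side channel proportional to L − R. When the two channels are similar, the side channel carries little energy and is cheap to code. The sketch below uses a 1/√2 normalization, which is one common choice and an assumption here; the slide does not give the scaling.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def ms_encode(left, right):
    """Mid/side transform: sum and difference of the two channels."""
    mid = (left + right) / SQRT2
    side = (left - right) / SQRT2
    return mid, side

def ms_decode(mid, side):
    """Inverse transform: recover left and right."""
    left = (mid + side) / SQRT2
    right = (mid - side) / SQRT2
    return left, right

# Two highly correlated channels (synthetic data)
t = np.arange(480) / 48000.0
left = np.sin(2 * np.pi * 440 * t)
right = 0.9 * left

mid, side = ms_encode(left, right)
left_rec, right_rec = ms_decode(mid, side)
print(np.sum(side ** 2) / np.sum(mid ** 2))  # side energy is a small fraction of mid
```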
MPEG-2 Audio
ISO/IEC 13818-3
Allows lower sampling rates
16, 22.05, and 24 kHz: about half of MPEG-1
From wideband speech to mediumband audio
Higher frequency resolution
Layer I, II, and III
Multichannel coding
2~5 channels; surround sound, multilingual, for visual/hearing-impaired
Backward compatible and non-backward compatible coding (13818-7: MPEG-2 AAC)

Multichannel Audio
2/0-stereo
3/0
3/1
Surround
LFE: Low-frequency enhancement (woofer)
15~120 Hz
Can be anywhere
3/2
3/2 with woofer (5.1 system)
Compatibility
Forward compatibility
  A new decoder can decode an old bitstream
  Usually simple to achieve
Backward compatibility
  An old decoder can decode a new bitstream, at least partially
  Usually limits the coding efficiency
MPEG-2 Backward Compatible Audio Coding

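[Figure: MPEG-1/2 frame. MPEG-1 header, MPEG-1 data, and MPEG-1 ancillary data; the ancillary data field carries the MPEG-2 header and MPEG-2 data]
[Figure: encoder. L, C, R, LS, RS enter the matrix, which outputs L0, R0, T3, T4, T5; L0 and R0 go to the MPEG-1 encoder, T3, T4, T5 go to the MPEG-2 extension encoder, and both outputs are combined in the mux]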
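$L_0 = \alpha (L + \beta C + \gamma LS)$
$R_0 = \alpha (R + \beta C + \gamma RS)$
$\alpha = \frac{1}{1+\sqrt{2}},\ \beta = \gamma = \frac{1}{\sqrt{2}}$, or $\alpha = 1,\ \beta = \gamma = 0$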
Backward Compatible Audio Coding (cont.)
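[Figure: full chain. Matrixing: L, C, R, LS, RS enter the matrix, giving L0, R0, T3, T4, T5; MPEG-1 encoder (L0, R0) and MPEG-2 extension encoder (T3, T4, T5); mux and demux; MPEG-1 decoder (L0, R0) and MPEG-2 extension decoder (T3, T4, T5). Dematrixing: the inverse matrix recovers L, C, R, LS, RS]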
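Matrixing and dematrixing can be sketched numerically. The downmix is written here as L0 = a(L + bC + gLS) and R0 = a(R + bC + gRS), with a = 1/(1+√2) and b = g = 1/√2. The coefficient names are this sketch's own notation. The choice of C, LS, and RS as the transmitted channels T3, T4, T5 is also an assumption made for illustration; other selections are possible.

```python
import numpy as np

A = 1.0 / (1.0 + np.sqrt(2.0))  # overall attenuation
B = 1.0 / np.sqrt(2.0)          # center weight
G = 1.0 / np.sqrt(2.0)          # surround weight

def matrix(L, C, R, LS, RS, a=A, b=B, g=G):
    """Encoder side: compatible stereo pair plus three extension channels."""
    L0 = a * (L + b * C + g * LS)
    R0 = a * (R + b * C + g * RS)
    # Assumption: transmit C, LS, RS directly as T3, T4, T5
    return L0, R0, C, LS, RS

def dematrix(L0, R0, T3, T4, T5, a=A, b=B, g=G):
    """Decoder side: invert the downmix using the extension channels."""
    C, LS, RS = T3, T4, T5
    L = L0 / a - b * C - g * LS
    R = R0 / a - b * C - g * RS
    return L, C, R, LS, RS

rng = np.random.default_rng(1)
channels = [rng.uniform(-1.0, 1.0, 16) for _ in range(5)]  # L, C, R, LS, RS
recovered = dematrix(*matrix(*channels))
print(all(np.allclose(x, y) for x, y in zip(channels, recovered)))
```

With these weights, a(1 + b + g) = 1, so L0 and R0 stay within full scale even when L, C, and LS are all at full scale. An MPEG-1 decoder simply plays L0 and R0 and ignores the extension data.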
Non Backward Compatible (NBC) Coding

MPEG-2 Advanced Audio Coding (AAC)
ISO/IEC 13818-7 (April 1997)
320~384 kbits/s for 5 channels, 64 kbits/s per channel
NBC at 320 kbits/s as good as BC coding at 640 kbits/s
1~48 audio channels, 0~16 LFEs, 0~16 data streams
Same framework (perceptual subband coding) as MPEG-1, with some enhancements
MPEG-2 AAC
Enhancements
Preprocessing
High resolution filterbanks
  1024-line or 128-line MDCT
Temporal noise shaping (TNS): time-dependent quantization
Coupling channel
  Intensity multichannel coding
Prediction
  Backward adaptive prediction in subbands
M/S stereo coding
Noiseless coding (entropy coding): Huffman coding
[Figure: decoder. 13818-7 coded audio stream, bitstream demultiplex, noiseless decoding, inverse quantizer, scale factors, M/S, prediction, intensity/coupling, TNS, filter bank, gain control, output time signal. Legend: data, control]
[Figure: encoder. Input time signal, gain control, filter bank, TNS, intensity/coupling, prediction (from the quantized spectrum of the previous frame), M/S, scale factors, quantizer, noiseless coding, bitstream multiplex, 13818-7 coded audio stream; a perceptual model and a rate/distortion control process with iteration loops steer the coding stages. Legend: data, control]
MPEG-2 AAC Profiles

[Figure: Main, Low Complexity, and Scalable Sampling Rate profiles; the scalable outputs are labeled 20 kHz, 18 kHz, 12 kHz, and 6 kHz]
Main profile
  Best quality, highest complexity
  1024 or 128 MDCT
Low-complexity profile
  No temporal noise shaping, no prediction
Scalable sampling-rate profile
  Scalable output sampling rates and complexity
  Uses hybrid filterbanks (like MPEG-1 Layer III)
  No prediction, no coupling channel
Simulcast
To achieve backward compatibility at the cost of higher bitrate
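[Figure: simulcast. L, C, R, LS, RS feed an MPEG-2 AAC encoder and L0, R0 feed an MPEG-1 encoder; both streams pass through the mux and demux to an MPEG-2 AAC decoder (L, C, R, LS, RS) and an MPEG-1 decoder (L0, R0)]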
References
Peter Noll, "MPEG digital audio coding," IEEE Signal Processing Magazine, Sept. 1997, pp. 59-81
D. Pan, "A tutorial on MPEG/audio compression," IEEE Multimedia, v. 2, no. 2, 1995, pp. 60-74
http://www.mpeg.org/MPEG/audio.html
http://www.cselt.it/mpeg/faq/faq-audio.htm
http://www.tnt.uni-hannover.de/project/mpeg/audio/