Audio Compression Standards: James Rodney P. Santiago

Audio Compression Standards
James Rodney P. Santiago

Coverage
I. History of Audio Coding
II. Principles of Audio Coding
III. AAC Family of Standards
Loudness Contour
• MASCAM was developed at the Institut für Rundfunktechnik (IRT).
• MUSICAM was developed in cooperation with CCETT, Philips and Matsushita.
• ASPEC was developed by Fraunhofer Gesellschaft together with Thomson; it is included in MPEG-1 audio compression.
Digital Compression
Basic Concept of Data Compression
– Redundancy Reduction – repetitive information that can be reproduced at the receiving end need not be sent. On its own, this does not produce much compression.
– Irrelevance Reduction – removing all image or sound information that is imperceptible to the human senses, resulting in much better data compression.
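Redundancy reduction is lossless: the receiver rebuilds the data exactly. A minimal sketch of the idea, using Python's general-purpose `zlib` compressor as a stand-in (an assumption for illustration only; it is not the entropy coder used by the MPEG audio layers):

```python
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by the losslessly compressed size."""
    return len(data) / len(zlib.compress(data))

# Highly repetitive (redundant) data: the same two bytes, 4000 times.
repetitive = b"\x00\x01" * 4000

packed = zlib.compress(repetitive)   # what would be sent
restored = zlib.decompress(packed)   # what the receiver rebuilds
```

Real audio is far less repetitive than this test pattern, which is why redundancy reduction alone gives little gain and irrelevance reduction is needed as well.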
Sub-band Coding
• MASCAM: IRT Munich, 1988
• MUSICAM: IRT, CCETT, Philips, Matsushita, 1989
Transform Coding (DCT)
• ASPEC: Fraunhofer Gesellschaft, Thomson
ISO/IEC 11172-3 MPEG-1 Audio, 1990/91
• Layer I: low-complexity encoder, low compression
• Layer II: medium encoder complexity
• Layer III: high-complexity encoder, high compression
Data Rates:
• Layer I: 32 – 448 Kbps
• Layer II: 32 – 384 Kbps
• Layer III: 32 – 320 Kbps
ISO/IEC 13818-3 MPEG-2 Audio, 1994
• Layer I, II, III
• Layer II multichannel audio up to 5.1
1990 Dolby Digital AC-3 Audio
1991 First cinema demonstration of a movie encoded with AC-3 audio
Dec. 1991 "Star Trek VI" with AC-3 audio. AC-3 is now used for movies, in ATSC and, worldwide, additionally in MPEG-2 transport streams and on DVD.
Dolby AC-3 audio: transform coding using the Modified Discrete Cosine Transform (MDCT), 5.1 audio channels (left, center, right, left surround, right surround, subwoofer); 128 kbit/s per channel.
Audio Coding - History (3)
MPEG-2 ISO/IEC 13818-3: MPEG-2 Audio System
MPEG-2 AAC ISO/IEC 13818-7: AAC = Advanced Audio Coding
MPEG-4 ISO/IEC 14496-3: natural and synthetic audio objects
MPEG-7 ISO/IEC 15938
Storage space requirement for A/V data signals
Data Rates
Bit rate per channel after analog-to-digital conversion, for an audio bandwidth of 15 Hz - 20 kHz. The left channel and the right channel have identical values.

Sampling frequency (Hz)   8 bits        16 bits       24 bits         32 bits         64 bits
16000                     128 Kbps      256 Kbps      384 Kbps        512 Kbps        1,024 Kbps
24000                     192 Kbps      384 Kbps      576 Kbps        768 Kbps        1,536 Kbps
32000                     256 Kbps      512 Kbps      768 Kbps        1,024 Kbps      2,048 Kbps
44100                     352.8 Kbps    705.6 Kbps    1,058.4 Kbps    1,411.2 Kbps    2,822.4 Kbps
48000                     384 Kbps      768 Kbps      1,152 Kbps      1,536 Kbps      3,072 Kbps
96000                     768 Kbps      1,536 Kbps    2,304 Kbps      3,072 Kbps      6,144 Kbps
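Every entry in the data-rate table is the sampling frequency multiplied by the number of bits per sample. A small sketch that reproduces the figures and turns them into storage space (the function names are illustrative, not from the slides):

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate in bit/s."""
    return sample_rate_hz * bits_per_sample * channels

def storage_bytes(bit_rate_bps: float, seconds: float) -> float:
    """Storage needed for a stream of the given bit rate and duration."""
    return bit_rate_bps * seconds / 8

one_channel_48k_16bit = pcm_bit_rate(48000, 16)   # one table entry
cd_stereo = pcm_bit_rate(44100, 16, channels=2)   # both channels of a CD
one_minute_of_cd = storage_bytes(cd_stereo, 60)
```

For example, one minute of 16-bit, 44.1 kHz stereo needs about 10.6 MB before any compression.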
Masking
800 Hz masker
Auditory masking
• Exploits subjective masking effects
• Some louder tones hide some quieter tones
• The effect persists for up to 200 ms
• The psychoacoustic model defines the mask
Temporal masking
Audio Compression Systems Used in MPEG-2 Transport Streams
• MPEG-1 Layers I, II & III
• Only Layer II is used in broadcast systems
• MPEG-2 audio (5.1 channels) is possible, but rarely used
• All are ‘backward compatible’
• Dolby Digital (AC-3): USA ATSC and also DVB (Germany, Australia)
• 5.1 channels (0.1 = low-frequency effects)
• AAC (Japan): ADIF and ADTS type audio
• MPEG-4 will use AAC as the default standard (Fraunhofer labs): 8 or more channels, dynamically repositionable in space
MPEG-2 Audio Compression
Audio signal, left and right channels, each with:
• 15 to 20 kHz bandwidth
• A/D conversion at 16 bit, with 32/44.1/48 kHz sampling frequency
• up to 768 kbit/s
Both channels together: approx. 1.5 Mbit/s
Amplitude, Frequency & Time Masks
Auditory masking
• Occurs between two sounds of similar frequency that are present at the same time: the louder sound masks the softer one.
• Sounds at lower frequencies must be even closer together in order to be masked by higher frequencies.
Temporal masking
• A loud sound drowns out softer sounds immediately before or after it.
Psychoacoustic Model
• The human ear is represented by a psychoacoustic model (perceptual coding) = irrelevancy reduction and redundancy reduction.
• By understanding the overall response of the human ear, data rate reduction can be achieved.
Representation of the Human Ear
[Figure: anatomy of the human ear, labeled: outer ear, eardrum, hammer, middle ear, eustachian tube, inner ear, semicircular canals, cochlea, auditory nerves]
Mechanical Representation of the Human ear
[Figure: mechanical model of the ear, labeled: outer ear, eardrum, hammer, middle ear, eustachian tube, inner ear membrane with receptors for high frequencies and receptors for low frequencies, auditory nerves]
Electrical Representation of the human ear
• Outer ear = mechanical impedance transformer
• Filter: filter characteristics of middle ear and eardrum (e.g. resonance at 3 kHz)
• Frequency receptors inside the cochlea: high ... middle ... low frequencies
• Auditory nerve signals: ~100 mVpp, repetition rate up to 1 kHz depending on audio amplitude
Audibility Threshold
[Figure: audibility threshold, level L [dB] (0 to 60) versus frequency f [kHz] (0 to 14)]
Frequency Masking
[Figure, two plots: a 1 kHz masking tone and its masking threshold, level L [dB] (0 to 60) versus frequency f [kHz] (0 to 14)]
Temporal Masking
[Figure: premasking before and postmasking after a masking tone, level L [dB] (0 to 50) versus time t [ms] (0 to 400)]

Quantization Noise
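Sinusoidal signal using the full A/D converter range, passed through a low-pass (LP) filter and an A/D converter with N bit resolution.
Quantization noise: S/N [dB] ≈ 6 N (more precisely, 6.02 N + 1.76 dB for a full-scale sine)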
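The 6 dB per bit rule can be checked numerically. The sketch below quantizes a full-scale sine to N bits with plain rounding and measures the resulting S/N; the test frequency, phase and sample count are arbitrary choices made for this illustration.

```python
import math

def rule_of_thumb_snr_db(bits: int) -> float:
    """The slide's approximation: about 6 dB per bit."""
    return 6.0 * bits

def ideal_sine_snr_db(bits: int) -> float:
    """Textbook value for a full-scale sine and a uniform quantizer."""
    return 6.02 * bits + 1.76

def measured_snr_db(bits: int, num_samples: int = 20000) -> float:
    """Quantize a full-scale sine to integer steps and measure S/N."""
    peak = 2 ** (bits - 1) - 1          # largest positive integer code
    signal_power = 0.0
    noise_power = 0.0
    for i in range(num_samples):
        x = peak * math.sin(0.1234567 * i + 0.3)   # arbitrary test tone
        q = round(x)                               # uniform quantizer
        signal_power += x * x
        noise_power += (q - x) ** 2
    return 10 * math.log10(signal_power / noise_power)
```

Calling `measured_snr_db(16)` should land close to `ideal_sine_snr_db(16)`, a little above the 96 dB given by the rule of thumb.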

Audio Encoding
• Audio in → filtering process (time: fine, frequency: coarse) → frequency subbands → quantizer (irrelevancy reduction) → subband data coding (redundancy reduction) → compressed audio out
• In parallel: spectrum analysis (time: coarse, frequency: fine) → psychoacoustic model
Audio Subband Coding
• Audio in → bandpass filters (BP) → frequency subbands → one quantizer (Q) per subband → compressed audio out
• FFT → psychoacoustic model: 512-point FFT at MPEG Layer I, 1024 points at Layer II; every 24 ms
• Example: MPEG Layer I, II
Subband Filtering @ MPEG-2 Layer I,II
[Figure: subband filter bank, level L [dB] (0 to 60) versus frequency f [kHz] (0 to 24)]
MPEG-2 Layer I,II: 32 subbands, each 750 Hz wide
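The 750 Hz figure corresponds to a 48 kHz sampling frequency: the 24 kHz audio band is split into 32 equal subbands. A short sketch of that arithmetic (function names are illustrative):

```python
def subband_width_hz(sample_rate_hz: float, num_subbands: int = 32) -> float:
    """Width of each equal subband across the 0..fs/2 audio band."""
    return (sample_rate_hz / 2) / num_subbands

def subband_index(frequency_hz: float, sample_rate_hz: float,
                  num_subbands: int = 32) -> int:
    """Index (0-based) of the subband that contains a given frequency."""
    width = subband_width_hz(sample_rate_hz, num_subbands)
    return min(int(frequency_hz // width), num_subbands - 1)
```

At 44.1 kHz sampling the same filter bank gives narrower subbands of about 689 Hz.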
Bit Allocation @ MPEG-2 Layer I,II
Different maximum bit allocation in subbands, from max. to min.: n1 bit, n2 bit, n3 bit, n4 bit, n5 bit
[Figure: bit allocation regions, level L [dB] (0 to 60) versus frequency f [kHz] (0 to 24)]
Quantization @ MPEG-2 Layer I,II
[Figure: signal levels and masking thresholds, level L [dB] (0 to 60) versus frequency f [kHz] (0 to 24)]
• Spectrum calculated by means of FFT; thresholds calculated after FFT; quantizer controlled by psychoacoustic model
• Signal level in subband below masking threshold determined by a signal at 8 kHz: subband completely suppressed
• Signal level in subband above masking threshold determined by a signal at 4 kHz: quantization noise adjusted to below threshold
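The two cases on this slide can be sketched as a simple decision per subband. This is a simplified illustration that assumes the S/N [dB] = 6 N rule from the Quantization Noise slide; actual encoders allocate bits iteratively under a total bit budget, which is not modeled here.

```python
import math

DB_PER_BIT = 6.0  # assumption: the rule of thumb S/N [dB] = 6 N

def bits_needed(signal_db: float, mask_db: float) -> int:
    """Bits for one subband so that quantization noise stays below the mask."""
    if signal_db <= mask_db:
        return 0  # signal is masked: subband completely suppressed
    # Noise must sit (signal_db - mask_db) dB below the signal level.
    return math.ceil((signal_db - mask_db) / DB_PER_BIT)
```

A subband at 60 dB with a 40 dB masking threshold needs 4 bits in this model, while a subband at 30 dB under the same threshold needs none.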
MPEG2 Audio Data Structure
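• Subband filter & quantizer 0 (and likewise every other subband) delivers blocks of 12 samples
• Layer I frame: one block of 12 samples per subband
• Layer II frame: three blocks of 12 samples per subband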
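From the 12-sample blocks and the 32 subbands, the frame sizes follow directly. The sketch assumes one block per subband in a Layer I frame and three in a Layer II frame, which matches the MPEG-1 audio frame lengths of 384 and 1152 samples:

```python
SUBBANDS = 32
SAMPLES_PER_BLOCK = 12
BLOCKS_PER_FRAME = {1: 1, 2: 3}   # Layer I: one block, Layer II: three blocks

def frame_samples(layer: int) -> int:
    """PCM samples represented by one frame."""
    return SUBBANDS * SAMPLES_PER_BLOCK * BLOCKS_PER_FRAME[layer]

def frame_duration_ms(layer: int, sample_rate_hz: int) -> float:
    """Playing time of one frame in milliseconds."""
    return 1000 * frame_samples(layer) / sample_rate_hz
```

At 48 kHz a Layer II frame lasts 24 ms, the same interval quoted for the psychoacoustic model's FFT.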
MPEG-2: Scale Factor Determination
The highest value in a block of samples is used to determine the scale factor for that block.
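A minimal sketch of the idea: find the peak magnitude in a block and normalize the block by it. Using the raw peak as the scale factor is a simplification made here; the standard selects a value from a fixed scale factor table, which is not reproduced.

```python
def scale_block(samples):
    """Return (scale_factor, normalized_samples) for one block."""
    peak = max(abs(s) for s in samples)   # highest value in the block
    if peak == 0:
        return 0.0, [0.0] * len(samples)  # silent block
    return peak, [s / peak for s in samples]

scale_factor, normalized = scale_block([0.1, -0.4, 0.2])
```

After scaling, every block uses the full quantizer range regardless of how loud it was.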
Audio Transform Coding
• Audio in → (M)DCT (Modified Discrete Cosine Transform) → quantizer → compressed audio out
• FFT → psychoacoustic model
• Example: Dolby Digital AC-3
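To show what the MDCT block does, here is a direct (slow) textbook MDCT and inverse, with 50% overlapping blocks. It is a sketch only: no window function is applied, and it is not the AC-3 filter bank. Each block of 2N samples yields N coefficients, and adding the overlapping halves of two inverse-transformed blocks cancels the aliasing.

```python
import math

def mdct(block):
    """2N input samples -> N coefficients (direct O(N^2) formula)."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """N coefficients -> 2N aliased time samples (1/N normalization)."""
    n = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for k in range(n)) / n
            for t in range(2 * n)]

def overlap_add_middle(signal, n):
    """Transform two overlapping blocks and rebuild the n samples they share."""
    first = imdct(mdct(signal[0:2 * n]))
    second = imdct(mdct(signal[n:3 * n]))
    return [first[n + i] + second[i] for i in range(n)]

signal = [math.sin(0.3 * i) for i in range(12)]
rebuilt = overlap_add_middle(signal, 4)   # should match signal[4:8]
```

Despite the 50% overlap, the transform produces only as many coefficients as there are new input samples, which is why it suits audio coding.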
Audio Hybrid Subband & Transform Coding
• Audio in → subband filter → (M)DCT → quantizer → compressed audio out
• FFT → psychoacoustic model
• Example: MPEG Layer III
Multichannel Audio Coding
• Multichannel audio in (e.g. left, right, rear) → filter process → detection and removal of interchannel redundancies/irrelevancies → quantizer → compressed audio out
• FFT → psychoacoustic model
• Example: MPEG Layer III, AC-3
Multichannel Audio 5.1
Subwoofer
Left Center Right
Left surround Right surround

Advanced Audio Coding (AAC)
James Rodney P. Santiago
Family of Standards
• AAC-LC
• HE-AAC = AAC-LC + SBR (Spectral Band Replication)
• HE-AAC v2 = HE-AAC + PS (Parametric Stereo)
• AAC-LD
• AAC-ELD
• HD-AAC = AAC-LC + Scalable Lossless Codec
• AAC-LC/HE-AAC + MPEG Surround
Functional Block Diagram of an MPEG-2 AAC encoder
• Signal path: Input → Filterbank → TNS → Prediction → Scale Factors → Quantizer → Noiseless Coding → Bitstream Multiplex → Output
• Control: Perceptual Model and Rate/Distortion Control System
MPEG AAC-LC
• AAC-LC is the next-generation successor to the mp3 audio codec, invented and developed by Fraunhofer IIS.
• AAC-LC delivers transparent quality in compressed audio at only 64 kbit/s per channel: compressed audio that is virtually indistinguishable from the original audio source.
• AAC-LC satisfies the requirements for broadcast quality as defined by the EBU, with flexible sampling rates ranging from 8 kHz up to 192 kHz, bit rates up to 256 kbit/s per channel, and support for up to 48 channels.
• It can be used in applications that demand high quality and unlimited bandwidth.
• It supports mono, stereo and all common multi-channel configurations.
• It is an ideal codec for any low-bit-rate, high-quality audio application on mobile devices.
MPEG HE-AAC
• High Efficiency Advanced Audio Coding, also known as AACplus.
• HE-AAC is the low-bit-rate codec that integrates the functionality of the AAC-LC audio codec and the Spectral Band Replication (SBR) bandwidth expansion tool.
• HE-AAC allows design flexibility to trade off quality against
bandwidth, file size or bit rate.
• HE-AAC delivers good stereo quality at bit rates of 32 to 48
kbit/s.
• The codec is multi-channel compatible.
MPEG HE-AAC v2
• Also known as AACplus v2.
• HE-AAC v2 adds the Parametric Stereo (PS) feature to the initial HE-AAC to further enhance efficiency in low-bandwidth media.
• Fraunhofer’s HE-AAC v2 codec delivers good-quality audio at
bit rates from 16 to 24 kbit/s for stereo content.
MPEG AAC-ELD
• AAC-LD is the low-delay version of AAC.
• It combines the full-bandwidth, superior quality of AAC with a
low coding delay necessary for two-way audio communication.
• It features an algorithmic delay of only 20 ms, while offering
CD-like audio quality at 64 kbit/s per channel.
• With the integration of SBR technology and the feature set of
the LD codec, Fraunhofer’s AAC-ELD provides full audio
bandwidth at data rates down to 24 kbit/s per channel.
• Both the AAC-LD and AAC-ELD codecs are perfectly suited for
applications that require bi-directional communication, such as
Internet telephony and video conferencing.
HD-AAC
• The MPEG standard HD-AAC offers music encoding with quality beyond CDs while being compatible with iPods and mobile phones.
• Audio CDs store uncompressed music in 16-bit, 44.1 kHz quality, while most music is now produced in the improved 24-bit, 96 kHz format.
• HD-AAC brings this high-quality sound experience to users, online music distribution and the consumer electronics industry.
• Based on the MPEG standards Scalable Lossless (SLS) and AAC, HD-AAC provides scalable-to-lossless compression of 24-bit quality music content, thereby ensuring a seamless migration to future AAC-compliant standards.
SUMMARY
CODEC | FEATURES | TYPICAL APPLICATIONS | TYPICAL BIT RATE
AAC-LC (Low Complexity Advanced Audio Codec) | High performance audio codec for excellent audio quality at low bit rates | Apple iPod; iTunes; ISDB-T television broadcast (Japan) | 128 Kbit/s (stereo)
HE-AAC (High Efficiency AAC), AACplus | High performance audio codec for good quality at bit rates of 28 Kbit/s per channel | XM Radio; mobile music download; Digital Radio Mondiale (DRM) | 56 Kbit/s (stereo)
HE-AAC v2 (AACplus v2) | Highest performance audio codec for good quality at bit rates below 24 Kbit/s per channel | 3GPP music download; Digital Radio DAB+ | 48 Kbit/s (stereo)
HD-AAC (High Definition AAC) | Lossless audio codec for better than CD quality with 24 bit and sampling up to 192 kHz | Home networks; music distribution/production | Approximately half the bit rate of the uncompressed file
AAC-LD (Low Delay AAC) | AAC encoding with 20 ms algorithmic delay | Video conferencing; VoIP telephony; broadcast gateway | 128 Kbit/s (stereo)
AAC-ELD (Enhanced Low Delay AAC) | Low delay full audio bandwidth codec at data rates down to 24 Kbit/s per channel and 15 ms delay | Video conferencing; VoIP telephony; broadcast gateway | 64 Kbit/s (stereo)
MPEG Surround | Surround sound extension for AAC-LC and HE-AAC | Digital radio in surround; mobile TV with binaural surround sound; music distribution/production | 64 - 192 Kbit/s (5.1 channels)
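To put the typical stereo bit rates in perspective, they can be compared with uncompressed CD audio (16 bit, 44.1 kHz, two channels). The helper name below is illustrative:

```python
CD_STEREO_BPS = 44100 * 16 * 2   # uncompressed CD audio, bit/s

def ratio_vs_cd(codec_bps: float) -> float:
    """How many times smaller the coded stream is than CD-quality PCM."""
    return CD_STEREO_BPS / codec_bps

aac_lc = ratio_vs_cd(128_000)    # AAC-LC, 128 Kbit/s stereo
he_aac = ratio_vs_cd(56_000)     # HE-AAC, 56 Kbit/s stereo
he_aac_v2 = ratio_vs_cd(48_000)  # HE-AAC v2, 48 Kbit/s stereo
```

With the table's figures this gives roughly 11:1 for AAC-LC, 25:1 for HE-AAC and 29:1 for HE-AAC v2.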
Thank you very much for your attention
