
Audio Compression Standards

James Rodney P. Santiago


Coverage
I. History of Audio Coding
II. Principles of Audio Coding
III. AAC Family of Standards
Loudness Contour
I. History of Audio Coding
• MASCAM was developed at the Institut für
Rundfunktechnik (IRT).
• MUSICAM was developed in cooperation with
CCETT, Philips and Matsushita.
• ASPEC was developed by the Fraunhofer-Gesellschaft
together with Thomson. It was incorporated into
MPEG-1 audio compression.
Digital Compression

Basic Concept of Data Compression

– Redundancy Reduction – repetitive information
that can be reproduced at the receiving end need
not be sent. On its own, this does not produce
much compression.

– Irrelevance Reduction – removing all image or
sound information that is imperceptible to the
human senses, resulting in much better data
compression.
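
As an illustration of redundancy reduction only, the sketch below uses run-length encoding, a technique chosen here for simplicity and not named in the slides. The function names and the sample values are hypothetical. Repeated values are sent once with a count, and the receiver reproduces them exactly.

```python
from itertools import groupby


def rle_encode(samples):
    """Collapse each run of identical values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(samples)]


def rle_decode(pairs):
    """Rebuild the original sequence from (value, run_length) pairs."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical signal: digital silence, a short click, then silence again.
signal = [0] * 12 + [7, 7, 3] + [0] * 5
encoded = rle_encode(signal)
decoded = rle_decode(encoded)

print(len(signal), "samples ->", len(encoded), "pairs")
print("lossless:", decoded == signal)
```

Nothing is lost in this round trip, which is why redundancy reduction alone gives limited gains on real audio: the larger savings come from irrelevance reduction.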
I. History of Audio Coding
Sub-band Coding
  MASCAM: IRT Munich, 1988
  MUSICAM: IRT, CCETT, Philips, Matsushita, 1989

Transform Coding (DCT)
  ASPEC: Fraunhofer Gesellschaft, Thomson

ISO/IEC 11172-3 MPEG-1 Audio, 1990/91
  Layer I: low encoder complexity, low compression
  Layer II: medium encoder complexity
  Layer III: high encoder complexity, high compression

  Data rates:
  Layer I: 32 – 448 kbit/s
  Layer II: 32 – 384 kbit/s
  Layer III: 32 – 320 kbit/s

ISO/IEC 13818-3 MPEG-2 Audio, 1994
  Layer I, II, III
  Layer II: multichannel audio up to 5.1
I. History of Audio Coding
1990: Dolby Digital AC-3 audio

1991: first cinema demonstration of a movie with AC-3 encoded audio

Dec. 1991: "Star Trek VI" with AC-3 audio. AC-3 is now used
for movies, in ATSC, and additionally worldwide in MPEG-2
transport streams and on DVD.

Dolby AC-3 audio: transform coding using the Modified
Discrete Cosine Transform (MDCT); 5.1 audio channels
(left, center, right, left surround, right surround, subwoofer);
128 kbit/s per channel.
Audio Coding - History (3)

MPEG-2 ISO/IEC 13818-3: MPEG-2 Audio System

MPEG-2 AAC ISO/IEC 13818-7: AAC = Advanced Audio Coding

MPEG-4 ISO/IEC 14496-3: natural and synthetic audio objects

MPEG-7 ISO/IEC 15938


Storage space requirement for
A/V data signals
Data Rates
Uncompressed data rate per channel (left and right channels are identical), audio bandwidth 15 Hz - 20 kHz

Sampling          Analog-to-digital conversion (bits)
frequency (Hz)    8            16           24             32             64
16000             128 kbps     256 kbps     384 kbps       512 kbps       1,024 kbps
24000             192 kbps     384 kbps     576 kbps       768 kbps       1,536 kbps
32000             256 kbps     512 kbps     768 kbps       1,024 kbps     2,048 kbps
44100             352.8 kbps   705.6 kbps   1,058.4 kbps   1,411.2 kbps   2,822.4 kbps
48000             384 kbps     768 kbps     1,152 kbps     1,536 kbps     3,072 kbps
96000             768 kbps     1,536 kbps   2,304 kbps     3,072 kbps     6,144 kbps
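
The data rates above follow from multiplying the sampling frequency by the number of bits per sample. The short sketch below recomputes them; the function name `pcm_bit_rate` is an illustrative assumption, and 1 kbps is taken as 1000 bit/s, as in the table.

```python
def pcm_bit_rate(sampling_hz, bits_per_sample, channels=1):
    """Uncompressed PCM data rate in bit/s."""
    return sampling_hz * bits_per_sample * channels


SAMPLING_RATES = (16000, 24000, 32000, 44100, 48000, 96000)
RESOLUTIONS = (8, 16, 24, 32, 64)

# Print one row per sampling frequency, one column per resolution (kbps).
for fs in SAMPLING_RATES:
    row = [pcm_bit_rate(fs, bits) / 1000 for bits in RESOLUTIONS]
    print(fs, row)

# CD-style stereo: two channels of 16-bit samples at 44.1 kHz.
cd_stereo = pcm_bit_rate(44100, 16, channels=2)
print("CD stereo:", cd_stereo, "bit/s")
```

The stereo case shows why compression is needed: two 16-bit channels at 44.1 kHz already require about 1.4 Mbit/s.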
Masking

[Figure: masking curve for an 800 Hz masker]

Auditory masking

• Exploits subjective masking effects
• Louder tones hide some quieter tones
• The effect persists for up to 200 ms
• The psychoacoustic model defines the mask

[Figure: temporal masking]
Audio Compression Systems Used
in MPEG-2 Transport Streams
• MPEG-1 Layers I, II and III
• Only Layer II is used in broadcast systems
• MPEG-2 audio (5.1 channels) is possible, but rarely used
• All are 'backward compatible'

• Dolby Digital (AC-3): USA (ATSC) and also DVB (Germany, Australia)
• 5.1 channels (0.1 = low-frequency effects)

• AAC (Japan): ADIF and ADTS type audio

• MPEG-4 will use AAC as the default standard
(Fraunhofer labs): 8 or more channels, dynamically repositionable
in space
MPEG-2 Audio Compression

[Diagram: digitizing the audio signal. The left and right channels, each with 15 to 20 kHz bandwidth, pass through separate 16-bit A/D converters at a sampling frequency of 32/44.1/48 kHz. Each channel produces up to 768 kbit/s, approx. 1.5 Mbit/s in total.]
Amplitude, Frequency & Time Masks

Auditory masking
• Occurs between two sounds of similar frequency that are
present at the same time.
• At lower frequencies, the two sounds must be even closer
together in frequency for masking to occur.

Temporal masking
• A loud sound drowns out softer sounds that occur immediately
before or after it.
Psychoacoustic Model

Physical representation of the human ear by using a
psychoacoustic model of the human ear

(Perceptual coding) = irrelevancy reduction
and redundancy reduction

By understanding the overall response of the
human ear, data rate reduction can be
achieved!
Representation of the Human Ear

[Figure: anatomy of the human ear, labeled: outer ear, eardrum, hammer, middle ear, eustachian tube, inner ear, semicircular canals, cochlea, auditory nerves]
Mechanical Representation of the Human Ear

[Figure: mechanical model of the ear, labeled: outer ear, eardrum, hammer, middle ear, eustachian tube, inner ear, membrane with receptors for high frequencies and receptors for low frequencies, auditory nerves]
Electrical Representation of the Human Ear

[Diagram: electrical model of the ear]
Outer ear = mechanical impedance transformer
Filter: filter characteristics of middle ear and eardrum (e.g. resonance at 3 kHz)
Frequency receptors inside cochlea: high ... middle ... low frequencies
Auditory nerve signals: ~100 mVpp, repetition rate up to 1 kHz depending on audio amplitude
Audibility Threshold

[Figure: threshold of audibility; level L [dB], 0 to 60, versus frequency f [kHz], 0 to 14]
Frequency Masking

[Figure: masking threshold produced by a 1 kHz masking tone; level L [dB], 0 to 60, versus frequency f [kHz], 0 to 14]

Frequency Masking

[Figure: level L [dB], 0 to 60, versus frequency f [kHz], 0 to 14]
Temporal Masking

[Figure: premasking before and postmasking after a masking tone; level L [dB], 0 to 50, versus time t [ms], 0 to 400]


Quantization Noise

[Diagram: a sinusoidal signal using the full A/D converter range passes through a low-pass filter (LP) into an A/D converter with N bit resolution]

Quantization noise: S/N [dB] ≈ 6 N
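
The 6 dB per bit figure is a rule of thumb. For a full-scale sine wave and an ideal uniform quantizer, the textbook result is about 6.02 N + 1.76 dB. The sketch below compares the two; the function names are illustrative.

```python
import math


def sine_snr_db(n_bits):
    """Theoretical SNR of a full-scale sine with an ideal N-bit uniform quantizer.

    Equivalent to about 6.02 * N + 1.76 dB.
    """
    return 20 * math.log10(2 ** n_bits) + 10 * math.log10(1.5)


def rule_of_thumb_snr_db(n_bits):
    """The slide's approximation: 6 dB of S/N per bit."""
    return 6 * n_bits


for n in (8, 16, 24):
    print(n, "bits:", round(sine_snr_db(n), 2), "dB vs", rule_of_thumb_snr_db(n), "dB")
```

The practical reading is that every bit removed from a sample raises the quantization noise by roughly 6 dB, which is the trade-off the encoder exploits.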


Audio Encoding

[Block diagram: Audio in -> Filtering process (frequency subbands; time: fine, frequency: coarse) -> Subband quantizer (irrelevancy reduction) -> Data coding (redundancy reduction) -> Compressed audio out. In parallel: Audio in -> Spectrum analysis (time: coarse, frequency: fine) -> Psychoacoustic model, which controls the quantizer.]
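Audio Subband Coding

[Block diagram: Audio in -> bank of bandpass filters (BP), one per frequency subband -> one quantizer (Q) per subband -> Compressed audio out. In parallel: Audio in -> FFT -> Psychoacoustic model, which controls the quantizers.]

Example: MPEG Layer I, II
FFT: 512 points @ MPEG Layer I, 1024 points @ Layer II; every 24 ms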
Subband Filtering @ MPEG-2 Layer I, II

[Figure: audio spectrum divided into subbands; level L [dB], 0 to 60, versus frequency f [kHz], 0 to 24]

MPEG-2 Layer I, II: 32 subbands, each 750 Hz wide
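
The 750 Hz figure corresponds to a 48 kHz sampling frequency: the usable band of 0 to 24 kHz is split into 32 equal subbands. A small sketch, with illustrative function names, shows the arithmetic and how a frequency maps to its subband.

```python
def subband_width_hz(sampling_hz, n_subbands=32):
    """Width of each equal-width subband covering 0 Hz to sampling_hz / 2."""
    return (sampling_hz / 2) / n_subbands


def subband_index(freq_hz, sampling_hz, n_subbands=32):
    """Index (0-based) of the subband that contains freq_hz."""
    width = subband_width_hz(sampling_hz, n_subbands)
    return min(int(freq_hz // width), n_subbands - 1)


print(subband_width_hz(48000))        # subband width at 48 kHz sampling
print(subband_width_hz(44100))        # narrower subbands at 44.1 kHz
print(subband_index(4000, 48000))     # subband holding a 4 kHz tone
print(subband_index(8000, 48000))     # subband holding an 8 kHz tone
```

At other sampling frequencies the subbands are proportionally narrower, so 750 Hz should be read as the 48 kHz case.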
Bit Allocation @ MPEG-2 Layer I, II

Different maximum bit allocation in subbands:
n1 bits (max.), n2 bits, n3 bits, n4 bits, n5 bits (min.)

[Figure: maximum bit allocation per subband; level L [dB], 0 to 60, versus frequency f [kHz], 0 to 24]
Quantization @ MPEG-2 Layer I, II

[Figure: signal levels and masking thresholds; level L [dB], 0 to 60, versus frequency f [kHz], 0 to 24]

• Signal level in subband below the masking threshold determined by a signal at 8 kHz:
subband completely suppressed.
• Signal level in subband above the masking threshold determined by a signal at 4 kHz:
quantization noise adjusted to below the threshold.
• Spectrum calculated by means of FFT; thresholds calculated after FFT;
quantizer controlled by psychoacoustic model.
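
The two cases above can be sketched as a simple decision per subband. This is a deliberately simplified illustration, not the allocation procedure of the standard: it assumes the 6 dB per bit rule from the quantization noise slide, ignores the overall bit budget, and uses hypothetical names and levels.

```python
import math

DB_PER_BIT = 6.0  # assumed rule of thumb: each bit lowers quantization noise by ~6 dB


def bits_for_subband(signal_db, mask_db, max_bits=15):
    """Bits needed so quantization noise stays at or below the masking threshold.

    signal_db: signal level in the subband
    mask_db:   masking threshold in the subband (from the psychoacoustic model)
    """
    signal_to_mask = signal_db - mask_db
    if signal_to_mask <= 0:
        return 0  # signal is masked: the subband is completely suppressed
    return min(max_bits, math.ceil(signal_to_mask / DB_PER_BIT))


# Hypothetical subband levels in dB.
print(bits_for_subband(20, 35))   # below the threshold -> suppressed
print(bits_for_subband(50, 20))   # 30 dB above the threshold
print(bits_for_subband(55, 20))   # 35 dB above the threshold
```

A real encoder also has to fit all subbands into the available bit rate, so the result of such a calculation is a demand that the allocation loop then balances.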
MPEG-2 Audio Data Structure

Subband filter & quantizer 0:    12 samples | 12 samples | 12 samples
Subband filter & quantizer 1:    12 samples | 12 samples | 12 samples
Subband filter & quantizer 2:    12 samples | 12 samples | 12 samples
...
Subband filter & quantizer 31:   12 samples | 12 samples | 12 samples

Layer I frame: one group of 12 samples per subband
Layer II frame: three groups of 12 samples per subband
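
The frame sizes implied by this structure can be worked out directly. The sketch below assumes, as in the diagram, 32 subbands with groups of 12 samples, one group per Layer I frame and three per Layer II frame; the names are illustrative.

```python
SUBBANDS = 32
SAMPLES_PER_GROUP = 12


def frame_samples(groups_per_frame):
    """Audio samples covered by one frame."""
    return SUBBANDS * SAMPLES_PER_GROUP * groups_per_frame


def frame_duration_ms(groups_per_frame, sampling_hz):
    """Duration of one frame in milliseconds."""
    return 1000 * frame_samples(groups_per_frame) / sampling_hz


LAYER_I_GROUPS = 1
LAYER_II_GROUPS = 3

print(frame_samples(LAYER_I_GROUPS), frame_duration_ms(LAYER_I_GROUPS, 48000))
print(frame_samples(LAYER_II_GROUPS), frame_duration_ms(LAYER_II_GROUPS, 48000))
```

At 48 kHz a Layer II frame lasts 24 ms, which matches the 24 ms analysis interval quoted on the subband coding slide.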
MPEG-2: Scale Factor Determination

The highest value in a block of samples is used for
scale factor determination.

[Figure: block of samples with the highest value marked]
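
A minimal sketch of the idea: take the largest magnitude in the block and scale the block by it. This is a simplification under stated assumptions. The standard selects the scale factor from a fixed table, which is not reproduced here, and the function names and sample values are hypothetical.

```python
def block_scale_factor(block):
    """Largest absolute sample value in the block."""
    return max(abs(sample) for sample in block)


def normalize(block):
    """Return (scale_factor, samples scaled into the range -1.0 to 1.0)."""
    scale = block_scale_factor(block)
    if scale == 0:
        return 0.0, [0.0] * len(block)  # silent block: nothing to scale
    return scale, [sample / scale for sample in block]


# Hypothetical block of 12 subband samples.
block = [0.02, -0.10, 0.25, -0.40, 0.31, 0.05, -0.12, 0.18, -0.07, 0.00, 0.09, -0.22]
scale, scaled = normalize(block)
print("scale factor:", scale)
print("peak after scaling:", max(abs(s) for s in scaled))
```

Scaling each block to its own peak lets the quantizer use its full range on quiet passages as well as loud ones, at the cost of sending one scale factor per block.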
Audio Transform Coding

[Block diagram: Audio in -> (M)DCT (Modified Discrete Cosine Transform) -> Quantizer -> Compressed audio out. In parallel: Audio in -> FFT -> Psychoacoustic model, which controls the quantizer.]

Example: Dolby Digital AC-3
Audio Hybrid Subband & Transform Coding

[Block diagram: Audio in -> Subband filter -> (M)DCT -> Quantizer -> Compressed audio out. In parallel: Audio in -> FFT -> Psychoacoustic model, which controls the quantizer.]

Example: MPEG Layer III
Multichannel Audio Coding

[Block diagram: Multichannel audio in (e.g. left, right, rear) -> Filter process -> Detection and removal of interchannel redundancies/irrelevancies -> Quantizer -> Compressed audio out. In parallel: FFT -> Psychoacoustic model.]

Example: MPEG Layer III, AC-3
Multichannel Audio 5.1

Subwoofer

Left Center Right

Left surround Right surround


Advanced Audio Coding (AAC)
James Rodney P. Santiago
Family of Standards

[Diagram: the AAC family of standards]
AAC-LC
HE-AAC = AAC-LC + SBR (Spectral Band Replication)
HE-AAC v2 = HE-AAC + PS (Parametric Stereo)
AAC-LD, AAC-ELD
HD-AAC = AAC-LC + Scalable Lossless Codec
MPEG Surround = AAC-LC/HE-AAC + MPEG Surround
Functional Block Diagram of an MPEG-2 AAC Encoder

[Block diagram from Input to Output. Labeled blocks: Filterbank, TNS, System, Prediction, Scale Factors, Quantizer, Noiseless Coding, Bitstream Multiplex, Perceptual Model, Rate/Distortion Control]
MPEG AAC-LC
• AAC-LC is the next-generation successor to the mp3 audio
codec, invented and developed by Fraunhofer IIS.
• AAC-LC delivers transparent quality at only 64 kbit/s per
channel: compressed audio that is virtually
indistinguishable from the original audio source.
• AAC-LC satisfies the requirements for broadcast quality as
defined by the EBU. It offers flexible sampling rates ranging from
8 kHz up to 192 kHz, bit rates up to 256 kbit/s per channel,
and support for up to 48 channels.
• It can be used in applications that demand high quality and
have unlimited bandwidth.
• It supports mono, stereo and all common multi-channel
configurations.
• It is an ideal codec for any low-bit-rate, high-quality audio
application on mobile devices.
MPEG HE-AAC
• High Efficiency Advanced Audio Coding, also known
as AACplus.
• HE-AAC is the low-bit-rate codec that integrates the
functionality of the AAC-LC audio codec and the Spectral Band
Replication (SBR) bandwidth expansion tool.
• HE-AAC allows design flexibility to trade off quality against
bandwidth, file size or bit rate.
• HE-AAC delivers good stereo quality at bit rates of 32 to 48
kbit/s.
• The codec is multi-channel compatible.
MPEG HE-AAC v2
• Also known as AACplus v2.
• HE-AAC v2 adds the Parametric Stereo (PS) feature to the
initial HE-AAC to further enhance efficiency in low-bandwidth
media.
• Fraunhofer’s HE-AAC v2 codec delivers good-quality audio at
bit rates from 16 to 24 kbit/s for stereo content.
MPEG AAC-ELD
• AAC-LD, the low delay version of AAC.
• It combines the full-bandwidth, superior quality of AAC with a
low coding delay necessary for two-way audio communication.
• It features an algorithmic delay of only 20 ms, while offering
CD-like audio quality at 64 kbit/s per channel.
• With the integration of SBR technology and the feature set of
the LD codec, Fraunhofer’s AAC-ELD provides full audio
bandwidth at data rates down to 24 kbit/s per channel.
• Both the AAC-LD and AAC-ELD codecs are perfectly suited for
applications that require bi-directional communication, such as
Internet telephony and video conferencing.
HD-AAC
• The MPEG standard HD-AAC offers music encoding with quality
beyond CDs while being compatible with iPods and mobile phones.
• Audio CDs store uncompressed music in 16-bit, 44.1 kHz quality,
while most music is now produced in the improved 24-bit, 96 kHz
format.
• HD-AAC provides this high-quality sound experience to the user,
online music distribution and the consumer electronics industry.
• Based on the MPEG standards Scalable Lossless (SLS) and AAC,
HD-AAC provides scalable-to-lossless compression of 24-bit quality
music content, thereby ensuring a seamless migration to future
AAC-compliant standards.
SUMMARY
AAC-LC (Low Complexity Advanced Audio Coding)
  Features: high performance audio codec for excellent audio quality at low bit rates
  Typical applications: Apple iPod, iTunes, ISDB-T television broadcast (Japan)
  Typical bit rate: 128 kbit/s (stereo)

HE-AAC (High Efficiency AAC), AACplus
  Features: high performance audio codec for good quality at bit rates of 28 kbit/s per channel
  Typical applications: XM Radio, mobile music download, Digital Radio Mondiale (DRM)
  Typical bit rate: 56 kbit/s (stereo)

HE-AAC v2 (AACplus v2)
  Features: highest performance audio codec for good quality at bit rates below 24 kbit/s per channel
  Typical applications: 3GPP music download, Digital Radio DAB+
  Typical bit rate: 48 kbit/s (stereo)

HD-AAC (High Definition AAC)
  Features: lossless audio codec for better than CD quality with 24 bit and sampling up to 192 kHz
  Typical applications: home networks, music distribution/production
  Typical bit rate: approximately half the bit rate of the uncompressed file

AAC-LD (Low Delay AAC)
  Features: AAC encoding with 20 ms algorithmic delay
  Typical applications: video conferencing, VoIP telephony, broadcast gateway
  Typical bit rate: 128 kbit/s (stereo)

AAC-ELD (Enhanced Low Delay AAC)
  Features: low delay, full audio bandwidth codec at data rates down to 24 kbit/s per channel and 15 ms delay
  Typical applications: video conferencing, VoIP telephony, broadcast gateway
  Typical bit rate: 64 kbit/s (stereo)

MPEG Surround
  Features: surround sound extension for AAC-LC and HE-AAC
  Typical applications: digital radio in surround, mobile TV with binaural surround sound, music distribution/production
  Typical bit rate: 64 - 192 kbit/s (5.1 channels)
Thank you very much for your attention
