
Grupo de Tratamiento de Imágenes Universidad Politécnica de Madrid

Multimedia Systems and Services

T3: Compression of Audiovisual Signals

Introduction and fundamental concepts

Luis Salgado
L.Salgado@gti.ssr.upm.es

SSMM @ ETSIT-UPM Compression of Audiovisual Signals – Fundamentals – 1



Contents

1. Introduction
2. Digital representation
- Speech/Audio
- Images
- Video
3. Compression: needs, goals and requirements
4. Source coding


Introduction

- Sources of information:
  - Audio/speech
  - Image
  - Video
  - 2D/3D graphics


Introduction

- Sampling → discrete-time signal
  - Sampling frequency: fs
  - Nyquist sampling theorem: fs ≥ 2 × fmax
  - Sets the minimum sampling rate for perfect reconstruction
- Quantization → digital signal
  - Finite number of possible values per sample
- Source coding:
  - Quantization intervals are represented by symbols from a finite alphabet
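
The sampling and quantization steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names (`sample`, `quantize_uniform`, `dequantize_uniform`), the 1 kHz test tone and the 3-bit uniform quantizer over [-1, 1] are not from the slides.

```python
import math

def sample(signal, fs, duration):
    """Evaluate a continuous-time function at the instants t = n / fs."""
    n_samples = int(round(fs * duration))
    return [signal(n / fs) for n in range(n_samples)]

def quantize_uniform(x, bits, x_min=-1.0, x_max=1.0):
    """Return the index (symbol) of the equal-width interval that contains x."""
    levels = 2 ** bits
    step = (x_max - x_min) / levels
    index = int((x - x_min) / step)
    return min(max(index, 0), levels - 1)  # clamp so that x == x_max stays in range

def dequantize_uniform(index, bits, x_min=-1.0, x_max=1.0):
    """Return the centre of the interval identified by index."""
    step = (x_max - x_min) / 2 ** bits
    return x_min + (index + 0.5) * step

f_max = 1000.0                     # assumed tone frequency (Hz)
fs = 8000.0                        # satisfies fs >= 2 * f_max
tone = lambda t: math.sin(2 * math.pi * f_max * t)
samples = sample(tone, fs, 0.001)  # one period of the tone
symbols = [quantize_uniform(s, 3) for s in samples]  # symbols from an 8-letter alphabet
```

Each symbol identifies one quantization interval; the reconstruction error per sample is at most half the interval width.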


Digital representation: Speech/Audio

- Components between 20 and 20,000 Hz
  - Limit of human hearing: ~20,000 Hz
- Uncompressed speech
  - Narrowband telephony: Pulse Code Modulation (ITU-T G.711)
    - Non-uniform robust quantization: A-law (log PCM)
    - 256 quantization intervals → 8 bit/sample
  - Wideband and extensions (ITU-T G.711.1)

| Freq. range (Hz) | fs (kHz) | Bits per sample | Bitrate (kb/s) | Application |
|---|---|---|---|---|
| 300 – 3,400 | 8 | 8 | 64 | Narrowband telephony |
| < 8,000 | 16 | 14 | 224 | Wideband speech |
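
As an illustration of the non-uniform (logarithmic) quantization mentioned above, the following sketch implements the continuous A-law companding curve with A = 87.6. It is a sketch only: G.711 itself specifies a piecewise-linear approximation of this curve, and the function names are chosen here for illustration.

```python
import math

A = 87.6  # A-law parameter; G.711 uses a segmented approximation of this curve

def a_law_compress(x, a=A):
    """Compress a sample x in [-1, 1]; small amplitudes are expanded."""
    ax = abs(x)
    if ax < 1.0 / a:
        y = a * ax / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)

def a_law_expand(y, a=A):
    """Inverse of a_law_compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(a)):
        x = ay * (1.0 + math.log(a)) / a
    else:
        x = math.exp(ay * (1.0 + math.log(a)) - 1.0) / a
    return math.copysign(x, y)
```

Quantizing the compressed value uniformly with 8 bits gives small quantization intervals for low-level samples and larger ones for high-level samples.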


Digital representation: Speech/Audio

- Uncompressed audio:
  - Quality depends on the application
  - Uniform quantization

| fs (kHz) | Bits per sample | Channels | Bitrate (kb/s) | Application |
|---|---|---|---|---|
| 32 | 16 | 2 | 1024 | DAT, broadcast (still...) |
| 44.1 | 16 | 2 | 1411.2 | CD, DAT (consumer) |
| 48 | 16 | 2 | 1536 | DAT, DVD-Video (professional) |
| 96 | 16/20/24 | 2 | 3072/3840/4608 | Digital recording software/hardware |
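
The bitrates in the speech and audio tables follow from fs × bits per sample × number of channels. A minimal check (the function name `pcm_bitrate_kbps` is chosen here for illustration):

```python
def pcm_bitrate_kbps(fs_khz, bits_per_sample, channels=1):
    """Uncompressed PCM bitrate in kb/s."""
    return fs_khz * bits_per_sample * channels

narrowband = pcm_bitrate_kbps(8, 8)        # telephony
wideband = pcm_bitrate_kbps(16, 14)        # wideband speech
cd_audio = pcm_bitrate_kbps(44.1, 16, 2)   # stereo CD
```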


Aliasing

- Sampling → replication of the signal spectrum
- Overlap of the high frequencies → aliasing

[Audio examples (© Mark Handley): real music sampled at 48 kHz; at 8 kHz with aliasing; at 8 kHz after low-pass filtering (LPF)]
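
A short sketch of where a pure tone ends up when it is sampled without an anti-aliasing filter. The function name `alias_frequency` is illustrative; the computation is the standard folding of the tone frequency around multiples of fs.

```python
def alias_frequency(f, fs):
    """Apparent frequency (Hz) of a tone at f Hz after sampling at fs Hz."""
    f_folded = f % fs
    return min(f_folded, fs - f_folded)

# A 5 kHz component sampled at 8 kHz is heard at 3 kHz; below fs/2 nothing changes.
aliased = alias_frequency(5000, 8000)
unchanged = alias_frequency(3000, 8000)
```

Low-pass filtering the signal to fs/2 before sampling removes the components that would otherwise fold back into the band.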


Digital representation of Images

- 2D signals
- Sampled at specific spatial locations
- Quantized signal values
- Pixel: minimum unit
- Aspect ratio: width/height

Source: “Digital Image Processing”, R. Gonzalez and R. Woods, Prentice Hall, 2002

Digital representation of Images

- Spatial resolution: “dots per inch” (dpi)
  - Higher spatial resolution → higher definition
  - Signal frequency is defined in space
- Depth: “bits per pixel” (bpp)
  - Bits used to represent the information of each pixel
  - For a compressed image, the average bpp indicates the degree of compression

[Figure: the same image at 72 dpi (Web) and at 300 dpi. Source: hp.com]


Digital representation of Images

- Intuitive Nyquist

[Figure: original image, sampling points (spatial resolution) and sampled image; with too coarse a sampling grid, details are missed]


Digital representation of Images

- Intuitive Nyquist

[Figure: original image with a signal period of 2 mm, sampled with a spatial resolution (sampling interval) of 1 mm; no details are missed in the sampled image]

Nyquist:
- sampling interval (spatial resolution) ≤ ½ × minimum signal period, or
- sampling frequency fs ≥ 2 × maximum frequency in the image

Digital representation of images

- RGB:
  - Uniform quantization of each color plane: 8 bits
  - True color: 24 bpp
- Others: YCrCb, HSV...
- Indexed color:
  - Limited set of colors
  - Pixel values are indices into a Color Look-Up Table (CLUT)

Source: An introduction to Digital Image Processing with MATLAB, A. McAndrew, Victoria University of Technology, 2004.


RGB

512 × 512 pixels × 3 planes × 8 bits ≈ 6.29 Mbit

Luminance and color

- Human vision ranges: scotopic, mesopic, photopic.
- Retinal photoreceptors
  - Rods: large number (~100M), scotopic, no color, low illumination (night).
  - Cones: lower number (~6M), photopic, color, high illumination (day).
- Tri-receptor theory of color vision (1802)
  - Photoreceptors (cones) render three values $\alpha_i$

$$\alpha_i(C) = \int C(\lambda)\, a_i(\lambda)\, d\lambda, \qquad i = 1, 2, 3$$

[Figure: spectral sensitivity curves $a_1(\lambda)$, $a_2(\lambda)$, $a_3(\lambda)$]

Source: “Digital Image Processing”, R. Gonzalez and R. Woods, Prentice Hall, 2002


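Luminance and color

$$Y = \int C(\lambda)\, a_Y(\lambda)\, d\lambda, \qquad a_Y(\lambda) = a_1(\lambda) + a_2(\lambda) + a_3(\lambda)$$

[Figure: sensitivity curves $a_Y(\lambda)$, $a_1(\lambda)$ (scaled ×20), $a_2(\lambda)$ and $a_3(\lambda)$]

Source: “Video processing and communications”, Y. Wang, Prentice Hall, 2002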


Luminance and Chrominances: YCrCb


Color sensitivity


Color subsampling

For the same 512 × 512 image, 8 bits per sample:

- 4:2:2: ~4.19 Mbit
- 4:2:0: ~3.15 Mbit
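
The sizes quoted for the 512 × 512 example can be reproduced by counting one luminance sample per pixel plus the chrominance samples that survive subsampling. The function name and the lookup table below are illustrative.

```python
def image_bits(width, height, bits=8, chroma="4:4:4"):
    """Uncompressed size in bits of a Y/Cb/Cr (or RGB, for 4:4:4) image."""
    # Fraction of luminance positions that carry a Cb (and a Cr) sample.
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:1:1": 0.25, "4:2:0": 0.25}[chroma]
    luma_samples = width * height
    return int(luma_samples * bits * (1 + 2 * chroma_fraction))

full = image_bits(512, 512, chroma="4:4:4")    # 6,291,456 bits
half = image_bits(512, 512, chroma="4:2:2")    # 4,194,304 bits
quarter = image_bits(512, 512, chroma="4:2:0") # 3,145,728 bits
```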


Aliasing

- Staircasing
- Interference

[Figures. Sources: “epigrammedia.com”, “directorfotografia.com”, “www.svi.nl”]


Aliasing

- Moiré patterns

[Figure. Source: “en.wikipedia.org”]


Digital representation: Images

- Uncompressed images:
  - Resolution related to the sensor
  - Uniform quantization of the color planes

| Width (pixels) | Height (pixels) | Planes | Bits per plane | Mbit | Device |
|---|---|---|---|---|---|
| 2688 | 1520 | 3 | 8 | 98.058 | HTC One (M8), 4 Mpix |
| 3264 | 2448 | 3 | 8 | 191.77 | Apple iPhone 5s, 8 Mpix |
| 4320 | 2432 | 3 | 8 | 252.15 | Motorola X, 10 Mpix |
| 4160 | 3120 | 3 | 8 | 311.50 | LG G2, 13 Mpix |
| 4992 | 3744 | 3 | 8 | 448.56 | Nokia Lumia 1520, 20 Mpix |


Video signal

- Analog:
  - Represented as a continuous (time-varying) 1-D signal
  - Sampling done in space (rows) and time → scanning
- Digital:
  - Sequence of digital images: a set of coded, quantized samples
  - Keeps quality through regeneration
  - Greatly expands the processing and manipulation capabilities
- Scanning:
  - Periodic sampling of the light information at the camera
  - Records information about the light distribution along predefined sampling lines
  - Includes control information: synchronization pulses


Scanning

- Progressive scanning
  - Generates a complete image (frame) by sampling consecutive lines
  - Frame rate: images/s
  - To avoid flicker: typically 25/30 images/s (PAL/NTSC)
- Interlaced scanning
  - Each scan samples 1 of every N lines of the complete image
  - Interlace ratio N:1; typically N = 2
  - The whole image (frame) is composed of 2 fields (half images)


Analog video main parameters

| Main parameters | PAL | NTSC |
|---|---|---|
| Number of lines in the image [lines/frame] | 625 | 525 |
| Line scanning interval or line period | 64 μs | 63.555 μs |
| Line blanking or horizontal blanking interval | 12.05 μs | 10.90 μs |
| Interlace ratio | 2:1 | 2:1 |
| Field blanking or vertical blanking interval | 25 lines/field | 21 lines/field |
| Field period | 20 ms | 16.6833 ms |
| Field rate | 50 Hz | 59.94 Hz |

- Bandwidth
  - Computed for the worst case: B = 7.38 MHz estimated for PAL TV
  - Perceptual and display resolution limitations allow B to be reduced
  - Bedford and Kell estimate a 30% reduction → B′ = 5.5 MHz


Digital video signal

[Block diagram]
- TV/video camera → RGB signals
- Analog path: RGB signals → analog formatting → composite video (PAL/NTSC/SECAM) → analog TV and analog broadcasting
- Digital path: RGB signals → A/D conversion → digital formatting → digital video (BT.601 / 709 / 2020; TV / HDTV / UHDTV studio) → video compression (MPEG-x / HEVC) → lower bit rates → DTV and digital broadcasting


Video digitization

- Raster signal sampling → line sampling (horizontal dimension)
- Sampling rate:
  - Samples vertically aligned
  - Horizontal sampling interval = vertical sampling interval
  - Common for different systems

f_l (NTSC) = 525 lines/frame × 29.97 frames/s ≈ 15.734 kHz (nominally 525 × 30 = 15.75 kHz)
f_l (PAL) = 625 lines/frame × 25 frames/s = 15.625 kHz
f_s = 858 × f_l (NTSC) = 864 × f_l (PAL) = 13.5 MHz
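
The common sampling frequency can be checked with exact rational arithmetic. The sketch assumes the usual exact NTSC frame rate of 30000/1001 frames/s (≈ 29.97); the variable names are illustrative.

```python
from fractions import Fraction

ntsc_frame_rate = Fraction(30000, 1001)  # ≈ 29.97 frames/s (assumed exact value)
pal_frame_rate = 25

fl_ntsc = 525 * ntsc_frame_rate          # line frequency in Hz, ≈ 15734.27
fl_pal = 625 * pal_frame_rate            # 15625 Hz

fs_ntsc = 858 * fl_ntsc                  # samples per line × lines per second
fs_pal = 864 * fl_pal
```

Both products give exactly 13,500,000 Hz, which is why a single sampling clock serves both systems.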


Digital video digitization/representation

- ITU-R BT.601: Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios

| | NTSC 525/60 (60 fields/s) | PAL 625/50 (50 fields/s) |
|---|---|---|
| Total pels per line | 858 | 864 |
| Active pels per line | 720 | 720 |
| Total lines | 525 | 625 |
| Active lines | 480 | 576 |
| Horizontal blanking (pels to the left / right of the active area) | 122 / 16 | 132 / 12 |

- ITU-R BT.709: Parameter values for the HDTV standards for production and international programme exchange
- ITU-R BT.2020: Parameter values for ultra-high definition television systems for production and international programme exchange

Luminance and Chrominances

- Non-gamma-corrected components in [0,1]: $R = E_R$, $G = E_G$, $B = E_B$
  - They could already be digital signals in [0,255], normalized
- Luminance in [0,1], where $k_R$ and $k_B$ are the R and B contributions to luminance:

$$E_Y = k_R E_R + (1 - k_R - k_B) E_G + k_B E_B$$

- Normalized color differences in [-0.5, 0.5]:

$$E_{C_R} = \frac{E_R - E_Y}{2(1 - k_R)}, \qquad E_{C_B} = \frac{E_B - E_Y}{2(1 - k_B)}$$

| | BT.601 | BT.709 | BT.2020 |
|---|---|---|---|
| $k_R$ | 0.299 | 0.2126 | 0.2627 |
| $k_B$ | 0.114 | 0.0722 | 0.0593 |
| $E_Y$ | $0.299 E_R + 0.587 E_G + 0.114 E_B$ | $0.2126 E_R + 0.7152 E_G + 0.0722 E_B$ | $0.2627 E_R + 0.6780 E_G + 0.0593 E_B$ |
| Bit depth | D = 1 / 4 → 8/10 bits | n = 8/10 bits | n = 10/12 bits |

Quantization (BT.601):

$$Y = \frac{\mathrm{int}\left[(219 E_Y + 16)\, D\right]}{D}, \qquad C_R = \frac{\mathrm{int}\left[(224 E_{C_R} + 128)\, D\right]}{D}, \qquad C_B = \frac{\mathrm{int}\left[(224 E_{C_B} + 128)\, D\right]}{D}$$

Quantization (BT.709 and BT.2020):

$$Y = \mathrm{int}\left[(219 E_Y + 16)\, 2^{n-8}\right], \qquad C_R = \mathrm{int}\left[(224 E_{C_R} + 128)\, 2^{n-8}\right], \qquad C_B = \mathrm{int}\left[(224 E_{C_B} + 128)\, 2^{n-8}\right]$$
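
A minimal sketch of the conversion from normalized R, G, B values to quantized Y, CB, CR. Assumptions made here: `int` is implemented as rounding to the nearest integer, the 2^(n-8) scaling is used for all three standards (for n = 8 it coincides with the BT.601 form with D = 1), and the names `COEFFS` and `rgb_to_ycbcr` are illustrative.

```python
import math

# (k_R, k_B) per recommendation, as listed in the table above.
COEFFS = {
    "BT.601": (0.299, 0.114),
    "BT.709": (0.2126, 0.0722),
    "BT.2020": (0.2627, 0.0593),
}

def rgb_to_ycbcr(e_r, e_g, e_b, standard="BT.601", n=8):
    """Convert components in [0, 1] to n-bit (Y, CB, CR) codes."""
    k_r, k_b = COEFFS[standard]
    e_y = k_r * e_r + (1 - k_r - k_b) * e_g + k_b * e_b
    e_cr = (e_r - e_y) / (2 * (1 - k_r))   # in [-0.5, 0.5]
    e_cb = (e_b - e_y) / (2 * (1 - k_b))   # in [-0.5, 0.5]
    scale = 2 ** (n - 8)
    nearest = lambda v: int(math.floor(v + 0.5))  # assumed meaning of int[.]
    y = nearest((219 * e_y + 16) * scale)
    cb = nearest((224 * e_cb + 128) * scale)
    cr = nearest((224 * e_cr + 128) * scale)
    return y, cb, cr

white = rgb_to_ycbcr(1.0, 1.0, 1.0)   # (235, 128, 128)
black = rgb_to_ycbcr(0.0, 0.0, 0.0)   # (16, 128, 128)
red = rgb_to_ycbcr(1.0, 0.0, 0.0)     # (81, 90, 240)
```

With 8 bits, luminance occupies the codes 16 to 235 and the chrominances 16 to 240, centred on 128.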


Chrominance sampling

- Structure
  - Spatially static and orthogonal
  - Samples of the components should be co-sited
  - Repeated for each line, field and frame
- Sampling hierarchy
  - Sampling families are represented by three values that identify the sampling frequency of each component (... initially!)
  - The maximum sampling frequency is identified with 4
  - fs = 13.5 MHz in BT.601 (18 MHz also considered)


Chrominance sampling

| Format | Chrominance samples | Subsampling |
|---|---|---|
| 4:4:4 | For every 2×2 Y pixels: 4 Cb and 4 Cr pixels | None |
| 4:2:2 | For every 2×2 Y pixels: 2 Cb and 2 Cr pixels | 2:1, horizontally only |
| 4:1:1 | For every 4×1 Y pixels: 1 Cb and 1 Cr pixel | 4:1, horizontally only |
| 4:2:0 | For every 2×2 Y pixels: 1 Cb and 1 Cr pixel | 2:1, both horizontally and vertically |

[Figure: positions of the Y pixels and of the Cb and Cr pixels for each format]

- Adequate filtering is always required before subsampling
- 4:4:4: RGB or YCrCb
- 4:2:0: implementations differ in the chroma sample location
- How should it be implemented in interlaced video?

Some data rates...

| Application | Video format | Size | Color sampling | Frame rate (Hz) | Raw data rate (Mbps) |
|---|---|---|---|---|---|
| UHD production and program exchange | BT.2020 (8K) | 7680x4320 | 4:4:4/4:2:2/4:2:0 | 25P/60P | 12441/29859 |
| | BT.2020 (4K) | 3840x2160 | 4:4:4/4:2:2/4:2:0 | 25P/60P | 3110/7464 |
| HDTV production and program exchange | BT.709 | 1920x1080 | 4:2:2 | 24P/30P/60I | 796/995/995 |
| HDTV over air, cable, satellite (MPEG-2 video, 20-45 Mbps) | SMPTE295M | 1920x1080 | 4:2:0 | 24P/30P/60I | 597/746/746 |
| | SMPTE296M | 1280x720 | 4:2:0 | 24P/30P/60P | 265/332/664 |
| Video production | BT.601 | 720x480/576 | 4:4:4 | 60I/50I | 249 |
| | BT.601 | 720x480/576 | 4:2:2 | 60I/50I | 166 |
| High quality video distribution (DVD, SDTV) | BT.601 | 720x480/576 | 4:2:0 | 60I/50I | 124 |
| Intermediate quality video distribution (VCD, WWW) | SIF | 352x240/288 | 4:2:0 | 30P/25P | 30 |
| Video conferencing over ISDN/Internet | CIF | 352x288 | 4:2:0 | 30P | 37 |
| Video telephony over wired/wireless modem | QCIF | 176x144 | 4:2:0 | 30P | 9.1 |
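
The raw rates in the table can be approximated as width × height × frames/s × samples per pixel × bits per sample. The sketch below assumes 8 bits per sample unless stated and counts an interlaced 60I signal as 30 frames/s; the function name is illustrative. The UHD figures appear to match 4:2:0 sampling at 10 bits per sample.

```python
def raw_rate_mbps(width, height, frames_per_s, chroma="4:2:0", bits=8):
    """Uncompressed video data rate in Mbps (10**6 bits per second)."""
    samples_per_pixel = {"4:4:4": 3.0, "4:2:2": 2.0, "4:1:1": 1.5, "4:2:0": 1.5}[chroma]
    return width * height * frames_per_s * samples_per_pixel * bits / 1e6

bt601_444 = raw_rate_mbps(720, 480, 30, "4:4:4")         # ≈ 249
bt601_420 = raw_rate_mbps(720, 480, 30, "4:2:0")         # ≈ 124
hd_422 = raw_rate_mbps(1920, 1080, 24, "4:2:2")          # ≈ 796
uhd_4k = raw_rate_mbps(3840, 2160, 25, "4:2:0", bits=10) # ≈ 3110
```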


Need for compression

- Image: 8.0-megapixel camera (iPhone), 3264×2448
  - ≈ 24 MByte/image → 41 pictures per GB
- Video:
  - Video 720×576, RGB, 25 frames/s → 31.1 MByte/s
  - Audio 16 bits × 44.1 kHz, stereo → 176.4 kByte/s
  - DVD disc, 4.7 GB → ~2.5 min per DVD disc
  - RGE-1 network (DTT multiplex): 19.91 Mbps → not even one standard-definition TV channel!
- Sending video from a cellphone:
  - 352×288, RGB, 15 frames/s → 4.56 MByte/s
  - Bandwidth → cost
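
The figures on this slide can be reproduced with a few lines of arithmetic. The sketch assumes 3 bytes per RGB pixel and decimal units (1 GB = 10^9 bytes); the variable names are illustrative.

```python
BYTES_PER_RGB_PIXEL = 3

image_bytes = 3264 * 2448 * BYTES_PER_RGB_PIXEL         # 23,970,816 bytes per picture
pictures_per_gb = 10**9 // image_bytes                  # whole pictures in 1 GB

video_bytes_per_s = 720 * 576 * BYTES_PER_RGB_PIXEL * 25  # 31,104,000 bytes/s
audio_bytes_per_s = 16 * 44100 * 2 // 8                   # 176,400 bytes/s
dvd_minutes = 4.7e9 / (video_bytes_per_s + audio_bytes_per_s) / 60

video_mbps = video_bytes_per_s * 8 / 1e6                # compare with the 19.91 Mbps multiplex
phone_bytes_per_s = 352 * 288 * BYTES_PER_RGB_PIXEL * 15
```

The uncompressed video alone needs about 249 Mbps, more than twelve times the capacity of the multiplex.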


Goals of compression

- ↓↓ Redundancy: whatever exceeds what is necessary or required
  - Symbol redundancy: take advantage of symbol probabilities
  - Spatial and temporal redundancy
    - Adjacent samples are highly correlated
    - In video, co-located samples in different images are correlated
- ↓↓ Irrelevance or perceptual redundancy
  - Depends on perceptual limitations: build a perceptual model
  - Remove the information that will not be perceived
- Represent the same information with fewer bits...
  - Lossless: preserves ALL the information; the original is perfectly recoverable
  - Lossy: eliminates perceptually insignificant information → the original signal cannot be recovered


Perceptual redundancy

- Exploited through the quantization of the audiovisual information
  - Quantization intervals adapted to the sensitivity of the auditory/visual system
  - Smaller intervals for information to which we are more sensitive
  - Larger intervals for information that is less perceptually relevant
- The signal should be “represented” in an alternative way (space) in which:
  - The relevance of the information is highlighted from a perceptual point of view
  - If possible, samples are less correlated than in the original space → introduces signal decorrelation
- Perceptual models are defined in the frequency domain:
  - Signal transformations are to be used: subband decomposition, FFT, DCT...


Requirements for compression algorithms

- Lossless
  - The decoded signal is mathematically identical to the original one
  - Drawback: achieves only a small or modest level of compression
- Lossy
  - The decoded signal is of lower quality than the original one
  - Advantage: achieves a very high degree of compression
  - Objective:
    - Maximize the degree of compression for a given quality, or
    - Maximize the quality for a given degree of compression
- General compression requirements
  - Ensure good quality of the decoded signal at high compression ratios
  - Minimize the complexity of the encoding and decoding processes
  - Support multiple channels and various data rates
  - Introduce only a small processing delay


Fidelity Criteria for quality evaluation

- Used to measure the signal quality
  - In compression, to evaluate the impact of losing real or quantitative signal information
- Subjective criteria:
  - Require the definition of a grading scale for qualitative evaluation
  - Require standardized testing protocols involving a relevant sample of people → difficult to implement
  - The best approach for comparison when the target is to generate high-quality compressed signals according to human perception
- Objective criteria:
  - Evaluate the similarity between two signals (for example, images), one of them taken as reference, through a mathematical function
  - In compression, they indicate the loss of information between the input and output signals of the compression process
  - Not always correlated with the results of subjective evaluation!


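Fidelity Criteria for quality evaluation

- Mean Square Error (MSE)
  - Example: for an image $I(x,y)$ of size M×N and its decoded version $\hat{I}(x,y)$:

$$\sigma^2_{mse} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ I(x,y) - \hat{I}(x,y) \right]^2$$

- SNR and PSNR (dB):

$$SNR = 10 \log_{10} \frac{\sigma^2}{\sigma^2_{mse}}, \qquad \sigma^2 = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ I(x,y) \right]^2$$

$$PSNR = 10 \log_{10} \frac{\left[\max_{theoretical}(I)\right]^2}{\sigma^2_{mse}}$$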
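
A minimal pure-Python sketch of the three objective measures, for images stored as lists of rows. The function names and the 2×2 test images are illustrative; the peak value of 255 assumes 8-bit samples, and identical images (zero MSE) are not handled.

```python
import math

def mse(ref, test):
    """Mean square error between two images of the same size."""
    m, n = len(ref), len(ref[0])
    total = sum((ref[x][y] - test[x][y]) ** 2 for x in range(m) for y in range(n))
    return total / (m * n)

def snr_db(ref, test):
    """Ratio of mean signal power to MSE, in dB."""
    m, n = len(ref), len(ref[0])
    power = sum(v * v for row in ref for v in row) / (m * n)
    return 10 * math.log10(power / mse(ref, test))

def psnr_db(ref, test, peak=255):
    """Ratio of squared theoretical maximum to MSE, in dB."""
    return 10 * math.log10(peak ** 2 / mse(ref, test))

original = [[100, 100], [100, 100]]   # hypothetical 2x2 image
decoded = [[101, 99], [100, 100]]     # hypothetical decoded version
error = mse(original, decoded)        # 0.5
```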


Fidelity Criteria for quality evaluation

- Example for image compression

[Figure: images a), b) and c)]

- Objective:
  - SNR-ab = 11.35
  - SNR-ac = 11.69
- Subjective:
  - b: passable
  - c: marginal?


Source coding scheme

[Block diagram: x[n] (block of n samples) → Transformation → z[n] (transform coefficients) → Quantization → zq[n] (quantized coefficients) → Coding → bit patterns]

- Transformation:
  - Alternative representation of the signal
  - Helps to remove redundancy and highlight perceptual relevance
  - Reversible: typically no loss of information
- Quantization:
  - Adapted to signal redundancy and perceptual relevance
  - Loss of information! → the original signal is not recoverable
- Coding (VLC: Variable Length Coding):
  - Input data (symbols) are transformed into codewords
  - Removes input data redundancy (symbol probabilities) → entropy coding

VLC Coding

- Ignores the semantics of the input data and compresses media streams by regarding them as sequences of digits or symbols
- Examples: run-length coding, Huffman coding, ...
- Desired properties of symbol codes:
  - Non-singular: every symbol xi in X maps to a different codeword
  - Uniquely decodable: every sequence {x1, x2, ..., xn} maps to a different codeword sequence
  - Instantaneous: no codeword is a prefix of any other codeword

[Figure: nested sets of codes; instantaneous ⊂ uniquely decodable ⊂ non-singular. Link: VLC coding examples]
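
As one example of a variable length code, here is a compact sketch of Huffman code construction together with a check of the prefix (instantaneous) property. The function names and the four-symbol source are illustrative; a single-symbol alphabet, which would receive an empty codeword, is not handled.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a Huffman code {symbol: bit string} from {symbol: probability}."""
    tie = count()  # tie-breaker so that the heap never has to compare dictionaries
    heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)  # the two least probable subtrees
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

def is_instantaneous(code):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = list(code.values())
    return not any(i != j and b.startswith(a)
                   for i, a in enumerate(words) for j, b in enumerate(words))

source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # hypothetical source
code = huffman_code(source)
average_length = sum(source[s] * len(code[s]) for s in source)  # bits per symbol
```

For this source the average length is 1.75 bits per symbol, against 2 bits for a fixed-length code.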


Transformation: Prediction

- Signals are correlated → use previous samples to predict the current one
  - Encoding the prediction error (the difference between the signal value and its prediction) lowers the bitrate
  - Prediction gain: for the same number of bits per sample, the use of prediction improves the signal-to-distortion ratio

[Block diagram]
- Coder: e[n] = x[n] − x̂[n] → quantizer → e_q[n]; the predictor (with delays) is fed with the reconstructed samples x̂[n] + e_q[n]
- Decoder: the reconstructed signal is e_q[n] + x̂[n], using the same predictor (with delays)

$$\hat{x}[n] = \sum_i \alpha_i\, x[n-i]$$
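
A minimal sketch of predictive coding (DPCM) with a first-order predictor and a uniform quantizer of the prediction error. Assumptions made here: a single predictor coefficient `alpha`, an initial state of 0 shared by coder and decoder, and the function names; the predictor operates on reconstructed samples so that both sides stay in step.

```python
def dpcm_encode(x, step, alpha=1.0):
    """Return the quantizer indices of the prediction error."""
    indices = []
    previous = 0.0  # previous reconstructed sample
    for sample in x:
        prediction = alpha * previous
        q = int(round((sample - prediction) / step))  # quantized error index
        indices.append(q)
        previous = prediction + q * step  # what the decoder will reconstruct
    return indices

def dpcm_decode(indices, step, alpha=1.0):
    """Rebuild the signal from the quantized prediction errors."""
    out = []
    previous = 0.0
    for q in indices:
        previous = alpha * previous + q * step
        out.append(previous)
    return out

signal = [10.0, 10.4, 11.1, 11.9, 12.2, 12.0]  # hypothetical correlated samples
codes = dpcm_encode(signal, step=0.5)
rebuilt = dpcm_decode(codes, step=0.5)
```

After the first sample the indices stay small, so they can be represented with fewer bits, while the reconstruction error never exceeds half the quantizer step.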


Transformation: frequency domain

- Target:
  - Decompose the original signal into “sub-signals”, each corresponding to a different frequency band
  - Generate a new signal in which the energy of the original signal is distributed among a reduced set of samples (coefficients) → energy efficiently packed
- Reversible process → no information is lost
- Compression: not achieved directly
- What for?
  - To apply perceptual models for coding:
    - Separating irrelevance...
    - Reducing redundancy...


Subband decomposition

- Filter bank that isolates the parts of the signal that correspond to different frequency ranges
  - Analysis filters + decimation → generation of the subband signals
  - Upsampling + synthesis filters → signal reconstruction
  - Perfect reconstruction is possible: it depends on the filters used
  - No compression is achieved

Source: http://zone.ni.com


Discrete Cosine Transform (DCT)

- Linear transformation used for audio and image/video compression
  - Fast algorithms exist to compute it, based on the DFT (FFT)
  - Real-valued (integer DCT is used)
  - Preserves energy, and its energy packing (signal decorrelation) is close to that of the optimum transform
- The signal is represented in an alternative space as:
  - A sequence of numbers (1-D DCT)
  - A matrix of numbers (2-D DCT)
  - Each number represents the amount of a certain frequency pattern contained in the original signal


1-D DCT

- The DCT of a sequence of N samples of a signal x[n] is a sequence of N coefficients C[u]
  - “n” refers to the temporal axis
  - “u” refers to the frequency axis
- The transformation consists of representing x[n] as a linear combination of N cosine-shaped base functions

Base functions:

$$F_b(u, n) = \sqrt{\frac{2}{N}}\, K(u) \cos\frac{(2n+1)u\pi}{2N}$$

$$C[u] = \sqrt{\frac{2}{N}}\, K(u) \sum_{n=0}^{N-1} x[n] \cos\frac{(2n+1)u\pi}{2N} = \sum_{n=0}^{N-1} x[n]\, F_b(u, n), \qquad u = 0, 1, \ldots, N-1$$

with

$$K(u) = \begin{cases} 1/\sqrt{2} & \text{if } u = 0 \\ 1 & \text{if } u \neq 0 \end{cases}$$
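
A direct (non-fast) sketch of the orthonormal 1-D DCT and its inverse, written as sums over the base functions. The function names and the test signals are illustrative; practical codecs use fast algorithms instead of this O(N²) form.

```python
import math

def base_function(u, n, size):
    """Value at sample n of the u-th DCT base function for blocks of `size` samples."""
    k = 1 / math.sqrt(2) if u == 0 else 1.0
    return math.sqrt(2 / size) * k * math.cos((2 * n + 1) * u * math.pi / (2 * size))

def dct_1d(x):
    """C[u] = sum over n of x[n] * base_function(u, n)."""
    size = len(x)
    return [sum(x[n] * base_function(u, n, size) for n in range(size))
            for u in range(size)]

def idct_1d(c):
    """x[n] = sum over u of C[u] * base_function(u, n); the transform is orthonormal."""
    size = len(c)
    return [sum(c[u] * base_function(u, n, size) for u in range(size))
            for n in range(size)]

flat = dct_1d([1.0] * 8)                 # a constant signal has only a DC coefficient
ramp = [float(v) for v in range(8)]      # hypothetical test signal
ramp_back = idct_1d(dct_1d(ramp))
```

Because the base functions are orthonormal, the energy of the coefficients equals the energy of the samples.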


Base functions for N=8

[Figure: plots of the eight base functions, u = 0 ... 7, as functions of n]

- 8 base functions (u = 0...7), each of 8 samples (n = 0...7)
- They represent signals of increasing frequency content:
  - u = 0 represents the DC component
  - u is the frequency axis: N functions of n (time axis)


DCT example

[Figure: a signal x[n] and its DCT coefficients C[u]; x[n] is obtained as the weighted sum C[0] × (base function u=0) + C[1] × (base function u=1) + C[2] × (base function u=2) + ...]

Most of the energy is concentrated in the first coefficients!


I-DCT example

- Consider only the first three coefficients
  - Those that keep most of the energy
  - Quantize the other 5 coefficients to 0
  - Reconstruct the signal using the Inverse DCT (I-DCT)

[Figure: quantized DCT Cr[u] (only C[0], C[1], C[2] are kept) → I-DCT → reconstructed signal xr[n], somewhat “smoothed”, shown next to the original signal x[n] and the reconstruction error]

The reconstruction error is typically low for most common signals!
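
The experiment can be sketched as follows: transform a block of 8 samples, keep the first three coefficients, set the other five to zero and invert. The sample values are a hypothetical smooth signal, and the helper names are illustrative; the transform is the orthonormal DCT defined earlier.

```python
import math

def base_function(u, n, size):
    k = 1 / math.sqrt(2) if u == 0 else 1.0
    return math.sqrt(2 / size) * k * math.cos((2 * n + 1) * u * math.pi / (2 * size))

def dct(x):
    return [sum(x[n] * base_function(u, n, len(x)) for n in range(len(x)))
            for u in range(len(x))]

def idct(c):
    return [sum(c[u] * base_function(u, n, len(c)) for u in range(len(c)))
            for n in range(len(c))]

x = [10, 11, 13, 16, 18, 19, 19, 18]   # hypothetical smooth block of 8 samples
c = dct(x)
c_r = c[:3] + [0.0] * 5                # keep C[0], C[1], C[2]; quantize the rest to 0
x_r = idct(c_r)                        # reconstructed ("smoothed") signal

error_energy = sum((a - b) ** 2 for a, b in zip(x, x_r))
kept_fraction = sum(v * v for v in c[:3]) / sum(v * v for v in c)
```

Since the transform preserves energy, the energy of the reconstruction error equals the energy of the discarded coefficients.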


Credits

- Some content has been adapted from material originally created by Enrique Rendón Angulo (EUITT-UPM), Juan Carlos San Miguel, Jesús Bescós and José María Martínez Sánchez (EPS-UAM).
