Luis Salgado
L.Salgado@gti.ssr.upm.es
Contents
1. Introduction
2. Digital representation
- Speech/Audio
- Images
- Video
3. Compression: needs, goals and requirements
4. Source coding
Introduction
Sources of information:
Audio/speech
Image
Video
2D/3D Graphics
Introduction
Source coding:
Quantization intervals represented by symbols from a finite alphabet
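As a minimal sketch of this idea, the Python fragment below maps a sample to the index of its quantization interval; that index is the symbol taken from a finite alphabet of 2^n_bits entries. The function names, the [-1, 1] input range and the midpoint reconstruction are illustrative assumptions, not taken from the slides.

```python
def quantize_uniform(x, n_bits, x_min=-1.0, x_max=1.0):
    """Return the index (symbol) of the uniform quantization interval containing x."""
    levels = 2 ** n_bits
    step = (x_max - x_min) / levels
    # Clip so that out-of-range samples fall in the first or last interval.
    x = min(max(x, x_min), x_max)
    index = int((x - x_min) / step)
    # x == x_max would give index == levels; it belongs to the last interval.
    return min(index, levels - 1)


def reconstruct(index, n_bits, x_min=-1.0, x_max=1.0):
    """Return the midpoint of the interval identified by index."""
    step = (x_max - x_min) / 2 ** n_bits
    return x_min + (index + 0.5) * step


symbol = quantize_uniform(0.3, n_bits=3)      # one of 8 possible symbols
approx = reconstruct(symbol, n_bits=3)        # value the decoder would use
print(symbol, approx)
```

With this choice of reconstruction point, the error for in-range samples never exceeds half a quantization step.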
Uncompressed Audio:
Different qualities depend on application
Uniform quantization
Aliasing
8 kHz – alias
8 kHz – LPF
© Mark Handley
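The aliasing effect can be reproduced numerically. The sketch below assumes an ideal sampler with no anti-aliasing low-pass filter and shows that a 5 kHz tone sampled at 8 kHz yields the same samples as a 3 kHz tone. The 5 kHz test frequency and the helper names are hypothetical, chosen only for illustration.

```python
import math


def alias_frequency(f, fs):
    """Apparent frequency (Hz) of a real sinusoid of frequency f sampled at fs."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)


def sample_cosine(f, fs, n_samples):
    """Samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * f * n / fs) for n in range(n_samples)]


fs = 8000                                   # sampling frequency, Hz
tone = sample_cosine(5000, fs, 16)          # above fs / 2: will alias
alias = sample_cosine(alias_frequency(5000, fs), fs, 16)
print(alias_frequency(5000, fs))
print(max(abs(a - b) for a, b in zip(tone, alias)))
```

A low-pass filter with cut-off below fs / 2, applied before sampling, removes the 5 kHz component instead of letting it fold back into the band.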
2D signals
Sampled at specific spatial locations
Quantized signal values
Pixel: minimum unit
Aspect ratio: width/height
Source: “Digital Image Processing”, R. Gonzalez and R. Woods, Prentice Hall, 2002
SSMM @ ETSIT-UPM Compression of Audiovisual Signals – Fundamentals – 10
Grupo de Tratamiento de Imágenes Universidad Politécnica de Madrid
[Figure: the same image at 72 dpi (Web) and at 300 dpi. Source: hp.com]
[Figure: original image and sampled image, for the 1 mm and 2 mm cases.]
Source: An introduction to Digital Image Processing with MATLAB, A. McAndrew, Victoria University of Technology, 2004.
RGB: 512 x 512 x 3 x 8 bits ~ 6.29 Mbit
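$\alpha_i(C) = \int C(\lambda)\, a_i(\lambda)\, d\lambda, \quad i = 1, 2, 3$
$Y = \int C(\lambda)\, a_Y(\lambda)\, d\lambda, \qquad a_Y(\lambda) = a_1(\lambda) + a_2(\lambda) + a_3(\lambda)$
[Figure: sensitivity curves $a_1$, $a_2$, $a_3$ and $a_Y$. Source: “Digital Image Processing”, R. Gonzalez and R. Woods, Prentice Hall, 2002]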
Color sensitivity
Color subsampling
4:2:2: ~ 4.19 Mbit
4:2:0: ~ 3.14 Mbit
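The sizes quoted for the 512x512 image can be checked with a few lines of Python. This is a sketch under stated assumptions: 8 bits per sample, one full-resolution luminance plane, and two chrominance planes whose sample count is scaled by the subsampling scheme. The function name and the scheme table are hypothetical.

```python
def raw_image_bits(width, height, bits_per_sample=8, scheme="4:4:4"):
    """Uncompressed size in bits of a three-component image."""
    # Chroma samples per luma sample, for each of the two chroma planes.
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[scheme]
    luma = width * height
    chroma = 2 * luma * chroma_fraction
    return int((luma + chroma) * bits_per_sample)


for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    bits = raw_image_bits(512, 512, scheme=scheme)
    print(scheme, bits, "bits =", bits / 1e6, "Mbit")
```

The 4:4:4 case has the same sample count as the full-resolution RGB image, so it reproduces the 6.29 Mbit figure.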
Aliasing
Staircasing
Interference
Source: “www.svi.nl”
Aliasing
Moiré patterns
Source: “en.wikipedia.org”
Uncompressed Images:
Resolution related to sensor
Uniform quantization of color planes
HVS – Frequency Response
Video signal
Analog:
Represented as a continuous (time varying) 1-D signal
Sampling done in space (rows) and time → Scanning
Digital
Sequence of digital images: set of quantized samples coded
Keeps quality through regeneration
Greatly expands the processing and manipulation capabilities
Scanning:
Periodic sampling of the light information at the camera
Records information about the light distribution along predefined sampling lines
Includes control information: synchronization pulses
Scanning
Progressive scanning
Generates a complete image (frame) by sampling consecutive lines
Frame rate: images/sec
To avoid flicker: typically 25/30 images/sec (PAL/NTSC)
Interlaced scanning
Each scan samples 1 out of every N lines of the complete image
Interlace ratio N:1. N=2 typically.
The whole image (frame) is composed of 2 fields (half images)
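The relation between a frame and its fields can be sketched in Python as follows. A frame is modelled here as a plain list of lines, which is an assumption made for illustration, as are the function names.

```python
def split_fields(frame, n=2):
    """Split a frame (list of lines) into n fields for N:1 interlacing."""
    # Field k takes lines k, k + n, k + 2n, ...
    return [frame[k::n] for k in range(n)]


def merge_fields(fields):
    """Re-interleave the fields into the complete frame."""
    n = len(fields)
    frame = [None] * sum(len(field) for field in fields)
    for k, field in enumerate(fields):
        frame[k::n] = field
    return frame


frame = list(range(576))               # line numbers stand in for line data
first, second = split_fields(frame)    # 2:1 interlace, two half images
print(len(first), len(second), merge_fields([first, second]) == frame)
```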
Bandwidth
Computed for the worst case: B = 7.38 MHz estimated for TV PAL
Perceptual and display resolution limitations allow reducing B
Bedford and Kell estimate a 30% reduction: B’ = 5.5 MHz
[Block diagram: a TV/video camera delivers RGB signals to two chains.
Analog: analog formatting (PAL/NTSC/SECAM) → composite video → TV and analog broadcasting.
Digital: A/D conversion → digital video (BT.601 / 709 / 2020, TV / HDTV / UHDTV studio) → digital formatting and video compression (MPEG-X / HEVC), lower bit-rates → DTV and digital broadcasting.]
Video digitization
f_l (NTSC) = 525 lines/frame x 29.97 frames/s ≈ 15.734 kHz (nominal: 525 x 30 = 15.75 kHz)
f_l (PAL) = 625 lines/frame x 25 frames/s = 15.625 kHz
f_s = 858 f_l (NTSC) = 864 f_l (PAL) = 13.5 MHz
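The common sampling frequency can be verified with exact arithmetic. The sketch below assumes the exact NTSC frame rate 30000/1001 frames/s (about 29.97), which the slide rounds; the variable names are illustrative.

```python
from fractions import Fraction

# Line frequencies in Hz.
FL_NTSC = 525 * Fraction(30000, 1001)   # assumed exact NTSC frame rate
FL_PAL = 625 * 25

# Samples per total line (active part plus blanking) times the line frequency.
fs_ntsc = 858 * FL_NTSC
fs_pal = 864 * FL_PAL

print(float(FL_NTSC), float(fs_ntsc), fs_pal)
```

Both products give the same value, which is why a single sampling frequency serves both systems.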
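[Figure: active area of each frame. NTSC: 480 active lines, with 122 pel and 16 pel outside the active area on each line. PAL: 576 active lines out of 625, with 132 pel and 12 pel outside the active area on each line.]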
ITU-R BT.709: Parameter values for the HDTV standards for production and international programme exchange
ITU-R BT.2020: Parameter values for ultra-high definition television systems for production and international programme exchange
Quantization
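BT.601: D = 1 / 4 for 8 / 10 bits. BT.709: n = 8 / 10 bits. BT.2020: n = 10 / 12 bits.
BT.601:
$Y = \operatorname{int}\left[ (219\, E'_Y + 16)\, D \right] / D$
$C_R = \operatorname{int}\left[ (224\, E'_{C_R} + 128)\, D \right] / D$
$C_B = \operatorname{int}\left[ (224\, E'_{C_B} + 128)\, D \right] / D$
BT.709 / BT.2020:
$Y = \operatorname{int}\left[ (219\, E'_Y + 16)\, 2^{n-8} \right]$
$C_R = \operatorname{int}\left[ (224\, E'_{C_R} + 128)\, 2^{n-8} \right]$
$C_B = \operatorname{int}\left[ (224\, E'_{C_B} + 128)\, 2^{n-8} \right]$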
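A small Python sketch of the n-bit form of these quantization formulas is given below. It assumes that the luminance signal E'_Y lies in [0, 1], that the colour-difference signals lie in [-0.5, 0.5], and that "int" rounds to the nearest integer. The function names are hypothetical.

```python
import math


def _int(x):
    # Assumed meaning of "int": round to nearest, halves rounded up.
    return math.floor(x + 0.5)


def quantize_luma(e_y, n=8):
    """Map E'_Y in [0, 1] to an n-bit code (16..235 when n = 8)."""
    return _int((219 * e_y + 16) * 2 ** (n - 8))


def quantize_chroma(e_c, n=8):
    """Map E'_CR or E'_CB in [-0.5, 0.5] to an n-bit code (16..240 when n = 8)."""
    return _int((224 * e_c + 128) * 2 ** (n - 8))


for n in (8, 10):
    print(n, quantize_luma(0.0, n), quantize_luma(1.0, n),
          quantize_chroma(-0.5, n), quantize_chroma(0.0, n), quantize_chroma(0.5, n))
```

The offsets 16 and 128 leave headroom and footroom around the nominal range, so black and white do not map to the extreme codes.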
Chrominance sampling
Structure
Spatially static and orthogonal
Samples of components should be co-sited
Repeated for each line, field and frame
Sampling hierarchy
Sampling families are represented by three values that identify the sampling frequency of each component (… initially!!)
Max. sampling frequency identified with 4
fs = 13.5 MHz in BT.601 (18 MHz also considered)
Chrominance sampling
Need of compression
Image: 8.0 million pixel camera (iPhone), 3264x2448
≈ 24 MByte/image → 41 pictures / 1 GB
Video:
video 720x576, RGB, 25 frames/s → 31.1 MByte/s
audio 16 bits x 44.1 kHz stereo → 176.4 KByte/s
DVD disc 4.7 GB → ~ 2.5 min per DVD disc
RGE-1 Network (TDT Multiplex): 19.91 Mbps → not even 1 STV channel!!!
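These figures follow from simple arithmetic, sketched below in Python. The calculation assumes 8 bits per colour component (3 bytes per RGB pixel) and decimal units (1 GB = 10^9 bytes); the variable names are illustrative.

```python
# Still image: 3264 x 2448 pixels, 3 bytes per pixel.
image_bytes = 3264 * 2448 * 3
pictures_per_gb = 10**9 // image_bytes

# Video: 720 x 576 pixels, RGB, 25 frames/s, in bytes per second.
video_rate = 720 * 576 * 3 * 25
# Audio: 16 bits (2 bytes) x 44.1 kHz x 2 channels, in bytes per second.
audio_rate = 2 * 44_100 * 2

# Playing time of a 4.7 GB DVD filled with uncompressed video plus audio.
dvd_minutes = 4.7e9 / (video_rate + audio_rate) / 60

# Does one uncompressed video channel fit in the 19.91 Mbps multiplex?
fits_in_multiplex = video_rate * 8 <= 19.91e6

print(image_bytes, pictures_per_gb, video_rate, audio_rate)
print(round(dvd_minutes, 2), fits_in_multiplex)
```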
Goals of compression
↓ ↓ Redundancy: …exceeds what is necessary or required…
Symbol redundancy: take advantage of symbol probabilities
Spatial and temporal redundancy
Adjacent samples are highly correlated
In video, co-located samples in different images are correlated
Perceptual redundancy
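$\sigma^2_{mse} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ I(x,y) - \hat{I}(x,y) \right]^2$
$SNR = 10 \log_{10} \frac{\sigma^2}{\sigma^2_{mse}}, \qquad \sigma^2 = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ I(x,y) \right]^2$
$PSNR = 10 \log_{10} \frac{\left[ \max_{theoretical}(I) \right]^2}{\sigma^2_{mse}}$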
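The objective measures can be computed directly from their definitions. The sketch below takes a reference image and its reconstruction as nested lists of equal size, and assumes a theoretical maximum of 255 (8-bit samples) for the PSNR. The function names are hypothetical, and identical images (zero error) are not handled.

```python
import math


def mse(ref, test):
    """Mean squared error between two M x N images."""
    m, n = len(ref), len(ref[0])
    total = sum((ref[x][y] - test[x][y]) ** 2 for x in range(m) for y in range(n))
    return total / (m * n)


def snr_db(ref, test):
    """Signal power of the reference over the error power, in dB."""
    m, n = len(ref), len(ref[0])
    power = sum(v * v for row in ref for v in row) / (m * n)
    return 10 * math.log10(power / mse(ref, test))


def psnr_db(ref, test, peak=255):
    """Squared theoretical maximum over the error power, in dB."""
    return 10 * math.log10(peak ** 2 / mse(ref, test))


original = [[2, 2], [2, 2]]
decoded = [[1, 1], [1, 1]]
print(mse(original, decoded), snr_db(original, decoded), psnr_db(original, decoded))
```

Neither measure models perception: two reconstructions with similar SNR can be judged quite differently by viewers.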
a) b) c)
Objective:
SNR-ab=11.35
SNR-ac=11.69
Subjective:
b: passable
c: marginal?
Transformation:
Alternative representation of the signal
Helps to remove redundancy and highlight perceptual relevance
Reversible: typically no loss of information
Quantization:
Adapted to signal redundancy and perceptual relevance
Loss of information! The original signal cannot be recovered
Coding (VLC: Variable Length Coding):
Input data (symbols) are transformed into codewords
Removes input data redundancy: symbol probabilities
entropy coding
VLC Coding
Classes of codes, each contained in the previous one: non-singular, uniquely decodable, instantaneous
VLC Coding: examples
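To make the instantaneous property concrete, the Python sketch below checks whether a code is prefix-free (no codeword is a prefix of another, which is what makes a code instantaneous) and compares its average length with the source entropy. The four-symbol code and its probabilities are hypothetical, as are the function names.

```python
import math


def is_instantaneous(codewords):
    """True if no codeword is a prefix of another (prefix-free code)."""
    words = sorted(codewords)
    # After sorting, a codeword that is a prefix of any other codeword
    # is also a prefix of the codeword that immediately follows it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def entropy(probabilities):
    """Source entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def average_length(probabilities, codewords):
    """Average codeword length in bits per symbol."""
    return sum(p * len(w) for p, w in zip(probabilities, codewords))


probs = [0.5, 0.25, 0.125, 0.125]     # hypothetical symbol probabilities
code = ["0", "10", "110", "111"]      # shorter codewords for likelier symbols
print(is_instantaneous(code), entropy(probs), average_length(probs, code))
print(is_instantaneous(["0", "01", "11"]))   # "0" is a prefix of "01"
```

For these probabilities the average length equals the entropy, the lower bound for any uniquely decodable code.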
Transformation: Prediction
Signals are correlated → use previous samples to predict
Encoding the prediction error (difference between the signal value and its prediction) lowers the bitrate
Prediction gain: for the same number of bits per sample, the use of prediction yields a gain in signal-to-distortion ratio
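[Block diagram of the predictive coder and decoder.
Coder: $e[n] = x[n] - \hat{x}[n]$ → Quantizer → $e_q[n]$; the reconstructed sample $\tilde{x}[n] = e_q[n] + \hat{x}[n]$ feeds the predictor (with delays).
Decoder: $\tilde{x}[n] = e_q[n] + \hat{x}[n]$, using the same predictor (with delays).]
$\hat{x}[n] = \sum_i \alpha_i\, \tilde{x}[n-i]$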
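The coder and decoder loops can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a first-order predictor with a single coefficient `a`, a uniform quantizer with step `step`, and prediction from reconstructed samples so that coder and decoder stay in step. None of these specific choices come from the slides.

```python
import math


def dpcm_encode(x, step, a=1.0):
    """Return the quantizer indices of the prediction error."""
    indices = []
    prev = 0.0                                # previous reconstructed sample
    for sample in x:
        pred = a * prev                       # prediction from the past
        error = sample - pred                 # prediction error e[n]
        q = math.floor(error / step + 0.5)    # uniform quantizer index
        indices.append(q)
        prev = pred + q * step                # same reconstruction as the decoder
    return indices


def dpcm_decode(indices, step, a=1.0):
    """Rebuild the signal from the quantized prediction errors."""
    out = []
    prev = 0.0
    for q in indices:
        prev = a * prev + q * step            # prediction plus quantized error
        out.append(prev)
    return out


signal = [0.0, 0.4, 0.9, 1.3, 1.2]            # hypothetical correlated samples
codes = dpcm_encode(signal, step=0.25)
decoded = dpcm_decode(codes, step=0.25)
print(codes, decoded)
```

Because the coder predicts from reconstructed samples rather than from the original ones, quantization errors do not accumulate at the decoder.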
Target:
Decompose the original signal into “sub-signals”, each corresponding to a different frequency band
Generate a new signal in which the original signal energy is distributed among a reduced set of samples (coefficients)
→ energy efficiently packed
Subband decomposition
Source: http://zone.ni.com
1-D DCT
The DCT of a sequence of N samples of a signal x[n] is a sequence of N coefficients C[u]
“n” refers to the temporal axis
“u” refers to the frequency axis
The transformation consists of representing x[n] as a linear combination of N cosine basis functions
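Base functions: $F_b(u,n) = \sqrt{\frac{2}{N}}\, K(u) \cos\frac{(2n+1)u\pi}{2N}$
$C[u] = \sqrt{\frac{2}{N}}\, K(u) \sum_{n=0}^{N-1} x[n] \cos\frac{(2n+1)u\pi}{2N} = \sum_{n=0}^{N-1} x[n]\, F_b(u,n)$ for $u = 0, 1, \ldots, N-1$
with $K(u) = 1/\sqrt{2}$ if $u = 0$, and $K(u) = 1$ if $u \neq 0$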
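A direct, unoptimized Python sketch of this transform and its inverse is given below. It assumes the orthonormal normalization, with a factor sqrt(2/N) and K(0) = 1/sqrt(2), so that the inverse uses the same basis functions. The eight-sample ramp used as input is a hypothetical test signal.

```python
import math


def K(u):
    return 1 / math.sqrt(2) if u == 0 else 1.0


def basis(u, n, N):
    """Base function F_b(u, n) for a transform of length N."""
    return math.sqrt(2 / N) * K(u) * math.cos((2 * n + 1) * u * math.pi / (2 * N))


def dct(x):
    """C[u] = sum over n of x[n] * F_b(u, n)."""
    N = len(x)
    return [sum(x[n] * basis(u, n, N) for n in range(N)) for u in range(N)]


def idct(C):
    """x[n] = sum over u of C[u] * F_b(u, n)."""
    N = len(C)
    return [sum(C[u] * basis(u, n, N) for u in range(N)) for n in range(N)]


x = [1, 2, 3, 4, 5, 6, 7, 8]                  # smooth, highly correlated signal
C = dct(x)
energy = sum(c * c for c in C)
first_two = (C[0] ** 2 + C[1] ** 2) / energy   # energy share of C[0] and C[1]
print([round(c, 3) for c in C])
print(round(first_two, 4), [round(v, 6) for v in idct(C)])
```

For a smooth input such as this ramp, nearly all of the energy falls in the first two coefficients, which is the energy compaction that makes the later quantization step effective.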
DCT example
[Figure: x[n] → DCT → C[u]. The signal x[n] is shown as the sum of the base functions for u = 0, 1, 2, ..., weighted by the coefficients C[0], C[1], C[2], ...]
Most of the energy is concentrated in the first coefficients!!!
I-DCT example
Typically low for most common signals!!!!
Credits