© 2009 D. S. Hirschberg
Introduction
• Terminology
• Performance Measures
• Information Theory
• Codes
Terminology
• Data Compression
transforms data to minimize size of its representation
In contrast: Data Reliability
is often implemented by adding check and parity bits,
and increases redundancy and size of the data
• Motivation
◦ increase capacity of storage media
◦ increase communication channel capacity
◦ achieve faster access to data
Applications
• Examples
◦ file compression – Gzip (Unix), Compactor (Mac), PKZIP (PC)
◦ automatic compression (disk doubler) – Stacker
◦ facsimile transmission
◦ modem compression
◦ CD/DVD players
◦ WWW – Flash
◦ digital camera image storage – JPEG
◦ HDTV
◦ Teleconferencing
Applications
◦ Image compression
  Lossless  b/w         JBIG  CR = 5%
  Lossless  grey scale  JPEG  5 bpp (CR = 65%)
  Lossy     color       JPEG  0.6 bpp (CR = 7%)
[Diagram: source → (modeler, coder) → compressed file → (modeler, decoder) → recover]
Types of Source Element Redundancy
• distribution
some elements occur more often than others,
e.g., ‘;’ in C programs
• repetition
elements often repeated,
e.g., 1 or 0 in b/w image bit maps
• patterns
correlation of symbol occurrences,
e.g., “th, qu” in English
• positional
some symbols occur mostly in the same relative position,
e.g., database fields
Methods for Compression
• post-filtering
attempt to further reduce/mask artifacts
that were introduced by information loss
Performance Measures
Which measures are important depends on the application
Performance Measures
• quality
◦ fidelity – lossless, perceptually lossless, lossy
fidelity criteria
• MSE (mean squared error)
• SNR (signal-to-noise ratio)
• perceptual quality
◦ allow graceful (lossy) degradation
◦ allow browsing – rapid recovery of degraded version
◦ delay
◦ minimize error propagation
Information Theory – Information Content
Information Theory – Entropy
• Conditional Entropy
$H(Y|X) = \sum_{x \in X} p(x)\, H(Y|X=x) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(y|x)$
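The double-sum form of conditional entropy can be evaluated directly from a joint distribution. The following is a minimal sketch, assuming base-2 logarithms (bits) and a joint distribution given as a dictionary keyed by (x, y); the function name `conditional_entropy` is hypothetical.

```python
from math import log2

def conditional_entropy(joint):
    """H(Y|X) in bits from a joint distribution {(x, y): p(x, y)}.

    Assumption: probabilities sum to 1; zero-probability pairs are skipped.
    """
    # Marginal p(x) = sum over y of p(x, y)
    px = {}
    for (x, _y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    # H(Y|X) = -sum p(x, y) * log p(y|x), with p(y|x) = p(x, y) / p(x)
    return -sum(p * log2(p / px[x]) for (x, _y), p in joint.items() if p > 0)

# Y independent of X and uniform on two values: one full bit remains unknown.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Y completely determined by X: nothing remains unknown.
determined = {(0, 0): 0.5, (1, 1): 0.5}
print(conditional_entropy(independent), conditional_entropy(determined))
```

The two test distributions are the extreme cases: knowing X tells nothing about Y in the first and everything in the second.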
• Relative Entropy (Kullback-Leibler divergence) D(p‖q)
◦ always non-negative
◦ zero only if p = q, can be infinite
◦ not symmetric and does not satisfy triangle inequality
$I(X;Y) = \sum_{x,y} p(x|y)\, p(y) \log \frac{p(x|y)}{p(x)}$
• Types of codes
Types of Codes
classified by time variance of codewords
A code is a mapping
from source = string over alphabet S
to compressor output = stream of codewords over alphabet C
• Fixed code
codeword set is time invariant
selection is predetermined
• Static code
codeword set is time invariant
selection dictated by model
• Adaptive code
dynamic codeword set (varies with time)
selection/modification dictated by model
Types of Codes
classified by input-output rates
Types of Codes
classified by decodability
Example:  symbol  a  b
          code    0  01
• not complete, as 110... is undecodable
• unambiguous but not instantaneous
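A code is instantaneous exactly when no codeword is a prefix of another codeword. The check below is a minimal sketch of that test applied to the example code; the function name `is_prefix_free` is hypothetical.

```python
def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of a different codeword."""
    return not any(
        a != b and b.startswith(a)
        for a in codewords
        for b in codewords
    )

# The example code {a: 0, b: 01}: "0" is a prefix of "01", so after reading
# a 0 the decoder must look at the next bit before it can emit a symbol.
print(is_prefix_free(["0", "01"]))   # False
print(is_prefix_free(["0", "10"]))   # True
```

The example code is still unambiguous because a 1 can only ever be the second bit of b, but the decoder has to read one bit ahead to know that.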
Fixed Codes
Example: Morse
Fixed Finite Codes – Enumerative Coding
Enumerative Coding – Example
• $n = |S| = \binom{m}{j} = \frac{m!}{j!\,(m-j)!}$
Enumerative Coding – Example
Example: m = 6, j = 4 → n = 15
• compute index(110110)
◦ 1-bits in positions 5, 4, 2, 1
◦ $\text{index} = \binom{5}{4} + \binom{4}{3} + \binom{2}{2} + \binom{1}{1} = 5 + 4 + 1 + 1 = 11$
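The index computation in this example can be read as follows: number the bit positions from 0 at the right, list the positions of the 1-bits in decreasing order, and add one binomial coefficient per 1-bit. A minimal sketch under that reading (the function name `enum_index` is hypothetical):

```python
from math import comb

def enum_index(bits):
    """Index of an m-bit string among all m-bit strings with the same number of 1s.

    Positions are counted from 0 at the rightmost bit.
    """
    m = len(bits)
    # Positions of the 1-bits, highest position first
    positions = [m - 1 - i for i, b in enumerate(bits) if b == "1"]
    j = len(positions)
    # The k-th listed 1-bit (k = 0, 1, ...) contributes C(position, j - k)
    return sum(comb(p, j - k) for k, p in enumerate(positions))

print(enum_index("110110"))  # 11, matching the worked example
```

With m = 6 and j = 4 the indices run from 0 (for 001111) to 14 (for 111100), so each string can be sent as a number below n = 15.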
Fixed Finite Codes – Phasing-In
• i ≥ 2^B − n ⇒ encode i + 2^B − n (B bits)
Example: n = 5 ⇒ B = 3
i code
0 00
1 01
2 10
3 110
4 111
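The table above is consistent with the usual phasing-in rule: with B = ⌈lg n⌉, the first 2^B − n values get B − 1 bits and the remaining values get B bits. A minimal sketch assuming that rule and n ≥ 2 (the function name `phase_in` is hypothetical):

```python
def phase_in(i, n):
    """Phased-in binary codeword for i in range(n), assuming n >= 2."""
    if not 0 <= i < n or n < 2:
        raise ValueError("need n >= 2 and 0 <= i < n")
    B = (n - 1).bit_length()      # ceil(lg n) for n >= 2
    short = (1 << B) - n          # how many values get the short (B-1)-bit form
    if i < short:
        return format(i, f"0{B - 1}b")
    return format(i + short, f"0{B}b")

print([phase_in(i, 5) for i in range(5)])
# ['00', '01', '10', '110', '111']
```

When n is a power of two, `short` is 0 and every value simply gets its B-bit binary form.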
Fixed Finite Codes – Start-Step-Stop Codes
family based on choice of 3 parameters
• k codeword sets, k = (stop − start)/step + 1
• set n has codewords = 111...1 0 xxx...x
prefix: (n − 1) 1’s, one 0 (omit when n = k)
suffix: start + (n − 1) ∗ step bits
Example: start = 3, step = 2, stop = 9
n=1 0xxx {0000, 0001, 0010, 0011, 0100, ... 0111}
n=2 10xxxxx 32 codewords
n=3 110xxxxxxx 128 codewords
n=4 111xxxxxxxxx 512 codewords, 680 in all
• instantaneously decodable:
read 1’s until either a 0 is encountered
or (stop − start)/step 1’s have been read
let n = (number of 1’s read) + 1
read start + (n − 1) ∗ step more bits, build #
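The construction above can be sketched as an encoder. One detail is an assumption not stated on the slide: integers are assigned to codewords in increasing order, so set 1 holds the smallest values and each later set continues where the previous one ended. The function name `sss_encode` is hypothetical.

```python
def sss_encode(v, start, step, stop):
    """Start-step-stop codeword for a non-negative integer v.

    Assumption: values are assigned to sets in increasing order,
    set n covering the next 2**(start + (n-1)*step) integers.
    """
    k = (stop - start) // step + 1          # number of codeword sets
    base = 0                                # smallest value in the current set
    for n in range(1, k + 1):
        width = start + (n - 1) * step      # suffix length for set n
        if v < base + (1 << width):
            # (n-1) ones, then a zero that is omitted for the last set
            prefix = "1" * (n - 1) + ("" if n == k else "0")
            return prefix + format(v - base, f"0{width}b")
        base += 1 << width
    raise ValueError("value too large for these parameters")

print(sss_encode(0, 3, 2, 9))    # 0000
print(sss_encode(8, 3, 2, 9))    # 1000000
```

With start = 3, step = 2, stop = 9 the encoder accepts exactly the 680 values 0 through 679, matching the count in the example.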
• popular codes
◦ Elias
◦ Even-Rodeh
◦ Zeckendorf (Fibonacci)
◦ variable-byte
Elias Gamma Code
• instantaneously decodable:
read n 0-bits until encounter 1 bit (starts X)
read n more bits, computing binary value X
Example:
bit      0  0  0    1  0  0  1
count/X  1  2  3=n  8  8  8  9
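The decoding rule above implies the matching encoder: write the binary value of X preceded by one 0-bit for every bit after its leading 1. A minimal sketch, assuming X ≥ 1 and bit strings held as Python `str` (the function names are hypothetical):

```python
def gamma_encode(x):
    """Elias gamma codeword for an integer x >= 1."""
    if x < 1:
        raise ValueError("gamma code is defined for x >= 1")
    b = bin(x)[2:]                    # binary value, starts with a 1
    return "0" * (len(b) - 1) + b     # n zeros, then the n+1 bits of x

def gamma_decode(bits):
    """Decode one gamma codeword from the front of a bit string."""
    n = 0
    while bits[n] == "0":             # count 0-bits up to the first 1
        n += 1
    return int(bits[n:2 * n + 1], 2)  # the 1 plus n more bits give X

print(gamma_encode(9))            # 0001001
print(gamma_decode("0001001"))    # 9
```

The codeword for X therefore has 2⌊lg X⌋ + 1 bits.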
• Elias Delta: length of encoding of j ≈ lg j + 2 lg lg j
• Elias Omega, to encode j: write 0
while j ≥ 2
prepend binary value of j
j ← ⌊lg j⌋
• compare
value 1 2 4 8 16 32 64 128 256
Elias ω 1 3 6 7 11 12 13 14 21
Even-Rodeh 3 3 4 8 9 10 11 17 18
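The encoding loop above (write 0, then repeatedly prepend the binary value of j and replace j by ⌊lg j⌋) is the procedure usually called the Elias omega code. A minimal sketch of that loop and a matching decoder, assuming j ≥ 1 and bit strings held as Python `str` (the function names are hypothetical):

```python
def omega_encode(j):
    """Codeword built by the prepend loop, for an integer j >= 1."""
    if j < 1:
        raise ValueError("defined for j >= 1")
    code = "0"                    # terminating 0
    while j >= 2:
        b = bin(j)[2:]
        code = b + code           # prepend binary value of j
        j = len(b) - 1            # floor(lg j)
    return code

def omega_decode(code):
    """Decode one codeword from the front of a bit string."""
    n, pos = 1, 0
    while code[pos] == "1":       # a group starts with 1; a 0 ends the codeword
        # The next group has n + 1 bits and its value becomes the new n
        n, pos = int(code[pos:pos + n + 1], 2), pos + n + 1
    return n

print(omega_encode(16))           # 10100100000
print(omega_decode("10100100000"))  # 16
```

Each group tells the decoder how long the next group is, which is why a single leading 0 is enough to mark the end.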
Zeckendorf (Fibonacci) Code
• to encode j:
◦ express j as sum of Fibonacci #’s
◦ represent as bit vector (1,2,3,5,8,13,...)
◦ forbidden to have two adjacent 1’s
◦ append a 1-bit
• instantaneously decodable:
◦ 11 delimits end of codeword
◦ while parsing, build value based on Fib #’s
Example:  0  1  0  1  0  0   1   1      (2 + 5 + 21 = 28)
          1  2  3  5  8  13  21  end
• Advantages
◦ for most pdf’s, Fib outperforms Elias
◦ robust: 1-bit errors disturb ≤ 3 chars
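The encoding and decoding steps above can be sketched as follows. The greedy choice of the largest Fibonacci number that still fits is an assumption about how the sum is formed; it is the standard way to guarantee that no two adjacent 1s appear. The function names are hypothetical.

```python
def fib_encode(j):
    """Fibonacci (Zeckendorf) codeword for an integer j >= 1."""
    if j < 1:
        raise ValueError("defined for j >= 1")
    fibs = [1, 2]
    while fibs[-1] <= j:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()                          # drop the first Fibonacci number > j
    bits = []
    for f in reversed(fibs):            # greedy: largest Fibonacci number first
        if f <= j:
            bits.append("1")
            j -= f
        else:
            bits.append("0")
    # Bit vector is written lowest Fibonacci number first, then the extra 1
    return "".join(reversed(bits)) + "1"

def fib_decode(code):
    """Decode one codeword (including its terminating 1)."""
    a, b = 1, 2
    total = 0
    for bit in code[:-1]:               # skip the appended 1-bit
        if bit == "1":
            total += a
        a, b = b, a + b
    return total

print(fib_encode(28))            # 01010011
print(fib_decode("01010011"))    # 28
```

Because the bit vector always ends in a 1 and never contains two adjacent 1s, the appended 1 produces the only `11` in the codeword.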
Compare Elias & Fibonacci Code Lengths
Integer   Gamma   Delta   Fibonacci
1           1       1       2
2           3       4       3
3           3       4       4
4           5       5       4
5-7         5       5       5
8-15        7       8       6-7
16-31       9       9       7-8
32-63      11      10       8-10
64-88      13      11      10
100        13      11      11
1000       19      16      16
10^4       27      20      20
10^5       33      25      25
10^6       39      28      30
Golomb Codes
• to encode j
◦ transmit unary value of ⌊j/m⌋
uses ⌊j/m⌋ 1-bits and one 0-bit
◦ transmit phased-in binary code for j mod m
uses ⌊lg m⌋ or ⌈lg m⌉ bits
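The two steps above can be sketched as one function. The phased-in part assumes the usual rule (the first 2^B − m remainders get B − 1 bits, the rest get B bits, with B = ⌈lg m⌉); the function name `golomb_encode` is hypothetical.

```python
def golomb_encode(j, m):
    """Golomb codeword for an integer j >= 0 with parameter m >= 1."""
    if j < 0 or m < 1:
        raise ValueError("need j >= 0 and m >= 1")
    q, r = divmod(j, m)
    unary = "1" * q + "0"             # floor(j/m) 1-bits and one 0-bit
    if m == 1:
        return unary                  # remainder is always 0, nothing to send
    B = (m - 1).bit_length()          # ceil(lg m) for m >= 2
    short = (1 << B) - m              # remainders that get B-1 bits
    if r < short:
        return unary + format(r, f"0{B - 1}b")
    return unary + format(r + short, f"0{B}b")

print([golomb_encode(j, 3) for j in range(4)])
# ['00', '010', '011', '100']
```

When m is a power of two, `short` is 0 and every remainder takes exactly lg m bits.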
Golomb Codes
Integer m = 3 m = 4 m = 5 m = 6
Rice Codes
Variable-Byte Code
Integer Bits
0 - 127 8
128 - 16383 16
16384 - 2097151 24
• integral byte sizes for easy coding
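The sizes in the table correspond to 7 payload bits per byte. The slide does not give the byte layout, so the sketch below assumes one common convention: most significant 7-bit group first, with the high bit set only on the final byte. The function names are hypothetical.

```python
def vbyte_encode(j):
    """Variable-byte encoding of an integer j >= 0.

    Assumed layout: 7 payload bits per byte, most significant group first,
    high bit (0x80) set on the last byte only.
    """
    if j < 0:
        raise ValueError("defined for j >= 0")
    groups = []
    while True:
        groups.append(j & 0x7F)       # take the low 7 bits
        j >>= 7
        if j == 0:
            break
    groups.reverse()
    groups[-1] |= 0x80                # mark the final byte
    return bytes(groups)

def vbyte_decode(data):
    """Decode one integer from the front of a byte string."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:               # final byte reached
            break
    return value

print([8 * len(vbyte_encode(v)) for v in (0, 127, 128, 16383, 16384, 2097151)])
# [8, 8, 16, 16, 24, 24]
```

Decoding needs only shifts and masks on whole bytes, which is the "easy coding" advantage noted above.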
Comparing Fixed Codes
[Figure: length of encoding (axis ticks 0+ to 40+) versus integer to be encoded (axis ticks 1, 2, 5, 10, 20, 50, 100, 1000), comparing Rice 5, Elias Gamma, var-byte, Fibonacci, and sss 4-3-10]