Ilya Pollak
Example: compression of grayscale images

An eight-bit grayscale image is a rectangular array of integers between 0 (black) and 255 (white).
Each site in the array is called a pixel.
It takes one byte (eight bits) to store one pixel value, since it can be any number between 0 and 255.
It would take 25 bytes to store a 5x5 image.
Can we do better?
Example: compression of grayscale images

255 255 255 255 255
255 255 255 255 255
200 200 200 200 200
200 200 200 200 200
200 200 200 200 100
Idea #2:
when encoding the data, spend fewer bits on frequently
occurring numbers and more bits on rare numbers.
Entropy coding

Suppose we are encoding realizations of a discrete random variable X such that:

value of X:               0      255    −55    −100
probability:              22/25  1/25   1/25   1/25
fixed-length codeword:    00     01     10     11
variable-length codeword: 1      01     000    001

For a file with 25 numbers, E[file size] = 25(22/25 + 2/25 + 3/25 + 3/25) = 30 bits, instead of 50 bits with the fixed-length code!
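The expected file size above can be checked numerically. This is a minimal sketch, assuming the four values and the variable-length code from the table (the value names are only illustrative):

```python
# Expected file size for 25 encoded numbers using the variable-length code.
probs = {0: 22/25, 255: 1/25, -55: 1/25, -100: 1/25}
code = {0: "1", 255: "01", -55: "000", -100: "001"}

# E[file size] = 25 * sum over values of P(value) * codeword length.
expected_bits = 25 * sum(p * len(code[v]) for v, p in probs.items())
print(expected_bits)  # 30.0 bits, versus 25 * 2 = 50 bits for the fixed-length code
```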
Entropy coding

A similar encoding scheme can be devised for a random variable of pixel differences, which takes values between −255 and 255, to result in a smaller average file size than two bytes per pixel.
Another commonly used idea: run-length coding. I.e., instead of encoding each 0 individually, encode the length of each string of zeros.
[Figure: the code table (values 0, 255, −55, −100 with probabilities 22/25, 1/25, 1/25, 1/25 and codewords 1, 01, 000, 001), shown alongside binary code-tree diagrams; the trees are not recoverable from the text.]
[Figure: example histograms of the values fed into the entropy coder, with probabilities such as 0.3, 0.2, 0.2, 0.1, 0.1.]

Conclusion: the transform procedure should be such that the numbers fed into the entropy coder have a highly concentrated histogram (a few very likely values, most values unlikely). Also, if we are encoding each number individually, they should be independent or approximately independent.
Quantization

[Figure: a block of pixel values between 252 and 255 (253, 255, 254, 252, …) and the same block after quantization, with every value mapped to 253.5.]
Converting continuous-valued to discrete-valued signals

Many real-world signals are continuous-valued:
audio signal a(t): both the time argument t and the intensity value a(t) are continuous;
image u(x,y): both the spatial location (x,y) and the image intensity value u(x,y) are continuous;
video v(x,y,t): x, y, t, and v(x,y,t) are all continuous.
Quantization

Digitizing a continuous-valued signal into a discrete and finite set of values.
Converting a discrete-valued signal into another discrete-valued signal, with fewer possible discrete values.
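As a concrete illustration (a hypothetical sketch, not from the slides), a uniform quantizer mapping 8-bit values down to a small set of levels can be written as:

```python
def uniform_quantize(x, levels=4, lo=0, hi=255):
    """Map x in [lo, hi] to the midpoint of one of `levels` equal-width cells."""
    width = (hi - lo) / levels
    k = min(int((x - lo) / width), levels - 1)  # cell index, clamped at the top edge
    return lo + (k + 0.5) * width               # reconstruction value: cell midpoint

print(uniform_quantize(200))  # 223.125: any value in [191.25, 255] maps to this level
```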
[Figure: scatterplots of value pairs (r, s) on [0, 255]², with thresholds at 127 and 95; each value is quantized separately by simple thresholding.]

If (r,s) are jointly uniform over the green square (or, more generally, independent), knowing r does not tell us anything about s. Best thing to do: make quantization decisions independently.
If (r,s) are jointly uniform over the yellow region, knowing r tells us a lot about s.
[Block diagram: transform → quantization → entropy coding → compressed bitstream]
Quantizer

E[ Σ_{n=1}^{N} (X(n) − Y(n))² ] = E[ Σ_{n=1}^{N} D(n)² ]

If D(1), …, D(N) are identically distributed, this is the same as N·E[D(n)²], for any n.
Uniform vs. non-uniform quantization

Uniform quantization is not a good strategy for distributions which significantly differ from uniform.
If the distribution is non-uniform, it is better to spend more quantization levels on more probable parts of the distribution and fewer quantization levels on less probable parts.
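A quick numeric illustration of the problem (a hypothetical experiment, not from the slides; the exponential distribution stands in for any heavily skewed one): with equal-width cells, most quantization levels go almost unused.

```python
import random

random.seed(0)
samples = [random.expovariate(1.0) for _ in range(20000)]  # skewed, far from uniform
lo, hi, L = min(samples), max(samples), 8

# Count how many samples land in each of L equal-width quantization cells.
width = (hi - lo) / L
counts = [0] * L
for x in samples:
    counts[min(int((x - lo) / width), L - 1)] += 1

print(counts)  # the first cell captures the bulk of the samples;
               # the upper cells, and their quantization levels, are nearly wasted
```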
Y = Y(X) =
  q_1      if X < x_2
  q_2      if x_2 ≤ X < x_3
  …
  q_{L−1}  if x_{L−1} ≤ X < x_L
  q_L      if X ≥ x_L
E[(Y − X)²] = ∫ (y(x) − x)² f_X(x) dx
            = Σ_{k=1}^{L} ∫_{x_k}^{x_{k+1}} (y(x) − x)² f_X(x) dx
            = Σ_{k=1}^{L} ∫_{x_k}^{x_{k+1}} (q_k − x)² f_X(x) dx

Minimize w.r.t. q_k:

∂/∂q_k E[(Y − X)²] = ∫_{x_k}^{x_{k+1}} 2(q_k − x) f_X(x) dx = 0

q_k ∫_{x_k}^{x_{k+1}} f_X(x) dx = ∫_{x_k}^{x_{k+1}} x f_X(x) dx, therefore

q_k = ∫_{x_k}^{x_{k+1}} x f_X(x) dx / ∫_{x_k}^{x_{k+1}} f_X(x) dx

This is a minimum, since ∂²/∂q_k² E[(Y − X)²] = ∫_{x_k}^{x_{k+1}} 2 f_X(x) dx > 0.
Minimize w.r.t. x_k:

∂/∂x_k E[(Y − X)²] = ∂/∂x_k [ ∫_{x_{k−1}}^{x_k} (q_{k−1} − x)² f_X(x) dx + ∫_{x_k}^{x_{k+1}} (q_k − x)² f_X(x) dx ]
                   = (q_{k−1} − x_k)² f_X(x_k) − (q_k − x_k)² f_X(x_k)

Setting this to zero gives (q_{k−1} − x_k)² = (q_k − x_k)², i.e., x_k is the midpoint of q_{k−1} and q_k.
q_k = ∫_{x_k}^{x_{k+1}} x f_X(x) dx / ∫_{x_k}^{x_{k+1}} f_X(x) dx = E[X | X ∈ k-th quantization interval], for k = 1, …, L

x_k = (q_{k−1} + q_k)/2, for k = 2, …, L
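The two conditions above suggest alternating between them until they are jointly satisfied (the Lloyd-Max procedure). A sketch on an empirical sample, where the conditional-mean condition becomes a cell average and the boundary condition a midpoint (the Gaussian data and the iteration count are illustrative assumptions):

```python
import random

random.seed(1)
data = sorted(random.gauss(0, 1) for _ in range(10000))
L = 4
# Initialize representation levels q_k from rough quantiles of the data.
q = [data[(2*k + 1) * len(data) // (2*L)] for k in range(L)]

for _ in range(50):
    # Boundary condition: x_k = (q_{k-1} + q_k) / 2
    x = [(q[k-1] + q[k]) / 2 for k in range(1, L)]
    # Conditional-mean condition: q_k = average of the samples in the k-th cell
    cells = [[] for _ in range(L)]
    for v in data:
        cells[sum(b <= v for b in x)].append(v)
    q = [sum(c) / len(c) if c else q[k] for k, c in enumerate(cells)]

print([round(v, 2) for v in q])  # close to the known optimal 4-level Gaussian
                                 # quantizer levels, about ±0.45 and ±1.51
```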
E[‖Y − X‖²] = E[ Σ_{n=1}^{N} (Y(n) − X(n))² ]

Difficulty: cannot differentiate with respect to a set A_k, and so unless the set of all allowed partitions is somehow restricted, this cannot be solved.
Problem statement

Source (e.g., image, video, speech signal, or quantizer output) produces a sequence of discrete random variables X(1), …, X(N) (e.g., transformed image pixel values), assumed to be independent and identically distributed over a finite alphabet {a1, …, aM}.

Encoder: a mapping between source symbols and binary strings (codewords), producing a binary string; the code assigns each symbol a1, …, aM its codeword w1, …, wM.

Requirements:
minimize the expected length of the binary string;
the binary string needs to be uniquely decodable, i.e., we need to be able to infer X(1), …, X(N) from it!
Unique decodability

Example of a code that is not uniquely decodable: wa=0, wb=1, wc=00, wd=01. (The string 01, for instance, could be decoded as "ab" or as "d".)

A prefix condition code: wa=1, wb=01, wc=000, wd=001. No codeword is a prefix of another, so the codewords are leaves of a binary tree.

Decoding example: the string 000001101 is decoded by walking down the tree until a leaf is reached, emitting that symbol, and restarting from the root: 000 → c, 001 → d, 1 → a, 01 → b.
Final output: cdab.
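The tree-walking decoder illustrated above can be sketched directly: since no codeword is a prefix of another, the decoder can emit a symbol as soon as the bits read so far match a codeword (the code table is the one from the slides):

```python
code = {"a": "1", "b": "01", "c": "000", "d": "001"}  # prefix condition code
decode_map = {w: s for s, w in code.items()}

def decode(bits):
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in decode_map:        # a full codeword has been matched: emit it
            out.append(decode_map[cur])
            cur = ""
    assert cur == "", "trailing bits did not form a codeword"
    return "".join(out)

print(decode("000001101"))  # cdab
```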
Entropy coding

Given a discrete random variable X with M possible outcomes (symbols or letters) a1, …, aM and with PMF pX, what is the lowest achievable expected codeword length among all the uniquely decodable codes?
Answer depends on pX; Shannon's source coding theorem provides bounds.
Huffman code

Consider a discrete r.v. X with M possible outcomes a1, …, aM and with PMF pX. Assume that pX(a1) ≤ pX(a2) ≤ … ≤ pX(aM). (If this condition is not satisfied, reorder the outcomes so that it is satisfied.)

Consider the aggregate outcome a12 = {a1, a2} and a discrete r.v. X' such that

X' = a12  if X = a1 or X = a2
X' = X    otherwise

pX'(a) = pX(a1) + pX(a2)  if a = a12
pX'(a) = pX(a)            if a = a3, …, aM

Suppose we have a tree, T', for an optimal prefix condition code for X'. A tree T for an optimal prefix condition code for X can be obtained from T' by splitting the leaf a12 into two leaves corresponding to a1 and a2.
We won't prove this.
Example

letter  pX(letter)
a1      0.10
a2      0.10
a3      0.25
a4      0.25
a5      0.30

Step 1: combine the two least likely letters, a1 and a2, into a12 (probability 0.20). New alphabet: a12 (0.20), a3 (0.25), a4 (0.25), a5 (0.30).

Step 2: combine the two least likely letters from the new alphabet, a12 and a3, into a123 (probability 0.45). New alphabet: a123 (0.45), a4 (0.25), a5 (0.30).

Step 3: combine a4 and a5 into a45 (probability 0.55). New alphabet: a123 (0.45), a45 (0.55).

Step 4: combine a123 and a45 into the root a12345. Done!

Labeling each pair of branches 0/1 and reading the labels from the root down to each leaf gives the codewords:

letter  pX(letter)  codeword
a1      0.10        111
a2      0.10        110
a3      0.25        10
a4      0.25        01
a5      0.30        00

Expected codeword length: 3(0.1) + 3(0.1) + 2(0.25) + 2(0.25) + 2(0.3) = 2.2 bits
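The merging procedure can be written with a priority queue. This sketch reproduces the codeword lengths of the example above (the exact 0/1 labels depend on tie-breaking, so only lengths are computed):

```python
import heapq

def huffman_lengths(pmf):
    """Return codeword lengths of a Huffman code for a symbol -> probability PMF."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in pmf}
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # two least likely entries
        p2, i, s2 = heapq.heappop(heap)
        for s in s1 + s2:                 # each merge adds one bit to every member
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, i, s1 + s2))
    return lengths

pmf = {"a1": 0.10, "a2": 0.10, "a3": 0.25, "a4": 0.25, "a5": 0.30}
lengths = huffman_lengths(pmf)
expected_len = sum(pmf[s] * lengths[s] for s in pmf)
print(lengths, expected_len)  # lengths 3, 3, 2, 2, 2 -> 2.2 bits, as on the slide
```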
Self-information

Consider again a discrete random variable X with M possible outcomes a1, …, aM and with PMF pX.
Self-information of outcome am is I(am) = −log2 pX(am) bits.
E.g., if pX(am) = 1 then I(am) = 0. The occurrence of am is not at all informative, since it had to occur. The smaller the probability of an outcome, the larger its self-information.
Self-information of X is I(X) = −log2 pX(X) and is a random variable.
Entropy of X is the expected value of its self-information:

H(X) = E[I(X)] = −Σ_{m=1}^{M} pX(am) log2 pX(am)
Example

Suppose that X has M = 2^K possible outcomes a1, …, aM.
Suppose that X is uniform, i.e., pX(a1) = … = pX(aM) = 2^{−K}. Then

H(X) = −Σ_{m=1}^{2^K} 2^{−K} log2(2^{−K}) = 2^K · 2^{−K} · K = K bits.
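Entropy is easy to compute directly from a PMF. For the PMF used in the Huffman example, H(X) comes out just below the 2.2-bit expected Huffman codeword length, consistent with the bounds discussed next:

```python
from math import log2

def entropy(probs):
    """H(X) = -sum p log2 p, the expected self-information in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A uniform PMF over M = 2^K outcomes has entropy exactly K bits.
print(entropy([1/8] * 8))                         # 3.0

# PMF from the Huffman example: H(X) is slightly below 2.2 bits.
print(entropy([0.10, 0.10, 0.25, 0.25, 0.30]))
```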
Kraft inequality: if the positive integers d1, …, dM satisfy

Σ_{m=1}^{M} 2^{−dm} ≤ 1,    (1)

then there exists a prefix condition code whose codeword lengths are these integers. Conversely, the codeword lengths of any prefix condition code satisfy this inequality.

[Figure: a full binary tree; the depth of the highlighted red node is 2.]
Suppose d1 ≤ … ≤ dM satisfy (1). Consider the full binary tree of depth dM, and consider all its nodes at depth d1. Assign one of these nodes to symbol a1. Consider all the nodes at depth d2 which are not a1 and not descendants of a1. Assign one of them to symbol a2. Iterate like this M times.

If we have run out of tree nodes to assign after r < M iterations, it means that every leaf in the full binary tree of depth dM is a descendant of one of the first r symbols, a1, …, ar. But note that every node at depth dm has 2^{dM − dm} leaf descendants. Note also that the full tree has 2^{dM} leaves. Therefore, if every leaf in the tree is a descendant of a1, …, ar, then

Σ_{m=1}^{r} 2^{dM − dm} = 2^{dM}  ⇒  Σ_{m=1}^{r} 2^{−dm} = 1.

Therefore,

Σ_{m=1}^{M} 2^{−dm} = Σ_{m=1}^{r} 2^{−dm} + Σ_{m=r+1}^{M} 2^{−dm} = 1 + Σ_{m=r+1}^{M} 2^{−dm} > 1,

which contradicts (1). Thus, our procedure can in fact go on for M iterations. After the M-th iteration, we will have constructed a prefix condition code with codeword lengths d1, …, dM.
Suppose d1 ≤ … ≤ dM, and suppose we have a prefix condition code with these codeword lengths. Consider the binary tree corresponding to this code. Complete this tree to obtain a full tree of depth dM. Again use the following facts:
the full tree has 2^{dM} leaves;
the number of leaf descendants of the codeword of length dm is 2^{dM − dm}.
The combined number of all leaf descendants of all codewords must be less than or equal to the total number of leaves in the full tree:

Σ_{m=1}^{M} 2^{dM − dm} ≤ 2^{dM}  ⇒  Σ_{m=1}^{M} 2^{−dm} ≤ 1.
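The constructive direction of the proof can be followed literally: sort the lengths, and at each depth take the first node that is not a descendant of an already-assigned codeword. A sketch of that construction (this "canonical" codeword assignment is an illustrative choice, not the only valid one):

```python
def kraft_code(lengths):
    """Build prefix codewords for the sorted lengths, if Kraft's inequality holds."""
    if sum(2**-d for d in lengths) > 1:
        raise ValueError("Kraft inequality violated: no prefix code exists")
    codewords, next_val, prev_d = [], 0, 0
    for d in sorted(lengths):
        next_val <<= (d - prev_d)                 # descend to depth d in the full tree
        codewords.append(format(next_val, f"0{d}b"))
        next_val += 1                             # skip this node and its descendants
        prev_d = d
    return codewords

print(kraft_code([1, 2, 3, 3]))  # ['0', '10', '110', '111'] -- a prefix code
```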
H(X) − E[C] = −Σ_{m=1}^{M} pX(am) log2 pX(am) − Σ_{m=1}^{M} pX(am) dm
            = Σ_{m=1}^{M} pX(am) [ log2 (1 / pX(am)) − log2 2^{dm} ]
            = Σ_{m=1}^{M} pX(am) log2 ( 1 / (pX(am) 2^{dm}) )
            ≤ Σ_{m=1}^{M} pX(am) ( 1 / (pX(am) 2^{dm}) − 1 ) log2 e    (by Lemma 1: log2 x ≤ (x − 1) log2 e)
            = ( Σ_{m=1}^{M} 2^{−dm} − Σ_{m=1}^{M} pX(am) ) log2 e
            = ( Σ_{m=1}^{M} 2^{−dm} − 1 ) log2 e ≤ 0

By the Kraft inequality, this holds for any prefix condition code. But it is also true for any uniquely decodable code. Hence E[C] ≥ H(X).
Now choose dm = ⌈−log2 pX(am)⌉. Then dm ≥ −log2 pX(am), so

2^{−dm} ≤ pX(am)  ⇒  Σ_{m=1}^{M} 2^{−dm} ≤ Σ_{m=1}^{M} pX(am) = 1.

Therefore, the Kraft inequality is satisfied, and we can construct a prefix condition code with codeword lengths d1, …, dM. Also, by construction,

dm − 1 < −log2 pX(am), i.e., dm < −log2 pX(am) + 1
⇒ pX(am) dm < −pX(am) log2 pX(am) + pX(am)
⇒ Σ_{m=1}^{M} pX(am) dm < −Σ_{m=1}^{M} pX(am) log2 pX(am) + Σ_{m=1}^{M} pX(am)
⇒ E[C] < H(X) + 1.
Example

(X1, X2) will have four outcomes, (a1,a1), (a1,a2), (a2,a1), (a2,a2), with probabilities 1 − 2^{−d+1} + 2^{−2d}, 2^{−d} − 2^{−2d}, 2^{−d} − 2^{−2d}, and 2^{−2d}, respectively.
Huffman code: 0 for (a1,a1); 10 for (a1,a2); 110 for (a2,a1); 111 for (a2,a2).
Expected codeword length per random variable:
[1·(1 − 2^{−d+1} + 2^{−2d}) + 2·(2^{−d} − 2^{−2d}) + 3·(2^{−d} − 2^{−2d}) + 3·2^{−2d}] / 2
This is ≈ 0.500001 for d = 20.
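Plugging in d = 20 confirms the slide's number. This sketch assumes, as the example does, a binary source with pX(a2) = 2^{−d} and pX(a1) = 1 − 2^{−d}:

```python
d = 20
p2 = 2.0 ** -d                  # p_X(a2); p_X(a1) = 1 - p2
# Pair probabilities for (a1,a1), (a1,a2), (a2,a1), (a2,a2).
probs = [(1 - p2) ** 2, p2 * (1 - p2), (1 - p2) * p2, p2 * p2]
lengths = [1, 2, 3, 3]          # Huffman codeword lengths for the four pairs
per_symbol = sum(p * l for p, l in zip(probs, lengths)) / 2
print(per_symbol)  # about 0.500001 bits per symbol
```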
Applying the source coding theorem to the block X = (X(1), …, X(N)) of i.i.d. symbols, whose entropy is H(X) = Σ_{n=1}^{N} H(Xn) = N·H(Xn), gives

N·H(Xn) ≤ E[CN] < N·H(Xn) + 1,

where E[CN] is the expected codeword length for the optimal uniquely decodable code for X. Dividing by N,

H(Xn) ≤ E[C] < H(Xn) + 1/N,

where E[C] = E[CN]/N is the corresponding expected codeword length per symbol.
Arithmetic coding
Another form of entropy coding.
More amenable to coding long sequences of symbols than
Huffman coding.
Can be used in conjunction with on-line learning of conditional
probabilities to encode dependent sequences of symbols:
Q-coder in JPEG (JPEG also has a Huffman coding option)
QM-coder in JBIG
MQ-coder in JPEG-2000
CABAC coder in H.264/MPEG-4 AVC
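A toy float-precision arithmetic coder illustrates the interval-narrowing idea (production coders such as the MQ-coder use integer arithmetic with renormalization; this sketch, with an assumed three-symbol PMF, only works for short sequences before floating-point precision runs out):

```python
def arith_encode(symbols, pmf):
    """Narrow [lo, hi) by each symbol's probability slice; return a number inside."""
    cum, c = {}, 0.0
    for s, p in pmf.items():
        cum[s] = (c, c + p)
        c += p
    lo, hi = 0.0, 1.0
    for s in symbols:
        a, b = cum[s]
        lo, hi = lo + (hi - lo) * a, lo + (hi - lo) * b
    return (lo + hi) / 2            # any number in [lo, hi) identifies the sequence

def arith_decode(x, n, pmf):
    out, lo, hi = [], 0.0, 1.0
    for _ in range(n):
        c = 0.0
        for s, p in pmf.items():    # find the slice of [lo, hi) containing x
            a, b = c, c + p
            if lo + (hi - lo) * a <= x < lo + (hi - lo) * b:
                out.append(s)
                lo, hi = lo + (hi - lo) * a, lo + (hi - lo) * b
                break
            c += p
    return "".join(out)

pmf = {"a": 0.6, "b": 0.3, "c": 0.1}
msg = "aabac"
x = arith_encode(msg, pmf)
print(arith_decode(x, len(msg), pmf))  # aabac
```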