Amber Habib
Mathematical Sciences Foundation
Delhi
www.mathscifound.org
Abstract
A digital image can be viewed as an array of numbers, each number
representing the colour value of the corresponding pixel. In the JPEG
format, these numbers are stored indirectly, via their discrete cosine
transform. This enables easy compression, resizing, etc. For further
savings, the array produced by the discrete cosine transform is stored
using Huffman encoding.
The calculations and plotting were carried out using Mathematica.
These notes were prepared for MSF’s Programme in Mathematical
Simulation and IT. They provided the base for student projects in
image manipulation using Matlab. The projects used Fourier analysis
as well as wavelets.
Contents
1 Discrete Fourier Transform 2
4 Huffman Encoding 10
1 Discrete Fourier Transform
[Figure: plot of the six data points.]
To represent this data in a way that can be easily manipulated for different
purposes, we wish to construct a function that passes through all the data
points.
Note: We have not said anything about how to find the coefficients A_k and B_k. Our immediate interest is in observing that this knowledge is useful; later we will see how to obtain it.
The discrete Fourier transform f(x) passes exactly through the data points:
[Figure: the discrete Fourier transform passing through all the data points.]

[Figure: an approximation using fewer terms.]
This function doesn’t represent the data exactly but it does roughly follow
the general trend.
[Figure: an approximation using still fewer terms.]
The loss in quality is much greater. This shows that the “higher order”
terms contribute less than the “lower order” terms. Therefore, we need not
store them to the same order of accuracy.
Suppose, then, that we round off the last couple of coefficients of A and B:
[Figure: the fit after rounding off the last coefficients.]
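The effect of storing the coefficients to lower accuracy can be sketched numerically. The following Python sketch uses made-up data (the original six values are not reproduced here) and NumPy's FFT to build the interpolating trigonometric polynomial, then rounds its coefficients:

```python
import numpy as np

# Hypothetical six data values at x_j = 2*pi*j/6 (not the original data).
data = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
n = len(data)
coeffs = np.fft.rfft(data) / n          # Fourier coefficients of the fit

def f(x, c):
    # Real trigonometric polynomial with (half-)spectrum c.
    w = np.ones(len(c))                 # weights accounting for negative frequencies
    w[1:] = 2
    if n % 2 == 0:
        w[-1] = 1                       # the Nyquist term is not doubled
    k = np.arange(len(c))
    return (w * c * np.exp(1j * np.outer(x, k))).real.sum(axis=1)

x = 2 * np.pi * np.arange(n) / n
print(np.max(np.abs(f(x, coeffs) - data)))   # ~0: exact interpolation
rounded = np.round(coeffs, 1)                # store only one decimal place
print(np.max(np.abs(f(x, rounded) - data)))  # small: modest loss of quality
```

The second printed error is small compared with the data values, illustrating why the coefficients need not be stored to full accuracy.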
with data which comes as arrays instead of lists. Thus, consider a string of data, such as the following, which is even (symmetric about 0):

[Figure: plot of the data, with values between 120 and 220, over −3 ≤ x ≤ 3.]
If we calculate the discrete Fourier transform for such data, we find that the
sine terms vanish (because sine is odd) and only the cosine terms remain
(because cosine is even, like the data). This special form is called the discrete
cosine transform of the data.
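This vanishing of the sine terms is easy to check numerically. A small sketch, using made-up symmetric values rather than the data above:

```python
import numpy as np

# Even data: d[j] equals d[n-j] (indices taken mod n), like the data above.
data = np.array([220., 200., 180., 160., 140., 160., 180., 200.])
assert np.allclose(data, np.roll(data[::-1], 1))   # confirm the evenness

F = np.fft.fft(data)
# For real even data the transform is purely real: every sine term vanishes.
print(np.max(np.abs(F.imag)))   # ~0
```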
For data such as we have given (8 points), the discrete cosine transform is

\[ f(x) = \frac{A_0}{2} + \sum_{k=1}^{7} A_k \cos(kx), \]
For the example we have given, this formula produces the following values for the Fourier coefficients:

A_0 = 139.5, A_1 = −10.04, A_2 = 24.25, A_3 = −35.36,
A_4 = 20.51, A_5 = −28.66, A_6 = 6.79, A_7 = −4.22.
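The one dimensional transform above can be sketched in Python. The eight data values below and the sample points x_j = (2j+1)π/16 are assumptions for illustration (the original values are not listed in the text); the coefficient formula A_k = (1/4) Σ_j d_j cos(k x_j) follows from the orthogonality of the cosines at these points:

```python
import numpy as np

N = 8
d = np.array([120., 140., 160., 180., 200., 220., 210., 190.])  # hypothetical
x = (2 * np.arange(N) + 1) * np.pi / (2 * N)    # assumed sample points

# Orthogonality of the cosines gives A_k = (1/4) * sum_j d_j cos(k x_j).
A = np.array([d @ np.cos(k * x) for k in range(N)]) / 4

def f(t):
    return A[0] / 2 + sum(A[k] * np.cos(k * t) for k in range(1, N))

print(np.max(np.abs(f(x) - d)))   # ~0: f passes through all eight points
```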
The corresponding cosine transform f(x) passes exactly through the data points:

[Figure: plot of the cosine transform fit passing through the 8 data points, with values between 100 and 250 over 0 < x < 3.]

3 The Two Dimensional Discrete Cosine Transform

A Shaded Box

[Table: an 8 × 8 array of pixel values representing a shaded box, with rows and columns indexed 0, 1, 2, . . . , 7.]

Thus the (0, 0) entry is 123, the (3, 7) entry is 133, etc.
To this data, we apply the two dimensional discrete cosine transform, defined by:

\[ \mathrm{DCT}(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} \mathrm{Data}(x,y)\, \cos\!\left(\frac{(2x+1)u\pi}{16}\right) \cos\!\left(\frac{(2y+1)v\pi}{16}\right). \]

Here Data(x, y) refers to the (x, y) entry in the data table given above. The coefficients C(u) and C(v) are defined by

\[ C(h) = \frac{1}{\sqrt{2}} \text{ if } h = 0, \qquad C(h) = 1 \text{ if } h \neq 0. \]
The discrete cosine transform produces the following table, after rounding:
The first thing is to establish that we can recover the data from its discrete cosine transform. For this purpose we define the inverse discrete cosine transform by

\[ \mathrm{IDCT}(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, \mathrm{DCT}(u,v)\, \cos\!\left(\frac{(2x+1)u\pi}{16}\right) \cos\!\left(\frac{(2y+1)v\pi}{16}\right). \]
If we apply the IDCT to the DCT table, we get (after rounding):
Can you spot any difference between this and the original data?
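The round trip can be checked directly from the two formulas. A sketch in Python, applied to random data since the original table is not reproduced here (the loops follow the definitions literally; they are slow but clear):

```python
import numpy as np

C = lambda h: 1 / np.sqrt(2) if h == 0 else 1.0

def dct2(data):
    # Direct implementation of the 8x8 DCT formula above.
    out = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            s = sum(data[x, y]
                    * np.cos((2 * x + 1) * u * np.pi / 16)
                    * np.cos((2 * y + 1) * v * np.pi / 16)
                    for x in range(8) for y in range(8))
            out[u, v] = C(u) * C(v) * s / 4
    return out

def idct2(dct):
    # Direct implementation of the IDCT formula above.
    out = np.zeros((8, 8))
    for x in range(8):
        for y in range(8):
            s = sum(C(u) * C(v) * dct[u, v]
                    * np.cos((2 * x + 1) * u * np.pi / 16)
                    * np.cos((2 * y + 1) * v * np.pi / 16)
                    for u in range(8) for v in range(8))
            out[x, y] = s / 4
    return out

rng = np.random.default_rng(0)
data = rng.integers(100, 200, size=(8, 8)).astype(float)
print(np.max(np.abs(idct2(dct2(data)) - data)))   # ~0: data recovered exactly
```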
Suppose we store the data via its DCT. We ask if we can afford to lose some
of the details of the DCT without significantly affecting the quality of the
data. One way to reduce the amount of space required by the DCT is to
divide every entry by, say, 8 (thus saving 3 bits per entry since the numbers
are stored in binary).
144 5 -5 -1 3 -10 1 5
-10 0 14 -9 -1 0 3 -1
2 -1 0 -5 3 0 2 -5
0 -8 -2 -2 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1
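The divide-by-8 step can be sketched as follows; the sample entries are hypothetical, not taken from the actual DCT table:

```python
import numpy as np

entries = np.array([103, 37, -42, -9, 27, -77, 10, 18])  # hypothetical values
q = np.round(entries / 8).astype(int)      # stored: 3 fewer bits per entry
restored = 8 * q                           # what decoding recovers
print(np.max(np.abs(restored - entries)))  # rounding error is at most 4
```

Each restored entry differs from the original by at most 4, which is small compared with typical DCT values.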
Another approach is to compress the entries on the top left less (as these
are more significant). For example, we divide the entries in the top left 4 × 4
submatrix of DCT by 2, and all the other entries by 8:
4 Huffman Encoding
The discrete cosine transform produces the numbers used to store and transmit an image. However, these numbers are not stored directly by their values, but through a code that further reduces the required space. This code assigns codewords to numbers according to their frequency: more frequent numbers are given shorter codes.
144 5 -5 -1 3 -10 1 5
-10 0 14 -9 -1 0 3 -1
2 -1 0 -5 3 0 2 -5
0 -8 -2 -2 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1
Step 1. List all the numbers occurring in the table, along with their frequencies:

-1:14, 0:9, -2:7, 1:5, 2:5, 3:4, -3:4, -5:3, 4:3, 5:2, -10:2, -9:1, -8:1, -4:1, 6:1, 14:1, 144:1

Step 2. Each number will become a 'leaf' of the binary tree. This leaf will be labelled by the number and its frequency. For instance, since 5 has frequency 2, the corresponding leaf will be drawn as 5:2.
Step 3. Two leaves with the lowest frequency are combined into one node.
This node is labelled by the sum of their frequencies. Thus, we get
       2
      / \
  -9:1   -8:1
Step 4. We repeat this step, with the following modification: leaves and nodes already collected below a node are ignored while comparing frequencies. Only the top nodes and remaining leaves are taken into account.
Step 5.

[Diagram: two more pairs of frequency-1 leaves, 14:1 with 144:1 and -4:1 with 6:1, are combined under nodes labelled 2.]

Step 6.

[Diagram: the partial trees built so far, with each left branch labelled 0 and each right branch labelled 1; the leaves 5:2, -5:3 and -10:2 and the nodes above -9:1, -8:1, 14:1, 144:1, -4:1 and 6:1 are combined further.]
By now, the general scheme should be clear. It is evident that we have made
certain choices in each step: namely the order in which we write nodes/leaves
having the same frequency. This does affect the final binary tree we obtain.
However, once we have described the method of coding, it will be obvious
that these choices do not affect the efficiency of the encoding.
Figure 1 shows the final binary tree for our data. We have also labelled each
branch of the tree: by 0 if it is a left branch and by 1 if it is a right branch.
The encoding proceeds as follows. To obtain the code for a value, start from
the root (the node labelled 64) and move down to the value, noting down
each 0 or 1 label for a branch as you cross it. Thus, in moving to the leaf
for the value -10, we obtain the sequence 00011. This is the code for that value.
Note that the most frequent value (-1) has the shortest code (01), and the
less frequent ones have progressively longer codes. A value such as 144, with
frequency 1, has the longest code: 100001.
The table is coded by going through the values one-by-one in the zigzag manner shown in Figure 2 and writing their codes – without any separators! For
instance the starting sequence 144, 5, -10,. . . , becomes 1000011011000011. . . .
(144 → 100001, 5 → 10110, −10 → 00011) To decode this string, one need
only refer to the tree. We start at the root and follow the left or right
branches according to whether we see a 0 or a 1. When we reach a leaf, we
note the corresponding value and start again at the root.
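The whole procedure, building the tree, encoding without separators, and decoding by walking from the root, can be sketched in Python. The helper `huffman_codes` and the sample values below are illustrative; they do not reproduce the text's exact tie-breaking choices, which, as noted above, do not affect the efficiency:

```python
import heapq
from collections import Counter

def huffman_codes(freq):
    # Merge the two lowest frequencies repeatedly; each merge prepends a
    # branch label (0 for left, 1 for right) to the codes already built.
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (fa + fb, count, merged))
        count += 1
    return heap[0][2]

data = [-1, -1, 0, -1, 2, 0, -1, 144]        # hypothetical values
codes = huffman_codes(Counter(data))

encoded = "".join(codes[v] for v in data)    # codes written with no separators

# Decoding: walk from the root; each time a full codeword is seen, emit it.
inverse = {c: s for s, c in codes.items()}
decoded, buf = [], ""
for bit in encoded:
    buf += bit
    if buf in inverse:
        decoded.append(inverse[buf])
        buf = ""
print(decoded == data)   # True: prefix codes decode without separators
```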
Exercise. Show that our table of values can be described by 231 binary
digits if we use Huffman encoding. If, on the other hand, we had worked
with codes of fixed length, we would have needed 320 binary digits.
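The exercise can be checked mechanically. The total coded length equals the sum of the weights of all internal nodes of the tree, and this sum does not depend on the tie-breaking choices. A sketch using the frequencies counted from the table above:

```python
import heapq
from math import ceil, log2

# Frequencies of the 17 distinct values in the quantized table above.
freq = {-1: 14, 0: 9, -2: 7, 1: 5, 2: 5, 3: 4, -3: 4, -5: 3, 4: 3,
        5: 2, -10: 2, -9: 1, -8: 1, -4: 1, 6: 1, 14: 1, 144: 1}
assert sum(freq.values()) == 64          # all 64 table entries accounted for

# Huffman: total bits = sum of merged weights over all merges.
heap = list(freq.values())
heapq.heapify(heap)
total = 0
while len(heap) > 1:
    w = heapq.heappop(heap) + heapq.heappop(heap)
    total += w
    heapq.heappush(heap, w)
print(total)                             # 231 binary digits

# Fixed-length codes: ceil(log2(17)) = 5 bits for each of 64 entries.
print(64 * ceil(log2(len(freq))))        # 320
```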