
MULTIMEDIA FUNDAMENTALS

6
COMPRESSION METHODS
RUN-LENGTH ENCODING (RLE)
& QUANTIZATION
Chapter 3

Run-length algorithms
In this chapter, we consider a type of redundancy, as in Example 2.24, where
a consecutive sequence of symbols can be identified, and introduce a class of
simple but useful lossless compression algorithms called run-length algorithms
or run-length encoding (RLE for short).
We first introduce the ideas and approaches of the run-length compression
techniques. We then move on to show how the algorithm design techniques
learnt in Chapter 1 can be applied to solve the compression problem.

3.1 Run-length
Consecutive recurring symbols in a sequence of symbols are usually called runs.
The source data of interest is therefore a sequence of symbols from an alphabet.
The goal of a run-length algorithm is to identify the runs and to record the
length of each run and the symbol it repeats.

Example 3.1 Consider the following strings:

1. KKKKKKKKK

2. ABCDEFG

3. ABABBBC

4. abc 123bbbbCDE.

The runs in each instance (shaded in the original) are KKKKKKKKK in string 1,
BBB in string 3 and bbbb in string 4; string 2 contains no run.


A run-length algorithm assigns codewords to runs instead of coding individual
symbols. The runs are replaced by a tuple (r, l, s) for (run-flag, run-length,
run-symbol) respectively, where s is a member of the alphabet of the symbols and
r and l are not.

Example 3.2 String KKKKKKKKK, which consists of a single run of 9 Ks, can be
replaced by the triple ('r', 9, 'K'), or by a short unit r9K consisting of the
symbols r, 9 and K, where r represents the case of 'repeating symbol', 9 means
'9 occurrences' and K indicates that this should be interpreted as 'symbol K'
(repeating 9 times).

When there is no run, in ABCDEFG for example, the run-flag n is assigned
to represent the non-repeating symbols, and l, the length of the longest
non-recurrent string, is counted. Finally, the entire non-recurrent string is
copied as the third element in the triple. This means that the non-repeating
string ABCDEFG is replaced by ('n', 7, 'ABCDEFG'), or n7ABCDEFG for short.
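The triple notation can be illustrated with a minimal Python sketch. The function name rle_triples and the min_run parameter are hypothetical; in particular, treating any two or more identical consecutive symbols as a run is an assumption made here for illustration, since the text does not fix a minimum run length at this point.

```python
from itertools import groupby


def rle_triples(text, min_run=2):
    """Return a list of ('r', length, symbol) and ('n', length, string) triples.

    min_run is an assumed threshold: shorter repeats are treated as
    non-repeating symbols.
    """
    triples = []
    pending = []  # non-repeating symbols collected since the last run
    for symbol, group in groupby(text):
        length = len(list(group))
        if length >= min_run:
            if pending:  # flush the non-repeating part before the run
                triples.append(("n", len(pending), "".join(pending)))
                pending = []
            triples.append(("r", length, symbol))
        else:
            pending.extend(symbol * length)
    if pending:
        triples.append(("n", len(pending), "".join(pending)))
    return triples


print(rle_triples("KKKKKKKKK"))  # [('r', 9, 'K')]
print(rle_triples("ABCDEFG"))    # [('n', 7, 'ABCDEFG')]
print(rle_triples("ABABBBC"))    # [('n', 3, 'ABA'), ('r', 3, 'B'), ('n', 1, 'C')]
```

The first two calls reproduce the triples given for KKKKKKKKK and ABCDEFG; the third shows how a string that mixes runs and non-runs is split.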
Run-length algorithms are very effective if the source contains many runs of
consecutive symbols. In fact, the symbols can be characters in a text file, 0s
and 1s in a binary file, or any composite units such as colour pixels in an image,
or even component blocks of larger sound files.
Although simple, run-length algorithms have been widely used in practice.
The so-called HDC (hardware data compression) algorithm, used by tape drives
connected to IBM computer systems, and a similar algorithm used in the IBM
System Network Architecture (SNA) standard for data communications are still
in use today.
We briefly introduce the HDC algorithm below.

3.2 Hardware data compression (HDC)


For convenience, we will look at a simplified version of the HDC algorithm.
In this form of run-length coding, we assume each run or non-repeating
symbol sequence contains no more than 64 symbols. There are two types of
control characters. One is a flag for runs and the other is for non-run sequences.
We define the repeating control characters as r3, r4, ..., r63. The subscripts
are numbers that indicate the length of the run. For example, r5 indicates
a run of length 5. The coder replaces each sequence of consecutive identical
symbols with one of the repeating control characters r3, ..., r63, chosen
according to the run length, followed by the repeating symbol. For example,
VVVV can be replaced by r4V. For a run of spaces, the algorithm uses the control
characters r2, r3, ..., r63 only and leaves out the symbol part. For example,
r7r4V can be decoded as

␣␣␣␣␣␣␣VVVV

where ␣ denotes a space.
For the non-run parts, non-repeating control characters n1, n2, ..., n63 are
used. Each is followed by the longest sequence of non-repeating characters
up to the next run or the end of the entire file, and its subscript gives the
length of that sequence. For example, ABCDEFG will be replaced by n7ABCDEFG.
This simple version of the HDC algorithm essentially uses only ASCII codes
for the single symbols, together with a total of 123 control characters, each of
which includes a run-length count. Each ri, where i = 2, ..., 63, is followed by
either another control character or a symbol. If it is followed by another
control character, ri (alone) signifies i repeating space characters (i.e. spaces
or blanks). Otherwise, ri signifies that the symbol immediately after it repeats
i times. Each ni, where i = 1, ..., 63, is followed by a sequence of i
non-repeating symbols.
Applying the following 'rules', it is easy to understand the outline of the
encoding and decoding run-length algorithms below.

3.2.1 Encoding

Repeat the following until the end of the input file:

Read the source (e.g. the input text) symbols sequentially and

1. if a string [1] of i (i = 2, ..., 63) consecutive spaces is found, output a
   single control character ri;
2. if a string of i (i = 3, ..., 63) consecutive symbols other than spaces is
   found, output two characters: ri followed by the repeating symbol;
3. otherwise, identify a longest string of i (i = 1, ..., 63) non-repeating
   symbols, where there is no consecutive sequence of two spaces or of three
   other characters, and output the non-repeating control character ni followed
   by the string.

Example 3.3 GGG␣␣␣␣␣␣BCDEFG␣␣55GHJK␣LM777777777777
can be compressed to r3Gr6n6BCDEFGr2n955GHJK␣LMr127.

Solution

1. The first three Gs are read and encoded by r3G.
2. The next six spaces are found and encoded by r6.
3. The non-repeating symbols BCDEFG are found and encoded by n6BCDEFG.
4. The next two spaces are found and encoded by r2.
5. The next nine non-repeating symbols are found and encoded by n955GHJK␣LM.
6. The next twelve '7's are found and encoded by r127.

Therefore the encoded output is: r3Gr6n6BCDEFGr2n955GHJK␣LMr127.
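The encoding rules and the worked example can be sketched in Python as follows. This is only an illustration under stated assumptions: control characters are modelled as tuples such as ("r", 3) and ("n", 6), symbols as one-character strings, and the names hdc_encode, run_length, is_run and render are hypothetical. Splitting runs longer than 63 symbols into several codewords is also an assumption, since the text does not discuss that case.

```python
SPACE = " "
MAX_LEN = 63  # largest count a control character can carry


def run_length(text, pos):
    """Length of the run of identical symbols starting at pos (capped at MAX_LEN)."""
    end = pos
    while end < len(text) and text[end] == text[pos] and end - pos < MAX_LEN:
        end += 1
    return end - pos


def is_run(text, pos):
    """A run needs at least 2 spaces or at least 3 identical other symbols."""
    threshold = 2 if text[pos] == SPACE else 3
    return run_length(text, pos) >= threshold


def hdc_encode(text):
    """Encode text into a flat list of control tuples and symbols."""
    tokens = []
    pos = 0
    while pos < len(text):
        n = run_length(text, pos)
        if text[pos] == SPACE and n >= 2:
            tokens.append(("r", n))        # rule 1: spaces, no symbol part
            pos += n
        elif text[pos] != SPACE and n >= 3:
            tokens.append(("r", n))        # rule 2: ri followed by the symbol
            tokens.append(text[pos])
            pos += n
        else:
            start = pos                    # rule 3: longest non-repeating string
            while pos < len(text) and pos - start < MAX_LEN and not is_run(text, pos):
                pos += 1
            tokens.append(("n", pos - start))
            tokens.extend(text[start:pos])
    return tokens


def render(tokens):
    """Write the token list in the r3G / n6BCDEFG notation used in the text."""
    return "".join(f"{t[0]}{t[1]}" if isinstance(t, tuple) else t for t in tokens)


source = "GGG" + SPACE * 6 + "BCDEFG" + SPACE * 2 + "55GHJK LM" + "7" * 12
print(render(hdc_encode(source)))  # r3Gr6n6BCDEFGr2n955GHJK LMr127
```

The printed string matches the encoded output of Example 3.3, with an ordinary space in place of ␣. The rendered form is for display only: a string such as r127 cannot be parsed back unambiguously, which is why the sketch keeps control characters as separate tokens.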

[1] i.e. a sequence of symbols.



3.2.2 Decoding
The decoding process is similar to that for encoding and can be outlined as
follows:

Repeat the following until the end of the coded input file:

Read the codeword sequence sequentially and

1. if an ri is found, then check the next codeword:
   (a) if the codeword is a control character (or the end of the file has been
       reached), output i spaces;
   (b) otherwise, output i (ASCII codes of) repeating symbols;
2. otherwise (an ni is found), output the next i non-repeating symbols.
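The decoding steps above can be sketched in Python along these lines. As an assumption made for illustration, the coded input is a flat list in which control characters are tuples such as ("r", 4) or ("n", 7) and symbols are one-character strings; the name hdc_decode is hypothetical, and the sketch assumes a well-formed codeword sequence.

```python
def hdc_decode(tokens):
    """Decode a flat list of control tuples and one-character symbols."""
    out = []
    pos = 0
    while pos < len(tokens):
        kind, count = tokens[pos]          # a control character is expected here
        pos += 1
        if kind == "r":
            if pos < len(tokens) and isinstance(tokens[pos], str):
                out.append(tokens[pos] * count)   # step 1(b): repeat the symbol
                pos += 1
            else:
                out.append(" " * count)           # step 1(a): a run of spaces
        else:
            out.extend(tokens[pos:pos + count])   # step 2: copy i symbols
            pos += count
    return "".join(out)


# r7r4V from Section 3.2: seven spaces followed by VVVV
print(repr(hdc_decode([("r", 7), ("r", 4), "V"])))
```

An ri at the very end of the list is treated as a run of spaces, because no symbol follows it.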

Observation
It is not difficult to observe from a few examples that the performance of the
HDC algorithm (as far as the compression ratio is concerned) is:

• excellent [2] when the source contains many runs of consecutive symbols
• poor when there are many segments of non-repeating symbols.

Therefore, run-length algorithms are often used as a subroutine in other, more
sophisticated coding schemes.

3.3 Algorithm Design


We have so far learnt the ideas behind the HDC algorithm as well as run-length
algorithms in general. To learn how to design our own compression algorithms,
we look at how to derive a simple version of HDC applying the algorithm design
techniques introduced in Chapter 1.

Stage 1: Description of the problem


A problem is a general question to be answered. However, a question may be
too general to lead to an algorithmic solution or too vague to even understand
the issues involved. To help us understand the HDC problem better, we look at
Example 3.1 again.
From the example, we study the input and output to understand the behaviour of
the algorithm to be developed. It soon becomes clear that each run or non-run
sequence can be described as a pair (c, s), where c represents the control
character with a count, and s is the repeating symbol or the non-run string,
depending on whether c is ri or ni.

[2] It can be even better than entropy coding such as Huffman coding.
Run-Length Encoding (RLE) Compression Method

• Suitable for compressing images that contain groups of pixels with the same
  gray level
• Example: a 10x10 image with 8 gray levels

Pairs of gray level (p) and pixel count (q)

• Image size before compression (1 gray level = 3 bits):
  100 x 3 bits = 300 bits
• Image size after compression (run length = 4 bits):
  (31 x 3) + (31 x 4) bits = 217 bits
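The (p, q) pairing and the size arithmetic above can be checked with a short Python sketch. The function names rle_pairs, raw_bits and rle_bits are hypothetical, and the sample row is invented for illustration because the 10x10 example image itself is not reproduced here. The sketch also assumes that every pixel count q fits in the chosen number of run-length bits.

```python
def rle_pairs(pixels):
    """Collapse a flat sequence of gray levels into (p, q) pairs."""
    pairs = []
    for p in pixels:
        if pairs and pairs[-1][0] == p:
            pairs[-1] = (p, pairs[-1][1] + 1)  # extend the current run
        else:
            pairs.append((p, 1))               # start a new run
    return pairs


def raw_bits(num_pixels, gray_bits):
    """Size of the uncompressed image in bits."""
    return num_pixels * gray_bits


def rle_bits(num_pairs, gray_bits, run_bits):
    """Size of the RLE output: each pair stores one gray level and one count."""
    return num_pairs * gray_bits + num_pairs * run_bits


row = [0, 0, 2, 2, 2, 3, 3, 3, 3, 3]  # hypothetical 10-pixel row
print(rle_pairs(row))       # [(0, 2), (2, 3), (3, 5)]
print(raw_bits(100, 3))     # 300
print(rle_bits(31, 3, 4))   # 217
```

With 31 pairs, 3 bits per gray level and 4 bits per count, the sketch reproduces the 300-bit and 217-bit figures given above.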
Quantization Compression Method

• Build a histogram of the image to be compressed.
• Let P be the number of pixels.
• Identify n groups in the histogram such that each group contains
  approximately P/n pixels.
• Represent each group by a gray level from 0 to n-1. Each group is re-encoded
  with its new gray-level value.
• Example: a 5 x 13 image
• The image will be compressed to 4 gray levels (0-3), i.e. 2 bits per pixel
• Histogram and its groups
• After compression
• Size before compression (1 gray level = 4 bits): 65 x 4 bits = 260 bits
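The quantization steps can be sketched in Python as follows. This is one possible reading of "approximately P/n pixels per group", not a definitive implementation: original gray levels are assigned in ascending order, and a new group starts once the cumulative pixel count reaches the next multiple of P/n. The function name quantize_equal_population and the sample pixel data are hypothetical, since the 5 x 13 example image is not reproduced here.

```python
from collections import Counter


def quantize_equal_population(pixels, n):
    """Map gray levels to n groups holding roughly len(pixels) / n pixels each.

    Returns the re-encoded pixels and the old-level -> new-level mapping.
    """
    histogram = Counter(pixels)          # step 1: histogram of the image
    target = len(pixels) / n             # P / n pixels per group
    mapping = {}
    group = 0
    filled = 0                           # pixels assigned so far
    for level in sorted(histogram):
        mapping[level] = group
        filled += histogram[level]
        # move on once this group's share of the pixels has been reached
        if filled >= target * (group + 1) and group < n - 1:
            group += 1
    return [mapping[p] for p in pixels], mapping


pixels = [0, 1, 2, 3, 4, 5, 6, 7] * 2    # hypothetical data, 8 gray levels
quantized, mapping = quantize_equal_population(pixels, 4)
print(mapping)    # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
print(quantized)  # [0, 0, 1, 1, 2, 2, 3, 3, 0, 0, 1, 1, 2, 2, 3, 3]
```

For the 65-pixel example, 2 bits per pixel would give 65 x 2 = 130 bits after compression; that figure follows from the numbers above by arithmetic and is not stated in the slides themselves.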
