Sunteți pe pagina 1din 8

2/27/2015

Data Compression
Eng. Reem Sabah Ahmed
Computer Engineering Department
Engineering College
Al-Nahrain University

Data Compression
Introduction
Lossless

and lossy
compression.
Run-Length coding
Delta coding.
Huffman
method(coding)
Huffman
method(decoding)

Arithmetic

method(coding)
Arithmetic
method(decoding)
Dictionary methods
Lempel-Ziv
Algorithm
Image compression
Video compression

2/27/2015

Run-Length Coding

Run length encoding (RLE) is a simple technique to


compress digital data by representing successive runs
of the same value in the data as the value followed
by the count, rather than the original run of values.
The goal is to reduce the amount of data needed to
be stored or transmitted.

Run length encoding (RLE) is one very simple lossless


data compression algorithm called run-length
encoding that can be very useful in some cases.

Run-length encoding (RLE) is a very simple form of


data compression in which consecutive sequences
of the same data value (runs) are stored or
transmitted as a single data value and count,
rather than as the original individual data
elements. This is particularly useful for data that
contains many such runs such as simple graphic
images and faxed documents. If the original
doesnt have many runs, it could increase rather
than decrease the size of the data file or
transmission.

2/27/2015

A fax machine scans a page and represents that image as


black or white pixels, which are sent over telephone
lines to another fax machine that will print the pixels
onto a blank page. The total number of pixels to be
transmitted may be very large which would result in
lengthy transmission times. Because fax images often
have large blocks of white (e.g. margins and inter-line
spacing) or black (e.g. horizontal lines) they are readily
amenable to run-length encoding.

this compression under the category of lossless


compression algorithms. The decoded data will be the
exact same as the original data.

The following are the main considerations that apply to


this technique:

Text in a natural language (as well as names) may have many


doubles and a few triplesas in AAA (an acronym), abbess,
Emmanuelle, bookkeeper, arrowwood, freeer (old usage),
and hostessship (used by Shakespeare)but longer runs are
limited to consecutive spaces and periods.

In a bi-level image there are two types of symbols, namely


black and white pixels, so runs of pixels alternate between
these two colors, which implies that RLE can compress such
an image by replacing each run with its length.

2/27/2015

In general, the data to be compressed includes several


types of symbols, so RLE compresses a run by replacing
it with a pair (length, symbol).

If the run is short, such a pair may be longer than the


run of symbols, thereby leading to expansion. A
reasonable solution is to write such short runs on the
output in raw format, so at least they do not cause
expansion. However, this raises the question of
distinguishing between pairs and raw items, because in
the output file both types are binary strings. A practical
RLE program must therefore precede each pair and each
raw item with a 1-bit indicator.

Thus, a pair becomes the triplet (0, length, symbol) and a


raw item becomes the pair (1, symbol).

Runs have different lengths, which is why the pairs and


triplets have different lengths. It therefore makes sense
to replace each by a variable-length code and write the
codes on the output. Thus, RLE is normally just one step
in a multistep compression algorithm that may include a
transform, variable-length codes, and perhaps also
quantization.

2/27/2015

Examples of Run Length coding


Example1

As a basic example, consider the following string of numbers:


555588822222

There is a fair amount of redundancy there. In RLE notation, this


same string could be expressed as:
453852

This indicates a run of 4 5 numbers, followed by a run of 3 8


numbers, followed by a run of 5 2 numbers. Suppose that each
number was represented by a byte on disk. The original encoding
requires 12 bytes. The RLE encoding requires 6 bytes. 2:1
compression in this simple case not bad.

Example 2

consider the following number string:


123456

Apply the same RLE compression scheme as before:


111213141516

So the original string was 6 bytes and the RLE string is 12


bytes. That did not go too well as the compression ratio is
now 1:2. This is why pure RLE implementations have a mode
for encoding strings of dissimilar numbers as well. It usually
works by encoding a negative number that, when negated,
will give the number of succeeding bytes that are not RLE
data.

2/27/2015

Example 2 Cont.

Since the above example contains a string of 6 dissimilar


numbers, the encoding would be:
-6 1 2 3 4 5 6

A decoder would see that the RLE code is negative, negate it


to get 6, and then copy 6 bytes from the encoded byte stream
to the decoded byte stream.

Example 3
Using the basic concepts described so far, lets decode the
following byte stream:
4 5 -6 1 2 3 4 5 6 3 8 5 2
This is a combination of examples from above and decodes
to:
555512345688822222
Work through it by hand as necessary. So the encoded string
is 13 bytes and the decoded string is 18 bytes. Modest
compression. RLE compression naturally works best where
there are long runs of the same pixel value. At worst, there
will be no compression if there are almost no pairs of
adjacent pixels with the same value.

2/27/2015

Summery

The basic RLE principle is to detect sequences of repeated


data values and after that to replace this sequence with two
elements:
- the number of the same characters.
- the character itself.
Useful for compressing data that contains repeated
values
Very simple compared with other compression
techniques
Reversible (Lossless) compression: decompression is just
as easy

Question?

2/27/2015

Thank you

S-ar putea să vă placă și