
Dianne Alicen D. Dias

ECE 6205 – Signals, Spectra & Signal Processing

Final Requirement

January 12, 2020

Reaction Paper On:

“Data Compression Algorithms”

Have you ever uploaded or downloaded a file from the internet, or received and sent attachments, and encountered files in a compressed format? In modern times, the rapid development of technology, with the help of increasingly advanced hardware and software, has spread information quickly through the internet. Information can usually be received or sent effortlessly online; however, not all of it can. A massive amount of information may hinder quick data transmission and eat into the available storage on a computer. Images seen on the internet are compressed, commonly in the GIF or JPEG formats, HDTV is compressed using MPEG-2, and various file systems automatically compress files when they are stored.

The process of encoding, modifying or converting the bit structure of data so that it occupies less space on a disk is called data compression, and the procedure that carries it out is a data compression algorithm. Source coding or bit-rate reduction are other terms for data compression; it makes it possible to minimize the storage size of one or more data instances or elements.

It has broad application in computing services and solutions, specifically in data communication, and it works through specific compression techniques and software solutions that exploit data compression algorithms to minimize data size. Why are data compression algorithms important to our society?

Since the 1970s, data compression has played a significant role in computing, growing alongside the demand for the internet. The earliest example dates back to 1838, when Morse code assigned the shortest codes to the most common letters in the English language, such as “e” and “t.” In the late 1940s, data compression took its modern form with the development of information theory, proposed by Claude Shannon, who is considered the founding father of the electronic communication age. Together with Robert Fano, a computer scientist and professor, he devised Shannon-Fano coding, a systematic approach to assigning codewords based on the probabilities of message blocks.

A Shannon-Fano code is constructed by listing the source messages a(i) and their probabilities p(a(i)) in order of decreasing probability. The list is then divided into two groups with approximately equal total probability. Every message in the first group receives a codeword beginning with 0, while every message in the second group receives a codeword beginning with 1. Each group is then divided again in the same way, appending another code digit at each step, and the process continues until every subgroup contains exactly one message.

Message   Probability   Codeword
A         1/2           0
B         1/4           10
C         1/8           110
D         1/16          1110
E         1/32          11110
F         1/32          11111

Figure 1: A Shannon-Fano code
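
As a check on the procedure just described, here is a minimal Python sketch of Shannon-Fano coding. The function and variable names are illustrative, not from any particular library, and the probabilities are the ones in Figure 1.

def shannon_fano(symbols):
    """symbols: list of (symbol, probability) sorted by decreasing probability."""
    if len(symbols) <= 1:
        return {symbols[0][0]: ""} if symbols else {}
    # Find the split that makes the two groups' total probabilities as equal as possible.
    total = sum(p for _, p in symbols)
    split, best_diff = 1, float("inf")
    for i in range(1, len(symbols)):
        running = sum(p for _, p in symbols[:i])
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for sym, code in shannon_fano(symbols[:split]).items():
        codes[sym] = "0" + code          # first group gets a leading 0
    for sym, code in shannon_fano(symbols[split:]).items():
        codes[sym] = "1" + code          # second group gets a leading 1
    return codes

probs = [("A", 1/2), ("B", 1/4), ("C", 1/8), ("D", 1/16), ("E", 1/32), ("F", 1/32)]
print(shannon_fano(probs))   # reproduces the codewords listed in Figure 1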

A revolutionary step came in 1977 and 1978, when Jacob Ziv and Abraham Lempel invented the LZ77 and LZ78 algorithms. Many of the algorithms most frequently used today, such as LZMA, LZX and DEFLATE, are derived from LZ77. In 1984, licensing problems surrounding the LZ78 family pushed UNIX developers to adopt open algorithms, which led to formats such as the Burrows-Wheeler Transform-based bzip2 and the DEFLATE-based gzip. These formats manage to achieve remarkably higher compression than the LZ78-based algorithms.

Compression works by running a program that uses a formula or algorithm to determine how to shrink the size of the data. Text compression, for example, may be done by removing redundant characters and inserting a single character together with a repeat count to indicate a run of repeated characters.
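
A minimal run-length encoding (RLE) sketch of that idea, where a run of repeated characters is replaced by the character and a count; the "character followed by count" output format is illustrative, not a standard file format.

def rle_encode(text):
    if not text:
        return ""
    out, prev, count = [], text[0], 1
    for ch in text[1:]:
        if ch == prev:
            count += 1                       # still inside the same run
        else:
            out.append(prev + str(count))    # emit the finished run
            prev, count = ch, 1
    out.append(prev + str(count))
    return "".join(out)

print(rle_encode("AAAABBBCCD"))   # A4B3C2D1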

There are two kinds of data compression methods: lossless and lossy compression. Lossy compression discards data, while lossless compression preserves all of it, meaning no data is ever removed. The drawback of lossy compression is that it permanently deletes bits of data that are judged unnecessary, unimportant or imperceptible, so the user has to distinguish essential information from expendable data when using it.

Lossy compression is appropriate for graphics, images, audio and video, where deleting some data bits has only a small effect on the perceived quality. Lossless compression, on the other hand, allows a compressed file to return to its original size when it is decompressed, without losing a single bit of data. Hence, this kind of compression takes advantage of data redundancy.

Data compression has both advantages and disadvantages. Compressing files offers countless benefits: when data is compressed, fewer bits are needed to store the same information. Smaller file sizes result in shorter transmission times when files are transferred over the internet, and they take up less space in storage.

It is also worth noting the types of compression available in SyncBackFree, SyncBackSE and SyncBackPro. DEFLATE, the type available in SyncBackFree, was invented by Phil Katz in 1993 and remains the most widely used of the three. It uses LZ77 as a pre-processor combined with Huffman coding as the back end, which means a file can be compressed in a short period of time. SyncBackSE adds DEFLATE64 and the Burrows-Wheeler Transform.

DEFLATE64, also known as Enhanced Deflate, achieves a better compression ratio and performance than the original DEFLATE. The Burrows-Wheeler Transform, on the other hand, uses a reversible transformation technique that finds constantly repeated patterns in the data and rearranges it so that similar characters end up in long runs. Once the data is rearranged this way, it can be coded more efficiently, which results in higher compression ratios.
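
A minimal sketch of the Burrows-Wheeler Transform, using the naive "sort all rotations" construction (real implementations use suffix arrays); the "$" end-of-string marker is an assumption that makes the transform reversible.

def bwt(text):
    text = text + "$"                       # sentinel marks the end of the string
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)   # last column of the sorted rotations

print(bwt("banana"))   # annb$aa -- similar characters are grouped into runs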

Lastly, SyncBackPro adds Bzip2 and LZMA. LZMA, the Lempel-Ziv-Markov chain algorithm, dates from 1998 and was released through the .7z file format. Bzip2 is an open-source format that operates on the principles of the Burrows-Wheeler Transform, obtaining an excellent compression ratio and a good balance of speed, and it is a familiar format in UNIX environments. When handling highly random data, however, it can slow down considerably.

The next topic is data compression techniques, which allow data to be stored in less space than it would normally occupy. Two techniques are considered here: Huffman coding and Lempel-Ziv-Welch (LZW). The idea of entropy comes from statistical physics, where it represents the disorder or randomness of a system.

In information theory, entropy is defined over the probability distribution of the possible states. For a source whose symbols occur with probabilities p(i), the entropy is

H = - Σ p(i) · log2 p(i)

which gives the average number of bits needed per symbol.
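
A small Python sketch of computing the entropy H of the source in Figure 1; the probabilities are taken from that table and the math module is standard.

import math

probs = [1/2, 1/4, 1/8, 1/16, 1/32, 1/32]
H = -sum(p * math.log2(p) for p in probs)
print(H)   # 1.9375 bits per symbol, which the Shannon-Fano code above matches exactly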

Huffman coding is an entropy encoding method used for lossless data compression. It was developed in 1952 by David A. Huffman, then a Ph.D. student at MIT, in his paper “A Method for the Construction of Minimum-Redundancy Codes.” The method builds a representation for the given symbols that results in prefix codes, also called “prefix-free codes,” in which the most frequent source symbols are encoded using the shortest bit strings. This is shown in the figure below:

Figure 2: Huffman coding (prefix-free codes)

The technique works by creating nodes that form a binary tree, often stored in a regular array whose size depends on the number of symbols. A node is either an internal node or a leaf node. Initially, all nodes are leaf nodes, each containing a symbol and the frequency with which that symbol appears. A leaf may also link to its parent node, which makes it simple to read off the code starting from the leaf.

A priority queue is used when building the Huffman tree: the node with the lowest frequency is given the highest priority. These steps guided me in creating the Huffman tree (a small code sketch follows the list):

1. Create a leaf node for each character and add them to the priority queue.

2. While there is more than one node in the queue:

a. Remove the two nodes of the highest priority (lowest frequency) from the queue.

b. Create a new internal node with these two nodes as children and with frequency

equal to the sum of the two nodes’ frequencies.

c. Add the new node to the priority queue.

3. The remaining node is the root node and the tree is complete.
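
A minimal Python sketch of these steps, using the heapq module as the priority queue; the tuple layout (frequency, tiebreak, node) and the sample text are illustrative assumptions.

import heapq
from collections import Counter

def huffman_codes(text):
    freq = Counter(text)
    # Step 1: a leaf node for each character, kept in a priority queue (min-heap).
    heap = [(f, i, {"sym": s}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    # Step 2: repeatedly merge the two lowest-frequency nodes into a new internal node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, {"left": left, "right": right}))
        tiebreak += 1
    # Step 3: the remaining node is the root; walk the tree to read off the codes.
    codes = {}
    def walk(node, prefix):
        if "sym" in node:
            codes[node["sym"]] = prefix or "0"   # single-symbol edge case
        else:
            walk(node["left"], prefix + "0")
            walk(node["right"], prefix + "1")
    walk(heap[0][2], "")
    return codes

print(huffman_codes("this is an example of a huffman tree"))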

Huffman coding involves not only prefix-free codes but also the ideas of fixed-length, variable-length and uniquely decodable codes. The purpose of variable-length encoding is to let the characters that appear most often in a text use fewer bits: each character is assigned a variable number of bits depending on its frequency in the given text. A very common character may take up as little as one bit, but the challenge with variable-length coding is making sure the receiver can still decode the result unambiguously.

In 1984, Terry Welch published a data compression algorithm built on the work of Jacob Ziv and Abraham Lempel, now called the Lempel-Ziv-Welch (LZW) algorithm. It is an improvement of the LZ78 algorithm that Lempel and Ziv had published in 1978.

The algorithm was designed to be simple and fast to implement, although it is not optimal because it performs only a limited analysis of the data. LZW is also known as a substitutional or dictionary-based encoding algorithm: it builds a dictionary of substrings that occur in the uncompressed data stream, and whenever a pattern in the stream matches an entry in the dictionary, the pattern is replaced by a reference to that entry.

The Lempel-Ziv algorithm parses the source into a set of segments of gradually increasing length. At each encoding step, the longest prefix of the remaining source that matches an existing table entry is selected, and a new, slightly longer entry is added to the table. If the codeword length is not sufficiently large, Lempel-Ziv codes rise only slowly to reasonable efficiency, and once entries can no longer be added the table stops adapting, performs poorly and fails to produce any further gains.

Message Codeword Message Codeword
a 1 e 12
1space 2 12e 13
b 3 13e 14
3b 4 5f 15
space 5 f 16
c 6 16f 17
6c 7 17f 18
6space 8 g 19
d 9 19g 20
9d 10 20g 21
10space 11  
Figure 5: Lempel-Ziv table for the message ensemble EXAMPLE (code length = 173).
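
The dictionary-based idea can be made concrete with a minimal Python sketch of an LZW encoder; the byte-string input and the classic test string are illustrative and not taken from the table above.

def lzw_compress(data):
    # Initial dictionary: one entry per possible byte value.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    output = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                 # keep extending the current match
        else:
            output.append(dictionary[current])  # emit the code for the longest match
            dictionary[candidate] = next_code   # add the new, longer entry to the table
            next_code += 1
            current = bytes([byte])
    if current:
        output.append(dictionary[current])
    return output

print(lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT"))   # repeated phrases become single codes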

Some graphics file formats are based on this category of method, one of them being TIFF. In this format the pixel data is packed into bytes before being handed to LZW, so depending on the bit depth of the image and the number of color channels, an LZW input symbol may be only a fragment of a pixel value.

The GIF file format is similar, except that it requires each LZW input symbol to be a pixel value. Since GIF allows images from 1 to 8 bits deep, there are only between 2 and 256 possible LZW input symbols, and for the LZW dictionary to exploit larger input alphabets effectively it has to grow very large.

There are four notable uses of data compression: images, audio, video and genetics. First is image compression. Entropy coding was introduced in the 1940s with Shannon-Fano coding, and it became an important building block alongside the discrete cosine transform (DCT), which is the basis of JPEG, the format introduced by the Joint Photographic Experts Group in 1992.

JPEG can compress images to a compact file size, and the format has become popular and widely used for images throughout the internet. Another format that compresses images into a smaller size is Portable Network Graphics (PNG), introduced in 1996, which is based on the lossless compression algorithm DEFLATE.

Digital images are characterized by multiple parameters and can exist in three modes: binary, color and grayscale. Image compression is also used in the medical field; medical image compression plays a key role in hospitals because imaging has moved completely to digital, unlike in the past when it was based on film.

A binary image allows only two values for each pixel, while in a grayscale image each pixel holds a shade of gray. The pixels are arranged into one array, and this set of pixels is what we call a digital image.

Besides image formats, there are also audio data compression and coding formats. Audio compression was developed to reduce the storage requirements, transmission time and bandwidth of audio data. These compression algorithms are implemented in software called audio codecs and provide high compression for numerous audio applications. To save space, the algorithms rely on psychoacoustics to identify and remove sounds that are barely audible or not audible at all.

The channel encoder and decoder play the most important role when transmitting over a channel, because they reduce the impact of noise when the channel is noisy. The output of the source encoder contains little redundancy, which makes it highly fragile to noise, so the channel encoder introduces a controlled form of redundancy. The Hamming code is the most important channel encoding technique used. Enough extra bits are added to the data being encoded to ensure minimal error: if only a small number of bits are changed in transit, the coded words can still be restored.
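
A minimal sketch of the classic Hamming(7,4) construction mentioned above, assuming the standard bit layout: four data bits become seven coded bits, and any single flipped bit can be located and corrected.

def hamming74_encode(d):
    """d: list of four data bits [d1, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4        # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """c: received 7-bit codeword; returns the corrected codeword."""
    p1, p2, d1, p3, d2, d3, d4 = c
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    error_pos = s1 * 1 + s2 * 2 + s3 * 4   # 0 means no error detected
    if error_pos:
        c = c[:]
        c[error_pos - 1] ^= 1              # flip the erroneous bit back
    return c

code = hamming74_encode([1, 0, 1, 1])
noisy = code[:]
noisy[2] ^= 1                              # simulate a single bit flip in the channel
print(hamming74_correct(noisy) == code)    # True -- the codeword is restored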

Another important category of audio data compression is speech encoding. Speech coding relies on estimates of what the human ear can actually hear: the range of frequencies needed to convey a human voice is narrower than that needed for music. Because sounds are so complex, exploiting this allows high-quality speech to be encoded at a low bit rate. If the signal starts out as analog, it must first be converted into numbers, a step called analog-to-digital conversion.

If the integers used are 8 bits each, the range of the analog signal is divided into 256 intervals, and all signal values within an interval become the same number. If 16-bit integers are generated instead, the analog range is divided into far more intervals, 65,536 of them.
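
A small sketch of the uniform quantization just described; the signal range of [-1.0, 1.0] and the rounding convention are assumptions chosen for illustration.

def quantize(sample, bits=8):
    levels = 2 ** bits                       # 256 intervals for 8 bits, 65,536 for 16
    step = 2.0 / levels                      # width of one interval over [-1, 1]
    index = int((sample + 1.0) / step)       # which interval the sample falls into
    return min(index, levels - 1)            # clamp the top edge of the range

print(quantize(0.5, bits=8))    # 192 -- every sample in that interval maps to 192
print(quantize(0.5, bits=16))   # 49152 -- the same range, divided much more finely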

Video downloaded or uploaded through the internet usually comes in MPEG formats, and video compression is a practical application of source coding from information theory. Uncompressed video needs a very high data rate; lossless methods typically reach a compression factor of 5 to 12, while lossy MPEG compression reaches factors of 20 to 200. Shown below is a flow chart of how MPEG video is compressed.

Figure 2: The compression flow of an MPEG video

In encoding theory, video data can be thought of as a series of still images, and such data contains an enormous amount of temporal redundancy, since consecutive frames differ very little. This is why inter-frame compression is among the most powerful compression techniques.

Inter-frame compression reuses data from one or more frames earlier or later in the sequence to describe the current frame. Intra-frame coding, on the other hand, uses only the data within the current frame, which effectively makes it still-image compression. Camcorders and video editing software often use less complex schemes of this kind, a class of specialized formats that restrict themselves to intra-frame prediction only.
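
A toy sketch of the inter-frame idea: instead of storing each frame whole, store only the difference from the previous frame. The one-dimensional "frames" and pixel values here are illustrative.

def inter_frame_encode(frames):
    encoded = [frames[0]]                                    # first frame stored intact (intra)
    for prev, cur in zip(frames, frames[1:]):
        encoded.append([c - p for c, p in zip(cur, prev)])   # residual is mostly zeros
    return encoded

def inter_frame_decode(encoded):
    frames = [encoded[0]]
    for residual in encoded[1:]:
        frames.append([p + r for p, r in zip(frames[-1], residual)])
    return frames

frames = [[10, 10, 10, 10], [10, 10, 11, 10], [10, 10, 11, 12]]
enc = inter_frame_encode(frames)
print(enc)                                   # the residuals are sparse and compress well
print(inter_frame_decode(enc) == frames)     # True -- the frames are recovered exactly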

We can also compare two types of compression algorithms: static and adaptive compression. The distinction lies in whether the model is allowed to change during compression or decompression. A static system never changes while compressing or decompressing, whereas an adaptive system may change during compression or decompression depending on the data being processed.

Both static and adaptive models underlie numerous compression algorithms implemented in hardware or software. Algorithms can further be classified as symmetric or asymmetric, depending on whether compression and decompression require a similar amount of work. Arithmetic coding is a unique technique in which a sequence of messages is encoded together so that the messages effectively share bits; the number of bits sent approaches, asymptotically, the sum of the self-information of the individual messages.

Figure 3: Arithmetic coding
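
A minimal floating-point sketch of the interval-narrowing idea behind arithmetic coding; the three-symbol model and its probabilities are assumptions for illustration, and real coders use integer arithmetic to avoid the precision limits this toy version hits on long messages.

def arithmetic_encode(message, probs):
    # Build cumulative ranges: each symbol owns a slice of [0, 1).
    ranges, start = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        sym_low, sym_high = ranges[sym]
        high = low + span * sym_high      # narrow the interval to the symbol's slice
        low = low + span * sym_low
    return (low + high) / 2               # any number inside [low, high) encodes the message

probs = {"A": 0.5, "B": 0.25, "C": 0.25}
print(arithmetic_encode("ABAC", probs))   # one number in [0, 1) for the whole message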

Nowadays, the most commonly used video compression methods are hybrid block-based transform formats. The ITU-T standards rely mostly on the DCT, combined with temporal prediction using motion vectors and an in-loop filtering step.

In the prediction stage, deduplication and difference-coding techniques help describe new data in terms of data that has already been transmitted. The widely used discrete cosine transform (DCT) itself was introduced in 1974 by N. Ahmed, T. Natarajan and K. R. Rao.

The pixels are transformed into the frequency domain, which makes it easier to target unnecessary information for removal and to reduce redundancy and noise.
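
To make the transform step concrete, here is a small Python sketch of the 1-D DCT-II written directly from the standard formula rather than an optimized library routine; the block of sample pixel values is illustrative.

import math

def dct_1d(block):
    N = len(block)
    coeffs = []
    for k in range(N):
        s = sum(block[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        coeffs.append(scale * s)
    return coeffs

# For a smooth block of pixel values, most of the energy lands in the first
# coefficient, and the small high-frequency coefficients can be quantized away.
print([round(c, 2) for c in dct_1d([52, 55, 61, 66, 70, 61, 64, 73])])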

The latest generation of lossless data compression algorithms are genetics compression algorithms. In 2012, scientists from Johns Hopkins University published a genetic compression algorithm. Their software, HAPZIPPER, was able to achieve 20-fold compression while running much faster than the leading general-purpose compression utilities.

Conclusion

In conclusion, data compression is very important because it is commonly used by applications throughout the computing world, including the suite of SyncBack programs. Users may expect to be able to compress everything, but every compression algorithm carries a built-in bias, and in the end compression is all about expectation: no algorithm can shrink all possible inputs. The codes themselves are not secure, and transmitting a dictionary at the beginning of the file is not always necessary, even though the mapping between letters and bits can be hard to compute. Huffman coding remains a simple and effective way to encode without the risk of losing any fragment of information. As for audio data compression, codecs are changing rapidly and the technology keeps improving, which will have a major impact as computer technologies continue to evolve.

The purpose of a data compression algorithm is to produce a smaller file size for a message, a file or any portion of data. This is very useful for speeding up transmission, for example when image data is loaded over the World Wide Web. Making a file smaller than its regular size means, for example, that a 20 MB file can be shrunk down to 10 MB; in that case the compression ratio is 2, since the result is half the original file size. Compression pays for itself, because people save time and money that would otherwise be spent on storage and transmission.

Finally, writing this reaction paper gave me a clearer understanding of data compression algorithms and of how rapidly the world around us is evolving. Technology keeps taking on new designs and shapes so that users will have friendly players and formats to use in the future. With these developments in computer technology, numerous limitations have been overcome; the complexity and memory limitations that constrained older systems have gradually disappeared. Since the cost of implementing new systems has risen, I hope we will be able to afford such technology so that we can keep developing new ones.

References:

Pu, I. M. (2006). "Coding symbolic data." In Fundamental Data Compression.

Techopedia. "Data Compression." https://www.techopedia.com/definition/884/data-compression

Chung, C. "The Basic Principles of Data Compression." 2BrightSparks. https://www.2brightsparks.com/resources/articles/data-compression.html

