Final Requirement
Have you ever uploaded or downloaded a file over the internet, or received and sent attachments that arrived in a compressed format? In the modern day, the rapid development of technology, supported by progressively advanced hardware and software, spreads information quickly across the internet. Information can usually be received or sent effortlessly online, but not always: very large data hinders quick transmission and eats into the computer's existing storage. Images seen on the internet are compressed, commonly in the GIF or JPEG formats; HDTV is compressed using MPEG-2; and various file systems automatically compress files when they are stored.
The process of encoding, modifying or converting the bit structure of data so that it takes up less space on a disk is called a data compression algorithm. Data compression is also known as source coding or bit-rate reduction, because it minimizes the storage size of the data.
Data compression has broad application in computing services and solutions, specifically in data communication, and it works through specific compression techniques and software solutions that exploit data compression algorithms to minimize data size. Why is data compression used so widely? Its history gives the answer.
Since the 1970s, data compression has played a significant role in computing, over the same period in which the internet has grown in demand. The earliest specification of data compression was Morse code in 1838, which gave the shortest codes to the most common letters in the English language, such as "e" and "t." In the late 1940s, data compression began to take its modern form with the development of information theory, proposed by Claude Shannon, who is considered the founding father of the electronic communication age. Together with Robert Fano, a computer scientist and professor, he invented the Shannon-Fano coding, which works as follows.
A code is constructed from the source messages a(i) and their probabilities p(a(i)), listed in order of decreasing probability. The list is then divided into two groups with approximately equal total probability. Each message in the first group receives a codeword beginning with 0, and each message in the second group receives a codeword beginning with 1. Each group is then divided again in the same manner, appending further code digits, and the process continues until every subgroup contains exactly one message.
Message   Probability   Codeword
A         1/2           0
B         1/4           10
C         1/8           110
D         1/16          1110
E         1/32          11110
F         1/32          11111
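The quality of this code can be checked numerically: the expected codeword length is the sum of each probability times the length of its codeword. A minimal sketch in Python, using the values from the table above:

```python
# Shannon-Fano code from the table above: symbol -> (probability, codeword)
codes = {
    "A": (1/2,  "0"),
    "B": (1/4,  "10"),
    "C": (1/8,  "110"),
    "D": (1/16, "1110"),
    "E": (1/32, "11110"),
    "F": (1/32, "11111"),
}

# Expected code length: sum of p * len(codeword) over all symbols
avg_len = sum(p * len(cw) for p, cw in codes.values())
print(avg_len)  # 1.9375 bits per symbol
```

Because every probability here is a power of 1/2, this average exactly equals the entropy of the source, which is the best any code can achieve.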
A revolutionary invention arrived in 1977: the LZ77 algorithm, followed by LZ78, both invented by Jacob Ziv and Abraham Lempel. The algorithms most frequently used in the modern age, such as LZMA, LZX and DEFLATE, are derived from LZ77. In 1984, copyright problems arose around the LZ78 branch of algorithms, so UNIX developers began to adopt open-source alternatives such as the Burrows-Wheeler Transform-based bzip2 and the DEFLATE-based gzip formats. These formats managed to achieve remarkably higher compression than the LZ78-based algorithms.
A compressor must identify redundancy in the source in order to reduce the size of the data. The compression of text may be done by eliminating redundant characters, for example by inserting a repeat marker that indicates a run of the same character.
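The "repeat marker" idea is run-length encoding. A minimal sketch in Python (the function name is illustrative):

```python
from itertools import groupby

def rle_encode(text):
    # Replace each run of a repeated character with its count followed by the character
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))

print(rle_encode("AAAABBBCCD"))  # 4A3B2C1D
```

This pays off only when runs are long; for text with few repeats the counts can make the output larger than the input.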
There are two kinds of methods in data compression algorithms: lossless and lossy compression. Lossy compression discards data, while lossless compression preserves all of it, meaning no data is removed at all. The drawback of lossy compression is that it permanently deletes bits of data judged unnecessary, unimportant or imperceptible, so the user must decide which data can be sacrificed. This is appropriate for graphics, images, audio and video, where deleting some bits of data has only a small-scale effect on the quality of the result. Lossless compression, however, allows a compressed file to return to its original size when uncompressed, without losing a single bit of data. Hence, lossless compression is the choice whenever exact reconstruction matters.
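The lossless guarantee can be demonstrated with Python's built-in zlib module (an implementation of DEFLATE, discussed later): after a compress/decompress round trip, every bit of the original is recovered.

```python
import zlib

original = b"the quick brown fox jumps over the lazy dog " * 50
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: not a single bit of data is lost
assert restored == original
print(len(original), "->", len(compressed), "bytes")
```

The repetitive input compresses to a small fraction of its size, yet the comparison with the original still holds exactly.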
Data compression has its advantages and disadvantages. The compression of files offers countless advantages: when data is compressed, fewer bits are used to store the same information. Smaller file sizes result in shorter transmission times when files are transferred through the internet, and they lessen the space consumed in storage.
Compression is built into the SyncBack suite of programs: SyncBackFree, SyncBackSE and SyncBackPro. DEFLATE, invented by Phil Katz in 1993, was the most modern compression of its time. It used LZ77 as a pre-processor combined with Huffman coding as the backend, which meant that compressing a file could be achieved in a short period of time.
DEFLATE64, known as Enhanced Deflate, achieved a better compression ratio and performance than the original DEFLATE. The Burrows-Wheeler Transform, on the other hand, uses a reversible transformation technique that finds patterns that repeat constantly in the data and rearranges the data so that similar characters end up in runs. Once the data is rearranged, it can be coded accurately.
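The rearrangement the Burrows-Wheeler Transform performs can be sketched briefly: append a sentinel, sort all rotations of the input, and keep the last column, which groups similar characters into runs. A minimal illustration, assuming the sentinel "$" does not occur in the input:

```python
def bwt(text):
    # Append a sentinel so the transform is reversible, then sort all rotations
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The last column of the sorted rotation matrix is the transform
    return "".join(rotation[-1] for rotation in rotations)

print(bwt("banana"))  # annb$aa
```

The runs of "a" in the output are exactly what a back-end coder such as the one in bzip2 then compresses well.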
Lastly, the types of compression available in SyncBackPro include bzip2 and LZMA. LZMA, the Lempel-Ziv-Markov chain algorithm, appeared in 1998 and was released through the .7z file format. Bzip2 is open source; like the Burrows-Wheeler Transform it builds on, its operating principles obtain a splendid compression ratio and an excellent balance of speed, in a format that is friendly to UNIX environments.
The next topic is data compression techniques, which make stored data occupy less space than it typically would. Reducing data size acts as a kind of space saver. Two techniques are covered here: Huffman coding and Lempel-Ziv-Welch (LZW). The concept of entropy comes from statistical physics, where it represents the disorder or randomness of a system; applied to a probability distribution over source symbols, it measures how much information the source carries.
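Shannon entropy puts a number on that disorder: for a probability distribution, it gives the minimum average number of bits per symbol any compressor can reach. A minimal sketch:

```python
import math

def entropy(probabilities):
    # Shannon entropy H = -sum(p * log2(p)), in bits per symbol
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Distribution from the Shannon-Fano example earlier in the text
print(entropy([1/2, 1/4, 1/8, 1/16, 1/32, 1/32]))  # 1.9375
```

A uniform distribution maximizes entropy (hardest to compress), while a distribution concentrated on one symbol has entropy 0 (nothing to transmit).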
Huffman coding is an entropy encoding method used for lossless data compression. It was developed in 1952 by David A. Huffman, then a Ph.D. student at MIT. This special method of coding builds a representation for the given symbols that results in prefix codes, the "prefix-free codes," in which frequently repeated source symbols are represented using shorter bit strings.
This technique works by creating nodes arranged like a binary tree. The tree is stored in a regular array whose size depends on the number of symbols. A node can be either an internal node or a leaf node. Initially, all nodes are leaf nodes, each containing a symbol and the frequency of that symbol's appearance. Each node links to a parent node, which makes it simple to read off a code starting from a leaf node.
A priority queue is used when building the Huffman tree: the node with the lowest frequency is given the highest priority. These complete steps are followed to create the tree:
1. Create a leaf node for each character and add them to the priority queue.
2. While there is more than one node in the queue:
   a. Remove the two nodes of the highest priority (lowest frequency) from the queue.
   b. Create a new internal node with these two nodes as children and with a frequency equal to the sum of the two nodes' frequencies, then add it to the queue.
3. The remaining node is the root node and the tree is complete.
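The steps above can be sketched with Python's heapq module standing in for the priority queue (a compact variant of the classic recipe; names are illustrative):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Step 1: one leaf entry per character; lowest frequency = highest priority
    heap = [[freq, [symbol, ""]] for symbol, freq in Counter(text).items()]
    heapq.heapify(heap)
    # Step 2: repeatedly merge the two lowest-frequency nodes
    while len(heap) > 1:
        low = heapq.heappop(heap)
        high = heapq.heappop(heap)
        for pair in low[1:]:
            pair[1] = "0" + pair[1]   # codes under the left child gain a leading 0
        for pair in high[1:]:
            pair[1] = "1" + pair[1]   # codes under the right child gain a leading 1
        heapq.heappush(heap, [low[0] + high[0]] + low[1:] + high[1:])
    # Step 3: the remaining node is the root; collect symbol -> code
    return {symbol: code for symbol, code in heap[0][1:]}

codes = huffman_codes("aaaabbc")
print(codes)
```

With frequencies 4, 2 and 1, "a" receives a 1-bit code while "b" and "c" receive 2-bit codes, matching the rule that the most common symbols get the shortest codes.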
Huffman coding not only produces prefix-free codes; it also involves fixed-length, variable-length and uniquely decodable codes. Variable-length encoding makes it possible to favor characters that are more common than in other text designs: each character is assigned a variable number of bits depending on its frequency, so the most common characters use fewer bits, sometimes as little as 1 bit.
In 1984, Terry Welch published a data compression algorithm that built on the work of Jacob Ziv and Abraham Lempel; it is called the Lempel-Ziv-Welch (LZW) algorithm. The algorithm was designed to be implemented quickly, although it is not optimal, because it performs only a limited analysis of the data. LZW scans an uncompressed data stream and recognizes patterns that recur in the data.
The Lempel-Ziv algorithm parses the source into numerous segments of increasing length. At each encoding step, the longest prefix of the remaining source that matches an entry in the existing table is encoded, and the table grows by one entry. As long as the codeword length is adequately large, the Lempel-Ziv code steadily builds up a reasonable structure; once the table can no longer be extended, no new phrases are added.
Message     Codeword      Message     Codeword
a           1             e           12
1(space)    2             12e         13
b           3             13e         14
3b          4             5f          15
(space)     5             f           16
c           6             16f         17
6c          7             17f         18
6(space)    8             g           19
d           9             19g         20
9d          10            20g         21
10(space)   11

Figure 5: Lempel-Ziv table for the message ensemble EXAMPLE (code length = 173).
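The table-building described above can be sketched as a compressor: the dictionary starts with all single characters, and each time the current prefix plus the next character is new, the prefix's code is emitted and the extended string is added. A minimal sketch (illustrative, with a fixed 256-entry starting alphabet):

```python
def lzw_compress(data):
    # Dictionary initially holds every single-byte character
    table = {chr(i): i for i in range(256)}
    prefix, output = "", []
    for char in data:
        if prefix + char in table:
            prefix += char                     # keep extending the matching prefix
        else:
            output.append(table[prefix])       # emit the code for the longest match
            table[prefix + char] = len(table)  # add the new phrase to the table
            prefix = char
    if prefix:
        output.append(table[prefix])
    return output

print(lzw_compress("abababab"))  # [97, 98, 256, 258, 98]
```

Eight input characters become five codes; the later codes (256, 258) stand for the phrases "ab" and "aba" learned during the scan.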
Some graphics file formats are based on this category of method, among them the TIFF file format. In this format, pixel data is packed into bytes before being dispensed to LZW, and the packing depends on the bit depth of the image and the number of color channels.
The same goes for the GIF file format, which requires each LZW input symbol to be a pixel value. GIF allows between 2 and 256 LZW input symbols, because it is authorized for images 1 to 8 bits deep; a pixel may carry at most 8 bits, so the LZW dictionary can be initialized and utilized accordingly.
There are four uses of data compression discussed here: images, audio, video and genetics. First is image compression. The introduction of entropy coding began in the 1940s, and the technique was introduced by Shannon and Fano. This technique was very important, especially to the discrete cosine transform (DCT), which is the compression basis of JPEG.
JPEG can compress images into a compact file size, and the format has become popular and widely used for image files throughout the internet. Another file format that can be compacted into a smaller size is the Portable Network Graphics (PNG) format, which is affiliated with a certain lossless compression algorithm called DEFLATE and became known in 1996.
Multiple parameters characterize digital images, and a digital image may be in one of three modes: binary, color and grayscale. Image compression is used in the medical field: medical image compression plays a key role in hospitals because everything has changed completely to digital.
A binary image has only two values available for each pixel, while in a grayscale image each pixel can contain a shade of gray.
Besides the types of image formats, there are also audio data compression and coding formats. This form of data compression was developed to reduce the transmission, storage requirements and bandwidth of audio data. The compression algorithms are executed in software called audio codecs, which provide high compression across numerous audio applications. Removing the least audible parts of the sound is what allows these files to save space.
The encoder and the decoder play the most important roles on a channel, because they help reduce the impact of noise when the channel is noisy. The output of the source, as transcribed by the encoder, contains less redundancy and can therefore become highly fragile to noise, so the channel encoder reintroduces a controlled form of redundancy. The Hamming code is the most important channel encoding technique used: provided enough extra bits are sent along with the data being encoded, errors stay minimal, since only a small number of bits must be changed for corrupted codewords to be restored.
Another important category in audio data compression is speech encoding. It relies on estimates of what a human ear can actually hear: different frequencies convey the sounds of music or of a human voice, whether far away or near. Sounds are very complex, and high-quality low-bit-rate encoding of speech is the result of exploiting this. If the signal is analog, it is converted into numbers by what is called analog-to-digital conversion.
If the integers used are 8 bits each, the range of the analog signal is divided into 256 intervals, and all signal values within the same interval become the same number. If 16-bit integers are generated instead, the analog signal is divided into far more intervals, giving a finer representation.
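The interval mapping just described can be sketched directly; a hypothetical quantizer for samples in the range −1.0 to 1.0:

```python
def quantize(sample, bits=8, low=-1.0, high=1.0):
    # Divide the analog range into 2**bits equal intervals (256 for 8 bits)
    levels = 2 ** bits
    step = (high - low) / levels
    index = int((sample - low) / step)
    return min(index, levels - 1)  # the top edge falls into the last interval

print(quantize(0.0))   # 128
print(quantize(-1.0))  # 0
print(quantize(1.0))   # 255
```

Every sample inside an interval collapses to the same integer, which is exactly why the original analog value can never be recovered perfectly; more bits simply make the intervals smaller.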
Downloading or uploading video through the internet involves the MPEG formats; video compression is a practical application of information theory. Uncompressed video needs a very high data rate: simple schemes reach a compression factor of 5 to 12, while MPEG compression reaches a factor of 20 to 200. Shown below is a flow chart of how such an encoder works.
In encoding theory, we may think of video data as a series of still images. Inter-frame coding reuses data from one or more other frames in the sequence to describe the current frame. Intra-frame coding, on the other hand, uses only the data in the current frame and is effectively still-image compression. Camcorders and video editing software can use less complex compression schemes that restrict prediction to intra-frame prediction, and that is the name given to this class of specialized formats.
In this concept, we can compare compression algorithms by whether their models change during the process of compression or decompression. A static system never changes during compression or decompression, whereas an adaptive system can change during compression or decompression depending on the data it processes. Next, if compression and decompression take identical effort they are called symmetric, otherwise asymmetric. Arithmetic coding is a unique technique that maps an entire sequence of messages in the information onto one shared string of bits. This technique allows the number of bits sent to approach the theoretical minimum asymptotically.
Figure 3: Arithmetic Coding
Nowadays, the most commonly used video compression methods are the hybrid block-based transform formats. The ITU-T formats mostly rely on the DCT and apply temporal prediction. The prediction stage is the twin of difference coding: it helps describe new data in terms of data that has already been transmitted through the channel. The discrete cosine transform (DCT), introduced in 1974 by N. Ahmed, T. Natarajan and K. R. Rao, is widely used; the pixels are transformed into the frequency domain to make it easier to target unnecessary detail.
The latest generation of lossless algorithms for compressed data are the genetic compression algorithms. Scientists from Johns Hopkins University published a genetic compression algorithm in 2012. Their software, HAPZIPPER, could achieve 20-fold compression while being much faster than the leading general-purpose compression utilities.
Conclusion
In a word, data compression is commonly utilized by applications, including the suite of SyncBack programs. Users may sometimes expect to be able to compress everything, but there is a bias in assuming this of all compression algorithms: compression is all about expectation. The algorithms are certainly not foolproof; at the beginning of the file, some algorithms transmit a dictionary that is not strictly necessary, even though the mapping between letters and bits would otherwise be hard to compute. Huffman coding is a simpler and effective way to encode, and this way it is no hassle to avoid losing any fragment of information. When it comes to audio data compression, the codecs are changing rapidly and the technology keeps enhancing, which will have major impacts as computer technologies evolve.
The purpose of a data compression algorithm is to obtain a smaller file size for a message, file or any portion of data. Such algorithms are very useful for speeding up transmission, for instance when compressed image data loads over the World Wide Web. Making a file smaller than its regular size means that a 20 MB file, for example, can be shrunk down to 10 MB; the compression ratio in that case is 2, since the result is half the original file size. Compression has its justification, because people save storage and time.
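The 20 MB example works out as a one-line calculation (the numbers are simply the ones used above):

```python
original_mb = 20
compressed_mb = 10

ratio = original_mb / compressed_mb            # compression ratio: original / compressed
space_saved = 1 - compressed_mb / original_mb  # fraction of space saved

print(ratio)        # 2.0
print(space_saved)  # 0.5
```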
Finally, this reaction paper gave me a clear picture of data compression algorithms and of how the world around us is evolving rapidly. It shows that technology takes on its own design and shape so that users have friendly players and formats to use in the future. With these developments in computer technology, numerous limitations have been overcome; some systems carried heavy complexity limitations, such as memory constraints, that have become a distant memory over the years. Since the costs of implementing new systems have risen, I hope we can afford such technology in order to develop new ones.