At present there is an insatiable demand for ever-greater bandwidth in communication networks and ever-greater storage capacity in computer systems. This has led to the need for efficient compression techniques. Compression is the process that either reduces the volume of information to be transmitted (text, fax and images) or reduces the bandwidth required for its transmission (speech, audio and video). The compression technique is applied to the source information prior to its transmission.
Compression Techniques:
1. Lossless compression
2. Lossy compression
Lossless compression:
In this technique, data is not altered in the process of compression or decompression. Decompression generates an exact replica of the original. "Text compression" is a good example. Spreadsheets and word-processor files usually contain repeated sequences of characters. By reducing each run of repeated characters to a count, we can reduce the number of bits required.
Grayscale images contain repetitive information; this repetitiveness in graphic images and sound allows bit patterns to be replaced by shorter codes. In color images, adjacent pixels can have different color values. Such images do not have sufficient repetitiveness to be compressed, and in these cases the technique is not applicable. Lossless compression techniques have been able to achieve a reduction in size in the range from 1/10 to 1/50 of the original uncompressed size.
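The idea of reducing repeated characters to a count can be illustrated with a minimal run-length sketch. The function names `rle_encode` and `rle_decode` are hypothetical and are not taken from the text; the sketch only shows that the original string is recovered exactly.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Expand (character, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)

sample = "AAAABBBCCD"
encoded = rle_encode(sample)
restored = rle_decode(encoded)
```

Data without long runs gains nothing from this scheme, which mirrors the remark above about images lacking sufficient repetitiveness.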
Lossy compression:
It is used for compressing audio, grayscale or color images, and video objects. In this technique, compression results in a loss of information. When a video image is decompressed, the loss of data in one frame will not be perceived by the eye. If several bits are missing, the information is still perceived in an acceptable manner, as the eye fills in gaps in the shading gradient.
Text Compression :
There are three different types of text (unformatted, formatted and hypertext), and all are represented as strings of characters selected from a defined set. The compression algorithm associated with text must be lossless, since the loss of just a single character could modify the meaning of a complete string. Text compression is restricted to the use of entropy encoding and, in practice, statistical encoding methods. There are two types of statistical encoding methods used with text: one uses single characters as the basis for deriving an optimum set of code words, and the other uses variable-length strings of characters. Two examples of the former are the Huffman and arithmetic coding algorithms, and an example of the latter is the Lempel-Ziv (LZ) algorithm. The majority of work on hardware approaches to lossless parallel data compression has used an adapted form of the dictionary-based Lempel-Ziv algorithm, in which a large number of simple processing elements are arranged in a systolic array [1], [2], [3], [4].
... much slower deep pipelining technique to implement its dictionary search. However, compared to the CAM solution, the systolic array method has advantages in terms of reduced hardware costs and lower power consumption, which may be more important criteria in some situations than having faster dictionary searching. In [8], the authors show that hardware main memory data compression is both feasible and worthwhile. The authors also describe the design and implementation of a novel compression method, the XMatchPro algorithm, and exhibit the substantial impact such memory compression has on overall system performance. The adaptation of compression code for parallel implementation is investigated by Jiang and Jones [9]. They recommended the use of a processing array arranged in a tree-like structure. Although compression can be implemented in this manner, the implementation of the decompressor's search and decode stages in parallel hardware would greatly increase the complexity of the design, and it is likely that these aspects would need to be implemented sequentially. An FPGA implementation of a parallel binary arithmetic coding architecture that is able to process 8 bits per clock cycle, compared to the standard 1 bit per cycle, is described by Stefo et al. [10]. Although little research has been performed on architectures involving several independent compression units working in a concurrent cooperative manner, IBM has introduced the MXT chip [11], which has four independent compression engines operating on a shared memory area. The four Lempel-Ziv compression engines are used to provide data throughput sufficient for memory compression in computer servers. Adaptation of software compression algorithms to make use of multiple CPU systems was demonstrated by the research of Penhorn [12] and Simpson and Sabharwal [13]. Penhorn used two CPUs to compress data using a technique based on the Lempel-Ziv algorithm and showed that useful compression rate improvements can be achieved, but only at the cost of increasing the learning time for the dictionary. Simpson and Sabharwal described the software implementation of a compression system for a multiprocessor system based on the parallel architecture developed by Gonzalez and Smith and Storer [14].
Statistical Methods :
Statistical modeling of a lossless data compression system is based on assigning values to events depending on their probability. The higher the value, the higher the probability. The accuracy with which this frequency distribution reflects reality determines the efficiency of the model. In Markov modeling, predictions are made based on the symbols that precede the current symbol. Statistical methods in hardware are restricted to simple higher-order modeling using binary alphabets, which limits speed, or simple multisymbol alphabets using zeroth-order models, which limits compression. Binary alphabets limit speed because only a few bits (typically a single bit) are processed in each cycle, while zeroth-order models limit compression because they can only provide an inexact representation of the statistical properties of the data source.
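The limitation of zeroth-order models can be made concrete with a small entropy calculation. This is only an illustrative sketch: the function names and the sample string are assumptions, and the first-order estimate simply conditions each symbol on the one before it, in the spirit of the Markov modeling mentioned above.

```python
import math
from collections import Counter

def zeroth_order_entropy(data):
    """Bits per symbol under a context-free frequency model."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def first_order_entropy(data):
    """Bits per symbol when each symbol is predicted from its predecessor."""
    pair_counts = Counter(zip(data, data[1:]))
    context_counts = Counter(data[:-1])
    total = len(data) - 1
    return -sum((n / total) * math.log2(n / context_counts[prev])
                for (prev, _), n in pair_counts.items())

text = "abababababababab"
h0 = zeroth_order_entropy(text)  # sees only that 'a' and 'b' are equally frequent
h1 = first_order_entropy(text)   # sees that 'b' always follows 'a' and vice versa
```

For this strictly alternating input the zeroth-order model needs one bit per symbol, while the first-order model finds the sequence fully predictable.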
Dictionary Methods:
Dictionary methods try to replace a symbol or group of symbols by a dictionary location code. Some dictionary-based techniques use simple uniform binary codes to process the information supplied. Both software-based and hardware-based dictionary models achieve good throughput and competitive compression. The UNIX utility 'compress' uses the Lempel-Ziv-2 (LZ2) algorithm, and the Data Compression Lempel-Ziv (DCLZ) family of compressors, initially invented by Hewlett-Packard [16] and currently being developed by AHA [17], [18], also uses LZ2 derivatives. Bunton and Borriello present another LZ2 implementation in [19] that improves on the Data Compression Lempel-Ziv method. It uses a tag attached to each dictionary location to identify which node should be eliminated once the dictionary becomes full.
XMatchPro Based System :
The lossless data compression system is a derivative of the XMatchPro algorithm, which originates from previous research of the authors [15] and advances in FPGA technology. The flexibility provided by using this technology is of great interest, since the chip can easily be adapted to the requirements of a particular application. The drawbacks of some of the previous methods are overcome by using the XMatchPro algorithm in the design. The objective is then to obtain better compression ratios and still maintain a high throughput, so that the compression/decompression processes do not slow the original system down.
Lossless compression removes redundant information from the data while they are being transmitted or before they are stored in memory, and lossless decompression reintroduces the redundant information to recover fully the original data. In the same way, the data is compressed before it is stored and decompressed when it is retrieved, thus increasing the effective capacity of the storage device.
Proposed Method :
In [1], the author discusses a parallel algorithm that can be implemented for high-speed data compression. The author gives the basic idea of how data compression is carried out using the Lempel-Ziv algorithm and how the algorithm could be altered for parallelism. The author describes the Lempel-Ziv algorithm as a very efficient universal data compression technique, based upon an incremental parsing technique, which maintains codebooks of parsed phrases at the transmitter and at the receiver. An important feature of the algorithm is that it is not necessary to determine a model of the source that generates the data. According to the author, in an attempt to increase the speed of the algorithm on general-purpose processors, the algorithm has been parallelised to run on two processors.
Background :
The author explains a novel architecture for a high-performance lossless data compressor that is organized around a selectively shiftable Content Addressable Memory, which permits full matching; the processor offers very high performance with good compression of computer-based data. The author also gives details about the operation, architecture and performance of the data compression techniques, and introduces the XMatchPro lossless data compressor. In [3], the authors discuss parallelism in data compression techniques and explain the Parallel Architecture for High Speed Data Compression. In this paper, the author describes data compression as an essential component of high-speed data communication and storage. In [4], the authors discuss the various methods and techniques of data compression. The authors describe research on parallel lossless data compression and its drawbacks, and propose a new methodology for a high-speed hardware implementation of a high-performance parallel multi-compressor chip that is able to meet the intensive data processing demands of highly concurrent systems. The authors also investigate the performance of alternative input and output routing strategies for realistic data sets and demonstrate that the design of parallel compression devices involves important trade-offs that affect compression performance, latency and throughput. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.
... matched. 3) Characters that did not match, transmitted in literal form. A description of the XMatchPro algorithm in pseudo-code is given in the figure below.
clear the dictionary;
set the next free location (NFL) to 0;
DO {
    read in a tuple T from the data stream;
    search the dictionary for tuple T;
    IF (full or partial hit) {
        determine the best match location ML and match type MT;
        output '0';
        output any required literal characters of T;
    } ELSE {
        output '1';
        output tuple T;
    }
    IF (full hit) {
        move dictionary entries 0 to ML-1 down by one location;
    } ELSE {
        move all dictionary entries down by one location;
        increment NFL (if dictionary is not full);
    }
    copy tuple T to dictionary location 0;
} WHILE (more data is to be compressed);
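The pseudo-code above can be sketched in software as follows. This is a simplified model and not the hardware design: it emits tokens instead of a packed bit stream, the names `best_match`, `update`, `compress` and `decompress` are hypothetical, the dictionary capacity of 16 is an arbitrary assumption, and the best match is taken to be the entry sharing the most bytes with the tuple, as a stand-in for the shortest-output criterion.

```python
MIN_MATCHING_BYTES = 2  # a partial hit needs at least two matching bytes

def best_match(dictionary, tup):
    """Return (location, mask, hits) for the closest entry, or None on a miss."""
    best = None
    for loc, entry in enumerate(dictionary):
        mask = tuple(a == b for a, b in zip(entry, tup))
        hits = sum(mask)
        if hits >= MIN_MATCHING_BYTES and (best is None or hits > best[2]):
            best = (loc, mask, hits)
    return best

def update(dictionary, tup, full_hit_loc, max_size):
    """Move-to-front update shared by compressor and decompressor."""
    if full_hit_loc is not None:
        del dictionary[full_hit_loc]      # entries 0..ML-1 move down by one
    elif len(dictionary) == max_size:
        dictionary.pop()                  # dictionary full: drop the last entry
    dictionary.insert(0, tup)             # copy tuple T to location 0

def compress(data, max_size=16):
    """Turn a byte string (length a multiple of 4) into a list of tokens."""
    if len(data) % 4:
        raise ValueError("input must be a whole number of 4-byte tuples")
    dictionary, tokens = [], []
    for i in range(0, len(data), 4):
        tup = data[i:i + 4]
        hit = best_match(dictionary, tup)
        if hit is None:
            tokens.append(("miss", tup))
            full_loc = None
        else:
            loc, mask, hits = hit
            literals = bytes(b for b, m in zip(tup, mask) if not m)
            tokens.append(("match", loc, mask, literals))
            full_loc = loc if hits == 4 else None
        update(dictionary, tup, full_loc, max_size)
    return tokens

def decompress(tokens, max_size=16):
    """Rebuild the byte string by mirroring the compressor's dictionary."""
    dictionary, out = [], bytearray()
    for token in tokens:
        if token[0] == "miss":
            tup, full_loc = token[1], None
        else:
            _, loc, mask, literals = token
            lits = iter(literals)
            tup = bytes(e if m else next(lits)
                        for e, m in zip(dictionary[loc], mask))
            full_loc = loc if all(mask) else None
        update(dictionary, tup, full_loc, max_size)
        out += tup
    return bytes(out)

data = b"ABCDABCDABCEWXYZABCD"
tokens = compress(data)
restored = decompress(tokens)
```

Because both sides apply the same `update` rule, the decompressor's dictionary stays in step with the compressor's without any dictionary contents being transmitted.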
With the increase in silicon densities, it is becoming feasible for multiple XMatchPros to be implemented in parallel on a single chip. A parallel system with a distributed memory architecture is based on having multiple data compression and decompression engines working independently on different data at the same time. This data is stored in memory distributed to each processor. There are several approaches by which data can be routed to and from the compressors, and these affect the speed and complexity of the system. Lossless compression removes redundant information from the data while they are transmitted or before they are stored in memory, and lossless decompression reintroduces the redundant information to recover fully the original data. There are two important contributions made by the current parallel compression and decompression work, namely improved compression rates and inherent scalability. Significant improvements in data compression rates have been achieved by sharing the computational requirement between compressors without significantly compromising the contribution made by individual compressors. The scalability feature permits future bandwidth or storage demands to be met by adding additional compression engines.
Initially all the entries in the dictionary are empty, and 4-byte tuples are added to the front of the dictionary, while the rest move one position down if a full match has not occurred. The larger the dictionary, the greater the number of address bits needed to identify each memory location, reducing compression performance. Since the number of bits needed to code each location address is a function of the dictionary size, greater compression is obtained in comparison to the case where a fixed-size dictionary uses fixed address codes for a partially full dictionary. In the parallel XMatchPro system, the data stream to be compressed enters the compression system and is then partitioned and routed to the compressors. For parallel compression systems, it is important to ensure that all compressors are supplied with sufficient data by managing the supply so that neither stall conditions nor data overflow occurs.
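The benefit of letting the address length follow the number of occupied locations can be estimated with a short calculation. The sketch assumes plain binary addresses of ceil(log2(n)) bits with a minimum of one bit, and a 64-entry dictionary for the fixed case; the text does not specify the actual location code, so both are assumptions.

```python
def address_bits(entries_in_use):
    """Bits of a plain binary address over the locations currently in use."""
    return max(1, (entries_in_use - 1).bit_length())

def growing_cost(num_codes):
    """Total address bits when the i-th code is emitted with i entries in use."""
    return sum(address_bits(i) for i in range(1, num_codes + 1))

def fixed_cost(num_codes, capacity=64):
    """Total address bits when every code addresses the full capacity."""
    return num_codes * address_bits(capacity)

grow = growing_cost(8)   # addresses of 1, 1, 2, 2, 3, 3, 3, 3 bits
fixed = fixed_cost(8)    # eight 6-bit addresses
```

Under these assumptions the first eight location codes take 18 bits with a growing dictionary against 48 bits with fixed 6-bit addresses.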
The number of tuples present in the dictionary has an important effect on compression. In principle, the larger the dictionary, the higher the probability of having a match and improving compression. On the other hand, a bigger dictionary uses more bits to code its locations, degrading compression when processing small data blocks that only use a fraction of the dictionary length available. The width of the CAM is fixed at 4 bytes per word. A Content Addressable Memory (CAM) compares input search data against a table of stored data and returns the address of the matching data. CAMs have a single-clock-cycle throughput, making them faster than other hardware-based and software-based search systems. The input to the system is the search word, which is broadcast onto the searchlines to the table of stored data. Each stored word has a matchline that indicates whether the search word and stored word are identical (the match case) or different (a mismatch case, or miss). The matchlines are fed to an encoder that generates a binary match location corresponding to the matchline that is in the match state. An encoder is used in systems where only a single match is expected. The overall function of a CAM is to take a search word and return the matching memory location.
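The CAM behaviour described above can be modelled functionally in a few lines. The function name `cam_search` and the sample table are assumptions; a real CAM evaluates every matchline in parallel within one clock cycle, whereas this sketch only reproduces the result of that search.

```python
def cam_search(stored_words, search_word):
    """Return (matchlines, match_location); match_location is None on a miss."""
    # One matchline per stored word: True when the word equals the search word.
    matchlines = [word == search_word for word in stored_words]
    # The encoder turns the active matchline into a binary location.
    location = matchlines.index(True) if any(matchlines) else None
    return matchlines, location

table = [0xDEADBEEF, 0x12345678, 0xCAFEBABE]
hit_lines, hit_location = cam_search(table, 0x12345678)
miss_lines, miss_location = cam_search(table, 0x00000000)
```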
FUNCTIONAL DESCRIPTION:
The XMatchPro algorithm maintains a dictionary of data previously seen and attempts to match the current data element with an entry in the dictionary, replacing it with a shorter code referencing the match location. Data elements that do not produce a match are transmitted in full (literally), prefixed by a single bit. Each data element is exactly 4 bytes in width and is referred to as a tuple. This feature gives a guaranteed input data rate during compression and thus also guaranteed data rates during decompression, irrespective of the data mix. The 4-byte tuple size also gives an inherently higher throughput than other algorithms, which tend to operate on a byte stream. The dictionary is maintained using a move-to-front strategy, whereby the current tuple is placed at the front of the dictionary and the other tuples move down by one location as necessary to make space. The move-to-front strategy aims to exploit locality in the input data. If the dictionary becomes full, the tuple occupying the last location is simply discarded. A full match occurs when all characters in the incoming tuple match a dictionary entry. A partial match occurs when at least two of the characters in the incoming tuple match exactly with a dictionary entry, with the characters that do not match being transmitted literally. The use of partial matching improves the compression ratio when compared with allowing only 4-byte matches, but still maintains high throughput. If neither a full nor a partial match occurs, then a miss is registered and a single miss bit of '1' is transmitted, followed by the tuple itself in literal form. The only exception to this is the first tuple in any compression operation, which will always generate a miss because the dictionary begins in an empty state. In this case no miss bit is required to prefix the tuple. At the beginning of each compression operation, the dictionary size is reset to zero. The dictionary then grows by one location for each incoming tuple, which is placed at the front of the dictionary while all other entries move down by one location. A full match does not grow the dictionary, but the move-to-front rule is still applied. This growth of the dictionary means that code words are short during the early stages of compressing a block. Because the XMatchPro algorithm allows partial matches, a decision must be made about which of the locations provides the best overall match, with the selection criterion being the shortest possible number of output bits.
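The shortest-output selection rule can be illustrated with a simple cost estimate. The bit counts here are assumptions made for the sketch (one flag bit, a fixed-length location code, a 4-bit match type with one bit per byte position, and 8 bits per literal byte); the text does not give the actual code lengths, and the names `match_cost_bits` and `choose_best` are hypothetical.

```python
TUPLE_BYTES = 4
MATCH_TYPE_BITS = 4  # assumption: one bit per byte position

def match_cost_bits(entry, tup, location_bits):
    """Output bits if tup is coded against entry, or None if it is not a hit."""
    matched = sum(a == b for a, b in zip(entry, tup))
    if matched >= 2:
        literal_bits = 8 * (TUPLE_BYTES - matched)
        return 1 + location_bits + MATCH_TYPE_BITS + literal_bits
    return None

def choose_best(dictionary, tup, location_bits):
    """Return (location, cost) of the cheapest match, or (None, miss cost)."""
    best_loc, best_cost = None, 1 + 8 * TUPLE_BYTES  # miss: flag bit plus tuple
    for loc, entry in enumerate(dictionary):
        cost = match_cost_bits(entry, tup, location_bits)
        if cost is not None and best_cost > cost:
            best_loc, best_cost = loc, cost
    return best_loc, best_cost

partial = choose_best([b"ABXX", b"ABCX", b"ZZZZ"], b"ABCD", 2)
miss = choose_best([b"ZZZZ"], b"ABCD", 2)
full = choose_best([b"ABCD"], b"ABCD", 0)
```

With equal-length location codes the cheapest candidate is simply the one with the most matching bytes; variable-length location or match-type codes would change the ranking.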
For multiple compression systems, it is important to ensure that all compressors are supplied with sufficient data by managing the supply so that neither stall conditions nor data overflow occurs. There are several approaches by which data can be routed into and out of the compressors. The basic method for input routing used in this project takes an input twice the size of the XMatchPro compressor input: the lower 32 bits are given to Compressor 0 and the higher 32 bits are given to Compressor 1. The same method is used for output routing, and additional output pins are assigned for both Compressor 0 and Compressor 1. Since each compressor produces its own compressed data stream, a mechanism is required to merge these streams in such a way that subsequent decompression can be performed correctly. Also, subsequent decompression needs to be capable of operating in an appropriate parallel fashion; otherwise, a disparity in compression and decompression speeds will occur, reducing overall throughput. The data flow for the parallel compression system is given in the figure below.
INPUT ROUTING
As per the algorithm, XMatchPro can process four bytes of data per clock cycle. To ensure that all compressors are busy, data must therefore enter the system at a rate of 4n bytes per clock cycle, where n is the number of compressors in the system. This can be achieved by two methods:
1. Interleaved input method
2. Blocked input method
Fig. 4.1. Interleaved Input Routing
BLOCKED INPUT METHOD:
In the blocked input approach, a fixed-length block of data is sent from the incoming data stream to each of the compressors in turn, as shown in the following figure. In this scheme, the data has to arrive at the dedicated memory of the compressor at a rate slower than it can be processed, thereby allowing the memory to be filled with data.
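The two input routing methods can be contrasted with a small sketch. The blocked function follows the description above. The interleaved function assumes that successive tuples are dealt to the compressors one at a time, which is an interpretation because the text names the method without describing it. The function names are hypothetical.

```python
def interleaved(tuples, n):
    """Assumed interleaving: tuple i goes to compressor i mod n."""
    return [tuples[k::n] for k in range(n)]

def blocked(tuples, n, block_len):
    """Consecutive blocks of block_len tuples go to the compressors in turn."""
    queues = [[] for _ in range(n)]
    for start in range(0, len(tuples), block_len):
        target = (start // block_len) % n
        queues[target].extend(tuples[start:start + block_len])
    return queues

stream = list(range(8))              # stand-ins for eight 4-byte tuples
by_tuple = interleaved(stream, 2)
by_block = blocked(stream, 2, 2)
```

With two compressors, interleaving gives each engine every other tuple, while blocking with a block length of two gives each engine alternating pairs.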
The block diagram gives the details of the components of a single 32-bit compressor. There are three components, namely COMPARATOR, ARRAY and CAMCOMPARATOR. The comparator is used to compare two 32-bit data words and to set or reset the output bit, 1 for equal and 0 for unequal. The CAM COMPARATOR searches the CAM dictionary entries for a full match of the input data given.
The reason for choosing a full match is to obtain a prototype of the high-throughput XMatchPro compressor with reduced complexity and high performance. If a full match occurs, the match-hit signal is generated and the corresponding match location is given as output by the CAM comparator. If no full match occurs, the data given as input at that time is passed to the output. The array has a length of 64 x 32-bit locations. It is used to store the unmatched incoming data, and when a new data word arrives, it is compared with all the data stored in the array. If a match occurs, the corresponding match location is sent as output; otherwise the incoming data is stored in the next free location of the array and is sent as output. The last component is the CAM comparator, which is used to send the match location of the CAM dictionary as output if a match has occurred. This is done by taking match information as input from the comparator. If the output of the comparator goes high for any input, a match is found and the corresponding address is retrieved and sent as output along with one bit to indicate that a match is found. If no match occurs, the incoming data is stored in the array and sent as the output. These are the functions of the three components of the compressor. The hardware descriptions of these modules are written in VHDL. VHDL is an acronym for Very High Speed Integrated Circuits Hardware Description Language. It can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level. The VHDL language can be regarded as an integrated amalgamation of the following languages:
o Sequential language
o Concurrent language
o Net-list language
o Timing specifications
o Waveform generation language
So the language has constructs that enable you to express the concurrent or sequential behavior of a digital system with or without timing. It also allows the system to be modeled as an interconnection of components. Test waveforms can also be generated using the same constructs. The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. VHDL is event driven, to allow for efficient simulation of hardware. Computations are only performed when some data has changed (an event has occurred).
The compressor has three components: COMPARATOR, ARRAY and CAMCOMPARATOR. The comparator is used to compare two 32-bit data words and to set or reset the output bit, 1 for equal and 0 for unequal. The array has a length of 64 x 32-bit locations. It is used to store the unmatched incoming data, and when the next new data word arrives, that word is compared with all the data stored in the array. If the incoming data matches any of the data stored in the array, the comparator generates a match signal and sends it to the CAM comparator. The last component is the CAM comparator, which is used to send the incoming data and all the stored data in the array one by one to the comparator. If the output of the comparator goes high for any input, then a match is found and the corresponding address (match location) is retrieved and sent as output along with one bit to indicate that the match is found. If no match is found, then the incoming data is stored in the array and sent as output. These are the functions of the three components of the XMatchPro-based compressor.
The decompressor has the following components: Array and Processing Unit. The array has the same function as the array unit used in the compressor, and it is of the same length. The processing unit checks the incoming match-hit bit. If it is 0, the data is not present in the array, so the unit stores the data in the array. If the match-hit bit is 1, the data is present in the array, so the unit finds the data in the array with the help of the address input and sends it to the data output. The control has an input bit called C/D (Compression/Decompression), which indicates whether compression or decompression has to be done. If it has the value 0, the compressor is started; when the value is 1, decompression is done.
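The behaviour of the full-match compressor and decompressor described above can be sketched as a pair of functions operating on 32-bit words. This is a behavioural model only, not the VHDL design: the function names are hypothetical, and the choice to stop storing new words once all 64 locations are occupied is an assumption, since the text does not say what happens when the array is full.

```python
ARRAY_LEN = 64  # the array holds 64 locations of 32 bits each

def compress_word(array, word):
    """Return (match_hit, value): an address on a hit, the word itself on a miss."""
    if word in array:
        return 1, array.index(word)
    if ARRAY_LEN > len(array):
        array.append(word)        # store the unmatched word in the next free location
    return 0, word

def decompress_word(array, match_hit, value):
    """Invert compress_word using an array built in the same order."""
    if match_hit:
        return array[value]       # value is the match location
    if ARRAY_LEN > len(array):
        array.append(value)       # value is the literal data word
    return value

words = [0xAAAA0001, 0xBBBB0002, 0xAAAA0001, 0xBBBB0002, 0xCCCC0003]
comp_array, decomp_array = [], []
stream = [compress_word(comp_array, w) for w in words]
restored = [decompress_word(decomp_array, hit, val) for hit, val in stream]
```

A repeated 32-bit word is replaced by a match-hit bit and a 6-bit address, which is where the saving arises in this simplified model.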
... in the match location as output. The block diagram of the 64-bit Compression/Decompression System is given below.
The components of the single instantiated compressor are the same as those of the 32-bit compressor, namely COMPARATOR, ARRAY and CAMCOMPARATOR. The comparator is used to compare two 32-bit data words and to set or reset the output bit, 1 for equal and 0 for unequal. The array has a length of 64 x 32-bit locations. It is used to store the unmatched incoming data, and when a new data word arrives, that word is compared with all the data stored in the array for a match. If no match occurs, the incoming data is stored in the next free location of the array. The last component is the CAM comparator, which is used to send the incoming data and all the stored data in the array one by one to the comparator. If the output of the comparator goes high for any input, a match is found and the corresponding address is retrieved and sent as output along with one bit to indicate that a match is found. If no match is found, then the incoming data is stored in the array and sent as output. These are the functions of the three components of the 32-bit compressor.
... concatenating the outputs of the two compressors in the parallel architecture and giving that data as input to the parallel decompression system, which comprises two of the 32-bit decompression systems discussed above for the single compression system. The 32-bit decompressor has the following components: Array and Processing Unit. The array has the same function as the array unit used in the compressor, and it is of the same length. The processing unit checks the incoming match-hit bit. If it is 0, the data is not present in the array, so the unit stores the data in the array. If the match-hit bit is 1, the data is present in the array, so the unit finds the data in the array with the help of the address input (match location) and sends it to the data output.
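The 64-bit arrangement can be sketched by running two independent 32-bit engines side by side. As before, this is a behavioural model under assumptions: the function names are hypothetical, the output of each engine is kept as a (match_hit, value) pair instead of a packed bit field, and a full array simply stops accepting new words.

```python
ARRAY_LEN = 64
MASK32 = 0xFFFFFFFF

def compress_word(array, word):
    """32-bit engine: (1, address) on a hit, (0, word) on a miss."""
    if word in array:
        return 1, array.index(word)
    if ARRAY_LEN > len(array):
        array.append(word)
    return 0, word

def decompress_word(array, match_hit, value):
    """32-bit engine: look up the address on a hit, store the literal on a miss."""
    if match_hit:
        return array[value]
    if ARRAY_LEN > len(array):
        array.append(value)
    return value

def compress64(arrays, word64):
    """Lower half goes to Compressor 0, upper half to Compressor 1."""
    low, high = word64 & MASK32, (word64 >> 32) & MASK32
    return compress_word(arrays[0], low), compress_word(arrays[1], high)

def decompress64(arrays, pair):
    """Decode both halves independently and rejoin them into one 64-bit word."""
    low = decompress_word(arrays[0], *pair[0])
    high = decompress_word(arrays[1], *pair[1])
    return (high << 32) | low

words = [0x1111111122222222, 0x3333333322222222, 0x1111111122222222]
comp, decomp = ([], []), ([], [])
stream = [compress64(comp, w) for w in words]
restored = [decompress64(decomp, pair) for pair in stream]
```

Each half keeps its own array, so the two engines never need to synchronise with one another; only the pairing of their outputs has to be preserved for decompression.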