
LZMW is a dictionary-based compression algorithm and a modification of the LZW algorithm. It was invented in 1985 by V. Miller and M. Wegman and described in the article "Variations on a theme by Ziv and Lempel".

The change concerns the encoder's dictionary-update rule. The decoder works like the LZW decoder, except that it adds the same concatenated words to its dictionary as the encoder does. After LZW finds the longest prefix of the not-yet-encoded data in the dictionary, it adds the concatenation of that prefix and the next character to the dictionary as a new word. In other words, the dictionary receives a word one character longer than the prefix. LZMW instead remembers the word matched in the previous step and, after the next match, adds the concatenation of both words to the dictionary. In this way the words in the dictionary quickly become longer. A modification of LZMW is LZAP (James Storer, 1988), which adds to the dictionary the concatenations of the previous word with every prefix of the current word. This makes the dictionary grow considerably, but it avoids the disadvantages of LZMW.

The compression (encoding) algorithm

1. Initialize the dictionary.
2. i := 0 (the current position).
3. T := the text to be encoded.
4. P := the previous word, initially empty.
5. While i < length(T):
   - Find the longest prefix of the not-yet-encoded data (T[i..]) that is present in the dictionary. The result is the current word S, with code k and a length of n characters.
   - Output the code k.
   - Add the word P + S to the dictionary if it is not already there.
   - i := i + n
   - P := S
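The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The function name `lzmw_encode`, the `alphabet` parameter, and numbering codes from 1 (to match the example table) are assumptions. A plain `dict` also stands in for the trie-like structure a real encoder would use. The LZMW dictionary is not prefix-closed, so the sketch tries candidate lengths from longest to shortest rather than extending the match one symbol at a time.

```python
def lzmw_encode(text, alphabet):
    """LZMW encoder sketch: returns the output codes and the final dictionary."""
    # phrase -> code; codes 1..len(alphabet) hold the single symbols
    dictionary = {ch: code for code, ch in enumerate(alphabet, start=1)}
    max_len = 1   # length of the longest phrase currently in the dictionary
    codes = []
    i = 0         # current position in the text
    prev = ""     # P: the word matched in the previous step
    while i < len(text):
        # Find the longest dictionary phrase that is a prefix of text[i:].
        # Lengths are tried from longest to shortest because not every
        # prefix of an LZMW phrase is itself in the dictionary.
        for n in range(min(max_len, len(text) - i), 0, -1):
            current = text[i:i + n]   # candidate for S
            if current in dictionary:
                break
        else:
            raise ValueError(f"symbol {text[i]!r} is not in the alphabet")
        codes.append(dictionary[current])   # output code k of S
        new_word = prev + current           # P + S
        if new_word not in dictionary:      # add it only if it is new
            dictionary[new_word] = len(dictionary) + 1
            max_len = max(max_len, len(new_word))
        i += n
        prev = current                      # P := S
    return codes, dictionary
```

For example, `lzmw_encode("aaaaabbbaaaa", "abc")` returns the codes 1, 1, 4, 1, 2, 2, 2, 5, 1 of the example that follows. `lzmw_encode("yabbadabbadabbadoo", "yabdo")` emits 13 codes, the count given in Exercise 3.8.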

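The decompression algorithm

The decompression algorithm follows the same scheme as the LZW decoder. Each code read is replaced by its word S, and after each word the decoder adds P + S (rather than P plus one character) to its dictionary, mirroring the encoder.

Example of compression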

The 12-character string aaaaabbbaaaa is encoded.

Initial dictionary: 1. a, 2. b, 3. c

Here P is the previous word, S the current word, and k the output code (the index of S).

P        S     P + S   k   New entry   Comment
(empty)  a     a       1   -           nothing is added; P := a
a        a     aa      1   4. aa       aa is added; P := a
a        aa    aaa     4   5. aaa      aaa is added; P := aa
aa       a     aaa     1   -           nothing is added (aaa exists); P := a
a        b     ab      2   6. ab       ab is added; P := b
b        b     bb      2   7. bb       bb is added; P := b
b        b     bb      2   -           nothing is added (bb exists); P := b
b        aaa   baaa    5   8. baaa     baaa is added; P := aaa
aaa      a     aaaa    1   9. aaaa     aaaa is added; P := a
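To check the example, the nine codes 1 1 4 1 2 2 2 5 1 can be fed to a decoder sketch. The function name `lzmw_decode` and the 1-based codes are assumptions, as in the encoder. The decoder adds P + S as soon as it has decoded S, which is before it reads the next code. Every code it receives is therefore already defined, and LZW's special case for a not-yet-created code does not arise here.

```python
def lzmw_decode(codes, alphabet):
    """LZMW decoder sketch mirroring the encoder's P + S update rule."""
    # code -> phrase; codes 1..len(alphabet) are the single symbols
    phrases = {code: ch for code, ch in enumerate(alphabet, start=1)}
    known = set(phrases.values())  # phrases already present, to skip duplicates
    out = []
    prev = ""  # P: the previously decoded word
    for code in codes:
        current = phrases[code]    # S: always defined at this point
        out.append(current)
        new_word = prev + current  # same P + S the encoder added
        if new_word not in known:
            phrases[len(phrases) + 1] = new_word
            known.add(new_word)
        prev = current
    return "".join(out)
```

With the codes from the table, `lzmw_decode([1, 1, 4, 1, 2, 2, 2, 5, 1], "abc")` reconstructs aaaaabbbaaaa and builds the same dictionary entries 4 to 9.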

3.13.5 LZW Variants

A word-based LZW variant is described in Section 8.6.2. LZW is an adaptive data compression method, but it is slow to adapt to its input, since strings in the dictionary get only one character longer at a time. Exercise 3.4 shows that a string of a million a's (which, of course, is highly redundant) produces dictionary phrases the longest of which contains only 1,414 a's.
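The 1,414 figure can be checked with a short Python sketch. It runs the LZW dictionary-update rule on a million a's and reports the longest phrase added. The function name and the representation of the trie as a dictionary keyed by (prefix code, next symbol) are choices made for this illustration, and the emitted codes are not collected. The result agrees with a quick count. The phrase of length k + 1 is added only after phrases of lengths 1, 2, ..., k have been emitted, which takes k(k + 1)/2 symbols plus one more. For k = 1413 that is 998,992 symbols, which fits in 1,000,000. For k = 1414 it would be 1,000,406, which does not.

```python
def lzw_longest_phrase(text, alphabet):
    """Run the LZW dictionary-update rule and return the longest phrase length."""
    # Trie stored as a dict: (prefix_code, next_symbol) -> code.
    # A prefix_code of None denotes the empty string.
    trie = {(None, ch): code for code, ch in enumerate(alphabet)}
    length = {code: 1 for code in trie.values()}  # phrase length per code
    next_code = len(alphabet)
    w = None  # code of the current match (None = empty match)
    for ch in text:
        if (w, ch) in trie:
            w = trie[(w, ch)]  # extend the current match by one symbol
        else:
            # LZW would emit w here; then it adds w + ch, one symbol longer
            trie[(w, ch)] = next_code
            length[next_code] = length[w] + 1
            next_code += 1
            w = trie[(None, ch)]  # restart the match with the current symbol
    return max(length.values())


print(lzw_longest_phrase("a" * 1_000_000, "a"))
```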

The LZMW method, Section 3.14, is a variant of LZW that overcomes this problem. Its main principle is this: Instead of adding I plus one character of the next phrase to the dictionary, add I plus the entire next phrase to the dictionary.
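For a string of identical symbols such as the one in Exercise 3.4, the LZMW rule can be simulated by tracking only phrase lengths, because every phrase is a run of the same symbol. The sketch below assumes that the dictionary starts with just the single symbol, and its function name is hypothetical. It shows that the successive match lengths follow the Fibonacci numbers 1, 1, 2, 3, 5, 8, ... for most of the input. Phrases therefore grow geometrically, rather than by one symbol per phrase as in LZW.

```python
def lzmw_unary_match_lengths(n):
    """Lengths of the successive LZMW matches on n copies of one symbol."""
    lengths = {1}   # dictionary phrases, stored by length only
    prev = 0        # length of the previous match (0 = no previous match yet)
    remaining = n   # symbols not yet encoded
    matches = []
    while remaining > 0:
        # the longest dictionary phrase that fits in the unread input
        cur = max(k for k in lengths if k <= remaining)
        matches.append(cur)
        if prev:
            lengths.add(prev + cur)  # LZMW: previous match + current match
        prev = cur
        remaining -= cur
    return matches


print(lzmw_unary_match_lengths(20))  # [1, 1, 2, 3, 5, 8]
```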

The LZAP method, Section 3.15, is yet another variant based on this idea: Instead of just concatenating the last two phrases and placing the result in the dictionary, place all prefixes of the concatenation in the dictionary. More specifically, if S and T are the last two matches, add St to the dictionary for every nonempty prefix t of T, including T itself.

Table 3.23 summarizes the principles of LZW, LZMW, and LZAP and shows how they naturally suggest another variant, LZY.

Table 3.23: Four Variants of LZW.

                              Increment string by
Add a dictionary string       One symbol      Several symbols
per phrase                    LZW             LZMW
per input symbol              LZY             LZAP

LZW adds one dictionary string per phrase and increments strings by one symbol at a time. LZMW adds one dictionary string per phrase and increments strings by several symbols at a time. LZAP adds one dictionary string per input symbol and increments strings by several symbols at a time. LZY, Section 3.16, fits the fourth cell of Table 3.23. It is a method that adds one dictionary string per input symbol and increments strings by one symbol at a time.

3.14 LZMW
This LZW variant, developed by V. Miller and M. Wegman [Miller and Wegman 85], is based on two principles:

1. When the dictionary gets full, the least-recently-used dictionary phrase is deleted. There are several ways to select this phrase, and the developers suggest that any reasonable way of doing so will work. One possibility is to identify all the dictionary phrases S for which there are no phrases Sa (nothing has been appended to S, meaning that S hasn't been used since it was placed in the dictionary) and delete the oldest of them. An auxiliary data structure has to be built and maintained in this case, pointing to dictionary phrases according to their age (the first pointer always points to the oldest phrase). The first 256 dictionary phrases should never be deleted.

2. Each phrase added to the dictionary is a concatenation of two strings, the previous match (S' below) and the current one (S). This is in contrast to LZW, where each phrase added is the concatenation of the current match and the first symbol of the next match. The pseudo-code algorithm illustrates this.

By adding the concatenation S'S to the LZMW dictionary, dictionary phrases can grow by more than one symbol at a time. This means that LZMW dictionary phrases are more "natural" units of the input (e.g., if the input is text in a natural language, dictionary phrases will tend to be complete words or even several words in that language). This, in turn, implies that the LZMW dictionary generally adapts to the input faster than the LZW dictionary. Table 3.24 illustrates the LZMW method by applying it to the string sir sid eastman easily teases sea sick seals.

LZMW adapts to its input faster than LZW but has the following three disadvantages:

1. The dictionary data structure cannot be the simple LZW trie, since not every prefix of a dictionary phrase is included in the dictionary. This means that the one-symbol-at-a-time search method used in LZW will not work. Instead, when a phrase S is added to the LZMW dictionary, every prefix of S must be added to the data structure, and every node in the data structure must have a tag indicating whether the node is in the dictionary or not.

2. Finding the longest string may require backtracking. If the dictionary contains aaaa and aaaaaaaa, we have to read the eighth symbol of phrase aaaaaaab to realize that we have to choose the shorter phrase. This implies that dictionary searches in LZMW are slower than in LZW. This problem does not apply to the LZMW decoder.

3. A phrase may be added to the dictionary twice. This again complicates the choice of data structure for the dictionary.
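The first two disadvantages can be illustrated with a small trie in which every prefix of a phrase is a node and a flag marks the nodes that are actual dictionary phrases. In this sketch the class and function names are hypothetical. `longest_match` keeps walking while the trie has a matching child and remembers the last tagged node. For the dictionary {a, aaaa, aaaaaaaa} and the input aaaaaaab, it reads all eight symbols before falling back to aaaa.

```python
class TrieNode:
    """One node per prefix; in_dictionary marks prefixes that are real phrases."""

    def __init__(self):
        self.children = {}          # next symbol -> TrieNode
        self.in_dictionary = False  # the tag described in disadvantage 1


def add_phrase(root, phrase):
    """Insert every prefix of phrase as a node, but tag only the full phrase."""
    node = root
    for ch in phrase:
        node = node.children.setdefault(ch, TrieNode())
    node.in_dictionary = True


def longest_match(root, text, start=0):
    """Length of the longest dictionary phrase that is a prefix of text[start:].

    The walk continues through untagged nodes while the input matches and
    remembers the last tagged node; when the walk fails, the encoder falls
    back (backtracks) to that shorter phrase.
    """
    node, best = root, 0
    for i in range(start, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break
        if node.in_dictionary:
            best = i - start + 1
    return best


root = TrieNode()
for p in ("a", "aaaa", "aaaaaaaa"):
    add_phrase(root, p)
print(longest_match(root, "aaaaaaab"))  # 4
```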

Exercise 3.7: Use the LZMW method to compress the string swiss miss.

Answer 3.7: This is straightforward (Table Ans.25) but not very efficient, since only one two-symbol dictionary phrase is used.

Table Ans.25: LZMW Compression of swiss miss .

Exercise 3.8: Compress the string yabbadabbadabbadoo using LZMW.

Answer 3.8: Table Ans.26 shows all the steps. In spite of the short input, the result is quite good (13 codes to compress 18 symbols) because the input contains concentrations of a's and b's.

3.15 LZAP
LZAP is an extension of LZMW. The AP stands for All Prefixes [Storer 88]. LZAP adapts to its input fast, like LZMW, but eliminates the need for backtracking, a feature that makes it faster than LZMW. The principle is this: Instead of adding the concatenation S'S of the last two phrases to the dictionary, add all the strings S't where t is a prefix of S (including S itself). Thus if S' = a and S = bcd, add phrases ab, abc, and abcd to the LZAP dictionary. Table 3.25 shows the matches and the phrases added to the dictionary for yabbadabbadabbadoo.

In step 7 the encoder concatenates d to the two prefixes of ab and adds the two phrases da and dab to the dictionary. In step 9 it concatenates ba to the three prefixes of dab and adds the resulting three phrases bad, bada, and badab to the dictionary. LZAP adds more phrases to its dictionary than does LZMW, so it takes more bits to represent the position of a phrase. At the same time, LZAP provides a bigger selection of dictionary phrases as matches for the input string, so it ends up compressing slightly better than LZMW while being faster (because of the simpler dictionary data structure).
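Here is a Python sketch of the LZAP encoder that records the match and the phrases added at each step. The function name, the `alphabet` parameter, and the 1-based codes are assumptions. Every prefix of every LZAP phrase is itself in the dictionary, so the match can be extended one symbol at a time and stopped at the first missing extension, with no backtracking.

```python
def lzap_encode(text, alphabet):
    """LZAP encoder sketch; returns the codes and a per-step log."""
    # phrase -> code; codes 1..len(alphabet) are the single symbols
    dictionary = {ch: code for code, ch in enumerate(alphabet, start=1)}
    codes, log = [], []
    prev = ""  # S': the previous match
    i = 0
    while i < len(text):
        # The dictionary is prefix-closed, so greedy extension finds the
        # longest match: stop as soon as the next extension is missing.
        n = 1
        while i + n < len(text) and text[i:i + n + 1] in dictionary:
            n += 1
        current = text[i:i + n]  # S: the current match
        codes.append(dictionary[current])
        added = []
        if prev:
            # add S' + t for every nonempty prefix t of S, including S itself
            for k in range(1, n + 1):
                phrase = prev + current[:k]
                if phrase not in dictionary:
                    dictionary[phrase] = len(dictionary) + 1
                    added.append(phrase)
        log.append((current, added))
        prev = current
        i += n
    return codes, log
```

On yabbadabbadabbadoo with the alphabet y, a, b, d, o, the log entries for steps 7 and 9 are (ab, [da, dab]) and (dab, [bad, bada, badab]), as described above. The sketch emits 12 codes, one fewer than the 13 reported for LZMW in Exercise 3.8.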

The LZW (Lempel-Ziv-Welch) algorithm was developed by Terry A. Welch from an earlier compression method published by Abraham Lempel and Jacob Ziv (LZ78, 1978). The algorithm uses a dictionary technique: strings of characters are replaced by table codes, and a new table entry is created each time a new string is read. The table serves as a reference for subsequent input strings. In the original LZW algorithm the dictionary holds 4096 entries (12-bit codes). The first 256 entries are used for single characters (extended ASCII), and the rest are used for character pairs or longer strings in the input data. LZW thus compresses by using table codes 256 to 4095 to encode byte pairs or strings. With this method, many strings can be encoded by referring to strings that appeared earlier in the text. The complete LZW compression algorithm:

1. Initialize the dictionary with all the basic characters: {A..Z, a..z, 0..9}.
2. W <- the first character in the character stream.
3. K <- the next character in the character stream.
4. Check whether (W + K) is in the dictionary.
   a. If yes, W <- W + K (concatenate W and K into a new string).
   b. If not:
      - Output the code that stands for string W.
      - Add string (W + K) to the dictionary and assign it the next unused code.
      - W <- K.
5. Check whether there is another character in the character stream.
   a. If yes, return to step 3.
   b. If not, output the code that stands for string W and terminate the process (stop).
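The steps above translate almost line by line into Python. In this sketch the function name, the `alphabet` parameter, and codes starting at 1 are assumptions, because the text only says that the dictionary is initialized with the basic characters. With the alphabet A, B, C, the sketch encodes the string ABBABABAC from the example below as 1 2 2 4 7 3 and adds AB, BB, BA, ABA, and ABAC as entries 4 to 8.

```python
def lzw_encode(text, alphabet):
    """LZW encoder following steps 1-5 above; codes start at 1 (an assumption)."""
    dictionary = {ch: code for code, ch in enumerate(alphabet, start=1)}  # step 1
    output = []
    if not text:
        return output, dictionary
    w = text[0]                   # step 2: W <- first character
    for k in text[1:]:            # step 3: K <- next character
        if w + k in dictionary:   # step 4a: extend the current string
            w = w + k
        else:                     # step 4b
            output.append(dictionary[w])             # output the code for W
            dictionary[w + k] = len(dictionary) + 1  # next unused code
            w = k
    output.append(dictionary[w])  # step 5b: no characters left, output W
    return output, dictionary


codes, table = lzw_encode("ABBABABAC", "ABC")
print(codes)  # [1, 2, 2, 4, 7, 3]
```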

Flowchart of the LZW algorithm

As an example, the string ABBABABAC will be compressed with LZW. The dictionary is initialized with the three basic characters present: A, B, and C.

LZW compression steps

The position column gives the current position in the character stream, and the character column gives the character at that position. The dictionary column gives the new string added to the dictionary, with its index number in square brackets. The output column gives the output code produced by the compression step.

Result of the compression process

LZW decompression is not very different from compression. During decompression the dictionary table is rebuilt from the compressed input, so the dictionary does not need to be included in the compressed data. The LZW decompression algorithm:

1. Initialize the dictionary with all the basic characters: {A..Z, a..z, 0..9}.
2. CW <- the first code from the code stream (one of the basic characters).
3. Look up the dictionary and output the string for that code (string.CW) to the character stream.
4. PW <- CW; CW <- the next code from the code stream.
5. Is string.CW in the dictionary?
   a. If yes:
      - Output string.CW to the character stream.
      - P <- string.PW.
      - C <- the first character of string.CW.
      - Add string (P + C) to the dictionary.
   b. If not:
      - P <- string.PW.
      - C <- the first character of string.PW.
      - Output string (P + C) to the character stream and add it to the dictionary (it now corresponds to CW).
6. Are there more codes in the code stream?
   a. If yes, return to step 4.
   b. If not, terminate the process (stop).
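Here is a matching Python sketch of the decompression steps, under the same assumptions: a hypothetical function name and codes starting at 1. Step 5b handles a code that the decoder has not defined yet. This happens in the ABBABABAC example, where code 7 (ABA) arrives one step before the decoder would have created it.

```python
def lzw_decode(codes, alphabet):
    """LZW decoder following steps 1-6 above; codes start at 1 (an assumption)."""
    dictionary = {code: ch for code, ch in enumerate(alphabet, start=1)}  # step 1
    if not codes:
        return ""
    prev = dictionary[codes[0]]          # steps 2-3: first code is a basic character
    out = [prev]
    for cw in codes[1:]:                 # step 4: PW <- CW, CW <- next code
        if cw in dictionary:             # step 5a
            entry = dictionary[cw]
            dictionary[len(dictionary) + 1] = prev + entry[0]
        else:                            # step 5b: CW is the entry being created
            entry = prev + prev[0]
            dictionary[len(dictionary) + 1] = entry
        out.append(entry)
        prev = entry
    return "".join(out)                  # step 6: no codes left


print(lzw_decode([1, 2, 2, 4, 7, 3], "ABC"))  # ABBABABAC
```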