
Lecture 3: Basic Compression, Entropy Coding (Statistical): RLE, Huffman, Arithmetic

Arini, MT, MSc

1.1.3. Arithmetic Encoding


Arithmetic coding bypasses the idea of replacing an input symbol with a specific code. It replaces a stream of input symbols with a single floating-point output number.

http://www.ics.uci.edu/~dan/pubs/DC-Sec3.html#Sec_3.4


Character    Probability   Range
^(space)     1/10          0.00 ≤ r < 0.10
A            1/10          0.10 ≤ r < 0.20
B            1/10          0.20 ≤ r < 0.30
E            1/10          0.30 ≤ r < 0.40
G            1/10          0.40 ≤ r < 0.50
I            1/10          0.50 ≤ r < 0.60
L            2/10          0.60 ≤ r < 0.80
S            1/10          0.80 ≤ r < 0.90
T            1/10          0.90 ≤ r < 1.00

Suppose that we want to encode the message BILL GATES



Encoding algorithm for arithmetic coding:

low = 0.0 ; high = 1.0 ;
while not EOF do
    range = high - low ;
    read(c) ;
    high = low + range * high_range(c) ;
    low  = low + range * low_range(c) ;
end do
output(low) ;
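As a concrete sketch, the loop above can be written in Python. This is a minimal floating-point illustration only (practical arithmetic coders use integer arithmetic with renormalization to avoid running out of precision); the `RANGES` table is the BILL GATES model from the previous slide:

```python
# Symbol model from the BILL GATES example: char -> (low_range, high_range).
RANGES = {
    ' ': (0.00, 0.10), 'A': (0.10, 0.20), 'B': (0.20, 0.30),
    'E': (0.30, 0.40), 'G': (0.40, 0.50), 'I': (0.50, 0.60),
    'L': (0.60, 0.80), 'S': (0.80, 0.90), 'T': (0.90, 1.00),
}

def arith_encode(message):
    """Narrow [low, high) once per symbol, as in the pseudocode above."""
    low, high = 0.0, 1.0
    for c in message:
        rng = high - low
        high = low + rng * RANGES[c][1]
        low = low + rng * RANGES[c][0]
    return low, high

low, high = arith_encode("BILL GATES")
print(low, high)  # ≈ 0.2572167752, 0.2572167756
```

Any value in the returned interval identifies the message; the function name and table layout here are illustrative, not part of any standard API.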


To encode the first character, B, the final coded message has to be a number greater than or equal to 0.20 and less than 0.30. Applying the update rules:

range = 1.0 - 0.0 = 1.0
high = 0.0 + 1.0 * 0.3 = 0.3
low  = 0.0 + 1.0 * 0.2 = 0.2

After the first character is encoded, the low end of the range changes from 0.00 to 0.20 and the high end from 1.00 to 0.30.



The next character to be encoded, the letter I, owns the range 0.50 to 0.60 within the new subrange of 0.20 to 0.30. So the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range. Thus, this number is further restricted to 0.25 to 0.26.



Note that any number between 0.25 and 0.26 is a legal encoding of BI. Thus, the number best suited for binary representation is selected.

Applying range = high - low; high = low + range * high_range(c); low = low + range * low_range(c):

B: range = 1 - 0 = 1;            high = 0 + 1 * 0.3 = 0.3;            low = 0 + 1 * 0.2 = 0.2
I: range = 0.3 - 0.2 = 0.1;      high = 0.2 + 0.1 * 0.6 = 0.26;       low = 0.2 + 0.1 * 0.5 = 0.25
L: range = 0.26 - 0.25 = 0.01;   high = 0.25 + 0.01 * 0.8 = 0.258;    low = 0.25 + 0.01 * 0.6 = 0.256
L: range = 0.258 - 0.256 = 0.002; high = 0.256 + 0.002 * 0.8 = 0.2576; low = 0.256 + 0.002 * 0.6 = 0.2572
Next...

[Figure: the interval narrows with each character of BILL GATES, from [0.2, 0.3) after B down to [0.2572167752, 0.2572167756) after the final S; each panel rescales the full symbol scale (space, A, B, E, G, I, L, S, T) to the current interval.]



Character B I L L ^(space) G A T E S Low 0.2 0.25 0.256 0.2572 0.25720 0.257216 0.2572164 0.25721676 0.257216772 0.2572167752 High 0.3 0.26 0.258 0.2576 0.25724 0.257220 0.2572168 0.2572168 0.257216776 0.2572167756
11



So, the final value 0.2572167752 (or any value between 0.2572167752 and 0.2572167756, if the length of the encoded message is known at the decoding end) will uniquely encode the message BILL GATES.

1.1.3. Arithmetic Decoding


Decoding is the inverse process.

Since 0.2572167752 falls between 0.2 and 0.3, the first character must be B.

Remove the effect of B from 0.2572167752 by first subtracting the low value of B, 0.2, giving 0.0572167752; then divide by the width of the range of B, 0.1. This gives a value of 0.572167752.



Then calculate where that value lands: in the range of the next letter, I. The process repeats until 0 is reached or the known length of the message is exhausted.



Decoding algorithm:

r = input_number
repeat
    search c such that r falls in its range
    output(c) ;
    r = r - low_range(c) ;
    r = r / (high_range(c) - low_range(c)) ;
until EOF or the length of the message is reached
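The decoding loop can likewise be sketched in Python (floating point only, so adequate for short messages; `RANGES` repeats the BILL GATES model, and the function name is illustrative):

```python
# Same BILL GATES model: char -> (low_range, high_range).
RANGES = {
    ' ': (0.00, 0.10), 'A': (0.10, 0.20), 'B': (0.20, 0.30),
    'E': (0.30, 0.40), 'G': (0.40, 0.50), 'I': (0.50, 0.60),
    'L': (0.60, 0.80), 'S': (0.80, 0.90), 'T': (0.90, 1.00),
}

def arith_decode(r, length):
    """Invert the encoder: find the symbol whose range holds r, then rescale r."""
    out = []
    for _ in range(length):
        for c, (lo, hi) in RANGES.items():
            if lo <= r < hi:
                out.append(c)
                r = (r - lo) / (hi - lo)  # subtract low, divide by range width
                break
    return ''.join(out)

# 0.2572167754 lies inside the final interval [0.2572167752, 0.2572167756);
# it is chosen away from the interval ends so rounding cannot flip a comparison.
print(arith_decode(0.2572167754, 10))  # BILL GATES
```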



In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol. The new range is proportional to the predefined probability attached to that symbol. Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.




The coding rate theoretically approaches the high-order entropy. It is not as popular as Huffman coding because multiplications and divisions are needed.

1.2. Other Example




Again we consider the following three symbols: a (with prob. 0.5), b (with prob. 0.3), c (with prob. 0.2). Suppose we encode the same message as in the previous example: a a a b c a.

The encoding process:


[Figure: encoding of a a a b c a. With a: [0, 0.5), b: [0.5, 0.8), c: [0.8, 1.0), the interval narrows from [0, 1) to [0, 0.5), [0, 0.25), [0, 0.125), [0.0625, 0.1), [0.0925, 0.1), and finally [0.0925, 0.09625).]

The final interval therefore is [0.0925, 0.09625). Any number in this final interval can be used for the decoding process. For instance, we pick 0.09375 ∈ [0.0925, 0.09625).

[Figure: decoding of 0.09375, testing the value against the successively narrowed intervals to recover a, a, a, b, c, a.]

0.09375 ∈ [0, 0.5)           output a

0.09375 ∈ [0, 0.25)          output a

0.09375 ∈ [0, 0.125)         output a

0.09375 ∈ [0.0625, 0.1)      output b

0.09375 ∈ [0.0925, 0.1)      output c

0.09375 ∈ [0.0925, 0.09625)  output a

Decoder

Therefore, the decoder successfully identifies the source sequence a a a b c a. Note that 0.09375 can be represented by the binary sequence 0 0 0 1 1 (bit weights 0.5, 0.25, 0.125, 0.0625, 0.03125), i.e. 0.09375 = 0.0625 + 0.03125.

We only need 5 bits to represent the message (rate = 5/6 bits per symbol).
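The whole three-symbol example can be checked with a short round-trip script (a sketch, assuming the ranges a: [0, 0.5), b: [0.5, 0.8), c: [0.8, 1.0) from the example; the function names are illustrative):

```python
# Model for the three-symbol example: a, b, c with probs 0.5, 0.3, 0.2.
RANGES = {'a': (0.0, 0.5), 'b': (0.5, 0.8), 'c': (0.8, 1.0)}

def encode(msg):
    """Narrow [low, high) once per symbol and return the final interval."""
    low, high = 0.0, 1.0
    for s in msg:
        rng = high - low
        high = low + rng * RANGES[s][1]
        low = low + rng * RANGES[s][0]
    return low, high

def decode(r, n):
    """Recover n symbols from a value r inside the final interval."""
    out = []
    for _ in range(n):
        for s, (lo, hi) in RANGES.items():
            if lo <= r < hi:
                out.append(s)
                r = (r - lo) / (hi - lo)
                break
    return ''.join(out)

print(encode("aaabca"))    # ≈ (0.0925, 0.09625)
print(decode(0.09375, 6))  # aaabca
```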

Discussions

Given the same message a a a b c a:
No compression: 12 bits
Huffman codes: 8 bits
Lempel-Ziv: 13 bits
Arithmetic codes: 5 bits

II. Golomb Coding


Decode 101111 (Golomb code with m = 10).

Encoding of quotient part q (unary):
q   output bits
0   0
1   10
2   110
3   1110
4   11110
5   111110
6   1111110
:   :
N   <N repetitions of 1>0

Encoding of remainder part r (truncated binary, m = 10):
r   offset   output bits
0   0        000
1   1        001
2   2        010
3   3        011
4   4        100
5   5        101
6   12       1100
7   13       1101
8   14       1110
9   15       1111

Reading 101111: the unary prefix 10 gives q = 1; the remaining bits 1111 give r = 15 - 6 = 9; so a = 10*1 + 9 = 19.
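A sketch of the full Golomb encode/decode for m = 10 (unary quotient, truncated-binary remainder; the `cutoff` value 2^b - m = 6 reproduces the offset column in the table above; names are illustrative):

```python
import math

def golomb_encode(n, m=10):
    """Golomb code: unary quotient + truncated-binary remainder."""
    q, r = divmod(n, m)
    b = math.ceil(math.log2(m))   # 4 bits for m = 10
    cutoff = (1 << b) - m         # 6: remainders 0..5 use b-1 = 3 bits
    bits = '1' * q + '0'          # unary quotient: q ones, then a zero
    if r < cutoff:
        bits += format(r, '0%db' % (b - 1))
    else:
        bits += format(r + cutoff, '0%db' % b)  # offset remainder: r + 6 in 4 bits
    return bits

def golomb_decode(bits, m=10):
    """Decode a single Golomb codeword back to the integer it encodes."""
    q = 0
    while bits[q] == '1':         # count the unary ones
        q += 1
    i = q + 1                     # skip the terminating zero
    b = math.ceil(math.log2(m))
    cutoff = (1 << b) - m
    r = int(bits[i:i + b - 1], 2)
    if r >= cutoff:               # first b-1 bits >= cutoff: take one more bit
        r = int(bits[i:i + b], 2) - cutoff
    return q * m + r

print(golomb_encode(19))        # 101111
print(golomb_decode('101111'))  # 19
```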

                             Huffman   Golomb   Adaptive Golomb
Without codeword table       NO        YES      YES
Flexibility and adaptation   GOOD      MIDDLE   GOOD
