
Lecture 3: Basic Compression, Entropy Coding (Statistical): RLE, Huffman, Arithmetic

Arini, MT, MSc

1.1.3. Arithmetic Encoding


Arithmetic coding bypasses the idea of replacing an input symbol with a specific code. It replaces a stream of input symbols with a single floating-point output number.

http://www.ics.uci.edu/~dan/pubs/DC-Sec3.html#Sec_3.4


Character    Probability   Range
^(space)     1/10          0.00 ≤ r < 0.10
A            1/10          0.10 ≤ r < 0.20
B            1/10          0.20 ≤ r < 0.30
E            1/10          0.30 ≤ r < 0.40
G            1/10          0.40 ≤ r < 0.50
I            1/10          0.50 ≤ r < 0.60
L            2/10          0.60 ≤ r < 0.80
S            1/10          0.80 ≤ r < 0.90
T            1/10          0.90 ≤ r < 1.00

Suppose that we want to encode the message BILL GATES



Encoding algorithm for arithmetic coding:

low = 0.0 ; high = 1.0 ;
while not EOF do
    range = high - low ;
    read(c) ;
    high = low + range * high_range(c) ;
    low  = low + range * low_range(c) ;
end do
output(low) ;
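As a concrete sketch, the loop above can be written in Python. This is a minimal floating-point illustration only (practical arithmetic coders use integer arithmetic with renormalization to avoid running out of precision); the `RANGES` table is the BILL GATES model from the previous slide:

```python
# Symbol model from the BILL GATES example: char -> (low_range, high_range).
RANGES = {
    ' ': (0.00, 0.10), 'A': (0.10, 0.20), 'B': (0.20, 0.30),
    'E': (0.30, 0.40), 'G': (0.40, 0.50), 'I': (0.50, 0.60),
    'L': (0.60, 0.80), 'S': (0.80, 0.90), 'T': (0.90, 1.00),
}

def arith_encode(message):
    """Narrow [low, high) once per symbol, as in the pseudocode above."""
    low, high = 0.0, 1.0
    for c in message:
        rng = high - low
        high = low + rng * RANGES[c][1]
        low = low + rng * RANGES[c][0]
    return low, high

low, high = arith_encode("BILL GATES")
print(low, high)  # ≈ 0.2572167752, 0.2572167756
```

Any value in the returned interval identifies the message; the function name and table layout here are illustrative, not part of any standard API.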


To encode the first character, B, the final coded message has to be a number greater than or equal to 0.20 and less than 0.30. Applying the update rules:

range = 1.0 - 0.0 = 1.0
high = 0.0 + 1.0 * 0.3 = 0.3
low  = 0.0 + 1.0 * 0.2 = 0.2

After the first character is encoded, the low end of the range changes from 0.00 to 0.20 and the high end from 1.00 to 0.30.



The next character to be encoded, the letter I, owns the range 0.50 to 0.60 within the new subrange of 0.20 to 0.30. So the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range. Thus, this number is further restricted to 0.25 to 0.26.



Note that any number between 0.25 and 0.26 is a legal encoding of BI. Thus, the number best suited for binary representation is selected.

Applying range = high - low; high = low + range * high_range(c); low = low + range * low_range(c):

B: range = 1 - 0 = 1;            high = 0 + 1 * 0.3 = 0.3;            low = 0 + 1 * 0.2 = 0.2
I: range = 0.3 - 0.2 = 0.1;      high = 0.2 + 0.1 * 0.6 = 0.26;       low = 0.2 + 0.1 * 0.5 = 0.25
L: range = 0.26 - 0.25 = 0.01;   high = 0.25 + 0.01 * 0.8 = 0.258;    low = 0.25 + 0.01 * 0.6 = 0.256
L: range = 0.258 - 0.256 = 0.002; high = 0.256 + 0.002 * 0.8 = 0.2576; low = 0.256 + 0.002 * 0.6 = 0.2572
Next...

[Figure: the interval narrows with each character of BILL GATES, from [0.2, 0.3) after B down to [0.2572167752, 0.2572167756) after the final S; each panel rescales the full symbol scale (space, A, B, E, G, I, L, S, T) to the current interval.]



Character B I L L ^(space) G A T E S Low 0.2 0.25 0.256 0.2572 0.25720 0.257216 0.2572164 0.25721676 0.257216772 0.2572167752 High 0.3 0.26 0.258 0.2576 0.25724 0.257220 0.2572168 0.2572168 0.257216776 0.2572167756
11



So, the final value 0.2572167752 (or any value between 0.2572167752 and 0.2572167756, if the length of the encoded message is known at the decoding end) will uniquely encode the message BILL GATES.

1.1.3. Arithmetic Decoding


Decoding is the inverse process.

Since 0.2572167752 falls between 0.2 and 0.3, the first character must be B.

Remove the effect of B from 0.2572167752 by first subtracting the low value of B, 0.2, giving 0.0572167752; then divide by the width of the range of B, 0.1. This gives a value of 0.572167752.



Then calculate where that value lands: in the range of the next letter, I. The process repeats until 0 is reached or the known length of the message is exhausted.



Decoding algorithm:

r = input_number
repeat
    search c such that r falls in its range
    output(c) ;
    r = r - low_range(c) ;
    r = r / (high_range(c) - low_range(c)) ;
until EOF or the length of the message is reached
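The decoding loop can likewise be sketched in Python (floating point only, so adequate for short messages; `RANGES` repeats the BILL GATES model, and the function name is illustrative):

```python
# Same BILL GATES model: char -> (low_range, high_range).
RANGES = {
    ' ': (0.00, 0.10), 'A': (0.10, 0.20), 'B': (0.20, 0.30),
    'E': (0.30, 0.40), 'G': (0.40, 0.50), 'I': (0.50, 0.60),
    'L': (0.60, 0.80), 'S': (0.80, 0.90), 'T': (0.90, 1.00),
}

def arith_decode(r, length):
    """Invert the encoder: find the symbol whose range holds r, then rescale r."""
    out = []
    for _ in range(length):
        for c, (lo, hi) in RANGES.items():
            if lo <= r < hi:
                out.append(c)
                r = (r - lo) / (hi - lo)  # subtract low, divide by range width
                break
    return ''.join(out)

# 0.2572167754 lies inside the final interval [0.2572167752, 0.2572167756);
# it is chosen away from the interval ends so rounding cannot flip a comparison.
print(arith_decode(0.2572167754, 10))  # BILL GATES
```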



In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol. The new range is proportional to the predefined probability attached to that symbol. Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.




The coding rate theoretically approaches the high-order entropy. It is not as popular as Huffman coding because multiplications and divisions are needed.

1.2. Other Example




Again we consider the following three symbols: a (with prob. 0.5), b (with prob. 0.3), c (with prob. 0.2). Suppose we encode the same message as in the previous example: a a a b c a.

The encoding process:


[Figure: encoding of a a a b c a. With a: [0, 0.5), b: [0.5, 0.8), c: [0.8, 1.0), the interval narrows from [0, 1) to [0, 0.5), [0, 0.25), [0, 0.125), [0.0625, 0.1), [0.0925, 0.1), and finally [0.0925, 0.09625).]

The final interval therefore is [0.0925, 0.09625). Any number in this final interval can be used for the decoding process. For instance, we pick 0.09375 ∈ [0.0925, 0.09625).

[Figure: decoding of 0.09375, testing the value against the successively narrowed intervals to recover a, a, a, b, c, a.]

0.09375 ∈ [0, 0.5)           output a

0.09375 ∈ [0, 0.25)          output a

0.09375 ∈ [0, 0.125)         output a

0.09375 ∈ [0.0625, 0.1)      output b

0.09375 ∈ [0.0925, 0.1)      output c

0.09375 ∈ [0.0925, 0.09625)  output a

Decoder

Therefore, the decoder successfully identifies the source sequence a a a b c a. Note that 0.09375 can be represented by the binary sequence 0 0 0 1 1 (bit weights 0.5, 0.25, 0.125, 0.0625, 0.03125), i.e. 0.09375 = 0.0625 + 0.03125.

We only need 5 bits to represent the message (rate = 5/6 bits per symbol).
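The whole three-symbol example can be checked with a short round-trip script (a sketch, assuming the ranges a: [0, 0.5), b: [0.5, 0.8), c: [0.8, 1.0) from the example; the function names are illustrative):

```python
# Model for the three-symbol example: a, b, c with probs 0.5, 0.3, 0.2.
RANGES = {'a': (0.0, 0.5), 'b': (0.5, 0.8), 'c': (0.8, 1.0)}

def encode(msg):
    """Narrow [low, high) once per symbol and return the final interval."""
    low, high = 0.0, 1.0
    for s in msg:
        rng = high - low
        high = low + rng * RANGES[s][1]
        low = low + rng * RANGES[s][0]
    return low, high

def decode(r, n):
    """Recover n symbols from a value r inside the final interval."""
    out = []
    for _ in range(n):
        for s, (lo, hi) in RANGES.items():
            if lo <= r < hi:
                out.append(s)
                r = (r - lo) / (hi - lo)
                break
    return ''.join(out)

print(encode("aaabca"))    # ≈ (0.0925, 0.09625)
print(decode(0.09375, 6))  # aaabca
```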

Discussions

Given the same message a a a b c a:
No compression: 12 bits
Huffman codes: 8 bits
Lempel-Ziv: 13 bits
Arithmetic codes: 5 bits

II. Golomb Coding


Decode 101111 (Golomb code with m = 10).

Encoding of quotient part q (unary):
q   output bits
0   0
1   10
2   110
3   1110
4   11110
5   111110
6   1111110
:   :
N   <N repetitions of 1>0

Encoding of remainder part r (truncated binary, m = 10):
r   offset   output bits
0   0        000
1   1        001
2   2        010
3   3        011
4   4        100
5   5        101
6   12       1100
7   13       1101
8   14       1110
9   15       1111

Reading 101111: the unary prefix 10 gives q = 1; the remaining bits 1111 give r = 15 - 6 = 9; so a = 10*1 + 9 = 19.
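A sketch of the full Golomb encode/decode for m = 10 (unary quotient, truncated-binary remainder; the `cutoff` value 2^b - m = 6 reproduces the offset column in the table above; names are illustrative):

```python
import math

def golomb_encode(n, m=10):
    """Golomb code: unary quotient + truncated-binary remainder."""
    q, r = divmod(n, m)
    b = math.ceil(math.log2(m))   # 4 bits for m = 10
    cutoff = (1 << b) - m         # 6: remainders 0..5 use b-1 = 3 bits
    bits = '1' * q + '0'          # unary quotient: q ones, then a zero
    if r < cutoff:
        bits += format(r, '0%db' % (b - 1))
    else:
        bits += format(r + cutoff, '0%db' % b)  # offset remainder: r + 6 in 4 bits
    return bits

def golomb_decode(bits, m=10):
    """Decode a single Golomb codeword back to the integer it encodes."""
    q = 0
    while bits[q] == '1':         # count the unary ones
        q += 1
    i = q + 1                     # skip the terminating zero
    b = math.ceil(math.log2(m))
    cutoff = (1 << b) - m
    r = int(bits[i:i + b - 1], 2)
    if r >= cutoff:               # first b-1 bits >= cutoff: take one more bit
        r = int(bits[i:i + b], 2) - cutoff
    return q * m + r

print(golomb_encode(19))        # 101111
print(golomb_decode('101111'))  # 19
```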

                             Huffman   Golomb   Adaptive Golomb
Without codeword table       NO        YES      YES
Flexibility and adaptation   GOOD      MIDDLE   GOOD
