Arithmetic coding bypasses the idea of replacing an input symbol with a specific code. Instead, it replaces a stream of input symbols with a single floating-point output number.
http://www.ics.uci.edu/~dan/pubs/DC-Sec3.html#Sec_3.4
Character    Probability    Range
(space)      1/10           0.00 ≤ r < 0.10
A            1/10           0.10 ≤ r < 0.20
B            1/10           0.20 ≤ r < 0.30
E            1/10           0.30 ≤ r < 0.40
G            1/10           0.40 ≤ r < 0.50
I            1/10           0.50 ≤ r < 0.60
L            2/10           0.60 ≤ r < 0.80
S            1/10           0.80 ≤ r < 0.90
T            1/10           0.90 ≤ r < 1.00
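As a sketch (not part of the original slides), the ranges in the table can be derived by accumulating the probabilities in table order. The helper name `build_ranges` and the list `BILL_GATES_MODEL` are hypothetical; exact `Fraction` arithmetic is used so the boundaries come out exactly as tabulated rather than as rounded floats.

```python
from fractions import Fraction

def build_ranges(probs):
    """Turn ordered (symbol, probability) pairs into half-open ranges [low, high)."""
    ranges = {}
    low = Fraction(0)
    for symbol, p in probs:
        ranges[symbol] = (low, low + p)  # each symbol starts where the previous one ended
        low += p
    if low != 1:
        raise ValueError("probabilities must sum to 1")
    return ranges

# Model from the table: every character has probability 1/10 except L with 2/10.
BILL_GATES_MODEL = [
    (" ", Fraction(1, 10)), ("A", Fraction(1, 10)), ("B", Fraction(1, 10)),
    ("E", Fraction(1, 10)), ("G", Fraction(1, 10)), ("I", Fraction(1, 10)),
    ("L", Fraction(2, 10)), ("S", Fraction(1, 10)), ("T", Fraction(1, 10)),
]

ranges = build_ranges(BILL_GATES_MODEL)
for symbol, (lo, hi) in ranges.items():
    print(repr(symbol), float(lo), float(hi))
```

The order of the list matters: changing it changes every range, so encoder and decoder must agree on the same ordering.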
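Encoding algorithm:
    low = 0.0 ; high = 1.0 ;
    while there are input symbols do
        range = high - low ;
        read(c) ;
        high = low + range * high_range(c) ;
        low = low + range * low_range(c) ;
    end do
    output(low) ;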
To encode the first character, B, properly, the final coded message has to be a number greater than or equal to 0.20 and less than 0.30.
    range = high - low ; read(c) ;
    high = low + range * high_range(c) ;
    low = low + range * low_range(c) ;
For c = B: range = 1.0 - 0.0 = 1.0, high = 0.0 + 1.0 × 0.3 = 0.3, low = 0.0 + 1.0 × 0.2 = 0.2.
After the first character is encoded, the low end of the range changes from 0.00 to 0.20 and the high end changes from 1.00 to 0.30.
The next character, I, owns the range 0.50 to 0.60, which is now applied inside the new subrange 0.20 to 0.30. So the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range. Thus, this number is further restricted to 0.25 to 0.26.
Note that any number between 0.25 and 0.26 is a legal encoding of BI. Among these, the number best suited for binary representation is selected.
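The slides do not spell out how "best suited for binary representation" is decided. One reasonable reading, assumed here, is to pick the number in [low, high) with the fewest binary digits after the point. The helper `shortest_binary_fraction` is hypothetical and only sketches that reading.

```python
import math
from fractions import Fraction

def shortest_binary_fraction(low, high):
    """Return the shortest bit string b1...bk with low <= 0.b1...bk (binary) < high.

    Assumes 0 <= low < high <= 1; an empty string stands for the value 0.
    """
    k = 0
    while True:
        scale = 2 ** k
        n = math.ceil(low * scale)  # smallest numerator with n / 2**k >= low
        if Fraction(n, scale) < high:
            return format(n, "b").zfill(k) if k else ""
        k += 1  # no k-bit fraction fits in [low, high); try one more bit

bits = shortest_binary_fraction(Fraction("0.25"), Fraction("0.26"))
print(bits)  # 01, i.e. binary 0.01 = 0.25
```

For BI this gives 0.25, which needs only two bits; a longer message narrows the interval and therefore needs more bits.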
    range = high - low ; read(c) ;
    high = low + range * high_range(c) ;
    low = low + range * low_range(c) ;
B: high = 1, low = 0, range = 1 - 0 = 1; high_range(B) = 0.3, low_range(B) = 0.2
   new high = 0 + 1 × 0.3 = 0.3, new low = 0 + 1 × 0.2 = 0.2
I: high = 0.3, low = 0.2, range = 0.1; high_range(I) = 0.6, low_range(I) = 0.5
   new high = 0.2 + 0.1 × 0.6 = 0.26, new low = 0.2 + 0.1 × 0.5 = 0.25
L: high = 0.26, low = 0.25, range = 0.01; high_range(L) = 0.8, low_range(L) = 0.6
   new high = 0.25 + 0.01 × 0.8 = 0.258, new low = 0.25 + 0.01 × 0.6 = 0.256
L: high = 0.258, low = 0.256, range = 0.002; high_range(L) = 0.8, low_range(L) = 0.6
   new high = 0.256 + 0.002 × 0.8 = 0.2576, new low = 0.256 + 0.002 × 0.6 = 0.2572
Next...
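The hand trace above can be continued for the whole message with a short sketch of the encoding loop. It uses Python's `fractions.Fraction` instead of floating point (an assumption made so the intervals are exact); `RANGES` and `encode` are illustrative names, not from the slides.

```python
from fractions import Fraction

# [low_range(c), high_range(c)) for each character, taken from the table.
RANGES = {
    " ": (Fraction(0, 10), Fraction(1, 10)),
    "A": (Fraction(1, 10), Fraction(2, 10)),
    "B": (Fraction(2, 10), Fraction(3, 10)),
    "E": (Fraction(3, 10), Fraction(4, 10)),
    "G": (Fraction(4, 10), Fraction(5, 10)),
    "I": (Fraction(5, 10), Fraction(6, 10)),
    "L": (Fraction(6, 10), Fraction(8, 10)),
    "S": (Fraction(8, 10), Fraction(9, 10)),
    "T": (Fraction(9, 10), Fraction(10, 10)),
}

def encode(message, ranges):
    """Follow the encoding pseudocode; return (c, low, high) after each character."""
    low, high = Fraction(0), Fraction(1)
    trace = []
    for c in message:
        rng = high - low
        # high is updated first, but both updates use the old value of low.
        high = low + rng * ranges[c][1]
        low = low + rng * ranges[c][0]
        trace.append((c, low, high))
    return trace

for c, low, high in encode("BILL GATES", RANGES):
    print(repr(c), float(low), float(high))
```

The last step ends with low = 0.2572167752 and high = 0.2572167756, matching the hand computation continued to the end of BILL GATES.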
[Figure: successive narrowing of the interval while encoding BILL GATES, from [0.2, 0.3) to [0.25, 0.26), [0.256, 0.258), [0.2572, 0.2576), [0.2572, 0.25724), ..., down to [0.2572167752, 0.2572167756); each panel subdivides the current interval among ( ), A, B, E, G, I, L, S, T.]
The final low value, 0.2572167752 (or any number in the final interval [0.2572167752, 0.2572167756), if the length of the encoded message is known at the decode end), will uniquely encode the message BILL GATES.
Decoding works in reverse. Since 0.2572167752 falls between 0.2 and 0.3, the first character must be B.
Remove the effect of B from 0.2572167752 by first subtracting the low value of B, 0.2, giving 0.0572167752, and then dividing by the width of B's range, 0.1. This gives 0.572167752.
Then calculate where that value lands: 0.572167752 falls in [0.5, 0.6), the range of the next letter, I. The process repeats until the value reaches 0 or the known length of the message is reached.
Decoding algorithm:
    r = input_number ;
    repeat
        search c such that r falls in its range ;
        output(c) ;
        r = r - low_range(c) ;
        r = r / (high_range(c) - low_range(c)) ;
    until r = 0 or the known message length is reached
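A minimal sketch of the decoding loop, assuming the message length (10) is known at the decode end and again using exact `Fraction` arithmetic; the names `RANGES` and `decode` are illustrative.

```python
from fractions import Fraction

# Same model as the encoding table.
RANGES = {
    " ": (Fraction(0, 10), Fraction(1, 10)),
    "A": (Fraction(1, 10), Fraction(2, 10)),
    "B": (Fraction(2, 10), Fraction(3, 10)),
    "E": (Fraction(3, 10), Fraction(4, 10)),
    "G": (Fraction(4, 10), Fraction(5, 10)),
    "I": (Fraction(5, 10), Fraction(6, 10)),
    "L": (Fraction(6, 10), Fraction(8, 10)),
    "S": (Fraction(8, 10), Fraction(9, 10)),
    "T": (Fraction(9, 10), Fraction(10, 10)),
}

def decode(number, ranges, length):
    """Follow the decoding pseudocode, stopping after a known number of characters."""
    r = Fraction(number)
    out = []
    for _ in range(length):
        # search c such that r falls in [low_range(c), high_range(c))
        c = next(s for s, (lo, hi) in ranges.items() if lo <= r < hi)
        out.append(c)
        lo, hi = ranges[c]
        r = (r - lo) / (hi - lo)  # remove the effect of c
    return "".join(out)

print(decode("0.2572167752", RANGES, 10))  # BILL GATES
```

Any other number inside [0.2572167752, 0.2572167756) decodes to the same ten characters, which is why the encoder is free to pick a convenient one.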
The compression rate theoretically approaches the high-order entropy. Arithmetic coding is not as popular as Huffman coding because …, … are needed.
We consider the following three symbols: a (with prob. 0.5), b (with prob. 0.3), and c (with prob. 0.2). Suppose we encode the same message as in the previous example: a a a b c a.
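[Figure: encoder for a a a b c a. Each interval is split into a, b, c in proportion 0.5 : 0.3 : 0.2; the interval narrows to [0, 0.5), [0, 0.25), [0, 0.125), [0.0625, 0.1), [0.0925, 0.1), and the final interval [0.0925, 0.09625).]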
The final interval is therefore [0.0925, 0.09625). Any number in this final interval can be used for the decoding process. For instance, we pick 0.09375 ∈ [0.0925, 0.09625).
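The interval arithmetic for a a a b c a can be checked with the sketch below. It assumes the symbols are laid out on [0, 1) in the listed order (a: [0, 0.5), b: [0.5, 0.8), c: [0.8, 1.0)), which is consistent with the boundaries in the figure; `RANGES` and `encode` are illustrative names.

```python
from fractions import Fraction

# Assumed layout, in the listed order: a -> [0, 0.5), b -> [0.5, 0.8), c -> [0.8, 1.0).
RANGES = {
    "a": (Fraction(0), Fraction(1, 2)),
    "b": (Fraction(1, 2), Fraction(4, 5)),
    "c": (Fraction(4, 5), Fraction(1)),
}

def encode(symbols, ranges):
    """Return (symbol, low, high) after each symbol is encoded."""
    low, high = Fraction(0), Fraction(1)
    steps = []
    for s in symbols:
        width = high - low
        s_low, s_high = ranges[s]
        # Both new bounds are computed from the old low (tuple assignment).
        low, high = low + width * s_low, low + width * s_high
        steps.append((s, low, high))
    return steps

steps = encode("aaabca", RANGES)
for s, low, high in steps:
    print(s, float(low), float(high))

_, low, high = steps[-1]
print(low <= Fraction("0.09375") < high)  # True
```

The b step gives [0.0625, 0.1), the c step [0.0925, 0.1), and the final a step [0.0925, 0.09625).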
[Figure: Decoder. 0.09375 is located in the successive subintervals [0, 0.5), [0, 0.25), [0, 0.125), [0.0625, 0.1), [0.0925, 0.1), [0.0925, 0.09625).]
Therefore, the decoder successfully identifies the source sequence a a a b c a. Note that 0.09375 can be represented by the binary sequence 0 0 0 1 1 (binary 0.00011 = 1/16 + 1/32 = 0.09375).
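Starting from the transmitted bits rather than the decimal value, a decoder sketch could look like this. It assumes the same a/b/c layout as the encoder and that the message length (6) is known; `bits_to_fraction` and `decode` are hypothetical helpers.

```python
from fractions import Fraction

# Assumed layout: a -> [0, 0.5), b -> [0.5, 0.8), c -> [0.8, 1.0).
RANGES = {
    "a": (Fraction(0), Fraction(1, 2)),
    "b": (Fraction(1, 2), Fraction(4, 5)),
    "c": (Fraction(4, 5), Fraction(1)),
}

def bits_to_fraction(bits):
    """Read a non-empty bit string b1...bk as the binary fraction 0.b1...bk."""
    return Fraction(int(bits, 2), 2 ** len(bits))

def decode(number, ranges, length):
    """Find the range containing r, output its symbol, rescale r; repeat length times."""
    r = Fraction(number)
    out = []
    for _ in range(length):
        c = next(s for s, (lo, hi) in ranges.items() if lo <= r < hi)
        out.append(c)
        lo, hi = ranges[c]
        r = (r - lo) / (hi - lo)
    return "".join(out)

value = bits_to_fraction("00011")
print(value)                     # 3/32
print(decode(value, RANGES, 6))  # aaabca
```

The intermediate values of r are 3/32, 3/16, 3/8, 3/4, 5/6 and 1/6, which fall in the ranges of a, a, a, b, c, a respectively.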
Discussions
Given the same message a a a b c a:
    No compression      12 bits
    Huffman codes        8 bits
    Lempel-Ziv          13 bits
    Arithmetic codes     5 bits
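The first two rows can be reproduced with a short sketch: with three symbols a fixed-length code needs 2 bits per symbol, and a Huffman code built from the counts a: 4, b: 1, c: 1 assigns lengths 1, 2, 2. The Lempel-Ziv figure depends on the LZ variant and is not reproduced here, and the arithmetic-code figure is the 5 bits of 0 0 0 1 1. `huffman_code_lengths` is an illustrative helper.

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman code built from symbol counts."""
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)  # two lightest subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}  # one level deeper
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

message = "aaabca"
freqs = Counter(message)
lengths = huffman_code_lengths(freqs)
huffman_bits = sum(lengths[s] for s in message)
bits_per_symbol = max(1, (len(freqs) - 1).bit_length())  # fixed-length code
fixed_bits = bits_per_symbol * len(message)
print(fixed_bits, huffman_bits)  # 12 8
```

The unique tie-breaker keeps `heapq` from ever comparing the dictionaries when two weights are equal.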
Decode 101111

Encoding of the remainder part r (divisor 10):
    r    offset    binary of offset    output bits
    0    0         0000                000
    1    1         0001                001
    2    2         0010                010
    3    3         0011                011
    4    4         0100                100
    5    5         0101                101
    6    12        1100                1100
    7    13        1101                1101
    8    14        1110                1110
    9    15        1111                1111

Encoding of the quotient part q:
    q    output bits
    0    0
    1    10
    2    110
    3    1110
    4    11110
    5    111110
    6    1111110
    :    :
    N    N repetitions of 1, then 0

Decoding 101111: the quotient prefix 10 gives q = 1. The next three bits, 111 = 7, are not below 6, so one more bit is read: 1111 = 15, an offset, so r = 15 - 6 = 9. Hence a = 10 × 1 + 9 = 19.
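The two tables appear to describe a Golomb code with divisor m = 10 (inferred from a = 10 × 1 + 9): the quotient is sent in unary, and the remainder in truncated binary, with 3 bits for r < 6 and the 4-bit offset r + 6 otherwise. A minimal sketch under that assumption; `golomb_params`, `golomb_encode` and `golomb_decode` are hypothetical names.

```python
def golomb_params(m):
    """b = ceil(log2 m); remainders below cutoff = 2**b - m use b - 1 bits."""
    if m < 2:
        raise ValueError("this sketch assumes m >= 2")
    b = (m - 1).bit_length()
    return b, (1 << b) - m

def golomb_encode(a, m=10):
    q, r = divmod(a, m)
    b, cutoff = golomb_params(m)
    unary = "1" * q + "0"  # q ones, then a terminating 0
    if r < cutoff:
        return unary + format(r, "b").zfill(b - 1)
    return unary + format(r + cutoff, "b").zfill(b)  # offset = r + cutoff

def golomb_decode(bits, m=10):
    """Decode one codeword from the front of bits; return (value, bits consumed)."""
    b, cutoff = golomb_params(m)
    pos = 0
    while bits[pos] == "1":  # unary quotient
        pos += 1
    q = pos
    pos += 1  # skip the terminating 0
    r = int(bits[pos:pos + b - 1] or "0", 2)  # first b - 1 remainder bits
    pos += b - 1
    if r >= cutoff:  # long remainder: read one more bit, then undo the offset
        r = ((r << 1) | int(bits[pos])) - cutoff
        pos += 1
    return q * m + r, pos

print(golomb_decode("101111"))  # (19, 6)
print(golomb_encode(19))        # 101111
```

With m = 10 the cutoff is 2^4 - 10 = 6, which is exactly where the remainder table switches from 3-bit to 4-bit codewords.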