
Indian Institute of Technology Bombay

Department of Electrical Engineering

Handout 3 EE 708 Information Theory and Coding


Tutorial 1 Jan 12, 2017

Question 1) Let us do our faulty-ball experiment again. Let N = 9 balls be available, one of them being faulty. Let D = 3, i.e., our generalized common balance will be able to localize the fault to one among D groups when presented with D subsets (w.l.o.g. assume disjoint subsets). We will repeatedly perform this experiment. Our goal is to design a strategy which minimizes the average number of measurements needed to find the fault. The faulty location is chosen, in an iid manner, according to the probability law

p_1 = p_2 = p_3 = p_7 = p_8 = p_9 = 1/27  and  p_4 = p_5 = 1/3

(so that p_6 = 1/9, since the probabilities sum to one).
[Figure: ternary measurement tree, drawn from the bottom up. Root: balls 1-9. First test: {1-3, 6-9}, {4}, {5}. Second test: {1-3}, {6}, {7-9}. Third test: the individual balls 1, 2, 3 and 7, 8, 9.]

(a) Label the circles with the corresponding set of balls which are identified. For example, the first test starts from balls 1-9, marked at the bottom. Keep in mind that your answer should also attempt to minimize the average number of tests required.
Solution: See the figure above.

(b) Assume that there is an iid source producing letters X ∈ {1, ..., 9} with the probabilities given above. Can you construct a source code using the above tree diagram? In particular, write down the associated codeword for each source symbol.
Solution: Let the code alphabet be {A, B, C}. We can label the branches with these symbols, from left to right, at each node where a branching occurs. Thus,
1 → AAA, 2 → AAB, 3 → AAC, 4 → B, 5 → C, 6 → AB, 7 → ACA, 8 → ACB, 9 → ACC.
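A quick check of the resulting expected codeword length (using p_6 = 1/9 for the ball not listed in the probability law):

E[L] = 6 · (1/27) · 3 + (1/9) · 2 + 2 · (1/3) · 1 = 2/3 + 2/9 + 2/3 = 14/9 ternary symbols per source symbol.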

(c) Is your codebook unique in achieving the given E[L], the average codeword length?
Solution: No: clearly we can swap some of the leaves (for example, 1 and 2) and still maintain the same expected length.

(d) Is your code uniquely decodable? Is it instantaneous?


Solution: Whenever each leaf is labeled by a unique source symbol, the resulting code is a prefix-free code, as no codeword is the prefix of another (each codeword is a separate leaf). All prefix-free codes are instantaneous (self-punctuating), and hence also uniquely decodable.

(e) Show that among all D-ary prefix-free codes, your codebook is among the best in terms of minimal E[L] (or construct the best codebook).
Solution: First of all, for the given source probabilities, Huffman encoding can result in the above tree. Since Huffman's procedure minimizes the expected length, our tree also minimizes the average length. A different argument is also possible in this case. Notice that the average length of our coding scheme equals the (ternary) entropy of the source for the given distribution; since the entropy lower-bounds the expected length of any prefix-free code, our scheme cannot be beaten. Notice that the first argument is more general than the second one.
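For the record, the base-3 entropy of the source can be computed directly (again using p_6 = 1/9) and matches the expected length found in part (b):

H_3(p) = 6 · (1/27) · log_3 27 + (1/9) · log_3 9 + 2 · (1/3) · log_3 3 = 2/3 + 2/9 + 2/3 = 14/9.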

(f) Assume now that p_4 = 1/3 + ε and p_5 = 1/3 - ε, for some small ε > 0 (ε ≪ 1). Do you think your codebook is still optimal in terms of minimizing E[L]?
Solution: Let us perform Huffman encoding after modifying the probability distribution. As long as min(p_4, p_5) stays above 1/9, we can choose any pair such that p_4 + p_5 = 2/3 and still have the same optimal tree, provided that the rest of the probabilities are not tinkered with.
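To see the Huffman claim concretely, here is a sketch of the ternary merges for the perturbed distribution, assuming p_6 = 1/9 and 1/3 - ε ≥ 1/9 (no dummy symbols are needed, since 9 - 1 is a multiple of 3 - 1):

{1/27 (×6), 1/9, 1/3 + ε, 1/3 - ε}
→ {1/27 (×3), 1/9, 1/9, 1/3 + ε, 1/3 - ε}   (merge three of the 1/27's)
→ {1/9, 1/9, 1/9, 1/3 + ε, 1/3 - ε}         (merge the remaining 1/27's)
→ {1/3, 1/3 + ε, 1/3 - ε}                   (merge the three 1/9's)
→ root.

At each stage the three smallest nodes are the same as in the unperturbed case precisely because 1/3 - ε ≥ 1/9, so the tree, and hence every codeword length, is unchanged.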

Question 2) Find 2 probability distributions for which Huffman encoding will give the
following tree.

[Figure: binary code tree with branch labels 0 and 1 at the root; leaf S1 at depth 1, S2 at depth 2, and S3, S4 at depth 3.]

Solution: (1/2, 1/4, 1/8, 1/8) and (0.52, 0.26, 0.12, 0.1).
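As a check, the Huffman merge sequences for the two distributions (both producing codeword lengths 1, 2, 3, 3 for S1, ..., S4) are:

(1/2, 1/4, 1/8, 1/8):      1/8 + 1/8 = 1/4,  then 1/4 + 1/4 = 1/2,  then 1/2 + 1/2 = 1;
(0.52, 0.26, 0.12, 0.1):   0.12 + 0.1 = 0.22, then 0.22 + 0.26 = 0.48, then 0.48 + 0.52 = 1.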


Question 3) Cover and Thomas, Chapter 2, Problem 4: A symmetric function from R^k to R is unchanged by any permutation of its arguments. Consider a sequence of symmetric functions H_m(p_1, p_2, ..., p_m) : [0, 1]^m → R_+ with the following properties:
1. H_2(1/2, 1/2) = 1.
2. H_2(p, 1 - p) is continuous in p.
3. H_m(p_1, p_2, ..., p_m) = H_{m-1}(p_1 + p_2, p_3, ..., p_m) + (p_1 + p_2) H_2(p_1/(p_1 + p_2), p_2/(p_1 + p_2)).   (1)

In a sequence of steps, we will show that

H_m(p_1, p_2, ..., p_m) = Σ_i p_i log(1/p_i)   (2)

is the only meaningful measure of information with the above properties. Assume also that Σ_{i=1}^m p_i = 1 (this is just for convenience).
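For instance, with (p_1, p_2, p_3) = (1/4, 1/4, 1/2), Property 3 together with Property 1 gives

H_3(1/4, 1/4, 1/2) = H_2(1/2, 1/2) + (1/2) H_2(1/2, 1/2) = 1 + 1/2 = 3/2,

which is exactly Σ_i p_i log_2(1/p_i) = (1/4)·2 + (1/4)·2 + (1/2)·1 = 3/2.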

a) Defining q_k = Σ_{i=1}^k p_i, show that

H_m(p_1, ..., p_m) = H_{m-k+1}(q_k, p_{k+1}, p_{k+2}, ..., p_m) + q_k H_k(p_1/q_k, p_2/q_k, ..., p_k/q_k).

Hint: check k = 2, and generalize. Keep in mind what happens when q_k = 0.

ans: When k = 2, the expression is exactly Property 3. Suppose m is given and the expression is true for all m' < m. For m, let the expression be true for some k = j. We will show that the claim stays true for k = j + 1. To this end,

H_m(p_1, ..., p_m)
  = H_{m-j+1}(q_j, p_{j+1}, p_{j+2}, ..., p_m) + q_j H_j(p_1/q_j, p_2/q_j, ..., p_j/q_j)                              (Step 1)
  = H_{m-j}(q_{j+1}, p_{j+2}, ..., p_m) + q_{j+1} H_2(q_j/q_{j+1}, p_{j+1}/q_{j+1}) + q_j H_j(p_1/q_j, ..., p_j/q_j)  (Step 2)
  = H_{m-j}(q_{j+1}, p_{j+2}, ..., p_m) + q_{j+1} [ H_2(q_j/q_{j+1}, p_{j+1}/q_{j+1}) + (q_j/q_{j+1}) H_j(p_1/q_j, ..., p_j/q_j) ]   (Step 3)
  = H_{m-j}(q_{j+1}, p_{j+2}, ..., p_m) + q_{j+1} H_{j+1}(p_1/q_{j+1}, p_2/q_{j+1}, ..., p_{j+1}/q_{j+1}).            (Step 4)

Steps 1 and 3 are straightforward (the hypothesis for k = j, and a rearrangement, respectively).

Step 2: apply Property 3 to the first term on the RHS.

Step 4: apply the claimed expression for m' < m (here to a function of j + 1 arguments).

b) Show that with m = b^k and p_i = b^{-k} for all i,

H_m(p_1, p_2, ..., p_m) = k H_b(1/b, 1/b, ..., 1/b).   (3)

ans: Apply part (a) with k = b, to get

H_m(b^{-k}, ..., b^{-k}) = H_{m-b+1}(b^{-k+1}, b^{-k}, ..., b^{-k}) + b^{-k+1} H_b(1/b, 1/b, ..., 1/b).

Now use the above strategy inductively on the b elements with the least probability, until all probabilities in the argument are equal, i.e.,

H_m(b^{-k}, ..., b^{-k}) = H_{m/b}(b^{-(k-1)}, b^{-(k-1)}, ..., b^{-(k-1)}) + H_b(1/b, 1/b, ..., 1/b).

Notice that the RHS is the same expression with k - 1 in place of k, and we can continue this way to obtain expressions for lower values of k; finally we get the required expression.
It is very useful to notice that the formula holds in particular for b = 2. Since H_2(1/2, 1/2) = 1 (Property 1), this will also imply that p_i = 2^{-k}, ∀i, is indeed the distribution which maximizes entropy when m = 2^k (the maximization itself is argued in part (d)). So we have

H_{2^k}(2^{-k}, 2^{-k}, ..., 2^{-k}) = k = log_2 2^k.   (4)
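For example, with b = 2 and k = 3, equation (3) reads H_8(1/8, ..., 1/8) = 3 H_2(1/2, 1/2) = 3, consistent with (4).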

c) Show that for every given positive integer k, there exists another integer l such that

2^l ≤ b^k ≤ 2^{l+1},  ∀ b ≥ 2.   (5)

ans: For a given b^k, start with some j such that 2^j ≤ b^k. Now check whether 2^{j+1} ≥ b^k; if not, put j = j + 1 and repeat the procedure to obtain a constructive proof.
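A closed form also works here: taking l = ⌊k log_2 b⌋ gives l ≤ k log_2 b < l + 1, i.e., 2^l ≤ b^k < 2^{l+1}.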

d) Using parts (a), (b) and (c), show that

l/k ≤ H_b(1/b, 1/b, ..., 1/b) ≤ l/k + 1/k.   (6)

ans: Observe that from part (a),

H_m(1/m, ..., 1/m) = H_{m+1}(0, 1/m, ..., 1/m).
Suppose p_i and p_j are not equal in a given probability assignment. W.l.o.g. assume that p_1 ≠ p_2, and write q_2 = p_1 + p_2. We then have

H_m(p_1, p_2, ..., p_m) = H_{m-1}(q_2, p_3, ..., p_m) + q_2 H_2(p_1/q_2, p_2/q_2)
                        ≤ H_{m-1}(q_2, p_3, ..., p_m) + q_2 H_2(1/2, 1/2)
                        = H_m((p_1 + p_2)/2, (p_1 + p_2)/2, p_3, ..., p_m).

Thus our measure of information (entropy) increases if we equalize any two probability values which are not the same. Every time we do this equalization, the entropy goes up, until we reach a point where no further adjustment is possible. This is precisely the assignment of equal probabilities, p_i = 1/m.
By adding sufficiently many zeros and performing the equalization of probabilities described above, we can show that

H_{2^l}(2^{-l}, 2^{-l}, ..., 2^{-l}) ≤ H_{b^k}(b^{-k}, b^{-k}, ..., b^{-k}) ≤ H_{2^{l+1}}(2^{-(l+1)}, 2^{-(l+1)}, ..., 2^{-(l+1)}).   (7)

Applying part (b) to all three terms turns (7) into l ≤ k H_b(1/b, ..., 1/b) ≤ l + 1; dividing by k gives (6).


e) Argue that when k becomes large,

H_b(1/b, 1/b, ..., 1/b) → log_2 b.   (8)
ans: Notice that since 2^l ≤ b^k ≤ 2^{l+1}, l is a non-decreasing function of k. In particular, taking logarithms in (5),

l log 2 ≤ k log b ≤ (l + 1) log 2.

In base 2,

l/k ≤ log_2 b ≤ l/k + 1/k.

Comparing this with part (d), and taking the limits as k → ∞,

H_b(1/b, 1/b, ..., 1/b) → log_2 b.

(Since the left-hand side does not depend on k, it in fact equals log_2 b.)
f) (Bonus) We have shown that entropy has the required logarithmic structure when the probabilities are equal. Explain how this can be extended to probabilities having rational values, so as to obtain (2), using the previous questions. The final touch is to use the continuity of H(·) (Property 2) to argue that it works for all real values in [0, 1]. Can you finish the arguments?

ans: Suppose that each probability value is a rational number in [0, 1), i.e. one can write p_i in the form a_i/b_i with a_i, b_i ∈ Z. Imagine a new probability vector r of length n = Π_{i=1}^m b_i, with each value r_i = 1/n. It is clear that we can generate a_i/b_i by adding together an integer number (a_i Π_{j≠i} b_j) of the r_i's. The formula now follows by repeated applications of Property 3 and part (e).
As for irrational values, the continuity of the function H_2(p, 1 - p) in p is sufficient to extend the formula. Note in particular that using Property 3, we can convert the entropy functional to terms containing H_2(·, ·).
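As a concrete instance of the rational case, take (p_1, p_2) = (1/3, 2/3). Applying Property 3 to H_3(1/3, 1/3, 1/3), whose value is log_2 3 by part (e), gives

log_2 3 = H_3(1/3, 1/3, 1/3) = H_2(2/3, 1/3) + (2/3) H_2(1/2, 1/2),

so that, using the symmetry of H_2, H_2(1/3, 2/3) = log_2 3 - 2/3, which is exactly (1/3) log_2 3 + (2/3) log_2(3/2) = Σ_i p_i log_2(1/p_i).
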
Question 4) The entropy of a discrete random variable X with distribution p(x) is defined as

H(X) = E[ log(1/p(X)) ].

We can similarly define the conditional entropy as

H(X_2 | X_1) = E[ log(1/p(X_2 | X_1)) ].

(a) If (X_1, X_2) has the joint distribution p(x_1, x_2), express H(X_1 | X_2) in terms of the distributions.
Solution:

H(X_1 | X_2) = E[ log(1/p(X_1 | X_2)) ]
             = Σ_{x_1, x_2} p(x_1, x_2) log(1/p(x_1 | x_2))
             = Σ_{x_2} p(x_2) H(X_1 | X_2 = x_2).

(b) Find H(X_2 | X_1) if X_1 and X_2 are independent.

Solution: If X_1 and X_2 are independent, then p(X_2 | X_1) = p(X_2), and hence

H(X_2 | X_1) = E[ log(1/p(X_2)) ] = H(X_2).   (9)

(c) We say that X_1 → X_2 → X_3 forms a Markov chain in this order if p(x_3 | x_2, x_1) = p(x_3 | x_2). Simplify the entropy H(X_1, X_2, X_3).
Solution: By the chain rule,

H(X_1, X_2, X_3) = H(X_1) + H(X_2 | X_1) + H(X_3 | X_2, X_1)   (10)
                 = H(X_1) + H(X_2 | X_1) + H(X_3 | X_2),       (11)

where the last step used the Markov property.

Question 5) Recall that Huffman codes minimize E[L] over all prefix-free codes. Suppose we are interested in minimizing Σ_i w_i p_i l_i, where the w_i are positive valued weights. Can you give an optimal algorithm?
Solution: Let c_i = w_i p_i and q_i = c_i / (Σ_j c_j). Since Σ_j c_j is a positive constant that does not depend on the code, the above optimization is equivalent to minimizing Σ_i q_i l_i, which can be written as E_q[L] for an appropriate distribution q. The latter problem, however, is solved by Huffman encoding taking the q_i as the input distribution.
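A minimal code sketch of this reduction may help. The Python below and the name weighted_huffman are illustrative choices (a binary, D = 2, Huffman code built with a heap), not something prescribed by the handout:

```python
import heapq
from itertools import count

def weighted_huffman(probs, weights):
    """Binary prefix-free code minimizing sum_i w_i * p_i * l_i.

    Reduction from the handout: run ordinary Huffman coding on the
    normalized distribution q_i = w_i * p_i / sum_j (w_j * p_j).
    """
    c = [w * p for w, p in zip(weights, probs)]
    total = sum(c)
    q = [ci / total for ci in c]             # effective input distribution

    # Heap entries are (probability, tie-breaker, subtree); the integer
    # tie-breaker keeps heapq from ever comparing subtrees directly.
    tie = count()
    heap = [(qi, next(tie), i) for i, qi in enumerate(q)]
    heapq.heapify(heap)
    while len(heap) > 1:
        q0, _, left = heapq.heappop(heap)    # merge the two least-probable
        q1, _, right = heapq.heappop(heap)   # nodes, as in standard Huffman
        heapq.heappush(heap, (q0 + q1, next(tie), (left, right)))

    # Read the codewords off the final tree.
    codes = {}
    def walk(node, prefix=""):
        if isinstance(node, tuple):          # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: original symbol index
            codes[node] = prefix or "0"
    walk(heap[0][2])
    return codes

# Toy example: large weights on the last two symbols shorten their codewords.
print(weighted_huffman([0.5, 0.25, 0.125, 0.125], [1.0, 1.0, 4.0, 4.0]))
```

In the example call, the large weights on the last two symbols inflate their effective probabilities q_i, so they receive shorter codewords than plain Huffman coding on the probabilities alone would assign.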
