
Lecture on

Information Theory

Md Abul Kalam Azad


Professor
Jahangirnagar University, Savar, Dhaka-1342
What is Information?
• Can we measure information?
• Consider the following two sentences:

1. There is a traffic jam in Gabtoli


2. There is a traffic jam in Gabtoli near Exit Point Mazar Road

Sentence 2 seems to have more information than sentence 1. From the
semantic viewpoint, sentence 2 provides more useful information.

What is Information?
• It is hard to measure the "semantic" information!
• Consider the following two sentences:

1. There is a traffic jam in Gabtoli near Exit Point Mazar Road


2. There is a traffic jam in Gabtoli near Exit Point Technical
Turning

It’s not clear whether sentence 1 or 2 would have more information!

What is Information?
• Let's attempt a different definition of information.
• How about counting the number of letters in the two sentences:

1. There is a traffic jam in Gabtoli (27 letters)


2. There is a traffic jam in Gabtoli near Exit Point Mazar
Road (49 letters)

Definitely, something we can measure and compare!
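
As a quick check, here is a small Python sketch (the helper name count_letters is only illustrative) that counts the letters in each sentence, ignoring spaces:

```python
def count_letters(sentence):
    """Count only alphabetic characters, ignoring spaces and punctuation."""
    return sum(1 for ch in sentence if ch.isalpha())

print(count_letters("There is a traffic jam in Gabtoli"))                             # 27
print(count_letters("There is a traffic jam in Gabtoli near Exit Point Mazar Road"))  # 49
```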

What is Information?
• First attempt to quantify information, by Hartley (1928).

• Every symbol of the message has a choice of s possibilities.

• A message of length l can therefore have s^l distinguishable possibilities.

• The information measure is then the logarithm of s^l:

  I = \log(s^l) = l \log(s)

(It is interesting to know that log is the only function f that satisfies
f(s^l) = l f(s).)

Intuitively, this definition makes sense: if one symbol (letter) has the
information \log(s), then a sentence of length l should have l times more
information, i.e., l \log(s).
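
A minimal sketch of Hartley's measure in Python, assuming a base-2 logarithm so the result is in bits (the function name and the example values are only illustrative):

```python
import math

def hartley_information(s, l, base=2):
    """Information of a length-l message over an alphabet of s symbols: l * log(s)."""
    return l * math.log(s, base)

# A 10-letter message over a 26-letter alphabet, measured in bits.
print(hartley_information(26, 10))  # ≈ 47.0 bits
```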
How about measuring information as the number of yes/no questions one has
to ask to get the correct answer in the simple game below?

[Figure: guess which cell contains the circle. For a 2×2 grid of 4 cells,
2 yes/no questions are enough; for a 4×4 grid of 16 cells, 4 questions are needed.]

The randomness is due to the uncertainty about where the circle is!

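The counting argument behind the game can be sketched as follows: each yes/no question can cut the remaining possibilities in half, so a grid of N cells needs log2(N) questions. A minimal illustration:

```python
import math

# Each yes/no question can halve the set of possible positions,
# so an N-cell grid needs log2(N) questions.
for cells in (4, 16):
    print(cells, "cells ->", int(math.log2(cells)), "questions")
# 4 cells -> 2 questions
# 16 cells -> 4 questions
```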
Information theory
• Information theory provides a mathematical basis for measuring the
  information content.
• To understand the notion of information, think about it as providing
  the answer to a question, for example, whether a coin will come up heads.
• If one already has a good guess about the answer, then the actual answer
  is less informative.
• If one already knows that the coin is rigged so that it will come up
  heads with probability 0.99, then a message (advance information) about
  the actual outcome of a flip is worth less than it would be for an
  honest coin (50-50).
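
To make "worth less" concrete, here is a small sketch comparing the expected information (Shannon's entropy, defined a few slides below) in one flip of the rigged coin and of the honest coin; the helper name is only illustrative:

```python
import math

def coin_entropy(p_heads):
    """Expected information (in bits) in one flip of a coin with P(heads) = p_heads."""
    probs = (p_heads, 1 - p_heads)
    return -sum(q * math.log2(q) for q in probs if q > 0)

print(coin_entropy(0.99))  # ≈ 0.081 bits: the rigged coin's outcome tells us very little
print(coin_entropy(0.50))  # 1.0 bit: the honest coin's outcome is maximally informative
```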

Information theory (cont …)
• For a fair (honest) coin, you have no prior information, and you are
  willing to pay more (say, in terms of $) for advance information: the
  less you know, the more valuable the information.
• Information theory uses this same intuition, but instead of measuring
  the value of information in dollars, it measures information content
  in bits.
• One bit of information is enough to answer a yes/no question about
  which one has no idea, such as the flip of a fair coin.
Shannon’s Information Theory
Claude Shannon, "A Mathematical Theory of Communication,"
The Bell System Technical Journal, 1948

• Shannon's measure of information is the number of bits to represent the
  amount of uncertainty (randomness) in a data source, and is defined as
  entropy:

  H = -\sum_{i=1}^{n} p_i \log(p_i)

where there are n symbols 1, 2, …, n, each with probability of occurrence p_i.
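
A minimal Python sketch of this definition, assuming base-2 logarithms so that H is measured in bits (the function name is only illustrative):

```python
import math

def entropy(probabilities):
    """Shannon entropy H = -sum(p_i * log2(p_i)) in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit (fair coin)
print(entropy([1/6] * 6))    # ≈ 2.585 bits (fair six-sided die, log2 of 6)
```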
Shannon’s Entropy
• Consider the following string consisting of symbols a and b:

abaabaababbbaabbabab…

• On average, there are equal numbers of a's and b's.

• The string can be considered as the output of the source below, with
  equal probability of outputting symbol a or b:
[Figure: a source emitting symbol a with probability 0.5 and symbol b with probability 0.5.]

We want to characterize the average information generated by the source!
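As an illustration, the source's entropy can be estimated from the observed string itself by plugging the relative frequencies of a and b into the entropy formula; this is only a rough empirical estimate, assuming the sample is representative:

```python
import math
from collections import Counter

def empirical_entropy(string):
    """Estimate entropy (bits/symbol) from the symbol frequencies observed in the string."""
    counts = Counter(string)
    n = len(string)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(empirical_entropy("abaabaababbbaabbabab"))  # 1.0 bit/symbol here (10 a's and 10 b's)
```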
Intuition on Shannon’s Entropy
Why H = -\sum_{i=1}^{n} p_i \log(p_i)?

Suppose you have a long random string of the two binary symbols 0 and 1, and
the probabilities of symbols 1 and 0 are p_1 and p_0.

Ex: 00100100101101001100001000100110001…

If the string is long enough, say of length N, it is likely to contain
Np_0 0's and Np_1 1's.

The probability that this string pattern occurs is

  p = p_0^{N p_0} \, p_1^{N p_1}

Hence, the number of possible patterns is

  1/p = p_0^{-N p_0} \, p_1^{-N p_1}

The number of bits to represent all possible patterns is

  \log(p_0^{-N p_0} p_1^{-N p_1}) = -\sum_{i=0}^{1} N p_i \log p_i

The average number of bits to represent a symbol is therefore

  -\sum_{i=0}^{1} p_i \log p_i
More Intuition on Entropy
• Assume a binary memoryless source, e.g., a flip of a coin. How much
  information do we receive when we are told that the outcome is heads?

• If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the
  amount of information is 1 bit.

• If we already know that it will be (or was) heads, i.e., P(heads) = 1,
  the amount of information is zero!

• If the coin is not fair, e.g., P(heads) = 0.9, the amount of information
  is more than zero but less than one bit!

• Intuitively, the amount of information received is the same whether
  P(heads) = 0.9 or P(heads) = 0.1.

Self Information
• So, let's look at it the way Shannon did.
• Assume a memoryless source with
  • alphabet A = (a1, …, an)
  • symbol probabilities (p1, …, pn).
• How much information do we get when finding out that the next symbol is ai?
• According to Shannon, the self-information of ai is

  i(a_i) = -\log(p_i)
Why?
Assume two independent events A and B, with probabilities
P(A) = pA and P(B) = pB.

For both events to happen, the probability is pA · pB. However, the
amounts of information should be added, not multiplied.

Logarithms satisfy this!

Also, we want the information to increase with decreasing probability,
so let's use the negative logarithm.
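
A quick numerical check of this additivity property (the probability values are arbitrary):

```python
import math

pA, pB = 0.2, 0.5
info = lambda p: -math.log2(p)  # self-information in bits

print(info(pA * pB))        # 3.3219...
print(info(pA) + info(pB))  # 3.3219..., the same: information adds when probabilities multiply
```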
Self Information
Example 1:

Example 2:

Which logarithm? Pick the one you like! If you pick the natural log, you'll
measure in nats; if you pick the base-10 log, you'll get Hartleys; if you
pick the base-2 log (like everyone else), you'll get bits.
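
A small sketch of the same self-information expressed in the three units (the probability value is arbitrary):

```python
import math

p = 0.25
print(-math.log2(p))   # 2.0 bits
print(-math.log(p))    # ≈ 1.386 nats
print(-math.log10(p))  # ≈ 0.602 Hartleys
```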
Self Information

On average over all the symbols, we get:

  H(X) = \sum_{i=1}^{n} p_i \, i(a_i) = -\sum_{i=1}^{n} p_i \log(p_i)

H(X) is called the first-order entropy of the source.

This can be regarded as the degree of uncertainty about the following symbol.

Entropy
Example: Binary Memoryless Source

BMS → 01101000…

Let P(1) = p and P(0) = 1 - p.

Then

  H(p) = -p \log_2(p) - (1 - p) \log_2(1 - p)

[Figure: the binary entropy function H(p), rising from 0 at p = 0 to a
maximum of 1 bit at p = 0.5 and falling back to 0 at p = 1.]

The uncertainty (information) is greatest when p = 0.5.
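
A short sketch tabulating this binary entropy function, which shows the maximum of 1 bit at p = 0.5 (the helper name is only illustrative):

```python
import math

def binary_entropy(p):
    """Entropy (bits) of a binary source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(p, round(binary_entropy(p), 3))
# Peaks at 1.0 bit when p = 0.5 and drops to 0 at p = 0 or p = 1.
```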
Example
Three symbols a, b, c with corresponding probabilities:

P = {0.5, 0.25, 0.25}

What is H(P)?

Three weather conditions in Corvallis (rain, sunny, cloudy) with
corresponding probabilities:

Q = {0.48, 0.32, 0.20}

What is H(Q)?
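
A quick sketch answering both questions, assuming base-2 logarithms so the results are in bits:

```python
import math

H = lambda probs: -sum(p * math.log2(p) for p in probs)

print(H([0.5, 0.25, 0.25]))   # H(P) = 1.5 bits
print(H([0.48, 0.32, 0.20]))  # H(Q) ≈ 1.499 bits, almost the same as H(P)
```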

Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.

2. Maximum entropy (H = log N) is reached when all symbols are
   equiprobable, i.e., p_i = 1/N.

3. The difference log N - H is called the redundancy of the source.
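
A small sketch illustrating properties 2 and 3 with the weather example above (N = 3 symbols):

```python
import math

H = lambda probs: -sum(p * math.log2(p) for p in probs)

N = 3
H_max = math.log2(N)            # maximum entropy, reached for p_i = 1/N
H_Q = H([0.48, 0.32, 0.20])

print(H_max)         # ≈ 1.585 bits
print(H([1/N] * N))  # ≈ 1.585 bits, equiprobable symbols reach the maximum
print(H_max - H_Q)   # ≈ 0.086 bits of redundancy
```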

THANKS
