Shannon's information theory provides a mathematical model for measuring information. It defines information, or entropy (H), as the number of binary questions (bits) required to represent the uncertainty in a data source. Entropy is calculated as the sum, over all possible outcomes i, of the probability of i multiplied by the logarithm of the inverse of that probability. A string with equal probabilities of 0s and 1s has the highest entropy, 1 bit per symbol, since it is the most random. A related algorithm, Huffman coding (developed by David Huffman, building on Shannon's ideas), assigns variable-length binary codes to symbols to encode data efficiently.
3. History
• When we look at an object, when we evaluate and appreciate it, a relation between us and the object is established as an exchange of information
• Early humans communicated through symbols, hand signals, and drawings on cave walls
• Later we developed languages, associating sounds with ideas
• Today we transmit symbols, coded digital signals of voice and video, around the world at close to the speed of light
• Information theory stems from the work of electrical engineer Claude Shannon
4. Definition
• Can we measure information?
• Consider the following two sentences:
1. There is a traffic jam on I5
2. There is a traffic jam on I5 near Exit 234
Sentence 2 seems to have more information than sentence 1. From the semantic viewpoint, sentence 2 provides more useful information.
5. Definition – Contd.
• It is hard to measure the “semantic” information!
• Consider the following two sentences:
1. There is a traffic jam on I5 near Exit 160
2. There is a traffic jam on I5 near Exit 234
It’s not clear whether sentence 1 or 2 would have more information!
6. Definition – Contd.
• Let’s attempt a different definition of information.
• How about counting the number of letters in the two sentences:
1. There is a traffic jam on I5 (22 letters)
2. There is a traffic jam on I5 near Exit 234 (33 letters)
Definitely something we can measure and compare!
7. Why?
• To prove the unconditional security of cryptographic systems
• To prove impossibility and lower-bound results on the achievability of unconditional security
• It is a key tool in reduction proofs showing that breaking a cryptographic system is as hard as breaking an underlying cryptographic primitive (e.g., a one-way function or a pseudo-random function)
8. How about we measure information as the number of Yes/No questions one has to ask to get the correct answer in the simple game below?

[Figure: a 2×2 grid with cells numbered 1–4 and a 4×4 grid with cells numbered 1–16; a circle is hidden in one cell of each grid]

How many questions for the 2×2 grid? 2. How many for the 4×4 grid? 4.
Randomness is due to uncertainty of where the circle is!
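A minimal sketch of the counting logic in Python (the function name is mine; the grid sizes and answers are from the slide): each Yes/No question can halve the set of candidate cells, so locating the circle among n equally likely cells takes log2(n) questions.

```python
import math

def questions_needed(num_cells: int) -> int:
    """Yes/No questions needed to locate a circle hidden in one of
    num_cells equally likely cells: each answer halves the candidates."""
    return math.ceil(math.log2(num_cells))

print(questions_needed(4))   # 2x2 grid -> 2 questions
print(questions_needed(16))  # 4x4 grid -> 4 questions
```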
9. Simple Analogy
• Consider two machines M1 and M2, each emitting a stream of symbols:

M1: A A B C C D B A D C   with P(A) = P(B) = P(C) = P(D) = 0.25
M2: A A B B C D A A A B   with P(A) = 0.5, P(B) = 0.25, P(C) = 0.125, P(D) = 0.125
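As a worked check (using the entropy formula introduced on slide 12; pairing the streams with the machines is my inference from the stated probabilities), M1's uniform source carries 2 bits per symbol on average, while M2's skewed source carries only 1.75:

$$H(M_1) = 4 \cdot \tfrac{1}{4}\log_2 4 = 2 \text{ bits/symbol}$$
$$H(M_2) = \tfrac{1}{2}\log_2 2 + \tfrac{1}{4}\log_2 4 + 2 \cdot \tfrac{1}{8}\log_2 8 = 0.5 + 0.5 + 0.75 = 1.75 \text{ bits/symbol}$$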
11. Mathematical models for information source
• Discrete source: the source output is a random symbol $X$ drawn from a finite alphabet,

$$X \in \{x_1, x_2, \ldots, x_L\}, \qquad p_k = P[X = x_k], \qquad \sum_{k=1}^{L} p_k = 1$$
12. Shannon’s Information Theory
Claude Shannon: A Mathematical Theory of Communication, The Bell System Technical Journal, 1948
Where there are $n$ symbols $1, 2, \ldots, n$, each with probability of occurrence $p_i$, Shannon's measure of information is the number of bits needed to represent the amount of uncertainty (randomness) in the data source, and is defined as the entropy:

$$H = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}$$
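A minimal sketch of this formula in Python (the function name is mine; terms with $p_i = 0$ are skipped, since $p \log_2(1/p) \to 0$ as $p \to 0$):

```python
import math

def entropy(probs):
    """Shannon entropy H = sum(p_i * log2(1/p_i)), in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                 # fair coin        -> 1.0 bit
print(entropy([0.25, 0.25, 0.25, 0.25]))   # machine M1 above -> 2.0 bits
print(entropy([0.5, 0.25, 0.125, 0.125]))  # machine M2 above -> 1.75 bits
```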
13. Shannon’s Entropy
• Consider the following string consisting of symbols a and b:
abaabaababbbaabbabab…
• On average, there are equal numbers of a's and b's.
• The string can be considered as the output of the source below, with equal probability of emitting symbol a or b:

[Diagram: a source emitting symbol a with probability 0.5 and symbol b with probability 0.5]

We want to characterize the average information generated by the source!
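As an illustrative extension (my own code, not from the slides), the per-symbol entropy of such a source can be estimated from the observed symbol frequencies in the string itself:

```python
import math
from collections import Counter

def empirical_entropy(s: str) -> float:
    """Estimate per-symbol entropy (bits) from symbol frequencies in s."""
    counts = Counter(s)
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(empirical_entropy("abaabaababbbaabbabab"))  # 10 a's, 10 b's -> 1.0 bit
```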
15. More Intuition on Entropy
• Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
• If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
• If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
• If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
• Intuitively, the amount of information received is the same if P(heads) = 0.9 or P(heads) = 0.1.
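These intuitions match the entropy formula from slide 12 (a worked check, computed by me):

$$H(0.5) = 1 \text{ bit}, \qquad H(1) = 0, \qquad H(0.9) = 0.9\log_2\tfrac{1}{0.9} + 0.1\log_2\tfrac{1}{0.1} \approx 0.469 \text{ bits} = H(0.1)$$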
17. Example
• Three symbols a, b, c with corresponding probabilities P = {0.5, 0.25, 0.25}. What is H(P)?
• Three weather conditions in Corvallis (rain, sunny, cloudy) with corresponding probabilities Q = {0.48, 0.32, 0.20}. What is H(Q)?
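A quick check with the entropy sketch from above (the numeric answers are my own computation, not given on the slide):

```python
import math

def entropy(probs):
    """Shannon entropy H = sum(p_i * log2(1/p_i)), in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))   # H(P) = 1.5 bits
print(entropy([0.48, 0.32, 0.20]))  # H(Q) ≈ 1.499 bits
```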
Editor's Notes
Huffman coding is optimum in the sense that the average number of bits representing a source symbol is a minimum, subject to the constraint that the code words satisfy the prefix condition.
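Since the note mentions Huffman coding, here is a minimal sketch of the algorithm (an illustrative implementation of mine, not the course's code): repeatedly merge the two least-probable subtrees, which yields a prefix-free code with minimal average length.

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a prefix-free Huffman code for {symbol: probability} by
    repeatedly merging the two least-probable subtrees."""
    tie = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

# Machine M2's distribution from slide 9:
print(huffman_code({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'} -> average 1.75 bits/symbol,
# matching the entropy H(M2) computed earlier
```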