2. Introduction
The Shannon-Fano algorithm was independently developed by Claude Shannon at Bell Labs
and Robert Fano at MIT.
The encoding steps of the Shannon-Fano algorithm can be presented in the following
top-down manner:
1. Sort the symbols according to the frequency count of their occurrences.
2. Recursively divide the symbols into two parts, each with approximately the same
number of counts, until all parts contain only one symbol.
A natural way of implementing the above procedure is to build a binary tree. As a
convention, let us assign bit 0 to the left branches and bit 1 to the right branches.
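The two steps above can be sketched in Python. The split rule used here — choosing the division point that minimizes the count difference between the two halves, breaking ties toward the earlier split — is one reasonable reading of "approximately the same number of counts"; it is an assumption, not the only valid choice.

```python
def shannon_fano(symbols):
    """Build Shannon-Fano codes.

    symbols: list of (symbol, count) pairs, pre-sorted by descending count.
    Returns a dict mapping each symbol to its bit-string code
    (0 = left branch, 1 = right branch, as in the convention above).
    """
    codes = {}

    def build(part, prefix):
        if len(part) == 1:
            codes[part[0][0]] = prefix or "0"  # lone symbol: code is complete
            return
        total = sum(count for _, count in part)
        # Find the split that makes the two halves' counts as equal as possible.
        best_i, best_diff, running = 1, float("inf"), 0
        for i in range(1, len(part)):
            running += part[i - 1][1]
            diff = abs(total - 2 * running)  # |left_count - right_count|
            if diff < best_diff:
                best_i, best_diff = i, diff
        build(part[:best_i], prefix + "0")  # left part gets bit 0
        build(part[best_i:], prefix + "1")  # right part gets bit 1

    build(symbols, "")
    return codes


# Counts for the SPEAKER example that follows.
freqs = [("E", 2), ("S", 1), ("P", 1), ("A", 1), ("K", 1), ("R", 1)]
codes = shannon_fano(freqs)
```

With these counts the first split separates E,S (count 3) from P,A,K,R (count 4), reproducing the division worked through below.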
3. Example
Symbols to be coded are the characters in the word SPEAKER.
The frequency count of the symbols (total number of symbols is 7):

Symbol  Count
S       1
P       1
E       2
A       1
K       1
R       1
4. First Division: E,S:(3) and P,A,K,R:(4)
[Figure: binary tree after the first division — the root (7) splits into E,S:(3) on the 0 branch and P,A,K,R:(4) on the 1 branch]
The first division yields two parts: (a) E,S with a total count of 3, denoted as E,S:(3); and (b) P, A, K and R with a
total count of 4, denoted as P,A,K,R:(4).
7. Resulting Code Table

Symbol  Count  Code  Bits used   Probability Pi
E       2      00    4 (2*2)     2/7 = 0.29
S       1      01    2           1/7 = 0.14
P       1      100   3           1/7 = 0.14
A       1      101   3           1/7 = 0.14
K       1      110   3           1/7 = 0.14
R       1      111   3           1/7 = 0.14

Total number of bits: 18
8. Compression Ratio
If the total number of bits required to represent the data before compression is B0,
and the total number of bits required after compression is B1,
then we define the compression ratio as

Compression Ratio = B0 / B1

B0 = 8 * 7 = 56 bits (assuming each character symbol requires 8 bits)
B1 = 18 bits
Compression Ratio = 56 / 18 = 3.11 [positive compression]

Average number of bits used per symbol in the above solution = 18 / 7 = 2.57
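These figures can be checked with a few lines of Python; the code table is hard-coded here so the snippet stands on its own:

```python
# Shannon-Fano codes derived above for the symbols of SPEAKER.
codes = {"E": "00", "S": "01", "P": "100", "A": "101", "K": "110", "R": "111"}
word = "SPEAKER"

b0 = 8 * len(word)                       # bits before compression: 8 * 7 = 56
b1 = sum(len(codes[ch]) for ch in word)  # bits after compression: 18

ratio = b0 / b1          # 56 / 18 = 3.11 (rounded)
avg_bits = b1 / len(word)  # 18 / 7 = 2.57 (rounded)
```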
9. Entropy (η)
According to Claude E. Shannon, the entropy η of an information source with
alphabet S = {S1, S2, ..., Sn} is defined as:

η = H(S) = Σ (i = 1 to n) Pi * log2(1/Pi)

where Pi is the probability that symbol Si in S will occur.
The term log2(1/Pi) indicates the amount of information contained in Si, which
corresponds to the number of bits needed to encode Si.
10. η = 0.29 * log2(1/0.29) + 5 * [0.14 * log2(1/0.14)]
      = 0.29 * log2(3.45) + 5 * [0.14 * log2(7.14)]
      = 0.29 * 1.79 + 5 * [0.14 * 2.84]
      = 0.52 + 2.00
      = 2.52
This suggests that the minimum average number of bits needed to code each character in the
word SPEAKER is 2.52. Since the Shannon-Fano code above uses 2.57 bits per symbol
on average, the algorithm delivers satisfactory coding results for data compression.
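The entropy figure can be reproduced directly from the symbol counts; using exact probabilities rather than the rounded 0.29 and 0.14 gives approximately 2.52, matching the hand calculation:

```python
from math import log2

# Symbol counts for the word SPEAKER.
counts = {"E": 2, "S": 1, "P": 1, "A": 1, "K": 1, "R": 1}
total = sum(counts.values())  # 7

# η = Σ Pi * log2(1/Pi), with Pi = count/total.
entropy = sum((c / total) * log2(total / c) for c in counts.values())
# entropy ≈ 2.52 bits per symbol
```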
11. References
● Ze-Nian Li and Mark S. Drew, Fundamentals of Multimedia, Pearson Education, 2009
● Log calculator, https://ncalculators.com