Huffman coding.ppt

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006
Optimal codes - I
 A code is optimal if it has the shortest average codeword length L
$$L = \sum_{i=1}^{m} p_i l_i$$
 This can be seen as an optimization problem
$$\min_{l_1,\dots,l_m} \sum_{i=1}^{m} p_i l_i \quad \text{subject to} \quad \sum_{i=1}^{m} D^{-l_i} \le 1$$

Optimal codes - II
 Let’s make two simplifying assumptions
 no integer constraint on the codelengths
 Kraft inequality holds with equality
 Lagrange-multiplier problem
$$J = \sum_{i=1}^{m} p_i l_i + \lambda \left( \sum_{i=1}^{m} D^{-l_i} - 1 \right)$$
$$\frac{\partial J}{\partial l_j} = 0 \;\Rightarrow\; p_j - \lambda D^{-l_j} \ln D = 0 \;\Rightarrow\; D^{-l_j} = \frac{p_j}{\lambda \ln D}$$

Optimal codes - III
 Substitute
$$D^{-l_j} = \frac{p_j}{\lambda \ln D}$$
into the Kraft inequality
$$\sum_{i=1}^{m} \frac{p_i}{\lambda \ln D} = 1 \;\Rightarrow\; \lambda = \frac{1}{\ln D}$$
that is
$$D^{-l_i} = p_i \;\Rightarrow\; l_i^* = -\log_D p_i$$
Note that
$$L^* = \sum_{i=1}^{m} p_i l_i^* = -\sum_{i=1}^{m} p_i \log_D p_i = H_D(X)$$
the entropy, when we use base D for logarithms
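As a quick numerical check of the derivation above, the following Python sketch computes the real-valued lengths $l_i^* = -\log_D p_i$ and verifies that the Kraft sum equals 1 and that the average length equals the entropy. The function names and the small dyadic distribution are illustrative assumptions, not taken from the slides.

```python
import math

def ideal_lengths(probs, D=2):
    """Real-valued optimal lengths l_i* = -log_D(p_i); assumes every p_i > 0."""
    return [-math.log(p, D) for p in probs]

def entropy(probs, D=2):
    """H_D(X) = -sum_i p_i log_D p_i (zero-probability terms are skipped)."""
    return -sum(p * math.log(p, D) for p in probs if p > 0)

# Illustrative distribution (an assumption chosen so the lengths are integers).
probs = [0.5, 0.25, 0.125, 0.125]
lengths = ideal_lengths(probs)
average = sum(p * l for p, l in zip(probs, lengths))
kraft = sum(2 ** -l for l in lengths)

print(lengths, average, entropy(probs), kraft)
```

For a non-dyadic distribution the same lengths are not integers, which is why the result is only a lower bound in practice.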

Optimal codes - IV
 In practice the codeword lengths must be integers, so the result obtained above is a lower bound
 Theorem
The expected length L of any instantaneous D-ary code for a r.v. X satisfies
$$L \ge H_D(X)$$
This fundamental result derives from the work of Shannon

Optimal codes - V
 What about the upper bound?
 Theorem
Given a source alphabet (i.e. a r.v. X) of entropy $H(X)$, it is possible to find an instantaneous binary code whose average length L satisfies
$$H(X) \le L < H(X) + 1$$
 A similar theorem could be stated if we use the wrong probabilities $\{q_i\}$ instead of the true ones $\{p_i\}$; the only difference is a term which accounts for the relative entropy
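A common way to make this bound concrete (the slide does not give the construction, so this is an assumption) is to choose the integer lengths $l_i = \lceil -\log_2 p_i \rceil$. The sketch below checks, for one distribution reused from a later slide, that these lengths satisfy the Kraft inequality and that the average length falls between $H(X)$ and $H(X)+1$.

```python
import math

def shannon_lengths(probs):
    """Integer lengths l_i = ceil(-log2 p_i); assumes every p_i > 0."""
    return [math.ceil(-math.log2(p)) for p in probs]

probs = [0.35, 0.17, 0.17, 0.16, 0.15]
lengths = shannon_lengths(probs)
entropy = -sum(p * math.log2(p) for p in probs)
average = sum(p * l for p, l in zip(probs, lengths))
kraft = sum(2 ** -l for l in lengths)  # must be <= 1 for a prefix code to exist

print(lengths, round(entropy, 4), round(average, 2), kraft)
```

These lengths are generally not optimal; they only show that a code within one bit of the entropy always exists.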

Redundancy
 It is defined as the average codeword length minus the entropy
$$\text{Redundancy} = L - \left( -\sum_i p_i \log p_i \right)$$
 Note that
$$0 \le \text{redundancy} < 1$$
(why?)

Compression ratio
 It is the ratio between the average number of bits/symbol in the original message and the same quantity for the coded message, i.e.
$$C = \frac{\text{average original symbol length}}{\text{average compressed symbol length}}$$
where the average compressed symbol length is L(X)


Uniquely decodable codes
 The set of instantaneous codes is a small subset of the uniquely decodable codes.
 Is it possible to obtain a lower average code length L using a uniquely decodable code that is not instantaneous? NO
 So we use instantaneous codes, which are easier to decode

Summary
 Average codeword length L
$$L \ge H(X)$$
for uniquely decodable codes (and for instantaneous codes)
 In practice, for each r.v. X with entropy $H(X)$ we can build a code with average codeword length that satisfies
$$H(X) \le L < H(X) + 1$$

Shannon-Fano coding
 The main advantage of the Shannon-Fano technique is its simplicity
 Source symbols are listed in order of nonincreasing probability.
 The list is divided in such a way as to form two groups of as nearly equal probabilities as possible
 Each symbol in the first group receives a 0 as first digit of its codeword, while the others receive a 1
 Each of these groups is then divided according to the same criterion and additional code digits are appended
 The process is continued until each group contains only one message
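The steps above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function name, the dictionary-based interface and the rule used to pick the cut (the split that minimizes the difference between the two group probabilities) are assumptions.

```python
def shannon_fano(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword string."""
    # List the symbols in order of nonincreasing probability.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return  # a group with one message needs no further digits
        total = sum(p for _, p in group)
        running, best_cut, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)  # |first group - second group|
            if diff < best_diff:
                best_cut, best_diff = i, diff
        for sym, _ in group[:best_cut]:
            codes[sym] += "0"
        for sym, _ in group[best_cut:]:
            codes[sym] += "1"
        split(group[:best_cut])
        split(group[best_cut:])

    split(items)
    return codes

dyadic = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/16, "e": 1/32, "f": 1/32}
codes = shannon_fano(dyadic)
print(codes)
```

With a single-symbol alphabet this sketch returns an empty codeword, a corner case the slides do not discuss.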

Shannon-Fano coding - example
Symb. Prob.  Codeword
a     1/2    0
b     1/4    10
c     1/8    110
d     1/16   1110
e     1/32   11110
f     1/32   11111
H=1.9375 bits
L=1.9375 bits

Shannon-Fano coding - exercise
 Encode, using Shannon-Fano
algorithm
Symb. Prob.
* 12%
? 5%
! 13%
& 2%
$ 29%
€ 13%
§ 10%
° 6%
@ 10%

Is Shannon-Fano coding optimal?
Symb. Prob.  Shannon-Fano  Another code
a     0.35   00            0
b     0.17   01            100
c     0.17   10            101
d     0.16   110           110
e     0.15   111           111
H=2.2328 bits
L=2.31 bits (Shannon-Fano code)
L1=2.3 bits (the other code)
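The figures on this slide can be reproduced with a few lines of Python. The variable and function names below are illustrative assumptions; the probabilities and codewords are the ones shown above.

```python
import math

probs = {"a": 0.35, "b": 0.17, "c": 0.17, "d": 0.16, "e": 0.15}
shannon_fano_code = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
other_code = {"a": "0", "b": "100", "c": "101", "d": "110", "e": "111"}

def entropy(probs):
    """H(X) in bits."""
    return -sum(p * math.log2(p) for p in probs.values())

def average_length(probs, code):
    """L = sum_i p_i * l_i, in bits per symbol."""
    return sum(p * len(code[s]) for s, p in probs.items())

H = entropy(probs)
L_sf = average_length(probs, shannon_fano_code)
L_other = average_length(probs, other_code)
print(round(H, 4), round(L_sf, 2), round(L_other, 2))
```

Since the second code is shorter on average, the Shannon-Fano code is not optimal for this source.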

Huffman coding - I
 There is another algorithm whose performance is slightly better than Shannon-Fano: the famous Huffman coding
 It works by constructing a tree bottom-up, with the symbols in the leaves
 The two leaves with the smallest probabilities become siblings under a parent node whose probability is the sum of the two children’s probabilities

Huffman coding - II
 At this point the operation is repeated, considering also the new parent node and ignoring its children
 The process continues until there is only one parent node with probability 1, that is the root of the tree
 Then the two branches of every non-leaf node are labeled 0 and 1 (typically, 0 on the left branch, but the order is not important)
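The bottom-up construction just described can be sketched with a priority queue. The use of `heapq`, the tie-breaking counter and the function name are implementation choices assumed here, not part of the slides; symbols are assumed to be strings. Because ties may be broken differently, the individual codewords can differ from those on the slides while the average length stays the same.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword string."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # The two nodes with the smallest probabilities become siblings.
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: (left, right)
            walk(node[0], prefix + "0")   # 0 on the left branch
            walk(node[1], prefix + "1")
        else:                             # leaf: a symbol
            codes[node] = prefix or "0"   # lone symbol gets "0" (assumption)

    walk(heap[0][2], "")
    return codes

probs = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}
codes = huffman_code(probs)
average = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, round(average, 2))
```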

Huffman coding - example
Symbol Prob.
a      0.05
b      0.05
c      0.1
d      0.2
e      0.3
f      0.2
g      0.1
[Figure: the Huffman tree built bottom-up from these leaves. The internal nodes have probabilities 0.1, 0.2, 0.3, 0.4, 0.6 and 1.0 (the root); the two branches of each internal node are labeled 0 and 1.]

Huffman coding - example
Symbol Prob. Codeword
a      0.05  0000
b      0.05  0001
c      0.1   001
d      0.2   01
e      0.3   10
f      0.2   110
g      0.1   111
Exercise: evaluate H(X) and L(X)
H(X)=2.5464 bits
L(X)=2.6 bits

Huffman coding - exercise
 Code the sequence
aeebcddegfced
and calculate the compression ratio
Symbol Prob. Codeword
a      0.05  0000
b      0.05  0001
c      0.1   001
d      0.2   01
e      0.3   10
f      0.2   110
g      0.1   111
Sol: 0000 10 10 0001 001 01 01
10 111 110 001 10 01
Aver. orig. symb. length = 3 bits
Aver. compr. symb. length = 34/13
C=.....

 Decode the sequence
0111001001000001111110
Sol: dfdcadgf
Symbol Prob. Codeword
a      0.05  0000
b      0.05  0001
c      0.1   001
d      0.2   01
e      0.3   10
f      0.2   110
g      0.1   111
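The two exercises above can be checked mechanically. The sketch below is an illustration with assumed helper names (`encode`, `decode`); it relies only on the codeword table from the slides and on the fact that the code is instantaneous, so a codeword can be emitted as soon as it is recognized.

```python
CODE = {"a": "0000", "b": "0001", "c": "001", "d": "01",
        "e": "10", "f": "110", "g": "111"}

def encode(message, code):
    """Concatenate the codewords of the symbols in the message."""
    return "".join(code[s] for s in message)

def decode(bits, code):
    """Decode a bit string symbol by symbol; valid because the code is prefix-free."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

encoded = encode("aeebcddegfced", CODE)
print(len(encoded), decode("0111001001000001111110", CODE))
```

The dictionary lookup stands in for walking the Huffman tree; a table-driven decoder for long codes is discussed with canonical Huffman coding.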

 Encode with Huffman the sequence
01$cc0a02ba10
and evaluate entropy, average codeword length and compression ratio
Symb. Prob.
a     0.10
b     0.03
c     0.14
0     0.4
1     0.22
2     0.04
$     0.07

Symb. Prob.
0 0.16
1 0.02
2 0.15
3 0.29
4 0.17
5 0.04
% 0.17
 Decode (if possible) the Huffman coded bit stream
01001011010011110101...

Huffman coding - notes
 In Huffman coding, if, at any time, there is more than one way to choose a smallest pair of probabilities, any such pair may be chosen
 Sometimes, the list of probabilities is initialized to be non-increasing and reordered after each node creation. This detail doesn’t affect the correctness of the algorithm, but it provides a more efficient implementation

 There are cases in which Huffman coding does not uniquely determine the codeword lengths, due to the arbitrary choice among equal minimum probabilities.
 For example, for a source with probabilities {0.4, 0.2, 0.2, 0.1, 0.1} it is possible to obtain codeword lengths of {1, 2, 3, 4, 4} and of {2, 2, 2, 3, 3}
 It would be better to have a code whose codeword lengths have the minimum variance, as this solution will need the minimum buffer space in the transmitter and in the receiver
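To see that the two sets of codeword lengths are equally good on average but differ in spread, one can compute the mean and the variance of the codeword length under the source probabilities. The helper name below is an assumption made for illustration.

```python
probs = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths_a = [1, 2, 3, 4, 4]
lengths_b = [2, 2, 2, 3, 3]

def mean_and_variance(probs, lengths):
    """Mean and variance of the codeword length, weighted by symbol probability."""
    mean = sum(p * l for p, l in zip(probs, lengths))
    variance = sum(p * (l - mean) ** 2 for p, l in zip(probs, lengths))
    return mean, variance

mean_a, var_a = mean_and_variance(probs, lengths_a)
mean_b, var_b = mean_and_variance(probs, lengths_b)
print(round(mean_a, 2), round(var_a, 2))
print(round(mean_b, 2), round(var_b, 2))
```

Both codes average 2.2 bits per symbol, but the second has a much smaller variance.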

 Schwarz defines a variant of the Huffman algorithm that allows building the code with minimum $l_{max}$.
 There are several other variants; we will explain the most important in a while.

Optimality of Huffman coding - I
 It is possible to prove that, in the case of character coding (one symbol, one codeword), Huffman coding is optimal
 In other words, the Huffman code has minimum redundancy
 An upper bound for the redundancy has been found
$$\text{redundancy} \le p_1 + 1 - \log_2 e + \log_2(\log_2 e) \approx p_1 + 0.086$$
where $p_1$ is the probability of the most likely symbol
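The constant 0.086 and the bound itself can be checked numerically. The sketch below evaluates the constant term and compares the bound with the redundancy of the seven-symbol Huffman code from the earlier example; the names are illustrative assumptions.

```python
import math

# Constant term of the bound: 1 - log2(e) + log2(log2(e))
CONSTANT = 1 - math.log2(math.e) + math.log2(math.log2(math.e))

def redundancy(probs, lengths):
    """Average codeword length minus entropy, in bits."""
    entropy = -sum(p * math.log2(p) for p in probs)
    average = sum(p * l for p, l in zip(probs, lengths))
    return average - entropy

# Probabilities and Huffman codeword lengths of symbols a..g from the example.
probs = [0.05, 0.05, 0.1, 0.2, 0.3, 0.2, 0.1]
lengths = [4, 4, 3, 2, 2, 3, 3]

r = redundancy(probs, lengths)
bound = max(probs) + CONSTANT
print(round(CONSTANT, 3), round(r, 4), round(bound, 3))
```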

Optimality of Huffman coding - II
 Why does a Huffman code “suffer” when there is one symbol with very high probability?
 Remember the notion of uncertainty...
$$p(x) \to 1 \;\Rightarrow\; -\log(p(x)) \to 0$$
The main problem is given by the integer constraint on codelengths!! The ideal codeword length tends to 0, but every codeword needs at least 1 bit
 This consideration opens the way to a more powerful coding... we will see it later

Huffman coding - implementation
 A Huffman code can be generated in O(n) time, where n is the number of source symbols, provided that the probabilities have been presorted (however, this sort costs O(n log n)...)
 Nevertheless, encoding is very fast
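One well-known way to reach O(n) on presorted input is to keep two queues, one for the leaves and one for the merged nodes, since merged nodes are produced in non-decreasing order of probability. The slides do not say which method they have in mind, so the sketch below is an assumption; it returns only the codeword lengths and expects at least one probability.

```python
from collections import deque

def huffman_lengths_presorted(sorted_probs):
    """Codeword lengths for probabilities given in non-decreasing order."""
    n = len(sorted_probs)
    if n == 1:
        return [1]  # assumption: a lone symbol still gets a 1-bit codeword
    leaves = deque((p, i) for i, p in enumerate(sorted_probs))  # ids 0..n-1
    merged = deque()                  # internal nodes, in creation order
    parent = [None] * (2 * n - 1)
    next_id = n

    def pop_smallest():
        if not merged or (leaves and leaves[0][0] <= merged[0][0]):
            return leaves.popleft()
        return merged.popleft()

    while len(leaves) + len(merged) > 1:
        p1, a = pop_smallest()
        p2, b = pop_smallest()
        parent[a] = parent[b] = next_id
        merged.append((p1 + p2, next_id))
        next_id += 1

    # A parent always has a larger id than its children, so a single pass
    # from the root (id 2n-2) downwards fills in every depth.
    depth = [0] * (2 * n - 1)
    for node in range(2 * n - 3, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return depth[:n]

sorted_probs = [0.05, 0.05, 0.1, 0.1, 0.2, 0.2, 0.3]
lengths = huffman_lengths_presorted(sorted_probs)
average = sum(p * l for p, l in zip(sorted_probs, lengths))
print(lengths, round(average, 2))
```

Each node is appended and popped once, so the loop does a constant amount of work per node.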

 However, the space and time complexity of the decoding phase are far more important, because, on average, decoding will happen more frequently.
 Consider a Huffman tree with n symbols
 n leaves and n-1 internal nodes
 each leaf has a pointer to a symbol and the information that it is a leaf
 each internal node has two pointers
$$2n + 2(n-1) \approx 4n \text{ words (32 bits each)}$$

 1 million symbols → 16 MB of memory! (about 4n words of 4 bytes each)
 Moreover, traversing a tree from root to leaf involves following a lot of pointers, with little locality of reference. This causes several page faults or cache misses.
 To solve this problem a variant of Huffman coding has been proposed: canonical Huffman coding

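canonical Huffman coding - I
Symb. Prob. Code 1 Code 2 Code 3
a     0.11  000    111    000
b     0.12  001    110    001
c     0.13  100    011    010
d     0.14  101    010    011
e     0.24  01     10     10
f     0.26  11     00     11
[Figure: Huffman tree for these probabilities. a (0.11) and b (0.12) merge into 0.23; c (0.13) and d (0.14) merge into 0.27; 0.23 and e (0.24) merge into 0.47; 0.27 and f (0.26) merge into 0.53; the root has probability 1.0. Labeling the branches 0 and 1 gives Code 1; the complementary labeling, shown in parentheses, gives Code 2. Which tree gives Code 3?]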

canonical Huffman coding - II
 This code (Code 3) cannot be obtained through a Huffman tree!
 We still call it a Huffman code because it is instantaneous and the codeword lengths are the same as those of a valid Huffman code
 numerical sequence property
 codewords with the same length are ordered lexicographically
 when the codewords are sorted in lexical order they are also in order from the longest to the shortest codeword
Symb. Code 3
a     000
b     001
c     010
d     011
e     10
f     11

canonical Huffman coding - III
 The main advantage is that it is not necessary to store a tree in order to decode
 We need
 a list of the symbols ordered according to the lexical order of the codewords
 an array with the first codeword of each distinct length

canonical Huffman coding - IV
Encoding. Suppose there are n distinct symbols, that for symbol i we have calculated the Huffman codelength $l_i$, and that $l_i \le maxlength$ for every i

    for k ← 1 to maxlength { numl[k] ← 0; }
    for i ← 1 to n { numl[l_i] ← numl[l_i] + 1; }
    firstcode[maxlength] ← 0;
    for k ← maxlength − 1 downto 1 {
        firstcode[k] ← (firstcode[k+1] + numl[k+1]) / 2; }
    for k ← 1 to maxlength { nextcode[k] ← firstcode[k]; }
    for i ← 1 to n {
        codeword[i] ← nextcode[l_i];
        symbol[l_i, nextcode[l_i] − firstcode[l_i]] ← i;
        nextcode[l_i] ← nextcode[l_i] + 1; }

numl[k] = number of codewords with length k
firstcode[k] = integer for the first code of length k
nextcode[k] = integer for the next codeword of length k to be assigned
symbol[-,-] = used for decoding
codeword[i] = the rightmost $l_i$ bits of this integer are the code for symbol i
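The encoding procedure above translates almost line by line into Python. In this sketch the symbols are numbered from 0 instead of 1, `symbol` is a dictionary keyed by (length, offset), and integer division is used for the halving step; these are assumptions of the illustration, as is the function name. The lengths are those of the worked example with symbols a..h.

```python
def canonical_codes(lengths):
    """lengths[i] = Huffman codelength of symbol i (symbols numbered from 0)."""
    maxlength = max(lengths)
    numl = [0] * (maxlength + 1)          # numl[k]: codewords of length k (index 0 unused)
    for l in lengths:
        numl[l] += 1
    firstcode = [0] * (maxlength + 1)     # firstcode[maxlength] stays 0
    for k in range(maxlength - 1, 0, -1):
        firstcode[k] = (firstcode[k + 1] + numl[k + 1]) // 2
    nextcode = firstcode[:]
    codeword = []                         # codewords as bit strings of length l_i
    symbol = {}                           # (length, offset) -> symbol index
    for i, l in enumerate(lengths):
        codeword.append(format(nextcode[l], "0{}b".format(l)))
        symbol[(l, nextcode[l] - firstcode[l])] = i
        nextcode[l] += 1
    return numl[1:], firstcode[1:], codeword, symbol

LENGTHS = [2, 5, 5, 3, 2, 5, 5, 2]        # symbols a, b, c, d, e, f, g, h
numl, firstcode, codeword, symbol = canonical_codes(LENGTHS)
print(numl, firstcode)
print(codeword)
```

Only the codeword lengths are needed as input, which is why a canonical code can be transmitted compactly.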

canonical Huffman - example
 1. Evaluate array numl
Symb. i  length l_i
a        2
b        5
c        5
d        3
e        2
f        5
g        5
h        2
numl: [0 3 1 0 4]
 2. Evaluate array firstcode

    firstcode[maxlength] ← 0;
    for k ← maxlength − 1 downto 1 {
        firstcode[k] ← (firstcode[k+1] + numl[k+1]) / 2; }

firstcode: [2 1 1 2 0]
 3. Construct arrays codeword and symbol

    for k ← 1 to maxlength { nextcode[k] ← firstcode[k]; }
    for i ← 1 to n {
        codeword[i] ← nextcode[l_i];
        symbol[l_i, nextcode[l_i] − firstcode[l_i]] ← i;
        nextcode[l_i] ← nextcode[l_i] + 1; }

symbol (rows: codeword length, columns: offset)
     0 1 2 3
1    - - - -
2    a e h -
3    d - - -
4    - - - -
5    b c f g

Symb. codeword bits
a     1        01
b     0        00000
c     1        00001
d     1        001
e     2        10
f     2        00010
g     3        00011
h     3        11

canonical Huffman coding - V
Decoding. We have the arrays firstcode and symbol

    v ← nextinputbit();
    k ← 1;
    while v < firstcode[k] {
        v ← 2 * v + nextinputbit();
        k ← k + 1; }
    return symbol[k, v − firstcode[k]];

nextinputbit() = function that returns the next input bit
firstcode[k] = integer for the first code of length k
symbol[k,n] = returns the symbol number n with codelength k
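The decoding loop can be tried out in Python on the arrays of the worked example. The tables below are copied from that example; representing them as dictionaries and reading the bits from a string are assumptions made for this sketch, and the input is assumed to be a complete, valid bit stream.

```python
# firstcode[k] for k = 1..5 and the symbol table of the worked example.
FIRSTCODE = {1: 2, 2: 1, 3: 1, 4: 2, 5: 0}
SYMBOL = {(2, 0): "a", (2, 1): "e", (2, 2): "h",
          (3, 0): "d",
          (5, 0): "b", (5, 1): "c", (5, 2): "f", (5, 3): "g"}

def decode(bitstring):
    """Decode a string of '0'/'1' characters with the canonical Huffman tables."""
    bits = iter(bitstring)
    out = []
    for first in bits:                       # v <- nextinputbit(); k <- 1
        v, k = int(first), 1
        while v < FIRSTCODE[k]:
            v = 2 * v + int(next(bits))      # append the next input bit
            k += 1
        out.append(SYMBOL[(k, v - FIRSTCODE[k])])
    return "".join(out)

print(decode("00111100000001001"))
```

No tree is traversed: each symbol costs one comparison per bit plus a single table lookup.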

canonical Huffman - example

    v ← nextinputbit();
    k ← 1;
    while v < firstcode[k] {
        v ← 2 * v + nextinputbit();
        k ← k + 1; }
    return symbol[k, v − firstcode[k]];

firstcode: [2 1 1 2 0]
symbol (rows: codeword length, columns: offset)
     0 1 2 3
1    - - - -
2    a e h -
3    d - - -
4    - - - -
5    b c f g
Input bits: 001 11 10 00000 01 001
symbol[3,0] = d
symbol[2,2] = h
symbol[2,1] = e
symbol[5,0] = b
symbol[2,0] = a
symbol[3,0] = d
Decoded: dhebad
