HUFFMAN CODING / TREE

Presented By:
M. Saqib Rehan
503
Presented to:
Sir Naveed Hassan
& the whole class

OUTLINE
History of Huffman Coding
What is Huffman Coding
Definition
How Huffman Coding is Greedy
Applications of Huffman Coding
Problem Analysis
Example of Huffman Coding
JAVA Program

History of Huffman Coding
• An algorithm developed by David A. Huffman while he was a Ph.D. / Sc.D. (Doctor of Science) student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".
• A minimum-redundancy code is one constructed in such a way that the average number of coding digits per message is minimized.

What is Huffman Coding
• An application of binary trees and priority queues.
• Huffman coding is a method of shortening messages sent from one computer to another so that they can be sent more quickly.
• Each code is a binary string that is used for transmission of the corresponding message.
• For example:
  Plain text: BAGGAGE
  Code:       100 11 0 0 11 0 101
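The encoding step in the BAGGAGE example can be sketched in a few lines of Java. This is a minimal illustration, assuming the code table implied by the slide (B = 100, A = 11, G = 0, E = 101); the class name and table are illustrative only, not part of the original deck.

import java.util.Map;

public class EncodeExample {
    public static void main(String[] args) {
        // Hypothetical code table read off the slide: B=100, A=11, G=0, E=101.
        Map<Character, String> code = Map.of('B', "100", 'A', "11", 'G', "0", 'E', "101");
        StringBuilder bits = new StringBuilder();
        for (char c : "BAGGAGE".toCharArray()) {
            bits.append(code.get(c));   // replace each character by its bit string
        }
        System.out.println(bits);       // prints 1001100110101 (13 bits instead of 7 bytes)
    }
}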

Huffman Codes
• Huffman codes are an effective technique of "lossless data compression", which means no information is lost.
• The idea is to assign variable-length codes to input characters.
• The lengths of the assigned codes are based on the frequencies of the corresponding characters:
  • Most frequent character gets the smallest code
  • Least frequent character gets the largest code

How Huffman Coding is Greedy
• Huffman coding is greedy because at each step it locally chooses and merges the two nodes with the smallest weights (nodes are weighted by occurrence/frequency: characters that occur most often have the largest weights, and those that occur least often have the smallest), repeating until no unmerged nodes are left and a binary tree has been built. (A sketch of this merge loop in Java follows.)
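A minimal sketch of this greedy merge loop in Java, using java.util.PriorityQueue ordered by node weight. The Node class and its field names are illustrative assumptions, not the reference implementation from the deck.

import java.util.PriorityQueue;

public class GreedyMerge {
    // A tree node: leaves hold a character, internal nodes only a combined weight.
    static class Node {
        int weight; Character symbol; Node left, right;
        Node(int weight, Character symbol, Node left, Node right) {
            this.weight = weight; this.symbol = symbol; this.left = left; this.right = right;
        }
    }

    // Repeatedly remove the two lightest nodes and merge them until one tree remains.
    static Node buildTree(PriorityQueue<Node> pq) {
        while (pq.size() > 1) {
            Node a = pq.poll();                   // smallest weight
            Node b = pq.poll();                   // second smallest weight
            pq.add(new Node(a.weight + b.weight, null, a, b));
        }
        return pq.poll();                         // root of the Huffman tree
    }

    public static void main(String[] args) {
        PriorityQueue<Node> pq = new PriorityQueue<>((x, y) -> x.weight - y.weight);
        pq.add(new Node(5, 'f', null, null));
        pq.add(new Node(9, 'e', null, null));
        pq.add(new Node(45, 'a', null, null));
        System.out.println(buildTree(pq).weight); // prints 59, the total frequency
    }
}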

Applications
• Used to compress files for transmission.
• Decreases storage cost.
• Huffman coding is widely used for image compression, since it reduces coding redundancy and the data is recovered losslessly at the receiver end.
• Uses statistical coding: more frequently used symbols have shorter code words.
• Works well for text and fax transmissions.

Fixed & Variable Length Codes

Fixed-length code:
• If every word in the code has the same length, the code is called a fixed-length code, or a block code.

Advantage: Access is fast because the computer knows where each word starts.
Disadvantage: With fixed-length records, the records are usually larger, and therefore need more storage space and are slower to transfer.

Variable-length codes:
• The advantage of variable-length codes over fixed-length codes is that short codes can be given to characters that occur frequently.
• On average, the length of the encoded message is less than with fixed-length encoding.
• Potential problem: how do we know where one character ends and another begins?
• Not a problem if the number of bits is fixed! For example, with A = 00, B = 01, C = 10, D = 11:
  00 10 11 01 11 00 11 11 11 11 11
  A  C  D  B  D  A  D  D  D  D  D
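Decoding a fixed-length code is trivial because every code word has the same width. A minimal sketch under that assumption (the class and method names are illustrative):

public class FixedLengthDecode {
    // With a fixed code length, character boundaries are known in advance:
    // cut the stream into equal-sized chunks and look each one up.
    static String decode(String bits, int codeLength, String[] alphabet) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + codeLength <= bits.length(); i += codeLength) {
            int index = Integer.parseInt(bits.substring(i, i + codeLength), 2);
            out.append(alphabet[index]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] alphabet = {"A", "B", "C", "D"};                           // 00, 01, 10, 11
        System.out.println(decode("0010110111001111111111", 2, alphabet)); // ACDBDADDDDD
    }
}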

Prefix property
• A code has the prefix property if no character code is the prefix (start of the code) of another character's code.
• Example:
  Symbol   Code
  P        000
  Q        11
  R        01
  S        001
  T        10
• 000 is not a prefix of 11, 01, 001, or 10.
• 11 is not a prefix of 000, 01, 001, or 10, and so on.
• 01001101100010 therefore decodes unambiguously as R S T Q P T.
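The prefix property can be checked mechanically by testing every pair of code words. A minimal sketch, assuming the table above; the second map is the code on the next slide, which lacks the property.

import java.util.Map;

public class PrefixCheck {
    // Returns true if no code word is the prefix of another (the prefix property).
    static boolean hasPrefixProperty(Map<Character, String> codes) {
        for (String a : codes.values()) {
            for (String b : codes.values()) {
                if (!a.equals(b) && b.startsWith(a)) {
                    return false;                 // a starts b, so decoding is ambiguous
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Character, String> good = Map.of('P', "000", 'Q', "11", 'R', "01", 'S', "001", 'T', "10");
        Map<Character, String> bad  = Map.of('P', "0", 'Q', "1", 'R', "01", 'S', "10", 'T', "11");
        System.out.println(hasPrefixProperty(good)); // true
        System.out.println(hasPrefixProperty(bad));  // false, e.g. 0 is a prefix of 01
    }
}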

Code without prefix property
• The following code does not have the prefix property:
  Symbol   Code
  P        0
  Q        1
  R        01
  S        10
  T        11
• The pattern 1110 can be decoded as QQQP, QTP, QQS, or TS.

Problem Analysis
• Consider a file of 100 characters from a-f, with these frequencies:
  a = 45
  b = 13
  c = 12
  d = 16
  e = 9
  f = 5
Fixed-Length Code
• Typically each character in a file is stored as a single byte (8 bits).
• If we know we only have six characters, we can use a 3-bit code for the characters instead:
  a = 000, b = 001, c = 010, d = 011, e = 100, f = 101
• This is called a fixed-length code.
• With this scheme, we can encode the whole file with 300 bits (45*3 + 13*3 + 12*3 + 16*3 + 9*3 + 5*3).
• We can do better:
  • Better compression
  • More flexibility

Variable-Length Code
• Variable-length codes can perform significantly better.
• Frequent characters are given short code words, while infrequent characters get longer code words.
• Consider this scheme:
  a = 0; b = 101; c = 100; d = 111; e = 1101; f = 1100
• How many bits are now required to encode our file?
  45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4 = 224 bits
• This is in fact an optimal character code for this file.
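The 300-bit versus 224-bit arithmetic above can be reproduced with a short Java sketch; the arrays simply restate the frequencies and code lengths from the slides.

public class CodeCost {
    // Total encoded length = sum over characters of (frequency * code length).
    static int totalBits(int[] freq, int[] codeLen) {
        int bits = 0;
        for (int i = 0; i < freq.length; i++) {
            bits += freq[i] * codeLen[i];
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] freq        = {45, 13, 12, 16, 9, 5}; // a, b, c, d, e, f
        int[] fixedLen    = {3, 3, 3, 3, 3, 3};     // 3-bit fixed-length code
        int[] variableLen = {1, 3, 3, 3, 4, 4};     // a=0, b=101, c=100, d=111, e=1101, f=1100
        System.out.println(totalBits(freq, fixedLen));    // 300
        System.out.println(totalBits(freq, variableLen)); // 224
    }
}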
Techniques of Huffman Coding

Compression:
In Huffman coding, data compression shrinks a string so that it takes up less space. This is desirable for data storage and data communication. Storage space on disks is expensive, so a file that occupies less disk space is "cheaper" than an uncompressed string.

Decompression:
In decompression we convert the binary codes back into the original string.

The (Real) Basic Algorithm
• Scan the text to be compressed and tally the occurrences of all characters.
• Sort or prioritize characters based on the number of occurrences in the text.
• Build the Huffman code tree based on the prioritized list.
• Perform a traversal of the tree to determine all code words.
• Scan the text again and create the new file using the Huffman codes.
(A sketch of the first two steps in Java follows.)
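A minimal sketch of steps 1 and 2 in Java: tally character occurrences and load them into a priority queue keyed by frequency. The class name and the use of TreeMap are illustrative choices, not part of the original slides.

import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class FrequencyTally {
    public static void main(String[] args) {
        String text = "Huffman coding ,the greedy algorithm";

        // Step 1: tally the occurrences of every character.
        Map<Character, Integer> freq = new TreeMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }

        // Step 2: prioritize characters by frequency (lowest count = highest priority).
        PriorityQueue<Map.Entry<Character, Integer>> pq =
                new PriorityQueue<>(Map.Entry.comparingByValue());
        pq.addAll(freq.entrySet());

        System.out.println(text.length() + " characters"); // 36 characters
        System.out.println(freq);                          // { =4, ,=1, H=1, a=2, c=1, ...}
    }
}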

EXAMPLE
• Consider the following short text:
  Huffman coding ,the greedy algorithm
• Size = 36 characters
• Scan the text and prepare a frequency table.

Frequency table
Huffman coding ,the greedy algorithm

Character   Frequency     Character   Frequency     Character   Frequency
H           1             c           1             h           2
u           1             o           2             r           2
f           2             d           2             y           1
m           2             i           2             l           1
a           2             g           3             e           3
n           2             ,           1
Space       4             t           2

Building a Tree
Prioritize characters:
• Create binary tree nodes with the character and frequency of each character.
• Place the nodes in a priority queue.
• The lower the occurrence, the higher the priority in the queue.

Building a Tree (Enqueue)
[Diagram slides: every character is enqueued as a leaf node with its frequency (H 1, u 1, c 1, , 1, y 1, l 1, f 2, m 2, a 2, n 2, o 2, d 2, i 2, t 2, h 2, r 2, g 3, e 3, Space 4). Step by step, the two lowest-weight nodes are dequeued, merged into a parent whose weight is their sum, and the parent is enqueued again: H+u = 2, c+, = 2, y+l = 2, f+m = 4, a+n = 4, o+d = 4, i+t = 4, h+r = 4, and so on through the internal nodes (4, 6, 6, 8, 8, 8, 12, 16, 20) until a single root node of weight 36 remains.]
• After enqueuing, only one node (the root) is left in the queue.

Building a Tree
• Dequeue the single node left in the queue.
• This tree contains the new code words for each character.
• The frequency of the root node should equal the number of characters in the text:
  Huffman coding ,the greedy algorithm = 36 characters
[Diagram: the completed Huffman tree, root weight 36.]

Coding
• Now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right branch.
• A traversal of the tree from root to leaf gives the Huffman code for that particular leaf character.
• Note that no code is the prefix of another code.
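A minimal sketch of this traversal in Java: append 0 when going left and 1 when going right, and record the accumulated path at each leaf. The Node class and the tiny hand-built tree are illustrative, not the tree from the example.

import java.util.Map;
import java.util.TreeMap;

public class AssignCodes {
    static class Node {
        Character symbol; Node left, right;
        Node(Character symbol, Node left, Node right) {
            this.symbol = symbol; this.left = left; this.right = right;
        }
    }

    // Walk the tree: 0 on every left branch, 1 on every right branch.
    static void collect(Node node, String path, Map<Character, String> codes) {
        if (node == null) return;
        if (node.symbol != null) {                // leaf: the path so far is its code word
            codes.put(node.symbol, path);
            return;
        }
        collect(node.left, path + "0", codes);
        collect(node.right, path + "1", codes);
    }

    public static void main(String[] args) {
        // Tiny hand-built tree: ((a, b), c)
        Node root = new Node(null,
                new Node(null, new Node('a', null, null), new Node('b', null, null)),
                new Node('c', null, null));
        Map<Character, String> codes = new TreeMap<>();
        collect(root, "", codes);
        System.out.println(codes);                // {a=00, b=01, c=1}
    }
}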
[Diagram: the Huffman tree with a 0 label on every left branch and a 1 label on every right branch.]

New code after traversing

Character   Code      Character   Code      Character   Code
space       000       d           0101      y           11000
e           1111      o           0100      ,           10111
g           1110      n           0011      c           10110
r           1001      a           0010      u           10101
h           1000      m           11011     H           10100
t           0111      f           11010
i           0110      l           11001

• Frequent letter = fewer bits; less frequent letter = more bits.
• The tree could have been built in a number of ways; each would have yielded different codes, but the code would still be minimal.

Huffman encoding
• Original text: "Huffman coding ,the greedy algorithm"
• Using fixed-length coding, the length is characters * 8 = 36 * 8 = 288 bits.
• Using the variable-length code, the encoded form is:
  1010010101110101101011011001000110001011001000101011
  0001111100001011101111000111100011101001111111110101
  110000000010110011110100101100111100011011
• The compressed form contains 146 bits, which saves memory.

Decoding the File
• Once the receiver has the tree, it scans the incoming bit stream:
  0 ⇒ go left
  1 ⇒ go right
• (A Java sketch of this loop follows the example below.)
• The received bit stream to decode:
  101001010111010110101101100100011000101100100010101100
  011111000010111011110001111000111010011111111101011100
  00000010110011110100101100111100011011
[Diagram: the labeled Huffman tree (0 on left branches, 1 on right branches) used to decode the stream.]
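A minimal sketch of this decoding loop in Java: start at the root, follow 0 to the left and 1 to the right, emit the character at each leaf, and restart from the root. The Node class and the tiny example tree are illustrative, not the tree from the example above.

public class DecodeBits {
    static class Node {
        Character symbol; Node left, right;
        Node(Character symbol, Node left, Node right) {
            this.symbol = symbol; this.left = left; this.right = right;
        }
    }

    // 0 goes left, 1 goes right; a leaf yields a character and the walk restarts.
    static String decode(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (char bit : bits.toCharArray()) {
            node = (bit == '0') ? node.left : node.right;
            if (node.symbol != null) {            // reached a leaf
                out.append(node.symbol);
                node = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Tiny tree with codes a=00, b=01, c=1
        Node root = new Node(null,
                new Node(null, new Node('a', null, null), new Node('b', null, null)),
                new Node('c', null, null));
        System.out.println(decode("00011", root)); // prints abc
    }
}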