3.
History of Huffman Coding
What is Huffman Coding
Definition
How Huffman Coding is Greedy
Applications of Huffman Coding
Problem Analysis
Example of Huffman Coding
JAVA Program 3
OUTLINE
4.
an algorithm developed by David A. Huffman while
he was a Ph.D. / Sc.D.(Doctor of Science) student
at MIT, and published in the 1952 paper "A Method
for the Construction of Minimum-Redundancy
Codes".
A minimum-redundancy code is one constructed in
such a way that the average number of coding digits
per message is minimized.
4
History of Huffman Coding
5.
An Application of Binary Trees and Priority Queues
Huffman Coding is a method of shortening down
messages sent from one computer to another so that
it can be sent quicker.
Each code is a binary string that is used for
transmission of the corresponding message.
For Example :
BAGGAGE 100 11 0 0 11 0 101
Plain Text Code
5
What is Huffman Coding
6.
Huffman codes are an effective technique of ‘lossless
data compression’ which means no information is
lost.
The idea is to assign variable-length codes to input
characters
Lengths of the assigned codes are based on the
frequencies of corresponding characters
Most frequent character smallest code
Least frequent character largest code
6
Huffman Codes
7.
Huffman Code is greedy when it locally chooses and
merges two of the smallest nodes (nodes are
weighted after occurrence/frequency. Those which
occur most frequent has the largest values, and those
with that occur least has the lowest values) at a time,
until there is no more nodes left over and a binary
tree is built
7
How Huffman Coding
is Greedy
8.
Used to compress files for transmission
To decrease our storage cost
Huffman coding is widely used for image
compression. Since it reduces the coding redundancy
and results in lossless data at the receiver end
Uses statistical coding
more frequently used symbols have shorter code
words
Works well for text and fax transmissions
8
Applications
9.
Fixed length code:-
If every word in the code has the same length, the
code is called a fixed-length code, or a block code
9
Fixed & Variable length
code
Advantage Disadvantage
Access is fast because the
computer knows where
each Word starts
Using Fixed length
records, the records are
usually larger and
therefore need more
storage space and are
slower to transfer
10.
variable-length codes
The advantage of variable-length codes over fixed-
length is short codes can be given to characters that
occur frequently
on average, the length of the encoded message is less
than fixed-length encoding
Potential problem: how do we know where one
character ends and another begins?
not a problem if number of bits is fixed!
A = 00
B = 01
C = 10
D = 11
00 10 11 01 11 00 11 11 11 11 11
A C D B A D D D D D
11.
Prefix property
A code has the prefix property if no character code is the
prefix (start of the code) for another character
Example:
000 is not a prefix of 11, 01, 001, or 10
11 is not a prefix of 000, 01, 001, or 10 …
Symbol Code
P 000
Q 11
R 01
S 001
T 10
01001101100010
R S T Q P T
12.
Code without prefix
property
The following code does not have prefix property
The pattern 1110 can be decoded as QQQP, QTP, QQS,
or TS
Symbol Code
P 0
Q 1
R 01
S 10
T 11
13.
Consider a file of 100 characters from a- f, with these
frequencies:
a = 45
b = 13
c = 12
d = 16
e = 9
f = 5
13
Problem Analysis
14. Typically each character in a file is stored as a single
byte (8 bits)
If we know we only have six characters, we can use
a 3 bit code for the characters instead:
a = 000, b = 001, c = 010, d = 011, e = 100, f = 101
This is called a fixed-length code
With this scheme, we can encode the whole file with
300 bits (45*3+13*3+12*3+16*3+9*3+5*3)
We can do better
Better compression
More flexibility
14
Fixed-Length Code
15.
Variable length codes can perform significantly
better
Frequent characters are given short code words,
while infrequent characters get longer code words
Consider this scheme:
a = 0; b = 101; c = 100; d = 111; e = 1101; f = 1100
How many bits are now required to encode our file?
45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4 = 224 bits
This is in fact an optimal character code for this file
15
Variable-Length Code
16. Compression:
Data Compression, in Huffman coding, shrinks down a
String so that it takes up less space. This is desirable for
data storage and data communication. Storage space on
disks is expensive so a file which occupies less disk space
is "cheaper" than an uncompressed String.
Decompression:
In the decompression we convert the binary codes into
the Original string.
Techniques of Huffman
coding:
17.
Scan text to be compressed and tally occurrence of all
characters.
Sort or prioritize characters based on number of
occurrences in text.
Build Huffman code tree based on prioritized list.
Perform a traversal of tree to determine all code
words.
Scan text again and create new file using the
Huffman codes.
The (Real) Basic
Algorithm
18.
Consider the following short text:
Huffman coding ,the greedy algorithm
Size is = 36 Characters
Scan the text and prepare a frequency table.
18
EXAMPLE
19.
Huffman coding ,the greedy algorithm
19
Frequency table
character frequency character frequency character frequency
H 1 c 1 h 2
u 1 o 2 r 2
f 2 d 2 y 1
m 2 i 2 l 1
a 2 g 3 e 3
n 2 , 1
Space 4 t 2
20.
Building a Tree
Prioritize characters
Create binary tree nodes with character and
frequency of each character
Place nodes in a priority queue
The lower the occurrence, the higher the priority in the
queue
21.
21
Building a Tree (Enqueue)
H
1
u
1
c
1
,
1
y
1
l
1
f
2
m
2
a
2
n
2
O
2
d
2
i
2
T
2
h
2
r
2
g
3
e
3
sp
4
48.
Dequeue the single node left in the queue.
This tree contains the new code words for each
character.
Frequency of root node should equal number of
characters in text
Huffman coding ,the greedy algorithm =36 Characters
48
Building a Tree
50.
Coding
Now we assign codes to the tree by placing a
0 on every left branch and a 1 on every right
branch
A traversal of the tree from root to leaf give
the Huffman code for that particular leaf
character
Note that no code is the prefix of another
code
52.
52
New code after traversing
Charact
er
code Charact
er
Code Charact
er
Code
space 000 d 0101 y 11000
e 1111 o 0100 , 10111
g 1110 n 0011 c 10110
r 1001 a 0010 u 10101
h 1000 m 11011 H 10100
t 0111 f 11010
i 0110 l 11001
frequent letter=less-
bits
less frequent-
letter=more bits
The tree could have been
built in a number of ways;
each would yielded different
codes but the code would
still be minimal.
53.
Original text “Huffman coding ,the greedy algorithm ”
By using fixed length coding , length is
characters*8 36*8= 288
By using variable length code encoded form is:
1010010101110101101011011001000110001011001000101011
0001111100001011101111000111100011101001111111110101
110000000010110011110100101100111100011011
Compressed form contain 146 bits , save memory
Huffman encoding
54.
Once receiver has tree it scans incoming bit stream
0 ⇒ go left
1 ⇒ go right
54
Decoding the File