Huffman Coding
Huffman coding is a lossless data compression technique used to reduce the size of data or a message.
Computer Data Encoding:
How do we represent data in binary?
Fixed-length codes: encode every symbol by a unique binary string of a fixed length.
Example: ASCII (8-bit code).
ASCII Example:
A message is sent using ASCII code. ASCII codes are 8-bit codes:
A = 65, B = 66, C = 67, D = 68, E = 69
Suppose we have the message:
BCCABBDDAECCBBAEDDCC
Total space usage in bits:
Assume an l-bit fixed-length code. For a file of n characters, we need nl bits.
ASCII Example:
A = 01000001, B = 01000010, C = 01000011, D = 01000100, E = 01000101
For the message BCCABBDDAECCBBAEDDCC (20 characters), total bits = 8 × 20 = 160 bits (8 bits per character).
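The fixed-length cost above can be checked with a tiny Python sketch (illustrative only; the variable names are ours):

```python
# ASCII cost of the example message: 8 bits per character.
message = "BCCABBDDAECCBBAEDDCC"
total_bits = len(message) * 8
print(total_bits)  # 160
```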
Fixed-Length Codes
Idea: in order to save space, use fewer bits.
With a 3-bit code, the 20-character message takes 20 × 3 = 60 bits.
We must also send the code table so the receiver can decode the message. The total cost of the message is:
Character   Frequency   Code
A           3           000
B           5           001
C           6           010
D           4           011
E           2           100
Fixed-Length Codes
Idea: in order to save space, use fewer bits.
Encoded message:                    20 × 3 = 60 bits
Original characters in the table:   5 × 8 = 40 bits
New codes for the 5 characters:     5 × 3 = 15 bits
Total: 115 bits (message plus table)
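The 115-bit total can be reproduced in a short Python sketch. It assumes, as the slides do, that each table entry costs one 8-bit ASCII character plus one 3-bit code:

```python
# Cost of the 20-character message with a 3-bit fixed-length code
# plus the code table (8-bit ASCII character + 3-bit code per entry).
message = "BCCABBDDAECCBBAEDDCC"
code_len = 3    # bits per encoded character
ascii_len = 8   # bits per character name in the transmitted table

encoded_bits = len(message) * code_len                    # 20 * 3 = 60
table_bits = len(set(message)) * (ascii_len + code_len)   # 5 * (8 + 3) = 55
total_bits = encoded_bits + table_bits
print(encoded_bits, table_bits, total_bits)  # 60 55 115
```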
Variable-Length Codes
Idea: in order to save space, use fewer bits for frequent characters and more bits for rare characters.
The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is not a prefix of the code assigned to any other character.
Variable-Length Codes
Idea: in order to save space, use fewer bits for frequent characters and more bits for rare characters.
Example: suppose an alphabet of 3 symbols, { A, B, C }, and a file of 1,000,000 characters. A fixed-length code needs 2 bits per character, for a total of 2,000,000 bits.
Variable-Length Codes: Example
Suppose the frequency distribution of the characters is:
Character   A         B     C
Frequency   999,000   500   500
Encode with:
Character   A   B    C
Code        0   10   11
Note that the code of A has length 1, and the codes for B and C have length 2.
Total space usage in bits:
Fixed code:    1,000,000 × 2 = 2,000,000 bits
Variable code: 999,000 × 1 + 500 × 2 + 500 × 2 = 1,001,000 bits
A savings of almost 50%.
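This comparison can be sketched in Python (a minimal illustration; the dictionary names are ours):

```python
# Fixed-length vs. variable-length cost for the {A, B, C} file.
freq = {"A": 999_000, "B": 500, "C": 500}   # character counts
codes = {"A": "0", "B": "10", "C": "11"}    # variable-length prefix codes

fixed_bits = sum(freq.values()) * 2         # 2 bits per character
variable_bits = sum(freq[c] * len(codes[c]) for c in freq)
print(fixed_bits, variable_bits)  # 2000000 1001000
```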
How do we decode?
In the fixed length, we know where every
character starts, since they all have the
same number of bits.
Example: A = 00
B = 01
C = 10
00 00 00 01 01 10 10 10 01 10 01 00 00 10 10
A  A  A  B  B  C  C  C  B  C  B  A  A  C  C
How do we decode?
In the variable length code, we use an
idea called Prefix code, where no code is a
prefix of another.
Example: A = 0
B = 10
C = 11
None of the above codes is a prefix of
another.
Prefix Code
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, with corresponding variable-length codes 00, 01, 0 and 1. This coding leads to ambiguity because the code assigned to c is a prefix of the codes assigned to a and b: if the compressed bit stream is 0001, the decompressed output may be cccd, ccb, acd or ab.
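The prefix-freeness property can be checked in a few lines of Python (a sketch; the function name is ours). After sorting, any prefix sorts immediately before its extensions, so checking adjacent pairs suffices:

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of another."""
    codes = sorted(codes)
    return all(not nxt.startswith(cur) for cur, nxt in zip(codes, codes[1:]))

print(is_prefix_free(["00", "01", "0", "1"]))  # False: 0 is a prefix of 00 and 01
print(is_prefix_free(["0", "10", "11"]))       # True
```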
How do we decode?
Example: A = 0
B = 10
C = 11
So, for the string A A A B B C C C B C B A A C C, the encoding is:
0 0 0 10 10 11 11 11 10 11 10 0 0 11 11
Prefix Code
Example: A = 0, B = 10, C = 11
Decode the string:
0 0 0 10 10 11 11 11 10 11 10 0 0 11 11
A A A B  B  C  C  C  B  C  B  A A C  C
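The encode/decode round trip above can be sketched in Python (illustrative helper names). Because the code is prefix-free, the decoder can emit a character as soon as the accumulated bits match a code word:

```python
codes = {"A": "0", "B": "10", "C": "11"}  # the prefix code from the slide

def encode(message):
    return "".join(codes[ch] for ch in message)

def decode(bits):
    rev = {code: ch for ch, code in codes.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in rev:       # a complete code word has been read
            out.append(rev[cur])
            cur = ""
    return "".join(out)

bits = encode("AAABBCCCBCBAACC")
print(bits)          # 0001010111111101110001111
print(decode(bits))  # AAABBCCCBCBAACC
```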
Requirement:
Construct a variable-length code for a given file with the following properties:
1. Prefix code.
2. Using shortest possible codes.
3. Efficient.
Huffman Tree
There are two major parts in Huffman coding:
1. Build a Huffman tree from the input characters.
2. Traverse the Huffman tree and assign codes to the characters.
Steps to Build a Huffman Tree:
1. Create a leaf node for each unique character and build a min-heap of all leaf nodes.
2. Extract the two nodes with the minimum frequency from the min-heap.
3. Create a new internal node with frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child, and add this node to the min-heap.
Repeat steps 2 and 3 until the heap contains only one node. After the tree is complete, assign 0 to every left edge and 1 to every right edge.
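The steps above can be sketched in Python using the standard-library heapq as the min-heap. This is a minimal illustration, not the authors' implementation; the function name and tuple layout are ours:

```python
import heapq

def huffman_codes(freq):
    """Build a Huffman tree from a {symbol: frequency} map and
    return {symbol: bit string}, following the min-heap steps above."""
    # Step 1: one heap entry per leaf. The running counter breaks
    # frequency ties so heapq never tries to compare node objects.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    # Steps 2-3: merge the two minimum-frequency nodes until one remains.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    # Assign 0 to every left edge and 1 to every right edge.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record its code
            codes[node] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes

freq = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60}
codes = huffman_codes(freq)
cost = sum(freq[s] * len(codes[s]) for s in freq)
print(cost)  # 510
```

With these frequencies the weighted cost is 510 bits; the exact bit strings depend on how ties are broken, but the code lengths do not.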
Idea
Consider a binary tree, with:
0 meaning a left turn
1 meaning a right turn.
[Tree diagram: a binary tree whose edges are labeled 0 (left) and 1 (right), with leaves A, B, C and D; the code of a leaf is the sequence of edge labels on the path from the root.]
Algorithm Run:
Frequencies: A 10, B 20, C 30, D 40, E 50, F 60.
Merges: A + B → X 30; C + X → Y 60; D + E → Z 90; Y + F → W 120; Z + W → V 210 (root).
[Tree diagram: root V 210 with left child Z 90 and right child W 120; Z has children D 40 and E 50; W has left child Y 60 and right child F 60; Y has left child X 30 and right child C 30; X has children A 10 and B 20. Left edges are labeled 0, right edges 1.]
The Huffman Encoding:
[Same tree as in the algorithm run above; each character's code is read off the edge labels from the root down to its leaf.]
A: 1000
B: 1001
C: 101
D: 00
E: 01
F: 11
File size: 10×4 + 20×4 + 30×3 + 40×2 + 50×2 + 60×2 = 40 + 80 + 90 + 80 + 100 + 120 = 510 bits
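The 510-bit total can be double-checked in Python by summing frequency × code length over the codes read off the tree (a quick sketch):

```python
# Weighted cost of the Huffman code from the worked example.
freq = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60}
codes = {"A": "1000", "B": "1001", "C": "101", "D": "00", "E": "01", "F": "11"}
file_size = sum(freq[s] * len(codes[s]) for s in freq)
print(file_size)  # 510
```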
Note the savings:
The Huffman code requires 510 bits for the file.
A fixed-length code needs 3 bits for 6 characters; the file has 210 characters, for a total of 630 bits.
Example: Construct a Huffman code for the following data and also calculate the cost of the tree.
Character     A    B    C    D    E
Probability   12   04   45   16   23
The Huffman Encoding:
Merges: A 04 + B 12 → 16; 16 + D 16 → X 32; X + E 23 → Y 55; Y + C 45 → Z 100 (root).
[Tree diagram: root Z 100 with left child Y 55 and right child C 45; Y has left child X 32 and right child E 23; X has a left child (the A+B node, 16) and right child D 16; the A+B node has children B 12 and A 04. Left edges are labeled 0, right edges 1.]
A: 0001
B: 0000
C: 1
D: 001
E: 01
File size: 4×4 + 12×4 + 45×1 + 16×3 + 23×2 = 16 + 48 + 45 + 48 + 46 = 203 bits
Example: Construct a Huffman code for the following data, calculate the cost of the tree, and decode the bit string 1101000010001.
Character     A      B      C      D      E      F
Probability   0.35   0.12   0.21   0.05   0.18   0.09
Example: Construct a Huffman code for the following characters, whose occurrences are given below, and decode the bit string 001110001010000010 using the Huffman code.
Character    A    B    C    D    E    F    G
Occurrence   23   10   03   21   20   06   17
Example: Construct a Huffman code for the following message and decode the bit string 100010111001010 using the Huffman code.
Character     A     B     C     D      E
Probability   0.4   0.1   0.2   0.15   0.15
The Huffman Encoding:
Merges: B 0.1 + D 0.15 → X 0.25; E 0.15 + C 0.2 → Y 0.35; X + Y → W 0.6; W + A 0.4 → Z 1 (root).
[Tree diagram: root Z 1 with left child W 0.6 and right child A 0.4; W has left child X 0.25 and right child Y 0.35; X has children B 0.1 and D 0.15; Y has children E 0.15 and C 0.2. Left edges are labeled 0, right edges 1.]
A: 1
B: 000
C: 011
D: 001
E: 010
Decoding 100010111001010 as 1 000 1 011 1 001 010 gives A B A C A D E.
Huffman Tree: Time Complexity
Each extractMin() calls minHeapify() and takes O(log n) time.
Each iteration leaves one less subtree; initially there are n subtrees.
Total: O(n log n) time.
Advantages of Huffman Encoding:
1) This encoding scheme saves a lot of storage space, since the binary codes generated are variable in length.
2) It generates shorter binary codes for symbols/characters that appear more frequently in the input string.
3) The binary codes generated are prefix-free.
Disadvantages of Huffman Encoding:
1) Lossless techniques like Huffman encoding are suitable only for encoding text and program files, and are unsuitable for encoding digital images.
2) Huffman encoding is a relatively slow process, since it uses two passes: one for building the statistical model and another for encoding. Thus, lossless techniques that use Huffman encoding are considerably slower than others.
3) Since the lengths of the binary codes differ, it is difficult for decoding software to detect whether the encoded data is corrupt. This can result in incorrect decoding and, subsequently, wrong output.
Real-Life Applications of Huffman Encoding:
1) Huffman encoding is widely used in compression formats like GZIP, PKZIP (WinZip) and BZIP2.
2) Multimedia formats like PNG and MP3 use Huffman encoding (more precisely, prefix codes).