Greedy Algorithms
Huffman Coding
Huffman coding is a compression technique used to reduce the size of data or a message.
(It is a lossless data compression technique.)
Computer Data Encoding:
How do we represent data in binary?
Fixed-length codes: encode every symbol by a unique binary string of a fixed length.
Example: ASCII (an 8-bit code), the American Standard Code for Information Interchange.
ASCII Example:
ABCA
A        B        C        A
01000001 01000010 01000011 01000001
ASCII Example:
A message sent using ASCII codes; each ASCII code is 8 bits.
A = 65, B = 66, C = 67, D = 68, E = 69
Suppose we have the message BCCABBDDAECCBBAEDDCC.
Total space usage in bits:
Assume an l-bit fixed-length code. For a file of n characters, we need nl bits.
ASCII Example:
A        B        C        D        E
01000001 01000010 01000011 01000100 01000101
For the 20-character message BCCABBDDAECCBBAEDDCC, the total is 8 × 20 = 160 bits (8 bits per character).
Fixed-Length Codes
Idea: to save space, use fewer bits.
There are 20 characters, so the message costs 20 × 3 = 60 bits.
We must also send the code table so the receiver can decode the message. The total cost of the message is:

Character  Frequency  Code
A          3          000
B          5          001
C          6          010
D          4          011
E          2          100
Fixed-Length Codes
Idea: to save space, use fewer bits.
Encoded message: 20 × 3 = 60 bits
Table, original characters: 5 × 8 = 40 bits
Table, new codes for the 5 characters: 5 × 3 = 15 bits
Total: 60 + 40 + 15 = 115 bits (message plus table)

Character  Frequency  Code
A          3          000
B          5          001
C          6          010
D          4          011
E          2          100
Variable-Length Codes
Idea: to save space, use fewer bits for frequent characters and more bits for rare characters.
The variable-length codes assigned to the input characters are prefix codes: the codes (bit sequences) are assigned in such a way that the code assigned to one character is not a prefix of the code assigned to any other character.
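The prefix property can be checked mechanically. A minimal sketch in Python (the function name is our own):

```python
def is_prefix_free(codes):
    """Return True if no code in the collection is a prefix of another."""
    codes = sorted(codes)  # any prefix of a string sorts immediately before it
    return all(not codes[i + 1].startswith(codes[i])
               for i in range(len(codes) - 1))

print(is_prefix_free(["0", "10", "11"]))       # True: prefix-free
print(is_prefix_free(["00", "01", "0", "1"]))  # False: "0" is a prefix of "00"
```

Sorting makes the check linear in the number of codes: only adjacent pairs in sorted order can be in a prefix relationship.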
Variable-Length Codes
Idea: to save space, use fewer bits for frequent characters and more bits for rare characters.
Example: suppose an alphabet of 3 symbols, { A, B, C }, and a file of 1,000,000 characters.
A fixed-length code needs 2 bits per character, for a total of 2,000,000 bits.
Variable-Length Codes: Example
Suppose the frequency distribution of the characters is:

Character  A        B    C
Frequency  999,000  500  500
Code       0        10   11

Note that the code of A is of length 1, and the codes for B and C are of length 2.
Encode:
Fixed code: 1,000,000 × 2 = 2,000,000 bits
Variable code: 999,000 × 1 + 500 × 2 + 500 × 2 = 1,001,000 bits
Total space usage in bits: a savings of almost 50%.
How do we decode?
In the fixed-length case, we know where every character starts, since they all have the same number of bits.
Example: A = 00, B = 01, C = 10
000000010110101001100100001010
A  A  A  B  B  C  C  C  B  C  B  A  A  C  C
How do we decode?
In the variable-length case, we use an idea called a prefix code, where no code is a prefix of another.
Example: A = 0, B = 10, C = 11
None of the above codes is a prefix of another.
Prefix Code
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c, and d, with variable-length codes 00, 01, 0, and 1 respectively.
This coding leads to ambiguity because the code assigned to c is a prefix of the codes assigned to a and b. If the compressed bit stream is 0001, the decompressed output may be cccd, ccb, acd, or ab.
How do we decode?
Example: A = 0, B = 10, C = 11
So, for the string A A A B B C C C B C B A A C C, the encoding is:
0 0 0 10 10 11 11 11 10 11 10 0 0 11 11
Prefix Code
Example: A = 0, B = 10, C = 11
Decode the string 0001010111111101110001111:
A A A B B C C C B C B A A C C
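This greedy decoding can be sketched in a few lines of Python (the function name is our own):

```python
def decode(bits, code_table):
    """Decode a bit string using a prefix-free code table {symbol: code}."""
    inverse = {code: sym for sym, code in code_table.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # prefix-freeness makes this match unambiguous
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

codes = {"A": "0", "B": "10", "C": "11"}
print(decode("0001010111111101110001111", codes))  # AAABBCCCBCBAACC
```

Because no code is a prefix of another, the accumulated buffer matches at most one code at each step, so the greedy scan never has to backtrack.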
Requirement:
Construct a variable-length code for a given file with the following properties:
1. It is a prefix code.
2. It uses the shortest possible codes.
3. It is efficient.
There are two major parts in Huffman coding:
1. Build a Huffman tree from the input characters.
2. Traverse the Huffman tree and assign codes to the characters.
Huffman Tree
Steps to Build a Huffman Tree:
1. Create a leaf node for each unique character and build a min-heap of all leaf nodes.
2. Extract the two nodes with the minimum frequency from the min-heap.
3. Create a new internal node with frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other its right child, and add the new node to the min-heap.
4. Repeat steps 2 and 3 until the heap contains only one node.
After the tree is complete, assign 0 to each left edge and 1 to each right edge throughout the tree.
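The steps above can be sketched with Python's heapq module as the min-heap (the structure and names are our own; ties between equal frequencies may be broken differently than in the worked examples, so individual codes can differ while the total cost stays optimal):

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman tree from {symbol: frequency} and return {symbol: code}."""
    tiebreak = count()  # unique counter so heap never compares tree nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two minimum-frequency nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):         # internal node: 0 left, 1 right
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"     # single-symbol alphabet edge case
        return codes
    return assign(heap[0][2], "")

print(huffman_codes({"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60}))
```

With these frequencies, the weighted cost sum(freq[s] * len(code[s])) comes to 510 bits, matching the worked example below, even though the particular codes may differ from the slides' tree.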
Idea
Consider a binary tree, with:
0 meaning a left turn
1 meaning a right turn.
(Tree diagram: each left edge is labeled 0 and each right edge 1; the leaves are A, B, C, and D.)
Huffman Tree Example:
Alphabet: A, B, C, D, E, F
Frequency table:

Character  A   B   C   D   E   F
Frequency  10  20  30  40  50  60

Total file length: 210
Algorithm Run:
Start: A 10, B 20, C 30, D 40, E 50, F 60
Step 1: extract A (10) and B (20); merge into X (30). Heap: C 30, X 30, D 40, E 50, F 60
Step 2: extract X (30) and C (30); merge into Y (60). Heap: D 40, E 50, F 60, Y 60
Step 3: extract D (40) and E (50); merge into Z (90). Heap: F 60, Y 60, Z 90
Step 4: extract Y (60) and F (60); merge into W (120). Heap: Z 90, W 120
Step 5: extract Z (90) and W (120); merge into the root V (210).
Finally, label each left edge 0 and each right edge 1 throughout the tree.
The Huffman Encoding:
A: 1000
B: 1001
C: 101
D: 00
E: 01
F: 11
File size: 10×4 + 20×4 + 30×3 + 40×2 + 50×2 + 60×2 = 40 + 80 + 90 + 80 + 100 + 120 = 510 bits
Note the savings:
The Huffman code requires 510 bits for the file.
A fixed-length code needs 3 bits for 6 characters; the file has 210 characters, for a total of 630 bits.
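The 510-versus-630 comparison is just a weighted sum of code lengths; a quick check (variable names are our own):

```python
freq = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60}
huff = {"A": "1000", "B": "1001", "C": "101", "D": "00", "E": "01", "F": "11"}

# Huffman cost: each character contributes (frequency x code length) bits.
huffman_bits = sum(freq[s] * len(huff[s]) for s in freq)
# Fixed-length cost: 3 bits per character suffice for 6 symbols.
fixed_bits = sum(freq.values()) * 3

print(huffman_bits)  # 510
print(fixed_bits)    # 630
```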
Example: Construct a Huffman code for the following data and calculate the cost of the tree.

Character  A   B  C   D   E
Frequency  12  4  45  16  23
The Huffman Encoding:
Merging the two smallest nodes repeatedly: A (4) + B (12) → W (16); W (16) + D (16) → X (32); E (23) + X (32) → Y (55); Y (55) + C (45) → Z (100), the root.
A: 0001
B: 0000
C: 1
D: 001
E: 01
File size: 4×4 + 12×4 + 45×1 + 16×3 + 23×2 = 16 + 48 + 45 + 48 + 46 = 203 bits
Example: Construct a Huffman code for the following data, calculate the cost of the tree, and decode the string 1101000010001.

Character    A     B     C     D     E     F
Probability  0.35  0.12  0.21  0.05  0.18  0.09
Example: Construct a Huffman code for the following message, whose character frequencies are given below, and decode the string 001110001010000010, which was encoded using the Huffman code.

Character  A   B   C  D   E   F  G
Frequency  23  10  3  21  20  6  17
Example: Construct a Huffman code for the following message and decode the string 100010111001010, which was encoded using the Huffman code.

Character    A    B    C    D     E
Probability  0.4  0.1  0.2  0.15  0.15
The Huffman Encoding:
Merging the two smallest nodes repeatedly: B (0.1) + D (0.15) → X (0.25); E (0.15) + C (0.2) → Y (0.35); X (0.25) + Y (0.35) → W (0.6); W (0.6) + A (0.4) → Z (1), the root.
A: 1
B: 000
C: 011
D: 001
E: 010
How do we decode the string 100010111001010?
With the codes A = 1, B = 000, C = 011, D = 001, E = 010, read left to right and match each complete code: 1 → A, 000 → B, 1 → A, 011 → C, 1 → A, 001 → D, 010 → E, giving the message ABACADE.
Huffman Tree Complexity:
Since extractMin( ) calls minHeapify( ), each extraction takes O(log n) time.
Each iteration leaves one less subtree; initially there are n subtrees.
Total: O(n log n) time.
Advantages of Huffman Encoding
1) This encoding scheme saves a lot of storage space, since the binary codes generated are variable in length.
2) It generates shorter binary codes for symbols/characters that appear more frequently in the input string.
3) The binary codes generated are prefix-free.
Disadvantages of Huffman Encoding
1) Lossless techniques like Huffman encoding are suitable only for encoding text and program files, and are unsuitable for encoding digital images.
2) Huffman encoding is a relatively slow process, since it uses two passes: one for building the statistical model and another for encoding. Thus, lossless techniques that use Huffman encoding are considerably slower than others.
3) Since the binary codes vary in length, it is difficult for the decoding software to detect whether the encoded data is corrupt. This can result in incorrect decoding and, consequently, wrong output.
Real-Life Applications of Huffman Encoding
1) Huffman encoding is widely used in compression formats such as GZIP, PKZIP (WinZip), and BZIP2.
2) Multimedia formats such as PNG and MP3 use Huffman encoding (more precisely, prefix codes).
