Design & Analysis of Algorithms
Greedy Algorithms
Huffman Coding
- Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, where the length of each assigned code is based on the frequency of the corresponding character.
- Most character code systems (ASCII, Unicode) use fixed-length encoding.
- In Huffman coding, the code length of a character depends on how frequently it occurs in the given text:
  - The character which occurs most frequently gets the smallest code.
  - The character which occurs least frequently gets the largest code.
- It is also known as Huffman Encoding.
- It typically compresses data with savings of 20% to 90%.
Prefix Codes
- The Huffman code is a prefix-free code: no codeword is a prefix of another codeword.
- For example, we could not encode t as 01 and x as 01101, since 01 is a prefix of 01101.
- By using a binary tree representation we can generate prefix codes, provided all letters are leaves.
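The prefix-free property is easy to check mechanically. The sketch below (not from the slides; `is_prefix_free` is a hypothetical helper) relies on the fact that after lexicographic sorting, any prefix relation appears between neighbouring codewords:

```python
def is_prefix_free(codes):
    """Return True if no codeword in the mapping is a prefix of another."""
    words = sorted(codes.values())
    # After sorting, if one codeword prefixes another, the pair is adjacent.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

# The slide's counterexample: t = 01 is a prefix of x = 01101.
print(is_prefix_free({"t": "01", "x": "01101"}))  # False
print(is_prefix_free({"a": "0", "b": "10", "c": "11"}))  # True
```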
Huffman Coding Process
There are two major steps in Huffman Coding:
a) Building a Huffman Tree from the input characters.
b) Assigning a code to each character by traversing the Huffman Tree.
Huffman Tree
The steps involved in the construction of a Huffman Tree are as follows:
Step-01:
Create a leaf node for each character of the text.
The leaf node of a character stores the frequency of occurrence of that character.
Step-02:
Arrange all the nodes in increasing order of their frequency value.
Step-03:
Considering the two nodes having minimum frequency:
Create a new internal node.
The frequency of this new node is the sum of the frequencies of those two nodes.
Make the first node the left child and the other node the right child of the newly created node.
Step-04:
Keep repeating Step-02 and Step-03 until all the nodes form a single tree.
The tree finally obtained is the desired Huffman Tree.
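The steps above can be sketched in Python with a binary min-heap (`heapq`). This is a minimal illustration, not code from the slides; it assumes a hypothetical tree representation in which a leaf is a single character and an internal node is a (left, right) tuple:

```python
import heapq
from itertools import count

def build_huffman_tree(freq):
    """Build a Huffman tree from {character: frequency}.
    A leaf is a character; an internal node is a (left, right) tuple."""
    tick = count()  # tie-breaker so heapq never has to compare tree nodes
    # Step-01/02: one leaf per character, ordered by frequency via the heap.
    heap = [(f, next(tick), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:  # Step-04: repeat until a single tree remains
        # Step-03: merge the two minimum-frequency nodes under a new root
        # whose frequency is the sum of theirs.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    return heap[0][2]

# Tiny example (frequencies chosen for illustration only):
tree = build_huffman_tree({"a": 5, "b": 2, "c": 1})
print(tree)  # (('c', 'b'), 'a')
```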
Fixed vs. Variable Length Memory Usage
- Huffman's greedy algorithm uses a table of the frequencies of occurrence of each character to build up an optimal way of representing each character as a binary string.

Huffman Code Problem
(Figure: two encoding trees over the alphabet C. The left tree represents a fixed-length encoding scheme; the right tree represents a Huffman encoding scheme.)
Example to find Huffman coding
- Assume we are given a data file that contains only 6 symbols, namely a, b, c, d, e, f, with the following frequency table:
- Find a variable-length prefix-free (Huffman) encoding scheme that compresses this data file as much as possible.
Example with steps
Sort the frequencies in increasing order. Then repeat the following merge step:
Pick the two nodes with minimum frequency and merge them into a tree, assigning code 0 to the left edge and 1 to the right edge; the frequency of the new root node is the sum of the two merged frequencies.
Finally, once all nodes are combined into a single tree, the Huffman tree is complete.
Example with Steps
Huffman Code For Characters
To write the Huffman code for any character, traverse the Huffman Tree from the root node to the leaf node of that character.
Following this rule, the Huffman code for each character is:
a = 0
b = 101
c = 100
d = 111
e = 1101
f = 1100
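The traversal that assigns these codes can be sketched as follows, assuming a hypothetical tree representation in which a leaf is a character and an internal node is a (left, right) tuple, with 0 on left edges and 1 on right edges (an illustration, not the slides' code):

```python
def assign_codes(node, prefix="", table=None):
    """Walk the Huffman tree, appending 0 on left edges and 1 on right edges."""
    if table is None:
        table = {}
    if isinstance(node, tuple):            # internal node: (left, right)
        assign_codes(node[0], prefix + "0", table)
        assign_codes(node[1], prefix + "1", table)
    else:                                  # leaf: a single character
        table[node] = prefix or "0"        # lone-symbol edge case
    return table

# A tree shaped to match the codes listed above (hand-built for the demo):
tree = ("a", (("c", "b"), (("f", "e"), "d")))
print(assign_codes(tree))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
```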
Find Average Code Length
Using formula-01, we have:
Average code length
= ∑ (frequencyᵢ × code lengthᵢ) / ∑ (frequencyᵢ)
= { (10 × 3) + (15 × 2) + (12 × 2) + (3 × 5) + (4 × 4) + (13 × 2) + (1 × 5) } / (10 + 15 + 12 + 3 + 4 + 13 + 1)
= 146 / 58
≈ 2.52 bits per character
Length of Huffman Encoded Message
Using formula-02, we have:
Total number of bits in Huffman encoded message
= Total number of characters in the message × Average code length per character
= 58 × 2.52
= 146.16
≅ 147 bits
(Using the exact average 146/58 instead of the rounded 2.52 gives exactly 146 bits.)
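The arithmetic above can be reproduced directly from the frequencies and code lengths used in the formula:

```python
freqs   = [10, 15, 12, 3, 4, 13, 1]   # frequencies from the formula above
lengths = [3, 2, 2, 5, 4, 2, 5]       # corresponding code lengths

total_bits  = sum(f * l for f, l in zip(freqs, lengths))  # weighted sum
total_chars = sum(freqs)
avg = total_bits / total_chars

print(total_bits, total_chars, round(avg, 2))  # 146 58 2.52
```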
Huffman Coding Algorithm
// C is a set of n characters
// Q is implemented as a binary min-heap
Building the heap: O(n). Each heap operation (extract-min, insert): O(lg n).
Total computation time = O(n lg n)
Huffman Code Problem
In the pseudocode that follows:
- We assume that C is a set of n characters and that each character c ∈ C is an object with a defined frequency f[c].
- The algorithm builds the tree T corresponding to the optimal code.
- A min-priority queue Q is used to identify the two least-frequent objects to merge together.
- The result of the merger of two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.
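The pseudocode itself did not survive in these notes; for reference, the standard min-heap formulation (as in CLRS, whose line numbering matches the running-time analysis) is:

```
HUFFMAN(C)
1  n = |C|
2  Q = C                              // build min-heap keyed on f, O(n)
3  for i = 1 to n - 1
4      allocate a new node z
5      z.left  = x = EXTRACT-MIN(Q)   // O(lg n)
6      z.right = y = EXTRACT-MIN(Q)   // O(lg n)
7      f[z] = f[x] + f[y]
8      INSERT(Q, z)                   // O(lg n)
9  return EXTRACT-MIN(Q)              // root of the tree T
```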
Running time of Huffman's algorithm
- The analysis of Huffman's algorithm assumes that Q is implemented as a binary min-heap.
- For a set C of n characters, the initialization of Q in line 2 can be performed in O(n) time using BUILD-MIN-HEAP.
- The for loop in lines 3-8 is executed exactly n - 1 times, and since each heap operation requires O(lg n) time, the loop contributes O(n lg n) to the running time. Thus, the total running time of HUFFMAN on a set of n characters is O(n lg n).
Prefix Code
- Prefix(-free) code: no codeword is also a prefix of some other codeword (unambiguous).
- An optimal data compression achievable by a character code can always be achieved with a prefix code.
- Prefix codes simplify both encoding (compression) and decoding.
- Encoding: Given the characters and their frequencies, run the algorithm to generate a code, then write out the characters using that code.
  Example: abc → 0 . 101 . 100 = 0101100
- Decoding: Given the Huffman tree, figure out what each character is (possible because of the prefix property).
  Example: 001011101 = 0 . 0 . 101 . 1101 → aabe
- Use a binary tree to represent prefix codes for easy decoding.
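Decoding by walking the tree can be sketched as follows. This assumes the hypothetical representation of a leaf as a character and an internal node as a (left, right) tuple; the demo tree is hand-built to match the example codes (a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100):

```python
def decode(bits, tree):
    """Decode a bit string by walking the tree: 0 -> left, 1 -> right.
    Every time a leaf is reached, emit its character and restart at the root."""
    out, node = [], tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if not isinstance(node, tuple):   # reached a leaf
            out.append(node)
            node = tree                   # restart from the root
    return "".join(out)

tree = ("a", (("c", "b"), (("f", "e"), "d")))
print(decode("001011101", tree))  # aabe  (the slide's decoding example)
print(decode("0101100", tree))    # abc   (the slide's encoding example, reversed)
```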
Applications of Huffman code
- Both the .mp3 and .jpg file formats use Huffman coding at one stage of the compression.