3. Definition
David A. Huffman, while he was a Sc.D. student at MIT invented a
greedy algorithm that constructs an optimal prefix code for the
purpose of lossless data compression, this specific algorithm is called
the Huffman code.
-Introduction to Algorithms, CLRS
4. Terminology
Greedy Algorithm : A solution which always takes the locally optimal
choice in the hope that this choice will lead to a globally optimal
solution.
Data compression : The process of encoding, restructuring or otherwise
modifying data in order to reduce its size. Fundamentally, it involves re-
encoding information using fewer bits than the original representation.
5. Applications
Huffman encoding is widely used in compression formats like WINZIP
Multimedia codecs like JPEG, PNG and MP3 use Huffman encoding
Cloud systems and servers use Huffman code to compress data and
save valuable storage space
Huffman encoding still dominates the compression industry since
newer schemes are avoided due to their patent issues.
7. Encoding
1. Create a leaf node for each symbol and add it to the priority queue.
2. While there is more than one node in the queue:
2.1 : Remove the two nodes of highest priority from the queue.
2.2 : Create a new internal node with these two nodes as
children and with frequency equal to the sum of the two nodes' frequency.
2.3 : Add the new node to the queue.
3. The remaining node is the root node and the Huffman tree is complete.
8. Formula
The total number of bits required to encode a file of characters is given
as the summation of the product of each character’s frequency with its
depth in the Huffman tree.
9. Decoding
The process of decompression is simply a matter of translating the
stream of prefix codes to individual byte value, usually by traversing the
Huffman tree node by node as each bit is read from the input stream.
Reaching a leaf node necessarily terminates the search for that
particular byte value. The leaf value represents the desired character.
11. Problem statement
For a given frequency table of characters determine :-
1. The Huffman code for each character
2. Average code length
3. Total length of Huffman encoded message (in bits)