1. VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“JNANA SANGAMA”, BELAGAVI – 590018
PRESENTED BY
AKSHRA RANI (1AT21IS009)
HRITIKA DUTTA (1AT21IS047)
IRAM KHAN (1AT21IS050)
TITLE
“HUFFMAN TREE”
UNDER THE GUIDANCE OF
Dr. Jyoti Metan
Dept. of ISE, Atria IT
ATRIA INSTITUTE OF TECHNOLOGY
BANGALORE-54, Karnataka
DEPARTMENT OF INFORMATION SCIENCE &
ENGINEERING
2. CONTENTS
INTRODUCTION
NEED FOR EFFICIENT COMPRESSION
BASIC COMPONENTS OF HUFFMAN TREE
HUFFMAN ALGORITHM
IMPLEMENTATION OF A HUFFMAN TREE
ENCODING WITH HUFFMAN TREE
DECODING WITH HUFFMAN TREE
REAL WORLD APPLICATION
CONCLUSION
3. INTRODUCTION
• A Huffman tree, also known as a Huffman coding tree or Huffman binary tree, is a data structure
used in computer science and information theory for data compression.
• It is a specific type of binary tree used to generate Huffman codes: variable-length
codes that represent characters or symbols in a way that minimizes the overall data size when
encoding.
• Huffman coding is used in various data compression algorithms, such as those behind
ZIP files and JPEG images, to reduce the size of data for efficient storage and transmission.
4. • Huffman trees are of significant importance in data compression for several
reasons:
1. EFFICIENT VARIABLE-LENGTH ENCODING: Huffman coding provides a variable-length
encoding scheme where frequently occurring symbols are assigned shorter
codes, and less frequent symbols are assigned longer codes. This mapping
of symbols to codes reduces the average number of bits required to represent
the data, resulting in compression.
2. LOSSLESS COMPRESSION: Huffman coding is a lossless compression technique,
meaning that no information is lost during the compression-decompression process.
The original data can be perfectly reconstructed from the compressed
data.
3. ADAPTABILITY: Huffman coding adapts to the frequency distribution of symbols
within the data. If the data changes and the symbol frequencies change, a new
Huffman tree can be constructed to generate updated codes. This adaptability makes
it suitable for a wide range of data types.
5. 4. WIDELY USED: Huffman coding is used in many popular compression algorithms and
formats, such as ZIP files, PNG images, and MP3 audio. It plays a crucial role
in reducing the size of files for storage and transmission while maintaining data integrity.
5. MINIMAL REDUNDANCY: Huffman coding minimizes redundancy in the encoded data. For a
given set of symbol frequencies, no other prefix code that encodes one symbol at a time
has a shorter average code length.
6. SIMPLE AND FAST: Huffman encoding and decoding are relatively simple and fast
operations, making them suitable for real-time applications and scenarios where
computational resources are limited.
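As a small illustration of points 1 and 5, the sketch below compares a fixed-length encoding with a variable-length one. The four-symbol alphabet, its frequencies, and the code table are hypothetical values chosen only for illustration.

```python
# Hypothetical symbol frequencies and a prefix-free code table (illustration only).
freqs = {"a": 5, "b": 2, "c": 1, "d": 1}
codes = {"a": "0", "b": "10", "c": "110", "d": "111"}

total_symbols = sum(freqs.values())

# Fixed-length encoding: 4 distinct symbols need 2 bits each.
fixed_bits = 2 * total_symbols

# Variable-length encoding: each occurrence of a symbol costs len(code) bits.
variable_bits = sum(freqs[s] * len(codes[s]) for s in freqs)

print(fixed_bits, variable_bits)
```

The frequent symbol "a" gets a 1-bit code, so the variable-length total comes out smaller than the fixed-length total even though "c" and "d" each cost 3 bits.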
6. NEED FOR EFFICIENT COMPRESSION
1. REDUCED STORAGE SPACE: Compression reduces the amount of space required to
store data. This is critical where storage is limited or expensive, such as on
hard drives, solid-state drives, cloud storage, or mobile devices. Compressed data
lets you store more information in the same amount of space.
2. FASTER DATA TRANSFER: Compressed data can be transmitted more quickly over
networks or the internet. Smaller data sizes mean reduced bandwidth usage and faster
download or upload times, which is crucial for efficient data communication.
3. IMPROVED PERFORMANCE: Compressed data can be read from storage devices faster,
as reading smaller files takes less time. This can lead to improved system and application
performance.
4. REAL-TIME STREAMING: In applications like video streaming and online gaming, data
compression allows for smoother real-time delivery of content by reducing latency and
ensuring a consistent user experience.
7. 5. BACKUP AND ARCHIVING: Compression is often used in backup and archiving
solutions to save space and reduce the time needed for backups. It also helps
preserve historical data efficiently.
6. WEB CONTENT: Websites use compression to minimize page load times.
Technologies like GZIP compression reduce the size of HTML, CSS, and JavaScript
files before sending them to web browsers, resulting in faster website loading.
7. SECURITY: Compression is often combined with encryption. Data is normally
compressed before it is encrypted, because well-encrypted data looks random and
can no longer be compressed effectively.
8. DATA ANALYSIS: In scientific and big data applications, data compression can help
manage and analyze vast datasets more efficiently, making it easier to extract
insights and patterns from the data.
8. BASIC COMPONENTS OF HUFFMAN TREE
The fundamental components of a huffman tree include:
• NODES: A Huffman tree is composed of nodes, each of which represents either a character
or a composite node. Composite nodes are formed by combining the two lowest-frequency
nodes that have not yet been merged.
• LEAVES: The leaf nodes of a Huffman tree represent individual characters or symbols from
the input data. Each leaf node contains a character and its frequency count in the dataset.
• INTERNAL NODES: Internal nodes, also known as composite nodes or branch nodes, are
not associated with characters directly. Instead, they represent the merging of two existing
nodes. These nodes have no character associated with them but have a
cumulative frequency that is the sum of the frequencies of their child nodes.
9. • EDGES: Edges connect the nodes in the Huffman tree, forming a hierarchical
structure. Each edge represents a binary decision, with a left edge typically
representing a "0" and a right edge representing a "1" in the Huffman coding.
• ROOT NODE: The topmost node in the Huffman tree is called the root node. It is the
starting point for traversing the tree when encoding or decoding data. The root node
has no parent node.
• HUFFMAN CODES: The Huffman codes are generated by traversing the tree from
the root to the leaf nodes and recording the "0" or "1" of each edge followed. These
codes are binary representations of characters, with shorter codes assigned to more
frequently occurring characters and longer codes assigned to less frequent ones.
• FREQUENCY COUNTS: Each leaf node carries a frequency count that indicates
how many times its character appears in the input data. These
frequency counts are essential for constructing the Huffman tree and determining the
code lengths.
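The components listed above can be sketched as a single node type. This is a minimal illustration, not a reference implementation: the class name `Node`, its field names, and the sample weights are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    weight: int                      # frequency count (leaf) or sum of the children (internal)
    symbol: Optional[str] = None     # character for a leaf, None for an internal node
    left: Optional["Node"] = None    # child reached by the "0" edge
    right: Optional["Node"] = None   # child reached by the "1" edge

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


# Two sample leaves merged under an internal node whose weight is the sum of theirs.
t = Node(weight=1, symbol="t")
o = Node(weight=3, symbol="o")
parent = Node(weight=t.weight + o.weight, left=t, right=o)
```

A node with no parent is the root; the sketch does not store a parent pointer because encoding and decoding only walk downward.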
10. HUFFMAN ALGORITHM
CREATING A HUFFMAN TREE INVOLVES THE FOLLOWING ALGORITHM:
1. Calculate the frequency of each character in the input data.
2. Create a leaf node for each character with its frequency as the weight, and add the
leaves to a priority queue (min-heap) ordered by frequency.
3. While there is more than one node in the queue:
A. Remove the two nodes with the lowest frequencies from the queue.
B. Create a new internal node with these two nodes as its children and a weight equal
to the sum of their frequencies.
C. Add the new internal node to the queue.
4. The remaining node in the queue is the root of the Huffman tree.
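The steps above can be sketched with Python's standard `heapq` module. This is one possible implementation under stated assumptions: the function name `build_huffman_tree` is hypothetical, leaves are represented as plain characters, internal nodes as `(left, right)` tuples, and the input text is assumed to be non-empty.

```python
import heapq
from collections import Counter


def build_huffman_tree(text):
    """Return (total_weight, root). A leaf is a character;
    an internal node is a (left, right) tuple."""
    freqs = Counter(text)                                    # step 1
    # Step 2: heap entries are (weight, tie_breaker, node). The unique
    # tie_breaker stops Python from comparing nodes when weights are equal.
    heap = [(w, i, ch) for i, (ch, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:                                     # step 3
        w1, _, n1 = heapq.heappop(heap)                      # 3A: two lowest weights
        w2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (n1, n2)))   # 3B and 3C
        next_id += 1
    weight, _, root = heap[0]                                # step 4
    return weight, root


weight, root = build_huffman_tree("aaabbc")
print(weight, root)
```

How ties between equal weights are broken is a design choice; different choices can give different trees with the same total encoded length.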
12. STEP 1:-
Create a leaf node for the given table and arrange them
in ascending order according to their frequency
Given table:
CHARACTER   a    e    i    o    u    s    t
FREQUENCY   10   15   12   3    4    13   1
Leaf nodes in ascending order of frequency:
CHARACTER   t    o    u    a    i    s    e
FREQUENCY   1    3    4    10   12   13   15
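Step 1 can be reproduced in a few lines. The sketch below uses the frequency table from this slide; the variable names are illustrative.

```python
# Frequency table from the slide.
freqs = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}

# One (frequency, character) leaf per symbol, arranged in ascending order.
leaves = sorted((w, ch) for ch, w in freqs.items())

for w, ch in leaves:
    print(ch, w)
```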
13. STEP 2:-
Add the minimum two nodes and arrange them in ascending order
t (1) + o (3) = new internal node of weight 4, with t and o as its children.
Nodes after the merge, in ascending order:
NODE        [t,o]  u    a    i    s    e
FREQUENCY   4      4    10   12   13   15
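A single merge of this kind can be sketched with a min-heap. The string labels such as "to" are an assumption used here only to show which characters sit under a merged node.

```python
import heapq

# Leaves from step 1 as (weight, label) pairs.
queue = [(1, "t"), (3, "o"), (4, "u"), (10, "a"), (12, "i"), (13, "s"), (15, "e")]
heapq.heapify(queue)

# Remove the two lowest-weight nodes and put their combination back.
w1, n1 = heapq.heappop(queue)
w2, n2 = heapq.heappop(queue)
heapq.heappush(queue, (w1 + w2, n1 + n2))

print(sorted(queue))
```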
14. STEP 3:-
Repeat step-2 until all the elements are added to the root node.
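Merges in order (each sum becomes a new internal node):
1 (t) + 3 (o) = 4
4 (t,o) + 4 (u) = 8
8 + 10 (a) = 18
12 (i) + 13 (s) = 25
15 (e) + 18 = 33
25 + 33 = 58 (root)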
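Repeating the merge until one node remains can be sketched as follows. Two conventions here are assumptions rather than part of the algorithm: the first node removed becomes the left ("0") child, and ties between equal weights are broken by comparing the label strings. Other conventions give different but equally short codes.

```python
import heapq

freqs = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}

# Heap entries: (weight, label, node). A node is a character (leaf)
# or a (left, right) pair (internal node). Labels are unique tie-breakers.
heap = [(w, ch, ch) for ch, w in freqs.items()]
heapq.heapify(heap)

merged_weights = []
while len(heap) > 1:
    w1, l1, n1 = heapq.heappop(heap)   # lowest weight: left ("0") child
    w2, l2, n2 = heapq.heappop(heap)   # next lowest: right ("1") child
    merged_weights.append(w1 + w2)
    heapq.heappush(heap, (w1 + w2, l1 + l2, (n1, n2)))

root = heap[0][2]


def assign_codes(node, prefix=""):
    """Walk from the root, appending 0 for a left edge and 1 for a right edge."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    return {**assign_codes(left, prefix + "0"), **assign_codes(right, prefix + "1")}


codes = assign_codes(root)
print(merged_weights)
print(codes)
```

The final merged weight equals the sum of all frequencies, which is a quick sanity check on the construction.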
28. ENCODING WITH HUFFMAN TREE
CHARACTER CODE
a 111
e 10
i 00
o 11001
u 1101
s 01
t 11000
Examples:-
sat – 0111111000
01 111 11000
s a t
eat – 1011111000
to – 1100011001
out – 11001110111000
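Encoding with the table above amounts to replacing each character by its code and concatenating the results. The sketch below is a minimal illustration; the function name `encode` is hypothetical.

```python
# Code table from the slide.
codes = {"a": "111", "e": "10", "i": "00", "o": "11001",
         "u": "1101", "s": "01", "t": "11000"}


def encode(word, table):
    """Concatenate the code of every character in the word."""
    return "".join(table[ch] for ch in word)


for word in ["sat", "eat", "to", "out"]:
    print(word, encode(word, codes))
```

A character that is missing from the table raises a `KeyError`, which is reasonable here because a Huffman table only covers symbols seen when the tree was built.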
29. DECODING WITH HUFFMAN TREE
CHARACTER CODE
a 111
e 10
i 00
o 11001
u 1101
s 01
t 11000
Examples:-
010011000 – sit
01 00 11000
s i t
11111000 – at
01101100001 – sets
1111100010 – ate
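Decoding can be sketched by reading bits one at a time until they match a code. This version uses a reversed lookup table instead of walking the tree from the root; the two approaches give the same result because no Huffman code is a prefix of another. The function name `decode` is hypothetical.

```python
# Code table from the slide.
codes = {"a": "111", "e": "10", "i": "00", "o": "11001",
         "u": "1101", "s": "01", "t": "11000"}


def decode(bits, table):
    """Collect bits until they form a complete code, emit its character, repeat."""
    reverse = {code: ch for ch, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


for bits in ["010011000", "11111000", "01101100001", "1111100010"]:
    print(bits, decode(bits, codes))
```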
31. CONCLUSION
To conclude the topic of the binary Huffman tree, we would like to state some points that will help you understand it
better:
• Universal applicability: Huffman trees are a versatile and widely applicable tool in the realm of data
compression and coding. Their ability to adapt to various types of data, including text, images, audio, and more,
makes them a valuable asset in a wide range of applications.
• Efficiency and resource savings: Huffman coding uses resources efficiently, particularly in
scenarios where bandwidth and storage are limited. By assigning shorter codes to frequently occurring
symbols, it minimizes waste and optimizes resource usage.
• Lossless data preservation: Huffman coding's lossless compression ensures that data integrity remains intact
throughout the encoding and decoding process. This makes it an ideal choice for archiving, data transmission,
and any situation where data fidelity is critical.
• Human readable and understandable: Huffman codes are binary strings, but with the code table in hand they
can be traced by hand, symbol by symbol. This simplifies debugging, analysis, and manual inspection of small
encoded examples, making Huffman coding a practical choice in various contexts.
32. • Education and algorithmic understanding: Learning about Huffman trees provides valuable insight
into fundamental computer science concepts, including binary tree structures, priority queues, and
greedy algorithms. It serves as an excellent starting point for those interested in algorithm design and
data compression.
In conclusion, Huffman trees are a versatile, efficient, and educational tool with a long-standing legacy in
data compression. They offer lossless compression, resource optimization, adaptability, and codes that can
be traced by hand, making them valuable in diverse applications and an essential building block
in the world of data compression and coding.