Text File Compressor
DSA 3rd Semester Project
Using Huffman Coding Algorithm
Problem Statement
Data compression is crucial in modern computing
Text files often contain redundant information
Need for efficient storage and transmission
Challenge: How to compress text files without losing information?
A Promising Solution
Huffman Coding
Huffman Algorithm
A data compression technique that creates variable-length codes for characters based on their frequency of occurrence.
Developed by David Huffman while a graduate student at MIT; published in 1952
Huffman Coding Implementation
Variable-length prefix coding
More frequent characters get shorter codes
Less frequent characters get longer codes
Lossless compression technique
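The prefix property is what makes variable-length codes decodable without separators: no code may be the beginning of another code. A minimal sketch of that check follows; the function name isPrefixFree is hypothetical and not taken from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the project): returns true when no code
// in `codes` is a prefix of a different code in the same list.
bool isPrefixFree(const std::vector<std::string>& codes) {
    for (std::size_t i = 0; i < codes.size(); ++i) {
        for (std::size_t j = 0; j < codes.size(); ++j) {
            if (i == j) continue;
            // codes[i] is a prefix of codes[j] when codes[j] starts with it.
            if (codes[j].compare(0, codes[i].size(), codes[i]) == 0) {
                return false;
            }
        }
    }
    return true;
}
```

For instance, the set {"0", "01", "11"} fails the check because "0" is the start of "01", so a decoder reading "01" could not tell whether one or two symbols were intended.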
Implementation
01 Character Frequency Analysis
● Count occurrences of each character
● Build frequency table
02 Priority Queue & Tree Construction
● Create nodes for each character
● Build Huffman tree using min-heap
● Merge nodes based on frequency
03 Code Generation
● Traverse tree to generate binary codes
● Left child: append '0'
● Right child: append '1'
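The three steps can be sketched in C++ roughly as follows. This is an illustrative sketch, not the project's actual source: the Node layout, the Compare functor body, and the function names countFrequencies, buildTree, generateCodes, and freeTree are all assumptions.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Assumed node layout: leaves hold a character, internal nodes only a
// combined frequency and two children.
struct Node {
    char ch;
    int freq;
    Node* left;
    Node* right;
    Node(char c, int f, Node* l = nullptr, Node* r = nullptr)
        : ch(c), freq(f), left(l), right(r) {}
    bool isLeaf() const { return left == nullptr && right == nullptr; }
};

// Makes std::priority_queue behave as a min-heap on frequency.
struct Compare {
    bool operator()(const Node* a, const Node* b) const {
        return a->freq > b->freq;
    }
};

// Step 01: count occurrences of each character.
std::map<char, int> countFrequencies(const std::string& text) {
    std::map<char, int> freq;
    for (char c : text) ++freq[c];
    return freq;
}

// Step 02: repeatedly merge the two least frequent nodes into a parent.
Node* buildTree(const std::map<char, int>& freq) {
    assert(!freq.empty());
    std::priority_queue<Node*, std::vector<Node*>, Compare> pq;
    for (const auto& entry : freq) pq.push(new Node(entry.first, entry.second));
    while (pq.size() > 1) {
        Node* left = pq.top(); pq.pop();
        Node* right = pq.top(); pq.pop();
        pq.push(new Node('\0', left->freq + right->freq, left, right));
    }
    return pq.top();
}

// Step 03: walk the tree; left appends '0', right appends '1'.
void generateCodes(const Node* node, const std::string& prefix,
                   std::map<char, std::string>& codes) {
    if (node == nullptr) return;
    if (node->isLeaf()) {
        // A one-symbol input has a leaf as root; give it a 1-bit code.
        codes[node->ch] = prefix.empty() ? "0" : prefix;
        return;
    }
    generateCodes(node->left, prefix + "0", codes);
    generateCodes(node->right, prefix + "1", codes);
}

// Releases every node allocated by buildTree.
void freeTree(Node* node) {
    if (node == nullptr) return;
    freeTree(node->left);
    freeTree(node->right);
    delete node;
}
```

Calling countFrequencies on the input text, passing the result to buildTree, and then calling generateCodes(root, "", codes) fills the code table. When several nodes share a frequency, the exact bit patterns depend on how the heap breaks ties, although the total encoded length stays optimal.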
Data Structures Used
● Priority Queue: used to build the Huffman tree
● Binary Tree: variable-length codes
● Map: used to store characters and their frequencies
● Vector: used as a container for the priority queue
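Syntax
map
● General form: map<KeyType, ValueType> mapName;
● Frequency table: map<char, int> frequency;
priority_queue
● General form: priority_queue<DataType> pq;
● Min-heap of tree nodes: priority_queue<Node*, vector<Node*>, Compare> pq;
vector
● General form: vector<DataType> vectorName;
● Used in sorting Huffman codes for binary search
Binary tree
● Leaf nodes contain actual characters
● Internal nodes hold no character (ch is left as a null placeholder)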
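One way a sorted vector of codes could support binary search during decoding is sketched below. This is an interpretation of the slide, not the project's confirmed approach; the names sortedCodeTable and decode are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: flatten a char -> code map into (code, char) rows
// sorted by code so the rows can be binary-searched.
std::vector<std::pair<std::string, char>> sortedCodeTable(
        const std::map<char, std::string>& codes) {
    std::vector<std::pair<std::string, char>> table;
    for (const auto& entry : codes) {
        table.emplace_back(entry.second, entry.first);
    }
    std::sort(table.begin(), table.end());
    return table;
}

// Hypothetical decoder: grow a candidate code one bit at a time and
// binary-search the sorted table for an exact match. This relies on the
// codes being prefix-free, so the first match is the only possible one.
std::string decode(const std::string& bits,
                   const std::vector<std::pair<std::string, char>>& table) {
    std::string out;
    std::string current;
    for (char bit : bits) {
        current += bit;
        auto it = std::lower_bound(
            table.begin(), table.end(), current,
            [](const std::pair<std::string, char>& row,
               const std::string& key) { return row.first < key; });
        if (it != table.end() && it->first == current) {
            out += it->second;
            current.clear();
        }
    }
    return out;
}
```

Each lookup costs O(log n) string comparisons for n distinct characters. Walking the Huffman tree bit by bit is the more common decoding strategy, so this sketch only illustrates the vector-plus-binary-search idea.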
Consider a file containing the following characters with these frequencies:
a = 10, b = 5, c = 2, d = 50, e = 20
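Huffman tree (each left branch is labeled 0, each right branch 1; internal nodes show the combined frequency):
87
  0: 37
    0: 17
      0: 7
        0: c (2)
        1: b (5)
      1: a (10)
    1: e (20)
  1: d (50)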
Huffman codes and compressed size (code length * frequency):
a = 001: 3 * 10 = 30
b = 0001: 4 * 5 = 20
c = 0000: 4 * 2 = 8
d = 1: 1 * 50 = 50
e = 01: 2 * 20 = 40
Total number of bits = 148
Without compression (8 bits per character):
a: 10 * 8 = 80
b: 5 * 8 = 40
c: 2 * 8 = 16
d: 50 * 8 = 400
e: 20 * 8 = 160
Number of bits = 696
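The same arithmetic can be checked in code. The sketch below uses hypothetical function names (encodedBits, fixedWidthBits) and assumes 8 bits per character for the uncompressed size.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: encoded size in bits, summing
// frequency * code length over all characters.
int encodedBits(const std::map<char, int>& freq,
                const std::map<char, std::string>& codes) {
    int total = 0;
    for (const auto& entry : freq) {
        // at() throws if a character has no code, which would signal a bug.
        total += entry.second *
                 static_cast<int>(codes.at(entry.first).size());
    }
    return total;
}

// Hypothetical helper: size in bits with a fixed 8-bit code per character.
int fixedWidthBits(const std::map<char, int>& freq) {
    int total = 0;
    for (const auto& entry : freq) total += entry.second * 8;
    return total;
}
```

With the frequencies and codes from this example, the two functions give 148 and 696, so the encoded text takes about 21% of the original size, a saving of 548 bits. This count covers the encoded text only; a real compressed file would also need to store the code table or tree so the decoder can rebuild it.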
Questions?
THANK YOU
