Lecture Notes on Huffman Coding
for Open Educational Resource on Data Compression (CA209)
by Dr. Piyush Charan
Assistant Professor
Department of Electronics and Communication Engg.
Integral University, Lucknow
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Coding or Encoding
• All information in a computer is ultimately encoded as strings
of 1s and 0s. A central goal of information theory is to
transmit information using the fewest possible bits while keeping
every encoding unambiguous. This tutorial discusses fixed-length
and variable-length encoding, along with Huffman encoding, which
underlies many widely used data compression schemes.
• Encoding, in computers, can be defined as the process of
representing a sequence of characters as bits so that it can be
transmitted or stored efficiently. Fixed-length and variable-length
encoding are two types of encoding schemes, explained as follows:
02 February 2021 Dr. Piyush Charan, Dept. of ECE, Integral University, Lucknow 2
Fixed and Variable Length Codes
• Fixed-length encoding: every character is assigned a binary code using the same
number of bits. Thus, a string like “aabacdad” (8 characters) requires 64 bits (8 bytes)
for storage or transmission, assuming that each character uses 8 bits.
• Variable-length encoding: as opposed to fixed-length encoding, this scheme
uses a variable number of bits for each character, depending on its
frequency in the given text. For the string “aabacdad”, the frequencies of the
characters ‘a’, ‘b’, ‘c’ and ‘d’ are 4, 1, 1 and 2 respectively. Since ‘a’ occurs most
frequently, it gets the fewest bits, followed by ‘d’, then ‘b’ and ‘c’.
Suppose we arbitrarily assign the following binary codes:
• a = 0, b = 011, c = 111, d = 11
• Thus, the string “aabacdad” is encoded as 00011011111011 (0 | 0 | 011 | 0 | 111 |
11 | 0 | 11), which uses 14 bits instead of the 64 bits of the fixed-length scheme.
• However, this particular code cannot always be decoded unambiguously: the code for ‘d’ (11)
is a prefix of the code for ‘c’ (111), so the same bits also decode as “aabadcad”
(0 | 0 | 011 | 0 | 11 | 111 | 0 | 11). Prefix codes, discussed later in these notes, avoid this problem.
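The bit counts above can be checked with a short Python sketch. It assumes 8 bits per character for the fixed-length case and uses the code table from this slide; the helper name encode is hypothetical. The last line shows that a different string encodes to exactly the same bits, which is why this table is not a usable code on its own.

```python
from collections import Counter

# Code table from the slide (not a prefix code: "11" is a prefix of "111")
CODE = {"a": "0", "b": "011", "c": "111", "d": "11"}

def encode(text: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of every character in text."""
    return "".join(code[ch] for ch in text)

text = "aabacdad"
freq = Counter(text)          # character frequencies
fixed_bits = 8 * len(text)    # fixed-length: 8 bits per character
variable = encode(text, CODE)

print(dict(freq))                          # {'a': 4, 'b': 1, 'c': 1, 'd': 2}
print(fixed_bits, len(variable), variable) # 64 14 00011011111011

# "11" + "111" == "111" + "11", so swapping 'c' and 'd' gives identical bits
print(encode("aabadcad", CODE) == variable)  # True
```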
Huffman Algorithm
• It is a lossless data compression algorithm.
• It assigns variable-length codes to input characters; the length of each
code depends on how frequently the character occurs (frequent characters
get shorter codes).
• The variable-length codes assigned to input characters are prefix codes:
no codeword is a prefix of any other codeword.
Types of Coding
• Different types of codes:
– fixed length code: Each codeword uses the same number
of bits.
– variable length code: In this case, each codeword can use
different numbers of bits.
Information and Entropy
• Information theory is concerned with data compression and
transmission; it builds upon probability theory and has
applications in machine learning.
• The information (self-information) of an event quantifies the
amount of surprise it carries, measured in bits: the less likely
the event, the more information it carries.
• Entropy is the average amount of information needed to
represent an event drawn from the probability distribution of
a random variable.
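In symbols, an event with probability p carries I = -log2(p) bits of information, and the entropy of a distribution is the expected information, H = -Σ p(x) log2 p(x). The minimal Python sketch below estimates the probabilities from character frequencies in a string, which is an assumption made for illustration; the function names are hypothetical.

```python
import math
from collections import Counter

def self_information(p: float) -> float:
    """Surprise of an event with probability p, in bits."""
    return -math.log2(p)

def entropy(text: str) -> float:
    """Shannon entropy (bits per symbol) of the character distribution of text."""
    counts = Counter(text)
    n = len(text)
    # Expected self-information: sum of p * I(p) over all distinct characters
    return sum((c / n) * self_information(c / n) for c in counts.values())

print(self_information(0.5))  # 1.0 (one fair coin flip)
print(entropy("aabacdad"))    # 1.75
```

For “aabacdad” the entropy is 1.75 bits per symbol, i.e. 14 bits for the 8-character string, which is exactly the length the Huffman algorithm achieves for this text. No uniquely decodable code can average fewer bits per symbol than the entropy.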
What Is Information Theory?
• Information theory is a field of study concerned with quantifying
information for communication.
• It is a subfield of mathematics and is concerned with topics like data
compression and the limits of signal processing. The field was
proposed and developed by Claude Shannon while he was working at
Bell Labs, the research laboratory of the US telephone company AT&T.
• Information theory is concerned with representing data in a compact
fashion (a task known as data compression or source coding), as
well as with transmitting and storing it in a way that is robust to
errors (a task known as error correction or channel coding).
• A foundational concept of information theory is the quantification
of the amount of information in things like events, random
variables, and distributions.
• Quantifying the amount of information requires the use of
probabilities, hence the relationship of information theory to
probability.
• Measurements of information are widely used in artificial
intelligence and machine learning, such as in the construction
of decision trees and the optimization of classifier models.
Huffman Algorithm
• Step 1: Create a leaf node for each unique character and build a min heap
of all leaf nodes, keyed on character frequency.
• Step 2: Extract the two nodes with the minimum frequency from the min heap.
• Step 3: Create a new internal node whose frequency is the sum of the
two extracted nodes' frequencies. Make the first extracted node its left child and
the other extracted node its right child. Add this node to the min heap.
• Step 4: Repeat Steps 2 and 3 until the heap contains only one node. The
remaining node is the root node and the tree is complete.
• Finally, read each character's code from the path from the root to its leaf,
writing 0 for a left edge and 1 for a right edge.
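The steps above map naturally onto Python's standard heapq module. The sketch below is one possible implementation, not the author's; the function name huffman_codes, the tuple-based tree and the counter used to break frequency ties are choices made for illustration. When frequencies tie, different implementations may produce different codewords, but the total encoded length is the same.

```python
import heapq
import itertools
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman tree for the characters of text and return their codewords.

    A leaf is a character (str); an internal node is a (left, right) tuple.
    """
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                     # edge case: a single distinct symbol
        return {next(iter(freq)): "0"}
    tie = itertools.count()                # unique tie-breaker, so nodes are never compared
    # Step 1: one leaf per character, all placed in a min heap keyed on frequency
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    # Step 4: repeat Steps 2 and 3 until only the root remains
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)  # Step 2: the two lowest frequencies
        f2, _, right = heapq.heappop(heap)
        # Step 3: new internal node; the first extracted node is the left child
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    root = heap[0][2]

    codes: dict[str, str] = {}
    def walk(node, prefix: str) -> None:
        if isinstance(node, str):          # leaf: the path so far is its codeword
            codes[node] = prefix
        else:                              # left edge = 0, right edge = 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(root, "")
    return codes

for text in ("aabacdad", "ABRACADABRA"):
    codes = huffman_codes(text)
    bits = sum(len(codes[ch]) for ch in text)
    print(text, codes, bits)               # totals: 14 and 23 bits
```

For “ABRACADABRA” the Huffman code needs 23 bits in total, fewer than the 29 bits produced by the trie of Example 2.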
Huffman Algorithm
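• Let's see how “ABRACADABRA” translates into a sequence of 0s and 1s using
the encoding trie of Example 2 (A = 010, B = 11, C = 00, D = 10, R = 011):
• 01011011010000101001011011010 (29 bits)
• Note that this trie is a valid prefix code but not the Huffman code for this text:
‘A’ occurs five times yet receives a 3-bit codeword. Running the Huffman algorithm on
“ABRACADABRA” (frequencies A = 5, B = 2, R = 2, C = 1, D = 1) gives a 23-bit encoding.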
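Example 2
(Figure: encoding trie with leaves C, A, R, D and B; left edges are labelled 0 and right edges 1, giving C = 00, A = 010, R = 011, D = 10, B = 11.)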
File Compression
• Text files are usually stored by representing each character with one 8-bit
byte holding its ASCII code (ASCII itself is a 7-bit code; type man ascii in a
Unix shell to see the table)
• The ASCII encoding is an example of fixed-length encoding, where each
character is represented with the same number of bits.
• In order to reduce the space required to store a text file, we can exploit the
fact that some characters are more likely to occur than others.
• Variable-length encoding uses binary codes of different lengths for different
characters. Thus, we can assign fewer bits to frequently used characters
and more bits to rarely used characters.
File Compression: Example
• An Encoding Example
– Text: java
– Encoding: a = “0”, j = “11”, v = “10”
– Encoded text: 110100 (6 bits)
• How do we decode? Ambiguity can arise:
– Encoding: a = “0”, j = “01”, v = “00”
– Encoded text: 010000 (6 bits)
– This could be “java”, “jvv” or “jaaaa”, among other readings
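A small brute-force decoder makes the ambiguity concrete: it lists every string that encodes to a given bit sequence. The function name all_decodings is hypothetical, and the approach is exponential in the worst case, so it is only meant for tiny examples like these.

```python
def all_decodings(bits: str, code: dict[str, str]) -> list[str]:
    """Return every string whose encoding under code equals bits (brute force)."""
    if not bits:
        return [""]
    results = []
    for ch, word in code.items():
        if bits.startswith(word):  # try each codeword that matches at this position
            for rest in all_decodings(bits[len(word):], code):
                results.append(ch + rest)
    return sorted(results)

good = {"a": "0", "j": "11", "v": "10"}  # satisfies the prefix rule
bad = {"a": "0", "j": "01", "v": "00"}   # violates it
print(all_decodings("110100", good))     # ['java']
print(all_decodings("010000", bad))      # ['jaaaa', 'jaav', 'java', 'jvaa', 'jvv']
```

With the second code the bit string 010000 has five valid readings, of which “java” is only one.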
Encoding Tree
• To prevent ambiguities in decoding, we require that the encoding
satisfies the prefix rule: no code is a prefix of another.
• a = “0”, j = “11”, v = “10” satisfies the prefix rule.
• a = “0”, j = “01”, v = “00” does not satisfy the prefix rule (the code of
‘a’ is a prefix of the codes of ‘j’ and ‘v’).
• Note: if your codes satisfy the prefix rule, decoding is always unambiguous,
and each character can be output as soon as its last bit is read. If they do
not, decoding may be ambiguous, as in the java example.
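The prefix rule can be checked mechanically. In this minimal sketch (the function name is hypothetical), sorting the codewords places any codeword directly before the codewords it is a prefix of, so only neighbouring pairs need to be compared. Duplicate codewords are also rejected, since a word is a prefix of itself.

```python
def satisfies_prefix_rule(code: dict[str, str]) -> bool:
    """True if no codeword is a prefix of another codeword."""
    words = sorted(code.values())
    # In lexicographic order, every word starting with w comes right after w,
    # so it is enough to compare each word with its successor.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(satisfies_prefix_rule({"a": "0", "j": "11", "v": "10"}))  # True
print(satisfies_prefix_rule({"a": "0", "j": "01", "v": "00"}))  # False
```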
Encoding Tree contd…
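• We use an encoding trie to satisfy this prefix rule:
– the characters are stored at the external nodes (leaves), never at internal nodes
– a left child (edge) means 0
– a right child (edge) means 1
– the code of a character is the sequence of edge labels on the path from the root to its leaf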
(Figure: encoding trie rooted at Root, with leaves C, A, R, D and B; left edges are labelled 0, right edges 1.)
A = 010
B = 11
C = 00
D = 10
R = 011
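Because every character sits at a leaf, reading the edge labels along each root-to-leaf path yields a prefix code. The sketch below assumes a nested-tuple representation of the trie (a choice made for illustration, not part of the notes; TRIE and codes_from_trie are hypothetical names) and reproduces the code table and the 29-bit encoding of “ABRACADABRA”.

```python
# Encoding trie from the slide, written as nested (left, right) tuples.
# Leaves (external nodes) are single characters; a left edge is 0, a right edge is 1.
TRIE = (("C", ("A", "R")), ("D", "B"))

def codes_from_trie(node, prefix: str = "") -> dict[str, str]:
    """Read each character's codeword off its root-to-leaf path."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    return {**codes_from_trie(left, prefix + "0"), **codes_from_trie(right, prefix + "1")}

codes = codes_from_trie(TRIE)
encoded = "".join(codes[ch] for ch in "ABRACADABRA")
print(codes)                 # {'C': '00', 'A': '010', 'R': '011', 'D': '10', 'B': '11'}
print(encoded, len(encoded)) # 01011011010000101001011011010 29
```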
Example of Decoding
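• Starting at the root, we read the bits one at a time, moving to the left child on 0
and to the right child on 1; when a leaf is reached, we output its character and
return to the root.
• Trie: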
(Figure: the same encoding trie as on the previous slide: A = 010, B = 11, C = 00, D = 10, R = 011.)
• Encoded text: 01011011010000101001011011010
• Text: ABRACADABRA
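The decoding procedure is a simple walk over the trie. This is a minimal sketch assuming the same nested-tuple representation of the trie (TRIE and decode are hypothetical names); it also rejects a bit string that stops in the middle of a codeword.

```python
TRIE = (("C", ("A", "R")), ("D", "B"))  # left edge = 0, right edge = 1

def decode(bits: str, root) -> str:
    """Walk from the root one bit at a time; emit a character at each leaf."""
    out = []
    node = root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # reached an external node (leaf)
            out.append(node)
            node = root            # restart at the root for the next codeword
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)

print(decode("01011011010000101001011011010", TRIE))  # ABRACADABRA
```

Because the code is prefix-free, the decoder never needs to look ahead: each character is emitted as soon as its leaf is reached.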
