Huffman Encoding



           Βαγγέλης Δούρος
           EY0619




1
Text Compression

     On a computer: changing the representation
     of a file so that it takes less space to store
     and/or less time to transmit.
     –   The original file can be reconstructed exactly from the
         compressed representation.
     Different from data compression in general:
     –   text compression has to be lossless;
     –   compare with sound and images, where small changes
         and noise are tolerated.

2
First Approach

     Consider the word ABRACADABRA.
     What is the most economical way to write this
     string in a binary representation?
     Generally speaking, if a text consists of N
     different characters, we need $\lceil \log_2 N \rceil$ bits to
     represent each one using a fixed-length
     encoding.
     Thus, it would require 3 bits for each of 5
     different letters, or 33 bits for 11 letters.
     Can we do it better?
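As a quick check of the fixed-length arithmetic, here is a minimal sketch (the function name `fixed_length_bits` is hypothetical) that computes $\lceil \log_2 N \rceil$ bits per symbol and the total for a string. It assumes at least 1 bit per symbol even when the text has a single distinct character.

```python
import math

def fixed_length_bits(text: str) -> tuple[int, int]:
    """Return (bits per symbol, total bits) for a fixed-length binary code."""
    n_distinct = len(set(text))
    # ceil(log2 N) bits distinguish N symbols; assume at least 1 bit if N == 1.
    per_symbol = max(1, math.ceil(math.log2(n_distinct)))
    return per_symbol, per_symbol * len(text)

per_symbol, total = fixed_length_bits("ABRACADABRA")
print(per_symbol, total)  # 3 33
```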

3
Yes!!!!

    We can do better, provided:
    –   Some characters are more frequent than others.
    –   Characters may have different bit lengths so that, for
        example, in the English alphabet the letter a may use
        only one or two bits, while the letter y may use
        several.
    –   We have a unique way of decoding the bit stream.



4
Using Variable-Length Encoding (1)

     Magic word: ABRACADABRA
     Let A = 0
         B = 100
         C = 1010
         D = 1011
         R = 11
     Thus, ABRACADABRA = 01001101010010110100110
     So 11 letters demand 23 bits < 33 bits, an
     improvement of about 30%.
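The encoding above can be reproduced with a small sketch: a hypothetical `encode` helper concatenates the codeword of each letter using the table from this slide, which confirms the 23-bit result and the roughly 30% saving over the 33-bit fixed-length code.

```python
# Variable-length code table from the slide.
CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def encode(text: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of every character in text."""
    return "".join(code[ch] for ch in text)

bits = encode("ABRACADABRA", CODE)
print(bits)                          # 01001101010010110100110
print(len(bits))                     # 23
print(f"{1 - len(bits) / 33:.1%}")   # 30.3%
```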



5
Using Variable-Length Encoding (2)

     However, there is a serious danger: how do we ensure
     unique reconstruction?
     Let A → 01 and B → 0101.
     How to decode 010101?
     AB?
     BA?
     AAA?
     No problem…
     if we use prefix codes: no codeword is a prefix of
     another codeword.
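To see why the prefix property matters, the following sketch (helper names `is_prefix_free` and `all_parses` are hypothetical) checks whether a set of codewords is prefix-free and lists every way a bit string can be split into codewords. The exhaustive split is exponential in general and is meant only for illustration.

```python
def is_prefix_free(codewords: list[str]) -> bool:
    """True if no codeword is a prefix of another (duplicates also fail)."""
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(codewords)
        for j, b in enumerate(codewords)
    )

def all_parses(bits: str, code: dict[str, str]) -> list[str]:
    """Every way to split bits into codewords (exponential; illustration only)."""
    if not bits:
        return [""]
    results = []
    for sym, word in code.items():
        if bits.startswith(word):
            results += [sym + rest for rest in all_parses(bits[len(word):], code)]
    return results

AMBIGUOUS = {"A": "01", "B": "0101"}
PREFIX_CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

print(is_prefix_free(list(AMBIGUOUS.values())))          # False
print(all_parses("010101", AMBIGUOUS))                   # ['AAA', 'AB', 'BA']
print(is_prefix_free(list(PREFIX_CODE.values())))        # True
print(all_parses("01001101010010110100110", PREFIX_CODE))  # ['ABRACADABRA']
```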

6
Prefix Codes (1)

     Any prefix code can be represented by a binary
     tree (an optimal one by a full binary tree).
     Each leaf stores a symbol.
     Each internal node has two children: the left branch
     means 0, the right means 1.
     codeword = the path from the root to the leaf,
     reading each left branch as 0 and each right
     branch as 1.
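A minimal sketch of "codeword = path from the root to the leaf", assuming a hypothetical representation in which a leaf is a one-character string and an internal node is a `(left, right)` tuple. The tree below is the one for the ABRACADABRA code used in these slides.

```python
def codewords(tree, path=""):
    """Map each leaf symbol to its root-to-leaf path (left = 0, right = 1)."""
    if isinstance(tree, str):      # a leaf stores a symbol
        return {tree: path}
    left, right = tree             # an internal node has exactly two children
    return {**codewords(left, path + "0"), **codewords(right, path + "1")}

# Hypothetical nested-tuple tree for A=0, B=100, C=1010, D=1011, R=11.
TREE = ("A", (("B", ("C", "D")), "R"))
print(codewords(TREE))
# {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}
```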

7
Prefix Codes (2)
     ABRACADABRA

     A = 0
     B = 100
     C = 1010
     D = 1011
     R = 11
     Decoding is unique and simple!
     Read the bit stream from left to
     right, starting from the root;
     whenever a leaf is reached,
     write down its symbol and
     return to the root.




8
Prefix Codes (3)

     Let $f_i$ be the frequency of the i-th symbol and
     $d_i$ the number of bits required for the i-th
     symbol (= the depth of this symbol in the tree), $1 \le i \le n$.
     How do we find the optimal coding tree, which
     minimizes the cost of the tree $C = \sum_{i=1}^{n} f_i d_i$?



      –   Frequent characters should have short
          codewords
      –   Rare characters should have long codewords
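The cost $C = \sum_{i=1}^{n} f_i d_i$ is easy to compute once $d_i$ is taken as the codeword length. In this sketch (helper name `tree_cost` is hypothetical) the frequencies are raw counts from ABRACADABRA, so $C$ equals the length of the encoded string.

```python
from collections import Counter

def tree_cost(freq: dict[str, int], code: dict[str, str]) -> int:
    """C = sum of f_i * d_i, where d_i is the length (tree depth) of codeword i."""
    return sum(f * len(code[sym]) for sym, f in freq.items())

CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}
freq = Counter("ABRACADABRA")   # A: 5, B: 2, R: 2, C: 1, D: 1
print(tree_cost(freq, CODE))    # 23
```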

9
Huffman’s Idea
      From the previous definition of the cost of a tree, we can always
      find an optimal tree in which the two symbols with the smallest
      frequencies sit at the bottom, as the two children of a deepest
      internal node: moving a rarer symbol to a deeper position never
      increases the cost.
      This suggests building the optimal code bottom-up!
      Huffman's idea is a greedy approach based on these observations.
      Start with one leaf per symbol, weighted by its frequency, and
      repeat until all nodes are merged into one tree:
       –   Remove the two nodes with the lowest frequencies.
       –   Create a new internal node with the two just-removed nodes as
           children (either node can be either child) and the sum of their
           frequencies as its frequency.
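The greedy loop can be sketched on frequencies alone. This version (function name `huffman_cost` is hypothetical) uses Python's `heapq` as the min-priority queue and returns only the cost of the resulting tree: every merge creates an internal node whose weight is the sum of its children, and each leaf's frequency is counted once per ancestor, i.e. $d_i$ times, so the sum of all internal-node weights equals $C$.

```python
import heapq

def huffman_cost(freqs: list[int]) -> int:
    """Run Huffman's merge loop on frequencies only; return C = sum f_i * d_i."""
    heap = list(freqs)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)       # lowest frequency
        b = heapq.heappop(heap)       # second-lowest frequency
        heapq.heappush(heap, a + b)   # new internal node
        cost += a + b                 # its weight counts once per leaf below it
    return cost

print(huffman_cost([5, 2, 1, 1, 2]))       # 23 (ABRACADABRA counts)
print(huffman_cost([40, 20, 10, 10, 20]))  # 220
```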


10
Constructing a Huffman Code (1)

      Assume that the frequencies of the symbols are:
      –   A: 40 B: 20 C: 10 D: 10 R: 20
      The smallest numbers are 10 and 10 (C and D), so
      connect them.




11
Constructing a Huffman Code (2)
      C and D have now been
      used, and the new node
      above them (call it C+D) has
      value 20.
      The smallest values are B,
      C+D, and R, all of which
      have value 20.
       –   Connect any two of these.
      The algorithm therefore
      does not construct a unique
      tree, but whichever two we
      connect, the resulting code
      is optimal.

12
Constructing a Huffman Code (3)

      Suppose we connected B with C+D (value 40). The smallest
      value is now R (20), while A and B+C+D both have value 40.
      Connect R to either of the others.




13
Constructing a Huffman Code(4)

      Connect the final two nodes, then label every left
      branch 0 and every right branch 1.
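The four construction steps can be traced with a short sketch (function name `huffman_trace` is hypothetical). Ties are broken by insertion order through a counter, which is an assumption of this sketch: with it, the second step pairs B with R rather than B with C+D as in the slides. Both choices are optimal; the internal-node weights here sum to 20 + 40 + 60 + 100 = 220, the same cost as the slides' tree.

```python
import heapq
from itertools import count

def huffman_trace(freqs: dict[str, int]) -> list[str]:
    """Return a human-readable list of the merges Huffman's algorithm performs."""
    tie = count()  # insertion-order tie-breaker (assumption of this sketch)
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    steps = []
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        merged = f"{n1}+{n2}"
        steps.append(f"{n1} ({f1}) + {n2} ({f2}) -> {merged} ({f1 + f2})")
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return steps

for step in huffman_trace({"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}):
    print(step)
# C (10) + D (10) -> C+D (20)
# B (20) + R (20) -> B+R (40)
# C+D (20) + A (40) -> C+D+A (60)
# B+R (40) + C+D+A (60) -> B+R+C+D+A (100)
```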




14
Algorithm
      (Pseudocode figure not recoverable; its annotations read:)
      X is the set of symbols, whose frequencies are known in advance.
      Q is a min-priority queue, implemented as a binary heap.
      The main loop runs n − 1 times.
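Since the original pseudocode figure is lost, here is a hedged Python sketch that follows the structure its annotations describe: Q is built from X as a binary min-heap (Python's `heapq`), and the loop performs n − 1 merges, each with two EXTRACT-MIN operations and one INSERT. The names `Node`, `huffman` and `code_table` are hypothetical, as is the counter used to break frequency ties.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Optional

@dataclass
class Node:
    freq: int
    symbol: Optional[str] = None      # set only for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def huffman(freqs: dict[str, int]) -> Node:
    tie = count()  # tie-breaker so heapq never has to compare Node objects
    q = [(f, next(tie), Node(f, sym)) for sym, f in freqs.items()]
    heapq.heapify(q)                              # Q <- X as a binary min-heap
    for _ in range(len(freqs) - 1):               # n - 1 merges
        fx, _, x = heapq.heappop(q)               # EXTRACT-MIN
        fy, _, y = heapq.heappop(q)               # EXTRACT-MIN
        z = Node(fx + fy, left=x, right=y)        # f[z] = f[x] + f[y]
        heapq.heappush(q, (z.freq, next(tie), z)) # INSERT(Q, z)
    return q[0][2]                                # the root of the tree

def code_table(node: Node, prefix: str = "") -> dict[str, str]:
    """Left branch = 0, right branch = 1."""
    if node.symbol is not None:
        return {node.symbol: prefix}
    return {**code_table(node.left, prefix + "0"),
            **code_table(node.right, prefix + "1")}

codes = code_table(huffman({"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}))
print(codes)  # {'B': '00', 'R': '01', 'C': '100', 'D': '101', 'A': '11'}
```

Different tie-breaking rules can produce different (equally optimal) tables.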




15
What about Complexity?

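      Building Q from X needs at most O(n log n)
      (O(n) with a linear-time heap construction).
      The loop runs n − 1 times; in each iteration the two
      EXTRACT-MIN operations and the INSERT each need O(log n).
      Thus, the loop needs O(n log n).
      Thus, the algorithm needs O(n log n).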
16
Algorithm’s Correctness
      It can be proven that the greedy algorithm HUFFMAN is correct, because
      the problem of determining an optimal prefix code exhibits the
      greedy-choice and optimal-substructure properties.
      Greedy choice: Let C be an alphabet in which each character c ∈ C has
      frequency f[c]. Let x and y be two characters in C having the lowest
      frequencies. Then there exists an optimal prefix code for C in which
      the codewords for x and y have the same length and differ only in the
      last bit.
      Optimal substructure: Let C be a given alphabet with frequency f[c]
      defined for each character c ∈ C. Let x and y be two characters in C
      with minimum frequency. Let C′ be the alphabet C with characters x, y
      removed and a (new) character z added, so that C′ = (C − {x, y}) ∪ {z};
      define f for C′ as for C, except that f[z] = f[x] + f[y]. Let T′ be any
      tree representing an optimal prefix code for the alphabet C′. Then the
      tree T, obtained from T′ by replacing the leaf node for z with an
      internal node having x and y as children, represents an optimal prefix
      code for the alphabet C.
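The proof is not reproduced here, but the claim can be spot-checked empirically. This sketch (function names are hypothetical) compares Huffman's greedy cost against a brute-force optimum that tries merging every pair of nodes, not just the two smallest. Every full binary tree has a deepest internal node with two leaf children, so this exhaustive merge search covers all full binary trees; it is exponential and only suitable for tiny alphabets. An agreement on a few examples illustrates the theorem but does not prove it.

```python
import heapq
from functools import lru_cache
from itertools import combinations

def huffman_cost(freqs: list[int]) -> int:
    """Greedy: always merge the two smallest; cost = sum of merged weights."""
    heap = list(freqs)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, a + b)
        cost += a + b
    return cost

@lru_cache(maxsize=None)
def best_cost(weights: tuple[int, ...]) -> int:
    """Exhaustive: minimum over merging ANY pair at each step (all full trees)."""
    if len(weights) <= 1:
        return 0
    best = None
    for i, j in combinations(range(len(weights)), 2):
        rest = [w for k, w in enumerate(weights) if k not in (i, j)]
        merged = weights[i] + weights[j]
        candidate = merged + best_cost(tuple(sorted(rest + [merged])))
        if best is None or candidate < best:
            best = candidate
    return best

for freqs in ([5, 2, 1, 1, 2], [40, 20, 10, 10, 20]):
    print(huffman_cost(freqs), best_cost(tuple(sorted(freqs))))
# 23 23
# 220 220
```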
17
Last Remarks

     • Huffman codes are widely used in applications that
       involve the compression and transmission of digital
       data, such as fax machines, modems, and computer
       networks.
     • Huffman encoding is practical if:
        –   The encoded string is large relative to the code table
            (because the code table has to be included in the
            message, unless it is already widely known).
        –   We agree on the code table in advance.
             • For example, it's easy to find a table of letter frequencies for
               English (or any other alphabet-based language).


18
Thank you!




19
