The document discusses Huffman coding, which is a lossless data compression algorithm that uses variable-length codes to encode characters based on their frequency of occurrence. It involves building a Huffman tree by iteratively combining the two lowest frequency nodes and assigning codes to characters based on their paths in the tree. The algorithm is described in four steps: getting character frequencies, building the Huffman tree and assigning codes, encoding the data, and decoding the compressed data. Examples are provided to illustrate how the Huffman tree is constructed bottom-up and codes are assigned.
Huffman coding is a lossless data compression algorithm that uses variable-length binary codes for characters based on their frequency of occurrence, with more common characters represented by shorter bit sequences; it constructs a Huffman tree by assigning codes to characters such that the encoded output has minimum expected length, allowing for more efficient data storage and transmission. The document provides an overview of Huffman coding and trees, including an example to demonstrate how character frequencies are used to assign bit codes and compress a sample string from 56 bits to 13 bits.
This document discusses data compression techniques including lossless compression methods like run-length encoding and statistical encoding like Huffman encoding. It explains that compression aims to reduce the size of information to be stored or transmitted by removing redundancy. The key points covered are:
- Compression principles like entropy encoding and Huffman encoding which assigns variable length codes based on symbol probabilities.
- The Huffman algorithm involves constructing a binary tree from symbol frequencies and assigning codes based on paths from the root with '0' for left branches and '1' for right.
- Huffman coding satisfies the prefix property that no code is a prefix of another, allowing unique decoding.
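The prefix property described above is easy to verify mechanically. As an illustrative sketch (not code from the slides): after sorting, any codeword that is a prefix of another must be its immediate neighbor, so one linear pass over adjacent pairs suffices.

```python
def is_prefix_free(codes):
    """Check the prefix property: no codeword is a prefix of another."""
    words = sorted(codes.values())
    # After sorting, a prefix pair must appear as adjacent entries.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

# {a: 0, b: 10, c: 11} is prefix-free; {a: 0, b: 01} is not ("0" prefixes "01").
assert is_prefix_free({"a": "0", "b": "10", "c": "11"})
assert not is_prefix_free({"a": "0", "b": "01", "c": "11"})
```

Because no codeword is a prefix of another, a decoder can emit a symbol the moment a codeword matches, with no delimiters between codes.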
Huffman's algorithm is a lossless data compression algorithm that assigns variable-length binary codes to characters based on their frequencies, with more common characters getting shorter codes. It builds a Huffman tree by starting with individual leaf nodes for each unique character and their frequencies, then combining the two lowest frequency nodes into an internal node until only one node remains as the root. Codes are then assigned by traversing paths from the root to leaves.
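The bottom-up construction described above — repeatedly merging the two lowest-frequency nodes, then reading codes off root-to-leaf paths — can be sketched with a binary heap. This is an illustrative Python sketch under the usual 0-for-left, 1-for-right convention, not code taken from the document:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman tree bottom-up and return {char: bitstring}."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreak, tree). Leaves are single chars.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-frequency nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))  # ...merged
        tiebreak += 1
    codes = {}
    def walk(node, path):
        if isinstance(node, str):            # leaf: a single character
            codes[node] = path or "0"        # degenerate one-symbol input
        else:
            walk(node[0], path + "0")        # left branch = 0
            walk(node[1], path + "1")        # right branch = 1
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("abracadabra")
# 'a' occurs 5 times out of 11 characters, so it gets the shortest code.
assert len(codes["a"]) == 1
```

With these frequencies the encoded text needs 23 bits, versus 88 bits for 8-bit fixed-length codes.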
Huffman codes are a technique for lossless data compression that assigns variable-length binary codes to characters, with more frequent characters having shorter codes. The algorithm builds a frequency table of characters then constructs a binary tree to determine optimal codes. Characters are assigned codes based on their path from the root, with left branches representing 0 and right 1. Both encoder and decoder use this tree to translate between binary codes and characters. The tree guarantees unique decoding and optimal compression.
Huffman and Arithmetic coding - Performance analysis (Ramakant Soni)
Huffman coding and arithmetic coding are analyzed for complexity.
Huffman coding assigns variable-length codes to symbols based on probability and has O(N²) complexity. Arithmetic coding encodes the entire message as a fraction between 0 and 1 by dividing intervals according to symbol probabilities, and has better O(N log N) complexity. Arithmetic coding compresses data more efficiently, using fewer bits per symbol, and has lower asymptotic complexity than Huffman coding.
Data Communication & Computer network: Shannon–Fano coding (Dr Rajiv Srivastava)
These slides cover the fundamentals of data communication & networking. They cover Shannon–Fano coding, which is used in the communication of data over a transmission medium, and are useful for engineering students as well as for candidates who want to master data communication and computer networking.
The document provides an overview of Huffman coding, a lossless data compression algorithm. It begins with a simple example to illustrate the basic idea of assigning shorter codes to more frequent symbols. It then defines key terms like entropy and describes the Huffman coding algorithm, which constructs an optimal prefix code from the frequency of symbols in the data. The document discusses how Huffman coding can be applied to image compression by first predicting pixel values and then encoding the residuals. It notes some disadvantages of Huffman coding and describes variations like adaptive Huffman coding.
Adaptive Huffman coding is an improvement over standard Huffman coding that allows the Huffman tree to be adapted as additional symbols are encoded. It determines codeword mappings using a running estimate of symbol probabilities. This allows it to better exploit locality in the data. The algorithm works in two phases: first, it transforms the existing Huffman tree to maintain optimality when a symbol's weight is incremented; second, it increments the weight. This process is repeated as each new symbol is encoded.
The document discusses various lossless compression techniques including entropy coding methods like Huffman coding and arithmetic coding. It also covers dictionary-based coding like LZW, as well as other techniques like run-length coding, quadtrees for image compression, and lossless JPEG.
Data compression: Huffman coding algorithm (Rahul Khanwani)
The document discusses Huffman coding, a lossless data compression algorithm that uses variable-length codes to encode symbols based on their frequency of occurrence. It explains that Huffman coding assigns shorter codes to more frequent symbols for efficient data compression. The document provides details on how the Huffman coding algorithm works by constructing a binary tree from the frequency of symbols and assigning codes based on paths in the tree. It also discusses different types of Huffman coding like static, dynamic and adaptive probability distributions and provides examples to illustrate the adaptive Huffman coding process.
Jun 29: New privacy technologies for Unicode and international data standards ... (Ulf Mattsson)
Protecting the increasing use of international Unicode characters is required by a growing number of privacy laws in many countries, as well as by general privacy concerns around private data. Current approaches to protecting international Unicode characters increase the size of the data and change its format. This breaks many applications and slows down business operations. The current approach also randomly returns data in new and unexpected languages. A new approach with significantly higher performance and a small memory footprint can be customized to fit on small IoT devices.
We will discuss new approaches that achieve portability, security, performance, a small memory footprint, and language preservation for the privacy protection of Unicode data. These new approaches provide granular protection for all Unicode languages, customizable alphabets, and byte-length-preserving protection of privacy-protected characters.
Old Approaches
Major Issues
Protecting the increasing use of international Unicode characters is required by a growing number of privacy laws in many countries, as well as by general privacy concerns around private data.
Old approaches to protecting international Unicode characters typically increase the size of the data and change its format.
This breaks many applications and slows down business operations. This is an example of an old approach that also randomly returns data in new and unexpected languages.
Huffman coding is an algorithm that uses variable-length binary codes to compress data. It assigns shorter codes to more frequent symbols and longer codes to less frequent symbols. The algorithm constructs a binary tree from the frequency of symbols and extracts the Huffman codes from the tree. Huffman coding is widely used in applications like ZIP files, JPEG images, and MPEG videos to reduce file sizes for efficient transmission or storage.
The document discusses Huffman coding, a lossless data compression algorithm that uses variable-length codewords to encode symbols based on their frequency of occurrence. It works by building a binary tree from the frequency of symbols, where more frequent symbols are encoded by shorter codewords. This allows for more efficient representation of frequent symbols and achieves compression close to the theoretical minimum possible given the frequencies. The algorithm and encoding/decoding process are explained step-by-step with an example.
Data encryption and tokenization for international Unicode (Ulf Mattsson)
Unicode is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard is maintained by the Unicode Consortium, and as of March 2020, it has a total of 143,859 characters, with Unicode 13.0 (these characters consist of 143,696 graphic characters and 163 format characters) covering 154 modern and historic scripts, as well as multiple symbol sets and emoji. The character repertoire of the Unicode Standard is synchronized with ISO/IEC 10646, each being code-for-code identical with the other.
The Unicode Standard consists of a set of code charts for visual reference, an encoding method and set of standard character encodings, a set of reference data files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering, and bidirectional text display order (for the correct display of text containing both right-to-left scripts, such as Arabic and Hebrew, and left-to-right scripts). Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, Java (and other programming languages), and the .NET Framework.
Unicode can be implemented by different character encodings. The Unicode standard defines the Unicode Transformation Formats (UTF) UTF-8, UTF-16, and UTF-32, and several other encodings. The most commonly used encodings are UTF-8, UTF-16, and UCS-2 (a precursor of UTF-16 without full support for Unicode).
The document discusses various lossless compression techniques including entropy coding methods like Huffman coding and arithmetic coding. It also covers dictionary-based coding like LZW, as well as spatial compression techniques like run-length coding, quadtrees for images, and lossless JPEG.
This document provides an overview of Unicode and character encodings to avoid corrupting international text. It discusses:
- The difference between bytes and characters, noting that characters are often multiple bytes wide and an encoding is needed to interpret byte sequences as character sequences.
- Common mistakes like assuming a default encoding, mixing bytes and characters, and not specifying an encoding which can lead to text being corrupted when read by systems using different encodings.
- Encoding issues that can occur in different languages and file types like text files, HTML, XML, if an encoding is not properly declared or honored.
The key lessons are: you must know the character encoding to interpret byte sequences correctly, and bytes and characters should not be mixed.
The document discusses information theory and source coding. It defines information and entropy, explaining that the amount of information contained in a message depends on its probability. The entropy of a data source measures the average information content. Huffman coding is presented as a method to assign variable-length codes to symbols to minimize the average code length. Error detection and correction codes are also summarized, including parity checking, cyclic redundancy checks (CRC), linear block codes, and convolutional codes.
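The entropy mentioned above, H = -Σ p·log₂(p) bits per symbol, measures the average information content of a source and is the lower bound on any lossless code's average length. A minimal illustrative sketch:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A source emitting symbols with probabilities 1/2, 1/4, 1/4
# carries 1.5 bits of information per symbol on average.
print(entropy([0.5, 0.25, 0.25]))  # 1.5
```

For this distribution a Huffman code (lengths 1, 2, 2) matches the entropy exactly, because every probability is a negative power of two.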
The presentation describes Measures of Information, entropy, source coding, source coding theorem, huffman coding, shanon fano coding, channel capacity theorem, capacity of a discrete and continuous memoryless channel, Error Free Communication over a Noisy Channel
This document summarizes Huffman code decoding. It takes a sequence to be decoded and a character probability table as input. A binary tree is built from the probabilities, with left branches representing 1 and right branches 0. The time complexity is O(n²), where n is the sequence length, since the resulting tree can be unbalanced. Sample run times are provided for sequences of different lengths over 10 possible characters. Pseudocode gives an algorithm to traverse the tree and decode the sequence.
The document discusses various lossless compression techniques including:
- Lossless compression induces no information loss during compression and decompression.
- Shannon's theory shows the minimum number of bits needed to represent information from a source.
- Run length coding compresses repeating symbols by encoding the symbol and number of repeats.
- Huffman coding assigns variable length codes to symbols based on frequency, with more common symbols having shorter codes. It achieves the lowest possible redundancy.
- Adaptive Huffman coding dynamically updates the coding tree as more data is processed to adapt to changing frequencies.
- LZW compression replaces repeated strings with codes, building an adaptive dictionary during compression.
The document discusses the representation and encoding of instructions in a computer's instruction set. It covers the following key points:
- Instructions are encoded in binary machine code and represented using fixed-width instruction formats. The MIPS instruction set uses 32-bit instruction words.
- The MIPS instruction set has two main formats: R-format for register-based operations, and I-format for instructions with an immediate operand or memory address.
- Instruction fields encode information like the operation code, register operands, immediate constants, and function codes. Register numbers are encoded to identify specific registers.
- Hexadecimal representation is used to compactly represent the binary instruction encodings. Instruction decoding interprets the bit patterns according to these instruction formats.
Data communication & computer networking: Huffman algorithm (Dr Rajiv Srivastava)
These slides cover the fundamentals of data communication & networking. They cover the Huffman algorithm and are useful for engineering students as well as for candidates who want to master data communication and computer networking.
Introduction to the information-theory basis for image/video coding, especially entropy, rate-distortion theory, entropy coding, Huffman coding, and arithmetic coding.
The document provides an overview of the C and C++ programming languages. It discusses the history and evolution of C and C++. It describes key features of C like procedural programming, manual memory management, and lack of object orientation. It also describes features of C++ like classes, inheritance, and templates which provide object orientation. The document lists many widely used software written in C/C++ and discusses advantages like speed and compact memory usage and disadvantages like difficulty of manual memory management. It provides examples of basic C code structures and data types.
Huffman coding is a lossless data compression algorithm that uses variable-length codes to represent characters. It assigns shorter codes to more frequent characters and longer codes to less frequent characters, resulting in an average compressed file size that is typically 20-90% smaller than the original file. The algorithm works by building a Huffman tree from the character frequencies, where each character is a leaf node and the frequency of that character determines its distance from the root. It then traverses the tree to assign binary codes to each character, where left branches are 0 and right branches are 1. The encoded file and Huffman tree are used together during decompression to reconstruct the original file losslessly.
A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage.
A greedy algorithm is an approach for solving a problem by selecting the best option available at the moment.
Huffman coding is a popular data compression technique that assigns variable-length binary codes to symbols based on their frequency of occurrence, with more common symbols getting shorter codes. It works by building a binary tree from the frequency of each symbol and assigning codes based on paths from the root to each leaf node. This results in more efficient transmission of data compared to fixed-length codes like ASCII by using fewer total bits. While it does not reach the theoretical entropy limit exactly, since each symbol's code must use a whole number of bits, Huffman coding is simple to implement and provides good compression for many types of data.
The document discusses data compression techniques including lossless and lossy compression. It focuses on Huffman coding, a lossless compression algorithm. It explains how Huffman coding works by assigning variable-length binary codes to characters based on their frequency, with more common characters getting shorter codes. This allows the data to be compressed. It provides an example showing the construction of a Huffman tree from character frequencies and the assignment of codes. The codes generated satisfy the prefix property to allow for unique decoding.
This document discusses Huffman coding, which is a variable-length encoding technique used to compress data. It describes how Huffman codes are created by building a binary tree based on the frequencies of symbols in a message. The tree is constructed by recursively combining the two subtrees with the lowest frequencies until a single tree remains. Codes are then assigned to each symbol by traversing paths from the root to leaf nodes, with 0 representing a left branch and 1 a right branch. This ensures the prefix property where no code is a prefix of another. An example is provided to demonstrate constructing a Huffman tree and assigning codes for a sample message. The average code length is calculated using the formula provided.
The document discusses Huffman tree coding, which is a variable-length encoding technique that assigns codewords of different lengths to symbols based on their frequency of occurrence. It explains how to create a Huffman tree from symbol frequencies, derive the Huffman codes, and use the codes to encode and decode messages. An example is provided where the frequencies of letters in an alphabet are used to generate a Huffman tree and codes, and a message is encoded and decoded to demonstrate the process.
Huffman coding is a data compression technique that uses variable-length code words to encode characters based on their frequency of occurrence. It involves building a Huffman tree from character frequencies, assigning code words by traversing the tree, and encoding the text. This results in more common characters having shorter code words, reducing the number of bits needed and allowing for smaller file sizes compared to fixed-length encodings like ASCII.
The document provides an overview of digital media basics including digitization, compression, representation, and standards. It discusses signal digitization through pulse code modulation and sampling. It also covers quantization, digitization examples, and lossless versus lossy compression. Specific compression techniques covered include transform coding, variable rate coding, predictive coding, and entropy coding like Huffman coding. The document also discusses psychoacoustic modeling and perceptual coding. It provides examples of speech, audio, image and video compression standards and techniques.
The document discusses various audio coding models, techniques, and standards including entropy coding, differential coding, LPC and parametric coding, sub-band coding, and audio compression standards from ITU, ISO, and MPEG. It also covers topics like audio data rates and file sizes, psychoacoustic models, compression algorithm requirements, and structured audio coding techniques like MIDI. The models, techniques, and standards discussed aim to compress audio signals while maintaining sufficient quality for the intended application.
This document provides an introduction to information theory and coding. It discusses Shannon's foundational work in the 1940s that established information theory and answered two fundamental questions: the limit on data compression and transmission rate over a communications channel. It describes the basic components of a digital communication system and Shannon's definition of communication. Shannon sought to determine the maximum possible transmission rate over a channel. The document also summarizes Shannon's source coding and channel coding theorems and how modern coding techniques have come close to achieving Shannon's theoretical limits.
The document discusses Huffman coding, a method for data compression that assigns variable-length codes to input characters based on their frequency of occurrence. It involves building a binary tree from the character frequencies and assigning shorter codes to more common characters. This allows for more efficient representation of data compared to fixed-length codes like ASCII. Applications include compression in file formats like MP3 and JPEG.
Data Structure and Algorithms: Huffman Coding Algorithm
Huffman coding is a statistical compression technique that assigns variable-length codes to characters based on their frequency of occurrence. It builds a Huffman tree by prioritizing characters from most to least frequent, then assigns codes by traversing the tree left for 0 and right for 1. This results in shorter codes for more common characters, compressing text files into fewer bits than standard ASCII encoding. The receiver reconstructs the same Huffman tree to decode the bitstream back into the original text.
The document discusses Huffman encoding, which is a method for data compression of text documents. It uses a binary tree to assign variable-length codes to each letter in the original message, with more frequent letters receiving shorter codes. This allows the message to be compressed by encoding it with the Huffman codes rather than standard 8-bit ASCII codes. The algorithm works by first listing each unique letter and its frequency, and then combining the two least frequent letters into a node, repeating this process until a full binary tree is constructed with a single root node.
Huffman coding is a coding technique for lossless compression of data, based upon the frequency of occurrence of each symbol in a file.
In Huffman coding, all data is encoded in 0s and 1s, which reduces the size of the file.
Using binary representation, the number of bits required to represent each character depends upon the number of characters that have to be represented. Using one bit we can represent two characters, i.e., 0 represents the first character and 1 represents the second character.
Huffman codes are variable-length codes that assign shorter bit sequences to more common characters. They have the prefix property so codes don't overlap. To create a Huffman code:
1. Determine character frequencies
2. Build a binary tree by combining the least frequent nodes
3. Assign 0/1 codes to edges to get each character's code
4. The encoded message uses the fewest total bits compared to fixed-length codes
Huffman codes are a type of variable-length code that can more efficiently encode messages compared to fixed-length codes like ASCII. The document describes how to build a Huffman coding tree by assigning codes to characters based on their frequency in a message. The codes are assigned such that more frequent characters have shorter codes. This ensures the encoded message is as short as possible on average. Building the Huffman tree involves recursively combining the two least frequent characters until a full binary tree is constructed, with codes read from the paths.
CS-102 Data Structures: huffman coding.pdf
Huffman coding is a lossless data compression algorithm that uses variable-length codewords to encode symbols based on their frequency of occurrence in a file. It builds a binary tree from the frequency of symbols, with more common symbols nearer the root, to assign shorter codewords to more frequent symbols. The document describes the basic Huffman coding algorithm which involves building the Huffman tree from a priority queue of symbol frequencies, traversing the tree to determine codewords, and encoding a sample text file using the new codewords to achieve compression. Real-world applications of Huffman coding include data compression in GNU gzip and internet standards.
1. The document discusses different compression techniques for text, audio, images, and video.
2. It provides examples of compression ratios achieved using lossy and lossless compression methods. For example, text compression can achieve 3:1 ratios using Lempel-Ziv coding while audio compression can achieve ratios between 3:1 to 24:1 using MP3.
3. The techniques discussed include entropy encoding, run-length encoding, Huffman coding, discrete cosine transforms, and differential encoding which takes advantage of redundancies in the data. The best approach depends on the type of data and acceptable quality.
2. Contents:
• Data Compression
• Fixed length encoding
• Variable length encoding
• Prefix Code
• Representing Prefix Codes Using Binary Tree
• Decoding A Prefix Code
• Optimality
• Huffman Coding
• Cost Of Huffman Tree
• Huffman Algorithm and Implementation
4/21/2020 Huffman Coding 2
3. Data Compression
• Use fewer bits
• Reduce the original file size
• Space-time complexity trade-off
• Useful for reducing resource usage, such as data storage space or
transmission capacity (e.g., with tools such as zip and 7zip)
Compression types:
1. Lossless compression
2. Lossy compression
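Lossless compression can be seen end to end with Python's standard zlib module (a sketch, not part of the slides): decompressing must reproduce the original bytes exactly.

```python
import zlib

# Lossless round trip: the decompressed bytes must match the original.
original = b"Eerie eyes seen near lake." * 100
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

assert restored == original                    # no information is lost
print(len(original), "->", len(compressed))    # repeated text compresses well
```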
4. Bits...Bytes...etc...
Poll Question #1: How many bits are required to represent 26
characters/symbols?
A. 26 bits
B. 32 bits
C. 5 bits
D. 8 bits
Answer: C. With n bits we can represent 2^n symbols; 2^4 = 16 < 26,
but 2^5 = 32 ≥ 26, so 5 bits are required to represent 26 characters.
5. Bits...Bytes...etc...
32 - 26 = 6 of the 5-bit patterns are unused.
e.g. 0 = 00000 represents character A
1 = 00001 represents character B
...
25 = 11001 represents character Z
26 = 11010 is unused.
27 = 11011 is unused.
28 = 11100 is unused.
29, 30, 31 are unused — they can be used in the future…
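The poll's arithmetic generalizes: n distinct symbols need ⌈log2 n⌉ bits in a fixed-length code. A small Python sketch (the function name is illustrative):

```python
import math

def bits_needed(n_symbols: int) -> int:
    """Minimum bits per symbol in a fixed-length code for n distinct symbols."""
    return math.ceil(math.log2(n_symbols))

print(bits_needed(26))   # 5 bits cover 26 letters (2**5 = 32 patterns)
print(bits_needed(6))    # 3 bits cover the 6-character example file
```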
7. Bits...Bytes...etc...
• In ASCII, each English character is represented in a fixed
number of bits (8 bits)
• If a text contains n characters, it takes 8n bits in total to
store the text in ASCII
• E.g. A = 65 = 01000001 = 8 bits
ABC = 8 * 3 = 24 bits
A text file with 14,700 characters will require
14,700 * 8 = 117,600 bits
8. Main Idea: Encoding
• Assume in this file only 6 characters appear:
E, A, C, T, K, N
• The frequencies are:
Character Frequency
E 10,000
A 4,000
C 300
T 200
K 100
N 100
(Original file)
9. Main Idea: Encoding
• Assume in this file only 6 characters appear:
E, A, C, T, K, N
• The frequencies are:
Character Frequency
E 10,000
A 4,000
C 300
T 200
K 100
N 100
• Option 1 (No Compression)
– Each character = 1 Byte (8 bits)
– Total file size = 14,700 * 8 = 117,600 bits
• Option 2 (Fixed-length encoding)
– We have 6 characters, so we need 3 bits to encode them
– Total file size = 14,700 * 3 = 44,100 bits
Character Fixed encoding
E 000
A 001
C 010
T 100
K 110
N 111
10. Main Idea: Encoding
• Assume in this file only 6 characters appear:
E, A, C, T, K, N
• The frequencies are:
Character Frequency
E 10,000
A 4,000
C 300
T 200
K 100
N 100
• Option 3 (Variable-length encoding)
– Assign shorter codes to more frequent characters and longer
codes to less frequent characters
– Total file size:
(10,000 x 1) + (4,000 x 2) + (300 x 3)
+ (200 x 4) + (100 x 5) + (100 x 5) =
20,700 bits
Character Variable-length encoding
E 0
A 01
C 010
T 0100
K 01001
N 01101
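The three options can be checked with a short calculation (a sketch in Python; the frequencies and code lengths come from the tables above):

```python
# Frequencies and variable-length code lengths from the slides.
freq = {"E": 10_000, "A": 4_000, "C": 300, "T": 200, "K": 100, "N": 100}
code_len = {"E": 1, "A": 2, "C": 3, "T": 4, "K": 5, "N": 5}

n_chars = sum(freq.values())                              # 14,700 characters
ascii_bits = n_chars * 8                                  # Option 1: 117,600 bits
fixed_bits = n_chars * 3                                  # Option 2: 44,100 bits
variable_bits = sum(freq[c] * code_len[c] for c in freq)  # Option 3: 20,700 bits

print(ascii_bits, fixed_bits, variable_bits)
```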
11. Poll Question #2: The binary code length does not depend on the
frequency of occurrence of characters.
A. True
B. False
12. Main Idea: Encoding
• Assume in this file only 6 characters appear:
E, A, C, T, K, N
• The frequencies are:
Character Frequency
E 10,000
A 4,000
C 300
T 200
K 100
N 100
• Option 3 (Variable-length encoding)
– Assign shorter codes to more frequent characters and longer
codes to less frequent characters
– Total file size:
(10,000 x 1) + (4,000 x 2) + (300 x 3)
+ (200 x 4) + (100 x 5) + (100 x 5) =
20,700 bits
Character Variable-length encoding
E 0
A 01
C 010
T 0100
K 01001
N 01101
13. Decoding for fixed-length codes is much easier
Character Fixed-length encoding
E 000
A 001
C 010
T 100
K 110
N 111
010001100110111000
Divide into 3's: 010 001 100 110 111 000
Decode: C A T K N E
14. Decoding for variable-length codes is not that easy…
Character Variable-length encoding
E 0
A 01
C 010
T 0100
K 01001
N 01101
0100010 — what does it mean?
We cannot tell if the original is AEEC, TC, or CEAE
The problem: one codeword is a prefix of another
15. Prefix Code
• To avoid the problem, we generally require that each codeword is
NOT a prefix of another
• Such an encoding scheme is called a prefix code, or prefix-free
code
• For a text encoded by a prefix code, we can easily decode it in
the following way:
10100001000101000101000…
1. Scan from left to right to extract the first codeword
2. Recursively decode the remaining part
16. Decoding for prefix-free codes…
Character Prefix-free code
E 0
A 10
C 110
T 1110
K 11110
N 11111
(compare the earlier, non-prefix-free encoding: E 0, A 01, C 010,
T 0100, K 01001, N 01101)
0100010 decodes uniquely as EAEEA:
1. Scan from left to right to extract the first codeword
2. Recursively decode the remaining part
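The two-step decoding rule can be sketched directly (codes from the slide; the function name is illustrative). Because of the prefix property, the first codeword that matches is always the right one:

```python
# Prefix-free codes from the slide.
prefix_code = {"E": "0", "A": "10", "C": "110", "T": "1110", "K": "11110", "N": "11111"}
decode_map = {code: ch for ch, code in prefix_code.items()}

def decode(bits: str) -> str:
    out, buf = [], ""
    for b in bits:                 # scan from left to right
        buf += b
        if buf in decode_map:      # prefix property: first match is a codeword
            out.append(decode_map[buf])
            buf = ""
    if buf:
        raise ValueError("dangling bits: " + buf)
    return "".join(out)

print(decode("0100010"))  # EAEEA
```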
17. Prefix Code Tree
• Naturally, a prefix code scheme corresponds to a prefix code tree
• The tree is rooted, with
1. each edge labeled by a bit;
2. each leaf labeled by a character;
3. the labels on the root-to-leaf path forming the codeword for
that character
• E.g., E → 0, A → 10, C → 110, T → 1110, etc.
[Figure: the prefix code tree — each left edge is labeled 0 and leads
to a leaf (E, A, C, T, …); each right edge is labeled 1 and descends
one more level.]
18. Poll Question #3: From the following given tree, what is the
codeword for the character 'a'?
A. 010
B. 100
C. 101
D. 011
[Figure: the poll's code tree, with edges labeled 0 and 1.]
19. Poll Question #4: From the following given tree, what is the
computed codeword for 'c'?
A. 010
B. 100
C. 110
D. 011
[Figure: the poll's code tree, with edges labeled 0 and 1.]
20. Main Idea: Encoding
• Assume in this file only 6 characters appear:
E, A, C, T, K, N
• The frequencies are:
Character Frequency
E 10,000
A 4,000
C 300
T 200
K 100
N 100
(Original file)
….Construct the Optimal Prefix Code Tree
21. Huffman Coding
• Proposed by Dr. David A. Huffman in 1952:
"A Method for the Construction of Minimum-Redundancy Codes"
• Applicable to many forms of data transmission
Our example: text files
• Build the optimal prefix code tree bottom-up, in a greedy fashion
22. Huffman Coding
• A technique to compress data effectively
• Usually between 20%–90% compression
• Lossless compression
• No information is lost
• When you decompress, you get the original file
Original file → Huffman coding → Compressed file
23. Huffman Coding: Applications
• Saving space
• Store compressed files instead of original files
• Transmitting files or data
• Send compressed data to save transmission time and power
• Encryption and decryption
• Cannot read the compressed file without knowing the "key"
Original file → Huffman coding → Compressed file
24. Huffman Coding
• A variable-length coding for characters
• More frequent characters → shorter codes
• Less frequent characters → longer codes
• It is not like ASCII coding, where all characters have the same
code length (8 bits)
• Two main questions:
1. How to assign codes (encoding process)?
2. How to decode (from the compressed file, generate the original
file) (decoding process)?
25. Huffman Algorithm
• Step 1: Get Frequencies
• Scan the file to be compressed and count the occurrences of
each character
• Sort the characters based on their frequency
• Step 2: Build Tree & Assign Codes
• Build a Huffman-code tree (binary tree)
• Traverse the tree to assign codes
• Step 3: Encode (Compress)
• Scan the file again and replace each character by its code
• Step 4: Decode (Decompress)
• The Huffman tree is the key to decompressing the file
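The four steps can be sketched in a few lines of Python using a heap as the priority queue (a sketch, not the deck's reference implementation; codes are built bottom-up by prepending a bit at each merge, so exact codewords may differ from slide 44, but the total encoded length is the same optimum):

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    freq = Counter(text)                                   # Step 1: get frequencies
    # Heap entries: (frequency, tie-breaker, {char: code-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:                                   # Step 2: build the tree
        f1, _, left = heapq.heappop(heap)                  # two smallest roots...
        f2, _, right = heapq.heappop(heap)                 # ...get a common parent
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

text = "Eerie eyes seen near lake."
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)                # Step 3: encode
print(len(encoded), "bits")                                # 84 bits for this file
```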
26. Step 1: Get Frequencies
Input file: Eerie eyes seen near lake.
Char Frequency
E 1
e 8
k 1
. 1
r 2
i 1
y 1
s 2
n 2
a 2
l 1
space 4
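Step 1 on the slide's input file takes one line with `collections.Counter` (a sketch; the variable name is illustrative):

```python
from collections import Counter

# Count every character, including spaces and the final period.
freq = Counter("Eerie eyes seen near lake.")
print(freq["e"], freq[" "], freq["E"])  # 8 4 1, matching the slide's table
```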
27. Step 2: Build Huffman Tree & Assign Codes
• It is a binary tree in which each character is a leaf node
• Initially each node is a separate root
• At each step:
• Select the two roots with the smallest frequencies and connect
them to a new parent (break ties arbitrarily) [the greedy choice]
• The parent gets the sum of the frequencies of the two child nodes
• Repeat until you have one root
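The greedy merge sequence that the next slides walk through can be traced with a heap of root weights alone (a sketch; frequencies from slide 26):

```python
import heapq

freqs = {"E": 1, "i": 1, "y": 1, "l": 1, "k": 1, ".": 1,
         "r": 2, "s": 2, "n": 2, "a": 2, " ": 4, "e": 8}
heap = list(freqs.values())
heapq.heapify(heap)
while len(heap) > 1:
    a = heapq.heappop(heap)      # two smallest roots...
    b = heapq.heappop(heap)
    heapq.heappush(heap, a + b)  # ...replaced by their parent
    print(a, "+", b, "=", a + b)
print("root weight:", heap[0])   # 26 = number of characters in the file
```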
29. Find the smallest two frequencies… Replace them with their parent
[Figure: the initial forest of roots — E 1, i 1, y 1, l 1, k 1, . 1,
r 2, s 2, n 2, a 2, space 4, e 8. The two smallest, E (1) and i (1),
are joined under a new parent of weight 2.]
32. Find the smallest two frequencies… Replace them with their parent
[Figure: after the 1-frequency leaves have been paired as (E,i),
(y,l), and (k,.), the roots r (2) and s (2) are joined under a new
parent of weight 4.]
33. Find the smallest two frequencies… Replace them with their parent
[Figure: the roots n (2) and a (2) are joined under a new parent of
weight 4.]
34. Find the smallest two frequencies… Replace them with their parent
[Figure: the subtrees (E,i) of weight 2 and (y,l) of weight 2 are
joined under a new parent of weight 4.]
35. Find the smallest two frequencies… Replace them with their parent
[Figure: the subtree (k,.) of weight 2 and the leaf space (4) are
joined under a new parent of weight 6.]
36. Find the smallest two frequencies… Replace them with their parent
[Tree diagram: the parents (r,s) and (n,a), each with frequency 4, are merged under a new parent with frequency 8]
37. Find the smallest two frequencies… Replace them with their parent
[Tree diagram: the subtree (E,i,y,l) with frequency 4 and the subtree (k,.,space) with frequency 6 are merged under a new parent with frequency 10]
38. Find the smallest two frequencies… Replace them with their parent
[Tree diagram: e(8) and the subtree (r,s,n,a) with frequency 8 are merged under a new parent with frequency 16]
39. Find the smallest two frequencies… Replace them with their parent
[Tree diagram: two roots remain, with frequencies 10 and 16]
40. Find the smallest two frequencies… Replace them with their parent
[Tree diagram: the last two roots, with frequencies 10 and 16, are merged under a new parent with frequency 26]
Now we have a single root… This is the Huffman Tree!
41. Let's Analyze the Huffman Tree
• All characters are at the leaf nodes
• The number at the root = the number of characters in the file
• High-frequency characters (e.g., “e”) are near the root
• Low-frequency characters are far from the root
[Tree diagram: the complete Huffman tree with root frequency 26]
42. Let's Assign Codes
• Traverse the tree
• For each left edge, add the label 0
• For each right edge, add the label 1
• The code for each character is its root-to-leaf label sequence
[Tree diagram: the Huffman tree before edge labels are assigned]
43. Let's Assign Codes
• Traverse the tree
• For each left edge, add the label 0
• For each right edge, add the label 1
• The code for each character is its root-to-leaf label sequence
[Tree diagram: the Huffman tree with every left edge labeled 0 and every right edge labeled 1]
44. Let's Assign Codes
Coding Table

Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111
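The traversal described above can be sketched as follows (illustrative Python: `build_tree` and `assign_codes` are assumed names; the exact bit patterns depend on how ties are broken, so they may differ from the table above, but the total encoded length does not):

```python
import heapq

def build_tree(freq):
    # Heap entries are (frequency, tiebreak id, node); a node is either a
    # character (leaf) or a (left, right) tuple (internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)   # smallest frequency
        fb, _, b = heapq.heappop(heap)   # second smallest
        next_id += 1
        heapq.heappush(heap, (fa + fb, next_id, (a, b)))
    return heap[0][2]

def assign_codes(node, prefix="", table=None):
    # Root-to-leaf walk: append "0" for a left edge, "1" for a right edge.
    if table is None:
        table = {}
    if isinstance(node, tuple):          # internal node
        assign_codes(node[0], prefix + "0", table)
        assign_codes(node[1], prefix + "1", table)
    else:                                # leaf: a character
        table[node] = prefix
    return table

freq = {"E": 1, "e": 8, "k": 1, ".": 1, "r": 2, "i": 1,
        "y": 1, "s": 2, "n": 2, "a": 2, "l": 1, " ": 4}
codes = assign_codes(build_tree(freq))
total_bits = sum(freq[ch] * len(code) for ch, code in codes.items())
print(total_bits)  # 84: every optimal prefix code for these frequencies reaches this total
```

Any Huffman tree for these frequencies yields a prefix-free code with the same weighted length, 84 bits.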
45. Huffman Algorithm
• Step 1: Get Frequencies
• Scan the file to be compressed and count the occurrence of
each character
• Sort the characters based on their frequency
• Step 2: Build Tree & Assign Codes
• Build a Huffman-code tree (binary tree)
• Traverse the tree to assign codes
• Step 3: Encode (Compress)
• Scan the file again and replace each character by its code
• Step 4: Decode (Decompress)
• Huffman tree is the key to decompress the file
46. Poll Question #5: In Huffman coding, the characters always occur at which nodes of the tree?
A. Roots
B. Leaves
C. Left subtrees
D. Right subtrees
47. Step 3: Encode (Compress) the File
Input File: Eerie eyes seen near lake.

Coding Table:
Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111

Generate the encoded file:
0000 10 1100 0001 10 …
Notice that no code is a prefix of any other code.
This ensures that the decoding is unique (unlike Slide 13).
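A quick sketch of the encoding step, using the coding table from this slide (illustrative Python):

```python
# Step 3: replace each character by its code and concatenate the bits.
codes = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011",
    "k": "0100", ".": "0101", " ": "011",  "e": "10",
    "r": "1100", "s": "1101", "n": "1110", "a": "1111",
}

text = "Eerie eyes seen near lake."
encoded = "".join(codes[ch] for ch in text)

print(len(encoded))     # 84 bits, down from 26 * 8 = 208 bits uncompressed
print(encoded[:16])     # "Eerie" -> 0000 10 1100 0001 10
```

The 26-character input shrinks from 208 bits (8 bits per character) to 84 bits.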
48. Step 4: Decode (Decompress)
• You must have the encoded file + the coding tree
• Scan the encoded file:
  • For each 0, move left in the tree
  • For each 1, move right
  • When a leaf node is reached, emit that character and go back to the root
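The decoding walk can be sketched as follows (illustrative Python; here the tree is rebuilt from the coding table as nested dicts rather than stored node objects):

```python
codes = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011",
    "k": "0100", ".": "0101", " ": "011",  "e": "10",
    "r": "1100", "s": "1101", "n": "1110", "a": "1111",
}

# Rebuild the decoding tree: nested dicts, one level per bit.
root = {}
for ch, code in codes.items():
    node = root
    for bit in code[:-1]:
        node = node.setdefault(bit, {})
    node[code[-1]] = ch        # the last bit leads to a leaf

def decode(bits):
    out, node = [], root
    for bit in bits:
        node = node[bit]       # 0 = move left, 1 = move right
        if isinstance(node, str):
            out.append(node)   # reached a leaf: emit the character
            node = root        # and go back to the root
    return "".join(out)

encoded = "".join(codes[ch] for ch in "Eerie eyes seen near lake.")
print(decode(encoded))   # Eerie eyes seen near lake.
```

Because the code satisfies the prefix property, each leaf is reached at an unambiguous point in the bit stream, so the decoding is unique.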
50. Huffman Algorithm
• Step 1: Get Frequencies
• Scan the file to be compressed and count the occurrence of
each character
• Sort the characters based on their frequency
• Step 2: Build Tree & Assign Codes
• Build a Huffman-code tree (binary tree)
• Traverse the tree to assign codes
• Step 3: Encode (Compress)
• Scan the file again and replace each character by its code
• Step 4: Decode (Decompress)
• Huffman tree is the key to decompress the file
51. Pseudocode: Huffman Coding
• An appropriate data structure is a binary min-heap
• Each heap operation costs O(lg n), and n − 1 merges (extractions and re-insertions) are made, so the overall complexity is O(n lg n)
• The encoding is NOT unique; other encodings may work just as well, but none will work better
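The min-heap formulation can be sketched with Python's `heapq` (an illustrative sketch; the merge counter just confirms that n − 1 merges occur):

```python
import heapq

def huffman_merge_count(freq):
    # Heap entries are (frequency, tiebreak id, node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)                  # O(n)
    next_id = len(heap)
    merges = 0
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)   # O(lg n)
        fb, _, b = heapq.heappop(heap)   # O(lg n)
        next_id += 1
        heapq.heappush(heap, (fa + fb, next_id, (a, b)))  # O(lg n)
        merges += 1
    return merges

freq = {"E": 1, "e": 8, "k": 1, ".": 1, "r": 2, "i": 1,
        "y": 1, "s": 2, "n": 2, "a": 2, "l": 1, " ": 4}
print(huffman_merge_count(freq))   # 11 merges for 12 distinct characters
```

With n distinct characters there are always n − 1 merges, each costing O(lg n) heap work, which gives the O(n lg n) bound stated above.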