This document discusses data compression techniques. It begins with an introduction to data compression, explaining that it reduces file sizes by identifying repetitive patterns in data. It then discusses some common questions around data compression, its major steps, types including lossless and lossy compression, and some examples like Huffman coding and LZ-77 encoding. The document provides details on these techniques through examples and diagrams.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, and educators to publish their original research results, to exchange new ideas, and to disseminate information on innovative designs, engineering experiences, and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Data Compression - Text Compression - Run Length Encoding - MANISH T I
Run-length encoding (RLE) replaces consecutive repeated characters in data with a single character and count. For example, "aaabbc" would compress to "3a2bc". RLE works best on data with many repetitive characters like spaces. It has limitations for natural language text which contains few repetitions longer than doubles. Variants include digram encoding which compresses common letter pairs, and differencing which encodes differences between successive values like temperatures instead of absolute values.
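A minimal Python sketch of this run-length scheme, assuming the convention used in the example above (runs of two or more become "<count><char>", single characters are left uncounted):

```python
def rle_encode(text: str) -> str:
    """Run-length encode: emit '<count><char>' for runs of 2+, bare char otherwise."""
    if not text:
        return ""
    out = []
    run_char, run_len = text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_len += 1
        else:
            out.append(f"{run_len}{run_char}" if run_len > 1 else run_char)
            run_char, run_len = ch, 1
    out.append(f"{run_len}{run_char}" if run_len > 1 else run_char)
    return "".join(out)

print(rle_encode("aaabbc"))  # -> "3a2bc", matching the example above
```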
Security using colors and Armstrong numbers by sravanthi (lollypop) - Ashok Reddy
This document proposes a technique for encrypting data using Armstrong numbers and colors as passwords to provide secure data transmission. The technique uses three sets of keys - colors assigned to each receiver, additional key values assigned to each receiver, and Armstrong numbers. Data is encrypted by adding it to the keys, then decryption involves subtracting the keys to recover the original data. This ensures confidentiality, authentication and integrity during data transmission.
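The summary gives only the add-to-encrypt, subtract-to-decrypt core of the scheme. A heavily simplified illustration of that idea follows; the key list here is a made-up stand-in for the paper's colors, receiver key values, and Armstrong-number key material:

```python
# Hypothetical key material; the paper derives keys from colors and Armstrong numbers.
key = [1, 5, 3]

def encrypt(data: bytes, key: list[int]) -> bytes:
    # Encrypt by adding key values to the data, byte by byte (mod 256).
    return bytes((b + key[i % len(key)]) % 256 for i, b in enumerate(data))

def decrypt(cipher: bytes, key: list[int]) -> bytes:
    # Decrypt by subtracting the same key values.
    return bytes((b - key[i % len(key)]) % 256 for i, b in enumerate(cipher))

msg = b"hello"
assert decrypt(encrypt(msg, key), key) == msg
```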
This document discusses various data compression techniques. It begins with an introduction to compression and its goals of reducing storage space and transmission time. Then it discusses lossless techniques like Huffman coding, Lempel-Ziv coding, run-length encoding and pattern substitution. The document also briefly covers lossy compression and entropy encoding algorithms like Shannon-Fano coding and arithmetic coding. Key compression methods and their applications are summarized throughout.
The document discusses different techniques for compressing multimedia data such as text, images, audio and video. It describes how compression works by removing redundancy in digital data and exploiting properties of human perception. It then explains different compression methods including lossless compression, lossy compression, entropy encoding, and specific algorithms like Huffman encoding and arithmetic coding. The goal of compression is to reduce the size of files to reduce storage and bandwidth requirements for transmission.
This document discusses data compression techniques. It begins with an introduction to data compression and why it is useful to reduce unnecessary space. It then discusses different types of data compression, including lossless compression techniques like Huffman coding, Lempel-Ziv, and arithmetic coding as well as lossy compression for images, audio, and video. One technique, Shannon-Fano coding, is explained in detail with an example. The document concludes that while Shannon-Fano is simple, Huffman coding produces better compression and is more commonly used.
The document discusses various lossless compression techniques including entropy coding methods like Huffman coding and arithmetic coding. It also covers dictionary-based coding like LZW, as well as spatial compression techniques like run-length coding, quadtrees for images, and lossless JPEG.
The document provides an overview of digital media basics including digitization, compression, representation, and standards. It discusses signal digitization through pulse code modulation and sampling. It also covers quantization, digitization examples, and lossless versus lossy compression. Specific compression techniques covered include transform coding, variable rate coding, predictive coding, and entropy coding like Huffman coding. The document also discusses psychoacoustic modeling and perceptual coding. It provides examples of speech, audio, image and video compression standards and techniques.
Arithmetic coding is a lossless data compression technique that encodes data as a single real number between 0 and 1. It maps a string of symbols to a fractional number, with more probable symbols represented by larger fractional ranges. Encoding involves repeatedly dividing the interval based on symbol probabilities, and the final encoded number represents the entire string. Decoding reconstructs the string by comparing the number to symbol probability ranges. Arithmetic coding achieves compression closer to the entropy limit than Huffman coding by spreading coding inefficiencies across all symbols of the data.
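A rough sketch of the interval-narrowing step described here, using an assumed three-symbol probability model (not taken from the source):

```python
# Assumed symbol model, purely for illustration.
probs = {"a": 0.5, "b": 0.3, "c": 0.2}

def arithmetic_interval(message: str, probs: dict[str, float]) -> tuple[float, float]:
    """Narrow [0, 1) once per symbol; the final interval identifies the whole string."""
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        cum = 0.0
        for s, p in probs.items():
            if s == sym:
                low, high = low + cum * width, low + (cum + p) * width
                break
            cum += p
    return low, high

low, high = arithmetic_interval("bac", probs)
print(low, high)  # any number inside [low, high) encodes "bac" under this model
```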
This document summarizes a lecture on entropy coding and discusses Huffman coding and Golomb coding. It begins with an overview of entropy, conditional entropy, and mutual information. It then explains Huffman coding by describing the Huffman coding procedure and properties like optimality. Golomb coding is also summarized, including the Golomb code construction and its advantages over unary coding. Implementation details are provided for Golomb encoding and decoding.
Huffman and Arithmetic coding - Performance analysis - Ramakant Soni
Huffman coding and arithmetic coding are analyzed for complexity.
Huffman coding assigns variable-length codes to symbols based on probability and has O(N²) complexity in this analysis, while arithmetic coding encodes the entire message as a fraction between 0 and 1 by dividing intervals based on symbol probability and has better O(N log N) complexity. The analysis concludes that arithmetic coding compresses data more efficiently, using fewer bits per symbol, and has asymptotically lower complexity than Huffman coding.
Data compression Huffman coding algorithm - Rahul Khanwani
The document discusses Huffman coding, a lossless data compression algorithm that uses variable-length codes to encode symbols based on their frequency of occurrence. It explains that Huffman coding assigns shorter codes to more frequent symbols for efficient data compression. The document provides details on how the Huffman coding algorithm works by constructing a binary tree from the frequency of symbols and assigning codes based on paths in the tree. It also discusses different types of Huffman coding like static, dynamic and adaptive probability distributions and provides examples to illustrate the adaptive Huffman coding process.
Huffman coding is a method of compressing data by assigning variable-length codes to characters based on their frequency of occurrence. It creates a binary tree structure where the characters with the lowest frequencies are assigned the longest binary codes. This allows data to be compressed more efficiently compared to a fixed-length coding scheme. Huffman coding is proven to be an optimal prefix code, meaning no other prefix coding method can compress the data more than Huffman coding for a given frequency distribution.
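A compact illustration of the tree construction described above, using Python's heapq to repeatedly pair the two lowest-frequency nodes (a standard formulation; it assumes at least two distinct symbols):

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table: frequent symbols get shorter codes."""
    heap = [[freq, [sym, ""]] for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # lowest total frequency
        hi = heapq.heappop(heap)   # second lowest
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]  # left branch
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]  # right branch
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}

print(huffman_codes("aaaabbc"))  # e.g. {'c': '00', 'b': '01', 'a': '1'}
```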
LZW coding is a lossless compression technique that removes spatial redundancies in images. It works by assigning variable-length code words to sequences of input symbols using a dictionary. As the dictionary grows, longer matches are encoded, improving compression ratios. LZW compression is fast, simple to implement, and effective for images with repeating patterns, making it widely used in formats like GIF and TIFF.
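An illustrative Python encoder for the dictionary-growing behavior described, initializing the dictionary with all single bytes as common LZW variants do:

```python
def lzw_encode(data: str) -> list[int]:
    """LZW: emit dictionary indices for the longest already-seen strings."""
    dictionary = {chr(i): i for i in range(256)}  # start with single bytes
    next_code = 256
    w, out = "", []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc                      # grow the current match
        else:
            out.append(dictionary[w])   # emit the longest match so far
            dictionary[wc] = next_code  # add the new, longer string
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT"))
```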
This document provides an introduction to arithmetic coding, a data compression technique. It begins with an abstract, introduction, and overview of arithmetic coding and how it differs from other entropy encoding techniques like Huffman coding. It then goes into more detail about the basic concepts, motivation, and methods behind arithmetic coding over multiple chapters. It discusses how arithmetic coding encodes data by creating a code that represents a fraction in the unit interval [0,1] and recursively partitions this interval based on the input symbols. The document provides examples to illustrate how arithmetic coding works and generates unique tags for symbol sequences.
The document discusses Huffman coding, which is a variable-length binary coding technique used for lossless data compression. It describes how Huffman codes are constructed using a Huffman tree to assign codewords to symbols based on their frequency of occurrence, with more frequent symbols assigned shorter codewords. The document outlines the Huffman coding algorithm and provides examples of how it works to generate an optimal prefix code with the minimum average codeword length.
This document discusses various lossless compression algorithms including run-length coding, Shannon-Fano algorithm, Huffman coding, extended Huffman coding, dictionary-based coding like LZW, and arithmetic coding. It provides details on the basic principles of run-length coding, an example of extended Huffman coding for a source with symbols A, B, and C, and outlines the structure of the document.
1. Digital systems represent information in binary form and use binary logic elements like logic gates to process data. Quantities are stored as binary values in storage elements like flip-flops.
2. There are different number systems like binary, decimal, and other bases. Converting between them involves procedures like partitioning into groups or dividing and accumulating remainders.
3. Representing negative numbers in binary involves sign-magnitude, 1's complement, or 2's complement systems. The 2's complement is most common for computer arithmetic due to its simplicity (see the sketch after this list).
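A small sketch of the two's-complement representation mentioned in point 3, with an arbitrarily chosen 8-bit width:

```python
def twos_complement(value: int, bits: int) -> str:
    """Render a signed integer in two's-complement binary of the given width."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range for the given width")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

print(twos_complement(-5, 8))  # '11111011'
print(twos_complement(5, 8))   # '00000101'
```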
The document discusses various methods for representing numeric data in a computer system, including binary, decimal, fixed-point, and floating-point representations. It describes word length in bits and bytes and how numbers are stored in memory in big-endian and little-endian formats. Signed number representations like sign-magnitude, one's complement, and two's complement are also summarized. Various decimal coding schemes such as BCD, ASCII, excess-three, and two-out-of-five are defined.
The document discusses Shannon-Fano encoding, which is an early method for data compression that constructs efficient binary codes for information sources without memory. It works by assigning shorter codes to more frequent messages and longer codes to less frequent messages. The encoding process involves recursively splitting the set of messages in half based on probability until each message has its own unique code. While Shannon-Fano codes are reasonably efficient, they are not always optimal and the codes generated can depend on how the initial splitting of messages is done.
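A sketch of the recursive probability-splitting procedure, choosing at each step the split that best balances the two halves (one common formulation; as the summary notes, the codes depend on how the splitting is done):

```python
def shannon_fano(symbols: list[tuple[str, float]]) -> dict[str, str]:
    """symbols: (symbol, probability) pairs sorted by descending probability."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    best_split, best_diff, acc = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):          # find the most balanced split point
        acc += symbols[i - 1][1]
        diff = abs(2 * acc - total)
        if diff < best_diff:
            best_split, best_diff = i, diff
    codes = {}
    for sym, code in shannon_fano(symbols[:best_split]).items():
        codes[sym] = "0" + code               # top half gets a leading 0
    for sym, code in shannon_fano(symbols[best_split:]).items():
        codes[sym] = "1" + code               # bottom half gets a leading 1
    return codes

probs = [("a", 0.4), ("b", 0.3), ("c", 0.2), ("d", 0.1)]
print(shannon_fano(probs))  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```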
The document discusses Huffman coding, which is a data compression technique that uses variable-length codes to encode symbols based on their frequency of occurrence, with more common symbols getting shorter codes. It provides details on how a Huffman tree is constructed by assigning codes to characters based on their frequency, with the most frequent characters assigned the shortest binary codes to achieve data compression. Examples are given to demonstrate how characters are encoded using a Huffman tree and how the storage size is calculated based on the path lengths and frequencies of characters.
This document outlines the syllabus for a Multimedia Communication class taught by Zhu Li in spring 2016. The class will cover topics related to video coding standards, video compression techniques, and video networking. Students will complete homework assignments, two quizzes, and a project. The goal of the class is for students to understand multimedia compression theory and algorithms, and be able to apply their knowledge to solve real-world problems in media communication.
Introduction to the information theory basis for image/video coding, especially entropy, rate-distortion theory, entropy coding, Huffman coding, and arithmetic coding.
This presentation supports a Network Layer class on the Logical Addressing topic, covering IPv4 addressing through Network Address Translation. Resources are derived from Data Communication & Networking by Behrouz A. Forouzan.
Huffman encoding is a variable-length encoding technique used for text compression that assigns shorter bit strings to more common characters and longer bit strings to less common characters. It uses a prefix code where no codeword is a prefix of another, allowing for unique decoding. The algorithm works by building a Huffman tree from the bottom up by repeatedly combining the two lowest frequency symbols into a node until a full tree is created, with codes read from the paths. This greedy approach results in an optimal prefix code that minimizes the expected codeword length, improving compression.
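Because no codeword is a prefix of another, decoding can proceed greedily bit by bit. A minimal sketch using an assumed code table:

```python
def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    """Decode a bitstring with a prefix code: since no codeword prefixes
    another, the first codeword match is always the right one."""
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("dangling bits: not a valid codeword sequence")
    return "".join(out)

codes = {"a": "0", "b": "11", "c": "10"}  # assumed prefix code
print(huffman_decode("011100", codes))   # 'abca'
```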
The document discusses IPv4 addressing and logical addressing in computer networks. It covers the following key points:
- IPv4 addresses are 32-bit addresses that uniquely identify devices connected to the internet. The total address space is 2^32, or approximately 4.3 billion addresses.
- Addresses can be written in binary or dotted-decimal notation. IPv4 addresses are divided into classes based on the first bits of the address.
- Classful addressing wasted a large portion of addresses. It was replaced by classless addressing which allocates address blocks of variable sizes.
- In classless addressing, a block of addresses is defined by the network address, subnet mask, and number of hosts. The first and last addresses of a block are found by setting the rightmost host bits of an address in the block to all 0s and all 1s, respectively (see the sketch after this list).
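A short illustration of the notations and block arithmetic above, using Python's standard ipaddress module; the example network is assumed, not taken from the document:

```python
import ipaddress

def dotted_to_binary(addr: str) -> str:
    """Render an IPv4 address octet by octet in binary notation."""
    return ".".join(format(int(octet), "08b") for octet in addr.split("."))

print(dotted_to_binary("192.168.1.1"))  # 11000000.10101000.00000001.00000001

# First/last address and size of a classless (CIDR) block:
net = ipaddress.ip_network("205.16.37.32/28")
print(net.network_address, net.broadcast_address, net.num_addresses)
# 205.16.37.32 205.16.37.47 16
```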
Lossless compression algorithms compress data without any loss of information, allowing the original data to be perfectly reconstructed from the compressed data. Some common lossless compression algorithms include run length encoding (RLE), Huffman coding, Lempel-Ziv-Welch (LZW), and variable length coding (VLC). RLE replaces repeated characters with a single character and count, Huffman coding assigns variable length binary codes to characters based on their frequency, and LZW builds a dictionary of repeated strings to shorten repetitive sequences during compression. Lossless compression is useful for storage and transmission of files where preserving complete accuracy of the data is important.
This document discusses various techniques for lossless data compression, including run-length coding, Huffman coding, adaptive Huffman coding, arithmetic coding, and Shannon-Fano coding. It provides details on how each technique works, such as assigning shorter codes to more frequent symbols in Huffman coding and dynamically updating codes based on the data stream in adaptive Huffman coding. The document also discusses the importance of compression techniques for reducing the number of bits needed to store or transmit data.
This document summarizes several source coding techniques: Arithmetic coding encodes a message into a single floating point number between 0 and 1. Lempel-Ziv coding builds a dictionary to encode repeated patterns. Run length encoding replaces repeated characters with a code indicating the character and number of repeats. Rate distortion theory calculates the minimum bit rate needed for a given source and distortion. The entropy rate measures how entropy grows with the length of a stochastic process. JPEG uses lossy compression including discrete cosine transform and quantization to discard high frequency data imperceptible to humans.
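The entropy these coding techniques approach can be computed directly; a first-order (per-symbol) estimate for a short string:

```python
import math
from collections import Counter

def entropy(text: str) -> float:
    """First-order entropy in bits/symbol, the lossless coding lower bound."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

print(round(entropy("aaaabbc"), 3))  # ~1.379 bits/symbol
```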
The document discusses data compression fundamentals including why compression is needed, information theory basics, classification of compression algorithms, and compression performance metrics. It notes that high quality audio, video, and images require huge storage and bandwidth that compression addresses. Compression algorithms involve modeling data redundancy and entropy encoding. Lossy compression achieves higher compression but with reconstruction error, while lossless compression exactly reconstructs data. Key metrics include compression ratio, subjective quality scores, and objective measures like PSNR.
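PSNR, the objective quality measure named here, is straightforward to compute; a sketch for 8-bit samples:

```python
import math

def psnr(original: list[int], reconstructed: list[int], max_val: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_val ** 2 / mse)

print(round(psnr([100, 120, 130], [101, 119, 131]), 1))  # ~48.1 dB
```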
The document discusses various data compression techniques including run-length coding, quantization, statistical coding, dictionary-based coding, transform-based coding, and motion prediction. It provides examples and explanations of how each technique works to reduce the size of encoded data. The performance of compression algorithms can be measured by the compression ratio, compression factor, or percentage of data saved by compression.
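One common convention for the three performance measures named above; definitions of "ratio" and "factor" vary between texts, so treat this as one possible reading:

```python
def compression_metrics(original_bits: int, compressed_bits: int) -> dict[str, float]:
    ratio = original_bits / compressed_bits   # e.g. 4.0 means 4:1 compression
    factor = compressed_bits / original_bits  # fraction of the original size
    saved = 100 * (1 - factor)                # percentage of data saved
    return {"ratio": ratio, "factor": factor, "percent_saved": saved}

print(compression_metrics(8000, 2000))
# {'ratio': 4.0, 'factor': 0.25, 'percent_saved': 75.0}
```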
The document discusses data compression fundamentals including why compression is needed, information theory basics, classification of compression algorithms, and the data compression model. It notes that digital representations of analog signals require huge storage and bandwidth for transmission. Compression aims to represent source data with as few bits as possible while maintaining acceptable fidelity through modeling and coding phases. Algorithms can be lossless or lossy depending on whether reconstruction is exact. Performance is evaluated based on compression ratio, quality, complexity, and delay.
This document discusses different compression techniques including lossless and lossy compression. Lossless compression recovers the exact original data after compression and is used for databases and documents. Lossy compression results in some loss of accuracy but allows for greater compression and is used for images and audio. Common lossless compression algorithms discussed include run-length encoding, Huffman coding, and arithmetic coding. Lossy compression is used in applications like digital cameras to increase storage capacity with minimal quality degradation.
This document provides an overview of image compression techniques. It discusses how image compression works to reduce the number of bits needed to represent image data. The main goals of image compression are to reduce irrelevant and redundant image information to produce smaller and more efficient file sizes for storage and transmission. The document outlines different compression methods including lossless compression, which compresses data without any loss, and lossy compression, which allows for some loss of information in exchange for higher compression ratios. Specific techniques like run length encoding are also explained.
A description of image compression: the types of redundancy present in images, the two classes of compression techniques, and four different lossless image compression techniques with proper diagrams (Huffman, Lempel-Ziv, run-length coding, arithmetic coding).
This document summarizes Chapter 3 of the textbook "Cryptography and Network Security" by William Stallings. It discusses block ciphers and the Data Encryption Standard (DES). Specifically, it provides an overview of modern block ciphers and DES, including the history and design of DES, how it works using a Feistel cipher structure, and analyses of the strength and security of DES. It also covers differential cryptanalysis as an analytic attack against block ciphers like DES.
This document discusses how computers represent different types of data using binary numbers. It explains that all data inside a computer is stored as binary digits (bits) that represent ON and OFF switches. Various data types like characters, pictures, sound, programs and integers are represented by grouping bits into bytes. The context determines how a computer interprets each byte. Standards like ASCII, JPEG and WAV define how different data is encoded into binary format and bytes. The document also covers number systems like binary, decimal, hexadecimal and their properties.
This document discusses data compression algorithms including lossless and lossy methods. It defines lossless compression as allowing perfect reconstruction of the original data and lossy compression as permitting only approximate reconstruction. Specific lossless methods covered are run-length encoding, Huffman coding, and Lempel-Ziv encoding. Lossy methods discussed are JPEG compression for images, discrete cosine transform, and MPEG video compression. The document concludes that the presented approach of using the Hartley transform for image compression with separate magnitude and phase processing achieved good performance.
The document summarizes a study evaluating three data compression algorithms created by Dr. Samuel Sterns. The study was led by Myuran Kanga and evaluated the algorithms on various waveforms to determine compression accuracy and efficiency. Algorithm 2 used quantization, algorithm 3 added prediction of quantized data, and algorithm 4 used adaptive arithmetic coding for further compression. Waveforms like sine, square and sawtooth waves as well as noise were compressed and decompressed, and the results were analyzed for differences between original and decompressed signals.
The document provides an introduction to computational thinking concepts including converting information to data, data types and encoding, and logic. It discusses how information is converted to continuous and discrete data, and how data is encoded through binary representations and bit strings. Different data types like numbers, text, colors, pictures and sound are also explained in terms of their encoding. The document then covers logic and computational thinking concepts like inductive and deductive logic, and how Boolean logic uses true/false propositions and logical operators.
The document discusses various techniques for image compression, including lossless and lossy methods. For lossless compression, it describes predictive coding techniques that remove inter-pixel redundancy such as delta modulation. It also covers entropy encoding schemes like Huffman coding and LZW coding. For lossy compression, it discusses the discrete cosine transform used in the JPEG standard, where higher frequency coefficients are quantized more coarsely to remove information. Zig-zag ordering is used before entropy coding the quantized DCT coefficients.
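A naive (unoptimized) version of the 2-D DCT that the JPEG standard applies to 8x8 blocks, shown on a flat block where all energy lands in the DC coefficient:

```python
import numpy as np

def dct2(block: np.ndarray) -> np.ndarray:
    """Naive 2-D DCT-II of an N x N block (the transform used by JPEG)."""
    n = block.shape[0]
    c = np.array([[np.cos((2 * x + 1) * u * np.pi / (2 * n)) for x in range(n)]
                  for u in range(n)])
    alpha = np.array([np.sqrt(1 / n)] + [np.sqrt(2 / n)] * (n - 1))
    return np.outer(alpha, alpha) * (c @ block @ c.T)

block = np.full((8, 8), 130.0)  # a flat 8x8 block of pixel value 130
coeffs = dct2(block - 128)      # JPEG level-shifts pixels by 128 first
print(np.round(coeffs, 2))      # DC term is 16.0; every other coefficient is 0
```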
This document discusses image, audio, and video compression. It explains that raw multimedia data contains redundant information and compression removes this redundancy to reduce file size. There are lossless compression techniques like run-length encoding and LZW that allow for exact reconstruction of the original data. There is also lossy compression like JPEG that permanently eliminates some information, trading off quality for smaller file size. Lossy compression is generally used for audio and video.
Image compression aims to reduce image file size while preserving quality by removing redundant data. It uses lossless methods like run-length encoding and Huffman coding that preserve all information, or lossy methods like DCT transform coding that discard unimportant visual details. The DCT transforms images into the frequency domain and allows discarding of high-frequency coefficients corresponding to imperceptible information, achieving higher compression ratios with some quality loss.
The document discusses various methods for lossless image compression by reducing different types of data redundancy. It describes how coding redundancy can be addressed through variable-length coding schemes like Huffman coding and arithmetic coding. Interpixel redundancy is reduced by applying transformations to the image data before encoding, while psychovisual redundancy is reduced via quantization. The goal of lossless compression is to minimize the file size while perfectly reconstructing the original image data.
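Variable-length coding reduces coding redundancy by shortening frequent symbols; a toy computation of average code length under assumed probabilities:

```python
# Assumed probabilities and prefix code, purely for illustration.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
codes = {"a": "0", "b": "10", "c": "11"}

avg_bits = sum(probs[s] * len(code) for s, code in codes.items())
print(avg_bits)  # 1.5 bits/symbol, versus 2 bits/symbol for a fixed-length code
```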
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
CONTENTS
Introduction
What, when
Some questions
Uses
Major steps
Types of data compression
Disadvantages
Conclusion
INTRODUCTION
Data Compression - What:
As the name implies, compression makes your data smaller, saving space.
It looks for repetitive sequences or patterns in data - e.g. the repeated word "the" in "the quick the brown fox the".
We are more repetitive than we think - text often compresses by over 50%.
Lossless vs. lossy
Data Compression - WHY
Most data from nature has redundancy: there is more data than the actual information contained in the data.
Squeezing out the excess data amounts to compression. However, decompression must remain possible, so that we can still figure out what the data means.
Is it always possible to compress? Consider a two-bit sequence: can you always compress it to one bit? No - there are four possible two-bit sequences but only two one-bit codes, so some inputs cannot be shortened.
Such questions reveal the limits of compression and give clues on how to compress well.
Question: Why do we want to make files smaller?
Answer:
To use less storage, i.e., saving costs.
To transmit files faster, decreasing access time - or to keep the same access time with a lower and cheaper bandwidth.
To process the file sequentially faster.
MAJOR STEPS
Preparation: includes analog-to-digital conversion and generating an appropriate digital representation of the information. For example, an image is divided into blocks of 8x8 pixels, each represented by a fixed number of bits per pixel.
Processing: the first stage of the actual compression process, which makes use of sophisticated algorithms.
Quantization: processes the result of the previous step. It specifies the granularity of the mapping of real numbers into integers, so this step results in a reduction of precision.
Entropy encoding: the last step. It compresses a sequential digital data stream without loss - for example, a sequence of zeroes can be compressed by specifying the number of occurrences.
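To make the quantization step concrete, here is a minimal sketch (my illustration, not from the slides) of uniform quantization; the step size is the granularity of the real-to-integer mapping, and the round trip shows the loss of precision:

import numpy as np

def quantize(values, step):
    # Map real-valued samples to integers; a larger step means
    # coarser granularity (more compression, less precision).
    return np.round(np.asarray(values) / step).astype(int)

def dequantize(indices, step):
    # Approximate reconstruction; the difference from the input
    # is exactly the precision lost in quantization.
    return indices * step

samples = [0.12, 0.48, 0.51, 0.97]
q = quantize(samples, step=0.25)          # -> [0 2 2 4]
print(q, dequantize(q, step=0.25))        # reconstruction != input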
USES OF DATA COMPRESSION
More and more data is being stored electronically. Digital video libraries, for example, contain vast amounts of data, and compression allows cost-effective storage of the data.
New technology has allowed the possibility of interactive digital television, and the demand is for high-quality transmissions, a wide selection of programs to choose from, and inexpensive hardware. But for digital television to be a success, it must use data compression [Saxton, 1996].
Data compression reduces the number of bits required to represent or transmit information.
TYPES OF DATA COMPRESSION
Entropy encoding - lossless. Data is considered a simple digital sequence and the semantics of the data are ignored.
Source encoding - lossy. Takes the semantics of the data into account; the amount of compression depends on the data contents.
Hybrid encoding - a combination of entropy and source encoding. Most multimedia systems use these.
TYPES OF DATA COMPRESSION
Entropy encoding - lossless.
Data in the data stream is considered a simple digital sequence and the semantics of the data are ignored.
Short code words are used for frequently occurring symbols, and longer code words for more infrequently occurring symbols. For example: E occurs frequently in English, so we should give it a shorter code than Q.
Examples of entropy encoding (lossless data compression):
Huffman coding
Arithmetic coding
LOSSLESS DATA COMPRESSION
Run-Length Coding
Runs (sequences) of data are stored as a single value and count, rather than as the individual run.
Example - this:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
becomes:
12WB12W3B24WB14W
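As a quick illustration (mine, not the deck's), a few lines of Python reproduce this encoding; following the slide's convention, a count is written only for runs longer than one character:

from itertools import groupby

def rle_encode(text):
    # Each run of identical characters becomes "<count><char>",
    # with the count omitted for runs of length 1.
    out = []
    for ch, run in groupby(text):
        n = len(list(run))
        out.append((str(n) if n > 1 else "") + ch)
    return "".join(out)

data = "W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24 + "B" + "W" * 14
print(rle_encode(data))  # -> 12WB12W3B24WB14W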
Lossless compression: data is not lost - the original is really needed.
Typical uses: text compression; compression of computer binaries to fit on a floppy.
Compression ratios are typically 2:1 to 8:1.
Lossless compression works on many kinds of files.
Statistical techniques:
Huffman coding
Arithmetic coding
Dictionary techniques:
LZW, LZ77
Standards: Morse code, Braille, Unix compress, gzip, zip, bzip, GIF, PNG, JBIG, Lossless JPEG
SHANNON-FANO CODING
Shannon's lossless source coding theorem is based on the concept of block coding. To illustrate this concept, we introduce a special information source in which the alphabet consists of only two letters: A = {a, b}.
1. First-Order Block Code
An example: note that 24 bits are used to represent 24 characters --- an average of 1 bit/character.
2. Second-Order Block Code: pairs of characters are mapped to either one, two, or three bits.
B2    P(B2)    Codeword
aa    0.45     0
bb    0.45     10
ab    0.05     110
ba    0.05     111

R = (0.45·1 + 0.45·2 + 0.05·3 + 0.05·3) / 2 = 0.825 bits/character
An example: note that 20 bits are used to represent 24 characters --- an average of 0.83 bits/character.
3. Third-Order Block Code: triplets of characters are mapped to bit sequences of lengths one through six.
B3     P(B3)    Codeword
aaa    0.405    0
bbb    0.405    10
aab    0.045    1100
abb    0.045    1101
bba    0.045    1110
baa    0.045    11110
aba    0.005    111110
bab    0.005    111111

R = 0.68 bits/character
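The extracted table repeated 0.405 in every probability row, so the values above are reconstructed under the assumption that the block probabilities must sum to 1 (consistent with the second-order table). A few lines of Python confirm the quoted rate:

# Expected bits per character for the third-order block code,
# using the (reconstructed) probabilities and codewords above.
code = {
    "aaa": ("0",      0.405), "bbb": ("10",     0.405),
    "aab": ("1100",   0.045), "abb": ("1101",   0.045),
    "bba": ("1110",   0.045), "baa": ("11110",  0.045),
    "aba": ("111110", 0.005), "bab": ("111111", 0.005),
}
assert abs(sum(p for _, p in code.values()) - 1.0) < 1e-9
avg_bits = sum(len(cw) * p for cw, p in code.values())
print(avg_bits / 3)  # 3 characters per block -> 0.68 bits/character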
An example: note that 17 bits are used to represent 24 characters --- an average of 0.71 bits/character.
HUFFMAN CODING
Suppose messages are made of the letters a, b, c, d, and e, which appear with probabilities .12, .4, .15, .08, and .25, respectively.
We wish to encode each character into a sequence of 0's and 1's so that no code for a character is the prefix of the code for another.
Answer (using Huffman's algorithm given on the next slide): a=1111, b=0, c=110, d=1110, e=10.
HUFFMAN CODING
Example: n = 5, w[0:4] = [2, 5, 4, 7, 9].
Resulting codes: 5=00, 2=010, 4=011, 7=10, 9=11.
[Figure: the Huffman tree - leaves 2 and 4 merge into 6, then 5 and 6 into 11, 7 and 9 into 16, and 11 and 16 into the root 27; edges are labeled 0 and 1.]
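The deck gives no code, but the construction is short enough to sketch in Python (my illustration): repeatedly merge the two lightest subtrees, then read codewords off the root-to-leaf paths. On the weights above it reproduces the codes shown, up to 0/1 tie-breaking within a level:

import heapq
from itertools import count

def huffman(weights):
    # Build a Huffman code for {symbol: weight} by repeatedly
    # merging the two lightest subtrees (cf. the tree figure).
    tiebreak = count()  # keeps the heap from comparing tree tuples
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman({2: 2, 5: 5, 4: 4, 7: 7, 9: 9}))
# -> {5: '00', 2: '010', 4: '011', 7: '10', 9: '11'}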
LZ-77 ENCODING
Good as they are, Huffman and arithmetic coding are not perfect for encoding text, because they don't capture the higher-order relationships between words and phrases.
There is a simple, clever, and effective approach to compressing text known as "LZ-77", which uses the redundant nature of text to provide compression.
For an example, consider the phrase:
the_rain_in_Spain_falls_mainly_in_the_plain
-- where the underscores ("_") indicate spaces. This uncompressed message is 43 bytes, or 344 bits, long.
the_rain_in_Spain_falls_mainly_in_the_plain
At first, LZ-77 simply outputs uncompressed characters, since there are no previous occurrences of any strings to refer back to. In our example, these characters will not be compressed:
1- the_rain_
The next chunk of the message, "in_", has occurred earlier in the message, and can be represented as a pointer back to that earlier text, along with a length field. This gives:
2- the_rain_<3,3>
the_rain_in_Spain_falls_mainly_in_the_plain
The next characters, "Sp", have not occurred before, so they have to be output uncompressed:
3- the_rain_<3,3>Sp
However, the characters "ain_" have already been sent, so they are encoded with a pointer:
4- the_rain_<3,3>Sp<9,4>
The characters "falls_m" are output uncompressed, but "ain" has been used before in "rain" and "Spain", so once again it is encoded with a pointer:
5- the_rain_<3,3>Sp<9,4>falls_m<11,3>
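A rough sketch of the idea in Python (my illustration; production encoders bound the search window and emit compact binary tokens rather than the <distance,length> text used here):

def lz77_encode(text, min_match=3):
    # Greedy LZ77: emit literals until a back-reference of at least
    # min_match characters exists, then emit <distance,length>.
    out, i = [], 0
    while i < len(text):
        best_len, best_dist = 0, 0
        for dist in range(1, i + 1):         # candidate back-distances
            length = 0
            while (i + length < len(text)
                   and text[i + length - dist] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, dist
        if best_len >= min_match:
            out.append(f"<{best_dist},{best_len}>")
            i += best_len
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

msg = "the_rain_in_Spain_falls_mainly_in_the_plain"
print(lz77_encode(msg))  # begins the_rain_<3,3>Sp<9,4>falls_m<11,3>, as in steps 1-5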
ARITHMETIC CODING
Huffman coding looks pretty slick, and it is, but there's a way to improve on it, known as "arithmetic coding". The idea is subtle and best explained by example.
Suppose we have a message that only contains the characters A, B, and C, with the following frequencies, expressed as fractions:
A: 0.5   B: 0.2   C: 0.3
letter    probability    interval     binary fraction
C         0.3            0.0 : 0.3    0
B         0.2            0.3 : 0.5    0.011 = 3/8 = 0.375
A         0.5            0.5 : 1.0    0.1   = 1/2 = 0.5
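To see how these intervals drive the encoding, here is a small sketch (my illustration; a real coder renormalizes as it goes to avoid floating-point underflow) that narrows [0, 1) one symbol at a time; any binary fraction inside the final interval identifies the whole message:

intervals = {"C": (0.0, 0.3), "B": (0.3, 0.5), "A": (0.5, 1.0)}

def narrow(message):
    # Each symbol shrinks [low, high) to its own sub-interval.
    low, high = 0.0, 1.0
    for ch in message:
        lo, hi = intervals[ch]
        width = high - low
        low, high = low + width * lo, low + width * hi
    return low, high

print(narrow("AB"))  # -> (0.65, 0.75); e.g. 0.1011 binary = 0.6875 codes "AB"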
Irreversible Compression
Irreversible compression is based on the assumption that some information can be sacrificed. [Irreversible compression is also called Entropy Reduction.]
Example: shrinking a raster image from 400-by-400 pixels to 100-by-100 pixels. The new image contains 1 pixel for every 16 pixels in the original image, and there is usually no way to determine what the original pixels were from the one new pixel.
In data files, irreversible compression is seldom used. However, it is used in image and speech processing.
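For instance, a minimal sketch of the shrinking example (my illustration, assuming plain subsampling rather than block averaging):

import numpy as np

img = np.arange(400 * 400, dtype=float).reshape(400, 400)  # stand-in image

# Keep one pixel per 4x4 block - 1 of every 16 original pixels.
# The other 15 pixels in each block cannot be recovered afterwards.
small = img[::4, ::4]
print(small.shape)  # (100, 100)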
LOSSY COMPRESSION
Data is lost, but not too much. Typical targets:
Audio
Video
Still images, medical images, photographs
Compression ratios of 10:1 often yield quite acceptable results.
Major techniques include:
Vector quantization
Block transforms
Standards: JPEG, JPEG 2000, MPEG (1, 2, 4, 7)
DISADVANTAGES
Some techniques can compress data efficiently, but there is a chance of losing data in the process.
CONCLUSION
From the above description, no single algorithm has been developed that is applicable to every kind of data file. But this difficulty can be handled by using hybrid data compression. In this IT era, data compression is essential, even though some data may be lost.