Dictionary Based Compression
CHAPTER 6
Compression Techniques

Objectives:
• Able to perform data compression
• Able to use different compression techniques
Introduction

What is Compression?
Data compression requires the identification and extraction of source redundancy. In other words, data compression seeks to reduce the number of bits used to store or transmit information. There is a wide range of compression methods, which can be so unlike one another that they have little in common except that they compress data.

The Need for Compression
In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to the device and decompress it when it is retrieved. In terms of communications, the bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing it at the receiving end.
A Brief History of Data Compression

The late 40's were the early years of Information Theory; the idea of developing efficient new coding methods was just starting to be fleshed out. Ideas of entropy, information content and redundancy were explored. One popular notion held that if the probabilities of the symbols in a message were known, there ought to be a way to code the symbols so that the message takes up less space.

The first well-known method for compressing digital signals is now known as Shannon-Fano coding. Shannon and Fano [~1948] simultaneously developed this algorithm, which assigns binary codewords to the unique symbols that appear within a given data file. While Shannon-Fano coding was a great leap forward, it had the unfortunate luck to be quickly superseded by an even more efficient coding system: Huffman coding.
Huffman coding [1952] shares most characteristics of Shannon-Fano coding. Huffman coding can perform effective data compression by reducing the amount of redundancy in the coding of symbols. It has been proven to be the most efficient coding method among those that assign each symbol a codeword of a whole number of bits.

In the last fifteen years, Huffman coding has largely been replaced by arithmetic coding. Arithmetic coding bypasses the idea of replacing an input symbol with a specific code. Instead, it replaces a stream of input symbols with a single floating-point output number. More bits are needed in the output number for longer, more complex messages.
Terminology
• Compressor – software (or hardware) device that compresses data
• Decompressor – software (or hardware) device that decompresses data
• Codec – software (or hardware) device that compresses and decompresses data
• Algorithm – the logic that governs the compression/decompression process
Compression can be categorized in two broad ways:

Lossless compression
• recovers the exact original data after decompression.
• mainly used for compressing database records, spreadsheets or word-processing files, where exact replication of the original is essential.

Lossy compression
• will result in a certain loss of accuracy in exchange for a substantial increase in compression.
• more effective when used to compress graphic images and digitised voice, where losses outside visual or aural perception can be tolerated.
• Most lossy compression techniques can be adjusted to different quality levels, gaining higher accuracy in exchange for less effective compression.
Lossless Compression Algorithms:
• Dictionary-based compression algorithms
• Repetitive Sequence Suppression
• Run-length Encoding*
• Pattern Substitution
• Entropy Encoding*
• The Shannon-Fano Algorithm
• Huffman Coding*
• Arithmetic Coding*
Dictionary-based compression algorithms

Dictionary-based compression algorithms use a completely different method to compress data. They encode variable-length strings of symbols as single tokens. Each token forms an index into a phrase dictionary. If the tokens are smaller than the phrases they replace, compression occurs.

Suppose we want to encode the Oxford Concise English dictionary, which contains about 159,000 entries. Why not just transmit each word as an 18-bit number?
Problems: too many bits are needed, everyone needs the same dictionary, and it only works for English text.
Solution: find a way to build the dictionary adaptively.
Two dictionary-based compression techniques called LZ77 and LZ78 have been developed. LZ77 is a "sliding window" technique in which the dictionary consists of a set of fixed-length phrases found in a "window" into the previously seen text. LZ78 takes a completely different approach to building a dictionary. Instead of using fixed-length phrases from a window into the text, LZ78 builds phrases up one symbol at a time, adding a new symbol to an existing phrase when a match occurs.

The LZW Compression Algorithm can be summarised as follows:
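As a minimal sketch of the standard LZW compression loop (byte-oriented input and an initial 256-entry dictionary are assumptions here): start from a dictionary of all single symbols, keep extending the current phrase while it is still in the dictionary, and on a miss emit the code of the longest known phrase and add the extended phrase as a new entry.

```python
def lzw_compress(text):
    """Emit dictionary indices for the longest phrases already seen."""
    # Start with all single characters (here: the 256 byte values)
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w, out = "", []
    for c in text:
        wc = w + c
        if wc in dictionary:
            w = wc                      # keep extending the current phrase
        else:
            out.append(dictionary[w])   # emit the longest known phrase
            dictionary[wc] = next_code  # add the new, longer phrase
            next_code += 1
            w = c
    if w:
        out.append(dictionary[w])
    return out
```

For "ABABABA" this yields [65, 66, 256, 258]: the codes for "A" and "B", then the newly created phrases "AB" and "ABA".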
Example

The LZW Decompression Algorithm can be summarised as follows:
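Again as a hedged sketch of the standard algorithm: the decompressor rebuilds the same dictionary one step behind the compressor. The only subtlety is a code that refers to the phrase currently being defined, handled by the w + w[0] case.

```python
def lzw_decompress(codes):
    """Rebuild the text from LZW codes, reconstructing the dictionary."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == next_code:
            entry = w + w[0]   # code refers to the phrase being defined
        else:
            raise ValueError("corrupt compressed stream")
        out.append(entry)
        dictionary[next_code] = w + entry[0]  # mirror the compressor
        next_code += 1
        w = entry
    return "".join(out)
```

Feeding it the codes from the compression example, [65, 66, 256, 258], recovers "ABABABA".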
Example:

Problem: what if we run out of dictionary space?
• Solution 1: Keep track of unused entries and use LRU replacement.
• Solution 2: Monitor compression performance and flush the dictionary when performance is poor.
Repetitive Sequence Suppression

Fairly straightforward to understand and implement, but simplicity is their downfall: NOT the best compression ratios. Some methods have their applications, e.g. as a component of JPEG, or silence suppression.

If a series of n successive identical tokens appears, replace the series with one token and a count of the number of occurrences. Usually a special flag is needed to denote when the repeated token appears.
Example: 89400000000000000000000000000000000 can be replaced with 894f32, where f is the flag for zero.
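The zero-run example above can be sketched as follows; the flag character f and the minimum run length worth suppressing are assumptions for illustration.

```python
FLAG = "f"  # hypothetical flag character marking a suppressed run of zeros

def suppress_zeros(digits, min_run=3):
    """Replace each run of '0' of length >= min_run with FLAG + count."""
    out, i = [], 0
    while i < len(digits):
        if digits[i] == "0":
            j = i
            while j < len(digits) and digits[j] == "0":
                j += 1                     # scan to the end of the zero run
            if j - i >= min_run:
                out.append(f"{FLAG}{j - i}")  # emit flag + run length
                i = j
                continue
        out.append(digits[i])              # ordinary digit: copy through
        i += 1
    return "".join(out)
```

With this, 894 followed by 32 zeros compresses to 894f32, exactly as in the slide.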
How Much Compression?
Compression savings depend on the content of the data. Applications of this simple compression technique include:
• Suppression of zeros in a file (Zero Length Suppression)
• Silence in audio data, pauses in conversation, etc.
• Bitmaps
• Blanks in text or program source files
• Backgrounds in images
• Other regular image or data tokens
Run-length Encoding

Uncompressed:
Blue White White White White White White Blue White Blue White White White White White Blue etc.

Compressed:
1XBlue 6XWhite 1XBlue 1XWhite 1XBlue 4XWhite 1XBlue 1XWhite etc.
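The bitmap rows above (e.g. "Blue" followed by six "White"s and a "Blue" becoming 1XBlue 6XWhite 1XBlue) can be sketched in both directions:

```python
def rle_encode(tokens):
    """Collapse runs of identical tokens into (count, token) pairs."""
    runs = []
    for t in tokens:
        if runs and runs[-1][1] == t:
            runs[-1] = (runs[-1][0] + 1, t)  # extend the current run
        else:
            runs.append((1, t))              # start a new run
    return runs

def rle_decode(runs):
    """Expand (count, token) pairs back into the original sequence."""
    return [t for count, t in runs for _ in range(count)]
```

Decoding the pairs reproduces the original row exactly, which is what makes the scheme lossless.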
Pattern Substitution
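As a sketch of the idea, pattern substitution replaces frequently recurring strings with shorter stand-in tokens; the substitution table and token characters below are illustrative assumptions, and the table must be shared with the decoder (and the tokens must not occur in the original text).

```python
# Hypothetical substitution table; tokens must not appear in the input
PATTERNS = {"the ": "&", "and ": "%"}

def substitute(text, patterns=PATTERNS):
    """Replace each frequent pattern with its shorter token."""
    for pattern, token in patterns.items():
        text = text.replace(pattern, token)
    return text

def undo(text, patterns=PATTERNS):
    """Invert the substitution to recover the original text."""
    for pattern, token in patterns.items():
        text = text.replace(token, pattern)
    return text
```

For example, "the cat and the dog" becomes "&cat %&dog", and undoing the substitution restores the original.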
Entropy Encoding

The Shannon-Fano Coding
To create a code tree according to Shannon and Fano, an ordered table is required giving the frequency of each symbol. Each part of the table will be divided into two segments. The algorithm has to ensure that the upper and the lower segments have nearly the same sum of frequencies. This procedure is repeated until only single symbols are left.
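The partitioning procedure above can be sketched directly: sort symbols by descending frequency, split the table where the two halves' frequency sums are closest, prefix the halves with 0 and 1, and recurse (the symbol frequencies in the usage example are illustrative, not the slides' own table).

```python
def shannon_fano(table):
    """table: list of (symbol, freq) pairs sorted by descending freq."""
    if len(table) == 1:
        return {table[0][0]: ""}
    total = sum(freq for _, freq in table)
    # Find the split where upper and lower sums are nearly equal
    acc, best_split, best_diff = 0, 1, float("inf")
    for i in range(1, len(table)):
        acc += table[i - 1][1]
        diff = abs(acc - (total - acc))
        if diff < best_diff:
            best_diff, best_split = diff, i
    # Upper segment gets prefix 0, lower segment gets prefix 1
    codes = {s: "0" + c for s, c in shannon_fano(table[:best_split]).items()}
    codes.update({s: "1" + c for s, c in shannon_fano(table[best_split:]).items()})
    return codes
```

For frequencies A=15, B=7, C=6, D=6, E=5 this produces the classic codes A=00, B=01, C=10, D=110, E=111.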
The Shannon-Fano Algorithm
Example Shannon-Fano Coding

[Table: SYMBOL and FREQ, with the SUM and CODE columns after each of steps 1, 2 and 3]
Huffman Coding

Huffman Code: Example
Arithmetic Coding
How to translate a range to bits

Example:
1. BACA: low = 0.59375, high = 0.60937.
2. CAEE$: low = 0.33184, high = 0.3322.
Decimal
0.12345 = 1/10 + 2/10^2 + 3/10^3 + 4/10^4 + 5/10^5
        = 1×10^-1 + 2×10^-2 + 3×10^-3 + 4×10^-4 + 5×10^-5
(the digits of 0.12345 occupy the 10^-1, 10^-2, 10^-3, 10^-4, 10^-5 positions)

Binary
0.01010101 = 0/2 + 1/2^2 + 0/2^3 + 1/2^4 + 0/2^5 + 1/2^6 + 0/2^7 + 1/2^8
           = 1×2^-2 + 1×2^-4 + 1×2^-6 + 1×2^-8
(the digits of 0.01010 occupy the 2^-1, 2^-2, 2^-3, 2^-4, 2^-5 positions)
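The positional expansions above amount to a one-line helper (names are mine):

```python
def fraction_value(digits, base):
    """Value of 0.digits interpreted in the given base."""
    # Digit k (0-indexed) sits in the base^-(k+1) position
    return sum(int(d) * base ** -(k + 1) for k, d in enumerate(digits))
```

For instance, fraction_value("01010101", 2) gives 0.33203125, the value asked for on the next slide.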
Binary to decimal
0.1₂ = 0.5₁₀
0.01₂ = 0.25₁₀
0.001₂ = 0.125₁₀
0.0001₂ = 0.0625₁₀
0.00001₂ = 0.03125₁₀
What is the value of 0.01010101₂ in decimal? 0.33203125₁₀

Generating the codeword for the encoder, range [0.33184, 0.33220]:

BEGIN
  code = 0;
  k = 1;
  while ( value(code) < low ) {
    assign 1 to the k-th binary fraction bit;
    if ( value(code) > high )
      replace the k-th bit by 0;
    k = k + 1;
  }
END
Example 1
Range (0.33184, 0.33220)

BEGIN
  code = 0;
  k = 1;
  while ( value(code) < 0.33184 ) {
    assign 1 to the k-th binary fraction bit;
    if ( value(code) > 0.33220 )
      replace the k-th bit by 0;
    k = k + 1;
  }
END

1. Assign 1 to the first fraction bit (codeword = 0.1₂):
   value(0.1₂) = 0.5₁₀ > high (0.33220₁₀) -> out of range, hence we replace the first bit by 0.
   value(0.0₂) = 0 < low (0.33184₁₀) -> the while loop continues.
2. Assign 1 to the second fraction bit (0.01₂) = 0.25₁₀, which is less than high (0.33220), so the bit stays 1.
3. Assign 1 to the third fraction bit (0.011₂) = 0.25₁₀ + 0.125₁₀ = 0.375₁₀, which is bigger than high (0.33220), so replace the k-th bit by 0. Now the codeword = 0.010₂.
4. Assign 1 to the fourth fraction bit (0.0101₂) = 0.25₁₀ + 0.0625₁₀ = 0.3125₁₀, which is less than high (0.33220). Now the codeword = 0.0101₂.
5. Continue…

Eventually, the binary codeword generated is 0.01010101, which is 0.33203125₁₀: an 8-bit codeword representing CAEE$.
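Putting the loop and the worked example together, a runnable sketch of the slides' pseudocode (function and variable names are mine):

```python
def range_to_bits(low, high):
    """Shortest binary fraction 0.b1b2... whose value lands in [low, high]."""
    bits, value, k = [], 0.0, 1
    while value < low:
        candidate = value + 2.0 ** -k   # tentatively set the k-th bit to 1
        if candidate > high:
            bits.append("0")            # overshoots the range: keep the bit 0
        else:
            bits.append("1")
            value = candidate
        k += 1
    return "".join(bits), value
```

For the CAEE$ range (0.33184, 0.33220) this reproduces the 8-bit codeword 0.01010101 = 0.33203125 derived above, and for the BACA range (0.59375, 0.60937) it yields the 5-bit codeword 0.10011 = 0.59375.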
