
Huffman and Arithmetic Coding


This presentation illustrates the mechanisms behind Huffman and Arithmetic Coding for lossless data compression.


  1. Huffman Coding, Arithmetic Coding, and JBIG2: Illustrations. Arber Borici, 2010, University of Northern British Columbia
  2. Huffman Coding
     - Entropy encoder for lossless compression
     - Input: symbols and their corresponding probabilities
     - Output: prefix-free codes with minimum expected length
     - Prefix property: no code in the output is a prefix of another code
     - An optimal encoding algorithm
  3. Huffman Coding: Algorithm
     1. Create a forest of leaf nodes, one for each symbol.
     2. Take the two nodes with the lowest probabilities and make them siblings. The new internal node has a probability equal to the sum of the probabilities of its two child nodes.
     3. The new internal node acts as any other node in the forest.
     4. Repeat steps 2-3 until a single tree remains.
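The steps above can be sketched in Python with a min-heap. This is a sketch, not the slides' implementation; ties among equal probabilities may be broken differently here, yielding a different but equally optimal tree than the one drawn on the following slides.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codes from a {symbol: frequency} map."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: pop the two lowest-probability nodes, make them siblings
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Step 3: the merged node re-enters the forest like any other node
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse into children
            walk(node[0], prefix + "0")  # left edge labeled 0
            walk(node[1], prefix + "1")  # right edge labeled 1
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"A": 1, "B": 1, "E": 1, "R": 2})
```

For the ARBER frequencies, any optimal tree gives an expected code length of 2 bits per symbol, even if the individual codewords differ from the slides.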
  4. Huffman Coding: Example
     Consider the string ARBER. The frequencies and probabilities of symbols A, B, E, and R are:

     Symbol       A     B     E     R
     Frequency    1     1     1     2
     Probability  20%   20%   20%   40%

     The initial forest thus comprises four leaf nodes. Now we apply the Huffman algorithm.
  5-10. Generating Huffman Codes
     [Tree-construction diagrams: A (0.2) and B (0.2) are merged into a node with probability 0.4; that node and E (0.2) are merged into a node with probability 0.6; finally, that node and R (0.4) form the root. Left edges are labeled 0, right edges 1.]
     Reading the edge labels from root to leaf gives the code table:

     Symbol  Code
     A       000
     B       001
     E       01
     R       1
  11-16. Huffman Codes: Decoding
     Decode the bit string 0001001011 by walking the tree from the root, following 0 to the left child and 1 to the right child, and emitting a symbol at each leaf:

     000 -> A, 1 -> R, 001 -> B, 01 -> E, 1 -> R

     The decoded string is ARBER. The prefix property ensures unique decodability.
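The decoding walk can be sketched as a greedy match against the code table, using the codes from the example. This hypothetical helper relies on the prefix property: the first codeword the buffer matches is the right one.

```python
def huffman_decode(bits, codes):
    """Decode a bit string with a prefix-free code table {symbol: code}."""
    inverse = {code: sym for sym, code in codes.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:            # prefix property: first hit is the symbol
            decoded.append(inverse[buffer])
            buffer = ""
    return "".join(decoded)

codes = {"A": "000", "B": "001", "E": "01", "R": "1"}
print(huffman_decode("0001001011", codes))  # ARBER
```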
  17. Arithmetic Coding
     - Entropy coder for lossless compression
     - Encodes the entire input as a single real interval
     - Slightly more efficient than Huffman coding
     - Harder to implement: practical implementation variants have been proposed
  18. Arithmetic Coding: Algorithm
     - Assign each symbol an interval [low, high) based on cumulative probabilities.
     - Given an input string, start from the interval of its first symbol.
     - For each subsequent symbol, rescale the current interval [L, H):
       New Low  = L + Sum(n-1)(p) * (H - L)
       New High = L + Sum(n)(p) * (H - L)
       where Sum(n)(p) is the cumulative probability up to and including the n-th symbol. Note that both bounds are offset from the current low.
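The update rule can be sketched as follows. This is a minimal float-based sketch for illustration; as a later slide notes, practical coders use integer arithmetic.

```python
def arithmetic_interval(message, intervals):
    """Narrow [low, high) over the message; intervals maps symbol -> (low, high)."""
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = intervals[sym]
        width = high - low
        high = low + s_high * width  # both bounds are offset from the old low
        low = low + s_low * width
    return low, high

iv = {"A": (0.0, 0.2), "B": (0.2, 0.4), "E": (0.4, 0.6), "R": (0.6, 1.0)}
low, high = arithmetic_interval("ARBER", iv)  # approximately [0.14432, 0.1456)
```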
  19. Arithmetic Coding: Example
     Consider the string ARBER. The intervals of symbols A, B, E, and R are:

     Symbol  A      B      E      R
     Low     0      0.2    0.4    0.6
     High    0.2    0.4    0.6    1

     That is: A: [0, 0.2); B: [0.2, 0.4); E: [0.4, 0.6); R: [0.6, 1).
  20-23. Arithmetic Coding: Example
     [Interval-narrowing diagrams: each symbol read rescales the current interval in proportion to the symbol probabilities, e.g. A takes 20% and R the top 40% of the current interval.]

     Symbol read   Current interval
     A             [0,       0.2)
     R             [0.12,    0.2)
     B             [0.136,   0.152)
     E             [0.1424,  0.1456)
     R             [0.14432, 0.1456)
  24. Arithmetic Coding: Example
     The final interval for the input string ARBER is [0.14432, 0.1456). To emit bits, one chooses a number in the interval and encodes its fractional part in binary. For this interval, one may choose the point 0.14432, whose fractional part in binary is:

     001001001111001000100111 110100000010100010100001 111 (51 bits)
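The binary expansion of the chosen point can be sketched by repeated doubling. This is a sketch only; floating-point doubling is reliable for the leading bits but drifts for very long expansions.

```python
def frac_to_bits(x, n):
    """First n binary digits of a fraction 0 <= x < 1, by repeated doubling."""
    bits = []
    for _ in range(n):
        x *= 2
        bit = int(x)     # the integer part is the next binary digit
        bits.append(str(bit))
        x -= bit
    return "".join(bits)

print(frac_to_bits(0.14432, 18))  # 001001001111001000
```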
  25. Arithmetic Coding
     - Practical implementations use absolute frequencies (integers), since the low and high interval values become too small for finite-precision arithmetic.
     - An END-OF-STREAM symbol (with a very small probability) is usually required.
     - Decoding is straightforward: start with the final interval and repeatedly subdivide it proportionally to the symbol probabilities, proceeding until an END-OF-STREAM symbol is reached.
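Decoding can be sketched by repeatedly locating the received value inside a symbol's interval and rescaling. This sketch stops after a fixed symbol count instead of an END-OF-STREAM symbol, and uses floats for readability.

```python
def arithmetic_decode(value, intervals, n):
    """Recover n symbols from a value lying in the final interval."""
    out = []
    for _ in range(n):
        for sym, (lo, hi) in intervals.items():
            if lo <= value < hi:
                out.append(sym)
                value = (value - lo) / (hi - lo)  # rescale back into [0, 1)
                break
    return "".join(out)

iv = {"A": (0.0, 0.2), "B": (0.2, 0.4), "E": (0.4, 0.6), "R": (0.6, 1.0)}
# 0.145 lies safely inside the final interval [0.14432, 0.1456)
print(arithmetic_decode(0.145, iv, 5))  # ARBER
```

Choosing a value well inside the interval avoids float round-off flipping a boundary comparison.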
  26. JBIG-2
     - Lossless and lossy bi-level data compression standard
     - Emerged from JBIG-1 (Joint Bi-Level Image Experts Group)
     - Supports three coding modes: generic, halftone, and text
     - The image is segmented into regions, which can be encoded using different methods
  27. JBIG-2: Segmentation
     The image on the left is segmented into a binary image, text, and a grayscale image. [Figure: original image and its binary, text, and grayscale segments]
  28. JBIG-2: Encoding
     - Arithmetic coding (QM coder)
     - Context-based prediction, with larger contexts than JBIG-1
     - Progressive compression (display)
     [Context-template diagram: X marks the pixel to be coded; A marks adaptive pixels, which can be moved. The predictive context uses previously coded pixels and feeds an adaptive coder.]
  29. JBIG-2: Halftone and Text
     - Halftone images are coded as multi-level images, along with pattern and grid parameters.
     - Each text symbol is encoded in a dictionary along with its relative coordinates.
  30. Color Separation
     Images comprising discrete colors can be treated as multi-layered binary images:
     - Each color, together with the image background, forms one binary layer.
     - If there are N colors, one of which represents the background, there are N-1 binary layers: a map with a white background and four other colors thus yields 4 binary layers.
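The layering idea can be sketched on a tiny color-indexed raster. This is a hypothetical helper using plain lists for clarity; real images would use an array library.

```python
def color_layers(image, background=0):
    """Split a color-indexed image into one binary layer per non-background color."""
    colors = sorted({px for row in image for px in row} - {background})
    return {c: [[1 if px == c else 0 for px in row] for row in image]
            for c in colors}

tiny = [[0, 1, 2],
        [1, 1, 0]]          # 3 colors including background 0 -> 2 binary layers
layers = color_layers(tiny)
```

Each layer is a bi-level image and can be handed to any binary coder, which is the premise of the color-separation approach above.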
  31. Color Separation: Example
     The following Excel graph comprises 34 colors plus the white background. [Figure: chart]
  32. [Figure: Layer 1]
  33. [Figure: Layer 5]
  34. [Figure: Layer 12]
  35. Comparison with JBIG2 and JPEG
     Compression results on two test cases:

                 Case 1   Case 2
     Our Method  96%      98%
     JBIG2       94%      97%
     JPEG        91%      92%
  36. Encoding Example
     Original size: 64 * 3 = 192 bits. [Figure: codebook, RCRC, and an uncompressible block.] The encoded output totals 1 + 20 + 64 = 85 bits, giving a space saving of one minus the encoded size over the original size:

     1 - (1 + 20 + 64) / 192 = 56%
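The arithmetic above is a quick check: 85 encoded bits against 192 original bits.

```python
encoded_bits = 1 + 20 + 64     # total encoded size from the slide
original_bits = 64 * 3         # 192 bits
saving = 1 - encoded_bits / original_bits
print(f"{saving:.0%}")  # 56%
```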
  37. Definitions (cont.)
     - Compression ratio is defined as the number of bits after a coding scheme has been applied to the source data, divided by the original source data size. It is expressed as a percentage or, when the source is an image, in bits per pixel (bpp).
     - JBIG-2 is the standard binary image compression scheme. It is based mainly on arithmetic coding with context modeling; other methods in the literature are designed for specific classes of binary images.
     - Our objective: design a coding method that works regardless of the nature of a binary image.
