# Huffman and Arithmetic Coding

This presentation illustrates the mechanisms behind Huffman and Arithmetic Coding for lossless data compression.


1. Huffman Coding, Arithmetic Coding, and JBIG2: Illustrations. Arber Borici, 2010. University of Northern British Columbia.
2. 2. Huffman Coding Entropy encoder for lossless compression Input: Symbols and corresponding probabilities Output: Prefix-free codes with minimum expected lengths  Prefix property: There exists no code in the output that is a prefix of another code Optimal encoding algorithm
3. Huffman Coding: Algorithm. (1) Create a forest of leaf nodes, one per symbol. (2) Take the two nodes with the lowest probabilities and make them siblings; the new internal node has a probability equal to the sum of the probabilities of its two children. (3) The new internal node acts as any other node in the forest. (4) Repeat steps 2–3 until a single tree remains.
4. Huffman Coding: Example. Consider the string ARBER. The frequencies of the symbols A, B, E, and R are 1, 1, 1, and 2, giving probabilities 20%, 20%, 20%, and 40%. The initial forest thus comprises four nodes. Now we apply the Huffman algorithm.
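The merge loop above can be sketched in Python (a minimal illustration, not the presentation's code; function names are my own, and tie-breaking between equal-weight nodes is arbitrary):

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build prefix-free codes by repeatedly merging the two lightest nodes."""
    # Each heap entry is (weight, tiebreak, {symbol: partial code}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # lightest subtree
        w2, _, c2 = heapq.heappop(heap)  # second-lightest subtree
        # Make them siblings: prefix 0 on the left branch, 1 on the right.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "ARBER"
codes = huffman_codes(Counter(text))
bits = "".join(codes[s] for s in text)
```

The exact codes depend on how ties between equal-probability nodes are broken, so A may come out as 000 or 00, but every valid Huffman tree gives the same optimal total length: 10 bits for ARBER, matching the slides.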
5.–10. Generating Huffman Codes. [Tree-construction diagrams.] The two lightest nodes, A (0.2) and B (0.2), become siblings under a node of weight 0.4; that node and E (0.2) merge into a 0.6 node; finally the 0.6 node and R (0.4) merge into the root. Labeling left edges 0 and right edges 1 gives the code table: A = 000, B = 001, E = 01, R = 1.
11.–16. Huffman Codes: Decoding. [Tree-traversal diagrams.] To decode 0001001011, walk the tree from the root, following one edge per bit and emitting a symbol at each leaf: 000 → A, 1 → R, 001 → B, 01 → E, 1 → R, recovering ARBER. The prefix property ensures unique decodability.
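The decoding walk amounts to a greedy prefix match over the code table from the slides; a sketch in Python (illustrative, not the presentation's code):

```python
def huffman_decode(bits, codes):
    """Decode a bit string using a prefix-free code table {symbol: code}."""
    inverse = {code: symbol for symbol, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        # The prefix property guarantees at most one match at any point.
        if buffer in inverse:
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

codes = {"A": "000", "B": "001", "E": "01", "R": "1"}
print(huffman_decode("0001001011", codes))  # → ARBER
```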
17. 17. Arithmetic Coding Entropy coder for lossless compression Encodes the entire input data using a real interval Slightly more efficient than Huffman Coding Implementation is harder: practical implementation variations have been proposed
18. Arithmetic Coding: Algorithm. Assign each symbol an interval [low, high) based on cumulative probabilities. Given an input string, start with the interval of the first symbol, then narrow the interval for each subsequent symbol: New Low = Low + CumLow(symbol) × (High − Low); New High = Low + CumHigh(symbol) × (High − Low), where CumLow and CumHigh are the symbol's cumulative-probability bounds.
19. Arithmetic Coding: Example. Consider the string ARBER. The intervals of the symbols are A: [0, 0.2); B: [0.2, 0.4); E: [0.4, 0.6); R: [0.6, 1).
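The narrowing rule can be sketched with exact rational arithmetic, which sidesteps floating-point drift (illustrative Python, not the presentation's code):

```python
from fractions import Fraction

# Cumulative-probability intervals from the slides, as exact rationals.
INTERVALS = {
    "A": (Fraction(0), Fraction(1, 5)),
    "B": (Fraction(1, 5), Fraction(2, 5)),
    "E": (Fraction(2, 5), Fraction(3, 5)),
    "R": (Fraction(3, 5), Fraction(1)),
}

def arithmetic_encode(text):
    """Narrow [0, 1) once per symbol; return the final interval."""
    low, high = Fraction(0), Fraction(1)
    for symbol in text:
        cum_low, cum_high = INTERVALS[symbol]
        width = high - low
        low, high = low + cum_low * width, low + cum_high * width
    return low, high

low, high = arithmetic_encode("ARBER")
print(float(low), float(high))  # → 0.14432 0.1456
```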
20.–23. Arithmetic Coding: Example. [Interval-narrowing diagrams.] Encoding ARBER narrows the interval one symbol at a time, subdividing the current interval in proportion to the symbol probabilities (20%, 20%, 20%, 40%): A → [0, 0.2); R → [0.12, 0.2); B → [0.136, 0.152); E → [0.1424, 0.1456); R → [0.14432, 0.1456).
24. Arithmetic Coding: Example. The final interval for the input string ARBER is [0.14432, 0.1456). To produce bits, one chooses a number in the interval and encodes its fractional part. For the sample interval, one may choose the point 0.14432, which in binary is 0.001001001111001000100111110100000010100010100001111 (51 bits).
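The binary expansion of the chosen point can be reproduced with exact integer arithmetic (an illustrative sketch; the expansion is truncated to the requested number of bits):

```python
def fraction_bits(numerator, denominator, n_bits):
    """Emit the first n_bits of the binary expansion of numerator/denominator."""
    bits = []
    for _ in range(n_bits):
        numerator *= 2  # shift one binary place to the left
        if numerator >= denominator:
            bits.append("1")
            numerator -= denominator
        else:
            bits.append("0")
    return "".join(bits)

# 0.14432 = 14432/100000; the first 24 bits match the slide's expansion.
print(fraction_bits(14432, 100000, 24))  # → 001001001111001000100111
```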
25. 25. Arithmetic Coding Practical implementations involve absolute frequencies (integers), since the low and high interval values tend to become really small. An END-OF-STREAM flag is usually required (with a very small probability) Decoding is straightforward: Start with the last interval and divide intervals proportionally to symbol probabilities. Proceed until and END-OF-STREAM control sequence is reached.
26. 26. JBIG-2 Lossless and lossy bi-level data compression standard Emerged from JBIG-1  Joint Bi-Level Image Experts Group Supports three coding modes:  Generic  Halftone  Text Image is segmented into regions, which can be encoded using different methods
27. JBIG-2: Segmentation. [Figure.] The image on the left is segmented into a binary image, text, and a grayscale image.
28. JBIG-2: Encoding. Arithmetic coding (the QM coder) with context-based prediction, using larger contexts than JBIG-1, plus progressive compression for display. [Context-template diagram: X = pixel to be coded; A = adaptive pixels, which can be moved.] The predictive context uses previously coded information (an adaptive coder).
29. JBIG-2: Halftone and Text. Halftone images are coded as multi-level images, along with pattern and grid parameters. Each text symbol is encoded in a dictionary along with relative coordinates. [Figure.]
30. Color Separation. Images comprising discrete colors can be treated as multi-layered binary images: each color together with the image background forms one binary layer. If there are N colors, one of which represents the background, there are N − 1 binary layers: a map with a white background and four colors thus yields 4 binary layers.
31. Color Separation: Example. [Figure.] The following Excel graph comprises 34 colors plus the white background.
32. Layer 1. [Figure.]
33. Layer 5. [Figure.]
34. Layer 12. [Figure.]
35. Comparison with JBIG2 and JPEG. [Two test images.] Our method: 96% and 98%; JBIG2: 94% and 97%; JPEG: 91% and 92%.
36. Encoding Example. Original size: 64 × 3 = 192 bits. [Diagram: codebook, RC RC, one uncompressible block.] Compression is one minus the ratio of the encoded-stream size to the original size: 1 − (1 + 20 + 64) / 192 ≈ 56%.
37. Definitions (cont.). The compression ratio is defined as the number of bits after a coding scheme has been applied to the source data, divided by the original source-data size; it is expressed as a percentage, or as bits per pixel (bpp) when the source data is an image. JBIG-2 is the standard binary-image compression scheme, based mainly on arithmetic coding with context modeling; other methods in the literature are designed for specific classes of binary images. Our objective: design a coding method that works regardless of the nature of a binary image.