# Huffman and Arithmetic Coding

This presentation illustrates the mechanisms behind Huffman and Arithmetic Coding for lossless data compression.


### Transcript

• 1. Huffman Coding, Arithmetic Coding, and JBIG2 Illustrations. Arber Borici, 2010, University of Northern British Columbia.
• 2. Huffman Coding Entropy encoder for lossless compression Input: Symbols and corresponding probabilities Output: Prefix-free codes with minimum expected lengths  Prefix property: There exists no code in the output that is a prefix of another code Optimal encoding algorithm
• 3. Huffman Coding: Algorithm
  1. Create a forest of leaf nodes, one for each symbol.
  2. Take the two nodes with the lowest probabilities and make them siblings. The new internal node has a probability equal to the sum of the probabilities of its two child nodes.
  3. The new internal node acts as any other node in the forest.
  4. Repeat steps 2–3 until a single tree remains.
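The merge loop above can be sketched in a few lines of Python. Note that tie-breaking among equal probabilities is arbitrary: for the ARBER distribution this sketch may build a different tree than the slides do, but any optimal tree has the same expected code length (2 bits per symbol here). The function name `huffman_codes` is illustrative, not from the slides.

```python
import heapq
import itertools

def huffman_codes(probs):
    """Build a Huffman code table from a {symbol: probability} dict."""
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    # Step 1: the forest starts as one leaf per symbol.
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 2-3: merge the two lowest-probability nodes into a new
        # internal node whose probability is the sum of its children's.
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}       # left edge = 0
        merged.update({s: "1" + c for s, c in right.items()})  # right edge = 1
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

probs = {"A": 0.2, "B": 0.2, "E": 0.2, "R": 0.4}
codes = huffman_codes(probs)
```

The resulting table is prefix-free by construction, since symbols only ever sit at the leaves of the merge tree.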
• 4. Huffman Coding: Example. Consider the string ARBER. The frequencies and probabilities of the symbols A, B, E, and R are:

  | Symbol      | A   | B   | E   | R   |
  |-------------|-----|-----|-----|-----|
  | Frequency   | 1   | 1   | 1   | 2   |
  | Probability | 20% | 20% | 20% | 40% |

  The initial forest thus comprises four leaf nodes. Now we apply the Huffman algorithm.
• 5.–10. Generating Huffman Codes: six slides animate the tree construction. A (0.2) and B (0.2) are merged first into a node of weight 0.4; that node is merged with E (0.2) into a node of weight 0.6; finally that node is merged with R (0.4) to form the root r. Labeling left edges 0 and right edges 1 yields the code table:

  | Symbol | Code |
  |--------|------|
  | A      | 000  |
  | B      | 001  |
  | E      | 01   |
  | R      | 1    |
• 11.–16. Huffman Codes: Decoding. Six slides decode the bit stream 0001001011 by walking the tree from the root: 000 → A, 1 → R, 001 → B, 01 → E, 1 → R, recovering ARBER. The prefix property ensures unique decodability.
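Because the code is prefix-free, decoding needs no lookahead: read bits until the buffer matches a complete code, emit the symbol, and reset. A minimal sketch using the code table from the slides (the helper name `huffman_decode` is my own):

```python
def huffman_decode(bits, codes):
    """Greedy prefix matching: the prefix property guarantees that at each
    position exactly one code can match, so decoding is unambiguous."""
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # a complete code has been read
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

# Code table from the slides: A=000, B=001, E=01, R=1.
codes = {"A": "000", "B": "001", "E": "01", "R": "1"}
decoded = huffman_decode("0001001011", codes)
```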
• 17. Arithmetic Coding Entropy coder for lossless compression Encodes the entire input data using a real interval Slightly more efficient than Huffman Coding Implementation is harder: practical implementation variations have been proposed
• 18. Arithmetic Coding: Algorithm
  1. Create an interval [low, high) for each symbol, based on the cumulative probabilities.
  2. Given an input string, start with the interval of the first symbol.
  3. For each subsequent symbol, rescale the current interval [L, H):
     New Low = L + Sum_{n−1}(p) × (H − L)
     New High = L + Sum_n(p) × (H − L)
  (Both updates are offsets from the current Low; this matches the worked example on the following slides.)
• 19. Arithmetic Coding: Example. Consider the string ARBER. The intervals of symbols A, B, E, and R are:

  | Symbol | A   | B   | E   | R   |
  |--------|-----|-----|-----|-----|
  | Low    | 0   | 0.2 | 0.4 | 0.6 |
  | High   | 0.2 | 0.4 | 0.6 | 1   |

  That is, A: [0, 0.2); B: [0.2, 0.4); E: [0.4, 0.6); R: [0.6, 1).
• 20.–23. Arithmetic Coding: Example. Four slides narrow the interval one symbol at a time; at each step the current interval is subdivided in proportion to the symbol probabilities (20%, 20%, 20%, 40%) and the subinterval of the next symbol becomes the new interval:

  | Symbol encoded | Interval after encoding |
  |----------------|-------------------------|
  | A              | [0, 0.2)                |
  | R              | [0.12, 0.2)             |
  | B              | [0.136, 0.152)          |
  | E              | [0.1424, 0.1456)        |
  | R              | [0.14432, 0.1456)       |
• 24. Arithmetic Coding: Example. The final interval for the input string ARBER is [0.14432, 0.1456). To emit bits, one chooses a number in the interval and encodes its fractional part in binary. For the sample interval, one may choose the point 0.14432, which in binary is approximately 0.001001001111001000100111110100000010100010100001111 (51 bits).
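The interval-narrowing computation can be reproduced exactly with rational arithmetic. This sketch uses Python's `fractions.Fraction` to sidestep floating-point drift; it is an illustration of the update rule from slide 18, not how production coders work (they use scaled integers).

```python
from fractions import Fraction

# [low, high) per symbol, as in the slides (probabilities 0.2/0.2/0.2/0.4).
INTERVALS = {
    "A": (Fraction(0), Fraction(1, 5)),
    "B": (Fraction(1, 5), Fraction(2, 5)),
    "E": (Fraction(2, 5), Fraction(3, 5)),
    "R": (Fraction(3, 5), Fraction(1)),
}

def arith_encode(text):
    """Return the final [low, high) interval for the whole input string."""
    low, high = Fraction(0), Fraction(1)
    for sym in text:
        s_low, s_high = INTERVALS[sym]
        width = high - low
        # New Low / New High per the update rule: both offsets from low.
        low, high = low + s_low * width, low + s_high * width
    return low, high

low, high = arith_encode("ARBER")   # expected: [0.14432, 0.1456)
```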
• 25. Arithmetic Coding Practical implementations involve absolute frequencies (integers), since the low and high interval values tend to become really small. An END-OF-STREAM flag is usually required (with a very small probability) Decoding is straightforward: Start with the last interval and divide intervals proportionally to symbol probabilities. Proceed until and END-OF-STREAM control sequence is reached.
• 26. JBIG-2 Lossless and lossy bi-level data compression standard Emerged from JBIG-1  Joint Bi-Level Image Experts Group Supports three coding modes:  Generic  Halftone  Text Image is segmented into regions, which can be encoded using different methods
• 27. JBIG-2: Segmentation. The example image is segmented into binary, text, and grayscale regions.
• 28. JBIG-2: Encoding. Arithmetic coding (the QM coder) with context-based prediction, using larger contexts than JBIG-1, plus progressive compression (display). In the context template, X is the pixel to be coded and A marks adaptive pixels, which can be moved. The predictive context feeds previously coded information to the adaptive coder.
• 29. JBIG-2: Halftone and Text. Halftone images are coded as multi-level images, along with pattern and grid parameters. Each text symbol is encoded in a dictionary along with its relative coordinates.
• 30. Color Separation Images comprising discrete colors can be considered as multi-layered binary images:  Each color and the image background form one binary layer If there are N colors, where one color represents the image background, then there will be N-1 binary layers:  A map with white background and four colors will thus yield 4 binary layers
• 31. Color Separation: Example The following Excel graph comprises 34 colors + the white background:
• 32.–34. Three slides show binary layers 1, 5, and 12 of the separated image.
• 35. Comparison with JBIG2 and JPEG, for two test images:

  | Method     | Image 1 | Image 2 |
  |------------|---------|---------|
  | Our Method | 96%     | 98%     |
  | JBIG2      | 94%     | 97%     |
  | JPEG       | 91%     | 92%     |
• 36. Encoding Example. Original size: 64 × 3 = 192 bits. Part of the data is encoded via a codebook and part is left uncompressible; the encoded stream totals 1 + 20 + 64 = 85 bits, giving a space saving of 1 − 85/192 ≈ 56%.
• 37. Definitions (cont.). The compression ratio is defined as the number of bits after a coding scheme has been applied to the source data, over the original source-data size. It is expressed as a percentage or, when the source data is an image, usually in bits per pixel (bpp). JBIG-2 is the standard binary-image compression scheme, based mainly on arithmetic coding with context modeling; other methods in the literature are designed for specific classes of binary images. Our objective: design a coding method that works regardless of the nature of a binary image.
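To make the two units concrete, here is a tiny worked example with made-up numbers (the 100×100 image and 1,500-bit stream are illustrative, not from the slides). For a binary image at 1 bit per pixel, the dimensionless ratio and the bpp figure coincide.

```python
# Hypothetical example: a 100x100 binary image compressed to 1,500 bits.
width, height = 100, 100
compressed_bits = 1500
original_bits = width * height            # 1 bit per pixel for a binary image

ratio = compressed_bits / original_bits   # compressed size / original size
bpp = compressed_bits / (width * height)  # same quantity as bits per pixel

# For a binary source these are equal; for grayscale (8 bpp originals)
# the ratio would be bpp / 8.
```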