Huffman and Arithmetic coding - Performance analysis
1. Performance Analysis of Huffman and Arithmetic Coding Compression Algorithms
By: Ramakant Soni
2. Data Compression
• Data compression is the process of encoding information using fewer bits than the original representation.
• It reduces the size of a data file; its formal name is source coding.
Compression types:
1. Lossless compression
2. Lossy compression
• Compression is useful because it reduces resource usage, such as data storage space or transmission capacity.
• Data compression is subject to a space-time complexity trade-off.
3. Need for analysis
• Which compression algorithm is better?
• Decision factor: complexity
Complexity (computational complexity) theory focuses on classifying computational problems according to their inherent difficulty.
The difficulty of a problem means the resources required to obtain its solution.
Resources:
1. Space
2. Time
5. Huffman Coding
• Huffman coding is an entropy encoding algorithm used for lossless data compression.
• It encodes each source symbol into a variable-length code derived from the estimated probability of occurrence of that symbol: frequent symbols receive shorter codes.
6. Huffman Coding Algorithm:
Algorithm:
1. Start with a list of symbols and their frequencies in the alphabet.
2. Select the two symbols with the lowest frequencies.
3. Add their frequencies and merge the two symbols into one node.
4. Repeat from step 2 until only two values remain.
5. The algorithm creates a prefix code for each symbol of the alphabet by traversing the merges backward, assigning a 0 and a 1 to the two values at each phase.
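The steps above can be sketched in Python. This is a minimal, assumed implementation (not from the slides) that uses a min-heap for the repeated lowest-frequency selection and adds a bit to each merged group's codes, mirroring the backward assignment:

```python
# Minimal Huffman coding sketch (assumed helper, not from the slides).
import heapq
from itertools import count

def huffman_codes(freqs):
    """freqs: dict mapping symbol -> frequency. Returns dict symbol -> bit string."""
    tie = count()  # tie-breaker so the heap never has to compare the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)  # two lowest frequencies (steps 2-3)
        f2, _, codes2 = heapq.heappop(heap)
        # Prefix a bit on each side: this is the backward 0/1 assignment (step 5).
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

# Hypothetical six-symbol alphabet (the slide's actual values are not given).
codes = huffman_codes({"a": 0.30, "b": 0.25, "c": 0.20,
                       "d": 0.12, "e": 0.08, "f": 0.05})
```

With these hypothetical frequencies, frequent symbols such as "a" receive shorter codes than rare ones such as "f", and no code is a prefix of another.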
7. Example: Huffman Coding
Input: six symbols and their respective probability / frequency values
Steps:
1. Arrange the symbols in descending order of frequency value.
2. Add the two lowest values and remove the merged symbols.
3. At the end we are left with only two values.
8. Example: Huffman Coding
Steps:
1. Moving backward, assign the bits 0 and 1 to the two values.
2. In each successive (reverse) phase, add bit values as in step 1.
Result:
Symbols with variable-length codes.
9. Huffman Coding Complexity analysis
• The Huffman algorithm is based on a greedy approach to obtain the optimal result (code length).
• From the pseudo-code:
• Running Time Function: T(n) = N[n + log(2n−1)] + Sn
• Complexity: O(N²)
10. Arithmetic Coding
• Arithmetic coding is a form of entropy encoding used in lossless data compression.
• Ordinarily, a string of characters is represented using a fixed number of bits per character.
• When a string is converted to arithmetic encoding, frequently used characters are stored with fewer bits and infrequently occurring characters are stored with more bits.
• Arithmetic coding encodes the entire message into a single number, a fraction n where 0.0 ≤ n < 1.0.
11. Arithmetic Coding Algorithm:
Algorithm:
1. Take the symbols with their probability values as input.
2. The encoder divides the current interval [0, 1) into sub-intervals, each representing a fraction of the current interval proportional to the probability of that symbol in the current context.
3. Whichever sub-interval corresponds to the actual symbol that is next to be encoded becomes the interval used in the next step.
4. When all symbols have been encoded, the resulting interval unambiguously identifies the sequence of symbols that produced it.
5. Using the final interval and the model, we can reconstruct the symbol sequence that must have entered the encoder to result in that final interval.
12. Example: Arithmetic Coding
Input: three symbols (x) with respective probabilities (p) and a message
X = {0, 1, 2}, P = {P0, P1, P2} = {0.2, 0.4, 0.4}, message x1..x3 = 210
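The slide's own working is not reproduced in the text, but the example can be traced step by step (an assumed reconstruction): each symbol of 210 narrows the current interval in proportion to its probability.

```python
p   = {"0": 0.2, "1": 0.4, "2": 0.4}   # P0, P1, P2 from the slide
cum = {"0": 0.0, "1": 0.2, "2": 0.6}   # cumulative probability before each symbol
low, width = 0.0, 1.0
for s in "210":
    low, width = low + width * cum[s], width * p[s]
    print(f"after '{s}': [{low:.3f}, {low + width:.3f})")
# after '2': [0.600, 1.000)
# after '1': [0.680, 0.840)
# after '0': [0.680, 0.712)
```

Any fraction in the final interval [0.680, 0.712), e.g. 0.7, can be transmitted to represent the message 210.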
13. Arithmetic Coding Complexity analysis
ARITHMETIC_IDEAL_ENCODE(M)
Set L = 0 and R = 1
FOR i = 1 TO |M| DO
    L = L + R × CumP(M[i])    (lower end of the sub-interval for symbol M[i])
    R = R × P(M[i])           (shrink the interval width by the symbol's probability)
END FOR
Transmit the shortest binary fractional number that lies in the interval [L, L+R)
END
14. Arithmetic Coding Complexity analysis
Running Time Function:
T(n) = N[log n + a] + Sn
where:
N — number of input symbols
n — current number of unique symbols
S — time to maintain the internal data structure
Complexity: O(N log n)
16. Conclusion
• Arithmetic coding surpasses the Huffman technique in compression ability.
• The Huffman method assigns an integral number of bits to each symbol, while arithmetic coding assigns one long code to the entire input string.
• Arithmetic coding requires only a few arithmetic operations per symbol, which keeps its complexity low.
• In terms of complexity, arithmetic coding is asymptotically better than Huffman's.