DATA COMPRESSION ALGORITHMS
- MOHNISH REDDY
(16CS01034)
What is ‘Data Compression’?
As the name suggests, it is a technique by which data is compressed. Digital
data such as text files, images, video files, game files, etc. are made smaller
in size while preserving the original information as much as possible.
Why do we need ‘Data Compression’?
There are two main reasons why we need data compression:
● Large files take up a lot of space. Files that are not used frequently
can be compressed and stored, which takes up much less space, and
decompressed easily when required.
● The second use is far more common and widely applicable: downloading
large files from the internet is slow and consumes bandwidth, so by
downloading compressed files we save both time and internet data.
Lossless and Lossy Data Compression
Lossless : On decompressing the
compressed file there is absolutely no
loss of data/quality.
Lossy : On decompression there is
some loss of data/quality, which may
be small or large.
Different algorithms for data compression
● Run-Length Encoding
● Huffman Coding
● Lempel-Ziv-Welch Encoding
● Arithmetic coding
● Delta Encoding
● Adaptive Huffman coding
● Wavelet compression
● Discrete Cosine Transform
Run Length Encoding
This is the most basic method for data compression: as the name suggests, while
iterating over the input we look for runs of repeated characters and encode each
run in a shorter/compressed form.
Complexity : O(n)
Example:
➢ Take a line with ‘a’ as the repeating character.
➢ Each run of ‘a’ is replaced by two characters in the compressed line: a
count followed by the character.
➢ For the first 8 repeated a’s in the original file, the first encoded pair in
the compressed line shows that ‘a’ was repeated 8 times.
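The run-length scheme described above can be sketched in a few lines of Python. This is a minimal illustration in my own helper names, not code from the slides; it assumes the input contains no digit characters (counts and data would otherwise be ambiguous):

```python
# Minimal run-length encoder/decoder sketch (names are illustrative).
# Each run of a repeated character becomes a count + character pair,
# e.g. the 8 repeated a's below encode as "8a".
# Assumption: the input text itself contains no digits.

def rle_encode(text):
    if not text:
        return ""
    out = []
    count = 1
    for prev, ch in zip(text, text[1:]):
        if ch == prev:
            count += 1          # still inside the same run
        else:
            out.append(f"{count}{prev}")  # close the finished run
            count = 1
    out.append(f"{count}{text[-1]}")      # close the final run
    return "".join(out)

def rle_decode(encoded):
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch         # accumulate multi-digit counts
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

print(rle_encode("aaaaaaaabbb"))  # 8a3b
```

Note that RLE only helps when the data actually contains long runs; on text with few repeats (e.g. "abc" → "1a1b1c") it can make the output larger.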
Huffman Coding
The characters in a data file are converted to binary codes. The most common
characters in the input file (those with higher probability) are assigned short
binary codes, and the least common characters (those with lower probability) are
assigned longer binary codes. Codes can therefore be of different lengths.
Complexity : O(n log n)
Example : MISSISSIPPI_RIVER
= 17 characters = (17 × 8) bits (as 1 ASCII char = 8 bits)
= 136 bits (original number of bits/space consumed)
A Huffman code built from the character frequencies (I: 5, S: 4, P: 2, R: 2,
M: 1, _: 1, V: 1, E: 1) encodes the string in 46 bits.
Compression Rate = (136 − 46) / 136 = 66.18%
Drawbacks
Though there are not many drawbacks of data compression some small
disadvantages are listed below..
● Data Compression is mostly time consuming.
● There are not many optimal data compression techniques, some which exists
are not always efficient.
● Due to UV radiation or Magnetic field there is some loss of data, on non-
compressed files its not very destructive but in compressed loss of small data
can lead to loss of large chunks of data.
Editor's Notes

  • #13 Suppose we are watching a 1920x1080 video: that gives about 2 million pixels per frame. At a typical rate of 30 frames/sec, that is about 62 million pixels/sec. If each pixel takes 24 bits of information, we get a whopping 178 MB/sec, which comes to roughly 52 GB for a single 5-minute video.