Compression Techniques
Introduction
What is Compression?
Data compression requires the identification and
extraction of source redundancy.
In other words, data compression seeks to reduce
the number of bits used to store or transmit
information.
There is a wide range of compression methods, which can be so unlike
one another that they have little in common except that they
compress data.
Compression can be categorized in two broad ways:

• Lossless compression
• Lossy compression
Lossless compression

• recovers the exact original data after decompression.

• mainly used for compressing database records,
spreadsheets or word-processing files, where exact
replication of the original is essential.
Lossy compression
• will result in a certain loss of accuracy in exchange for
   a substantial increase in compression.
• more effective when used to compress graphic images
and digitised voice where losses outside visual or aural
perception can be tolerated.
• Most lossy compression techniques can be adjusted to
different quality levels, gaining higher accuracy in
exchange for less effective compression.
The Need For Compression…
In terms of storage, the capacity of a storage
device can be effectively increased with methods
that compress a body of data on its way to a
storage device and decompress it when it is
retrieved.
In terms of communications, the bandwidth of a
digital communication link can be effectively
increased by compressing data at the sending
end and decompressing it at the receiving
end.
A Brief History of Data Compression
• The late 1940s were the early years of information
theory, when the idea of developing efficient new
coding methods was just starting to be fleshed
out. Ideas of entropy, information content and
redundancy were explored.
• One popular notion held that if the probabilities of
the symbols in a message were known, there ought
to be a way to code the symbols so that the
message would take up less space.
• The first well-known method for compressing
digital signals is now known as Shannon-Fano
coding. Shannon and Fano [~1948]
independently developed this algorithm, which
assigns binary codewords to the unique symbols that
appear within a given data file.
• While Shannon-Fano coding was a great leap
forward, it had the unfortunate luck to be quickly
superseded by an even more efficient coding
system: Huffman coding.
• Huffman coding [1952] shares most
  characteristics of Shannon-Fano coding.
• Huffman coding could perform effective data
  compression by reducing the amount of
  redundancy in the coding of symbols.
• It has been proven optimal among codes that assign
  each symbol its own whole-number-of-bits codeword
  (the codewords themselves are variable-length).
• In recent decades, Huffman coding has largely
  been superseded by arithmetic coding.
• Arithmetic coding bypasses the idea of
  replacing an input symbol with a specific code.
• It replaces a stream of input symbols with a
  single floating-point output number.
• More bits are needed in the output number
  for longer, complex messages.
Terminology
• Compressor–Software (or hardware) device
  that compresses data
• Decompressor–Software (or hardware)
  device that decompresses data
• Codec–Software (or hardware) device that
  compresses and decompresses data
• Algorithm–The logic that governs the
  compression/decompression process
Lossless Compression Algorithms:
•   Repetitive Sequence Suppression
•   Run-length Encoding*
•   Pattern Substitution
•   Entropy Encoding*
     The Shannon-Fano Algorithm
     Huffman Coding*
     Arithmetic Coding*
Repetitive Sequence Suppression
• If a series of n successive identical tokens appears,
replace the series with a single token and a count of the number of
occurrences.
• Usually need to have a special flag to denote when the
repeated token appears.
• Example
89400000000000000000000000000000000
• We can replace this with 894f32, where f is the flag for
  zero, as sketched below.
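
A minimal Python sketch of this zero-suppression scheme, assuming a
digit-string input and using "f" as the flag character from the example
(a real scheme would also need to escape any literal "f" in the data):

    def suppress_zeros(data: str, flag: str = "f") -> str:
        """Replace each run of '0' digits with flag + run length."""
        out = []
        i = 0
        while i < len(data):
            if data[i] == "0":
                run = 0
                while i < len(data) and data[i] == "0":
                    run += 1
                    i += 1
                out.append(f"{flag}{run}")    # e.g. 32 zeros -> "f32"
            else:
                out.append(data[i])
                i += 1
        return "".join(out)

    print(suppress_zeros("894" + "0" * 32))   # -> 894f32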
Run-length Encoding

Example:
• Original Sequence:
111122233333311112222
• can be encoded as:
(1,4),(2,3),(3,6),(1,4),(2,4)
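
A short runnable sketch of this (symbol, run-length) encoding in Python:

    from itertools import groupby

    def rle_encode(data: str) -> list[tuple[str, int]]:
        """Collapse each run of identical symbols into a (symbol, count) pair."""
        return [(sym, len(list(run))) for sym, run in groupby(data)]

    def rle_decode(pairs: list[tuple[str, int]]) -> str:
        """Expand the (symbol, count) pairs back to the original string."""
        return "".join(sym * count for sym, count in pairs)

    encoded = rle_encode("111122233333311112222")
    print(encoded)   # [('1', 4), ('2', 3), ('3', 6), ('1', 4), ('2', 4)]
    assert rle_decode(encoded) == "111122233333311112222"

Note that RLE only pays off when runs are long; a later example shows a
case where it doubles the size.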
Run-Length Encoding (RLE) Method

Example (figure omitted): a row of coloured pixels is encoded as its runs:
blue x 6, magenta x 7, red x 3, yellow x 3 and green x 4
Run-Length Encoding (RLE) Method
• Example (figure omitted): an image with almost no long runs.
  Encoding it run by run would give a result which is twice the size!
• Uncompressed:
  Blue White White White White White White Blue
  White Blue White White White White White Blue
  etc.
• Compressed:
  1xBlue 6xWhite 1xBlue
  1xWhite 1xBlue 4xWhite 1xBlue 1xWhite
  etc.
The Shannon-Fano Algorithm
• Example
• Data: ABBAAAACDEAAABBBDDEEAAA........
• Count symbols in stream:
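
The count table itself did not survive, but the procedure can be
sketched: sort symbols by frequency, then recursively split the list
into two halves of near-equal total count, appending 0 to one half's
codes and 1 to the other's. A minimal Python sketch (the cut-selection
rule is one common variant; the stream above is truncated, so the
counts cover only the visible prefix):

    from collections import Counter

    def shannon_fano(counts: dict[str, int]) -> dict[str, str]:
        """Assign binary codewords by recursively splitting the
        frequency-sorted symbol list into two near-equal-weight halves."""
        codes: dict[str, str] = {}

        def split(items, prefix):
            if len(items) == 1:
                codes[items[0][0]] = prefix or "0"
                return
            total = sum(c for _, c in items)
            # choose the cut that makes the two halves' totals most equal
            acc, best_cut, best_diff = 0, 1, float("inf")
            for i, (_, c) in enumerate(items[:-1], start=1):
                acc += c
                if abs(total - 2 * acc) < best_diff:
                    best_diff, best_cut = abs(total - 2 * acc), i
            split(items[:best_cut], prefix + "0")
            split(items[best_cut:], prefix + "1")

        split(sorted(counts.items(), key=lambda kv: -kv[1]), "")
        return codes

    data = "ABBAAAACDEAAABBBDDEEAAA"   # visible prefix only
    print(Counter(data))               # symbol counts
    print(shannon_fano(Counter(data)))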
Arithmetic Coding
Example
• Raw data: BACA
Therefore:
• A occurs with probability 2/4 = 0.5
• B and C each occur with probability 1/4 = 0.25
Cont..

• Start by assigning each symbol its own sub-range of the
probability range 0–1 (assignment table omitted).

The first symbol in our example stream is B
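
A worked sketch of the interval narrowing in Python. The particular
sub-range assignment below (A, then B, then C) is an assumption, since
the slide's table did not survive; any consistent assignment works as
long as encoder and decoder agree:

    def arithmetic_encode(message: str, ranges: dict[str, tuple[float, float]]):
        """Narrow [low, high) through each symbol's sub-range; any number
        inside the final interval identifies the whole message."""
        low, high = 0.0, 1.0
        for sym in message:
            span = high - low
            r_low, r_high = ranges[sym]
            low, high = low + span * r_low, low + span * r_high
            print(f"after {sym}: [{low:.6f}, {high:.6f})")
        return low, high

    # Assumed sub-range table: A -> [0.0, 0.5), B -> [0.5, 0.75), C -> [0.75, 1.0)
    ranges = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}
    arithmetic_encode("BACA", ranges)
    # final interval is [0.593750, 0.609375); e.g. 0.6 suffices to encode BACA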
Applications: Lossless Compression
• The above is a very simple example of run-length
  encoding, wherein large runs of consecutive identical
  data values are replaced by a simple code giving the
  data value and the length of the run. This is an
  example of lossless data compression.
• It is often used to optimize disk space on office
  computers, or to make better use of the connection
  bandwidth in a computer network.
Lossy image compression

• is used in digital cameras to increase storage
  capacity with minimal degradation of picture quality.
LZW Coding Algorithm…
function LZW_Encode(File)
    // dictionary is pre-loaded with all single-byte strings
    w ← ReadByte(File)
    k ← ReadByte(File)
    while k ≠ EOF do
        if IndexInDict?(w + k) then        // w + k: w with k appended
            w ← w + k
        else
            Output(GetIndex(w))
            AddDict(w + k)
            w ← k
        k ← ReadByte(File)
    Output(GetIndex(w))

function LZW_Decode(File)
    n ← ReadIndex(File)
    Output(GetString(n))
    m ← ReadIndex(File)
    while m ≠ EOF do
        if IndexInDict?(m) then
            s ← GetString(m)
        else
            s ← GetString(n) + FirstChar(GetString(n))   // index not yet in dictionary
        Output(s)
        AddDict(GetString(n) + FirstChar(s))
        n ← m
        m ← ReadIndex(File)
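For comparison, here is a compact runnable sketch of the same pair in
Python (seeding the dictionary with the 256 single-byte symbols, as the
pseudocode assumes):

    def lzw_encode(data: str) -> list[int]:
        """Emit the dictionary index of each longest-known prefix."""
        dictionary = {chr(i): i for i in range(256)}    # single-byte seeds
        w, out = "", []
        for k in data:
            if w + k in dictionary:
                w += k                                  # keep extending the phrase
            else:
                out.append(dictionary[w])
                dictionary[w + k] = len(dictionary)     # learn the new phrase
                w = k
        if w:
            out.append(dictionary[w])
        return out

    def lzw_decode(codes: list[int]) -> str:
        """Rebuild the same dictionary on the fly while decoding."""
        dictionary = {i: chr(i) for i in range(256)}
        w = dictionary[codes[0]]
        out = [w]
        for n in codes[1:]:
            entry = dictionary.get(n, w + w[0])         # n may not be in dict yet
            out.append(entry)
            dictionary[len(dictionary)] = w + entry[0]
            w = entry
        return "".join(out)

    msg = "TOBEORNOTTOBEORTOBEORNOT"
    assert lzw_decode(lzw_encode(msg)) == msg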