Lossless Compression
Compression



     the process of coding that will effectively reduce
     the total number of bits needed to represent
     certain information.
Lossless Compression


       • data compression technique that reduces
         the size of a file without sacrificing any
         original data
              o all information from the file is still
                there; nothing is deleted
              o lets you recreate the original file exactly
       • suitable for: text and computer code
       • example: ZIP archiving technology
                  (WinZip & PKZIP)
Lossless Compression




          pro.   yields an exact duplicate of the original file
          con.   compression ratio is usually lower than with lossy methods


      Compression ratio is the ratio of the size or rate of the
      original data to the size or rate of the compressed data.
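
      As a rough illustration of the definition above, the ratio can be
      measured with Python's standard zlib module (DEFLATE, the method
      commonly used inside ZIP archives). The helper name compression_ratio
      and the sample data are assumptions made for this sketch, not part of
      the original slides.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Size of the original data divided by size of the compressed data."""
    return len(original) / len(compressed)


# Hypothetical sample: a highly repetitive byte string.
data = b"AAABBCCCCD" * 1000
packed = zlib.compress(data)

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(packed) == data

ratio = compression_ratio(data, packed)
print(f"{len(data)} bytes -> {len(packed)} bytes, ratio {ratio:.1f}:1")
```

      A ratio above 1 means the compressed form is smaller; data with little
      repetition will give a ratio much closer to 1.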
Lossless Compression



    Principle of Lossless Compression Algorithms

        any non-random file will contain duplicated
        information that can be condensed using statistical
        modeling techniques that determine the probability
        of a character or phrase appearing
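
        One very simple statistical model, sketched here as an assumption
        rather than as the method the slides have in mind, estimates each
        character's probability from its frequency in the input. The Shannon
        entropy of that model gives the average number of bits per character
        that a coder using only single-character probabilities would need.
        The function names are hypothetical.

```python
import math
from collections import Counter


def char_probabilities(text: str) -> dict[str, float]:
    """Estimate each character's probability from its relative frequency."""
    counts = Counter(text)
    total = len(text)
    return {ch: n / total for ch, n in counts.items()}


def entropy_bits_per_char(text: str) -> float:
    """Shannon entropy of the single-character frequency model, in bits."""
    return sum(p * math.log2(1 / p) for p in char_probabilities(text).values())


print(char_probabilities("AAAB"))
print(entropy_bits_per_char("AAAA"))  # one symbol only: fully predictable
print(entropy_bits_per_char("ABCD"))  # four equally likely symbols
```

        The more skewed the probabilities, the lower the entropy and the more
        room there is for compression; a file whose characters are all equally
        likely and independent leaves nothing for this model to exploit.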
Hierarchy of Lossless Compression Algorithms
Lossless Compression



      Techniques:

                Run-Length Encoding
                Burrows-Wheeler Transform
Run-Length Encoding




       compression technique that replaces each run
       of two or more of the same character with a
       number giving the length of the run,
       followed by the character itself; a single
       character is written as-is, without a count.
Run-Length Encoding (cont’d)



     Input:
              AAABBCCCCDEEEEEEAAAAAAAAAAAAA

     Output:
              3A2B4CD6E13A
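
     A minimal sketch of an encoder and decoder that follow the convention
     of the example output, where a lone character (the D) is written
     without a count. It assumes the input contains no digits, since digits
     would be indistinguishable from run lengths; the function names are
     made up for this illustration.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of 2+ identical characters with <length><char>."""
    parts = []
    for char, group in groupby(text):
        length = sum(1 for _ in group)
        # A lone character is emitted without a count, as in the example.
        parts.append(char if length == 1 else f"{length}{char}")
    return "".join(parts)


def rle_decode(encoded: str) -> str:
    """Invert rle_encode; assumes the original text contained no digits."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # run lengths may have several digits, e.g. 13
        else:
            out.append(ch * (int(count) if count else 1))
            count = ""
    return "".join(out)


sample = "AAABBCCCCD" + "E" * 6 + "A" * 13
print(rle_encode(sample))  # 3A2B4CD6E13A
assert rle_decode(rle_encode(sample)) == sample
```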
Burrows-Wheeler Transform


    o technique published by Burrows and Wheeler in
      1994 that reversibly transforms a block of input
      data so that identical characters tend to be
      grouped into long runs.
    o does not itself compress anything; it only
      rearranges the input so that it can be coded
      more efficiently by a Run-Length Encoder or
      another secondary compression technique.
Burrows-Wheeler Transform (cont’d)



      Algorithm
          I. Create a string array.
          II. Generate all possible rotations of the input
               string, storing each in the array.
          III. Sort the array alphabetically.
          IV. Return the last column of the array.
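
      The four steps above can be sketched directly in Python. This is a
      naive version meant only to mirror the steps (it builds every rotation
      in memory), not an efficient implementation. It assumes the input ends
      with a unique end marker, here "&", and that the marker sorts after
      every other character; the names bwt and sort_key are hypothetical.

```python
def sort_key(rotation: str, marker: str) -> list[tuple[int, int]]:
    """Order characters by code point, but place the marker after all others."""
    return [(1, 0) if ch == marker else (0, ord(ch)) for ch in rotation]


def bwt(text: str, marker: str = "&") -> str:
    """Naive Burrows-Wheeler Transform following steps I to IV."""
    if text.count(marker) != 1 or not text.endswith(marker):
        raise ValueError("text must end with exactly one end marker")
    # Steps I and II: collect every rotation of the input in a list.
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    # Step III: sort the rotations (marker treated as the largest character).
    rotations.sort(key=lambda r: sort_key(r, marker))
    # Step IV: the output is the last column of the sorted array.
    return "".join(r[-1] for r in rotations)


print(bwt("HAHAHA&"))  # HHH&AAA
```

      Plain sorted() without the key would place "&" before the letters
      (its ASCII code is lower) and produce a different, equally valid
      last column.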
Burrows-Wheeler Transform (cont’d)


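      Input      Rotations    Alpha-Sorted Rotations    Output
                 HAHAHA&      AHAHA&H
                 &HAHAHA      AHA&HAH
                 A&HAHAH      A&HAHAH
      HAHAHA&    HA&HAHA      HAHAHA&                   HHH&AAA
                 AHA&HAH      HAHA&HA
                 HAHA&HA      HA&HAHA
                 AHAHA&H      &HAHAHA

      (& marks the end of the input and sorts after the letters;
       Output is the last character of each sorted rotation, top to bottom.)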
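
      To illustrate that the transform is reversible, here is a hedged
      sketch of the textbook naive inverse: repeatedly prepend the output
      column to a table and re-sort, then pick the row that ends with the
      marker. It assumes the same ordering as the table above ("&" sorts
      after the letters); the function name inverse_bwt is an assumption.

```python
def inverse_bwt(last_column: str, marker: str = "&") -> str:
    """Recover the original string from the last column of sorted rotations."""

    def key(row: str) -> list[tuple[int, int]]:
        # Same ordering as the forward transform: marker sorts last.
        return [(1, 0) if ch == marker else (0, ord(ch)) for ch in row]

    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the known last column, then sort to get one more column.
        table = sorted((ch + row for ch, row in zip(last_column, table)), key=key)
    # The original input is the rotation that ends with the end marker.
    return next(row for row in table if row.endswith(marker))


print(inverse_bwt("HHH&AAA"))  # HAHAHA&
```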
Burrows-Wheeler Transform (cont’d)



     Input (output of the transform):
             HHH&AAA

     Output (after Run-Length Encoding):
             3H&3A
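
     A small sketch of why the two techniques are paired: run-length
     encoding the raw input HAHAHA& finds no runs at all, while encoding
     the transformed string does. Both helpers are compact, self-contained
     versions written for this illustration and assume "&" is a unique end
     marker that sorts after the letters and that the text has no digits.

```python
from itertools import groupby


def bwt(text: str, marker: str = "&") -> str:
    """Naive transform: last column of the sorted rotations (marker sorts last)."""
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    rotations.sort(
        key=lambda r: [(1, 0) if ch == marker else (0, ord(ch)) for ch in r]
    )
    return "".join(r[-1] for r in rotations)


def rle_encode(text: str) -> str:
    """Runs of 2+ become <length><char>; lone characters stay as they are."""
    parts = []
    for char, group in groupby(text):
        length = sum(1 for _ in group)
        parts.append(char if length == 1 else f"{length}{char}")
    return "".join(parts)


raw = "HAHAHA&"
print(rle_encode(raw))       # HAHAHA&  (no runs, nothing saved)
print(rle_encode(bwt(raw)))  # 3H&3A
```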
end
