Text compression in LZW and Flate
By Subeer Rangra (08EBKCS059) & Mukul Ranjan (08EBKCS029)
Index
1.   Introduction to Data Compression
2.   Introduction to Text Compression
3.   LZW
     3.1 LZW Encoding Algorithm
     3.2 Encoding a String Example
     3.3 LZW Decoding Algorithm
     3.4 Decoding a String Example
4.   Flate Compression
     4.1 Decomposition
        4.1.1 Huffman Coding
        4.1.2 LZ77 Compression
        4.1.3 Putting both together
5.   Advantages and Disadvantages
     5.1 LZW
     5.2 Flate
6.   Conclusion
1. Introduction to Data Compression
• Data compression encodes information using fewer bits than the original representation.
• Compression is achieved by reducing or eliminating redundancy in the data.
• Lossless compression: no information is lost.
• Lossy compression: some information is discarded.
• Compression reduces the storage space the data occupies.
• It also reduces the transmission time needed over a network.
• Compressed data must be decompressed (decoded) before it can be reused.
• Schemes can be symmetrical (encoding and decoding take about the same effort) or asymmetrical.
• Compression can be implemented in software or in hardware.
2. Introduction to Text Compression
• Text compression is the compression of text-based data.
• There is a major difference between text and image compression: databases, binary programs, and text lie on one side, and sound, image, and video signals on the other.
• Text must be reproduced exactly, so text compression needs lossless compression, whereas sound, image, and video can often tolerate small losses.
• Text compression is needed for literary works, product catalogues, genomic databases, and raw text databases.
3. LZW (Lempel-Ziv-Welch)
• Starts with a dictionary of all single characters and gradually extends it as the data is processed.
• It is lossless, so it works well for text compression.
• A dictionary (code table) based encoding algorithm.
• A code table with 4096 entries (12-bit codes) is a common choice.
• It identifies repeated sequences of data and adds them to the code table.
• A general-purpose compression algorithm capable of working on almost any type of data.
• Large text files in English can typically be compressed to about half their size.
• Used in GIF (Graphics Interchange Format) to reduce file size without degrading visual quality.
3.1 LZW Encoding Algorithm
STRING = get input character
WHILE not end of input stream DO
    CHARACTER = get input character
    IF STRING+CHARACTER is in the string table THEN
        STRING = STRING+CHARACTER
    ELSE
        output the code for STRING
        add STRING+CHARACTER to the string table
        STRING = CHARACTER
    END IF
END WHILE
output the code for STRING
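The encoding steps above can be sketched in Python as follows. This is a minimal illustration, not a production encoder: it assumes the 27-symbol starting dictionary used in the example in 3.2 ('#' = 0, 'A' to 'Z' = 1 to 26), keeps the string table in a plain dict, and returns integer codes instead of packing them into bits. The names `initial_dictionary` and `lzw_encode` are illustrative.

```python
import string


def initial_dictionary() -> dict[str, int]:
    """Assumed starting table: '#' -> 0 and 'A'..'Z' -> 1..26."""
    table = {"#": 0}
    for code, letter in enumerate(string.ascii_uppercase, start=1):
        table[letter] = code
    return table


def lzw_encode(text: str) -> list[int]:
    """Follow the STRING/CHARACTER pseudocode and return the emitted codes."""
    if not text:
        return []
    table = initial_dictionary()
    next_code = len(table)              # 27: first code available for new entries
    codes: list[int] = []
    current = text[0]                   # STRING = get input character
    for character in text[1:]:          # WHILE not end of input stream
        if current + character in table:
            current += character        # keep extending the match
        else:
            codes.append(table[current])             # output the code for STRING
            table[current + character] = next_code   # add STRING+CHARACTER
            next_code += 1
            current = character                      # STRING = CHARACTER
    codes.append(table[current])        # output the code for the final STRING
    return codes


print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT#"))
```

With the trailing # treated as an ordinary symbol that maps to code 0, the printed list should match the Code column of the worked example: 20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34, 0.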
[Figure: LZW encoding flowchart]
3.2 Encoding a String Example
• To encode a string of characters:
1. First generate an initial dictionary of single characters:

      Symbol   Binary   Decimal
      #        00000    0
      A        00001    1
      B        00010    2
      C        00011    3
      D        00100    4
      E        00101    5
      ...      ...      ...
      Z        11010    26
2. Example: encoding TOBEORNOTTOBEORTOBEORNOT, followed by the stop symbol #

    Current    Next    Output           Extended     Comments
    Sequence   Char    Code   Bits      Dictionary
    NULL       T
    T          O       20     10100     27: TO       27 = first available code after 0 through 26
    O          B       15     01111     28: OB
    B          E        2     00010     29: BE
    E          O        5     00101     30: EO
    O          R       15     01111     31: OR
    R          N       18     10010     32: RN       32 requires 6 bits, so use 6 bits for the next output
    N          O       14     001110    33: NO
    O          T       15     001111    34: OT
    T          T       20     010100    35: TT
    TO         B       27     011011    36: TOB
    BE         O       29     011101    37: BEO
    OR         T       31     011111    38: ORT
    TOB        E       36     100100    39: TOBE
    EO         R       30     011110    40: EOR
    RN         O       32     100000    41: RNO
    OT         #       34     100010                 # stops the algorithm: send the current sequence
                        0     000000                 and then the stop code
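The widths in the Bits column follow a simple rule: each code is written with just enough bits for the largest code the dictionary holds at that moment, so the width grows from 5 to 6 bits once code 32 has been created. A minimal sketch of that packing step is below. It assumes, as in the example, that codes 0 to 26 are preassigned and that exactly one new entry is added per emitted code; `pack_codes` is a hypothetical helper, and real formats such as GIF add details (clear codes, bit ordering within bytes) that are ignored here.

```python
def pack_codes(codes: list[int], first_free: int = 27) -> str:
    """Write each code with as many bits as the largest code defined so far needs."""
    pieces = []
    for i, code in enumerate(codes):
        # The table starts with codes 0..first_free-1 and gains one entry per
        # code already emitted, so its largest code is first_free - 1 + i.
        width = (first_free - 1 + i).bit_length()
        pieces.append(format(code, f"0{width}b"))
    return "".join(pieces)


codes = [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34, 0]
bits = pack_codes(codes)
print(len(bits))  # 6 codes x 5 bits + 11 codes x 6 bits = 96
```

For comparison, writing the 25 input symbols (including #) at a fixed 5 bits each would take 125 bits.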
3.3 LZW Decoding Algorithm
Read OLD_CODE
output the translation of OLD_CODE
CHARACTER = translation of OLD_CODE (a single character)
WHILE there are still input codes DO
    Read NEW_CODE
    IF NEW_CODE is not in the translation table THEN
        STRING = get translation of OLD_CODE
        STRING = STRING+CHARACTER
    ELSE
        STRING = get translation of NEW_CODE
    END IF
    output STRING
    CHARACTER = first character in STRING
    add translation of OLD_CODE + CHARACTER to the translation table
    OLD_CODE = NEW_CODE
END WHILE
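A Python sketch of the decoding loop above, again assuming the 27-symbol starting dictionary from 3.2 and integer codes as input (bit unpacking is left out). The `new not in table` branch is the case the pseudocode handles with STRING = translation of OLD_CODE + CHARACTER: it happens when the encoder uses a code in the very step after creating it, for example on a run such as AAAA. The names are illustrative.

```python
import string


def initial_table() -> dict[int, str]:
    """Assumed starting table: 0 -> '#' and 1..26 -> 'A'..'Z'."""
    table = {0: "#"}
    for code, letter in enumerate(string.ascii_uppercase, start=1):
        table[code] = letter
    return table


def lzw_decode(codes: list[int]) -> str:
    if not codes:
        return ""
    table = initial_table()
    next_code = len(table)
    old = codes[0]                      # Read OLD_CODE
    output = [table[old]]               # output its translation
    character = table[old][0]           # CHARACTER = first character of it
    for new in codes[1:]:               # Read NEW_CODE
        if new not in table:
            # Code created by the encoder one step ago: rebuild it.
            entry = table[old] + character
        else:
            entry = table[new]
        output.append(entry)            # output STRING
        character = entry[0]            # CHARACTER = first character in STRING
        table[next_code] = table[old] + character
        next_code += 1
        old = new                       # OLD_CODE = NEW_CODE
    return "".join(output)


print(lzw_decode([1, 27, 1]))
```

Feeding it the Code column of the encoding example should give back TOBEORNOTTOBEORTOBEORNOT#.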
[Figure: LZW decoding flowchart]
3.4 Decoding a String Example
• To decode an LZW-compressed archive, one needs to know in advance the initial dictionary used, but additional entries can be reconstructed as they are always simply concatenations of previous entries.

    Input            Output      New Dictionary Entry     Comments
    Bits     Code    Sequence    Full        Conjecture
    10100    20      T                       27: T?
    01111    15      O           27: TO      28: O?
    00010     2      B           28: OB      29: B?
    00101     5      E           29: BE      30: E?
    01111    15      O           30: EO      31: O?
    10010    18      R           31: OR      32: R?       created code 31 (last to fit in 5 bits),
    001110   14      N           32: RN      33: N?       so start reading input at 6 bits
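On the decoding side the reader must widen its bit fields at the same point as the writer. Counting the conjectured entry, the decoder's table has grown to code 26 + i by the time it reads the i-th code (counting from 0), exactly as on the encoding side, so the width can be computed from the code's position alone. A sketch under the same assumptions as the encoding example (codes 0 to 26 preassigned, no clear codes, no padding at the end of the bit string); `unpack_codes` is a hypothetical name.

```python
def unpack_codes(bits: str, first_free: int = 27) -> list[int]:
    """Read codes back, widening the field exactly when the writer did."""
    codes = []
    pos = 0
    i = 0
    while pos < len(bits):
        # Same rule as the encoder: enough bits for code first_free - 1 + i.
        width = (first_free - 1 + i).bit_length()
        codes.append(int(bits[pos:pos + width], 2))
        pos += width
        i += 1
    return codes


# First seven rows of the table: six 5-bit fields, then a 6-bit field.
print(unpack_codes("10100" "01111" "00010" "00101" "01111" "10010" "001110"))
```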
4. Flate Compression
• A lossless data compression method.
• It can discover and exploit many patterns in the input data.
• It is an improvement over LZW compression: Flate-encoded data is usually much more compact than LZW-encoded output.
• It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool and was later specified in RFC 1951, where the format is called Deflate.
• It is used in PDF compression: Adobe uses Flate compression (the FlateDecode filter) for PDF files.
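Python's standard `zlib` module wraps the zlib library, whose compressor produces the Flate (Deflate, RFC 1951) format. By default the output carries a small zlib header and checksum (RFC 1950), which is the form PDF's FlateDecode filter expects; a negative `wbits` value gives raw Deflate instead. A minimal lossless round-trip sketch follows; the sample text is arbitrary, and the exact sizes printed depend on the zlib version.

```python
import zlib


def flate_roundtrip(data: bytes) -> tuple[bytes, bytes]:
    """Compress as a zlib stream and as raw Deflate, checking both decode losslessly."""
    wrapped = zlib.compress(data, 9)                       # zlib header + Deflate + Adler-32
    raw_compressor = zlib.compressobj(level=9, wbits=-15)  # negative wbits: raw Deflate
    raw = raw_compressor.compress(data) + raw_compressor.flush()

    assert zlib.decompress(wrapped) == data
    assert zlib.decompress(raw, wbits=-15) == data
    return wrapped, raw


text = b"TOBEORNOTTOBEORTOBEORNOT " * 40
wrapped, raw = flate_roundtrip(text)
print(len(text), len(wrapped), len(raw))
```

Because the 1,000-byte input is highly repetitive, both compressed forms should come out far smaller than it.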
4.1 Decomposition
• The Flate specification defines a lossless data format that compresses data using a combination of the LZ77 algorithm and Huffman coding.
• The format can be implemented readily in a manner not covered by patents.
• How these two algorithms work is explained below, followed by how they are combined to produce Flate compression.
4.1.1 Huffman Coding
• A type of entropy encoding algorithm.
• Used for lossless data compression.
• Generates variable-length codes.
• The code lengths are based on how frequently each character occurs.
• The idea is to assign the shortest codes to the characters with the highest probability of occurrence.
• The algorithm starts by assigning each element a 'weight': a number that represents its relative frequency within the data to be compressed.
Take as an example the set of weights {1, 2, 3, 3, 4}.
1. The weights become the nodes (leaves) of the Huffman tree to be formed.
2. In the first step, the two nodes with the lowest weights (highest priority, i.e. lowest probability), 1 and 2, are merged to create a new tree with a root of weight 3.
3. Now three nodes have weight 3 at their roots, so any two of them may be chosen. Here the new tree of weight 3 is merged with one of the single nodes of weight 3, giving a tree of weight 6.
4. The two minimum trees are now the single nodes of weights 3 and 4, which are combined to form a new tree of weight 7.
5. Finally, the last two remaining trees (weights 6 and 7) are merged into one tree of weight 13.
• Once all nodes have been combined into a single "Huffman tree", any element can be reached by starting at the root and selecting 0 or 1 at each step.
• Each element's Huffman code is the sequence of 0s and 1s that represents that path through the tree.
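The merging procedure above can be sketched with a priority queue (Python's `heapq`). This minimal illustration works directly on the example weights {1, 2, 3, 3, 4} rather than on character counts; `huffman_codes` is a hypothetical name. When weights tie, the sketch may merge a different pair than the slides do in step 3 (here it merges the two single 3s first), so the individual 0/1 labels can differ, but in this example the code lengths, and hence the total encoded size, come out the same.

```python
import heapq
from itertools import count


def huffman_codes(weights: list[int]) -> list[str]:
    """Return a Huffman code (a string of 0s and 1s) for each weight, by index."""
    if not weights:
        return []
    if len(weights) == 1:
        return ["0"]                    # a lone symbol still needs one bit
    codes = [""] * len(weights)
    order = count()                     # tie-breaker so equal weights never compare lists
    heap = [(w, next(order), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two lightest trees (highest priority, lowest probability).
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging puts 0 in front of every code in the left tree, 1 in the right.
        for i in left:
            codes[i] = "0" + codes[i]
        for i in right:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (w1 + w2, next(order), left + right))
    return codes


weights = [1, 2, 3, 3, 4]
codes = huffman_codes(weights)
print(codes)
print(sum(w * len(c) for w, c in zip(weights, codes)))  # 29
```

Weights 1 and 2 end up with 3-bit codes and the other three weights with 2-bit codes, matching the depths of the tree built in steps 1 to 5.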
4.1.2 LZ77 Compression
• A lossless data compression algorithm that works by finding sequences of data that are repeated.
• Maintains a 'sliding window' during compression, meaning the compressor keeps a record of the most recent characters it has seen.
• The text passes through a sliding window consisting of a search buffer and a look-ahead buffer.
• The search buffer is used as the dictionary: a sequence at the start of the look-ahead buffer that also occurs in the search buffer is replaced by a reference back to that earlier occurrence.
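A minimal sketch of sliding-window LZ77 in its classic form, emitting (offset, length, next character) triples: the offset counts back from the current position into the search buffer, the length is how many characters match, and the next character is the first one after the match. The window and look-ahead sizes are small arbitrary assumptions, and the brute-force search is for clarity only (zlib, for instance, uses hash chains). Deflate itself writes literals and (length, distance) pairs rather than these triples.

```python
def lz77_encode(data: str, window: int = 16, lookahead: int = 8) -> list[tuple[int, int, str]]:
    """Encode data as (offset, length, next_char) triples using a sliding window."""
    triples = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Search buffer: up to `window` characters immediately before position i.
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one short of the end so a next character always remains.
            # A match may run past i (overlap); the decoder copies one
            # character at a time, so that still reconstructs correctly.
            while (length < lookahead
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples


def lz77_decode(triples: list[tuple[int, int, str]]) -> str:
    out: list[str] = []
    for offset, length, next_char in triples:
        start = len(out) - offset
        for k in range(length):
            out.append(out[start + k])  # copy from text already decoded
        out.append(next_char)
    return "".join(out)


print(lz77_encode("AABABBBABAABABBBABBABB"))
```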
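The example below illustrates the dictionary idea with the closely related LZ78 scheme, in which each block is a previously seen block plus one new character and is referenced by its dictionary number rather than by a position in the window.
1. Suppose the input text is
    AABABBBABAABABBBABBABB
2. The first block found is simply A, encoded as (0,A). The next is AB, encoded as (1,B), where 1 is a reference to A:
    A|AB|ABBBABAABABBBABBABB
3. The next block is ABB, which is encoded as (2,B), where 2 is a reference to AB, entered in the dictionary one iteration ago. Continuing this way, the string parses into
    A|AB|ABB|B|ABA|ABAB|BB|ABBA|BB
• At the end of the algorithm, the dictionary is:

    Reference   Phrase   Encoding
    1           A        (0,A)
    2           AB       (1,B)
    3           ABB      (2,B)
    4           B        (0,B)
    5           ABA      (2,A)
    6           ABAB     (5,B)
    7           BB       (4,B)
    8           ABBA     (3,A)
    9           BB       (7,0)

  The final block BB already exists as entry 7, so it is sent as a reference to 7 with no new character (written 0).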
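The phrase-by-phrase parsing in this example, which is the dictionary-index (LZ78-style) form of Lempel-Ziv coding, can be reproduced with a short sketch. Each output pair is (reference of the longest known phrase, next character), with 0 standing for the empty phrase. The sketch emits the final repeated phrase as (7, "") where the table writes (7,0); the function name is illustrative.

```python
def lz78_parse(text: str) -> list[tuple[int, str]]:
    """Split text into blocks; each is a known block (by reference) plus one new character."""
    dictionary: dict[str, int] = {}     # phrase -> reference number, starting at 1
    pairs: list[tuple[int, str]] = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                # keep extending while the phrase is known
        else:
            pairs.append((dictionary.get(phrase, 0), ch))   # 0 = empty phrase
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: send its reference with no new character.
        pairs.append((dictionary[phrase], ""))
    return pairs


for pair in lz78_parse("AABABBBABAABABBBABBABB"):
    print(pair)
```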
4.1.3 Putting Both Together
Flate is a smart algorithm that adapts the way it compresses data to the data itself. The compressor has three modes of compression available:
1. No compression at all: an intelligent choice when the data has already been compressed.
2. Compression first with LZ77 and then with a slightly modified version of Huffman coding, using fixed trees that are defined by the Flate specification itself.
3. Compression first with LZ77 and then with Huffman coding, using trees that the compressor creates and stores along with the data.
The data is broken up into blocks, and each block uses a single mode of compression.
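In the Deflate format (RFC 1951) every block starts with a 3-bit header: one bit, BFINAL, marks the last block, and two bits, BTYPE, give the mode (00 = stored without compression, 01 = fixed Huffman trees from the specification, 10 = dynamic trees stored with the data). The sketch below uses Python's `zlib` to produce raw Deflate and reads BTYPE from the first block. Level 0 and the `Z_FIXED` strategy are used to force the first two modes; which mode zlib picks by default depends on the data and the zlib version, so that case is only printed. The helper names are hypothetical.

```python
import zlib

MODES = {0b00: "stored", 0b01: "fixed Huffman", 0b10: "dynamic Huffman"}


def raw_deflate(data: bytes, level: int = 6, strategy: int = zlib.Z_DEFAULT_STRATEGY) -> bytes:
    """Compress to raw Deflate (negative wbits: no zlib header or checksum)."""
    comp = zlib.compressobj(level=level, wbits=-15, strategy=strategy)
    return comp.compress(data) + comp.flush()


def first_block_type(stream: bytes) -> int:
    # Header bits are packed starting from the least significant bit:
    # bit 0 = BFINAL, bits 1-2 = BTYPE.
    return (stream[0] >> 1) & 0b11


text = b"TOBEORNOTTOBEORTOBEORNOT " * 40
for label, stream in [
    ("level 0", raw_deflate(text, level=0)),
    ("Z_FIXED", raw_deflate(text, level=9, strategy=zlib.Z_FIXED)),
    ("default", raw_deflate(text)),
]:
    print(label, MODES.get(first_block_type(stream), "reserved"), len(stream))
```

The stored stream comes out slightly larger than the input, which is why the first mode only pays off for data that is already compressed.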
5. Advantages & Disadvantages
5.1 LZW
Advantages
• It is a lossless compression algorithm, so no information is lost.
• The code table does not need to be passed from the compressor to the decompressor; the decoder rebuilds it from the codes it reads.
• Simple, fast, and gives good compression.
Disadvantages
• The dictionary can become too large. One approach is to throw the dictionary away when it reaches a certain size and start a new one.
• Most effective only on large amounts of text data where redundancy is high.
5.2 Flate Compression
Advantages
• Huffman coding is easy to implement.
• Flate is a lossless compression technique, so no text is lost.
• Simple, fast, and gives good compression.
• Freedom to choose the type of compression based on the needs of the content.
Disadvantages
• Generating the Huffman trees adds overhead.
• The resulting compression code is complex because it combines LZ77 and Huffman coding.
• It is quite tricky to understand and correctly apply the right combination of LZ77 and Huffman coding.
6. Conclusion
• LZW has several advantages when used to compress large English-language text data, which has high redundancy.
• Both LZW and Flate are software-based, dictionary-based, lossless methods of compression.
• Text compression needs a lossless compression technique.
• Flate, which is widely used in PDF files, is an adaptive, flexible, and complex way to compress text.
Thank You

More Related Content

What's hot

Image Compression
Image CompressionImage Compression
Image Compression
Paramjeet Singh Jamwal
 
Audio compression 1
Audio compression 1Audio compression 1
Audio compression 1
Rajat Kumar
 
MPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingMPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video Encoding
Christian Kehl
 
Phase Noise and Jitter Measurements
Phase Noise and Jitter MeasurementsPhase Noise and Jitter Measurements
Phase Noise and Jitter Measurements
Rohde & Schwarz North America
 
Coherent systems
Coherent systemsCoherent systems
Coherent systems
CKSunith1
 
Introduction to Image Compression
Introduction to Image CompressionIntroduction to Image Compression
Introduction to Image Compression
Kalyan Acharjya
 
Chain code in dip
Chain code in dipChain code in dip
Chain code in dip
Rishav Bhurtel
 
unit-3.ppt
unit-3.pptunit-3.ppt
Lzw
LzwLzw
Image enhancement
Image enhancementImage enhancement
Image enhancement
Kuppusamy P
 
Video coding standards ppt
Video coding standards pptVideo coding standards ppt
Video coding standards ppt
Lokesh Reddy Avula
 
MINIMUM SHIFT KEYING(MSK)
MINIMUM SHIFT KEYING(MSK)MINIMUM SHIFT KEYING(MSK)
MINIMUM SHIFT KEYING(MSK)
NARENDRA KUMAR REDDY
 
Jpeg and mpeg ppt
Jpeg and mpeg pptJpeg and mpeg ppt
Jpeg and mpeg ppt
siddharth rathore
 
Correlative level coding
Correlative level codingCorrelative level coding
Correlative level codingsrkrishna341
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
Marina Santini
 
image compression using matlab project report
image compression  using matlab project reportimage compression  using matlab project report
image compression using matlab project report
kgaurav113
 
Data compression
Data  compressionData  compression
Data compression
Ashutosh Kawadkar
 
Fourier descriptors & moments
Fourier descriptors & momentsFourier descriptors & moments
Fourier descriptors & moments
rajisri2
 
Template Matching - Pattern Recognition
Template Matching - Pattern RecognitionTemplate Matching - Pattern Recognition
Template Matching - Pattern Recognition
Mustafa Salam
 
Rise Time Budget Analysis and Design of Components
Rise Time Budget Analysis and Design of ComponentsRise Time Budget Analysis and Design of Components
Rise Time Budget Analysis and Design of Components
Saptarshi Mazumdar
 

What's hot (20)

Image Compression
Image CompressionImage Compression
Image Compression
 
Audio compression 1
Audio compression 1Audio compression 1
Audio compression 1
 
MPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingMPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video Encoding
 
Phase Noise and Jitter Measurements
Phase Noise and Jitter MeasurementsPhase Noise and Jitter Measurements
Phase Noise and Jitter Measurements
 
Coherent systems
Coherent systemsCoherent systems
Coherent systems
 
Introduction to Image Compression
Introduction to Image CompressionIntroduction to Image Compression
Introduction to Image Compression
 
Chain code in dip
Chain code in dipChain code in dip
Chain code in dip
 
unit-3.ppt
unit-3.pptunit-3.ppt
unit-3.ppt
 
Lzw
LzwLzw
Lzw
 
Image enhancement
Image enhancementImage enhancement
Image enhancement
 
Video coding standards ppt
Video coding standards pptVideo coding standards ppt
Video coding standards ppt
 
MINIMUM SHIFT KEYING(MSK)
MINIMUM SHIFT KEYING(MSK)MINIMUM SHIFT KEYING(MSK)
MINIMUM SHIFT KEYING(MSK)
 
Jpeg and mpeg ppt
Jpeg and mpeg pptJpeg and mpeg ppt
Jpeg and mpeg ppt
 
Correlative level coding
Correlative level codingCorrelative level coding
Correlative level coding
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
 
image compression using matlab project report
image compression  using matlab project reportimage compression  using matlab project report
image compression using matlab project report
 
Data compression
Data  compressionData  compression
Data compression
 
Fourier descriptors & moments
Fourier descriptors & momentsFourier descriptors & moments
Fourier descriptors & moments
 
Template Matching - Pattern Recognition
Template Matching - Pattern RecognitionTemplate Matching - Pattern Recognition
Template Matching - Pattern Recognition
 
Rise Time Budget Analysis and Design of Components
Rise Time Budget Analysis and Design of ComponentsRise Time Budget Analysis and Design of Components
Rise Time Budget Analysis and Design of Components
 

Viewers also liked

Lzw compression
Lzw compressionLzw compression
Lzw compression
Meghna Singh
 
Lzw compression ppt
Lzw compression pptLzw compression ppt
Lzw compression ppt
Rabia Nazir
 
Lzw algorithm
Lzw algorithmLzw algorithm
Lzw algorithm
keyvan moazami
 
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHMOPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
Jitendra Choudhary
 
Lz77 / Lempel-Ziv Algorithm
Lz77 / Lempel-Ziv AlgorithmLz77 / Lempel-Ziv Algorithm
Lz77 / Lempel-Ziv Algorithm
Veysi Ertekin
 
LZ78
LZ78LZ78
Lzw coding technique for image compression
Lzw coding technique for image compressionLzw coding technique for image compression
Lzw coding technique for image compression
Tata Consultancy Services
 
Dictionary Based Compression
Dictionary Based CompressionDictionary Based Compression
Dictionary Based Compressionanithabalaprabhu
 
Compression project presentation
Compression project presentationCompression project presentation
Compression project presentationfaizang909
 
Data compression huffman coding algoritham
Data compression huffman coding algorithamData compression huffman coding algoritham
Data compression huffman coding algorithamRahul Khanwani
 
Data compression techniques
Data compression techniquesData compression techniques
Data compression techniques
Deep Bhatt
 
Image compression
Image compressionImage compression
Image compression
Bassam Kanber
 
Compression techniques
Compression techniquesCompression techniques
Compression techniques
m_divya_bharathi
 
Data compression
Data compressionData compression
Data compression
VIKAS SINGH BHADOURIA
 

Viewers also liked (20)

Lzw compression
Lzw compressionLzw compression
Lzw compression
 
Lzw compression ppt
Lzw compression pptLzw compression ppt
Lzw compression ppt
 
Lzw algorithm
Lzw algorithmLzw algorithm
Lzw algorithm
 
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHMOPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
 
Lz77 / Lempel-Ziv Algorithm
Lz77 / Lempel-Ziv AlgorithmLz77 / Lempel-Ziv Algorithm
Lz77 / Lempel-Ziv Algorithm
 
LZ78
LZ78LZ78
LZ78
 
Lzw coding technique for image compression
Lzw coding technique for image compressionLzw coding technique for image compression
Lzw coding technique for image compression
 
Huffman Coding
Huffman CodingHuffman Coding
Huffman Coding
 
Dictionary Based Compression
Dictionary Based CompressionDictionary Based Compression
Dictionary Based Compression
 
Compression project presentation
Compression project presentationCompression project presentation
Compression project presentation
 
Data compression huffman coding algoritham
Data compression huffman coding algorithamData compression huffman coding algoritham
Data compression huffman coding algoritham
 
image compression ppt
image compression pptimage compression ppt
image compression ppt
 
Compression
CompressionCompression
Compression
 
Data compression techniques
Data compression techniquesData compression techniques
Data compression techniques
 
Shannon Fano
Shannon FanoShannon Fano
Shannon Fano
 
Data compression
Data compressionData compression
Data compression
 
Image compression
Image compressionImage compression
Image compression
 
Digital Communication Techniques
Digital Communication TechniquesDigital Communication Techniques
Digital Communication Techniques
 
Compression techniques
Compression techniquesCompression techniques
Compression techniques
 
Data compression
Data compressionData compression
Data compression
 

Similar to Text compression in LZW and Flate

Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
Lec-03 Entropy Coding I: Hoffmann & Golomb CodesLec-03 Entropy Coding I: Hoffmann & Golomb Codes
Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
United States Air Force Academy
 
Data Encryption standard in cryptography
Data Encryption standard in cryptographyData Encryption standard in cryptography
Data Encryption standard in cryptography
NithyasriA2
 
Lz algorithm
Lz algorithmLz algorithm
Lz algorithm
ssuser63c54d
 
EMBEDDED SYSTEMS 2&3
EMBEDDED SYSTEMS 2&3EMBEDDED SYSTEMS 2&3
EMBEDDED SYSTEMS 2&3PRADEEP
 
Logic Design - Chapter 5: Part1 Combinattional Logic
Logic Design - Chapter 5: Part1 Combinattional LogicLogic Design - Chapter 5: Part1 Combinattional Logic
Logic Design - Chapter 5: Part1 Combinattional LogicGouda Mando
 
ATT SMK.pptx
ATT SMK.pptxATT SMK.pptx
ATT SMK.pptx
MadhavKarve
 
Chapter 4 combinational circuit
Chapter 4 combinational circuit Chapter 4 combinational circuit
Chapter 4 combinational circuit
GulAhmad16
 
Lab01
Lab01Lab01
unit 5 (1).pptx
unit 5 (1).pptxunit 5 (1).pptx
unit 5 (1).pptx
HimansuShekharPradha1
 
Computer archi&mp
Computer archi&mpComputer archi&mp
Computer archi&mpMSc CST
 
Octal encoding
Octal encodingOctal encoding
Octal encoding
rajshreemuthiah
 
Crypto-Presentation jfjfd dkfdnfdj kdfjdjfdjkfd .pptx
Crypto-Presentation jfjfd dkfdnfdj kdfjdjfdjkfd .pptxCrypto-Presentation jfjfd dkfdnfdj kdfjdjfdjkfd .pptx
Crypto-Presentation jfjfd dkfdnfdj kdfjdjfdjkfd .pptx
anxiousanoja
 
Turbo Code
Turbo Code Turbo Code
Turbo Code
SudhanshuSaini5
 
Ch03 des
Ch03 desCh03 des
Ch03 des
mogtabamoutasem
 

Similar to Text compression in LZW and Flate (20)

Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
Lec-03 Entropy Coding I: Hoffmann & Golomb CodesLec-03 Entropy Coding I: Hoffmann & Golomb Codes
Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
 
Data Encryption standard in cryptography
Data Encryption standard in cryptographyData Encryption standard in cryptography
Data Encryption standard in cryptography
 
Lz algorithm
Lz algorithmLz algorithm
Lz algorithm
 
EMBEDDED SYSTEMS 2&3
EMBEDDED SYSTEMS 2&3EMBEDDED SYSTEMS 2&3
EMBEDDED SYSTEMS 2&3
 
Logic Design - Chapter 5: Part1 Combinattional Logic
Logic Design - Chapter 5: Part1 Combinattional LogicLogic Design - Chapter 5: Part1 Combinattional Logic
Logic Design - Chapter 5: Part1 Combinattional Logic
 
ATT SMK.pptx
ATT SMK.pptxATT SMK.pptx
ATT SMK.pptx
 
Compression Ii
Compression IiCompression Ii
Compression Ii
 
Compression Ii
Compression IiCompression Ii
Compression Ii
 
Chapter 4 combinational circuit
Chapter 4 combinational circuit Chapter 4 combinational circuit
Chapter 4 combinational circuit
 
11.ppt
11.ppt11.ppt
11.ppt
 
06 Arithmetic 1
06 Arithmetic 106 Arithmetic 1
06 Arithmetic 1
 
Lab01
Lab01Lab01
Lab01
 
Lecture.1
Lecture.1Lecture.1
Lecture.1
 
unit 5 (1).pptx
unit 5 (1).pptxunit 5 (1).pptx
unit 5 (1).pptx
 
Computer archi&mp
Computer archi&mpComputer archi&mp
Computer archi&mp
 
Octal encoding
Octal encodingOctal encoding
Octal encoding
 
Crypto-Presentation jfjfd dkfdnfdj kdfjdjfdjkfd .pptx
Crypto-Presentation jfjfd dkfdnfdj kdfjdjfdjkfd .pptxCrypto-Presentation jfjfd dkfdnfdj kdfjdjfdjkfd .pptx
Crypto-Presentation jfjfd dkfdnfdj kdfjdjfdjkfd .pptx
 
Compression ii
Compression iiCompression ii
Compression ii
 
Turbo Code
Turbo Code Turbo Code
Turbo Code
 
Ch03 des
Ch03 desCh03 des
Ch03 des
 

Recently uploaded

Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 

Recently uploaded (20)

Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Ethnobotany and Ethnopharmacology ......

Text compression in LZW and Flate

  • 1. By Subeer Rangra (08EBKCS059) & Mukul Ranjan (08EBKCS029)
  • 2. Index
    1. Introduction to Data Compression
    2. Introduction to Text Compression
    3. LZW
       3.1 LZW Encoding Algorithm
       3.2 Encoding a String Example
       3.3 LZW Decoding Algorithm
       3.4 Decoding a String Example
    4. Flate Compression
       4.1 Decomposition
          4.1.1 Huffman Coding
          4.1.2 LZ77 Compression
          4.1.3 Putting Both Together
    5. Advantages and Disadvantages
       5.1 LZW
       5.2 Flate
    6. Conclusion
  • 3. 1. Introduction to Data Compression
    - Encoding information using fewer bits than the original representation.
    - Compression is achieved when redundancies are reduced or eliminated.
    - Lossless: no information is lost.
    - Lossy: some information is lost.
    - Compression reduces the storage space the data requires.
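  • 4. Introduction to Data Compression (contd.)
    - Reduces the time needed to transmit data over a network.
    - Data must be decompressed (decoded) before it can be reused.
    - Schemes can be symmetrical (encoding and decoding take similar effort) or asymmetrical (one side does much more work than the other).
    - Compression can be implemented in software or in hardware.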
  • 5. 2. Introduction to Text Compression
    - The compression of text-based data.
    - Text compression differs sharply from image compression: databases, binary programs and text sit on one side, and sound, image and video signals on the other. The first group must be reproduced exactly, while the second can tolerate some loss.
    - Text compression therefore needs lossless compression.
    - It is needed for literary works, product catalogues, genomic databases and raw text databases.
  • 6. 3. LZW (Lempel-Ziv-Welch)
    - Starts with a dictionary of all single characters and gradually extends it as the input is processed.
    - It is lossless, and hence well suited to text compression.
    - A dictionary (code table) based encoding algorithm.
    - A common choice is a code table of 4096 entries, i.e. 12-bit codes.
    - It identifies repeated sequences of data and adds them to the code table.
  • 7. LZW (Lempel-Ziv-Welch) (contd.)
    - A general compression algorithm capable of working on almost any type of data.
    - Large English text files can typically be compressed to about half their original size.
    - Used in GIF (Graphics Interchange Format) to reduce file size without degrading visual quality.
  • 8. 3.1 LZW Encoding Algorithm
        STRING = get input character
        WHILE not end of input stream DO
            CHARACTER = get input character
            IF STRING+CHARACTER is in the string table THEN
                STRING = STRING+CHARACTER
            ELSE
                output the code for STRING
                add STRING+CHARACTER to the string table
                STRING = CHARACTER
            END IF
        END WHILE
        output the code for STRING
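A minimal Python sketch of the encoding pseudocode above, assuming the input is a Python string and that the initial dictionary is passed as a string of single characters in code order (the `alphabet` parameter is an assumption of this sketch, not part of the slides). It returns plain integer codes and leaves bit packing aside.

```python
def lzw_encode(text: str, alphabet: str) -> list[int]:
    """Encode `text` with LZW and return the list of integer codes.

    `alphabet` lists the single characters of the initial dictionary in
    code order: character i of `alphabet` gets code i.
    """
    if not text:
        return []
    table = {ch: i for i, ch in enumerate(alphabet)}  # the string table
    next_code = len(table)            # first free code after the single characters
    codes = []
    string = text[0]                  # STRING = get input character
    for character in text[1:]:        # WHILE not end of input stream
        if string + character in table:
            string += character       # keep extending the current match
        else:
            codes.append(table[string])             # output the code for STRING
            table[string + character] = next_code   # add STRING+CHARACTER
            next_code += 1
            string = character
    codes.append(table[string])       # output the code for the final STRING
    return codes
```

For example, `lzw_encode("TOBEORNOTTOBEORTOBEORNOT", "#ABCDEFGHIJKLMNOPQRSTUVWXYZ")` should give the codes 20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34 of the worked example; the stop code is not emitted by this sketch.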
  • 10. 3.2 Encoding a String Example
    To encode a string of characters:
    1. First, generate an initial dictionary of single characters:

        Symbol  Binary  Decimal
        #       00000   0
        A       00001   1
        B       00010   2
        C       00011   3
        D       00100   4
        E       00101   5
        ...     ...     ...
        Z       11010   26
  • 11-12. Encoding a String Example (contd.)
    2. Example: TOBEORNOTTOBEORTOBEORNOT, followed by the stop symbol #

        Current   Next  Output  Output  Extended
        Sequence  Char  Code    Bits    Dictionary  Comments
        NULL      T
        T         O     20      10100   27: TO      27 = first available code after 0 through 26
        O         B     15      01111   28: OB
        B         E     2       00010   29: BE
        E         O     5       00101   30: EO
        O         R     15      01111   31: OR
        R         N     18      10010   32: RN      32 requires 6 bits, so use 6 bits for the next output
        N         O     14      001110  33: NO
        O         T     15      001111  34: OT
        T         T     20      010100  35: TT
        TO        B     27      011011  36: TOB
        BE        O     29      011101  37: BEO
        OR        T     31      011111  38: ORT
        TOB       E     36      100100  39: TOBE
        EO        R     30      011110  40: EOR
        RN        O     32      100000  41: RNO
        OT        #     34      100010              # stops the algorithm; send the current sequence
                        0       000000              and the stop code
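The Output Bits column follows one rule: each code is written with just enough bits to hold the largest code defined so far, and that largest code grows by one after every output. A small sketch of that rule, assuming the 27-entry initial dictionary (codes 0 to 26) and the stop code 0 used in this example; the function name `lzw_bit_strings` is hypothetical.

```python
def lzw_bit_strings(codes: list[int], first_free: int = 27, stop_code: int = 0) -> list[str]:
    """Write each LZW output code at the width the encoder would use.

    Assumes codes 0..first_free-1 form the initial dictionary and that the
    encoder adds one dictionary entry after every output except the last.
    """
    largest = first_free - 1          # largest code defined so far (26 = 'Z')
    width = largest.bit_length()
    bits = []
    for k, code in enumerate(codes):
        width = largest.bit_length()  # enough bits for every code defined so far
        bits.append(format(code, f"0{width}b"))
        if k < len(codes) - 1:        # the final output adds no new entry
            largest += 1
    bits.append(format(stop_code, f"0{width}b"))  # stop code at the current width
    return bits
```

Fed the 16 codes from the table, it switches from 5 to 6 bits exactly where the table does: R (18) is still written as 10010, and the next code, N (14), as 001110.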
  • 13. 3.3 LZW Decoding Algorithm
        read OLD_CODE
        output the translation of OLD_CODE
        CHARACTER = translation of OLD_CODE (a single character)
        WHILE there are still input codes DO
            read NEW_CODE
            IF NEW_CODE is not in the translation table THEN
                STRING = get translation of OLD_CODE
                STRING = STRING+CHARACTER
            ELSE
                STRING = get translation of NEW_CODE
            END IF
            output STRING
            CHARACTER = first character in STRING
            add translation of OLD_CODE + CHARACTER to the translation table
            OLD_CODE = NEW_CODE
        END WHILE
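The decoding pseudocode can be sketched in Python the same way, again assuming a hypothetical `alphabet` string that holds the initial dictionary in code order. The `not in table` branch covers the one case where the encoder emits a code it created on the very previous step, before the decoder has had a chance to build it.

```python
def lzw_decode(codes: list[int], alphabet: str) -> str:
    """Decode LZW codes produced with the same initial `alphabet`."""
    if not codes:
        return ""
    table = {i: ch for i, ch in enumerate(alphabet)}  # the translation table
    next_code = len(table)
    old_code = codes[0]
    output = [table[old_code]]        # output the translation of OLD_CODE
    character = table[old_code]       # a single character at this point
    for new_code in codes[1:]:
        if new_code not in table:
            # Code created by the encoder one step ago: it must be
            # OLD_CODE's string followed by that string's first character.
            string = table[old_code] + character
        else:
            string = table[new_code]
        output.append(string)
        character = string[0]         # first character in STRING
        table[next_code] = table[old_code] + character  # add OLD_CODE + CHARACTER
        next_code += 1
        old_code = new_code
    return "".join(output)
```

Decoding the codes from the encoding example with the alphabet "#ABCDEFGHIJKLMNOPQRSTUVWXYZ" should give back TOBEORNOTTOBEORTOBEORNOT, and every dictionary entry is rebuilt without the table ever being transmitted.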
  • 15. 3.4 Decoding a String Example
    To decode an LZW-compressed archive, one needs to know in advance the initial dictionary used, but additional entries can be reconstructed because they are always simple concatenations of previous entries.

        Input   Code  Output    New Dictionary Entry
        Bits          Sequence  Full      Conjecture  Comments
        10100   20    T                   27: T?
        01111   15    O         27: TO    28: O?
        00010   2     B         28: OB    29: B?
        00101   5     E         29: BE    30: E?
        01111   15    O         30: EO    31: O?
        10010   18    R         31: OR    32: R?      created code 31 (last to fit in 5 bits)
        001110  14    N         32: RN    33: N?      so start reading input at 6 bits
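The decoder stays one step behind the encoder yet reaches the same width switch: it reads code number k (counting from 0) at the width of code 26 + k, so it moves to 6 bits right after it has created code 31. A minimal sketch of that bit reader, assuming the 27-entry dictionary and stop code 0 of this example; `read_lzw_codes` is a hypothetical name and the input is a string of '0'/'1' characters for readability.

```python
def read_lzw_codes(bitstream: str, first_free: int = 27, stop_code: int = 0) -> list[int]:
    """Split a string of '0'/'1' characters into variable-width LZW codes.

    Code number k (from 0) is read at the width of first_free - 1 + k, the
    largest code the encoder had defined when it wrote that code. Reading
    stops at the stop code or at the end of the stream.
    """
    codes = []
    pos = 0
    k = 0
    while pos < len(bitstream):
        width = (first_free - 1 + k).bit_length()
        code = int(bitstream[pos:pos + width], 2)
        pos += width
        if code == stop_code:         # '#' ends the stream
            break
        codes.append(code)
        k += 1
    return codes
```

Concatenating the Output Bits column of the encoding table into one 96-bit string and passing it to this function should recover the 16 codes, which the decoding algorithm then turns back into text.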
  • 16. 4. Flate Compression
    - A lossless data compression method.
    - Can discover and exploit many patterns in the input data.
    - An improvement over LZW: Flate-encoded data is usually much more compact than LZW-encoded output.
    - It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool and was later specified in RFC 1951.
    - Used in PDF: Adobe's PDF format compresses streams with Flate (the FlateDecode filter).
  • 17. 4.1 Decomposition
    - The Flate specification defines a lossless data format that compresses data using a combination of the LZ77 algorithm and Huffman coding.
    - The format can be implemented readily in a manner not covered by patents.
    - How each of the two algorithms works is explained below, followed by how they are combined to produce Flate compression.
  • 18. 4.1.1 Huffman Coding
    - A type of entropy encoding algorithm.
    - Used for lossless data compression.
    - Generates variable-length codes.
    - The code lengths are chosen according to how frequently each character occurs.
    - The idea is to assign the shortest code to the character with the highest probability of occurrence.
  • 19. Huffman Coding (contd.)
    - The algorithm starts by assigning each element a 'weight': a number that represents its relative frequency within the data to be compressed.
    Taking as an example the set of weights {1, 2, 3, 3, 4}:
    1. The weights are assigned to the nodes, or leaves, of the Huffman tree to be formed.
  • 20. Huffman Coding (contd.)
    2. In the first step, the two nodes with the lowest weights (lowest probability, hence highest priority), 1 and 2, are merged to create a new tree whose root has weight 3.
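  • 21. Huffman Coding (contd.)
    3. Now three nodes have weight 3 at their roots (the new tree and the two original leaves of weight 3), so any two of them may be chosen. Here the new tree is merged with one leaf of weight 3, forming a tree of weight 6.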
  • 22. Huffman Coding (contd.)
    4. Now the two minimum trees are the two singleton nodes of weights 3 and 4. We combine these to form a new tree of weight 7.
  • 23. Huffman Coding (contd.)
    5. Finally, the last two remaining trees (weights 6 and 7) are merged into a single tree of weight 13.
  • 24. Huffman Coding (contd.)
    - When all nodes have been recombined into a single "Huffman tree", you can reach any element by starting at the root and selecting 0 or 1 at each step.
    - Each element now has a Huffman code: the sequence of 0s and 1s that represents its path through the tree.
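The merging procedure above can be sketched with a priority queue. This is a minimal illustration, assuming hypothetical symbol names "a" to "e" for the weights {1, 2, 3, 3, 4} and the convention 0 = left branch, 1 = right branch. With its tie-breaking it merges the two original leaves of weight 3 in step 3 rather than the new tree, a different but equally valid choice; the code lengths come out the same either way.

```python
import heapq
from itertools import count


def huffman_codes(weights: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code from a {symbol: weight} mapping.

    Repeatedly merges the two lowest-weight trees. Ties are broken by
    insertion order, which is one arbitrary but valid choice.
    """
    order = count()  # unique tie-breaker so trees themselves are never compared
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # symbol (a leaf) or a (left, right) pair (an internal node).
    heap = [(w, next(order), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:  # a lone symbol still needs a one-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(order), (left, right)))

    codes: dict[str, str] = {}

    def walk(tree, path: str) -> None:
        if isinstance(tree, tuple):      # internal node: 0 = left, 1 = right
            walk(tree[0], path + "0")
            walk(tree[1], path + "1")
        else:                            # leaf: the path taken is its code
            codes[tree] = path

    walk(heap[0][2], "")
    return codes
```

For `{"a": 1, "b": 2, "c": 3, "d": 3, "e": 4}` the code lengths should be 3, 3, 2, 2 and 2 bits, i.e. 29 bits in total for the weighted data, and no code is a prefix of another.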
  • 25. 4.1.2 LZ77 Compression
    - Works by finding sequences of data that are repeated.
    - A lossless data compression algorithm.
    - Maintains a 'sliding window' during compression, which means the compressor keeps a record of the most recent characters.
    - Moves through the text with a sliding window made up of a search buffer and a look-ahead buffer.
    - The search buffer is used as the dictionary.
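A minimal sketch of the sliding-window idea, assuming the textbook LZ77 output of (distance, length, next character) triples and a brute-force scan of the search buffer. The `window` and `lookahead` sizes are illustrative parameters; Deflate itself uses a 32 KB window and match lengths of 3 to 258 bytes, and emits literals and length/distance pairs rather than triples.

```python
def lz77_encode(data: str, window: int = 32768, lookahead: int = 258) -> list[tuple[int, int, str]]:
    """Greedy LZ77: emit (distance, length, next_char) triples."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search buffer: data[max(0, i - window):i]; look-ahead buffer starts at i.
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one short of the end so a next character always exists.
            # A match may run past i into the look-ahead buffer (overlap).
            while (length < lookahead and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        out.append((best_dist, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples: list[tuple[int, int, str]]) -> str:
    out: list[str] = []
    for dist, length, ch in triples:
        start = len(out) - dist
        for k in range(length):          # copy one symbol at a time so that
            out.append(out[start + k])   # overlapping matches work correctly
        out.append(ch)
    return "".join(out)
```

The encoder only ever points back into text the decoder has already rebuilt, so `lz77_decode(lz77_encode(s))` should return `s`; a run such as "aaaaaaaaaa" becomes just two triples because the match is allowed to overlap the look-ahead buffer.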
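  • 26. LZ77 Compression (contd.)
    1. Suppose the input text is AABABBBABAABABBBABBABB.
    2. The first block found is simply A, encoded as (0,A). The next is AB, encoded as (1,B), where 1 is a reference to A: A|AB|ABBBABAABABBBABBABB
    3. The next block is ABB, which is encoded as (2,B), where 2 is a reference to AB, entered in the dictionary one iteration earlier. Continuing this way, the string parses into A|AB|ABB|B|ABA|ABAB|BB|ABBA|BB.
    Note: numbering phrases and referring back to them by index, as in this example, is strictly the LZ78 variant of Lempel-Ziv coding. LZ77 proper, the variant Flate uses, instead encodes a match as a (distance, length) reference into the sliding window.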
  • 27. LZ77 Compression (contd.)
    At the end of the algorithm, the dictionary is:

        Reference  Phrase  Encoding
        1          A       (0,A)
        2          AB      (1,B)
        3          ABB     (2,B)
        4          B       (0,B)
        5          ABA     (2,A)
        6          ABAB    (5,B)
        7          BB      (4,B)
        8          ABBA    (3,A)
        9          BB      (7,0)
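Because this example numbers each phrase and refers back to it by index, it is the LZ78 form of Lempel-Ziv parsing. A minimal sketch that reproduces the table, assuming `None` stands in for the missing character written as 0 in the final (7,0) entry; `lz78_parse` is a hypothetical name.

```python
from typing import Optional


def lz78_parse(text: str) -> list[tuple[int, Optional[str]]]:
    """Split text into LZ78 phrases, returning (reference, next_char) pairs.

    Reference 0 means "empty prefix". If the input ends inside a phrase that
    is already in the dictionary, the last pair has next_char None.
    """
    dictionary: dict[str, int] = {}   # phrase -> reference number (from 1)
    pairs: list[tuple[int, Optional[str]]] = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                  # keep extending a known phrase
        else:
            pairs.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1  # new numbered phrase
            phrase = ""
    if phrase:                            # leftover phrase at end of input
        pairs.append((dictionary[phrase], None))
    return pairs
```

Applied to AABABBBABAABABBBABBABB, it should return the nine pairs of the Encoding column in order, ending with (7, None) for the final BB.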
  • 28. 4.1.3 Putting Both Together
    Flate is a smart algorithm that adapts the way it compresses data to the data itself. The compressor has three modes of compression available:
    1. No compression at all: an intelligent choice when the data has already been compressed.
    2. Compression first with LZ77 and then with a slightly modified version of Huffman coding, using trees defined by the Flate specification itself.
  • 29. Putting Both Together (contd.)
    3. Compression first with LZ77 and then with Huffman coding, using trees that the compressor creates and stores along with the data.
    The data is broken up into blocks, and each block uses a single mode of compression.
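The three modes can be observed with Python's `zlib` module, which wraps zlib, a widely used Deflate (Flate) implementation. In this sketch, level 0 writes stored (uncompressed) blocks, the `Z_FIXED` strategy forces the fixed Huffman trees from the specification, and the default strategy lets zlib choose per block whichever mode it estimates is smallest. The helper reads BTYPE from the 3-bit header of the first block of a raw Deflate stream (RFC 1951: 00 stored, 01 fixed Huffman, 10 dynamic Huffman); the helper names `deflate_raw` and `first_block_type` and the sample data are assumptions.

```python
import zlib


def deflate_raw(data: bytes, level: int = 6, strategy: int = zlib.Z_DEFAULT_STRATEGY) -> bytes:
    """Compress to a raw Deflate stream (wbits = -15 means no zlib header)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15, 8, strategy)
    return comp.compress(data) + comp.flush()


def first_block_type(stream: bytes) -> int:
    """BTYPE of the first block: bits 1-2 of the first byte
    (Deflate packs header bits starting from the least significant bit)."""
    return (stream[0] >> 1) & 0b11


data = b"TOBEORNOTTOBEORTOBEORNOT" * 50
stored = deflate_raw(data, level=0)                  # mode 1: no compression
fixed = deflate_raw(data, strategy=zlib.Z_FIXED)     # mode 2: fixed Huffman trees
default = deflate_raw(data)                          # zlib picks the mode per block

for name, stream in [("stored", stored), ("fixed", fixed), ("default", default)]:
    assert zlib.decompress(stream, -15) == data      # every mode is lossless
    print(name, len(stream), "bytes, first BTYPE =", first_block_type(stream))
```

On such repetitive input the stored stream should be slightly larger than the input, while both Huffman modes shrink it dramatically; whether the default strategy settles on fixed or dynamic trees depends on zlib's size estimate for the block.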
  • 30. 5. Advantages & Disadvantages
    5.1 LZW
    Advantages
    - It is a lossless compression algorithm, so no information is lost.
    - The code table does not need to be passed from the compressor to the decompressor; the decompressor rebuilds it from the codes.
    - Simple, fast, and gives good compression.
    Disadvantages
    - The dictionary can become too large; one approach is to throw it away once it reaches a certain size.
    - It is useful mainly for large amounts of text data where redundancy is high.
  • 31. Advantages & Disadvantages
    5.2 Flate Compression
    Advantages
    - Huffman coding is easy to implement.
    - Flate is a lossless compression technique, so no text is lost.
    - Simple, fast, and gives good compression.
    - Freedom to choose the type of compression based on the content.
    Disadvantages
    - Generating the Huffman trees adds overhead.
    - The resulting compressor becomes complex because it combines LZ77 and Huffman coding.
    - It is quite tricky to understand and correctly apply the right combination of LZ77 and Huffman coding.
  • 32. 6. Conclusion
    - LZW has several advantages when compressing large text data, such as English text, which has high redundancy.
    - Both LZW and Flate are software-based, dictionary-based, lossless methods of compression.
    - Text compression needs a lossless technique.
    - Flate, which is widely used in PDF files, is an adaptive, flexible and more complex way to compress text.