LEMPEL ZIV ALGORITHMS

                         BY
                     S T RAJAN
                  CSN-CJB0912010




   Ramaiah School of Advanced Studies –Bangalore
   M.S                                              1
INTRODUCTION
• Lempel-Ziv is a family of lossless data compression algorithms.
• The general idea can be related to the pigeonhole principle.
• If n items are placed into m pigeonholes where n > m, at least one
  pigeonhole must contain more than one item.

                   Fig: A pigeonhole
• An image of pigeons in holes. Here there are n = 10 pigeons in m = 9 holes.
  Since 10 is greater than 9, the pigeonhole principle says that at least one
  hole has more than one pigeon.
• Consequently, the achievable compression varies with the input: no lossless
  method can shorten every possible input.
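The pigeonhole remark above can be turned into a short counting argument. The following is only a sketch; the symbol L (input length in bits) is introduced here for illustration and does not appear on the slides.

```latex
\#\{\text{inputs of exactly } L \text{ bits}\} = 2^{L},
\qquad
\#\{\text{outputs shorter than } L \text{ bits}\} = \sum_{k=0}^{L-1} 2^{k} = 2^{L} - 1 .
```

Since 2^L > 2^L - 1, a lossless (one-to-one) coder cannot map every L-bit input to a shorter output: at least one input does not shrink.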
Lempel Ziv Algorithms
• Lossless data compression is a technique that allows the original
  information to be reconstructed exactly from the compressed data.
• Like Huffman coding, run-length coding, arithmetic coding, etc., Lempel-Ziv
  is a widely used lossless data compression technique.
• The Lempel-Ziv algorithms belong to yet another category of lossless
  compression techniques known as dictionary coders.
• Abraham Lempel and Jacob Ziv published their first compression method,
  often referred to as "LZ77", in 1977 in an article entitled "A Universal
  Algorithm for Sequential Data Compression". The pair wrote another paper in
  1978 outlining another dictionary approach, known as the LZ78 algorithm,
  which was modified by Terry Welch in 1984.


Lempel Ziv Algorithm Family

   LZ77 family                              LZ78 family
   LZR, LZSS, LZH, LZB                      LZFG, LZC, LZT, LZW, LZMW, LZJ

   APPLICATIONS:                            APPLICATIONS:
   • ZIP                                    • GIF
   • GZIP                                   • V.42
   • STACKER                                • COMPRESS

                        Fig 1: Lempel Ziv Algorithm Family

Types of Dictionary
• The dictionary holds a list of strings of symbols and it may be static or
  dynamic (adaptive).
• Static dictionary – permanent, sometimes allowing the addition of strings
  but no deletions
• Dynamic dictionary – holding strings previously found in the input stream,
  allowing for additions and deletions of strings as new input is being read
• LZ algorithms use an adaptive dictionary.
• The dictionary is built in a single pass, while encoding takes place at
  the same time.
• The encoder continuously rewrites the dictionary for a file, discarding
  patterns it previously included and adding new ones when necessary.


LZ77
General approach
• Dictionary is a portion of the previously encoded sequence
• Use a sliding window for compression
Mechanism
• Find the maximum-length match for the string pointed to by
  the search pointer in the search buffer, and encode it
Rationale
• If patterns tend to repeat locally, we should be able to get
  a more efficient representation




LZ77
• Sliding window is composed of a search buffer and a look ahead buffer
  (note: window size W = S + LA).


  Fig: Sliding window over a sample sequence, showing the match pointer and
  the search pointer, the search buffer (size S = 8) and the look-ahead
  buffer (size LA = 7)
Explanation

• Offset = search pointer – match pointer (o = 7)
• Length of match = number of consecutive letters matched (l = 4)
• Code word c = C(r), where C(r) is the code word for r
• Encoding triple: <o, l, c> = <7, 4, C(r)>
• If a fixed-length code (FLC) is used and the alphabet size is |A|, <o, l, c> can be
   encoded with $\lceil \log_2 S \rceil + \lceil \log_2 W \rceil + \lceil \log_2 |A| \rceil$ bits.
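As a small worked example of the bit count above, the sketch below evaluates the formula for a window with S = 7 and W = 13. The alphabet size of 256 (one byte per symbol) and the function names are assumptions chosen for illustration; they are not taken from the slides.

```python
def ceil_log2(n):
    """Smallest b with 2**b >= n, i.e. ceil(log2(n)), using integer arithmetic."""
    return (n - 1).bit_length()


def triple_bits(search_size, window_size, alphabet_size):
    """Bits needed for one <o, l, c> triple under a fixed-length code."""
    return (ceil_log2(search_size)      # offset o
            + ceil_log2(window_size)    # match length l
            + ceil_log2(alphabet_size)) # code word c


# S = 7, W = 13, |A| = 256 (assumed): 3 + 4 + 8 bits
print(triple_bits(7, 13, 256))  # 15
```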




Possible Cases for Triples

• Three cases can arise during the coding process:
  - There is no match in the window for the next character to be encoded
  - There is a match
  - The matched string extends inside the look-ahead buffer
• For each of these cases, a triple signals the case to the decoder.




ENCODING

• Sequence: cabracadabrarrarrad, with W = 13, S = 7
  - |cabraca|dabrar|rarrad
       no match for d
       send <0, 0, C(d)>
  - c|abracad|abrarr|arrad
       longest match is abra, 7 positions back, followed by r
       send <7, 4, C(r)>
  - cabrac|adabrar|rarrad|
       match rar, 3 positions back, followed by r: send <3, 3, C(r)>
       Could we do better? Let the match run on into the look-ahead buffer (rarra)
       Send <3, 5, C(d)> instead

DECODING

• Current input: <0, 0, C(d)> <7, 4, C(r)> <3, 5, C(d)>
• Current output: cabraca
 Decode: <0, 0, C(d)>
       Decode C(d): c|abracad|
 Decode: <7, 4, C(r)>
       Start with the first 'a', copy four letters: cabra|cadabra|
       Decode C(r): cabrac|adabrar|
 Decode: <3, 5, C(d)>
       Start with the first 'r', copy three letters: cabracada|brarrar|
       Copy two more letters: cabracadabr|arrarra|
       Decode C(d): cabracadabrarrarrad
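The decoding steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: triples are represented as plain (offset, length, symbol) tuples carrying the symbol itself rather than a code word, and the `primed` argument (the text already decoded, here cabraca) is a name introduced only for this example.

```python
def lz77_decode(triples, primed=""):
    """Rebuild the text from (offset, length, symbol) triples.

    `primed` is the output already available before the first triple.
    """
    out = list(primed)
    for offset, length, symbol in triples:
        start = len(out) - offset
        # Copy one symbol at a time so that a match may run past the end
        # of the text decoded so far (it then re-reads symbols just copied).
        for i in range(length):
            out.append(out[start + i])
        out.append(symbol)
    return "".join(out)


print(lz77_decode([(0, 0, "d"), (7, 4, "r"), (3, 5, "d")], primed="cabraca"))
# cabracadabrarrarrad
```

Copying symbol by symbol is what makes the <3, 5, C(d)> triple work: only three symbols exist at offset 3 when the copy starts, and the last two are produced by the copy itself.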

Algorithm
while (lookAheadBuffer not empty) {
    get a reference (position, length) to the longest match;
    if (length > 0) {
        output (position, length, next symbol);
        shift the window length+1 positions along;
    } else {
        output (0, 0, first symbol in the lookahead buffer);
        shift the window 1 character along;
    }
}
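The loop above can be made runnable as follows. This is a sketch under stated assumptions: it uses a brute-force search, represents each triple as an (offset, length, symbol) tuple carrying the symbol itself, and takes a `primed` parameter (number of leading symbols assumed to be already in the search buffer) that is introduced here only so the example from the encoding slide can be reproduced. Both branches of the pseudocode collapse into one, since a failed search simply yields offset 0 and length 0.

```python
def lz77_encode(data, search_size=7, lookahead_size=6, primed=0):
    """Return a list of (offset, length, next_symbol) triples."""
    triples = []
    pos = primed  # index of the first symbol in the look-ahead buffer
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Keep one symbol in reserve to send as the explicit next symbol.
        max_length = min(lookahead_size, len(data) - pos) - 1
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            # The comparison may run past pos, i.e. the match is allowed
            # to extend into the look-ahead buffer.
            while (length < max_length
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1  # shift the window length+1 positions
    return triples


# Search buffer starts as "cabraca", as on the encoding slide.
print(lz77_encode("cabracadabrarrarrad", search_size=7, lookahead_size=6, primed=7))
# [(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')]
```

The brute-force search costs on the order of S × LA comparisons per triple; practical encoders typically replace it with hashing or tree structures.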
Points

•    For LZ77, we have
       - Adaptive scheme, no prior knowledge of the source
       - Asymptotically approaches the source statistics
       - Assumes that recurring patterns occur close to each other
•    Possible improvements
       - Variable-bit encoding: PKZip, zip, gzip, etc. use a
         variable-length coder to encode <o, l, c>.
       - Variable buffer size: a larger buffer requires faster searches
       - Elimination of <0, 0, C(x)>: LZSS sends a flag bit to signal
         whether the next "token" is an <o, l> pair or the codeword of a symbol
Improvements

• LZR
    The Lempel - Ziv - Renau modification allows pointers to reference anything
    that has been encoded, without being limited by the length of the search buffer.
• LZSS
    The mandatory inclusion of the next non-matching symbol in each codeword
    leads to situations in which the symbol is explicitly coded despite the
    possibility of it being part of the next match. The popular modification by
    Storer and Szymanski (1982) removes this requirement.
• LZB
    LZB uses an elaborate scheme for encoding the references and lengths
    with varying sizes.
• LZH
    The LZH implementation employs Huffman coding to compress the pointers.
LZ78

• LZ78 improvements over LZ77
   - No search buffer: an explicit dictionary is used instead
   - Encoder and decoder must build the dictionary in sync
   - Encoding: <i, c>
          i = index in dictionary table
          c = code of the following character
• Example: encode the following contents
     wabba_wabba_wabba_wabba_woo_woo_woo


EXAMPLE-1


•   Input: wabba_wabba_wabba_wabba_woo_woo_woo
•   Dictionaries: the initial dictionary is empty; the final dictionary is
    built up as the encoder runs

     Encoder output          index        entry
     <0,c(w)>                1            w
     <0,c(a)>                2            a
     <0,c(b)>                3            b
     <3,c(a)>                4            ba
     <0,c(_)>                5            _
     <1,c(a)>                6            wa
     <3,c(b)>                7            bb
     <2,c(_)>                8            a_


EXAMPLE (Continued)


      Encoder output   index             entry
      <6,c(b)>         9                 wab
      <4,c(_)>         10                ba_
      <9,c(b)>         11                wabb
      <8,c(w)>         12                a_w
      <0,c(o)>         13                o
      <13,c(_)>        14                o_
      <1,c(o)>         15                wo
      <14,c(w)>        16                o_w
      <13,c(o)>        17                oo
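The dictionary construction in this example can be reproduced with a short sketch. It assumes pairs are represented as (index, symbol) tuples with the symbol itself in place of its code c(x), and that index 0 means "no dictionary entry"; the function name is illustrative only.

```python
def lz78_encode(data):
    """Return (pairs, dictionary) for an LZ78 pass over `data`."""
    dictionary = {}  # phrase -> index, indices start at 1
    pairs = []
    phrase = ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol  # keep extending the current match
        else:
            # Emit <index of longest known phrase, following symbol>.
            pairs.append((dictionary.get(phrase, 0), symbol))
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit its index with no symbol.
        pairs.append((dictionary[phrase], ""))
    return pairs, dictionary


pairs, dictionary = lz78_encode("wabba_wabba_wabba_wabba_woo_woo_woo")
for (index, symbol), entry in zip(pairs, dictionary):
    print(f"<{index},c({symbol})>  {dictionary[entry]:>2}  {entry}")
```

Running it prints one line per dictionary entry, in the same order as the two example tables.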


Remarks
• Observation
    If we keep on encoding, the dictionary will keep on growing
• Possible solutions
    - Stop growing the dictionary: effectively switch to a static dictionary
    - Prune it: based on usage statistics
    - Reset it: start all over again
• The best solution depends on knowledge of the source
Improvements
• A limitation of LZ78 is that its dictionary keeps growing and every
  codeword must carry an explicit symbol.
• LZW was developed by Terry Welch in 1984.
• The dictionary has to be initialized with all the symbols of the input
  alphabet, and this initial dictionary needs to be made known to the decoder.
• IDEA
• Instead of <i, c>, encode i only




•   Input: wabba_wabba_wabba_wabba_woo_woo_woo
•   Output: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
•   Dictionaries:

     Initial dictionary
     INDEX    ENTRY
     1        _
     2        a
     3        b
     4        o
     5        w

     Final dictionary
     INDEX    ENTRY        INDEX    ENTRY
     1        _            14       a_w
     2        a            15       wabb
     3        b            16       ba_
     4        o            17       _wa
     5        w            18       abb
     6        wa           19       ba_w
     7        ab           20       wo
     8        bb           21       oo
     9        ba           22       o_
     10       a_           23       _wo
     11       _w           24       oo_
     12       wab          25       _woo
     13       bba
Algorithm
p = empty string
while (!done)
    read next symbol into a
    if (p*a) is in dictionary    // Note: '*' stands for concatenation
        p = p*a
    else
        send out index of p
        add p*a to the dictionary
        p = a
end
send out index of p              // flush the final match
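The LZW loop can be sketched in runnable form and checked against the wabba example. The sketch assumes that the initial dictionary is passed in as a string `alphabet` whose symbols are numbered from 1 in the given order, and that every input symbol appears in it; the decoder is added here for a round-trip check and is not part of the slides.

```python
def lzw_encode(data, alphabet):
    """Return (codes, dictionary); indices start at 1 as in the example."""
    dictionary = {s: i for i, s in enumerate(alphabet, start=1)}
    codes = []
    p = ""
    for a in data:
        if p + a in dictionary:
            p += a
        else:
            codes.append(dictionary[p])
            dictionary[p + a] = len(dictionary) + 1
            p = a
    if p:
        codes.append(dictionary[p])  # flush the final match
    return codes, dictionary


def lzw_decode(codes, alphabet):
    """Invert lzw_encode, rebuilding the same dictionary one step behind."""
    entries = {i: s for i, s in enumerate(alphabet, start=1)}
    if not codes:
        return ""
    previous = entries[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in entries:
            current = entries[code]
        else:
            # Code refers to the entry the encoder created in this very step.
            current = previous + previous[0]
        entries[len(entries) + 1] = previous + current[0]
        out.append(current)
        previous = current
    return "".join(out)


text = "wabba_wabba_wabba_wabba_woo_woo_woo"
codes, dictionary = lzw_encode(text, "_abow")
print(codes)
# [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]
print(lzw_decode(codes, "_abow") == text)  # True
```

Only the indices are transmitted; the decoder recovers each new entry from the previous phrase plus the first symbol of the current one, which is why the initial dictionary must be shared in advance.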




APPLICATION:COMPRESS



• An early implementation of LZW
• Adaptive dictionary: starts with 512 entries (9-bit codewords)
• When the dictionary fills up, it doubles in size (one more codeword bit),
  up to 2^bmax entries
• User can configure the maximum codeword length, bmax = 9 to 16
• When the dictionary reaches 2^bmax entries, it becomes a static dictionary
  encoder
• If the compression ratio falls below a threshold, the dictionary is reset.

OTHER APPLICATIONS:
   GIF images (LZW)
   PNG images (DEFLATE, an LZ77-based method, rather than LZW)



References
•   BELL, T. C., CLEARY, J. G., AND WITTEN, I. H. Text Compression.
    Prentice Hall, Upper Saddle River, NJ, 1990.
•   SAYOOD, K. Introduction to Data Compression. Academic Press, San
    Diego, CA, 1996, 2000.
•   ZIV, J., AND LEMPEL, A. A universal algorithm for sequential data
    compression. IEEE Transactions on Information Theory 23 (1977),
    337–343.
•   ZIV, J., AND LEMPEL, A. Compression of individual sequences via
    variable-rate coding. IEEE Transactions on Information Theory 24
    (1978), 530–536.




