EEM 562
SIGNAL CODING
LZ77 and LZ78
Compression Algorithms
Mustafa Gökçe
Anadolu University
mustafa_gokce@anadolu.edu.tr
LZ77.tk – LZ78.tk
What is file compression/decompression?
• File compression represents a file (data) using less data.
• Decompression recovers the original file (data) from the
compressed file (data).
• If the decompressed file is exactly the same as the original data,
the compression/decompression is called lossless.
• If the decompressed file contains less information than the original
data, the compression/decompression is called lossy.
Lossless vs. Lossy Compression
• If the data is not critical (family holiday pictures), lossy
compression is a reasonable choice. JPEG is a well-known lossy
algorithm for this kind of application.
• If the data is critical (a bank's account backup information)
and loss is not acceptable, lossless compression must be
used. Huffman coding and ZIP are well-known lossless methods for
this kind of application.
LZ77 and LZ78 Compression Algorithms
• They are the backbone of modern compression algorithms
such as:
• DEFLATE: Used by PNG, ZIP and GZIP
• LZMA (very high compression ratio): Used by 7-Zip and xz
• LZSS: Used by WinRAR with Huffman coding
and so on…
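As a quick illustration of a lossless LZ77-derived format, the sketch below round-trips some data through Python's standard `zlib` module, which implements DEFLATE. The sample input (the example string used later in this deck, repeated 20 times) is only an assumption chosen to give the compressor something repetitive to work with.

```python
import zlib

# Repetitive sample input (an illustrative assumption, not from the slides).
data = b"tipp_tap_tipp_tap_tippe_tippe_tipp_tap" * 20

compressed = zlib.compress(data)        # DEFLATE: LZ77-style matching plus Huffman coding
restored = zlib.decompress(compressed)

print(len(data), len(compressed))       # original size vs. compressed size in bytes
print(restored == data)                 # lossless: the round trip is exact
```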
LZ77 and LZ78 Compression Algorithms
• LZ77 and LZ78 are the two
lossless data compression
algorithms published in papers
by Abraham Lempel and Jacob
Ziv in 1977 and 1978.
• They are also known as LZ1
and LZ2 respectively.
• The algorithms were named an
IEEE Milestone in 2004.
(Pictured: Abraham Lempel, Jacob Ziv)
LZ77 and LZ78 Compression Algorithms
• LZ77 maintains a sliding window during compression. This was later
shown to be equivalent to the explicit dictionary constructed by LZ78;
however, the two are only equivalent when the entire data is intended
to be decompressed.
• Since LZ77 encodes and decodes from a sliding window over
previously seen characters, decompression must always start at the
beginning of the input.
• Conceptually, LZ78 decompression could allow random access to
the input if the entire dictionary were known in advance. However, in
practice the dictionary is created during encoding and decoding by
creating a new phrase whenever a token is output.
LZ77 Compression Algorithm
• The LZ77 algorithm achieves compression by replacing repeated
occurrences of data with references to a single copy of that data
existing earlier in the uncompressed data stream.
• A match is encoded by a pair of numbers called a length-distance
pair, which is equivalent to the statement "each of the
next length characters is equal to the character exactly
distance characters behind it in the uncompressed stream".
• The "distance" is sometimes called the "offset" instead.
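The length-distance statement can be read as a character-by-character copy. The sketch below is a minimal illustration; the helper name `copy_match` is hypothetical and not taken from the slides.

```python
def copy_match(output: list[str], distance: int, length: int) -> None:
    """Append `length` characters, each copied from `distance` positions back."""
    for _ in range(length):
        output.append(output[-distance])

buf = list("tipp_ta")
copy_match(buf, distance=4, length=3)
print("".join(buf))   # tipp_tap_t

# Copying one character at a time also handles length > distance,
# in which case the copied phrase repeats itself.
rep = list("ab")
copy_match(rep, distance=2, length=5)
print("".join(rep))   # abababa
```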
LZ77 Compression Part
• Get the data to be compressed.
• While the pointer is not at the end of the sequence:
• Set the length of the repeated phrase to 0.
• Search the already processed data for the longest phrase that matches the
data starting at the pointer.
• If no such phrase appeared before, encode the current letter with
difference 0 and length 0: [(0,0) letter].
• If the phrase appeared before, encode the difference between the current
location of the pointer and the start of the earlier occurrence, the length
of the phrase, and the next letter after the matched phrase:
[(difference, length) next letter].
• Move the pointer past the encoded letter.
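The compression steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the function name `lz77_compress` is hypothetical, the search window covers all earlier data, matches are not allowed to run into the not-yet-encoded part, and ties are resolved in favor of the nearest earlier occurrence. These choices were made because they reproduce the token table of the example slide.

```python
def lz77_compress(text: str) -> list[tuple[int, int, str]]:
    """Encode text as (difference, length, next letter) triples."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_len, best_dist = 0, 0
        # Keep at least one character back to serve as the literal next letter.
        max_len = len(text) - pos - 1
        for start in range(pos - 1, -1, -1):          # nearest candidates first
            length = 0
            while (length < max_len
                   and start + length < pos            # stay inside already-seen data
                   and text[start + length] == text[pos + length]):
                length += 1
            if length > best_len:                      # strict '>' keeps the nearest match on ties
                best_len, best_dist = length, pos - start
        tokens.append((best_dist, best_len, text[pos + best_len]))
        pos += best_len + 1                            # skip the match and the literal
    return tokens

for token in lz77_compress("tipp_tap_tipp_tap_tippe_tippe_tipp_tap"):
    print(token)
```

The quadratic search is kept for readability; practical encoders bound the window and index it with hash chains or similar structures.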
LZ77 Decompression Part
• Get the stored data.
• While the pointer is not at the end of the stored data:
• If the location and length are both zero, append the stored letter to the
output sequence.
• Otherwise, copy the phrase at the stored location with the stored length from
the output sequence, then append it and the stored letter to the output sequence.
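A minimal sketch of these decompression steps, assuming each stored token is a (location, length, letter) triple where the location counts backwards from the end of the output; the name `lz77_decompress` is hypothetical.

```python
def lz77_decompress(tokens: list[tuple[int, int, str]]) -> str:
    """Rebuild the text from (location, length, letter) triples."""
    out: list[str] = []
    for distance, length, letter in tokens:
        for _ in range(length):          # length 0 means there is nothing to copy
            out.append(out[-distance])   # copy from `distance` characters back
        out.append(letter)               # then append the stored letter
    return "".join(out)

first_five = [(0, 0, "t"), (0, 0, "i"), (0, 0, "p"), (1, 1, "_"), (5, 1, "a")]
print(lz77_decompress(first_five))       # tipp_ta
```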
LZ77 Example: Compression
Input sequence: tipp_tap_tipp_tap_tippe_tippe_tipp_tap
Encoding input sequence
(0,0) t [t]ipp_tap_tipp_tap_tippe_tippe_tipp_tap
(0,0) i t[i]pp_tap_tipp_tap_tippe_tippe_tipp_tap
(0,0) p ti[p]p_tap_tipp_tap_tippe_tippe_tipp_tap
(1,1) _ ti[p][p]_tap_tipp_tap_tippe_tippe_tipp_tap
(5,1) a [t]ipp_[t]ap_tipp_tap_tippe_tippe_tipp_tap
(4,3) i tip[p_t]a[p_t]ipp_tap_tippe_tippe_tipp_tap
(9,9) p ti[pp_tap_ti][pp_tap_ti]ppe_tippe_tipp_tap
(1,1) e tipp_tap_tipp_tap_ti[p][p]e_tippe_tipp_tap
(6,6) _ tipp_tap_tipp_tap[_tippe][_tippe]_tipp_tap
(21,7) p tipp_tap_[tipp_ta]p_tippe_tippe_[tipp_ta]p
LZ77 Example: Decompression
Decoding stored sequence
(0,0) t t
(0,0) i ti
(0,0) p tip
(1,1) _ tipp_
(5,1) a tipp_ta
(4,3) i tipp_tap_ti
(9,9) p tipp_tap_tipp_tap_tip
(1,1) e tipp_tap_tipp_tap_tippe
(6,6) _ tipp_tap_tipp_tap_tippe_tippe_
(21,7) p tipp_tap_tipp_tap_tippe_tippe_tipp_tap
Compressed file size: 78.95% of the original
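The slides do not state how this figure was computed. One accounting that reproduces it, offered here only as an assumption, is to charge three bytes per token (one each for the location, the length, and the letter) and compare against one byte per input character.

```python
original = "tipp_tap_tipp_tap_tippe_tippe_tipp_tap"
tokens = [
    (0, 0, "t"), (0, 0, "i"), (0, 0, "p"), (1, 1, "_"), (5, 1, "a"),
    (4, 3, "i"), (9, 9, "p"), (1, 1, "e"), (6, 6, "_"), (21, 7, "p"),
]

BYTES_PER_TOKEN = 3                      # assumption: 1 byte each for location, length, letter
compressed_size = len(tokens) * BYTES_PER_TOKEN
ratio = compressed_size / len(original)  # 30 bytes / 38 bytes

print(f"{ratio:.2%}")                    # 78.95%
```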
LZ78 Compression Algorithm
• The LZ78 algorithm achieves compression by replacing repeated
occurrences of data with references to a dictionary that is built
from the input data stream.
• For each character of the input stream, the dictionary is
searched for a match.
• If a match is found, the last matching index is set to the index
of the matching entry.
• If no match is found, a new dictionary entry is created.
LZ78 Compression Part
• Get the data to be compressed.
• While the pointer is not at the end of the sequence:
• Start with an empty phrase (length 0).
• Extend the phrase by the next character and search for it in the dictionary.
• If the phrase is not in the dictionary, create a new dictionary entry for
it, output the last matching index followed by the character, then reset the
last matching index to 0 and increment the next available index.
• If the phrase is in the dictionary, set the last matching index to the
index of the matching entry.
LZ78 Decompression Part
• Get the stored data.
• While the pointer is not at the end of the stored data:
• If the pointer (index) is zero, append the stored letter to the output
sequence.
• Otherwise, look up the dictionary phrase for the index and append it, followed
by the stored letter, to the output sequence.
• In both cases, add the newly decoded phrase to the dictionary under the next
available index.
LZ78 Example: Compression
Input sequence: tipp_tap_tipp_tap_tippe_tippe_tipp_tap
Encoding input sequence
(0,t) 1: t
(0,i) 2: i
(0,p) 3: p
(3,_) 4: p_
(1,a) 5: ta
(4,t) 6: p_t
(2,p) 7: ip
(6,a) 8: p_ta
(6,i) 9: p_ti
(3,p) 10: pp
(0,e) 11: e
(0,_) 12: _
(1,i) 13: ti
(10,e) 14: ppe
(12,t) 15: _t
(7,p) 16: ipp
(15,a) 17: _ta
(3,eof)
LZ78 Example: Decompression
Decoding stored sequence
(0,t) 1: t t
(0,i) 2: i ti
(0,p) 3: p tip
(3,_) 4: p_ tipp_
(1,a) 5: ta tipp_ta
(4,t) 6: p_t tipp_tap_t
(2,p) 7: ip tipp_tap_tip
(6,a) 8: p_ta tipp_tap_tipp_ta
(6,i) 9: p_ti tipp_tap_tipp_tap_ti
(3,p) 10: pp tipp_tap_tipp_tap_tipp
(0,e) 11: e tipp_tap_tipp_tap_tippe
(0,_) 12: _ tipp_tap_tipp_tap_tippe_
(1,i) 13: ti tipp_tap_tipp_tap_tippe_ti
(10,e) 14: ppe tipp_tap_tipp_tap_tippe_tippe
(12,t) 15: _t tipp_tap_tipp_tap_tippe_tippe_t
(7,p) 16: ipp tipp_tap_tipp_tap_tippe_tippe_tipp
(15,a) 17: _ta tipp_tap_tipp_tap_tippe_tippe_tipp_ta
(3,eof) tipp_tap_tipp_tap_tippe_tippe_tipp_tap
Compressed file size: 150.00% of the original (the output is larger than the input)
For more information
http://LZ77.tk
http://LZ78.tk
Thanks for listening…