Topics: LZ77 & LZ78
Lossless compression
Lossless compression is a class of data compression algorithms that allows the original data to
be perfectly reconstructed from the compressed data. By contrast, lossy compression permits
reconstruction only of an approximation of the original data, though usually with improved
compression rates (and therefore reduced file sizes).
Lossless data compression is used in many applications. For example, it is used in the ZIP file
format and in the GNU tool gzip. It is also often used as a component within lossy data
compression technologies (e.g. lossless mid/side joint stereo preprocessing by MP3 encoders and
other lossy audio encoders).
Lossless compression is used in cases where it is important that the original and the
decompressed data be identical, or where deviations from the original data would be
unfavourable. Typical examples are executable programs, text documents, and source code.
Some image file formats, like PNG or GIF, use only lossless compression, while others like TIFF
and MNG may use either lossless or lossy methods. Lossless audio formats are most often used
for archiving or production purposes, while smaller lossy audio files are typically used on
portable players and in other cases where storage space is limited or exact replication of the
audio is unnecessary.
LZ77 and LZ78
LZ77 and LZ78 are the two lossless data compression algorithms published in papers by
Abraham Lempel and Jacob Ziv in 1977[1] and 1978.[2] They are also known as LZ1 and LZ2
respectively.[3] These two algorithms form the basis for many variations including LZW, LZSS,
LZMA and others. Besides their academic influence, these algorithms formed the basis of
several ubiquitous compression schemes, including GIF and the DEFLATE algorithm used in
PNG and ZIP.
They are both theoretically dictionary coders. LZ77 maintains a sliding window during
compression. This was later shown to be equivalent to the explicit dictionary constructed by
LZ78—however, they are only equivalent when the entire data is intended to be decompressed.
Since LZ77 encodes and decodes from a sliding window over previously seen characters,
decompression must always start at the beginning of the input. Conceptually, LZ78
decompression could allow random access to the input if the entire dictionary were known in
advance. However, in practice the dictionary is created during encoding and decoding by
creating a new phrase whenever a token is output.[4]
The algorithms were named an IEEE Milestone in 2004.
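The way an LZ78 dictionary grows by one phrase per output token can be illustrated with a short
Python sketch. This is a minimal illustration under stated assumptions, not the published
algorithm verbatim: the function names lz78_encode and lz78_decode, the (index, symbol) token
layout, and the use of index 0 for the empty phrase are all choices made for this example.

```python
def lz78_encode(data):
    """Encode a string as a list of (phrase_index, symbol) tokens."""
    dictionary = {}   # phrase -> index; index 0 is reserved for the empty phrase
    tokens = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            # Keep extending the current phrase while it is still known.
            phrase += ch
        else:
            # Emit a token and create a new dictionary phrase at the same time.
            tokens.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it with an empty symbol.
        tokens.append((dictionary[phrase], ""))
    return tokens


def lz78_decode(tokens):
    """Rebuild the string; the dictionary is recreated token by token."""
    phrases = [""]    # phrases[0] is the empty phrase
    out = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)
```

Note that the decoder never receives the dictionary; it recreates each phrase as the
corresponding token is read, which is why decoding has to proceed from the start of the input.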
1 History
In 1977, Jacob Ziv and Abraham Lempel proposed the LZ77 algorithm.
In the 1980s, a variant of LZ77 known as LZSS was implemented by Haruyasu Yoshizaki in the
program LHARC, demonstrating the possibilities of LZ77 encoding.
After that, a large number of text compressors were based on the LZ77 idea (or a variation
of it). Some of the most famous are ARJ, RAR, gzip and 7z.
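2 Fundamentals of LZ77
LZ77 processes a sequence of symbols using a structure made up of a dictionary of already
processed symbols followed by a look-ahead buffer of symbols still to be encoded.
The dictionary and the look-ahead buffer have a fixed size and can be considered as a sliding
window, where the input of a new symbol into the buffer pushes out the buffer's oldest symbol,
which becomes the newest symbol of the dictionary.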
3 The LZ77 encoder
1. Let I be the length of the dictionary and J the length of the buffer.
2. Input the first J symbols into the buffer.
3. While the input is not exhausted:
   1. Let i be the position in the dictionary of the longest match with the first j symbols of
      the buffer, and k the symbol that follows the match (the symbol that prevents j from
      being larger).
   2. Output ijk.
   3. Input the next j + 1 symbols into the buffer.
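The encoder steps above can be sketched in Python as follows. This is a minimal illustration,
not a reference implementation. The names lz77_encode, dict_size and buf_size are assumptions
made for this sketch, and so are the conventions, which are inferred from the encoding example
in this document: the dictionary is right-aligned in positions 0..I-1, a match may run on into
the buffer, the leftmost longest match wins, and i = 0 when nothing matches.

```python
def lz77_encode(data, dict_size=4, buf_size=4):
    """Return a list of (i, j, k) code-words for the string `data`."""
    codewords = []
    pos = 0  # index in `data` of the first symbol of the look-ahead buffer
    while pos < len(data):
        buf_end = min(pos + buf_size, len(data))
        max_len = buf_end - pos - 1        # keep one buffer symbol free for k
        filled = min(pos, dict_size)       # symbols currently in the dictionary
        best_i, best_j = 0, 0
        for start in range(pos - filled, pos):
            j = 0
            # data[start + j] may lie inside the buffer: overlapping matches are allowed.
            while j < max_len and data[start + j] == data[pos + j]:
                j += 1
            if j > best_j:                 # strict '>' keeps the leftmost longest match
                best_i = dict_size - (pos - start)   # position inside the window 0..I-1
                best_j = j
        codewords.append((best_i, best_j, data[pos + best_j]))
        pos += best_j + 1                  # slide the window by j + 1 symbols
    return codewords
```

With I = J = 4, lz77_encode("ababcbababaaaaaa") yields the code-words 0 0 a, 0 0 b, 2 2 c,
0 3 a, 0 2 a and 2 3 a, the same sequence as in the encoding example.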
4 The LZ77 decoder
1. While the code-words ijk are not exhausted:
   1. Output the j symbols extracted from position i in the dictionary.
   2. Output k.
   3. Introduce all the decoded symbols into the buffer.
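The decoder loop can be sketched in the same way. Again this is only an illustration under
assumptions: the name lz77_decode and the parameter dict_size are hypothetical, and position i
is taken to index a dictionary of length I that is right-aligned against the symbols decoded so
far, which is the convention the worked examples in this document appear to use.

```python
def lz77_decode(codewords, dict_size=4):
    """Rebuild the original string from a list of (i, j, k) code-words."""
    out = []
    for i, j, k in codewords:
        # Map dictionary position i to an absolute index in the decoded output.
        start = len(out) - dict_size + i
        if j > 0 and start < 0:
            raise ValueError("code-word points before the start of the data")
        for n in range(j):
            # Copy symbol by symbol so that a match may overlap what is being written.
            out.append(out[start + n])
        out.append(k)
    return "".join(out)
```

Feeding it the code-words 0 0 a, 0 0 b, 2 2 c, 0 3 a, 0 2 a, 2 3 a with I = 4 returns
"ababcbababaaaaaa". Copying one symbol at a time is what makes the final code-word work: its
match starts two symbols before the end of the dictionary but is three symbols long.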
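Encoding example
Shifted out  Dict.  Buffer  Remaining input  Output  Comment
             ----   abab    cbababaaaaaa     0 0 a   Empty dictionary
             ---a   babc    bababaaaaaa      0 0 b   b not found
             --ab   abcb    ababaaaaaa       2 2 c   ab found
a            babc   baba    baaaaaa          0 3 a   bab found
ababc        baba   baaa    aaa              0 2 a   ba found
ababcbab     abaa   aaaa                     2 3 a   aaa found (match runs into the buffer)
             0123
(- marks an empty dictionary position; 0123 numbers the dictionary positions.)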
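5 Decoding example
Input  Output  Shifted out  Dict.  Buffer
0 0 a  a                    ----   a
0 0 b  b                    ---a   b
2 2 c  abc                  --ab   abc
0 3 a  baba    a            babc   baba
0 2 a  baa     ababc        baba   baa
2 3 a  aaaa    ababcbab     abaa   aaaa
                            0123
(- marks an empty dictionary position; 0123 numbers the dictionary positions.)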
Notice that the parameters I and J control the performance of the algorithm. They should be
large enough to guarantee the matching of long strings, but should be kept small in order to
reduce the number of bits of the code-words ijk. Typical sizes are log2(I) = 12 and
log2(J) = 4.
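The trade-off can be made concrete by counting the bits in one code-word ijk. The sketch below
assumes a fixed-width encoding in which i takes log2(I) bits, j takes log2(J) bits and k is
stored as a raw symbol. The function name codeword_bits and the default of 8 bits per symbol
are assumptions for illustration and are not stated in the text.

```python
def codeword_bits(dict_size, buf_size, symbol_bits=8):
    """Bits needed for one fixed-width (i, j, k) code-word."""
    i_bits = (dict_size - 1).bit_length()  # i ranges over 0 .. I-1
    j_bits = (buf_size - 1).bit_length()   # j ranges over 0 .. J-1
    return i_bits + j_bits + symbol_bits


# Typical sizes from the text: log2(I) = 12 and log2(J) = 4.
typical = codeword_bits(4096, 16)   # 12 + 4 + 8 = 24 bits

# Sizes used in the worked examples: I = J = 4.
small = codeword_bits(4, 4)         # 2 + 2 + 8 = 12 bits
```

Under these assumptions, a code-word with the typical sizes costs as much as three raw 8-bit
symbols, so it only saves space when the match length j is at least 3.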
LZW compression
LZW compression is the compression of a file into a smaller file using a table-based
lookup algorithm invented by Abraham Lempel, Jacob Ziv, and Terry Welch. Two
commonly-used file formats in which LZW compression is used are the GIF image format
served from Web sites and the TIFF image format. LZW compression is also suitable for
compressing text files.
A particular LZW compression algorithm takes each input sequence of bits of a given
length (for example, 12 bits) and creates an entry in a table (sometimes called a
"dictionary" or "codebook") for that particular bit pattern, consisting of the pattern itself
and a shorter code. As input is read, any pattern that has been read before results in the
substitution of the shorter code, effectively compressing the total amount of input to
something smaller. As with the earlier LZ77 and LZ78 approaches, the LZW
algorithm does not include the look-up table of codes as part of the compressed file. The
decoding program that uncompresses the file is able to build the table itself by using the
algorithm as it processes the encoded input.
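How the decoder can rebuild the table on its own is easiest to see in code. The following
Python sketch is a simplified illustration, not the variant used by any particular file format:
it emits plain integer codes rather than packed bit fields, never resets or limits the table,
and assumes every input character has a code point below 256. The names lzw_encode and
lzw_decode are chosen for this example.

```python
def lzw_encode(data):
    """Return a list of integer codes for the string `data`."""
    table = {chr(c): c for c in range(256)}   # initial single-character entries
    next_code = 256
    phrase = ""
    codes = []
    for ch in data:
        if phrase + ch in table:
            phrase += ch                      # keep extending a known pattern
        else:
            codes.append(table[phrase])       # emit the code of the known pattern
            table[phrase + ch] = next_code    # and record the new, longer pattern
            next_code += 1
            phrase = ch
    if phrase:
        codes.append(table[phrase])
    return codes


def lzw_decode(codes):
    """Rebuild the string, recreating the table from the codes alone."""
    if not codes:
        return ""
    table = {c: chr(c) for c in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = prev + prev[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = prev + entry[0]    # same entry the encoder added
        next_code += 1
        prev = entry
    return "".join(out)
```

For the input "ababab" the encoder produces the codes 97, 98, 256, 256, and the decoder
recovers the original string from those four numbers without ever being sent the table.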
What are the issues surrounding LZW?
LZW (Lempel-Ziv-Welch) is a popular compression algorithm used by a number of formats,
including GIF, TIFF, PostScript, PDF, Unix Compress, and V.42bis. It is based on LZ77 and
LZ78, methods developed by Abraham Lempel and Jacob Ziv in the 1970s, and was later refined
into LZW by Terry Welch. It effectively compresses repetitive data and does so with minimal
computational overhead.
Unisys used to hold the patent for LZW, though many companies and developers, including
CompuServe, mistakenly believed it to be in the public domain. Unisys had always required
licenses from companies that used LZW in their hardware (e.g., modem manufacturers), but for
many years it overlooked LZW implementations in software. When it came to light that LZW
was used in CompuServe's GIF image format, Unisys began to require that developers pay
licensing fees for programs that could display or create GIF files. Although it did not require end
users or the creators of freeware and other non-profit software to acquire licenses, Unisys was
heavily criticized for its actions. Critics accused Unisys of trying to cash in on a format that, with
the explosive growth of the World Wide Web, had become a universal standard. Unisys has
denied these charges.
The US patent that Unisys held expired in June 2003, and patents in other parts of the world had
all expired by July 7, 2004. Unisys currently holds and has patents pending on improvements to
the LZW compression algorithm.