Lempel-Ziv Source Coding Techniques
LITU ROUT
Indian Institute of Space Science and Technology
liturout1997@gmail.com
Abstract—Many data compression techniques are available for efficient transmission and storage of data using less memory space. The Lempel-Ziv (LZ) scheme is one of these lossless data compression techniques. LZ is not a single algorithm but a whole family of algorithms derived from the basic algorithms proposed in 1977 and 1978. Today these algorithms are known by the initials of their authors and the year of publication. LZ77 exploits the fact that words and phrases in a text are likely to be repeated. When there is a repetition, it can be replaced by a pointer to the earlier occurrence, thereby saving memory space. LZ78 is a dictionary-based technique. A dictionary is created while the data are being encoded, so encoding can be done on the fly. The dictionary need not be transmitted, as it can be regenerated at the receiving end on the fly. If the dictionary overflows, one bit is added to the code word until all the characters are included.
Index Terms—Lossless, LZ77, LZ78, Repetitive text/patterns, Dictionary scheme
I. INTRODUCTION
DATA compression deals with the reduction of the space needed to store information, thereby also reducing the time required to transmit data. These compression techniques are mainly based on identifying and isolating redundant information. Data are compressed only as far as the minimum requirements on the reconstructed signal allow.
"All of the books in the world contain no more information than is broadcast as video in a single large American city in a single year. Not all bits have equal value." - Carl Sagan
Some data compression schemes are lossless, i.e. the exact information can be reconstructed from the transmitted data. This is needed when we cannot afford to lose any detail of the information, as in medical imaging, text, and computer executable files.
Some schemes are lossy, i.e. only an approximation of the information can be reconstructed from the transmitted data. In some cases the approximate result is enough to meet our needs. Lossy compression achieves better compression ratios than lossless compression. Data such as multimedia images, video and audio are more easily compressed with lossy techniques because of the way the human auditory and visual systems work. Human ears cannot detect very small changes in sound, so rather than sending the whole information,
the data can be compressed and sent with no perceptible difference in the result. Lossy compression has a better compression ratio, but it is limited to audio, video and images, where some loss is acceptable. Each approach has its own merits and demerits, so the question of which one is better is irrelevant here.
There have been quite a few lossless data compression algorithms based on the probabilistic or dictionary methods first proposed by Lempel and Ziv in their 1977 and 1978 papers. The dictionary-based Lempel-Ziv scheme is divided into two families: those derived from LZ77 (LZ77, LZSS, LZH and LZB) and those derived from LZ78 (LZ78, LZW and LZFG).
In this paper I give a brief description of the LZ77 series and the LZ78 series.
A. LZ77 Series
LZ77 exploits the fact that words and phrases in a text are likely to be repeated. When there is a repetition, the new occurrence can be referred to the previous one by a pointer. The scheme needs no prior knowledge of, and makes no assumptions about, the characteristics of the source.
In the LZ77 approach, the dictionary is simply a portion
of the previously encoded sequence. The encoder examines
the input sequence through a sliding window which consists
of two parts: a search buffer that contains a portion of the
recently encoded sequence and a look-ahead buffer that
contains the next portion of the sequence to be encoded. The
algorithm searches the sliding window for the longest match
with the beginning of the look-ahead buffer and outputs a
reference (a pointer) to that match. It is possible that there is
no match at all, so the output cannot contain just pointers. In
LZ77 the reference is always output as a triple <o,l,c>, where
‘o’ is an offset to the match, ‘l’ is length of the match, and ‘c’
is the next symbol after the match. If there is no match, the
algorithm outputs a null-pointer (both the offset and the match
length equal to 0) and the first symbol in the look-ahead buffer.
The offset and the match length must be limited to some maximum constants, and the compression performance of LZ77 depends mainly on these values. Usually the offset is encoded on 12-16 bits, so it is limited to the range 0 to 65535 symbols; there is thus no need to remember more than the last 65535 symbols in the sliding window. The match length is usually encoded on 8 bits, which gives a maximum match length of 255.
Algorithm of LZ77:
while (lookAheadBuffer not empty)
    get a reference (position, length) to the longest match;
    if (length > 0)
        output (position, length, next symbol);
        shift the window length+1 positions along;
    else
        output (0, 0, first symbol in the look-ahead buffer);
        shift the window 1 character along;
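The pseudocode above translates fairly directly into working code. The following is a minimal Python sketch of the triple-based scheme described in this section; the window sizes, the function names and the brute-force match search are illustrative assumptions, not part of the original algorithm specification.

def lz77_encode(data, search_size=4096, lookahead_size=16):
    # Minimal LZ77 encoder: emits (offset, length, next_symbol) triples.
    # offset is the distance back into the search buffer; 0 means no match.
    i, triples = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the window for the longest match with the look-ahead buffer.
        for j in range(max(0, i - search_size), i):
            length = 0
            while (length < lookahead_size
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples

def lz77_decode(triples):
    # Rebuilds the text by copying earlier output and appending literals.
    out = []
    for off, length, sym in triples:
        start = len(out) - off
        for k in range(length):
            out.append(out[start + k])
        out.append(sym)
    return "".join(out)

For example, lz77_decode(lz77_encode("abracadabra")) returns the original string, since every triple either copies a run from the already decoded output or appends a literal symbol.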
There are many ways in which the LZ77 scheme can be made more efficient, and many of the improvements deal with efficient encoding of the triples. There are several variations on the LZ77 scheme; the best known are LZSS, LZH and LZB.
Fig. 1: Lempel-Ziv Derivatives
1) LZSS : LZSS is a lossless data compression algorithm derived from LZ77, created by James Storer and Thomas Szymanski in 1982. The difference between LZ77 and LZSS is that LZSS does not allow a reference (pointer) to be longer than the string it replaces; it uses a break-even point as a threshold to omit such references. It also uses a flag bit to indicate whether the next item is a pointer or a single symbol.
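As an illustration of the flag bit and the break-even threshold, here is a minimal Python sketch; the threshold of 3 symbols and the token layout are assumptions chosen for readability, not values fixed by LZSS itself.

MIN_MATCH = 3  # assumed break-even point: a pointer costs about as much as 3 literals

def lzss_tokens(data, search_size=4096):
    # Each token is either (0, symbol) for a literal
    # or (1, offset, length) for a back-reference.
    i, tokens = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - search_size), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= MIN_MATCH:
            tokens.append((1, best_off, best_len))   # flag bit 1: pointer
            i += best_len
        else:
            tokens.append((0, data[i]))              # flag bit 0: literal symbol
            i += 1
    return tokens

Matches shorter than the threshold are emitted as literals, which is exactly the case where an LZ77 triple would have been longer than the string it replaces.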
2) LZH : LZH is the scheme that combines the Ziv-
Lempel and Huffman techniques. Here coding is performed
in two passes. The first pass is essentially the same as LZSS, while the second uses statistics measured in the first pass to code the pointers and explicit characters using Huffman coding.
Fig. 2 and Fig. 3 indicate that the different ASCII files are compressed to an average bits per character (BPC) a little less than half of the original size. Of the LZ77 series mentioned in this paper, LZB has the lowest average BPC and thereby provides a higher compression rate than the others; its average BPC is around 3.11.
Fig. 2: LZ77 comparison for various data sets
Fig. 3: LZ77 comparison for various data sets
B. LZ78 Series
In 1978 Jacob Ziv and Abraham Lempel presented their dictionary-based scheme, which is known as LZ78. It is a dictionary-based compression algorithm that maintains an explicit dictionary. This dictionary has to be built at both the encoding and the decoding side, and both must follow the same rules to ensure that they use an identical dictionary. Each codeword output by the algorithm consists of two elements <i,c>, where 'i' is an index referring to the longest matching dictionary entry and 'c' is the first non-matching symbol. In addition to outputting the codeword for storage or transmission, the algorithm also adds the corresponding index-and-symbol pair to the dictionary. When a symbol is encountered that is not yet in the dictionary, the codeword has index value 0, and the symbol is added to the dictionary as well. The algorithm gradually builds up the dictionary with this method. The algorithm for LZ78 is given below:
w := NIL;
while (there is input)
    K := next symbol from input;
    if (wK exists in the dictionary)
        w := wK;
    else
        output (index(w), K);
        add wK to the dictionary;
        w := NIL;
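A minimal Python sketch of this encoder is given below; the use of a plain dict as the phrase table and the (index, empty symbol) pair used to flush a trailing phrase at the end of the input are implementation assumptions, not part of the original LZ78 description.

def lz78_encode(data):
    # Minimal LZ78 encoder: emits (index, symbol) pairs and builds the
    # dictionary on the fly; index 0 refers to the empty prefix.
    dictionary = {}            # phrase -> index (1-based)
    w, out = "", []
    for K in data:
        if w + K in dictionary:
            w = w + K
        else:
            out.append((dictionary.get(w, 0), K))
            dictionary[w + K] = len(dictionary) + 1
            w = ""
    if w:
        # Assumed convention: flush a trailing phrase that is already a
        # dictionary entry as a pair with an empty symbol.
        out.append((dictionary[w], ""))
    return out

The decoder can rebuild an identical dictionary from the (index, symbol) stream alone, which is why the dictionary never needs to be transmitted.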
LZ78 can hold patterns for a longer duration than LZ77 because of its explicit dictionary. However, it has the serious drawback of a large dictionary size: as the number of patterns increases, the length of the dictionary also increases, and eventually this affects the performance of the encoding process. One of the main advantages of LZ78 over LZ77 is the dictionary-based compression technique, which helps in faster encoding. The important property of LZ77 that LZ78 preserves is that its decoding process is much faster than the encoding process.
1) LZW : Terry Welch presented his algorithm, based on LZ78 and LZSS, in 1984. Rather than generating the dictionary from scratch, he initialized it with all possible symbols of the input alphabet. If the combination of the current letter and the next letter is not found in the dictionary, the combined word is added to the dictionary. This guarantees that a match will always be found, so LZW only needs to send an index into the dictionary. The input to the encoder is accumulated in a pattern 'w' as long as 'w' is contained in the dictionary. If the addition of another letter 'K' results in a pattern 'wK' that is not in the dictionary, then the index of 'w' is transmitted to the receiver, the pattern 'wK' is added to the dictionary, and a new pattern is started with the letter 'K'. The algorithm then proceeds as follows:
w := NIL;
while (there is input)
    K := next symbol from input;
    if (wK exists in the dictionary)
        w := wK;
    else
        output (index(w));
        add wK to the dictionary;
        w := K;
output (index(w));    (flush the final pattern at the end of input)
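A minimal Python sketch of this encoder, assuming an 8-bit input alphabet pre-loaded as dictionary indices 0-255, is shown below; running it on the string 'thisisthe' reproduces the code sequence 116 104 105 115 258 256 101 used in the trace that follows.

def lzw_encode(data):
    # Minimal LZW encoder: the dictionary is pre-loaded with all 256
    # single-byte symbols, so only indices are transmitted.
    dictionary = {chr(i): i for i in range(256)}
    w, out = "", []
    for K in data:
        if w + K in dictionary:
            w = w + K
        else:
            out.append(dictionary[w])
            dictionary[w + K] = len(dictionary)   # next free index, starting at 256
            w = K
    if w:
        out.append(dictionary[w])                 # flush the final pattern
    return out

# lzw_encode("thisisthe") -> [116, 104, 105, 115, 258, 256, 101]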
One of the most widely used compression algorithms of recent decades is the LZW algorithm. The compression and decompression of the string 'thisisthe' are illustrated below, with each letter represented by its ASCII code.
LZW compression trace:
Current    Next      Output     Add to Dictionary
t(116)     h(104)    t(116)     th(256)
h(104)     i(105)    h(104)     hi(257)
i(105)     s(115)    i(105)     is(258)
s(115)     i(105)    s(115)     si(259)
i(105)     s(115)    -          'is' is already in the dictionary; nothing added
is(258)    t(116)    is(258)    ist(260)
t(116)     h(104)    -          'th' is already in the dictionary; nothing added
th(256)    e(101)    th(256)    the(261)
e(101)     -         e(101)     -
ASCII uses 8 bits to represent each character. In the uncompressed scheme 9 symbols are transmitted, whereas the LZW scheme transmits only 7 codes. In the uncompressed version the total number of bits to be transmitted is 8*9 = 72. In the compressed version, say 9 bits are used to represent each dictionary index, so a total of 9*7 = 63 bits are transmitted.
Data transmitted: 116 104 105 115 258 256 101
Fraction of data transmitted = (63/72)*100 = 87.5%
LZW decompression trace:
Assuming that the data received are not altered by the channel, the received codes are: 116 104 105 115 258 256 101
The decompression process is as follows:
Current    Next      Output           Add to Dictionary
116        104       116              116 104 (256)
104        105       104              104 105 (257)
105        115       105              105 115 (258)
115        258       115              115 105 (259)
258        256       105 115 (258)    105 115 116 (260)
256        101       116 104 (256)    116 104 101 (261)
101        -         101              -
• In the fifth row of the above table the next code is 258, which is already present in the dictionary, so it is replaced by the corresponding codes, i.e. 105 and 115.
• In that same step, since both codes are already known, we might expect to add 258 256, i.e. 105 115 116 104, to the dictionary; but at each step only one new entry can be added. That is why only the first code of the pair 116 104 is added to the dictionary along with 105 115.
• The data obtained at the output of the decompressor are: 116 104 105 115 105 115 116 104 101, which is the ASCII representation of the string 'thisisthe'.
• Since the exact information has been reconstructed in full, this is a lossless data compression scheme.
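For completeness, a minimal Python decoder matching the encoder sketched earlier is given below; the branch for a code that is not yet in the dictionary is the standard LZW special case and does not arise in the 'thisisthe' example.

def lzw_decode(codes):
    # Minimal LZW decoder: rebuilds the encoder's dictionary on the fly.
    dictionary = {i: chr(i) for i in range(256)}
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Special case: the code refers to the entry the encoder
            # created in this very step, so it must start with prev.
            entry = prev + prev[0]
        out.append(entry)
        dictionary[len(dictionary)] = prev + entry[0]  # same rule as the encoder
        prev = entry
    return "".join(out)

# lzw_decode([116, 104, 105, 115, 258, 256, 101]) -> "thisisthe"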
In the original proposal of LZW, the pointer size is chosen
to be 12 bits, allowing for up to 4096 dictionary entries.
Once the limit is reached, the dictionary becomes static.
2) LZFG: LZFG, developed by Fiala and Greene, gives fast encoding and decoding and good compression without undue storage requirements. The algorithm uses the same dictionary-building technique as LZ78; the only difference is that it stores the dictionary elements in a tree data structure. The encoded characters are placed in a window (as in LZ77) so that the oldest phrases can be removed from the dictionary.
The overall performance, in terms of average BPC, of the dictionary coding methods discussed above is shown in Fig. 4 and Fig. 5.
Fig. 4: LZ78 comparison for various data sets
Fig. 5: LZ78 comparison for various data sets
From these figures it is clear that LZFG provides a better average BPC than the others. Its average BPC (2.89) is much lower than those of the other data compression schemes mentioned in this paper.
Compression Ratio :
The compression ratio relates the size of the uncompressed data to the size of the compressed data. Most algorithms have a typical range of compression ratios that they can achieve over a variety of data sets; because of this, it is usually more useful to look at the average compression ratio for a particular method. Compression also affects picture quality: the higher the compression ratio, the poorer the quality of the resulting image, so this fact has to be taken into consideration while compressing data or images.
Compression ratio = (size of original data) / (size of compressed data)
Using the LZW algorithm, 60-70% compression can be achieved for monochrome images and text files with repeated patterns.
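Applied to the 'thisisthe' example above, the definition gives the following, shown as a tiny Python sketch using the paper's own bit counts (the 9-bit index width is the assumption made earlier):

original_bits = 9 * 8          # 9 ASCII characters at 8 bits each
compressed_bits = 7 * 9        # 7 LZW codes at an assumed 9-bit index width
ratio = original_bits / compressed_bits
print(f"compression ratio = {ratio:.2f}")                          # about 1.14
print(f"compressed size = {compressed_bits / original_bits:.1%}")  # 87.5% of original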
II. CONCLUSION
Parkinson's Law: data expands to fill the space available.
In this paper, various techniques derived from LZ77 and LZ78 have been discussed and their algorithms presented. Of these, LZB outperforms the rest of the LZ77 series with an average BPC of 3.11, and in the LZ78 series LZFG outperforms the rest with an average BPC of 2.89. Lempel and Ziv have built the foundation of lossless compression for most of the algorithms that are widely used nowadays; few innovative dictionary algorithms that are not derived from the LZ series have appeared. Many researchers are working in this area to fit more and more data within the available space, in accordance with Parkinson's law.
ACKNOWLEDGMENT
I would like to thank my parents for giving me the opportunity to pursue my dream in this reputed space institute. I would like to thank Dr. Vineeth B.S. for his continuous guidance and support; without his encouragement this research would not have been successful. I am thankful to Dr. V.K. Dadhwal, Director of IIST, for allowing me to do this research. Finally, my deepest gratitude goes to Google and YouTube for helping me build a strong foundation in this area and clear all my doubts.
REFERENCES
[1] Ziv. J and Lempel A., "A Universal Algorithm for Sequential Data
Compression", IEEE Transactions on Information Theory 23 (3), pp. 337-
342, May 1977.
[2] Ziv. J and Lempel A., "Compression of Individual Sequences via Variable-
Rate Coding", IEEE Transactions on Information Theory 24 (5), pp. 530-
536, September 1978.
[3] Huffman D.A., "A method for the construction of minimum- redundancy
codes", Proceedings of the Institute of Radio Engineers, 40 (9), pp. 1098-
1101, September 1952.
[4] Shannon C.E., "A mathematical theory of communication", Bell Sys.
Tech. Jour., vol. 27, pp. 398-403; July, 1948.
[5] Storer J and Szymanski T.G., "Data compression via textual substitution",
Journal of the ACM 29, pp. 928-951, 1982.
[6] Welch T.A., "A technique for high-performance data compression", IEEE
Computer, 17, pp. 8-19, 1984.
[7] Mohammad Banikazemi, "LZB: Data Compression with Bounded Ref-
erences", Proceedings of the 2009 Data Compression Conference, IEEE
Computer Society, 2009.
[8] The Scientist and Engineer’s Guide to Digital Signal Processing by Steven
W. Smith
[9] Data Compression: The Complete Reference by David Salomon
[10] Data Compression in Digital Systems (Digital Multimedia Standards
Series) by Roy Hoffman
[11] Elements of information theory by Thomas M. Cover and Joy A. Thomas
[12] Faller N., "An adaptive system for data compression", In Record of the 7th Asilomar Conference on Circuits, Systems and Computers, pages 593-597, Piscataway, NJ, 1973. IEEE Press.
[13] Fano R.M., "The Transmission of Information", Technical Report No.
65, Research Laboratory of Electronics, M.I.T., Cambridge, Mass.; 1949.
[14] Knuth D.E., "Dynamic Huffman coding", Journal of Algorithms,
6(2):163-180, June 1985.
Litu Rout Indian Institute of Space Science
and Technology
Department of Avionics
Bachelor of Technology
Student id: SC14B101
liturout1997@gmail.com