This document discusses Lempel-Ziv algorithms for lossless data compression. It introduces the LZ77 and LZ78 algorithms, which use adaptive dictionaries to encode repeated patterns in data. The document describes how LZ77 uses a sliding window to find the longest matching string and represents it with an offset-length pair. It also explains how LZ78 builds an explicit dictionary as it encodes. The document provides examples and discusses improvements made to these original Lempel-Ziv algorithms.
1. LEMPEL-ZIV ALGORITHMS
BY
S T RAJAN
CSN-CJB0912010
Ramaiah School of Advanced Studies –Bangalore
M.S 1
2. INTRODUCTION
• Lempel-Ziv is a lossless data compression method
• The general idea comes from the pigeonhole principle.
• If n items are placed into m pigeonholes where n > m, then at least one
pigeonhole must hold more than one item.
Fig: A pigeonhole
• An image of pigeons in holes. Here there are n = 10 pigeons in m = 9 holes.
Since 10 is greater than 9, the pigeonhole principle says that at least one
hole has more than one pigeon
• How well this works varies with the input, so compression cannot be achieved
every time.
3. Lempel Ziv Algorithms
• Lossless data compression is a technique that allows the original
information to be reconstructed exactly from the compressed data.
• Like Huffman coding, run-length coding, arithmetic coding, etc., Lempel-Ziv
is a frequently used lossless data compression technique.
• The Lempel-Ziv algorithms belong to yet another category of lossless
compression techniques known as dictionary coders.
• Abraham Lempel and Jacob Ziv published their first compression method,
sometimes referred to as "LZ77", in 1977, in an article entitled "A Universal
Algorithm for Sequential Data Compression". The pair wrote another paper in
1978 outlining another dictionary approach, known as the LZ78 algorithm,
which was modified by Terry Welch in 1984.
4. Lempel-Ziv Algorithm Family
LZ77 family: LZR, LZSS, LZH, LZB
LZ78 family: LZW, LZC, LZT, LZMW, LZJ, LZFG
APPLICATIONS (LZ77 family):
• ZIP
• GZIP
• STACKER
APPLICATIONS (LZ78 family):
• GIF
• V.42
• COMPRESS
Fig 1: Lempel-Ziv Algorithm Family
5. Types of Dictionary
• The dictionary holds a list of strings of symbols, and it may be static or
dynamic (adaptive).
• Static dictionary – permanent, sometimes allowing the addition of strings
but no deletions
• Dynamic dictionary – holds strings previously found in the input stream,
allowing for additions and deletions of strings as new input is read
• LZ algorithms use an adaptive dictionary
• The dictionary is built in a single pass, while encoding takes place at the
same time.
• The algorithm continuously rewrites the dictionary for a file, discarding
patterns it previously included and adding new ones when necessary.
6. LZ77
General approach
• The dictionary is a portion of the previously encoded sequence
• A sliding window is used for compression
Mechanism
• Find the longest match in the search buffer for the string at the start of
the look-ahead buffer, and encode it
Rationale
• If patterns tend to repeat locally, we should be able to get a
more efficient representation
7. LZ77
• Sliding window is composed of a search buffer and a look ahead buffer
(note: window size W = S + LA).
Fig: Sliding window with the match pointer and the search pointer marked; the
search buffer has size S = 8 and the look-ahead buffer has size LA = 7
8. Explanation
• Offset = search pointer – match pointer (o = 7)
• Length of match = number of consecutive letters matched (l = 4)
• Code word c = C(r), where C(r) is the code word for r, the symbol that
follows the match
• Encoding triple: <o, l, c> = <7, 4, C(r)>
• If a fixed-length code (FLC) is used and the alphabet size is |A|, <o, l, c>
can be encoded with ⌈log2 S⌉ + ⌈log2 W⌉ + ⌈log2 |A|⌉ bits.
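The bit count for a fixed-length triple can be checked with a short calculation. This is only an illustrative sketch: the helper names are made up for this example, and the alphabet size of 256 (one byte per symbol) is an assumption, not a value given on the slide.

```python
def ceil_log2(n):
    """Number of bits needed to index n distinct values, i.e. ceil(log2(n))."""
    return (n - 1).bit_length()


def triple_bits(search_size, window_size, alphabet_size):
    """Bits for one <o, l, c> triple under a fixed-length code."""
    return (ceil_log2(search_size)      # offset field
            + ceil_log2(window_size)    # match-length field
            + ceil_log2(alphabet_size)) # symbol field


# S = 8 and LA = 7 as on the sliding-window slide, so W = S + LA = 15.
# The 256-symbol alphabet is an assumption made for this example.
print(triple_bits(8, 15, 256))  # 3 + 4 + 8 = 15
```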
9. Possible Cases for Triples
• There could be three different possibilities that may
be encountered during the coding process:
-No match for the next character to be encoded in the window
-There is a match
-The matched string extends inside the look-ahead buffer
• For each of these cases, we have a triple to signal
the case to the decoder.
10. ENCODING
• Sequence: cabracadabrarrarrad, with W = 13 and S = 7
• Step 1: |cabraca|dabrar|rarrad
No match for d, so send <0, 0, C(d)>
• Step 2: |abracad|abrarr|arrad
The longest match is "abra", starting 7 positions back, so send <7, 4, C(r)>
• Step 3: |adabrar|rarrad|
Matching only inside the search buffer gives "rar", so send <3, 3, C(r)>
Could we do better? The match can extend into the look-ahead buffer:
send <3, 5, C(d)> instead
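The encoding steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name and the `start` parameter (used to treat the first seven letters as already encoded, as the slide does) are assumptions of this example, and the raw symbol stands in for its code word C(x).

```python
def lz77_encode(data, search_size, lookahead_size, start=0):
    """Return <offset, length, symbol> triples for data[start:].

    data[:start] is assumed to be already encoded and only serves as
    the initial contents of the search buffer.
    """
    triples = []
    pos = start
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Keep one symbol of the look-ahead buffer for the third field.
        max_length = min(lookahead_size, len(data) - pos) - 1
        for candidate in range(max(0, pos - search_size), pos):
            length = 0
            # candidate + length may run past pos: the match is allowed
            # to extend into the look-ahead buffer.
            while (length < max_length
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1
    return triples


# S = 7 and W = 13, so the look-ahead buffer holds 6 symbols.
print(lz77_encode("cabracadabrarrarrad", 7, 6, start=7))
# [(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')]
```

The linear scan over the search buffer keeps the sketch short; practical encoders typically use hashing or trees to find matches faster.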
11. DECODING
• Current input: <0, 0, C(d)> <7, 4, C(r)> <3, 5, C(d)>
• Current output: cabraca
Decode: <0, 0, C(d)>
Decode C(d): c|abracad|
Decode: <7, 4, C(r)>
Start with the first 'a', copy four letters: cabra|cadabra|
Decode C(r): cabrac|adabrar|
Decode: <3, 5, C(d)>
Start with the first 'r', copy three letters: cabracada|brarrar|
Copy two more letters: cabracadabr|arrarra|
Decode C(d): cabracadabrarrarrad
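The decoding walk-through above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name and the `initial` parameter (the already decoded prefix "cabraca") are illustrative, and each triple carries the raw symbol in place of its code word.

```python
def lz77_decode(triples, initial=""):
    """Rebuild the text from <offset, length, symbol> triples."""
    output = list(initial)
    for offset, length, symbol in triples:
        start = len(output) - offset
        # Copy one symbol at a time so that a match overlapping the
        # part being written (offset < length) is handled correctly.
        for i in range(length):
            output.append(output[start + i])
        output.append(symbol)
    return "".join(output)


triples = [(0, 0, "d"), (7, 4, "r"), (3, 5, "d")]
print(lz77_decode(triples, initial="cabraca"))
# cabracadabrarrarrad
```

The symbol-by-symbol copy is what makes the triple <3, 5, C(d)> decodable: the last two copied letters are read from positions that were written earlier in the same copy.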
12. Algorithm
while (lookAheadBuffer not empty) {
    get a reference (position, length) to longest match;
    if (length > 0) {
        output (position, length, next symbol);
        shift the window length+1 positions along;
    } else {
        output (0, 0, first symbol in the lookahead buffer);
        shift the window 1 character along;
    }
}
13. Points
• For LZ77, we have
-An adaptive scheme that needs no prior knowledge of the source
-Asymptotically approaches the source statistics
-Assumes that recurring patterns occur close to each other
• Possible improvements
-Variable-bit encoding: PKZip, zip, gzip, etc. use a
variable-length coder to encode <o, l, c>.
-Variable buffer size: a larger buffer requires faster searches
-Elimination of <0, 0, C(x)>: LZSS sends a flag bit to signal whether the
next "token" is an <o, l> pair or the codeword of a symbol
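The LZSS flag-bit idea can be sketched as follows. This is a simplified illustration, not the published LZSS format: tokens are kept as Python tuples instead of packed bits, and the function names and the `min_match` threshold of 2 are assumptions of this example.

```python
def lzss_encode(data, search_size, lookahead_size, min_match=2):
    """Emit (1, offset, length) for a match or (0, symbol) for a literal."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        max_length = min(lookahead_size, len(data) - pos)
        for candidate in range(max(0, pos - search_size), pos):
            length = 0
            while (length < max_length
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        if best_length >= min_match:
            tokens.append((1, best_offset, best_length))  # flag 1: <o, l> pair
            pos += best_length
        else:
            tokens.append((0, data[pos]))                 # flag 0: plain symbol
            pos += 1
    return tokens


def lzss_decode(tokens):
    """Invert lzss_encode by replaying literals and back-references."""
    output = []
    for token in tokens:
        if token[0] == 1:
            _, offset, length = token
            start = len(output) - offset
            for i in range(length):
                output.append(output[start + i])
        else:
            output.append(token[1])
    return "".join(output)


tokens = lzss_encode("cabracadabrarrarrad", 7, 6)
print(tokens)
print(lzss_decode(tokens))
```

Because a match no longer drags a trailing symbol along, the <0, 0, C(x)> triple disappears: an unmatched symbol costs one flag bit plus its codeword.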
14. Improvements
• LZR
The Lempel-Ziv-Renau modification allows pointers to reference anything
that has been encoded, without being limited by the length of the search
buffer.
• LZSS
In LZ77, the mandatory inclusion of the next non-matching symbol in each
codeword leads to situations in which the symbol is explicitly coded despite
the possibility of it being part of the next match. The popular modification
by Storer and Szymanski (1982) removes this requirement.
• LZB
LZB uses an elaborate scheme for encoding the references and lengths
with varying sizes.
• LZH
The LZH implementation employs Huffman coding to compress the pointers.
15. LZ78
• LZ78 improvements over LZ77
-No search buffer – explicit dictionary instead
-Encoder and decoder must build the dictionary in sync
-Encoding: <i, c>
i = index in dictionary table
c = code of the following character
• Example: encode the following contents
wabba_wabba_wabba_wabba_woo_woo_woo
16. EXAMPLE-1
• Input: wabba_wabba_wabba_wabba_woo_woo_woo
• Dictionaries: the initial dictionary is empty; the final dictionary is
built up entry by entry as follows
Encoder output    Index    Entry
<0,c(w)>          1        w
<0,c(a)>          2        a
<0,c(b)>          3        b
<3,c(a)>          4        ba
<0,c(_)>          5        _
<1,c(a)>          6        wa
<3,c(b)>          7        bb
<2,c(_)>          8        a_
17. EXAMPLE (Continued)
Encoder output    Index    Entry
<6,c(b)>          9        wab
<4,c(_)>          10       ba_
<9,c(b)>          11       wabb
<8,c(w)>          12       a_w
<0,c(o)>          13       o
<13,c(_)>         14       o_
<1,c(o)>          15       wo
<14,c(w)>         16       o_w
<13,c(o)>         17       oo
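The example can be replayed with a small LZ78 encoder. This is a minimal sketch: the function name is illustrative, the raw character stands in for its code c(x), and the handling of input that ends inside a known phrase (emitting the index with an empty symbol) is an assumption, since the slides do not cover that case.

```python
def lz78_encode(data):
    """Return the list of <index, symbol> pairs and the final dictionary."""
    dictionary = {}  # phrase -> index, numbered from 1; index 0 means "no prefix"
    output = []
    phrase = ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol  # keep extending the current match
        else:
            output.append((dictionary.get(phrase, 0), symbol))
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Assumption: flush a trailing known phrase with an empty symbol.
        output.append((dictionary[phrase], ""))
    return output, dictionary


pairs, dictionary = lz78_encode("wabba_wabba_wabba_wabba_woo_woo_woo")
for (index, symbol), entry in zip(pairs, dictionary):
    print(f"<{index},c({symbol})>  {dictionary[entry]:>2}  {entry}")
```

The loop at the end prints one row per dictionary entry, in the same three-column layout as the example table.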
18. Remarks
• Observation
If we keep on encoding, the dictionary will keep on growing
• Possible solutions
-Stop growing the dictionary: effectively switch to a static dictionary
-Prune it: based on usage statistics
-Reset it: start all over again
• The best solution depends on the knowledge of the source
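As one illustration of the "reset it" option, the sketch below caps an LZ78 dictionary at a fixed number of entries and starts over whenever the cap is reached. The function names, the `max_entries` parameter, and the exact reset rule are assumptions made for this example; the decoder applies the same rule so both sides stay in sync.

```python
def lz78_encode_with_reset(data, max_entries):
    """LZ78 encoding where the dictionary is cleared once it is full."""
    dictionary, output, phrase = {}, [], ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol
            continue
        output.append((dictionary.get(phrase, 0), symbol))
        dictionary[phrase + symbol] = len(dictionary) + 1
        phrase = ""
        if len(dictionary) >= max_entries:
            dictionary = {}  # reset: start all over again
    if phrase:
        output.append((dictionary[phrase], ""))  # flush a trailing known phrase
    return output


def lz78_decode_with_reset(pairs, max_entries):
    """Mirror of the encoder: rebuild entries and reset at the same points."""
    entries, output = [""], []  # entries[0] is the empty prefix
    for index, symbol in pairs:
        phrase = entries[index] + symbol
        output.append(phrase)
        entries.append(phrase)
        if len(entries) - 1 >= max_entries:
            entries = [""]
    return "".join(output)


text = "wabba_wabba_wabba_wabba_woo_woo_woo"
pairs = lz78_encode_with_reset(text, max_entries=4)
print(lz78_decode_with_reset(pairs, max_entries=4) == text)  # True
```

A small cap keeps indices short but forgets useful phrases quickly, which is why the best policy depends on what is known about the source.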
19. Improvements
• LZ78 has a limitation: its explicit dictionary keeps growing as encoding
proceeds.
• LZW was developed by Terry Welch in 1984 as a modification of LZ78.
• IDEA: instead of <i, c>, encode i only
• To make this possible, the dictionary has to be initialized with all the
symbols of the input alphabet, and this initial dictionary needs to be made
known to the decoder.
20. • Input: wabba_wabba_wabba_wabba_woo_woo_woo
• Output: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
• Initial dictionary:
INDEX  ENTRY
1      _
2      a
3      b
4      o
5      w
• Final dictionary:
INDEX  ENTRY    INDEX  ENTRY
1      _        14     a_w
2      a        15     wabb
3      b        16     ba_
4      o        17     _wa
5      w        18     abb
6      wa       19     ba_w
7      ab       20     wo
8      bb       21     oo
9      ba       22     o_
10     a_       23     _wo
11     _w       24     oo_
12     wab      25     _woo
13     bba
21. Algorithm
p = empty string
while (!done)
    read next symbol into a
    if (p*a) is in dictionary    // Note: '*' stands for concatenation
        p = p*a
    else
        send out index of p
        add p*a to the dictionary
        p = a
end
send out index of p              // flush the last phrase
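The pseudocode above translates almost line by line into Python. This is a hedged sketch: the function names are illustrative, indices start at 1 to match the LZW example, and every input symbol is assumed to be present in the `alphabet` argument. A matching decoder is included to show how the dictionary is rebuilt without being transmitted.

```python
def lzw_encode(data, alphabet):
    """Return the list of output indices and the final dictionary."""
    dictionary = {s: i for i, s in enumerate(alphabet, start=1)}
    output = []
    phrase = ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol
        else:
            output.append(dictionary[phrase])
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = symbol
    if phrase:
        output.append(dictionary[phrase])  # flush the last phrase
    return output, dictionary


def lzw_decode(codes, alphabet):
    """Rebuild the text; the dictionary is reconstructed on the fly."""
    if not codes:
        return ""
    dictionary = {i: s for i, s in enumerate(alphabet, start=1)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the entry the encoder had only just
            # created: it must be previous + its own first symbol.
            entry = previous + previous[0]
        output.append(entry)
        dictionary[len(dictionary) + 1] = previous + entry[0]
        previous = entry
    return "".join(output)


codes, dictionary = lzw_encode("wabba_wabba_wabba_wabba_woo_woo_woo", "_abow")
print(codes)
# [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]
print(lzw_decode(codes, "_abow"))
# wabba_wabba_wabba_wabba_woo_woo_woo
```

The alphabet order "_abow" reproduces the initial dictionary of the example (1 = _, 2 = a, 3 = b, 4 = o, 5 = w).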
22. APPLICATION:COMPRESS
• An early implementation of LZW
• Adaptive dictionary, starts with 2^(bmax–1) entries
• Dictionary grows up to double in size (2^bmax)
• User can configure the maximum codeword length, bmax = 9 to 16
• When the dictionary reaches 2^bmax entries, it becomes a static dictionary
encoder
• If the compression ratio falls below a threshold, the dictionary is reset.
APPLICATION:
GIF IMAGES (LZW)
PNG IMAGES (DEFLATE, which is based on LZ77 rather than LZW)
23. References
• BELL, T. C., CLEARY, J. G., AND WITTEN, I. H. Text Compression.
Prentice Hall, Upper Saddle River, NJ, 1990.
• SAYOOD, K. Introduction to Data Compression. Academic Press, San
Diego, CA, 1996, 2000.
• ZIV, J., AND LEMPEL, A. A universal algorithm for sequential data
compression. IEEE Transactions on Information Theory 23 (1977), 337
• ZIV, J., AND LEMPEL, A. Compression of individual sequences via
variable-rate coding. IEEE Transactions on Information Theory 24
(1978), 530–536.