3. Data compression
In computer science and information theory, data
compression, source coding, or bit-rate reduction involves
encoding information using fewer bits than the original
representation. Compression can be either lossy or lossless.
Lossless Compression
Lossless compression reduces bits by identifying and
eliminating statistical redundancy. No information is lost in
lossless compression.
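The "no information is lost" property can be checked with a round trip. The sketch below uses Python's standard `zlib` module, which implements DEFLATE rather than the LZW scheme covered later; it is only meant to illustrate the lossless property on a deliberately redundant sample input (the sample data is an assumption chosen for illustration).

```python
import zlib

original = b"ABABABABAB" * 100        # highly redundant sample input (assumed)
compressed = zlib.compress(original)  # DEFLATE, a lossless scheme (not LZW)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed form is much shorter here
print(restored == original)            # True: every byte is recovered
```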
Lossy Compression
Lossy compression reduces bits by identifying marginally
important information and removing it. The process of
reducing the size of a data file is popularly referred to as data
compression, although its formal name is source coding (coding
done at the source of the data, before it is stored or transmitted).
4. LZ78 Algorithm
Table 1: The encoding process
Step  Pos  Dictionary  Output
1     1    A           (0,A)
2     2    B           (0,B)
3     3    BC          (2,C)
4     5    BCA         (3,A)
5     8    BA          (2,A)
Input string:
Pos   1 2 3 4 5 6 7 8 9
Char  A B B C B C A B A
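The table can be reproduced with a few lines of code. This is a minimal sketch of an LZ78 encoder, assuming 1-based dictionary indices with 0 meaning "empty prefix", as in the Output column; the function name `lz78_encode` is hypothetical.

```python
def lz78_encode(text):
    """Return a list of (prefix_index, next_char) pairs for text."""
    dictionary = {}   # phrase -> 1-based index; 0 is reserved for "no prefix"
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch              # keep extending the current match
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:                        # input ended in the middle of a known phrase
        output.append((dictionary[phrase], ""))
    return output

print(lz78_encode("ABBCBCABA"))
# [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A')]
```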
5. Introduction to LZW
Static coding schemes require some knowledge about the
data before encoding takes place.
Universal coding schemes, like LZW, do not require advance
knowledge and can build such knowledge on-the-fly.
LZW is the foremost technique for general purpose data
compression due to its simplicity and versatility.
It is the basis of many PC utilities that claim to “double the
capacity of your hard drive”.
LZW compression uses a code table, with 4096 entries (12-bit
codes) as a common choice for the table size.
6. LZW Algorithm
LZW is a "dictionary"-based compression algorithm.
This means that instead of tabulating character counts
and building trees (as for Huffman encoding), LZW
encodes data by referencing a dictionary. Thus, to
encode a substring, only a single code number,
corresponding to that substring's index in the
dictionary, needs to be written to the output file.
LZW (Lempel-Ziv-Welch) is the foremost technique for general
purpose data compression due to its simplicity and
versatility. Typically, you can expect LZW to compress
text, executable code, and similar data files to about
one-half their original size. LZW also performs well
when presented with extremely redundant data files,
such as tabulated numbers, computer source code, and
acquired signals. Compression ratios of 5:1 are
common for these cases. LZW is the basis of several
personal computer utilities that claim to "double the
capacity of your hard drive."
7. Introduction to LZW (cont'd)
Codes 0-255 in the code table are always assigned to
represent single bytes from the input file.
When encoding begins, the code table contains only the first
256 entries; the remainder of the table is blank.
Compression is achieved by using codes 256 through 4095
to represent sequences of bytes.
As the encoding continues, LZW identifies repeated
sequences in the data, and adds them to the code table.
Decoding is achieved by taking each code from the
compressed file, and translating it through the code table to
find what character or characters it represents.
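The layout of the code table can be sketched as follows, assuming the common 4096-entry size mentioned earlier. The names `MAX_CODES`, `CODE_BITS`, `code_table`, and `translate` are hypothetical, and the entry added for code 256 is only an example of what an encoder might store.

```python
MAX_CODES = 4096                          # common table size (assumption)
CODE_BITS = MAX_CODES.bit_length() - 1    # 12 bits are enough to index 4096 entries

# Codes 0-255 always stand for single bytes; codes 256-4095 start out blank.
code_table = {code: bytes([code]) for code in range(256)}

def translate(code):
    """Decoder-side lookup: return the byte string a code stands for."""
    return code_table[code]

code_table[256] = b"BA"                   # example of a sequence added during encoding
print(CODE_BITS)                          # 12
print(translate(66), translate(256))      # b'B' b'BA'
```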
8. LZW Encoding Algorithm
Initialize table with single character strings
P = first input character
WHILE not end of input stream
    C = next input character
    IF P + C is in the string table
        P = P + C
    ELSE
        output the code for P
        add P + C to the string table
        P = C
END WHILE
output the code for P
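The pseudocode translates almost line for line into Python. This is a minimal sketch, not a reference implementation: it works on a string of characters with codes 0-255, returns a list of integer codes instead of packed bits, and simply stops adding entries once the table holds `max_codes` strings (one possible policy; the slides do not say what happens when the table is full). The name `lzw_encode` is hypothetical.

```python
def lzw_encode(text, max_codes=4096):
    """Return the list of LZW codes for text (characters must be in 0-255)."""
    table = {chr(i): i for i in range(256)}   # single-character strings
    if not text:
        return []
    codes = []
    p = text[0]
    for c in text[1:]:
        if p + c in table:
            p = p + c                         # extend the current match
        else:
            codes.append(table[p])            # emit code for the longest match
            if len(table) < max_codes:
                table[p + c] = len(table)     # next free code: 256, 257, ...
            p = c
    codes.append(table[p])                    # flush the final match
    return codes

print(lzw_encode("BABAABAAA"))   # [66, 65, 256, 257, 65, 260]
```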
9. Example 1: Compression using LZW
Example 1: Use the LZW
algorithm to compress the string
BABAABAAA
10. Example 1: LZW Compression Step 1
BABAABAAA    P=A    C=empty
ENCODER OUTPUT             STRING TABLE
output code  representing  codeword  string
66           B             256       BA
11. Example 1: LZW Compression Step 2
BABAABAAA    P=B    C=empty
ENCODER OUTPUT             STRING TABLE
output code  representing  codeword  string
66           B             256       BA
65           A             257       AB
12. Example 1: LZW Compression Step 3
BABAABAAA    P=A    C=empty
ENCODER OUTPUT             STRING TABLE
output code  representing  codeword  string
66           B             256       BA
65           A             257       AB
256          BA            258       BAA
13. Example 1: LZW Compression Step 4
BABAABAAA    P=A    C=empty
ENCODER OUTPUT             STRING TABLE
output code  representing  codeword  string
66           B             256       BA
65           A             257       AB
256          BA            258       BAA
257          AB            259       ABA
14. Example 1: LZW Compression Step 5
BABAABAAA    P=A    C=A
ENCODER OUTPUT             STRING TABLE
output code  representing  codeword  string
66           B             256       BA
65           A             257       AB
256          BA            258       BAA
257          AB            259       ABA
65           A             260       AA
15. Example 1: LZW Compression Step 6
BABAABAAA    P=AA    C=empty
ENCODER OUTPUT             STRING TABLE
output code  representing  codeword  string
66           B             256       BA
65           A             257       AB
256          BA            258       BAA
257          AB            259       ABA
65           A             260       AA
260          AA
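The rows of the final table can be regenerated programmatically, which is a convenient way to check a hand-worked example. The sketch below is a tracing variant of the encoder, assuming non-empty input and no table size limit (reasonable for short examples only); the name `lzw_trace` is hypothetical.

```python
def lzw_trace(text):
    """Yield (output_code, output_string, new_codeword, new_string) per step."""
    table = {chr(i): i for i in range(256)}
    p = text[0]                       # assumes text is not empty
    for c in text[1:]:
        if p + c in table:
            p += c
            continue
        new_code = len(table)
        table[p + c] = new_code
        yield table[p], p, new_code, p + c
        p = c
    yield table[p], p, None, None     # the final flush adds no table entry

for row in lzw_trace("BABAABAAA"):
    print(row)
# (66, 'B', 256, 'BA')
# (65, 'A', 257, 'AB')
# (256, 'BA', 258, 'BAA')
# (257, 'AB', 259, 'ABA')
# (65, 'A', 260, 'AA')
# (260, 'AA', None, None)
```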
16. LZW Decompression
The LZW decompressor creates the same string table
during decompression.
It starts with the first 256 table entries initialized to single
characters.
The string table is updated for each code in the input
stream, except the first one.
Decoding is achieved by reading codes and translating them
through the code table as it is being built.
17. LZW Decompression Algorithm
Initialize table with single character strings
OLD = first input code
output translation of OLD
C = translation of OLD
WHILE not end of input stream
    NEW = next input code
    IF NEW is not in the string table
        S = translation of OLD
        S = S + C
    ELSE
        S = translation of NEW
    output S
    C = first character of S
    add translation of OLD + C to the string table
    OLD = NEW
END WHILE
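A Python rendering of the decompression steps is sketched below, under the same assumptions as a simple list-of-integers encoder: codes arrive as a list, characters have codes 0-255, and no new entries are added once the table holds `max_codes` strings (an assumed policy that has to match the encoder). The name `lzw_decode` is hypothetical. The branch for a code that is not yet in the table handles the case where the encoder uses an entry immediately after creating it, as with code 260 in the example sequence.

```python
def lzw_decode(codes, max_codes=4096):
    """Rebuild the original string from a list of LZW codes."""
    table = {i: chr(i) for i in range(256)}   # single-character strings
    if not codes:
        return ""
    old = codes[0]
    pieces = [table[old]]
    c = table[old]                            # the first code is always one character
    for new in codes[1:]:
        if new in table:
            s = table[new]
        else:
            s = table[old] + c                # code was created by the previous step
        pieces.append(s)
        c = s[0]
        if len(table) < max_codes:
            table[len(table)] = table[old] + c
        old = new
    return "".join(pieces)

print(lzw_decode([66, 65, 256, 257, 65, 260]))   # BABAABAAA
```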
18. Example 2: LZW Decompression 1
Example 2: Use LZW to decompress the
output sequence of
Example 1:
<66><65><256><257><65><260>.
19. Example 2: LZW Decompression Step 1
<66><65><256><257><65><260>    Old = 65    New = 65    S = A    C = A
DECODER OUTPUT    STRING TABLE
string            codeword  string
B
A                 256       BA
20. Example 2: LZW Decompression Step 2
<66><65><256><257><65><260>    Old = 256    New = 256    S = BA    C = B
DECODER OUTPUT    STRING TABLE
string            codeword  string
B
A                 256       BA
BA                257       AB
21. Example 2: LZW Decompression Step 3
<66><65><256><257><65><260>    Old = 257    New = 257    S = AB    C = A
DECODER OUTPUT    STRING TABLE
string            codeword  string
B
A                 256       BA
BA                257       AB
AB                258       BAA
22. Example 2: LZW Decompression Step 4
<66><65><256><257><65><260>    Old = 65    New = 65    S = A    C = A
DECODER OUTPUT    STRING TABLE
string            codeword  string
B
A                 256       BA
BA                257       AB
AB                258       BAA
A                 259       ABA
23. Example 2: LZW Decompression Step 5
<66><65><256><257><65><260>    Old = 260    New = 260    S = AA    C = A
DECODER OUTPUT    STRING TABLE
string            codeword  string
B
A                 256       BA
BA                257       AB
AB                258       BAA
A                 259       ABA
AA                260       AA