Compression Techniques
Introduction
What is Compression?
Data compression requires the identification and
extraction of source redundancy.
In other words, data compression seeks to reduce
the number of bits used to store or transmit
information.
There is a wide range of compression methods, which can be so unlike
one another that they have little in common except that they
compress data.
Compression can be categorized in two broad ways:

• Lossless compression
• Lossy compression
Lossless compression

• recovers the exact original data after decompression.

• mainly used for compressing database records,
spreadsheets or word-processing files, where exact
replication of the original is essential.
Lossy compression
• will result in a certain loss of accuracy in exchange for
   a substantial increase in compression.
• more effective when used to compress graphic images
and digitised voice where losses outside visual or aural
perception can be tolerated.
• Most lossy compression techniques can be adjusted to
different quality levels, gaining higher accuracy in
exchange for less effective compression.
The Need For Compression…
In terms of storage, the capacity of a storage
device can be effectively increased with methods
that compress a body of data on its way to a
storage device and decompress it when it is
retrieved.
In terms of communications, the bandwidth of a
digital communication link can be effectively
increased by compressing data at the sending
end and decompressing it at the receiving
end.
A Brief History of Data Compression
• The late 1940s were the early years of information
theory, when the idea of developing efficient new
coding methods was just starting to be fleshed
out. Ideas of entropy, information content and
redundancy were explored.
• One popular notion held that if the probabilities of
the symbols in a message were known, there ought
to be a way to code the symbols so that the
message would take up less space.
• The first well-known method for compressing
digital signals is now known as Shannon-Fano
coding. Shannon and Fano [~1948]
independently developed this algorithm, which
assigns binary codewords to the unique symbols that
appear within a given data file.
• While Shannon-Fano coding was a great leap
forward, it had the unfortunate luck to be quickly
superseded by an even more efficient coding
system: Huffman coding.
• Huffman coding [1952] shares most
  characteristics of Shannon-Fano coding.
• Huffman coding could perform effective data
  compression by reducing the amount of
  redundancy in the coding of symbols.
• It has been proven optimal among codes that assign
  each symbol its own whole-number-of-bits codeword
  (the codewords themselves are variable-length).
• In recent decades, Huffman coding has largely
  been superseded by arithmetic coding.
• Arithmetic coding bypasses the idea of
  replacing an input symbol with a specific code.
• It replaces a stream of input symbols with a
  single floating-point output number.
• More bits are needed in the output number
  for longer, complex messages.
Terminology
• Compressor–Software (or hardware) device
  that compresses data
• Decompressor–Software (or hardware)
  device that decompresses data
• Codec–Software (or hardware) device that
  compresses and decompresses data
• Algorithm–The logic that governs the
  compression/decompression process
Lossless Compression Algorithms:
•   Repetitive Sequence Suppression
•   Run-length Encoding*
•   Pattern Substitution
•   Entropy Encoding*
     The Shannon-Fano Algorithm
     Huffman Coding*
     Arithmetic Coding*
Repetitive Sequence Suppression
• If a series of n successive identical tokens appears,
replace the series with a single token and a count of the number of
occurrences.
• Usually need to have a special flag to denote when the
repeated token appears.
• Example
89400000000000000000000000000000000
• We can replace this with 894f32, where f is the flag for
  zero, as sketched below.
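
A minimal Python sketch of this zero-suppression scheme, assuming a
digit-string input and using "f" as the flag character from the example
(a real scheme would also need to escape any literal "f" in the data):

    def suppress_zeros(data: str, flag: str = "f") -> str:
        """Replace each run of '0' digits with flag + run length."""
        out = []
        i = 0
        while i < len(data):
            if data[i] == "0":
                run = 0
                while i < len(data) and data[i] == "0":
                    run += 1
                    i += 1
                out.append(f"{flag}{run}")    # e.g. 32 zeros -> "f32"
            else:
                out.append(data[i])
                i += 1
        return "".join(out)

    print(suppress_zeros("894" + "0" * 32))   # -> 894f32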
Run-length Encoding

Example:
• Original Sequence:
111122233333311112222
• can be encoded as:
(1,4),(2,3),(3,6),(1,4),(2,4)
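
A short runnable sketch of this (symbol, run-length) encoding in Python:

    from itertools import groupby

    def rle_encode(data: str) -> list[tuple[str, int]]:
        """Collapse each run of identical symbols into a (symbol, count) pair."""
        return [(sym, len(list(run))) for sym, run in groupby(data)]

    def rle_decode(pairs: list[tuple[str, int]]) -> str:
        """Expand the (symbol, count) pairs back to the original string."""
        return "".join(sym * count for sym, count in pairs)

    encoded = rle_encode("111122233333311112222")
    print(encoded)   # [('1', 4), ('2', 3), ('3', 6), ('1', 4), ('2', 4)]
    assert rle_decode(encoded) == "111122233333311112222"

Note that RLE only pays off when runs are long; a later example shows a
case where it doubles the size.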
Run-Length Encoding (RLE) Method

Example (figure omitted): a row of coloured pixels is encoded as its runs:
blue x 6, magenta x 7, red x 3, yellow x 3 and green x 4
Run-Length Encoding (RLE) Method
• Example (figure omitted): an image with almost no long runs.
  Encoding it run by run would give a result which is twice the size!
• Uncompressed:
  Blue White White White White White White Blue
  White Blue White White White White White Blue
  etc.
• Compressed:
  1xBlue 6xWhite 1xBlue
  1xWhite 1xBlue 4xWhite 1xBlue 1xWhite
  etc.
The Shannon-Fano Algorithm
• Example
• Data: ABBAAAACDEAAABBBDDEEAAA........
• Count symbols in stream:
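
The count table itself did not survive, but the procedure can be
sketched: sort symbols by frequency, then recursively split the list
into two halves of near-equal total count, appending 0 to one half's
codes and 1 to the other's. A minimal Python sketch (the cut-selection
rule is one common variant; the stream above is truncated, so the
counts cover only the visible prefix):

    from collections import Counter

    def shannon_fano(counts: dict[str, int]) -> dict[str, str]:
        """Assign binary codewords by recursively splitting the
        frequency-sorted symbol list into two near-equal-weight halves."""
        codes: dict[str, str] = {}

        def split(items, prefix):
            if len(items) == 1:
                codes[items[0][0]] = prefix or "0"
                return
            total = sum(c for _, c in items)
            # choose the cut that makes the two halves' totals most equal
            acc, best_cut, best_diff = 0, 1, float("inf")
            for i, (_, c) in enumerate(items[:-1], start=1):
                acc += c
                if abs(total - 2 * acc) < best_diff:
                    best_diff, best_cut = abs(total - 2 * acc), i
            split(items[:best_cut], prefix + "0")
            split(items[best_cut:], prefix + "1")

        split(sorted(counts.items(), key=lambda kv: -kv[1]), "")
        return codes

    data = "ABBAAAACDEAAABBBDDEEAAA"   # visible prefix only
    print(Counter(data))               # symbol counts
    print(shannon_fano(Counter(data)))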
Arithmetic Coding
Example
• Raw data: BACA
Therefore:
• A occurs with probability 2/4 = 0.5
• B and C each occur with probability 1/4 = 0.25
Cont..

• Start by assigning each symbol its own sub-range of the
probability range 0–1 (assignment table omitted).

The first symbol in our example stream is B
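
A worked sketch of the interval narrowing in Python. The particular
sub-range assignment below (A, then B, then C) is an assumption, since
the slide's table did not survive; any consistent assignment works as
long as encoder and decoder agree:

    def arithmetic_encode(message: str, ranges: dict[str, tuple[float, float]]):
        """Narrow [low, high) through each symbol's sub-range; any number
        inside the final interval identifies the whole message."""
        low, high = 0.0, 1.0
        for sym in message:
            span = high - low
            r_low, r_high = ranges[sym]
            low, high = low + span * r_low, low + span * r_high
            print(f"after {sym}: [{low:.6f}, {high:.6f})")
        return low, high

    # Assumed sub-range table: A -> [0.0, 0.5), B -> [0.5, 0.75), C -> [0.75, 1.0)
    ranges = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}
    arithmetic_encode("BACA", ranges)
    # final interval is [0.593750, 0.609375); e.g. 0.6 suffices to encode BACA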
Applications: Lossless Compression
• The above is a very simple example of run-length
  encoding, wherein large runs of consecutive identical
  data values are replaced by a simple code giving the
  data value and the length of the run. This is an
  example of lossless data compression.
• It is often used to optimize disk space on office
  computers, or to make better use of the connection
  bandwidth in a computer network.
Lossy image compression

• is used in digital cameras to increase storage
  capacity with minimal degradation of picture quality.
LZW Coding Algorithm…
function LZW_Encode(File)
    // dictionary is pre-loaded with all single-byte strings
    w ← ReadByte(File)
    k ← ReadByte(File)
    while k ≠ EOF do
        if IndexInDict?(w + k) then        // w + k: w with k appended
            w ← w + k
        else
            Output(GetIndex(w))
            AddDict(w + k)
            w ← k
        k ← ReadByte(File)
    Output(GetIndex(w))

function LZW_Decode(File)
    n ← ReadIndex(File)
    Output(GetString(n))
    m ← ReadIndex(File)
    while m ≠ EOF do
        if IndexInDict?(m) then
            s ← GetString(m)
        else
            s ← GetString(n) + FirstChar(GetString(n))   // index not yet in dictionary
        Output(s)
        AddDict(GetString(n) + FirstChar(s))
        n ← m
        m ← ReadIndex(File)
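For comparison, here is a compact runnable sketch of the same pair in
Python (seeding the dictionary with the 256 single-byte symbols, as the
pseudocode assumes):

    def lzw_encode(data: str) -> list[int]:
        """Emit the dictionary index of each longest-known prefix."""
        dictionary = {chr(i): i for i in range(256)}    # single-byte seeds
        w, out = "", []
        for k in data:
            if w + k in dictionary:
                w += k                                  # keep extending the phrase
            else:
                out.append(dictionary[w])
                dictionary[w + k] = len(dictionary)     # learn the new phrase
                w = k
        if w:
            out.append(dictionary[w])
        return out

    def lzw_decode(codes: list[int]) -> str:
        """Rebuild the same dictionary on the fly while decoding."""
        dictionary = {i: chr(i) for i in range(256)}
        w = dictionary[codes[0]]
        out = [w]
        for n in codes[1:]:
            entry = dictionary.get(n, w + w[0])         # n may not be in dict yet
            out.append(entry)
            dictionary[len(dictionary)] = w + entry[0]
            w = entry
        return "".join(out)

    msg = "TOBEORNOTTOBEORTOBEORNOT"
    assert lzw_decode(lzw_encode(msg)) == msg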