1. HARAMAYA UNIVERSITY
HARAMAYA INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING
Introduction to Source Coding
Course Coordinator: Dr. Mulugeta Atlabachew (Asst. Professor), Guest Lecturer
2. Shannon’s 1st Source Coding Theorem
Shannon showed that:
“To reliably store the information generated by some random source X, you need, on average, no more and no fewer than H(X) bits for each outcome.”
3. Shannon’s 1st Source Coding Theorem
If I toss a die 1,000,000 times and record the value from each trial:
1, 3, 4, 6, 2, 5, 2, 4, 5, 2, 4, 5, 6, 1, …
In principle, I need 3 bits to store each outcome, since 3 bits can represent up to 8 distinct values. So I need 3,000,000 bits to store the information.
Using an ASCII representation, a computer needs 8 bits = 1 byte to store each outcome, so the resulting file has size 8,000,000 bits.
4. Shannon’s 1st Source Coding Theorem
Since the entropy of a fair die is H(X) = log2(6) ≈ 2.585 bits, you only need 2.585 bits, on average, to store each outcome.
So the file can be compressed to size 2.585 × 1,000,000 = 2,585,000 bits.
The optimal compression ratio is:
2,585,000 / 8,000,000 = 0.3231 ≈ 32.31%
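These numbers can be checked quickly in Python (a minimal sketch; the trial count and the fair-die assumption are taken from the slides):

```python
import math

trials = 1_000_000
H = math.log2(6)           # entropy of a fair six-sided die, ~2.585 bits/outcome

naive_bits = 3 * trials    # 3 bits per outcome (fixed-length binary)
ascii_bits = 8 * trials    # 8 bits per outcome (ASCII, 1 byte each)
optimal_bits = H * trials  # Shannon bound: H(X) bits per outcome on average

print(f"H(X) = {H:.3f} bits")                      # 2.585
print(f"optimal size = {optimal_bits:,.0f} bits")  # ~2,585,000
print(f"ratio vs ASCII = {optimal_bits / ascii_bits:.2%}")  # ~32.31%
```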
6. Types of Coding
Source Coding - Code data to represent the information more efficiently
Reduces the “size” of the data
Analog - Encode analog source data into a binary format
Digital - Reduce the “size” of digital source data
Channel Coding - Code data for transmission over a noisy communication channel
Increases the “size” of the data
Digital - Add redundancy to identify and correct errors
Analog - Represent digital values by analog signals
7. Types of Source Coding
Two types of source coding:
Lossless coding (entropy coding)
Data can be decoded to form exactly the same bits
Used in “zip”
Can only achieve moderate compression (e.g. 2:1 to 3:1) for natural images
Can be important in certain applications such as medical imaging
Lossy source coding
Decompressed image is visually similar, but has been changed
Used in “JPEG” and “MPEG”
Can achieve much greater compression (e.g. 20:1 to 40:1) for natural images
Uses entropy coding
8. Lossless Coding
Lossless compression allows the original data to be perfectly reconstructed from the compressed data.
By the pigeonhole principle, no lossless compression algorithm can compress all possible data: there are 2^n bit strings of length n but only 2^n − 1 strings strictly shorter, so any algorithm that shrinks some inputs must expand others.
9. Lossless Coding
Lossless data compression is used in many applications. For example:
It is used in the ZIP file format and in the GNU tool gzip.
It is also used as a component within lossy data compression technologies (e.g. lossless mid/side joint stereo preprocessing by MP3 encoders and other lossy audio encoders).
It is essential where the original and decompressed data must be identical; typical examples are executable programs, text documents, and source code.
Some image file formats, like PNG or GIF, use only lossless compression, while others like TIFF and MNG may use either lossless or lossy methods.
Lossless audio formats are most often used for archiving or production purposes, while smaller lossy audio files are typically used on portable players and in other cases where storage space is limited or exact replication of the audio is unnecessary.
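A minimal Python sketch of the lossless guarantee, using the standard-library zlib module (DEFLATE, the same algorithm family behind ZIP and gzip): the decompressed bytes are bit-for-bit identical to the original.

```python
import zlib

data = b"source coding " * 1000   # highly repetitive input compresses well

compressed = zlib.compress(data, level=9)
restored = zlib.decompress(compressed)

assert restored == data           # lossless: perfect reconstruction
print(f"{len(data)} bytes -> {len(compressed)} bytes")
```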
10. Lossless Coding
Most lossless compression programs do two things in sequence:
the first step generates a statistical model for the input data, and
the second step uses this model to map input data to bit sequences in such a way that “probable” (e.g. frequently encountered) data will produce shorter output than “improbable” data.
11. Lossless Coding
The primary encoding algorithms used to produce bit sequences are Huffman coding (also used by the deflate algorithm) and arithmetic coding.
Arithmetic coding achieves compression rates close to the best possible for a particular statistical model, which is given by the information entropy, whereas Huffman compression is simpler and faster but produces poor results for models that deal with symbol probabilities close to 1.
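Why probabilities close to 1 hurt Huffman coding: a Huffman codeword is at least 1 bit long, but a symbol with probability 0.99 carries only about 0.014 bits of information. A quick Python check (the probability 0.99 is chosen here purely for illustration):

```python
import math

p = 0.99
surprise = -math.log2(p)   # information content of the likely symbol
print(f"information content: {surprise:.4f} bits")   # ~0.0145 bits

# Huffman must still spend a whole bit on it, roughly 70x more than needed;
# arithmetic coding can spend a fractional number of bits and approach
# the entropy of this binary source
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(f"entropy: {H:.4f} bits/symbol")               # ~0.0808 bits/symbol
```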
12. Lossless Coding
Adaptive models
Adaptive models dynamically update the model as the data is compressed.
Both the encoder and decoder begin with a trivial model, yielding poor compression of initial data, but as they learn more about the data, performance improves.
Most popular types of compression used in practice now use adaptive coders.
13. Shannon’s Source Coding Theorem
Assume a set of symbols (26 English letters and some additional symbols such as space, period, etc.) is to be transmitted through the communication channel.
These symbols can be treated as independent samples of a random variable X with probability P(X) and entropy
H(X) = -\sum_{x} P(x) \log_2 P(x)
The length of the code for a symbol x with probability P(x) can be its surprise (self-information):
l(x) = -\log_2 P(x)
Let L be the average number of bits to encode the N symbols. Shannon proved that the minimum L satisfies
H(X) \le L < H(X) + 1
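A small Python check of the bound, using Shannon code lengths ⌈−log2 P(x)⌉ (the distribution below is an illustrative choice):

```python
import math

# hypothetical symbol probabilities for illustration
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

H = -sum(p * math.log2(p) for p in P.values())
# Shannon code: assign each symbol ceil(-log2 p) bits
L = sum(p * math.ceil(-math.log2(p)) for p in P.values())

print(f"H(X) = {H:.3f} bits")   # 1.750
print(f"L    = {L:.3f} bits")   # 1.750 here, since all probs are powers of 1/2
assert H <= L < H + 1
```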
14. Huffman Coding
A Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.
It is an optimum prefix code, developed by D. Huffman as a class assignment.
The output from Huffman’s algorithm can be viewed as a variable-length code table for encoding a source symbol.
The algorithm derives this table from the estimated probability or frequency of occurrence (weight) of each possible value of the source symbol.
Huffman coding is not always optimal among all compression methods; it is replaced with arithmetic coding or asymmetric numeral systems if a better compression ratio is required.
15. Huffman Coding
Two requirements for optimum prefix codes:
Symbols that occur more frequently have shorter codewords than symbols that occur less frequently.
The two least likely symbols have codewords of the same length that differ only in the last bit.
These two requirements lead to a simple way of building a binary tree describing an optimum prefix code - THE Huffman Code.
Build it from the bottom up, starting with the two least likely symbols.
The external nodes correspond to the symbols.
The internal nodes correspond to “super symbols” in a “reduced” alphabet.
16. Huffman Code - Design Steps
1. Label each node with one of the source symbol probabilities.
2. Merge the nodes labeled by the two smallest probabilities into a parent node.
3. Label the parent node with the sum of the two children’s probabilities.
This parent node is now considered to be a “super symbol” (it replaces its two children symbols) in a reduced alphabet.
4. Among the elements in the reduced alphabet, merge the two with the smallest probabilities.
If there is more than one such pair, choose the pair that has the “lowest order super symbol” (this assures the minimum-variance Huffman code).
5. Label the parent node with the sum of the two children’s probabilities.
6. Repeat steps 4 and 5 until only a single super symbol remains.
A runnable Python sketch of these steps follows.
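A compact sketch of the design steps above, using a min-heap for the repeated merge of the two smallest probabilities (the example alphabet and probabilities are illustrative; the heap does not implement the "lowest order super symbol" tie-break from step 4, so ties may resolve differently than the minimum-variance code):

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a Huffman code table from {symbol: probability}."""
    tick = itertools.count()  # tie-breaker so tuples never compare dicts
    # each heap entry: (probability, insertion order, {symbol: partial code})
    heap = [(p, next(tick), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # steps 2-5: merge the two smallest nodes into a "super symbol",
        # prepending one more bit to every codeword in each subtree
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}  # illustrative source
code = huffman_code(probs)
print(code)  # e.g. {'a': '0', 'b': '10', 'd': '110', 'c': '111'}
```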
26. Adaptive Huffman Coding
Adaptive Huffman coding (also called dynamic Huffman coding) is an adaptive coding technique based on Huffman coding.
It permits building the code as the symbols are being transmitted, with no initial knowledge of the source distribution, which allows one-pass encoding and adaptation to changing conditions in the data.
The benefit of a one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors, since just a single loss ruins the whole code.
27. Adaptive Huffman Coding
One pass:
During the pass, calculate the frequencies.
Update the Huffman tree accordingly.
Coder - new Huffman tree computed after transmitting the symbol.
Decoder - new Huffman tree computed after receiving the symbol.
The symbol set and its initial codes must be known ahead of time.
Need an NYT (not yet transmitted) symbol to indicate that a new leaf is needed in the tree.
A simplified sketch of this coder/decoder loop follows.
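A simplified Python sketch of the adaptive idea. This is not the incremental FGK/Vitter tree update and omits the NYT symbol: here the full symbol set is assumed known ahead of time, and both sides naively rebuild a Huffman code from their running counts after every symbol, which keeps coder and decoder in lockstep exactly as the slide describes:

```python
import heapq
import itertools

ALPHABET = "abc"  # symbol set known to coder and decoder ahead of time

def build_code(counts):
    """Static Huffman code from the current symbol counts."""
    tick = itertools.count()
    heap = [(c, next(tick), {s: ""}) for s, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, a = heapq.heappop(heap)
        c2, _, b = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in a.items()}
        merged.update({s: "1" + w for s, w in b.items()})
        heapq.heappush(heap, (c1 + c2, next(tick), merged))
    return heap[0][2]

def encode(message):
    counts = {s: 1 for s in ALPHABET}    # trivial initial model
    bits = ""
    for sym in message:
        bits += build_code(counts)[sym]  # code from counts seen so far
        counts[sym] += 1                 # update after transmitting the symbol
    return bits

def decode(bits):
    counts = {s: 1 for s in ALPHABET}
    out, buf = "", ""
    for bit in bits:
        buf += bit
        inverse = {w: s for s, w in build_code(counts).items()}
        if buf in inverse:               # prefix code: first match is the symbol
            sym = inverse[buf]
            out += sym
            counts[sym] += 1             # same update as the coder
            buf = ""
    return out

msg = "abaacabba"
bits = encode(msg)
assert decode(bits) == msg
print(msg, "->", bits)
```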
50. Huffman Coding vs Arithmetic Coding
Huffman Coding
Replaces an input symbol with a codeword
Needs a probability distribution
Hard to adapt to changing statistics
Needs to store the codeword table
Minimum codeword length is 1 bit
Arithmetic Coding
Replaces the entire input with a single floating-point number
Does not need the probability distribution in advance
Adaptive coding is very easy
No need to keep and send the codeword table
Fractional codeword length
51. Arithmetic Coding
Recall table look-up decoding of Huffman codes:
N: alphabet size
L: max codeword length
Divide [0, 2^L) into N intervals
One interval for one symbol
Interval size is roughly proportional to symbol probability
52. Arithmetic Coding
Arithmetic coding applies this idea recursively:
Normalize the range [0, 2^L) to [0, 1).
Map an input sequence (multiple symbols) to a unique tag in [0, 1).
53. Arithmetic Coding
Disjoint and complete partition of the range [0, 1)
Each interval corresponds to one symbol
Interval size is proportional to symbol probability
The first symbol restricts the tag position to be in one of the intervals
The reduced interval is partitioned recursively as more symbols are processed, as in the sketch below.
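A minimal Python sketch of the recursive interval partition (the three-symbol model is an illustrative assumption): each symbol narrows [low, high) to its sub-interval, and any number in the final interval can serve as the tag.

```python
# cumulative model for an illustrative 3-symbol source:
# P(a) = 0.5, P(b) = 0.3, P(c) = 0.2
CUM = {"a": (0.0, 0.5), "b": (0.5, 0.8), "c": (0.8, 1.0)}

def tag_interval(sequence):
    low, high = 0.0, 1.0
    for sym in sequence:
        span = high - low
        c_lo, c_hi = CUM[sym]
        # restrict the tag to this symbol's sub-interval of [low, high)
        low, high = low + span * c_lo, low + span * c_hi
    return low, high

low, high = tag_interval("bab")
print(low, high)         # the tag can be any number in [0.575, 0.62)
print((low + high) / 2)  # e.g. the midpoint
```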
59. Binary Arithmetic Decoding
Arithmetic coding is slow in general:
To decode a symbol, we need a series of decisions and multiplications.
The complexity is greatly reduced if we have only two symbols: 0 and 1.
Only two intervals: [0, x), [x, 1)
A sketch of this two-interval decision follows.
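A minimal float-based sketch of the binary case (it ignores the integer renormalization a real codec uses, so it is only exact for short messages; the split point x = P(0) and the bit sequence are illustrative): with only the two intervals [0, x) and [x, 1), each decoding step is a single comparison of the tag against the split point, plus one interval update.

```python
def binary_arith_encode(bits, x=0.6):
    """Return the midpoint of the final interval as the tag; P(0) = x."""
    low, high = 0.0, 1.0
    for b in bits:
        split = low + (high - low) * x
        low, high = (low, split) if b == 0 else (split, high)
    return (low + high) / 2

def binary_arith_decode(tag, n_bits, x=0.6):
    """Decode n_bits binary symbols from a tag in [0, 1)."""
    low, high = 0.0, 1.0
    out = []
    for _ in range(n_bits):
        split = low + (high - low) * x   # boundary between the two intervals
        if tag < split:                  # single decision: which interval?
            out.append(0)
            high = split
        else:
            out.append(1)
            low = split
    return out

bits = [1, 0, 0, 1, 1, 0]
tag = binary_arith_encode(bits)
assert binary_arith_decode(tag, len(bits)) == bits
print(tag)
```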