1. Media IT :: Dr Serge Linckels :: http://www.linckels.lu/ :: serge@linckels.lu
Faculty of Science, Technology and Communication (FSTC)
Bachelor en informatique (professionnel)
– Media IT –
¯\_(ツ)_/¯
Unit 4
Entropy and compression
2.
Assignment
Presentation on data formats
Choose a data format from the following list:
• jpeg / jfif / jpeg 2000
• png
• bmp / (animated) gif
• svg
• tiff / raw
• wav / aiff / au / raw
• mp3 / vorbis / aac / wma
How it works
• You work in teams of 2 students
• Prepare a presentation that explains the principles of your selected data format,
i.e., encoding, compression, features, usage...
• Share your presentation, e.g., PowerPoint file or Prezi link
• Presentation on 29 October or 5 November; 10 – 15 minutes
• Your work (presentation + support) counts for 10% of your final grade
3.
3. Coding and compression
3.1 Basics of information theory
3.2 Entropy and redundancy
3.3 Huffman coding
3.4 Run-length encoding (RLE)
3.5 Lempel–Ziv–Welch (LZW) coding
3.6 The mysterious case of the Xerox scanners (2013)
4.
3. Coding and compression
3.1 Basics of information theory
11 November 2017
One of Europe's smallest countries now holds claim to being a giant in the
space industry. Luxembourg, with a population less than the state of
Vermont, now generates nearly 2 percent of its annual gross domestic
product from the space industry, according to Deputy Prime Minister Etienne
Schneider. The country's economy checked in just shy of $61 billion in 2016,
according to the CIA World Factbook. "We have grown from nothing to the
most dynamic in Europe," Schneider told an audience Saturday, in a speech
at the New Worlds conference in Austin, Texas. He added that the country's
space program was first launched just over 30 years ago. Schneider, who
also serves as Luxembourg's economic minister, told the conference that he
is often questioned about why Luxembourg is so "keen on exploiting space
resources." He replied by saying the same "liberal, extremely business
friendly climate" that pushed the country's financial sector boom is now
being reapplied to attracting space companies. "I have more than 70 space
companies in the pipeline," Schneider told CNBC after the speech.
Luxembourg's "space resources initiative" is the country's plan to make the
most out of a quickly growing global industry, the minister said. "It's a
series of measures to position Luxembourg as the European heart of
exploration and use of space resources."
Therefore:
• The word space is frequently used in this source, i.e., it has a higher occurrence than other words, e.g., Texas (which appears just once)
• The probability that the word space reappears in a following section (not visible on this slide) is very high
• The word space has a higher "importance" in this source than other words, e.g., Texas
This is called: self-information or surprisal
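Self-information makes this idea precise: the rarer a word, the more bits of "surprise" it carries. A minimal sketch, with probabilities invented purely for illustration:

from math import log2

def surprisal(p: float) -> float:
    """Self-information in bits: I(x) = log2(1/p(x))."""
    return log2(1 / p)

# The probability values below are assumptions for illustration only
print(surprisal(0.05))    # a frequent word like "space": ~4.3 bits
print(surprisal(0.0005))  # a rare word like "Texas": ~11 bits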
5.
3. Coding and compression
3.1 Basics of information theory
𝐴 is an alphabet, i.e., a non-empty set of symbols
Computational linguistics basics
Claude Elwood Shannon (1916–2001) was
an American mathematician, electrical
engineer, and cryptographer known as "the
father of information theory"
Examples
𝐴 = {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}
𝐴∗
is the set of all possible words through this alphabet 𝐴∗
= {a, aaa, oxo, jimi, house, haus, maison, abc, bdef…}
The language 𝐸 ⊆ 𝐴∗
contains all English words 𝐸 = {a, and, are, house…}
𝑎 ∈ 𝐸 is a word of the English language 𝑎 = house
|𝑎| the length of the word 𝑎 |𝑎| =5
𝑝 𝑎 ∈ ℝ is the probability of occurrence of the word 𝑎 in the
English language with 0 ≥ 𝑝 𝑎 ≥ 1
The average word length is written:
𝑎∈𝐿
𝑝 𝑎 ∙ 𝑎
The sum of all probabilities is 1
𝑎∈𝐿
𝑝 𝑎 = 1
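A small illustration of these definitions, as a sketch with a made-up toy corpus (not part of the slides):

from collections import Counter

corpus = "the space industry in the space sector".split()  # toy corpus, invented for illustration
counts = Counter(corpus)
total = sum(counts.values())

# p(a): relative frequency of each word in the corpus
p = {word: n / total for word, n in counts.items()}

assert abs(sum(p.values()) - 1.0) < 1e-9         # the probabilities sum to 1
avg_len = sum(p[w] * len(w) for w in p)          # average word length: sum of p(a)*|a|
print(p, avg_len)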
6.
3. Coding and compression
3.1 Basics of information theory
Computational linguistics basics
Word frequency depends on the corpus you analyze
https://www.wordfrequency.info/ https://www.sketchengine.eu/
7.
3. Coding and compression
3.1 Basics of information theory
Computational linguistics basics
8.
3. Coding and compression
3.1 Basics of information theory
Computational linguistics basics
Example of application: cryptanalysis
A typical distribution of letters in English-language text. Weak ciphers do not sufficiently mask the distribution, and this might be exploited by a cryptanalyst to read the message.
Colossus: British machine developed during WW2 to help codebreakers break the Lorenz cipher (the Enigma cipher machine was attacked with the electromechanical Bombe)
LIVITCSWPIYVEWHEVSRIQMXLEYVEOIEWHRXEXIPFEMVEWHKVSTYLXZIXLIKIIXPIJVSZEYPERRGERIM
WQLMGLMXQERIWGPSRIHMXQEREKIETXMJTPRGEVEKEITREWHEXXLEXXMZITWAWSQWXSWEXTVEPMRXRSJ
GSTVRIEYVIEXCVMUIMWERGMIWXMJMGCSMWXSJOMIQXLIVIQIVIXQSVSTWHKPEGARCSXRWIEVSWIIBXV
IZMXFSJXLIKEGAEWHEPSWYSWIWIEVXLISXLIVXLIRGEPIRQIVIIBGIIHMWYPFLEVHEWHYPSRRFQMXLE
PPXLIECCIEVEWGISJKTVWMRLIHYSPHXLIQIMYLXSJXLIMWRIGXQEROIVFVIZEVAEKPIEWHXEAMWYEPP
XLMWYRMWXSGSWRMHIVEXMSWMGSTPHLEVHPFKPEZINTCMXIVJSVLMRSCMWMSWVIRCIGXMWYMX
Hereupon Legrand arose, with a grave and stately air, and brought me the beetle from a glass case in which it
was enclosed. It was a beautiful scarabaeus, and, at that time, unknown to naturalists—of course a great prize
in a scientific point of view. There were two round black spots near one extremity of the back, and a long one
near the other. The scales were exceedingly hard and glossy, with all the appearance of burnished gold. The
weight of the insect was very remarkable, and, taking all things into consideration, I could hardly blame
Jupiter for his opinion respecting it.
Full explanation can be found on Wikipedia
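A letter-frequency count like the one a cryptanalyst starts from is easy to automate; a minimal sketch (the ciphertext string stands for the cryptogram above and is truncated here):

from collections import Counter

ciphertext = "LIVITCSWPIYVEWHEVSRIQ"  # first characters of the cryptogram above
freq = Counter(c for c in ciphertext if c.isalpha())
# The most common ciphertext letters likely map to E, T, A, ... in English
for letter, count in freq.most_common(5):
    print(letter, count)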
9.
3. Coding and compression
3.1 Basics of information theory
Illustration with an alphabet 𝒳 = {a, b, c, d}
Two machines generate words randomly, according to the following probabilities.

Machine 1
x                 a     b     c     d
probability (px)  0,25  0,25  0,25  0,25

To guess the next word (e.g., in the sequence d d a c d a b), a balanced decision tree is used: "Is it a or b?", then "Is it a?" or "Is it c?". Every word is identified after the same number of yes/no questions:
weight:       wa = 2, wb = 2, wc = 2, wd = 2
binary code:  ca = 11 (2 bit), cb = 10 (2 bit), cc = 01 (2 bit), cd = 00 (2 bit)
Questions to ask on average:
= pa⋅wa + pb⋅wb + pc⋅wc + pd⋅wd
= 0,25⋅2 + 0,25⋅2 + 0,25⋅2 + 0,25⋅2
= 2 questions

Machine 2
x                 a     b      c      d
probability (px)  0,5   0,125  0,125  0,25

Here the decision tree asks for the most probable word first: "Is it a?", then "Is it d?", then "Is it b?". Frequent words need fewer questions:
weight:       wa = 1, wb = 3, wc = 3, wd = 2
binary code:  ca = 1 (1 bit), cb = 001 (3 bit), cc = 000 (3 bit), cd = 01 (2 bit)
Questions to ask on average:
= pa⋅wa + pb⋅wb + pc⋅wc + pd⋅wd
= 0,5⋅1 + 0,125⋅3 + 0,125⋅3 + 0,25⋅2
= 1,75 questions

Machine 2 is producing less information than machine 1
10.
3. Coding and compression
3.2 Entropy and redundancy
Illustration with an alphabet 𝒳 = {a, b, c, d}
Machine 1: pa = pb = pc = pd = 0,25. Machine 2: pa = 0,5, pb = 0,125, pc = 0,125, pd = 0,25.
Information entropy (H) is the average rate at which information is produced by a stochastic source of data
(stochastic: having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely)

$H(\mathcal{X}) = \sum_{x \in \mathcal{X}} p(x) \cdot \log_2 \frac{1}{p(x)} = -\sum_{x \in \mathcal{X}} p(x) \cdot \log_2 p(x)$

The term $\log_2 \frac{1}{p(x)}$ is the weight of a certain word.

Entropy for machine 1:
H(𝒳) = −(pa · log2 pa + pb · log2 pb + pc · log2 pc + pd · log2 pd)
H(𝒳) = −(0,25 · (−2) + 0,25 · (−2) + 0,25 · (−2) + 0,25 · (−2))
H(𝒳) = 2

Entropy for machine 2:
H(𝒳) = −(0,5 · (−1) + 0,125 · (−3) + 0,125 · (−3) + 0,25 · (−2))
H(𝒳) = 1,75
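A quick way to check these two values; a minimal sketch:

from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # machine 1 -> 2.0
print(entropy([0.5, 0.125, 0.125, 0.25]))  # machine 2 -> 1.75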
11.
3. Coding and compression
3.2 Entropy and redundancy
Illustration with an alphabet 𝒳 = {a, b, c, d}
Machine 1 and machine 2 generate words randomly according to the following probabilities.

Machine 1
x                 a     b     c     d
probability (px)  0,25  0,25  0,25  0,25
coding (cx)       00    01    10    11
weight (wx)       2     2     2     2
Entropy: H(𝒳) = 2
Average code length: L(𝒳) = 2

Machine 2
x                 a     b      c      d
probability (px)  0,5   0,125  0,125  0,25
coding (cx)       1     001    000    01
weight (wx)       1     3      3      2
Entropy: H(𝒳) = 1,75
Average code length: L(𝒳) = 1,75

The principal idea is to find the optimal coding so that the average length of the code is as small as possible, in order to reduce the number of bits to transmit:
$L(\mathcal{X}) = \sum_{x \in \mathcal{X}} p(x) \cdot |c_x|$

In the above example, information transmitted by machine 2 requires fewer actual bits than machine 1. The code has been designed so that fewer bits are used to send more frequent symbols, but still so that it can be unambiguously decoded.
A code is optimal if L − H is minimal, i.e., there is little redundancy
12.
3. Coding and compression
3.2 Entropy and redundancy
Illustration with an alphabet 𝒳 = {a, b, c, d}
Machine 3 generates words according to the following probability:
x                 a     b      c      d
probability (px)  0,5   0,125  0,125  0,25
coding (cx)       000   001    010    011
weight (wx)       1     3      3      2
Same values as for machine 2, but with a worse coding (more bits are used than necessary)
Entropy: H(𝒳) = 1,75
Average code length: L(𝒳) = 3
Here, the redundancy of machine 3 (L − H = 1,25) is higher than that of the coding from machine 2 (L − H = 0), although the probabilities, i.e., the entropy, remain the same. Therefore, this code is less optimal for transmission.
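Average code length and redundancy can be checked the same way as the entropy; a minimal sketch comparing machines 2 and 3:

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def avg_code_length(probs, codes):
    # L = sum over all symbols of p(x) * |c_x|
    return sum(p * len(c) for p, c in zip(probs, codes))

probs = [0.5, 0.125, 0.125, 0.25]
machine2 = ["1", "001", "000", "01"]
machine3 = ["000", "001", "010", "011"]  # fixed 3-bit code

H = entropy(probs)  # 1.75 for both machines
for codes in (machine2, machine3):
    L = avg_code_length(probs, codes)
    print(L, L - H)  # machine 2: 1.75 and 0.0 ; machine 3: 3.0 and 1.25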
13.
3. Coding and compression
3.2 Entropy and redundancy
Illustration of flipping {a fair | an unfair} coin
The table shows the flipping of a coin, where the probability 0,5 / 0,5 is the only "fair flipping" and gives the maximal entropy (1 bit). Every other "unfair flipping" results in a smaller entropy, i.e., the surprise of the outcome is smaller.

probability (heads)   probability (tails)   entropy (H)
0                     1                     0
0,1                   0,9                   0,469
0,2                   0,8                   0,722
0,3                   0,7                   0,881
0,4                   0,6                   0,971
0,5                   0,5                   1,000
0,6                   0,4                   0,971
0,7                   0,3                   0,881
0,8                   0,2                   0,722
0,9                   0,1                   0,469
1                     0                     0
[Chart: entropy (y-axis, 0 to 1) as a function of the probability of heads (x-axis, 0 to 1); the curve peaks at 1 bit for the fair flipping, i.e., equal probabilities]
The extreme case is that of a double-headed coin that never comes up tails, or a double-tailed coin that never comes up heads. Then there is no uncertainty: the entropy is zero, and each toss of the coin delivers no new information, as the outcome is always certain.
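The table above can be reproduced with a few lines; a minimal sketch:

from math import log2

def coin_entropy(p):
    """Entropy of a coin with P(heads) = p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

for i in range(11):
    p = i / 10
    print(f"{p:.1f}  {1 - p:.1f}  {coin_entropy(p):.3f}")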
14.
Continue the sequence!
1             one 1
11            two 1
21            one 2 and one 1
1211          one 1 and one 2 and two 1
111221        three 1 and two 2 and one 1
312211
13112221
1113213211
Each term is obtained by reading out the digits of the previous one.
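This "look-and-say" reading is exactly run-length encoding (see section 3.4) applied to digits; a minimal sketch that generates the next terms:

from itertools import groupby

def look_and_say(term: str) -> str:
    # Read each run of identical digits as "<count><digit>"
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))

term = "11"
for _ in range(6):
    print(term)
    term = look_and_say(term)  # 11, 21, 1211, 111221, 312211, 13112221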
15.
3. Coding and compression
3.2 Entropy and redundancy
Practical exercises
How it works
• Try out the two exercises alone.
• Discuss your results with another student.
• This work is not considered for your final grade.
1. Calculate the entropy for rolling a 6-sided die and do the following steps:
• Give the probabilities for each possible value!
• Calculate the entropy using Shannon's formula!
• Suggest a code for each word to transmit and compute the redundancy!
• Explain the meaning of the obtained values for the entropy!
• Represent graphically the rolling of a fair and an unfair die!
2. Which symbols have the three highest occurrences in the English language (ignoring case sensitivity)? What would be the weight of the letter E?
https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/information-entropy
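For checking your result for exercise 1: a fair die has six equally likely outcomes. A minimal sketch, not the official solution:

from math import log2

probs = [1 / 6] * 6                    # fair 6-sided die
H = -sum(p * log2(p) for p in probs)
print(H)                               # log2(6), about 2.585 bits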
17.
3. Coding and compression
Objective: reduce the amount of data
Classification
                    universal                        specific
                    (can be used for any purpose)    (used for specific applications)
without data loss   Huffman, LZW                     PNG, AIFF
with data loss                                       JPEG, MP3
18.
3. Coding and compression
3.3 Huffman coding
David Albert Huffman (1925–1999) was a pioneer in computer science and professor of computer science at the University of California, Santa Cruz. He is known for his Huffman coding.
Principle: entropy encoding method
Every symbol of the source is represented by a code
The length of the code depends on the frequency of occurrence of the symbol; frequent symbols have shorter codes than less frequent symbols
Application: MPEG-1 Layer III (MP3) encoder
Huffman, D. (1952). "A Method for the Construction of Minimum-Redundancy Codes" (PDF). Proceedings of the IRE. 40 (9): 1098–1101. doi:10.1109/JRPROC.1952.273898
http://compression.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf
19.
3. Coding and compression
3.3 Huffman coding
Input
𝐴 is an alphabet of symbols: 𝐴 = {a1, a2,...,an}
W is the tuple of symbol weights (usually proportional to probabilities): W = {w1, w2,..., wn} with wi = weight(ai)
Output
CW is the set of binary codewords over 𝐴: CW = {c1, c2,..., cn}
Goal
Find a code of minimal redundancy, i.e., minimize the weighted code length $L = \sum_{i=1}^{n} w_i \cdot |c_i|$, so that it approaches the Shannon entropy
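The construction itself is short; a minimal sketch of the greedy merging the golden rules below describe, using a binary heap (a sketch of the standard algorithm, not code from the slides):

import heapq

def huffman_codes(weights: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code: repeatedly merge the two lowest-weight nodes."""
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # lowest weight
        w2, _, c2 = heapq.heappop(heap)  # second lowest
        # Prefix one branch with 0 and the other with 1, then merge
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_codes({"a": 4, "b": 1, "c": 1, "d": 2}))
# {'a': '0', 'd': '10', 'b': '110', 'c': '111'}

The code lengths (1, 2, 3, 3 bits) are what matters; swapping the 0/1 labels on the branches yields the slides' code a = 1, d = 01, b = 001, c = 000.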
20.
3. Coding and compression
3.3 Huffman coding
Principle
Machine 2 generates words according to the following probabilities; in practice, absolute frequencies are used instead of probabilities:
x                 a    b      c      d
probability (px)  0,5  0,125  0,125  0,25
frequency (wx)    4    1      1      2
The Huffman tree corresponds to the decision tree from before ("Is it a?", then "Is it d?", then "Is it b?"), and reading its branches yields the binary code:
ca = 1 (1 bit), cb = 001 (3 bit), cc = 000 (3 bit), cd = 01 (2 bit)
How do we build the tree?
http://huffman.ooz.ie/
21.
3. Coding and compression
3.3 Huffman coding
Example
Text to send: abadacda

Example 1: Huffman compressed
w          a  b    c    d
frequency  4  1    1    2
code       1  001  000  01

w    a  b    a  d   a  c    d   a
c    1  001  1  01  1  000  01  1
|c|  1  3    1  2   1  3    2   1
Length of transmission: 14 bit

Example 2: uncompressed
w          a   b   c   d
frequency  4   1   1   2
code       11  10  01  00

w    a   b   a   d   a   c   d   a
c    11  10  11  00  11  01  00  11
|c|  2   2   2   2   2   2   2   2
Length of transmission: 16 bit

Compression ratio = 1 − 14/16 = 12,5%
22.
3. Coding and compression
3.3 Huffman coding
Building the Huffman tree
w          a  b  c  d
frequency  4  1  1  2
Golden rules
1) Every word is a node (in brackets: the word frequency)
2) Take the two nodes of lowest frequency and create a branching with a new node, having the sum of all frequencies of the branch
https://people.ok.ubc.ca/ylucet/DS/Huffman.html
23.
3. Coding and compression
3.3 Huffman coding
Building the Huffman tree
Golden rules (continued)
3) If three or more nodes have the same (lowest) frequency, then create multiple sub-branches
4) Repeat steps 2) and 3) until there are no nodes left
Applied to the frequencies above: b (1) and c (1) are merged into a node of weight 2; that node and d (2) are merged into a node of weight 4; finally, this node and a (4) are merged into the root of weight 8.
https://people.ok.ubc.ca/ylucet/DS/Huffman.html
26.
3. Coding and compression
3.3 Huffman coding
Building the Huffman tree
Remarks
The tree is built bottom-up
Every code is unique and free of ambiguities (no code is a prefix of another)
w          a  b    c    d
frequency  4  1    1    2
code       1  001  000  01
https://people.ok.ubc.ca/ylucet/DS/Huffman.html
27.
3. Coding and compression
3.3 Huffman coding
Example of decoding
Consider the following Huffman code:
w          a  b    c    d
frequency  4  1    1    2
code       1  001  000  01
What message is sent with the following transmission?
Code received: 010000011
Decoded: 01 | 000 | 001 | 1 → d c b a
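Because the code is prefix-free, decoding is a simple greedy scan over the bits; a minimal sketch:

def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    inverse = {c: s for s, c in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:   # a complete codeword; prefix-freeness makes this unambiguous
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

codes = {"a": "1", "b": "001", "c": "000", "d": "01"}
print(huffman_decode("010000011", codes))  # dcba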
28.
3. Coding and compression
3.3 Huffman coding
Practical exercises
1. Based on the following dictionary, what is the original message that was broadcast using Huffman coding:
1011100101001110010101001011001001000011010011
Calculate the compression ratio against a classical 8-bit ASCII encoding of the same message!
How it works
• Try out the two exercises alone.
• Discuss your results with another student.
• This work is not considered for your final grade.
29.
3. Coding and compression
3.3 Huffman coding
Practical exercises
2. Based on the following frequencies:
• Create the corresponding codes for each symbol
• Compute the compression ratio to send the text "ecaabae" using Huffman compression against an uncompressed code of 3 bit
How it works
• Try out the two exercises alone.
• Discuss your results with another student.
• This work is not considered for your final grade.
https://www.dcode.fr/codage-huffman-compression
30.
3. Coding and compression
3.3 Huffman coding
Practical exercises – solution to exercise 2
https://cs.nyu.edu/courses/fall09/V22.0102-002/lectures/Huffman.pdf
31.
3. Coding and compression
3.3 Huffman coding
Practical exercises
3. Consider the text "go go gophers".
• Create the table of frequencies!
• Encode the text using a Huffman code!
• Compute the compression ratio against an uncompressed code with minimal length!
32.
3. Coding and compression
3.3 Huffman coding
Practical exercises – solution to exercise 3
Consider the text "go go gophers".
• Create the table of frequencies!
• Encode the text using a Huffman code!
• Compute the compression ratio against an uncompressed code with minimal length!
https://www2.cs.duke.edu/csed/poop/huff/info/
33.
3. Coding and compression
3.4 Run-length encoding (RLE)
Principle
Very simple lossless data compression
Based on replacing sequences of identical symbols by a code
Application: bitmap image formats (e.g., BMP), fax machines (T.45)
RLE is useful for highly redundant data, indexed images with many pixels of the same color in a row, or in combination with other compression techniques
Example
Message: aabbbbbeedddddddddddb
Runs: (a,2) (b,5) (e,2) (d,11) (b,1)
Encoding: different representations are possible
• 2a5b2e11d1b: can cause problems in decoding the frequency
• #a2#b5#e2#d11#b1: escape character used
• aa2bb5ee2dd11b: any time a character appears twice, it denotes a run
• (a,b,e,d,b) (2,5,2,11,1): two separate vectors, one for the symbols and one for the frequencies
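A minimal sketch of the run extraction behind all of these representations, using the same message:

from itertools import groupby

def rle_encode(message: str):
    """Return the runs of a message as (symbol, count) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(message)]

def rle_decode(runs):
    return "".join(sym * count for sym, count in runs)

runs = rle_encode("aabbbbbeedddddddddddb")
print(runs)  # [('a', 2), ('b', 5), ('e', 2), ('d', 11), ('b', 1)]
assert rle_decode(runs) == "aabbbbbeedddddddddddb"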
34.
3. Coding and compression
3.4 Run-length encoding (RLE)
Practical exercise
The following bitmap image has a size of 15 x 15 pixels. Each pixel can be white, red or black.
How it works
• Try out the exercise alone.
• Discuss your results with another student.
• This work is not considered for your final grade.
Compare the uncompressed size against an RLE compression!
How effective would a Huffman compression be for this purpose?
http://www.xiconeditor.com/
35.
3. Coding and compression
3.5 Lempel–Ziv–Welch (LZW) coding
Principle
Universal lossless data compression
Published in 1984 by T. Welch as an improvement of the LZ78 algorithm published by A. Lempel & J. Ziv in 1978
Based on replacing recurring sequences by a code, building up a dictionary of the sequences encountered so far
Simple, and widely used in hardware implementations where very high throughput is required
Application: Graphics Interchange Format (GIF), Unix compress utility
Abraham Lempel (1936-)
Yaakov Ziv (1931-)
Terry Welch (1939-1988)
36.
3. Coding and compression
3.5 Lempel–Ziv–Welch (LZW) coding
Example
Golden rules
1) Every symbol (1 char) is represented in the initial dictionary with a code
2) Take the current symbol: if the current sequence extended by it is present in the dictionary, keep extending the sequence; otherwise output the code of the current sequence and add the extended sequence to the dictionary
Initial dictionary: a = 1, b = 2, ..., z = 26
Message to encode: b a n a n e n b a u

37.
Is "b" in the dictionary? Yes. Is "ba" in the dictionary? No → output 2 (b), add ba = 27

38.
Is "a" in the dictionary? Yes. Is "an" in the dictionary? No → output 1 (a), add an = 28

39.
Is "n" in the dictionary? Yes. Is "na" in the dictionary? No → output 14 (n), add na = 29

40.
Is "a" in the dictionary? Yes. Is "an" in the dictionary? Yes → extend the sequence. Is "ane" in the dictionary? No → output 28 (an), add ane = 30

41.
Is "e" in the dictionary? Yes. Is "en" in the dictionary? No → output 5 (e), add en = 31

42.
Is "n" in the dictionary? Yes. Is "nb" in the dictionary? No → output 14 (n), add nb = 32

43.
Is "b" in the dictionary? Yes. Is "ba" in the dictionary? Yes → extend the sequence. Is "bau" in the dictionary? No → output 27 (ba), add bau = 33

44.
Is "u" in the dictionary? Yes. STOP – no more symbols → output 21 (u)

45.
Complete dictionary additions: ba = 27, an = 28, na = 29, ane = 30, en = 31, nb = 32, bau = 33
Coded message: 2 1 14 28 5 14 27 21
Which decodes as: b a n an e n ba u
46.
Example of decoding!
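The decoding steps are not shown on the slides; a minimal sketch of both directions, where the decoder rebuilds the same dictionary on the fly (including the classic special case where a code refers to the entry currently being built):

def lzw_encode(message: str) -> list[int]:
    dictionary = {chr(ord("a") + i): i + 1 for i in range(26)}  # a=1 ... z=26
    next_code, sequence, output = 27, "", []
    for symbol in message:
        if sequence + symbol in dictionary:
            sequence += symbol                   # keep extending the sequence
        else:
            output.append(dictionary[sequence])  # output code of current sequence
            dictionary[sequence + symbol] = next_code
            next_code += 1
            sequence = symbol
    output.append(dictionary[sequence])
    return output

def lzw_decode(codes: list[int]) -> str:
    dictionary = {i + 1: chr(ord("a") + i) for i in range(26)}
    next_code = 27
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        # Special case: the code may refer to the entry being built right now
        entry = dictionary.get(code, previous + previous[0])
        result.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(result)

codes = lzw_encode("bananenbau")
print(codes)                          # [2, 1, 14, 28, 5, 14, 27, 21]
assert lzw_decode(codes) == "bananenbau"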
47.
3. Coding and compression
3.5 Lempel–Ziv–Welch (LZW) coding
Practical exercise
1. The following image has a resolution of 4 x 4 pixels. Each pixel can be white, black, blue or yellow.
Compress the image according to the LZW coding algorithm!
Compare the uncompressed size against an RLE and Huffman compression!
How it works
• Try out the exercise alone.
• Discuss your results with another student.
• This work is not considered for your final grade.
48.
3. Coding and compression
3.5 Lempel–Ziv–Welch (LZW) coding
Practical exercise
2. Calculate the LZW code for the message bobobobowebewe and give the full dictionary!
How it works
• Try out the exercise alone.
• Discuss your results with another student.
• This work is not considered for your final grade.
49.
3. Coding and compression
3.6 The mysterious case of the Xerox scanners (2013)
The main actors
Xerox WorkCentre line scanners, which randomly alter written numbers on pages that are scanned
David Kriesel, who presented the case at the Chaos Communication Congress (31C3) in Hamburg on 29 December 2014
http://www.dkriesel.com/
What went wrong (test set): original data (Arial, 7pt) vs. scan result
Overview (2013)
24 July: D. Kriesel informed Xerox about the case
6 August: Xerox announced that this is not a bug
12 August: Xerox confirmed that hundreds of thousands of devices world-wide are affected, due to a software bug introduced eight years earlier
22 August: first patches for different devices released
50.
3. Coding and compression
3.6 The mysterious case of the Xerox scanners (2013)
Explaining the bug
JBIG2: compression standard that segments the input page into regions (patches) of text and images
• Patch 1: the image is compressed, e.g., as JPEG
• Patches 2–4: text is compressed after OCR
• All the rest (white space) does not belong to any patch
Pattern matching: similar patches are resolved, e.g., for all the letters "e" just one occurrence is stored and the same pattern is re-used
https://www.youtube.com/watch?v=7FeqF1-Z1g0
52.
3. Coding and compression
3.6 The mysterious case of the Xerox scanners (2013)
Explaining the bug
Due to optimization and imprecise pattern matching, errors can occur: two patches that merely look similar (e.g., different digits at a small font size) may be replaced by the same stored pattern
https://www.youtube.com/watch?v=7FeqF1-Z1g0