SET SHAPING THEORY
(THE FUTURE OF INFORMATION THEORY)
PROF. JOHN KENDALL DIXON
Important updates regarding Set Shaping Theory
A fundamental step has been taken in the study of this theory. A group of information theory
students published an article in which they experimentally confirmed the theoretical
predictions of Set Shaping Theory:
“Practical applications of Set Shaping Theory in Huffman coding”
https://arxiv.org/abs/2208.13020
The Matlab code used is available at the following link:
https://www.mathworks.com/matlabcentral/fileexchange/115590-test-sst-huffman-coding
At the following link you will find a description of the data compression experiment used to
confirm the theoretical predictions:
https://www.academia.edu/88055617/Description_of_the_program_used_to_validate_the_theoretical_results_of_the_Set_Shaping_Theory
Let us start with a famous Riemann quote that seems to predict Set Shaping
Theory.
"Two sets that contain the same number of elements can be interpreted as
two points of view that observe the same phenomenon".
As we will see, this sentence was a genuine source of inspiration, and the new
theory is based precisely on this intuition.
Set Shaping Theory represents a real change in the approach to data
compression.
For this reason, it is important to recall some fundamental concepts of the
information theory developed by Shannon in his famous article:
“A Mathematical Theory of Communication”
Suppose A is a set of symbols (e.g., a collection of numbers or letters: an
alphabet). We call a system or ensemble the triple X = (x; A; P) formed by a
random variable x, called the state; the set A = {x_1, x_2, ..., x_I} of possible
values of x (the states); and the probability distribution P = {p_1, p_2, ..., p_I}
of the states, with P(x_i) = p_i and $\sum_{i=1}^{I} p_i = 1$.
The fundamental point of Shannon's approach is to shift the focus from the sequence
to be compressed to the ensemble X that generated it.
Thus, for Shannon, encoding a sequence means encoding the ensemble X (the source) that
generated the sequence.
We can represent the source as a die having N faces, where the probability of rolling a face
is defined by the function P.
Therefore, for example, a balanced die represents an ensemble X with a uniform probability
distribution.
Shannon's point of view on data compression is based on assigning the shorter
codewords to the faces of the die with a higher probability of coming up.
Consequently, each face of the die is associated with a codeword whose length
depends on the probability of that face coming up.
This mathematical model created by Shannon is based on a function called
entropy, defined in the following way: given an ensemble X = (x; A; P), the entropy of X,
denoted H, is defined as:

$$H(X) = -\sum_{x_i \in A} p(x_i) \log_b p(x_i)$$
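To make the definition concrete, here is a minimal Python sketch of this formula (an illustration, not part of the original slides):

```python
import math

def entropy(p, base=2):
    # H(X) = -sum_i p_i * log_b(p_i); zero-probability states contribute nothing.
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

# A fair six-sided die: uniform P gives the maximum entropy log2(6) ~ 2.585 bits.
print(entropy([1/6] * 6))
```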
Shannon's first theorem (source coding theorem) shows that the average length of the codewords
cannot be less than the entropy of the source H(X).
To define a limit on the compression of a message, the asymptotic equipartition
principle is used.
This principle is very useful and is also used in Set Shaping Theory.
To understand its meaning, we again use a die as a representation of the source.
For example, if we take a classic six-sided unloaded die and roll it 100 times, the
probability of getting one hundred 1's is very small: (1/6)^100.
By contrast, the probability of obtaining a sequence in which each face appears with a
frequency close to 1/6 is very high, and it grows as the number of throws increases.
Consequently, this principle tells us that if N (the number of dice rolls) is very large, tending to
infinity, the generated sequence almost certainly belongs to a subset (the typical set) that contains
only the sequences with an information content close to NH(X). The error made with this
approximation is negligible.
It is interesting to note that the size of the typical set decreases as the entropy H(X) decreases;
a smaller typical set implies that the sequences generated by the source can be
encoded in less space.
Using this principle, Shannon's first theorem can be rephrased as follows: N i.i.d. random
variables, each with entropy H(X), can be compressed into more than NH(X) bits with negligible
risk of information loss as N → ∞; conversely, if they are compressed into fewer than NH(X) bits,
it is virtually certain that information will be lost.
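The typicality claim is easy to check numerically. A minimal Python sketch, assuming a fair six-sided die:

```python
import math
import random
from collections import Counter

N = 100_000  # the concentration sharpens as the number of rolls grows
rolls = [random.randint(1, 6) for _ in range(N)]
freqs = {face: count / N for face, count in sorted(Counter(rolls).items())}
print(freqs)  # every face's frequency lands near 1/6: a typical sequence

# The atypical all-ones outcome has probability (1/6)^N; already for N = 100
# this is about 10^-78, which is why the typical set dominates.
print(math.log10((1 / 6) ** 100))
```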
Now we can introduce Set Shaping Theory, which, as we will see, represents a completely new
point of view.
Set Shaping Theory
To understand this theory, it is interesting to try to answer the following question: how can
we simulate a sequence of N symbols emitted by a source defined by an ensemble
X = (x; A; P)?
The simplest way is to treat our source X as a die: if we want to simulate 10 values
of X, we roll the die 10 times.
Another way is as follows: if we roll a 6-sided die 10 times, we can generate |A|^10 = 6^10
different sequences. So, if we write each of these sequences on a sheet, put them all in a
box, and then draw one at random, we get the same result as rolling the die ten times.
Our box containing 6^10 sequences is nothing more than the set A^10, which contains
6^10 elements.
Consequently, 10 values of the ensemble X can be obtained by randomly extracting (if the
distribution P is uniform) an element from the set A^10.
If the distribution of X is not uniform, unlike in our example, the probability of
extracting each sequence depends on the probability distribution P.
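The box construction can be written out directly; a minimal Python sketch with a small N, so that the box remains enumerable:

```python
import itertools
import random

A = (1, 2, 3, 4, 5, 6)
N = 3  # keep N small: the box holds |A|^N = 216 sequences

# The "box": every possible sequence of N rolls, written out explicitly.
box = list(itertools.product(A, repeat=N))

# A uniform draw from the box is equivalent to rolling the fair die N times.
print(len(box), random.choice(box))
```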
According to Riemann's intuition, each set of cardinality |A|^N represents a different point of
view from which to observe N values generated by the source X = (x; A; P).
Indeed, if two sets A and B have the same number of elements, it is possible to define a
bijective function that converts each element of set A into an element of set B and vice versa.
According to this theory, the source is seen as a set that can be transformed into any other
set of the same size.
At this point, we must ask ourselves what kind of sequences the new set should contain.
The new sequences cannot have a length N2 less than that of the original sequences, because the
new set would then be smaller than the one generated by our source: |A|^N > |A|^N2.
Nor can the new sequences have the same length; indeed, it has been shown that entropy
is invariant under every isomorphism.
The only possible solution is for the new sequences to have a length N2 greater than that of
the original sequences.
By increasing the length of the sequences, the new set will have many more elements than the
source set: if N2 > N, then |A|^N < |A|^N2.
Therefore, we must select from the set A^N2 a subset of size equal to |A|^N.
This operation is called "shaping of the source", because what is done is to cut some
sequences belonging to the set A^N2 (with N2 > N), making their probability of occurrence zero.
So, it is like replacing the die with a new, rigged die.
The price we have to pay for this substitution is rolling the die more times.
Since this theory is used in data compression, the most common way of performing the
"shaping of the source" is to choose the subset by selecting the sequences with the
lowest entropy.
For example, we take as a source X = (x; A; P) a classic unloaded six-sided die
(|A| = 6, uniform P) and roll it 10 times. There are 6^10 possible sequences
a = (x_1, ..., x_10) that can be obtained by rolling the die 10 times. We call this set A^10
(|A^10| = 6^10).
We order these sequences by their entropy value, so sequence number 1, a_1, has
the lowest entropy and sequence number 6^10, a_(6^10), has the highest entropy.
Now we roll the die 11 times; in this case there are 6^11 possible sequences
a = (x_1, ..., x_11). We call this set A^11 (|A^11| = 6^11).
We sort the sequences by entropy value, as in the previous case.
In this way, we obtain two series of sequences with increasing entropy.
A^10                 A^11
                     a_(6^11)
                     .....
a_(6^10)     →       a_(6^10)
.....        →       .....
.....        →       .....
a_2          →       a_2
a_1          →       a_1

We define a bijective function that transforms the sequence a_1 belonging to the set A^10
into the sequence a_1 belonging to the set A^11, and we continue in this way for all the
sequences of the set A^10, as indicated by the arrows.
We thus obtain a bijective function f that transforms the set A^10 into a subset B^11 of the
set A^11, with B^11 ⊂ A^11 and |A^10| = |B^11|:
f: A^10 → B^11
We call f_m the bijective function defined in the previous example, in which the sequences
a ∈ A^10 are transformed into the lower-entropy sequences belonging to A^11, according to
the scheme defined by the arrows.
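The construction of f_m can be sketched at a reduced scale (a 3-symbol alphabet with N = 4, since enumerating 6^10 sequences is impractical). The sketch ranks sequences by their empirical, zero-order information content, which is one plausible reading of the slides' "entropy value"; that choice is an assumption of this illustration:

```python
import itertools
import math
from collections import Counter

def info_content(seq):
    # ASSUMED rank key: empirical (zero-order) information content in bits,
    # -sum_j log2 p_hat(x_j), where p_hat is each symbol's frequency in seq itself.
    n, counts = len(seq), Counter(seq)
    return -sum(c * math.log2(c / n) for c in counts.values())

A, N = (1, 2, 3), 4  # toy alphabet; the slides' 6^10 sequences are too many to list
A_N  = sorted(itertools.product(A, repeat=N),     key=info_content)
A_N1 = sorted(itertools.product(A, repeat=N + 1), key=info_content)

# f_m pairs the i-th lowest-information sequence of A^N with the i-th
# lowest-information sequence of A^(N+1); its image B is a strict subset of A^(N+1).
f_m = dict(zip(A_N, A_N1))
print(f_m[(1, 1, 1, 1)])                       # the constant sequence maps to (1, 1, 1, 1, 1)
print(len(f_m), "of", len(A_N1), "sequences")  # 81 of 243: the rest get probability zero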
To understand the advantages of this theory we need to define some functions.
Given a sequence a_i = (x_1, ..., x_N) generated by a source X = (x; A; P), we define
its information content as follows:

$$I(a_i) = -\sum_{j=1}^{N} \log p(x_j)$$

The probability P(a_i) that the source X generates the sequence a_i is:

$$P(a_i) = \prod_{j=1}^{N} p(x_j)$$

Consequently, the average information content of a sequence generated by a source
X = (x; A; P) is:

$$I(a) = \sum_{i=1}^{|A|^N} P(a_i)\, I(a_i)$$
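A minimal Python sketch of these quantities for the fair-die source:

```python
import math

p = {face: 1 / 6 for face in range(1, 7)}  # fair six-sided die

def I(seq):
    # Information content of a sequence: -sum_j log2 p(x_j).
    return -sum(math.log2(p[x]) for x in seq)

def P(seq):
    # Probability that the source emits the sequence: product of symbol probabilities.
    return math.prod(p[x] for x in seq)

a = (3, 1, 4, 1, 5)
print(I(a))  # 5 * log2(6) ~ 12.92 bits, since every symbol is equiprobable
print(P(a))  # (1/6)^5 ~ 1.3e-4
```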
Now, we apply the bijective function f to the set A^N:

$$f: A^N \to B^{N+K}$$

with K, N ∈ ℕ, K > 0, and |A^N| = |B^{N+K}|:

$$f(a) = b$$

with a = (a_1, ..., a_N), b = (b_1, ..., b_{N+K}), a ∈ A^N and b ∈ B^{N+K}.
The parameter K is called the shaping order of the source and represents the difference in
length between the source sequences belonging to A^N and the transformed sequences
belonging to B^{N+K}.
Given a source X = (x; A; P) and a function f, we have:

$$I(a) = \sum_{i=1}^{|A|^N} P(a_i)\, I(a_i)$$

$$I(b) = \sum_{i=1}^{|A|^N} P(a_i)\, I(b_i)$$

$$I(a_i) = -\sum_{j=1}^{N} \log p(x_j) \quad \text{with } a_i \in A^N$$

$$I(b_i) = -\sum_{j=1}^{N+K} \log p(b_j) \quad \text{with } b_i \in B^{N+K}$$
Using these definitions, the f_m function of the example is defined as follows:
given the set f_m(A^N) = B^{N+K} and its complement A^{N+K} − B^{N+K} = C^{N+K}, for each
sequence b ∈ B^{N+K} the information content I(b) is always less than I(c) for every
c ∈ C^{N+K}, and I(b_i) < I(b_{i+1}) for all b_i ∈ B^{N+K}.
If we apply the function f_m, we would expect I(b) ≥ I(a); instead, the situation is
much more complex: when |A| > 2, we have I(b) < I(a).
The table shows the values, in bits, of I(a), I(b), and I(a) − I(b), with K = 1, N = 100,
and f = f_m, for a source X = (x; A; P) with |A| ranging from 2 to 10 and
uniform probability distribution P(x_i) = 1/|A|.
|A|    I(a)       I(b)       I(a) − I(b)
 2      99.275     99.659     −0.383
 3     157.044    157.040      0.004
 4     197.819    197.324      0.495
 5     229.271    228.304      0.968
 6     254.843    253.401      1.443
 7     276.353    274.464      1.889
 8     294.868    292.527      2.341
 9     311.121    308.383      2.738
10     325.570    322.388      3.181
The data in the table show an extremely interesting and unexpected result. Indeed,
when |A| > 2, the average information content of a sequence b randomly extracted from the
set B^{N+1} turns out to be less than the average information content of a sequence a
randomly extracted from the set A^N.
The data in the table refer to N = 100; however, these results remain valid for values
of N both lower and higher than 100.
It is important to specify that the set B^{N+1} contains sequences that all have the same
length and are all distinct from one another, so the result obtained does not violate the
pigeonhole principle.
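The comparison behind the table can be reproduced at a small scale. The sketch below uses N = 8 rather than N = 100 (so the sets stay enumerable) and again assumes that sequences are ranked and scored by empirical information content; the table's exact values come from the published MATLAB experiment, not from this sketch:

```python
import itertools
import math
from collections import Counter

def info_content(seq):
    # Empirical (zero-order) information content in bits (assumed scoring rule).
    n, counts = len(seq), Counter(seq)
    return -sum(c * math.log2(c / n) for c in counts.values())

def average_information(num_symbols, N, K=1):
    # Average I over A^N versus over B^(N+K), the image of the rank-preserving f_m.
    source = [info_content(s) for s in itertools.product(range(num_symbols), repeat=N)]
    target = sorted(info_content(s)
                    for s in itertools.product(range(num_symbols), repeat=N + K))
    B = target[:len(source)]  # the |A|^N lowest-information sequences of A^(N+K)
    return sum(source) / len(source), sum(B) / len(B)

for A in (2, 3, 4):  # uniform sources: P(a_i) is constant, so plain means suffice
    Ia, Ib = average_information(A, N=8)
    print(f"|A|={A}: I(a)={Ia:.3f}  I(b)={Ib:.3f}  I(a)-I(b)={Ia - Ib:+.3f}")
```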
The reasons why Set Shaping Theory represents a revolution in information theory
1) By introducing conceptually very advanced elements hypothesized by Riemann,
this theory represents a completely different point of view from the one proposed by
Shannon.
2) It develops a new class of bijection functions with properties of strong practical
relevance in many fields.
3) It can help us solve many open problems concerning information theory.
4) It raises important questions about entropy that can allow us to better understand
this important function.
Finally, and most importantly, it is a new field with countless possible results and
applications yet to be discovered.