UNIVERSITY OF MINES AND TECHNOLOGY (UMAT) - TARKWA
School of Railways and Infrastructure Development (SRID)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
INFORMATION THEORY
CE 379
Monday, 13 February 2023
Course Lecturer: Engr Dr Albert K Kwansah Ansah PE MIETGH
Prepared by: Engr Dr A K Kwansah Ansah
ENTROPY, RELATIVE ENTROPY AND MUTUAL INFORMATION
In this chapter:
• We will introduce certain key measures of information that play crucial roles in theoretical and operational characterisations throughout the course: the entropy, the mutual information, and the relative entropy.
• We will also examine some key properties of these information measures.
ENTROPIES DEFINED AND WHY THEY ARE MEASURES OF INFORMATION
Notation
• Random variables (objects): denoted by capital letters such as X and Y, used a little more "loosely" than in formal probability theory
• Alphabets: the sets of values they may take, e.g. AX = {a1, a2, ..., aI} and AY = {b1, b2, ..., bJ}
• Specific values: lower-case letters such as x and y, with x ∈ AX and y ∈ AY
• The information gained from observing an event of probability p is
   I(p) = log2(1/p) = −log2(p) bits (1)
• Intuitively, the more improbable an event is, the more informative it is; and so the monotonic behaviour of (1) seems appropriate.
But why the logarithm?
• The log measure is justified by the desire for information to be additive, so that the algebra reflects the Rules of Probability:
• the total information received is the sum of the individual pieces, while the probabilities of independent events multiply to give their combined probability.
• Logs are taken in order for the joint probability of independent events or messages to contribute additively to the information gained.
NB: This principle can also be understood in terms
of the combinatorics of state spaces.
Example:
Assume we have two independent problems, one with n possible solutions or states each having probability pn, and the other with m possible solutions or states each having probability pm.
The number of combined states is then mn, and each of them has probability pmpn. We want the information gained by specifying the solution to both problems to be the sum of that gained from each one.
The logarithm delivers exactly this property:
   log2(pmpn) = log2(pm) + log2(pn) (2)
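As a quick numerical illustration of (2) in Python (the two probabilities below are arbitrary values chosen only for the example):

import math

p_m = 0.25   # probability of one solution of the m-state problem (illustrative)
p_n = 0.10   # probability of one solution of the n-state problem (illustrative)

info_m = -math.log2(p_m)            # information from solving the first problem
info_n = -math.log2(p_n)            # information from solving the second problem
info_both = -math.log2(p_m * p_n)   # information from solving both at once

print(info_m + info_n, info_both)   # both are ~5.322 bits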
A Note on Logarithms:
In information theory we wish to compute base-2 logarithms of quantities, but most calculators offer only Napierian (base 2.718...) and decimal (base 10) logarithms. So the following conversions are useful:
   log2(X) = loge(X) / loge(2) ≈ 1.443 loge(X)
   log2(X) = log10(X) / log10(2) ≈ 3.322 log10(X)
Henceforth we will omit the subscript; base 2 is always presumed.
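A minimal Python check of these conversions (math.log2 computes the base-2 logarithm directly):

import math

x = 26.0
print(math.log2(x))                   # direct base-2 logarithm
print(math.log(x) / math.log(2))      # via the Napierian (natural) logarithm
print(math.log10(x) / math.log10(2))  # via the decimal logarithm
# all three print ~4.700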
Entropy of Ensembles:
• An ensemble is the set of outcomes of one or more random variables, with probabilities attached to the outcomes.
The probabilities may be non-uniform: event i has probability pi, and the probabilities sum to 1 because all possible outcomes are included. Hence they form a probability distribution:
   Σi pi = 1 (3)
• The entropy H of an ensemble is simply the average information content of its elements.
It can be computed by weighting each of the log2(pi) contributions by its probability pi:
   H = −Σi pi log2(pi) (4)
Equation (4) allows us to talk of the information content, or entropy, of a random variable from knowledge of the probability distribution that it obeys.
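A minimal Python sketch of (4); the distributions passed in are illustrative lists of probabilities assumed to sum to 1:

import math

def entropy(probs):
    """Entropy H = -sum_i p_i log2 p_i of a discrete distribution, in bits.
    Terms with p_i = 0 contribute nothing, by the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))     # 1.0 bit for a fair coin
print(entropy([1.0]))          # 0.0 bits: a certain outcome carries no information
print(entropy([1/26] * 26))    # ~4.70 bits for 26 equiprobable letters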
NB: H does not depend upon the actual values taken by the random variable, only upon their relative probabilities.
Scenario:
Consider a random variable that takes on only two values, one with probability p and the other with probability (1 − p).
H is a concave function of this distribution, and equals 0 if p = 0 or p = 1; plotted against p, it rises to its maximum of 1 bit at p = 1/2.
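A short sketch of this binary case, tabulating H(p) at a few points to show the endpoints and the maximum at p = 1/2:

import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(p, round(binary_entropy(p), 3))
# 0.0 at the endpoints, rising to 1.0 bit at p = 0.5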
Example of entropy as average uncertainty:
The various letters of the English language have known relative frequencies (probabilities); the original slide lists them in descending order.
If the 26 letters were equiprobable, the entropy of the ensemble would be log2(26) ≈ 4.7 bits. Because the actual letter probabilities are non-uniform, the entropy is lower, around 4 bits. It means that as few as four 'Yes/No' questions are needed, in principle, to identify one of the 26 letters of the alphabet.
Example of entropy as average uncertainty (cont’d):
How can this be true? That is the subject matter of Shannon’s SOURCE CODING THEOREM.
We note the important assumption: the “source statistics” must be known, i.e. the a priori probabilities of the message generator, in order to construct an optimal code.
Several further measures of entropy need to be defined, in terms of the marginal, joint, and conditional probabilities of random variables. Some key relationships will emerge that we can apply to the analysis of communication channels.
Notation
Capital letters X and Y name random variables; lower-case x and y refer to their respective outcomes.
Notation (cont’d)
These are drawn from particular sets (alphabets): x ∈ AX = {a1, a2, ..., aI} and y ∈ AY = {b1, b2, ..., bJ}.
The probability of a particular outcome, p(x = ai), is denoted pi, with 0 ≤ pi ≤ 1 and Σi pi = 1.
Joint ensemble
An ensemble is just a random variable X, whose entropy was defined in (4).
Joint ensemble (cont’d)
A joint ensemble ‘XY’ is an ensemble whose outcomes are ordered pairs x, y with x ∈ AX and y ∈ AY.
The joint ensemble XY defines a probability distribution p(x, y) over all possible joint outcomes x, y.
Marginal probability
From the Sum Rule, the probability of X taking on the value x = ai is the sum of the joint probabilities of this outcome for X over all possible outcomes for Y:
   p(x = ai) = Σy p(x = ai, y)
Marginal probability (cont’d)
We can simplify this notation to p(x) = Σy p(x, y), and similarly p(y) = Σx p(x, y) (the Sum Rule).
Conditional probability
From the Product Rule, we see that the conditional probability that x = ai, given that y = bj, is:
   p(x = ai | y = bj) = p(x = ai, y = bj) / p(y = bj)
Conditional probability (cont’d)
We can simplify this notation to p(x | y) = p(x, y) / p(y), and similarly p(y | x) = p(x, y) / p(x).
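A minimal sketch of these Sum Rule and Product Rule manipulations on a small joint distribution stored as a Python dictionary (the numbers are arbitrary, chosen only so that they sum to 1):

# joint distribution p(x, y) over x in {a1, a2} and y in {b1, b2} (illustrative values)
p_xy = {("a1", "b1"): 0.30, ("a1", "b2"): 0.20,
        ("a2", "b1"): 0.10, ("a2", "b2"): 0.40}

# marginals via the Sum Rule: p(x) = sum_y p(x, y) and p(y) = sum_x p(x, y)
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# conditionals via the Product Rule: p(x | y) = p(x, y) / p(y)
p_x_given_y = {(x, y): p / p_y[y] for (x, y), p in p_xy.items()}

print(p_x)                        # {'a1': 0.5, 'a2': 0.5}
print(p_y)                        # {'b1': 0.4, 'b2': 0.6}
print(p_x_given_y[("a1", "b1")])  # 0.30 / 0.40 = 0.75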
We now define various entropy measures for joint
ensembles:
Joint entropy of XY
   H(X, Y) = Σx,y p(x, y) log2 [1 / p(x, y)] (5)
Note, in comparing (5) with (4), that the ‘−’ sign in front is replaced by taking the reciprocal of p inside the logarithm.
From this definition, it follows that joint entropy is additive if X and Y are independent random variables:
   H(X, Y) = H(X) + H(Y) iff p(x, y) = p(x)p(y)
ASSIGNMENT: Prove this.
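A numerical sanity check of (5) and of the additivity claim (this is not the requested proof, only an illustration using an independent joint distribution built as the product of two hypothetical marginals):

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = {"a1": 0.2, "a2": 0.8}               # illustrative marginal for X
p_y = {"b1": 0.5, "b2": 0.3, "b3": 0.2}    # illustrative marginal for Y

# independent joint distribution: p(x, y) = p(x) p(y)
p_xy = {(x, y): px * py for x, px in p_x.items() for y, py in p_y.items()}

H_X = entropy(p_x.values())
H_Y = entropy(p_y.values())
H_XY = entropy(p_xy.values())      # joint entropy H(X, Y) as in (5)

print(H_XY, H_X + H_Y)             # both ~2.207 bits, because X and Y are independent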
Conditional entropy of an ensemble X, given y = bj
This measures the uncertainty remaining about random variable X after specifying that random variable Y has taken on the particular value y = bj.
It is defined naturally as the entropy of the probability distribution p(x | y = bj):
   H(X | y = bj) = Σx p(x | y = bj) log2 [1 / p(x | y = bj)] (6)
If we now consider the above quantity averaged over all possible outcomes of Y, each weighted by its probability p(y), then we arrive at the...
Conditional entropy of an ensemble X, given an ensemble Y:
   H(X | Y) = Σy p(y) Σx p(x | y) log2 [1 / p(x | y)] (7)
From the Product Rule, if we move p(y) from the outer summation over y to inside the inner summation over x, the two probability terms combine and become just p(x, y), summed over all x, y:
   H(X | Y) = Σx,y p(x, y) log2 [1 / p(x | y)] (8)
This measures the average uncertainty that remains about X when Y is known.
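A minimal sketch of (8), computing H(X | Y) directly from a joint distribution (the same illustrative joint table as before):

import math

p_xy = {("a1", "b1"): 0.30, ("a1", "b2"): 0.20,
        ("a2", "b1"): 0.10, ("a2", "b2"): 0.40}

# marginal p(y) by the Sum Rule
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

# H(X | Y) = sum_{x,y} p(x, y) log2( 1 / p(x | y) ), with p(x | y) = p(x, y) / p(y)
H_X_given_Y = sum(p * math.log2(p_y[y] / p) for (x, y), p in p_xy.items() if p > 0)

print(H_X_given_Y)   # ~0.875 bits of uncertainty about X remain once Y is known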
Chain Rule for Entropy
The joint entropy, conditional entropy, and marginal entropy for two ensembles X and Y are related by:
   H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y) (9)
The joint entropy of a pair of random variables is the entropy of one plus the conditional entropy of the other.
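A quick numerical check of (9) on the same illustrative joint distribution (the entropy helper is repeated so the snippet stands alone):

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_xy = {("a1", "b1"): 0.30, ("a1", "b2"): 0.20,
        ("a2", "b1"): 0.10, ("a2", "b2"): 0.40}

p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(Y | X) computed directly from the joint, as in (8) with the roles of X and Y swapped
H_Y_given_X = sum(p * math.log2(p_x[x] / p) for (x, y), p in p_xy.items() if p > 0)

print(entropy(p_xy.values()))               # H(X, Y) ~1.846 bits
print(entropy(p_x.values()) + H_Y_given_X)  # H(X) + H(Y | X), the same ~1.846 bits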
Corollary to the Chain Rule
If X, Y, Z are discrete random variables, conditioning the joint distribution of any two upon the third is also expressed by a chain rule:
   H(X, Y | Z) = H(X | Z) + H(Y | X, Z) (10)
Independence Bound on Entropy
A consequence of the Chain Rule for Entropy is that if there are many different random variables X1, X2, ..., Xn, then the sum of all their individual entropies is an upper bound on their joint entropy:
   H(X1, X2, ..., Xn) ≤ Σi H(Xi) (11)
Their joint entropy only reaches this upper bound if all of the random variables are independent.
Mutual Information between X and Y
The mutual information between two random variables measures the amount of information that one conveys about the other. Equivalently, it measures the average reduction in uncertainty about X that results from learning about Y. It is defined as:
   I(X; Y) = Σx,y p(x, y) log2 [ p(x, y) / (p(x) p(y)) ] (12)
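A minimal sketch of (12) on the same illustrative joint distribution; the double sum runs over all outcome pairs:

import math

p_xy = {("a1", "b1"): 0.30, ("a1", "b2"): 0.20,
        ("a2", "b1"): 0.10, ("a2", "b2"): 0.40}

p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X; Y) = sum_{x,y} p(x, y) log2( p(x, y) / (p(x) p(y)) )
I_XY = sum(p * math.log2(p / (p_x[x] * p_y[y]))
           for (x, y), p in p_xy.items() if p > 0)

print(I_XY)   # ~0.125 bits: X and Y are only weakly dependent in this example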
X says as much about Y as Y says about X.
NB: If X and Y are independent random variables, then the numerator inside the logarithm equals the denominator, and the mutual information is zero.
Non-negativity: mutual information is always ≥ 0.
When the two random variables are perfectly correlated, their mutual information is the entropy of either one alone.
Thus, I(X; X) = H(X): the mutual information of a random variable with itself is just its entropy. Hence the entropy H(X) of a random variable X is sometimes referred to as its self-information.
These properties are reflected in three equivalent definitions of the mutual information between X and Y:
   I(X; Y) = H(X) − H(X | Y) (13)
   I(X; Y) = H(Y) − H(Y | X) = I(Y; X) (14)
   I(X; Y) = H(X) + H(Y) − H(X, Y) (15)
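A numerical check that definitions (12), (13), and (15) agree on the same illustrative joint distribution used above:

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_xy = {("a1", "b1"): 0.30, ("a1", "b2"): 0.20,
        ("a2", "b1"): 0.10, ("a2", "b2"): 0.40}

p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

I_def = sum(p * math.log2(p / (p_x[x] * p_y[y]))     # definition (12)
            for (x, y), p in p_xy.items() if p > 0)

H_X, H_Y, H_XY = entropy(p_x.values()), entropy(p_y.values()), entropy(p_xy.values())
H_X_given_Y = sum(p * math.log2(p_y[y] / p) for (x, y), p in p_xy.items() if p > 0)

print(I_def)               # ~0.125 bits, from (12)
print(H_X - H_X_given_Y)   # the same value, from (13)
print(H_X + H_Y - H_XY)    # the same value, from (15)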
Effectively, the mutual information I(X; Y) is the intersection between H(X) and H(Y), since it represents their statistical dependence.
In the Venn diagram, the portion of H(X) that does not lie within I(X; Y) is just H(X | Y), and the portion of H(Y) that does not lie within I(X; Y) is just H(Y | X).
[Figure: Venn diagram illustrating the relationship between entropy and mutual information.]
Distance D(X, Y) between X and Y
The amount by which the joint entropy of two random variables exceeds their mutual information is a measure of the “distance” between them:
   D(X, Y) = H(X, Y) − I(X; Y) (16)
NB: This quantity satisfies the standard axioms for a distance:
   D(X, Y) ≥ 0, D(X, X) = 0, D(X, Y) = D(Y, X), and D(X, Z) ≤ D(X, Y) + D(Y, Z)
Relative entropy, or Kullback-Leibler distance
Another important measure of the “distance” between two random variables is the relative entropy, or Kullback-Leibler distance. It is also called the information for discrimination.
If p(x) and q(x) are two probability distributions defined over the same set of outcomes x, then their relative entropy is:
   DKL(p‖q) = Σx p(x) log2 [ p(x) / q(x) ] (17)
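A minimal sketch of (17) as a Python function; the two distributions below are arbitrary illustrative values over the same three outcomes, chosen to make the asymmetry visible:

import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) log2( p(x) / q(x) ), in bits.
    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 contribute nothing."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.8, 0.1, 0.1]      # "true" distribution (illustrative)
q = [1/3, 1/3, 1/3]      # assumed distribution (illustrative)

print(kl_divergence(p, q))   # ~0.663 bits of "inefficiency" from assuming q instead of p
print(kl_divergence(q, p))   # ~0.737 bits: the divergence is not symmetric
print(kl_divergence(p, p))   # 0.0: it vanishes when the two distributions agree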
NB: DKL(p‖q) ≥ 0, and if p(x) = q(x) then DKL(p‖q) = 0.
This measure is not strictly a “distance”, since in general it lacks symmetry: DKL(p‖q) ≠ DKL(q‖p).
Relative entropy DKL(p‖q) is a measure of the “inefficiency” of assuming that a distribution is q(x) when in fact it is p(x).
Example
If we have an optimal code for the distribution p(x), i.e. we use on average H(p(x)) bits, its entropy, to describe it, then the number of additional bits we would need if we instead described p(x) using an optimal code for q(x) would be their relative entropy DKL(p‖q).
Fano’s Inequality
We note that conditioning reduces entropy: H(X | Y) ≤ H(X).
If X and Y are perfectly correlated, then their conditional entropy is 0: if X is any deterministic function of Y, then no uncertainty remains about X once Y is known, and so H(X | Y) = 0.
Fano’s Inequality relates the probability of error Pe in guessing X from knowledge of Y to their conditional entropy H(X | Y), when the number of possible outcomes is |A| (e.g. the length of a symbol alphabet):
   Pe ≥ (H(X | Y) − 1) / log2 |A| (18)
The lower bound on Pe is thus a linearly increasing function of H(X | Y).
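A small sketch of the bound in (18), taking H(X | Y) and the alphabet size as inputs (the values used are illustrative):

import math

def fano_lower_bound(H_X_given_Y, alphabet_size):
    """Lower bound on the error probability: Pe >= (H(X|Y) - 1) / log2 |A|."""
    return (H_X_given_Y - 1.0) / math.log2(alphabet_size)

# e.g. guessing one of 26 letters when 3 bits of uncertainty about X remain after observing Y
print(fano_lower_bound(3.0, 26))   # ~0.426: at least a 42.6% chance of error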
The “Data Processing Inequality”
If random variables X, Y, and Z form a Markov chain, i.e. the conditional distribution of Z depends only on Y and is independent of X, denoted X → Y → Z, then mutual information can only decrease over steps along the chain:
   I(X; Y) ≥ I(X; Z) (19)
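A numerical illustration of (19) under assumed parameters: X is a fair bit, Y is X passed through a binary symmetric channel that flips it with probability 0.1, and Z is Y passed through a second channel with flip probability 0.2, so that X → Y → Z forms a Markov chain:

import math

def mutual_information(p_joint):
    """I between the two coordinates of a joint distribution given as {(a, b): p}."""
    p_a, p_b = {}, {}
    for (a, b), p in p_joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in p_joint.items() if p > 0)

e1, e2 = 0.1, 0.2            # channel flip probabilities (illustrative)
p_x = {0: 0.5, 1: 0.5}       # X is a fair bit

# joint p(x, y) after the first channel
p_xy = {(x, y): p_x[x] * ((1 - e1) if x == y else e1)
        for x in (0, 1) for y in (0, 1)}

# joint p(x, z) after the second channel, marginalising over the intermediate y
p_xz = {(x, z): sum(p_x[x] * ((1 - e1) if x == y else e1) * ((1 - e2) if y == z else e2)
                    for y in (0, 1))
        for x in (0, 1) for z in (0, 1)}

print(mutual_information(p_xy))   # ~0.531 bits
print(mutual_information(p_xz))   # ~0.173 bits: information can only fall along the chain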
We now turn to applying these measures and relationships to the study of communication channels.
Thank You
Good Luck