Information Theory: Principles and Applications
Tiago T. V. Vinhoza
March 19, 2010
1 Course Information
2 What is Information Theory?
3 Review of Probability Theory
4 Information Measures
Course Information
Information Theory: Principles and Applications
Prof. Tiago T. V. Vinhoza
Office: FEUP Building I, Room I322
Office hours: Wednesdays from 14h30-15h30.
Email: tiago.vinhoza@ieee.org
Prof. José Vieira
Prof. Paulo Jorge Ferreira
Course Information
Information Theory: Principles and Applications
http://paginas.fe.up.pt/~vinhoza (link for Info Theory)
Homeworks
Other notes
My evaluation: (almost) weekly homework assignments + final exam
References:
Elements of Information Theory, Cover and Thomas, Wiley
Information Theory and Reliable Communication, Gallager
Information Theory, Inference, and Learning Algorithms, MacKay
(available online)
What is Information Theory?
What is Information Theory?
IT is a branch of math (a strictly deductive system). (C. Shannon,
The bandwagon)
General statistical concept of communication. (N. Wiener, What is
IT?)
It was built upon the work of Shannon (1948).
It answers two fundamental questions in Communications Theory:
What is the fundamental limit for information compression?
What is the fundamental limit on information transmission rate over a
communications channel?
What is Information Theory?
What is Information Theory?
Mathematics: Inequalities
Computer Science: Kolmogorov Complexity
Statistics: Hypothesis Testing
Probability Theory: Limit Theorems
Engineering: Communications
Physics: Thermodynamics
Economics: Portfolio Theory
What is Information Theory?
Communications Systems
The fundamental problem of communication is that of reproducing at
one point either exactly or approximately a message selected at
another point. (Claude Shannon: A Mathematical Theory of
Communication, 1948)
What is Information Theory?
Digital Communications Systems
Source
Source Coder: Converts an analog or digital source into bits.
Channel Coder: Adds protection against errors/erasures in the channel.
Modulator: Assigns each binary sequence to a waveform.
Channel: Physical Medium to send information from transmitter to
receiver. Source of randomness.
Demodulator, Channel Decoder, Source Decoder, Sink.
What is Information Theory?
Digital Communications Systems
Modulator + Channel = Discrete Channel.
Binary Symmetric Channel.
Binary Erasure Channel.
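To make the "Modulator + Channel = Discrete Channel" abstraction concrete, here is a minimal Python sketch that simulates a binary symmetric channel and a binary erasure channel; the function names and the crossover/erasure probabilities (0.1 and 0.2) are illustrative assumptions, not values from the course.

```python
import random

def bsc(bits, p):
    """Binary symmetric channel: each bit is flipped with probability p."""
    return [b ^ 1 if random.random() < p else b for b in bits]

def bec(bits, eps):
    """Binary erasure channel: each bit is erased (None) with probability eps."""
    return [None if random.random() < eps else b for b in bits]

random.seed(0)
tx = [random.randint(0, 1) for _ in range(10_000)]
rx_bsc = bsc(tx, p=0.1)
rx_bec = bec(tx, eps=0.2)
print("BSC empirical flip rate:   ", sum(a != b for a, b in zip(tx, rx_bsc)) / len(tx))
print("BEC empirical erasure rate:", sum(r is None for r in rx_bec) / len(tx))
```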
Review of Probability Theory
Review of Probability Theory
Axiomatic Approach
Relative Frequency Approach
Review of Probability Theory
Axiomatic Approach
Application of a mathematical theory called Measure Theory.
It is based on a triplet
(Ω, F, P)
where
Ω is the sample space, which is the set of all possible outcomes.
F is the σ−algebra, which is the set of all possible events (or
combinations of outcomes).
P is the probability function, a set function whose domain is F and
whose range is the closed unit interval [0,1]. It must obey the
following rules:
P(Ω) = 1
Let A be any event in F, then P(A) ≥ 0.
Let A and B be two events in F such that A ∩ B = ∅, then
P(A ∪ B) = P(A) + P(B).
Review of Probability Theory
Axiomatic Approach: Other properties
Probability of the complement: P(Ā) = 1 − P(A).
P(A) ≤ 1.
P(∅) = 0.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Review of Probability Theory
Conditional Probability
Let A and B be two events, with P(A) > 0. The conditional
probability of B given A is defined as:
P(B|A) = P(A ∩ B) / P(A)
Hence, P(A ∩ B) = P(B|A)P(A) = P(A|B)P(B)
If A ∩ B = ∅ then P(B|A) = 0.
If A ⊂ B, then P(B|A) = 1.
Review of Probability Theory
Bayes Rule
If A and B are events
P(A|B) = P(B|A)P(A) / P(B)
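A small numerical sketch of Bayes' rule in Python (the defect/test scenario and all probabilities below are made-up illustrations); the denominator P(B) is expanded with the total probability theorem of the next slide.

```python
# Hypothetical events: A = "component is defective", B = "test flags the component".
p_A = 0.02             # prior P(A)
p_B_given_A = 0.95     # P(B|A), test sensitivity
p_B_given_notA = 0.10  # P(B|not A), false-alarm rate

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes rule: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(f"P(B) = {p_B:.4f}, P(A|B) = {p_A_given_B:.4f}")  # posterior well above the 2% prior
```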
Review of Probability Theory
Total Probability Theorem
A set of events Bi , i = 1, . . . , n, is a partition of Ω when:
B1 ∪ B2 ∪ · · · ∪ Bn = Ω.
Bi ∩ Bj = ∅ if i ≠ j.
Theorem: If A is an event and Bi , i = 1, . . . , n, is a partition of Ω,
then:
P(A) = ∑_{i=1}^{n} P(A ∩ Bi ) = ∑_{i=1}^{n} P(A|Bi )P(Bi )
Review of Probability Theory
Independence between Events
Two events A and B are statistically independent when
P(A ∩ B) = P(A)P(B)
Supposing that both P(A) and P(B) are greater than zero, from the
above definition we have that:
P(A|B) = P(A) and P(B|A) = P(B)
Independent events and mutually exclusive events are different!
Review of Probability Theory
Independence between events
N events are statistically independent if the intersection of the events
in any subset of those N events has probability equal to the product of
the individual probabilities.
Example: Three events A, B and C are independent if:
P(A∩B) = P(A)P(B), P(A∩C) = P(A)P(C), P(B∩C) = P(B)P(C)
P(A ∩ B ∩ C) = P(A)P(B)P(C)
Review of Probability Theory
Random Variables
A random variable (rv) is a function that maps each ω ∈ Ω to a real
number.
X : Ω → R
ω → X(ω)
Through a random variable, subsets of Ω are mapped as subsets
(intervals) of the real numbers.
P(X ∈ I) = P({ω | X(ω) ∈ I})
Review of Probability Theory
Random Variables
A real random variable is a function whose domain is Ω and such that
for every real number x, the set Ax = {ω | X(ω) ≤ x} is an event.
P({ω | X(ω) = ±∞}) = 0.
Review of Probability Theory
Cumulative Distribution Function
FX : R → [0, 1]
x → FX (x) = P(X ≤ x) = P({ω | X(ω) ≤ x})
FX (∞) = 1
FX (−∞) = 0
If x1 < x2, FX (x2) ≥ FX (x1).
FX (x⁺) = lim_{ϵ→0⁺} FX (x + ϵ) = FX (x) (right-continuous).
FX (x) − FX (x−) = P(X = x).
Review of Probability Theory
Types of Random Variables
Discrete: Cumulative function is a step function (sum of unit step
functions)
FX (x) = ∑_{i} P(X = xi ) u(x − xi )
where u(x) is the unit step function.
Example: X is the random variable that describes the outcome of the
roll of a die. X ∈ {1, 2, 3, 4, 5, 6}
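A minimal Python sketch of this example, assuming a fair die: the CDF accumulates the mass P(X = xi) of every outcome xi ≤ x, which is exactly the staircase described by the unit-step sum above.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die: P(X = x) = 1/6

def cdf(x):
    """F_X(x) = sum_i P(X = xi) u(x - xi): total mass of outcomes xi <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

for x in (0.5, 1, 2.5, 6):
    print(f"F_X({x}) = {cdf(x)}")   # 0, 1/6, 1/3, 1
```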
Review of Probability Theory
Types of Random Variable
Continuous: Cumulative distribution function is a continuous function.
Mixed: Neither discrete nor continuous.
Review of Probability Theory
Probability Density Function
It is the derivative of the cumulative distribution function:
pX (x) = dFX (x)/dx
∫_{−∞}^{x} pX (u) du = FX (x).
pX (x) ≥ 0.
∫_{−∞}^{∞} pX (x) dx = 1.
∫_{a}^{b} pX (x) dx = FX (b) − FX (a) = P(a ≤ X ≤ b).
P(X ∈ I) = ∫_{I} pX (x) dx, I ⊂ R.
Review of Probability Theory
Discrete Random Variables
Let us now focus only on discrete random variables.
Let X be a random variable with sample space X .
The probability mass function (probability distribution function) of X
is a mapping pX (x) : X → [0, 1] satisfying:
∑_{x∈X} pX (x) = 1
The number pX (x) := P(X = x).
Review of Probability Theory
Discrete Random Vectors
Let Z = [X, Y ] be a random vector with sample space Z = X × Y
The joint probability mass function (probability distribution function)
of Z is a mapping pZ (z) : Z → [0, 1] satisfying:
∑_{z∈Z} pZ (z) = ∑_{(x,y)∈X×Y} pXY (x, y) = 1
The number pZ (z) := pXY (x, y) = P(Z = z) = P(X = x, Y = y).
Review of Probability Theory
Discrete Random Vectors
Marginal Distributions
pX (x) = ∑_{y∈Y} pXY (x, y)
pY (y) = ∑_{x∈X} pXY (x, y)
Review of Probability Theory
Discrete Random Vectors
Conditional Distributions
pX|Y=y (x) = pXY (x, y) / pY (y)
pY|X=x (y) = pXY (x, y) / pX (x)
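The sketch below takes a small joint pmf (the four probabilities are illustrative assumptions) and derives the marginals and a conditional distribution exactly as defined above; the same toy table is reused in the information-measure examples further on.

```python
# Hypothetical joint pmf p_XY(x, y) on X = {0, 1}, Y = {0, 1} (illustrative values).
p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.2, (1, 1): 0.3}

# Marginals: p_X(x) = sum_y p_XY(x, y) and p_Y(y) = sum_x p_XY(x, y).
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

# Conditional: p_{X|Y=1}(x) = p_XY(x, 1) / p_Y(1).
p_x_given_y1 = {x: p_xy[(x, 1)] / p_y[1] for x in (0, 1)}

print("p_X       =", p_x)            # {0: 0.5, 1: 0.5}
print("p_Y       =", p_y)            # roughly {0: 0.6, 1: 0.4}
print("p_{X|Y=1} =", p_x_given_y1)   # roughly {0: 0.25, 1: 0.75}
```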
Review of Probability Theory
Discrete Random Vectors
Random variables X and Y are independent if and only if
pXY (x, y) = pX (x)pY (y)
Consequences:
pX|Y=y (x) = pX (x)
pY|X=x (y) = pY (y)
Review of Probability Theory
Moments of a Discrete Random Variable
The n-th order moment of a discrete random variable X is defined as:
E[X^n] = ∑_{x∈X} x^n pX (x)
If n = 1, we have the mean of X, mX = E[X].
The m-th order central moment of a discrete random variable X is
defined as:
E[(X − mX )^m] = ∑_{x∈X} (x − mX )^m pX (x)
If m = 2, we have the variance of X, σ_X^2.
Review of Probability Theory
Moments of a Discrete Random Vector
The joint moment of order n in X and order k in Y :
m_{nk} = E[X^n Y^k] = ∑_{x∈X} ∑_{y∈Y} x^n y^k pXY (x, y)
The joint central moment of order n in X and order k in Y :
µ_{nk} = E[(X − mX )^n (Y − mY )^k]
= ∑_{x∈X} ∑_{y∈Y} (x − mX )^n (y − mY )^k pXY (x, y)
Review of Probability Theory
Correlation and Covariance
The correlation of two random variables X and Y is the expected value
of their product (joint moment of order 1 in X and order 1 in Y ):
Corr(X, Y ) = m11 = E[XY ]
The covariance of two random variables X and Y is the joint central
moment of order 1 in X and order 1 in Y:
Cov(X, Y ) = µ11 = E[(X − mX )(Y − mY )]
Cov(X, Y ) = Corr(X, Y ) − mX mY
Correlation Coefficient:
ρXY = Cov(X, Y ) / (σX σY ), with −1 ≤ ρXY ≤ 1
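A short Python sketch of these definitions on the same illustrative joint pmf: it computes E[XY], the covariance, and the correlation coefficient, and checks the identity Cov(X, Y) = Corr(X, Y) − mX mY.

```python
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # illustrative joint pmf

def expect(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in p_xy.items())

m_x, m_y = expect(lambda x, y: x), expect(lambda x, y: y)
corr = expect(lambda x, y: x * y)                    # Corr(X, Y) = E[XY]
cov = expect(lambda x, y: (x - m_x) * (y - m_y))     # Cov(X, Y)
rho = cov / math.sqrt(expect(lambda x, y: (x - m_x) ** 2) *
                      expect(lambda x, y: (y - m_y) ** 2))

print(f"Corr = {corr:.3f}, Cov = {cov:.3f}, rho = {rho:.3f}")
assert abs(cov - (corr - m_x * m_y)) < 1e-12         # Cov = Corr - mX*mY
```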
Information Measures
What is Information?
It is a measure that quantifies the uncertainty of an event with a given
probability (Shannon, 1948).
For a discrete source with finite alphabet X = {x0, x1, . . . , xM−1}
where the probability of each symbol is given by P(X = xk) = pk
I(xk ) = log(1/pk ) = − log(pk )
If the logarithm is base 2, the information is measured in bits.
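A one-function Python sketch of the self-information in bits; evaluating it at a few probabilities reproduces some of the "surprise" values tabulated on the next slide.

```python
import math

def self_info(p):
    """I(x) = log2(1/p) = -log2(p), the information of an outcome with probability p, in bits."""
    return math.log2(1 / p)

for p in (1, 3/4, 1/2, 1/4, 6/36):
    print(f"p = {p:.3f} -> I = {self_info(p):.3f} bits")   # 0, 0.415, 1, 2, 2.585
```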
Information Measures
What is Information?
It represents the surprise of seeing the outcome (a highly probable
outcome is not surprising).
event | probability | surprise
one equals one | 1 | 0 bits
wrong guess on a 4-choice question | 3/4 | 0.415 bits
correct guess on true-false question | 1/2 | 1 bit
correct guess on a 4-choice question | 1/4 | 2 bits
seven on a pair of dice | 6/36 | 2.58 bits
win any prize at Euromilhões | 1/24 | 4.585 bits
win Euromilhões Jackpot | ≈ 1/76 million | ≈ 26 bits
gamma ray burst mass extinction today | < 2.7 · 10^−12 | > 38 bits
Information Measures
Entropy
Expected value of information from a source.
H(X) = E[I(X)] = ∑_{x∈X} pX (x) I(x)
= − ∑_{x∈X} pX (x) log pX (x)
Information Measures
Entropy of binary source
Let X be a binary source with p0 and p1 being the probability of
symbols x0 and x1 , respectively.
H(X) = −p0 log p0 − p1 log p1
= −p0 log p0 − (1 − p0) log(1 − p0)
Information Measures
Entropy of binary source
[Figure: binary entropy H(X) versus p0 , equal to 0 at p0 = 0 and p0 = 1
and reaching its maximum of 1 bit at p0 = 0.5.]
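A minimal sketch that evaluates the binary entropy function H(X) = −p0 log2 p0 − (1 − p0) log2(1 − p0); sampling it over p0 ∈ [0, 1] traces the curve in the figure above, with the maximum of 1 bit at p0 = 0.5.

```python
import math

def h2(p0):
    """Binary entropy in bits, with the convention H = 0 at p0 = 0 or 1."""
    if p0 in (0.0, 1.0):
        return 0.0
    return -p0 * math.log2(p0) - (1 - p0) * math.log2(1 - p0)

for p0 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p0 = {p0:.2f}  H(X) = {h2(p0):.3f} bits")
```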
Information Measures
Joint Entropy
The joint entropy of a pair of random variables X and Y is given by:
H(X, Y ) = − ∑_{y∈Y} ∑_{x∈X} pXY (x, y) log pXY (x, y)
Information Measures
Conditional Entropy
Average amount of information of a random variable given the
occurrence of another.
H(X|Y ) = ∑_{y∈Y} pY (y) H(X|Y = y)
= − ∑_{y∈Y} pY (y) ∑_{x∈X} pX|Y=y (x) log pX|Y=y (x)
= − ∑_{y∈Y} ∑_{x∈X} pXY (x, y) log pX|Y=y (x)
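A sketch that evaluates H(X, Y), H(Y) and H(X|Y) on the illustrative joint pmf used earlier; it also verifies, numerically, the chain rule of the next slide in the form H(X, Y) = H(Y) + H(X|Y).

```python
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # illustrative joint pmf
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

H_xy = -sum(p * math.log2(p) for p in p_xy.values())             # joint entropy
H_y = -sum(p * math.log2(p) for p in p_y.values())               # H(Y)
H_x_given_y = -sum(p * math.log2(p / p_y[y])                     # H(X|Y)
                   for (_, y), p in p_xy.items())

print(f"H(X,Y) = {H_xy:.4f}  H(Y) = {H_y:.4f}  H(X|Y) = {H_x_given_y:.4f}")
assert abs(H_xy - (H_y + H_x_given_y)) < 1e-12                   # chain rule
```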
Information Measures
Chain Rule of Entropy
The entropy of a pair of random variables is equal to the entropy of
one of them plus the conditional entropy of the other given the first.
H(X, Y ) = H(X) + H(Y |X)
Corollary
H(X, Y |Z) = H(X|Z) + H(Y |X, Z)
Information Measures
Chain Rule of Entropy: Generalization
H(X1, X2, . . . , XM ) = ∑_{j=1}^{M} H(Xj |X1, . . . , Xj−1)
Information Measures
Relative Entropy: Kullback-Leibler Distance
It is a measure of the distance between two distributions.
The relative entropy between two probability mass functions pX (x)
and qX (x) is defined as:
D(pX (x)||qX (x)) = ∑_{x∈X} pX (x) log [ pX (x) / qX (x) ]
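A minimal sketch of the relative entropy in bits between two pmfs on the same alphabet (both distributions are illustrative); running it with the arguments swapped shows the asymmetry stated on the next slide.

```python
import math

def kl(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

p = {0: 0.5, 1: 0.5}   # illustrative distributions on a binary alphabet
q = {0: 0.8, 1: 0.2}

print(f"D(p||q) = {kl(p, q):.4f} bits")
print(f"D(q||p) = {kl(q, p):.4f} bits")   # in general different from D(p||q)
```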
Information Measures
Relative Entropy: Kullback-Leibler Distance
D(pX (x)||qX (x)) ≥ 0 with equality if and only if pX (x) = qX (x).
D(pX (x)||qX (x)) ≠ D(qX (x)||pX (x))
Information Measures
Mutual Information
The mutual information of two random variables X and Y is defined
as the relative entropy between the joint distribution pXY (x, y) and
the product of the marginals pX (x) and pY (y):
I(X; Y ) = D(pXY (x, y)||pX (x)pY (y))
= ∑_{x∈X} ∑_{y∈Y} pXY (x, y) log [ pXY (x, y) / (pX (x)pY (y)) ]
Information Measures
Mutual Information: Relations with Entropy
Reduction in the uncertainty of X due to the knowledge of Y :
I(X; Y ) = H(X) − H(X|Y )
Symmetry of the relation above: I(X; Y ) = H(Y ) − H(Y |X)
Sum of entropies:
I(X; Y ) = H(X) + H(Y ) − H(X, Y )
“Self” Mutual Information:
I(X; X) = H(X) − H(X|X) = H(X)
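A sketch that computes I(X; Y) from the definition and checks the identity I(X; Y) = H(X) + H(Y) − H(X, Y), again on the illustrative joint pmf used in the earlier examples.

```python
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # illustrative joint pmf
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

def H(dist):
    """Entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Definition: I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ).
I_def = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

# Entropy identity: I(X;Y) = H(X) + H(Y) - H(X,Y).
I_ent = H(p_x) + H(p_y) - H(p_xy)

print(f"I(X;Y) = {I_def:.4f} bits")
assert abs(I_def - I_ent) < 1e-12
```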
Information Measures
Mutual Information: Other Relations
Conditional Mutual Information:
I(X; Y |Z) = H(X|Z) − H(X|Y , Z)
Chain Rule for Mutual Information
I(X1, X2, . . . , XM ; Y ) = ∑_{j=1}^{M} I(Xj ; Y |X1, . . . , Xj−1)
Information Measures
Convex and Concave Functions
A function f (·) is convex over an interval (a, b) if for every
x1, x2 ∈ [a, b] and 0 ≤ λ ≤ 1:
f (λx1 + (1 − λ)x2) ≤ λf (x1) + (1 − λ)f (x2)
A function f (·) is convex over an interval (a, b) if its second derivative
is non-negative over that interval (a, b).
A function f (·) is concave if −f (·) is convex.
Examples of convex functions: x^2, |x|, e^x , and x log x for x ≥ 0.
Examples of concave functions: log x and √x, for x ≥ 0.
Information Measures
Jensen’s Inequality
If f (·) is a convex function and X is a random variable
E[f (X)] ≥ f (E[X])
Used to show that relative entropy and mutual information are
non-negative.
Used also to show that H(X) ≤ log |X|.
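A quick numerical illustration of Jensen's inequality for the convex function f(x) = x² and a small discrete random variable (values and probabilities are illustrative):

```python
values = (1, 2, 6)          # illustrative support of X
probs = (0.5, 0.3, 0.2)     # illustrative probabilities

f = lambda x: x ** 2        # a convex function
E_X = sum(p * x for x, p in zip(values, probs))
E_fX = sum(p * f(x) for x, p in zip(values, probs))

print(f"E[f(X)] = {E_fX:.2f} >= f(E[X]) = {f(E_X):.2f}")   # Jensen's inequality
```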
Information Measures
Log-Sum Inequality
For n positive numbers a1, a2, . . . , an and b1, b2, . . . , bn :
∑_{i=1}^{n} ai log(ai /bi ) ≥ ( ∑_{i=1}^{n} ai ) log [ ( ∑_{i=1}^{n} ai ) / ( ∑_{i=1}^{n} bi ) ]
with equality if and only if ai /bi is the same constant c for all i.
This inequality is used to prove the convexity of the relative entropy
and the concavity of the entropy.
It is also used to establish convexity/concavity properties of mutual information.
Information Measures
Data Processing Inequality
Random variables X, Y , Z are said to form a Markov chain in that
order, X → Y → Z, if the conditional distribution of Z depends only
on Y and is conditionally independent of X:
pXYZ (x, y, z) = pX (x) pY|X=x (y) pZ|Y=y (z)
If X → Y → Z, then
I(X; Y ) ≥ I(X; Z)
Let Z = g(Y ), X → Y → g(Y ), then I(X; Y ) ≥ I(X; g(Y ))
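A closed-form sketch of the data processing inequality for a chain X → Y → Z built from two cascaded binary symmetric channels (the crossover probability 0.1 and the uniform input are illustrative assumptions); the mutual information cannot grow along the chain.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mi(p_one, eps):
    """I(input; output) of a BSC with crossover eps and input P(X=1) = p_one:
    I = H(output) - H(output|input) = h2(output bias) - h2(eps)."""
    p_out_one = p_one * (1 - eps) + (1 - p_one) * eps
    return h2(p_out_one) - h2(eps)

eps = 0.1      # illustrative crossover probability of each BSC
p_x = 0.5      # uniform input
I_xy = bsc_mi(p_x, eps)
# Two cascaded BSC(eps) are equivalent to one BSC with crossover 2*eps*(1-eps).
I_xz = bsc_mi(p_x, 2 * eps * (1 - eps))

print(f"I(X;Y) = {I_xy:.4f} bits  >=  I(X;Z) = {I_xz:.4f} bits")
```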
Information Measures
Fano’s Inequality
Suppose we know a random variable Y and we wish to guess the value
of a correlated random variable X.
Fano’s inequality relates the probability of error in guessing X from Y
to its conditional entropy H(X|Y ).
Let X̂ = g(Y ), if Pe = P(X̂ ≠ X), then
H(Pe) + Pe log(|X| − 1) ≥ H(X|Y )
where H(Pe) is the binary entropy function evaluated at Pe.
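A small numerical check of Fano's inequality on the illustrative joint pmf from the earlier examples, with the estimate X̂ = g(Y) taken as the most likely x for each observed y; since the alphabet here is binary, the log(|X| − 1) term vanishes.

```python
import math

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # illustrative joint pmf
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

# Conditional entropy H(X|Y).
H_x_given_y = -sum(p * math.log2(p / p_y[y]) for (_, y), p in p_xy.items())

# Guess g(y) = argmax_x p_XY(x, y); error probability Pe = P(g(Y) != X).
g = {y: max((0, 1), key=lambda x: p_xy[(x, y)]) for y in (0, 1)}
Pe = sum(p for (x, y), p in p_xy.items() if g[y] != x)

# Fano: H(Pe) + Pe*log2(|X|-1) >= H(X|Y); here |X| = 2, so the second term is 0.
print(f"Pe = {Pe:.2f}  H(Pe) = {h2(Pe):.4f}  H(X|Y) = {H_x_given_y:.4f}")
print("Fano bound holds:", h2(Pe) >= H_x_given_y)
```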