Block 2: Introduction to Information Theory
Francisco J. Escribano
October 25, 2016
Table of contents
1 Motivation
2 Entropy
3 Source coding
4 Mutual information
5 Discrete memoryless channels
6 Entropy and mutual information for continuous RRVV
7 AWGN channel capacity theorem
8 Bridging towards channel coding
9 Conclusions
10 References
Motivation
Information Theory is a discipline established during the second half of the 20th century.
It relies on solid mathematical foundations [1, 2, 3, 4].
It tries to address two basic questions:
◮ To what extent can we compress data for a more efficient usage of the
limited communication resources?
→ Entropy
◮ What is the largest possible data transfer rate for given resources and conditions?
→ Channel capacity
Key concepts for Information Theory are thus entropy ($H(X)$) and mutual information ($I(X;Y)$).
◮ X, Y are random variables (RRVV) of some kind.
◮ Information will naturally be characterized by means of RRVV.
Up to the 1940s, it was common wisdom in telecommunications that the error rate increased with increasing data rate.
◮ Claude Shannon demonstrated that error-free transmission may be possible under certain conditions.
Information Theory provides strict bounds for any communication system.
◮ Maximum compression → $H(X)$; or minimum $I(X;\hat{X})$, with controlled distortion.
◮ Maximum transfer rate → maximum I (X; Y).
◮ Any given communication system works between said limits.
The mathematics behind these results is not always constructive, but it provides guidelines to design algorithms to improve communications given a set of available resources.
◮ The resources in this context are parameters such as available transmission power,
available bandwidth, signal-to-noise ratio and the like.
[Diagrams: X → Compression → X̂ (source coding); X → Channel → Y (transmission)]
Entropy
Consider a discrete memoryless data source that issues a symbol from a given set at a given symbol rate, each symbol chosen randomly and independently from the previous and subsequent ones:
$\zeta = \{s_0, \cdots, s_{K-1}\}, \quad P(S = s_k) = p_k, \quad k = 0, 1, \cdots, K-1$
where $K$ is the source radix.
The information quantity is a random variable defined as
$I(s_k) = \log_2 \frac{1}{p_k}$
with properties:
$I(s_k) = 0$ if $p_k = 1$
$I(s_k) > I(s_i)$ if $p_k < p_i$
$I(s_k) \geq 0$, $0 \leq p_k \leq 1$
$I(s_l, s_k) = I(s_l) + I(s_k)$
The definition is conventional but also convenient: it relates uncertainty and information.
The source entropy is a measure of its “information content”, and it is defined as
$H(\zeta) = E_{\{p_k\}}[I(s_k)] = \sum_{k=0}^{K-1} p_k \cdot I(s_k) = \sum_{k=0}^{K-1} p_k \cdot \log_2 \frac{1}{p_k}$
It is bounded as $0 \leq H(\zeta) \leq \log_2(K)$:
$H(\zeta) = 0$ when $p_j = 1$ and $p_k = 0$ for $k \neq j$
$H(\zeta) = \log_2(K)$ when $p_k = \frac{1}{K}$, $k = 0, \cdots, K-1$
There is no potential information (uncertainty) in deterministic (“degenerate”) sources.
Entropy is conventionally measured in “bits”.
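As a quick numerical check of the definitions above, the following Python sketch (the helper names self_information and entropy are mine, not from the slides) evaluates $I(s_k)$ and $H(\zeta)$ for arbitrary discrete distributions.

import math

def self_information(p):
    # I(s_k) = log2(1/p_k), in bits; defined for p_k > 0.
    return math.log2(1.0 / p)

def entropy(probs):
    # H = sum_k p_k * log2(1/p_k); terms with p_k = 0 contribute nothing.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.25] * 4))                 # uniform 4-ary source: log2(4) = 2.0 bits
print(entropy([0.5, 0.25, 0.125, 0.125]))  # skewed source: 1.75 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))       # deterministic source: 0.0 bits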
E.g., for a memoryless binary source: $H(p) = -p \cdot \log_2(p) - (1-p) \cdot \log_2(1-p)$.
Figure 1: Entropy of a memoryless binary source.
Source coding
The field of source coding addresses the issues related to handling the output data from a given source, from the point of view of Information Theory.
One of the main issues is data compression, its theoretical limits and the related practical algorithms.
◮ The most important prerequisite in communications at the PHY is to keep data integrity → any transformation has to be fully invertible (lossless compression).
◮ Lossy compression is a key field at other communication levels: for example, compressing a video stream to the MPEG-2 standard to create a transport frame.
◮ In other related fields, like cryptography, some non-invertible algorithms for data compression are of utmost interest (e.g. hash functions, to create almost-unique tokens).
We may choose to represent the data from the source by assigning to each symbol $s_k$ a corresponding codeword (binary, in our case).
The aim of this source coding process is to represent the source symbols more efficiently.
◮ We focus on variable-length, binary, invertible codes → the codewords are unique blocks of binary symbols of length $l_k$, one for each symbol $s_k$.
◮ The correspondence codeword ↔ original symbol constitutes a code.
Average codeword length:
$\bar{L} = E[l_k] = \sum_{k=0}^{K-1} p_k \cdot l_k$
Coding efficiency:
$\eta = \frac{L_{\min}}{\bar{L}} \leq 1$
Shannon’s source coding theorem
This theorem establishes the limits for lossless data compression [5].
◮ $N$ iid (independent and identically distributed) random variables with entropy $H(\zeta)$ each can be compressed into $N \cdot H(\zeta)$ bits with negligible information loss risk as $N \to \infty$. If they are compressed into fewer than $N \cdot H(\zeta)$ bits, it is certain that some information will be lost.
In practical terms, for a single random variable, this means
$L_{\min} \geq H(\zeta)$
And, therefore, the coding efficiency can be defined as
$\eta = \frac{H(\zeta)}{\bar{L}}$
Like other results in Information Theory, this theorem provides the limits, but does not tell us how we can actually reach them.
Example of source coding: Huffman coding
Huffman coding provides a practical algorithm to perform source coding within the limits shown.
It is an instance of a class of codes, called prefix codes:
◮ No binary word within the codeset is the prefix of any other one.
Properties:
◮ Unique coding.
◮ Instantaneous decoding.
◮ The lengths $l_k$ meet the Kraft-McMillan inequality [2]: $\sum_{k=0}^{K-1} 2^{-l_k} \leq 1$.
◮ $\bar{L}$ of the code is bounded by $H(\zeta) \leq \bar{L} < H(\zeta) + 1$, with $H(\zeta) = \bar{L} \leftrightarrow p_k = 2^{-l_k}$.
Meeting the Kraft-McMillan inequality guarantees that a prefix code with the given $l_k$ can be constructed.
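The inequality is easy to check numerically; this small sketch (kraft_sum is a hypothetical helper name) verifies whether a prefix code can exist for a given set of lengths $l_k$.

def kraft_sum(lengths):
    # Kraft-McMillan sum: a binary prefix code with these codeword
    # lengths exists if and only if the sum does not exceed 1.
    return sum(2.0 ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> a complete prefix code exists
print(kraft_sum([2, 2, 2]))     # 0.75 -> feasible, with slack
print(kraft_sum([1, 1, 2]))     # 1.25 -> no prefix code is possible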
Huffman coding
Algorithm to perform Huffman coding:
1 List the symbols $s_k$ in a column, in order of decreasing probabilities.
2 Combine the probabilities of the last 2 symbols: the probabilities are added into a new dummy compound value/symbol.
3 Reorder the probability set in an adjacent column, putting the new dummy symbol probability as high as possible, and retiring the values involved.
4 In the process of moving probabilities to the right, keep the values of the unaffected symbols (making room for the compound value if needed), but assign a 0 to one of the two symbols just combined and a 1 to the other (top or bottom, but keep the same criterion throughout the process).
5 Start the process afresh in the new column.
6 When only two probabilities remain, assign the last 0 and 1, and stop.
7 Write the binary codeword corresponding to each original symbol by tracing back the trajectory of that symbol and the dummy symbols it takes part in, from the last column towards the initial one.
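The column procedure above is equivalent to repeatedly merging the two least probable nodes, which is compactly expressed with a priority queue. The sketch below is a heap-based variant of the same idea, not the exact tabular layout of the slides; the names and the tie-breaking counter are implementation choices of mine.

import heapq

def huffman_code(probs):
    # Build a binary Huffman code for a {symbol: probability} dict.
    # Each heap entry is [probability, tie-breaker, {symbol: partial codeword}].
    heap = [[p, i, {s: ""}] for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}        # 0 to one branch...
        merged.update({s: "1" + w for s, w in c1.items()})  # ...1 to the other
        heapq.heappush(heap, [p0 + p1, count, merged])
        count += 1
    return heap[0][2]

probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
code = huffman_code(probs)
L_avg = sum(probs[s] * len(w) for s, w in code.items())
print(code, L_avg)  # average length 2.2 bits vs H = 2.12 bits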
Huffman coding example
Figure 2: Example of Huffman coding: assigning final codeword patterns proceeds from right to left.
To characterize the resulting code, it is important to calculate:
$H(\zeta); \quad \bar{L} = \sum_{k=0}^{K-1} p_k \cdot l_k; \quad \eta = \frac{H(\zeta)}{\bar{L}}; \quad \sigma^2 = \sum_{k=0}^{K-1} p_k \cdot \left(l_k - \bar{L}\right)^2$
Managing efficiency in source coding
From the point of view of data communication (sequences of data symbols), it is useful to define the n-th extension of the source ζ:
◮ A symbol of the n-th extension of the source is built by taking n successive symbols from it (symbols in $\zeta^n$).
◮ If the source is iid, for $s_i \in \zeta$, the new extended source $\zeta^n$ is characterized by the probabilities $P(\sigma = (s_1 \cdots s_n) \in \zeta^n) = \prod_{i=1}^{n} P(s_i \in \zeta)$
◮ Given that the sequence $s_i$ in $\sigma \in \zeta^n$ is independent and identically distributed (iid) for any n, the entropy of this new source is $H(\zeta^n) = n \cdot H(\zeta)$
By applying Huffman coding to $\zeta^n$, we derive a code with average length $\bar{L}$ characterized by:
$H(\zeta^n) \leq \bar{L} < H(\zeta^n) + 1$
Managing efficiency in source coding
Considering the equivalent word length per original symbol, $\bar{L}_s = \frac{\bar{L}}{n}$:
$H(\zeta) \leq \bar{L}_s < H(\zeta) + \frac{1}{n}$
The final efficiency turns out to be
$\eta = \frac{H(\zeta^n)}{\bar{L}} = \frac{n \cdot H(\zeta)}{\bar{L}} = \frac{H(\zeta)}{\bar{L}_s}$
Efficiency thus tends to 1 as $n \to \infty$, as shown by the previous inequality.
The method of the n-th extension of the source, nevertheless, is rendered impractical for large n, due to:
◮ Increasing delay.
◮ Increasing coding and decoding complexity.
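A small numerical experiment illustrates how $\bar{L}_s$ approaches $H(\zeta)$ as n grows, for a skewed binary source. It reuses the huffman_code sketch from the Huffman slide above; extend_source is my own helper.

import itertools, math

def extend_source(probs, n):
    # n-th extension of an iid source: n-symbol blocks with product probabilities.
    ext = {}
    for combo in itertools.product(probs.items(), repeat=n):
        ext["".join(s for s, _ in combo)] = math.prod(p for _, p in combo)
    return ext

p = {"0": 0.9, "1": 0.1}          # H(zeta) ~ 0.469 bits
for n in (1, 2, 3, 4):
    ext = extend_source(p, n)
    code = huffman_code(ext)      # sketch from the Huffman slide above
    L_s = sum(ext[s] * len(w) for s, w in code.items()) / n
    print(n, round(L_s, 3))       # 1.0, 0.645, 0.533, ... decreasing towards H(zeta)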
Mutual information
Joint entropy
We extend the concept of entropy to 2 RRVV.
◮ 2 or more RRVV are needed when analyzing communication channels from the point of view of Information Theory.
◮ These RRVV can also be seen as a random vector.
◮ The underlying concept is the characterization of channel input vs channel output, and what we can get about the former by observing the latter.
Joint entropy of 2 RRVV:
$H(X,Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \cdot \log_2 \frac{1}{p(x,y)} = E_{p(x,y)}\left[\log_2 \frac{1}{P(X,Y)}\right]$
Properties:
$H(X,Y) \geq \max[H(X), H(Y)]; \quad H(X,Y) \leq H(X) + H(Y)$
Conditional entropy
Conditional entropy of 2 RRVV:
$H(Y|X) = \sum_{x \in X} p(x) \cdot H(Y|X=x) = \sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \cdot \log_2 \frac{1}{p(y|x)} = \sum_{x \in X} \sum_{y \in Y} p(x,y) \cdot \log_2 \frac{1}{p(y|x)} = E_{p(x,y)}\left[\log_2 \frac{1}{P(Y|X)}\right]$
$H(Y|X)$ is a measure of the uncertainty in Y once X is known.
Properties:
$H(Y|Y) = 0; \quad H(Y|X) = H(Y)$ if X and Y are independent
Chain rule
Relation between joint and conditional entropy of 2 RRVV:
$H(X,Y) = H(X) + H(Y|X)$
The expression points towards the following common-wisdom result: “joint knowledge about X and Y is the knowledge about X plus the information in Y not related to X”.
Proof:
$p(x,y) = p(x) \cdot p(y|x);$
$\log_2(p(x,y)) = \log_2(p(x)) + \log_2(p(y|x));$
$E_{p(x,y)}[\log_2(P(X,Y))] = E_{p(x,y)}[\log_2(P(X))] + E_{p(x,y)}[\log_2(P(Y|X))]$
Corollary: $H(X,Y|Z) = H(X|Z) + H(Y|X,Z)$.
As another consequence: $H(Y|X) \leq H(Y)$.
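The chain rule can be checked numerically from any joint pmf; in this sketch (the helper H and the example pmf are mine) the joint entropy equals $H(X) + H(Y|X)$ up to floating-point error.

import math

def H(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Joint pmf p(x, y) stored as a nested dict p[x][y].
p = {"x0": {"y0": 0.3, "y1": 0.2},
     "x1": {"y0": 0.1, "y1": 0.4}}
px = {x: sum(row.values()) for x, row in p.items()}

H_XY = H([v for row in p.values() for v in row.values()])
H_X = H(px.values())
# H(Y|X) = sum_x p(x) * H(Y|X=x), with each row renormalized by p(x).
H_Y_given_X = sum(px[x] * H([v / px[x] for v in row.values()])
                  for x, row in p.items())
print(H_XY, H_X + H_Y_given_X)  # both ~1.846: H(X,Y) = H(X) + H(Y|X)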
Relative entropy
$H(X)$ measures the quantity of information needed to describe the RV X on average.
The relative entropy $D(p\|q)$ measures the increase in information needed to describe X when it is characterized by means of a distribution $q(x)$ instead of $p(x)$:
X; $p(x)$ → $H(X)$
X; $q(x)$ → $H(X) + D(p\|q)$
Definition of relative entropy (aka Kullback-Leibler divergence, or improperly “distance”):
$D(p\|q) = \sum_{x \in X} p(x) \cdot \log_2 \frac{p(x)}{q(x)} = E_{p(x)}\left[\log_2 \frac{P(X)}{Q(X)}\right]$
Note that $\lim_{x \to 0}(x \cdot \log(x)) = 0$; $0 \cdot \log \frac{0}{0} = 0$; $p \cdot \log \frac{p}{0} = \infty$.
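A direct numerical evaluation (kl_divergence is a hypothetical helper name) also makes the asymmetry discussed on the next slide explicit:

import math

def kl_divergence(p, q):
    # D(p||q) in bits; by convention 0*log(0/q) = 0, and the sum
    # diverges if q(x) = 0 where p(x) > 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # ~0.74 bits
print(kl_divergence(q, p))  # ~0.53 bits -> not symmetric, hence not a distance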
Relative entropy and mutual information
Properties of relative entropy:
1 $D(p\|q) \geq 0$.
2 $D(p\|q) = 0 \leftrightarrow p(x) = q(x)$.
3 It is not symmetric. Therefore, it is not a true distance from the metric point of view.
Mutual information of 2 RRVV:
$I(X;Y) = H(X) - H(X|Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \cdot \log_2 \frac{p(x,y)}{p(x) \cdot p(y)} = D(p(x,y)\,\|\,p(x) \cdot p(y)) = E_{p(x,y)}\left[\log_2 \frac{P(X,Y)}{P(X) \cdot P(Y)}\right]$
The mutual information between X and Y is the information in X, minus the information in X not related to Y.
Mutual information
Properties of mutual information:
1 Symmetry: $I(X;Y) = I(Y;X)$.
2 Non-negative: $I(X;Y) \geq 0$.
3 $I(X;Y) = 0 \leftrightarrow$ X and Y are independent.
Figure 3: Relationship between entropy and mutual information (Venn-style diagram: H(X,Y) splits into H(X|Y), I(X;Y) and H(Y|X), with H(X) and H(Y) overlapping in I(X;Y)).
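Following the identity $I(X;Y) = D(p(x,y)\,\|\,p(x) \cdot p(y))$, a short sketch computes the mutual information directly (the joint pmf is the same illustrative one used in the chain-rule check above):

import math

# Joint pmf as a nested dict p[x][y], with marginals px, py.
p = {"x0": {"y0": 0.3, "y1": 0.2},
     "x1": {"y0": 0.1, "y1": 0.4}}
px = {x: sum(row.values()) for x, row in p.items()}
py = {}
for row in p.values():
    for y, v in row.items():
        py[y] = py.get(y, 0.0) + v

# I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
I = sum(v * math.log2(v / (px[x] * py[y]))
        for x, row in p.items() for y, v in row.items() if v > 0)
print(I)  # ~0.125 bits; it would be 0 if X and Y were independent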
A glimpse of lossy compression
Now that we have defined mutual information, we can examine lossy compression under the so-called Rate-Distortion Theory.
A source X is compressed into a new source $\hat{X}$, a process characterized by the probabilities
$p(x); \quad p(\hat{x},x) = p(\hat{x}|x)\,p(x)$
The quality of the compression is measured according to a distortion parameter $d(x,\hat{x})$ and its expected value D:
$D = E_{p(\hat{x},x)}[d(x,\hat{x})] = \sum_x \sum_{\hat{x}} d(x,\hat{x})\,p(\hat{x}|x)\,p(x)$
$d(x,\hat{x})$ is any convenient distance measure:
◮ The squared Euclidean distance, $d(x,\hat{x}) = |x - \hat{x}|^2$.
◮ The Hamming distance, $d(x,\hat{x}) = 1$ if $x \neq \hat{x}$, and 0 otherwise.
A glimpse of lossy compression
In lossy compression, we want $\hat{X}$ to be represented by the lowest number of bits, and this means finding a distribution $p(\hat{x}|x)$ such that
$R = \min_{p(\hat{x}|x)} I(X;\hat{X}) = \min_{p(\hat{x}|x)} \left[H(X) - H(X|\hat{X})\right]$
R is the rate in bits of the compressed source.
The source is compressed to an entropy lower than $H(X)$: since the process is lossy, $H(X|\hat{X}) \neq 0$.
The minimum is trivially 0 when X and $\hat{X}$ are fully unrelated, so the minimization is performed subject to $D \leq D^*$, for a maximum admissible distortion $D^*$.
Conversely, the problem can be addressed as the minimization of the distortion D subject to $I(X;\hat{X}) \leq R^*$, for a maximum admissible rate $R^* < H(X)$.
Discrete memoryless channels
Communication channel:
◮ An input/output system where the output is a probabilistic function of the input.
A DMC (Discrete Memoryless Channel) consists of:
◮ An input alphabet $X = \{x_0, x_1, \cdots, x_{J-1}\}$, corresponding to RV X.
◮ An output alphabet $Y = \{y_0, y_1, \cdots, y_{K-1}\}$, corresponding to RV Y, a noisy version of RV X.
◮ A set of transition probabilities linking input and output:
$\{p(y_k|x_j)\}, \; k = 0,1,\cdots,K-1; \; j = 0,1,\cdots,J-1$, where $p(y_k|x_j) = P(Y = y_k | X = x_j)$
◮ X and Y are finite and discrete.
◮ The channel is memoryless, since each output symbol depends only on the current input symbol.
Channel matrix P, of size J × K:
$P = \begin{pmatrix} p(y_0|x_0) & p(y_1|x_0) & \cdots & p(y_{K-1}|x_0) \\ p(y_0|x_1) & p(y_1|x_1) & \cdots & p(y_{K-1}|x_1) \\ \vdots & \vdots & \ddots & \vdots \\ p(y_0|x_{J-1}) & p(y_1|x_{J-1}) & \cdots & p(y_{K-1}|x_{J-1}) \end{pmatrix}$
The channel is said to be symmetric when each column is a permutation of any other, and each row is a permutation of any other.
Important property:
$\sum_{k=0}^{K-1} p(y_k|x_j) = 1, \; \forall j = 0,1,\cdots,J-1$
Output Y is probabilistically determined by the input (a priori) probabilities and the channel matrix:
$p_X = (p(x_0), p(x_1), \cdots, p(x_{J-1}))$, with $p(x_j) = P(X = x_j)$
$p_Y = (p(y_0), p(y_1), \cdots, p(y_{K-1}))$, with $p(y_k) = P(Y = y_k)$
$p(y_k) = \sum_{j=0}^{J-1} p(y_k|x_j) \cdot p(x_j), \; \forall k = 0,1,\cdots,K-1$
Therefore, $p_Y = p_X \cdot P$.
When J = K and $y_j$ is the correct choice when sending $x_j$, we can calculate the average symbol error probability as
$P_e = \sum_{j=0}^{J-1} p(x_j) \cdot \sum_{k=0, k \neq j}^{K-1} p(y_k|x_j)$
The probability of correct transmission is $1 - P_e$.
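As a concrete check of $p_Y = p_X \cdot P$ and of $P_e$, the sketch below evaluates both for a small hypothetical 3-ary symmetric channel matrix:

# Transition matrix P[j][k] = p(y_k | x_j) and a priori probabilities pX.
P = [[0.90, 0.05, 0.05],
     [0.05, 0.90, 0.05],
     [0.05, 0.05, 0.90]]
pX = [0.5, 0.3, 0.2]

J, K = len(P), len(P[0])
pY = [sum(pX[j] * P[j][k] for j in range(J)) for k in range(K)]
# Pe: probability mass landing outside the correct output y_j for each x_j.
Pe = sum(pX[j] * sum(P[j][k] for k in range(K) if k != j) for j in range(J))
print(pY)  # output distribution, sums to 1
print(Pe)  # 0.1 here, for any pX, since every row carries error mass 0.1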
Example of a discrete memoryless channel: modulation with hard decision.
Figure 4: 16-QAM transmitted constellation (symbols $x_j$).
Figure 5: 16-QAM received constellation with noise (observations $y_k$, governed by $p(y_k|x_j)$).
Example of a non-symmetric channel → the binary erasure channel, typical of storage systems.
◮ Reading data in storage systems can also be modeled as sending information through a channel with given probabilistic properties.
◮ An erasure marks complete uncertainty about the symbol read.
◮ There are methods to recover from erasures, based on the principles of Information Theory.
Figure 6: Diagram showing the binary erasure channel: inputs $x_0 = 0$, $x_1 = 1$; outputs $y_0 = 0$, $y_1 = \epsilon$ (erasure), $y_2 = 1$; each input is erased with probability p and received correctly with probability $1-p$.
$P = \begin{pmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{pmatrix}$
Channel capacity
Mutual information depends on P and $p_X$. Characterizing the possibilities of the channel requires removing the dependency on $p_X$.
Channel capacity is defined as
$C = \max_{p_X} [I(X;Y)]$
◮ It is the maximum average mutual information enabled by the channel, in bits per channel use, and the best we can get out of it in terms of reliable information transfer.
◮ It only depends on the channel transition probabilities P.
◮ If the channel is symmetric, the distribution $p_X$ that maximizes $I(X;Y)$ is the uniform one (equiprobable symbols).
Channel coding is a process where controlled redundancy is added to protect data integrity.
◮ A channel-encoded information sequence is represented by a block of n bits, encoded from an information sequence represented by k ≤ n bits.
◮ The so-called code rate is calculated as R = k/n ≤ 1.
Noisy-channel coding theorem
Consider a discrete source ζ, emitting symbols with period $T_s$.
◮ The binary information rate of such a source is $\frac{H(\zeta)}{T_s}$ (bits/s).
Consider a discrete memoryless channel, used to send coded data every $T_c$ seconds.
◮ The maximum possible data transfer rate would be $\frac{C}{T_c}$ (bits/s).
The noisy-channel coding theorem states the following [5]:
◮ If $\frac{H(\zeta)}{T_s} \leq \frac{C}{T_c}$, there exists a coding scheme that guarantees error-free transmission (i.e. $P_e$ arbitrarily small).
◮ Conversely, if $\frac{H(\zeta)}{T_s} > \frac{C}{T_c}$, the communication cannot be made reliable (i.e. we cannot have a bounded $P_e$, as small as desired).
Please note that, again, the theorem is asymptotic and not constructive: it does not say how to actually reach the limit.
Noisy-channel coding theorem
Example: a binary symmetric channel.
Consider a binary source ζ = {0, 1}, with equiprobable symbols.
◮ $H(\zeta) = 1$ info bit/“channel use”.
◮ The source works at a rate of $\frac{1}{T_s}$ “channel uses”/s, hence $\frac{H(\zeta)}{T_s}$ info bits/s.
Consider an encoder with rate $\frac{k}{n}$ info/coded bits, and $\frac{1}{T_c}$ “channel uses”/s.
◮ Note that $\frac{k}{n} = \frac{T_c}{T_s}$.
The maximum achievable rate is $\frac{C}{T_c}$ coded bits/s.
If $\frac{H(\zeta)}{T_s} = \frac{1}{T_s} \leq \frac{C}{T_c}$, we can find a coding scheme so that $P_e$ is made arbitrarily small (as small as desired).
◮ This means that an appropriate coding scheme has to meet $\frac{k}{n} \leq C$ in order to exploit the possibilities of the noisy-channel coding theorem.
Noisy-channel coding theorem
The theorem also states that, if a bit error probability of $P_b$ is acceptable, coding rates up to
$R(P_b) = \frac{C}{1 - H(P_b)}$
are achievable. Rates greater than that cannot be achieved with the given bit error probability.
In a binary symmetric channel without noise (error probability p = 0), it can be demonstrated that
$C = \max_{p_X}[I(X;Y)] = 1$ bit/“channel use”.
Figure 7: Diagram showing the error-free binary symmetric channel ($x_0 = 0 \to y_0 = 0$ and $x_1 = 1 \to y_1 = 1$, each with probability 1).
$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Noisy-channel coding theorem
In a binary symmetric channel with error probability $p \neq 0$,
$C = 1 - H(p) = 1 - \left[p \cdot \log_2 \frac{1}{p} + (1-p) \cdot \log_2 \frac{1}{1-p}\right]$ bits/“channel use”.
Figure 8: Diagram showing the binary symmetric channel BSC(p): each input ($x_0 = 0$, $x_1 = 1$) is flipped with probability p and preserved with probability $1-p$.
$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$
In the binary erasure channel with erasure probability p,
$C = \max_{p_X}[I(X;Y)] = 1 - p$ bits/“channel use”.
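These closed-form capacities are easy to tabulate; a brief sketch (the helper names are mine):

import math

def Hb(p):
    # Binary entropy function H(p), in bits.
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    return 1.0 - Hb(p)  # C = 1 - H(p) for the BSC(p)

def bec_capacity(p):
    return 1.0 - p      # C = 1 - p for the binary erasure channel

for p in (0.0, 0.11, 0.5):
    print(p, bsc_capacity(p))  # 1.0, ~0.50, 0.0 (a useless channel at p = 0.5)
print(bec_capacity(0.25))      # 0.75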
Entropy and mutual information for continuous RRVV
Differential entropy
Differential entropy or continuous entropy of a continuous RV X with pdf $f_X(x)$ is defined as
$h(X) = \int_{-\infty}^{\infty} f_X(x) \cdot \log_2 \frac{1}{f_X(x)}\, dx$
It does not measure an absolute quantity of information, hence the differential term (it can take values lower than 0).
The differential entropy of a continuous random vector $\vec{X}$ with joint pdf $f_{\vec{X}}(\vec{x})$ is defined as
$h(\vec{X}) = \int_{\vec{x}} f_{\vec{X}}(\vec{x}) \cdot \log_2 \frac{1}{f_{\vec{X}}(\vec{x})}\, d\vec{x}$
$\vec{X} = (X_0, \cdots, X_{N-1}); \quad f_{\vec{X}}(\vec{x}) = f_{\vec{X}}(x_0, \cdots, x_{N-1})$
Differential entropy
For a given variance value σ², the Gaussian RV exhibits the largest achievable differential entropy.
◮ This means the Gaussian RV has a special place in the domain of continuous RRVV within Information Theory.
Properties of differential entropy:
◮ Invariance under translations: $h(X + c) = h(X)$
◮ Scaling: $h(a \cdot X) = h(X) + \log_2(|a|)$; $h(A \cdot \vec{X}) = h(\vec{X}) + \log_2(|A|)$
◮ For a given variance σ², a Gaussian RV X with variance $\sigma_X^2 = \sigma^2$ and any other RV Y with variance $\sigma_Y^2 = \sigma^2$ satisfy $h(X) \geq h(Y)$.
◮ For a Gaussian RV X with variance $\sigma_X^2$: $h(X) = \frac{1}{2}\log_2\left(2\pi e \sigma_X^2\right)$
Mutual information for continuous RRVV
Mutual information of two continuous RRVV X and Y:
$I(X;Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \cdot \log_2 \frac{f_X(x|y)}{f_X(x)}\, dx\, dy$
$f_{X,Y}(x,y)$ is the joint pdf of X and Y, and $f_X(x|y)$ is the conditional pdf of X given Y = y.
Properties:
◮ Symmetry: $I(X;Y) = I(Y;X)$.
◮ Non-negative: $I(X;Y) \geq 0$.
◮ $I(X;Y) = h(X) - h(X|Y)$.
◮ $I(X;Y) = h(Y) - h(Y|X)$.
$h(X|Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \cdot \log_2 \frac{1}{f_X(x|y)}\, dx\, dy$
is the conditional differential entropy of X given Y.
AWGN channel capacity theorem
Continuous channel capacity
Consider a Gaussian discrete memoryless channel, described by:
◮ x(t) is a stochastic stationary process, with $m_x = 0$ and bandwidth $W_x = B$ Hz.
◮ The process is sampled with sampling period $T_s = \frac{1}{2B}$, so that $X_k = x(k \cdot T_s)$ are continuous RRVV $\forall k$, with $E[X_k] = 0$.
◮ A RV $X_k$ is transmitted every $T_s$ seconds over a noisy channel with bandwidth B, during a total of T seconds (n = 2BT total samples).
◮ The channel is AWGN, adding noise samples described by RRVV $N_k$ with $m_n = 0$ and $S_n(f) = \frac{N_0}{2}$, so that $\sigma_n^2 = N_0 B$.
◮ The received samples are statistically independent RRVV, described as $Y_k = X_k + N_k$.
◮ The cost function for any maximization of the mutual information is the signal power $E[X_k^2] = S$.
Continuous channel capacity
The channel capacity is defined as
$C = \max_{f_{X_k}(x)} \left\{ I(X_k;Y_k) : E[X_k^2] = S \right\}$
◮ $I(X_k;Y_k) = h(Y_k) - h(Y_k|X_k) = h(Y_k) - h(N_k)$
◮ The maximum is only reached if $h(Y_k)$ is maximized.
◮ This only happens if $f_{X_k}(x)$ is Gaussian!
◮ Therefore, $C = I(X_k;Y_k)$ with $X_k$ Gaussian and $E[X_k^2] = S$.
$E[Y_k^2] = S + \sigma_n^2$, so $h(Y_k) = \frac{1}{2}\log_2\left(2\pi e (S + \sigma_n^2)\right)$.
$h(N_k) = \frac{1}{2}\log_2\left(2\pi e \sigma_n^2\right)$.
$C = I(X_k;Y_k) = h(Y_k) - h(N_k) = \frac{1}{2}\log_2\left(1 + \frac{S}{\sigma_n^2}\right)$ bits/“channel use”.
Finally, C (bits/s) = $\frac{n}{T}$ · C (bits/“channel use”):
$C = B \cdot \log_2\left(1 + \frac{S}{N_0 B}\right)$ bits/s; note that $\frac{n}{T} = \frac{1}{T_s} = 2B$ “channel uses”/s.
Shannon-Hartley theorem
The Shannon-Hartley theorem states that the capacity of a bandlimited AWGN channel with bandwidth B and noise power spectral density $N_0/2$ is
$C = B \cdot \log_2\left(1 + \frac{S}{N_0 B}\right)$ bits/s
This is the highest possible information transmission rate over this analog communication channel, accomplished with arbitrarily small error probability.
Capacity increases (almost) linearly with B, whereas S determines only a logarithmic increase.
◮ Increasing the available bandwidth has a far larger impact on capacity than increasing the transmission power.
The bandlimited, power-constrained AWGN channel is a very convenient model for real-world communications.
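A numerical illustration of the theorem and of the bandwidth-versus-power observation (the example figures for B, S and N0 are arbitrary):

import math

def awgn_capacity(B, S, N0):
    # Shannon-Hartley: C = B * log2(1 + S / (N0 * B)), in bits/s.
    return B * math.log2(1.0 + S / (N0 * B))

B, S, N0 = 1e6, 1e-3, 1e-9          # 1 MHz, 1 mW, giving SNR = 30 dB
print(awgn_capacity(B, S, N0))      # ~9.97e6 bits/s
print(awgn_capacity(2 * B, S, N0))  # doubling B: ~17.9e6 bits/s
print(awgn_capacity(B, 2 * S, N0))  # doubling S: ~11.0e6 bits/s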
Implications of the channel capacity theorem
Consider an ideal system where $R_b = C$.
$S = E_b \cdot C$, where $E_b$ is the average bit energy.
$\frac{C}{B} = \log_2\left(1 + \frac{E_b}{N_0} \cdot \frac{C}{B}\right) \;\rightarrow\; \frac{E_b}{N_0} = \frac{2^{C/B} - 1}{C/B}$
If we represent the spectral efficiency $\eta = \frac{R_b}{B}$ as a function of $\frac{E_b}{N_0}$, the previous expression is an asymptotic curve on such a plane that marks the border between the reliable zone and the unreliable zone.
◮ This curve helps us to identify the parameter set for a communication system so that it may achieve reliable transmission with a given quality (measured in terms of a limited, maximum error rate).
When B → ∞, $\left.\frac{E_b}{N_0}\right|_{\infty} = \ln(2) = -1.6$ dB.
◮ This limit is known as the “Shannon limit” for the AWGN channel.
◮ The capacity in the limit is $C_\infty = \frac{S}{N_0}\log_2(e)$.
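The boundary and the Shannon limit can be checked numerically; letting η = C/B shrink towards 0 (i.e. B → ∞) drives the required Eb/N0 towards ln(2):

import math

def min_ebn0_db(eta):
    # Eb/N0 = (2**eta - 1) / eta on the capacity boundary, for eta = C/B.
    return 10 * math.log10((2.0 ** eta - 1.0) / eta)

for eta in (2.0, 1.0, 0.1, 0.001):
    print(eta, round(min_ebn0_db(eta), 2))
# 2.0 -> 1.76 dB, 1.0 -> 0.0 dB, 0.1 -> -1.44 dB, 0.001 -> -1.59 dB (~ln(2))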
Channel capacity tradeoffs
Figure 9: Working regions as determined by the Shannon-Hartley theorem (source: www.gaussianwaves.com). No channel coding applied.
The diagram illustrates possible tradeoffs involving $\frac{E_b}{N_0}$, $\frac{R_b}{B}$ and $P_b$.
Note that the spectral efficiency of the modulations is given for optimal pulse shaping.
Channel capacity tradeoffs
The lower, left-hand side of the plot is the so-called power-limited region.
◮ There, the $\frac{E_b}{N_0}$ is very poor and we have to sacrifice spectral efficiency to get a given transmission quality ($P_b$).
◮ An example of this is deep-space communications, where the received SNR is extremely low due to the huge free-space losses in the link. The only way to get a reliable transmission is to drop the data rate to very low values.
The upper, right-hand side of the plot is the so-called bandwidth-limited region.
◮ There, the desired spectral efficiency $\frac{R_b}{B}$ for fixed B (desired data rate) is traded off against unconstrained transmission power (unconstrained $\frac{E_b}{N_0}$), under a given $P_b$.
◮ An example of this would be a terrestrial DVB transmitting station, where $\frac{R_b}{B}$ is fixed (standardized), and where the transmitting power is only limited by regulatory or technological constraints.
Bridging towards channel coding
Discrete-input Continuous-output memoryless channels
We have considered, on the one hand, discrete input/output memoryless channels and, on the other hand, purely Gaussian channels without paying attention to the input source.
The normal mode of operation in digital communications is characterized by a discrete source (digital modulation, with symbols $s_i$) that goes through an AWGN channel (which adds noise samples n).
Therefore, it makes sense to consider real digital communication processes as instances of Discrete-input Continuous-output memoryless channels (DCMC).
It may be demonstrated that, for an M-ary digital modulation going through an AWGN channel, with outputs $r = s_i + n$, the channel capacity is
$C = \log_2(M) - T\left(\frac{E_b}{N_0}\right)$ bits per “channel use”
where T(·) is a function that has to be evaluated numerically.
Continuous-output vs Discrete-output capacity
Figure 10: Capacity of BPSK DCMC against hard-decided (DMC) BPSK and the AWGN capacity (spectral efficiency in bps/Hz versus $E_b/N_0$ in dB).
For low $E_b/N_0$, there is higher capacity available for the DCMC.
As $E_b/N_0$ grows, the DCMC and DMC capacities converge.
There is around 1-2 dB to be gained with the DCMC in the $E_b/N_0$ range of interest.
Channel capacity tradeoffs and channel coding
Guaranteeing a given target $P_b$ further limits the attainable zone in the spectral efficiency/signal-to-noise ratio plane, depending on the framework chosen.
For fixed spectral efficiency (fixed $\frac{R_b}{B}$), we move along a horizontal line where we manage the $P_b$ versus $\frac{E_b}{N_0}$ tradeoff.
For fixed signal-to-noise ratio (fixed $\frac{E_b}{N_0}$), we move along a vertical line where we manage the $P_b$ versus $\frac{R_b}{B}$ tradeoff.
Figure 11: Working regions for given transmission schemes. Capacities are for DCMC (source: www.comtechefdata.com).
Hard demodulation
Hard decision maximum likelihood (ML) demodulation:
◮ Apply the decision region criterion.
◮ Decided symbols are converted back into mapped bits.
◮ The hard demodulator thus outputs a sequence of decided bits.
Hard decision ML demodulation fails to capture the extra information contained in the received constellation:
◮ The distance from the received symbol to the hard-decided symbol may provide a measure of the uncertainty and reliability of the decision.
Figure 12: 16-QAM received constellation with noise.
How could we take advantage of the available extra information?
Hard demodulation
We may use basic information about the channel and about the transmitted modulation to build new significant demodulation values.
Consider a standard modulation and an AWGN channel:
$r = s_i + n$
where $s_i$ is the symbol transmitted in the signal space and n is the AWGN sample corresponding to the vector channel.
Test hypothesis for demodulation:
$P(b_k = b | r) = \frac{f_r(r | b_k = b)\, P(b_k = b)}{f_r(r)} = \frac{P(b_k = b)}{f_r(r)} \sum_{s_j / b_k = b} f_r(r | s_j) = \frac{P(b_k = b)}{f_r(r)} \sum_{s_j / b_k = b} \frac{1}{\sqrt{2\pi\sigma_n^2}}\, e^{-|s_j - r|^2 / 2\sigma_n^2}$
Note that $f_r(r)$ does not depend on the hypothesis $b_k = b$, $b = 0, 1$.
Log-Likelihood Ratios
When input bits are equiprobable, we may resort to the quotient
$\lambda_k = \log \frac{P(b_k=1|r)}{P(b_k=0|r)} = \log \frac{f_r(r|b_k=1)}{f_r(r|b_k=0)} = \log \left( \frac{\sum_{s_j / b_k = 1} e^{-|s_j - r|^2 / 2\sigma_n^2}}{\sum_{s_j / b_k = 0} e^{-|s_j - r|^2 / 2\sigma_n^2}} \right)$
Note that $f_r(r | b_k = b)$ is a likelihood function.
$\lambda_k$ is called the Log-Likelihood Ratio (LLR) on bit $b_k$.
◮ Its sign could be used in hard decision to decide whether $b_k$ is more likely to be a 1 ($\lambda_k > 0$) or a 0 ($\lambda_k < 0$).
◮ The modulus of $\lambda_k$ could be used as a measure of the reliability of the hypothesis on $b_k$.
◮ A larger $|\lambda_k|$ means a larger reliability of a possible decision.
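For BPSK over AWGN the sums reduce to a single term each and the LLR collapses to $2r/\sigma_n^2$; the sketch below illustrates both forms (the mapping bit 1 → s = +1, bit 0 → s = −1, and the helper names, are assumptions of mine).

import math

def llr_generic(r, syms_b1, syms_b0, sigma2):
    # lambda_k = log( sum_{s_j/b_k=1} exp(-|s_j-r|^2 / 2 sigma2)
    #               / sum_{s_j/b_k=0} exp(-|s_j-r|^2 / 2 sigma2) )
    num = sum(math.exp(-abs(s - r) ** 2 / (2 * sigma2)) for s in syms_b1)
    den = sum(math.exp(-abs(s - r) ** 2 / (2 * sigma2)) for s in syms_b0)
    return math.log(num / den)

def llr_bpsk(r, sigma2):
    # Closed form for s = +1 (bit 1) and s = -1 (bit 0).
    return 2.0 * r / sigma2

sigma2 = 0.5
for r in (-1.2, -0.1, 0.9):
    print(r, llr_generic(r, [1.0], [-1.0], sigma2), llr_bpsk(r, sigma2))
# The sign gives the hard decision; |llr| grows with the decision reliability.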
Soft demodulation
We have seen that basic knowledge about the transmission system and the channel may lead to the calculation of probabilistic (soft) values related to a possible decision, under suitable hypotheses.
The calculated LLR values for the received bit stream may provide valuable insight and information for the next subsystem in the receiver chain: the channel decoder.
If input bits are not equiprobable, we may still use a soft-value characterization, in the form of a quotient of a posteriori demodulation values:
$\log \frac{P(b_k=1|r)}{P(b_k=0|r)} = \log \frac{P(b_k=1)}{P(b_k=0)} + \log \frac{f_r(r|b_k=1)}{f_r(r|b_k=0)}$
where the LLR is updated with the a priori probabilities $P(b_k = b)$.
We are going to see in the next lesson how soft demodulated values play a key role in the most powerful channel encoding/decoding methods.
Conclusions
Information Theory represents a cutting-edge research field with applications in communications, artificial intelligence, data mining, machine learning, robotics...
We have seen three fundamental results from Shannon's 1948 seminal work that constitute the foundations of all modern communications.
◮ The source coding theorem, which states the limits and possibilities of lossless data compression.
◮ The noisy-channel coding theorem, which states the need for channel coding techniques to achieve a given performance using constrained resources. It establishes the asymptotic possibility of error-free transmission over discrete-input discrete-output noisy channels.
◮ The Shannon-Hartley theorem, which establishes the absolute (asymptotic) limits for error-free transmission over AWGN channels, and describes the different tradeoffs involved among the given resources.
All these results build the attainable working zone for practical and feasible communication systems, managing and trading off constrained resources under a given target performance (BER).
◮ The $\eta = \frac{R_b}{B}$ against $\frac{E_b}{N_0}$ plane (under a target BER) constitutes the playground for designing and bringing into practice any communication standard.
◮ Any movement over the plane has a direct impact on business and revenues in the telco domain.
When addressing practical designs in communications, these results and limits are not much heeded explicitly, but they underlie all of them.
◮ There are lots of common-practice and common-wisdom rules of thumb in the domain, stating what to use when (regarding modulations, channel encoders and so on).
◮ Nevertheless, optimizing designs so as to profit as much as possible from all the resources at hand requires making these limits explicit.
References
[1] J. M. Cioffi, Digital Communications - Coding (course). Stanford University, 2010. [Online]. Available: http://web.stanford.edu/group/cioffi/book
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory. New Jersey: John Wiley & Sons, Inc., 2006.
[3] D. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003. [Online]. Available: http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
[4] S. Haykin, Communications Systems. New York: John Wiley & Sons, Inc., 2001.
[5] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, 1948. [Online]. Available: http://plan9.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
Francisco J. Escribano Block 2: Introduction to Information Theory October 25, 2016 62 / 62

More Related Content

What's hot

Implementation of Elliptic Curve Digital Signature Algorithm Using Variable T...
Implementation of Elliptic Curve Digital Signature Algorithm Using Variable T...Implementation of Elliptic Curve Digital Signature Algorithm Using Variable T...
Implementation of Elliptic Curve Digital Signature Algorithm Using Variable T...ijceronline
 
Gate 2013 computer science and information technology previous year paper
Gate 2013 computer science and information technology previous year paperGate 2013 computer science and information technology previous year paper
Gate 2013 computer science and information technology previous year paperLokesh Kataria
 
Sure interview algorithm-1103
Sure interview algorithm-1103Sure interview algorithm-1103
Sure interview algorithm-1103Sure Interview
 
Elliptic curve Cryptography and Diffie- Hellman Key exchange
Elliptic curve Cryptography and Diffie- Hellman Key exchangeElliptic curve Cryptography and Diffie- Hellman Key exchange
Elliptic curve Cryptography and Diffie- Hellman Key exchangeIOSRJM
 
A new RSA public key encryption scheme with chaotic maps
A new RSA public key encryption scheme with chaotic maps A new RSA public key encryption scheme with chaotic maps
A new RSA public key encryption scheme with chaotic maps IJECEIAES
 
Energy-Efficient LDPC Decoder using DVFS for binary sources
Energy-Efficient LDPC Decoder using DVFS for binary sourcesEnergy-Efficient LDPC Decoder using DVFS for binary sources
Energy-Efficient LDPC Decoder using DVFS for binary sourcesIDES Editor
 
A Universal Session Based Bit Level Symmetric Key Cryptographic Technique to ...
A Universal Session Based Bit Level Symmetric Key Cryptographic Technique to ...A Universal Session Based Bit Level Symmetric Key Cryptographic Technique to ...
A Universal Session Based Bit Level Symmetric Key Cryptographic Technique to ...IJNSA Journal
 
Chapter 06 boolean algebra
Chapter 06 boolean algebraChapter 06 boolean algebra
Chapter 06 boolean algebraIIUI
 
Chapter 06 boolean algebra 3o-p
Chapter 06 boolean algebra 3o-pChapter 06 boolean algebra 3o-p
Chapter 06 boolean algebra 3o-pIIUI
 
Chapter 06 boolean algebra 2o-p
Chapter 06 boolean algebra 2o-pChapter 06 boolean algebra 2o-p
Chapter 06 boolean algebra 2o-pIIUI
 
Pointing the Unknown Words
Pointing the Unknown WordsPointing the Unknown Words
Pointing the Unknown Wordshytae
 
1982 - Probabilistic Encryption & How To Play Mental Poker Keeping Secret All...
1982 - Probabilistic Encryption & How To Play Mental Poker Keeping Secret All...1982 - Probabilistic Encryption & How To Play Mental Poker Keeping Secret All...
1982 - Probabilistic Encryption & How To Play Mental Poker Keeping Secret All...decentralizeeverything
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsToyotaro Suzumura
 
Review session2
Review session2Review session2
Review session2NEEDY12345
 
On the Identification Protocols of Versions 4 and 6
On the Identification Protocols of Versions 4 and 6On the Identification Protocols of Versions 4 and 6
On the Identification Protocols of Versions 4 and 6inventionjournals
 
Ijarcet vol-2-issue-7-2323-2327
Ijarcet vol-2-issue-7-2323-2327Ijarcet vol-2-issue-7-2323-2327
Ijarcet vol-2-issue-7-2323-2327Editor IJARCET
 

What's hot (18)

I1803014852
I1803014852I1803014852
I1803014852
 
Implementation of Elliptic Curve Digital Signature Algorithm Using Variable T...
Implementation of Elliptic Curve Digital Signature Algorithm Using Variable T...Implementation of Elliptic Curve Digital Signature Algorithm Using Variable T...
Implementation of Elliptic Curve Digital Signature Algorithm Using Variable T...
 
Gate 2013 computer science and information technology previous year paper
Gate 2013 computer science and information technology previous year paperGate 2013 computer science and information technology previous year paper
Gate 2013 computer science and information technology previous year paper
 
Sure interview algorithm-1103
Sure interview algorithm-1103Sure interview algorithm-1103
Sure interview algorithm-1103
 
Elliptic curve Cryptography and Diffie- Hellman Key exchange
Elliptic curve Cryptography and Diffie- Hellman Key exchangeElliptic curve Cryptography and Diffie- Hellman Key exchange
Elliptic curve Cryptography and Diffie- Hellman Key exchange
 
A new RSA public key encryption scheme with chaotic maps
A new RSA public key encryption scheme with chaotic maps A new RSA public key encryption scheme with chaotic maps
A new RSA public key encryption scheme with chaotic maps
 
Energy-Efficient LDPC Decoder using DVFS for binary sources
Energy-Efficient LDPC Decoder using DVFS for binary sourcesEnergy-Efficient LDPC Decoder using DVFS for binary sources
Energy-Efficient LDPC Decoder using DVFS for binary sources
 
Shannon Fano
Shannon FanoShannon Fano
Shannon Fano
 
A Universal Session Based Bit Level Symmetric Key Cryptographic Technique to ...
A Universal Session Based Bit Level Symmetric Key Cryptographic Technique to ...A Universal Session Based Bit Level Symmetric Key Cryptographic Technique to ...
A Universal Session Based Bit Level Symmetric Key Cryptographic Technique to ...
 
Chapter 06 boolean algebra
Chapter 06 boolean algebraChapter 06 boolean algebra
Chapter 06 boolean algebra
 
Chapter 06 boolean algebra 3o-p
Chapter 06 boolean algebra 3o-pChapter 06 boolean algebra 3o-p
Chapter 06 boolean algebra 3o-p
 
Chapter 06 boolean algebra 2o-p
Chapter 06 boolean algebra 2o-pChapter 06 boolean algebra 2o-p
Chapter 06 boolean algebra 2o-p
 
Pointing the Unknown Words
Pointing the Unknown WordsPointing the Unknown Words
Pointing the Unknown Words
 
1982 - Probabilistic Encryption & How To Play Mental Poker Keeping Secret All...
1982 - Probabilistic Encryption & How To Play Mental Poker Keeping Secret All...1982 - Probabilistic Encryption & How To Play Mental Poker Keeping Secret All...
1982 - Probabilistic Encryption & How To Play Mental Poker Keeping Secret All...
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
 
Review session2
Review session2Review session2
Review session2
 
On the Identification Protocols of Versions 4 and 6
On the Identification Protocols of Versions 4 and 6On the Identification Protocols of Versions 4 and 6
On the Identification Protocols of Versions 4 and 6
 
Ijarcet vol-2-issue-7-2323-2327
Ijarcet vol-2-issue-7-2323-2327Ijarcet vol-2-issue-7-2323-2327
Ijarcet vol-2-issue-7-2323-2327
 

Similar to Itblock2 150209161919-conversion-gate01

Information Theory and coding - Lecture 2
Information Theory and coding - Lecture 2Information Theory and coding - Lecture 2
Information Theory and coding - Lecture 2Aref35
 
Huffman&Shannon-multimedia algorithms.ppt
Huffman&Shannon-multimedia algorithms.pptHuffman&Shannon-multimedia algorithms.ppt
Huffman&Shannon-multimedia algorithms.pptPrincessSaro
 
INFORMATION_THEORY.pdf
INFORMATION_THEORY.pdfINFORMATION_THEORY.pdf
INFORMATION_THEORY.pdftemmy7
 
Information Theory - Introduction
Information Theory  -  IntroductionInformation Theory  -  Introduction
Information Theory - IntroductionBurdwan University
 
DC Lecture Slides 1 - Information Theory.ppt
DC Lecture Slides 1 - Information Theory.pptDC Lecture Slides 1 - Information Theory.ppt
DC Lecture Slides 1 - Information Theory.pptshortstime400
 
Unit-1_Digital_Communication-Information_Theory.pptx
Unit-1_Digital_Communication-Information_Theory.pptxUnit-1_Digital_Communication-Information_Theory.pptx
Unit-1_Digital_Communication-Information_Theory.pptxKIRUTHIKAAR2
 
Unit-1_Digital_Communication-Information_Theory.pptx
Unit-1_Digital_Communication-Information_Theory.pptxUnit-1_Digital_Communication-Information_Theory.pptx
Unit-1_Digital_Communication-Information_Theory.pptxKIRUTHIKAAR2
 
Information Theory and coding - Lecture 3
Information Theory and coding - Lecture 3Information Theory and coding - Lecture 3
Information Theory and coding - Lecture 3Aref35
 
Information theory & coding PPT Full Syllabus.pptx
Information theory & coding PPT Full Syllabus.pptxInformation theory & coding PPT Full Syllabus.pptx
Information theory & coding PPT Full Syllabus.pptxprernaguptaec
 
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHYAUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHYijujournal
 
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHYAUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHYijujournal
 
Information Theory
Information TheoryInformation Theory
Information TheorySou Jana
 
Tele4653 l9
Tele4653 l9Tele4653 l9
Tele4653 l9Vin Voro
 
Digital Communication: Information Theory
Digital Communication: Information TheoryDigital Communication: Information Theory
Digital Communication: Information TheoryDr. Sanjay M. Gulhane
 
Communication engineering -UNIT IV .pptx
Communication engineering -UNIT IV .pptxCommunication engineering -UNIT IV .pptx
Communication engineering -UNIT IV .pptxManoj Kumar
 

Similar to Itblock2 150209161919-conversion-gate01 (20)

Information Theory and coding - Lecture 2
Information Theory and coding - Lecture 2Information Theory and coding - Lecture 2
Information Theory and coding - Lecture 2
 
Huffman&Shannon-multimedia algorithms.ppt
Huffman&Shannon-multimedia algorithms.pptHuffman&Shannon-multimedia algorithms.ppt
Huffman&Shannon-multimedia algorithms.ppt
 
INFORMATION_THEORY.pdf
INFORMATION_THEORY.pdfINFORMATION_THEORY.pdf
INFORMATION_THEORY.pdf
 
Information Theory - Introduction
Information Theory  -  IntroductionInformation Theory  -  Introduction
Information Theory - Introduction
 
DC Lecture Slides 1 - Information Theory.ppt
DC Lecture Slides 1 - Information Theory.pptDC Lecture Slides 1 - Information Theory.ppt
DC Lecture Slides 1 - Information Theory.ppt
 
Unit-1_Digital_Communication-Information_Theory.pptx
Unit-1_Digital_Communication-Information_Theory.pptxUnit-1_Digital_Communication-Information_Theory.pptx
Unit-1_Digital_Communication-Information_Theory.pptx
 
Unit-1_Digital_Communication-Information_Theory.pptx
Unit-1_Digital_Communication-Information_Theory.pptxUnit-1_Digital_Communication-Information_Theory.pptx
Unit-1_Digital_Communication-Information_Theory.pptx
 
Information Theory and coding - Lecture 3
Information Theory and coding - Lecture 3Information Theory and coding - Lecture 3
Information Theory and coding - Lecture 3
 
Information theory & coding PPT Full Syllabus.pptx
Information theory & coding PPT Full Syllabus.pptxInformation theory & coding PPT Full Syllabus.pptx
Information theory & coding PPT Full Syllabus.pptx
 
Losseless
LosselessLosseless
Losseless
 
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHYAUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
 
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHYAUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
AUTHENTICATED PUBLIC KEY ENCRYPTION SCHEME USING ELLIPTIC CURVE CRYPTOGRAPHY
 
Information Theory
Information TheoryInformation Theory
Information Theory
 
Tele4653 l9
Tele4653 l9Tele4653 l9
Tele4653 l9
 
Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
Lec-03 Entropy Coding I: Hoffmann & Golomb CodesLec-03 Entropy Coding I: Hoffmann & Golomb Codes
Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
 
Encoding in sc
Encoding in scEncoding in sc
Encoding in sc
 
Digital Communication: Information Theory
Digital Communication: Information TheoryDigital Communication: Information Theory
Digital Communication: Information Theory
 
Dcs unit 2
Dcs unit 2Dcs unit 2
Dcs unit 2
 
Communication engineering -UNIT IV .pptx
Communication engineering -UNIT IV .pptxCommunication engineering -UNIT IV .pptx
Communication engineering -UNIT IV .pptx
 
Image compression
Image compressionImage compression
Image compression
 

Recently uploaded

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
• 9. Entropy
E.g., a memoryless binary source:
$H(p) = -p \cdot \log_2(p) - (1-p) \cdot \log_2(1-p)$
[Figure 1: Entropy of a memoryless binary source, $H(p)$ vs. $p$; the curve peaks at $H(0.5) = 1$ bit.]
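As an aside (a sketch added to this transcript, not part of the original deck), the curve above can be reproduced directly from the definition:

```python
from math import log2

def binary_entropy(p):
    """H(p) in bits for a memoryless binary source."""
    if p in (0.0, 1.0):
        return 0.0  # a deterministic source carries no uncertainty
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.01, 0.1, 0.5, 0.9):
    print(f"H({p}) = {binary_entropy(p):.4f} bits")  # maximum H = 1 at p = 0.5
```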
• 10. Source coding
• 11. Source coding
Source coding addresses how to handle the output data of a given source, from the point of view of Information Theory.
One of its main topics is data compression: its theoretical limits and the related practical algorithms.
◮ The foremost prerequisite in communications at the PHY is to preserve data integrity → any transformation has to be fully invertible (lossless compression).
◮ Lossy compression is a key field at other communication levels: for example, compressing a video stream to the MPEG-2 standard to create a transport frame.
◮ In related fields such as cryptography, some non-invertible compression algorithms are of utmost interest (e.g., hash functions, which create almost-unique tokens).
• 12. Source coding
We may choose to represent the data from the source by assigning to each symbol $s_k$ a corresponding (binary, in our case) codeword. The aim of this source coding process is to represent the source symbols more efficiently.
◮ We focus on variable-length, invertible binary codes → each symbol $s_k$ gets a unique block of binary symbols of length $l_k$.
◮ The correspondence codeword ↔ original symbol constitutes a code.
Average codeword length: $\bar{L} = E[l_k] = \sum_{k=0}^{K-1} p_k \cdot l_k$
Coding efficiency: $\eta = L_{min}/\bar{L} \leq 1$
• 13. Shannon's source coding theorem
This theorem establishes the limits of lossless data compression [5].
◮ N iid¹ random variables with entropy $H(\zeta)$ each can be compressed into $N \cdot H(\zeta)$ bits with negligible risk of information loss as N → ∞. If they are compressed into fewer than $N \cdot H(\zeta)$ bits, it is certain that some information will be lost.
In practical terms, for a single random variable, this means $L_{min} \geq H(\zeta)$, and therefore the coding efficiency can be defined as $\eta = H(\zeta)/\bar{L}$.
Like other results in Information Theory, this theorem provides the limits, but does not tell us how to actually reach them.
¹ Independent and identically distributed.
• 14. Example of source coding: Huffman coding
Huffman coding provides a practical algorithm to perform source coding within the limits shown.
It is an instance of a class of codes called prefix codes:
◮ No binary word within the codeset is the prefix of any other one.
Properties:
◮ Unique coding.
◮ Instantaneous decoding.
◮ The lengths $l_k$ meet the Kraft-McMillan inequality [2]: $\sum_{k=0}^{K-1} 2^{-l_k} \leq 1$.
◮ $\bar{L}$ of the code is bounded by $H(\zeta) \leq \bar{L} < H(\zeta) + 1$, with $H(\zeta) = \bar{L} \leftrightarrow p_k = 2^{-l_k}$.
Meeting the Kraft-McMillan inequality guarantees that a prefix code with the given $l_k$ can be constructed.
• 15. Huffman coding
Algorithm to perform Huffman coding:
1 List the symbols $s_k$ in a column, in order of decreasing probability.
2 Combine the probabilities of the last 2 symbols: they are added into a new dummy compound value/symbol.
3 Reorder the probability set in an adjacent column, placing the new dummy symbol's probability as high as possible and retiring the values involved.
4 While moving probabilities to the right, keep the values of the unaffected symbols (making room for the compound value if needed), but assign a 0 to one of the affected symbols and a 1 to the other (top or bottom, but keep the same criterion along the whole process).
5 Start the process afresh in the new column.
6 When only two probabilities remain, assign the last 0 and 1 and stop.
7 Write the binary codeword of each original symbol by tracing back the trajectory of that symbol and of the dummy symbols it takes part in, from the last column towards the initial one.
• 16. Huffman coding example
[Figure 2: Example of Huffman coding; assigning the final codeword patterns proceeds from right to left.]
To characterize the resulting code, it is important to calculate:
$H(\zeta)$; $\bar{L} = \sum_{k=0}^{K-1} p_k \cdot l_k$; $\eta = H(\zeta)/\bar{L}$; $\sigma^2 = \sum_{k=0}^{K-1} p_k \cdot (l_k - \bar{L})^2$
A runnable sketch of the whole procedure follows below.
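The following is an added sketch (not from the original deck): a minimal Huffman encoder built on a priority queue, evaluated on a hypothetical five-symbol source. It also computes the characterization figures listed above.

```python
import heapq
from math import log2

def huffman_code(probs):
    """Build a binary Huffman code for a dict {symbol: probability}.
    Returns {symbol: codeword}. Minimal sketch; ties broken by insertion order."""
    # Each heap entry: (probability, tie-break counter, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # two least probable groups
        p1, _, c1 = heapq.heappop(heap)
        # Prepend one bit to every codeword in each merged group
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}  # example source
code = huffman_code(probs)
H = sum(p * log2(1 / p) for p in probs.values())          # source entropy
L = sum(probs[s] * len(w) for s, w in code.items())       # average codeword length
var = sum(probs[s] * (len(w) - L) ** 2 for s, w in code.items())
print(code)
print(f"H = {H:.3f} bits, L = {L:.2f}, eta = {H / L:.3f}, var = {var:.2f}")
```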
• 17. Managing efficiency in source coding
From the point of view of data communication (sequences of data symbols), it is useful to define the n-th extension of the source ζ:
◮ A symbol of the n-th extension of the source is built by taking n successive symbols from it (symbols in $\zeta^n$).
◮ If the source is iid, for $s_i \in \zeta$, the new extended source $\zeta^n$ is characterized by the probabilities $P(\sigma = (s_1 \cdots s_n) \in \zeta^n) = \prod_{i=1}^{n} P(s_i \in \zeta)$.
◮ Given that the sequence $s_i$ in $\sigma \in \zeta^n$ is iid for any n, the entropy of this new source is $H(\zeta^n) = n \cdot H(\zeta)$.
By applying Huffman coding to $\zeta^n$, we derive a code with average length $\bar{L}$ characterized by $H(\zeta^n) \leq \bar{L} < H(\zeta^n) + 1$.
• 18. Managing efficiency in source coding
Considering the equivalent word length per original symbol, $\bar{L}_s = \bar{L}/n$, we get $H(\zeta) \leq \bar{L}_s < H(\zeta) + 1/n$.
The final efficiency turns out to be $\eta = H(\zeta^n)/\bar{L} = n \cdot H(\zeta)/\bar{L} = H(\zeta)/\bar{L}_s$.
The efficiency thus tends to 1 as n → ∞, as shown by the previous inequality.
The method of the n-th extension of the source is nevertheless rendered impractical for large n, due to:
◮ Increasing delay.
◮ Increasing coding and decoding complexity.
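To make the convergence explicit (an added sketch; the binary source with p = 0.1 is a hypothetical example), the worst-case per-symbol length bound $H(\zeta) + 1/n$ and the corresponding efficiency floor can be tabulated:

```python
from math import log2

def binary_entropy(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

H = binary_entropy(0.1)   # entropy of a hypothetical iid binary source, p = 0.1
for n in (1, 2, 4, 8, 16):
    Ls_upper = H + 1 / n  # worst-case average length per original symbol
    print(f"n={n:2d}: {H:.4f} <= Ls < {Ls_upper:.4f}  ->  eta >= {H / Ls_upper:.4f}")
```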
• 19. Mutual information
• 20. Joint entropy
We extend the concept of entropy to 2 RRVV.
◮ 2 or more RRVV are needed when analyzing communication channels from the point of view of Information Theory.
◮ These RRVV can also be seen as a random vector.
◮ The underlying concept is the characterization of channel input vs. channel output, and what we can learn about the former by observing the latter.
Joint entropy of 2 RRVV:
$H(X,Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \cdot \log_2 \frac{1}{p(x,y)} = E_{p(x,y)}\left[\log_2 \frac{1}{P(X,Y)}\right]$
Properties: $H(X,Y) \geq \max[H(X), H(Y)]$; $H(X,Y) \leq H(X) + H(Y)$.
• 21. Conditional entropy
Conditional entropy of 2 RRVV:
$H(Y|X) = \sum_{x \in \mathcal{X}} p(x) \cdot H(Y|X=x) = \sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y|x) \cdot \log_2 \frac{1}{p(y|x)} = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \cdot \log_2 \frac{1}{p(y|x)} = E_{p(x,y)}\left[\log_2 \frac{1}{P(Y|X)}\right]$
H(Y|X) is a measure of the uncertainty in Y once X is known.
Properties: $H(Y|Y) = 0$; $H(Y|X) = H(Y)$ if X and Y are independent.
• 22. Chain rule
Relation between the joint and conditional entropies of 2 RRVV: $H(X,Y) = H(X) + H(Y|X)$.
The expression points towards the following common-wisdom result: "joint knowledge about X and Y is the knowledge about X plus the information in Y not related to X".
Proof: $p(x,y) = p(x) \cdot p(y|x)$, so $\log_2(p(x,y)) = \log_2(p(x)) + \log_2(p(y|x))$; taking expectations,
$E_{p(x,y)}[\log_2(P(X,Y))] = E_{p(x,y)}[\log_2(P(X))] + E_{p(x,y)}[\log_2(P(Y|X))]$.
Corollary: $H(X,Y|Z) = H(X|Z) + H(Y|X,Z)$.
As another consequence: $H(Y|X) \leq H(Y)$.
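A quick numerical check of the chain rule (added here; the joint distribution is a hypothetical example):

```python
import numpy as np

# Hypothetical joint distribution p(x, y) of two binary RRVV (rows: x, cols: y)
pxy = np.array([[0.30, 0.10],
                [0.20, 0.40]])

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

px = pxy.sum(axis=1)                               # marginal p(x)
Hy_x = -np.sum(pxy * np.log2(pxy / px[:, None]))   # H(Y|X) from its definition
print(H(pxy.flatten()), H(px) + Hy_x)              # H(X,Y) equals H(X) + H(Y|X)
```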
• 23. Relative entropy
H(X) measures the quantity of information needed to describe the RV X on average.
The relative entropy $D(p\|q)$ measures the increase in information needed to describe X when it is characterized by means of a distribution q(x) instead of p(x):
X with p(x) → H(X); X with q(x) → $H(X) + D(p\|q)$.
Definition of relative entropy (aka Kullback-Leibler divergence or, improperly, "distance"):
$D(p\|q) = \sum_{x \in \mathcal{X}} p(x) \cdot \log_2 \frac{p(x)}{q(x)} = E_{p(x)}\left[\log_2 \frac{P(X)}{Q(X)}\right]$
Note the conventions: $\lim_{x \to 0}(x \cdot \log(x)) = 0$; $0 \cdot \log(0/0) = 0$; $p \cdot \log(p/0) = \infty$.
• 24. Relative entropy and mutual information
Properties of relative entropy:
1 $D(p\|q) \geq 0$.
2 $D(p\|q) = 0 \leftrightarrow p(x) = q(x)$.
3 It is not symmetric; therefore, it is not a true distance from the metric point of view.
Mutual information of 2 RRVV:
$I(X;Y) = H(X) - H(X|Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \cdot \log_2 \frac{p(x,y)}{p(x) \cdot p(y)} = D(p(x,y)\,\|\,p(x) \cdot p(y)) = E_{p(x,y)}\left[\log_2 \frac{P(X,Y)}{P(X) \cdot P(Y)}\right]$
The mutual information between X and Y is the information in X, minus the information in X not related to Y.
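The same toy distribution can be reused to compute I(X;Y) as a KL divergence (an added sketch, not from the original deck):

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

pxy = np.array([[0.30, 0.10],
                [0.20, 0.40]])   # same hypothetical joint distribution as above
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
indep = np.outer(px, py)         # product distribution p(x) * p(y)
print(kl_divergence(pxy.flatten(), indep.flatten()))  # I(X;Y); 0 iff independent
```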
• 25. Mutual information
Properties of mutual information:
1 Symmetry: $I(X;Y) = I(Y;X)$.
2 Non-negative: $I(X;Y) \geq 0$.
3 $I(X;Y) = 0 \leftrightarrow$ X and Y are independent.
[Figure 3: Relationship between entropy and mutual information, as a Venn-style diagram relating H(X), H(Y), H(X,Y), H(X|Y), H(Y|X) and I(X;Y).]
• 26. A glimpse of lossy compression
Now that we have defined mutual information, we can examine lossy compression under the so-called Rate-Distortion Theory.
A source X is compressed into a new source $\hat{X}$, a process characterized by the probabilities $p(x)$ and $p(\hat{x},x) = p(\hat{x}|x) p(x)$.
The quality of the compression is measured according to a distortion parameter $d(x,\hat{x})$ and its expected value D:
$D = E_{p(\hat{x},x)}[d(x,\hat{x})] = \sum_{x} \sum_{\hat{x}} d(x,\hat{x}) \, p(\hat{x}|x) \, p(x)$
$d(x,\hat{x})$ is any convenient distance measurement, e.g.:
◮ The squared Euclidean distance, $d(x,\hat{x}) = |x - \hat{x}|^2$.
◮ The Hamming distance, $d(x,\hat{x}) = 1$ if $x \neq \hat{x}$, and 0 otherwise.
• 27. A glimpse of lossy compression
In lossy compression, we want $\hat{X}$ to be represented by the lowest number of bits, and this means finding a distribution $p(\hat{x}|x)$ such that
$R = \min_{p(\hat{x}|x)} I(X;\hat{X}) = \min_{p(\hat{x}|x)} \left[ H(X) - H(X|\hat{X}) \right]$
R is the rate in bits of the compressed source. The source is compressed to an entropy lower than H(X): since the process is lossy, $H(X|\hat{X}) \neq 0$.
The minimum is trivially 0 when X and $\hat{X}$ are fully unrelated, so the minimization is performed subject to $D \leq D^*$, for a maximum admissible distortion $D^*$.
Conversely, the problem can be addressed as the minimization of the distortion D subject to $I(X;\hat{X}) \leq R^*$, for a maximum admissible rate $R^* < H(X)$.
• 28. Discrete memoryless channels
• 29. Discrete memoryless channels
Communication channel:
◮ Input/output system where the output is a probabilistic function of the input.
A DMC (Discrete Memoryless Channel) consists of:
◮ An input alphabet $\mathcal{X} = \{x_0, x_1, \cdots, x_{J-1}\}$, corresponding to RV X.
◮ An output alphabet $\mathcal{Y} = \{y_0, y_1, \cdots, y_{K-1}\}$, corresponding to RV Y, a noisy version of RV X.
◮ A set of transition probabilities linking input and output: $p(y_k|x_j) = P(Y = y_k | X = x_j)$, for $k = 0, 1, \cdots, K-1$ and $j = 0, 1, \cdots, J-1$.
◮ $\mathcal{X}$ and $\mathcal{Y}$ are finite and discrete.
◮ The channel is memoryless, since each output symbol depends only on the current input symbol.
• 30. Discrete memoryless channels
Channel matrix P, of size J × K:
$P = \begin{pmatrix} p(y_0|x_0) & p(y_1|x_0) & \cdots & p(y_{K-1}|x_0) \\ p(y_0|x_1) & p(y_1|x_1) & \cdots & p(y_{K-1}|x_1) \\ \vdots & \vdots & \ddots & \vdots \\ p(y_0|x_{J-1}) & p(y_1|x_{J-1}) & \cdots & p(y_{K-1}|x_{J-1}) \end{pmatrix}$
The channel is said to be symmetric when each column is a permutation of any other, and each row is a permutation of any other.
Important property: $\sum_{k=0}^{K-1} p(y_k|x_j) = 1, \ \forall j = 0, 1, \cdots, J-1$ (each row sums to one).
• 31. Discrete memoryless channels
The output Y is probabilistically determined by the a priori input probabilities and the channel matrix:
$p_X = (p(x_0), p(x_1), \cdots, p(x_{J-1}))$, with $p(x_j) = P(X = x_j)$
$p_Y = (p(y_0), p(y_1), \cdots, p(y_{K-1}))$, with $p(y_k) = P(Y = y_k)$
$p(y_k) = \sum_{j=0}^{J-1} p(y_k|x_j) \cdot p(x_j), \ \forall k = 0, 1, \cdots, K-1$; therefore, $p_Y = p_X \cdot P$.
When J = K and $y_j$ is the correct choice when sending $x_j$, we can calculate the average symbol error probability as
$P_e = \sum_{j=0}^{J-1} p(x_j) \cdot \sum_{k=0, k \neq j}^{K-1} p(y_k|x_j)$
The probability of correct transmission is $1 - P_e$.
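These relations translate directly into code (an added sketch; the BSC-like channel matrix and the uniform input distribution are hypothetical examples):

```python
import numpy as np

p = 0.1                          # hypothetical crossover probability
P = np.array([[1 - p, p],        # channel matrix (rows: inputs x_j, cols: outputs y_k)
              [p, 1 - p]])
px = np.array([0.5, 0.5])        # a priori input distribution

py = px @ P                      # output distribution p_Y = p_X . P
Pe = sum(px[j] * sum(P[j, k] for k in range(P.shape[1]) if k != j)
         for j in range(P.shape[0]))   # average symbol error probability
print(py, Pe)                    # [0.5 0.5] and 0.1
```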
• 32. Discrete memoryless channels
Example of a discrete memoryless channel: modulation with hard decision.
[Figure 4: 16-QAM transmitted constellation (symbols $x_j$).]
[Figure 5: 16-QAM received constellation with noise (outputs $y_k$, transitions $p(y_k|x_j)$).]
• 33. Discrete memoryless channels
Example of a non-symmetric channel → the binary erasure channel, typical of storage systems.
◮ Reading data in storage systems can also be modeled as sending information through a channel with given probabilistic properties.
◮ An erasure marks complete uncertainty about the symbol read.
◮ There are methods to recover from erasures, based on the principles of Information Theory.
[Figure 6: Diagram of the binary erasure channel: inputs $x_0 = 0$, $x_1 = 1$; outputs $y_0 = 0$, $y_1 = \epsilon$ (erasure), $y_2 = 1$; each input passes through with probability $1-p$ and is erased with probability $p$.]
$P = \begin{pmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{pmatrix}$
• 34. Channel capacity
Mutual information depends on P and $p_X$. Characterizing the possibilities of the channel requires removing the dependency on $p_X$.
Channel capacity is defined as $C = \max_{p_X} [I(X;Y)]$.
◮ It is the maximum average mutual information enabled by the channel, in bits per channel use, and the best we can get out of it in terms of reliable information transfer.
◮ It only depends on the channel transition probabilities P.
◮ If the channel is symmetric, the distribution $p_X$ that maximizes I(X;Y) is the uniform one (equiprobable symbols).
Channel coding is a process where controlled redundancy is added to protect data integrity.
◮ A channel-encoded information sequence is represented by a block of n bits, encoded from an information sequence represented by k ≤ n bits.
◮ The so-called code rate is calculated as R = k/n ≤ 1.
• 35. Noisy-channel coding theorem
Consider a discrete source ζ, emitting symbols with period $T_s$.
◮ The binary information rate of such a source is $H(\zeta)/T_s$ (bits/s).
Consider a discrete memoryless channel, used to send coded data every $T_c$ seconds.
◮ The maximum possible data transfer rate would be $C/T_c$ (bits/s).
The noisy-channel coding theorem states the following [5]:
◮ If $H(\zeta)/T_s \leq C/T_c$, there exists a coding scheme that guarantees errorfree transmission (i.e., $P_e$ arbitrarily small).
◮ Conversely, if $H(\zeta)/T_s > C/T_c$, the communication cannot be made reliable (i.e., we cannot drive $P_e$ as low as desired).
Please note that, again, the theorem is asymptotic and not constructive: it does not say how to actually reach the limit.
• 36. Noisy-channel coding theorem
Example: a binary symmetric channel. Consider a binary source ζ = {0, 1}, with equiprobable symbols.
◮ H(ζ) = 1 info bit/"channel use".
◮ The source works at a rate of $1/T_s$ "channel uses"/s, and $H(\zeta)/T_s$ info bits/s.
Consider an encoder with rate $k/n$ info/coded bits, and $1/T_c$ "channel uses"/s.
◮ Note that $k/n = T_c/T_s$.
The maximum achievable rate is $C/T_c$ coded bits/s.
If $H(\zeta)/T_s = 1/T_s \leq C/T_c$, we could find a coding scheme so that $P_e$ is made arbitrarily small.
◮ This means that an appropriate coding scheme has to meet $k/n \leq C$ in order to exploit the possibilities of the noisy-channel coding theorem.
• 37. Noisy-channel coding theorem
The theorem also states that, if a bit error probability $P_b$ is acceptable, coding rates up to $R(P_b) = \frac{C}{1 - H(P_b)}$ are achievable; rates greater than that cannot be achieved with the given bit error probability.
In a binary symmetric channel without noise (error probability p = 0), it can be demonstrated that $C = \max_{p_X}[I(X;Y)] = 1$ bit/"channel use".
[Figure 7: Diagram of the errorfree binary symmetric channel: $x_0 = 0 \to y_0 = 0$ and $x_1 = 1 \to y_1 = 1$, both with probability 1.]
$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
• 38. Noisy-channel coding theorem
In a binary symmetric channel with error probability p ≠ 0,
$C = 1 - H(p) = 1 - \left[ p \cdot \log_2 \frac{1}{p} + (1-p) \cdot \log_2 \frac{1}{1-p} \right]$ bits/"channel use".
[Figure 8: Diagram of the binary symmetric channel BSC(p): each input is received correctly with probability $1-p$ and flipped with probability $p$.]
$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$
In the binary erasure channel with erasure probability p, $C = \max_{p_X}[I(X;Y)] = 1 - p$ bits/"channel use".
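Both capacities are straightforward to evaluate (an added sketch, not part of the original deck):

```python
from math import log2

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(f"p={p}: C_BSC = {1 - Hb(p):.4f}, C_BEC = {1 - p:.4f} bits/channel use")
```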
• 39. Entropy and mutual information for continuous RRVV
• 40. Differential entropy
The differential entropy (or continuous entropy) of a continuous RV X with pdf $f_X(x)$ is defined as
$h(X) = \int_{-\infty}^{\infty} f_X(x) \cdot \log_2 \frac{1}{f_X(x)} \, dx$
It does not measure an absolute quantity of information, hence the term differential (it can take values lower than 0).
The differential entropy of a continuous random vector $\vec{X}$ with joint pdf $f_{\vec{X}}(\vec{x})$ is defined as
$h(\vec{X}) = \int f_{\vec{X}}(\vec{x}) \cdot \log_2 \frac{1}{f_{\vec{X}}(\vec{x})} \, d\vec{x}$, with $\vec{X} = (X_0, \cdots, X_{N-1})$ and $f_{\vec{X}}(\vec{x}) = f_{\vec{X}}(x_0, \cdots, x_{N-1})$.
• 41. Differential entropy
For a given variance value σ², the Gaussian RV exhibits the largest achievable differential entropy.
◮ This means the Gaussian RV has a special place in the domain of continuous RRVV within Information Theory.
Properties of differential entropy:
◮ Invariance under translations: $h(X + c) = h(X)$.
◮ Scaling: $h(a \cdot X) = h(X) + \log_2(|a|)$; for a vector, $h(A \cdot \vec{X}) = h(\vec{X}) + \log_2(|\det A|)$.
◮ For a given variance σ², a Gaussian RV X with $\sigma_X^2 = \sigma^2$ and any other RV Y with $\sigma_Y^2 = \sigma^2$ satisfy $h(X) \geq h(Y)$.
◮ For a Gaussian RV X with variance $\sigma_X^2$: $h(X) = \frac{1}{2}\log_2(2\pi e \sigma_X^2)$.
• 42. Mutual information for continuous RRVV
Mutual information of two continuous RRVV X and Y:
$I(X;Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \cdot \log_2 \frac{f_X(x|y)}{f_X(x)} \, dx \, dy$
$f_{X,Y}(x,y)$ is the joint pdf of X and Y, and $f_X(x|y)$ is the conditional pdf of X given Y = y.
Properties:
◮ Symmetry: $I(X;Y) = I(Y;X)$.
◮ Non-negative: $I(X;Y) \geq 0$.
◮ $I(X;Y) = h(X) - h(X|Y)$.
◮ $I(X;Y) = h(Y) - h(Y|X)$.
$h(X|Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \cdot \log_2 \frac{1}{f_X(x|y)} \, dx \, dy$ is the conditional differential entropy of X given Y.
• 43. AWGN channel capacity theorem
• 44. Continuous channel capacity
Consider a Gaussian discrete memoryless channel, described as follows:
◮ x(t) is a stochastic stationary process, with $m_x = 0$ and bandwidth $W_x = B$ Hz.
◮ The process is sampled with sampling period $T_s = \frac{1}{2B}$, so that $X_k = x(k \cdot T_s)$ are continuous RRVV ∀k, with $E[X_k] = 0$.
◮ A RV $X_k$ is transmitted every $T_s$ seconds over a noisy channel with bandwidth B, during a total of T seconds (n = 2BT total samples).
◮ The channel is AWGN, adding noise samples described by RRVV $N_k$ with $m_n = 0$ and $S_n(f) = \frac{N_0}{2}$, so that $\sigma_n^2 = N_0 B$.
◮ The received samples are statistically independent RRVV, described as $Y_k = X_k + N_k$.
◮ The cost function for any maximization of the mutual information is the signal power $E[X_k^2] = S$.
• 45. Continuous channel capacity
The channel capacity is defined as $C = \max_{f_{X_k}(x)} \left\{ I(X_k;Y_k) : E[X_k^2] = S \right\}$.
◮ $I(X_k;Y_k) = h(Y_k) - h(Y_k|X_k) = h(Y_k) - h(N_k)$.
◮ The maximum is only reached if $h(Y_k)$ is maximized.
◮ This only happens if $f_{X_k}(x)$ is Gaussian!
◮ Therefore, $C = I(X_k;Y_k)$ with $X_k$ Gaussian and $E[X_k^2] = S$.
$E[Y_k^2] = S + \sigma_n^2$, so $h(Y_k) = \frac{1}{2}\log_2\left(2\pi e (S + \sigma_n^2)\right)$ and $h(N_k) = \frac{1}{2}\log_2\left(2\pi e \sigma_n^2\right)$.
$C = I(X_k;Y_k) = h(Y_k) - h(N_k) = \frac{1}{2}\log_2\left(1 + \frac{S}{\sigma_n^2}\right)$ bits/"channel use".
Finally, $C$ (bits/s) $= \frac{n}{T} \cdot C$ (bits/"channel use"), which gives $C = B \cdot \log_2\left(1 + \frac{S}{N_0 B}\right)$ bits/s; note that $\frac{n}{T} = \frac{1}{T_s}$ "channel uses"/s.
• 46. Shannon-Hartley theorem
The Shannon-Hartley theorem states that the capacity of a bandlimited AWGN channel with bandwidth B and noise power spectral density $N_0/2$ is
$C = B \cdot \log_2\left(1 + \frac{S}{N_0 B}\right)$ bits/s
This is the highest possible information transmission rate over this analog communication channel, accomplished with arbitrarily small error probability.
Capacity increases (almost) linearly with B, whereas S determines only a logarithmic increase.
◮ Increasing the available bandwidth has a far larger impact on capacity than increasing the transmission power.
The bandlimited, power-constrained AWGN channel is a very convenient model for real-world communications.
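A minimal sketch of the formula (added here; the bandwidth and SNR figures are hypothetical):

```python
from math import log2

def awgn_capacity(B, S, N0):
    """Shannon-Hartley capacity in bits/s for bandwidth B (Hz), signal
    power S (W) and noise power spectral density N0/2 (W/Hz)."""
    return B * log2(1 + S / (N0 * B))

B, N0 = 1e6, 1e-9                 # hypothetical figures: 1 MHz, N0 = 1e-9 W/Hz
S = 100 * N0 * B                  # signal power for SNR = 100 (20 dB)
print(f"C = {awgn_capacity(B, S, N0) / 1e6:.2f} Mbit/s")  # about 6.66 Mbit/s
```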
• 47. Implications of the channel capacity theorem
Consider an ideal system where $R_b = C$. Then $S = E_b \cdot C$, where $E_b$ is the average bit energy.
$\frac{C}{B} = \log_2\left(1 + \frac{E_b}{N_0}\frac{C}{B}\right) \ \rightarrow \ \frac{E_b}{N_0} = \frac{2^{C/B} - 1}{C/B}$
If we represent the spectral efficiency $\eta = R_b/B$ as a function of $E_b/N_0$, the previous expression is an asymptotic curve on such a plane that marks the border between the reliable zone and the unreliable zone.
◮ This curve helps us identify the parameter set for a communication system so that it may achieve reliable transmission with a given quality (measured in terms of a limited, maximum error rate).
When B → ∞, $\left.\frac{E_b}{N_0}\right|_\infty = \ln(2) = -1.6$ dB.
◮ This limit is known as the "Shannon limit" for the AWGN channel.
◮ The capacity in the limit is $C_\infty = \frac{S}{N_0}\log_2(e)$.
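The boundary curve and the Shannon limit can be checked numerically (an added sketch):

```python
from math import log10

def ebn0_boundary(eta):
    """Minimum Eb/N0 (linear) on the capacity border for spectral
    efficiency eta = C/B, from Eb/N0 = (2**eta - 1) / eta."""
    return (2 ** eta - 1) / eta

for eta in (4, 2, 1, 0.5, 0.001):
    print(f"eta = {eta:5}: Eb/N0 >= {10 * log10(ebn0_boundary(eta)):6.2f} dB")
# As eta -> 0 the bound tends to ln(2), i.e. about -1.59 dB (the Shannon limit)
```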
• 48. Channel capacity tradeoffs
[Figure 9: Working regions as determined by the Shannon-Hartley theorem (source: www.gaussianwaves.com). No channel coding applied.]
The diagram illustrates the possible tradeoffs, involving $E_b/N_0$, $R_b/B$ and $P_b$.
Note that the spectral efficiency of the modulations is given for optimal pulse shaping.
• 49. Channel capacity tradeoffs
The lower, left-hand side of the plot is the so-called power-limited region.
◮ There, the $E_b/N_0$ is very poor and we have to sacrifice spectral efficiency to get a given transmission quality ($P_b$).
◮ An example of this are deep-space communications, where the received SNR is extremely low due to the huge free-space losses in the link. The only way to get a reliable transmission is to drop the data rate to very low values.
The upper, right-hand side of the plot is the so-called bandwidth-limited region.
◮ There, the desired spectral efficiency $R_b/B$ for fixed B (desired data rate) is traded off against unconstrained transmission power (unconstrained $E_b/N_0$), under a given $P_b$.
◮ An example of this would be a terrestrial DVB transmitting station, where $R_b/B$ is fixed (standardized), and where the transmitting power is only limited by regulatory or technological constraints.
• 50. Bridging towards channel coding
• 51. Discrete-input Continuous-output memoryless channels
We have considered, on the one hand, discrete input/output memoryless channels and, on the other hand, purely Gaussian channels, without paying attention to the input source.
The normal mode of operation in digital communications is characterized by a discrete source (digital modulation, with symbols $s_i$) that goes through an AWGN channel (which adds noise samples n).
Therefore, it makes sense to consider real digital communication processes as instances of Discrete-input Continuous-output memoryless channels (DCMC).
It may be demonstrated that, for an M-ary digital modulation going through an AWGN channel, with outputs $r = s_i + n$, the channel capacity is
$C = \log_2(M) - T\left(\frac{E_b}{N_0}\right)$ bits per "channel use",
where T(·) is a function that has to be evaluated numerically.
• 52. Continuous-output vs Discrete-output capacity
[Figure 10: Capacity of BPSK DCMC against hard-decided BPSK: spectral efficiency (bps/Hz) vs. $E_b/N_0$ (dB), comparing the DCMC BPSK capacity, the hard-decided (DMC) BPSK capacity and the AWGN capacity.]
For low $E_b/N_0$, there is higher capacity available for the DCMC.
As $E_b/N_0$ grows, the DCMC and DMC capacities converge.
There is around 1 to 2 dB to be gained with the DCMC in the $E_b/N_0$ range of interest.
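A Monte Carlo estimate of points on the DCMC BPSK curve can be sketched as follows (added here, not from the original deck; it assumes equiprobable inputs and unit-energy, rate-1 signalling, so that $E_b/N_0 = E_s/N_0$ and $\sigma^2 = N_0/2$):

```python
import numpy as np

def bpsk_capacity(ebn0_db, n=200_000, seed=0):
    """Monte Carlo estimate of the BPSK capacity over AWGN with equiprobable
    inputs: C = 1 - E[log2(1 + exp(-2y/sigma^2))], with y = 1 + noise."""
    ebn0 = 10 ** (ebn0_db / 10)
    sigma2 = 1 / (2 * ebn0)              # noise variance, assuming Es = Eb = 1
    rng = np.random.default_rng(seed)
    y = 1 + rng.normal(scale=np.sqrt(sigma2), size=n)  # x = +1 transmitted
    # log2(1 + e^t) computed stably as logaddexp(0, t) / ln(2)
    return 1 - np.mean(np.logaddexp(0.0, -2 * y / sigma2)) / np.log(2)

for ebn0_db in (-2, 0, 2, 4, 6):
    print(f"Eb/N0 = {ebn0_db:+d} dB -> C ~ {bpsk_capacity(ebn0_db):.3f} bits/use")
```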
• 53. Channel capacity tradeoffs and channel coding
Guaranteeing a given target $P_b$ further limits the attainable zone in the spectral efficiency/signal-to-noise ratio plane, depending on the framework chosen.
For fixed spectral efficiency (fixed $R_b/B$), we move along a horizontal line where we manage the $P_b$ versus $E_b/N_0$ tradeoff.
For fixed signal-to-noise ratio (fixed $E_b/N_0$), we move along a vertical line where we manage the $P_b$ versus $R_b/B$ tradeoff.
[Figure 11: Working regions for given transmission schemes. Capacities are for the DCMC (source: www.comtechefdata.com).]
• 54. Hard demodulation
Hard decision maximum likelihood (ML) demodulation:
◮ Apply the decision region criterion.
◮ Decided symbols are converted back into mapped bits.
◮ The hard demodulator thus outputs a sequence of decided bits.
Hard decision ML demodulation fails to capture the extra information contained in the received constellation.
◮ The distance from the received symbol to the hard-decided symbol may provide a measure of the uncertainty and reliability of the decision.
[Figure 12: 16-QAM received constellation with noise, transitions $p(y_k|x_j)$.]
How could we take advantage of the available extra information?
• 55. Hard demodulation
We may use basic information about the channel and about the transmitted modulation to build new significant demodulation values.
Consider a standard modulation and an AWGN channel, $r = s_i + n$, where $s_i$ is the symbol transmitted in the signal space and n is the AWGN sample corresponding to the vector channel.
Test hypothesis for demodulation:
$P(b_k = b|r) = \frac{f_r(r|b_k = b)\,P(b_k = b)}{f_r(r)} = \frac{P(b_k = b)}{f_r(r)} \sum_{s_j / b_k = b} f_r(r|s_j) = \frac{P(b_k = b)}{f_r(r)} \sum_{s_j / b_k = b} \frac{1}{\sqrt{2\pi\sigma_n^2}}\, e^{-|s_j - r|^2 / 2\sigma_n^2}$
Note that $f_r(r)$ does not depend on the hypothesis $b_k = b$, b = 0, 1.
• 56. Log-Likelihood Ratios
When input bits are equiprobable, we may resort to the quotient
$\lambda_k = \log \frac{P(b_k = 1|r)}{P(b_k = 0|r)} = \log \frac{f_r(r|b_k = 1)}{f_r(r|b_k = 0)} = \log \left( \frac{\sum_{s_j / b_k = 1} e^{-|s_j - r|^2 / 2\sigma_n^2}}{\sum_{s_j / b_k = 0} e^{-|s_j - r|^2 / 2\sigma_n^2}} \right)$
Note that $f_r(r|b_k = b)$ is a likelihood function. $\lambda_k$ is called the Log-Likelihood Ratio (LLR) on bit $b_k$.
◮ Its sign could be used in hard decision to decide whether $b_k$ is more likely to be a 1 ($\lambda_k > 0$) or a 0 ($\lambda_k < 0$).
◮ The modulus of $\lambda_k$ could be used as a measure of the reliability of the hypothesis on $b_k$.
◮ A larger $|\lambda_k|$ means a larger reliability of a possible decision.
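A direct implementation of this quotient (an added sketch; the constellation, bit mapping and noise values are hypothetical examples):

```python
import numpy as np

def bit_llr(r, symbols, bit_of, k, sigma2):
    """LLR of bit k given received sample r, for equiprobable symbols.
    symbols: real-valued constellation points; bit_of(s, k): value of
    bit k in the (hypothetical) mapping of symbol s."""
    metrics = np.exp(-np.abs(symbols - r) ** 2 / (2 * sigma2))
    num = metrics[[bit_of(s, k) == 1 for s in symbols]].sum()
    den = metrics[[bit_of(s, k) == 0 for s in symbols]].sum()
    return np.log(num / den)

# BPSK check: s = +1 carries bit 1, s = -1 carries bit 0; the LLR
# should reduce to the well-known closed form 2r/sigma^2
symbols = np.array([-1.0, +1.0])
bit_of = lambda s, k: int(s > 0)
r, sigma2 = 0.3, 0.5
print(bit_llr(r, symbols, bit_of, 0, sigma2), 2 * r / sigma2)  # both ~ 1.2
```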
• 57. Soft demodulation
We have seen that basic knowledge about the transmission system and the channel may lead to the calculation of probabilistic (soft) values related to a possible decision, under suitable hypotheses.
The calculated LLR values for the received bit stream may provide valuable insight and information for the next subsystem in the receiver chain: the channel decoder.
If the input bits are not equiprobable, we may still use a soft-value characterization, in the form of a quotient of a posteriori demodulation values:
$\log \frac{P(b_k = 1|r)}{P(b_k = 0|r)} = \log \frac{P(b_k = 1)}{P(b_k = 0)} + \log \frac{f_r(r|b_k = 1)}{f_r(r|b_k = 0)}$
where the LLR is updated with the a priori probabilities $P(b_k = b)$.
We are going to see in the next lesson how soft demodulated values play a key role in the most powerful channel encoding/decoding methods.
• 58. Conclusions
• 59. Conclusions
Information Theory represents a cutting-edge research field with applications in communications, artificial intelligence, data mining, machine learning, robotics...
We have seen three fundamental results from Shannon's 1948 seminal work, which constitute the foundations of all modern communications:
◮ The source coding theorem, which states the limits and possibilities of lossless data compression.
◮ The noisy-channel coding theorem, which states the need for channel coding techniques to achieve a given performance using constrained resources. It establishes the asymptotic possibility of errorfree transmission over discrete-input, discrete-output noisy channels.
◮ The Shannon-Hartley theorem, which establishes the absolute (asymptotic) limits for errorfree transmission over AWGN channels, and describes the different tradeoffs involved among the given resources.
• 60. Conclusions
All these results build the attainable working zone for practical and feasible communication systems, managing and trading off constrained resources under a given target performance (BER).
◮ The $\eta = R_b/B$ against $E_b/N_0$ plane (under a target BER) constitutes the playground for designing and bringing into practice any communication standard.
◮ Any movement over the plane has a direct impact on business and revenues in the telco domain.
When addressing practical designs in communications, these results and limits are not always heeded explicitly, but they underlie all of them.
◮ There are lots of common-practice and common-wisdom rules of thumb in the domain, stating what to use when (regarding modulations, channel encoders and so on).
◮ Nevertheless, optimizing a design so as to profit as much as possible from all the resources at hand requires making these limits explicit.
• 61. References
• 62. Bibliography I
[1] J. M. Cioffi, Digital Communications - coding (course). Stanford University, 2010. [Online]. Available: http://web.stanford.edu/group/cioffi/book
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory. New Jersey: John Wiley & Sons, Inc., 2006.
[3] D. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003. [Online]. Available: http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
[4] S. Haykin, Communications Systems. New York: John Wiley & Sons, Inc., 2001.
[5] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, 1948. [Online]. Available: http://plan9.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf