3. Introduction
• There are two main fields of coding theory, namely
o Source coding, which tries to represent the source symbols in minimal form for storage or
transmission efficiency.
o Channel coding, the purpose of which is to enhance detection and correction of
transmission errors, by choosing symbol representations which are far apart from each
other.
• Data compression can be considered an extension of source coding. It can be divided into two
phases:
o Modelling of the information source means defining suitable units for coding, such as
characters or words, and estimating the probability distribution of these units.
o Source coding (also called statistical or entropy coding) is applied to the units, using their
probabilities.
Asst. Prof. Dr. Hamsa A. Abdullah Advanced Coding Techniques 3
4. Information
• Shannon defined a measure of the information for the event x by using a logarithmic measure
operating over the base b. For a discrete random variable X, the information of an outcome X = x is:
I(x) = −log_b(p(x))
• The information of the event depends only on its probability of occurrence, and is not dependent on
its content.
• The base of the logarithmic measure can be converted by using:
log_a(p(x)) = log_b(p(x)) · (1 / log_b(a))
• If this measure is calculated to base 2, the information is said to be measured in bits.
5. Information
• For independent random variables:
I(x, y) = −log p(x, y)
= −log[p(x)p(y)]
= I(x) + I(y)
• If X is Bernoulli with Pr{X = 1} = p, the information of the outcome X = 1 is I = −log(p).
• The information is always a non-negative quantity (it is zero only for an event with probability 1).
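This additivity for independent events is easy to check numerically. A minimal Python sketch (the helper name `info` is mine, not from the slides):

```python
import math

def info(p, base=2):
    # Self-information I(x) = -log_b(p(x)); measured in bits when base = 2
    return -math.log(p, base)

# Independent events: I(x, y) = I(x) + I(y)
px, py = 0.5, 0.25
assert abs(info(px * py) - (info(px) + info(py))) < 1e-12
print(info(0.5))   # 1.0 bit
```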
6. Entropy
• The entropy of a discrete random variable, X, is defined by:
H(X) = −E[log₂ p(X)] = −Σ_{x∈X} p(x) log₂ p(x)
H(X) = Σ_{x∈X} p(x) log₂(1/p(x))
• For discrete random variables, H(X) ≥ 0.
• The entropy is the average information of the random variable X:
H(X) = E[I(X)]
• When base 2 is used, the entropy is measured in bits per symbol.
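The definition above can be computed directly. A minimal Python sketch (the helper name `entropy` is mine):

```python
import math

def entropy(probs, base=2):
    # H(X) = sum_x p(x) log_b(1/p(x)); terms with p(x) = 0 contribute nothing
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit/symbol (fair coin)
print(entropy([1.0]))        # 0.0 (a certain outcome carries no uncertainty)
```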
7. Entropy
• Note that:
o Entropy is the measure of average uncertainty in X.
o Entropy is the average number of bits needed to describe X.
o Entropy is a lower bound on the average length of the shortest description of X.
• The information rate is then equal to:
R = rH(X) bps
where r is the symbol rate in symbols per second.
8. Example
• Entropy of a Bernoulli R.V. with parameter p.
• Solution:
H(X) = −p log(p) − (1 − p) log(1 − p)
9. Example
• Entropy of a uniform R.V. taking on K values: e.g., X ∈ {1, …, K}.
• Solution:
H(X) = −Σ_{x∈X} p(x) log p(x) = Σ_{i=1}^{K} (1/K) log K = log K
• Note: the entropy does not depend on the values that X takes, only on their probabilities (X and X + a have the same entropy!).
10. Example
• A source characterized in the frequency domain with a bandwidth of W = 4000 Hz is sampled
at the Nyquist rate, generating a sequence of values taken from the range A = {−2, −1, 0, 1, 2}
with the following corresponding set of probabilities {1/2, 1/4, 1/8, 1/16, 1/16}. Calculate the source rate
in bits per second.
11. Solution
H(X) = Σ_x p(x) log₂(1/p(x))
H(X) = (1/2) log₂ 2 + (1/4) log₂ 4 + (1/8) log₂ 8 + (2/16) log₂ 16 = 15/8 bits/sample
• The minimum sampling frequency is equal to 8000 samples per second, so that the information rate is
equal to 15 kbps.
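The arithmetic above can be checked in a few lines of Python (a sketch of the same computation):

```python
import math

probs = [1/2, 1/4, 1/8, 1/16, 1/16]
H = sum(p * math.log2(1 / p) for p in probs)   # entropy in bits/sample
R = 2 * 4000 * H                               # Nyquist rate (2W) times entropy
print(H)   # 1.875 (= 15/8 bits/sample)
print(R)   # 15000.0 bps
```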
• Note that:
• Entropy can be evaluated to a different base by using:
H_b(X) = H(X) / log₂ b
12. Example
• A given source emits r = 3000 symbols per second from a range of four symbols, with the probabilities
given in Table:
Xi Pi Ii
A 1/3 1.5849
B 1/3 1.5849
C 1/6 2.5849
D 1/6 2.5849
13. Solution
• The entropy is:
H(X) = Σ_x p(x) log₂(1/p(x))
H(X) = (2/3) log₂ 3 + (2/6) log₂ 6 = 1.9183 bits/symbol
14. Example
• Find the entropy of the 26 English letters (a–z) plus a space character '-'.
• Solution:
• The entropy is:
H(X) = −Σ_{x∈X} p(x) log p(x) = 4.11 bits/letter
15. Transmission of Information
• The channel usually has a negative effect on information transmission, so that not all the
information (or entropy) is transferred to the receiver; instead, a portion of this information is
discarded by the channel, or the channel adds noise to the transferred information. The
model is shown below:
16. Transmission of Information
• The channel is modelled as a conditional probability p(xᵢ|yⱼ) or p(yⱼ|xᵢ) for all values of xᵢ and yⱼ.
• For the simplest case of a binary channel we have:
X = {x₁, x₂}, Y = {y₁, y₂}
• We have four conditional probabilities p(yⱼ|xᵢ) as follows:
p(y₁|x₁): conditional probability of receiving y₁ when the source produced x₁, i.e., the probability of correct reception of x₁
p(y₂|x₂): conditional probability of receiving y₂ when the source produced x₂, i.e., the probability of correct reception of x₂
p(y₂|x₁): conditional probability of receiving y₂ when the source produced x₁, i.e., the probability of an incorrect transition of x₁
p(y₁|x₂): conditional probability of receiving y₁ when the source produced x₂, i.e., the probability of an incorrect transition of x₂
17. Joint and Conditional Entropy
• The joint probability mass function of two random variables X and Y taking values on alphabets X and Y,
respectively, is:
p(x, y) = Pr{X = x, Y = y}, x ∈ X, y ∈ Y
• If 𝑝 𝑥 = Pr 𝑋 = 𝑥 > 0, the conditional probability of Y=y given that X=x is defined by:
p(y|x) = Pr{Y = y | X = x} = p(x, y) / p(x)
18. Joint and Conditional Entropy
• Independence: The events X = x and Y = y are independent if
𝑝(𝑥, 𝑦) = 𝑝(𝑥)𝑝(𝑦)
• The joint entropy: H(X, Y) of two random variables (X, Y) with pmf p(x,y) is defined as:
H(X, Y) = −E[log p(X, Y)] = −Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(x, y)
• The conditional entropy of Y given X is defined as:
H(Y|X) = −E[log p(Y|X)] = −Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(y|x)
19. Joint and Conditional Entropy
• Note that:
H(Y|X) = −Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(y|x)
= −Σ_{x∈X} p(x) Σ_{y∈Y} p(y|x) log p(y|x)
= Σ_{x∈X} p(x) H(Y|X = x)
20. Chain rule
• We know that p(x, y) = p(x)p(y|x). Therefore, taking logarithms and expectations on both
sides we arrive at:
E[log p(X, Y)] = E[log p(X)] + E[log p(Y|X)]
• So chain rule:
H(X, Y) = H(X) + H(Y|X)
• Similarly:
H(X, Y) = H(Y) + H(X|Y)
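The chain rule can be verified numerically on a small joint pmf (a Python sketch; the pmf below is a hypothetical example):

```python
import math
from collections import defaultdict

def H(probs):
    # entropy in bits; zero-probability terms contribute nothing
    return sum(q * math.log2(1 / q) for q in probs if q > 0)

# hypothetical joint pmf p(x, y) over X, Y in {0, 1}
p = {(0, 0): 1/4, (0, 1): 1/4, (1, 0): 1/2, (1, 1): 0.0}

px = defaultdict(float)
for (x, y), q in p.items():
    px[x] += q                      # marginal p(x)

Hxy = H(p.values())
Hx = H(px.values())
# H(Y|X) = sum_x p(x) H(Y | X = x)
HygX = sum(px[x] * H([p[(x, y)] / px[x] for y in (0, 1)]) for x in px)
assert abs(Hxy - (Hx + HygX)) < 1e-12   # chain rule: H(X,Y) = H(X) + H(Y|X)
print(Hxy, Hx, HygX)   # 1.5 1.0 0.5
```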
21. Chain rule
• Note that:
H(X|Y) ≠ H(Y|X)
• But
H(Y) − H(Y|X) = H(X) − H(X|Y)
• As a corollary of the chain rule, it is easy to prove the following:
H(X, Y|Z) = H(X|Z) + H(Y|X, Z)
22. Chain rule
• Mutual information: The mutual information between
two random variables is the "amount of information"
about one random variable obtained through the other
(mutual dependence); alternate interpretations: how much
is your uncertainty about X reduced by knowing Y, or how
much does X inform Y?
I(X, Y) = Σ_{x,y} P(x, y) log [P(x, y) / (P(x)P(y))]
= H(X) − H(X|Y)
= H(Y) − H(Y|X)
= H(X) + H(Y) − H(X, Y)
• Note that
• 𝐼(𝑋, 𝑌 ) = 𝐼(𝑌, 𝑋) ≥ 0, with equality if and only if X and Y
are independent.
23. Example
• Find:
a. 𝐻 𝑋, 𝑌 ,
b. 𝐻 𝑌 𝑋 , 𝐻 𝑋 𝑌 ,
c. 𝐼(𝑋, 𝑌).
X\Y    0     1
0     1/4   1/4
1     1/2    0
24. Solution
a. H(X, Y) = (2/4) log₂ 4 + (1/2) log₂ 2
H(X, Y) = 1.5 bits
26. Solution
c. I(X, Y) = H(X) + H(Y) − H(X, Y)
I(X, Y) = 1 + 0.81 − 1.5 = 0.31
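The same bookkeeping in Python reproduces these answers from the joint table (a sketch; the marginals are row and column sums):

```python
import math

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

joint = [[1/4, 1/4],   # rows: X = 0, 1
         [1/2, 0.0]]   # columns: Y = 0, 1

px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
Hxy = H(q for row in joint for q in row)
I = H(px) + H(py) - Hxy        # I(X,Y) = H(X) + H(Y) - H(X,Y)
print(Hxy)            # 1.5
print(round(I, 2))    # 0.31
```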
27. Channel Capacity
• The operational channel capacity is the number of bits needed to represent the maximum number of distinguishable
signals over n uses of a communication channel.
• If in n transmissions we can send M signals without error, the channel capacity is (log M)/n bits per
transmission.
• The information channel capacity is the maximum mutual information. The operational channel capacity
is equal to the information channel capacity.
28. Channel Capacity
• The channel capacity of a discrete memoryless channel is defined as:
C = max_{p(x)} I(X, Y)
C = max_{p(x)} [H(Y) − H(Y|X)]
29. Noiseless Binary Channel
• Consider the channel presented in Figure. Show that the capacity is 1 bit per symbol (or per channel
use).
p(Y = 0) = p(X = 0) = α₀
p(Y = 1) = p(X = 1) = α₁ = 1 − α₀
I(X; Y) = H(Y) − H(Y|X) = H(Y) ≤ 1
with equality when α₀ = α₁ = 0.5
30. Binary Symmetric Channel(BSC)
• The BSC is characterized by a probability p that one binary symbol converts into the other.
• Each binary symbol also has a probability of being transmitted: the probabilities of a 0 or a 1
being transmitted are α and 1 − α, respectively. According to the notation used,
x₁ = 0, x₂ = 1, y₁ = 0, y₂ = 1
31. Binary Symmetric Channel(BSC)
• The probability matrix for the BSC is equal to:
P_ch = [ 1−p   p
          p   1−p ]
• Channel capacity:
C_BSC = 1 − H(p, 1 − p)
32. Binary Erasure Channel (BEC)
• In the binary erasure channel, some bits are lost (rather than corrupted).
• Here the receiver knows which bit has been erased. Figure shows this channel.
• We are to calculate the capacity of binary erasure channel.
33. Binary Erasure Channel (BEC)
• For this channel, 0 ≤ p ≤ 1/2, where p is the erasure probability, and the channel model has two
inputs and three outputs.
• When the received values are unreliable, or if blocks are detected to contain errors, then erasures are
declared, indicated by the symbol ‘?’. The probability matrix of the BEC is the following:
P_ch = [ 1−p   p    0
          0    p   1−p ]
• Channel capacity:
C_BEC = 1 − p bits per channel use
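Both closed-form capacities above are easy to code (a minimal Python sketch; the function names are mine):

```python
import math

def Hb(p):
    # binary entropy H(p, 1 - p) in bits
    return 0.0 if p in (0.0, 1.0) else p * math.log2(1/p) + (1-p) * math.log2(1/(1-p))

def c_bsc(p):
    return 1 - Hb(p)   # BSC: C = 1 - H(p, 1-p)

def c_bec(p):
    return 1 - p       # BEC: C = 1 - p

print(c_bsc(0.0), c_bsc(0.5))   # 1.0 0.0  (perfect channel vs. useless channel)
print(c_bec(0.1))               # 0.9
```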
34. Example
• Consider the binary channel for which the input range and output range are in both cases equal to {0,
1}. Find P(X|Y).
35. Solution:
P(X|Y) = P(X, Y) / P(Y)
P(X|Y) = P(Y|X) P(X) / P(Y)
• The corresponding transition probability matrix is in this case equal to:
P_ch = P(Y|X) = [ 3/4  1/4
                  1/8  7/8 ]
P(X) = [4/5  1/5]
38. Symmetric and Non Symmetric Channel
• Let us consider channel with transition matrix:
P(y|x) = [ 0.3  0.2  0.5
           0.5  0.3  0.2
           0.2  0.5  0.3 ]
• with the entry in the x-th row and y-th column giving the probability that y is received when x is sent.
• All the rows are permutations of each other and the same holds for all columns. We say that such a
channel is symmetric.
39. Symmetric and Non Symmetric Channel
• Definition
• A channel is said to be symmetric if the rows of its transition matrix are permutations of each other, and
the columns are permutations of each other.
• A channel is said to be weakly symmetric if every row of the transition matrix is a permutation of every
other row, and all the column sums are equal.
• If a channel is symmetric or weakly symmetric, the channel capacity is:
𝑪 = 𝒍𝒐𝒈 |𝒀| − 𝑯(𝒓)
• where r is the set of probabilities labeling branches leaving a code symbol X, or, viewed in a transition
matrix, one row of the transition matrix.
40. Example
• Consider a channel with three different inputs 𝑋 = {1, 2, 3} and the same set of outputs 𝑌 = 1, 2, 3
41. Solution
• The transition probability matrix is:
P(y|x) = [ 0.7  0.1  0.2
           0.2  0.7  0.1
           0.1  0.2  0.7 ]
• C = log₂ 3 − H(0.7, 0.1, 0.2)
• C = 1.585 − [0.7 log₂(1/0.7) + 0.1 log₂(1/0.1) + 0.2 log₂(1/0.2)]
• C = 1.585 − (0.360 + 0.332 + 0.464)
• C = 0.428 bits per channel use
42. Example
• Channel with two erasure symbols, one closer to 0 and one closer to 1, is shown:
43. Solution
• The transition probability matrix is:
P(y|x) = [ 1/3  1/4  1/4  1/6
           1/6  1/4  1/4  1/3 ]
• We see that the two rows have the same set of probabilities. Summing each column we get the constant value 1/2,
• so we can conclude that the channel is weakly symmetric. The set of outputs Y has cardinality |Y| = 4, and we can
calculate the capacity for this channel as:
• C = log₂ 4 − H(1/3, 1/4, 1/4, 1/6)
• C = 2 − [(1/3) log₂ 3 + (1/2) log₂ 4 + (1/6) log₂ 6]
• C = 2 − 1.959 = 0.041
• which is a very poor channel.
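A quick numeric check of the weakly symmetric capacity formula C = log₂|Y| − H(r) (a Python sketch):

```python
import math

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def capacity_weakly_symmetric(P):
    # C = log2|Y| - H(r), where r is any row of the transition matrix
    return math.log2(len(P[0])) - H(P[0])

P = [[1/3, 1/4, 1/4, 1/6],
     [1/6, 1/4, 1/4, 1/3]]
print(round(capacity_weakly_symmetric(P), 3))   # 0.041
```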
44. Non Symmetric Channel
P(Y|X) = [ P₁₁  P₁₂
           P₂₁  P₂₂ ]
• Solve for the auxiliary variables Q₁ and Q₂ from the linear system:
[ P₁₁  P₁₂ ] [ Q₁ ]   [ P₁₁ log P₁₁ + P₁₂ log P₁₂ ]
[ P₂₁  P₂₂ ] [ Q₂ ] = [ P₂₁ log P₂₁ + P₂₂ log P₂₂ ]
• where Q₁ and Q₂ are auxiliary variables. Then:
C = log₂(2^Q₁ + 2^Q₂)
45. Example
• Find the mutual information and channel capacity of the channel given below 𝑝(𝑥1) = 0.6
and 𝑝(𝑥2) = 0.4.
52. Solution
C = log₂(2^Q₁ + 2^Q₂)
C = log₂(2^(−0.655) + 2^(−0.977))
C = log₂(0.635 + 0.508)
C = log₂(1.143) ≈ 0.19 bits
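For a general (nonsymmetric) 2×2 channel, the auxiliary-variable method above can be implemented directly (a sketch, assuming the standard Muroga formulation; the hypothetical helper solves the 2×2 linear system by Cramer's rule):

```python
import math

def capacity_binary(P):
    # Solve P @ Q = h with h_i = sum_j P[i][j] log2 P[i][j],
    # then C = log2(2**Q1 + 2**Q2)
    h = [sum(p * math.log2(p) for p in row if p > 0) for row in P]
    det = P[0][0]*P[1][1] - P[0][1]*P[1][0]
    q1 = (h[0]*P[1][1] - h[1]*P[0][1]) / det
    q2 = (P[0][0]*h[1] - P[1][0]*h[0]) / det
    return math.log2(2**q1 + 2**q2)

# sanity check against the BSC closed form: C = 1 - H(p, 1-p)
p = 0.1
c_ref = 1 + (1-p)*math.log2(1-p) + p*math.log2(p)
assert abs(capacity_binary([[1-p, p], [p, 1-p]]) - c_ref) < 1e-9
```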
53. Continuous Sources and channels
• Differential Entropy
• The differential entropy (in nats) of a continuous source with generic probability density function (pdf) f_X
is defined as:
h(X) = −∫_{−∞}^{∞} f_X(x) log f_X(x) dx
54. Example
• A continuous source X with source alphabet [0, 1) and pdf 𝑓(𝑥) = 2𝑥 has
differential entropy equal to:
h(X) = −∫₀¹ 2x log(2x) dx
= [x²(1 − 2 log(2x)) / 2]₀¹
= 1/2 − log 2 ≈ −0.193 nats
(with the logarithms taken to base e)
• Note that the differential entropy, unlike the entropy, can be negative in its value.
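The integral above can also be checked numerically (a midpoint-rule sketch in Python; natural logs, so the answer is in nats):

```python
import math

# numerically approximate h(X) = -∫_0^1 f(x) ln f(x) dx for f(x) = 2x
n = 200_000
dx = 1.0 / n
h = 0.0
for i in range(n):
    x = (i + 0.5) * dx      # midpoint of each subinterval of [0, 1)
    f = 2 * x
    h -= f * math.log(f) * dx
print(round(h, 4))   # -0.1931  (= 1/2 - ln 2)
```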
55. Example
• Differential entropy of a continuous source with uniform generic distribution:
• A continuous source X with uniform generic distribution over (a, b) has the following differential entropy.
• The pdf of a uniform random variable is:
f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise
56. Example
• The differential entropy is simply:
ℎ 𝑋 = 𝐸[− log 𝑓(𝑋)] = log (𝑏 − 𝑎)
• Notice that the differential entropy can be negative or positive depending on whether 𝑏 − 𝑎 is less
than or greater than 1. In practice, because of this property, differential entropy is usually used as
means to determine mutual information and does not have much operational significance by itself.
57. Example
• Differential entropy of Gaussian sources
• The pdf of a Gaussian random variable is:
f(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²))
58. Example
• The differential entropy is simply:
h(X) = E[−log f(X)]
h(X) = ∫ f(x) [½ log(2πσ²) + (x − μ)²/(2σ²)] dx
= ½ log(2πσ²) + (1/(2σ²)) E[(X − μ)²]
= ½ log(2πσ²) + ½
= ½ log(2πσ²e) nats
• A continuous source X with Gaussian generic distribution of mean μ and variance σ² has this differential entropy.
59. The conditional differential entropy
• The conditional differential entropy of 𝑋 given 𝑌 is:
h(X|Y) = −∬_{x,y} f_{XY}(x, y) log₂ f_{X|Y}(x|y) dx dy
60. The mutual information
• The mutual information between 𝑋 and 𝑌 is:
I(X; Y) = h(X) − h(X|Y) = h(Y) − h(Y|X)
I(X; Y) = ∬_{x,y} f_{XY}(x, y) log [f_{XY}(x, y) / (f_X(x) f_Y(y))] dx dy
• Shannon's channel coding theorem holds for continuous alphabets as well: the
capacity of any channel with power constraint P and transition law f_{Y|X} is
C = max_{f_X} I(X; Y)
61. The mutual information
• The Gaussian random variable is very important as we encounter it
frequently in communications and signal processing.
• You can compute the differential entropy of the Gaussian X ~ N(μ, σ²); it is
equal to:
h(X) = ½ log₂(2πeσ²)
62. The mutual information
• For the power-constrained AWGN channel Yᵢ = Xᵢ + Zᵢ with Zᵢ ~ N(0, σ²), the mutual
information is maximized when f_X is N(0, P). Then,
I(X; Y) = h(Y) − h(Y|X) = h(Y) − h(Z)
• Since X and Z are independent Gaussians, Y ~ N(0, P + σ²). Using the formula for the entropy of
a Gaussian and simplifying, we get:
C = ½ log₂(1 + P/σ²)
63. Channel Efficiency and Redundancy
• Channel efficiency: η_ch = (I/C) × 100%
• Channel redundancy: R_ch = ((C − I)/C) × 100%
64. Example
For the following channel:
P(yⱼ|xᵢ) = [ 0.9  0.1   0
              0   0.9  0.1
             0.1   0   0.9 ]
a) Is the channel symmetric? Why?
b) If the three source symbols probabilities are related by: 𝒑(𝒙𝟏) = 𝒑(𝒙𝟐) = 𝟐. 𝒑(𝒙𝟑), find source
probabilities, all entropies, and average mutual information.
c) Find the channel capacity, channel efficiency and redundancy.
65. Solution
a) The channel is symmetric (or a ternary symmetric channel, TSC), because the rows of P(yⱼ|xᵢ) are
permutations of the same set of probabilities.
b) To find the source probabilities, we have 3 unknowns and so we need 3 equations, which are given by:
P(x₁) = P(x₂) ……(1)
P(x₂) = 2·P(x₃) ……(2)
P(x₁) + P(x₂) + P(x₃) = 1 ……(3)
From (1), P(x₂) = P(x₁); from (2), P(x₃) = P(x₂)/2 = P(x₁)/2.
Putting these relations in (3) gives:
P(x₁) + P(x₁) + P(x₁)/2 = 1 → (5/2)·P(x₁) = 1, or P(x₁) = 2/5 = 0.4
• Now, using (1) and (2): P(x₁) = 0.4, P(x₂) = 0.4, and P(x₃) = 0.2
• So P(xᵢ) = [0.4  0.4  0.2]
66. Solution
• From the relation P(xᵢ, yⱼ) = p(xᵢ) p(yⱼ|xᵢ) and the given matrix of p(yⱼ|xᵢ):
P(xᵢ, yⱼ) = [ 0.36  0.04   0
               0    0.36  0.04
              0.02   0    0.18 ]
• Summing the column components gives P(yⱼ) = [0.38  0.40  0.22]
• Other probabilities are unnecessary in this example; now calculate:
H(x) = −Σᵢ P(xᵢ) log P(xᵢ) = 1.5219 bits/symbol
H(y) = −Σⱼ P(yⱼ) log P(yⱼ) = 1.5398 bits/symbol
• Since we have a symmetric channel, H(y|x) = −Σⱼ P(yⱼ|xᵢ) log P(yⱼ|xᵢ) for any row:
• H(y|x) = −[0.9 log 0.9 + 0.1 log 0.1 + 0 log 0] = 0.467 bits/symbol
67. Solution
• I= H(Y) – H(Y|X) = 1.5398 - 0.467 = 1.0728 Bits/Symbol
• H(X|Y) = H(X) – I = 1.5219 - 1.0728 = 0.4491 Bits/Symbol
• H(X,Y) = H(X) + H(Y) – I = 1.5219 + 1.5398 - 1.0728 = 1.9889 Bits/Symbol
c) Using the general expression for channel capacity for symmetric channel:
• C = log M − H(Y|X) = log M + Σⱼ P(yⱼ|xᵢ) log P(yⱼ|xᵢ)
= log₂ 3 − 0.467 = 1.1179 bits/symbol
• Channel efficiency: η_ch = (I/C) × 100% = (1.0728/1.1179) × 100% = 95.96%
• Channel redundancy: R_ch = ((C − I)/C) × 100% = ((1.1179 − 1.0728)/1.1179) × 100% = 4.04%
68. Entropy, Information, and Capacity Rates
• The meaning of rate is the unit of a physical quantity per unit time. For the entropy, the
average mutual information, and the capacity, the rate is measured in bits per second
(bits/sec), or more generally bps. This is more important than the unit bits/symbol.
• In terms of units: (bits/symbol) × (symbols/second) = bits/second, or bps
• Let Rₓ be the source symbol rate; then the time of the symbol (Tₓ) is given by:
Tₓ = 1/Rₓ seconds/symbol
• Now each of H, I, and C can be converted from the bits/symbol unit into a rate in bps by
multiplying each of them by Rₓ, as follows:
H′(x) = Rₓ·H(x)
I′ = Rₓ·I
C′ = Rₓ·C
69. Example
• For the previous example, find H′(x), I′, and C′, if the average time interval of the source symbol is 10 μsec.
• Solution:
o Since Tₓ = 1/Rₓ, then Rₓ = 1/Tₓ = 1/(10 × 10⁻⁶) = 10⁵ = 100000 symbols/sec
o From the results of the previous example:
- H(x) = 1.5219 bits/symbol, so H′(x) = Rₓ·H(x) = 152190 bps
- I = 1.0728 bits/symbol, so I′ = Rₓ·I = 107280 bps
- C = 1.1179 bits/symbol, so C′ = Rₓ·C = 111790 bps
70. Information and Capacity Over a Continuous Channel
a) The bandwidth (B): the bandwidth is the range of frequency occupied by a given signal or system, in
Hz. It can be measured as the difference between fmax and fmin over the positive side of the frequency
domain.
b) Nyquist's theorem: the maximum sample or symbol rate of a signal over a channel having bandwidth
B is limited to 2B symbols/sec. In mathematical representation: R ≤ 2B symbols/sec (Rmax = 2B).
c) The signal-to-noise power ratio (S/N): the ratio of the signal power (S, in Watts) to the noise
power (N, also in Watts) in the channel. It is therefore unit-less (a ratio), and is usually expressed in dB, where:
(S/N)_dB = 10·log₁₀(S/N)_ratio dB
• The inverse conversion is also required: (S/N)_ratio = 10^((S/N)_dB / 10)
71. Information and Capacity Over a Continuous Channel
• The model in the case of a continuous channel is shown below:
• As before, the source output consists of continuous random variable symbols x with pdf
f(x), the received symbol is y with pdf f(y), and the noise is n with pdf f(n). It
is required to find an expression for C.
[Diagram: source x with pdf f(x) → adder (+) ← noise n with pdf f(n) (AWGN); output: received y with pdf f(y)]
72. Information and Capacity Over a Continuous Channel
• Assumptions / their reasons:
1. f(x) is Gaussian (normal) / the maximum source entropy Hmax(x) is achieved by a Gaussian RV.
2. f(n) is also Gaussian / due to the following:
a. Natural noise is totally random in nature, like a Gaussian RV.
b. We need to test the system under the worst case of noise (Gaussian).
c. According to the central limit theorem, "the sum of unknown independent noise sources can be modeled as a
Gaussian RV".
3. The mean values of the source and noise are zero (x̄ = 0 and n̄ = 0) / the DC level or the mean does
not affect the information.
73. Information and Capacity Over a Continuous Channel
• Since the noise is added to the signal and has Gaussian pdf, it is called Additive White Gaussian Noise
(AWGN). The term white here is used to specify that the noise is present in all frequencies with the same
power spectral density. So, the above model is also called AWGN channel model.
• Now we shall use the above definitions and assumptions to derive the channel capacity for a continuous
channel:
• Since the source and noise are both Gaussian with zero means:
- The signal power is S = E[x²] = σₓ²
• Then, H(x) = ½ log(2πeσₓ²) = ½ log(2πeS) (the signal entropy)
- The noise power is N = E[n²] = σₙ²
• Then, H(n) = ½ log(2πeσₙ²) = ½ log(2πeN),
or H(y|x) = ½ log(2πeN) (the noise entropy)
74. Information and Capacity Over a Continuous Channel
- Since y = x + n, and both x and n are Gaussian RVs, y is also a Gaussian RV, with mean ȳ = x̄ + n̄ = 0;
so the received signal power is (S + N) = E[y²] = σ_y², and then
H(y) = ½ log(2πeσ_y²) = ½ log(2πe(S + N)) (the receiver entropy)
• Since we assume maximum source entropy over the given AWGN channel:
C = Imax = H(y) − H(y|x)
= ½ log(2πe(S + N)) − ½ log(2πeN)
= ½ log[2πe(S + N) / (2πeN)]
= ½ log[(S + N)/N]
= ½ log(1 + S/N)
75. Information and Capacity Over a Continuous Channel
• We have C = ½ log₂(1 + S/N) bits/symbol.
• Using Nyquist's theorem above, Rmax = 2B symbols/sec.
• Thus, the capacity rate in bps is given by:
Cr = Rmax·C = B log₂(1 + S/N) bps
• This equation is known as the Shannon–Hartley equation for channel capacity:
Cr = B log₂(1 + S/N) bps
• The above equation relates the bandwidth of the channel with both the signal power and the noise
power. Clearly, when B or S/N is increased, the capacity rate also increases. In practice this is not
always true, since the noise power also increases as the bandwidth increases, where
N = N₀B (in Watts)
• where B is the channel bandwidth as before and N₀ is the noise power spectral density (in
Watts/Hz). It is the power of the noise in each Hz of the channel bandwidth.
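The Shannon–Hartley equation in code, including the dB conversion (a sketch; the function name is mine):

```python
import math

def shannon_capacity(bandwidth_hz, snr_db):
    snr = 10 ** (snr_db / 10)                  # dB -> power ratio
    return bandwidth_hz * math.log2(1 + snr)   # capacity rate in bps

# e.g. B = 100 MHz, S/N = 20 dB
print(shannon_capacity(100e6, 20) / 1e6)   # ≈ 665.8 Mbps
```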
76. Example
a. A 4G cellular system uses a maximum bandwidth of 100 MHz with an efficient signal that provides
S/N = 20 dB; find the maximum bit rate.
b. If the above is replaced by a Huawei 5G cellular system that provides an extended bandwidth of 500
MHz, using the same signal and an S/N of 20 dB, what is the percentage increase in the system bit rate?
77. Solution
a. We have S/N = 20 dB; this should be converted to a ratio to be used inside the Shannon–Hartley equation, thus:
(S/N)_ratio = 10^((S/N)_dB / 10) = 10^(20/10) = 10² = 100 (ratio)
• Now: Cr = B log₂(1 + S/N)
• = 100×10⁶ × log₂(1 + 100) ≈ 666 Mbps
78. Solution
b. Using the 500 MHz bandwidth:
• Cr = B log₂(1 + S/N) = 500×10⁶ × log₂(1 + 100) ≈ 3330 Mbps (or 3.33 Gbps)
• % increase in rate = ((new rate − old rate)/old rate) × 100%
= ((3330 − 666)/666) × 100% = 400%
• This means, the rate (bps) in 5G is four times that of 4G.
79. Example
• Consider the following specifications for a digital image: image frame resolution (dimensions) =
1200×800 pixels/frame; colored (RGB) information for each pixel = 24 bits/pixel. The pixels are
equally probable to have any color value. Find:
a. the amount of information carried by one frame (in bits/frame).
b. the amount of information produced by 1000 frames.
c. the rate of information (in bps), if the above 1000 frames are sent within 100 sec.
d. the required channel bandwidth if the signal to noise power ratio is 45 dB.
80. Solution
• First, we need to know the details of a digital image. It consists of a number of picture elements, also
called pixels (or pels, or dots). One can notice these small elements when getting very close to a TV screen. The
single image, also known as a frame or just a picture, is a 2-dimensional arrangement of a large number of
pixels. This number is determined by the height (H) and the width (W) of the picture or frame; in the
above example W×H = 1200×800 (also called the resolution). The color depth is the number of
bits in each pixel. Higher resolution produces a better quality picture.
a. Given W×H = 1200×800 and 24 bits/pixel, we need to find the total information of one frame:
I_frame = 1200×800 (pixels/frame) × 24 (bits/pixel) = 2304×10⁴ bits/frame
81. Solution
b. I_total = 2304×10⁴ (bits/frame) × 1000 frames = 2304×10⁷ bits
c. R_b = 2304×10⁷ bits / 100 sec = 2304×10⁵ bps
d. Here Cr = R_b = 2304×10⁵ bps
• and (S/N)_ratio = 10^((S/N)_dB / 10) = 10^(45/10) = 10^4.5 = 31622.8 (ratio)
• Using the channel capacity theorem Cr = B log₂(1 + S/N), then:
• B = Cr / log₂(1 + S/N) = 2304×10⁵ / log₂(1 + 31622.8) = 2304×10⁵ / 14.95 = 15.4×10⁶ Hz = 15.4 MHz
82. Example
• Repeat the requirements of the previous example if the image is a gray-scale image with 8 bits/pixel.
83. Solution
• The image here is a gray-scale image (also called a black-and-white or B/W picture). So instead of 24 colored bits per pixel we
have 8 bits per pixel.
a. Given W×H = 1200×800 as before and 8 bits/pixel:
I_frame = 1200×800 (pixels/frame) × 8 (bits/pixel) = 768×10⁴ bits/frame
b. I_total = 768×10⁴ (bits/frame) × 1000 frames = 768×10⁷ bits
c. R_b = 768×10⁷ bits / 100 sec = 768×10⁵ bps
d. Here Cr = R_b = 768×10⁵ bps
and (S/N)_ratio = 10^((S/N)_dB / 10) = 10^(45/10) = 10^4.5 = 31622.8 (ratio)
Using the channel capacity theorem Cr = B log₂(1 + S/N), then:
B = Cr / log₂(1 + 31622.8) = 768×10⁵ / 14.95 = 5.14×10⁶ Hz = 5.14 MHz