Information Theory and Coding - Lecture 2
1. Mustaqbal University
College of Engineering & Computer Sciences
Electronics and Communication Engineering Department
Course: EE301: Probability Theory and Applications
Prerequisite: Stat 219
Text Book: B.P. Lathi, "Modern Digital and Analog Communication Systems", 3rd edition, Oxford University
Press, Inc., 1998
Reference: A. Papoulis, Probability, Random Variables, and Stochastic Processes, Mc-Graw Hill, 2005
Dr. Aref Hassan Kurdali
2. Application: Information Theory
• In the context of communications, information theory deals with
mathematical modeling and analysis of a communication system
rather than with physical sources and physical channels.
• In particular, it provides answers to two fundamental questions
(among others):
1) What is the minimum number of binits (binary digits) per source
symbol required to fully represent the source at acceptable quality?
(Most efficient source coding)
2) What is the ultimate (highest) transmission binit rate for reliable
communication (no error transmission) over a noisy channel?
(Most efficient channel coding)
3. The answers to these two questions lie in the entropy of a source and the
capacity of a channel respectively.
Entropy is defined in terms of the probabilistic behavior of a source of
information (how much uncertainty, on average, does an information source carry?);
it is so named because of the parallel use of this concept in
thermodynamics (how much disorder, on average, does a physical system exhibit?).
Capacity is defined as the basic ability of a channel to transmit
information; it is naturally related to the noise characteristics of the
channel.
A remarkable result that emerges from information theory is that if the
entropy of the source is less than the capacity of the channel, then
error-free communication over the channel can be achieved.
4. The discrete source output is modeled as a discrete random variable, S,
which takes on symbols from a fixed finite alphabet
S = {s1, s2, s3, ..., sq}
with probability distribution P(S = si) = pi, i = 1, 2, 3, ..., q,
where p1 + p2 + ... + pq = 1 (i.e., Σ pi = 1, summed over i = 1 to q).
A discrete memoryless source (zero-memory source) emits statistically
independent symbols during successive signaling intervals: the symbol
emitted at any time is independent of all previously emitted symbols.
Discrete Memoryless Source
5. Information Measure
How much information I(a) is associated with an event 'a' whose
probability is p(a) = p?
The information measure I(a) should have several properties:
1. Information is a non-negative quantity: I(a) β₯ 0.
2. If an event has probability 1, we get no information from the
occurrence of that event, i.e. I(a) = 0 if p (a) =1.
3. If two independent events (a & b) occur (whose joint probability is the
product of their individual probabilities i.e. p(ab) = p(a)p(b)), then the
total information we get from observing these two events is the sum of
the two informations:
I(ab) = I(a)+I(b). (This is the critical property . . . )
4. The information measure should be a continuous (and, in fact,
monotonic) function of the probability (slight changes in probability
should result in slight changes in information).
6. Since I(a^2) = I(aa) = I(a) + I(a) = 2 I(a),
we get, by continuity, for 0 < p(a) ≤ 1 and any real n > 0:
I(a^n) = n · I(a)
From this, information can be measured by the logarithm function,
i.e. I(a) = -log_b(p(a)) = log_b(1/p(a)) for some base b.
The base b determines the unit of information used.
The unit can be changed by changing the base, using the base-conversion formula:
for b1, b2, x > 0,
log_b1(x) = log_b2(x) / log_b2(b1)
7. The occurrence of an event S = sk either provides some or no information, but never
brings about a loss of information.
The less probable an event is, the more information we gain when it occurs.
Uncertainty, Surprise, and Information
The amount of uncertainty (before), surprise (at), or information gained
(after) observing the event S = sk, which occurs with probability pk, is
therefore defined using the logarithmic function:
I(sk) = log(1/pk) = -log(pk)
8. Units of information
The base of the logarithm in the information measure is quite arbitrary.
Nevertheless, it is the standard practice today to use a logarithm to base 2.
The resulting unit of information is called the bit
When pk = 1/2, we have I(sk) = 1 bit. Hence, one bit is the amount of
information that we gain when one of two possible and equally likely
(i.e., equiprobable) events occurs.
If a logarithm to base 10 is used, the resulting unit of information is
called the hartley. When pk = 1/10, we have I(sk) = 1 hartley.
A logarithm to base e can also be used; the resulting unit of information
is called the nat. When pk = 1/e, we have I(sk) = 1 nat.
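As a quick numerical check of these units, the information measure can be sketched in Python (the function name `information` is ours, not from the textbook):

```python
import math

def information(p, base=2):
    """Self-information I = log_base(1/p) of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return math.log(1 / p, base)

bits = information(0.5, base=2)              # one of two equiprobable events: 1 bit
hartleys = information(0.1, base=10)         # p = 1/10 with base 10: 1 hartley
nats = information(1 / math.e, base=math.e)  # p = 1/e with base e: 1 nat
```

Changing `base` implements the base-conversion formula above, since math.log(x, b) = ln(x)/ln(b).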
9. Source Entropy H(S)
The entropy of a discrete memoryless source is defined as
H(S) = Σ pi I(si) = Σ pi log(1/pi), summed over i = 1 to q
It is the average amount of information content per source symbol.
The source entropy is bounded as follows:
0 β€ H(S) β€ log q
where q is the radix (number of symbols) of the alphabet of the source.
Furthermore, we may make two statements:
1. H(S) = 0, if and only if the probability pi = 1 for some i, and the
remaining probabilities in the set are all zero; this lower bound on
entropy corresponds to no uncertainty.
2. H(S) = log q, if and only if pi = 1/q for all i (i.e., all the symbols in the
alphabet are equiprobable); this upper bound on entropy corresponds to
maximum uncertainty.
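A short Python sketch (ours, for illustration) computes H(S) and verifies the two bounds just stated:

```python
import math

def entropy(probs, base=2):
    """H(S) = sum of p_i * log(1/p_i); zero-probability symbols contribute nothing."""
    assert abs(sum(probs) - 1) < 1e-9, "probabilities must sum to 1"
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

h_max = entropy([0.25, 0.25, 0.25, 0.25])  # equiprobable: log2(4) = 2, the upper bound
h_min = entropy([1.0, 0.0, 0.0, 0.0])      # one certain symbol: 0, the lower bound
```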
10. Consider a binary source for which
symbol 0 occurs with probability p0 and
symbol 1 with probability pl = 1 β p0.
The source is memoryless so that
successive symbols emitted by the
source are statistically independent.
The entropy of the binary source is
usually called the entropy function:
h(p0) = p0 log(1/p0) + (1 - p0) log(1/(1 - p0))
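The entropy function is easy to evaluate directly; a minimal sketch:

```python
import math

def h(p0):
    """Binary entropy function in bits, with h(0) = h(1) = 0 by convention."""
    if p0 in (0.0, 1.0):
        return 0.0
    return p0 * math.log2(1 / p0) + (1 - p0) * math.log2(1 / (1 - p0))
```

h(p0) peaks at 1 bit when the symbols are equiprobable (p0 = 1/2) and is symmetric about that point.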
11. We often find it useful to consider blocks rather than individual symbols,
with each block consisting of n successive source symbols.
We may view each such block as being produced by an extended source
with a source alphabet that has q^n distinct blocks, where q is the number
of distinct symbols in the source alphabet of the original source.
a) In the case of a discrete memoryless source, the source symbols are
statistically independent. Hence, the probability of an extended source
symbol is equal to the product of the probabilities of the n original
source symbols constituting that extended source symbol. Thus,
it is intuitive to expect that H(S^n), the entropy of the extended
source, equals n times H(S), the entropy of the original source. That
is, we may write
H(S^n) = n H(S)
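The relation H(S^n) = n H(S) is easy to confirm numerically for a memoryless source; a sketch (helper names are ours):

```python
import itertools
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def extension(probs, n):
    """Symbol probabilities of the n-th extension: products over all n-blocks."""
    return [math.prod(block) for block in itertools.product(probs, repeat=n)]

src = [0.8, 0.2]                 # a binary memoryless source
h2 = entropy(extension(src, 2))  # entropy of the 2nd extension S^2
h3 = entropy(extension(src, 3))  # entropy of the 3rd extension S^3
```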
12. Problems
1. Find the entropy of a 7-symbol source with uniform distribution.
(Answer: 2.81 bits of information/SS)
2. Given a five-symbol source with the probability distribution
{1/2, 1/4, 1/8, 1/16, 1/16}, calculate the average
amount of information per source symbol. (Answer: 1.875
bits/SS)
3. Given a 3-symbol, zero-memory source S = {a, b, c} with joint
information I(bc) = log(12) bits of information, find any possible
probability distribution of the source S. (Answer: {5/12, 1/3, 1/4})
4. Consider a zero memory binary source S with P(s1) = 0.8 &
P(s2) = 0.2.
a) Construct 2nd and 3rd extensions of the source S.
b) Find the corresponding probability distribution of each extension.
c) Calculate the average amount of information per source symbol (H(S2) and
H(S3)).
13. The process by which the data generated by a discrete source (with a finite source
alphabet) is given an efficient representation is called source encoding. The device that
performs this representation is called a source encoder. For the source encoder to be efficient,
knowledge of the statistics of the source is required. In particular, if some source
alphabets (symbols) are known to be more probable than others, then this feature may
be exploited in the generation of a source code by assigning short code words to
frequent source symbols, and long code words to rare source symbols in order to
achieve a lower code rate (# of code symbols/sec.) and hence use a narrower communication
channel bandwidth in Hz for transmission, or fewer memory bits for storage. Such a
source code is called a variable-length code.
Let r represent the code radix (the number of symbols in the code alphabet): r = 2 for a
binary code, r = 8 for an octal code, r = 10 for a decimal code, and so on.
Let j be a codeword length (# of code symbols per codeword) and nj the # of codewords
of length j.
An efficient source encoder should satisfy two functional requirements:
1. The code words produced by the encoder are in binary form.
2. The source code is uniquely decodable, so that the original source sequence can
be reconstructed perfectly from the encoded binary sequence.
Source Coding Theory
14. Prefix (Instantaneous) Code
(Entropy Code - Lossless Data Compression)
For a source variable length code to be of practical use, the code has to
be uniquely decodable (The code and all its extensions must be unique).
This restriction ensures that for each finite sequence of symbols emitted
by the source, the corresponding sequence of code words is unique and
different from the sequence of code words corresponding to any other
source sequence. A prefix (instantaneous) code (a subclass of uniquely
decodable codes) is defined as a code in which no codeword is the prefix of
any other codeword.
Only Code II is a prefix code, and a prefix code is always uniquely decodable.
Code III is also a uniquely decodable code, since the bit 0 indicates the
beginning of each codeword, but it is not an instantaneous code. Each codeword
of an instantaneous code can be decoded directly once it is completely
received. (Code I is not uniquely decodable; for example, when 00 is received it could be s2 or s0 s0.)
15. Decision Tree
The shown decision tree is a graphical
representation of the code words which has
an initial state and four terminal states
corresponding to source symbols s0, s1, s2,
and s3. Source symbols must not appear at
intermediate states, in order to satisfy the prefix
condition. The decoder always
starts at the initial state. The first received bit
moves the decoder to the terminal state s0
if it is 0, or else to a second decision point if
it is 1. In the latter case, the second bit moves
the decoder one step further down the tree,
either to terminal state s2 if it is 0, or else to
a third decision point if it is 1, and so on.
Once each terminal state emits its symbol, the decoder is reset to its initial state. Note
also that each bit in the received encoded sequence is examined only once.
For example, the encoded sequence 1011111000... is readily decoded as the source
sequence s1 s3 s2 s0 s0 ...
16. Kraft-McMillan Inequality
Σ_{j=1}^{l} nj r^(-j) = Σ_{i=1}^{q} r^(-li) ≤ 1
Where r is the code radix (number of symbols in the code alphabet, r =2 for
binary code), nj is the # of codewords of length j and l is the maximum
codeword length. Moreover, if a prefix code has been constructed for a discrete
memoryless source with source alphabet (s1, s2, . . . , sq) and source statistics
(P1, P2 , . . . , Pq) and the codeword for symbol si has length li, i = 1, 2, . . . , q,
then the codeword lengths must satisfy the above inequality known as the
Kraft-McMillan Inequality. It does not tell us that a source code is a prefix
code. Rather, it is merely a condition on the codeword lengths of the code and
not on the code words themselves. Referring to the three codes listed in Table
9.2: Code I violates the Kraft-McMillan inequality, so it cannot be a
prefix code; the inequality is satisfied by both Codes II
and III, but only Code II is a prefix code.
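The inequality is a one-line check. In the sketch below, the codeword lengths for Codes I and II are our reading of Table 9.2 ({1, 1, 2, 2} and {1, 2, 3, 3} respectively):

```python
def kraft_sum(lengths, r=2):
    """Sum of r**(-l_i) over all codeword lengths; a prefix code with these
    lengths exists if and only if the sum is at most 1."""
    return sum(r ** -l for l in lengths)

code_I = kraft_sum([1, 1, 2, 2])   # 1.5 > 1: cannot be a prefix code
code_II = kraft_sum([1, 2, 3, 3])  # exactly 1: a prefix code is possible
```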
17. Kraft-McMillan Inequality
Prefix codes are distinguished from other uniquely decodable codes by the fact
that the end of the code word is always recognizable. Hence, the decoding of a
prefix code can be accomplished as soon as the binary sequence representing a
source symbol is fully received. For this reason, prefix codes are also referred
to as instantaneous codes.
19. Coding Efficiency
Assume the source has an alphabet with q different symbols, and that the ith symbol si
occurs with probability pi , i = 1, 2,. . . , q. Let the binary code word assigned to symbol
si by the encoder have length li measured in binits.
Then, the average code-word length, L, of the source encoder is defined as
In physical terms, the parameter L represents the average number of binits per source
symbol used in the source encoding process. Let Lmin denote the minimum possible
value of L, then, the coding efficiency of the source encoder is defined as
η = Lmin / L
With L ≥ Lmin we clearly have η ≤ 1. The source encoder is said to be efficient when η
approaches unity.
L = Σ_{i=1}^{q} pi li
20. Data Compaction
A common characteristic of signals generated by physical sources is that,
in their natural form, they contain a significant amount of information
that is redundant. The transmission of such redundancy is therefore
wasteful of primary communication resources. For efficient signal
transmission, the redundant information should be removed from the
signal prior to transmission.
This operation, with no loss of information, is ordinarily performed on a
signal in digital form, in which case it is called data compaction or
lossless data compression.
According to the source-coding theorem, the entropy H(S) represents a
fundamental limit on the removal of redundancy from the data; i.e., the
average number of bits per source symbol necessary to represent a
discrete memoryless source can be made as small as, but no smaller than,
the entropy H(S).
Thus with Lmin = H(S), the efficiency of a source encoder may be
rewritten in terms of the source entropy H(S) as
η = H(S) / L
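Average length and efficiency can be computed together, as in this sketch (the example distribution is ours; its dyadic probabilities make η come out exactly 1):

```python
import math

def avg_length(probs, lengths):
    """L = sum of p_i * l_i, in binits per source symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

def efficiency(probs, lengths):
    """eta = H(S) / L, using Lmin = H(S) from the source-coding theorem."""
    h = sum(p * math.log2(1 / p) for p in probs if p > 0)
    return h / avg_length(probs, lengths)

probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]            # e.g. the prefix code {0, 10, 110, 111}
eta = efficiency(probs, lengths)  # 1.0: the code matches the statistics exactly
```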
22. Huffman Code
An important class of prefix codes is known as Huffman codes. A Huffman code is, by
construction, the most efficient prefix code (highest possible efficiency without coding
extensions of the source).
The radix-r Huffman algorithm proceeds as follows:
1. List the source symbols in order of decreasing probability.
2. The total # of source symbols q should equal b(r-1)+1 for some b = 0, 1, 2, 3, ...;
otherwise, dummy symbols with zero probability are appended to the end of the list.
3. Regard the r source symbols of lowest probability as combined into a new source
symbol with probability equal to the sum of the original r probabilities; the list of
source symbols is thereby reduced in size by (r-1). Place the probability of the new
symbol in the list in accordance with its value (keep the probabilities in descending
order at all times).
4. Repeat the procedure until a final list of r combined symbols remains, and assign a
code symbol to each one.
5. The code for each (original) source symbol is found by working backward and
tracing the sequence of code symbols assigned to that source symbol and its
successors.
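For the binary case (r = 2), the steps above can be sketched with a priority queue; this is our illustrative implementation, not the textbook's:

```python
import heapq
import itertools

def huffman(probs):
    """Binary Huffman code: maps each symbol index to a '0'/'1' codeword."""
    tie = itertools.count()  # tie-breaker so equal probabilities never compare dicts
    heap = [(p, next(tie), {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # the two least probable entries...
        p1, _, c1 = heapq.heappop(heap)  # ...are combined into one new symbol
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

code = huffman([0.4, 0.2, 0.2, 0.1, 0.1])  # average length 2.2 binits/SS
```

Working backward through the merges corresponds here to prepending a '0' or '1' each time a group is combined.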
25. Problem 1
Consider a zero memory binary source S with P(s1) = 0.8 & P(s2) = 0.2 :
a) Construct 2nd and 3rd extensions of the source and find the corresponding probability
distribution of each extension and find the entropy.
b) Write down the binary code of the 2nd extension of the source [T ≡ S^2] using each of the
following binary decision trees:
c) Find the average code word length L for each binary code.
d) Encode the following source symbol stream using each of the above binary code:
s2 s1 s1 s1 s1 s2 s2 s2 s1 s1
e) Calculate the binit rate in binits/sec. of each one if the source S emits 2000 symbols/sec.
26. Problem 2
Consider a zero-memory (statistically independent) binary source S with two source symbols s1 and
s2. If P(s1) = 0.85, calculate:
a) The amount of information of source symbol s1 = I(s1) in bit of information.
b) The amount of information of source symbol s2 = I(s2) in bit of information.
c) The statistical average of information of the source S = H(S) in bits/source symbol
d) The joint information of the events A = {s1s2} and B = {s1s1} in hartleys.
e) The conditional information of the event A = {s1 | s2} in nats.
27. Problem 2 - Solution
Consider a zero-memory (statistically independent) binary source S with two source symbols s1 and
s2. If P(s1) = 0.85, calculate:
a) The amount of information of source symbol s1 = I(s1) in bits of information.
I(s1) = log2(1/0.85) = 0.2345 bit
b) The amount of information of source symbol s2 = I(s2) in bits of information.
I(s2) = log2(1/0.15) = 2.737 bits
c) The statistical average of information of the source S = H(S) in bits/source symbol
H(S) = 0.85 × 0.2345 + 0.15 × 2.737 = 0.61 bits/SS
d) The joint information of the events A = {s1s2} and B = {s1s1} in hartleys.
I(A) = log10(1/(0.85 × 0.15)) = log10(1/0.1275) = 0.8945 hartley
I(B) = log10(1/(0.85 × 0.85)) = log10(1/0.7225) = 0.1412 hartley
e) The conditional information of the event A = {s1 | s2} in nats.
P(s1 | s2) = P(s1) (statistical independence)
I(A) = ln(1/0.85) = 0.1625 nat
28. Consider 3-symbol, zero memory source S (a, b, c) with P(a) = 0.8 and P(b) = 0.05.
1) Encode the source S symbols using a binary code. Calculate the average code
length L.
2) Calculate the source entropy H(S). Calculate the code efficiency Ξ· = H(S)/L
3) Construct the second extension of the source [T ≡ S^2] and find its probability
distribution.
4) Write down the binary code of the source (T) symbols using each of the following
binary decision trees:
5) Calculate the average code length of source (T) and the code efficiency for each
code (LI, ηI, LII, ηII)
6) Encode the following source symbol stream using each of the above binary code
(b a c c a a b b a c b a )
7) Calculate the binit rate in binits/sec. of each code if the source S emits 3000
symbols/sec.
Problem 3
29. Consider 3-symbol, zero memory source S (a, b, c) with P(a) = 0.8 and P(b) = 0.05.
1) Encode the source S symbols using a binary code. Calculate the average code length
P(a) = 0.8
P(b) = 0.05
P(c) = 0.15
0.8 a 0
0.05 b 10
0.15 c 11
(L = 0.8 + 2 × 0.05 + 3 × 0.15 = 1.35
L = 0.8 + 3 × 0.05 + 2 × 0.15 = 1.25)
L = 0.8 + 2 × 0.05 + 2 × 0.15 = 0.8 + 2 × 0.2 = 1.2 binits/SS
Problem 3 - Solution
31. 4) Write down the binary code of the source (T) symbols using each of the following
binary decision trees.
Problem 3 - Solution
32. 5) Calculate the average code length of source (T) and the code efficiency for each code
(LI, ηI, LII, ηII)
Code I word lengths: {2, 2, 3, 3, 3, 4, 5, 6, 6}
L = 2 × 0.76 + 3 × 0.2 + 4 × 0.0225 + 5 × 0.0075 + 6 × 0.01 = 2.3075 binits/2SS
ηI = H(T)/L = 2H(S)/L = 2 × 0.884/2.3075 = 76.62%
Code II word lengths: {2, 3, 3, 3, 3, 3, 4, 5, 5}
L = 2 × 0.64 + 3 × 0.3425 + 4 × 0.0075 + 5 × 0.01 = 2.3875 binits/2SS
ηII = H(T)/L = 2H(S)/L = 2 × 0.884/2.3875 = 74.05%
6) Encode the following source symbol stream using each of the above binary codes:
b a c c a a b b a c b a
T: t5 t7 t1 t4 t3 t5
Code I: 000 0100 11 010111 10 000
Code II: 000 011 10 11111 110 000
Problem 3 - Solution
33. 7) Calculate the binit rate in binits/sec. of each code if the source S emits 3000
symbols/sec.
(binit rate = source symbol rate × source average code length)
Code I binit rate = 2.3075 × 1500 = 3.461 kbinits/sec
Code II binit rate = 2.3875 × 1500 = 3.581 kbinits/sec
It is noteworthy that:
The binit rate without extension = 1.2 × 3000 = 3600 binits/sec = 3.6 kbinits/sec
Problem 3 - Solution
34. Can an instantaneous (prefix) code be constructed with the
following codeword lengths? Find the corresponding code using
the decision tree for each eligible case.
a) {1, 2, 3, 3, 4, 4, 5, 5}, r = 2
b) {1, 1, 2, 2, 3, 3, 4, 4}, r = 3
c) {1, 1, 1, 2, 2, 2, 2}, r = 4
Problem 4
36. A zero memory source S emits one of eight symbols
randomly every 1 microsecond with probabilities
{0.13, 0.2, 0.16, 0.3, 0.07, 0.05, 0.03, 0.06}
1. Calculate the source entropy H(S).
2. Construct a Huffman binary code.
3. Calculate the code efficiency.
4. Find the encoder output average binit rate.
Problem 5
37. A zero memory source S emits one of five symbols
randomly every 2 microsecond with probabilities
{0.25, 0.25, 0.2, 0.15, 0.15}
1. Calculate the source entropy H(S).
2. Construct a Huffman binary code.
3. Calculate the average length of this code.
4. Calculate the code efficiency.
5. Find the encoder output average binit rate.
Problem 6
38. A zero memory source S emits one of five symbols
randomly every 2 microsecond with probabilities
{0.25, 0.25, 0.2, 0.15, 0.15}
1. Construct a Huffman ternary code.
2. Calculate the average length of this code.
3. Calculate the code efficiency.
4. Calculate the code redundancy (E = 1 - η).
Problem 7
39. If r β₯ 3, we may not have a sufficient number of symbols so that we can
combine them r at a time. In such a case, we add dummy symbols to the end of
the set of symbols. The dummy symbols have probability 0 and are inserted to
fill the tree. Since at each stage of the reduction, the number of symbols is
reduced by r β 1, we want the total number of symbols to be 1 + k(r β 1), where
k is the number of merges. Hence, we add enough dummy symbols so that the
total number of symbols is of this form. For example:
A zero memory source S emits one of six symbols randomly with probabilities
{0.25, 0.25, 0.2, 0.1, 0.1, 0.1}
1. Construct a Huffman ternary code.
2. Calculate the average length of this code.
3. Calculate the code efficiency.
4. Calculate the code redundancy (E = 1 - η).
Problem 8
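The count of dummy symbols follows directly from the condition q = 1 + k(r - 1); a small sketch (the function name is ours):

```python
def dummies_needed(q, r):
    """Zero-probability symbols to append so the total is of the form 1 + k(r - 1)."""
    excess = (q - 1) % (r - 1)
    return 0 if excess == 0 else (r - 1) - excess

# Problem 8's source: six symbols, ternary code -> one dummy (6 + 1 = 7 = 1 + 3*(3 - 1))
```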
40. Complete the following probability distribution of the second
extension T of a zero-memory 3-symbol source S = {a, b, c}.
Problem 9
T       S       Prob
P(t1)   P(aa)   0.25
P(t2)   P(ab)
P(t3)   P(ac)
P(t4)   P(ba)
P(t5)   P(bb)
P(t6)   P(bc)
P(t7)   P(ca)
P(t8)   P(cb)
P(t9)   P(cc)   0.01
1. Find the zero memory source S probability
distribution.
2. Calculate the source entropy H(T).
3. Find the ternary Huffman code for the above
source second extension T and calculate the code
efficiency and redundancy. (Hint: you do not
need to add dummy symbol with zero
probability)
41. Code Variance
As a measure of the variability in codeword lengths of a source code,
the variance of the codeword lengths about the average codeword length L over the
ensemble of source symbols is defined as
σ² = Σ_{k=0}^{K-1} pk (lk - L)²
where p0, p1, ..., pK-1 are the source statistics, and lk is the length of the
codeword assigned to source symbol sk.
code word assigned to source symbol sk. It is usually found that when a
combined symbol is moved as high as possible, the resulting Huffman
code has a significantly smaller variance Ο2 (which is better) than when
it is moved as low as possible. On this basis, it is reasonable to choose
the former Huffman code over the latter.
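The variance computation can be sketched as follows; the two length sets below are a hypothetical pair of Huffman codes for the same five-symbol source, sharing the same average length L = 2.2 but differing in how high the combined symbols were moved:

```python
def code_variance(probs, lengths):
    """sigma^2 = sum of p_k * (l_k - L)^2 about the average codeword length L."""
    L = sum(p * l for p, l in zip(probs, lengths))
    return sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
var_high_placement = code_variance(probs, [2, 2, 2, 3, 3])  # combined symbols moved up
var_low_placement = code_variance(probs, [1, 2, 3, 4, 4])   # combined symbols moved down
```

The flatter length distribution gives the smaller variance, matching the rule of thumb above.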