2. 6.1 Mathematical models for information source
• Discrete source
  – Alphabet: $X \in \{x_1, x_2, \ldots, x_L\}$
  – Symbol probabilities: $p_k = \mathrm{P}[X = x_k]$, $k = 1, 2, \ldots, L$, with $\sum_{k=1}^{L} p_k = 1$
3. 6.1 Mathematical models for information source
• Discrete memoryless source (DMS)
  – Source outputs $\{X_i\}$, $i = 1, 2, \ldots, N$, are independent random variables
• Discrete stationary source
  – Source outputs are statistically dependent
  – Stationary: joint probabilities of $(x_1, x_2, \ldots, x_n)$ and $(x_{1+m}, x_{2+m}, \ldots, x_{n+m})$ are identical for all shifts $m$
  – Characterized by the joint PDF $p(x_1, x_2, \ldots, x_m)$
4. 6.2 Measure of information
• Entropy of random variable X
  – A measure of uncertainty or ambiguity in X
  – A measure of the information required to resolve X, i.e., the information content of X per symbol
  – Unit: bits ($\log_2$) or nats ($\log_e$) per symbol
  – We define $0 \log 0 = 0$
  – Entropy depends on the probabilities of X, not on the values of X
For $X \in \{x_1, x_2, \ldots, x_L\}$:
$$H(X) = -\sum_{k=1}^{L} \mathrm{P}[X = x_k]\,\log \mathrm{P}[X = x_k]$$
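A minimal sketch of the entropy computation above in Python (illustrative only; the symbol probabilities are hypothetical):

```python
import math

def entropy(probs, base=2.0):
    """H(X) = -sum_k p_k * log(p_k), using the convention 0*log(0) = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical 4-symbol source; any probability vector summing to 1 works.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits per symbol
print(entropy([0.5, 0.5]))                 # 1.0 bit: one fair binary choice
```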
5. Shannon’s fundamental paper in 1948
“A Mathematical Theory of Communication”
Can we define a quantity $H(p_1, p_2, \ldots, p_n)$ which will measure how much
information is “produced” by a process?
He wants this measure to satisfy:
1) H should be continuous in the $p_i$
2) If all $p_i$ are equal ($p_i = 1/n$), H should be monotonically
increasing with n
3) If a choice can be broken down into two
successive choices, the original H should be the
weighted sum of the individual values of H
6. Shannon’s fundamental paper in 1948
“A Mathematical Theory of Communication”
Example of requirement 3):
$$H\!\left(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right) = H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2}\,H\!\left(\tfrac{2}{3}, \tfrac{1}{3}\right)$$
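A quick self-contained numerical check of this decomposition (the probabilities are those on the slide):

```python
import math

def H(*p):
    # Entropy in bits of a finite distribution given as positional probabilities
    return -sum(q * math.log2(q) for q in p if q > 0)

lhs = H(1/2, 1/3, 1/6)
rhs = H(1/2, 1/2) + (1/2) * H(2/3, 1/3)
print(lhs, rhs)                 # both ~1.459 bits
assert abs(lhs - rhs) < 1e-12
```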
7. Shannon’s fundamental paper in 1948
“A Mathematical Theory of Communication”
The only H satisfying the three assumptions is of the form:
$$H = -K \sum_{i=1}^{n} p_i \log p_i$$
where K is a positive constant.
9. Mutual information
• Two discrete random variables: X and Y
• Measures the information that knowing either variable provides about the other
• What if X and Y are fully independent or fully dependent?
$$I(X;Y) = \sum_{x}\sum_{y} \mathrm{P}[X = x, Y = y]\,\log\frac{\mathrm{P}[X = x \mid Y = y]}{\mathrm{P}[X = x]}
         = \sum_{x}\sum_{y} \mathrm{P}[x, y]\,\log\frac{\mathrm{P}[x, y]}{\mathrm{P}[x]\,\mathrm{P}[y]}$$
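A small sketch of this formula in Python; the 2x2 joint distributions below are hypothetical and chosen only to illustrate the two extreme cases asked about above:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} P[x,y] * log2( P[x,y] / (P[x] P[y]) )."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

# Independent X, Y -> I(X;Y) = 0; fully dependent (Y = X) -> I(X;Y) = H(X)
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```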
12. Joint and conditional entropy
• Joint entropy
$$H(X,Y) = -\sum_{x}\sum_{y} \mathrm{P}[X = x, Y = y]\,\log \mathrm{P}[X = x, Y = y]$$
• Conditional entropy of Y given X
$$H(Y \mid X) = \sum_{x} \mathrm{P}[X = x]\,H(Y \mid X = x)
             = -\sum_{x}\sum_{y} \mathrm{P}[X = x, Y = y]\,\log \mathrm{P}[Y = y \mid X = x]$$
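A short numerical sketch tying these definitions together (the joint PMF is hypothetical); it checks the chain rule H(X,Y) = H(X) + H(Y|X) and the identity I(X;Y) = H(Y) - H(Y|X):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

joint = [[0.3, 0.2],   # P[x0,y0], P[x0,y1]  (hypothetical joint PMF)
         [0.1, 0.4]]   # P[x1,y0], P[x1,y1]

px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

H_xy = H([p for row in joint for p in row])            # joint entropy H(X,Y)

# Conditional entropy from the definition: H(Y|X) = sum_x P[x] * H(Y|X=x)
H_y_given_x = sum(px[i] * H([p / px[i] for p in row])
                  for i, row in enumerate(joint))

print(H_xy, H(px) + H_y_given_x)   # equal: chain rule
print(H(py) - H_y_given_x)         # I(X;Y) > 0 since X and Y are dependent here
```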
13. Joint and conditional entropy
• Chain rule for entropies
$$H(X_1, X_2, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1, X_2) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1})$$
• Therefore,
$$H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)$$
• If the $X_i$ are i.i.d.,
$$H(X_1, X_2, \ldots, X_n) = n\,H(X)$$
14. 6.3 Lossless coding of information source
• Source $X \in \{x_1, x_2, \ldots, x_L\}$ with symbol probabilities $p_i = \mathrm{P}[X = x_i]$
• Source sequence of length n: $\mathbf{x} = [X_1, X_2, \ldots, X_n]$, where n is assumed to be large
• Without any source coding we need $\log_2 L$ bits per symbol
15. Lossless source coding
• Typical sequence
  – The number of occurrences of $x_i$ in $\mathbf{x}$ is roughly $np_i$
  – When $n \to \infty$, any $\mathbf{x}$ will be “typical”
$$\log \mathrm{P}[\mathbf{x}] \approx \log \prod_{i=1}^{L} p_i^{\,np_i} = \sum_{i=1}^{L} np_i \log p_i = -n\,H(X)$$
$$\mathrm{P}[\mathbf{x}] \approx 2^{-nH(X)} \quad\text{-- all typical sequences have the same probability}$$
$$\mathrm{P}[\mathbf{x}\ \text{is typical}] \to 1 \quad\text{when } n \to \infty$$
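A rough empirical illustration of this concentration (a sketch with an arbitrary 3-symbol source): for a long i.i.d. sequence, -(1/n) log2 P[x] settles near H(X).

```python
import math
import random

p = [0.5, 0.3, 0.2]                        # hypothetical symbol probabilities
H = -sum(q * math.log2(q) for q in p)      # source entropy, ~1.485 bits/symbol

random.seed(0)
n = 100_000
seq = random.choices(range(len(p)), weights=p, k=n)

# Per-symbol log-probability of the observed sequence
log_prob_per_symbol = -sum(math.log2(p[s]) for s in seq) / n

print(H, log_prob_per_symbol)   # the two values are close for large n
```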
16. Lossless source coding
• Typical sequence
• Since typical sequences are almost certain to occur, for the source output it is sufficient to consider only these typical sequences
• How many bits per symbol do we need now?
Number of typical sequences $\approx \dfrac{1}{\mathrm{P}[\mathbf{x}]} = 2^{nH(X)}$
$$R = \frac{1}{n}\log_2 2^{nH(X)} = H(X) \le \log_2 L$$
17. Lossless source coding
Shannon’s First Theorem - Lossless Source Coding
Let X denote a discrete memoryless source.
There exists a lossless source code at rate R if
$$R \ge H(X) \ \text{bits per transmission}$$
18. Lossless source coding
For a discrete stationary source…
$$R \ge H_\infty(X) = \lim_{k\to\infty} \frac{1}{k}\,H(X_1, X_2, \ldots, X_k) = \lim_{k\to\infty} H(X_k \mid X_1, X_2, \ldots, X_{k-1})$$
19. Lossless source coding algorithms
• Variable-length coding algorithm
  – Symbols with higher probability are assigned shorter code words
  – Average rate: $\bar{R} = \min_{\{n_k\}} \sum_{k=1}^{L} n_k\,\mathrm{P}(x_k)$, where $n_k$ is the length of the code word for $x_k$
  – E.g. Huffman coding (see the sketch below)
• Fixed-length coding algorithm
  – E.g. Lempel-Ziv coding
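A compact Huffman-coding sketch in Python (an illustration, not a reference implementation); it builds a prefix code for a hypothetical 4-symbol source and compares the average code length with the entropy bound:

```python
import heapq
import math

def huffman_code(probs):
    """Return {symbol index: codeword} for the given probability list."""
    # Heap entries: (probability, tie-breaker, list of (symbol, partial codeword))
    heap = [(p, i, [(i, "")]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)    # least probable node
        p1, _, group1 = heapq.heappop(heap)    # second least probable node
        merged = ([(s, "0" + c) for s, c in group0] +
                  [(s, "1" + c) for s, c in group1])
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return dict(heap[0][2])

p = [0.5, 0.25, 0.125, 0.125]                  # hypothetical source probabilities
code = huffman_code(p)
avg_len = sum(p[s] * len(c) for s, c in code.items())
H = -sum(q * math.log2(q) for q in p)
print(code)         # a prefix code, e.g. {0: '0', 1: '10', 2: '110', 3: '111'}
print(avg_len, H)   # 1.75 and 1.75: dyadic probabilities meet the entropy bound
```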
21. 6.5 Channel models and channel capacity
• Channel models
  – Input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$
  – Output sequence $\mathbf{y} = (y_1, y_2, \ldots, y_n)$
  – A channel is memoryless if
$$\mathrm{P}[\mathbf{y} \mid \mathbf{x}] = \prod_{i=1}^{n} \mathrm{P}[y_i \mid x_i]$$
22. Binary symmetric channel (BSC) model
[Block diagram: Source data → Channel encoder → Binary modulator → Channel → Demodulator and detector → Channel decoder → Output data]
Composite discrete-input, discrete-output channel (modulator + channel + demodulator/detector)
23. Binary symmetric channel (BSC) model
[Channel transition diagram: input 0 → output 0 and input 1 → output 1 with probability 1−p; input 0 → output 1 and input 1 → output 0 with probability p]
$$\mathrm{P}[Y = 0 \mid X = 0] = \mathrm{P}[Y = 1 \mid X = 1] = 1 - p$$
$$\mathrm{P}[Y = 1 \mid X = 0] = \mathrm{P}[Y = 0 \mid X = 1] = p$$
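A minimal BSC simulation in Python (an illustrative sketch; the crossover probability 0.1 is arbitrary):

```python
import random

def bsc(bits, p, rng=random):
    """Flip each input bit independently with crossover probability p."""
    return [b ^ (rng.random() < p) for b in bits]

random.seed(0)
tx = [random.randint(0, 1) for _ in range(100_000)]
rx = bsc(tx, p=0.1)
error_rate = sum(a != b for a, b in zip(tx, rx)) / len(tx)
print(error_rate)   # close to 0.1
```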
24. Discrete memoryless channel (DMC)
[Channel transition diagram: inputs $\{X\} = \{x_0, x_1, \ldots, x_{M-1}\}$, outputs $\{Y\} = \{y_0, y_1, \ldots, y_{Q-1}\}$]
The transition probabilities $\mathrm{P}[y \mid x]$ can be arranged in a matrix.
25. Discrete-input continuous-output channel
$$Y = X + N$$
If N is additive white Gaussian noise with variance $\sigma^2$:
$$p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(y - x)^2 / 2\sigma^2}$$
For a memoryless channel:
$$p(y_1, y_2, \ldots, y_n \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(y_i \mid x_i)$$
26. Discrete-time AWGN channel
$$y_i = x_i + n_i$$
• Power constraint
$$\mathrm{E}[X^2] \le P$$
• For an input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ with large n,
$$\frac{1}{n}\|\mathbf{x}\|^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P$$
27. AWGN waveform channel
• Assume the channel has bandwidth W, with frequency response C(f) = 1 for $f \in [-W, +W]$
[Block diagram: Source data → Channel encoder → Modulator → (input waveform) → Physical channel → (output waveform) → Demodulator and detector → Channel decoder → Output data]
$$y(t) = x(t) + n(t)$$
28. AWGN waveform channel
• Power constraint
$$\mathrm{E}[X^2(t)] = \lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} x^2(t)\,dt \le P$$
29. AWGN waveform channel
• How to define probabilities that characterize the channel?
Expand x(t), n(t), and y(t) over the basis functions $\{\phi_j(t),\ j = 1, 2, \ldots, 2WT\}$:
$$x(t) = \sum_j x_j \phi_j(t), \quad n(t) = \sum_j n_j \phi_j(t), \quad y(t) = \sum_j y_j \phi_j(t)$$
$$y_j = x_j + n_j$$
Equivalent to 2W uses per second of a discrete-time channel
30. AWGN waveform channel
• The power constraint becomes...
$$\lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} x^2(t)\,dt = \lim_{T\to\infty} \frac{1}{T}\sum_{j=1}^{2WT} x_j^2 = \lim_{T\to\infty} \frac{1}{T}\,2WT\,\mathrm{E}[X^2] = 2W\,\mathrm{E}[X^2] \le P$$
• Hence,
$$\mathrm{E}[X^2] \le \frac{P}{2W}$$
31. Channel capacity
• After source coding, we have a binary sequence of length n
• The channel causes bit errors with probability p
• When $n \to \infty$, the number of sequences that have np errors is
$$\binom{n}{np} = \frac{n!}{(np)!\,\big(n(1-p)\big)!} \approx 2^{nH_b(p)}$$
32. Channel capacity
• To reduce errors, we use a subset of all possible sequences
$$M = \frac{2^{n}}{2^{nH_b(p)}} = 2^{n(1 - H_b(p))}$$
• Information rate [bits per transmission]
$$R = \frac{1}{n}\log_2 M = 1 - H_b(p) \quad\text{-- capacity of the binary channel}$$
33. Channel capacity
We use $2^m$ different binary sequences of length m for transmission;
$2^n$ different binary sequences of length n contain information
$$0 \le R = 1 - H_b(p) \le 1$$
We cannot transmit more than 1 bit per channel use
Channel encoder: add redundancy
34. Channel capacity
• Capacity of an arbitrary discrete memoryless channel
$$C = \max_{\mathbf{p}_X} I(X;Y)$$
• Maximize the mutual information between input and output, over all input distributions $\mathbf{p}_X = (p_1, p_2, \ldots)$
• Shannon’s Second Theorem – noisy channel coding
  - R < C: reliable communication is possible
  - R > C: reliable communication is impossible
35. Channel capacity
For the binary symmetric channel, the maximizing input distribution is
$$\mathrm{P}[X = 0] = \mathrm{P}[X = 1] = \tfrac{1}{2}$$
and the capacity is
$$C = 1 + p\log_2 p + (1-p)\log_2(1-p) = 1 - H_b(p)$$
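A small sketch evaluating this capacity formula for a few crossover probabilities (the values are arbitrary examples):

```python
import math

def bsc_capacity(p):
    """C = 1 - H_b(p) for a binary symmetric channel with crossover probability p."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(p, bsc_capacity(p))   # 1.0, ~0.919, ~0.531, 0.0
```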
36. Channel capacity
Discrete-time AWGN channel with an input power constraint
$$Y = X + N, \qquad \mathrm{E}[X^2] \le P$$
For large n,
$$\frac{1}{n}\sum_{i=1}^{n} y_i^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i + n_i)^2 \approx \mathrm{E}[X^2] + \mathrm{E}[N^2] \le P + \sigma^2$$
37. Channel capacity
Discrete-time AWGN channel with an input power constraint
$$Y = X + N, \qquad \mathrm{E}[X^2] \le P$$
Maximum number of symbols to transmit:
$$M = \left(\frac{\sqrt{n(P + \sigma^2)}}{\sqrt{n\sigma^2}}\right)^{n} = \left(1 + \frac{P}{\sigma^2}\right)^{n/2}$$
Transmission rate:
$$R = \frac{1}{n}\log_2 M = \frac{1}{2}\log_2\!\left(1 + \frac{P}{\sigma^2}\right)$$
Can be obtained by directly maximizing I(X;Y), subject to the power constraint
38. Channel capacity
Band-limited waveform AWGN channel with input power constraint
- Equivalent to 2W uses per second of the discrete-time channel
$$C = \frac{1}{2}\log_2\!\left(1 + \frac{P}{2W \cdot N_0/2}\right) = \frac{1}{2}\log_2\!\left(1 + \frac{P}{N_0 W}\right) \ \text{bits/channel use}$$
$$C = 2W \cdot \frac{1}{2}\log_2\!\left(1 + \frac{P}{N_0 W}\right) = W\log_2\!\left(1 + \frac{P}{N_0 W}\right) \ \text{bits/s}$$
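A quick numerical sketch of the band-limited capacity formula (the bandwidth, noise density, and SNR below are arbitrary example values):

```python
import math

def awgn_capacity(P, N0, W):
    """C = W * log2(1 + P / (N0 * W)) bits per second."""
    return W * math.log2(1 + P / (N0 * W))

W = 1e6            # bandwidth in Hz
N0 = 1e-9          # noise power spectral density in W/Hz
P = 100 * N0 * W   # transmit power chosen so that SNR = P/(N0*W) = 100 (20 dB)
print(awgn_capacity(P, N0, W))   # ~6.66e6 bits/s
```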
40. Channel capacity
• Bandwidth efficiency
$$r = \frac{R}{W} \le \log_2\!\left(1 + \frac{P}{N_0 W}\right)$$
• Relation of bandwidth efficiency and power efficiency
With $E_b = \dfrac{P T_s}{\log_2 M} = \dfrac{P}{R}$ (energy per bit):
$$r \le \log_2\!\left(1 + \frac{R}{W}\,\frac{E_b}{N_0}\right) = \log_2\!\left(1 + r\,\frac{E_b}{N_0}\right)$$
$$\frac{E_b}{N_0} \ge \frac{2^{r} - 1}{r}, \qquad \frac{E_b}{N_0}\bigg|_{r \to 0} = \ln 2 = -1.6\ \text{dB}$$
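A short sketch evaluating the Eb/N0 bound (2^r - 1)/r for several bandwidth efficiencies, showing the approach to the -1.6 dB Shannon limit as r -> 0:

```python
import math

def min_ebn0_db(r):
    """Minimum Eb/N0 in dB for bandwidth efficiency r (bits/s/Hz)."""
    return 10 * math.log10((2**r - 1) / r)

for r in (4, 2, 1, 0.5, 0.1, 0.001):
    print(r, round(min_ebn0_db(r), 2))
# As r -> 0 this tends to 10*log10(ln 2) = -1.59 dB, the Shannon limit
```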
Intuitively, if entropy H(X) is regarded as a measure of uncertainty about a random variable, then H(X|Y) is a measure of what Y does not say about X. This is "the amount of uncertainty remaining about X after Y is known".
This gives a fundamental limit on source coding, and entropy plays an important role in the lossless compression of information sources.
Huffman coding is optimum in the sense that the average number of bits per source symbol is a minimum, subject to the constraint that the code words satisfy the prefix condition.
Channel models are useful when we design codes
A more general case than the BSC: M-ary modulation.
The output of the detector is unquantized. Probability density function…
We can use the coefficients to characterize the channel...