2. 6.1 Mathematical models for information source
• Discrete source
  – Alphabet: $X \in \{x_1, x_2, \ldots, x_L\}$
  – Symbol probabilities: $p_k = \mathrm{P}[X = x_k]$, $k = 1, 2, \ldots, L$, with $\sum_{k=1}^{L} p_k = 1$
3. 6.1 Mathematical models for information source
• Discrete memoryless source (DMS)
  – Source outputs $\{X_i\}$, $i = 1, 2, \ldots, N$, are independent random variables
• Discrete stationary source
  – Source outputs are statistically dependent
  – Stationary: joint probabilities of $(x_1, x_2, \ldots, x_n)$ and $(x_{1+m}, x_{2+m}, \ldots, x_{n+m})$ are identical for all shifts $m$
  – Characterized by the joint PDF $p(x_1, x_2, \ldots, x_m)$
4. 6.2 Measure of information
• Entropy of random variable X
  – A measure of uncertainty or ambiguity in X
  – A measure of the information required to resolve X, i.e., the information content of X per symbol
  – Unit: bits ($\log_2$) or nats ($\log_e$) per symbol
  – We define $0 \log 0 = 0$
  – Entropy depends on the probabilities of X, not on the values of X
For $X \in \{x_1, x_2, \ldots, x_L\}$:
$$H(X) = -\sum_{k=1}^{L} \mathrm{P}[X = x_k]\,\log \mathrm{P}[X = x_k]$$
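A minimal sketch of the entropy computation above in Python (illustrative only; the symbol probabilities are hypothetical):

```python
import math

def entropy(probs, base=2.0):
    """H(X) = -sum_k p_k * log(p_k), using the convention 0*log(0) = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical 4-symbol source; any probability vector summing to 1 works.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits per symbol
print(entropy([0.5, 0.5]))                 # 1.0 bit: one fair binary choice
```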
5. Shannon’s fundamental paper in 1948
“A Mathematical Theory of Communication”
Can we define a quantity $H(p_1, p_2, \ldots, p_n)$ which will measure how much
information is “produced” by a process?
He wants this measure to satisfy:
1) H should be continuous in the $p_i$
2) If all $p_i$ are equal ($p_i = 1/n$), H should be monotonically
increasing with n
3) If a choice can be broken down into two
successive choices, the original H should be the
weighted sum of the individual values of H
6. Shannon’s fundamental paper in 1948
“A Mathematical Theory of Communication”
Example of requirement 3):
$$H\!\left(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right) = H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2}\,H\!\left(\tfrac{2}{3}, \tfrac{1}{3}\right)$$
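A quick self-contained numerical check of this decomposition (the probabilities are those on the slide):

```python
import math

def H(*p):
    # Entropy in bits of a finite distribution given as positional probabilities
    return -sum(q * math.log2(q) for q in p if q > 0)

lhs = H(1/2, 1/3, 1/6)
rhs = H(1/2, 1/2) + (1/2) * H(2/3, 1/3)
print(lhs, rhs)                 # both ~1.459 bits
assert abs(lhs - rhs) < 1e-12
```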
7. Shannon’s fundamental paper in 1948
“A Mathematical Theory of Communication”
The only H satisfying the three assumptions is of the form:
$$H = -K \sum_{i=1}^{n} p_i \log p_i$$
where K is a positive constant.
9. Mutual information
• Two discrete random variables: X and Y
• Measures the information that knowing either variable provides about the other
• What if X and Y are fully independent or fully dependent?
$$I(X;Y) = \sum_{x}\sum_{y} \mathrm{P}[X = x, Y = y]\,\log\frac{\mathrm{P}[X = x \mid Y = y]}{\mathrm{P}[X = x]}
         = \sum_{x}\sum_{y} \mathrm{P}[x, y]\,\log\frac{\mathrm{P}[x, y]}{\mathrm{P}[x]\,\mathrm{P}[y]}$$
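A small sketch of this formula in Python; the 2x2 joint distributions below are hypothetical and chosen only to illustrate the two extreme cases asked about above:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} P[x,y] * log2( P[x,y] / (P[x] P[y]) )."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

# Independent X, Y -> I(X;Y) = 0; fully dependent (Y = X) -> I(X;Y) = H(X)
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```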
12. Joint and conditional entropy
• Joint entropy
$$H(X,Y) = -\sum_{x}\sum_{y} \mathrm{P}[X = x, Y = y]\,\log \mathrm{P}[X = x, Y = y]$$
• Conditional entropy of Y given X
$$H(Y \mid X) = \sum_{x} \mathrm{P}[X = x]\,H(Y \mid X = x)
             = -\sum_{x}\sum_{y} \mathrm{P}[X = x, Y = y]\,\log \mathrm{P}[Y = y \mid X = x]$$
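A short numerical sketch tying these definitions together (the joint PMF is hypothetical); it checks the chain rule H(X,Y) = H(X) + H(Y|X) and the identity I(X;Y) = H(Y) - H(Y|X):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

joint = [[0.3, 0.2],   # P[x0,y0], P[x0,y1]  (hypothetical joint PMF)
         [0.1, 0.4]]   # P[x1,y0], P[x1,y1]

px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

H_xy = H([p for row in joint for p in row])            # joint entropy H(X,Y)

# Conditional entropy from the definition: H(Y|X) = sum_x P[x] * H(Y|X=x)
H_y_given_x = sum(px[i] * H([p / px[i] for p in row])
                  for i, row in enumerate(joint))

print(H_xy, H(px) + H_y_given_x)   # equal: chain rule
print(H(py) - H_y_given_x)         # I(X;Y) > 0 since X and Y are dependent here
```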
13. Joint and conditional entropy
• Chain rule for entropies
$$H(X_1, X_2, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1, X_2) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1})$$
• Therefore,
$$H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)$$
• If the $X_i$ are i.i.d.,
$$H(X_1, X_2, \ldots, X_n) = n\,H(X)$$
14. 6.3 Lossless coding of information source
• Source $X \in \{x_1, x_2, \ldots, x_L\}$ with symbol probabilities $p_i = \mathrm{P}[X = x_i]$
• Source sequence of length n: $\mathbf{x} = [X_1, X_2, \ldots, X_n]$, where n is assumed to be large
• Without any source coding we need $\log_2 L$ bits per symbol
15. Lossless source coding
• Typical sequence
  – The number of occurrences of $x_i$ in $\mathbf{x}$ is roughly $np_i$
  – When $n \to \infty$, any $\mathbf{x}$ will be “typical”
$$\log \mathrm{P}[\mathbf{x}] \approx \log \prod_{i=1}^{L} p_i^{\,np_i} = \sum_{i=1}^{L} np_i \log p_i = -n\,H(X)$$
$$\mathrm{P}[\mathbf{x}] \approx 2^{-nH(X)} \quad\text{-- all typical sequences have the same probability}$$
$$\mathrm{P}[\mathbf{x}\ \text{is typical}] \to 1 \quad\text{when } n \to \infty$$
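A rough empirical illustration of this concentration (a sketch with an arbitrary 3-symbol source): for a long i.i.d. sequence, -(1/n) log2 P[x] settles near H(X).

```python
import math
import random

p = [0.5, 0.3, 0.2]                        # hypothetical symbol probabilities
H = -sum(q * math.log2(q) for q in p)      # source entropy, ~1.485 bits/symbol

random.seed(0)
n = 100_000
seq = random.choices(range(len(p)), weights=p, k=n)

# Per-symbol log-probability of the observed sequence
log_prob_per_symbol = -sum(math.log2(p[s]) for s in seq) / n

print(H, log_prob_per_symbol)   # the two values are close for large n
```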
16. Lossless source coding
• Typical sequence
• Since typical sequences are almost certain to occur, for the source output it is sufficient to consider only these typical sequences
• How many bits per symbol do we need now?
Number of typical sequences $\approx \dfrac{1}{\mathrm{P}[\mathbf{x}]} = 2^{nH(X)}$
$$R = \frac{1}{n}\log_2 2^{nH(X)} = H(X) \le \log_2 L$$
17. Lossless source coding
Shannon’s First Theorem - Lossless Source Coding
Let X denote a discrete memoryless source.
There exists a lossless source code at rate R if
$$R \ge H(X) \ \text{bits per transmission}$$
18. Lossless source coding
For a discrete stationary source…
$$R \ge H_\infty(X) = \lim_{k\to\infty} \frac{1}{k}\,H(X_1, X_2, \ldots, X_k) = \lim_{k\to\infty} H(X_k \mid X_1, X_2, \ldots, X_{k-1})$$
19. Lossless source coding algorithms
• Variable-length coding algorithm
  – Symbols with higher probability are assigned shorter code words
  – Average rate: $\bar{R} = \min_{\{n_k\}} \sum_{k=1}^{L} n_k\,\mathrm{P}(x_k)$, where $n_k$ is the length of the code word for $x_k$
  – E.g. Huffman coding (see the sketch below)
• Fixed-length coding algorithm
  – E.g. Lempel-Ziv coding
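A compact Huffman-coding sketch in Python (an illustration, not a reference implementation); it builds a prefix code for a hypothetical 4-symbol source and compares the average code length with the entropy bound:

```python
import heapq
import math

def huffman_code(probs):
    """Return {symbol index: codeword} for the given probability list."""
    # Heap entries: (probability, tie-breaker, list of (symbol, partial codeword))
    heap = [(p, i, [(i, "")]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)    # least probable node
        p1, _, group1 = heapq.heappop(heap)    # second least probable node
        merged = ([(s, "0" + c) for s, c in group0] +
                  [(s, "1" + c) for s, c in group1])
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return dict(heap[0][2])

p = [0.5, 0.25, 0.125, 0.125]                  # hypothetical source probabilities
code = huffman_code(p)
avg_len = sum(p[s] * len(c) for s, c in code.items())
H = -sum(q * math.log2(q) for q in p)
print(code)         # a prefix code, e.g. {0: '0', 1: '10', 2: '110', 3: '111'}
print(avg_len, H)   # 1.75 and 1.75: dyadic probabilities meet the entropy bound
```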
21. 6.5 Channel models and channel capacity
• Channel models
  – Input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$
  – Output sequence $\mathbf{y} = (y_1, y_2, \ldots, y_n)$
  – A channel is memoryless if
$$\mathrm{P}[\mathbf{y} \mid \mathbf{x}] = \prod_{i=1}^{n} \mathrm{P}[y_i \mid x_i]$$
22. Binary symmetric channel (BSC) model
[Block diagram: Source data → Channel encoder → Binary modulator → Channel → Demodulator and detector → Channel decoder → Output data]
Composite discrete-input, discrete-output channel (modulator + channel + demodulator/detector)
23. Binary symmetric channel (BSC) model
[Channel transition diagram: input 0 → output 0 and input 1 → output 1 with probability 1−p; input 0 → output 1 and input 1 → output 0 with probability p]
$$\mathrm{P}[Y = 0 \mid X = 0] = \mathrm{P}[Y = 1 \mid X = 1] = 1 - p$$
$$\mathrm{P}[Y = 1 \mid X = 0] = \mathrm{P}[Y = 0 \mid X = 1] = p$$
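A minimal BSC simulation in Python (an illustrative sketch; the crossover probability 0.1 is arbitrary):

```python
import random

def bsc(bits, p, rng=random):
    """Flip each input bit independently with crossover probability p."""
    return [b ^ (rng.random() < p) for b in bits]

random.seed(0)
tx = [random.randint(0, 1) for _ in range(100_000)]
rx = bsc(tx, p=0.1)
error_rate = sum(a != b for a, b in zip(tx, rx)) / len(tx)
print(error_rate)   # close to 0.1
```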
24. Discrete memoryless channel (DMC)
[Channel transition diagram: inputs $\{X\} = \{x_0, x_1, \ldots, x_{M-1}\}$, outputs $\{Y\} = \{y_0, y_1, \ldots, y_{Q-1}\}$]
The transition probabilities $\mathrm{P}[y \mid x]$ can be arranged in a matrix.
25. Discrete-input continuous-output channel
$$Y = X + N$$
If N is additive white Gaussian noise with variance $\sigma^2$:
$$p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(y - x)^2 / 2\sigma^2}$$
For a memoryless channel:
$$p(y_1, y_2, \ldots, y_n \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(y_i \mid x_i)$$
26. Discrete-time AWGN channel
$$y_i = x_i + n_i$$
• Power constraint
$$\mathrm{E}[X^2] \le P$$
• For an input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ with large n,
$$\frac{1}{n}\|\mathbf{x}\|^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P$$
27. AWGN waveform channel
• Assume the channel has bandwidth W, with frequency response C(f) = 1 for $f \in [-W, +W]$
[Block diagram: Source data → Channel encoder → Modulator → (input waveform) → Physical channel → (output waveform) → Demodulator and detector → Channel decoder → Output data]
$$y(t) = x(t) + n(t)$$
28. AWGN waveform channel
• Power constraint
$$\mathrm{E}[X^2(t)] = \lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} x^2(t)\,dt \le P$$
29. AWGN waveform channel
• How to define probabilities that characterize the channel?
Expand x(t), n(t), and y(t) over the basis functions $\{\phi_j(t),\ j = 1, 2, \ldots, 2WT\}$:
$$x(t) = \sum_j x_j \phi_j(t), \quad n(t) = \sum_j n_j \phi_j(t), \quad y(t) = \sum_j y_j \phi_j(t)$$
$$y_j = x_j + n_j$$
Equivalent to 2W uses per second of a discrete-time channel
30. AWGN waveform channel
• The power constraint becomes...
$$\lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} x^2(t)\,dt = \lim_{T\to\infty} \frac{1}{T}\sum_{j=1}^{2WT} x_j^2 = \lim_{T\to\infty} \frac{1}{T}\,2WT\,\mathrm{E}[X^2] = 2W\,\mathrm{E}[X^2] \le P$$
• Hence,
$$\mathrm{E}[X^2] \le \frac{P}{2W}$$
31. Channel capacity
• After source coding, we have a binary sequence of length n
• The channel causes bit errors with probability p
• When $n \to \infty$, the number of sequences that have np errors is
$$\binom{n}{np} = \frac{n!}{(np)!\,\big(n(1-p)\big)!} \approx 2^{nH_b(p)}$$
32. Channel capacity
• To reduce errors, we use a subset of all possible sequences
$$M = \frac{2^{n}}{2^{nH_b(p)}} = 2^{n(1 - H_b(p))}$$
• Information rate [bits per transmission]
$$R = \frac{1}{n}\log_2 M = 1 - H_b(p) \quad\text{-- capacity of the binary channel}$$
33. Channel capacity
We use $2^m$ different binary sequences of length m for transmission;
$2^n$ different binary sequences of length n contain information
$$0 \le R = 1 - H_b(p) \le 1$$
We cannot transmit more than 1 bit per channel use
Channel encoder: add redundancy
34. Channel capacity
• Capacity of an arbitrary discrete memoryless channel
$$C = \max_{\mathbf{p}_X} I(X;Y)$$
• Maximize the mutual information between input and output, over all input distributions $\mathbf{p}_X = (p_1, p_2, \ldots)$
• Shannon’s Second Theorem – noisy channel coding
  - R < C: reliable communication is possible
  - R > C: reliable communication is impossible
35. Channel capacity
For the binary symmetric channel, the maximizing input distribution is
$$\mathrm{P}[X = 0] = \mathrm{P}[X = 1] = \tfrac{1}{2}$$
and the capacity is
$$C = 1 + p\log_2 p + (1-p)\log_2(1-p) = 1 - H_b(p)$$
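A small sketch evaluating this capacity formula for a few crossover probabilities (the values are arbitrary examples):

```python
import math

def bsc_capacity(p):
    """C = 1 - H_b(p) for a binary symmetric channel with crossover probability p."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(p, bsc_capacity(p))   # 1.0, ~0.919, ~0.531, 0.0
```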
36. Channel capacity
Discrete-time AWGN channel with an input power constraint
$$Y = X + N, \qquad \mathrm{E}[X^2] \le P$$
For large n,
$$\frac{1}{n}\sum_{i=1}^{n} y_i^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i + n_i)^2 \approx \mathrm{E}[X^2] + \mathrm{E}[N^2] \le P + \sigma^2$$
37. Channel capacity
Discrete-time AWGN channel with an input power constraint
$$Y = X + N, \qquad \mathrm{E}[X^2] \le P$$
Maximum number of symbols to transmit:
$$M = \left(\frac{\sqrt{n(P + \sigma^2)}}{\sqrt{n\sigma^2}}\right)^{n} = \left(1 + \frac{P}{\sigma^2}\right)^{n/2}$$
Transmission rate:
$$R = \frac{1}{n}\log_2 M = \frac{1}{2}\log_2\!\left(1 + \frac{P}{\sigma^2}\right)$$
Can be obtained by directly maximizing I(X;Y), subject to the power constraint
38. Channel capacity
Band-limited waveform AWGN channel with input power constraint
- Equivalent to 2W uses per second of the discrete-time channel
$$C = \frac{1}{2}\log_2\!\left(1 + \frac{P}{2W \cdot N_0/2}\right) = \frac{1}{2}\log_2\!\left(1 + \frac{P}{N_0 W}\right) \ \text{bits/channel use}$$
$$C = 2W \cdot \frac{1}{2}\log_2\!\left(1 + \frac{P}{N_0 W}\right) = W\log_2\!\left(1 + \frac{P}{N_0 W}\right) \ \text{bits/s}$$
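A quick numerical sketch of the band-limited capacity formula (the bandwidth, noise density, and SNR below are arbitrary example values):

```python
import math

def awgn_capacity(P, N0, W):
    """C = W * log2(1 + P / (N0 * W)) bits per second."""
    return W * math.log2(1 + P / (N0 * W))

W = 1e6            # bandwidth in Hz
N0 = 1e-9          # noise power spectral density in W/Hz
P = 100 * N0 * W   # transmit power chosen so that SNR = P/(N0*W) = 100 (20 dB)
print(awgn_capacity(P, N0, W))   # ~6.66e6 bits/s
```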
40. Channel capacity
• Bandwidth efficiency
$$r = \frac{R}{W} \le \log_2\!\left(1 + \frac{P}{N_0 W}\right)$$
• Relation of bandwidth efficiency and power efficiency
With $E_b = \dfrac{P T_s}{\log_2 M} = \dfrac{P}{R}$ (energy per bit):
$$r \le \log_2\!\left(1 + \frac{R}{W}\,\frac{E_b}{N_0}\right) = \log_2\!\left(1 + r\,\frac{E_b}{N_0}\right)$$
$$\frac{E_b}{N_0} \ge \frac{2^{r} - 1}{r}, \qquad \frac{E_b}{N_0}\bigg|_{r \to 0} = \ln 2 = -1.6\ \text{dB}$$
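A short sketch evaluating the Eb/N0 bound (2^r - 1)/r for several bandwidth efficiencies, showing the approach to the -1.6 dB Shannon limit as r -> 0:

```python
import math

def min_ebn0_db(r):
    """Minimum Eb/N0 in dB for bandwidth efficiency r (bits/s/Hz)."""
    return 10 * math.log10((2**r - 1) / r)

for r in (4, 2, 1, 0.5, 0.1, 0.001):
    print(r, round(min_ebn0_db(r), 2))
# As r -> 0 this tends to 10*log10(ln 2) = -1.59 dB, the Shannon limit
```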
Intuitively, if entropy H(X) is regarded as a measure of uncertainty about a random variable, then H(X|Y) is a measure of what Y does not say about X. This is "the amount of uncertainty remaining about X after Y is known".
This gives a fundamental limit on source coding, and entropy plays an important role in the lossless compression of information sources.
Huffman coding is optimum in the sense that the average number of bits per source symbol is a minimum, subject to the constraint that the code words satisfy the prefix condition.
Channel models are useful when we design codes
A more general case than the BSC: M-ary modulation.
The output of the detector is unquantized. Probability density function…
We can use the coefficients to characterize the channel...